Clustering Analysis of Micro Blogs Based on Active Learning
Abstract:
The K-Means clustering algorithm cannot reliably determine its initial cluster centers, which tends to lower the accuracy of the clustering result. When it is applied to micro-blog data, the resulting clusters may fail to reflect the hot topics of interest correctly. This paper designs a clustering algorithm based on active learning. The Min-Max active learning strategy is applied while the initial cluster centers are being determined, so that after only a small number of queries the algorithm presents data points for the user to confirm as initial centers. In addition, a weight is set when K-Means recalculates the cluster centers. Together, these steps reduce the number of iterations and improve the accuracy of the clustering result. The algorithm is then applied to micro-blog clustering analysis to obtain hot micro-blog topics.
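The two steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact procedure, so everything here is an assumption. "Min-Max" is read as farthest-first selection: the next candidate is the point whose minimum distance to the already confirmed seeds is largest. The user's confirmation is modeled by a hypothetical `confirm` callback. The weight is read as a blend between the previous center and the new cluster mean. The names `min_max_seeds`, `weighted_kmeans`, `confirm`, and `weight` are illustrative and do not come from the paper.

```python
import math


def min_max_seeds(points, k, confirm=lambda p: True):
    """Pick up to k seeds by farthest-first (max-min distance) selection.

    `confirm` is a hypothetical stand-in for the user's yes/no answer to
    "should this point be an initial center?" (assumption, not from the paper).
    """
    seeds = [points[0]]
    candidates = list(points[1:])
    while len(seeds) < k and candidates:
        # Candidate whose nearest confirmed seed is farthest away.
        best = max(candidates, key=lambda p: min(math.dist(p, s) for s in seeds))
        candidates.remove(best)
        if confirm(best):
            seeds.append(best)
    return seeds


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def weighted_kmeans(points, centers, weight=0.8, max_iter=100, tol=1e-9):
    """K-Means whose center update blends the old center with the new mean.

    Assumed update rule: new_center = weight * mean + (1 - weight) * old_center.
    With weight=1.0 this reduces to the standard K-Means update.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: group points by their nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)

        # Update step: weighted recalculation of each center.
        new_centers = []
        for old, members in zip(centers, clusters):
            if not members:
                new_centers.append(old)  # keep the center of an empty cluster
                continue
            mean = tuple(sum(xs) / len(members) for xs in zip(*members))
            new_centers.append(
                tuple(weight * m + (1 - weight) * o for m, o in zip(mean, old))
            )

        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break

    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Toy 2-D points standing in for micro-blog feature vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = min_max_seeds(points, k=2)
centers, labels = weighted_kmeans(points, seeds)
```

In practice the points would be feature vectors extracted from micro-blog text, and `confirm` would be replaced by an actual prompt to the user. The blend weight is only one possible reading of the weighting step, and the sketch makes no attempt to reproduce the iteration or accuracy gains reported in the abstract.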