Micro-blog Topic Detection Method Based on Word Co-occurrence Networks

Micro-blog Topic Detection in Frequent Word Networks

    Abstract:

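    As an important information platform, micro-blog is visited by a large number of users every day, and major public-opinion events form hot topics on it. This paper proposes a new micro-blog topic detection method, topic detection in frequent word networks (TDFWN), to mine the hot topics contained in micro-blog corpora. The method first mines frequent k-word sets (k ≥ 3) from micro-blog text and builds a word co-occurrence network from the co-occurrence relations within these frequent word sets. The network is then partitioned into communities; words in the same community usually describe the same micro-blog topic, so each topic appears as a community. Experimental results show that TDFWN discovers the hot topics in micro-blogs quickly and comprehensively, and can also cluster micro-blog texts automatically.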
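    The first two steps described above (mining frequent k-word sets, then linking their words into a co-occurrence network) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the brute-force enumeration of k-combinations (in place of an Apriori or FP-growth miner), the `min_support` threshold, the rule that an edge's weight is the summed support of the itemsets containing that word pair, and the helper names `frequent_k_itemsets` and `build_cooccurrence_network` are all hypothetical choices.

```python
from collections import Counter
from itertools import combinations


# Hypothetical helper (not from the paper): brute-force frequent k-itemset mining.
def frequent_k_itemsets(docs, k=3, min_support=2):
    """Return {frozenset(words): support} for word sets of size k that occur
    together in at least `min_support` tokenised posts."""
    doc_sets = [set(d) for d in docs]
    # Anti-monotonicity: a word that is infrequent on its own cannot belong
    # to any frequent itemset, so prune it before enumerating combinations.
    word_support = Counter(w for s in doc_sets for w in s)
    frequent_words = {w for w, c in word_support.items() if c >= min_support}
    counts = Counter()
    for s in doc_sets:
        kept = sorted(s & frequent_words)
        for combo in combinations(kept, k):
            counts[frozenset(combo)] += 1
    return {items: c for items, c in counts.items() if c >= min_support}


# Hypothetical helper: the edge-weighting rule below is an assumption.
def build_cooccurrence_network(itemsets):
    """Turn frequent itemsets into an undirected graph {word: {nbr: weight}}.

    Every pair of words inside a frequent itemset is linked; the edge weight
    accumulates the supports of all itemsets that contain that pair."""
    graph = {}
    for items, support in itemsets.items():
        for u, v in combinations(sorted(items), 2):
            graph.setdefault(u, {})
            graph.setdefault(v, {})
            graph[u][v] = graph[u].get(v, 0) + support
            graph[v][u] = graph[v].get(u, 0) + support
    return graph


if __name__ == "__main__":
    # Toy, made-up posts (already tokenised) illustrating two topics.
    posts = [
        ["earthquake", "rescue", "sichuan", "donate"],
        ["earthquake", "rescue", "sichuan"],
        ["earthquake", "sichuan", "rescue", "news"],
        ["final", "goal", "team", "match"],
        ["final", "goal", "team"],
    ]
    graph = build_cooccurrence_network(frequent_k_itemsets(posts))
    print(graph["earthquake"])  # {'rescue': 3, 'sichuan': 3}
```

    Pruning infrequent single words first keeps the enumeration small on short posts, but the number of k-combinations still grows quickly with post length; a real corpus would call for a dedicated frequent-itemset miner.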

    Abstract:

    As an important information platform, micro-blog is visited by a large number of users every day, and major public-opinion events form hot topics on it. In this study, we propose a novel micro-blog topic detection method, named TDFWN (topic detection in frequent word networks), to mine hot topics in micro-blog corpora. First, frequent k-item sets (k ≥ 3) are mined from micro-blog text data. Second, a word co-occurrence network is built from the mined frequent k-item sets. Third, the network is partitioned into communities with a community detection method, where each community represents a micro-blog hot topic. Finally, the micro-blog texts are clustered into groups by computing the similarity of each text to the discovered topics. The empirical study shows that TDFWN can find hot topics in micro-blog text data and, at the same time, cluster the texts by those topics.
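    The last two steps (partitioning the word network into communities and clustering posts by their similarity to the discovered topics) can be sketched as below. The abstract names neither the community detection algorithm nor the similarity measure, so this sketch swaps in deterministic weighted label propagation and Jaccard similarity; those choices, the `{word: {neighbour: weight}}` adjacency-dict input, and the helper names `label_propagation` and `assign_posts` are assumptions for illustration only.

```python
from collections import defaultdict


# Hypothetical stand-in for the paper's (unnamed) community detection method.
def label_propagation(graph, max_iter=100):
    """Deterministic weighted label propagation on {node: {nbr: weight}}.

    Each node starts with its own label and repeatedly adopts the label with
    the largest total edge weight among its neighbours (ties -> smallest
    label), until no label changes or `max_iter` sweeps have run."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            scores = defaultdict(float)
            for nbr, weight in graph[node].items():
                scores[labels[nbr]] += weight
            if not scores:
                continue  # an isolated node keeps its own label
            best = max(scores.values())
            new_label = min(lab for lab, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    # Each community (a set of words) is treated as one topic.
    return sorted(communities.values(), key=sorted)


# Hypothetical helper: Jaccard similarity is an assumed choice of measure.
def assign_posts(posts, topics):
    """Assign each tokenised post to the index of the topic with the highest
    Jaccard similarity; posts sharing no word with any topic get None."""
    assignments = []
    for post in posts:
        words = set(post)
        best_idx, best_sim = None, 0.0
        for idx, topic in enumerate(topics):
            sim = len(words & topic) / len(words | topic)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        assignments.append(best_idx)
    return assignments


if __name__ == "__main__":
    # Toy word network: two triangles of co-occurring words.
    graph = {
        "earthquake": {"rescue": 3, "sichuan": 3},
        "rescue": {"earthquake": 3, "sichuan": 3},
        "sichuan": {"earthquake": 3, "rescue": 3},
        "final": {"goal": 2, "team": 2},
        "goal": {"final": 2, "team": 2},
        "team": {"final": 2, "goal": 2},
    }
    topics = label_propagation(graph)
    posts = [["earthquake", "rescue", "donate"], ["goal", "team"], ["weather"]]
    print(assign_posts(posts, topics))  # [0, 1, None]
```

    Label propagation is used here only because it is short and needs no external library; a modularity-based method such as Louvain could take its place without changing `assign_posts`.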

Cite this article

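Li Wei, Jia Caiyan. Micro-blog Topic Detection Method Based on Word Co-occurrence Networks[J]. Journal of Data Acquisition and Processing, 2018, 33(1): 186-194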

History
  • Published online: 2018-04-09