Semantic Orientation Identification for Terms From Chinese Micro-blogs Based on Word Affinity Measure
    Abstract:

    Accurately identifying the semantic orientation of terms and building a high-quality sentiment dictionary are important for improving the accuracy of sentiment analysis on Micro-blog text. Traditional corpus-based methods are sensitive to the choice of seed words and cannot effectively identify the semantic orientation of low-frequency terms. This paper proposes an algorithm based on a word affinity measure to identify the semantic orientation of terms from Chinese Micro-blogs. First, candidate words are extracted with part-of-speech combination patterns. Second, Micro-blog emoticons are selected as seed words, and a word affinity network is built. Then, low-frequency words are expanded with the synonym thesaurus Tongyici Cilin (同义词词林) while the semantic orientation similarity between candidate words and seed words is computed. Finally, the semantic orientation of each word is determined by comparing this similarity against a preset threshold. Experiments on a corpus of two million Micro-blogs compare the proposed algorithm with traditional algorithms, and the results show that the proposed algorithm outperforms them.
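    The abstract names the steps but not the exact affinity formula, so the following Python sketch only illustrates the overall scoring flow under stated assumptions. Smoothed pointwise mutual information (PMI) over post-level co-occurrence stands in for the paper's word affinity measure; the emoticon seed tokens `[哈哈]` and `[泪]`, the `synonyms` map (a stand-in for Tongyici Cilin), and the `min_freq` and `tau` values are all hypothetical. Candidate extraction with part-of-speech patterns is assumed to have already produced the words being scored.

```python
import math
from collections import Counter
from itertools import combinations

# ASSUMPTION: hypothetical Weibo-style emoticon tokens used as seed words.
POS_SEEDS = ["[哈哈]"]
NEG_SEEDS = ["[泪]"]


def build_counts(posts):
    """Count, per post, how often each token and each unordered token pair occurs."""
    freq, pair = Counter(), Counter()
    for tokens in posts:
        uniq = sorted(set(tokens))
        freq.update(uniq)
        pair.update(combinations(uniq, 2))
    return freq, pair, len(posts)


def affinity(a, b, counts):
    """ASSUMPTION: smoothed PMI stands in for the paper's word affinity measure."""
    freq, pair, n = counts
    joint = pair[tuple(sorted((a, b)))]
    return math.log2((joint + 0.5) * n / ((freq[a] + 0.5) * (freq[b] + 0.5)))


def orientation(word, counts):
    """Mean affinity to positive seeds minus mean affinity to negative seeds."""
    pos = sum(affinity(word, s, counts) for s in POS_SEEDS) / len(POS_SEEDS)
    neg = sum(affinity(word, s, counts) for s in NEG_SEEDS) / len(NEG_SEEDS)
    return pos - neg


def classify(word, counts, synonyms, min_freq=2, tau=0.5):
    """Label a candidate word; low-frequency words borrow scores from synonyms."""
    freq = counts[0]
    evidence = [word]
    if freq[word] < min_freq:
        # Hypothetical synonym map standing in for Tongyici Cilin expansion:
        # use only synonyms that are frequent enough to have reliable counts.
        frequent = [s for s in synonyms.get(word, []) if freq[s] >= min_freq]
        evidence = frequent or [word]
    score = sum(orientation(w, counts) for w in evidence) / len(evidence)
    if score > tau:
        return "positive"
    if score < -tau:
        return "negative"
    return "neutral"


# Toy, already-segmented posts (illustrative only, not data from the paper).
posts = [
    ["开心", "[哈哈]"], ["开心", "[哈哈]"], ["开心", "[哈哈]"],
    ["难过", "[泪]"], ["难过", "[泪]"], ["难过", "[泪]"],
    ["欢欣", "今天"], ["今天", "天气"],
]
counts = build_counts(posts)
synonyms = {"欢欣": ["开心"]}

for w in ["开心", "难过", "欢欣", "天气"]:
    print(w, classify(w, counts, synonyms))
```

    On this toy corpus, 开心 is labeled positive, 难过 negative, and 天气 neutral; 欢欣 occurs only once, so its label is borrowed from its frequent synonym 开心. Averaging over frequent synonyms is just one simple way to realize the expansion step, and the paper may combine synonym evidence differently.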

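Cite this article

TANG Haohao, WANG Bo, ZHOU Jie, CHEN Dong, LIU Shaoyu. Semantic Orientation Identification for Terms From Chinese Micro-blogs Based on Word Affinity Measure[J]. Journal of Data Acquisition and Processing, 2015, 30(1): 137-147.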

History
  • Published online: 2015-03-03