Vietnamese Multi-category Words Disambiguation Combined with Language Features
Author: Guo Jianyi, Zhao Chen, Liu Yanchao, Mao Cunli, Yu Zhengtao
Affiliation:

1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, China; 2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, 650500, China

Fund Project:

Supported by the Key Program of the National Natural Science Foundation of China (61732005) and the National Natural Science Foundation of China (61262041, 61562052, 61662041).

    Abstract:

    Ambiguity in multi-category words directly affects the accuracy of part-of-speech (POS) tagging. This paper proposes a disambiguation method for Vietnamese multi-category words that incorporates the linguistic characteristics of the language. First, a Vietnamese multi-category word dictionary and a multi-category word corpus are built, and an effective feature set is selected by analyzing the linguistic characteristics of Vietnamese and the properties of its multi-category words. Second, exploiting the ability of conditional random fields (CRFs) to incorporate arbitrary features, the method introduces syntactic-constituent features and indicator-word features in addition to word and POS context information, yielding the disambiguation model. Finally, experiments on the multi-category word corpus give an accuracy of 87.23%. The results show that the proposed method is effective and feasible and can improve the accuracy of POS tagging.
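
    The feature design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the indicator-word list, the tag and chunk labels, the feature names, and the example phrase are all assumptions chosen for illustration. Each token is turned into a feature dictionary combining word and POS context, a syntactic-constituent label, and indicator-word flags, which is the per-token input shape that CRF toolkits commonly accept.

```python
# Hypothetical indicator words (tense and plural markers); the paper's
# actual list is not given on this page.
INDICATOR_WORDS = {"đã", "đang", "sẽ", "những", "các"}


def token_features(sentence, i):
    """Build a feature dict for the token at position i.

    `sentence` is a list of (word, pos, chunk) triples. For a
    multi-category word, `pos` holds its candidate tags (e.g. "N|V");
    `chunk` is an assumed syntactic-constituent label.
    """
    word, pos, chunk = sentence[i]
    feats = {
        "word": word.lower(),
        "pos": pos,
        "chunk": chunk,
        "is_indicator": word.lower() in INDICATOR_WORDS,
    }
    # Context window of one token on each side.
    for offset in (-1, 1):
        j = i + offset
        if 0 <= j < len(sentence):
            w, p, _ = sentence[j]
            feats[f"word[{offset:+d}]"] = w.lower()
            feats[f"pos[{offset:+d}]"] = p
            feats[f"indicator[{offset:+d}]"] = w.lower() in INDICATOR_WORDS
        else:
            feats[f"pad[{offset:+d}]"] = True  # sentence boundary
    return feats


def sentence_features(sentence):
    """Feature dicts for every token in the sentence."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Illustrative phrase with assumed tags: the plural marker before
# "quyết_định" (N or V) hints at the noun reading.
example = [("những", "L", "NP"), ("quyết_định", "N|V", "NP"), ("này", "P", "NP")]
feats = sentence_features(example)
```

    In this sketch the ambiguous token carries its candidate tag set as a feature, and the gold tag for that token would be supplied separately as the label sequence when training a CRF.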

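Cite this article

Guo Jianyi, Zhao Chen, Liu Yanchao, Mao Cunli, Yu Zhengtao. Vietnamese Multi-category Words Disambiguation Combined with Language Features[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2019, 34(4): 577-584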

History
  • Received: 2017-10-31
  • Last revised: 2019-06-28
  • Accepted:
  • Published online: 2019-09-01