Vietnamese Multi-category Words Disambiguation Combined with Language Features
Affiliation:

1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, China; 2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, 650500, China

Clc Number:

TP391


    Abstract:

    Disambiguating multi-category words directly affects the accuracy of part-of-speech (POS) tagging. This paper proposes a statistical disambiguation method that incorporates the linguistic characteristics of Vietnamese multi-category words. First, a Vietnamese multi-category word dictionary and a Vietnamese multi-category word corpus are built, and effective feature sets are selected by analyzing the Vietnamese language and its multi-category words. Second, taking advantage of the ability of the conditional random field (CRF) model to incorporate arbitrary features, syntactic and lexical features are introduced in addition to word and POS features, and the disambiguation model is built. Finally, the model is tested on a real multi-category word corpus, where it achieves an accuracy of 87.23%. The experimental results show that the proposed Vietnamese multi-category word disambiguation model is effective and feasible and can improve the accuracy of POS tagging.
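
    The feature-based setup described in the abstract can be illustrated with a minimal sketch of how word and POS context around an ambiguous token might be turned into a feature dictionary of the kind CRF toolkits commonly accept. This is not the authors' feature set: the dictionary entry, the tag names ("N", "V", "P"), the window size, and the function names are all assumptions chosen for illustration, and the syntactic and lexical features mentioned in the abstract are not reproduced here.

```python
from typing import Dict, List, Set

# Hypothetical multi-category dictionary: word -> candidate POS tags.
# The entry and the tag names are illustrative assumptions only.
MULTI_CATEGORY_DICT: Dict[str, Set[str]] = {
    "đá": {"N", "V"},
}


def token_features(words: List[str], pos_tags: List[str], i: int) -> Dict[str, str]:
    """Build a feature dict for position i from a one-token context window."""
    candidates = MULTI_CATEGORY_DICT.get(words[i], set())
    feats = {
        "word": words[i],
        "is_multi_category": str(words[i] in MULTI_CATEGORY_DICT),
        "candidates": "|".join(sorted(candidates)),
    }
    if i > 0:
        # Left context: previous word and its POS tag.
        feats["prev_word"] = words[i - 1]
        feats["prev_pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = "True"  # beginning of sentence
    if i < len(words) - 1:
        # Right context: next word and its POS tag.
        feats["next_word"] = words[i + 1]
        feats["next_pos"] = pos_tags[i + 1]
    else:
        feats["EOS"] = "True"  # end of sentence
    return feats


def sentence_features(words: List[str], pos_tags: List[str]) -> List[Dict[str, str]]:
    """Feature dicts for every token in a sentence."""
    return [token_features(words, pos_tags, i) for i in range(len(words))]


# "?" marks the ambiguous token whose tag is to be predicted.
example = sentence_features(["Nó", "đá", "bóng"], ["P", "?", "N"])
```

    Each dictionary in `example` describes one token. A sequence model would then be trained on such feature sequences paired with gold POS tags, and richer features can be added as extra keys without changing the overall structure.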

Get Citation

Guo Jianyi, Zhao Chen, Liu Yanchao, Mao Cunli, Yu Zhengtao. Vietnamese Multi-category Words Disambiguation Combined with Language Features[J].,2019,34(4):577-584.

History
  • Received: October 31, 2017
  • Revised: June 28, 2019
  • Online: September 01, 2019