Failing to identify multiword expressions (MWEs) can cause serious problems for many natural language processing (NLP) tasks. Because MWE-annotated Chinese corpora are lacking, a semi-supervised method is used to extract Chinese MWEs. The DE-Tri-Training semi-supervised clustering algorithm uses supervised information at the start of clustering and obtains good results. Two improvements are proposed: a method for selecting the initial cluster centers based on head-word expansion, and a consistency-based collaborative-learning data depuration (data editing) method based on supervised information. Together they introduce supervised information into the middle and late stages of clustering, so that the classifiers are trained on correctly labeled data. Comparative experiments show that the improved DE-Tri-Training algorithm extracts Chinese multiword expressions better than the unimproved algorithm, which verifies the effectiveness of the improvement.
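As a rough illustration of the collaborative-learning idea behind tri-training, the sketch below shows only the generic agreement step: an unlabeled item is proposed as training data for one classifier when the other two classifiers assign it the same label. It is not the improved DE-Tri-Training algorithm described above, and it omits the data-editing and cluster-center selection steps. The function names and the `"MWE"` / `"O"` labels are assumptions made for this example.

```python
from typing import Hashable, List, Sequence, Tuple


def agreed_pseudo_labels(
    preds_j: Sequence[Hashable],
    preds_k: Sequence[Hashable],
) -> List[Tuple[int, Hashable]]:
    """Return (index, label) pairs for items on which both peer classifiers agree."""
    if len(preds_j) != len(preds_k):
        raise ValueError("prediction lists must have the same length")
    return [(i, a) for i, (a, b) in enumerate(zip(preds_j, preds_k)) if a == b]


def tri_training_round(
    predictions: Sequence[Sequence[Hashable]],
) -> List[List[Tuple[int, Hashable]]]:
    """Collect pseudo-labeled candidates for each of three classifiers.

    predictions[i] holds classifier i's labels over the same unlabeled pool.
    Entry i of the result lists the items that the other two classifiers
    agree on, i.e. the candidates for retraining classifier i.
    """
    if len(predictions) != 3:
        raise ValueError("tri-training uses exactly three classifiers")
    proposals = []
    for i in range(3):
        j, k = [m for m in range(3) if m != i]  # the two peers of classifier i
        proposals.append(agreed_pseudo_labels(predictions[j], predictions[k]))
    return proposals


# Hypothetical predictions of three classifiers on three candidate word sequences.
example_predictions = [
    ["MWE", "O", "MWE"],
    ["MWE", "MWE", "O"],
    ["O", "MWE", "O"],
]
example_proposals = tri_training_round(example_predictions)
```

In a full system, each classifier would be retrained on its labeled data plus its proposed items, and a data-editing step would filter out proposals that are likely to be mislabeled before retraining.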
Liang Yinghong, Tan Hongye, Xian Xuefeng, Huang Dandan, Qian Haizhong, Shen Chunze. Chinese Multi-word Expression Extraction Based Improved DE-Tri-Training Algorithm[J].,2017,32(1):141-148.