Quality Phrase Mining Method Based on Statistic Features
Affiliation:

1. College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang, 050024, China
2. Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Hebei Normal University, Shijiazhuang, 050024, China
3. Key Laboratory of Network & Information Security, Hebei Normal University, Shijiazhuang, 050024, China
4. College of Information Engineering, Hebei GEO University, Shijiazhuang, 050031, China
5. School of Mathematical Sciences, Hebei Normal University, Shijiazhuang, 050024, China

CLC Number:

TP391

    Abstract:

    Quality phrase mining extracts meaningful phrases from a text corpus and underpins tasks such as document summarization and information retrieval. However, existing unsupervised phrase mining methods suffer from low-quality candidate phrases and from weighting the features of quality phrases equally. To address these problems, a quality phrase mining method based on statistic features is proposed. The method combines frequent N-gram mining, combinatorial constraints on multi-word phrases, and spell checking to ensure the quality of candidate phrases. A public knowledge base is introduced to label the candidate phrases, and these labels are used to assign the feature weights of quality phrases. A penalty factor is set to adjust the weight ratios, accounting for the mutual influence between features. Quality phrases are then extracted according to the score that the feature weighting function gives each candidate phrase. Experimental results show that the proposed method significantly improves the precision of phrase mining. Compared with the best existing unsupervised phrase mining methods, precision, recall, and F1-score improve by 5.97%, 1.77%, and 4.02%, respectively.
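    The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the feature set, the weights, or the form of the penalty. The function names, the feature names (`freq`, `cohesion`), the `min_support` threshold, and the pairwise-product penalty are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def frequent_ngrams(docs, max_n=3, min_support=2):
    """Count contiguous word n-grams (n = 1..max_n) across all documents
    and keep those that occur at least min_support times."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenization (assumption)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_support}


def score_phrase(features, weights, penalty=0.1):
    """Hypothetical scoring function: a weighted sum of feature values,
    reduced by a penalty on pairwise feature products so that features
    that are high together are not fully double-counted.
    The penalty form is an assumption, not taken from the paper."""
    weighted = sum(weights[name] * value for name, value in features.items())
    interaction = sum(a * b for a, b in combinations(features.values(), 2))
    return weighted - penalty * interaction


docs = [
    "support vector machine is a model",
    "a support vector machine works",
    "the machine works",
]
candidates = frequent_ngrams(docs, max_n=3, min_support=2)
example_score = score_phrase(
    {"freq": 0.5, "cohesion": 0.5},   # illustrative feature values
    {"freq": 0.5, "cohesion": 0.5},   # illustrative weights
    penalty=0.2,
)
```

    In a fuller version, candidates would also pass through the multi-word combinatorial constraints and spell checking mentioned in the abstract before scoring, and the weights would come from the knowledge-base labels rather than being set by hand.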

Get Citation

YANG Huanhuan, ZHAO Shuliang, LI Wenbin, WU Yongliang, TIAN Guoqiang. Quality Phrase Mining Method Based on Statistic Features[J].,2020,35(3):458-473.

History
  • Received: September 19, 2019
  • Revised: December 11, 2019
  • Online: May 25, 2020