Short Text Clustering Based on Feature Word Embedding

    Abstract:

    To address the poor clustering performance on short texts caused by sparse features and the rapid turnover of short texts on the internet, this paper proposes a short text clustering algorithm based on feature word embedding. First, a feature word extraction formula based on part-of-speech (POS) and word-length weighting is defined and used to extract the feature words that represent each short text. Second, word embeddings that capture the semantics of the feature words are obtained by training a continuous skip-gram model on a large-scale corpus. Finally, the word mover's distance is introduced to measure the similarity between short texts and is applied in a hierarchical clustering algorithm to cluster the short texts. Evaluation results on four test datasets show that the proposed algorithm is significantly superior to traditional clustering algorithms, with a mean F value 58.97% higher than the second-best result.
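
    The last two steps of the pipeline can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the feature words have already been extracted (the POS and length weighting formula is not given in the abstract), uses hand-made two-dimensional vectors in place of trained skip-gram embeddings, and substitutes the relaxed word mover's distance (a cheap lower bound on the full word mover's distance, with uniform word weights) for the exact optimal-transport computation. The names `EMB`, `relaxed_wmd`, and `cluster` are hypothetical.

```python
import math
from itertools import combinations

# Toy 2-d embeddings (made-up values for illustration only; the paper
# trains a skip-gram model on a large corpus instead).
EMB = {
    "phone": (1.0, 0.0), "mobile": (0.9, 0.1), "battery": (0.8, 0.3),
    "soccer": (0.0, 1.0), "football": (0.1, 0.9), "match": (0.2, 0.8),
}


def relaxed_wmd(a, b):
    """Relaxed word mover's distance between two lists of feature words.

    Each word moves all of its (uniform) weight to the nearest word of the
    other text; the larger of the two directions is returned. This is a
    lower bound on the exact word mover's distance, not the exact value.
    """
    def one_way(src, dst):
        return sum(min(math.dist(EMB[w], EMB[v]) for v in dst) for w in src) / len(src)

    return max(one_way(a, b), one_way(b, a))


def cluster(docs, k):
    """Average-linkage agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        total = sum(relaxed_wmd(docs[i], docs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the closest pair of clusters; x < y, so deleting y is safe.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Each short text is represented only by its (pre-extracted) feature words.
docs = [
    ["phone", "battery"],
    ["mobile", "phone"],
    ["soccer", "match"],
    ["football", "match"],
]
result = cluster(docs, 2)
print(result)  # [[0, 1], [2, 3]]
```

    The sketch stops merging at a fixed cluster count `k`, which is one common stopping rule for hierarchical clustering; the abstract does not state which rule or linkage the authors use.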

Get Citation

Liu Xin, She Xiandong, Tang Yongwang, Wang Bo. Short Text Clustering Based on Feature Word Embedding[J].,2017,32(5):1052-1060.

History
  • Online: April 10, 2018