Short Text Clustering Based on Feature Word Embedding

doi:10.16337/j.1004-9037.2017.05.023

Abstract:

To address the poor clustering performance on short texts caused by sparse features and the rapid turnover of short text on the internet, this paper proposes a short text clustering algorithm based on feature word embedding. First, a feature word extraction formula based on part-of-speech (POS) and word length weighting is defined and used to extract the feature words that represent each short text. Second, word embeddings that capture the semantics of the feature words are obtained by training a continuous skip-gram model on a large-scale corpus. Finally, word mover's distance is introduced to compute the similarity between short texts and is applied in a hierarchical clustering algorithm to cluster them. Evaluation on four test datasets shows that the proposed algorithm significantly outperforms traditional clustering algorithms, with a mean F value 58.97% higher than the second-best result.
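
The last step of the pipeline, word mover's distance feeding a hierarchical clustering, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the 2-d vectors in `EMB` are hand-picked stand-ins for trained skip-gram embeddings, every token carries equal weight, average linkage is chosen arbitrarily (the abstract does not name a linkage criterion), and the names `wmd` and `agglomerative` are hypothetical. Feature word extraction is omitted because the abstract does not give the POS and length weighting formula. The distance is computed exactly as a small transportation problem, solved here by successive shortest paths.

```python
import math
from itertools import combinations

# Hypothetical hand-picked 2-d vectors, for illustration only; in the paper's
# setting these would be skip-gram embeddings trained on a large corpus.
EMB = {
    "football": (1.0, 0.1), "soccer": (0.9, 0.2), "match": (0.8, 0.3), "goal": (0.9, 0.0),
    "stock": (0.1, 1.0), "market": (0.2, 0.9), "shares": (0.0, 0.9), "price": (0.3, 0.8),
}

def wmd(doc_a, doc_b):
    """Word mover's distance between two token lists, assuming uniform token weights."""
    n, m = len(doc_a), len(doc_b)
    # Masses are scaled to integers: each token of A ships m units, each token
    # of B receives n units, so n * m units move in total.
    # Nodes: 0 = source, 1..n = tokens of A, n+1..n+m = tokens of B, last = sink.
    size, src, snk = n + m + 2, 0, n + m + 1
    graph = [[] for _ in range(size)]  # edge = [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i, wa in enumerate(doc_a):
        add_edge(src, 1 + i, m, 0.0)
        for j, wb in enumerate(doc_b):
            add_edge(1 + i, 1 + n + j, n * m, math.dist(EMB[wa], EMB[wb]))
    for j in range(m):
        add_edge(1 + n + j, snk, n, 0.0)

    total_cost, remaining = 0.0, n * m
    while remaining > 0:
        # Bellman-Ford shortest path in the residual graph (reverse edges are negative).
        dist = [math.inf] * size
        prev = [None] * size
        dist[src] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + cost, (u, k), True
            if not changed:
                break
        # Find the bottleneck capacity along the path, then push that much flow.
        push, v = remaining, snk
        while v != src:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = snk
        while v != src:
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        total_cost += push * dist[snk]
        remaining -= push
    return total_cost / (n * m)  # undo the integer scaling of the masses

def agglomerative(docs, k):
    """Average-linkage agglomerative clustering on pairwise WMD until k clusters remain."""
    d = {(i, j): wmd(docs[i], docs[j]) for i, j in combinations(range(len(docs)), 2)}
    clusters = [[i] for i in range(len(docs))]

    def linkage(a, b):
        return sum(d[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] += clusters.pop(y)  # x < y, so index x is unaffected by the pop
    return clusters

docs = [
    ["football", "match", "goal"],
    ["soccer", "goal"],
    ["stock", "market", "price"],
    ["shares", "price"],
]
clusters = agglomerative(docs, 2)
```

On this toy input the two sports texts and the two finance texts share no words within each pair except one, yet the embedding-based distance should still place each pair close together, which is the motivation the abstract gives for using word embeddings on sparse short texts. The Bellman-Ford solver is only practical for very short texts; a dedicated optimal transport or linear programming solver would be the usual choice at scale.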

Liu Xin, She Xiandong, Tang Yongwang, Wang Bo. Short Text Clustering Based on Feature Word Embedding[J]. Journal of Data Acquisition and Processing, 2017, 32(5): 1052-1060.

Online: April 10, 2018