Multi Label Text Categorization Algorithm Based on Topic Model PLSA

    Abstract:

    In multi-label text classification, the relationships among labels are usually obscure and the feature dimension is too high. To solve these problems, a multi-label text categorization algorithm is proposed: the multi-label algorithm of hypothesis reuse based on probabilistic latent semantic analysis (PLSA). First, the training samples are mapped to a latent semantic space by the PLSA model, and each text is represented by its topic distribution, which removes noise interference and significantly reduces the data dimension. Then, the multi-label algorithm of hypothesis reuse (MAHR) is used to classify the samples. Because the features obtained from PLSA dimension reduction carry semantic information, the relationships among labels can be obtained accurately when training the base classifiers, and human-introduced defects are thus avoided. Experimental results demonstrate that the proposed method makes full use of the semantic information produced by PLSA dimension reduction and improves the performance of multi-label text classification.
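
The first stage described in the abstract, mapping each text to a topic distribution with PLSA, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `plsa_topic_features`, its parameters, and the toy count matrix are assumptions made for illustration, and the fit uses the standard EM updates for PLSA without tempering or convergence checks. The MAHR classification stage is not shown, since the abstract does not give its details.

```python
import random


def normalise(row):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(row)
    if total == 0.0:
        # Fall back to a uniform distribution for an all-zero row.
        return [1.0 / len(row)] * len(row)
    return [value / total for value in row]


def plsa_topic_features(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-term count matrix (hypothetical helper).

    counts: list of rows, counts[d][w] = occurrences of word w in document d.
    Returns (doc_topic, topic_word), i.e. P(z|d) and P(w|z), as nested lists.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    # Random initialisation; each row is a probability distribution.
    doc_topic = [normalise([rng.random() + 1e-3 for _ in range(n_topics)])
                 for _ in range(n_docs)]
    topic_word = [normalise([rng.random() + 1e-3 for _ in range(n_words)])
                  for _ in range(n_topics)]

    for _ in range(n_iter):
        new_doc_topic = [[0.0] * n_topics for _ in range(n_docs)]
        new_topic_word = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [doc_topic[d][z] * topic_word[z][w]
                         for z in range(n_topics)]
                norm = sum(joint)
                if norm == 0.0:
                    continue
                for z in range(n_topics):
                    # Expected count of (d, w) pairs assigned to topic z.
                    expected = n_dw * joint[z] / norm
                    new_doc_topic[d][z] += expected
                    new_topic_word[z][w] += expected
        # M-step: renormalise the expected counts into distributions.
        doc_topic = [normalise(row) for row in new_doc_topic]
        topic_word = [normalise(row) for row in new_topic_word]

    return doc_topic, topic_word


# Toy corpus (assumed data): 4 documents over a 6-word vocabulary.
counts = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 2, 3, 1],
    [0, 0, 1, 3, 2, 2],
]
theta, phi = plsa_topic_features(counts, n_topics=2)
```

Here each row of `theta` is a two-dimensional topic distribution that replaces the six-dimensional count vector of the corresponding document. In a pipeline like the one the abstract describes, these low-dimensional rows would be the input features for the multi-label classifier.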

Get Citation

Jiang Mingchu, Pan Zhisong, You Jun. Multi Label Text Categorization Algorithm Based on Topic Model PLSA[J].,2016,31(3):541-547.

History
  • Online: June 24, 2016