Abstract: In multi-label text classification, the relationships among labels are often obscure and the feature dimension is too high. To address these problems, a multi-label text categorization algorithm is proposed that combines the multi-label algorithm of hypothesis reuse (MAHR) with probabilistic latent semantic analysis (PLSA). First, the training samples are mapped to a latent semantic space by the PLSA model, and each text is represented by its topic distribution, which removes noise interference and significantly reduces the data dimension. Then, MAHR is used to classify the samples. Because the features obtained from PLSA dimension reduction carry semantic information, the relationships among labels can be obtained accurately when training the base classifiers, and defects introduced by manual intervention are thus avoided. Experimental results demonstrate that the proposed method makes full use of the semantic information provided by PLSA dimension reduction and improves the performance of multi-label text classification.
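As an illustration of the first stage only, the following is a minimal sketch of how a PLSA model might map documents to topic distributions using the standard EM updates. The function name `plsa`, its parameters, and the toy count matrix are assumptions made for this example and are not taken from the paper; the MAHR classification stage is not reproduced here.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-term count matrix (list of lists).

    Returns (p_z_d, p_w_z): P(topic | document) with one row per document,
    and P(word | topic) with one row per topic.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        # Scale a non-negative row to sum to 1; fall back to uniform if empty.
        total = sum(row)
        if total > 0:
            return [v / total for v in row]
        return [1.0 / len(row)] * len(row)

    # Random positive initialization of both distributions.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    expected = counts[d][w] * post[z]
                    new_z_d[d][z] += expected
                    new_w_z[z][w] += expected
        # M-step: renormalize the expected counts into distributions.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


# Toy corpus (assumed data): 4 documents over a 4-word vocabulary.
counts = [
    [4, 3, 0, 0],
    [3, 5, 0, 0],
    [0, 0, 4, 2],
    [0, 0, 3, 5],
]
doc_topics, topic_words = plsa(counts, n_topics=2)
```

Each row of `doc_topics` is a low-dimensional topic distribution for one document. In a pipeline like the one described, such rows would serve as the input features for the multi-label classifier in place of the raw term counts.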