Abstract: Traditional text classification methods assume that feature words in the training set and the test set follow the same probability distribution. In practical applications, however, the two distributions often deviate from each other, which can degrade the final classification results. To solve this problem, a feature transfer learning algorithm for text categorization is proposed. By calculating the transfer volume of feature words and amending the vector space model of the training set accordingly, the algorithm reconciles the distribution of feature words between the training set and the test set. Experiments on Chinese spam filtering and web page classification data sets demonstrate that the proposed method can eliminate the dissimilarity between the feature word distributions and clearly improves the evaluation metrics of test-set classification.
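The abstract does not define the transfer volume. The sketch below illustrates the general idea of reconciling feature-word distributions under an assumption: P(term) is estimated as document frequency, and a hypothetical per-term "transfer volume" is taken to be the smoothed ratio P_test(t) / P_train(t). This ratio is a standard importance-weighting choice for distribution shift. It is not necessarily the paper's formula. The training term-frequency vectors are then scaled by that ratio. The names `term_distribution`, `transfer_weights`, `amend_vectors`, and `smoothing` are illustrative only.

```python
from collections import Counter


def term_distribution(docs):
    """Estimate P(term) as the fraction of documents containing the term."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc))  # count each term at most once per document
    n = len(docs)
    return {t: c / n for t, c in counts.items()}


def transfer_weights(train_docs, test_docs, smoothing=1e-3):
    """Hypothetical 'transfer volume': smoothed ratio P_test(t) / P_train(t).

    Assumption for illustration only; the paper's exact definition is not given
    in the abstract.
    """
    p_train = term_distribution(train_docs)
    p_test = term_distribution(test_docs)
    return {
        t: (p_test.get(t, 0.0) + smoothing) / (p + smoothing)
        for t, p in p_train.items()
    }


def amend_vectors(train_docs, weights):
    """Scale each training document's raw term-frequency vector by per-term weights."""
    amended = []
    for doc in train_docs:
        tf = Counter(doc)
        amended.append({t: f * weights.get(t, 1.0) for t, f in tf.items()})
    return amended


if __name__ == "__main__":
    train = [["free", "money"], ["meeting", "agenda"]]
    test = [["free", "offer"], ["free", "money"]]
    w = transfer_weights(train, test)
    print(amend_vectors(train, w))
```

With this weighting, terms that are more common in the test set than in the training set, such as "free" in the toy data, are up-weighted. Terms that are absent from the test set, such as "meeting", are pushed toward zero. The training vectors therefore resemble the test distribution more closely before a classifier is trained on them.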