Abstract: Text segmentation has important applications in many fields, including text summarization and information retrieval. Topic models are an important tool for text segmentation. However, previous segmentation methods based on topic models generally require the number of topics to be set manually, and this setting significantly influences the results. To solve this problem, a novel text segmentation method based on the hierarchical Dirichlet process (HDP) model is proposed. First, texts are modeled with the HDP model to obtain their representations as topic vectors. Then, the topic vectors are passed to the C99 segmentation algorithm to segment the text. Finally, two optimization strategies are applied to refine the segmentation results. Experimental results show that the proposed method removes the need to set the number of topics manually and improves text segmentation performance.
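
The second step of the pipeline, segmenting from topic vectors, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes each sentence already has a topic vector (the hard-coded vectors stand in for HDP output), and it uses a simplified C99-style divisive procedure that omits the rank transform of the full C99 algorithm and takes the number of segments as an input. The names `cosine`, `density`, and `segment` are hypothetical, and the two optimization strategies are not specified in the abstract, so they are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(sim, bounds):
    """Inside density: summed within-segment similarity over summed segment area.

    `bounds` is a sorted list of segment edges, e.g. [0, 3, 6].
    """
    total, area = 0.0, 0
    for start, end in zip(bounds, bounds[1:]):
        total += sum(sim[i][j] for i in range(start, end) for j in range(start, end))
        area += (end - start) ** 2
    return total / area


def segment(vectors, n_segments):
    """Greedy divisive segmentation (simplified, C99-style, no rank transform).

    Repeatedly adds the boundary that maximizes the inside density until
    `n_segments` segments exist. Returns the internal boundary indices.
    """
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    n = len(vectors)
    bounds = [0, n]
    while len(bounds) - 1 < n_segments:
        candidates = [b for b in range(1, n) if b not in bounds]
        if not candidates:
            break
        best = max(candidates, key=lambda b: density(sim, sorted(bounds + [b])))
        bounds = sorted(bounds + [best])
    return bounds[1:-1]


# Hypothetical topic vectors for six sentences: the first three lean toward
# topic 0, the last three toward topic 1.
topic_vectors = [
    [0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
    [0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
]
boundaries = segment(topic_vectors, n_segments=2)
```

Here a boundary index `b` means that a new segment starts at sentence `b`. Because HDP infers the number of topics from the data, the dimensionality of the topic vectors does not need to be fixed in advance; the segmentation step only needs vectors of equal length.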