Abstract:To deal with the speech segmentation of TV-drama which has large coherent text transcriptions but no time-stamps, an automatic semi-supervised speech segmentation algorithm is proposed in the paper. Firstly, the original text transcriptions are used to build a biased language model, then the model is applied to the TV-drama speech recognition in a semi-supervised way, and finally, the resulting automatic speech decoding hypothesis are well combined with the traditional segmentation methods to improve the performances of speech segmentation. These traditional methods are usually based on the distance metric, model classification and the phone recognizers. Experimental results on the British TV-drama “Doctor Who” database demonstrate that, the proposed approach can achieve significant performance improvement over traditional baseline algorithms. Meanwhile, the proposed approach allows high quality segmentation and the associated transcription alignments for the large coherent TV-drama speech recordings.