Abstract: A unit selection speech synthesis method based on automatic synthetic error detection is presented. The aim is to design a unit selection criterion that is consistent with listeners' subjective perception and thereby improves the naturalness of synthetic speech. First, instead of relying on linguistic experts as in the traditional approach, a crowdsourcing platform is used to collect large amounts of perceptual data efficiently. Then, a synthetic error detector based on a support vector machine (SVM) classifier is constructed from the subjective evaluation data, using speech features such as syllable duration, unit cost, and acoustic parameter distance. During synthesis, the N-best unit selection results produced by a conventional unit selection algorithm are rescored by the trained error detector, and the optimal result is selected. Preference test results show that the proposed method effectively improves the naturalness of synthetic speech.
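The N-best rescoring step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a linear SVM whose weights `w` and bias `b` are already trained, treats a positive decision value as a predicted synthesis error for a syllable, and ranks each N-best candidate by its number of flagged syllables, breaking ties by the original N-best rank. The names `Candidate`, `syllable_features`, `count_errors`, and `rescore_nbest` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """One N-best unit sequence (hypothetical structure)."""
    units: List[str]  # IDs of the selected units
    # One feature vector per syllable, assumed to be
    # [syllable duration, unit cost, acoustic parameter distance].
    syllable_features: List[List[float]]


def svm_decision(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Linear SVM decision value w . x + b (weights assumed pre-trained)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def count_errors(features: List[List[float]], w: Sequence[float], b: float) -> int:
    """Count syllables flagged as errors (decision value > 0, by assumption)."""
    return sum(1 for x in features if svm_decision(x, w, b) > 0)


def rescore_nbest(nbest: List[Candidate], w: Sequence[float], b: float) -> int:
    """Return the index of the candidate with the fewest predicted errors.

    Ties keep the original N-best order, so the conventional unit selection
    ranking acts as the tie-breaker.
    """
    return min(
        range(len(nbest)),
        key=lambda i: (count_errors(nbest[i].syllable_features, w, b), i),
    )


if __name__ == "__main__":
    # Toy weights and features, chosen only for illustration.
    w, b = [0.5, 1.0, 2.0], -3.0
    nbest = [
        Candidate(["a1", "b3"], [[1.0, 1.0, 1.0], [2.0, 1.0, 0.5]]),
        Candidate(["a2", "b1"], [[1.0, 0.5, 0.5], [1.0, 0.8, 0.6]]),
    ]
    print(rescore_nbest(nbest, w, b))  # prints 1
```

Counting flagged syllables is only one plausible rescoring rule; summing the decision values, or combining them with the original unit selection cost, would be equally reasonable choices under these assumptions.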