Abstract: State-of-the-art voice conversion algorithms are computationally expensive and time consuming, so they cannot run efficiently on embedded systems. A voice conversion method based on mixture mapping of codebooks is proposed. In the training stage, different codebook mapping relationships are built according to the amount of training speech, which saves training time and improves conversion accuracy. In the transformation stage, the system converts the vocal tract parameters of voiced frames according to the corresponding codebook mapping built in the training stage. In addition, to improve the quality of the converted speech, the system also converts the feature parameters of unvoiced frames and corrects the formant frequency to suppress formant jitter between frames. Both objective and subjective experiments show that the proposed method reduces computational complexity and saves training time without degrading the quality of the converted speech.
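The abstract does not specify the mixture-mapping details, but the basic idea of codebook-mapping voice conversion can be illustrated with a toy sketch: cluster time-aligned source frames into a source codebook, pair each source codeword with the mean of the corresponding target frames, and at conversion time replace each incoming frame's parameters with the target codeword paired to its nearest source codeword. All names and parameters below (`build_mapped_codebooks`, `convert_frame`, `n_codewords`) are hypothetical and do not come from the paper.

```python
import numpy as np

def build_mapped_codebooks(src_frames, tgt_frames, n_codewords=4, seed=0):
    """Toy paired-codebook training (not the paper's mixture mapping):
    k-means on source frames, then map each source codeword to the mean
    of the target frames whose time-aligned source frames fall in that
    cluster. src_frames and tgt_frames are (N, D) aligned arrays."""
    rng = np.random.default_rng(seed)
    # initialize codewords from randomly chosen source frames
    idx = rng.choice(len(src_frames), n_codewords, replace=False)
    src_cb = src_frames[idx].copy()
    labels = np.zeros(len(src_frames), dtype=int)
    for _ in range(10):  # a few k-means iterations
        dists = ((src_frames[:, None, :] - src_cb[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        for k in range(n_codewords):
            if np.any(labels == k):
                src_cb[k] = src_frames[labels == k].mean(axis=0)
    # target codebook: per-cluster mean of the aligned target frames
    tgt_cb = np.stack([
        tgt_frames[labels == k].mean(axis=0) if np.any(labels == k) else src_cb[k]
        for k in range(n_codewords)])
    return src_cb, tgt_cb

def convert_frame(frame, src_cb, tgt_cb):
    """Replace a source frame with the target codeword mapped to its
    nearest source codeword (hard one-to-one codebook mapping)."""
    k = int(np.argmin(((src_cb - frame) ** 2).sum(axis=1)))
    return tgt_cb[k]
```

A mixture-mapping scheme would soften the hard nearest-codeword decision (e.g. by weighting several codewords), which reduces the quantization discontinuities between frames that motivate the formant correction mentioned above.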