Abstract: Voice conversion is a technique for transforming the speaker-specific characteristics of a source speaker's voice into those of a target speaker while preserving the original semantic content. An adaptive particle swarm optimization (PSO) based method is proposed for training a radial basis function (RBF) neural network that captures the spectral envelope mapping function between speakers. In addition, the pitch transformation is captured by modeling pitch jointly with the spectral feature parameters in the RBF neural network, so that the converted pitch retains more detail of the target speaker. Finally, the performance of the improved voice conversion system is evaluated with both subjective and objective tests. Experimental results show that the proposed method outperforms a Gaussian mixture model (GMM) based system, especially for male-to-female conversion.
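As a rough illustration of the training idea summarized above, the following minimal sketch uses PSO to fit the centers, widths, and output weights of a small Gaussian RBF network. It is not the system evaluated in this work. The abstract does not specify the adaptation rule, so a linearly decreasing inertia weight is assumed here, and the one-dimensional regression target is a hypothetical stand-in for the spectral envelope mapping. All function names, parameter values, and the velocity limit are illustrative assumptions.

```python
import math
import random


def rbf_predict(params, x, n_centers):
    """Evaluate a 1-D Gaussian RBF network.

    params is laid out as: centers, widths, output weights, then one bias.
    """
    centers = params[:n_centers]
    widths = params[n_centers:2 * n_centers]
    weights = params[2 * n_centers:3 * n_centers]
    bias = params[3 * n_centers]
    out = bias
    for c, s, w in zip(centers, widths, weights):
        sigma = abs(s) + 1e-3  # keep the width strictly positive
        out += w * math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))
    return out


def mse(params, data, n_centers):
    """Mean squared error of the network over (x, y) pairs."""
    total = sum((rbf_predict(params, x, n_centers) - y) ** 2 for x, y in data)
    return total / len(data)


def pso_train_rbf(data, n_centers=4, n_particles=20, n_iters=150, seed=0):
    """Fit all RBF parameters with PSO; returns (best, initial_err, final_err)."""
    rng = random.Random(seed)
    dim = 3 * n_centers + 1
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_err = [mse(p, data, n_centers) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_err[i])
    gbest, gbest_err = pbest[g][:], pbest_err[g]
    initial_err = gbest_err

    c1 = c2 = 1.5   # cognitive and social coefficients (assumed values)
    v_max = 1.0     # velocity clamp (assumed value)
    for it in range(n_iters):
        # Assumed adaptation rule: inertia decreases linearly from 0.9 to 0.4.
        w = 0.9 - 0.5 * it / max(1, n_iters - 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] += vel[i][d]
            err = mse(pos[i], data, n_centers)
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = pos[i][:], err
                if err < gbest_err:
                    gbest, gbest_err = pos[i][:], err
    return gbest, initial_err, gbest_err


if __name__ == "__main__":
    # Hypothetical toy target standing in for a source-to-target feature mapping.
    toy = [(k / 10.0, math.sin(3.0 * k / 10.0)) for k in range(-10, 11)]
    best, err_before, err_after = pso_train_rbf(toy)
    print("MSE before:", err_before, "MSE after:", err_after)
```

In this sketch each particle encodes a complete network, so the swarm searches over centers, widths, and weights simultaneously. A real conversion system would operate on multi-dimensional spectral features, and, following the joint modeling described above, would append pitch to the feature vector.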