Abstract:Speech recognition system based on the hybrid language model has the advantage of recognizing the out-of-vocabulary (OOV) words, but the recognition accuracy of the OOVs is far below that of the in-vocabulary (IV) words. To further improve the performance of hybrid speech recognition, a system combination method based on complementary acoustic models is proposed in this paper. Firstly, two hybrid speech recognition systems based on hidden Markov model and deep neural network (HMM-DNN) are set up by using different acoustic modeling unites. Aiming at the relevance of these two recognition tasks, the thought of multi-task learning (MTL) is then used to share the input and hidden layers of DNN and improve the modeling accuracy by joint training. Finally, the outputs of two systems are combined with recognizer output voting error reduction (ROVER). Experimental results show that the MTL-DNN modeling method can obtain better recognition performance than the single-task learning DNN(STL-DNN) and the combining of the two systems can further reduce the final word error rates(WER).