Expressive Speech Synthesis Method Based on Tacotron Model and Prosodic Correction
Authors:

ZHANG Xin, HU Hangye, CAO Xinyi, WANG Wei

Affiliation:

College of Education Science, Nanjing Normal University, Nanjing 210097, China

Fund Project:

National Philosophy and Social Science Foundation of China (BCA150054).

    Abstract:

    Speech synthesis technology is maturing. To improve the quality of synthesized emotional speech, this study proposes a method that combines end-to-end emotional speech synthesis with prosodic correction: the prosodic parameters of emotional speech synthesized by a Tacotron model are modified to strengthen the system's emotional expressiveness. The Tacotron model is first trained on a large neutral corpus and then trained further on a small emotional corpus so that it can synthesize emotional speech. Next, the Praat acoustic analysis tool is used to analyze the prosodic features of the emotional speech in the corpus and to summarize how the parameters behave under different emotional states. Finally, these regularities are used to correct the fundamental frequency, duration, and energy of the corresponding emotional speech synthesized by Tacotron, making the emotional expression more accurate. Results of an objective emotion recognition experiment and a subjective evaluation show that the method can synthesize emotional speech that is fairly natural and more expressive.
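
The prosodic correction step described in the abstract can be illustrated with a minimal sketch. It assumes that frame-level fundamental frequency (F0) and energy contours have already been extracted from the synthesized speech (for example with Praat), and that each emotion has one set of scaling factors. The names `ProsodyRule` and `correct_prosody` and the numeric factors are hypothetical placeholders, not values or interfaces from the paper, and the sketch only transforms contours; it does not resynthesize audio.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ProsodyRule:
    """Hypothetical per-emotion scaling factors (placeholders, not from the paper)."""
    f0_scale: float        # multiplies F0 of voiced frames
    duration_scale: float  # > 1 lengthens the utterance, < 1 shortens it
    energy_scale: float    # multiplies frame energy


def correct_prosody(
    f0: List[float], energy: List[float], rule: ProsodyRule
) -> Tuple[List[float], List[float]]:
    """Apply a rule to frame-level F0 (Hz, 0 = unvoiced) and energy contours."""
    if len(f0) != len(energy):
        raise ValueError("f0 and energy must have the same number of frames")
    if rule.duration_scale <= 0:
        raise ValueError("duration_scale must be positive")
    n = len(f0)
    if n == 0:
        return [], []
    # Duration change: repeat or drop frames (nearest-frame resampling).
    new_len = max(1, int(n * rule.duration_scale + 0.5))
    new_f0: List[float] = []
    new_energy: List[float] = []
    for i in range(new_len):
        src = min(int(i / rule.duration_scale), n - 1)
        # Unvoiced frames (F0 == 0) stay unvoiced; voiced frames are scaled.
        new_f0.append(f0[src] * rule.f0_scale if f0[src] > 0 else 0.0)
        new_energy.append(energy[src] * rule.energy_scale)
    return new_f0, new_energy


# Illustrative factors only; real values would come from corpus analysis.
EXAMPLE_RULE = ProsodyRule(f0_scale=1.5, duration_scale=2.0, energy_scale=2.0)
f0_in = [0.0, 200.0, 220.0, 0.0]
energy_in = [0.1, 0.5, 0.6, 0.1]
f0_out, energy_out = correct_prosody(f0_in, energy_in, EXAMPLE_RULE)
```

Nearest-frame resampling is used here so that voiced and unvoiced frames are never blended; a real system would more likely apply the modified contours through a resynthesis method such as overlap-add, which this sketch leaves out.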

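Cite this article

ZHANG Xin, HU Hangye, CAO Xinyi, WANG Wei. Expressive Speech Synthesis Method Based on Tacotron Model and Prosodic Correction[J]. Journal of Data Acquisition and Processing, 2022, 37(4): 909-916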

History
  • Received: 2021-07-23
  • Last revised: 2021-10-27
  • Published online: 2022-07-25