Chinese Core Journal
Chinese Science and Technology Paper Statistical Source Journal
ISSN: 1004-9037
CN: 32-1367/TN
Contact Information
  • Supervised by: China Association for Science and Technology
  • Sponsored by: Nanjing University of Aeronautics and Astronautics
  •               Chinese Institute of Electronics
  • ISSN: 1004-9037
  • CN: 32-1367/TN
  • Address: No. 29 Yudao Street, Nanjing
  • Tel: 025-84892742
  • Fax: 025-84892742
  • E-mail: sjcj@nuaa.edu.cn
  • Postcode: 210016
Jia Ning, Zheng Chunjun. An acoustic wave equation emotion recognition model based on image saliency[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2021, 36(5): 1062-1072.
An Acoustic Wave Equation Emotion Recognition Model Based on Image Saliency
Received: 2021-05-24    Revised: 2021-09-11
DOI:10.16337/j.1004-9037.2021.05.021
Keywords: speech emotion recognition (SER); image saliency gated recurrent acoustic wave equation emotion recognition (ISGR-AWEER); image saliency; acoustic wave equation; gated recurrent; multi-media emotional speech corpus
Author affiliations:
Jia Ning — School of Software, Dalian Neusoft University of Information, Dalian 116023
Zheng Chunjun — School of Software, Dalian Neusoft University of Information, Dalian 116023
Chinese abstract (translated):
      Speech emotion recognition (SER) is key to computers understanding human emotion and is an important component of human-computer interaction. When emotional speech signals propagate through different media, the recognition accuracy obtained with deep learning models is low, and the models transfer poorly across media. To address this, an image saliency gated recurrent acoustic wave equation emotion recognition (ISGR-AWEER) model is designed, composed of an image saliency extraction stage and a gated-recurrent acoustic wave model. The former simulates an attention mechanism and extracts the regions of speech that effectively express emotion; the latter is an acoustic wave equation emotion recognition model that mimics the flow of a recurrent neural network, effectively improving cross-media SER accuracy while enabling rapid cross-media model transfer. Experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and a self-built multi-media emotional speech corpus verify the model's effectiveness: compared with a traditional recurrent neural network, emotion recognition accuracy improves by 25%, and the model shows strong cross-media transfer ability.
English abstract:
      Speech emotion recognition (SER) is the key to computers understanding human emotion, and it is also important in human-computer interaction. When an emotional speech signal propagates through different media, the recognition accuracy of traditional deep learning models is not high enough, and their transfer ability is weak. Here, an acoustic wave equation emotion recognition model, i.e., the image saliency gated recurrent acoustic wave equation emotion recognition (ISGR-AWEER) model, is designed. The model is composed of an image saliency extraction stage and a gated recurrent acoustic wave model. The first part simulates the attention mechanism and is used to extract the salient regions in speech. The second part is an acoustic wave equation emotion recognition model that simulates a recurrent neural network; it can effectively improve the accuracy of cross-media SER and can quickly realize cross-media model transfer. The effectiveness of the model is verified by experiments on the interactive emotional dyadic motion capture (IEMOCAP) corpus and a self-built multi-media emotional speech corpus. Compared with a recurrent neural network, emotion recognition accuracy is improved by 25%, and the model has a strong ability for cross-media transfer.
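The abstract describes a recurrent cell whose state update follows the acoustic wave equation, blended with a gate in the style of a GRU. The paper's exact formulation is not given on this page, so the following is only an illustrative sketch: it assumes a 1-D leapfrog finite-difference discretization of the wave equation along the hidden-feature axis, a sigmoid gate with hypothetical parameters `Wg`, `bg`, and an additive input forcing term.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def wave_recurrent_step(u_prev, u_curr, x_t, gate_params, c=0.5):
    """One hypothetical recurrent step driven by the discrete wave equation.

    u_prev, u_curr : hidden states at steps t-1 and t, shape [d]
    x_t            : input feature frame at step t, shape [d]
    c              : assumed wave propagation speed (hyperparameter)
    """
    Wg, bg = gate_params  # illustrative gate weights, shapes [d, d] and [d]
    # Second-order finite-difference Laplacian along the feature axis
    lap = np.roll(u_curr, 1) - 2.0 * u_curr + np.roll(u_curr, -1)
    # Leapfrog wave update: u_next = 2*u_t - u_{t-1} + c^2 * lap + forcing
    u_wave = 2.0 * u_curr - u_prev + c**2 * lap + x_t
    # GRU-style gate blends the wave update with the previous state
    g = sigmoid(Wg @ u_curr + bg)
    return g * u_wave + (1.0 - g) * u_curr

# Run the cell over a short random "speech feature" sequence
rng = np.random.default_rng(0)
d, T = 8, 16
Wg, bg = 0.1 * rng.standard_normal((d, d)), np.zeros(d)
u_prev, u_curr = np.zeros(d), np.zeros(d)
for _ in range(T):
    x_t = 0.01 * rng.standard_normal(d)
    u_prev, u_curr = u_curr, wave_recurrent_step(u_prev, u_curr, x_t, (Wg, bg))
```

With `c <= 1` the leapfrog update satisfies the CFL stability condition, and the gate further damps the state toward its previous value, so the recurrence stays bounded over the sequence.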

Copyright @2010-2015 Editorial Office of Journal of Data Acquisition and Processing (《数据采集与处理》)