An Acoustic Wave Equation Emotion Recognition Model Based on Image Saliency

doi:10.16337/j.1004-9037.2021.05.021

2021(5): 1062-1072.

Affiliation: School of Software, Dalian Neusoft University of Information, Dalian 116023, China

Abstract:

Speech emotion recognition (SER) is key to enabling computers to understand human emotion and is an important component of human-computer interaction. When emotional speech signals propagate through different media, deep learning models achieve low recognition accuracy and transfer poorly. To address this, an image saliency gated recurrent acoustic wave equation emotion recognition (ISGR-AWEER) model is designed. The model consists of an image saliency extraction stage and a gated recurrent acoustic wave model. The former imitates an attention mechanism and extracts the regions of the speech signal that effectively express emotion. The latter is an acoustic wave emotion recognition model that mimics the processing flow of a recurrent neural network; it effectively improves SER accuracy across media and allows the model to be transferred quickly between media. Experiments on the interactive emotional dyadic motion capture (IEMOCAP) emotion corpus and a self-built multi-medium emotional speech corpus verify the effectiveness of the model: compared with a conventional recurrent neural network, emotion recognition accuracy improves by 25%, and the model shows strong cross-media transfer ability.
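The abstract's remark that the acoustic wave model follows the flow of a recurrent neural network can be illustrated with a minimal sketch. It is not the ISGR-AWEER model itself, which the abstract does not specify. It only shows how a finite-difference step of the 1-D wave equation behaves like a recurrent update whose hidden state is the pair of the two most recent wave fields. The function names `wave_step` and `run_wave_recurrence`, the grid size, the source and probe positions, and the step sizes are all assumptions chosen for illustration.

```python
from typing import List


def wave_step(u_now: List[float], u_prev: List[float], source: List[float],
              c: float = 1.0, dt: float = 0.5, dx: float = 1.0) -> List[float]:
    """One leapfrog step of the 1-D wave equation u_tt = c^2 * u_xx + f."""
    n = len(u_now)
    r2 = (c * dt / dx) ** 2      # squared Courant number; keep <= 1 for stability
    u_next = [0.0] * n           # both end cells stay clamped at zero
    for i in range(1, n - 1):
        laplacian = u_now[i - 1] - 2.0 * u_now[i] + u_now[i + 1]
        u_next[i] = (2.0 * u_now[i] - u_prev[i]
                     + r2 * laplacian + dt * dt * source[i])
    return u_next


def run_wave_recurrence(signal: List[float], n_cells: int = 16,
                        src: int = 4, probe: int = 12) -> List[float]:
    """Inject `signal` at cell `src` and record the field at cell `probe`.

    The pair (u_now, u_prev) plays the role of an RNN hidden state:
    each input sample updates it, and one cell is read out as the output.
    """
    u_prev = [0.0] * n_cells
    u_now = [0.0] * n_cells
    outputs = []
    for x_t in signal:
        source = [0.0] * n_cells
        source[src] = x_t                      # input enters as a point source
        u_next = wave_step(u_now, u_prev, source)
        u_prev, u_now = u_now, u_next          # shift the hidden state
        outputs.append(u_now[probe])           # readout at the probe cell
    return outputs


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 19
    print(run_wave_recurrence(impulse))
```

In this sketch the wave speed `c` is the quantity that depends on the propagation medium. In a trainable variant it would presumably be the learned parameter, but that is a guess rather than something the abstract states.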