An Acoustic Wave Equation Emotion Recognition Model Based on Image Saliency

Authors: Jia Ning, Zheng Chunjun

Affiliation: School of Software, Dalian Neusoft University of Information, Dalian 116023, China

Fund Project:

Abstract:

Speech emotion recognition (SER) is key to enabling computers to understand human emotion and is an important component of human-computer interaction. When an emotional speech signal propagates through different media, the recognition accuracy obtained with conventional deep learning models is low and the trained models transfer poorly. To address this, an image saliency gated recurrent acoustic wave equation emotion recognition (ISGR-AWEER) model is designed, composed of an image saliency extraction module and a gated-recurrent acoustic wave equation model. The former simulates an attention mechanism and extracts the regions of speech that effectively express emotion; the latter is an acoustic wave equation recognition model that mimics the computation flow of a recurrent neural network, which effectively improves the accuracy of cross-media SER and allows the model to be migrated quickly across media. The effectiveness of the model is verified by experiments on the interactive emotional dyadic motion capture (IEMOCAP) corpus and a self-built multi-media emotional speech corpus: compared with a traditional recurrent neural network, emotion recognition accuracy is improved by 25%, and the model shows strong cross-media transfer ability.
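The abstract does not reproduce the model equations, but the idea of treating wave propagation as a recurrent computation can be sketched with the standard second-order scalar acoustic wave equation and a common leapfrog finite-difference time discretization. The symbols below (wave field u, spatial wave-speed distribution c(x, y), source term f, time step \Delta t) are generic illustrations and are not taken from the paper, whose exact formulation may differ:

\[ \frac{\partial^{2} u}{\partial t^{2}} = c^{2}(x,y)\,\nabla^{2} u + f(t) \]

\[ u_{t+1} = 2u_{t} - u_{t-1} + \Delta t^{2}\, c^{2}(x,y)\,\nabla^{2} u_{t} + \Delta t^{2}\, f_{t} \]

Viewed this way, each time step maps a hidden state (u_t, u_{t-1}) and an input f_t to the next hidden state, structurally the same update pattern as a recurrent neural network cell; in wave-equation-based recognition models the trainable quantity is typically the spatial speed distribution c(x, y).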

Table 1 Experimental results of speech emotion recognition on the two emotional speech corpora
Table 2 UA comparison of popular SER models
Table 3 Experimental results of speech emotion recognition on the multi-media emotional speech corpus
Fig.1 Overall structure of the ISGR-AWEER model
Fig.2 Image representation of the angry and happy classes
Fig.3 Salient signal regions of the angry and happy classes
Fig.4 Structure of the acoustic wave equation model
Fig.5 Emotion recognition confusion matrix on the self-built corpus
Fig.6 Emotion recognition confusion matrix on IEMOCAP
Cite this article:

Jia Ning, Zheng Chunjun. An Acoustic Wave Equation Emotion Recognition Model Based on Image Saliency[J]. 数据采集与处理 (Journal of Data Acquisition and Processing), 2021, 36(5): 1062-1072.
History
  • Received: 2021-05-24
  • Revised: 2021-09-11
  • Accepted:
  • Published online: 2021-09-25