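Multi-feature Fusion Speech Emotion Recognition Based on Deep Residual Shrinkage Network

Affiliation:

College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Fund Project:

National Natural Science Foundation of China (U2033202, 52172387, U1333119).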

Abstract:

To address speaker variability in speech emotion recognition, the first-order and second-order differences of the spectral features are computed and, together with the static spectral features, form a three-channel feature set that is fed into a two-dimensional network. A baseline model is built by combining a convolutional neural network (CNN), a bidirectional long short-term memory (BiLSTM) network and an attention mechanism; a deep residual shrinkage network (DRSN) is then introduced to allocate channel weights in the two-dimensional network, further improving recognition accuracy. To improve what the model learns, two kinds of information fusion are compared: feature-level fusion (feature-vector addition, Add, and feature-vector concatenation, Concatenate) and decision-level fusion (average score, Average, and maximum score, Maximum). The results show that: (1) the Add strategy of feature-level fusion is the more effective fusion approach; (2) the proposed model achieves an unweighted average recall (UAR) of 84.93% on the CASIA database and 86.83% on the EMO-DB database. Compared with the baseline model, introducing the DRSN raises the UAR by 5.3% on CASIA and by 6.2% on EMO-DB.
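The abstract does not give the difference formula. A common choice, assumed here, is the regression-style delta used in HTK-like front ends, applied along the time axis of a (frequency x frame) spectral map such as a log-Mel spectrogram; stacking the static map with its first- and second-order deltas yields the three-channel input. The function names `delta` and `three_channel_features`, the half-window `n=2` and the edge-repeat boundary rule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def delta(feat: np.ndarray, n: int = 2) -> np.ndarray:
    """Regression-based difference along the time axis (axis 1).

    feat has shape (n_bins, n_frames). Edge frames are repeated so the
    output keeps the input shape (an assumed boundary rule).
    """
    n_frames = feat.shape[1]
    padded = np.pad(feat, ((0, 0), (n, n)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = np.zeros(feat.shape, dtype=float)
    for k in range(1, n + 1):
        ahead = padded[:, n + k : n + k + n_frames]   # frames t + k
        behind = padded[:, n - k : n - k + n_frames]  # frames t - k
        out += k * (ahead - behind)
    return out / denom


def three_channel_features(spec: np.ndarray, n: int = 2) -> np.ndarray:
    """Stack static features with their first- and second-order differences.

    Returns shape (3, n_bins, n_frames): a channels-first input for a 2-D CNN.
    """
    d1 = delta(spec, n)
    d2 = delta(d1, n)
    return np.stack([spec, d1, d2], axis=0)


# Hypothetical input: 40 frequency bins x 300 frames.
spec = np.random.default_rng(0).normal(size=(40, 300))
x = three_channel_features(spec)
print(x.shape)  # (3, 40, 300)
```

If a library routine is preferred, `librosa.feature.delta` computes a comparable difference with a Savitzky-Golay filter; either way the second-order channel is simply the difference of the first-order one.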
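The abstract states only that the DRSN allocates channel weights in the two-dimensional network. As an assumption, the sketch below follows the channel-wise residual shrinkage unit of the original DRSN design: a small sub-network maps each channel's mean absolute activation to a factor in (0, 1), the product becomes a per-channel threshold, and soft thresholding suppresses small, noise-like activations before the identity shortcut is added. It is a NumPy forward pass only; the convolutional branch, batch normalization and training are omitted, and all weights and shapes are hypothetical placeholders.

```python
import numpy as np


def soft_threshold(x: np.ndarray, tau) -> np.ndarray:
    """Shrink |x| by tau and zero out anything smaller than tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def channel_thresholds(feat, w1, b1, w2, b2):
    """Per-channel thresholds for a (C, H, W) feature map.

    abs_mean[c] is the global average of |feat[c]|; a two-layer FC network
    (batch norm omitted) with a sigmoid output yields alpha[c] in (0, 1),
    so each threshold stays below that channel's mean |x|.
    """
    abs_mean = np.abs(feat).mean(axis=(1, 2))            # (C,)
    hidden = np.maximum(w1 @ abs_mean + b1, 0.0)         # FC + ReLU
    alpha = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))    # FC + sigmoid
    return alpha * abs_mean                              # (C,)


def residual_shrinkage_unit(x, w1, b1, w2, b2, branch=None):
    """Identity shortcut plus a soft-thresholded residual branch.

    `branch` stands in for the unit's conv/BN/ReLU layers; it defaults to
    the identity here purely for illustration.
    """
    feat = x if branch is None else branch(x)
    tau = channel_thresholds(feat, w1, b1, w2, b2)
    return x + soft_threshold(feat, tau[:, None, None])


# Hypothetical shapes: 3 channels, 4 hidden units, 8 x 8 feature maps.
rng = np.random.default_rng(0)
C, hidden_units = 3, 4
w1, b1 = rng.normal(size=(hidden_units, C)), np.zeros(hidden_units)
w2, b2 = rng.normal(size=(C, hidden_units)), np.zeros(C)
x = rng.normal(size=(C, 8, 8))
y = residual_shrinkage_unit(x, w1, b1, w2, b2)
print(y.shape)  # (3, 8, 8)
```

Because each threshold is tied to its own channel's activation level, the unit effectively reweights channels: channels dominated by small activations are shrunk toward zero, while strongly activated channels pass through largely intact.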
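The two fusion levels compared in the abstract can be contrasted in a few lines. In this sketch, which is an assumption rather than the paper's code, feature-level fusion combines two branch embeddings before the classifier, either by element-wise addition (Add, which requires equal dimensions) or by concatenation (Concatenate, whose output length is the sum of the inputs), while decision-level fusion combines the class-probability vectors of separate classifiers by their average or element-wise maximum and then takes the arg-max. The function names `fuse_features` and `fuse_decisions` are hypothetical.

```python
import numpy as np


def fuse_features(a: np.ndarray, b: np.ndarray, mode: str) -> np.ndarray:
    """Feature-level fusion of two embeddings."""
    if mode == "add":
        return a + b                            # same size as each input
    if mode == "concat":
        return np.concatenate([a, b], axis=-1)  # sizes add up
    raise ValueError(f"unknown feature fusion mode: {mode}")


def fuse_decisions(probs: list, mode: str) -> np.ndarray:
    """Decision-level fusion of per-model class probabilities.

    Each element of `probs` has shape (..., n_classes); the result is the
    predicted class index for each sample.
    """
    stacked = np.stack(probs)                   # (n_models, ..., n_classes)
    if mode == "average":
        scores = stacked.mean(axis=0)
    elif mode == "max":
        scores = stacked.max(axis=0)
    else:
        raise ValueError(f"unknown decision fusion mode: {mode}")
    return scores.argmax(axis=-1)


# Two hypothetical models that disagree on a 3-class example.
p1 = np.array([0.50, 0.45, 0.05])
p2 = np.array([0.10, 0.45, 0.45])
print(fuse_decisions([p1, p2], "average"))  # 1  (mean = [0.30, 0.45, 0.25])
print(fuse_decisions([p1, p2], "max"))      # 0  (max  = [0.50, 0.45, 0.45])
```

The toy example shows that the two decision rules can disagree: averaging rewards a class both models find plausible, while the maximum follows whichever model is most confident. With Add the classifier input size stays unchanged, whereas Concatenate enlarges it.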
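UAR, the metric reported in the abstract, is the mean of the per-class recalls, so every emotion class counts equally regardless of how many utterances it contains. A minimal NumPy version follows; it averages over the classes that occur in `y_true`, which should agree with scikit-learn's `balanced_accuracy_score` in its default (non-adjusted) setting. The function name and the toy labels are illustrative.

```python
import numpy as np


def unweighted_average_recall(y_true, y_pred) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Toy, imbalanced labels: 4 utterances of class 0, 2 of class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
print(round(accuracy, 3))                         # 0.833
print(unweighted_average_recall(y_true, y_pred))  # 0.75
```

On this toy data plain accuracy (0.833) flatters the classifier because the majority class dominates, while UAR (0.75) exposes the 50% recall on the minority class; this is why UAR is the usual headline metric on emotion corpora with unbalanced class sizes.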

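Citation:

LI Ruihang, WU Honglan, SUN Youchao, WU Huacong. Multi-feature fusion speech emotion recognition based on deep residual shrinkage network[J]. Journal of Data Acquisition and Processing, 2022, 37(3): 542-554.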

History
  • Received: 2021-12-28
  • Last revised: 2022-03-25
  • Published online: 2022-06-13