Cross-Corpus Emotion Recognition Based on Deep Domain Adaptation and CNN Decision Tree
Authors:

Sun Linhui, Zhao Min, Wang Shun

Affiliation:

College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China

Fund Project:

National Natural Science Foundation of China (61901227); China Scholarship Council (202008320043)

Abstract:

In cross-corpus speech emotion recognition, the mismatch between target-domain and source-domain samples leads to poor emotion recognition performance. To improve cross-corpus performance, this paper proposes a cross-corpus speech emotion recognition method based on deep domain adaptation and a convolutional neural network (CNN) decision tree model. First, a local feature transfer learning network based on jointly constrained deep domain adaptation is constructed: by minimizing the joint discrepancy between the target and source domains in both the feature space and the Hilbert space, the network mines the correlation between the two corpora and learns transferable invariant features from the target domain to the source domain. Then, to reduce the classification error among easily confused emotions in the cross-corpus setting, a multi-level CNN decision tree classification model is built according to the degree of confusion between emotions, so that the emotions are first coarsely classified and then finely classified. The method is validated on three corpora: CASIA, EMO-DB, and RAVDESS. Experimental results show that the average recognition rate of the proposed method is 19.32%-31.08% higher than that of the CNN baseline, a substantial improvement in system performance.
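The two steps summarized above can be made concrete with a small sketch. The following is a minimal, illustrative PyTorch example, not the authors' implementation: a toy CNN feature extractor is trained with the usual source-domain cross-entropy loss plus two source/target discrepancy penalties, a Gaussian-kernel MMD term standing in for the Hilbert-space constraint and a mean-feature distance standing in for the feature-space constraint. The layer sizes, kernel bandwidth, and the trade-off weights alpha and beta are hypothetical.

```python
# Illustrative sketch only (assumptions: Gaussian-kernel MMD as the Hilbert-space term,
# mean-feature distance as the feature-space term, toy CNN architecture).
import torch
import torch.nn as nn


def gaussian_mmd(x_s: torch.Tensor, x_t: torch.Tensor, bandwidth: float = 1.0) -> torch.Tensor:
    """Squared MMD between source and target features under a Gaussian kernel."""
    def kernel(a, b):
        # Pairwise squared Euclidean distances -> Gaussian kernel matrix.
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2.0 * bandwidth ** 2))

    return kernel(x_s, x_s).mean() + kernel(x_t, x_t).mean() - 2.0 * kernel(x_s, x_t).mean()


class EmotionCNN(nn.Module):
    """Toy CNN feature extractor + classifier over 2-D spectrogram-like inputs."""
    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        f = self.features(x)
        return f, self.classifier(f)


def joint_adaptation_loss(model, x_src, y_src, x_tgt, alpha=1.0, beta=0.1):
    """Supervised loss on the source corpus plus a joint source/target discrepancy."""
    f_src, logits_src = model(x_src)
    f_tgt, _ = model(x_tgt)                       # target labels are never used
    cls_loss = nn.functional.cross_entropy(logits_src, y_src)
    hilbert_gap = gaussian_mmd(f_src, f_tgt)      # discrepancy in kernel (Hilbert) space
    feature_gap = (f_src.mean(0) - f_tgt.mean(0)).pow(2).sum()  # discrepancy in feature space
    return cls_loss + alpha * hilbert_gap + beta * feature_gap
```

Equally schematic, the coarse-to-fine stage can be viewed as a two-level cascade of such CNN classifiers: a coarse model assigns each utterance to an emotion group, and a group-specific fine model resolves the emotion within that group. The hand-written group_members grouping below is a placeholder; in the paper the grouping is derived from the confusion degree between emotions.

```python
# Illustrative two-stage routing; coarse_model and fine_models follow the EmotionCNN
# interface above (forward returns (features, logits)). group_members is hypothetical,
# e.g. [[0, 1], [2, 3, 4]] for two coarse groups over five emotion labels.
@torch.no_grad()
def coarse_to_fine_predict(x, coarse_model, fine_models, group_members):
    """Route each sample through a coarse classifier, then a group-specific fine classifier."""
    _, coarse_logits = coarse_model(x)
    group = coarse_logits.argmax(dim=1)           # predicted coarse group per sample
    preds = torch.empty(x.size(0), dtype=torch.long)
    for g, members in enumerate(group_members):
        idx = (group == g).nonzero(as_tuple=True)[0]
        if idx.numel() == 0:
            continue
        _, fine_logits = fine_models[g](x[idx])
        preds[idx] = torch.tensor(members)[fine_logits.argmax(dim=1)]  # map back to global labels
    return preds
```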

Cite this article:

Sun Linhui, Zhao Min, Wang Shun. Cross-corpus emotion recognition based on deep domain adaptation and CNN decision tree[J]. Journal of Data Acquisition and Processing, 2023, 38(3): 704-716.

History
  • Received: 2022-07-15
  • Revised: 2023-02-23
  • Accepted:
  • Published online: 2023-05-25