Multi-feature Fusion Speech Emotion Recognition Based on Deep Residual Shrinkage Network
Author: LI Ruihang, WU Honglan, SUN Youchao, WU Huacong
Affiliation: College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

CLC Number: TP391

    Abstract:

    To address speaker variability in speech emotion recognition, the first-order and second-order differences of the spectral features are computed to form a three-channel feature set, which is fed into a two-dimensional network. A baseline model is built by combining a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM) and an attention mechanism. A deep residual shrinkage network is then introduced to allocate channel weights within the two-dimensional network, further improving speech emotion recognition accuracy. To improve the model's learning, two kinds of information fusion are adopted: feature-layer fusion (Add and Concatenate) and decision-layer fusion (Average and Maximum). The results show that: (1) Add is the more effective of the two feature-layer fusion strategies; (2) the proposed model achieves an unweighted average recall (UAR) of 84.93% on the CASIA database and 86.83% on the EMO-DB database. Compared with the baseline model, introducing the deep residual shrinkage network increases the UAR on CASIA and EMO-DB by 5.3% and 6.2%, respectively.
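The abstract names several concrete operations: stacking spectral features with their first- and second-order differences into three channels, channel-wise shrinkage from the deep residual shrinkage network, Add/Concatenate feature fusion, Average/Maximum decision fusion, and the UAR metric. The NumPy sketch below illustrates each one under stated assumptions. It is not the authors' implementation, and all function names are hypothetical. The differences are plain frame-to-frame differences rather than a regression-based delta. The shrinkage step follows the channel-wise soft-thresholding formulation commonly described for residual shrinkage networks, with the learned scaling logits passed in directly instead of being produced by trained layers. The fusion axis is an arbitrary choice.

```python
import numpy as np


def three_channel_features(spec: np.ndarray) -> np.ndarray:
    """Stack a (freq, time) spectral feature with its first- and
    second-order differences along time -> array of shape (3, freq, time).
    Assumption: plain frame-to-frame differences; prepending the first
    frame keeps the time length unchanged (first difference column is 0)."""
    d1 = np.diff(spec, axis=1, prepend=spec[:, :1])
    d2 = np.diff(d1, axis=1, prepend=d1[:, :1])
    return np.stack([spec, d1, d2], axis=0)


def soft_threshold(x: np.ndarray, tau) -> np.ndarray:
    """Shrink values toward zero: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def channel_shrinkage(feat: np.ndarray, scale_logits: np.ndarray) -> np.ndarray:
    """Channel-wise shrinkage on a (C, F, T) feature map.
    Channel c gets threshold tau_c = sigmoid(logit_c) * mean(|x_c|).
    In a residual shrinkage block the logits come from small fully connected
    layers applied to the channel averages of |x|; here they are passed in
    directly (hypothetical simplification)."""
    abs_mean = np.abs(feat).mean(axis=(1, 2))         # (C,)
    alpha = 1.0 / (1.0 + np.exp(-scale_logits))       # (C,), each in (0, 1)
    tau = (alpha * abs_mean)[:, None, None]           # broadcast over F and T
    return soft_threshold(feat, tau)


def fuse_features(a: np.ndarray, b: np.ndarray, mode: str = "add") -> np.ndarray:
    """Feature-layer fusion: element-wise Add, or Concatenate on the last axis."""
    if mode == "add":
        return a + b
    if mode == "concat":
        return np.concatenate([a, b], axis=-1)
    raise ValueError(f"unknown feature fusion mode: {mode}")


def fuse_decisions(probs: list, mode: str = "average") -> np.ndarray:
    """Decision-layer fusion of per-model class probability vectors."""
    stacked = np.stack(probs, axis=0)
    if mode == "average":
        return stacked.mean(axis=0)
    if mode == "max":
        return stacked.max(axis=0)
    raise ValueError(f"unknown decision fusion mode: {mode}")


def unweighted_average_recall(y_true, y_pred, n_classes: int) -> float:
    """UAR: mean of per-class recalls, so every emotion class counts equally
    regardless of its sample count. Classes absent from y_true are skipped."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c)
               for c in range(n_classes) if np.any(y_true == c)]
    return float(np.mean(recalls))


if __name__ == "__main__":
    spec = np.arange(12, dtype=float).reshape(3, 4)   # toy 3-bin x 4-frame input
    x = three_channel_features(spec)
    print(x.shape)                                    # (3, 3, 4)
    print(channel_shrinkage(x, np.zeros(3)).shape)    # (3, 3, 4)
    print(unweighted_average_recall([0, 0, 1, 1, 1, 2], [0, 1, 1, 1, 0, 2], 3))  # ~0.722
```

Add keeps the fused feature dimension unchanged, while Concatenate doubles it. Average fusion blends the models' probability vectors, and Maximum keeps, for each class, the highest score any model assigns to it.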

Citation:

LI Ruihang, WU Honglan, SUN Youchao, WU Huacong. Multi-feature Fusion Speech Emotion Recognition Based on Deep Residual Shrinkage Network[J].,2022,37(3):542-554.

History
  • Received: December 28, 2021
  • Revised: March 25, 2022
  • Online: May 25, 2022