Using speech and text features fusion to improve speech emotion recognition
Author: Feng Yaqin, Shen Lingjie, Hu Tingting, et al.
Affiliation: School of Education Science, Nanjing Normal University

Fund Project: Constructing a situational evaluation environment for Chinese language proficiency with intelligent interaction technology

Abstract:

Emotion recognition is of great significance in human-computer interaction. The purpose of this study was to improve the accuracy of emotion recognition by fusing speech and text features. The speech features were acoustic features and phonological features, and the text features were traditional Bag-of-Words (BoW) features based on an emotion dictionary and an N-gram model. We used these features for emotion recognition and compared their performance on the IEMOCAP dataset. We also compared the effects of different feature fusion methods, including feature-layer fusion and decision-layer fusion. The experimental results show that the fusion of speech and text features outperforms any single feature set, and that decision-layer fusion of speech and text features outperforms feature-layer fusion. Furthermore, with a CNN classifier, the unweighted average recall (UAR) of decision-layer fusion with the three feature sets reaches 68.98%, surpassing the previous best results on the IEMOCAP dataset.
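
The following is a minimal sketch of the two fusion strategies compared in the abstract: feature-layer fusion (concatenate the feature vectors and train one classifier) and decision-layer fusion (train one classifier per feature set and combine their posterior probabilities). It assumes per-utterance acoustic, phonological, and BoW vectors have already been extracted; the scikit-learn LogisticRegression stands in for the paper's CNN, and all function and variable names are illustrative, not the authors' implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression

def feature_layer_fusion(feature_sets, labels):
    """Feature-layer fusion: concatenate all feature vectors, train a single classifier."""
    X = np.hstack(feature_sets)  # e.g. [acoustic | phonological | BoW]
    return LogisticRegression(max_iter=1000).fit(X, labels)

def decision_layer_fusion(feature_sets, labels):
    """Decision-layer fusion: one classifier per feature set, then average their posteriors."""
    clfs = [LogisticRegression(max_iter=1000).fit(X, labels) for X in feature_sets]

    def predict(test_sets):
        # Average the class posteriors of the per-feature classifiers and
        # return the emotion with the highest fused score.
        probs = np.mean([c.predict_proba(X) for c, X in zip(clfs, test_sets)], axis=0)
        return clfs[0].classes_[probs.argmax(axis=1)]

    return predict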

Get Citation

Feng Yaqin, Shen Lingjie, Hu Tingting, et al. Using speech and text features fusion to improve speech emotion recognition[J]. 2019, 34(4).

History
  • Received: January 21, 2018
  • Revised: April 04, 2018
  • Accepted: May 10, 2019
  • Online: August 09, 2019