Spoken Document Classification Based on Fusion of Acoustic Features and Deep Features

Authors: LIU Tan (刘谭), GUO Wu (郭武)

Affiliation: National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei 230027, China

    Abstract:

    Traditional spoken document classification systems usually operate on text transcribed by a speech recognition system, so recognition errors can severely degrade their performance. Although fusing speech with the recognized text can reduce the impact of recognition errors to some extent, most such fusion is performed at the level of representation vectors and does not take full advantage of the complementarity between acoustic and semantic information. This paper proposes a neural network spoken document classification system that fuses acoustic features and deep features. During training of the neural network, a trained acoustic model is first used to extract, for each spoken document, deep features that contain semantic information. The acoustic features and deep features of each spoken document are then fused frame by frame through a gating mechanism, and the fused features are used for spoken document classification. In experiments on a spoken news broadcast corpus, the proposed system clearly outperforms spoken document classification systems based on the fusion of speech and text, reaching a final classification accuracy of 97.27%.
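
The abstract names a gating mechanism that fuses acoustic features and deep features frame by frame, but it does not give the gate equations. The sketch below is one common way such a gate can be written, offered only as an illustration under stated assumptions: the gate form, the mean pooling over frames, the softmax output layer, all dimensions, and every name (`gated_fusion`, `classify`, `params`) are hypothetical and not taken from the paper. Random values stand in for real acoustic features and for the deep features that a trained acoustic model would produce.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, vec):
    """Return weights @ vec + bias for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def gated_fusion(acoustic, deep, params):
    """Fuse two frame-aligned feature sequences frame by frame.

    Assumed gate (an illustration, not the paper's formula):
        g_t     = sigmoid(W_g [a_t; d_t] + b_g)
        fused_t = g_t * (W_a a_t + b_a) + (1 - g_t) * (W_d d_t + b_d)
    """
    if len(acoustic) != len(deep):
        raise ValueError("both sequences must have the same number of frames")
    fused = []
    for a_t, d_t in zip(acoustic, deep):
        # The gate sees both feature types concatenated for this frame.
        gate = [sigmoid(z)
                for z in linear(params["W_g"], params["b_g"], a_t + d_t)]
        # Project both inputs to a shared dimension before mixing them.
        a_proj = linear(params["W_a"], params["b_a"], a_t)
        d_proj = linear(params["W_d"], params["b_d"], d_t)
        fused.append([g * a + (1.0 - g) * d
                      for g, a, d in zip(gate, a_proj, d_proj)])
    return fused


def classify(fused, w_out, b_out):
    """Mean-pool the fused frames, then apply a softmax layer over classes."""
    dim = len(fused[0])
    pooled = [sum(frame[i] for frame in fused) / len(fused) for i in range(dim)]
    logits = linear(w_out, b_out, pooled)
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy dimensions, chosen only for the demonstration.
ACOUSTIC_DIM, DEEP_DIM, HIDDEN_DIM, NUM_CLASSES, NUM_FRAMES = 4, 6, 5, 3, 10
rng = random.Random(0)

params = {
    "W_g": random_matrix(HIDDEN_DIM, ACOUSTIC_DIM + DEEP_DIM, rng),
    "b_g": [0.0] * HIDDEN_DIM,
    "W_a": random_matrix(HIDDEN_DIM, ACOUSTIC_DIM, rng),
    "b_a": [0.0] * HIDDEN_DIM,
    "W_d": random_matrix(HIDDEN_DIM, DEEP_DIM, rng),
    "b_d": [0.0] * HIDDEN_DIM,
}

# Placeholders: real inputs would be per-frame acoustic features and the
# per-frame outputs of a trained acoustic model.
acoustic = random_matrix(NUM_FRAMES, ACOUSTIC_DIM, rng)
deep = random_matrix(NUM_FRAMES, DEEP_DIM, rng)

fused = gated_fusion(acoustic, deep, params)
probs = classify(fused, random_matrix(NUM_CLASSES, HIDDEN_DIM, rng),
                 [0.0] * NUM_CLASSES)
print(len(fused), len(fused[0]), probs)
```

Because the gate is computed per frame and per dimension, the model can lean on the acoustic stream for some frames and on the deep features for others, which is the kind of complementarity the abstract argues vector-level fusion cannot exploit. A trained system would learn these weights jointly with the classifier rather than draw them at random.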

    Table 1 Results of different models
    Fig.1 Architecture of spoken document classification system based on fusion of speech and recognized text
    Fig.2 Architecture of spoken document classification system based on fusion of acoustic features and deep features
    Table 2 Results of ablation experiments
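Cite this article

LIU Tan (刘谭), GUO Wu (郭武). Spoken document classification based on fusion of acoustic features and deep features[J]. 数据采集与处理 (Journal of Data Acquisition and Processing), 2021, 36(5): 932-938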

History
  • Received: 2021-01-12
  • Last revised: 2021-07-12
  • Published online: 2021-09-25