Language Identification Method for Multi-task Learning Based on Contrastive Predictive Coding Model

Affiliation:

1. School of Mathematics and Big Data, Guizhou Education University, Guiyang 550018, China; 2. Big Data Science and Intelligent Engineering Research Institute, Guizhou Education University, Guiyang 550018, China; 3. School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen 518000, China

Fund Project:

Basic Research Program of the Guizhou Provincial Department of Science and Technology (黔科合基础-ZK〔2021〕一般334); Basic Research Program of the Guizhou Provincial Department of Education (黔科合基础〔2020〕1Y258); Guizhou Provincial Key Discipline "Computer Science and Technology" Project (ZDXK〔2018〕007号); Innovation Group Research Project of the Guizhou Provincial Department of Education (黔教合KY字〔2021〕022); Guizhou Province 2018 Third Batch of Provincial Service Industry Development Guidance Fund Project (黔发改服务〔2018〕1181号).

    Abstract:

    The key to language identification is extracting useful features from speech segments. A time-delay neural network (TDNN) can extract feature vectors that carry rich contextual information, which effectively improves system performance. This paper proposes a multi-task learning network for language identification that combines ECAPA (emphasized channel attention, propagation and aggregation)-TDNN with a contrastive predictive coding (CPC) model. ECAPA-TDNN serves as the backbone network and extracts global features of the speech. An improved CPC model serves as the auxiliary network and performs contrastive predictive learning on the frame-level features extracted by ECAPA-TDNN. The whole network is trained by optimizing a joint loss function. Experiments are conducted on the 10 languages of the AP17-OLR oriental language recognition challenge dataset. The results show that the proposed network achieves clearly higher identification accuracy than the baseline network on the 1 s, 3 s and full-length (All) test sets.
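The abstract states that the backbone and the auxiliary CPC network are trained with a joint loss, but it does not give the form of that loss. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the joint loss is a weighted sum of a cross-entropy classification loss and an InfoNCE-style contrastive loss (the loss commonly used with CPC). The function names, the dot-product scoring, and the `weight` parameter are all hypothetical.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def cross_entropy(logits, label):
    """Classification loss for one utterance (main task, assumed form)."""
    return -log_softmax(logits)[label]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce(prediction, positive, negatives):
    """InfoNCE-style loss (auxiliary task, assumed form).

    `prediction` is the predicted future frame feature, `positive` the
    true future frame feature, `negatives` frame features drawn elsewhere.
    The true frame is placed at index 0 of the score list.
    """
    scores = [dot(prediction, positive)]
    scores += [dot(prediction, n) for n in negatives]
    return -log_softmax(scores)[0]


def joint_loss(logits, label, prediction, positive, negatives, weight=0.5):
    """Weighted sum of both losses; `weight` is a hypothetical trade-off."""
    ce = cross_entropy(logits, label)
    nce = info_nce(prediction, positive, negatives)
    return ce + weight * nce
```

In this sketch the contrastive term shrinks as the predicted frame feature aligns better with the true future frame than with the negatives, so minimizing the sum pushes the shared frame-level features to be both discriminative for language labels and predictive of upcoming frames.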

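Cite this article

ZHAO Jianchuan, YANG Haoquan, XU Yong, WU Lian, CUI Zhongwei. Language identification method for multi-task learning based on contrastive predictive coding model[J]. Journal of Data Acquisition and Processing, 2022, 37(2): 288-297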

History
  • Received: 2022-01-17
  • Last revised: 2022-02-19
  • Accepted:
  • Published online: 2022-03-25