State of the Art and Prospects of Deep Learning-Based Speaker Verification

Affiliation:

School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

Fund Project:

National Natural Science Foundation of China (62376071).
Abstract:

With the continued development of deep learning, speaker verification has made considerable progress. Compared with other biometric recognition technologies, it can be operated remotely, is low in cost, and lends itself to human-computer interaction, giving it broad application prospects in fields such as public security and criminal investigation and financial services. This paper systematically reviews the development of deep learning-based speaker verification. First, it describes the development history and current state of deep learning-based speaker representation models in four respects: model input and structure, pooling layers, supervised loss functions, and self-supervised learning and pre-trained models. Second, it discusses the cross-domain mismatch problems that speaker verification faces in practical applications, such as noise interference, channel mismatch, and far-field speech, and outlines the corresponding domain adaptation and domain generalization methods. Finally, it points out directions for further research.

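Cite this article

LI Jianchen, HAN Jiqing. State of the art and prospects of deep learning-based speaker verification[J]. Journal of Data Acquisition and Processing, 2024, (5): 1062-1084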

History
  • Received: 2024-08-09
  • Last revised: 2024-09-13
  • Published online: 2024-10-14