Research Situation and Prospects of Multi-speaker Separation and Target Speaker Extraction
Authors: Bao Changchun, Yang Xue
Affiliation:

Institute of Speech and Audio Information Processing, School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China

Funding:

National Natural Science Foundation of China (61831019).

    Abstract:

    As a frontier technology in speech signal processing, speech separation has significant research value and broad application prospects. The signal picked up by a microphone typically contains speech from multiple speakers together with noise and reverberation. To improve the user's listening experience and the performance of back-end devices, speech separation must be applied to the mixed signal. Speech separation originated from the well-known cocktail party problem and aims to separate the speakers' speech signals from the mixed signal. In recent years, researchers have proposed a large number of speech separation methods that have significantly improved separation performance. This paper systematically reviews and summarizes these methods. First, according to whether auxiliary information about the target speaker is used, speech separation methods are divided into two categories: multi-speaker separation and target speaker extraction. Second, the two categories are each introduced in detail, progressing from conventional approaches to deep learning-based techniques. Finally, current challenges in speech separation are discussed and future research directions are outlined.

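Cite this article

Bao Changchun, Yang Xue. Research Situation and Prospects of Multi-speaker Separation and Target Speaker Extraction[J]. Journal of Data Acquisition and Processing, 2024, (5): 1044-1061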

History
  • Received: 2024-06-27
  • Last revised: 2024-08-15
  • Accepted:
  • Published online: 2024-10-14