An Integrated Diffusion-Model-Based Processing Method for Target Speech Extraction
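Affiliations:

1. Taiyuan Satellite Launch Center, China; 2. College of Computer Science and Technology, Xi'an Jiaotong University; 3. College of Computer Science and Technology, National University of Defense Technology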


An Integrated Diffusion-Based Framework for Target Speech Extraction
Affiliations:

1. Taiyuan Satellite Launch Center, China; 2. College of Computer Science and Technology, Xi'an Jiaotong University; 3. Taiyuan Satellite Launch Center, China

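    Abstract:

    This paper proposes a two-stage, deep-learning-based target speech separation method that improves single-channel target speech extraction. Discriminative models tend to over-optimize objective metrics during extraction, so their output contains unnatural artifacts and distortion and falls short in subjective listening quality. To address this, we introduce a diffusion model as a second-stage refinement module. Built on a stochastic differential equation (SDE) framework, the module regenerates and refines the discriminative model's preliminary output, lowering the false extraction rate while markedly improving the naturalness and the clarity of the harmonic structure of the target speech. Experiments on the WSJ0-2mix-extr dataset show an 11.18% gain (from 3.22 to 3.58) on NISQA, a metric that models subjective auditory perception, reflecting clearly improved perceived quality and naturalness. CMOS subjective listening tests further confirm the method's effectiveness in improving speech clarity and intelligibility.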
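    The relative NISQA gain quoted above follows directly from the two reported scores:

```latex
\Delta_{\mathrm{rel}} = \frac{3.58 - 3.22}{3.22} \times 100\% = \frac{0.36}{3.22} \times 100\% \approx 11.18\%
```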

    Abstract:

    This paper proposes a two-stage, deep-learning-based approach to target speech extraction that improves performance on single-channel mixtures. Discriminative models tend to over-optimize objective metrics during extraction, which introduces unnatural artifacts and distortion and degrades subjective listening quality. To address this, we add a diffusion model as a second-stage refinement module. Built on a stochastic differential equation (SDE) framework, this module regenerates and refines the discriminative model's preliminary output, reducing the false extraction rate while markedly improving the naturalness and harmonic-structure clarity of the target speech. On the WSJ0-2mix-extr dataset, the method raises the NISQA score, a metric that models human auditory perception, by 11.18% (from 3.22 to 3.58), indicating substantial gains in perceived speech quality and naturalness. Comparative mean opinion score (CMOS) listening tests further confirm its effectiveness in improving speech clarity and intelligibility.
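    The abstract does not specify the SDE, score network or sampler used in the second stage. As a minimal sketch under our own assumptions, the refinement can be read as reverse-time Euler-Maruyama sampling of an Ornstein-Uhlenbeck SDE dx = gamma(y - x)dt + sigma dw that drifts toward the stage-1 estimate y, in the style of score-based speech enhancement models such as SGMSE+. This is not the paper's stated design. `ou_moments`, `make_oracle_score`, `refine` and `mae` are hypothetical names, the toy signals are synthetic, and the "oracle" score (which knows the clean signal) stands in for a trained network s_theta(x_t, y, t) so the example runs without a model.

```python
import math
import random


def ou_moments(x0, y, t, gamma, sigma):
    """Mean and variance of x_t | x_0 for dx = gamma * (y - x) dt + sigma dw."""
    decay = math.exp(-gamma * t)
    mean = y + (x0 - y) * decay
    var = sigma ** 2 / (2.0 * gamma) * (1.0 - decay ** 2)
    return mean, var


def make_oracle_score(clean, gamma, sigma):
    """Hypothetical stand-in for a trained score network s_theta(x_t, y, t).

    Returns the exact score of p_t(x_t | x_0 = clean, y), which is only
    computable here because this toy knows the clean signal.
    """
    def score(x, y, t):
        out = []
        for xi, yi, ci in zip(x, y, clean):
            mean, var = ou_moments(ci, yi, t, gamma, sigma)
            out.append(-(xi - mean) / var)
        return out
    return score


def refine(y_est, score_fn, gamma=1.5, sigma=0.5, T=1.0, t_eps=0.03,
           n_steps=200, seed=0):
    """Hypothetical stage 2: reverse-time Euler-Maruyama sampling from T to t_eps."""
    rng = random.Random(seed)
    # The variance of x_T | x_0 does not depend on x_0 or y.
    _, var_T = ou_moments(0.0, 0.0, T, gamma, sigma)
    # Start from N(y_est, var_T): a noisy copy of the stage-1 estimate.
    x = [yi + math.sqrt(var_T) * rng.gauss(0.0, 1.0) for yi in y_est]
    dt = (T - t_eps) / n_steps
    t = T
    for _ in range(n_steps):
        s = score_fn(x, y_est, t)
        # x_{t-dt} = x_t - [f(x_t) - sigma^2 * score] dt + sigma * sqrt(dt) * z
        x = [
            xi - (gamma * (yi - xi) - sigma ** 2 * si) * dt
            + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for xi, yi, si in zip(x, y_est, s)
        ]
        t -= dt
    return x


def mae(a, b):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b)) / len(a)


# Toy example: a 64-sample "clean" tone and a stage-1 estimate with a
# synthetic high-frequency artifact (both hypothetical, not paper data).
N = 64
clean = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
stage1 = [c + 0.5 * math.sin(2 * math.pi * 17 * n / N) for n, c in enumerate(clean)]
refined = refine(stage1, make_oracle_score(clean, gamma=1.5, sigma=0.5))
print(f"stage-1 MAE: {mae(stage1, clean):.3f}, refined MAE: {mae(refined, clean):.3f}")
```

    Starting the reverse process from a noisy copy of y, with a drift that points toward y, is what makes this stage a refinement of the discriminative output rather than generation from pure noise. Because the oracle score is exact, the toy only demonstrates the sampler mechanics; with a learned score, output quality would depend on the network and the number of sampling steps.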

History
  • Received: 2025-03-11
  • Last revised: 2025-10-17
  • Accepted: 2025-11-06