Lightweight Speech Enhancement Based on Gated Hybrid Dilated Convolution

Affiliation:

School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China

Fund Project:

Jiangsu Provincial Major Science and Technology Project (No. BG2024027); National Natural Science Foundation of China (No. 61901227).

    Abstract:

    Mainstream speech enhancement models suffer from growing parameter counts and sharply rising computational complexity. To address this, this paper proposes a lightweight speech enhancement network based on gated hybrid dilated convolution. First, a gated hybrid dilated convolution module is designed that combines gated linear units with hybrid dilated convolution to extract multiscale features from the speech signal and to suppress noise-sensitive regions precisely, preserving both long-term and short-term speech characteristics while improving model robustness. Second, a hierarchical channel attention module is designed that, through hierarchical feature fusion, improves the capture of correlations among speech features along the channel dimension while keeping the parameter count low. Experiments on the VoiceBank+DEMAND dataset show that the proposed model, with only 0.41 M parameters, performs well on five metrics: perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and the composite measures of signal distortion (CSIG), background noise intrusiveness (CBAK), and overall quality (COVL). The model thus combines a lightweight design with good enhancement performance.

    Highlights:

    1. Propose a lightweight speech enhancement network based on gated hybrid dilated convolution.
    2. Integrate multiscale feature extraction, channel attention, and Ghost convolution for efficient feature modeling.
    3. Achieve a good balance between enhancement performance and model complexity on VoiceBank+DEMAND.
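The abstract does not give layer-level details of the gated hybrid dilated convolution module, so the following is only a minimal single-channel sketch of the general idea, not the authors' implementation. It assumes a feature branch that stacks three dilated convolutions with rates (1, 2, 5), a common hybrid-dilation choice, and a sigmoid gate branch in the style of a gated linear unit. The function names, kernel size, dilation rates, and the sequential stacking are all illustrative assumptions.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Same-length 1-D dilated convolution (cross-correlation) with zero padding.

    Assumes a single channel and an odd-length kernel.
    """
    half = (len(kernel) - 1) // 2 * dilation  # zero padding needed on each side
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - half + j * dilation
            if 0 <= idx < len(x):  # samples outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated convolutions with stride 1."""
    return 1 + (kernel_size - 1) * sum(dilations)


def gated_hybrid_dilated_block(x, feat_kernels, gate_kernel, dilations=(1, 2, 5)):
    """Hypothetical gated block: stacked dilated convs modulated by a sigmoid gate.

    Assumption: the gate is computed from the block input with an ordinary
    (dilation 1) convolution; the paper may compute it differently.
    """
    feat = x
    for kernel, d in zip(feat_kernels, dilations):
        feat = dilated_conv1d(feat, kernel, d)  # multiscale feature branch
    gate = [sigmoid(g) for g in dilated_conv1d(x, gate_kernel, 1)]
    return [f * g for f, g in zip(feat, gate)]  # GLU-style elementwise gating


# Usage: an impulse shows how far the stacked dilations spread context.
impulse = [0.0] * 21
impulse[10] = 1.0
ones = [1.0, 1.0, 1.0]
y = gated_hybrid_dilated_block(impulse, [ones, ones, ones], [0.0, 0.0, 0.0])
print(receptive_field(3, (1, 2, 5)), sum(1 for v in y if v != 0))
```

With kernel size 3, the three stacked dilations cover a contiguous window of 17 samples, which illustrates how mixed dilation rates enlarge temporal context without adding parameters. In a real network the gate values would be learned rather than fixed.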

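Cite this article

Sun Linhui, Wei Pengbin, Wang Chunyan, Ye Lei, Shao Xi. Lightweight speech enhancement based on gated hybrid dilated convolution[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2026, (3): 814-824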

History
  • Received: 2025-05-21
  • Last revised: 2025-08-30
  • Accepted:
  • Published online: 2026-06-10