Abstract: To address the growing parameter counts and computational cost of mainstream speech enhancement models, this paper proposes a lightweight speech enhancement network based on gated hybrid dilated convolution. First, a gated hybrid dilated convolution module is designed that integrates gated linear units with hybrid dilated convolution to extract multiscale features from speech signals and precisely suppress noise-sensitive regions, preserving both long-term and short-term speech characteristics while improving model robustness. Second, a hierarchical channel attention module is proposed that strengthens the capture of speech feature correlations along the channel dimension through hierarchical feature fusion while keeping the parameter count low. Experimental results on the VoiceBank+DEMAND dataset show that the proposed model, with only 0.41 million parameters, achieves competitive performance on the PESQ, STOI, CSIG, CBAK, and COVL metrics, combining a lightweight design with high enhancement quality.
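
The idea behind the gated hybrid dilated convolution module can be illustrated with a minimal single-channel sketch. It is not the proposed implementation: the abstract does not give kernel sizes, dilation rates, or the exact gating arrangement, so the dilation rates (1, 2, 5), the kernel size of 3, the gate form `feature * sigmoid(gate)`, and all function names below are assumptions chosen for illustration. A practical model would use multichannel layers in a deep learning framework.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Same-length 1-D dilated convolution with zero padding (odd kernel size)."""
    k = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            # Taps are spaced `dilation` samples apart around position t.
            idx = t + (i - k // 2) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_dilated_layer(x, feat_kernel, gate_kernel, dilation):
    """Gated linear unit: the feature branch is scaled by a sigmoid gate branch."""
    feat = dilated_conv1d(x, feat_kernel, dilation)
    gate = dilated_conv1d(x, gate_kernel, dilation)
    return [f * sigmoid(g) for f, g in zip(feat, gate)]


def gated_hybrid_dilated_block(x, feat_kernel, gate_kernel, dilations=(1, 2, 5)):
    """Stack gated layers with mixed dilation rates (assumed rates, see lead-in)."""
    y = x
    for d in dilations:
        y = gated_dilated_layer(y, feat_kernel, gate_kernel, d)
    return y


def receptive_field(kernel_size, dilations):
    """Receptive field, in samples, of stacked dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

Under these assumptions, `receptive_field(3, (1, 2, 5))` is 17 samples, whereas three undilated layers of the same kernel size would cover only 7. This is the sense in which mixed dilation rates capture both short-term and long-term context without adding parameters, while the sigmoid gate lets each layer attenuate positions it treats as noise-dominated.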