Multi-channel Linear Prediction for Speech Dereverberation Using Cross-Band Filters and Sparse Priors

doi:10.16337/j.1004-9037.2024.05.007

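Affiliations: 1. Digitalization Department, Open University of China, Beijing 100039, China; 2. Center for Machine Vision and Signal Analysis, University of Oulu, Oulu 90570, Finland; 3. Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
Funding: National Natural Science Foundation of China, General Program (62171438); Beijing Natural Science Foundation (4242013); Institute of Acoustics, Chinese Academy of Sciences, self-deployed "Frontier Exploration" project (QYTS202111); 2023 Key Scientific Research Project of the Open University of China (Z23C0007).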

Abstract:

Multi-channel linear prediction (MCLP) is one of the most popular speech dereverberation methods. Most existing studies adopt the band-to-band spectral subtraction model to obtain the desired speech signal independently in each frequency band, which neglects the interactions between different subbands. This paper proposes an MCLP-based speech dereverberation method that uses a cross-band spectral subtraction model instead of the widely adopted band-to-band model. The cross-band model employs cross-band filters to account for the interactions between different subbands. The desired signal is modeled with the complex generalized Gaussian (CGG) distribution. Compared with the commonly used Gaussian distribution, the CGG distribution can capture the sparse nature of speech signals through a suitable choice of its shape parameter. Within the maximum likelihood estimation framework, speech dereverberation is formulated as an optimization problem over the band-to-band and cross-band filters, and an optimization algorithm with guaranteed convergence is derived based on the majorization-minimization method. A series of speech dereverberation experiments under different reverberation times, numbers of channels, and source-to-microphone distances demonstrates that the proposed method significantly outperforms traditional dereverberation methods.
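For orientation, the following is a minimal sketch of the conventional band-to-band baseline that the abstract contrasts with, not the proposed cross-band algorithm. It assumes a single frequency bin, the first microphone as reference, and a CGG-type cost of the form sum over frames of (|d|^2 + eps)^(p/2), minimized by iteratively reweighted least squares, which is one standard majorization-minimization scheme. The function names, the default delay, filter order, shape parameter `p`, and the smoothing constant `eps` are all illustrative assumptions rather than values from the paper.

```python
import numpy as np

def build_delayed_taps(X, delay, order):
    """Stack `order` past frames of every channel, delayed by `delay` frames.

    X: complex array of shape (channels, frames) for one frequency bin.
    Returns a complex array of shape (channels * order, frames).
    """
    M, N = X.shape
    taps = np.zeros((M * order, N), dtype=complex)
    for k in range(order):
        shift = delay + k
        if shift < N:
            # Frames before `shift` have no history and stay zero.
            taps[k * M:(k + 1) * M, shift:] = X[:, :N - shift]
    return taps

def mclp_cgg_bin(X, delay=2, order=3, p=0.5, n_iter=5, eps=1e-8):
    """Band-to-band MCLP in one bin with a sparse (CGG-type) cost.

    Hypothetical helper: returns the estimated desired signal and the
    cost value recorded after each reweighting iteration.
    """
    Xt = build_delayed_taps(X, delay, order)
    ref = X[0]          # reference microphone (assumption)
    d = ref.copy()      # start from a zero prediction filter
    costs = []
    for _ in range(n_iter):
        # Weights from the current estimate; p = 2 gives plain least squares.
        w = (np.abs(d) ** 2 + eps) ** (p / 2 - 1)
        R = (Xt * w) @ Xt.conj().T      # weighted correlation matrix
        r = (Xt * w) @ ref.conj()       # weighted cross-correlation vector
        g = np.linalg.solve(R, r)       # prediction filter for this bin
        d = ref - g.conj() @ Xt         # subtract predicted late reverberation
        costs.append(float(np.sum((np.abs(d) ** 2 + eps) ** (p / 2))))
    return d, costs
```

Because each reweighted least-squares step minimizes a quadratic upper bound of the cost, the recorded cost values should not increase from one iteration to the next. In practice a small diagonal loading term is often added to `R` when few frames are available; it is omitted here to keep the sketch short.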

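Cite this article

康瑶, 康坊, 杨飞然 (Kang Yao, Kang Fang, Yang Feiran). Multi-channel linear prediction for speech dereverberation using cross-band filters and sparse priors[J]. Journal of Data Acquisition and Processing, 2024, 39(5): 1135-1146.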

History

Received: 2024-06-19
Last revised: 2024-08-23
Published online: 2024-10-14
