Blind Enhancement Algorithm for Throat Microphone Speech Based on LSTM-RNN

doi:10.16337/j.1004-9037.2019.04.007

Affiliation: Army Engineering University, Nanjing 210007, China
Funding: Supported by the National Natural Science Foundation of China (61471394, 61402519).



Abstract:

Throat microphones are used in a variety of high-noise environments because of their excellent noise immunity. However, the speech they produce suffers from overly heavy mid-frequency components and a severe loss of high-frequency components, which seriously degrades speech clarity and intelligibility. To improve the quality of throat microphone speech, this paper proposes a blind speech enhancement algorithm based on long short-term memory recurrent neural networks (LSTM-RNN). In contrast to estimation algorithms based on low-dimensional spectral envelope features, the algorithm first uses an LSTM-RNN to model the mapping between the high-dimensional log-magnitude spectra of throat microphone speech and air-conducted speech; the network effectively captures context information to reconstruct the speech magnitude spectrum. The estimated magnitude spectrum is then processed with non-negative matrix factorization (NMF), which effectively suppresses the over-smoothing problem and further improves speech quality. Simulation results in terms of log-likelihood ratio (LLR), log-spectral distance (LSD), and perceptual evaluation of speech quality (PESQ) show that the algorithm effectively improves the quality of throat microphone speech.
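The abstract names NMF and the LSD metric but does not give their exact formulation. As a rough illustration only, the Python sketch below shows a generic NMF of a magnitude spectrogram using the standard multiplicative updates for the Euclidean cost, together with one common definition of log-spectral distance. The function names, the rank, the iteration count, and the dB scaling are assumptions made for this example; they are not taken from the paper, and the paper's actual use of NMF on the estimated spectrum may differ.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (freq x frames) as W @ H.

    Generic multiplicative updates for the Euclidean cost; an
    illustrative stand-in, not the paper's specific procedure.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # spectral basis vectors
    H = rng.random((rank, n_frames)) + eps  # per-frame activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def log_spectral_distance(ref_mag, est_mag, eps=1e-10):
    """Mean over frames of the RMS difference, in dB, between two
    magnitude spectrograms of shape (freq, frames).

    One common LSD definition (assumed here); others exist.
    """
    ref_db = 20.0 * np.log10(ref_mag + eps)
    est_db = 20.0 * np.log10(est_mag + eps)
    per_frame = np.sqrt(np.mean((ref_db - est_db) ** 2, axis=0))
    return float(np.mean(per_frame))


if __name__ == "__main__":
    # Synthetic stand-in for an estimated magnitude spectrogram.
    rng = np.random.default_rng(1)
    V = rng.random((16, 4)) @ rng.random((4, 30))
    W, H = nmf(V, rank=4)
    print("LSD between V and its NMF reconstruction (dB):",
          log_spectral_distance(V, W @ H))
```

In this sketch a lower LSD means the reconstruction `W @ H` is closer to the reference spectrogram; the synthetic matrix `V` only stands in for real speech spectra.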

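Cite this article

Zheng Changyan, Zhang Xiongwei, Cao Tieyong, Yang Jibin, Sun Meng, Xing Yibo. Blind Enhancement Algorithm for Throat Microphone Speech Based on LSTM-RNN [J]. Journal of Data Acquisition and Processing, 2019, 34(4): 615-624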

History

Received: 2018-03-18
Last revised: 2018-05-07
Published online: 2019-09-01
