Lightweight Model for Bone-Conducted Speech Enhancement Based on Convolution Network and Residual Long Short-Time Memory Network

Affiliation:

1. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China; 2. Department of Test and Control, High-Tech Institute, Qingzhou 262500, China

Fund Project:

Supported by the National Natural Science Foundation of China (62071484).

    Abstract:

    Deep-learning-based blind enhancement of bone-conducted speech has achieved good results, but existing models remain large and computationally expensive, which hinders real-world deployment. This paper proposes a lightweight deep learning model that fuses a convolutional network with a residual long short-term memory (LSTM) network and improves the efficiency of bone-conducted speech enhancement while maintaining enhancement quality. Because convolutional networks extract features effectively with few parameters, convolutional structures are introduced along the frequency dimension of the spectrogram. These structures capture fine details of the time-frequency structure and the relationship between high- and low-frequency components. The features extracted by the CNN are then fed into an improved LSTM network to recover the high-frequency components and reconstruct the speech signal. Experiments on a bone-conducted speech database show that the proposed model effectively restores the time-frequency structure of the high-frequency components and improves enhancement performance while reducing model size and the computational complexity of inference.
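The idea of applying convolution along the frequency dimension of a spectrogram can be illustrated with a minimal NumPy sketch. It is not the authors' network: the function name `conv_along_frequency`, the kernel values, and the toy spectrogram are all hypothetical, and the operation is the cross-correlation form commonly used in deep learning, with zero padding so that the number of frequency bins is preserved.

```python
import numpy as np


def conv_along_frequency(spec, kernel, dilation=1):
    """Filter every time frame of `spec` along the frequency axis.

    spec     : array of shape (frames, freq_bins)
    kernel   : 1-D kernel with an odd number of taps (assumed here)
    dilation : spacing between kernel taps, in frequency bins
    Zero padding keeps the output shape equal to the input shape.
    """
    spec = np.asarray(spec, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    if kernel.size % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    pad = dilation * (kernel.size - 1) // 2
    padded = np.pad(spec, ((0, 0), (pad, pad)))
    n_bins = spec.shape[1]
    out = np.zeros_like(spec)
    # Accumulate one shifted, weighted copy of the frame per kernel tap.
    for i, weight in enumerate(kernel):
        start = i * dilation
        out += weight * padded[:, start:start + n_bins]
    return out


# Toy spectrogram: 2 frames, 6 frequency bins (hypothetical values).
spec = np.arange(12, dtype=float).reshape(2, 6)
features = conv_along_frequency(spec, [1.0, 0.0, -1.0], dilation=2)
print(features.shape)
```

Each output bin combines bins that lie `dilation` positions apart, which is one simple way a frequency-axis filter can relate lower and higher frequency content within a frame. In a trained model the kernel weights would be learned rather than fixed.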

    Table 1 Parameters of network structure
    Table 2 PESQ scores of three models for different speakers
    Table 3 STOI scores of three models for different speakers
    Table 4 LSD scores of three models for different speakers
    Fig.1 Spectrograms of air-conducted and bone-conducted speech
    Fig.2 Structure of RCRNN joint model enhancement method
    Fig.3 Network structure of RCRNN
    Fig.4 Design of the proposed algorithm
    Fig.5 Result of three successive 3 × 3 convolutions with a dilation rate of 2
    Fig.6 Parameter counts and prediction times of three models
    Fig.7 Spectrograms of speech enhanced by different models
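The Fig.5 caption refers to three stacked 3 × 3 convolutions with a rate of 2. The short sketch below works out the receptive field such a stack would cover along one axis, assuming stride 1 in every layer (an assumption, since the page does not state the stride). The helper name `receptive_field` is hypothetical.

```python
def receptive_field(layers):
    """Receptive field along one axis of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs. Each layer
    widens the field by (kernel_size - 1) * dilation positions.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += (kernel_size - 1) * dilation
    return field


dilated = receptive_field([(3, 2)] * 3)  # three 3x3 layers, rate 2
plain = receptive_field([(3, 1)] * 3)    # same stack without dilation
print(dilated, plain)  # 13 7
```

Under these assumptions the dilated stack sees 13 positions per axis versus 7 for the undilated one, with the same number of weights. This is the usual motivation for dilation in lightweight models.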
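Cite this article

BANG Jinyang (邦锦阳), SUN Meng (孙蒙), ZHANG Xiongwei (张雄伟), ZHENG Changyan (郑昌艳). Lightweight Model for Bone-Conducted Speech Enhancement Based on Convolution Network and Residual Long Short-Time Memory Network[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2021, 36(5): 921-931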

History
  • Received: 2021-03-01
  • Last revised: 2021-07-23
  • Accepted:
  • Published online: 2021-09-25