Chinese Core Journal
Source Journal for Chinese Scientific and Technical Papers and Citations
ISSN: 1004-9037
CN: 32-1367/TN
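Contact
  • Supervised by: China Association for Science and Technology
  • Sponsored by: Nanjing University of Aeronautics and Astronautics
  •               Chinese Institute of Electronics
  • Address: 29 Yudao Street, Nanjing
  • Tel: 025-84892742
  • Fax: 025-84892742
  • E-mail: sjcj@nuaa.edu.cn
  • Postal code: 210016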
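Bang Jinyang (邦锦阳), Sun Meng (孙蒙), Zhang Xiongwei (张雄伟), Zheng Changyan (郑昌艳). Lightweight Model for Bone-Conducted Speech Enhancement Based on Convolution Network and Residual Long Short-Time Memory Network[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2021, 36(5): 921-931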
Lightweight Model for Bone-Conducted Speech Enhancement Based on Convolution Network and Residual Long Short-Time Memory Network
Chinese title: 融合卷积网络与残差长短时记忆网络的轻量级骨导语音盲增强
Received: 2021-03-01  Revised: 2021-07-23
DOI:10.16337/j.1004-9037.2021.05.007
Keywords: bone-conducted speech blind enhancement; convolutional neural network; long short-term memory network; lightweight model
Funding: Supported by the National Natural Science Foundation of China (62071484).
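Author | Affiliation | Postal code
Bang Jinyang | College of Command and Control Engineering, Army Engineering University, Nanjing 210007 | 210007
Sun Meng | College of Command and Control Engineering, Army Engineering University, Nanjing 210007 | 210007
Zhang Xiongwei | College of Command and Control Engineering, Army Engineering University, Nanjing 210007 | 210007
Zheng Changyan | Rocket Force Sergeant School, Qingzhou 262500 | 262500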
Abstract views: 78
Full-text downloads: 120
Abstract:
      Deep-learning-based blind enhancement of bone-conducted speech has achieved good results, but existing models remain large and computationally expensive, which hinders real-world deployment. This paper proposes a lightweight deep learning model that fuses a convolutional network with a residual long short-term memory (LSTM) network and improves the efficiency of blind bone-conducted speech enhancement while maintaining enhancement quality. Because convolutional networks extract features well with few parameters, the model introduces convolutional structures along the frequency dimension of the spectrogram. These structures capture fine details of the time-frequency structure and the correlations between high- and low-frequency components, yielding a new feature representation. The new features are fed into an improved LSTM network, which recovers the high-frequency components and reconstructs the speech signal. Experiments on a bone-conducted speech database show that the proposed model effectively improves the time-frequency structure of the high-frequency components and raises enhancement performance while reducing model size and the computational cost of inference.
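
The abstract's point that a convolutional front end adds few parameters can be illustrated with a rough parameter count. The sketch below is not the paper's configuration: the layer sizes (257 frequency bins, 256 hidden units, a 5-tap stride-4 convolution with 2 output channels) and the helper names are hypothetical, chosen only to show how a strided convolution along the frequency axis can shrink the input seen by an LSTM layer. The LSTM count assumes the textbook formulation with one bias vector per gate.

```python
def conv1d_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of a 1-D convolution layer."""
    return kernel_size * in_channels * out_channels + out_channels


def lstm_params(input_size, hidden_size):
    """Four gates, each with input weights, recurrent weights, and one bias vector."""
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def conv_output_bins(freq_bins, kernel_size, stride):
    """Number of frequency bins left after an unpadded strided convolution."""
    return (freq_bins - kernel_size) // stride + 1


# Hypothetical sizes, NOT taken from the paper.
FREQ_BINS = 257      # spectrogram bins per frame
HIDDEN = 256         # LSTM hidden units
KERNEL, STRIDE, CHANNELS = 5, 4, 2

# Baseline: the LSTM layer reads the raw spectrogram frame directly.
baseline = lstm_params(FREQ_BINS, HIDDEN)

# Conv front end: convolve along frequency, then flatten channels x bins.
bins_after_conv = conv_output_bins(FREQ_BINS, KERNEL, STRIDE)
lstm_input = bins_after_conv * CHANNELS
with_conv = conv1d_params(KERNEL, 1, CHANNELS) + lstm_params(lstm_input, HIDDEN)

print(f"LSTM on raw frame:    {baseline} parameters")
print(f"Conv front end + LSTM: {with_conv} parameters")
```

Under these assumed sizes the convolution itself contributes only a dozen parameters, and the saving comes from the smaller LSTM input. Actual savings depend on the real layer sizes and on how the residual LSTM layers are stacked, which this page does not specify.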