Environmental Sound Classification Method Based on Multilevel Residual Network

Affiliation: School of Physics and Optoelectronics, Xiangtan University, Xiangtan 411105, China

Fund Project: Supported by the National Natural Science Foundation of China (62071411) and the Natural Science Foundation of Hunan Province (2018JJ3486).

    Abstract:

    To better recognize and classify environmental sounds, this paper proposes an environmental sound classification method based on a multilevel residual network (Mul-EnvResNet). After time stretching and pitch shifting are applied to the sound events, their Mel-frequency cepstral coefficients (MFCCs) and the deltas of those coefficients are extracted as features and fed into Mul-EnvResNet, which classifies the sound events. Experiments on the ESC-50 dataset compare Mul-EnvResNet with an end-to-end convolutional neural network (EnvNet), an attention-based convolutional recurrent neural network (ACRNN), and unsupervised filterbank learning using a convolutional restricted Boltzmann machine (ConvRBM). The results show that Mul-EnvResNet achieves the best classification accuracy, 89.32%, an improvement of 18.32, 3.22 and 2.82 percentage points over EnvNet, ACRNN and ConvRBM, respectively. It also shows clear advantages over other sound classification methods.
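The abstract names MFCCs and their deltas as the network input but does not state how the deltas are computed. The sketch below assumes the common regression-based delta, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), with edge replication at the first and last frames. The helper names `deltas` and `stack_features`, the default half-width `width=2`, and the `order` parameter are hypothetical illustrations, not values or code from the paper; MFCC extraction and the time stretch / pitch shift steps are assumed to happen elsewhere.

```python
from typing import List

# One list of coefficients per frame: features[t][k] is coefficient k at frame t.
Matrix = List[List[float]]


def deltas(features: Matrix, width: int = 2) -> Matrix:
    """Regression-based delta coefficients along the time axis.

    d_t[k] = sum_{n=1..N} n * (c_{t+n}[k] - c_{t-n}[k]) / (2 * sum_{n=1..N} n^2),
    where N = width. Frames beyond either end are replaced by the nearest
    edge frame (edge replication), so the output has as many frames as the input.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    num_frames = len(features)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: Matrix = []
    for t in range(num_frames):
        row = []
        for k in range(len(features[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = features[min(t + n, num_frames - 1)][k]  # clamp at last frame
                behind = features[max(t - n, 0)][k]  # clamp at first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(mfcc: Matrix, width: int = 2, order: int = 1) -> Matrix:
    """Append delta features of orders 1..order to the static MFCCs, frame by frame."""
    blocks = [mfcc]
    current = mfcc
    for _ in range(order):
        current = deltas(current, width)  # delta of the previous block
        blocks.append(current)
    # Concatenate the per-frame rows of all blocks: [static | delta | delta-delta ...]
    return [sum((block[t] for block in blocks), []) for t in range(len(mfcc))]


if __name__ == "__main__":
    # Toy input: coefficient 0 rises by 1 per frame, coefficient 1 by 2 per frame.
    toy = [[float(t), 2.0 * t] for t in range(6)]
    print(deltas(toy)[2])  # interior frame, delta equals the slope: [1.0, 2.0]
    print(len(stack_features(toy, order=2)[0]))  # 2 static + 2 delta + 2 delta-delta: 6
```

With `order=1`, a hypothetical 40-coefficient MFCC matrix would become 80 values per frame. Whether the paper also uses second-order deltas is not stated in the abstract, so `order` is left as a parameter rather than fixed.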

    Figures and tables:
    Table 1 Accuracy of different models and of shortcut connections with different convolution kernel sizes
    Table 2 Classification accuracy and training time of different models
    Fig.1 ESC process based on Mul-EnvResNet
    Fig.2 Structure of the residual block
    Fig.3 Structure of EnvResNet and its residual block
    Fig.4 Structure of Mul-EnvResNet and the multilevel residual block
    Fig.5 Training and test curves of Mul-EnvResNet
    Table 3 Comparison of experimental results of various models on ESC-50
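Cite this article

ZENG Jinfang, LI Youming, YANG Huixian, ZHANG Yu, HU Yaxin. Environmental sound classification method based on multilevel residual network[J]. Journal of Data Acquisition and Processing, 2021, 36(5): 960-968.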

History
  • Received: 2020-11-25
  • Last revised: 2021-02-28
  • Published online: 2021-09-25