Environmental Sound Classification Method Based on Multilevel Residual Network

doi:10.16337/j.1004-9037.2021.05.011


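Affiliation: School of Physics and Optoelectronics, Xiangtan University, Xiangtan 411105, China
Funding: Supported by the National Natural Science Foundation of China (62071411) and the Natural Science Foundation of Hunan Province (2018JJ3486).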



Abstract:

To improve the recognition and classification of environmental sounds, an environmental sound classification method based on a multilevel residual network (Mul-EnvResNet) is proposed. After time stretching and pitch shifting are applied to the sound events, their Mel-frequency cepstral coefficients (MFCCs) and the corresponding deltas are extracted as features and fed into Mul-EnvResNet for classification. Experiments are conducted on the ESC-50 dataset, where Mul-EnvResNet is compared with an end-to-end convolutional neural network (EnvNet), an attention-based convolutional recurrent neural network (ACRNN), and an unsupervised filterbank learning model based on a convolutional restricted Boltzmann machine (ConvRBM). The results show that Mul-EnvResNet achieves the best classification accuracy, 89.32%, which is 18.32, 3.22, and 2.82 percentage points higher than these three models, respectively. It also shows a clear advantage over other sound classification methods.
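The abstract names MFCCs and their deltas as the input features. As a rough illustration of what a delta feature is, the sketch below computes first-order deltas with the common regression formula, using only the Python standard library. The function names `delta` and `stack_with_deltas`, the window `width=2`, and the edge handling (repeating the first and last frame) are assumptions made for this example; the page does not state which delta definition or window the authors used.

```python
from typing import List

Matrix = List[List[float]]  # rows are frames, columns are cepstral coefficients


def delta(features: Matrix, width: int = 2) -> Matrix:
    """First-order delta of a (frames x coefficients) matrix.

    Regression formula (an assumed, commonly used definition):
        d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2)
    Frames outside the signal are replaced by the nearest edge frame.
    """
    if not features:
        return []
    num_frames = len(features)
    num_coeffs = len(features[0])
    denom = 2.0 * sum(n * n for n in range(1, width + 1))

    def frame(t: int) -> List[float]:
        # Clamp the index so the first/last frame is repeated at the edges.
        return features[min(max(t, 0), num_frames - 1)]

    out: Matrix = []
    for t in range(num_frames):
        row = []
        for k in range(num_coeffs):
            acc = sum(
                n * (frame(t + n)[k] - frame(t - n)[k])
                for n in range(1, width + 1)
            )
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_deltas(mfcc: Matrix) -> Matrix:
    """Concatenate each frame's coefficients with its first-order deltas."""
    return [c + d for c, d in zip(mfcc, delta(mfcc))]
```

Applying `delta` twice would give second-order (delta-delta) features, should those be needed. In practice the MFCCs themselves, and the time stretching and pitch shifting mentioned in the abstract, would come from an audio library rather than hand-written code.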

Table 1 Accuracy of different models and of shortcut connections with different convolution kernel sizes

Table 2 Classification accuracy and training time under different models

Fig.1 ESC process based on Mul-EnvResNet

Fig.2 Structure of residual block

Fig.3 Structure of EnvResNet and residual block

Fig.4 Structure of Mul-EnvResNet and multilevel residual block

Fig.5 Training and test curves of Mul-EnvResNet

Table 3 Comparison of experimental results of various models on ESC-50
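The captions for Fig.2 to Fig.4 refer to residual blocks and multilevel residual blocks, but the figures themselves are not reproduced on this page. The sketch below only illustrates the general idea of a shortcut connection, `y = relu(F(x) + shortcut(x))`, and of an additional outer shortcut that spans several blocks. It is not the authors' architecture: the names `residual_block` and `multilevel_block`, the use of plain lists instead of convolutional feature maps, and the placement of the outer shortcut are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def add(a: Vector, b: Vector) -> Vector:
    if len(a) != len(b):
        raise ValueError("shortcut and transform outputs must have equal length")
    return [u + v for u, v in zip(a, b)]


def identity(x: Vector) -> Vector:
    return list(x)


def residual_block(x: Vector, transform: Layer, shortcut: Layer = identity) -> Vector:
    """Generic residual block: relu(transform(x) + shortcut(x))."""
    return relu(add(transform(x), shortcut(x)))


def multilevel_block(
    x: Vector,
    transforms: Sequence[Layer],
    inner_shortcut: Layer = identity,
    outer_shortcut: Layer = identity,
) -> Vector:
    """Chain residual blocks, then add an outer shortcut spanning the chain.

    The outer shortcut is a hypothetical stand-in for a higher-level skip
    connection; a real network would typically use a convolution here.
    """
    h = x
    for f in transforms:
        h = residual_block(h, f, inner_shortcut)
    return relu(add(h, outer_shortcut(x)))
```

Passing a non-identity `shortcut` corresponds loosely to the shortcut connections with different convolution kernel sizes mentioned in the Table 1 caption, although the actual kernels and layer layout are described only in the full paper.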


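Cite this article

Zeng Jinfang, Li Youming, Yang Huixian, Zhang Yu, Hu Yaxin. Environmental sound classification method based on multilevel residual network[J]. Journal of Data Acquisition and Processing, 2021, 36(5): 960-968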


History

Received: 2020-11-25
Revised: 2021-02-28
Published online: 2021-09-25
