To better identify and classify environmental sounds, a multilevel residual network (Mul-EnvResNet) is proposed. After time stretching and pitch shifting are applied to the sound events, Mel-frequency cepstral coefficients (MFCCs) and their deltas are extracted as features and fed into Mul-EnvResNet for classification. Experiments are conducted on the ESC-50 data set, where Mul-EnvResNet is compared with the end-to-end convolutional neural network (EnvNet), the attention-based convolutional recurrent neural network (ACRNN), and unsupervised filterbank learning using a convolutional restricted Boltzmann machine (ConvRBM). The results show that Mul-EnvResNet achieves the best classification accuracy, 89.32%, which is 18.32, 3.22, and 2.82 percentage points higher than these three models, respectively. It also shows clear advantages over other sound classification methods.
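
As an illustration of the feature step, the sketch below computes delta coefficients from a sequence of per-frame feature vectors and stacks them with the original frames. The abstract does not state how the deltas are computed, so the regression formula used here (a common convention for MFCC deltas), the window half-width of 2, the edge-frame repetition at the boundaries, and the function names `delta` and `stack_with_deltas` are all assumptions made for illustration, not the authors' implementation.

```python
def delta(frames, width=2):
    """Regression-based delta of a sequence of feature vectors.

    frames: list of per-frame coefficient lists (e.g. MFCCs), all the same length.
    width:  frames on each side used in the regression (assumed value, not from the paper).
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[0])):
            acc = 0.0
            for n in range(1, width + 1):
                # Clamp indices so the first and last frames are repeated at the edges.
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_deltas(frames, width=2):
    """Concatenate each frame with its delta, doubling the coefficients per frame."""
    return [f + d for f, d in zip(frames, delta(frames, width))]
```

For a feature that rises by one unit per frame, the interior deltas come out as 1.0, while the clamped edges give smaller values; a constant feature yields deltas of zero throughout.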