After echo hiding, the cepstrum of a speech signal exhibits a peak at the echo delay. Traditional echo hiding steganalysis therefore relies mainly on statistical characteristics of the cepstrum coefficients as its features. However, when the echo amplitude is low, the cepstral peak of the stego signal is not pronounced, and the detection performance of methods based on these statistical characteristics is unsatisfactory. This paper combines cepstrum analysis with image recognition and proposes a steganalysis method for speech echo hiding based on cepstrum images. The speech signal is divided into frames and windowed, and the cepstrum of each frame is computed. An image is then generated with time as the horizontal axis, the cepstrum sequence index as the vertical axis, and the cepstrum coefficient amplitude as the gray level. The generated cepstrum image serves as the steganalysis input, and a residual neural network is used as the classifier. Experimental results show that, at low echo amplitude, the detection accuracy for three classical echo hiding algorithms reaches 98.2%, 98.6%, and 96.1%, respectively. This is a substantial improvement over traditional echo hiding steganalysis methods, addressing their unsatisfactory detection performance at low echo amplitude.
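
The cepstrum image construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cepstrum_matrix` and `to_gray_image`, the frame length, the hop size, the Hamming window, the use of the real cepstrum, and the min-max scaling to 8-bit gray levels are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def cepstrum_matrix(signal, frame_len=1024, hop=512, eps=1e-12):
    """Return the real cepstrum of each windowed frame.

    Rows are cepstrum sequence points (quefrency), columns are frames (time).
    frame_len, hop, and the Hamming window are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix of shape (n_frames, frame_len) for overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + eps)
    cep = np.fft.irfft(log_mag, n=frame_len, axis=1)
    # The real cepstrum is symmetric, so keep only the first half.
    return cep[:, : frame_len // 2].T


def to_gray_image(cep):
    """Map cepstrum amplitudes to 8-bit gray levels (assumed min-max scaling)."""
    amp = np.abs(cep)
    span = amp.max() - amp.min()
    if span == 0:
        return np.zeros(amp.shape, dtype=np.uint8)
    return np.round(255.0 * (amp - amp.min()) / span).astype(np.uint8)
```

Calling `to_gray_image(cepstrum_matrix(speech))` on a one-dimensional sample array `speech` (a hypothetical input) yields a two-dimensional `uint8` array that could be passed to an image classifier. In practice the lowest rows, which carry the large low-quefrency coefficients, may dominate the gray scale; whether to crop or rescale them is a design choice the abstract does not address.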