Abstract: The OCR quality of document photos taken with mobile devices is severely degraded by geometric distortion of the paper, interference from the shooting scene, and perspective distortion caused by unfavorable shooting angles. To pre-process document images that are folded or otherwise distorted, we design two Auto-Encoder-based networks that perform adaptive image correction and improve text recognition accuracy. First, we propose two types of residual blocks, dilated residual blocks and asymmetric convolutional residual blocks, and combine them with the Auto-Encoder to create an Asymmetric Dilated Auto-Encoder. Second, we create a Spatial Pyramid Auto-Encoder by using Spatial Pyramid Pooling instead of fully connected layers and performing feature extraction with asymmetric convolutional residual blocks. Experimental results show that, compared with the distorted images, images corrected by the Asymmetric Dilated Auto-Encoder improve by 26.3%, 20.4%, and 12.3% in OCR precision, OCR recall, and text similarity, respectively, while images corrected by the Spatial Pyramid Auto-Encoder improve by 27.7%, 22%, and 15.5% on the same three metrics. Compared with other image rectification networks such as RectiNet, images corrected by these two Auto-Encoders yield markedly better optical character recognition results. Both networks also show clear advantages over existing networks in robustness and generalizability.
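
To illustrate why Spatial Pyramid Pooling can stand in for fully connected layers, the following minimal sketch pools feature maps of arbitrary size into a fixed-length vector. It is a plain-Python illustration under assumed settings, not the network described here: the function names, the use of max pooling, and the pyramid levels (1, 2, 4) are assumptions chosen for the example.

```python
import math


def adaptive_max_pool(feature_map, bins):
    """Max-pool one 2-D feature map (a list of rows) into a bins x bins grid."""
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for i in range(bins):
        # Floor for the start and ceil for the end keeps every bin non-empty,
        # even when the map is smaller than the grid.
        r0 = (i * height) // bins
        r1 = math.ceil((i + 1) * height / bins)
        for j in range(bins):
            c0 = (j * width) // bins
            c1 = math.ceil((j + 1) * width / bins)
            pooled.append(
                max(feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1))
            )
    return pooled


def spatial_pyramid_pool(channels, levels=(1, 2, 4)):
    """Concatenate pooled grids from every pyramid level and channel.

    The output length is len(channels) * sum(b * b for b in levels),
    independent of the spatial size of the input (levels are assumed values).
    """
    vector = []
    for bins in levels:
        for fmap in channels:
            vector.extend(adaptive_max_pool(fmap, bins))
    return vector


if __name__ == "__main__":
    # Two single-channel inputs with different spatial sizes.
    small = [[[r * 5 + c for c in range(5)] for r in range(4)]]
    large = [[[r * 7 + c for c in range(7)] for r in range(9)]]
    print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
    # prints: 21 21
```

Because the pooled vector has the same length for any input resolution, a layer that consumes it does not need to fix the input image size in advance, which is the property a fully connected bottleneck lacks.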