Two Image Rectification Networks for Distorted and Warped Documents
Affiliation:
School of Electronic Information Engineering, Hebei University of Technology, Tianjin 300401, China
Abstract:
Geometric deformation of the paper, interference from the shooting scene, and perspective distortion caused by unfavorable shooting angles severely degrade optical character recognition (OCR) on document images captured by mobile devices. To preprocess folded and warped document images, two auto-encoder-based networks are designed that rectify images adaptively and improve text recognition accuracy. First, two types of residual block are proposed: a dilated residual block and an asymmetric convolutional residual block. Combining these residual blocks with an auto-encoder yields the asymmetric dilated auto-encoder. Second, replacing the fully connected layers with spatial pyramid pooling and using asymmetric convolutional residual blocks for feature extraction yields the spatial pyramid auto-encoder. Experimental results show that, compared with the distorted images, images rectified by the asymmetric dilated auto-encoder improve by 26.3%, 20.4%, and 12.3% in OCR precision, OCR recall, and text similarity, respectively, while images rectified by the spatial pyramid auto-encoder improve by 27.7%, 22.0%, and 15.5% on the same three measures. Compared with other image rectification networks such as RectiNet, the two networks adaptively rectify multiple types of distorted document images, and the rectified images perform better in text recognition. Both networks effectively improve OCR precision, OCR recall, and text similarity, and show clear advantages over existing rectification networks in robustness and generalization.
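The abstract reports gains in OCR precision, OCR recall, and text similarity but does not define how they are computed. The sketch below shows one plausible way to score OCR output against a ground-truth transcript. It assumes word-level matching for precision and recall and a character-level `difflib` ratio for similarity; the function name `ocr_scores` and the sample strings are hypothetical, and the authors' actual evaluation protocol may differ.

```python
from collections import Counter
from difflib import SequenceMatcher


def ocr_scores(recognized: str, reference: str) -> dict:
    """Score OCR output against a reference transcript.

    Assumed definitions (not taken from the paper):
      precision  = matched words / recognized words
      recall     = matched words / reference words
      similarity = difflib character-level ratio in [0, 1]
    """
    rec_words = recognized.split()
    ref_words = reference.split()
    # Multiset intersection: each reference word is matched at most once.
    matched = sum((Counter(rec_words) & Counter(ref_words)).values())
    precision = matched / len(rec_words) if rec_words else 0.0
    recall = matched / len(ref_words) if ref_words else 0.0
    similarity = SequenceMatcher(None, recognized, reference).ratio()
    return {"precision": precision, "recall": recall, "similarity": similarity}


# Hypothetical OCR outputs for a distorted image and its rectified version.
reference = "document image rectification improves text recognition"
distorted = "docurnent irnage rectification improves"
rectified = "document image rectification improves text recognition"

before = ocr_scores(distorted, reference)
after = ocr_scores(rectified, reference)

for name in ("precision", "recall", "similarity"):
    print(f"{name}: {before[name]:.3f} -> {after[name]:.3f}")
```

Under these assumptions, an improvement such as those quoted in the abstract would correspond to the difference between the `after` and `before` scores, averaged over a test set.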