Two Image Rectification Networks for Distorted and Warped Documents
Author: FENG Jin, CHI Yue, ZHOU Yatong, HE Jingfei
Affiliation:

School of Electronic Information Engineering, Hebei University of Technology, Tianjin 300401, China

CLC Number:

TP391

    Abstract:

    Optical character recognition (OCR) of document photos taken with mobile devices is severely hampered by geometric distortion of the paper, interference from the shooting scene, and perspective distortion caused by unfavorable shooting angles. To pre-process document images distorted by folding and warping, two auto-encoder-based networks are designed that perform adaptive image rectification and improve text recognition accuracy. First, we propose two types of residual blocks, dilated residual blocks and asymmetric convolutional residual blocks, and combine them with an auto-encoder to build an asymmetric dilated auto-encoder. Second, we build a spatial pyramid auto-encoder that uses spatial pyramid pooling in place of fully connected layers and extracts features with asymmetric convolutional residual blocks. Experimental results show that, compared with the distorted images, images corrected by the asymmetric dilated auto-encoder improve by 26.3%, 20.4%, and 12.3% in OCR precision, OCR recall, and text similarity, respectively, while images corrected by the spatial pyramid auto-encoder improve by 27.7%, 22.0%, and 15.5% on the same three metrics. Compared with other image rectification networks such as RectiNet, images corrected by the two auto-encoders yield much better OCR results. Both networks also show clear advantages over existing networks in robustness and generalizability.
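
    The abstract states that spatial pyramid pooling takes the place of fully connected layers. A minimal sketch of that idea follows, showing how pooling over a fixed set of grids yields a vector whose length does not depend on the input size. The pyramid levels (1, 2, 4), the use of max pooling, the nested-list feature maps, and the helper names `spatial_pyramid_pool` and `make_map` are illustrative assumptions, not the configuration reported by the authors.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a C x H x W feature map (nested lists) into a fixed-length list.

    Assumption: each level n splits every channel into an n x n grid of bins.
    The output length is C * sum(n * n for n in levels), whatever H and W are.
    """
    pooled = []
    for channel in feature_map:
        height, width = len(channel), len(channel[0])
        for n in levels:
            for i in range(n):
                # Floor for the start and ceil for the end, so the bins cover
                # the whole map and are never empty, even when height < n.
                r0 = (i * height) // n
                r1 = max(r0 + 1, math.ceil((i + 1) * height / n))
                for j in range(n):
                    c0 = (j * width) // n
                    c1 = max(c0 + 1, math.ceil((j + 1) * width / n))
                    pooled.append(
                        max(channel[r][c]
                            for r in range(r0, r1)
                            for c in range(c0, c1))
                    )
    return pooled


def make_map(channels, height, width):
    """Build a toy feature map with distinct, predictable values."""
    return [
        [[r * width + c + k for c in range(width)] for r in range(height)]
        for k in range(channels)
    ]


# Two inputs of different spatial size produce vectors of the same length:
# 2 channels * (1 + 4 + 16) bins = 42 values each.
small = spatial_pyramid_pool(make_map(2, 5, 7))
large = spatial_pyramid_pool(make_map(2, 13, 9))
print(len(small), len(large))
```

    Because the output length is fixed by the number of channels and the pyramid levels alone, a decoder placed after this layer can accept document images of varying resolution, which a fully connected layer with a fixed input width cannot.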

Get Citation

FENG Jin, CHI Yue, ZHOU Yatong, HE Jingfei. Two Image Rectification Networks for Distorted and Warped Documents[J].,2024,39(1):167-180.

History
  • Received: December 19, 2022
  • Revised: February 27, 2023
  • Online: January 25, 2024