Two Image Rectification Networks for Distorted and Warped Documents
Affiliation:

School of Electronics Information Engineering, Hebei University of Technology

Fund Project:

National Natural Science Foundation of China (61801164)

    Abstract:

    Geometric deformation of the paper, interference from the shooting scene, and perspective distortion caused by unfavorable shooting angles severely degrade the optical character recognition (OCR) performance of document images captured with mobile devices. To address the preprocessing of folded and warped document images, two autoencoder-based networks are designed to rectify images adaptively and improve text recognition accuracy. First, two types of residual blocks are proposed: dilated residual blocks and asymmetric convolutional residual blocks. These residual blocks are then combined with an autoencoder to form the Asymmetric Dilated Auto-Encoder. In parallel, a second network, the Spatial Pyramid Auto-Encoder, is designed by replacing the fully connected layers with spatial pyramid pooling and using asymmetric convolutional residual blocks for feature extraction. Experimental results show that, compared with the distorted images, images rectified by the Asymmetric Dilated Auto-Encoder improve OCR precision, OCR recall, and text similarity by 26.3%, 20.4%, and 12.3%, respectively, while images rectified by the Spatial Pyramid Auto-Encoder improve them by 27.7%, 22%, and 15.5%, respectively. Compared with other image rectification networks such as RectiNet, the two networks adaptively rectify multiple types of distorted document images, and their rectified images perform better in text recognition. Both networks effectively improve OCR precision, recall, and text similarity, and they show clear advantages over existing rectification networks in robustness and generalization.
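    The abstract states that spatial pyramid pooling replaces the fully connected layers in the second network. The property that makes this substitution possible is that the pooled output has a fixed length regardless of the input size. The following is a minimal plain-Python sketch of that idea for a single-channel feature map, not the authors' implementation: the function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4), the use of max pooling, and the sample feature maps are all assumptions chosen for illustration.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one 2-D feature map (list of equal-length rows) into a
    fixed-length list with sum(n * n for n in levels) entries."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:  # each level splits the map into an n x n grid of bins
        for i in range(n):
            # Floor for the start and ceiling for the end keep every bin
            # non-empty and make the bins cover the whole map.
            top = (i * height) // n
            bottom = ((i + 1) * height + n - 1) // n
            for j in range(n):
                left = (j * width) // n
                right = ((j + 1) * width + n - 1) // n
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


# Two hypothetical feature maps of different sizes (values are arbitrary).
small = [[r * 4 + c for c in range(4)] for r in range(4)]         # 4 x 4
large = [[(r * 7 + c) % 11 for c in range(9)] for r in range(6)]  # 6 x 9

# Both vectors have 1 + 4 + 16 = 21 entries despite the different input sizes.
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

    In a real network the same pooling would be applied to every channel and the results concatenated, so the layer that follows always receives a vector of the same length regardless of the resolution of the document image.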

History
  • Received: 2022-12-19
  • Last revised: 2023-02-27
  • Accepted: 2023-04-23