Joint Inference of Visual Attention and Semantic Perception for Scene Text Recognition
Author: Tong Guoxiang, Dong Tianrong, Hu Hengzhang

Affiliation: College of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

CLC Number: TP391
Abstract:

Recognizing irregular text in natural scenes remains a challenging problem. To handle arbitrarily shaped and low-quality scene text, this paper proposes a multimodal network that combines a visual attention module with a semantic perception module. The visual attention module extracts visual features from the image using a parallel attention mechanism combined with positional encoding. The semantic perception module, based on weakly supervised learning, learns linguistic information to compensate for the deficiencies of visual features; it adopts a Transformer variant that is trained by randomly masking one character in a word, which improves the model's contextual semantic inference. A visual-semantic fusion module then exchanges information between the two modalities through a gating mechanism to generate robust features for character prediction. Extensive experiments show that the proposed approach is effective at recognizing arbitrarily shaped and low-quality scene text and achieves competitive results on several benchmark datasets. In particular, it reaches accuracies of 93.6% and 86.2% on SVT and SVTP, respectively, two datasets that contain low-quality text. Compared with a variant containing only the visual module, accuracy improves by 3.5% and 3.9%, respectively, which demonstrates the importance of semantic information for text recognition.
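To make the two mechanisms in the abstract concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: the gated fusion mixes per-character visual and semantic features, and the masking function hides one random character per word for training the semantic module. The names GatedFusion and mask_one_char, the 512-dimensional features, and mask_id are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Per-channel gate that mixes visual and semantic features."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, visual: torch.Tensor, semantic: torch.Tensor) -> torch.Tensor:
        # visual, semantic: (batch, seq_len, dim), aligned per character position
        g = torch.sigmoid(self.gate(torch.cat([visual, semantic], dim=-1)))
        # g near 1 trusts the visual stream; g near 0 trusts the semantic stream
        return g * visual + (1.0 - g) * semantic

def mask_one_char(tokens: torch.Tensor, mask_id: int) -> torch.Tensor:
    """Randomly mask exactly one character position per word (sequence)."""
    out = tokens.clone()
    pos = torch.randint(0, tokens.size(1), (tokens.size(0),))
    out[torch.arange(tokens.size(0)), pos] = mask_id
    return out

# Usage: fuse per-character features from the two branches
fusion = GatedFusion(dim=512)
visual = torch.randn(2, 25, 512)   # e.g., 25 character positions
semantic = torch.randn(2, 25, 512)
fused = fusion(visual, semantic)   # (2, 25, 512), fed to the character classifier
```

The gate lets the network fall back on linguistic context exactly where the visual evidence is weak, which is the intuition behind the reported gains on low-quality text.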

Get Citation

Tong Guoxiang, Dong Tianrong, Hu Hengzhang. Joint Inference of Visual Attention and Semantic Perception for Scene Text Recognition[J]. 2023, 38(3): 665-675.

History
  • Received: November 23, 2022
  • Revised: March 21, 2023
  • Online: May 25, 2023