Burmese OCR Method Based on Knowledge Distillation

doi:10.16337/j.1004-9037.2022.01.015

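Affiliations: 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; 2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China
Funding: Key Program of the National Natural Science Foundation of China (61732005); National Natural Science Foundation of China (62166023, 61866019, 61761026, 61972186); Yunnan Major Science and Technology Special Program (202103AA080015); Key Project of the Yunnan Applied Basic Research Program (2019FA023); Yunnan Reserve Talents Program for Young and Middle-aged Academic and Technical Leaders (2019HB006).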

Abstract:

Unlike conventional image text recognition tasks, Burmese optical character recognition (OCR) requires a model to recognize, within a single receptive field, complex characters formed by nesting and combining multiple characters, which makes the task considerably harder. To address this problem, a Burmese OCR method based on knowledge distillation is proposed. A teacher network and a student network are both built on a convolutional neural network (CNN) + recurrent neural network (RNN) framework and trained in an ensemble learning manner. During training, the ensembled sub-networks of the teacher are coupled with the student network so that the local character image features corresponding to a single receptive field in the student network are aligned with the whole-character image features in the teacher network, which strengthens the extraction of local features in long-sequence character images. Experimental results show that the proposed model outperforms the baseline by 2.9% and 2.7% when trained on images without and with background noise, respectively.
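The abstract does not give the loss function, so the following is only a minimal sketch of generic soft-target distillation with an averaged teacher ensemble, not the authors' method. It assumes that each teacher sub-network and the student emit one logit vector per time step (as a CNN+RNN recognizer typically does), and that alignment is measured with a per-step KL divergence on temperature-softened outputs. The function names, the temperature value, and the choice of KL divergence are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_teacher_probs(teacher_logits, temperature=1.0):
    """Average the softened per-step distributions of several teacher sub-networks.

    teacher_logits[k][t] is the logit vector of sub-network k at time step t
    (hypothetical layout, chosen for this sketch).
    """
    num_teachers = len(teacher_logits)
    num_steps = len(teacher_logits[0])
    averaged = []
    for t in range(num_steps):
        step_probs = [
            softmax(teacher_logits[k][t], temperature) for k in range(num_teachers)
        ]
        # zip(*step_probs) groups the probabilities of each class across teachers
        averaged.append([sum(col) / num_teachers for col in zip(*step_probs)])
    return averaged


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean per-step KL(teacher || student) on temperature-softened outputs.

    Assumption: a generic soft-target loss, not the loss used in the paper.
    """
    teacher_probs = ensemble_teacher_probs(teacher_logits, temperature)
    total = 0.0
    for step_logits, p_teacher in zip(student_logits, teacher_probs):
        p_student = softmax(step_logits, temperature)
        total += sum(
            p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0.0
        )
    return total / len(student_logits)
```

In a full training loop, a term like this would usually be added to the student's ordinary recognition loss with a weighting factor; both the weighting and the recognition loss are left out here because the abstract does not specify them.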

Table 1 Example of dataset format and corresponding labels

Table 2 Recognition results with datasets 1 and 3 as the training set

Table 3 Recognition results with datasets 1, 2 and 3 as the training set

Table 4 Per-character accuracy and full-sequence accuracy with background noise

Fig.1 Character structures of different languages within one receptive field

Fig.2 Network framework of the Burmese OCR model

Fig.3 Per-character accuracy for different dataset sizes

Fig.4 Full-sequence sentence accuracy for different dataset sizes

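Cite this article

Mao Cunli, Xie Xuyang, Yu Zhengtao, Gao Shengxiang, Wang Zhenhan, Liu Fuhao. Burmese OCR method based on knowledge distillation[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2022, 37(1): 173-182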

History

Received: 2020-08-01
Last revised: 2021-05-06
Published online: 2022-01-25
