Fine-Grained Image Recognition Method Based on Attention and Multi-scale Ensemble Learning
Authors: 季晟宇 (JI Shengyu), 江志康 (JIANG Zhikang), 马翔 (MA Xiang), 杨绿溪 (YANG Luxi)
Affiliation: School of Information Science and Engineering, Southeast University, Nanjing 211102, China
Fund Project: National Natural Science Foundation of China (61971128)

Abstract:

Fine-grained image recognition (FGIR) is an important research topic in computer vision. Its main goal is to distinguish visually similar subclasses within the same broad category. This paper studies weakly supervised FGIR. Existing work makes insufficient use of fine-grained image features and struggles to mine discriminative regions. To address both problems, the paper proposes the attention and multi-scale ensemble-learning based network (AMEN). The method introduces a progressive learning network that follows an ensemble-learning strategy. It builds multi-scale base classifiers in parallel on the output features of three levels of a deep neural network and trains them progressively with label smoothing, which greatly improves the utilization of low-level features. An efficient dual-channel attention mechanism assigns channel weights to the features, so the network can select features autonomously at the channel level and make better use of channels that carry highly relevant information. The method also introduces a self-attention region proposal network whose recurrent feedback mechanism drives the model to progressively locate more discriminative regions. The final classification module fuses the features of the complete image with those of the discriminative regions. Experiments show that AMEN achieves recognition accuracy on par with leading methods on three fine-grained image datasets: CUB-200-2011, FGVC Aircraft, and Stanford Cars.
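The abstract names label smoothing and a multi-scale ensemble of base classifiers but gives no formulas. The sketch below assumes the standard label-smoothing cross-entropy, in which the true class receives a target of 1 - ε + ε/K and every other class ε/K (K classes). As a hypothetical fusion rule, it averages the softmax outputs of the base classifiers. The function names, ε = 0.1, and the toy logits are illustrative assumptions rather than the paper's settings, and the progressive (stage-by-stage) training schedule is not shown.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def smoothed_targets(label, num_classes, eps=0.1):
    """Label smoothing: every class gets eps/K; the true class also gets 1 - eps."""
    base = eps / num_classes
    return [base + (1.0 - eps if k == label else 0.0) for k in range(num_classes)]

def label_smoothing_ce(logits, label, eps=0.1):
    """Cross-entropy between the smoothed target distribution and softmax(logits)."""
    probs = softmax(logits)
    targets = smoothed_targets(label, len(logits), eps)
    return -sum(t * math.log(p) for t, p in zip(targets, probs))

def ensemble_predict(per_scale_logits):
    """Hypothetical fusion rule: average the softmax outputs of all base classifiers."""
    prob_lists = [softmax(logits) for logits in per_scale_logits]
    n = len(prob_lists)
    avg = [sum(ps[k] for ps in prob_lists) / n for k in range(len(prob_lists[0]))]
    return max(range(len(avg)), key=avg.__getitem__), avg

# Toy logits from three base classifiers (e.g. low-, mid-, and high-level features), 4 classes.
scale_logits = [
    [2.0, 0.5, 0.1, -1.0],
    [1.2, 1.0, 0.3, -0.5],
    [0.8, 1.5, 0.2, 0.0],
]
for i, logits in enumerate(scale_logits):
    print(f"scale {i}: smoothed loss = {label_smoothing_ce(logits, label=0):.4f}")
pred, avg_probs = ensemble_predict(scale_logits)
print("ensemble prediction:", pred)
```

Averaging probabilities lets a confident low-level classifier outvote a single uncertain high-level one; here the third classifier alone favors class 1, but the ensemble still predicts class 0.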

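The abstract does not define the efficient dual-channel attention. As a minimal sketch, assume an ECA-style (Efficient Channel Attention) design extended to two descriptors. Global average pooling and global max pooling each summarize every channel, and a shared 1-D convolution mixes neighboring channels in each descriptor. The two results are summed and passed through a sigmoid to give one weight per channel, and every channel is then rescaled by its weight. The function `dual_channel_attention` and its fixed kernel are hypothetical; in a real network the kernel would be learned.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv1d_same(values, kernel):
    """1-D cross-correlation across channels with zero padding; output length = input length.
    Assumes an odd-length kernel so it is centered on each channel."""
    pad = len(kernel) // 2
    out = []
    for c in range(len(values)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = c + j - pad
            if 0 <= idx < len(values):
                acc += w * values[idx]
        out.append(acc)
    return out

def dual_channel_attention(feature_map, kernel=(0.25, 0.5, 0.25)):
    """feature_map: list of C channels, each a flat list of H*W activations.
    Returns (reweighted feature map, per-channel weights in (0, 1))."""
    avg_desc = [sum(ch) / len(ch) for ch in feature_map]  # global average pooling
    max_desc = [max(ch) for ch in feature_map]            # global max pooling
    # Shared 1-D conv on both descriptors, fused by addition, then sigmoid gating.
    logits = [a + m for a, m in zip(conv1d_same(avg_desc, kernel),
                                    conv1d_same(max_desc, kernel))]
    weights = [sigmoid(z) for z in logits]
    reweighted = [[w * v for v in ch] for w, ch in zip(weights, feature_map)]
    return reweighted, weights

# Toy feature map: 4 channels of a 2x2 spatial grid, flattened.
feature = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 4.0, 4.0, 4.0],
    [-1.0, -2.0, -1.0, -2.0],
]
out, w = dual_channel_attention(feature)
print("channel weights:", [round(x, 3) for x in w])
```

Because the convolution runs across the channel axis rather than through a fully connected bottleneck, the cost grows only with the kernel size, which is the usual argument for calling such channel attention "efficient".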
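The self-attention region proposal network is also described only at a high level. The sketch below substitutes a common attention-guided cropping heuristic rather than the paper's design. It sums the feature map over channels, keeps the cells above the mean activation, and takes their bounding box. Repeating the search inside the crop stands in for the recurrent feedback loop. The names `attention_crop_box`, `progressive_localize`, and `scale_box`, as well as the mean-threshold rule, are hypothetical. In a real network, each crop would be resized, re-encoded by the backbone, and its features fused with those of the full image.

```python
def attention_crop_box(activation_map, threshold_ratio=1.0):
    """activation_map: 2-D list (H x W) of channel-summed activations.
    Returns the inclusive box (top, left, bottom, right) of cells above ratio * mean."""
    h, w = len(activation_map), len(activation_map[0])
    thr = threshold_ratio * sum(sum(row) for row in activation_map) / (h * w)
    rows = [i for i in range(h) if any(v > thr for v in activation_map[i])]
    cols = [j for j in range(w) if any(activation_map[i][j] > thr for i in range(h))]
    if not rows:  # flat map: nothing stands out, so keep the whole region
        return 0, 0, h - 1, w - 1
    return min(rows), min(cols), max(rows), max(cols)

def progressive_localize(activation_map, steps=2, threshold_ratio=1.0):
    """Stand-in for the feedback loop: crop, then search again inside the crop.
    Returns the final inclusive box in coordinates of the original map."""
    top, left = 0, 0
    region = activation_map
    for _ in range(steps):
        t, l, b, r = attention_crop_box(region, threshold_ratio)
        region = [row[l:r + 1] for row in region[t:b + 1]]
        top, left = top + t, left + l
    return top, left, top + len(region) - 1, left + len(region[0]) - 1

def scale_box(box, fmap_size, image_size):
    """Map an inclusive feature-map box to pixel coordinates (bottom/right exclusive)."""
    top, left, bottom, right = box
    sy, sx = image_size[0] / fmap_size[0], image_size[1] / fmap_size[1]
    return int(top * sy), int(left * sx), int((bottom + 1) * sy), int((right + 1) * sx)

# Toy 6x6 channel-summed activation map with a peak near the center.
amap = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 5, 6, 0, 0],
    [0, 0, 7, 9, 1, 0],
    [0, 0, 1, 2, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(attention_crop_box(amap))                         # -> (1, 1, 4, 4)
print(progressive_localize(amap, steps=2))              # -> (2, 2, 3, 3)
print(scale_box((1, 1, 4, 4), (6, 6), (448, 448)))      # -> (74, 74, 373, 373)
```

The second pass raises the threshold because the mean is recomputed inside the smaller crop, so each iteration zooms in on the strongest responses. This is the intuition behind progressively locating more discriminative regions.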
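Cite this article

季晟宇 (JI Shengyu), 江志康 (JIANG Zhikang), 马翔 (MA Xiang), 杨绿溪 (YANG Luxi). Fine-grained image recognition method based on attention and multi-scale ensemble learning[J]. 数据采集与处理 (Journal of Data Acquisition and Processing), 2025, 40(2): 384-400.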
History
  • Received: 2024-04-29
  • Last revised: 2024-09-17
  • Published online: 2025-04-11