Unsupervised Video Person Re-identification Based on Multiple Kernel Dilated Convolution

Authors: 刘仲民, 张长凯, 胡文瑾

Affiliations:

1. School of Electrical Engineering and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China; 2. Key Laboratory of Gansu Advanced Control for Industrial Processes, Lanzhou 730050, China; 3. College of Mathematics and Computer Science, Northwest Minzu University, Lanzhou 730030, China

Fund Projects: National Natural Science Foundation of China (62061042); Open Fund of the Key Laboratory of Gansu Advanced Control for Industrial Processes (2022KX10).


    Abstract:

    Person re-identification aims to identify specific individuals across surveillance cameras, overcoming challenges such as pose variations, occlusions, and background noise that often lead to insufficient feature extraction. This paper proposes a novel unsupervised video-based person re-identification method that utilizes multi-kernel dilated convolution to provide a more comprehensive and accurate representation of individual differences and features. Initially, we employ a pre-trained ResNet50 as an encoder. To further enhance the encoder’s feature extraction capability, we introduce a multiple kernel dilated convolution module. Enlarging the receptive field of convolutional kernels allows the network to more effectively capture both local and global feature information, offering a more comprehensive depiction of a person’s appearance features. Subsequently, a decoder is employed to restore high-level semantic information to a more fundamental feature representation, thereby strengthening feature representation and improving system performance under complex imaging conditions. Finally, a multi-scale feature fusion module is introduced in the decoder output to merge features from adjacent layers, reducing semantic gaps between different feature channel layers and generating more robust feature representations. Offline experiments are conducted on three mainstream datasets, and results show that the proposed method achieves significant improvements in both accuracy and robustness.
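To make the receptive-field claim in the abstract concrete: a dilated kernel of size k with dilation rate d covers k + (k - 1)(d - 1) input positions, so parallel branches with different dilation rates see the input at several scales at once. The sketch below is illustrative only, not the paper's implementation; the 1-D setting and the function names `effective_receptive_field` and `dilated_conv1d` are assumptions chosen to show the mechanism with plain NumPy.

```python
import numpy as np

def effective_receptive_field(kernel_size: int, dilation: int) -> int:
    # A dilated kernel of size k with dilation d spans k + (k-1)(d-1) positions.
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def dilated_conv1d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    # "Valid" 1-D dilated convolution (cross-correlation form):
    # each output tap reads inputs spaced `dilation` apart.
    k = len(w)
    span = effective_receptive_field(k, dilation)
    out_len = len(x) - span + 1
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(out_len)
    ])

x = np.arange(10, dtype=float)
w = np.ones(3)

# The same 3-tap kernel at growing dilation rates covers a growing span,
# analogous to the parallel branches of a multi-kernel dilated module.
for d in (1, 2, 3):
    print(d, effective_receptive_field(3, d), dilated_conv1d(x, w, d))
```

With dilation 1, 2, and 3, the 3-tap kernel spans 3, 5, and 7 positions respectively; a multi-kernel module would concatenate or sum such branch outputs so that both local and wider context contribute to the person descriptor.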

Cite this article:

刘仲民, 张长凯, 胡文瑾. Unsupervised Video Person Re-identification Based on Multiple Kernel Dilated Convolution[J]. 数据采集与处理 (Journal of Data Acquisition and Processing), 2024, (5): 1192-1203.
History:

  • Received: 2023-11-27
  • Revised: 2024-02-26
  • Published online: 2024-10-14