Hybrid Convolutional Enhancement and Content-Aware Attention for Cross-Modality Person Re-identification
Author:
Affiliation:
College of Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Fund Project:
Abstract:
Cross-modality person re-identification, a research hotspot in computer vision, aims to match pedestrians across different imaging conditions. Existing methods focus on extracting modality-shared features but fail to fully mine the detailed features that are crucial for discriminating pedestrian identities. To address this issue, a cross-modality person re-identification method based on hybrid convolutional enhancement and content-aware attention (HCECA) is proposed, which aims to extract pedestrian features richer in detailed information. First, a hybrid convolutional enhancement (HCE) module is embedded in the backbone network to capture richer cross-modality feature representations, enhancing the distinctiveness and robustness of the features. Then, a content-aware attention (CA) module is employed to mine rich detailed information, thereby improving the discriminability of pedestrian features. Finally, experiments are conducted on the SYSU-MM01 and RegDB datasets. The proposed HCECA attains a Rank-1 accuracy of 72.21% and a mean average precision (mAP) of 69.89% in the all-search mode on the SYSU-MM01 dataset, and a Rank-1 accuracy of 92.23% and an mAP of 85.08% in the visible-to-infrared mode on the RegDB dataset, outperforming existing cross-modality person re-identification methods.
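The abstract names the two building blocks but not their internals. As a rough orientation only, the PyTorch sketch below shows one plausible way such modules could attach to a ResNet-style backbone feature map; the branch types, kernel sizes, reduction ratio, and gating scheme are all assumptions for illustration, not the authors' actual HCECA design.

```python
# Hypothetical sketch of the two components named in the abstract.
# Every design detail below (branch mix, kernel sizes, channel split,
# where the modules attach) is an assumption, not the paper's method.
import torch
import torch.nn as nn


class HybridConvEnhancement(nn.Module):
    """Assumed HCE block: parallel convolutions of different kinds,
    fused by a 1x1 conv and added back to the input (residual enhancement)."""

    def __init__(self, channels: int):
        super().__init__()
        self.std_conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        # Depthwise and dilated branches as one plausible "hybrid" mix.
        self.dw_conv = nn.Conv2d(channels, channels, 3, padding=1,
                                 groups=channels, bias=False)
        self.dil_conv = nn.Conv2d(channels, channels, 3, padding=2,
                                  dilation=2, bias=False)
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        branches = torch.cat(
            [self.std_conv(x), self.dw_conv(x), self.dil_conv(x)], dim=1)
        return x + self.fuse(branches)  # residual: enhance, don't replace


class ContentAwareAttention(nn.Module):
    """Assumed CA block: channel and spatial gates predicted from the
    feature content itself, reweighting positions that carry fine details."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, 7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)     # which channels carry identity cues
        return x * self.spatial_gate(x)  # where the informative details sit


if __name__ == "__main__":
    # e.g. a mid-level ResNet-50 feature map for a person crop
    feat = torch.randn(2, 256, 24, 12)
    out = ContentAwareAttention(256)(HybridConvEnhancement(256)(feat))
    print(out.shape)  # torch.Size([2, 256, 24, 12])
```

Both modules preserve the input shape, so under these assumptions they could be inserted between backbone stages without changing the rest of the network.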