Abstract: With the widespread application of visual perception technologies in fields such as intelligent security, behavioral analysis, and urban transportation, viewpoint-induced feature distribution shifts have become a key challenge in person re-identification. While traditional Convolutional Neural Networks (CNNs) excel at capturing local details, they struggle to model cross-view global dependencies and to maintain semantic consistency. Transformers, on the other hand, offer strong global modeling but suffer from computational redundancy and poor generalization in high-dimensional settings. To address these challenges, we propose a multi-view cooperative feature encoding framework that integrates fine-grained local representation with global feature alignment. The framework first uses a CNN backbone to extract detailed features and then employs a Cross-View Neighborhood Transformer for low-rank modeling. By incorporating a mutual-neighborhood sparse attention mechanism, it enhances cross-camera contextual interactions and reduces redundancy in multi-view feature fusion. Additionally, an adaptive metric combination strategy is introduced to improve discriminability and recognition accuracy in complex environments. Experiments on three public benchmarks (Market1501, DukeMTMC-ReID, and MSMT17) show that the proposed method outperforms existing approaches with mAP/Rank-1 scores of 91.7%/96.1%, 85.2%/92.4%, and 63.5%/83.6%, respectively, demonstrating strong generalization and application potential.
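
To make the core mechanism concrete, the sketch below illustrates one plausible form of a mutual-neighborhood sparse attention step in PyTorch: attention weights are kept only for token pairs that appear in each other's top-k neighbor sets, which is one way to suppress redundant cross-view interactions. The function name, the neighbor count, and the exact masking rule are illustrative assumptions and not necessarily the paper's formulation.

```python
import torch
import torch.nn.functional as F

def mutual_neighbor_sparse_attention(q, k, v, num_neighbors=8):
    """Minimal sketch of a mutual-neighborhood sparse attention step.

    q, k, v: (batch, tokens, dim) tensors, e.g. CNN feature-map patches
    from different camera views concatenated along the token axis.
    A pair (i, j) is attended to only if i is among j's top-k neighbors
    AND j is among i's top-k neighbors; all other pairs are masked out.
    """
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # (B, N, N)

    # directional k-nearest-neighbor mask (row-wise top-k of the scores)
    topk = scores.topk(num_neighbors, dim=-1).indices
    nn_mask = torch.zeros_like(scores, dtype=torch.bool)
    nn_mask.scatter_(-1, topk, True)

    # mutual neighborhood: keep (i, j) only if both directions agree
    mutual = nn_mask & nn_mask.transpose(-2, -1)

    scores = scores.masked_fill(~mutual, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    attn = torch.nan_to_num(attn)   # guard rows with no mutual neighbor
    return attn @ v

# toy usage: 2 views x 64 patch tokens, each of dimension 256
x = torch.randn(1, 128, 256)
out = mutual_neighbor_sparse_attention(x, x, x, num_neighbors=8)
print(out.shape)  # torch.Size([1, 128, 256])
```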