Person Re-identification Method Based on Improved Transformer Encoder and Feature Fusion
Author:
Affiliation:
School of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 201306, China
Fund Project:
Abstract:
To address the low recognition accuracy of Transformer encoders in person re-identification, which stems from the loss of image-patch information and the insufficient expression of local person features, this paper proposes a person re-identification algorithm based on an improved Transformer encoder and feature fusion. First, because the Transformer loses the relative position information of person image patches during the attention operation, relative position encoding is introduced so that the network attends to the semantic feature information of the patches, strengthening its ability to extract person features. Second, to highlight the salient features of regions containing the person, a local patch attention module is embedded in the Transformer network to weight and reinforce key local feature information. Finally, global and local features are fused so that the two kinds of features complement each other, improving the recognition ability of the model. In the training stage, Softmax and triplet loss functions jointly optimize the network. The proposed algorithm is evaluated on two mainstream datasets, Market1501 and DukeMTMC-reID, where Rank-1 accuracy reaches 97.5% and 93.5%, respectively, and mean average precision (mAP) reaches 92.3% and 83.1%, respectively. The experimental results show that the improved Transformer encoder and feature fusion algorithm can effectively improve the accuracy of person re-identification.
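The joint optimization with Softmax and triplet losses mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the triplet mining strategy, the margin, or how the two losses are weighted, so the batch-hard mining, the `margin=0.3` default, the `weight` parameter, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet(features, labels, margin=0.3):
    """Batch-hard triplet loss (assumed mining strategy).

    For each anchor, take the farthest same-identity sample and the
    closest different-identity sample, then average the hinge terms.
    """
    losses = []
    for i, f_i in enumerate(features):
        pos = [euclidean(f_i, f_j) for j, f_j in enumerate(features)
               if j != i and labels[j] == labels[i]]
        neg = [euclidean(f_i, f_j) for j, f_j in enumerate(features)
               if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # anchor has no valid positive or negative
        losses.append(max(0.0, margin + max(pos) - min(neg)))
    return sum(losses) / len(losses) if losses else 0.0


def joint_loss(logits, features, labels, margin=0.3, weight=1.0):
    """Identity (Softmax) loss plus weighted triplet loss."""
    id_loss = sum(softmax_cross_entropy(z, y)
                  for z, y in zip(logits, labels)) / len(labels)
    return id_loss + weight * batch_hard_triplet(features, labels, margin)
```

In this sketch the Softmax term pushes each feature toward its identity class, while the triplet term pulls same-identity features together and pushes different identities apart by at least the margin. In practice the two terms would be computed on the fused global and local features produced by the network.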