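Abstract
With the rise of deep learning, person re-identification has become a popular topic in computer science. Given a query pedestrian image, it performs cross-camera retrieval to find pedestrians whose identity matches the query. However, because background, illumination, and other factors vary across viewpoints, the collected pedestrian images contain many hard samples, and a model trained on them tends to have low recognition performance and poor robustness. To improve the model's ability to discriminate hard samples, a novel method is designed that synthesizes images carrying hard-sample information by means of a confusion factor. For each batch of input images, the hard sample corresponding to each image is found through a similarity measure and combined with that image via the confusion factor to synthesize a new image containing hard-sample information. The model is then trained in a supervised manner to mine this information, which improves its robustness. Extensive comparative experiments show that the proposed method achieves high recognition rates on mainstream datasets, and ablation experiments demonstrate its effectiveness.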
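With the emergence of deep learning, person re-identification technology has undergone extensive
At present, the person re-identification pipeline mainly consists of two steps: (1) extracting effective features from pedestrian images; (2) measuring distances between these features. However, owing to the interference of the many hard samples in the datasets (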

Fig.1 Anchor image sets and hard negative sample sets
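With the continued development of deep learning, in 2014 Goodfellow
To address the problems of the above methods, this paper proposes a person re-identification method that enhances feature robustness through hard sample confusion. The main contributions are as follows: (1) A ResNet-50-based backbone is used and the image-level information of hard samples is fully exploited, which enables the network to extract more discriminative features. (2) The result of the distance measurement is used as a similarity score to find the hard sample most similar to the original image, and hard sample confusion is performed at the image level, which increases the diversity of information in the training samples and improves the generalization ability of the model. (3) No additional generator or discriminator is needed; hard-sample information is mined in depth within an end-to-end model, which strengthens robustness while keeping the network structure relatively simple. Experimental results on different datasets demonstrate the superiority of the proposed method and show that it effectively improves person re-identification performance.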
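In deep-learning-based person re-identification, the cross-entropy loss function is generally used
(1)
where the quantities denote, in order, the number of training images per batch, the probability that the input image is predicted as a given pedestrian identity, and an indicator function that judges whether the network's prediction is correct.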
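(2)
where the label term denotes the ground-truth pedestrian identity label of the input image. Because the commonly used cross-entropy loss relies excessively on the correct pedestrian labels in the training set, the network is prone to overfitting. A common remedy for this problem is label smoothing (LS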
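(3)
where the smoothing hyperparameter is introduced into the cross-entropy loss as a fault-tolerance factor during training; it is set to 0.1 in the experiments. Because the pedestrian identities in the training and test sets do not overlap, label smoothing effectively prevents overfitting during training and further improves the generalization ability of the model.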
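The smoothed loss described above can be illustrated with a minimal sketch. It assumes the common smoothing rule in which the true identity receives 1 - eps + eps/K and every other identity receives eps/K, where K is the number of identities. The function names and the example logits are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def smoothed_targets(label, num_classes, eps=0.1):
    """Assumed rule: eps / K on every class, plus (1 - eps) on the true class."""
    base = eps / num_classes
    return [base + (1.0 - eps if k == label else 0.0) for k in range(num_classes)]

def cross_entropy(logits, label, eps=0.0):
    """Cross-entropy for one sample; eps=0 recovers the one-hot (indicator) target."""
    probs = softmax(logits)
    targets = smoothed_targets(label, len(logits), eps)
    return -sum(q * math.log(p) for q, p in zip(targets, probs))

logits = [4.0, 1.0, 0.5, 0.2]          # hypothetical scores for 4 identities
hard = cross_entropy(logits, label=0, eps=0.0)
soft = cross_entropy(logits, label=0, eps=0.1)
print(f"one-hot loss: {hard:.4f}, smoothed loss: {soft:.4f}")
```

With a confident, correct prediction the smoothed loss is larger than the one-hot loss, which is how smoothing discourages the network from becoming over-confident on training identities.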
Another commonly used loss function for optimizing features is the triplet loss
(4)
where the threshold parameter is set during training.
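(5)
(6)
where a selected sample serves as the anchor, and the other two terms are the hard positive sample and the hard negative sample, both selected by Euclidean distance.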
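The triplet loss with hard positive and hard negative selection can be sketched as follows. This is a minimal illustration that assumes the usual batch-hard form: for each anchor, the farthest same-identity sample and the closest different-identity sample under Euclidean distance feed a hinge with a margin. The margin value 0.3, the function names, and the toy features are assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def batch_hard_triplet_loss(features, labels, margin=0.3):
    """Mean hinge loss over anchors, using the farthest positive and the closest negative."""
    losses = []
    for i, (f_a, y_a) in enumerate(zip(features, labels)):
        pos = [euclidean(f_a, f) for j, (f, y) in enumerate(zip(features, labels))
               if y == y_a and j != i]
        neg = [euclidean(f_a, f) for f, y in zip(features, labels) if y != y_a]
        if not pos or not neg:
            continue  # this anchor has no valid triplet in the batch
        losses.append(max(max(pos) - min(neg) + margin, 0.0))
    return sum(losses) / len(losses) if losses else 0.0

# Toy batch: two identities whose features interleave on a line.
features = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
labels = [0, 0, 1, 1]
loss = batch_hard_triplet_loss(features, labels)
print(loss)
```

In this toy batch every anchor has a hard positive at distance 2 and a hard negative at distance 1, so each hinge term is positive and the loss pushes the two identities apart.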

Fig.2 Schematic diagram of the network structure
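A key step of the proposed method is finding the hard sample of each image. A dataset contains many hard samples, and the main goal of this section is to pick the one most similar to the target image. Given an anchor sample together with its identity, the feature that the network extracts from it is compared, by distance, with the feature of every image of a different identity in the training batch, and the hard sample most similar to the anchor is selected. The procedure is as follows
(7)
where the first term is the number of images in each training batch (batch size), and the second is defined as the hard sample corresponding to a given sample in the batch, with its own identity label. Through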
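The selection step described above can be sketched as a nearest-neighbor search restricted to other identities. This minimal example assumes Euclidean distance on the extracted features. The function name `find_hard_samples` and the toy features are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_hard_samples(features, labels):
    """For each sample, return the index of the closest sample with a different identity.

    Returns None for a sample when the batch holds no other identity.
    """
    hard = []
    for i, (f_i, y_i) in enumerate(zip(features, labels)):
        best_j, best_d = None, float("inf")
        for j, (f_j, y_j) in enumerate(zip(features, labels)):
            if y_j == y_i:
                continue  # same identity (including the sample itself) is skipped
            d = euclidean(f_i, f_j)
            if d < best_d:
                best_j, best_d = j, d
        hard.append(best_j)
    return hard

features = [[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [5.0, 5.0]]
labels = [0, 0, 1, 2]
hard_idx = find_hard_samples(features, labels)
print(hard_idx)
```

The search costs one distance per pair of images in the batch, which is usually acceptable because the batch size is small compared with the dataset.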
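(8)
where the result is the generated confused sample and the weighting term is the confusion factor. Different values of the confusion factor change how much hard-sample information the generated sample contains; later experiments visualize the images generated with different values and discuss how the value affects performance. The label of the confused sample is kept the same as that of the anchor image. The reason is that hard sample confusion aims to make the current anchor image carry part of the hard sample's information, which produces a new confused image. Training on such images makes the model more discriminative, so that the network is less easily distracted by hard-sample information when recognizing pedestrians.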
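Image-level confusion can be sketched as below, assuming a pixel-wise convex combination in which the confusion factor `lam` is the share contributed by the hard sample. This convention, the default value 0.2, and all names are assumptions, and the exact form of Eq. (8) may differ. Images are represented as nested lists of grayscale values to keep the example dependency-free.

```python
def mix_images(anchor, hard, lam):
    """Pixel-wise blend of two equally sized images.

    lam is the share of the hard sample (an assumed convention).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [[(1.0 - lam) * a + lam * h for a, h in zip(row_a, row_h)]
            for row_a, row_h in zip(anchor, hard)]

def confuse_batch(images, labels, hard_idx, lam=0.2):
    """Blend every image with its hard sample; each label stays the anchor's label."""
    mixed = [mix_images(img, images[j], lam) if j is not None else img
             for img, j in zip(images, hard_idx)]
    return mixed, list(labels)

anchor = [[0.0, 100.0], [200.0, 50.0]]   # hypothetical 2x2 grayscale image
hard = [[100.0, 100.0], [0.0, 150.0]]    # its hard sample (different identity)
mixed, mixed_labels = confuse_batch([anchor, hard], [3, 8], hard_idx=[1, 0], lam=0.25)
print(mixed[0], mixed_labels)
```

Keeping the anchor's label means the network is supervised to recognize the anchor identity even though part of the input comes from a visually similar pedestrian of another identity.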
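To verify the effectiveness of the hard sample confusion algorithm, experiments are conducted on two large mainstream datasets, DukeMTMC-ReID and Market-1501, and on two small datasets, GRI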

Fig.3 Pedestrian samples from four different datasets
In the experiments, Rank-1, Rank-5, and Rank-10 of the cumulative match characteristic (CMC) curve and the mean average precision (mAP
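The evaluation measures named above can be sketched as follows. This is a simplified illustration that takes, for each query, the gallery identity labels already sorted by ascending distance. It omits the camera-based filtering that standard re-identification protocols apply, and all names and the toy rankings are hypothetical.

```python
def rank_k_hit(ranked_labels, query_label, k):
    """True if a correct identity appears among the top-k gallery results."""
    return any(lbl == query_label for lbl in ranked_labels[:k])

def average_precision(ranked_labels, query_label):
    """Mean of the precision values at each position holding a correct match."""
    hits, precisions = 0, []
    for pos, lbl in enumerate(ranked_labels, start=1):
        if lbl == query_label:
            hits += 1
            precisions.append(hits / pos)
    return sum(precisions) / len(precisions) if precisions else 0.0

def evaluate(rankings, ks=(1, 5, 10)):
    """rankings: list of (query_label, gallery labels sorted by ascending distance)."""
    n = len(rankings)
    cmc = {k: sum(rank_k_hit(r, q, k) for q, r in rankings) / n for k in ks}
    m_ap = sum(average_precision(r, q) for q, r in rankings) / n
    return cmc, m_ap

rankings = [
    (1, [1, 2, 1, 3]),   # first result is correct
    (2, [3, 2, 2, 1]),   # first correct result is at rank 2
]
cmc, m_ap = evaluate(rankings)
print(cmc, m_ap)
```

Rank-k only asks whether any correct match appears in the top k, whereas mAP rewards placing all correct matches early, so the two measures can rank methods differently.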
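This paper uses an ImageNet-pretrained ResNet-5
To evaluate the proposed algorithm, it is first compared with current high-performing methods on the large datasets Market-1501 and DukeMTMC-ReID.
On the Market-1501 dataset, the comparison covers three categories of methods: (1) methods based on deep learning and attributes. PA
Similarly, the proposed algorithm is compared on the DukeMTMC-ReID dataset with the three categories of mainstream algorithms above, with the addition of the pedestrian-mask-guided method Human parsin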
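Next, the method is compared with other methods on the small datasets PRID and GRID. Small datasets differ markedly from the large datasets Market-1501 and DukeMTMC-ReID, mainly in that they contain far fewer pedestrian images. Methods are therefore usually compared by Rank-1, Rank-5, Rank-10, and Rank-20 rather than by mAP: in some datasets a person may have only one image, so mAP cannot be computed. On PRID (
The experimental results on GRID are shown in
From
This section concerns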

Fig.4 Effect of on Rank-1 and mAP
This section discusses

Fig.5 Visualization of hard sample confusion
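To analyze the effect of different values on model performance, experiments are conducted on the Market-1501 dataset and reported in terms of Rank-1 and mAP. The experimental results are shown in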

Fig.6 Effect of on Rank-1 and mAP
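This section analyzes the metric used in hard sample selection. In the experiments, three different metrics are used to select the hard samples among the images of a training batch for confusion. The experimental results are shown in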
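How the choice of metric can change which hard sample is selected is illustrated below. The three metrics used in the experiments are not named in this passage, so Euclidean, Manhattan, and cosine distance serve here as hypothetical stand-ins, and the function names and toy features are assumptions as well.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

def nearest_other_identity(features, labels, i, metric):
    """Index of the sample closest to sample i among those with a different identity."""
    candidates = [j for j in range(len(features)) if labels[j] != labels[i]]
    return min(candidates, key=lambda j: metric(features[i], features[j]))

features = [[1.0, 0.0], [3.0, 0.0], [1.0, 1.0]]
labels = [0, 1, 2]
picks = {m.__name__: nearest_other_identity(features, labels, 0, m)
         for m in (euclidean, manhattan, cosine_distance)}
print(picks)
```

In this toy case the magnitude-sensitive metrics select sample 2, while cosine distance selects sample 1 because it points in the same direction as the anchor, so the metric directly determines which image is blended into the anchor.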
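To address the large number of hard samples that background, illumination, and other viewpoint-dependent factors introduce into datasets, this paper proposes a person re-identification method that enhances feature robustness through hard sample confusion. The hard sample of each image is found through a similarity measure, and a new image carrying hard-sample information is synthesized with a confusion factor. The model is then trained in a supervised manner to mine this information, which improves its robustness. Experimental results on multiple datasets show that the proposed algorithm outperforms current hard-sample mining methods and several mainstream deep learning methods. Ablation experiments further confirm its effectiveness, the subsequent parameter analysis selects the best parameters for each dataset, and the hard sample visualization illustrates the confusion process and the role of the confusion factor. Future work will study how to find hard samples more quickly and effectively among images with more similar styles, so as to further improve re-identification accuracy.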
References
Cao Liang, Wang Hongyuan, Dai Chenchao, et al. Unsupervised video-based person re-identification based on diversity constraint and dispersion hierarchical clustering[J]. Journal of Nanjing University of Aeronautics & Astronautics, 2020, 52(5): 752-759.
Luo Hao, Jiang Wei, Fan Xing, et al. A survey on deep learning based person re-identification[J]. Acta Automatica Sinica, 2019, 45(11): 2032-2049.
Zhang Wenwen, Wang Hongyuan, Wan Jianwu, et al. A sparsity-learning-based person re-identification algorithm[J]. Journal of Data Acquisition and Processing, 2018, 33(5): 91-100.
Liu Hao, Feng Jiashi, Qi Meibin, et al. End-to-end comparative attention networks for person re-identification[J]. IEEE Transactions on Image Processing, 2017, 26(7): 3492-3506.
Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C]//Proceedings of the 2014 Advances in Neural Information Processing Systems (NIPS). Montréal, Canada: MIT Press, 2014: 2672-2680.
Wei Longhui, Zhang Shiliang, Tian Qi, et al. Person transfer GAN to bridge domain gap for person re-identification[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2018: 79-88.
Huang Yan, Xu Jinsong, Wu Qiang, et al. Multi-pseudo regularized label for generated data in person re-identification[J]. IEEE Transactions on Image Processing, 2019, 28(3): 1391-1403.
Chen Xi, Duan Yan, Houthooft R, et al. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets[J]. Advances in Neural Information Processing Systems, 2016, 29: 2172-2180.
Deng Weijian, Zheng Liang, Kang Guoliang, et al. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2018: 994-1003.
Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 2818-2826.
He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al. Deep residual learning for image recognition[C]//Proceedings of International Conference on Computer Vision and Pattern Recognition. [S.l.]: [s.n.], 2016: 770-778.
Hermans A, Beyer L, Leibe B. In defense of the triplet loss for person re-identification[J/OL]. (2017-05-17) [2020-04-20]. https://arxiv.org/abs/1703.07737.
Loy C C, Xiang T, Gong S. Time-delayed correlation analysis for multi-camera activity understanding[J]. International Journal of Computer Vision, 2010, 90(1): 106-129.
Hirzer M, Beleznai C, Roth P M, et al. Person re-identification by descriptive and discriminative classification[C]//Proceedings of Scandinavian Conference on Image Analysis. [S.l.]: Springer, 2011: 91-102.
Zheng Liang, Shen Liyue, Tian Lu, et al. Scalable person re-identification: A benchmark[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 1116-1124.
Ristani E, Solera F, Zou R, et al. Performance measures and a data set for multi-target, multicamera tracking[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer, 2016: 17-35.
Luo H, Gu Y, Liao X, et al. Bag of tricks and a strong baseline for deep person re-identification[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. [S.l.]: IEEE, 2019.
Kingma D P, Ba J. Adam: A method for stochastic optimization[J]. The International Conference for Learning Representations (ICLR). arXiv preprint arXiv:1412.6980, 2014.
Lin Yutian, Zheng Liang, Zheng Zhedong, et al. Improving person re-identification by attribute and identity learning[J]. Pattern Recognition, 2019, 95: 151-161.
Lv Jianming, Chen Weihang, Li Qing, et al. Unsupervised cross-dataset person re-identification by transfer learning of spatial-temporal patterns[C]//Proceedings of Computer Vision and Pattern Recognition. [S.l.]: [s.n.], 2018: 7948-7956.
Tang Y, Yang X, Wang N, et al. Person re-identification with feature pyramid optimization and gradual background suppression[J]. Neural Networks, 2020, 124: 223-232.
Si T, Zhang Z, Liu S. Discrimination-aware integration for person re-identification in camera networks[J]. IEEE Access, 2019, 7: 33107-33114.
Chen Y, Duffner S, Stoian A, et al. List-wise learning-to-rank with convolutional neural networks for person re-identification[J]. Machine Vision and Applications, 2021, 32(2): 1-14.
Li Wei, Zhu Xiatian, Gong Shaogang. Harmonious attention network for person re-identification[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2018: 2285-2294.
Xu Longzhuang, Peng Li. Person reidentification based on multiscale convolutional feature fusion[J]. Laser & Optoelectronics Progress, 2019, 56(14): 221-227.
Yuan Ye, Chen Wuyang, Yang Yang, et al. In defense of the triplet loss again: Learning robust person re-identification with fast approximated triplet loss and label distillation[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). [S.l.]: IEEE, 2020: 354-355.
Kalayeh M M, Basaran E, Gokmen M. Human semantic parsing for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2018: 1062-1071.
Hirzer M, Roth P M, Köstinger M, et al. Relaxed pairwise learned metric for person re-identification[C]//Proceedings of European Conference on Computer Vision. Berlin, Heidelberg: Springer, 2012: 780-793.
Lisanti G, Masi I, Del Bimbo A. Matching people across camera views using kernel canonical correlation analysis[C]//Proceedings of the International Conference on Distributed Smart Cameras. [S.l.]: [s.n.], 2014: 1-6.
Xiong F, Gou M, Camps O, et al. Person re-identification using kernel-based metric learning methods[C]//Proceedings of European Conference on Computer Vision. Cham: Springer, 2014: 1-16.
Zhang L, Xiang T, Gong S. Learning a discriminative null space for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2016: 1239-1248.
Su C, Yang F, Zhang S, et al. Multi-task learning with low rank attribute embedding for multi-camera person re-identification[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(5): 1167-1181.
Matsukawa T, Okabe T, Suzuki E, et al. Hierarchical gaussian descriptor for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2016: 1363-1372.
An L, Chen X, Yang S, et al. Person re-identification by multi-hypergraph fusion[J]. IEEE Transactions on Neural Networks and Learning Systems, 2016, 28(11): 2763-2774.
Sun C, Wang D, Lu H. Person re-identification via distance metric learning with latent variables[J]. IEEE Transactions on Image Processing, 2016, 26(1): 23-34.
Yang X, Wang M, Hong R, et al. Enhancing person re-identification in a self-trained subspace[J]. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2017, 13(3): 1-23.
Chen Y C, Zhu X, Zheng W S, et al. Person re-identification by camera correlation aware feature augmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(2): 392-408.
Tao D, Guo Y, Yu B, et al. Deep multi-view feature learning for person re-identification[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2017, 28(10): 2657-2666.
Dai J, Zhang Y, Lu H, et al. Cross-view semantic projection learning for person re-identification[J]. Pattern Recognition, 2018, 75: 63-76.
Lei J, Niu L, Fu H, et al. Person re-identification by semantic region representation and topology constraint[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 29(8): 2453-2466.
Li H, Xu J, Zhu J, et al. Top distance regularized projection and dictionary learning for person re-identification[J]. Information Sciences, 2019, 502: 472-491.