Abstract: Although person re-identification has made significant progress, occlusion remains a challenge in practical application scenarios. To extract more effective features from occluded pedestrians, a learnable mask and position encoding (LMPE) method is proposed. First, a learnable dual attention mask generator (LDAMG) is introduced to adapt to different occlusion patterns, significantly improving re-identification accuracy for occluded pedestrians. The generated masks make the network more flexible across diverse occlusion situations, and the network also learns contextual information through the masks, further improving its understanding of the scene. In addition, an occlusion aware position encoding fusion (OAPEF) module is introduced to mitigate the loss of position information in the Transformer. This module fuses position encodings from different regions, giving the network stronger expressive ability; integrating position encodings from all directions enables the network to capture spatial correlations between pedestrians more accurately and improves its adaptability to occlusion. Finally, experiments demonstrate that LMPE performs well on the occluded datasets Occluded-Duke and Occluded-ReID as well as the unoccluded datasets Market-1501 and DukeMTMC-ReID, confirming the effectiveness and superiority of the proposed method.
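To make the masking idea concrete, below is a minimal PyTorch-style sketch, not the paper's implementation: it assumes "dual attention" combines a channel branch and a spatial branch into a soft mask applied to backbone features, and all names here (LearnableDualAttentionMask, reduction, feats) are hypothetical.

```python
import torch
import torch.nn as nn

class LearnableDualAttentionMask(nn.Module):
    """Hypothetical sketch of a dual-attention mask generator.

    Assumes the two attentions are a channel branch and a spatial
    branch whose product forms a soft occlusion mask; the paper's
    actual LDAMG architecture may differ.
    """
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight channels.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: score each spatial location in [0, 1].
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Soft mask: high where features are reliable, low where
        # occlusion is likely; occluded regions are suppressed while
        # surrounding context is preserved.
        mask = self.channel(x) * self.spatial(x)
        return x * mask


feats = torch.randn(2, 256, 24, 8)   # (batch, C, H, W) backbone features
masked = LearnableDualAttentionMask(256)(feats)
print(masked.shape)  # torch.Size([2, 256, 24, 8])
```

Because the mask is produced by learnable layers rather than a fixed heuristic, it can adapt end-to-end to different occlusion patterns, which is the flexibility the abstract attributes to LDAMG.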
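Likewise, the following is a hedged sketch of directional position encoding fusion, under the assumption that "all directions" means learnable row-wise and column-wise encodings scanned in both orders and merged by a linear layer before being added to the Transformer's patch tokens; the actual OAPEF design may differ, and DirectionalPositionEncodingFusion and fuse are illustrative names only.

```python
import torch
import torch.nn as nn

class DirectionalPositionEncodingFusion(nn.Module):
    """Hypothetical sketch of fusing position encodings from
    multiple scan directions into one 2-D encoding."""
    def __init__(self, dim: int, h: int, w: int):
        super().__init__()
        self.h, self.w = h, w
        # One learnable 1-D table per axis; reversed copies give the
        # opposite scan directions.
        self.row = nn.Parameter(torch.randn(h, dim) * 0.02)  # top-to-bottom
        self.col = nn.Parameter(torch.randn(w, dim) * 0.02)  # left-to-right
        self.fuse = nn.Linear(4 * dim, dim)                  # merge 4 directions

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, h*w, dim) patch embeddings
        row = self.row.unsqueeze(1).expand(self.h, self.w, -1)
        row_r = torch.flip(self.row, dims=[0]).unsqueeze(1).expand(self.h, self.w, -1)
        col = self.col.unsqueeze(0).expand(self.h, self.w, -1)
        col_r = torch.flip(self.col, dims=[0]).unsqueeze(0).expand(self.h, self.w, -1)
        # Concatenate the four directional encodings and project back
        # to the token dimension, then add to every token.
        pe = self.fuse(torch.cat([row, row_r, col, col_r], dim=-1))  # (h, w, dim)
        return tokens + pe.reshape(1, self.h * self.w, -1)


tokens = torch.randn(2, 24 * 8, 768)  # ViT-style patch tokens
out = DirectionalPositionEncodingFusion(768, 24, 8)(tokens)
print(out.shape)  # torch.Size([2, 192, 768])
```

The point of fusing several scan directions is that each token's encoding then reflects its position relative to every image border, so spatial relations remain recoverable even when part of the pedestrian is occluded.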