融合注意力机制的双路径孪生视觉跟踪方法
作者:
作者单位:

1.昆明理工大学信息工程与自动化学院,昆明 650500;2.昆明理工大学云南省计算机技术应用重点实验室,昆明 650500


基金项目:

国家自然科学基金(61971208, 61671225, 52061020, 61702128);云南省应用基础研究计划重点项目(2018FA034);云南省中青年学术技术带头人后备人才计划(Shen Tao, 2018);云南省万人计划青年拔尖人才计划(沈韬,朱艳,云南省人社厅No.2018 73);昆明理工大学人才培养计划(KKSY201703016)。


Dual-Path Siamese Network Visual Tracking Method with Attention Mechanism
Author:
Affiliation:

1.Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;2.Yunnan Key Laboratory of Computer Technologies Application, Kunming University of Science and Technology, Kunming 650500, China

Fund Project:

    摘要:

    传统基于孪生网络的视觉跟踪方法在训练时是通过从大量视频中提取成对帧并且在线下独立进行训练而成,缺乏对模型特征的更新,并且会忽略背景信息,在背景驳杂等复杂环境下跟踪精度较低。针对上述问题,提出了一种融合注意力机制的双路径孪生网络视觉跟踪算法。该算法主要包括特征提取器部分和特征融合部分。特征提取器部分对残差网络进行改进,设计了一种双路径网络模型;通过结合残差网络对前层特征的复用性和密集连接网络对新特征的提取,将2种网络拼接后用于特征提取;同时采用膨胀卷积代替传统卷积方式,在保持一定感受野的情况下提高了分辨率。这种双路径特征提取方式可以隐式地更新模型特征,获得更准确的图像特征信息。特征融合部分引入注意力机制,对特征图不同部分分配权重。通道域上筛选出有价值的目标图像信息,增强通道间的相互依赖;空间域上则更加关注局部重要信息,学习更丰富的上下文联系,有效地提高了目标跟踪的精度。为证明该方法的有效性,在OTB100和VOT2016数据集上进行验证,分别使用精确率(Precision)、成功率(Success rate)和平均重叠期望(Expected average overlap rate, EAO)作为评价标准。结果显示,本文算法的精确率、成功率和平均重叠期望分别为0.868、0.641和0.350;相比基准模型分别提高了5.1%、2.0%和0.9%。结果证明本文算法充分利用了不同网络的优点,在保证模型精度的同时,能够较好地适应目标外观的变化,降低相似物的干扰,取得更稳定的跟踪效果。

    Abstract:

    Traditional Siamese-network-based visual tracking methods extract pairs of frames from a large number of videos and train on them offline and independently. They lack updates to the model features and neglect background information, so tracking accuracy is relatively low in complex environments such as background clutter. To address these problems, this paper proposes a dual-path Siamese network visual tracking method with an attention mechanism. The method consists of a feature extractor part and a feature fusion part. In the feature extractor part, the residual network is improved and a dual-path network model is designed: by combining the residual network's reuse of earlier-layer features with the dense network's extraction of new features, the two networks are spliced together for feature extraction. At the same time, dilated convolution replaces traditional convolution, improving resolution while maintaining a sufficient receptive field. This dual-path feature extraction implicitly updates the model features and thus obtains more accurate image feature information. Moreover, an attention mechanism is introduced in the feature fusion part, which assigns different weights to different parts of the feature maps. In the channel domain, the method screens out valuable target image information and enhances the interdependence between channels; in the spatial domain, it pays more attention to locally important information and learns richer contextual connections, which effectively improves the accuracy of object tracking. To confirm the effectiveness of the method, experiments are conducted on the OTB100 and VOT2016 datasets, using precision, success rate and expected average overlap rate (EAO) as the evaluation criteria. The proposed method achieves 0.868, 0.641 and 0.350 respectively on the two datasets, improvements of 5.1%, 2.0% and 0.9% over the benchmark model. The results show that the proposed method makes full use of the advantages of different networks: while ensuring model accuracy, it adapts well to changes in target appearance, reduces interference from similar objects, and achieves a more stable tracking effect.
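The dual-path idea above combines two ways of propagating features. A minimal NumPy sketch (toy fully connected "layers" with random weights, not the paper's actual network) contrasts them: a residual path adds new features to its input, so earlier features are reused in place and the shape is unchanged, while a dense path concatenates new features, so earlier features are kept verbatim and the channel count grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(x, w):
    """Toy fully connected layer with ReLU, standing in for a conv block."""
    return np.maximum(x @ w, 0.0)

x = rng.standard_normal((4, 16))        # a batch of 4 feature vectors
w_res = rng.standard_normal((16, 16))
w_dense = rng.standard_normal((16, 8))

# Residual path: new features are ADDED to the input -> earlier features
# are reused implicitly, output shape stays (4, 16).
res_out = x + layer(x, w_res)

# Dense path: new features are CONCATENATED -> earlier features are kept
# verbatim and 8 new channels are appended, output shape becomes (4, 24).
dense_out = np.concatenate([x, layer(x, w_dense)], axis=1)
```

Splicing the two paths, as the abstract describes, lets the extractor both reuse earlier-layer features (residual behavior) and keep extracting genuinely new ones (dense behavior).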
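The abstract also notes that dilated convolution keeps a large receptive field at higher resolution. A small 1-D NumPy sketch (valid mode, illustrative only) shows why: a 3-tap kernel with dilation 2 spans 5 input positions while still using only 3 weights, since the effective span is (k-1)*d + 1.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Valid-mode 1-D dilated convolution (correlation form).

    Returns the output sequence and the effective receptive field,
    i.e. how many input positions each output value sees.
    """
    k = len(w)
    span = (k - 1) * dilation + 1          # effective receptive field
    taps = np.arange(k) * dilation          # dilated tap positions
    out = np.array([np.dot(x[i + taps], w)
                    for i in range(len(x) - span + 1)])
    return out, span

x = np.arange(10, dtype=float)
w = np.array([1.0, 1.0, 1.0])
y1, rf1 = dilated_conv1d(x, w, dilation=1)  # ordinary 3-tap conv, RF = 3
y2, rf2 = dilated_conv1d(x, w, dilation=2)  # same 3 weights, RF = 5
```

With the same parameter count, the dilated version covers a wider context, which is why it can replace stride/pooling without shrinking the feature map.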
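The channel-domain attention described above follows the squeeze-and-excitation pattern (the SE block of Fig.8): globally pool each channel, pass the pooled vector through a small bottleneck, and sigmoid-gate the channels. A hedged NumPy sketch (random toy weights; the reduction ratio of 4 is an assumption, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def se_reweight(feat, w1, w2):
    """Squeeze-and-Excitation style channel attention.

    feat: (C, H, W) feature map.  Squeeze = global average pool per
    channel; excitation = FC -> ReLU -> FC -> sigmoid; the resulting
    per-channel gate in (0, 1) rescales the feature map.
    """
    z = feat.mean(axis=(1, 2))               # squeeze: (C,)
    s = np.maximum(z @ w1, 0.0) @ w2         # excitation bottleneck
    gate = 1.0 / (1.0 + np.exp(-s))          # sigmoid gate per channel
    return feat * gate[:, None, None], gate

feat = rng.standard_normal((8, 6, 6))        # toy (C, H, W) feature map
w1 = rng.standard_normal((8, 2))             # assumed reduction ratio 4
w2 = rng.standard_normal((2, 8))
out, gate = se_reweight(feat, w1, w2)
```

Informative channels receive gates near 1 and are passed through; uninformative ones are suppressed, which is the "screening valuable target information and enhancing channel interdependence" step of the abstract.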

    表 1 不同backbone在OTB100数据集上实验结果对比 Table 1 Experimental results of different backbones on OTB100 dataset
    表 2 OTB100上未添加和添加注意力结果对比 Table 2 Experimental results with and without attention on OTB100 dataset
    表 3 OTB100上完整消融实验结果 Table 3 Complete ablation experiment results on OTB100 dataset
    表 4 不同算法在OTB100和VOT2016上的结果对比 Table 4 Comparison of results of different algorithms on OTB100 and VOT2016 datasets
    图1 AlexNet网络结构图 Fig.1 Structure diagram of AlexNet
    图2 跳层连接示意图 Fig.2 Shortcut connection schematic
    图3 DenseNet网络结构图 Fig.3 Structure diagram of DenseNet
    图4 空间注意力结构图 Fig.4 Structure diagram of spatial attention
    图5 整体网络框架图 Fig.5 Frame diagram of the whole network
    图6 双路径网络结构图 Fig.6 Structure diagram of DualpathNet
    图7 膨胀卷积结构图 Fig.7 Structure diagram of dilated convolution
    图8 SE模块结构图 Fig.8 Structure diagram of the SE block
    图9 可视化CAM热力图 Fig.9 Visualized CAM heat map
    图10 特征融合模块 Fig.10 Feature fusion block
    图11 目标分类和回归模块 Fig.11 Target classification and regression block
    图12 样例图 Fig.12 Sample images
    图13 低分辨率和超出视野情况下精确度结果图 Fig.13 Precision results under low resolution and out-of-view conditions
    图14 超出视野和背景驳杂情况下成功率结果图 Fig.14 Success rate results under out-of-view and background clutter conditions
    图15 背景驳杂和形变情况下精确度结果图 Fig.15 Precision results under background clutter and deformation conditions
    图16 低分辨率和形变情况下成功率结果图 Fig.16 Success rate results under low resolution and deformation conditions
    图17 OTB100数据集综合性能指标 Fig.17 Integrated performance index on OTB100 dataset
引用本文

谢江,朱艳,沈韬,曾凯,刘英莉.融合注意力机制的双路径孪生视觉跟踪方法[J].数据采集与处理,2022,37(1):94-107

历史
  • 收稿日期:2021-03-26
  • 最后修改日期:2021-06-16
  • 录用日期:
  • 在线发布日期: 2022-01-29