Human Target Detection Method Based on Fusion of Radar and Image Data

doi:10.16337/j.1004-9037.2021.02.014

2021, No. 2, pp. 324-333

Affiliation: Anhui Narui Jiyuan Power Grid Technology Co Ltd, Hefei 230088, China

Abstract:

Three-dimensional (3-D) human target detection has important applications in intelligent security, robotics, autonomous driving and other fields. Existing 3-D human target detection methods based on radar and image data fusion mainly adopt a two-stage network structure: the first stage selects candidate bounding boxes with high target probability, and the second stage classifies the candidates and regresses their bounding boxes. Preselecting candidate bounding boxes improves the detection accuracy and localization precision of two-stage networks, but the relatively complex network structure limits the running speed, making it difficult to meet the needs of scenarios with strict real-time requirements. To address this problem, this paper studies a real-time 3-D human target detection method based on an improved RetinaNet. A backbone network is combined with a feature pyramid network to extract features from radar point clouds and images, and the fused feature anchors are fed into the functional network, which outputs 3-D bounding boxes and target category information. The method uses a one-stage network structure to directly regress the category probabilities and position coordinates of targets, and introduces the focal loss function to address the imbalance between positive and negative samples during one-stage network training. Experiments on the KITTI dataset show that the proposed method outperforms the compared algorithms in both average precision and time consumption for 3-D human target detection, and effectively balances detection accuracy and real-time performance.
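The focal loss mentioned in the abstract can be illustrated with a short sketch. This is the standard binary focal loss commonly associated with RetinaNet, written per anchor in plain Python; it is not the authors' implementation. The function names `focal_loss` and `cross_entropy`, and the defaults `alpha=0.25` and `gamma=2.0`, are assumptions chosen for illustration, since the abstract gives no hyperparameters.

```python
import math


def cross_entropy(p, y, eps=1e-7):
    """Plain binary cross-entropy for one anchor (baseline for comparison)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p if y == 1 else 1.0 - p)


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one anchor.

    p     : predicted probability that the anchor contains a target
    y     : ground-truth label, 1 (target) or 0 (background)
    alpha : class-balancing weight (assumed default, not from the paper)
    gamma : focusing parameter (assumed default, not from the paper)
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified anchors
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background anchor contributes far less than a hard positive one.
easy_negative = focal_loss(0.01, 0)
hard_positive = focal_loss(0.10, 1)
```

In a one-stage detector most anchors are easy background samples, so down-weighting them in this way keeps the few positive anchors from being swamped during training.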

Table 1 Comparison of human target detection average precision of improved RetinaNet model based on different data sources

Table 2 Comparison of human target detection results based on different models