Three-dimensional (3-D) human target detection has important applications in intelligent security, robotics, autonomous driving, and other fields. At present, 3-D human target detection methods based on radar and image data fusion mainly adopt a two-stage network structure: the first stage selects candidate bounding boxes with a high target probability, and the second stage performs classification and regression on those candidate boxes. Although preselecting candidate bounding boxes allows the two-stage structure to achieve higher detection and localization accuracy, the complexity of the network limits its running speed, so it cannot be applied in scenarios with strict real-time requirements. To solve this problem, this paper studies a real-time detection method for 3-D human targets based on an improved RetinaNet. A backbone network and a feature pyramid network are combined to extract point cloud and image features, and the fused feature anchors are fed into the functional network, which outputs the 3-D bounding boxes and target category information. By using a one-stage network structure, the method directly regresses the category probabilities and position coordinates of the targets, and it addresses the imbalance between positive and negative samples during one-stage network training by introducing the focal loss function. Experiments on the KITTI dataset show that the proposed method outperforms the compared algorithms in both average precision and running time, and effectively balances detection accuracy and real-time performance.
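To illustrate how a focal loss counters the positive/negative sample imbalance mentioned above, the following is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), evaluated for a single anchor. The function names and the values alpha = 0.25 and gamma = 2.0 are assumptions chosen for illustration (they are the commonly used RetinaNet defaults); the abstract does not state the settings or the exact loss variant used in this work.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Plain binary cross-entropy for one anchor.

    p: predicted probability of the positive (human) class.
    y: ground-truth label, 1 for positive, 0 for negative.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -math.log(p_t)


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one anchor (illustrative sketch).

    alpha and gamma are assumed hyperparameters, not values
    taken from the paper.
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)               # clamp to avoid log(0)
    # (1 - p_t)^gamma shrinks the loss of well-classified anchors.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy_negative = focal_loss(0.1, 0)  # background anchor, confidently rejected
    hard_negative = focal_loss(0.9, 0)  # background anchor, wrongly scored high
    print(f"easy negative: {easy_negative:.6f}")
    print(f"hard negative: {hard_negative:.6f}")
```

Because the modulating factor (1 - p_t)^gamma is close to zero for confidently classified anchors, the many easy background anchors contribute very little to the total loss, so training is driven mainly by the hard examples. With gamma = 0 the expression reduces to alpha-weighted cross-entropy.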
Table 2 Comparison of human target detection results based on different models
Table 1 Comparison of the average precision of human target detection for the improved RetinaNet model based on different data sources