Abstract: Accurate 3D object detection and localization are critical for UAV-based inspection and obstacle avoidance. Traditional methods often integrate detection and localization within a unified network, resulting in complex architectures, high computational costs, and difficulties in real-time deployment. To address these issues, we propose a lightweight 3D object detection and localization framework based on network decoupling. First, a lightweight 2D object detection network is designed, incorporating efficient feature extraction and an enhanced attention mechanism, which significantly reduces the number of parameters while improving generalization across diverse target types. Second, we introduce a visual/LiDAR fusion-based depth completion network with cross-layer connections and auxiliary loss functions to achieve high-precision dense depth map estimation. Finally, a pixel/depth alignment scheme is developed to compute the 3D spatial positions of detected objects accurately via coordinate transformation. Experimental results demonstrate that the proposed method improves object detection accuracy by 14% over the YOLOv9 detection algorithm and 3D localization accuracy by 45% over the AVOD framework. Moreover, the proposed approach achieves a processing rate of 36 frames per second on UAV edge devices, a 90% increase over AVOD, highlighting its practical value for real-time UAV-based object detection applications.
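For illustration only, the pixel/depth alignment step can be read as a standard pinhole back-projection followed by an extrinsic transform; the intrinsics (f_x, f_y, c_x, c_y) and extrinsics (R, t) below are generic assumptions and are not taken from the paper's calibration:

% Sketch under a standard pinhole model (assumed, not the paper's exact formulation):
% a detected pixel (u, v) with completed depth Z is lifted to camera coordinates,
% then mapped to the UAV body/world frame by rotation R and translation t.
\begin{aligned}
X_c &= \frac{(u - c_x)\,Z}{f_x}, \qquad Y_c = \frac{(v - c_y)\,Z}{f_y}, \qquad Z_c = Z,\\
\mathbf{P}_w &= R\,[X_c,\; Y_c,\; Z_c]^{\top} + \mathbf{t}.
\end{aligned}

In this reading, a detection's pixel center together with its completed depth value is sufficient to recover the object's 3D position once the camera calibration is known.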