To address the problem of obtaining accurate object pose information with depth cameras in unstructured scenes under limited hardware resources, a target pose estimation method based on the bidirectional fusion of texture and depth information is proposed. In the learning stage, the two branches of the network adopt the full-flow bidirectional fusion (FFB6D) module: the texture-extraction branch introduces the lightweight Ghost module to reduce the network's computational cost and adds the convolutional block attention module (CBAM) to enhance useful features, while the depth-extraction branch expands local features and fuses multi-level features to obtain more comprehensive representations. In the output stage, to improve efficiency, instance semantic segmentation results are used to filter out background points before 3D keypoint detection, and the pose is finally recovered with a least-squares fitting algorithm. Experiments on the public LINEMOD, Occlusion LINEMOD, and YCB-Video datasets reach accuracies of 99.8%, 66.3%, and 94%, respectively, while the number of parameters is reduced by 31%, showing that the improved pose estimation method reduces the parameter count while maintaining accuracy.
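The abstract does not spell out the least-squares fitting step; in keypoint-based pose estimation it is commonly realized with the SVD-based Kabsch algorithm, which recovers the rigid transform between the object-frame keypoints and their detected camera-frame counterparts. The sketch below illustrates that step only; the function name and the assumption that `model_kps` (object-frame keypoints) and `pred_kps` (detected camera-frame keypoints) are N×3 NumPy arrays are illustrative, not taken from the paper.

```python
import numpy as np

def fit_pose_least_squares(model_kps: np.ndarray, pred_kps: np.ndarray):
    """Recover a rigid transform (R, t) mapping object-frame keypoints
    onto detected camera-frame keypoints, minimizing
    sum_i || R @ q_i + t - p_i ||^2 (Kabsch algorithm via SVD).
    Hypothetical helper, not the paper's exact implementation.
    """
    # Center both point sets on their centroids.
    q_mean = model_kps.mean(axis=0)
    p_mean = pred_kps.mean(axis=0)
    Q = model_kps - q_mean
    P = pred_kps - p_mean

    # 3x3 cross-covariance matrix and its SVD.
    H = Q.T @ P
    U, _, Vt = np.linalg.svd(H)

    # Correct a possible reflection so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = p_mean - R @ q_mean
    return R, t
```

In practice the detected keypoints would typically be outlier-filtered or confidence-weighted before fitting; this sketch assumes clean, equally weighted correspondences.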