Abstract:In order to solve the problems of high computational complexity and large number of parameters in the current pose estimation model, this paper proposes a lightweight pose estimation algorithm, firstly, the local coding module PCE is introduced in the feature extraction process, and the local and global features of the image are extracted respectively by combining the advantages of convolutional neural network and visual encoder. Then, weighted feature fusion is introduced in the process of multi-scale feature fusion to enhance the multi-scale feature fusion ability of the model to avoid the problem of reduced accuracy caused by model lightweight. Then, in the process of regression prediction, the detection head of the human detection and classification parts was shared to improve the recognition efficiency of the model in the pose estimation task. Experimental results show that compared with the basic model, the proposed model reduces the parameters by 27%, the amount of computation decreases by 18%, the complexity of the model is reduced, and the accuracy is increased by 0.2%, which not only ensures the accuracy of recognition, but also realizes the lightweight of the detection algorithm, and provides an effective means to achieve real-time accurate pose estimation.