Abstract: To address the deficiencies of the existing bag-of-words model in object recognition, we improve feature extraction and image representation to enhance recognition accuracy. First, key points are extracted by dense sampling on a grid with a fixed step size and a fixed scale, and scale-invariant feature transform (SIFT) and local binary pattern (LBP) descriptors are computed around the key points in the grid cells to describe shape and texture features, respectively. The K-means clustering algorithm is then used to generate a visual dictionary, the local descriptors are encoded by approximated locality-constrained linear coding (LLC), and histograms are generated by max pooling under spatial pyramid matching. The two spatial pyramid histograms are concatenated, so that feature fusion is implemented at the image level within the bag-of-words framework. Finally, the fused representation is sent to a support vector machine (SVM) for classification. Experimental results on public datasets show that the proposed method achieves higher recognition accuracy.
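The encoding, pooling, and fusion steps summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`llc_encode`, `spm_max_pool`, `fuse_histograms`), the neighbor count `knn=5`, the regularizer `beta`, and the pyramid levels `(1, 2, 4)` are all assumptions chosen for illustration, since the abstract does not state them. The descriptors and codebooks are taken as given arrays, standing in for SIFT or LBP descriptors and a K-means dictionary.

```python
import numpy as np

def llc_encode(X, B, knn=5, beta=1e-4):
    """Approximated LLC. X: (n, d) descriptors, B: (m, d) codebook.
    Returns (n, m) codes with at most knn nonzeros per row, each row summing to 1."""
    n, m = X.shape[0], B.shape[0]
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = (X ** 2).sum(1)[:, None] - 2.0 * X @ B.T + (B ** 2).sum(1)[None, :]
    idx = np.argsort(d2, axis=1)[:, :knn]      # knn nearest codewords per descriptor
    codes = np.zeros((n, m))
    for i in range(n):
        z = B[idx[i]] - X[i]                   # neighbors shifted to the descriptor
        C = z @ z.T                            # local covariance, shape (knn, knn)
        # Regularize for numerical stability (beta and the epsilon are assumed values).
        C += (beta * np.trace(C) + 1e-12) * np.eye(knn)
        w = np.linalg.solve(C, np.ones(knn))
        codes[i, idx[i]] = w / w.sum()         # enforce the sum-to-one constraint
    return codes

def spm_max_pool(codes, xy, width, height, levels=(1, 2, 4)):
    """Max-pool codes inside each cell of a spatial pyramid.
    xy: (n, 2) key point coordinates (x, y) in pixels."""
    m = codes.shape[1]
    pooled = []
    for g in levels:                           # g x g grid at this pyramid level
        cx = np.minimum((xy[:, 0] * g / width).astype(int), g - 1)
        cy = np.minimum((xy[:, 1] * g / height).astype(int), g - 1)
        cell = cy * g + cx
        for c in range(g * g):
            mask = cell == c
            pooled.append(codes[mask].max(axis=0) if mask.any() else np.zeros(m))
    v = np.concatenate(pooled)
    return v / (np.linalg.norm(v) + 1e-12)     # L2 normalization

def fuse_histograms(shape_vec, texture_vec):
    """Image-level fusion by concatenating the two pyramid vectors."""
    return np.concatenate([shape_vec, texture_vec])
```

In use, each image would be encoded twice, once with its SIFT descriptors and SIFT codebook and once with its LBP descriptors and LBP codebook, and the two pooled vectors would be passed through `fuse_histograms` before being given to a classifier such as a linear SVM.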