Deep learning has become new state-of-the-art framework in many task in big data circumstance.Most of methods need full annotated data or assume only an object in the image with simple background.However,complex background,more than one object in the image and expensive full annotation in the reality,object recognition becomes more challenging.Here,we propose a deep-model-based attention mechanism and recurrent neural network.It trains the network end-to-end on multi-label data with image-level label.The glimpses change along with stochastic gradient descent and focus on different local region in every step.Finally,the effectiveness of the proposed algorithm is verified on the PASCAL VOC 2007 and 2012 datasets.Results show that the network is easily interpretable than other methods.