Fine-grained image recognition (FGIR) is an important research topic in computer vision. Its goal is to distinguish visually similar subclasses within the same category. This paper focuses on weakly-supervised fine-grained image recognition. To address two problems in existing FGIR research, namely the insufficient use of the features of fine-grained images and the difficulty of locating discriminative regions, we propose the attention and multi-scale ensemble-learning based network (AMEN). The method introduces a progressive learning network that follows an ensemble-learning strategy: multi-scale base classifiers are built in parallel on three levels of output features of a deep neural network and are trained progressively with label smoothing, which greatly improves the utilization of low-level features. In addition, efficient dual-channel attention assigns channel weights to the features, so that the network can adaptively select features at the channel level and make better use of highly informative channels. The method also introduces a self-attention region proposal network, which builds a cyclic feedback mechanism that drives the model to gradually locate more discriminative regions, and the final classification module fuses the feature information of the complete image and of the discriminative region. Experimental results on three fine-grained image datasets, CUB-200-2011, FGVC Aircraft and Stanford Cars, show that the recognition accuracy of AMEN is competitive with advanced methods in the field.
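As a rough illustration of the label-smoothing and multi-scale ensemble ideas mentioned above, the following minimal sketch computes a label-smoothed cross-entropy for one base classifier and combines the outputs of three base classifiers. It is not the AMEN implementation: the function names, the per-scale smoothing values, and the probability-averaging fusion rule are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_targets(label, num_classes, eps):
    """Standard label smoothing: (1 - eps) on the true class plus eps spread uniformly."""
    base = eps / num_classes
    return [base + (1.0 - eps if k == label else 0.0) for k in range(num_classes)]


def smoothed_cross_entropy(logits, label, eps):
    """Cross-entropy between smoothed targets and the softmax of the logits."""
    probs = softmax(logits)
    targets = smoothed_targets(label, len(logits), eps)
    return -sum(t * math.log(p) for t, p in zip(targets, probs))


def ensemble_predict(per_scale_logits):
    """Assumed fusion rule: average the class probabilities of all base classifiers."""
    num_classes = len(per_scale_logits[0])
    avg = [0.0] * num_classes
    for logits in per_scale_logits:
        for k, p in enumerate(softmax(logits)):
            avg[k] += p / len(per_scale_logits)
    return max(range(num_classes), key=avg.__getitem__)


# Hypothetical logits from three base classifiers (shallow, middle, deep features).
per_scale_logits = [[2.0, 1.0, 0.0], [0.5, 0.4, 0.1], [1.0, 3.0, 0.0]]
# Hypothetical smoothing factors: stronger smoothing for shallower features.
per_scale_eps = [0.3, 0.2, 0.1]

losses = [smoothed_cross_entropy(z, 1, e) for z, e in zip(per_scale_logits, per_scale_eps)]
prediction = ensemble_predict(per_scale_logits)
```

Using a larger smoothing factor for shallower classifiers is one plausible design choice, on the reasoning that low-level features are less reliable and should be trained toward softer targets; the paper's actual schedule may differ.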