Abstract: Visual attention mechanisms have been widely used in state-of-the-art fine-grained classification methods in recent years. However, most attention-based image classification systems apply only a single-layer or part-specific attention feature, and they apply that attention through simple multiplication. This limits the information the attention can provide. This paper presents a fine-grained image classification system based on multi-channel visual attention. Multi-channel attention features are extracted from the image and applied to low-level features. The mean value corresponding to each attention layer is subtracted to obtain a high-order representation, and the whole model remains an end-to-end optimizable deep neural network. On multiple commonly used fine-grained classification datasets, the presented method outperforms state-of-the-art methods by a large margin.
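The following NumPy sketch illustrates one possible reading of "multi-channel attention applied to low-level features with mean subtraction". It is not the authors' implementation. The function name `multi_channel_attention_pool`, the tensor shapes, and the choice to center each attention map by its own spatial mean are all assumptions made for illustration. Each of the K centered attention maps weights every spatial position of a C-channel feature map, which produces a K x C interaction matrix (a second-order descriptor). Every step is linear in both the features and the attention maps, so the operation is differentiable and could sit inside an end-to-end trained network.

```python
import numpy as np


def multi_channel_attention_pool(features: np.ndarray, attention: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: pool low-level features with K mean-centered attention maps.

    features:  array of shape (C, H, W), a low-level convolutional feature map.
    attention: array of shape (K, H, W), one attention map per attention channel.
    Returns a flat array of shape (K * C,), one C-dim descriptor per attention map.
    """
    c, h, w = features.shape
    k = attention.shape[0]
    if attention.shape[1:] != (h, w):
        raise ValueError("attention maps must match the feature map's spatial size")

    # Assumption: "subtraction of mean values" is read as centering each
    # attention map by its own spatial mean, so its weights sum to zero and
    # can be negative (unlike plain multiplicative attention).
    centered = attention - attention.mean(axis=(1, 2), keepdims=True)

    # Flatten spatial dimensions: f is (C, H*W), a is (K, H*W).
    f = features.reshape(c, h * w)
    a = centered.reshape(k, h * w)

    # Attention-weighted spatial average for every (attention map, feature
    # channel) pair, giving a second-order (K x C) interaction matrix.
    pooled = a @ f.T / (h * w)
    return pooled.reshape(-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((64, 14, 14))  # C=64 channels on a 14x14 grid
    attn = rng.random((4, 14, 14))             # K=4 attention maps
    print(multi_channel_attention_pool(feats, attn).shape)
```

Because each centered map sums to zero, a spatially uniform attention map contributes nothing, and spatially constant features produce a zero descriptor. Under this reading, the descriptor responds only to how the attention redistributes weight across positions.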