Abstract: In video-based group activity recognition, traditional deep learning methods generally apply conventional (max or average) pooling to the convolutional features. However, these methods do not account for the key people in a group activity, who strongly influence how the group behavior is classified. We therefore propose an attention-based model for recognizing behavior in group activity videos. To identify the group behavior correctly, the model focuses on the key people in the activity and pools convolutional features dynamically according to the attention weights. We conduct extensive experiments on two group behavior datasets, CAD (Collective Activity Dataset) and CAE (Collective Activity Extended Dataset). The recognition accuracy of our model is better than that of many existing models that use conventional pooling structures.
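
The difference between conventional pooling and attention-weighted pooling can be illustrated with a minimal sketch. It is not the model's actual architecture. It assumes each person in a frame is already described by one feature vector, and it scores each person with a single hypothetical vector `score_vector` followed by a softmax. The names `attention_pool`, `person_features`, and `score_vector` are illustrative and do not come from the paper.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array of scores."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()


def attention_pool(person_features, score_vector):
    """Pool per-person features into one group feature using attention.

    person_features: array of shape (N, D), one row per person (assumed input).
    score_vector:    array of shape (D,), a hypothetical learned scoring vector.
    Returns the pooled feature of shape (D,) and the attention weights (N,).
    """
    scores = person_features @ score_vector   # one relevance score per person
    weights = softmax(scores)                 # non-negative, sums to 1
    pooled = weights @ person_features        # weighted sum over people
    return pooled, weights


# Toy data: 5 people, 8-dimensional features (random stand-ins for CNN features).
rng = np.random.default_rng(0)
person_features = rng.normal(size=(5, 8))
score_vector = rng.normal(size=8)

pooled, weights = attention_pool(person_features, score_vector)

# Conventional pooling baselines, for comparison.
avg_pooled = person_features.mean(axis=0)
max_pooled = person_features.max(axis=0)
```

In this sketch, a zero `score_vector` gives every person the same weight, so the result reduces to average pooling. A scoring vector that strongly favors one person makes the pooled feature approach that person's feature vector, which is the sense in which the pooling "focuses" on key people.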