Abstract: This paper proposes a new method for describing human behavior in crowded scenes based on spatio-temporal interest points (STIPs). Three STIP extraction methods are compared, based on the Harris corner detector, Gabor wavelets, and the Hessian matrix, and the scale-invariant Hessian-based method is chosen. Descriptors for the STIPs are built from histograms of gradient, histograms of optical flow orientation, and spatio-temporal Haar features. Normal behavior is then modeled with a bag-of-words model, in which a Gaussian mixture model (GMM) fitted by expectation-maximization (EM) produces the keywords. Each video of normal behavior is divided into several clips, each clip is described by a probability vector over the keywords, and together these vectors form the normal-behavior codebook. In the testing phase, the similarity distance between the coding vector of a test sample and those of the normal codebook is computed, and abnormal behavior is detected when the distance exceeds a threshold. The algorithm is tested on the UMN and UCF datasets. The experiments show that it identifies group abnormal behavior effectively and adapts well to scale variation and illumination changes.
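
The coding and detection steps summarized above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on several assumptions: the GMM is taken as already fitted by EM and restricted to diagonal covariances, a clip's probability vector is taken to be the average posterior over its descriptors, and the similarity distance is taken to be the Euclidean distance to the nearest codebook entry. The abstract does not specify any of these details. All function names, toy descriptors, and the threshold value are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at point x (assumed form)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def clip_vector(descriptors, weights, means, variances):
    """Describe one clip as the average GMM posterior over its descriptors.

    Each mixture component plays the role of one keyword; the result is a
    probability vector with one entry per keyword.
    """
    k = len(weights)
    acc = [0.0] * k
    for x in descriptors:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint) or 1e-300  # guard against numerical underflow
        for j in range(k):
            acc[j] += joint[j] / total
    return [a / len(descriptors) for a in acc]

def min_distance(vec, codebook):
    """Euclidean distance from vec to the nearest codebook vector (assumed metric)."""
    return min(math.dist(vec, c) for c in codebook)

def is_abnormal(vec, codebook, threshold):
    """Flag a clip as abnormal when its distance to the codebook exceeds the threshold."""
    return min_distance(vec, codebook) > threshold

# Toy GMM with two keywords in a 2-D descriptor space (hypothetical values).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [10.0, 10.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# Normal-behavior codebook built from two toy training clips.
normal_clips = [
    [[0.1, -0.2], [0.3, 0.4]],
    [[-0.5, 0.2], [0.0, 0.1]],
]
codebook = [clip_vector(c, weights, means, variances) for c in normal_clips]

# One test clip close to the normal data and one far from it.
test_normal = clip_vector([[0.2, 0.1]], weights, means, variances)
test_unusual = clip_vector([[9.8, 10.1]], weights, means, variances)

THRESHOLD = 0.5  # hypothetical; in practice it would be tuned on validation data
print(is_abnormal(test_normal, codebook, THRESHOLD))
print(is_abnormal(test_unusual, codebook, THRESHOLD))
```

In this toy setup the codebook holds only two vectors, so a linear scan for the nearest entry is sufficient. With a larger codebook, a different distance or an indexed nearest-neighbor search might be preferable.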