Abstract: Class overlap is the degree to which data from different classes overlap. It can be quantified with approaches from geometrical statistics and information theory, and it is used to measure the complexity of a classification problem. Real-world data are often imbalanced, and the large disparity in class sample sizes makes classification challenging. Through experiments, we evaluate the effectiveness of class overlap measures for imbalanced data classification. First, focusing on two-class classification, we design experiments to evaluate the class overlap measures on synthetic imbalanced data generated with various skewness levels, class boundary shapes, feature types, and probability distributions. Second, based on the results on the synthetic data, we analyze how the imbalance ratio influences the measures and derive guidelines for using the measures to guide imbalanced data classification. Finally, we validate these conclusions on real-world imbalanced data sets. The experimental results demonstrate that measures that are more robust to data skewness can effectively guide classifier selection for imbalanced data classification.
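As a minimal illustration of the two quantities the abstract relates, the following sketch computes the imbalance ratio of a two-class data set and one classic geometrical class overlap measure, the maximum Fisher's discriminant ratio (often called F1). F1 is chosen here only as an assumed example; the abstract does not list which measures are evaluated, and the function names `imbalance_ratio` and `fisher_ratio` are hypothetical. Larger F1 values indicate that at least one feature separates the classes well, that is, less overlap.

```python
from collections import Counter
from statistics import fmean, pvariance


def imbalance_ratio(labels):
    """Majority-class size divided by minority-class size (two classes only)."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    return max(counts.values()) / min(counts.values())


def fisher_ratio(X, labels):
    """Maximum Fisher's discriminant ratio (F1) over all features.

    For each feature j: (mean_a - mean_b)^2 / (var_a + var_b), using
    population variances. Illustrative choice of overlap measure only.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")
    a = [x for x, y in zip(X, labels) if y == classes[0]]
    b = [x for x, y in zip(X, labels) if y == classes[1]]
    best = 0.0
    for j in range(len(X[0])):
        col_a = [row[j] for row in a]
        col_b = [row[j] for row in b]
        num = (fmean(col_a) - fmean(col_b)) ** 2
        denom = pvariance(col_a) + pvariance(col_b)
        if denom > 0:
            ratio = num / denom
        else:
            # Zero within-class spread: perfectly separable if the means differ.
            ratio = float("inf") if num > 0 else 0.0
        best = max(best, ratio)
    return best


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [10.0], [11.0]]
    y = [0, 0, 0, 1, 1]
    print(imbalance_ratio(y), fisher_ratio(X, y))
```

Note that F1 uses only class means and variances, so it does not by itself account for class sizes; how such measures behave as the imbalance ratio grows is exactly the kind of robustness question the study examines.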