Classification of imbalanced datasets is an important topic in machine learning. Most existing imbalance learning algorithms are designed for binary classification and do not meet the needs of multi-class classification. To tackle the multi-class imbalanced classification problem, this study designs a new multi-class classification model that combines rough sets, resampling methods, and a dynamic ensemble classification strategy. The model uses hybrid sampling and a rough set attribute reduction algorithm to generate multiple balanced data subsets, on which the dynamic ensemble classification model is built. Experiments on 22 real datasets demonstrate that the designed method identifies minority-class samples with higher prediction performance than two previous algorithms, making it an alternative option for multi-class imbalanced classification.
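The step of generating multiple balanced data subsets by hybrid sampling can be illustrated with a minimal sketch. This excerpt does not give the sampling details, so the code below rests on assumptions: every class is resampled to the mean class size, larger classes are randomly undersampled, and smaller classes are randomly oversampled with replacement. The function names `hybrid_resample` and `balanced_subsets` are hypothetical, and the rough set attribute reduction and the dynamic ensemble are not shown.

```python
import random
from collections import defaultdict


def hybrid_resample(samples, labels, seed=0):
    """Return a class-balanced copy of (samples, labels).

    Assumption (not taken from the paper): the target size of every
    class is the mean class size. Classes at or above the target are
    undersampled without replacement; classes below it keep all of
    their samples and are topped up by sampling with replacement.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    target = round(len(samples) / len(by_class))

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target:
            chosen = rng.sample(xs, target)  # random undersampling
        else:
            extra = rng.choices(xs, k=target - len(xs))  # random oversampling
            chosen = xs + extra
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


def balanced_subsets(samples, labels, n_subsets=5, seed=0):
    """Draw several balanced subsets, each with a different random seed."""
    return [
        hybrid_resample(samples, labels, seed=seed + i)
        for i in range(n_subsets)
    ]
```

In an ensemble setting, each subset returned by `balanced_subsets` would serve as the training data for one base classifier; varying the seed gives the subsets different majority-class samples.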
Table 2 Results of MAvA (class-average accuracy)
Table 3 Results of G-mean (geometric mean accuracy)
Table 4 Results of precision
Table 5 Results of F-measure
Fig. 1 Flow chart of the dynamic ensemble classification method based on rough set attribute reduction
Fig. 2 Data balancing process
Fig. 3 Comparison of experimental results for different orders of attribute reduction and data balancing
Fig. 4 Influence of the experimental steps on the results (unit: %)
Table 1 Description of the imbalanced datasets