Dynamic Classification for Multi-imbalanced Datasets via Attribute Selection and Sampling Strategy
Authors: Zhao Dongxue, Wang Xin, Wang Lidong
Affiliation: School of Science, Dalian Maritime University, Dalian 116026, China
Funding: National Natural Science Foundation of China (61803065, 61773352); Fundamental Research Funds for the Central Universities (3132019602).

    Abstract:

    Classifying imbalanced data is an important research topic in machine learning. Most existing imbalance learning algorithms are designed for binary classification and cannot meet the needs of multi-class classification. To address multi-class imbalanced classification, this study designs a new multi-class model that combines rough sets, resampling methods and a dynamic ensemble classification strategy. The model uses hybrid sampling and rough set attribute reduction to generate multiple balanced data subsets, on which a dynamic ensemble classification model is built. Experiments on 22 real datasets show that the model predicts minority-class samples better than two classical algorithms, making it a candidate strategy for multi-class imbalanced data classification.
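The abstract names three stages: hybrid sampling, rough set attribute reduction, and dynamic ensemble classification. The sketch below is one possible reading of that pipeline, not the authors' implementation. It assumes discretized attributes, uses random duplication as a stand-in oversampler, a greedy dependency-based reduct (QuickReduct style), 1-nearest-neighbour base models with Hamming distance, and a "best local accuracy" selection rule. None of these choices are stated in the abstract, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def hybrid_resample(X, y, rng):
    """Undersample large classes and oversample small ones to the mean class size.
    Random duplication is an assumed stand-in for the paper's oversampler."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = round(len(y) / len(by_class))
    Xb, yb = [], []
    for label, rows in by_class.items():
        if len(rows) >= target:
            chosen = rng.sample(rows, target)                        # undersample
        else:
            chosen = rows + rng.choices(rows, k=target - len(rows))  # oversample
        Xb.extend(chosen)
        yb.extend([label] * target)
    return Xb, yb

def dependency(X, y, attrs):
    """Rough-set dependency degree: share of samples whose equivalence class
    (identical values on attrs) carries a single decision label."""
    labels, sizes = defaultdict(set), Counter()
    for row, label in zip(X, y):
        key = tuple(row[a] for a in attrs)
        labels[key].add(label)
        sizes[key] += 1
    return sum(n for key, n in sizes.items() if len(labels[key]) == 1) / len(y)

def quick_reduct(X, y):
    """Greedily add attributes until the dependency of the full set is reached."""
    all_attrs = list(range(len(X[0])))
    full = dependency(X, y, all_attrs)
    reduct = []
    while dependency(X, y, reduct) < full:
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(X, y, reduct + [a]))
        reduct.append(best)
    return reduct or all_attrs  # fall back to all attributes if nothing is selected

def hamming(a, b, attrs):
    return sum(a[i] != b[i] for i in attrs)

def nn_predict(model, x):
    """1-nearest-neighbour prediction restricted to the model's reduct."""
    Xb, yb, attrs = model
    j = min(range(len(Xb)), key=lambda i: hamming(Xb[i], x, attrs))
    return yb[j]

def build_models(X, y, n_models=5, seed=0):
    """Each base model = one balanced subset plus the reduct computed on it."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        Xb, yb = hybrid_resample(X, y, rng)
        models.append((Xb, yb, quick_reduct(Xb, yb)))
    return models

def dynamic_predict(models, X_val, y_val, x, k=3):
    """Select the model that is most accurate on the k validation samples
    nearest to x, then let that model classify x (assumed selection rule)."""
    all_attrs = range(len(x))
    near = sorted(range(len(X_val)),
                  key=lambda i: hamming(X_val[i], x, all_attrs))[:k]
    best = max(models,
               key=lambda m: sum(nn_predict(m, X_val[i]) == y_val[i] for i in near))
    return nn_predict(best, x)

# Toy three-class data: attribute 0 determines the label, attribute 1 is noise.
X = [[0, 0], [0, 1], [0, 0], [0, 1], [0, 0], [0, 1], [1, 0], [1, 1], [2, 0]]
y = ["a"] * 6 + ["b"] * 2 + ["c"]
models = build_models(X, y, n_models=3, seed=0)
print(quick_reduct(X, y))                       # [0]
print(dynamic_predict(models, X, y, [2, 1]))    # c
```

On the toy data every balanced subset holds three samples per class, the reduct keeps only attribute 0, and the rare class "c" is still recovered for an unseen attribute combination. A closer reproduction would swap in the sampler, reduct algorithm and base classifiers that the full paper actually specifies.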

    Table 1 Description of the imbalanced datasets
    Table 2 Results of MAvA (class-average accuracy)
    Table 3 Results of G-mean (geometric mean accuracy)
    Table 4 Results of precision
    Table 5 Results of F-measure
    Fig.1 Flow chart of the dynamic ensemble classification method based on rough set attribute reduction
    Fig.2 Data balancing process
    Fig.3 Comparison of experimental results for different orders of attribute reduction and data balancing
    Fig.4 Influence of experimental steps on the results (Unit: %)
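The result tables report MAvA and G-mean, which this page does not define. A minimal sketch, assuming the definitions common in the multi-class imbalance literature (MAvA as the arithmetic mean of per-class recall, G-mean as the geometric mean of per-class recall); the paper's exact formulas may differ, and the function names are illustrative.

```python
from collections import Counter
from math import prod

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    total = Counter(y_true)
    hit = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hit[c] / total[c] for c in total}

def mava(y_true, y_pred):
    """Arithmetic mean of per-class recall (assumed MAvA definition)."""
    r = per_class_recall(y_true, y_pred)
    return sum(r.values()) / len(r)

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall; zero if any class is missed entirely."""
    r = per_class_recall(y_true, y_pred)
    return prod(r.values()) ** (1 / len(r))

y_true = ["a", "a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "a", "b", "b", "b", "a"]
print(per_class_recall(y_true, y_pred))  # {'a': 0.75, 'b': 1.0, 'c': 0.0}
print(g_mean(y_true, y_pred))            # 0.0
```

Both measures weight every class equally regardless of its size, which is why they are preferred over plain accuracy when minority classes matter. In the example, plain accuracy is 5/7 even though class "c" is never recognized, while G-mean drops to zero.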
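Cite this article

Zhao Dongxue, Wang Xin, Wang Lidong. Dynamic classification for multi-imbalanced datasets via attribute selection and sampling strategy[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2021, 36(3): 509-518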

History
  • Received: 2020-06-12
  • Last revised: 2020-12-11
  • Published online: 2021-06-16