An Over-Sampling Algorithm for Maximum Entropy Optimization Based on Bootstrap Method
Authors: Lei Tiangang, Chen Gang
Affiliation:

School of Science, Dalian Maritime University, Dalian 116026, China

Fund project:

National Natural Science Foundation of China (11571056).

    Abstract:

    With the advent of the data era, the classification of imbalanced data has received increasing attention. In imbalanced classification, the skewed ratio of minority-class to majority-class samples often leads to misclassification. We therefore propose an oversampling algorithm based on the Bootstrap method under the maximum entropy principle. First, the probability distribution of the data samples is obtained with the Bootstrap method and then optimized using the maximum entropy principle. Second, because minority-class samples differ in their ability to generate new minority-class samples, a probability enhancement algorithm based on the minority-class sample distribution is proposed. The algorithm fully reflects the randomness of the data and keeps the probability density of the minority class consistent before and after the data set is balanced, thereby improving the effectiveness of the classification algorithm. Finally, experiments on eight data sets selected from the UCI and KEEL repositories show that the proposed algorithm is more effective than other existing algorithms.
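The abstract does not give the algorithm's formulas, so the following is only a minimal sketch under our own assumptions, not the authors' method. It (1) resamples the minority class with replacement (Bootstrap) and averages the per-replicate mean vector and covariance matrix, (2) relies on the standard fact that, among densities with a fixed mean and covariance, the multivariate Gaussian has maximum entropy, and (3) draws synthetic minority samples from that Gaussian until both classes have the same size. The names `bootstrap_moments`, `cholesky`, and `max_entropy_oversample` and the parameters `n_boot` and `seed` are hypothetical. The paper's probability enhancement step, which treats minority samples as differing in their ability to generate new ones, is not reproduced here.

```python
import math
import random


def bootstrap_moments(samples, n_boot=200, rng=None):
    """Average the mean vector and covariance matrix over bootstrap replicates.

    Assumes at least two samples (the covariance uses an n - 1 denominator).
    """
    rng = rng or random.Random(0)
    n, d = len(samples), len(samples[0])
    mean_acc = [0.0] * d
    cov_acc = [[0.0] * d for _ in range(d)]
    for _ in range(n_boot):
        # One bootstrap replicate: draw n rows with replacement.
        boot = [samples[rng.randrange(n)] for _ in range(n)]
        mu = [sum(row[j] for row in boot) / n for j in range(d)]
        for j in range(d):
            mean_acc[j] += mu[j]
            for k in range(d):
                cov_acc[j][k] += sum(
                    (row[j] - mu[j]) * (row[k] - mu[k]) for row in boot
                ) / (n - 1)
    mean = [m / n_boot for m in mean_acc]
    cov = [[c / n_boot for c in row] for row in cov_acc]
    return mean, cov


def cholesky(a, jitter=1e-9):
    """Lower-triangular L with L @ L.T ~= a; jitter keeps the diagonal positive."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(a[i][i] - s, 0.0) + jitter)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def max_entropy_oversample(minority, n_majority, n_boot=200, seed=0):
    """Draw synthetic minority points from the maximum-entropy (Gaussian) density
    matching the bootstrap-averaged mean and covariance (hypothetical sketch)."""
    rng = random.Random(seed)
    mean, cov = bootstrap_moments(minority, n_boot, rng)
    low = cholesky(cov)
    d = len(mean)
    synthetic = []
    for _ in range(max(n_majority - len(minority), 0)):
        # x = mean + L z with z ~ N(0, I) gives x ~ N(mean, cov).
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        synthetic.append(
            [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        )
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.2], [1.1, 2.1], [0.9, 1.8]]
    new_points = max_entropy_oversample(minority, n_majority=20)
    print(len(new_points))  # 15
```

The Gaussian is chosen only because it is the maximum-entropy distribution when the first two moments are the sole constraints; with other constraints the maximum-entropy solution takes a different exponential-family form.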

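Cite this article:

LEI Tiangang, CHEN Gang. An over-sampling algorithm for maximum entropy optimization based on Bootstrap method[J]. Journal of Data Acquisition and Processing, 2023, 38(3): 727-740.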

History:
  • Received: 2022-01-20
  • Last revised: 2023-04-20
  • Published online: 2023-05-25