Imbalanced Multi-label Learning Algorithm Based on Classification Interval Enhanced
Affiliation:

1. University Key Laboratory of Intelligent Perception and Computing of Anhui Province (Anqing Normal University), Anqing 246133, China; 2. Innovation Team of Anqing Normal University, Anqing 246133, China

    Abstract:

    Traditional multi-label learning algorithms generally do not account for label imbalance and therefore ignore its effect on classification. Statistics show, however, that the commonly used multi-label datasets all exhibit label imbalance, and that minority-class labels are often the more important ones. This paper therefore proposes an imbalanced multi-label learning algorithm based on classification interval enhanced (MLCIE). It reconstructs the classification interval of each label so that the classifier learns minority-class label samples more effectively and the quality of the sample labels improves, which reduces the impact of multi-label imbalance on the learning accuracy of the classifier. First, the uncertainty coefficient of each label is calculated from the label's density and conditional entropy. Then a classification interval enhancement matrix is constructed, which integrates the density information unique to each label into the original label matrix to obtain a balanced label space. Finally, an extreme learning machine is used as a linear classifier for classification. The proposed algorithm is compared with seven other multi-label learning algorithms on 11 standard multi-label datasets, and the results show that it is effective to a certain extent in handling label imbalance.
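
The three steps named in the abstract (per-label density, an enhanced label matrix, an extreme learning machine) can be illustrated with a small sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: `enhance_targets` and its `strength` parameter are hypothetical stand-ins for the classification interval enhancement matrix, the conditional-entropy term is omitted entirely, and `ELM` is a generic textbook extreme learning machine (random hidden layer, ridge-regularized least-squares output weights), not the authors' implementation.

```python
import numpy as np


def label_density(Y):
    """Share of positive samples per label; Y is an (n_samples, n_labels) 0/1 matrix."""
    return Y.mean(axis=0)


def enhance_targets(Y, strength=1.0):
    """Hypothetical enhancement, NOT the paper's formula.

    Maps 0/1 labels to signed real targets whose magnitude grows as the
    class becomes rarer, so minority-class entries sit further from zero.
    """
    d = label_density(Y)
    pos_w = 1.0 + strength * (1.0 - d)  # rare positives get larger targets
    neg_w = 1.0 + strength * d          # rare negatives get larger targets
    # pos_w and neg_w have shape (n_labels,) and broadcast across rows.
    return np.where(Y == 1, pos_w, -neg_w)


class ELM:
    """Generic extreme learning machine for multi-output regression."""

    def __init__(self, n_hidden=50, reg=1e-2, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.seed = seed

    def _hidden(self, X):
        # Fixed random projection followed by a tanh activation.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, T):
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Ridge solution: beta = (H^T H + reg * I)^{-1} H^T T
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict_scores(self, X):
        # Positive scores indicate that a label is predicted as relevant.
        return self._hidden(X) @ self.beta


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(20, 5))                  # toy features
    Y = (rng.random((20, 3)) < 0.2).astype(int)   # sparse toy labels
    model = ELM(n_hidden=16).fit(X, enhance_targets(Y))
    print(model.predict_scores(X[:3]))
```

Under these assumptions, only the output weights are trained, so swapping the 0/1 label matrix for the enhanced targets changes the regression targets and nothing else in the training procedure.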

    Table 1 Four kinds of uncertainty coefficients
    Table 2 Label changes before and after reconstruction
    Table 3 Detailed description of the multi-label datasets
    Table 4 AP↑ values of each algorithm on the 11 datasets
    Table 5 Average ranking of the algorithms
    Fig.1 Label density histogram of the Yeast dataset
    Fig.2 Performance comparison of the algorithms
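Cite this article

Cheng Yusheng, Cao Tiancheng. Imbalanced multi-label learning algorithm based on classification interval enhanced[J]. Journal of Data Acquisition and Processing, 2021, 36(3): 519-528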

History
  • Received: 2020-06-22
  • Last revised: 2020-11-06
  • Published online: 2021-06-16