Label Enhancement and Fuzzy Discernibility Based Label Distribution Feature Selection
Affiliation:

1. School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang 330045, China; 2. School of Software, Jiangxi Agricultural University, Nanchang 330045, China

Fund Project:

Supported by the National Natural Science Foundation of China (61966016), the Natural Science Foundation of Jiangxi Province (20192BAB207018), and the Science and Technology Research Fund of the Jiangxi Provincial Department of Education (GJJ180200).

    Abstract:

    Feature selection is a key preprocessing step in multi-label learning and can effectively mitigate the curse of dimensionality in high-dimensional multi-label data. Most existing multi-label learning methods describe labels with a logical distribution, in which all labels relevant to an instance are treated as equally important. In many real-world applications, however, the labels of an instance differ in importance. To address this, this paper proposes a label enhancement algorithm based on fuzzy similarity, which measures the fuzzy correlation of labels among instances and transforms conventional multi-label data into label distribution data. The label differences in the label space and the fuzzy relative discernibility relation in the feature space are then analyzed for label distribution data, the fuzzy discernibility on the label space and on the feature space is defined, and a feature significance measure is constructed to assess the discernibility ability of each feature. On this basis, a feature selection algorithm for label distribution data is built, which outputs the features ranked in descending order of significance. Finally, comparative experiments on several multi-label datasets verify the effectiveness and feasibility of the proposed algorithm.
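    The abstract gives no formulas, so the following is only a rough sketch of the two stages it describes (label enhancement, then ranking features by a discernibility-based significance), not the authors' algorithm. Every definition in it is an assumption made for illustration: the similarity measure (one minus the mean absolute difference of min-max scaled features), the similarity-weighted voting used to build distributions, the pairwise `min` used as a fuzzy discernibility score, and the function names `enhance_labels` and `rank_features`.

```python
import numpy as np


def _scale(X):
    """Min-max scale each feature to [0, 1]; constant features become 0."""
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    return (X - X.min(axis=0)) / span


def enhance_labels(X, Y):
    """Turn logical labels Y (n x q, entries 0/1) into label distributions.

    Assumption (not from the paper): the fuzzy similarity of two instances
    is 1 minus their mean absolute difference over the scaled features.
    """
    Xs = _scale(X)
    Y = np.asarray(Y, dtype=float)
    sim = 1.0 - np.abs(Xs[:, None, :] - Xs[None, :, :]).mean(axis=2)
    support = sim @ Y            # similarity-weighted votes for each label
    support *= Y                 # keep only the instance's relevant labels
    totals = support.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0    # instances without labels stay all-zero
    return support / totals      # each labelled row now sums to 1


def rank_features(X, D):
    """Rank features by a hypothetical discernibility-based significance.

    A feature scores highly when instance pairs that differ in their label
    distributions also differ on that feature.
    """
    Xs = _scale(X)
    D = np.asarray(D, dtype=float)
    # Half the L1 distance between two distributions lies in [0, 1].
    label_diff = 0.5 * np.abs(D[:, None, :] - D[None, :, :]).sum(axis=2)
    scores = np.empty(Xs.shape[1])
    for f in range(Xs.shape[1]):
        feat_diff = np.abs(Xs[:, None, f] - Xs[None, :, f])
        scores[f] = np.minimum(feat_diff, label_diff).sum()
    order = np.argsort(-scores, kind="stable")   # descending significance
    return order, scores
```

    Calling `enhance_labels(X, Y)` on a feature matrix and a 0/1 label matrix yields one distribution per instance, and `rank_features(X, D)` returns feature indices sorted from most to least significant under these assumed definitions. A constant feature scores zero, since it cannot separate any pair of instances.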

    Table 1 Description of experimental datasets
    Fig.1 Variation of the classification performance with the threshold for the Emotions dataset
    Fig.2 Variation of the classification performance with the number of features for the Gpositive dataset
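Cite this article

熊传镇, 钱文彬, 王映龙. Label enhancement and fuzzy discernibility based label distribution feature selection[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2021, 36(3): 529-543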

History
  • Received: 2020-04-12
  • Last revised: 2020-10-10
  • Published online: 2021-05-25