Fusing Matrix Factorization and Cost-Sensitive Microbial Data Augmentation Algorithm

Affiliation:

School of Computer Science, Southwest Petroleum University, Chengdu 610500, China

Fund Project:

Central Government Special Fund for Guiding Local Science and Technology Development (2021ZYD0003); Southwest Petroleum University Qihang Program (2018QHR007).

Abstract:

Microorganisms directly affect human health, and analyzing microbial data aids disease diagnosis. However, the collected data suffer from two problems: class imbalance and high sparsity. Existing oversampling methods can alleviate class imbalance to some extent, but they struggle with the high sparsity of microbial data. This paper proposes a data augmentation algorithm that fuses matrix factorization with cost sensitivity and comprises three techniques. First, the original matrix is factorized into a sample subspace and a feature subspace. Second, synthetic vectors are generated from the positive vectors in the sample subspace and their neighbor vectors. Finally, the synthetic vectors are filtered according to their distances to all negative vectors. Experiments on eight microbial datasets compare the proposed algorithm with five oversampling algorithms. The results show that the proposed algorithm increases the diversity of positive samples, identifies more positive samples, and yields a lower classification cost.
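
The three steps named in the abstract (factorize, synthesize, filter) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which factorization, neighbor rule, or filtering criterion is used, so this sketch assumes a truncated SVD, SMOTE-style interpolation between a positive vector and one of its nearest positive neighbors, and a filter that keeps the candidates with the largest mean distance to all negative vectors. Every function name and parameter (`rank`, `n_new`, `k`, `keep_ratio`) is hypothetical, and the cost-sensitive component of the paper is not modeled.

```python
import numpy as np


def factorize(X, rank):
    """Assumed stand-in: truncated SVD, X ~= S @ F."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    S = U[:, :rank] * s[:rank]  # sample subspace, shape (n_samples, rank)
    F = Vt[:rank]               # feature subspace, shape (rank, n_features)
    return S, F


def synthesize(S_pos, n_new, k, rng):
    """Interpolate positive vectors toward one of their k nearest positive neighbors."""
    dists = np.linalg.norm(S_pos[:, None, :] - S_pos[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a vector is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(S_pos), size=n_new)
    pick = neighbors[base, rng.integers(0, neighbors.shape[1], size=n_new)]
    gap = rng.random((n_new, 1))     # interpolation weight in [0, 1)
    return S_pos[base] + gap * (S_pos[pick] - S_pos[base])


def filter_by_negative_distance(S_syn, S_neg, keep_ratio):
    """Assumed rule: keep candidates with the largest mean distance to all negatives."""
    diff = S_syn[:, None, :] - S_neg[None, :, :]
    mean_dist = np.linalg.norm(diff, axis=2).mean(axis=1)
    n_keep = max(1, int(round(keep_ratio * len(S_syn))))
    keep = np.argsort(mean_dist)[::-1][:n_keep]
    return S_syn[keep]


def augment(X, y, rank=5, n_new=20, k=3, keep_ratio=0.5, seed=0):
    """Return synthetic positive samples mapped back to the original feature space."""
    rng = np.random.default_rng(seed)
    S, F = factorize(X, rank)
    S_syn = synthesize(S[y == 1], n_new, k, rng)
    S_kept = filter_by_negative_distance(S_syn, S[y == 0], keep_ratio)
    return S_kept @ F


# Toy data: a sparse, non-negative count matrix with 8 positives and 32 negatives.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.poisson(0.3, size=(40, 30)).astype(float)
y_demo = np.array([1] * 8 + [0] * 32)
X_new = augment(X_demo, y_demo)
```

Working in the low-rank sample subspace means the interpolation runs on dense vectors rather than on the sparse original rows, which is one plausible reading of why factorization helps with sparsity. Rows mapped back through `S_kept @ F` can contain small negative values; whether to clip them is a choice this sketch leaves open.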

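Cite this article

WANG Xi, WEN Liuying, MIN Fan. Fusing Matrix Factorization and Cost-Sensitive Microbial Data Augmentation Algorithm[J]. Journal of Data Acquisition and Processing, 2023, 38(2): 401-412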

History
  • Received: 2022-05-18
  • Last revised: 2022-11-22
  • Published online: 2023-03-25