Fusing Matrix Factorization and Cost-Sensitive Microbial Data Augmentation Algorithm
Author: Wang Xi, Wen Liuying, Min Fan
Affiliation:

School of Computer Science, Southwest Petroleum University, Chengdu 610500, China

CLC Number:

TP181

    Abstract:

    Microorganisms directly affect human health, and analyzing microbial data can aid disease diagnosis. However, the collected data suffer from two problems: class imbalance and high sparsity. Existing oversampling methods alleviate class imbalance to some extent, but they struggle to cope with the high sparsity of microbial data. This paper proposes a data augmentation algorithm that fuses matrix factorization with cost-sensitive learning and consists of three steps. First, the original matrix is decomposed into a sample subspace and a feature subspace. Second, the positive vectors of the sample subspace and their neighbor vectors are used to generate synthetic vectors. Finally, the synthetic vectors are filtered according to their distance from all negative vectors. The proposed algorithm is compared with five oversampling algorithms on eight microbial datasets. The results show that it enhances the diversity of positive samples and identifies more positive samples at a lower classification cost.
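
The three steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the factorization or the filtering rule, so the sketch assumes a truncated SVD as a stand-in factorization, SMOTE-style linear interpolation between positive neighbors, and a simple minimum-distance threshold for the filter. The function name `augment_positives` and the parameters `rank`, `n_neighbors`, and `min_dist` are hypothetical.

```python
import numpy as np


def augment_positives(X, y, rank=2, n_neighbors=2, min_dist=0.5, seed=0):
    """Sketch of factorize -> synthesize -> filter (all choices are assumptions)."""
    rng = np.random.default_rng(seed)

    # Step 1: factorize X ~ S @ F. Truncated SVD is a stand-in here;
    # the paper's actual factorization may differ.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    S = U[:, :rank] * s[:rank]   # sample subspace, shape (n_samples, rank)
    F = Vt[:rank]                # feature subspace, shape (rank, n_features)

    pos, neg = S[y == 1], S[y == 0]

    # Step 2: interpolate each positive vector toward its nearest
    # positive neighbors (SMOTE-style, assumed).
    synth = []
    for i, p in enumerate(pos):
        d = np.linalg.norm(pos - p, axis=1)
        d[i] = np.inf                      # exclude the vector itself
        for j in np.argsort(d)[:n_neighbors]:
            if np.isfinite(d[j]):
                lam = rng.random()         # interpolation weight in [0, 1)
                synth.append(p + lam * (pos[j] - p))
    synth = np.array(synth).reshape(-1, rank)

    # Step 3: keep only synthetic vectors whose distance to every
    # negative vector is at least min_dist (assumed filtering rule).
    if len(synth) and len(neg):
        dist = np.linalg.norm(synth[:, None, :] - neg[None, :, :], axis=2)
        synth = synth[dist.min(axis=1) >= min_dist]

    # Map the surviving vectors back to the original feature space.
    return synth @ F


# Toy usage: 6 samples, 4 sparse features, 3 positives.
X = np.array([[0, 3, 0, 1], [0, 2, 0, 0], [1, 4, 0, 0],
              [5, 0, 0, 0], [4, 0, 1, 0], [6, 0, 0, 2]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0])
new_rows = augment_positives(X, y, min_dist=0.0)
```

Working in the low-rank sample subspace rather than on the raw sparse rows is what the abstract credits for coping with sparsity; the threshold filter only stands in for the cost-sensitive selection, whose exact form the abstract does not give.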

Get Citation

Wang Xi, Wen Liuying, Min Fan. Fusing Matrix Factorization and Cost-Sensitive Microbial Data Augmentation Algorithm[J].,2023,38(2):401-412.

History
  • Received: May 18, 2022
  • Revised: November 22, 2022
  • Online: March 25, 2023