Binary Ensemble Classification for Imbalanced Big Data Based on MapReduce and Upper Sampling
    Abstract:

    This paper proposes an approach to classifying imbalanced big data based on MapReduce and upper sampling (oversampling). The method has five steps: (1) For each positive instance, its nearest neighbor is found with MapReduce. (2) New positive instances are created on the line segment between the two points. (3) The set of negative instances is partitioned into subsets according to the cardinality of the set of positive instances. (4) Balanced subsets are generated by combining the set of positive instances with each subset of negative instances. (5) Classifiers are trained by extreme learning machine on the generated balanced subsets, and the trained classifiers are combined by majority voting to classify new instances. The method is compared experimentally with three related methods on five imbalanced big data sets, and the results show that it outperforms all three.
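
The five steps above can be illustrated with a minimal single-machine sketch in Python. It is not the authors' implementation, and it rests on several assumptions: the nearest-neighbor search runs serially instead of as a MapReduce job, the neighbor is taken among the other positive instances, and a nearest-centroid rule stands in for the extreme learning machine. All function names and parameters (`per_point`, `seed`, and so on) are hypothetical.

```python
import math
import random
from collections import Counter


def nearest_positive(i, pos):
    """Step 1 (serial stand-in for the MapReduce job): index of the
    nearest other positive instance. Assumes neighbors are positives."""
    return min((j for j in range(len(pos)) if j != i),
               key=lambda j: math.dist(pos[i], pos[j]))


def oversample(pos, per_point, rng):
    """Step 2: create synthetic positives on the segment between each
    positive instance and its nearest neighbor."""
    synthetic = []
    for i, p in enumerate(pos):
        q = pos[nearest_positive(i, pos)]
        for _ in range(per_point):
            t = rng.random()  # interpolation weight in [0, 1)
            synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return pos + synthetic


def partition_negatives(neg, n_pos, rng):
    """Step 3: split the negatives into subsets of roughly n_pos each."""
    shuffled = neg[:]
    rng.shuffle(shuffled)
    k = max(1, len(shuffled) // n_pos)
    return [shuffled[i::k] for i in range(k)]


def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def train_centroid_classifier(pos, neg):
    """Stand-in base learner (NOT an extreme learning machine)."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: 1 if math.dist(x, cp) <= math.dist(x, cn) else 0


def train_ensemble(pos, neg, per_point=1, seed=0):
    """Steps 2 to 5: oversample, partition, then train one classifier
    per balanced subset (step 4 pairs all positives with each subset)."""
    rng = random.Random(seed)
    pos_all = oversample(pos, per_point, rng)
    subsets = partition_negatives(neg, len(pos_all), rng)
    return [train_centroid_classifier(pos_all, s) for s in subsets]


def predict(ensemble, x):
    """Step 5: combine the classifiers by majority vote."""
    votes = Counter(clf(x) for clf in ensemble)
    return votes.most_common(1)[0][0]
```

With 3 positive and 30 negative instances and `per_point=1`, this sketch would produce 6 positives and 5 negative subsets of 6 instances each, so the ensemble holds 5 classifiers. Swapping in a real extreme learning machine would only change `train_centroid_classifier`.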

Get Citation

Zhai Junhai, Zhang Mingyang, Wang Chenxi, Liu Xiaomeng, Wang Yaoda. Binary Ensemble Classification for Imbalanced Big Data Based on MapReduce and Upper Sampling[J].,2018,33(3):416-425.

History
  • Received: June 07, 2016
  • Revised: November 29, 2016
  • Online: July 09, 2018