This paper proposes an approach to imbalanced big data classification based on MapReduce and oversampling. The method consists of five steps: (1) the nearest neighbor of each positive instance is found using MapReduce; (2) synthetic positive instances are generated on the line segment between each positive instance and its nearest neighbor; (3) the set of negative instances is partitioned into subsets according to the cardinality of the positive instance set; (4) balanced subsets are formed by combining the positive instance set with each negative subset; (5) a classifier is trained with an extreme learning machine (ELM) on each balanced subset, and the trained classifiers are combined by majority voting to classify new instances. Experimental comparisons with three related methods on five imbalanced big data sets show that the proposed method outperforms all three.
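The five steps can be illustrated with a small single-machine sketch in Python/NumPy. It is an assumption-laden illustration, not the authors' implementation. The MapReduce job is emulated by plain `map_nearest`/`reduce_nearest` functions over in-memory chunks. The nearest neighbor is assumed to be searched among the positive instances only, as in SMOTE-style oversampling. One synthetic point is created per pair. The number of negative subsets is assumed to be ⌊|N|/|P'|⌋, where P' is the oversampled positive set. The ELM is a basic variant with a random sigmoid hidden layer and pseudo-inverse output weights. All function names and parameters (`n_chunks`, `per_pair`, `n_hidden`, `seed`) are hypothetical.

```python
import numpy as np


def map_nearest(X_pos, chunk):
    """Map step (hypothetical): for every positive instance, find its nearest
    neighbour among the positives whose indices are in `chunk`.
    Emits (query_index, (squared_distance, neighbour_index)) pairs."""
    d = ((X_pos[:, None, :] - X_pos[chunk][None, :, :]) ** 2).sum(axis=2)
    d[chunk, np.arange(len(chunk))] = np.inf  # an instance is not its own neighbour
    j = d.argmin(axis=1)
    return [(q, (d[q, j[q]], chunk[j[q]])) for q in range(len(X_pos))]


def reduce_nearest(pairs):
    """Reduce step: keep the closest candidate emitted for each query index."""
    best = {}
    for q, (dist, j) in pairs:
        if q not in best or dist < best[q][0]:
            best[q] = (dist, j)
    return np.array([best[q][1] for q in sorted(best)])


def oversample(X_pos, n_chunks=2, per_pair=1, rng=None):
    """Steps (1)-(2): nearest neighbours via map/reduce, then synthetic positives
    x + r * (x_nn - x) with r ~ U[0, 1), i.e. on the segment between the two points."""
    rng = np.random.default_rng(rng)
    chunks = [c for c in np.array_split(np.arange(len(X_pos)), n_chunks) if len(c)]
    nn = reduce_nearest([p for c in chunks for p in map_nearest(X_pos, c)])
    r = rng.random((len(X_pos), per_pair, 1))
    synth = X_pos[:, None, :] + r * (X_pos[nn] - X_pos)[:, None, :]
    return np.vstack([X_pos, synth.reshape(-1, X_pos.shape[1])])


class ELM:
    """Basic extreme learning machine: random sigmoid hidden layer,
    output weights solved by least squares (Moore-Penrose pseudo-inverse)."""

    def __init__(self, n_hidden=50, rng=None):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(rng)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):  # labels y in {-1, +1}
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return np.where(self._hidden(X) @ self.beta >= 0, 1, -1)


def train_ensemble(X_pos, X_neg, n_hidden=50, seed=0):
    """Steps (3)-(5): split negatives into subsets roughly the size of the
    oversampled positive set, pair each with all positives, train one ELM each."""
    rng = np.random.default_rng(seed)
    P = oversample(X_pos, rng=rng)
    m = max(1, len(X_neg) // len(P))  # assumed rule for the number of subsets
    models = []
    for idx in np.array_split(rng.permutation(len(X_neg)), m):
        X = np.vstack([P, X_neg[idx]])
        y = np.concatenate([np.ones(len(P)), -np.ones(len(idx))])
        models.append(ELM(n_hidden, rng=rng).fit(X, y))
    return models


def predict_vote(models, X):
    """Majority vote over the ensemble; ties are assigned to the negative class."""
    votes = np.sum([m.predict(X) for m in models], axis=0)
    return np.where(votes > 0, 1, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_pos = rng.normal(2.0, 0.5, size=(40, 2))    # minority (positive) class
    X_neg = rng.normal(-2.0, 0.5, size=(400, 2))  # majority (negative) class
    models = train_ensemble(X_pos, X_neg)
    print(len(models))  # 80 oversampled positives -> 400 // 80 = 5 subsets
    print(predict_vote(models, np.array([[2.0, 2.0], [-2.0, -2.0]])))
```

Splitting the positives into chunks and taking the minimum per query in the reduce step mirrors how a distributed nearest-neighbor search can be decomposed. A real MapReduce job would distribute the chunks across nodes instead of looping over them in memory.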
Zhai Junhai, Zhang Mingyang, Wang Chenxi, Liu Xiaomeng, Wang Yaoda. Binary Ensemble Classification for Imbalanced Big Data Based on MapReduce and Upper Sampling[J].,2018,33(3):416-425.