Abstract: Classifiers are often used in entity resolution to classify record pairs as matches, non-matches, or possible matches based on field similarity vectors, so the performance of entity resolution depends directly on the performance of the classifier. To improve classification accuracy, we construct a multiple classifier system. We exploit the characteristics of entity resolution to distinguish ambiguous instances before classification, vary the resampling ratio to generate a group of resampled instance sets, and train a classifier on each set to build a parallel multiple classifier system. Moreover, we emphasize diversity and sparsity among the classifiers when selecting the best classifier subset, and we solve the ensemble selection problem using nonlinear programming and extreme values, respectively. Experiments show that the proposed multiple classifier system is more accurate than state-of-the-art systems, owing to resampling and ensemble selection.
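
The resampling step can be illustrated with a minimal sketch. It is not the method evaluated in the paper. It assumes that resampling means undersampling non-matches at several non-match-to-match ratios, that each base classifier is a simple nearest-centroid rule, and that the parallel outputs are combined by majority vote. The function names (`undersample`, `train_centroid`, `build_ensemble`, `vote`), the ratios, and the toy similarity vectors are all hypothetical. The handling of ambiguous instances and the ensemble selection step are omitted.

```python
import random

def undersample(pairs, ratio, rng):
    """Keep all matches and draw ratio * (#matches) non-matches at random."""
    matches = [p for p in pairs if p[1] == 1]
    non_matches = [p for p in pairs if p[1] == 0]
    k = min(len(non_matches), int(ratio * len(matches)))
    return matches + rng.sample(non_matches, k)

def train_centroid(sample):
    """Assumed base learner: label a vector by its nearest class centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = centroid([x for x, y in sample if y == 1])
    neg = centroid([x for x, y in sample if y == 0])
    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return 1 if d_pos < d_neg else 0
    return predict

def build_ensemble(pairs, ratios, seed=0):
    """Train one classifier per resampling ratio (a parallel ensemble)."""
    rng = random.Random(seed)
    return [train_centroid(undersample(pairs, r, rng)) for r in ratios]

def vote(ensemble, x):
    """Majority vote over the base classifiers (1 = match, 0 = non-match)."""
    votes = sum(clf(x) for clf in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0

# Toy, imbalanced data: each item is (field similarity vector, label).
data_rng = random.Random(42)
pairs = (
    [([data_rng.uniform(0.8, 1.0), data_rng.uniform(0.7, 1.0)], 1) for _ in range(10)]
    + [([data_rng.uniform(0.0, 0.3), data_rng.uniform(0.0, 0.4)], 0) for _ in range(100)]
)
ensemble = build_ensemble(pairs, ratios=[1, 2, 3])
```

Calling `vote(ensemble, [0.95, 0.9])` labels a highly similar record pair as a match, while `vote(ensemble, [0.05, 0.1])` labels a dissimilar pair as a non-match. An odd number of ratios avoids ties in the vote.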