Research and Implementation of Big Data Clustering Based on Spark

doi:10.16337/j.1004-9037.2018.06.016

Volume 33, Issue 6, 2018, pp. 1077-1085

CLC Number: TP391

Abstract:

Traditional clustering algorithms cannot meet the requirements of current big data processing because of the limited memory and computing power of a single machine, so new solutions are urgently needed. To address the problems of single-machine in-memory computation, and taking into account the iterative nature of clustering algorithms, a clustering system based on the Spark platform is proposed. First, the system applies different preprocessing strategies to two types of data sets, sparse and dense. Second, the performance of different clustering algorithms on the Spark platform is analyzed and the best-performing solution is identified. Finally, computing speed is improved with data persistence. Experimental results show that the proposed system can effectively meet the requirements of clustering analysis on massive data.
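
The link between iterative clustering and data persistence can be illustrated with a minimal, single-machine sketch. It is not the authors' system: the function names (`parse_point`, `kmeans`), the comma-separated input format, the choice of k-means, and the use of the first k points as initial centers are all assumptions made for illustration. The point it shows is that the preprocessed data is built once and then reused by every iteration, which is the role that persisting a parsed data set (for example with `persist()` or `cache()` on a Spark RDD) would play on a cluster.

```python
def parse_point(line):
    """Hypothetical preprocessing step: one comma-separated line -> tuple of floats."""
    return tuple(float(x) for x in line.split(","))


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(point, centers):
    """Index of the center nearest to the point (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(lines, k, iterations=10):
    # Parse once and keep the result in memory. Every iteration below reuses
    # this list instead of re-parsing the raw input; on Spark, persisting the
    # parsed data set would serve the same purpose.
    points = [parse_point(line) for line in lines]

    # Assumption for determinism: the first k points are the initial centers.
    centers = points[:k]

    for _ in range(iterations):
        # Assignment step: group each point with its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[closest(p, centers)].append(p)

        # Update step: move each center to the mean of its group;
        # an empty group keeps its previous center.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers
```

Calling `kmeans(["0,0", "0,1", "1,0", "10,10", "10,11", "11,10"], 2)` separates the three points near the origin from the three points near (10, 10). Without the in-memory `points` list, each of the ten iterations would have to repeat the parsing step, which is the overhead that persistence is meant to avoid in iterative workloads.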

Wang Lei, Zou Encen, Zeng Cheng, Xi Xuefeng, Lu You. Research and Implementation of Big Data Clustering Based on Spark[J].,2018,33(6):1077-1085.

History

Received: June 8, 2017
Revised: November 13, 2017
Online: December 6, 2018
