Research and Implementation of Big Data Clustering Based on Spark

CLC Number: TP391

    Abstract:

    Traditional clustering algorithms cannot meet the requirements of current big data processing because of the limited memory and computing power of a single machine, so new solutions are urgently needed. To address the problems of in-memory computation on a single machine, and taking into account the iterative nature of clustering algorithms, a clustering system based on the Spark platform is proposed. First, the system applies different preprocessing strategies to two types of data sets: sparse and dense. Second, the performance of different clustering algorithms on the Spark platform is analyzed and the best solution is identified. Finally, computing speed is improved through data persistence. Experimental results show that the proposed system can effectively meet the requirements of clustering analysis on massive data.
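    Why persistence matters for iterative clustering: Spark evaluates RDDs lazily, so an algorithm that scans the same input on every iteration (k-means is a typical example) recomputes that input from its lineage each time unless it is persisted, for example with `RDD.cache()` or `RDD.persist()`. The sketch below is only a plain-Python analogy under stated assumptions, not Spark code and not the authors' system: `parse_points` is a hypothetical preprocessing step, k-means is chosen purely for illustration, and a call counter shows how often preprocessing reruns with and without keeping the parsed data in memory.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]

# Counts how many times the (hypothetical) preprocessing step runs.
parse_calls = 0


def parse_points(lines: List[str]) -> List[Point]:
    """Hypothetical preprocessing: parse "x,y" text lines into 2-D points."""
    global parse_calls
    parse_calls += 1
    points: List[Point] = []
    for line in lines:
        x, y = line.split(",")
        points.append((float(x), float(y)))
    return points


def closest(p: Point, centers: List[Point]) -> int:
    """Index of the center with the smallest squared Euclidean distance to p."""
    return min(
        range(len(centers)),
        key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
    )


def kmeans(load: Callable[[], List[Point]], centers: List[Point], iterations: int) -> List[Point]:
    """Plain k-means; `load` is called once per iteration, like re-reading an RDD."""
    for _ in range(iterations):
        points = load()  # each iteration scans the whole data set
        sums = [[0.0, 0.0, 0] for _ in centers]  # per-cluster x-sum, y-sum, count
        for p in points:
            i = closest(p, centers)
            sums[i][0] += p[0]
            sums[i][1] += p[1]
            sums[i][2] += 1
        # Move each center to the mean of its points; keep it if the cluster is empty.
        centers = [
            (s[0] / s[2], s[1] / s[2]) if s[2] else c
            for s, c in zip(sums, centers)
        ]
    return centers


raw = ["0,0", "0,1", "1,0", "9,9", "9,8", "8,9"]
init = [(0.0, 0.0), (9.0, 9.0)]

# Without persistence: preprocessing reruns on every iteration.
parse_calls = 0
uncached = kmeans(lambda: parse_points(raw), init, 5)
uncached_calls = parse_calls  # 5

# With persistence (analogous to cache()): preprocess once, reuse in memory.
parse_calls = 0
cached_points = parse_points(raw)
cached = kmeans(lambda: cached_points, init, 5)
cached_calls = parse_calls  # 1

print(uncached_calls, cached_calls)
```

    Both runs produce the same centers; only the amount of repeated preprocessing differs, which is the effect persistence targets on a cluster, where recomputation also involves disk and network I/O.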

Citation:

Wang Lei, Zou Encen, Zeng Cheng, Xi Xuefeng, Lu You. Research and Implementation of Big Data Clustering Based on Spark[J].,2018,33(6):1077-1085.

History
  • Received: June 08, 2017
  • Revised: November 13, 2017
  • Online: December 06, 2018