基于负载均衡和冗余剪枝的并行FP-Growth算法
DOI:
作者:
作者单位:

作者简介:

通讯作者:

基金项目:


Parallel FP-Growth Algorithm Based on Load Balancing and Redundancy Pruning
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
    摘要:

    针对现有的并行FP-Growth算法在数据并行分组时存在数据冗余和负载不均的问题,提出了基于负载估算和冗余剪枝的优化算法。首先,在采用高频策略分组时,引入节点任务估算方法,把每个分组中最大模式树的最长路径和支持度作为该分组的估计值,将估计值远大于其他节点的分组进行分割,平均到其他分组中,并且对不同分 组中重复的列表元素进行截断,去除冗余数据。实验表明,本文提出的算法能够有效防止并行化的数据倾斜,减少数据冗余,在时间和空间复杂度上要低于以前的并行化FP-Growth算法。

    Abstract:

    Focusing on the data redundancy and load-unbalancing problems in the ex isting parallel FP-Growth algorithm, an improved algorithm for redundancy pruning and load balancing is proposed. Firstly, the improved algorithm introduces group task estimation method when using high-frequency strategies group. The longest path in the maximum pattern tree and the highest frequency are used as estimation. The group task will be averaged to others again when the group estimated is much larger than the value of other group. Then, the repetitive elements are removed in the list of different groups. Experimental results show that the improved algorithm avoides the data skew in the MapReduce and it is superior to the original one due to its high execution efficiency and speedup.

    参考文献
    相似文献
    引证文献
引用本文

刘祥哲 刘培玉任敏 伊静 高钊.基于负载均衡和冗余剪枝的并行FP-Growth算法[J].数据采集与处理,2016,31(1):223-230

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
历史
  • 收稿日期:
  • 最后修改日期:
  • 录用日期:
  • 在线发布日期: 2018-04-09