Parameter-Free DBSCAN Algorithm Based on RAPIDS

Affiliation:

1.School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; 2.School of Artificial Intelligence and Big Data, Chongqing College of Electronic Engineering, Chongqing 401331, China; 3.The 29th Research Institute, China Electronics Technology Group Corporation, Chengdu 610036, China

Fund Project:

Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202103109).

    Abstract:

    Density-based spatial clustering of applications with noise (DBSCAN) can discover clusters of different densities and sizes, is robust to noise, and is widely used in data mining tasks. DBSCAN usually requires tuning the parameters MinPts and Eps to obtain good clustering results, and the search for suitable values often degrades its performance. This paper optimizes DBSCAN in two respects. First, a parameter-free method is proposed for selecting DBSCAN's global parameters. The method uses natural nearest neighbors to obtain the natural feature value of the data set and takes that value as MinPts. It then computes the natural feature set from the natural feature value and, based on the distribution of the data in that set, derives Eps with three strategies: the minimum, the mean, and the maximum. Second, graphics processing unit (GPU) computation on the real-time acceleration platform for integrated data science (RAPIDS) is used to accelerate the convergence of DBSCAN. Experimental results show that the proposed method optimizes DBSCAN parameter selection while achieving clustering results comparable to those of density peaks clustering (DPC).
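
    The parameter-selection idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the natural feature set or say exactly how Eps is derived from it, so the sketch assumes that (a) the natural feature value is the search round at which the number of points without a reverse neighbor reaches zero or stops changing, and (b) the Eps candidates are the minimum, mean, and maximum of each point's distance to its neighbor at that rank. The function names `natural_feature_value` and `eps_candidates` are hypothetical, and the brute-force distance matrix is only suitable for small inputs.

```python
import math
from statistics import mean


def natural_feature_value(points):
    """Return (lam, order, dist) from a natural-nearest-neighbor search.

    lam is the round r at which the number of points that nobody has
    picked as one of their r nearest neighbors hits zero or stops changing.
    """
    n = len(points)
    # Brute-force pairwise Euclidean distances (illustration only).
    dist = [[math.dist(p, q) for q in points] for p in points]
    # order[i] holds the indices of the other points, nearest first.
    order = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for i in range(n)
    ]
    reverse_count = [0] * n  # how often each point was chosen as a neighbor
    prev_orphans = None
    for r in range(1, n):
        for i in range(n):
            reverse_count[order[i][r - 1]] += 1
        orphans = sum(1 for c in reverse_count if c == 0)
        if orphans == 0 or orphans == prev_orphans:
            return r, order, dist
        prev_orphans = orphans
    return n - 1, order, dist


def eps_candidates(points):
    """Return (min_pts, eps) where eps maps 'min'/'mean'/'max' to a radius.

    Assumption (not from the paper): Eps is summarized from the distance
    of every point to its lam-th nearest neighbor.
    """
    lam, order, dist = natural_feature_value(points)
    kth = [dist[i][order[i][lam - 1]] for i in range(len(points))]
    return lam, {"min": min(kth), "mean": mean(kth), "max": max(kth)}
```

    The returned pair could then be handed to any DBSCAN implementation that accepts a neighborhood radius and a minimum point count, for example the `eps` and `min_samples` parameters of `cuml.cluster.DBSCAN` in RAPIDS. Whether the point itself counts toward the minimum differs between formulations, so the value may need adjusting by one.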

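Cite this article

LU Jianyun (卢建云), SHAO Junming (邵俊明), ZHANG Wei (张蔚). Parameter-Free DBSCAN Algorithm Based on RAPIDS[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2023, 38(2): 426-438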

History
  • Received: 2022-09-30
  • Last revised: 2022-12-12
  • Published online: 2023-03-25