A Set Pair k-means Clustering Algorithm for Incomplete Information System

Affiliation:

1. College of Science, North China University of Science and Technology, Tangshan 063210, China; 2. Qian'an College, North China University of Science and Technology, Tangshan 063210, China; 3. Key Laboratory of Data Science and Application of Hebei Province, Tangshan 063210, China

Fund Project:

Supported by the Natural Science Foundation of Hebei Province (F2018209374, F2016209344).

    Abstract:

    To address data clustering in incomplete information systems, set pair analysis theory is introduced into k-means clustering, and, to better represent the relationship between samples and clusters, a set pair k-means (SPKM) clustering algorithm for incomplete information systems is constructed. First, a set pair distance measure is proposed based on set pair theory and applied within the k-means algorithm to obtain preliminary clustering results. Then, a sample that belongs to several clusters at the same time is assigned to the boundary region of each of those clusters, while a sample that belongs to only one cluster is assigned to either the positive region or the boundary region of that cluster. Each cluster is thus represented by three parts: a positive region of samples that definitely belong to the cluster, a boundary region of samples that may belong to it, and a negative region of samples that definitely do not belong to it. Finally, six data sets from the UCI repository and four comparison algorithms are used for experimental evaluation. The results show that SPKM achieves good clustering performance in terms of accuracy, F1 score, Jaccard coefficient, Fowlkes-Mallows index (FMI) and adjusted Rand index (ARI).
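The abstract does not give the distance formula or the assignment rule, so the following Python sketch only illustrates the general idea under stated assumptions and should not be read as the authors' method. It assumes that a missing attribute (NaN) contributes to the "discrepancy" (uncertain) degree of a set pair, that an observed attribute within a hypothetical tolerance `tol` of the cluster center counts as "identity", and that any other observed attribute counts as "contrary". The function names, the weight `i`, and the threshold `eps` are all illustrative choices, and the cluster centers are assumed to have been obtained already by the k-means step.

```python
import numpy as np

def connection_degree(x, center, tol=0.5):
    """Return (a, b, c): identity, discrepancy and contrary degrees.

    Assumption (not from the paper): NaN attributes are 'discrepant',
    observed attributes within tol of the center are 'identical',
    and all other observed attributes are 'contrary'.
    """
    x = np.asarray(x, dtype=float)
    center = np.asarray(center, dtype=float)
    missing = np.isnan(x)
    # Replace NaN by 0 only to keep the arithmetic defined; masked out below.
    close = ~missing & (np.abs(np.where(missing, 0.0, x) - center) <= tol)
    far = ~missing & ~close
    n = x.size
    return close.sum() / n, missing.sum() / n, far.sum() / n

def set_pair_distance(x, center, tol=0.5, i=0.5):
    """Hypothetical distance: contrary degree plus a share i of the uncertainty."""
    _, b, c = connection_degree(x, center, tol)
    return c + i * b

def three_way_assign(X, centers, tol=0.5, eps=0.1):
    """Split samples into positive and boundary regions of each cluster.

    Assumed rule: a cluster is a candidate for a sample if its distance is
    within eps of the nearest one. Several candidates -> boundary region of
    each; one candidate -> positive region if the sample is complete,
    otherwise boundary region. Everything else is the negative region.
    """
    k = len(centers)
    positive = [set() for _ in range(k)]
    boundary = [set() for _ in range(k)]
    for idx, x in enumerate(X):
        d = np.array([set_pair_distance(x, c, tol) for c in centers])
        candidates = np.flatnonzero(d - d.min() <= eps)
        if len(candidates) > 1:
            for j in candidates:
                boundary[j].add(idx)
        elif np.isnan(x).any():
            boundary[candidates[0]].add(idx)
        else:
            positive[candidates[0]].add(idx)
    return positive, boundary
```

With centers (0, 0) and (5, 5), a complete sample near one center lands in that cluster's positive region, a sample with one missing attribute lands in the boundary region of its nearest cluster, and a fully missing sample lands in the boundary regions of both clusters. The negative region of a cluster is left implicit as the set of samples in neither of its two regions.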

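Cite this article

Zhang Chunying, Gao Ruiyan, Liu Fengchun, Wang Jiahao, Chen Song, Feng Xiaoze, Ren Jing. A Set Pair k-means Clustering Algorithm for Incomplete Information System [J]. Journal of Data Acquisition and Processing, 2020, 35(4): 613-629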

History
  • Received: 2020-04-30
  • Last revised: 2020-07-10
  • Published online: 2020-08-07