Multi-scale Clustering Mining Method Based on Coupled Metric Similarity
Affiliation:

1. College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang, 050024, China; 2. Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Hebei Normal University, Shijiazhuang, 050024, China; 3. Key Laboratory of Network & Information Security, Hebei Normal University, Shijiazhuang, 050024, China; 4. College of Information Engineering, Hebei GEO University, Shijiazhuang, 050031, China; 5. School of Mathematical Sciences, Hebei Normal University, Shijiazhuang, 050024, China

Fund Project:

Supported by the Major Program of the National Social Science Foundation of China (13&ZD091, 18ZdA200).

Abstract:

To better study multi-scale categorical data sets that are not independent and identically distributed (non-IID), this paper proposes a multi-scale clustering mining algorithm for non-IID categorical-attribute data sets, built on the unsupervised coupled metric similarity (CMS) method. First, the benchmark-scale data set is clustered using coupled metric similarity. Second, two scale conversion algorithms are proposed: upscaling based on single linkage and downscaling based on the Lanczos kernel. Finally, experiments are carried out on public data sets and on real data sets from H province. The comparison algorithms combine spectral clustering with similarity measures such as CMS, inverse occurrence frequency (IOF), and Hamming distance (HM). Compared with these algorithms, the upscaling algorithm raises the NMI value by 13.1% on average, lowers the MSE value by 0.827 on average, and raises the F-score by 12.8% on average; the downscaling algorithm raises the NMI value by 19.2% on average, lowers the MSE value by 0.028 on average, and raises the F-score by 15.5% on average. The experimental results show that the proposed algorithm is effective and feasible.
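For readers unfamiliar with the measures named in the abstract, the following is a minimal sketch of two of the baseline measures (Hamming distance and inverse occurrence frequency) and of the standard Lanczos kernel. It does not reproduce the paper's CMS measure or its upscaling and downscaling algorithms, whose details are not given here. The function names, the toy data, and the window parameter `a` are illustrative assumptions, and the IOF form used is the commonly cited per-attribute definition, which may differ from the variant used in the experiments.

```python
import math
from collections import Counter


def hamming_distance(x, y):
    """Number of attributes on which two categorical records differ."""
    return sum(a != b for a, b in zip(x, y))


def iof_similarity(x, y, data):
    """Inverse occurrence frequency similarity, averaged over attributes.

    Assumed form: a matching attribute scores 1; a mismatch scores
    1 / (1 + log f(x_k) * log f(y_k)), where f counts how often a value
    occurs in that attribute column of `data`. Both records must come
    from `data`, so every count is at least 1.
    """
    d = len(x)
    total = 0.0
    for k in range(d):
        if x[k] == y[k]:
            total += 1.0
        else:
            freq = Counter(row[k] for row in data)
            total += 1.0 / (1.0 + math.log(freq[x[k]]) * math.log(freq[y[k]]))
    return total / d


def lanczos_kernel(t, a=3):
    """Lanczos kernel sinc(t) * sinc(t / a) for |t| < a, and 0 elsewhere."""
    if t == 0:
        return 1.0
    if abs(t) >= a:
        return 0.0
    pt = math.pi * t
    return a * math.sin(pt) * math.sin(pt / a) / (pt * pt)


# Hypothetical toy data set: (colour, size) records.
data = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "small"),
    ("blue", "large"),
    ("red", "small"),
]
dist = hamming_distance(data[0], data[1])
sim = iof_similarity(data[0], data[1], data)
```

A pairwise matrix built from such a similarity can then be passed to a spectral clustering routine as a precomputed affinity, which is the general shape of the comparison algorithms the abstract describes.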

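Cite this article

Tian Zhenzhen (田真真), Zhao Shuliang (赵书良), Li Wenbin (李文斌), Zhang Lulu (张璐璐), Chen Runzi (陈润资). Multi-scale Clustering Mining Method Based on Coupled Metric Similarity[J]. Journal of Data Acquisition and Processing, 2020, 35(3): 549-562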

History
  • Received: 2019-12-01
  • Last revised: 2019-12-29
  • Published online: 2020-05-25