Clustering Algorithm Based on Weighting Joint Probability Model
    Abstract:

    The sequential information bottleneck (sIB) algorithm is a widely used clustering algorithm. It represents data with a joint probability model, which expresses the correlation between samples and attributes well. However, the joint probability model used by sIB assumes that every attribute contributes equally to the clustering, which weakens the clustering result. This paper proposes the concept of a weighting joint probability model (WJPM): mutual information is used to measure the importance of each attribute, and a weighting joint probability model is built to optimize the data representation, so that representative attributes are highlighted and redundant attributes are suppressed. Experiments on UCI datasets show that the WJPM_sIB algorithm, which is based on the weighting joint probability model, outperforms the sIB algorithm; measured by F1, the clustering results of WJPM_sIB improve on those of sIB by 5.90%.
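
    The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea and not the authors' WJPM construction. It assumes that the data is a non-negative sample-by-attribute count matrix, that an attribute's importance is its share of the mutual information I(X;Y) between samples and attributes, and that the weighted model is the joint distribution rescaled by those weights and renormalized. The function names are hypothetical.

```python
import numpy as np


def joint_probability(counts):
    """Normalize a sample-by-attribute count matrix into p(x, y)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()


def attribute_mi_contributions(p_xy):
    """Per-attribute share of I(X;Y): sum_x p(x,y) * log(p(x,y) / (p(x) p(y))).

    Each term equals p(y) * KL(p(x|y) || p(x)), so it is non-negative.
    """
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal over samples
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal over attributes
    denom = p_x * p_y
    ratio = np.ones_like(p_xy)              # log(1) = 0 where p(x, y) = 0
    mask = p_xy > 0
    ratio[mask] = p_xy[mask] / denom[mask]
    return (p_xy * np.log(ratio)).sum(axis=0)


def weighted_joint_probability(counts):
    """Assumed weighting scheme: scale each attribute column by its
    normalized mutual-information contribution, then renormalize."""
    p_xy = joint_probability(counts)
    contrib = attribute_mi_contributions(p_xy)
    total = contrib.sum()
    if total > 0:
        weights = contrib / total
    else:
        # Samples and attributes are independent: fall back to equal weights.
        weights = np.full(p_xy.shape[1], 1.0 / p_xy.shape[1])
    weighted = p_xy * weights
    return weighted / weighted.sum(), weights
```

    Under these assumptions, an attribute whose distribution is the same for every sample carries no information about the samples and receives weight zero, while attributes that separate the samples keep their mass. The resulting matrix could then be passed to a clustering routine such as sIB in place of the unweighted joint distribution.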

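Cite this article

Ji Bo, Ye Yangdong, Lu Hongxing. Clustering Algorithm Based on Weighting Joint Probability Model[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2016, 31(1): 130-138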

History
  • Published online: 2018-04-09