Name Disambiguation Based on Clustering by Step

    Abstract:

    A single entity definition in a knowledge base carries only sparse features, and manually set similarity thresholds generalize poorly. To address both problems, this paper proposes a person name disambiguation algorithm based on stepwise clustering. First, the person attribute features in the knowledge base's name entity definitions are used as query features, and text retrieval produces an initial knowledge-base-driven clustering, which compensates for the feature sparsity of individual entity definitions. Second, starting from the initial clusters, an agglomerative hierarchical clustering algorithm with an adaptive threshold disambiguates the names covered by the knowledge base. Finally, conditional random fields identify the Other class, and the same adaptive-threshold agglomerative hierarchical clustering performs the S-class clustering, which disambiguates names not covered by the knowledge base. Experiments on the CLP2012 Chinese person name disambiguation evaluation corpus show that the proposed algorithm disambiguates person names effectively.
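
    The stepwise procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made only for illustration: the toy knowledge base and documents, the function names, keyword overlap standing in for text retrieval, average-link merging, and the adaptive threshold (taken here as the mean pairwise cosine similarity, since the abstract does not say how the threshold is derived). The conditional random field step for the Other class is omitted, and for brevity the agglomerative step is applied only to documents that match no knowledge base entry.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def initial_clusters(kb, docs):
    """Step 1 (assumed form): attach each document to the knowledge base
    entity whose attribute terms it mentions most often; documents with
    no matching attribute are returned separately."""
    clusters = {entity: [] for entity in kb}
    unmatched = []
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        scores = {entity: sum(counts[a] for a in attrs) for entity, attrs in kb.items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            clusters[best].append(doc_id)
        else:
            unmatched.append(doc_id)
    return clusters, unmatched


def agglomerate(docs, ids):
    """Average-link agglomerative clustering that stops merging once the
    best link is no longer above the threshold. The threshold is the mean
    pairwise similarity of the input documents (an assumption)."""
    vecs = {i: Counter(docs[i]) for i in ids}
    sims = [cosine(vecs[a], vecs[b]) for a, b in combinations(ids, 2)]
    threshold = sum(sims) / len(sims) if sims else 0.0

    def link(c1, c2):
        return sum(cosine(vecs[a], vecs[b]) for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[i] for i in ids]
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        if link(clusters[i], clusters[j]) <= threshold:
            break
        clusters[i].extend(clusters.pop(j))
    return clusters


# Hypothetical data: two knowledge base entries sharing one name.
kb = {"physicist": {"professor", "physics"}, "footballer": {"striker", "footballer"}}
docs = {
    "d1": ["professor", "physics", "lecture"],
    "d2": ["physics", "paper", "professor"],
    "d3": ["striker", "goal", "footballer"],
    "d4": ["singer", "album", "concert"],
    "d5": ["album", "singer", "tour"],
    "d6": ["chef", "restaurant", "menu"],
}
known, unmatched = initial_clusters(kb, docs)
rest = agglomerate(docs, unmatched)
print(known)
print(rest)
```

    On this toy data the first step attaches d1 and d2 to the physicist entry and d3 to the footballer entry, and the second step groups the two singer documents while leaving d6 alone. Deriving the threshold from the data rather than fixing it by hand is the point of the adaptive variant; a different statistic could be substituted without changing the rest of the sketch.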

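Cite this article

Yang Yilin, Zhou Jie, Li Bicheng, Xi Yaoyi. Name Disambiguation Based on Clustering by Step[J]. Journal of Data Acquisition and Processing, 2016, 31(1): 213-222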

History
  • Published online: 2018-04-09