Method of Entity Linking Based on Word Embedding

doi:10.16337/j.1004-9037.2017.03.020


Abstract:

The entity linking task mainly comprises named entity recognition, query expansion, candidate entity selection, feature extraction, and ranking. This paper addresses query expansion and proposes an expansion method based on word embeddings. The method trains word vectors for the words in the corpus with the continuous bag-of-words (CBOW) model and then takes the words closest to the query term as expansion terms. Because word vectors capture the semantic relatedness between words that is mined from the corpus, they complement rule-based query expansion and help recall more candidate entities. For feature extraction, the latent Dirichlet allocation (LDA) topic similarity between documents is used as one of the features. When document similarity is computed, the dimensions of the document vector are no longer high-frequency words but the related words obtained from the word vectors, which yields a semantic similarity feature between documents. Finally, a pointwise learning-to-rank model links the query term to the corresponding candidate entity. Experimental results show that the method reaches an F1 score of 0.71, which indicates that it is effective for entity linking.
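
The query expansion step described in the abstract can be illustrated with a minimal sketch. It assumes that word vectors are already available and that closeness is measured by cosine similarity; the abstract only says that nearby words are used, so the similarity measure, the toy vectors, and the names `EMBEDDINGS`, `cosine`, and `expand_query` are all hypothetical choices made for illustration rather than the authors' implementation.

```python
import math

# Toy 3-dimensional word vectors with made-up values (assumption).
# In the paper's setting these would come from a CBOW model trained on the corpus.
EMBEDDINGS = {
    "apple":     [0.9, 0.1, 0.0],
    "iphone":    [0.8, 0.2, 0.1],
    "macintosh": [0.7, 0.3, 0.0],
    "banana":    [0.1, 0.9, 0.2],
    "river":     [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(query, embeddings, top_k=2):
    """Return the top_k vocabulary words closest to `query` in embedding space."""
    if query not in embeddings:
        return []  # no vector for the query, so nothing to expand with
    query_vec = embeddings[query]
    scored = [
        (cosine(query_vec, vec), word)
        for word, vec in embeddings.items()
        if word != query
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored[:top_k]]


print(expand_query("apple", EMBEDDINGS))  # ['iphone', 'macintosh']
```

In this sketch the expansion terms would be added to the rule-based expansions before candidate entities are retrieved. The value of `top_k` is a free parameter here, and the abstract does not state how many neighbours are kept.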


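Cite this article

Qi Aiqin, Xu Weiran. Method of Entity Linking Based on Word Embedding[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2017, 32(3): 604-611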

Published online: 2017-06-28
