Multi-Truth Finding Algorithms for Data Integration
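Authors: CHEN Liefeng, XU Qinglin
Affiliation:

Department of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006, China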

Fund Project:

Supported by the Major Special Project of the Science and Technology Department of Guangdong Province (No. 2016B030306003).


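Abstract:

In the era of big data, large-scale data are often contributed by multiple data sources and serve multiple data-driven applications. Because sources differ in trustworthiness, they often provide conflicting data, making it difficult to determine which information is true. In recent years, truth finding, which resolves such conflicts by identifying the values that best match reality across multiple sources, has become an active research topic. Current truth-finding algorithms usually assume that each attribute of an entity has only one true value; in reality, however, entities with multiple true values are more common. For multi-valued entities, this paper proposes a multi-truth finding algorithm that transforms multi-truth finding into a function optimization problem. By solving the objective function, the algorithm selects the values with the highest confidence as the entity's truths. In addition, when computing the confidence of claimed values, it uses an asymmetric method for computing the support between values and corrects each value's confidence with the support it receives from similar values. Experiments on several real-world datasets show that the proposed algorithm is more accurate than existing truth-finding algorithms.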
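
The abstract does not give the objective function or the support formula, so the following is only a minimal sketch of the general shape of such a method under stated assumptions; it is not the authors' algorithm. It assumes (1) an alternating optimization in the style of the CRH truth-discovery framework, where value confidences are weighted source votes and each source weight is updated in closed form as the log of total loss over that source's loss; (2) a hypothetical asymmetric support function `token_support(u, v)`, the fraction of v's tokens that also occur in u, so "John Smith" fully supports "Smith" while "Smith" only half supports "John Smith"; and (3) a fixed confidence threshold for selecting several truths per entity. The names `multi_truth_find`, `token_support`, `rho` and `threshold` are illustrative.

```python
import math
from collections import defaultdict


def token_support(u: str, v: str) -> float:
    """Hypothetical asymmetric support of value u for value v:
    the fraction of v's tokens that also occur in u."""
    tu, tv = set(u.lower().split()), set(v.lower().split())
    return len(tu & tv) / len(tv) if tv else 0.0


def multi_truth_find(claims, rho=0.3, threshold=0.5, iters=10, eps=1e-6):
    """claims: {source: {entity: set of claimed values}}.

    Returns ({entity: set of selected values}, {source: weight}).
    Illustrative sketch only; not the paper's objective or update rules.
    """
    sources = list(claims)
    weights = {s: 1.0 for s in sources}
    candidates = defaultdict(set)  # entity -> all values claimed by any source
    coverage = defaultdict(set)    # entity -> sources that describe it
    for s, ents in claims.items():
        for e, vals in ents.items():
            candidates[e] |= set(vals)
            coverage[e].add(s)

    conf = {}
    for _ in range(iters):
        # Step 1: fix source weights, compute value confidences as weighted votes.
        for e, vals in candidates.items():
            total = sum(weights[s] for s in coverage[e])
            base = {
                v: sum(weights[s] for s in coverage[e] if v in claims[s][e]) / total
                for v in vals
            }
            # Correct each confidence with asymmetric support from similar values.
            for v in vals:
                boost = sum(token_support(u, v) * base[u] for u in vals if u != v)
                conf[(e, v)] = min(1.0, base[v] + rho * boost)

        # Step 2: fix confidences, update source weights (CRH-style closed form).
        losses = {}
        for s in sources:
            loss = sum(
                (float(v in claims[s][e]) - conf[(e, v)]) ** 2
                for e in claims[s]
                for v in candidates[e]
            )
            losses[s] = loss + eps
        total_loss = sum(losses.values())
        # max(..., eps) keeps weights positive, e.g. when there is a single source.
        weights = {s: max(math.log(total_loss / losses[s]), eps) for s in sources}

    # Multi-truth selection: keep every value whose confidence clears the threshold.
    truths = {
        e: {v for v in vals if conf[(e, v)] >= threshold}
        for e, vals in candidates.items()
    }
    return truths, weights


if __name__ == "__main__":
    toy = {
        "s1": {"book1": {"John Smith", "Alice Wong"}},
        "s2": {"book1": {"John Smith", "Alice Wong"}},
        "s3": {"book1": {"Smith"}},
        "s4": {"book1": {"John Smith", "Bob Lee"}},
    }
    truths, weights = multi_truth_find(toy)
    print(sorted(truths["book1"]))  # ['Alice Wong', 'John Smith']
```

On this toy input the two sources that agree on both authors gain weight over the iterations, so "Alice Wong" is kept as a second truth even though fewer sources claim it than "John Smith", while the partial value "Smith" stays below the threshold despite the support it borrows from "John Smith". A threshold is used here because the number of true values per entity is unknown; a top-k rule would instead need k for every entity.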


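Cite this article:

CHEN Liefeng, XU Qinglin. Multi-Truth Finding Algorithms for Data Integration[J]. Journal of Data Acquisition and Processing, 2019, 34(3): 442-452.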

History
  • Received: 2018-07-11
  • Last revised: 2018-09-20
  • Published online: 2019-06-12