基于聚类和核密度估计假设检验的异常值检测方法
作者:
作者单位:

作者简介:

通讯作者:

基金项目:


Outlier Detection Based on Clustering and KDE Hypothesis Testing
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
    摘要:

    异常值检测是数据挖掘领域中的核心问题,在工业生产中也有着广泛的应用。准确高效的异常值检测方法能够及时反映出工业系统运行状态,为相关人员提供参考,而传统的异常值检测方法无法很好地检测出变化模式复杂、变化范围小、具有流数据特性的数据中的异常值。因此,本文提出了一种新的针对该类型数据的异常值检测方法:首先通过对数据进行聚类划分,将相似的数据进行归类,从而将原本复杂的数据分布拆解成为每个聚类下简单数据分布的叠加;然后使用核密度估计假设检验的方法对待检测数据进行异常值检测。在标准数据集和真实数据上的实验结果表明,该方法相比于传统的异常值检测方法在检测精度上有一定的提升。

    Abstract:

    Outlier detection is the core problem in data mining and is widely used in industrial production. Accurate and efficient outlier detection method can reflect the condition of industrial system in time, which provides reference for the relevant personnel. Traditional outlier detection algorithms can′t efficiently detect outliers in those data with complicated change modes, small change range and the characteristics of streaming data. In this paper a new method for detecting outliers is proposed. Firstly, the data are clustered into several categories by clustering. The data in the same categories share the common characteristics. In this way, we believe that the data in the same categories are under the same distribution which are simpler to fit than the whole data. So the original complex data distribution can be factored into several simple distributions. Secondly, kernel density estimation (KDE) hypothesis testing is used for abnormal value detection. Experiments in the UCI dataset and real industrial data show that the proposed method is more efficient than traditional methods.

    参考文献
    相似文献
    引证文献
引用本文

周春蕾田品卓杨晨琛王皓.基于聚类和核密度估计假设检验的异常值检测方法[J].数据采集与处理,2017,32(5):997-1004

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
历史
  • 收稿日期:
  • 最后修改日期:
  • 录用日期:
  • 在线发布日期: 2018-04-10