Acceleration Algorithm for k Nearest Neighbor Classification Based on Stratified Sampling

    Abstract:

    k nearest neighbor (kNN) classification is one of the most typical algorithms in data mining and is widely applied because of its high generalization performance and solid theoretical foundation. However, at test time kNN must compute the distance between the instance to be classified and every training instance, so prediction takes a long time on large-scale data. To address this, an acceleration algorithm for kNN based on stratified sampling (SS-kNN) is proposed. The algorithm first partitions the space containing the training instances into several regions that hold equal numbers of instances, then samples instances from each region, and finally determines which region the test instance falls into and searches for its k nearest neighbors among the instances sampled from that region and its adjacent regions. Compared with the original kNN algorithm and a kNN algorithm based on random sampling, SS-kNN achieves similar classification accuracy while running about 399 times and 16 times faster on average, respectively.
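
    The three steps described in the abstract (equal-size partition, per-region sampling, neighbor search restricted to the test instance's region and the adjacent ones) can be illustrated with a minimal Python sketch. The abstract does not say how the instance space is partitioned, so this sketch assumes the simplest possible choice: regions are intervals along the first feature only. The function names `build_strata` and `predict`, the `sample_frac` parameter, and the majority-vote rule are illustrative assumptions, not the authors' implementation.

```python
import bisect
import math
import random
from collections import Counter


def build_strata(X, y, n_strata, sample_frac, seed=0):
    """Split the training set into equal-frequency strata and sample each one.

    Assumption: strata are intervals along feature 0 only; the abstract
    does not specify how the space is partitioned.
    """
    rng = random.Random(seed)
    # Sort instance indices by the first feature, then cut into equal-size chunks.
    order = sorted(range(len(X)), key=lambda i: X[i][0])
    size = math.ceil(len(order) / n_strata)
    chunks = [order[s:s + size] for s in range(0, len(order), size)]
    # Upper boundary of every stratum except the last, used to locate queries.
    cuts = [X[chunk[-1]][0] for chunk in chunks[:-1]]
    samples = []
    for chunk in chunks:
        # Draw the same fraction from every stratum (at least one instance).
        m = max(1, round(sample_frac * len(chunk)))
        picked = rng.sample(chunk, m)
        samples.append([(X[i], y[i]) for i in picked])
    return cuts, samples


def predict(query, cuts, samples, k):
    """Majority vote among the k nearest sampled instances.

    Only the query's own stratum and its two neighbors are searched.
    """
    s = bisect.bisect_left(cuts, query[0])
    candidates = []
    for j in (s - 1, s, s + 1):
        if 0 <= j < len(samples):
            candidates.extend(samples[j])
    candidates.sort(key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in candidates[:k])
    return votes.most_common(1)[0][0]
```

    A call such as `cuts, samples = build_strata(X, y, n_strata=4, sample_frac=0.5)` followed by `predict(query, cuts, samples, k=3)` computes distances only to the sampled instances of at most three strata, which is where the saving over a full scan of the training set comes from in this sketch.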

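Cite this article

Song Yunsheng, Liang Jiye. Acceleration Algorithm for k Nearest Neighbor Classification Based on Stratified Sampling[J]. Journal of Data Acquisition and Processing, 2017, 32(6): 1153-1162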

History
  • Published online: 2018-04-10