Chinese Core Journal
Statistical Source Journal of Chinese Scientific and Technical Papers
ISSN: 1004-9037
CN: 32-1367/TN
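Contact information
  • Supervised by: China Association for Science and Technology
  • Sponsored by: Nanjing University of Aeronautics and Astronautics
  •               Chinese Institute of Electronics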
  • Address: 29 Yudao Street, Nanjing
  • Tel: 025-84892742
  • Fax: 025-84892742
  • E-mail: sjcj@nuaa.edu.cn
  • Postal code: 210016
An Acceleration Algorithm for k Nearest Neighbor Classification Based on Stratified Sampling
Submitted: 2016-04-01  Last revised: 2017-11-10
DOI:
Keywords: stratified sampling; data partition; nearest neighbor; classification accuracy; running time
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)
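Authors (name, affiliation, e-mail):
Song Yunsheng (宋云胜), School of Computer and Information Technology, Shanxi University, sys_sd@126.com
Liang Jiye (梁吉业), School of Computer and Information Technology, Shanxi University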
Abstract:
      With the rapid development of sensor technology, data from every field is growing exponentially, which poses a great challenge for data management and processing. k nearest neighbor (kNN) classification, one of the most typical data mining algorithms, is widely applied owing to its good generalization performance and solid theoretical foundation. However, at prediction time kNN must compute the distance between each test instance and every training instance, so it takes a substantial amount of time on large-scale data. To address this problem, we propose an acceleration algorithm for kNN classification based on stratified sampling (SS-kNN). SS-kNN first partitions the space containing the training instances into several regions that hold equal numbers of instances, then samples instances from each region, and finally determines which region a test instance falls into and searches for its k nearest neighbors among the instances sampled from that region and its adjacent regions. Compared with the original kNN and a kNN variant based on random sampling, SS-kNN achieves similar classification accuracy while running about 399 times and 16 times faster, respectively.
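The three steps named in the abstract (equal-count partition, per-region sampling, neighbor search restricted to the local and adjacent regions) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the space is partitioned or how many instances are drawn, so the sketch assumes equal-frequency splitting along a single feature and a fixed sampling fraction per region. The names `fit_ss_knn`, `predict_ss_knn`, `n_regions`, `sample_frac` and `axis` are hypothetical.

```python
import bisect
import math
import random
from collections import Counter


def fit_ss_knn(X, y, n_regions=4, sample_frac=0.5, axis=0, seed=0):
    """Split the training data into equal-count regions along one feature
    (an assumption of this sketch), then sample from every region."""
    rng = random.Random(seed)
    # Indices of the training instances, sorted by the chosen feature.
    order = sorted(range(len(X)), key=lambda i: X[i][axis])
    n = len(order)  # requires n >= n_regions so that no region is empty
    # Region r holds order[bounds[r]:bounds[r + 1]]; sizes differ by at most 1.
    bounds = [r * n // n_regions for r in range(n_regions + 1)]
    regions = [order[bounds[r]:bounds[r + 1]] for r in range(n_regions)]
    # edges[r] is the smallest coordinate found in region r + 1.
    edges = [X[reg[0]][axis] for reg in regions[1:]]
    # Stratified sampling: the same fraction is drawn from each region.
    samples = [
        rng.sample(reg, max(1, round(sample_frac * len(reg))))
        for reg in regions
    ]
    return {"X": X, "y": y, "axis": axis, "edges": edges, "samples": samples}


def predict_ss_knn(model, x, k=3):
    """Majority vote among the k nearest sampled instances taken from the
    region containing x and the regions adjacent to it."""
    # Locate the region of x by counting the edges at or below its coordinate.
    r = bisect.bisect_right(model["edges"], x[model["axis"]])
    lo, hi = max(0, r - 1), min(len(model["samples"]) - 1, r + 1)
    candidates = [i for s in model["samples"][lo:hi + 1] for i in s]
    # Only the candidates are compared with x, not the whole training set.
    candidates.sort(key=lambda i: math.dist(model["X"][i], x))
    votes = Counter(model["y"][i] for i in candidates[:k])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated groups along the first feature.
X = [(float(i), 0.0) for i in range(20)] + [(100.0 + i, 0.0) for i in range(20)]
y = [0] * 20 + [1] * 20
model = fit_ss_knn(X, y, n_regions=4, sample_frac=0.5)
print(predict_ss_knn(model, (3.0, 0.0)), predict_ss_knn(model, (115.0, 0.0)))
```

In this sketch each query is compared with at most three regions' worth of sampled instances instead of the full training set, which is the kind of saving the abstract describes. The speedups reported there depend on the authors' own partitioning and sampling choices.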

Copyright © 2010-2015 Editorial Office of Journal of Data Acquisition and Processing (《数据采集与处理》)
