College of Computer Science and Technology, Nanjing University of Aeronautics & Astronautics, Nanjing 211106, China
Abstract:
Soft large margin clustering (SLMC) achieves better clustering performance than algorithms such as K-Means and offers a degree of interpretability. However, when applied to large-scale data held in distributed storage, it runs into the same scalability bottleneck as those algorithms, because computing the kernel matrix it relies on is expensive in time. One effective way to reduce this cost is to approximate the kernel function with random Fourier features, but the feature dimension needed for an accurate approximation is often very high, which carries a risk of overfitting. This paper embeds sparsity into kernel SLMC and combines it with the alternating direction method of multipliers (ADMM) to obtain a distributed sparse soft large margin clustering algorithm (DS-SLMC), which addresses the scalability problem and gains better interpretability through sparsification.
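The random Fourier feature approximation mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2), and the helper names `rff_features` and `rbf_kernel`, the synthetic data, and the value of `gamma` are all hypothetical choices made for illustration. The sketch shows how the approximation error tends to shrink only as the feature dimension grows, which is the trade-off that motivates sparsity.

```python
import numpy as np

def rff_features(X, n_features, gamma, rng):
    """Map X (n_samples, d) to random Fourier features whose inner products
    approximate the RBF kernel exp(-gamma * ||x - y||^2) (assumed kernel)."""
    d = X.shape[1]
    # Frequencies are drawn from the kernel's spectral density, N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phase shifts, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def rbf_kernel(X, gamma):
    """Exact RBF kernel matrix, used here only as a reference."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # synthetic data, for illustration only
gamma = 0.1                     # hypothetical kernel width
K_exact = rbf_kernel(X, gamma)

errors = {}
for D in (50, 500, 5000):
    Z = rff_features(X, D, gamma, rng)
    # Largest entrywise gap between the approximate and exact kernel matrices.
    errors[D] = float(np.max(np.abs(Z @ Z.T - K_exact)))
    print(f"feature dimension {D}: max abs error {errors[D]:.4f}")
```

Once the features `Z` are computed, a linear model on `Z` stands in for the kernel model, so the n-by-n kernel matrix never has to be formed. The cost moves into the feature dimension instead, which is where a sparsity-inducing penalty on the weights becomes relevant.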