Multi-label Data Stream Ensemble Classification Approach Based on Kernel Extreme Learning Machine

doi:10.16337/j.1004-9037.2022.01.016

2022, Issue 1, pp. 183-193.

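Affiliations: 1. Key Laboratory of Big Data Knowledge Engineering, Ministry of Education (Hefei University of Technology), Hefei 230601, China; 2. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
Funding: National Natural Science Foundation of China (61976077, 62076085).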

Abstract:

Extreme learning machines (ELMs) have been applied successfully to batch multi-label classification because they train efficiently, perform well, and require few manually set parameters. However, data streams in real-world applications are high-volume, high-speed, and multi-label, and they exhibit concept drift, so traditional multi-label classification algorithms face challenges in accuracy as well as in time and space consumption. This paper therefore proposes a multi-label data stream ensemble classification approach based on the kernel extreme learning machine (KELM). First, to adapt to the data stream setting, a sliding window mechanism partitions the stream into data chunks, and an ensemble of k KELM models is built on the first k chunks. At the same time, to account for label correlation, the Apriori algorithm is used to derive association rules among the labels of each data chunk, and the confidence of co-occurring labels in these rules is introduced into the ensemble's prediction process to improve overall classification accuracy. Second, the MUENLForest model is introduced to detect whether concept drift occurs in a newly arriving data chunk, and a loss function defined on the classifiers is used to update the ensemble model so that it adapts to the drift. Finally, extensive experiments on real multi-label data sets show that, compared with classical multi-label batch and data stream classification methods, the proposed approach adapts to concept drift in multi-label data streams and has a significant advantage in classification accuracy.
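The chunk-based pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the KELM uses the standard closed-form solution with an RBF kernel; `pairwise_confidence` computes confidences for two-label rules only, as a simplified stand-in for Apriori; the way confidences adjust the scores (the `alpha` term) is hypothetical; the MUENLForest drift detector is omitted, so the ensemble simply replaces its worst member (by squared loss on the newest chunk) whenever a chunk arrives. All names, including `make_chunk` and its toy data, are invented for this example.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KELM:
    """Kernel extreme learning machine: beta = (K + I/C)^-1 Y, f(x) = K(x, X) beta."""
    def __init__(self, C=10.0, gamma=0.5):
        self.C, self.gamma = C, gamma

    def fit(self, X, Y):
        # Y has one column per label, encoded as +1 (relevant) or -1 (irrelevant).
        self.X = X
        K = rbf_kernel(X, X, self.gamma)
        self.beta = np.linalg.solve(K + np.eye(len(X)) / self.C, Y)
        return self

    def decision(self, X):
        return rbf_kernel(X, self.X, self.gamma) @ self.beta

def pairwise_confidence(Y):
    """conf[i, j] = confidence of the rule (label i -> label j) on one chunk."""
    B = (Y > 0).astype(float)
    co = B.T @ B                                  # label co-occurrence counts
    conf = co / np.maximum(np.diag(co), 1.0)[:, None]
    np.fill_diagonal(conf, 0.0)
    return conf

class ChunkEnsemble:
    """Keeps at most k KELMs, one per data chunk, and averages their scores."""
    def __init__(self, k=3, **kelm_args):
        self.k, self.kelm_args, self.members = k, kelm_args, []

    def update(self, X, Y):
        new = KELM(**self.kelm_args).fit(X, Y)
        if len(self.members) < self.k:
            self.members.append(new)
            return
        # Assumed rule: replace the member with the largest loss on the newest chunk.
        losses = [np.mean((m.decision(X) - Y) ** 2) for m in self.members]
        self.members[int(np.argmax(losses))] = new

    def predict(self, X, conf=None, alpha=0.2):
        scores = np.mean([m.decision(X) for m in self.members], axis=0)
        if conf is not None:
            # Hypothetical adjustment: labels predicted relevant raise the scores
            # of labels they tend to co-occur with, weighted by rule confidence.
            scores = scores + alpha * ((scores > 0).astype(float) @ conf)
        return np.where(scores >= 0.0, 1, -1)

def make_chunk(rng, n=40):
    """Toy chunk: four tight 2-D clusters; labels 0 and 1 always co-occur."""
    centers = rng.choice([-2.0, 2.0], size=(n, 2))
    X = centers + 0.2 * rng.normal(size=(n, 2))
    right, upper = centers[:, 0] > 0, centers[:, 1] > 0
    Y = np.where(np.column_stack([right, right, upper]), 1.0, -1.0)
    return X, Y

rng = np.random.default_rng(0)
model = ChunkEnsemble(k=3)
for _ in range(4):                   # four chunks arrive, only three members are kept
    X, Y = make_chunk(rng)
    model.update(X, Y)
conf = pairwise_confidence(Y)        # rules mined from the latest labeled chunk
X_new, Y_new = make_chunk(rng)
accuracy = np.mean(model.predict(X_new, conf) == Y_new)
```

Solving the regularized linear system with `np.linalg.solve` avoids forming an explicit matrix inverse. In a real stream the replacement step would presumably be triggered by the drift detector rather than on every chunk.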