Concept Drift Detection and Convergence Based on Hybrid Ensemble of Serial and Cross
Authors: GUO Husheng, GAO Shuhua, WANG Wenjian
Affiliations:

1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006, China

Funding:

National Natural Science Foundation of China (62276157, U21A20513, 62076154, 61503229); Central Government Guided Local Science and Technology Development Fund (YDZX20201400001224); Natural Science Foundation of Shanxi Province (201901D111033); Key Research and Development Program of Shanxi Province (International Cooperation) (201903D421050).

    Abstract:

    Concept drift is an important and difficult problem in streaming data mining. Most existing approaches to concept drift rely on ensemble learning, but many of them fail to extract the key information of the new data distribution promptly after a drift occurs, which degrades model performance. To address this problem, this paper proposes a concept drift detection and convergence method based on a hybrid ensemble of serial and cross classifiers (SC_ensemble). While the data stream is stable, the method builds serial base classifiers and ensembles them to extract information representing the overall data distribution. After a concept drift occurs, it builds parallel cross base classifiers near the drift point and ensembles them to extract local information representing the latest distribution. By combining serial and cross base classifiers in one hybrid ensemble, the method retains the overall distribution information in the stream while reinforcing the important local information around the drift, so that the ensemble contains more "good but different" base learners and the learning models are fused efficiently after a drift. Experimental results show that the method enables the online learning model to converge quickly after a concept drift and improves its generalization performance.
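The serial/cross idea described in the abstract can be illustrated with a small chunk-based sketch. This is not the authors' SC_ensemble implementation: the abstract does not specify the base learner, the drift test, or how the cross classifiers are built, so everything below is an assumption made for illustration. `CentroidClassifier`, `HybridEnsembleSketch`, the `drift_jump` error-jump rule, the fold-style "cross" subsets, and the accuracy-based voting weights are all hypothetical names and choices.

```python
from collections import defaultdict, deque


class CentroidClassifier:
    """Toy base learner (assumption): predicts the class with the nearest feature mean."""

    def __init__(self, X, y):
        groups = defaultdict(list)
        for xi, yi in zip(X, y):
            groups[yi].append(xi)
        # One centroid per class, computed column by column.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }

    def predict(self, x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=sq_dist)


class HybridEnsembleSketch:
    """Hypothetical hybrid of serial and cross base classifiers over data chunks."""

    def __init__(self, max_members=8, n_cross=3, drift_jump=0.3):
        self.members = deque(maxlen=max_members)  # entries: [classifier, weight]
        self.n_cross = n_cross                    # cross classifiers built per drift
        self.drift_jump = drift_jump              # assumed drift rule: jump in chunk error
        self.prev_error = None

    def predict(self, x):
        # Weighted vote: each member votes with its accuracy on the latest chunk.
        votes = defaultdict(float)
        for clf, weight in self.members:
            votes[clf.predict(x)] += weight
        return max(votes, key=votes.get)

    def process_chunk(self, X, y):
        """Test-then-train on one chunk; returns (error_rate, drift_flag)."""
        error, drift = None, False
        if self.members:
            error = sum(self.predict(xi) != yi for xi, yi in zip(X, y)) / len(y)
            drift = (self.prev_error is not None
                     and error - self.prev_error > self.drift_jump)
            # Re-weight every member by its accuracy on the newest chunk.
            for member in self.members:
                hits = sum(member[0].predict(xi) == yi for xi, yi in zip(X, y))
                member[1] = hits / len(y)
            self.prev_error = error
        if drift:
            # Cross members (assumption): each is trained on the post-drift chunk
            # with one interleaved fold left out, giving diverse local models.
            for k in range(self.n_cross):
                keep = [i for i in range(len(y)) if i % self.n_cross != k]
                clf = CentroidClassifier([X[i] for i in keep], [y[i] for i in keep])
                self.members.append([clf, 1.0])
        else:
            # Serial member: one classifier per stable chunk, trained on all of it.
            self.members.append([CentroidClassifier(X, y), 1.0])
        return error, drift
```

Feeding the sketch a sequence of `(X, y)` chunks through `process_chunk` reports the ensemble error on each chunk before training on it. Because stale members are down-weighted by their accuracy on the newest chunk, the cross members added at a drift dominate the vote on the following chunk, which is the fast-recovery behaviour the abstract describes.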

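Cite this article

GUO Husheng, GAO Shuhua, WANG Wenjian. Concept Drift Detection and Convergence Based on Hybrid Ensemble of Serial and Cross[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2022, 37(5): 997-1011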

History
  • Received: 2021-09-16
  • Last revised: 2022-01-01
  • Accepted:
  • Published online: 2022-10-12