Random Forest Clustering Based on Valley Detection and Three-Way Ensemble Selection
Affiliation:

Jiangsu Key Laboratory of Educational Intelligent Technology, School of Artificial Intelligence and Computer Science, Jiangsu Normal University, Xuzhou 221116, China

Fund Project:

National Natural Science Foundation of China (No. 62006104); Postgraduate Research & Practice Innovation Program of Jiangsu Normal University (No. 2025XKT1457).

    Abstract:

    Random forest clustering is an unsupervised learning method that is robust on high-dimensional, complex data. It nevertheless faces two problems: the introduction of negative samples weakens the discriminability of the original data, and noisy decision trees degrade clustering performance. To address these problems, this paper proposes a random forest clustering method based on valley detection and three-way ensemble selection (VDTES-RFC). First, valley detection is used to find potential split points for generating training data, and the Gini index is computed to determine the optimal split points, which completes the training of the classification forest. Second, each decision tree is treated as a base clusterer and its similarity matrix is extracted; a three-way ensemble selection strategy then selects high-quality decision trees to form a new forest. Finally, a consensus function integrates the similarity matrices to obtain the final clustering result. Experimental results show that the method improves clustering accuracy and robustness, optimizing both efficiency and performance.

    Highlights:

    1. The paper proposes a random forest clustering method based on valley detection and three-way ensemble selection (VDTES-RFC) to overcome the loss of discriminability in the original data and the interference of noisy decision trees.
    2. The paper develops a two-stage clustering scheme centered on potential split point optimization and dynamic tree filtering. It aligns Gini index-based data partitioning with similarity matrix extraction to ensure high-quality base clusterers.
    3. The paper adopts a three-way ensemble selection strategy combined with a consensus function to filter high-quality decision trees, optimizing both clustering efficiency and performance.
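The second and third steps described in the abstract (per-tree similarity matrices, three-way selection, consensus) can be illustrated with a minimal Python sketch. The abstract does not specify the quality measure, the thresholds, the handling of deferred trees, or the consensus function, so everything below is an assumption made for illustration: the `agreement` score, the thresholds `alpha` and `beta`, the second look at deferred trees, and the connected-components consensus are hypothetical stand-ins, not the VDTES-RFC definitions. Forest training with valley detection and the Gini index is not covered; the input is simply the leaf index of each sample in each tree.

```python
from itertools import combinations


def coassociation(leaf_ids):
    """Similarity matrix of one tree: 1.0 where two samples share a leaf."""
    n = len(leaf_ids)
    return [[1.0 if leaf_ids[i] == leaf_ids[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mean_matrix(mats):
    """Element-wise average of equally sized square matrices."""
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]


def agreement(mat, ref):
    """ASSUMED quality score: 1 - mean absolute difference over sample pairs.

    Requires at least two samples.
    """
    pairs = list(combinations(range(len(mat)), 2))
    return 1.0 - sum(abs(mat[i][j] - ref[i][j]) for i, j in pairs) / len(pairs)


def three_way_select(scores, alpha, beta):
    """Split indices into (accept, defer, reject) using thresholds alpha > beta."""
    accept, defer, reject = [], [], []
    for idx, score in enumerate(scores):
        if score >= alpha:
            accept.append(idx)
        elif score <= beta:
            reject.append(idx)
        else:
            defer.append(idx)
    return accept, defer, reject


def consensus_labels(sim, threshold=0.5):
    """ASSUMED consensus: connected components of the graph sim > threshold."""
    n = len(sim)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and sim[u][v] > threshold:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels


def select_and_cluster(leaf_assignments, alpha, beta):
    """Select trees with a three-way decision, then cluster with the kept trees."""
    mats = [coassociation(leaves) for leaves in leaf_assignments]
    forest_ref = mean_matrix(mats)
    scores = [agreement(m, forest_ref) for m in mats]
    accept, defer, _reject = three_way_select(scores, alpha, beta)
    if not accept:  # fall back to the single best-scoring tree
        accept = [max(range(len(scores)), key=scores.__getitem__)]
        defer = [i for i in defer if i not in accept]
    # ASSUMED second look: compare deferred trees with the accepted trees only.
    accepted_ref = mean_matrix([mats[i] for i in accept])
    kept = sorted(accept + [i for i in defer
                            if agreement(mats[i], accepted_ref) >= alpha])
    return kept, consensus_labels(mean_matrix([mats[i] for i in kept]))


if __name__ == "__main__":
    # Hypothetical leaf indices of six samples in four trees; the last tree is noisy.
    toy_leaves = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1],
                  [0, 0, 1, 2, 2, 2], [0, 1, 0, 1, 0, 1]]
    kept, labels = select_and_cluster(toy_leaves, alpha=0.82, beta=0.7)
    print(kept, labels)
```

On this toy input the first two trees are accepted outright, the third is deferred and then kept after the second look, and the noisy fourth tree is rejected, so the script prints `[0, 1, 2] [0, 0, 0, 1, 1, 1]`. The deferred region is what separates a three-way decision from a single cut-off: borderline trees are judged against the already accepted trees rather than against the full, noise-contaminated forest.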

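Cite this article

TAN Cheng, LI Jinyu, ZHANG Wenbin, DU Mingjing. Random forest clustering based on valley detection and three-way ensemble selection[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2026, (3): 780-794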

History
  • Received: 2025-06-15
  • Last revised: 2025-07-03
  • Accepted:
  • Published online: 2026-06-10