Random Forest Clustering Based on Valley Detection and Three-Way Ensemble Selection
Affiliation:

Jiangsu Key Laboratory of Educational Intelligent Technology, School of Artificial Intelligence and Computer Science, Jiangsu Normal University, Xuzhou 221116, China

CLC Number:

TP181

Fund Project:

National Natural Science Foundation of China (No.62006104); Postgraduate Research & Practice Innovation Program of Jiangsu Normal University (No.2025XKT1457).

    Abstract:

    Random forest clustering, an unsupervised learning method, is robust when processing high-dimensional and complex data, but it still faces two challenges: the negative samples it introduces weaken the discriminability of the original data, and noisy decision trees interfere with clustering performance. To address these issues, this paper proposes a random forest clustering method based on valley detection and three-way ensemble selection (VDTES-RFC). First, valley detection is used to identify potential split points for generating training data, and the Gini index is computed to determine the optimal split points, which completes the training of the classification forest. Second, each decision tree is treated as a base clusterer from which a similarity matrix is extracted, and a three-way ensemble selection strategy selects high-quality decision trees to construct a new forest. Finally, a consensus function integrates the similarity matrices to obtain the final clustering result. Experimental results demonstrate that the method improves clustering accuracy and robustness, optimizing both efficiency and performance.

    Highlights:
    1. The paper proposes a random forest clustering method based on valley detection and three-way ensemble selection (VDTES-RFC) to overcome the loss of discriminability in the original data and the interference of noisy decision trees.
    2. The paper develops a dual-stage clustering scheme centered on potential split point optimization and dynamic tree filtering. It aligns Gini index-based data partitioning with similarity matrix extraction to ensure high-quality base clusterers.
    3. The paper adopts a three-way ensemble selection strategy combined with a consensus function to filter high-quality decision trees, optimizing both clustering efficiency and performance.
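The three stages named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not specify how valleys are detected, how tree quality is scored, or which consensus function is used. The candidate thresholds passed to `best_split` stand in for valley-detected split points, the leaf co-occurrence matrix is one common way to read a similarity matrix off a tree, and the `scores`, `alpha`, and `beta` arguments of `three_way_select` are hypothetical names introduced only for this example.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Size-weighted Gini impurity after splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def best_split(values, labels, candidates):
    """Pick the candidate threshold with the lowest weighted Gini impurity.

    `candidates` is assumed to come from some valley-detection step.
    """
    return min(candidates, key=lambda t: split_gini(values, labels, t))


def leaf_similarity(leaf_ids):
    """Binary similarity matrix: 1.0 where two samples fall in the same leaf."""
    n = len(leaf_ids)
    return [[1.0 if leaf_ids[i] == leaf_ids[j] else 0.0 for j in range(n)]
            for i in range(n)]


def three_way_select(scores, alpha, beta):
    """Split tree indices into accept / defer / reject regions.

    Assumed rule: score >= alpha is accepted, score <= beta is rejected,
    anything in between is deferred for further evaluation.
    """
    accept, defer, reject = [], [], []
    for index, score in enumerate(scores):
        if score >= alpha:
            accept.append(index)
        elif score <= beta:
            reject.append(index)
        else:
            defer.append(index)
    return accept, defer, reject


def consensus(matrices):
    """Element-wise mean of the similarity matrices of the selected trees."""
    k = len(matrices)
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)]
            for i in range(n)]


# Toy data: four samples of one feature, two classes (original vs. negative).
values = [1.0, 1.2, 3.8, 4.0]
labels = [0, 0, 1, 1]
print(best_split(values, labels, candidates=[1.1, 2.5, 3.9]))  # 2.5

# Leaf assignments of the same four samples in two hypothetical trees.
tree_a = leaf_similarity([0, 0, 1, 1])
tree_b = leaf_similarity([0, 1, 1, 1])
print(three_way_select([0.9, 0.5, 0.1], alpha=0.7, beta=0.3))  # ([0], [1], [2])
print(consensus([tree_a, tree_b])[0][1])  # 0.5
```

The averaged matrix returned by `consensus` would then be passed to any clustering routine that accepts a precomputed similarity, which is the role the abstract assigns to the consensus function.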

Get Citation

TAN Cheng, LI Jinyu, ZHANG Wenbin, DU Mingjing. Random Forest Clustering Based on Valley Detection and Three-Way Ensemble Selection[J]. Journal of Data Acquisition and Processing, 2026, (3): 780-794.

History
  • Received: June 15, 2025
  • Revised: July 03, 2025
  • Online: June 10, 2026