Research and Application of Imbalanced Credit Data Classification Based on NaN-Bicluster SMOTE
Authors: HE Liang, XU Haiyan, CHEN Lu
Affiliation:

College of Economics and Management, Nanjing University of Aeronautics & Astronautics, Nanjing 211106, China

Fund Project:

National Natural Science Foundation of China, General Program (71971115); National Natural Science Foundation of China, Young Scientists Program (72201126); Key Laboratory of Intelligent Decision and Digital Operations, Ministry of Industry and Information Technology (NJ2023027).

Abstract:

To assess borrowers' credit risk effectively from imbalanced credit data, we propose NaN-Bicluster SMOTE, an improvement of the synthetic minority oversampling technique (SMOTE) that combines SMOTE with natural neighbors (NaN) and biclustering (Bicluster). First, parameter-free natural neighbors define the logical rules for selecting the samples to oversample, which avoids the instability that arises when samples are partitioned by r nearest neighbors. Second, the stable structure of natural neighbors defines the logical rules for setting a safe range, which keeps synthetic samples from becoming noise samples. Third, biclustering mines local rules, and the SMOTE synthesis formula is modified so that synthetic samples inherit those local rules. Finally, on an imbalanced credit dataset from the Prosper microloan platform, NaN-Bicluster SMOTE is compared with several sampling methods and machine learning models, and statistical tests are used to further verify its superior performance.
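For orientation, the baseline that the abstract sets out to improve can be sketched as follows. This is a minimal sketch of classic SMOTE interpolation only, not the authors' NaN-Bicluster SMOTE: the function name `smote_sketch`, the fixed neighbor count `k`, and the `seed` parameter are illustrative assumptions, and the natural-neighbor selection, safe range, and bicluster rules described in the abstract are not implemented here.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Classic SMOTE interpolation (illustrative sketch, hypothetical helper).

    X_min : array of shape (n, d) holding minority-class samples only.
    n_new : number of synthetic samples to generate.
    k     : fixed neighbor count; the abstract's method replaces this
            with parameter-free natural neighbors (not shown here).
    """
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Indices of the k nearest minority neighbors of each sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbors for each synthetic point.
    base = rng.integers(0, n, size=n_new)
    pick = rng.integers(0, k, size=n_new)
    nn = neighbors[base, pick]

    # Interpolate: x_new = x_base + lam * (x_nn - x_base), lam in [0, 1).
    lam = rng.random((n_new, 1))
    return X_min[base] + lam * (X_min[nn] - X_min[base])
```

Each synthetic point lies on the line segment between a minority sample and one of its k nearest minority neighbors. Because k is fixed by hand and nothing bounds where the segment may fall, a synthetic point can land in a majority-class region; these are the two weaknesses that the natural-neighbor rules and the safe range in the abstract are meant to address.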

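Cite this article

HE Liang, XU Haiyan, CHEN Lu. Research and Application of Imbalanced Credit Data Classification Based on NaN-Bicluster SMOTE[J]. Journal of Data Acquisition and Processing, 2023, 38(6): 1482-1494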

History
  • Received: 2022-07-13
  • Last revised: 2022-10-05
  • Accepted:
  • Published online: 2023-12-08