Prediction of Breast Cancer Based on Penalized Logistic Regression
Authors: HU Xuemei, XIE Ying, JIANG Huifeng
Affiliation:

1.School of Mathematics and Statistics, Chongqing Technology and Business University, Chongqing 400067, China; 2.Chongqing Key Laboratory of Economic and Social Applied Statistics, Chongqing Technology and Business University, Chongqing 400067, China; 3.Research Center for Economy of Upper Reaches of the Yangtse River, Chongqing Technology and Business University, Chongqing 400067, China

Fund Project:

Supported by the Fifth Batch of the Chongqing Higher Education Excellent Talents Support Program (68021900601); the General Project of Basic Research and Frontier Exploration of the Chongqing Science and Technology Commission (cstc.2018jcyjA2073); the Chongqing Statistics Graduate Supervisor Team (yds183002); the Major Project of the Science and Technology Research Program of the Chongqing Municipal Education Commission (KJZD-M202100801); the Chongqing Social Science Planning Project (2019WT59); the Open Project of the Chongqing Key Laboratory of Social and Economic Applied Statistics (KFJJ2018066); and the Mathematical Statistics Team of Chongqing Technology and Business University (ZDPTTD201906).

    Abstract:

    This paper applies penalized logistic regression to the University of Wisconsin breast cancer data to predict breast tumors. First, ten indicators related to breast cancer are selected as predictor variables. Next, four classifiers are fitted: logistic regression, LASSO penalized logistic regression, L2 penalized logistic regression, and elastic net penalized logistic regression, with 75% of the data set used as the training set. Finally, the remaining 25% test set, confusion matrices, and ROC curves are used to evaluate the prediction accuracy of each model. The results show that LASSO penalized logistic regression performs best, with a prediction accuracy of 97.18%. The performance of elastic net penalized logistic regression varies as α increases; at α=0.9 its prediction accuracy is also 97.18%, matching that of LASSO penalized logistic regression. L2 penalized logistic regression ranks third, and unpenalized logistic regression performs worst. Therefore, in the diagnosis of breast tumors, LASSO penalized logistic regression and elastic net penalized logistic regression can be used to improve diagnostic accuracy.
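The workflow summarized in the abstract (a 75%/25% split, four logistic classifiers, then accuracy, confusion matrix, and ROC evaluation) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes that scikit-learn's bundled Wisconsin diagnostic data set can stand in for the paper's data, and that its ten "mean" columns correspond to the ten indicators. The penalty strength `C=1.0`, the random seed, and the helper name `make_model` are placeholders chosen for the example. The elastic net mixing weight is passed as scikit-learn's `l1_ratio`, which is assumed here to play the role of the paper's α.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled Wisconsin diagnostic data stands in for the paper's
# data, and its first ten ("mean ...") columns for the ten indicators.
data = load_breast_cancer()
X, y = data.data[:, :10], data.target  # y: 0 = malignant, 1 = benign

# 75% training set, 25% test set, as described in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def make_model(l1_ratio, C):
    """Hypothetical helper: standardize, then fit elastic-net logistic regression.

    l1_ratio = 0 gives a pure L2 penalty, l1_ratio = 1 a pure L1 (LASSO) penalty.
    C is the inverse penalty strength, so a very large C is nearly unpenalized.
    """
    return make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=l1_ratio, C=C, max_iter=10000),
    )


models = {
    "logistic (near-unpenalized)": make_model(0.0, 1e6),
    "L2": make_model(0.0, 1.0),
    "elastic net (mixing 0.9)": make_model(0.9, 1.0),
    "LASSO": make_model(1.0, 1.0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion": confusion_matrix(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_score),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.4f}, "
          f"AUC={results[name]['auc']:.4f}")
    print(results[name]["confusion"])
```

Because the split, the penalty strength, and possibly the data differ from the paper's setup, the printed accuracies should not be expected to reproduce the reported 97.18%. In practice the penalty strength would usually be chosen by cross-validation on the training set rather than fixed in advance.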

    Fig.1 Correlation coefficient matrix for the ten predictor variables
    Fig.2 ROC curves for several models
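Cite this article

HU Xuemei, XIE Ying, JIANG Huifeng. Prediction of Breast Cancer Based on Penalized Logistic Regression[J]. Journal of Data Acquisition and Processing, 2021, 36(6): 1237-1249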

History
  • Received: 2020-12-03
  • Last revised: 2021-05-26
  • Accepted:
  • Published online: 2021-11-25