High-Dimensional Feature Selection Algorithm for Lung Tumors Based on Information Gain and Neighborhood Rough Set
Authors: LU Huiling, ZHOU Tao, ZHANG Feifei, HUO Bingqiang
Affiliations:

1. School of Science, Ningxia Medical University, Yinchuan, 750004, China; 2. School of Computer Science and Engineering, North Minzu University, Yinchuan, 750021, China; 3. China Telecom Corporation Limited Ningxia Branch, Yinchuan, 750002, China; 4. Ningxia Key Laboratory of Intelligent Information and Big Data Processing, Yinchuan, 750021, China

Fund Project:

Supported by the National Natural Science Foundation of China (61561040), the Ningxia 312 Talent Program, and the Scientific Research Start-up Project for Introduced Talents of North Minzu University (2020KYQD08).

    Abstract:

    Too many redundant and irrelevant attributes degrade lung tumor diagnosis, and the Pawlak rough set, which handles only discrete variables, discards a large amount of the original information. To address both problems, this paper proposes a high-dimensional feature selection algorithm for lung tumors that combines information gain and the neighborhood rough set (information gain-neighborhood rough set-support vector machine, IG-NRS-SVM). The algorithm first extracts 104-dimensional features from 3 000 lung tumor CT images to construct a decision information table. It then uses information gain to select a highly relevant feature subset and applies the neighborhood rough set to eliminate highly redundant attributes, so that two rounds of attribute reduction yield the optimal feature subset. Finally, a support vector machine tuned by grid search is used to build a classification model that distinguishes benign from malignant lung tumors. The feasibility and effectiveness of the method are verified from two perspectives, attribute reduction and classification, and the method is compared with no reduction and with reduction by the Pawlak rough set, information gain, and the neighborhood rough set alone. The results show that the hybrid algorithm is more accurate than all the comparison algorithms, reaching an accuracy of 96.17%, while effectively reducing time complexity. The method therefore offers some reference value for computer-aided diagnosis of lung tumors.
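The two-stage reduction described in the abstract (an information-gain filter for relevance, then a neighborhood rough set pass for redundancy) can be illustrated with a minimal pure-Python sketch. Everything below beyond the two-stage idea is an assumption rather than the paper's published method: continuous features are discretized into 4 equal-width bins before computing information gain, stage 1 keeps a hypothetical top-`k` attributes, attributes are min-max normalized, neighborhoods use Euclidean distance with a hypothetical radius `delta = 0.15`, and stage 2 is a forward greedy search on the dependency degree. All function names and the toy data are hypothetical and invented for illustration; they are not the paper's CT features or settings.

```python
import math
from collections import Counter
from typing import List, Sequence

# Hypothetical sketch: binning, top-k cut, delta and distance are assumptions,
# not the settings used in the paper.
Matrix = List[List[float]]


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy H(D) of the decision labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column: Sequence[float], labels: Sequence[int], bins: int = 4) -> float:
    """IG(D; a) = H(D) - H(D | a); assumes equal-width discretization of a."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # constant column -> a single bin
    codes = [min(int((v - lo) / width), bins - 1) for v in column]
    n = len(labels)
    conditional = 0.0
    for code in set(codes):
        subset = [y for c, y in zip(codes, labels) if c == code]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def minmax(X: Matrix) -> Matrix:
    """Scale every attribute to [0, 1] so one radius delta suits all attributes."""
    lows = [min(col) for col in zip(*X)]
    spans = [max(col) - lo for col, lo in zip(zip(*X), lows)]
    return [[(v - lo) / s if s else 0.0 for v, lo, s in zip(row, lows, spans)] for row in X]


def dependency(X: Matrix, y: Sequence[int], attrs: List[int], delta: float) -> float:
    """gamma_B(D) = |POS_B(D)| / n: share of samples whose delta-neighborhood
    (Euclidean distance over attributes B) contains only their own class."""
    n = len(X)
    positive = 0
    for i in range(n):
        positive += all(
            y[j] == y[i]
            for j in range(n)
            if math.dist([X[i][a] for a in attrs], [X[j][a] for a in attrs]) <= delta
        )
    return positive / n


def nrs_reduct(X: Matrix, y: Sequence[int], candidates: List[int], delta: float) -> List[int]:
    """Forward greedy search: add the attribute that most raises gamma,
    stop as soon as no candidate raises it further."""
    selected: List[int] = []
    best = 0.0
    remaining = list(candidates)
    while remaining:
        score, attr = max(
            ((dependency(X, y, selected + [a], delta), a) for a in remaining),
            key=lambda t: t[0],
        )
        if score <= best:
            break
        selected.append(attr)
        remaining.remove(attr)
        best = score
    return selected


def ig_nrs_select(X: Matrix, y: Sequence[int], k: int, delta: float = 0.15) -> List[int]:
    """Stage 1 keeps the k attributes with the highest IG (relevance);
    stage 2 lets the neighborhood rough set drop redundant ones."""
    n_feat = len(X[0])
    ig = [information_gain([row[f] for row in X], y) for f in range(n_feat)]
    top_k = sorted(range(n_feat), key=lambda f: ig[f], reverse=True)[:k]
    return nrs_reduct(minmax(X), y, top_k, delta)


# Toy data: attribute 0 separates the classes, attribute 2 duplicates it,
# attribute 1 is noise.
X = [[0.1, 0.5, 0.1], [0.2, 0.9, 0.2], [0.15, 0.1, 0.15],
     [0.9, 0.4, 0.9], [0.8, 0.6, 0.8], [0.85, 0.2, 0.85]]
y = [0, 0, 0, 1, 1, 1]
print(ig_nrs_select(X, y, k=2))  # [0]
```

In the toy run, stage 1 keeps attributes 0 and 2 (both have information gain 1.0), and stage 2 discards attribute 2 because it adds nothing to the dependency degree once attribute 0 is selected. The final classification stage is not sketched here. In Python, one common option is scikit-learn's `GridSearchCV` over `SVC`'s `C` and `gamma`, but the exact parameter grid used in the paper is not given on this page.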

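Cite this article:

LU Huiling, ZHOU Tao, ZHANG Feifei, HUO Bingqiang. High-dimensional feature selection algorithm for lung tumors based on information gain and neighborhood rough set[J]. Journal of Data Acquisition and Processing, 2020, 35(3): 536-548.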

History
  • Received: 2019-10-30
  • Last revised: 2019-12-04
  • Published online: 2020-05-25