Abstract: Informative gene selection is an essential step in tumor classification with large-scale gene expression profiles. However, selecting tumor-related informative genes from such profiles is difficult because the data are high-dimensional, contain relatively few samples, are noisy, and include some genes that are superfluous or irrelevant. To address the challenging problem of finding an informative gene subset with the fewest genes but the highest classification performance, a novel hybrid gene selection algorithm named SUNRS is proposed, based on symmetric uncertainty (SU) and the neighborhood rough set (NRS). First, the symmetric uncertainty index, which aims to eliminate redundant and irrelevant genes, is used to select top-ranked genes as the candidate gene subset. Second, the neighborhood rough set reduction algorithm is used to obtain the target gene subset by optimizing the candidate gene subset. Experimental results show that the proposed algorithm achieves higher classification accuracy with fewer informative genes, which not only improves the generalization performance of the algorithm but also enhances its time efficiency.
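
The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) computed on already discretized expression values, and one common formulation of neighborhood rough set reduction: greedy forward selection that maximizes the fraction of samples whose delta-neighborhood contains a single class. All function names, the Euclidean distance, and the `delta` parameter are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 0 when both entropies are 0."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def rank_genes_by_su(discrete_data, labels, top_k):
    """Stage 1 (assumed form): indices of the top_k genes ranked by SU with the label."""
    n_genes = len(discrete_data[0])
    scores = [
        symmetric_uncertainty([row[g] for row in discrete_data], labels)
        for g in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top_k]


def neighborhood_dependency(data, labels, genes, delta):
    """Fraction of samples whose delta-neighborhood over `genes` is class-pure."""
    if not genes:
        return 0.0
    pure_count = 0
    for i, xi in enumerate(data):
        pure = True
        for j, xj in enumerate(data):
            dist = math.sqrt(sum((xi[g] - xj[g]) ** 2 for g in genes))
            if dist <= delta and labels[i] != labels[j]:
                pure = False
                break
        pure_count += pure
    return pure_count / len(data)


def greedy_reduct(data, labels, candidates, delta):
    """Stage 2 (assumed form): add genes while the dependency strictly increases."""
    selected, best = [], 0.0
    while True:
        trials = [
            (neighborhood_dependency(data, labels, selected + [g], delta), g)
            for g in candidates
            if g not in selected
        ]
        if not trials:
            break
        dep, gene = max(trials, key=lambda t: t[0])
        if dep <= best:
            break
        selected.append(gene)
        best = dep
    return selected


# Toy usage: continuous values drive stage 2, a crude binarization drives stage 1.
continuous = [[0.1, 0.5], [0.2, 0.4], [0.9, 0.5], [0.8, 0.4]]
labels = [0, 0, 1, 1]
discrete = [[int(v > 0.45) for v in row] for row in continuous]  # assumed threshold
candidates = rank_genes_by_su(discrete, labels, top_k=2)
target_subset = greedy_reduct(continuous, labels, candidates, delta=0.3)
```

The discretization threshold and the neighborhood radius both strongly affect which genes survive; in practice they would be tuned, for example by cross-validation.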