Abstract: Colorectal cancer is one of the most common malignant tumors of the digestive system, and its mortality rate ranks third among malignant tumors in developed countries. The aim of this paper is to identify pathogenic genes of colorectal cancer through bioinformatics analysis and data mining. First, the expression profile dataset GSE9348 is downloaded from the GEO database, and 339 differentially expressed genes in colorectal cancer are screened with P<0.05 and fold change>2 using the limma package in R. Second, based on known colorectal cancer disease genes, the OMIM database and the STRING database, a protein-protein interaction (PPI) network composed of the differentially expressed genes and the known disease genes is constructed. Network module analysis is then performed with the ClusterONE plugin of Cytoscape, yielding a subnetwork of 53 genes. Finally, through network topology analysis, five genes are identified as candidate disease genes for colorectal cancer: CCND1, EGR1, FOS, CEBPB and NOS3. These newly identified genes are then verified by functional enrichment analysis and literature mining.
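The screening criterion stated above (P<0.05 and fold change>2) can be illustrated with a minimal Python sketch. This is not the authors' limma workflow in R. It assumes that a differential expression table with log2 fold changes and P values has already been computed, that the fold-change cutoff applies in both directions (up- and down-regulated), and that raw rather than adjusted P values are used. The function name `screen_degs`, the gene labels and all numbers are hypothetical.

```python
import math

def screen_degs(rows, p_cutoff=0.05, fc_cutoff=2.0):
    """Return genes passing both cutoffs.

    rows: iterable of (gene, log2_fold_change, p_value) tuples.
    A fold change > 2 in either direction corresponds to |log2FC| > 1.
    Assumption: the P value is used as given (no multiple-testing adjustment here).
    """
    log_fc_cutoff = math.log2(fc_cutoff)
    return [
        gene
        for gene, log_fc, p_value in rows
        if p_value < p_cutoff and abs(log_fc) > log_fc_cutoff
    ]

# Made-up values for illustration only; these are not results from GSE9348.
example = [
    ("GENE_A", 2.3, 0.001),   # strong up-regulation, significant
    ("GENE_B", -1.7, 0.020),  # down-regulation, significant
    ("GENE_C", 0.4, 0.003),   # significant, but fold change too small
    ("GENE_D", 3.1, 0.200),   # large fold change, but not significant
]

print(screen_degs(example))
```

With these made-up rows, only GENE_A and GENE_B satisfy both conditions; GENE_C fails the fold-change cutoff and GENE_D fails the P-value cutoff.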