Abstract: Newly sequenced genomes lack high-quality gene structures with which to train ab initio gene prediction algorithms. In this study, we present BRTGS (building reliable training gene set), a computational method for building a reliable training gene set from an RNA-seq assembly. First, initial gene structures are obtained from the RNA-seq assembly. Then, gene structures with complete and correct coding regions are identified by aligning the transcripts against homologous proteins. Finally, the start and stop codon sites are determined from the homology evidence and the RNA-seq assembly structures. Experimental results show that BRTGS builds high-quality training gene sets for various genomes and that ab initio algorithms trained on these gene sets achieve good prediction performance.
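
As a rough illustration of what a "complete" coding region can mean in practice, the minimal sketch below checks a candidate coding sequence for a start codon, a terminal stop codon, a length divisible by three, and no premature in-frame stop. It is not the BRTGS implementation: the function name `is_complete_cds` is hypothetical, the standard genetic code (ATG start; TAA, TAG, TGA stops) is assumed, and the homology-based check of correctness described in the abstract is not modelled here.

```python
# Assumption: standard genetic code; alternative start codons are ignored.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_cds(cds: str) -> bool:
    """Return True if `cds` looks like a complete coding sequence.

    Hypothetical helper for illustration only: it tests sequence-level
    completeness, not agreement with homologous proteins.
    """
    cds = cds.upper()
    # A complete CDS needs at least a start and a stop codon,
    # and its length must be a multiple of three.
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[0] != "ATG":
        return False
    if codons[-1] not in STOP_CODONS:
        return False
    # Reject sequences with an in-frame stop before the final codon.
    return all(codon not in STOP_CODONS for codon in codons[:-1])
```

A filter of this kind would only be a first pass; a candidate that passes could still have a wrongly placed start codon, which is why evidence from homologous proteins is needed to settle the start and stop sites.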