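Construction Method of Gene Training Set Based on RNA-seq

Author biographies:

Duan Rongjing (段荣静, 1980-), female, assistant researcher; research interest: bioinformatics; E-mail: duangri@njau.edu.cn. Liu Jinding (刘金定, 1978-), male, Ph.D., associate professor; research interests: bioinformatics and data mining; E-mail: liujd@njau.edu.cn

Fund Project:

Supported by the National Natural Science Foundation of China (31301691) and the Fundamental Research Funds for the Central Universities, Ministry of Education (KYZ201667, KJQN201430).
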
Abstract:

Newly sequenced species typically lack high-quality gene structures for training ab initio gene prediction software. To address this, we present a method for building a reliable gene training set (Building reliable training gene set, BRTGS) from the RNA-seq assembly of the newly sequenced species itself. First, a large number of initial gene structures are obtained from the RNA-seq assembly. Next, protein homology evidence is used to select the gene structures whose coding regions are correct and relatively complete. Finally, the start and stop codon positions are determined by combining the RNA-seq assembly structures with statistics derived from the protein homology evidence, which yields complete coding structures. Experimental results show that BRTGS builds high-quality gene training sets for genomes at various assembly levels, and that ab initio prediction software trained on these sets achieves good prediction performance.
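
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the BRTGS implementation: the `HomologyHit` record, the coverage and identity thresholds, and the rule for extending to the most upstream in-frame ATG are all assumptions made for illustration, since the abstract does not specify them. The sketch takes assembled transcripts with one protein alignment each, keeps those with strong homology support, and extends the aligned region in frame to a start and a stop codon.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class HomologyHit:
    """Hypothetical summary of one transcript-to-protein alignment."""
    aln_start: int           # 0-based start of the aligned region on the transcript
    aln_end: int             # end (exclusive), in the same reading frame as aln_start
    protein_coverage: float  # fraction of the homologous protein covered, 0..1
    identity: float          # fraction of identical residues, 0..1


def passes_homology_filter(hit: HomologyHit,
                           min_coverage: float = 0.9,
                           min_identity: float = 0.5) -> bool:
    # Thresholds are illustrative assumptions, not values from the paper.
    return hit.protein_coverage >= min_coverage and hit.identity >= min_identity


def find_coding_region(seq: str, hit: HomologyHit) -> Optional[Tuple[int, int]]:
    """Return (start, end) of a complete CDS, or None if start or stop is missing."""
    seq = seq.upper()
    # Walk upstream in frame; keep the most upstream ATG seen before any stop codon.
    start = None
    pos = hit.aln_start
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon == "ATG":
            start = pos
        pos -= 3
    if start is None:
        return None
    # Walk downstream in frame to the first stop codon.
    pos = hit.aln_end
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOP_CODONS:
            return start, pos + 3
        pos += 3
    return None


def build_training_set(candidates: Dict[str, Tuple[str, HomologyHit]]) -> Dict[str, str]:
    """Keep only transcripts with homology support and a complete coding region."""
    training = {}
    for name, (seq, hit) in candidates.items():
        if not passes_homology_filter(hit):
            continue
        region = find_coding_region(seq, hit)
        if region is not None:
            training[name] = seq[region[0]:region[1]].upper()
    return training


candidates = {
    "t1": ("CCATGGCTAAAGGGTAATT", HomologyHit(5, 14, 0.95, 0.80)),
    "t2": ("CCATGGCTAAAGGGTAATT", HomologyHit(5, 14, 0.40, 0.80)),  # low coverage
}
training_set = build_training_set(candidates)
```

In this toy input, `t2` is discarded by the homology filter, while `t1` is extended from the aligned region to the enclosing ATG and stop codon. A real pipeline would work on spliced gene structures in genome coordinates rather than on single transcript strings.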

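Cite this article

Duan Rongjing, Liu Jinding. Construction Method of Gene Training Set Based on RNA-seq [J]. Journal of Data Acquisition and Processing (数据采集与处理), 2018, 33(4): 637-645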

History
  • Received: 2014-05-09
  • Last revised: 2016-10-14
  • Published online: 2018-09-08