Transcriptome Expression Analysis of ISO-seq Data with Non-Full-Length Reads Reserved

Affiliations:

1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; 2. College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China

Fund Project:

Supported by the National Natural Science Foundation of China (61802193) and the Natural Science Foundation of Jiangsu Province (BK20170934).

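    Abstract:

    In recent years, ISO-seq data produced by single-molecule sequencing have been used more and more widely to detect novel transcript isoforms because of their very long read lengths. However, most current studies use only the full-length reads and discard the non-full-length reads, losing much of the useful information they carry, so the data are not fully exploited. To address this problem, this paper retains the non-full-length reads and proposes two models that simultaneously predict isoform structures and estimate their expression proportions: Dirichlet sampling for isoform detection and prediction (DSIDP) and Markov chain for isoform detection and prediction (MCIDP). Both models build a predicted isoform set from the full-length reads and estimate isoform expression proportions from both full-length and non-full-length reads. DSIDP maps all reads to the predicted isoform set and resolves the multi-mapping problem with Dirichlet sampling. MCIDP uses a Markov chain to model alternative splicing between gene exons, which also allows it to predict isoforms that no full-length read in the data supports. Both models are validated on simulated and real data.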
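
The abstract states only that DSIDP resolves multi-mapping reads with Dirichlet sampling. One common way to do this, shown here as a minimal sketch rather than the authors' implementation, is a Gibbs sampler that alternates between (1) assigning each read to one of the isoforms it is compatible with, in proportion to the current isoform proportions, and (2) redrawing the proportions from Dirichlet(prior + assignment counts). The compatibility lists, the symmetric prior (`prior`), the burn-in, and the function names are hypothetical; the sketch also assumes that every compatible isoform explains a read equally well, ignoring isoform length and alignment quality.

```python
import random
from typing import List, Sequence


def sample_dirichlet(alphas: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one vector from Dirichlet(alphas) by normalising independent Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def estimate_proportions(
    compat: Sequence[Sequence[int]],
    n_isoforms: int,
    prior: float = 1.0,
    n_iter: int = 2000,
    burn_in: int = 500,
    seed: int = 0,
) -> List[float]:
    """Posterior-mean isoform proportions from a Gibbs sampler (hypothetical helper).

    compat[r] holds the indices of the isoforms that read r is compatible with:
    one index for a uniquely mapped read, several for a multi-mapped read.
    """
    if not 0 <= burn_in < n_iter:
        raise ValueError("burn_in must satisfy 0 <= burn_in < n_iter")
    rng = random.Random(seed)
    theta = [1.0 / n_isoforms] * n_isoforms  # start from uniform proportions
    totals = [0.0] * n_isoforms
    kept = 0
    for it in range(n_iter):
        # Step 1: give each read to one compatible isoform, chosen with
        # probability proportional to that isoform's current proportion.
        counts = [0] * n_isoforms
        for isoforms in compat:
            k = rng.choices(isoforms, weights=[theta[i] for i in isoforms])[0]
            counts[k] += 1
        # Step 2: redraw all proportions from the Dirichlet posterior
        # Dirichlet(prior + counts) given the current assignments.
        theta = sample_dirichlet([prior + c for c in counts], rng)
        if it >= burn_in:
            totals = [s + t for s, t in zip(totals, theta)]
            kept += 1
    # Average the post-burn-in draws to get posterior-mean proportions.
    return [s / kept for s in totals]


if __name__ == "__main__":
    # Toy data: reads unique to isoforms 0, 1 and 2, plus 50 reads
    # that are compatible with both isoform 0 and isoform 1.
    reads = [[0]] * 60 + [[1]] * 30 + [[2]] * 10 + [[0, 1]] * 50
    print([round(p, 3) for p in estimate_proportions(reads, n_isoforms=3)])
```

With this toy input the 50 shared reads end up split roughly 2:1 between isoforms 0 and 1, following the ratio of their uniquely mapped reads, which is the behaviour one expects from proportion-weighted resampling.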

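
The abstract also says that MCIDP models alternative splicing between exons with a Markov chain and can therefore predict isoforms with no full-length read support. The sketch below illustrates that idea under stated assumptions and is not the authors' model: it uses a first-order chain over exon labels with hypothetical `START`/`END` states, estimates transition probabilities from the exon junctions seen in full-length and non-full-length reads, and scores each START-to-END exon path by the product of its transition probabilities. The function names, exon labels, and the rule that only full-length reads contribute `START`/`END` transitions are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

START, END = "START", "END"  # hypothetical sentinel states for transcript start/end
Chain = Dict[str, Dict[str, float]]


def transition_probs(full_length: Sequence[Sequence[str]],
                     partial: Sequence[Sequence[str]]) -> Chain:
    """Estimate first-order exon-to-exon transition probabilities.

    Full-length reads also contribute START->first-exon and last-exon->END
    transitions; non-full-length reads contribute only their internal
    exon-exon junctions, because their true ends are unknown.
    """
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for exons in full_length:
        path = [START, *exons, END]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    for exons in partial:
        for a, b in zip(exons, exons[1:]):
            counts[a][b] += 1
    chain: Chain = {}
    for a, successors in counts.items():
        total = sum(successors.values())
        chain[a] = {b: c / total for b, c in successors.items()}
    return chain


def enumerate_isoforms(chain: Chain,
                       min_prob: float = 0.0) -> List[Tuple[Tuple[str, ...], float]]:
    """List every START->END exon path with its probability, most probable first."""
    results = []
    stack = [([START], 1.0)]
    while stack:
        path, p = stack.pop()
        if path[-1] == END:
            results.append((tuple(path[1:-1]), p))
            continue
        for nxt, q in chain.get(path[-1], {}).items():
            # Skip exons already on the path so that cycles cannot loop forever.
            if nxt not in path and p * q > min_prob:
                stack.append((path + [nxt], p * q))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    full = [["E1", "E2", "E4"]] * 3 + [["E1", "E3", "E4"]]
    partial = [["E2", "E3"]] * 2 + [["E3", "E4"]]
    for exons, p in enumerate_isoforms(transition_probs(full, partial)):
        print("-".join(exons), round(p, 2))
```

In the example, the path E1-E2-E3-E4 receives probability 0.3 (0.75 × 0.4) even though no full-length read contains it, because the E2-E3 junction is observed only in non-full-length reads.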

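Cite this article

LIU Xuejun (刘学军), QU Xiyao (瞿锡垚), ZHANG Li (张礼). Transcriptome expression analysis of ISO-seq data with non-full-length reads reserved[J]. Journal of Data Acquisition and Processing, 2019, 34(4): 594-604.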

History
  • Received: 2018-09-01
  • Last revised: 2018-12-24
  • Published online: 2019-09-01