Improved Transcriptome Expression Analysis for RNA-Seq Data

    Abstract:

    RNA-Seq (RNA sequencing), based on high-throughput sequencing, is a new technique for transcriptome research. To address two difficulties in transcript expression analysis with RNA-Seq data, namely reads that map to multiple isoforms and the non-uniform distribution of reads along the reference, an improved method, LDASeqII (improvement of latent Dirichlet allocation for sequencing data), is proposed. LDASeqII uses the known gene-isoform annotation (the splice isoform structure) to constrain the hyperparameters and normalizes the read count of each individual exon by exon length, which resolves the multi-mapping problem under a non-uniform read distribution. A "pseudo-exon" and a "pseudo-transcript" are introduced to handle junction reads and noise reads, respectively. LDASeqII is validated on two real datasets for gene and transcript expression calculation and compared with the original LDASeq (latent Dirichlet allocation for sequencing data) model and two other popular methods, Cufflinks and RSEM (RNA-Seq by expectation maximization). The results show that LDASeqII obtains more accurate transcript and gene expression measurements than the other approaches.
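
The exon-length normalization mentioned in the abstract can be illustrated with a minimal sketch. This is not the LDASeqII implementation: the function name `normalize_exon_counts`, the `scale` parameter, and the example counts and lengths are all assumptions made for illustration, and the abstract does not specify the exact normalization formula the authors use.

```python
def normalize_exon_counts(read_counts, exon_lengths, scale=1000.0):
    """Convert raw per-exon read counts to reads per `scale` bases.

    Hypothetical helper: it shows only the general idea of dividing each
    exon's read count by that exon's length, so that long exons do not
    look more highly expressed merely because they are long.
    """
    if len(read_counts) != len(exon_lengths):
        raise ValueError("read_counts and exon_lengths must have equal length")
    normalized = []
    for count, length in zip(read_counts, exon_lengths):
        if length <= 0:
            raise ValueError("exon lengths must be positive")
        normalized.append(scale * count / length)
    return normalized


# Made-up example: three exons of one gene.
counts = [120, 30, 300]      # raw reads mapped to each exon
lengths = [400, 100, 1500]   # exon lengths in bases
print(normalize_exon_counts(counts, lengths))  # [300.0, 300.0, 200.0]
```

In this made-up example the third exon has the most raw reads but the lowest length-normalized density, which is the kind of distortion that per-exon normalization is meant to remove.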

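Cite this article

Shi Xinxin, Liu Xuejun, Zhang Li. Improved Transcriptome Expression Analysis for RNA-Seq Data[J]. Journal of Data Acquisition and Processing, 2015, 30(5): 1028-1035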

History
  • Published online: 2015-10-29