Abstract: RNA-Seq (RNA sequencing), based on high-throughput sequencing, is a new technique for transcriptome research. Because estimating transcript expression from RNA-Seq data is difficult, an improved method, LDASeqⅡ (improvement of latent Dirichlet allocation for sequencing data), is proposed to calculate transcript expression. To deal with multi-mappings between reads and isoforms and the non-uniform distribution of reads along the reference, LDASeqⅡ uses the known gene-isoform annotation to constrain the hyperparameters and normalizes the read count of each exon by that exon's length. By introducing a "pseudo-exon" and a "pseudo-transcript", junction reads and noise reads are handled properly. LDASeqⅡ is validated on gene and transcript expression calculation using two real datasets and is compared with its predecessor, latent Dirichlet allocation for sequencing data (LDASeq), and with two other popular methods, Cufflinks and RNA-Seq by Expectation-Maximization (RSEM). The results show that LDASeqⅡ obtains more accurate transcript and gene expression measurements than the other approaches.
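
The per-exon length normalization mentioned in the abstract can be illustrated with a minimal sketch. This is not the LDASeqⅡ implementation: the function name `normalize_by_exon_length`, the dictionary layout, and the example counts and lengths are all hypothetical, and the sketch only shows the general idea of dividing each exon's read count by that exon's length so that long exons do not dominate simply because they are long.

```python
def normalize_by_exon_length(read_counts, exon_lengths):
    """Return reads per base for each exon (hypothetical helper).

    read_counts:  dict mapping exon id -> number of reads assigned to it
    exon_lengths: dict mapping exon id -> exon length in bases
    """
    normalized = {}
    for exon, count in read_counts.items():
        length = exon_lengths[exon]
        if length <= 0:
            raise ValueError(f"exon {exon!r} has non-positive length {length}")
        # Divide the raw count by the exon length to get a read density.
        normalized[exon] = count / length
    return normalized


# Made-up illustrative values, not data from the paper.
counts = {"exon1": 200, "exon2": 50, "exon3": 300}
lengths = {"exon1": 1000, "exon2": 250, "exon3": 3000}
density = normalize_by_exon_length(counts, lengths)
```

In this toy input, `exon1` and `exon2` have very different raw counts but the same read density, while `exon3` has the highest raw count and the lowest density, which is the kind of length bias the normalization is meant to remove.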