Applied Bioinformatics Journal Club: PacBio RNA-Seq

Journal Club

A single-molecule long-read survey of the human transcriptome

Sharon et al., Nature Biotechnology 31, 1009–1014 (2013)

Sanzhen Liu, Plant Pathology

3/12/2014

PacBio technology

• Amplification-free sequencing

• Very long reads (up to 20 kb, peaking at 2-6 kb)

• High error rate (random, not context-specific)

PacBio website

CCS approach

• High-quality, single-molecule, circular-consensus (CCS) reads

http://flxlexblog.wordpress.com/2013/02/11/applications-for-pacbio-circular-consensus-sequencing/
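The idea behind circular consensus can be illustrated with a toy majority vote: the same molecule is read several times, and random errors in individual passes are outvoted. This is only a minimal sketch, not the PacBio CCS algorithm. It assumes the passes are already aligned to equal length and contain substitution errors only, whereas real passes also carry insertions and deletions. The function name `consensus_from_passes` and the example sequences are hypothetical.

```python
from collections import Counter


def consensus_from_passes(passes):
    """Majority-vote consensus over pre-aligned, equal-length passes.

    Assumption: every pass covers the same molecule and has already been
    aligned, so column i of each pass refers to the same base.
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be pre-aligned to equal length")

    consensus = []
    for column in zip(*passes):
        # Pick the most frequent base in this column.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Hypothetical passes over one molecule, each with a single random error.
passes = [
    "ACGTTGCA",
    "ACGATGCA",  # substitution at position 3
    "ACGTTGGA",  # substitution at position 6
    "TCGTTGCA",  # substitution at position 0
]
print(consensus_from_passes(passes))
```

Because the errors are random rather than context-specific, they rarely hit the same position in several passes, which is why more passes give a more accurate consensus.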

Figure 1

• Input: pooled RNAs from 20 tissues

• Approach: prepare double-stranded cDNAs -> CCS library -> PacBio sequencing

• Output: 476,000 CCS reads, mean length = 1 kb

• 61% of reads cover all introns and most of the first and last exons

• CCS reads cover short transcripts (<1.2 kb) well (generally >90%), but coverage stays low for long transcripts, especially those >2.4 kb
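The coverage comparison above can be sketched as a per-read covered fraction, averaged within transcript length bins. This is a minimal illustration under stated assumptions: coordinates are 0-based and half-open on the transcript, the bin boundaries (1.2 kb and 2.4 kb) are taken from the slide, and the function names and input tuples are hypothetical rather than taken from the paper.

```python
def covered_fraction(read_start, read_end, transcript_length):
    """Fraction of a transcript spanned by one read (0-based, half-open)."""
    if transcript_length <= 0:
        raise ValueError("transcript_length must be positive")
    start = max(0, read_start)
    end = min(transcript_length, read_end)
    return max(0, end - start) / transcript_length


def length_bin(transcript_length):
    """Assign a transcript to one of the length bins used on the slide."""
    if transcript_length < 1200:
        return "<1.2 kb"
    if transcript_length <= 2400:
        return "1.2-2.4 kb"
    return ">2.4 kb"


def mean_coverage_by_bin(alignments):
    """Average covered fraction per length bin.

    alignments: iterable of (read_start, read_end, transcript_length).
    """
    totals = {}
    for read_start, read_end, transcript_length in alignments:
        label = length_bin(transcript_length)
        running_sum, count = totals.get(label, (0.0, 0))
        fraction = covered_fraction(read_start, read_end, transcript_length)
        totals[label] = (running_sum + fraction, count + 1)
    return {label: s / n for label, (s, n) in totals.items()}


# Hypothetical alignments: (read_start, read_end, transcript_length)
example = [
    (0, 950, 1000),     # short transcript, nearly full length
    (100, 1000, 1000),  # short transcript, missing part of the 5' end
    (1500, 3000, 3000), # long transcript, only the 3' half covered
]
print(mean_coverage_by_bin(example))
```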

Figure 2

Missing 3’ ends

Missing 5’ ends

The correlations of the number of reads and …

ERCC: a mixture of known, quantified RNAs

Figure 3

• An estimated 67% of molecules contain splice sites

• CSMM: consensus split-mapped molecule (accurate CCS reads with splice sites?)

• Splice sites match annotated splice sites well

• PacBio (versus 454) has much higher power to detect isoforms with >=10 introns

• Estimate: 21,000 genes and 139,000 isoforms can be detected with high-depth sequencing
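Comparing read-derived introns with an annotation can be sketched as a set lookup on intron coordinates. This is a minimal illustration, not the authors' pipeline: it assumes each intron is represented as a (chromosome, start, end, strand) tuple and that a match requires identical coordinates. The function names and the coordinates in the example are hypothetical.

```python
def classify_introns(read_introns, annotated_introns):
    """Split a read's introns into annotated and novel ones.

    Each intron is a (chromosome, start, end, strand) tuple; a match
    requires identical coordinates (an assumption of this sketch).
    """
    annotated = set(annotated_introns)
    known = [intron for intron in read_introns if intron in annotated]
    novel = [intron for intron in read_introns if intron not in annotated]
    return known, novel


def novel_fraction(reads, annotated_introns):
    """Fraction of distinct introns across all reads that are unannotated."""
    observed = {intron for read in reads for intron in read}
    if not observed:
        return 0.0
    annotated = set(annotated_introns)
    return len(observed - annotated) / len(observed)


# Hypothetical annotation and reads.
annotation = [
    ("chr1", 1000, 2000, "+"),
    ("chr1", 2500, 3000, "+"),
    ("chr1", 3500, 4200, "+"),
]
reads = [
    [("chr1", 1000, 2000, "+"), ("chr1", 2500, 3000, "+")],  # fully annotated
    [("chr1", 1000, 2000, "+"), ("chr1", 2500, 3100, "+")],  # shifted 3' splice site
]

known, novel = classify_introns(reads[1], annotation)
print(len(known), "annotated;", len(novel), "novel")
print(novel_fraction(reads, annotation))
```

Because a long read can span every intron of a molecule, the same lookup can be applied to the whole intron chain of a read, which is what makes isoforms with many introns resolvable.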

Summary

• Full-length RNA of up to 1.5 kb can readily be monitored with little sequence loss at the 5’ ends

• With 476k CCS reads (>300 bp), 14,000 spliced genes were identified.

• The majority of introns are consistent with annotations, but >10% are novel.

Conclusion

• Isoforms can be monitored at a single-molecule level without amplification or fragmentation

• The majority of reads represent all splice sites of the original transcripts

• Unannotated splice isoforms: long non-coding RNAs with few introns and isoforms of known protein-coding genes with many introns