Post on 16-Jul-2015
David-Emlyn Parfitt
Shen Lab, Irving Cancer Research Center
Using RNA-Seq to conduct systems-level analysis of embryonic pluripotency, self-renewal and differentiation
The molecular regulators of self-renewal and pluripotency are not completely defined or characterized
[Figure: origins of pluripotent stem cell lines (mESC, mEpiSC, hESC)]
Mouse blastocyst (3.5 days): Inner Cell Mass
Mouse egg cylinder (5.5 days): Epiblast
≈ Human blastocyst (5-7 days)
Self-renewal and Pluripotency
Nanog
Oct4
Sox2
JAK-STAT
MAPK
Novel Master Regulators?
150 Combinatory Chemical Treatments
Genome-Wide GEP Data
Algorithmic analysis (ARACNe, MINDy)
Master Regulator Analysis
Rank
ESC/EpiSC "Interactome"
In vitro and in vivo validation
Defining the molecular networks associated with stem cell self-renewal, pluripotency and differentiation
Which tool to use for expression profiling?
Gene Expression Profiling:
Microarrays vs RNA-Sequencing
Arrays:
Well defined technique
High throughput
Discrete measurement
Background noise + batch effect
No distinction between isoforms/alleles
Total RNA
Fragment
Reverse-transcribe to cDNA
RNA Sequencing:
Single base resolution
Low background noise
Distinction of isoform and allelic expression
Low amount of RNA needed
*Including non-coding RNAs, depending on purification protocol
Total RNA*
Reverse-transcribe to cDNA
Algorithmic and logistic challenge
Lengthy library preparation
RNA-Sequencing Methodology:
Deciding the parameters
Read length?
-Efficiency vs faithfulness
Single end or paired end reads?
-Efficiency vs faithfulness
-Alignment accuracy
Number of reads?
-Depth of coverage
-Cost
How many to effectively cover the mouse exome (~50 Mb)?
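The depth-versus-cost trade-off above can be put into rough numbers with the usual mean-coverage arithmetic (reads times read length, divided by target size). The sketch below is only an illustration: the function names and the 30x goal are hypothetical, and the target size is the ~50 megabase figure from the slide.

```python
import math


def mean_coverage(num_reads, read_length_bp, target_size_bp):
    """Average number of times each target base is sequenced."""
    return num_reads * read_length_bp / target_size_bp


def reads_for_coverage(target_coverage, read_length_bp, target_size_bp):
    """Reads needed to reach a desired mean coverage, rounded up."""
    return math.ceil(target_coverage * target_size_bp / read_length_bp)


TARGET_BP = 50_000_000  # the ~50 megabase target size quoted on the slide

# 100 million single-end 100 bp reads over that target.
coverage = mean_coverage(100_000_000, 100, TARGET_BP)
print(coverage)  # 200.0

# Reads needed for a hypothetical 30x mean coverage goal.
needed = reads_for_coverage(30, 100, TARGET_BP)
print(needed)  # 15000000
```

Mean coverage assumes reads fall evenly across the target. In RNA-Seq they do not, because transcripts differ widely in abundance, so a figure like this is only a starting point for an empirical check.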
Deciding the parameters:
How many 100 bp reads are necessary for comprehensive coverage of the mouse genome?
RPKM:
Normalized measurement of transcript abundance
Reads per kilobase of exon model per million mapped reads
RPKM for a particular transcript does not change when the overall number of reads changes, and it is the same for transcripts of the same abundance, regardless of their length
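As a minimal sketch of the RPKM definition above, the function below divides a transcript's read count by its exonic length in kilobases and by the number of mapped reads in millions. The function name, parameter names and toy numbers are illustrative assumptions, not values from the talk.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    kilobases = exon_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_mapped)


# Hypothetical toy numbers: a 2 kb transcript in a 20 million read library.
shallow = rpkm(read_count=500, exon_length_bp=2_000, total_mapped_reads=20_000_000)

# Sequencing 4x deeper scales the transcript's reads and the library total
# by the same factor, so the normalized value is unchanged.
deep = rpkm(read_count=2_000, exon_length_bp=2_000, total_mapped_reads=80_000_000)

print(shallow, deep)  # 12.5 12.5
```

The two calls illustrate the property stated on the slide: in expectation, RPKM depends on relative abundance rather than on how many reads were sequenced.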
100 million 100 bp single-end (SE) reads
Setting the transcript ‘detection’ threshold

                                        RA-72H-1   RA-72H-2   CM     CM
Number of raw reads (million)           97.3       88         87     95
Number of mapped reads (million)        97         87.7       87     94
Transcripts with RPKM > 0.01 (/27641)   72%        77%        84%    84%
Transcripts with RPKM > 1 (/27641)      49%        48%        51%    52%
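The percentages in the table depend on where the detection threshold is set. A minimal sketch of that calculation, assuming a plain list of per-transcript RPKM values (the function name and the toy values are hypothetical, not data from the talk):

```python
def fraction_detected(rpkm_values, threshold):
    """Share of transcripts whose RPKM exceeds the detection threshold."""
    if not rpkm_values:
        raise ValueError("need at least one transcript")
    detected = sum(1 for value in rpkm_values if value > threshold)
    return detected / len(rpkm_values)


# Hypothetical RPKM values for eight transcripts.
toy_rpkm = [0.0, 0.005, 0.02, 0.4, 0.9, 1.5, 12.0, 75.0]

lenient = fraction_detected(toy_rpkm, 0.01)  # 6 of 8 transcripts pass
strict = fraction_detected(toy_rpkm, 1)      # 3 of 8 transcripts pass

print(lenient, strict)  # 0.75 0.375
```

Raising the threshold from 0.01 to 1 drops the weakly expressed transcripts from the count, which is the same effect seen between the two threshold rows of the table.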
RPKM is constant, regardless of number of reads
“RPKM for a particular transcript does not change when overall number of reads changes”
r² = 0.9, r² = 0.97
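The transcript does not say exactly which pairs of samples the two r² values compare. As a hedged illustration only, a squared Pearson correlation between two vectors of per-transcript RPKM values could be computed as below. The function name and toy vectors are hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two vectors of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov * cov / (var_x * var_y)


# Hypothetical RPKM values for the same four transcripts in two runs.
run_a = [1.0, 2.0, 3.0, 4.0]
run_b = [2.0, 4.0, 6.0, 8.0]  # perfectly proportional to run_a

print(r_squared(run_a, run_b))  # 1.0
```

In practice such comparisons are often made on log-transformed values, since a few highly expressed transcripts can otherwise dominate the correlation.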
[Plot: median RPKM (y-axis, 0.5 to 0.95) against reads in millions (x-axis, 20 to 80); data labels 0.749 and 0.725]
i.e., we are not detecting significantly more genes/transcripts beyond 20-30 million reads
RPKM becomes relatively constant with increased read number
[Plot: percent of final transcripts (y-axis, 0.7 to 1) against reads in millions (x-axis, 0 to 100), one curve per transcript abundance bin (RPKM): [60,), [30,60), [15,30), [7.5,15), [3.75,7.5), [0.01,3.74)]
How many 100 bp reads are necessary for comprehensive coverage of the mouse genome?
Between 20 and 30 million 100 bp reads are sufficient to capture ~100% of the most abundant transcripts and 95% of the least abundant
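A saturation analysis of this kind can be sketched by subsampling a read set to smaller depths and asking what fraction of the transcripts seen at full depth is already detected. The code below is a toy illustration under stated assumptions, not the procedure used in the talk: reads are represented as a list of transcript IDs, a transcript counts as detected once it has at least `min_reads` reads, and all names and numbers are hypothetical.

```python
import random
from collections import Counter


def saturation_curve(reads, depths, min_reads=1, seed=0):
    """Fraction of full-depth transcripts already detected at each depth.

    reads: list with one transcript ID per mapped read.
    depths: read counts to evaluate.
    min_reads: assumed detection rule (reads needed to call a transcript).
    """
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)  # random read order, reproducible
    final = {t for t, n in Counter(shuffled).items() if n >= min_reads}
    if not final:
        raise ValueError("no transcript passes the detection rule at full depth")
    curve = {}
    for depth in sorted(depths):
        counts = Counter(shuffled[:depth])  # nested subsample: first `depth` reads
        found = {t for t, n in counts.items() if n >= min_reads}
        curve[depth] = len(found) / len(final)
    return curve


# Hypothetical library: 1,000 reads over four transcripts of very different abundance.
toy_reads = ["high"] * 900 + ["mid"] * 90 + ["low"] * 9 + ["rare"] * 1
curve = saturation_curve(toy_reads, depths=[10, 100, 500, 1000])
for depth, fraction in curve.items():
    print(depth, fraction)
```

Because each subsample is a prefix of the same shuffled read list, the curve can only rise with depth. Abundant transcripts are typically found almost immediately, while rare ones need more reads, which is the pattern described in the conclusion above.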
Acknowledgements
Shen Lab:
Michael Shen
Hui Zhao
Shen Lab Members
Califano Lab:
Andrea Califano
Mariano Alvarez
Yufeng Shen
Xiaoyun Sun
Olivier Couronne
Erin Bush