Epigenomics: Some Statistical Applications
Rafael A. Irizarry
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
Acknowledgements
• Tom Albert, Nimblegen
• Benilton Carvalho, JHU Biostatistics
• Andy Feinberg Lab, JHU Medicine
• Todd Richmond, Nimblegen
• Hao Wu, JHU Biostatistics
• Jean Wu, Brown Biostat
• Vasan Yegnasubramanian, JHU Oncology
Outline
• Quick Introduction to Epigenetics
• Introduction to Methylation
• Overview of competing technologies
• Review: Expression arrays lessons
• Comparison
• Role of statisticians
Genetics: the alphabet of life
• Letters of DNA sequence carry the information
Epigenetics
(3.4 x 10^-10 meters/bp) x (6 x 10^9 bp/genome) = ~2 meters/genome
Radius of the nucleus is ~10 µm !!!
Klug and Cummings, 1997
[(6 x 10^9 bp/genome) / (195 bp/nucleosome)] = ~30.8 x 10^6 nucleosomes/genome
~5% of nuclear volume
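The packaging arithmetic on this slide can be checked with a few lines of Python. This is only a sketch that plugs in the slide's own figures; the variable names are illustrative.

```python
# Back-of-the-envelope DNA packaging arithmetic using the slide's figures.
METERS_PER_BP = 3.4e-10      # length contributed by one base pair
BP_PER_GENOME = 6e9          # base pairs per genome (slide value)
BP_PER_NUCLEOSOME = 195      # base pairs per nucleosome (slide value)
NUCLEUS_RADIUS_M = 10e-6     # nucleus radius, about 10 micrometers

dna_length_m = METERS_PER_BP * BP_PER_GENOME
nucleosomes = BP_PER_GENOME / BP_PER_NUCLEOSOME
length_to_radius = dna_length_m / NUCLEUS_RADIUS_M

print(f"DNA length per genome: {dna_length_m:.2f} m")
print(f"Nucleosomes per genome: {nucleosomes / 1e6:.1f} x 10^6")
print(f"DNA length / nucleus radius: {length_to_radius:,.0f}")
```

The last ratio makes the point of the slide concrete: the DNA is several hundred thousand times longer than the radius of the compartment that holds it.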
http://www.albany.edu/~achm110/solenoidchriomatin.html
Epigenetics: the grammar of life
DNA methylation
DNA methyltransferase I (DNMT1)
Observed to expected = Pr(CG) / { Pr(C) Pr(G) }
DNA methylation can lead to silencing of gene expression
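As an illustration of the observed-to-expected ratio Pr(CG) / { Pr(C) Pr(G) }, here is a minimal sketch for a single sequence. The function name and the choice to estimate Pr(CG) over the n - 1 overlapping dinucleotide positions are assumptions made for this example, not something taken from the slides.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed-to-expected CpG ratio: Pr(CG) / (Pr(C) * Pr(G))."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two bases")
    p_c = seq.count("C") / n
    p_g = seq.count("G") / n
    if p_c == 0 or p_g == 0:
        # No C or no G means no CG dinucleotide can occur.
        return 0.0
    # Count CG dinucleotides over the n - 1 overlapping positions.
    n_cg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    p_cg = n_cg / (n - 1)
    return p_cg / (p_c * p_g)


print(cpg_obs_exp("ACGT"))
print(cpg_obs_exp("AATT"))
```

A ratio well below 1 indicates fewer CG dinucleotides than the C and G frequencies alone would predict.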
[Figure: repressor complexes. Labels: MeCP2, Sin3A, HDAC; MBD2, Mi-2, HDAC1, HDAC2, MBD3, RbAp46, RbAp48, MTA2, 68kD, 66kD; >2 MDalton Complex]
Robertson and Wolffe, Nat Rev Genet, 2000
ENCODE Track
Expression Array Lessons
Normalization
Probe effect
Intensity = Background + Probe Effect x Quantity x Error
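A small numeric sketch of this model shows why background correction comes before taking logs. The probe effects, background, and quantity below are made-up values chosen so the arithmetic is exact; the error term is fixed at 1 for clarity.

```python
import math


def intensity(background, probe_effect, quantity, error=1.0):
    """Intensity = Background + Probe Effect x Quantity x Error."""
    return background + probe_effect * quantity * error


background = 100.0                 # hypothetical background level
probe_effects = [0.5, 2.0, 8.0]    # hypothetical probe effects
quantity = 64.0                    # same target quantity for all probes

observed = [intensity(background, p, quantity) for p in probe_effects]

# After subtracting background, the log signal is additive:
# log2(probe effect) + log2(quantity) + log2(error).
corrected_logs = [math.log2(y - background) for y in observed]

# Without background correction the additive structure is lost.
raw_logs = [math.log2(y) for y in observed]

print(corrected_logs)
print(raw_logs)
```

With the background removed, the three probes differ by exactly log2 of their probe effects, which is the structure that probe-level models rely on.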
Sequence effect for BG
Wu et al. (2004) JASA 99(468) 909
$\text{Affinity} = \sum_{k=1}^{25} \sum_{j \in \{A,C,G,T\}} \mu_{j,k} \, 1_{b_k = j}$
$\mu_{j,k}$ ~ smooth function of k
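The affinity sum can be sketched in code as follows. The function `mu` below is a hypothetical stand-in: a sine-shaped curve with made-up amplitudes, used only because it is smooth in k. In practice the position effects would be estimated from data, and nothing here reproduces the fitted values from the cited paper.

```python
import math

BASES = "ACGT"
PROBE_LENGTH = 25


def mu(base: str, k: int) -> float:
    """Hypothetical smooth position effect for `base` at position k (1..25)."""
    # Made-up amplitudes, for illustration only.
    amplitude = {"A": -0.2, "C": 0.3, "G": 0.1, "T": -0.2}[base]
    return amplitude * math.sin(math.pi * k / (PROBE_LENGTH + 1))


def affinity(probe: str) -> float:
    """Double sum over positions k and bases j, with indicator 1[b_k = j]."""
    probe = probe.upper()
    if len(probe) != PROBE_LENGTH or set(probe) - set(BASES):
        raise ValueError("probe must be a 25-mer over A, C, G, T")
    total = 0.0
    for k, b_k in enumerate(probe, start=1):
        for j in BASES:
            indicator = 1 if b_k == j else 0
            total += mu(j, k) * indicator
    return total


print(affinity("ACGTACGTACGTACGTACGTACGTA"))
```

Because the indicator selects exactly one base per position, the double sum reduces to adding one position effect per base of the probe.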
Back to Methylation
High throughput of course…
Densities for three methods
HCT116 lots of methylation
DKO very little methylation
Hunh?
MeDIP (like ChIP-chip)
[Figure: workflow. IP sample: Crosslink, Lyse & Sonicate, IP, Reverse crosslinks, Amplify, Label/hybridize. Total sample: Reverse crosslinks, Amplify, Label/hybridize. Other controls for IP (e.g., no antibody, non-specific antibody).]
Some Data
Problem: Not specific
HELP: Two enzymes
Cuts at CCGG
Cuts at CmCGG
No Methylation
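The two-enzyme idea can be sketched in code. This assumes, as in the usual description of the assay, that one enzyme cuts CCGG only when the internal cytosine is unmethylated while the other cuts CCGG regardless of methylation. The function names, the toy sequence, and the 0-based position convention are all hypothetical.

```python
def ccgg_sites(seq: str) -> list:
    """0-based start positions of every CCGG site in the sequence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


def cut_sites(seq: str, methylated: set, sensitive: bool) -> list:
    """Sites an enzyme would cut.

    methylated: 0-based positions of methylated cytosines.
    sensitive:  True if methylation of the internal C blocks cutting.
    """
    sites = ccgg_sites(seq)
    if not sensitive:
        return sites
    # The internal cytosine is the second base of the CCGG site.
    return [i for i in sites if (i + 1) not in methylated]


seq = "AACCGGTTTCCGGAA"    # toy sequence with two CCGG sites
methylated = {10}          # internal C of the second site is methylated

print(cut_sites(seq, methylated, sensitive=True))
print(cut_sites(seq, methylated, sensitive=False))
```

Sites cut by the insensitive enzyme but not by the sensitive one are the candidates for methylation, which is the contrast the assay measures.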
HELP after PCR
No Methylation
HELP
Methylation
HELP
No Methylation
Problem with HELP
Cuts at CCGG
Cuts at CmCGG
No Methylation
The Problem
Observed to expected = Pr(CG) / { Pr(C) Pr(G) }
Proportion of neighboring CpG also methylated/not methylated
McRBC on Tiling array
ROC now
ENCODE Track
Problems for Statisticians
• Background Correction + Normalization
• Probability Model for Segments
• Use these to form null and alternative models… we need power!
• Use these to create bump finding algorithms
• Adapt to high-throughput sequencing
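One very simple reading of "bump finding" is sketched below: smooth the probe-level signal along the genome, then report runs of consecutive positions above a cutoff. This is an illustrative toy, not the method used in the talk; the moving-average smoother, the threshold, and the minimum run length are all assumptions.

```python
def moving_average(values, window=3):
    """Centered moving average, using shorter windows at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def find_bumps(values, threshold, window=3, min_length=2):
    """Return (start, end) index pairs of runs where the smoothed
    signal exceeds `threshold` for at least `min_length` positions."""
    smoothed = moving_average(values, window)
    bumps, start = [], None
    for i, v in enumerate(smoothed):
        if v > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                bumps.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the signal.
    if start is not None and len(smoothed) - start >= min_length:
        bumps.append((start, len(smoothed) - 1))
    return bumps


signal = [0, 0, 0, 3, 3, 3, 3, 0, 0, 0, 0, 0]
print(find_bumps(signal, threshold=1.5))
```

A probability model for segments would replace the fixed threshold with a cutoff calibrated under the null model, which is where the power concern in the list above comes in.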
Supplemental Slides
McRBC: One enzyme
Cuts at AmCG or GmCG
Input
No Methylation
McRBC after Gel
No Methylation
McRBC after Gel
No Methylation
McRBC
Methylation
McRBC after Gel
Methylation
McRBC after Gel
Methylation