Date posted: 12-Jul-2015
Experience Sharing: Methylation Sequencing Analysis
Yi-Feng Chang, Institute of Biomedical Informatics, NYMU
Outline
DNA Methylation
Bisulfite Sequencing Technologies
BS-Seq Data Analysis
Tools for BS-Seq Data Analysis
A Case Study
Epigenetics Overview
http://commonfund.nih.gov/epigenomics/figure.aspx
DNA Methylation: Functions and Diseases
Singal, R. & Ginder, G.D. DNA Methylation. Blood Journal 93, 4059-4070 (1999).
Lee, K.W. & Pausova, Z. Cigarette smoking and DNA methylation. Front Genet 4, 132 (2013).
Mikeska, T. & Craig, J.M. DNA methylation biomarkers: cancer and beyond. Genes (Basel) 5, 821-64 (2014).
Grayson, D.R. & Guidotti, A. The dynamics of DNA methylation in schizophrenia and related psychiatric disorders. Neuropsychopharmacology 38, 138-66 (2013).
Chen, C. et al. Correlation between DNA methylation and gene expression in the brains of patients with bipolar disorder and schizophrenia. Bipolar Disord 16, 790-9 (2014).
De Jager, P.L. et al. Alzheimer's disease: early alterations in brain DNA methylation at ANK1, BIN1, RHBDF2 and other loci. Nat Neurosci 17, 1156-63 (2014).
DNA Methylation Pathway
Moore, L.D., Le, T. & Fan, G. DNA methylation and its basic function. Neuropsychopharmacology 38, 23-38 (2013).
DNA Demethylation Pathway
Moore, L.D., Le, T. & Fan, G. DNA methylation and its basic function. Neuropsychopharmacology 38, 23-38 (2013).
• 5mC: 5-Methylcytosine
• 5hmC: 5-hydroxymethylcytosine
• 5hmU: 5-hydroxymethyluracil
• 5fC: 5-formylcytosine
• 5caC: 5-carboxycytosine
• Tet: Ten-eleven translocation enzymes
• AID/APOBEC: activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme complex
• TDG: Thymine DNA glycosylase
• SMUG1: Single-strand-selective monofunctional uracil-DNA glycosylase 1
Bisulfite Sequencing Technologies
Timeline of Technologies for Studying DNA Methylation
Figure: timeline of DNA methylation methods, with added labels for MS-HRM, MeDIP-Seq, BS-Seq, MethylC-Seq, TAB-Seq and MAB-Seq and an end point of 2015.
Harrison, A. & Parle-McDermott, A. DNA methylation: a timeline of methods and applications. Front Genet 2, 74 (2011).
• COBRA: Combined Bisulfite Restriction Analysis
• AP-PCR: Methylation-Sensitive Arbitrarily Primed PCR
• AIMS: DNA methylation by amplification of intermethylated sites
• RRBS: Reduced representation bisulfite sequencing
• MS-HRM: Methylation-sensitive high resolution melting
• MeDIP-Seq: Methylated DNA immunoprecipitation sequencing
• MethylC-Seq/BS-Seq: Bisulfite sequencing
• TAB-Seq: Tet-Assisted BS-Seq
• MAB-Seq: M.SssI methylase-assisted BS-Seq
Steps for Determining the Methylation Status of Cytosines in a Known DNA Sequence by the Bisulfite Conversion Method
Singal, R. & Ginder, G.D. DNA Methylation. Blood Journal 93, 4059-4070 (1999).
Techniques for Enrichment of Methylated or Target Regions Prior to BS-Seq
Lister, R. & Ecker, J.R. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 19, 959-66 (2009).
Figure: enrichment workflows from genomic DNA to deep sequencing.
Techniques for Genome-Wide Sequencing of Cytosine Methylation Sites
Lister, R. & Ecker, J.R. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 19, 959-66 (2009).
Figure: genome-wide workflows from genomic DNA to deep sequencing.
Approaches for Detecting Active DNA Demethylation at Single Base Resolution
TAB-Seq: Tet-Assisted BS-Seq
Yu, M. et al. Tet-assisted bisulfite sequencing of 5-hydroxymethylcytosine. Nat Protoc 7, 2159-70 (2012).
Yu, M. et al. Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell 149, 1368-80 (2012).
MAB-Seq: M.SssI methylase-assisted BS-Seq
Wu, H., Wu, X., Shen, L. & Zhang, Y. Single-base resolution analysis of active DNA demethylation using methylase-assisted bisulfite sequencing. Nat Biotechnol 32, 1231-40 (2014).
Genomic Coverage of MeDIP-seq, MethylCap-seq, RRBS and Infinium
Bock, C. et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol 28, 1106-14 (2010).
MeDIP-seq and MethylCap-seq provide broad coverage of the genome, whereas RRBS and Infinium are more restricted to CpG islands and promoter regions.
Key Metrics of the Technology Comparison
Beck, S. Taking the measure of the methylome. Nat Biotechnol 28, 1026-8 (2010).
BS-Seq Data Analysis
Effect and Problems of Bisulfite Treatment of DNA
Krueger, F., Kreck, B., Franke, A. & Andrews, S.R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).
Mapping bisulfite reads to the 4 possible bisulfite strands (OT/CTOT/OB/CTOB) is equivalent to mapping each bisulfite read and its reverse complement to both the Top and Bottom strands of the original reference sequence.
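A minimal sketch of where the four strands come from, assuming complete conversion of every unmethylated C and no sequencing errors. The function names and the example sequence are hypothetical and are not taken from any of the tools listed in this talk.

```python
# In-silico bisulfite conversion of a short double-stranded fragment.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def bisulfite_convert(seq, methylated=frozenset()):
    """Turn every C into T, except at the 0-based positions in `methylated`."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )

def four_strands(top, meth_top=frozenset(), meth_bottom=frozenset()):
    """Return the OT, CTOT, OB and CTOB sequences for a top-strand fragment.

    `meth_bottom` positions are given in bottom-strand coordinates.
    """
    bottom = reverse_complement(top)
    ot = bisulfite_convert(top, meth_top)        # original top, converted
    ob = bisulfite_convert(bottom, meth_bottom)  # original bottom, converted
    return {
        "OT": ot,
        "CTOT": reverse_complement(ot),  # PCR copy of OT
        "OB": ob,
        "CTOB": reverse_complement(ob),  # PCR copy of OB
    }

# Hypothetical fragment with one methylated C (position 1) on the top strand.
strands = four_strands("ACGTTCGA", meth_top={1})
for name, seq in strands.items():
    print(name, seq)
```

Relative to the top strand, OT reads show C-to-T changes and CTOB reads show G-to-A changes, which is why a read and its reverse complement are both tried against the two converted references.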
How to Align BS Reads Against Reference Genome?
Krueger, F. & Andrews, S.R. Bismark: A flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics (2011).
Bock, C. Analysing and interpreting DNA methylation data. Nat Rev Genet 13, 705-19 (2012).
Figure: example sequences (TCGA, TCGT, ACGT, ATGA; TTGT, ATGT) aligned with wild-card matching (Y = C or T); some reads have multiple hits.
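The two alignment strategies named in the tool tables (three-letter and wild-card) can be contrasted with a toy exact-match search. This is only a sketch under strong assumptions: forward strand only, no mismatches, and a hypothetical 8-bp reference. Real aligners index the genome and handle both strands.

```python
def three_letter_hits(read, reference):
    """Convert C to T in both read and reference, then match exactly."""
    r = read.replace("C", "T")
    g = reference.replace("C", "T")
    return [i for i in range(len(g) - len(r) + 1) if g[i:i + len(r)] == r]

def wildcard_hits(read, reference):
    """Treat each reference C as Y: it matches a C or a T in the read."""
    def matches(read_base, ref_base):
        return read_base == ref_base or (ref_base == "C" and read_base == "T")
    n = len(read)
    return [
        i for i in range(len(reference) - n + 1)
        if all(matches(rb, gb) for rb, gb in zip(read, reference[i:i + n]))
    ]

reference = "TCGATTGA"  # hypothetical reference
# A read that kept its C (methylated) is ambiguous in three-letter space,
# but the wild-card search can use the retained C to pick one location.
print(three_letter_hits("TCGA", reference))  # [0, 4]
print(wildcard_hits("TCGA", reference))      # [0]
# A fully converted read is ambiguous under both strategies.
print(wildcard_hits("TTGA", reference))      # [0, 4]
```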
Workflow for Analyzing BS-Seq data
Krueger, F., Kreck, B., Franke, A. & Andrews, S.R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).
http://omictools.com/bisulfite-seq/
Published Tools
B-SOLANA: Bisulphite aligner for processing bisulphite-sequencing color space data. http://code.google.com/p/bsolana
BatMeth: Base and color space data. http://code.google.com/p/batmeth
Bicycle: Lister et al. 2009 workflow. http://sing.ei.uvigo.es/bicycle/howitworks.html
BiQ Analyzer HT: Locus-specific analysis and visualization of high-throughput bisulfite sequencing data. http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de
BiSeq: DMR for RRBS data. R/Bioconductor package BiSeq
BISMA: Support analysis of repetitive sequences. http://biochem.jacobs-university.de/BDPC/BISMA
Bismark: Probably the most widely used three-letter bisulphite aligner; supports both Bowtie (fast, gap-free alignment) and Bowtie 2.0 (sensitive, gapped alignment). http://www.bioinformatics.babraham.ac.uk/projects/bismark
Bis-SNP: Variant caller for inferring DNA methylation levels and genomic variants from BS-Seq reads that have been aligned by other tools. http://epigenome.usc.edu/publicationdata/bissnp2011
Bisulfighter: Using Last for mapping, HMM for DMR detection. http://epigenome.cbrc.jp/bisulfighter
BRAT: Highly configurable and well-documented three-letter BS-Seq aligner. http://compbio.cs.ucr.edu/brat
BS-Seeker, BS-Seeker 2: Three-letter BS-Seq aligner based on Bowtie. http://pellegrini.mcdb.ucla.edu/BS_Seeker/BS_Seeker.html
BSMAP: Probably the most widely used wild-card BS-Seq aligner. http://code.google.com/p/bsmap
Bsmooth: Mapping, quality control and DMR analysis pipeline. http://rafalab.jhsph.edu/bsmooth
COHCAP: Integration with gene expression data. https://sourceforge.net/projects/cohcap/
CpG_MPs: Methylation patterns of genomic regions. http://202.97.205.78/CpG_MPs/
DMAP: DMR for BS-Seq and RRBS data. http://biochem.otago.ac.nz/research/databases-software/
DMR2+: DMR for array based data
DSS: Bayesian hierarchical model to detect differentially methylated loci (DML). R/Bioconductor package DSS
Epidiff: DMR detection. http://bioinfo.hrbmu.edu.cn/epidiff
Bock, C. et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol 28, 1106-14 (2010).
Krueger, F., Kreck, B., Franke, A. & Andrews, S.R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).
Tran, H., Porter, J., Sun, M.A., Xie, H. & Zhang, L. Objective and comprehensive evaluation of bisulfite short read mapping tools. Adv Bioinformatics 2014, 472045 (2014).
http://omictools.com/bisulfite-seq/
Published Tools (cont.)
GSNAP: Wild-card BS-Seq aligner included in a widely used general-purpose alignment tool. http://share.gene.com/gmap
GBSA: Analysis pipeline for gene-centric or gene-independent focus. http://ctrad-csi.nus.edu.sg/gbsa
FadE: Mapping for base and color space. http://code.google.com/p/fade
Kismeth: Designed to be used with plants. http://katahdin.mssm.edu/kismeth
Last: Recent and well-validated wild-card BS aligner included in a general-purpose alignment tool. http://last.cbrc.jp
MethPipe: Mapping, BS conversion rate, HMR, DMR pipeline. http://smithlabresearch.org/software/methpipe
Methyl-MAPS, Methyl-Analyzer: Base and color space data + post analysis. http://epigenomicspub.columbia.edu/methylanalyzer_data.html
MethylCoder: Three-letter BS-Seq aligner that can be used with either Bowtie (high speed) or GSNAP (high sensitivity). https://github.com/brentp/methylcode
MethylExtract: Detects variation. http://bioinfo2.ugr.es/MethylExtract
MethylSig: R package pipeline for BS-Seq and RRBS. http://sartorlab.ccmb.med.umich.edu/software
MOABS: DMR detection. http://code.google.com/p/moabs
Pash: Wild-card BS aligner included in a general-purpose alignment tool. http://brl.bcm.tmc.edu/pash
RMAP, RMAPBS: Wild-card BS aligner included in a general-purpose alignment tool. http://www.cmb.usc.edu/people/andrewds/rmap http://smithlabresearch.org/software/methpipe
RRBSMAP: Variant of BSMAP that is specialized on reduced-representation bisulphite sequencing (RRBS) data. http://rrbsmap.computational-epigenetics.org
SAAP-RRBS: RRBS mapping. http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm
segemehl: Wild-card bisulphite aligner included in a general-purpose alignment tool. http://www.bioinf.uni-leipzig.de/Software/segemehl
SOCS-B: Rabin-Karp hashing, color space data. http://solidsoftwaretools.com/gf/project/socs
Bock, C. et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol 28, 1106-14 (2010).
Krueger, F., Kreck, B., Franke, A. & Andrews, S.R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).
Tran, H., Porter, J., Sun, M.A., Xie, H. & Zhang, L. Objective and comprehensive evaluation of bisulfite short read mapping tools. Adv Bioinformatics 2014, 472045 (2014).
http://omictools.com/bisulfite-seq/
How to Select BS-Seq Analysis Tools?
Actively updated, with good support from the authors or the community
Aligner
• BS-Seeker 2
• Bismark
• RMAPBS/RMAPBS-PE
Post-analysis tools
• MethPipe
• Chatterjee, A., Stockwell, P.A., Rodger, E.J. & Morison, I.M. Comparison of alignment software for genome-wide bisulphite sequence data. Nucleic Acids Res 40, e79 (2012).
  • Bismark
  • BSMAP
  • RMAPBS
• Tran, H., Porter, J., Sun, M.A., Xie, H. & Zhang, L. Objective and comprehensive evaluation of bisulfite short read mapping tools. Adv Bioinformatics 2014, 472045 (2014).
  • BSMAP
  • BS-Seeker
  • Bismark
  • BRAT-BW
  • BiSS
• Kunde-Ramamoorthy, G. et al. Comparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing. Nucleic Acids Res 42, e43 (2014).
  • Bismark
  • BSMAP
  • Pash
Public BS-Seq Resources from MethBase and NCBI GEO
http://smithlabresearch.org/software/methbase/
• Glycine max (Soybean)
• Schistocerca gregaria (Locust)
• Rattus norvegicus (Rat)
• Danio rerio (Zebrafish)
• Drosophila melanogaster (Fruit fly)
• Oryza sativa (Rice)
• Macaca mulatta (Rhesus monkey)
• Mus musculus domesticus (Western European house mouse)
• Xenopus (Silurana) tropicalis (Frog)
• Cynoglossus semilaevis (Tongue sole, bony fish)
• Bombyx mori (Silkworm)
• Harpegnathos saltator (Jerdon's jumping ant)
• Camponotus floridanus (Florida carpenter ant)
A Case Study
BS-Seq of Female and Male Mouse Thalamus
Analysis Pipeline
Core steps (tools in parentheses)
1. Quality Trimming (fastq_masker)
2. Mapping (rmapbs, rmapbs-pe)
3. Remove Duplicate Reads (duplicate-remover)
4. Bisulfite Conversion Rate (bsrate)
5. Methylation Calling (methcounts + error correction)
Downstream analyses
• Hypo/Hyper-Methylation Regions (hmr, hyperhmr, pmr)
• Large Hypo/Hyper-Methylation Domains (pmd)
• Differential Methylation Region (dmr)
• Allele-specific Methylated Regions (amrfinder, allelicmeth)
• Generate Methylation BED file (Bedtools, bedGraphToBigWig)
• Calculating Methylation Ratio for Regions (bigWigAverageOverBed, roimethstat, bwtool)
• Cross-species Comparison of Methylomes (liftOver)
fastx toolkit: http://hannonlab.cshl.edu/fastx_toolkit/
MethPipe: http://smithlabresearch.org/software/methpipe/
Bedtools: https://github.com/arq5x/bedtools2
Programs from UCSC Genome Browser: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64
bwtool: https://github.com/CRG-Barcelona/bwtool/wiki
Whole Genome BS-Seq of Male/Female Mouse Thalamus
Per lane:
Sample name      Library size (bp)  Total bases (Mbases)/lane  # Reads (PE read)  % of >= Q30 Bases (PF)  Mean Quality Score (PF)
Male Thalamus    369                25,537                     255,372,054        92.33                   36.2
Male Thalamus    337                28,165                     281,645,674        92.42                   36.12
Male Thalamus    312                37,187                     371,873,412        90.62                   35.56
Female Thalamus  386                29,965                     299,653,436        90.81                   35.68
Female Thalamus  362                31,305                     313,048,784        91.53                   35.82
Female Thalamus  317                22,836                     228,359,778        76.36                   31.28
Per sample:
Sample name      Total bases (Mb)/sample  Total reads   Seq. coverage
Male Thalamus    90,889                   908,891,140   33.7
Female Thalamus  84,106                   841,061,998   31.2
Sequencing Output
Sample           Mapped Reads  After Removing Duplicates  Sequencing depth (CpG), when > 0  BS Conversion Rate
Male Thalamus    770,958,854   655,643,506                26.5951                           0.980197
Female Thalamus  698,661,676   630,472,842                27.1065                           0.980664
Mapping Results
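As a rough cross-check of the two tables, the per-sample totals, sequencing coverage and duplicate fraction can be recomputed from the per-lane numbers. The genome size of 2,700 Mb is an assumption (it is not stated on the slide) chosen as an approximate mouse genome size; the function name is hypothetical.

```python
GENOME_MB = 2700  # assumption: approximate mouse genome size in Mb

def summarize(lane_mbases, mapped_reads, dedup_reads, genome_mb=GENOME_MB):
    """Return total Mbases, fold coverage and duplicate fraction for a sample."""
    total_mb = sum(lane_mbases)
    return {
        "total_mb": total_mb,
        "coverage": total_mb / genome_mb,
        "duplicate_fraction": 1 - dedup_reads / mapped_reads,
    }

# Numbers copied from the Sequencing Output and Mapping Results tables.
male = summarize([25537, 28165, 37187], 770_958_854, 655_643_506)
female = summarize([29965, 31305, 22836], 698_661_676, 630_472_842)

for name, s in (("Male Thalamus", male), ("Female Thalamus", female)):
    print(name, s["total_mb"], round(s["coverage"], 1),
          round(s["duplicate_fraction"], 3))
```

Under this assumed genome size the recomputed coverage rounds to the values in the table (33.7 and 31.2).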
How to Correct Error Rates in BS-Seq Data?
• Error rate: non-conversion rate + sequencing error (0.01%)
• A higher non-conversion rate produces more falsely methylated CpG calls
• Use a statistical test (binomial distribution test with FDR < 1%)
  • Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315-22 (2009).
  • http://sing.ei.uvigo.es/bicycle
• Filter CpG sites by read depth
  • A CpG site is treated as unmethylated if its read depth is < 5
Sex     Sample    BS Conversion Rate  Error Rate
Male    Thalamus  0.980197            0.019903
Female  Thalamus  0.980664            0.019436
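A minimal sketch of the binomial error correction, assuming the null hypothesis that a site is unmethylated and that every methylated read at such a site arises from the error rate above. The Benjamini-Hochberg procedure used for FDR control, the function names and the example sites are assumptions, not the exact method of Lister et al. or Bicycle.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def benjamini_hochberg(pvalues, fdr=0.01):
    """Return True for each p-value rejected at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= cutoff_rank
    return rejected

# Error rate = non-conversion rate + sequencing error (male thalamus).
error_rate = (1 - 0.980197) + 0.0001

# Hypothetical sites: (read depth, methylated reads).
sites = [(10, 9), (10, 1), (30, 2), (5, 0)]
pvalues = [binom_upper_tail(meth, depth, error_rate) for depth, meth in sites]
calls = benjamini_hochberg(pvalues, fdr=0.01)
print(round(error_rate, 6), calls)
```

Only the first site, with 9 of 10 methylated reads, is called methylated here; one or two methylated reads are still compatible with the roughly 2% error rate.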
How to Compare Two or More BS-Seq Datasets?
Columns: chrom, position, strand, context, probability, unmeth reads (sample A), meth reads (sample A), unmeth reads (sample B), meth reads (sample B), meth. ratio (sample A), meth. ratio (sample B), meth. diff., p-value (two-sided), p-value (left-sided), p-value (right-sided)
chr1 3004854 + CpG 5.303E-03 8 18 15 7 0.3077 0.6818 -0.3741 1.943E-02 1.040E-02 9.982E-01
chr1 3005927 + CpG 1.087E-02 6 4 12 0 0.6000 1.0000 -0.4000 2.871E-02 2.871E-02 1.000E+00
chr1 3006295 + CpG 5.954E-05 0 8 9 1 0.0000 0.9000 -0.9000 4.114E-04 2.057E-04 1.000E+00
chr1 3006364 + CpG 9.966E-01 4 2 0 9 0.6667 0.0000 0.6667 1.099E-02 1.000E+00 1.099E-02
chr1 3009412 + CpG 7.792E-05 23 33 42 13 0.4107 0.7636 -0.3529 2.250E-04 1.473E-04 1.000E+00
chr1 3011067 + CpG 9.922E-01 47 1 40 8 0.9792 0.8333 0.1458 3.053E-02 9.987E-01 1.526E-02
chr1 3011800 + CpG 4.762E-03 0 3 5 0 0.0000 1.0000 -1.0000 1.786E-02 1.786E-02 1.000E+00
chr1 3016267 + CpG 9.979E-01 12 4 0 5 0.7500 0.0000 0.7500 6.192E-03 1.000E+00 6.192E-03
chr1 3021463 + CpG 1.260E-02 56 7 109 3 0.8889 0.9732 -0.0843 3.668E-02 2.677E-02 9.953E-01
chr1 3021653 + CpG 9.855E-01 87 0 102 6 1.0000 0.9444 0.0556 3.423E-02 1.000E+00 2.708E-02
1. Using Fisher's exact test to determine whether a CpG site is differentially methylated between two BS-Seq datasets
Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315-22 (2009).
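A self-contained sketch of Fisher's exact test on the four read counts of one CpG site, written with the standard library only. The function name is hypothetical; the counts are passed in the same order as the four count columns of the table, and the one-sided tests refer to the first count.

```python
from math import comb

def fisher_exact(a, b, c, d):
    """Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Returns (two_sided, left_sided, right_sided) p-values; the one-sided
    tests refer to the top-left count `a`.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: prob(x) for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Two-sided: sum all tables no more likely than the observed one.
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    left = sum(p for x, p in probs.items() if x <= a)
    right = sum(p for x, p in probs.items() if x >= a)
    return two_sided, left, right

# Site chr1:3016267 from the table, counts 12 4 0 5.
print(["%.3E" % p for p in fisher_exact(12, 4, 0, 5)])
```

For this site the sketch reproduces the three p-values listed in the table (6.192E-03, 1.000E+00, 6.192E-03).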
chrom position sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10 sample11 sample12 Chi Sq
chr1 3264648 0.916 0.781 0.913 0.748 0.891 0.551 0.894 0.698 0.885 0.713 0.860 0.719 163.886
chr1 3265559 0.836 0.517 0.808 0.524 0.848 0.427 0.851 0.508 0.862 0.505 0.850 0.524 326.237
chr1 3604995 0.070 0.051 0.000 0.000 0.068 0.076 0.065 0.000 0.101 0.000 0.000 0.000 102.970
chr1 4242297 0.000 0.000 0.000 0.000 0.000 0.026 0.000 0.000 0.000 0.000 0.000 0.000 196.797
chr1 4322330 0.432 0.097 0.352 0.112 0.254 0.643 0.392 0.103 0.439 0.154 0.325 0.139 129.562
chr1 4431875 0.000 0.000 0.000 0.000 0.075 0.000 0.000 0.000 0.143 0.000 0.000 0.000 142.589
chr1 4597796 0.784 0.343 0.493 0.349 0.583 0.830 0.827 0.306 0.860 0.382 0.798 0.258 119.062
chr1 4916463 0.146 0.143 0.272 0.270 0.719 0.782 0.195 0.161 0.429 0.305 0.208 0.298 107.710
chr1 4916643 0.272 0.438 0.157 0.288 0.711 0.737 0.183 0.197 0.208 0.214 0.257 0.266 129.678
chr1 5011040 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.333 0.162 0.000 0.150 107.753
2. Using the chi-square test to find sample-specific methylated CpG sites
Hon, G.C. et al. Epigenetic memory at embryonic enhancers identified in DNA methylation maps from adult mouse tissues. Nat Genet 45, 1198-206 (2013).
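A sketch of how a chi-square statistic could be computed for one CpG site across several samples, assuming it is calculated on a samples-by-2 table of methylated and unmethylated read counts. The slide shows only methylation ratios, so the read counts below are hypothetical, and this is not necessarily the exact procedure of Hon et al.

```python
def chi_square_statistic(counts):
    """Pearson chi-square statistic for a list of (methylated, unmethylated)
    read counts, one pair per sample. Degrees of freedom: len(counts) - 1."""
    total_meth = sum(m for m, _ in counts)
    total_unmeth = sum(u for _, u in counts)
    total = total_meth + total_unmeth
    if total_meth == 0 or total_unmeth == 0:
        return 0.0  # all samples agree completely; no variation to test
    stat = 0.0
    for m, u in counts:
        depth = m + u
        expected_m = depth * total_meth / total
        expected_u = depth * total_unmeth / total
        stat += (m - expected_m) ** 2 / expected_m
        stat += (u - expected_u) ** 2 / expected_u
    return stat

# Hypothetical site: three samples with similar ratios, one outlier.
print(chi_square_statistic([(18, 2), (17, 3), (19, 1), (4, 16)]))
```

A large statistic flags a site whose methylation differs in at least one sample; identifying which sample is specific still requires inspecting the per-sample ratios.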
Methylation Status of Gene Structure
Methylation Female Male % of diff. meth CpG Female vs Male
0% 143809 141905 100% 6533
5% 13157 14411 95% 178
10% 14189 15190 90% 333
15% 15268 16186 85% 290
20% 19380 19468 80% 537
25% 26789 26047 75% 1437
30% 38455 36165 70% 8037
35% 38599 38593 65% 232
40% 51006 51116 60% 1817
45% 53893 56669 55% 134
50% 112061 105476 50% 126906
55% 99034 98250 45% 1295
60% 134921 132829 40% 7599
65% 196008 188988 35% 99828
70% 267239 257493 30% 4314
75% 406567 391491 25% 69793
80% 606620 588995 20% 71484
85% 835161 835741 15% 32022
90% 982156 1023248 10% 11372
95% 431721 472753 5% 548
100% 304113 290657 0% 4272664
Divide mouse chromosomes into 200 bp bins: ~4,800,000 of 13,628,844 bins have CpG sites
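A minimal sketch of the binning step, assuming 0-based positions and a read-weighted methylation ratio per bin (methylated reads divided by total reads over all CpG sites in the bin). Both choices, the function name and the example sites are assumptions for illustration.

```python
from collections import defaultdict

def bin_methylation(sites, bin_size=200):
    """Aggregate CpG sites into fixed-size bins.

    `sites` is an iterable of (chrom, position, methylated_reads, total_reads).
    Returns {(chrom, bin_index): methylation_ratio} for bins with coverage.
    """
    meth = defaultdict(int)
    total = defaultdict(int)
    for chrom, pos, m, n in sites:
        key = (chrom, pos // bin_size)
        meth[key] += m
        total[key] += n
    return {key: meth[key] / total[key] for key in total if total[key] > 0}

# Hypothetical CpG sites on chr1.
sites = [("chr1", 10, 3, 4), ("chr1", 150, 1, 4), ("chr1", 250, 0, 5)]
print(bin_methylation(sites))
```

Bins without any covered CpG site are simply absent from the result, which mirrors the distinction between all bins and bins that have CpG sites.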
Figure: CpG methylation ratio (x100%) along the gene structure (-10K, TSS, gene body 1 ~ 100%, TxEnd, +10K; 50 bins per segment) for thalamus-f and thalamus-m, with panels for CGI and nCGI protein_coding genes.
Figure: similarity of CpG methylation status (x100%), thalamus-f vs thalamus-m, along the same gene structure, with panels for CGI and nCGI protein_coding genes.
Differentially Methylated Regions between Female and Male Murine Thalamus Genomes
Differentially Methylated Region (DMR)
1. Merge adjacent differentially methylated CpG sites separated by ≤ 200 bp
2. Keep merged regions with > 3 CpG sites of which > 70% are differentially methylated
3. Merge regions separated by ≤ 500 bp
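The three steps above can be sketched for a single chromosome as follows. Two points are assumptions: the "> 3 CpG sites" threshold is applied to all CpG sites inside a merged region, and the 70% fraction is the share of those sites that are differentially methylated. The function name and the example positions are hypothetical.

```python
from bisect import bisect_left, bisect_right

def call_dmrs(cpg_positions, dm_positions, site_gap=200,
              min_sites=3, min_fraction=0.7, region_gap=500):
    """Return DMRs as (start, end) tuples for one chromosome."""
    cpgs = sorted(cpg_positions)
    dms = sorted(dm_positions)

    # Step 1: chain differentially methylated sites within site_gap.
    regions = []
    for pos in dms:
        if regions and pos - regions[-1][1] <= site_gap:
            regions[-1][1] = pos
        else:
            regions.append([pos, pos])

    # Step 2: keep regions with enough CpG sites and a high DM fraction.
    kept = []
    for start, end in regions:
        n_cpg = bisect_right(cpgs, end) - bisect_left(cpgs, start)
        n_dm = bisect_right(dms, end) - bisect_left(dms, start)
        if n_cpg > min_sites and n_dm / n_cpg > min_fraction:
            kept.append([start, end])

    # Step 3: merge surviving regions within region_gap.
    merged = []
    for start, end in kept:
        if merged and start - merged[-1][1] <= region_gap:
            merged[-1][1] = end
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]

# Hypothetical CpG positions; dm lists the differentially methylated ones.
cpg = [100, 150, 200, 250, 300, 700, 750, 800, 850, 5000, 5100]
dm = [100, 150, 200, 300, 700, 750, 800, 850, 5000]
print(call_dmrs(cpg, dm))  # [(100, 850)]
```

In this example two regions pass step 2 and are joined in step 3 because they lie 400 bp apart, while the isolated site at 5000 is dropped.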
Chrom  Lower meth. in female  Lower meth. in male  Mosaic DMR
chr1   290                    130                  20
chr2   276                    127                  28
chr3   157                    74                   9
chr4   236                    112                  23
chr5   228                    151                  22
chr6   137                    82                   12
chr7   226                    94                   23
chr8   189                    105                  21
chr9   175                    106                  13
chr10  169                    88                   15
chr11  265                    143                  22
chr12  149                    88                   14
chr13  138                    75                   12
chr14  108                    63                   11
chr15  140                    87                   15
chr16  73                     51                   8
chr17  151                    79                   15
chr18  99                     53                   8
chr19  102                    64                   12
chrX   939                    89                   4
Figure: # of gene structures overlapped with DMRs (bar chart, axis 0 to 4500).
A DMR may span multiple gene structures.
Acknowledgement
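• Prof. Ming-Ta Hsu (徐明達), Institute of Biochemistry and Molecular Biology, National Yang-Ming University
• Aim for the Top University Plan from the Ministry of Education, Taiwan
• NSC102-2319-B010-001 High-throughput Genome Analysis Core Facility (III)
• NSC101-2319-B010-003-B4 High-throughput Genome Analysis Core Facility (II), Core Facility Service Project C1-1
• NSC100-2319-B010-001 High-throughput Genome Analysis Core Facility (I)
• NSC99-3112-B010-020 High-throughput Genome Analysis Core Facility (III)
• NSC98-3112-B010-011 High-throughput Genome Analysis Core Facility (II)
Mouse Brain Tissues
• Prof. Ting-Fen Tsai (蔡亭芬), Department of Life Sciences and Institute of Genome Sciences, National Yang-Ming University
• Assistant Prof. Yi-Fan Chen (陳怡帆), Ph.D. Program for Translational Medicine, Taipei Medical University
Computing Resource
• National Research Program for Biopharmaceuticals (NRPB, NSC 1022325-B-492-001)
• National Center for High-performance Computing (NCHC) of National Applied Research Laboratories (NARLabs)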
Questions?
These slides are available at http://www.slideshare.net/YiFengChang