Processing and analysis methods for DNA methylation array data
Giovanni Fiorito
PhD Student in Complex Systems for Life Sciences
Department of Medical Sciences
University of Turin, Italy
Outline
Brief introduction to epigenetics, DNA methylation, and genome-wide association studies (GWAS).
Statistical analysis of DNA methylation array data using R:
- single marker test,
- multiple marker test.
Applications and interesting results.
GENETIC BACKGROUND
ENVIRONMENTAL EXPOSURE (pollution, occupational exposure, …)
LIFESTYLE (smoking habits, diet, physical activity, obesity, …)
SUSCEPTIBILITY TO DISEASES
AGEING
EPIGENETICS (DNA methylation, gene expression, histone modifications)
“Epigenetics is the study of heritable changes in gene activity that are not caused by changes in the DNA sequence”.
Epigenetics, DNA methylation and environmental exposure
S. Dalì, Ritratto di mio padre, 1921
Bisulfite conversion
Measure Genome-Wide DNA Methylation
A single experiment measures the DNA methylation percentage of ~450k markers simultaneously.
Workflow
Wilhelm-Benartzi, Br. J. Cancer, 2013
Fluorescence intensities are converted into numerical values.
$$\beta = \frac{\mathrm{Meth} + \mathrm{bg}}{\mathrm{Meth} + \mathrm{Unmeth} + \mathrm{bg}}$$
Pre-processing and analysis data: R packages
Wilhelm-Benartzi, Br. J. Cancer, 2013
β vs. M values
$$M = \log_2\left(\frac{\beta}{1 - \beta}\right)$$
- β values are not normally distributed and have severely heteroscedastic variance, but they are easy to interpret biologically.
- M values are approximately normally distributed and have approximately homoscedastic variance.
- M values are recommended for statistical comparisons and β values for final reports.
Pan et al., BMC Bioinformatics, 2013
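The conversion between β and M values can be sketched in a few lines. This is a minimal illustration of the logit (base 2) transform stated on this slide, not the code of any R package; the function names and the clipping bound `eps` are assumptions introduced here to keep the logarithm finite at β = 0 or β = 1.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """M value from a beta value: M = log2(beta / (1 - beta)).

    eps is an assumed clipping bound (not from the slides) that keeps
    the transform finite when beta is exactly 0 or 1.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))

def m_to_beta(m):
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)

if __name__ == "__main__":
    for beta in (0.1, 0.5, 0.9):
        m = beta_to_m(beta)
        print(f"beta={beta:.2f} -> M={m:+.3f} -> beta={m_to_beta(m):.2f}")
```

A β of 0.5 maps to M = 0, and the transform stretches values near 0 and 1, which is why M values behave better in statistical tests while β values remain easier to read as percentages.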
S. Dalì, Battaglia più di un dente di leone, 1938
GWAS (Genome-Wide Association Study)
Maurano, Science, 2012
Single marker test
- About 450k statistical comparisons (t-test, multivariate linear/logistic regression, …).
- R package CpGassoc.
Example: DNA methylation and Multiple Sclerosis susceptibility
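A single-marker test applies the same comparison to each CpG independently. The sketch below illustrates that idea on made-up M values, using Welch's t statistic with a label-permutation p-value (chosen here only because the Python standard library has no t distribution). It is not the CpGassoc implementation; the marker names, data, and function names are hypothetical.

```python
import random
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic for two independent samples."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se

def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for |t|, obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(welch_t(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(x)], pooled[len(x):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps the p-value above 0

# Toy M values, invented for illustration: marker -> (cases, controls)
markers = {
    "cg_toy_1": ([1.9, 2.1, 2.4, 2.2, 2.0, 2.3], [0.9, 1.1, 1.0, 1.2, 0.8, 1.0]),
    "cg_toy_2": ([0.1, -0.2, 0.3, 0.0, -0.1, 0.2], [0.0, 0.1, -0.3, 0.2, -0.1, 0.1]),
}

# One test per marker; on a real array this loop runs ~450k times.
results = {name: (welch_t(cases, controls), permutation_p(cases, controls))
           for name, (cases, controls) in markers.items()}

for name, (t, p) in results.items():
    print(f"{name}: t={t:.2f}, permutation p={p:.4f}")
```

In a real analysis the regression would also include covariates, and the resulting p-values would still need correction for the number of markers tested.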
Problem: Multiple testing correction
- More than 20k false positives are expected at α = 0.05.
- Bonferroni correction: αB = 0.05/N ≈ 1 x 10^-7. Too conservative.
- False Discovery Rate (FDR), Holm correction (R function p.adjust, R package fdrtool).
- Permutation is a better way to limit type I and type II errors (R function permutest, R package CpGassoc), but it is computationally expensive.
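The arithmetic behind these numbers, and the three corrections named above, can be sketched as follows. This is a plain-Python illustration of the Bonferroni, Holm, and Benjamini-Hochberg adjustments (the same adjustments R's p.adjust offers), not the R code itself; the function names and the toy p-values are assumptions made for the example.

```python
def bonferroni(pvals):
    """Bonferroni adjusted p-values: p * n, capped at 1."""
    n = len(pvals)
    return [min(1.0, p * n) for p in pvals]

def holm(pvals):
    """Holm step-down adjusted p-values."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # ascending p-values
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (n - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (FDR) adjusted p-values."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)  # descending
    adjusted = [0.0] * n
    running_min = 1.0
    for k, i in enumerate(order):
        rank = n - k  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Numbers from the slide: ~450k tests at alpha = 0.05
n_tests = 450_000
alpha = 0.05
expected_false_positives = n_tests * alpha  # about 22,500 under the global null
bonferroni_threshold = alpha / n_tests      # about 1.1e-07

# Toy p-values, invented for illustration
toy_p = [0.001, 0.008, 0.039, 0.041, 0.60]
print("Bonferroni:", bonferroni(toy_p))
print("Holm:      ", holm(toy_p))
print("BH (FDR):  ", benjamini_hochberg(toy_p))
```

On the toy values, Holm is never more conservative than Bonferroni, and the FDR adjustment is the least conservative of the three, which matches the ordering suggested on the slide.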
Differential DNA Methylation in Purified Human Blood Cells
Whole blood is a mixture of several cell types.
Each cell type has a specific DNA methylation profile.
Differences in the proportions of whole-blood cell types can bias the statistical analysis.
In DNA methylation analyses of whole blood, cell-type-specific patterns should be evaluated beforehand.
Reinius, Epigenetics, 2013
Blood-based profiles of DNA methylation predict the underlying distribution of cell types
Houseman, Epigenetics, 2013
Blood cell type proportions can be predicted from a selected set of ~500 DNA methylation markers.
The R script wbcInference.R is available online.
Cell type proportions should be included as covariates in multivariate linear/logistic regression analysis.
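The idea behind reference-based estimation of cell proportions can be shown with a deliberately simplified case. The sketch below assumes only two cell types and a mixture model in which each marker's methylation is a weighted average of two reference profiles; it is not the algorithm in wbcInference.R, and the reference values, function name, and mixing fraction are all invented for illustration.

```python
def estimate_proportion(mixed, ref_a, ref_b):
    """Least-squares estimate of the fraction of cell type A in a two-type mixture.

    Assumed model per marker: mixed = p * ref_a + (1 - p) * ref_b, 0 <= p <= 1.
    The unconstrained least-squares solution is clipped to [0, 1], which is the
    constrained optimum for a one-parameter quadratic.
    """
    num = sum((y - b) * (a - b) for y, a, b in zip(mixed, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference profiles are identical; proportion not identifiable")
    return min(1.0, max(0.0, num / den))

# Hypothetical reference beta values for two cell types at four markers
ref_a = [0.90, 0.10, 0.80, 0.20]
ref_b = [0.20, 0.70, 0.30, 0.60]

# Simulated whole-blood sample with 70% of type A (noise-free toy example)
true_p = 0.7
mixed = [true_p * a + (1 - true_p) * b for a, b in zip(ref_a, ref_b)]

p_hat = estimate_proportion(mixed, ref_a, ref_b)
print(f"estimated proportion of type A: {p_hat:.3f}")
```

With several cell types the same principle leads to a constrained regression over all reference profiles, and the estimated proportions are then entered as covariates in the association model.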
Ageing and DNA methylation
Johansson, PLoS One, 2013
DNA methylation and blood cell proportion are strongly age dependent.
Adjusting for blood cell proportions avoids several false-positive associations.
S. Dalì, Uomo anziano al crepuscolo, 1917-1918
Multiple markers test
R package RPMM (Recursively Partitioned Mixture Model)
- Hierarchical clustering of samples based on DNA methylation similarity
- Testing the association between cluster membership and the outcome of interest
Example: Susceptibility to mental disorders
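Once samples have been assigned to clusters, the second step is an ordinary association test between cluster membership and the outcome. A minimal sketch for the two-cluster, binary-outcome case is a Pearson chi-square test on a 2x2 table. The clustering itself (RPMM) is not reproduced here, and the counts and function name are hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and p-value, 1 df.

    table[i][j] = number of samples in cluster i with outcome j.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, col in enumerate(col_totals):
            expected = r * col / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Invented counts: rows = methylation clusters, columns = (cases, controls)
table = [[30, 10],
         [15, 25]]

stat, p_value = chi_square_2x2(table)
print(f"chi-square = {stat:.3f}, p = {p_value:.5f}")
```

With more than two clusters the same test generalises to a larger contingency table, or cluster membership can be used as a categorical predictor in a regression model.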
P. Hasmann, Rappresentazione di S. Dalì
DNA methylation influences survival in bladder cancer patients
Tajuddin, Br. J. Cancer, 2014
Different DNA methylation profiles -> Different response to therapy.
S. Dalì, La persistenza della memoria, 1931
DNA methylation modulates cardiovascular disease (CVD) risk conferred by intake of B vitamins
Causal mediation analysis can be performed using the R packages mediation and RMediation.
Diagram: Low B vitamins intake -> DNA methylation of One Carbon Metabolism (OCM) genes -> Myocardial Infarction risk (paths labelled DE, TE, ME)
Fiorito, Nutr. Metabolism and CVD, 2013
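The decomposition behind a mediation diagram of this kind can be illustrated with the simplest linear case, reading DE, TE, and ME as direct, total, and mediated effect (an assumption about the slide's labels). The sketch uses the product-of-coefficients approach with ordinary least squares on a continuous outcome, which is a simplification of what the mediation and RMediation packages do and does not handle a binary outcome such as myocardial infarction. All data and names are invented for illustration.

```python
from statistics import mean

def _cross(u, v):
    """Sum of cross-products of deviations from the mean."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def mediation_effects(x, m, y):
    """Linear product-of-coefficients mediation: exposure x, mediator m, outcome y."""
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    det = sxx * smm - sxm ** 2
    a = sxm / sxx                            # mediator ~ exposure
    b = (sxx * smy - sxm * sxy) / det        # outcome ~ mediator, adjusted for exposure
    direct = (smm * sxy - sxm * smy) / det   # outcome ~ exposure, adjusted for mediator
    total = sxy / sxx                        # outcome ~ exposure alone
    return {"TE": total, "DE": direct, "ME": a * b}

# Toy data, invented: exposure, mediator (methylation), continuous outcome
x = [0, 1, 2, 3, 4, 5, 6, 7]
m = [0.2, 0.9, 2.3, 2.7, 4.4, 4.8, 6.1, 7.2]
y = [1.0, 1.4, 3.1, 3.0, 5.2, 5.1, 6.9, 8.3]

effects = mediation_effects(x, m, y)
print({k: round(v, 4) for k, v in effects.items()})
```

In this linear setting the total effect equals the direct effect plus the mediated effect exactly; confidence intervals for the mediated effect are usually obtained by bootstrap or by the distribution-of-the-product method.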
Gene Set Enrichment Analysis (GSEA)
Gene List
Molecular Signatures Database
Pathways Enrichment
The R package GSEA finds biological pathway enrichments based on the hypergeometric distribution test. The online version is easy to use, but analysis parameters cannot be changed.
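The hypergeometric test mentioned above asks how likely it is to see at least the observed overlap between a gene list and a pathway if the list were drawn at random from the gene universe. The sketch below computes that upper-tail probability directly; it is not the GSEA package's interface, and the function name and the example numbers are assumptions made for illustration.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_pathway, n_list, n_overlap):
    """P(overlap >= n_overlap) when n_list genes are drawn without replacement
    from n_universe genes, of which n_pathway belong to the pathway."""
    total = comb(n_universe, n_list)
    upper = min(n_pathway, n_list)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 20,000 genes, a 100-gene pathway,
# a 200-gene list, 8 of which fall in the pathway (about 1 expected by chance)
p = hypergeom_enrichment_p(20_000, 100, 200, 8)
print(f"enrichment p-value: {p:.2e}")
```

Because one such test is run per pathway, the resulting p-values need the same multiple testing correction as the single-marker analysis.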
S. Dalì, L’enigma senza fine, 1938
Gene Set Enrichment Analysis (GSEA)
R package graphite
- Converts pathway topology to a gene network.
- An online version is available (graphite web).
Romualdi, BMC Bioinformatics, 2014
Conclusions
DNA methylation is strongly regulated by external exposures and is associated with several complex diseases.
Several R packages have been developed for processing and analysing genome-wide epigenetic data.
GWAS are a powerful method for identifying novel candidate genes altered in complex diseases.
Acknowledgements
Genomic variation in human population and complex diseases unit. Human Genetics Foundation (HuGeF).
Prof. Giuseppe Matullo
Alessandra Allione
Alessia Russo
Barbara Pardini
Cornelia di Gaetano
Elisabetta Casalone
Fabio Rosa
Federica Modica
Giovanni Fiorito
Simonetta Guarrera