
Cindy G Boer

Date post:15-Jan-2022
Internal Medicine
Erasmus MC
Annotation of genetic loci
Where is your SNP? & What could it do?
1. In coding or in non-coding DNA?
2. In a gene body or in an intergenic region?
3. In a regulatory region? – Promoters, enhancers, inhibitors, insulators, transcription factor binding sites etc.
Linkage Disequilibrium (LD)
• Association between disease trait and (tag) SNP – Arrays are designed on LD structure, not on functional SNPs
• Identification of the causal variant?
• LD structure plotted • SNPs in high LD (r² > 0.8 or r² > 0.6)
Castaño Betancourt et al. (2016), PLOS Genetics
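The r² thresholds above can be made concrete with a small calculation. The following is a minimal sketch, assuming phased haplotypes coded as 0/1 per SNP; the function name `ld_r2` and the example haplotypes are made up for illustration and are not taken from the slides or the cited paper.

```python
def ld_r2(hap_a, hap_b):
    """Return r^2 between two biallelic SNPs.

    hap_a, hap_b: equal-length lists with one 0/1 allele per phased
    haplotype (assumption: phase is known).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and of equal length")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


# Made-up haplotypes: a tag SNP and two neighbouring SNPs.
tag = [1, 1, 1, 0, 0, 0, 0, 0]
perfect_proxy = [1, 1, 1, 0, 0, 0, 0, 0]
weak_proxy = [1, 1, 0, 0, 0, 0, 0, 1]

for name, snp in [("perfect_proxy", perfect_proxy), ("weak_proxy", weak_proxy)]:
    r2 = ld_r2(tag, snp)
    print(name, round(r2, 3), "high LD" if r2 > 0.8 else "not in high LD")
```

With real data, r² is usually taken from a reference panel rather than computed by hand; the sketch only shows what the threshold is applied to.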
Genome-wide association signal (Best case scenario)
Top SNP (+ SNPs in LD, r² > 0.8) is located in the coding sequence of a gene
• Synonymous? Or Non-Synonymous?
• Gene? What is known, what does it do? – Damaging effect of the hit?
(first part of the practical)
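To illustrate the synonymous versus non-synonymous question, here is a minimal sketch that translates the affected codon before and after a single-base change using the standard genetic code. The function name `classify_coding_snp` and the 9-base coding sequence are hypothetical; real annotation also has to handle strand, transcripts and splice sites, which this sketch ignores.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(cds, pos, alt):
    """Classify a single-base substitution in a coding sequence.

    cds: coding sequence on the coding strand, starting in frame.
    pos: 0-based position of the SNP within cds.
    alt: alternative base.
    Returns (label, reference amino acid, alternative amino acid).
    """
    cds = cds.upper()
    start = (pos // 3) * 3                    # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position does not fall in a complete codon")
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    label = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return label, ref_aa, alt_aa


# Made-up coding sequence: ATG GAA GTT (Met, Glu, Val).
toy_cds = "ATGGAAGTT"
print(classify_coding_snp(toy_cds, 5, "G"))   # GAA -> GAG
print(classify_coding_snp(toy_cds, 4, "T"))   # GAA -> GTA
```

A non-synonymous result only says that the amino acid changes; whether the change is damaging is a separate question.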
Genome-wide association signal (Realistic scenario)
Most GWAS findings are located in non-coding regions of the genome [M.T. Maurano et al., Science, 337, 1190 (2012)]
– Introns or intergenic
Regulatory elements
GWAS SNPs are enriched for regulatory elements.
Regulatory regions: promoters, enhancers, inhibitors, insulators, transcription factor binding sites etc.
1. What is a regulatory region/how is a regulatory region defined?
2. How will you know if your hit is located in a regulatory region?
[M.T. Maurano et al., Science, 337, 1190 (2012)]
Gene Expression
• Promoter: region of DNA that initiates transcription of a gene
• Enhancer: short region of DNA that increases/helps initiate the transcription of a gene.
• Inhibitor (silencer): short region of DNA that decreases/inhibits the transcription of a gene.
• The regulation/control of gene expression is essential for cell function, survival, differentiation etc.
Epigenetics = Changes/regulation of gene expression, caused by mechanisms other than DNA sequence variation
Epigenome = All epigenetic modifications on the genetic material of a cell
The Central Dogma
“Same blueprint of DNA in each cell”
How are there different cell types?
gene expression or cellular
DNA structure & Regulation
The Histone Code
Specific proteins involved in gene control recognize and interrogate the patterns of histone modifications, e.g. RNA polymerase II, transcription factors & DNA binding proteins
– Transcription factor recruitment
Inactive Enhancer Active Enhancer
H3K9me2 H3K4me1 [enhancer specific]
Many (100+) different histone modifications are known: very complex!
Regulatory regions: Chromatin States
Roadmap Epigenomics Consortium, et al., Nature 2015
Epigenetics: symphony No. 9
Non-specific binding: polymerases, histones
Specific binding: Transcription factors, nucleases
Specific binding recognition consensus sequence
Change in consensus sequence → change in DNA binding affinity? → change in gene regulation/expression?
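The link between a sequence change and binding affinity can be sketched with a position weight matrix (PWM). The example below is a minimal sketch under stated assumptions: `TOY_PWM` is a made-up 4-base motif (not a real transcription factor matrix from any database), the background base frequency is assumed to be 0.25, and `motif_score` is a hypothetical helper.

```python
import math

# Made-up motif: per-position base probabilities, consensus "AGCT".
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def motif_score(seq, pwm=TOY_PWM, background=0.25):
    """Log-odds score (in bits) of seq against the motif.

    Higher scores mean a closer match to the consensus sequence.
    """
    if len(seq) != len(pwm):
        raise ValueError("sequence and motif must have the same length")
    return sum(
        math.log2(column[base] / background)
        for base, column in zip(seq.upper(), pwm)
    )


ref_site = "AGCT"   # reference allele matches the consensus
alt_site = "AGTT"   # SNP changes the third base, C -> T

print("reference score:", round(motif_score(ref_site), 2))
print("alternative score:", round(motif_score(alt_site), 2))
```

A lower score for the alternative allele suggests weaker predicted binding; it does not by itself show a change in gene expression.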
Consensus sequences
• Found in databases:
CTCF methylation
CTCF binding is affected by methylation in its core sequence
Proper CTCF functioning is essential!
“severe dysregulation of CTCF in cancer cells”
CTCF mouse mutants – embryonic lethal
So Far we have:
• Transcription factor binding sites
GWAS & EWAS goal: identify novel targets/genes involved in phenotype X
So far only annotation, no (potential) causal gene
Gene Regulation
Adapted from: Alberts, Molecular Biology of the Cell 5th Edition, figure 7-44
Typical eukaryotic gene regulation
• Complex 3D looping (CTCF)
• Multiple regulatory regions
• Involvement of multiple transcription factors
• Can be cell type specific
Gene regulation is highly complex!
Gene Regulation
– Sonic Hedgehog, essential developmental gene
Circadian rhythm : Epigenetics
• Mammalian circadian clock
• Oscillation of ~ 24h
• A conserved transcriptional–translational auto-regulatory loop generates molecular oscillations of ‘clock genes’ at the cellular level
PARP1- and CTCF-Mediated Interactions between Active and Repressed Chromatin at the Lamina Promote Oscillating Transcription, Zhao et al., 2015 Molecular Cell
Complex 3D structure
Finding [causal] Genes
• Gene expression levels (RNA-seq)
– Predicted gene activity (e.g. active gene transcription mark: H3K36me3)
• Gene expression – Genotype
Enhancer site (likely) to regulate gene 1 or gene 2?
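One way to relate gene expression to genotype is to regress expression on allele dosage (0, 1 or 2 copies of the alternative allele) for each candidate gene. The sketch below is only an illustration under stated assumptions: the dosages and expression values are made up, `expression_vs_genotype` is a hypothetical helper, and a real analysis would add covariates, many more samples and a significance test.

```python
def expression_vs_genotype(dosages, expression):
    """Least-squares slope and r^2 of expression regressed on allele dosage."""
    n = len(dosages)
    if n < 3 or n != len(expression):
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0:
        raise ValueError("all samples have the same genotype")
    slope = sxy / sxx                          # expression change per allele
    r2 = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, r2


# Made-up data: six samples, SNP dosage and expression of two nearby genes.
dosage = [0, 1, 2, 0, 1, 2]
gene1_expr = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]   # tracks the genotype
gene2_expr = [5.0, 4.0, 5.0, 4.0, 5.0, 4.0]   # unrelated to the genotype

for name, expr in [("gene 1", gene1_expr), ("gene 2", gene2_expr)]:
    slope, r2 = expression_vs_genotype(dosage, expr)
    print(name, "slope:", slope, "r2:", r2)
```

In this toy data only gene 1 follows the genotype, which would make it the more plausible target of the enhancer.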
Cell type selection:
• Selecting the target tissue will not always be easy: – Cell fate – Cell state and cell type – Complex diseases & phenotypes
Availability of material & data
Proxy tissues:
• Same lineage, similar functioning tissue • (gene of interest) expression vs no expression
• Tools & databases to select target tissue • GWAS SNPs are enriched for gene regulatory regions... in the target cell type!
• DNA regulatory elements: promoter, enhancer, inhibitor
• Epigenetics is cell type specific: think about which cell type is relevant to you
Go and Annotate your GWAS hit
Genome-wide association signal
..How to Find?
• Where is your hit (SNP) located?
– Chromosome & position
– Near or in which genes
• Coding variant
– Synonymous/non-synonymous
– Gene function
• Cell type?
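The first question in the list above (is the SNP in a gene, or which gene is nearest?) comes down to an interval lookup. The following is a minimal sketch with a made-up gene table; `Gene`, `locate_snp`, the gene names and all coordinates are hypothetical, and in practice a genome browser does this lookup against a specific genome assembly.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def locate_snp(chrom, pos, genes):
    """Return (location, gene names, distance in bp) for a SNP position.

    location is "gene body" if the SNP overlaps at least one gene,
    otherwise "intergenic" with the nearest gene on the same chromosome.
    """
    same_chrom = [g for g in genes if g.chrom == chrom]
    overlapping = [g.name for g in same_chrom if g.start <= pos <= g.end]
    if overlapping:
        return "gene body", overlapping, 0
    if not same_chrom:
        return "intergenic", [], None

    def distance(gene):
        # Distance from the SNP to the closest edge of the gene.
        return gene.start - pos if pos < gene.start else pos - gene.end

    nearest = min(same_chrom, key=distance)
    return "intergenic", [nearest.name], distance(nearest)


# Made-up annotation table.
toy_genes = [
    Gene("GENE_A", "chr1", 1000, 2000),
    Gene("GENE_B", "chr1", 5000, 6000),
]

print(locate_snp("chr1", 1500, toy_genes))
print(locate_snp("chr1", 4500, toy_genes))
```

Note that the nearest gene is not necessarily the gene regulated by the variant.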
– Structured & Searchable
– Publicly available
– Genomic variation: dbSNP, HapMa ....
– Sequence: NCBI RefSeq database, Entrez Nucleotide, miRBase...
– Proteins: RCSB Protein Data Bank, UniProt, SMART...
– Pathways: KEGG, Reactome, STRING...
– DNA annotation: ENCODE, Roadmap Epigenomics
• Genome Browsers: genomic database, integrating all data associated to genome annotation & function.
• Mining Tools: FUMA & HaploReg
• Genome annotation:
• Links to other specialized Databases
• NCBI, UCSC and EnsEMBL use the same human genome assembly generated by NCBI – Release timing and data availability can differ between sites
• NOTE: the version of the genome assembly – Annotation location and availability will be different between different
• Own preference which to use
• Practical: mainly UCSC and some forays into other databases, including NCBI, EnsEMBL & ENCODE
Mining Tools
– Monday's practical & today's practical
– Novel Tool!
Mining Tools
HaploReg
HaploReg is a tool for exploring annotations of the noncoding genome at variants on haplotype blocks, such as candidate regulatory SNPs at disease-associated loci.
• Mines ENCODE & ROADMAP data. Be careful: it is not always up to date and does not always give clear information!
Your Research
• Be critical
Go and get lost... (and write down where you went)
Your research NEEDS biological databases!
The Practical
• UCSC genome browser links to other databases & data – Ensembl, ENCODE, ROADMAP, HaploReg, FUMA, GTEx...
• 3-part practical
I. Beginner: databases and bioinformatics tools (FUMA, UCSC, HaploReg)
II. Advanced: adding regulatory data and gene expression data
III. More Advanced: Adding 3D chromatin structure to your annotation
Focus on “real life” examples
Use for your own research!
UCSC Genome Browser
Hints for the Practical
• Ask us anything (me, Linda & Joost)
– (related to the practical or genetics)
• DNA is a LARGE, 3D molecule
– So check your surroundings! (i.e. zoom out)
• Can I click on it? YES: more information, more track control!
• GIYF: Google is your friend
• Practical is in 3 parts
– Intro – standard – difficult