Training materials
• Ensembl training materials are protected by a CC BY license
• http://creativecommons.org/licenses/by/4.0/
• If you wish to re-use these materials, please credit Ensembl for their creation
• If you use Ensembl for your work, please cite our papers
• http://www.ensembl.org/info/about/publications.html
EBI is an Outstation of the European Molecular Biology Laboratory.
Annotating your variants:
Ensembl Variant Effect Predictor (VEP)
Helen Sparrow
Ensembl
EMBL-EBI
2nd November 2016
http://tinyurl.com/VEPCrete
Course materials
http://tinyurl.com/VEPCrete
• VEP Presentation
• VEP Coursebook (screenshots of demo)
• VEP Exercises
Ensembl Features
- Gene builds for ~70 species
- Gene trees
- Regulatory build
- Variation display and VEP
- Display of user data
- BioMart (data export)
- Programmatic access via the APIs
- Completely Open Source
What is the VEP?
Determine the effect of variants (SNPs, insertions, deletions, CNVs or structural variants).
Input:
- Variant coordinates
- VCF
- HGVS
- Variant IDs
Output:
- Affected gene, transcript and protein sequence
- Pathogenicity
- Frequency data
- Regulatory consequences
- Splicing consequences
- Literature citations
What is the VEP?
- Web interface: ensembl.org/Tools/VEP
- Perl script
- REST API: rest.ensembl.org (XML)
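As an illustration of the REST route, here is a minimal Python sketch that builds a VEP lookup URL for a variant ID or an HGVS string. It assumes the `/vep/:species/id/:id` and `/vep/:species/hgvs/:hgvs_notation` GET endpoints on rest.ensembl.org; the helper names `build_vep_url` and `fetch_vep` are hypothetical, and `fetch_vep` needs network access, so it is defined here but not called.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

REST_SERVER = "https://rest.ensembl.org"


def build_vep_url(species, kind, value):
    """Build a GET URL for a VEP lookup (hypothetical helper).

    kind is "id" (e.g. an rsID) or "hgvs" (an HGVS string).
    """
    if kind not in ("id", "hgvs"):
        raise ValueError("kind must be 'id' or 'hgvs'")
    # Percent-encode characters such as ':' and '>' found in HGVS notation.
    return f"{REST_SERVER}/vep/{quote(species, safe='')}/{kind}/{quote(value, safe='')}"


def fetch_vep(url):
    """Request JSON from the REST server (requires network access)."""
    request = Request(url, headers={"Content-Type": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


print(build_vep_url("human", "id", "rs41293501"))
print(build_vep_url("human", "hgvs", "5:g.140532T>C"))
```

For large batches the Perl script remains the better fit; a sketch like this suits a handful of lookups from within another program.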
Features of VEP
● Web, Perl script and REST API
● Over 5,000 species
● Input
● Transcript sets
● Regulatory regions
● Known Variants
● Plugins
● Allele frequencies
● Associated phenotype, disease or trait
● Clinical significance states
● Important projects which use VEP
Features of VEP: Input limits
● Web: 50 MB (~2 million variants)
● Script: unlimited variants
Features of VEP: Input formats
● VCF
● rsID
● HGVS
● BED
● Pileup
Features of VEP: Transcript sets
● GENCODE
● GENCODE Basic
● RefSeq
● GENCODE & RefSeq
Features of VEP: Regulatory regions
The Ensembl Regulatory Build:
● ENCODE
● BLUEPRINT
● NIH Epigenomics Roadmap
Can be limited to regulatory regions observed in specific cell types.
Features of VEP: Known Variants
● dbSNP
● COSMIC
● ClinVar
● ESP
● HGMD-Public
● PhenCode
Features of VEP: Plugins
E.g.
● Splicing predictions
● Loss of Function predictions
● Expression levels across transcripts
Anything - customisable!
Features of VEP: Allele frequencies
● 1000 Genomes
● ESP
● ExAC projects
● gnomAD - coming soon!
Features of VEP: Associated phenotype, disease or trait
● OMIM
● Orphanet
● GWAS Catalog
● others
Features of VEP: Clinical significance states
Assigned by ClinVar
Features of VEP: Important projects which use VEP
● 1000 Genomes
● ExAC
● DECIPHER
● OpenTargets
● LRG
● gnomAD
Your own variant data

Variant coordinates
    1 881907 881906 -/C +
    5 140532 140532 T/C +
    12 1017956 1017956 T/A +
    2 946507 946507 G/C +
    14 19584687 19584687 C/T -

HGVS notation
    ENST00000285667.3:c.1047_1048insC
    5:g.140532T>C
    NM_153681.2:c.7C>T
    ENSP00000439902.1:p.Ala2233Asp
    NP_000050.2:p.Ile2285Val

VCF
    #CHROM POS ID REF ALT
    20 14370 rs6054257 G A
    20 17330 . T A
    20 1110696 rs6040355 A G,T
    20 1230237 . T .

Variant IDs
    rs41293501
    COSM327779
    rs146120136
    FANCD1:c.475G>A
    rs373400041
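To make the column layouts concrete, here is a minimal Python sketch that reads one line of the coordinate format (chromosome, start, end, allele string, strand) and one line of the five VCF columns shown on the slide. The names `Variant`, `parse_coordinate_line` and `parse_vcf_line` are hypothetical, and real VCF files carry further columns (QUAL, FILTER, INFO) that this sketch ignores.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    start: int
    end: int
    allele_string: str  # reference/alternate, e.g. "T/C"; "-" marks no bases
    strand: str


def parse_coordinate_line(line):
    """Parse a whitespace-separated coordinate line such as '5 140532 140532 T/C +'."""
    chrom, start, end, alleles, strand = line.split()[:5]
    return Variant(chrom, int(start), int(end), alleles, strand)


def parse_vcf_line(line):
    """Parse the first five VCF columns: CHROM, POS, ID, REF, ALT."""
    chrom, pos, var_id, ref, alt = line.split()[:5]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": None if var_id == "." else var_id,    # '.' means no ID
        "ref": ref,
        "alts": [] if alt == "." else alt.split(","),  # ALT may list several alleles
    }


print(parse_coordinate_line("1 881907 881906 -/C +"))
print(parse_vcf_line("20 1110696 rs6040355 A G,T"))
```

Note that the first coordinate example is an insertion, which is why its start (881907) is one greater than its end (881906).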
Variant types
1) Small scale: changes to one or a few nucleotides of a gene
• Small insertions and deletions (DIPs or indels)
• Single nucleotide polymorphism (SNP)
A G A C T T G A C C T G T C T - A A C T G G A
T G A C T T G A C - T G T C T G A A C G G G A
2) Large scale: changes in chromosomal structure (structural variants)
• Copy number variants (CNV)
• Large deletions/duplications, insertions, translocations
(Figure: deletion, duplication, insertion, translocation)
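The small-scale types can be told apart from a reference/alternate allele string alone. The following Python sketch is an illustrative assumption, not how the VEP itself classifies variants: it treats "-" as "no bases" and handles only two-allele strings. The function name `classify_allele_string` is hypothetical.

```python
def classify_allele_string(allele_string):
    """Classify a two-allele string such as 'T/C', '-/C' or 'A/-'."""
    parts = allele_string.split("/")
    if len(parts) != 2:
        raise ValueError("this sketch handles exactly one ref and one alt allele")
    # '-' stands for an empty allele (nothing present on that side).
    ref, alt = ("" if p == "-" else p for p in parts)
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if not ref and alt:
        return "insertion"
    if ref and not alt:
        return "deletion"
    return "other"  # e.g. multi-base substitutions


for example in ("T/C", "-/C", "A/-"):
    print(example, classify_allele_string(example))
```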
Variant consequences
(Figure: a transcript from the ATG start codon to the poly(A) tail, with variant locations labelled: regulatory, 5' upstream, 5' UTR, coding (missense or synonymous), splice site, intronic, 3' UTR, 3' downstream)
● Identify transcripts that overlap the coordinates of the variants (GENCODE, RefSeq or both)
● Predict the consequences of the variants
Consequence terms: http://www.ensembl.org/info/docs/variation/predicted_data.html
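Missense variants: pathogenicity
SIFT: score from 0 to 1, with a threshold at 0.05 (Deleterious below, Tolerated above)
PolyPhen: score from 0 to 1, with categories Probably damaging, Possibly damaging and Benign (the values 0.2 and 0.1 are marked on the slide's scale)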
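The SIFT threshold can be written as a one-line rule. This Python sketch assumes scores below 0.05 are called deleterious and all others tolerated; how a score of exactly 0.05 is treated is an assumption here, and the function name `sift_prediction` is hypothetical. PolyPhen is left out because its cut-offs are not fully recoverable from the slide.

```python
SIFT_CUTOFF = 0.05  # threshold shown on the slide


def sift_prediction(score):
    """Map a SIFT score in [0, 1] to a qualitative call."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT scores lie between 0 and 1")
    # Assumption: the boundary value itself counts as tolerated.
    return "deleterious" if score < SIFT_CUTOFF else "tolerated"


for score in (0.01, 0.5):
    print(score, sift_prediction(score))
```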
VEP plugins
• Plugins add extra functionality to the VEP
• They may extend, filter or manipulate the output of the VEP
• Plugins may make use of external data or code
• Available on the web tool and with the script
Pathogenicity Prediction Plugins
• dbNSFP - annotation database for missense SNPs
• Condel - consensus deleteriousness from SIFT and PolyPhen
• LoFtool - ranks genes by susceptibility to disease, based on the ratio of Loss of Function to synonymous variants in ExAC data
Hands on
We have identified four variants on human chromosome 9: an A deletion at 128328461, C->A at 128322349, C->G at 128323079 and G->A at 128322917.
We will use the Ensembl VEP to determine:
- Have our variants already been annotated in Ensembl?
- What genes are affected by our variants?
- Do any of our variants affect gene regulation?
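One way to prepare these four variants for the VEP is to write them in the coordinate format (chromosome, start, end, alleles, strand). The Python sketch below assumes all four alleles are given on the forward strand, which the exercise text does not state; the helper `to_coordinate_line` is hypothetical and only covers single-base substitutions and deletions, where start equals end.

```python
# (chromosome, position, reference allele, alternate allele); '-' marks a deleted base
HANDS_ON_VARIANTS = [
    ("9", 128328461, "A", "-"),
    ("9", 128322349, "C", "A"),
    ("9", 128323079, "C", "G"),
    ("9", 128322917, "G", "A"),
]


def to_coordinate_line(chrom, pos, ref, alt, strand="+"):
    """Format a single-base substitution or deletion as a coordinate line."""
    if len(ref) != 1 or ref == "-":
        # Insertions use start = end + 1 and are outside this sketch.
        raise ValueError("this sketch only handles a single reference base")
    return f"{chrom} {pos} {pos} {ref}/{alt} {strand}"


lines = [to_coordinate_line(*variant) for variant in HANDS_ON_VARIANTS]
print("\n".join(lines))
```

The printed lines can be pasted into the input box of the web tool at ensembl.org/Tools/VEP.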
Questions?
Help and documentation
Course online: http://www.ebi.ac.uk/training/online/subjects/11
Tutorials: www.ensembl.org/info/website/tutorials
Videos: www.youtube.com/user/EnsemblHelpdesk
Email us [email protected]
Host a FREE Workshop!
• Invite one of our outreach team to teach at your institution for free (except trainer’s expenses)
• E-mail us: [email protected]
Browser Course
½ - 2 day course on the Ensembl browser, aimed at wet-lab scientists. 1-2 trainers.
API course
2-4 day course on the Ensembl APIs (Perl or REST), aimed at bioinformaticians. 1-4 trainers.
Acknowledgements
The Entire Ensembl Team
Funding
Co-funded by the European Union