publications - European Bioinformatics Institute · 6th Sept Variation data in Ensembl and the...

Date post: 29-May-2020
Page 1: publications - European Bioinformatics Institute · 6th Sept Variation data in Ensembl and the Ensembl VEP Comparing genes and genomes with Ensembl Compara Erin Haskell Astrid Gall

• Ensembl training materials are protected by a CC BY license

http://creativecommons.org/licenses/by/4.0/

• If you wish to re-use these materials, please credit Ensembl for their creation

• If you use Ensembl for your work, please cite our papers

http://www.ensembl.org/info/about/publications.html

Training materials

Page 2

Variation data in Ensembl

Erin Haskell

[email protected]

@ensembl / @ensemblgenomes

Page 3

Questions?

○ We’ve muted all of your microphones

○ Join our Slack workspace and ask questions (link in your registration confirmation email)

○ My Ensembl colleagues will respond during the talk

○ Please use @username to reply to a specific person

Emily Perry Astrid Gall

Page 4

Course exercises

All materials and exercises are located here:

http://www.ebi.ac.uk/training/online/course/ensembl-browser-webinar-series-2016

A link to the exercises and their solutions will appear in the page hierarchy.

This text will be replaced by a YouTube (link to YouKu too) video of the webinar and a pdf of the slides.

The “next page” will be the exercises.

Page 5

Get help with the exercises

• Use the exercise solutions in the online course

• Join our Slack workspace and discuss the exercises with everybody in dedicated channels (register to get sent a link)

• Email us [email protected]

Page 6

This webinar course

Date        Webinar topic                                                          Instructor
4th Sept    Introduction to Ensembl ✔                                              Astrid Gall
            Ensembl genes ✔                                                        Emily Perry
6th Sept    Variation data in Ensembl and the Ensembl VEP                          Erin Haskell
            Comparing genes and genomes with Ensembl Compara                       Astrid Gall
11th Sept   Finding features that regulate genes – the Ensembl Regulatory Build    Emily Perry
            Data export with BioMart                                               Erin Haskell
13th Sept   Uploading your data to Ensembl                                         Astrid Gall
            Introduction to the Ensembl REST APIs                                  Emily Perry

Page 7

Variation data in Ensembl

Erin Haskell

[email protected]

@ensembl / @mycoacia

Page 8

Session structure

Presentation:
Part 1: Ensembl variation data
Part 2: The Ensembl Variant Effect Predictor (VEP)

Demo:
Part 1: Viewing variation data in the browser
Part 2: Using the VEP

Exercises:
Available on the Train online site

Page 9

Session Overview

Ensembl variation data

- What types of variants are in Ensembl?

- Where does the data come from?

- What are the biological consequences of variants?

- Things to watch out for

The Ensembl Variant Effect Predictor (VEP) tool

- What data can I use with the VEP?

- Identifying known variants

- Predicting consequences for novel variants

Page 10

What types of variant are in Ensembl?

ensembl.org/info/genome/variation/index.html

Two broad categories:

1. Sequence variants (small alterations ≤50bp)

2. Structural variants (larger alterations ≥50bp)

Page 11

Variant type 1: Sequence variants

● Single nucleotide polymorphisms (SNP/SNV)

ref   ...TTGACGTA...
alt   ...TTGGCGTA...

● Small insertions & deletions

ref   ...TTGACGTA...
ins   ...TTGAGCGTA...
del   ...TTG-CGTA...
indel ...TTGGCTCGTA...

http://www.ensembl.org/info/genome/variation/prediction/classification.html
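
The sequence variant classes on this slide can be told apart from the reference and alternate alleles alone. The sketch below is illustrative only and is not Ensembl's own code: the function name is hypothetical, alleles are assumed to be reduced to the bases that actually differ, and the "substitution" label for equal-length multi-base changes is an assumption added for completeness.

```python
def classify_sequence_variant(ref: str, alt: str) -> str:
    """Name the class of a small variant from its ref and alt alleles.

    Alleles are plain strings; an empty string or "-" means "no bases".
    """
    ref = ref.replace("-", "").upper()
    alt = alt.replace("-", "").upper()
    if ref == alt:
        return "no change"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"            # one base replaced by another
    if not ref:
        return "insertion"      # bases added, none removed
    if not alt:
        return "deletion"       # bases removed, none added
    if len(ref) == len(alt):
        return "substitution"   # assumption: equal-length multi-base change
    return "indel"              # bases removed and a different number added


# The slide's examples, reduced to the bases that differ from ref
print(classify_sequence_variant("A", "G"))    # SNV
print(classify_sequence_variant("-", "G"))    # insertion
print(classify_sequence_variant("A", "-"))    # deletion
print(classify_sequence_variant("A", "GCT"))  # indel
```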

Page 12

Variant type 2: Structural variants

● Copy number variation (CNV)

[Diagram: Ref, Gain, Loss]

● Inversion - nucleotide sequence inverted at same position

[Diagram: Ref, Invert]

● Translocation - nucleotide sequence moved to a new position

[Diagram: Ref, Translocated: same chromosome, Translocated: different chromosome]

http://www.ensembl.org/info/genome/variation/prediction/classification.html

Page 13

Where does the data come from?

The Ensembl variation process: Variant import → Quality control → Linked data → Ensembl analysis

Page 14

Ensembl variation process: Import

Import variant data from publicly available archives and data repositories.

http://www.ensembl.org/info/genome/variation/species/sources_documentation.html

EVA ...and many many more

Page 15

Data import: 23 species with variation data

http://www.ensembl.org/info/genome/variation/species/species_data_types.html

Page 16

Data import: 27 species with variation data

http://ensemblgenomes.org/info/genomes?variation=1

Division    Number of species with variation data
Bacteria    0
Fungi       8
Metazoa     4
Plants      12
Protists    3

Page 17

Ensembl variation process: QC

● Mapping to reference assembly

○ GRCh37 GRCh38

● Checks on alleles

● Checks for IUPAC ambiguity codes

● Excluding ‘suspect’ variants

http://www.ensembl.org/info/genome/variation/prediction/variant_quality.html#quality_control
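
To make the allele and IUPAC checks concrete, here is a minimal sketch of what such checks could look like. It is not Ensembl's QC pipeline: the function name, the warning texts and the choice to compare against a single reference base are all assumptions made for illustration.

```python
IUPAC_AMBIGUITY_CODES = set("RYSWKMBDHVN")
UNAMBIGUOUS_BASES = set("ACGT")


def check_alleles(allele_string: str, reference_base: str) -> list[str]:
    """Return QC warnings for an allele string such as "A/G".

    reference_base is the base found in the reference assembly at the
    variant position (hypothetical input for this sketch).
    """
    warnings = []
    alleles = [allele.upper() for allele in allele_string.split("/")]
    for allele in alleles:
        bases = set(allele) - {"-"}  # "-" marks a deleted allele
        if bases & IUPAC_AMBIGUITY_CODES:
            warnings.append(f"allele {allele} contains an IUPAC ambiguity code")
        elif not bases <= UNAMBIGUOUS_BASES:
            warnings.append(f"allele {allele} contains an unexpected character")
    if reference_base.upper() not in alleles:
        warnings.append("none of the alleles matches the reference base")
    return warnings


print(check_alleles("A/G", "A"))  # no warnings
print(check_alleles("A/R", "A"))  # ambiguity code flagged
print(check_alleles("C/T", "A"))  # reference mismatch flagged
```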

Page 18

Ensembl variation process: Linked data

Import ‘accessory’ data

● Phenotype/disease

● Allele frequencies

● Publication data

http://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html

Page 19

Linked data: 1000 genomes project

Sequencing 2,500 individuals at 4X coverage

[Map of sampled populations: CEU, CHB, JPT, LWK, MSL, ASW, YRI, TSI, MXL, GIH, PUR, CLM, PEL, ACB, GWD, IBS, GBR, FIN, CHS, KHV, CDX, PJL, BEB, ITU, STU, ESN]

Regions: America, Africa, Europe, East Asia, Central-South Asia

http://www.internationalgenome.org

Page 20

Linked data: gnomAD allele frequencies

The Genome Aggregation Database (gnomAD) provides allele frequency data from 7 different populations

[Figure axis: Sample number]

macarthurlab.org/2017/02/27/the-genome-aggregation-database-gnomad/

Page 21

Ensembl variation process: Analysis

Ensembl predicts:

● Variant consequences

● Protein function prediction

● Linkage disequilibrium data

● Variant conservation across species

http://www.ensembl.org/info/genome/variation/prediction/index.html

Page 22

Analysis: Variant consequence terms

Standardised variant consequence terms as defined by http://www.sequenceontology.org

http://www.ensembl.org/info/genome/variation/prediction/predicted_data.html

Page 24

Analysis: Pathogenicity scores

- For missense variants only

- Two prediction algorithms:

  - SIFT (Sorting Intolerant From Tolerant)

  - PolyPhen (Polymorphism Phenotyping)

Score changes in amino acid sequence based on:

- How conserved the amino acid is

- The chemical change in the amino acid

ensembl.org/info/genome/variation/predicted_data.html#sift

Page 25

Analysis: Pathogenicity scores

[Figure: score scales]

SIFT: scale from 0 to 1 with a threshold at 0.05; regions labelled Deleterious and Tolerated

PolyPhen: scale from 0 to 1 with tick marks at 0.2 and 0.1; regions labelled Probably damaging, Possibly damaging and Benign
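
As a small illustration of how the SIFT scale is read, the sketch below turns a score into a label using the 0.05 threshold shown on the slide. The function name is hypothetical, and treating a score of exactly 0.05 as tolerated is an assumption. PolyPhen is left out because its cut-offs cannot be recovered reliably from the slide.

```python
def sift_prediction(score: float) -> str:
    """Label a SIFT score: low scores mean the change is poorly tolerated."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT scores range from 0 to 1")
    # Assumption: scores strictly below 0.05 are called deleterious
    return "deleterious" if score < 0.05 else "tolerated"


print(sift_prediction(0.01))  # deleterious
print(sift_prediction(0.50))  # tolerated
```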

Page 26

Analysis: Linkage disequilibrium

Linkage Disequilibrium (LD)

“the non-random association of alleles at 2 or more loci within a given population”

or

“how often two variants or specific sequences are inherited together”
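
The definition can be made concrete with two commonly used LD measures for a pair of biallelic loci: D, the difference between the observed haplotype frequency and the frequency expected under independence, and r², which rescales D² by the allele frequencies. The slide does not say which measure it has in mind, so this is a sketch under that assumption, with hypothetical function and parameter names.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # zero when the alleles associate at random
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denominator


# A and B always travel together: complete LD
print(linkage_disequilibrium(0.5, 0.5, 0.5))   # (0.25, 1.0)
# A and B are independent: no LD
print(linkage_disequilibrium(0.25, 0.5, 0.5))  # (0.0, 0.0)
```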

Page 27

Analysis: Linkage disequilibrium

The Linkage Disequilibrium (LD) calculator

Within a genomic region...

For a list of variants...

For a defined area surrounding your variant...

Page 28

Where can I find this data?

● Website www.ensembl.org

● Variant Effect Predictor (VEP)

● BioMart

● Programmatically:

○ Perl API (including VEP)

○ REST API
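
For the programmatic route, a variant record can be requested from the Ensembl REST API by its identifier. The sketch below uses the variation endpoint on rest.ensembl.org; the helper names are hypothetical, and the fields of the returned JSON should be checked against the REST documentation rather than assumed from this example.

```python
import json
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def variation_url(variant_id: str, species: str = "human", pops: bool = False) -> str:
    """Build the URL for the variation lookup endpoint."""
    url = f"{REST_SERVER}/variation/{species}/{variant_id}?content-type=application/json"
    if pops:
        url += ";pops=1"  # also ask for population allele frequencies
    return url


def fetch_variant(variant_id: str, species: str = "human", pops: bool = False) -> dict:
    """Download and decode the JSON record for one variant (needs network access)."""
    with urllib.request.urlopen(variation_url(variant_id, species, pops)) as response:
        return json.load(response)


# Example call, left commented out because it contacts the live server:
# record = fetch_variant("rs4988235")
print(variation_url("rs4988235"))
```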

Page 29

Note: Reference & alternate alleles

[Figure: assembly contigs labelled BL102, AL476, CM553 and IM768; contig AL476 shown with the sequence below]

AGTCGTAGCTAGCAAGGCCATAGGCGA

Frequency A = 0.01, frequency G = 0.99

G is the ancestral allele

A causes disease susceptibility

A is the allele in the contig used

∴ A is the reference allele

∴ G is the alternate allele

∴ Alleles are A/G

Page 30

Note: Reference & alternate alleles

http://www.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=12:120999079-121000079;v=rs1169305;vdb=variation;vf=829489

Page 31

Note: Allele strand

+ve strand: AGTCGTAGCTAGC[T/G]AGGCCATAGGCGA

-ve strand: TCGCCTATGGCCT[A/C]GCTAGCTACGACT

Exon sequence: TATGGCCT[A/C]GCTAGC

Alleles in database = T/G

Alleles in gene = A/C

Alleles = A/C -ve strand or T/G +ve strand

Alleles = A/C or T/G: often lack further info
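
The relationship between the two allele strings is a reverse complement. The sketch below reproduces the slide's example: T/G on the +ve strand becomes A/C on the -ve strand, and the +ve strand sequence (with the T allele) converts to the -ve strand sequence (with the A allele). The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the sequence as read on the opposite strand."""
    return sequence.translate(COMPLEMENT)[::-1]


def flip_alleles(allele_string: str) -> str:
    """Convert an allele string such as "T/G" to the opposite strand."""
    return "/".join(reverse_complement(allele) for allele in allele_string.split("/"))


print(flip_alleles("T/G"))                                # A/C
print(reverse_complement("AGTCGTAGCTAGCTAGGCCATAGGCGA"))  # TCGCCTATGGCCTAGCTAGCTACGACT
```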

Page 32

Demonstration

- Finding variants in a gene of interest, MCM6

- Finding variants at a genomic location of interest

- Finding out more information about a specific variant, rs4988235

Page 33

The Variant Effect Predictor

McLaren et al 2016 europepmc.org/abstract/MED/27268795

Page 34

What does the VEP do?

A tool to predict and annotate the functional consequences of variants

Your variant data

• Affected gene, transcript and protein sequence

• Splicing consequences

• Regulatory consequences

• Known variants:

+ Pathogenicity

+ Frequency data

+ Literature citations

Page 35

What does the VEP do?

Page 36

Variant data input formats

Variant coordinates (Ensembl default)

1 881907 881906 -/C +
5 140532 140532 T/C +
12 1017956 1017956 T/A +
2 946507 946507 G/C +
14 19584687 19584687 C/T -

HGVS notation

ENST00000285667.3:c.1047_1048insC
5:g.140532T>C
NM_153681.2:c.7C>T
ENSP00000439902.1:p.Ala2233Asp
NP_000050.2:p.Ile2285Val

VCF

#CHROM POS ID REF ALT
20 14370 rs6054257 G A
20 17330 . T A
20 1110696 rs6040355 A G,T
20 1230237 . T .

Variant IDs

rs41293501
COSM327779
rs146120136
FANCD1:c.475G>A
rs373400041

http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#input
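
The coordinate and VCF formats describe the same information in different ways, which a small converter can illustrate. This is a sketch and not the VEP's own parser: the function name is hypothetical, each alternate allele is written on its own line (a simplification), and the shared leading base that VCF uses for indels is dropped so that an insertion ends up with start = end + 1, as in the first coordinate example on the slide.

```python
def vcf_to_ensembl_default(line: str) -> list[str]:
    """Convert one VCF data line to Ensembl default format lines.

    Returns one "chrom start end ref/alt strand" line per alternate allele.
    """
    chrom, pos, _variant_id, ref, alts = line.split()[:5]
    converted = []
    for alt in alts.split(","):
        if alt == ".":
            continue  # no alternate allele recorded
        r, a, start = ref, alt, int(pos)
        # VCF indels carry a shared leading base; drop it and shift the start
        if len(r) != len(a) and r[0] == a[0]:
            r, a, start = r[1:], a[1:], start + 1
        end = start + len(r) - 1  # for insertions this gives end = start - 1
        converted.append(f"{chrom} {start} {end} {r or '-'}/{a or '-'} +")
    return converted


print(vcf_to_ensembl_default("20 14370 rs6054257 G A"))    # simple SNV
print(vcf_to_ensembl_default("1 881906 . T TC"))           # insertion of C
print(vcf_to_ensembl_default("20 1110696 rs6040355 A G,T"))  # two alternates
```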

Page 37

VEP features: finding known variants

Are your variants already known?

○ dbSNP
○ COSMIC
○ ClinVar
○ ESP
○ HGMD-Public
○ Phencode

How common are your variant alleles in different populations?

○ 1000 Genomes
○ ESP
○ ExAC projects
○ gnomAD

Phenotype/disease, clinical significance

○ OMIM
○ Orphanet
○ GWAS catalog
○ ClinVar

Page 38

VEP features: consequence prediction

Consequence predictions (choose multiple databases)

○ Ensembl
○ RefSeq
○ Merged
○ GENCODE basic

Does your variant overlap regulatory regions?

○ ENCODE
○ BLUEPRINT
○ NIH Epigenomics Roadmap
○ Can be limited to regulatory regions observed in specific cell types.

Pathogenicity predictions

○ SIFT
○ PolyPhen
○ via plugins: CADD, FATHMM, LRT, MutationTaster, and many more!

Plugin info: http://www.ensembl.info/ecode/category/vep-plugins/

Page 39

VEP features: plugins

Plugin info: http://www.ensembl.info/ecode/category/vep-plugins/

● Plugins add extra functionality to the VEP

● They may extend, filter or manipulate the output of the VEP.

● Plugins may make use of external data or code.

● Available on the web tool and with the script.

Page 40

Use VEP with any species

http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html

● Access through the web browser, REST API or Perl API

● Use prebuilt caches for Ensembl species.

...and for all species in

Page 41

Use VEP with any species

http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html

● Speed up your VEP script with an offline cache.

● Or make your own from GTF and FASTA files - even for genomes not in Ensembl.

Page 42

Using VEP

ensembl.org/info/docs/tools/vep/index.html

Page 43

Demonstration

We have identified four variants on human chromosome nine:

- A deletion at 128328461
- C->A at 128322349
- C->G at 128323079
- G->A at 128322917

We will use the Ensembl VEP to find out:

- Are any of my variants already known?
- What genes are affected by my variants?
- Do any of my variants affect gene regulation?
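
The demonstration uses the VEP web tool, but the same variants could also be submitted through the Ensembl REST API. The sketch below prepares a request body for the VEP region endpoint using VCF-style strings. Only the three substitutions are included, because the slide does not give the deleted bases. The helper names are hypothetical, and the coordinates are assumed to match the assembly served by rest.ensembl.org.

```python
import json
import urllib.request

VEP_REGION_URL = "https://rest.ensembl.org/vep/human/region"

# The three substitutions from the demonstration, as (position, ref, alt)
SUBSTITUTIONS = [
    (128322349, "C", "A"),
    (128323079, "C", "G"),
    (128322917, "G", "A"),
]


def vep_payload(chromosome: str, substitutions) -> dict:
    """Build the JSON body: one VCF-style string per variant."""
    variants = [
        f"{chromosome} {position} . {ref} {alt} . . ."
        for position, ref, alt in substitutions
    ]
    return {"variants": variants}


def run_vep(payload: dict) -> list:
    """POST the variants to the VEP endpoint (needs network access)."""
    request = urllib.request.Request(
        VEP_REGION_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example call, left commented out because it contacts the live server:
# results = run_vep(vep_payload("9", SUBSTITUTIONS))
print(vep_payload("9", SUBSTITUTIONS))
```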

Page 47

This webinar course

Date        Webinar topic                                                          Instructor
4th Sept    Introduction to Ensembl ✔                                              Astrid Gall
            Ensembl genes ✔                                                        Emily Perry
6th Sept    Variation data in Ensembl and the Ensembl VEP ✔                        Erin Haskell
            Comparing genes and genomes with Ensembl Compara                       Astrid Gall
11th Sept   Finding features that regulate genes – the Ensembl Regulatory Build    Emily Perry
            Data export with BioMart                                               Erin Haskell
13th Sept   Uploading your data to Ensembl                                         Astrid Gall
            Introduction to the Ensembl REST APIs                                  Emily Perry

Page 48

Coming up!

Comparing genes and genomes with Ensembl Compara

Ensembl Compara allows you to perform detailed analysis of gene models between species.

During this webinar we take a look at the gene trees and homologues of a set of genes, and at whole genome alignments between pairs and groups of species.

Starting in ∼5 minutes!

Astrid Gall

