Validation of 100,000 Genomes Project results: from WGS to the Genomics England result
Emma Baple Consultant Clinical Geneticist, South West GMC
Clinical Lead for Rare Disease Validation and Feedback
Illumina Laboratory Services
• Illumina Laboratory Services (ILS) was set up to deliver the 100,000 Genomes Project
• Operates from the Chesterford and Hinxton facilities
• Illumina – NHS Genomics Medicine Sequencing Centre, a purpose-built sequencing facility
• The Ogilvie Building, Hinxton Campus
• Illumina Laboratory Services at Illumina Cambridge Limited has been accredited to ISO 15189:2012 (Acc No. 8523) on 6th April
Simplified Workflow
Patient/ family
Phenotypes & Pedigree
DNA
Genome sequence
Annotated VCFs
Tiered variants
Gene Panel Variant filtering
Annotation by partner company
Review
Gene Panels
Clinical assessment
GeCIP(s)
Validation Outcomes
Result Email Alert
Report QC
PanelApp
“For diagnostic purpose, only genes with a known (i.e. published and confirmed) relationship between the aberrant genotype and the pathology, should be included in the analysis.”
(EuroGentest and ESHG guidelines)
PanelApp
• A gene panel is set up on PanelApp for every approved rare disease so that it can be reviewed and genes can be added
• Version 0 gene panel on PanelApp, with genes grouped by evidence:
  • Gene found in 3 or 4 sources
  • Gene found in 2 sources
  • Gene found in 1 source/expert list
• Expert review & resolution
• Version 1 gene panel on PanelApp
• Legend: gene used for variant tiering; gene on reserve list
• Evaluation of reviews and further curation are currently under way
• 50% of panels are now at Version 1
• A big thank you to all our expert reviewers, without whom these panels would not exist
https://bioinfo.extge.co.uk/crowdsourcing/PanelApp/
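The Version 0 grouping by number of supporting sources can be sketched as follows. This is a minimal illustration only: the function names, the gene symbols and the source names are hypothetical, and the band labels simply repeat the wording of the slide rather than anything PanelApp itself exposes.

```python
from typing import Dict, Set


def evidence_band(n_sources: int) -> str:
    """Return the Version 0 evidence band for a gene, by number of sources."""
    if n_sources < 1:
        raise ValueError("a gene needs at least one supporting source")
    if n_sources >= 3:
        return "3 or 4 sources"
    if n_sources == 2:
        return "2 sources"
    return "1 source/expert list"


def band_genes(gene_sources: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map every gene to its band from the set of sources that list it."""
    return {gene: evidence_band(len(srcs)) for gene, srcs in gene_sources.items()}


# Hypothetical input: gene symbols and source names are placeholders.
example = {
    "GENE_A": {"source_1", "source_2", "source_3"},
    "GENE_B": {"source_1", "source_2"},
    "GENE_C": {"expert_list"},
}
bands = band_genes(example)
```

Expert review then decides which genes move onto the list used for variant tiering; that step is a human judgement and is not modelled here.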
Primary Findings
• All variants will be available to GMCs and GeCIPs
• Rare functional variants of potential relevance to the patient’s
condition will be automatically categorised into three tiers
Tier 1 and 2 in the gene panel(s)
Tier 3 outside gene panel(s)
• A small number of plausible candidate variants will be flagged
to aid clinical evaluation
NB - Secondary looked-for findings (if consented) will be returned
separately as part of the main programme, not in the pilot
Criteria and details
MAF: Datasets: ExAC, 1000G, NHLBI, UK10K, GEL internal. Thresholds: <0.1% for dominant models, <1% per variant for recessive models
LOCATION: Within gene on gene panels, selected automatically based on level 4 disease code and supplemented by GEL review team based on phenotype review
GENOTYPE: Het (or hemi) under dominant models, hom or compound het in recessive models
INHERITANCE: Where known: Mendelian rules observed (plus imprinting)
PREDICTED CONSEQUENCE: Most severe consequence predicted across all transcripts using VEP, categorised into high impact and moderate impact:
  High impact (mostly protein truncating) = transcript ablation, splice donor variant, splice acceptor variant, stop gained, frameshift variant, stop lost, start lost
  Moderate impact (mostly protein altering) = inframe insertion, inframe deletion, missense variant, transcript amplification, splice region variant or incomplete terminal codon variant
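The criteria above can be read as a chain of simple filters. The sketch below is an illustration under stated assumptions, not the Genomics England pipeline: the `Variant` fields, the genotype labels, and the choice to compare the highest frequency seen across the datasets against the threshold are all hypothetical. The inheritance (segregation) check is omitted.

```python
from dataclasses import dataclass
from typing import Set

# Thresholds from the slide: <0.1% dominant, <1% recessive.
MAF_THRESHOLD = {"dominant": 0.001, "recessive": 0.01}

# Genotype labels are placeholders chosen for this sketch.
ALLOWED_GENOTYPES = {
    "dominant": {"het", "hemi"},
    "recessive": {"hom", "compound_het"},
}

HIGH_IMPACT = {
    "transcript_ablation", "splice_donor_variant", "splice_acceptor_variant",
    "stop_gained", "frameshift_variant", "stop_lost", "start_lost",
}
MODERATE_IMPACT = {
    "inframe_insertion", "inframe_deletion", "missense_variant",
    "transcript_amplification", "splice_region_variant",
    "incomplete_terminal_codon_variant",
}


@dataclass
class Variant:
    gene: str
    max_maf: float      # assumption: highest frequency across all datasets
    genotype: str       # one of the labels in ALLOWED_GENOTYPES
    consequence: str    # most severe consequence across transcripts


def passes_filters(v: Variant, panel_genes: Set[str], model: str) -> bool:
    """Apply the MAF, location, genotype and consequence criteria."""
    rare = v.max_maf < MAF_THRESHOLD[model]
    in_panel = v.gene in panel_genes
    genotype_ok = v.genotype in ALLOWED_GENOTYPES[model]
    impact_ok = v.consequence in HIGH_IMPACT | MODERATE_IMPACT
    return rare and in_panel and genotype_ok and impact_ok


panel = {"GENE_A"}  # hypothetical panel
v1 = Variant("GENE_A", 0.0005, "het", "missense_variant")
v2 = Variant("GENE_A", 0.005, "het", "missense_variant")
v3 = Variant("GENE_A", 0.005, "hom", "stop_gained")
```

Here `v2` fails under a dominant model because 0.5% exceeds the 0.1% threshold, while `v3` at the same frequency passes under a recessive model.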
Variant Effect Prediction
VEP Impact: Sequence Variant Class
HIGH: Frameshift, splice donor/acceptor, stop codon gained/lost, initiator codon (truncating variants)
MODERATE: Inframe indel, missense variant, splice region, incomplete terminal codon
LOW: Synonymous variant, stop codon retained, mature miRNA variant, 5’/3' UTR, intron variant, nonsense mediated decay (NMD) transcript variant
LOWEST: Intergenic, transcription factor binding site variant
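Picking the most severe impact across several transcripts amounts to a ranked lookup. The sketch below assumes the four impact classes of the table above; the consequence keys are illustrative strings chosen for this example, only a few classes per row are included, and VEP's own finer-grained ordering within an impact class is not reproduced.

```python
from typing import Iterable

# Rank of each impact class as named on the slide (higher is more severe).
IMPACT_RANK = {"HIGH": 3, "MODERATE": 2, "LOW": 1, "LOWEST": 0}

# Illustrative subset of the table; the keys are placeholders for this sketch.
CONSEQUENCE_IMPACT = {
    "frameshift_variant": "HIGH",
    "splice_donor_variant": "HIGH",
    "stop_gained": "HIGH",
    "missense_variant": "MODERATE",
    "inframe_deletion": "MODERATE",
    "splice_region_variant": "MODERATE",
    "synonymous_variant": "LOW",
    "intron_variant": "LOW",
    "intergenic_variant": "LOWEST",
}


def most_severe_impact(consequences: Iterable[str]) -> str:
    """Return the highest impact class among per-transcript consequences."""
    impacts = [CONSEQUENCE_IMPACT[c] for c in consequences]
    if not impacts:
        raise ValueError("no consequences supplied")
    return max(impacts, key=IMPACT_RANK.__getitem__)


# One variant annotated against three transcripts.
per_transcript = ["intron_variant", "missense_variant", "synonymous_variant"]
impact = most_severe_impact(per_transcript)
```

For the three transcripts in the example, the missense annotation wins, so the variant is treated as MODERATE impact.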
Process
• GEL will tier variants automatically for all patients
• Gene panels automatically and manually selected
• Tiered variants sent to annotation providers to be highlighted in decision-support tool
• Each annotation provider may highlight a small number of candidate diagnostic variants based on:
  • In-house algorithms
  • Review by in-house clinical scientists
• GMCs decide which variants to validate and communicate to patients
• If there are no plausible candidates, you can issue an ‘empty’ result
19 August 2015
Primary Findings
AIM: to maximise diagnostic efficiency
Genomic variants will be automatically annotated, filtered and
prioritised based on:
-Frequency i.e. rare
-Location i.e. inside gene panel(s)
-Genotype i.e. allelic state consistent with MOI of gene
-Inheritance i.e. consistent with family history
-Predicted consequence i.e. coding change
Conservative approach that balances sensitivity and specificity.
Note that:
-Some automatically prioritised variants will not be clinically relevant
-Some diagnostic variants will not be automatically prioritised (but
hopefully will be highlighted by company annotations…)
Tiering flowchart (figure), labels as recovered:
Variant
Filters: the variant allele is not commonly found in the general healthy population; allelic state matches known mode of inheritance for the gene and disorder; familial segregation (where applicable)
Most severe predicted consequence of variant? Known pathogenic / Protein truncating / Protein altering / Other consequence
Is the variant in a gene in the Virtual Gene Panel (green list) for that disorder? Yes / No
Outcomes: Tier 1, Tier 2, Tier 3
Tier 1 & 2
TIER 1: Known pathogenic variants, protein truncating variants
and de novo protein altering variants WITHIN the
phenotype assigned gene panel(s)
TIER 2: Protein altering variants WITHIN the phenotype assigned
gene panel(s)
Expect ~0-2 per trio, ~0-20 per singleton
Should be clinically evaluated by GMCs
(NB - May be supplemented by additional diagnostic candidates from
annotation partners)
Tier 3
TIER 3: Protein truncating and protein altering variants OUTSIDE
known disease gene panel(s)
All modes of inheritance considered (dominant, recessive, X-linked,
imprinted)
Expect ~10-20 per trio, ~100-400 per singleton
NOT intended for GMCs to evaluate routinely, aimed more at research!
Most will be irrelevant.
BUT may contain diagnosis if:
•Gene panel is incomplete (need strong evidence of association)
•Appropriate gene panel not applied
•Etc.
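The Tier 1, 2 and 3 definitions above can be expressed as a small decision function. This is a sketch under assumptions: the argument names and the `"truncating"` / `"altering"` / `"other"` labels are invented for the example, and the variant is assumed to have already passed the frequency, genotype and inheritance filters.

```python
from typing import Optional


def assign_tier(in_panel: bool, known_pathogenic: bool,
                effect: str, de_novo: bool = False) -> Optional[int]:
    """Return 1, 2 or 3, or None for an untiered variant.

    effect is "truncating", "altering" or "other" (labels are assumptions).
    """
    if effect not in {"truncating", "altering", "other"}:
        raise ValueError(f"unknown effect: {effect}")
    if in_panel:
        # Tier 1: known pathogenic, truncating, or de novo altering, in panel.
        if known_pathogenic or effect == "truncating":
            return 1
        if effect == "altering":
            return 1 if de_novo else 2
        return None
    # Tier 3: truncating or altering variants outside the panel(s).
    if effect in {"truncating", "altering"}:
        return 3
    return None


tier_truncating_in_panel = assign_tier(True, False, "truncating")
tier_missense_in_panel = assign_tier(True, False, "altering")
tier_de_novo_missense = assign_tier(True, False, "altering", de_novo=True)
tier_outside_panel = assign_tier(False, False, "altering")
tier_untiered = assign_tier(False, False, "other")
```

Anything that returns `None` falls into the untiered set, which is where the millions of remaining variants per person sit.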
Untiered variants
Obviously millions per person!
REALLY NOT intended for GMCs to evaluate routinely! Almost 100%
will be irrelevant.
BUT annotation providers may find candidates here, as may contain
diagnosis if:
•Inappropriately excluded due to MAF cut-off (e.g. founder effect)
•Segregation or MOI for gene in panel incorrect
•Incomplete penetrance
•Etc
Advantages of Tools
•Interactive
•Visualisation of read level support for variants
•Additional annotation including allele frequencies, previous pathogenicity assignments and in silico prediction tools
•Additional variant prioritisation tools, e.g. semantic similarity
•Audit record at variant and case levels
•Database and software version record
•Facilitate MDT work
Simplified Workflow
Patient/ family
Phenotypes & Pedigree
DNA
Genome sequence
Annotated VCFs
Tiered variants
Gene Panel Variant filtering
Annotation by partner company
Review
Gene Panels
Clinical assessment
GeCIP(s)
Validation Outcomes
Reporting Portal
Report QC
Primary Findings
Default view will be: Tier 1 and 2 variants identified in the gene panel(s) applied and any candidate(s) highlighted by the annotation provider
Data Quality
Genetic checks to ensure clinical and genomic data consistency:
• Sex
• Relatedness: proportion of the genome where 0, 1 or 2 alleles are shared between 2 individuals
• Ethnicity
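The relatedness check compares the observed proportions of the genome sharing 0, 1 or 2 alleles with what the reported pedigree predicts. The sketch below uses the standard theoretical expectations for a few relationships; the function names, the set of relationships and the 0.1 tolerance are assumptions made for illustration, not the thresholds used in the programme.

```python
from typing import Optional, Tuple

# Theoretical (share 0, share 1, share 2) proportions for common relationships.
EXPECTED_SHARING = {
    "parent-child": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "unrelated": (1.0, 0.0, 0.0),
    "duplicate or identical twin": (0.0, 0.0, 1.0),
}


def closest_relationship(observed: Tuple[float, float, float],
                         tolerance: float = 0.1) -> Optional[str]:
    """Return the best-matching relationship, or None if nothing is close."""
    def deviation(name: str) -> float:
        return max(abs(e - o) for e, o in zip(EXPECTED_SHARING[name], observed))

    best = min(EXPECTED_SHARING, key=deviation)
    return best if deviation(best) <= tolerance else None


def consistent_with_pedigree(reported: str,
                             observed: Tuple[float, float, float]) -> bool:
    """True when the genomic data support the reported relationship."""
    return closest_relationship(observed) == reported


trio_pair = (0.02, 0.97, 0.01)      # hypothetical mother-proband estimate
sibling_pair = (0.24, 0.51, 0.25)   # hypothetical sibling estimate
odd_pair = (0.60, 0.30, 0.10)       # matches none of the listed relationships
```

A pair such as `odd_pair` returns `None` and would be flagged for manual review rather than forced into the nearest category.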
Result Quality
PATIENT-LEVEL QC:
ID correct; match for age/sex/family structure; recruiting centre correct; phenotypes correct; correct disease assigned; correct panel(s) applied
VARIANT-LEVEL QC (prioritised variants):
Read-level support for the variant
Do allelic state, expected mode of inheritance and segregation match the gene's mode of inheritance and the family history?
Are any detected complex events (e.g. UPD) included in the report?
Validation and Confirmation
Heterozygous de novo GATA6 missense mutation: c.1354A>G, p.Thr452Ala
Exportable csv file to aid variant confirmation
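When confirming a variant, a quick sanity check is that the cDNA and protein descriptions agree. The sketch below works through the GATA6 example: cDNA position 1354 falls on the first base of codon 452, and changing the first base of any threonine codon (ACN) from A to G gives an alanine codon (GCN). The helper names are hypothetical and the regular expressions only handle simple substitutions of this form.

```python
import re
from typing import Tuple

# Standard genetic code entries for the two residues in the example.
THR_CODONS = {"ACT", "ACC", "ACA", "ACG"}
ALA_CODONS = {"GCT", "GCC", "GCA", "GCG"}


def codon_of(cdna_pos: int) -> Tuple[int, int]:
    """Return (codon number, position within codon), both 1-based."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def positions_agree(c_hgvs: str, p_hgvs: str) -> bool:
    """Check that a c. substitution lands in the codon named by the p. change."""
    c = re.fullmatch(r"c\.(\d+)[ACGT]>[ACGT]", c_hgvs)
    p = re.fullmatch(r"p\.[A-Z][a-z]{2}(\d+)[A-Z][a-z]{2}", p_hgvs)
    if c is None or p is None:
        raise ValueError("only simple substitutions are supported")
    return codon_of(int(c.group(1)))[0] == int(p.group(1))


codon, offset = codon_of(1354)
# Every Thr codon becomes an Ala codon when its first base is changed to G.
thr_to_ala = {ref for ref in THR_CODONS if "G" + ref[1:] in ALA_CODONS}
agree = positions_agree("c.1354A>G", "p.Thr452Ala")
```

The same arithmetic can be applied to each row of the exported csv before primers are ordered for orthogonal confirmation.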
Interpretation Outcomes
For all variants evaluated by GMCs, we need to know:
• Was the variant confirmed using an orthogonal technique?
(validated, false positive, not applicable)
• Is the variant (or variant combination) pathogenic?
(assessment using standard 1-5 scale from benign to pathogenic)
• Do variant(s) explain all or part of the patient’s phenotype?
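The three questions above map naturally onto a small record per evaluated variant. This is one possible shape, offered as a sketch: the class and field names are hypothetical, the `NONE` option for phenotype explanation is an added assumption, and the 1-5 scale is stored as a plain integer because the slide does not name the intermediate classes.

```python
from dataclasses import dataclass
from enum import Enum


class Confirmation(Enum):
    VALIDATED = "validated"
    FALSE_POSITIVE = "false positive"
    NOT_APPLICABLE = "not applicable"


class PhenotypeExplained(Enum):
    ALL = "all"
    PART = "part"
    NONE = "none"  # assumption: the slide only lists "all or part"


@dataclass(frozen=True)
class InterpretationOutcome:
    variant: str                      # e.g. an HGVS description
    confirmation: Confirmation
    pathogenicity: int                # 1 (benign) to 5 (pathogenic)
    phenotype_explained: PhenotypeExplained

    def __post_init__(self) -> None:
        if not 1 <= self.pathogenicity <= 5:
            raise ValueError("pathogenicity must be on the 1-5 scale")


outcome = InterpretationOutcome(
    variant="c.1354A>G",
    confirmation=Confirmation.VALIDATED,
    pathogenicity=5,
    phenotype_explained=PhenotypeExplained.ALL,
)
```

Rejecting out-of-range scores at construction keeps invalid assessments out of whatever store collects the outcomes.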
V & F GeCIP
Validation & Feedback GeCIP Working Groups: interested in clinical results
Validation:
• Analytical validity: is the variant correctly identified?
  Are variants correctly filtered in/out by the bioinformatics pipeline? Are reported variants actually present in the DNA sequence?
• Clinical validity: are variants relevant to the patient’s phenotype?
  Are the variant(s) pathogenic? Do the variant(s) explain all or part of the phenotype?
Feedback:
• Clinical utility: does the result alter clinical management?
  Which variant(s) are reported to the patient/family? Has the result changed management of the patient/family?
First GEL data
• First 39 pilot cases – 15 singletons:
  • 11/15 had 0 tier 1&2 variants
  • 3/15 had 2 tier 1&2 variants
  • 1/15 had 3 tier 1&2 variants
• 12/39 had a probable diagnosis
NB. Variety of gene panels
Genomics England Simplified Genomic Data Flow (July 2016)
Diagram labels, as recovered:
GMC; Phenotypes & pedigree; Personal data (2)
Illumina BAMs; Illumina SNVs/indels; Illumina CNVs
GEL Bioinformatics; QC, variant calling, annotation; Gene panels; GEL SNVs/indels (1); GEL tiered variants
Annotation provider; Prioritization algorithms; Candidate variants
GEL QC check
MDT review; Access via virtual desktop to annotation partners’ tool
Validation & Feedback
Footnotes:
1 & GEL CNVs in future
2 Personal identifiable data is stored in a separate database. From Autumn 2016, this will be hosted by NHS Digital (formerly HSCIC)
Summary
• Tiering and quality checks will be done centrally at Genomics England to ensure consistency
• Tier 1&2: small number of plausibly pathogenic variants within gene panel(s) relevant to the patient’s phenotype
• Tier 3: potentially interesting variants outside of gene panel(s) relevant to the patient’s phenotype
• Third party decision support systems where sites can review,
validate and report results
• GMCs should aim to clinically evaluate tier 1&2 variants and
any candidates highlighted by the annotation partner(s) for a
likely diagnosis
• Genomics England export the data to produce a single
consistent knowledge base
• GMCs control diagnostic interpretation and reporting
• GeCIPs will help with interpretation; their key role is improving diagnostic yield through discovery
Sequencing Production at Illumina
Diagram labels, as recovered:
Sample Accession
Sample Quant Quality Control, FFPE check
Pre-PCR lab
Library Prep
Automation + 96 well format
Library Amp (if needed) / Genotyping
Genotype set up
Library Quality Control qPCR
Flowcell prep
cBOT2
X10 Sequence + Run QC
Genome build tumour / normal subtraction
gVCF annotation HiSeq Analysis Software
Analysis Quality Control, Identity Check, Contamination screen
Network Delivery
Delivery check
Clarity X LIMs & AIMS Project configured
Track lab and analysis processes; Project management, Pipeline automation
Run metrics:
Yield per run (Tb): 2.11
Yield per lane (Gb): 132
% Cluster PF: 72
% ≥Q30 R1: 93
% ≥Q30 R2: 83
% Aligned: 95
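As a quick consistency check on the run metrics, the per-run and per-lane yields can be related by simple arithmetic. The sketch below only uses the figures in the table; the lane count is derived from them rather than stated in the source, and the simple average of the two Q30 values assumes both reads contribute equally many bases.

```python
YIELD_PER_RUN_TB = 2.11
YIELD_PER_LANE_GB = 132
Q30_R1_PCT = 93
Q30_R2_PCT = 83

# Number of lanes implied by the two yield figures (1 Tb = 1000 Gb).
implied_lanes = round(YIELD_PER_RUN_TB * 1000 / YIELD_PER_LANE_GB)

# Run yield recomputed from the per-lane figure, for comparison with the table.
recomputed_run_tb = implied_lanes * YIELD_PER_LANE_GB / 1000

# Simple average across reads; assumes read 1 and read 2 have equal length.
mean_q30_pct = (Q30_R1_PCT + Q30_R2_PCT) / 2
```

The two yield figures agree to within rounding if the run comprises sixteen lanes.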