Stephan Züchner, MD
John P. Hussman Institute for Human Genomics
University of Miami Miller School of Medicine
• Part of a patent licensing agreement with Athena Diagnostics.
• Receiving honorarium from Illumina.
Completion of Human Genome Project: 2001 - 2011
• Completion using Sanger sequencing
• Initiation of new seq technologies:
Shot-gun approach
Sequencing by synthesis
• Today, seq industry very competitive,
extremely innovative
• HiSeq2000 produces >300 billion bases per run (9 days, ~$20K);
that is a 100,000-fold improvement in 10 years
• >600Gb by mid 2011
• The rate of technical improvement in the sequencing arena by far
outpaces Moore's Law (2 fold in 1.5 years).
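The comparison with Moore's Law can be checked with a few lines of arithmetic. The sketch below takes the slide's figures at face value (100,000-fold in 10 years; 2-fold every 1.5 years). The function names are illustrative only.

```python
import math

def moore_fold(years, doubling_time=1.5):
    """Fold improvement predicted by a 2-fold gain every `doubling_time` years."""
    return 2 ** (years / doubling_time)

def implied_doubling_time(fold, years):
    """Doubling time (in years) implied by a given fold improvement over `years`."""
    return years / math.log2(fold)

moore_10y = moore_fold(10)                         # roughly 100-fold
seq_doubling = implied_doubling_time(100_000, 10)  # roughly 0.6 years
print(f"Moore's Law over 10 years: ~{moore_10y:.0f}-fold")
print(f"Sequencing doubling time: ~{seq_doubling * 12:.1f} months")
```

Under these assumptions Moore's Law gives about a 100-fold gain per decade, so the quoted 100,000-fold gain is roughly 1,000 times ahead of it.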
Recent numbers …
• Most challenges relate to the analysis of data.
• Study designs.
• Interdisciplinary teams are key (molecular, bioinformatics, clinical,
statistical expertise).
• Ever evolving tool set – much time is occupied by staying up-to-date.
Challenges
03/2010
and Richard A. Gibbs, Ph.D.
Commentary in Nat Rev Neurology, S. Züchner 2010
Individual genome vs exome sequencing
o Whole human genome sequencing is not yet suitable for routine use:
o Cost for sequencing (still ~$10K per genome)
o Cost for data processing and storage
o Cost and time for bioinformatic analysis and follow-up studies
o For many disease-oriented applications in human genetics, partial sequencing
of the human genome is sufficient (linkage peaks, association areas, etc).
Hence, EXOME sequencing is becoming a major (temporary) application.
What (exactly) is the “Exome”?
                                                   Coding exons   Mb coding exonic sequence
CCDS                                               196,266        ~32
Exome enrichment kits (Roche, Agilent, Illumina)   ~200,000       ~38 - 62
o The sum of all coding exons in the human genome.
o The true size is unknown and will continue to change over the next years.
o Exome kits capture ~96 - 98% of CCDS (Consensus Coding Sequence).
6,000 monogenic disorders described
<2,000 disease genes identified
For many disorders, Mendelian genes have provided unique guidance to the underlying pathways.
Immediate modeling in vitro and in vivo possible.
Gene discovery in Mendelian diseases
GWAS have successfully determined the contribution of common variation to disease.
A large gap of “missing heritability” exists for many phenotypes.
Rare variants may play a significant role in common so-called complex disease.
Rare variant discovery in common disease
o % of reads aligning to the human genome reference sequence.
o % of reads on target.
o % of targets covered by a minimum of reads.
o Allelic bias.
General issues with exon capture and NGS
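The first three metrics above are simple ratios. A minimal sketch follows, assuming read counts and per-target depths are already available from an upstream pipeline. All names and numbers are hypothetical. Taking on-target reads as a fraction of aligned reads (rather than of all reads) is an assumption. Allelic bias is not covered here.

```python
def percent(part, whole):
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0

def capture_qc(total_reads, aligned_reads, on_target_reads, target_depths, min_depth):
    """Summarise a capture run.

    target_depths: mean read depth per target (one number per exon/target).
    min_depth: minimum depth for a target to count as covered.
    """
    covered = sum(1 for d in target_depths if d >= min_depth)
    return {
        "pct_aligned": percent(aligned_reads, total_reads),
        "pct_on_target": percent(on_target_reads, aligned_reads),
        "pct_targets_covered": percent(covered, len(target_depths)),
    }

# Made-up numbers, for illustration only.
qc = capture_qc(
    total_reads=1_000_000,
    aligned_reads=950_000,
    on_target_reads=570_000,
    target_depths=[0, 4, 12, 35, 80],
    min_depth=10,
)
print(qc)
```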
(from Hedges et al., 2009; Nimblegen arrays/ 454 seq)
[Figure: read depth in reads per base position; axis ticks at 10, 100, 1000]
• Uniform depth of sequence coverage requires a sequence amount of 100-200 times
the target size
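As a rough worked example, the 100-200-fold rule can be applied to the exome kit target sizes quoted earlier in this deck (~38 - 62 Mb). This is back-of-the-envelope arithmetic only. The function name is illustrative.

```python
def required_sequence_mb(target_mb, fold):
    """Raw sequence (in Mb) needed to sequence a target at `fold` times its size."""
    return target_mb * fold

for target_mb in (38, 62):  # exome kit target sizes in Mb
    low = required_sequence_mb(target_mb, 100)
    high = required_sequence_mb(target_mb, 200)
    print(f"{target_mb} Mb target: {low / 1000:.1f} - {high / 1000:.1f} Gb")
```

On these figures a single exome needs on the order of 4 - 12 Gb of raw sequence.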
bp-wise sequence depth of CMT genes
Uniformity/ evenness of coverage depth
Newer designs of capture kits improve evenness and coverage.
[Figure: plots of coverage depth across exons of 40 CMT genes; panels: Nimblegen V. 1, Nimblegen V. 2]
Coefficient of variation - EZ exome V1 vs. V2, based on 40 neuropathy-related genes.
[Figure: groups V1 Roche, V2 in house, V2 Roche; * (p<0.05)]
Proportion of uncovered bases - EZ exome V1 vs. V2, based on 40 neuropathy-related genes.
[Figure: average proportion of uncovered bases per gene; groups V1 Roche, V2 in house, V2 Roche]
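Both evenness metrics used in these comparisons can be computed from per-base depth values. The sketch below is one possible formulation, not necessarily the one used for the slides. The population standard deviation and the depth threshold for "uncovered" are assumptions. The depth values are invented.

```python
from statistics import mean, pstdev

def coefficient_of_variation(depths):
    """Standard deviation of per-base depth divided by its mean (lower = more even)."""
    m = mean(depths)
    return pstdev(depths) / m if m else float("nan")

def proportion_uncovered(depths, min_depth=1):
    """Fraction of bases whose depth is below `min_depth`."""
    return sum(1 for d in depths if d < min_depth) / len(depths)

# Hypothetical per-base depths for one gene under two capture designs;
# both have the same mean depth (30) but very different evenness.
uneven = [0, 0, 5, 40, 120, 15]
even = [25, 30, 28, 32, 27, 38]

for name, depths in (("uneven", uneven), ("even", even)):
    cv = coefficient_of_variation(depths)
    unc = proportion_uncovered(depths)
    print(f"{name}: CV = {cv:.2f}, uncovered = {unc:.2f}")
```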
(Miller syndrome)
(Bartter syndrome) November 2009
American Journal of Human Genetics, February, 2011
• Retinitis pigmentosa (RP) causes degeneration of photoreceptors:
impaired night vision, then loss of peripheral vision,
then loss of central vision in later life.
• Prevalence is approximately 1 in 3,000 - 4,500 individuals.
• 50 genes are known to cause RP, but …
~ 50% of RP patients have mutations in unknown genes.
Images from the Foundation Fighting Blindness.
• We studied an RP family of Ashkenazi Jewish origin.
• All known RP genes had been excluded.
• Single pedigree with only three affected siblings
- traditionally very difficult to find the underlying novel gene.
                                              Affected    Affected    Affected
                                              Sibling 1   Sibling 2   Sibling 3
Missense, non-sense, splice site variations   8,712       8,716       8,752
Filtered for homozygosity and novelty         11          18          27
Variants detected with exome sequencing
• Across the four individuals we identified 19,307 coding single nucleotide variants.
• No novel indels co-segregated with disease.
Affected Sibling 1+2                                    5
Affected Sibling 1+2+3                                  4
Affected Sibling 1+2+3 + NOT in Unaffected Sibling 4    1 (DHDDS)
All detected changes Sharing within family
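The sharing filter amounts to set operations on each sibling's variant list. The sketch below is a minimal illustration, assuming each set holds that sibling's homozygous, novel calls. The variant identifiers are invented and do not correspond to the study data.

```python
def shared_candidates(affected, unaffected):
    """Variants present in every affected sibling and absent from all unaffected ones.

    affected, unaffected: lists of sets of variant identifiers.
    """
    shared = set.intersection(*affected)
    for variants in unaffected:
        shared -= variants
    return shared

# Hypothetical homozygous, novel variant calls (identifiers are invented).
sib1 = {"varA", "varB", "varC"}
sib2 = {"varA", "varB", "varD"}
sib3 = {"varA", "varB", "varE"}
sib4_unaffected = {"varB", "varF"}

print(shared_candidates([sib1, sib2, sib3], [sib4_unaffected]))  # {'varA'}
```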
                    Chromosomes   Variant    Estimated    Estimated homozygous
                    screened      observed   MAF          frequency
Jewish              1,434         8          0.0056       0.00003136
Non Jewish          13,954        0          < 0.000072   < 5.2E-09
Unknown Ethnicity   11,786        1          0.000085     7.2E-09
Sum                 27,174        9
Detailed results of genotyping of population controls for
the identified variant in DHDDS.
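The slide does not state how the homozygous frequencies were derived. The tabulated values match the square of the (rounded) minor allele frequency, which is what Hardy-Weinberg proportions would give. The sketch below reproduces the numbers under that assumption. The function names are illustrative.

```python
def minor_allele_frequency(observed, chromosomes):
    """Observed variant alleles divided by chromosomes screened."""
    return observed / chromosomes

def expected_homozygous_frequency(maf):
    """Expected homozygote frequency q^2, assuming Hardy-Weinberg proportions."""
    return maf ** 2

# Jewish controls: 8 variant alleles in 1,434 chromosomes.
maf_jewish = round(minor_allele_frequency(8, 1434), 4)  # 0.0056
hom_jewish = expected_homozygous_frequency(maf_jewish)
print(maf_jewish, f"{hom_jewish:.8f}")
```

For the group with zero observations, the table's upper bounds correspond to less than one allele in the chromosomes screened (1/13,954).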
DHDDS (dehydrodolichol diphosphate synthase) links
important pathways in RP
1. Pathway analyses
The mutated amino acid is highly conserved across species
2. Conservation analysis
3D in silico modeling of protein function
• The K42 (+) residue stabilizes the farnesyl-pyrophosphate (FPP) binding
pocket via charge-charge repulsive forces towards R38 (+).
• The mutant E42 (-) will compete for R38 (+) binding.
3. In-silico function
Morpholino knock-down of DHDDS function in zebrafish
[Figure panels: DHDDS deficient; Morpholino control]
Compared to control zebrafish, morpholino knock-down of DHDDS
significantly reduces escape reactions to light changes.
4. Animal modeling
Histopathology of zebrafish eye – rods of photoreceptors are degenerated
[Figure panels: DHDDS deficient; Wild type]
5. Additional genetic support
DHDDS mutation found in 15 out of 123 index patients (12%)
Summary
• We have identified a novel RP gene, DHDDS, highlighting a key
biological pathway.
• Exome sequencing of rare genetically heterogeneous
phenotypes will require complementary functional approaches.
• We have demonstrated that in silico protein studies and
zebrafish modeling are sufficient, fast, and cost-effective
strategies.
Science, November 2011
Team work ...
HIHG
Stephan Züchner, Gary Beecham, Adam Naj, Amjad Farooq, Martin Kohli,
Patrice L. Whitehead, William Hulme, Ioanna Konidari, Juan Young, David
Seo, Susan Blanton, Jeffery M. Vance, and Margaret A Peričak-Vance
Department of Biology
Julia Dallman
BPEI
Byron Lam, Rong Wen, Eduardo Alfonso
Vanderbilt University
Jonathan Haines
Department of Biochemistry
Amjad Farooq
Mt. Sinai Hospital, NYC
Joseph Buxbaum
What can go wrong in targeted or exome sequencing?
Capture/ enrichment:
• Technical issues, sample mix-up
• Relevant variant(s) not covered by capture/ enrichment kit (capture probe design, large
sequence never 100% suitable for hybridization)
• Uniformity/ evenness low
Sequencing:
• Technical issues
• Insufficient sequence amount (low coverage)
• Read length choice, single vs paired-end reads
Analysis:
• Ambiguous and/or multiple alignment of reads (pseudogenes, repetitive sequence, GC)
• Variant calling fails for specific reasons (low coverage or quality)
Annotation:
• Automated mass annotation is essential, but can be erroneous or incomplete (splice
variants, functional synonymous changes, binding sites for regulatory factors, unknown
exons)
Interpretation:
• Wrong assumptions regarding the outcome (statistical model, class of molecular variant)
• Inadequate statistical power
• Human error
What do we usually miss with exome sequencing?
• Copy number variation
• Large indels (>20bp)
• Long repeats (STR)
• Homologous regions
• Unknown exons
• UTR
• Regulatory and intronic changes
Hussman Institute for Human Genomics
• 7 next-generation sequencing instruments, max capacity of 1.5 trillion base
pairs every 9 days (this will roughly double with instrument upgrade early May).
• A single run produces ~4 terabytes of raw data; 1.2 petabytes of disk storage.
• 5,000 node computing cluster.
• Developed fully-automated exome capture on Caliper robot with capacity of 288
exome samples per week.
At HIHG a wide range of diseases are being studied with
targeted and exome sequencing
• Alzheimer disease
• Amyotrophic lateral sclerosis
• Age-related macular degeneration
• Autism
• Club foot
• Charcot-Marie-Tooth disease
• Deafness
• Essential tremor
• Dilated cardiomyopathy
• Hereditary spastic paraplegia
• HIV
• Multiple sclerosis
• Parkinson disease
• Variety of recessive syndromes
• …
HIHG faculty have been actively publishing in the exome field since late 2009
• Hedges D et al. (2009) Exome sequencing of a multigenerational human pedigree. PLoS One.
• Martin ER et al. (2010) SeqEM: an adaptive genotype-calling approach for next-generation sequencing
studies. Bioinformatics.
• Sirmaci A et al. (2010). MASP1 mutations in patients with facial, umbilical, coccygeal, and auditory findings of
Carnevale, Malpuech, OSA, and Michels syndromes. Am J Hum Genet.
• Montenegro G et al. (2011) Exome sequencing allows for rapid gene identification in a Charcot-Marie-Tooth
disease family. Annals of Neurology.
• Norton N et al. (2011) Genome-wide Studies of Copy Number Variation and Exome Sequencing Identify Rare
Variants in BAG3 as a Cause of Dilated Cardiomyopathy. Am J Hum Genet.
• Züchner S et al. (2011) Whole-exome sequencing links a variant in DHDDS to retinitis pigmentosa. American
Journal of Human Genetics.
• Hedges DJ et al. (2011) Comparison of three targeted enrichment strategies on the SOLiD sequencing
platform. PLoS One.
• …
Exome and targeted sequencing is a mature research tool.
Cost-effective: < US $2,000 today; ~$1,000 by end 2011
It allows entry into Human Genomics with all its complications of data analysis
and interpretation.
Summary
Is targeted sequencing here to stay (vs whole genome seq)?
• Probably as long as the economics are attractive.
• And as long as new discoveries are indeed possible.