
Building a clinical genome interpretation services company

Date posted: 11-May-2015
Uploaded by: reece-hart
Description: Talk given to the Berkeley sequencing supergroup in January 2012.
Transcript
Page 1: Building a clinical genome interpretation services company

1/28Reece Hart — Locus Development

Reece Hart, Ph.D.

Locus Development Inc. http://locusdevelopmentinc.com/

Building a clinical genome interpretation services company

Page 2: Opportunity

Page 3: Clinical Genome Interpretation

Photos: Baylor College of Medicine, Univ. Utah, learningradiology.com, sciencephotos.com

Patient presents with symptoms.

If genomic interpretation might influence diagnosis or treatment, the doctor refers the patient to a genetic counselor (GC).

The GC takes a history; the sample is sent to an internal lab or to one of hundreds of labs that provide specific genomic tests.

Sequencing and other lab data are processed into a preliminary interpretation.

The report is returned to the GC and/or physician, who verify the interpretation and consult with the patient.

Page 4: 100s of diagnostic testing labs

Page 5: Common variants are hard to interpret

Page 6: Some variants are informative

Page 7: The Significance of “Variants of Uncertain Significance”

“VUS – Variant of uncertain significance. A variation in a genetic sequence whose association with disease risk is unknown. Also called variant of uncertain significance, variant of unknown significance, and unclassified variant.”

Source: http://www.cancer.gov/cancertopics/genetics-terms-alphalist

Page 8: The long tail of rare diseases

“A rare disease typically affects a patient population estimated at fewer than 200,000 in the U.S. There are more than 6,000 rare diseases known today and they affect an estimated 25 million persons in the U.S.”

NIH Office of Rare Diseases Research, http://rarediseases.info.nih.gov/
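As a rough, back-of-the-envelope reading of the figures quoted above (assuming, unrealistically, that patients are spread evenly across diseases, and treating "more than 6,000" as exactly 6,000), the average rare disease affects only a few thousand people in the U.S., about 2% of the 200,000 threshold. Because there are more than 6,000 diseases, the true average is lower still:

```python
# Figures quoted from the NIH Office of Rare Diseases Research.
RARE_DISEASE_THRESHOLD = 200_000  # U.S. patients; a disease is "rare" below this
KNOWN_RARE_DISEASES = 6_000       # "more than 6,000", treated here as exactly 6,000
TOTAL_AFFECTED = 25_000_000       # "an estimated 25 million persons"

# Naive mean that assumes an even spread of patients across diseases;
# it conveys scale only, not the actual prevalence distribution.
mean_patients_per_disease = TOTAL_AFFECTED / KNOWN_RARE_DISEASES
share_of_threshold = mean_patients_per_disease / RARE_DISEASE_THRESHOLD

print(f"{mean_patients_per_disease:,.0f} patients per disease on average")  # prints: 4,167 patients per disease on average
print(f"{share_of_threshold:.1%} of the rare-disease threshold")            # prints: 2.1% of the rare-disease threshold
```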

Page 9: The Problems to Solve

➢ Develop a reliable database of genotypes and phenotypes.

➢ Develop methods to interpret all types of variants, not just common SNVs.

➢ Provide meaningful, reliable interpretations based on genomic data.

➢ Do it better than everyone else.

Page 10: Plan

Page 11: Company Overview

[Diagram: Genomic Sequence and Variants → Locus → Clinical Interpretation]

Page 12: Curating Genotypes, Phenotypes, and Risk

Genotypes/Variants (dbSNP, LSDBs, PharmGKB, …)

Phenotypes/Conditions (GO, OMIM, ICD-9/10, …)

Genotype-Phenotype Database

Risk Models
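The slide does not describe the genotype-phenotype database further. As a minimal sketch only, assuming a hypothetical relational layout (the table and column names below are invented, not Locus's schema), curated variant-condition links with their evidence could be stored and queried with Python's built-in sqlite3 module. The example rows are made up for illustration, and risk models are not represented:

```python
import sqlite3

# Hypothetical schema: variants and phenotypes from different sources,
# linked by curated associations that carry a significance and a citation.
SCHEMA = """
CREATE TABLE variant (
    id     INTEGER PRIMARY KEY,
    hgvs   TEXT NOT NULL UNIQUE,  -- variant description, e.g. genomic HGVS
    source TEXT NOT NULL          -- e.g. 'dbSNP', 'LSDB', 'PharmGKB'
);
CREATE TABLE phenotype (
    id   INTEGER PRIMARY KEY,
    code TEXT NOT NULL UNIQUE,    -- e.g. an OMIM or ICD-10 code
    name TEXT NOT NULL
);
CREATE TABLE association (
    variant_id   INTEGER NOT NULL REFERENCES variant(id),
    phenotype_id INTEGER NOT NULL REFERENCES phenotype(id),
    significance TEXT NOT NULL,   -- e.g. 'pathogenic', 'VUS', 'benign'
    citation     TEXT,            -- literature or database reference
    PRIMARY KEY (variant_id, phenotype_id)
);
"""


def phenotypes_for_variant(conn: sqlite3.Connection, hgvs: str) -> list:
    """Return (phenotype code, significance) pairs curated for one variant."""
    cur = conn.execute(
        """
        SELECT p.code, a.significance
        FROM association AS a
        JOIN variant AS v ON v.id = a.variant_id
        JOIN phenotype AS p ON p.id = a.phenotype_id
        WHERE v.hgvs = ?
        ORDER BY p.code
        """,
        (hgvs,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up example rows; the HGVS string reuses the placeholder from page 21.
conn.execute("INSERT INTO variant VALUES (1, 'NC_000006.11:g.31030103C>T', 'LSDB')")
conn.execute("INSERT INTO phenotype VALUES (1, 'EXAMPLE-1', 'Example condition')")
conn.execute("INSERT INTO association VALUES (1, 1, 'VUS', NULL)")

result = phenotypes_for_variant(conn, "NC_000006.11:g.31030103C>T")
print(result)  # [('EXAMPLE-1', 'VUS')]
```

A join table keeps many-to-many links explicit: one variant can be curated against several conditions, each with its own significance and citation.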

Page 13: Locus Overview

[Diagram: hospitals/clinics, physicians, insurers; sequences → variants/attributes → condition predictions → interpretation; workflow and tracking]

Page 14: Implementation

Page 15: Curation Content

➢ Many sources
  ● automated and manual tools
  ● databases and literature

➢ Most kinds of variants
  ● SNV, del, ins, delins, repeat, conv, CNV, haplotypes

➢ Many kinds of conditions
  ● inherited, spontaneous, dominant, recessive, X-linked, preventative, cancer, metabolic, pharmacogenomic, cardio

➢ Examples:
  ● Cystic Fibrosis (w/ modifiers)
  ● CMT (~21 subclasses)
  ● Long and Short QT
  ● TPMT, warfarin, CYP2D6
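For the simplest variant kinds in the list (SNV, del, ins, delins), the type can be derived from a VCF-style REF/ALT allele pair. The following is a minimal sketch under that assumption; repeats, conversions, CNVs, and haplotypes need more context (flanking sequence, multiple loci, phasing) and are not handled. The helper name `classify_variant` is hypothetical:

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a simple REF/ALT pair as 'SNV', 'del', 'ins', or 'delins'.

    Shared trailing and leading bases are trimmed first, so the padding base
    that VCF requires for indels does not affect the result.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("REF and ALT are identical; not a variant")
    # Trim a common suffix, then a common prefix.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if not alt:
        return "del"     # bases removed, nothing added
    if not ref:
        return "ins"     # bases added, nothing removed
    return "delins"      # bases replaced by different bases (incl. multi-base substitutions)


for pair in [("C", "T"), ("AT", "A"), ("A", "AT"), ("ATG", "AC")]:
    print(pair, classify_variant(*pair))
```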

Page 16: The pipeline

[Diagram within an execution framework: reads (fastq) → variant calling → calls (vcf) → attributes (xml) → interpretation → report (xml); other labels: selection, req'n and sample info, cond'n var. risk models, curation]

@G88NFDU01AI6Z3 rank=0000170 x=101.0 y=1953.5 length=56
AGTGTAGTAGTGAGAAAAACTTTGTGGGGATATGGATACAATTATTTACCCAAATC
+
IIIIIIIIIIIGC>////-....826666<EIIIIIIIIIIIHI6644/..222==
@G88NFDU01AKOQI rank=0000178 x=118.0 y=1960.0 length=59
agtgtagtagtaaggaagattgagtgcctgaccttCCGGGTGGCGGTAGCGTTGGCCCC
+
BHBEEIIIIIEEEEEBGBECCCDEIIIIIIIIIIIEEICC===988ED>?>>>88...-
@G88NFDU01AL6H7 rank=0000323 x=135.0 y=2013.5 length=95
agtgtagtagtgtgagctggtgaagaaggtctccGATGTCATATGGAACAGCCTCAGCCGCTCCTACTTCAAGGATCGGGCCCACATCCAGTCCC
+
=>BBBBB==;;B>454@EA@>>===>>BBIIE@ACIGIEIFFDD66665@@:::>AA777A<;;>A>?>>4433;>>;660000.9=85533,,,
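The reads above follow the four-line FASTQ layout (header, sequence, `+`, quality). A minimal reader could look like the following, assuming unwrapped records and Phred+33 quality encoding (an assumption; some older 454 and Illumina exports use other encodings). The `read_fastq` and `mean_phred` helpers are hypothetical, and the example record is made up because the slide's excerpts may be truncated:

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # The read name is the first whitespace-delimited token after '@'.
        yield FastqRecord(header[1:].split()[0], sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string, assuming ASCII offset 33."""
    return sum(ord(c) - offset for c in quality) / len(quality)


example = io.StringIO("@read1 length=4\nACGT\n+\nII5+\n")
records = list(read_fastq(example))
print(records[0].name, mean_phred(records[0].quality))  # read1 27.5
```

The header is checked rather than trusted because a quality string may itself begin with `@`, which is why record boundaries are taken from line positions, not from the `@` character.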

##fileformat=VCFv4.1
…
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##VariantFiltration="analysis_type=VariantFiltration input_file=[] sample_metadata=[] read_buffer_size=null phone_home=NO_ET read_filter=[] intervals=null excl
##contig=<ID=GL000240.1,length=41933,assembly=b37>
##reference=file:///locus/data/references/genomes/human_g1k_v37/sequences/human_g1k_v37.fasta
##source=SelectVariants
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT LS99
1 145414740 . A N . PASS AC=0;AF=0.00;AN=2;DP=137;MQ=213.16;MQ0=0 GT:DP:GQ:PL:AD 0/0:137:99:0,412,5401:137,0
1 145414741 . G N . PASS AC=0;AF=0.00;AN=2;DP=138;MQ=213.16;MQ0=0 GT:DP:GQ:PL:AD 0/0:138:99:0,415,5406:138,0
1 145414742 . C N . PASS AC=0;AF=0.00;AN=2;DP=139;MQ=212.39;MQ0=0 GT:DP:GQ:PL:AD 0/0:139:99:0,418,5289:139,0
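The excerpt above is VCF 4.1 (tab-separated in a real file; the tabs were lost in extraction). A minimal sketch of splitting one data line into its fixed columns, INFO key/value pairs, and per-sample FORMAT values follows. The function name is hypothetical, values are deliberately left as strings, and production code would more likely use an established parser such as pysam than hand-rolled splitting:

```python
from __future__ import annotations


def parse_vcf_line(line: str, sample_names: list[str]) -> dict:
    """Split one tab-delimited VCF data line into fixed fields, INFO, and per-sample values.

    Values are left as strings; a full parser would type them using the ## header.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]

    info_dict: dict[str, str | bool] = {}
    for entry in info.split(";"):
        if entry == ".":
            continue
        key, sep, value = entry.partition("=")
        info_dict[key] = value if sep else True  # flag fields have no "=value"

    samples: dict[str, dict[str, str]] = {}
    if len(fields) > 9:
        format_keys = fields[8].split(":")
        for name, raw in zip(sample_names, fields[9:]):
            samples[name] = dict(zip(format_keys, raw.split(":")))

    return {
        "CHROM": chrom,
        "POS": int(pos),
        "ID": var_id,
        "REF": ref,
        "ALT": alt.split(","),
        "QUAL": None if qual == "." else float(qual),
        "FILTER": filt,
        "INFO": info_dict,
        "SAMPLES": samples,
    }


line = ("1\t145414740\t.\tA\tN\t.\tPASS\tAC=0;AF=0.00;AN=2;DP=137;MQ=213.16;MQ0=0"
        "\tGT:DP:GQ:PL:AD\t0/0:137:99:0,412,5401:137,0")
record = parse_vcf_line(line, ["LS99"])
print(record["POS"], record["INFO"]["DP"], record["SAMPLES"]["LS99"]["GT"])  # 145414740 137 0/0
```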

<?xml version="1.0" encoding="UTF-8"?>
<sample-attributes>
  <sample-info id="LS125" gender="Unknown" … >
  <reference>
    <organism>homo sapiens</organism>
    <build-id>human_g1k_v37</build-id>
  </reference>
  <loci>
    <locus chr="3" start="10183685" end="10183685" sequence="G" read-coverage="0">
      <alt read-coverage="" quality-score="" sequence="" locus-cvid-code="CVID1003741" locus-cvid="A|G" locus-cvid-start="10183685"/>
    </locus>

<locus-report format="1.0">
  <requisition>
    <!-- I'm not giving this section too much thought. Good enough for now
         Can update later, when we commercialize -->
    <client id="uuid"></client>
    <patient name="LS99" ethnicity="" gender="Male" dob="" id="LS99"></patient>
    …
    <conditions>
      <condition code="VonHL">
        <associated-conditions></associated-conditions>
      </condition>
    </conditions>
  </requisition>
  <coverage>
    <sequence minimum-depth="100" sensitivity="98.9" specificity="99.0">
      <region genome-build="GRCh37" chrom="3" end="440" start="400"></region>
      <region genome-build="GRCh37" chrom="3" end="700" start="600"></region>

variants_and_refagree.vcf filtered_on_callable.vcf: lake.mk reads.fastq
	(set -e; \
	source /locus/opt/lake/bin/lakeSetupEnv; \
	$(MAKE) -f $< $@; \
	) 2>[email protected]

calls.vcf: variants_and_refagree.vcf #filtered_on_callable.vcf
	ln -s $< $@

attr.xml: calls.vcf req.xml sample.xml ${ATTR_FILE}
	generate_attributes_file.py …

report.xml: attr.xml req.xml
	sampleconditionreport $^ -o $@

report.html: report.xml
	reportrenderer $< -o $@
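The Makefile rules above form a dependency graph, and make runs each recipe only after its prerequisites are up to date. As a sketch of just the ordering (make's timestamp-based rebuild checks are not modeled), the same graph can be topologically sorted with Python's standard `graphlib` module (Python 3.9+). The `PIPELINE` mapping is transcribed by hand from the rules and omits the second target `filtered_on_callable.vcf` and `${ATTR_FILE}`:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Target -> prerequisites, transcribed from the Makefile rules
# (filtered_on_callable.vcf and ${ATTR_FILE} omitted for brevity).
PIPELINE = {
    "variants_and_refagree.vcf": {"lake.mk", "reads.fastq"},
    "calls.vcf": {"variants_and_refagree.vcf"},
    "attr.xml": {"calls.vcf", "req.xml", "sample.xml"},
    "report.xml": {"attr.xml", "req.xml"},
    "report.html": {"report.xml"},
}


def build_order(graph: dict) -> list:
    """Return one order in which every file comes after its prerequisites."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(PIPELINE)
print(order)  # input files first, report.html last; ties may appear in any order
```

Like make, the sorter only guarantees a valid order, not a unique one: independent inputs such as `req.xml` and `reads.fastq` can appear in either order, which is also what lets make run them in parallel with `-j`.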

LIMS

<?xml version="1.0"?>
<requisition>
  <conditions>
    <condition>VonHL</condition>
  </conditions>
</requisition>

<?xml version="1.0" encoding="UTF-8"?>
<samples>
  <sample-info id="LS99" gender="Male" birth-date="/Date(1320120000000-0700)/" type="GenomicDNA" status="New" ordering-clinician="JMajor" nanodrop-concentration="300" original-barcode="NA06994" use-type="RD" origin="Coriell" code="NA06994" concentration="300" description="" accession-date="" accession-user=""/>
</samples>
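The LIMS hands the pipeline two small XML inputs: the requisition (requested conditions) and the sample description. As a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the element and attribute names stay as in the excerpts above (inferred only from those excerpts, with attributes trimmed for brevity), they can be read as follows; the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the req.xml and sample.xml excerpts
# (XML declarations dropped; only some sample attributes kept).
REQ_XML = """
<requisition>
  <conditions>
    <condition>VonHL</condition>
  </conditions>
</requisition>
"""

SAMPLE_XML = """
<samples>
  <sample-info id="LS99" gender="Male" type="GenomicDNA" status="New"
               origin="Coriell" code="NA06994"/>
</samples>
"""


def requested_conditions(req_xml: str) -> list:
    """Return the condition codes listed in a requisition document."""
    root = ET.fromstring(req_xml.strip())
    return [c.text.strip() for c in root.iter("condition") if c.text]


def sample_info(samples_xml: str) -> dict:
    """Map each sample id to a dict of its attributes."""
    root = ET.fromstring(samples_xml.strip())
    return {s.get("id"): dict(s.attrib) for s in root.iter("sample-info")}


print(requested_conditions(REQ_XML))              # ['VonHL']
print(sample_info(SAMPLE_XML)["LS99"]["gender"])  # Male
```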

Page 17: The pipeline in action

$ ls reads.fastq.gz req.xml sample.* Makefile
Makefile  reads.fastq.gz  req.xml  sample.info  sample.xml

$ time make report.html report.pdf
gzip -cdq <reads.fastq.gz >reads.fastq
lake --recipe reads_to_variants >lake.mk
ln -s variants_and_refagree.vcf calls.vcf
generate_attributes_file.py ...
sampleconditionreport attr.xml req.xml -o report.xml
reportrenderer report.xml -o report.html
wkhtmltopdf report.html report.pdf

real 7m14.804s
user 7m16.490s
sys 2m0.150s
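The `time` output above shows more CPU time (user plus sys) than wall-clock time, which suggests that some steps used more than one core at once. A small sketch of the arithmetic, assuming bash's `XmY.YYYs` time format (the helper `to_seconds` is hypothetical):

```python
import re


def to_seconds(t: str) -> float:
    """Convert a bash `time` value such as '7m14.804s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unrecognized time format: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


real = to_seconds("7m14.804s")
cpu = to_seconds("7m16.490s") + to_seconds("2m0.150s")  # user + sys
print(f"real {real:.3f}s, cpu {cpu:.3f}s, cpu/real {cpu / real:.2f}")
# prints: real 434.804s, cpu 556.640s, cpu/real 1.28
```

A ratio of about 1.28 means that, on average, somewhat more than one core was busy during the run.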

Page 18: Locus Interpretation

Page 19: The big lesson…

Transcripts are much messier than expected.

Page 20: Problem statement

There is no single source of transcripts that is all of: stable (archived), mapped, in agreement with the reference genome, and annotated with RefSeq accessions.

➢ Issues:
  ● Poor access / programmability
  ● No archived mappings
  ● RefSeq != reference genome due to origin, ambiguity, error
  ● Patches are difficult to use

Page 21: When RefSeq != Genome Reference

Variant published relative to RefSeq:
NM_0123.4:c.45C>T
NC_000006.11:g.31030103C>T

Discovered variant reported relative to RefSeq:
NM_0123.4:c.832T>G
NC_000006.11:g.31038124T>G

[Diagram: the RefSeq transcript and the genome reference differ by a mismatch and an ins/del (A vs. -); downstream coordinates are shifted]
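To make the coordinate shift concrete, here is a minimal sketch that assumes a transcript-to-genome alignment represented as hypothetical ungapped blocks `(transcript_start, genome_start, length)` on the forward strand, sorted by transcript start. Positions are counted from the first transcript base, not from the CDS start as HGVS c. numbering does. When the genome carries one base that the transcript lacks, every downstream position maps one base further than a naive constant offset would predict:

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# One ungapped alignment block: (transcript_start, genome_start, length),
# 1-based, forward strand, sorted by transcript_start.
# Exons and indel-separated segments are both represented as blocks.
Block = Tuple[int, int, int]


def tx_to_genome(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 1-based transcript position to the genome, or None if it is unaligned."""
    starts = [tx_start for tx_start, _, _ in blocks]
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    tx_start, g_start, length = blocks[i]
    if pos >= tx_start + length:
        return None  # transcript bases absent from the genome (transcript insertion)
    return g_start + (pos - tx_start)


# Hypothetical alignment: the genome has one extra base (position 1101)
# after transcript position 100, which the transcript lacks.
blocks = [(1, 1001, 100), (101, 1102, 400)]

print(tx_to_genome(50, blocks))   # 1050 (same as a naive +1000 offset)
print(tx_to_genome(150, blocks))  # 1151 (a naive offset would give 1150)
```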

Page 22: 17.8% of RefSeq transcripts differ from GRCh37

Garla, V., Kong, Y., Szpakowski, S., & Krauthammer, M. (2011). MU2A--reconciling the genome and transcriptome to determine the effects of base substitutions. Bioinformatics (Oxford, England), 27(3), 416-8. doi:10.1093/bioinformatics/btq658

5.4% have coordinate-changing differences

Page 23: Sources of transcript information

➢ NCBI:
  ● maps current transcripts to the current genome only
  ● aligns with Splign
  ● disagrees with the reference genome for ~18% of transcripts
  ● no local database option

➢ UCSC:
  ● current transcripts only
  ● maps using BLAT

➢ Ensembl:
  ● aligns using its in-house gene build process
  ● cross-linked to RefSeqs
  ● incorporates NCBI transcripts ad hoc
  ● well-maintained; good API; broad data; VEP

Page 24: PTEN: insertion/deletion in 5' UTR

Page 25: NEFL: genome insertion leads to frameshift/stop

Page 26: RefSeq Handling

---------- Forwarded message ----------
Date: Wed, Jan 25, 2012 at 1:59 PM
Subject: [Genome] How does UCSC hg19 gene model add exons to RefSeqs?
To: [email protected]

Hi, when using the human reference hg19 gene model… where the hg19 model has an exon that does not exist in the RefSeq accession (or any historical version of the RefSeq accession).

How/why does the alignment introduce an intron in this case? Does it ensure there are plausible flanking splice junctions before inserting an intron to a RefSeq sequence that lacks it but it maps to?
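The question turns on what counts as a "plausible" splice junction. One simple, commonly used criterion is that an intron on the forward strand starts with GT and ends with AG; it is offered here as an illustrative assumption, not as what UCSC's alignment actually does. Minor GC-AG and AT-AC introns and minus-strand genes (where the forward-strand pattern is CT...AC) are not handled by this sketch, and the sequence is a made-up toy example:

```python
def has_canonical_splice_sites(genome: str, intron_start: int, intron_end: int) -> bool:
    """Return True if genome[intron_start:intron_end] begins with GT and ends with AG.

    Coordinates are 0-based and half-open; forward strand only.
    """
    intron = genome[intron_start:intron_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Made-up toy sequence: exon, candidate intron, exon.
exon1, intron, exon2 = "CCCAG", "GTAAGTTTTTTTTCAG", "GCC"
genome = exon1 + intron + exon2
start, end = len(exon1), len(exon1) + len(intron)

print(has_canonical_splice_sites(genome, start, end))      # True
print(has_canonical_splice_sites(genome, start + 1, end))  # False
```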

Page 27

338 genes so far

➢ We should encourage LRG (Locus Reference Genomic) and adopt it when ready (and we'll still have to deal with legacy transcripts).

Page 28

Not pictured: Jon Sorenson

