HVP Critical Assessment of Genome Interpretation

Post on 11-May-2015

description

Note: CAGI occurred in Dec 2010, after I left Berkeley. Susanna Repo made the event happen and it would not have occurred without her.

transcript

CAGI (\ˈkā-jē\)
Critical Assessment of Genome Interpretation
A community experiment to evaluate phenotype prediction

Reece Hart (with Steven Brenner and John Moult)
QB3 / Center for Computational Biology
UC Berkeley
reece@berkeley.edu

Human Variome Project Meeting
Paris, 2010-05-12

ca·gey \ˈkā-jē\ adjective
1: hesitant about committing oneself;
2a: wary of being trapped or deceived;
2b: marked by cleverness

2

The Significance of “Variants of Uncertain Significance”

“VUS – Variant of uncertain significance. A variation in a genetic sequence whose association with disease risk is unknown. Also called variant of uncertain significance, variant of unknown significance, and unclassified variant.”
http://www.cancer.gov/cancertopics/genetics-terms-alphalist

3

The long tail of rare diseases.

“A rare disease typically affects a patient population estimated at fewer than 200,000 in the U.S. There are more than 6,000 rare diseases known today and they affect an estimated 25 million persons in the U.S.”

NIH Office of Rare Diseases Research
http://rarediseases.info.nih.gov/

4

Interpretation of Unclassified Variants
a sampling of responses from genetic counselors

➢ Routinely used
● dbSNP
● OMIM
● GeneReviews
● PolyPhen
● SIFT
● PubMed
● Mailing lists

➢ Selectively used
● PharmGKB
● LSDBs
● Domain prediction
● Structure impact analysis
● Homology

5

Genome Variant Impact Prediction Tools
an incomplete list

Program              URL
Align-GVGD           http://agvgd.iarc.fr/
AutoMute             http://proteins.gmu.edu/automute/
CUPSAT               http://cupsat.tu-bs.de/
Dmutant              http://sparks.informatics.iupui.edu/hzhou/mutation.html
nsSNPAnalyzer        http://snpanalyzer.uthsc.edu/
PantherPSEC          http://www.pantherdb.org/tools/csnpScoreForm.jsp
PhD-SNP              http://gpcr.biocomp.unibo.it/~emidio/PhD-SNP/PhD-SNP.htm
Pmut                 http://mmb2.pcb.ub.es:8080/PMut/
PolyPhen             http://coot.embl.de/PolyPhen/
SIFT                 http://sift.jcvi.org/
SNAP                 http://cubic.bioc.columbia.edu/services/snap/
SNP Function Pred.   http://www.ensembl.org/ [N.B. login required]
SNPinfo / FuncPred   http://snpinfo.niehs.nih.gov/snpfunc.htm
SNPs3D               http://snps3d.org/
UMD-predictor        http://www.umd.be/

6

Current methods are the tip of the iceberg.

[Figure: iceberg split into ~1% and ~99%. Labels: protein transcripts, non-protein transcripts, repeats, indels, epigenetics (mC).]

7

Objectively Assessing Computational Predictions

Data Acquisition

Publication

The Prediction Window
~1-12 months when unpublished high-quality data are available

➢ CASP – Structure prediction
➢ CAPRI – Protein-protein docking
➢ EGASP – ENCODE Gene Annotation
➢ RGASP – RNA-Seq mapping
➢ DREAM – Network model assessment

8

➢ Follow the successful critical assessment framework:

● Solicit pre-publication genotype-phenotype associations

● Provide genomic data to predictors and collect their predictions

● Assess predictions against revealed annotations, mechanisms, and phenotypes

CAGI – Critical Assessment of Genome Interpretation
A community assessment of the state-of-the-art in phenotype prediction.
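
The last step of the framework, assessing predictions against revealed phenotypes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Prediction` record, the `score_predictors` function, the group and variant names, and the use of simple accuracy as the metric are all hypothetical and do not describe how CAGI assessors actually score submissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    predictor: str  # hypothetical name of the submitting group
    variant: str    # hypothetical variant identifier
    label: str      # predicted class, e.g. "deleterious" or "neutral"


def score_predictors(predictions, revealed):
    """Return {predictor: fraction of its predictions that match the revealed labels}.

    Predictions for variants that have no revealed label are ignored.
    """
    correct = {}
    total = {}
    for p in predictions:
        if p.variant not in revealed:
            continue
        total[p.predictor] = total.get(p.predictor, 0) + 1
        if p.label == revealed[p.variant]:
            correct[p.predictor] = correct.get(p.predictor, 0) + 1
    return {name: correct.get(name, 0) / n for name, n in total.items()}


# Made-up data, for illustration only.
# The labels are withheld from predictors until the prediction season closes.
revealed = {"var1": "deleterious", "var2": "neutral", "var3": "deleterious"}
predictions = [
    Prediction("groupA", "var1", "deleterious"),
    Prediction("groupA", "var2", "deleterious"),
    Prediction("groupA", "var3", "deleterious"),
    Prediction("groupB", "var1", "deleterious"),
    Prediction("groupB", "var2", "neutral"),
]
scores = score_predictors(predictions, revealed)
```

Accuracy is used here only because it is the simplest metric to read; a real assessment would likely choose measures suited to each dataset and to how complete each submission is.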

9

Please contact us if you have pre-publication genotype-phenotype association data.

Sample Prediction Categories

Molecular
Organismal
Cellular

MTHFR mutants – Yeast growth rates with various MTHFR mutations and [folate]. (Jasper Rine)

Breast Cancer – Segregation of rare variants among 2500 cases and controls. (Sean Tavtigian)

PGP100 – Unpublished phenotypes from PGP100 project. (George Church)

10

Census of Molecular Mechanisms
possible mechanisms of variant impact for WTCCC SNVs

Wellcome Trust Case Control Consortium Nature. 2007;447(7145):661-78.

11

Contributors, Predictors, Assessors
an incomplete list of participants

Gad Getz
Pauline Ng
Sean Tavtigian
George Church
Marc Greenblatt
Jasper Rine
Rachel Karchin
Mauno Vihinen

12

Sample CAGI Timeline

[Figure: timeline with weekly ticks from 05-03 through 01-31. Phases: Data Gathering, Prediction Season, Assessment.]

Key Dates
▲ finalize data sources
▲ workshop
▲ release prospectus / rules
▲ open participant registration

Dates are for illustration – exact dates have not been set.

13

CAGI Summary

➢ CAGI will:
● objectively assess phenotype prediction methods
● inform future research directions
● introduce researchers in diverse fields

➢ CAGI is being planned for the end of 2010 or early 2011.

➢ Now seeking data contributors, assessors, and predictors.

➢ Feedback is sought! reece@berkeley.edu

➢ See http://genomecommons.org/cagi for more information.

14

15

The Genome Commons: A Flagship Project Within QB3

[Figure: map, scale bar 10 km]

16

Reece Hart
Chief Scientist
UC Berkeley

Steven Brenner
Plant & Mol. Biology
UC Berkeley

Sandrine Dudoit
Biostatistics
UC Berkeley

Robert Nussbaum
Chief, Medical Genetics
UCSF

Jasper Rine
Genetics, Genomics & Dev
Chair, Computational Biology
UC Berkeley

Lior Pachter
Mathematics
Mol., Cell, Biol
UC Berkeley

Bernie Lo
Director, Medical Ethics
Department of Medicine
UCSF

Rasmus Nielsen
Michael I. Jordan
Ian Holmes
Kimmen Sjölander
Yun Song
Monty Slatkin
Terry Speed
Mark van der Laan
Richard Karp
Bernd Sturmfels
Steven Evans
Elizabeth Purdom
Haiyan Huang
Peter Bickel
Susan Marqusee
Michael Eisen
Lisa Barcellos
Rachel Brem
Tom Alber

Program in Translational Genomics