Phenopackets as applied to variant interpretation

Date post: 15-Feb-2017
Upload: mhaendel
Page 1: Phenopackets as applied to variant interpretation

Making Phenotypic Data FAIR++ for Disease Diagnosis and Discovery

Findable

Accessible outside paywalls and private data sources

Attributable

Interoperable and Computable

Reusable, exchangeable across contexts and disciplines

Melissa Haendel, PhD (@ontowonka)

Page 2: Phenopackets as applied to variant interpretation

Genes + Environment = Phenotypes

Computable encodings are essential

• Genes: base pairs, variant notation (e.g. HGVS)
• Environment: medical procedure coding, Environment Ontology
• Phenotypes: Human Phenotype Ontology, Mammalian Phenotype Ontology

Page 3: Phenopackets as applied to variant interpretation

Genes, Environment, Phenotypes

Standard exchange formats exist for genes (VCF, GFF, BED) …

but for phenotypes? Environment?

NEW: PXF (Phenotype Exchange Format)

Page 4: Phenopackets as applied to variant interpretation

Problems with tabular formats

• Denormalized
  – Repetition of fields
  – Ad hoc syntax for multi-valued fields, nesting
• Proliferation
  – Different formats generated for each use case
    • E.g. disease-phenotype, patient-phenotype, …
• Hard to extend
  – Not all phenotypes can be pre-packaged as a phenotype term
    • E.g. measurements, environments
• Ad hoc software, need standard libraries
• Focus should be on the data model
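The denormalization problem can be illustrated with a small sketch. The table layout, the column names, and the "|" convention for packing two values into one cell are all hypothetical, invented here only to show the kind of ad hoc syntax the slide warns about; the nested output is likewise just one possible shape, not the PXF schema.

```python
import csv
import io

# Hypothetical denormalized table: one row per patient-phenotype pair.
# Patient fields repeat on every row, and the onset column uses an
# ad hoc "|" syntax to squeeze a term ID and free text into one cell.
TSV = (
    "patient\tsex\tphenotype\tonset\n"
    "#1\tM\tHP:0200055\tHP:0003577|during development\n"
    "#1\tM\tHP:0100024\tHP:0011463\n"
)


def nest(tsv_text):
    """Group flat rows into one nested record per patient."""
    patients = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        record = patients.setdefault(
            row["patient"],
            {"id": row["patient"], "sex": row["sex"], "phenotypes": []},
        )
        # Undo the ad hoc multi-value syntax.
        onset_id, _, onset_text = row["onset"].partition("|")
        phenotype = {"id": row["phenotype"], "onset": {"id": onset_id}}
        if onset_text:
            phenotype["onset"]["description"] = onset_text
        record["phenotypes"].append(phenotype)
    return list(patients.values())


nested = nest(TSV)
```

Every consumer of the flat file has to know the "|" rule and re-group the rows; in the nested form the patient appears once and the onset is an ordinary sub-record.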

Page 5: Phenopackets as applied to variant interpretation

Phenopackets for clinical labs

Diagram: information that could reach the clinical testing lab

• Patient and family history
• Diagnostic tests, clinical phenotypes
• Genomic information
• Physical exam
• Patient medical history

Clinical labs often get no phenotypes or one-line descriptions.

What if we could make the phenotype data PHI-free and simultaneously more descriptive?

Page 6: Phenopackets as applied to variant interpretation

Phenopackets for journals

Each article can be associated with a phenopacket.

Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372

Each phenopacket can be shared via DOI in any repository outside a paywall (e.g. Figshare, Zenodo) and cited as a data citation.

Page 7: Phenopackets as applied to variant interpretation

Phenopackets for databases

Databases could share G2P data in a standardized format, retaining domain or species specificity

OMIA

Page 8: Phenopackets as applied to variant interpretation

Ontologies provide pre-packaged phenotype descriptions

Page 9: Phenopackets as applied to variant interpretation

A simple data model

Entities
– Organism
  • Patient
  • Non-human animal
  • Population
– Genetic/genomic element
– Condition
  • Disease
  • Phenotype

Associations
– E.g. between disease and phenotype
– Each association has
  • Evidence
  • Provenance

Diagram: an Entity is linked to a Condition (Disease or Phenotype) by an association, and the association carries Evidence.
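As a rough illustration of this model, the entities and associations could be sketched with plain Python dataclasses. The class and field names below are assumptions chosen to mirror the slide's vocabulary; they are not the actual phenopacket Java or Python API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    id: str
    label: str
    kind: str  # e.g. "patient", "non-human animal", "population", "variant"


@dataclass
class Condition:
    id: str
    label: str
    kind: str  # "disease" or "phenotype"


@dataclass
class Evidence:
    type_id: str  # an evidence ontology term, e.g. an ECO ID
    source: List[str] = field(default_factory=list)  # provenance, e.g. PMIDs


@dataclass
class Association:
    entity: Entity
    condition: Condition
    evidence: List[Evidence] = field(default_factory=list)


patient = Entity("#1", "Mickey Mouse", "patient")
phenotype = Condition("HP:0100024", "conspicuously happy disposition", "phenotype")
assoc = Association(patient, phenotype, [Evidence("ECO:0000033", ["PMID:1"])])
```

Keeping evidence on the association, rather than on the entity or the condition, is what lets the same patient or phenotype appear in several statements with different support.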

Page 10: Phenopackets as applied to variant interpretation

PhenoPacket export formats

CSV, JSON, RDF, OWL

Page 11: Phenopackets as applied to variant interpretation

What does a PhenoPacket look like?

Canonical JSON format (the example is written in the equivalent YAML form)

    title: "age of onset example"
    persons:
      - id: "#1"
        label: "Donald Trump"
        sex: "M"
    phenotype_profile:
      - entity: "person#1"
        phenotype:
          types:
            - id: "HP:0200055"
              label: "Small hands"
          onset:
            description: "during development"
            types:
              - id: "HP:0003577"
                label: "Congenital onset"
        evidence:
          - types:
              - id: "ECO:0000033"
                label: "Traceable Author Statement"
            source:
              - id: "PMID:1"

Image credits: upi.com

monarchinitiative.org
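Because the canonical form is JSON, the age-of-onset example on this slide maps directly onto nested dictionaries and lists. The sketch below uses only Python's standard json module and an abbreviated copy of the example (the person's label and the evidence block are omitted for brevity); it does not use any phenopacket library.

```python
import json

# Abbreviated copy of the slide's age-of-onset example as plain Python data.
packet = {
    "title": "age of onset example",
    "persons": [{"id": "#1", "sex": "M"}],
    "phenotype_profile": [
        {
            "entity": "person#1",
            "phenotype": {
                "types": [{"id": "HP:0200055", "label": "Small hands"}],
                "onset": {
                    "description": "during development",
                    "types": [{"id": "HP:0003577", "label": "Congenital onset"}],
                },
            },
        }
    ],
}

# Serialize to JSON text and read it back; the structure survives unchanged.
text = json.dumps(packet, indent=2)
round_tripped = json.loads(text)
onset = round_tripped["phenotype_profile"][0]["phenotype"]["onset"]
```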

Page 12: Phenopackets as applied to variant interpretation

Nesting allows refinement

File: patients.pxf (parts: header, entities, assocs)

Entities:

    persons:
      - id: "#1"
        label: Mickey Mouse
        date_of_birth: 1928-01-01
        sex: M
      - id: "#2"
        label: Goofy
        sex: M

Assocs:

    phenotype_profile:
      - entity: "#1"
        phenotype:
          types:
            - id: HP:0100024
              label: conspicuously happy disposition
          onset:
            types:
              - id: HP:0011463
                label: Adult onset
          description: "Writes distracting tweets"
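The split between entities and assocs means an association only carries an entity ID, which a reader resolves against the entity list. A minimal sketch of that lookup, assuming the field names shown on the slide and a hypothetical helper name:

```python
persons = [
    {"id": "#1", "label": "Mickey Mouse", "date_of_birth": "1928-01-01", "sex": "M"},
    {"id": "#2", "label": "Goofy", "sex": "M"},
]

phenotype_profile = [
    {
        "entity": "#1",
        "phenotype": {
            "types": [
                {"id": "HP:0100024", "label": "conspicuously happy disposition"}
            ]
        },
    }
]


def phenotypes_by_person(persons, profile):
    """Map each person's label to the phenotype labels asserted about them."""
    labels = {person["id"]: person["label"] for person in persons}
    result = {label: [] for label in labels.values()}
    for assoc in profile:
        # A KeyError here would signal an association pointing at an unknown entity.
        label = labels[assoc["entity"]]
        result[label].extend(term["label"] for term in assoc["phenotype"]["types"])
    return result


summary = phenotypes_by_person(persons, phenotype_profile)
```

Entities with no associations (Goofy here) still appear in the file, which a purely row-per-phenotype table cannot express without an empty row.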

Page 13: Phenopackets as applied to variant interpretation

How does it handle measurements?

    title: "measurement example, taken from genenetwork.org"
    organisms:
      - id: "#1"
        label: "BXD mouse population"
        taxon: NCBITaxon:10090
    phenotype_profile:
      - entity: "#1"
        phenotype:
          description: "cerebellum weight"
          types:
            - id: "PATO:0000128"
              label: "weight"
          measurements:
            - unit: mg
              value: 61.400
              property_values:
                - property: standard_error
                  filler: 2.38
          attribute_of:
            types:
              - id: "UBERON:0002037"
                label: "cerebellum"
          onset:
            description: "measured in adults"
            types:
              - id: "MmusDv:0000061"
                label: "early adult"

Callouts on the example:

• We can represent population phenotypes too
• Attribute: the PATO term ("weight")
• Measured entity: the Uberon term under attribute_of ("cerebellum")
• Units: UO
• Statistical properties (e.g. standard_error): ontology of statistical properties
• For non-abnormal phenotypes we can use a trait ontology, or a building-block approach with PATO and Uberon
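A consumer of the measurement example might read the value, unit, and standard error as follows. This is a sketch over plain dictionaries using the field names from the slide; the helper name is hypothetical.

```python
phenotype = {
    "description": "cerebellum weight",
    "types": [{"id": "PATO:0000128", "label": "weight"}],
    "measurements": [
        {
            "unit": "mg",
            "value": 61.4,
            "property_values": [{"property": "standard_error", "filler": 2.38}],
        }
    ],
    "attribute_of": {"types": [{"id": "UBERON:0002037", "label": "cerebellum"}]},
}


def describe_measurement(phenotype):
    """Render the first measurement as '<entity> <attribute>: <value> <unit> (SE ...)'."""
    measurement = phenotype["measurements"][0]
    # Look for an optional standard_error among the statistical properties.
    standard_error = next(
        (
            pv["filler"]
            for pv in measurement.get("property_values", [])
            if pv["property"] == "standard_error"
        ),
        None,
    )
    entity = phenotype["attribute_of"]["types"][0]["label"]
    attribute = phenotype["types"][0]["label"]
    text = f"{entity} {attribute}: {measurement['value']} {measurement['unit']}"
    if standard_error is not None:
        text += f" (SE {standard_error})"
    return text


summary = describe_measurement(phenotype)
```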

Page 14: Phenopackets as applied to variant interpretation

Example: pathogenicity for a variant

File parts: header, entities, assocs

Entities:

    variants:
      - id: CLINVAR:226213
        type: SO:0001483 ('single nucleotide variant')
        label: "NM_007294.3(BRCA1):c.4677_5075del"
        positions:
          - type: HGVS
            value: "NM_007294.3:c.4677_5075del"

Assocs:

    disease_profile:
      - entity: CLINVAR:226213
        disease:
          - id: NCIT:C4872
            label: "Breast Carcinoma"
        interpretation: "pathogenic"
        contributors:
          - id: CLINGEN:Agent007
            label: "Clinical Pathogenicity Calculator v1"
        created: "2016-07-12T11:00:59+00:00"
        method:
          - id: doi:10.1038/gim.2015.30
            label: "ACMG ISV guidelines 2015"
        evidence:
          - id: CLINGEN:ev025
            type: ECO:9000100 ('population frequency evidence')
            acmg_criterion: CLINGEN:vic008 ('ACMG v2015 PM2, absent from controls in population databases')
            description: "Variant is absent from a large cohort of non-finnish europeans (NFE) in the ExAC population database, with sequencing coverage of the variant exceeding 25X"
            outcome: "moderately supporting"
            supporting_reference:
              - id: PMID:27997510
            supporting_data:
              - id: CLINGEN:PAF082A
                type: SEPIO:9000895 ('allele frequency data')
                value: "0"
              - id: CLINGEN:PAF082B
                type: SEPIO:9000846 ('median sequencing coverage data')
                value: "28X"
              - id: CLINGEN:PAF082C
                type: SEPIO:9000878 ('population ethnicity data')
                value: "non-finnish european"
          …

Use GA4GH variant representation (Reece Hart leading)

http://bit.ly/variant-path-PXF

ClinGen (Larry Babb) collab
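Since each evidence item in the pathogenicity example carries its own outcome and supporting data, a reviewer can walk the association and list what backs the interpretation. The sketch below copies a trimmed version of the example into plain dictionaries (the parenthetical term labels are dropped); the summarize helper is hypothetical.

```python
assoc = {
    "entity": "CLINVAR:226213",
    "interpretation": "pathogenic",
    "evidence": [
        {
            "id": "CLINGEN:ev025",
            "outcome": "moderately supporting",
            "supporting_data": [
                {"id": "CLINGEN:PAF082A", "type": "SEPIO:9000895", "value": "0"},
                {"id": "CLINGEN:PAF082B", "type": "SEPIO:9000846", "value": "28X"},
                {
                    "id": "CLINGEN:PAF082C",
                    "type": "SEPIO:9000878",
                    "value": "non-finnish european",
                },
            ],
        }
    ],
}


def summarize(assoc):
    """Return one line for the interpretation plus one line per evidence item."""
    lines = [f"{assoc['entity']}: {assoc['interpretation']}"]
    for evidence in assoc["evidence"]:
        data = ", ".join(
            f"{item['type']}={item['value']}"
            for item in evidence.get("supporting_data", [])
        )
        lines.append(f"  {evidence['id']} ({evidence['outcome']}): {data}")
    return lines


lines = summarize(assoc)
```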

Page 15: Phenopackets as applied to variant interpretation

Complex phenotypes

Not every phenotype can be boiled down to a pre-packaged ontology term

PXF allows post-coordination / post-composition
– E.g. 'mild', 'severe' qualifiers
– Temporal qualifiers: start, end, acute/chronic, …
– Specifying precise location of phenotype
– On-the-fly composition of phenotypic descriptors from base ontologies
  • Chemical entities
  • Cell types
  • GO
  • Anatomy

Additionally
– Free text descriptions
– Measurements / quantitative phenotypes
– Environments (ongoing)

Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2
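Post-composition can be pictured as assembling a phenotype from building-block terms instead of looking up one pre-made term. The sketch below reuses the types and attribute_of field names and the PATO and Uberon terms from the measurement example; the compose helper and the severity field are assumptions made for illustration, not confirmed PXF schema.

```python
def compose(quality, anatomy, severity=None):
    """Build a phenotype description from a quality term and an anatomy term.

    `severity` is an optional free-text qualifier such as 'mild' or 'severe';
    the field name used to store it is an assumption.
    """
    phenotype = {"types": [quality], "attribute_of": {"types": [anatomy]}}
    if severity is not None:
        phenotype["severity"] = {"description": severity}
    return phenotype


weight = {"id": "PATO:0000128", "label": "weight"}
cerebellum = {"id": "UBERON:0002037", "label": "cerebellum"}

plain = compose(weight, cerebellum)
qualified = compose(weight, cerebellum, severity="severe")
```

The same two helper arguments could be swapped for chemical, cell type, or GO terms, which is the point of composing from base ontologies.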

Page 16: Phenopackets as applied to variant interpretation

PXF and GA4GH Stack

• PXF primary use case is as a file format
• GA4GH primary use case is as an API
• Obviously these are related…
• …but the devil is in the details
  – E.g. is there a well-defined mapping between proto and JSON?
• How can we better interoperate? Working to converge (M. Diekhans)
  – Define PXF using ProtoBuf
  – What would a query API look like?
    • As an exchange format, we don't have to worry about this
    • Query APIs for complex data structures proliferate complexity
    • What is the overall GA4GH strategy here?

Page 17: Phenopackets as applied to variant interpretation

PXF, GA4GH, and other related activities

G2P
– PXF extends initial implementation
– Make PXF a FHIR resource

Metadata
– Align how to reference ontology terms
– Standardizing identifier prefixes

MME
– PXF does not provide a search API
– PXF subsumes phenotype profile representation

Beacon
– PXF could be a response element
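One way to picture the identifier-prefix point: every term reference in the examples has the form prefix:local_id, so a tool can split it and check the prefix against an agreed list. The sketch below is an assumption about how such a check might look; the prefix set is simply collected from the examples in this deck, not an official registry.

```python
import re

# prefix:local_id, where the prefix starts with a letter.
CURIE = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):(\S+)$")

# Prefixes that appear in this deck's examples (not an official registry).
KNOWN_PREFIXES = {
    "HP", "ECO", "PATO", "UBERON", "NCBITaxon", "MmusDv",
    "SO", "NCIT", "PMID", "SEPIO", "CLINVAR", "CLINGEN",
}


def split_curie(identifier):
    """Split 'PREFIX:local' and reject malformed or unrecognized prefixes."""
    match = CURIE.match(identifier)
    if match is None:
        raise ValueError(f"not a prefixed identifier: {identifier!r}")
    prefix, local_id = match.groups()
    if prefix not in KNOWN_PREFIXES:
        raise ValueError(f"unrecognized prefix: {prefix!r}")
    return prefix, local_id
```

For instance, split_curie("HP:0200055") yields the prefix "HP" and the local ID "0200055", while a lowercase "hp:0200055" is rejected, which is the kind of inconsistency a shared prefix convention is meant to catch.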

Page 18: Phenopackets as applied to variant interpretation

Summary: Phenotype Exchange Format

• One model, derive alternate concrete forms
  – YAML, JSON, RDF, TSV (subset)
• Species-agnostic
  – From microbes through plants through humans
  – Clinical and basic research
• Applicable to a variety of entities
  – Patients/individual organisms, cohorts, populations
  – Diseases
  – Papers
  – Genes, genotypes, alleles, variants
• Simple for simple cases…
  – Bag of terms model
• …Incremental expressivity
  – Temporality and causality
  – Quantitative as well as qualitative
  – Negation, severity, frequency, penetrance, expressivity
• Ontology-smart
  – Rational composition (post-coordination)
  – Explicit semantics

http://phenopackets.org
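The "one model, alternate concrete forms" idea can be sketched with the standard library alone: the same in-memory packet is written once as JSON (lossless) and once as a TSV that keeps only a flat subset. The column names and helper functions are hypothetical; real tooling such as pxftools may behave differently.

```python
import csv
import io
import json

packet = {
    "persons": [{"id": "#1", "label": "Mickey Mouse", "sex": "M"}],
    "phenotype_profile": [
        {
            "entity": "#1",
            "phenotype": {
                "types": [
                    {"id": "HP:0100024", "label": "conspicuously happy disposition"}
                ]
            },
        }
    ],
}


def to_json(packet):
    """Full-fidelity form: the whole nested model."""
    return json.dumps(packet, sort_keys=True)


def to_tsv(packet):
    """Subset form: one row per entity-phenotype term; nested detail is dropped."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["entity", "phenotype_id", "phenotype_label"])
    for assoc in packet["phenotype_profile"]:
        for term in assoc["phenotype"]["types"]:
            writer.writerow([assoc["entity"], term["id"], term["label"]])
    return out.getvalue()
```

The TSV corresponds to the "bag of terms" view; anything richer (onset, evidence, measurements) only survives in the nested forms.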

Page 19: Phenopackets as applied to variant interpretation

Phenopacket Tool ecosystem

• Non-JVM language bindings
  – Python (beta)
    • https://github.com/phenopackets/phenopacket-python/
  – JavaScript (alpha)
    • https://github.com/phenopackets/phenopacket-js/
• Pxftools
  – Command-line library, Scala utilities
  – https://github.com/phenopackets/pxftools
• PhenoPacketScraper
  – GSOC project to make phenopackets from case study articles
  – https://github.com/monarch-initiative/phenopacket-scraper-core
• OwlSim
  – Like BLAST, for phenotypes
  – https://github.com/monarch-initiative/owlsim-v3
• WebPhenote
  – Noctua extension for phenopacket creation
  – http://create.monarchinitiative.org

Page 20: Phenopackets as applied to variant interpretation

Acknowledgments

• Chris Mungall (schema/architecture)
• Jules Jacobsen (Java API)
• James Balhoff (pxftools)
• Jeremy Nguyen-Xuan (pxftools)
• Seth Carbon (WebPhenote)
• Kent Shefcheck (Python API)
• Matt Brush (modeling)
• Dan Keith (WebPhenote)
• Satwik Bhattamishra (GSOC student, PhenoPacketScraper)

• Julie McMurry
• Peter Robinson
• Pier Buttigieg
• Ramona Walls
• Damian Smedley
• Sebastian Kohler
• Tudor Groza
• Harry Hochheiser
• Mark Diekhans
• Melanie Courtot
• Michael Baudis
• Helen Parkinson
• Suzanna Lewis
