Making Phenotypic Data FAIR++ for Disease Diagnosis and Discovery
• Findable
• Accessible: outside paywalls and private data sources
• Attributable
• Interoperable and Computable
• Reusable: exchangeable across contexts and disciplines
Melissa Haendel, PhD @ontowonka
Biology central dogma
Genes + Environment = Phenotypes
Standards for encoding and exchanging data must be up to these challenges
Computable encodings are essential
Genes + Environment = Phenotypes
– Genes: base pairs; variant notation (e.g. HGVS)
– Environment: medical procedure coding; Environment Ontology
– Phenotypes: Human Phenotype Ontology; Mammalian Phenotype Ontology
Standard exchange formats exist for genes… but for phenotypes? Environment?
– Genes: VCF, GFF, BED
– Phenotypes: PXF
Ontologies provide pre-packaged phenotype descriptions
Köhler, S., Doelken, S. C., Mungall, C. J., … Robinson, P. N. (2013). The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. doi:10.1093/nar/gkt1026
Smith, C. L., & Eppig, J. T. (2015). Expanding the mammalian phenotype ontology to support automated exchange of high throughput mouse phenotyping data generated by large-scale mouse knockout screens. J Biomed Semantics, 6, 11. doi:10.1186/s13326-015-0009-1
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4378007/
A simple data model
• Entities
  – Organism: patient, non-human animal, population
  – Genetic/genomic element
  – Condition: disease, phenotype
• Associations
  – E.g. between disease and phenotype
  – Each association has evidence and provenance
[Diagram labels: Entity, Evidence, Condition, association, Disease, Phenotype]
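The entity and association model on this slide could be sketched in Python roughly as follows. The class and field names (`Entity`, `Condition`, `Association`, `evidence`, `provenance`) are illustrative assumptions taken from the slide's vocabulary; they are not the actual phenopackets schema or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Anything that can take part in an association (patient, animal, gene...)."""
    id: str
    label: str = ""


@dataclass
class Condition(Entity):
    """A disease or phenotype; itself a kind of entity."""


@dataclass
class Association:
    """Links an entity to a condition and records why we believe the link."""
    entity: Entity
    condition: Condition
    evidence: List[str] = field(default_factory=list)    # e.g. evidence-code IDs
    provenance: List[str] = field(default_factory=list)  # e.g. source URLs or DOIs


# Hypothetical example record, reusing IDs that appear elsewhere in this deck.
patient = Entity(id="patient16", label="example patient")
alacrima = Condition(id="HP:0000522", label="Alacrima")
assoc = Association(patient, alacrima, evidence=["ECO:0000033"])

print(f"{assoc.entity.id} -> {assoc.condition.label} (evidence: {assoc.evidence})")
```

Because `Condition` is also an `Entity` in this sketch, the same `Association` class can express a disease-to-phenotype link as well as a patient-to-phenotype link.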
Phenopackets for clinical labs
[Diagram: Patient medical history (patient and family history; diagnostic tests, clinical phenotypes; genomic information; physical exam) and the clinical testing lab]
Clinical labs often get no phenotypes or one-line descriptions. What if we could make the phenotype data PHI-free and simultaneously more descriptive?
Phenopackets for journals
• Each article can be associated with a phenopacket
• Each phenopacket can be shared via DOI in any repository outside the paywall (e.g. Figshare, Zenodo, etc.) and cited as a data citation
Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372
Phenopackets for biomedical databases
OMIA
Databases could share G2P data in a standardized format, retaining domain or species specificity
Phenopackets for laypersons
• Dry eyes
• Developmental delay
• Elevated liver function

    phenotype_profile:
      - entity: "patient16"
        phenotype:
          types:
            - id: "HP:0000522"
              label: "Alacrima"
          onset:
            description: "at birth"
            types:
              - id: "HP:0003577"
                label: "Congenital onset"
        evidence:
          - types:
              - id: "ECO:0000033"
                label: "Traceable Author Statement"
            source:
              - id: "https://twitter.com/examplepatient/status/123456789"

• Disease registries
• Patient communities
• Social media
Image credits: ngly1.org
PhenoPacket formats
Export phenopacket to:
• CSV
• JSON
• RDF
• OWL
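As a rough illustration of deriving more than one concrete format from a single record, the sketch below serializes a small phenotype profile to JSON and to a flat CSV using only the Python standard library. The record reuses IDs and labels from this deck. The function names and the CSV column names are assumptions made for illustration, not a layout defined by the phenopackets project. RDF and OWL export would need an RDF library and are left out.

```python
import csv
import io
import json

# One in-memory record; every export below is derived from it.
profile = {
    "phenotype_profile": [
        {"entity": "#1", "phenotype": {"types": [
            {"id": "HP:0100024", "label": "conspicuously happy disposition"}]}},
        {"entity": "#1", "phenotype": {"types": [
            {"id": "MP:0001284", "label": "absent vibrissae"}]}},
    ]
}


def to_json(packet):
    """Serialize the full nested record."""
    return json.dumps(packet, indent=2)


def to_csv(packet):
    """Flatten to one row per (entity, phenotype term); nesting is lost."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["entity", "phenotype_id", "phenotype_label"])
    for assoc in packet["phenotype_profile"]:
        for term in assoc["phenotype"]["types"]:
            writer.writerow([assoc["entity"], term["id"], term["label"]])
    return buf.getvalue()


print(to_json(profile))
print(to_csv(profile))
```

The CSV form keeps only the flat subset of the record, which is the usual trade-off when a tabular format is derived from a nested model.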
Simple Example (patient profile)
patients.pxf
Sections: header, entities, assocs

    # entities
    persons:
      - id: "#1"
        label: Mickey Mouse
        date_of_birth: 1928-01-01
        sex: M
      - id: "#2"
        label: Goofy
        sex: M

    # assocs
    phenotype_profile:
      - entity: "#1"
        phenotype:
          types:
            - id: HP:0100024
              label: conspicuously happy disposition
      - entity: "#1"
        phenotype:
          types:
            - id: MP:0001284
              label: absent vibrissae
      - entity: "#2"
        phenotype:
          types:
            - id: HP:0100024
              label: conspicuously happy disposition
Nesting allows refinement
patients.pxf
Sections: header, entities, assocs

    # entities
    persons:
      - id: "#1"
        label: Mickey Mouse
        date_of_birth: 1928-01-01
        sex: M
      - id: "#2"
        label: Goofy
        sex: M

    # assocs
    phenotype_profile:
      - entity: "#1"
        phenotype:
          types:
            - id: HP:0100024
              label: conspicuously happy disposition
          onset:
            types:
              - id: HP:0011463
                label: Childhood onset
          description: "welcomes strangers with open arms"
      - entity: "#1"
        phenotype:
          types:
            - id: MP:0001284
              label: absent vibrissae
      - entity: "#2"
        phenotype:
          types:
            - id: HP:0100024
              label: conspicuously happy disposition
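One consequence of nesting is that a consumer which only understands the simple form can ignore the refinements. The sketch below assumes the profile has already been loaded into plain Python dicts and lists (for example by a YAML parser) and reduces it to one set of term IDs per entity. The helper name `bag_of_terms` is hypothetical.

```python
from collections import defaultdict

# The nested example from this slide, written as plain Python data.
phenotype_profile = [
    {"entity": "#1", "phenotype": {
        "types": [{"id": "HP:0100024", "label": "conspicuously happy disposition"}],
        "onset": {"types": [{"id": "HP:0011463", "label": "Childhood onset"}]},
        "description": "welcomes strangers with open arms"}},
    {"entity": "#1", "phenotype": {
        "types": [{"id": "MP:0001284", "label": "absent vibrissae"}]}},
    {"entity": "#2", "phenotype": {
        "types": [{"id": "HP:0100024", "label": "conspicuously happy disposition"}]}},
]


def bag_of_terms(profile):
    """Collect phenotype term IDs per entity, skipping onset and description."""
    bags = defaultdict(set)
    for assoc in profile:
        for term in assoc["phenotype"]["types"]:
            bags[assoc["entity"]].add(term["id"])
    return dict(bags)


bags = bag_of_terms(phenotype_profile)
for entity, terms in sorted(bags.items()):
    print(entity, sorted(terms))
```

The onset term and free-text description stay available to richer tools, while simpler tools still get a usable list of terms.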
Simple Example (variants)
variants.pxf
Sections: header, entities, assocs

    # entities
    variants:
      - id: "var#1"
        label: "c.2441+7A>G"
        descrHGVS: "c.2441+7A>G"
        startPosition: 0
        endPosition: 0
        …

    # assocs
    phenotype_profile:
      - entity: "var#1"
        phenotype:
          types:
            - id: HP:0001595
              label: Abnormality of the hair
          onset:
            types:
              - id: HP:0011463
                label: Childhood onset
          description: "missing whiskers"
      …

Insert GA4GH module here
https://monarchinitiative.org/variant/ClinVarVariant:195890
Semantics with JSON-LD
• Provides a direct mapping to RDF
• Allows reasoning to be performed on phenopackets
• Provides a prefix map to unambiguously interpret CURIE-style identifiers (e.g. as recorded in PrefixCommons)

    {
      "@context": {
        "id": "@id",
        "label": "rdfs:label",
        "types": {
          "@id": "rdf:type",
          "@type": "@id"
        },
        "negated_types": {
          "@id": "owl:complementOf",
          "@type": "@id"
        },
        "title": "dc:title",
        "dc": "http://purl.org/dc/terms/",
        "MP": "http://purl.obolibrary.org/obo/MP_",
        …
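To make the prefix-map idea concrete, here is a minimal sketch of CURIE expansion in plain Python. The `MP` entry is copied from the context shown on this slide. The `HP` entry is an assumption added by analogy with the same OBO PURL pattern, and the function name `expand_curie` is hypothetical. A real JSON-LD processor does considerably more than this.

```python
PREFIX_MAP = {
    "MP": "http://purl.obolibrary.org/obo/MP_",  # from the context on this slide
    "HP": "http://purl.obolibrary.org/obo/HP_",  # assumed, same OBO PURL pattern
}


def expand_curie(curie, prefix_map=PREFIX_MAP):
    """Turn a CURIE such as 'MP:0001284' into a full IRI using the prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise ValueError(f"cannot expand {curie!r}: unknown or missing prefix")
    return prefix_map[prefix] + local_id


print(expand_curie("MP:0001284"))
```

Rejecting unknown prefixes, rather than passing them through, is a design choice in this sketch: it surfaces identifiers that the context cannot interpret unambiguously.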
Summary: Phenotype Exchange Format
• One model, derive alternate concrete forms
  – YAML, JSON, RDF, TSV (subset)
• Species-agnostic
  – From microbes through plants through humans
  – Clinical and basic research
• Applicable to a variety of entities
  – Patients/individual organisms, cohorts, populations
  – Diseases
  – Papers
  – Genes, genotypes, alleles, variants
• Simple for simple cases…
  – Bag of terms model
• …Incremental expressivity
  – Temporality and causality
  – Quantitative as well as qualitative
  – Negation, severity, frequency, penetrance, expressivity
• Ontology-smart
  – Rational composition (post-coordination)
  – Explicit semantics
http://phenopackets.org
Acknowledgments
• Chris Mungall (schema/architecture)
• Jules Jacobsen (Java API)
• James Balhoff (pxftools)
• Jeremy Nguyen-Xuan (pxftools)
• Seth Carbon (web phenote)
• Kent Shefcheck (Python API)
• Matt Brush (modeling)
• Dan Keith (web phenote)
• Satwik Bhattamishra (GSOC student, PhenoPacketScraper)
• Julie McMurry
• Peter Robinson
• Pier Buttigieg
• Ramona Walls
• Damian Smedley
• Sebastian Kohler
• Tudor Groza
• Harry Hochheiser
• Mark Diekhans
• Melanie Courtot
• Michael Baudis
• Helen Parkinson
• Suzanna Lewis
Monarch Initiative NIH R24 OD011883
Phenopacket Tool ecosystem
• Non-JVM language bindings
  – Python (beta): https://github.com/phenopackets/phenopacket-python/
  – Javascript (alpha): https://github.com/phenopackets/phenopacket-js/
• Pxftools
  – Command line library, Scala utilities
  – https://github.com/phenopackets/pxftools
• PhenoPacketScraper
  – GSOC project to make phenopackets from case study articles
  – https://github.com/monarch-initiative/phenopacket-scraper-core
• OwlSim
  – Like BLAST, for phenotypes
  – https://github.com/monarch-initiative/owlsim-v3
• WebPhenote
  – Noctua extension for phenopacket creation
  – http://create.monarchinitiative.org
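To give a feel for what "like BLAST, for phenotypes" means, the toy sketch below ranks stored phenotype profiles against a query by plain Jaccard overlap of term IDs. This is not how OwlSim works: OwlSim is described as ontology-aware, whereas this example does no ontology reasoning and treats terms as opaque strings. The function names and the example profiles (taken from the patient example earlier in this deck) are illustrative assumptions.

```python
def jaccard(a, b):
    """Share of terms in common: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_matches(query, profiles):
    """Return (score, name) pairs, best match first; ties broken by name."""
    scored = [(jaccard(query, terms), name) for name, terms in profiles.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Bag-of-terms profiles for the two example individuals.
profiles = {
    "#1": {"HP:0100024", "MP:0001284"},
    "#2": {"HP:0100024"},
}

ranking = rank_matches({"HP:0100024"}, profiles)
for score, name in ranking:
    print(f"{name}: {score:.2f}")
```

An ontology-aware tool could also credit partial matches between related terms, which exact set overlap cannot do.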