Date posted: 13-Apr-2017
Category: Health & Medicine
Uploaded by: mhaendel
Monarch is supported generously by an NIH Office of the Director Grant #5R24OD011883, as well as by NCI/Leidos #15X143, BD2K U54HG007990-S2 (Haussler), and BD2K PA-15-144-U01 (Kesselman)
@monarchinit
The Problem: The human genome is poorly annotated
A better understanding of human gene function and disease mechanisms is critical for diagnosis, precision medicine, and targeted therapies
The Approach: Monarch cross-species G2P Integration Pipeline
Ontologies, Data Standards, Curation and Data Modeling
Algorithms, Tools
The Solution: Leverage all the species data
Solve the cross-species language divide
www.monarchinitiative.org/sources
Acknowledgements and Contact Info
Palmoplantar hyperkeratosis
Thick hand skin
Ulcerated paws
[Figure: data sources and ontologies integrated by Monarch, arranged by phenotypes, diseases, and anatomy for human and model organism. Legend: data source, ontology, bridging ontology; Monarch team maintains, Monarch team contributes.]
Data sources: ClinVar, Coriell, CTD, Elem of Morph, Gene Reviews, GWAS, HPOA, OMIM, Orphanet, KEGG, AnimalQTLDB, FlyBase, IMPC, MGI, MPD, OMIA, RGD, WormBase, ZFIN
Ontologies and vocabularies: MeSH, MedGen, OMIM, HP, EFO, ORDO, VT, FBcv, ZP, WP, MP, MA, ZFA, FBbt, WA, CL, EMAPA
Bridging ontologies: MONDO, UPheno, UBERON
PROBLEM: Phenotypic language differs by organism and also by community, thus impeding integration.
SOLUTION: Monarch integrates the data sources through bridging ontologies.
The phenotypes are associated with very different aspects of the genotype in each data source.
The Challenge: Fragmented, heterogeneous G2P data
[Figure: G2P data by species and source. Species: Mus, Homo, Canis, Macaca, Panthera, Equus, Ovis, Danio, Gallus, Sula, Vulpes, Anas, Coturnix, Peromyscus, Tragelaphus, Bos, Sus, other (>100 species). Sources: mgd, mmrrc, mgi, animalqtldb, cgd, clinvar, gwascatalog, hpoa, kegg, omim, orphanet, coriell, omia, monarch-curated, zfin.]
[Figure: bar chart, axis 0% to 100%, comparing "Human only" with "Human + other".]
The phenotypic consequences of mutation are known for <20% of the human coding genome; including orthologs from other species boosts this number to over 80%.
We learn about different phenotypes from different species, and want to use all of these data.
Improve data quality and interoperability
Evidence and provenance for G2P associations are incomplete, not computable, and frequently conflated. This hampers integration and pathogenicity determination.
Disentangle these concepts, and model data to make it computable.
PROBLEMS
Diagnosing rare diseases requires identifying similar patients and models.
Data models for biological database sources, especially G2P sources, are highly heterogeneous.
Data are insufficiently described to understand what they are or how they were produced.
SOLUTIONS
Monarch's integrated cross-species data are available on the Patient Matchmaker Exchange: https://mme.monarchinitiative.org
Monarch is contributing GA4GH schemas to bridge the heterogeneous G2P sources: github.com/ga4gh/schemas
HCLS provides a guide indicating which metadata are essential and how to express them. Monarch was a key contributor to this community effort and is testing the model for all sources in its corpus.
Compute over diseases, phenotypes, and models to diagnose diseases
PhenoGrid
http://www.sanger.ac.uk/science/tools/exomiser
http://patientarchive.org/
Exomiser
https://www.npmjs.com/package/phenogrid
Whole exome
Remove off-target and common variants
Variant score from allele freq and pathogenicity
Phenotype score from phenotypic similarity
PHIVE score to give final candidates
Mendelian filters
Combine genotype and phenotype data for variant prioritization
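The Exomiser steps listed above (filter common variants, score each variant, score phenotypic similarity, combine the two) can be sketched roughly as follows. This is an illustration only: the `Candidate` fields, the 1% frequency cutoff, the gene names, and the plain average used for the combined score are all assumptions made for this sketch, not Exomiser's actual formulas or API.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str               # hypothetical gene label
    allele_freq: float      # population allele frequency, 0..1
    pathogenicity: float    # predicted pathogenicity, 0..1
    phenotype_score: float  # phenotypic similarity to the patient, 0..1


def variant_score(c: Candidate) -> float:
    # Rarer and more damaging variants score higher (illustrative formula).
    return c.pathogenicity * (1.0 - c.allele_freq)


def combined_score(c: Candidate) -> float:
    # Plain average of variant and phenotype scores; a stand-in for the
    # real combination step, which this sketch does not reproduce.
    return 0.5 * (variant_score(c) + c.phenotype_score)


def prioritize(candidates, max_allele_freq=0.01):
    # Drop common variants, then rank the rest by combined score.
    rare = [c for c in candidates if c.allele_freq <= max_allele_freq]
    return sorted(rare, key=combined_score, reverse=True)


candidates = [
    Candidate("GENE_A", 0.0001, 0.90, 0.20),
    Candidate("GENE_B", 0.0005, 0.70, 0.90),
    Candidate("GENE_C", 0.2000, 0.95, 0.95),  # common variant, filtered out
]
ranked = prioritize(candidates)
print([c.gene for c in ranked])
```

In this toy input, the candidate with the strongest phenotype match outranks the one with the most damaging variant, which is the point of combining genotype and phenotype evidence.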
Visualize phenotype profile comparisons between patients and:
- Other patients
- Known diseases
- Models
Embeddable 3rd party widget for data resources
PhenoTua / Noctua
Uniquely identify a model or disease
Check organism/genotype nomenclature
Choose terms from any phenotype ontology
Provide evidence
Edit collaboratively, group sharing
View in two modalities:
- Ontology smart spreadsheet
- Graphical causal networks
HPO Pubmed Browser
Curate causal networks between genes, genotypes, phenotypes, diseases, using organism-agnostic standardized OWL models
http://create.monarchinitiative.org/
Check Annotation Sufficiency
Automated extraction of Human Phenotype Ontology concepts from free text clinical summaries.
Intuitive visualization of patient phenotype profiles and diagnoses.
Immediate visual feedback on phenotype profiles using the Monarch annotation sufficiency score.
Fine-grained patient sharing access control.
Encrypted patient-sensitive data, yet with the possibility of searching over this data.
Visualize and Browse Relationships
Finding literature relevant to a set of phenotypes should be easy.
http://pubmed-browser.human-phenotype-ontology.org/
Zemojtel, T. et al. Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome. Science Translational Medicine 6, 252ra123 (2014). (11 diagnosed families)
Pippucci, T. et al. A novel null homozygous mutation confirms CACNA2D2 as a gene mutated in epileptic encephalopathy. PLoS One 8, e82154 (2013). (1 diagnosed family)
Requena, T. et al. Identification of two novel mutations in FAM136A and DTNA genes in autosomal-dominant familial Meniere's disease. Human Molecular Genetics 24, 1119–26 (2015). (2 diagnosed families)
Bone, W. et al. Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency. Genetics in Medicine. In press (2015). doi:10.1038/gim.2015.137 (4 diagnosed families)
18 Published Diagnoses
www.monarchinitiative.org
www.owlsim.org
Patient X
Disease Y Model Z
Make causal relationships computable: Improve modeling of evidence and provenance
owlsim
http://brcaexchange.org/
Provenance, Evidence, Claim
Process history. Key participants in process. Outputs of process.
Provenance:
- types of assay/technique/study or instances thereof
- agent(s) who produced evidence
- agent(s) who asserted the claim
- time and place
- materials (e.g. model systems, reagents, instruments)
Evidence:
- Data (e.g. images, sequences)
- Evidence codes
- Publications
- Statistical confidence (p-value, z-score)
- Summary figures
- Conclusions from previous studies
- Tacit knowledge of a domain expert
Claim:
- Causal relationships, hypothesized relationships, correlations, etc.
http://tinyurl.com/brca-g2p
http://tinyurl.com/acmg-guidelines
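One way to keep claim, evidence, and provenance separate rather than conflated is to give each its own record type, so that every claim points to its evidence and every piece of evidence points to how it was produced. The sketch below is a minimal illustration under that assumption; the class names, field names, and placeholder values are hypothetical and are not Monarch's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Provenance:
    technique: str            # type of assay, technique, or study
    evidence_agent: str       # who produced the evidence
    claim_agent: str          # who asserted the claim
    time_and_place: str = ""
    materials: List[str] = field(default_factory=list)


@dataclass
class Evidence:
    kind: str                 # e.g. "publication", "statistical confidence"
    value: str                # e.g. a citation or a p-value
    provenance: Provenance


@dataclass
class Claim:
    subject: str              # e.g. a variant or genotype identifier
    relation: str             # e.g. "causes", "correlates_with"
    obj: str                  # e.g. a phenotype or disease identifier
    evidence: List[Evidence] = field(default_factory=list)

    def is_supported(self) -> bool:
        # A claim is computably supported only if evidence is attached.
        return len(self.evidence) > 0


# Placeholder values for illustration only.
prov = Provenance("case-control study", "lab A", "curator B",
                  materials=["patient cohort"])
supported = Claim("VARIANT:example", "causes", "DISEASE:example",
                  [Evidence("statistical confidence", "p=0.001", prov)])
unsupported = Claim("VARIANT:example2", "correlates_with", "DISEASE:example")
print(supported.is_supported(), unsupported.is_supported())
```

Keeping the three layers distinct lets a consumer ask separately what is claimed, what supports it, and who produced that support.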
Fuzzy matching between patients, phenotypes, and diseases
Problem: It is difficult to prioritize candidate genes for diagnosis, or to identify the model that best recapitulates a disease
Compute similarity of phenotypic profiles
Graph-based semantic similarity
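As a rough illustration of graph-based semantic similarity, two phenotype profiles can be compared by expanding each term to all of its ancestors in the ontology graph and measuring the overlap of the two expanded sets (Jaccard similarity). The tiny hierarchy in `PARENTS` is a hypothetical toy, not taken from HPO, MP, or UPheno, and Jaccard is only one of several possible measures; this is not a description of the OWLSim implementation.

```python
# Toy ontology: each term maps to its direct parents (hypothetical hierarchy).
PARENTS = {
    "abnormal phenotype": [],
    "abnormal palate": ["abnormal phenotype"],
    "abnormal kidney": ["abnormal phenotype"],
    "cleft palate": ["abnormal palate"],
    "high-arched palate": ["abnormal palate"],
    "renal hypoplasia": ["abnormal kidney"],
}


def ancestors(term):
    # Return the term plus every term reachable by following parent links.
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def closure(profile):
    # Union of the ancestor sets of all terms in a phenotype profile.
    expanded = set()
    for term in profile:
        expanded |= ancestors(term)
    return expanded


def jaccard_similarity(profile_a, profile_b):
    # Shared ancestors divided by all ancestors; 0.0 for two empty profiles.
    a, b = closure(profile_a), closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


patient = ["high-arched palate", "renal hypoplasia"]
model = ["cleft palate"]
print(jaccard_similarity(patient, model))
```

The patient and model share no exact term here, yet they still score above zero because both have a palate abnormality. This is the "fuzzy" matching that exact term comparison would miss.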
PROBLEM: Researchers don't know when their phenotyping is sufficient to be useful beyond their specialized community. Clinicians don't know when their phenotyping is sufficient for diagnosis.
SOLUTION: Compare a patient or organism phenotypic profile against all known diseases and genotypes. Get feedback in real time.
http://tinyurl.com/phenotypesufficiency
https://monarchinitiative.org/page/services
patientarchive
PROBLEMS SOLUTIONS
Problems with identifier design and provision result in link rot and content drift, compromising the flow and integrity of information.
Identifiers must resolve and, when referenced in the same context, must not collide. Prefixes play a critical role in these two goals; however, due to confusion and inconsistency about prefixes, a single identifier can be referenced in multiple different ways (12345, MGI:12345, MGI:MGI:12345, MGI:MGI_12345), complicating determinations of equivalence and data integration.
Moreover, prefixes used in the same context can conflict (e.g. GEO).
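The prefix variants quoted above can be reconciled with a small normalization step before identifiers are compared. The sketch below strips any repeated leading prefix and re-attaches it once. The function name `normalize_curie` and the choice of `PREFIX:local` as the canonical form are assumptions for illustration; a real registry may define a different canonical form.

```python
import re


def normalize_curie(raw: str, prefix: str) -> str:
    """Collapse variants such as 'MGI:MGI_12345' to a single 'MGI:12345'.

    The canonical 'PREFIX:local' form is an assumption of this sketch.
    """
    local = raw.strip()
    # Match the prefix followed by ':' or '_' at the start, ignoring case.
    leading = re.compile(r"^" + re.escape(prefix) + r"[:_]", re.IGNORECASE)
    # Strip the prefix repeatedly so doubled prefixes are removed too.
    while leading.match(local):
        local = leading.sub("", local, count=1)
    if not local:
        raise ValueError(f"no local identifier in {raw!r}")
    return f"{prefix}:{local}"


variants = ["12345", "MGI:12345", "MGI:MGI:12345", "MGI:MGI_12345"]
normalized = {normalize_curie(v, "MGI") for v in variants}
print(normalized)
```

Note that a bare `12345` can only be normalized when the caller already knows which prefix applies, which is exactly why unprefixed identifiers collide when sources are merged.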
Monarch is a key contributor to identifier standards for big data integration
10 Simple Rules for Design and Provision of Life Science Database Identifiers for the Web
Monarch is leading a community effort to coordinate prefixes between the eight active prefix registries
JDDCP
prefix commons
zenodo.org/record/31765
github.com/prefixcommons
Health Care & Life Sciences
w3.org/TR/hcls-dataset/
3,462 Mendelian diseases (OMIM) with no known genetic basis
47,964 variants (ClinVar) with no known diseases
1 Oregon Health & Science University; Portland, OR • 2 Lawrence Berkeley National Lab, Berkeley, CA • 3 University of Pittsburgh, Pittsburgh, PA • 4 University of California San Diego, San Diego, CA • 5 Garvan Institute, Sydney, Australia • 6 Sanger Center, Hinxton, UK • 7 Charite
From Model Mechanism to Precision Medicine: an Open Science Integrated Genotype-Phenotype Platform
Nicole Vasilevsky1, Nicole Washington2, Chuck Borromeo3, Matthew Brush1, Seth Carbon2, Michael Davis3, Nathan Dunn2, Mark Englestad1, Jeremy Espino3, Shahim Essaid1, Jeffrey Grethe4, Tudor Groza5, Harry Hochheiser3, Sebastian Köhler6, Suzanna Lewis2, Julie McMurry1, Craig McNamara5, Chris Mungall2, Jeremy Nguyen Xuan2, Peter Robinson7, Kent Shefchek1, Damian Smedley6, Zhou Yuan3, Edwin Zhang5, Melissa Haendel1,
[Figure: phenotype comparison between a human disease and a mouse model through common UPheno terms.]
Human disease: HADZISELIMOVIC SYNDROME
Mouse model: b2b1035Clo (aka Blue Meanie)
Mouse phenotypes:
- tricuspid valve atresia (MP:0006123)
- prenatal growth retardation (MP:0010865)
- persistent truncus arteriosus (MP:0002633)
- cleft palate (MP:0000111)
- duplex kidney (MP:0004017)
Human phenotypes:
- Ventricular hypertrophy (HP:0001714)
- High-arched palate (HP:0000156)
- Failure to thrive (HP:0001508)
- Pulmonary artery atresia (HP:0004935)
- Renal hypoplasia (HP:0000089)
Common (UPheno):
- abnormal kidney morphology
- abnormal palate morphology
- growth deficiency
- Malformation of the heart and great vessels
- abnormal heart and great artery attachment