Semantic WikiPathways
Andra Waagmeester
Department of Bioinformatics - BiGCaT 2
Overview
• Introduction to WikiPathways
• GPML
• Pathway to RDF conversion
• Vocabularies used
• Obtaining identifier URIs for pathways and pathway elements
• Example queries
• Identifier mapping
WikiPathways
• Curated by the community
• Built on top of the MediaWiki framework
• Live demo
GPML - GenMAPP Pathway Markup Language
• XML-based format
• XML representation of the GenMAPP MAPP format
• Recognised by WikiPathways, PathVisio and Cytoscape
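As a rough illustration of what an XML-based pathway format looks like to a program, the sketch below reads the data nodes of a small GPML-like document with the Python standard library. The sample document, the namespace URI and the element and attribute names (DataNode, Xref, TextLabel, Database, ID) are assumptions modelled on the GPML 2010a schema; check them against real GPML files before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the GPML 2010a schema.
GPML_NS = "http://genmapp.org/GPML/2010a"

# Hypothetical sample document, written for this sketch only.
SAMPLE_GPML = f"""
<Pathway xmlns="{GPML_NS}" Name="Example pathway" Organism="Homo sapiens">
  <DataNode TextLabel="TP53" GraphId="a1" Type="GeneProduct">
    <Xref Database="Entrez Gene" ID="7157"/>
  </DataNode>
  <DataNode TextLabel="ethanol" GraphId="a2" Type="Metabolite">
    <Xref Database="ChEBI" ID="16236"/>
  </DataNode>
</Pathway>
""".strip()


def read_data_nodes(gpml_text):
    """Return pathway name, organism and one dict per DataNode element."""
    root = ET.fromstring(gpml_text)
    ns = {"gpml": GPML_NS}
    nodes = []
    for node in root.findall("gpml:DataNode", ns):
        xref = node.find("gpml:Xref", ns)
        nodes.append({
            "label": node.get("TextLabel"),
            "type": node.get("Type"),
            # A data node may lack an external reference.
            "database": xref.get("Database") if xref is not None else None,
            "id": xref.get("ID") if xref is not None else None,
        })
    return {
        "name": root.get("Name"),
        "organism": root.get("Organism"),
        "nodes": nodes,
    }


pathway = read_data_nodes(SAMPLE_GPML)
for entry in pathway["nodes"]:
    print(entry["label"], entry["database"], entry["id"])
```

Each data node carries a textual cross-reference (database name plus identifier), which is the raw material for the RDF conversion.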
Pathway to RDF conversion
WP to RDF conversion
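A minimal sketch of what such a conversion could emit for a single pathway element, assuming the predicates that appear in the example queries of this deck (dcterms:isPartOf, dc:source, dc:identifier). The example.org URIs and the function names are hypothetical; this is not the actual WikiPathways converter.

```python
# Turtle prefixes for the Dublin Core vocabularies used below.
PREFIXES = (
    "@prefix dc: <http://purl.org/dc/elements/1.1/> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def data_node_to_turtle(node_uri, pathway_uri, database, identifier):
    """Serialise one pathway element as a small Turtle fragment."""
    return (
        f"<{node_uri}>\n"
        f"    dcterms:isPartOf <{pathway_uri}> ;\n"
        f'    dc:source "{escape_literal(database)}" ;\n'
        f'    dc:identifier "{escape_literal(identifier)}" .\n'
    )


# Hypothetical URIs, chosen only for illustration.
fragment = data_node_to_turtle(
    "http://example.org/node/a1",
    "http://example.org/pathway/WP1",
    "Entrez Gene",
    "7157",
)
print(PREFIXES + fragment)
```

Writing triples as plain strings keeps the sketch dependency-free; a real converter would more likely use an RDF library to handle escaping and serialisation.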
Vocabularies used
• Internal (vocabularies.wikipathways.org)
  – GPML
  – WP
• External
  – BioPAX 3
  – BIBO
  – FOAF
  – tbd
Identifiers.org
• Provides resolvable, persistent URIs
• Based on MIRIAM (http://www.ebi.ac.uk/miriam)
• Textual pathway identifiers (xrefs) are converted to identifiers.org URIs through the MIRIAM API
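The xref-to-URI step can be sketched as follows. The slides do the lookup through the MIRIAM API; here a small hard-coded table stands in for that call, so the table, its namespace values and the function name are assumptions for illustration only.

```python
from urllib.parse import quote

# Hypothetical stand-in for the MIRIAM lookup: data source name as it
# appears in a pathway xref -> identifiers.org namespace (assumed values).
MIRIAM_NAMESPACES = {
    "Entrez Gene": "ncbigene",
    "Ensembl": "ensembl",
    "HMDB": "hmdb",
}


def to_identifiers_org(database, identifier):
    """Build an identifiers.org URI, or return None for unknown sources."""
    namespace = MIRIAM_NAMESPACES.get(database)
    if namespace is None:
        return None
    # Percent-encode the identifier but keep ':' readable.
    return f"http://identifiers.org/{namespace}/{quote(identifier, safe=':')}"


print(to_identifiers_org("Entrez Gene", "7157"))
print(to_identifiers_org("Unknown source", "42"))
```

Returning None for unmapped data sources leaves it to the caller to decide whether to skip the xref or keep it as a plain literal.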
Example query (1)
List pathways and their species
SELECT DISTINCT ?organism ?label
WHERE {
?concept wp:organism ?organism .
?organism rdfs:label ?label .
}
http://sparqlbin.com/#4f91f23a032a87cc46c5a526a6da673a
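A query like this one can also be sent from a script using the standard SPARQL protocol (HTTP GET with a query parameter, JSON results). The sketch below uses only the Python standard library. The endpoint URL is taken from the resources listed at the end of this deck, but its exact path, and the wp: namespace URI in the PREFIX line, are assumptions.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint location; adjust the path if the server expects one.
ENDPOINT = "http://sparql.wikipathways.org/"

# The wp: namespace URI below is an assumption.
QUERY = """
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?organism ?label
WHERE {
  ?concept wp:organism ?organism .
  ?organism rdfs:label ?label .
}
"""


def build_request(endpoint, query):
    """Prepare a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def rows_from_results(results):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query and return its rows (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return rows_from_results(json.load(response))
```

Calling run_query(ENDPOINT, QUERY) would return one dict per result row, keyed by variable name; it is not called here because it depends on the endpoint being reachable.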
Example query (2)
List all data sources currently captured in WikiPathways and count
the number of entries per data source:
SELECT ?datasource (COUNT(?datasource) AS ?numberEntries)
WHERE {
?concept dc:source ?datasource .
}
GROUP BY ?datasource
ORDER BY DESC(?numberEntries)
http://sparqlbin.com/#a86569fa8a5ae4b004fdf2432b3ce98c
Example query (3)
Extract the number of pathways edited per contributor:
SELECT ?contributor (COUNT(?pathway) AS ?pathwaysEdited)
WHERE {
?pathway dc:contributor ?contributor .
}
GROUP BY ?contributor
ORDER BY DESC(?pathwaysEdited)
Example query (n)
Extracting InChI out of ChEMBL
SELECT ?concept ?inchi ?pathway ?identifier
WHERE {
?concept dcterms:isPartOf ?pathway .
?concept dc:source "ChEMBL compound"^^xsd:string .
?concept dc:identifier ?identifier .
SERVICE <http://rdf.farmbio.uu.se/chembl/sparql>{
?s rdfs:label ?chidentifier .
?s <http://www.blueobelisk.org/chemistryblogs/inchi> ?inchi .}
FILTER (xsd:string(?chidentifier)= xsd:string(?identifier)) .
}
Federated query
• Requires unified URIs
• Identifier mapping
Identifier mapping
• Data enrichment
• Query expansion
• Unified identifier conversion
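Query expansion can be sketched as follows: a single identifier URI is replaced by the set of URIs known to denote the same entity, and the set is injected into the query as a SPARQL 1.1 VALUES clause. The equivalence table is a hypothetical stand-in for a mapping service such as BridgeDb, and the function names are illustrative only.

```python
# Hypothetical equivalence table; a mapping service would supply this.
EQUIVALENT_URIS = {
    "http://identifiers.org/ncbigene/7157": [
        "http://identifiers.org/ensembl/ENSG00000141510",
    ],
}


def expand_uris(uri, equivalents):
    """Return the URI itself followed by its known equivalents, no repeats."""
    expanded = [uri]
    for other in equivalents.get(uri, []):
        if other not in expanded:
            expanded.append(other)
    return expanded


def values_clause(variable, uris):
    """Render a SPARQL 1.1 VALUES clause binding ?variable to each URI."""
    joined = " ".join(f"<{u}>" for u in uris)
    return f"VALUES ?{variable} {{ {joined} }}"


uris = expand_uris("http://identifiers.org/ncbigene/7157", EQUIVALENT_URIS)
print(values_clause("concept", uris))
```

The resulting clause can be placed inside the WHERE block of a query, so that a pattern on ?concept matches whichever of the equivalent URIs a data set happens to use.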
Data enrichment
Unified Identifiers
Query expansion
Acknowledgements
• The Innovative Medicines Initiative Joint Undertaking (grant n° 115191, FP7/2007-2013)
• The NIH National Institute of General Medical Sciences (R01-GM100039)
Acknowledgement
Gladstone Institute
• Alex Pico
Maastricht University
• Martina Kutmon
• Egon Willighagen
• Chris Evelo
Manchester University
• Alasdair Gray
• Christian Brenninkmeijer
Resources:
• WikiPathways: www.wikipathways.org
• Pathvisio: www.pathvisio.org
• RDF: rdf.wikipathways.org
• SPARQL: sparql.wikipathways.org
• Vocabularies: vocabularies.wikipathways.org
• BridgeDb: www.bridgeDb.org
• IMS: ondex2.cs.man.ac.uk:9090/QueryExpander/