
Comprehensive Self-Service Life Science Data Federation with SADI Semantic Web Services and HYDRA

Date post: 12-Apr-2017
Upload: alexandre-riazanov
COMPREHENSIVE SELF-SERVICE LIFE SCIENCE DATA FEDERATION WITH SADI SEMANTIC WEB SERVICES AND HYDRA Alexandre Riazanov, CTO IPSNP Computing Inc Oslo University, Sep 23, 2015

COMPREHENSIVE SELF-SERVICE

LIFE SCIENCE DATA FEDERATION

WITH SADI SEMANTIC WEB SERVICES

AND HYDRA

Alexandre Riazanov, CTO, IPSNP Computing Inc

Oslo University, Sep 23, 2015


WHO WE ARE

• IPSNP Computing Inc -- a Canadian startup, building on and commercializing prior academic research on SADI.

• Founded to develop an industrial-strength query tool for SADI, to supersede a research proof-of-concept prototype.

• Looking for customers/partners and investors.


BIOMEDICAL RESEARCHERS AND CLINICIANS USE DATA FROM MULTIPLE SOURCES

• Online and in-house databases, spreadsheets.

• Web services, e.g., literature search.

• Nomenclatures, ontologies, controlled vocabularies.

• Web sites, scientific publications, patents, etc.

• Algorithms, e.g., BLAST, molecular structure prediction, various text mining programs.


BIG VISION: FEDERATED QUERYING OF HETEROGENEOUS AND DISTRIBUTED DATA SOURCES

• We want to query 1000s of data sources as a single database.

• We want more agility than data warehousing can provide: e.g., just-in-time algorithm execution, plug-and-play data source addition, live data querying.

• We want to use simple, declarative queries rather than program workflow scripts.


IS THIS SCI-FI?


WE CAN ACTUALLY DO THIS WITH SEMANTIC WEB SERVICES

Here is how our data federation engine HYDRA works:


HOW IS THIS ALL POSSIBLE?

• Key ingredient: the SADI framework for Semantic Web services (Semantic Automated Discovery and Integration).

• SADI services are:

– RESTful services,

– consuming and producing one format -- RDF,

– with semantic descriptions (in OWL) fully defining their functionality.


PLAN OF THE TALK

• What are SADI services?

• Automatic service discovery and invocation in query engines (HYDRA).

• Self-service querying vision.

• Query composition with HYDRA GUI.

• An overview of Bioinformatics and Clinical Intelligence case studies.

Tons of screenshots!


SADI SERVICE I/O

• Input: RDF description of an input object.

• Output: another RDF graph providing more (computed or retrieved) info about the input object or linking it to other objects.

• Since all SADI services “talk the same language” (RDF), they are 100% syntactically interoperable:

– output of one SADI service can be directly consumed by any other SADI service.

“Describe your input, and I will tell you something else about it.”
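
The input/output contract on this slide can be illustrated with a minimal Python sketch. It models an RDF graph as a set of (subject, predicate, object) triples and a service as a function from one graph to another. The two services, the `ex:` property names, the document ID and the `example.org` URL are hypothetical placeholders, not real SADI services; real SADI services exchange serialised RDF over HTTP.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]          # (subject, predicate, object)
Graph = Set[Triple]
Service = Callable[[Graph], Graph]


def pubmed_search(graph: Graph) -> Graph:
    """Hypothetical service: link each keyword query to a matching document."""
    out: Graph = set()
    for subject, predicate, _keyword in graph:
        if predicate == "ex:hasKeyword":
            # Placeholder result; a real service would call a search API here.
            out.add((subject, "ex:hasResult", "ex:doc1"))
            out.add(("ex:doc1", "ex:hasPubMedId", "0000001"))
    return out


def pdf_locator(graph: Graph) -> Graph:
    """Hypothetical service: attach a PDF URL to every node with a PubMed ID."""
    out: Graph = set()
    for subject, predicate, pmid in graph:
        if predicate == "ex:hasPubMedId":
            out.add((subject, "ex:hasPdfUrl", f"http://example.org/pdf/{pmid}"))
    return out


def chain(graph: Graph, services: List[Service]) -> Graph:
    """Pass the accumulated graph to each service in turn and merge its output."""
    result = set(graph)
    for service in services:
        result |= service(result)
    return result


query: Graph = {("ex:query1", "ex:hasKeyword", "haloalkane dehalogenase activity")}
merged = chain(query, [pubmed_search, pdf_locator])
for triple in sorted(merged):
    print(triple)
```

Because both functions read and write the same triple representation, the output of the first can be handed to the second without any adapter code, which is the point of the "same language" bullet.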


COMPLETE SEMANTIC DESCRIPTIONS OF SERVICE FUNCTIONALITY

• SADI services carry semantic descriptions of their I/O that completely define what the service expects and can accept as input, and what RDF assertions the service can output.

• Unique and extremely powerful property: it facilitates completely automatic discovery and orchestration of services.


HYDRA QUERY ENGINE

● Given a SPARQL query, HYDRA analyses it by using an intelligent logic-based algorithm (proprietary, unlike SADI itself).

● HYDRA requests descriptions of potentially useful services from available SADI service registries.

● HYDRA processes the descriptions and figures out which services have to be invoked, on what data and in what order.

SPARQL is a W3C standard semantic query language -- much more intuitive than SQL.


QUERY EXAMPLE

• Find documents mentioning "haloalkane dehalogenase activity", extract information about mutations and visualise the mutations on 3D protein structure images.

• HYDRA automatically finds and orchestrates 5 services from our registry:

– PubMed search: keyword query ⟶ document PubMed IDs

– PDF retrieval: PubMed ID ⟶ PDF file URL

– ASCII extraction: PDF file ⟶ ASCII text

– Text mining: ASCII text ⟶ mutation info

– Visualisation: mutation & protein ⟶ 3D image (Jmol)
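
To make the idea of automatic orchestration concrete, here is a naive forward-chaining sketch in Python. It is not HYDRA's algorithm, which is proprietary and not described in these slides. Each of the five services above is reduced to a single input type and a single output type (the visualisation service's second input is dropped), whereas real SADI descriptions are OWL classes. The type labels and the `plan` function are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Service name -> (input type, output type). Simplified from the slide;
# deliberately listed out of order to show that the order is derived.
REGISTRY: Dict[str, Tuple[str, str]] = {
    "Visualisation": ("mutation info", "3D image"),
    "Text mining": ("ASCII text", "mutation info"),
    "PubMed search": ("keyword query", "PubMed ID"),
    "ASCII extraction": ("PDF file", "ASCII text"),
    "PDF retrieval": ("PubMed ID", "PDF file"),
}


def plan(available: str, goal: str,
         registry: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return an invocation order that derives `goal` from `available`.

    Naive forward chaining: repeatedly apply every service whose input type
    is already known. It may schedule services that are not strictly needed
    for the goal; a real planner would prune those.
    """
    known = {available}
    order: List[str] = []
    progress = True
    while goal not in known and progress:
        progress = False
        for name, (needs, gives) in registry.items():
            if name not in order and needs in known and gives not in known:
                order.append(name)
                known.add(gives)
                progress = True
    if goal not in known:
        raise ValueError(f"no chain of services produces {goal!r}")
    return order


steps = plan("keyword query", "3D image", REGISTRY)
print(" -> ".join(steps))
```

The planner only looks at what each service declares about its input and output, which is why complete semantic descriptions are a precondition for this kind of automation.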


RESULTS

Deploying mutation impact text-mining software with the SADI Semantic Web Services framework

http://www.biomedcentral.com/qc/1471-2105/12/S4/S6


WHAT IS SO COOL ABOUT IT?

• Data federation at its best:

– independent, heterogeneous data sources (PubMed doc search, PubMed Central for PDFs);

– not only data is integrated: ASCII extraction, text mining and 3D visualisation are algorithms!

• Execution is completely automatic: HYDRA finds and invokes the services without any help from the user.


MORE QUERY EXAMPLES

• Find drug products that contain active ingredient X.

• Find drugs that have been studied in clinical trials targeting infections caused by bacteria X.

• Annotate a DNA sequence X with molecular functions of proteins produced by the corresponding gene.

• Find patients with precondition X diagnosed with infections Y resulting from procedure Z.

• Many other questions that life scientists and clinicians ask on a daily basis.


IT’S ONLY ½ OF THE STORY


REMEMBER THE BIG VISION?


HERE IS AN EVEN BIGGER VISION: Self-service ad hoc querying of federated data.


HYDRA IMPLEMENTS SEMANTIC QUERYING

• Users need not know how the source data is organised or accessed.

• They just need to know the terminology of their subject domain.

• Queries are completely declarative: specify what you want to find, not how.


HYDRA ALSO SUPPORTS CONCEPT HIERARCHIES AND RULES

● Some queries would be too complex if we could not exploit generality:

o a query concerning all antibiotics requires generalisation, otherwise all types of antibiotics would have to be enumerated in the query.

● A much better way to do this is to import a classification of drugs and use it in query execution.

● HYDRA facilitates such reasoning and even more complex reasoning with rules.
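
The antibiotics example can be sketched in Python as a transitive closure over a subclass relation. The tiny classification below is a toy stand-in for an imported drug classification, and `descendants` and `instances_of` are hypothetical helper names; this illustrates the generalisation step only and is not HYDRA's reasoner.

```python
from typing import Dict, List, Set

# Child class -> parent classes (toy classification for illustration only).
SUBCLASS_OF: Dict[str, List[str]] = {
    "Penicillin": ["BetaLactam"],
    "Cephalosporin": ["BetaLactam"],
    "BetaLactam": ["Antibiotic"],
    "Macrolide": ["Antibiotic"],
    "Antibiotic": ["Drug"],
    "Statin": ["Drug"],
}

# Instance -> its asserted (most specific) class.
DRUG_TYPES: Dict[str, str] = {
    "amoxicillin": "Penicillin",
    "erythromycin": "Macrolide",
    "atorvastatin": "Statin",
}


def descendants(cls: str, subclass_of: Dict[str, List[str]]) -> Set[str]:
    """Return `cls` plus every class that is transitively a subclass of it."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parents in subclass_of.items():
            if child not in result and any(p in result for p in parents):
                result.add(child)
                changed = True
    return result


def instances_of(cls: str) -> List[str]:
    """Answer 'find all X' without enumerating the subclasses of X in the query."""
    classes = descendants(cls, SUBCLASS_OF)
    return sorted(drug for drug, c in DRUG_TYPES.items() if c in classes)


print(instances_of("Antibiotic"))
```

The query names only the general concept; the classification supplies the specific types, so new subclasses can be added without changing any query.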


THERE ARE NO OBSTACLES IN PRINCIPLE TO SELF-SERVICE QUERYING

We just need an adequate user interface for building queries.


HYDRA QUERY TOOL = ENGINE + GUI


QUERY COMPOSITION

Queries are built from “Google-like” keyphrases:

Keyphrase: “document mentions protein “P22607””


A QUERY GRAPH IS GENERATED FOR THE KEYPHRASE

“document mentions protein “P22607””


ADDING ANOTHER KEYPHRASE

Keyphrase: “has pubmed id”


QUERY GRAPH IS EXTENDED WITH NODES CORRESPONDING TO THE SECOND KEYPHRASE

Keyphrase: “has pubmed id”

Keyphrase: “document mentions protein “P22607””


OPTION 2: MANUALLY ADD/DELETE CLASSES, INCOMING AND OUTGOING PROPERTIES


MANUALLY ADDED PROPERTY


FINISHED QUERY: FIND PUBMED IDS OF DOCUMENTS MENTIONING PROTEIN P22607 AND CO-MENTIONED PROTEINS


SERVICES IN THE REGISTRY


SPARQL GENERATION
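
The screenshot for this slide is not part of the transcript. As a rough illustration of the step, the Python sketch below serialises a query graph (a list of edges between variables and literals) into SPARQL text for the finished query about protein P22607. The `ex:` prefix, its namespace URI and the property names are hypothetical; the vocabulary and the SPARQL that HYDRA actually generates are not shown in the source.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object) as SPARQL terms


def to_sparql(edges: List[Edge], select: List[str],
              prefix_uri: str = "http://example.org/vocab#") -> str:
    """Turn query-graph edges into a SELECT query with one triple pattern per edge."""
    lines = [
        f"PREFIX ex: <{prefix_uri}>",
        "SELECT " + " ".join(select),
        "WHERE {",
    ]
    for subject, predicate, obj in edges:
        lines.append(f"  {subject} {predicate} {obj} .")
    lines.append("}")
    return "\n".join(lines)


# Query graph for: PubMed IDs of documents mentioning protein P22607,
# together with the other proteins mentioned in the same documents.
QUERY_GRAPH: List[Edge] = [
    ("?doc", "ex:mentions", "?protein"),
    ("?protein", "ex:hasAccession", '"P22607"'),   # hypothetical property
    ("?doc", "ex:hasPubMedId", "?pmid"),
    ("?doc", "ex:mentions", "?other"),
]

sparql = to_sparql(QUERY_GRAPH, ["?pmid", "?other"])
print(sparql)
```

Each node added in the GUI corresponds to a variable or literal, and each edge to one triple pattern, so the mapping from graph to query text is mechanical.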


QUERY EXECUTION WITH THE HYDRA ENGINE


EXPORTED RESULTS IN AN EXCEL SPREADSHEET


SADI AND HYDRA QUERY TOOL AT WORK


BIOINFORMATICS AND CHEMINFORMATICS CASE STUDIES AND PILOTS WITH SADI AND HYDRA

• Integrating genomics text mining results with online biomedical data and visualisation algorithms.

• Integrating programs for lipid molecule structural analysis and classification.

• Interpreting toxicity experiment data by discovering relevant info in online databases.

• Large-scale retrieval of toxicity information from publications.


INTERPRETING TOXICITY EXPERIMENT DATA

• Partner: university lab studying effects of environmental pollutants.

• Querying needs: finding relevant prior experiments, gene annotation, protein domain annotation, etc.

• Data sources: ArrayExpress, BLAST, HMMER3, RefSeq, Pfam, ORFPredictor, GO, UniProt, NCBI Taxonomy -- all queried as a single DB!


SUBTASK: DNA MICROARRAY ANNOTATION

• Toxicity experiments with microarrays: which DNA sequences are under- or overexpressed after an organism’s exposure to toxin X?

• Interpretation requires knowing affected protein functions and domains.

• HYDRA virtually implements this workflow:


RETRIEVAL OF TOXICITY DATA FROM PUBLICATIONS

• Customer: government agency (Canada).

• Querying needs: online publication search by organism and chemical types, text-mining for toxicity data.

• Data sources: NCBI Taxonomy and ChEBI with free-text search, PubMed search, electronic libraries, journal Web sites, Google Scholar, specialised text-mining algorithm, text utilities.

Apparent value: some queries save a postdoc many weeks of work.


CLASSIFYING NEW LIPID MOLECULES

• One of the early experiments with SADI.

• A group at Carleton U. had a program for identifying functional groups in a molecule structure.

• A group at the U. of New Brunswick had a classifier estimating lipid classes based on presence/absence of functional groups.

• Publishing the prototypes as SADI services allowed us to integrate them with each other and relevant external resources.


CLINICAL IT CASE STUDIES AND PILOTS WITH SADI AND HYDRA

• Ad hoc querying of clinical data for Hospital-Acquired Infections surveillance and research (with UNB, McGill SoM and Ottawa H.).

• Ongoing pilot with a US hospital.

• Looking for pilot opportunities for Clinical Trial Cohort selection:

– trial eligibility criteria can be implemented as queries over heterogeneous and distributed clinical data;

– benefits: cost reduction and timely alerts.


THANK YOU!

Further materials/services are available on request:

• Live and recorded demos.

• Publications on previous (academic) case studies.

• Training/consulting.

• http://ipsnp.com/ (Canada) and http://ipsnp.co/ (UK)

