
Practical semantics in the pharmaceutical industry - the Open PHACTS project

Date post: 10-May-2015
Description:
The information revolution has transformed many business sectors over the last decade and the pharmaceutical industry is no exception. Developments in scientific and information technologies have unleashed an avalanche of content on research scientists, who are struggling to access and filter it efficiently. Furthermore, this domain has traditionally suffered from a lack of standards in how entities, processes and experimental results are described, leading to difficulties in determining whether results from two different sources can be reliably compared. The need to transform the way the life-science industry uses information has led to new thinking about how companies should work beyond their firewalls. In this talk we will provide an overview of the traditional approaches major pharmaceutical companies have taken to knowledge management and describe the business reasons why pre-competitive, cross-industry and public-private partnerships have gained much traction in recent years. We will consider the scientific challenges concerning the integration of biomedical knowledge, highlighting the complexities in representing everyday scientific objects in computerised form. This leads us to discuss how the semantic web might lead us to a long-overdue solution. The talk will be illustrated by focusing on the EU-Open PHACTS initiative (openphacts.org), established to provide a unique public-private infrastructure for pharmaceutical discovery. We will describe the aims of this work and how technologies such as just-in-time identity resolution, nanopublication and interactive visualisations are helping to build a powerful software platform designed to appeal directly to scientific users across the public and private sectors.
Transcript
Page 1: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Practical semantics in the pharmaceutical industry - the Open PHACTS project

Antony Williams

On behalf of the Open PHACTS Team

(and with a focus on Chemistry!)

Page 2: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Fundamental issue:

There is a LOT of science online!

Chaotic, varying quality and very valuable!

Scientists want to find information quickly and easily

Often they just “can’t get there” (or don’t even know where “there” is)

And you have to manage it all (or not)

Page 3: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Pre-competitive Informatics: Pharma are all accessing, processing, storing & re-processing external research data

[Diagram labels: Literature; PubChem; Genbank; Patents; Databases; Downloads; Data Integration; Data Analysis; Firewalled Databases; Repeat @ each company x]

Lowering industry firewalls: pre-competitive informatics in drug discovery Nature Reviews Drug Discovery (2009) 8, 701-708 doi:10.1038/nrd2944

Page 4: Practical semantics in the pharmaceutical industry - the Open PHACTS project

The Project

Innovative Medicines Initiative
• EC funded public-private partnership for pharmaceutical research
• Focus on key problems
  – Efficacy, Safety, Education & Training, Knowledge Management

The Open PHACTS Project
• Create a semantic integration hub (“Open Pharmacological Space”)…
• Delivering services to support on-going drug discovery programs in pharma and public domain
• Not just another project; leading academics in semantics, pharmacology and informatics, driven by solid industry business requirements
• 23 academic partners, 8 pharmaceutical companies, 3 biotechs INITIALLY
• Work split into clusters:
  • Technical Build
  • Scientific Drive
  • Community & Sustainability

Page 5: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Major Work Streams

Build: OPS service layer and resource integration

Drive: Development of exemplar work packages & Applications

Sustain: Community engagement and long-term sustainability

[Diagram labels: OPS Service Layer (Assertion & Meta Data Mgmt; Transform / Translate; Integrator; Business Rules); ‘Consumer’ Firewall; Supplier Firewall; Corpus 1; Db 2; Db 3; Db 4; Corpus 5; Std Public Vocabularies; Target Dossier; Compound Dossier; Pharmacological Networks]

Work Stream 1: Open Pharmacological Space (OPS) Service Layer
Standardised software layer to allow public DD resource integration
− Define standards and construct OPS service layer
− Develop interface (API) for data access, integration and analysis
− Develop secure access models

Existing Drug Discovery (DD) Resource Integration

Work Stream 2: Exemplar Drug Discovery Informatics tools
Develop exemplar services to test OPS Service Layer
− Target Dossier (Data Integration)
− Pharmacological Network Navigator (Data Visualisation)
− Compound Dossier (Data Analysis)

Page 6: Practical semantics in the pharmaceutical industry - the Open PHACTS project

ChEMBL

DrugBank

Gene Ontology

WikiPathways

UniProt

ChemSpider

UMLS

ConceptWiki

ChEBI

TrialTrove

GVKBio

GeneGo

TR Integrity

“Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”

“What is the selectivity profile of known p38 inhibitors?”

“Let me compare MW, logP and PSA for known oxidoreductase inhibitors”

Page 7: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Business Question Driven Approach

Number | sum | Nr of 1 | Question
15 | 12 | 9 | All oxidoreductase inhibitors active <100nM in both human and mouse
18 | 14 | 8 | Given compound X, what is its predicted secondary pharmacology? What are the on- and off-target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound?
24 | 13 | 8 | Given a target find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives.
32 | 13 | 8 | For a given interaction profile, give me compounds similar to it.
37 | 13 | 8 | The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.
38 | 13 | 8 | Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not).
41 | 13 | 8 | A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? i.e. return all cmpds active in assays where the resolution is at least at the level of the target family (i.e. PKC) both from structured assay databases and the literature.
44 | 13 | 8 | Give me all active compounds on a given target with the relevant assay data
46 | 13 | 8 | Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease)
59 | 14 | 8 | Identify all known protein-protein interaction inhibitors

Page 8: Practical semantics in the pharmaceutical industry - the Open PHACTS project
Page 9: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Open PHACTS Scientific Services

Platform Explorer

Standards

Apps

API

“Provenance Everywhere”

Page 10: Practical semantics in the pharmaceutical industry - the Open PHACTS project

[Architecture diagram. Recoverable labels:]

Apps

Linked Data API (RDF/XML, TTL, JSON)

Domain Specific Services

Core Platform

Semantic Workflow Engine

Data Cache (Virtuoso Triple Store)

Identity Resolution Service

Identifier Management Service

Chemistry Registration, Normalisation & Q/C

Indexing

Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”

Data sources (RDF, Nanopub or Db, each with VoID): Public Content, Commercial, Public Ontologies, User Annotations

Page 11: Practical semantics in the pharmaceutical industry - the Open PHACTS project
Page 12: Practical semantics in the pharmaceutical industry - the Open PHACTS project

RDF/VoID

RDF (Resource Description Framework)

VoID (Vocabulary of Interlinked Datasets)
– Metadata describing the RDF
– Describes how Datasets are linked using Linksets
  • skos:exactMatch (Simple Knowledge Organisation System). E.g. to link compounds in OPS with compounds in ChEBI.
  • skos:closeMatch. E.g. to link Stereo Insensitive Parents to their Children within OPS.
  • skos:relatedMatch. E.g. to link Parent compounds that contain others as Fragments.
  • dul:expresses (DOLCE+DnS Ultralite): describes what links the Datasets. We use Cheminf to express the links, e.g. http://semanticscience.org/resource/CHEMINF_000059 represents an InChIKey.
– Recommendations on how to create the VoID have been specified by Manchester here: http://www.cs.man.ac.uk/~graya/ops/2012/ED-datadesc/
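As a rough illustration of the linkset metadata described on this slide, the following Python sketch emits a VoID linkset description in Turtle. The linkset and dataset URIs under example.org and the helper name `linkset_turtle` are hypothetical; the `void:`, `skos:` and `dul:` terms and the CHEMINF URI are the ones named on the slide. It is not the project's actual VoID generator.

```python
# Namespace prefixes for the vocabularies named on the slide.
PREFIXES = "\n".join([
    "@prefix void: <http://rdfs.org/ns/void#> .",
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
    "@prefix dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#> .",
])


def linkset_turtle(linkset, subjects, objects, predicate, expresses):
    """Return a Turtle description of one VoID linkset (illustrative helper)."""
    body = "\n".join([
        f"<{linkset}> a void:Linkset ;",
        f"    void:subjectsTarget <{subjects}> ;",   # dataset the link subjects live in
        f"    void:objectsTarget <{objects}> ;",     # dataset the link objects live in
        f"    void:linkPredicate {predicate} ;",     # e.g. skos:exactMatch
        f"    dul:expresses <{expresses}> .",        # what the link is based on
    ])
    return PREFIXES + "\n\n" + body


# Hypothetical example: OPS compounds linked to ChEBI compounds via InChIKey.
example = linkset_turtle(
    linkset="http://example.org/linkset/ops-chebi",
    subjects="http://example.org/dataset/ops",
    objects="http://example.org/dataset/chebi",
    predicate="skos:exactMatch",
    expresses="http://semanticscience.org/resource/CHEMINF_000059",
)
print(example)
```

Keeping the link predicate and the `dul:expresses` value in the metadata lets a consumer decide, per linkset, how much to trust a mapping before using it.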

Page 13: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Chemistry Registration, Normalisation & Q/C

Chemistry Registration

• Old chemistry registration system uses the standard ChemSpider deposition system: includes low-level structure validation and manual curation service by RSC staff.
• New Registration System
  • Utilizes ChemSpider Validation and Standardization platform, including collapsing tautomers
  • Utilizes FDA rule set as basis for standardizations
  • Generates Open PHACTS identifier (OPS ID)

Page 14: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Quantitative Data Challenges

STANDARD_TYPE    UNIT_COUNT
---------------- ----------
AC50                      7
Activity                421
EC50                     39
IC50                     46
ID50                     42
Ki                       23
Log IC50                  4
Log Ki                    7
Potency                  11
log IC50                  0

STANDARD_TYPE      STANDARD_UNITS     COUNT(*)
------------------ ------------------ --------
IC50               nM                   829448
IC50               ug.mL-1               41000
IC50                                     38521
IC50               ug/ml                  2038
IC50               ug ml-1                 509
IC50               mg kg-1                 295
IC50               molar ratio             178
IC50               ug                      117
IC50               %                       113
IC50               uM well-1                52

~ 100 units

>5000 types

Implemented using the Quantities, Units, Dimensions and Types ontology (QUDT, http://www.qudt.org/)
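To make the unit problem concrete, here is a minimal Python sketch that normalises a few of the IC50 unit spellings listed above to nM. The conversion tables and the function name `to_nanomolar` are illustrative assumptions, not the QUDT-based implementation used by the project; mass-per-volume units need a molecular weight, and units such as "%" or "mg kg-1" are deliberately left unconverted.

```python
# Conversion factors from molar concentration units to nanomolar (nM).
MOLAR_TO_NM = {"M": 1e9, "mM": 1e6, "uM": 1e3, "nM": 1.0, "pM": 1e-3}

# Spelling variants of micrograms per millilitre, expressed in grams per litre.
MASS_PER_VOLUME_TO_G_PER_L = {"ug.mL-1": 1e-3, "ug/ml": 1e-3, "ug ml-1": 1e-3}


def to_nanomolar(value, unit, mol_weight=None):
    """Convert an activity value to nM, or return None if it cannot be converted.

    mol_weight is the molecular weight in g/mol, required for mass-based units.
    """
    if unit in MOLAR_TO_NM:
        return value * MOLAR_TO_NM[unit]
    if unit in MASS_PER_VOLUME_TO_G_PER_L and mol_weight:
        grams_per_litre = value * MASS_PER_VOLUME_TO_G_PER_L[unit]
        return grams_per_litre / mol_weight * 1e9  # mol/L -> nM
    return None  # e.g. "%", "mg kg-1", "molar ratio": not a concentration


for value, unit, mw in [(1.0, "uM", None), (0.5, "ug/ml", 500.0), (10, "%", None)]:
    print(value, unit, "->", to_nanomolar(value, unit, mw))
```

Returning `None` for units that are not concentrations keeps incomparable values out of downstream potency filters rather than silently coercing them.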

Page 15: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Content Changes Regularly! POINT IN TIME

Source | Initial Records | Triples | Properties
ChEMBL | 1,149,792 (~1,091,462 cmpds, ~8845 targets) | 146,079,194 | 17 cmpds, 13 targets
DrugBank | 19,628 (~14,000 drugs, ~5000 targets) | 517,584 | 74
UniProt | 536,789 | 156,569,764 | 78
ENZYME | 6,187 | 73,838 | 2
ChEBI | 35,584 | 905,189 | 2
GO/GOA | 38,137 | 24,574,774 | 42
ChemSpider/ACD | 1,194,437 | 161,336,857 | 22 ACD, 4 CS
ConceptWiki | 2,828,966 | 3,739,884 | 1

Page 16: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Data Cache (Virtuoso Triple Store)

Semantic Workflow Engine

Infrastructure

Hardware (development)
- 2 x Intel Xeon E5-2640
- 384 GB DDR3 1333MHz RAM
- 1.5 TB SSD
- 3TB 7200rpm

Triple Store
- Virtuoso 7 column store
- Shown to scale to > 100 billion triples

Network
- AMX-IS
- Extensive memcache

Page 17: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Antony Williams vs Identifiers

Passport ID

Dad, Tony, others

SSN

Green Card

License

5 email addresses

ChemConnector (blog, Twitter account, Facebook, Friendfeed)

OpenID, ORCID….

Page 18: Practical semantics in the pharmaceutical industry - the Open PHACTS project
Page 19: Practical semantics in the pharmaceutical industry - the Open PHACTS project
Page 20: Practical semantics in the pharmaceutical industry - the Open PHACTS project
Page 21: Practical semantics in the pharmaceutical industry - the Open PHACTS project
Page 22: Practical semantics in the pharmaceutical industry - the Open PHACTS project

P12047

X31045

GB:29384

RS_2353

Let a Mapping Service take the strain….
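One way to picture such a mapping service is as a lookup over groups of equivalent identifiers. The sketch below uses a small union-find structure in Python; the class name `MappingService`, its methods, and the assumption that the four identifiers on the slide all refer to the same entity are purely illustrative, and this is not the Open PHACTS Identity Mapping Service itself.

```python
class MappingService:
    """Toy identifier mapping service backed by a union-find structure."""

    def __init__(self):
        self._parent = {}

    def _find(self, ident):
        """Return the representative identifier of ident's equivalence group."""
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited identifier straight at the root.
        while self._parent[ident] != root:
            next_ident = self._parent[ident]
            self._parent[ident] = root
            ident = next_ident
        return root

    def add_mapping(self, a, b):
        """Record that identifiers a and b refer to the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, ident):
        """Return every known identifier equivalent to ident (including itself)."""
        if ident not in self._parent:
            return {ident}
        root = self._find(ident)
        return {i for i in list(self._parent) if self._find(i) == root}


# Hypothetical mappings between the identifiers shown on the slide.
service = MappingService()
service.add_mapping("P12047", "X31045")
service.add_mapping("X31045", "GB:29384")
service.add_mapping("GB:29384", "RS_2353")
print(sorted(service.equivalents("RS_2353")))
```

The caller asks with whichever identifier it happens to hold and receives the full set, so no application has to hard-code one "preferred" identifier scheme.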

Page 23: Practical semantics in the pharmaceutical industry - the Open PHACTS project

What Is Gleevec?

PubChem, DrugBank, ChemSpider

Imatinib

Mesylate

Page 24: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Dynamic Equality

Strict (Analysing) / Relaxed (Browsing)

LinkSet#1 {
  chemspider:gleevec hasParent imatinib ...
  drugbank:gleevec exactMatch imatinib ...
}

chemspider:gleevec drugbank:gleevec
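The idea of dynamic equality can be sketched as a choice of which link predicates count as "the same thing" for a given task. In the Python sketch below, the links mirror LinkSet#1 from the slide, while the lens names, the `LENSES` table and the `equivalent` function are assumptions made for illustration.

```python
# Each link is (subject, predicate, object), mirroring LinkSet#1 on the slide.
LINKS = [
    ("chemspider:gleevec", "hasParent", "imatinib"),
    ("drugbank:gleevec", "exactMatch", "imatinib"),
]

# Hypothetical lenses: the link predicates each mode treats as equality.
LENSES = {
    "strict": {"exactMatch"},                # analysing: exact links only
    "relaxed": {"exactMatch", "hasParent"},  # browsing: also follow parent links
}


def equivalent(a, b, lens):
    """Return True if a and b are connected by links the chosen lens accepts."""
    allowed = LENSES[lens]
    neighbours = {}
    for subject, predicate, obj in LINKS:
        if predicate in allowed:
            neighbours.setdefault(subject, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subject)
    # Depth-first search over the accepted links, treated as undirected.
    seen, stack = {a}, [a]
    while stack:
        node = stack.pop()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return b in seen


for lens in ("strict", "relaxed"):
    print(lens, equivalent("chemspider:gleevec", "drugbank:gleevec", lens))
```

Under the strict lens the two Gleevec records stay distinct, while under the relaxed lens they are treated as the same compound because both reach imatinib.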

Page 25: Practical semantics in the pharmaceutical industry - the Open PHACTS project

ChemSpider Validation & Standardization Platform

Quality Assurance

Page 26: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Chemistry Validation and Standardization Platform (CVSP) at cvsp.chemspider.com
• Validation
• Standardization
• Parent generation

RDF Export

Data

Page 27: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Deposited SDF record: CTAB, REGID1, DataSource, Synonym1, Synonym2, XRef1, etc.

Standardized entity

OPS_ID1

Parents

Charge Parent (OPS_ID7)

Isotope Parent (OPS_ID5)

Stereo Parent (OPS_ID4)

Tautomer Parent (OPS_ID6)

Super Parent (OPS_ID8)

Fragment (OPS_ID3)

Fragment (OPS_ID2)

Page 28: Practical semantics in the pharmaceutical industry - the Open PHACTS project

For each Compound (CSID) parent generation is attempted.

Reference: “Tautomerism in large databases”, Sitzmann and others, J. Comput. Aided Mol. Des. (2010)

Parent | Description
Charge-Insensitive | An attempt is made to neutralize ionized acids and bases. Envisioned to be an ongoing improvement while new cases appear.
Isotope-Insensitive | Isotopes replaced by common weight
Stereo-Insensitive | Stereo is stripped
Tautomer-Insensitive | Tautomer canonicalization attempts to generate a “reasonable” tautomer
Super-Insensitive | This parent applies all of the above

Page 29: Practical semantics in the pharmaceutical industry - the Open PHACTS project

[Figure: chemical structure drawings for DrugBank ID DB07241 (OPS1) and the related records OPS2 to OPS6]

ops:OPS1 skos:exactMatch <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB07241> .

ops:OPS2 skos:relatedMatch ops:OPS1 .

ops:OPS3 skos:relatedMatch ops:OPS1 .

ops:OPS3 skos:closeMatch ops:OPS4 .

ops:OPS3 skos:closeMatch ops:OPS5 .

ops:OPS4 skos:closeMatch ops:OPS6 .

ops:OPS5 skos:closeMatch ops:OPS6 .
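The triples above can be explored with a few lines of Python. This sketch parses them as plain text and follows a chosen predicate from subject to object; the helper names `parse`, `objects_of` and `reachable` are hypothetical, and a real deployment would query the triple store rather than split strings.

```python
TRIPLES_TEXT = """
ops:OPS1 skos:exactMatch <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB07241> .
ops:OPS2 skos:relatedMatch ops:OPS1 .
ops:OPS3 skos:relatedMatch ops:OPS1 .
ops:OPS3 skos:closeMatch ops:OPS4 .
ops:OPS3 skos:closeMatch ops:OPS5 .
ops:OPS4 skos:closeMatch ops:OPS6 .
ops:OPS5 skos:closeMatch ops:OPS6 .
"""


def parse(text):
    """Split each 'subject predicate object .' line into a 3-tuple."""
    triples = []
    for line in text.strip().splitlines():
        subject, predicate, obj = line.rstrip(" .").split()
        triples.append((subject, predicate, obj))
    return triples


def objects_of(triples, subject, predicate):
    """Return the objects linked from subject by predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def reachable(triples, start, predicate):
    """Return everything reachable from start by repeatedly following predicate."""
    found, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for nxt in objects_of(triples, node, predicate):
            if nxt not in found:
                found.add(nxt)
                frontier.append(nxt)
    return found


triples = parse(TRIPLES_TEXT)
# Records that declare a relatedMatch to the deposited compound OPS1.
related_to_ops1 = {s for s, p, o in triples if p == "skos:relatedMatch" and o == "ops:OPS1"}
print(sorted(related_to_ops1))
# Records reachable from OPS3 through closeMatch links.
print(sorted(reachable(triples, "ops:OPS3", "skos:closeMatch")))
```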

Page 30: Practical semantics in the pharmaceutical industry - the Open PHACTS project
Page 31: Practical semantics in the pharmaceutical industry - the Open PHACTS project

A Precompetitive Knowledge Framework

[Diagram labels: Pharma Needs; Inputs; Integration; Sustainability; Stability; Security; Management / Governance; Data Mining; Services/Algorithms; Mapping & Populating; Architecture; Interfaces & Services; Content (Structured & Unstructured); Vocabularies & Identifiers (URIs); Community; KD; Innovation]

Page 32: Practical semantics in the pharmaceutical industry - the Open PHACTS project

The Ecosystem is ….

API

Approach

Community

Industry

Academia

Data Provider

Software Provider

Page 33: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Kick-Starting Sustainability

[Diagram labels: Collaboration; Grants; Industry; Open PHACTS; API Users; Apps; API]

Page 34: Practical semantics in the pharmaceutical industry - the Open PHACTS project

explorer.openphacts.org

Page 35: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Example applications

Advanced analytics

ChemBioNavigator: Navigating at the interface of chemical and biological data with sorting and plotting options

TargetDossier: Interconnecting Open PHACTS with multiple target-centric services. Exploring target similarity using diverse criteria

PharmaTrek: Interactive polypharmacology space of experimental annotations

UTOPIA: Semantic enrichment of scientific PDFs

Predictions

GARFIELD: Prediction of target pharmacology based on the Similarity Ensemble Approach

eTOX connector: Automatic extraction of data for building predictive toxicology models in the eTOX project

Page 36: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Front-end framework to visualize biological data

Target dossier (CNIO)

Page 37: Practical semantics in the pharmaceutical industry - the Open PHACTS project
Page 38: Practical semantics in the pharmaceutical industry - the Open PHACTS project
Page 39: Practical semantics in the pharmaceutical industry - the Open PHACTS project
Page 40: Practical semantics in the pharmaceutical industry - the Open PHACTS project
Page 41: Practical semantics in the pharmaceutical industry - the Open PHACTS project
Page 42: Practical semantics in the pharmaceutical industry - the Open PHACTS project

The Open PHACTS community ecosystem

Page 43: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Becoming part of the Open PHACTS Foundation

Members

membership offers early access to platform updates and releases

the opportunity to steer research and development directions

receive technical support

work with the ecosystem of developers and semantic data integrators around Open PHACTS

tiered membership

familiar business and governance model

A UK-based, not-for-profit, member-owned company

Page 44: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Licensing Challenges

What are the problems with licensing we had to address?
– To make data and software generated by the project usable/reusable
– Multiplicity of unclear or non-standard licenses on original data sources
  • ‘Public’ can mean use but not redistribute, use in commercial environment
  • Legal position on use and reuse extremely unclear
  • Different issues than just linking to data
– Legal status of integrated collections of the above, and of derived knowledge?
– Appropriate software license selection
– Legal clarity for EFPIA and end users
– Approaches for commercial data integration, EFPIA in-house data

AIM: enable maximum possible dissemination and usability of integrated data and architecture with approaches that will be applicable in other data integration projects

Page 45: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Chose John Wilbanks as consultant

A framework built around STANDARD well-understood Creative Commons licences – and how they interoperate

Deal with the problems by:

Interoperable licences

Appropriate terms

Declare expectations to users and data publishers

One size won’t fit all requirements

Data Licensing Solution

Page 46: Practical semantics in the pharmaceutical industry - the Open PHACTS project

Open PHACTS Project Partners

Pfizer Limited – Coordinator

Universität Wien – Managing entity

Technical University of Denmark

University of Hamburg, Center for Bioinformatics

BioSolveIT GmbH

Consorci Mar Parc de Salut de Barcelona

Leiden University Medical Centre

Royal Society of Chemistry

Vrije Universiteit Amsterdam

Spanish National Cancer Research Centre

University of Manchester

Maastricht University

Aqnowledge

University of Santiago de Compostela

Rheinische Friedrich-Wilhelms-Universität Bonn

AstraZeneca

GlaxoSmithKline

Esteve

Novartis

Merck Serono

H. Lundbeck A/S

Eli Lilly

Netherlands Bioinformatics Centre

Swiss Institute of Bioinformatics

ConnectedDiscovery

EMBL-European Bioinformatics Institute

Janssen

OpenLink
