Opportunities in chemical structure standardization

Post on 13-Apr-2017

Opportunities in Chemical Structure Standardization

Valery Tkachenko

Science Data Software, Rockville, USA

Expanding IUPAC Standards for Chemical Information
EMBL-EBI Workshop, March 20-21, 2017

DIKW workflow

[Figure: DIKW workflow diagram. Labels recoverable from the slide: Literature data; Electronic databases; Text mining; Processing; Analysis; Experimental data; Data collection, curation, integration, and structuring (ontology); Structured Nanomaterials Data Repository; Data analysis and modeling; Predictive data models & tools; Experimental design; Experimental validation; Feedback, new data; Disease; Effect; Decision support.]

Karmann Mills and Anthony Hickey, RTI International, RTP, NC 27709, and Alex Tropsha, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, NC 27599

Standards and authorities

We live in a hyperconnected world

Data repositories

Fourches, Muratov, Tropsha. Nat Chem Biol. 2015,11(8):535.

How the problem is being solved now

[Very incomplete] list of common problems
• Violation of chemical and common sense
• Violations of valence bond theory
• Unsupported format and chemical model features
• Information loss during conversion
• Tautomers
• Stereochemical issues
• Mixtures
• Other classes of chemicals (materials, formulations, biologicals, structurally diverse, etc)
• Equivalence/mapping issues
• Identifiers/names issues
• Etc, etc, etc…
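To illustrate the "violations of valence bond theory" item, a validator can compare each atom's summed bond orders against a table of allowed valences. The following is a minimal, toolkit-free sketch. The `MAX_VALENCE` table, the atom/bond tuple layout and the name `find_valence_violations` are assumptions made for illustration and do not come from any toolkit named in the talk. The check also ignores charges, radicals and aromaticity.

```python
from collections import defaultdict

# Hypothetical, simplified table of maximum neutral valences (illustrative only).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def find_valence_violations(atoms, bonds):
    """Return (atom_index, element, bond_order_sum) for over-valent atoms.

    atoms: list of element symbols, e.g. ["C", "O"]
    bonds: list of (begin_index, end_index, bond_order) tuples
    """
    order_sum = defaultdict(int)
    for begin, end, order in bonds:
        order_sum[begin] += order
        order_sum[end] += order

    violations = []
    for index, element in enumerate(atoms):
        limit = MAX_VALENCE.get(element)
        if limit is None:
            continue  # unknown element: this sketch does not judge it
        if order_sum[index] > limit:
            violations.append((index, element, order_sum[index]))
    return violations


# Formaldehyde with explicit hydrogens: C(=O)(H)(H) passes the check.
ok_atoms = ["C", "O", "H", "H"]
ok_bonds = [(0, 1, 2), (0, 2, 1), (0, 3, 1)]

# A carbon drawn with five single bonds to hydrogen is flagged.
bad_atoms = ["C", "H", "H", "H", "H", "H"]
bad_bonds = [(0, i, 1) for i in range(1, 6)]

print(find_valence_violations(ok_atoms, ok_bonds))    # []
print(find_valence_violations(bad_atoms, bad_bonds))  # [(0, 'C', 5)]
```

A production validator would need charge-aware valence models and would have to agree on them across toolkits, which is the standardization gap this list points at.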

…problems (continued)
• Multiple [historical, proprietary, shortcoming] formats
  • ChemDraw, ChemSketch, AccelrysDraw
  • MOL, SDF
  • SMILES
  • Identifiers
  • Names and Synonyms
• Multiple toolkits/models
  • Open Source (alphabetical)
    • CDK
    • RDKit
    • Indigo
    • OpenBabel
    • Etc…
  • Commercial (alphabetical)
    • CACTVS
    • ChemAxon
    • OpenEye
    • Etc…
• Historical Hysterical software
• No [machine-readable] standards
• No authorities
• No coordinated efforts!!!

Solution
• Agreed and machine-readable (digital) standards
• Open-source (transparent) solution
• Organizations AND community support and involvement
• Accessible solution
• Data triaging at data repositories level
• Real-time validation/standardization (API, library, “docker”, etc)
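One possible reading of "machine-readable standards" combined with "data triaging" is a rule set stored as data and applied by a small generic engine that sorts incoming records into accept, review or reject. The sketch below assumes that reading. The rule ids, field names, check names and severity levels are all hypothetical and are not taken from the talk or from any existing standard.

```python
import json

# Hypothetical rule set, written as JSON so that it could be shared between tools.
RULES_JSON = """
[
  {"id": "R001", "field": "smiles", "check": "non_empty", "severity": "error"},
  {"id": "R002", "field": "smiles", "check": "no_wildcard", "severity": "warning"},
  {"id": "R003", "field": "name", "check": "non_empty", "severity": "warning"}
]
"""

# Each check name maps to a small predicate; True means the record passes.
CHECKS = {
    "non_empty": lambda value: bool(value and value.strip()),
    "no_wildcard": lambda value: "*" not in (value or ""),
}


def validate(record, rules):
    """Return the list of (rule_id, severity) pairs that the record fails."""
    issues = []
    for rule in rules:
        passed = CHECKS[rule["check"]](record.get(rule["field"]))
        if not passed:
            issues.append((rule["id"], rule["severity"]))
    return issues


def triage(record, rules):
    """Sort a record into 'reject', 'review' or 'accept' from its issues."""
    severities = {severity for _, severity in validate(record, rules)}
    if "error" in severities:
        return "reject"
    if "warning" in severities:
        return "review"
    return "accept"


rules = json.loads(RULES_JSON)
records = [
    {"name": "ethanol", "smiles": "CCO"},
    {"name": "", "smiles": "C*"},
    {"name": "unknown", "smiles": ""},
]
for record in records:
    print(record["name"] or "(no name)", triage(record, rules))
```

Keeping the rules in a data file rather than in code is what would let a repository, a library and a web service apply the same agreed checks.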

@gray_alasdair Big Data Integration

OpenPHACTS

OpenPHACTS Chemistry Registry System (CRS)

OpenPHACTS CRS shortcomings…
• Platform-dependent
• Toolkit-dependent (potential licensing issues)
• No deployable library
• No [convenient] API

…OpenPHACTS CRS¹ - ongoing work
• Microsoft platform independent
  • .NET Core, Python
  • Linux
  • NoSQL
• Toolkit independent
  • Indigo
  • RDKit (in progress)
  • CDK (planned)
• Docker image
• RESTful API

¹ Was open-sourced and is now supported by the OpenPHACTS Foundation
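The "toolkit independent" goal can be pictured as a thin interface that each toolkit implements, with the service choosing a backend per request. The following is a minimal sketch of that idea only. `ToolkitBackend`, `EchoBackend` and `Standardizer` are hypothetical names, the stand-in backend does no real chemistry, and nothing here reflects the actual CRS code or API.

```python
from typing import Dict, Protocol


class ToolkitBackend(Protocol):
    """Hypothetical minimal interface a cheminformatics backend would implement."""

    name: str

    def canonical_smiles(self, smiles: str) -> str:
        ...


class EchoBackend:
    """Stand-in backend for the sketch; it only trims whitespace.

    A real backend would delegate to Indigo, RDKit or CDK here.
    """

    name = "echo"

    def canonical_smiles(self, smiles: str) -> str:
        return smiles.strip()


class Standardizer:
    """Dispatches to whichever registered backend the caller selects."""

    def __init__(self) -> None:
        self._backends: Dict[str, ToolkitBackend] = {}

    def register(self, backend: ToolkitBackend) -> None:
        self._backends[backend.name] = backend

    def standardize(self, smiles: str, toolkit: str) -> dict:
        if toolkit not in self._backends:
            raise KeyError(f"no backend registered for toolkit '{toolkit}'")
        backend = self._backends[toolkit]
        # The returned dict mirrors what a JSON response body might carry.
        return {
            "toolkit": backend.name,
            "input": smiles,
            "output": backend.canonical_smiles(smiles),
        }


service = Standardizer()
service.register(EchoBackend())
print(service.standardize("  CCO ", toolkit="echo"))
```

With this shape, adding a toolkit means adding one backend class, and the same `standardize` call could sit behind a library function, a Docker image entry point or a RESTful endpoint.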

CVSP on Jupyter

Meet the Team

Alexandru Korotcov, Data Science

Rick Zakharov, Technology

Valery Tkachenko, Support

Boris Sattarov, Cheminformatics

Slides: https://www.slideshare.net/valerytkachenko16