Chemical databasing: state of the art and current challenges · 2015-07-16

Post on 27-Jul-2020

Chemical databasing: state of the art and current challenges

Valery Tkachenko

Royal Society of Chemistry

Kazan Summer School on Cheminformatics

Kazan, Russia

July 6th 2015

Why databases?

Efficient storage

Quick access (browse, search)

ACID (Atomicity, Consistency, Isolation, Durability)

Scalability

Migrations

Security

Safety (backup/restore)
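
To make the ACID point in the list above concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module. The table name `compound`, the function `insert_batch`, and the placeholder keys are hypothetical and chosen only for illustration; a production chemical database would more likely run on a server-based RDBMS.

```python
import sqlite3

def insert_batch(conn, rows):
    """Insert all rows or none of them (the 'A' in ACID)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO compound (inchikey, name) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compound (inchikey TEXT PRIMARY KEY, name TEXT NOT NULL)"
)

# Placeholder keys, not real InChIKeys.
ok = insert_batch(conn, [("KEY-A", "compound A"), ("KEY-B", "compound B")])

# The second batch repeats KEY-A, so the whole batch is rolled back,
# including the otherwise valid KEY-C row.
failed = insert_batch(conn, [("KEY-C", "compound C"), ("KEY-A", "duplicate")])

count = conn.execute("SELECT COUNT(*) FROM compound").fetchone()[0]
print(ok, failed, count)
```

The primary key constraint is what rejects the duplicate; the transaction is what guarantees that a half-loaded batch never becomes visible.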

Database – model and data

Database – relational example
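
The slide's own relational diagram is not preserved in this transcript. As a stand-in, the following sketch shows one plausible two-table layout, assuming a `compound` table and a `synonym` table linked by a foreign key. Both table names and all column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound (
    id      INTEGER PRIMARY KEY,
    formula TEXT NOT NULL
);
CREATE TABLE synonym (
    compound_id INTEGER NOT NULL REFERENCES compound(id),
    name        TEXT NOT NULL
);
""")

# One compound can carry many names: a one-to-many relation.
conn.execute("INSERT INTO compound (id, formula) VALUES (1, 'C5H7NO3')")
conn.executemany(
    "INSERT INTO synonym (compound_id, name) VALUES (?, ?)",
    [(1, "Pidolic acid"), (1, "Pyroglutamic acid")],
)

# Join the two tables to list every name recorded for a given formula.
names = [
    row[0]
    for row in conn.execute(
        "SELECT s.name FROM synonym s "
        "JOIN compound c ON c.id = s.compound_id "
        "WHERE c.formula = ? ORDER BY s.name",
        ("C5H7NO3",),
    )
]
print(names)
```

Keeping names in their own table avoids repeating compound data for every synonym, which is the usual motivation for the relational model.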

Chemical database

Chemistry-specific searches

Identity – same atoms connected in the same way

Substructure – find all chemicals having query as a substructure

Superstructure – find all chemicals which are substructures of a query

Similarity – find all “similar” chemicals
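
The identity, substructure and superstructure searches listed above can be sketched with a naive backtracking graph match. This is a toy model under strong assumptions: molecules are heavy-atom graphs given as (element list, bond list), and bond orders, hydrogens, charges, stereochemistry and aromaticity are all ignored. The function names are hypothetical; real systems rely on cheminformatics toolkits and fingerprint prescreening rather than a brute-force search like this one.

```python
def is_substructure(query, target):
    """True if every query atom and bond can be mapped onto the target."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    t_bondset = {frozenset(b) for b in t_bonds}
    q_neighbors = {i: set() for i in range(len(q_atoms))}
    for a, b in q_bonds:
        q_neighbors[a].add(b)
        q_neighbors[b].add(a)

    def extend(mapping):
        i = len(mapping)  # index of the next query atom to place
        if i == len(q_atoms):
            return True
        for j, element in enumerate(t_atoms):
            if element != q_atoms[i] or j in mapping:
                continue
            # Bonds to already placed query atoms must exist in the target.
            if all(frozenset((j, mapping[k])) in t_bondset
                   for k in q_neighbors[i] if k < i):
                mapping.append(j)
                if extend(mapping):
                    return True
                mapping.pop()
        return False

    return extend([])

def is_identical(a, b):
    """Same atoms connected in the same way (within this simplified model)."""
    return (len(a[0]) == len(b[0]) and len(a[1]) == len(b[1])
            and is_substructure(a, b))

def substructure_search(query, db):
    return [name for name, mol in db.items() if is_substructure(query, mol)]

def superstructure_search(query, db):
    return [name for name, mol in db.items() if is_substructure(mol, query)]

db = {
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
    "acetic acid": (["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)]),
    "methane": (["C"], []),
}
c_o = (["C", "O"], [(0, 1)])  # query: a carbon bonded to an oxygen

print(substructure_search(c_o, db))
print(superstructure_search(db["ethanol"], db))
```

The identity check works here because an injective atom mapping between two graphs with equal atom and bond counts is necessarily a full isomorphism.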

InChI (http://www.inchi-trust.org/)

Pidolic acid
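
An InChI string is built from slash-separated layers, which makes it easy to take apart. The sketch below splits one into version, formula and prefixed layers. The example string is what I believe to be the standard InChI of pidolic acid without a stereo layer; treat it as an assumption and check it against an authoritative source before relying on it. The helper `split_inchi` is hypothetical and not part of any InChI software.

```python
def split_inchi(inchi):
    """Split an InChI string into its version, formula and prefixed layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len("InChI="):].split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        if layer:
            # The first character names the layer: c = connections,
            # h = hydrogens, and so on. A repeated prefix would overwrite
            # the earlier entry in this simple sketch.
            layers[layer[0]] = layer[1:]
    return layers

# Assumed standard InChI for pidolic acid (no stereo layer).
pidolic = "InChI=1S/C5H7NO3/c7-4-2-1-3(6-4)5(8)9/h3H,1-2H2,(H,6,7)(H,8,9)"
layers = split_inchi(pidolic)
for key, value in layers.items():
    print(key, value)
```

Because the string is canonical, comparing two InChIs (or their hashed InChIKeys) is a cheap way to implement the identity search.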

Fingerprints
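
Fingerprints are commonly compared with the Tanimoto coefficient, the number of shared on-bits divided by the number of on-bits in either fingerprint. The sketch below assumes fingerprints are stored as sets of on-bit indices; the bit values, the names in `db`, and the threshold are invented for illustration, and returning 0.0 for two empty fingerprints is one convention among several.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similarity_search(query_fp, db, threshold):
    """Return (name, score) pairs at or above the threshold, best first."""
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in db.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Made-up fingerprints: each set lists the indices of the bits that are on.
db = {"A": {1, 4, 7, 9}, "B": {1, 4, 8}, "C": {2, 3}}

hits = similarity_search({1, 4, 7}, db, threshold=0.5)
print(hits)
```

The same bit sets can also prescreen substructure queries: a target can only contain the query if every on-bit of the query is also on in the target.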

Human Molecule

SciFinder

Reaxys

PubChem

ChemSpider

• 32 million chemicals and growing

• Data sourced from >500 different sources

• Crowdsourced curation and annotation

• Ongoing deposition of data from our journals and our collaborators

• A structure-centric hub for web searching

Properties - experimental

Properties - ACDLabs

Properties – EPI Suite

Properties - ChemAxon

Literature references

Patent references

Books

Classification

Chemical vendors and datasources

Multimedia

Chemical space – 10^60

RSC Archive – since 1841

Digitally Enabling RSC Archive

Advanced Search

It is so difficult to navigate…

What’s the structure?

Are they in our file?

What’s similar?

What’s the target?

Pharmacology data?

Known Pathways?

Working On Now?

Connections to disease?

Expressed in right cell type?

Competitors?

IP?

ChemSpider Synthetic Pages

Compounds

Reaction

Analytical Data

Text and References

Electronic Laboratory Notebook (ELN)

RSC Data Repository

Data Repository

Properties

Names and Identifiers

Spectra

Articles

Data

Collections

Patents

Etc.

Input pipeline

Output pipeline

RSC Databases

RSC Compounds

RSC Reactions

RSC Spectra

RSC Crystals

RSC Polymers

RSC Materials

RSC Assays

RSC Algorithms

RSC Models

…and on…

Compounds domain

Data quality issue and CVSP

– Robochemistry

– Proliferation of errors in public and private databases

• ChemSpider

• PubChem

• DrugBank

• KEGG

• ChEBI/ChEMBL

– Automated quality control system
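
An automated quality control pass of the kind listed above can be pictured as a set of rules applied to every incoming record. The sketch below is purely illustrative and does not reproduce CVSP's actual rule set, which the slides do not describe. The record layout, the `MAX_VALENCE` table and the `validate` function are all hypothetical.

```python
# Illustrative valence limits for neutral atoms only; not CVSP's rules.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}

def validate(record):
    """Return a list of human-readable issues for one structure record."""
    issues = []
    atoms, bonds = record["atoms"], record["bonds"]
    if not atoms:
        issues.append("empty structure")
    # Sum the bond orders on each atom.
    valence = [0] * len(atoms)
    for a, b, order in bonds:
        valence[a] += order
        valence[b] += order
    for i, element in enumerate(atoms):
        limit = MAX_VALENCE.get(element)
        if limit is None:
            issues.append(f"atom {i}: unsupported element {element!r}")
        elif valence[i] > limit:
            issues.append(
                f"atom {i}: {element} has valence {valence[i]} > {limit}"
            )
    return issues

# Bonds are (atom index, atom index, bond order); hydrogens are omitted.
good = {"atoms": ["C", "O"], "bonds": [(0, 1, 2)]}
bad = {"atoms": ["C", "O", "O", "O"],
       "bonds": [(0, 1, 2), (0, 2, 2), (0, 3, 1)]}

print(validate(good))
print(validate(bad))
```

Returning a list of issues rather than a pass/fail flag lets a pipeline decide separately which problems block deposition and which are only warnings.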

Chemistry Validation and Standardization Platform

Reactions domain

• ChemSpider Synthetic Pages

• Methods in Organic Synthesis

• Catalysts and Catalyzed Reactions

• USPTO

Analytical data domain

Crystallography domain

3D printable structures

We are a part of a larger world

Who is involved?

29 partners

Research questions

OpenPHACTS Architecture

OpenPHACTS UI

http://explorer.openphacts.org/

National Chemistry Database

Thank you

Email: tkachenkov@rsc.org

Slides: http://www.slideshare.net/valerytkachenko16