
Chemistry Online and The vision and challenges associated with building the chem spider resource for...

Date post: 10-May-2015
Upload: orcid-0000-0002-2668-4821
Description:
Today ChemSpider (www.chemspider.com) is one of the chemistry community’s primary online resources. Now hosting over 28 million unique chemical compounds linked to over 400 data sources, ChemSpider offers its users a structure-centric platform that facilitates access to publications and patents, experimental and predicted property data, spectral data and many other forms of data and information that can benefit a chemist. ChemSpider is a crowdsourcing platform that allows the community to contribute directly to the database by depositing and sharing structure data, properties, spectra and reaction syntheses. Crowdsourcing also allows the annotation and curation of existing data, so the community can assist in the much-needed curation and validation of chemistry data on the internet. This work is imperative in order to provide the chemistry underpinnings for semantic web projects such as Open PHACTS (www.openphacts.org), from which Merck is sure to benefit when it is released to the community. This presentation will provide an overview of the ChemSpider platform and will also examine the challenges of dealing with heterogeneous data quality when attempting to provide a rich resource of data for the community. If you use the internet to research chemistry-based data, this presentation will be an essential guide to sourcing high-quality data.
Chemistry Online – The Vision and Challenges Associated With Building the ChemSpider Resource for Chemists, Antony Williams, Merck, October 2012
Transcript
Page 1: Chemistry Online and The vision and challenges associated with building the chem spider resource for chemists

Chemistry Online – The Vision and Challenges Associated With Building the ChemSpider Resource for Chemists

Antony Williams

Merck, October 2012

Page 2

We Have… Too Much Data!!!

Page 3

It is so difficult to navigate…

What’s the structure?

Are they in our file?

What’s similar?

What’s the target?

Pharmacology data?

Known Pathways?

Working On Now?

Connections to disease?

Expressed in right cell type?

Competitors?

IP?

Page 4
Page 5

The World of Online Chemistry

Property databases

Compound aggregators

Screening assay results

Scientific publications

Encyclopedic articles (Wikipedia)

Metabolic pathway databases

ADME/Tox data – eTOX for example

Blogs/Wikis and Open Notebook Science

Contributing Open Source code to projects

Page 6

PubChem

Page 7

ChEMBL

Page 8

Collaborative Knowledge Management

Page 9

Data on the Web

Page 10

RSC’s ChemSpider

Page 11

We Want to Answer Questions

Questions a chemist might ask…

What is the melting point of n-heptanol?

What is the chemical structure of Xanax?

Chemically, what is phenolphthalein?

What are the stereocenters of cholesterol?

Where can I find publications about xylene?

What are the different trade names for Ketoconazole?

What is the NMR spectrum of Aspirin?

What are the safety handling issues for Thymol Blue?

Page 12

Available Information…

Linked to vendors, safety data, toxicity, metabolism

Page 13

Available Information…

Page 14

Crowdsourced “Annotations”

Users can add

Descriptions/Syntheses/Commentaries

Links to PubMed articles

Links to articles via DOIs

Add spectral data

Add Crystallographic Information Files

Add photos

Add MP3 files

Add Videos

Page 15
Page 17

Spectra Linked

Page 18

Spectra Linked

Page 19

Chemistry Data online is messy

We have inherited errors

All public compound databases, including ours, have errors

“Incorrect” structures – assertions, timelines, etc.

“Incorrect” names associated with structures

Properties

Links

Publications

ENORMOUS CHALLENGE

Page 20

What could create change?

Harvard Business Review (2010)

“One change would make a substantial difference [to drug R&D]: the creation of agreed-upon standards for digitally representing drug assets.”

Consider drug structures ONLY…

Page 21

The Structure of Vitamin K?

Page 22

MeSH

A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione). Vitamin K 3 provitamins, after being alkylated in vivo, exhibit the antifibrinolytic activity of vitamin K. Green leafy vegetables, liver, cheese, butter, and egg yolk are good sources of vitamin K.

Page 23

The Structure of Vitamin K1?

Page 24

What is the Structure of Vitamin K1?

Page 25

CAS’s Common Chemistry

Page 26

Wikipedia

Page 27
Page 28
Page 29

ChEBI – Manual Curation

Page 30
Page 31
Page 32
Page 33

“2-methyl-3-(3,7,11,15-tetramethylhexadec-2-enyl)naphthalene-1,4-dione”

Variants of systematic names on PubChem

2-methyl-3-[(E,7R,11R)-3,7,11,15-tetramethyl

2-methyl-3-[(E,7S,11R)-3,7,11,15-tetramethyl

2-methyl-3-[(E,7R,11S)-3,7,11,15-tetramethyl

2-methyl-3-[(E,7S,11S)-3,7,11,15-tetramethyl

2-methyl-3-[(E,11S)-3,7,11,15-tetramethyl

2-methyl-3-[(E)-3,7,11,15-tetramethyl

2-methyl-3-(3,7,11,15-tetramethyl

2-methyl-3-[(E)-3,7,11,15-tetramethyl
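
The name variants above differ mainly in their stereodescriptors. As a rough illustration only, the sketch below strips those descriptors from the (truncated) name fragments shown on the slide and groups whatever remains. The function names are hypothetical, and this text-level trick is an assumption made for illustration, not how PubChem or ChemSpider compare records; real systems compare the structures themselves rather than the name strings.

```python
import re
from collections import defaultdict

# Matches a stereodescriptor prefix such as "(E)-" or "(E,7R,11S)-".
STEREO_PREFIX = re.compile(r"\(\d*[EZRS](?:,\d*[EZRS])*\)-")


def strip_stereo(name: str) -> str:
    """Drop stereodescriptor prefixes and normalise square brackets to parentheses."""
    flat = STEREO_PREFIX.sub("", name)
    return flat.replace("[", "(").replace("]", ")")


def group_by_flat_name(names):
    """Group names that become identical once stereodescriptors are removed."""
    groups = defaultdict(list)
    for name in names:
        groups[strip_stereo(name)].append(name)
    return dict(groups)


# Name fragments exactly as they appear (truncated) on the slide.
SLIDE_NAMES = [
    "2-methyl-3-[(E,7R,11R)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,7S,11R)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,7R,11S)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,7S,11S)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,11S)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E)-3,7,11,15-tetramethyl",
    "2-methyl-3-(3,7,11,15-tetramethyl",
    "2-methyl-3-[(E)-3,7,11,15-tetramethyl",
]

groups = group_by_flat_name(SLIDE_NAMES)
for flat_name, variants in groups.items():
    print(f"{flat_name}: {len(variants)} variant(s)")
```

Every fragment collapses to the same flattened name, which is the point of the slide: one drug name can sit on many records that differ only in how much stereochemistry they assert.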

Page 34

Chemistry on The Internet Is Messy

Page 35

It’s Methane…

Page 36

What’s Methane?

Page 37

What’s Methane?

Page 38

What ELSE is Methane???

Page 39
Page 40

NLM’s DailyMed

Page 41

NLM’s DailyMed

Page 42

NLM’s DailyMed

Page 43

With Great Fanfare…

Page 44

NPC Browser http://tripod.nih.gov/npc/

Page 45

NPC Browser http://tripod.nih.gov/npc/

Page 46
Page 47

The EXPERTS must get it right?!

Page 48

Wikipedia, C&E News, PubChem

C&E News (from ACS)

Page 49

People Use Trusted Resources…

Page 50

Earlier this month…

Page 51

Stop Whining – Fix it

Page 52

Crowdsourced Curation

Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate

Page 53

Search “Vitamin H”

Page 54

“Curate” Identifiers

Page 55

“Curate” Identifiers

Page 56

“Curate” Identifiers

Page 57

What is the outcome of this???

IF we can get the community to help clean up the internet of chemistry then we have:

High quality online reference resources

Freely available reference data

Ongoing iterative curation – how many chemical structures are “reworked”?

And what is the value of “curated chemical dictionaries”???

Page 58

Successful Semantic Markup Depends on Dictionaries

Page 59

Dictionaries Enhance Publications
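
To make the link between dictionaries and semantic markup concrete, here is a minimal sketch of dictionary-driven name tagging. Everything in it is an assumption for illustration: the `SYNONYMS` table, the `mark_up` function and the `[matched text|preferred name]` output format are hypothetical and are not the markup pipeline used by any publisher. A production dictionary would map names to curated database identifiers rather than to preferred names.

```python
import re

# Hypothetical curated dictionary: synonym (lower case) -> preferred name.
SYNONYMS = {
    "vitamin h": "biotin",
    "biotin": "biotin",
    "acetylsalicylic acid": "aspirin",
    "aspirin": "aspirin",
}


def build_pattern(synonyms):
    """Compile one regex that tries longer synonyms before shorter ones."""
    ordered = sorted(synonyms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def mark_up(text, synonyms=SYNONYMS):
    """Wrap each recognised name as [matched text|preferred name]."""
    pattern = build_pattern(synonyms)

    def tag(match):
        found = match.group(0)
        return f"[{found}|{synonyms[found.lower()]}]"

    return pattern.sub(tag, text)


print(mark_up("Vitamin H and aspirin were both detected."))
# [Vitamin H|biotin] and [aspirin|aspirin] were both detected.
```

The quality of the output depends entirely on the dictionary: a wrong synonym in the table becomes a wrong link in every marked-up article, which is why curation of names and identifiers matters.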

Page 60

I want to know about “Vincristine”

Page 61

Vincristine Identifiers

Page 62

Vincristine: Patents Linked by Name

Page 63

Vincristine: Articles Linked by Name

Page 64

What are the names for this compound just in patents????

Page 65

A disambiguation NIGHTMARE!

Page 66

Ambiguity in Identifiers

Page 67

Crowdsourcing Works

>130 people have deposited data and participated in data curation

Curators at different levels check each other’s work

More curators and depositors encouraged! 28 million chemicals is a long list…
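
One way to picture curators checking each other is a proposal that only takes effect once a second person signs off. The sketch below is a guess at such a workflow, not ChemSpider's actual implementation: the level names, the `Proposal` class, the record identifier and the approval rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical curator levels. The slide says only that curators at different
# levels check each other; these names and their ordering are assumptions.
LEVELS = {"depositor": 0, "curator": 1, "master_curator": 2}


@dataclass
class Proposal:
    """A suggested change to one record, awaiting review."""
    record_id: int              # hypothetical record identifier
    action: str                 # e.g. "tag_error", "remove_synonym", "deprecate"
    detail: str
    proposed_by: str
    proposer_level: str
    approved_by: Optional[str] = None

    def approve(self, reviewer: str, reviewer_level: str) -> None:
        """Accept the change only if someone else, at the same level or higher, signs off."""
        if reviewer == self.proposed_by:
            raise ValueError("a proposal cannot be approved by its own author")
        if LEVELS[reviewer_level] < LEVELS[self.proposer_level]:
            raise ValueError("reviewer level is lower than the proposer's")
        self.approved_by = reviewer


proposal = Proposal(
    record_id=1,
    action="remove_synonym",
    detail="name does not match the structure",
    proposed_by="alice",
    proposer_level="curator",
)
proposal.approve("bob", "master_curator")
print(proposal.approved_by)  # bob
```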

Page 68

ChemSpider for Analytical Sciences

ChemSpider is being developed with the intention of:

Being the world’s richest resource of freely accessible curated analytical data

Serving as a platform for structure verification and dereplication

Providing access to supporting prediction algorithms

Page 70

Spectral Uploading

Various types of NMR spectra supported

Page 72

Multiple Spectra for One Structure

Page 73

ChemSpider ID 24528095 1H NMR

Page 74

ChemSpider ID 24528095 13C NMR

Page 75

ChemSpider ID 24528095 HH COSY

Page 76

ChemSpider ID 24528095 HSQC

Page 77

ChemSpider ID 24528095 HMBC

Page 78

Full 13C assignment uploaded

Page 79

Available Spectra http://www.chemspider.com/spectra.aspx

Page 80

How do these data get curated?

Every spectrum can be commented on

Incorrect spectra have been annotated and curated by users…

But curation through gaming is also possible…

Page 82

www.SpectralGame.com

http://www.jcheminf.com/content/1/1/9

Page 83

Spectral Game

Page 84

Increasing Complexity

Page 85

Spectral Game

Page 86

Reversed Spectrum

Page 87

True Curation of Data

Page 88

SpectralGame in the hand

Page 90

Mass Spec Analysis

Page 91

ChemSpider Interface

Page 92
Page 93

Tinuvin 328

Page 94

Position sorted by references

Page 95

Position 1 only
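
One reading of the mass spectrometry slides is that an unknown is looked up by accurate mass, the candidate structures are sorted by how many references each has, and the position of the correct compound in that list is reported. The sketch below illustrates that idea under stated assumptions: the `Candidate` class, the 5 ppm tolerance, and all masses and reference counts are invented for the example and are not ChemSpider data or its search interface.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    monoisotopic_mass: float
    reference_count: int


def within_ppm(observed: float, theoretical: float, tolerance_ppm: float) -> bool:
    """True if the observed mass lies within the given ppm window of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tolerance_ppm


def rank_candidates(observed_mass, candidates, tolerance_ppm=5.0):
    """Keep candidates that match the mass, most-referenced first."""
    hits = [
        c for c in candidates
        if within_ppm(observed_mass, c.monoisotopic_mass, tolerance_ppm)
    ]
    return sorted(hits, key=lambda c: c.reference_count, reverse=True)


def position_of(name, ranked):
    """1-based position of the named compound, or None if it was not retrieved."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate.name == name:
            return position
    return None


# Invented candidates: three isomers sharing one mass, plus one unrelated compound.
CANDIDATES = [
    Candidate("isomer A", 351.2311, 3),
    Candidate("isomer B", 351.2311, 120),
    Candidate("isomer C", 351.2311, 0),
    Candidate("unrelated", 352.2400, 500),
]

ranked = rank_candidates(351.2309, CANDIDATES)
print([c.name for c in ranked])
print(position_of("isomer B", ranked))
```

The design assumption is that a heavily referenced compound is more likely to be the one actually encountered in a sample, so sorting by reference count tends to push the correct answer towards position 1.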

Page 96

Web Services

Page 97

Web Services Open Up Collaboration

Agilent, Bruker, Waters and Thermo all use our web-based services for compound lookup

Many academic sites are integrating directly – metabonomics, name lookup, semantic markup

Page 98

Where do data come from?

ChemSpider users deposit data

Some contributions from NIST

Chemical vendors are starting to provide data. Synthonix are one of our major contributors (www.synthonix.com)

Page 99

Commercial Database Access

Recently deposited to ChemSpider

EPA/NIST IR Database >5000 spectra

Presently under development

NIST MS database >200,000 MS spectra

Page 100

Where next with Analytical Support?

PharmaSea project for the identification of natural products – dereplication approaches

Use mass spectrometry searches of natural product slices to identify

Pre-fragment compounds and develop searches

Dereplication using NMR data and NMR features

Predicted spectra and “Verification approaches”

Page 101

NMRShiftDB: http://www.ebi.ac.uk/nmrshiftdb/

Page 102
Page 103

NMR Prediction

Page 104

NMRShiftDB Data Review

• High quality NMR shift set of ca. 100,000 shifts

• Derived prediction algorithms give very similar performance statistics to commercial algorithms

Page 105

Crowdsourcing Chemical Synthesis

How much data generated in a lab, that COULD go public, is lost forever?

Page 106

Crowdsourcing Chemical Synthesis

How much data generated in a lab, that COULD go public, is lost forever?

Public Domain reference databases of value?

Properties

Spectra

CIFs

Images

Syntheses

Page 107

An Adventure into the World of Small but Significant Contributions…

Page 108

ChemSpider SyntheticPages

Page 109

Micropublishing with Peer Review

(a chemical synthesis blog?)

Page 110

Multi-Step Synthesis

Page 111

Interactive Data

Page 112

MOBILE Structure Database Lookup

Page 113

It is so difficult to navigate…

What’s the structure?

Are they in our file?

What’s similar?

What’s the target?

Pharmacology data?

Known Pathways?

Working On Now?

Connections to disease?

Expressed in right cell type?

Competitors?

IP?

Page 114

Open PHACTS Project

Develop a set of robust standards…

Implement the standards in a semantic integration hub

Deliver services to support drug discovery programs in pharma and public domain

22 partners, 8 pharmaceutical companies, 3 biotechs

36-month project – goes live next month

Guiding principle is open access, open usage, open source

- Key to standards adoption -

Page 115

Internet Data

The Future

Commercial Software

Pre-competitive Data

Open Science

Open Data

Publishers

Educators

Open Databases

Chemical Vendors

Small organic molecules

Undefined materials

Organometallics

Nanomaterials

Polymers

Minerals

Particle bound

Links to Biologicals

Page 116

The Future of Chemistry on the Web?

Public compound databases federate & build a linked environment of validated data!

Data validation needs are not ignored

Publishers layer on information to make publications discoverable

Public-Private databases can be linked

Open Data proliferate

The “Semantic Web” in action

Page 117

Can Merck Contribute to this Project?

Do you have any data that you can release into the public domain?

Measured property data

How many “common” spectra are thrown away?

How many syntheses are published and locked behind paywalls? (www.chemspider.com/reactions)

Can your scientists contribute annotations and curations if they use ChemSpider?

Is the challenge of Legal Clearance too big?

Page 118

Thank you

Email: [email protected]

Twitter: ChemConnector

Blog: www.chemspider.com/blog

Personal Blog: www.chemconnector.com

SLIDES: www.slideshare.net/AntonyWilliams

