Date post: | 21-Nov-2014 |
Category: | Technology |
Upload: | antony-williams-chemconnector |
The Great Promise of Online Data for Chemistry and the Life Sciences
Antony J Williams
Silverchair Colloquium 2012
READ FAST – IT’S HAPPENING NOW
20 minutes, >40 slides
Disruption Can be Cheap, Fast and Unexpectedly Successful
Online Chemistry Databases in 2007
A search gave LOTS of “info”… What is Yohimbine?
For chemists…try filtering!
Why not Index the web of chemistry?
Build a search engine for chemistry
Index all public domain chemicals and link
Build a structure searchable web
Crowdsource new chemistry from the community
Crowdsource curation and annotation
Create a structure-centric hub
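A minimal sketch of what a "structure-centric hub" could look like in code, assuming each source record already carries some canonical structure key. The record layout, source names, keys, and helper functions here are all hypothetical and for illustration only; they are not ChemSpider's actual data model.

```python
from collections import defaultdict

# Hypothetical records from three imagined sources. The structure keys are
# placeholders, not real chemical identifiers, and the values are dummies.
RECORDS = [
    {"source": "vendor_catalog", "structure_key": "KEY-A",
     "name": "Compound A", "field": "availability", "value": "in stock"},
    {"source": "property_db", "structure_key": "KEY-A",
     "name": "Trade Name A", "field": "melting_point", "value": "placeholder"},
    {"source": "wiki_article", "structure_key": "KEY-B",
     "name": "Compound B", "field": "summary", "value": "placeholder"},
]


def build_hub(records):
    """Group every record under its structure key, whatever its source."""
    hub = defaultdict(lambda: {"names": set(), "data": []})
    for rec in records:
        entry = hub[rec["structure_key"]]
        entry["names"].add(rec["name"])
        entry["data"].append((rec["source"], rec["field"], rec["value"]))
    return dict(hub)


def lookup_by_name(hub, name):
    """Return the structure keys whose known names match (case-insensitive)."""
    wanted = name.lower()
    return [key for key, entry in hub.items()
            if wanted in {n.lower() for n in entry["names"]}]


hub = build_hub(RECORDS)
for key in lookup_by_name(hub, "trade name a"):
    print(key, sorted(hub[key]["names"]), len(hub[key]["data"]), "linked items")
```

Keying on the structure rather than on a name is the design point: any synonym leads to the same entry, and everything any source knows about that structure hangs off it.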
Answering Real Questions
Questions a chemist might ask…
What is the melting point of n-heptanol?
What is the chemical structure of Xanax?
Chemically, what is phenolphthalein?
What are the stereocenters of cholesterol?
Where can I find publications about xylene?
What are the different trade names for Ketoconazole?
What is the NMR spectrum of Aspirin?
What are the safety handling issues for Thymol Blue?
The World of Online Chemistry
Safety data
Toxicity data
Blogs and Wikis
Property databases
Experimental results
Scientific publications
Compound aggregators
Open Notebook Science
Metabolic pathway databases
Encyclopedic articles (Wikipedia)
Linked Data for Life Sciences growing…
Solve Real World Problems
Provide programmable interface against content
Provide a chemistry database tuned to integrators
RSC and ChemSpider – May 2009
Why RSC acquired ChemSpider
Commitment to serve the community
Bring cheminformatics expertise in-house
Add additional data to publications
Potential freemium model – web services, data
Because data is critical to science
Making sense of data is overwhelming
Publications are Hosts to Data
Data has value, is Free, is Open
Data cannot be copyrighted. A particular expression of data, such as a chart or table in a publication, can be.
Data licensing is being dealt with and openness encouraged
Research data mandates are starting…
Who will manage the integration and curation and keep the access FREE?
Tell me about Yohimbine…
Of course it is out there…
SOME Chemistry Databases in 2012
Tell me more…but…
Where can I find the electronic structure?
Papers/Patents about Yohimbine?
What are the side effects of Yohimbine?
Where can I order Yohimbine?
What are the physicochemical properties?
What are the associated metabolic pathways?
Different synonyms of Yohimbine?
Are there side effects with Yohimbine?
ChemSpider links all of this information and more
Yohimbine on ChemSpider
RSC Databases are Integrated
RSC Journals are Integrated
Patents are Linked
Google Books are Integrated
And so are…
Chemical vendors
Safety and Toxicity information
Experimental and Predicted properties
Analytical data
Images and Movies
And all for free…
And all “mobile”
Not only compounds but syntheses
And analytical data…
The world can take and contribute
Scientists can deposit their data
They can annotate and curate
They can download data
They can embed data in the social network
They can integrate and connect
Integrate to electronic lab notebooks
Integrate to instruments and software
Primary analytical instrumentation vendors integrate
Agilent, Bruker, Thermo, Waters
Cheminformatics vendors link to ChemSpider
Accelrys, ACD/Labs, ChemAxon, iChemLabs
Publications are a summary of work
Scientific publications are a summary of work
Is all work reported?
How much science is lost to pruning?
What of value sits in notebooks and is lost?
How much data is lost?
How many compounds never reported?
How many syntheses fail or succeed?
How many characterization measurements?
What if we could capture it all?
Start with data in publications
But in the time of Big Data…it’s linked!
ONE example – data for life sciences
What’s the structure?
Are they in our file?
What’s similar?
What’s the target?
Pharmacology data?
Known Pathways?
Working On Now?
Connections to disease?
Expressed in right cell type?
Competitors?
IP?
Crowdsourcing across drug discovery
Open PHACTS: partnership between European Community and European Pharma Companies
22 partners, 8 pharmaceutical companies, 3 biotechs working together for 3 years
Freely accessible for knowledge discovery and verification.
Data on chemistry and biology
Pharmacological profiles
Proprietary and public data sources.
All that glisters is not gold…
Crowdsourced Assertions
The future of publishing will include generation and consumption of “nanopublications”
http://www.nanopub.org/
Nanopublications??
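For readers new to the idea: a nanopublication is usually described as one small assertion packaged with its provenance and its publication information. The sketch below models that shape in plain Python, as an illustration only. The class names, field names, and example values are hypothetical, and real nanopublications are expressed as RDF graphs rather than Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """A single subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Nanopublication:
    assertion: Assertion
    provenance: dict        # how the assertion was obtained
    publication_info: dict  # who packaged it, and when


def group_by_assertion(nanopubs):
    """Collect nanopublications that make the same assertion."""
    groups = {}
    for nanopub in nanopubs:
        groups.setdefault(nanopub.assertion, []).append(nanopub)
    return groups


# Hypothetical example: two independent sources assert the same thing.
claim = Assertion("compound:X", "interacts_with", "target:Y")
nanopubs = [
    Nanopublication(claim, {"method": "assay"}, {"creator": "lab-1"}),
    Nanopublication(claim, {"method": "text-mining"}, {"creator": "curator-2"}),
    Nanopublication(Assertion("compound:X", "has_synonym", "name:Z"),
                    {"method": "manual"}, {"creator": "curator-2"}),
]

for assertion, supporters in group_by_assertion(nanopubs).items():
    print(assertion.subject, assertion.predicate, assertion.obj,
          "| supported by", len(supporters), "nanopublication(s)")
```

Because every assertion carries its own provenance, a consumer can count and weigh independent support for a crowdsourced claim instead of trusting it blindly.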
So what’s the business model?
Decisions are based on data
Publications encapsulate, reference and link data
More data is free and open. More services and APIs allow access – free or for a fee. Ask Google
The large-scale licensed content business model is at risk without interfaces to integrate and mine
Acknowledgments
The RSC ChemSpider team
Our users, our depositors, our curators
GGA Software Services, OpenEye, ACD/Labs and a lot of Open Source code!
And Al Gore for supporting the internet: http://en.wikipedia.org/wiki/Al_Gore_and_information_technology
Thank you
Email: [email protected]
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams