Open Notebook Science Web Services - ACS Spring 2011

description

Jean-Claude Bradley's presentation of March 30, 2011 at the American Chemical Society meeting, on the rapid dissemination of chemical information for people and machines using Open Notebook Science.

transcript

Rapid dissemination of chemical information for people and machines using Open Notebook Science

March 30, 2011

American Chemical Society Meeting

Jean-Claude Bradley

Department of Chemistry, Drexel University

Andrew Lang

Department of Mathematics, Oral Roberts University

Motivation: Faster Science, Better Science

There are NO FACTS, only measurements embedded within assumptions

Open Notebook Science maintains the integrity of data provenance by making assumptions explicit

TRUST

PROOF

Strategy for an Open Notebook:

First record, then abstract structure

To be discoverable, use Google-friendly formats (simple HTML, no login)

To be replicable, use free hosted tools (Wikispaces, Google Spreadsheets)

Crowdsourcing Solubility Data

ONS Challenge Judges

ONS Challenge Award Winners

Data provenance: From Wikipedia to…

…the lab notebook and raw data

Calculations Made Public on Google Spreadsheets

Interactive NMR spectra using JSpecView and JCAMP-DX

Raw Data As Images

Splatter?

Some liquid

YouTube for demonstrating experimental set-up

The importance of raw data availability

Missed in a prior publication on solubility for this compound

Solubilities collected in a Google Spreadsheet

Rajarshi Guha’s Live Web Query using Google Viz API
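
As a rough illustration of what a live query against a spreadsheet can look like, the sketch below builds a Google Visualization API query URL. It is not the code shown in the talk: the sheet key and the column letters are hypothetical placeholders, and the endpoint form is the one used by current Google Sheets, which may differ from what was available in 2011.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_gviz_query_url(sheet_key, query, out="csv"):
    """Build a Google Visualization API query URL for a public spreadsheet.

    sheet_key is a placeholder here; query is written in the
    Visualization API query language (SQL-like, columns named A, B, C...).
    """
    base = f"https://docs.google.com/spreadsheets/d/{sheet_key}/gviz/tq"
    return base + "?" + urlencode({"tq": query, "tqx": f"out:{out}"})

# Hypothetical key and columns, for illustration only.
url = build_gviz_query_url("HYPOTHETICAL_SHEET_KEY", "select A, B where C > 0")

# Decode the query string again to check what would be sent.
params = parse_qs(urlparse(url).query)
print(params["tq"][0])
```

Fetching the URL (for example with `urllib.request.urlopen`) would return the matching rows, provided the spreadsheet is publicly readable.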

Web services for summary data

(Andrew Lang)

Web service calls from within a Google Spreadsheet for solubility measurement and prediction

(Andrew Lang)

Integration of Multiple Web Services to Recommend Solvents for Reactions

(Andrew Lang)

Reaction Attempts Book

Reaction Attempts Book: Reactants listed Alphabetically

ONS Challenge Solubility Book cited for nanotechnology application

Lulu.com Data Disks

Visualizing molecule-researcher connection maps reveals link between 2 Open Notebooks (Todd and Bradley)

(Don Pellegrino)

The Intersection of Open Notebooks (Bradley/Todd) and IP implications

Open Notebook could have blocked patent if done earlier

The Chemical Information Validation Sheet

567 curated and referenced measurements from Fall 2010 Chemical Information Retrieval course

The Chemical Information Validation Explorer

(Andrew Lang)

Discovering outliers for melting points (stdev/average)
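
The stdev/average ratio named on this slide can be sketched in a few lines. This is a minimal illustration, not the code behind the Chemical Information Validation Explorer: the compound names, the values, and the 0.05 threshold are all made up, and temperatures are assumed to be in kelvin so that the average is never zero or negative.

```python
from statistics import mean, stdev

def relative_spread(values):
    """Return stdev/average for a list of measurements (needs at least 2)."""
    return stdev(values) / mean(values)

def flag_outliers(measurements, threshold=0.05):
    """Return (compound, ratio) pairs whose stdev/average exceeds threshold.

    measurements maps a compound name to its reported melting points in K.
    Results are sorted with the least consistent compound first.
    """
    flagged = []
    for compound, values in measurements.items():
        if len(values) < 2:
            continue  # a single value has no spread to measure
        ratio = relative_spread(values)
        if ratio > threshold:
            flagged.append((compound, ratio))
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical melting points in kelvin from several sources.
sample = {
    "compound_a": [400.0, 401.0, 399.5],  # sources agree closely
    "compound_b": [300.0, 350.0, 301.0],  # one source disagrees
    "compound_c": [310.0],                # only one measurement
}

for compound, ratio in flag_outliers(sample):
    print(f"{compound}: stdev/average = {ratio:.3f}")
```

A high ratio only marks a compound as worth a closer look; deciding which of the disagreeing values is wrong still requires going back to the sources.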

Investigating the m.p. inconsistencies of EGCG

Investigating the m.p. inconsistencies of cyclohexanone

Sigma-Aldrich, Acros and Wolfram Alpha apparently use the same sources for melting points

Sigma-Aldrich, Acros and Wolfram Alpha apparently use the same sources for boiling points

Sigma-Aldrich, Acros and Wolfram Alpha apparently DO NOT use the same sources for flash points

Most popular data sources

Alfa Aesar donates melting points to the public

Open Melting Point Explorer

Outliers

MDPI dataset

EPI (via ChemSpider)

Outliers

Alfa Aesar

Inconsistencies and SMILES problems within MDPI dataset

MDPI Dataset labeled with High Trust Level

Open Melting Point Datasets

Open Random Forest modeling of Open Melting Point data using CDK descriptors

(Andrew Lang)

R² = 0.78; TPSA and nHdon are the most important descriptors

Melting point prediction service

Using melting point for temperature dependent solubility prediction

Decanoic acid

Water, NaCl

Phrase searching for useful solubility applications

The challenge of modeling inorganic solubility openly

All ONS web services

Dynamic links to private tagged Mendeley collections

(Andrew Lang)

For all Formats of ONS Projects

Conclusions

• Abstraction of ONS experimental data into a clear semantic format can be done quickly and easily, and it greatly increases the utility of the data

• No information is lost as long as a link to the Open Notebook is captured