Building a semantic chemistry platform with the Royal Society of Chemistry
Valery Tkachenko, Colin Batchelor, Peter Corbett,
Ken Karapetyan, Alexey Pshenichnov, Antony Williams
ACS 247th National Meeting
Dallas, TX
March 16th 2014
Big Data World and Chemistry
ChemSpider
RSC Archive
RSC Chemistry Platform
Data quality
Global Chemistry Network
Chemical space - 10^60
Automated learning
Managing Big Data
ChemSpider
• ~30 million chemicals and growing
• Data sourced from >500 different sources
• Crowdsourced curation and annotation
• Ongoing deposition of data from our journals and our collaborators
• A structure centric hub for web-searching
ChemSpider - properties
ChemSpider - references
ChemSpider - classification
Share in a “proper way”
RSC Archive – since 1841
It is so difficult to navigate…
What’s the structure?
Are they in our file?
What’s similar?
What’s the target?
Pharmacology data?
Known Pathways?
Working On Now?
Connections to disease?
Expressed in right cell type?
Competitors?
IP?
Digitally Enabling RSC Archive
CSSP Article Example
Compounds
Reaction
Analytical Data
Text and References
RSC Chemistry Platform
ChemSpider Compounds
ChemSpider Reactions
ChemSpider Spectra
ChemSpider Crystals
ChemSpider Materials
ChemSpider Assays
ChemSpider Algorithms
Data Pipeline
Deposition Gateway
Staging databases
Compounds Reactions Spectra Crystals
Materials
Compounds Module
Spectra Module
Reactions Module
Materials Module
Text mining Module
Web UI for unified depositions
DropBox, Google Drive, SkyDrive, etc
ELNs, templated data input
Documents
API, FTP, etc
Raw data
Validated data
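The flow on this slide (depositions arrive through a gateway, sit in staging as raw data, and are turned into validated data by per-type modules) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and function names (`Deposition`, `Pipeline`, `compounds_module`, `spectra_module`) and the toy validation rules are hypothetical and are not taken from the RSC platform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Deposition:
    """A raw submission arriving at a (hypothetical) deposition gateway."""
    source: str   # e.g. "web-ui", "ftp", "eln" (illustrative labels)
    kind: str     # e.g. "compound", "spectrum"
    payload: dict


# A module turns a raw payload into a validated one, or raises ValueError.
Module = Callable[[dict], dict]


def compounds_module(payload: dict) -> dict:
    # Toy rule (assumption): a compound record must carry a non-empty name.
    name = str(payload.get("name", "")).strip()
    if not name:
        raise ValueError("compound record has no name")
    return {**payload, "name": name}


def spectra_module(payload: dict) -> dict:
    # Toy rule (assumption): a spectrum must contain at least one data point.
    points = payload.get("points", [])
    if not points:
        raise ValueError("spectrum has no data points")
    return {**payload, "n_points": len(points)}


@dataclass
class Pipeline:
    modules: Dict[str, Module]
    staging: List[Deposition] = field(default_factory=list)    # raw data
    validated: List[dict] = field(default_factory=list)        # validated data
    rejected: List[Tuple[Deposition, str]] = field(default_factory=list)

    def deposit(self, dep: Deposition) -> None:
        """Gateway step: accept anything and park it in staging."""
        self.staging.append(dep)

    def process(self) -> None:
        """Route each staged deposition to the module for its kind."""
        while self.staging:
            dep = self.staging.pop(0)
            module = self.modules.get(dep.kind)
            if module is None:
                self.rejected.append((dep, "no module for kind"))
                continue
            try:
                self.validated.append(module(dep.payload))
            except ValueError as exc:
                self.rejected.append((dep, str(exc)))
```

In this sketch the gateway never rejects a deposition outright; failures are recorded alongside the reason so that a submitter or curator could review them later.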
Staging databases
All databases are sliced by data source/data collection and have a simple security model in which each data slice/source is private, public or embargoed.
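A minimal sketch of such a per-slice security model is shown below. The three visibility states come from the slide; everything else is an assumption made for illustration: the `DataSlice` fields, the notion of a single owner, and the rule that an embargoed slice becomes readable by others once its embargo date has passed.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"
    EMBARGOED = "embargoed"


@dataclass(frozen=True)
class DataSlice:
    source: str                           # data source / collection name
    owner: str                            # hypothetical: one owning user
    visibility: Visibility
    embargo_until: Optional[date] = None  # only meaningful when EMBARGOED


def can_read(data_slice: DataSlice, user: str, today: date) -> bool:
    """Return True if `user` may read the slice on the given day."""
    if user == data_slice.owner:
        return True  # assumption: owners always see their own data
    if data_slice.visibility is Visibility.PUBLIC:
        return True
    if data_slice.visibility is Visibility.EMBARGOED:
        # Assumption: the embargo lifts on (and after) embargo_until.
        return (data_slice.embargo_until is not None
                and today >= data_slice.embargo_until)
    return False  # PRIVATE slices are visible to the owner only
```

Keeping the check at the level of the slice, rather than the individual record, is what keeps the model simple: a record inherits the visibility of the source it was deposited under.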
Etc
Experiments
Research
Compounds Database
Reactions Database
• ChemSpider Synthetic Pages
• Methods in Organic Synthesis
• Catalysts and Catalyzed Reactions
• USPTO
Reactions Database
Analytical Data Database
Data Pipeline
Data tier: Compounds, Reactions, Spectra, Crystals, Documents
Data access tier: Compounds API, Reactions API, Spectra API, Crystals API, Documents API
User interface components tier: Compounds Widgets, Reactions Widgets, Spectra Widgets, Crystals Widgets, Documents Widgets
User interface tier (examples): Analytical Laboratory application, Electronic Laboratory Notebook, Chemical Inventory application, paid 3rd party integrations (various platforms: SharePoint, Google, etc.)
Data quality
– Robochemistry
– Proliferation of errors in public and private databases
– Automated quality control system
– Crowdsourcing
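One simple form an automated quality check could take is a cross-source consistency test: if different sources attach different structures to the same compound name, at least one of them is probably propagating an error, and the conflict can be handed to curators. The sketch below illustrates that idea only; the record layout, the function name `find_conflicts` and the placeholder structure identifiers are assumptions, not a description of the RSC system.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (source, compound name, structure identifier); the identifier is any
# canonical string for a structure and is treated here as an opaque token.
Record = Tuple[str, str, str]


def find_conflicts(records: Iterable[Record]) -> Dict[str, Dict[str, List[str]]]:
    """Map each name with more than one structure to {structure: [sources]}."""
    by_name: Dict[str, Dict[str, set]] = defaultdict(lambda: defaultdict(set))
    for source, name, structure in records:
        # Names are compared case-insensitively, ignoring outer whitespace.
        by_name[name.strip().lower()][structure].add(source)
    return {
        name: {structure: sorted(sources) for structure, sources in structs.items()}
        for name, structs in by_name.items()
        if len(structs) > 1
    }


if __name__ == "__main__":
    # Placeholder data, not real database content.
    demo = [
        ("db1", "Compound X", "structure-A"),
        ("db2", "compound x", "structure-A"),
        ("db3", "Compound X", "structure-B"),
        ("db1", "Compound Y", "structure-C"),
    ]
    for name, structures in find_conflicts(demo).items():
        print(name, structures)
```

A check like this only flags disagreement; deciding which structure is correct still needs a rule set or a human curator, which is where crowdsourcing fits in.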
Typical public database errors
J. Brecher, IUPAC, Graphical Representation of Stereochemical Configuration, Section ST-1.1.10
DB06287
Chemistry Validation and Standardization Platform
Crowdsourcing and AltMetrics
RSC/Rewards and Recognition
Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said “A journey of a thousand miles begins with a single step”. In the same way we hope that this will be the first of many submissions that you make to CSSP.
The First Step badge is awarded when a user submits (& has published) their 1st CSSP article.
We are a part of a much larger world
Research data network
University 1
Data Hub
Workstations
University 2
Data Hub
Workstations
Company 3
Data Hub
Workstations
Data Repository indexed storage
Data Repository provided data storage
Chemically intelligent services
Indexes
Data
External clients: Publishers, Scientists, Funding bodies
ChemSpider APIs
National Chemistry Database
http://www.openphacts.org
Open PHACTS is an Innovative Medicines Initiative (IMI) project, aiming to reduce the barriers to drug discovery in industry, academia and for small businesses.
The Semantic Web is one of the cornerstones
OSDD