Integration of oreChem with the eCrystals repository for crystal structures
Mark Borkum, Simon Coles and Jeremy Frey
15 September 2010
2
Overview
• Motivation
• Implementation
• Discussion and Summary
3
Current Practice in Crystallography
• Crystallography data is highly structured
– The de facto standard adopted by the community is the CIF (Crystallographic Information File)
• Relatively few crystal structures are openly published
http://www.rin.ac.uk/our-work/data-management-and-curation/share-or-not-share-research-data-outputs
4
Open Access Journals
• Advantages:
– Rapid publication
– Highly cited
– Data is available to download
• Disadvantages:
– Electronic only
– Not all data is of primary importance to the underlying chemistry
• By-products, unexpected results, tracking reactions, etc.
5
Crystallography and Fraud
6
The eCrystals Federation
• JISC project to establish a network of crystallography resources on the Internet, with metadata that is harvested by a number of aggregation services
– Led by the UK National Crystallography Service (NCS)
– With core partners at UKOLN, the Digital Curation Centre, and the Unilever Centre for Molecular Science Informatics
7
eCrystals – University of Southampton
• Located @ http://ecrystals.chem.soton.ac.uk
• Archive for crystal structures that are generated by:
– Southampton Chemical Crystallography Group
– UK National Crystallography Service (NCS)
• Modified version of EPrints 3.1
– OAI-PMH compliant
– Extensible platform (with plug-in architecture)
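Because the repository is OAI-PMH compliant, an aggregation service can harvest its metadata with plain HTTP requests. The sketch below only builds a `ListRecords` request URL and does not contact any server. The endpoint path `/cgi/oai2` and the helper name `list_records_url` are assumptions made for illustration; the slides give only the repository's base address. The `oai_dc` metadata prefix is the Dublin Core format that every OAI-PMH repository is required to support.

```python
from urllib.parse import urlencode

# Assumption: this endpoint path is hypothetical; check the repository's
# own documentation for the real OAI-PMH base URL.
BASE_URL = "http://ecrystals.chem.soton.ac.uk/cgi/oai2"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


print(list_records_url(BASE_URL, from_date="2010-09-01"))
```

A harvester would fetch this URL, parse the XML response, and follow any `resumptionToken` it contains to page through the remaining records.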
8
What is an eCrystal?
• “all the fundamental and derived data resulting from a single crystal X-ray structure determination”
• “the information supplied should enable any reader to check the reliability and validity”
http://www.ukoln.ac.uk/projects/ebank-uk/images/collage-web.gif
9
The Scientific Web
10
The Data Deluge
• In Haiku:
– Lots of producers;
  Generating more data
  than ever before.
• 40 years ago, a PhD student would determine 3 structures over the entire course of their study!
The Great Wave off Kanagawa by Katsushika Hokusai
11
Provenance
• The 7 W’s [Goble 2002]
– Who, What, Where, Why, When, Which, & (W)How
• The Why aspect is usually ignored
– Rationale, intent, hypothesis, protocol, methodology, workflow, etc.
“Diana and Actaeon by Titian has a full provenance covering its passage through several owners and four countries since it was painted for Philip II of Spain in the 1550s.”
Source: http://en.wikipedia.org/wiki/Diana_and_Actaeon_%28Titian%29
“In theory, there is no difference between theory and practice. But, in practice, there is.”
Unknown (possibly Yogi Berra)
12
13
Why “Why” Matters
• It is the reason for the data’s existence
• It gives us the ability to interpret the data in the correct context
• It allows us to align the data with the big picture
http://www.myexperiment.org/workflows/16.html
14
The oreChem Core Ontology
• Describes three concepts:
1. The methodology (planned method) of a scientific experiment
2. The enactment of methodologies
3. The provenance of realised artefacts
15
Methodology (Planned Method)
• The “plan” is modelled as a directed graph
• Two node types:
– Plan Stage: description of an activity that will be enacted
– Plan Object: description of an artefact that will be realised
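The plan graph described above can be sketched in a few lines of Python. This is only an illustration of the two node types and the directed edges between them, not the oreChem data model itself. The class and method names are invented for this sketch, and so is the stage name; `HKL` and `CIF` are borrowed from the eCrystals plan identifiers that appear later in the talk, with all intermediate steps collapsed into a single stage.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PlanStage:
    """Description of an activity that will be enacted."""
    name: str


@dataclass(frozen=True)
class PlanObject:
    """Description of an artefact that will be realised."""
    name: str


@dataclass
class Plan:
    """A plan stored as a list of directed (source, target) edges."""
    edges: list = field(default_factory=list)

    def connect(self, source, target):
        self.edges.append((source, target))

    def successors(self, node):
        return [target for source, target in self.edges if source == node]


hkl = PlanObject("HKL")
cif = PlanObject("CIF")
refine = PlanStage("solve-and-refine")  # hypothetical stage name

plan = Plan()
plan.connect(hkl, refine)  # the stage takes the HKL file as input
plan.connect(refine, cif)  # the stage yields the CIF file as output

print(plan.successors(hkl))
```

An edge list keeps the sketch small; a real plan would be expressed as RDF so that it can be published and queried alongside the runs.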
16
Enactment (of a Methodology)
• Each “run” (of a plan) is modelled as a directed graph
• Two node types:
– Stage: description of an activity that has been enacted
– Object: description of an artefact that has been realised
17
Provenance
• Prospective
– The plan describes a scientific experiment that will be enacted
• Retrospective
– The run describes a scientific experiment that has been enacted
– Every ‘run thing’ is linked to exactly one ‘plan thing’
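The "exactly one" rule lends itself to a simple consistency check. The sketch below assumes the links have already been extracted into a dictionary from each run node to the set of plan nodes it points at; the function name and all identifiers are hypothetical.

```python
def unlinked_or_ambiguous(run_to_plan):
    """Return the run nodes that are not linked to exactly one plan node."""
    return sorted(
        node for node, plan_nodes in run_to_plan.items() if len(plan_nodes) != 1
    )


# Hypothetical run nodes and the plan nodes they are linked to.
links = {
    "run643/reflections.hkl": {"HKL"},
    "run643/structure.cif": {"CIF"},
    "run643/notes.txt": set(),  # no plan node: violates the rule
}

print(unlinked_or_ambiguous(links))
```

Here the check reports `run643/notes.txt`, the only node without a plan link. A node linked to two plan nodes would be reported in the same way.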
18
oreChem Plug-in for eCrystals
• Three components:
1. orechem:Plan (the eCrystals methodology)
2. “eCrystal to orechem:Run” mapping
3. “orechem:Run to provenance graph” pipeline
19
The eCrystals Methodology
[Figure panels: Before, After]
20
Example: eCrystal #643
[Figure panels: Before, After]
21
SPARQL Request
PREFIX orechem: <http://www.openarchives.org/2010/05/24-orechem-ns#>
PREFIX ecrystals: <http://ecrystals.chem.soton.ac.uk/plan.rdf#>
SELECT ?run ?raw ?derived ?reported
WHERE {
  ?run a orechem:Run ;
       orechem:hasPlan ecrystals:Ecrystals ;
       orechem:containsObject ?raw ;
       orechem:containsObject ?derived ;
       orechem:containsObject ?reported .
  ?raw a orechem:File ;
       orechem:hasPlanObject ecrystals:HKL .
  ?derived a orechem:File ;
       orechem:derivedFrom ?raw .
  ?reported a orechem:File ;
       orechem:hasPlanObject ecrystals:CIF ;
       orechem:derivedFrom ?derived .
}
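To make the shape of this request concrete, the sketch below evaluates the same graph pattern over a small in-memory set of triples in plain Python. It is not how a SPARQL engine works internally, and it is not the eCrystals implementation. The `ex:` identifiers and the sample triples are invented for illustration; only the `orechem:` and `ecrystals:` names come from the request.

```python
RDF_TYPE = "rdf:type"  # what "a" abbreviates in SPARQL


def match_runs(triples):
    """Return (run, raw, derived, reported) tuples matching the request."""
    facts = set(triples)

    def has(s, p, o):
        return (s, p, o) in facts

    def is_file(node):
        return has(node, RDF_TYPE, "orechem:File")

    results = []
    runs = sorted(s for s, p, o in facts if p == RDF_TYPE and o == "orechem:Run")
    for run in runs:
        if not has(run, "orechem:hasPlan", "ecrystals:Ecrystals"):
            continue
        contained = sorted(
            o for s, p, o in facts if s == run and p == "orechem:containsObject"
        )
        for raw in contained:
            if not (is_file(raw) and has(raw, "orechem:hasPlanObject", "ecrystals:HKL")):
                continue
            for derived in contained:
                if not (is_file(derived) and has(derived, "orechem:derivedFrom", raw)):
                    continue
                for reported in contained:
                    if (
                        is_file(reported)
                        and has(reported, "orechem:hasPlanObject", "ecrystals:CIF")
                        and has(reported, "orechem:derivedFrom", derived)
                    ):
                        results.append((run, raw, derived, reported))
    return results


# Hypothetical sample data: one run holding a raw, a derived and a reported file.
SAMPLE = [
    ("ex:run643", RDF_TYPE, "orechem:Run"),
    ("ex:run643", "orechem:hasPlan", "ecrystals:Ecrystals"),
    ("ex:run643", "orechem:containsObject", "ex:hkl"),
    ("ex:run643", "orechem:containsObject", "ex:res"),
    ("ex:run643", "orechem:containsObject", "ex:cif"),
    ("ex:hkl", RDF_TYPE, "orechem:File"),
    ("ex:hkl", "orechem:hasPlanObject", "ecrystals:HKL"),
    ("ex:res", RDF_TYPE, "orechem:File"),
    ("ex:res", "orechem:derivedFrom", "ex:hkl"),
    ("ex:cif", RDF_TYPE, "orechem:File"),
    ("ex:cif", "orechem:hasPlanObject", "ecrystals:CIF"),
    ("ex:cif", "orechem:derivedFrom", "ex:res"),
]

print(match_runs(SAMPLE))
```

The pattern reads as a chain: a raw HKL file, something derived from it, and a reported CIF derived from that, all contained in one run of the eCrystals plan.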
22
SPARQL Response (for eCrystal #643)
[Figure: result bindings for ?run, ?raw, ?derived and ?reported]
23
Summary
• <summary/>
24
Acknowledgments
• oreChem is funded by Microsoft External Research
• eCrystals is funded by both EPSRC and JISC
• The oreChem project team:
– Nico Adams, Mark Borkum, William Brouwer, Rameswara Sashi Kiran Challa, Simon Coles, Nick Day, Jim Downing, Jeremy Frey, C. Lee Giles, Carl Lagoze (PI), Na Li, Prasenjit Mitra, Karl Meuller, Peter Murray-Rust, Marlon Pierce, Joe Townsend, and Theresa Velden.
25
#ahm2010
#ahm
#ahm10
#pch2010
http://pegasus.chem.soton.ac.uk #ahm2010 until 11am Wed 15 Sept 2010