
                                                             

UKOLN is supported by:

Enhancing access to research data: the challenge of crystallography

Rachel Heery, Monica Duke, Michael Day

UKOLN, University of Bath

Leslie Carr, Simon Coles

University of Southampton

www.bath.ac.uk

A centre of expertise in digital information management

JCDL 2005, June 7-11, Denver

                                                             

Enhancing access to research data: overview

• Crystallography as an exemplar

• Impact of digital technologies on scientific research process

• Need new modes of data curation

• eBank project: applying digital library techniques to support data curation

• Next steps

                                                             

Changes in scientific research process

• Increasing data volumes from eScience / Grid-enabled / cyber-infrastructure applications, “big science”

• Changing research methods: high throughput technologies, automation, ‘smart labs’

• Potential for re-use of data, new inter-disciplinary research

• Different types of data: observational data, experimental data, computational data: different stewardship requirements

Data Overload!

How do we disseminate?

EPSRC National Crystallography Service

The data deluge: crystallography

Data overload & the publication bottleneck

[Figure: chemical structure diagrams (only atom labels survive) with the figures 25,000,000, 2,000,000 and 300,000]

Current Publishing Process

• Journal articles: aims, ideas, context, conclusions – only most significant data

• Raw & underlying data required by peers not readily available

                                                             

Context: existing data repositories

• National data archives:

– UK Data Archive, Arts and Humanities Data Service, US National Archives and Records Administration (NARA), Atlas Datastore

• Discipline specific archives:

– GenBank, Protein Data Bank

• Crystallography archives:

– Cambridge Crystallographic Data Centre (Cambridge Structural Database), Indiana University Molecular Structure Center (Crystal Data Server, Reciprocal Net), FIZ Karlsruhe (Inorganic crystals), Toth Information Systems (CRYSTMET)

• Journals require deposit of data to support articles

– Typically deposit of summary data… partial coverage

Crystallography workflow

RAW DATA, DERIVED DATA, RESULTS DATA

• Initialisation: mount new sample on diffractometer & set up data collection

• Collection: collect data

• Processing: process and correct images

• Solution: solve structures

• Refinement: refine structure

• CIF: produce CIF (Crystallographic Information File)

• Validation: chemical & crystallographic checks
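The workflow stages listed on this slide form a fixed sequence, which can be sketched as an ordered enumeration. This is only an illustration: the `Stage` class and both helper functions are hypothetical names, not part of any eBank software.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Workflow stages in the order listed on the slide."""
    INITIALISATION = 1
    COLLECTION = 2
    PROCESSING = 3
    SOLUTION = 4
    REFINEMENT = 5
    CIF = 6
    VALIDATION = 7


def next_stage(stage):
    """Return the stage that follows `stage`, or None after validation."""
    if stage is Stage.VALIDATION:
        return None
    return Stage(stage + 1)


def completed_stages(current):
    """List every stage up to and including `current`."""
    return [s for s in Stage if s <= current]
```

An ordered type like this would let a repository record how far a given data file has progressed through the workflow.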

                                                             

eBank UK project overview

• JISC funded in 2003, now in Phase 2 to 2006

• Joint effort between crystallographers, computer scientists, digital library researchers

• Investigating contribution of existing digital library technologies to enable ‘publication at source’

• Partners have interest in dissemination of chemistry research data, open access, OAI, institutional repositories

http://www.ukoln.ac.uk/projects/ebank-uk/

                                                             

eBank project team

University of Bath, UKOLN

• Michael Day, Monica Duke, Rachel Heery, Liz Lyon, Traugott Koch

University of Southampton, School of Chemistry

• Simon Coles, Jeremy Frey, Mike Hursthouse

University of Southampton, School of Electronics and Computer Science

• Leslie Carr, Chris Gutteridge

University of Manchester, PSIgate

• John Blunden-Ellis

                                                             

eBank phase one: achievements

• Gathered requirements from crystallographers

• Established pilot institutional repository for crystallography data at Southampton with web interface

• Developed a demonstrator aggregator service at UKOLN (CCDC exploring aggregation service)

• Developed appropriate schema

• Demonstrated a search interface as an embedded service at PSIgate portal

• Demonstrated an added value service linking research data to papers (one-off)

                                                             

Institutional repositories…publication at source

• Institution establishes repository(s)

• Institution pro-actively supports deposit process

• OAI provides basis for interoperability

• Potential for added value services

• And/Or… international subject based archives?

                                                             

Crystallography good fit….

• Crystallography has well defined data creation workflow

• Tradition of sharing using standard file format

• Crystallographic Information File (CIF)

• What about other chemistry sub-disciplines? other scientific disciplines?

Data Flow in eBank UK

[Diagram labels: Create; Submit; Store/link; Data files; Metadata; Institutional repository; present HTML; OAI-PMH; Harvest (XML); eBank aggregator; Index and Search; present HTML]
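The harvest step in this data flow uses OAI-PMH, in which an aggregator sends HTTP requests to a repository and receives XML records back. A minimal sketch of building such a request follows. It assumes a hypothetical base URL; the `ListRecords` verb and the `metadataPrefix` and `from` arguments come from the OAI-PMH protocol, and `ebank_dc` is the metadata format named in these slides.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
BASE_URL = "http://repository.example.org/oai"
url = list_records_url(BASE_URL, metadata_prefix="ebank_dc",
                       from_date="2005-01-01")
```

An aggregator would fetch this URL, parse the returned XML and index the records for searching. Fetching and parsing are left out so that the sketch runs without a network connection.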

                                                             

Southampton digital repository

http://ecrystals.chem.soton.ac.uk

Access to ALL underlying data

Embedded search service at PSIgate

PSIgate subject gateway: service provider

                                                             

Schema for records made available for harvesting

• Data holding (collection of files associated with experiment)

• Qualified Dublin Core data elements plus additional chemical properties

– Empirical formula

– International Chemical Identifier (InChI)

– Compound Class

• Individual data files

• Separate records for stage status of each file

• Description set wrapped into one XML record using METS

• Research metadata/data as a complex object
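A record that combines Dublin Core elements with chemical properties might look roughly like the sketch below. It is not the actual ebank_dc schema: the `ebank` namespace URI, the chemistry element names (`empiricalFormula`, `inchi`, `compoundClass`) and the identifier URL are all invented for illustration, and the METS wrapping mentioned on the slide is omitted. Only the Dublin Core namespace and the `CrystalStructure` type value are taken from existing sources.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set
EBANK_NS = "http://example.org/ebank_dc/"    # hypothetical namespace

ET.register_namespace("dc", DC_NS)
ET.register_namespace("ebank", EBANK_NS)


def build_record(identifier, title, formula, inchi, compound_class):
    """Build one illustrative description of a crystal structure data holding."""
    record = ET.Element("{%s}record" % EBANK_NS)
    dc_fields = {
        "identifier": identifier,
        "title": title,
        "type": "CrystalStructure",  # value shown on the data model slide
    }
    for name, value in dc_fields.items():
        ET.SubElement(record, "{%s}%s" % (DC_NS, name)).text = value
    # Element names below are invented for this sketch.
    chem_fields = {
        "empiricalFormula": formula,
        "inchi": inchi,
        "compoundClass": compound_class,
    }
    for name, value in chem_fields.items():
        ET.SubElement(record, "{%s}%s" % (EBANK_NS, name)).text = value
    return record


record = build_record(
    identifier="http://repository.example.org/holding/1",  # hypothetical
    title="Benzene",
    formula="C6H6",
    inchi="InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    compound_class="aromatic hydrocarbon",
)
xml_text = ET.tostring(record, encoding="unicode")
```

Keeping the chemical properties in their own namespace means a harvester that only understands Dublin Core can still read the generic fields.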

eBank data model

[Diagram labels, grouped in reading order:]

• Institutional repositories: Crystal structure (data holding); Crystal structure report (HTML); Dataset; Dataset; Dataset

• ebank_dc record (XML): dc:type=“CrystalStructure”; dc:identifier; dcterms:references

• Eprint oai_dc record (XML): dc:type=“Eprint” and/or “Text”; dc:identifier; dcterms:isReferencedBy

• Eprint “jump-off” page (HTML); Eprint manifestation (e.g. PDF)

• Deposit; Linking

• Harvesting OAI-PMH, with metadata formats: ebank_dc; oai_dc, ebank_dc; oai_dc

• eBank UK aggregator service; ePrint UK aggregator service; Other aggregators and services

Model input: Andy Powell, UKOLN.
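The linking in this model pairs `dcterms:references` with its inverse, `dcterms:isReferencedBy`. The sketch below checks that two records point at each other. It assumes that the dataset record carries `dcterms:references` and the eprint record carries `dcterms:isReferencedBy`, which is inferred from the order of the slide labels and not stated explicitly; the dictionaries and URLs are hypothetical stand-ins for harvested records.

```python
def links_are_reciprocal(dataset_record, eprint_record):
    """Check that the dataset cites the eprint and the eprint cites it back."""
    dataset_id = dataset_record["dc:identifier"]
    eprint_id = eprint_record["dc:identifier"]
    dataset_cites_eprint = eprint_id in dataset_record.get("dcterms:references", [])
    eprint_cites_dataset = dataset_id in eprint_record.get("dcterms:isReferencedBy", [])
    return dataset_cites_eprint and eprint_cites_dataset


# Hypothetical records, reduced to the fields used for linking.
dataset = {
    "dc:type": "CrystalStructure",
    "dc:identifier": "http://repository.example.org/holding/1",
    "dcterms:references": ["http://eprints.example.org/42"],
}
eprint = {
    "dc:type": "Eprint",
    "dc:identifier": "http://eprints.example.org/42",
    "dcterms:isReferencedBy": ["http://repository.example.org/holding/1"],
}
unlinked_eprint = {
    "dc:type": "Eprint",
    "dc:identifier": "http://eprints.example.org/43",
}
```

An aggregator could run a check like this after harvesting both record types and before offering a service that links research data to papers.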

                                                             

Creating the metadata

• Potential to embed ‘deposit and disseminate’ into workflow of chemist in automated way

[Flowchart: Data Collection. Labels: Sample Tray; Setup via GUI; BruNo Mount; PreScans; Diffraction; Unit Cell; Success; Strategy; Data Collection; Data Process; BruNo Unmount; System Y; decision branches Yes / No]

                                                             

eBank phase two work areas

• Sub-disciplines of chemistry and physical sciences

• Pursue generic data model

• Use of identifiers for citing datasets

• Subject approach to discovering research data

• Access to research data in teaching and learning context

• Liaise with other digital repository initiatives

                                                             

For the future…

• Who provides added value services?

– Authority files, automated subject indexing, annotation, data mining, visualisation

• What are the preservation issues?

– UK Digital Curation Centre http://www.dcc.ac.uk

– National Science Board Draft report on long-lived data collections http://www.nsf.gov/nsb/meetings/2005/LLDDC_draftreport.pdf

• How to manage complex object descriptions within OAI

• Digital curation of research data presents new roles for scientists, computer scientists, data managers…. ‘data scientists’

                                                             

Thank you.

Comments, questions?

http://www.ukoln.ac.uk/projects/ebank-uk/

Acknowledgement to all project partners for their contributions to this presentation.

