A centre of expertise in digital information management
www.ukoln.ac.uk
UKOLN is supported by:
Codes, Clouds & Constellations: Open Science in the Data Decade
Dr Liz Lyon, Director, UKOLN, University of Bath, UK; Associate Director, UK Digital Curation Centre
CNI Meeting, Baltimore, April 2010
This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 2.0
1. Scaling to Share
2. Publication and Attribution
3. Pathways to Participation
4. Institutions and Informatics
http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/publications.html#november-2009
• 2010 Perspectives
• November 2009
• Consultation
• eResearch Australasia slides
• http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/presentations.html#2009-november-australasia
• Progress, Prospects?
Scaling to Share
Human Genome printed http://www.flickr.com/photos/johnjobby/2252981353/sizes/l/
From the laboratory bench...
...to a national crystallography service...
...to Diamond Light Source
• “Bridging the chasm” between the local laboratory bench and large scale facilities
• Develop Integrated Information Model
• Use cases and Inter-disciplinary Pilots
• Cost-benefit analysis: before and after
http://www.ukoln.ac.uk/projects/I2S2/
Aspect | Diamond Light Source | National Crystallography Service (NCS) | Local Earth Sciences Lab, University of Cambridge
Function | International service; multiple communities | UK service; multiple institutions; also uses Diamond | Lone researcher at institution; uses NCS and ISIS large-scale facility
Administration | Peer-reviewed proposal required | Paper-based records: experiments, safety ERA, instrument time | Multiple proposals, multiple forms
Metadata | Core Scientific MetaData Model | eBank/eCrystals schema | ?
Identifiers | Beam-line number | DOI, InChI | ?
Workflow | Formulaic and bespoke | Formulaic, unrecorded | Complex, unrecorded
Software | In-house scripts | In-house scripts + open-source suite | In-house scripts + open-source suite
Raw data | In-house GDA store | ATLAS data-store | Laptop / local server
Derived data | Taken offsite on laptop / USB stick | eCrystals repository | Laptop / local server / USB stick
Technology race to market: $1000 genome in <15 minutes... by 2013?
...data deluge challenges...
• Large-scale data storage that is:
  – cost-effective (rent on demand)
  – secure (privacy and IPR)
  – robust and resilient
  – low entry barrier / easy to use
  – equipped with data-handling / transfer / analysis capability
• Move sequencing out of genome centres
• “....analyse an entire human genome in a single day sitting with a laptop at your local Starbucks.”
...cloud services?
...data clouds in the media
Clients in the cloud
Post-genome decade
Human genomes: >24 published and almost 200 unpublished
“P4 medicine: predictive, personalised, preventive, participatory.”
Leroy Hood, Institute for Systems Biology
• Each patient’s genome sequenced
• Your genome is the basis of your medical record
• New predictive models of health and disease
• Individualised treatments focusing on preventative therapies
Image from Scientific American
Genome-scale network biology
Genomic data as a commodity
• Sage Bionetworks: integrative genomics
• Develop predictive models of disease: liver / breast / colon cancer, diabetes, obesity
• Open data in the Sage Commons
• Human and mouse: clinical and genetic data
• Congress, San Francisco, 23-24 April 2010
Stephen Friend
They have shared their data….
Heather Piwowar
…but many researchers don’t share…
…and are reluctant to re-use data…
Publication and Attribution
http://www.flickr.com/photos/digitalfemme57/3271063366/
Calls for action, new metrics
Attribution granularity, from macro to micro / nano:
• Journal
• Article
• Workflow
• Data
• Annotation
• Concept
... complexity challenges...
Citing network models
• Multiple data sources
• Many standards
• Workflow integration
• User requirements
• Service functionality?
Pathways to Participation
http://www.flickr.com/photos/lemontwist/502860137/sizes/o/
Continuum of Openness
Open access | Closed access
Participation
Lone scholar
Professional, experts
Volunteers, interested amateurs
Citizen science
“dark data”
Creative Commons Attribution-Non-Commercial-Share Alike 2.0
Data Informatics: Logistics dilemma
Professional scientist | Citizens
Capability
Capacity
Data scientists, LIS
Peer production
Volunteers, interested amateurs
Community curation
Professional scientist
Observations
Audit
Preservation
Ontologies
Metadata schema
Annotation
Data management plans
Selection & Appraisal
Data cleansing
Training
Visualisation
Peer Production
Using gaming to drive curation
Professional scientists | Enthusiastic amateurs
Training | Citizen scientist
Standards and ethics | Local: natural history, environment
Peer review | Global: astronomy
Organisational support | Self-supporting
Citizen science...
Privacy issues?
… “participatory urbanism”?
“You have zero privacy anyway. Get over it”
Scott McNealy, CEO Sun Microsystems, 1999
Working with science professionals
...cultural challenges for faculty?
Institutions and Informatics
University of Edinburgh Informatics Forum http://www.flickr.com/photos/chris_malcolm/2638210422/sizes/l/
Open Science at Web-Scale Report 2009
Institutional response: high-throughput biology
• North Carolina universities
• Cyber-infrastructure project
• Data cloud across three campuses
• “regional”
• Policy & practice
New data support structures
Facilitating team science
- Future Chips
- Biocomputation & Bioinformatics
- Tetherless World
- Integrative Systems Biology
- Graphic designers?
- Animators?
- Social scientists?
- Legal experts?
Embedding data informatics education
...for faculty & LIS...
Take-homes
1. Data sharing requires pragmatic solutions
2. Attribution granularity & citation complexity
3. We need “the crowd”
4. Institutional strategies embrace informatics
5. The prospects are transformational...
http://www.flickr.com/photos/29170077@N05/4412360636/
Slides will be available at: http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/presentations.html
http://www.dcc.ac.uk/