A centre of expertise in digital information management
www.ukoln.ac.uk
UKOLN is supported by:
Dealing with Data: Roles, Rights, Responsibilities & Relationships
Dr Liz Lyon, Director, UKOLN
Associate Director, UK Digital Curation Centre
JISC Digital Repositories Conference, Manchester, June 2007.
This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0
Overview
• Outcomes of a recent JISC-funded study by UKOLN
– Institutions (repositories) and data centres
– Roles, rights, responsibilities, relationships
– High-level data-flow models
• Positioned in the UK context
– 8 perspectives from Strategy to Practice
– Examples of best practice
– Recommendations
Strategy & Co-ordination
• Synthesis
– Funder support for data curation is (still) patchy
– Gaps in infrastructure support
– High level and strategic
– Operational level and practical: data services & data centres
– Within and between institutions
– Within and between disciplines: globally
• Recommendations
– Datasets Mapping & Gap Analysis
– Data Curation & Preservation Strategy for the UK
– Data Audit Framework for institutions
– Data Networking Forum for data centre staff
Policy & Planning
• Synthesis
– Limited formal links between programme planning and support infrastructure but examples of good practice
– Formal data policies are essential
– Web 2.0 influence: data sharing using social software
– Better joint planning for data management
• Recommendations
– Funders should openly publish, implement and enforce a Data Management, Preservation and Sharing Policy
– Research projects should submit a Data Management Plan for peer-review
– Universities should implement an Institutional Data Management, Preservation and Sharing Policy
January 2007
Data Management and Sharing Plan required “if creating or developing a resource for the research community as the primary goal” or “involve the generation of a significant quantity of data that could potentially be shared for added benefit”
NATURAL ENVIRONMENT RESEARCH COUNCIL
NERC has:
• 7 designated data centres
• Published policy (under review)
• Data Management Co-ordinator
• Developing DataGrid
General Data Selection Criteria
• Usability
– Quality of data
– Usable data format
– Conditions of Use
– Reputable Author
– Documentation
• Usefulness
– Data quality
– Uniqueness of data
– Potential Strategic Use
– Usefulness of parameters
NATURAL ENVIRONMENT RESEARCH COUNCIL
Practice
• Synthesis
– Data capture automatically at source from instruments, in the lab, in the field
– Not much data in Institutional Repositories (IR)… yet?
– Integrated architectures linking IRs and data centres
– Models for sharing data?
– Barriers: lack of awareness, resistance to change
– Level of re-use of data?
• Recommendations
– Data capture as part of end-to-end research workflow
– Evaluate re-purposing of datasets: identify the significant properties which facilitate re-use
– Develop Disciplinary Case Studies
Technical Integration and Interoperability
• Synthesis
– Data are highly complex and diverse
– Data discovery to delivery
– Standards, standards, standards, standards…
– Value of generic data models, metadata application profiles?
• Recommendations
– Identifiers and data citation best practice
– Version control of datasets
– Annotation models and standards best practice
– Bi-directional interdisciplinary linking between data objects and derived resources
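The recommendations on identifiers, data citation and version control can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field names, the `example:` identifier scheme and the citation layout are hypothetical and do not come from the study or from any published standard.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable version of a dataset (all field names are illustrative)."""
    identifier: str   # hypothetical persistent identifier, not a real scheme
    version: int
    creator: str
    title: str
    year: int
    publisher: str

    @property
    def versioned_id(self) -> str:
        # Identifier that points at exactly one version of the data.
        return f"{self.identifier}/v{self.version}"

    def cite(self) -> str:
        # Illustrative citation layout only; not a published citation standard.
        return (f"{self.creator} ({self.year}). {self.title}, "
                f"version {self.version}. {self.publisher}. {self.versioned_id}")

    def revise(self, year: int) -> "DatasetVersion":
        # A revision creates a new record with the next version number.
        # The earlier record is left untouched, so an existing citation
        # still refers to the data it originally described.
        return replace(self, version=self.version + 1, year=year)


v1 = DatasetVersion("example:dataset/42", 1, "A. Researcher",
                    "Sample crystal structures", 2007, "Example Data Centre")
v2 = v1.revise(year=2008)
print(v1.cite())
print(v2.cite())
```

Keeping each version immutable is one possible design choice: a paper that cites version 1 keeps pointing at version 1 even after the dataset is revised.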
Microarray data to inform gene expression
• Consensus on community standards (MIAME)
• Data pipelines at source via Laboratory Information Management Systems (LIMS)
• User tools (MIAMExpress) & value-added services
• Annotation of data using the Gene Ontology
• Submission & deposit is embedded in community culture: requirement for publication
• Training programme, eLearning materials coming
This level of data curation is expensive!!
[Figure: landscape of biomolecular data resources. Source: Graham Cameron, EBI]
Core biomolecular resources: Reactome; EnsEMBL (genome annotation); EMBL-Bank (DNA sequences); UniProt (protein sequences); ArrayExpress (microarray expression data); EMSD (macromolecular structure data); IntAct (protein interactions)
Large resources in related disciplines: chemical data resources; medical data resources; biodiversity data resources
Specialist biomolecular data resource examples and model organism resource examples (labels as shown in the figure): Flybase, MGD, SGD, BRENDA, IMGT, Pasteur DBs, Eumorphia/Phenotypes, Mutants, Mouse Atlas
Domain Data Deposit Model
[Diagram: several Scientists deposit data with Domain Data Centres, with links to Publisher, Funder and User]
Actors: Scientist, Domain Data Centre, Publisher, Funder, User
Activities: Create, Deposit, Link, Curate, Preserve, Collaborate, Share (blogs, wikis), Discover, Re-use
Supporting functions: Policy & Advocacy, Training, Standards, Community standards
This work is licensed under a Creative Commons License Attribution-ShareAlike 2.0 © Liz Lyon (UKOLN, University of Bath) 2007
[Diagram: eCrystals Federation (eBank Project)]
Components: Data creation & capture in “Smart lab”; Laboratory repository; Institutional data repositories; Subject Repository; Aggregator services; Presentation services / portals
Stakeholders: Institutions: eCrystals Federation (eBank Project); Institution Library & Information Services; Publishers: peer-review journals, conference proceedings, etc.
Flows and functions: Deposit; Validation; Data analysis; Publication; Search, harvest; Data discovery, linking, citation; Curation; Preservation
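Federation Data Deposit Model
[Diagram: several Scientists deposit data in Institutional Repositories (IRs), which form a Federation and are harvested by an Aggregator]
Actors: Scientist, IR (Federation), Aggregator ? Data Centre, Publisher, Funder, User
Activities: Create, Deposit, Harvest, Link, Curate, Preserve, Collaborate, Share (blogs, wikis), Discover, Re-use
Supporting functions: Policy, Advocacy, Training, Standards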
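The federation model above, in which an aggregator harvests records from several institutional repositories (IRs) so that users can discover data in one place, can be sketched roughly as follows. This is an illustration only: the class and method names are hypothetical, the repositories are in-memory stand-ins, and a real federation would harvest over a network protocol rather than through direct method calls.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    record_id: str
    title: str
    repository: str  # name of the IR that holds the data


@dataclass
class InstitutionalRepository:
    """In-memory stand-in for an IR (hypothetical, for illustration)."""
    name: str
    records: list = field(default_factory=list)

    def deposit(self, record_id: str, title: str) -> None:
        # A scientist deposits a dataset description in their own IR.
        self.records.append(Record(record_id, title, self.name))

    def harvest(self) -> list:
        # Expose a copy of the metadata records for the aggregator.
        return list(self.records)


class Aggregator:
    """Builds one searchable index from many repositories."""

    def __init__(self) -> None:
        self.index = {}

    def harvest_from(self, repositories) -> None:
        for repo in repositories:
            for rec in repo.harvest():
                # Key on (repository, id) so that identical local ids
                # in different IRs do not overwrite each other.
                self.index[(rec.repository, rec.record_id)] = rec

    def search(self, term: str) -> list:
        term = term.lower()
        hits = [r for r in self.index.values() if term in r.title.lower()]
        return sorted(hits, key=lambda r: (r.repository, r.record_id))


ir_a = InstitutionalRepository("IR-A")
ir_b = InstitutionalRepository("IR-B")
ir_a.deposit("1", "Crystal structure of compound X")
ir_b.deposit("1", "Crystal structure of compound Y")
ir_b.deposit("2", "Field survey measurements")

aggregator = Aggregator()
aggregator.harvest_from([ir_a, ir_b])
results = aggregator.search("crystal")
for rec in results:
    print(rec.repository, rec.record_id, rec.title)
```

In this sketch the data stay in the institutional repositories and only the metadata records are copied into the aggregator, which is one way to read the "Harvest" arrow in the diagram.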
Legal and Ethical Issues
• Synthesis
– IPR is a barrier to data sharing e.g. geospatial data, performing arts
– We need a better understanding of the issues
• Recommendations
– JISCLegal provide enhanced advice about data and IPR
– Develop model licences with other organisations
Sustainability
• Synthesis
– Are current economic models for preservation & data sharing infrastructure a) appropriate? b) adequate? c) sustainable?
– Should inform research prioritisation and investment
• Recommendations
– Cost-benefit study
– Construct new economic models
Advocacy
• Synthesis
– Programmes need to reach across sectors
– Harmonisation and consistent messages
– Researcher has some curatorial responsibility
• Recommendations
– UK Co-ordination and target at specific disciplines
Training and Skills
• Synthesis
– Leverage library & archive experience, EU projects DPE and PLANETS
– Data curators and “native data scientists”
• Recommendations
– Co-ordination: in the UK
– Review career development of data scientists
– Assess value of data handling and curation in the curriculum
UK Digital Curation Centre
http://www.dcc.ac.uk/
Scientist : creation and use of data
Rights
Of first use.
To be acknowledged.
To expect IPR to be honoured.
To receive data training and advice.
Responsibilities
Manage data for life of project.
Meet standards for good practice.
Comply with funder / institutional data policies and respect IPR of others.
Work up data for use by others.
Relationships
With institution as employee.
With subject community.
With data centre.
With funder of work.
Baroness Susan Greenfield, UK
Institution : curation of and access to data
Rights
To be offered a copy of data.
Responsibilities
Set internal data management policy.
Manage data in the short term.
Meet standards for good practice.
Provide training and advice to support scientists.
Promote the repository service.
Relationships
With scientist as employer.
With data centre through expert staff.
http://www.flickr.com/photos/nrparmar/383549700/in/pool-bath-uni/
Data centre : curation of and access to data
Rights
To be offered a copy of data.
To select data of long-term value.
Responsibilities
Manage data for the long-term.
Meet standards for good practice.
Provide training for deposit.
Promote the repository service.
Protect rights of data contributors.
Provide tools for re-use of data.
Relationships
With scientist as “client”.
With user communities.
With institution through expert staff.
With funder of service.
User : use of 3rd party data
Rights
To re-use data (non-exclusive licence).
To access quality metadata to inform usability.
Responsibilities
Abide by licence conditions.
Acknowledge data creators / curators.
Manage derived data effectively.
Relationships
With data centre as supplier.
With institution as supplier.
GridPP computing facilities at Imperial College, London
Funder : set/react to public policy drivers
Rights
To implement data policies.
To require those they fund to meet policy obligations.
Responsibilities
Consider wider public-policy perspective & stakeholder needs.
Participate in strategy co-ordination.
Develop policies with stakeholders.
Participate in policy co-ordination, joint planning & fund service delivery.
Monitor and enforce data policies.
Resource post-project long-term data management.
Act as advocate for data curation & fund expert advisory service(s).
Support workforce capacity development of data curators.
Relationships
With scientist as funder.
With institution.
With data centre as funder.
With other funders.
With other stakeholders as policy-maker and funder of services.
Publisher : maintain integrity of the scientific record
Rights
To expect data are available to support publication.
To request pre-publication data deposit in long-term repository.
Responsibilities
Engage stakeholders in development of publication standards.
Link to data to support publication standards.
Monitor & enforce publication standards.
Relationships
With scientist as creator, author and reader.
With data centres and institutions as suppliers.
Dealing with Data Report will be published shortly at www.ukoln.ac.uk