Because good research needs good data
Supporting Research Data Management
at the University of Stirling
Graham Pryor and Martin Donnelly
Digital Curation Centre
27 April 2012
This work is licensed under a Creative Commons Attribution 2.5 UK: Scotland License
The Digital Curation Centre is
• a consortium comprising units from the Universities of
Bath (UKOLN), Edinburgh (DCC Centre) and Glasgow
(HATII)
• launched 1st March 2004 as a national centre for
solving challenges in digital curation that could not be
tackled by any single institution or discipline
• funded by JISC
• with additional HEFCE funding from 2011 for:
– the provision of support to national cloud services
– targeted institutional development
The DCC Mission
Helping to build capacity, capability and skills in data management and curation
across the UK’s higher education research community
– DCC Phase 3 Business Plan
DCC institutional stakeholders
University managers
Researchers
Research support staff with a role to play in data management, particularly those from
• University libraries
• IT services
• The research and innovation office
• Digital repositories
Why manage research data?
The impact of e-Science and the global network
• “Research data is a form of infrastructure, the basis
for data intensive research across many domains” –
EC Riding the Wave report, 2010
• “Funders expect research to be international in
scope. A third of all articles published are
internationally collaborative” – Royal Society, 2011
The governmental and funder imperative
• “Publicly-funded research data must be made
available for secondary scientific research” – ESRC
research data policy
Why manage research data?
The researcher incentive
• “By making their data available via licensed
platforms researchers stand to improve their
status as researchers through the mandatory
citing and attribution of their original work”
– Mark Hahnel, FigShare, IDCC 2011
The same demanding, sometimes competing
community of perspectives that the Digital Curation
Centre was created to unravel…
Where is the data in research?
The six datacentric phases of the research lifecycle
Reflections: the research data lifecycle
Three perspectives
Scale and complexity
– Volume and pace
– Infrastructure
– Open science
Policy
– Funders
– Institutions
– Ethics & IP
Management
– Storage
– Incentives
– Costs & Sustainability
Image: http://www.nonsolotigullio.com/effettiottici/images/escher.jpg/
“Surfing the Tsunami”, Science, 11 February 2011
The data deluge
http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/publications.html#november-2009
“For science to effectively function,
and for society to reap the full
benefits from scientific endeavours,
it is crucial that science data be
made open”
Open to all? Case studies of openness
in research
Choices are made according to context, with
degrees of openness reached according to:
• The kinds of data to be made available
• The stage in the research process
• The groups to whom data will be made
available
• On what terms and conditions it will be
provided
Default position of most:
• YES to protocols, software, analysis tools,
methods and techniques
• NO to making research data content freely
available to everyone
After all, where is the incentive?
Angus Whyte, RIN/NESTA, 2010
“While many researchers are positive about sharing data in
principle, they are almost universally reluctant in practice. …
using these data to publish results before anyone else is the
primary way of gaining prestige in nearly all disciplines.”
INCREMENTAL Project
“Data sharing was more readily discussed by early career
researchers.”
Rules and regulations…
Compliance
Data Protection Act 1998
• Rights, Exemptions, Enforcement
Freedom of Information Act 2000
• Climategate, Tree Rings, Tobacco and… (what’s next?)
Computer Misuse Act 1990
• etc.
Policy
• Public good
• Preservation
• Discovery
• Confidentiality
• First use
• Recognition
• Public funding
RCUK Policy and Code of Conduct on the
Governance of Good Research Conduct (updated Oct 2011)
UNACCEPTABLE RESEARCH CONDUCT includes mismanagement or
inadequate preservation of data and/or primary materials, including failure
to:
• keep clear and accurate records of the research procedures followed
and the results obtained, including interim results;
• hold records securely in paper or electronic form;
• make relevant primary data and research evidence accessible to
others for reasonable periods after the completion of the research:
data should normally be preserved and accessible for 10 years (in some
cases 20 years or longer);
• manage data according to the research funder’s data policy and all
relevant legislation;
• wherever possible, deposit data permanently within a national
collection.
Responsibility for proper management and preservation of data and primary
materials is shared between the researcher and the research organisation.
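As a rough illustration of the 10-year (or 20-year) preservation expectation above, the sketch below computes the earliest date on which a dataset would leave its retention period. It assumes the period runs from the project completion date, which funder policies may define differently; the function name and default are hypothetical and are not part of any RCUK or DCC tool.

```python
from datetime import date


def earliest_disposal_date(completed: date, retention_years: int = 10) -> date:
    """Return the first date after the retention period has elapsed.

    Assumption: retention is counted in whole calendar years from the
    project completion date.
    """
    target_year = completed.year + retention_years
    try:
        return completed.replace(year=target_year)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 1 March
        # so the data is kept slightly longer rather than shorter.
        return date(target_year, 3, 1)


if __name__ == "__main__":
    # Hypothetical project completed on the date of this presentation.
    print(earliest_disposal_date(date(2012, 4, 27)))
    print(earliest_disposal_date(date(2012, 4, 27), retention_years=20))
```

A data catalogue could store such a date alongside each dataset so that review, rather than automatic deletion, is triggered when it passes.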
EPSRC’s nine expectations and a roadmap - implications for HEIs
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx
DCC policy summary
http://www.dcc.ac.uk/resources/policy-and-legal
… addressing where European copyright and database law poses
flaws and obstacles to the access to research data
Intellectual Property Rights and Digital Preservation
21.11.2011 at the Clifton Hill House, Bristol University
“a poor fit between technology, processes and
regulations constrains preservation actions and
significantly inhibits the benefits which long-term
access ought to deliver”
Regulation, regulation…
Data access as headline news
JISC Legal
Management – infrastructure and
data storage challenges...
The case for cloud computing in genome
informatics. Lincoln D Stein, May 2010
Scalable
Cost-effective (rent on-demand)
Secure (privacy and IPR)
Robust and resilient
Low entry barrier / ease-of-use
Has data-handling / transfer /
analysis capability
Cloud services?
“Departments don’t have guidelines or
norms for personal back-up and researcher
procedure, knowledge and diligence varies
tremendously. Many have experienced
moderate to catastrophic data loss”
Incremental Project Report, June 2010
http://www.flickr.com/photos/mattimattila/3003324844/
Management - incentivisation, recognition and reward
Management - costs, benefits and value
DCC Institutional Support:
Tools and Services
Martin Donnelly
Digital Curation Centre
University of Edinburgh
University of Stirling 27 April 2012
Institutional Engagements
With funding from HEFCE we’re:
• Working intensively with 18 HEIs to increase RDM capability
– 60 days of effort per HEI drawn from a mix of DCC staff
– Deploy DCC & external tools, approaches & best practice
• Support varies based on what each institution wants/needs
• Lessons & examples to be shared with the community
www.dcc.ac.uk/community/institutional-engagements
Some current IE activities
• Assessing needs
• RDM roadmaps
• Piloting tools, e.g. DataFlow
• Policy development
• Policy implementation
Support offered by the DCC
Assess needs
Make the case
Develop support and services
…and support policy implementation
DCC support team
• RDM policy development
• Customised Data Management Plans
• DAF & CARDIO assessments
• Guidance and training
• Workflow assessment
• Advocacy to senior management
• Institutional data catalogues
• Pilot RDM tools
DATA MANAGEMENT STRATEGY
(Research and Admin)
Five components:
• Policy
• Advocacy
• Planning
• Tools
• Training
Four DCC Tools
Your Data as Assets: DAF
• What are the characteristics of
research data assets?
– Number?
– Scale?
– Complexity?
– Dependencies?
– Liabilities?
• Why do researchers act the way they
do with respect to data?
• What do they need to do research?
IN BRIEF
The Data Asset Framework provides a methodology
and online tool to identify research data assets and
find out how they are being managed. This
information will enable institutions to develop a data
strategy so their assets are preserved and remain
accessible in the long term. It is usually applied at
research group / department level to ensure the
scope is manageable.
URL: http://www.data-audit.eu
Data Management Planning:
DMP Online
• A growing requirement from
funders, publishers and HEIs,
in the UK and internationally
• Supportive of good research
practice, according to RCUK
• A cross-cutting activity
involving multiple stakeholder
types (researchers, librarians,
IT managers, support staff)
IN BRIEF
DMP Online is the DCC's web-based data
management planning tool. It allows you to build and
edit DMPs according to the requirements of the
major UK funders.
The tool also contains helpful guidance and links for
researchers and other data professionals. The
structure of the tool is based on the DCC’s Checklist
for a Data Management Plan.
URL: http://www.dcc.ac.uk/dmponline
Capacity Assessment and Building: CARDIO
• How well does an institution (or department, School, etc.) manage its data?
• Depends on:
– Finances
– Technology
– Policy management
– Organisational will
• Demands acknowledgement of many perspectives
IN BRIEF
An online tool which helps departments or research
groups to identify and communicate their current data
management capabilities, and subsequently identify
coordinated pathways for future enhancement via a
dedicated knowledge base.
CARDIO emphasises a collaborative, consensus-
driven approach, and enables benchmarking with
other groups and institutions.
URL: http://cardio.dcc.ac.uk/
Risk Management: DRAMBORA
• A variety of risk factors, both internal and external, affect the management of digital objects such as research data
• Risks can be tangible (fire/flood) or intangible (accidental data loss leading to reputational impact)
• They may exist in isolation, or lead to other risks if not adequately managed
IN BRIEF
DRAMBORA is an audit methodology and tool for
identifying and planning for the management of risks
which may threaten the availability and/or usability of
content in a digital repository or archive.
URL: http://www.repositoryaudit.eu
DCC Services
• Policy
• Strategy
• Training
• Other services…
Policy (i)
The DCC has a number of guidance resources related to
research data policy. We can guide institutions on their
requirements to manage/share data, and offer practical
steps to help them develop data policies by:
- Providing templates and examples to demonstrate
what aspects could be incorporated into a data policy;
- Coordinating / contributing to meetings of relevant
stakeholders to ensure all activities and perspectives are
addressed;
- Reviewing and feeding back on draft policies;
- Assisting with communications to launch and
implement the policy.
Policy (ii)
Benefits of developing a data policy:
- Compliance with funder guidelines, e.g. the EPSRC
expectation that HEIs have an RDM roadmap in place by
May 2012, and be fully compliant by May 2015;
- Assuring the good conduct of research in line with
Research Integrity guidelines (see RCUK & UKRIO docs);
- Clarity for researchers and demonstrable institutional
commitment to RDM;
- The prestige of joining a small but growing group of
leading institutions with a data policy:
http://www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies
Strategy (i)
We offer a half-day workshop in which key stakeholders
from an institution (e.g. librarians, senior IT staff, research
administration, repository staff, researchers, etc) convene
to discuss and develop an institutional strategy for RDM.
Benefits:
- Coherence across service providers and agreed
direction for RDM services;
- Ability to reference strategy / commitment to RDM (the
University of Oxford policy may be a useful example of
this - http://www.admin.ox.ac.uk/rdm);
- A move towards more efficient management of data.
Strategy (ii)
Through practical breakout sessions, senior DCC staff can
lead and mediate discussion to help the institution
determine its priorities and define practical next steps.
These might include the development of infrastructure (e.g.
data repositories), new services (e.g. DMP support), policy
development, improved guidance or data management
training provision.
Suggested actions will depend on gaps/areas for
improvement as perceived by the institution.
Training (i)
We offer a variety of training courses:
- DC101 introduction to data management
- Tools of the Trade courses which give practical
overviews and hands-on exercises using DCC tools
- Train-the-Trainer, which equips information professionals
to teach RDM courses.
We also organise regional data management roadshow
events which can incorporate a training element.
Generic training materials are available online, and
hardcopy packs can be produced.
Training (ii)
The DCC can:
- Run courses, tailoring content to institutional needs;
- Assist in the development of online learning materials
(screencasts, audio-synced slides);
- Develop resources such as guidance documents, case
studies and manuals.
Key benefits of training provision are:
- Improved data management capacity;
- The opportunity to profile and raise awareness of
institutional support services.
Other services...
• CARDIO: Used at research group or department level to assess activity and
data management infrastructure and contribute to an institution-wide view
• Data Asset Framework: DAF is a structured mechanism used to identify what
data exists and understand how research data are being managed and shared
• Customised DMP: We can work with you to develop an institution-specific
instance of DMP Online for developing data management plans that fit funder
requirements before and after an award of grant
• Policy development: We can assist in the development of institutional policy
• Workflow assessment: Using tested methodologies we can analyse current
research data workflows
• Training: We can train people in the use of many of the above tools and in
generic skills such as data quality assessment
• Costing: We can assist with the development of costing and pricing for data
management services
• Risk management: Working with you to identify risks in current or planned
research data management practice, we will make recommendations on
mitigation and the elimination of those risks
• Institutional data catalogues: We can recommend options for exposing
metadata about your research data via CRIS systems, repositories, or a mix
of these
Recap: support offered by the DCC
Assess needs
Make the case
Develop support and services
…and support policy implementation
DCC support team
• RDM policy development
• Customised Data Management Plans
• DAF & CARDIO assessments
• Guidance and training
• Workflow assessment
• Advocacy with senior management
• Institutional data catalogues
• Pilot RDM tools
Practicalities
• University Modernisation Fund provides
resource for 18 “institutional engagements”
between DCC and HEIs
• Up to 60 days of effort available per
institution, between now and March 2013
• Institution agrees a schedule of work with
the DCC, and each assigns a primary
contact / programme manager
Questions and Thanks
For more information:
– Visit http://www.dcc.ac.uk
– Email [email protected] or
This work is licensed under a Creative Commons Attribution 2.5
UK: Scotland License. © Digital Curation Centre 2012