Data Selection & Triage

JISC/DCC Progress Workshop

Managing Research Data & Institutional Engagement

Nottingham, 25 October 2012

This work is licensed under a Creative Commons Attribution 2.5 UK: Scotland License

http://creativecommons.org/licenses/by/2.5/scotland/

Introduction

How can researchers and support staff effectively decide what data is worth holding on to, agree what to do with it, and arrange for its handover?

What challenges does this present?

How to address them?

Outline

• What guidelines are there and why do we need more? - Angus Whyte, DCC, and Marie Therese Gramstadt, KAPTUR

• UK Data Archive's Data Review Process - Veerle Van den Eynden, UKDA

• Applying NERC's Data Value Checklist - Sam Pepler, British Atmospheric Data Centre

• Discussion

Guidelines clarify expectations

What criteria will be used to judge what’s handed over?

…adapted by Archaeology Data Service, NERC, KAPTUR, University of Leicester

Basic model

1. Define a policy, i.e. criteria and range of decisions

2. Archive manager applies criteria, involving researchers

3. Select the significant, dispose of the rest

For records, yes, but research data?

All data

10%

90%

Characterising research data…

• Research process more uncertain and open-ended than admin processes

• Research data purpose may change before complete

• More effort to make reusable - complex inter-relationships, and richer contexts to document

• Originators should be engaged but may not have capacity e.g. if project funding has ceased

• Others with a broader view of potential in other disciplines may need to be involved

• More than a keep/dispose choice: need to prioritise attention and effort to make data fit for reuse

Triage analogy

Criteria

Duty of care

Reuse value

Quality and condition

Accessibility

Costs associated

Prioritise

High reuse value + needs attention, affordable

Other permutations

More permutations

Low reuse value, unaffordable
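
The prioritisation above can be sketched in code. This is a minimal illustration only, not a method taken from any of the guidelines mentioned: the `Dataset` fields, the 0 to 1 scoring scale, the 0.5 threshold and the priority labels are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    reuse_value: float      # hypothetical score from 0 (none) to 1 (high)
    needs_attention: bool   # e.g. poor documentation or at-risk formats
    affordable: bool        # can the work needed be resourced?


def triage(ds: Dataset, high_value: float = 0.5) -> str:
    """Return an illustrative priority label for one dataset."""
    is_high = ds.reuse_value >= high_value
    if is_high and ds.needs_attention and ds.affordable:
        return "treat first"
    if not is_high and not ds.affordable:
        return "lowest priority"
    return "review case by case"  # all other permutations


# Hypothetical datasets, invented for this example.
datasets = [
    Dataset("survey-2011", 0.9, True, True),
    Dataset("raw-instrument-logs", 0.2, False, False),
    Dataset("interview-audio", 0.8, True, False),
]
labels = {ds.name: triage(ds) for ds in datasets}
for name, label in labels.items():
    print(f"{name}: {label}")
```

In practice the middle group would need human judgement from researchers and archive staff; the sketch only separates the two clear-cut ends of the range.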

Tiered approach to deploying resources

Discoverability

Access management

Storage performance

Preservation actions

Deposit location

Institutional Data Repository

Data Centre

Subject Repository etc.

Potential to automate?

First characterise research data

Clarify expectations

What kinds of “data” are wanted

For what kinds of reuse

e.g. Data Centre Collection Policies

http://archaeologydataservice.ac.uk/advice/collectionsPolicy

“The ADS expects to collect all of the following archaeological data types…”

Costs should persuade us

IDC Digital Universe Study - Increasing volumes outpace declining storage hardware costs

According to: John Gantz and David Reinsel (2011), Extracting Value from Chaos, http://www.emc.com/digital_universe
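
The arithmetic behind "volumes outpace declining costs" can be shown with a short sketch. The growth and price-decline rates below are hypothetical values chosen for illustration, not figures from the IDC study; `relative_cost` is a name invented here.

```python
def relative_cost(years: int, volume_growth: float, price_decline: float) -> float:
    """Total storage cost after `years`, relative to today's cost (1.0).

    Each year the volume is multiplied by (1 + volume_growth) and the
    unit price by (1 - price_decline); total cost is their product.
    """
    per_year = (1 + volume_growth) * (1 - price_decline)
    return per_year ** years


# Assumed rates: volume grows 60% a year, price per unit falls 25% a year.
cost_after_5 = relative_cost(5, volume_growth=0.60, price_decline=0.25)
print(f"Relative cost after 5 years: {cost_after_5:.2f}")
```

Whenever the yearly factor (1 + growth) x (1 - decline) is above 1, the total bill rises even though each unit of storage gets cheaper; under these assumed rates it roughly two-and-a-half-folds in five years.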

We can’t afford it all

“Keeping 2018’s data in S3 would cost the entire global GDP”

http://blog.dshr.org/2012/05/lets-just-keep-everything-forever-in.html

Selection presumes description

• You can’t value what you don’t know about!

• Researchers can’t afford NOT to spend effort on minimal metadata description and organisation, because costs of retention will be much higher if they don’t

• Description makes data affordable – is citation potential a concrete enough reward?
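
A minimal sketch of how a description check might be automated before selection. The list of required fields is an assumption for illustration, not a metadata standard or any repository's actual policy, and the records are invented.

```python
# Assumed minimal description; a real policy would define its own fields.
REQUIRED_FIELDS = ("title", "creator", "description", "date", "location")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical records for two datasets.
records = [
    {"title": "Soil samples", "creator": "A. Researcher",
     "description": "pH readings", "date": "2012-06-01",
     "location": "shared-drive/soil"},
    {"title": "Untitled", "creator": "", "date": "2011"},
]
report = {r["title"]: missing_fields(r) for r in records}
for title, gaps in report.items():
    status = "ready for appraisal" if not gaps else "missing: " + ", ".join(gaps)
    print(f"{title}: {status}")
```

A dataset that fails such a check cannot yet be valued, so the check indicates where description effort is needed before any keep or dispose decision.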

Challenges

• Identify what datasets are created and where they are

• Differentiate datasets of high value from those with the most uncertainty or the least reusability

• Be able to justify ‘natural’ wastage of low-priority data as much as deliberate selection of high-value data

Questions

• What has worked/is working?

• What lessons have you learned and how generalisable are they?

• What challenges remain?

• How may they be approached and what do you intend to do?

• What DCC / MRD activity do you think may help make the challenge more tractable?