My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Post on 11-Aug-2014


description

My keynote talk for Eurocris2014, Rome. I make the case for reuse of research data, discuss the barriers and look at ways we are trying to overcome them.

transcript

My Data, Our Data, Your Data: data reuse through data management

Kevin Ashley Digital Curation Centre

www.dcc.ac.uk @kevingashley

Kevin.ashley@ed.ac.uk

Reusable with attribution: CC-BY. The DCC is supported by Jisc.


A summary

• Why data reuse?
• What stops us?
• How data management helps
• Harmonising the goals of research administration and research
• Barriers again
• The case for reuse - again

2014-05-14 Kevin Ashley – Eurocris2014 - CC-BY


My home – the DCC

• Mission – to increase capability and capacity for research data services in UK institutions

• Not just a UK problem – an international one

• Training, shared services, guidance, policy, standards, futures


What is data curation?

• “Maintaining, preserving and adding value to research data throughout its lifecycle”

• More than preservation:
– Active management – dealing with change

• Less than preservation:
– Lifecycle sometimes involves destruction

DCC guidance


SWEDEN

DENMARK

CANADA


Data reuse stories

• The palaeontologist who saved years of work with archaeological data


What a palaeontologist looks at

[Timeline: now back to 100 million years ago, with marks at 1m, 25m, 50m and 75m]

What a palaeontologist looks at

[Timeline: now back to 100 million years ago, with marks at 1m, 25m, 50m and 75m; the most recent 1 million years is expanded, with marks at 100,000, 500,000 and 750,000 years]

What an archaeologist looks at

[Timeline: now back to 1 million years ago, with marks at 100,000, 500,000 and 750,000 years; the most recent 100,000 years is expanded, with marks at 25,000, 50,000 and 75,000 years]

Data reuse stories

• The palaeontologist who saved years of work with archaeological data

• The 19th-century ships’ logs that help us model climate change

The Old Weather project

Data for research, not from research


Data reuse stories

• The palaeontologist who saved years of work with archaeological data

• The 19th-century ships’ logs that help us model climate change

• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull


Data reuse - messages

Often your data tells stories that your publications do not

Not all data comes from other researchers

One person’s noise is another person’s signal

Discipline-bounded data discovery doesn’t give us all we need or want

Why care?

• Data is expensive – an investment
• Reuse:
– More research
– Teaching & Learning
– Planning
• Impact – with or without publication
• Accountability
• Legal & regulatory requirements

Why does this matter?

• Research quality
– How close can we get to the truth?
• Research speed
– How quickly can we get to the truth?
• Research finance
– How much does the truth cost?

• Improving one or more of these is of interest to all actors:
• Researchers as data creators
• Researchers as data reusers
• Research institutions
• Funders – hence government and society

G8UK - Endorses OA
Open Data Charter
Policy Paper
18 June 2013

Funder requirements

• UK

• USA – NSF, NEH, NIH

• Europe

• Most place burden on researcher – some on the institution

http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx

RCUK policy - The 1-minute version

• Research data are a public good – make openly available in timely & responsible way

• Have policies & plans. Data with long-term value should be preserved & usable

• Metadata for discovery & reuse. Link publications & data

• Sometimes law, ethics get in the way. We understand.

• Limited embargos OK. Recognition is important – always cite data sources

• OK to use public money to do this. Do it efficiently.

EPSRC policy points

• Awareness of regulatory environment
• Data access statement
• Policies and processes
• Data storage
• Structured metadata descriptions
• DOIs for data
• Securely preserved for a minimum of 10 years from last use

Compliance expected by 2015
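The "minimum of 10 years from last use" rule means the retention clock restarts every time the data is used. Below is a minimal sketch of that arithmetic. The function names and the handling of a 29 February last-use date are illustrative assumptions, not part of the EPSRC policy.

```python
from datetime import date

# From the slide: "a minimum of 10 years from last use".
RETENTION_YEARS = 10


def earliest_disposal_date(last_use: date, years: int = RETENTION_YEARS) -> date:
    """Return the first date on which the retention minimum has elapsed."""
    try:
        return last_use.replace(year=last_use.year + years)
    except ValueError:
        # Assumption: a 29 February last use rolls forward to 1 March
        # when the target year is not a leap year.
        return date(last_use.year + years, 3, 1)


def record_access(previous_last_use: date, access: date) -> date:
    """Each use resets the clock: keep the most recent access date."""
    return max(previous_last_use, access)


last_use = record_access(date(2014, 5, 14), date(2016, 1, 20))
print(earliest_disposal_date(last_use))
```

Because the clock restarts on every access, data that stays in use is never eligible for disposal under this rule.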


DCC Policy Summary

http://www.dcc.ac.uk/resources/policy-and-legal


Findable, citable data has value

• Important to link publications to data (and vice versa)
• Increases citations – of data & publication
• Increases reuse (hence value)
• But effects exist even without publication, if data is:
– Archived
– Citable
– Discoverable

MORAL: build a data registry

What stops data reuse

• Loss
• Destruction
• Pride
• Gluttony
• Ineptitude
• Concealment
• Bureaucracy
• Complexity
• Procrastination
• Lack of potential

“Departments don’t have guidelines or norms for personal back-up and researcher procedure, knowledge and diligence varies tremendously. Many have experienced moderate to catastrophic data loss”

Incremental Project Report, June 2010

http://www.flickr.com/photos/mattimattila/3003324844/


How people talk about data

• I put my data in figshare and I got a DOI for it
• Not our data; the university’s data; my funder’s data; the data; the people’s data; your data.

Data ownership – it’s messy

• You need ownership to make data free
• Governments may assert this
• Industrial collaborators – understanding role of public funding
• Research admin tracks the rules

ON METADATA


Disciplines – current state

• Typically specialised
• Focussed on discipline-specific concerns
• Frequently embedded – hence processing required to expose independently
• Historic failure to express generic concepts generically
– Place
– Time

Understanding Data Requirements

http://www.dcc.ac.uk/


Data centres are good value!

• See Jisc reports on ADS, BADC, UKDA:
• Returns on investment between 400% and 1200%

Integrity

• Not everyone publishes here

• Almost all fraud connected to unavailable data

• People suffer & die due to research fraud

• When your research is reproducible – it gets cited


Integrity – not without data

• Cyril Burt
– Twin studies on intelligence.
– Questioned 1976; now discredited
• Duke case
– Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits
• Dutch cases
– Stapel – 55 publications – “fictitious data”
– Poldermans – fabricated data or negligence?

“The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials

“Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256

Citability

• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240%, social sciences *
– Piwowar, Vision – 9% (microarray data) †
– Henneken, Accomazzi – 20% (astronomy) #

† Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1 http://dx.doi.org/10.7287/peerj.preprints.1v1

* Amy Pienta, George Alter, Jared Lyle, (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307

# Edwin Henneken, Alberto Accomazzi, (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618

How to cite data

What data to keep

The Data Deluge is upon us

Sensors’ ability to produce data outstrips IT’s ability to process it

Roles and Responsibilities

What data to keep


Excuses – and responses

• “People will ask questions”
– So use a data centre or repository
• “It will be misinterpreted”
– Stuff happens. Also, openness encourages correction
• “It’s not interesting”
– Let others be the judge – your noise is my signal
• “I might get another paper out of it”
– Up to a point. We might get more research out of it
• “I don’t have permission”
– A real problem. But solvable at senior level
• “It’s too bad/complicated”
– see above
• “It’s not a priority”
– Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well

See e.g. Carly Strasser’s blog: http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/


Should all data be open?

• NO
• Many reasons – most to do with human subjects
• But data existence should always be open
• Allows discovery & negotiation on use
• Avoids pointless replication

Some conundrums

• Releasing genome data is OK when it’s:
– An identified human subject
– An anonymous human subject
– Your pet dog
– Another mammal
– An insect
– A plant
– A virus

It’s amazing what people will share…


Data reuse from Hubble

Pimp your data – make it findable & reusable

2014-04-25 Kevin Ashley, DCC – SocSciScot14 - CC-BY

Gking.harvard.edu/data


Data is variable

• Not always textual
• Not always tabular
• Not always fixed – continual change
• Not always clearly authored – think of archival provenance
• Not always associated with publication
• Often with indistinct boundaries
• Multi-dimensional and non-linear

Some messages for you

• Some things we need to know about data:
– When/where/what is it about?
– Who owns it
– What rights apply
– What it is derived from & how
– What software may be associated
– What data management plan applies
– How do I gain access?
– Where is it?
– When was/will it be destroyed?
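The questions above map naturally onto the fields of a registry record. Here is a minimal sketch of such a record, with one field per question. The class name, field names and sample values are all hypothetical and are not drawn from CERIF or any DCC schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical registry entry: one field per question on the slide."""
    title: str
    subject: Optional[str] = None               # what is it about?
    place: Optional[str] = None                 # where is it about?
    period: Optional[str] = None                # when is it about?
    owner: Optional[str] = None                 # who owns it
    rights: Optional[str] = None                # what rights apply
    derived_from: List[str] = field(default_factory=list)
    derivation_method: Optional[str] = None     # ...and how
    software: List[str] = field(default_factory=list)
    data_management_plan: Optional[str] = None
    access_procedure: Optional[str] = None      # how do I gain access?
    location: Optional[str] = None              # where is it?
    destruction_date: Optional[date] = None     # when was/will it be destroyed?

    def unanswered(self) -> List[str]:
        """List the questions this record leaves open (None or empty list)."""
        return [name for name, value in asdict(self).items()
                if value is None or value == []]

    def to_json(self) -> str:
        """Serialise for exchange; dates are written as ISO strings."""
        return json.dumps(asdict(self), default=str, indent=2)


record = DatasetRecord(title="Example survey data",
                       owner="Example University",
                       rights="CC-BY")
print(record.unanswered())
```

Listing the unanswered fields makes gaps visible, which matters more for discovery than having every field filled in on day one. Note that an empty `derived_from` is treated as unanswered here, although it could equally mean the data is original.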


What about your data?

• If administrative data isn’t freely available, why not?

• Expose it in bulk – not just as a web page
• Gain the value from your overheads!
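"In bulk" here means a machine-readable dump that others can download and process, rather than records that can only be read one web page at a time. The sketch below shows one way to do that, as a CSV export. The field names and sample records are invented for illustration.

```python
import csv
import io
from typing import Dict, Iterable, List


def export_bulk_csv(records: Iterable[Dict[str, str]],
                    fieldnames: List[str]) -> str:
    """Write administrative records as a single CSV document."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops any keys not listed in fieldnames;
    # missing keys are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical project records, not real data.
projects = [
    {"project_id": "P-001", "funder": "EPSRC", "start_date": "2013-10-01"},
    {"project_id": "P-002", "funder": "NSF", "internal_note": "not exported"},
]
print(export_bulk_csv(projects, ["project_id", "funder", "start_date"]))
```

A plain CSV dump is about the simplest bulk format. The same records could equally be exposed through an API or as CERIF XML.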


What about collaboration?

• Collaborate within the university
• Collaborate with partners
• Collaborate with regional, national services
• Not everything can be done well locally
• Some examples…

http://dataintelligence.3tu.nl/en/home/

http://www.sheffield.ac.uk/is/research/projects/rdmrose

Choice of RDM training materials for librarians

Up-skilling for data

http://datalib.edina.ac.uk/mantra/libtraining.html


My message to researchers

• The credit belongs to you
• The data belongs to all of us
• Share, and we all reap the benefits