Page 1: Data Provenance and Attribution for Published Datasets The Challenge and the reality check April 9-10, 2009 National Academy of Sciences, Woods Hole, MA.

Data Provenance and Attribution for Published Datasets

The Challenge and the reality check

April 9-10, 2009
National Academy of Sciences, Woods Hole, MA

Cyndy Chandler

Biological and Chemical Oceanography Data Management Office

Woods Hole Oceanographic Institution

Page 2: What is the goal?

09 April 2009 Cyndy Chandler ~ Woods Hole Oceanographic Institution 2 of 18

to establish best practice guidelines for metadata capture and recording to support data provenance and attribution of published datasets

this talk will focus on oceanographic data

Page 3: What is the problem?

Why aren’t we doing this already?

provenance tracking and attribution systems have been in use for a long time
• works of art
• works of literature

Page 4: Why aren’t we doing this already?

What is so difficult about associating source data with a journal publication?

data acquisition

data publication

journal publication

Page 5: Why aren’t we doing this already?

What are the challenges?

• Technical
• Cultural
• Usual

Page 6: Why aren’t we doing this already?

Technical reasons …
• data are not published
  • what is the definition of a published dataset?
• and if the data are ‘published’
  • it’s not clear how to cite them
  • they lack sufficient metadata
  • metadata are non-standard
  • or they lack a persistent identifier
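
To make the citation problem concrete, here is a minimal sketch of a metadata record that refuses to produce a citation when required fields or a persistent identifier are missing. The class name, field names, citation layout, and example values (including the placeholder identifier) are all assumptions for illustration; they are not taken from the talk or from any particular metadata standard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical minimal metadata for a citable dataset (all field names are assumptions)."""
    creators: Tuple[str, ...]
    year: int
    title: str
    publisher: str
    identifier: str = ""  # persistent identifier such as a DOI or handle; empty if none assigned

    def missing_fields(self) -> List[str]:
        """Return the names of fields a citation needs but this record leaves empty."""
        missing = []
        if not self.creators:
            missing.append("creators")
        for name in ("title", "publisher", "identifier"):
            if not getattr(self, name).strip():
                missing.append(name)
        return missing

    def citation(self) -> str:
        """Format a citation string, refusing if required metadata is absent."""
        missing = self.missing_fields()
        if missing:
            raise ValueError("cannot cite dataset, missing: " + ", ".join(missing))
        authors = "; ".join(self.creators)
        return f"{authors} ({self.year}). {self.title}. {self.publisher}. {self.identifier}"


# Made-up example values; the identifier is a placeholder, not a real DOI.
record = DatasetRecord(
    creators=("A. Researcher",),
    year=2009,
    title="Example sediment trap flux data",
    publisher="Example Data Office",
    identifier="doi:10.0000/example",
)
print(record.citation())

# A record with no persistent identifier cannot be cited under this sketch.
unidentified = DatasetRecord(("A. Researcher",), 2009, "Example data", "Example Data Office")
print(unidentified.missing_fields())
```

A real system would more likely adopt an established community metadata schema than a hand-rolled record like this one.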

Page 7: Why aren’t we doing this already?

Technical reasons …
• data sets used to be smaller and were often published on paper (in a journal article or a data report, and they fit in Table 1)
• data were published as a tangible thing
• as data acquisition becomes automated, rate of acquisition and volume increases
• but metadata acquisition (data documentation) is not being automated at the same rate

Page 8: Why aren’t we doing this already?

Cultural reasons …
• little incentive for researchers to publish their data
• often augmented by the perception that the data are the ‘property’ of the originating investigator, and might be ‘stolen’

Conventional wisdom is still that ‘publish or perish’ applies predominantly to journal publications, not data publication. (Funding agency program managers are beginning to effect change in this area.)

Page 9: Why aren’t we doing this already?

Usual reasons …
• lack of resources
  • Funding
  • Expertise
  • Time

Page 10: remember where these data come from …

… this is the office !

Think I’ll go record some metadata.

Who’s recording the metadata?

Page 11: Why aren’t we doing this already?

What is so difficult about associating source data with a journal publication?

data acquisition

data publication

journal publication

Page 12: a relatively simple case

data acquisition

data publication

journal publication

Many of the VERTIGO project cruise data sets are available online from BCO-DMO and they’re tagged with metadata.

The introductory paper refers to the online data server.

Source data are available online for this special volume.

Page 13: Why aren’t we doing this already?

Let’s assume this effort is fully funded ~ so all the usual reasons are no longer an issue ~ funding, expertise, time ~ no longer a challenge !

Combined cultural and technical challenges …

The simplest system for data publication and attribution involves at least one representative from each of these three communities:

• Oceanographer (research discipline)
• Data manager (information science)
• Editor (publishing community)

Page 14: Why aren’t we doing this already?

Combined cultural and technical challenges …

The successful system for data publication and attribution more likely involves six communities

• Oceanographer (research discipline)
• Data manager (information science)
• Library science
• Information technology expertise from these fields
• Social science
• Editor (publishing community)

and effective communication between those communities

Page 15: Additional Challenges

What if all the whining from the previous slides could be addressed somehow?

• Education
• Cultural changes
• Standards development and implementation
• Funding sources
• Communication challenges

Page 16: Additional Challenges

micro attribution – what level is required to support scientific inquiry?

what are the identifiable entities within a publication that require data attribution
• the entire article?
• each table? each figure?
• publications often have many source data sets

who does all that work? The author(s) ?
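
One way to picture the granularity question is a map from each identifiable entity in a publication (table, figure) to the data sets it draws on. The sketch below is only an illustration: the element labels, dataset identifiers, and function names are hypothetical and do not come from the talk.

```python
from typing import Dict, List

# Hypothetical attribution map: publication element -> source dataset identifiers.
# All labels and identifiers are invented for illustration.
attribution: Dict[str, List[str]] = {
    "Table 1": ["dataset:ctd-profiles"],
    "Figure 2": ["dataset:ctd-profiles", "dataset:trap-flux"],
    "Figure 3": [],  # no source data recorded yet
}


def article_level_sources(attr: Dict[str, List[str]]) -> List[str]:
    """Collapse per-element attribution into one de-duplicated list for the whole article."""
    return sorted({dataset for ids in attr.values() for dataset in ids})


def unattributed_elements(attr: Dict[str, List[str]]) -> List[str]:
    """List the elements that still have no source data set recorded."""
    return [element for element, ids in attr.items() if not ids]


print(article_level_sources(attribution))
print(unattributed_elements(attribution))
```

Attribution at the article level can be derived from the per-element map, but not the other way round, which is why the choice of level determines how much work someone has to do up front.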

Page 17: It is important to figure this out.

Data are difficult and expensive to collect, and can not be recollected.

We want to maximize data reuse.

Page 18: thank you

