Data Provenance and Attribution for Published Datasets
The Challenge and the reality check
April 9-10, 2009
National Academy of Sciences, Woods Hole, MA
Cyndy Chandler
Biological and Chemical Oceanography Data Management Office
Woods Hole Oceanographic Institution
09 April 2009 Cyndy Chandler ~ Woods Hole Oceanographic Institution 2 of 18
What is the goal?
to establish best practice guidelines for metadata capture and recording to support data provenance and attribution of published datasets
this talk will focus on oceanographic data
What is the problem?
Why aren’t we doing this already?
provenance tracking and attribution systems have been in use for a long time
• works of art
• works of literature
Why aren’t we doing this already?
What is so difficult about associating source data with a journal publication?
data acquisition
data publication
journal publication
Why aren’t we doing this already?
What are the challenges?
• Technical
• Cultural
• Usual
Why aren’t we doing this already?
Technical reasons …
data are not published
• what is the definition of a published dataset?
and if the data are ‘published’
• it’s not clear how to cite them
• they lack sufficient metadata
• metadata are non-standard
• or they lack a persistent identifier
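The citation problem above can be made concrete with a small sketch. It assumes a hypothetical minimal record (persistent identifier, creators, title, publisher, year) and refuses to build a citation when any of those fields is missing. The field list and the citation layout are illustrative assumptions, not a published standard.

```python
from dataclasses import dataclass, field

# Hypothetical minimal set of fields a citable dataset record might carry.
# This list is an assumption for illustration, not a published standard.
REQUIRED_FIELDS = ("identifier", "creators", "title", "publisher", "year")


@dataclass
class DatasetRecord:
    identifier: str = ""  # persistent identifier, e.g. a DOI or a handle
    creators: list = field(default_factory=list)
    title: str = ""
    publisher: str = ""
    year: int = 0


def missing_fields(record):
    """Return the names of required fields that are empty on this record."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name)]


def format_citation(record):
    """Build a citation string, refusing records that cannot be cited."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("cannot cite dataset, missing: " + ", ".join(missing))
    authors = "; ".join(record.creators)
    return (
        f"{authors} ({record.year}). {record.title}. "
        f"{record.publisher}. {record.identifier}"
    )
```

A record with no identifier fails loudly instead of producing a citation that cannot be resolved later, which is the failure mode the last bullet describes.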
Why aren’t we doing this already?
Technical reasons …
• data sets used to be smaller and were often published on paper (in a journal article or a data report, and they fit in Table 1)
• data were published as a tangible thing
• as data acquisition becomes automated, the rate of acquisition and the volume increase
• but metadata acquisition (data documentation) is not being automated at the same rate
Why aren’t we doing this already?
Cultural reasons …
• little incentive for researchers to publish their data
• often augmented by the perception that the data are the ‘property’ of the originating investigator, and might be ‘stolen’
Conventional wisdom is still that ‘publish or perish’ applies predominantly to journal publications, not data publication. (Funding agency program managers are beginning to effect change in this area.)
Why aren’t we doing this already?
Usual reasons …
lack of resources
• Funding
• Expertise
• Time
remember where these data come from …
… this is the office !
“Think I’ll go record some metadata.”
“Who’s recording the metadata?”
Why aren’t we doing this already?
What is so difficult about associating source data with a journal publication?
data acquisition
data publication
journal publication
data acquisition
data publication
journal publication
a relatively simple case
Many of the VERTIGO project cruise data sets are available online from BCO-DMO and they’re tagged with metadata.
The introductory paper refers to the online data server.
Source data are available online for this special volume.
Why aren’t we doing this already?
Let’s assume this effort is fully funded ~ so all the usual reasons are no longer an issue ~ funding, expertise, time ~ no longer a challenge !
Combined cultural and technical challenges …
The simplest system for data publication and attribution involves at least one representative from each of these three communities:
• Oceanographer ( research discipline )
• Data manager ( information science )
• Editor ( publishing community )
Why aren’t we doing this already?
Combined cultural and technical challenges …
The successful system for data publication and attribution more likely involves six communities
• Oceanographer ( research discipline )
• Data manager ( information science )
• Library science
• Information technology expertise from these fields
• Social science
• Editor ( publishing community )
and effective communication between those communities
Additional Challenges
What if all the whining from the previous slides could be addressed somehow?
• Education
• Cultural changes
• Standards development and implementation
• Funding sources
• Communication challenges
Additional Challenges
micro attribution – what level is required to support scientific inquiry?
what are the identifiable entities within a publication that require data attribution
• the entire article?
• each table? each figure?
• publications often have many source data sets
who does all that work? The author(s) ?
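One way to picture the granularity question is an index from article elements (a table, a figure) to the source data sets behind each one. The sketch below is illustrative only: the function names, element labels, and dataset identifiers are all hypothetical, and it says nothing about who would maintain such an index.

```python
from collections import defaultdict


def build_attribution_index(links):
    """Group (article_element, dataset_id) pairs by article element.

    Duplicate pairs are ignored, so each source data set is listed
    once per element.
    """
    index = defaultdict(list)
    for element, dataset_id in links:
        if dataset_id not in index[element]:
            index[element].append(dataset_id)
    return dict(index)


def datasets_for_article(index):
    """Article-level attribution: every source data set used anywhere."""
    seen = []
    for dataset_ids in index.values():
        for dataset_id in dataset_ids:
            if dataset_id not in seen:
                seen.append(dataset_id)
    return seen


# Hypothetical links for a single article; the identifiers are made up.
links = [
    ("Table 1", "ds-A"),
    ("Figure 2", "ds-A"),
    ("Figure 2", "ds-B"),
]
index = build_attribution_index(links)
```

Attribution at the level of the entire article falls out of the per-element index for free, while the reverse is not true, which is one argument for recording the finer level when the effort can be afforded.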
It is important to figure this out.
Data are difficult and expensive to collect, and cannot be recollected.
We want to maximize data reuse.
thank you