Data, Librarians, and Services
TICER 2010
Dr Andrew Treloar, Director of Technology, ANDS
Contents
Data – the past and the future
Data and scholarly communications
Data problems in published literature
Why re-use data?
Data sharing services and librarians’ role
A historical perspective
Data capture and management problems have been with us for a while…
But for how long?
And what are some of the basic operations?
Create
Store
Describe
Discover
Preserve
Doomed data
In the vill in which St. Peter’s Church is situated [Westminster] the abbot of the same place holds 13½ hides. There is land for 11 ploughs. To the demesne belongs 9 hides and 1 virgate, and there are 4 ploughs. The villeins have 6 ploughs, and there could be 1 plough more. There are 9 villeins each on 1 virgate and 1 villein on 1 hide, and 9 villeins on each half a virgate and 1 cottar on 5 acres, and 41 cottars who pay 40 shillings a year for their gardens. [There is] Meadow for 11 ploughs, pasture for the livestock of the vill, woodland for 100 pigs, and 25 houses of the abbot’s knights and other men who pay 8 shillings a year. In all it is worth £10; when received, the same; TRE £12. This manor belonged and belongs to the demesne of St. Peter’s Church
http://www.learningcurve.gov.uk/focuson/domesday/take-a-closer-look/
“A Correct Tide-Table, Shewing the True Times of the High-Waters at London-Bridge, to Every Day in the Year 1683. By Mr. Flamstead” Philosophical Transactions, Vol. 13, (1683), pp. 10-15
“An Observation of the Beginning of the Lunar Eclipse which Hapned Aug. 19. 1681. in the Morning, Made on the Island of St. Lawrence or Madagascar, by Mr. Tho. Heathcot, and Communicated by Mr. Flamstead” Philosophical Transactions, Vol. 13, (1683), p. 15
Data problems in published literature
Inconvenient data
DOI: 10.1098/rsta.2005.1569
Imprisoned data
DOI 10.1098/rsta.2006.1793
Invisible data
DOI 10.1098/rsta.2006.1793
Inaccessible data
Missing negative data
Need title capture for negative results
GROUP EXERCISE #1
ands.org.au 19
Why Data? Why Now?
We are in an era of increasing data-intensive research
Almost all data is now born digital
Increasing amount of data generated (semi-)automatically
“Consequently, increasing effort and therefore funding will necessarily be diverted to data and data management over time” Towards the Australian Data Commons (TADC), p. 4
Need for standardisation
Software and hardware keep getting cheaper, wetware keeps getting more expensive
Fixing data management problems is enormously labour intensive and costly
“Consequently, standardisation within forms of data and simplification in the frameworks around retention, storage, access and use of data, and the elimination of differences whose resolution requires labour, must be made, if the on-going keeping and reuse of data is to remain affordable” (TADC, p. 5)
Bringing data together
With more data online, more can be done
Possible now to answer questions unrelated to reasons why data was collected originally
Increasing focus on cross-disciplinary science
“Consequently greater clarity is needed over control and access to community-funded data, and the means of aggregating, federating and accessing such data are increasingly important” (TADC, p. 5)
Why re-use data?
Efficiency
Validation
Integrity
Value for money
Self-interest
Astronomy case study
Hubble Space Telescope (HST) operating since 1990
Observations are proposed, and if accepted, data is collected and made available to the proposers – who then write a research paper
Each year around 1,000 proposals are reviewed and approximately 200 are selected, for a total of 20,000 individual observations
Data is stored at the Space Telescope Science Institute and made available after an embargo period
There are now more research papers written from “second use” of the research data than from the use initially proposed
Cancer micro-array trial case study
Piwowar et al., “Sharing Detailed Research Data Is Associated with Increased Citation Rate”
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0000308
Looked at the citation history of cancer microarray clinical trial publications
Found that publicly available data was associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin
Climate proxy data case study
The southern limit of whaling is constrained by sea ice, and since 1931 whaling records have been collected for every whale caught
Analysis indicates that the Antarctic summer sea-ice edge has moved southwards by 2.8° of latitude between the mid 1950s and early 1970s
This suggests a decline in the area covered by sea ice of some 25%
Nature, Vol 389, 4 Sep 1997, pp. 57-60
Australian National Data Service (ANDS)
An initiative of the Australian Government being conducted as part of the National Collaborative Research Infrastructure Strategy ($A24M) and the Super Science Initiative ($A48M)
A collaboration between Monash University, the Australian National University and CSIRO
About 40 staff, funded to mid 2013
More researchers re-using more data more often
Data as a first-class object
Key differentiators
Nationally co-ordinated approach
Institutionally-focussed engagement
  “helping them meet their research data ambitions”
But also engaging with large nationally-funded discipline investments
Bulk of funds spent outside ANDS
All disciplines covered
Focus on data re-use
What does ANDS provide?
Resourcing
  community infrastructure projects institutional change funding for data activities, infrastructure development
Online Services
  ANDS infrastructure to support data registration, identification, publication, classification, etc
Expertise and information
  consultancy recommendations, policy, and advice capability building information sharing, sharing experience
Policy advocacy
Australian Research Data Commons
Building the Australian Research Data Commons
QUESTIONS SO FAR?
Data Sharing Verbs for Discovery and Reuse
Plan
Planning is important to get research institutions, managers and researchers to think about issues early, and to make sure steps aren’t missed
Librarians’ role
  provide leadership and advice
  develop policies, procedures, planning guidelines
  get these adapted to, implemented and used in institutions
  advocate for, and promote, best practice examples
Create (or Capture)
Adding metadata is done most cheaply and effectively as close as possible to point of creation/capture
  Treloar/Wilkinson, DOI 10.1109/eScience.2008.41
Librarians’ role
  advise researchers/technologists on appropriate standards and metadata schema
  assist with metadata content quality
Store
Needs to be done well, on institutionally-supported system/s
Librarians’ role
  work with researchers to identify appropriate solutions
  partner with IT to ensure availability of solutions
  use metadata expertise to ameliorate poor metadata management in data store solutions
  raise awareness of risks of non-appropriate solutions
Describe
Four kinds of information needed for re-use
  information for discovery
  information for determination of value
  information for access
  information for re-use
Librarians’ role
  use metadata expertise to assist researchers and technologists with metadata standards
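As a minimal sketch, the four kinds of information could be grouped into one description record and checked for gaps. The class name, field names and example values below are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetDescription:
    """Hypothetical record grouping the four kinds of information for re-use."""

    # Information for discovery: how someone finds the data.
    title: str = ""
    keywords: List[str] = field(default_factory=list)
    # Information for determination of value: is the data worth getting?
    summary: str = ""
    # Information for access: how to get hold of the data.
    access_note: str = ""
    # Information for re-use: what is needed to interpret the data.
    reuse_note: str = ""

    def missing_kinds(self) -> List[str]:
        """Return the kinds of information that have not been filled in."""
        checks = {
            "discovery": bool(self.title and self.keywords),
            "value": bool(self.summary),
            "access": bool(self.access_note),
            "re-use": bool(self.reuse_note),
        }
        return [kind for kind, present in checks.items() if not present]


# Illustrative values only, not a real dataset.
record = DatasetDescription(
    title="Example dataset",
    keywords=["example"],
    summary="Illustrative record only",
)
print(record.missing_kinds())
```

A check of this kind could be run at the point of creation/capture, when filling the gaps is cheapest.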
Identify
Persistent identifiers for data provide a level of indirection to assist with long term access
DataCite consortium formed in 2009 to assign DOIs to data objects
Librarians’ role
  advise researchers on how to cite data (and make it available for citing)
  lobby authors of style guides and bibliography management systems
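The level of indirection can be illustrated with a short sketch: a citation carries only the DOI, and a resolver maps it to wherever the object currently lives. The helper name doi_to_url is hypothetical; the example assumes the public resolver at https://doi.org/ and uses a DOI cited earlier in these slides.

```python
from urllib.parse import quote

# Public DOI resolver; the DOI itself stays stable even if the target moves.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI (with or without a leading 'doi:' label) into a resolver URL."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # Keep '/' unescaped so the prefix/suffix structure of the DOI stays readable.
    return DOI_RESOLVER + quote(cleaned, safe="/")


print(doi_to_url("10.1109/eScience.2008.41"))
# prints https://doi.org/10.1109/eScience.2008.41
```

Because the citation stores the identifier rather than a location, the data store can move without breaking existing citations, provided the DOI registration is updated.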
Register
Metadata about data can be made available to registries for discovery and re-use
OAI-PMH may be available, Dublin Core (DC) probably less useful for data than for documents
Librarians’ role
  help data infrastructure folks to identify and feed appropriate registries
  investigate use of institutional repositories (IRs) for data and maintain feeds to registries
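A registry feed over OAI-PMH starts with a plain HTTP request. The sketch below only builds a ListRecords request URL; the function name and the base URL are hypothetical, while the verb, metadataPrefix and set arguments and the oai_dc (Dublin Core) prefix come from the OAI-PMH protocol.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict harvesting to one set exposed by the repository.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint, for illustration only.
print(list_records_url("https://repository.example.org/oai"))
# prints https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

A repository that exposes a richer, data-oriented schema would be harvested the same way, with a different metadataPrefix value.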
Discover
Range of discovery options (web search engine, metadata aggregators, discipline-specific)
Librarians’ role
  help data store managers to identify right discovery systems to feed
Access
Multiple options
  direct link to open-access data
  link to data store with its own access controls
  register/login only for open access (DANS)
  register/login for restricted access
  contact information for how to get data
Librarians’ role
  advise researchers on IP and rights issues
  ensure data is managed appropriately in data store
Exploit
Focused on the kinds of things that can be built on top of data once it is re-usable
  mashups
  data fusion
  cross-disciplinary discovery and visualisation
Librarians’ role
  assist researchers to locate relevant data
  provide advice about 3rd party copyright/IP issues and licensing for data
Preserve
Could have chosen Curate (but this is bigger) or Migrate (but this is a means not an end)
Will require engagement with storage service providers
Librarians’ role
  provide expertise in long-term preservation and curation of objects
  partner with technologists and archivists to combine all relevant expertise
GROUP EXERCISE #2
Conclusion
Data is becoming steadily more important for research
Research results need to be communicated
Data is the next great challenge for scholarly communication
And so, it should be the next great challenge for libraries
Over to you!
Acknowledgements and Links
Thanks to Cathrine Harboe-Ree (University Librarian) and Sam Searle (Data Management Coordinator), Monash University
ANDS Web Site: http://ands.org.au/
ANDS Services site: http://services.ands.org.au/
Me: andrew.treloar.net