The Curator’s Approach to Data Management and Sustainability

Post on 13-Jul-2020


The Curator’s Approach to Data Management and Sustainability
Nic Weber & Megan Senseney
Center for Informatics Research in Science & Scholarship
Graduate School of Library & Information Science
University of Illinois at Urbana-Champaign
Digital Humanities at Oxford Summer School, 14-18 July 2014

Agenda

Data management
– ...as a DH technique
•  “…valued ends…”
•  “…available resources…”
– DMP Agency Mandates
– DMP beyond two pages

Sustainability
– Significant properties
– 2 Case studies in DH sustainability

“I’m trying to deflate the idea of digital humanities from a domain to an underlying set of practices” 6 July DH 2014

DM as a DH Technique

Many different techniques

Data Management as a DH Technique

“…the ensemble of practices by which one uses available resources in order to achieve certain valued ends.”

Harold Lasswell

Valued Ends

•  Preservation of Knowledge (material artifacts that are produced, as well as ways of knowing)

•  Maximize the value of public investment

•  Increase the efficiency of doing digital humanities research – both immediate and long-term.

The Royal Society Science Policy Centre. (2012). Science as an open enterprise. Page 60.

Data management

•  Is highly personal

•  Interpersonal when collaborating

•  Intrapersonal in our relationship with institutions, organizations and funding agencies


Data management techniques include concerns of …

•  Planning (more in a bit) / Costing

•  Documentation

•  Formatting

•  Storage

•  Copyright / IP / Licensing

Documentation

Documentation: Tricks and Tips

•  Include a “header” line that describes the variables as the first line in the table.

•  Use plain ASCII text for your file names, variable names, and data values.

•  Develop naming schemes for files and variables, and record them.

•  When you export from an analysis environment (e.g. SPSS, R, Gephi, etc.), record transformations in a separate readme_(filename).txt file.
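The documentation tips above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the naming scheme (project_YYYYMMDD_description.ext), the function names and the sample edge list are hypothetical and do not come from the slides.

```python
import csv
import io
import re
from datetime import date

# Hypothetical naming scheme (an assumption for illustration):
# project_YYYYMMDD_description.ext, lowercase plain ASCII only.
NAME_PATTERN = re.compile(r"^[a-z0-9]+_\d{8}_[a-z0-9-]+\.[a-z0-9]+$")


def is_valid_name(filename: str) -> bool:
    """Return True if the name is plain ASCII and follows the scheme above."""
    return filename.isascii() and NAME_PATTERN.match(filename) is not None


def write_table(rows, header, out) -> None:
    """Write rows as CSV, with the variable names as the first ("header") line."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


def readme_text(filename: str, transformations) -> str:
    """Build the body of a readme_(filename).txt that records transformations."""
    lines = [
        f"File: {filename}",
        f"Recorded: {date.today().isoformat()}",
        "Transformations:",
    ]
    lines += [f"  {i}. {step}" for i, step in enumerate(transformations, start=1)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    name = "letters_20140714_edges.csv"  # hypothetical file name
    print(name, "follows the naming scheme:", is_valid_name(name))
    buffer = io.StringIO()
    write_table([["A", "B", 3]], ["source", "target", "weight"], buffer)
    print(buffer.getvalue(), end="")
    print(readme_text(name, ["Exported edge list from Gephi"]), end="")
```

Keeping the naming rule in code means the recorded scheme and the check that enforces it cannot drift apart; the readme text would be saved next to the exported file.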

Storage & Formatting!

Storage : DIY Cyberinfrastructure

Formatting & Storage: Tricks and Tips

•  Store data in nonproprietary formats (e.g., comma-delimited text files, .csv); proprietary software (e.g., Excel, Access) can become unavailable, whereas plain text files remain readable with basic tools.

•  During analysis, store an uncorrected (raw) data file. Do not make any corrections to this file; make corrections in a script instead, so that every change is recorded.

Modified from: https://www.nceas.ucsb.edu/content/simple-guidelines-effective-data-management
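A minimal sketch of the "raw file plus scripted corrections" tip, assuming a small CSV with a hypothetical "place" column. The correction table, function names and file handling are illustrative assumptions, not part of the original guidelines.

```python
import csv
import io
from pathlib import Path

# Hypothetical corrections (an assumption for illustration). Keeping them in
# the script means every change to the data is written down and repeatable.
CORRECTIONS = {"Oxfrod": "Oxford"}


def apply_corrections(raw_text: str, column: str) -> str:
    """Return corrected CSV text; the raw input itself is never modified."""
    reader = csv.DictReader(io.StringIO(raw_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = CORRECTIONS.get(row[column], row[column])
        writer.writerow(row)
    return out.getvalue()


def correct_file(raw_path, out_path, column: str) -> None:
    """Read the raw file and write a corrected copy to a different path."""
    raw_path, out_path = Path(raw_path), Path(out_path)
    if raw_path.resolve() == out_path.resolve():
        raise ValueError("refusing to overwrite the raw data file")
    corrected = apply_corrections(raw_path.read_text(encoding="utf-8"), column)
    out_path.write_text(corrected, encoding="utf-8")
```

Rerunning the script regenerates the corrected file from the raw one at any time, so the raw file stays the single untouched source.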

Copyright / IP slide

IP: Tricks of Trade

Melissa Levine’s Checklist on the DH Curation Guide: http://guide.dhcuration.org/legal/policy/#p05

Data Management Planning

•  Is highly social
– Dialectic (optimal vs. practical)
– Plans change

Agency: AHRC
Peer Reviewed: Yes
Components: Summary of Digital Outputs and Digital Technologies; Technical Methodology; Standards and Formats; Hardware and Software; Data Acquisition, Processing, Analysis and Use; Technical Support and Relevant Experience; Preservation, Sustainability and Use; Preserving Your Data; Ensuring Continued Access and Use of Your Digital Outputs
Enforcement: Unclear

Agency: NEH
Peer Reviewed: Yes
Components: Expected types of data; Period of data retention; Data forms and dissemination; Data storage and preservation
Enforcement: Yes

Agency: EU
Peer Reviewed: No
Components: Data set reference and name; Data set description; Standards and metadata; Data sharing; Archiving and preservation
Enforcement: Sliding

DMP Mandates (Funding Agencies)

AHRC Example Project: Kitchen Cosmology Project, University of Bristol. PI: Dr. Rita Langer. Link: http://bit.ly/1n0eVUn

NEH Example Project: A unified approach to preserving cultural software objects and their development histories, UC Santa Cruz. PI: Noah Wardrip-Fruin. Link: http://1.usa.gov/1kNxM8n

completed worksheets

Costing – Tricks and Tips

4C: Overview of 10 curation cost models: http://bit.ly/1lDMUFt “…provides a short description of each of the models and a presentation of their core features…”

More tricks of the trade slide

•  Advertise your data

•  Say how you would like it to be cited (paper? data? both?)

•  State known limitations (fit-for-purpose)

•  Rely on journals, repositories and colleagues for guidance

•  Don’t rely on journals, repositories or colleagues for guidance

SUSTAINABILITY

How do projects end?

Why this matters to DC

Fundamental questions of digital preservation:

1.  What must you retain to ensure the integrity and authenticity of the digital object?

2.  What can you lose without potential implications?

Significant Properties

“…characteristics of an information object that must be maintained to ensure that object’s continued access, use, and meaning over time as it is moved to new technologies.” (Wilson, 2007).

Five categories of SPs

•  Content
•  Context
•  Rendering
•  Structure
•  Behavior

Criteria for deciding significance

Grace, S. & Knight, G. (2008)

GLOBALIZATION AND AUTONOMY ONLINE COMPENDIUM

Case study 1 : Sustainability

Rockwell, Day, Yu, and Engel (2014). Burying Dead Projects: Depositing the Globalization Compendium. Digital Humanities Quarterly, 8(2). http://www.digitalhumanities.org/dhq/vol/8/2/000179/000179.html

Then we came to (planning for) the end

-  XML files with content;

-  A MySQL bibliographic database;

-  A metadata database of the content for generating topical pages and for searching;

-  A full text index for searching the text;

-  The code that handles the dynamic generation of the site, the searching, linking, and the XSL transforms;

-  Some HTML pages and CSS stylesheets;

-  And various images that are embedded in pages.
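One common way to document an inventory like the one above before deposit is a checksum manifest, which lets a repository verify later that nothing changed in transit or storage. The sketch below is an assumption for illustration only: the slides do not say that the Compendium team produced such a manifest, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of one file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root) -> dict:
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest: dict) -> list:
    """List recorded files that are now missing or whose checksum differs."""
    current = build_manifest(root)
    return [name for name, checksum in manifest.items()
            if current.get(name) != checksum]
```

A manifest covers only the integrity of the files; deciding which files to keep at all remains the curatorial question raised above.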

End of what?

http://globalautonomy.ca/global1/index.jsp

“The experience of the Compendium is that the intellectual work is not only in the individual articles, or even in the bibliographic data – it is in the interaction between these, mediated by code and in the user experience.”

Rockwell et al. 2014

What was deposited?

Content: …the texts, including bibliography, and glossary. We also considered the text on the HTML pages to be content.

Code: HTML, CSS, and the XSLT code that generated much of the interface.

Process: …materials (but not all) that document the editorial processes, including the editorial backend that strictly speaking was not part of the Compendium as experienced.

The User Experience: …information about the experience of the Compendium as an interactive work by writing a narrative along with screen shots of typical use of the Compendium stored as PDFs.

Five categories of SPs

•  Content
•  Context
•  Rendering
•  Structure
•  Behavior

Rockwell’s Categories

•  Content
•  Code
•  Process
•  User Experience

PERSEUS DIGITAL LIBRARY

Case study 2 : Sustainability

How would Perseus End? (hint – not by beheading Medusa)

RESOURCE LIST

Rockwell, Day, Yu, and Engel (2014). Burying Dead Projects: Depositing the Globalization Compendium. Digital Humanities Quarterly, 8(2). http://www.digitalhumanities.org/dhq/vol/8/2/000179/000179.html

Grace, S. & Knight, G. (2008). What are significant properties and why should I care? Presentation delivered at Digital Curation 101, October 7, 2008. Edinburgh, Scotland.