treating data like data
unifying data processing workflows for datasets in the IR
Steve Van Tuyl - @badgerbouse
Data and Digital Repository Librarian,
Oregon State University
#WorstTalkEver
• introduction
• the setup
• phase 1: new definitions
• phase 2: what to expect
• lessons learned
1.9 GB
data = data
phase 1: new definitions
“I was a little daunted by the
documenting mentioned in the last
e-mail, as I am starting a new PhD
program, and have lots of
responsibilities there.”
- The Perpetrator
iterate
At least for the next couple of
years, until the RDM community
has made such an impact that
incoming graduate students
know how to manage data from
the start
LOL, JK
iterate
encourage
tattle
phase 2: what to expect
93 theses & dissertations
45% excel
22% images
25% documents
25% other “data”
text
database
statistical
23% code
15% executables
12% “metadata”
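A breakdown like the one above could be tallied with a short script. This is only a sketch: the extension-to-category map and the `category_shares` helper are hypothetical, not the definitions or tooling used for the survey. It assumes each percentage is the share of deposits containing at least one file of that type, which would explain why the figures add up to more than 100%.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical extension-to-category map (an assumption for illustration,
# not the categories' actual definitions from the talk).
CATEGORIES = {
    ".xls": "excel", ".xlsx": "excel", ".xlsm": "excel",
    ".jpg": "images", ".png": "images", ".tif": "images",
    ".doc": "documents", ".docx": "documents", ".pdf": "documents",
    ".py": "code", ".r": "code", ".m": "code",
    ".exe": "executables",
}

def category_shares(deposits):
    """deposits maps a deposit id to a list of file names.

    Returns, per category, the percentage of deposits that contain
    at least one file in that category.
    """
    counts = Counter()
    for files in deposits.values():
        # Use a set so each deposit counts once per category.
        seen = {
            CATEGORIES.get(PurePath(name).suffix.lower(), "unknown")
            for name in files
        }
        counts.update(seen)
    total = len(deposits)
    if total == 0:
        return {}
    return {cat: round(100 * n / total) for cat, n in counts.items()}
```

Anything the map does not recognize falls into "unknown", so the map itself needs review before the numbers mean much.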
33% of excel have:
linked info
charts
macros
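One way to screen workbooks for these three features, sketched under assumptions: it only handles the zip-based Office Open XML formats (.xlsx, .xlsm), not legacy .xls files, and it reads "linked info" as links to external workbooks. The `excel_features` helper is hypothetical. It relies on the package layout, where charts, external links, and the VBA project are stored as separate parts inside the zip.

```python
import zipfile

def excel_features(path):
    """Report which features an OOXML workbook appears to contain,
    judging only by the part names inside the zip package."""
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    return {
        # External workbook links live under xl/externalLinks/.
        "linked info": any(n.startswith("xl/externalLinks/") for n in names),
        # Chart parts live under xl/charts/.
        "charts": any(n.startswith("xl/charts/") for n in names),
        # Macro-enabled workbooks carry a VBA project part.
        "macros": "xl/vbaProject.bin" in names,
    }
```

A file that is not a valid zip raises `zipfile.BadZipFile`, which is itself a useful signal that the workbook is a legacy format or unopenable.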
30% unknown
unopenable
obsolete
3% missing data
?
definitions
“ScholarsArchive@OSU promises to ensure that the
following common file formats (among many others)
are useable in the future, using whatever
combination of techniques (such as migration,
emulation, etc.) is appropriate given the context of
need”
- someone 10 years ago
promises, promises
nothing ever changes
new baseline