Grab a bucket! It's raining data!

Posted on 27-Jan-2015

description

For the Access 2009 conference. Grab a bucket, it's raining data! Library data, research data, primary data, mashed-up data, raw data, cooked data, our data, other people's data... But which bucket should we grab? And can we really, truly fit all the data in one bucket? And don't we risk turning data into sludge if we mix it all together in our bucket? Finding a bucket is the easy part. Grappling with data acquisition, modeling, discovery, and reuse is hard. How will we do it? Can we?

transcript

Photo: http://www.flickr.com/photos/peasap/655111542/

It’s raining data!

Grab a bucket!

Dorothea Salo

University of Wisconsin

Access 2009

the...

of Open Access

Painting: “Cassandra,” Evelyn de Morgan

Photo: http://commons.wikimedia.org/wiki/File:Cassandra1.jpeg

I’ve got nothing against

but the reality was...

Photo: http://www.flickr.com/photos/y2bk/528300692/

... blurrier.

Photo: http://www.flickr.com/photos/jennsstuff/2965783700/

goals?

means?

something for nothing?

fit between content and container?

fit between user needs and system?

and so now, I may be becoming

the...

of Data Curation?

What do we know about data?

Photo: http://www.flickr.com/photos/kentbye/2053916246/

There’s a lot of data.

Photo: http://www.flickr.com/photos/noelzialee/2126153623/

Data are there to be interacted with.

Photo: http://www.flickr.com/photos/jonevans/1032687817/

Data are wildly diverse in nature...

... as are their technical environments.

Photo: http://www.flickr.com/photos/28481088@N00/670258156/

Data are already out there.

Photo: NASA (via http://nasaimages.org/), “Multiwavelength M81”

A lot of data are analog...

... but really want to be digital.

Photo: http://www.flickr.com/photos/mrbill/3452943573/

Data are project-based.

http://www.exploringthehyper.net/

Data are sloppy.

Photo: http://www.flickr.com/photos/midorisyu/2622024163/

Data aren’t standardized.

Photo: http://www.flickr.com/photos/mikewade/3463334719/

Our Big Bucket:

the digital library

Our other Big Bucket:

the institutional repository

Impedance mismatches

Photo: http://www.flickr.com/photos/peasap/655111542/

What do we know about these?

Photo: http://www.flickr.com/photos/schex/193912573/

Carefully built and tended

http://www.collectionscanada.gc.ca/naskapi/index-e.html

Production is a Taylorist’s dream.

Photo: http://www.flickr.com/photos/villeneuve53/1808995620/

when it isn’t a Taylorist’s nightmare.

Photo: http://www.flickr.com/photos/elsie/97542274/

What do we know about these?

We’re caged up

inside our institutions.

Photo: http://www.flickr.com/photos/annia316/115439737/

Any color...

Photo: http://commons.wikimedia.org/wiki/File:Black_Ford_Model_T_in_HK.JPG

Bring it on; we’ll take anything!

... as long as it’s static and final.

Photo: http://www.flickr.com/photos/orblivio/146691405/

Right, anything you’ve got!

... one file at a time.

Photo: http://www.flickr.com/photos/jetalone/39990302/

Any look and feel...

... as long as it’s key-value pairs.

Any metadata you want!

Photo: http://www.flickr.com/photos/rattodisabina/2460905893/

Do anything you want...

... as long as it’s “download.”

Photo: http://www.flickr.com/photos/procsilas/306417902/

Content models

Enough said.

So where does all that leave us?

Photo: http://www.flickr.com/photos/library_of_congress/2162653769/

We need bigger, better buckets.

Photo: http://www.flickr.com/photos/jonevans/1032687817/

Silos are both necessary

and unacceptable.

Photo: http://www.flickr.com/photos/jojakeman/2818910104/

We have a lot of modeling to do.

And meta-modeling.

Photo: http://www.flickr.com/photos/crobj/727348790/

We have a lot of code to write.

Photo: http://www.flickr.com/photos/fienna/170559081/

We can’t code or model in isolation.

Photo: http://www.flickr.com/photos/naus3a01/240614578/

Fedora is the new world.

But Fedora must change.

Photo: http://www.flickr.com/photos/mythwhisper/3361907495/

Solr brings it all together.

Photo: http://www.flickr.com/photos/chantrybee/2911840052/

... the

of Data Curation.

Vermeer: the Muse Clio, from “The Allegory of Painting”

This presentation is available under a Creative Commons Attribution 3.0 United States license.

Thank you!