UKSG 2015 Mechanical curator and British Library labs

Date post: 15-Jul-2015
Upload: benosteen

The Mechanical Curator, maps and the online community

Ben O’Steen, British Library Labs @benosteen [email protected]

Andrew W. Mellon funded project seeking to bring researchers and our digital data closer together.

There is a significant gap between them for many reasons.

I’m there to work out what bridges to build.

Modern research forces us to re-evaluate what is meant by ‘access’

Enabling compute, for example: distant reading, machine learning, statistical methods (an ever-growing list).

Infancy of understanding

Large-scale analysis of text is evolving but young.

Exasperating situation where ‘black boxes’ of algorithms are used to draw conclusions.

http://www.scottbot.net/HIAL/?p=41271

“Black Boxes”: a misnomer

It is legitimate and useful to use code that you could not write.

It is not legitimate to simply believe the ‘label’ on the side of the box.

E.g. “Sentiment Analysis” is often nothing of the sort.
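
To see why the label can mislead, here is a minimal sketch of one common style of "sentiment analysis": summing per-word scores from a word list. The tiny lexicon and the function name `lexicon_sentiment` are invented for illustration and are not taken from any particular tool.

```python
import re

# Hypothetical toy lexicon; real tools ship much larger word lists.
TOY_LEXICON = {"good": 1, "happy": 1, "love": 1, "bad": -1, "sad": -1, "hate": -1}


def lexicon_sentiment(text: str) -> int:
    """Sum the lexicon scores of the words in `text`; unknown words score 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(TOY_LEXICON.get(word, 0) for word in words)


print(lexicon_sentiment("This is good."))
print(lexicon_sentiment("This is not good."))  # negation is ignored by this scorer
```

Both sentences receive the same score, so what this kind of box measures is word counts against a list, which may or may not track sentiment in a given corpus.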

Quoting Scott Weingart: (emphasis mine)

● Do sentiment analysis algorithms agree with one another enough to be considered valid?

● Do sentiment analysis results agree with humans performing the same task enough to be considered valid?

● Is Jockers’ instantiation of aggregate sentiment analysis validly measuring anything besides random fluctuations?

● Is aggregate sentiment analysis, by human or machine, a valid method for revealing plot arcs?

● If aggregate sentiment analysis finds common but distinct patterns and they don’t seem to map onto plot arcs, can they still be valid measurements of anything at all?

● Can a subjective concept, whether measured by people or machines, actually be considered invalid or valid?

(again from http://www.scottbot.net/HIAL/?p=41271)
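
The first two questions above are about agreement, which can at least be measured. As a rough sketch, Cohen's kappa gives chance-corrected agreement between two sets of labels, whether they come from two tools or from a tool and a human. The labels below are hypothetical and the choice of kappa is an assumption for illustration, not something the quoted post prescribes.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two label sequences of equal length."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Fraction of items on which the two labellers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each labeller's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labellers used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six passages, not real results.
tool_one = ["pos", "pos", "neg", "neg", "pos", "neg"]
tool_two = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohens_kappa(tool_one, tool_two), 3))
```

A value near 1 means the two labellers agree far more than chance would predict; a value near 0 means their agreement is about what chance alone would give.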

Do researchers need to “level up” and become machine learning experts to use it?

In short, no.

We do not require scientists to have a master’s degree in Statistics to publish on numerical results, nor to be prize-winning novelists to write research papers.*

There is a middle ground between treating something as magic and being an expert in the field.

Without evidence or trials, I cannot say who specifically (librarian, data scientist, PI, consultant, etc.) is best placed to gain and use this knowledge.

* Although it likely couldn’t hurt, given the papers I’ve read.

Let’s consider a real example.

Pieter Francois, 2013 British Library Labs Competition winner

“I am interested in travel accounts in Europe during the 19th Century”

“The Great Unread”, Graphs, Maps, Trees and Franco Moretti

From a review of “Graphs, Maps, Trees”:

“Professor Franco Moretti argues heretically that literature scholars should stop reading books and start counting, graphing, and mapping them instead [...]”

“For any given period scholars focus on a select group of a mere few hundred texts: the canon. As a result, they have allowed a narrow distorting slice of history to pass for the total picture.”

“Moretti offers bar charts, maps, and time lines instead, developing the idea of "distant reading," set forth in his path-breaking essay "Conjectures on World Literature," into a full-blown experiment in literary historiography, where the canon disappears into the larger literary system.”

2013 Competition winners: http://labs.bl.uk/Ideas+for+Labs

Pieter Francois

Bias in digitisation

The tool was made to give a statistically valid sample.

Because so little has been digitised, it showed how skewed the digital corpus is compared to the overall holdings.
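
The slides do not describe how the tool works internally. As a rough illustration of what "skewed compared to the overall holdings" could mean in numbers, the sketch below compares each group's share of the holdings with its share of the digitised subset. The function names and the genre labels are hypothetical, invented for this example.

```python
from collections import Counter
from typing import Dict, Iterable


def share_by_group(groups: Iterable[str]) -> Dict[str, float]:
    """Return each group's share of the total, as a fraction between 0 and 1."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def skew(holdings: Iterable[str], digitised: Iterable[str]) -> Dict[str, float]:
    """Digitised share minus holdings share per group; 0 means no skew."""
    holdings_share = share_by_group(holdings)
    digitised_share = share_by_group(digitised)
    return {
        group: digitised_share.get(group, 0.0) - share
        for group, share in holdings_share.items()
    }


# Hypothetical genre labels, invented for illustration only.
holdings = ["travel"] * 20 + ["fiction"] * 50 + ["science"] * 30
digitised = ["travel"] * 1 + ["fiction"] * 8 + ["science"] * 1
print(skew(holdings, digitised))
```

A positive value means a group is over-represented among the digitised items, a negative value that it is under-represented.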

Allen B. Riddell in “Where are the novels?”* estimates that using HathiTrust’s corpus:

“... about 58%—somewhere between 47% and 68%—of the 2,903 novels [all publications in English between 1800 and 1836] have publicly accessible scans.”

* (2012) https://ariddell.org/where-are-the-novels.html
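
An estimate with a range like the one quoted can come from checking a random sample rather than every title. The sketch below draws a sample and reports a Wilson score interval for the proportion found. This is a generic textbook interval and is not claimed to be Riddell's method. The catalogue, its size, and the sample size are hypothetical.

```python
import math
import random
from typing import List, Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def estimate_share(catalogue: List[bool], sample_size: int, seed: int = 0) -> Tuple[float, float, float]:
    """Sample the catalogue at random; return (point estimate, low, high)."""
    rng = random.Random(seed)
    sample = rng.sample(catalogue, sample_size)
    found = sum(sample)  # True counts as 1: the title has a public scan
    low, high = wilson_interval(found, sample_size)
    return found / sample_size, low, high


# Hypothetical catalogue: True means "a public scan exists" for that title.
toy_catalogue = [True] * 600 + [False] * 400
print(estimate_share(toy_catalogue, sample_size=100))
```

In practice the expensive step is the one the toy list hides: looking up each sampled title by hand to see whether a scan exists.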

Written versus What is Read

Presentation shapes research questions

“On The Road”, Jack Kerouac

(via http://www.openculture.com/2007/08/on_the_road_the_original_scroll.html)

Impact?

Hard to measure but:

- 17-20 million hits on average every month, over 250 million in 14 months.
- Over 200,000 tags added.
- > 5,500 clicks on ‘purchase a high resolution version’.
- Hundreds of contributors.
- Iterative crowdsourcing is ongoing.
- https://commons.wikimedia.org/wiki/Commons:British_Library/Mechanical_Curator_collection/map_tag_status

Rethinking access

What if everything had (at least) one URL?

Every book? Every article? Every page? Every paragraph?

What if that URL worked in predictable ways?
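
One way to picture "predictable" is a URL layout where the address of a page or paragraph can be built, and walked back up, without consulting a search interface. The base URL, path layout, and function names below are hypothetical; this is not an actual British Library scheme.

```python
from typing import Optional

# Hypothetical base URL and path layout, for illustration only.
BASE_URL = "https://example.org/books"


def item_url(book_id: str, page: Optional[int] = None, paragraph: Optional[int] = None) -> str:
    """Build a URL for a book, a page in it, or a paragraph on that page."""
    if paragraph is not None and page is None:
        raise ValueError("a paragraph URL needs a page number")
    url = f"{BASE_URL}/{book_id}"
    if page is not None:
        url += f"/page/{page}"
    if paragraph is not None:
        url += f"/paragraph/{paragraph}"
    return url


def parent_url(url: str) -> str:
    """For a page or paragraph URL, drop the last '/<kind>/<number>' pair."""
    return url.rsplit("/", 2)[0]


paragraph = item_url("b1", page=12, paragraph=3)
print(paragraph)
print(parent_url(paragraph))
```

Because the pattern is regular, a script can step from a paragraph to its page to its book using string operations alone.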

David Normal: http://www.davidnormal.com/

Burning Man Festival

David Normal created light boxes around the Burning Man, using the British Library’s Flickr images.

Codename: “Burning Man Meets the British Library” - 20th June 2015

Can code identify subjective qualities?

There is a lot more to explore… and too much for a single project to tackle alone.

Georeferencing - http://bl.uk/maps

Iterative crowdsourcing* and curation

Release data with the attitude that people will tell you why it is wrong and give them tools to fix it.

Georeferencing maps found in books gives data that can be used to generate more specific metadata about what those books concern.

* A term I have borrowed from Mia Ridge
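
As a sketch of how a georeferenced map might feed back into book metadata: once a map has a bounding box, that box can be compared against a gazetteer to suggest place tags for the book it came from. The gazetteer, its very rough coordinates, and the function names are all hypothetical. The overlap test ignores maps that cross the antimeridian.

```python
from typing import Dict, List, NamedTuple


class BoundingBox(NamedTuple):
    west: float
    south: float
    east: float
    north: float


# Hypothetical, very rough gazetteer; coordinates are approximate.
TOY_GAZETTEER: Dict[str, BoundingBox] = {
    "Iceland": BoundingBox(-25.0, 63.0, -13.0, 67.0),
    "Ireland": BoundingBox(-11.0, 51.0, -5.0, 55.5),
    "Italy": BoundingBox(6.5, 36.5, 18.5, 47.0),
}


def overlaps(a: BoundingBox, b: BoundingBox) -> bool:
    """True if the two boxes share any area."""
    return a.west < b.east and b.west < a.east and a.south < b.north and b.south < a.north


def place_tags(map_box: BoundingBox, gazetteer: Dict[str, BoundingBox] = TOY_GAZETTEER) -> List[str]:
    """Suggest place names whose gazetteer box overlaps the map's box."""
    return sorted(name for name, box in gazetteer.items() if overlaps(map_box, box))


# Hypothetical result of georeferencing one map found in a book.
georeferenced_map = BoundingBox(-22.5, 63.8, -21.0, 64.5)
print(place_tags(georeferenced_map))
```

Suggested tags like these would still need review, which fits the iterative approach: release them, let people say why they are wrong, and give them tools to fix them.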

Light-hearted but underlines a crucial pattern of access

Interfaces to content need to expect and to cater to machine access.

A human may not be present to, say, ‘log in’.

Keyword search is useless as a filtering mechanism.

Text- and data-mining is like throwing a magnet into a haystack, without knowing if there are any steel needles in there.
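
A minimal sketch of what machine access tends to look like in practice: a script walks every text and applies a computed test, where a keyword query would not help. The corpus, the identifiers, and the verse heuristic are hypothetical, chosen only because "is this verse?" is not something a keyword can express.

```python
from typing import Callable, Iterable, Iterator, Tuple


def scan_corpus(corpus: Iterable[Tuple[str, str]], test: Callable[[str], bool]) -> Iterator[str]:
    """Yield the identifier of every text for which `test(text)` is true."""
    for identifier, text in corpus:
        if test(text):
            yield identifier


def looks_like_verse(text: str, max_words: int = 8) -> bool:
    """Crude heuristic: more than 80% of non-empty lines are short."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    short = sum(len(line.split()) <= max_words for line in lines)
    return short / len(lines) > 0.8


# Hypothetical two-item corpus of (identifier, text) pairs.
toy_corpus = [
    ("item-1", "Roses are red\nViolets are blue\n"),
    ("item-2", "This is a long sentence of prose that runs well past eight words on a single line.\n"),
]
print(list(scan_corpus(toy_corpus, looks_like_verse)))
```

The result may well be an empty list, which is the haystack point: the scan has to touch everything before it can say whether any needles were there.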

Gaming re-use

Off the Map 2014 Winners

2014 winning team: Gothulus Rift, University of South Wales

Created a Fonthill Abbey-inspired game called Nix, using Oculus Rift

Blog: http://nixgamedevblog.blogspot.co.uk

YouTube flythrough: http://youtu.be/8ESieZO4VHw

Off the Map 2015: Alice’s Adventures Off the Map

Part of the British Library's celebrations for the 150th anniversary of Alice in Wonderland

http://gamecity.org/alices-adventures-off-the-map/

British Library Labs Competitions

http://labs.bl.uk/British+Library+Labs+Competition+2015

Unofficial descriptions of the two main aspects of this:

“Tell us your ideas” and “Show us what you have done”

