
Archives, algorithms and people

Date post: 26-Jun-2015
Upload: tristan-ferne
Description:
How we put the BBC World Service radio archive online using machines and crowdsourcing. A talk given to the UK Museums on the Web conference, November 2013. One of the major challenges of a big digitisation project is that you can end up simply swapping an under-used physical archive for its digital equivalent. Without easy ways to navigate the data, there is no way for your users to get to the bits they want. We recently worked with the BBC World Service to generate metadata for their radio archive: 50,000 programmes from over 45 years. We first used algorithms to generate "good enough" topics to put the archive online, and then used crowdsourcing to improve the data. Throughout 2013 we have been running this experiment to crowdsource improvements to the metadata that we automatically created. At http://worldservice.prototyping.bbc.co.uk people can search and browse for programmes, listen to them, and correct or add topics. This talk describes how we went about this and what we've learnt from this massive online multimedia archive: understanding audio, automatically generating topics, and crowdsourcing improvements to the data.
Tristan Ferne / @tristanf Executive Producer BBC Research & Development Archives, algorithms and people or How we put the BBC World Service radio archive online using machines and crowdsourcing
Transcript
Page 1: Archives, algorithms and people

Tristan Ferne / @tristanf
Executive Producer

BBC Research & Development

Archives, algorithms and people
or
How we put the BBC World Service radio archive online using machines and crowdsourcing

Page 2: Archives, algorithms and people

The BBC World Service archive

Page 3: Archives, algorithms and people

1947-2012

Page 4: Archives, algorithms and people

Spelling mistake

Missing data

Sometimes incorrect data

No semantic data

The missing metadata

Page 5: Archives, algorithms and people

How it works

Page 6: Archives, algorithms and people

Listening machines

Page 7: Archives, algorithms and people

Noisy transcripts

Page 8: Archives, algorithms and people

Algorithms

Page 9: Archives, algorithms and people

Algorithms and people

Page 10: Archives, algorithms and people

The prototype

Page 12: Archives, algorithms and people
Page 13: Archives, algorithms and people

Show Synopsis editing version

Page 14: Archives, algorithms and people
Page 15: Archives, algorithms and people
Page 17: Archives, algorithms and people

Machine learning

Page 18: Archives, algorithms and people

Results

Page 19: Archives, algorithms and people

70,000 tag edits

How much data?

1,000 synopsis edits

71,000 edits

36,000 listenable programmes

1m machine tags

70,000 programmes

3,000 users

36% of programmes listened to

21% of programmes tagged

Page 20: Archives, algorithms and people

And four lost programmes

Page 21: Archives, algorithms and people

Tags are a large and sparse space

When is a tag correct?

When is a programme tagged completely?

How do you measure crowd-sourced data?

How good is the data?

Page 22: Archives, algorithms and people

Who does the work?

1 person = 30% of edits

10 people = 70% of edits

10% of people = 98% of edits
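
Concentration figures like these can be computed from a per-user edit count. The following is a minimal sketch, not the project's actual analysis: the function names and the toy edit log are hypothetical, and the numbers it produces are illustrative only, not the prototype's data.

```python
import math
from collections import Counter


def top_contributor_share(edits_by_user, top_n):
    """Return the fraction of all edits made by the top_n most active users."""
    counts = sorted(edits_by_user.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[:top_n]) / total


def top_fraction_share(edits_by_user, fraction):
    """Return the share of edits made by the most active `fraction` of users."""
    # Round up so that at least one user is always counted.
    top_n = max(1, math.ceil(fraction * len(edits_by_user)))
    return top_contributor_share(edits_by_user, top_n)


# Hypothetical edit log: one user id per edit (made-up data for illustration).
edit_log = ["u1"] * 30 + ["u2"] * 20 + ["u3"] * 10 + [f"u{i}" for i in range(4, 44)]
edits_by_user = Counter(edit_log)

print(f"top 1 user:       {top_contributor_share(edits_by_user, 1):.0%}")
print(f"top 3 users:      {top_contributor_share(edits_by_user, 3):.0%}")
print(f"top 10% of users: {top_fraction_share(edits_by_user, 0.10):.0%}")
```

Sorting users by edit count and summing from the top gives the same kind of "N people = X% of edits" statement as on the slide, for any cut-off.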

Page 23: Archives, algorithms and people

The shape of the archive

Page 24: Archives, algorithms and people
Page 25: Archives, algorithms and people
Page 26: Archives, algorithms and people

Places mentioned

Page 27: Archives, algorithms and people

Linking from the News

Page 28: Archives, algorithms and people

The Last Danish Christmas Broadcast

“Entirely in Danish”

Page 29: Archives, algorithms and people

We can significantly improve the data

It’s cost-effective with re-usable technology

A crowdsourcing approach

What we’ve learnt

Page 30: Archives, algorithms and people

How good are the machine tags?

How much crowdsourcing do you need?

When is your data good enough?

Open questions

