Transcript
Page 1: Archives, algorithms and people

Tristan Ferne / @tristanf

Executive Producer

BBC Research & Development

Archives, algorithms and people

or

How we put the BBC World Service radio archive online using machines and crowdsourcing

Page 2: Archives, algorithms and people

The BBC World Service archive

Page 3: Archives, algorithms and people

1947-2012

Page 4: Archives, algorithms and people

The missing metadata

Spelling mistake

Missing data

Sometimes incorrect data

No semantic data

Page 5: Archives, algorithms and people

How it works

Page 6: Archives, algorithms and people

Listening machines

Page 7: Archives, algorithms and people

Noisy transcripts

Page 8: Archives, algorithms and people

Algorithms

Page 9: Archives, algorithms and people

Algorithms and people

Page 10: Archives, algorithms and people

The prototype

Page 12: Archives, algorithms and people
Page 13: Archives, algorithms and people

Show Synopsis editing version

Page 14: Archives, algorithms and people
Page 15: Archives, algorithms and people
Page 17: Archives, algorithms and people

Machine learning

Page 18: Archives, algorithms and people

Results

Page 19: Archives, algorithms and people

How much data?

70,000 tag edits

1,000 synopsis edits

71,000 edits

36,000 listenable programmes

1m machine tags

70,000 programmes

3,000 users

36% of programmes listened to

21% of programmes tagged

Page 20: Archives, algorithms and people

And four lost programmes

Page 21: Archives, algorithms and people

How good is the data?

Tags are a large and sparse space

When is a tag correct?

When is a programme tagged completely?

How do you measure crowd-sourced data?

Page 22: Archives, algorithms and people

Who does the work?

1 person = 30% of edits

10 people = 70% of edits

10% of people = 98% of edits
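
The concentration figures above can be reproduced from an edit log by counting edits per user and summing the counts of the most active users. The sketch below is illustrative only: the function name `top_contributor_share` and the sample user names are hypothetical, and it assumes the log is available as a simple list with one user identifier per edit, which is not something the slides describe.

```python
from collections import Counter


def top_contributor_share(edit_authors, top_n):
    """Return the fraction of all edits made by the top_n most active users.

    edit_authors: iterable with one user identifier per edit (assumed format).
    """
    counts = Counter(edit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # most_common(top_n) yields (user, edit_count) pairs, busiest first
    top_edits = sum(n for _, n in counts.most_common(top_n))
    return top_edits / total


# Hypothetical edit log: three users making 6, 3 and 1 edits
sample_log = ["alice"] * 6 + ["bob"] * 3 + ["carol"]
share_top_1 = top_contributor_share(sample_log, 1)
share_top_2 = top_contributor_share(sample_log, 2)
```

Calling the function with `top_n` set to 1, 10, or 10% of the user count would give the three kinds of figures quoted on the slide.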

Page 23: Archives, algorithms and people

The shape of the archive

Page 24: Archives, algorithms and people
Page 25: Archives, algorithms and people
Page 26: Archives, algorithms and people

Places mentioned

Page 27: Archives, algorithms and people

Linking from the News

Page 28: Archives, algorithms and people

The Last Danish Christmas Broadcast

“Entirely in Danish”

Page 29: Archives, algorithms and people

What we’ve learnt

We can significantly improve the data

It’s cost-effective with re-usable technology

A crowdsourcing approach

Page 30: Archives, algorithms and people

Open questions

How good are the machine tags?

How much crowdsourcing do you need?

When is your data good enough?

