
Martin Benjamin The Particles of Language: "The Dictionary" as elemental data for 7000 languages...

Martin Benjamin The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space 21 May, 2015 – CERN, Geneva 1

1

Martin Benjamin

The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space

21 May, 2015 – CERN, Geneva


2

kamusi is Swahili for dictionary

3

Goal: A complete matrix of human expression across time and space

• As a knowledge resource
• As a data resource

4

In service since 1994 (originally at Yale Council on African Studies)

International NGO since 2009
• Registered non-profit in USA and Switzerland

Academic Home since 2013:
EPFL - Swiss Federal Institute of Technology in Lausanne
LSIR - Distributed Information Systems Laboratory

5

White House Big Data Initiative:

Launch Partner for Building the Data Innovation Ecosystem
Networking and Information Technology R&D Program
Office of Science and Technology Policy

6

What is the overlap between Kamusi and CERN?

• Big goals, small particles
• Big collaboration
• 7000 languages
• “Human Languages Project”

• Pure science – data for knowledge
• Practical science – data for use
• High energy particle detectors

7

Problems for Lexicography

What are Concepts?
• How to explain an idea in its own language
• How to express an idea across languages
• How to account for variation

What are Words?
• A set of letters?
• A set of sounds?
• A “canonical” form?
• A single entity?


10

Problems for Lexicography

What are Words? A set of letters?

C-L-I-E-N-T

11

Problems for Lexicography

What are Words? A set of sounds?

whined, wind, wined

12

Problems for Lexicography

What are Words? A “canonical” form?

SEE: sees, saw, seen, seeing

Kinyarwanda: 900 million forms for every verb
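
The canonical-form question can be made concrete with a small sketch. The Python below is illustrative only and is not taken from Kamusi: the names `FORMS_OF`, `LEMMA_BY_FORM`, and `lemma_of` are hypothetical, and the data is just the SEE example from the slide. It shows a lookup table from surface forms to one canonical form, which is workable for a handful of English forms.

```python
# Hypothetical sketch: map inflected forms back to a canonical form (lemma).
# The data is only the SEE example from the slide; all names are illustrative.
FORMS_OF = {
    "see": ["see", "sees", "saw", "seen", "seeing"],
}

# Invert the table so that any surface form points to its lemma.
LEMMA_BY_FORM = {
    form: lemma
    for lemma, forms in FORMS_OF.items()
    for form in forms
}


def lemma_of(form: str):
    """Return the canonical form for a surface form, or None if unknown."""
    return LEMMA_BY_FORM.get(form.lower())
```

A table like this would not scale to a language with hundreds of millions of forms per verb; there the forms would presumably have to be derived by rule rather than listed.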

13

Problems for Lexicography

What are Words? A single entity?

African fish eagle
drive up the wall


14

light


15

light

why multilingual dictionaries were impossible


16

light

lumineux

léger

allégé

léger

why multilingual dictionaries were impossible


17

light

lumineux

léger

allégé

léger

why multilingual dictionaries were impossible

WOLF 02121424-a: léger, lumière

WOLF 01186408-a: léger

WOLF 00993117-a: léger, allégé, lumière, light

WOLF 00269989-a: lumière, lumineux, clair

PWN (English Wordnet): light x 47

WOLF (French Wordnet): light = lumière x 44; light = léger x 37

18

light

léger

why multilingual dictionaries were impossible

lumineux

allégé

léger

19

why multilingual dictionaries were impossible

20

why multilingual dictionaries were impossible

lumineux


21

light

fr: lumineux

fr: léger

fr: allégé

fr: léger

why multilingual dictionaries were impossible

th: ที่แคลอรี่ต่ำ

fi: kaloriton

sw: pungufu

th: เบา

fi: kevyt

sw: -epesi

th: สว่าง

fi: valoisa

sw: -enye mwanga

th: ซึ่งไร้สาระ

fi: tyhjänpäiväinen

sw: -a kuchekesha

22

en: light

fr: lumineux

fr: léger

fr: allégé

fr: léger

why multilingual dictionaries were impossible

th: ที่แคลอรี่ต่ำ

fi: kaloriton

sw: pungufu

th: เบา

fi: kevyt

sw: -epesi

th: สว่าง

fi: valoisa

sw: -enye mwanga

th: ซึ่งไร้สาระ

fi: tyhjänpäiväinen

sw: -a kuchekesha

en: light

en: light

en: light


25

light

how Kamusi makes a multilingual dictionary possible


26

light (not serious)

light (not fattening)

light (not heavy)

light (not dark)

how Kamusi makes a multilingual dictionary possible


27

light (not serious)

light (not fattening)

light (not heavy)

light (not dark)

how Kamusi makes a multilingual dictionary possible

fr: lumineux

fr: léger

fr: allégé

fr: léger


28

light (not serious)

light (not fattening)

light (not heavy)

light (not dark)

how Kamusi makes a multilingual dictionary possible

fr: lumineux, th: สว่าง, fi: valoisa, sw: -enye mwanga

29

light (not serious)

light (not fattening)

light (not heavy)

light (not dark)

how Kamusi makes a multilingual dictionary possible

fr: léger, th: เบา, fi: kevyt, sw: -epesi

30

light (not serious)

light (not fattening)

light (not heavy)

light (not dark)

how Kamusi makes a multilingual dictionary possible

fr: léger, th: ซึ่งไร้สาระ, fi: tyhjänpäiväinen, sw: -a kuchekesha

31

light (not serious)

light (not fattening)

light (not heavy)

light (not dark)

how Kamusi makes a multilingual dictionary possible

fr: allégé, th: ที่แคลอรี่ต่ำ, fi: kaloriton, sw: pungufu

32

light (not serious)

light (not fattening)

light (not heavy)

light (not dark)

how Kamusi makes a multilingual dictionary possible

fr: allégé, th: ที่แคลอรี่ต่ำ, fi: kaloriton, sw: pungufu

fr: léger, th: ซึ่งไร้สาระ, fi: tyhjänpäiväinen, sw: -a kuchekesha

fr: léger, th: เบา, fi: kevyt, sw: -epesi

fr: lumineux, th: สว่าง, fi: valoisa, sw: -enye mwanga
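
The idea of attaching translations to a sense rather than to a spelling can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not Kamusi's actual schema: the names `SENSES`, `translate`, and `pivot` are hypothetical, the pairing of each translation row with a sense gloss is read off the slides, and the Thai column is left out for brevity.

```python
# Hypothetical sketch of concept-level entries for English "light".
# Each sense is its own entry; translations attach to the sense, not the spelling.
# The pairing of rows with glosses is an assumption read off the slides.
SENSES = {
    "light (not dark)": {"fr": "lumineux", "fi": "valoisa", "sw": "-enye mwanga"},
    "light (not heavy)": {"fr": "léger", "fi": "kevyt", "sw": "-epesi"},
    "light (not serious)": {"fr": "léger", "fi": "tyhjänpäiväinen", "sw": "-a kuchekesha"},
    "light (not fattening)": {"fr": "allégé", "fi": "kaloriton", "sw": "pungufu"},
}


def translate(sense: str, target: str) -> str:
    """Translate one specific sense into the target language."""
    return SENSES[sense][target]


def pivot(source_lang: str, source_term: str, target_lang: str) -> set:
    """Return every target term that shares a sense with the source term."""
    return {
        entry[target_lang]
        for entry in SENSES.values()
        if entry.get(source_lang) == source_term
    }
```

Looking up by sense gives one answer per language, while looking up by spelling (`pivot("fr", "léger", "fi")`) returns two Finnish candidates, which is the ambiguity the earlier slides describe.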

33

how Kamusi makes a multilingual dictionary possible

light (not heavy): fr: léger, th: เบา, fi: kevyt, sw: -epesi

fr: léger (sandy)

fr: léger (low alcohol)

fr: léger (without much luggage)


40

how Kamusi makes a multilingual dictionary possible

light (not dark)

Catalan: brillant, illuminós

Japanese: 明るい, 明らか

Croatian: svjetleći, svijetao

Spanish: claro, luminoso

41

how Kamusi makes a multilingual dictionary possible

light


44

light


47

light

meaning

shape

sound

place

time

relationships


49

light

lighter

lightest

meaning

shape

sound

place

time

relationships

light

lights

lighted / lit

lighting


50

light

meaning

shape

sound

place

time

relationships

robot


52

light

meaning

shape

sound

place

time

relationships

linhtaz


53

light

meaning

shape

sound

place

time

relationships

torch (hyponym)

lamp (synonym)

lighthouse (spawn)

dark (antonym)

car (holonym)

54

(difference)

meaning

shape

sound

place

time

relationships

lamp (synonym)

light


57

light

meaning

definition examples

translations


58

light

meaning

translations


60

equivalence
• Parallel
• Similar
• Explanatory

translations

61

equivalence
• Parallel
• Similar
• Explanatory

hand (English) = main (French)

✓: transitive across languages

translations

62

equivalence
• Parallel
• Similar
• Explanatory

mkono (Swahili) = hand + arm (English)

⁇ : might be transitive across languages

translations

difference difference translation

63

equivalence
• Parallel
• Partial
• Similar

hand (English) = 10.2 cm (most languages)

✗: not transitive across languages

translations
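
The transitivity rule on these slides can be sketched as a graph search that only follows links marked as parallel. This is a hedged illustration, not the Kamusi implementation: `LINKS` and `inferred_equivalents` are hypothetical names, the "parallel" and "similar" labels on each pair are an assumed reading of the slides, and the Spanish entry is added only so the search has a second hop to follow.

```python
from collections import deque

# Each link: (term_a, term_b, equivalence_type). Terms are (language, word).
# The type labels are assumed readings of the slides; the Spanish entry is
# not from the slides and is added only to demonstrate a second hop.
LINKS = [
    (("en", "hand"), ("fr", "main"), "parallel"),
    (("fr", "main"), ("es", "mano"), "parallel"),
    (("sw", "mkono"), ("en", "hand"), "similar"),  # mkono = hand + arm
]


def inferred_equivalents(start, links=LINKS):
    """Collect terms reachable from `start` through parallel links only."""
    graph = {}
    for a, b, kind in links:
        if kind == "parallel":
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen
```

Starting from English "hand", the search reaches French and Spanish through parallel links; starting from Swahili "mkono" it reaches nothing, because its only link is not parallel and would need human review before being chained.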


64

light

meaning

definition examples

translations

definition translations

example translations


65

light

meaning

definition examples

time

hard

easy

place

notes


66

light

shape

inflections multiple words

alternates


67

light

lighter

lightest

shape

inflections

sound

translation

shape

separability (MWEs)

• Simple: configurable form, e.g., English verbs
• Complex: fixed table, e.g., French verbs
• Agglutinative: rule-based coding, e.g., Swahili verbs

alternate spellings

place

spelling sets: polysemous terms often have the same inflections.

68

light

lite

shape

alternates

Kanji: 金魚
Hiragana: きんぎょ
Katakana: キンギョ
Rōmaji: kingyo
English: goldfish

https://en.wikipedia.org/wiki/Japanese_writing_system

69

shape

multiple words

inflections (+separability)

drives || up the wall
drove || up the wall
driven || up the wall
driving || up the wall

separability

drive || up the wall

Research question: Can we determine Separability Sets?
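
One way to picture the "||" notation is as a point where other words may intervene between the inflected head and the fixed tail. The sketch below is an assumption-laden illustration, not a method from the talk: `HEAD_FORMS`, `TAIL`, `MAX_GAP`, and `contains_expression` are hypothetical names, and the limit of three intervening words is chosen arbitrarily.

```python
import re

# Hypothetical sketch: "||" marks the point where other words may intervene.
HEAD_FORMS = ["drive", "drives", "drove", "driven", "driving"]
TAIL = "up the wall"
MAX_GAP = 3  # assumed limit on intervening words, chosen only for illustration

PATTERN = re.compile(
    r"\b(?:" + "|".join(HEAD_FORMS) + r")\b"   # any inflected head form
    r"(?:\s+\w+){0," + str(MAX_GAP) + r"}?"    # up to MAX_GAP words in the gap
    r"\s+" + re.escape(TAIL) + r"\b",          # the fixed tail
    re.IGNORECASE,
)


def contains_expression(sentence: str) -> bool:
    """Return True if the sentence contains the separable expression."""
    return PATTERN.search(sentence) is not None
```

A pattern like this cannot tell the idiom from a literal use, and it says nothing about which words are allowed in the gap, which is part of what the research question about Separability Sets is asking.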

70

shape

sign languages
e.g. Uganda Sign Language, Solomon Islander Sign Language

• no sound
• no spelling
• need for gesture recognition (future research)

ideograms: 光

• no relation between shape and sound
• no sequencing
• ontological

relationships


71

light

place

dialect dialect word sightings

sound sightings


72

light

sound

audio tone

IPA (phonetics)

place


73

light

time

ancestors (other languages)

ancestors (own language)

datings (examples)


74

light

relationships

synonyms ontologies

terminologies

transitivity with translations

hierarchies or reciprocity

75

Lexicography vs. Terminology

Lexicography:

• General terms

• Variability of concepts among languages

• Describes indigenous words

Terminology:

• Domain-specific terms

• Fixed meaning within context

• Prescribes words

sabilli

76

Collecting Data

• Gathering new data
  • For languages with zero digitized data (most world languages)
  • For languages with incomplete data (all languages)

• Aligning existing data
  • To separate terms at concept level
  • To match concepts across languages

77

Collecting Data: Existing Data

• Copyright restrictions
• Data structure
• Data alignment

78

Collecting Data

Expert Interface: Edit Engine

79

Crowdsourcing Lexicography

• Gathering new data
  • For languages with zero digitized data (most world languages)
  • For languages with incomplete data (all languages)

• Aligning existing data
  • To separate terms at concept level
  • To match concepts across languages

People are very good at these tasks
Machines are very bad
Scholars are very busy

80

Crowdsourcing with Games

• Engage the public in producing raw data
• Data can be built upon and refined over time
• Collecting “facts” that
  • can best come from native informants
  • can be verified by consensus as fulfilling a communicative role
• Wrong data and bad actors can be removed

81

Game Architecture

• Simple tasks the public can understand
• “Word” questions to stimulate the mind
• Competition elements to stimulate the heart
• Answers validated by consensus
• Starts with English concept set to have a shared realm of ideas
• Grows progressively – winning answers in one mode generate more advanced questions in the next
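
Validation by consensus can be sketched as a simple vote count. The slides do not say how Kamusi decides that an answer has won, so everything here is an assumption for illustration: the function name `consensus_answer`, the minimum of five votes, and the 60% agreement threshold are all hypothetical.

```python
from collections import Counter


def consensus_answer(votes, min_votes=5, min_share=0.6):
    """Return the winning answer, or None if there is no consensus yet.

    Both thresholds are illustrative assumptions, not values from the talk.
    """
    # Normalise case and surrounding whitespace so trivial variants agree.
    cleaned = [v.strip().casefold() for v in votes if v.strip()]
    if len(cleaned) < min_votes:
        return None  # too few players have answered so far
    answer, count = Counter(cleaned).most_common(1)[0]
    if count / len(cleaned) >= min_share:
        return answer
    return None  # players disagree too much to accept any answer
```

An accepted answer could then seed the next game mode, in line with the progressive design described on this slide, while a question with no consensus simply stays open for more players.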


83

Game Modes

1. Translation
2. Synonyms
3. Word Forms
4. Definitions
5. Examples
6. Alignment
7. Equivalence
8. Difference


84

Translation Game


86

Definition Game


89

Example Game


90

Martin Benjamin

The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space

[email protected]

