Date post: 26-Jul-2020
Final Projects A Large-Scale Multilingual Disambiguation of Glosses José Camacho Collados, Claudio Delli Bovi, Alessandro Raganato, and Roberto Navigli lcl.uniroma1.it/disambiguated-glosses
Definitional Knowledge in NLP

○ Word Sense Disambiguation

○ Taxonomy/Ontology Learning

○ Information Extraction

○ Plagiarism Detection

○ Question Answering

○ ...

Lesk, 1986; Banerjee and Pedersen, 2002; Navigli and Velardi, 2005; Agirre and Soroa, 2009; Fernandez-Ordonez et al., 2012; Chen et al., 2014; Camacho-Collados et al., 2015; Velardi et al., 2013; Flati et al., 2014; Espinosa-Anke et al., 2016; Richardson et al., 1998; Delli Bovi et al., 2015; Franco-Salvador et al., 2016; Hill et al., 2015

Definitions and glosses are everywhere!

○ WordNet/Open Multilingual WordNet: 150k definitions in 5 languages

○ Wiktionary: 285k definitions in 1 language

○ Wikidata: 8M definitions in 255 languages

○ OmegaWiki: 118k definitions in 89 languages

○ Wikipedia: >30M definitions in 264 languages

Disambiguating glosses on a large scale

Our goal:

✓ Construct a large-scale, multilingual repository of glosses and definitions with sense annotations

BabelNet (babelnet.org):

○ The largest multilingual encyclopedic dictionary and semantic network

○ Merger of 13 different knowledge resources

○ >35M definitions in >250 languages!

How?

Problem:

○ Disambiguating definitions is hard!

“Interchanging the positions of the king and a rook.” (definition of “castling” in chess, WordNet)

Multilingual WSD/EL based on BabelNet (Moro et al., 2014)

Short and concise, not enough context

Intuition:

○ Use various definitions of the same concept or entity at the same time and in multiple languages

Step 1: Context-rich Disambiguation

○ Multilingual preprocessing pipeline:

● Tokenization: from the Polyglot project (165 languages)

● Part-of-speech tagging: Stanford parser trained on Universal Dependencies (30 languages)

Today at LREC, Session O19!

○ Context enrichment:

● Given a definiendum, collect all its definitions in every available language and resource and bring them together into a single, heterogeneous multilingual text!
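The context-enrichment step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Gloss` record, the `concept_id` key and the `build_enriched_contexts` helper are hypothetical names, and only the WordNet attribution of the first gloss comes from the slides.

```python
from collections import defaultdict
from typing import NamedTuple


class Gloss(NamedTuple):
    concept_id: str  # hypothetical identifier of the definiendum
    language: str
    resource: str    # "unspecified" where the slides do not name the source
    text: str


def build_enriched_contexts(glosses):
    """Group glosses by concept and join each group into one multilingual text."""
    by_concept = defaultdict(list)
    for gloss in glosses:
        by_concept[gloss.concept_id].append(gloss)
    # One heterogeneous text per definiendum, mixing languages and resources.
    return {
        concept_id: "\n".join(g.text for g in group)
        for concept_id, group in by_concept.items()
    }


glosses = [
    Gloss("castling", "EN", "WordNet",
          "Interchanging the positions of the king and a rook."),
    Gloss("castling", "FR", "unspecified", "Manœuvre du jeu d'échecs"),
    Gloss("castling", "NO", "unspecified", "Rokade er et spesialtrekk i sjakk."),
]

contexts = build_enriched_contexts(glosses)
print(contexts["castling"])
```

The joined text is what a disambiguation system would then receive as a single input, so that each short definition benefits from the context supplied by the others.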

○ Babelfy (Moro et al., 2014), babelfy.org:

● Unified graph-based approach to multilingual Word Sense Disambiguation and Entity Linking

● Designed to handle multilingual text (“language-agnostic” setting)

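Our running example: castling

○ English: Interchanging the positions of the king and a rook.

○ English: Castling is a move in the game of chess involving a player’s king and either of the player's original rooks.

○ English: A move in which the king moves two squares towards a rook, and the rook moves to the other side of the king.

○ French: Manœuvre du jeu d'échecs (A manoeuvre in the game of chess)

○ German: Spielzug im Schach, bei dem König und Turm einer Farbe bewegt werden (A move in chess in which the king and rook of one colour are moved)

○ Spanish: El enroque es un movimiento especial en el juego de ajedrez que involucra al rey y a una de las torres del jugador. (Castling is a special move in the game of chess that involves the king and one of the player's rooks.)

○ Czech: Rošáda je zvláštní tah v šachu, při kterém táhne zároveň král a věž. (Castling is a special move in chess in which the king and a rook move at the same time.)

○ Turkish: Rok İngilizce'de kaleye rook denmektedir. (Castling: in English the castle piece is called "rook".)

○ Norwegian: Rokade er et spesialtrekk i sjakk. (Castling is a special move in chess.)

○ Greek: Το ροκέ είναι μια ειδική κίνηση στο σκάκι που συμμετέχουν ο βασιλιάς και ένας από τους δυο πύργους. (Castling is a special move in chess in which the king and one of the two rooks take part.)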

Step 2: Disambiguation Refinement

NASARI_embed: Latent semantic representations of BabelNet synsets and Wikipedia pages as 300-dimensional vectors.

(Camacho-Collados, Pilehvar and Navigli, ACL 2015)

- Goal: Re-disambiguate low-confidence annotations from the first step.

- How: We obtain the centroid of the NASARI vectors of the high-confidence annotations and compute its cosine similarity with the NASARI vectors of all the candidate synsets.

lcl.uniroma1.it/nasari/
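A minimal sketch of the refinement idea, under stated assumptions: the vectors below are toy 3-dimensional stand-ins for the 300-dimensional NASARI vectors, the synset labels are made up for illustration, and the `min_similarity` cut-off (with its value of 0.5) is an illustrative parameter rather than a documented setting. The function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def refine(high_confidence_vectors, candidates, min_similarity):
    """Pick the candidate synset closest to the centroid, or None.

    candidates maps a candidate synset label to its vector.
    """
    if not high_confidence_vectors or not candidates:
        return None
    c = centroid(high_confidence_vectors)
    best_id, best_sim = max(
        ((synset, cosine(c, vec)) for synset, vec in candidates.items()),
        key=lambda pair: pair[1],
    )
    return (best_id, best_sim) if best_sim >= min_similarity else None


# Toy vectors of annotations already disambiguated with high confidence.
high_confidence = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
# Toy candidate senses for a low-confidence mention such as "king".
candidates = {
    "king (chess piece)": [1.0, 0.1, 0.0],
    "king (monarch)": [0.0, 0.2, 1.0],
}

result = refine(high_confidence, candidates, min_similarity=0.5)
print(result)
```

With these toy numbers the chess sense lies almost on the centroid, so it is selected; a mention whose best candidate falls below the cut-off yields `None` and would be left unannotated.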

[Figure: NASARI vectors v_rook-chess and v_chess of annotations in the definitions of the running example, and the centroid c_castling, shown for the definition “Interchanging the positions of the king and a rook.”]

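[Figure: candidate annotations in the running example labelled with similarity scores θ = 0.86 and θ < 0.5; one is marked ✗.]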

Evaluation

● Extrinsic Evaluation:

○ Open Information Extraction (DefIE)

○ Sense Clustering (NASARI)

● Manual Intrinsic Evaluation:

○ 3 languages (EN, IT, ES)

○ Sample of 100 definitions each

Extrinsic Evaluation I: Open Information Extraction

DefIE

Large-Scale Information Extraction from Textual Definitions through Deep Syntactic and Semantic Analysis (Delli Bovi et al., TACL 2015)

lcl.uniroma1.it/defie/

- DefIE uses disambiguated definitions as input. We simply plugged in our disambiguated definitions as input and left its whole pipeline unchanged.

- This leads to improvements according to both manual and automatic evaluation.

Evaluation on a sample of 150 definitions

[Figure: number of extractions and precision]

Extrinsic Evaluation II: Construction of NASARI+

NASARI semantic representation construction pipeline (ACL 2015)

We simply enrich the BabelNet taxonomy with the high-precision disambiguated glosses. The whole pipeline remains unchanged.

Extrinsic Evaluation II: Wikipedia Sense Clustering

[Figure: accuracy]

Intrinsic Evaluation

Manual evaluation on a sample of 300 definitions

[Figure: precision of the three different disambiguation strategies]

Coverage of the high-precision version of the corpus: ~65% for all PoS and ~75% for nouns

Overview of the release

● Two different versions of the corpus:

○ Complete version before refinement (Step 1)

○ High-Precision version after refinement (Step 2)

● Formatted in an easy-to-process XML, divided by language and resource

Statistics - #Sense annotations

● Before refinement (Step 1): 249,544,708 annotations

● After refinement (Step 2): 163,029,131 annotations
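As a quick check on what refinement costs in volume, the two totals can be compared directly. The snippet below only does arithmetic on the figures reported above; the variable names are illustrative.

```python
before_refinement = 249_544_708  # annotations after Step 1
after_refinement = 163_029_131   # annotations after Step 2

removed = before_refinement - after_refinement
retained_share = after_refinement / before_refinement

print(f"removed:  {removed:,}")
print(f"retained: {retained_share:.1%}")
```

Roughly 86.5 million annotations are dropped and about 65% are retained. That is close to the ~65% coverage quoted for the high-precision version in the intrinsic evaluation, although the slides do not state whether coverage is computed this way.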

Statistics - #Sense annotations per language

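[Figure: sense annotations per language. Values shown: ~58.8M, ~37.9M, ~8.3M, ~10.6M, ~8.4M, ~3.4M, ~14.1M, ~18.2M, ~13.4M, ~5.2M; the language labels were not preserved.]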

Conclusion

A large-scale multilingual corpus of disambiguated glosses:

● 250 million sense annotations for both concepts and named entities

● In total, over 35 million definitions have been disambiguated

● 256 languages

● Both versions of the corpus freely available online

PLAY WITH ME!

http://lcl.uniroma1.it/disambiguated-glosses/

Thank you!
