
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Linking Entities in Tweets

Date post: 21-Jul-2015
Upload: pierpaolo-basile
Page 1

UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Linking Entities in Tweets

Pierpaolo Basile, Annalina Caputo, Giovanni Semeraro, Fedelucio Narducci

{fedelucio.narducci, pierpaolo.basile}@uniba.it

#Microposts2015, NEEL Challenge, Florence 18th May 2015

Page 2

The Challenge

Just watched Frozen for the first time ever and knew the words to all the songs... How?! #productplacement

Problem: Find and link entities in tweets

Entity type: Product

Page 3

Our Approach

• Entity Recognition
  • using PoS-tag
  • relying on n-grams

• Disambiguation
  • knowledge-based method that combines a Distributional Semantic Model (DSM) with the prior probability assigned to each DBpedia concept

• Type
  • manual map from all types defined in the dbpedia-owl ontology to the respective types in the task
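
The manual type map can be pictured as a plain lookup table. The following is a minimal sketch only: the class/type pairs, the `Thing` fallback, and the function name `task_type` are illustrative assumptions, not the authors' actual map.

```python
# Hypothetical excerpt of a manual map from dbpedia-owl classes to task types.
# The pairs below are illustrative assumptions, not the authors' actual table.
DBPEDIA_TO_TASK_TYPE = {
    "dbpedia-owl:Film": "Product",
    "dbpedia-owl:Person": "Person",
    "dbpedia-owl:Company": "Organization",
    "dbpedia-owl:City": "Location",
}


def task_type(dbpedia_types, default="Thing"):
    """Return the task type of the first mapped dbpedia-owl class, else the fallback."""
    for dbpedia_type in dbpedia_types:
        if dbpedia_type in DBPEDIA_TO_TASK_TYPE:
            return DBPEDIA_TO_TASK_TYPE[dbpedia_type]
    return default
```

With this sketch, a concept typed as `dbpedia-owl:Film` resolves to `Product`, in line with the Frozen example on the challenge slide.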

Page 4

Entity Recognition: Indexing

Surface form → DBpedia concept
Frozen → <dbpedia.org/resource/Frozen_(Madonna_song)>
Frozen → <dbpedia.org/resource/Frozen_(2013_film)>
Apple → <dbpedia.org/resource/Apple_Inc.>
Apple Inc. → <dbpedia.org/resource/Apple_Inc.>
Barack Obama → <http://dbpedia.org/resource/Barack_Obama>

Sources: DBpedia titles file and DBpedia NLP resources http://wifo5-04.informatik.uni-mannheim.de/downloads/datasets/

Diagram labels: Indexing; Search Score; Levenshtein Distance; Jaccard Index
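
The surface-form index on this slide can be sketched as a mapping from each surface form to the set of concepts it may denote. The slides do not say which index engine is used, so the in-memory dictionary, the lowercasing of keys, and the name `build_index` are assumptions made for illustration.

```python
from collections import defaultdict

# (surface form, DBpedia concept) pairs taken from the slide.
PAIRS = [
    ("Frozen", "dbpedia.org/resource/Frozen_(Madonna_song)"),
    ("Frozen", "dbpedia.org/resource/Frozen_(2013_film)"),
    ("Apple", "dbpedia.org/resource/Apple_Inc."),
    ("Apple Inc.", "dbpedia.org/resource/Apple_Inc."),
    ("Barack Obama", "http://dbpedia.org/resource/Barack_Obama"),
]


def build_index(pairs):
    """Map each lowercased surface form to the set of concepts it can refer to."""
    index = defaultdict(set)
    for surface_form, concept in pairs:
        # Lowercasing is an assumption; the slides do not describe normalization.
        index[surface_form.lower()].add(concept)
    return index


index = build_index(PAIRS)
```

Here `index["frozen"]` holds both the Madonna song and the 2013 film, which is exactly the ambiguity the disambiguation step has to resolve.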

Page 5

Entity Recognition…

Input: Tweet
Steps: Tokenization and Normalization; PoS-tagger; N-grams generation
Output: Candidate list of surface forms
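
The n-grams generation step can be sketched as follows. Whitespace splitting stands in for the real tokenizer, and the maximum n-gram length `max_n` is an assumed parameter that the slides do not specify.

```python
def ngrams(tokens, max_n=3):
    """Return all contiguous n-grams of 1..max_n tokens, joined by spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


# Whitespace tokenization of a lowercased tweet fragment (a simplification).
tokens = "just watched frozen for the first time".split()
candidates = ngrams(tokens, max_n=2)
```

Each generated n-gram is a candidate surface form that is later looked up in the index.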

Page 6

…Entity Recognition

Input: Candidate list of surface forms
Step: Search and Filtering (Search Score, Levenshtein Distance, Jaccard Index)
Output: Candidate entities and list of possible concepts
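
Two of the three filtering signals, Levenshtein distance and the Jaccard index, can be computed without a search engine. The sketch below shows both; the way they are combined in `string_score` (an equal-weight average of a normalized Levenshtein similarity and a word-level Jaccard index) is purely an assumption, since the slides do not state how the signals are merged, and the search score is left out because it depends on the index engine.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def jaccard(a, b):
    """Jaccard index over the sets of lowercased words of a and b."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def string_score(surface_form, title):
    """Assumed combination: mean of normalized edit similarity and Jaccard index."""
    a, b = surface_form.lower(), title.lower()
    lev_sim = 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)
    return 0.5 * lev_sim + 0.5 * jaccard(a, b)
```

A candidate would then be kept when its score passes some threshold; the threshold value is not given in the slides.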

Page 7

Disambiguation

3-step approach:
1. Building the glosses
2. Building the context
3. Semantic Ranking

Page 8

Disambiguation: Building the glosses

"Frozen" is a song by American singer-songwriter Madonna…

Frozen is a 2013 American 3D computer-animated musical…

DBpedia extended abstracts

Page 9

Disambiguation: Building the context

Just watched Frozen for the first time ever and knew the words to all the songs... How?! #productplacement

Context: <just, watched, first, time, knew, words, all, songs, how, product, placement>
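
One way to obtain this context from the tweet is sketched below. The slides do not say how stop words are chosen or how hashtags are split, so the stop-word list, the tiny segmentation vocabulary, and the helper names are assumptions picked so that the example reproduces the context shown on the slide.

```python
import re

# Assumed stop words and vocabulary, chosen only to reproduce the slide's example.
STOP_WORDS = {"for", "the", "ever", "and", "to"}
VOCABULARY = {"product", "placement"}


def split_hashtag(tag, vocabulary):
    """Greedy longest-match segmentation; returns [tag] if no segmentation is found."""
    words, rest = [], tag
    while rest:
        for end in range(len(rest), 0, -1):
            if rest[:end] in vocabulary:
                words.append(rest[:end])
                rest = rest[end:]
                break
        else:
            return [tag]
    return words


def build_context(tweet, mention):
    """Lowercase, drop the target mention and stop words, and split hashtags."""
    context = []
    for token in re.findall(r"#?\w+", tweet.lower()):
        if token.startswith("#"):
            context.extend(split_hashtag(token[1:], VOCABULARY))
        elif token != mention.lower() and token not in STOP_WORDS:
            context.append(token)
    return context


TWEET = ("Just watched Frozen for the first time ever and knew the words "
         "to all the songs... How?! #productplacement")
context = build_context(TWEET, "Frozen")
```

Under these assumptions, `context` equals the word list on the slide, with the mention "Frozen" excluded because it is the entity being disambiguated.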

Page 10

Disambiguation: Semantic Ranking 1/3

• Words as points in a mathematical space
• Close words are similar
• Word space is built by analyzing word co-occurrences in a large corpus
• Vector composition using superposition (+)
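
Superposition is simply the element-wise sum of word vectors. A minimal sketch, assuming a toy three-dimensional word space with made-up values (the real model uses word2vec vectors):

```python
# Toy word space with invented 3-dimensional vectors, for illustration only.
WORD_SPACE = {
    "watched": [1.0, 0.0, 2.0],
    "songs": [0.0, 1.0, 1.0],
}


def superpose(vectors):
    """Compose word vectors into one vector by element-wise sum (superposition)."""
    return [sum(components) for components in zip(*vectors)]


context_vector = superpose([WORD_SPACE[w] for w in ("watched", "songs")])
```

The same composition can be applied to the words of a gloss, which yields one vector per candidate concept.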

Page 11

Disambiguation: Semantic Ranking 2/3

word2vec: https://code.google.com/p/word2vec/

Distributional Semantic Model built on Wikipedia

• Cosine similarity between the gloss and the context
• Linear combination with a function that takes into account the usage of concepts in Wikipedia
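
The ranking score can be sketched as a weighted sum of the gloss/context cosine similarity and a usage-based prior. The weight `alpha` and the function name `rank_score` are assumptions; the slides do not give the coefficients of the linear combination.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_score(gloss_vector, context_vector, prior, alpha=0.5):
    """Assumed linear combination: alpha * similarity + (1 - alpha) * prior."""
    return alpha * cosine(gloss_vector, context_vector) + (1 - alpha) * prior
```

The candidate concept with the highest score would be selected for the mention.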

Page 12

Disambiguation: Semantic Ranking 3/3

Statistics about the usage of concepts in Wikipedia

$$p(c_{ij} \mid e_i) = \frac{t(e_i, c_{ij}) + 1}{\#e_i + |C_i|}$$

Concept probability given the entity

Page 13

Disambiguation: Semantic Ranking 3/3

Statistics about the usage of concepts in Wikipedia

$$p(c_{ij} \mid e_i) = \frac{t(e_i, c_{ij}) + 1}{\#e_i + |C_i|}$$

• $t(e_i, c_{ij})$: number of times $e_i$ is linked as $c_{ij}$
• $|C_i|$: number of concepts assigned to $e_i$
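
The smoothed concept probability can be computed directly from link counts. In the sketch below the counts are invented, and $\#e_i$ is assumed to be the total number of times $e_i$ appears as a link, which the slides do not define explicitly.

```python
def concept_prior(link_count, entity_count, num_concepts):
    """Add-one smoothed p(c_ij | e_i) = (t(e_i, c_ij) + 1) / (#e_i + |C_i|)."""
    return (link_count + 1) / (entity_count + num_concepts)


# Invented counts for the surface form "Frozen", for illustration only.
link_counts = {"Frozen_(2013_film)": 70, "Frozen_(Madonna_song)": 30}
entity_count = sum(link_counts.values())  # assumed meaning of #e_i
priors = {
    concept: concept_prior(count, entity_count, len(link_counts))
    for concept, count in link_counts.items()
}
```

Under this reading of $\#e_i$ the priors of all concepts of an entity sum to 1, and a concept that was never linked still receives a small non-zero probability.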

Page 14

Evaluation

• Development set
  • 500 manually annotated tweets

• Metrics
  • SLM: Strong Link Match
  • STMM: Strong Typed Mention Match
  • MC: Mention CEAF

• System setup
  • TweetNLP for tokenization and PoS-tagging
  • word2vec for DSM building: 400 vector dimensions, analyzing only terms that occur at least 25 times
  • Developed in Java

Page 15

Results

• Low performance in entity recognition
• Good results in disambiguation: F=0.825 considering correct recognition and no-NIL instances

Entity Recognition   F-SLM   F-STMM   F-MC
PoS-tag              0.362   0.267    0.389
N-grams              0.258   0.191    0.306
