
Word Sense Disambiguation

Date post: 26-Feb-2016
Page 1: Word Sense Disambiguation

Word Sense Disambiguation

MAS.S60

Catherine Havasi

Rob Speer

Page 2: Word Sense Disambiguation

Banks?
• The edge of a river
  – “I fished on the bank of the Mississippi.”
• A financial institution
  – “Bank of America failed to return my call.”
• The building that houses the financial institution
  – “The bank burned down last Thursday.”
• A “biological repository”
  – “I gave blood at the blood bank.”

Page 3: Word Sense Disambiguation

Word Sense Disambiguation
• Most NLP tasks need WSD
• “Played a lot of pool last night… my bank shot is improving!”
• Usually keying to WordNet

“I hit the ball with the bat.”

Page 4: Word Sense Disambiguation

Types
• “All words”
  – Guess the WN synset
• Lexical Subset
  – A small number of pre-defined words
• Coarse Word Sense
  – All words, but more intuitive senses

Page 5: Word Sense Disambiguation

Types
• “All words”
  – Guess the WN synset
• Lexical Subset
  – A small number of pre-defined words
• Coarse Word Sense
  – All words, but more intuitive senses

Inter-annotator agreement (IAA) is 75-80% for the all-words task with WordNet, and 90% for simple binary tasks

Page 6: Word Sense Disambiguation

What is a Coarse Word Sense?

• How many word senses does the word “bag” have in WordNet?

Page 7: Word Sense Disambiguation

What is a Coarse Word Sense?

• How many word senses does the word “bag” have in WordNet?
  – 9 noun senses, 5 verb senses
• Coarse WSD: 6 nouns, 2 verbs
• A Coarse WordNet: 6,000 words (Navigli and Litkowski 2006)
• These distinctions are hard even for humans (Snyder and Palmer 2004)
  – Fine Grained IAA: 72.5%
  – Coarse Grained IAA: 86.4%
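The gap between fine-grained and coarse-grained IAA can be illustrated with a small sketch. It computes plain observed agreement (the fraction of items two annotators label identically) before and after collapsing fine senses into a coarse cluster. The sense ids, the `COARSE` mapping, and the two annotation lists are invented for illustration; they are not WordNet identifiers or data from Snyder and Palmer.

```python
def observed_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("annotators must label the same items")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical fine-to-coarse mapping: four "container" senses share one cluster.
COARSE = {
    "bag#container": "bag#carrier",
    "bag#handbag": "bag#carrier",
    "bag#bagful": "bag#carrier",
    "bag#suitcase": "bag#carrier",
    "bag#base": "bag#base",
}

# Hypothetical annotations of five occurrences of "bag".
fine_a = ["bag#container", "bag#handbag", "bag#suitcase", "bag#base", "bag#handbag"]
fine_b = ["bag#handbag", "bag#handbag", "bag#container", "bag#base", "bag#bagful"]

coarse_a = [COARSE[s] for s in fine_a]
coarse_b = [COARSE[s] for s in fine_b]

fine_iaa = observed_agreement(fine_a, fine_b)
coarse_iaa = observed_agreement(coarse_a, coarse_b)
print(f"fine-grained agreement:   {fine_iaa:.2f}")
print(f"coarse-grained agreement: {coarse_iaa:.2f}")
```

Disagreements that fall inside a single coarse cluster disappear after the mapping, which is why coarse agreement can only stay the same or go up.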

Page 8: Word Sense Disambiguation

“Bag”: Noun
• 1. A coarse sense containing:
  – bag (a flexible container with a single opening)
  – bag, handbag, pocketbook, purse (a container used for carrying money and small personal items or accessories)
  – bag, bagful (the quantity that a bag will hold)
  – bag, traveling bag, travelling bag, grip, suitcase (a portable rectangular container for carrying clothes)
• 2. bag (the quantity of game taken in a particular period)
• 3. base, bag (a place that the runner must touch before scoring)
• 4. bag, old bag (an ugly or ill-tempered woman)
• 5. udder, bag (mammary gland of bovids (cows and sheep and goats))
• 6. cup of tea, bag, dish (an activity that you like or at which you are superior)

Page 9: Word Sense Disambiguation

Frequent Ingredients
• Open Mind Word Expert
• WordNet
• eXtended WordNet (XWN)
• SemCor 3.0 (“brown1” and “brown2”)
• ConceptNet

Page 10: Word Sense Disambiguation

SemCor

Page 11: Word Sense Disambiguation

No training set, no problem
• Julia Hockenmaier’s “Pseudoword” evaluation
• Pick two random words
  – Say, “banana” and “door”
• Combine them together
  – “BananaDoor”
• Replace all instances of either in your corpora with your new pseudoword
• Evaluate
• A bit easier…
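The pseudoword steps above can be sketched as follows. This is a minimal illustration, not code from the course: the function name `make_pseudoword_corpus` and the example sentences are made up here. Each sentence containing either word is rewritten with the pseudoword, and the original word is kept as the gold label that a disambiguator has to recover.

```python
import re


def make_pseudoword_corpus(sentences, word_a, word_b):
    """Replace word_a / word_b with a merged pseudoword and keep gold labels.

    Returns (pseudoword, examples), where each example is
    (masked_sentence, [original words, in order of occurrence]).
    Sentences containing neither word are skipped.
    """
    pseudo = word_a.capitalize() + word_b.capitalize()
    pattern = re.compile(
        r"\b(%s|%s)\b" % (re.escape(word_a), re.escape(word_b)),
        re.IGNORECASE,
    )
    examples = []
    for sentence in sentences:
        golds = [m.group(1).lower() for m in pattern.finditer(sentence)]
        if golds:
            examples.append((pattern.sub(pseudo, sentence), golds))
    return pseudo, examples


sentences = [
    "I ate a banana for lunch.",
    "Please close the door.",
    "The cat slept.",
]
pseudo, examples = make_pseudoword_corpus(sentences, "banana", "door")
for masked, golds in examples:
    print(masked, "->", golds)
```

Because the gold labels come for free, any amount of unlabeled text can be turned into evaluation data this way.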

Page 12: Word Sense Disambiguation

The “Flip-flop” Method
• Brown et al., 1991
• Find a single feature or set of features which disambiguates the words (think of the named entity recognizer)

Page 13: Word Sense Disambiguation

An Example

Page 14: Word Sense Disambiguation

Standard Techniques
• Naïve Bayes (notice a trend)
  – Bag of words
  – Priors are based on word frequencies
• Unsupervised clustering techniques
  – Expectation Maximization (EM)
  – Yarowsky
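Written out, the bag-of-words Naïve Bayes rule picks the sense of the ambiguous word that maximizes the prior times the likelihood of the surrounding context words. The following is the standard textbook form, not a formula from the slides; the symbols S (candidate senses), C (context words), and count(·) (training-set frequency) are notation introduced here.

```latex
\hat{s} = \arg\max_{s \in S} \Big[ \log P(s) + \sum_{w \in C} \log P(w \mid s) \Big],
\qquad
P(s) \approx \frac{\mathrm{count}(s)}{\sum_{s' \in S} \mathrm{count}(s')}
```

The sum over context words is where the bag-of-words assumption enters: word order is ignored and each word contributes independently given the sense.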

Page 15: Word Sense Disambiguation

Yarowsky

(slides from Julia Hockenmaier)

Page 16: Word Sense Disambiguation

Training Yarowsky

Page 17: Word Sense Disambiguation

Using OMCS
• Created a blend using a large number of resources
• Created an ad hoc category for a word and its surroundings in the sentence
• Find which word sense is most similar to the category
• Keep the system machinery as general as possible

Page 18: Word Sense Disambiguation

Adding Associations
• ConceptNet was included in two forms:
  – Concept vs. feature matrices
  – Concept-to-concept associations
• Associations help to represent topic areas
• If the document mentions computer-related words, expect more computer-related word senses

Page 19: Word Sense Disambiguation

Constructing the Blend

Page 20: Word Sense Disambiguation

Calculating the Right Sense

“I put my money in the bank”
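One way to picture this step is a toy sketch: build an ad hoc category by summing vectors for the context words, then pick the sense whose vector is most similar by cosine similarity. Everything below is hypothetical. The three-dimensional vectors are hand-made stand-ins for the blended representation the slides describe, and `ad_hoc_category` and `best_sense` are names invented here.

```python
import math

# Toy, hand-made vectors; dimensions are roughly [finance, water, sports].
WORD_VECTORS = {
    "money": [0.9, 0.0, 0.1],
    "put": [0.3, 0.1, 0.2],
    "fished": [0.0, 0.9, 0.2],
    "river": [0.0, 1.0, 0.0],
}
SENSE_VECTORS = {
    "bank (financial institution)": [1.0, 0.0, 0.0],
    "bank (edge of a river)": [0.0, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ad_hoc_category(context_words, dims=3):
    """Sum the vectors of all context words that have a known vector."""
    total = [0.0] * dims
    for word in context_words:
        for i, value in enumerate(WORD_VECTORS.get(word, [0.0] * dims)):
            total[i] += value
    return total


def best_sense(context_words):
    """Return the sense whose vector is closest to the ad hoc category."""
    category = ad_hoc_category(context_words)
    return max(SENSE_VECTORS, key=lambda s: cosine(category, SENSE_VECTORS[s]))


print(best_sense("i put my money in the".split()))
print(best_sense("i fished on the of the river".split()))
```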

Page 21: Word Sense Disambiguation

SemEval Task 7
• 14 different systems were submitted in 2007
• Baseline: Most frequent sense
• Spoiler!: Our system would have placed 4th
• Top three systems:
  – NUS-PT: parallel corpora with SVM (Chan et al., 2007)
  – NUS-ML: Bayesian LDA with specialized features (Cai et al., 2007)
  – LCC-WSD: multiple methods approach with end-to-end system and corpora (Novischi et al., 2007)
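The most-frequent-sense baseline is simple enough to sketch in a few lines: for each word, always predict the sense seen most often in sense-tagged training data. The tiny training and test lists and the sense labels below are made up for illustration; a real baseline would count senses in a corpus such as SemCor.

```python
from collections import Counter, defaultdict


def train_mfs(tagged_tokens):
    """Map each word to its most frequent sense in (word, sense) pairs."""
    counts = defaultdict(Counter)
    for word, sense in tagged_tokens:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def accuracy(model, tagged_tokens):
    """Fraction of test tokens whose gold sense equals the predicted sense."""
    correct = sum(model.get(word) == sense for word, sense in tagged_tokens)
    return correct / len(tagged_tokens)


# Hypothetical sense-tagged data.
train = [
    ("bank", "finance"), ("bank", "finance"), ("bank", "river"),
    ("bat", "animal"), ("bat", "club"), ("bat", "club"),
]
test = [("bank", "river"), ("bank", "finance"), ("bat", "club"), ("bat", "club")]

mfs = train_mfs(train)
print(mfs)
print(accuracy(mfs, test))
```

The baseline ignores context entirely, so it gets every minority-sense occurrence wrong, yet skewed sense distributions make it hard to beat.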

Page 22: Word Sense Disambiguation

Results

Page 23: Word Sense Disambiguation

Parallel Corpora
• IMVHO the “right” way to do it.
• Different words have different senses in different languages
• Use parallel corpora to find those instances
  – Like European Parliament or UN proceedings
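The idea can be sketched with toy data: if an ambiguous English word is translated by different words in another language, the translation acts as a free sense label. The sketch assumes word alignment has already been done, so each example arrives as an English sentence plus the translation aligned to the target word. The sentences and the function name `group_by_translation` are invented for illustration, and French is used only as an example language.

```python
import re
from collections import defaultdict


def group_by_translation(aligned_examples, target):
    """Group sentences containing `target` by the word it was translated as.

    aligned_examples: iterable of (english_sentence, aligned_translation).
    """
    groups = defaultdict(list)
    for sentence, translation in aligned_examples:
        if target in re.findall(r"[a-z]+", sentence.lower()):
            groups[translation].append(sentence)
    return dict(groups)


# Hypothetical, pre-aligned examples (English sentence, French word for "bank").
examples = [
    ("I put my money in the bank.", "banque"),
    ("We fished on the bank of the river.", "rive"),
    ("The bank raised its fees.", "banque"),
]

groups = group_by_translation(examples, "bank")
for translation, sentences in groups.items():
    print(translation, "->", sentences)
```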

Page 24: Word Sense Disambiguation

English and Romanian

Page 25: Word Sense Disambiguation

Gold standards are overrated

• Rada Mihalcea, 2007: “Using Wikipedia for Automatic Word Sense Disambiguation”

Page 26: Word Sense Disambiguation

Lab: making a simple supervised WSD classifier

• Big thanks to some guy with a blog (Jim Plush)
• Training data: Wikipedia articles surrounding “Apple” (the fruit) and “Apple Inc.”
• Test data: hand-classified tweets about apples and Apple products
• Use familiar features + Naïve Bayes to get > 90% accuracy
• Optional: use it with tweetstream to show only tweets about apples (the fruit)
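A bare-bones version of such a classifier might look like the sketch below. It is not the lab's code: the class name `NaiveBayesWSD`, the four training sentences, and the test phrases are stand-ins for the Wikipedia articles and tweets, and the toy data says nothing about the accuracy figure above. It uses bag-of-words features, add-one smoothing, and log probabilities.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesWSD:
    """Multinomial Naive Bayes over bag-of-words features (add-one smoothing)."""

    def fit(self, documents, labels):
        self.label_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.label_counts}
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        # Words never seen in training carry no evidence, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical miniature training set.
docs = [
    "The apple is a sweet fruit that grows on a tree",
    "Apple pie is baked with fruit sugar and cinnamon",
    "Apple Inc designs the iPhone and the Mac computer",
    "The company sells phones computers and software",
]
labels = ["fruit", "fruit", "company", "company"]

clf = NaiveBayesWSD().fit(docs, labels)
print(clf.predict("I baked an apple pie with cinnamon"))
print(clf.predict("my new Apple iPhone and Mac"))
```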

Page 27: Word Sense Disambiguation

Slide Thanks
• James Pustejovsky, Gerard Bakx, Julia Hockenmaier
• Manning and Schütze

