Page 1

USING WORDNET TO RETRIEVE WORDS FROM THEIR MEANINGS

İlknur Durgar El-Kahlout and Kemal Oflazer

Sabancı University, İstanbul, Turkey

Page 2

Problem

For a given definition, find the appropriate word (or words)
  A traditional dictionary is of no use
  From a dictionary, find an appropriate word that has a “similar” definition

Page 3

Examples

User definition:
  Akımı ölçmek için kullanılan alet
  (A device that is used to measure the current)

In the dictionary:
  akımölçer: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre
  (ammeter: a device that measures the intensity of electrical current, amperemeter)

Page 4

Applications

Computer-assisted language learning
Solving crossword puzzles
Reverse dictionary

Page 5

Outline

Problem statement
Meaning-to-Word System (MTW)
Our Approach
Methods
Results
Result Summary
Conclusion

Page 6

Problem Statement

Find the “similarity” between two definitions:

  Akımı ölçmek için kullanılan alet
  (A device that is used to measure the current)

  Elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre
  (a device that measures the intensity of electrical current, amperemeter)

Page 7

Meaning-to-Word (MTW)

Addresses the problem of finding the appropriate word (or words) whose meaning “matches” the given definition

Two subproblems:
  Finding words whose definitions are "similar" to the query in some sense
  Ranking the candidate words in a variety of ways

Page 8

Information Flow in MTW

[Diagram: User Definition -> (query) -> Search in Dictionary -> (candidates) -> Rank Candidates -> List of words]

Page 9

Available Resources

Turkish Monolingual Dictionary: about 50,000 entries
Turkish WordNet: about 11,000 synsets

Page 10

[Diagram: MTW information flow (User Definition -> Search in Dictionary -> Rank Candidates -> List of words), with two Normalization steps marked]

Page 11

Normalization

Tokenization
Stemming
Stop Word Elimination
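
The three normalization steps could be sketched roughly as below. This is only an illustration: the slides do not say which stemmer or stop-word list MTW uses, so `STOP_WORDS` and `SUFFIXES` are toy assumptions, and a real system would rely on a Turkish morphological analyzer rather than naive suffix stripping.

```python
import re

# Hypothetical stop-word list; the slides do not give the actual one.
STOP_WORDS = {"için", "daha", "hiç", "ve", "bir"}

# Toy suffix list; a stand-in for a real Turkish morphological analyzer.
SUFFIXES = ["memiş", "mek", "ılan", "ı"]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least two characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Tokenize, stem, then drop stop words (the order listed on the slide)."""
    stems = [stem(token) for token in tokenize(text)]
    return [s for s in stems if s not in STOP_WORDS]


print(normalize("daha önce hiç evlenmemiş kişi"))
```

With these toy lists, the query from the subset-generation slide reduces to the stems önce, evlen, kişi.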

Page 12

[Diagram: MTW information flow (User Definition -> Search in Dictionary -> Rank Candidates -> List of words), with the Query Processing steps marked]

Page 13

Query Processing: Subset Generation

Search with different sets of words
Select informative words from the user’s query

Query: daha önce hiç evlenmemiş kişi (a person who has never been married)

  {önce, evlen, kişi} (before, marry, person)
  {evlen, kişi} (marry, person), {önce, kişi} (before, person), {önce, evlen} (before, marry)
  {evlen} (marry), {önce} (before), {kişi} (person)
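
A minimal sketch of subset generation, assuming the query has already been normalized to a short list of stems. The function name `generate_subsets` is hypothetical. The number of subsets grows as 2^n - 1, so this only makes sense once the query has been reduced to a few informative words.

```python
from itertools import combinations


def generate_subsets(words):
    """Return every non-empty subset of the query words, largest subsets first."""
    subsets = []
    for size in range(len(words), 0, -1):
        subsets.extend(combinations(words, size))
    return subsets


subsets = generate_subsets(["önce", "evlen", "kişi"])
for subset in subsets:
    print(subset)
```

For the three stems on the slide this yields seven subsets: the full set, three pairs, and three single words.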

Page 14

Query Processing: Subset Sorting

An unordered list of subsets is insufficient
Rank the generated subsets:

  1) By the number of words
     {önce, evlen, kişi} (before, marry, person)
     {evlen, kişi} (marry, person)

  2) By the sum of the logarithms of the word frequencies
     {evlen, kişi} (marry, person)
     {önce, kişi} (before, person)
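
The two sorting criteria can be combined into one sort key, as in the sketch below. Two things here are assumptions rather than facts from the slides: the frequency counts in `FREQ` are invented, and subsets with a lower log-frequency sum (rarer, presumably more informative words) are placed first, which is one reading of the example ordering above.

```python
import math

# Hypothetical corpus frequencies; the real counts are not given in the slides.
FREQ = {"önce": 900, "evlen": 40, "kişi": 600}


def sort_subsets(subsets, freq):
    """Sort by size (largest first), then by sum of log frequencies (lowest first)."""
    def key(subset):
        log_sum = sum(math.log(freq[word]) for word in subset)
        return (-len(subset), log_sum)
    return sorted(subsets, key=key)


pairs = [("önce", "kişi"), ("önce", "evlen"), ("evlen", "kişi")]
ranked = sort_subsets(pairs + [("önce", "evlen", "kişi")], FREQ)
print(ranked)
```

With these invented counts the full set comes first and {evlen, kişi} precedes {önce, kişi}, as on the slide.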

Page 15

[Diagram: MTW information flow (User Definition -> Search in Dictionary -> Rank Candidates -> List of words), with the Searching for Meanings step marked]

Page 16

Searching for Meanings

Two methods:
  Stem Matching
  Query Expansion (using WordNet)

Page 17

Stem Matching

Morphological normalization of words
Find meanings that contain morphological variants of the words in the original definition

Page 18

Stem Matching (Ex.)

{ akımı ölçmek için kullanılan alet }
(A device that is used to measure the current)

Candidate stems for each word:
  akımı:      ak (white), akım (current), akı (flux)
  ölçmek:     ölç (measure)
  için:       için (to), iç (drink)
  kullanılan: kullan (use), kul (slave)
  alet:       alet (device)

(On the slide, the matching stems are highlighted in color.)
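
A rough sketch of how stem matching might count matches when every word carries several candidate stems. The query stems are copied from the example above. The stems listed for the dictionary definition of akımölçer are assumed for illustration, and `longest_only` is one possible reading of the "longest stem" heuristic named in the results, not a confirmed implementation.

```python
# Candidate stems per query word, copied from the slide's example.
QUERY_STEMS = [
    {"ak", "akım", "akı"},   # akımı
    {"ölç"},                 # ölçmek
    {"için", "iç"},          # için
    {"kullan", "kul"},       # kullanılan
    {"alet"},                # alet
]

# Assumed stems for the dictionary definition of "akımölçer" (illustrative only).
DEFINITION_STEMS = [
    {"elektrik"}, {"ak", "akım", "akı"}, {"şiddet"},
    {"ölç"}, {"yara"}, {"araç"}, {"ampermetre"},
]


def count_matches(query_stems, definition_stems):
    """Count query words sharing at least one candidate stem with the definition."""
    definition_pool = set().union(*definition_stems)
    return sum(1 for stems in query_stems if stems & definition_pool)


def longest_only(stem_sets):
    """Keep only the longest candidate stem of each word."""
    return [{max(stems, key=len)} for stems in stem_sets]


print(count_matches(QUERY_STEMS, DEFINITION_STEMS))
print(count_matches(longest_only(QUERY_STEMS), longest_only(DEFINITION_STEMS)))
```

Under these assumptions only akımı and ölçmek match the definition; alet and araç share no stem, which is the gap query expansion is meant to close.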

Page 19

Stem Matching

Query:
  akımı ölçmek için kullanılan alet
  (A device that is used to measure the current)

Dictionary definition:
  elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre
  (a device that measures the intensity of electrical current, amperemeter)

Page 21

Stem Matching: Drawbacks

Generates noisy stems
  ilim (science, my city) -> ilim (science), il (city)

Conflates two words with very different meanings to the same stem
  ilim (science, my city), ilde (in the city) -> il (city)

Cannot find relations between similar words
  kimse (someone), kişi (person)
  bölüm (part), kısım (portion)

Page 22

Using Query Expansion

Two different approaches:
  Expand the query with relations (synonyms, specializations, generalizations)
  Expand the query with the relevant answers of the unexpanded query

WordNet synonyms are used in MTW
  {besin, gıda} (food, nourishment)
  {iyileş, düzel} (to get better) / {iyileş, geliş} (to improve)

Page 23

Query Expansion (Ex.)

{ akımı ölçmek için kullanılan alet }
(A device that is used to measure the current)

Candidate stems for each word, with the synonyms added by expansion:
  akımı:      ak (white), akım (current), akı (flux)
              + beyaz (white), debi (flow rate), akış (flow)
  ölçmek:     ölç (measure)
  için:       için (to), iç (drink)
  kullanılan: kullan (use), kul (slave)
              + faydalan (make use of), yararlan (benefit from), köle (slave)
  alet:       alet (device)
              + araç (tool), gereç (equipment)
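
Synonym expansion over candidate stems might look like the sketch below. The synonym table is read off the example above, but assigning both debi and akış to akım is a guess, since the slide layout is flattened. The definition stems are assumed, and `expand` is a hypothetical helper, not MTW's actual code.

```python
# Synonym sets read off the slide's example; the assignment of "debi" and
# "akış" to "akım" is a guess made for illustration.
SYNONYMS = {
    "ak": {"beyaz"},
    "akım": {"debi", "akış"},
    "kullan": {"faydalan", "yararlan"},
    "kul": {"köle"},
    "alet": {"araç", "gereç"},
}


def expand(stem_sets, synonyms):
    """Add the synonyms of every candidate stem to that word's stem set."""
    expanded = []
    for stems in stem_sets:
        extra = set()
        for stem in stems:
            extra |= synonyms.get(stem, set())
        expanded.append(stems | extra)
    return expanded


query = [{"ak", "akım", "akı"}, {"ölç"}, {"için", "iç"}, {"kullan", "kul"}, {"alet"}]
expanded = expand(query, SYNONYMS)

# Assumed stems of the dictionary definition of "akımölçer".
definition = {"elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"}
matched = sum(1 for stems in expanded if stems & definition)
print(matched)
```

After expansion, alet also matches through its synonym araç, so three query words match instead of two.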

Page 24

Query Expansion (Ex.)

Query:
  akımı ölçmek için kullanılan alet
  (A device that is used to measure the current)

Dictionary definition:
  elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre
  (a device that measures the intensity of electrical current, amperemeter)

Page 26

[Diagram: MTW information flow (User Definition -> Search in Dictionary -> Rank Candidates -> List of words), with the Ranking step marked]

Page 27

Ranking

A very important part of MTW
Having the right answer in the retrieved set is not enough
The aim is to have the right answer near the top of the retrieved set (e.g., within the first 50 answers)
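
The results in this deck are reported in rank bands (1-10, 11-50, >50, Not found). A small helper like the one below, with the hypothetical name `rank_bucket`, shows how the position of the correct word in a ranked list maps to those bands.

```python
def rank_bucket(ranked_words, answer):
    """Return the results-table band for the position of the correct answer."""
    if answer not in ranked_words:
        return "Not found"
    rank = ranked_words.index(answer) + 1  # ranks are 1-based
    if rank <= 10:
        return "1-10"
    if rank <= 50:
        return "11-50"
    return ">50"


print(rank_bucket(["ampermetre", "akımölçer", "terazi"], "akımölçer"))
```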

Page 28

Ranking

Simple but effective methods:
  Number of matched words
  Subset informativeness: frequency of the words in the subset
  Ratio of the number of matched words to the number of words in the candidate dictionary definition
  Longest Common Subsequence: order of the matched words
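
Three of the ranking signals listed above can be sketched as follows (subset informativeness is left out). How MTW actually combines the signals is not stated in the slides. Comparing them as a tuple, in the order shown, is an assumption, and the definition stems for the two candidates are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def score(query, definition):
    """Return (matched words, match ratio, LCS length) for one candidate definition."""
    matched = len(set(query) & set(definition))
    ratio = matched / len(definition) if definition else 0.0
    return (matched, ratio, lcs_length(query, definition))


def rank(query, candidates):
    """Sort candidate words by their score tuple, best first."""
    return sorted(candidates, key=lambda word: score(query, candidates[word]), reverse=True)


query = ["akım", "ölç", "kullan", "alet"]
# Hypothetical candidate entries: word -> stems of its dictionary definition.
candidates = {
    "terazi": ["ağırlık", "ölç", "yara", "araç"],
    "akımölçer": ["elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"],
}
print(rank(query, candidates))
```

The LCS term rewards candidates whose matched words appear in the same order as in the query, which the plain match count ignores.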

Page 29

Some Statistics

Training sets:
  50 queries from users
  50 queries from a dictionary

Test sets:
  50 queries from users
  50 queries from a separate dictionary

                       Test set 1 (user)  Training set 1  Test set 2 (dict.)  Training set 2
# of queries           50                 50              50                  50
Avg. # of query words  5.66               4.64            9.24                13.98
Max. # of query words  17                 12              23                  45
Min. # of query words  2                  1               1                   6

Page 30

Stem Matching (all stems included)

Rank       Test set 1  Training set 1  Test set 2  Training set 2
1-10       13 (26%)    18 (36%)        45 (90%)    41 (82%)
11-50      7 (14%)     12 (24%)        2 (4%)      5 (10%)
>50        19 (38%)    10 (20%)        3 (6%)      4 (8%)
Not found  11 (22%)    10 (20%)        0 (0%)      0 (0%)

Low percentage in the top 10 for user queries, but very high results for dictionary queries

Page 31

Stem Matching (longest stem included, heuristics)

Rank       Test set 1  Training set 1  Test set 2  Training set 2
1-10       14 (28%)    21 (42%)        46 (92%)    43 (86%)
11-50      5 (10%)     9 (18%)         1 (2%)      5 (10%)
>50        18 (36%)    9 (18%)         3 (6%)      2 (4%)
Not found  13 (26%)    11 (22%)        0 (0%)      0 (0%)

Improvement in user queries, slightly better performance in dictionary queries

Page 32

Query Expansion (WordNet) (all stems included)

Rank       Test set 1  Training set 1  Test set 2  Training set 2
1-10       14 (28%)    24 (48%)        45 (90%)    41 (82%)
11-50      9 (18%)     9 (18%)         2 (4%)      5 (10%)
>50        18 (36%)    12 (24%)        3 (6%)      4 (8%)
Not found  9 (18%)     5 (10%)         0 (0%)      0 (0%)

Better results in user queries, no change in dictionary queries

Page 33

Query Expansion (WordNet) (longest stem included, heuristics)

Rank       Test set 1  Training set 1  Test set 2  Training set 2
1-10       14 (28%)    24 (48%)        41 (82%)    39 (78%)
11-50      6 (12%)     8 (16%)         5 (10%)     6 (12%)
>50        21 (42%)    13 (26%)        1 (2%)      5 (10%)
Not found  9 (18%)     5 (10%)         0 (0%)      0 (0%)

Better performance than ‘longest stem matching’ in user queries, but worse performance in dictionary queries

Page 34

Result Summary

Stem Matching (longest stem included)
  60% success in real user queries
  96% success in dictionary queries

Query Expansion (all stems included)
  68% success in real user queries
  92% success in dictionary queries

Page 35

Conclusion

We have implemented a ‘Meaning to Word’ system for Turkish
Results on unseen data are rather satisfactory
Query expansion is better
  Although it cannot find the words for all queries
68% of real user queries and 90% of dictionary queries are found in the first 50 results

Page 36

THANK YOU!

