Overview of Peter D. Turney’s Work on Similarity, From 2001-2008
Page 1: Overview of  Peter D. Turney’s Work on Similarity

Overview of Peter D. Turney’s Work on Similarity

From 2001-2008

Page 2: Overview of  Peter D. Turney’s Work on Similarity

Similarity

Attributional similarity (2001 - 2003): the degree to which two words are synonymous; also known as semantic relatedness and semantic association.

Relational similarity (2005 - 2008): the degree to which two relations are analogous.

Page 3: Overview of  Peter D. Turney’s Work on Similarity

Objective evaluation of the approaches by:

Attributional similarity: 80 TOEFL synonym questions

Relational similarity: 374 SAT analogy questions

Page 4: Overview of  Peter D. Turney’s Work on Similarity

2001: Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL

In Proceedings of the 12th European Conference on Machine Learning, pages 491–502, Springer, Berlin, 2001.

Page 5: Overview of  Peter D. Turney’s Work on Similarity

1 Introduction

Synonym recognition: given a word and a set of candidate words, choose the candidate whose meaning is closest to the given word.

Core idea: based on co-occurrence: “a word is characterized by the company it keeps”

Page 6: Overview of  Peter D. Turney’s Work on Similarity

1 Introduction: idea

Given a problem word and a set of candidates {choice1, choice2, …, choicen}, compute score(choicei) for each candidate; the candidate with the highest score is taken as the synonym.

Uses Pointwise Mutual Information (PMI) to analyze statistical data collected by Information Retrieval (IR):

score(choice_i) = log2 [ p(problem & choice_i) / ( p(problem) * p(choice_i) ) ]
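As a minimal sketch (not the paper's code), PMI can be estimated from document counts returned by a search engine; `hits` below is a hypothetical query function and `N` is an assumed total number of indexed documents.

```python
import math

N = 350_000_000  # assumed total number of indexed documents (illustrative only)

def pmi(problem: str, choice: str, hits) -> float:
    """Pointwise mutual information estimated from document counts.

    hits(query) is a hypothetical search-engine interface that returns
    the number of documents matching the query string.
    """
    p_joint = hits(f"{problem} AND {choice}") / N
    p_problem = hits(problem) / N
    p_choice = hits(choice) / N
    if p_joint == 0 or p_problem == 0 or p_choice == 0:
        return float("-inf")  # no evidence of co-occurrence
    return math.log2(p_joint / (p_problem * p_choice))
```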

Page 7: Overview of  Peter D. Turney’s Work on Similarity

2 Formula

Score 1:

score1(choice_i) = hits(problem AND choice_i) / hits(choice_i)

Score 2: NEAR restricts matches to within ten words of each other

score2(choice_i) = hits(problem NEAR choice_i) / hits(choice_i)

Page 8: Overview of  Peter D. Turney’s Work on Similarity

2 Formula

Score 3: avoids antonyms such as big vs. small

score3(choice_i) = hits((problem NEAR choice_i) AND NOT ((problem OR choice_i) NEAR "not")) / hits(choice_i AND NOT (choice_i NEAR "not"))

Score 4: adds context; only one context word is chosen, to keep the sample size large enough

score4(choice_i) = hits((problem NEAR choice_i) AND context AND NOT ((problem OR choice_i) NEAR "not")) / hits(choice_i AND context AND NOT (choice_i NEAR "not"))
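A minimal sketch of the four scores, assuming a hypothetical `hits(query)` function that sends the query to a search engine supporting the AND, OR, NOT, and NEAR operators and returns the hit count.

```python
def score1(problem, choice, hits):
    # plain co-occurrence with AND
    return hits(f"{problem} AND {choice}") / hits(choice)

def score2(problem, choice, hits):
    # NEAR: the two words must occur within ten words of each other
    return hits(f"{problem} NEAR {choice}") / hits(choice)

def score3(problem, choice, hits):
    # exclude contexts containing "not", to avoid scoring antonyms highly
    num = hits(f'({problem} NEAR {choice}) AND NOT (({problem} OR {choice}) NEAR "not")')
    den = hits(f'{choice} AND NOT ({choice} NEAR "not")')
    return num / den

def score4(problem, choice, context, hits):
    # additionally require a context word taken from the question
    num = hits(f'({problem} NEAR {choice}) AND {context} AND NOT (({problem} OR {choice}) NEAR "not")')
    den = hits(f'{choice} AND {context} AND NOT ({choice} NEAR "not")')
    return num / den
```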

Page 9: Overview of  Peter D. Turney’s Work on Similarity

3 Experiments

Compared with LSA (Latent Semantic Analysis):

Initial matrix X built from an encyclopedia: 61,000 x 30,473; document fragments are whole articles; dimensionality reduction: SVD; elements: tf-idf weights; similarity: cosine.

Also compared with the students' TOEFL scores (human baseline).

Page 10: Overview of  Peter D. Turney’s Work on Similarity

Dataset: 80 TOEFL questions

50 ESL test questions

Page 11: Overview of  Peter D. Turney’s Work on Similarity

3 Experiments: PMI-IR vs. LSA

Time efficiency:

PMI-IR: simple program, little time needed; 2 s/query x 8 queries, with almost all of the time spent on network interaction; run in parallel, about 2 s in total.

LSA: time-consuming; compressing the 61,000 x 30,473 matrix to 61,000 x 300 takes roughly three hours on a UNIX workstation.

Page 12: Overview of  Peter D. Turney’s Work on Similarity

3 Experiments

On the 80 TOEFL questions and 50 ESL questions: PMI-IR: 73.75% (59/80) and 74% (37/50); foreign students (human baseline): 64.5% (51.6/80); LSA: 64.4% (51.5/80).

Performance: PMI-IR wins by about 10%. Reasons: the use of NEAR and the smaller chunk size. LSA 64.4%, PMI-IR with AND 62.5%, PMI-IR with NEAR 72.5%.

Page 13: Overview of  Peter D. Turney’s Work on Similarity

4 Conclusion

Combines PMI and IR, using co-occurrence to measure the degree of relatedness between words.

PMI is estimated by sending queries to a search engine, which solves the data sparseness problem.

Page 14: Overview of  Peter D. Turney’s Work on Similarity

2003: Combining independent modules in lexical multiple-choice problems

In RANLP-03, pages 482–489, Borovets, Bulgaria (RANLP: Recent Advances in Natural Language Processing)

Page 15: Overview of  Peter D. Turney’s Work on Similarity

1 Introduction

There are several approaches to natural language problems.

No single approach will be the best for all problem instances.

What if we combine them?

Page 16: Overview of  Peter D. Turney’s Work on Similarity

1 Introduction

Two main contributions:

introduces and evaluates several new modules for answering multiple-choice synonym questions and analogy questions

presents a novel product merging rule (one of 3 merging rules considered) and compares it with the other 2 similar merging rules

Page 17: Overview of  Peter D. Turney’s Work on Similarity

2 Merging rules: the parameters

p^h_{ij} >= 0 is the probability assigned by module i to choice j of instance h; w_i is the weight of module i.

Indices: module i (1 <= i <= n), instance h (1 <= h <= m), choice j (1 <= j <= k).

D^{h,w}_j is the probability assigned by the merging rule to choice j of training instance h when the weights are set to w.

a(h), with 1 <= a(h) <= k, is the correct answer for instance h.

The weights are trained to maximize the probability assigned to the correct answers:

w* = argmax_w prod_{h} D^{h,w}_{a(h)}

Page 18: Overview of  Peter D. Turney’s Work on Similarity

2 Merging rules: old

Mixture rule (very common), normalized:

M^{h,w}_j = sum_{i=1}^{n} w_i p^h_{ij},   D^{h,w}_j = M^{h,w}_j / sum_{j'=1}^{k} M^{h,w}_{j'}

Logarithmic rule:

L^{h,w}_j = exp( sum_{i=1}^{n} w_i ln p^h_{ij} ) = prod_{i=1}^{n} (p^h_{ij})^{w_i},   D^{h,w}_j = L^{h,w}_j / sum_{j'=1}^{k} L^{h,w}_{j'}

Page 19: Overview of  Peter D. Turney’s Work on Similarity

2 Merging rules: novel

Product rule:

P^{h,w}_j = prod_{i=1}^{n} ( w_i p^h_{ij} + (1 - w_i)/k ),   D^{h,w}_j = P^{h,w}_j / sum_{j'=1}^{k} P^{h,w}_{j'}
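A minimal sketch of the three merging rules in Python (not the authors' code); `probs` is a hypothetical n x k array of module probabilities p^h_{ij} for one instance, and `w` is the vector of module weights.

```python
import numpy as np

def mixture_rule(probs: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Weighted arithmetic mean of the module probabilities, normalized."""
    m = w @ probs                     # M_j = sum_i w_i * p_ij
    return m / m.sum()

def logarithmic_rule(probs: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Weighted geometric mean of the module probabilities, normalized."""
    l = np.exp(w @ np.log(probs))     # L_j = prod_i p_ij ** w_i
    return l / l.sum()

def product_rule(probs: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Each module's vote is smoothed toward the uniform distribution 1/k."""
    k = probs.shape[1]
    p = np.prod(w[:, None] * probs + (1 - w[:, None]) / k, axis=0)
    return p / p.sum()

# Example: two modules, three choices
probs = np.array([[0.6, 0.3, 0.1],
                  [0.5, 0.4, 0.1]])
w = np.array([0.8, 0.6])
print(product_rule(probs, w))
```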

Page 20: Overview of  Peter D. Turney’s Work on Similarity

3 Synonym: dataset

A set of 431 4-choice synonym questions, randomly divided into 331 training questions and 100 testing questions. The weights w are optimized on the training set.

Page 21: Overview of  Peter D. Turney’s Work on Similarity

3 Synonym: Modules

LSA, PMI-IR, Thesaurus, Connector.

Thesaurus: queries Wordsmyth (www.wordsmyth.net), creates synonym lists for both the stem and the choices, and scores them by their overlap.

Connector: uses summary pages from querying Google with a pair of words; the score is a weighted sum of the number of times the words appear separated by one of the symbols [, ", :, ,, =, /, ( , ] or by "means", "defined", "equals", "synonym", or whitespace, plus the number of times "dictionary" or "thesaurus" appear on the page.

Page 22: Overview of  Peter D. Turney’s Work on Similarity

3 Synonym: combined results

The three rules' accuracies are nearly identical; the product and logarithmic rules assign higher probabilities to correct answers, as evidenced by the mean likelihood.

Page 23: Overview of  Peter D. Turney’s Work on Similarity

3 Synonym: compare with other approaches

Page 24: Overview of  Peter D. Turney’s Work on Similarity

4 Analogies: dataset

374 5-choice instances, randomly split into 274 training instances and 100 testing instances.

E.g. cat:meow::

(a) mouse:scamper, (b) bird:peck, (c) dog:bark, (d) horse:groom, (e) lion:scratch

Page 25: Overview of  Peter D. Turney’s Work on Similarity

4 Analogies: modules

Phrase vectors: create a vector r to represent the relationship between X and Y. Phrases are built from 128 patterns, e.g. "X for Y", "Y with X", "X in the Y", "Y on X"; query a search engine, record the number of hits, and measure similarity by cosine.

Thesaurus paths (WordNet): degree of similarity between paths.

Page 26: Overview of  Peter D. Turney’s Work on Similarity

4 Analogies: combine results

Lexical relation modules: a set of more specific modules using WordNet; 9 modules, each checking one relationship: Synonym, Antonym, Hypernym, Hyponym, Meronym:substance, Meronym:part, Meronym:member, Holonym:substance, Holonym:member. Check the stem first, then the choices.

Similarity modules: make use of definitions; Similarity:dict uses dictionary.com and Similarity:wordsmyth uses wordsmyth.net.

Given A:B::C:D, similarity = sim(A, C) + sim(B, D)

Page 27: Overview of  Peter D. Turney’s Work on Similarity
Page 28: Overview of  Peter D. Turney’s Work on Similarity

5 Conclusion

Applied three trained merging rules to TOEFL questions. Accuracy: 97.5%.

Provided first results on a challenging analogy task with a set of novel modules that use both lexical databases and statistical information. Accuracy: 45%.

The popular mixture rule was consistently weaker than the logarithmic and product rules at assigning high probabilities to correct answers.

Page 29: Overview of  Peter D. Turney’s Work on Similarity

State of the art (accuracy)

Synonym questions: LSA 64.4%, HUMAN 64.5%, PMI-IR (2001) 73.75%, HYBRID (2003) 97.5%

Analogies: HYBRID (2003) 45%, HUMAN 57%

Page 30: Overview of  Peter D. Turney’s Work on Similarity

2005: Corpus-based Learning of Analogies and Semantic Relations

IJCAI 2005: Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, UK, July 30-August 5, 2005.

Page 31: Overview of  Peter D. Turney’s Work on Similarity

1 Introduction

Verbal analogy: VSM, A:B :: C:D. The novelty of the paper is the application of the VSM to measure the similarity between relationships.

Noun-modifier pair relations: supervised nearest neighbour algorithm.

Dataset: Nastase and Szpakowicz (2003), 600 noun-modifier pairs.

Page 32: Overview of  Peter D. Turney’s Work on Similarity

1 Introduction: examples

Analogy

Noun-modifier pair relations: laser printer; relation: instrument

Page 33: Overview of  Peter D. Turney’s Work on Similarity

2 Solving Analogy Problems

Assign scores to candidate analogies A:B::C:D. For multiple-choice questions, guess the highest-scoring choice.

Sim(R1, R2): the difficulty is that R1 and R2 are implicit.

Attempt to learn R1 and R2 using unsupervised learning from a very large corpus.

Page 34: Overview of  Peter D. Turney’s Work on Similarity

2 Solving Analogy Problems: Vector Space Model

Create vectors r1 and r2 that represent features of R1 and R2.

Measure the similarity of R1 and R2 by the cosine of the angle θ between r1 and r2.

Page 35: Overview of  Peter D. Turney’s Work on Similarity

2 Solving Analogy Problems: simplified diagram

Generate a vector for each word pair A:B:

Word pair A:B -> 64 joining terms -> phrases -> search hits -> log -> vector

Joining terms: "X for Y", "Y with X", "X in the Y", "Y on X", ...

Vector: [ log(hit1), log(hit2), …, log(hit128) ]
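A minimal sketch of this vector construction and the cosine comparison, assuming a hypothetical `hits(phrase)` search function and an illustrative subset of joining terms (the real system uses 64 joining terms, applied in both word orders, for 128 phrases per pair).

```python
import math

# Illustrative subset of joining terms; the real system uses 64 of them,
# applied in both orders ("X ... Y" and "Y ... X"), giving 128 phrases.
JOINING_TERMS = ["{x} for {y}", "{y} with {x}", "{x} in the {y}", "{y} on {x}"]

def relation_vector(x: str, y: str, hits) -> list[float]:
    """Vector of log hit counts for the phrases joining x and y."""
    return [math.log(hits(t.format(x=x, y=y)) + 1) for t in JOINING_TERMS]

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def analogy_score(a, b, c, d, hits) -> float:
    """Relational similarity of A:B and C:D = cosine of their vectors."""
    return cosine(relation_vector(a, b, hits), relation_vector(c, d, hits))
```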

Page 36: Overview of  Peter D. Turney’s Work on Similarity

2 Solving Analogy Problems: experiment

Page 37: Overview of  Peter D. Turney’s Work on Similarity

2 Solving Analogy Problems: experiment

Page 38: Overview of  Peter D. Turney’s Work on Similarity

3 Noun-Modifier Semantic Relations

First attempt to classify semantic relations without a lexicon.

Page 39: Overview of  Peter D. Turney’s Work on Similarity

30 Semantic Relations of training data

Page 40: Overview of  Peter D. Turney’s Work on Similarity

3 Noun-Modifier Semantic Relations: algorithm

Nearest neighbour supervised learning; nearest neighbour = highest cosine.

Cosine(training pair, testing pair); vectors of 128 elements, built with the same joining terms as before.
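A minimal sketch of the nearest-neighbour step (an illustration, not the paper's code), assuming each noun-modifier pair has already been mapped to a 128-element vector as described above.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def classify(test_vec: np.ndarray, train_vecs: np.ndarray, train_labels: list) -> str:
    """Label the test pair with the class of its nearest (highest-cosine) training pair."""
    sims = [cosine(test_vec, tv) for tv in train_vecs]
    return train_labels[int(np.argmax(sims))]
```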

Page 41: Overview of  Peter D. Turney’s Work on Similarity

3 Noun-Modifier Semantic Relations:Experiment for the 30 Classes

Page 42: Overview of  Peter D. Turney’s Work on Similarity

30 Semantic Relations

F when precision and recall are balanced: 26.5%

F for random guessing: 3.3%

Much better than random guessing, but still much room for improvement.

30 classes is hard: too many possibilities for confusing classes.

Try 5 classes instead, by grouping classes together.

Page 43: Overview of  Peter D. Turney’s Work on Similarity

5 Semantic Relations

Page 44: Overview of  Peter D. Turney’s Work on Similarity

F for the 5 Classes

Page 45: Overview of  Peter D. Turney’s Work on Similarity

5 Semantic Relations

F when precision and recall are balanced: 43.2%

F for random guessing: 20.0%

Better than random guessing and better than with 30 classes (26.5%), but still room for improvement.

Page 46: Overview of  Peter D. Turney’s Work on Similarity

Execution Time

The experiments presented here required 76,800 queries to AltaVista: 600 word pairs x 128 queries per word pair = 76,800 queries.

As a courtesy to AltaVista, a five-second delay was inserted between queries; processing the 76,800 queries took about five days.

Page 47: Overview of  Peter D. Turney’s Work on Similarity

Conclusion

The cosine metric in the VSM was used to solve analogies and to classify semantic relations.

It performs much better than random guessing, but below human levels.

Page 48: Overview of  Peter D. Turney’s Work on Similarity

State of the art

Accuracy, Analogies: HYBRID (2003) 45%, VSM (2005) 47%, HUMAN 57%

F-measure, Noun-Modifier (5 classes): VSM (2005) 43.2%

Page 49: Overview of  Peter D. Turney’s Work on Similarity

2006a: Similarity of Semantic Relations

Computational Linguistics, 32(3):379–416.

Page 50: Overview of  Peter D. Turney’s Work on Similarity

1 Introduction

Latent Relational Analysis (LRA). LRA extends the VSM approach of Turney and Littman (2005) in three ways:

The connecting patterns are derived automatically from the corpus, instead of using a fixed set of patterns.

Singular Value Decomposition (SVD) is used to smooth the frequency data.

Automatically generated synonyms are used to explore variations of the word pairs.

Page 51: Overview of  Peter D. Turney’s Work on Similarity

2 A short description of LRA: simplified diagram

Generate a vector for each word pair A:B:

Word pair A:B -> synonym expansion (A':B, A:B') -> phrases built from automatically derived patterns (instead of the fixed 64 joining terms) -> search hits -> entropy * log(hits) weighting -> matrix -> SVD -> calculate avg(cosine)
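A minimal sketch of the matrix weighting and smoothing step, assuming a precomputed pair-by-pattern frequency matrix; the log and entropy weighting and the truncated SVD follow the description above (an illustration under those assumptions, not the authors' implementation).

```python
import numpy as np

def log_entropy_weight(freq: np.ndarray) -> np.ndarray:
    """Weight a pair-by-pattern frequency matrix with log and entropy.

    Rows are word pairs, columns are patterns. A column gets more weight
    when its frequencies vary substantially across pairs (low entropy).
    """
    log_f = np.log1p(freq)                      # log weighting
    p = freq / np.maximum(freq.sum(axis=0), 1)  # column-wise probabilities
    n_rows = max(freq.shape[0], 2)
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -np.nansum(np.where(p > 0, p * np.log(p), 0.0), axis=0) / np.log(n_rows)
    return log_f * (1.0 - h)                    # entropy weight per column

def smooth_with_svd(matrix: np.ndarray, k: int = 300) -> np.ndarray:
    """Project the weighted matrix onto its top-k singular vectors."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    return u[:, :k] * s[:k]                     # row vectors in the reduced space
```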

Page 52: Overview of  Peter D. Turney’s Work on Similarity

3 Experiment: Word Analogy Questions, Baseline LRA

Matrix: 17,232 x 8,000, density 5.8%. Time required: 209:49:36 (about 9 days). Performance:

Page 53: Overview of  Peter D. Turney’s Work on Similarity

Experiment: Word Analogy Questions, LRA vs. VSM

Corpus size: AltaVista: 5 x 10^11 English words; WMTS: 5 x 10^10 English words.

Page 54: Overview of  Peter D. Turney’s Work on Similarity

Experiment: Word Analogy Questions Varying the Parameters

Page 55: Overview of  Peter D. Turney’s Work on Similarity

Experiment: Word Analogy Questions, Ablation Experiments

No SVD: the difference is not significant, but might become significant with more word pairs.

No synonyms: recall drops. Neither SVD nor synonyms: recall drops. VSM: the drop is significant.

Page 56: Overview of  Peter D. Turney’s Work on Similarity

Experiments with Noun-Modifier Relations

Dataset: 600 noun-modifier pairs, hand-labeled with 30 classes of semantic relations.

Algorithm: baseline LRA with a single nearest neighbour; LRA provides the distance (nearness) measure.

Page 57: Overview of  Peter D. Turney’s Work on Similarity
Page 58: Overview of  Peter D. Turney’s Work on Similarity

Discussion

For word analogy questions: performance is not yet adequate for practical application; speed is also a concern.

For noun-modifier classification: more hand-labeled data would help, but it is expensive; the choice of classification scheme for the semantic relations also matters.

Hybrid approach: combine the corpus-based approach of LRA with the lexicon-based approach of Veale (2004).

Page 59: Overview of  Peter D. Turney’s Work on Similarity

Conclusion of 2006a

LRA extends the VSM (2005): patterns are derived automatically; SVD is used to smooth and compress the data; automatically generated synonyms are used to explore variations of the word pairs.

Page 60: Overview of  Peter D. Turney’s Work on Similarity

State of the art

Accuracy, Analogies: HYBRID (2003) 45%, VSM (2005) 47%, LRA (2006a) 56.8%, HUMAN 57%

F-measure, Noun-Modifier (5 classes): VSM (2005) 43.2%, LRA (2006a) 54.6%

Page 61: Overview of  Peter D. Turney’s Work on Similarity

2006b: Expressing Implicit Semantic Relations without Supervision

Coling/ACL-06

Page 62: Overview of  Peter D. Turney’s Work on Similarity

Introduction

Hearst (1992): pattern -> X:Y. The pattern "Y such as the X" can be used to mine large text corpora for hypernym-hyponym pairs. Search using the pattern "Y such as the X" and find the string "bird such as the ostrich"; then we can infer that "ostrich" is a hyponym of "bird".

Here we consider the inverse of this problem: X:Y -> pattern. Can we mine a large text corpus for patterns that express the implicit relations between X and Y?

Page 63: Overview of  Peter D. Turney’s Work on Similarity

Introduction

Discovering high-quality patterns. Pertinence: a measure of pattern quality. High-pertinence patterns are reliable for mining further word pairs with the same semantic relations.

Page 64: Overview of  Peter D. Turney’s Work on Similarity

2 Pertinence

The first formal measure of quality for text mining patterns. Given a set of word pairs W = {X_1:Y_1, …, X_n:Y_n} and a set of patterns P = {P_1, …, P_m}:

P_i is pertinent to X_j:Y_j if highly typical word pairs X_k:Y_k for the pattern P_i tend to be relationally similar to X_j:Y_j.

Pertinence tends to be highest with unambiguous patterns.

pertinence(X_j:Y_j, P_i) = sum_{k=1}^{n} p(X_k:Y_k | P_i) * sim_r(X_j:Y_j, X_k:Y_k)

Page 65: Overview of  Peter D. Turney’s Work on Similarity

2 Pertinence: computation

f_{k,i} is the number of occurrences in a corpus of the word pair X_k:Y_k with the pattern P_i (with smoothing).

p(P_i | X_k:Y_k) = f_{k,i} / sum_{j=1}^{m} f_{k,j}

By Bayes' theorem:

p(X_k:Y_k | P_i) = p(X_k:Y_k) p(P_i | X_k:Y_k) / sum_{j=1}^{n} p(X_j:Y_j) p(P_i | X_j:Y_j)

Assuming a uniform prior p(X_j:Y_j) = 1/n, this simplifies to:

p(X_k:Y_k | P_i) = p(P_i | X_k:Y_k) / sum_{j=1}^{n} p(P_i | X_j:Y_j)

Equivalently, estimating directly from the raw frequencies:

p(X_k:Y_k | P_i) = p(X_k:Y_k, P_i) / p(P_i) = f_{k,i} / sum_{j=1}^{n} f_{j,i}
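A minimal sketch of these computations on a small pair-by-pattern frequency matrix. The relational similarity sim_r is stubbed here as the cosine between rows of the same matrix, which is only an illustration; the paper obtains it from LRA-style vectors.

```python
import numpy as np

def pertinence(freq: np.ndarray, j: int) -> np.ndarray:
    """Pertinence of every pattern P_i to word pair j.

    freq[k, i] = number of corpus occurrences of pair X_k:Y_k with pattern P_i.
    Returns a vector of pertinence(X_j:Y_j, P_i) over all patterns i.
    """
    # p(P_i | X_k:Y_k): normalize each row over patterns
    p_pattern_given_pair = freq / np.maximum(freq.sum(axis=1, keepdims=True), 1)
    # p(X_k:Y_k | P_i) with a uniform prior over pairs: normalize each column
    p_pair_given_pattern = p_pattern_given_pair / np.maximum(
        p_pattern_given_pair.sum(axis=0, keepdims=True), 1e-12)
    # Stub for relational similarity: cosine between rows of the matrix.
    unit = freq / np.maximum(np.linalg.norm(freq, axis=1, keepdims=True), 1e-12)
    sim_r = unit @ unit[j]                      # sim_r(X_j:Y_j, X_k:Y_k) for all k
    # pertinence(X_j:Y_j, P_i) = sum_k p(X_k:Y_k | P_i) * sim_r(j, k)
    return sim_r @ p_pair_given_pattern
```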

Page 66: Overview of  Peter D. Turney’s Work on Similarity

3 Related Work

Hearst (1992) describes a method for finding patterns like "Y such as the X", but her method requires human judgment.

Riloff and Jones (1999) use a mutual bootstrapping technique that can find patterns automatically, but the bootstrapping requires an initial seed of manually chosen examples.

Other works all require training examples or initial seed patterns for each relation.

Page 67: Overview of  Peter D. Turney’s Work on Similarity

3 Related Work

Turney (2006a): LRA maps each pair X:Y to a high-dimensional vector v, then calculates the cosine. Pertinence is based on it.

A limitation: the semantic content of the vectors is difficult to interpret.

Page 68: Overview of  Peter D. Turney’s Work on Similarity

The Algorithm

1. Find phrases.
2. Generate patterns; note the pattern frequency (TF), a local frequency count.
3. Count the pair frequency: a global frequency count (DF).
4. Map pairs to rows: both X_j:Y_j and Y_j:X_j.
5. Map patterns to columns: drop all patterns with a pair frequency less than 10; 1,706,845 distinct patterns reduce to 42,032 patterns.

Page 69: Overview of  Peter D. Turney’s Work on Similarity

The Algorithm

6. Build a sparse matrix; each element is a frequency.
7. Calculate entropy: log and entropy weighting gives more weight to patterns that vary substantially in frequency for each pair.
8. Apply SVD.
9. Calculate cosines.
10. Calculate conditional probabilities, for every word pair and every pattern:

p(X_k:Y_k | P_i) = p(P_i | X_k:Y_k) / sum_{j=1}^{n} p(P_i | X_j:Y_j)

11. Calculate pertinence.

Page 70: Overview of  Peter D. Turney’s Work on Similarity

The Algorithm: simplified diagram

Semantic similarity = similarity of the pattern lists.

{word pairs} -> search, count patterns, etc. -> matrix (word pair 1, pattern list 1; …; word pair n, pattern list n) -> compute and rank.

Page 71: Overview of  Peter D. Turney’s Work on Similarity

5 Experiments with Word Analogies

Dataset: 374 college-level multiple-choice word analogies, taken from the SAT test.

6 x 374 = 2,244 pairs; 4,194 rows x 84,064 columns; the sparse matrix density is 0.91%.

Score = ( rank_stem + rank_choice ) / 2

Page 72: Overview of  Peter D. Turney’s Work on Similarity
Page 73: Overview of  Peter D. Turney’s Work on Similarity

the four highest ranking patterns for the stem and solution for the first example

Page 74: Overview of  Peter D. Turney’s Work on Similarity

the top five pairs match the pattern “Y such as the X”.

Page 75: Overview of  Peter D. Turney’s Work on Similarity

Comparing with other measures

Page 76: Overview of  Peter D. Turney’s Work on Similarity

Experiments with Noun-Modifiers

Page 77: Overview of  Peter D. Turney’s Work on Similarity

Method and Result

Method: a single nearest neighbour algorithm with leave-one-out cross-validation. The distance between two noun-modifier pairs is measured by the average rank of their best shared pattern.

Result:

Page 78: Overview of  Peter D. Turney’s Work on Similarity

More

For the 5 general classes

Page 79: Overview of  Peter D. Turney’s Work on Similarity

Comparing with other measures

Page 80: Overview of  Peter D. Turney’s Work on Similarity

Discussion

Time: word analogies: 5 hours, vs. 5 days (2005) and 9 days (2006a); noun-modifiers: 9 hours. The majority of the time was spent on searching.

Performance: near the level of the average senior high school student (54.6% vs. 57%). For applications such as building a thesaurus, lexicon, or ontology, this level of performance suggests that our algorithm could assist, but not replace, a human expert.

Page 81: Overview of  Peter D. Turney’s Work on Similarity

Conclusion

LRA is a black box. The main contribution of this paper is the idea of pertinence: use it to find patterns that express the implicit semantic relations between two words.

Page 82: Overview of  Peter D. Turney’s Work on Similarity

State of the art

Accuracy, Analogies: HYBRID (2003) 45%, VSM (2005) 47%, LRA (2006a) 56.8%, Pertinence (2006b) 55.7%, HUMAN 57%

F-measure, Noun-Modifier (5 classes): VSM (2005) 43.2%, LRA (2006a) 54.6%, Pertinence (2006b) 50.2%

Page 83: Overview of  Peter D. Turney’s Work on Similarity

2008: A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations

Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), August 2008, Manchester, UK, pages 905-912

Page 84: Overview of  Peter D. Turney’s Work on Similarity

1 Introduction

There are too many kinds of semantic relations to provide a special algorithm for each one, so we restrict our attention to four: analogous, synonymous, antonymous, and associated.

As far as we know, the algorithm proposed here is the first attempt to deal with all four tasks using a uniform approach.

Page 85: Overview of  Peter D. Turney’s Work on Similarity

1 Introduction: idea

Analogous

Synonymous: X:Y is analogous to the pair levied:imposed

Antonymous: X:Y is analogous to the pair black:white

Associated: X:Y is analogous to the pair doctor:hospital

Page 86: Overview of  Peter D. Turney’s Work on Similarity

1 Introduction: Why not WordNet?

WordNet contains all of the needed relations, but a corpus-based algorithm is BETTER than a lexicon-based one:

Answering 374 multiple-choice SAT analogy questions: WordNet (Veale, 2004): 43%; corpus-based (Turney, 2006a): 56%.

Less human labor; easy to extend to other languages.

Page 87: Overview of  Peter D. Turney’s Work on Similarity

1 Introduction: experiments

SAT college entrance test; TOEFL; ESL; and a set of word pairs labeled similar, associated, and both, developed for experiments in cognitive psychology.

Page 88: Overview of  Peter D. Turney’s Work on Similarity

2 Algorithm: PairClass

View the task of recognizing word analogies as a problem of classifying word pairs: a standard classification problem for supervised machine learning.

Page 89: Overview of  Peter D. Turney’s Work on Similarity

2 Algorithm: Resources

Corpus: 5 x 10^10 words, consisting of web pages gathered by a web crawler (Clarke, Charles L. A., 2003).

Wumpus: an efficient search engine for passage retrieval from large corpora (http://www.wumpus-search.org/), built to study issues that arise in the context of indexing dynamic text collections in multi-user environments.

Page 90: Overview of  Peter D. Turney’s Work on Similarity

2 Algorithm: PairClass (training set & testing set)

Step 1: generate morphological variations, e.g. mason:stone -> masons:stones.

Step 2: search a large corpus for all phrases containing the pair, of the form [0 to 1 words] X [0 to 3 words] Y [0 to 1 words], e.g. "the mason cut the stone with".

Step 3: generate patterns by replacing context words with wildcards, e.g. "the X cut * Y with", "* X * the Y *"; a phrase of n words yields 2^(n-2) patterns (see the sketch after this list).

Step 4: reduce the number of patterns, keeping the top k*N patterns (k = 20).

Step 5: generate feature vectors.

Step 6: apply a standard supervised learning algorithm: Weka's SMO SVM with an RBF kernel.
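A minimal sketch of the wildcard pattern generation in Step 3 (an illustration consistent with the example above, not the authors' code): every context word other than X and Y is either kept or replaced by "*", so a phrase of n words yields 2^(n-2) patterns.

```python
from itertools import product

def generate_patterns(phrase: list, x: str, y: str) -> list:
    """All wildcard patterns of a phrase: X and Y are kept, every other
    word is either kept as-is or replaced with '*'."""
    slots = []
    for w in phrase:
        if w.lower() == x.lower():
            slots.append(["X"])
        elif w.lower() == y.lower():
            slots.append(["Y"])
        else:
            slots.append([w, "*"])          # keep the word or wildcard it
    return [" ".join(p) for p in product(*slots)]

patterns = generate_patterns("the mason cut the stone with".split(), "mason", "stone")
print(len(patterns))   # 2**(6-2) = 16 patterns, including "the X cut * Y with"
```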

Page 91: Overview of  Peter D. Turney’s Work on Similarity

PairClass vs. LRA (Turney, 2006a)

PairClass does not use a lexicon to find synonyms for the input word pairs: a pure corpus-based algorithm can handle synonyms without a lexicon.

PairClass uses a support vector machine (SVM) instead of a nearest neighbour (NN) learning algorithm.

PairClass does not use SVD to smooth the feature vectors. It has been our experience that SVD is not necessary with SVMs.

Page 92: Overview of  Peter D. Turney’s Work on Similarity

Measure of similarity: PairClass uses probability estimates, which are more useful; Turney (2006) uses the cosine.

The automatically generated patterns are slightly more general: PairClass: [0 to 1 words] X [0 to 3 words] Y [0 to 1 words]; Turney (2006): X [0 to 3 words] Y.

The morphological processing in PairClass (Minnen et al., 2001) is more sophisticated than in Turney (2006).

Page 93: Overview of  Peter D. Turney’s Work on Similarity

3 Experiment: SAT Analogies

Uses a set of 374 multiple-choice questions from the SAT college entrance exam.

Treated as a binary classification problem.

Page 94: Overview of  Peter D. Turney’s Work on Similarity

3 Experiment: SAT Analogies

1st difficulty: no negative examples. The training set consists of one positive example (the stem pair) and the testing set consists of five unlabeled examples (the five choice pairs).

Solution: randomly choose one of the other 373 questions to provide a negative example; use PairClass to estimate the probability that each testing example is positive, and guess the testing example with the highest probability.

Page 95: Overview of  Peter D. Turney’s Work on Similarity
Page 96: Overview of  Peter D. Turney’s Work on Similarity

3 Experiment: SAT Analogies

2nd difficulty: the algorithm is very unstable, due to the lack of examples.

Solution: to increase stability, repeat the learning process 10 times, using a different randomly chosen negative training example each time, and average the 10 probabilities.
Page 97: Overview of  Peter D. Turney’s Work on Similarity

PairClass: accuracy of 52.1%


Page 98: Overview of  Peter D. Turney’s Work on Similarity

3 Experiment: TOEFL Synonyms

Recognizing synonyms: a set of 80 multiple-choice synonym questions from the TOEFL.

Page 99: Overview of  Peter D. Turney’s Work on Similarity

View it as a binary classification problem

Page 100: Overview of  Peter D. Turney’s Work on Similarity

3 Experiment: TOEFL Synonyms

80 questions yield 80 positive and 240 negative examples; apply PairClass using ten-fold cross-validation.

In each random fold, 90% of the pairs are used for training and 10% are used for testing. For each fold, the model learned from the training set is used to assign probabilities to the pairs in the testing set. The folds are non-overlapping, so together they cover the whole dataset.

Choice: the one with the highest probability.
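A minimal sketch of this evaluation loop using scikit-learn's cross_val_predict, under the assumption that pair feature vectors `X`, binary labels `y`, and a per-pair `question_id` array have already been built as described above (all three names are hypothetical).

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

# Assumed inputs (hypothetical names):
#   X           : (320, d) feature matrix, one row per stem/choice word pair
#   y           : (320,) labels, 1 = correct choice pair, 0 = incorrect
#   question_id : (320,) index of the TOEFL question each pair belongs to
def evaluate(X: np.ndarray, y: np.ndarray, question_id: np.ndarray) -> float:
    clf = SVC(kernel="rbf", probability=True)       # SMO-style RBF SVM
    # Probability of the positive class for every pair, via 10-fold CV
    proba = cross_val_predict(clf, X, y, cv=10, method="predict_proba")[:, 1]
    correct = 0
    for q in np.unique(question_id):
        idx = np.where(question_id == q)[0]
        guess = idx[np.argmax(proba[idx])]           # choice with highest probability
        correct += int(y[guess] == 1)
    return correct / len(np.unique(question_id))     # accuracy over questions
```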

Page 101: Overview of  Peter D. Turney’s Work on Similarity

PairClass: accuracy of 76.1%


Page 102: Overview of  Peter D. Turney’s Work on Similarity

3 Experiment: Synonyms and Antonyms

a set of 136 ESL practice questions

Page 103: Overview of  Peter D. Turney’s Work on Similarity

3 Experiment: Synonyms and Antonyms

Hand-coded patterns: Lin et al. (2003) use two patterns, "from X to Y" and "either X or Y".

Antonyms occasionally appear in a large corpus in one of these two patterns; synonyms very rarely appear in them.

PairClass learns its patterns automatically.

Page 104: Overview of  Peter D. Turney’s Work on Similarity

3 Experiment: Synonyms and Antonyms

Result: PairClass: accuracy of 75.0% (ten-fold cross-validation). Baseline: accuracy of 65.4% (always guessing the majority class). No comparison with other systems is available.

Page 105: Overview of  Peter D. Turney’s Work on Similarity

3 Experiment: Similar, Associated, and Both

Lund et al. (1995) evaluated their corpus-based algorithm for measuring word similarity with word pairs that were labeled similar, associated, or both.

These 144 labeled pairs were originally created for cognitive psychology experiments with human subjects

Page 106: Overview of  Peter D. Turney’s Work on Similarity

3 Experiment: Similar, Associated, and Both

Lund et al. (1995) did not measure accuracy; they showed that their algorithm's similarity scores were correlated with the response times of human subjects in priming tests.

PairClass with ten-fold cross-validation: accuracy of 77.1%.

Baseline (guessing the majority class, or random guessing): 33.3%, since the three classes are of equal size.

Page 107: Overview of  Peter D. Turney’s Work on Similarity

3 Experiment: summary

For the first two experiments, PairClass is not the best, but it performs competitively.

For the second two experiments, PairClass performs significantly above the baselines.

Page 108: Overview of  Peter D. Turney’s Work on Similarity

State of the art

Year    Algorithm   Type          Synonym   Analogy
2001    PMI-IR      Corpus-based  73.75%    -
2003    PR          Hybrid        97.50%    -
2005    VSM         Corpus-based  -         47.1%
2006a   LRA         Corpus-based  -         56.1%
2006b   PERT        Corpus-based  -         53.5%
2008    PairClass   Corpus-based  76.1%     52.1%
HUMAN                             64.5%     57.0%

Page 109: Overview of  Peter D. Turney’s Work on Similarity

That's finally the end o_0

Any Questions?

