Lecture 40 of 42
Transcript
Page 1: Lecture 40 of 42

Computing & Information Sciences, Kansas State University

Friday, 01 Dec 2006, CIS 490 / 730: Artificial Intelligence

Lecture 40 of 42

Friday, 01 December 2006

William H. Hsu

Department of Computing and Information Sciences, KSU

KSOL course page: http://snipurl.com/v9v3

Course web site: http://www.kddresearch.org/Courses/Fall-2006/CIS730

Instructor home page: http://www.cis.ksu.edu/~bhsu

Reading for Next Class:

Sections 22.1, 22.6-7, Russell & Norvig 2nd edition

NLP and Philosophical Issues
Discussion: Machine Translation (MT)

Page 2: Lecture 40 of 42


(Hidden) Markov Models: Review

Definition of Hidden Markov Models (HMMs)
  Stochastic state transition diagram (in HMMs, the states, aka nodes, are hidden)
  Compare: probabilistic finite state automaton (Mealy/Moore model)
  Annotated transitions (aka arcs, edges, links)
  • Output alphabet (the observable part)
  • Probability distribution over outputs

Forward Problem: One Step in ML Estimation
  Given: model h, observations (data) D
  Estimate: P(D | h)

Backward Problem: Prediction Step
  Given: model h, observations D
  Maximize: P(h(X) = x | h, D) for a new X

Forward-Backward (Learning) Problem
  Given: model space H, data D
  Find: h ∈ H such that P(h | D) is maximized (i.e., the MAP hypothesis)

HMMs Also A Case of LSQ (f Values in [Roth, 1999])

[Diagram: three-state HMM (states 1, 2, 3) with transition probabilities 0.4, 0.5, 0.6, 0.8, 0.2, 0.5 and output distributions on the transitions: A 0.4 / B 0.6; A 0.5 / G 0.3 / H 0.2; E 0.1 / F 0.9; E 0.3 / F 0.7; C 0.8 / D 0.2; A 0.1 / G 0.9.]
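The forward problem above (computing P(D | h)) can be sketched as follows; the two-state model and its probability tables are invented for illustration, not taken from the diagram:

```python
# Forward algorithm sketch: P(observations | HMM) by summing over all
# hidden state paths. The model below is a toy two-state HMM.
states = [0, 1]
start = [0.6, 0.4]                                      # P(initial state)
trans = [[0.7, 0.3], [0.4, 0.6]]                        # trans[i][j] = P(j | i)
emit = [{"A": 0.5, "B": 0.5}, {"A": 0.1, "B": 0.9}]     # P(output | state)

def forward(obs):
    # alpha[j] = P(o_1..o_t, state_t = j), updated one observation at a time
    alpha = [start[j] * emit[j][obs[0]] for j in states]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in states) * emit[j][o]
                 for j in states]
    return sum(alpha)

prob = forward(["A", "B"])
```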

Page 3: Lecture 40 of 42


NLP Hierarchy: Review

Problem Definition

Given: m sentences containing untagged words

Example: “The can will rust.”

Label (one per word, out of ~30-150 tags): vj, e.g., s = (art, n, aux, vi)

Representation: labeled examples <(w1, w2, …, wn), s>

Return: classifier f: X → V that tags x ≡ (w1, w2, …, wn)

Applications: WSD, dialogue acts (e.g., “That sounds OK to me.” → ACCEPT)

Solution Approaches: Use Transformation-Based Learning (TBL)

[Brill, 1995]: TBL - mistake-driven algorithm that produces sequences of rules

• Each rule of the form (ti, v): a test condition (constructed attribute) and a tag

• ti: “w occurs within k words of wi” (context words); collocations (windows)

For more info: see [Roth, 1998], [Samuel, Carberry, Vijay-Shankar, 1998]

Recent Research

E. Brill’s page: http://www.cs.jhu.edu/~brill/

K. Samuel’s page: http://www.eecis.udel.edu/~samuel/work/research.html

[Diagram: NLP hierarchy, from top: Discourse Labeling, Speech Acts, Natural Language, Parsing / POS Tagging, Lexical Analysis.]

Page 4: Lecture 40 of 42


Statistical Machine Translation

Kevin Knight

USC/Information Sciences Institute
USC/Computer Science Department

Page 5: Lecture 40 of 42


Clients do not sell pharmaceuticals in Europe => Clientes no venden medicinas en Europa

Spanish/English Parallel Corpora: Review

1a. Garcia and associates .
1b. Garcia y asociados .

2a. Carlos Garcia has three associates .
2b. Carlos Garcia tiene tres asociados .

3a. his associates are not strong .
3b. sus asociados no son fuertes .

4a. Garcia has a company also .
4b. Garcia tambien tiene una empresa .

5a. its clients are angry .
5b. sus clientes estan enfadados .

6a. the associates are also angry .
6b. los asociados tambien estan enfadados .

7a. the clients and the associates are enemies .
7b. los clients y los asociados son enemigos .

8a. the company has three groups .
8b. la empresa tiene tres grupos .

9a. its groups are in Europe .
9b. sus grupos estan en Europa .

10a. the modern groups sell strong pharmaceuticals .
10b. los grupos modernos venden medicinas fuertes .

11a. the groups do not sell zenzanine .
11b. los grupos no venden zanzanina .

12a. the small groups are not modern .
12b. los grupos pequenos no son modernos .

Page 6: Lecture 40 of 42


Data for Statistical MT and Data Preparation

Page 7: Lecture 40 of 42


Ready-to-Use Online Bilingual Data

[Chart: millions of words (English side) of ready-to-use online bilingual data, 1994-2004, for Chinese/English, Arabic/English, and French/English. Data stripped of formatting, in sentence-pair format, available from the Linguistic Data Consortium at UPenn.]

Page 8: Lecture 40 of 42


Ready-to-Use Online Bilingual Data

[Same chart, with a larger scale (up to 180 million words) and an annotation: + 1m-20m words for many language pairs.]

Page 9: Lecture 40 of 42


Ready-to-Use Online Bilingual Data

[Same chart, with an annotation asking: one billion words?]

Page 10: Lecture 40 of 42


From No Data to Sentence Pairs

Easy way: Linguistic Data Consortium (LDC)

Really hard way: pay $$$
  Suppose one billion words of parallel data were sufficient
  At 20 cents/word, that’s $200 million

Pretty hard way: find it, and then earn it!
  De-formatting
  Remove strange characters
  Character code conversion
  Document alignment
  Sentence alignment
  Tokenization (also called segmentation)

Page 11: Lecture 40 of 42


Sentence Alignment

The old man is happy. He has fished many times. His wife talks to him. The fish are jumping. The sharks await.

El viejo está feliz porque ha pescado muchos veces. Su mujer habla con él. Los tiburones esperan.

Page 12: Lecture 40 of 42


Sentence Alignment

1. The old man is happy.

2. He has fished many times.

3. His wife talks to him.

4. The fish are jumping.

5. The sharks await.

1. El viejo está feliz porque ha pescado muchos veces.

2. Su mujer habla con él.

3. Los tiburones esperan.


Page 14: Lecture 40 of 42


Sentence Alignment

1. The old man is happy. He has fished many times.

2. His wife talks to him.

3. The sharks await.

1. El viejo está feliz porque ha pescado muchos veces.

2. Su mujer habla con él.

3. Los tiburones esperan.

Note that unaligned sentences are thrown out, and sentences are merged in n-to-m alignments (n, m > 0).
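A minimal sketch of length-based sentence alignment in the spirit of Gale & Church (1993), not necessarily the method used for the corpora above. It greedily chooses among 1-1, 2-1, and 1-2 beads by character-length ratio, and it assumes the unaligned sentence has already been dropped:

```python
# Length-based sentence alignment sketch (Gale & Church style, simplified).
def length_cost(src_len, tgt_len):
    # 0 when lengths match; grows as they diverge.
    return abs(src_len - tgt_len) / max(src_len + tgt_len, 1)

def align(src, tgt):
    i, j, beads = 0, 0, []
    while i < len(src) and j < len(tgt):
        candidates = [((1, 1), length_cost(len(src[i]), len(tgt[j])))]
        if i + 1 < len(src):   # 2-1 bead: two source sentences, one target
            candidates.append(
                ((2, 1), length_cost(len(src[i]) + len(src[i + 1]), len(tgt[j]))))
        if j + 1 < len(tgt):   # 1-2 bead: one source sentence, two target
            candidates.append(
                ((1, 2), length_cost(len(src[i]), len(tgt[j]) + len(tgt[j + 1]))))
        (di, dj), _ = min(candidates, key=lambda c: c[1])
        beads.append((src[i:i + di], tgt[j:j + dj]))
        i, j = i + di, j + dj
    return beads

# The slide's example, with the unaligned "The fish are jumping." removed:
src = ["The old man is happy.", "He has fished many times.",
       "His wife talks to him.", "The sharks await."]
tgt = ["El viejo está feliz porque ha pescado muchos veces.",
       "Su mujer habla con él.", "Los tiburones esperan."]
beads = align(src, tgt)   # first bead merges English sentences 1 and 2
```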

Page 15: Lecture 40 of 42


Tokenization (or Segmentation)

English
  Input (some byte stream): "There," said Bob.
  Output (7 “tokens” or “words”): " There , " said Bob .

Chinese
  Input (byte stream): 美国关岛国际机场及其办公室均接获一名自称沙地阿拉伯富商拉登等发出的电子邮件。
  Output: 美国 关岛国 际机 场 及其 办公 室均接获 一名 自称 沙地 阿拉 伯富 商拉登 等发 出 的 电子邮件。
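The English example above can be approximated with a one-line regex tokenizer (a hypothetical simplification; real MT pipelines use more careful rules, and Chinese segmentation needs a dictionary or statistical model rather than a regex):

```python
import re

# Split off punctuation as separate tokens: words (\w+) or single
# non-word, non-space characters ([^\w\s]).
def tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize('"There," said Bob.')   # yields 7 tokens
```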

Page 16: Lecture 40 of 42


MT Evaluation

Page 17: Lecture 40 of 42


MT Evaluation

Manual:
  SSER (subjective sentence error rate)
  Correct/Incorrect
  Error categorization

Testing in an application that uses MT as one sub-component
  e.g., question answering from foreign-language documents

Automatic:
  WER (word error rate)
  BLEU (Bilingual Evaluation Understudy)

Page 18: Lecture 40 of 42


Reference (human) translation: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance.

BLEU Evaluation Metric (Papineni et al., ACL-2002)

• N-gram precision (score is between 0 and 1)
  – What percentage of machine n-grams can be found in the reference translation?
  – An n-gram is a sequence of n words
  – Not allowed to use the same portion of the reference translation twice (can’t cheat by typing out “the the the the the”)

• Brevity penalty
  – Can’t just type out the single word “the” (1-gram precision 1.0!)

*** Amazingly hard to “game” the system (i.e., to find a way to change machine output so that BLEU goes up but quality doesn’t)

Page 19: Lecture 40 of 42



BLEU Evaluation Metric (Papineni et al., ACL-2002)

• BLEU4 formula (counts n-grams up to length 4):

  exp( 1.0 * log p1 + 0.5 * log p2 + 0.25 * log p3 + 0.125 * log p4
       - max(words-in-reference / words-in-machine - 1, 0) )

  where p1 = 1-gram precision, p2 = 2-gram precision, p3 = 3-gram precision, p4 = 4-gram precision.
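A minimal sketch of this BLEU4 computation in Python, using the slide's weights and brevity penalty (single reference, no smoothing; function and variable names are mine):

```python
import math
from collections import Counter

def ngrams(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def modified_precision(hyp, ref, n):
    hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
    # Clip each hypothesis n-gram by its reference count, so repeating
    # "the the the ..." cannot inflate precision.
    overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return overlap / max(sum(hyp_counts.values()), 1)

def bleu4(hyp, ref):
    weights = [1.0, 0.5, 0.25, 0.125]          # the slide's weights
    log_sum = 0.0
    for n, w in zip(range(1, 5), weights):
        p = modified_precision(hyp, ref, n)
        if p == 0:
            return 0.0
        log_sum += w * math.log(p)
    brevity = max(len(ref) / len(hyp) - 1, 0)  # penalize short output
    return math.exp(log_sum - brevity)
```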

Page 20: Lecture 40 of 42


Multiple Reference Translations

Reference translation 1: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Reference translation 2: Guam International Airport and its offices are maintaining a high state of alert after receiving an e-mail that was from a person claiming to be the wealthy Saudi Arabian businessman Bin Laden and that threatened to launch a biological and chemical attack on the airport and other public places .

Reference translation 3: The US International Airport of Guam and its office has received an email from a self-claimed Arabian millionaire named Laden , which threatens to launch a biochemical attack on such public places as airport . Guam authority has been on alert .

Reference translation 4: US Guam International Airport and its office received an email from Mr. Bin Laden and other rich businessman from Saudi Arabia . They said there would be biochemistry air raid to Guam Airport and other public places . Guam needs to be in high precaution about this matter .

Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance.

Page 21: Lecture 40 of 42


BLEU Tends to Predict Human Judgments

[Scatter plot: NIST score (a variant of BLEU) versus human judgments, for Adequacy (R² = 88.0%) and Fluency (R² = 90.2%), with linear fits Linear(Adequacy) and Linear(Fluency). Slide from G. Doddington (NIST).]

Page 22: Lecture 40 of 42


Word-Based Statistical MT

Page 23: Lecture 40 of 42


Statistical MT Systems

[Diagram: noisy-channel pipeline Spanish → Broken English → English. Statistical analysis of Spanish/English bilingual text and of English text supplies the models.]

Example: Que hambre tengo yo → candidate English outputs (What hunger have I / Hungry I am so / I am so hungry / Have I that hunger / …) → I am so hungry

Page 24: Lecture 40 of 42


Statistical MT Systems

[Diagram: the same pipeline with the components labeled. Spanish/English bilingual text yields the Translation Model P(s|e); English text yields the Language Model P(e); the decoding algorithm computes argmax_e P(e) * P(s|e), turning “Que hambre tengo yo” into “I am so hungry”.]

Page 25: Lecture 40 of 42


Three Problems for Statistical MT

Language model
  Given an English string e, assigns P(e) by formula
  good English string → high P(e)
  random word sequence → low P(e)

Translation model
  Given a pair of strings <f, e>, assigns P(f | e) by formula
  <f, e> look like translations → high P(f | e)
  <f, e> don’t look like translations → low P(f | e)

Decoding algorithm
  Given a language model, a translation model, and a new sentence f,
  find the translation e maximizing P(e) * P(f | e)
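The decoding objective can be sketched on the slides' Spanish example; the toy probability tables below are invented for illustration (note the literal gloss gets a higher translation-model score, but the language model rescues the fluent output):

```python
# Noisy-channel selection rule: pick e maximizing P(e) * P(f | e).
lm = {"I am so hungry": 0.01,                           # P(e): fluent English
      "What hunger have I": 0.0001}                     # stilted English
tm = {("Que hambre tengo yo", "I am so hungry"): 0.2,   # P(f | e)
      ("Que hambre tengo yo", "What hunger have I"): 0.3}

def decode(f, candidates):
    # Exhaustive argmax over the candidate list (real decoders search).
    return max(candidates, key=lambda e: lm[e] * tm[(f, e)])

best = decode("Que hambre tengo yo",
              ["I am so hungry", "What hunger have I"])
```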

Page 26: Lecture 40 of 42


The Classic Language Model: Word N-Grams

Goal of the language model: choose among:

  He is on the soccer field
  He is in the soccer field

  Is table the on cup the
  The cup is on the table

  Rice shrine
  American shrine
  Rice company
  American company

Page 27: Lecture 40 of 42


The Classic Language Model: Word N-Grams

Generative approach:

  w1 := START
  repeat until END is generated:
    produce word w2 according to a big table P(w2 | w1)
    w1 := w2

P(I saw water on the table) =
  P(I | START) * P(saw | I) * P(water | saw) * P(on | water) *
  P(the | on) * P(table | the) * P(END | table)

Probabilities can be learned from online English text.
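The factorization above can be sketched with a tiny bigram model; the two-sentence training corpus is invented, and real models need far more text plus smoothing for unseen bigrams:

```python
from collections import Counter

# Bigram language model: P(sentence) = product of P(w2 | w1),
# with START/END markers, estimated by counting.
corpus = [["I", "saw", "water", "on", "the", "table"],
          ["the", "water", "on", "the", "table"]]

bigram, unigram = Counter(), Counter()
for sent in corpus:
    words = ["START"] + sent + ["END"]
    for w1, w2 in zip(words, words[1:]):
        bigram[(w1, w2)] += 1
        unigram[w1] += 1

def p(w2, w1):
    # Maximum-likelihood estimate P(w2 | w1) = count(w1, w2) / count(w1)
    return bigram[(w1, w2)] / unigram[w1]

def sentence_prob(sent):
    words = ["START"] + sent + ["END"]
    prob = 1.0
    for w1, w2 in zip(words, words[1:]):
        prob *= p(w2, w1)
    return prob
```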

Page 28: Lecture 40 of 42


Translation Model?

Generative approach:

Mary did not slap the green witch
  → source-language morphological analysis
  → source parse tree
  → semantic representation
  → generate target structure
Maria no dió una bofetada a la bruja verde

Page 29: Lecture 40 of 42


Translation Model?

Generative story:

Mary did not slap the green witch
  → source-language morphological analysis
  → source parse tree
  → semantic representation
  → generate target structure
Maria no dió una bofetada a la bruja verde

What are all the possible moves and their associated probability tables?

Page 30: Lecture 40 of 42


The Classic Translation Model: Word Substitution/Permutation [IBM Model 3, Brown et al., 1993]

Generative approach:

Mary did not slap the green witch
  n(3 | slap)   [fertility]
Mary not slap slap slap the green witch
  P-Null   [NULL insertion]
Mary not slap slap slap NULL the green witch
  t(la | the)   [word-for-word translation]
Maria no dió una bofetada a la verde bruja
  d(j | i)   [distortion / permutation]
Maria no dió una bofetada a la bruja verde

Probabilities can be learned from raw bilingual text.

Page 31: Lecture 40 of 42


Statistical Machine Translation

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

All word alignments equally likely

All P(french-word | english-word) equally likely

Page 32: Lecture 40 of 42


Statistical Machine Translation

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

“la” and “the” observed to co-occur frequently,so P(la | the) is increased.

Page 33: Lecture 40 of 42


Statistical Machine Translation

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

“house” co-occurs with both “la” and “maison”, butP(maison | house) can be raised without limit, to 1.0,

while P(la | house) is limited because of “the”

(pigeonhole principle)

Page 34: Lecture 40 of 42


Statistical Machine Translation

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

settling down after another iteration

Page 35: Lecture 40 of 42


Statistical Machine Translation

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

Inherent hidden structure revealed by EM training! For details, see:

• “A Statistical MT Tutorial Workbook” (Knight, 1999)
• “The Mathematics of Statistical Machine Translation” (Brown et al., 1993)
• Software: GIZA++
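A minimal sketch of the EM loop behind these slides, in the style of IBM Model 1 (simpler than Model 3: uniform initialization, no NULL word, no fertility or distortion), on the slides' toy corpus:

```python
from collections import defaultdict

# IBM Model 1 style EM: start uniform, then alternate expected-count
# collection (E step) and re-normalization (M step). t(maison | house)
# rises toward 1 while t(la | house) falls, because "la" is needed to
# explain "the" (the pigeonhole effect described on the slides).
pairs = [(["la", "maison"], ["the", "house"]),
         (["la", "maison", "bleue"], ["the", "blue", "house"]),
         (["la", "fleur"], ["the", "flower"])]

f_vocab = {f for fs, _ in pairs for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform t(f | e)

for _ in range(50):
    count = defaultdict(float)                # expected counts c(f, e)
    total = defaultdict(float)                # normalizers per e
    for fs, es in pairs:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / norm          # P(f aligned to e | pair)
                count[(f, e)] += c
                total[e] += c
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]              # re-estimate t(f | e)
```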

Page 36: Lecture 40 of 42


Statistical Machine Translation

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

P(juste | fair) = 0.411P(juste | correct) = 0.027P(juste | right) = 0.020 …

new French sentence → possible English translations, to be rescored by the language model

Page 37: Lecture 40 of 42


Decoding for “Classic” Models

Of all conceivable English word strings, find the one maximizing P(e) x P(f | e)

Decoding is an NP-complete challenge (Knight, 1999)

Several search strategies are available

Each potential English output is called a hypothesis.

Page 38: Lecture 40 of 42


The Classic Results

la politique de la haine . (Foreign Original)
politics of hate . (Reference Translation)
the policy of the hatred . (IBM4 + N-grams + Stack)

nous avons signé le protocole . (Foreign Original)
we did sign the memorandum of agreement . (Reference Translation)
we have signed the protocol . (IBM4 + N-grams + Stack)

où était le plan solide ? (Foreign Original)
but where was the solid plan ? (Reference Translation)
where was the economic base ? (IBM4 + N-grams + Stack)

the Ministry of Foreign Trade and Economic Cooperation, including foreign direct investment 40.007 billion US dollars today provide data include that year to November china actually using foreign 46.959 billion US dollars and

Page 39: Lecture 40 of 42


Flaws of Word-Based MT

Multiple English words for one French word
  IBM models can do one-to-many (fertility) but not many-to-one

Phrasal translation
  “real estate”, “note that”, “interest in”

Syntactic transformations
  Verb at the beginning in Arabic
  Translation model penalizes any proposed re-ordering
  Language model not strong enough to force the verb to move to the right place

Page 40: Lecture 40 of 42


Phrase-Based Statistical MT

Page 41: Lecture 40 of 42


Phrase-Based Statistical MT

Foreign input segmented into phrases
  A “phrase” is any sequence of words

Each phrase is probabilistically translated into English
  P(to the conference | zur Konferenz)
  P(into the meeting | zur Konferenz)

Phrases are probabilistically re-ordered

See [Koehn et al., 2003] for an intro. This is state-of-the-art!

Example: Morgen fliege ich nach Kanada zur Konferenz → Tomorrow I will fly to the conference in Canada

Page 42: Lecture 40 of 42


Advantages of Phrase-Based

Many-to-many mappings can handle non-compositional phrases

Local context is very useful for disambiguating
  “interest rate” … vs. “interest in” …

The more data, the longer the learned phrases
  Sometimes whole sentences

Page 43: Lecture 40 of 42


How to Learn the Phrase Translation Table?

One method: “alignment templates” (Och et al, 1999)

Start with word alignment, build phrases from that.

[Alignment grid: English words Mary, did, not, slap, the, green, witch against Spanish words Maria no dió una bofetada a la bruja verde.]

This word-to-word alignment is a by-product of training a translation model like IBM Model 3.

This is the best (or “Viterbi”) alignment.


Page 45: Lecture 40 of 42


IBM Models are 1-to-Many

Run IBM-style aligner in both directions, then merge:

  E→F best alignment
  F→E best alignment
    → MERGE → union or intersection
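The merge step can be sketched with toy link sets (the links are invented; practical systems also refine between intersection and union with heuristics such as grow-diag):

```python
# Symmetrizing two one-to-many alignments: intersection is high-precision,
# union is high-recall. Links are (source index, target index) pairs.
ef = {(0, 0), (1, 1), (1, 2)}   # links from the E->F aligner
fe = {(0, 0), (1, 2), (2, 2)}   # links from the F->E aligner

intersection = ef & fe          # keep only links both aligners agree on
union = ef | fe                 # keep every link either aligner proposed
```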

Page 46: Lecture 40 of 42


How to Learn the Phrase Translation Table?

Collect all phrase pairs that are consistent with the word alignment

[Alignment grid as above: Mary, did, not, slap, the, green, witch against Maria no dió una bofetada a la bruja verde, with one example phrase pair boxed.]

Page 47: Lecture 40 of 42


Consistent with Word Alignment

Phrase alignment must contain all alignment points for all the words in both phrases!

[Three small grids over “Maria no dió” / “Mary did not slap”: one phrase box is consistent; the other two are inconsistent because an alignment point for a word inside the box falls outside it.]
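The consistency check above can be sketched as a brute-force phrase-pair extractor (a simplified version of the Och/Koehn procedure; the span representation and toy alignment are mine):

```python
# Extract all phrase pairs consistent with a word alignment: a source
# span [i1, i2] and target span [j1, j2] are consistent when every
# alignment link touching either span lies entirely inside the box.
def extract_phrases(alignment, src_len, tgt_len, max_len=4):
    phrases = set()
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_len, src_len)):
            for j1 in range(tgt_len):
                for j2 in range(j1, min(j1 + max_len, tgt_len)):
                    links = [(i, j) for (i, j) in alignment
                             if i1 <= i <= i2 or j1 <= j <= j2]
                    inside = all(i1 <= i <= i2 and j1 <= j <= j2
                                 for (i, j) in links)
                    if links and inside:
                        phrases.add(((i1, i2), (j1, j2)))
    return phrases

# Toy alignment for "Maria no dió" (0..2) / "Mary did not slap" (0..3):
alignment = {(0, 0), (1, 1), (1, 2), (2, 3)}
phrases = extract_phrases(alignment, 3, 4)
# ("no", "did not") = ((1,1),(1,2)) is consistent;
# ("no", "did") = ((1,1),(1,1)) is not, since the link (1,2) escapes the box.
```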

Page 48: Lecture 40 of 42


Word Alignment Induced Phrases

[Alignment grid: Mary, did, not, slap, the, green, witch against Maria no dió una bofetada a la bruja verde.]

(Maria, Mary) (no, did not) (slap, dió una bofetada) (la, the) (bruja, witch) (verde, green)

Page 49: Lecture 40 of 42


Word Alignment Induced Phrases

[Same alignment grid.]

(Maria, Mary) (no, did not) (slap, dió una bofetada) (la, the) (bruja, witch) (verde, green)

(a la, the) (dió una bofetada a, slap the)

Page 50: Lecture 40 of 42


Word Alignment Induced Phrases

[Same alignment grid.]

(Maria, Mary) (no, did not) (slap, dió una bofetada) (la, the) (bruja, witch) (verde, green)

(a la, the) (dió una bofetada a, slap the)

(Maria no, Mary did not) (no dió una bofetada, did not slap), (dió una bofetada a la, slap the)

(bruja verde, green witch)

Page 51: Lecture 40 of 42


Word Alignment Induced Phrases

[Same alignment grid.]

(Maria, Mary) (no, did not) (slap, dió una bofetada) (la, the) (bruja, witch) (verde, green)

(a la, the) (dió una bofetada a, slap the)

(Maria no, Mary did not) (no dió una bofetada, did not slap), (dió una bofetada a la, slap the)

(bruja verde, green witch) (Maria no dió una bofetada, Mary did not slap)

(a la bruja verde, the green witch) …

Page 52: Lecture 40 of 42


Word Alignment Induced Phrases

[Same alignment grid.]

(Maria, Mary) (no, did not) (slap, dió una bofetada) (la, the) (bruja, witch) (verde, green)

(a la, the) (dió una bofetada a, slap the)

(Maria no, Mary did not) (no dió una bofetada, did not slap), (dió una bofetada a la, slap the)

(bruja verde, green witch) (Maria no dió una bofetada, Mary did not slap)

(a la bruja verde, the green witch) …

(Maria no dió una bofetada a la bruja verde, Mary did not slap the green witch)

Page 53: Lecture 40 of 42


Phrase Pair Probabilities

A certain phrase pair (f-f-f, e-e-e) may appear many times across the bilingual corpus.

We hope so!

So, now we have a vast list of phrase pairs and their frequencies – how to assign probabilities?

Page 54: Lecture 40 of 42


Phrase Pair Probabilities

Basic idea: No EM training Just relative frequency: P(f-f-f | e-e-e) = count(f-f-f, e-e-e) / count(e-e-e)

Important refinements: Smooth using word probs P(f | e) for individual words connected in the word

alignment Some low count phrase pairs now have high probability, others have low

probability Discount for ambiguity

If phrase e-e-e can map to 5 different French phrases, due to the ambiguity of unaligned words, each pair gets a 1/5 count

Count BAD events too If phrase e-e-e doesn’t map onto any contiguous French phrase, increment event

count(BAD, e-e-e)
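A minimal sketch of the relative-frequency estimate with the 1/k ambiguity discount (the helper and the toy data are hypothetical; word-probability smoothing and BAD-event counting are omitted):

```python
from collections import Counter

def phrase_probs(extracted):
    """Relative-frequency estimate P(f | e) = count(f, e) / count(e).

    `extracted` is a list of per-sentence extractions, each a list of
    (f_phrase, e_phrase) pairs. When one e phrase maps ambiguously to
    k different f phrases within a sentence, each pair gets a 1/k count.
    """
    pair_count = Counter()
    e_count = Counter()
    for sent_pairs in extracted:
        # Group candidate f phrases per e phrase within this sentence.
        by_e = {}
        for f, e in sent_pairs:
            by_e.setdefault(e, []).append(f)
        for e, fs in by_e.items():
            for f in fs:
                pair_count[(f, e)] += 1.0 / len(fs)
                e_count[e] += 1.0 / len(fs)
    return {(f, e): c / e_count[e] for (f, e), c in pair_count.items()}

probs = phrase_probs([
    [("bruja verde", "green witch")],                                  # unambiguous
    [("bruja verde", "green witch"), ("verde bruja", "green witch")],  # ambiguous, 1/2 each
])
```

Here "green witch" gets total count 2; ("bruja verde", "green witch") accumulates 1 + 0.5 = 1.5 of it.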

Page 55: Lecture 40 of 42

Advanced Training Methods

Page 56: Lecture 40 of 42

Basic Model, Revisited

argmax_e P(e | f)
  = argmax_e P(e) x P(f | e) / P(f)
  = argmax_e P(e) x P(f | e)
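Since f is fixed, P(f) is a constant and can be dropped from the argmax. A toy ranking sketch in log space (all probabilities below are made-up illustrative numbers, not real model scores):

```python
import math

# Candidate translations e with hypothetical (P(e), P(f | e)) pairs.
# A fluent e has high LM probability P(e); a faithful e has high P(f | e).
candidates = {
    "Mary did not slap the green witch": (1e-9, 1e-4),
    "Mary no slap the witch green":      (1e-13, 1e-2),
}

def score(pe, pfe):
    # Work in log space to avoid floating-point underflow on long sentences.
    return math.log(pe) + math.log(pfe)

best = max(candidates, key=lambda e: score(*candidates[e]))
```

The language model overrules the channel model here: 1e-9 * 1e-4 beats 1e-13 * 1e-2.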

Page 57: Lecture 40 of 42

Basic Model, Revisited

argmax_e P(e | f)
  = argmax_e P(e) x P(f | e) / P(f)
  ≈ argmax_e P(e)^2.4 x P(f | e)   … works better!

Page 58: Lecture 40 of 42

Basic Model, Revisited

argmax_e P(e | f)
  = argmax_e P(e) x P(f | e) / P(f)
  ≈ argmax_e P(e)^2.4 x P(f | e) x length(e)^1.1

Rewards longer hypotheses, since these are unfairly punished by P(e)

Page 59: Lecture 40 of 42

Basic Model, Revisited

argmax_e P(e)^2.4 x P(f | e) x length(e)^1.1 x KS^3.7 x …

Lots of knowledge sources vote on any given hypothesis.

“Knowledge source” = “feature function” = “score component”.

Feature function simply scores a hypothesis with a real value.

(May be binary, as in “e has a verb”).

Problem: How to set the exponent weights?
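The weighted product of feature functions is equivalent to a weighted sum of their logs. A sketch with hypothetical feature functions and illustrative values (in practice the exponent weights are tuned automatically, e.g. by minimum error rate training [Och, 2003]):

```python
import math

def loglinear_score(hyp, features, weights):
    """sum_i w_i * log f_i(hyp), i.e. log of prod_i f_i(hyp)^(w_i).

    The exponents on the slide (2.4, 1.1, 3.7, ...) are the weights w_i.
    """
    return sum(w * math.log(f(hyp)) for f, w in zip(features, weights))

# Hypothetical knowledge sources for two toy hypotheses "a" and "b":
# a language model score, a translation model score, and a length bonus.
lm     = lambda h: {"a": 1e-6, "b": 1e-8}[h]
tm     = lambda h: {"a": 1e-4, "b": 1e-2}[h]
length = lambda h: {"a": 7.0,  "b": 6.0}[h]

weights = [2.4, 1.0, 1.1]
best = max(["a", "b"], key=lambda h: loglinear_score(h, [lm, tm, length], weights))
```

Changing the weights changes which hypothesis wins, which is exactly why setting them well matters.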

Page 60: Lecture 40 of 42

MT Pyramid

[Figure: the MT pyramid. The SOURCE and TARGET languages can be bridged at increasing levels of abstraction: words to words, phrases to phrases, syntax to syntax, semantics to semantics, converging at a single interlingua at the apex.]

Page 61: Lecture 40 of 42

Why Syntax?

Need much more grammatical output

Need accurate control over re-ordering

Need accurate insertion of function words

Word translations need to depend on grammatically-related words

Page 62: Lecture 40 of 42

[Figure: the Yamada/Knight noisy-channel syntax model applied to "he adores listening to music". An English parse tree, Parse Tree(E), is transformed into the Japanese sentence, Sentence(J), in four steps:
  1. Reorder: permute child sequences at each node (e.g. VB1 VB2 becomes VB2 VB1; TO NN becomes NN TO, giving "music to").
  2. Insert: add Japanese function words at tree nodes (ha, ga, no, desu).
  3. Translate: translate each leaf (he → kare, music → ongaku, to → wo, listening → kiku, adores → daisuki).
  4. Take Leaves: read off the leaves to obtain "Kare ha ongaku wo kiku no ga daisuki desu".]

Yamada/Knight 01: Modeling and Training

Page 63: Lecture 40 of 42

Japanese/English Reorder Table

  Original Order   Reordering     P(reorder | original)
  PRP VB1 VB2      PRP VB1 VB2    0.074
                   PRP VB2 VB1    0.723
                   VB1 PRP VB2    0.061
                   VB1 VB2 PRP    0.037
                   VB2 PRP VB1    0.083
                   VB2 VB1 PRP    0.021
  VB TO            VB TO          0.107
                   TO VB          0.893
  TO NN            TO NN          0.251
                   NN TO          0.749

For French/English, useful parameters like P(N ADJ | ADJ N).
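The table is just a conditional distribution over child orders, keyed by the original sequence. A sketch using the probabilities above (leaving unseen patterns unchanged is my own fallback choice, not specified on the slide):

```python
# Reorder probabilities P(reorder | original) from the table above,
# keyed by the original child sequence of a parse-tree node.
REORDER = {
    ("PRP", "VB1", "VB2"): {
        ("PRP", "VB1", "VB2"): 0.074, ("PRP", "VB2", "VB1"): 0.723,
        ("VB1", "PRP", "VB2"): 0.061, ("VB1", "VB2", "PRP"): 0.037,
        ("VB2", "PRP", "VB1"): 0.083, ("VB2", "VB1", "PRP"): 0.021,
    },
    ("VB", "TO"): {("VB", "TO"): 0.107, ("TO", "VB"): 0.893},
    ("TO", "NN"): {("TO", "NN"): 0.251, ("NN", "TO"): 0.749},
}

def most_likely_reorder(children):
    """Return the child order with the highest P(reorder | original)."""
    dist = REORDER.get(children)
    if dist is None:
        return children  # unseen pattern: leave the order unchanged
    return max(dist, key=dist.get)

order = most_likely_reorder(("PRP", "VB1", "VB2"))
```

For English-to-Japanese, the most likely move is swapping the two verbs (probability 0.723), matching the SOV reordering in the figure above.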

Page 64: Lecture 40 of 42

Casting Syntax MT Models As Tree Transducer Automata [Graehl & Knight 04]

[Figure: example tree-transducer rules of the form q Tree(E) → Tree(F). The examples shown include:
  a rule flattening S(NP1, VP(VB, NP2)) to S(NP1, VP, NP2);
  a rule mapping the English existential S "there are two men" onto the Spanish S "hay dos hombres";
  a rule reversing NP(NP1, PP(P:of, NP2)) to NP(NP2, P, NP1);
  a rule mapping the English wh-question "Who did NP see" onto a Japanese tree with "dare o", particle "ga", and sentence-final "ka".]

These rules cover:
  Non-local Re-Ordering (English/Arabic)
  Non-constituent Phrasal Translation (English/Spanish)
  Lexicalized Re-Ordering (English/Chinese)
  Long-distance Re-Ordering (English/Japanese)

Page 65: Lecture 40 of 42

Summary

Phrase-based models are state-of-the-art
  Word alignments
  Phrase pair extraction & probabilities
  N-gram language models
  Beam search decoding
  Feature functions & learning weights

But the output is not English
  Fluency must be improved
  Better translation of person names, organizations, locations
  More automatic acquisition of parallel data, exploitation of monolingual data
  Need good accuracy across a variety of domains/languages

Page 66: Lecture 40 of 42

Available Resources

Bilingual corpora
  100m+ words of Chinese/English and Arabic/English, LDC (www.ldc.upenn.edu)
  Lots of French/English, Spanish/French/English, LDC
  European Parliament (sentence-aligned), 11 languages, Philipp Koehn, ISI (www.isi.edu/~koehn/publications/europarl)
  20m words (sentence-aligned) of English/French, Ulrich Germann, ISI (www.isi.edu/natural-language/download/hansard/)

Sentence alignment
  Dan Melamed, NYU (www.cs.nyu.edu/~melamed/GMA/docs/README.htm)
  Xiaoyi Ma, LDC (Champollion)

Word alignment
  GIZA, JHU Workshop ’99 (www.clsp.jhu.edu/ws99/projects/mt/)
  GIZA++, RWTH Aachen (www-i6.Informatik.RWTH-Aachen.de/web/Software/GIZA++.html)
  Manually word-aligned test corpus (500 French/English sentence pairs), RWTH Aachen
  Shared task, NAACL-HLT’03 workshop

Decoding
  ISI ReWrite Model 4 decoder (www.isi.edu/licensed-sw/rewrite-decoder/)
  ISI Pharaoh phrase-based decoder

Statistical MT Tutorial Workbook, ISI (www.isi.edu/~knight/)
Annual common-data evaluation, NIST (www.nist.gov/speech/tests/mt/index.htm)

Page 67: Lecture 40 of 42

Some Papers Referenced on Slides

ACL: [Och, Tillmann & Ney, 1999]; [Och & Ney, 2000]; [Germann et al., 2001]; [Yamada & Knight, 2001, 2002]; [Papineni et al., 2002]; [Alshawi et al., 1998]; [Collins, 1997]; [Koehn & Knight, 2003]; [Al-Onaizan & Knight, 2002]; [Och & Ney, 2002]; [Och, 2003]; [Koehn et al., 2003]
EMNLP: [Marcu & Wong, 2002]; [Fox, 2002]; [Munteanu & Marcu, 2002]
AI Magazine: [Knight, 1997]
www.isi.edu/~knight: [MT Tutorial Workbook]
AMTA: [Soricut et al., 2002]; [Al-Onaizan & Knight, 1998]
EACL: [Cmejrek et al., 2003]
Computational Linguistics: [Brown et al., 1993]; [Knight, 1999]; [Wu, 1997]
AAAI: [Koehn & Knight, 2000]
IWNLG: [Habash, 2002]
MT Summit: [Charniak, Knight & Yamada, 2003]
NAACL: [Koehn, Marcu & Och, 2003]; [Germann, 2003]; [Graehl & Knight, 2004]; [Galley, Hopkins, Knight & Marcu, 2004]

Page 68: Lecture 40 of 42

Terminology

Simple Bayes, aka Naïve Bayes
  Zero counts: case where an attribute value never occurs with a label in D
  No-match approach: assign a c/m probability to P(xik | vj)
  m-estimate, aka Laplace approach: assign a Bayesian estimate to P(xik | vj)

Learning in Natural Language Processing (NLP)
  Training data: text corpora (collections of representative documents)
  Statistical Queries (SQ) oracle: answers queries about P(xik, vj) for x ~ D
  Linear Statistical Queries (LSQ) algorithm: classification using f(oracle response)
    Includes: Naïve Bayes, BOC
    Other examples: Hidden Markov Models (HMMs), maximum entropy
  Problems: word sense disambiguation, part-of-speech tagging
  Applications
    Spelling correction, conversational agents
    Information retrieval: web and digital library searches
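The m-estimate above can be written directly; with prior p = 1/k and equivalent sample size m = k it reduces to Laplace (add-one) smoothing, and it keeps zero-count attribute values from zeroing out the whole Naïve Bayes product:

```python
def m_estimate(n_c, n, p, m):
    """m-estimate of probability: (n_c + m*p) / (n + m).

    n_c: count of the attribute value occurring with the class label;
    n:   total count for the class label;
    p:   prior estimate of the probability (e.g. uniform 1/k);
    m:   equivalent sample size (weight given to the prior).
    """
    return (n_c + m * p) / (n + m)

# Zero-count case: value never seen with this label, k = 10 values.
k = 10
prob = m_estimate(n_c=0, n=50, p=1.0 / k, m=k)  # nonzero despite n_c = 0
```

Without smoothing this probability would be 0/50 = 0; the m-estimate pulls it toward the prior instead.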

Page 69: Lecture 40 of 42

Summary Points

More on Simple Bayes, aka Naïve Bayes
  More examples
  Classification: choosing between two classes; general case
  Robust estimation of probabilities: SQ

Learning in Natural Language Processing (NLP)
  Learning over text: problem definitions
  Statistical Queries (SQ) / Linear Statistical Queries (LSQ) framework
    Oracle
    Algorithms: search for h using only (L)SQs
  Bayesian approaches to NLP
    Issues: word sense disambiguation, part-of-speech tagging
    Applications: spelling; reading/posting news; web search, IR, digital libraries

Next Week: Section 6.11, Mitchell; Pearl and Verma
  Read: Charniak tutorial, “Bayesian Networks without Tears”
  Skim: Chapter 15, Russell and Norvig; Heckerman slides


Recommended