LING 388: Language and Computers
Sandiway Fong
Lecture 27: 12/1
Today
• Three things
– a bit more on the technology behind Statistical Machine Translation (SMT)
– Homework 4 Review
– Class Evaluations
Part 1
Last Time
• Statistical Machine Translation (SMT)
  – popular now
  – Language Weaver (Arabic, also French, etc.)
    • newest addition: Persian
• "Translating is EU's new boom industry" (2004 article)
  – the market is there: opportunities for machine translation?
Statistical MT
• Avoid the explicit construction of linguistically sophisticated models of grammar
• Pioneered by IBM researchers (Brown et al., 1990)
  – Language Model: Pr(S), estimated by n-grams
  – Translation Model: Pr(T|S), estimated through alignment models
N-grams
• language model: P(sentence)
• idea: collect statistics on the co-occurrence of adjacent words
• Brown corpus (1 million words):
  – word w     frequency(w)   probability(w)
  – the        69,971         0.070
  – rabbit     11             0.000011
• example: Just then, the white …
  – expectation is p(white rabbit) > p(white the)
  – but p(the) > p(rabbit)
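• A minimal sketch of this relative-frequency estimate in Prolog (the count/2 facts and corpus_size/1 below are hypothetical stand-ins for the Brown corpus figures above):

    % hypothetical unigram counts, echoing the table above
    count(the, 69971).
    count(rabbit, 11).
    corpus_size(1000000).

    % probability(?W, -P): relative-frequency estimate freq(w)/N
    probability(W, P) :-
        count(W, F),
        corpus_size(N),
        P is F / N.

  For example, ?- probability(the, P). gives P = 0.069971, i.e. roughly the 0.070 in the table.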
Statistical Machine Translation
• Machine Translation
  – Source sentence S
  – Target sentence T
  – every pair (S,T) has a probability
  – P(T|S) = probability that the target is T given source S
Statistical Machine Translation
• The Language Model: P(S)
  – bigrams: adjacent word pairs
    • w1 w2 w3 w4 w5
    • w1w2, w2w3, w3w4, w4w5
  – sequences of words
    • S = w1 … wn
    • P(S) = P(w1)P(w2|w1)…P(wn|w1…wn-1)
      – product of the probability of each wi given its preceding context
    • problem: we need to know too many probabilities
  – bigram approximation
    • limit the context to the previous word
    • P(S) ≈ P(w1)P(w2|w1)…P(wn|wn-1)
  – bigram probability estimation from corpora
    • P(wi|wi-1) ≈ freq(wi-1wi)/freq(wi-1) in a corpus
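• The bigram approximation is easy to sketch in Prolog as well (bigram_count/3 and the extra count/2 facts are hypothetical toy counts; probability/2 is the unigram estimate sketched earlier):

    % hypothetical counts from some corpus
    count(just, 500).
    count(then, 400).
    bigram_count(just, then, 120).
    bigram_count(then, the, 60).

    % P(W2|W1) ≈ freq(W1 W2) / freq(W1)
    bigram_prob(W1, W2, P) :-
        bigram_count(W1, W2, F12),
        count(W1, F1),
        P is F12 / F1.

    % P(S) ≈ P(w1) * P(w2|w1) * ... * P(wn|wn-1)
    sentence_prob([W|Ws], P) :-
        probability(W, P1),
        chain(Ws, W, P1, P).

    chain([], _, P, P).
    chain([W|Ws], Prev, Acc, P) :-
        bigram_prob(Prev, W, PB),
        Acc1 is Acc * PB,
        chain(Ws, W, Acc1, P).

  With these toy counts, ?- sentence_prob([just,then,the], P). gives 0.0005 × 0.24 × 0.15 = 1.8e-5.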
Statistical Machine Translation
• The Translation Model: P(T|S)
  – alignment model:
    • assume there is a transfer relationship between source and target words
    • not necessarily 1-to-1
  – example:
    • S = w1 w2 w3 w4 w5 w6 w7
    • T = u1 u2 u3 u4 u5 u6 u7 u8 u9
    • w4 -> u3 u5
      – fertility of w4 = 2
    • distortion: w5 -> u9
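• To make the bookkeeping concrete, an alignment can be represented as a list of SourcePosition-TargetPositions pairs, so that fertility is just the length of a word's target list (a sketch; this representation is ours, not Brown et al.'s):

    % Alignment is a list of J-Us pairs: source position J maps to target positions Us
    fertility(Alignment, J, N) :-
        member(J-Us, Alignment),
        length(Us, N).

  For example, ?- fertility([4-[3,5], 5-[9]], 4, N). gives N = 2, matching w4 -> u3 u5 above.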
Statistical Machine Translation
• Alignment notation
  – use word positions in parentheses
  – no word position = no mapping
  – example:
    • (Les propositions ne seront pas mises en application maintenant | The(1) proposal(2) will(4) not(3,5) now(9) be implemented(6,7,8))
  – this particular alignment is not correct; it is an artifact of their algorithm
Statistical Machine Translation
• How to compute the probability of an alignment?
  – need to estimate:
    • fertility probabilities
      – P(fertility=n|w) = probability that word w has fertility n
    • distortion probabilities
      – P(i|j,l) = probability that the target word is at position i, given a source word at position j and target length l
  – example: (Le chien est battu par Jean | John(6) does beat(3,4) the(1) dog(2))
    • P(f=1|John) P(Jean|John) ×
    • P(f=0|does) ×
    • P(f=2|beat) P(est|beat) P(battu|beat) ×
    • P(f=1|the) P(Le|the) ×
    • P(f=1|dog) P(chien|dog) ×
    • P(f=1|<null>) P(par|<null>) × distortion probabilities…
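• A sketch of how such a product could be computed, assuming hypothetical fert_p/3 (fertility) and trans_p/3 (word-translation) parameter tables; the distortion term is omitted, as in the product above:

    % hypothetical parameters for part of the example
    fert_p(john, 1, 0.8).
    fert_p(beat, 2, 0.3).
    trans_p(jean, john, 0.9).
    trans_p(est, beat, 0.2).
    trans_p(battu, beat, 0.5).

    % score one source word W aligned to target words Us:
    % fertility probability times the word-translation probabilities
    word_score(W-Us, S) :-
        length(Us, N),
        fert_p(W, N, PF),
        trans_score(Us, W, PF, S).

    trans_score([], _, S, S).
    trans_score([U|Us], W, Acc, S) :-
        trans_p(U, W, PT),
        Acc1 is Acc * PT,
        trans_score(Us, W, Acc1, S).

    % multiply the per-word scores (no distortion term here)
    alignment_score([], 1.0).
    alignment_score([Pair|Pairs], P) :-
        word_score(Pair, S),
        alignment_score(Pairs, P0),
        P is S * P0.

  With these toy numbers, ?- alignment_score([john-[jean], beat-[est,battu]], P). gives 0.8×0.9 × 0.3×0.2×0.5 = 0.0216.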
Statistical Machine Translation
• Not done yet
  – given T, the translation problem is to find the S that maximizes P(S)P(T|S)
  – can't consider all possible S in the language
• Idea (Search):
  – construct the best S incrementally
    • start with a highly likely word transfer
    • find a valid alignment
    • extend the candidate S at each step
  – (Jean aime Marie | *)
  – (Jean aime Marie | John(1) *)
• Failure?
  – best S is not a good translation
    • language model failed, or
    • translation model failed
  – couldn't find the best S
    • search failure
Statistical Machine Translation
• Parameter Estimation
  – English/French, from the Hansard corpus
    • 100 million words
    • bilingual Canadian parliamentary proceedings
    • unaligned corpus
  – Language Model
    • P(S) from a bigram model
  – Translation Model
    • how to estimate this with an unaligned corpus?
    • used the EM (Expectation-Maximization) algorithm, an iterative algorithm for re-estimating probabilities
    • need:
      – P(u|w) for words u in T and w in S
      – P(n|w) for fertility n and w in S
      – P(i|j,l) for target position i, source position j, and target length l
Statistical Machine Translation
• Experiment 1: Parameter Estimation for the Translation Model
  – pick the 9,000 most common words for French and English
  – 40,000 sentence pairs
  – 81,000,000 parameters
  – initial guess: minimal assumptions
Statistical Machine Translation
• Experiment 1: results
  – (English) Hear, hear!
  – (French) Bravo!
Statistical Machine Translation
• Experiment 2: Translation from French to English
  – make the task manageable
    • English lexicon: the 1,000 most frequent English words in the corpus
    • French lexicon: the 1,700 most frequent French words in translations completely covered by the selected English words
  – 117,000 sentence pairs with words covered by the lexicons
  – 17 million parameters estimated for the translation model
  – bigram model of English
    • 570,000 sentences, 12 million words
  – 73 test sentences
    • categories: (exact, alternate, different), wrong, ungrammatical
Statistical Machine Translation
• Experiment 2: results
  – 48% of the test translations fell into the (exact, alternate, different) categories
  – editing effort: 776 keystrokes (system output) vs. 1,916 (Hansard)
Part 2
• Homework 4 Review
English Grammar: e21.pl
• DCG rules
    sbar(PA) --> np(X,wh), do(_,_), s_objectwh(_,S,P), {headof(X,O), PA =.. [P,S,O]}.
    sbar(S) --> s(S).
    s_objectwh(s(Y,Z),S,P) --> np(Y,_), vp_objectwh(Z), {headof(Y,S), headof(Z,P)}.
    s(PA) --> np(Y,_), vp(Z,_), {predarg(Y,Z,1,PA)}.
    np(np(Y),Q) --> pronoun(Y,Q).
    np(np(Y),notwh) --> proper_noun(Y).
    np(np(D,N),Q) --> det(D,Number), common_noun(N,Number,Q).
    vp(vp(v(died)),ed) --> [kicked,the,bucket].
    vp(vp(Y,Z),F) --> transitive(Y,F), np(Z,_).
    vp(vp(A,V),F) --> aux(A,F), transitive(V,en).
    vp_objectwh(vp(Y)) --> transitive(Y,root).
    det(det(the),_) --> [the].
    det(det(a),sg) --> [a].
    common_noun(n(bucket),sg,notwh) --> [bucket].
    common_noun(n(buckets),pl,notwh) --> [buckets].
    common_noun(n(apple),sg,notwh) --> [apple].
    common_noun(n(apples),pl,notwh) --> [apples].
    common_noun(n(man),sg,notwh) --> [man].
    common_noun(n(book),sg,notwh) --> [book].
    common_noun(n(books),pl,notwh) --> [books].
English Grammar: e21.pl (continued)
• Lexical entries
    pronoun(who,wh) --> [who].
    pronoun(what,wh) --> [what].
    proper_noun(john) --> [john].
    transitive(v(eats),s) --> [eats].
    transitive(v(ate),ed) --> [ate].
    transitive(v(eaten),en) --> [eaten].
    transitive(v(buy),root) --> [buy].
    transitive(v(buys),s) --> [buys].
    transitive(v(bought),ed) --> [bought].
    transitive(v(bought),en) --> [bought].
    transitive(v(kicks),s) --> [kicks].
    transitive(v(kicked),ed) --> [kicked].
    transitive(v(kicked),en) --> [kicked].
    aux(aux(was),ed) --> [was].
    aux(aux(is),s) --> [is].
    do(aux(does),s) --> [does].
    do(aux(did),ed) --> [did].
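• For example, with e21.pl loaded (together with the shared predarg/4 and headof/2 helpers listed with j21.pl below), the expected behavior is:

    ?- sbar(PA, [john,bought,the,book], []).
    PA = bought(john, book) .

    ?- sbar(PA, [what,did,john,buy], []).
    PA = buy(john, what) .

  Note that the first query in fact succeeds twice, once per tense feature of bought (ed and en); Question 2 below addresses the duplication.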
Japanese Grammar: j21.pl
• DCG rules
    s(PA) --> np(Y,Q1), nomcase, vp(Z,Q2), sf(Q1,Q2), {predarg(Y,Z,2,PA)}.
    vp(vp(Z,Y),Q) --> np(Z,Q), acccase, transitive(Y).
    transitive(v(katta)) --> [katta].
    nomcase --> [ga].
    acccase --> [o].
    np(np(taroo),notwh) --> [taroo].
    np(np(hon),notwh) --> [hon].
    np(np(dare),wh) --> [dare].
    np(np(nani),wh) --> [nani].
    sf(wh,notwh) --> [ka].
    sf(notwh,wh) --> [ka].
    sf(notwh,notwh) --> [].
    sf(wh,wh) --> [ka].
• Shared helpers
    predarg(X,Y,Order,PA) :- headof(X,S), headof(Y,P), order(Order,Y,NP), headof(NP,O), PA =.. [P,S,O].
    predarg(X,Y,_,PA) :- headof(X,S), headof(Y,P), Y = vp(_), PA =.. [P,S].
    order(1,vp(_,NP),NP).
    order(2,vp(NP,_),NP).
    headof(np(_,n(N)),N).
    headof(vp(v(V),_),V).
    headof(vp(_,v(V)),V).
    headof(vp(v(V)),V).
    headof(np(N),N).
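• A quick check of the Japanese grammar (expected behavior with j21.pl loaded):

    ?- s(PA, [taroo,ga,hon,o,katta], []).
    PA = katta(taroo, hon) .

    ?- s(PA, [dare,ga,hon,o,katta,ka], []).
    PA = katta(dare, hon) .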
Translator: t.pl
• Prolog translation code
    translate(E,J) :-          % translator
        sbar(X,E,[]),          % English grammar
        mapPA(X,Xp),           % map predicate-argument structure
        js(Xp,J,[]).           % Japanese grammar

    mapPA(E,J) :-              % map predicate-argument structure
        E =.. [P,S,O],
        je(PJ,P),
        je(SJ,S),
        je(OJ,O),
        J =.. [PJ,SJ,OJ].

• Bilingual dictionary
    je(katta,bought).
    je(hon,book).
    je(taroo,john).
    je(dare,who).
    je(nani,what).
    je(katta,buy).
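• Putting the pieces together, the expected behavior (assuming the j21.pl rules are loaded under the js/jnp/jvp names that t.pl uses) is:

    ?- translate([john,bought,the,book], J).
    J = [taroo,ga,hon,o,katta] .

    ?- translate(E, [taroo,ga,hon,o,katta]).
    E = [john,bought,the,book] ;
    E = [john,bought,the,book] .

  The duplicate answer to the second query is exactly what Question 2 below asks about.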
Question 2: Tense
• Homework Question
  – (A) English morphology and tense
  – (1) (1pt) Why does
      ?- translate(X,[taroo,ga,hon,o,katta]).
    return duplicate answers?
  – (2) (2pts) Fix the problem.
  – (3) (1pt) *John buy the book (cf. John buys the book) is accepted by the English grammar. Fix the problem.
  – Submit both your grammar and relevant examples.
Question 2: Tense
• English Grammar
  – code:
    vp(vp(Y,Z),F) --> transitive(Y,F), {F \== en, F \== root}, np(Z,_).
  – replacing the unrestricted rule:
    vp(vp(Y,Z),F) --> transitive(Y,F), np(Z,_).
  – this blocks both:
    • the duplicate parse of John bought the book (bought = -en form)
    • *John buy the book (buy = root form)
Question 2: Tense
• Homework Question
  – (B) Tense and predicate-argument structure
  – let's expand the grammar slightly
  – assume kau (buy(s)) is the present-tense form of katta (bought)
  – (3pts) Modify the translator to respect tense when translating between:
    • John buys a book ⇔ taroo-ga hon-o kau
    • John bought a book ⇔ taroo-ga hon-o katta
  – Submit both your code and all relevant translations, e.g.
    • ?- translate([john,buys,a,book],X).
    • ?- translate(X,[taroo,ga,hon,o,kau]).
Question 2: Tense
• Translator
  – code:
    je(kau,buys).
• Japanese grammar
  – code:
    jtransitive(v(kau)) --> [kau].
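• With just these two additions, tense is respected because each English verb form maps to its own Japanese form in the je/2 dictionary; expected behavior:

    ?- translate([john,buys,a,book], X).
    X = [taroo,ga,hon,o,kau] .

    ?- translate([john,bought,a,book], X).
    X = [taroo,ga,hon,o,katta] .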
Question 3: Yes-No Questions
• Homework Question
  – modify the English and Japanese grammars to incorporate yes-no questions
  – modify the translator to operate on yes-no questions
  – examples:
    • Did John buy a book?       yesno(buy(john,book))
    • Taroo-ga hon-o katta ka    yesno(katta(taroo,hon))
Question 3: Yes-No Questions
• English grammar
  – code:
    sbar(yesno(PA)) --> do(_,_), s_rootv(PA).
    s_rootv(PA) --> np(Y,_), vp_rootv(Z,_), {predarg(Y,Z,1,PA)}.
    vp_rootv(vp(Y,Z),root) --> transitive(Y,root), np(Z,_).
• Japanese grammar
  – code:
    js(yesno(PA)) --> jnp(Y,notwh), nomcase, jvp(Z,notwh), [ka], {predarg(Y,Z,2,PA)}.
• Translator
  – code:
    mapPA(yesno(E),yesno(J)) :- mapPA(E,J).
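• Expected behavior of the extended translator (a sketch, assuming all the pieces above are loaded together):

    ?- translate([did,john,buy,a,book], X).
    X = [taroo,ga,hon,o,katta,ka] .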
Question 4: English Idiom
• Complete the translator so that
  – John kicked the bucket
    • has both a literal and an idiomatic translation:
      – Taroo-ga buketsu-o ketta (literal)
      – Taroo-ga shinda (idiomatic)
    • buketsu = bucket, shinda = died, ketta = kicked
  – John kicked the buckets
    • has only a literal translation:
      – Taroo-ga buketsu-o ketta
    • (assuming Japanese does not distinguish number)
Question 4: English Idiom
• Japanese grammar
  – code:
    jvp(vp(Y),notwh) --> jintransitive(Y).
    jintransitive(v(shinda)) --> [shinda].
    jtransitive(v(ketta)) --> [ketta].
    jnp(np(buketsu),notwh) --> [buketsu].
• Translator
  – code:
    je(ketta,kicked).
    je(shinda,died).
    je(buketsu,bucket).
    je(buketsu,buckets).
    mapPA(E,J) :-              % for intransitives: P(S) <=> PJ(SJ)
        E =.. [P,S],
        je(PJ,P),
        je(SJ,S),
        J =.. [PJ,SJ].
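• Expected behavior (a sketch; the idiomatic reading comes from the e21.pl rule vp(vp(v(died)),ed) --> [kicked,the,bucket], the literal one from the ordinary transitive parse; solution order depends on clause order):

    ?- translate([john,kicked,the,bucket], X).
    X = [taroo,ga,shinda] ;
    X = [taroo,ga,buketsu,o,ketta] .

    ?- translate([john,kicked,the,buckets], X).
    X = [taroo,ga,buketsu,o,ketta] .       % literal only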
Question 5: Japanese Idiom
• examples
  – taroo-ga sensei-ni goma-o sutta
    • taroo-NOM teacher-DAT sesame-ACC ground
    • "John flattered the teacher"
  – taroo-ga hanako-ni goma-o sutta
    • taroo-NOM hanako-DAT sesame-ACC ground
    • "John flattered Mary"
  – ni = dative (dat) Case marker
  – odateta is the Japanese counterpart of flattered
Question 5: Japanese Idiom
• English grammar
  – code:
    common_noun(n(teacher),sg,notwh) --> [teacher].
    proper_noun(mary) --> [mary].
    transitive(v(flattered),ed) --> [flattered].
• Japanese grammar
  – code:
    jvp(vp(Z,v(odateta)),Q) --> jnp(Z,Q), datcase, [goma], acccase, [sutta].
    jtransitive(v(odateta)) --> [odateta].
    jnp(np(hanako),notwh) --> [hanako].
    jnp(np(sensei),notwh) --> [sensei].
    datcase --> [ni].
• Translator
  – code:
    je(odateta,flattered).
    je(hanako,mary).
    je(sensei,teacher).
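• Expected behavior (a sketch, assuming j21.pl's transitive VP rule is also present under the jvp name; solution order depends on clause order):

    ?- translate([john,flattered,the,teacher], X).
    X = [taroo,ga,sensei,ni,goma,o,sutta] ;
    X = [taroo,ga,sensei,o,odateta] .

    ?- translate([john,flattered,mary], X).
    X = [taroo,ga,hanako,ni,goma,o,sutta] ;
    X = [taroo,ga,hanako,o,odateta] .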
Part 3
• Class Evaluations