LING 388: Language and Computers
Sandiway Fong
Lecture 27: 12/1
Today
• Three things
– a bit more on the technology behind Statistical Machine Translation (SMT)
– Homework 4 Review
– Class Evaluations
Part 1
Last Time
• Statistical Machine Translation (SMT)
  – popular now
  – Language Weaver (Arabic, also French, etc.)
    • newest addition: Persian
• "Translating is EU's new boom industry" (2004 article)
  – the market is there: opportunities for machine translation?
Statistical MT
• Avoid the explicit construction of linguistically sophisticated models of grammar
• Pioneered by IBM researchers (Brown et al., 1990)
  – Language Model: Pr(S), estimated by n-grams
  – Translation Model: Pr(T|S), estimated through alignment models
N-grams
• language model: P(sentence)
• idea: collect statistics on the co-occurrence of adjacent words
• Brown corpus (1 million words):
  – word w     frequency(w)   probability(w)
  – the        69,971         0.070
  – rabbit     11             0.000011
• example: Just then, the white …
  – expectation is p(white rabbit) > p(white the)
  – but p(the) > p(rabbit)
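• A minimal sketch of this relative-frequency estimate in Prolog (the count/2 facts and corpus_size/1 below are hypothetical stand-ins for the Brown corpus figures above):

    % hypothetical unigram counts, echoing the table above
    count(the, 69971).
    count(rabbit, 11).
    corpus_size(1000000).

    % probability(?W, -P): relative-frequency estimate freq(w)/N
    probability(W, P) :-
        count(W, F),
        corpus_size(N),
        P is F / N.

  For example, ?- probability(the, P). gives P = 0.069971, i.e. roughly the 0.070 in the table.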
Statistical Machine Translation
• Machine Translation
  – Source sentence S
  – Target sentence T
  – every pair (S,T) has a probability
  – P(T|S) = probability that the target is T given source S
Statistical Machine Translation
• The Language Model: P(S)
  – bigrams: adjacent word pairs
    • w1 w2 w3 w4 w5
    • w1w2, w2w3, w3w4, w4w5
  – sequences of words
    • S = w1 … wn
    • P(S) = P(w1)P(w2|w1)…P(wn|w1…wn-1)
      – product of the probability of each wi given its preceding context
    • problem: we need to know too many probabilities
  – bigram approximation
    • limit the context to the previous word
    • P(S) ≈ P(w1)P(w2|w1)…P(wn|wn-1)
  – bigram probability estimation from corpora
    • P(wi|wi-1) ≈ freq(wi-1wi)/freq(wi-1) in a corpus
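• The bigram approximation is easy to sketch in Prolog as well (bigram_count/3 and the extra count/2 facts are hypothetical toy counts; probability/2 is the unigram estimate sketched earlier):

    % hypothetical counts from some corpus
    count(just, 500).
    count(then, 400).
    bigram_count(just, then, 120).
    bigram_count(then, the, 60).

    % P(W2|W1) ≈ freq(W1 W2) / freq(W1)
    bigram_prob(W1, W2, P) :-
        bigram_count(W1, W2, F12),
        count(W1, F1),
        P is F12 / F1.

    % P(S) ≈ P(w1) * P(w2|w1) * ... * P(wn|wn-1)
    sentence_prob([W|Ws], P) :-
        probability(W, P1),
        chain(Ws, W, P1, P).

    chain([], _, P, P).
    chain([W|Ws], Prev, Acc, P) :-
        bigram_prob(Prev, W, PB),
        Acc1 is Acc * PB,
        chain(Ws, W, Acc1, P).

  With these toy counts, ?- sentence_prob([just,then,the], P). gives 0.0005 × 0.24 × 0.15 = 1.8e-5.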
Statistical Machine Translation
• The Translation Model: P(T|S)
  – alignment model:
    • assume there is a transfer relationship between source and target words
    • not necessarily 1-to-1
  – example:
    • S = w1 w2 w3 w4 w5 w6 w7
    • T = u1 u2 u3 u4 u5 u6 u7 u8 u9
    • w4 -> u3 u5
      – fertility of w4 = 2
    • distortion: w5 -> u9
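• To make the bookkeeping concrete, an alignment can be represented as a list of SourcePosition-TargetPositions pairs, so that fertility is just the length of a word's target list (a sketch; this representation is ours, not Brown et al.'s):

    % Alignment is a list of J-Us pairs: source position J maps to target positions Us
    fertility(Alignment, J, N) :-
        member(J-Us, Alignment),
        length(Us, N).

  For example, ?- fertility([4-[3,5], 5-[9]], 4, N). gives N = 2, matching w4 -> u3 u5 above.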
Statistical Machine Translation
• Alignment notation
  – use word positions in parentheses
  – no word position = no mapping
  – example:
    • (Les propositions ne seront pas mises en application maintenant | The(1) proposal(2) will(4) not(3,5) now(9) be implemented(6,7,8))
  – this particular alignment is not correct; it is an artifact of their algorithm
Statistical Machine Translation
• How to compute the probability of an alignment?
  – need to estimate:
    • fertility probabilities
      – P(fertility=n|w) = probability that word w has fertility n
    • distortion probabilities
      – P(i|j,l) = probability that the target word is at position i, given a source word at position j and target length l
  – example: (Le chien est battu par Jean | John(6) does beat(3,4) the(1) dog(2))
    • P(f=1|John) P(Jean|John) ×
    • P(f=0|does) ×
    • P(f=2|beat) P(est|beat) P(battu|beat) ×
    • P(f=1|the) P(Le|the) ×
    • P(f=1|dog) P(chien|dog) ×
    • P(f=1|<null>) P(par|<null>) × distortion probabilities…
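• A sketch of how such a product could be computed, assuming hypothetical fert_p/3 (fertility) and trans_p/3 (word-translation) parameter tables; the distortion term is omitted, as in the product above:

    % hypothetical parameters for part of the example
    fert_p(john, 1, 0.8).
    fert_p(beat, 2, 0.3).
    trans_p(jean, john, 0.9).
    trans_p(est, beat, 0.2).
    trans_p(battu, beat, 0.5).

    % score one source word W aligned to target words Us:
    % fertility probability times the word-translation probabilities
    word_score(W-Us, S) :-
        length(Us, N),
        fert_p(W, N, PF),
        trans_score(Us, W, PF, S).

    trans_score([], _, S, S).
    trans_score([U|Us], W, Acc, S) :-
        trans_p(U, W, PT),
        Acc1 is Acc * PT,
        trans_score(Us, W, Acc1, S).

    % multiply the per-word scores (no distortion term here)
    alignment_score([], 1.0).
    alignment_score([Pair|Pairs], P) :-
        word_score(Pair, S),
        alignment_score(Pairs, P0),
        P is S * P0.

  With these toy numbers, ?- alignment_score([john-[jean], beat-[est,battu]], P). gives 0.8×0.9 × 0.3×0.2×0.5 = 0.0216.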
Statistical Machine Translation
• Not done yet
  – given T, the translation problem is to find the S that maximizes P(S)P(T|S)
  – can't consider all possible S in the language
• Idea (Search):
  – construct the best S incrementally
    • start with a highly likely word transfer
    • find a valid alignment
    • extend the candidate S at each step
  – (Jean aime Marie | *)
  – (Jean aime Marie | John(1) *)
• Failure?
  – best S is not a good translation
    • language model failed, or
    • translation model failed
  – couldn't find the best S
    • search failure
Statistical Machine Translation
• Parameter Estimation
  – English/French, from the Hansard corpus
    • 100 million words
    • bilingual Canadian parliamentary proceedings
    • unaligned corpus
  – Language Model
    • P(S) from a bigram model
  – Translation Model
    • how to estimate this with an unaligned corpus?
    • used the EM (Expectation-Maximization) algorithm, an iterative algorithm for re-estimating probabilities
    • need:
      – P(u|w) for words u in T and w in S
      – P(n|w) for fertility n and w in S
      – P(i|j,l) for target position i, source position j, and target length l
Statistical Machine Translation
• Experiment 1: Parameter Estimation for the Translation Model
  – pick the 9,000 most common words for French and English
  – 40,000 sentence pairs
  – 81,000,000 parameters
  – initial guess: minimal assumptions
Statistical Machine Translation
• Experiment 1: results
  – (English) Hear, hear!
  – (French) Bravo!
Statistical Machine Translation
• Experiment 2: Translation from French to English
  – make the task manageable
    • English lexicon: the 1,000 most frequent English words in the corpus
    • French lexicon: the 1,700 most frequent French words in translations completely covered by the selected English words
  – 117,000 sentence pairs with words covered by the lexicons
  – 17 million parameters estimated for the translation model
  – bigram model of English
    • 570,000 sentences, 12 million words
  – 73 test sentences
    • categories: (exact, alternate, different), wrong, ungrammatical
Statistical Machine Translation
• Experiment 2: results
  – 48% of the test translations fell into the (exact, alternate, different) categories
  – editing effort: 776 keystrokes (system output) vs. 1,916 (Hansard)
Part 2
• Homework 4 Review
English Grammar: e21.pl
• DCG rules
    sbar(PA) --> np(X,wh), do(_,_), s_objectwh(_,S,P), {headof(X,O), PA =.. [P,S,O]}.
    sbar(S) --> s(S).
    s_objectwh(s(Y,Z),S,P) --> np(Y,_), vp_objectwh(Z), {headof(Y,S), headof(Z,P)}.
    s(PA) --> np(Y,_), vp(Z,_), {predarg(Y,Z,1,PA)}.
    np(np(Y),Q) --> pronoun(Y,Q).
    np(np(Y),notwh) --> proper_noun(Y).
    np(np(D,N),Q) --> det(D,Number), common_noun(N,Number,Q).
    vp(vp(v(died)),ed) --> [kicked,the,bucket].
    vp(vp(Y,Z),F) --> transitive(Y,F), np(Z,_).
    vp(vp(A,V),F) --> aux(A,F), transitive(V,en).
    vp_objectwh(vp(Y)) --> transitive(Y,root).
    det(det(the),_) --> [the].
    det(det(a),sg) --> [a].
    common_noun(n(bucket),sg,notwh) --> [bucket].
    common_noun(n(buckets),pl,notwh) --> [buckets].
    common_noun(n(apple),sg,notwh) --> [apple].
    common_noun(n(apples),pl,notwh) --> [apples].
    common_noun(n(man),sg,notwh) --> [man].
    common_noun(n(book),sg,notwh) --> [book].
    common_noun(n(books),pl,notwh) --> [books].
English Grammar: e21.pl (continued)
• Lexical entries
    pronoun(who,wh) --> [who].
    pronoun(what,wh) --> [what].
    proper_noun(john) --> [john].
    transitive(v(eats),s) --> [eats].
    transitive(v(ate),ed) --> [ate].
    transitive(v(eaten),en) --> [eaten].
    transitive(v(buy),root) --> [buy].
    transitive(v(buys),s) --> [buys].
    transitive(v(bought),ed) --> [bought].
    transitive(v(bought),en) --> [bought].
    transitive(v(kicks),s) --> [kicks].
    transitive(v(kicked),ed) --> [kicked].
    transitive(v(kicked),en) --> [kicked].
    aux(aux(was),ed) --> [was].
    aux(aux(is),s) --> [is].
    do(aux(does),s) --> [does].
    do(aux(did),ed) --> [did].
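• For example, with e21.pl loaded (together with the shared predarg/4 and headof/2 helpers listed with j21.pl below), the expected behavior is:

    ?- sbar(PA, [john,bought,the,book], []).
    PA = bought(john, book) .

    ?- sbar(PA, [what,did,john,buy], []).
    PA = buy(john, what) .

  Note that the first query in fact succeeds twice, once per tense feature of bought (ed and en); Question 2 below addresses the duplication.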
Japanese Grammar: j21.pl
• DCG rules
    s(PA) --> np(Y,Q1), nomcase, vp(Z,Q2), sf(Q1,Q2), {predarg(Y,Z,2,PA)}.
    vp(vp(Z,Y),Q) --> np(Z,Q), acccase, transitive(Y).
    transitive(v(katta)) --> [katta].
    nomcase --> [ga].
    acccase --> [o].
    np(np(taroo),notwh) --> [taroo].
    np(np(hon),notwh) --> [hon].
    np(np(dare),wh) --> [dare].
    np(np(nani),wh) --> [nani].
    sf(wh,notwh) --> [ka].
    sf(notwh,wh) --> [ka].
    sf(notwh,notwh) --> [].
    sf(wh,wh) --> [ka].
• Shared helpers
    predarg(X,Y,Order,PA) :- headof(X,S), headof(Y,P), order(Order,Y,NP), headof(NP,O), PA =.. [P,S,O].
    predarg(X,Y,_,PA) :- headof(X,S), headof(Y,P), Y = vp(_), PA =.. [P,S].
    order(1,vp(_,NP),NP).
    order(2,vp(NP,_),NP).
    headof(np(_,n(N)),N).
    headof(vp(v(V),_),V).
    headof(vp(_,v(V)),V).
    headof(vp(v(V)),V).
    headof(np(N),N).
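• A quick check of the Japanese grammar (expected behavior with j21.pl loaded):

    ?- s(PA, [taroo,ga,hon,o,katta], []).
    PA = katta(taroo, hon) .

    ?- s(PA, [dare,ga,hon,o,katta,ka], []).
    PA = katta(dare, hon) .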
Translator: t.pl
• Prolog translation code
    translate(E,J) :-          % translator
        sbar(X,E,[]),          % English grammar
        mapPA(X,Xp),           % map predicate-argument structure
        js(Xp,J,[]).           % Japanese grammar

    mapPA(E,J) :-              % map predicate-argument structure
        E =.. [P,S,O],
        je(PJ,P),
        je(SJ,S),
        je(OJ,O),
        J =.. [PJ,SJ,OJ].

• Bilingual dictionary
    je(katta,bought).
    je(hon,book).
    je(taroo,john).
    je(dare,who).
    je(nani,what).
    je(katta,buy).
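• Putting the pieces together, the expected behavior (assuming the j21.pl rules are loaded under the js/jnp/jvp names that t.pl uses) is:

    ?- translate([john,bought,the,book], J).
    J = [taroo,ga,hon,o,katta] .

    ?- translate(E, [taroo,ga,hon,o,katta]).
    E = [john,bought,the,book] ;
    E = [john,bought,the,book] .

  The duplicate answer to the second query is exactly what Question 2 below asks about.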
Question 2: Tense
• Homework Question
  – (A) English morphology and tense
  – (1) (1pt) Why does
      ?- translate(X,[taroo,ga,hon,o,katta]).
    return duplicate answers?
  – (2) (2pts) Fix the problem.
  – (3) (1pt) *John buy the book (cf. John buys the book) is accepted by the English grammar. Fix the problem.
  – Submit both your grammar and relevant examples.
Question 2: Tense
• English Grammar
  – code:
    vp(vp(Y,Z),F) --> transitive(Y,F), {F \== en, F \== root}, np(Z,_).
  – replacing the unrestricted rule:
    vp(vp(Y,Z),F) --> transitive(Y,F), np(Z,_).
  – this blocks both:
    • the duplicate parse of John bought the book (bought = -en form)
    • *John buy the book (buy = root form)
Question 2: Tense
• Homework Question
  – (B) Tense and predicate-argument structure
  – let's expand the grammar slightly
  – assume kau (buy(s)) is the present-tense form of katta (bought)
  – (3pts) Modify the translator to respect tense when translating between:
    • John buys a book ⇔ taroo-ga hon-o kau
    • John bought a book ⇔ taroo-ga hon-o katta
  – Submit both your code and all relevant translations, e.g.
    • ?- translate([john,buys,a,book],X).
    • ?- translate(X,[taroo,ga,hon,o,kau]).
Question 2: Tense
• Translator
  – code:
    je(kau,buys).
• Japanese grammar
  – code:
    jtransitive(v(kau)) --> [kau].
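• With just these two additions, tense is respected because each English verb form maps to its own Japanese form in the je/2 dictionary; expected behavior:

    ?- translate([john,buys,a,book], X).
    X = [taroo,ga,hon,o,kau] .

    ?- translate([john,bought,a,book], X).
    X = [taroo,ga,hon,o,katta] .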
Question 3: Yes-No Questions
• Homework Question
  – modify the English and Japanese grammars to incorporate yes-no questions
  – modify the translator to operate on yes-no questions
  – examples:
    • Did John buy a book?       yesno(buy(john,book))
    • Taroo-ga hon-o katta ka    yesno(katta(taroo,hon))
Question 3: Yes-No Questions
• English grammar
  – code:
    sbar(yesno(PA)) --> do(_,_), s_rootv(PA).
    s_rootv(PA) --> np(Y,_), vp_rootv(Z,_), {predarg(Y,Z,1,PA)}.
    vp_rootv(vp(Y,Z),root) --> transitive(Y,root), np(Z,_).
• Japanese grammar
  – code:
    js(yesno(PA)) --> jnp(Y,notwh), nomcase, jvp(Z,notwh), [ka], {predarg(Y,Z,2,PA)}.
• Translator
  – code:
    mapPA(yesno(E),yesno(J)) :- mapPA(E,J).
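• Expected behavior of the extended translator (a sketch, assuming all the pieces above are loaded together):

    ?- translate([did,john,buy,a,book], X).
    X = [taroo,ga,hon,o,katta,ka] .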
Question 4: English Idiom
• Complete the translator so that
  – John kicked the bucket
    • has both a literal and an idiomatic translation:
      – Taroo-ga buketsu-o ketta (literal)
      – Taroo-ga shinda (idiomatic)
    • buketsu = bucket, shinda = died, ketta = kicked
  – John kicked the buckets
    • has only a literal translation:
      – Taroo-ga buketsu-o ketta
    • (assuming Japanese does not distinguish number)
Question 4: English Idiom
• Japanese grammar
  – code:
    jvp(vp(Y),notwh) --> jintransitive(Y).
    jintransitive(v(shinda)) --> [shinda].
    jtransitive(v(ketta)) --> [ketta].
    jnp(np(buketsu),notwh) --> [buketsu].
• Translator
  – code:
    je(ketta,kicked).
    je(shinda,died).
    je(buketsu,bucket).
    je(buketsu,buckets).
    mapPA(E,J) :-              % for intransitives: P(S) <=> PJ(SJ)
        E =.. [P,S],
        je(PJ,P),
        je(SJ,S),
        J =.. [PJ,SJ].
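• Expected behavior (a sketch; the idiomatic reading comes from the e21.pl rule vp(vp(v(died)),ed) --> [kicked,the,bucket], the literal one from the ordinary transitive parse; solution order depends on clause order):

    ?- translate([john,kicked,the,bucket], X).
    X = [taroo,ga,shinda] ;
    X = [taroo,ga,buketsu,o,ketta] .

    ?- translate([john,kicked,the,buckets], X).
    X = [taroo,ga,buketsu,o,ketta] .       % literal only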
Question 5: Japanese Idiom
• examples
  – taroo-ga sensei-ni goma-o sutta
    • taroo-NOM teacher-DAT sesame-ACC ground
    • "John flattered the teacher"
  – taroo-ga hanako-ni goma-o sutta
    • taroo-NOM hanako-DAT sesame-ACC ground
    • "John flattered Mary"
  – ni = dative (dat) Case marker
  – odateta is the Japanese counterpart of flattered
Question 5: Japanese Idiom
• English grammar
  – code:
    common_noun(n(teacher),sg,notwh) --> [teacher].
    proper_noun(mary) --> [mary].
    transitive(v(flattered),ed) --> [flattered].
• Japanese grammar
  – code:
    jvp(vp(Z,v(odateta)),Q) --> jnp(Z,Q), datcase, [goma], acccase, [sutta].
    jtransitive(v(odateta)) --> [odateta].
    jnp(np(hanako),notwh) --> [hanako].
    jnp(np(sensei),notwh) --> [sensei].
    datcase --> [ni].
• Translator
  – code:
    je(odateta,flattered).
    je(hanako,mary).
    je(sensei,teacher).
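• Expected behavior (a sketch, assuming j21.pl's transitive VP rule is also present under the jvp name; solution order depends on clause order):

    ?- translate([john,flattered,the,teacher], X).
    X = [taroo,ga,sensei,ni,goma,o,sutta] ;
    X = [taroo,ga,sensei,o,odateta] .

    ?- translate([john,flattered,mary], X).
    X = [taroo,ga,hanako,ni,goma,o,sutta] ;
    X = [taroo,ga,hanako,o,odateta] .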
Part 3
• Class Evaluations