Post on 15-Feb-2019

Dynamically shaping the reordering search space of phrase-based SMT

Arianna Bisazza & Marcello Federico

Phrase-based SMT

• No sentence structure; can only model local dependencies
• With respect to tree-based SMT: smaller models, faster decoding, very competitive for translating between similar languages
• Most popular framework in SMT production scenarios today

Bisazza & Federico – Dynamically shaping the reordering search space of PSMT

• Problem: does not handle long-range reordering well!
• Goal of this work: dynamically shape the space of reorderings explored during search
• Better translation and faster decoding with loose reordering constraints

[Figure: a source sentence (SRC: wordS1 … wordS7) and its target translation (TRG: wordT1 wordT2 wordT3 wordT4 …), annotated with LM scores and distortion (Disto.) scores.]

$$\alpha_{TM}\log P_{TM\text{-}d}(f \mid e) + \alpha_{TM\text{-}i}\log P_{TM\text{-}i}(e \mid f) + \alpha_{LM}\log P_{LM}(e) + \alpha_{RM}\log P_{RM}(f_{t-1}, f_t) + \dots$$

Reordering search space

• Searching over all permutations is NP-hard

• Hard reordering constraints are applied to word-to-word jumps

Distortion of each word-to-word jump (row: word translated last; column: word translated next):

       w0  w1  w2  w3  w4  w5  w6  w7  w8  w9
<s>     0   1   2   3   4   5   6   7   8   9
w0      -   0   1   2   3   4   5   6   7   8
w1      2   -   0   1   2   3   4   5   6   7
w2      3   2   -   0   1   2   3   4   5   6
w3      4   3   2   -   0   1   2   3   4   5
w4      5   4   3   2   -   0   1   2   3   4
w5      6   5   4   3   2   -   0   1   2   3
w6      7   6   5   4   3   2   -   0   1   2
w7      8   7   6   5   4   3   2   -   0   1
w8      9   8   7   6   5   4   3   2   -   0
w9     10   9   8   7   6   5   4   3   2   -

• Linear distortion limit (DL): only jumps whose distortion does not exceed the limit are explored (figure: DL=3)
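The distortion values and the effect of a linear distortion limit can be sketched in a few lines of Python. This is a minimal illustration, not the decoder's implementation: it assumes the distortion of a jump from word `prev` to word `nxt` is `|nxt - prev - 1|`, with the sentence start `<s>` at position -1, which reproduces the values in the matrix. The function names are hypothetical, and a real decoder would also exclude words that are already translated.

```python
def distortion(prev, nxt):
    """Distortion of translating word `nxt` right after word `prev`.

    `prev` is -1 for the sentence-start symbol <s>.
    """
    return abs(nxt - prev - 1)


def distortion_matrix(n):
    """Rows: <s>, w0 .. w(n-1); columns: w0 .. w(n-1). Diagonal cells are None."""
    return [
        [None if nxt == prev else distortion(prev, nxt) for nxt in range(n)]
        for prev in range(-1, n)
    ]


def allowed_jumps(prev, n, dl):
    """Positions reachable from `prev` under a distortion limit `dl`."""
    return [
        nxt for nxt in range(n)
        if nxt != prev and distortion(prev, nxt) <= dl
    ]


if __name__ == "__main__":
    for row in distortion_matrix(10):
        print(" ".join("-" if d is None else str(d) for d in row))
    print(allowed_jumps(prev=4, n=10, dl=3))
```

With DL=3, a hypothesis that has just translated w4 can only continue with w2, w3, w5, w6, w7 or w8; every longer jump is excluded regardless of how plausible it is.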

The problem with DL

[Figures: Arabic-English (AR/EN) and German-English (DE/EN) example sentence pairs, each shown next to the distortion matrix of an 11-word sentence (w0 … w10).]

The problem with DL

SRC    ywASl sfyr Almmlkp AlErbyp AlsEwdyp ldY lbnAn EbdAlEzyz xwjp tHrk -h fy AtjAh ...
       (constituent order: verb, subj., obj., compl.)
Gloss  continues ambassador Kingdom Arabian Saudi to Lebanon Abdulaziz Khawja move his in direction
REF    The Kingdom of Saudi Arabia ’s ambassador to Lebanon Abdulaziz Khawja continues his moves towards ...
BASE   continue to Saudi Arabian ambassador to Lebanon , Abdulaziz Khwja its move in the direction of ...
NEW    The Kingdom of Saudi Arabia ’s ambassador to Lebanon , Abdulaziz Khwja continue its move in the direction of ...

SRC    fymA dEA -hm r}ys Almktb AlsyAsy l- Hrkp HmAs xAld m$El AlY AltzAm AlHyAd
       (constituent order: adv., verb, obj., subj., compl.)
Gloss  meanwhile called them head bureau political of movement Hamas Khaled Mashal to necessity neutrality
REF    Meanwhile, the Head of the Political Bureau of the Hamas movement, Khaled Mashal, called upon them to remain neutral
BASE   The called them, head of Hamas’ political bureau, Khalid Mashal, to remain neutral
NEW    The head of Hamas’ political bureau, Khalid Mashal, called on them to remain neutral

Figure 3: Long reordering examples showing improvements over the baseline system (BASE) when the DL is raised to 18 and early pruning based on WaW reordering scores is enabled (NEW).

Long jumps statistics and examples. To better understand the behavior of the early-pruning system, we extract phrase-to-phrase jump statistics from the decoder log file. We find that 132 jumps beyond the non-prunable zone (D>5) were performed to translate the 586 sentences of eval09-nw; 38 out of these were longer than 8 and mostly concentrated on the VS-sentence subset (27 jumps D>8 performed in vs-09).[13] This and the higher reordering scores suggest that long jumps are mainly carried out to correctly reorder clause-initial verbs over long subjects.

Fig. 3 shows two Arabic sentences taken from eval09-nw that were erroneously reordered by the baseline system. The system including the WaW model and early reordering pruning, instead, produced the correct translation. The first sentence is a typical example of VSO order with a long subject: while the baseline system left the verb in its Arabic position, producing an incomprehensible translation, the new system placed it rightly between the English subject and object. This reordering involved two long jumps: one with D=9 backward and one with D=8 forward.

The second sentence displays another, less common, Arabic construction: namely VOS, with a personal pronoun object. In this case, a backward jump with D=10 and a forward jump with D=8 were necessary to achieve the correct reordering.

[13] Statistics computed on the medium-LM system.

6 Conclusions

We have trained a discriminative model to predict likely reordering steps in a way that is complementary to state-of-the-art PSMT reordering models. We have effectively integrated it into a PSMT decoder as an additional feature, ensuring that its total score over a complete translation hypothesis is consistent across different phrase segmentations. Lastly, we have proposed early reordering pruning as a novel method to dynamically shape the input reordering space and capture long-range reordering phenomena that are often critical when translating between languages with different syntactic structures.

Evaluated on a popular Arabic-English news translation task against a strong baseline, our approach leads to similar or even higher BLEU, METEOR and KRS scores at a very high distortion limit (18), which is by itself an important achievement. At the same time, the reordering of verbs, measured with a novel version of the KRS, is consistently improved, while decoding gets significantly faster. The improvements are also confirmed when a very large LM is used and the decoder’s beam size is doubled, which shows that our method reduces not only search errors but also model errors even when baseline models are very strong.

Word reordering is probably the most difficult aspect of SMT and an important factor of both its quality and efficiency. Given its strong interaction with the other aspects of SMT, it appears natural to solve

Reordering search space

• Current solution: increase distortion limit
  DL = 3 → 3,000 word permutations
  DL = 7 → 1,246,000 word permutations

• Coarse definition of reordering space:
  → slower decoding
  → worse translations

Word-after-Word reordering model

…  w-  $Ark  fy  AltZAhrp  E$rAt  AlmslHyn  mn  AlktA}b
…  and dozens of militants from the brigades took part in the march

[Figure: the source words are labeled yes or no (one yes, seven no).]

• Predict whether input word j should be translated right after input word i
• Maximum-entropy binary classifier
• Features of i, j, their context and words between i and j

Feature examples:
• w_i=“w-” and w_j=“E$rAt”
• p_i=conj and p_j=nns
• b_all=“$Ark fy AltZAhrp”
• b*=“$Ark”
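The feature examples can be reproduced with a small extraction function. This is a sketch under stated assumptions rather than the authors' feature set: the function name and the feature string format are hypothetical, the word and POS pairs are treated as conjoined features, and b* is read as "one feature per word between i and j". Only the two POS tags given in the feature examples are known; the others are left as "?".

```python
def waw_features(words, pos, i, j):
    """Features for 'should word j be translated right after word i?'."""
    lo, hi = (i, j) if i < j else (j, i)
    between = words[lo + 1:hi]            # words strictly between i and j
    feats = [
        f"wi={words[i]}&wj={words[j]}",   # word identities
        f"pi={pos[i]}&pj={pos[j]}",       # part-of-speech tags
        "ball=" + " ".join(between),      # all words in between, as one feature
    ]
    # Assumption: b* fires once for every single word in between.
    feats.extend(f"b*={w}" for w in between)
    return feats


words = ["w-", "$Ark", "fy", "AltZAhrp", "E$rAt", "AlmslHyn", "mn", "AlktA}b"]
# Only the tags of w- and E$rAt are known; "?" marks tags not given.
pos = ["conj", "?", "?", "?", "nns", "?", "?", "?"]

if __name__ == "__main__":
    for feat in waw_features(words, pos, i=0, j=4):
        print(feat)
```

A maximum-entropy binary classifier would then map such sparse features to the probability that j follows i in translation order.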

Decoder integration

Additional feature function (usual approach):

$$\alpha_{TM}\log P_{TM\text{-}d}(f \mid e) + \alpha_{TM\text{-}i}\log P_{TM\text{-}i}(e \mid f) + \alpha_{LM}\log P_{LM}(e) + \alpha_{RM}\log P_{RM}(f_{t-1}, f_t) + \dots + \alpha_{WaW}\log P_{WaW}(w_{t-1}, w_t)$$

+ Dynamically prune the reordering search space: ‘early reordering pruning’ (novel approach)
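The "usual approach" amounts to adding one more weighted log-probability to the decoder's log-linear score. The sketch below shows only that arithmetic; the probabilities and weights are made-up illustration values, and the function name is hypothetical.

```python
import math


def loglinear_score(probs, weights):
    """Weighted sum of log-probabilities: sum_k alpha_k * log P_k."""
    return sum(weights[name] * math.log(p) for name, p in probs.items())


# Hypothetical model probabilities for one hypothesis extension.
probs = {"TM-d": 0.20, "TM-i": 0.10, "LM": 0.05, "RM": 0.50, "WaW": 0.60}
# Hypothetical feature weights (alpha); in practice these are tuned.
weights = {"TM-d": 0.2, "TM-i": 0.2, "LM": 0.3, "RM": 0.2, "WaW": 0.1}

if __name__ == "__main__":
    without_waw = {k: v for k, v in probs.items() if k != "WaW"}
    print(loglinear_score(without_waw, weights))
    print(loglinear_score(probs, weights))
```

Used this way, the WaW model only re-scores hypotheses that the search has already generated; early reordering pruning instead uses the same scores to decide which jumps are generated in the first place.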

Early reordering pruning

• Standard search: explore all jumps within a fixed DL, then score with all models

• Our method: only explore long reorderings that are likely according to the reordering model

• Histogram and threshold pruning based on WaW score

• “Safe zone” (width ϑ) always explored

[Figure: with DL=6, the candidate jumps carry the WaW scores 0.2, 0.2, 0.4, 0.6, 0.6, 0.2, 0.7, 0.4. After histogram and threshold pruning only the jumps scored 0.6, 0.6 and 0.7 remain. With a safe zone of ϑ=2, the scores shown are 0.2, 0.4, 0.6, 0.6, 0.7 and 0.2.]

[Figure: matrix of WaW scores over the word-to-word jumps of a sentence (DL=6, ϑ=2). Each cell falls in one of three zones: off limits, prunable zone, or non-prunable zone.]

WaW scores shown in the matrix:

0.6  0.5  0.2  0.1  0.3  0.1  0.1  0.2  0.2  0.1
0.6  0.5  0.1  0.3  0.1  0.1  0.4  0.1  0.2  0.1
0.6  0.9  0.4  0.2  0.2  0.1  0.1  0.2  0.1  0.1
0.6  0.5  0.8  0.4  0.2  0.3  0.4  0.4  0.2  0.2
0.2  0.4  0.3  0.9  0.3  0.4  0.6  0.2  0.5  0.3
0.1  0.3  0.6  0.7  0.9  0.3  0.4  0.6  0.7  0.1
0.1  0.1  0.4  0.5  0.2  0.6  0.8  0.4  0.4  0.2
0.4  0.2  0.3  0.4  0.6  0.2  0.8  0.4  0.1  0.1
0.1  0.1  0.1  0.3  0.5  0.3  0.1  0.9  0.5  0.7
0.2  0.2  0.1  0.2  0.2  0.2  0.1  0.4  0.6  0.5
0.1  0.1  0.2  0.1  0.1  0.8  0.6  0.1  0.3  0.6
0.1  0.1  0.1  0.1  0.1  0.2  0.1  0.3  0.1  0.1
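The three zones can be sketched as a filter over candidate next positions. This is a simplified, word-level illustration and not the decoder's actual code: the function name, the scores, and the reading of the threshold as an absolute minimum WaW score are assumptions. Jumps beyond DL are off limits, jumps within ϑ are never pruned, and the remaining long jumps survive only if they are among the `histogram` best and reach `threshold`.

```python
def early_reordering_pruning(scores, prev_end, covered, dl, theta,
                             histogram, threshold):
    """Return the source positions a hypothesis may translate next.

    scores[j] is the WaW score of translating word j right after the
    word at position prev_end; covered holds already translated positions.
    """
    safe, prunable = [], []
    for j, score in enumerate(scores):
        if j in covered:
            continue
        d = abs(j - prev_end - 1)          # distortion of the jump
        if d > dl:
            continue                       # off limits
        if d <= theta:
            safe.append(j)                 # non-prunable zone
        else:
            prunable.append((score, j))    # prunable zone
    # Histogram pruning keeps the N best long jumps;
    # threshold pruning then drops those with a low score.
    prunable.sort(key=lambda sj: (-sj[0], sj[1]))
    kept = [j for score, j in prunable[:histogram] if score >= threshold]
    return sorted(safe + kept)


if __name__ == "__main__":
    # Hypothetical WaW scores for a 13-word sentence.
    scores = [0.6, 0.2, 0.4, 0.2, 0.0, 0.7, 0.4, 0.2, 0.1, 0.5, 0.2, 0.1, 0.9]
    print(early_reordering_pruning(scores, prev_end=4, covered={4},
                                   dl=6, theta=2, histogram=2, threshold=0.3))
```

In this toy run position 12 has the highest score but lies beyond DL, so it is never considered, while the low-scoring positions next to the current word stay available because they are inside the non-prunable zone.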

Experiments

Experimental setup

• NIST-MT09 Arabic-English newswire (eval09)

• Hierarchical lexicalized reordering models [Galley & Manning 08]

• Early distortion cost [Moore & Quirk 07]

• Evaluation by:
  BLEU for lexical match & local order
  KRS (Kendall Reordering Score) for global order [Birch & al. 10]

• Two testing conditions:
  medium-scale LM, stack size 200
  large-scale LM, stack size 400
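To give an intuition for what a Kendall-based reordering score measures, the sketch below scores a permutation of word positions by its share of inverted word pairs. It assumes the square-root-normalized Kendall tau form, 1 - sqrt(discordant pairs / all pairs); the exact KRS variant used in these experiments may differ (for example by a brevity penalty), and the function name is hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_reordering_score(perm):
    """1.0 for a monotone permutation, 0.0 for a fully inverted one."""
    n = len(perm)
    if n < 2:
        return 1.0
    # A pair is discordant when its two words appear in swapped order.
    discordant = sum(
        1 for a, b in combinations(range(n), 2) if perm[a] > perm[b]
    )
    total_pairs = n * (n - 1) / 2
    return 1.0 - sqrt(discordant / total_pairs)


if __name__ == "__main__":
    print(kendall_reordering_score([0, 1, 2, 3]))   # monotone order
    print(kendall_reordering_score([1, 0, 2, 3]))   # one local swap
    print(kendall_reordering_score([3, 0, 1, 2]))   # one long jump
```

Because every inverted pair counts, a single verb moved across a long subject is penalized more than a swap of two adjacent words, which is why such a score reflects global order better than n-gram matching does.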

Results (medium-scale)

Early reordering pruning settings:
- histogram: 3
- threshold: 0.1
- non-prunable zone of width ϑ=5

Translation quality
[Plot: KRS (y-axis, 83.8 to 85.0) against BLEU (x-axis, 50.2 to 51.2) for three systems: base,DL8; base,DL18; +WaW,DL18 +reoPrune.]
+0.6 BLEU, +1.0 KRS

Decoding time (ms/word)
base,DL8                 87
base,DL18               164
+WaW,DL18 +reo.prune     68

Results (large-scale)

Early reordering pruning settings:
- histogram: 3
- threshold: 0.1
- non-prunable zone of width ϑ=5

Translation quality
[Plot: KRS (y-axis, 82.8 to 84.8) against BLEU (x-axis, 51 to 53) for three systems: base,DL8; base,DL18; +WaW,DL18 +reoPrune.]
+1.2 BLEU, +1.6 KRS

Decoding time (ms/word)
base,DL8               2579
base,DL18              5462
+WaW,DL18 +reo.prune   1588

More metrics & language pairs in [Bisazza 2013]
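The decoding-time figures are easier to compare as ratios. The snippet below does that arithmetic; it assumes the three bar values on each decoding-time chart (87, 164, 68 and 2579, 5462, 1588 ms/word) belong to the systems in the order they are listed, and the helper name is hypothetical.

```python
decoding_ms_per_word = {
    "medium-scale": {"base,DL8": 87, "base,DL18": 164, "+WaW,DL18 +reo.prune": 68},
    "large-scale": {"base,DL8": 2579, "base,DL18": 5462, "+WaW,DL18 +reo.prune": 1588},
}

NEW_SYSTEM = "+WaW,DL18 +reo.prune"


def relative_time(times, system, reference):
    """Decoding time of `system` as a fraction of `reference`."""
    return times[system] / times[reference]


if __name__ == "__main__":
    for condition, times in decoding_ms_per_word.items():
        for reference in ("base,DL8", "base,DL18"):
            ratio = relative_time(times, NEW_SYSTEM, reference)
            print(f"{condition}: {NEW_SYSTEM} takes {ratio:.0%} "
                  f"of the {reference} decoding time")
```

Under that reading, the pruned system at DL=18 decodes faster than the baseline at DL=8 in both conditions, and in well under half the time of the baseline at DL=18.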

Example

[Figure 3, reproduced above: the two long-reordering examples with SRC, REF, BASE and NEW.]

Conclusions

• Phrase-based SMT remains a strong baseline in many language pairs, but typically at the expense of long-reordering phenomena

• We presented a method to capture long-range reordering in phrase-based SMT without sacrificing efficiency

• Results: better reordering and translation quality in a large-scale Arabic-English translation system

• Can be seen as a mix of pre-ordering and decoding-time reordering approaches

• Same idea can be applied to other reordering models!

Thanks for your attention!