Date post: 15-Feb-2019
Dynamically shaping the reordering search space of phrase-based SMT Arianna Bisazza & Marcello Federico
Page 1: Dynamically shaping the reordering search space of phrase ...liacs.leidenuniv.nl/~bisazzaa/slides/TACL13-dynamic-reordering... · The problem with DL 13 Bisazza&"Federico"–Dynamically"shaping"the"reordering"search"space"of"PSMT"

Dynamically shaping the reordering search space of phrase-based SMT

Arianna Bisazza & Marcello Federico

Page 3: Phrase-based SMT

• No sentence structure, can only model local dependencies
• With respect to tree-based SMT: smaller models, faster decoding, very competitive for translating between similar languages
• Most popular framework in SMT production scenarios today
• Problem: does not handle long-range reordering well!
• Goal of this work: dynamically shape the space of reorderings explored during search
• Better translation and faster decoding with loose reordering constraints

Page 4: Phrase-based SMT

[Figure: a source sentence (SRC: wordS1 ... wordS7) and its translation (TRG: wordT1 wordT2 wordT3 wordT4 ...), annotated with LM scores and distortion (Disto.) scores.]

Weighted sum of model scores:

$\alpha_{TM} \log P_{TM\text{-}d}(f|e) + \alpha_{TM\text{-}i} \log P_{TM\text{-}i}(e|f) + \alpha_{LM} \log P_{LM}(e) + \alpha_{RM} \log P_{RM}(f_{t-1}, f_t) + \dots$
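The weighted sum on this slide can be sketched in a few lines. This is a minimal illustration, assuming each model contributes one log-probability per hypothesis; the function name `loglinear_score`, the probability values, and the weights are all hypothetical and not taken from the slides.

```python
import math


def loglinear_score(log_probs, weights):
    """Weighted sum of log-probabilities; both dicts share the same keys."""
    return sum(weights[name] * log_probs[name] for name in log_probs)


# Hypothetical feature values for one translation hypothesis.
log_probs = {
    "TM-d": math.log(0.5),   # log P_TM-d(f|e)
    "TM-i": math.log(0.25),  # log P_TM-i(e|f)
    "LM": math.log(0.1),     # log P_LM(e)
    "RM": math.log(0.8),     # log P_RM(f_{t-1}, f_t)
}

# Hypothetical weights (the alphas), all set to 1.0 for illustration only.
weights = {"TM-d": 1.0, "TM-i": 1.0, "LM": 1.0, "RM": 1.0}

score = loglinear_score(log_probs, weights)
```

With all weights equal to 1.0 the score is simply the log of the product of the four probabilities; changing a weight makes the corresponding model count more or less in the total.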

Page 5: Reordering search space

Page 8: Reordering search space

• Searching over all permutations is NP-hard
• Hard reordering constraints applied on word-to-word jumps

Linear distortion limit (DL): DL=3

Jump distance from the last translated word (row) to the next translated word (column):

       w0   w1   w2   w3   w4   w5   w6   w7   w8   w9
<s>     0    1    2    3    4    5    6    7    8    9
w0      -    0    1    2    3    4    5    6    7    8
w1      2    -    0    1    2    3    4    5    6    7
w2      3    2    -    0    1    2    3    4    5    6
w3      4    3    2    -    0    1    2    3    4    5
w4      5    4    3    2    -    0    1    2    3    4
w5      6    5    4    3    2    -    0    1    2    3
w6      7    6    5    4    3    2    -    0    1    2
w7      8    7    6    5    4    3    2    -    0    1
w8      9    8    7    6    5    4    3    2    -    0
w9     10    9    8    7    6    5    4    3    2    -
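The jump distances in the matrix follow a simple rule, sketched below. The function names `jump_distance` and `allowed_next` are hypothetical, and the sketch works on single words; an actual decoder applies the limit to phrase-to-phrase jumps and may add further conditions.

```python
def jump_distance(prev, nxt):
    """Distance of the jump from source position `prev` to `nxt`.

    prev = -1 stands for the sentence start <s>.
    """
    if nxt > prev:
        return nxt - prev - 1  # forward jump: number of skipped words
    return prev - nxt + 1      # backward jump


def allowed_next(prev, n_words, covered, dl):
    """Uncovered positions reachable from `prev` under distortion limit `dl`."""
    return [
        j
        for j in range(n_words)
        if j not in covered and jump_distance(prev, j) <= dl
    ]


# After translating w4 of a 10-word sentence with DL=3:
candidates = allowed_next(prev=4, n_words=10, covered={4}, dl=3)
```

Here `candidates` contains the positions whose entry in row w4 of the matrix is at most 3.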

Page 10: The problem with DL

Arabic-English

[Figure: two Arabic-English (AR/EN) sentence-pair examples, shown with the jump-distance matrix for an 11-word source sentence (w0 to w10).]

Page 12: The problem with DL

German-English

[Figure: two German-English (DE/EN) sentence-pair examples, shown with the jump-distance matrix for an 11-word source sentence (w0 to w10).]

Page 13: The problem with DL

Example 1 (verb subj. obj. compl.)
SRC: ywASl sfyr Almmlkp AlErbyp AlsEwdyp ldY lbnAn EbdAlEzyz xwjp tHrk -h fy AtjAh ...
Gloss: continues ambassador Kingdom Arabian Saudi to Lebanon Abdulaziz Khawja move his in direction
REF: The Kingdom of Saudi Arabia ’s ambassador to Lebanon Abdulaziz Khawja continues his moves towards ...
BASE: continue to Saudi Arabian ambassador to Lebanon , Abdulaziz Khwja its move in the direction of ...
NEW: The Kingdom of Saudi Arabia ’s ambassador to Lebanon , Abdulaziz Khwja continue its move in the direction of ...

Example 2 (adv. verb obj. subj. compl.)
SRC: fymA dEA -hm r}ys Almktb AlsyAsy l- Hrkp HmAs xAld m$El AlY AltzAm AlHyAd
Gloss: meanwhile called them head bureau political of movement Hamas Khaled Mashal to necessity neutrality
REF: Meanwhile, the Head of the Political Bureau of the Hamas movement, Khaled Mashal, called upon them to remain neutral
BASE: The called them, head of Hamas’ political bureau, Khalid Mashal, to remain neutral
NEW: The head of Hamas’ political bureau, Khalid Mashal, called on them to remain neutral

Figure 3: Long reordering examples showing improvements over the baseline system (BASE) when the DL is raised to 18 and early pruning based on WaW reordering scores is enabled (NEW).

Long jumps statistics and examples. To better understand the behavior of the early-pruning system, we extract phrase-to-phrase jump statistics from the decoder log file. We find that 132 jumps beyond the non-prunable zone (D>5) were performed to translate the 586 sentences of eval09-nw; 38 out of these were longer than 8 and mostly concentrated on the VS- sentence subset (27 jumps D>8 performed in vs-09).^13 This and the higher reordering scores suggest that long jumps are mainly carried out to correctly reorder clause-initial verbs over long subjects.

Fig. 3 shows two Arabic sentences taken from eval09-nw that were erroneously reordered by the baseline system. The system including the WaW model and early reordering pruning, instead, produced the correct translation. The first sentence is a typical example of VSO order with a long subject: while the baseline system left the verb in its Arabic position, producing an incomprehensible translation, the new system placed it rightly between the English subject and object. This reordering involved two long jumps: one with D=9 backward and one with D=8 forward.

The second sentence displays another, less common, Arabic construction: namely VOS, with a personal pronoun object. In this case, a backward jump with D=10 and a forward jump with D=8 were necessary to achieve the correct reordering.

^13 Statistics computed on the medium-LM system.

6 Conclusions

We have trained a discriminative model to predict likely reordering steps in a way that is complementary to state-of-the-art PSMT reordering models. We have effectively integrated it into a PSMT decoder as an additional feature, ensuring that its total score over a complete translation hypothesis is consistent across different phrase segmentations. Lastly, we have proposed early reordering pruning as a novel method to dynamically shape the input reordering space and capture long-range reordering phenomena that are often critical when translating between languages with different syntactic structures.

Evaluated on a popular Arabic-English news translation task against a strong baseline, our approach leads to similar or even higher BLEU, METEOR and KRS scores at a very high distortion limit (18), which is by itself an important achievement. At the same time, the reordering of verbs, measured with a novel version of the KRS, is consistently improved, while decoding gets significantly faster. The improvements are also confirmed when a very large LM is used and the decoder’s beam size is doubled, which shows that our method reduces not only search errors but also model errors even when baseline models are very strong.

Word reordering is probably the most difficult aspect of SMT and an important factor of both its quality and efficiency. Given its strong interaction with the other aspects of SMT, it appears natural to solve

Page 16: Reordering search space

• Current solution: increase distortion limit

DL = 3 → 3,000 word permutations
DL = 7 → 1,246,000 word permutations

Coarse definition of reordering space:
→ slower decoding
→ worse translations

[Figure: jump-distance matrix over source words w0 to w9.]
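How fast the search space grows with the distortion limit can be explored with a small counting sketch. It assumes the word-to-word jump distance of the matrix slides and counts complete word orders in which every jump stays within the limit. The function name `count_permutations` is hypothetical, and the slide's figures may rest on a different sentence length or a stricter decoder constraint, so this sketch is not claimed to reproduce them.

```python
from functools import lru_cache


def count_permutations(n_words, dl):
    """Count word orders in which every word-to-word jump is at most `dl`."""

    def jump_distance(prev, nxt):
        return nxt - prev - 1 if nxt > prev else prev - nxt + 1

    @lru_cache(maxsize=None)
    def extend(prev, covered):
        # All words covered: one complete permutation found.
        if len(covered) == n_words:
            return 1
        total = 0
        for j in range(n_words):
            if j not in covered and jump_distance(prev, j) <= dl:
                total += extend(j, covered | {j})
        return total

    # Position -1 stands for the sentence start <s>.
    return extend(-1, frozenset())


# Counts for a 6-word sentence under increasing limits.
counts = {dl: count_permutations(6, dl) for dl in (0, 2, 4, 6)}
```

With `dl=0` only the monotone order survives, and once `dl` reaches the longest possible backward jump every permutation is allowed.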

Page 17: Word-after-Word reordering model

Page 18: Word-after-Word model

SRC: … w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b
TRG: … and dozens of militants from the brigades took part in the march
Labels shown under the source words: one "yes", seven "no".

• Predict whether input word j should be translated right after input word i
• Maximum-entropy binary classifier
• Features of i, j, their context and words between i and j

Page 19: Word-after-Word model

SRC: … w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b
TRG: … and dozens of militants from the brigades took part in the march

Feature examples:
• w_i="w-" and w_j="E$rAt"
• p_i=conj and p_j=nns
• b_all="$Ark fy AltZAhrp"
• b*="$Ark"
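The word-level features in these examples can be read off the source sentence directly. The sketch below is a minimal illustration that covers only w_i, w_j and b_all; the POS features (p_i, p_j) and b* are left out because the slides do not say how they are obtained. The function name `waw_features` is hypothetical.

```python
def waw_features(tokens, i, j):
    """Word features for the question: translate tokens[j] right after tokens[i]?"""
    lo, hi = sorted((i, j))
    between = tokens[lo + 1:hi]  # words strictly between positions i and j
    return {
        "w_i": tokens[i],
        "w_j": tokens[j],
        "b_all": " ".join(between),
    }


# Source sentence from the slide (Buckwalter transliteration).
tokens = ["w-", "$Ark", "fy", "AltZAhrp", "E$rAt", "AlmslHyn", "mn", "AlktA}b"]

# Candidate pair from the slide: i = "w-", j = "E$rAt".
features = waw_features(tokens, 0, 4)
```

In a maximum-entropy classifier such string-valued features would typically be turned into binary indicators before training; that step is omitted here.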

Page 21: Decoder integration

Usual approach: additional feature function

$\alpha_{TM} \log P_{TM\text{-}d}(f|e) + \alpha_{TM\text{-}i} \log P_{TM\text{-}i}(e|f) + \alpha_{LM} \log P_{LM}(e) + \alpha_{RM} \log P_{RM}(f_{t-1}, f_t) + \alpha_{WaW} \log P_{WaW}(w_{t-1}, w_t) + \dots$

Novel approach: in addition, dynamically prune the reordering search space (‘early reordering pruning’)

Page 22: Early reordering pruning

Page 25: Early reordering pruning

Standard search: explore all jumps within fixed DL, then score with all models

Our method: only explore long reorderings that are likely according to the reordering model

Histogram and threshold pruning based on WaW score

[Figure: candidate jumps within DL=6, annotated with WaW scores 0.2, 0.2, 0.4, 0.6, 0.6, 0.2, 0.7, 0.4.]

Page 26: Early reordering pruning

[Figure: the same candidate jumps within DL=6 after histogram and threshold pruning based on WaW score; the WaW scores still shown are 0.6, 0.6 and 0.7.]

Page 28: Early reordering pruning

Standard search: explore all jumps within fixed DL, then score with all models

Our method: only explore long reorderings that are likely according to the reordering model

“Safe zone” always explored

[Figure: candidate jumps within DL=6 with a safe zone of ϑ=2; WaW scores shown: 0.2, 0.4, 0.6, 0.6, 0.7 and 0.2.]

Page 30: Early reordering pruning

[Figure: with DL=6 and ϑ=2, a matrix of WaW scores is divided into three zones: non-prunable zone, prunable zone, and off limits. WaW scores shown, one matrix row per line:]

0.6   0.5   0.2   0.1   0.3   0.1   0.1   0.2   0.2   0.1
0.6   0.5   0.1   0.3   0.1   0.1   0.4   0.1   0.2   0.1
0.6   0.9   0.4   0.2   0.2   0.1   0.1   0.2   0.1   0.1
0.6   0.5   0.8   0.4   0.2   0.3   0.4   0.4   0.2   0.2
0.2   0.4   0.3   0.9   0.3   0.4   0.6   0.2   0.5   0.3
0.1   0.3   0.6   0.7   0.9   0.3   0.4   0.6   0.7   0.1
0.1   0.1   0.4   0.5   0.2   0.6   0.8   0.4   0.4   0.2
0.4   0.2   0.3   0.4   0.6   0.2   0.8   0.4   0.1   0.1
0.1   0.1   0.1   0.3   0.5   0.3   0.1   0.9   0.5   0.7
0.2   0.2   0.1   0.2   0.2   0.2   0.1   0.4   0.6   0.5
0.1   0.1   0.2   0.1   0.1   0.8   0.6   0.1   0.3   0.6
0.1   0.1   0.1   0.1   0.1   0.2   0.1   0.3   0.1   0.1
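The three zones suggest a simple filtering step at each decoding state, sketched below under stated assumptions: jumps longer than DL are off limits, jumps of distance at most ϑ are always kept, and the remaining jumps are kept only if their WaW score reaches a threshold and ranks among the top `histogram` candidates. Treating the threshold as an absolute minimum score is an assumption, as are the function names and the example scores.

```python
def jump_distance(prev, nxt):
    """Word-to-word jump distance; prev = -1 stands for <s>."""
    return nxt - prev - 1 if nxt > prev else prev - nxt + 1


def early_reordering_pruning(prev, uncovered, waw_score, dl, theta,
                             histogram, threshold):
    """Return the source positions that may be translated next."""
    safe, prunable = [], []
    for j in uncovered:
        d = jump_distance(prev, j)
        if d > dl:
            continue             # off limits
        if d <= theta:
            safe.append(j)       # non-prunable zone: always explored
        else:
            prunable.append(j)   # prunable zone: decided by WaW score
    # Threshold pruning (assumed absolute), then histogram pruning (top-N).
    survivors = [j for j in prunable if waw_score[j] >= threshold]
    survivors.sort(key=lambda j: waw_score[j], reverse=True)
    return sorted(safe + survivors[:histogram])


# Hypothetical WaW scores P(j is translated right after position 0).
waw_score = {1: 0.2, 2: 0.4, 3: 0.2, 4: 0.6, 5: 0.05,
             6: 0.7, 7: 0.3, 8: 0.9, 9: 0.9}

kept = early_reordering_pruning(prev=0, uncovered=range(1, 10),
                                waw_score=waw_score, dl=6, theta=2,
                                histogram=2, threshold=0.1)
```

In this example positions 8 and 9 are discarded despite their high scores because they lie beyond DL, while positions 1 to 3 are kept regardless of score because they fall inside the non-prunable zone.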

Page 31: Experiments

Page 32: Experimental setup

• NIST-MT09 Arabic-English newswire (eval09)
• Hierarchical lexicalized reordering models [Galley & Manning 08]
• Early distortion cost [Moore & Quirk 07]
• Evaluation by:
    BLEU for lexical match & local order
    KRS (Kendall Reordering Score) for global order [Birch & al. 10]
• Two testing conditions:
    medium-scale LM, stack size 200
    large-scale LM, stack size 400
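To give an intuition for what a Kendall-style score measures, the sketch below computes the fraction of word pairs whose relative order is preserved in a permutation. This is a simplified stand-in chosen for illustration, not the exact KRS formula of [Birch & al. 10]; the function name `order_agreement` is hypothetical.

```python
from itertools import combinations


def order_agreement(permutation):
    """Fraction of position pairs that keep their original relative order."""
    pairs = list(combinations(range(len(permutation)), 2))
    if not pairs:
        return 1.0
    concordant = sum(1 for a, b in pairs if permutation[a] < permutation[b])
    return concordant / len(pairs)


monotone = order_agreement([0, 1, 2, 3])  # nothing reordered
inverted = order_agreement([3, 2, 1, 0])  # everything reordered
one_swap = order_agreement([1, 0, 2])     # a single local swap
```

A pair-based measure of this kind reacts to how far words are displaced from their expected order, which is why the slides pair it with BLEU, a metric that mainly captures lexical match and local order.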

Page 35: Results (medium-scale)

Translation Quality

[Figure: scatter plot of KRS (y-axis, 83.8 to 85.0) against BLEU (x-axis, 50.2 to 51.2) for three systems: base,DL8; base,DL18; +waw,DL18 +reoPrune. Annotated gain: +0.6 BLEU, +1.0 KRS.]

Decoding Time (ms/word)

base,DL8                  87
base,DL18                164
+WaW,DL18 +reo.prune      68

Early reo. pruning:
- histogram: 3
- threshold: 0.1
- non-prunable zone of width ϑ=5

Page 39: Dynamically shaping the reordering search space of phrase ...liacs.leidenuniv.nl/~bisazzaa/slides/TACL13-dynamic-reordering... · The problem with DL 13 Bisazza&"Federico"–Dynamically"shaping"the"reordering"search"space"of"PSMT"

Results (large-scale)

Translation Quality (plot of KRS against BLEU for base,DL8; base,DL18; +WaW,DL18 +reo.prune): +1.2 BLEU, +1.6 KRS

Decoding Time (ms/word):
•  base, DL8: 2579
•  base, DL18: 5462
•  +WaW, DL18 +reo.prune: 1588

More metrics & language pairs in [Bisazza 2013]

Early reo. pruning: histogram: 3, threshold: 0.1, non-prunable zone of width ϑ=5

Page 40: Dynamically shaping the reordering search space of phrase ...liacs.leidenuniv.nl/~bisazzaa/slides/TACL13-dynamic-reordering... · The problem with DL 13 Bisazza&"Federico"–Dynamically"shaping"the"reordering"search"space"of"PSMT"

Example

SRC (verb subj. obj. compl.):
ywASl sfyr Almmlkp AlErbyp AlsEwdyp ldY lbnAn EbdAlEzyz xwjp tHrk -h fy AtjAh ...
Gloss: continues ambassador Kingdom Arabian Saudi to Lebanon Abdulaziz Khawja move his in direction
REF: The Kingdom of Saudi Arabia ’s ambassador to Lebanon Abdulaziz Khawja continues his moves towards ...
BASE: continue to Saudi Arabian ambassador to Lebanon , Abdulaziz Khwja its move in the direction of ...
NEW: The Kingdom of Saudi Arabia ’s ambassador to Lebanon , Abdulaziz Khwja continue its move in the direction of ...

SRC (adv. verb obj. subj. compl.):
fymA dEA -hm r}ys Almktb AlsyAsy l- Hrkp HmAs xAld m$El AlY AltzAm AlHyAd
Gloss: meanwhile called them head bureau political of movement Hamas Khaled Mashal to necessity neutrality
REF: Meanwhile, the Head of the Political Bureau of the Hamas movement, Khaled Mashal, called upon them to remain neutral
BASE: The called them, head of Hamas’ political bureau, Khalid Mashal, to remain neutral
NEW: The head of Hamas’ political bureau, Khalid Mashal, called on them to remain neutral

Figure 3: Long reordering examples showing improvements over the baseline system (BASE) when the DL is raised to 18 and early pruning based on WaW reordering scores is enabled (NEW).

Long jumps statistics and examples. To better understand the behavior of the early-pruning system, we extract phrase-to-phrase jump statistics from the decoder log file. We find that 132 jumps beyond the non-prunable zone (D>5) were performed to translate the 586 sentences of eval09-nw; 38 of these were longer than 8 and mostly concentrated on the VS-sentence subset (27 jumps with D>8 performed in vs-09).[13] This and the higher reordering scores suggest that long jumps are mainly carried out to correctly reorder clause-initial verbs over long subjects.

Fig. 3 shows two Arabic sentences taken from eval09-nw that were erroneously reordered by the baseline system. The system including the WaW model and early reordering pruning instead produced the correct translation. The first sentence is a typical example of VSO order with a long subject: while the baseline system left the verb in its Arabic position, producing an incomprehensible translation, the new system correctly placed it between the English subject and object. This reordering involved two long jumps: one with D=9 backward and one with D=8 forward.

The second sentence displays another, less common, Arabic construction: namely VOS, with a personal pronoun object. In this case, a backward jump with D=10 and a forward jump with D=8 were necessary to achieve the correct reordering.

[13] Statistics computed on the medium-LM system.
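The jump counts reported above could be tallied along the following lines. This is a minimal sketch, not the authors' tooling: the decoder log format is not shown in the source, so the input here is a hypothetical list of source spans in the order they were translated, and the jump length uses the usual phrase-based distortion definition D = |start of next phrase − end of previous phrase − 1|, which is assumed to match the D used in the text.

```python
def jump_distances(spans):
    """Distortion of each phrase-to-phrase jump in one sentence.

    spans: (start, end) source index pairs, inclusive, listed in the
           order the decoder translated them (hypothetical log format).
    """
    prev_end = -1  # decoding starts just before the first source word
    distances = []
    for start, end in spans:
        distances.append(abs(start - prev_end - 1))
        prev_end = end
    return distances


def count_long_jumps(sentences, zone=5, long_jump=8):
    """Count jumps beyond the non-prunable zone and jumps longer than 8."""
    beyond_zone = longer = 0
    for spans in sentences:
        for d in jump_distances(spans):
            beyond_zone += d > zone
            longer += d > long_jump
    return beyond_zone, longer


toy_log = [
    [(10, 12), (0, 9), (13, 15)],  # verb-first style reordering
    [(7, 8), (0, 6)],
    [(0, 1), (2, 3)],              # monotonic, no jumps
]
print(count_long_jumps(toy_log))  # (4, 3)
```

With `zone=5` and `long_jump=8`, the two returned numbers correspond to the "D>5" and "D>8" counts quoted in the paragraph, here computed on toy data only.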

6 Conclusions

We have trained a discriminative model to predict likely reordering steps in a way that is complementary to state-of-the-art PSMT reordering models. We have effectively integrated it into a PSMT decoder as an additional feature, ensuring that its total score over a complete translation hypothesis is consistent across different phrase segmentations. Lastly, we have proposed early reordering pruning as a novel method to dynamically shape the input reordering space and capture long-range reordering phenomena that are often critical when translating between languages with different syntactic structures.

Evaluated on a popular Arabic-English news translation task against a strong baseline, our approach leads to similar or even higher BLEU, METEOR and KRS scores at a very high distortion limit (18), which is by itself an important achievement. At the same time, the reordering of verbs, measured with a novel version of the KRS, is consistently improved, while decoding gets significantly faster. The improvements are also confirmed when a very large LM is used and the decoder’s beam size is doubled, which shows that our method reduces not only search errors but also model errors even when baseline models are very strong.

Word reordering is probably the most difficult aspect of SMT and an important factor in both its quality and efficiency. Given its strong interaction with the other aspects of SMT, it appears natural to solve


Page 42: Dynamically shaping the reordering search space of phrase ...liacs.leidenuniv.nl/~bisazzaa/slides/TACL13-dynamic-reordering... · The problem with DL 13 Bisazza&"Federico"–Dynamically"shaping"the"reordering"search"space"of"PSMT"

Conclusions

•  Phrase-based SMT remains a strong baseline for many language pairs, but typically at the expense of long-range reordering phenomena
•  We presented a method to capture long-range reordering in phrase-based SMT without sacrificing efficiency
•  Results: better reordering and translation quality in a large-scale Arabic-English translation system
•  Can be seen as a mix of pre-ordering and decoding-time reordering approaches
•  The same idea can be applied to other reordering models!

Page 43: Dynamically shaping the reordering search space of phrase ...liacs.leidenuniv.nl/~bisazzaa/slides/TACL13-dynamic-reordering... · The problem with DL 13 Bisazza&"Federico"–Dynamically"shaping"the"reordering"search"space"of"PSMT"

Conclusions

Thanks for your attention!


