CS388: Natural Language Processing, Lecture 18: Machine Translation 2 (Greg Durrett)
Transcript
  • CS388: Natural Language Processing

    Greg Durrett

    Lecture 18: Machine Translation 2

  • Administrivia

    ‣ Project 2 due in one week

  • Recall: Phrase-Based MT

    ‣ Language model P(e), estimated from unlabeled English data

    ‣ Phrase table P(f|e), e.g.:
      cat ||| chat ||| 0.9
      the cat ||| le chat ||| 0.8
      dog ||| chien ||| 0.8
      house ||| maison ||| 0.6
      my house ||| ma maison ||| 0.9
      language ||| langue ||| 0.9
      …

    ‣ Noisy channel model: P(e|f) ∝ P(f|e) P(e). Combine scores from the translation model and the language model to translate foreign text into English (see the sketch below).

    “Translate faithfully but make fluent English”
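    ‣ A minimal sketch of noisy-channel scoring in log space; the phrase-table and language-model probabilities below are hypothetical toy values, not from the lecture:

      import math

      # Toy phrase-translation probabilities P(f|e) and language-model log-probs log P(e)
      # (hypothetical values for illustration only).
      phrase_table = {("le chat", "the cat"): 0.8, ("le chat", "cat"): 0.2}
      lm_logprob = {"the cat": math.log(0.05), "cat": math.log(0.01)}

      def noisy_channel_score(f, e):
          """Score English hypothesis e for foreign input f: log P(f|e) + log P(e)."""
          return math.log(phrase_table[(f, e)]) + lm_logprob[e]

      # Higher score = better translation under the noisy channel model.
      best = max(["the cat", "cat"], key=lambda e: noisy_channel_score("le chat", e))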

  • Recall: HMM for Alignment

    Brown et al. (1993)

    ‣ Example: e = “Thank you, I shall do so gladly.”  f = “Gracias, lo haré de muy buen grado.”  a = 0 2 6 5 7 7 7 7 8

    ‣ Sequential dependence between a’s to capture monotonicity

    ‣ Alignment distribution parameterized by jump size

    ‣ Word translation table P(f_i | e_{a_i})

    ‣ Want local monotonicity: most jumps are small

    ‣ HMM model (Vogel, 1996)

    ‣ Re-estimate using the forward-backward algorithm

    P(f, a | e) = ∏_{i=1}^{n} P(f_i | e_{a_i}) P(a_i | a_{i−1})
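    ‣ A minimal sketch of evaluating this product for one (f, a, e) triple; the translation and jump tables are hypothetical stand-ins, and the initial previous alignment is assumed to be 0:

      import math

      # Hypothetical parameters: P(f_i | e_{a_i}) and P(a_i | a_{i-1}) indexed by jump size.
      translation = {("gracias", "thank"): 0.5, ("lo", "so"): 0.3}
      jump = {-1: 0.1, 0: 0.4, 1: 0.3, 2: 0.1}

      def hmm_alignment_logprob(f_words, e_words, alignment):
          """log P(f, a | e) = sum_i [ log P(f_i | e_{a_i}) + log P(a_i | a_{i-1}) ]."""
          logp, prev = 0.0, 0
          for f_word, a_i in zip(f_words, alignment):
              logp += math.log(translation.get((f_word, e_words[a_i]), 1e-6))
              logp += math.log(jump.get(a_i - prev, 1e-6))
              prev = a_i
          return logp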

  • Recall: Decoding

    [Figure: beam search over partial translations such as “Mary not”, “Mary no”, “…did not”, “…not give”, “…not slap”, each hypothesis carrying a model score and the index of the next source position to translate]

    ‣ Scores from language model P(e) + translation model P(f|e)
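    ‣ A minimal beam-search sketch over partial hypotheses in the spirit of the figure; the expand function is a hypothetical placeholder for the phrase-based or neural scoring step:

      def beam_search_decode(source, expand, beam_size=4, max_len=20):
          """Keep the beam_size highest-scoring partial translations at each step.

          A hypothesis here is (tokens, score, source_index); expand(hypothesis, source)
          should yield extended hypotheses of the same form -- a deliberate simplification.
          """
          beam = [((), 0.0, 0)]  # empty translation, score 0, nothing covered yet
          for _ in range(max_len):
              candidates = []
              for hyp in beam:
                  candidates.extend(expand(hyp, source))
              if not candidates:
                  break
              # Prune: keep only the highest-scoring hypotheses.
              beam = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam_size]
          return max(beam, key=lambda h: h[1])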

  • This Lecture

    ‣ Neural MT details

    ‣ Dilated CNNs for MT

    ‣ Transformers for MT

    ‣ Syntactic MT

  • Syntactic MT

  • Levels of Transfer: Vauquois Triangle

    Slide credit: Dan Klein

    ‣ Is syntax a “better” abstraction than phrases?

  • Syntactic MT

    ‣ Rather than use phrases, use a synchronous context-free grammar: constructs “parallel” trees in two languages simultaneously

      NP → [DT1 JJ2 NN3; DT1 NN3 JJ2]
      DT → [the, la]
      DT → [the, le]
      NN → [car, voiture]
      JJ → [yellow, jaune]

    ‣ Example: “the yellow car” ↔ “la voiture jaune” (parallel NP trees; the adjective is reordered)

    ‣ Assumes parallel syntax up to reordering

    ‣ Translation = parse the input with “half” the grammar, read off the other half (see the sketch below)
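    ‣ A minimal sketch of reading off the target side of the NP rule above; the rule encoding and the pre-parsed input are simplifying assumptions:

      # Source child tags -> target-side ordering (indices into the source children).
      # Encodes NP -> [DT1 JJ2 NN3; DT1 NN3 JJ2].
      sync_rules = {("DT", "JJ", "NN"): (0, 2, 1)}
      lexical = {"the": "la", "yellow": "jaune", "car": "voiture"}

      def translate_np(children):
          """children: list of (tag, word) pairs from a parsed source NP."""
          tags = tuple(tag for tag, _ in children)
          order = sync_rules[tags]
          return " ".join(lexical[children[i][1]] for i in order)

      print(translate_np([("DT", "the"), ("JJ", "yellow"), ("NN", "car")]))  # la voiture jaune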

  • Syntactic MT

    Slide credit: Dan Klein

    ‣ Relax this by using lexicalized rules, like “syntactic phrases”

    ‣ Leads to HUGE grammars, parsing is slow

  • Neural MT

  • Encoder-Decoder MT

    Sutskever et al. (2014)

    ‣ SOTA = 37.0 BLEU; not all that competitive…

    ‣ Sutskever seq2seq paper: first major application of LSTMs to NLP

    ‣ Basic encoder-decoder with beam search

  • Encoder-Decoder MT

    ‣ Better model from seq2seq lectures: encoder-decoder with attention and copying for rare words

    [Figure: encoder states h1 … h4 over “the movie was great”; decoder state h̄1 attends to them to form context c1, producing a distribution over vocab + copying that emits “le”]
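    ‣ A minimal NumPy sketch of one dot-product attention step of this kind; the toy dimensions and random vectors are assumptions, not the lecture’s model:

      import numpy as np

      def attention_step(decoder_state, encoder_states):
          """Dot-product attention: weights over encoder states, then a context vector."""
          scores = encoder_states @ decoder_state          # one score per source position
          weights = np.exp(scores - scores.max())
          weights /= weights.sum()                         # softmax over source positions
          context = weights @ encoder_states               # weighted sum of encoder states
          return context, weights

      rng = np.random.default_rng(0)
      h = rng.normal(size=(4, 8))      # encoder states h1..h4 ("the movie was great"), dim 8
      h_bar = rng.normal(size=8)       # current decoder state
      context, attn = attention_step(h_bar, h)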

  • Results: WMT English-French

    ‣ 12M sentence pairs

    Classic phrase-based system: ~33 BLEU, uses additional target-language data
    Rerank with LSTMs: 36.5 BLEU (long line of work here; Devlin+ 2014)
    Sutskever+ (2014) seq2seq single: 30.6 BLEU
    Sutskever+ (2014) seq2seq ensemble: 34.8 BLEU
    Luong+ (2015) seq2seq ensemble with attention and rare word handling: 37.5 BLEU

    ‣ But English-French is a really easy language pair and there’s tons of data for it! Does this approach work for anything harder?

  • Results: WMT English-German

    ‣ 4.5M sentence pairs

    Classic phrase-based system: 20.7 BLEU
    Luong+ (2014) seq2seq: 14 BLEU
    Luong+ (2015) seq2seq ensemble with rare word handling: 23.0 BLEU

    ‣ BLEU isn’t comparable across languages, but this performance still isn’t as good

    ‣ French, Spanish = easiest; German, Czech, Chinese = harder; Japanese, Russian = hard (grammatically different, lots of morphology…)

  • MT Examples

    Luong et al. (2015)

    ‣ best = with attention, base = no attention

    ‣ NMT systems can hallucinate words, especially when not using attention; phrase-based MT doesn’t do this

  • MT Examples

    Luong et al. (2015)

    ‣ best = with attention, base = no attention

  • MT Examples

    Zhang et al. (2017)

    ‣ NMT can repeat itself if it gets confused (pH or pH)

    ‣ Phrase-based MT often gets chunks right, but may have more subtle ungrammaticalities

  • Handling Rare Words

    Sennrich et al. (2016)

    ‣ Words are a difficult unit to work with: copying can be cumbersome, word vocabularies get very large

    ‣ Character-level models don’t work well

    ‣ Compromise solution: use thousands of “word pieces” (which may be full words but may also be parts of words)

    Input:  _the_ecotax_portico_in_Pont-de-Buis…
    Output: _le_portique_écotaxe_de_Pont-de-Buis

    ‣ Can achieve transliteration with this; subword structure makes some translations easier to achieve

  • Byte Pair Encoding (BPE)

    Sennrich et al. (2016)

    ‣ Start with every individual byte (basically character) as its own symbol

    ‣ Count bigram character co-occurrences

    ‣ Merge the most frequent pair of adjacent characters (see the sketch below)

    ‣ Do this either over your vocabulary (original version) or over a large corpus (more common version)

    ‣ Doing 8k merges => vocabulary of around 8000 word pieces. Includes many whole words

    ‣ Most SOTA NMT systems use this on both source + target
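    ‣ A minimal sketch of the BPE merge loop over a toy vocabulary; the word frequencies and number of merges here are hypothetical:

      from collections import Counter

      def learn_bpe(word_freqs, num_merges):
          """Repeatedly merge the most frequent pair of adjacent symbols."""
          vocab = {tuple(word): freq for word, freq in word_freqs.items()}  # start from characters
          merges = []
          for _ in range(num_merges):
              pair_counts = Counter()
              for symbols, freq in vocab.items():
                  for pair in zip(symbols, symbols[1:]):
                      pair_counts[pair] += freq
              if not pair_counts:
                  break
              best = max(pair_counts, key=pair_counts.get)   # most frequent adjacent pair
              merges.append(best)
              new_vocab = {}
              for symbols, freq in vocab.items():
                  merged, i = [], 0
                  while i < len(symbols):
                      if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                          merged.append(symbols[i] + symbols[i + 1])
                          i += 2
                      else:
                          merged.append(symbols[i])
                          i += 1
                  new_vocab[tuple(merged)] = freq
              vocab = new_vocab
          return merges

      print(learn_bpe({"lower": 5, "low": 7, "newest": 3, "widest": 2}, num_merges=10))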

  • Word Pieces

    Schuster and Nakajima (2012), Wu et al. (2016), Kudo and Richardson (2018)

    ‣ SentencePiece library from Google: unigram LM

    ‣ while voc size < target voc size:
        build a language model over your corpus
        merge pieces that lead to the highest improvement in language model perplexity

    ‣ Issues: what LM to use? How to make this tractable?

    ‣ Result: a way of segmenting input appropriate for translation (see the usage sketch below)
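    ‣ A typical SentencePiece invocation for this; the corpus path, model prefix, and vocabulary size are placeholders, and exact options may differ by version:

      import sentencepiece as spm

      # Train a unigram-LM word-piece model over a local text file.
      spm.SentencePieceTrainer.train(
          input="corpus.txt", model_prefix="wp", vocab_size=8000, model_type="unigram"
      )

      sp = spm.SentencePieceProcessor(model_file="wp.model")
      print(sp.encode("the ecotax portico in Pont-de-Buis", out_type=str))  # list of word pieces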

  • Google’s NMT System

    Wu et al. (2016)

    ‣ 8-layer LSTM encoder-decoder with attention, word piece vocabulary of 8k-32k

  • Google’s NMT System

    Wu et al. (2016)

    English-French:
      Luong+ (2015) seq2seq ensemble with rare word handling: 37.5 BLEU
      Google’s 32k word pieces: 38.95 BLEU
      Google’s phrase-based system: 37.0 BLEU

    English-German:
      Luong+ (2015) seq2seq ensemble with rare word handling: 23.0 BLEU
      Google’s 32k word pieces: 24.2 BLEU
      Google’s phrase-based system: 20.7 BLEU

  • Human Evaluation (En-Es)

    Wu et al. (2016)

    ‣ Similar to human-level performance on English-Spanish

  • Google’s NMT System

    Wu et al. (2016)

    ‣ Example translations (“sled”, “walker”): gender is correct in GNMT but not in PBMT

  • Backtranslation

    Sennrich et al. (2015)

    ‣ Classical MT methods used a bilingual corpus of sentences B = (S, T) and a large monolingual corpus T’ to train a language model. Can neural MT do the same?

    ‣ Approach 1: force the system to generate T’ as targets from null inputs
      training data: (s1, t1), (s2, t2), …, ([null], t’1), ([null], t’2), …

    ‣ Approach 2: generate synthetic sources with a T->S machine translation system (backtranslation); see the sketch below
      training data: (s1, t1), (s2, t2), …, (MT(t’1), t’1), (MT(t’2), t’2), …
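    ‣ A minimal sketch of building the augmented training set for approach 2; reverse_translate stands in for a trained T->S system and is hypothetical:

      def build_backtranslation_data(parallel_pairs, mono_targets, reverse_translate):
          """Augment real (source, target) pairs with (synthetic source, real target) pairs."""
          synthetic = [(reverse_translate(t), t) for t in mono_targets]
          return list(parallel_pairs) + synthetic

      # Toy usage with an identity "translator" as a stand-in for a real T->S model.
      data = build_backtranslation_data(
          [("le chat", "the cat")], ["the dog sleeps"], reverse_translate=lambda t: t
      )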

  • Backtranslation

    Sennrich et al. (2015)

    ‣ parallel synth: backtranslate the training data; makes additional noisy source sentences which could be useful

    ‣ Gigaword: large monolingual English corpus

  • Transformers for MT

  • Recall: Self-Attention

    Vaswani et al. (2017)

    ‣ Each word forms a “query” which then computes attention over each word

    ‣ Multiple “heads”, analogous to different convolutional filters. Use parameters Wk and Vk to get different attention values + transform vectors

    [Figure: attention from x4 over “the movie was great”; each attention weight α is a scalar, and x′4 is a sum of scalar * vector terms]

    α_{i,j} = softmax(x_i^T x_j)          x′_i = Σ_{j=1}^{n} α_{i,j} x_j

    α_{k,i,j} = softmax(x_i^T W_k x_j)    x′_{k,i} = Σ_{j=1}^{n} α_{k,i,j} V_k x_j
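    ‣ A minimal NumPy sketch of the multi-head equations above for a single head k; the toy dimensions and random parameters are assumptions:

      import numpy as np

      def softmax(z, axis=-1):
          z = z - z.max(axis=axis, keepdims=True)
          e = np.exp(z)
          return e / e.sum(axis=axis, keepdims=True)

      def self_attention_head(X, W_k, V_k):
          """alpha_{k,i,j} = softmax_j(x_i^T W_k x_j); x'_{k,i} = sum_j alpha_{k,i,j} V_k x_j."""
          scores = X @ W_k @ X.T            # n x n matrix of x_i^T W_k x_j
          alpha = softmax(scores, axis=-1)  # attention weights over positions j
          return alpha @ (X @ V_k.T)        # row i is sum_j alpha_{i,j} (V_k x_j)

      rng = np.random.default_rng(0)
      X = rng.normal(size=(4, 8))              # 4 tokens ("the movie was great"), dim 8
      W_k = rng.normal(size=(8, 8))
      V_k = rng.normal(size=(8, 8))
      out = self_attention_head(X, W_k, V_k)   # shape (4, 8)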

  • Transformers

    Vaswani et al. (2017)

    ‣ Augment word embedding with position embeddings; each dim is a sine/cosine wave of a different frequency. Closer points = higher dot products

    ‣ Works essentially as well as just encoding position as a one-hot vector

    [Figure: word embeddings for “the movie was great” combined with position embeddings emb(1) … emb(4)]
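    ‣ A minimal sketch of sinusoidal position embeddings in this style, following the standard formulation from Vaswani et al. (2017); the toy sizes here are arbitrary:

      import numpy as np

      def sinusoidal_position_embeddings(num_positions, dim):
          """emb(pos)[2i] = sin(pos / 10000^(2i/dim)), emb(pos)[2i+1] = cos(pos / 10000^(2i/dim))."""
          positions = np.arange(num_positions)[:, None]              # (num_positions, 1)
          freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))    # one frequency per dim pair
          angles = positions * freqs                                 # (num_positions, dim/2)
          emb = np.zeros((num_positions, dim))
          emb[:, 0::2] = np.sin(angles)
          emb[:, 1::2] = np.cos(angles)
          return emb

      pos_emb = sinusoidal_position_embeddings(num_positions=4, dim=8)  # positions 0..3 for a 4-word input
      # Nearby positions typically have higher dot products:
      # pos_emb[0] @ pos_emb[1] > pos_emb[0] @ pos_emb[3]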

  • Transformers

    Vaswani et al. (2017)

    ‣ Encoder and decoder are both transformers

    ‣ Decoder consumes the previously generated token (and attends to the input), but has no recurrent state

  • Transformers

    Vaswani et al. (2017)

    ‣ Big = 6 layers, 1000 dim for each token, 16 heads; base = 6 layers + other params halved

  • Visualization

    Vaswani et al. (2017)

  • Visualization

    Vaswani et al. (2017)

  • Takeaways

    ‣ Can build MT systems with LSTM encoder-decoders, CNNs, or transformers

    ‣ Word piece / byte pair models are really effective and easy to use

    ‣ State-of-the-art systems are getting pretty good, but lots of challenges remain, especially for low-resource settings

    ‣ Next time: pre-trained transformer models (BERT), applied to other tasks

