Discriminative Phonotactics for Dialect Recognition Using Context-Dependent Phone Classifiers
Fadi Biadsy*, Hagen Soltau+, Lidia Mangu+, Jiri Navratil+, Julia Hirschberg*
*Columbia University, NY, USA; +IBM T. J. Watson Research Center, NY, USA
July 1st, 2010

Dialect Recognition

- Similar to language recognition, but applied to dialects/accents of the same language
- Dialects may differ in any dimension of the linguistic spectrum
- Differences are likely to be more subtle across dialects than across languages
- Thus, a more challenging problem than language recognition

Motivation: Why Study Dialect Recognition?

- To discover differences between dialects
- To improve Automatic Speech Recognition (ASR)
  - Model adaptation: pronunciation, acoustic, morphological, and language models
- To infer a speaker's regional origin for:
  - Forensic speaker profiling
- Speech-to-speech translation
- Annotations for Broadcast News Monitoring
- Spoken dialogue systems: adapt TTS systems
- Charismatic speech identification

Multiple cues that may distinguish dialects:

- Phonetic cues:
  - Differences in phonemic inventory
- Phonemic differences
- Allophonic differences (context-dependent phones)
- Phonotactics

(Al-Tamimi & Ferragne, 2005)

Example: /r/ is an approximant [ɹ] in American English, where it modifies preceding vowels; it is trilled in Scottish English in the context [Consonant]–/r/–[Vowel] and in some other contexts.

Example, "She will meet him":
MSA: /s/ /a/ /t/ /u/ /q/ /A/ /b/ /i/ /l/ /u/ /h/ /u/
Egy: /H/ /a/ /t/ /?/ /a/ /b/ /l/ /u/
Lev: /r/ /a/ /H/ /t/ /g/ /A/ /b/ /l/ /u/
(Differences in morphology; differences in phonetic inventory and vowel usage)

Outline

- Dialects and Corpora
- CD-Phone Recognizer
- Baselines
- Two Ideas:
  - GMM-UBM with fMLLR
  - Discriminative Phonotactics
- Results
- Conclusions and Future Work

Case Study: Arabic Dialects

[Figure: Arabic dialect map (by Arab Atlas)]

Corpora

- For testing: 25% female–mobile, 25% female–landline, 25% male–mobile, 25% male–landline
- Egyptian: training on CallHome Egyptian, testing on CallFriend Egyptian

Dialect | # Speakers | Test (20%): # 30s* cuts | Corpus
Gulf | 976 | 801 | (Appen Pty Ltd, 2006a)
Iraqi | 478 | 477 | (Appen Pty Ltd, 2006b)
Levantine | 985 | 818 | (Appen Pty Ltd, 2007)

Dialect | # Training speakers | # 30s* cuts (120 speakers) | Corpora
Egyptian | 280 | 1912 | (Canavan and Zipperlen, 1996; Canavan et al., 1997)

*Exactly 30s

Context-Dependent (CD) Phone Recognizer

- HMM triphone-based phone recognizer using IBM's Attila system
  - Trained on 50 hours of GALE broadcast news and conversations
  - 230 CD acoustic models and 20,000 Gaussians
- Front-end:
  - 13D PLP features per frame
  - Each frame is spliced together with four preceding and four succeeding frames, followed by LDA to 40D
  - CMVN
- Speaker adaptation: fMLLR followed by MLLR
- Unigram phone language model trained on MSA
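
The front-end steps listed above can be sketched in plain Python. This is an illustration, not the Attila implementation: the function names are hypothetical, edge frames are padded by repeating the first/last frame (an assumption), and the identity "LDA" matrix is only a placeholder for a projection that would in practice be estimated from labeled training frames.

```python
from typing import List

Frame = List[float]


def cmvn(frames: List[Frame]) -> List[Frame]:
    """Per-dimension mean and variance normalization over one utterance."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        stds.append(var ** 0.5 if var > 0 else 1.0)  # leave constant dimensions unscaled
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


def splice(frames: List[Frame], context: int = 4) -> List[Frame]:
    """Concatenate each frame with `context` preceding and `context` succeeding frames.

    Edge frames are padded by repeating the first/last frame (an assumption).
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        stacked: Frame = []
        for k in range(t - context, t + context + 1):
            stacked.extend(frames[min(max(k, 0), n - 1)])
        spliced.append(stacked)
    return spliced


def project(frames: List[Frame], lda: List[List[float]]) -> List[Frame]:
    """Apply a pre-trained projection matrix (one row per output dimension)."""
    return [[sum(w * x for w, x in zip(row, f)) for row in lda] for f in frames]


# Example: 13-dimensional frames -> 9 x 13 = 117 spliced dims -> 40 dims.
plp = [[float(t + d) for d in range(13)] for t in range(10)]
spliced = splice(plp)
lda = [[1.0 if j == i else 0.0 for j in range(117)] for i in range(40)]  # placeholder, not a trained LDA
features = project(spliced, lda)
```

The slides do not state whether CMVN is applied before or after splicing and LDA, so `cmvn` is kept as a separate step that can be placed wherever the pipeline requires.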

Baselines

- Standard PRLM: a trigram phonotactic model per dialect
- Standard GMM-UBM:
  - Front-end: same as the front-end of the phone recognizer
  - 2048 Gaussians, ML-trained on an equal number of frames from each dialect
  - Dialect models are MAP-adapted with 5 iterations, using settings similar to the baseline in (Torres-Carrasquillo et al., 2008)
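
In PRLM, one phone recognizer decodes each utterance and a per-dialect phone trigram model scores the resulting phone string; the highest-scoring dialect wins. A minimal sketch follows, assuming add-one smoothing (the slides do not say which smoothing the baseline uses); the class and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


class TrigramPhoneLM:
    """Add-one-smoothed phone trigram model (the smoothing choice is an assumption)."""

    def __init__(self, sequences: Sequence[Sequence[str]], vocab: Sequence[str]):
        self.vocab_size = len(set(vocab)) + 1  # +1 for the end symbol "</s>"
        self.tri: Counter = Counter()
        self.ctx: Counter = Counter()
        for seq in sequences:
            padded = ["<s>", "<s>"] + list(seq) + ["</s>"]
            for i in range(2, len(padded)):
                self.tri[tuple(padded[i - 2:i + 1])] += 1
                self.ctx[tuple(padded[i - 2:i])] += 1  # count of the two-phone history

    def logprob(self, seq: Sequence[str]) -> float:
        padded = ["<s>", "<s>"] + list(seq) + ["</s>"]
        total = 0.0
        for i in range(2, len(padded)):
            num = self.tri[tuple(padded[i - 2:i + 1])] + 1
            den = self.ctx[tuple(padded[i - 2:i])] + self.vocab_size
            total += math.log(num / den)
        return total


def classify(models: Dict[str, TrigramPhoneLM], seq: List[str]) -> str:
    """PRLM decision: the dialect whose phonotactic model scores the sequence highest."""
    return max(models, key=lambda d: models[d].logprob(seq))


# Toy usage with two made-up "dialects" over a two-phone inventory.
vocab = ["a", "b"]
models = {
    "dialect1": TrigramPhoneLM([["a"] * 4] * 5, vocab),
    "dialect2": TrigramPhoneLM([["b"] * 4] * 5, vocab),
}
```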

Results (DET Curves of PRLM and GMM-UBM) – 30s Cuts

Approach | EER (%)
PRLM | 17.7
GMM-UBM | 15.3*

*Comparable to the GMM-UBM of (Torres-Carrasquillo et al., 2008) on 3 dialects
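
The EER in these tables is the operating point at which the miss rate equals the false-alarm rate. A minimal sketch of estimating it from detection scores by a threshold sweep follows; the function name is hypothetical, and dedicated DET/EER tools usually interpolate between operating points rather than averaging the two rates at the closest threshold as done here.

```python
from typing import Sequence, Tuple


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> Tuple[float, float]:
    """Return (EER, threshold), accepting a trial when score >= threshold.

    The EER is estimated at the swept threshold where the miss and
    false-alarm rates are closest, as the mean of the two rates there.
    """
    best = (float("inf"), 0.0, 0.0)  # (|miss - fa|, EER estimate, threshold)
    for thr in sorted(set(target) | set(nontarget)):
        miss = sum(s < thr for s in target) / len(target)
        fa = sum(s >= thr for s in nontarget) / len(nontarget)
        gap = abs(miss - fa)
        if gap < best[0]:
            best = (gap, (miss + fa) / 2, thr)
    return best[1], best[2]
```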

Our GMM-UBM Improved with fMLLR

- Motivation: feature normalization (CMVN and VTLN) improves GMM-UBM for language and dialect recognition (e.g., Wong and Sridharan, 2002; Torres-Carrasquillo et al., 2008)
- Our approach: feature-space Maximum Likelihood Linear Regression (fMLLR) adaptation
  - Use a CD-phone recognizer to obtain the CD-phone sequence, then transform the features "towards" the corresponding acoustic model GMMs (one matrix per speaker)
  - Same as the GMM-UBM approach, but use the transformed acoustic vectors instead

[Figure: fMLLR illustration, with the example context [Vowel]-/r/-[Consonant]]
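
For reference, fMLLR (also called constrained MLLR) applies one affine transform per speaker to every feature vector. The standard formulation below is general background rather than something stated on the slides; in the approach above, the Gaussian occupancies would come from aligning the recognized CD-phone sequence to its acoustic-model GMMs.

```latex
\begin{aligned}
\hat{\mathbf{x}}_t &= \mathbf{A}\mathbf{x}_t + \mathbf{b} = \mathbf{W}\boldsymbol{\xi}_t,
\qquad \mathbf{W} = [\mathbf{A}\;\; \mathbf{b}], \qquad
\boldsymbol{\xi}_t = [\mathbf{x}_t^{\top}\;\; 1]^{\top} \\
\hat{\mathbf{W}} &= \arg\max_{\mathbf{W}} \sum_{t} \sum_{m} \gamma_m(t)
\left[ \log\lvert\det \mathbf{A}\rvert
+ \log \mathcal{N}\!\left(\mathbf{A}\mathbf{x}_t + \mathbf{b};\, \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m\right) \right]
\end{aligned}
```

Here \gamma_m(t) is the posterior occupancy of Gaussian m at frame t, and the log-determinant term is the Jacobian of the feature transform.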

Results – GMM-UBM-fMLLR – 30s Cuts

Approach | EER (%)
PRLM | 17.7
GMM-UBM | 15.3
GMM-UBM-fMLLR | 11.0

Discriminative Phonotactics

- Hypothesis: dialects differ in their allophones (context-dependent phones) and their phonotactics
- Idea: discriminate dialects first at the level of context-dependent (CD) phones, and then at the level of phonotactics

I. Obtain CD-phones
II. Extract acoustic features for each CD-phone
III. Discriminate CD-phones across dialects
IV. Augment the CD-phone sequences and extract phonotactic features
V. Train a discriminative classifier to distinguish dialects

Example: /r/ is an approximant [ɹ] in American English and trilled in Scottish English in the context [Consonant]–/r/–[Vowel]

Obtaining CD-Phones

- Run our CD-phone recognizer to obtain a CD-phone sequence, e.g.:
  ... [Back vowel]-r-[Central Vowel] [Plosive]-A-[Voiced Consonant] [Central Vowel]-b-[High Vowel] ...
  (not just /r/ /A/ /b/)
- Do the above for all training data of all dialects

CD-Phone Universal Background Acoustic Model

- Each CD-phone type has an acoustic model, e.g., [Back vowel]-r-[Central Vowel]

Obtaining CD-Phones + Frame Alignment

[Figure: diagram with labels: Acoustic frames (front-end); Acoustic frames for second state; CD-phone recognizer; CD acoustic models; CD-phones (e.g., [vowel]-b-[glide], [front-vowel]-r-[sonorant])]

MAP Adaptation of each CD-Phone Instance

- MAP adapt the CD-phone acoustic model GMMs to the corresponding frames (r = 0.1)

[Figure: each instance of [Back Vowel]-r-[Central Vowel] is MAP adapted separately]

MAP Adaptation of each CD-Phone Instance

- MAP adapt the CD-phone acoustic model GMMs to the corresponding frames*
- One supervector for each CD-phone instance: stack all the Gaussian means and the phone duration, V_k = [μ_1, μ_2, …, μ_N, duration]
  - i.e., summarize the acoustic-phonetic features of each CD-phone instance in one vector

[Figure: example CD-phone [Back Vowel]-r-[Central Vowel]]

*Similar to (Campbell et al., 2006), but at the level of the CD-phone
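
A minimal sketch of per-instance adaptation and supervector construction. It assumes the standard relevance-MAP update of the means only, with r read as the relevance factor: the slides give r = 0.1 but not the exact update rule, and they do not say whether weights or variances are also adapted. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def map_adapt_means(
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
    r: float = 0.1,
) -> List[Vector]:
    """Relevance-MAP update of GMM means (weights and variances are kept fixed)."""
    dim = len(means[0])
    n = [0.0] * len(means)                # soft counts per Gaussian
    first = [[0.0] * dim for _ in means]  # posterior-weighted sums of frames
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        post = [math.exp(lg - top) for lg in logs]  # log-sum-exp for numerical stability
        z = sum(post)
        for k, p in enumerate(post):
            g = p / z
            n[k] += g
            for d in range(dim):
                first[k][d] += g * x[d]
    adapted = []
    for k, mu in enumerate(means):
        alpha = n[k] / (n[k] + r)  # data-dependent adaptation coefficient
        ex = [f / n[k] if n[k] > 0 else 0.0 for f in first[k]]
        adapted.append([alpha * e + (1 - alpha) * m for e, m in zip(ex, mu)])
    return adapted


def supervector(adapted_means: Sequence[Sequence[float]], duration: float) -> Vector:
    """V_k = [mu_1, ..., mu_N, duration]: stacked adapted means plus the phone duration."""
    return [v for mu in adapted_means for v in mu] + [duration]
```

With r as small as 0.1, alpha is close to 1 for any Gaussian that receives even a fraction of a frame, so each instance's means move almost entirely to its own frames, while Gaussians with no occupancy keep their background values.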

SVM Classifier for each CD-Phone Type for each Pair of Dialects

[Figure: supervectors of [Back Vowel]-r-[Central Vowel] instances from all training speakers in dialect 1 vs. those from all training speakers in dialect 2]
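
The slides do not name the SVM kernel or solver. The sketch below assumes a linear kernel and trains it with Pegasos-style subgradient descent on the L2-regularized hinge loss, appending a constant feature so the last weight acts as a bias; the function names are hypothetical.

```python
import random
from typing import List, Sequence


def train_linear_svm(
    X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 0.01, epochs: int = 100, seed: int = 0
) -> List[float]:
    """Pegasos subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 (dialect 1) or -1 (dialect 2). A constant 1.0 feature is
    appended so the last weight acts as a (regularized) bias.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # gradient step on the regularizer
            if margin < 1.0:  # hinge loss active: step toward the correct side
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """+1 for dialect 1, -1 for dialect 2."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In the setup described, one such classifier would be trained per CD-phone type and per pair of dialects, with the supervectors of that CD-phone's instances from dialect 1 labeled +1 and those from dialect 2 labeled -1.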

Discriminative Phonotactics – CD-Phone Classification

[Figure: pipeline diagram with labels: Acoustic frames (front-end); Acoustic frames for second state; CD-phone recognizer; CD acoustic models; CD-phones (e.g., [vowel]-b-[glide], [front-vowel]-r-[sonorant]); MAP adapt GMMs; MAP-adapted acoustic models; Super Vectors (Super Vector 1 … Super Vector N); SVM classifiers; Dialects (e.g., Egy)]

CD-Phone Classifier Results

- Split the training data into two halves
- For each pair of dialects, train 227 binary classifiers (one for each CD-phone type) on the first half and test on the second half

*Performed significantly better than chance (50%)

Extraction of Linguistic Knowledge

- Use the results of these classifiers to show which phones in which contexts distinguish dialects the most (chance is 50%)

[Figure: Levantine/Iraqi dialects]
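
The slides report that classifiers performed significantly better than chance (50%) but do not name the test. The sketch below assumes a one-sided exact binomial test and then ranks the significant CD-phone types by held-out accuracy; the function names and the toy counts are hypothetical, not results from the deck.

```python
from math import comb
from typing import Dict, List, Tuple


def binomial_p_value(correct: int, total: int, chance: float = 0.5) -> float:
    """One-sided exact binomial test: P(X >= correct) for X ~ Binomial(total, chance)."""
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


def rank_cd_phones(
    results: Dict[str, Tuple[int, int]], alpha: float = 0.05
) -> List[Tuple[str, float, float]]:
    """Rank CD-phone types by held-out accuracy, keeping those above chance at level alpha.

    `results` maps a CD-phone type to (correct, total) on the held-out half;
    each returned row is (CD-phone type, accuracy, p-value).
    """
    rows = [(phone, c / n, binomial_p_value(c, n)) for phone, (c, n) in results.items()]
    significant = [row for row in rows if row[2] < alpha]
    return sorted(significant, key=lambda row: row[1], reverse=True)


# Toy counts for one dialect pair (made up for illustration).
ranked = rank_cd_phones({
    "[Back vowel]-r-[Central Vowel]": (90, 100),
    "[Plosive]-A-[Voiced Consonant]": (52, 100),
    "[Central Vowel]-b-[High Vowel]": (75, 100),
})
```

With 227 CD-phone types tested per dialect pair, a multiple-comparison correction (for example, Bonferroni) may be worth considering before reading the ranking as linguistic evidence.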

Labeling Phone Sequences with Dialect Hypotheses

- Run the CD-phone recognizer, then run the corresponding SVM classifier to get the dialect of each CD-phone:

  ... [Back vowel]-r-[Central Vowel] [Plosive]-A-[Voiced Consonant] [Central Vowel]-b-[High Vowel] ...

  [Back vowel]-r-[Central Vowel] → Egyptian
  [Plosive]-A-[Voiced Consonant] → Egyptian
  [Central Vowel]-b-[High Vowel] → Levantine
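
A minimal sketch of the labeling step, assuming each per-CD-phone classifier is a linear decision function stored as a weight list with a trailing bias weight. The dictionary layout keyed by (CD-phone type, dialect A, dialect B), the function name, and the "?" label for CD-phone types without a classifier are all hypothetical choices for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: one linear classifier (weights plus a trailing bias) per
# (CD-phone type, dialect A, dialect B); a non-negative score means dialect A.
Classifier = List[float]


def label_sequence(
    instances: Sequence[Tuple[str, Sequence[float]]],
    classifiers: Dict[Tuple[str, str, str], Classifier],
    pair: Tuple[str, str],
) -> List[Tuple[str, str]]:
    """Attach a dialect hypothesis to each (CD-phone, supervector) instance."""
    a, b = pair
    labeled = []
    for phone, sv in instances:
        w = classifiers.get((phone, a, b))
        if w is None:  # no classifier for this CD-phone type: mark as unknown
            labeled.append((phone, "?"))
            continue
        score = sum(wi * xi for wi, xi in zip(w, list(sv) + [1.0]))
        labeled.append((phone, a if score >= 0 else b))
    return labeled
```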

Textual Feature Extraction for Discriminative Phonotactics

- Extract the following textual features from each pair of dialects
- Normalize the feature vector by its norm
- Train a logistic regression classifier with an L2 regularizer
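
The list of textual features on this slide did not survive extraction, so the sketch below uses a stand-in: n-gram counts over hypothetical "phone|dialect" tokens, divided by the vector's L2 norm (the slide says only "its norm"). It then trains a logistic regression with an L2 regularizer by batch gradient descent. The token format, function names, and hyperparameters are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

BIAS = "<bias>"


def ngram_features(tokens: Sequence[str], n_max: int = 2) -> Dict[str, float]:
    """Hypothetical textual features: token n-gram counts divided by the vector's L2 norm."""
    counts = Counter(
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    )
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {k: c / norm for k, c in counts.items()}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Dict[str, float], x: Dict[str, float]) -> float:
    return sigmoid(w.get(BIAS, 0.0) + sum(w.get(k, 0.0) * v for k, v in x.items()))


def train_logistic(
    data: Sequence[Tuple[Dict[str, float], int]], l2: float = 0.01, lr: float = 1.0, epochs: int = 300
) -> Dict[str, float]:
    """Batch gradient descent on mean log loss + (l2 / 2) * ||w||^2 (bias unregularized)."""
    w: Dict[str, float] = {}
    m = len(data)
    for _ in range(epochs):
        grad = {k: l2 * v for k, v in w.items() if k != BIAS}
        for x, y in data:
            err = predict_proba(w, x) - y  # derivative of the log loss w.r.t. the score
            grad[BIAS] = grad.get(BIAS, 0.0) + err / m
            for k, v in x.items():
                grad[k] = grad.get(k, 0.0) + err * v / m
        for k, g in grad.items():
            w[k] = w.get(k, 0.0) - lr * g
    return w
```

Here label 1 stands for one dialect of the pair and label 0 for the other, so one such model would be trained per dialect pair.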

Experiments – Training Two Models

- Split the training data into two halves
- Train the SVM CD-phone classifiers using the first half
- Run these SVM classifiers to annotate the CD-phones of the second half
- Train the logistic classifier on the annotated sequences

Discriminative Phonotactics – Dialect Recognition

[Figure: pipeline diagram with labels: Acoustic frames (front-end); Acoustic frames for second state; CD-phone recognizer; CD acoustic models; CD-phones (e.g., [vowel]-b-[glide], [front-vowel]-r-[sonorant]); MAP adapt GMMs; MAP-adapted acoustic models; Super Vectors (Super Vector 1 … Super Vector N); SVM classifiers labeling each CD-phone with a dialect (e.g., Egy); Logistic classifier; output dialect (e.g., Egyptian)]

Baselines

- Standard PRLM: a trigram phonotactic model per dialect
- Standard GMM-UBM:
  - Front-end: 13D PLP features from 9 frames, followed by LDA to 40D; CMVN
  - 2048 Gaussians, ML-trained on an equal number of frames from each dialect
  - Dialect models are MAP-adapted with 5 iterations (similar to Torres-Carrasquillo et al., 2008)

Results – Discriminative Phonotactics

Approach | EER (%)
PRLM | 17.7
GMM-UBM | 15.3
GMM-UBM-fMLLR | 11.0
Disc. Phonotactics | 6.0

Results per Dialect

Dialect | GMM-UBM-fMLLR (%) | Disc. Phonotactics (%)
Egyptian | 4.4 | 1.3
Iraqi | 11.1 | 6.6
Levantine | 12.8 | 6.9
Gulf | 15.6 | 7.8

Conclusions

- Using fMLLR to transform the acoustic features significantly improves results for the GMM-UBM approach
  - We still need to do more analyses
- The proposed method helps in understanding the linguistic differences between dialects
- Discriminative phonotactics outperforms GMM-UBM-fMLLR by 5% absolute EER (6.0% vs. 11.0%)

Future Work

- A new SVM kernel that computes the similarity of all phone supervectors across two utterances, so that only one SVM classifier is needed for each pair of dialects (IS2010; submitted)
- Test this approach on shorter utterances (3s and 10s)
- Try this approach on dialects/accents of other languages:
  - English accents (American English and Indian English)
  - American English dialects
- Apply VTLN
- Test with NAP (which needs to be modified to accommodate short-context supervectors)

Thank You!

- Acknowledgments: Jason Pelecanos for useful discussions

Case Study: Arabic Dialects – Our Data

- Iraqi Arabic: Baghdadi, Northern, and Southern
- Gulf Arabic: Omani, UAE, and Saudi Arabic
- Levantine Arabic: Jordanian, Lebanese, Palestinian, and Syrian Arabic
- Egyptian Arabic: primarily Cairene Arabic

