NIST SRE 2008 Workshop

Loquendo - Politecnico di Torino (LPT) site presentation

Daniele Colibro, Claudio Vair
Loquendo

Fabio Castaldo, Emanuele Dalmasso, Pietro Laface
Dipartimento di Automatica e Informatica, Politecnico di Torino

June 17, 2008

NIST SRE 2008 Workshop: 17-18 June

Outline

Main goals

System description, key features

Training approach

Development history

Summed condition tests

Unsupervised adaptation tests

Our main goals

Improve performance on short durations

Improve performance on mismatched conditions

Deal with the new interview data

Perform all SRE 2008 tests

System description

Two GMM systems were used for this evaluation

Phonetic GMM (PGMM) with 1408 (128x11) Gaussians

GMM with 512/1024/2048 Gaussians

GMM and PGMM main features

Feature Domain Intersession Compensation (FDIC) in training and testing

Speaker factors + Relevance MAP in training

Standard log-likelihood computation in test

No discriminative models (e.g. SVM)

FoCal toolkit used for fusion, calibration and Log Likelihood computation, with prior weighted Logistic Regression objective

Key features

All conditions

Extended training set for gender conditioned UBMs, Intersession and Eigen-Speaker modeling

Extended data set for ZNorm and TNorm

Intersession compensation using condition-dependent subspaces

Key features

Short duration:

Speaker factors

Mismatched conditions:

Intersession subspace estimated on more data

Speaker factors

Parameter tuning (# of Gaussians and # of MFCC parameters)

Interviews:

Development data for channel compensation

Acoustic features

Standard MFCC parameters

GMM-25: 12 cepstral (c1-c12) + 13 delta (Δc0-Δc12)

GMM-43: 18 cepstral (c1-c18) + 19 delta (Δc0-Δc18) + 6 double delta (ΔΔc0-ΔΔc5)

PGMM-36: 18 cepstral (c1-c18) + 18 delta (Δc1-Δc18)

All systems perform feature warping to a Gaussian distribution on a 3 sec sliding window excluding silence frames
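
As an illustration of the warping step, here is a minimal sketch for a single feature stream. It assumes a 10 ms frame shift (so a 3 sec window is about 300 frames), that silence frames have already been dropped, and a rank-to-quantile mapping of (rank - 0.5) / N. None of these details are stated on the slide, and `warp_features` is a hypothetical name.

```python
from statistics import NormalDist


def warp_features(values, window=300):
    """Warp one feature stream to a standard normal over a sliding window.

    `values` holds one cepstral coefficient per non-silence frame.
    `window` is the window length in frames (assumed: 3 s at 10 ms/frame).
    """
    std_normal = NormalDist()  # zero mean, unit variance target
    half = window // 2
    warped = []
    for t, x in enumerate(values):
        lo = max(0, t - half)
        hi = min(len(values), t + half + 1)
        win = values[lo:hi]
        # Rank of the current value inside the window (1 = smallest).
        rank = 1 + sum(1 for v in win if v < x)
        # Map the rank to a quantile, then to the Gaussian value at that quantile.
        warped.append(std_normal.inv_cdf((rank - 0.5) / len(win)))
    return warped
```

Each coefficient would be warped independently with the same routine; the window is truncated at the edges of the utterance in this sketch.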

Training data

No use of SRE06 for training

Gender dependent UBMs: SRE04 + SRE05

Intersession compensation eigen-matrix U: SRE04 + SRE05, interview development data

Speaker factors eigen-matrix V: SRE99 + SRE00 + SRE03 + SRE05 + 1029 females and 828 males randomly selected from the Fisher English Training Speech Part 1/2 among the speakers contributing at least 3 utterances

Overall: 2079 female and 1634 male speakers

Intersession compensation

Intersession compensation can be done:

In model domain

In feature domain

Directly during test, adjusting the likelihood computation

No relevant performance differences in our tests
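
For orientation, the first two options can be written in the usual subspace notation. This is a sketch under assumed symbols, none of which appear on the slide: $\mu$ is a GMM mean supervector, $x$ the session (channel) factors estimated for a recording, $U_m$ the rows of $U$ belonging to Gaussian $m$, and $\gamma_m(t)$ the occupation probability of Gaussian $m$ for frame $o(t)$.

```latex
\begin{align*}
\hat{\mu} &= \mu + U x && \text{(model domain: shift the model toward the session)} \\
\hat{o}(t) &= o(t) - \sum_{m} \gamma_m(t)\, U_m x && \text{(feature domain: remove the session offset from each frame)}
\end{align*}
```

Both forms use the same subspace $U$ and the same factors $x$; they differ only in whether the model or the observation is shifted.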

Matrix U training

Speaker model training by Relevance MAP

Intersession compensation matrix U computed using the differences between models of the same speaker.

Interview: differences with the interviewee near microphone (channel 2)

Matrix U computed using EM-PCA
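
The EM-PCA idea can be sketched for a single direction as follows. This is a toy illustration, not the authors' implementation: it extracts only the leading direction from a handful of low-dimensional difference vectors, whereas the U matrices in this deck have tens of columns and supervector-sized rows. The function name and the starting vector are assumptions.

```python
def em_pca_first_direction(diffs, iterations=20):
    """Estimate the leading direction of a set of difference vectors with EM-PCA.

    `diffs` is a list of equal-length vectors, e.g. differences between
    supervectors of the same speaker recorded in different sessions.
    """
    dim = len(diffs[0])
    c = [1.0] * dim  # arbitrary starting direction (must not be orthogonal to the data)
    for _ in range(iterations):
        # E-step: coordinate of every difference vector along the current direction.
        cc = sum(v * v for v in c)
        x = [sum(d_k * c_k for d_k, c_k in zip(d, c)) / cc for d in diffs]
        # M-step: direction that best reconstructs the data from those coordinates.
        xx = sum(x_i * x_i for x_i in x)
        c = [sum(x_i * d[k] for x_i, d in zip(x, diffs)) / xx for k in range(dim)]
    norm = sum(v * v for v in c) ** 0.5
    return [v / norm for v in c]
```

The appeal of the EM formulation is that it never forms the full covariance matrix, which would be impractical at supervector dimensionality.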

Speaker modeling

The speaker models are trained using Feature Domain Intersession Compensation (FDIC) features by a sequence of:

1. Eigen-speaker MAP modeling: $s' = m_{UBM} + V y$

2. Relevance MAP adaptation using the Gaussian occupation statistics computed on $s'$ (relevance = 16): $s'' = s' + D z$
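
The second step can be illustrated with the classical relevance MAP update of a single Gaussian mean. This is a sketch that assumes the standard interpolation formula with the eigen-speaker model as prior; the slide gives only the relevance factor (16), and the function and argument names are hypothetical.

```python
def relevance_map_mean(prior_mean, occupancy, first_order_sum, relevance=16.0):
    """MAP-adapt one Gaussian mean toward the data.

    prior_mean       mean of this Gaussian in the prior model (here: the eigen-speaker model)
    occupancy        soft count of frames assigned to this Gaussian
    first_order_sum  occupancy-weighted sum of the frames assigned to it
    relevance        relevance factor; larger values keep the mean closer to the prior
    """
    if occupancy <= 0.0:
        # No data for this Gaussian: keep the prior mean unchanged.
        return list(prior_mean)
    alpha = occupancy / (occupancy + relevance)
    data_mean = [s / occupancy for s in first_order_sum]
    return [alpha * d + (1.0 - alpha) * p for d, p in zip(data_mean, prior_mean)]
```

With an occupancy equal to the relevance factor, the adapted mean lies halfway between the prior mean and the data mean, so sparsely observed Gaussians stay close to the eigen-speaker estimate.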

Matrix V training

Relevance MAP training of speaker models, at least 3 utterances for each speaker

The eigen-speaker matrix V computed using EM-PCA

or

Maximum Likelihood + Minimum Divergence training

Interaction between U and V matrices

The matrices U and V can be trained using different approaches:

1. U’ first, then V’ using FDIC features depending on U’

2. U’’ and V’’ independently

3. V’’’ first, then U’’’ using models depending on V’’’

In our results, approaches 1. and 2. are equivalent, whereas 3. gives worse performance.

U and V are “orthogonal”

Different U can be used with the same V without accuracy loss

Score Normalization

We performed ZT-Normalization as we did in SRE06

For the trials involving phone call tests, our ZNorm set includes 1252 female and 1103 male files taken from the SRE04/05 phone calls.

For trials involving microphone and interview speech, the ZNorm set includes 204 female and 180 male additional files taken from the SRE05 microphone subset.

The same sets, selected according to the training conditions, were used for training the TNorm models.

For the 3/8conv training conditions, the ZNorm set is the same used in SRE06, which includes 80 speakers with 3/8sides from SRE04.
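
A minimal sketch of how ZT-normalization combines the two steps is given below. It assumes the common definition (Z-norm with impostor-utterance statistics of each model, followed by T-norm over cohort scores that have themselves been Z-normalized) and uses the sample standard deviation; the function names and argument layout are hypothetical.

```python
from statistics import mean, stdev


def znorm(score, impostor_scores):
    """Normalize a score with the model's scores against the ZNorm utterances."""
    return (score - mean(impostor_scores)) / stdev(impostor_scores)


def tnorm(score, cohort_scores):
    """Normalize a score with the TNorm cohort models' scores on the same test segment."""
    return (score - mean(cohort_scores)) / stdev(cohort_scores)


def zt_norm(raw, target_z_scores, cohort_raw, cohort_z_scores):
    """Apply Z-norm, then T-norm.

    raw              score of the target model on the test segment
    target_z_scores  target model scored against the ZNorm utterances
    cohort_raw       each TNorm model scored on the test segment
    cohort_z_scores  each TNorm model scored against the ZNorm utterances
    """
    z_target = znorm(raw, target_z_scores)
    z_cohort = [znorm(c, zs) for c, zs in zip(cohort_raw, cohort_z_scores)]
    return tnorm(z_target, z_cohort)
```

The Z-norm statistics depend only on the model and can be computed once at enrollment, while the T-norm statistics have to be recomputed for every test segment.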

Development history: 1conv4w-1conv4w

[Figure: EER (%) and Min DCF per system configuration, SRE2006 - 1conv4w-1conv4w - All Trials]

EER (%): 5.88 5.57 5.23 5.01 5.66 4.62 4.59 4.28

Min DCF: 0.278 0.278 0.264 0.257 0.272 0.243 0.239 0.221

Configurations: GMM SRE06; GMM-25-512: MAP U40TelMic, V300+D16, NormExt; GMM-43-1024: MAP U60TelMic, V300+D16, NormExt, U60Tel

EER: -28%, DCF: -20%
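
For readers unfamiliar with the metric, the equal error rate (EER) reported for each configuration can be approximated from raw trial scores as sketched below. This is a generic threshold sweep, not the scoring tool used for the evaluation; the function name and the tie-handling convention (accept when the score is at or above the threshold) are assumptions.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Return the error rate at the threshold where miss and false-alarm rates are closest."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, best_rate = None, None
    for th in thresholds:
        # Miss: a target trial scored below the threshold.
        p_miss = sum(s < th for s in target_scores) / len(target_scores)
        # False alarm: an impostor trial scored at or above the threshold.
        p_fa = sum(s >= th for s in impostor_scores) / len(impostor_scores)
        gap = abs(p_miss - p_fa)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (p_miss + p_fa) / 2.0
    return best_rate
```

The result is a fraction; multiply by 100 to compare with the percentages quoted on the slides.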

Development history: 1conv4w-1convmic

[Figure: EER (%) and Min DCF per system configuration, SRE2006 - 1conv4w-1convmic - All Trials]

EER (%): 6.42 4.72 3.73 3.43 4.68 4.48 3.67

Min DCF: 0.27 0.198 0.179 0.149 0.217 0.216 0.191

Configurations: GMM SRE06; GMM-25-512: MAP U40TelMic, V300+D16, NormExt; GMM-43-1024: MAP U60TelMic, V300+D16, NormExt

EER: -46%, DCF: -45%

Development history: 10sec-10sec

[Figure: EER (%) and Min DCF per system configuration, SRE2006 - 10sec4w-10sec4w - All Trials]

Configuration                  EER (%)   Min DCF
GMM SRE06                      24.24     0.884
GMM-25-512 V300 U40TelMic      21.32     0.801
GMM-25-1024 V300 U40TelMic     19.63     0.772
GMM-43-1024 V300 U60TelMic     18.02     0.757
GMM-43-2048 V300 U60TelMic     17.39     0.748

EER: -28%, DCF: -15%
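
The quoted relative improvements follow from the first and last values listed for this condition (EER 24.24 to 17.39, Min DCF 0.884 to 0.748). A small sketch of the arithmetic, with a hypothetical helper name:

```python
def relative_change_percent(baseline, new):
    """Relative change of `new` with respect to `baseline`, in percent (negative = reduction)."""
    return 100.0 * (new - baseline) / baseline


# 10sec-10sec condition: SRE06 baseline versus the last configuration listed.
eer_change = relative_change_percent(24.24, 17.39)
dcf_change = relative_change_percent(0.884, 0.748)
```

Rounded to whole percent, these give the -28% (EER) and -15% (DCF) figures shown on the slide.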

Interview development test set

Development test set defined on SRE08_MX5_DEV1

3 Males + 3 Females, 6 sessions per speaker, 9 channels per session (324 audio files)

Each audio file is split into 3 min chunks (3466 elements)

Gender dependent tests, no same session tests, uniform cross channel test distribution.

Male: 7200 target tests, 17280 impostor tests

Female: 7290 target tests, 17496 impostor tests

Interview intersession normalization

Supervector differences with the interviewee near microphone (channel 2) using parallel chunks of the same session

The speaker and the phonetic content of parallel chunks are the same => the compensation is focused on microphone differences

20 interview channel eigenvectors appended to 30 Tel+Mic eigenvectors

Interview VAD

Development: VAD based on energy distribution of interviewer and interviewee near field microphone + Loquendo ASR

Development & Test: VAD based on NIST’s interviewee VAD/ASR:

if ((VAD_NIST and ASR_NIST) speech % > 40%)
    VAD = (VAD_NIST and ASR_NIST)
else if (VAD_NIST speech % > 40%)
    VAD = VAD_NIST
else if (ASR_NIST speech % > 40%)
    VAD = ASR_NIST
else
    VAD = 1 for each frame

This VAD information is further filtered by the Loquendo ASR
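
The fallback rule above can be sketched in a few lines. The sketch assumes that both NIST sources are available as per-frame boolean masks of equal length and that "speech %" means the fraction of frames marked as speech; the function names are hypothetical, and the final Loquendo ASR filtering step is not modeled.

```python
def speech_fraction(mask):
    """Fraction of frames marked as speech in a boolean mask."""
    return sum(mask) / len(mask) if mask else 0.0


def select_vad(vad_nist, asr_nist, min_speech=0.40):
    """Pick the most restrictive mask that still keeps more than `min_speech` of the frames."""
    both = [v and a for v, a in zip(vad_nist, asr_nist)]
    # Try the intersection first, then each source alone, in the order given on the slide.
    for candidate in (both, vad_nist, asr_nist):
        if speech_fraction(candidate) > min_speech:
            return list(candidate)
    # Last resort: treat every frame as speech.
    return [True] * len(vad_nist)
```

The ordering means the stricter mask is preferred whenever it leaves enough speech, and the all-speech fallback guarantees that no segment ends up with too few frames to score.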

Interview development history

[Figure: EER (%) and Min DCF per configuration, MIX5-Develop - Interview-Interview (3 male speakers), GMM-25-512 V300]

EER (%): 7.25 6.91 6.48 6.31 8.57 8.14 7.15

Min DCF: 0.363 0.343 0.332 0.318 0.4 0.378 0.338

Configurations: U40TelMic; U50TelMic + U40Int; U30TelMic + U20Int; U20Int M+F; All+LoqASR; NistVAD+LoqASR; NistVAD+NistASR+LoqASR

Calibration

Combination of our 3 systems by linear fusion, with Logistic Regression parameters estimated on SRE06 data using the FoCal tool, separately for each condition that appears both in SRE08 and SRE06

For the interview conditions, the weights are borrowed from the most similar conditions, substituting the microphone condition for the interview condition

For a long training condition, we used the weights computed for the corresponding short2 interview condition
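
To make the fusion step concrete, the sketch below shows a linear combination of sub-system scores and one common form of a prior-weighted logistic regression objective. It is not the FoCal API: the function names are hypothetical, the weights would be found by a numerical optimizer that is not shown, and the exact objective used by the toolkit is assumed rather than taken from the slides.

```python
import math


def fuse(scores, weights, offset):
    """Linear fusion: offset plus the weighted sum of the sub-system scores for one trial."""
    return offset + sum(w * s for w, s in zip(weights, scores))


def prior_weighted_lr_objective(target_llrs, nontarget_llrs, prior):
    """Cross-entropy of fused scores, with target and non-target trials weighted by the prior.

    target_llrs / nontarget_llrs are fused scores interpreted as log-likelihood ratios;
    `prior` is the assumed target prior used to weight the two error types.
    """
    logit_prior = math.log(prior / (1.0 - prior))
    tar = sum(math.log1p(math.exp(-(s + logit_prior))) for s in target_llrs)
    non = sum(math.log1p(math.exp(s + logit_prior)) for s in nontarget_llrs)
    return prior * tar / len(target_llrs) + (1.0 - prior) * non / len(nontarget_llrs)


# Example with made-up numbers: three sub-system scores for one trial.
fused = fuse([1.0, 2.0, 3.0], [0.5, 0.25, 0.25], 0.1)
```

Minimizing the objective over the weights and the offset on a development set (SRE06 here) yields fused scores that can be thresholded as calibrated log-likelihood ratios.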

Performed tests

The SRE08 primary system has been tested on all the evaluation conditions

Unsupervised adaptation scores have been submitted for the 10sec-10sec condition

The SRE06 mothballed system has been tested on the short2-short3 condition

Sub-systems comparison (i)

[Figure: three DET plots (Miss Probability [%] vs. False Alarm Probability [%]) for Short2Int-Short3Int, Short2Int-Short3Tel and Short2Tel-Short3Int. Curves: LPT06, PGMM, GMM sub-systems, LPT08 Primary]

LPT 08 vs 06: EER -62%, DCF -57%

LPT 08 vs 06: EER -29%, DCF -25%

LPT 08 vs 06: EER -46%, DCF -45%

Sub-systems comparison (ii)

[Figure: two DET plots (Miss Probability [%] vs. False Alarm Probability [%]) for Short2Tel-Short3Tel - All Trials and Short2Tel-Short3Mic - All Trials. Curves: LPT06, PGMM, GMM sub-systems, LPT08 Primary]

LPT 08 vs 06: EER -26%, DCF -32%

LPT 08 vs 06: EER -17%, DCF -9%

2 Wires Conditions

Segmentation based on speaker factors

Reduced accuracy loss due to segmentation, both for summed test or train conditions

Accuracy loss due to intrinsic increase of false alarm rate

[Figure: DET plot (Miss Probability [%] vs. False Alarm Probability [%]), LPT Primary - Tel-Tel Trials. Curves: short2-summed, short2-short3**2 (simulation), short2-short3]

[Figure: DET plot (Miss Probability [%] vs. False Alarm Probability [%]), LPT Primary - Tel-Tel Trials. Curves: 3summed-short3, 3conv-short3]

Unsupervised Adaptation 10sec-10sec

The selection of trials to be used for adaptation was carried out using the primary system scores, obtained using the un-adapted models

Original model for trial selection

Updated model for scoring
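
The two roles of the model (original for trial selection, updated for scoring) can be sketched as a loop over the test segments of one target speaker. The slide fixes only that selection uses the un-adapted model; the rest is an assumption made for illustration: scoring a segment before it is used for adaptation, the toy scalar "model" (a mean and a count), and every function name.

```python
def run_unsupervised_adaptation(trials, original_model, threshold, score, adapt):
    """Score each trial with the adapted model; select adaptation data with the original one."""
    adapted_model = original_model
    results = []
    for segment in trials:
        # Reported score: always from the current (possibly adapted) model.
        results.append(score(adapted_model, segment))
        # Selection: decided with the un-adapted model, so early errors do not compound.
        if score(original_model, segment) > threshold:
            adapted_model = adapt(adapted_model, segment)
    return results, adapted_model


def toy_score(model, segment):
    """Toy score: closer to the model mean means a higher score."""
    mean, _count = model
    return -abs(segment - mean)


def toy_adapt(model, segment):
    """Toy adaptation: fold the segment into a running mean."""
    mean, count = model
    return ((mean * count + segment) / (count + 1), count + 1)


scores, final_model = run_unsupervised_adaptation(
    [1.0, 5.0, 2.0], (0.0, 1), -2.5, toy_score, toy_adapt
)
```

In the toy run, the second segment is rejected by the original model and leaves the adapted model untouched, while the first and third segments are folded in.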

Unsupervised adaptation 10s-10s

Gender dependent adaptation thresholds tuned on SRE06 10sec-10sec

NIST SRE 2008:

Trials selected for adaptation: 636 true speaker trials, 75 impostor trials

True speaker trials: 636 selected for adaptation, 1484 not selected for adaptation

                   SRE06 Male   SRE06 Female   SRE08 Male   SRE08 Female
EER Std            14.73        16.09          14.57        16.34
EER Adapted        14.44        15.79          14.56        15.83
Min DCF Std        0.635        0.683          0.652        0.761
Min DCF Adapted    0.612        0.660          0.645        0.754

Conclusions

Significant improvements were obtained using the speaker factors on the 10sec-10sec condition

[Figure labels: SRE06, Now !]

Conclusions

Contribution of channel normalization for the interview conditions

[Figure: DET plot (Miss Probability [%] vs. False Alarm Probability [%]), Short2Int-Short3Int - All Trials. Curves: GMM-512-25 U Tel+Mic, GMM-512-25 U Tel+Mic+Int]

Thank you !

