Calibration based on duration quality measures function in noise robust speaker recognition for NIST SRE’12
Miranti Indar Mandasari, Rahim Saeidi and David van Leeuwen.
Biometric Technologies in Forensic Science (BTFS) Conference, 14 October 2013
Outline
● Introduction,
● Speaker recognition system,
● Corpora,
● Experiment setup,
● Calibration techniques,
– Conventional linear, and
– Quality measure function (QMF).
● Performance measures,
● Results, and
● Conclusion.
Introduction
● The importance of likelihood ratio calibration in speaker recognition:
– Likelihood ratio as a preferable form of score for forensic purposes,
– Acknowledged by the speaker recognition community through speaker recognition evaluation (SRE) by NIST, and
– Often, scores produced by the system are not in likelihood ratio form.
● Classic challenges in speaker recognition:
– Short duration, and
– Noisy speech.
Speaker recognition system
● Speech enhancement and feature extraction stage:
– Dynamic noise suppression rule and Wiener filter,
– 60-dimensional MFCC features, and
– Speech activity detection and feature warping.
● Modeling stage:
– Gender-dependent universal background model (UBM) with 2048 components,
– 400-dimensional i-vectors,
– 200-dimensional linear discriminant analysis (LDA),
– Pre-PLDA modeling: i-vector centering, within-class covariance normalization (WCCN), and i-vector length normalization, and
– Probabilistic linear discriminant analysis (PLDA) scoring.
Corpora
● NIST SRE'12 database:
– Duration variability, and
– Noise conditions (crowd & HVAC):
  ● Clean / no alteration,
  ● 15 dB noisy, and
  ● 6 dB noisy.
● Three datasets in the experiments:
– Development set from I4U (Dev-I4U),
– Evaluation set from I4U (Eval-I4U), and
– NIST SRE 2012 protocols (Eval-SRE'12).
● I4U is a joint effort of 9 research Institutes and Universities across 4 continents to take part in the NIST SRE'12 evaluation.
Calibration
● Calibration is:
– The ability to set a threshold optimally if scores are used for decisions, or
– The ability to produce likelihood ratios that lead to minimum Bayes' risk for any cost function.
● Calibration techniques:
– Linear calibration with 2 parameters (conventional), and
– Linear calibration with additional quality measure function (QMF).
● Calibration stages:
– Training calibration parameters: Dev-I4U, and
– Evaluation of calibration: Dev-I4U, Eval-I4U, and Eval-SRE'12.
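The link between calibrated likelihood ratios and threshold setting can be made concrete with a small sketch. It assumes the scores are already well-calibrated natural-log likelihood ratios; the function names, the target prior, and the cost values are illustrative assumptions, not the NIST SRE'12 operating points.

```python
import math

def bayes_threshold(p_target: float, c_miss: float, c_fa: float) -> float:
    """Log-LR threshold that minimises expected cost for calibrated scores."""
    return math.log(c_fa * (1.0 - p_target)) - math.log(c_miss * p_target)

def decide(llr: float, p_target: float, c_miss: float = 1.0, c_fa: float = 1.0) -> bool:
    """Accept the same-speaker hypothesis when the LLR exceeds the threshold."""
    return llr > bayes_threshold(p_target, c_miss, c_fa)

# Illustrative operating points (values chosen for the example only).
for prior in (0.5, 0.01):
    print(prior, bayes_threshold(prior, c_miss=1.0, c_fa=1.0))
```

With a calibrated system the threshold follows from the prior and the costs alone, so no threshold has to be tuned per application.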
Linear Calibration
$\log \mathrm{LR} = w_0 + w_1 s$
● $\mathrm{LR}$: likelihood ratio,
● $w_0$: offset parameter,
● $w_1$: scaling parameter, and
● $s$: raw score.
● This two-parameter linear calibration is referred to as conventional calibration,
● It is a monotonically increasing score-to-likelihood-ratio transformation, so the discriminability stays the same, and
● The parameters w0 and w1 are found by minimizing cross-entropy (or Cllr) on a development set, i.e., by logistic regression.
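A minimal sketch of how the two parameters could be trained is given below. It assumes a class-balanced cross-entropy objective (the objective behind Cllr) and uses Newton's method on synthetic scores; the function names, the optimiser, and the data are assumptions for illustration and not the toolkit used in this work.

```python
import math
import random

def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def train_linear_calibration(tar_scores, non_scores, n_iter=50):
    """Fit llr = w0 + w1 * s by Newton's method on class-balanced cross-entropy."""
    # Each trial: (score, label, weight); both classes get total weight 0.5.
    data = [(s, 1.0, 0.5 / len(tar_scores)) for s in tar_scores]
    data += [(s, 0.0, 0.5 / len(non_scores)) for s in non_scores]
    w0, w1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for s, y, wt in data:
            p = _sigmoid(w0 + w1 * s)
            g0 += wt * (p - y)          # gradient w.r.t. w0
            g1 += wt * (p - y) * s      # gradient w.r.t. w1
            r = wt * p * (1.0 - p)      # Hessian weight
            h00 += r
            h01 += r * s
            h11 += r * s * s
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            break
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step = H^-1 g
        d1 = (h00 * g1 - h01 * g0) / det
        w0 -= d0
        w1 -= d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return w0, w1

# Synthetic example: calibrated LLRs distorted by raw = 0.5 * llr + 3,
# so the ideal calibration is llr = -6 + 2 * raw.
random.seed(0)
tar_raw = [0.5 * random.gauss(2.0, 2.0) + 3.0 for _ in range(2000)]
non_raw = [0.5 * random.gauss(-2.0, 2.0) + 3.0 for _ in range(2000)]
w0, w1 = train_linear_calibration(tar_raw, non_raw)
print(f"w0 = {w0:.2f}, w1 = {w1:.2f}")
```

On this synthetic data the estimates should come out close to the offset and scaling that undo the distortion; because w1 is positive, the ranking of the trials is unchanged.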
QMF calibration
● QMF stands for quality measure function,
● QMF calibration is a linear calibration approach with quality measures as extra terms, and
● There are 4 proposed duration QMFs.
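$\log \mathrm{LR} = w_0 + w_1 s + Q(d_m, d_t; w_2, \ldots)$
● $\mathrm{LR}$: likelihood ratio,
● $w_0$: offset parameter,
● $w_1$: scaling parameter,
● $s$: raw score,
● $Q$: quality measure function (QMF),
● $d_m$: duration of the model segment,
● $d_t$: duration of the test segment, and
● $w_2, \ldots$: extra offset parameters.
Quality measure functions
● The QMF terms act as duration-dependent offset parameters.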
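The structure of QMF calibration, a conventional linear calibration plus a duration-dependent offset, can be sketched as follows. The duration term used here is a hypothetical placeholder chosen only for illustration; it is not one of the four proposed QMFs, and all parameter values are made up.

```python
import math

def hypothetical_qmf(d_model, d_test, extra):
    """Illustrative duration term only; NOT one of the four proposed QMFs."""
    w2, w3 = extra
    return w2 * math.log(d_model) + w3 * math.log(d_test)

def qmf_calibrate(score, d_model, d_test, w0, w1, extra, qmf=hypothetical_qmf):
    """Linear calibration with an extra duration-dependent offset."""
    return w0 + w1 * score + qmf(d_model, d_test, extra)

# Made-up parameters: the same raw score maps to different calibrated
# values for a long and a short test segment (durations in seconds).
params = dict(w0=-6.0, w1=2.0, extra=(0.0, 0.5))
long_test = qmf_calibrate(3.5, d_model=300.0, d_test=300.0, **params)
short_test = qmf_calibrate(3.5, d_model=300.0, d_test=10.0, **params)
print(long_test, short_test)
```

Setting the extra parameters to zero recovers the conventional two-parameter calibration, which makes the two approaches easy to compare on the same trials.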
Performance measures (the lower the values, the better the performance)
● Equal error rate, E= or EER.
– Showing discrimination performance.
● Primary cost, Cprimary, of NIST SRE'12.
– Showing discrimination and calibration performances.
● Cost of log likelihood ratio, Cllr.
– Showing discrimination (minimum Cllr) and calibration (Cmc) performances.
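The measures above can be sketched in a few lines. The code assumes natural-log likelihood ratios as input, computes minimum Cllr with a pool-adjacent-violators (PAV) fit, assumes no tied scores, and uses a simple threshold sweep for the EER; the function names are illustrative, and C-primary is not covered.

```python
import math

def _softplus(x):
    """log(1 + exp(x)) computed without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def cllr(tar_llrs, non_llrs):
    """Cost of log-likelihood ratio in bits."""
    c_tar = sum(_softplus(-x) for x in tar_llrs) / len(tar_llrs)
    c_non = sum(_softplus(x) for x in non_llrs) / len(non_llrs)
    return 0.5 * (c_tar + c_non) / math.log(2.0)

def _pav(labels):
    """Pool adjacent violators; returns blocks of (targets, trials)."""
    blocks = []
    for y in labels:
        blocks.append([y, 1])
        # Merge while the previous block's mean is not below the last one's.
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] >= blocks[-1][0] * blocks[-2][1]):
            k, n = blocks.pop()
            blocks[-1][0] += k
            blocks[-1][1] += n
    return [(k, n) for k, n in blocks]

def min_cllr(tar_scores, non_scores):
    """Cllr after the optimal monotonic (PAV) recalibration."""
    trials = sorted([(s, 1) for s in tar_scores] + [(s, 0) for s in non_scores])
    n_tar, n_non = len(tar_scores), len(non_scores)
    c_tar = c_non = 0.0
    for k_tar, n in _pav([y for _, y in trials]):
        k_non = n - k_tar
        if k_tar > 0:
            c_tar += k_tar * math.log2(1.0 + (k_non * n_tar) / (k_tar * n_non))
        if k_non > 0:
            c_non += k_non * math.log2(1.0 + (k_tar * n_non) / (k_non * n_tar))
    return 0.5 * (c_tar / n_tar + c_non / n_non)

def eer(tar_scores, non_scores):
    """Approximate EER: threshold where miss and false-alarm rates are closest."""
    best = None
    for t in sorted(set(tar_scores) | set(non_scores)):
        p_miss = sum(s < t for s in tar_scores) / len(tar_scores)
        p_fa = sum(s >= t for s in non_scores) / len(non_scores)
        gap = abs(p_miss - p_fa)
        if best is None or gap < best[0]:
            best = (gap, 0.5 * (p_miss + p_fa))
    return best[1]

tar = [2.0, 1.0, -0.5]   # toy target-trial LLRs
non = [-2.0, -1.0, 0.5]  # toy non-target-trial LLRs
c, c_min = cllr(tar, non), min_cllr(tar, non)
print(f"EER = {eer(tar, non):.3f}, Cllr = {c:.3f}, "
      f"min Cllr = {c_min:.3f}, Cmc = {c - c_min:.3f}")
```

Minimum Cllr depends only on the ordering of the scores, so the difference Cmc = Cllr - minimum Cllr isolates the loss caused by miscalibration.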
Results
● EER and C-primary plots (example: EER on Dev-I4U):
– Title: performance measure (EER or C-primary) and dataset,
– x-axis: trials based on noise conditions (clean, 15 dB, and 6 dB),
– y-axis: value of the measure, e.g., EER (%), and
– Legend: calibration technique (no calibration, conventional calibration, and QMF calibration with Q1, Q2, Q3, or Q4).
● Cllr plots (example: Cllr on Dev-I4U):
– Title: performance measure (Cllr, minimum Cllr, and Cmc) and dataset,
– x-axis: trials based on noise conditions, with bars labelled N.A., O, Q1, Q2, Q3, and Q4,
– y-axis: Cllr, with minimum Cllr and Cmc shown in the legend, and
– Cmc is the miscalibration cost: Cmc = Cllr - minimum Cllr.
Results on Dev-I4U
[Figure: EER on Dev-I4U. x-axis: noise condition (clean, 15 dB, 6 dB); y-axis: EER (%), 0 to 4; legend: no calibration, conventional calibration, and QMF calibration with Q1, Q2, Q3, and Q4.]
[Figure: C-primary on Dev-I4U. x-axis: noise condition (clean, 15 dB, 6 dB); y-axis: C-primary, 0 to 0.3; legend: no calibration, conventional calibration, and QMF calibration with Q1, Q2, Q3, and Q4.]
[Figure: Cllr on Dev-I4U. x-axis: noise condition (clean, 15 dB, 6 dB), with bars labelled N.A., O, Q1, Q2, Q3, and Q4; y-axis: Cllr, 0 to 0.25; legend: Cmc and minimum Cllr.]
Results on Eval-I4U
[Figure: EER on Eval-I4U. x-axis: noise condition (clean, 15 dB, 6 dB); y-axis: EER (%), 0 to 3; legend: no calibration, conventional calibration, and QMF calibration with Q1, Q2, Q3, and Q4.]
[Figure: C-primary on Eval-I4U. x-axis: noise condition (clean, 15 dB, 6 dB); y-axis: C-primary, 0 to 0.45; legend: no calibration, conventional calibration, and QMF calibration with Q1, Q2, Q3, and Q4.]
[Figure: Cllr on Eval-I4U. x-axis: noise condition (clean, 15 dB, 6 dB), with bars labelled N.A., O, Q1, Q2, Q3, and Q4; y-axis: Cllr, 0 to 0.2; legend: Cmc and minimum Cllr.]
Results on Eval-SRE'12
[Figure: EER on Eval-SRE'12. x-axis: noise condition (clean, 15 dB, 6 dB); y-axis: EER (%), 0 to 8; legend: no calibration, conventional calibration, and QMF calibration with Q1, Q2, Q3, and Q4.]
[Figure: Cllr on Eval-SRE'12. x-axis: noise condition (clean, 15 dB, 6 dB), with bars labelled N.A., O, Q1, Q2, Q3, and Q4; y-axis: Cllr, -0.1 to 0.6; legend: Cmc and minimum Cllr.]
[Figure: C-primary on Eval-SRE'12. x-axis: noise condition (clean, 15 dB, 6 dB); y-axis: C-primary, 0 to 1.2; legend: no calibration, conventional calibration, and QMF calibration with Q1, Q2, Q3, and Q4.]
Distribution of active speech duration in I4U and SRE'12 trials.
Conclusion
● Linear calibration with a QMF as an additional term shows a gain in system performance compared to conventional two-parameter linear calibration.
● Adding 1–2 extra parameters to the linear calibration through the QMF approach has the potential to improve both the calibration and the discrimination performance of a speaker recognition system.
● When applying a QMF, it is important to design a development set that matches the duration variability of the evaluation set.
Thank you!
&
Questions?