Date posted: 05-Dec-2014
Category: Engineering
Uploaded by: saurab-dulal
GMM: Gaussian Mixture Models
04/10/2023
Saurab Dulal
IOE, Pulchowk Campus
Introduction to GMM
• Gaussian: “Gaussian is a characteristic symmetric 'bell curve' shape that quickly falls off towards 0 (practically)”
• Mixture model: “A mixture model is a probabilistic model which assumes the underlying data to belong to a mixture distribution”
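As a small illustration of the bell-curve description above, the sketch below evaluates a 1-D Gaussian density in plain Python. The function name `gaussian_pdf` and the standard-normal defaults are illustrative choices, not something defined in these slides.

```python
import math


def gaussian_pdf(x: float, mean: float = 0.0, std: float = 1.0) -> float:
    """Density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    if std <= 0:
        raise ValueError("std must be positive")
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Symmetric about the mean; the density drops towards 0 within a few std.
for x in (-3, -2, -1, 0, 1, 2, 3):
    print(f"x = {x:+d}: p(x) = {gaussian_pdf(x):.5f}")
```

The printed values are mirror images around x = 0 and are already below 0.005 at three standard deviations from the mean.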
Introduction to GMM
• Mathematical description of a GMM:
$$p(x) = w_1 p_1(x) + w_2 p_2(x) + w_3 p_3(x) + \dots + w_n p_n(x)$$
where $p(x)$ = mixture density
$w_1, w_2, \dots, w_n$ = mixture weights (mixture coefficients), with $w_i \ge 0$ and $\sum_{i=1}^{n} w_i = 1$
$p_i(x)$ = component density functions
Fig: Image showing the best-fit Gaussian curve
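The weighted-sum definition can be sketched in a few lines of Python. The component densities used here (one Gaussian, one uniform) and the helper names are hypothetical examples; the only constraint checked is that the mixture weights are non-negative and sum to 1, which is what keeps p(x) a valid density.

```python
import math
from typing import Callable, Sequence

Density = Callable[[float], float]


def mixture_density(weights: Sequence[float], components: Sequence[Density]) -> Density:
    """Return p(x) = w1*p1(x) + ... + wn*pn(x)."""
    if len(weights) != len(components):
        raise ValueError("need exactly one weight per component density")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must be non-negative and sum to 1")
    return lambda x: sum(w * p(x) for w, p in zip(weights, components))


def normal(mean: float, std: float) -> Density:
    return lambda x: math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def uniform(a: float, b: float) -> Density:
    return lambda x: 1.0 / (b - a) if a <= x <= b else 0.0


# Any valid densities can be mixed; here a Gaussian and a uniform component.
p = mixture_density([0.7, 0.3], [normal(0.0, 1.0), uniform(2.0, 6.0)])

# Midpoint-rule check that the mixture still integrates to (about) 1.
lo, hi, n = -10.0, 10.0, 20_000
dx = (hi - lo) / n
total = sum(p(lo + (k + 0.5) * dx) for k in range(n)) * dx
print(f"integral of p(x) over [{lo}, {hi}] = {total:.4f}")
```

Because each component integrates to 1, the weighted sum integrates to the sum of the weights, which is why the weights are constrained to add up to 1.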
Introduction to GMM
“The most common mixture distribution is the Gaussian (Normal) density function, in which each of the mixture components are Gaussian distributions, each with their own mean and variance parameters.”
$$p(x) = w_1 \mathcal{N}(x \mid \mu_1, \Sigma_1) + w_2 \mathcal{N}(x \mid \mu_2, \Sigma_2) + \dots + w_n \mathcal{N}(x \mid \mu_n, \Sigma_n)$$
where the $\mu_i$ are the means and the $\Sigma_i$ are the covariance matrices of the individual component Gaussian densities.
[Figure: five Gaussian components G1 to G5 with mixture weights w1 to w5]
[Figures: component densities (Component 1, Component 2) and the resulting mixture model, each plotting p(x) against x; a further panel shows the component models alongside the mixture model]
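For vector data each component carries a full covariance matrix $\Sigma_i$. A minimal sketch, assuming 2-D observations so that the matrix inverse and determinant can be written out by hand; the function names and the parameter values are invented for illustration.

```python
import math
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Tuple[float, float], Tuple[float, float]]


def normal_2d(x: Vec2, mu: Vec2, sigma: Mat2) -> float:
    """N(x | mu, Sigma) for a 2-D Gaussian with a full 2x2 covariance matrix."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("Sigma must be positive definite")
    # Inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]]
    i00, i01, i10, i11 = d / det, -b / det, -c / det, a / det
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Mahalanobis term (x - mu)^T Sigma^{-1} (x - mu)
    maha = dx * (i00 * dx + i01 * dy) + dy * (i10 * dx + i11 * dy)
    # For D = 2 the normaliser (2*pi)^(D/2) * |Sigma|^(1/2) is 2*pi*sqrt(det).
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))


def gmm_pdf(x: Vec2, weights: Sequence[float], means: Sequence[Vec2],
            covs: Sequence[Mat2]) -> float:
    """p(x) = sum_i w_i N(x | mu_i, Sigma_i)."""
    return sum(w * normal_2d(x, mu, s) for w, mu, s in zip(weights, means, covs))


weights = [0.5, 0.3, 0.2]
means = [(0.0, 0.0), (3.0, 3.0), (-2.0, 4.0)]
covs = [((1.0, 0.0), (0.0, 1.0)),
        ((1.0, 0.5), (0.5, 2.0)),
        ((0.5, 0.0), (0.0, 0.5))]
print(gmm_pdf((0.0, 0.0), weights, means, covs))
```

For higher dimensions a linear-algebra library would normally replace the hand-written 2x2 inverse.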
GMM for Speaker Recognition
Motivation
• The interpretation that the Gaussian components represent some general speaker-dependent spectral shapes
• The capability of Gaussian mixtures to model arbitrary densities
Description of Speaker Recognition (SR) Using GMM
• Speech analysis
• Model description
• Model interpretation
• Maximum-likelihood parameter estimation
• Speaker identification
Speech Analysis
• Linear predictive coding (LPC)
• Mel-scale filter bank (to reduce noise)
The analysis ends with the generation of cepstral coefficients $x'_1, x'_2, x'_3, \dots, x'_n$.
A cepstrum is the result of taking the inverse Fourier transform (IFT) of the logarithm of the estimated spectrum of a signal.
Cosine transform: for a real signal the log-magnitude spectrum is real and even, so the inverse transform reduces to a cosine transform.
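The cepstrum definition translates directly into code. The sketch below computes a plain real cepstrum for one frame using a naive DFT; it is not the mel-scale filter-bank pipeline of these slides (a practical front end would use an FFT, windowing and, for mel cepstra, a cosine transform of log filter-bank energies), and the test frame is synthetic.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(frame: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Inverse transform of the log-magnitude spectrum of one frame."""
    log_mag = [math.log(abs(s) + eps) for s in dft(frame)]  # eps avoids log(0)
    # The log-magnitude spectrum of a real frame is real and even,
    # so any imaginary part in the result is only rounding noise.
    return [c.real for c in idft(log_mag)]


frame = [math.sin(2 * math.pi * 3 * t / 32) * math.exp(-t / 8) for t in range(32)]
print([round(c, 3) for c in real_cepstrum(frame)[:8]])
```

A unit impulse has a flat spectrum of magnitude 1, so its cepstrum is (numerically) all zeros; scaling a frame by a constant only shifts the zeroth cepstral coefficient by the log of that constant.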
2000/05/03 11
Model Description
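Gaussian mixture density:
$$p(\vec{x} \mid \lambda) = \sum_{i=1}^{M} p_i \, b_i(\vec{x})$$
where $\vec{x}$ is a D-dimensional random vector and each component density is
$$b_i(\vec{x}) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\vec{x} - \vec{\mu}_i)' \, \Sigma_i^{-1} \, (\vec{x} - \vec{\mu}_i) \right\}$$
with mean vector $\vec{\mu}_i$ and covariance matrix $\Sigma_i$. The speaker model is
$$\lambda = \{ p_i, \vec{\mu}_i, \Sigma_i \}, \quad i = 1, \dots, M$$
Covariance options: nodal, grand, global. This work uses nodal, diagonal covariance matrices.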
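With nodal, diagonal covariances, $b_i$ factorises into a product of 1-D Gaussians, so no matrix inverse is needed. A minimal Python sketch under that assumption; the `DiagonalGMM` class and the example parameter values are hypothetical, and the exponent of $b_i$ is accumulated as one log-domain sum before exponentiating.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagonalGMM:
    """Speaker model lambda = {p_i, mu_i, Sigma_i} with diagonal Sigma_i."""
    weights: List[float]          # p_i, non-negative and summing to 1
    means: List[List[float]]      # mu_i, one D-dimensional vector per component
    variances: List[List[float]]  # diagonal of Sigma_i, one D-vector per component


def component_density(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """b_i(x) for a diagonal covariance, computed via its logarithm."""
    d = len(x)
    log_det = sum(math.log(v) for v in var)  # log|Sigma_i| for a diagonal matrix
    maha = sum((xd - md) ** 2 / vd for xd, md, vd in zip(x, mean, var))
    return math.exp(-0.5 * (d * math.log(2 * math.pi) + log_det + maha))


def mixture_density(x: Sequence[float], model: DiagonalGMM) -> float:
    """p(x | lambda) = sum_i p_i b_i(x)."""
    return sum(p * component_density(x, m, v)
               for p, m, v in zip(model.weights, model.means, model.variances))


speaker = DiagonalGMM(weights=[0.6, 0.4],
                      means=[[0.0, 0.0, 0.0], [1.0, -1.0, 2.0]],
                      variances=[[1.0, 1.0, 1.0], [0.5, 2.0, 1.0]])
print(mixture_density([0.5, -0.5, 1.0], speaker))
```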
Choice of Covariance Matrix
• Nodal covariance: one covariance matrix per Gaussian component
• Grand covariance: one covariance matrix shared by all Gaussian components of a speaker model
• Global covariance: a single covariance matrix shared by all speaker models
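The three options trade modelling detail against the number of parameters to estimate. A small hypothetical helper makes the counts concrete; it counts only covariance entries, and the numbers of speakers, components and feature dimensions below are illustrative, not taken from the slides.

```python
def covariance_parameter_count(scheme: str, speakers: int, components: int,
                               dim: int, diagonal: bool = True) -> int:
    """Number of covariance values to store for all speaker models."""
    per_matrix = dim if diagonal else dim * (dim + 1) // 2  # full matrices are symmetric
    matrices = {
        "nodal": speakers * components,  # one matrix per Gaussian component
        "grand": speakers,               # one matrix per speaker model
        "global": 1,                     # one matrix shared by every speaker model
    }[scheme]
    return matrices * per_matrix


for scheme in ("nodal", "grand", "global"):
    print(scheme, covariance_parameter_count(scheme, speakers=10, components=16, dim=20))
```

With 10 speakers, 16 components and 20-dimensional features, nodal diagonal covariances need 3200 values, grand 200 and global 20; switching nodal to full matrices raises the count to 33600.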
Model Interpretation
• Intuitive notion: the acoustic classes (vowels, nasals, fricatives) reflect some general speaker-dependent vocal tract configurations that are useful for characterizing speaker identity.
• GMMs can form smooth approximations to arbitrarily shaped densities.
• They provide not only a smooth approximation but also capture the multimodal nature of the densities.
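ML Parameter Estimation
Steps:
1. Begin with an initial model $\lambda$.
2. Estimate a new model $\bar{\lambda}$ such that $p(X \mid \bar{\lambda}) \ge p(X \mid \lambda)$, where $p(X \mid \lambda)$ is the mixture density (likelihood) of the training vectors $X$.
3. Repeat step 2, using the new model as the initial model, until a convergence threshold is reached.
The result is the maximum-likelihood model.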
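The three steps can be sketched for the simplest case, a 1-D GMM, where each pass of step 2 is one EM (expectation-maximisation) re-estimation of the weights, means and variances. This is a minimal illustration under stated assumptions: synthetic data from two Gaussians, a hand-picked initial model, a hypothetical stopping threshold `tol` on the log-likelihood gain, and no safeguards such as variance floors.

```python
import math
import random
from typing import List, Tuple

Params = Tuple[List[float], List[float], List[float]]  # weights, means, variances


def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def log_likelihood(data: List[float], params: Params) -> float:
    """log p(X | lambda) = sum_t log sum_i p_i b_i(x_t)."""
    w, mu, var = params
    return sum(math.log(sum(wi * normal_pdf(x, m, v) for wi, m, v in zip(w, mu, var)))
               for x in data)


def em_step(data: List[float], params: Params) -> Params:
    """One re-estimation: returns lambda-bar from lambda."""
    w, mu, var = params
    # E-step: posterior probability of each component for each sample
    post = []
    for x in data:
        joint = [wi * normal_pdf(x, m, v) for wi, m, v in zip(w, mu, var)]
        total = sum(joint)
        post.append([j / total for j in joint])
    # M-step: new weights, means and variances
    new_w, new_mu, new_var = [], [], []
    for i in range(len(w)):
        n_i = sum(p[i] for p in post)
        m_i = sum(p[i] * x for p, x in zip(post, data)) / n_i
        v_i = sum(p[i] * x * x for p, x in zip(post, data)) / n_i - m_i ** 2
        new_w.append(n_i / len(data))
        new_mu.append(m_i)
        new_var.append(v_i)
    return new_w, new_mu, new_var


def fit(data: List[float], params: Params, tol: float = 1e-6,
        max_iter: int = 200) -> Tuple[Params, List[float]]:
    history = [log_likelihood(data, params)]     # step 1: initial model
    for _ in range(max_iter):
        params = em_step(data, params)           # step 2: new model
        history.append(log_likelihood(data, params))
        if history[-1] - history[-2] < tol:      # step 3: stop at threshold
            break
    return params, history


random.seed(0)
data = [random.gauss(-2.0, 0.5) for _ in range(200)] + [random.gauss(3.0, 1.0) for _ in range(200)]
model, history = fit(data, ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]))
print("weights", [round(v, 3) for v in model[0]])
print("means", [round(v, 3) for v in model[1]])
print("iterations", len(history) - 1)
```

The `history` list should never decrease, which is the property $p(X \mid \bar{\lambda}) \ge p(X \mid \lambda)$ that step 2 requires.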
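Mixture weights:
$$\bar{p}_i = \frac{1}{T} \sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda)$$
Means:
$$\bar{\vec{\mu}}_i = \frac{\sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda) \, \vec{x}_t}{\sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda)}$$
Variances:
$$\bar{\sigma}_i^2 = \frac{\sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda) \, x_t^2}{\sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda)} - \bar{\mu}_i^2$$
A posteriori probability for component $i$ (weighted component density divided by the mixture density):
$$p(i \mid \vec{x}_t, \lambda) = \frac{p_i \, b_i(\vec{x}_t)}{\sum_{k=1}^{M} p_k \, b_k(\vec{x}_t)}$$
where $\sigma_i^2$, $x_t$ and $\mu_i$ refer to arbitrary elements of the vectors $\vec{\sigma}_i^2$, $\vec{x}_t$ and $\vec{\mu}_i$, respectively.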
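The update formulas map almost one-to-one onto code. The sketch below applies them once to D-dimensional data with diagonal covariances, computing the a posteriori probabilities with a log-sum-exp to avoid underflow. The function names are hypothetical, and the variance floor `var_floor` is an added assumption (a common safeguard), not part of the formulas above.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def log_b(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """log b_i(x) for a Gaussian with diagonal covariance."""
    return -0.5 * (len(x) * math.log(2 * math.pi)
                   + sum(math.log(v) for v in var)
                   + sum((xd - md) ** 2 / vd for xd, md, vd in zip(x, mean, var)))


def posteriors(x, weights, means, variances) -> List[float]:
    """p(i | x_t, lambda) = p_i b_i(x_t) / sum_k p_k b_k(x_t), in the log domain."""
    logs = [math.log(p) + log_b(x, m, v) for p, m, v in zip(weights, means, variances)]
    top = max(logs)                               # log-sum-exp for stability
    scaled = [math.exp(lg - top) for lg in logs]
    total = sum(scaled)
    return [s / total for s in scaled]


def reestimate(X: Sequence[Vector], weights: List[float], means: List[Vector],
               variances: List[Vector], var_floor: float = 1e-6
               ) -> Tuple[List[float], List[Vector], List[Vector]]:
    """One application of the weight, mean and variance update formulas."""
    T, M, D = len(X), len(weights), len(X[0])
    post = [posteriors(x, weights, means, variances) for x in X]
    new_w, new_mu, new_var = [], [], []
    for i in range(M):
        n_i = sum(post[t][i] for t in range(T))  # sum_t p(i | x_t, lambda)
        mu = [sum(post[t][i] * X[t][d] for t in range(T)) / n_i for d in range(D)]
        ex2 = [sum(post[t][i] * X[t][d] ** 2 for t in range(T)) / n_i for d in range(D)]
        new_w.append(n_i / T)
        new_mu.append(mu)
        # sigma^2 = E[x^2] - mu^2 per element, floored (assumption) to stay positive
        new_var.append([max(e - m * m, var_floor) for e, m in zip(ex2, mu)])
    return new_w, new_mu, new_var


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0], [0.5, 1.5]]
w, mu, var = reestimate(X, [0.5, 0.5], [[0.0, 1.0], [4.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]])
print(w, mu, var)
```

With a single component every posterior is 1, so one update reduces to the sample mean and the (biased) sample variance of each feature, which is a convenient sanity check.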
[Figures: Anemia patients and controls; red blood cell hemoglobin concentration plotted against red blood cell volume, with the fitted mixture shown at EM iterations 1, 3, 5, 10, 15 and 25]
[Figure: Log-likelihood as a function of EM iterations; x-axis: EM iteration (0 to 25), y-axis: log-likelihood (400 to 490)]
[Figure: Anemia data with labels (anemia group and control group); red blood cell hemoglobin concentration plotted against red blood cell volume]
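Speaker Identification
A group of speakers $S = \{1, 2, \dots, S\}$ is represented by GMMs $\lambda_1, \lambda_2, \dots, \lambda_S$. The objective is to find the speaker model that has the maximum a posteriori probability for a given observation sequence $X$:
$$\hat{S} = \arg\max_{1 \le k \le S} \Pr(\lambda_k \mid X) = \arg\max_{1 \le k \le S} \frac{p(X \mid \lambda_k) \, \Pr(\lambda_k)}{p(X)}$$
Assuming equally likely speakers, and noting that $p(X)$ is the same for all speaker models, this simplifies to
$$\hat{S} = \arg\max_{1 \le k \le S} p(X \mid \lambda_k)$$
Taking logarithms, and assuming independence between observations, the decision becomes
$$\hat{S} = \arg\max_{1 \le k \le S} \sum_{t=1}^{T} \log p(\vec{x}_t \mid \lambda_k), \quad \text{where } p(\vec{x}_t \mid \lambda_k) = \sum_{i=1}^{M} p_i \, b_i(\vec{x}_t)$$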
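The final decision rule is short in code: score the test vectors against every speaker model and keep the best total log-likelihood. A minimal sketch, assuming diagonal-covariance models and equal speaker priors as in the derivation; the speaker names and model parameters are made up, and log-sum-exp keeps the per-frame log-likelihood from underflowing.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (weights p_i, means mu_i, diagonal variances) for one speaker's GMM
Model = Tuple[List[float], List[List[float]], List[List[float]]]


def log_b(x, mean, var) -> float:
    """log b_i(x) for a diagonal-covariance Gaussian."""
    return -0.5 * (len(x) * math.log(2 * math.pi)
                   + sum(math.log(v) for v in var)
                   + sum((xd - md) ** 2 / vd for xd, md, vd in zip(x, mean, var)))


def log_p(x: Sequence[float], model: Model) -> float:
    """log p(x_t | lambda_k) = log sum_i p_i b_i(x_t), via log-sum-exp."""
    weights, means, variances = model
    logs = [math.log(p) + log_b(x, m, v) for p, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(lg - top) for lg in logs))


def identify(X: Sequence[Sequence[float]], models: Dict[str, Model]) -> str:
    """S-hat = argmax_k sum_t log p(x_t | lambda_k), assuming equal speaker priors."""
    scores = {name: sum(log_p(x, m) for x in X) for name, m in models.items()}
    return max(scores, key=scores.get)


models: Dict[str, Model] = {
    "speaker_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "speaker_b": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
print(identify([[5.1, 4.9], [4.8, 5.2]], models))
```

Summing log-likelihoods rather than multiplying likelihoods is what makes long observation sequences numerically tractable.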
References
• D. A. Reynolds and R. C. Rose, “Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models”, IEEE Trans. on Speech and Audio Processing, vol. 3, no. 1, pp. 72-83, January 1995.
• http://en.wikipedia.org/wiki/Probability_density_function
• http://crsouza.blogspot.com/2010/10/gaussian-mixture-models-and-expectation.html
• https://www.ll.mit.edu/mission/communications/ist/publications/0802_Reynolds_Biometrics-GMM.pdf
• http://statweb.stanford.edu/~tibs/stat315a/LECTURES/em.pdf
• http://eprints.pascal-network.org/archive/00008291/01/SoftAssignReconstr_ICIP2011.pdf
• http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html