Parametric estimation of the multivariate probability density function. EM-algorithm. RBF network
A.Tyulpin
Measurement Systems and Digital Signal Processing laboratory, Northern (Arctic) Federal University
September, 2014
Plan

1. Examples of data classification problems
2. Classification using Bayes decision theory
3. Parametric estimation of probability density function
4. Mixture model and EM algorithm
5. EM-algorithm for Gaussian mixture model
6. Data generation using GMM
7. Radial Basis Functions
Classification problems

[Figures: examples of data classification problems]
Bayes decision rule
Binary classification problem

$X \subset \mathbb{R}^n$ – feature space; $\Omega = \{1, \dots, K\}$ – set of labels.
$X \times \Omega$ – statistical population.
$\omega_1, \dots, \omega_K$ are denoted as classes.
$X^m = \{(x_j, \omega_j) \in X \times \Omega,\; j = 1, \dots, m\}$ – data sample.
$p(x|\omega_j)$ – probability density function (pdf) of $x$ in class $\omega_j$.
$p(x, \omega)$ – joint pdf on $X \times \Omega$.
$P(\omega_i|x)$ – posterior probability of $x \in \omega_i$, $i = 1, 2$.

Bayes decision rule (two classes): assign $x$ to $\omega_1$ if $P(\omega_1|x) > P(\omega_2|x)$.

Bayes decision rule ($K$ classes): assign $x$ to $\omega_l$, where $\omega_l = \arg\max_{\omega \in \Omega} P(\omega|x)$.
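Since $p(x)$ is the same for every class, comparing $p(x|\omega_j)P(\omega_j)$ is enough to find the argmax of the posterior. A minimal sketch (the classifier and its inputs below are illustrative, not from the slides):

```python
import numpy as np

def bayes_classify(x, class_pdfs, priors):
    """Assign x to the class with the largest posterior P(w_j | x).

    class_pdfs: callables p(x | w_j); priors: prior probabilities P(w_j).
    The evidence p(x) cancels out of the comparison, so the unnormalized
    scores p(x | w_j) * P(w_j) suffice.
    """
    scores = [pdf(x) * prior for pdf, prior in zip(class_pdfs, priors)]
    return int(np.argmax(scores))
```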
How to find $P(\omega_j|x)$?

Bayes rule:

$$P(\omega_j|x) = \frac{p(x|\omega_j)\, P(\omega_j)}{p(x)}$$

If we have a data sample $X^m$, it is possible to estimate $P(\omega_j)$, but the pdfs $p(x|\omega_j)$ and $p(x)$ are still unknown.

Two approaches to pdf estimation:
- Nonparametric, e.g. the Parzen window (kernel density estimation); see the sketch below.
- Parametric, e.g. maximum likelihood parameter estimation and the EM-algorithm.

There is also another approach – modelling $P(\omega_j|x)$ directly.
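As a sketch of the nonparametric route, kernel density estimation with scipy (the sample here is synthetic, purely for illustration):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Draw a two-mode synthetic sample and estimate its pdf with Gaussian kernels.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 0.5, 500)])

kde = gaussian_kde(sample)        # bandwidth chosen by Scott's rule
grid = np.linspace(-6, 6, 200)
density = kde(grid)               # estimated p(x) on the grid
```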
Maximum likelihood estimation

Let $x_1, \dots, x_m$ be random samples drawn from the pdf $p(x; \theta)$. Let $X = \{x_1, \dots, x_m\}$ and let $p(X; \theta) \equiv p(x_1, \dots, x_m; \theta)$ be the joint pdf. Assuming statistical independence between different samples, we have the likelihood function

$$p(x_1, \dots, x_m; \theta) = \prod_{k=1}^{m} p(x_k; \theta). \tag{1}$$

Maximum likelihood
The maximum likelihood (ML) method estimates $\theta$ so that the likelihood function takes its maximum value:

$$\theta_{ML} = \arg\max_{\theta} \prod_{k=1}^{m} p(x_k; \theta). \tag{2}$$
Maximum likelihood estimation

Since the logarithm is monotonically increasing,

$$\arg\max_{\theta} \prod_{k=1}^{m} p(x_k; \theta) = \arg\max_{\theta} \ln \prod_{k=1}^{m} p(x_k; \theta). \tag{3}$$

Denote the log-likelihood function

$$L(\theta) = \ln \prod_{k=1}^{m} p(x_k; \theta) = \sum_{k=1}^{m} \ln p(x_k; \theta). \tag{4}$$

It takes its maximum value where

$$\frac{\partial L(\theta)}{\partial \theta} = \sum_{k=1}^{m} \frac{1}{p(x_k; \theta)} \frac{\partial p(x_k; \theta)}{\partial \theta} = 0. \tag{5}$$
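For a concrete instance, setting (5) to zero for a univariate Gaussian gives the well-known closed-form estimates; a minimal sketch on synthetic data:

```python
import numpy as np

# ML estimation for N(mu, sigma^2): solving dL/dmu = 0 and dL/dsigma^2 = 0
# yields the sample mean and the (biased) sample variance.
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=10_000)

mu_ml = x.mean()                       # sample mean
sigma2_ml = ((x - mu_ml) ** 2).mean()  # ML (biased) variance estimate
```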
Normal distribution

[Figure: scatter plot of a 2-D sample drawn from a normal distribution]
Complex distribution

[Figure: scatter plot of a 2-D sample with a complex, multi-modal distribution]
Mixture model

Let the pdf of $x \in X$ be a mixture of $K$ distributions:

$$p(x) = \sum_{j=1}^{K} \pi_j p_j(x), \qquad \sum_{j=1}^{K} \pi_j = 1, \quad \pi_j \geq 0, \tag{6}$$

where $p_j(x)$ and $\pi_j \equiv P_j$ are the pdf and the prior probability of the $j$-th component of the mixture, respectively. $\varphi(x; \theta)$ is a parametric family of pdfs: $p_j(x) \equiv \varphi(x; \theta_j)$.

Separation of a mixture
Let $K$ and $X^m$ be given. We need to estimate the vector of parameters $\Theta = [\pi_1, \dots, \pi_K, \theta_1, \dots, \theta_K]$. Naive ML estimation of $p(x)$ is a very hard problem.
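Eq. (6) in code, as a small sketch (the weights and component parameters are arbitrary examples):

```python
import numpy as np
from scipy.stats import norm

# Evaluate p(x) = sum_j pi_j p_j(x) for a 1-D Gaussian mixture.
weights = np.array([0.5, 0.3, 0.2])   # pi_j, non-negative, sum to 1
means = np.array([-3.0, 0.0, 4.0])
stds = np.array([1.0, 0.5, 2.0])

def mixture_pdf(x):
    return sum(w * norm.pdf(x, mu, s) for w, mu, s in zip(weights, means, stds))
```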
EM-algorithm

EM (expectation-maximization): an algorithm for separating a mixture of distributions.

The general idea of the EM-algorithm is to repeat the following steps while $\Theta$ and $G$ are not stable:
1. $G = E(\Theta)$ (E-step)
2. $\Theta = M(\Theta, G)$ (M-step)

In the EM-algorithm we use $G = (g_{ij})_{m \times K}$ – the matrix of latent variables, where $g_{ij} \equiv P(\theta_j|x_i)$ and $\sum_{j=1}^{K} g_{ij} = 1$ for all $i = 1, \dots, m$. $G$ is very useful for computing the MLE of $p(x)$.
EM-algorithm

We can use the Bayes rule:

$$g_{ij} = \frac{\pi_j p_j(x_i)}{\sum_{s=1}^{K} \pi_s p_s(x_i)}, \tag{7}$$

so if we have $\Theta$, we can calculate $G$. This is the goal of the E-step.

Now let's look at the MLE of $p(x)$. This is an optimization problem

$$Q(\Theta) = \sum_{i=1}^{m} \ln \sum_{j=1}^{K} \pi_j p_j(x_i) \to \max, \tag{8}$$

with constraints of equality and inequality type:

$$\sum_{j=1}^{K} \pi_j = 1, \qquad \pi_j \geq 0.$$
EM-algorithm

Let's «forget» about the constraints $\pi_j \geq 0$, $j = 1, \dots, K$ for a while and use Lagrange multipliers:

$$L(\Theta; X^m) = \sum_{i=1}^{m} \ln \sum_{j=1}^{K} \pi_j p_j(x_i) - \lambda \left( \sum_{j=1}^{K} \pi_j - 1 \right). \tag{9}$$

$$\frac{\partial L}{\partial \pi_j} = \sum_{i=1}^{m} \frac{p_j(x_i)}{\sum_{s=1}^{K} \pi_s p_s(x_i)} - \lambda = 0. \tag{10}$$

Multiplying eq. (10) by $\pi_j$, summing all $K$ such equations and changing the order of summation gives

$$\sum_{i=1}^{m} \underbrace{\sum_{j=1}^{K} \frac{\pi_j p_j(x_i)}{\sum_{s=1}^{K} \pi_s p_s(x_i)}}_{1} = \lambda \underbrace{\sum_{j=1}^{K} \pi_j}_{1} \quad \Rightarrow \quad \lambda = m. \tag{11}$$
EM-algorithm

Now multiply eq. (10) by $\pi_j$ once more, substituting $\lambda = m$: for all $j = 1, \dots, K$

$$\sum_{i=1}^{m} \frac{\pi_j p_j(x_i)}{\sum_{s=1}^{K} \pi_s p_s(x_i)} = m \pi_j. \tag{12}$$

Recognizing $g_{ij}$ from eq. (7) on the left-hand side, it follows that

$$\pi_j = \frac{1}{m} \sum_{i=1}^{m} g_{ij}. \tag{13}$$

We can also see that the constraints $\pi_j \geq 0$ are satisfied if they were satisfied at the beginning.
EM-algorithm

Recall that $p_j(x) \equiv \varphi(x; \theta_j)$. Now let's find the partial derivative $\frac{\partial L}{\partial \theta_j}$:

$$\frac{\partial L}{\partial \theta_j} = \sum_{i=1}^{m} \frac{\pi_j}{\sum_{s=1}^{K} \pi_s p_s(x_i)} \frac{\partial p_j(x_i)}{\partial \theta_j} = \sum_{i=1}^{m} \underbrace{\frac{\pi_j p_j(x_i)}{\sum_{s=1}^{K} \pi_s p_s(x_i)}}_{g_{ij}} \frac{\partial}{\partial \theta_j} \ln p_j(x_i) = \frac{\partial}{\partial \theta_j} \sum_{i=1}^{m} g_{ij} \ln p_j(x_i) = 0. \tag{14}$$

The resulting problem is called weighted MLE:

$$\theta_j = \arg\max_{\theta} \sum_{i=1}^{m} g_{ij} \ln \varphi(x_i; \theta). \tag{15}$$

The goal of the M-step is to find new values of $\pi_j$ and to solve $K$ independent weighted MLE problems for $\theta_j$.
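For a Gaussian family the weighted MLE (15) has a closed form (see the next slide); a minimal numpy sketch for one component, with hypothetical responsibilities:

```python
import numpy as np

# Weighted MLE for one Gaussian component j with diagonal covariance,
# given responsibilities g[:, j] (random here, just for illustration).
rng = np.random.default_rng(2)
x = rng.normal(size=(100, 2))     # m samples, n features
g_j = rng.random(100)             # responsibilities g_ij for component j

w = g_j / g_j.sum()               # normalized weights (g_j.sum() = m * pi_j)
mu_j = w @ x                      # weighted mean
var_j = w @ (x - mu_j) ** 2       # weighted per-feature variance
```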
Last words about the EM-algorithm in general

- The EM-algorithm converges.
- $Q(\Theta)$ can have many extrema, so the algorithm can get stuck at a local extremum. A stochastic EM-algorithm might solve such problems.
- To choose $K$, we can run the EM-algorithm while sequentially increasing it.
- It is very beneficial to use well-known pdfs: Gaussian, Bernoulli, etc.

On the next slides an example of using a Gaussian Mixture Model is given.
Gaussian Mixture Model

If $\varphi(x; \theta_j) = \mathcal{N}(x; \mu_j, \Sigma_j)$, then for all $j = 1, \dots, K$:

1st case: $\Sigma_j$ is a non-diagonal matrix (not beneficial to use):

$$\mu_j = \frac{1}{m \pi_j} \sum_{i=1}^{m} g_{ij} x_i \tag{16}$$

$$\Sigma_j = \frac{1}{m \pi_j} \sum_{i=1}^{m} g_{ij} (x_i - \mu_j)(x_i - \mu_j)^T \tag{17}$$

2nd case: $\Sigma_j$ is a diagonal matrix; for all $l = 1, \dots, n$:

$$\mu_{jl} = \frac{1}{m \pi_j} \sum_{i=1}^{m} g_{ij} x_{il} \tag{18}$$

$$\sigma^2_{jl} = \frac{1}{m \pi_j} \sum_{i=1}^{m} g_{ij} (x_{il} - \mu_{jl})^2 \tag{19}$$

There are other cases, but these two are the important ones.
EM-algorithm for GMM with diagonal Σ

Data: $X^m$, $K$, $[\pi_1, \dots, \pi_K]$, $[\mu_1, \dots, \mu_K]$, $[\Sigma_1, \dots, \Sigma_K]$, $\varepsilon$
Result: $[\mu_1, \dots, \mu_K]$, $[\Sigma_1, \dots, \Sigma_K]$

G := (0)_{m×K}
repeat
    for i = 1, …, m, j = 1, …, K do
        g⁰_ij := g_ij
        g_ij := π_j N(x_i; μ_j, Σ_j) / Σ_{s=1}^{K} π_s N(x_i; μ_s, Σ_s)
    for j = 1, …, K do
        π_j := (1/m) Σ_{i=1}^{m} g_ij
        for l = 1, …, n do
            μ_jl := (1/(m π_j)) Σ_{i=1}^{m} g_ij x_il
            σ²_jl := (1/(m π_j)) Σ_{i=1}^{m} g_ij (x_il − μ_jl)²
until max_{i,j} |g_ij − g⁰_ij| < ε
return [μ_1, …, μ_K], [Σ_1, …, Σ_K]
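A numpy/scipy sketch of this algorithm, assuming diagonal covariances; the function name and interface are mine, and initialization is left to the caller:

```python
import numpy as np
from scipy.stats import norm

def em_gmm_diag(X, pi, mu, sigma2, eps=1e-6, max_iter=200):
    """EM for a GMM with diagonal covariances.

    X: (m, n) data; pi: (K,) weights; mu, sigma2: (K, n) per-feature
    means and variances. Returns the updated parameters.
    """
    m, n = X.shape
    K = len(pi)
    G = np.zeros((m, K))
    for _ in range(max_iter):
        G_old = G.copy()
        # E-step: responsibilities via Bayes rule, eq. (7); with a diagonal
        # Sigma the component density factorizes over the n features.
        logp = np.stack([norm.logpdf(X, mu[j], np.sqrt(sigma2[j])).sum(axis=1)
                         for j in range(K)], axis=1)        # (m, K)
        w = np.log(pi) + logp
        w -= w.max(axis=1, keepdims=True)                   # stabilize exp
        G = np.exp(w)
        G /= G.sum(axis=1, keepdims=True)
        # M-step: weighted MLE updates, eqs. (13), (18), (19)
        Nj = G.sum(axis=0)                                  # = m * pi_j
        pi = Nj / m
        mu = (G.T @ X) / Nj[:, None]
        diff2 = (X[:, None, :] - mu[None, :, :]) ** 2       # (m, K, n)
        sigma2 = np.einsum('mk,mkn->kn', G, diff2) / Nj[:, None]
        if np.abs(G - G_old).max() < eps:
            break
    return pi, mu, sigma2
```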
Example of usage of GMM and EM

[Figure: the complex 2-D sample from above, fitted with a GMM, 5 components]
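In practice one would typically use a library implementation; a sketch with scikit-learn's GaussianMixture (`data` stands in for the 2-D sample in the figure and is synthetic here):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
data = np.vstack([rng.normal(c, 1.0, size=(200, 2)) for c in (-5.0, 0.0, 5.0)])

# Fit a 5-component GMM with diagonal covariances, as in the figure above.
gmm = GaussianMixture(n_components=5, covariance_type='diag').fit(data)
labels = gmm.predict(data)        # hard component assignments
resp = gmm.predict_proba(data)    # responsibilities g_ij
```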
Examples of data generation

Handwritten digits – 1797 images of size 8×8 pixels. 100 of them:

[Figure: grid of 100 handwritten digit images]
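The dataset matches scikit-learn's load_digits; a sketch of fitting a GMM to it and sampling new digit-like images (whether the slides used exactly this pipeline is an assumption):

```python
from sklearn.datasets import load_digits
from sklearn.mixture import GaussianMixture

X = load_digits().data                     # (1797, 64): 8x8 images, flattened

# Fit a GMM and draw new samples from the learned density.
gmm = GaussianMixture(n_components=10, covariance_type='diag').fit(X)
samples, _ = gmm.sample(100)               # 100 generated "digits"
images = samples.reshape(-1, 8, 8)
```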
[Figures: digits generated from GMMs with 3, 5, 10, 20 and 30 components]
Radial Basis Function network

Recall the Bayes decision rule: assign $x$ to $\omega_l$, where $\omega_l = \arg\max_{\omega \in \Omega} P(\omega|x)$.

Schema for the $M$-classes problem:

[Figure: RBF network schema]
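A minimal sketch of such a network: Gaussian basis functions (whose centers and widths could come from the EM-fitted GMM) followed by a linear output layer trained by least squares. All names and design choices here are illustrative:

```python
import numpy as np

def rbf_features(X, centers, widths):
    # Phi[i, j] = exp(-||x_i - c_j||^2 / (2 * w_j^2))
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * widths ** 2))

def train_rbf(X, Y, centers, widths):
    # Least-squares fit of the output weights; Y is one-hot, shape (m, M).
    Phi = rbf_features(X, centers, widths)
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return W

def rbf_predict(X, centers, widths, W):
    # The decision mirrors the Bayes rule: pick the class with the largest output.
    return (rbf_features(X, centers, widths) @ W).argmax(axis=1)
```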