Transcript
Page 1:

Distribution Mixtures of Product Components

Part I: EM Algorithm & Modifications

Jiří Grim

Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic

January 2017

Available at: http://www.utia.cas.cz/people/grim

Cf. J. Grim: “Approximation of Unknown Multivariate Probability Distributions by Using Mixtures of Product Components: A Tutorial.” International Journal of Pattern Recognition and Artificial Intelligence (2017). DOI: 10.1142/S0218001417500288

Page 2:

Outline

1 METHOD OF MIXTURES: Approximation of Unknown Probability Distributions; Example: mixture of Gaussian densities

2 GENERAL VERSION OF EM ALGORITHM: General EM Iteration Scheme; Monotonic Property of EM Algorithm; Computational Properties of EM Algorithm; Historical Comments

3 PRODUCT MIXTURES: Distribution Mixtures with Product Components; Implementation of EM Algorithm

4 MODIFICATIONS OF PRODUCT MIXTURE MODEL: Structural Mixture Model; EM Algorithm for Incomplete Data; Modification of EM Algorithm for Weighted Data; Sequential Decision Scheme

5 SURVEY: Computational Properties of Product Mixtures; Literature Related to Mixtures

Page 3:

Method of Distribution Mixtures

Information Source:

training data S: independent observations of a random vector, identically distributed (i.i.d.) according to an unknown probability distribution P^*(x)

S = \{x^{(1)}, x^{(2)}, \dots, x^{(K)}\}, \qquad x^{(k)} = (x^{(k)}_1, x^{(k)}_2, \dots, x^{(k)}_N) \in X

Principle of the Method of Mixtures:

approximation of an unknown multidimensional multimodal distribution P^*(x) by means of a linear combination of component distributions F(x|m)

P(x) = \sum_{m\in M} w_m F(x|m), \qquad \sum_{m\in M} w_m = 1, \qquad \sum_{x\in X} F(x|m) = 1 \;\Big(= \int_X F(x|m)\,dx\Big)

Application examples:

pattern recognition, image analysis, prediction problems, texture modeling, statistical models, classification of text documents, ...

Page 4:

Mixtures as a “Semiparametric” Model

parametric approach: e.g. assuming multivariate normal density

P(x) = \frac{1}{\sqrt{(2\pi)^N \det A}}\exp\Big\{-\frac{1}{2}(x-c)^T A^{-1}(x-c)\Big\}, \quad x \in X

mean: c = \frac{1}{|S|}\sum_{x\in S} x, \qquad covariance matrix: A = \frac{1}{|S|}\sum_{x\in S}(x-c)(x-c)^T

nonparametric approach: general kernel estimate (Parzen, 1962; see the Theorem in A1)

P(x) = \frac{1}{|S|}\sum_{y\in S}\prod_{n\in N}\frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\Big\{-\frac{(x_n-y_n)^2}{2\sigma_n^2}\Big\}, \quad x \in X

problem: optimal smoothing (choice of the smoothing parameters σn)

Mixtures as a Compromise: Semiparametric Multimodal Model

not as limiting as parametric models

almost as general as nonparametric model, without smoothing

efficient estimation of parameters by EM algorithm

Page 5:

Example - EM algorithm for mixtures of Gaussian densities

computation of parameter estimates from data: S = \{x^{(1)}, \dots, x^{(K)}\}

F(x|c_m,A_m) = \frac{1}{\sqrt{(2\pi)^N \det A_m}}\exp\Big\{-\frac{1}{2}(x-c_m)^T A_m^{-1}(x-c_m)\Big\}, \quad x \in \mathbb{R}^N

L = \frac{1}{|S|}\sum_{x\in S}\log P(x) = \frac{1}{|S|}\sum_{x\in S}\log\Big[\sum_{m\in M} F(x|c_m,A_m)\,w_m\Big]

Iteration equations (to maximize the log-likelihood function):

E-step:

q(m|x) = \frac{w_m F(x|c_m,A_m)}{\sum_{j=1}^{M} w_j F(x|c_j,A_j)}, \quad x \in S, \; m = 1,2,\dots,M

M-step:

w'_m = \frac{1}{|S|}\sum_{x\in S} q(m|x), \qquad c'_m = \frac{1}{\sum_{x\in S} q(m|x)}\sum_{x\in S} x\,q(m|x)

A'_m = \frac{1}{\sum_{x\in S} q(m|x)}\sum_{x\in S} q(m|x)\,(x-c'_m)(x-c'_m)^T

Remark: The number of components has to be given.
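To make the iteration scheme concrete, here is a minimal C sketch of the E-step and the weight update (an illustrative sketch, not code from the slides; the component densities F(x|c_m,A_m) are assumed to be evaluated beforehand into an array dens[]):

```c
#include <stddef.h>

/* E-step for one data vector x:
 * dens[m]  = F(x|c_m,A_m)  (component density values, precomputed)
 * w[m]     = current mixture weights
 * q[m]     = output: conditional weights q(m|x)
 * w_new[m] = accumulator for the weights w'_m
 *            (divide by |S| after the data cycle)                 */
void e_step_point(size_t M, const double *dens, const double *w,
                  double *q, double *w_new)
{
    double norm = 0.0;
    for (size_t m = 0; m < M; m++) {
        q[m] = w[m] * dens[m];   /* numerator  w_m F(x|m)        */
        norm += q[m];            /* denominator sum_j w_j F(x|j) */
    }
    for (size_t m = 0; m < M; m++) {
        q[m] /= norm;            /* q(m|x)                        */
        w_new[m] += q[m];        /* accumulate for w'_m           */
    }
}
```

The means c'_m and matrices A'_m are accumulated in the same data cycle as q(m|x)-weighted sums of x and xx^T.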

Page 6:

Example: reconstruction of a Gaussian mixture from data

dimension of data: N = 2, number of mixture components: M = 7

Page 7:

Random sampling from a Gaussian mixture (M=7)

6000 data points (test of the correct implementation of EM algorithm)

Page 8:

Example of the mixture estimate (M=28)

number of mixture components M = 28 (≠ 7) (COMPARISON: kernel estimate)

Page 9:

Original mixture of Gaussian densities (M=7)

dimension of data N = 2, number of mixture components M = 7

Page 10:

General Version of EM Algorithm

EM algorithm: to maximize log-likelihood function

L = \frac{1}{|S|}\sum_{x\in S}\log P(x) = \frac{1}{|S|}\sum_{x\in S}\log\Big[\sum_{m\in M} w_m F(x|m)\Big]

Iteration equations (m = 1,2,\dots,M, \; x \in S, \; S = \{x^{(1)},\dots,x^{(K)}\}):

E-step:

q(m|x) = \frac{w_m F(x|m)}{\sum_{j=1}^{M} w_j F(x|j)}, \qquad w'_m = \frac{1}{|S|}\sum_{x\in S} q(m|x)

M-step:

F'(\cdot|m) = \arg\max_{F(\cdot|m)}\frac{1}{\sum_{x\in S} q(m|x)}\sum_{x\in S} q(m|x)\log F(x|m)

for product components: F(x|m) = \prod_{n\in N} f_n(x_n|m), \quad N = \{1,2,\dots,N\}

\Rightarrow\; f'_n(\cdot|m) = \arg\max_{f_n(\cdot|m)}\frac{1}{\sum_{x\in S} q(m|x)}\sum_{x\in S} q(m|x)\log f_n(x_n|m), \quad n \in N

Remark: In the M-step an inequality is sufficient instead of the maximum ⇒ generalized EM (GEM) algorithm.

Page 11:

Explicit Solution of the M-Step (Grim,1982)

Let F(x|b), x \in X be a probability density function and let b^* be the maximum-likelihood estimate of the parameter b:

b^* = \arg\max_b L(b) = \arg\max_b \frac{1}{|S|}\sum_{x\in S}\log F(x|b)

Further let b^* be an additive function of the data vectors x \in S:

b^* = \frac{1}{|S|}\sum_{x\in S} a(x).

Denoting \gamma(x) = N(x)/|S| the relative frequency of x in S, we can write:

L(b) = \sum_{x\in X}\gamma(x)\log F(x|b), \qquad X = \{x \in X : \gamma(x) > 0\}, \quad \Big(\sum_{x\in X}\gamma(x) = 1\Big)

b^* = \sum_{x\in X}\gamma(x)\,a(x) = \arg\max_b \sum_{x\in X}\gamma(x)\log F(x|b)

Consequence: The weighted likelihood function is maximized by the weighted analogy of the related m.-l. estimate. Example: Gaussian mixture.
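As a worked instance of this consequence (an illustrative example, with unit variance assumed for brevity): for a univariate Gaussian density F(x|b) with unknown mean b, the m.-l. estimate is the sample mean, i.e. additive with a(x) = x, so the weighted likelihood is maximized by the weighted mean:

```latex
% Worked instance of the additive-statistic rule: Gaussian mean, unit variance assumed
\[
  F(x|b) = \frac{1}{\sqrt{2\pi}}\, e^{-(x-b)^2/2}, \qquad
  \arg\max_b \sum_{x\in X} \gamma(x)\,\log F(x|b)
  = \sum_{x\in X} \gamma(x)\, x .
\]
```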

Page 12:

Monotonic Property of EM Algorithm (Schlesinger, 1968)

The sequence of log-likelihood values \{L^{(t)}\}_{t=0}^{\infty} is non-decreasing:

L^{(t+1)} - L^{(t)} \ge 0, \quad t = 0,1,2,\dots

and, if bounded above, converges to a local or global maximum (or a saddle-point) of the log-likelihood function:

\lim_{t\to\infty} L^{(t)} = L^* < \infty.

The existence of a finite limit L^* < \infty implies the related necessary conditions:

\lim_{t\to\infty}(L^{(t+1)} - L^{(t)}) = 0 \;\Rightarrow\; \lim_{t\to\infty}|w^{(t+1)}(m) - w^{(t)}(m)| = 0, \; m \in M, \qquad \lim_{t\to\infty}\|q^{(t+1)}(\cdot|x) - q^{(t)}(\cdot|x)\| = 0

Remark: The convergence of the sequence \{L^{(t)}\}_{t=0}^{\infty} does not imply the convergence of the corresponding parameter estimates!

Page 13:

Proof of the Monotonic Property of EM Algorithm

Lemma

The Kullback-Leibler information divergence I(q(\cdot|x)\,\|\,q'(\cdot|x)) is non-negative for any two distributions q(\cdot|x), q'(\cdot|x), and it is zero if and only if the two distributions are identical (proof: see A11).

\Rightarrow\; \frac{1}{|S|}\sum_{x\in S} I(q(\cdot|x)\,\|\,q'(\cdot|x)) = \frac{1}{|S|}\sum_{x\in S}\Big[\sum_{m\in M} q(m|x)\log\frac{q(m|x)}{q'(m|x)}\Big] \ge 0

Substitution for q(m|x), q′(m|x) from the E-Step implies the inequality:

\frac{1}{|S|}\sum_{x\in S}\sum_{m\in M} q(m|x)\log\frac{P'(x)}{P(x)} \;-\; \frac{1}{|S|}\sum_{x\in S}\sum_{m\in M} q(m|x)\log\Big[\frac{w'_m F'(x|m)}{w_m F(x|m)}\Big] \ge 0

where the first term is equal to the increment of the criterion L:

\frac{1}{|S|}\sum_{x\in S}\sum_{m\in M} q(m|x)\log\frac{P'(x)}{P(x)} = \frac{1}{|S|}\sum_{x\in S}\log\frac{P'(x)}{P(x)} = L' - L.

Page 14:

Proof of the Monotonic Property of EM Algorithm

Making substitution from the last equation we obtain:

(*) \quad L' - L \ge \sum_{m\in M}\Big[\frac{1}{|S|}\sum_{x\in S} q(m|x)\Big]\log\frac{w'_m}{w_m} + \sum_{m\in M}\frac{1}{|S|}\sum_{x\in S} q(m|x)\log\frac{F'(x|m)}{F(x|m)}

and by using substitution from the M-Step

(**) \quad w'_m = \frac{1}{|S|}\sum_{x\in S} q(m|x), \quad m = 1,2,\dots,M

we can write the inequality:

(***) \quad \sum_{m\in M}\Big[\frac{1}{|S|}\sum_{x\in S} q(m|x)\Big]\log\frac{w'_m}{w_m} = \sum_{m\in M} w'_m\log\frac{w'_m}{w_m} \ge 0.

Consequently, the first sum on the right-hand side of the inequality (*) is non-negative.

Remark: The definition (**) of the weights w'_m maximizes the first sum in Eq. (***).

Page 15:

Proof of the Monotonic Property of EM Algorithm

In view of the M-step definition, the function F'(\cdot|m) maximizes the left-hand side, i.e. we can write:

\sum_{m\in M}\frac{1}{|S|}\sum_{x\in S} q(m|x)\log F'(x|m) \ge \sum_{m\in M}\frac{1}{|S|}\sum_{x\in S} q(m|x)\log F(x|m).

The last inequality can be rewritten in the form

\sum_{m\in M}\frac{1}{|S|}\sum_{x\in S} q(m|x)\log\frac{F'(x|m)}{F(x|m)} \ge 0,

i.e. the increment of the log-likelihood function L is non-negative:

L' - L \ge \sum_{m\in M} w'_m\log\frac{w'_m}{w_m} + \sum_{m\in M}\frac{1}{|S|}\sum_{x\in S} q(m|x)\log\frac{F'(x|m)}{F(x|m)} \ge 0 \;\Rightarrow\; L' \ge L \quad (alternative proof: see A8)

Remark: Any statistical interpretation of the proof is unnecessary!

Page 16:

Mixture Identification vs. Approximation by Mixtures

Problem of mixture identification (e.g. cluster analysis)

GOAL: to identify the true number of components and to estimate the true mixture parameters

the estimated mixture must be identifiable (see the Definition in A7)

PROBLEM: the log-likelihood function nearly always has local maxima (especially in case of small data sets in high-dimensional spaces)

⇒ the resulting local maximum is starting-point dependent

PROBLEM: the mixture estimate is strongly influenced by the chosen number of components and by the initial parameters

Problem of approximating unknown probability distributions

GOAL: precise approximation of the unknown probability distribution by using mixture distributions (cf. A10)

the approximating mixture need not be identifiable

the exact number of components is irrelevant

the approximating mixture can be initialized randomly

Page 17:

Computational properties of EM Algorithm

real-life approximation problems ⇒ large data sets + large number of components:

in case of large mixtures (M \approx 10^1 - 10^2) the low-weight components may be neglected (⇒ the exact number of components is irrelevant)

the existence of local log-likelihood maxima of large mixtures is less relevant because the related maximum values are comparable

⇒ the influence of initial parameters is less relevant, the mixtures can be initialized randomly

the EM iterations can be stopped e.g. by a relative increment threshold because of its limited influence on the achieved log-likelihood value

a reasonable stopping rule may decrease the risk of overfitting (excessive adaptation to training data)

the EM algorithm is applicable to weighted data

Remark: The computational properties are data-dependent and therefore not generally valid.

Page 18:

From the History of the Mixture Estimation Problem

Computation of m.-l. estimates of mixture parameters by setting partial derivatives to zero cannot be solved analytically. SOLUTION?

First paper: Pearson (1894): “Contributions to the mathematical theory of evolution. 1. Dissection of frequency curves.” Philosophical Trans. of the Royal Society of London 185, 71-110. Subject: mixture of two univariate Gaussian densities estimated by the method of moments. (About 80 papers followed in the years 1895-1965.)

efficient estimation of mixtures was enabled only by computers:

Hasselblad (1966), Day (1969), Wolfe (1970): derived a simple iteration scheme by algebraic rearrangement of the likelihood equations (at present known as EM algorithm) which was converging and easily applicable to large mixtures in multidimensional spaces

Hosmer (1973): “Iterative m.-l. estimates were proposed by Hasselbladand subsequently have been looked at by Day, Hosmer and Wolfe.”

Peters and Walker (1978): “... we have observed in experiments that the convergence is monotone, i.e. that the likelihood function is actually increased in each iteration, but we have been unable to prove it.”

Page 19:

From the History of the Mixture Estimation Problem

the first proof of the monotonic property of EM algorithm:

Schlesinger M.I. (1968): “Relation between learning and self learning in pattern recognition”, Kibernetika (Kiev), No. 2, 81-88.

Ajvazjan et al. (1974, in Russian): cite Schlesinger (1968)

Isaenko & Urbach (1976, in Russian): cite Schlesinger (1968)

Page 20:

From the History of the Mixture Estimation Problem

the standard reference to EM algorithm:

Dempster et al. (1977): “Maximum likelihood from incomplete data via the EM algorithm.” J. Roy. Statist. Soc., B, Vol. 39, pp. 1-38.

Dempster et al. introduced the name EM algorithm and described its wide application possibilities (main subject: the problem of incomplete data)

Google Scholar (2017): 48 500 citations of the above paper (“all time top 10” in statistics)

the term “EM algorithm” used in 340 000 papers

the terms “EM algorithm & mixture” used in 103 000 papers

Page 21:

From the History of the Mixture Estimation Problem

erroneous proof of the convergence of parameter estimates (does not concern the monotonic property of EM algorithm):

Boyles R.A. (1983): “On the convergence of the EM algorithm.” J.Roy. Statist. Soc., B, Vol. 45, pp. 47-50.

Wu C.F.J. (1983): “On the convergence properties of the EMalgorithm.” Ann. Statist., Vol. 11, pp. 95-103.

Monographs on Mixtures:

Titterington et al. (1985): Statistical analysis of finite mixturedistributions, John Wiley & Sons: Chichester, New York.

McLachlan and Peel (2000): Finite Mixture Models, John Wiley &Sons, New York, Toronto.

Page 22:

PRODUCT MIXTURES

mixtures of product components (conditional independence model):

P(x) = \sum_{m\in M} w_m \prod_{n\in N} f_n(x_n|m), \quad x \in X

Examples: Gaussian mixtures with diagonal covariance matrices (real variables); mixtures of multivariate Bernoulli distributions (binary variables)

ADVANTAGES:

do not imply the assumption of independence of variables

⇒ do not imply the “naive Bayes” assumption

the mixture parameters can be efficiently estimated by EM algorithm

any discrete distribution can be expressed as a product mixture (proof: see A12)

Gaussian product mixtures approach the asymptotic accuracy of non-parametric Parzen estimates for M \gg 1 (see A1, A2)

no risk of ill-conditioned covariance matrices in Gaussian components

marginal distributions: by omitting superfluous terms in the products

any conditional distributions easily computed

product mixtures support the subspace (structural) modification

Page 23:

EM Estimation of Gaussian Product Mixtures

COMPONENTS: Gaussian densities with diagonal covariance matrices

F(x|\mu_m,\sigma_m) = \prod_{n\in N}\frac{1}{\sqrt{2\pi}\,\sigma_{mn}}\exp\Big\{-\frac{(x_n-\mu_{mn})^2}{2\sigma_{mn}^2}\Big\}, \quad x \in X

L = \frac{1}{|S|}\sum_{x\in S}\log P(x) = \frac{1}{|S|}\sum_{x\in S}\log\Big[\sum_{m\in M} w_m F(x|\mu_m,\sigma_m)\Big]

EM iteration equations (m \in M, n \in N; norming of the variables is unnecessary, see A5):

q(m|x) = \frac{w_m F(x|\mu_m,\sigma_m)}{\sum_{j=1}^{M} w_j F(x|\mu_j,\sigma_j)}, \quad x \in S,

w'_m = \frac{1}{|S|}\sum_{x\in S} q(m|x), \qquad \mu'_{mn} = \frac{1}{w'_m|S|}\sum_{x\in S} x_n\,q(m|x)

(\sigma'_{mn})^2 = \frac{1}{w'_m|S|}\sum_{x\in S}(x_n-\mu'_{mn})^2\,q(m|x) = \frac{1}{w'_m|S|}\sum_{x\in S} x_n^2\,q(m|x) - (\mu'_{mn})^2

no matrix inversion ⇒ no risk of ill-conditioned matrices
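The whole iteration can be written as a single data cycle. A minimal C sketch under simplifying assumptions (all names are illustrative; the densities are evaluated directly and may underflow for large N, see the log-domain norming on page 27):

```c
#include <math.h>
#include <stddef.h>

/* One EM data cycle for a Gaussian product mixture (diagonal covariances).
 * X[k*N+n]: K data vectors of dimension N;
 * w[m], mu[m*N+n], sg[m*N+n]: current parameters (sg = sigma);
 * q, w1, mu1, s1: caller-allocated scratch (sizes M, M, M*N, M*N).
 * On return the parameter arrays hold the re-estimated values.    */
void em_cycle(size_t K, size_t N, size_t M, const double *X,
              double *w, double *mu, double *sg,
              double *q, double *w1, double *mu1, double *s1)
{
    for (size_t i = 0; i < M * N; i++) { mu1[i] = 0.0; s1[i] = 0.0; }
    for (size_t m = 0; m < M; m++) w1[m] = 0.0;

    for (size_t k = 0; k < K; k++) {
        const double *x = X + k * N;
        double norm = 0.0;
        for (size_t m = 0; m < M; m++) {          /* E-step */
            double f = 1.0;
            for (size_t n = 0; n < N; n++) {
                double d = (x[n] - mu[m*N+n]) / sg[m*N+n];
                /* 0.3989... = 1/sqrt(2*pi) */
                f *= 0.3989422804014327 / sg[m*N+n] * exp(-0.5 * d * d);
            }
            q[m] = w[m] * f;
            norm += q[m];
        }
        for (size_t m = 0; m < M; m++) {          /* accumulate M-step sums */
            q[m] /= norm;
            w1[m] += q[m];
            for (size_t n = 0; n < N; n++) {
                mu1[m*N+n] += x[n] * q[m];
                s1[m*N+n]  += x[n] * x[n] * q[m];
            }
        }
    }
    for (size_t m = 0; m < M; m++) {              /* closed-form M-step */
        for (size_t n = 0; n < N; n++) {
            mu[m*N+n] = mu1[m*N+n] / w1[m];
            sg[m*N+n] = sqrt(s1[m*N+n] / w1[m] - mu[m*N+n] * mu[m*N+n]);
        }
        w[m] = w1[m] / (double)K;
    }
}
```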

Page 24:

EM Estimation of Discrete Product Mixtures

COMPONENTS: products of univariate discrete distributions

F(x|m) = \prod_{n\in N} f_n(x_n|m), \quad x = (x_1,\dots,x_N) \in X, \; x_n \in X_n, \; |X_n| < \infty

L = \frac{1}{|S|}\sum_{x\in S}\log P(x) = \frac{1}{|S|}\sum_{x\in S}\log\Big[\sum_{m\in M} w_m\prod_{n\in N} f_n(x_n|m)\Big], \quad x \in X

EM iteration equations: (x ∈ S, S = x (1), . . . , x (K))

q(m|x) =wmF (x |m)∑Mj=1 wjF (x |j)

, w′

m =1

|S|∑x∈S

q(m|x)

f′

n (ξ|m) =1

w ′m|S|∑x∈S

δ(ξ, xn)q(m|x) More details:

Remark 1: The discrete product mixture is not identifiable (proof: see A7).

(⇒ a problem in cluster analysis, but an advantage in approximation)

Remark 2 Any discrete distribution is representable as a product mixture.

Page 25:

EM Estimation of Multivariate Bernoulli Mixtures

COMPONENTS: products of univariate Bernoulli distributions

binary data: numerals on a binary raster, results of biochemical tests ...

x = (x_1,x_2,\dots,x_N) \in X, \quad x_n \in \{0,1\}, \quad X = \{0,1\}^N

F(x|m) = F(x|\theta_m) = \prod_{n\in N} f_n(x_n|\theta_{mn}) = \prod_{n\in N}\theta_{mn}^{x_n}(1-\theta_{mn})^{1-x_n}

L = \frac{1}{|S|}\sum_{x\in S}\log\Big[\sum_{m\in M} w_m F(x|\theta_m)\Big], \qquad S = \{x^{(1)},\dots,x^{(K)}\}

EM iteration equations:

q(m|x) = \frac{w_m F(x|\theta_m)}{\sum_{j=1}^{M} w_j F(x|\theta_j)}, \quad x \in S, \; m = 1,2,\dots,M

w'_m = \frac{1}{|S|}\sum_{x\in S} q(m|x), \qquad \theta'_{mn} = \frac{1}{w'_m|S|}\sum_{x\in S} x_n\,q(m|x)

Remark: Product of a large number of parameters θmn may underflow.
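A hedged C sketch of the standard remedy (illustrative names): the component is evaluated in the log domain, where the product becomes a sum that cannot underflow; the resulting values log[w_m F(x|θ_m)] feed into the norming of page 27.

```c
#include <math.h>
#include <stddef.h>

/* log(w_m F(x|theta_m)) of one multivariate Bernoulli component,
 * computed in the log domain so that the product of many
 * parameters theta_mn cannot underflow.
 * x[n] in {0,1}; theta[n] in (0,1).                              */
double bernoulli_logdens(size_t N, const int *x,
                         const double *theta, double w_m)
{
    double lf = log(w_m);
    for (size_t n = 0; n < N; n++)
        lf += x[n] ? log(theta[n]) : log(1.0 - theta[n]);
    return lf;
}
```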

Page 26:

Implementation Comments on EM Algorithm

implementation of EM algorithm as a data cycle (for |S| \gg 1):

\sum_{x\in S} q(m|x) \to w'_m, \qquad \sum_{x\in S} x_n\,q(m|x) \to \mu'_{mn},\; \theta'_{mn}

basic condition to verify the correct implementation: L′ ≥ L

relative increment threshold \varepsilon to stop iterations: (L'-L)/L < \varepsilon, \; (\varepsilon \approx 10^{-3} - 10^{-5}; see the sketch at the end of this page)

ε is useful to avoid “overpeaking” in final stages of convergence

EM algorithm suppresses the weights of “superfluous” components (a large number of low-weight components ⇒ too many components M)

global information about overlapping components:

q_{\max}(x) = \max_{m\in M} q(m|x), \qquad \bar q_{\max} = \frac{1}{|S|}\sum_{x\in S} q_{\max}(x)

in multi-dimensional spaces (N \gg 1) the criterion \bar q_{\max} is usually high (\approx 0.85 - 0.99) ⇒ the overlap of components is small

Remark: Correct implementation of EM algorithm can be reliably verifiedby re-identification of mixture parameters from large artificial data.
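A sketch of the stopping rule in C (an illustrative skeleton; em_iteration() stands for one complete E+M data cycle returning the new log-likelihood value, which is an assumed interface, not one from the slides):

```c
#include <math.h>

double em_iteration(void);   /* assumed: one E+M data cycle, returns L' */

/* Iterate EM until the relative increment of the log-likelihood
 * drops below the threshold eps (e.g. 1e-4).  A correct
 * implementation must satisfy L' >= L in every iteration.        */
double run_em(double eps)
{
    double L = em_iteration();
    for (;;) {
        double L1 = em_iteration();
        if (fabs((L1 - L) / L) < eps)   /* relative increment threshold */
            return L1;
        L = L1;
    }
}
```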

Page 27:

Implementation of EM Algorithm in High Dimensions

PROBLEM: numerical instability of the E-step

the components F (x |m) may “underflow” at dimensions N ≈ 30− 40

⇒ the “lost” values cannot be “recovered” by norming in Eq. for q(m|x)

⇒ inaccurate evaluation of the conditional weights q(m|x)

SOLUTION:

\log[F(x|m)\,w_m] = \log w_m + \sum_{n\in N}\log f_n(x_n|m)

maximum component: \log C(x) = \max_{m}\log[F(x|m)\,w_m]

NORMING of F(x|m) and P(x) for the evaluation of q(m|x):

\exp\Big\{-\log C(x) + \log w_m + \sum_{n\in N}\log f_n(x_n|m)\Big\} = C(x)^{-1} F(x|m)\,w_m

q(m|x) = \frac{C(x)^{-1}F(x|m)\,w_m}{\sum_{j=1}^{M} C(x)^{-1}F(x|j)\,w_j} = \frac{F(x|m)\,w_m}{\sum_{j=1}^{M} F(x|j)\,w_j}

Examples of C-pseudocode: Bernoulli mixture (A13), Gaussian mixture (A14)
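The norming above is the usual log-sum-exp trick; a minimal C sketch (illustrative names):

```c
#include <math.h>
#include <stddef.h>

/* Numerically stable E-step in high dimensions:
 * lf[m] = log(w_m F(x|m)) = log w_m + sum_n log f_n(x_n|m),
 * q[m]  = output q(m|x); log C(x) = max_m lf[m] is subtracted
 *         before exponentiation so that no component underflows. */
void e_step_log(size_t M, const double *lf, double *q)
{
    double lC = lf[0];
    for (size_t m = 1; m < M; m++)      /* maximum component log C(x) */
        if (lf[m] > lC) lC = lf[m];

    double norm = 0.0;
    for (size_t m = 0; m < M; m++) {
        q[m] = exp(lf[m] - lC);         /* C(x)^{-1} F(x|m) w_m */
        norm += q[m];
    }
    for (size_t m = 0; m < M; m++)
        q[m] /= norm;                   /* C(x) cancels in q(m|x) */
}
```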

Page 28:

Structural Mixture Model (Grim et al. 1986, 1999, 2002)

binary structural parameters: \phi_m = (\phi_{m1},\dots,\phi_{mN}) \in \{0,1\}^N

F(x|m) = \prod_{n\in N} f_n(x_n|m)^{\phi_{mn}}\, f_n(x_n|0)^{1-\phi_{mn}},

f_n(x_n|0): fixed “background” distributions, usually f_n(x_n|0) = P^*_n(x_n); \quad \phi_{mn} = 0 \Rightarrow f_n(x_n|m) is replaced by f_n(x_n|0)

P(x) = \sum_{m\in M} F(x|m)\,w_m = F(x|0)\sum_{m\in M} G(x|m,\phi_m)\,w_m,

G(x|m,\phi_m) = \prod_{n\in N}\Big[\frac{f_n(x_n|m)}{f_n(x_n|0)}\Big]^{\phi_{mn}}, \qquad F(x|0) = \prod_{n\in N} f_n(x_n|0) > 0

“the background distribution” F (x |0) reduces in the Bayes formula:

p(\omega|x) = \frac{P(x|\omega)\,p(\omega)}{P(x)} = \frac{\sum_{m\in M_\omega} G(x|m,\phi_m)\,w_m}{\sum_{j\in M} G(x|j,\phi_j)\,w_j} \approx \sum_{m\in M_\omega} G(x|m,\phi_m)\,w_m

MOTIVATION: local, component-specific feature selection; “dimensionless” computation; structural neural networks.

Page 29:

Structural Modification of EM Algorithm

structural optimization can be included into EM algorithm:

L = \frac{1}{|S|}\sum_{x\in S}\log\Big[\sum_{m\in M} F(x|0)\,G(x|m,\phi_m)\,w_m\Big]

EM iteration equations (m \in M, n \in N, x \in S):

q(m|x) = \frac{G(x|m,\phi_m)\,w_m}{\sum_{j\in M} G(x|j,\phi_j)\,w_j}, \qquad w'_m = \frac{1}{|S|}\sum_{x\in S} q(m|x),

f'_n(\cdot|m) = \arg\max_{f_n(\cdot|m)}\sum_{x\in S}\frac{q(m|x)}{w'_m|S|}\log f_n(x_n|m)

structural optimization:

\phi'_{mn} = 1 for a fixed number R of the largest values of the criterion \gamma'_{mn}:

\gamma'_{mn} = \frac{1}{|S|}\sum_{x\in S} q(m|x)\log\Big[\frac{f'_n(x_n|m)}{f_n(x_n|0)}\Big] \quad (proof: see A6)

Remark: The background distribution F(x|0) can be included into the optimization too (Grim, 1999).

Page 30:

Structural EM Algorithm - Discrete Mixture

f_n(x_n|m), x_n \in X_n, n \in N \;\approx\; discrete probability distributions

L = \frac{1}{|S|}\sum_{x\in S}\log\Big[\sum_{m\in M} G(x|m,\phi_m)\,w_m\Big], \qquad G(x|m,\phi_m) = \prod_{n\in N}\Big[\frac{f_n(x_n|m)}{f_n(x_n|0)}\Big]^{\phi_{mn}}

EM iteration equations: (m ∈M, n ∈ N , x ∈ S)

q(m|x) = \frac{G(x|m,\phi_m)\,w_m}{\sum_{j\in M} G(x|j,\phi_j)\,w_j}, \qquad w'_m = \frac{1}{|S|}\sum_{x\in S} q(m|x)

f'_n(\xi|m) = \frac{1}{w'_m|S|}\sum_{x\in S}\delta(\xi,x_n)\,q(m|x) \quad (details: see A5)

structural optimization: \phi'_{mn} = 1 for the R largest values \gamma'_{mn}:

\gamma'_{mn} = \sum_{x\in S}\frac{q(m|x)}{w'_m|S|}\log\Big[\frac{f'_n(x_n|m)}{f_n(x_n|0)}\Big] = w'_m\sum_{\xi_n\in X_n} f'_n(\xi_n|m)\log\frac{f'_n(\xi_n|m)}{f_n(\xi_n|0)}

Remark: The last sum is the Kullback-Leibler information divergence.
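In this final form the criterion is cheap to evaluate. A small C sketch (illustrative names; f_n(ξ|0) > 0 is assumed, as on the preceding slides):

```c
#include <math.h>
#include <stddef.h>

/* Structural criterion gamma'_{mn} = w'_m * I(f'_n(.|m) || f_n(.|0))
 * for one variable n and component m of a discrete mixture.
 * f1[i] = f'_n(xi_i|m), f0[i] = f_n(xi_i|0), i = 0..B-1, B = |X_n|. */
double gamma_mn(size_t B, const double *f1, const double *f0, double w_m)
{
    double kl = 0.0;
    for (size_t i = 0; i < B; i++)
        if (f1[i] > 0.0)                /* 0 log 0 = 0 convention */
            kl += f1[i] * log(f1[i] / f0[i]);
    return w_m * kl;   /* set phi'_{mn} = 1 for the R largest values */
}
```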

Page 31:

Structural EM Algorithm - Gaussian Mixture

Gaussian densities: f_n(x_n|\mu_{mn},\sigma_{mn}) = \frac{1}{\sqrt{2\pi}\,\sigma_{mn}}\exp\Big\{-\frac{(x_n-\mu_{mn})^2}{2\sigma_{mn}^2}\Big\}

L = \frac{1}{|S|}\sum_{x\in S}\log\Big[\sum_{m\in M} w_m\prod_{n\in N}\Big(\frac{f_n(x_n|\mu_{mn},\sigma_{mn})}{f_n(x_n|\mu_{0n},\sigma_{0n})}\Big)^{\phi_{mn}}\Big],

EM iteration equations: (m ∈M, n ∈ N , x ∈ S)

q(m|x) = \frac{G(x|m,\phi_m)\,w_m}{\sum_{j\in M} G(x|j,\phi_j)\,w_j}, \qquad w'_m = \frac{1}{|S|}\sum_{x\in S} q(m|x),

\mu'_{mn} = \frac{1}{w'_m|S|}\sum_{x\in S} x_n\,q(m|x), \qquad (\sigma'_{mn})^2 = \frac{1}{w'_m|S|}\sum_{x\in S} x_n^2\,q(m|x) - (\mu'_{mn})^2,

structural optimization: \phi'_{mn} = 1 for the R largest values \gamma'_{mn}:

\gamma'_{mn} = \frac{w'_m}{2}\Big[\frac{(\mu'_{mn}-\mu_{0n})^2}{(\sigma_{0n})^2} + \frac{(\sigma'_{mn})^2}{(\sigma_{0n})^2} - \log\frac{(\sigma'_{mn})^2}{(\sigma_{0n})^2} - 1\Big] = w'_m\,I(f'_n(\cdot|m),\,f_n(\cdot|0))

Remark: \gamma'_{mn} is the Kullback-Leibler information divergence (weighted by w'_m).

Page 32:

Properties of Structural Mixture Model

STRUCTURAL MIXTURES \approx a statistically correct subspace approach:

PRINCIPLE: the less informative univariate distributions f_n(x_n|m) are replaced by fixed “background” distributions f_n(x_n|0)

reduces the number of mixture parameters (and components) ⇒ reduces the risk of overpeaking

suppresses the influence of unreliable (less informative) variables

the EM algorithm performs feature selection for each component independently (it is not necessary to exclude variables globally)

Bayesian decision-making based on structural mixtures is dimension independent (Grim 2016)

the structural optimization implied by EM algorithm is controlled by the Kullback-Leibler information divergence

avoids the biologically unnatural connection of probabilistic neurons with all input variables (Grim et al. 2000)

enables the structural optimization of probabilistic neural networks by EM algorithm (Grim 2007)

Page 33:

Modification of EM Algorithm for Incomplete Data

INCOMPLETE DATA: x = (x_1,\,-\,,x_3,x_4,\,-\,,\,-\,,x_7,\dots,x_N) \in X

N(x) = \{n \in N : \text{variable } x_n \text{ is defined in } x\}, \; x \in X; \qquad S_n = \{x \in S : n \in N(x)\} \approx vectors x \in S with the variable x_n defined

Assumption: components in product form ⇒ easily available marginals (see A3)

L = \frac{1}{|S|}\sum_{x\in S}\log\Big[\sum_{m\in M} w_m F(x|m)\Big], \qquad F(x|m) = \prod_{n\in N(x)} f_n(x_n|m)

EM iteration equations: (m ∈M, n ∈ N , x ∈ S)

q(m|x) = \frac{w_m F(x|m)}{\sum_{j=1}^{M} w_j F(x|j)}, \qquad w'_m = \frac{1}{|S|}\sum_{x\in S} q(m|x)

f'_n(\cdot|m) = \arg\max_{f_n(\cdot|m)}\frac{1}{\sum_{x\in S_n} q(m|x)}\sum_{x\in S_n} q(m|x)\log f_n(x_n|m)

Remark: The likelihood criterion depends on available values only.
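A minimal C sketch of the modified E-step numerator (illustrative names, following the log-domain evaluation of page 27): the log-product simply skips the undefined variables, marked here by a 0/1 mask.

```c
#include <math.h>
#include <stddef.h>

/* log(w_m F(x|m)) for incomplete data: the product runs only over
 * the defined variables N(x), marked by def[n] = 1 (missing: 0).
 * lf_n[m*N+n] = log f_n(x_n|m), precomputed where x_n is defined. */
double log_wF_incomplete(size_t N, size_t m, const int *def,
                         const double *lf_n, double w_m)
{
    double lf = log(w_m);
    for (size_t n = 0; n < N; n++)
        if (def[n])                     /* n in N(x) only */
            lf += lf_n[m*N + n];
    return lf;   /* normalize over m as usual to obtain q(m|x) */
}
```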

Page 34:

Modification of EM algorithm for Weighted Data

NOTATION: \gamma(x) > 0: relative frequency of x in S, \quad \big(\sum_{x\in X}\gamma(x) = 1\big)

L = \frac{1}{|S|}\sum_{x\in S}\log\Big[\sum_{m\in M} w_m F(x|m)\Big] = \sum_{x\in X}\gamma(x)\log\Big[\sum_{m\in M} w_m F(x|m)\Big]

X = \{x \in X : \gamma(x) > 0\}: the sum can be confined to x \in X:

“weighted” EM iteration equations: (m ∈M, n ∈ N , x ∈ X )

q(m|x) = \frac{w_m F(x|m)}{\sum_{j\in M} w_j F(x|j)}, \qquad F(x|m) = \prod_{n\in N} f_n(x_n|m)

w'_m = \frac{1}{|S|}\sum_{x\in S} q(m|x) = \sum_{x\in X}\gamma(x)\,q(m|x)

F'(\cdot|m) = \arg\max_{F(\cdot|m)}\sum_{x\in X}\frac{\gamma(x)\,q(m|x)}{w'_m}\log F(x|m)

Applications: relevance of data, aggregation of data, discrete data weighted by table values: \gamma(x) = P^*(x), x \in X

Page 35:

Sequential Decision Scheme (Grim 1986, 2014)

INFORMATION CONTROLLED SEQUENTIAL DECISION-MAKING

Given the observations x_D = (x_{j_1},\dots,x_{j_l}) \in X_D, D = \{j_1,\dots,j_l\} \subset N, we have to choose the next most informative variable x_n, n \notin D, to maximize the conditional information I_{x_D}(X_n,\Omega) about the classes \Omega = \{\omega_1,\dots,\omega_K\}.

SOLUTION: explicit evaluation of the criterion I_{x_D}(X_n,\Omega)

I_{x_D}(X_n,\Omega) = H_{x_D}(X_n) - H_{x_D}(X_n|\Omega), \qquad n^* = \arg\max_{n\notin D} I_{x_D}(X_n,\Omega)

H_{x_D}(X_n) = \sum_{x_n\in X_n} -P_{n|D}(x_n|x_D)\log P_{n|D}(x_n|x_D), \qquad P_{n|D}(x_n|x_D) = \frac{P_{nD}(x_n,x_D)}{P_D(x_D)}

H_{x_D}(X_n|\Omega) = \sum_{\omega\in\Omega} p(\omega|x_D)\sum_{x_n\in X_n} -P_{n|D\omega}(x_n|x_D,\omega)\log P_{n|D\omega}(x_n|x_D,\omega),

P_{n|D\omega}(x_n|x_D,\omega) = P_{nD|\omega}(x_n,x_D|\omega)/P_{D|\omega}(x_D|\omega) = \sum_{m\in M_\omega} W_m(x_D,\omega)\,f_n(x_n|m),

P_{nD|\omega}(x_n,x_D|\omega) = \sum_{m\in M_\omega} w_m f_n(x_n|m,\omega)\prod_{i\in D} f_i(x_i|m,\omega),

Page 36:

Feature Selection: the Most Informative Subspace

special case of the sequential decision scheme:

INFORMATION CRITERION for the optimal feature subset

ASSUMPTION: class-conditional product mixtures P(x |ω), ω ∈ Ω

I(X_D,\Omega) = H(X_D) - H(X_D|\Omega), \qquad D^* = \arg\max_{D\subset N} I(X_D,\Omega)

P_{D|\omega}(x_D|\omega) = \sum_{m\in M_\omega} w_m\prod_{n\in D} f_n(x_n|m), \quad x_D \in X_D,

H(X_D) = \sum_{x_D\in X_D} -P_D(x_D)\log P_D(x_D), \qquad D = \{j_1,\dots,j_k\} \subset N, \; |D| = k

H(X_D|\Omega) = \sum_{\omega\in\Omega} p(\omega)\sum_{x_D\in X_D} -P_{D|\omega}(x_D|\omega)\log P_{D|\omega}(x_D|\omega)

optimal subset D ⊂ N : complete search, approximate methods

APPLICATION: informative feature selection for pattern recognition

Page 37:

PROPERTIES OF PRODUCT MIXTURES

SURVEY: computational properties of product mixtures

efficient estimation of multivariate distribution mixtures (!)

suitable to approximate multi-modal, real-life probability distributions

with an increasing number of components the Gaussian mixtures approach the asymptotic accuracy of Parzen (kernel) estimates

unlike Parzen estimates, the product mixtures are optimally “smoothed” by the efficient EM algorithm

directly available marginal probability distributions (!)

the mixture parameters can be estimated from incomplete data

product components enable the information-controlled sequential decision-making in multi-dimensional spaces

product mixtures can be interpreted as probabilistic neural networks

enable the structural optimization of probabilistic neural networks

provide an information criterion for the optimal feature subset

Page 38:

A1: Asymptotic Properties of Parzen Estimates

Theorem (Parzen, 1962; Cacoullos, 1966)

Let S_K be a sequence of K independent observations of an N-dimensional random vector distributed with the probability density function P^*(x). The non-parametric density estimate P(x) with the smoothing parameter \sigma_K

P(x) = \frac{1}{K}\sum_{y\in S_K}\prod_{n\in N}\frac{1}{\sqrt{2\pi}\,\sigma_K}\exp\Big\{-\frac{(x_n-y_n)^2}{2\sigma_K^2}\Big\}

is asymptotically unbiased in each continuity point of P∗(x), i.e. it holds

\lim_{K\to\infty} E_{S_K}\{P(x)\} = P^*(x)

if \lim_{K\to\infty}\sigma_K = 0. In addition, if \lim_{K\to\infty} K\sigma_K^N = \infty, then the unbiased estimate P(x) is asymptotically consistent in the quadratic mean sense:

\lim_{K\to\infty} E_{S_K}\{[P^*(x) - P(x)]^2\} = 0.


Page 39:

A2: Optimal Smoothing of Parzen (Kernel) Estimates

Parzen estimate with Gaussian kernel:

P(x) = \frac{1}{|S|}\sum_{y\in S} f(x|y,\sigma) = \frac{1}{|S|}\sum_{y\in S}\Big[\prod_{n\in N}\frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\Big\{-\frac{(x_n-y_n)^2}{2\sigma_n^2}\Big\}\Big]

optimization by the cross-validation (leave-one-out) method \approx to maximize the modified log-likelihood function by EM algorithm:

L(\sigma) = \sum_{x\in S}\log\Big[\frac{1}{|S|-1}\sum_{y\in S,\,y\ne x}\prod_{n\in N}\frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\Big\{-\frac{(x_n-y_n)^2}{2\sigma_n^2}\Big\}\Big]

q(y|x) = \frac{f(x|y,\sigma)}{\sum_{u\in S,\,u\ne x} f(x|u,\sigma)}, \quad y \in S

(\sigma'_n)^2 = \frac{1}{|S|}\sum_{x\in S}\sum_{y\in S,\,y\ne x}(x_n-y_n)^2\,q(y|x)

Remark: Optimal smoothing is crucial in high-dimensional spaces!
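A C sketch of one update of the smoothing parameters by the above equations (an illustrative sketch; cost O(|S|^2 N) per update, and the kernel values are computed up to a constant factor that cancels in q(y|x)):

```c
#include <math.h>
#include <stddef.h>

/* One cross-validated (leave-one-out) update of the per-dimension
 * smoothing parameters sg[n] of the Gaussian kernel estimate.
 * X[k*N+n]: K data vectors of dimension N;
 * q, s2: caller-allocated scratch of sizes K and N.               */
void parzen_sigma_update(size_t K, size_t N, const double *X,
                         double *sg, double *q, double *s2)
{
    for (size_t n = 0; n < N; n++) s2[n] = 0.0;
    for (size_t k = 0; k < K; k++) {                /* x = X[k] */
        double norm = 0.0;
        for (size_t j = 0; j < K; j++) {            /* y = X[j], y != x */
            if (j == k) { q[j] = 0.0; continue; }
            double lf = 0.0;
            for (size_t n = 0; n < N; n++) {
                double d = (X[k*N+n] - X[j*N+n]) / sg[n];
                lf -= 0.5 * d * d;     /* constants cancel in q(y|x) */
            }
            q[j] = exp(lf);
            norm += q[j];
        }
        for (size_t j = 0; j < K; j++) {            /* accumulate sums */
            if (j == k) continue;
            double qyx = q[j] / norm;               /* q(y|x) */
            for (size_t n = 0; n < N; n++) {
                double d = X[k*N+n] - X[j*N+n];
                s2[n] += d * d * qyx;
            }
        }
    }
    for (size_t n = 0; n < N; n++)
        sg[n] = sqrt(s2[n] / (double)K);            /* (sigma'_n)^2 */
}
```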

Page 40:

“Under-smoothed” Kernel Estimate

Page 41:

“Over-smoothed” Kernel Estimate

Page 42:

Optimally Smoothed Kernel Estimate

(general Gaussian kernel)

Page 43:

A3: Marginal Distributions of a Product Mixture

easily obtained by omitting superfluous terms in products:

P(x) = \sum_{m\in M} w_m F(x|m) = \sum_{m\in M} w_m\prod_{n\in N} f_n(x_n|m), \quad x = (x_1,\dots,x_N) \in X

\sum_{x_i\in X_i} P(x) = \sum_{m=1}^{M} w_m\Big(\sum_{x_i\in X_i} f_i(x_i|m)\Big)\prod_{n\in N\setminus\{i\}} f_n(x_n|m) = \sum_{m=1}^{M} w_m\prod_{n\in N\setminus\{i\}} f_n(x_n|m)

x_C = (x_{i_1},x_{i_2},\dots,x_{i_k}) \in X_C, \quad X_C = X_{i_1}\times\cdots\times X_{i_k}, \quad C = \{i_1,\dots,i_k\} \subset N

P_C(x_C) = \sum_{m\in M} w_m F_C(x_C|m), \qquad F_C(x_C|m) = \prod_{n\in C} f_n(x_n|m)

P_{n|C}(x_n|x_C) = \frac{P_{nC}(x_n,x_C)}{P_C(x_C)} = \sum_{m\in M}\frac{w_m F_C(x_C|m)}{P_C(x_C)}\,f_n(x_n|m)

P_{n|C}(x_n|x_C) = \sum_{m\in M} W_m(x_C)\,f_n(x_n|m), \qquad W_m(x_C) = \frac{w_m F_C(x_C|m)}{P_C(x_C)}

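Evaluating a marginal of a product mixture is therefore just a restricted product. A minimal C sketch (illustrative names; fval holds the univariate factor values f_n(x_n|m) at the point of interest):

```c
#include <stddef.h>

/* Marginal P_C(x_C) of a product mixture: the superfluous factors
 * are simply omitted.  idx[0..k-1] lists the variables in C;
 * fval[m*N+n] = f_n(x_n|m) for the given point.                   */
double marginal_PC(size_t M, size_t N, size_t k, const size_t *idx,
                   const double *w, const double *fval)
{
    double P = 0.0;
    for (size_t m = 0; m < M; m++) {
        double Fc = 1.0;
        for (size_t i = 0; i < k; i++)   /* product over n in C only */
            Fc *= fval[m*N + idx[i]];
        P += w[m] * Fc;                  /* sum_m w_m F_C(x_C|m) */
    }
    return P;
}
```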

Page 44:

A4: Solution of the M-Step - Gaussian Mixture

Gaussian Mixture with a General Covariance Matrix:

F(x|c_m,A_m) = \frac{1}{\sqrt{(2\pi)^N\det A_m}}\exp\Big\{-\frac{1}{2}(x-c_m)^T A_m^{-1}(x-c_m)\Big\}

P(x) = \sum_{m\in M} w_m F(x|c_m,A_m)

implicit form of the M-Step:

(c'_m, A'_m) = \arg\max_{(c_m,A_m)}\sum_{x\in S}\gamma(x)\log F(x|c_m,A_m)

explicit solution:

c'_m = \sum_{x\in S}\gamma(x)\,x, \qquad \gamma(x) = \frac{q(m|x)}{\sum_{y\in S} q(m|y)}

A'_m = \sum_{x\in S}\gamma(x)(x-c'_m)(x-c'_m)^T = \sum_{x\in S}\gamma(x)\,xx^T - c'_m(c'_m)^T

Page 45:

A5: Solution of the M-Step - Discrete Product Mixture

f'_n(\cdot|m) = \arg\max_{f_n(\cdot|m)}\sum_{x\in S}\frac{q(m|x)}{w'_m|S|}\log f_n(x_n|m), \quad n \in N, \; m \in M,

\sum_{\xi\in X_n}\delta(\xi,x_n) = 1, \quad x_n \in X_n,

f'_n(\cdot|m) = \arg\max_{f_n(\cdot|m)}\sum_{x\in S}\Big(\sum_{\xi\in X_n}\delta(\xi,x_n)\Big)\frac{q(m|x)}{w'_m|S|}\log f_n(x_n|m),

f'_n(\cdot|m) = \arg\max_{f_n(\cdot|m)}\sum_{\xi\in X_n}\sum_{x\in S}\delta(\xi,x_n)\frac{q(m|x)}{w'_m|S|}\log f_n(\xi|m),

f'_n(\cdot|m) = \arg\max_{f_n(\cdot|m)}\sum_{\xi\in X_n}\Big(\sum_{x\in S}\delta(\xi,x_n)\frac{q(m|x)}{w'_m|S|}\Big)\log f_n(\xi|m),

\Rightarrow\; f'_n(\xi|m) = \sum_{x\in S}\delta(\xi,x_n)\frac{q(m|x)}{w'_m|S|}

Page 46:

A5: Invariance of EM Algorithm Under Linear Transform

EM estimate of a Gaussian mixture is invariant under linear transform

Let the parameters w_m, \mu_{mn}, \sigma_{mn}, m \in M, n \in N of a Gaussian product mixture define a stationary point of EM algorithm, i.e. they satisfy the EM iteration equations. Further let y = T(x) be a linear transform of the vectors x \in X and of the mixture parameters:

y_n = a_n x_n + b_n, \; x \in S, \qquad \tilde w_m = w_m, \quad \tilde\mu_{mn} = a_n\mu_{mn} + b_n, \quad \tilde\sigma_{mn} = a_n\sigma_{mn}.

Then the transformed parameters \tilde w_m, \tilde\mu_{mn}, \tilde\sigma_{mn}, m \in M, n \in N also define a stationary point of EM algorithm in the transformed space Y.

Proof: The following equations can be verified by related substitutions:

F(y|\tilde\mu_m,\tilde\sigma_m) = \frac{1}{\prod_{n\in N} a_n}\,F(x|\mu_m,\sigma_m), \qquad \tilde P(y) = \frac{1}{\prod_{n\in N} a_n}\,P(x)

\tilde\mu_{mn} = \frac{1}{\tilde w_m|S|}\sum_{y\in S} y_n\,q(m|y), \qquad (\tilde\sigma_{mn})^2 = \frac{1}{\tilde w_m|S|}\sum_{y\in S}(y_n-\tilde\mu_{mn})^2\,q(m|y)

q(m|y) = q(m|x), \quad y = T(x), \; x \in S, \; m \in M

Page 47:

A6: Monotonic Property of Structural EM Algorithm

structural mixture is a special case of product mixture model, i.e.

w'_m = \frac{1}{|S|}\sum_{x\in S} q(m|x), \qquad f'_n(\cdot|m) = \arg\max_{f_n(\cdot|m)}\sum_{x\in S}\frac{q(m|x)}{w'_m|S|}\log f_n(x_n|m)

It is necessary to prove that the monotonic property holds for the optimized structural parameters \phi_{mn}. We use the inequality:

L' - L \ge \frac{1}{|S|}\sum_{x\in S}\sum_{m\in M} q(m|x)\log\Big[\frac{F'(x|m)}{F(x|m)}\Big] \ge 0

and, making substitution for F′(x |m),F (x |m), we obtain:

L' - L \ge \frac{1}{|S|}\sum_{x\in S}\sum_{m\in M} q(m|x)\log\Big[\frac{G'(x|m,\phi'_m)}{G(x|m,\phi_m)}\Big]

L' - L \ge \frac{1}{|S|}\sum_{x\in S}\sum_{m\in M} q(m|x)\log\prod_{n\in N}\frac{\big[f'_n(x_n|m)/f_n(x_n|0)\big]^{\phi'_{mn}}}{\big[f_n(x_n|m)/f_n(x_n|0)\big]^{\phi_{mn}}}

Page 48:

Monotonic Property of Structural EM Algorithm

The last inequality can be rewritten in the form:

(*) \quad L' - L \ge \sum_{m\in M}\sum_{n\in N}(\phi'_{mn}-\phi_{mn})\,\gamma'_{mn} + \sum_{m\in M}\sum_{n\in N}\frac{\phi_{mn}}{|S|}\sum_{x\in S} q(m|x)\log\frac{f'_n(x_n|m)}{f_n(x_n|m)}

where \gamma'_{mn} is the structural optimization criterion:

\gamma'_{mn} = \frac{1}{|S|}\sum_{x\in S} q(m|x)\log\frac{f'_n(x_n|m)}{f_n(x_n|0)}, \quad n \in N, \; m \in M

In view of the above definition of f'_n(\cdot|m) we can write for arbitrary f_n(\cdot|m):

\frac{1}{|S|}\sum_{x\in S} q(m|x)\log f'_n(x_n|m) \ge \frac{1}{|S|}\sum_{x\in S} q(m|x)\log f_n(x_n|m)

Therefore, the last sum in the inequality (*) is non-negative and, for the same reason, we have \gamma'_{mn} \ge 0 for all n \in N, m \in M.

By setting \phi'_{mn} = 1 for the R highest values \gamma'_{mn}, we obtain

L' - L \ge \sum_{m\in M}\sum_{n\in N}(\phi'_{mn}-\phi_{mn})\,\gamma'_{mn} \ge 0 \quad q.e.d.

Page 49:

Interpretation of Structural Criterion - Discrete Mixture

f_n(x_n|m), x_n \in X_n, n \in N \;\approx\; discrete probability distribution

\gamma'_{mn} = \frac{1}{|S|}\sum_{x\in S} q(m|x)\log\frac{f'_n(x_n|m)}{f_n(x_n|0)}, \quad n \in N, \; m \in M

\sum_{\xi\in X_n}\delta(\xi,x_n) = 1, \quad x_n \in X_n,

\gamma'_{mn} = \frac{1}{|S|}\sum_{x\in S} q(m|x)\Big[\sum_{\xi\in X_n}\delta(\xi,x_n)\Big]\log\frac{f'_n(x_n|m)}{f_n(x_n|0)},

\gamma'_{mn} = \frac{1}{|S|}\sum_{\xi\in X_n}\Big[\sum_{x\in S}\delta(\xi,x_n)\,q(m|x)\Big]\log\frac{f'_n(\xi|m)}{f_n(\xi|0)},

\gamma'_{mn} = w'_m\sum_{\xi\in X_n} f'_n(\xi|m)\log\frac{f'_n(\xi|m)}{f_n(\xi|0)} = w'_m\,I(f'_n(\cdot|m),\,f_n(\cdot|0)),

\gamma'_{mn} \approx Kullback-Leibler information divergence

Page 50:

Interpretation of Structural Criterion - Gaussian Mixture

Gaussian densities: f_n(x_n|\mu_{mn},\sigma_{mn}) = \frac{1}{\sqrt{2\pi}\,\sigma_{mn}}\exp\Big\{-\frac{(x_n-\mu_{mn})^2}{2\sigma_{mn}^2}\Big\}

\gamma'_{mn} = \frac{1}{|S|}\sum_{x\in S} q(m|x)\log\frac{f_n(x_n|\mu'_{mn},\sigma'_{mn})}{f_n(x_n|\mu_{0n},\sigma_{0n})}, \quad n \in N, \; m \in M,

\gamma'_{mn} = w'_m\sum_{x\in S}\frac{q(m|x)}{w'_m|S|}\Big[-\log\frac{\sigma'_{mn}}{\sigma_{0n}} - \frac{(x_n-\mu'_{mn})^2}{2(\sigma'_{mn})^2} + \frac{(x_n-\mu_{0n})^2}{2(\sigma_{0n})^2}\Big],

\gamma'_{mn} = \frac{w'_m}{2}\Big[\frac{(\mu'_{mn}-\mu_{0n})^2}{(\sigma_{0n})^2} + \frac{(\sigma'_{mn})^2}{(\sigma_{0n})^2} - 1 - \log\frac{(\sigma'_{mn})^2}{(\sigma_{0n})^2}\Big] =

(it is easily verified)

= w'_m\int_{X_n} f_n(x_n|\mu'_{mn},\sigma'_{mn})\log\frac{f_n(x_n|\mu'_{mn},\sigma'_{mn})}{f_n(x_n|\mu_{0n},\sigma_{0n})}\,dx_n = w'_m\,I(f'_n(\cdot|m),\,f_n(\cdot|0))

\Rightarrow\; \gamma'_{mn} \approx “continuous” Kullback-Leibler information divergence

Page 51:

A7: Non-Identifiability of Discrete Product Mixtures

Definition of Identifiability of Mixtures (Teicher, 1963)

The class of mixtures \mathcal{P} = \{P(x,\theta) : \theta \in \Theta\} is identifiable if the parameters \theta, \theta' \in \Theta of any two equivalent mixtures

P(x,\theta) = P(x,\theta'), \quad \forall x \in X

may differ only by the order of components.

Theorem (Grim, 2001; cf. Teicher, 1963, 1968; Gyllenberg et al., 1994)

An arbitrary discrete product mixture (x_n \in X_n, |X_n| < \infty)

P(x) = \sum_{m\in M} w_m F(x|m) = \sum_{m\in M} w_m\prod_{n\in N} f_n(x_n|m)

has infinitely many equivalent forms with different parameters if at least one of the univariate component distributions f_i(x_i|m) is nonsingular, i.e. satisfies the condition

0 < f_i(x_i|m) < 1 \quad for some x_i \in X_i.

Page 52:

Proof: Non-Identifiability of Discrete Product Mixtures

Proof: Let 0 < f_i(x_i|m) < 1 for some i \in N, x_i \in X_i and m \in M. Then, for any 0 < \alpha < 1, \beta = 1 - \alpha, we can construct two different probability distributions f'_i(\cdot|m), f''_i(\cdot|m) in such a way that the distribution f_i(\cdot|m) represents an internal point of the line segment \langle f'_i(\cdot|m), f''_i(\cdot|m)\rangle in the |X_i|-dimensional space, in the sense of the following condition:

(*) \quad f_i(\xi|m) = \alpha f'_i(\xi|m) + \beta f''_i(\xi|m), \quad \xi \in X_i.

Consequently, the nonsingular probability distribution f_i(\cdot|m) can be expressed as a convex combination of two distributions f'_i(\cdot|m), f''_i(\cdot|m) in infinitely many ways. By using the above substitution (*) we can write

(**) \quad w_m F(x|m) = w'_m F'(x|m) + w''_m F''(x|m),

where

w'_m = \alpha w_m, \quad w''_m = \beta w_m, \quad (w'_m + w''_m = w_m),

F'(x|m) = f'_i(x_i|m)\prod_{n\in N,\,n\ne i} f_n(x_n|m), \qquad F''(x|m) = f''_i(x_i|m)\prod_{n\in N,\,n\ne i} f_n(x_n|m)

Finally, making substitution (**) for w_m F(x|m), we obtain a non-trivially different equivalent of the original distribution P(x), q.e.d.

Page 53:

A8: Alternative Proof of the EM Monotonic Property

The Kullback-Leibler information divergence is non-negative, i.e.:

I(q(\cdot|x)\,\|\,q'(\cdot|x)) = \sum_{m\in M} q(m|x)\log\frac{q(m|x)}{q'(m|x)} \ge 0 \quad (proof: see A11)

The following proof follows the original idea of Schlesinger. Using notation

L = \frac{1}{|S|}\sum_{x\in S}\log\Big[\sum_{m\in M} w_m F(x|m)\Big], \qquad q(m|x) = \frac{w_m F(x|m)}{\sum_{j=1}^{M} w_j F(x|j)},

we can express the log-likelihood functions L and L' equivalently by means of the conditional weights q(m|x), q'(m|x):

L = \frac{1}{|S|}\sum_{x\in S}\Big\{\sum_{m\in M} q(m|x)\log[w_m F(x|m)] - \sum_{m\in M} q(m|x)\log q(m|x)\Big\}

L' = \frac{1}{|S|}\sum_{x\in S}\Big\{\sum_{m\in M} q(m|x)\log[w'_m F'(x|m)] - \sum_{m\in M} q(m|x)\log q'(m|x)\Big\}

Page 54:

Alternative Proof of the EM Monotonic Property

Using the above equations we can express the increment L′ − L as follows:

L' - L = \frac{1}{|S|}\sum_{x\in S}\Big\{\sum_{m\in M} q(m|x)\log\Big[\frac{w'_m F'(x|m)}{w_m F(x|m)}\Big] + \sum_{m\in M} q(m|x)\log\frac{q(m|x)}{q'(m|x)}\Big\}

where the second sum on the right-hand side is the non-negative Kullback-Leibler divergence:

L' - L = \frac{1}{|S|}\sum_{x\in S}\Big\{\sum_{m\in M} q(m|x)\log\Big[\frac{w'_m F'(x|m)}{w_m F(x|m)}\Big] + I(q(\cdot|x),\,q'(\cdot|x))\Big\}

and therefore, we can write the inequality:

L' - L \ge \frac{1}{|S|}\sum_{x\in S}\sum_{m\in M} q(m|x)\log\Big[\frac{w'_m F'(x|m)}{w_m F(x|m)}\Big]

L' - L \ge \sum_{m\in M}\Big[\frac{1}{|S|}\sum_{x\in S} q(m|x)\Big]\log\frac{w'_m}{w_m} + \frac{1}{|S|}\sum_{m\in M}\sum_{x\in S} q(m|x)\log\frac{F'(x|m)}{F(x|m)}

Page 55:

Alternative Proof of the EM Monotonic Property

Making substitution for w'_m from the M-step we obtain the inequality

\sum_{m\in M}\Big[\frac{1}{|S|}\sum_{x\in S} q(m|x)\Big]\log\frac{w'_m}{w_m} = \sum_{m\in M} w'_m\log\frac{w'_m}{w_m} \ge 0

Further, in view of the M-Step definition

F'(\cdot|m) = \arg\max_{F(\cdot|m)}\sum_{x\in S}\frac{q(m|x)}{w'_m|S|}\log F(x|m)

we can write for any component F (x |m) the inequality:

(*) \quad \sum_{x\in S} q(m|x)\log F'(x|m) \ge \sum_{x\in S} q(m|x)\log F(x|m), \quad m \in M

The monotonic property of EM algorithm follows from the above inequalities:

L' - L \ge \sum_{m\in M} w'_m\log\frac{w'_m}{w_m} + \frac{1}{|S|}\sum_{m\in M}\sum_{x\in S} q(m|x)\log\frac{F'(x|m)}{F(x|m)} \ge 0

Remark: The M-step definition is redundantly strong; the new parameters need to satisfy only the inequalities (*) ⇒ GEM algorithm.

Page 56:

A9: Monotonic Property of EM Algorithm - Implications

A nondecreasing sequence \{L^{(t)}\}_{t=0}^{\infty} bounded from above has a finite limit L^* < \infty, and therefore the following necessary condition is satisfied:

\lim_{t\to\infty} L^{(t)} = L^* < \infty \;\Rightarrow\; \lim_{t\to\infty}(L^{(t+1)} - L^{(t)}) = 0

Analogous conditions hold for the sequences \{w^{(t)}(m)\}_{t=0}^{\infty} and \{q^{(t)}(\cdot|x)\}_{t=0}^{\infty}, m \in M, too:

\lim_{t\to\infty}\|w^{(t+1)}(m) - w^{(t)}(m)\| = 0, \qquad \lim_{t\to\infty}\|q^{(t+1)}(m|x) - q^{(t)}(m|x)\| = 0.

The last limits follow from the inequality

L^{(t+1)} - L^{(t)} \ge I(w^{(t+1)}(\cdot)\,\|\,w^{(t)}(\cdot)) + \frac{1}{|S|}\sum_{x\in S} I(q^{(t)}(\cdot|x)\,\|\,q^{(t+1)}(\cdot|x))

and from the following general inequality (cf. Kullback (1966)):

\sum_{x\in X} P^*(x)\log\frac{P^*(x)}{P(x)} \ge \frac{1}{4}\Big(\sum_{x\in X}|P^*(x) - P(x)|\Big)^2 \ge \frac{1}{4}\,\|P^*(\cdot) - P(\cdot)\|^2

Page 57:

A10: M.-L. Estimates versus Approximation Problems

Lemma

The maximum-likelihood estimate asymptotically minimizes the upper bound of the Euclidean distance between the true discrete distribution P^*(\cdot) and its approximating estimate P(\cdot).

Proof: Asymptotically, for |S| → ∞, we can write

\lim_{|S|\to\infty}\frac{1}{|S|}\sum_{x\in S}\log P(x) = \lim_{|S|\to\infty}\sum_{x\in X}\gamma(x)\log P(x) = \sum_{x\in X} P^*(x)\log P(x)

where \gamma(x) \ge 0 is the relative frequency of the discrete vector x in the i.i.d. sequence S and P^* is the true probability distribution. The assertion follows from the inequality (cf. Kullback, 1966):

\sum_{x\in X} P^*(x)\log\frac{P^*(x)}{P(x)} \ge \frac{1}{4}\Big(\sum_{x\in X}|P^*(x) - P(x)|\Big)^2 \ge \frac{1}{4}\,\|P^*(\cdot) - P(\cdot)\|^2

Remark: The m.-l. estimate P(\cdot) is justified as an approximation of P^*(\cdot).

Page 58:

A11: Kullback-Leibler Divergence is Non-Negative

Theorem (cf. e.g. Vajda, 1992)

Any two discrete probability distributions \{q_1,\dots,q_M\}, \{q'_1,\dots,q'_M\} satisfy the following inequality

I(q\,\|\,q') = \sum_{m\in M} q_m\log\frac{q_m}{q'_m} \ge 0

where the equality holds only if q'_m = q_m for all m \in M.

Proof: Without any loss of generality we can assume q_m > 0 for all m \in M (since 0\log 0 = 0 asymptotically). By Jensen's inequality we have:

\sum_{m\in M} q_m\log\frac{q'_m}{q_m} \le \log\Big(\sum_{m\in M} q_m\frac{q'_m}{q_m}\Big) = \log\Big(\sum_{m\in M} q'_m\Big) = \log 1 = 0,

where the equality occurs only if q'_1/q_1 = \cdots = q'_M/q_M, q.e.d.

Consequence: The following left-hand sum is maximized by q' = q:

\sum_{m\in M} q_m\log q'_m \le \sum_{m\in M} q_m\log q_m

Page 59:

A12: Universality of Discrete Product Mixtures

Lemma (see e.g. Grim, 2006)

Let the table values p(k), k = 1,\dots,K, K = |X|, define a probability distribution P(x) on a discrete space X:

P(x^{(k)}) = p(k), \quad x^{(k)} \in X, \; k = 1,\dots,K, \qquad X = \cup_{k=1}^{K}\{x^{(k)}\}

Then the discrete probability distribution P(x) can be expressed as a product distribution mixture by using \delta-functions in the product components:

P(x) = \sum_{k=1}^{K} w_k F(x|k) = \sum_{k=1}^{K} p(k)\prod_{n\in N}\delta(x_n, x^{(k)}_n), \quad x \in X.

Proof: The products of \delta-functions in the components uniquely define the points x^{(k)} \in X corresponding to the respective probabilistic table values p(k):

F(x|k) = \prod_{n\in N}\delta(x_n, x^{(k)}_n), \qquad w_k = p(k), \quad k = 1,\dots,K.

Remark: The proof has only formal meaning; the mixture approximation based on EM algorithm is numerically more efficient.

Page 60:

A13: EM algorithm for Multivariate Bernoulli Mixtures

example of EM algorithm: multivariate Bernoulli mixture


Page 61:

A14: EM algorithm for Gaussian Product Mixtures

example of EM algorithm: multivariate Gaussian product mixture

Remark: A possible solution of the “underflow” problem.

Page 62:

Prof. M.I. Schlesinger with his wife

At Karlštejn castle during his visit to Prague in 1995.

Page 63:

Literature 1/12

Ajvazjan S.A., Bezhaeva Z.I., Staroverov O.V. (1974): Classification of Multivariate Observations (in Russian). Moscow: Statistika.

Boyles R.A. (1983): On the convergence of the EM algorithm. J. Roy. Statist. Soc., B, Vol. 45, pp. 47-50.

Cacoullos T. (1966): Estimation of a multivariate density. Ann. Inst. Stat. Math., Vol. 18, pp. 179-190.

Carreira-Perpignan M.A., Renals S. (2000): Practical identifiability of finite mixtures of multivariate Bernoulli distributions. Neural Computation, Vol. 12, pp. 141-152.

Day N.E. (1969): Estimating the components of a mixture of normal distributions. Biometrika, Vol. 56, pp. 463-474.

Dempster A.P., Laird N.M. and Rubin D.B. (1977): Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc., B, Vol. 39, pp. 1-38.

Page 64:

Literature 2/12

Duda R.O., Hart P.E. (1973): Pattern Classification and Scene Analysis. New York: Wiley-Interscience.

Everitt B.S. and Hand D.J. (1981): Finite Mixture Distributions. London: Chapman & Hall.

Grim J. (1982): On numerical evaluation of maximum-likelihood estimates for finite mixtures of distributions. Kybernetika, Vol. 18, No. 3, pp. 173-190. http://dml.cz/dmlcz/124132

Grim J. (1982): Design and optimization of multilevel homogeneous structures for multivariate pattern recognition. In: Fourth FORMATOR Symposium 1982, Academia, Prague 1982, pp. 233-240.

Grim J. (1984): On structural approximating multivariate discrete probability distributions. Kybernetika, Vol. 20, No. 1, pp. 1-17. http://dml.cz/dmlcz/125676

Grim J. (1986): Multivariate statistical pattern recognition with nonreduced dimensionality. Kybernetika, Vol. 22, pp. 142-157. http://dml.cz/dmlcz/125022


Literature 3/12

Grim J. (1986): Sequential decision-making in pattern recognition based on the method of independent subspaces. In: Proceedings of the DIANA II Conference on Discriminant Analysis (Ed. F. Zitek), Mathematical Institute of the AS CR, Prague 1986, pp. 139-149.

Grim J. (1994): Knowledge representation and uncertainty processing in the probabilistic expert system PES. International Journal of General Systems, Vol. 22, No. 2, pp. 103-111.

Grim J. (1992): A dialog presentation of census results by means of the probabilistic expert system PES. In: Proceedings of the Eleventh European Meeting on Cybernetics and Systems Research (Ed. R. Trappl), Vienna, April 1992, World Scientific, Singapore 1992, pp. 997-1005. Paper Award

Grim J. and Bocek P. (1995): Statistical Model of Prague Households for Interactive Presentation of Census Data. In: SoftStat'95. Advances in Statistical Software 5, pp. 271-278, Lucius & Lucius: Stuttgart, 1996.

Grim J. (1996): Maximum Likelihood Design of Layered Neural Networks. In: Proceedings of the 13th International Conference on Pattern Recognition IV (pp. 85-89), Los Alamitos: IEEE Computer Society Press.


Literature 4/12

Grim J. (1996a): Design of multilayer neural networks by information preserving transforms. In: E. Pessa, M.P. Penna, A. Montesanto (Eds.), Proceedings of the Third European Congress on System Science (pp. 977-982), Roma: Edizioni Kappa.

Grim J. (1998): A sequential modification of EM algorithm. In: Studies in Classification, Data Analysis and Knowledge Organization, Gaul W., Locarek-Junge H. (Eds.), pp. 163-170, Springer, 1999.

Grim J., Somol P., Novovicova J., Pudil P. and Ferri F. (1998b): Initializing normal mixtures of densities. In: Proc. 14th Int. Conf. on Pattern Recognition ICPR'98, A.K. Jain, S. Venkatesh, B.C. Lovell (Eds.), pp. 886-890, IEEE Computer Society: Los Alamitos, California, 1998.

Grim J. (1999): Information approach to structural optimization of probabilistic neural networks. In: Proceedings of the 4th System Science European Congress, L. Ferrer et al. (Eds.), pp. 527-540, Valencia: Sociedad Espanola de Sistemas Generales, 1999.

Grim J. (2000): Self-organizing maps and probabilistic neural networks. Neural Network World, Vol. 10, No. 3, pp. 407-415. Paper Award


Literature 5/12

Grim J., Kittler J., Pudil P. and Somol P. (2000): Combining multiple classifiers in probabilistic neural networks. In: Multiple Classifier Systems, Eds. Kittler J., Roli F., Springer, 2000, pp. 157-166.

Grim J., Pudil P. and Somol P. (2000): Recognition of handwritten numerals by structural probabilistic neural networks. In: Proceedings of the Second ICSC Symposium on Neural Computation, Berlin, 2000 (Bothe H., Rojas R., eds.). ICSC, Wetaskiwin, 2000, pp. 528-534. Paper Award

Grim J., Kittler J., Pudil P. and Somol P. (2001): Information analysis of multiple classifier fusion. In: Multiple Classifier Systems 2001, Kittler J., Roli F. (Eds.), Lecture Notes in Computer Science, Vol. 2096, Springer-Verlag, Berlin, Heidelberg, New York 2001, pp. 168-177.

Grim J., Bocek P. and Pudil P. (2001): Safe dissemination of census results by means of interactive probabilistic models. In: Proceedings of the ETK-NTTS 2001 Conference (Hersonissos, Crete, June 18-22, 2001), Vol. 2, pp. 849-856, European Communities 2001.


Literature 6/12

Grim J. (2001): Latent Structure Analysis for Categorical Data. Research Report No. 2019, UTIA AV CR, Praha 2001, 13 pp.

Grim J., Kittler J., Pudil P. and Somol P. (2002): Multiple classifier fusion in probabilistic neural networks. Pattern Analysis & Applications, Vol. 5, No. 7, pp. 221-233.

Grim J. and Haindl M. (2003): Texture Modelling by Discrete Distribution Mixtures. Computational Statistics and Data Analysis, Vol. 41, No. 3-4, pp. 603-615.

Grim J., Just P. and Pudil P. (2003): Strictly modular probabilistic neural networks for pattern recognition. Neural Network World, Vol. 13, No. 6, pp. 599-615.

Grim J., Somol P., Pudil P. and Just P. (2003): Probabilistic neural network playing a simple game. In: Artificial Neural Networks in Pattern Recognition (Marinai S., Gori M., Eds.). University of Florence, Florence 2003, pp. 132-138.


Literature 7/12

Grim J., Hora J. and Pudil P. (2004): Interactive reproduction of census results with guaranteed protection of data anonymity (in Czech). Statistika, Vol. 84, No. 5, pp. 400-414.

Grim J., Haindl M., Somol P., Pudil P. and Kudo M. (2004): A Gaussian mixture-based colour texture model. In: Proc. of the 17th International Conference on Pattern Recognition. IEEE, Los Alamitos 2004, pp. 177-180.

Grim J., Somol P., Haindl M. and Pudil P. (2005): A statistical approach to local evaluation of a single texture image. In: Proceedings of the 16th Annual Symposium PRASA 2005 (Nicolls F., ed.). University of Cape Town, 2005, pp. 171-176.

Grim J., Haindl M., Pudil P. and Kudo M. (2005): A Hybrid BTF Model Based on Gaussian Mixtures. In: Texture 2005. Proceedings of the 4th International Workshop on Texture Analysis (Chantler M., Drbohlav O., eds.). IEEE, Los Alamitos 2005, pp. 95-100.

Grim J. (2006): EM cluster analysis for categorical data. In: Structural, Syntactic and Statistical Pattern Recognition (Yeung D.Y., Kwok J.T., Fred A., eds.), LNCS 4109. Springer, Berlin 2006, pp. 640-648.


Literature 8/12

Grim J. (2007): Neuromorphic features of probabilistic neural networks. Kybernetika, Vol. 43, No. 5, pp. 697-712. http://dml.cz/dmlcz/135807

Grim J. and Hora J. (2008): Iterative principles of recognition in probabilistic neural networks. Neural Networks, Special Issue, Vol. 21, No. 6, pp. 838-846. Paper Award

Grim J., Novovicova J. and Somol P. (2008): Structural Poisson Mixtures for Classification of Documents. In: Proceedings of the 19th International Conference on Pattern Recognition, Tampa (Florida), US, pp. 1324-1327.

Grim J., Somol P., Haindl M. and Danes J. (2009): Computer-Aided Evaluation of Screening Mammograms Based on Local Texture Models. IEEE Trans. on Image Processing, Vol. 18, No. 4, pp. 765-773. Paper Award

Grim J., Hora J., Bocek P., Somol P. and Pudil P. (2010): Statistical Model of the 2001 Czech Census for Interactive Presentation. Journal of Official Statistics, Vol. 26, No. 4, pp. 673-694. Paper Award


Literature 9/12

Grim J., Somol P. and Pudil P. (2010): Digital Image Forgery Detection by Local Statistical Models. In: Proc. 2010 Sixth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Los Alamitos, IEEE Computer Society, Echizen I. et al., eds., pp. 579-582.

Grim J. (2011): Preprocessing of Screening Mammograms Based on Local Statistical Models. In: Proceedings of the 4th International Symposium on Applied Sciences in Biomedical and Communication Technologies, ISABEL 2011, Barcelona, ACM, pp. 1-5.

Grim J. (2014): Sequential pattern recognition by maximum conditional informativity. Pattern Recognition Letters, Vol. 45C, pp. 39-45. http://dx.doi.org/10.1016/j.patrec.2014.02.024 Paper Award

Grim J. (2017): Approximation of unknown multivariate probability distributions by using mixtures of product components: a tutorial. International Journal of Pattern Recognition and Artificial Intelligence, to appear.

Gyllenberg M., Koski T., Reilink E. and Verlaan M. (1994): Non-uniqueness in probabilistic numerical identification of bacteria. Journal of Applied Probability, Vol. 31, pp. 542-548.


Literature 10/12

Hasselblad V. (1966): Estimation of parameters for a mixture of normal distributions. Technometrics, Vol. 8, pp. 431-444.

Hasselblad V. (1969): Estimation of finite mixtures of distributions from the exponential family. Journal of Amer. Statist. Assoc., Vol. 64, pp. 1459-1471.

Isaenko O.K. and Urbakh K.I. (1976): Decomposition of probability distribution mixtures into their components (in Russian). In: Theory of probability, mathematical statistics and theoretical cybernetics, Vol. 13, Moscow: VINITI.

Kullback S. (1966): An information-theoretic derivation of certain limit relations for a stationary Markov chain. SIAM J. Control, Vol. 4, No. 3, pp. 454-459.

McLachlan G.J. and Krishnan T. (1997): The EM Algorithm and Extensions. John Wiley & Sons, New York.

McLachlan G.J. and Peel D. (2000): Finite Mixture Models. John Wiley & Sons, New York, Toronto.


Literature 11/12

Meng X.L. and Van Dyk D. (1997): The EM Algorithm—an Old Folk-song Sung to a Fast New Tune. Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 59, No. 3, pp. 511-567.

Parzen E. (1962): On estimation of a probability density function and its mode. Annals of Mathematical Statistics, Vol. 33, pp. 1065-1076.

Pearson K. (1894): Contributions to the mathematical theory of evolution. 1. Dissection of frequency curves. Philosophical Transactions of the Royal Society of London, Vol. 185, pp. 71-110.

Peters B.C. and Walker H.F. (1978): An iterative procedure for obtaining maximum likelihood estimates of the parameters for a mixture of normal distributions. SIAM Journal Appl. Math., Vol. 35, No. 2, pp. 362-378.

Teicher H. (1963): Identifiability of finite mixtures. Ann. Math. Statist., Vol. 34, pp. 1265-1269.

Teicher H. (1968): Identifiability of mixtures of product measures. Ann. Math. Statist., Vol. 39, pp. 1300-1302.


Literature 12/12

Schlesinger M.I. (1968): Relation between learning and self learning in pattern recognition (in Russian). Kibernetika (Kiev), No. 2, pp. 81-88.

Titterington D.M., Smith A.F.M. and Makov U.E. (1985): Statistical Analysis of Finite Mixture Distributions. John Wiley & Sons: Chichester, New York.

Vajda I. and Grim J. (1998): About the maximum information and maximum likelihood principles in neural networks. Kybernetika, Vol. 34, No. 4, pp. 485-494.

Wolfe J.H. (1970): Pattern clustering by multivariate mixture analysis. Multivariate Behavioral Research, Vol. 5, pp. 329-350.

Wu C.F.J. (1983): On the convergence properties of the EM algorithm. Ann. Statist., Vol. 11, pp. 95-103.

Xu L. and Jordan M.I. (1996): On convergence properties of the EM algorithm for Gaussian mixtures. Neural Computation, Vol. 8, pp. 129-151.


Paper Award

Eleventh European Meeting on Cybernetics and Systems Research, Vienna, April 1992.


Paper Award

Second ICSC Symposium on Neural Computation, Berlin, 2000.


Paper Award

IEEE Transactions on Image Processing, Vol. 18, No. 4, pp. 765-773, 2009.


Paper Award

Sixth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP, Darmstadt, 2010.


Paper Award

Pattern Recognition Letters, Vol. 45C, pp. 39-45, 2014.


Paper Award

Neural Networks, Vol. 21, No. 6, pp. 838-846, 2008.


Paper Award

Neural Network World, Vol. 10, No. 3, pp. 407-415, 2000.

