Survey on ICA Technical Report, Aapo Hyvärinen, 1999. jagota/NCS.

Date post: 24-Dec-2015
Survey on ICA

Technical Report, Aapo Hyvärinen, 1999.

http://ww.icsi.berkeley.edu/~jagota/NCS

• 2nd-order methods
– PCA / factor analysis
• Higher order methods
– Projection pursuit / Blind deconvolution
• ICA
– Definitions
– Criteria for identifiability
– Relations to other methods
– Applications
• Contrast functions
• Algorithms

Outline

x = As + n

General model

• x: observations
• A: mixing matrix
• n: noise
• s: latent variables, factors, independent components

Find transformation: s = f(x)

Consider only linear transformation: s = Wx

Principal component analysis

• Find direction(s) w where the variance of w^T x is maximized.

• Equivalent to finding the eigenvectors of C = E(xx^T) corresponding to the k largest eigenvalues.

Principal component analysis
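
The eigenvector formulation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the report's own code: the function name `pca_directions`, the toy data, and the choice to center x before forming C are assumptions made for the example.

```python
import numpy as np

def pca_directions(x, k):
    """Return the k leading eigenvectors (as columns) and eigenvalues of C = E(xx^T).

    x has shape (n_samples, n_features). It is centered here so that C is
    the sample covariance matrix.
    """
    xc = x - x.mean(axis=0)
    c = xc.T @ xc / xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(c)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    return eigvecs[:, order], eigvals[order]

rng = np.random.default_rng(0)
# Toy data (assumed): standard deviation 3 along the first axis, 0.5 along the second.
x = rng.normal(size=(5000, 2)) * np.array([3.0, 0.5])
w, var = pca_directions(x, k=1)
```

Here `w[:, 0]` should point (up to sign) along the first axis, and `var[0]` should be close to the variance of w^T x in that direction.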

• Closely related to PCA
• x = As + n
• Method of principal factors:
– Assumes knowledge of the covariance matrix of the noise: E(nn^T)
– PCA on: C = E(xx^T) - E(nn^T)
• Factors are not defined uniquely, but only up to a rotation

Factor analysis
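
A rough sketch of the method of principal factors follows, assuming the noise covariance is known exactly. The mixing matrix `a`, the noise level, and the function name `principal_factors` are hypothetical values chosen for illustration only.

```python
import numpy as np

def principal_factors(x, noise_cov, k):
    """PCA on C = E(xx^T) - E(nn^T); returns a loading matrix with k columns."""
    xc = x - x.mean(axis=0)
    c = xc.T @ xc / xc.shape[0] - noise_cov
    eigvals, eigvecs = np.linalg.eigh(c)
    order = np.argsort(eigvals)[::-1][:k]
    # Scale each eigenvector by the square root of its eigenvalue.
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

rng = np.random.default_rng(1)
a = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]])   # hypothetical 3x2 mixing matrix
s = rng.normal(size=(20000, 2))                      # unit-variance factors
noise_cov = 0.1 * np.eye(3)                          # assumed known
n = rng.normal(scale=np.sqrt(0.1), size=(20000, 3))
x = s @ a.T + n

loadings = principal_factors(x, noise_cov, k=2)
```

The estimated `loadings` need not equal `a`, but `loadings @ loadings.T` should be close to `a @ a.T`. That is the rotation ambiguity in concrete form: any loadings multiplied by an orthogonal matrix explain the same covariance.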

• Projection pursuit
• Redundancy reduction
• Blind deconvolution

Requires assumption that data are not Gaussian

Higher order methods

• Find direction w, such that w^T x has an 'interesting' distribution

• Argued that the interesting directions are those in which the distribution of w^T x is least Gaussian

Projection pursuit

Differential entropy

• H(y) = -∫ f(y) log f(y) dy, where f is the density of y
• For a given variance, maximised when f is a Gaussian density
• Minimize H(w^T x) to find projection pursuit directions (y = w^T x)
• Difficult to estimate the density of w^T x

Example: projection pursuit

• Observe filtered version of s(t):

x(t) = s(t)*g(t)

• Find filter h(t), such that

s(t) = h(t)*x(t)

Blind deconvolution
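
The convolution model can be illustrated numerically. The sketch below is not blind: it assumes the channel filter g is known (a hypothetical two-tap filter) purely to show what an inverse filter h does. A blind method would have to estimate h from x(t) alone.

```python
import numpy as np

rng = np.random.default_rng(4)
s = rng.laplace(size=500)               # hypothetical non-Gaussian source s(t)
g = np.array([1.0, 0.5])                # hypothetical channel filter g(t)
x = np.convolve(s, g)[: len(s)]         # observed x(t) = s(t)*g(t)

# For g = [1, 0.5] the exact inverse filter is h[k] = (-0.5)^k, k = 0, 1, ...
# It is truncated to 30 taps here, which leaves a negligible residual.
h = (-0.5) ** np.arange(30)
s_hat = np.convolve(x, h)[: len(s)]     # s(t) = h(t)*x(t)
```

With these choices `s_hat` reproduces `s` to within the truncation error of h.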

• Seismic: "statistical deconvolution"

Example blind deconvolution

Blind deconvolution (3)

[Figure: g(t) and s(t), each plotted against t]

Blind deconvolution (4)

Definition 1 (General definition)

ICA of a random vector x consists of finding a linear transformation, s = Wx, so that the components s_i are as independent as possible, in the sense of maximizing some function F(s_1, ..., s_m) that measures independence.

ICA definitions

Definition 2 (Noisy ICA)

ICA of a random vector x consists of estimating the following model for the data:

x = As + n

where the latent variables s_i are assumed independent.

Definition 3 (Noise-free ICA) x = As

ICA definitions

• ICA requires statistical independence
• Distinguish between statistically independent and uncorrelated variables
• Statistically independent: f(y_1, ..., y_m) = f_1(y_1) f_2(y_2) ... f_m(y_m)
• Uncorrelated: E(y_i y_j) - E(y_i) E(y_j) = 0, for i ≠ j

Statistical independence
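
The distinction can be made concrete with a small example in which two variables are uncorrelated yet fully dependent. The construction (y2 = y1², with y1 symmetric about zero) and the test functions are assumptions chosen for illustration.

```python
import numpy as np

# y1 takes values symmetric about zero; y2 is a deterministic function of y1.
y1 = np.linspace(-1.0, 1.0, 2001)
y2 = y1 ** 2

# Uncorrelated: E(y1 y2) - E(y1) E(y2) vanishes by symmetry.
cov = np.mean(y1 * y2) - np.mean(y1) * np.mean(y2)

# Independence would require E[g(y1) h(y2)] = E[g(y1)] E[h(y2)] for all g, h.
# Taking g(y) = y^2 and h(y) = y exposes the dependence.
gap = np.mean(y1 ** 2 * y2) - np.mean(y1 ** 2) * np.mean(y2)
```

Here `cov` is zero up to rounding while `gap` is clearly positive, so the pair is uncorrelated but not independent.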

• All the independent components, but one, must be non-Gaussian

• The number of observed mixtures must be at least as large as the number of independent components, m >= n

• The matrix A must be of full column rank

Note: with m < n, A may still be identifiable

Identifiability of ICA model

• Redundancy reduction
• Noise free case
– Find 'interesting' projections
– Special case of projection pursuit
• Blind deconvolution
• Factor analysis for non-Gaussian data
• Related to non-linear PCA

Relations to other methods

Relations to other methods (2)

• Blind source separation
– Cocktail party problem
• Feature extraction
• Blind deconvolution

Applications of ICA

Blind source separation

ICA method = Objective function + Optimization algorithm

Objective (contrast) functions

• Multi-unit contrast functions
– Find all independent components
• One-unit contrast functions
– Find one independent component (at a time)

Mutual information

• Mutual information is non-negative, and zero if and only if the y_i are independent

• Difficult to estimate, but approximations exist

Mutual information (2)

• Alternative definition

Mutual information (3)

[Figure: diagram relating H(X), H(Y), H(X|Y), H(Y|X) and I(X,Y)]
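
For discrete variables, the relationship among these quantities can be checked directly using the identity I(X,Y) = H(X) + H(Y) - H(X,Y). The sketch below is for the discrete case only (the slides concern continuous densities), and the two joint tables are made-up examples.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability table (zero cells are ignored)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """I(X,Y) = H(X) + H(Y) - H(X,Y) for a joint probability table."""
    joint = np.asarray(joint, dtype=float)
    h_x = entropy(joint.sum(axis=1))   # marginal of X (rows)
    h_y = entropy(joint.sum(axis=0))   # marginal of Y (columns)
    return h_x + h_y - entropy(joint)

independent = np.outer([0.5, 0.5], [0.25, 0.75])   # joint = product of marginals
dependent = np.array([[0.5, 0.0], [0.0, 0.5]])     # Y is a copy of X

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

The product table gives zero mutual information, while the table in which Y copies X gives one bit, which equals H(X).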

Non-linear PCA

• Add non-linearity function g(.) in the formula for PCA

• Find one vector, w, so that w^T x equals one of the independent components, s_i
• Related to projection pursuit
• Prior knowledge of the number of independent components not needed

One-unit contrast functions

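• Negentropy J(y) = H(y_gauss) - H(y): the differential entropy of a Gaussian variable y_gauss with the same variance as y, minus the differential entropy of y. It is non-negative, and zero only when y is Gaussian.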

Negentropy

• If the y_i are uncorrelated, the mutual information can be expressed as I(y_1, ..., y_m) = J(y) - Σ_i J(y_i)

• J(y) can be approximated by higher-order cumulants, but the estimation is sensitive to outliers
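
One classical cumulant-based approximation, for a zero-mean, unit-variance y, is J(y) ≈ (1/12) E(y³)² + (1/48) kurt(y)². The slides do not state which approximation they have in mind, so this particular formula is an assumption here. The sketch also shows the outlier sensitivity by appending a single extreme sample.

```python
import numpy as np

def negentropy_cumulant(y):
    """Approximate J(y) by skewness^2 / 12 + kurtosis^2 / 48 after standardizing y."""
    y = np.asarray(y, dtype=float)
    y = (y - y.mean()) / y.std()
    skew = np.mean(y ** 3)
    kurt = np.mean(y ** 4) - 3.0
    return skew ** 2 / 12.0 + kurt ** 2 / 48.0

rng = np.random.default_rng(2)
gauss = rng.normal(size=100000)
laplace = rng.laplace(size=100000)

j_gauss = negentropy_cumulant(gauss)                         # near zero
j_laplace = negentropy_cumulant(laplace)                     # clearly positive
j_outlier = negentropy_cumulant(np.append(gauss, 20.0))      # one outlier added
```

A single outlier among 100000 Gaussian samples is enough to move the estimate well away from zero, because the fourth moment is dominated by extreme values.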

• Have x = As, want to find s = Wx
• Preprocessing
– Centering of x
– Sphering (whitening) of x
• Find transformation v = Qx such that E(vv^T) = I
• Found via PCA / SVD
• Sphering does not solve the problem alone

Algorithms
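
The centering and sphering steps can be sketched as follows. The symmetric choice Q = E D^(-1/2) E^T (from the eigendecomposition of the covariance) is one of several valid whitening matrices and is an assumption of this example, as are the toy sources and mixing matrix.

```python
import numpy as np

def whiten(x):
    """Center x and return (v, q) with v = Qx and sample covariance E(vv^T) = I."""
    xc = x - x.mean(axis=0)
    c = xc.T @ xc / xc.shape[0]
    d, e = np.linalg.eigh(c)
    q = e @ np.diag(d ** -0.5) @ e.T     # Q = E D^(-1/2) E^T
    return xc @ q.T, q

rng = np.random.default_rng(3)
s = rng.uniform(-1.0, 1.0, size=(10000, 2))    # hypothetical independent sources
a = np.array([[2.0, 1.0], [1.0, 1.0]])         # hypothetical mixing matrix
x = s @ a.T
v, q = whiten(x)

# Any rotation of v is still white, so whitening leaves a rotation undetermined.
theta = np.pi / 4
r = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
v_rot = v @ r.T
```

Both `v` and `v_rot` have identity covariance, which is why sphering alone cannot identify the independent components: an orthogonal transformation remains to be found.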

• Jutten-Herault
– Cancel non-linear cross-correlations
– Non-diagonal terms of W are updated according to ΔW_ij ∝ g_1(y_i) g_2(y_j), for i ≠ j, where g_1 and g_2 are odd non-linear functions

Algorithms (2)

– The y_i are updated iteratively as y = (I + W)^(-1) x

• Non-linear decorrelation
• Non-linear PCA
• FastICA, ..., etc.
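
As a rough illustration of a one-unit algorithm in the FastICA family, the sketch below runs a kurtosis-based fixed-point iteration, w ← E{z (w^T z)³} - 3w followed by normalization, on whitened data z. The slides only name FastICA, so this specific update rule, the function names, and the toy mixing setup are all assumptions of the example.

```python
import numpy as np

def whiten(x):
    """Center x and return v = Qx with identity sample covariance."""
    xc = x - x.mean(axis=0)
    d, e = np.linalg.eigh(xc.T @ xc / xc.shape[0])
    return xc @ (e @ np.diag(d ** -0.5) @ e.T)

def one_unit_fixed_point(z, n_iter=200, tol=1e-10, seed=0):
    """Estimate one unmixing vector w from whitened data z (n_samples, n_dims)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=z.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = z @ w
        # Kurtosis-based fixed-point update, then projection to the unit sphere.
        w_new = (z * (y ** 3)[:, None]).mean(axis=0) - 3.0 * w
        w_new /= np.linalg.norm(w_new)
        # w and -w describe the same component, so compare up to sign.
        converged = abs(abs(w_new @ w) - 1.0) < tol
        w = w_new
        if converged:
            break
    return w

rng = np.random.default_rng(5)
s = rng.uniform(-1.0, 1.0, size=(10000, 2))    # independent, non-Gaussian sources
a = np.array([[2.0, 1.0], [1.0, 1.0]])         # hypothetical mixing matrix
z = whiten(s @ a.T)

w = one_unit_fixed_point(z)
y = z @ w
# The recovered component should match one source up to sign and scale.
match = max(abs(np.corrcoef(y, s[:, i])[0, 1]) for i in range(2))
```

The sign and order of the recovered component are arbitrary, so the comparison uses the absolute correlation with each source.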

• Definitions of ICA
• Conditions for identifiability of model
• Relations to other methods
• Contrast functions
– One-unit / multi-unit
– Mutual information / Negentropy
• Applications of ICA
• Algorithms

Summary

• Noisy ICA
• Tailor-made methods for certain applications
• Use of time correlations if x is a stochastic process
• Time delays/echoes in cocktail-party problem
• Non-linear ICA

Future research
