Independent Component Analysis for Feature Extraction
Carmen Klaussner
LCT, Language and Communication Technology, University of Groningen
April 25th, 2013

Outline: Motivation, Introduction, ICA Application, ICA Dependencies, Summary
Independent Component Analysis (ICA)
Independent Component Analysis (ICA) is an unsupervised statistical technique used for:

- separating a multivariate signal into independent subcomponents (blind source separation, BSS)
- revealing underlying latent concepts in feature extraction
ICA and the Cocktail-Party Problem 1

- Imagine two different speakers in a room
- Two microphones are placed at different locations about the room
- The microphones record mixtures of the various speech signals
1http://storage.blogues.canoe.ca/davidakin/200811141920.jpg
ICA and the Cocktail-Party Problem cont’d
Figure: ICA Model 2
2http://www.imodenergy.com/images/courses/imode201/slide03.jpg
ICA Model

The recordings yield the mixed signals x1(t), x2(t):

x1(t) = a11 s1(t) + a12 s2(t)
x2(t) = a21 s1(t) + a22 s2(t)

- where x1 and x2 are the amplitudes and t is the time index
- each recorded signal is a weighted sum of the original speech signals of the two speakers, denoted by s1(t) and s2(t)
- a11, a12, a21, and a22 are parameters that depend on the distances of the microphones from the speakers
- assume that s1(t) and s2(t), at each time instant t, are statistically independent
- given only the mixed signals x1(t), x2(t) ⇒ retrieve the original speech signal of each speaker: s1(t), s2(t)
ICA Model cont'd

x = As

with:

- x = (x1, x2, ..., xn)^T a vector of observed random variables
- s = (s1, s2, ..., sn)^T the vector of latent variables (the independent components)
- A the unknown constant mixing matrix
- the number of components to extract is a free choice, at most equal to the number of observed variables

Aim of the algorithm: find W = A^(-1), so that we obtain the independent components by:

s = Wx
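The unmixing aim above can be sketched numerically. A minimal sketch, assuming scikit-learn's FastICA implementation (not mentioned on this slide; the two synthetic sources and the mixing matrix A below are made up for illustration):

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)

# two statistically independent source signals s1(t), s2(t)
s1 = np.sin(2 * t)                # "speaker" 1: a sinusoid
s2 = np.sign(np.sin(3 * t))       # "speaker" 2: a square wave
S = np.c_[s1, s2]                 # shape (2000, 2)

# mix them: x = As with a made-up mixing matrix A
A = np.array([[1.0, 0.5],
              [0.4, 1.2]])
X = S @ A.T                       # each row holds (x1(t), x2(t))

# estimate W ~ A^(-1) from the mixtures alone and recover s = Wx
ica = FastICA(n_components=2, random_state=0, whiten="unit-variance")
S_est = ica.fit_transform(X)      # estimated sources (order/scale/sign arbitrary)

# each estimated component should correlate strongly with one true source
corr = np.corrcoef(S.T, S_est.T)[:2, 2:]
```

The last comment points at the caveat discussed later in the talk: ICA recovers the sources only up to permutation, scaling, and sign.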
ICA Feature Extraction on Text Documents
x =

            D1023.txt  D1392.txt  D1394.txt  D1400.txt  D1406.txt  ...
able               73          2          1         32          7  ...
about             684         10         32        319         40  ...
above              51          2          4         31          4  ...
abroad             13          1          0         10          0  ...
absence            14          0          0          6          0  ...
absolutely          6          0          0          7          1  ...
accept             23          0          1          5          2  ...
accepted           14          1          0          7          2  ...
accident           11          0          1          9          0  ...
...
ICA Interpretation

- documents are linear mixtures of concepts
- each term is a mixed signal/observation x_i at a different time index t (here: time index = document)
- the source signals s are the latent concepts (independent components)
- the aim is to find the latent concept-by-document representation
ICA on Text Documents

x = As becomes:

X_(term × document) = A_(term × concept) · S_(concept × document)

s = Wx becomes:

S_(concept × document) = W_(concept × term) · X_(term × document)

- S is a new data representation that combines terms into latent concepts
- A, the mixing matrix, assigns a weight to each term in each component
- the term-by-document matrix is unmixed to yield the 'original' concept-by-document mapping
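The decomposition above can be sketched on the small term-by-document counts shown earlier. A sketch assuming scikit-learn's FastICA (not the toolchain used in the talk); the number of concepts, 2, is an arbitrary choice:

```python
import numpy as np
from sklearn.decomposition import FastICA

terms = ["able", "about", "above", "abroad", "absence",
         "absolutely", "accept", "accepted", "accident"]
docs = ["D1023.txt", "D1392.txt", "D1394.txt", "D1400.txt", "D1406.txt"]

# term-by-document count matrix (rows: terms, columns: documents)
X = np.array([[ 73,  2,  1,  32,  7],
              [684, 10, 32, 319, 40],
              [ 51,  2,  4,  31,  4],
              [ 13,  1,  0,  10,  0],
              [ 14,  0,  0,   6,  0],
              [  6,  0,  0,   7,  1],
              [ 23,  0,  1,   5,  2],
              [ 14,  1,  0,   7,  2],
              [ 11,  0,  1,   9,  0]], dtype=float)

# Each term is one observation x_i and each document one "time index" t,
# so scikit-learn's (samples, features) layout is documents-by-terms: X.T.
ica = FastICA(n_components=2, random_state=0, whiten="unit-variance")
S = ica.fit_transform(X.T).T      # S_(concept x document), shape (2, 5)
A = ica.mixing_                   # A_(term x concept),     shape (9, 2)

# X is approximately recovered as A.S plus the per-term mean
X_hat = A @ S + ica.mean_[:, None]
```

With only two concepts the reconstruction X_hat is a low-rank approximation of X, not an exact recovery.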
ICA Output Example
x = A · s, shown with the actual matrices:

x =

            D1023.txt  D1392.txt  D1394.txt  D1400.txt  D1406.txt  ...
able               73          2          1         32          7  ...
about             684         10         32        319         40  ...
above              51          2          4         31          4  ...
abroad             13          1          0         10          0  ...
absence            14          0          0          6          0  ...
absolutely          6          0          0          7          1  ...
...

A =

                      c1            c2           c3            c4             c5  ...
able         0.076106808  −0.014558451  −0.10733842  −0.091537869  −0.0712187592  ...
about        0.168884358  −0.013135861  −0.04944864  −0.045366695  −0.0675653686  ...
above       −0.087822012  −0.025989498   0.05227958  −0.002340966  −0.0181397638  ...
abroad      −0.141542609   0.020390763   0.07750117  −0.040127687   0.0002770738  ...
absence     −0.002402465  −0.134321250   0.04981664   0.140644925  −0.1017302731  ...
absolutely   0.002845907  −0.004149262  −0.01830506   0.047701236  −0.0910047210  ...
...

s =

      D1023.txt   D1392.txt   D1394.txt   D1400.txt   D1406.txt  ...
c1     1.000000  −1.000053    1.000000    1.000000    1.000000   ...
c2    −1.068787  −1.026944    0.9187293  −1.068788   −1.068790   ...
c3    −1.000675  −0.9531389   0.9558447   1.002504   −1.000675   ...
c4    −1.038625  −0.8975203   0.8958735  −1.151772    0.9906527  ...
c5    −0.9303368  0.9171785  −0.9577544   1.164191    1.081455   ...
...
My Master Thesis: Dickens' Style Analysis

Find characteristic terms of Charles Dickens compared to his contemporary, the writer Wilkie Collins.

Dickens' keywords:
lot, release, answering, ive, sunk, softened, beside, examined, seven, brothers, wear, eleven, correct, path, watched, sorrow, treated, sounds, masters, oclock, upon, lean, reality, song...

Collins' keywords:
gentle, fate, sweet, contrast, forth, whom, changes, strong, art, disturb, ventured, sorrow, blessing, parties, faded, imagination, towards, moon, portrait, daily, guide, game, although, lot, building, learn, visits, pay, animal, humanity...
ICA Characteristic Terms Extraction
1. use ICA on the term-by-document matrix to extract term concepts
2. extract weights for each keyword in each document
3. select characteristic terms for each document set
4. test the generalisation ability of each term list

terms in documents (individual) ⇒ concepts in documents (global) ⇒ terms over authors
Principal Component Analysis (PCA)

- Principal component analysis (PCA) finds the directions of maximum variance in the data
- Reduces the feature space by selecting the directions that explain most of the variance
- Decorrelates the features, so that the new data representation only varies within each feature
- Works best on Gaussian distributions
ICA and PCA: a comparison

- ICA is computationally superior to PCA
- it may not generally be superior (depending on the application)
- PCA acts as a preprocessing method for ICA
Figure: PCA vs. ICA 3
3http://www.sciencedirect.com/science/article/pii/S0957417406001308
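The remark that PCA serves as preprocessing for ICA can be made concrete: PCA whitening decorrelates the mixtures and rescales them to unit variance, so the ICA iteration afterwards only has to find a rotation. A sketch in plain NumPy (synthetic Laplacian sources and a made-up mixing matrix, not data from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 5000))          # two independent non-Gaussian sources
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])               # made-up mixing matrix
X = A @ S                                # observed mixtures, shape (2, n)

# PCA: eigendecomposition of the covariance of the centered data
Xc = X - X.mean(axis=1, keepdims=True)
cov = Xc @ Xc.T / Xc.shape[1]
eigval, E = np.linalg.eigh(cov)

# whitening: project onto the principal directions, rescale to unit variance
Z = np.diag(eigval ** -0.5) @ E.T @ Xc

# the whitened data has identity covariance (decorrelated, unit variance)
cov_Z = Z @ Z.T / Z.shape[1]
```

After this step, the remaining unknown in the mixing is an orthogonal matrix, which shrinks the search space for the ICA algorithm considerably.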
ICA Ambiguity

- components are extracted "randomly", depending on the initial weights
- components are not ranked, as they are in PCA
- the variance and the sign of the ICs are ambiguous
- how many components should be extracted for a given application?

Given only the mixed signals and the assumption of statistical independence of the estimated signals ⇒ ICA retrieves the original sources
Objective Function and Statistical Independence

Statistical independence of two random variables y1, y2:

p(y1, y2) = p(y1) p(y2)

Measures of statistical independence:

- Minimization of mutual information
  - Kullback-Leibler divergence and maximum entropy
- Maximization of non-Gaussianity
  - Kurtosis and negentropy
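Kurtosis, one of the non-Gaussianity measures listed above, is simple enough to sketch directly. The excess kurtosis of a standardised signal y is E[y^4] − 3: zero for a Gaussian, positive for super-Gaussian (peaked) distributions, negative for sub-Gaussian (flat) ones. A sketch with synthetic samples:

```python
import numpy as np

rng = np.random.default_rng(0)

def excess_kurtosis(y):
    # standardise to zero mean and unit variance, then E[y^4] - 3
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

gauss = rng.normal(size=100_000)            # excess kurtosis ~ 0
laplace = rng.laplace(size=100_000)         # super-Gaussian: ~ +3
uniform = rng.uniform(-1, 1, size=100_000)  # sub-Gaussian:   ~ -1.2
```

FastICA-style algorithms exploit exactly this: a mixture of independent signals is closer to Gaussian than the sources themselves (central limit theorem), so maximising non-Gaussianity of a projection recovers a source.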
What is Statistical Independence?

- Intuitively, statistical independence of two signals means that, at each time point, signal 1 does not give any information about the position of signal 2, and vice versa

⇒ consequently, permuting the values of one signal, and thus changing the mapping at each time point, should not have any effect

Figure: Mapping of two independent signals
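The permutation claim above can be checked numerically: for independent signals, expectations of products factorise, E[g(y1) h(y2)] = E[g(y1)] E[h(y2)], and this still holds after permuting one of the signals. A sketch with synthetic signals (the choice g = h = square is arbitrary; any nonlinearities would do):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
y1 = rng.laplace(size=n)                 # two independent signals
y2 = rng.uniform(-1, 1, size=n)

def factorisation_gap(a, b):
    # zero iff E[a^2 b^2] factorises into E[a^2] E[b^2]
    return np.mean(a**2 * b**2) - np.mean(a**2) * np.mean(b**2)

gap_indep = factorisation_gap(y1, y2)                    # ~ 0
gap_perm = factorisation_gap(y1, rng.permutation(y2))    # still ~ 0

# mixed (dependent) signals do not factorise:
x1, x2 = y1 + y2, y1 - y2
gap_mixed = factorisation_gap(x1, x2)                    # clearly nonzero
```

Permuting y2 changes which value of y2 is paired with each value of y1, yet the gap stays near zero, which is exactly the intuition on this slide.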
So... Independent Component Analysis...

- is a method for blind source separation and feature extraction
- given only the mixed signals and the statistical independence assumption, estimates the original sources or latent variables
- is computationally expensive, so it is best to try similar but simpler methods first
Where to find ICA

There are different implementations of ICA: Infomax, JADE, ..., FastICA.

Implementations of the FastICA algorithm:
- For R: http://cran.r-project.org/web/packages/fastICA/index.html
- For Matlab: http://research.ics.aalto.fi/ica/fastica/
References

- Altangerel Chagnaa, Cheol-young Ock, Chang-beom Lee, and Purev Jaimai. Feature Extraction of Concepts by Independent Component Analysis, 2007.
- Timo Honkela and Aapo Hyvärinen. Linguistic Feature Extraction using Independent Component Analysis. In Proceedings of IJCNN'04, pages 279–284, Budapest, Hungary, July 2004.
- Aapo Hyvärinen and Erkki Oja. Independent component analysis: algorithms and applications. Neural Networks, 13:411–430, 2000.
- T. Kolenda, L. K. Hansen, and S. Sigurdsson. Independent Components in Text, 2000.