Post on 09-Apr-2018
transcript
8/8/2019 Clustering for Semantic labels - Wk9
http://slidepdf.com/reader/full/clustering-for-semantic-labels-wk9 1/39
Computational Intelligence:
Lecture 20
Clustering to Form Semantic Concepts
Overview
• Interpretability of fuzzy representation
• Histogram analysis
• LVQ (Learning Vector Quantization)
• FCM (Fuzzy C-Means)
• FKP (Fuzzy Kohonen Partitioning)
Semantic Label Clustering
• Semantic properties of a linguistic variable
– A linguistic variable is characterised by a quintuple ⟨L, T(L), U, G, M⟩, where L is the name of the variable; T(L) is the linguistic term set of L; U is a universe of discourse; G is a syntactic rule which generates T(L); and M is a semantic rule that associates each term in T(L) with its meaning.
– Each linguistic term is characterized by a fuzzy set, which is described using a membership function.
[Figure: two example membership functions, μT(x) and μG(x), plotted over x ∈ [0, 10]; the right panel shows a Gaussian MF]
• Example: a linguistic variable x named L = “performance”
• It has five linguistic terms, where T(L) = {“very small”, “small”, “medium”, “large”, “very large”}.
• The semantic assignment M is shown in the figure: normal and convex membership functions with the ordering “very small” ≺ “small” ≺ “medium” ≺ “large” ≺ “very large”.
• The universe of discourse is U = [0, 100] of the base variable x.
[Figure: five membership functions μT(x), labelled very small, small, medium, large and very large, over x (performance) ∈ [0, 100]]
Criteria of Interpretability
• Coverage of the universe of discourse
• Normal: max_x μ_X(x) = 1
• Convex: x ≤ y ≤ z ⇒ μ_X(y) ≥ min(μ_X(x), μ_X(z))
• Ordered: X_1 ≺ X_2 ≺ … ≺ X_n, where X_1 ≺ X_2 denotes that X_1 precedes X_2
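The normality and convexity criteria can be checked numerically on a sampled membership function. A minimal sketch; the function names and the Gaussian example are illustrative, not from the lecture:

```python
import numpy as np

def is_normal(mu, tol=1e-9):
    """Normal: the membership degree reaches 1 somewhere."""
    return bool(abs(np.max(mu) - 1.0) < tol)

def is_convex(mu, tol=1e-9):
    """Fuzzy convexity on a sorted grid: for x <= y <= z,
    mu(y) >= min(mu(x), mu(z)); equivalent to mu rising to a
    single peak and then falling (unimodal)."""
    peak = int(np.argmax(mu))
    rising = np.all(np.diff(mu[:peak + 1]) >= -tol)
    falling = np.all(np.diff(mu[peak:]) <= tol)
    return bool(rising and falling)

x = np.linspace(0.0, 10.0, 201)
gauss = np.exp(-0.5 * ((x - 5.0) / 1.5) ** 2)  # Gaussian MF centred at 5
print(is_normal(gauss), is_convex(gauss))      # True True
```

A bimodal membership function would pass the normality check but fail the convexity check, since it falls and rises again after its first peak.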
Clustering
• Clustering is a method that organizes patterns into clusters such that patterns within a cluster are more similar to each other than to patterns in other clusters.
• When the crisp partition in classical clustering analysis is replaced with a fuzzy partition or a fuzzy pseudo-partition, it is referred to as fuzzy clustering.
• Examples: LVQ (Kohonen), FCM (Bezdek), MLVQ (Ang and Quek), DIC (Tung and Quek).
• Worked example: patterns from three classes P1, P2 and P3, described by two features.
[Figures: class-conditional histograms (Number of samples vs. feature value) for P1, P2 and P3 over the features Area and Perimeter]
• The data is divided into two sets:
– a design set, for designing a classifier
– a test set, for evaluating the obtained classifier
• The data can be arranged as an m by (n+1) matrix, where m is the number of patterns, n is the number of features, and the extra column holds the class label.
Area   Perimeter   Class
 3      6          P1
 5      7          P1
 4      4          P1
 7      6          P1
15     10          P2
14     12          P2
17     13          P2
14     19          P3
13     20          P3
15     22          P3
 …      …          …
Design set: odd-indexed entries. Test set: even-indexed entries.
Flowchart for Histogram Analysis
Feature extraction (from image to features) → Data reduction (none) → Probability estimate (histogram analysis)
• Example: 50 sample points drawn from a Gaussian distribution, binned into 3, 10 and 25 bins.
[Figure: three histograms of the same 50 samples, with 3, 10 and 25 bins]
Histogram Analysis
• Properties:
– Does not require explicit use of density functions
– Dilemma between the number of intervals vs. the number of points
– Rule of thumb: the number of intervals is equal to the square root of the number of points
– To convert to density functions, the total area must be unity
– Can be used with any number of features, but is subject to the curse of dimensionality
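The rule of thumb and the unit-area conversion above can be sketched in a few lines (the random sample is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=50)          # 50 points from a Gaussian

# rule of thumb: number of intervals = square root of number of points
n_bins = int(round(np.sqrt(len(samples))))   # sqrt(50) rounds to 7

# density=True rescales the counts so the total area under the
# histogram is unity, turning the histogram into a density estimate
density, edges = np.histogram(samples, bins=n_bins, density=True)
area = float(np.sum(density * np.diff(edges)))
```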
[Figure: kernel density estimates of the same data with smoothing widths σ = 0.1 and σ = 0.3]
Kernel Density Estimation
– Also known as the Parzen estimator
– Can be used in multi-feature estimation
– Normal optimal smoothing strategy:
  σ_opt = (4 / (3n))^(1/5) · σ
  where σ denotes the standard deviation of the distribution and n the number of samples.
A. W. Bowman and A. Azzalini, Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations. New York: Oxford University Press, 1997.
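A sketch of the Parzen estimator with the normal optimal smoothing width; the helper name and the sample data are illustrative:

```python
import numpy as np

def parzen_kde(x_eval, samples, width):
    """Parzen estimate: the average of Gaussian kernels of the given
    width, one centred on each sample."""
    z = (x_eval[:, None] - samples[None, :]) / width
    return np.exp(-0.5 * z ** 2).mean(axis=1) / (width * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
samples = rng.normal(0.0, 1.0, size=200)
n = len(samples)

# normal optimal smoothing strategy: sigma_opt = (4 / (3n))^(1/5) * sigma
sigma_opt = (4.0 / (3.0 * n)) ** 0.2 * samples.std(ddof=1)

x = np.linspace(-4.0, 4.0, 401)
estimate = parzen_kde(x, samples, sigma_opt)  # density estimate on the grid
```

Because each kernel integrates to one, the estimate itself integrates to approximately one over a grid wide enough to cover the data.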
Learning Vector Quantization
• LVQ networks are unsupervised neural networks that determine the weights for cluster centres in an iterative and sequential manner.
• Each output neuron j has a weight vector v_j that is adjusted during learning.
• The winner, whose weight vector has the minimum distance from the input, updates its weights and moves toward the input.
• This is repeated until the weights are forced to stabilize through the specification of a learning rate.
[Figure: two-layer network with input layer x_1 … x_n, output layer y_1 … y_c, weights w_ji and cluster centres v_j; the winning neuron is marked]
LVQ – Cont’d
• Winner: ‖x − v_i^(T)‖ = min_{j=1..c} ‖x − v_j^(T)‖
• Update:
  v_j^(T+1) = v_j^(T) + α(x − v_j^(T))   if j = i (the winner)
  v_j^(T+1) = v_j^(T)                    if j ≠ i
where c is the number of clusters, x is the input vector, v_i is the i-th cluster centre and α is the learning constant.
Pseudo code:
(1) Define the number of clusters c and a small terminating condition ε
(2) Initialise the weights
(3) Determine the winning neuron based on distance
(4) Update the winner: v_i^(T) = v_i^(T−1) + α(x_k − v_i^(T−1))
(5) If ‖v_i^(T) − v_i^(T−1)‖ ≤ ε terminate; else repeat with a new vector
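The pseudo code above can be sketched as follows; the decaying learning-rate schedule and the synthetic two-blob data are illustrative assumptions, not from the lecture:

```python
import numpy as np

def lvq_cluster(data, c, alpha0=0.3, eps=1e-4, max_epochs=200, seed=0):
    """Sequential unsupervised LVQ: only the winning centre (minimum
    distance to the input) is moved toward the input."""
    rng = np.random.default_rng(seed)
    v = data[rng.choice(len(data), size=c, replace=False)].astype(float)
    for epoch in range(max_epochs):
        alpha = alpha0 / (1 + epoch)        # decreasing learning rate
        shift = 0.0
        for x in data:
            i = int(np.argmin(np.linalg.norm(x - v, axis=1)))  # winner
            step = alpha * (x - v[i])       # update the winner only
            v[i] += step
            shift = max(shift, float(np.linalg.norm(step)))
        if shift <= eps:                    # terminating condition
            break
    return v

rng = np.random.default_rng(2)
data = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
centres = lvq_cluster(data, c=2)
```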
Fuzzy C-Means (FCM) – Bezdek
• A fuzzy pseudo-partition of a finite data set satisfies
  Σ_{i=1}^{c} μ_i(x_k) = 1   for all k = 1..n
  0 < Σ_{k=1}^{n} μ_i(x_k) < n   for all i = 1..c
• An objective function for fuzzy clustering (m defines the degree of fuzziness):
  J_m = Σ_{k=1}^{n} Σ_{i=1}^{c} (μ_i(x_k))^m ‖x_k − v_i‖²
FCM – Cont’d
• Pseudo code:
– Define the number of clusters (c), degree of fuzziness (m) and terminating condition (ε)
– Initialise T and the pseudo-partition P^(0)
– Compute the cluster centres v_1, v_2, …, v_c:
  v_i^(T) = Σ_{k=1}^{n} (μ_i(x_k))^m x_k / Σ_{k=1}^{n} (μ_i(x_k))^m   for i = 1..c
FCM – Cont’d
– Update the new pseudo-partition:
  μ_i^(T+1)(x_k) = [ Σ_{j=1}^{c} ( ‖x_k − v_i^(T)‖² / ‖x_k − v_j^(T)‖² )^{1/(m−1)} ]^{−1}   for i = 1..c, k = 1..n
– Compare the distance between the partitions:
  E = |P^(T+1) − P^(T)| = Σ_{i=1}^{c} Σ_{k=1}^{n} |μ_i^(T+1)(x_k) − μ_i^(T)(x_k)|
– If E ≤ ε stop; otherwise repeat.
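The alternating centre and membership updates fit in a few lines of NumPy. A sketch of standard FCM; the random initialisation and the two-blob data are illustrative:

```python
import numpy as np

def fcm(data, c, m=2.0, eps=1e-4, max_iter=100, seed=0):
    """FCM: alternate the centre and membership updates until the change
    in the pseudo-partition, E = |P(T+1) - P(T)|, falls below eps."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, len(data)))
    u /= u.sum(axis=0)                                  # columns sum to 1
    for _ in range(max_iter):
        um = u ** m
        v = um @ data / um.sum(axis=1, keepdims=True)   # cluster centres
        d2 = ((data[None, :, :] - v[:, None, :]) ** 2).sum(axis=-1)
        inv = np.maximum(d2, 1e-12) ** (-1.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0, keepdims=True)    # membership update
        done = np.abs(u_new - u).sum() <= eps           # E <= eps: stop
        u = u_new
        if done:
            break
    return v, u

rng = np.random.default_rng(3)
data = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
v, u = fcm(data, c=2)
```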
FCM – Limitations
• An off-line, iterative optimization algorithm.
• Unable to perform on-line training.
• Performance depends on a good choice of the weighting exponent m.
[Figure: IRIS data set, FCM with m = 1.5, ε = 0.0001: membership degree μ(x) vs. sepal length (cm), sepal width (cm), petal length (cm) and petal width (cm) for the classes setosa, versicolor and virginica; the derived membership functions are trapezoidal-like]
FCM – Cont’d
[Figure: IRIS data set, FCM with a larger degree of fuzziness m (value illegible in source), ε = 0.0001: membership degree μ(x) vs. the four features for setosa, versicolor and virginica; the derived membership functions are Gaussian-like]
• The width of each Gaussian membership function is set from the distance to the closest cluster centre: σ_i = ‖v_i − v_closest‖ · σ
[Figure: IRIS data set, MLVQ with λ = 0.02, σ = 1.5, ε = 0.0001: membership degree μ(x) vs. the four features for setosa, versicolor and virginica]
MLVQ – Cont’d
[Figure: IRIS data set, MLVQ with λ = 0.02, σ = 3.0, ε = 0.0001: membership degree μ(x) vs. the four features for setosa, versicolor and virginica]
• A cluster can be described by a fuzzy interval with a centroid v, also known as a trapezoidal fuzzy number.
[Figure: trapezoidal membership function μ(x) with corner points α, β, γ, δ and centroid v along the x-axis]
Cont’d
• The subinterval where μ(x) = 1 is called the kernel of the fuzzy interval, and the subinterval [α, δ] is called the support:
– [β, γ] = kernel of the fuzzy interval
– [α, δ] = support of the fuzzy interval
• Clustering (e.g. LVQ or FCM) can be used to derive the centroid v, but it cannot derive the parameters (α, β, γ, δ) of the trapezoidal-shaped membership function:
  μ(x) = 0                  if x < α or x > δ
  μ(x) = (x − α)/(β − α)    if α ≤ x ≤ β
  μ(x) = 1                  if β ≤ x ≤ γ
  μ(x) = (δ − x)/(δ − γ)    if γ ≤ x ≤ δ
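The piecewise definition above maps directly to code. A small sketch; the function name is mine:

```python
def trapezoid_mf(x, a, b, g, d):
    """Trapezoidal membership function with support [a, d] and
    kernel [b, g] (a, b, g, d stand for alpha, beta, gamma, delta)."""
    if x < a or x > d:
        return 0.0                 # outside the support
    if a <= x < b:
        return (x - a) / (b - a)   # rising edge
    if b <= x <= g:
        return 1.0                 # kernel
    return (d - x) / (d - g)       # falling edge

print(trapezoid_mf(2.0, 1, 3, 5, 7))  # 0.5, halfway up the rising edge
```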
The Fuzzy Kohonen Partition algorithm – supervised
• Define:
– c as the number of classes,
– λ ≤ 1/Ω as the learning constant, where Ω = number of data vectors,
– η as the learning width and a small positive number ε as a stopping criterion; n = total number of data vectors
• Initialise the weights:
  v_i^(0) = min_k(x_k) + (2i − 1)(max_k(x_k) − min_k(x_k)) / (2c)   for i = 1..c, k = 1..n
• Determine the i-th cluster that the data x_k belongs to and update the weights v_i of the i-th cluster
The Fuzzy Kohonen Partition algorithm – supervised (cont’d)
• Compute the error to the clusters and the difference in error between iterations:
  e^(T+1) = Σ_{k=1}^{n} ‖x_k − v^(T+1)‖,   de^(T+1) = e^(T+1) − e^(T)
• Repeat while ¬(de^(T+1) ≤ ε)
– End of determining the centroids
The Fuzzy Kohonen Partition algorithm – supervised (cont’d)
• Initialise:
  α_i = β_i = γ_i = δ_i = φ_i = v_i   for i = 1..c
– where φ_i is the pseudo weight of v_i.
• Determine the i-th cluster that the data x_k belongs to and update the pseudo weight φ_i of the i-th cluster:
  φ_i = φ_i + λ(x_k − φ_i)
The Fuzzy Kohonen Partition algorithm – supervised (cont’d)
• Update the four points of the trapezoidal fuzzy number (TrFN):
  α_i = min(α_i, x_k)
  β_i = min(β_i, φ_i)
  γ_i = max(γ_i, φ_i)
  δ_i = max(δ_i, x_k)
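One way to read the four update rules: the support [α_i, δ_i] stretches to cover every data point assigned to the cluster, while the kernel [β_i, γ_i] stretches only as far as the pseudo weight φ_i. A small sketch with illustrative values (the helper name is mine):

```python
def update_trfn(trfn, phi, x_k):
    """One FKP-style update of the trapezoid points (alpha, beta, gamma,
    delta): the support grows to cover the data point x_k, the kernel
    to cover the pseudo weight phi of the winning cluster."""
    a, b, g, d = trfn
    return (min(a, x_k), min(b, phi), max(g, phi), max(d, x_k))

trfn = (5.0, 5.0, 5.0, 5.0)        # initialised at the cluster centre v
trfn = update_trfn(trfn, phi=4.8, x_k=3.9)
trfn = update_trfn(trfn, phi=5.3, x_k=6.4)
print(trfn)                         # (3.9, 4.8, 5.3, 6.4)
```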
The Fuzzy Kohonen Partition algorithm – Results
[Figure: IRIS data set, FKP with λ = 0.02, η = 0, ε = 0.0005: trapezoidal membership degree μ(x) vs. sepal length, sepal width, petal length and petal width (cm) for setosa, versicolor and virginica]
The Fuzzy Kohonen Partition algorithm – Results (cont’d)
[Figure: IRIS data set, FKP with λ = 0.02, η = 0.5, ε = 0.0005: membership degree μ(x) vs. the four features for setosa, versicolor and virginica]
[Figure: IRIS data set, PFKP with α = 0.02, η = 0, ε = 0.0005: membership degree μ(x) vs. sepal length, sepal width, petal length and petal width for setosa, versicolor and virginica]
PFKP – Cont’d
[Figure: IRIS data set, PFKP with λ = 0.02, η = 0.01, ε = 0.0005: membership degree μ(x) vs. the four features for setosa, versicolor and virginica]
Conclusions
• Two clustering techniques, the Fuzzy Kohonen Partition (FKP) and the Pseudo Fuzzy Kohonen Partition (PFKP), were proposed to directly derive appropriate membership functions from training data.
• Both algorithms directly derive trapezoidal membership functions that are convex and normal from training data, while the latter derives a pseudo-partition of the input space.