1 Neural Network-Based Clustering A. Selçuk MERCANLI Supervisor: Assist. Prof.Dr. Turgay İBRİKÇİ
Transcript
Page 1

Neural Network-Based Clustering

A. Selçuk MERCANLI

Supervisor: Assist. Prof.Dr. Turgay İBRİKÇİ

Page 2

Why NN?

• Neural networks have solved a wide range of problems and have good learning capabilities. Their strengths include adaptation, ease of implementation, parallelization, speed, and flexibility. NN-based clustering is closely related to the concept of competitive learning.

Page 3

w: weight, initially random
k: number of clusters

Similarity function: s(x, wj) = Σ(i=1..d) xi wji

Page 4

Updating Weights

wj(t+1) = wj(t) + η(t)(x(t) − wj(t))

η: learning rate. If η is zero there is no learning; if η is 1 learning is fast.

To avoid unlimited growth of the weights, the weight vector must be normalized whenever the input patterns are normalized.

Page 5

WTA - WTM

The competitive learning paradigm allows learning only for the particular winning neuron that best matches the given input pattern. It is therefore also known as winner-take-all (WTA).

On the other hand, learning can also occur in a cooperative way: not just the winning neuron adjusts its prototype, but all the other cluster prototypes may also be adapted, according to how close they are to the input pattern. This learning scheme is called soft competitive learning or winner-take-most (WTM).

Hard competition: only one neuron is activated.
Soft competition: neurons neighboring the true winner are also activated.

Page 6

HARD COMPETITIVE LEARNING CLUSTERING

• Online K-means Algorithm

• Leader Follower Clustering Algorithm

• Adaptive Resonance Theory

• Fuzzy ART

Page 7

Online K-means Algorithm

1. Initialize K cluster prototype vectors m1, …, mK ∈ ℝ^d randomly;
2. Present a normalized input pattern x ∈ ℝ^d;
3. Choose the winner J that has the smallest Euclidean distance to x:
   J = argmin_j ||x − mj||;
4. Update the winning prototype vector towards x:
   mJ(new) = mJ(old) + η(x − mJ(old)), where η is the learning rate;
5. Repeat steps 2 – 4 until the maximum number of steps is reached.
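The five steps above can be sketched in a few lines of Python (a minimal illustration; the data, K, learning rate, and step count are made up for the example):

```python
import numpy as np

def online_kmeans(X, K, eta=0.1, max_steps=1000, seed=0):
    """Online (sequential) K-means: each presented pattern updates only the winner."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize K prototype vectors (here: random data points)
    m = X[rng.choice(len(X), K, replace=False)].astype(float)
    for step in range(max_steps):
        x = X[rng.integers(len(X))]                   # step 2: present a pattern
        J = np.argmin(np.linalg.norm(x - m, axis=1))  # step 3: winner = nearest prototype
        m[J] += eta * (x - m[J])                      # step 4: move the winner towards x
    return m                                          # step 5: loop until max_steps

# Two well-separated blobs; the prototypes should settle near (0,0) and (10,10)
X = np.vstack([np.zeros((50, 2)), np.full((50, 2), 10.0)])
protos = online_kmeans(X, K=2)
```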

Page 8

K-means Algorithm

iterate {
  Compute the distance from every point to all k centers
  Assign each point to the nearest center
  Compute the average of all points assigned to each center
  Replace the centers with the new averages
}

From Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet Summer 2007, Distributed Computing Seminar, p 12

Page 9

Disadvantages of K-means

• K-means requires the number of clusters to be determined in advance; it must be estimated through cluster analysis. An inappropriate choice of the number of clusters can distort the real clustering structure, which is why the leader-follower algorithm is needed.

Page 10

Disadvantages of K-means

• The learning rate η becomes very small in the later stages, which has the disadvantage that new patterns are not learned very well.

• The rate is decayed from η0 to η1, where η0 and η1 are the initial and final values of the learning rate, respectively, and t1 is the maximum number of iterations allowed.
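The decay-schedule equation itself did not survive extraction; a common exponential schedule consistent with the symbols η0, η1, and t1 (an assumption, not necessarily the slide's exact formula) is:

```latex
\eta(t) = \eta_0 \left( \frac{\eta_1}{\eta_0} \right)^{t/t_1}
```

so that η(0) = η0 and η(t1) = η1.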

Page 11

Leader - Follower Clustering Algorithm

1. Initialize the first cluster prototype vector m1 with the first input pattern;
2. Present a normalized input pattern x;
3. Choose the winner J that is closest to x based on the Euclidean distance:
   J = argmin_j ||x − mj||;
4. If ||x − mJ|| < θ, update the winning prototype vector:
   mJ(new) = mJ(old) + η(x − mJ(old)), where η is the learning rate.
   Otherwise, create a new cluster with the prototype vector equal to x;
5. Repeat steps 2 – 4 until the maximum number of steps is reached.
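A minimal Python sketch of the leader-follower steps above (the threshold, learning rate, and data are illustrative assumptions):

```python
import numpy as np

def leader_follower(X, theta=3.0, eta=0.1):
    """Leader-follower: the first pattern founds the first cluster; each later
    pattern either updates the nearest prototype or founds a new cluster."""
    protos = [X[0].astype(float)]                 # step 1: first pattern -> first prototype
    for x in X[1:]:                               # step 2: present patterns one by one
        d = [np.linalg.norm(x - m) for m in protos]
        J = int(np.argmin(d))                     # step 3: nearest prototype
        if d[J] < theta:
            protos[J] += eta * (x - protos[J])    # step 4a: follow the leader
        else:
            protos.append(x.astype(float))        # step 4b: found a new cluster
    return protos

X = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 10.0], [10.5, 10.0]])
clusters = leader_follower(X, theta=3.0)
# two well-separated pairs of points -> two prototypes
```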

Page 12

Leader - Follower

• Find the closest cluster center.
  – Distance above the threshold? Create a new cluster.
  – Otherwise, add the instance to the cluster and update the cluster center.

(Figure annotation: Distance > Threshold)

From Johan Everts, Clustering algorithms, Kunstmatige Intelligentie, p 31

Page 13

Performance Analysis

• K-Means
  – Depends heavily on a priori knowledge (K)
  – Very stable
• Leader Follower
  – Depends heavily on a priori knowledge (threshold)
  – Faster, but unstable

From Johan Everts, Clustering algorithms, Kunstmatige Intelligentie, p 39

Page 14

Adaptive Resonance Theory

• An important problem with competitive learning-based clustering is stability. An incremental clustering algorithm is stable in terms of two conditions:
• (1) No prototype vector can cycle, i.e., take on a value that it had at a previous time (provided it has changed in the meantime).
• (2) Only a finite number of clusters are formed with infinite presentation of the data.
The first condition concerns the stability of the individual prototype vectors of the clusters; the second concerns the stability of all the cluster vectors together.

Page 15

Adaptive Resonance Theory

• The K-means and leader-follower algorithms do not produce stable clusters. The plasticity of the two algorithms can cause the loss of previously learned rules.

• Adaptive resonance theory (ART) was developed by Carpenter and Grossberg (1987a, 1988).

• ART is not, as is popularly imagined, a neural network architecture. It is a learning theory hypothesizing that resonance in neural circuits can trigger fast learning.

Page 16

Adaptive Resonance Theory

• Stability-Plasticity Dilemma
• Stability: system behaviour does not change after irrelevant events.
• Plasticity: system adapts its behaviour in response to significant events.
• Dilemma: how to achieve stability without rigidity, and plasticity without chaos?
  – Ongoing learning capability
  – Preservation of learned knowledge

From: Arash Ashari, Ali Mohammadi, ART PowerPoint

Page 17

ART-1

• The basic ART1 architecture consists of two layers of nodes (neurons): the feature representation field F1 and the category representation field F2.

• The neurons in layer F1 are activated by the input pattern, while the prototypes of the formed clusters are stored in layer F2.

Page 18

ART-1 Architecture

Page 19

ART-1

• The two layers are connected via adaptive weights: a bottom-up weight matrix and a top-down weight matrix.

• F2 performs a winner-take-all competition between a certain number of committed neurons and one uncommitted neuron. The winning neuron feeds its template weights back to layer F1; this is known as top-down feedback expectancy. The template is then compared with the input pattern.

Page 20

ART-1

• If the match meets the vigilance criterion, weight adaptation occurs, where both bottom - up and top - down weights are updated simultaneously. This procedure is called resonance, which suggests the name of ART. On the other hand, if the vigilance criterion is not met, a reset signal is sent back to layer F2 to shut off the current winning neuron.

• This new expectation is then projected into layer F1 , and this process repeats until the vigilance criterion is met. If an uncommitted neuron is selected for coding, a new uncommitted neuron is created to represent a potential new cluster. It is clear that the vigilance parameter ρ has a function similar to that of the threshold parameter θ of the leader - follower algorithm.

Page 21

ART-1 Flowchart

Page 22

Fuzzy ART

• Fuzzy ART (FA) keeps an architecture and operations similar to ART1 while replacing the binary operators with fuzzy set operators so that it can work on real-valued data sets. We describe FA by emphasizing its main differences from ART1 in terms of five phases: preprocessing, initialization, category choice, category match, and learning.

• Preprocessing. Each component of a d-dimensional input pattern x = (x1, …, xd) must lie in the interval [0, 1].

Page 23

Fuzzy ART

• Initialization. The real-valued adaptive weights W = {wij}, representing the connection from the ith neuron in layer F2 to the jth neuron in layer F1, subsume both the bottom-up and top-down weights of ART1. Initially, the weights of an uncommitted node are set to one. Larger values may also be used; however, this biases the system towards selecting committed nodes.

Page 24

Fuzzy ART

• Category choice. After an input pattern is presented, the nodes in layer F2 compete by calculating the category choice function, defined as

Tj = |x ∧ wj| / |wj|,

where ∧ is the fuzzy AND operator, defined componentwise by

(x ∧ y)i = min(xi, yi),

and |·| denotes the L1 norm.

Page 25

Fuzzy ART

• Category match. The category match function of the winning neuron J is then tested against the vigilance criterion. If

|x ∧ wJ| / |x| ≥ ρ,

resonance occurs. Otherwise, the current winning neuron is disabled, and a new neuron in layer F2 is selected and examined against the vigilance criterion. This search process continues until the criterion is satisfied.

Page 26

Fuzzy ART

• Learning. The weight vector of the winning neuron that passes the vigilance test is updated using the learning rule

wJ(new) = β (x ∧ wJ(old)) + (1 − β) wJ(old),

where β ∈ [0, 1] is the learning rate parameter.
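One Fuzzy ART presentation step, combining the choice, match, and learning rules above, can be sketched as follows (the vigilance ρ, rate β, small choice constant alpha in the denominator, and the two input patterns are illustrative assumptions, not values from the slides):

```python
import numpy as np

def fuzzy_and(a, b):
    return np.minimum(a, b)          # (x ^ y)_i = min(x_i, y_i)

def fuzzy_art_step(x, W, rho=0.75, beta=1.0, alpha=0.001):
    """One presentation of pattern x (components in [0, 1]) to Fuzzy ART.
    W is a list of category weight vectors; returns (resonating category, W)."""
    # Category choice: T_j = |x ^ w_j| / (alpha + |w_j|), |.| = L1 norm
    T = [fuzzy_and(x, w).sum() / (alpha + w.sum()) for w in W]
    for j in np.argsort(T)[::-1]:                 # try categories best-first
        # Category match (vigilance test): |x ^ w_j| / |x| >= rho
        if fuzzy_and(x, W[j]).sum() / x.sum() >= rho:
            # Learning: w_new = beta*(x ^ w_old) + (1 - beta)*w_old
            W[j] = beta * fuzzy_and(x, W[j]) + (1 - beta) * W[j]
            return j, W
    W.append(x.copy())                            # no resonance: commit a new category
    return len(W) - 1, W

W = [np.ones(4)]                                  # one uncommitted node (weights = 1)
j1, W = fuzzy_art_step(np.array([0.9, 0.1, 0.8, 0.2]), W)   # resonates with node 0
j2, W = fuzzy_art_step(np.array([0.1, 0.9, 0.2, 0.8]), W)   # fails vigilance -> new node
```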

Page 27

SOFT COMPETITIVE LEARNING CLUSTERING

• Leaky Learning
• One of the major problems with hard competitive learning is the underutilized or dead-neuron problem: a neuron whose weight vector is initialized farther away from the input patterns than the other weight vectors may never win the competition and therefore never be trained. One solution is to allow both winning and losing neurons to move towards the presented input pattern, but with different learning rates.

• where ηw and ηl are the learning rates for the winning and losing neurons, respectively, and ηw >> ηl .
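The update rule the bullet refers to was lost with the slide image; reconstructed from the text (winner J moves at rate ηw, all losers at the much smaller rate ηl), it reads:

```latex
w_j(t+1) = w_j(t) +
\begin{cases}
\eta_w \,\bigl(x - w_j(t)\bigr), & j = J \ \text{(winner)}\\[2pt]
\eta_l \,\bigl(x - w_j(t)\bigr), & j \neq J \ \text{(losers)},
\end{cases}
\qquad \eta_w \gg \eta_l .
```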

Page 28

• Conscience Mechanism

To implement a conscience, the distance definition described above must be modified. DeSieno (1988) adds a bias term bj to the squared Euclidean distance:
x : input pattern
wj : weight of neuron j, j = 1, 2, …, K
bj : bias term

Page 29

• Rival Penalized Competitive Learning
• x : input pattern
• wj : weight of neuron j, j = 1, 2, …, K
• bj : bias term

Page 30

Learning Vector Quantization

• Learning vector quantization (LVQ) (Kohonen, 1990) is a supervised pattern classification method; its architecture is essentially the same as Kohonen's SOM. The LVQ algorithm finds the output unit that is closest to the input vector. If x and w belong to the same class, the weights are moved toward the input vector; if they belong to different classes, the weights are moved away from it. (Fundamentals of Neural Networks, L. Fausett)
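The attract/repel rule just described can be sketched as a single LVQ1-style update (the prototypes, labels, and learning rate below are illustrative assumptions):

```python
import numpy as np

def lvq_step(x, label, W, W_labels, eta=0.2):
    """One LVQ1 update: move the nearest prototype toward x if its class
    matches the label, away from x otherwise."""
    J = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # nearest prototype
    if W_labels[J] == label:
        W[J] += eta * (x - W[J])      # same class: attract
    else:
        W[J] -= eta * (x - W[J])      # different class: repel
    return J

W = np.array([[0.0, 0.0], [10.0, 10.0]])   # one prototype per class
W_labels = [0, 1]
J = lvq_step(np.array([1.0, 1.0]), 0, W, W_labels)  # matches class 0: attract
```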

Page 31

Flowchart of LVQ

X: input pattern

J(w,x) : cost function

w: weights

Page 32

LVQ

J is the winning neuron, and the cost function is defined on the locally weighted error between x and w.

Page 33

LVQ

: Prespecified threshold

Page 34

LVQ Application

Ten data points clustered into two clusters, shown in red and cyan.

Page 35

SOM

• A competitive network: the output neurons of the network compete among themselves to be activated, or fired. The neighbourhood function usually shrinks over time; the lattice is typically linear, rectangular, or hexagonal.

Page 36

Neural Networks: A Comprehensive Foundation, Simon Haykin, Prentice Hall, p. 467

Page 37

SOM Neighboorhood

Application of Neural Networks and Other Learning Technologies in Process Engineering, I. M. Mujtaba, M. A. Hussain, Imperial College Press, 2001, p. 53

Page 38

SOM BMU

Find the best matching unit, then update the weights of the winner and its neighbours.

Decrease the learning rate and the neighbourhood size.

Page 39

Flowchart of SOFM

Page 40

Basic steps of SOFM

1. Determine the topology of the SOFM. Initialize the weight vectors wj(0) for j = 1, …, K randomly;
2. Present an input pattern x to the network. Choose the winning node J that has the minimum Euclidean distance to x, i.e.
   J = argmin_j ||x − wj||;
3. Calculate the current learning rate and the size of the neighborhood;
4. Update the weight vectors of all the neurons in the neighborhood of J using
   wj(t+1) = wj(t) + η(t) hjJ(t) (x − wj(t));
5. Repeat steps 2 to 4 until the change in neuron positions falls below a prespecified small positive number.
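The steps above can be sketched for a one-dimensional map as follows (the Gaussian neighbourhood, the linear decay schedules, a fixed iteration budget instead of a convergence test, and the uniform data are illustrative assumptions):

```python
import numpy as np

def train_sofm(X, K=10, T=2000, seed=0):
    """1-D SOFM: K neurons on a line; the winner and its lattice
    neighbours are pulled towards each presented pattern."""
    rng = np.random.default_rng(seed)
    w = rng.random((K, X.shape[1]))                    # 1. random initial weights
    for t in range(T):
        x = X[rng.integers(len(X))]                    # 2. present a pattern
        J = np.argmin(np.linalg.norm(x - w, axis=1))   #    winner = nearest neuron
        eta = 0.5 * (1 - t / T)                        # 3. decaying learning rate
        sigma = max(K / 2 * (1 - t / T), 0.5)          #    shrinking neighbourhood
        h = np.exp(-((np.arange(K) - J) ** 2) / (2 * sigma ** 2))
        w += eta * h[:, None] * (x - w)                # 4. update the neighbourhood
    return w                                           # 5. fixed budget stands in
                                                       #    for the convergence test

X = np.random.default_rng(1).random((200, 2))          # uniform unit square
w = train_sofm(X)
```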

Page 41

SOM Application

Learning a character

Page 42

SOM Application

Learning a circle with SOM

Page 43

SOM Application

SOM examples from Bernd Fritzke, Ruhr University, draft of 5 April 1997, p. 32

Page 44

Neural Gas

NG adaptively determines the neighborhood update by using a neighborhood ranking of the prototype vectors within the input space, rather than a neighborhood function on an output lattice.

Page 45

Neural Gas

• Prototype vectors are updated as

wj(t+1) = wj(t) + η(t) hλ(kj(x, W)) (x − wj(t)),

where hλ(kj(x, W)) = exp(−kj(x, W)/λ) is a bell-shaped curve and kj(x, W) is the distance rank of wj.

• The learning rate η and the characteristic decay constant λ are annealed during training:
  η0 and ηf : initial and final learning rates
  λ0 and λf : initial and final decay constants
  T : maximum number of iterations

Page 46

NG Algorithm

The major process of the NG algorithm is as follows:1. Initialize a set of prototype vectors W = { w1 , w2 , … ,

wK } randomly;2. Present an input pattern x to the network. Sort the index

list in order from the prototype vector with the smallest Euclidean distance from x to the one with the greatest distance from x ;

3. Calculate the current learning rate and hλ ( k j ( x, W )) (bell shaped curve). Adjust the prototype vectors using the learning rule

4. Repeat steps 2 and 3 until the maximum number of iterations is reached.

Page 47

NG Application

NG keeps adding new centers and stops when it reaches the maximum number of iterations.

Page 48

NG Application

NG examples from Bernd Fritzke, Ruhr University, draft of 5 April 1997, p. 22

Page 49

Growing Neural Gas

• A type of SOM. Neural gas is a simple algorithm for finding optimal data representations based on feature vectors. The algorithm was named "neural gas" because of the dynamics of the feature vectors during the adaptation process, which distribute themselves like a gas within the data space.

Page 50

Growing Neural Gas

• When prototype learning occurs, not only is the prototype vector of the winning neuron J1 updated towards x, but the prototypes within its topological neighborhood NJ1 are also adapted.

• Unlike NG, GCS, or SOFM, GNG is a self-organizing network that can dynamically grow (the usual case) and shrink the number of neurons in the network. New neurons are successively inserted into the network every λ iterations near the neuron with the maximum accumulated error. At the same time, a neuron-removal rule can be used to eliminate the neurons with the lowest utility for error reduction.

Page 51

GNG

GNG examples from Bernd Fritzke, Ruhr University, draft of 5 April 1997, p. 29

Page 52

Some Applications

Magnetic Resonance Imaging Segmentation

MRI provides a visualization of the internal tissues and organs of a living organism, which is valuable in disease diagnosis (such as cancer and heart and vascular disease), treatment, and surgical planning. MRI segmentation can be formulated as a clustering problem in which a set of feature vectors, obtained by transforming image measurements and positions, is grouped into a relatively small number of clusters.

Page 53

Magnetic Resonance Imaging Segmentation

• After the patient was given Gadolinium, the tumor on the T1 - weighted image (Fig. 5.17 (d)) becomes very bright and is isolated from surrounding tissue.

From N. Karayiannis and P. Pai, "Segmentation of magnetic resonance images using fuzzy algorithms for learning vector quantization," IEEE Transactions on Medical Imaging, vol. 18, pp. 172 – 180, 1999. Copyright © 1999 IEEE.

Page 54

Condition Monitoring of 3G Cellular Networks

• The 3G mobile networks combine new technologies such as WCDMA and UMTS and provide users with a wide range of multimedia services and applications with higher data rates (Laiho et al., 2005 ). At the same time, emerging new requirements make it more important to monitor the states and conditions of 3G cellular networks. Specifically, in order to detect abnormal behaviors in 3G cellular systems, four competitive learning neural networks, LVQ, FSCL, SOFM (see another application of SOFM in WCDMA network analysis in Laiho et al. (2005) ), and NG, were applied to generate abstractions or clustering prototypes of the input vectors under normal conditions, which are further used for network behavior prediction

Page 55

Condition Monitoring of 3G Cellular Networks

The clustering prototypes provide a good summary of the normal behaviors of the cellular networks, which can then be used to detect abnormalities.

Page 56

Summary

Neural network-based clustering is tightly related to the concept of competitive learning. Prototype vectors, associated with a set of neurons in the network and representing clusters in the feature or output space, compete with each other upon the presentation of an input pattern. The active neuron, or winner, reinforces itself (hard competitive learning) or its neighborhood within certain regions (soft competitive learning). More often, the neighborhood decreases monotonically with time.

One important problem that learning algorithms need to deal with is the stability-plasticity dilemma: a system should be capable of learning new and important patterns while maintaining stable cluster structures in response to irrelevant inputs.

