Page 1

Intelligent Database Systems Lab

Advisor : Dr. Hsu

Graduate : Ching-Lung Chen

Author : Victoria J. Hodge

Jim Austin

Hierarchical Growing Cell Structures: TreeGCS

國立雲林科技大學 National Yunlin University of Science and Technology

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 13, NO.2, MARCH/APRIL 2001

Page 2

Outline

Motivation
Objective
Introduction
GCS
TreeGCS
Evaluation
  Single-pass TreeGCS
  Cyclic TreeGCS
Conclusions
Personal Opinion
Review

Page 3

Motivation

The GCS network topology is susceptible to the ordering of the input vectors.

The original algorithm does not scale to the visualization of dendrograms for large data sets, as there are too many leaf nodes and branches to visualize.

Parameter selection is a combinatorial problem.

Page 4

Objective

To overcome the instability problem in the GCS approach.

To overcome the visualization problem of dendrograms for large data sets.

To provide recommendations for effective parameter combinations for TreeGCS that are easily derived.

Page 5

Introduction 1/3

Clustering algorithms have been investigated previously.

However, nearly all clustering techniques suffer from at least one of the following:

1. They assume specific forms for the probability distribution (e.g., normal).

2. They require unique global minima of the input probability distribution.

3. They cannot handle identical cluster similarities.

4. They do not scale well, as the training time is often O(n²).

5. They require prior knowledge to set parameters.

Page 6

Introduction 2/3

The hierarchy may be formed agglomeratively (bottom-up) by progressively merging the most similar clusters.

TreeGCS is an unsupervised, growing, self-organizing hierarchy of nodes able to form discrete clusters. In TreeGCS, high dimensional inputs are mapped onto a two-dimensional hierarchy reflecting the topological ordering of the input space.

TreeGCS is similar to HiGS.

Page 7

Introduction 3/3

However, the structure of HiGS does not match our requirements.

1. The topology induced for HiGS is not a tree configuration, as the parent must be a member of a cluster of cardinality at least three.

2. The HiGS algorithm generates child clusters and periodically deletes superfluous children so, at any particular time, the tree representation may be incorrect.

Our proposal maintains the correct cluster topology at each epoch.

Page 8

GCS 1/7

GCS is a two-dimensional structure of cells linked by vertices. Each cell has a neighborhood defined as those cells directly linked by a vertex to the cell.

The adaptation strength is constant over time and only the best matching unit (bmu) and its direct topological neighbors are adapted, unlike SOM.

Each cell has a winner counter denoting the number of times that cell has been the bmu.
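The cell record implied by this slide can be sketched as a small Python structure; the field names below are illustrative and not taken from the paper.

from dataclasses import dataclass, field
import numpy as np

@dataclass
class Cell:
    # Attached vector w in input space.
    weight: np.ndarray
    # Winner counter E: how many times this cell has been the bmu.
    winner_count: float = 0.0
    # Ids of the cells directly linked to this cell by a vertex (its neighborhood).
    neighbours: set = field(default_factory=set)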

Page 9

GCS 2/7

The GCS algorithm is described below. Step (1) is the initialization and steps (2)-(7) represent one iteration.

1. A random triangular structure of connected cells is created, with an attached vector (w) and a winner counter (E) for each cell.

2. The next random input vector is selected from the input vector density distribution.

3. The bmu is determined for the input vector and the bmu's winner counter is incremented.

Page 10

GCS 3/7

4. The bmu and its neighbors are adapted toward the input vector by adaptation increments set by the user.

5. If the number of input signals exceeds a threshold set by the user, a new cell (w_new) is inserted between the cell with the highest winner counter (w_bmu) and its farthest neighbor (w_f) (see Fig. 2).

Page 11

GCS 4/7

5. The winner counter of each neighbor of w_new is redistributed, donating a fraction of the neighboring cells' winner counters to the new cell.

The winner counter for the new cell is set to the total decremented from its neighbors: E_new = Σ ΔE_n, the sum of the fractions donated by the neighboring cells.

6. After a user-specified number of iterations, the cell with the greatest mean Euclidean distance between itself and its neighbors is deleted and any cells within the neighborhood that would be left “dangling” are also deleted (see Fig. 3).

Page 12

GCS 5/7

7. The winner counter variable of all cells is decreased by a user-specified factor β to implement temporal decay: E_c ← (1 − β)·E_c for every cell c.
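A minimal Python sketch of steps (2)-(7) under simplifying assumptions: cells are kept in plain dicts keyed by integer ids, the counter-redistribution fraction and the triangulation bookkeeping of the insertion step are simplified stand-ins rather than the paper's exact rules, the deletion step (6) is only indicated by a comment, and names such as eps_b, eps_n and beta are placeholders for the user-specified parameters.

import numpy as np

def present_input(weights, counts, edges, x, eps_b, eps_n):
    # Steps 3-4: find the bmu for input x, bump its winner counter, and
    # adapt the bmu and its direct topological neighbours toward x.
    bmu = min(weights, key=lambda c: np.linalg.norm(x - weights[c]))
    counts[bmu] += 1.0
    weights[bmu] += eps_b * (x - weights[bmu])
    for n in edges[bmu]:
        weights[n] += eps_n * (x - weights[n])
    return bmu

def insert_cell(weights, counts, edges):
    # Step 5: insert a new cell between the cell with the highest winner
    # counter and its farthest neighbour, then redistribute the counters.
    q = max(counts, key=counts.get)
    f = max(edges[q], key=lambda n: np.linalg.norm(weights[q] - weights[n]))
    new = max(weights) + 1
    weights[new] = 0.5 * (weights[q] + weights[f])
    edges[q].discard(f); edges[f].discard(q)      # break the q-f link
    edges[new] = {q, f}
    edges[q].add(new); edges[f].add(new)
    # Donate a fraction of each neighbour's counter to the new cell and set
    # the new counter to the total decremented (an even share is used here
    # as a stand-in for the paper's exact fraction).
    donated = 0.0
    for n in edges[new]:
        share = counts[n] / (len(edges[n]) + 1)
        counts[n] -= share
        donated += share
    counts[new] = donated

def decay_counters(counts, beta):
    # Step 7: temporal decay of every winner counter, E <- (1 - beta) * E.
    for c in counts:
        counts[c] *= (1.0 - beta)

# Step 6 (periodic deletion of the cell with the greatest mean distance to
# its neighbours, plus any cells left "dangling") is omitted for brevity.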

Page 13

GCS 6/7

The user-specified parameters are:

1. The dimensionality of GCS, which is fixed.

2. The maximum number of neighbor connections per cell.

3. The maximum number of cells in the structure.

4. The adaptation step for the winning cell.

5. The adaptation step of the neighborhood.

6. The temporal decay factor.

7. The number of iterations for insertion.

8. The number of iterations for deletion.
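These eight values could be gathered into one configuration object; a sketch, with illustrative defaults that are not the settings used in the paper:

from dataclasses import dataclass

@dataclass
class GCSParams:
    dimensionality: int = 2        # 1. dimensionality of the GCS (fixed)
    max_neighbours: int = 6        # 2. maximum neighbour connections per cell
    max_cells: int = 150           # 3. maximum number of cells in the structure
    eps_winner: float = 0.06       # 4. adaptation step for the winning cell
    eps_neighbour: float = 0.002   # 5. adaptation step of the neighbourhood
    decay: float = 0.0005          # 6. temporal decay factor
    insert_interval: int = 100     # 7. iterations between insertions
    delete_interval: int = 300     # 8. iterations between deletions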

Page 14

GCS 7/7

Fritzke has demonstrated superior performance for GCS over SOMs with respect to:

Topology preservation, with similar input vectors being mapped onto identical or closely neighboring neurons ensuring robustness against distortions.

Neighboring cells having similar attached vectors, ensuring robustness. If the dimensionality of the input vectors is greater than the network dimensionality, then the mapping usually preserves the similarities among the input vectors.

Lower distribution-modeling error (which is the standard deviation of all counters divided by the mean value of the counters).
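The distribution-modeling error in the last point is straightforward to compute; a sketch, assuming the winner counters are held in a NumPy array:

import numpy as np

def distribution_modeling_error(counters: np.ndarray) -> float:
    # Standard deviation of all winner counters divided by their mean.
    return float(np.std(counters) / np.mean(counters))

# e.g. distribution_modeling_error(np.array([3.0, 5.0, 4.0])) is roughly 0.20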

Page 15

GCS Evaluation

The run-time complexity for GCS is O(numberCells × dimension × numberInputs × epochs).
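For example, with the 41-vector, 47-dimensional data set used in the evaluation below and hypothetical values of 100 cells and 1,000 epochs (the last two figures are purely illustrative), this amounts to roughly 100 × 47 × 41 × 1,000 ≈ 1.9 × 10⁸ basic operations.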

The GCS algorithm is susceptible to the input data order.

In this paper, we utilize three data orderings to illustrate the initial susceptibility of the algorithm to input data order and how cycling improves the stability.

Page 16

TreeGCS 1/2

When a cluster subdivides, new nodes are added to the tree to reflect the additional clusters (Fig. 4).

Only leaf nodes maintain a cluster list.

The hierarchy generation is run once after each GCS epoch.

Page 17

TreeGCS 2/2

If the number of clusters has decreased, a cluster has been deleted and the associated tree node is deleted (Fig. 5).

All tree nodes except leaf nodes have only an identifier and pointers to their children.
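A minimal sketch of the tree nodes these two slides describe: internal nodes keep only an identifier and child pointers, while leaf nodes also carry a cluster list. The class and method names are illustrative, not from the paper.

class TreeNode:
    def __init__(self, ident):
        self.ident = ident
        self.children = []   # pointers to child nodes
        self.cluster = []    # list of GCS cells; maintained only on leaf nodes

    def split(self, ident_a, ident_b):
        # When this node's cluster subdivides, give it two leaf children and
        # stop maintaining a cluster list on the now-internal node (Fig. 4).
        self.children = [TreeNode(ident_a), TreeNode(ident_b)]
        self.cluster = []
        return self.children

    def remove_child(self, ident):
        # When a GCS cluster is deleted, delete the associated tree node (Fig. 5).
        self.children = [c for c in self.children if c.ident != ident]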

Page 18

Evaluation

The data set comprises 41 countries in Europe, each represented by a 47-dimensional real-valued vector.

We use three different orderings of the data to evaluate stability.

1. Alphabetical order of the country names.

2. Middle to front.

3. Numerical order (sorted by the first attribute).
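The three orderings can be produced mechanically; a sketch in Python, where names is a list of country names, vectors is a 2-D NumPy array (one row per country), and the "middle to front" order is an assumed interpretation (start at the middle item and wrap around), since the slide does not define it precisely:

import numpy as np

def orderings(names, vectors):
    alpha = np.argsort(names)             # 1. alphabetical order of country names
    mid = len(names) // 2                 # 2. "middle to front" (assumed reading)
    middle_front = np.r_[np.arange(mid, len(names)), np.arange(mid)]
    numeric = np.argsort(vectors[:, 0])   # 3. numerical order of the first attribute
    return alpha, middle_front, numeric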

Page 19

Dendrogram

If we take the dendrogram as three clusters, the clusters produced are:

1. {Den, Fra, Ger, It, UK}

2. {Lux}

3. {Alb, And, Aus, Bel, Bos, Bul, Cro, Cyp, Cze, Eir, Est, Far, Fin, Gib, Gre, Hun, Ice, Lat, Lie, Lit, Mac, Mal, Mon, NL, Nor, Pol, Rom, SM, Ser, Slk, Sln, Spa, Swe, Swi, Ukr}.

The parameter settings for TreeGCS were:

There are six permutations of the three data orders: (1,2,3), (1,3,2), (2,3,1), (2,1,3), (3,1,2), (3,2,1).

Page 20

Single-Pass TreeGCS 1/3

Page 21

Single-Pass TreeGCS 2/3

Alphabetical order of countries (see Fig. 6).
34 {Lux, Ukr}
9 {Den, Fra, Ger, It, Spa, UK}
80 {Alb, And, Aus, Bel, Bos, Bul, Cro, Cyp, Cze, Eir, Est, Far, Fin, Gib, Gre, Hun, Ice, Lat, Lie, Lit, Mac, Mal, Mon, NL, Nor, Pol, Rom, SM, Ser, Slk, Sln, Swi}
3 {Swe}

Middle to front order of countries (see Fig. 6).
11a {Den, Fra, Ger, It, Spa, UK}
11b {Aus, Bel, NL, Swe, Swi, Ukr}
12 {Cze, Fin, Gre, Nor, Rom}
13 {Bul, Eir, Hun, Pol, Slk}
14 {Lux, Ice}
65 {Alb, And, Bos, Bul, Cro, Cyp, Est, Far, Gib, Lat, Lie, Lit, Mac, Mal, Mon, SM, Ser, Sln}

Page 22

Single-Pass TreeGCS 3/3

Numerical order of first attributes (see Fig. 6).
29 {Aus, Bel, Den, Fra, Ger, It, NL, Spa, Swe, Swi, UK}
69 {Alb, And, Bos, Bul, Cro, Cyp, Est, Far, Gib, Ice, Lat, Lie, Lit, Mac, Mal, Mon, Pol, Rom, SM, Slk, Sln}
8 {Hun, Lux, Ser}
20 {Cze, Eir, Fin, Gre, Nor, Ukr}

Page 23

Cyclic TreeGCS 1/4

D = alphabetical data order

M = middle to front

S = sorted numerically by the first attribute

Page 24

Cyclic TreeGCS 2/4

1. DMS (see Fig. 7).
18 {Den, Fra, Ger, It, NL, Spa, UK}
108 {Alb, And, Aus, Bel, Bos, Bul, Cro, Cyp, Cze, Eir, Est, Far, Fin, Gib, Gre, Hun, Ice, Lat, Lie, Lit, Lux, Mac, Mal, Mon, Nor, Pol, Rom, SM, Ser, Slk, Sln, Swe, Swi, Ukr}

2. DSM (see Fig. 7).
30 {Bel, Den, Fra, Ger, It, NL, Spa, Swe, UK}
8 {Aus, Lux, Ser, Swi, Ukr}
88 {Alb, And, Bos, Bul, Cro, Cyp, Cze, Eir, Est, Far, Fin, Gib, Gre, Hun, Ice, Lat, Lie, Lit, Mac, Mal, Mon, Nor, Pol, Rom, SM, Slk, Sln}

Page 25

Cyclic TreeGCS 3/4

3. MSD (see Fig. 7).
10 {Den, Fra, Ger, It, Spa, UK}
116 {Alb, And, Aus, Bel, Bos, Bul, Cro, Cyp, Cze, Eir, Est, Far, Fin, Gib, Gre, Hun, Ice, Lat, Lie, Lit, Lux, Mac, Mal, Mon, NL, Nor, Pol, Rom, SM, Ser, Slk, Sln, Swe, Swi, Ukr}

4. MDS (see Fig. 7).
17 {Den, Fra, Ger, It, NL, Spa, UK}
32 {Aus, Bel, Cze, Fin, Gre, Nor, Rom, Swe, Swi, Ukr}
11 {Bul, Eir, Hun, Lux, Ser, Slk}
66 {Alb, And, Bos, Cro, Cyp, Est, Far, Gib, Ice, Lat, Lie, Lit, Mac, Mal, Mon, Pol, SM, Sln}

Page 26

Cyclic TreeGCS 4/4

5. SDM (see Fig. 7).
18 {Den, Fra, Ger, It, NL, Spa, UK}
5 {Cze, Gre, Lux, Ser}
15 {Aus, Bel, Rom, Swe, Swi}
12 {Eir, Fin, Hun, Nor, Ukr}
76 {Alb, And, Bos, Bul, Cro, Cyp, Est, Far, Gib, Ice, Lat, Lie, Lit, Mac, Mal, Mon, Pol, SM, Slk, Sln}

6. SMD (see Fig. 7).
23 {Bel, Den, Fra, Ger, It, NL, Spa, UK}
90 {Alb, And, Bos, Bul, Cro, Cyp, Cze, Eir, Est, Far, Fin, Gib, Gre, Hun, Ice, Lat, Lie, Lit, Lux, Mac, Mal, Mon, Pol, SM, Ser, Slk, Sln}
13 {Aus, Nor, Rom, Swe, Swi, Ukr}

Page 27

Parameter Settings 1/2

For the final column, a “T” indicates a static hierarchy and “F” indicates that the hierarchy never became static.

Page 28

Parameter Settings 2/2

For the final column, a “T” indicates a static hierarchy and “F” indicates that the hierarchy never became static.

Page 29

Analysis

One solution would be to maintain a list of the hierarchy nodes removed with details of parents and siblings.

Another solution would be a posteriori manual inspection of the run-time output.

Page 30

Conclusions

TreeGCS overcomes the instability problem.

The algorithm adaptively determines the depth of the cluster hierarchy; there is no requirement to prespecify network dimensions as with most SOM-based algorithms.

There are no user-specified parameters for the superimposed hierarchy.

A further advantage of our approach over dendrograms is that leaf nodes in our hierarchy represent groups of input vectors.

Page 31

Personal Opinion

We can learn from TreeGCS the technique of subdividing a cluster by adding a new node to the tree in hierarchical clustering.

Page 32

Review

1. GCS: the seven steps of one epoch.

2. TreeGCS

3. Parameter Settings
