
Unsupervised Learning

Using ANNs


If the correct I/O association is not provided

a number of samples are simply presented

What does an ANN do with the samples?

Network topology

Layers and connections

Learning rules used

familiarity, principal component analysis, feature mapping, etc.

Learning paradigm

Competitive vs. cooperative

Update

Batch (off-line) update vs. interactive (on-line) update

Unsupervised Learning


Again, the Recurring Theme of

Finding to which class a sample belongs

Belongs to every class

Belongs to only one class

Belongs to a small group of classes

How a sample affects class statistics

Global weighted update

Competitive update

Collaborative update


Issues

Network topology

Do multiple layers help?

Training mechanism?

Separation of functionalities?

But lateral connections are often important

Update rules

Firing of neurons is instantaneous upon receiving inputs

Cf. k-means, which uses batch updates


Learning Rules

Even though we can use the same networks, we have to be careful about the learning rules

Rules that require backpropagation of error (knowing the correct I/O association) are not applicable

E.g., use Hebb's rule instead

Reward correlated pre- and post-synaptic firing


Learning Paradigm

Global learning

Nice guy, democratic approach

Cooperative

Try to maintain some kind of local structure with a radial basis attention function

Competitive learning

Playground bully approach

Mine only

Not only is it mine; stay as far away as possible


Simplest case

One linear unit with Hebb's learning rule

[Figure: a single linear unit with inputs x_1, x_2, x_3, x_4, weights w_1, w_2, w_3, w_4, and output y]

y = w^T x = \sum_i w_i x_i

Hebb's rule: \Delta w_i = \eta \, y \, x_i

weights are adjusted to be “similar” to inputs

more frequent input patterns dominate

pattern “familiarity” is learned

Similarity measure: the output y = w^T x is largest when w aligns with x
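As an illustration beyond the slides, here is a minimal NumPy sketch of this single unit under Hebb's rule; the data distribution, learning rate, and step count are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.01                          # learning rate (arbitrary)
w = rng.normal(size=4)              # weights w_1..w_4, random start

# One input pattern occurs far more often than anything else.
frequent = np.array([1.0, 1.0, 0.0, 0.0])
for _ in range(1000):
    x = frequent + 0.1 * rng.normal(size=4)  # frequent pattern plus noise
    y = w @ x                                # linear unit: y = w^T x
    w += eta * y * x                         # Hebb's rule: dw_i = eta*y*x_i

# The direction aligns with the frequent pattern ("familiarity" is learned),
# but the norm keeps growing -- the instability analyzed next.
print(w / np.linalg.norm(w))
print(np.linalg.norm(w))
```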


At equilibrium (after many patterns have been observed and the weight vector no longer changes significantly):

\langle \Delta w_i \rangle = \eta \langle y x_i \rangle = \eta \sum_j \langle x_i x_j \rangle w_j = \eta \sum_j C_{ij} w_j = 0

where C_{ij} = \langle x_i x_j \rangle, i.e., C = \langle x x^T \rangle, is the input correlation matrix

Cw = 0 implies that w is an eigenvector of matrix C with zero eigenvalue

but this cannot be stable! C is positive semi-definite, so under the averaged dynamics dw/dt = \eta C w any component of w along an eigenvector with positive eigenvalue keeps growing


Train a network with the same pattern over and over again

weights will go to infinity, dominated by the eigenvectors with the largest eigenvalues:

w' = w + \eta C w = (I + \eta C) w

w'' = (I + \eta C) w' = (I + \eta C)^2 w

...

w^{(n)} = (I + \eta C)^n w

Expanding w = \sum_i a_i u_i in the eigenbasis \{u_i\} of C gives

w^{(n)} = \sum_i a_i (1 + \eta \lambda_i)^n u_i

which is eventually dominated by u_1, the eigenvector with the largest eigenvalue \lambda_1
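To see the divergence numerically, here is a small sketch (my addition; the matrix C below is made up) of the averaged dynamics w ← (I + ηC)w:

```python
import numpy as np

# A made-up correlation matrix C = <x x^T>: symmetric, positive semi-definite.
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])
eta = 0.1
w = np.array([1.0, -1.0])

for n in range(50):
    w = w + eta * C @ w            # w^(n) = (I + eta*C)^n w^(0)

evals, evecs = np.linalg.eigh(C)
top = evecs[:, np.argmax(evals)]
print("|w| =", np.linalg.norm(w))            # blows up
print("direction:", w / np.linalg.norm(w))   # ~ +/- top eigenvector
print("top eigenvector:", top)
```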


Oja’s learning rule

similar learning effect to Hebb's rule

If the weight already conforms to the pattern, don't learn

without divergence of the weight vector

the weight vector converges to the maximal eigenvector

can be generalized to locate other eigenvectors (principal component analysis)

\Delta w_i = \eta \, y (x_i - y w_i)
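For contrast with the divergent Hebbian case, a minimal sketch (assuming 2D zero-mean Gaussian data and a fixed small η) of Oja's rule converging to the unit-norm principal eigenvector:

```python
import numpy as np

rng = np.random.default_rng(1)
# Zero-mean data stretched along one axis, then rotated: its principal
# component is the direction of largest variance.
X = rng.normal(size=(5000, 2)) * np.array([3.0, 0.5])
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = X @ R.T

eta = 0.001
w = rng.normal(size=2)
for x in X:
    y = w @ x
    w += eta * y * (x - y * w)     # Oja's rule: dw_i = eta*y*(x_i - y*w_i)

C = X.T @ X / len(X)
evals, evecs = np.linalg.eigh(C)
print("Oja w:", w)                             # unit norm, ~ +/- PC1
print("PC1:  ", evecs[:, np.argmax(evals)])
```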


Unsupervised Competitive Learning

Clustering or categorizing data

Only one output active (winner-take-all)

Lateral inhibition

Each output neuron y_i stands for one class

[Figure: inputs x_1, x_2, x_3, x_4 fully connected to output neurons y_1, y_2, y_3, with lateral inhibition among the outputs]


Simple competitive learning

one-layer network

decision rule: the (most) similar unit learns

|w_{i*} - x| \le |w_i - x| (for all i)

or, if the weights are normalized (|w_i| = 1 for all i),

w_{i*} \cdot x \ge w_i \cdot x (for all i)

update rule: move the winner closer to the input pattern

\Delta w_{ij} = \eta \, \delta_{i,i*} \, (x_j^u - w_{ij})
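A minimal sketch of this winner-take-all scheme; the three Gaussian blobs, unit count, and η are toy choices of mine:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy data: three 2D Gaussian blobs.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.concatenate([c + 0.5 * rng.normal(size=(200, 2)) for c in centers])
rng.shuffle(X)

eta = 0.05
# Initialize the three weight vectors from data samples to avoid dead units.
w = X[rng.choice(len(X), size=3, replace=False)].copy()

for x in X:
    winner = np.argmin(np.linalg.norm(w - x, axis=1))  # decision rule: nearest wins
    w[winner] += eta * (x - w[winner])                 # only the winner learns

print(w)  # each row ends up near one of the cluster centers
```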


Competitive Learning Example

[Figure: input data, initial placement, and final placement of the weight vectors]


More Examples


Vector Quantization

A compression technique to represent input vectors with a smaller number of "code" (representative, prototype) vectors

Standard decision rule + learning rule

Learning Vector Quantization

standard decision rule + update rule:

\Delta w_{i*j} = +\eta (x_j^u - w_{i*j}) if x^u is in the winner's (correct) class

\Delta w_{i*j} = -\eta (x_j^u - w_{i*j}) if x^u is not in the correct class

Playground bully: push others away
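A sketch of this LVQ update on made-up labeled data (two Gaussian classes, one prototype per class, fixed η):

```python
import numpy as np

rng = np.random.default_rng(3)
# Labeled toy data: two 2D Gaussian classes.
X0 = rng.normal(size=(200, 2)) + np.array([0.0, 0.0])
X1 = rng.normal(size=(200, 2)) + np.array([4.0, 4.0])
X = np.concatenate([X0, X1])
labels = np.array([0] * 200 + [1] * 200)
order = rng.permutation(len(X))

eta = 0.05
w = np.array([[1.0, 1.0], [3.0, 3.0]])   # one prototype per class
proto_class = np.array([0, 1])

for idx in order:
    x, c = X[idx], labels[idx]
    winner = np.argmin(np.linalg.norm(w - x, axis=1))  # standard decision rule
    if proto_class[winner] == c:
        w[winner] += eta * (x - w[winner])   # pull toward x (correct class)
    else:
        w[winner] -= eta * (x - w[winner])   # push away (wrong class)

print(w)  # prototypes separate toward their own class means
```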


Feature Mapping

A topology preserving map

Similar inputs map to outputs that are close by

[Figure: feature-mapping network driven by inputs x_1 and x_2]


Kohonen Map (Self-Organizing Map)

Preserve neighborhood relations

Decision rule: |w_{i*} - x| \le |w_i - x| (for all i)

Update rule: \Delta w_{ij} = \eta \, \Lambda(i, i^*) \, (x_j - w_{ij})

\Lambda(i, i^*) = e^{-|r_i - r_{i^*}|^2 / 2\sigma^2}, where r_i is the position of unit i in the output array

initially, the neighborhood is large; gradually the neighborhood narrows down

Learning rate \eta and neighborhood size \sigma both drop through the iterations
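A compact sketch of these two rules on a 1D chain of units receiving 2D inputs; the linear decay schedules for η and σ are my own arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(2000, 2))     # inputs uniform in the unit square

n_units = 10
w = rng.uniform(0, 1, size=(n_units, 2))  # 1D chain of units, weights in 2D
r = np.arange(n_units)                    # unit positions in the output array

for t, x in enumerate(X):
    frac = t / len(X)
    eta = 0.5 * (1 - frac) + 0.01 * frac          # learning rate decays
    sigma = 3.0 * (1 - frac) + 0.5 * frac         # neighborhood narrows
    winner = np.argmin(np.linalg.norm(w - x, axis=1))        # decision rule
    Lam = np.exp(-(r - r[winner]) ** 2 / (2 * sigma ** 2))   # neighborhood fn
    w += eta * Lam[:, None] * (x - w)                        # update rule

print(w)  # neighboring units end up close together: topology preserved
```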


SOM Example


More Examples


More Examples


[Figure: three data clusters in 3D; their projection from 3D to 2D; and the resulting SOM]


Traveling salesman problem

Mapping from a plane to a 1D ring

Modified Kohonen algorithm

Standard decision rule + update rule:

1st term: pulling weight to a particular city

2nd term: minimize inter-city distance

\Delta w_i = \eta \, \Lambda(i, i^*) \, (x^u - w_i) + \eta k (w_{i+1} - 2 w_i + w_{i-1})

\Lambda(i, i^*) = e^{-|i - i^*|^2 / 2\sigma^2} \big/ \sum_j e^{-|j - i^*|^2 / 2\sigma^2}
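A sketch of the ring version under assumed settings (random cities, linear η and σ schedules, elastic constant k chosen by hand); np.roll supplies w_{i+1} and w_{i-1} around the ring:

```python
import numpy as np

rng = np.random.default_rng(5)
cities = rng.uniform(0, 1, size=(10, 2))   # hypothetical cities in the plane

n = 30                                     # units on the 1D ring
idx = np.arange(n)
angles = 2 * np.pi * idx / n
w = 0.5 + 0.2 * np.column_stack([np.cos(angles), np.sin(angles)])  # start: circle

k = 0.1                                    # elastic (inter-city distance) term
for t in range(3000):
    frac = t / 3000
    eta = 0.5 * (1 - frac) + 0.01 * frac
    sigma = 5.0 * (1 - frac) + 0.5 * frac
    x = cities[rng.integers(len(cities))]             # present a random city
    winner = np.argmin(np.linalg.norm(w - x, axis=1))
    d = np.minimum(np.abs(idx - winner), n - np.abs(idx - winner))  # ring distance
    Lam = np.exp(-d ** 2 / (2 * sigma ** 2))
    Lam = Lam / Lam.sum()                              # normalized neighborhood
    elastic = np.roll(w, -1, axis=0) - 2 * w + np.roll(w, 1, axis=0)
    w += eta * (Lam[:, None] * (x - w) + k * elastic)  # city pull + ring tension

tour = np.argsort([np.argmin(np.linalg.norm(w - c, axis=1)) for c in cities])
print(tour)   # visiting order of the cities around the ring
```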


Hybrid Learning Schemes

Improved speed

Satisfactory performance

Unsupervised layer: clustering (divide the input space into a Voronoi tessellation)

Supervised layer: key-value lookup

[Figure: input x feeds an unsupervised layer that produces y, which feeds a supervised layer that produces the output z]


Example 1:

input to hidden layer: competitive learning

hidden to output layer: general delta rule

Example 2 (radial basis function):

input to hidden layer:

|w_{i*} - x| \le |w_i - x| (for all i)

hidden-unit activations (normalized Gaussians centered at u_j):

g_j(x) = e^{-(x - u_j)^2 / 2\sigma_j^2} \big/ \sum_i e^{-(x - u_i)^2 / 2\sigma_i^2}

hidden to output layer (delta rule, with target z_i):

\Delta w_{ij} = \eta (z_i - O_i) y_j
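A minimal two-stage sketch of Example 2 on a toy 1D regression task (my assumptions: competitive learning places the centers u_j, widths σ_j are fixed, and a single linear output is trained by the delta rule):

```python
import numpy as np

rng = np.random.default_rng(6)
# Toy 1D regression task: learn z = sin(x) on [0, 2*pi].
X = rng.uniform(0, 2 * np.pi, size=500)
Z = np.sin(X)

# Stage 1 (unsupervised): competitive learning places the RBF centers u_j.
n_hidden = 8
u = X[rng.choice(len(X), n_hidden, replace=False)].copy()
for x in X:
    j = np.argmin(np.abs(u - x))           # winner-take-all
    u[j] += 0.1 * (x - u[j])               # move winner toward the sample
sigma = np.full(n_hidden, 0.5)             # fixed widths (a simplification)

def g(x):
    # Normalized Gaussian activations g_j(x).
    e = np.exp(-(x - u) ** 2 / (2 * sigma ** 2))
    return e / e.sum()

# Stage 2 (supervised): delta rule on the linear output O = w . y.
w = np.zeros(n_hidden)
eta = 0.5
for _ in range(20):
    for x, z in zip(X, Z):
        y = g(x)
        O = w @ y
        w += eta * (z - O) * y             # delta rule: dw_j = eta*(z - O)*y_j

x_test = np.pi / 2
print(w @ g(x_test), "vs target", np.sin(x_test))
```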