Page 1: Neural Networks Part 4

Neural Networks
Part 4

Dan Simon
Cleveland State University

1

Page 2: Neural Networks Part 4

Outline

1. Learning Vector Quantization (LVQ)
2. The Optimal Interpolative Net (OINet)

2

Page 3: Neural Networks Part 4

Learning Vector Quantization (LVQ)

Invented by Teuvo Kohonen in 1981
Same architecture as the Kohonen Self-Organizing Map
Supervised learning

[Network diagram: inputs x1, …, xi, …, xn fully connected to outputs y1, …, yk, …, ym through weights w11, …, wik, …, wnm]

3

Page 4: Neural Networks Part 4

LVQ Notation:

x = [x1, …, xn] = training vector
T(x) = target; the class or category to which x belongs
wk = weight vector of the k-th output unit = [w1k, …, wnk]
α = learning rate

LVQ Algorithm:
Initialize reference vectors (that is, vectors which represent prototype inputs for each class)
while not (termination criterion)
    for each training vector x
        k0 = argmin_k || x – wk ||
        if k0 = T(x) then
            wk0 ← wk0 + α(x – wk0)
        else
            wk0 ← wk0 – α(x – wk0)
        end if
    end for
end while
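A minimal NumPy sketch of this update rule (a hedged illustration, not the course's LVQ1.m program; the function name, the fixed learning rate, and the epoch count are assumptions):

```python
import numpy as np

def train_lvq(X, targets, W, classes, alpha=0.1, epochs=20):
    """LVQ training sketch.
    X: (q, n) training vectors; targets: (q,) class label of each training vector.
    W: (m, n) reference vectors; classes: (m,) class label assigned to each reference vector.
    """
    for _ in range(epochs):
        for x, t in zip(X, targets):
            k0 = np.argmin(np.linalg.norm(W - x, axis=1))  # closest reference vector
            if classes[k0] == t:
                W[k0] += alpha * (x - W[k0])   # move the winner toward x
            else:
                W[k0] -= alpha * (x - W[k0])   # move the winner away from x
    return W
```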

4

Page 5: Neural Networks Part 4

LVQ Example

[Figure: reference vectors w1, w2, w3 and a training input x, with the difference x – w2 shown]

We have three input classes. Training input x is closest to w2.

If x ∈ class 2, then w2 ← w2 + α(x – w2), that is, move w2 towards x.
If x ∉ class 2, then w2 ← w2 – α(x – w2), that is, move w2 away from x.

LVQ reference vector initialization:

1. Use a random selection of training vectors, one from each class.
2. Use randomly-generated weight vectors.
3. Use a clustering method (e.g., the Kohonen SOM).

5

Page 6: Neural Networks Part 4

LVQ Example: LVQ1.m

(1, 1, 0, 0) Class 1
(0, 0, 0, 1) Class 2
(0, 0, 1, 1) Class 2
(1, 0, 0, 0) Class 1
(0, 1, 1, 0) Class 2

Final weight vectors:
(1.04, 0.57, 0.04, 0.00)
(0.00, 0.30, 0.62, 0.70)
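One possible way to set up an experiment like this with the sketch above (assuming, as in option 1 of the initialization list, that the first vector of each class seeds the reference vectors and the remaining vectors are used for training; the exact setup of LVQ1.m may differ):

```python
X = np.array([[1, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 1],
              [1, 0, 0, 0],
              [0, 1, 1, 0]], dtype=float)
targets = np.array([1, 2, 2, 1, 2])

W0 = np.vstack([X[0], X[1]])      # one reference vector per class
classes = np.array([1, 2])
W = train_lvq(X[2:], targets[2:], W0, classes, alpha=0.1, epochs=100)
```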

6

Page 7: Neural Networks Part 4

LVQ Example: LVQ2.m

Training data from Fausett, p. 190.

Four initial weight vectors are at the corners of the training data.

[Figures: training data and final result, plotted on the unit square]

Final classification results on the training data, and final weight vectors. 14 classification errors after 20 iterations.

7

Page 8: Neural Networks Part 4

8

LVQ Example: LVQ3.m

Training data from Fausett, p. 190. 20 initial weight vectors are randomly chosen with random classes.

[Figures: training data and final result, plotted on axes from –0.2 to 1.2]

Final classification results on the training data, and final weight vectors. Four classification errors after 600 iterations.

In practice it would be better to use our training data to assign the classes of the initial weight vectors.

Page 9: Neural Networks Part 4

LVQ Extensions:

9

[Figure: reference vectors w1, w2, w3 and a training input x, with the difference x – w2 shown]

The graphical illustration of LVQ gives us some ideas for algorithmic modifications.
• Always move the correct vector towards x, and move the closest p vectors that are incorrect away from x.
• Move incorrect vectors away from x only if they are within a distance threshold.
• Popular modifications are called LVQ2, LVQ2.1, and LVQ3 (not to be confused with the names of our Matlab programs).

Page 10: Neural Networks Part 4

10

LVQ Applications to Control Systems:

• Most LVQ applications involve classification
• Any classification algorithm can be adapted for control
• Switching control – switch between control algorithms based on the system features (input type, system parameters, objectives, failure type, …)
• Training rules for a fuzzy controller – if input 1 is Ai and input 2 is Bk, then output is Cik – LVQ can be used to classify x and y
• User intent recognition – for example, a brain machine interface (BMI) can recognize what the user is trying to do

Page 11: Neural Networks Part 4

11

The optimal interpolative net (OINet) – March 1992
Pattern classification; M classes; if x ∈ class k, then yi = δik (output k is 1 and the other outputs are 0)

The network grows during training, but only as large as needed.

[Network diagram: N inputs x1, …, xN connected to q hidden neurons through weights v11, …, vNq; hidden neurons connected to M outputs y1, …, yM through weights w11, …, wqM]

vi = weight vector to the i-th hidden neuron; vi is N-dimensional
vi = prototype; { vi } ⊂ { xi }

Page 12: Neural Networks Part 4

12

$$ y(x) = W^T \begin{bmatrix} \phi(\|v_1 - x\|) \\ \vdots \\ \phi(\|v_q - x\|) \end{bmatrix}, \qquad (M \times 1) = (M \times q)(q \times 1) $$

Suppose we have q training samples: y(x_i) = y_i, for i ∈ {1, …, q}. Then:

$$ Y = [\,y_1 \;\cdots\; y_q\,] = W^T G, \qquad G_{ik} = \phi(\|v_i - x_k\|), \qquad (M \times q) = (M \times q)(q \times q) $$
$$ W^T = Y G^{-1} $$

Note: if xi = xk for some i ≠ k, then G is singular.

Page 13: Neural Networks Part 4

13

The OINet works by selecting a set of {xi} to use as input weights. These are the prototype vectors {vi}, i = 1, …, p.

Choose a set of {xi} to optimize the output weights W. These are the subprototypes {zi}, i = 1, …, l.

Include xi in {zi} only if needed to correctly classify xi.

Include xi in {vi} only if G is not ill-conditioned, and only if it decreases the total classification error.

In more explicit notation:

$$ Y_l = (W_p^l)^T G_p^l, \qquad (M \times l) = (M \times p)(p \times l) $$
$$ G_p^l \in \mathbb{R}^{p \times l}, \qquad (G_p^l)_{ik} = \phi(\|v_i - z_k\|) $$
$$ W_p^l = \left( G_p^l (G_p^l)^T \right)^{-1} G_p^l\, Y_l^T = R_p^l\, G_p^l\, Y_l^T, \qquad R_p^l = \left( G_p^l (G_p^l)^T \right)^{-1} $$

Use l inputs for training. Use p hidden neurons.
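A small NumPy sketch of these least-squares output weights, assuming a Gaussian basis function φ (the basis function and its width σ are illustrative assumptions, not values specified in the slides):

```python
import numpy as np

def phi(d, sigma=1.0):
    """Radial basis function of distance d (Gaussian assumed)."""
    return np.exp(-d**2 / (2 * sigma**2))

def g_matrix(V, Z):
    """G[i, k] = phi(||v_i - z_k||) for prototypes V (p, N) and subprototypes Z (l, N)."""
    d = np.linalg.norm(V[:, None, :] - Z[None, :, :], axis=2)   # (p, l) distances
    return phi(d)

def output_weights(G, Y):
    """Least-squares weights W = (G G^T)^{-1} G Y^T = R G Y^T, with Y of shape (M, l)."""
    R = np.linalg.inv(G @ G.T)
    return R @ G @ Y.T, R                                       # W is (p, M)

def oinet_output(x, V, W):
    """Network output y(x) = W^T [phi(||v_i - x||)]."""
    return W.T @ phi(np.linalg.norm(V - x, axis=1))
```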

Page 14: Neural Networks Part 4

14

Suppose we have trained the network for a certain l and p. All training inputs considered so far have been correctly classified.

We then consider xi, the next input in the training set. Is xi correctly classified with the existing OINet? If so, everything is fine, and we move on to the next training input.

If not, then we need to add xi to the subprototype set {zi} and obtain a new set of output weights W. We also consider adding xi to the prototype set {vi}, but only if it does not make G ill-conditioned, and only if it reduces the error by enough.

$$ Y_{l+1} = (W_p^{l+1})^T G_p^{l+1}, \qquad (M \times (l+1)) = (M \times p)\,(p \times (l+1)) $$
$$ G_p^{l+1} = \begin{bmatrix} \phi(\|v_1 - z_1\|) & \cdots & \phi(\|v_1 - z_{l+1}\|) \\ \vdots & & \vdots \\ \phi(\|v_p - z_1\|) & \cdots & \phi(\|v_p - z_{l+1}\|) \end{bmatrix}, \qquad Y_{l+1} = [\,y_1 \;\cdots\; y_{l+1}\,] $$

Page 15: Neural Networks Part 4

15

Suppose we have trained the network for a certain l and p.

Suppose xi, the next input in the training set, is not correctly classified. We need to add xi to the subprototype set {zi} and retrain W.

We have $W_p^l$, $G_p^l$, and $R_p^l = \left( G_p^l (G_p^l)^T \right)^{-1}$. Set $z_{l+1} = x_i$:

$$ k_p^{l+1} = \begin{bmatrix} \phi(\|v_1 - z_{l+1}\|) \\ \vdots \\ \phi(\|v_p - z_{l+1}\|) \end{bmatrix} \quad \text{(Eq. 11)}, \qquad G_p^{l+1} = [\,G_p^l \;\; k_p^{l+1}\,], \qquad Y_{l+1} = [\,Y_l \;\; y_{l+1}\,] $$
$$ W_p^{l+1} = R_p^{l+1} G_p^{l+1} Y_{l+1}^T, \qquad R_p^{l+1} = \left( G_p^{l+1} (G_p^{l+1})^T \right)^{-1} $$

This is going to get expensive if we have lots of data, and if we have to perform a new matrix inversion every time we add a subprototype.
Note: Equation numbers from here on refer to those in Sin & deFigueiredo.

Page 16: Neural Networks Part 4

16

Matrix inversion lemma (prove for homework):

$$ (A + BDC)^{-1} = A^{-1} - A^{-1}B\left( D^{-1} + CA^{-1}B \right)^{-1}CA^{-1} $$

Apply it with $A = G_p^l (G_p^l)^T$, $B = C^T = k_p^{l+1}$, and $D = 1$:

$$ R_p^{l+1} = \left( G_p^{l+1}(G_p^{l+1})^T \right)^{-1} = \left( [\,G_p^l \;\; k_p^{l+1}\,]\,[\,G_p^l \;\; k_p^{l+1}\,]^T \right)^{-1} = \left( G_p^l(G_p^l)^T + k_p^{l+1}(k_p^{l+1})^T \right)^{-1} $$
$$ R_p^{l+1} = R_p^l - \frac{R_p^l\, k_p^{l+1}\,(k_p^{l+1})^T R_p^l}{1 + (k_p^{l+1})^T R_p^l\, k_p^{l+1}} \quad \text{(Eq. 17)} $$

We only need scalar division to compute the new inverse, because we already know the old inverse!
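A quick numerical check of this rank-one update (random matrices, NumPy; purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 6))     # G_p^l with p = 4, l = 6
k = rng.standard_normal((4, 1))     # new column k_p^{l+1}

R_old = np.linalg.inv(G @ G.T)
# Eq. 17: only a scalar division is needed to update the inverse
R_new = R_old - (R_old @ k @ k.T @ R_old) / (1.0 + k.T @ R_old @ k)

G_new = np.hstack([G, k])           # G_p^{l+1} = [G_p^l  k_p^{l+1}]
assert np.allclose(R_new, np.linalg.inv(G_new @ G_new.T))
```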

Page 17: Neural Networks Part 4

17

We can implement this equation, but we can also find a recursive solution.

$$ W_p^{l+1} = R_p^{l+1} G_p^{l+1} Y_{l+1}^T, \qquad G_p^{l+1} Y_{l+1}^T = G_p^l Y_l^T + k_p^{l+1}\, y_{l+1}^T $$

Substitute the recursion for $R_p^{l+1}$ (Eq. 17), writing $k = k_p^{l+1}$ for brevity:

$$ W_p^{l+1} = \left[ R_p^l - \frac{R_p^l\, k\, k^T R_p^l}{1 + k^T R_p^l\, k} \right]\left( G_p^l Y_l^T + k\, y_{l+1}^T \right) $$
$$ = W_p^l + R_p^l\, k\, y_{l+1}^T - \frac{R_p^l\, k\left( k^T W_p^l + k^T R_p^l\, k\, y_{l+1}^T \right)}{1 + k^T R_p^l\, k} $$

Page 18: Neural Networks Part 4

18

Collecting terms over the common denominator, and using $W_p^l = R_p^l G_p^l Y_l^T$:

$$ W_p^{l+1} = W_p^l + \frac{R_p^l\, k_p^{l+1}\left( y_{l+1}^T - (k_p^{l+1})^T W_p^l \right)}{1 + (k_p^{l+1})^T R_p^l\, k_p^{l+1}} \quad \text{(Eq. 20)} $$

We have a recursive equation for the new weight matrix.
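A numerical sanity check of Eq. 20 against a direct least-squares solve (random data; illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
p, l, M = 4, 6, 3
G = rng.standard_normal((p, l))      # G_p^l
Y = rng.standard_normal((M, l))      # Y_l
k = rng.standard_normal((p, 1))      # k_p^{l+1} for the new subprototype
y_new = rng.standard_normal((M, 1))  # y_{l+1}

R = np.linalg.inv(G @ G.T)
W = R @ G @ Y.T                      # W_p^l

# Eq. 20: recursive update when subprototype z_{l+1} is added
W_rec = W + (R @ k @ (y_new.T - k.T @ W)) / (1.0 + k.T @ R @ k)

# Direct recomputation for comparison
G1, Y1 = np.hstack([G, k]), np.hstack([Y, y_new])
W_direct = np.linalg.inv(G1 @ G1.T) @ G1 @ Y1.T
assert np.allclose(W_rec, W_direct)
```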

Page 19: Neural Networks Part 4

19

Now we have to decide if we should add xi to the prototype set {vi}. We will do this if it does not make G ill-conditioned, and if it reduces the error by enough.

Add the candidate prototype $v_{p+1} = x_i$; the G matrix grows by one row:

$$ r_{p+1}^{l+1} = \begin{bmatrix} \phi(\|v_{p+1} - z_1\|) \\ \vdots \\ \phi(\|v_{p+1} - z_{l+1}\|) \end{bmatrix}, \qquad G_{p+1}^{l+1} = \begin{bmatrix} G_p^{l+1} \\ (r_{p+1}^{l+1})^T \end{bmatrix}, \qquad Y_{l+1} = (W_{p+1}^{l+1})^T G_{p+1}^{l+1} $$
$$ W_{p+1}^{l+1} = \left( G_{p+1}^{l+1} (G_{p+1}^{l+1})^T \right)^{-1} G_{p+1}^{l+1} Y_{l+1}^T = R_{p+1}^{l+1} G_{p+1}^{l+1} Y_{l+1}^T \quad \text{(Eqs. 22–24)} $$

I wonder if we can think of something clever to avoid the new matrix inversion …

Page 20: Neural Networks Part 4

20

Writing $r = r_{p+1}^{l+1}$ for brevity:

$$ R_{p+1}^{l+1} = \left( G_{p+1}^{l+1}(G_{p+1}^{l+1})^T \right)^{-1} = \left( \begin{bmatrix} G_p^{l+1} \\ r^T \end{bmatrix} \begin{bmatrix} (G_p^{l+1})^T & r \end{bmatrix} \right)^{-1} = \begin{bmatrix} G_p^{l+1}(G_p^{l+1})^T & G_p^{l+1} r \\ r^T (G_p^{l+1})^T & r^T r \end{bmatrix}^{-1} \equiv \begin{bmatrix} A' & B' \\ C' & D' \end{bmatrix}^{-1} \quad \text{(Eq. 26)} $$

Another matrix inversion lemma (prove for homework). First define $E = D' - C'(A')^{-1}B'$; then

$$ \begin{bmatrix} A' & B' \\ C' & D' \end{bmatrix}^{-1} = \begin{bmatrix} (A')^{-1} + (A')^{-1}B'E^{-1}C'(A')^{-1} & -(A')^{-1}B'E^{-1} \\ -E^{-1}C'(A')^{-1} & E^{-1} \end{bmatrix} $$

Page 21: Neural Networks Part 4

21

Apply the lemma with $A' = G_p^{l+1}(G_p^{l+1})^T = (R_p^{l+1})^{-1}$, $B' = G_p^{l+1} r$, $C' = r^T(G_p^{l+1})^T$, and $D' = r^T r$. Define

$$ \hat r = r - (G_p^{l+1})^T R_p^{l+1} G_p^{l+1}\, r \quad \text{(Eq. 29)}, \qquad u = R_p^{l+1} G_p^{l+1}\, r \quad \text{(Eq. 30)} $$

so that $E = r^T r - r^T (G_p^{l+1})^T R_p^{l+1} G_p^{l+1}\, r = r^T \hat r$, a scalar (Eq. 28). Then

$$ R_{p+1}^{l+1} = \begin{bmatrix} R_p^{l+1} + \frac{u\, u^T}{r^T \hat r} & -\frac{u}{r^T \hat r} \\ -\frac{u^T}{r^T \hat r} & \frac{1}{r^T \hat r} \end{bmatrix} \quad \text{(Eq. 27)} $$

Homework: Derive this
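A numerical check of this block-form update (random data; illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
p, l1 = 4, 7                          # l1 = l + 1
G = rng.standard_normal((p, l1))      # G_p^{l+1}
r = rng.standard_normal((l1, 1))      # r_{p+1}^{l+1}

R = np.linalg.inv(G @ G.T)
u = R @ G @ r                         # Eq. 30
r_hat = r - G.T @ u                   # Eq. 29
rho = float(r.T @ r_hat)              # Eq. 28 (the scalar r^T r_hat)

# Eq. 27: block form of the new inverse, using only quantities already computed
R_new = np.block([[R + u @ u.T / rho, -u / rho],
                  [-u.T / rho, np.array([[1.0 / rho]])]])

G_new = np.vstack([G, r.T])           # G_{p+1}^{l+1}
assert np.allclose(R_new, np.linalg.inv(G_new @ G_new.T))
```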

Page 22: Neural Networks Part 4

22

We already have everything on the right side, so we can derive the new inverse with only scalar division (no additional matrix inversion).

Small $r^T \hat r$ ⇒ ill-conditioning. So don't use xi as a prototype if $r^T \hat r < \epsilon_1$ (a threshold).

Even if $r^T \hat r \ge \epsilon_1$, don't use xi as a prototype if the error only decreases by a small amount, because it won't be worth the extra network complexity.

$$ E_p^{l+1} = Y_{l+1} - (W_p^{l+1})^T G_p^{l+1}, \;\; \text{an } M \times (l+1) \text{ matrix (Eq. 35)}, \qquad e_p^{l+1} = \sum_{i,j}\left( E_p^{l+1} \right)_{ij}^2 \quad \text{(Eq. 38)} $$
$$ E_{p+1}^{l+1} = Y_{l+1} - (W_{p+1}^{l+1})^T G_{p+1}^{l+1}, \;\; \text{an } M \times (l+1) \text{ matrix}, \qquad e_{p+1}^{l+1} = \sum_{i,j}\left( E_{p+1}^{l+1} \right)_{ij}^2 $$

Before we check e, let’s see if we can find a recursive formula for W …

Page 23: Neural Networks Part 4

23

$$ W_{p+1}^{l+1} = R_{p+1}^{l+1} G_{p+1}^{l+1} Y_{l+1}^T = \begin{bmatrix} R_p^{l+1} + \frac{u\,u^T}{r^T\hat r} & -\frac{u}{r^T\hat r} \\ -\frac{u^T}{r^T\hat r} & \frac{1}{r^T\hat r} \end{bmatrix} \begin{bmatrix} G_p^{l+1} Y_{l+1}^T \\ r^T Y_{l+1}^T \end{bmatrix} $$

The bottom row of the product is

$$ \hat w^T = \frac{\left( r^T - u^T G_p^{l+1} \right) Y_{l+1}^T}{r^T \hat r} = \frac{\hat r^T\, Y_{l+1}^T}{r^T \hat r} \quad \text{(Eq. 34)} $$

and the top block simplifies (using $W_p^{l+1} = R_p^{l+1} G_p^{l+1} Y_{l+1}^T$) to $W_p^{l+1} - u\, \hat w^T$, so

$$ W_{p+1}^{l+1} = \begin{bmatrix} W_p^{l+1} - u\,\hat w^T \\ \hat w^T \end{bmatrix} \quad \text{(Eq. 33)} $$

Page 24: Neural Networks Part 4

24

The new bottom row $\hat w^T$ can be computed from quantities we already have:

$$ \hat w^T = \frac{\hat r^T Y_{l+1}^T}{r^T \hat r} = \frac{r^T Y_{l+1}^T - u^T G_p^{l+1} Y_{l+1}^T}{r^T \hat r} = \frac{r^T\left( Y_{l+1}^T - (G_p^{l+1})^T W_p^{l+1} \right)}{r^T \hat r} = \frac{r^T (E_p^{l+1})^T}{r^T \hat r} $$

so that, with $u = R_p^{l+1} G_p^{l+1} r$,

$$ W_{p+1}^{l+1} = \begin{bmatrix} W_p^{l+1} - R_p^{l+1} G_p^{l+1}\, r\, \hat w^T \\ \hat w^T \end{bmatrix} \quad \text{(Eq. 25)} $$

We have a recursive equation for the new weight matrix.

Page 25: Neural Networks Part 4

25

Back to computing e (see Eq. 38). Suppose Ax = b, with dim(x) < dim(b), so the system is over-determined. Least squares:

$$ x = (A^T A)^{-1} A^T b $$
$$ e = \|Ax - b\|^2 = \left\| \left( A(A^TA)^{-1}A^T - I \right) b \right\|^2 = b^T\left( I - A(A^TA)^{-1}A^T \right) b $$

(The last step uses the fact that $I - A(A^TA)^{-1}A^T$ is symmetric and idempotent.)

Now suppose that we add another column to A and another element to x:

$$ \hat A \hat x = b, \qquad \hat A = [\,A \;\; a\,], \qquad \hat A^T \hat A = \begin{bmatrix} A^TA & A^T a \\ a^T A & a^T a \end{bmatrix} $$

We have more degrees of freedom in x, so the approximation error should decrease.

Page 26: Neural Networks Part 4

26

Matrix inversion lemma: with $g = (A^TA)^{-1}A^T a$ and $\alpha = a^T a - a^T A (A^TA)^{-1} A^T a$,

$$ (\hat A^T \hat A)^{-1} = \begin{bmatrix} (A^TA)^{-1} + \frac{g\,g^T}{\alpha} & -\frac{g}{\alpha} \\ -\frac{g^T}{\alpha} & \frac{1}{\alpha} \end{bmatrix} $$
$$ \hat e = b^T\left( I - \hat A(\hat A^T\hat A)^{-1}\hat A^T \right) b, \qquad e - \hat e = b^T\left( \hat A(\hat A^T\hat A)^{-1}\hat A^T - A(A^TA)^{-1}A^T \right) b $$

But notice:

$$ \hat A(\hat A^T\hat A)^{-1}\hat A^T = A(A^TA)^{-1}A^T + \frac{A g\,g^T A^T - A g\, a^T - a\, g^T A^T + a\,a^T}{\alpha} $$

Page 27: Neural Networks Part 4

27

$$ e - \hat e = \frac{b^T\left( A g\, g^T A^T - A g\, a^T - a\, g^T A^T + a\, a^T \right) b}{\alpha} = \frac{(b^T A g)^2 - 2\,(b^T A g)(b^T a) + (b^T a)^2}{\alpha} = \frac{\left( b^T (A g - a) \right)^2}{\alpha} $$

Now consider $AX = B$ (error $e_1$), and $\hat A \hat X = [\,A \;\; a\,]\hat X = B$ (error $e_2$); $\hat A$ has one more column than $A$. Solve for $\hat X$ column by column and sum the error over the columns of $B$:

$$ e_1 - e_2 = \frac{\left\| B^T (A g - a) \right\|^2}{\alpha} $$

This is like going from $(G_p^{l+1})^T W_p^{l+1} = Y_{l+1}^T$ (error $e_1$) to $(G_{p+1}^{l+1})^T W_{p+1}^{l+1} = Y_{l+1}^T$ (error $e_2$).
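A quick numerical check of the single-output version of this error-decrease formula (random over-determined system; illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 4))
a = rng.standard_normal((10, 1))     # the extra column
b = rng.standard_normal((10, 1))

P = A @ np.linalg.inv(A.T @ A) @ A.T
e1 = float(b.T @ (np.eye(10) - P) @ b)

A_hat = np.hstack([A, a])
P_hat = A_hat @ np.linalg.inv(A_hat.T @ A_hat) @ A_hat.T
e2 = float(b.T @ (np.eye(10) - P_hat) @ b)

g = np.linalg.inv(A.T @ A) @ A.T @ a
alpha = float(a.T @ a - a.T @ P @ a)
assert np.isclose(e1 - e2, float(b.T @ (A @ g - a))**2 / alpha)
```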

Page 28: Neural Networks Part 4

28

Apply this with $A = (G_p^{l+1})^T$, $a = r_{p+1}^{l+1}$, and $B = Y_{l+1}^T$. Then $g = R_p^{l+1} G_p^{l+1} r = u$, $Ag - a = (G_p^{l+1})^T u - r = -\hat r$, and $\alpha = r^T \hat r$, so

$$ \Delta e = e_p^{l+1} - e_{p+1}^{l+1} = \frac{\left\| Y_{l+1}\, \hat r \right\|^2}{r^T \hat r} = \frac{\left\| Y_{l+1}\left( r - (G_p^{l+1})^T R_p^{l+1} G_p^{l+1} r \right) \right\|^2}{r^T \hat r} = \frac{\left\| E_p^{l+1}\, r \right\|^2}{r^T \hat r} $$

We have a very simple formula for the error decrease due to adding a prototype. Don't add the prototype unless $(\Delta e / e_1) > \epsilon_2$ (a threshold), where $e_1 = e_p^{l+1}$.
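A numerical check of this formula in the OINet's own quantities (random data; illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
p, l1, M = 3, 8, 2
G = rng.standard_normal((p, l1))     # G_p^{l+1}
Y = rng.standard_normal((M, l1))     # Y_{l+1}
r = rng.standard_normal((l1, 1))     # r_{p+1}^{l+1}

R = np.linalg.inv(G @ G.T)
W_p = R @ G @ Y.T
e1 = float(np.sum((Y - W_p.T @ G)**2))          # e_p^{l+1}

G_new = np.vstack([G, r.T])
W_new = np.linalg.inv(G_new @ G_new.T) @ G_new @ Y.T
e2 = float(np.sum((Y - W_new.T @ G_new)**2))    # e_{p+1}^{l+1}

E = Y - W_p.T @ G                               # E_p^{l+1}
r_hat = r - G.T @ R @ G @ r
delta_e = float(np.sum((E @ r)**2) / (r.T @ r_hat))
assert np.isclose(e1 - e2, delta_e)
```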

Page 29: Neural Networks Part 4

29

The OINet Algorithm

Training data {xi} → {yi}, i ∈ {1, …, q}
Initialize prototype set: V = {x1}, and # of prototypes = p = |V| = 1
Initialize subprototype set: Z = {x1}, and # of subprototypes = l = |Z| = 1

$$ G_1^1 = \phi(\|x_1 - x_1\|) = \phi(0), \qquad R_1^1 = \left( G_1^1 (G_1^1)^T \right)^{-1} = \frac{1}{\phi(0)^2}, \qquad W_1^1 = R_1^1 G_1^1\, y_1^T = \frac{y_1^T}{\phi(0)} $$

[Network diagram: N inputs x1, …, xN; a single hidden neuron with input weights v11, …, vN1; M outputs y1, …, yM with output weights w11, …, w1M]

What are the dimensions of these quantities?

Page 30: Neural Networks Part 4

30

Begin outer loop: loop until all training patterns are correctly classified.
  n = q – 1; re-index {x2 … xq} & {y2 … yq} from 1 to n
  For i = 1 to n (training data loop):
    Send xi through the network. If correctly classified, continue loop.
    Begin subprototype addition:

      z_{l+1} = xi
      k_p^{l+1} = [φ(||v1 – z_{l+1}||), …, φ(||vp – z_{l+1}||)]^T (Eq. 11)
      R_p^{l+1} computation (Eq. 17)
      W_p^{l+1} computation (Eq. 20)
      Consider using xi as a prototype (note v_{p+1} = z_{l+1}):
        r_{p+1}^{l+1} = [φ(||v_{p+1} – z1||), …, φ(||v_{p+1} – z_{l+1}||)]^T (Eqs. 23–24)
        r̂ computation (Eq. 29)

Page 31: Neural Networks Part 4

31

        u and r^T r̂ computations (Eqs. 10, 28, 30)
        E_p^{l+1} computation (Eq. 35)
        ŵ and Δe computations (Eqs. 34, 40)
        If Δe / e1 > ε2 and r^T r̂ > ε1, then use xi as a prototype:
          Update G (Eq. 22)
          Update W (Eq. 25)
          Update R (Eqs. 26–27)
          p = p + 1
        End prototype addition
      l = l + 1
    End subprototype addition
  End training data loop
End outer loop
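A compact NumPy sketch of this training loop. For clarity it refits the weights with direct least-squares solves instead of the recursive Eqs. 11–30, and the basis-function width, the error threshold, the conditioning test, and the sweep cap are all illustrative assumptions rather than values from Sin & deFigueiredo:

```python
import numpy as np

def phi(d, sigma=1.0):
    """Gaussian radial basis function (an assumed choice)."""
    return np.exp(-d**2 / (2 * sigma**2))

def g_matrix(V, Z):
    """G[i, k] = phi(||v_i - z_k||)."""
    V, Z = np.asarray(V), np.asarray(Z)
    return phi(np.linalg.norm(V[:, None, :] - Z[None, :, :], axis=2))

def fit_weights(V, Z, Yz):
    """Least-squares output weights W = R G Yz^T for targets Yz of shape (M, l)."""
    G = g_matrix(V, Z)
    return np.linalg.inv(G @ G.T) @ G @ Yz.T          # (p, M)

def classify(x, V, W):
    return int(np.argmax(W.T @ phi(np.linalg.norm(np.asarray(V) - x, axis=1))))

def sse(V, Z, Yz, W):
    """Sum-of-squares output error over the subprototypes (cf. Eqs. 35 and 38)."""
    return float(np.sum((Yz - W.T @ g_matrix(V, Z))**2))

def train_oinet(X, Y, eps2=1e-3, max_sweeps=20):
    """X: (q, N) inputs; Y: (q, M) target rows (e.g. one-hot). Sketch only."""
    V, Z = [X[0]], [X[0]]                 # prototype and subprototype sets
    Yz = Y[[0]].T                         # (M, l) targets of the subprototypes
    W = fit_weights(V, Z, Yz)
    for _ in range(max_sweeps):
        all_ok = True
        for x, y in zip(X[1:], Y[1:]):
            if classify(x, V, W) == int(np.argmax(y)):
                continue                  # correctly classified, move on
            all_ok = False
            Z.append(x)                   # subprototype addition
            Yz = np.hstack([Yz, y[:, None]])
            W = fit_weights(V, Z, Yz)
            e1 = sse(V, Z, Yz, W)
            Gc = g_matrix(V + [x], Z)     # candidate prototype addition
            if np.linalg.cond(Gc @ Gc.T) < 1e8:   # crude stand-in for the eps1 test
                W2 = fit_weights(V + [x], Z, Yz)
                if e1 > 0 and (e1 - sse(V + [x], Z, Yz, W2)) / e1 > eps2:
                    V, W = V + [x], W2    # keep the prototype: error dropped enough
        if all_ok:
            break
    return np.asarray(V), W
```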

Page 32: Neural Networks Part 4

32

Homework:

• Implement the OINet using FPGA technology for classifying subatomic particles using experimental data. You may need to build your own particle accelerator to collect data. Be careful not to create any black holes.
• Find the typo in Sin and deFigueiredo's original OINet paper.

Page 33: Neural Networks Part 4

References

• L. Fausett, Fundamentals of Neural Networks, Prentice Hall, 1994
• S. Sin and R. deFigueiredo, "An evolution-oriented learning algorithm for the optimal interpolative net," IEEE Transactions on Neural Networks, vol. 3, no. 2, pp. 315–323, March 1992

33

