Page 1: CS621: Artificial Intelligence

CS621: Artificial Intelligence
Pushpak Bhattacharyya

CSE Dept., IIT Bombay

Lecture 43– Perceptron Capacity; Perceptron Training; Convergence

8th Nov, 2010

Page 2: CS621: Artificial Intelligence

The Perceptron Model

A perceptron is a computing element with input lines having associated weights and the cell having a threshold value. The perceptron model is motivated by the biological neuron.

[Figure: a perceptron with inputs x1 … xn, weights w1 … wn, threshold θ, and output y]

Page 3: CS621: Artificial Intelligence

Step function / Threshold function:

y = 1 for Σ wi xi ≥ θ
y = 0 otherwise

[Figure: plot of y against Σ wi xi, stepping from 0 to 1 at θ]
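A minimal sketch of this threshold unit in Python (illustrative, not from the lecture; the function name and the AND example are my own):

```python
# A minimal sketch of the perceptron model described above.

def perceptron_output(weights, inputs, theta):
    """y = 1 if the weighted input sum reaches the threshold theta, else y = 0."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

# Example: with w = (1, 1) and theta = 2 the perceptron computes boolean AND.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron_output((1, 1), (x1, x2), theta=2))
```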

Page 4: CS621: Artificial Intelligence

Features of Perceptron

• Input-output behavior is discontinuous and the derivative does not exist at Σ wi xi = θ
• Σ wi xi − θ is the net input, denoted as net
• Referred to as a linear threshold element: linearity because x appears with power 1
• y = f(net): the relation between y and net is non-linear

Page 5: CS621: Artificial Intelligence

Recap on Capacity

Page 6: CS621: Artificial Intelligence

Fundamental Observation

The number of TFs (threshold functions) computable by a perceptron is equal to the number of regions produced by 2^n hyperplanes, obtained by plugging in the values <x1, x2, x3, …, xn> in the equation

Σ_{i=1}^{n} wi xi = θ

Page 7: CS621: Artificial Intelligence

The number of regions formed by n hyperplanes in d-dim, all passing through the origin, is given by the following recurrence relation:

R_{n,d} = R_{n-1,d} + R_{n-1,d-1}

We use a generating function as a tool to solve this recurrence.

Boundary conditions:
1 hyperplane in d-dim divides the space into 2 regions: R_{1,d} = 2
n hyperplanes in 1-dim all reduce to points at the origin, giving 2 regions: R_{n,1} = 2

The generating function is

f(x,y) = Σ_{n≥1} Σ_{d≥1} R_{n,d} x^n y^d
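As a quick check, the recurrence and its boundary conditions can be evaluated directly; this sketch (mine, not the lecture's) memoizes R(n, d):

```python
from functools import lru_cache

# Evaluate the recurrence R(n, d) = R(n-1, d) + R(n-1, d-1)
# with the boundary conditions R(1, d) = R(n, 1) = 2 from the slide.

@lru_cache(maxsize=None)
def R(n, d):
    if n == 1 or d == 1:
        return 2  # boundary conditions
    return R(n - 1, d) + R(n - 1, d - 1)

# e.g. regions formed by 4 hyperplanes through the origin in 2-dim:
print(R(4, 2))  # 8
```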

Page 8: CS621: Artificial Intelligence

From the recurrence relation we have

R_{n,d} - R_{n-1,d} - R_{n-1,d-1} = 0

R_{n-1,d} corresponds to 'shifting' n by 1 place => multiplication by x; R_{n-1,d-1} corresponds to 'shifting' n and d by 1 place => multiplication by xy.

On expanding f(x,y) we get

f(x,y) = R_{1,1}xy + R_{1,2}xy^2 + R_{1,3}xy^3 + … + R_{1,d}xy^d + …
       + R_{2,1}x^2y + R_{2,2}x^2y^2 + R_{2,3}x^2y^3 + … + R_{2,d}x^2y^d + …
       + …
       + R_{n,1}x^ny + R_{n,2}x^ny^2 + … + R_{n,d}x^ny^d + …

Page 9: CS621: Artificial Intelligence

f(x,y) = Σ_{n≥1} Σ_{d≥1} R_{n,d} x^n y^d

x · f(x,y) = Σ_{n≥2} Σ_{d≥1} R_{n-1,d} x^n y^d

xy · f(x,y) = Σ_{n≥2} Σ_{d≥2} R_{n-1,d-1} x^n y^d

x · f(x,y) = Σ_{n≥2} R_{n-1,1} x^n y + Σ_{n≥2} Σ_{d≥2} R_{n-1,d} x^n y^d
           = Σ_{n≥2} 2 x^n y + Σ_{n≥2} Σ_{d≥2} R_{n-1,d} x^n y^d      (since R_{n-1,1} = 2)

Page 10: CS621: Artificial Intelligence

After all this expansion,

f(x,y) - x f(x,y) - xy f(x,y)
  = Σ_{n≥2} Σ_{d≥2} [R_{n,d} - R_{n-1,d} - R_{n-1,d-1}] x^n y^d
    + Σ_{d≥1} R_{1,d} x y^d + Σ_{n≥2} [R_{n,1} - 2] x^n y

since the other two terms become zero (the first by the recurrence relation, the last because R_{n,1} = 2), we are left with

(1 - x - xy) f(x,y) = Σ_{d≥1} R_{1,d} x y^d = Σ_{d≥1} 2 x y^d = 2x(y + y^2 + y^3 + …)

Page 11: CS621: Artificial Intelligence

This implies

[1 - x(1+y)] f(x,y) = 2x(y + y^2 + y^3 + …)

f(x,y) = 2x [y + y^2 + y^3 + …] · [1 + (1+y)x + (1+y)^2 x^2 + … + (1+y)^{n-1} x^{n-1} + …]

also we have,

f(x,y) = Σ_{n≥1} Σ_{d≥1} R_{n,d} x^n y^d

Comparing coefficients of each term in RHS we get,

Page 12: CS621: Artificial Intelligence

Comparing co-efficients we get

R_{n,d} = 2 · Σ_{i=0}^{d-1} C(n-1, i)
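A small sanity check (my own, not in the slides) that the closed form agrees with the recurrence, and that it reproduces the count of threshold functions from the earlier observation (for n = 2 boolean variables: 2^2 hyperplanes in 3-dim weight space):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def R_rec(n, d):
    # The recurrence from the earlier slide, with its boundary conditions.
    if n == 1 or d == 1:
        return 2
    return R_rec(n - 1, d) + R_rec(n - 1, d - 1)

def R_closed(n, d):
    # Closed form: R(n, d) = 2 * sum_{i=0}^{d-1} C(n-1, i).
    return 2 * sum(comb(n - 1, i) for i in range(d))

assert all(R_rec(n, d) == R_closed(n, d)
           for n in range(1, 12) for d in range(1, 12))
print(R_closed(2**2, 2 + 1))  # 14 threshold functions of 2 boolean variables
```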

Page 13: CS621: Artificial Intelligence

Perceptron Training Algorithm (PTA)

Preprocessing:

1. The computation law is modified to
   y = 1 if ∑ wi xi > θ
   y = 0 if ∑ wi xi < θ

[Figure: perceptron diagrams with weights w1, w2, w3, …, wn on inputs x1, x2, x3, …, xn, contrasting the old test (≥ θ / < θ) with the modified strict test (> θ / < θ)]

Page 14: CS621: Artificial Intelligence

PTA – preprocessing cont…

2. Absorb θ as a weight: introduce w0 = θ on a new constant input x0 = -1

3. Negate all the zero-class examples

[Figure: the threshold θ is absorbed as weight w0 = θ on constant input x0 = -1, alongside weights w1, w2, w3, …, wn on inputs x1, x2, x3, …, xn]

Page 15: CS621: Artificial Intelligence

Example to demonstrate preprocessing: OR perceptron

1-class: <1,1>, <1,0>, <0,1>
0-class: <0,0>

Augmented x vectors:

1-class: <-1,1,1>, <-1,1,0>, <-1,0,1>
0-class: <-1,0,0>

Negate 0-class: <1,0,0>
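The preprocessing can be expressed compactly; this is an illustrative sketch (the helper name `preprocess` is my own): prepend x0 = -1 to every example, then negate the augmented 0-class examples.

```python
def preprocess(one_class, zero_class):
    # Absorb theta: prepend the constant input x0 = -1 to every example.
    augmented = [[-1] + list(x) for x in one_class + zero_class]
    n_pos = len(one_class)
    # Negate the augmented 0-class examples.
    return augmented[:n_pos] + [[-v for v in x] for x in augmented[n_pos:]]

vectors = preprocess(one_class=[(1, 1), (1, 0), (0, 1)], zero_class=[(0, 0)])
print(vectors)  # [[-1, 1, 1], [-1, 1, 0], [-1, 0, 1], [1, 0, 0]]
```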

Page 16: CS621: Artificial Intelligence

Example to demonstrate preprocessing cont…

Now the vectors are

      x0   x1   x2
X1    -1    0    1
X2    -1    1    0
X3    -1    1    1
X4     1    0    0

Page 17: CS621: Artificial Intelligence

Perceptron Training Algorithm

1. Start with a random value of w, e.g., <0,0,0,…>
2. Test for w·xi > 0. If the test succeeds for i = 1, 2, …, n, then return w
3. Otherwise modify w: wnext = wprev + xfail, and go to step 2 (a runnable sketch follows)
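A direct transcription of these three steps into Python (a sketch; the helper name `pta` and the step cap are my own additions):

```python
def pta(vectors, w=None, max_steps=10_000):
    """Perceptron Training Algorithm on preprocessed (augmented, negated) vectors."""
    w = list(w) if w is not None else [0] * len(vectors[0])
    for _ in range(max_steps):
        # Find the first vector that fails the test w . x > 0.
        failed = next((x for x in vectors
                       if sum(wi * xi for wi, xi in zip(w, x)) <= 0), None)
        if failed is None:
            return w  # every test succeeds: w separates the data
        w = [wi + xi for wi, xi in zip(w, failed)]  # w_next = w_prev + x_fail
    raise RuntimeError("no convergence: data may not be linearly separable")
```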

Page 18: CS621: Artificial Intelligence

Tracing PTA on OR-example (testing the vectors cyclically and adding the first failure each time)

w = <0,0,0>    w·X1 fails → w = <-1,0,1>
               w·X4 fails → w = <0,0,1>
               w·X2 fails → w = <-1,1,1>
               w·X4 fails → w = <0,1,1>
               w·X4 fails → w = <1,1,1>
               w·X1 fails → w = <0,1,2>
               w·X4 fails → w = <1,1,2>
               w·X2 fails → w = <0,2,2>
               w·X4 fails → w = <1,2,2>    success (all four tests pass)
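Running the sketch above on the preprocessed OR vectors reproduces this trace; note that the intermediate weights depend on which failing vector is picked at each step (here: the first failure in X1…X4 order).

```python
# Usage of the pta() sketch on the preprocessed OR vectors from earlier.
vectors = [[-1, 0, 1], [-1, 1, 0], [-1, 1, 1], [1, 0, 0]]
print(pta(vectors))  # [1, 2, 2], matching the final weights in the trace
```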

Page 19: CS621: Artificial Intelligence

PTA convergence

Page 20: CS621: Artificial Intelligence

Statement of Convergence of PTA

Statement: Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.

Page 21: CS621: Artificial Intelligence

Proof of Convergence of PTA

Suppose wn is the weight vector at the nth step of the algorithm.

At the beginning, the weight vector is w0.

We go from wi to wi+1 when a vector Xj fails the test wi · Xj > 0, updating wi as

wi+1 = wi + Xj

Since the Xjs form a linearly separable function, ∃ w* s.t. w* · Xj > 0 ∀ j.

Page 22: CS621: Artificial Intelligence

Proof of Convergence of PTA (cntd.)

Consider the expression

G(wn) = (wn · w*) / |wn|

where wn = weight vector at the nth iteration.

G(wn) = (|wn| · |w*| · cos θ) / |wn|, where θ = angle between wn and w*

G(wn) = |w*| · cos θ

G(wn) ≤ |w*|    (as -1 ≤ cos θ ≤ 1)
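To see this numerically, one can track G(wn) along a PTA run; this sketch (my own, not from the slides) uses the preprocessed OR data and takes w* = <1,2,2>, the separating vector found earlier (any separating w* would do):

```python
import math

# Track G(w_n) = (w_n . w*) / |w_n| while PTA runs on the OR vectors.
vectors = [[-1, 0, 1], [-1, 1, 0], [-1, 1, 1], [1, 0, 0]]
w_star = [1, 2, 2]  # illustrative choice of separating vector

w = [0, 0, 0]
while True:
    failed = next((x for x in vectors
                   if sum(wi * xi for wi, xi in zip(w, x)) <= 0), None)
    if failed is None:
        break
    w = [wi + xi for wi, xi in zip(w, failed)]
    G = sum(wi * si for wi, si in zip(w, w_star)) / math.hypot(*w)
    print(f"G(w) = {G:.3f}  (bound |w*| = {math.hypot(*w_star):.3f})")
```

G grows from step to step but never exceeds |w*|, exactly as the proof requires.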

Page 23: CS621: Artificial Intelligence

Behavior of Numerator of G

wn · w* = (wn-1 + Xn-1fail) · w*
        = wn-1 · w* + Xn-1fail · w*
        = (wn-2 + Xn-2fail) · w* + Xn-1fail · w*
        …
        = w0 · w* + (X0fail + X1fail + … + Xn-1fail) · w*

w* · Xifail is always positive: note carefully.

Suppose |Xj| ≥ δ, where δ is the minimum magnitude.

Then: numerator of G ≥ |w0 · w*| + n · δ · |w*|

So, the numerator of G grows with n.

Page 24: CS621: Artificial Intelligence

Behavior of Denominator of G

|wn|² = wn · wn
      = (wn-1 + Xn-1fail)²
      = (wn-1)² + 2 · wn-1 · Xn-1fail + (Xn-1fail)²
      ≤ (wn-1)² + (Xn-1fail)²      (as wn-1 · Xn-1fail ≤ 0, since Xn-1fail failed the test)
      ≤ (w0)² + (X0fail)² + (X1fail)² + … + (Xn-1fail)²

Suppose |Xj| ≤ β, where β is the maximum magnitude.

Then: |wn|² ≤ (w0)² + n · β²

So, the denominator |wn| grows at most as √n.

Page 25: CS621: Artificial Intelligence

Some Observations

Numerator of G grows as n
Denominator of G grows as √n

=> Numerator grows faster than denominator

If PTA does not terminate, G(wn) values will become unbounded.

Page 26: CS621: Artificial Intelligence

Some Observations contd.

But |G(wn)| ≤ |w*|, which is finite. This is impossible!

Hence, PTA has to converge.

The proof is due to Marvin Minsky.

Page 27: CS621: Artificial Intelligence

Convergence of PTA proved

• Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.

