CS-621/CS-449 Artificial Intelligence
Lecture Notes
Set 4: 13/08/2004, 17/08/2004, 18/08/2004, 20/08/2004
Instructor: Prof. Pushpak Bhattacharyya
IIT Bombay
Outline
• Proof of Convergence of PTA
• Some Observations
• Study of Linear Separability
• Test for Linear Separability
• Asummability
Proof of Convergence of PTA
• Perceptron Training Algorithm (PTA)
• Statement:
Whatever the initial choice of weights, and whatever vector is chosen for testing, the PTA converges, provided the vectors come from a linearly separable function.
• Suppose wn is the weight vector at the nth step of the algorithm.
• At the beginning, the weight vector is w0.
• We go from wi to wi+1 when a vector Xj fails the test wi · Xj > 0, and update wi as
wi+1 = wi + Xj
• Since the Xj's form a linearly separable function,
∃ w* s.t. w* · Xj > 0 ∀ j
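The update rule above can be sketched as a small program. This is a minimal, illustrative implementation (the function name `pta` and the epoch budget are assumptions, not from the notes), operating on the augmented-and-negated vectors introduced later in these notes, so that convergence means w · Yj > 0 for every j:

```python
# Minimal sketch of the Perceptron Training Algorithm (PTA) on the
# augmented-and-negated vectors, where convergence means w . Yj > 0 for all j.

def pta(vectors, max_epochs=1000):
    """Pick any vector failing w . Y > 0 and update w <- w + Y; repeat."""
    w = [0.0] * len(vectors[0])
    for _ in range(max_epochs):
        updated = False
        for y in vectors:
            if sum(wi * yi for wi, yi in zip(w, y)) <= 0:   # test fails
                w = [wi + yi for wi, yi in zip(w, y)]       # w_{i+1} = w_i + Xj
                updated = True
        if not updated:            # every vector passes: converged
            return w
    return None                    # no convergence within the epoch budget

# AND's augmented-and-negated vectors: +ve class <1,1>; -ve class <0,0>, <0,1>, <1,0>
and_Y = [(-1, 1, 1), (1, 0, 0), (1, 0, -1), (1, -1, 0)]
w = pta(and_Y)
assert w is not None and all(sum(a * b for a, b in zip(w, y)) > 0 for y in and_Y)
```

Since AND is linearly separable, the loop terminates with a separating weight vector, as the convergence theorem guarantees.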
• Consider the expression
G(wn) = (wn · w*) / |wn|
where wn = weight at the nth iteration
• G(wn) = (|wn| · |w*| · cos θ) / |wn|
where θ = angle between wn and w*
• G(wn) = |w*| · cos θ
• G(wn) ≤ |w*| (as -1 ≤ cos θ ≤ 1)
Behavior of Numerator of G
wn · w* = (wn-1 + Xn-1^fail) · w*
        = wn-1 · w* + Xn-1^fail · w*
        = (wn-2 + Xn-2^fail) · w* + Xn-1^fail · w*
        …
        = w0 · w* + (X0^fail + X1^fail + … + Xn-1^fail) · w*
• Suppose |Xj| ≥ δmin, where δmin is the minimum magnitude.
• Numerator of G ≥ |w0 · w*| + n · δmin · |w*|
• So, the numerator of G grows with n.
Behavior of Denominator of G
• |wn|^2 = wn · wn
         = (wn-1 + Xn-1^fail)^2
         = (wn-1)^2 + 2 · wn-1 · Xn-1^fail + (Xn-1^fail)^2
         ≤ (wn-1)^2 + (Xn-1^fail)^2        (as wn-1 · Xn-1^fail ≤ 0)
         ≤ (w0)^2 + (X0^fail)^2 + (X1^fail)^2 + … + (Xn-1^fail)^2
• Suppose |Xj| ≤ δmax, the maximum magnitude.
• So, (Denominator)^2 ≤ (w0)^2 + n · δmax^2
Some Observations
• Numerator of G grows as n
• Denominator of G grows as √n
=> Numerator grows faster than denominator
• If PTA does not terminate, the G(wn) values will become unbounded.
• But since |G(wn)| ≤ |w*|, which is finite, this is impossible!
• Hence, PTA has to converge.
• The proof is due to Marvin Minsky.
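The bound G(wn) ≤ |w*| can be illustrated numerically. The sketch below (not from the notes; the vectors are AND's augmented-and-negated set, and w_star = (4, 3, 2) is one separating vector, verified by an assertion) runs PTA updates and records G after each one:

```python
import math

# Numeric illustration of the bound G(wn) = (wn . w*) / |wn| <= |w*|
# during PTA updates, on AND's augmented-and-negated vectors.

Y = [(-1, 1, 1), (1, 0, 0), (1, 0, -1), (1, -1, 0)]
w_star = (4.0, 3.0, 2.0)          # assumed separating vector, checked below

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

assert all(dot(w_star, y) > 0 for y in Y)      # w_star does separate

w = [0.0, 0.0, 1.0]                            # arbitrary non-zero start
g_values = []
for _ in range(100):                           # PTA update sweeps
    for y in Y:
        if dot(w, y) <= 0:                     # y fails the test w . y > 0
            w = [wi + yi for wi, yi in zip(w, y)]
            g_values.append(dot(w, w_star) / norm(w))

# Cauchy-Schwarz guarantees G never exceeds |w_star|
assert all(g <= norm(w_star) + 1e-9 for g in g_values)
```

If PTA never terminated, G would have to grow without bound, contradicting this ceiling; that is exactly the argument above.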
Study of Linear Separability
[Perceptron diagram: inputs x1, x2, x3, …, xn with weights w1, w2, w3, …, wn feeding a threshold unit θ; output y]
• W · Xj = 0 defines a hyperplane in (n+1)-dimensional space (the extra dimension comes from augmenting the inputs with the threshold).
=> The W vector and the Xj vectors are perpendicular to each other.
Linear Separability
[Diagram: +ve points X1, X2, …, Xk and -ve points Xk+1, Xk+2, …, Xm on opposite sides of a separating hyperplane]
Positive set: w · Xj > 0, ∀ j ≤ k
Negative set: w · Xj < 0, ∀ j > k
• w · Xj = 0 => w is normal to the hyperplane which separates the +ve points from the -ve points.
• In this computing paradigm, computation means "placing hyperplanes".
• Functions computable by the perceptron are called
– "threshold functions", because ∑ wi xi is compared with θ (the threshold)
– "linearly separable", because linear surfaces are set up to separate the +ve and -ve points
• Decision problems may have +ve and -ve points that need to be separated.
• The hyperplane found should generalize, which is achieved through learning (PAC learning, SVMs, etc.)
Test for Linear Separability (LS)
• Theorem:
A function is linearly separable iff the vectors corresponding to the function do not have a Positive Linear Combination (PLC).
• The absence of a PLC is both a necessary and a sufficient condition.
• X1, X2, …, Xm : vectors of the function
• Y1, Y2, …, Ym : augmented negated set
• Each Yi is obtained by prepending -1 to Xi (augmentation); if Xi belongs to the 0-class, the augmented vector is additionally negated.
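The construction of the augmented negated set can be sketched in a few lines. This is an illustrative helper (the name `augment_negate` and the `(X, label)` input format are assumptions), following the convention of prepending -1 to every vector and negating those of the 0-class:

```python
# Sketch: build the augmented negated set {Yi} from labelled inputs Xi.
# Every vector is augmented by prepending -1 (absorbing the threshold);
# vectors of the 0-class are then negated.

def augment_negate(samples):
    """samples: list of (X, label), label 1 for +ve class, 0 for -ve class."""
    result = []
    for x, label in samples:
        y = (-1,) + tuple(x)                  # augmentation: prepend -1
        if label == 0:
            y = tuple(-c for c in y)          # negate 0-class vectors
        result.append(y)
    return result

# XNOR (2-bit even parity):
xnor = [((0, 0), 1), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(augment_negate(xnor))
# -> [(-1, 0, 0), (1, 0, -1), (1, -1, 0), (-1, 1, 1)]
```

The printed vectors match the Y1 … Y4 of the XNOR example.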
Example (1) - XNOR
• The set {Yi} has a PLC if ∑ Pi Yi = 0, 1 ≤ i ≤ m,
– where each Pi is a non-negative scalar, and
– at least one Pi > 0
• Example: 2-bit even parity (XNOR function)
X1 = <0,0>  +    Y1 = <-1,0,0>
X2 = <0,1>  -    Y2 = <1,0,-1>
X3 = <1,0>  -    Y3 = <1,-1,0>
X4 = <1,1>  +    Y4 = <-1,1,1>
• P1 [-1 0 0]T + P2 [1 0 -1]T + P3 [1 -1 0]T + P4 [-1 1 1]T = [0 0 0]T
• Taking all Pi = 1 gives the result.
• For the parity function, a PLC exists => not linearly separable.
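The choice Pi = 1 for all i can be verified mechanically. A minimal numeric check (variable names are illustrative):

```python
# Check the PLC for XNOR: with every Pi = 1, the combination
# P1 Y1 + P2 Y2 + P3 Y3 + P4 Y4 sums to the zero vector.

Y = [(-1, 0, 0), (1, 0, -1), (1, -1, 0), (-1, 1, 1)]
P = [1, 1, 1, 1]

combo = [sum(p * y[k] for p, y in zip(P, Y)) for k in range(3)]
print(combo)                 # -> [0, 0, 0]
assert combo == [0, 0, 0]    # PLC exists, so XNOR is not linearly separable
```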
Example (2) – Majority function
• 3-bit majority function
Xi             Yi
<0 0 0>  -     <1 0 0 0>
<0 0 1>  -     <1 0 0 -1>
<0 1 0>  -     <1 0 -1 0>
<0 1 1>  +     <-1 0 1 1>
<1 0 0>  -     <1 -1 0 0>
<1 0 1>  +     <-1 1 0 1>
<1 1 0>  +     <-1 1 1 0>
<1 1 1>  +     <-1 1 1 1>
• Suppose a PLC exists. The equations obtained are:
P1 + P2 + P3 - P4 + P5 - P6 - P7 - P8 = 0
-P5 + P6 + P7 + P8 = 0
-P3 + P4 + P7 + P8 = 0
-P2 + P4 + P6 + P8 = 0
• Adding all four equations gives P1 + P4 + P6 + P7 + 2 P8 = 0; since every Pi ≥ 0, this forces P1 = P4 = P6 = P7 = P8 = 0, and back-substituting forces P2 = P3 = P5 = 0 as well.
• So all Pi are forced to 0: the 3-bit majority function has no PLC
=> No PLC => LS
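Since majority is linearly separable, a separating threshold unit must exist. Here is one concrete witness (illustrative, not given in the notes): weights (1, 1, 1) with threshold 1.5.

```python
from itertools import product

# A threshold-function witness that 3-bit majority is linearly separable:
# w = (1, 1, 1), theta = 1.5 fires exactly when at least two inputs are 1.

w, theta = (1, 1, 1), 1.5
for x in product((0, 1), repeat=3):
    fires = sum(wi * xi for wi, xi in zip(w, x)) > theta
    assert fires == (sum(x) >= 2)          # majority: at least two 1s
print("w =", w, ", theta =", theta, "realizes 3-bit majority")
```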
Example (3) – n-bit comparator
[Perceptron diagram: y inputs with weights w(2n-1), …, w(n), x inputs x(n-1), …, x0 with weights w(n-1), …, w0, threshold θ; output z]
• n-bit comparator
• z = 1 if dec(y) > dec(x)
  z = 0 otherwise
• Is a perceptron realization possible? Yes:
• the w vector is such that
wi = -2^i, ∀ i, 0 ≤ i ≤ n-1 (x inputs)
wj = 2^(j-n), ∀ j, n ≤ j ≤ 2n-1 (y inputs)
• and θ = 0
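These weights make the net input equal to dec(y) - dec(x), so thresholding at 0 compares the two numbers. A small exhaustive check for n = 2 (variable names illustrative):

```python
from itertools import product

# Verify the comparator weights for n = 2: weight -2**i on input x_i,
# weight +2**i on input y_i, theta = 0. The weighted sum is then
# dec(y) - dec(x), so the unit fires exactly when dec(y) > dec(x).

n = 2

def dec(bits):                       # bit i carries value 2**i
    return sum(b * 2 ** i for i, b in enumerate(bits))

for assignment in product((0, 1), repeat=2 * n):
    y_bits, x_bits = assignment[:n], assignment[n:]
    net = sum(2 ** i * b for i, b in enumerate(y_bits)) \
        - sum(2 ** i * b for i, b in enumerate(x_bits))   # sum of w * input
    z = 1 if net > 0 else 0                               # theta = 0
    assert z == (1 if dec(y_bits) > dec(x_bits) else 0)
```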
• 2-bit comparator: vectors (y0, y1, x0, x1; z)
• +ve class: 0100, 1000 and so on
• -ve class: 0000, 0001 and so on
• After augmentation and negation, we must look for non-negative numbers Pi, not all zero, such that ∑ Pi Yi = 0.
• Solving the equations, we eventually end up with all Pi = 0, so no PLC exists and the comparator is linearly separable.
Observations
• If the vectors are from a linearly separable function, then during the course of training there can never be a repetition of weights.
• The quantity wn · w* strictly increases with every update, and hence the weight vectors do not repeat:
wn+1 · w* = (wn + Xn^fail) · w*
wn+1 · w* > wn · w*        (since Xn^fail · w* > 0)
wn · w* > wn-1 · w*
…
w1 · w* > w0 · w*
• Asummability
Asummability
• Another form of the PLC test
• Helps in quick testing of LS
• Definition of k-summability:
Given the +ve and -ve classes, a function is called k-summable if we can find k vectors Y1+, …, Yk+ from the +ve class and k vectors Y1-, …, Yk- from the -ve class such that
p1 Y1+ + p2 Y2+ + … + pk Yk+ = n1 Y1- + n2 Y2- + … + nk Yk-
where the pi's and nj's are > 0.
Asummability
• A function is called asummable if it is not k-summable for any k.
• Asummability is equivalent to LS.
• Example: XOR
+ve class: <0,1>, <1,0>
-ve class: <1,1>, <0,0>
<0,1> + <1,0> = <1,1> = <1,1> + <0,0>
• XOR is 2-summable => not LS
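The 2-summability of XOR can be checked directly on the raw input vectors, with all coefficients equal to 1 (a minimal sketch; variable names are illustrative):

```python
# XOR's 2-summability: the two +ve points and the two -ve points
# have equal vector sums.

pos = [(0, 1), (1, 0)]       # XOR = 1
neg = [(1, 1), (0, 0)]       # XOR = 0

sum_pos = tuple(map(sum, zip(*pos)))
sum_neg = tuple(map(sum, zip(*neg)))
print(sum_pos, sum_neg)      # -> (1, 1) (1, 1)
assert sum_pos == sum_neg    # 2-summable, hence XOR is not linearly separable
```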
Asummability – Examples
• AND is not k-summable for any k => asummable => LS
+ve class: <1,1>
-ve class: <0,0>, <0,1>, <1,0>
• A single vector in one class and all the others in the other class => linearly separable
• Another example: the majority function
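The AND claim can be sanity-checked by brute force for small k. This sketch (not from the notes) uses unit coefficients and allows repeated vectors, which covers rational coefficients up to a common scaling:

```python
from itertools import combinations_with_replacement

# Brute-force check that AND is not k-summable for small k: no multiset
# of k +ve vectors has the same sum as a multiset of k -ve vectors.

pos = [(1, 1)]                         # AND = 1
neg = [(0, 0), (0, 1), (1, 0)]         # AND = 0

def vec_sum(vectors):
    return tuple(map(sum, zip(*vectors)))

summable = False
for k in range(1, 6):
    for ps in combinations_with_replacement(pos, k):
        for ns in combinations_with_replacement(neg, k):
            if vec_sum(ps) == vec_sum(ns):
                summable = True
print("AND k-summable for k <= 5:", summable)   # -> False
```

Any k +ve vectors sum to (k, k), with component total 2k, while every -ve vector contributes at most 1 to the total, so the two sums can never match; the brute force confirms this for k ≤ 5.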