CS 4700: Foundations of Artificial Intelligence
Prof. Carla P. Gomes, [email protected]
Module: Neural Networks
Expressiveness of Perceptrons (Reading: Chapter 20.5)
Expressiveness of Perceptrons
What hypothesis space can a perceptron represent?
It can represent the basic Boolean functions AND, OR, and NOT, and even more complex Boolean functions such as the majority function, which outputs 1 iff more than half of its inputs are 1 (see the sketch below).
But can it represent any arbitrary Boolean function?
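Before answering, here is a minimal sketch of the majority-function claim above (illustrative Python, not from the slides): with every weight set to 1 and threshold n/2, a threshold unit fires exactly when more than half of its n inputs are 1.

# Hypothetical illustration: a threshold perceptron computing majority.
def majority_perceptron(inputs):
    """Return 1 iff more than half of the binary inputs are 1."""
    n = len(inputs)
    weights = [1.0] * n              # one unit weight per input
    threshold = n / 2.0              # fire when the weighted sum exceeds n/2
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

assert majority_perceptron([1, 1, 0]) == 1
assert majority_perceptron([1, 0, 0]) == 0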
Expressiveness of Perceptrons
A threshold perceptron returns 1 iff the weighted sum of its inputs (including the bias) is positive, i.e.,
$\sum_{j=0}^{n} w_j x_j > 0$ (with bias input $x_0 = 1$),
that is, iff the input lies on one side of the hyperplane the weights define.
This is a linear discriminant function, or linear decision surface: the weights determine the slope, and the bias determines the offset.
Perceptron Linear Separator
[Figure: positive (+) and negative (−) examples in the $(x_1, x_2)$ plane, separated by a line]

The separation line is $w_0 + w_1 x_1 + w_2 x_2 = 0$, i.e., $x_2 = -\frac{w_1}{w_2} x_1 - \frac{w_0}{w_2}$.
Can view trained network as defining a “separation line”.
Linear Separability
A perceptron can be used for classification.
Consider an example with two inputs, x1 and x2: what is the equation of its separation line?
Linear Separability
[Figure: OR in the $(x_1, x_2)$ plane — the positive and negative examples are linearly separable]
Linear Separability
[Figure: AND in the $(x_1, x_2)$ plane — linearly separable]
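For concreteness, a minimal sketch (illustrative weights, assumed here rather than taken from the slides): with w1 = w2 = 1, a threshold of 0.5 realizes OR, and a threshold of 1.5 realizes AND.

# Illustrative weights: the same unit weights give OR or AND,
# depending only on the threshold.
def threshold_unit(x1, x2, w1, w2, T):
    return 1 if w1 * x1 + w2 * x2 > T else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        assert threshold_unit(x1, x2, 1, 1, 0.5) == (x1 or x2)   # OR
        assert threshold_unit(x1, x2, 1, 1, 1.5) == (x1 and x2)  # AND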
Linear Separability
[Figure: XOR in the $(x_1, x_2)$ plane — no single line separates the positives from the negatives]
Linear Separability
[Figure: XOR in the $(x_1, x_2)$ plane — not linearly separable]

Minsky & Papert (1969) — Bad news: perceptrons can only represent linearly separable functions.
Consider a threshold perceptron for the logical XOR function (two inputs):
Our examples are:
      x1   x2   label
(1)    0    0     0
(2)    1    0     1
(3)    0    1     1
(4)    1    1     0
Linear Separability: XOR
The perceptron outputs 1 iff $w_1 x_1 + w_2 x_2 > T$.
Given our examples, we have the following inequalities for the perceptron:
From (1): $0 + 0 \le T \;\Rightarrow\; T \ge 0$
From (2): $w_1 + 0 > T \;\Rightarrow\; w_1 > T$
From (3): $0 + w_2 > T \;\Rightarrow\; w_2 > T$
From (4): $w_1 + w_2 \le T$
Adding (2) and (3) gives $w_1 + w_2 > 2T \ge T$ (using $T \ge 0$ from (1)), which contradicts (4).
So XOR is not linearly separable.
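The algebra above is the proof; as a complementary illustration (a hypothetical check, not from the slides), a brute-force scan over a grid of weights and thresholds finds no combination that classifies all four XOR examples correctly.

# Illustrative only: a grid search cannot prove impossibility,
# but it agrees with the contradiction derived above.
import itertools

examples = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((1, 1), 0)]
grid = [i / 10.0 for i in range(-20, 21)]    # -2.0 .. 2.0 in steps of 0.1

found = any(
    all((w1 * x1 + w2 * x2 > T) == bool(label)
        for (x1, x2), label in examples)
    for w1, w2, T in itertools.product(grid, repeat=3)
)
print("separator found for XOR:", found)     # prints: False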
Convergence of Perceptron Learning Algorithm
The perceptron learning algorithm converges to a consistent function if…
… the training data are linearly separable,
… the step size is sufficiently small, and
… there are no “hidden” units.
The perceptron learns the majority function easily, while decision-tree learning (DTL) is hopeless at it.
DTL learns the restaurant function easily, while the perceptron cannot even represent it.
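For reference, a minimal sketch of the perceptron learning rule itself (the standard textbook update, assumed here since the slides only cite its convergence): on each example, move the weights by the step size times the error times the input.

# Standard perceptron update rule (assumed form). Each input vector
# starts with a fixed bias component x0 = 1.
def train_perceptron(examples, step=0.1, epochs=100):
    n = len(examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, label in examples:
            output = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            error = label - output               # +1, 0, or -1
            w = [wi + step * error * xi for wi, xi in zip(w, x)]
    return w

# OR is linearly separable, so the rule converges to consistent weights.
or_examples = [((1, 0, 0), 0), ((1, 1, 0), 1), ((1, 0, 1), 1), ((1, 1, 1), 1)]
w = train_perceptron(or_examples)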
Good news: adding a hidden layer allows more target functions to be represented.
Minsky & Papert (1969)
Multi-layer Perceptrons (MLPs)
Single-layer perceptrons can only represent linear decision surfaces.
Multi-layer perceptrons can represent non-linear decision surfaces.
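A minimal sketch of that claim (hand-picked weights, an assumption for illustration): two hidden threshold units computing OR and AND let an output unit compute XOR as “OR and not AND”.

# Hand-wired two-layer network for XOR (illustrative weights):
# h1 computes OR, h2 computes AND, output fires iff h1 and not h2.
def step(z):
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    h1 = step(x1 + x2 - 0.5)        # OR unit
    h2 = step(x1 + x2 - 1.5)        # AND unit
    return step(h1 - h2 - 0.5)      # OR and not AND

for x1 in (0, 1):
    for x2 in (0, 1):
        assert xor_mlp(x1, x2) == (x1 ^ x2)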
Minsky & Papert (1969) “[The perceptron] has many features to attract attention: its linearity; its intriguing learning theorem; its clear paradigmatic simplicity as a kind of parallel computation. There is no reason to suppose that any of these virtues carry over to the many-layered version. Nevertheless, we consider it to be an important research problem to elucidate (or reject) our intuitive judgment that the extension is sterile.”
Bad news: in 1969, no learning algorithm for multi-layered networks was known, and there was no convergence theorem!
Minsky & Papert (1969) pricked the neural-network balloon… they almost killed the field.
The “winter” of neural networks, 1969–1986.
Rumors say these results may have killed Rosenblatt…
Two major problems they saw were:
1. How can the learning algorithm apportion credit (or blame) to individual weights for incorrect classifications, given a (sometimes) large number of weights?
2. How can such a network learn useful higher-order features?
Good news: successful credit-apportionment learning algorithms were developed soon afterwards (e.g., back-propagation), and they are still successful, in spite of the lack of a convergence theorem.
The “Bible” (1986): Rumelhart & McClelland (eds.), Parallel Distributed Processing.
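To make the credit-apportionment idea concrete, here is a minimal numeric back-propagation sketch (a generic textbook version, assumed here; the slides only name the algorithm): a 2-2-1 sigmoid network learns XOR by gradient descent on squared error.

# Generic back-propagation on XOR (illustrative, not the slides' code).
# Depending on the random initialization it can stall in a local
# minimum; there is no convergence theorem, as noted above.
import math, random

random.seed(0)

def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x1, x2, w_h, w_o):
    x = (1.0, x1, x2)                                  # bias + inputs
    h = [sig(sum(w * xi for w, xi in zip(row, x))) for row in w_h]
    y = sig(sum(w * hi for w, hi in zip(w_o, (1.0, *h))))
    return x, h, y

w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(3)]
data = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((1, 1), 0)]
lr = 0.5

for _ in range(20000):
    for (x1, x2), t in data:
        x, h, y = forward(x1, x2, w_h, w_o)
        d_o = (t - y) * y * (1 - y)                    # blame at the output
        d_h = [d_o * w_o[j + 1] * h[j] * (1 - h[j])    # blame passed back
               for j in range(2)]
        w_o = [w + lr * d_o * hi for w, hi in zip(w_o, (1.0, *h))]
        w_h = [[w + lr * d_h[j] * xi for w, xi in zip(w_h[j], x)]
               for j in range(2)]

for (x1, x2), t in data:
    _, _, y = forward(x1, x2, w_h, w_o)
    print(f"{(x1, x2)} -> {round(y)} (target {t})")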