
CS 4700: Foundations of Artificial Intelligence

Transcript
Page 1: CS 4700: Foundations of  Artificial Intelligence


CS 4700: Foundations of Artificial Intelligence

Prof. Carla P. Gomes [email protected]

Module: Neural Networks

Expressiveness of Perceptrons (Reading: Chapter 20.5)

Page 2: CS 4700: Foundations of  Artificial Intelligence


Expressiveness of Perceptrons

Page 3: CS 4700: Foundations of  Artificial Intelligence


Expressiveness of Perceptrons

What hypothesis space can a perceptron represent?

Even more complex Boolean functions, such as the majority function (which outputs 1 iff more than half of its inputs are 1).

But can it represent any arbitrary Boolean function?

Page 4: CS 4700: Foundations of  Artificial Intelligence


Expressiveness of Perceptrons

A threshold perceptron returns 1 iff the weighted sum of its inputs (including the bias) is positive, i.e., iff w0 + w1 x1 + … + wn xn > 0 (where w0 is the bias weight on a fixed input x0 = 1).

I.e., iff the input is on one side of the hyperplane it defines.

Linear discriminant function or linear decision surface.

Weights determine slope and bias determines offset.

Perceptron Linear Separator
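(A minimal illustrative sketch, not part of the original slides; the weights below are made up for demonstration.)

```python
# Threshold perceptron: output 1 iff the weighted sum of the inputs is positive,
# where weights[0] is the bias weight on a fixed input x0 = 1.
def perceptron_output(weights, inputs):
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else 0

# Made-up weights: bias -1.0, w1 = 0.6, w2 = 0.8.
print(perceptron_output([-1.0, 0.6, 0.8], (1, 1)))   # -1.0 + 0.6 + 0.8 = 0.4 > 0   -> 1
print(perceptron_output([-1.0, 0.6, 0.8], (1, 0)))   # -1.0 + 0.6       = -0.4 <= 0 -> 0
```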

Page 5: CS 4700: Foundations of  Artificial Intelligence


Linear Separability

Perceptron used for classification.

Consider an example with two inputs, x1, x2.

Can view the trained network as defining a "separation line". What is its equation?

[Figure: training points marked + in the (x1, x2) plane, split by the separating line defined by the perceptron]

w0 + w1 x1 + w2 x2 = 0, i.e., x2 = -(w1/w2) x1 - (w0/w2)
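A worked example (the numbers are made up for illustration, not from the slides): with w0 = -1, w1 = 1, w2 = 1, the boundary w0 + w1 x1 + w2 x2 = 0 becomes x2 = 1 - x1, a line with slope -1 and x2-intercept 1; inputs with x1 + x2 > 1 are classified as 1, the rest as 0.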

Page 6: CS 4700: Foundations of  Artificial Intelligence


Linear Separability

[Figure: OR plotted in the (x1, x2) plane; a line separates the positive from the negative points]

Page 7: CS 4700: Foundations of  Artificial Intelligence


Linear Separability

[Figure: AND plotted in the (x1, x2) plane; a line separates the positive from the negative points]
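(A concrete illustration with hand-picked weights, not from the slides: both OR and AND can be computed by a single threshold unit, confirming they are linearly separable.)

```python
# Hand-picked weights [bias, w1, w2]: OR fires iff x1 + x2 > 0.5, AND iff x1 + x2 > 1.5.
def fires(w, x):
    return 1 if w[0] + w[1] * x[0] + w[2] * x[1] > 0 else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([fires([-0.5, 1.0, 1.0], x) for x in inputs])   # OR  -> [0, 1, 1, 1]
print([fires([-1.5, 1.0, 1.0], x) for x in inputs])   # AND -> [0, 0, 0, 1]
```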

Page 8: CS 4700: Foundations of  Artificial Intelligence


Linear Separability

[Figure: XOR plotted in the (x1, x2) plane]

Page 9: CS 4700: Foundations of  Artificial Intelligence


Linear Separability

[Figure: XOR in the (x1, x2) plane: no straight line separates the positive points (0,1), (1,0) from the negative points (0,0), (1,1)]

Minsky & Papert (1969) Bad News: Perceptrons can only represent linearly separable functions.

Not linearly separable

Page 10: CS 4700: Foundations of  Artificial Intelligence


Consider a threshold perceptron for the logical XOR function (two inputs):

Our examples are:

     x1   x2   label
(1)   0    0     0
(2)   1    0     1
(3)   0    1     1
(4)   1    1     0

Linear Separability: XOR

The perceptron outputs 1 iff w1 x1 + w2 x2 > T.

Given our examples, we have the following inequalities for the perceptron:

From (1): 0 + 0 ≤ T  ⇒  T ≥ 0
From (2): w1 + 0 > T  ⇒  w1 > T
From (3): 0 + w2 > T  ⇒  w2 > T
From (4): w1 + w2 ≤ T

Adding (2) and (3) gives w1 + w2 > 2T ≥ T (since T ≥ 0), which contradicts (4).

So, XOR is not linearly separable.
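(An illustrative brute-force check, not from the slides: scanning a coarse grid of candidate weights and thresholds finds no single threshold unit matching XOR on all four examples, consistent with the algebraic argument above.)

```python
import itertools

# XOR truth table: (x1, x2) -> label
XOR = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 0}
grid = [i / 2 for i in range(-10, 11)]   # candidate w1, w2, T in -5.0 .. 5.0

separable = any(
    all((w1 * x1 + w2 * x2 > t) == bool(y) for (x1, x2), y in XOR.items())
    for w1, w2, t in itertools.product(grid, repeat=3)
)
print(separable)   # False: no candidate on this grid separates XOR
```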

Page 11: CS 4700: Foundations of  Artificial Intelligence


Convergence of Perceptron Learning Algorithm

Perceptron converges to a consistent function, if…

… training data linearly separable

… step size sufficiently small

… no “hidden” units
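(A minimal sketch of the perceptron learning rule, assuming the standard update w ← w + alpha·(y − y_hat)·x with a fixed bias input x0 = 1; the data and step size below are illustrative.)

```python
def train_perceptron(examples, alpha=0.1, epochs=100):
    """examples: list of (inputs, label) pairs, labels in {0, 1}."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)                      # w[0] is the bias weight
    for _ in range(epochs):
        for x, y in examples:
            xb = [1.0] + list(x)             # prepend fixed bias input x0 = 1
            y_hat = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else 0
            for j in range(n + 1):           # no change when the prediction is correct
                w[j] += alpha * (y - y_hat) * xb[j]
    return w

# Linearly separable data (OR): the rule settles on a consistent separator.
print(train_perceptron([((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]))
```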

Page 12: CS 4700: Foundations of  Artificial Intelligence

A perceptron learns the majority function easily; decision-tree learning (DTL) is hopeless.
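(One way to see why the majority function is easy for a perceptron, using hypothetical weights not taken from the slides: set every weight to 1 and the threshold to n/2.)

```python
# Majority of n Boolean inputs: fires iff more than half of the inputs are 1.
def majority_unit(inputs):
    return 1 if sum(inputs) > len(inputs) / 2 else 0

print([majority_unit(bits) for bits in [(1, 1, 0), (1, 0, 0), (1, 1, 1, 0, 0)]])   # -> [1, 0, 1]
```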

Page 13: CS 4700: Foundations of  Artificial Intelligence

DTL learns the restaurant function easily; a perceptron cannot represent it.

Page 14: CS 4700: Foundations of  Artificial Intelligence


Good news: Adding a hidden layer allows more target functions to be represented.

Minsky & Papert (1969)

Page 15: CS 4700: Foundations of  Artificial Intelligence


Multi-layer Perceptrons (MLPs)

Single-layer perceptrons can only represent linear decision surfaces.

Multi-layer perceptrons can represent non-linear decision surfaces.
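(A concrete illustration with hand-picked weights, not from the slides: XOR, which no single perceptron can represent, is computed by a two-layer network whose hidden units compute OR and AND.)

```python
def step(total):
    return 1 if total > 0 else 0

def xor_mlp(x1, x2):
    h_or  = step(x1 + x2 - 0.5)        # hidden unit 1: OR
    h_and = step(x1 + x2 - 1.5)        # hidden unit 2: AND
    return step(h_or - h_and - 0.5)    # output: OR and not AND

print([xor_mlp(*x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])   # -> [0, 1, 1, 0]
```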

Page 16: CS 4700: Foundations of  Artificial Intelligence


Minsky & Papert (1969) “[The perceptron] has many features to attract attention: its linearity; its intriguing learning theorem; its clear paradigmatic simplicity as a kind of parallel computation. There is no reason to suppose that any of these virtues carry over to the many-layered version. Nevertheless, we consider it to be an important research problem to elucidate (or reject) our intuitive judgment that the extension is sterile.”

Bad news: No algorithm for learning in multi-layered networks was known in 1969, and no convergence theorem existed!

Minsky & Papert (1969) pricked the neural network balloon … they almost killed the field.

Winter of Neural Networks, 1969–1986.

Rumors say these results may have killed Rosenblatt….

Page 17: CS 4700: Foundations of  Artificial Intelligence


Two major problems they saw were:

1. How can the learning algorithm apportion credit (or blame) for an incorrect classification among individual weights, when the output may depend on a (sometimes) large number of weights?

2. How can such a network learn useful higher-order features?

Page 18: CS 4700: Foundations of  Artificial Intelligence


Good news: Successful credit-apportionment learning algorithms were developed soon afterwards (e.g., back-propagation). Still successful, in spite of the lack of a convergence theorem.

The “Bible” (1986)

