New CS 4700: Foundations of Artificial Intelligence · 2020. 4. 13. · My office hours: Mondays...

CS 4700: Foundations of Artificial Intelligence

Spring 2020, Prof. Haym Hirsh

Lecture 22, April 6, 2020


Welcome!


Backup plans:

If I leave the meeting (check the participants window), for example because my machine crashes:

Take a five-minute break

If I’m not back on by then, check Piazza


Backup plans:

Zoombombing/trolling

I will try to fix it (kill the user)

If I end the meeting, please check Piazza for the URL of the restarted meeting


Rules:

Please stay muted

Please use video!

(but be dressed thoughtfully)


Rules:

You are encouraged to ask and answer questions using Zoom’s chat window


Rules:

You are encouraged to ask questions by audio/video

Please post "QUESTION" in the chat window

Please wait until I call on you


Other announcements:

Please fill in the survey on Canvas

(It lets me plan, and lets you try out the quiz format)


Other announcements:

S/U grading


Other announcements:

Homework 3

Due: Wed Apr 8 11:59pm


Other announcements:

No prelim/final

Quizzes on Tuesdays and Thursdays:

Tuesdays: the previous week’s material

Thursdays: a topic from the first half of the semester

First quiz: Thu Apr 9, 12:00pm. Topic: Uninformed search

24-hour window for submission. Further details forthcoming.


Other announcements:

Office hours on website

(including 2/3am EDT most days)

My office hours:

Mondays 3-4pm EDT

and by arrangement (email FAI-L@cornell.edu)

But not this week; this week’s hours TBA.


Other announcements:

Karma Lectures:

Resume tomorrow, 11:40am

Further details later today

Academic integrity

You know right from wrong

Ask me if there’s something on the boundary

Today: “Multi-Armed Bandits” (Special type of MDP)

(Section 17.3 of textbook)

Multi-Armed Bandit

[Figure: a row of $n$ arms with rewards $R_1, R_2, R_3, R_4, \dots, R_n$]

Each $R_i$ is generated stochastically. We don’t know $R_i$.

Multi-Armed Bandit

[Figure: a row of $n$ arms with rewards $R_1, R_2, R_3, R_4, \dots, R_n$]

Example: $R_i = 1$ with probability $p_i$, otherwise 0. We don’t know $p_i$.

Which arm do I pull?
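A minimal sketch of this example, assuming Python's `random` module as the source of randomness: each arm hides its probability $p_i$, and the only way to learn about it is to pull the arm and average the 0/1 rewards. The class `BernoulliArm`, the helper `estimate_p`, and the value $p_i = 0.3$ are hypothetical.

```python
import random


class BernoulliArm:
    """One arm of the example bandit: reward 1 with hidden probability p, else 0."""

    def __init__(self, p: float, rng: random.Random) -> None:
        self._p = p        # hidden from the player
        self._rng = rng

    def pull(self) -> int:
        # R_i = 1 with probability p_i, otherwise 0
        return 1 if self._rng.random() < self._p else 0


def estimate_p(arm: BernoulliArm, pulls: int) -> float:
    """Estimate the unknown p_i as the average observed reward over `pulls` pulls."""
    return sum(arm.pull() for _ in range(pulls)) / pulls


arm = BernoulliArm(0.3, random.Random(0))   # hypothetical p_i = 0.3
print(estimate_p(arm, 10_000))              # close to 0.3, but only an estimate
```

Every estimate costs pulls, which is exactly the tension behind the question of which arm to pull.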


Multi-Armed Bandit

[Figure: a row of $n$ arms with rewards $R_1, R_2, R_3, R_4, \dots, R_n$]

Example: $R_i = 1$ with probability $p_i$, otherwise 0. We don’t know $p_i$.

Which arms do I pull to figure out which arm to pull?

Multi-Armed Bandit

[Figure: $n$ machines $M_1, M_2, M_3, M_4, \dots, M_n$; action $a_i$ pulls machine $M_i$, which pays reward $R_i$]

What strategy do I use to pick a sequence of $a_i$?
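One hypothetical way to make "a strategy for picking a sequence of $a_i$" concrete: a strategy is a function that looks at what has been observed so far (pull counts and reward totals per arm, plus the number of pulls made) and returns the index of the next arm. The loop below runs any such strategy against Bernoulli arms. The names `Strategy`, `run_bandit`, and `round_robin` are illustrative, and `round_robin` is only a placeholder baseline, not a strategy from the lecture.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interface: a strategy sees the pull counts N_i, the reward totals,
# and the number of pulls N made so far, and returns the index of the next arm.
Strategy = Callable[[List[int], List[float], int], int]


def run_bandit(probs: List[float], strategy: Strategy, steps: int,
               seed: int = 0) -> Tuple[List[int], List[float]]:
    """Play `steps` pulls on Bernoulli arms; arm i pays 1 with probability probs[i]."""
    rng = random.Random(seed)
    counts = [0] * len(probs)    # N_i: pulls of arm i so far
    totals = [0.0] * len(probs)  # total reward collected from arm i so far
    for n in range(steps):
        i = strategy(counts, totals, n)
        counts[i] += 1
        totals[i] += 1.0 if rng.random() < probs[i] else 0.0
    return counts, totals


def round_robin(counts: List[int], totals: List[float], n: int) -> int:
    """Placeholder baseline: cycle through the arms, ignoring observed rewards."""
    return n % len(counts)


counts, totals = run_bandit([0.2, 0.5, 0.8], round_robin, steps=300)
print(counts)  # [100, 100, 100]
```

Round-robin explores evenly but never exploits what it learns; the heuristics that follow differ only in the strategy function plugged into such a loop.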


View as a Single-State MDP

$R(s, a_i, s) = R(a_i)$, $P(s \mid s, a_i) = 1.0$

[Figure: a single state $s$ with a self-loop for each action $a_1, a_2, \dots, a_n$]
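To see why the single-state view reduces to choosing one arm, here is a short derivation under an added assumption not on the slide: future rewards are discounted by a factor $0 \le \gamma < 1$, and $R(a_i)$ denotes the expected reward of arm $i$. Substituting $P(s \mid s, a_i) = 1.0$ into the Bellman equation gives:

```latex
\begin{aligned}
U(s) &= \max_i \sum_{s'} P(s' \mid s, a_i)\,\bigl[R(s, a_i, s') + \gamma\,U(s')\bigr] \\
     &= \max_i \bigl[R(a_i) + \gamma\,U(s)\bigr] \qquad \text{(only } s' = s \text{ has nonzero probability)} \\
     &= \max_i R(a_i) + \gamma\,U(s), \\
U(s) &= \frac{\max_i R(a_i)}{1 - \gamma}.
\end{aligned}
```

So if the $R(a_i)$ were known, the optimal policy would simply repeat $a_{i^*}$ with $i^* = \arg\max_i R(a_i)$ forever; in a bandit they are unknown, which is why the choice of which arms to pull matters.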

Multi-Armed Bandits: Upper Confidence Bound (UCB) Heuristic

• If you knew each arm’s $R_i$ exactly: pick the arm with the largest $R_i$

• Intuition: use the arm with the best observed average reward $\bar{R}_i$

Pick $\arg\max_i \bar{R}_i$

Problem: we may not have enough data for $\bar{R}_i$ to be accurate
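A sketch of the greedy rule "pick $\arg\max_i \bar{R}_i$", assuming each arm is pulled once first so that every $\bar{R}_i$ is defined (the slide does not say how to start). The helper names and the probabilities 0.4 and 0.6 are hypothetical. Repeating the run over many seeds shows the problem named above: after an unlucky early reward, greedy can commit to the worse arm and never gather enough data to correct itself.

```python
import random
from typing import List


def greedy_run(probs: List[float], steps: int, seed: int) -> List[int]:
    """Pull each arm once, then always pull the arm with the best observed average.

    Returns N_i, the number of pulls of each arm."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    totals = [0.0] * k
    for n in range(steps):
        if n < k:
            i = n  # assumed initialization: try every arm once
        else:
            # argmax_i of the observed average reward (ties go to the lowest index)
            i = max(range(k), key=lambda j: totals[j] / counts[j])
        counts[i] += 1
        totals[i] += 1.0 if rng.random() < probs[i] else 0.0
    return counts


def favored_worse_arm(seed: int) -> bool:
    """True if greedy pulled arm 0 (p = 0.4) more often than arm 1 (p = 0.6)."""
    counts = greedy_run([0.4, 0.6], 1000, seed)
    return counts[0] > counts[1]


stuck = sum(favored_worse_arm(seed) for seed in range(200))
print(f"greedy favored the worse arm in {stuck} of 200 runs")
```

For instance, if arm 0 pays 1 on its first pull and arm 1 pays 0, arm 0's average never drops to 0, so greedy never returns to arm 1.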

Multi-Armed Bandits: Upper Confidence Bound (UCB) Heuristic

• Key idea: be optimistic about the outcome of each arm

• (But not wildly optimistic, like Q-learning)

Instead of using the observed average value, go up one standard deviation
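One hedged reading of "go up one standard deviation", sketched below: score an arm by its observed average plus one standard deviation of that average (the sample standard deviation divided by $\sqrt{n}$). This only illustrates the intuition; the function name `optimistic_estimate` is hypothetical, and the formula the lecture actually uses is the UCB bonus on the following slides.

```python
import math
import statistics
from typing import Sequence


def optimistic_estimate(rewards: Sequence[float]) -> float:
    """Observed average plus one standard deviation of that average.

    Needs at least two observations (statistics.stdev requires them)."""
    mean = statistics.mean(rewards)
    std_of_mean = statistics.stdev(rewards) / math.sqrt(len(rewards))
    return mean + std_of_mean


few = [1, 0, 1, 1]   # average 0.75 from 4 pulls
many = few * 4       # same average 0.75 from 16 pulls
print(optimistic_estimate(few))   # 1.0
print(optimistic_estimate(many))  # about 0.86: the optimism shrinks as pulls accumulate
```

The same average earns a smaller boost once it rests on more data, which is the behavior the UCB formula formalizes.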


Multi-Armed Bandits: Upper Confidence Bound (UCB) Heuristic

• Pick the arm with the largest $\mathrm{UCB}(M_i)$ instead of the largest $\bar{R}_i$:

$$\mathrm{UCB}(M_i) = \bar{R}_i + \sqrt{\frac{g(N)}{N_i}}$$

where

$\bar{R}_i$ = average reward for arm $i$ so far

$N$ = total number of pulls made so far

$N_i$ = total number of pulls of $M_i$ so far
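A minimal sketch of the UCB score above, with $g$ passed in as a function so any choice of $g(N)$ can be plugged in. Giving an arm that has never been pulled ($N_i = 0$) an infinite score is an added assumption (it forces every arm to be tried once), and the names `ucb_score` and `g_example` are hypothetical.

```python
import math
from typing import Callable


def ucb_score(avg_reward: float, n_i: int, n_total: int,
              g: Callable[[int], float]) -> float:
    """UCB(M_i) = avg_reward + sqrt(g(N) / N_i)."""
    if n_i == 0:
        return math.inf  # untried arm: maximally optimistic (assumption)
    return avg_reward + math.sqrt(g(n_total) / n_i)


def g_example(n: int) -> float:
    """Example g(N) = c ln N with an assumed c = 2."""
    return 2 * math.log(n)


# Two arms with the same average: the less-pulled arm gets the larger score.
print(ucb_score(0.5, 10, 100, g_example) > ucb_score(0.5, 80, 100, g_example))  # True
```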


Multi-Armed Bandits: Upper Confidence Bound (UCB) Heuristic

• Pick the arm with the largest $\mathrm{UCB}(M_i)$ instead of the largest $\bar{R}_i$:

$$\mathrm{UCB}(M_i) = \bar{R}_i + \sqrt{\frac{c \ln N}{N_i}}$$

where

$\bar{R}_i$ = average reward for arm $i$ so far

$N$ = total number of pulls made so far

$N_i$ = total number of pulls of $M_i$ so far
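With $g(N) = c \ln N$ the bonus is $\sqrt{c \ln N / N_i}$. A small sketch, assuming $c = 2$ as an example value (the slide leaves $c$ as a tunable constant); the function name `ucb_common` is hypothetical.

```python
import math


def ucb_common(avg_reward: float, n_i: int, n_total: int, c: float = 2.0) -> float:
    """UCB(M_i) = avg_reward + sqrt(c * ln(N) / N_i), for N_i >= 1."""
    return avg_reward + math.sqrt(c * math.log(n_total) / n_i)


# After N = 100 total pulls, an arm pulled N_i = 10 times with average 0.5:
print(round(ucb_common(0.5, 10, 100), 3))  # 1.46
```

A larger $c$ inflates every bonus and so explores more; for a fixed $N_i$ the score also creeps up as $N$ grows, so a neglected arm is eventually retried.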


Multi-Armed Bandits: Upper Confidence Bound (UCB) Heuristic

• Pick the arm with the largest $\mathrm{UCB}(M_i)$ instead of the largest $\bar{R}_i$:

$$\mathrm{UCB}(M_i) = \bar{R}_i + \sqrt{\frac{2 \log(1 + N \log^2 N)}{N_i}}$$

where

$\bar{R}_i$ = average reward for arm $i$ so far

$N$ = total number of pulls made so far

$N_i$ = total number of pulls of $M_i$ so far
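A sketch of this textbook variant, reading $\log^2 N$ as $(\log N)^2$ and assuming natural logarithms (the slide does not state the base). The names `g_textbook` and `ucb_textbook` are hypothetical.

```python
import math


def g_textbook(n_total: int) -> float:
    """g(N) = 2 log(1 + N (log N)^2), natural log assumed."""
    return 2 * math.log(1 + n_total * math.log(n_total) ** 2)


def ucb_textbook(avg_reward: float, n_i: int, n_total: int) -> float:
    """UCB(M_i) = avg_reward + sqrt(g(N) / N_i), for N_i >= 1."""
    return avg_reward + math.sqrt(g_textbook(n_total) / n_i)


print(round(g_textbook(100), 2))             # 15.32
print(round(ucb_textbook(0.5, 10, 100), 2))  # 1.74
```

Since $\log(1 + N \log^2 N) \approx \log N + 2 \log \log N$ for large $N$, this $g(N)$ still grows only logarithmically.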


Multi-Armed Bandits: Upper Confidence Bound (UCB) Heuristic

• Pick the arm with the largest $\mathrm{UCB}(M_i)$ instead of the largest $\bar{R}_i$:

$$\mathrm{UCB}(M_i) = \bar{R}_i + \sqrt{\frac{g(N)}{N_i}}$$

where

$\bar{R}_i$ = average reward for arm $i$ so far

$N$ = total number of pulls made so far

$N_i$ = total number of pulls of $M_i$ so far

$g(N) = 2 \log(1 + N \log^2 N)$ (textbook)

$g(N) = c \ln N$ (common)
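Putting the pieces together, a minimal sketch of the UCB heuristic on Bernoulli arms as in the slides' example ($R_i = 1$ with probability $p_i$), with either $g(N)$ selectable. Pulling each arm once before applying the formula (so $N_i \ge 1$), natural logarithms, $c = 2$, the arm probabilities, and all function names are assumptions for illustration.

```python
import math
import random
from typing import Callable, List


def g_common(n: int, c: float = 2.0) -> float:
    return c * math.log(n)                          # g(N) = c ln N


def g_textbook(n: int) -> float:
    return 2 * math.log(1 + n * math.log(n) ** 2)   # g(N) = 2 log(1 + N log^2 N)


def ucb_run(probs: List[float], steps: int, g: Callable[[int], float],
            seed: int = 0) -> List[int]:
    """Run the UCB heuristic on Bernoulli arms; return the pull counts N_i."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    totals = [0.0] * k
    for n in range(steps):
        if n < k:
            i = n  # pull every arm once so that each N_i >= 1
        else:
            def score(j: int) -> float:
                # UCB(M_j) = average reward + sqrt(g(N) / N_j), with N = n pulls so far
                return totals[j] / counts[j] + math.sqrt(g(n) / counts[j])
            i = max(range(k), key=score)
        counts[i] += 1
        totals[i] += 1.0 if rng.random() < probs[i] else 0.0
    return counts


print(ucb_run([0.2, 0.5, 0.8], 5000, g_common))
print(ucb_run([0.2, 0.5, 0.8], 5000, g_textbook))
```

With either $g$, most pulls should go to the best arm (index 2) while the weaker arms are still sampled occasionally.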


Multi-Armed Bandits: Upper Confidence Bound (UCB) Heuristic

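• Pick the arm with the largest $\mathrm{UCB}(M_i)$ instead of the largest $\bar{R}_i$:

$$\mathrm{UCB}(M_i) = \bar{R}_i + \sqrt{\frac{g(N)}{N_i}}$$

where

$\bar{R}_i$ = average reward for arm $i$ so far

$N$ = total number of pulls made so far

$N_i$ = total number of pulls of $M_i$ so far

$g(N) = c \ln N$ or $g(N) = 2 \log(1 + N \log^2 N)$

$g(N)$ should go up more slowly than $N_i$, so that the bonus $\sqrt{g(N)/N_i}$ shrinks as $M_i$ is pulled more often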
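A quick numeric check of this requirement, assuming the case where one arm receives every pull ($N_i = N$), natural logarithms, and $c = 2$. As a counter-example it also includes a hypothetical $g(N) = N$, which grows as fast as $N_i$ and whose bonus therefore never shrinks. The function name `bonuses` is illustrative.

```python
import math
from typing import Tuple


def bonuses(n: int) -> Tuple[float, float, float]:
    """Exploration bonus sqrt(g(N) / N_i) when one arm got every pull (N_i = N)."""
    common = math.sqrt(2 * math.log(n) / n)                           # g(N) = 2 ln N
    textbook = math.sqrt(2 * math.log(1 + n * math.log(n) ** 2) / n)  # g(N) = 2 log(1 + N log^2 N)
    linear = math.sqrt(float(n) / n)                                  # hypothetical g(N) = N: stays 1.0
    return common, textbook, linear


for n in (10, 100, 1_000, 10_000, 100_000):
    common, textbook, linear = bonuses(n)
    print(f"N = N_i = {n:>7}: common {common:.3f}  textbook {textbook:.3f}  g(N)=N {linear:.3f}")
```

Both logarithmic choices drive the bonus toward zero, so a well-sampled arm is scored by essentially its average $\bar{R}_i$; with $g(N) = N$ the optimism never fades.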

