Date post: 28-Sep-2018
Lecture 3: Q-learning (table) Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim <[email protected]>
Transcript
Page 1: Lecture 3: Q-learning (table) - GitHub Pages · Lecture 3: Q-learning (table) Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Lecture 3: Q-learning (table)

Reinforcement Learning with TensorFlow & OpenAI Gym

Sung Kim <[email protected]>

Page 2:

Try Frozen Lake, Real Game?

[Figure: Frozen Lake grid, start cell S]

Page 3:

Frozen Lake: Random?

[Figure: Frozen Lake grid, start cell S]

Page 4:

Frozen Lake: "Even if you know the way, ask." (Korean proverb)

Page 5:

Q-function (state-action value function)

(1) state

(2) action

(3) quality (reward)

Q (state, action)

Page 6:

Policy using Q-function

Q (state, action)

Q (s1, LEFT): 0
Q (s1, RIGHT): 0.5
Q (s1, UP): 0
Q (s1, DOWN): 0.3
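
A minimal Python sketch of how a policy can read these values: look up the four Q values for s1 and take the action with the largest one. The dictionary layout and the variable names are assumptions made for illustration; only the numbers come from the slide.

```python
# Q values for state s1, copied from the slide.
q_s1 = {"LEFT": 0.0, "RIGHT": 0.5, "UP": 0.0, "DOWN": 0.3}

# Greedy policy: pick the action whose Q value is largest.
best_action = max(q_s1, key=q_s1.get)
max_q = q_s1[best_action]

print(best_action, max_q)  # RIGHT 0.5
```

With these numbers the policy moves RIGHT from s1, and 0.5 is the Max Q for that state.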

Page 7:

Optimal Policy, and Max Q

Q (state, action)

Max Q = max_a' Q(s, a')

π*(s) = argmax_a Q(s, a)

Page 8:

Frozen Lake: optimal policy with Q

[Figure: Frozen Lake grid, start cell S]

Page 9:

Frozen Lake: optimal policy with Q

[Figure labels: action a, state s, next state s']

Page 10:

Finding, Learning Q

• Assume (believe) Q in s' exists!

• My condition
- I am in s
- when I do action a, I'll go to s'
- when I do action a, I'll get reward r
- Q in s', Q(s', a') exists!

• How can we express Q(s, a) using Q(s', a')?

Page 11:

Learning Q (s, a)?

[Figure labels: action a, reward r, next state s']

Page 12:

State, action, reward

S F F F

F H F H

F F F H

H F F G
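
A small Python sketch of how this map can be turned into states and rewards. It assumes states are numbered 0 to 15 in row-major order (which matches s13 and s14 sitting just left of the goal on Page 18) and that the only reward is 1 for reaching G. The helper names `tile` and `reward` are hypothetical.

```python
# The 4x4 map from the slide: S = start, F = frozen, H = hole, G = goal.
LAKE = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N_COLS = 4

def tile(state):
    """Return the map letter for a state index 0..15 (row-major order)."""
    row, col = divmod(state, N_COLS)
    return LAKE[row][col]

def reward(state):
    """Assumed reward: 1 for reaching the goal, 0 everywhere else."""
    return 1 if tile(state) == "G" else 0

print(tile(0), tile(5), tile(15))   # S H G
print(reward(14), reward(15))       # 0 1
```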

Page 13:

Future reward
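
One way to write the future reward, assuming rewards are simply summed without discounting (as the arithmetic on Page 18 suggests). The symbols R_t, r_t and the final step n are assumed notation.

```latex
R_t = r_t + r_{t+1} + \dots + r_n = r_t + R_{t+1}
```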

Page 14:

Learning Q (s, a)?
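
One way to answer the question posed on Page 10, consistent with the worked numbers on Page 18 (Q(s13, aright) = r + max(Q(s14, a))), is the one-step recurrence below. It assumes deterministic transitions and no discounting, which is what the Page 18 arithmetic uses; the arrow notation is an assumption.

```latex
Q(s, a) \leftarrow r + \max_{a'} Q(s', a')
```

Read it as: the value of taking a in s is the immediate reward r plus the best value available from the next state s'.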

Page 15:

Learning Q(s, a): 16x4 Table

16 states and 4 actions (up, down, left, right)

Page 16:

Learning Q(s, a): Table

initial Q values are 0

[Figure: 4x4 grid; all four Q values in each of the 16 cells are 0]
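
A minimal Python sketch of the 16x4 table with every entry initialised to 0. The nested-list layout (one row per state, one column per action) is an assumption; an array library would work equally well.

```python
N_STATES, N_ACTIONS = 16, 4

# One row per state, one column per action; every entry starts at 0.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

print(len(Q), len(Q[0]))  # 16 4
```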

Page 17:

Learning Q(s, a) Table (with many trials)

initial Q values are 0

Page 18:

Learning Q(s, a) Table (with many trials)

initial Q values are 0

Q(s14, a_right) = r = 1

Q(s13, a_right) = r + max(Q(s14, a)) = 0 + max(0, 0, 1, 0) = 1
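
The two updates above can be replayed in a few lines of Python. The column order LEFT, DOWN, RIGHT, UP is an assumption, chosen so that the 1 lands in the third position as in max(0, 0, 1, 0).

```python
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3   # assumed column order
Q = [[0.0] * 4 for _ in range(16)]

# Stepping right from s14 reaches the goal: Q(s14, a_right) = r = 1.
Q[14][RIGHT] = 1.0

# Stepping right from s13 lands on s14 with reward 0:
# Q(s13, a_right) = r + max(Q(s14, a)) = 0 + max(0, 0, 1, 0) = 1.
r = 0.0
Q[13][RIGHT] = r + max(Q[14])

print(Q[14])          # [0.0, 0.0, 1.0, 0.0]
print(Q[13][RIGHT])   # 1.0
```

The value 1 moves one cell backward along the path each time a state next to an already-updated state is visited.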

Page 19:

Learning Q(s, a) Table: one success!

initial Q values are 0

[Figure: Q table on the 4x4 grid after one successful episode, with some entries set to 1]

Page 20:

Learning Q(s, a) Table: optimal policy

[Figure: Q table on the 4x4 grid, with some entries set to 1]

Page 21:

Dummy Q-learning algorithm
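
A minimal Python sketch of a table-based Q-learning loop built around the update from Page 18, Q(s, a) = r + max(Q(s', a')). Several things here are assumptions rather than the lecture's own code: the environment is a hand-written deterministic version of the map on Page 12, the action order is LEFT, DOWN, RIGHT, UP, ties between equal Q values are broken at random so the agent still moves while the table is all zeros, and the helper names `step`, `greedy_random_ties` and `dummy_q_learning` are hypothetical.

```python
import random

LAKE = ["SFFF", "FHFH", "FFFH", "HFFG"]
# (row, col) offsets; the LEFT, DOWN, RIGHT, UP order is an assumption.
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]

def step(state, action):
    """Deterministic move; returns (next_state, reward, done)."""
    row, col = divmod(state, 4)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), 3)   # walls keep the agent on the grid
    col = min(max(col + d_col, 0), 3)
    cell = LAKE[row][col]
    return row * 4 + col, (1.0 if cell == "G" else 0.0), cell in "HG"

def greedy_random_ties(values, rng):
    """Index of a maximal value, choosing at random among ties."""
    best = max(values)
    return rng.choice([i for i, v in enumerate(values) if v == best])

def dummy_q_learning(episodes=2000, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * 4 for _ in range(16)]          # initial Q values are 0
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = greedy_random_ties(Q[s], rng)    # select an action
            s_next, r, done = step(s, a)         # act, observe r and s'
            Q[s][a] = r + max(Q[s_next])         # Q(s, a) = r + max Q(s', a')
            s = s_next
    return Q

Q = dummy_q_learning()
print(Q[14])   # [0.0, 0.0, 1.0, 0.0] once the goal has been reached
```

Every entry stays 0 until the first episode that reaches G; after that, the 1 spreads backward one state at a time, as on Page 18.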

Page 22:

Next

Lab: Dummy Q-learning Table

