Lecture 3: Q-learning (table)
Reinforcement Learning with TensorFlow & OpenAI Gym
Sung Kim <[email protected]>
Try Frozen Lake, Real Game?
Frozen Lake: Random?
Frozen Lake: Even if you know the way, ask. (Korean proverb: "아는 길도 물어가라", "ask your way even on a road you know")
Q-function (state-action value function)
(1) state
(2) action
(3) quality (reward)
Q(state, action)
Policy using Q-function
Q (state, action)
Q(s1, LEFT): 0
Q(s1, RIGHT): 0.5
Q(s1, UP): 0
Q(s1, DOWN): 0.3
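Given such values, the policy simply picks the action with the largest Q. A minimal sketch (the values are taken from the slide; the action ordering and variable names here are my own):

```python
import numpy as np

# Hypothetical Q-values for state s1, in the order LEFT, RIGHT, UP, DOWN
# (values from the slide: 0, 0.5, 0, 0.3).
actions = ["LEFT", "RIGHT", "UP", "DOWN"]
q_s1 = np.array([0.0, 0.5, 0.0, 0.3])

# The policy takes the action with the largest Q-value.
best = actions[int(np.argmax(q_s1))]
print(best)  # RIGHT
```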
Optimal Policy, and Max Q
Q (state, action)
π*(s) = argmax_a Q(s, a)
Max Q = max_a Q(s, a)
Frozen Lake: optimal policy with Q
Frozen Lake: optimal policy with Q
Finding, Learning Q
• Assume (believe) Q in s` exists!
• My condition:
- I am in state s
- when I do action a, I'll go to s`
- when I do action a, I'll get reward r
- Q in s`, i.e. Q(s`, a`), exists!
• How can we express Q(s, a) using Q(s`, a`)?
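One consistent answer, matching the worked table update later in this lecture (r is the reward received for taking action a in s; this dummy version uses no discount factor):

```latex
Q(s, a) = r + \max_{a'} Q(s', a')
```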
Learning Q (s, a)?
(diagram: in state s, taking action a gives reward r and leads to next state s`)
State, action, reward
S F F F
F H F H
F F F H
H F F G
Future reward
Learning Q (s, a)?
Learning Q(s, a): 16x4 Table
16 states and 4 actions (up, down, left, right)
Learning Q(s, a): Table (initial Q values are 0)
(table: the 16x4 Q-table, every entry 0)
Learning Q(s, a) Table (with many trials), initial Q values are 0
Q(s14, a_right) = r = 1
Q(s13, a_right) = r + max(Q(s14, a)) = 0 + max(0, 0, 1, 0) = 1
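The two backups above can be checked with a small NumPy sketch (the action ordering 0=LEFT, 1=DOWN, 2=RIGHT, 3=UP is my assumption, matching Gym's FrozenLake):

```python
import numpy as np

# 16 states x 4 actions (0=LEFT, 1=DOWN, 2=RIGHT, 3=UP), all zeros initially.
Q = np.zeros((16, 4))
RIGHT = 2

# Moving RIGHT from state 14 reaches the goal and pays r = 1:
Q[14, RIGHT] = 1.0  # Q(s14, a_right) = r = 1

# Moving RIGHT from state 13 reaches state 14 with r = 0, so back up:
# Q(s13, a_right) = r + max_a Q(s14, a) = 0 + max(0, 0, 1, 0) = 1
Q[13, RIGHT] = 0.0 + np.max(Q[14])

print(Q[13, RIGHT])  # 1.0
```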
Learning Q(s, a) Table: one success! (initial Q values are 0)
(table: after one successful episode, the entries along the path to the goal become 1)
Learning Q(s, a) Table: optimal policy
(table: following the max-Q action in each state traces a path from S to G)
Dummy Q-learning algorithm
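The dummy algorithm can be sketched end to end: initialize the table to zero, explore randomly, and apply the update Q(s, a) = r + max_a' Q(s', a') after every step. This is a minimal self-contained version; the tiny deterministic FrozenLake environment and all names below are my own assumptions, not the lab's code:

```python
import numpy as np

# Minimal deterministic 4x4 FrozenLake, states 0..15 row-major
# ('H' = hole: episode ends with r = 0; 'G' = goal: episode ends with r = 1).
MAP = "SFFF" "FHFH" "FFFH" "HFFG"
N_STATES, N_ACTIONS = 16, 4  # actions: 0=LEFT, 1=DOWN, 2=RIGHT, 3=UP

def step(s, a):
    """Apply action a in state s; return (next_state, reward, done)."""
    row, col = divmod(s, 4)
    if a == 0:
        col = max(col - 1, 0)
    elif a == 1:
        row = min(row + 1, 3)
    elif a == 2:
        col = min(col + 1, 3)
    else:
        row = max(row - 1, 0)
    s2 = row * 4 + col
    if MAP[s2] == 'G':
        return s2, 1.0, True
    if MAP[s2] == 'H':
        return s2, 0.0, True
    return s2, 0.0, False

# Dummy Q-learning: act randomly, back up Q(s, a) = r + max_a' Q(s', a').
Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng(0)
for _ in range(2000):
    s, done = 0, False
    while not done:
        a = int(rng.integers(N_ACTIONS))  # pure random exploration
        s2, r, done = step(s, a)
        Q[s, a] = r + np.max(Q[s2])       # the dummy update from the slides
        s = s2

print(Q[14, 2])  # 1.0 once the goal has been reached at least once
```

Note the deliberate simplifications: no exploration schedule and no discount factor, which is why this is "dummy" Q-learning; the lecture's later versions address both.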
Next
Lab: Dummy Q-learning Table