Lab 6-2: Q Network for Cart Pole
Source: hunkim.github.io/ml/RL/rl06-l2.pdf (2017-10-02)
Page 1:

Lab 6-2: Q Network for Cart Pole

Reinforcement Learning with TensorFlow & OpenAI Gym
Sung Kim <[email protected]>

Page 2:

Cart Pole

https://gym.openai.com/docs
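The imports and environment creation are only partially legible in this scan; cleaned up, they amount to the following (the assignment to env is assumed, since later slides step through the environment):

import numpy as np
import tensorflow as tf
import gym

env = gym.make('CartPole-v0')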

Page 3:

Random trials
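A random-action baseline in the spirit of this slide might look like the sketch below; it assumes the classic gym API where env.step returns four values:

# Run a few episodes taking random actions and report the total reward.
for i in range(10):
    s = env.reset()
    total_reward = 0
    done = False
    while not done:
        a = env.action_space.sample()      # pick a random action
        s, reward, done, _ = env.step(a)   # classic gym API (4 return values)
        total_reward += reward
    print("Episode {}: total reward = {}".format(i, total_reward))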

Page 4:

Rewards

Page 5:

Cart Pole Q-network

[Diagram: the 4-dimensional Cart Pole state s feeds a single weight matrix W that outputs Q-values for the two actions (push left, push right).]

Page 6:

Q-Network training (Network construction)

[Diagram: input state s, weight matrix W, predicted Q-values Q(s) = Ws.]
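In the TensorFlow 1.x style used in these slides, the single-layer network can be constructed roughly as below; the names X, W, and Qpred and the uniform initializer are illustrative assumptions:

# Input placeholder: one Cart Pole state (4 numbers).
X = tf.placeholder(shape=[None, 4], dtype=tf.float32)
# Single weight matrix mapping the state to Q-values for the 2 actions.
W = tf.Variable(tf.random_uniform([4, 2], 0, 0.01))
# Predicted Q-values, shape [None, 2].
Qpred = tf.matmul(X, W)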

Page 7:

Q-Network training (linear regression)

[Diagram: the same linear network Q(s) = Ws, trained by regression toward the target y.]

y = r + γ max Q(s')

cost(W) = (Ws - y)^2

Page 8:

Code: Network and setup
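The remaining setup (target placeholder, loss, optimizer, and hyperparameters) is sketched below; the concrete values and the names Y, loss, train, and dis are illustrative, not a transcription of the slide:

# Regression target for the Q-values.
Y = tf.placeholder(shape=[None, 2], dtype=tf.float32)

# Squared-error cost between prediction and target: (Ws - y)^2.
loss = tf.reduce_sum(tf.square(Y - Qpred))
train = tf.train.AdamOptimizer(learning_rate=0.1).minimize(loss)

dis = 0.9            # discount factor (gamma), illustrative value
num_episodes = 2000  # illustrative value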

Page 9:

Code: Training
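A training loop consistent with the construction and target above might look like this sketch; it uses e-greedy exploration, the target y = r + γ max Q(s'), and the classic gym API where env.step returns four values:

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for i in range(num_episodes):
        s = env.reset()
        e = 1.0 / ((i / 10) + 1)   # decaying e-greedy exploration rate
        done = False
        while not done:
            # Q-values for the current state (reshape state to [1, 4]).
            Qs = sess.run(Qpred, feed_dict={X: np.reshape(s, [1, 4])})
            if np.random.rand(1) < e:
                a = env.action_space.sample()   # explore
            else:
                a = np.argmax(Qs)               # exploit
            s1, reward, done, _ = env.step(a)
            if done:
                Qs[0, a] = -100                 # penalty when the pole falls
            else:
                Qs1 = sess.run(Qpred, feed_dict={X: np.reshape(s1, [1, 4])})
                Qs[0, a] = reward + dis * np.max(Qs1)   # y = r + γ max Q(s')
            # One gradient step of the regression toward the target Qs.
            sess.run(train, feed_dict={X: np.reshape(s, [1, 4]), Y: Qs})
            s = s1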

Page 10:

Code: apply
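Applying the learned network greedily might look like the following sketch; it assumes the session, X, and Qpred from the training sketch are still in scope, and env.render() follows the classic gym API:

# Act greedily with the learned weights and report the score.
observation = env.reset()
reward_sum = 0
while True:
    env.render()
    Qs = sess.run(Qpred, feed_dict={X: np.reshape(observation, [1, 4])})
    a = np.argmax(Qs)
    observation, reward, done, _ = env.step(a)
    reward_sum += reward
    if done:
        print("Total score: {}".format(reward_sum))
        break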

Page 11:

Results: really poor!

Page 12:

Why doesn't it work? Too shallow?

Page 13:

Exercise

• Why doesn't it work?

• Hint: DQN

