
Deep Q-Learning from Demonstrations (DQfD)florian/courses/imitation_learning/lectures/... · Deep...

Date post: 21-May-2020
Deep Q-Learning from Demonstrations (DQfD) Bryan Chan & Chandripal Budnarain
Transcript

Deep Q-Learning from Demonstrations (DQfD)

Bryan Chan & Chandripal Budnarain


Markov Decision Process (MDP)

• An MDP is a tuple ⟨𝑆, 𝐴, 𝑃, 𝑅, 𝛾⟩

• 𝑆: A finite set of states

• 𝐴: A finite set of actions

• 𝑃: A state transition function

• 𝑅: A reward function

• 𝛾: Discount factor

• Goal: find a policy 𝜋: 𝑆 → 𝐴 that maximizes the expected discounted total reward
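
To make "expected discounted total reward" concrete, here is a minimal sketch that computes the discounted return of one finite reward sequence. The function name and the sample rewards are illustrative assumptions, not taken from the slides.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma^k * rewards[k] for a finite reward sequence."""
    total = 0.0
    # Work backwards using G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

A policy is better, in the MDP sense, when the expectation of this quantity over its trajectories is larger.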


Q-Function

• The action-value Q-function $Q^{\pi}(s_t, a_t)$ is the expected return starting from state $s_t$, taking action $a_t$, and then following policy $\pi$

• $Q^{\pi}(s_t, a_t) = E[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid s_t, a_t] = E_{s'}[R_{t+1} + \gamma Q^{\pi}(s', a') \mid s_t, a_t]$

• The optimal policy can be obtained from the optimal Q-function: $\pi^*(s) = \arg\max_a Q^*(s, a)$
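
As a small illustration of reading a policy off a Q-function, the sketch below takes the argmax over actions in each state. The nested-dictionary Q-table and the state and action names are hypothetical.

```python
def greedy_policy(q_table):
    """Map each state to the action with the largest Q-value."""
    return {state: max(actions, key=actions.get)
            for state, actions in q_table.items()}


# Hypothetical Q-table: two states, two actions each.
q_table = {
    "s0": {"left": 0.1, "right": 0.7},
    "s1": {"left": 0.4, "right": 0.2},
}
print(greedy_policy(q_table))  # {'s0': 'right', 's1': 'left'}
```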


Q-Learning Algorithm
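
As a hedged illustration of the Q-learning update, here is a minimal tabular sketch. The chain environment (states 0 to 4, actions 0 = left and 1 = right, reward 1 on reaching the last state), the epsilon-greedy behaviour policy, and all hyperparameter values are assumptions chosen for the example, not details from the slides.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical deterministic chain."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([act for act in (0, 1) if q[s][act] == best])
            s_next = max(s - 1, 0) if a == 0 else s + 1
            reward = 1.0 if s_next == goal else 0.0
            # No bootstrapping from the terminal state.
            bootstrap = 0.0 if s_next == goal else gamma * max(q[s_next])
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q[s][a] += alpha * (reward + bootstrap - q[s][a])
            s = s_next
    return q


q = q_learning()
print([round(max(row), 2) for row in q[:-1]])
```

After training, moving right should have the larger Q-value in every non-terminal state of this toy chain.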


Deep Q-Network (DQN)

• State-action space might be too big for storing a Q-table!

• Idea: Replace Q-table with a neural network that approximates Q-values

• Deep Q-Network = Deep Learning + Q-Learning


Q-Function Approximator

• $\text{Loss} = [(R(s, a) + \gamma \max_{a \in A} Q(s', a; \theta)) - Q(s, a; \theta)]^2$
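
The squared TD loss above can be sketched for a single transition as follows. This is plain Python rather than a neural-network implementation; the function name and the `done` flag for terminal transitions are assumptions added for the example.

```python
def td_loss(q_sa, reward, q_next, gamma, done=False):
    """Squared one-step TD error for one transition.

    q_sa:   current estimate Q(s, a; theta)
    q_next: list of Q(s', a; theta) over all actions a
    done:   assumed flag; if True, no bootstrapping from s'
    """
    target = reward if done else reward + gamma * max(q_next)
    return (target - q_sa) ** 2


# target = 0.5 + 0.5 * 2.0 = 1.5, error = 0.5, loss = 0.25
print(td_loss(q_sa=1.0, reward=0.5, q_next=[0.0, 2.0], gamma=0.5))  # 0.25
```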


DQN Algorithm


How to Combine Demonstration Data with DQN?


Loss Function

• Recall that the loss function for Q-Learning is: $J_{DQN}(Q) = [(R(s, a) + \gamma \max_{a} Q(s', a; \theta)) - Q(s, a; \theta)]^2$

• Given demonstration data, we want the agent to learn from it

• Issue: Demonstration data only covers a small subset of the state space and does not consider a lot of actions

• Issue: Q-values for actions the demonstrator never took are ungrounded (no data ties them to realistic values), and the Q-Network would propagate these values


Supervised Large Margin Classification Loss

• Push the values of other actions to be at least a margin lower than the demonstrator’s action

• The loss function: $J_E(Q) = \max_{a \in A}[Q(s, a) + l(a, a_E)] - Q(s, a_E)$,

where $l(a, a_E)$ is a margin function that is 0 when $a = a_E$ and some positive value otherwise, and $a_E$ is the demonstrator’s action

• In this paper, $l(a, a_E) = 0$ if $a = a_E$, and 0.8 otherwise
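
The large margin loss can be sketched for a single state as below, using the margin of 0.8 quoted on the slide. The function name and the example Q-values are illustrative assumptions.

```python
def large_margin_loss(q_values, expert_action, margin=0.8):
    """J_E = max_a [Q(s, a) + l(a, a_E)] - Q(s, a_E) for one state.

    q_values:      list of Q(s, a) indexed by action
    expert_action: index of the demonstrator's action a_E
    """
    augmented = [q + (0.0 if a == expert_action else margin)
                 for a, q in enumerate(q_values)]
    return max(augmented) - q_values[expert_action]


# Expert action already beats every other action by at least the margin.
print(large_margin_loss([2.0, 0.5, 0.0], expert_action=0))  # 0.0
# Another action comes within the margin, so the loss is positive.
print(round(large_margin_loss([1.0, 0.5, 0.0], expert_action=0), 2))  # 0.3
```

The loss is zero only when every other action is valued at least a margin below the demonstrator's action, which is what grounds the values of unseen actions.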


New Loss Function

• $J(Q) = J_{DQN}(Q) + \lambda_1 J_n(Q) + \lambda_2 J_E(Q) + \lambda_3 J_{L2}(Q)$,

where the $\lambda$’s control the weighting between the losses, $J_n(Q)$ is the n-step TD loss, and $J_{L2}(Q)$ is the L2 regularization loss
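
A minimal sketch of how the pieces combine, assuming the usual n-step target (n discounted rewards plus a bootstrapped value at step n). The function names and the numeric weights in the example are placeholders, not values from the slides.

```python
def n_step_target(rewards, q_boot, gamma):
    """n-step return: sum_{k<n} gamma^k * r_k + gamma^n * max_a Q(s_{t+n}, a).

    rewards: the n rewards observed after (s_t, a_t)
    q_boot:  list of Q(s_{t+n}, a) over all actions
    """
    n = len(rewards)
    discounted = sum(gamma ** k * r for k, r in enumerate(rewards))
    return discounted + gamma ** n * max(q_boot)


def combined_loss(j_dqn, j_n, j_e, j_l2, lam1, lam2, lam3):
    """J(Q) = J_DQN + lam1 * J_n + lam2 * J_E + lam3 * J_L2."""
    return j_dqn + lam1 * j_n + lam2 * j_e + lam3 * j_l2


# 1 + 0.5 * 1 + 0.25 * 4 = 2.5
print(n_step_target([1.0, 1.0], q_boot=[0.0, 4.0], gamma=0.5))  # 2.5
# Placeholder loss values and weights: 1 + 2 + 3 + 0.5 * 4 = 8.0
print(combined_loss(1.0, 2.0, 3.0, 4.0, lam1=1.0, lam2=1.0, lam3=0.5))  # 8.0
```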

• There is a trade-off between following demonstration data and finding optimal Q-values


Prioritized Experience Replay

• In DQN, we sample experiences from the replay buffer uniformly

• Issue: Uniform sampling ignores that we tend to learn better when there is a big difference between what we imagine and the actual outcome

• For example, we focus on mistakes and learn from them!

• We can prioritize what we sample instead, by looking at the latest TD-error: $\delta = \underbrace{R(s, a) + \gamma \max_{a \in A} Q(s', a; \theta)}_{\text{“actual” outcome}} - \underbrace{Q(s, a; \theta)}_{\text{“estimated” outcome}}$


Prioritized Experience Replay

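• Specifically, experience $i$ is sampled with probability $P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}}$,

where the priority $p_i = |\delta_i| + \epsilon$ is the absolute value of the last TD-error plus a small positive constant $\epsilon$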

• What is 𝜶?

• 𝛼 (hyperparameter) decides how much prioritization is used. If 𝛼 =0, we are sampling uniformly

• Issue: Sampling with priority introduces bias and changes the distribution
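
The sampling rule can be sketched as follows. The function names, the default `eps`, and the TD-error values are illustrative assumptions; a production buffer would typically use a sum-tree rather than recomputing the full distribution.

```python
import random


def sampling_probabilities(td_errors, alpha, eps=0.001):
    """P(i) = p_i^alpha / sum_k p_k^alpha with p_i = |delta_i| + eps."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, alpha, batch_size, rng=random):
    """Draw experience indices in proportion to their priorities."""
    probs = sampling_probabilities(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)


print(sampling_probabilities([1.0, 3.0], alpha=1.0, eps=0.0))  # [0.25, 0.75]
# With alpha = 0 every priority becomes 1, so sampling is uniform.
print(sampling_probabilities([1.0, 3.0], alpha=0.0))  # [0.5, 0.5]
```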


Prioritized Experience Replay

• Solution: Correct using weighted importance-sampling with weights $w_i = \left(\frac{1}{N} \cdot \frac{1}{P(i)}\right)^{\beta}$, where $N$ is the number of samples

• What is 𝜷?

• 𝛽 (hyperparameter) decides how much we should compensate for the non-uniform probabilities $P(i)$. If 𝛽 = 1, we fully compensate

• In general, 𝛼 and 𝛽 grow together as time goes on. The idea is that we first sample close to uniformly, then slowly sample with priority

• In this paper, 𝛼 = 0.4 and 𝛽 = 0.6 (Fixed)
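
A minimal sketch of the importance-sampling correction, assuming the probabilities cover the whole buffer so that N is their count. Dividing by the largest weight is a common stabilizing step from the prioritized replay literature; it is an assumption here, not something stated on the slides.

```python
def importance_weights(probs, beta):
    """w_i = (1 / (N * P(i)))^beta, then scaled so the largest weight is 1."""
    n = len(probs)
    weights = [(1.0 / (n * p)) ** beta for p in probs]
    max_w = max(weights)  # assumed normalization, keeps updates bounded
    return [w / max_w for w in weights]


# beta = 0: no compensation, every weight is 1.
print(importance_weights([0.25, 0.75], beta=0.0))  # [1.0, 1.0]
# beta = 1: full compensation, the over-sampled item is down-weighted.
print(importance_weights([0.25, 0.75], beta=1.0))
```

Each sampled transition's TD update is multiplied by its weight, so frequently sampled experiences contribute proportionally less per update.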


Deep Q-Learning from Demonstrations (DQfD)


DQfD Pre-Training


DQfD Post-Training


DQfD Replay Buffer Tweak

• We give more priority to demonstration data (by having a higher $\epsilon$)

• In this paper, $\epsilon_a = 0.001$ (self-generated) and $\epsilon_d = 1.0$ (demonstration)

• Problem: What if the replay buffer is full?

• 1) We want to make sure the agent does not stray too far from the demonstrator unless some other action is optimal

• Keep demonstration data

• 2) Old sampled experiences are out-of-date

• Remove oldest self-generated data
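
The buffer policy above can be sketched as two stores: demonstrations that are never evicted, and a bounded queue of self-generated transitions that drops its oldest entry when full. The class and method names are hypothetical; the priority bonuses default to the values quoted on this slide.

```python
from collections import deque


class DemoReplayBuffer:
    """Hypothetical buffer: keep demonstrations, evict old agent data."""

    def __init__(self, demos, capacity):
        self.demos = list(demos)             # demonstration data, never evicted
        self.agent = deque(maxlen=capacity)  # self-generated data, oldest dropped first

    def add(self, transition):
        """Store a self-generated transition."""
        self.agent.append(transition)

    @staticmethod
    def priority(td_error, is_demo, eps_agent=0.001, eps_demo=1.0):
        """p = |delta| + eps, with a larger eps for demonstration data."""
        return abs(td_error) + (eps_demo if is_demo else eps_agent)

    def __len__(self):
        return len(self.demos) + len(self.agent)


buf = DemoReplayBuffer(demos=["d1", "d2"], capacity=3)
for step in range(5):
    buf.add(step)
print(buf.demos, list(buf.agent))  # ['d1', 'd2'] [2, 3, 4]
```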


Experiment

• Compared DQfD with PDD DQN & supervised imitation on three games

• PDD DQN is DQfD without demonstration data, pre-training, supervised loss, and regularization loss


Removing Supervised Loss


Summary

• Improved initial performance in real system using demonstration data

• Accelerated learning by combining supervised large margin classification loss and traditional DQN loss

• Smartly utilizes demonstration data during post-training using prioritized experience replay


Limitations

• Does not explore continuous state-action space scenarios

• Similar to the previous paper, the algorithm does not account for hidden state that humans might consider


Presented by David Acuna and Brenna Li


Problem Formulation

Auto-Rally car

training/test track

off-road real-world scenario.

high speed is a must


Problem Formulation

• Expensive sensors (~ $6,000): model predictive control

• Cheap sensors (~ $500): NN learns from raw images and speed sensor

IMU = Inertial Measurement Unit

GPS = Global Positioning System


Formulation

• needs to account for high-speed

• involves a physical robot

state, action, observation

expected reward of taking this action

expected reward of this state


Formulation

• needs to account for high-speed

• involves a physical robot

Hard to solve

expert

Wasserstein Distance


Formulation

expert's policy | learner's policy

Online Imitation Learning Problem


Online Imitation Learning

online IL problem

DAgger

Sequence of

Supervised Learning Problems


Batch Imitation Learning

Flipping the policies

This reduces to supervised learning

expert policy | expert policy


System Diagram


DNN Control Policy


Expert – recall control

Sparse Spectrum Gaussian Process


Expert – MPC

Differential Dynamic Programming (DDP), recall iLQR


Related works:


Experiment – Setup Experts

High Speed driving

at 7.5 m/s (27 km/h for the 1/5-scale car, which corresponds to about 135 km/h at full scale)

Cost for expert:


Experiment – learning trajectories


Comparing – Loss (to expert)


Comparing – distance travelled


Comparing – generalizability

t-Distributed Stochastic Neighbor Embedding (t-SNE)


Comparing – generalizability


DNN – high and low capture


DNN > CNN … or Limitation?


Thank you!

Any Questions?

