Model-Based Reinforcement Learning

Page 1: Model-Based Reinforcement Learning

Model-Based Reinforcement Learning

CS 294-112: Deep Reinforcement Learning

Sergey Levine

Page 2: Model-Based Reinforcement Learning

Class Notes

1. Project proposal due today!

2. Remember to start early on Homework 3!

Page 3: Model-Based Reinforcement Learning

1. Last lecture: choose good actions autonomously by backpropagating (or planning) through known system dynamics (e.g. known physics)

2. Today: what do we do if the dynamics are unknown?

a. Fitting global dynamics models (“model-based RL”)

b. Fitting local dynamics models

3. Friday: learning dynamics for high-dimensional observations, such as images

4. Following Wednesday: combining optimal control and policy search to train neural network policies with the aid of optimal control

Overview

Page 4: Model-Based Reinforcement Learning

1. Overview of model-based RL

• Learn only the model

• Learn model & policy


2. What kind of models can we use?

3. Global models and local models

4. Learning with local models and trust regions

• Goals:

• Understand the terminology and formalism of model-based RL

• Understand the options for models we can use in model-based RL

• Understand practical considerations of model learning

• Not much deep RL today, we’ll see more advanced model-based RL later!

Today’s Lecture

Page 5: Model-Based Reinforcement Learning

Why learn the model?

Page 6: Model-Based Reinforcement Learning

Why learn the model?

Page 7: Model-Based Reinforcement Learning

Why learn the model?

Page 8: Model-Based Reinforcement Learning

Does it work? Yes!

• Essentially how system identification works in classical robotics

• Some care should be taken to design a good base policy

• Particularly effective if we can hand-engineer a dynamics representation using our knowledge of physics, and fit just a few parameters
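
To make the last bullet concrete, here is a toy sketch (illustrative, not from the lecture; the point-mass model and all names are ours): the structure of the dynamics comes from physics, and ordinary least squares recovers the unknown constants from transitions collected under a base policy.

```python
# Toy sketch (illustrative, not from the lecture): classic system identification
# for a hypothetical 1-D point mass.  The *form* of the dynamics comes from physics;
# we only fit two scalars (mass m, friction b) to observed transitions.
import numpy as np

def next_velocity(v, u, m, b, dt=0.05):
    # assumed physics: v' = v + dt * (u - b*v) / m
    return v + dt * (u - b * v) / m

def fit_params(v, u, v_next, dt=0.05):
    # rearrange to (v' - v)/dt = (1/m)*u - (b/m)*v and solve by least squares
    y = (v_next - v) / dt
    X = np.stack([u, -v], axis=1)              # columns multiply 1/m and b/m
    (inv_m, b_over_m), *_ = np.linalg.lstsq(X, y, rcond=None)
    m = 1.0 / inv_m
    return m, b_over_m * m                     # recover (m, b)

# the "base policy" here is just random actions on the true system, plus noise
rng = np.random.default_rng(0)
v, u = rng.normal(size=1000), rng.normal(size=1000)
v_next = next_velocity(v, u, m=2.0, b=0.5) + 0.01 * rng.normal(size=1000)
print(fit_params(v, u, v_next))                # approximately (2.0, 0.5)
```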

Page 9: Model-Based Reinforcement Learning

Does it work? No!

• Distribution mismatch problem becomes exacerbated as we use more expressive model classes

go right to get higher!

Page 10: Model-Based Reinforcement Learning

Can we do better?

Page 11: Model-Based Reinforcement Learning

What if we make a mistake?

Page 12: Model-Based Reinforcement Learning

Can we do better? (replan every N steps)

This will be on HW4!

Page 13: Model-Based Reinforcement Learning

How to replan? (replan every N steps)

• The more you replan, the less perfect each individual plan needs to be

• Can use shorter horizons

• Even random sampling can often work well here!
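
A minimal sketch of that last point, random-shooting MPC. The names `dynamics` and `cost` are stand-ins for a learned model and a task cost (assumptions for this sketch, not part of the slides):

```python
# Minimal sketch of random-shooting MPC: sample many random action sequences over a
# short horizon, score them under the learned model, execute only the first action
# of the best sequence, then replan at the next step.
import numpy as np

def random_shooting_mpc(state, dynamics, cost, horizon=15, n_samples=1000,
                        action_dim=2, rng=None):
    rng = rng or np.random.default_rng()
    best_cost, best_first_action = np.inf, None
    for _ in range(n_samples):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total = state, 0.0
        for a in actions:
            total += cost(s, a)
            s = dynamics(s, a)                 # learned model predicts the next state
        if total < best_cost:
            best_cost, best_first_action = total, actions[0]
    return best_first_action                   # execute this one action, then replan
```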

Page 14: Model-Based Reinforcement Learning

That seems like a lot of work… (replan every N steps)

Page 15: Model-Based Reinforcement Learning

Backpropagate directly into the policy?

(figure: backpropagation through the learned dynamics at each time step, into the policy)

easy for deterministic policies, but also possible for stochastic policies (more on this later)
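
A minimal PyTorch-style sketch of the deterministic case (illustrative; the network sizes, the cost, and the assumption that `dynamics` is already fit to real data are ours, not the lecture's): roll the policy forward through the differentiable model and backprop the summed cost into the policy weights.

```python
import torch
import torch.nn as nn

state_dim, action_dim, horizon = 4, 2, 20
dynamics = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
                         nn.Linear(64, state_dim))      # pretend: pre-trained on real transitions
dynamics.requires_grad_(False)                           # model weights fixed; gradients still flow through it
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                       nn.Linear(64, action_dim), nn.Tanh())

def cost(s, a):                                          # illustrative quadratic cost
    return (s ** 2).sum() + 0.1 * (a ** 2).sum()

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(100):
    s = torch.randn(state_dim)                           # sampled initial state
    total = torch.zeros(())
    for _ in range(horizon):
        a = policy(s)
        total = total + cost(s, a)
        s = dynamics(torch.cat([s, a]))                  # gradient flows back through every step
    opt.zero_grad()
    total.backward()
    opt.step()
```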

Page 16: Model-Based Reinforcement Learning

Summary

• Version 0.5: collect random samples, train dynamics, plan
  • Pro: simple, no iterative procedure
  • Con: distribution mismatch problem

• Version 1.0: iteratively collect data, replan, collect data
  • Pro: simple, solves distribution mismatch
  • Con: open-loop plan might perform poorly, esp. in stochastic domains

• Version 1.5: iteratively collect data using MPC (replan at each step; sketched below)
  • Pro: robust to small model errors
  • Con: computationally expensive, but have a planning algorithm available

• Version 2.0: backpropagate directly into policy
  • Pro: computationally cheap at runtime
  • Con: can be numerically unstable, especially in stochastic domains (more on this later)
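
A pseudocode-style sketch of Version 1.5 (the env/model/planner interfaces here are placeholders chosen for this sketch, not an official API):

```python
def model_based_rl_v15(env, fit_dynamics, mpc_plan, initial_data,
                       n_iters=10, rollout_len=100):
    data = list(initial_data)            # e.g. random rollouts (Version 0.5's dataset)
    model = None
    for _ in range(n_iters):
        model = fit_dynamics(data)       # supervised learning on (s, a, s') tuples
        s = env.reset()
        for _ in range(rollout_len):
            a = mpc_plan(model, s)       # replan at every step, e.g. random shooting above
            s_next = env.step(a)         # assumed to return just the next state
            data.append((s, a, s_next))  # aggregate data, so distribution mismatch shrinks
            s = s_next
    return model
```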

Page 17: Model-Based Reinforcement Learning

Case study: model-based policy search with GPs

Page 18: Model-Based Reinforcement Learning

Case study: model-based policy search with GPs

Page 19: Model-Based Reinforcement Learning

Case study: model-based policy search with GPs

Page 20: Model-Based Reinforcement Learning
Page 21: Model-Based Reinforcement Learning

What kind of models can we use?

• Gaussian process

• neural network (image: Punjani & Abbeel ’14)

• other

• video prediction? (more on this later in the course)
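
As an illustration of the neural-network option (a sketch of ours, not from the slides): a common choice is to regress the change in state rather than the next state itself, which is often better conditioned.

```python
import torch
import torch.nn as nn

class DeltaDynamics(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, state_dim))

    def forward(self, s, a):
        return s + self.net(torch.cat([s, a], dim=-1))   # next state = state + predicted delta

def train_step(model, opt, s, a, s_next):
    loss = ((model(s, a) - s_next) ** 2).mean()          # plain MSE regression on transitions
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```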

Page 22: Model-Based Reinforcement Learning

(figure: the model-based RL loop, replanning every N steps)
Page 23: Model-Based Reinforcement Learning
Page 24: Model-Based Reinforcement Learning

Break

Page 25: Model-Based Reinforcement Learning

The trouble with global models

• Planner will seek out regions where the model is erroneously optimistic

• Need to find a very good model in most of the state space to converge on a good solution

Page 26: Model-Based Reinforcement Learning

The trouble with global models

• Planner will seek out regions where the model is erroneously optimistic

• Need to find a very good model in most of the state space to converge on a good solution

• In some tasks, the model is much more complex than the policy

Page 27: Model-Based Reinforcement Learning

Local models

Page 28: Model-Based Reinforcement Learning

Local models

Page 29: Model-Based Reinforcement Learning

Local models

Page 30: Model-Based Reinforcement Learning

What controller to execute?

Page 31: Model-Based Reinforcement Learning

What controller to execute?

Page 32: Model-Based Reinforcement Learning

What controller to execute?

Page 33: Model-Based Reinforcement Learning

Local models

Page 34: Model-Based Reinforcement Learning

How to fit the dynamics?

Can we do better?
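
For the first question, a hedged sketch of the simplest answer: fit a time-varying linear model by regressing x_{t+1} on (x_t, u_t) at each time step, across the rollouts collected around the current controller. (The "better" versions regularize this regression with a prior from a global model; that is omitted here.)

```python
import numpy as np

def fit_local_linear_dynamics(X, U):
    # X: (N, T+1, state_dim) states,  U: (N, T, action_dim) actions, from N rollouts
    N, T, dU = U.shape
    dX = X.shape[-1]
    A, B, c = [], [], []
    for t in range(T):
        inputs = np.concatenate([X[:, t], U[:, t], np.ones((N, 1))], axis=1)  # (N, dX+dU+1)
        targets = X[:, t + 1]                                                 # (N, dX)
        W, *_ = np.linalg.lstsq(inputs, targets, rcond=None)                  # (dX+dU+1, dX)
        A.append(W[:dX].T); B.append(W[dX:dX + dU].T); c.append(W[-1])
    return A, B, c        # x_{t+1} ≈ A[t] x_t + B[t] u_t + c[t]
```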

Page 35: Model-Based Reinforcement Learning

What if we go too far?

Page 36: Model-Based Reinforcement Learning

How to stay close to old controller?

Page 37: Model-Based Reinforcement Learning

KL-divergences between trajectories

• Turns out to work very similarly to trust region for PG

dynamics & initial state are the same!

Page 38: Model-Based Reinforcement Learning

KL-divergences between trajectories

Page 39: Model-Based Reinforcement Learning

KL-divergences between trajectories

negative entropy
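
Restating the derivation from these slides in one place, with p(τ) the new controller's trajectory distribution and p̄(τ) the old one:

```latex
% Both trajectory distributions share the dynamics and initial state:
% p(\tau) = p(x_1)\prod_t p(u_t\mid x_t)\, p(x_{t+1}\mid x_t, u_t), and likewise for
% \bar p(\tau) with a different controller \bar p(u_t\mid x_t).  The dynamics and
% initial-state terms cancel inside the log-ratio:
\begin{align}
D_{\mathrm{KL}}\!\left(p(\tau)\,\|\,\bar p(\tau)\right)
  &= E_{p(\tau)}\!\left[\log p(\tau) - \log \bar p(\tau)\right] \\
  &= \sum_t E_{p(x_t,u_t)}\!\left[\log p(u_t\mid x_t) - \log \bar p(u_t\mid x_t)\right] \\
  &= \sum_t E_{p(x_t,u_t)}\!\left[-\log \bar p(u_t\mid x_t)\right]
     - E_{p(x_t)}\!\left[\mathcal{H}\!\left(p(u_t\mid x_t)\right)\right]
\end{align}
% The last term is the "negative entropy" noted on the slide.
```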

Page 40: Model-Based Reinforcement Learning

KL-divergences between trajectories

Page 41: Model-Based Reinforcement Learning

Digression: dual gradient descent

how to maximize? Compute the gradient!
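
In symbols, for a generic constrained problem min_x f(x) subject to C(x) = 0:

```latex
% Lagrangian and dual function:
\begin{align}
\mathcal{L}(x,\lambda) = f(x) + \lambda\, C(x), \qquad
g(\lambda) = \min_x \mathcal{L}(x,\lambda)
\end{align}
% Ascend the dual: the gradient of g at \lambda is the constraint value evaluated
% at the current minimizer, so dual gradient descent alternates
\begin{align}
x^\star &\leftarrow \arg\min_x \mathcal{L}(x,\lambda) \\
\lambda &\leftarrow \lambda + \alpha\, C(x^\star)
\end{align}
```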

Page 42: Model-Based Reinforcement Learning

Digression: dual gradient descent

Page 43: Model-Based Reinforcement Learning

Digression: dual gradient descent

Page 44: Model-Based Reinforcement Learning

DGD with iterative LQR

Page 45: Model-Based Reinforcement Learning

DGD with iterative LQR

this is the hard part, everything else is easy!
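
Putting the pieces together (this follows the slides' construction, though scaling conventions may differ slightly): the constrained trajectory problem has a Lagrangian whose minimization over the controller, for fixed λ, becomes a maximum-entropy LQR problem with a modified cost.

```latex
% Constrained problem:  min_p E_{p(\tau)}[c(\tau)]  s.t.  D_KL(p(\tau) || \bar p(\tau)) <= \epsilon
\begin{align}
\mathcal{L}(p,\lambda) = E_{p(\tau)}\!\left[c(\tau)\right]
  + \lambda\left(D_{\mathrm{KL}}\!\left(p(\tau)\,\|\,\bar p(\tau)\right) - \epsilon\right)
\end{align}
% Dividing by \lambda and using the trajectory-KL decomposition above, minimizing over p
% (the "hard part") is an entropy-regularized LQR problem against the fitted local
% dynamics, with the surrogate cost
\begin{align}
\tilde c(x_t, u_t) = \frac{1}{\lambda}\, c(x_t, u_t) - \log \bar p(u_t\mid x_t)
\end{align}
% The dual update is the easy part:  \lambda \leftarrow \lambda + \alpha\,(D_KL - \epsilon).
```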

Page 46: Model-Based Reinforcement Learning

DGD with iterative LQR

Page 47: Model-Based Reinforcement Learning

DGD with iterative LQR

Page 48: Model-Based Reinforcement Learning

Trust regions & trajectory distributions

• Bounding KL-divergences between two policies or controllers, whether linear-Gaussian or more complex (e.g. neural networks), is really useful

• Bounding the KL-divergence between policies is equivalent to bounding the KL-divergence between their trajectory distributions

Page 49: Model-Based Reinforcement Learning

Example: local models & iterative LQR

Page 50: Model-Based Reinforcement Learning
Page 51: Model-Based Reinforcement Learning
Page 52: Model-Based Reinforcement Learning

Example: local models with images

Page 53: Model-Based Reinforcement Learning
