
Makers of MATLAB and Simulink - Deep Learning Workshop ... · Deep Neural Network Policy...

Date post: 12-Aug-2020
© 2019 The MathWorks, Inc.

Reinforcement Learning: Leveraging Deep Learning for Controls

Christoph Stockhammer, MathWorks Application Engineering


Final Deployment


Goal: We hope you walk away knowing the answers to these questions

▪ What is reinforcement learning and why should I care about it?

▪ How do I set it up and solve it? [from an engineer’s perspective]

▪ What are some benefits and drawbacks?


Let’s try to solve this problem the traditional way

[Block diagram. Labels: Observations (Sensors, Camera Data); Feature Extraction; State Estimation; Plant Model + Control System (Balance, Leg & Trunk Trajectories, Motor Control); Motor Commands]


What is the alternative approach?

[Block diagram, two versions. Traditional: Observations (Sensors, Camera Data); Feature Extraction; State Estimation; Plant Model + Control System; Motor Commands. Alternative: Sensors, Camera Data; Black Box Controller; Motor Commands]


What is reinforcement learning?


Reinforcement Learning vs Machine Learning vs Deep Learning

[Diagram: Machine Learning and its branches]

▪ Unsupervised Learning [No Labeled Data]: Clustering


Reinforcement Learning vs Machine Learning vs Deep Learning

[Diagram: Machine Learning and its branches]

▪ Unsupervised Learning [No Labeled Data]: Clustering

▪ Supervised Learning [Labeled Data]: Classification, Regression


Reinforcement Learning vs Machine Learning vs Deep Learning

[Diagram: Machine Learning and its branches, with Deep Learning added]

▪ Unsupervised Learning [No Labeled Data]: Clustering

▪ Supervised Learning [Labeled Data]: Classification, Regression

Supervised learning typically involves feature extraction

Deep learning typically simplifies feature extraction


Reinforcement Learning vs Machine Learning vs Deep Learning

[Diagram: Machine Learning and its branches, with Deep Learning added]

▪ Unsupervised Learning [No Labeled Data]: Clustering

▪ Supervised Learning [Labeled Data]: Classification, Regression

▪ Reinforcement Learning [Interaction Data]: Decision Making, Control

Reinforcement learning:

▪ Learning through trial & error [interaction]

▪ Complex problems typically need deep learning [Deep Reinforcement Learning]

▪ It’s about learning a behavior or accomplishing a task


A Practical Example of Reinforcement Learning: Training a Self-Driving Car

▪ Vehicle’s computer learns how to drive… (agent)

▪ using sensor readings from LIDAR, cameras,… (state)

▪ that represent road conditions, vehicle position,… (environment)

▪ by generating steering, braking, throttle commands,… (action)

▪ based on an internal state-to-action mapping… (policy)

▪ that tries to get you from A to B without an accident while possibly optimizing driver comfort & fuel efficiency… (reward).

▪ The policy is updated through repeated trial-and-error by a reinforcement learning algorithm

[Diagram labels: AGENT (Reinforcement Learning Algorithm, Policy, Policy update); ENVIRONMENT; ACTION; REWARD; STATE]
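The agent, environment, state, action, reward and policy-update roles on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the slide does not name a specific algorithm, so tabular Q-learning is swapped in here as one common choice, and the five-cell corridor environment, the function names (`step`, `greedy_action`, `train`) and all hyperparameters are hypothetical.

```python
import random

N_STATES = 5          # hypothetical corridor with cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Environment: apply the action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01   # small penalty per step, bonus at the goal
    return next_state, reward, done


def greedy_action(q, state):
    """Policy: pick the action with the highest estimated long-term reward."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: improve the policy by repeated trial and error (Q-learning)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore occasionally, otherwise follow the current policy.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, state)
            next_state, reward, done = step(state, action)
            # Policy update: move the estimate toward reward + discounted future value.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


q_table = train()
print([greedy_action(q_table, s) for s in range(N_STATES - 1)])
```

The `while` loop is the ACTION / STATE / REWARD cycle from the diagram, and the line that modifies `q` plays the role of the policy update.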


A Practical Example of Reinforcement Learning: A Trained Self-Driving Car Only Needs a Policy to Operate

▪ Vehicle’s computer uses the final state-to-action mapping… (policy)

▪ to generate steering, braking, throttle commands,… (action)

▪ based on sensor readings from LIDAR, cameras,… (state)

▪ that represent road conditions, vehicle position,… (environment)

[Diagram labels: POLICY; ENVIRONMENT; ACTION; STATE]

By definition, this trained policy puts into practice what is in the reward function


A deep neural network trained using reinforcement learning is a black-box model that determines the best possible action

[Diagram: inputs Current State (Image, Radar, Sensor, etc.) and Previous Action (optional); Deep Neural Network Policy (captures environment dynamics…somehow); output Next Action]

By representing policies using deep neural networks, we can solve problems for complex, non-linear systems (continuous or discrete) by directly using data that traditional approaches cannot use easily
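To make the input/output shape of such a policy concrete, here is a rough sketch of a small fully connected network that maps the current state, plus an optional previous action, to the next action. Everything in it is an assumption for illustration: the class name `NeuralPolicy`, the layer sizes, and the random, untrained weights. A real policy would be far larger and would get its weights from training.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer, activation):
    """Apply one layer: activation(W x + b) for each output unit."""
    weights, biases = layer
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def relu(x):
    return max(0.0, x)


class NeuralPolicy:
    """Hypothetical policy network: (state, previous action) -> next action."""

    def __init__(self, n_state, n_action, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.n_action = n_action
        self.hidden = make_layer(n_state + n_action, n_hidden, rng)
        self.output = make_layer(n_hidden, n_action, rng)

    def act(self, state, previous_action=None):
        # The previous action is optional, as on the slide; default to zeros.
        if previous_action is None:
            previous_action = [0.0] * self.n_action
        features = list(state) + list(previous_action)
        hidden = dense(features, self.hidden, relu)
        # tanh keeps each action component in [-1, 1], e.g. a normalized command.
        return dense(hidden, self.output, math.tanh)


policy = NeuralPolicy(n_state=4, n_action=2)
print(policy.act([0.1, 0.2, 0.3, 0.4]))
```

The caller only sees `act`: state in, action out. Nothing inside is interpretable as a plant model or a control law, which is what makes the network a black box.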


How do I set it up and solve it?


Steps in the Reinforcement Learning Workflow

Environment → Reward → Policy → Training → Deployment


Environment


Video of Simulink model


Steps in the Reinforcement Learning Workflow


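Reward Function

[Diagram: states S0 to S4 across time steps T=0, T=1, T=2. From S0 (T=0), actions a1 (5), a2 (10), a3 (40) lead to the three states at T=1; from there, actions b1 (100), b2 (50), b3 (-10) lead to the final state at T=2. Numbers in parentheses are rewards.]

Long-Term Reward (Q)

▪ Q(S0,a1) = +105 (optimal)

▪ Q(S0,a2) = +60

▪ Q(S0,a3) = +30

The logic you used right now to decide you should take action a1 is a policy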
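The arithmetic behind the Q values on this slide can be checked with a short script. It is a sketch under one assumption inferred from the listed totals: action a1 is followed by b1, a2 by b2, and a3 by b3. The `gamma` discount parameter is not on the slide; it is included only to show where discounting would enter, and the default of 1.0 gives undiscounted sums.

```python
# Rewards along each path out of S0: first action, then second action.
# Pairing a_i with b_i is an assumption inferred from the listed Q values.
PATH_REWARDS = {
    "a1": [5, 100],
    "a2": [10, 50],
    "a3": [40, -10],
}


def long_term_reward(rewards, gamma=1.0):
    """Sum the rewards along a path, discounting step t by gamma**t."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


# Long-term reward Q(S0, a) for each first action.
q_values = {action: long_term_reward(r) for action, r in PATH_REWARDS.items()}

# Policy that looks at long-term reward versus one that only sees the next reward.
best_action = max(q_values, key=q_values.get)
myopic_action = max(PATH_REWARDS, key=lambda a: PATH_REWARDS[a][0])

print(q_values, best_action, myopic_action)
```

The totals match the slide (+105, +60, +30). Choosing by immediate reward alone would pick a3, whose first reward of 40 is the largest but whose total of 30 is the lowest, which is why the policy has to reason about long-term reward.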


Policy & Agent


Reinforcement Learning vs Controls

Reinforcement learning has parallels to control system design

[Diagram: feedback loop with REFERENCE, ERROR (+/-), CONTROLLER, MANIPULATED VARIABLE, PLANT, MEASUREMENT]

Control system            Reinforcement learning system
Controller                Policy
Plant                     Environment
Measurement               Observation
Manipulated variable      Action
Error/Cost function       Reward
Adaptation mechanism      RL Algorithm
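One way to read this mapping is to write a classical feedback loop using reinforcement learning vocabulary. The sketch below is illustrative only: the first-order plant, the proportional gain of 4.0, the step size and all function names are hypothetical choices, and no learning takes place. The gain is fixed by hand, which is the part an adaptation mechanism or RL algorithm would otherwise adjust.

```python
def p_controller_policy(observation, reference, gain=4.0):
    """Controller / policy: map the measurement to a manipulated variable (action)."""
    error = reference - observation
    return gain * error


def plant_step(x, u, dt=0.1):
    """Plant / environment: first-order lag dx/dt = -x + u, forward Euler step."""
    return x + dt * (-x + u)


def reward(observation, reference):
    """Reward as the negative of a squared-error cost."""
    return -(reference - observation) ** 2


def run_episode(steps=100, reference=1.0):
    """Closed loop: observe, act, step the plant, accumulate reward."""
    x, total_reward = 0.0, 0.0
    for _ in range(steps):
        u = p_controller_policy(x, reference)   # action
        x = plant_step(x, u)                    # next observation
        total_reward += reward(x, reference)    # reward signal
    return x, total_reward


final_x, total = run_episode()
print(final_x, total)
```

With these particular numbers the output settles near 0.8 rather than at the reference of 1.0, the usual steady-state offset of proportional-only control. The accumulated reward stays negative for that reason, and it is the quantity a tuning or learning procedure would try to raise.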


When would you use Reinforcement Learning?

Reinforcement learning might be a good fit if

▪ An environment model is available (trial & error on hardware can be expensive), and

▪ Training/tuning time is not critical for the application, and

▪ The environment is uncertain or nonlinear

Controller                  Controller    Computational Cost    Computational Cost
                            Capability    in Training/Tuning    in Deployment
PID                         Low           Low                   Low
Model Predictive Control    High          Low                   High
Reinforcement Learning      Very High     High                  Medium


Reinforcement Learning Toolbox

New in R2019a

▪ Built-in and custom algorithms for reinforcement learning

▪ Environment modeling in MATLAB and Simulink

▪ Deep Learning Toolbox support for designing policies

▪ Training acceleration through GPUs and cloud resources

▪ Deployment to embedded devices and production systems

▪ Reference examples for getting started


Automotive Applications

▪ Controller Design

▪ Lane Keep Assist

▪ Adaptive Cruise Control

▪ Path Following Control

▪ Trajectory Planning


Takeaways


Simulation and virtual models are a key aspect of reinforcement learning

▪ Reinforcement learning needs a lot of data (sample inefficient)

– Training on hardware can be prohibitively expensive and dangerous

▪ Virtual models allow you to simulate conditions hard to emulate in the real world

– This can help develop a more robust solution

▪ Many of you have already developed MATLAB and Simulink models that can be reused


Resources

▪ Examples for automotive and autonomous system applications

▪ Documentation written for engineers and domain experts

▪ Tech Talk video series on reinforcement learning concepts for engineers


Thank You!

