
Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2)...

Date post: 17-Jan-2016

Value Function Approximation on Non-linear Manifolds for Robot Motor Control

Masashi Sugiyama 1)2), Hirotaka Hachiya 1)2), Christopher Towell 2), Sethu Vijayakumar 2)

1) Computer Science, Tokyo Institute of Technology
2) School of Informatics, University of Edinburgh

Page 2: Maze Problem: Guide Robot to Goal

The robot knows its position but doesn't know which direction to go.

We don't teach the best action to take at each position but give a reward at the goal.

Task: make the robot select the optimal action.

[Figure: maze showing the robot's position (x, y), the possible actions (up, down, left, right), and the goal, where the reward is given.]

Page 3: Markov Decision Process (MDP)

An MDP consists of $(S, A, P, R)$:
$S$: set of states
$A$: set of actions; here $A$ is {left, right, up, down}
$P(s, a, s')$: transition probability
$R(s, a)$: reward

An action $a$ the robot takes at state $s$ is specified by the policy $a = \pi(s)$.

Goal: make the robot learn the optimal policy $\pi^*$.
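The maze task can be written down as a small MDP. The following is a minimal sketch, assuming deterministic transitions; the 3x3 layout and the names `MAZE`, `transition`, and `reward` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

State = Tuple[int, int]  # position (x, y)

# The four possible actions, as displacements on the grid.
ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0),
}

# Hypothetical maze: '#' is a wall, 'G' is the goal, '.' is free space.
MAZE = [
    "..G",
    ".#.",
    "...",
]

def is_free(state: State) -> bool:
    """True if the cell is inside the maze and not a wall."""
    x, y = state
    return 0 <= y < len(MAZE) and 0 <= x < len(MAZE[0]) and MAZE[y][x] != "#"

def transition(state: State, action: str) -> State:
    """Deterministic P: move one cell, or stay put if the move is blocked."""
    dx, dy = ACTIONS[action]
    nxt = (state[0] + dx, state[1] + dy)
    return nxt if is_free(nxt) else state

def reward(state: State, action: str) -> float:
    """R(s, a): 1 when the action moves the robot into the goal, else 0."""
    nxt = transition(state, action)
    x, y = nxt
    return 1.0 if MAZE[y][x] == "G" and nxt != state else 0.0
```

For example, `transition((1, 0), "down")` leaves the robot where it is because the cell below is a wall, and only a move into the goal cell is rewarded.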

Page 4: Definition of Optimal Policy

Action-value function:

$$Q^\pi(s, a) = E\left[\sum_{t=0}^{\infty} \gamma^t r_t \,\middle|\, s_0 = s, a_0 = a\right]$$

This is the discounted sum of future rewards (with discount factor $\gamma$) when taking $a$ in $s$ and following $\pi$ thereafter.

Optimal value: $Q^*(s, a) = \max_\pi Q^\pi(s, a)$

Optimal policy: $\pi^*(s) = \arg\max_a Q^*(s, a)$

$\pi^*$ is computed if $Q^*$ is given.

Question: How to compute $Q^*$?
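Reading a policy off an action-value function is a per-state argmax. A minimal sketch, assuming the values are stored in a dictionary keyed by (state, action); the function name `greedy_policy` and the toy table are hypothetical.

```python
from typing import Dict, Hashable, Tuple

def greedy_policy(q: Dict[Tuple[Hashable, str], float]) -> Dict[Hashable, str]:
    """Return pi(s) = argmax_a Q(s, a) for every state that appears in q."""
    best: Dict[Hashable, Tuple[float, str]] = {}
    for (state, action), value in q.items():
        # Keep the highest-valued action seen so far for this state.
        if state not in best or value > best[state][0]:
            best[state] = (value, action)
    return {state: action for state, (_, action) in best.items()}

# Toy action-value table with two states and two actions.
q_table = {
    ("s0", "left"): 0.1, ("s0", "right"): 0.9,
    ("s1", "left"): 0.5, ("s1", "right"): 0.2,
}
policy = greedy_policy(q_table)
```

Ties are broken in favour of the action encountered first, which is an arbitrary choice in this sketch.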

Page 5: Policy Iteration

Starting from some initial policy $\pi$, iterate Steps 1 and 2 until convergence.

Step 1. Compute $Q^\pi(s, a)$ for the current $\pi$:

$$Q^\pi(s, a) = E\left[\sum_{t=0}^{\infty} \gamma^t r_t \,\middle|\, s_0 = s, a_0 = a\right]$$

Step 2. Update $\pi$ by

$$\pi(s) \leftarrow \arg\max_a Q^\pi(s, a)$$

Policy iteration always converges to $\pi^*$ if $Q^\pi(s, a)$ in Step 1 can be computed. (Sutton & Barto, 1998)

Question: How to compute $Q^\pi(s, a)$?
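The two steps can be sketched on a toy problem. The example below assumes a hypothetical five-state corridor with the goal at the right end, deterministic transitions, and a discount factor of 0.9; Step 1 is approximated by repeated Bellman backups rather than an exact solve.

```python
GAMMA = 0.9
STATES = range(5)            # hypothetical corridor 0..4, goal at state 4
ACTIONS = ("left", "right")
GOAL = 4

def step(s, a):
    """Deterministic transition and reward; the goal is absorbing."""
    if s == GOAL:
        return s, 0.0
    nxt = max(s - 1, 0) if a == "left" else s + 1
    return nxt, (1.0 if nxt == GOAL else 0.0)

def evaluate(policy, sweeps=200):
    """Step 1: approximate Q^pi by repeatedly applying the Bellman backup."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):
        for s in STATES:
            for a in ACTIONS:
                nxt, r = step(s, a)
                q[(s, a)] = r + GAMMA * q[(nxt, policy[nxt])]
    return q

def improve(q):
    """Step 2: pi(s) <- argmax_a Q^pi(s, a)."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}

def policy_iteration():
    policy = {s: "left" for s in STATES}   # deliberately poor initial policy
    while True:
        q = evaluate(policy)
        new_policy = improve(q)
        if new_policy == policy:           # converged: policy is stable
            return policy, q
        policy = new_policy

policy, q = policy_iteration()
```

In this corridor the learned policy moves right from every non-goal state, and the value of moving right from state 0 is the reward discounted over three further steps.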

Page 6: Bellman Equation

$Q^\pi(s, a)$ can be recursively expressed by

$$Q^\pi(s, a) = R(s, a) + \gamma \sum_{s'} P(s, a, s')\, Q^\pi(s', \pi(s')), \quad \forall s, a$$

$Q^\pi(s, a)$ can be computed by solving the Bellman equation.

Drawback: the dimensionality of the Bellman equation, $|S| \times |A|$, becomes huge in large state and action spaces, which leads to high computational cost.
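Stacking the Bellman equation over all state-action pairs makes its size explicit. The matrix form below is a sketch; the symbols $\mathbf{q}^\pi$, $\mathbf{r}$, and $\mathbf{P}^\pi$ are notation introduced here for illustration and do not appear in the slides.

```latex
\mathbf{q}^\pi = \mathbf{r} + \gamma\, \mathbf{P}^\pi \mathbf{q}^\pi
\quad\Longrightarrow\quad
\mathbf{q}^\pi = (\mathbf{I} - \gamma\, \mathbf{P}^\pi)^{-1}\, \mathbf{r},
\qquad
\mathbf{q}^\pi, \mathbf{r} \in \mathbb{R}^{|S||A|},
\quad
\mathbf{P}^\pi \in \mathbb{R}^{|S||A| \times |S||A|},
\qquad
[\mathbf{P}^\pi]_{(s,a),(s',a')} =
\begin{cases}
P(s, a, s') & \text{if } a' = \pi(s') \\
0 & \text{otherwise}
\end{cases}
```

For $\gamma < 1$ the inverse exists, but a direct solve of this $|S||A| \times |S||A|$ system scales cubically in $|S||A|$, which is the cost the slide refers to.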

Page 7: Least-Squares Policy Iteration (LSPI)

(Lagoudakis and Parr, 2003)

Linear architecture:

$$\hat{Q}^\pi(s, a) = \sum_{i=1}^{K} w_i \phi_i(s, a)$$

$\phi_i(s, a)$: fixed basis functions
$w_i$: parameters
$K$: # of basis functions

$\{w_i\}_{i=1}^{K}$ is learned so as to optimally approximate the Bellman equation in the least-squares sense.

# of parameters is only $K$: $K \ll |S| \times |A|$

LSPI works well if we choose appropriate $\{\phi_i\}_{i=1}^{K}$.

Question: How to choose $\{\phi_i\}_{i=1}^{K}$?
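One standard way to fit the parameters is the LSTDQ fixed-point solution from the LSPI paper, sketched below. The slides do not say which least-squares variant was used, so treat this as an assumption; the sample notation $(s_n, a_n, r_n, s'_n)$, $n = 1, \dots, N$, is introduced here for illustration.

```latex
\hat{Q}^\pi(s, a) = \boldsymbol{\phi}(s, a)^\top \mathbf{w},
\qquad
\mathbf{A} = \sum_{n=1}^{N} \boldsymbol{\phi}(s_n, a_n)
  \bigl(\boldsymbol{\phi}(s_n, a_n) - \gamma\, \boldsymbol{\phi}(s'_n, \pi(s'_n))\bigr)^\top,
\qquad
\mathbf{b} = \sum_{n=1}^{N} \boldsymbol{\phi}(s_n, a_n)\, r_n,
\qquad
\mathbf{w} = \mathbf{A}^{-1} \mathbf{b}
```

Here $\boldsymbol{\phi}(s, a) = (\phi_1(s, a), \dots, \phi_K(s, a))^\top$, so $\mathbf{A}$ is only $K \times K$ regardless of the size of the state and action spaces.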

Page 8: Popular Choice: Gaussian Kernel (GK)

$$k(s) = \exp\left(-\frac{ED(s, s_c)^2}{2\sigma^2}\right)$$

$ED$: Euclidean distance
$s_c$: centre state

The smooth Gaussian tail goes over the partitions.

[Figure: Gaussian kernel centred at $s_c$ in the maze, with its tail extending across the partitions.]
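The ordinary Gaussian kernel depends only on the Euclidean distance to the centre, so walls between the two states play no role. A minimal sketch; the function name `ordinary_gaussian` is hypothetical.

```python
import math
from typing import Sequence

def ordinary_gaussian(s: Sequence[float], centre: Sequence[float], sigma: float) -> float:
    """k(s) = exp(-ED(s, s_c)^2 / (2 sigma^2)) with Euclidean distance ED."""
    ed = math.dist(s, centre)  # straight-line distance, ignores any partition
    return math.exp(-ed ** 2 / (2.0 * sigma ** 2))

at_centre = ordinary_gaussian((0.0, 0.0), (0.0, 0.0), sigma=1.0)
at_one_sigma = ordinary_gaussian((3.0, 4.0), (0.0, 0.0), sigma=5.0)
```

The kernel equals 1 at the centre and decays smoothly in every direction, which is the source of the tail problem described on this slide.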

Page 9: Approximated Value Function by GK

Values around the partitions are not approximated well.

[Figure: optimal value function versus the value function approximated by GK with 20 randomly located Gaussians, shown in log scale.]

Page 10: Policy Obtained by GK

GK provides an undesired policy around the partition.

[Figure: optimal policy versus GK-based policy.]

Page 11: Aim of This Research

Gaussian tails go over the partition, so the ordinary Gaussian kernel is not suited for approximating discontinuous value functions.

We propose a new Gaussian kernel to overcome this problem.

Page 12: State Space as a Graph

The ordinary Gaussian uses the Euclidean distance:

$$k(s) = \exp\left(-\frac{ED(s, s_c)^2}{2\sigma^2}\right)$$

The Euclidean distance does not incorporate the state space structure, so tail problems occur.

We represent the state space structure by a graph (Mahadevan, ICML 2005), and use it for defining Gaussian kernels.

Page 13: Geodesic Gaussian Kernels

The natural distance on a graph is the shortest path.

We use the shortest path (SP) in the Gaussian function:

$$k(s) = \exp\left(-\frac{SP(s, s_c)^2}{2\sigma^2}\right)$$

We call this kernel the geodesic Gaussian kernel (GGK). SP can be efficiently computed by Dijkstra's algorithm.

[Figure: Euclidean distance versus shortest path between two states in the maze.]
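The geodesic kernel can be sketched by running Dijkstra's algorithm from the centre state and plugging the resulting shortest-path lengths into the Gaussian. The example assumes a hypothetical 3x3 grid maze in which adjacent free cells are joined by edges of length 1; the maze layout and all function names are illustrative only.

```python
import heapq
import math
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

# Hypothetical maze: '#' cells form a partition with a gap at the bottom.
MAZE = [
    ".#.",
    ".#.",
    "...",
]

def neighbours(cell: Cell) -> List[Cell]:
    """Free cells directly adjacent to the given cell."""
    x, y = cell
    out = []
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= ny < len(MAZE) and 0 <= nx < len(MAZE[0]) and MAZE[ny][nx] != "#":
            out.append((nx, ny))
    return out

def shortest_paths(source: Cell) -> Dict[Cell, float]:
    """Dijkstra from source; every edge between adjacent free cells has length 1."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, cell = heapq.heappop(heap)
        if d > dist.get(cell, math.inf):
            continue  # stale heap entry
        for nxt in neighbours(cell):
            nd = d + 1.0
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist

def geodesic_gaussian(centre: Cell, sigma: float) -> Dict[Cell, float]:
    """k(s) = exp(-SP(s, s_c)^2 / (2 sigma^2)) for every reachable cell s."""
    sp = shortest_paths(centre)
    return {s: math.exp(-d ** 2 / (2.0 * sigma ** 2)) for s, d in sp.items()}

def ordinary_gaussian(s: Cell, centre: Cell, sigma: float) -> float:
    """Same Gaussian, but with the Euclidean distance."""
    return math.exp(-math.dist(s, centre) ** 2 / (2.0 * sigma ** 2))

centre = (0, 0)
across = (2, 0)  # directly across the partition from the centre
ggk = geodesic_gaussian(centre, sigma=2.0)
ogk_across = ordinary_gaussian(across, centre, sigma=2.0)
```

The cell across the partition is 2 units away in Euclidean distance but 6 steps away along the maze, so the geodesic kernel assigns it a much smaller value than the ordinary kernel does.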

Page 14: Example of Kernels

With the geodesic Gaussian, tails do not go across the partition, and values smoothly decrease along the maze.

[Figure: ordinary Gaussian versus geodesic Gaussian, each centred at $s_c$.]

Page 15: Comparison of Value Functions

With the geodesic Gaussian, values near the partition are well approximated, and the discontinuity across the partition is preserved.

[Figure: value functions for Optimal, Ordinary Gaussian, and Geodesic Gaussian.]

Page 16: Comparison of Policies

GGKs provide good policies near the partition.

[Figure: policies obtained by the ordinary Gaussian and the geodesic Gaussian.]

Page 17: Experimental Result

Ordinary Gaussian: tail problem
Geodesic Gaussian: no tail problem

[Figure: fraction of optimal states versus number of kernels for Geodesic and Ordinary, averaged over 100 runs.]

Page 18: Robot Arm Reaching

Task: move the end effector to reach the object.

Reward: +1 when the end effector reaches the object, 0 otherwise.

[Figure: 2-DOF robot arm with Joint 1, Joint 2, the end effector, the object, and the obstacle; and the state space, with axes Joint 1 (degree) and Joint 2 (degree).]

Page 19: Robot Arm Reaching

Ordinary Gaussian: moves directly towards the object without avoiding the obstacle.

Geodesic Gaussian: successfully avoids the obstacle and can reach the object.

Page 20: Khepera Robot Navigation

Khepera has 8 IR sensors measuring the distance to obstacles.

Sensor value: 0 - 1030

Task: explore an unknown maze without collision.

Reward: +1 (forward), -2 (collision), 0 (others)

Page 21: State Space and Graph

Discretize the 8D state space by a self-organizing map.

[Figure: 2D visualization of the discretized state space, with the partitions marked.]

Page 22: Khepera Robot Navigation

Ordinary Gaussian: when facing an obstacle, goes backward (and goes forward again).

Geodesic Gaussian: when facing an obstacle, makes a turn (and goes forward).

Page 23: Experimental Results

Geodesic outperforms ordinary Gaussian.

[Figure: performance of Geodesic and Ordinary, averaged over 30 runs.]

Page 24: Conclusion

Value function approximation: good basis function needed
Ordinary Gaussian kernel: tail goes over discontinuities
Geodesic Gaussian kernel: smooth along the state space

Through the experiments, we showed that the geodesic Gaussian is promising in high-dimensional continuous problems!

