Laboratory for Perceptual Robotics – Department of Computer Science

Hierarchical Mechanisms for Robot Programming

Shiraj Sen, Stephen Hart, Rod Grupen

Laboratory for Perceptual Robotics, University of Massachusetts Amherst

May 30, 2008

NEMS ‘08

Outline

Hierarchical mechanisms for robot programming

• Representation
  • Action: potential functions, value functions
  • State representation
• Programming
  • User defined
  • Reinforcement learning: intrinsic, extrinsic

Hierarchical Actions

• Force and velocity references; feedback signals
• ϕ: potential fields
• Φ: value functions; greedy traversal avoids local minimum
• Programs
• Closed-loop primitive actions
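A closed-loop primitive action can be pictured as descent on a potential field ϕ. The following is a minimal sketch, assuming a quadratic spring potential ϕ(x) = ½ k ‖x − x_ref‖²; the function names, gain k, step size, and tolerance are illustrative assumptions and are not taken from the slides.

```python
def spring_potential(x, x_ref, k=1.0):
    """Assumed quadratic spring potential: 0.5 * k * ||x - x_ref||^2."""
    return 0.5 * k * sum((xi - ri) ** 2 for xi, ri in zip(x, x_ref))


def spring_gradient(x, x_ref, k=1.0):
    """Gradient of the assumed spring potential with respect to x."""
    return [k * (xi - ri) for xi, ri in zip(x, x_ref)]


def descend(x, x_ref, step=0.2, tol=1e-6, max_iters=1000):
    """Follow the negative gradient until its norm drops below tol."""
    x = list(x)
    for _ in range(max_iters):
        grad = spring_gradient(x, x_ref)
        if sum(g * g for g in grad) ** 0.5 < tol:
            break  # gradient has vanished: the action has converged
        x = [xi - step * g for xi, g in zip(x, grad)]
    return x


final = descend([1.0, -2.0], [0.0, 0.5])
```

A quadratic potential has a single minimum, so this descent cannot get stuck; potentials with several minima are where a purely local descent may stall.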

Primitive Action Programming Interface

Sensory Error ()
• Visual (u_ref)
• Tactile (f_ref)
• Configuration variables (θ_ref)
• Operational space (x_ref)

Potential Functions (ϕ)
• Spring potential fields
• Collision-free motion fields (ϕ_c)
• Kinematic conditioning fields (ϕ_cond)

Motor Variables ()
Subsets of:
• Configuration variables
• Operational space variables

Primitive actions: a =

Nullspace Projection
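The slide names nullspace projection without giving a formula. One common form projects a lower-priority command through N = I − Jᵀ(J Jᵀ)⁻¹J so that it cannot disturb the higher-priority objective. The sketch below assumes a single-row Jacobian J for the higher-priority objective and uses hypothetical function names; it illustrates the general technique and is not necessarily the authors' formulation.

```python
def project_into_nullspace(jacobian_row, command):
    """Remove the component of `command` along a single-row Jacobian.

    Equivalent to (I - J^T (J J^T)^-1 J) @ command when J has one row.
    """
    jj = sum(j * j for j in jacobian_row)
    if jj == 0.0:
        return list(command)  # degenerate Jacobian: nothing to protect
    scale = sum(j * c for j, c in zip(jacobian_row, command)) / jj
    return [c - scale * j for j, c in zip(jacobian_row, command)]


def compose(primary, secondary, jacobian_row):
    """Add a subordinate command inside the nullspace of the primary one."""
    projected = project_into_nullspace(jacobian_row, secondary)
    return [p + s for p, s in zip(primary, projected)]


J = [1.0, 2.0, 0.0]
subordinate = [3.0, 1.0, 5.0]
projected = project_into_nullspace(J, subordinate)
```

After projection the subordinate command has no component along J, so to first order it leaves the higher-priority objective unchanged.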

State Representation

Discrete abstraction of action dynamics: 4-level logic in control predicate p_i

• p_i = X: unknown
• p_i = -: no reference
• p_i = 0: descending gradient
• p_i = 1: convergence
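The four predicate levels can be sketched as a small classifier over a controller's potential dynamics. The threshold `eps`, the argument names, and the use of the potential's rate of change `phi_dot` as the convergence test are assumptions made for illustration; the slides do not specify them.

```python
def control_predicate(phi_dot, has_reference=True, state_known=True, eps=1e-3):
    """Map a controller's dynamics onto the 4-level predicate.

    'X' unknown, '-' no reference, '0' descending gradient, '1' convergence.
    `eps` is a hypothetical threshold on the rate of change of the potential.
    """
    if not state_known:
        return "X"
    if not has_reference:
        return "-"
    if abs(phi_dot) > eps:
        return "0"  # still descending the potential
    return "1"      # potential has stopped changing


levels = [
    control_predicate(None, state_known=False),
    control_predicate(0.0, has_reference=False),
    control_predicate(-0.4),
    control_predicate(1e-5),
]
```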

Hierarchical Programming

A program is defined as an MDP over a vector of controller predicates:

S = (p_1 … p_N)

Absorbing states in the value function capture “convergence” of programs.

Learn value functions using reinforcement learning
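As a rough illustration of learning a value function over predicate states, the sketch below runs tabular Q-learning on a toy two-controller problem with state (p_saccade, p_track). The transition rules, reward, and hyperparameters are all hypothetical and chosen only to make the example self-contained; they are not the dynamics of the robot experiments.

```python
import random

ACTIONS = ["saccade", "track"]
START = ("X", "X")
GOAL = ("X", "1")  # absorbing state: the track controller has converged


def step(state, action):
    """Hypothetical toy dynamics over (p_saccade, p_track)."""
    if action == "saccade":
        return ("1", "X"), 0.0   # saccade converges, track not yet run
    if state == ("1", "X"):
        return GOAL, 1.0         # tracking after a saccade converges
    return ("X", "-"), 0.0       # tracking with nothing in view: no reference


def greedy(q, state):
    """Pick the action with the highest learned value in `state`."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        for _ in range(20):
            explore = rng.random() < epsilon
            action = rng.choice(ACTIONS) if explore else greedy(q, state)
            nxt, reward = step(state, action)
            best_next = 0.0 if nxt == GOAL else max(
                q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if state == GOAL:
                break
    return q


q = q_learning()
```

Following the greedy policy from the start state leads to the absorbing state, which is how convergence of the whole program shows up in the learned values.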

Catalog: Stack, Insert, Grasp, Touch

Intrinsic Reward

Goal: build deep control knowledge

Reward controllable interaction with the world
• controllers with direct feedback from the external world

convergence event
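One way to read this slide is that reward is issued when a controller fed by external signals reaches convergence. The sketch below encodes that reading; the unit reward magnitude, the function name, and the boolean flag marking external feedback are assumptions made for illustration.

```python
def intrinsic_reward(prev_predicate, new_predicate, has_external_feedback):
    """Return 1.0 on a convergence event of an externally driven controller.

    A convergence event is taken here to be a transition into predicate '1'
    from any other level; the reward magnitude is a hypothetical choice.
    """
    converged_now = prev_predicate != "1" and new_predicate == "1"
    return 1.0 if (has_external_feedback and converged_now) else 0.0


rewards = [
    intrinsic_reward("0", "1", True),   # visual tracker converges
    intrinsic_reward("1", "1", True),   # already converged: no new event
    intrinsic_reward("0", "1", False),  # no feedback from the external world
]
```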

Experimental Demonstration

Motor units
• Two 7-DOF Barrett WAMs
• Two 4-DOF Barrett Hands
• 2-DOF pan/tilt stereo head

Sensory feedback
• Visual: hue, saturation, intensity, texture
• Tactile: 6-axis finger-tip F/T sensors
• Proprioceptive

Dexter

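STAGE 1: SaccadeTrack - 25 Learning Episodes

S_st = (p_saccade, p_track)

[Figure: program diagram with actions a_saccade and a_track; state labels X 1 and X 0; rewarding action marked; label: Track-saturation]

STAGE 2: ReachGrab - 25 Learning Episodes

S_rg = (p_st, p_reach, p_grab)

[Figure: program diagram; rewarding action marked; labels: Touch, Track-saturation]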

STAGE 3: VisualInspect - 25 Learning Episodes

S_vi = (p_rg, p_cond, p_track(blue))

[Figure: program diagram; rewarding action marked; label: Track-blue]

STAGE 4: Grasp – User Defined Reward

S_grasp = (p_rg, p_moment, p_force)

[Figure: program diagram with actions a_moment and a_force; state labels 1 X X, X X X, X 0 0, and X 1 1; rewarding action marked; labels: ReachGrab, Track-blue]

STAGE 5: PickAndPlace – User Defined Reward

S_pnp = (p_g, p_transport, p_moment)

[Figure: program diagram with actions a_transport and a_moment; state labels X 0 -, X 0 0, 1 X X, X 1 1, and X 1 0; rewarding action marked]
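Each stage's state vector reuses an earlier program as a single predicate (p_st in S_rg, p_rg in S_vi and S_grasp, p_g in S_pnp). The sketch below shows one way such a summary could be computed. Mapping an absorbing state to "1" follows the slides' statement that absorbing states capture convergence of programs; the rules for "X" and "0", and all function names, are hypothetical.

```python
def program_predicate(state, absorbing_states):
    """Summarize a sub-program's state as one predicate for its parent."""
    if state in absorbing_states:
        return "1"  # absorbing state: the sub-program has converged
    if all(p == "X" for p in state):
        return "X"  # assumption: nothing has run yet, so status is unknown
    return "0"      # assumption: otherwise the sub-program is in progress


def parent_state(child_state, child_absorbing, *own_predicates):
    """Build a parent state vector such as S_rg = (p_st, p_reach, p_grab)."""
    summary = program_predicate(child_state, child_absorbing)
    return (summary,) + tuple(own_predicates)


saccade_track_absorbing = {("X", "1")}
s_rg = parent_state(("X", "1"), saccade_track_absorbing, "X", "X")
```

Collapsing a learned program to one predicate keeps the parent's state space small, since the parent adds only its own controllers' predicates at each stage.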

Conclusions

Mechanisms for creating hierarchical programs.
• Recursive formulation of potential functions and value functions.

Control-theoretic representation for action, state, and intrinsic reward.

Experimental demonstration of programming manipulation skills using staged learning episodes.

Intrinsic reward pushes out new behavior and models the affordances of objects.

Thank You
