Doctoral School – Robotics Program Human-Robot Interaction

Autonomous Robots Class: Human-Robot Interaction. Social learning and skill acquisition via teaching and imitation. Aude G. Billard, Learning Algorithms and Systems Laboratory (LASA), EPFL, Swiss Federal Institute of Technology, Lausanne, Switzerland. A.G. Billard, AR Class - EDIC/EDPR
Overview of the Class. 9h15-12h00: interfaces and interaction modalities; non-verbal cues and expressiveness in interactions (gesture, posture, social spaces and facial expressions); user-centred design of social robots (humanoids, androids, etc.); motivations and emotions in robots; social intelligence for robots; social learning and skill acquisition via teaching and imitation. 14h15-17h00: robots in education, therapy and rehabilitation; evaluation methods and methodologies for HRI research; ethical issues in human-robot interaction research.
WHY IS PROGRAMMING BY DEMONSTRATION BENEFICIAL FOR ROBOTS? Why is learning beneficial? Assistance with routine tasks which cannot be fully automated; user-friendly means of reprogramming the robot; means of sharing knowledge with a companion robot.
Transmitting Human Skills and Knowledge to Robots. Why is it not that simple?
Transmitting Human Skills and Knowledge to Robots. Learning human skills by imitation includes learning: What to imitate? How to imitate? When to imitate? Who to imitate? C. L. Nehaniv, K. Dautenhahn (2000): Of Hummingbirds and Helicopters: An Algebraic Framework for Interdisciplinary Studies of Imitation and Its Applications. In: J. Demiris and A. Birk, eds., Interdisciplinary Approaches to Robot Learning, World Scientific Series in Robotics and Intelligent Systems, Vol. 24.
The Transfer Problem. [Figure: demonstrator and imitator.]
What to imitate? Same object, same target location; same direction of motion; same speed, same force; same posture.
How to imitate? The correspondence problem. [Figure: demonstration and imitation.] When the imitator has no exact solution (smaller range of motion), find the closest solution according to a metric.
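As a hedged illustration of "find the closest solution according to a metric", the sketch below assumes the metric is a Euclidean distance in joint space and that the imitator's smaller range of motion is expressed as per-joint limits. Neither assumption comes from the slides, and the function names are hypothetical. Under these assumptions the closest feasible posture is obtained by clipping each demonstrated joint angle to the imitator's limits.

```python
import numpy as np

def closest_feasible_posture(demo_angles, lower, upper):
    """Project a demonstrated posture onto the imitator's joint limits.

    For a (weighted) Euclidean metric with independent per-joint limits,
    the closest feasible posture is the component-wise clipped one.
    """
    demo = np.asarray(demo_angles, dtype=float)
    return np.clip(demo, lower, upper)

def posture_distance(a, b):
    """Euclidean distance between two postures (the assumed metric)."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

# Hypothetical 3-joint example: the demonstrator exceeds the imitator's range.
demo = [0.5, 2.0, -1.5]
lower = np.array([-1.0, -1.0, -1.0])
upper = np.array([1.0, 1.0, 1.0])
imitation = closest_feasible_posture(demo, lower, upper)
residual = posture_distance(demo, imitation)
```

A task-space metric (for example, matching the hand position instead of the joint angles) would generally lead to a different "closest" solution, which is one reason the choice of metric matters.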
Calinon, S. and Billard, A. (2007) Incremental Learning of Gestures by Imitation in a Humanoid Robot. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI).
Argall, B., Sauser, E. and Billard, A. (2010) Policy Adaptation Through Tactile Feedback. In Proceedings of AISB Symposium 2010.
Gesture Recognition: How are actions perceived? How is information parsed? Imitation: What level of granularity is copied? Should the robot copy the intention, the goal or the dynamics of the movement? Motor Learning: How is information transferred across multiple modalities (visuo-motor, audio-motor)?
Imitation Learning, Robotic Implementation: Following an imitation mechanism. While following the teacher, the learner robot learns to associate a word with a meaning in terms of sensory inputs. Billard et al., ESANN 1997; Billard & Dautenhahn, Robotics & Autonomous Systems, 1998; Billard & Hayes, 1999, 2000.
Imitation Learning: Following an imitation mechanism. Teaching a path in a maze: Demiris & Hayes, 1994, 1996. Teaching how to climb a hill: Dautenhahn, Robotics & Autonomous Systems, 1995. Teaching a path in the environment: Billard & Hayes, Adaptive Behavior, 1999; Moga & Gaussier, Applied Artificial Intelligence, 2000; Kaiser et al., Robotics & Autonomous Systems, 2002; Nicolescu & Mataric, AGENTS 2003. Teaching a vocabulary: Billard 1997, 1998, 1999; Vogt & Steels, ECAL, 1999.
Imitation Learning: One-Shot Learning Methods. Segmentation of the demonstration into primitives; classification of gestures into predefined states (e.g. grasp, collision); built-in controller for producing sequences of states. Kuniyoshi et al., IEEE Trans. on Robotics and Automation, 1994; Dillmann et al., Robotics & Autonomous Systems, 2001; Ritter et al., Rev Neuroscience, 2003; Aleotti et al., Robotics & Autonomous Systems, 2004.
Robot Programming by Demonstration: One-Shot Learning Methods. Sensors: data gloves, fixed cameras, speech processing. Actuators: mobile robot, 7-DOF arm, two-finger gripper. R. Dillmann, Robotics & Autonomous Systems 47:2-3, 2004.
Imitation Learning: One-Shot Learning Methods. Explicit teaching/learning: reasoning about tasks, verbal instructions. Gesture recognition: for each sensor, a context-dependent model based on background knowledge is provided (e.g. opening the refrigerator door, extracting the bottle and closing the door). Task reproduction: action sequences are stored in a tree-like structure of macro-operators. R. Dillmann, Robotics & Autonomous Systems 47:2-3, 2004.
Robot Programming by Demonstration: Grasping. Because of the large range of possible shapes, generalizing pre-programmed grasps to new and general objects is a rather hard task: orientation of the hand; positioning of the fingers (correspondence problem!); tactile forces, stable object contact. Steil et al., Robotics & Autonomous Systems 47:2-3, 2004.
Robot Programming by Demonstration: Grasping. (i) A naive imitation strategy, in which the observed joint angle trajectories (after their transformation into the three-finger geometry) were directly applied to control the fingers of the TUM hand during the grasp, until complete closure around the object. (ii) A strategy in which the visually observed hand posture is matched to the initial conditions of a power grip, a precision grip, a three-finger grip and a two-finger grip, respectively, in order to identify the grip type. Steil et al., Robotics & Autonomous Systems 47:2-3, 2004.
Robot Programming by Demonstration. Other related works include: Kuniyoshi et al., ICRA, 1994; Aleotti et al., Robotics & Autonomous Systems 47:2-3, 2004; Zhang & Roessler, Robotics & Autonomous Systems 47:2-3, 2004.
Imitation Learning: Learning motion through Dynamical Systems. Locally weighted learning; learning primitives of the system. Ijspeert, Nakanishi, Schaal, ICRA01, NIPS02.
Imitation Learning: Learning motion through Dynamical Systems. Goals: use dynamical systems with well-defined attractor properties to 1) learn by imitation, and 2) be able to modulate the learned trajectory when perceptual variables are varied and/or perturbations occur. Discrete movements: single point attractor. Rhythmic movements: limit cycle attractor. [Phase plots: dy/dt versus y.] Ijspeert, Nakanishi, Schaal, ICRA01, NIPS02.
Imitation Learning: Learning motion through Dynamical Systems. Principle: use a basic linear dynamical system with a single attractor, and modulate this system with a pdf estimate of the dynamics of the movement. [Figure: stable linear system plus modulation; discrete movements (single point attractor) and rhythmic movements (limit cycle attractor).] Ijspeert, Nakanishi, Schaal, ICRA01, NIPS02.
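The principle of a stable linear system with a single attractor, modulated by a learned term, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a critically damped spring-damper system pulled toward a goal, plus a forcing term built from Gaussian basis functions of a decaying phase variable. The gains, basis centers, widths and weights are illustrative values chosen here, and the weights are set by hand where a real system would fit them to demonstrations (for example with locally weighted learning).

```python
import numpy as np

def rollout(y0, goal, weights, centers, widths,
            K=100.0, D=20.0, alpha=4.0, tau=1.0, dt=0.001, n_steps=3000):
    """Integrate a point-attractor system modulated by a forcing term.

    tau * dv/dt = K (goal - y) - D v + f(s)   (stable linear part + modulation)
    tau * dy/dt = v
    tau * ds/dt = -alpha s                    (phase variable decaying from 1 to 0)
    """
    y, v, s = float(y0), 0.0, 1.0
    trajectory = [y]
    for _ in range(n_steps):
        psi = np.exp(-widths * (s - centers) ** 2)          # basis activations
        f = s * np.dot(psi, weights) / (psi.sum() + 1e-10)  # vanishes as s -> 0
        dv = (K * (goal - y) - D * v + f) / tau
        dy = v / tau
        ds = -alpha * s / tau
        v += dv * dt                                        # explicit Euler step
        y += dy * dt
        s += ds * dt
        trajectory.append(y)
    return np.array(trajectory)

centers = np.linspace(0.0, 1.0, 5)
widths = np.full(5, 50.0)
plain = rollout(0.0, 1.0, np.zeros(5), centers, widths)
shaped = rollout(0.0, 1.0, np.array([200.0, -100.0, 50.0, 150.0, -80.0]),
                 centers, widths)
```

Because the forcing term is multiplied by the decaying phase variable, both rollouts end at the same attractor; the weights only change the shape of the path taken to reach it.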
Imitation Learning: Learning motion through Dynamical Systems. The learned trajectory is not sufficient to control the actual robot's walking pattern. Phase resetting using foot contact information is necessary: on-line adjustment of the phase of the central pattern generator (CPG) by sensory feedback from the environment is essential to achieve successful locomotion. Nakanishi et al., Robotics & Autonomous Systems, 47:2-3, 79-91, 2004.
Imitation Learning: Learning motion through Dynamical Systems. Drawback: the stable DS takes over if there is a long delay in the reproduction of the movement, leading the movement to depart from the original trajectory. A heuristic is needed to rescale the time parameter of the DS. This calls for a time-independent encoding.
Imitation Learning: Learning of Dynamical Systems through RL. Remove time-dependency: learn estimates of autonomous dynamical systems, i.e. assume that all the demonstrations are instances of a global dynamics of robot motion. One must ensure that the system is asymptotically stable at the target. M. Khansari and A. Billard, BM: An Iterative Algorithm to Learn Stable Non-Linear Dynamical Systems with Gaussian Mixture Models, Proceedings of the IEEE Int. Conf. on Robotics and Automation (ICRA), 2010.
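To make the requirement of asymptotic stability at the target concrete, the sketch below uses the simplest possible case, a linear autonomous system dx/dt = A (x - target). This linear case is an illustration chosen here, not the non-linear Gaussian-mixture model of the cited paper. For such a system, the target is asymptotically stable exactly when all eigenvalues of A have negative real parts.

```python
import numpy as np

def is_asymptotically_stable(A):
    """True if every eigenvalue of A has a strictly negative real part."""
    eigenvalues = np.linalg.eigvals(np.asarray(A, dtype=float))
    return bool(np.all(eigenvalues.real < 0))

def simulate(A, x0, target, dt=0.01, n_steps=2000):
    """Explicit Euler integration of dx/dt = A (x - target); no time variable."""
    A = np.asarray(A, dtype=float)
    target = np.asarray(target, dtype=float)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + dt * (A @ (x - target))
    return x

# Illustrative matrices: a stable spiral and a saddle.
A_stable = [[-1.0, 2.0], [-2.0, -1.0]]
A_unstable = [[0.5, 0.0], [0.0, -1.0]]
target = [1.0, 1.0]
x_end = simulate(A_stable, x0=[3.0, -2.0], target=target)
```

Since the system has no explicit time dependence, a delay or perturbation only changes the state from which the motion resumes, and the motion still converges to the target.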
Imitation Learning: Learning of the Metric. Learning task constraints as constraints continuous in space and time, by estimating the joint probability density function of the signals. S. Calinon, F. Guenter and A. Billard, On Learning, Representing and Generalizing a Task in a Humanoid Robot, IEEE Trans. on SMC, Part B, 2007.
Imitation Learning: Learning of the Metric. The robot should learn that the important feature in this task is that the queen should be moved 2 steps forward vertically. S. Calinon, F. Guenter and A. Billard, IEEE Trans. on SMC, Part B, 2007.
Imitation Learning: Learning of the Metric. Once the robot has learned the rule of motion for the queen, it can apply this rule to move the queen from locations not seen during the demonstrations. S. Calinon, F. Guenter and A. Billard, IEEE Trans. on SMC, Part B, 2007.
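Imitation Learning: Learning of the Metric. Encoding a task using Gaussian Mixture Regression. Probability density function (standard mixture form, with $K$ components and priors $\pi_k$): $p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x; \mu_k, \Sigma_k)$. Gaussian function (in dimension $D$): $\mathcal{N}(x; \mu_k, \Sigma_k) = \frac{1}{\sqrt{(2\pi)^D |\Sigma_k|}} \exp\left(-\frac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1} (x-\mu_k)\right)$. S. Calinon, F. Guenter and A. Billard, On Learning, Representing and Generalizing a Task in a Humanoid Robot, IEEE Trans. on SMC, Part B, 2007.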
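Imitation Learning: Learning of the Metric. Extraction of the constraints. Expected output (standard Gaussian Mixture Regression form, with $t$ the input and $x$ the output): $\hat{x} = \sum_{k=1}^{K} h_k(t) \left[ \mu_{x,k} + \Sigma_{xt,k} \Sigma_{tt,k}^{-1} (t - \mu_{t,k}) \right]$. Where: $h_k(t) = \frac{\pi_k \, \mathcal{N}(t; \mu_{t,k}, \Sigma_{tt,k})}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(t; \mu_{t,j}, \Sigma_{tt,j})}$, with $\pi_k$ the mixture priors and $\mathcal{N}$ the Gaussian density. S. Calinon, F. Guenter and A. Billard, On Learning, Representing and Generalizing a Task in a Humanoid Robot, IEEE Trans. on SMC, Part B, 2007.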
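As a hedged illustration of how an expected output can be retrieved from a Gaussian mixture, the sketch below implements the standard Gaussian Mixture Regression conditional mean with NumPy. It assumes the mixture parameters (priors, means, covariances) have already been fitted to the demonstrations, for example by EM. The function names, the index arguments and the toy parameters are assumptions made for this example.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate Gaussian density N(x; mean, cov)."""
    x, mean = np.atleast_1d(x), np.atleast_1d(mean)
    cov = np.atleast_2d(cov)
    diff = x - mean
    norm = np.sqrt((2.0 * np.pi) ** len(mean) * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

def gmr(t, priors, means, covs, in_idx, out_idx):
    """Expected output E[x_out | x_in = t] under a Gaussian mixture model."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    # Responsibility of each component for the query input.
    h = np.array([
        p * gaussian_pdf(t, m[in_idx], c[np.ix_(in_idx, in_idx)])
        for p, m, c in zip(priors, means, covs)
    ])
    h = h / h.sum()
    out = np.zeros(len(out_idx))
    for h_k, m, c in zip(h, means, covs):
        # Conditional mean of component k: linear regression from input to output.
        gain = c[np.ix_(out_idx, in_idx)] @ np.linalg.inv(c[np.ix_(in_idx, in_idx)])
        out += h_k * (m[out_idx] + gain @ (t - m[in_idx]))
    return out

# Toy model over (t, x): one correlated component, then two symmetric ones.
single = gmr(2.0, [1.0], [[0.0, 0.0]], [[[1.0, 0.5], [0.5, 1.0]]], [0], [1])
double = gmr(0.0, [0.5, 0.5], [[-1.0, -1.0], [1.0, 1.0]],
             [np.eye(2), np.eye(2)], [0], [1])
```

The conditional covariance can be computed from the same blocks; it indicates how tightly the demonstrations constrain the output at a given input, which is the sense in which constraints are extracted.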
INCREMENTAL LEARNING. Movie: 6 demonstrations of moving the white knight to catch the black king.
INCREMENTAL LEARNING. Trajectories of the hand with respect to the first and second object. Left: superimposed trajectories of the 6 demonstrations. Middle: Gaussian Mixture Model. Right: Gaussian Mixture Regression.
Imitation Learning: Reinforcement Learning Methods. Learning the optimal controller: model of the physical system (pendulum); reinforcement and locally weighted learning. Atkeson & Schaal, ICML, 1997.
Imitation Learning: Learning DS through Inverse RL. [Figure: intended trajectory, expert demonstrations, time indices t.] If t is unknown, inference is hard; if t is known, we have a standard HMM. Make an initial guess for t, then alternate between: fix t and run EM on the resulting HMM; choose a new t using dynamic programming. Pieter Abbeel and Andrew Y. Ng. Apprenticeship Learning via Inverse Reinforcement Learning. In Proceedings of ICML, 2004.
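The alternation between estimating the intended trajectory and re-choosing the time indices can be sketched in a heavily simplified form. The code below is a stand-in, not the HMM procedure described on the slide: the EM step is replaced by a plain average of the aligned samples, and the dynamic-programming step is ordinary dynamic time warping on 1-D signals. All function names and the toy data are assumptions made for this example.

```python
import numpy as np

def dtw_path(demo, ref):
    """Dynamic programming: monotonic alignment of demo samples to ref indices."""
    n, m = len(demo), len(ref)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (demo[i - 1] - ref[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the end to recover the optimal alignment.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def estimate_intended_trajectory(demos, n_iters=5):
    """Alternate: fix the alignment and average; then re-align by DTW."""
    ref = np.array(demos[0], dtype=float)      # initial guess for the trajectory
    for _ in range(n_iters):
        sums = np.zeros(len(ref))
        counts = np.zeros(len(ref))
        for demo in demos:
            for i, j in dtw_path(demo, ref):   # choose new time indices
                sums[j] += demo[i]
                counts[j] += 1
        ref = sums / counts                    # averaging stands in for EM
    return ref

# Two toy demonstrations of the same motion, one with a pause at the start.
demos = [[0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 2.0, 3.0]]
intended = estimate_intended_trajectory(demos)
```

In the full method each time index is a hidden variable of a probabilistic model, so noise and dynamics are handled jointly; this sketch only conveys the alternating structure.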
Imitation Learning: Learning of Dynamical Systems through RL. Quadratic programming problem (QP): quadratic objective, linear constraints. Constraint generation for path constraints. Pieter Abbeel and Andrew Y. Ng. Apprenticeship Learning via Inverse Reinforcement Learning. In Proceedings of ICML, 2004.
Imitation Learning: Learning of Dynamical Systems through RL. Give a local linear model of the dynamics of the helicopter. Pieter Abbeel and Andrew Y. Ng. Apprenticeship Learning via Inverse Reinforcement Learning. In Proceedings of ICML, 2004.
Imitation Learning: Reinforcement Learning Methods. No model of the physics: use a statistical estimate of the motion through a mixture of Gaussians, and use the estimated variance as bounds for the search via RL. F. Guenter, M. Hersch, S. Calinon and A. Billard. Reinforcement Learning for Imitating Constrained Reaching Movements. RSJ Advanced Robotics (2007).
Imitation Learning: Reinforcement Learning Methods. Episodic Natural Actor Critic (Peters et al., 2005) is applied to learn a new trajectory, so as to overcome the obstacle. A Gaussian stochastic policy is used; the reward function has two terms, a shape constraint and a reaching constraint. F. Guenter, M. Hersch, S. Calinon and A. Billard. Reinforcement Learning for Imitating Constrained Reaching Movements. RSJ Advanced Robotics (2007).
Imitation Learning: Reinforcement Learning Methods. [Figures: evolution of the reward during the learning phase; reproduction after the learning phase.] F. Guenter, M. Hersch, S. Calinon and A. Billard. Reinforcement Learning for Imitating Constrained Reaching Movements. RSJ Advanced Robotics (2007).
SUMMARY. Learning new tasks relies on various means of teaching the robots. Imitation learning is useful insofar as it gives hints as to the optimal solution. The robot must, however, rely on generic skills of its own to adapt the demonstration to its own body and to the context. Learning of complex skills is overall relatively slow and must proceed incrementally.
