Beyond Geometric Path Planning: When Context Matters


Ashesh Jain, Shikhar Sharma, Thorsten Joachims and Ashutosh Saxena


Outline

• Motivation

• Approach
– Context-based score
– Feedback mechanism
– Learning algorithm

• Results

Structured To Unstructured Environments

[Images from Google]

Kuka arm, Kiva, Beam, Baxter, PR2, Robot Nurse

Path Planning

• High DoF manipulators
• Continuous high dimensional space
• Obstacles

[Figure labels: Base; 7 DoF arm; Joint; Link; End-Effector; A; B]

Geometric criteria

• Collision free
• Shortest path
• Least time
• Minimum energy

Kavraki et al. PRM; LaValle et al. RRT; Ratliff et al. CHOMP; Karaman et al. RRT*; Schulman et al. TrajOpt

Context Rich Environment

Video [14 sec to 18 sec]: http://www.youtube.com/watch?v=uLktpkd7ojA

What went wrong?

• Robot not modeling the context

• Does not understand the preferences

Does Existing Work Address This?

• Inverse Reinforcement Learning (Kober and Peters 2011, Abbeel et al. 2010, Ziebart et al. 2008, Ratliff et al. 2006)

• Context is not important; focuses on a specific trajectory
– Modeling human navigation patterns: Kitani et al. ECCV 2012

• Optimal demonstrations: requires an expert

[Images: Abbeel et al.; Ratliff et al.; Kober et al.]

Our Goal

• Model context

• Generate multiple trajectories for a task

• User preferences

• Learn from non-experts


Learning Setting

[Figure labels: User; Robot]

1. Online learning system
2. Learns from user feedback
3. Sub-optimal feedback

User: $s^*(y, \text{context} \mid \text{task})$
Robot: $s(y, \text{context} \mid \text{task})$

Goal: Learn user preferences


Example of Preferences

• Move a glass of water

[Figure labels: Upright; Context; Contorted Arm]

Preferences vary with users, tasks and environments

Score function

Score of a trajectory in a context, with two components:

• Object-object interactions
• Robot configuration and environment interactions

Trajectory graph: connecting waypoints to neighboring objects

Score function

Object attributes: {electronic, fragile, sharp, liquid, hot, …}
E.g. Laptop: {electronic, fragile}; Knife: {sharp}; …
Hermans et al. ICRA w/s 2011; Koppula et al. NIPS 2011

Object-object interactions (x: context, y: trajectory):

$$w_O^T \phi_O(x, y) = \sum_{\text{edges}} \; \sum_{l,k \in \text{labels}} \mathbf{1}(\text{edge}, l, k)\; w_{lk}^T \, \phi_{o\text{-}o}(x, y; \text{edge})$$

where $\phi_{o\text{-}o}$ are distance features.
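The double sum over edges and label pairs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the label vocabulary, the two toy distance features, the edge tuple layout, and the reading of the indicator (the carried object has label l and the neighboring object has label k) are all hypothetical and are not taken from the slides.

```python
import math
from itertools import product

# Hypothetical attribute vocabulary (a subset of the list on the slide).
LABELS = ["electronic", "fragile", "sharp", "liquid", "hot"]

def distance_features(waypoint, obj_pos):
    """Toy stand-in for phi_{o-o}: two features built from Euclidean distance."""
    d = math.dist(waypoint, obj_pos)
    return [math.exp(-d), d]

def object_object_score(edges, weights):
    """Sum over edges and label pairs (l, k) of 1(edge, l, k) * w_lk . phi_{o-o}.

    edges:   list of (waypoint, carried_labels, obj_pos, obj_labels) tuples
    weights: dict mapping (l, k) to a weight vector; missing pairs add nothing
    """
    total = 0.0
    for waypoint, carried, obj_pos, obj_labels in edges:
        phi = distance_features(waypoint, obj_pos)
        for l, k in product(LABELS, LABELS):
            # Assumed indicator: carried object has label l, neighbor has label k.
            if l in carried and k in obj_labels and (l, k) in weights:
                total += sum(w * f for w, f in zip(weights[(l, k)], phi))
    return total

# One edge: a liquid-carrying waypoint one unit away from a laptop.
example_edges = [((0.0, 0.0, 0.0), {"liquid"}, (0.0, 0.0, 1.0), {"electronic", "fragile"})]
example_weights = {("liquid", "electronic"): [-2.0, 0.0]}
example_score = object_object_score(example_edges, example_weights)
```

With these made-up weights, carrying a liquid close to an electronic object lowers the score, and the penalty fades as the distance grows.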

Score function

• Object-object interactions
• Robot configuration and environment interactions

[Figure labels: Bad; Good]

Features

1. Spectrogram

2. Object’s distance from horizontal and vertical surfaces

3. Object’s angle with vertical axis

4. Robot’s wrist and elbow configuration in cylindrical coordinates (Cakmak et al. IROS 2011)

Features $\in \mathbb{R}^{75}$
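Two of the listed quantities are plain geometry and can be sketched directly. The helper names below are hypothetical, and the slides do not say how the actual 75-dimensional feature vector is assembled, so this shows only the underlying calculations.

```python
import math

def angle_with_vertical(axis):
    """Angle in radians between an object's axis and the vertical (z) axis."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, z / norm)))

def to_cylindrical(point):
    """Convert Cartesian (x, y, z) to cylindrical (r, theta, z)."""
    x, y, z = point
    return (math.hypot(x, y), math.atan2(y, x), z)

upright = angle_with_vertical((0.0, 0.0, 1.0))   # object held upright
sideways = angle_with_vertical((1.0, 0.0, 0.0))  # object tipped on its side
wrist = to_cylindrical((0.0, 2.0, 5.0))
```

An upright glass gives an angle of zero, so a learned weight on this feature can penalize tilting a liquid-filled object.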


User Feedback

Intuitive feedback mechanisms

• Re-rank
• Zero-G
• Interactive

1. Re-rank

• Robot ranks trajectories and user selects one

[Figure labels: User observing top three trajectories; Top three trajectories; User feedback]
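The re-rank interaction can be sketched as a ranking step followed by a user pick. The function names, the made-up scores, and the choice to treat a pick of the top-ranked trajectory as "no correction" are assumptions for illustration only.

```python
def top_k(trajectories, score, k=3):
    """Return the k highest-scoring trajectories, best first."""
    return sorted(trajectories, key=score, reverse=True)[:k]

def rerank_feedback(shown, chosen_index):
    """Turn the user's pick into a preference pair (preferred, top_ranked).

    Returns None when the user picks the robot's own top choice, since that
    selection carries no corrective signal in this sketch.
    """
    if chosen_index == 0:
        return None
    return shown[chosen_index], shown[0]

# Made-up scores for four candidate trajectories.
scores = {"traj_a": 0.2, "traj_b": 0.9, "traj_c": 0.5, "traj_d": 0.7}
shown = top_k(list(scores), scores.get)  # the three best under the current score
pair = rerank_feedback(shown, 1)         # user prefers the second one shown
```

The resulting pair says which trajectory the user preferred over the robot's current best, which is the kind of signal a preference-based learner can consume.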

2. Zero-G

• User corrects trajectory waypoints

[Figure labels: Bad waypoint in red; Holding wrist activates zero-G mode]


3. Interactive

• Not all robots support zero-G feedback



Coactive Learning

[Figure labels: User; Robot]

Goal: Learn user preferences

Shivaswamy & Joachims, ICML 2012

Learn from sub-optimal feedback

Trajectory Preference Perceptron

Regret bound (Shivaswamy & Joachims, ICML 2012):

$$E[REG_T] \le O\left(\frac{1}{\alpha\sqrt{T}} + \frac{1}{T}\sum_{t=1}^{T} \xi_t\right)$$
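A minimal sketch of a preference-perceptron round, following the standard coactive-learning update from Shivaswamy & Joachims (add the features of the user's improved trajectory, subtract those of the trajectory the robot proposed). The two-dimensional feature vectors and function names are hypothetical, and the slides do not show the exact update used in the talk.

```python
def score(w, phi):
    """Linear score w . phi of a trajectory's feature vector."""
    return sum(wi * fi for wi, fi in zip(w, phi))

def best_index(w, candidates):
    """Index of the highest-scoring candidate (first one wins ties)."""
    return max(range(len(candidates)), key=lambda i: score(w, candidates[i]))

def perceptron_update(w, phi_shown, phi_feedback):
    """w <- w + phi(feedback trajectory) - phi(shown trajectory)."""
    return [wi + fb - sh for wi, sh, fb in zip(w, phi_shown, phi_feedback)]

# Two candidate trajectories described by made-up features.
candidates = [[1.0, 0.0], [0.0, 1.0]]
w = [0.0, 0.0]

shown = best_index(w, candidates)   # robot proposes candidate 0 (tie at zero)
user_choice = 1                     # user indicates candidate 1 is better
w = perceptron_update(w, candidates[shown], candidates[user_choice])
next_shown = best_index(w, candidates)
```

After a single round of feedback the weights move toward the user's preferred features, so the next proposal is the trajectory the user picked. The feedback only needs to be an improvement, not the optimal trajectory.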


Experimental Setup

• Two robots: Baxter and PR2

• 35 tasks in household setting
– 2100 expert-labeled trajectories

• 16 tasks in grocery store checkout setting
– 1300 expert-labeled trajectories

• 14 objects
– Bowl, Knife, Laptop, Metal box, Fruits, Egg cartons, etc.

• 7 users

Experimental Setting 1

Household environment on PR2

[Images: Pouring; Cleaning the table; Setting up table]

• 35 tasks
• Variation in objects and environment
• Expert’s label on 2100 trajectories on a scale of 1 to 5

Experimental Setting 2

Grocery store checkout on Baxter

[Images: Cereal box; Egg carton; Knife in human vicinity]

• 16 tasks
• Variations in objects and their placement
• Expert’s label on 1300 trajectories on a scale of 1 to 5

Generalization

[Plot: x-axis #Feedback; curves: Ours w/o pre-training, Ours pre-trained, SVM-rank, MMP-online]

Household setting

• Testing on a new environment

• Higher nDCG w/o feedback

• SVM-rank trained on expert’s labels

• MMP-online is an IRL technique
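For readers unfamiliar with nDCG, a small sketch of one common definition follows. It assumes the linear-gain form with a log2 position discount; the slides do not say which variant or cutoff the experiments used, and the relevance values here are made up on the 1 to 5 expert scale.

```python
import math

def dcg(relevances, k):
    """Discounted cumulative gain of the first k items, in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Expert labels of three trajectories, listed in the order the robot ranked them.
perfect_ranking = ndcg([5, 4, 3], k=3)
reversed_ranking = ndcg([3, 4, 5], k=3)
```

A ranking that places the expert's best-rated trajectories first scores 1.0, and any other ordering scores lower.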

User Study

• 10 tasks per user
– 7 users
– Total of 7 hours’ worth of robot interaction
– Users interact until satisfied

User Study

[Plot: #Feedback (Re-rank, Zero-G) and Time vs. Task No., with tasks in order of increasing difficulty]

Grocery setting

• Baxter

• Re-rank popular for easier tasks

• Increase in zero-G for hard tasks

User Study

User   # Re-rank   # Zero-G    Time (min)   SelfScore   CrossScore
1      5.4         3.3         7.8          3.8         4.0
2      1.8         1.7         4.6          4.3         3.6
3      2.9         2.0         5.0          4.4         3.2
4      3.2         1.5         5.3          3.0         3.7
5      3.6         1.9         5.0          3.5         3.3
6      3.1         2.4         -            3.5         3.6
7      2.3         1.8         -            4.1         4.1
Avg.   3.2 (1.1)   2.1 (0.6)   5.5 (1.3)    3.8 (0.5)   3.6 (0.3)
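The averages in the last row can be checked from the per-user values. The sketch below reads the parenthesized numbers as sample standard deviations, which is an assumption (the slide does not label them), although it reproduces the slide's figures after rounding to one decimal. Users 6 and 7 have no time entry and are left out of the time column.

```python
from statistics import mean, stdev

# Per-user values copied from the table above.
rerank = [5.4, 1.8, 2.9, 3.2, 3.6, 3.1, 2.3]
zero_g = [3.3, 1.7, 2.0, 1.5, 1.9, 2.4, 1.8]
time_min = [7.8, 4.6, 5.0, 5.3, 5.0]  # users 6 and 7 have no time entry
self_score = [3.8, 4.3, 4.4, 3.0, 3.5, 3.5, 4.1]
cross_score = [4.0, 3.6, 3.2, 3.7, 3.3, 3.6, 4.1]

def summarize(values):
    """Mean and sample standard deviation, rounded to one decimal place."""
    return round(mean(values), 1), round(stdev(values), 1)

summary = {
    "rerank": summarize(rerank),
    "zero_g": summarize(zero_g),
    "time_min": summarize(time_min),
    "self_score": summarize(self_score),
    "cross_score": summarize(cross_score),
}
```

The re-rank and zero-G averages add up to roughly five feedbacks per task, and the self and cross scores differ by only 0.2 on average.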


5 Feedback
• 3 Re-rank
• 2 Zero-G


5 to 6 min. per task


Similar preferences

Robot Demonstration

Video [full video]: http://www.youtube.com/watch?v=uLktpkd7ojA

Conclusion

• Challenges of Unstructured Environment

• Geometric approaches are not enough

• Modeling context is crucial

• Learning from users and not experts

Thank You

For more details visit http://pr.cs.cornell.edu/coactive
