May 23, 2007
Transfer Learning Experiments with Soar & the UCT
Nicholas Gorski, with John Laird and Taylor Lafrinere
Outline
• Transfer Learning Background
• Urban Combat Testbed (UCT)
• Results
• Spatial Reasoning in the UCT
• Movies!
Project Background
• 3-year DARPA Transfer Learning (TL) initiative
• Soar grouped with ICARUS & Companion
  - Y1: Urban Combat Testbed (completed F06)
  - Y2: GGP (ongoing, evaluation in F07)
• Last workshop, reported some initial experiments
Transfer Learning (TL)
• Similarities to multi-task learning, inductive learning, and “learning to learn”
• In transfer learning, an agent:
  - performs in a source problem
  - applies learned knowledge to a target problem via transformation
  - performs in the target problem, applying previously learned knowledge
[Figure: diagram contrasting the Transfer Agent (Source, Source Knowledge, Transformation, Target Knowledge, Target) with the Control Agent (Target)]
Urban Combat Testbed
• Software suite consisting of
  - First Person Shooter video game engine
  - Scenarios designed to test for specific types of transfer
• Complex domain
  - Large and continuous
  - Noisy actions
  - Many different objects & obstacles
    - Doors, windows, barriers, pits, water, electrical barriers, etc.
TL Scenarios in the UCT
• Agent must navigate from start to goal
• To reach goal, it must climb ladder/drainpipe
• Generalization: drainpipe can be climbed because ladder was climbable
[Figure: Source and Target scenario maps, each with Start and Goal marked; labels: Ladder, Drainpipe]
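The three transfer learning steps and the ladder-to-drainpipe generalization can be sketched with a toy model. This is only an illustration under stated assumptions, not the Soar agent's mechanism: the knowledge representation, the cost values, and the `analogy` table are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: object type -> "agent knows it is climbable".
Knowledge = Dict[str, bool]

KNOWN_COST = 1    # assumed cost of using an object the agent already understands
EXPLORE_COST = 5  # assumed cost of discovering an object's use by trial and error


def perform(problem: List[str], knowledge: Knowledge) -> Tuple[int, Knowledge]:
    """Run the toy agent on a problem; return (total cost, knowledge afterwards)."""
    learned = dict(knowledge)
    cost = 0
    for obj in problem:
        if learned.get(obj):
            cost += KNOWN_COST
        else:
            cost += EXPLORE_COST
            learned[obj] = True
    return cost, learned


def transform(source_knowledge: Knowledge, analogy: Dict[str, str]) -> Knowledge:
    """Map source knowledge onto target objects using an assumed analogy table."""
    return {analogy.get(obj, obj): val for obj, val in source_knowledge.items()}


def transfer_agent(source: List[str], target: List[str], analogy: Dict[str, str]) -> int:
    """Source run, then transformation, then target run; returns the target cost."""
    _, source_knowledge = perform(source, {})
    target_knowledge = transform(source_knowledge, analogy)
    cost, _ = perform(target, target_knowledge)
    return cost


def control_agent(target: List[str]) -> int:
    """Target run only, with no prior knowledge."""
    cost, _ = perform(target, {})
    return cost


transfer_cost = transfer_agent(["ladder"], ["drainpipe"], {"ladder": "drainpipe"})
control_cost = control_agent(["drainpipe"])
```

In this toy setting the transfer agent reaches the goal at lower cost than the control agent because it arrives already believing the drainpipe is climbable.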
Y1 Results: Comparing Agent and Human Performance
[Figure: jump-start (y-axis, -100 to 500) for Levels 1 to 8; series: Soar, ICARUS, Human Targets, UTAustin. Higher scores are better.]
Jump-start Discussion
• Y1 results
  - Single methodology used by all teams
  - Used for go/no-go decision for Y2
• Used “Jump-start”
  - Magnitude of initial differences in performance (not normalized)
  - Rewards poorly performing agents
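A minimal sketch of a jump-start computation, assuming it is the difference between the transfer agent's and the control agent's initial scores on the target problem. The function name, the `n_initial` parameter, and the example scores are hypothetical, not taken from the evaluation itself.

```python
from typing import Sequence


def jump_start(transfer_scores: Sequence[float],
               control_scores: Sequence[float],
               n_initial: int = 1) -> float:
    """Difference in mean score over the first n_initial target trials (not normalized)."""
    if n_initial < 1:
        raise ValueError("n_initial must be at least 1")
    if len(transfer_scores) < n_initial or len(control_scores) < n_initial:
        raise ValueError("not enough trials")
    transfer_mean = sum(transfer_scores[:n_initial]) / n_initial
    control_mean = sum(control_scores[:n_initial]) / n_initial
    return transfer_mean - control_mean


# Made-up scores (higher is better): a weak control agent leaves more room to improve.
weak_agent = jump_start([400, 450], [100, 200])    # control starts far from the ceiling
strong_agent = jump_start([500, 500], [450, 480])  # control starts near the ceiling
```

With these made-up numbers the agent with the weaker control condition earns the larger jump-start, which illustrates why an unnormalized measure rewards poorly performing agents.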
Calibrated Transfer Ratio (CTR)
• Interpretation: “the amount of available improvement achieved”
• Disadvantage: requires knowledge of optimality
• Advantage: more meaningful
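\[
\mathrm{CTR} = 1 - \frac{\int_{t_0}^{t^*} \bigl(OPT(t) - AB(t)\bigr)\,dt}{\int_{t_0}^{t^*} \bigl(OPT(t) - B(t)\bigr)\,dt}
\]
[Figure: performance metric (y-axis, 0 to 1) plotted against trials (x-axis)]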
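The ratio can be approximated numerically from sampled learning curves. The sketch below is an illustration, not the project's evaluation code. It assumes that AB is the transfer agent's curve on the target, that B is the control agent's curve, that OPT is the optimal curve, that trials are unit-spaced, and that higher values are better; the curves themselves are made up.

```python
from typing import Sequence


def area(curve: Sequence[float]) -> float:
    """Trapezoidal area under a curve sampled at unit-spaced trials."""
    return sum((a + b) / 2.0 for a, b in zip(curve, curve[1:]))


def calibrated_transfer_ratio(opt: Sequence[float],
                              ab: Sequence[float],
                              b: Sequence[float]) -> float:
    """1 - (gap to optimal with transfer) / (gap to optimal without transfer)."""
    if not (len(opt) == len(ab) == len(b)) or len(opt) < 2:
        raise ValueError("curves must have equal length of at least 2")
    gap_with_transfer = area([o - x for o, x in zip(opt, ab)])
    gap_without_transfer = area([o - x for o, x in zip(opt, b)])
    if gap_without_transfer == 0:
        raise ValueError("control agent is already optimal; ratio undefined")
    return 1.0 - gap_with_transfer / gap_without_transfer


# Made-up curves on a 0-to-1 performance metric.
opt = [1.0, 1.0, 1.0, 1.0]
b = [0.0, 0.5, 1.0, 1.0]     # control agent
ab = [0.5, 0.75, 1.0, 1.0]   # transfer agent
ctr = calibrated_transfer_ratio(opt, ab, b)
```

Under these assumptions a ratio of 1 means the transfer agent is optimal throughout, 0 means transfer made no difference, and a negative value means the transfer agent did worse than the control agent.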
UM Evaluation: CTR vs Jump-start
[Figure: UCT Jump-starts of Soar Agent. Jump-start (score), 0 to 350, for levels 1 to 8.]
[Figure: UCT Transfer Ratios of Soar Agent. Transfer ratio, 0 to 1, for levels 1 to 8.]
UM Evaluation: Soar vs Human
[Figure: TL Project Transfer Ratios (Expert Optimal). Transfer ratio, -2 to 1, for levels 1 to 8; series: Soar Agent, Human Targets.]
Navigation in UCT
• Agent perceives 3D space as set of positive and negative convex polyhedrons
  - Mapped to 2D convex polygons by SML middleware
  - “Gateways” are intersections of free space regions
• UCTBot navigates from region to region by:
  - Moving to a gateway
  - Moving through a gateway
• Suboptimal navigation
  - Doesn’t cut close to corners
  - Uses partitioning even when moving in wide open terrain
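Region-to-region navigation through gateways can be sketched as a search over a graph whose nodes are free-space regions and whose edges are gateways. The slides do not say how the UCTBot picks its sequence of regions, so the breadth-first search here is an assumption, and the map and all names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical map: gateway name -> the two free-space regions it connects.
GATEWAYS: Dict[str, Tuple[str, str]] = {
    "g1": ("A", "B"),
    "g2": ("B", "C"),
    "g3": ("C", "D"),
    "g4": ("A", "C"),
}


def plan_route(gateways: Dict[str, Tuple[str, str]],
               start: str, goal: str) -> Optional[List[Tuple[str, str]]]:
    """Breadth-first search; returns [(gateway, region entered), ...] or None."""
    neighbors: Dict[str, List[Tuple[str, str]]] = {}
    for name, (r1, r2) in gateways.items():
        neighbors.setdefault(r1, []).append((name, r2))
        neighbors.setdefault(r2, []).append((name, r1))
    came_from: Dict[str, Optional[Tuple[str, str]]] = {start: None}
    frontier = deque([start])
    while frontier:
        region = frontier.popleft()
        if region == goal:
            break
        for gateway, nxt in neighbors.get(region, []):
            if nxt not in came_from:
                came_from[nxt] = (region, gateway)
                frontier.append(nxt)
    if goal not in came_from:
        return None
    # Walk back from the goal to the start, then reverse.
    steps: List[Tuple[str, str]] = []
    region = goal
    while came_from[region] is not None:
        prev, gateway = came_from[region]
        steps.append((gateway, region))
        region = prev
    steps.reverse()
    return steps


def to_actions(steps: List[Tuple[str, str]]) -> List[str]:
    """Expand each step into the two movements: to the gateway, then through it."""
    actions = []
    for gateway, region in steps:
        actions.append(f"move to {gateway}")
        actions.append(f"move through {gateway} into {region}")
    return actions


route = plan_route(GATEWAYS, "A", "D")
actions = to_actions(route)
```

Because the agent always heads for a gateway, a plan like this routes through gateway positions even in open terrain, which is the kind of suboptimality the slide describes.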
Obstacle Detection & Avoidance
• UCTBot is “blind” to obstacles and some gateways
• Detection: relies on velocity
• Avoidance:
  - Some obstacles can be surmounted
    - Test all available actions
    - Robust for most obstacle types
  - For blocking obstacles, find paths around them
    - (Mostly) robust
• Learning: which obstacle/gateway is blocking?
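Velocity-based detection followed by testing the available actions might look like the sketch below. It is an illustration only: the threshold ratio, the sample window, the action names, and the `attempt` callback are hypothetical and are not drawn from the UCTBot.

```python
from typing import Callable, Iterable, Optional, Sequence


def is_blocked(speeds: Sequence[float], commanded_speed: float,
               ratio: float = 0.2, window: int = 3) -> bool:
    """True if the last `window` speed samples are all below ratio * commanded_speed.

    Requiring several slow samples in a row is an assumed guard against noisy actions.
    """
    if commanded_speed <= 0 or len(speeds) < window:
        return False
    recent = speeds[-window:]
    return all(s < ratio * commanded_speed for s in recent)


def try_to_surmount(actions: Iterable[str],
                    attempt: Callable[[str], bool]) -> Optional[str]:
    """Test each available action; return the first one that succeeds, else None."""
    for action in actions:
        if attempt(action):
            return action
    return None


# Made-up speed samples while the agent is commanded to move at speed 10.
stuck = is_blocked([10.0, 9.8, 1.0, 0.5, 0.2], commanded_speed=10.0)
moving = is_blocked([10.0, 9.5, 10.1], commanded_speed=10.0)

# Hypothetical obstacle that only climbing gets past.
winner = try_to_surmount(["jump", "crouch", "climb"], lambda a: a == "climb")
```

When `try_to_surmount` returns `None`, the obstacle would be treated as blocking, and the agent would need a path around it.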
(Partially) Motivated SRS
• Better navigation improves performance
  - In untrained agents
  - In transfer agents
  - To honestly evaluate TL, must optimize both transfer & control cases
• Possible applications:
  - Route finding, obstacle avoidance
  - But would also allow for multi-agent tasks
Learning to climb
Searching indoors
Using weapons
Gold Nuggets & Lumps of Coal
Gold nuggets:
• Met Y1 goals
• Laid groundwork for more motivating TL experiments in Y2
• Motivated SRS
• Developed CTR

Lumps of coal:
• UCT cut from Y2
• Scenarios and domain lacked motivating transfer
• Didn’t get to use SRS
• CTR not adopted for internal evaluations