Transfer Learning Experiments with Soar & the UCT · 2017. 6. 14. · Urban Combat Testbed Software...

May 23, 2007 1

Transfer Learning Experiments with Soar & the UCT

Nicholas Gorski, with John Laird and Taylor Lafrinere


Outline

• Transfer Learning Background
• Urban Combat Testbed (UCT)
• Results
• Spatial Reasoning in the UCT
• Movies!


Project Background

• 3-year DARPA Transfer Learning (TL) initiative

• Soar grouped with ICARUS & Companion
  - Y1: Urban Combat Testbed (completed F06)
  - Y2: General Game Playing (GGP); ongoing, evaluation in F07

• At the last workshop, we reported some initial experiments


Transfer Learning (TL)

• Similar to multi-task learning, inductive learning, and “learning to learn”

• A transfer learning agent:
  - performs in a source problem
  - applies learned knowledge to a target problem via a transformation
  - performs in the target problem, applying the previously learned knowledge

[Figure: Source → Source Knowledge → Transformation → Target Knowledge, used by the Transfer Agent in the Target; a Control Agent also performs in the Target for comparison.]
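The source → transformation → target loop, and the transfer-vs-control comparison it sets up, can be illustrated with a toy Python sketch. Everything here is hypothetical and is not the UCT or Soar implementation: "knowledge" is just a set of object names believed to be climbable, the transformation is a hand-written name mapping (ladder → drainpipe), and performance is the number of objects an agent tries in the target before finding a climbable one (fewer is better).

```python
from typing import Dict, Set

# Hypothetical knowledge representation: names of objects believed climbable.
Knowledge = Set[str]


def learn_in_source(source_objects: Dict[str, bool]) -> Knowledge:
    """Perform in the source problem: remember which objects proved climbable."""
    return {name for name, climbable in source_objects.items() if climbable}


def transform(knowledge: Knowledge, mapping: Dict[str, str]) -> Knowledge:
    """Map source object names onto their (assumed) target counterparts."""
    return {mapping.get(name, name) for name in knowledge}


def attempts_in_target(target_objects: Dict[str, bool], prior: Knowledge) -> int:
    """Perform in the target problem: try objects believed climbable first,
    then the rest alphabetically; return the number of attempts needed."""
    order = sorted(target_objects, key=lambda name: (name not in prior, name))
    for attempt, name in enumerate(order, start=1):
        if target_objects[name]:
            return attempt
    return len(order) + 1  # hypothetical penalty when nothing is climbable


source = {"ladder": True, "crate": False}
target = {"drainpipe": True, "barrel": False, "fence": False}

# Transfer agent: learns in the source, transforms, then acts in the target.
transfer_attempts = attempts_in_target(
    target, transform(learn_in_source(source), {"ladder": "drainpipe"})
)
# Control agent: acts in the target with no prior knowledge.
control_attempts = attempts_in_target(target, set())
print(transfer_attempts, control_attempts)  # 1 2
```

The gap between the two agents' results is the transfer effect; the choice of how to measure that gap is what distinguishes the metrics compared later in this talk.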


Urban Combat Testbed

• Software suite consisting of:
  - A first-person shooter (FPS) video game engine
  - Scenarios designed to test for specific types of transfer

• Complex domain
  - Large and continuous
  - Noisy actions
  - Many different objects & obstacles
    - Doors, windows, barriers, pits, water, electrical barriers, etc.


TL Scenarios in the UCT

• The agent must navigate from the start to the goal
• To reach the goal, it must climb a ladder or drainpipe
• Generalization: the drainpipe can be climbed because the ladder was climbable

[Figure: Source and Target maps, each with a Start and a Goal; the Source contains a Ladder and the Target a Drainpipe.]


Y1 Results

[Figure: Comparing Agent and Human Performance. Jump-start (y-axis, -100 to 500) on Levels 1 to 8 for Soar, ICARUS, Human Targets, and UT Austin. Higher scores are better.]


Jump-start Discussion

• Y1 results
  - Single methodology used by all teams
  - Used for the go/no-go decision for Y2

• Used “jump-start”
  - Magnitude of the initial difference in performance (not normalized)
  - Rewards agents whose control (no-transfer) performance is poor
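A minimal sketch of why an unnormalized jump-start can reward weak agents, assuming jump-start is computed as the transfer agent's first-trial score minus the control agent's first-trial score. The learning curves below are made up for illustration and are not UCT results.

```python
from typing import Sequence


def jump_start(transfer_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Initial performance difference (first trial), not normalized."""
    return transfer_scores[0] - control_scores[0]


# Hypothetical learning curves: both transfer agents start at the same score,
# but agent Y's control (no-transfer) version starts much lower than agent X's.
x_transfer, x_control = [300, 350, 400], [100, 250, 400]
y_transfer, y_control = [300, 350, 400], [-100, 150, 400]

print(jump_start(x_transfer, x_control))  # 200
print(jump_start(y_transfer, y_control))  # 400
```

Both transfer agents behave identically, yet Y's weaker baseline doubles its jump-start, because nothing in the metric accounts for how much room for improvement existed.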


Calibrated Transfer Ratio (CTR)

• Interpretation: “the amount of available improvement achieved”

• Disadvantage: requires knowledge of optimality

• Advantage: more meaningful

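$$\mathrm{CTR} = 1 - \frac{\int_0^{t^*} \big(OPT(t) - B_A(t)\big)\,dt}{\int_0^{t^*} \big(OPT(t) - B(t)\big)\,dt}$$

where OPT(t) is optimal performance at trial t, B(t) is the control agent's performance in the target, B_A(t) is the transfer agent's performance in the target, and t* is the last trial considered.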

[Figure: performance metric vs. trials]
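The CTR can be approximated from sampled learning curves. This sketch assumes performance is measured at evenly spaced trials and uses the trapezoidal rule for the integrals; it computes 1 minus (area between the optimal curve and the transfer curve) divided by (area between the optimal curve and the control curve). The function and argument names are illustrative, and the curves in the example are invented.

```python
from typing import Sequence


def trapezoid(ys: Sequence[float], dt: float = 1.0) -> float:
    """Trapezoidal-rule integral of evenly spaced samples."""
    return sum((a + b) / 2.0 * dt for a, b in zip(ys, ys[1:]))


def calibrated_transfer_ratio(
    optimal: Sequence[float],
    transfer: Sequence[float],
    control: Sequence[float],
    dt: float = 1.0,
) -> float:
    """1 - (gap remaining with transfer) / (gap available without transfer)."""
    transfer_gap = [o - a for o, a in zip(optimal, transfer)]
    control_gap = [o - b for o, b in zip(optimal, control)]
    available = trapezoid(control_gap, dt)
    if available == 0:
        raise ValueError("control is already optimal; CTR is undefined")
    return 1.0 - trapezoid(transfer_gap, dt) / available


optimal = [10.0, 10.0, 10.0]
control = [0.0, 5.0, 10.0]
transfer = [5.0, 10.0, 10.0]
print(calibrated_transfer_ratio(optimal, transfer, control))  # 0.75
```

A ratio of 1 means all available improvement was achieved, 0 means no benefit over the control agent, and negative values mean transfer made performance worse. The division by the available gap is what removes the bias toward weak baselines that an unnormalized jump-start has.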


UM Evaluation: CTR vs Jump-start

[Figure: UCT Jump-starts of Soar Agent. Jump-start score (y-axis, 0 to 350) on levels 1 to 8.]

[Figure: UCT Transfer Ratios of Soar Agent. Transfer ratio (y-axis, 0 to 1).]


UM Evaluation: Soar vs Human

[Figure: TL Project Transfer Ratios (Expert Optimal). Transfer ratio (y-axis, -2 to 1) for the Soar Agent and Human Targets.]


Navigation in UCT

• Agent perceives 3D space as a set of positive and negative convex polyhedra
  - Mapped to 2D convex polygons by the SML middleware
  - “Gateways” are intersections of free-space regions

• UCTBot navigates from region to region by:
  - Moving to a gateway
  - Moving through a gateway

• Suboptimal navigation
  - Doesn’t cut close to corners
  - Uses partitioning even when moving in wide-open terrain
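The region-to-region strategy can be sketched as a breadth-first search over a graph whose nodes are free-space regions and whose edges are gateways, followed by expanding each hop into "move to gateway" and "move through gateway" steps. This is an illustration under assumptions (gateways reduced to a single midpoint, hypothetical region names and action labels), not the UCTBot code. Because BFS minimizes the number of regions crossed rather than distance travelled, it shares the suboptimality noted above.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Tuple

Point = Tuple[float, float]
# Hypothetical map: region -> {adjacent region: gateway midpoint}
RegionGraph = Dict[str, Dict[str, Point]]
Hop = Tuple[str, str, Point]


def plan_route(
    graph: RegionGraph,
    start: str,
    goal: str,
    blocked: FrozenSet[Tuple[str, str]] = frozenset(),
) -> Optional[List[Hop]]:
    """Breadth-first search over regions, skipping gateways marked as blocked."""
    parents: Dict[str, Optional[Tuple[str, Point]]] = {start: None}
    queue = deque([start])
    while queue:
        region = queue.popleft()
        if region == goal:
            break
        for neighbour, gateway in graph.get(region, {}).items():
            if neighbour in parents or (region, neighbour) in blocked:
                continue
            parents[neighbour] = (region, gateway)
            queue.append(neighbour)
    if goal not in parents:
        return None  # goal unreachable with the current set of gateways
    hops: List[Hop] = []
    node = goal
    while parents[node] is not None:
        previous, gateway = parents[node]
        hops.append((previous, node, gateway))
        node = previous
    return list(reversed(hops))


def to_actions(hops: List[Hop]) -> List[Tuple[str, object]]:
    """Expand each hop into the two navigation steps."""
    actions: List[Tuple[str, object]] = []
    for _, destination, gateway in hops:
        actions.append(("move-to-gateway", gateway))
        actions.append(("move-through-gateway", destination))
    return actions


graph: RegionGraph = {
    "A": {"B": (1.0, 0.0), "D": (0.0, 1.0)},
    "B": {"A": (1.0, 0.0), "C": (2.0, 0.0)},
    "D": {"A": (0.0, 1.0), "C": (1.0, 2.0)},
    "C": {"B": (2.0, 0.0), "D": (1.0, 2.0)},
}
print(plan_route(graph, "A", "C"))
# [('A', 'B', (1.0, 0.0)), ('B', 'C', (2.0, 0.0))]
print(plan_route(graph, "A", "C", blocked=frozenset({("B", "C")})))
# [('A', 'D', (0.0, 1.0)), ('D', 'C', (1.0, 2.0))]
```

The `blocked` argument is one way a planner could route around a gateway that obstacle handling has marked as impassable.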


Obstacle Detection & Avoidance

• UCTBot is “blind” to obstacles and some gateways

• Detection: relies on velocity

• Avoidance:
  - Some obstacles can be surmounted
    - Test all available actions
    - Robust for most obstacle types
  - For blocking obstacles, find paths around them
    - (Mostly) robust

• Learning: which obstacle/gateway is blocking?
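Velocity-based detection and the try-then-route-around fallback can be sketched as follows. The speed threshold, window length, action names, and the `attempt` callback are hypothetical stand-ins for whatever the real bot uses; when no action works, the blocking gateway is recorded so a region planner can exclude it on its next search.

```python
import math
from typing import Callable, Iterable, Optional, Sequence, Set, Tuple

Velocity = Tuple[float, float]


def is_stuck(velocities: Sequence[Velocity], threshold: float = 0.1, window: int = 3) -> bool:
    """Detection: the bot is trying to move but its speed stayed below
    `threshold` for the last `window` ticks."""
    recent = velocities[-window:]
    return len(recent) == window and all(math.hypot(vx, vy) < threshold for vx, vy in recent)


def try_surmount(actions: Iterable[str], attempt: Callable[[str], bool]) -> Optional[str]:
    """Avoidance, part 1: test each available action until one gets the bot moving."""
    for action in actions:
        if attempt(action):
            return action
    return None


def handle_obstacle(
    velocities: Sequence[Velocity],
    actions: Iterable[str],
    attempt: Callable[[str], bool],
    gateway: Tuple[str, str],
    blocked: Set[Tuple[str, str]],
) -> str:
    """Avoidance, part 2: if no action works, mark the gateway as blocking and replan."""
    if not is_stuck(velocities):
        return "moving"
    action = try_surmount(actions, attempt)
    if action is not None:
        return "surmounted with " + action
    blocked.add(gateway)  # remember which gateway is blocking
    return "replan"


readings = [(1.0, 0.0), (0.01, 0.0), (0.0, 0.02), (0.0, 0.0)]
blocked: Set[Tuple[str, str]] = set()
print(handle_obstacle(readings, ["jump", "crouch", "climb"], lambda a: a == "climb", ("B", "C"), blocked))
# surmounted with climb
print(handle_obstacle(readings, ["jump", "crouch"], lambda a: False, ("B", "C"), blocked), blocked)
# replan {('B', 'C')}
```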


(Partially) Motivated SRS

• Better navigation improves performance
  - In untrained agents
  - In transfer agents
  - To evaluate TL honestly, both the transfer and control cases must be optimized

• Possible applications:
  - Route finding, obstacle avoidance
  - Would also allow for multi-agent tasks


Learning to climb


Searching indoors


Using weapons


Gold Nuggets & Lumps of Coal

• Met Y1 goals

• Laid groundwork for more motivating TL experiments in Y2
  - Motivated SRS
  - Developed CTR

• UCT cut from Y2
  - Scenarios and domain lacked motivating transfer

• Didn’t get to use SRS

• CTR not adopted for internal evaluations
