
Compromise Strategies for Action Selection

Frederick L. Crabbe

Computer Science Department, United States Naval Academy

University of Pittsburgh Intelligent Systems Program

F. L. Crabbe (USNA) Compromise Strategies for Action Selection Pitt ISP 2006 1 / 40

The Problem

Autonomous agents (animal, robot, software) must pick actions to take. There are many approaches:

Solve the problem optimally
Treat the problem heuristically
Each has pros and cons

Multiple conflicting goals introduce tough problems for the optimal approach:

Difficult to express
Difficult to compute

Multiple conflicting goals also introduce tough problems for the heuristic approach:

Which goal does the agent pursue?
How can (and should) the goals be combined?

This talk is on one such combination technique: compromise.


The Message

Compromise behavior (selecting actions that compromise between goals) is an influential concept in many areas of agents research.
Experiments here show it to be less beneficial than predicted.

Infinitely many variations are possible...
But the experiments are based on scenarios where compromise advocates say it should work.
Something is wrong with the currently accepted hypothesis.

We propose an alternate hypothesis:
The level at which the decision is made is key to whether compromise helps.
Compromise is more useful at higher levels of decision making.


Outline

1 Introduction
2 History: Computational; Biological
3 Prescriptive Action Selection: Formulation; Experiments
4 Proscriptive Action Selection: Formulation; Experiments
5 What does it mean? A new hypothesis
6 Future Work and Conclusion: Future Work; Conclusion


Traditional Planning

Action selection problem = a single goal in a search space

The goal can have multiple parts: Have(robot, medicine003) ∧ In(robot, room342)
The parts cannot conflict.

Find the shortest path; the next action is the first step on the path
Calculating this: hard
Relax optimality? Still hard
And it still can't handle multiple conflicting goals
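A minimal sketch of "next action = first step on the shortest path", assuming a hypothetical toy domain: the room graph `DOORS`, the `storage` room, and the action names are invented for illustration. Breadth-first search over (room, holding-medicine) states finds a shortest plan to the conjunctive goal Have(robot, medicine003) ∧ In(robot, room342), and the agent executes only its first step.

```python
from collections import deque

# Hypothetical toy domain (not from the talk): rooms joined by doors, with
# medicine003 lying in the "storage" room.
DOORS = {
    "room340": ["room341"],
    "room341": ["room340", "room342", "storage"],
    "room342": ["room341"],
    "storage": ["room341"],
}
MEDICINE_ROOM = "storage"


def successors(state):
    """Yield (action, next_state) pairs; a state is (room, has_medicine)."""
    room, has_med = state
    for nxt in DOORS[room]:
        yield ("move", nxt), (nxt, has_med)
    if room == MEDICINE_ROOM and not has_med:
        yield ("pickup", "medicine003"), (room, True)


def is_goal(state):
    room, has_med = state
    # Conjunctive goal: Have(robot, medicine003) AND In(robot, room342)
    return has_med and room == "room342"


def shortest_plan(start):
    """Breadth-first search: the first goal state reached uses the fewest actions."""
    frontier = deque([start])
    parent = {start: None}  # state -> (previous state, action taken)
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            plan = []
            while parent[state] is not None:
                state, action = parent[state]
                plan.append(action)
            return plan[::-1]
        for action, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = (state, action)
                frontier.append(nxt)
    return None  # goal unreachable


plan = shortest_plan(("room340", False))
next_action = plan[0]  # the agent executes only this first step
```

Even in this toy, the state space is the product of every fluent's values, which hints at why the exact calculation becomes hard as domains grow.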


Comparative Psychology

Branch of animal psychology
Derived from the traditions of behaviorism
Experiments outside the natural environment: maze, Skinner box
All animal drives not being tested are met by the experimenters
Designed to isolate the matters in question
Focus on reasoning and learning


Ethology

A different perspective:
Observe animals in natural surroundings
Performing natural tasks
Often with multiple conflicting goals

Fixed Action Patterns (FAPs):
Animals often react to external stimuli with hard-coded behaviors
What happens when multiple FAPs are active?
A focus of ethology research


Conflict Resolution

Possible FAP conflict strategies:

Pick one
Intention movements
Alternation
Ambivalent behavior
Common components
Compromise behavior

Autonomic responses
Displacement
Redirection
Regression
Immobility
Aggression


The Behavior Based Solution

Ethologically inspired:
Divide the problem into FAP-like "behaviors"
Each dedicated to solving an individual goal
Recombine their recommendations, somehow
How do we recombine?

Use the ethology list!
Already well studied
Many strategies make intuitive sense
Seen in nature = good idea?
Need to pick and choose the good ones
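A minimal sketch of this architecture, assuming each behavior returns a desired heading plus an activation level. The behaviors `seek_food` and `avoid_predator`, their activation formulas, and the two recombination rules shown (ethology's "pick one" as winner-take-all, and an activation-weighted vector sum as a compromise) are illustrative assumptions, not a specific published mechanism.

```python
import math
from dataclasses import dataclass


@dataclass
class Recommendation:
    heading: float     # direction the behavior wants to move, in radians
    activation: float  # how strongly the behavior wants control


# Hypothetical behaviors, each dedicated to a single goal.
def seek_food(agent, food):
    dx, dy = food[0] - agent[0], food[1] - agent[1]
    return Recommendation(math.atan2(dy, dx), 1.0 / (1.0 + math.hypot(dx, dy)))


def avoid_predator(agent, predator):
    dx, dy = agent[0] - predator[0], agent[1] - predator[1]  # points away from it
    return Recommendation(math.atan2(dy, dx), 2.0 / (1.0 + math.hypot(dx, dy)))


def pick_one(recs):
    """Ethology's "pick one": the most strongly activated behavior wins outright."""
    return max(recs, key=lambda r: r.activation).heading


def blend(recs):
    """A compromise: activation-weighted sum of unit heading vectors."""
    x = sum(r.activation * math.cos(r.heading) for r in recs)
    y = sum(r.activation * math.sin(r.heading) for r in recs)
    return math.atan2(y, x)


agent, food, predator = (0.0, 0.0), (10.0, 0.0), (0.0, -5.0)
recs = [seek_food(agent, food), avoid_predator(agent, predator)]
print(pick_one(recs), blend(recs))
```

With the predator close, `pick_one` simply flees straight away (π/2 radians), while `blend` returns a heading between fleeing and approaching the food.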


Compromise Actions

Recombination must be able to exhibit compromise behavior.

Tyrrell's rule 12: "[The combination mechanism must] be able to choose actions that, while not the best choice for any one sub-problem alone, are best when all sub-problems are considered simultaneously."

Why? The council-of-ministers analogy.
Issues:

Compromise can be costly (in computation AND design)
Actual benefit unknown
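A tiny illustration of the kind of action Tyrrell's rule 12 asks for. The action names and utilities are hypothetical, and summing the two sub-problems' utilities is just one assumed way of "considering them simultaneously".

```python
# Hypothetical utilities of four candidate actions for two sub-problems.
ACTIONS = ["north", "east", "northeast", "stay"]
feed = {"north": 10.0, "east": 0.0, "northeast": 7.0, "stay": 1.0}
flee = {"north": 0.0, "east": 9.0, "northeast": 6.0, "stay": 0.0}

best_for_feed = max(ACTIONS, key=feed.get)  # "north"
best_for_flee = max(ACTIONS, key=flee.get)  # "east"

# Assumption: "considered simultaneously" is modelled as summing utilities.
best_jointly = max(ACTIONS, key=lambda a: feed[a] + flee[a])  # "northeast"
```

`northeast` is only second-best for each sub-problem alone but best overall, which is exactly the kind of choice a pick-one arbiter can never make.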


Prescriptive Goals

Low-level, two-prescriptive-goal scenario

2 goals to move to 2 targets
Targets can disappear
Will bet-hedging compromise be a good idea?

Seen in nature?
Mating behavior: frogs (Leptodactylus ocellatus)
Hunting behavior of cheetahs (Acinonyx jubatus)


Modeling Approach

Simulated environment
Action detail level:

Too detailed: "move left leg"
Too vague: "go to target"
Chosen level: move one unit at angle θ

Environment contains 2 stationary targets; each can disappear with probability 1 − p
Measurement: utility theory

$$EU(A_i \mid S_j) = \sum_{S_o \in O} P(S_o \mid A_i, S_j)\, U_h(S_o)$$

$$U_h(S) = U(S) + \max_{A_i \in A} EU(A_i \mid S)$$
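A direct transcription of the two definitions on a tiny finite model may help. Everything in `TRANSITIONS` and `UTILITY` is hypothetical, and treating action-free states as terminal (so U_h(S) = U(S) there) is an assumption the slide leaves open; memoization with `functools.lru_cache` avoids recomputing shared outcomes.

```python
from functools import lru_cache

# Hypothetical toy model: states, actions, probabilities and utilities are
# invented. States with no actions are treated as terminal (an assumption).
TRANSITIONS = {
    # state: {action: [(probability, next_state), ...]}
    "start": {"left": [(0.9, "near_a"), (0.1, "empty")],
              "right": [(0.5, "near_b"), (0.5, "empty")]},
    "near_a": {"grab": [(0.8, "got_a"), (0.2, "empty")]},
    "near_b": {"grab": [(1.0, "got_b")]},
}
UTILITY = {"start": 0.0, "near_a": 0.0, "near_b": 0.0,
           "got_a": 10.0, "got_b": 12.0, "empty": 0.0}


def expected_utility(action, state):
    # EU(A_i | S_j) = sum over outcomes S_o of P(S_o | A_i, S_j) * U_h(S_o)
    return sum(p * u_h(nxt) for p, nxt in TRANSITIONS[state][action])


@lru_cache(maxsize=None)
def u_h(state):
    # U_h(S) = U(S) + max over actions A_i of EU(A_i | S); 0 if no actions remain
    actions = TRANSITIONS.get(state, {})
    return UTILITY[state] + max(
        (expected_utility(a, state) for a in actions), default=0.0
    )


print(expected_utility("left", "start"), expected_utility("right", "start"))
```

Here `left` has EU 0.9 × 8 = 7.2 against 0.5 × 12 = 6.0 for `right`, so U_h(start) = 7.2. The recursion only terminates because this toy model is acyclic; cyclic models need the dynamic-programming treatment the talk mentions.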


Formal Model

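Applying the EU equations to our situation, we get:

$$
\begin{aligned}
EU(A_i \mid t_a, t_b, \lambda) ={}& p^2\, EU(A_\theta \mid t_a, t_b, \lambda') \\
&+ p(1-p)\, EU(A_\theta \mid t_a, \lambda') \\
&+ p(1-p)\, EU(A_\theta \mid t_b, \lambda'),
\end{aligned}
$$

$$EU(A_i \mid t_a, \lambda') = G_a\, p^{\lambda'_{t_a}}, \quad\text{and}\quad EU(A_i \mid t_b, \lambda') = G_b\, p^{\lambda'_{t_b}},$$

where λ is the agent's location, λ′ its location after taking A_i, λ′_{t_a} the distance from λ′ to target t_a, and G_a, G_b the goal values of the two targets.

This can be solved using dynamic programming.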
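A minimal value-iteration sketch of that dynamic program, under stated assumptions: a hypothetical grid with 8-connected unit moves standing in for "one unit at angle θ", invented target locations and goal values, the next move A_θ chosen to maximize EU, and episodes ending when a target is reached (the slide does not say what happens then). `solve`, `one_left` and the other names are illustrative.

```python
import itertools

# Hypothetical scenario (not the talk's values): a 21 x 21 grid, two stationary
# targets, and a per-step survival probability P shared by both targets.
P = 0.97
SIZE = 21
TARGETS = {"a": ((3, 17), 10.0), "b": ((17, 17), 8.0)}  # name: (location, goal value G)
MOVES = [m for m in itertools.product((-1, 0, 1), repeat=2) if m != (0, 0)]
CELLS = [(x, y) for x in range(SIZE) for y in range(SIZE)]


def steps(u, v):
    # Assumed 8-connected unit moves: Chebyshev distance = number of steps
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))


def single_target_eu(loc, name):
    # EU(A | t, λ') = G * p^(λ'_t): head straight for the lone remaining target
    tloc, g = TARGETS[name]
    return g * P ** steps(loc, tloc)


def neighbours(loc):
    for dx, dy in MOVES:
        x, y = loc[0] + dx, loc[1] + dy
        if 0 <= x < SIZE and 0 <= y < SIZE:
            yield (x, y)


def solve(tol=1e-9):
    """Value iteration for EU with both targets present, taking the best move."""
    # Assumption: reaching either target ends the episode with its goal value.
    terminal = {tloc: g for tloc, g in TARGETS.values()}
    # p(1-p) terms for "exactly one target survives this step", per landing cell λ'.
    one_left = {c: P * (1 - P) * (single_target_eu(c, "a") + single_target_eu(c, "b"))
                for c in CELLS}
    V = {c: 0.0 for c in CELLS}
    while True:
        delta = 0.0
        for c in CELLS:
            if c in terminal:
                new = terminal[c]
            else:
                # p^2 * EU(both remain, λ') + p(1-p) * EU(a only) + p(1-p) * EU(b only)
                new = max(P * P * V[n] + one_left[n] for n in neighbours(c))
            delta = max(delta, abs(new - V[c]))
            V[c] = new
        if delta < tol:
            return V


V = solve()
start = (10, 0)
print(V[start], single_target_eu(start, "a"), single_target_eu(start, "b"))
```

Comparing `V[start]` with the two `single_target_eu` values gives a rough sense of how much the optimal, possibly compromising, policy gains over simply heading for one target.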


Experimental Set-Up

Select random locations for the 2 targets
Select random goal values
Select a random p
Run for 50,000 scenarios
Calculate the optimal policy
Compare against non-compromise strategies:

Closest (C)
Maximum Utility (MU)
Maximum Expected Utility (MEU)

Compare against compromise strategies:
Forces (F)
Signal Gradient (SG)
Exponentially Weakening Forces (EWF)
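A sketch of what most of these strategies might look like as heading choosers for an agent at a point. C, MU and MEU commit to one target; F and EWF sum attraction vectors. The weightings `inverse_square` and `exponential_weakening` are assumed stand-ins, since the slides do not give the formulas, and Signal Gradient is omitted for the same reason. All coordinates and values are hypothetical.

```python
import math

# Hypothetical scenario. The strategy definitions are illustrative assumptions;
# in particular the force weightings for F and EWF are guesses, not the talk's formulas.
P = 0.97                                             # per-step target survival
AGENT = (0.0, 0.0)
TARGETS = [((10.0, 2.0), 5.0), ((-4.0, 6.0), 3.0)]  # (location, goal value G)


def dist(u, v):
    return math.hypot(v[0] - u[0], v[1] - u[1])


def heading_to(u, v):
    return math.atan2(v[1] - u[1], v[0] - u[0])


# Non-compromise strategies: commit to one target and head straight at it.
def closest(agent, targets):
    loc, _ = min(targets, key=lambda t: dist(agent, t[0]))
    return heading_to(agent, loc)


def max_utility(agent, targets):
    loc, _ = max(targets, key=lambda t: t[1])
    return heading_to(agent, loc)


def max_expected_utility(agent, targets, p=P):
    # Mirrors the G * p^distance value of a lone target in the formal model.
    loc, _ = max(targets, key=lambda t: t[1] * p ** dist(agent, t[0]))
    return heading_to(agent, loc)


# Compromise strategies: sum attraction vectors and head along the resultant.
def force_heading(agent, targets, weight):
    fx = fy = 0.0
    for loc, g in targets:
        d = dist(agent, loc)
        w = weight(g, d)
        fx += w * (loc[0] - agent[0]) / d
        fy += w * (loc[1] - agent[1]) / d
    return math.atan2(fy, fx)


def inverse_square(g, d):         # assumed weighting for Forces (F)
    return g / d ** 2


def exponential_weakening(g, d):  # assumed weighting for EWF
    return g * P ** d
```

When the two targets are less than 180° apart, both compromise headings are positive combinations of the two unit vectors, so they fall between the two non-compromise headings: a "blend" of the two recommendations.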


Example

[Figure: example scenario]


Prescriptive Results

Comparing non-compromise strategies to each other (each entry: column strategy's performance as % over the row strategy):

              MU       C       MEU
% over MU     0.0      9.35    15.31
% over C     -4.13     0.0     12.62
% over MEU   -8.49    -5.96    0.0

Comparing compromise strategies to MEU (% over MEU):

         F        SG       EWF      Optimal
avg     -4.07%   -2.79%   -2.47%    1.12%
best     4.84%    4.82%   20.56%   22.73%


Summary

Standard compromise strategies are worse than clever non-compromise strategies
Optimal is only barely better than non-compromise

Extra bonus conclusion: animals that exhibit apparent compromise in the two-prescriptive-goal case are either using some unknown strategy or are doing so for some other reason.


Proscriptive Goals

Maybe the previous scenario wasn't where compromise shines. Does compromise work better with proscriptive goals?

A proscriptive goal is a goal to not do something, such as: don't go near the predator.

Maybe a prescriptive goal and a proscriptive one: Move to food? Away from predator? Somewhere else?


Tyrrell: "It is obviously preferable to combine this demand [to flee the hazard] with a preference to head toward food, if the two don't clash, rather than to head diametrically away from the hazard because the only system being considered is that of avoid hazard"


Formal Model

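Applying the EU equations to our situation, we get:

$$
\begin{aligned}
EU(O \mid t, d, \lambda) ={}& p_t\, p_d\, p_n(\lambda)\, EU(O \mid t, d, \lambda') \\
&+ p_t (1 - p_d)\, EU(O \mid t, \lambda') \\
&+ p_d (1 - p_n(\lambda))\, G_d \\
&+ (1 - p_t)\, p_d\, p_n(\lambda)\, EU(O \mid d, \lambda'),
\end{aligned}
$$

$$EU(O \mid t, \lambda) = G_t\, p_t^{\lambda_t},$$

$$EU(O \mid d, \lambda) = p_n(\lambda')\, p_d\, EU(O \mid d, \lambda') + (1 - p_n(\lambda'))\, G_d,$$

where t is the target and d the danger, λ′ is the agent's location after the move, p_t and p_d are the per-step probabilities that the target and the danger persist, p_n(λ) is the probability of not being struck at λ, λ_t is the distance from λ to the target, and G_t and G_d are the utilities of the target and of being struck.

These can be calculated using dynamic programming.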
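A value-iteration sketch of these equations on a hypothetical scaled-down grid. The grid size, the 8-connected moves, the probabilities, the use of the Linear A p_n from the experimental details, and the rule that the episode ends on reaching the target are all assumptions. The danger-only recursion keeps p_n(λ′) as written on the slide, and each update takes the best move as O.

```python
import itertools
import math

# Hypothetical scaled-down scenario: a 40 x 40 grid with the talk's target
# (50, 90) and danger (60, 50) scaled by 0.4. Probabilities, the 8-connected
# move set, and "reaching the target ends the episode" are assumptions.
SIZE = 40
TARGET, G_T = (20, 36), 100.0
DANGER, G_D = (24, 20), -100.0
P_T, P_D = 0.97, 0.8  # per-step probabilities that target and danger persist
CELLS = [(x, y) for x in range(SIZE) for y in range(SIZE)]
MOVES = [m for m in itertools.product((-1, 0, 1), repeat=2) if m != (0, 0)]


def p_n(loc):
    # "Linear A": probability of NOT being struck at distance d from the danger
    d = math.dist(loc, DANGER)
    return 0.04 * d + 0.2 if d <= 20 else 1.0


def steps(u, v):
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))  # 8-connected step count


def neighbours(loc):
    for dx, dy in MOVES:
        n = (loc[0] + dx, loc[1] + dy)
        if 0 <= n[0] < SIZE and 0 <= n[1] < SIZE:
            yield n


PN = {c: p_n(c) for c in CELLS}                           # cached p_n(λ)
T_ONLY = {c: G_T * P_T ** steps(c, TARGET) for c in CELLS}  # EU(O | t, λ) = G_t p_t^(λ_t)


def value_iterate(update, tol=1e-6):
    V = {c: 0.0 for c in CELLS}
    while True:
        delta = 0.0
        for c in CELLS:
            new = update(V, c)
            delta = max(delta, abs(new - V[c]))
            V[c] = new
        if delta < tol:
            return V


def danger_only(V, c):
    # EU(O | d, λ) = p_n(λ') p_d EU(O | d, λ') + (1 - p_n(λ')) G_d, best λ'
    return max(PN[n] * P_D * V[n] + (1 - PN[n]) * G_D for n in neighbours(c))


V_D = value_iterate(danger_only)


def both_present(V, c):
    if c == TARGET:
        return G_T  # assumption: the episode ends on reaching the target
    strike = P_D * (1 - PN[c]) * G_D  # p_d (1 - p_n(λ)) G_d, same for every move
    return strike + max(
        P_T * P_D * PN[c] * V[n]            # target and danger remain; not struck
        + P_T * (1 - P_D) * T_ONLY[n]       # danger vanishes
        + (1 - P_T) * P_D * PN[c] * V_D[n]  # target vanishes; not struck
        for n in neighbours(c)
    )


V_TD = value_iterate(both_present)
```

V_D must be solved first because the both-present equation refers to it. Comparing the greedy move under `V_TD` with a straight line to the target at each cell is one way to look for where compromise appears.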


Experimental Details

Location of target: (50, 90); utility: 100
Location of danger: (60, 50); utility: −100
p_d in range [0.5, 1), p_t in range [0.95, 1)

Functions mapping distance d from the danger to p_n(d), the probability of not being struck:
Linear A: p_n(d) = 0.04d + 0.2 when d ≤ 20, 1 otherwise
Linear B: p_n(d) = 0.005d + 0.9 when d ≤ 20, 1 otherwise
Quadratic: p_n(d) = d²/400 when d ≤ 20, 1 otherwise
Sigmoid: p_n(d) = 1/(1 + 1.8^(10−d)) everywhere

Generated 2000 scenarios
Action selection mechanisms:

Optimal
MEU: go directly to the target
Active Goal: act based on the goal currently active
Skirt: move toward the target, but skirt around the danger zone

Examined the EU of the 4 action selection strategies at 200 locations per scenario
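The four p_n(d) functions written out, as a quick way to compare them. They follow the slide, with the garbled sigmoid read as 1/(1 + 1.8^(10 − d)); that exponent is a reconstruction, so treat it as an assumption. The dictionary `P_N` and the printed table are illustrative.

```python
# The four p_n(d) functions: probability of NOT being struck at distance d
# from the danger. The sigmoid's exponent 1.8^(10 - d) is a reconstruction.
def linear_a(d):
    return 0.04 * d + 0.2 if d <= 20 else 1.0


def linear_b(d):
    return 0.005 * d + 0.9 if d <= 20 else 1.0


def quadratic(d):
    return d ** 2 / 400 if d <= 20 else 1.0


def sigmoid(d):
    return 1.0 / (1.0 + 1.8 ** (10 - d))


P_N = {"Linear A": linear_a, "Linear B": linear_b,
       "Quadratic": quadratic, "Sigmoid": sigmoid}

# Print p_n at a few distances for each function.
for name, f in P_N.items():
    row = "  ".join(f"{f(d):.3f}" for d in (0, 5, 10, 15, 20, 25))
    print(f"{name:<10} {row}")
```

Linear B never drops below 0.9, so its danger zone is mild; Linear A and Quadratic fall much lower near the danger, and the sigmoid is the only one that is not exactly 1 beyond d = 20.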


Predictions

Compromise seen most strongly inside the danger zone, with the danger to one side of the agent
More compromise for Linear B
Compromise around the edges for Sigmoid and Quadratic
More compromise with low p_d
More compromise with low p_t


[Figure: p_t high, p_d high, p_n(d) Linear A]


[Figure: p_t high, p_d high, p_n(d) Linear B]


[Figure: p_t high, p_d low, p_n(d) Linear A]


[Figure: p_t low, p_d low, p_n(d) Linear A]


[Figure: p_t low, p_d low, p_n(d) Linear B]


[Figure: p_t high, p_d high, p_n(d) Sigmoid]


Quantitative Results

scenario      optimal over active goal   optimal over skirt   skirt over active goal
all           29.6%                      0.1%                 29.1%
opposite      64.9%                      0.2%                 63.3%
danger zone   26.2%                      0.01%                26.1%


Summary

As predicted:
More compromise for Linear B
Compromise at the edges for Sigmoid and Quadratic
Optimal compromise outperforms Active Goal

Not predicted:
Compromise not seen inside the danger zone at all in many cases
Compromise behind the danger zone with high p_d
p_t has less effect than p_d
Optimal compromise does not outperform Skirt


Why do we care?

Why not use optimal for our agents?
Too slow
Small benefit

Why not use a faster compromise strategy?
Complicates the system
Small benefit


Blending vs. Voting

Compromise in the experiments here resembles "blending" of actions
Matches the descriptions in the ethology literature
The result is similar to the recommended actions of the two subgoals

Compromise is often justified as a voting scheme:
Each subgoal votes for its top n actions from a finite set
The action with the most votes is selected
The resulting action can differ from the best action for each subgoal

Confusion arises from equivocation on the definition of "compromise action": high- vs. low-level actions
High-level actions: small, discrete set; amenable to voting
Low-level actions: continuous, infinite set; result in blending
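A minimal contrast between the two notions, with hypothetical action names and rankings: `vote` implements top-n voting over a small discrete set of high-level actions, and `blend` averages two continuous low-level headings.

```python
import math
from collections import Counter


def vote(rankings, n=2):
    # Each subgoal votes for its top-n actions; the most-voted action wins.
    tally = Counter()
    for ranking in rankings:
        tally.update(ranking[:n])
    return tally.most_common(1)[0][0]


def blend(headings, weights):
    # Low-level compromise: weighted average of continuous headings (radians).
    x = sum(w * math.cos(h) for h, w in zip(headings, weights))
    y = sum(w * math.sin(h) for h, w in zip(headings, weights))
    return math.atan2(y, x)


# High level: small discrete action set, ranked best-first by each subgoal.
feed = ["approach_food", "hide_near_food", "freeze", "flee"]
avoid = ["flee", "hide_near_food", "freeze", "approach_food"]
high_level_choice = vote([feed, avoid])

# Low level: feeding wants heading 0 rad, avoidance wants pi/2 rad.
low_level_choice = blend([0.0, math.pi / 2], [1.0, 1.0])
```

The vote picks `hide_near_food`, which neither subgoal ranks first, while the blended heading (π/4) lies halfway between the two recommendations and so resembles both.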


Compromise Behavior Hypothesis

1 Low-level compromise actions are less useful than high-level compromise
2 The higher the decision level, the more useful compromise is:
   1 At low levels, compromise actions are similar to non-compromise actions
   2 At high levels, compromise actions can be very different from non-compromise actions
   3 In complex environments, optimal or even very good non-optimal low-level actions are prohibitively difficult to calculate

You want to compromise in the selection of the high-level


Future Work

Test against non-optimal compromise behaviors
Test the compromise behavior hypothesis


Conclusion

In prescriptive goal scenarios:
Optimal compromise is marginally useful
Sub-optimal compromise is harmful

In proscriptive goal scenarios:
Optimal compromise behavior is different from what we expected
Less beneficial than expected, and only in some situations

Equivocation on the definition of "compromise action": high-level vs. low-level actions

The compromise behavior hypothesis may explain what is going on
