Compromise Strategies for Action Selection
Frederick L. Crabbe
Computer Science Department, United States Naval Academy
University of Pittsburgh Intelligent Systems Program
F. L. Crabbe (USNA) Compromise Strategies for Action Selection Pitt ISP 2006 1 / 40
The Problem
Autonomous agents (animal, robot, software) pick actions to take
Many approaches
  Solve the problem optimally
  Treat problem heuristically
  Pros and cons of each
Multiple conflicting goals introduce tough problems for the optimal approach
  Difficult to express
  Difficult to compute
Multiple conflicting goals introduce tough problems for the heuristic approach
  Which goal does the agent pursue?
  How can they (should they) be combined?
This talk is on one such combination technique: compromise
The Message
Compromise behavior (selecting actions that compromise between goals) is an influential concept in many areas of agents research
Experiments here show it less beneficial than predicted
  Infinite variations possible...
  Experiments are based on scenarios compromise advocates say should work
  Something’s wrong with the currently accepted hypothesis
We propose an alternate hypothesis
  The level the decision is being made at is key to whether it is helpful
  Compromise is more useful at higher levels of decision making.
Outline
1 Introduction
2 History
    Computational
    Biological
3 Prescriptive Action Selection
    Formulation
    Experiments
4 Proscriptive Action Selection
    Formulation
    Experiments
5 What does it mean?
    A new hypothesis
6 Future Work and Conclusion
    Future Work
    Conclusion
Traditional Planning
Action selection problem = single goal in search space
  Can have multiple parts: Have(robot, medicine003) ∧ In(robot, room342)
  Cannot be conflicting.
Find shortest path, next action is 1st step on path
  Calculating this: hard
  Relax optimality? hard
  Still can’t handle the multiple conflicting goals
Comparative Psychology
Branch of animal psychology
Derived from traditions of behaviorism
Experiments outside of natural environment: maze, Skinner box
All animal drives not being tested are met by experimenters
Designed to isolate matters in question
Focus on reasoning and learning
Ethology
A different perspective
  Observe animals in natural surroundings
  Performing natural tasks
  Often with multiple conflicting goals
Fixed Action Patterns (FAPs)
  Animals often react to external stimuli with hard-coded behaviors
  What happens when multiple FAPs are active?
  A focus of ethology research
Conflict Resolution
Possible FAP conflict strategies
  Pick one
  Intention movements
  Alternation
  Ambivalent behavior
  Common components
  Compromise behavior
  Autonomic responses
  Displacement
  Redirection
  Regression
  Immobility
  Aggression
The Behavior Based Solution
Ethologically inspired:
  Divide problem into FAP-like “behaviors”
  Each dedicated to solving individual goals
  Recombine recommendations, somehow
How do we recombine?
Use the ethology list!
  Already well studied
  Many make intuitive sense
  Seen in nature = good idea?
  Need to pick and choose the good ones
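The behavior-based decomposition can be sketched in a few lines of Python. This is only an illustration, not an architecture from the talk: the behaviors `seek_food` and `avoid_predator`, the percept keys, and the use of an "activation" number are all hypothetical. The recombination step shown is the simplest entry on the ethology list, "pick one".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Recommendation:
    action: str        # what this behavior wants the agent to do
    activation: float  # how strongly it wants it (hypothetical scale, 0 to 1)


Behavior = Callable[[Dict[str, float]], Recommendation]


def seek_food(percept: Dict[str, float]) -> Recommendation:
    # Hypothetical behavior: urgency grows with hunger.
    return Recommendation("approach_food", percept["hunger"])


def avoid_predator(percept: Dict[str, float]) -> Recommendation:
    # Hypothetical behavior: urgency grows as the predator gets nearer.
    return Recommendation("flee", percept["predator_nearness"])


def pick_one(behaviors: List[Behavior], percept: Dict[str, float]) -> str:
    """'Pick one' recombination: the most activated behavior wins outright."""
    recommendations = [behavior(percept) for behavior in behaviors]
    return max(recommendations, key=lambda r: r.activation).action
```

Swapping `pick_one` for another recombination function is where the other strategies on the ethology list, including compromise, would plug in.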
Compromise Actions
Recombination must be able to exhibit compromise behavior
  Tyrrell’s rule 12: “[The combination mechanism must] be able to choose actions that, while not the best choice for any one sub-problem alone, are best when all sub-problems are considered simultaneously.”
  Why? Council of ministers analogy
Issues
  Compromise can be costly (in computation AND design)
  Actual benefit unknown
Prescriptive Goals
Low level, two prescriptive goal scenario
  2 goals to move to 2 targets
  Targets can disappear
  Will bet-hedging compromise be a good idea?
Seen in nature?
  Mating behavior: frogs (Leptodactylus ocellatus)
  Hunting behavior of cheetahs (Acinonyx jubatus)
Modeling Approach
Simulated environment
Action detail level
  Too detailed: “move left leg”
  Too vague: “go to target”
  Move one unit at angle θ
Environment contains 2 stationary targets, can disappear w/ probability 1 − p
Measurement: utility theory
  \[ EU(A_i \mid S_j) = \sum_{S_o \in O} P(S_o \mid A_i, S_j)\, U_h(S_o) \]
  \[ U_h(S) = U(S) + \max_{A_i \in A} EU(A_i \mid S_j) \]
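A minimal Python sketch of the two utility equations, under stated assumptions. It reads U_h(S) as the immediate utility of S plus the best expected utility obtainable from S onward, and it cuts the recursion off at a fixed depth; the depth limit, the toy states, and the names `toy_model` and `toy_utility` are assumptions made for illustration and do not come from the talk.

```python
def expected_utility(action, state, model, utility, actions, depth):
    """EU(A | S): sum over outcomes S_o of P(S_o | A, S) * U_h(S_o)."""
    return sum(
        prob * heuristic_utility(outcome, model, utility, actions, depth)
        for outcome, prob in model(state, action).items()
    )


def heuristic_utility(state, model, utility, actions, depth):
    """U_h(S): utility of S plus the best expected utility from S onward."""
    if depth == 0:  # assumed horizon: stop looking ahead
        return utility(state)
    best = max(
        expected_utility(a, state, model, utility, actions, depth - 1)
        for a in actions
    )
    return utility(state) + best


# Hypothetical toy problem: a safe action with a small sure payoff
# versus a risky action with a larger payoff half of the time.
TOY_TRANSITIONS = {
    ("start", "safe"): {"small": 1.0},
    ("start", "risky"): {"big": 0.5, "none": 0.5},
}
TOY_UTILITIES = {"start": 0.0, "small": 4.0, "big": 10.0, "none": 0.0}
ACTIONS = ["safe", "risky"]


def toy_model(state, action):
    # Terminal states have no outcomes, so their expected utility is 0.
    return TOY_TRANSITIONS.get((state, action), {})


def toy_utility(state):
    return TOY_UTILITIES[state]
```

In this toy problem the risky action has the higher expected utility, so an agent maximizing EU at the start state would take it.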
Formal Model
Applying the EU equations to our situation, we get:
\[
\begin{aligned}
EU(A_i \mid t_a, t_b, \lambda) ={}& p^2\, EU(A_\theta \mid t_a, t_b, \lambda') \\
 &+ p(1-p)\, EU(A_\theta \mid t_a, \lambda') \\
 &+ p(1-p)\, EU(A_\theta \mid t_b, \lambda'),
\end{aligned}
\]
\[ EU(A_i \mid t_a, \lambda') = G_a\, p^{\lambda' t_a}, \quad\text{and}\quad EU(A_i \mid t_b, \lambda') = G_b\, p^{\lambda' t_b} \]
Can be solved using dynamic programming
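As a rough illustration, the recursion can be unrolled for one fixed policy instead of the optimal one. The sketch below is not the dynamic program from the talk. It assumes that the exponent in the single-target equations is the distance from λ′ to the target, that the episode ends when the agent reaches a target, and that a policy returns a point to head toward; `step_toward`, `go_to_a`, and `max_steps` are hypothetical helpers.

```python
import math


def step_toward(pos, target, step=1.0):
    """Move `step` units from pos toward target, landing on it if within reach."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist <= step:
        return target
    return (pos[0] + step * dx / dist, pos[1] + step * dy / dist)


def eu_one_target(pos, target, gain, p):
    """Single remaining target: G * p ** distance (assumed reading)."""
    return gain * p ** math.dist(pos, target)


def eu_two_targets(pos, ta, tb, ga, gb, p, policy, max_steps=10_000):
    """Expected utility of following `policy` while both targets may vanish."""
    total = 0.0
    both_present = 1.0  # probability that both targets have survived so far
    for _ in range(max_steps):
        new_pos = step_toward(pos, policy(pos, ta, tb))
        # Exactly one target vanished during this step: p(1 - p) for each case.
        total += both_present * p * (1 - p) * (
            eu_one_target(new_pos, ta, ga, p) + eu_one_target(new_pos, tb, gb, p)
        )
        both_present *= p * p  # both targets survived this step
        if new_pos == ta:
            return total + both_present * ga
        if new_pos == tb:
            return total + both_present * gb
        pos = new_pos
    raise RuntimeError("policy never reached a target")


def go_to_a(pos, ta, tb):
    """Hypothetical non-compromise policy: always head for target a."""
    return ta
```

Passing a different `policy`, for example one that heads between the two targets before committing, gives the expected utility of a compromise strategy under the same assumptions.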
Experimental Set-Up
Select random location for 2 targets
Select random goal values
Select random p
Run for 50,000 scenarios
Calculate optimal policy
Compare against non-compromise:
  Closest (C)
  Maximum Utility (MU)
  Maximum Expected Utility (MEU)
Compare against compromise strategies:
  Forces (F)
  Signal Gradient (SG)
  Exponentially Weakening Forces (EWF)
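The three non-compromise baselines are only named on the slide. One plausible reading, offered as an assumption rather than the definitions used in the experiments, is sketched below: Closest picks the nearest target, Maximum Utility picks the most valuable one, and Maximum Expected Utility picks the target with the largest value discounted by the chance it survives the trip, G · p^distance. The `(location, gain)` tuple layout is hypothetical.

```python
import math


def closest(pos, targets, p):
    """C: head for the nearest target, ignoring its value."""
    return min(targets, key=lambda t: math.dist(pos, t[0]))


def maximum_utility(pos, targets, p):
    """MU: head for the most valuable target, ignoring distance."""
    return max(targets, key=lambda t: t[1])


def maximum_expected_utility(pos, targets, p):
    """MEU: head for the target with the largest gain * p ** distance."""
    return max(targets, key=lambda t: t[1] * p ** math.dist(pos, t[0]))


# Hypothetical scenario: a far, valuable target and a near, modest one.
FAR = ((10.0, 0.0), 100.0)
NEAR = ((2.0, 0.0), 30.0)
```

With these two targets, the MEU choice flips from the far target to the near one as p drops from 0.9 to 0.8, which is the trade-off the other two baselines ignore.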
Example
[Figure: example scenario; scale marked from 0 to 50]
Prescriptive Results
Comparing non-compromise strategies to each other
              MU       C      MEU
% over MU     0.0    9.35   15.31
% over C    -4.13     0.0   12.62
% over MEU  -8.49   -5.96     0.0

Comparing compromise strategies to MEU

            F       SG      EWF   Optimal
avg     -4.07%   -2.79%   -2.47%    1.12%
best     4.84%    4.82%   20.56%   22.73%
Summary
Standard compromise strategies worse than clever non-compromise
Optimal only barely better than non-compromise
Extra bonus conclusion: animals that exhibit apparent compromise in the 2 prescriptive goal case are either using some unknown strategy or are doing so for some other reason.
Proscriptive Goals
Maybe the previous scenario wasn’t where compromise shines
Does compromise work better with proscriptive goals?
  Proscriptive goal is a goal to not do something
  Such as, don’t go near the predator
Maybe a prescriptive goal and a proscriptive one
  Move to food?
  Away from predator?
  Somewhere else?
Tyrrell: “It is obviously preferable to combine this demand [to flee the hazard] with a preference to head toward food, if the two don’t clash, rather than to head diametrically away from the hazard because the only system being considered is that of avoid hazard”
Formal Model
Applying the EU equations to our situation, we get:
\[
\begin{aligned}
EU(O \mid t, d, \lambda) ={}& p_t\, p_d\, p_n(\lambda)\, EU(O \mid t, d, \lambda') \\
 &+ p_t (1 - p_d)\, EU(O \mid t, \lambda') \\
 &+ p_d (1 - p_n(\lambda))\, G_d \\
 &+ (1 - p_t)\, p_d\, p_n(\lambda)\, EU(O \mid d, \lambda')
\end{aligned}
\]
\[ EU(O \mid t, \lambda) = G_t\, p_{\lambda,t}, \]
\[ EU(O \mid d, \lambda) = p_n(\lambda')\, p_d\, EU(O \mid d, \lambda') + (1 - p_n(\lambda'))\, G_d. \]
Which can be calculated using dynamic programming
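The danger-only equation EU(O | d, λ) is the simplest of the three to evaluate, and a small Python sketch shows the backward pass that dynamic programming would perform along one fixed path. This is an illustration under assumptions: p_n is passed in as a function of distance from the danger, the path is given as the list of those distances at the successive positions λ′, λ″, and so on, and the agent is taken to be safe (expected utility 0) once the path ends.

```python
def eu_danger_only(distances, p_d, p_n, g_d):
    """Evaluate EU(O | d, lambda) backward along a fixed path.

    distances: distance from the danger at each successive position.
    p_d:       probability the danger persists for another step.
    p_n:       function of distance, as used in the equation above.
    g_d:       (negative) utility of being struck.
    """
    eu = 0.0  # assumption: nothing more happens after the last position
    for dist in reversed(distances):
        # EU = p_n(lambda') * p_d * EU(next) + (1 - p_n(lambda')) * G_d
        eu = p_n(dist) * p_d * eu + (1.0 - p_n(dist)) * g_d
    return eu
```

A full solution would take the maximum over possible next positions at every step rather than following one given path.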
Experimental Details
Location of target: (50, 90); Utility: 100
Location of danger: (60, 50); Utility: −100
p_d in range [0.5; 1), p_t in range [0.95; 1)
Probability distributions for strike probability
  Linear A: p_n(d) = 0.04d + 0.2 when d ≤ 20, 1 otherwise
  Linear B: p_n(d) = 0.005d + 0.9 when d ≤ 20, 1 otherwise
  Quadratic: p_n(d) = d^2/400 when d ≤ 20, 1 otherwise
  Sigmoid: p_n(d) = 1/(1 + 1.8^{10−d}) everywhere
Generated 2000 scenarios
Action selection mechanisms
  Optimal
  MEU: go directly to target
  Active Goal: act based on goal currently active
  Skirt: move toward the target, but skirt around danger zone
Examined EU of 4 AS strategies at 200 locations per scenario
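The four p_n distributions translate directly into Python. The sketch assumes that d is the agent's distance from the danger and that 20 units is the radius of the danger zone; the function names and the constant `ZONE_RADIUS` are hypothetical.

```python
ZONE_RADIUS = 20.0  # assumed radius of the danger zone, from "d <= 20"


def linear_a(d):
    """Linear A: 0.04 d + 0.2 inside the zone, 1 outside."""
    return 0.04 * d + 0.2 if d <= ZONE_RADIUS else 1.0


def linear_b(d):
    """Linear B: 0.005 d + 0.9 inside the zone, 1 outside."""
    return 0.005 * d + 0.9 if d <= ZONE_RADIUS else 1.0


def quadratic(d):
    """Quadratic: d^2 / 400 inside the zone, 1 outside."""
    return d * d / 400.0 if d <= ZONE_RADIUS else 1.0


def sigmoid(d):
    """Sigmoid: 1 / (1 + 1.8^(10 - d)), defined everywhere."""
    return 1.0 / (1.0 + 1.8 ** (10.0 - d))
```

The three piecewise distributions reach 1 exactly at d = 20, so they are continuous at the zone boundary; they differ mainly in their value at the danger itself (0.2, 0.9, and 0 at d = 0).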
Predictions
Compromise seen most strongly inside danger zone, with danger to one side of agent
More compromise for Linear B
Compromise around the edges for Sigmoid and Quadratic
More compromise with low p_d
More compromise with low p_t
p_t high, p_d high, and p_n(d) is Linear A:
[Figure "vf050330": x axis 0 to 90, y axis 0 to 100]
p_t high, p_d high, and p_n(d) is Linear B:
[Figure "vf050331c"]
p_t high, p_d low, and p_n(d) is Linear A:
[Figure "vf050331b": x axis 0 to 90, y axis 0 to 100]
p_t low, p_d low, and p_n(d) is Linear A:
[Figure "vf050331g": x axis 0 to 90, y axis 0 to 100]
p_t low, p_d low, and p_n(d) is Linear B:
[Figure "vf050331h": x axis 0 to 90, y axis 0 to 100]
p_t high, p_d high, and p_n(d) is Sigmoid:
[Figure "vf050331e": x axis 0 to 90, y axis 0 to 100]
Quantitative Results
scenario      optimal over active goal   optimal over skirt   skirt over active goal
all           29.6%                      0.1%                 29.1%
opposite      64.9%                      0.2%                 63.3%
danger zone   26.2%                      0.01%                26.1%
Summary
As predicted
  More compromise for Linear B
  Compromise at edges for Sigmoid and Quadratic
  Optimal compromise outperforms Active Goal.
Not predicted
  Compromise not seen inside danger zone at all in many cases.
  Compromise behind danger zone with high p_d.
  p_t has less effect than p_d.
  Optimal compromise does not outperform Skirt.
Why do we care?
Why not use optimal for our agents?
  Too slow
  Small benefit
Why not use a faster compromise strategy?
  Complicates system
  Small benefit
Blending vs. Voting
Compromise in experiments here resembles “blending” of actions
  Matches the descriptions in ethology literature
  Result is similar to the recommended actions of the two subgoals
Compromise often justified as voting scheme:
  Subgoal votes for top n actions from finite set
  Action with most votes selected
  Resulting actions different from best for each subgoal
Confusion from equivocation on definition of compromise action
  high vs. low level action
  high level actions: small, discrete set; amenable to voting
  low level actions: continuous, infinite set; result in blending
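The contrast between voting and blending can be made concrete with a short Python sketch. Both functions are illustrative assumptions, not mechanisms taken from the talk: `vote` counts each subgoal's top n actions from a finite set, and `blend` averages continuous headings as weighted unit vectors.

```python
import math
from collections import Counter


def vote(rankings, n):
    """High-level compromise: each subgoal votes for its top n actions.

    rankings: one list per subgoal, best action first.
    Ties are broken by the order in which actions were first counted.
    """
    votes = Counter()
    for ranking in rankings:
        votes.update(ranking[:n])
    return votes.most_common(1)[0][0]


def blend(headings_deg, weights):
    """Low-level compromise: weighted vector average of headings in degrees."""
    x = sum(w * math.cos(math.radians(h)) for h, w in zip(headings_deg, weights))
    y = sum(w * math.sin(math.radians(h)) for h, w in zip(headings_deg, weights))
    return math.degrees(math.atan2(y, x))
```

With rankings `["A", "C", "B"]` and `["B", "C", "A"]` and n = 2, voting selects `"C"`, an action that is neither subgoal's first choice. Blending headings of 0 and 90 degrees with equal weights gives a heading of about 45 degrees, which stays close to both recommendations.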
Compromise Behavior Hypothesis
1 Low level compromise action less useful than high level compromise
2 Higher the decision level, the more useful is compromise
    1 At low levels, compromise actions similar to non-compromise actions
    2 At high levels, compromise actions can be very different from non-compromise actions
    3 In complex environments, optimal or even very good non-optimal low-level actions are prohibitively difficult to calculate
You want to compromise in the selection of the high-level
Future Work
Test against non-optimal compromise behaviors
Test the compromise behavior hypothesis
Conclusion
In prescriptive goal scenarios
  Optimal compromise marginally useful
  Sub-optimal compromise harmful
In proscriptive goal scenarios
  Optimal compromise behavior is different from what we expected
  Less beneficial than expected, and only in some situations
Equivocation on definition of compromise action
  high level vs low level actions
Compromise Behavior hypothesis may explain what is going on