Integrating Learning in a Multi-Scale Agent
Ben Weber
Dissertation Defense
May 18, 2012
expressiveintelligencestudio UC Santa Cruz
Introduction
AI has a long history of using games to advance the state of the field
[Shannon 1950]
Real-Time Strategy Games
Building human-level AI for RTS games remains an open research challenge
StarCraft II, Blizzard Entertainment
Task Environment Properties
Property                        Chess          StarCraft       Taxi Driving
Fully vs. partially observable  Fully          Partially       Partially
Deterministic vs. stochastic    Deterministic  Deterministic*  Stochastic
Episodic vs. sequential         Sequential     Sequential      Sequential
Static vs. dynamic              Static         Dynamic         Dynamic
Discrete vs. continuous         Discrete       Continuous      Continuous
Single vs. multiagent           Multi          Multi           Multi
[Russell & Norvig 2009]
Motivation
RTS games present complex environments and complex tasks
Professional players demonstrate a broad range of reasoning capabilities
Human behavior can be observed, emulated, and evaluated
[Langley 2011, Mateas 2002]
Hypothesis
Reproducing expert-level StarCraft gameplay involves integrating heterogeneous reasoning capabilities
Research Questions
What competencies are necessary for expert StarCraft gameplay?
Which competencies can be learned from demonstrations?
How can these competencies be integrated in a real-time agent?
Overview
StarCraft
Multi-Scale AI
Learning from Demonstration
Integrating Learning
Evaluation
StarCraft
Expert gameplay
300+ APM
Evolving meta-game
Exhibited capabilities
Estimation
Anticipation
Adaptation
[Flash, Pro-gamer]
StarCraft Gameplay
Expand Tech Tree
Manage Economy
Produce Units
Attack Opponent
Gameplay Scales in StarCraft
Individual
Squad
Global
Examples: supporting a siege line, worker harassment, aggressive mine placement
State Space
The following number of states are possible, considering only unit type and location:
(Type × X × Y)^Units
States on a 256×256 tile map:
(100 × 256 × 256)^1700 > 10^11,500
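As a sanity check on this bound, the exponent can be computed directly. A small Python sketch, using the slide's constants (100 unit types, a 256×256 tile map, 1700 units):

```python
import math

# Lower bound on the StarCraft state space, counting only each unit's
# type and tile position: (Type * X * Y) ** Units.
def state_space_log10(unit_types=100, width=256, height=256, units=1700):
    """Base-10 logarithm of (unit_types * width * height) ** units."""
    return units * math.log10(unit_types * width * height)

# The exponent exceeds 11,500, i.e. (100 * 256 * 256) ** 1700 > 10 ** 11,500
print(round(state_space_log10()))  # 11588
```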
Decision Complexity
The set of possible actions that can be executed at a particular moment:
O(2^W(A × P) + 2^T(D + S) + B(R + C))
W – number of workers
A – number of worker assignment types
P – average number of workspaces
T – number of troops
D – number of movement directions
[Aha et al. 2005]
Decision Complexity
The set of possible actions that can be executed at a particular moment:
O(W * A * P + T * D * S + B(R + C))
Assumption
Unit actions can be selected independently
Resulting complexity:
Assuming 50 worker units on a 256x256 tile map results in more than 1,000,000 possible actions
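The "more than 1,000,000 possible actions" figure can be reproduced under one illustrative assumption (mine, not the slide's): that each worker can independently be ordered to move to any tile of the map.

```python
# Hedged sketch: if each of 50 workers can independently be ordered to
# move to any tile on a 256x256 map, the count of distinct worker orders
# alone already exceeds one million.
def worker_move_actions(workers=50, width=256, height=256):
    return workers * width * height

print(worker_move_actions())  # 3276800
```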
StarCraft
Complex gameplay
Real-world properties
Highly competitive
Sources of expert gameplay
Research Question #1
What competencies are necessary for expert StarCraft gameplay?
Multi-Scale AI
Multiple scales
Actions are performed across multiple levels of coordination
Interrelated tasks
Performance in each task impacts the other tasks
Real-time
Actions are performed in real time
Reactive Planning
Provides useful mechanisms for building multi-scale agents
Advantages
Efficient behavior selection
Interleaved plan expansion and execution
Disadvantages
Lacks deliberative capabilities
[Loyall 1997, Mateas 2002]
Agent Design
Implemented in the ABL reactive planning language
Architecture
Extension of McCoy & Mateas integrated agent framework
Partitions gameplay into distinct competencies
Uses a blackboard for coordination
[McCoy & Mateas 2008]
EISBot Managers
Strategy Manager
Income Manager – Gather Resources
Production Manager – Construct Buildings
Tactics Manager – Attack Opponent
Recon Manager – Scout Opponent
Multi-Scale Idioms
Design patterns for authoring multi-scale AI
Idioms
Message passing
Daemon behaviors
Managers
Unit subtasks
Behavior locking
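EISBot's idioms are written in the ABL reactive planning language; as a rough Python analogue (the Blackboard class and WME dictionaries below are illustrative, not EISBot's actual API), two of the idioms — message passing over a shared blackboard and a daemon behavior that fires when a matching working-memory element (WME) appears — can be sketched as:

```python
class Blackboard:
    """Shared working memory: managers post WMEs; daemon behaviors
    subscribe with a condition and fire when a matching WME appears."""
    def __init__(self):
        self.wmes, self.daemons = [], []

    def add_daemon(self, condition, behavior):
        self.daemons.append((condition, behavior))

    def post(self, wme):
        self.wmes.append(wme)
        for condition, behavior in self.daemons:
            if condition(wme):
                behavior(wme)

board = Blackboard()
attacks = []
# Tactics-side daemon: react whenever strategy posts a timing-attack WME
board.add_daemon(lambda w: w.get("type") == "TimingAttackWME",
                 lambda w: attacks.append(w["target"]))
# The strategy manager posts the message; the daemon fires immediately
board.post({"type": "TimingAttackWME", "target": "enemy_natural"})
print(attacks)  # ['enemy_natural']
```

Behavior locking and unit subtasks would build on the same structure, e.g. by tagging WMEs with the behavior that owns a unit.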
Idioms in EISBot
[Diagram: EISBot behavior tree rooted at Initial_tree, spanning the Tactics, Strategy, and Income Managers. Subgoals (Form Squad, Squad Attack, Squad Retreat, Attack Enemy, Pump Probes), daemon behaviors (Squad Monitor, Dragoon Dance), and message passing via WMEs (Timing Attack WME, Probe Stop WME) connect behaviors across the managers.]
Multi-Scale AI
StarCraft gameplay is multi-scale
Reactive planning provides mechanisms for multi-scale reasoning
Idioms are applied in EISBot to support StarCraft gameplay
Research Question #2
Which competencies can be learned from demonstrations?
Learning from Demonstration
Objective
Emulate capabilities exhibited by expert players by harnessing gameplay demonstrations
Methods
Classification and regression model training
Case-based goal formulation
Parameter selection for model optimization
Strategy Prediction
Tasks
Identify opponent build orders
Predict when buildings will be constructed
[Chart: Spawning Pool timing over game time (minutes)]
[Hsieh & Sun 2008]
Approach
Feature encoding
Each player’s actions are encoded in a single vector
Vectors are labeled using a build-order rule set
Features describe the game cycle when a unit or building type is first produced by a player
f(x) = { t, the time when x is first produced by P
       { 0, if x was not (yet) produced by P
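A sketch of this encoding in Python (the event format is an assumption, not from the slides): each replay is reduced to a vector of first-production cycles.

```python
def encode_replay(actions, unit_types):
    """Encode one player's replay as a feature vector: for each unit or
    building type, the game cycle when the player first produces it, or
    0 if the type has not (yet) been produced.
    `actions` is a list of (cycle, unit_type) production events."""
    first = {}
    for cycle, unit in actions:
        if unit not in first or cycle < first[unit]:
            first[unit] = cycle
    return [first.get(u, 0) for u in unit_types]

# Example: a Protoss player builds a Pylon at cycle 120 and Gateways at
# cycles 300 and 450; only the first Gateway matters.
vector = encode_replay([(120, "Pylon"), (300, "Gateway"), (450, "Gateway")],
                       ["Pylon", "Gateway", "Cybernetics Core"])
print(vector)  # [120, 300, 0]
```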
Strategy Prediction Results
[Chart: precision and recall vs. game time (minutes) for the NNge, Boosting, Rule Set, and State Lattice predictors]
Strategy Learning
Task
Learn build-orders from demonstration
Trace Algorithm
Converts replays to a trace representation
Formulates goals based on most similar situation
q = argmin_{c ∈ L} distance(s, c)
g = s + (q′ − q)
[Ontañón et al. 2010]
Trace Retrieval: Example
Consider a planning window of size 2
S =< 3, 0, 1, 1 >
T1 =< 2, 0, 0.5, 1 >
T2 =< 3, 0, 0.7, 1 >
T3 =< 4, 1, 0.9, 1 >
T4 =< 4, 1, 1.1, 2 >
Trace Retrieval: Step 1
The system retrieves the case most similar to the current state: q = T2
Trace Retrieval: Step 2
The case one planning window ahead of q is retrieved: q′ = T4
Trace Retrieval: Step 3
The difference is computed: q′ − q = T4 − T2 = <1, 1, 0.4, 1>
Trace Retrieval: Step 4
g is computed:
g = s + (T4 – T2) = <4, 1, 1.4, 2>
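The four retrieval steps collapse into one function. A sketch that reproduces the worked example, using Euclidean distance as the similarity metric (an assumption — the slides do not name the metric):

```python
import math

def formulate_goal(s, trace, window=2):
    """Case-based goal formulation: retrieve the trace case q most
    similar to the current state s, look ahead by the planning window
    to q', and return the goal g = s + (q' - q)."""
    # only consider cases that still have a look-ahead case in the trace
    i = min(range(len(trace) - window), key=lambda j: math.dist(s, trace[j]))
    q, q_ahead = trace[i], trace[i + window]
    return [sv + (av - qv) for sv, qv, av in zip(s, q, q_ahead)]

s = [3, 0, 1, 1]
trace = [[2, 0, 0.5, 1], [3, 0, 0.7, 1], [4, 1, 0.9, 1], [4, 1, 1.1, 2]]
# q = T2, q' = T4, so g = s + (T4 - T2) = <4, 1, 1.4, 2> (up to float rounding)
print(formulate_goal(s, trace))
```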
Strategy Learning Results
[Chart: prediction error (RMSE) vs. actions performed by the player, for the Null, IB1, Trace, and MultiTrace models; opponent modeling with a window size of 20]
State Estimation
Task
Estimate enemy positions given prior observations
Particle Model
Apply movement model
Remove visible particles
Reweight particles
[Thrun 2002, Bererton 2004]
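A minimal sketch of one particle-model update (the data layout is my assumption; EISBot's actual model weights particles using learned trajectory parameters):

```python
def update_particles(particles, visible_tiles, decay=0.95):
    """One update of a simple particle model for estimating enemy
    positions: apply the movement model, remove particles that fall in
    tiles the agent can currently see (no enemy was observed there),
    and decay the weights of the survivors.
    Each particle is (x, y, dx, dy, weight); visible_tiles is a set of
    (x, y) tiles currently in the agent's vision."""
    survivors = []
    for x, y, dx, dy, w in particles:
        x, y = x + dx, y + dy                        # movement model
        if (int(x), int(y)) in visible_tiles:        # contradicted by vision
            continue                                 # remove the particle
        survivors.append((x, y, dx, dy, w * decay))  # reweight
    return survivors

# A particle drifting into the agent's vision is dropped; the hidden one
# survives with a decayed weight.
out = update_particles([(0, 0, 1, 0, 1.0), (5, 5, 0, 0, 1.0)], {(1, 0)})
print(out)  # [(5, 5, 0, 0, 0.95)]
```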
Parameter Selection
Free parameters
Trajectory weights
Decay rates
State estimation is represented as an optimization problem
Input: parameter weights
Output: particle model error
Replays are used to implement a particle model error function
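With replays providing the error function, parameter selection reduces to black-box minimization. A sketch using random search (the parameter names and ranges here are illustrative, not the dissertation's actual values):

```python
import random

def optimize_parameters(error_fn, n_samples=200, seed=0):
    """Treat state estimation as a black-box optimization problem:
    sample candidate parameter settings and keep the one with the
    lowest particle-model error. `error_fn` stands in for the error
    function implemented over replays."""
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(n_samples):
        params = {"trajectory_weight": rng.uniform(0.0, 1.0),
                  "decay_rate": rng.uniform(0.8, 1.0)}
        err = error_fn(params)
        if err < best_err:
            best, best_err = params, err
    return best, best_err
```

Any other black-box optimizer (hill climbing, CMA-ES, grid search) slots in the same way, since only parameter weights go in and a scalar error comes out.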
State Estimation Results
[Chart: threat prediction error vs. game time (minutes) for the Null Model, Perfect Tracker, Default Model, and Optimized Model]
Learning from Demonstration
Anticipation
Classification and regression models
Adaptation
Case-based goal formulation
Estimation
Model optimization
Research Question #3
How can these competencies be integrated in a real-time agent?
Agent Architecture
Integration Approaches
Augmenting working memory
External plan generation
External goal formulation
[Diagram: external components interfacing with the agent's working memory]
Augmenting Working Memory
Supplementing working memory with additional beliefs
External Plan Generation
Generating plans outside the scope of ABL
External Goal Formulation
Formulating goals outside the scope of ABL
Goal-Driven Autonomy
A framework for building self-introspective agents
GDA agents monitor plan execution, detect discrepancies, and explain failures
Implementations
Hand-authored rules
Case-based reasoning
[Molineaux et al. 2010, Muñoz-Avila et al. 2010]
GDA Subtasks
Expectation generation
Discrepancy detection
Explanation generation
Goal formulation
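These subtasks compose into a monitoring loop. A schematic Python version (the expectation/state dictionaries are my simplification of EISBot's WMEs):

```python
def gda_step(state, expectation, formulate_goal, explain):
    """One pass of the GDA cycle: detect discrepancies between the
    current state and the active expectation; on a discrepancy, generate
    an explanation and formulate a new goal. Returns None when the
    expectation holds (no goal change)."""
    discrepancies = {k: state.get(k) for k, v in expectation.items()
                     if state.get(k) != v}
    if not discrepancies:
        return None
    explanation = explain(discrepancies)
    return formulate_goal(explanation)

# Example: the agent expected one enemy expansion but observes two.
expectation = {"enemy_expansions": 1}
goal = gda_step({"enemy_expansions": 2}, expectation,
                formulate_goal=lambda e: "pressure_expansion",
                explain=lambda d: "opponent expanding faster than predicted")
print(goal)  # pressure_expansion
```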
Implementation
Integrating Learning
ABL agents can be interfaced with external learning components
Applying the GDA model enabled tighter coordination across capabilities
EISBot incorporates ABL behaviors, a particle model, and a GDA implementation
Evaluation
Claim
Reproducing expert-level StarCraft gameplay involves integrating heterogeneous reasoning capabilities
Experiments
Ablation studies
User study
GDA Ablation Study
Agent configurations
Base
Formulator
Predictor
GDA
Free parameters
Planning window size
Look-ahead window size
Discrepancy period
[Diagram: GDA pipeline — Discrepancy Detector → Explanation Generator → Goal Formulator → Goal Manager, passing discrepancies, explanations, and goals]
GDA Results
Overall results from the GDA experiments
Agent       Win Ratio
Base        0.73
Formulator  0.77
Predictor   0.81
GDA         0.92
User Study
Experiment setup
Matches hosted on ICCup
3 trials
Testing script
1. Launch StarCraft
2. Connect to server
3. Host match
4. Announce experiment
[Dennis Fong, Pro-gamer]
Performance on Tau Cross
[Chart: ICCup score vs. number of games played on Tau Cross, for the Base, Formulator, Predictor, and GDA agents]
ICCup Results
Agent       Longinus  Python  Tau Cross  Overall
Base        942       599     669        737
Formulator  980       718     1078       925
Predictor   1111      555     1145       937
GDA         952       860     1293       1035
EISBot Ranking
Rankings achieved by the complete GDA agent
Trial       Percentile Ranking
Longinus    32nd
Python      8th
Tau Cross   66th
Average     48th
Evaluation
Ablation Studies
Optimized particle model
Complete GDA model
Integrating additional capabilities into EISBot improved performance
EISBot performed at the level of a competitive amateur StarCraft player
Conclusion
Objective
Identify and realize capabilities necessary for expert-level StarCraft gameplay in an agent
Approach
Decompose gameplay
Learn capabilities from demonstrations
Integrate learned gameplay models
Evaluate versus humans and agents
Contributions
Idioms for authoring multi-scale agents
Methods for learning from demonstration
Integration approaches for ABL agents
Integrating Learning in a Multi-Scale Agent
Ben G. Weber
Ph.D. Candidate
Expressive Intelligence Studio
UC Santa Cruz
bweber@soe.ucsc.edu
Funding
NSF Grant IIS – 1018954
References
Aha, Molineaux, & Ponsen. 2005. “Learning to Win: Case-Based Plan Selection in a Real-Time Strategy Game”, Proceedings of ICCBR.
Bererton. 2004. “State Estimation for Game AI using Particle Filters”, Proceedings of the AAAI Workshop on Challenges in Game AI.
Hsieh & Sun. 2008. “Building a Player Strategy Model by Analyzing Replays of Real-Time Strategy Games”, Proceedings of IJCNN.
Langley. 2011. “Artificial Intelligence and Cognitive Systems”, AISB Quarterly.
Loyall. 1997. “Believable Agents: Building Interactive Personalities”, Ph.D. thesis, CMU.
Mateas. 2002. “Interactive Drama, Art and Artificial Intelligence”, Ph.D. thesis, CMU.
References
McCoy & Mateas. 2008. “An Integrated Agent for Playing Real-Time Strategy Games”, Proceedings of AAAI.
Molineaux, Klenk, Aha. 2010. “Goal-Driven Autonomy in a Navy Strategy Simulation”, Proceedings of AAAI.
Muñoz-Avila, Aha, Jaidee, Klenk, Molineaux. 2010. “Applying Goal Driven Autonomy to a Team Shooter Game”, Proceedings of FLAIRS.
Ontañón, Mishra, Sugandh, Ram. 2010. “On-line Case-Based Planning”, Computational Intelligence.
Russell & Norvig. 2009. Artificial Intelligence: A Modern Approach.
Shannon. 1950. “Programming a Computer for Playing Chess”, Philosophical Magazine.
Thrun. 2002. “Particle Filters in Robotics”, Proceedings of UAI.