Integrating Learning in a Multi-Scale Agent
Ben Weber
Dissertation Defense
May 18, 2012
expressiveintelligencestudio UC Santa Cruz
Introduction
AI has a long history of using games to advance the state of the field
[Shannon 1950]
Real-Time Strategy Games
Building human-level AI for RTS games remains an open research challenge
StarCraft II, Blizzard Entertainment
Task Environment Properties
Property                        Chess          StarCraft       Taxi Driving
Fully vs. partially observable  Fully          Partially       Partially
Deterministic vs. stochastic    Deterministic  Deterministic*  Stochastic
Episodic vs. sequential         Sequential     Sequential      Sequential
Static vs. dynamic              Static         Dynamic         Dynamic
Discrete vs. continuous         Discrete       Continuous      Continuous
Single vs. multiagent           Multi          Multi           Multi
[Russell & Norvig 2009]
Motivation
RTS games present complex environments and complex tasks
Professional players demonstrate a broad range of reasoning capabilities
Human behavior can be observed, emulated, and evaluated
[Langley 2011, Mateas 2002]
Hypothesis
Reproducing expert-level StarCraft gameplay involves integrating heterogeneous reasoning capabilities
Research Questions
What competencies are necessary for expert StarCraft gameplay?
Which competencies can be learned from demonstrations?
How can these competencies be integrated in a real-time agent?
Overview
StarCraft
Multi-Scale AI
Learning from Demonstration
Integrating Learning
Evaluation
StarCraft
Expert gameplay
300+ APM
Evolving meta-game
Exhibited capabilities
Estimation
Anticipation
Adaptation
[Flash, Pro-gamer]
StarCraft Gameplay
Expand Tech Tree
Manage Economy
Produce Units
Attack Opponent
Gameplay Scales in StarCraft
Individual
Squad
Global
Examples: supporting a siege line, worker harassment, aggressive mine placement
State Space
The following number of states are possible, considering only unit type and location:
(Type × X × Y)^Units
States on a 256×256 tile map:
(100 × 256 × 256)^1700 > 10^11,500
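As a sanity check on this bound, the exponent can be computed directly. A small Python sketch, using the slide's constants (100 unit types, a 256×256 tile map, 1700 units):

```python
import math

# Lower bound on the StarCraft state space, counting only each unit's
# type and tile position: (Type * X * Y) ** Units.
def state_space_log10(unit_types=100, width=256, height=256, units=1700):
    """Base-10 logarithm of (unit_types * width * height) ** units."""
    return units * math.log10(unit_types * width * height)

# The exponent exceeds 11,500, i.e. (100 * 256 * 256) ** 1700 > 10 ** 11,500
print(round(state_space_log10()))  # 11588
```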
Decision Complexity
The set of possible actions that can be executed at a particular moment:
O(2^W(A × P) + 2^T(D + S) + B(R + C))
W – number of workers
A – number of worker assignment types
P – average number of workspaces
T – number of troops
D – number of movement directions
[Aha et al. 2005]
Decision Complexity
The set of possible actions that can be executed at a particular moment:
O(W * A * P + T * D * S + B(R + C))
Assumption
Unit actions can be selected independently
Resulting complexity:
Assuming 50 worker units on a 256x256 tile map results in more than 1,000,000 possible actions
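The "more than 1,000,000 possible actions" figure can be reproduced under one illustrative assumption (mine, not the slide's): that each worker can independently be ordered to move to any tile of the map.

```python
# Hedged sketch: if each of 50 workers can independently be ordered to
# move to any tile on a 256x256 map, the count of distinct worker orders
# alone already exceeds one million.
def worker_move_actions(workers=50, width=256, height=256):
    return workers * width * height

print(worker_move_actions())  # 3276800
```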
StarCraft
Complex gameplay
Real-world properties
Highly competitive
Sources of expert gameplay
Research Question #1
What competencies are necessary for expert StarCraft gameplay?
Multi-Scale AI
Multiple scales
Actions are performed across multiple levels of coordination
Interrelated tasks
Performance in each task impacts the other tasks
Real-time
Actions are performed in real time
Reactive Planning
Provides useful mechanisms for building multi-scale agents
Advantages
Efficient behavior selection
Interleaved plan expansion and execution
Disadvantages
Lacks deliberative capabilities
[Loyall 1997, Mateas 2002]
Agent Design
Implemented in the ABL reactive planning language
Architecture
Extension of McCoy & Mateas integrated agent framework
Partitions gameplay into distinct competencies
Uses a blackboard for coordination
[McCoy & Mateas 2008]
EISBot Managers
Strategy Manager
Income Manager – Gather Resources
Production Manager – Construct Buildings
Tactics Manager – Attack Opponent
Recon Manager – Scout Opponent
Multi-Scale Idioms
Design patterns for authoring multi-scale AI
Idioms
Message passing
Daemon behaviors
Managers
Unit subtasks
Behavior locking
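EISBot's idioms are written in the ABL reactive planning language; as a rough Python analogue (the Blackboard class and WME dictionaries below are illustrative, not EISBot's actual API), two of the idioms — message passing over a shared blackboard and a daemon behavior that fires when a matching working-memory element (WME) appears — can be sketched as:

```python
class Blackboard:
    """Shared working memory: managers post WMEs; daemon behaviors
    subscribe with a condition and fire when a matching WME appears."""
    def __init__(self):
        self.wmes, self.daemons = [], []

    def add_daemon(self, condition, behavior):
        self.daemons.append((condition, behavior))

    def post(self, wme):
        self.wmes.append(wme)
        for condition, behavior in self.daemons:
            if condition(wme):
                behavior(wme)

board = Blackboard()
attacks = []
# Tactics-side daemon: react whenever strategy posts a timing-attack WME
board.add_daemon(lambda w: w.get("type") == "TimingAttackWME",
                 lambda w: attacks.append(w["target"]))
# The strategy manager posts the message; the daemon fires immediately
board.post({"type": "TimingAttackWME", "target": "enemy_natural"})
print(attacks)  # ['enemy_natural']
```

Behavior locking and unit subtasks would build on the same structure, e.g. by tagging WMEs with the behavior that owns a unit.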
Idioms in EISBot
[Diagram: EISBot behavior tree rooted at Initial_tree, spanning the Tactics, Strategy, and Income Managers. Subgoals (Form Squad, Squad Attack, Squad Retreat, Attack Enemy, Pump Probes), daemon behaviors (Squad Monitor, Dragoon Dance), and message passing via WMEs (Timing Attack WME, Probe Stop WME) connect behaviors across the managers.]
Multi-Scale AI
StarCraft gameplay is multi-scale
Reactive planning provides mechanisms for multi-scale reasoning
Idioms are applied in EISBot to support StarCraft gameplay
Research Question #2
Which competencies can be learned from demonstrations?
Learning from Demonstration
Objective
Emulate capabilities exhibited by expert players by harnessing gameplay demonstrations
Methods
Classification and regression model training
Case-based goal formulation
Parameter selection for model optimization
Strategy Prediction
Tasks
Identify opponent build orders
Predict when buildings will be constructed
[Chart: Spawning Pool timing over game time (minutes)]
[Hsieh & Sun 2008]
Approach
Feature encoding
Each player’s actions are encoded in a single vector
Vectors are labeled using a build-order rule set
Features describe the game cycle when a unit or building type is first produced by a player
f(x) = { t, the time when x is first produced by P
       { 0, if x was not (yet) produced by P
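A sketch of this encoding in Python (the event format is an assumption, not from the slides): each replay is reduced to a vector of first-production cycles.

```python
def encode_replay(actions, unit_types):
    """Encode one player's replay as a feature vector: for each unit or
    building type, the game cycle when the player first produces it, or
    0 if the type has not (yet) been produced.
    `actions` is a list of (cycle, unit_type) production events."""
    first = {}
    for cycle, unit in actions:
        if unit not in first or cycle < first[unit]:
            first[unit] = cycle
    return [first.get(u, 0) for u in unit_types]

# Example: a Protoss player builds a Pylon at cycle 120 and Gateways at
# cycles 300 and 450; only the first Gateway matters.
vector = encode_replay([(120, "Pylon"), (300, "Gateway"), (450, "Gateway")],
                       ["Pylon", "Gateway", "Cybernetics Core"])
print(vector)  # [120, 300, 0]
```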
Strategy Prediction Results
[Chart: precision and recall vs. game time (minutes) for the NNge, Boosting, Rule Set, and State Lattice predictors]
Strategy Learning
Task
Learn build-orders from demonstration
Trace Algorithm
Converts replays to a trace representation
Formulates goals based on most similar situation
q = argmin_{c ∈ L} distance(s, c)
g = s + (q′ − q)
[Ontañón et al. 2010]
Trace Retrieval: Example
Consider a planning window of size 2
S =< 3, 0, 1, 1 >
T1 =< 2, 0, 0.5, 1 >
T2 =< 3, 0, 0.7, 1 >
T3 =< 4, 1, 0.9, 1 >
T4 =< 4, 1, 1.1, 2 >
Trace Retrieval: Step 1
The system retrieves the case most similar to the current state: q = T2
Trace Retrieval: Step 2
The case one planning window ahead of q is retrieved: q′ = T4
Trace Retrieval: Step 3
The difference is computed: q′ − q = T4 − T2 = <1, 1, 0.4, 1>
Trace Retrieval: Step 4
g is computed:
g = s + (T4 – T2) = <4, 1, 1.4, 2>
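The four retrieval steps collapse into one function. A sketch that reproduces the worked example, using Euclidean distance as the similarity metric (an assumption — the slides do not name the metric):

```python
import math

def formulate_goal(s, trace, window=2):
    """Case-based goal formulation: retrieve the trace case q most
    similar to the current state s, look ahead by the planning window
    to q', and return the goal g = s + (q' - q)."""
    # only consider cases that still have a look-ahead case in the trace
    i = min(range(len(trace) - window), key=lambda j: math.dist(s, trace[j]))
    q, q_ahead = trace[i], trace[i + window]
    return [sv + (av - qv) for sv, qv, av in zip(s, q, q_ahead)]

s = [3, 0, 1, 1]
trace = [[2, 0, 0.5, 1], [3, 0, 0.7, 1], [4, 1, 0.9, 1], [4, 1, 1.1, 2]]
# q = T2, q' = T4, so g = s + (T4 - T2) = <4, 1, 1.4, 2> (up to float rounding)
print(formulate_goal(s, trace))
```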
Strategy Learning Results
[Chart: prediction error (RMSE) vs. actions performed by the player, for the Null, IB1, Trace, and MultiTrace models; opponent modeling with a window size of 20]
State Estimation
Task
Estimate enemy positions given prior observations
Particle Model
Apply movement model
Remove visible particles
Reweight particles
[Thrun 2002, Bererton 2004]
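A minimal sketch of one particle-model update (the data layout is my assumption; EISBot's actual model weights particles using learned trajectory parameters):

```python
def update_particles(particles, visible_tiles, decay=0.95):
    """One update of a simple particle model for estimating enemy
    positions: apply the movement model, remove particles that fall in
    tiles the agent can currently see (no enemy was observed there),
    and decay the weights of the survivors.
    Each particle is (x, y, dx, dy, weight); visible_tiles is a set of
    (x, y) tiles currently in the agent's vision."""
    survivors = []
    for x, y, dx, dy, w in particles:
        x, y = x + dx, y + dy                        # movement model
        if (int(x), int(y)) in visible_tiles:        # contradicted by vision
            continue                                 # remove the particle
        survivors.append((x, y, dx, dy, w * decay))  # reweight
    return survivors

# A particle drifting into the agent's vision is dropped; the hidden one
# survives with a decayed weight.
out = update_particles([(0, 0, 1, 0, 1.0), (5, 5, 0, 0, 1.0)], {(1, 0)})
print(out)  # [(5, 5, 0, 0, 0.95)]
```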
Parameter Selection
Free parameters
Trajectory weights
Decay rates
State estimation is represented as an optimization problem
Input: parameter weights
Output: particle model error
Replays are used to implement a particle model error function
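With replays providing the error function, parameter selection reduces to black-box minimization. A sketch using random search (the parameter names and ranges here are illustrative, not the dissertation's actual values):

```python
import random

def optimize_parameters(error_fn, n_samples=200, seed=0):
    """Treat state estimation as a black-box optimization problem:
    sample candidate parameter settings and keep the one with the
    lowest particle-model error. `error_fn` stands in for the error
    function implemented over replays."""
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(n_samples):
        params = {"trajectory_weight": rng.uniform(0.0, 1.0),
                  "decay_rate": rng.uniform(0.8, 1.0)}
        err = error_fn(params)
        if err < best_err:
            best, best_err = params, err
    return best, best_err
```

Any other black-box optimizer (hill climbing, CMA-ES, grid search) slots in the same way, since only parameter weights go in and a scalar error comes out.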
State Estimation Results
[Chart: threat prediction error vs. game time (minutes) for the Null Model, Perfect Tracker, Default Model, and Optimized Model]
Learning from Demonstration
Anticipation
Classification and regression models
Adaptation
Case-based goal formulation
Estimation
Model optimization
Research Question #3
How can these competencies be integrated in a real-time agent?
Agent Architecture
Integration Approaches
Augmenting working memory
External plan generation
External goal formulation
[Diagram: external components interfacing with the agent's working memory]
Augmenting Working Memory
Supplementing working memory with additional beliefs
External Plan Generation
Generating plans outside the scope of ABL
External Goal Formulation
Formulating goals outside the scope of ABL
Goal-Driven Autonomy
A framework for building self-introspective agents
GDA agents monitor plan execution, detect discrepancies, and explain failures
Implementations
Hand-authored rules
Case-based reasoning
[Molineaux et al. 2010, Muñoz-Avila et al. 2010]
GDA Subtasks
Expectation generation
Discrepancy detection
Explanation generation
Goal formulation
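These subtasks compose into a monitoring loop. A schematic Python version (the expectation/state dictionaries are my simplification of EISBot's WMEs):

```python
def gda_step(state, expectation, formulate_goal, explain):
    """One pass of the GDA cycle: detect discrepancies between the
    current state and the active expectation; on a discrepancy, generate
    an explanation and formulate a new goal. Returns None when the
    expectation holds (no goal change)."""
    discrepancies = {k: state.get(k) for k, v in expectation.items()
                     if state.get(k) != v}
    if not discrepancies:
        return None
    explanation = explain(discrepancies)
    return formulate_goal(explanation)

# Example: the agent expected one enemy expansion but observes two.
expectation = {"enemy_expansions": 1}
goal = gda_step({"enemy_expansions": 2}, expectation,
                formulate_goal=lambda e: "pressure_expansion",
                explain=lambda d: "opponent expanding faster than predicted")
print(goal)  # pressure_expansion
```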
Implementation
Integrating Learning
ABL agents can be interfaced with external learning components
Applying the GDA model enabled tighter coordination across capabilities
EISBot incorporates ABL behaviors, a particle model, and a GDA implementation
Evaluation
Claim
Reproducing expert-level StarCraft gameplay involves integrating heterogeneous reasoning capabilities
Experiments
Ablation studies
User study
GDA Ablation Study
Agent configurations
Base
Formulator
Predictor
GDA
Free parameters
Planning window size
Look-ahead window size
Discrepancy period
[Diagram: GDA pipeline — Discrepancy Detector → Explanation Generator → Goal Formulator → Goal Manager, passing discrepancies, explanations, and goals]
GDA Results
Overall results from the GDA experiments
Agent       Win Ratio
Base        0.73
Formulator  0.77
Predictor   0.81
GDA         0.92
User Study
Experiment setup
Matches hosted on ICCup
3 trials
Testing script
1. Launch StarCraft
2. Connect to server
3. Host match
4. Announce experiment
[Dennis Fong, Pro-gamer]
Performance on Tau Cross
[Chart: ICCup score vs. number of games played on Tau Cross, for the Base, Formulator, Predictor, and GDA agents]
ICCup Results
Agent       Longinus  Python  Tau Cross  Overall
Base        942       599     669        737
Formulator  980       718     1078       925
Predictor   1111      555     1145       937
GDA         952       860     1293       1035
EISBot Ranking
Rankings achieved by the complete GDA agent
Trial       Percentile Ranking
Longinus    32nd
Python      8th
Tau Cross   66th
Average     48th
Evaluation
Ablation Studies
Optimized particle model
Complete GDA model
Integrating additional capabilities into EISBot improved performance
EISBot performed at the level of a competitive amateur StarCraft player
Conclusion
Objective
Identify and realize capabilities necessary for expert-level StarCraft gameplay in an agent
Approach
Decompose gameplay
Learn capabilities from demonstrations
Integrate learned gameplay models
Evaluate versus humans and agents
Contributions
Idioms for authoring multi-scale agents
Methods for learning from demonstration
Integration approaches for ABL agents
Integrating Learning in a Multi-Scale Agent
Ben G. Weber
Ph.D. Candidate
Expressive Intelligence Studio
UC Santa Cruz
bweber@soe.ucsc.edu
Funding
NSF Grant IIS – 1018954
References
Aha, Molineaux, & Ponsen. 2005. “Learning to Win: Case-Based Plan Selection in a Real-Time Strategy Game”, Proceedings of ICCBR.
Bererton. 2004. “State Estimation for Game AI using Particle Filters”, Proceedings of the AAAI Workshop on Challenges in Game AI.
Hsieh & Sun. 2008. “Building a Player Strategy Model by Analyzing Replays of Real-Time Strategy Games”, Proceedings of IJCNN.
Langley. 2011. “Artificial Intelligence and Cognitive Systems”, AISB Quarterly.
Loyall. 1997. “Believable Agents: Building Interactive Personalities”, Ph.D. thesis, CMU.
Mateas. 2002. “Interactive Drama, Art and Artificial Intelligence”, Ph.D. thesis, CMU.
References
McCoy & Mateas. 2008. “An Integrated Agent for Playing Real-Time Strategy Games”, Proceedings of AAAI.
Molineaux, Klenk, Aha. 2010. “Goal-Driven Autonomy in a Navy Strategy Simulation”, Proceedings of AAAI.
Muñoz-Avila, Aha, Jaidee, Klenk, Molineaux. 2010. “Applying Goal Driven Autonomy to a Team Shooter Game”, Proceedings of FLAIRS.
Ontañón, Mishra, Sugandh, Ram. 2010. “On-line Case-Based Planning”, Computational Intelligence.
Russell & Norvig. 2009. Artificial Intelligence: A Modern Approach.
Shannon. 1950. “Programming a Computer for Playing Chess”, Philosophical Magazine.
Thrun. 2002. “Particle Filters in Robotics”, Proceedings of UAI.