
Copyright © 2014, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602

A Review of Real-Time Strategy Game AI

Glen Robertson, Ian Watson

This literature review covers AI techniques used for real-time strategy video games, focusing specifically on StarCraft. It finds that the main areas of current academic research are in tactical and strategic decision making, plan recognition, and learning, and it outlines the research contributions in each of these areas. The paper then contrasts the use of game AI in academe and industry, finding the academic research heavily focused on creating game-winning agents, while the industry aims to maximize player enjoyment. It finds that industry adoption of academic research is low because it is either inapplicable or too time-consuming and risky to implement in a new game, which highlights an area for potential investigation: bridging the gap between academe and industry. Finally, the areas of spatial reasoning, multiscale AI, and cooperation are found to require future work, and standardized evaluation methods are proposed to produce comparable results between studies.

Games are an ideal domain for exploring the capabilities of artificial intelligence (AI) within a constrained environment and a fixed set of rules, where problem-solving techniques can be developed and evaluated before being applied to more complex real-world problems (Schaeffer 2001). AI has notably been applied to board games, such as chess, Scrabble, and backgammon, creating competition that has sped the development of many heuristic-based search techniques (Schaeffer 2001). Over the past decade, there has been increasing interest in research based on video game AI, which was initiated by Laird and van Lent (2001) in their call for the use of video games as a test bed for AI research. They saw video games as a potential area for iterative advancement in increasingly sophisticated scenarios, eventually leading to the development of human-level AI. Buro (2003) later called for increased research in real-time strategy (RTS) games as they provide a sandbox for exploring various complex challenges that are central to game AI and many other problems.

Figure 1. A Typical Match Start in an RTS Game. Worker units have been sent to gather resources (right) and return them to the central building. Resources (recorded top right) are being spent building an additional worker (bottom center). Dark fog (left) blocks visibility away from player units.

Video games are an attractive alternative to robotics for AI research because they increasingly provide a complex and realistic environment for simulation, with few of the messy properties (and cost) of real-world equipment (Buro 2004; Laird and van Lent 2001). They also present a number of challenges that set them apart from the simpler board games that AI has famously been applied to in the past. Video games often have real-time constraints that prevent players from thinking extensively about each action, randomness that prevents players from completely planning future events, and hidden information that prevents players from knowing exactly what the other players are doing. Similar to many board games, competitive video games usually require adversarial reasoning to react according to other players' actions (Laird and van Lent 2001; Mehta et al. 2009; Weber, Mateas, and Jhala 2010).

RTS Games

This article is focused on real-time strategy games, which are essentially simplified military simulations. In an RTS game, a player indirectly controls many units and structures by issuing orders from an overhead perspective (figure 1) in real time in order to gather resources, build an infrastructure and an army, and destroy the opposing player's forces. The real-time aspect comes from the fact that players do not take turns, but instead may perform as many actions as they are physically able to make, while the game simulation runs at a constant frame rate (24 frames per second in StarCraft) to approximate a continuous flow of time. Some notable RTS games include Dune II, Total Annihilation, and the Warcraft, Command & Conquer, Age of Empires, and StarCraft series.

Generally, each match in an RTS game involves two players starting with a few units and/or structures in different locations on a two-dimensional terrain (map). Nearby resources can be gathered in order to produce additional units and structures and purchase upgrades, thus gaining access to more advanced in-game technology (units, structures, and upgrades). Additional resources and strategically important points are spread around the map, forcing players to spread out their units and buildings in order to attack or defend these positions. Visibility is usually limited to a small area around player-owned units, limiting information and forcing players to conduct reconnaissance in order to respond effectively to their opponents. In most RTS games, a match ends when one player (or team) destroys all buildings belonging to the opponent player (or team), although often players will forfeit earlier when they see they cannot win.

RTS games have a variety of military units, used by the players to wage war, as well as units and structures to aid in resource collection, unit production, and upgrades. During a match, players must balance the development of their economy, infrastructure, and upgrades with the production of military units, so they have enough units to successfully attack and defend in the present and enough resources and upgrades to succeed later. They must also decide which units and structures to produce and which technologies to advance throughout the game in order to have access to the appropriate composition of units at the appropriate times. This long-term high-level planning and decision making, often called macromanagement, is referred to in this article as strategic decision making. In addition to strategic decision making, players must carefully control their units in order to maximize their effectiveness on the battlefield. Groups of units can be maneuvered into advantageous positions on the map to surround or escape the enemy, and individual units can be controlled to attack a weak enemy unit or avoid an incoming attack. This short-term control and decision making with individual units, often called micromanagement, and medium-term planning with groups of units, often called tactics, is referred to collectively in this article as tactical decision making.

In addition to the general video game challenges mentioned above, RTS games involve long-term goals and usually require multiple levels of abstraction and reasoning. They have a vast space of actions and game states, with durative actions, a huge branching factor, and actions that can have long-term effects throughout the course of a match (Buro and Churchill 2012; Buro and Furtak 2004; Mehta et al. 2009; Ontañón 2012; Tozour 2002; Weber, Mateas, and Jhala 2010). Even compared with Go, which is currently an active area of AI research, RTS games present a huge increase in complexity — at least an order of magnitude increase in the number of possible game states, actions to choose from, actions per game, and actions per minute (using standard rules) (Buro 2004; Schaeffer 2001; Synnaeve and Bessière 2011b). The state space is so large that traditional heuristic-based search techniques, which have proven effective in a range of board games (Schaeffer 2001), have so far been unable to solve all but the most restricted subproblems of RTS AI. Due to their complexity and challenges, RTS games are probably the best current environment in which to pursue Laird and van Lent's vision of game AI as a stepping stone toward human-level AI. It is a particularly interesting area for AI research because even the best agents are outmatched by experienced humans (Huang 2011; Synnaeve and Bessière 2011a; Weber, Mateas, and Jhala 2010), due to the human abilities to abstract, reason, learn, plan, and recognize plans (Buro 2004; Buro and Churchill 2012).

StarCraft

This article primarily examines AI research within a subtopic of RTS games: the RTS game StarCraft1 (figure 2). StarCraft is a canonical RTS game, like chess is to board games, with a huge player base and numerous professional competitions. The game has three different but very well balanced teams, or races, allowing for varied strategies and tactics without any dominant strategy, and requires both strategic and tactical decision making roughly equally (Synnaeve and Bessière 2011b). These features give StarCraft an advantage over other RTS titles that are used for AI research, such as Wargus2 and ORTS.3

StarCraft was chosen because of its increasing popularity for use in RTS game AI research, driven by the Brood War application programming interface (BWAPI)4 and the AIIDE5 and CIG6 StarCraft AI Competitions. BWAPI provides an interface to programmatically interact with StarCraft, allowing external code to query the game state and execute actions as if they were a player in a match. The competitions pit StarCraft AI agents (or bots) against each other in full games of StarCraft to determine the best bots and improvements each year (Buro and Churchill 2012). Initially these competitions also involved simplified challenges based on subtasks in the game, such as controlling a given army to defeat an opponent with an equal army, but more recent competitions have used only complete matches. For more detail on StarCraft competitions and bots, see Ontañón et al. (in press).

In order to develop AI for StarCraft, researchers have tried many different techniques, as outlined in table 1. A community has formed around the game as a research platform, enabling people to build on each other's work and avoid repeating the necessary groundwork before an AI system can be implemented.

This work includes a terrain analysis module (Perkins 2010), well-documented source code for a complete, modular bot (Churchill and Buro 2012), and preprocessed data sets assembled from thousands of professional games (Synnaeve and Bessière 2012). StarCraft has a lasting popularity among professional and amateur players, including a large professional gaming scene in South Korea, with international competitions awarding millions of dollars in prizes every year (Churchill and Buro 2011). This popularity means that there are a large number of high-quality game logs (replays) available on the Internet that can be used for data mining, and there are many players of all skill levels to test against (Buro and Churchill 2012; Synnaeve and Bessière 2011b; Weber, Mateas, and Jhala 2011a).

This article presents a review of the literature on RTS AI with an emphasis on StarCraft. It includes particular research based on other RTS games in the case that significant literature based on StarCraft is not (yet) available in that area. The article begins by outlining the different AI techniques used, grouped by the area in which they are primarily applied. These areas are tactical decision making, strategic decision making, plan recognition, and learning. This is followed by a comparison of the way game AI is used in academe and the game industry, which outlines the differences in goals and discusses the low adoption of academic research in the industry. Finally, some areas are identified in which there does not seem to be sufficient research on topics that are well-suited to study in the context of RTS game AI. This last section also calls for standardization of the evaluation methods used in StarCraft AI research in order to make comparison possible between papers.

Tactical Decision Making

Figure 2. Part of a Player's Base in StarCraft. The white rectangle on the minimap (bottom left) is the area visible on screen. The minimap shows areas that are unexplored (black), explored but not visible (dark), and visible (light). It also shows the player's forces (lighter dots) and last-seen enemy buildings (darker dots).

Tactical and micromanagement decisions — controlling individual units or groups of units over a short period of time — often make use of a different technique from the AI that makes strategic decisions. These tactical decisions can follow a relatively simple metric, such as attempting to maximize the amount of enemy firepower that can be removed from the playing field in the shortest time (Davis 1999). In the video game industry, it is common for simple techniques, such as finite state machines, to be used to make these decisions (Buckland 2005). However, even in these small-scale decisions, many factors can be considered to attempt to make the best decisions possible, particularly when using units with varied abilities (figure 3), but the problem space is not nearly as large as that of the full game, making feasible exploratory approaches to learning domain knowledge (Weber and Mateas 2009). There appears to be less research interest in this aspect of RTS game AI than in the area of large-scale, long-term strategic decision making and learning.

Reinforcement Learning

Reinforcement learning (RL) is an area of machine learning in which an agent must learn, by trial and error, optimal actions to take in particular situations in order to maximize an overall reward value (Sutton and Barto 1998). Through many iterations of weakly supervised learning, RL can discover new solutions that are better than previously known solutions. It is relatively simple to apply to a new domain, as it requires only a description of the situation and possible actions, and a reward metric (Manslow 2004). However, in a domain as complex as an RTS game — even just for tactical decision making — RL often requires clever state abstraction mechanisms in order to learn effectively. This technique is not commonly used for large-scale strategic decision making, but is often applied to tactical decision making in RTS games, likely because of the huge problem space and delayed reward inherent in strategic decisions, which make RL difficult.

Figure 3. A Battle in StarCraft. Intense micromanagement is required to maximize the effectiveness of individual units, especially spellcaster units like the Protoss Arbiter.

RL has been applied to StarCraft by Shantia, Begue, and Wiering (2011), where Sarsa, an algorithm for solving RL problems, is used to learn to control units in small skirmishes. They made use of artificial neural networks to learn the expected reward for attacking or fleeing with a particular unit in a given state (figure 4), and chose the action with the highest expected reward when in-game. The system learned to beat the inbuilt StarCraft AI scripting on average in only small three-unit skirmishes, with none of the variations learning to beat the inbuilt scripting on average in six-unit skirmishes (Shantia, Begue, and Wiering 2011).
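As a rough sketch of this kind of setup, the example below pairs an epsilon-greedy Sarsa update with a small linear approximator (standing in for the neural networks used in that work) to estimate the expected reward of attacking or fleeing from a unit's local state. The state features, constants, and reward values are invented for illustration rather than taken from the original paper.

```python
import numpy as np

ACTIONS = ("attack", "flee")             # abstract tactical choices
ALPHA, GAMMA, EPSILON = 0.05, 0.9, 0.1   # illustrative constants, not from the paper

# One weight vector per action: a linear stand-in for the paper's neural networks.
weights = {a: np.zeros(4) for a in ACTIONS}

def features(unit_hp, enemy_hp, allies_near, enemies_near):
    """Hypothetical state features describing a single unit's local situation."""
    return np.array([unit_hp, enemy_hp, allies_near, enemies_near], dtype=float)

def q_value(state, action):
    """Estimated expected reward of taking the action in the given state."""
    return float(weights[action] @ state)

def choose_action(state, rng):
    """Epsilon-greedy selection over the expected-reward estimates."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_value(state, a))

def sarsa_update(state, action, reward, next_state, next_action):
    """On-policy Sarsa: move Q(s, a) toward r + gamma * Q(s', a')."""
    target = reward + GAMMA * q_value(next_state, next_action)
    td_error = target - q_value(state, action)
    weights[action] += ALPHA * td_error * state

# Minimal usage with made-up transitions from a skirmish.
rng = np.random.default_rng(0)
s = features(0.8, 0.6, 2, 3)
a = choose_action(s, rng)
s2 = features(0.7, 0.4, 2, 3)
a2 = choose_action(s2, rng)
sarsa_update(s, a, reward=0.2, next_state=s2, next_action=a2)
```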

RL techniques have also been applied to other RTS games. Sharma et al. (2007) and Molineaux, Aha, and Moore (2008) combine case-based reasoning (CBR) and RL for learning tactical-level unit control in MadRTS7 (a description of CBR is presented later on in this article). Sharma et al. (2007) was able to increase the learning speed of the RL agent by beginning learning in a simple situation and then gradually increasing the complexity of the situation. The resulting performance of the agent was the same or better than an agent trained in the complex situation directly.

Their system stores its knowledge in cases that pertain to situations it has encountered before, as in CBR. However, each case stores the expected utility for every possible action in that situation as well as the contribution of that case to a reward value, allowing the system to learn desirable actions and situations. It remains to be seen how well it would work in a more complex domain.

Molineaux, Aha, and Moore (2008) describe a system for RL with nondiscrete actions. Their system retrieves similar cases from past experience and estimates the result of applying each case's actions to the current state. It then uses a separate case base to estimate the value of each estimated resulting state, and extrapolates around, or interpolates between, the actions to choose one that is estimated to provide the maximum value state. This technique results in a significant increase in performance when compared with one using discrete actions (Molineaux, Aha, and Moore 2008).

Figure 4. Game State Information Fed into a Neural Network to Produce an Expected Reward Value for a Particular Action. Adapted from Shantia, Begue, and Wiering (2011).

Table 1. AI Techniques Used for StarCraft.
Tactical Decision Making: Reinforcement Learning, Game-Tree Search, Bayesian models, Case-Based Reasoning, Neural Networks.
Strategic Decision Making and Plan Recognition: Case-Based Planning, Hierarchical Planning, Behavior Trees, Goal-Driven Autonomy, State Space Planning, Evolutionary Algorithms, Cognitive Architectures, Deductive Reasoning, Probabilistic Reasoning, Case-Based Reasoning.

Human critique is added to RL by Judah et al. (2010) in order to learn tactical decision making for controlling a small group of units in combat in Wargus. By interleaving sessions of autonomous state space exploration and human critique of the agent's actions, the system was able to learn a better policy in a fraction of the training iterations compared with using RL alone. However, slightly better overall results were achieved using human critique only to train the agent, possibly due to humans giving better feedback when they can see an immediate result (Judah et al. 2010).

Marthi et al. (2005) argues that it is preferable to decrease the apparent complexity of RTS games and potentially increase the effectiveness of RL or other techniques by decomposing the game into a hierarchy of interacting parts. Using this method, instead of coordinating a group of units by learning the correct combination of unit actions, each unit can be controlled individually with a higher-level group control affecting each individual's decision. Similar hierarchical decomposition appears in many RTS AI approaches because it reduces complexity from a combinatorial combination of possibilities — in this case, possible actions for each unit — down to a multiplicative combination.

Game-Tree Search

Search-based techniques have so far been unable to deal with the complexity of the long-term strategic aspects of RTS games, but they have been successfully applied to smaller-scale or abstracted versions of RTS combat. To apply these search methods, a simulator is usually required to allow the AI system to evaluate the results of actions very rapidly in order to explore the game tree.

Sailer, Buro, and Lanctot (2007) take a game theoretic approach by searching for the Nash equilibrium strategy among a set of known strategies in a simplified RTS. Their simplified RTS retains just the tactics aspect of RTS games by concentrating on unit group movements, so it does not require long-term planning for building infrastructure and also excludes micromanagement for controlling individual units. They use a simulation to compare the expected outcome from using each of the strategies against their opponent, for each of the strategies their opponent could be using (which is drawn from the same set), and select the Nash-optimal strategy. The simulation can avoid simulating every time step, skipping instead to just the states in which something interesting happens, such as a player making a decision, or units coming into firing range of opponents. Through this combination of abstraction, state skipping, and needing to examine only the possible moves prescribed by a pair of known strategies at a time, it is usually possible to search all the way to an end-game state very rapidly, which in turn means a simple evaluation function can be used. The resulting Nash player was able to defeat each of the scripted strategies, as long as the set included a viable counterstrategy for each strategy, and it also produced better results than the max-min and min-max players (Sailer, Buro, and Lanctot 2007).
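The core of this game-theoretic selection can be sketched as follows, assuming a simulate(i, j) function that plays strategy i against strategy j and returns a score for the first player: fill a payoff matrix for every strategy pair, then solve the resulting zero-sum matrix game by linear programming to obtain a Nash-optimal mixed strategy. The solver call and the toy payoffs below are illustrative; the actual simulation and abstractions in Sailer, Buro, and Lanctot (2007) are considerably more involved.

```python
import numpy as np
from scipy.optimize import linprog

def payoff_matrix(strategies, simulate):
    """Expected score of row strategy i versus column strategy j, from simulation."""
    n = len(strategies)
    return np.array([[simulate(strategies[i], strategies[j]) for j in range(n)]
                     for i in range(n)])

def nash_mixed_strategy(payoffs):
    """Solve the zero-sum matrix game: maximize v subject to A^T x >= v, sum(x) = 1, x >= 0.

    Variables are [x_0 .. x_{n-1}, v]; linprog minimizes, so we minimize -v.
    """
    n = payoffs.shape[0]
    c = np.zeros(n + 1)
    c[-1] = -1.0                                   # minimize -v, i.e. maximize v
    a_ub = np.hstack([-payoffs.T, np.ones((payoffs.shape[1], 1))])  # v - x.A[:, j] <= 0
    b_ub = np.zeros(payoffs.shape[1])
    a_eq = np.ones((1, n + 1))
    a_eq[0, -1] = 0.0                              # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n + [(None, None)]
    res = linprog(c, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]                    # mixed strategy and game value

# Toy example: three hypothetical army-movement scripts with hand-made payoffs.
toy_payoffs = np.array([[0.0, 1.0, -1.0],
                        [-1.0, 0.0, 1.0],
                        [1.0, -1.0, 0.0]])
mix, value = nash_mixed_strategy(toy_payoffs)
print(mix, value)   # roughly uniform for this rock-paper-scissors payoff
```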

Search-based techniques are particularly difficult to use in StarCraft because of the closed-source nature of the game and inability to arbitrarily manipulate the game state. This means that the precise mechanics of the game rules are unclear, and the game cannot be easily set up to run from a particular state to be used as a simulator. Furthermore, the game must carry out expensive calculations such as unit vision and collisions, and cannot be forced to skip ahead to just the interesting states, making it too slow for the purpose of search (Churchill, Saffidine, and Buro 2012). In order to overcome these problems, Churchill, Saffidine, and Buro (2012) created a simulator called SparCraft8 that models StarCraft and approximates the rules, but allows the state to be arbitrarily manipulated and unnecessary expensive calculations to be ignored (including skipping uninteresting states). Using this simulator and a modified version of alpha-beta search, which takes into consideration actions of differing duration, they could find effective moves for a given configuration of units. Search time was limited to approximate real-time conditions, so the moves found were not optimal. This search allowed them to win an average of 92 percent of randomized balanced scenarios against all of the standard scripted strategies they tested against within their simulator (Churchill, Saffidine, and Buro 2012).
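A plain alpha-beta skeleton over an assumed simulator interface is sketched below to show where such a simulator fits; note that the modified algorithm used in that work additionally handles simultaneous and durative moves, which this classical version does not. The sim object, its methods, and the time budget are assumptions for illustration.

```python
import time

def alpha_beta(state, depth, alpha, beta, deadline, sim):
    """Classical alpha-beta over an assumed simulator interface (sim).

    sim must provide: terminal(state), evaluate(state), moves(state),
    apply(state, move) -> new state, and to_move(state) in {+1, -1}, with
    evaluate() scored from the maximizing player's perspective.
    """
    if depth == 0 or sim.terminal(state) or time.time() > deadline:
        return sim.evaluate(state), None
    best_move = None
    maximizing = sim.to_move(state) == +1
    for move in sim.moves(state):
        value, _ = alpha_beta(sim.apply(state, move), depth - 1,
                              alpha, beta, deadline, sim)
        if maximizing and value > alpha:
            alpha, best_move = value, move
        elif not maximizing and value < beta:
            beta, best_move = value, move
        if alpha >= beta:
            break                      # cutoff: remaining moves cannot matter
    return (alpha if maximizing else beta), best_move

def search_best_move(root, sim, time_budget_s=0.005, max_depth=8):
    """Iterative deepening under a per-frame time budget (5 ms in the paper)."""
    deadline = time.time() + time_budget_s
    best = None
    for depth in range(1, max_depth + 1):
        if time.time() > deadline:
            break
        _, move = alpha_beta(root, depth, float("-inf"), float("inf"), deadline, sim)
        if move is not None:
            best = move
    return best
```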

Despite working very well in simulation, the results do not translate perfectly back to the actual game of StarCraft, due to simplifications, such as the lack of unit collisions and acceleration, that affect the outcome (Churchill and Buro 2012; Churchill, Saffidine, and Buro 2012). The system was able to win only 84 percent of scenarios against the built-in StarCraft AI despite the simulation predicting 100 percent, faring the worst in scenarios that were set up to require hit-and-run behavior (Churchill and Buro 2012). The main limitation of this system is that due to the combinatorial explosion of possible actions and states as the number of units increases, the number of possible actions in StarCraft, and a time constraint of 5 ms per game frame, the search will only allow up to eight units per side in a two-player battle before it is too slow. On the other hand, better results may be achieved through opponent modeling, because the search can incorporate known opponent actions instead of searching through all possible opponent actions. When this was tested on the scripted strategies with a perfect model of each opponent (the scripts themselves), the search was able to achieve at least a 95 percent win rate against each of the scripts in simulation (Churchill, Saffidine, and Buro 2012).

Monte Carlo Planning

Monte Carlo planning has received significant attention recently in the field of computer Go, but seems to be almost absent from RTS AI, and (to the authors' knowledge) completely untested in the domain of StarCraft. It involves sampling the decision space using randomly generated plans in order to find out which plans tend to lead to more successful outcomes. It may be very suitable for RTS games because it can deal with uncertainty, randomness, large decision spaces, and opponent actions through its sampling mechanism. Monte Carlo planning has likely not yet been applied to StarCraft due to the unavailability of an effective simulator, as was the case with the search methods above, as well as the complexity of the domain. However, it has been applied to some very restricted versions of RTS games. Although both of the examples seen here are considering tactical- and unit-level decisions, given a suitable abstraction and simulation, Monte Carlo tree search (MCTS) may also be effective at strategic level decision making in a domain as complex as StarCraft.

Chung, Buro, and Schaeffer (2005) created a capture-the-flag game in which each player needed to control a group of units to navigate through obstacles to the opposite side of a map and retrieve the opponent's flag. They created a generalized Monte Carlo planning framework and then applied it to their game, producing positive results. Unfortunately, they lacked a strong scripted opponent to test against, and their system was also very reliant on heuristic evaluations of intermediate states in order to make planning decisions. Later, Balla and Fern (2009) applied the more recent technique of upper confidence bounds applied to trees (UCT) to a simplified Wargus scenario. A major benefit of their approach is that it does not require a heuristic evaluation function for intermediate states, and instead plays a game randomly out to a terminal state in order to evaluate a plan. The system was evaluated by playing against a range of scripts and a human player in a scenario involving multiple friendly and enemy groups of the basic footman unit placed around an empty map. In these experiments, the UCT system made decisions at the tactical level for moving groups of units while micromanagement was controlled by the inbuilt Wargus AI, and the UCT evaluated terminal states based on either unit hit points remaining or time taken. The system was able to win all of the scenarios, unlike any of the scripts, and to overall outperform all of the other scripts and the human player on the particular metric (either hit points or time) that it was using.
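A compact UCT skeleton in the spirit of these systems is sketched below, again over an assumed simulator interface with random rollouts to terminal states, so no heuristic evaluation of intermediate states is needed. The interface names and constants are illustrative, and nonterminal states are assumed to always have at least one available move.

```python
import math
import random

class Node:
    """One node of a UCT tree over abstracted group-level moves."""
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.visits, self.total_reward = [], 0, 0.0

def uct_search(root_state, sim, iterations=1000, c=1.4):
    """sim is an assumed interface: moves(s), apply(s, m), terminal(s), rollout(s) -> reward in [0, 1]."""
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend while fully expanded, maximizing the UCB1 score.
        while node.children and len(node.children) == len(sim.moves(node.state)):
            node = max(node.children, key=lambda ch: ch.total_reward / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # Expansion: add one untried move, unless the state is terminal.
        if not sim.terminal(node.state):
            tried = {ch.move for ch in node.children}
            untried = [m for m in sim.moves(node.state) if m not in tried]
            move = random.choice(untried)
            node = Node(sim.apply(node.state, move), parent=node, move=move)
            node.parent.children.append(node)
        # Simulation: play out randomly to a terminal state (no heuristic needed).
        reward = sim.rollout(node.state)
        # Backpropagation: update statistics along the path back to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```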

Other Techniques

Various other AI techniques have been applied to tactical decision making in StarCraft. Synnaeve and Bessière (2011b) combine unit objectives, opportunities, and threats using a Bayesian model to decide which direction to move units in a battle. The model treats each of its sensory inputs as part of a probability equation that can be solved, given data (potentially learned through RL) about the distributions of the inputs with respect to the direction moved, to find the probability that a unit should move in each possible direction. The best direction can be selected, or the direction probabilities can be sampled over to avoid having two units choose to move into the same location. Their Bayesian model is paired with a hierarchical finite state machine to choose different sets of behavior for when units are engaging or avoiding enemy forces, or scouting. The bot produced was very effective against the built-in StarCraft AI as well as its own ablated versions (Synnaeve and Bessière 2011b).
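The flavor of such a model can be sketched as a naive-Bayes-style fusion: each sensory input contributes a likelihood over candidate directions, the likelihoods are multiplied and normalized, and the result is either maximized or sampled. The direction count, tables, and inputs below are invented; the published model defines its sensory variables and distributions much more carefully.

```python
import numpy as np

DIRECTIONS = np.arange(8)   # 8 candidate movement directions around a unit

def direction_distribution(likelihoods):
    """Multiply per-sensor likelihoods over directions and normalize.

    likelihoods is a list of length-8 arrays, one per sensory input (for
    example objective direction, incoming damage, ally repulsion). The tables
    are assumed to have been estimated beforehand, e.g. from replays or RL.
    """
    posterior = np.ones_like(DIRECTIONS, dtype=float)
    for table in likelihoods:
        posterior *= table
    return posterior / posterior.sum()

def pick_direction(likelihoods, rng, sample=True):
    """Sample a direction (to decorrelate nearby units) or take the most probable one."""
    p = direction_distribution(likelihoods)
    return int(rng.choice(DIRECTIONS, p=p)) if sample else int(np.argmax(p))

# Toy usage: an objective pulling toward direction 2 and a threat discouraging direction 5.
rng = np.random.default_rng(1)
objective = np.array([.05, .10, .40, .20, .10, .05, .05, .05])
threat    = np.array([.15, .15, .15, .15, .15, .01, .12, .12])
print(pick_direction([objective, threat], rng))
```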

CBR, although usually used for strategic reasoning in RTS AI, has also been applied to tactical decision making in Warcraft III,9 a game that has a greater focus on micromanagement than StarCraft (Szczepanski and Aamodt 2009). CBR generally selects the most similar case for reuse, but Szczepanski and Aamodt (2009) added a conditional check to each case so that it could be selected only when its action was able to be executed. They also added reactionary cases that would be executed as soon as certain conditions were met. The resulting agent was able to beat the built-in AI of Warcraft III in a micromanagement battle using only a small number of cases, and was able to assist human players by micromanaging battles to let the human focus on higher-level strategy.
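A minimal sketch of case retrieval with such an executability check might look like the following; the case features, similarity metric, and actions are invented for illustration and are not taken from Szczepanski and Aamodt (2009).

```python
import math

# A case pairs a situation description with an action and a check that the
# action can currently be executed (the conditional check described above).
CASES = [
    {"features": [0.9, 0.1, 3], "action": "focus_fire",
     "executable": lambda s: s["enemies_in_range"] > 0},
    {"features": [0.3, 0.8, 1], "action": "retreat_to_base",
     "executable": lambda s: True},
    {"features": [0.7, 0.4, 2], "action": "use_heal",
     "executable": lambda s: s["mana"] >= 50},
]

def similarity(a, b):
    """Simple inverse-distance similarity between feature vectors."""
    return 1.0 / (1.0 + math.dist(a, b))

def select_case(query_features, situation):
    """Most similar case whose action is executable right now."""
    usable = [c for c in CASES if c["executable"](situation)]
    if not usable:
        return None
    return max(usable, key=lambda c: similarity(c["features"], query_features))

situation = {"enemies_in_range": 2, "mana": 10}
case = select_case([0.8, 0.2, 3], situation)
print(case["action"])   # "focus_fire" in this toy situation
```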

Neuroevolution is a technique that uses an evolutionary algorithm to create or train an artificial neural network. Gabriel, Negru, and Zaharie (2012) use a neuroevolution approach called rtNEAT to evolve both the topology and connection weights of neural networks for individual unit control in StarCraft. In their approach, each unit has its own neural network that receives input from environmental sources (such as nearby units or obstacles) and hand-defined abstractions (such as the number, type, and quality of nearby units), and outputs whether to attack, retreat, or move left or right. During a game, the performance of the units is evaluated using a hand-crafted fitness function, and poorly performing unit agents are replaced by combinations of the best-performing agents. It is tested in very simple scenarios of 12 versus 12 units in a square arena, where all units on each side are either a hand-to-hand or ranged type unit. In these situations, it learns to beat the built-in StarCraft AI and some other bots. However, it remains unclear how well it would cope with more units or mixes of different unit types (Gabriel, Negru, and Zaharie 2012).

Strategic Decision Making

In order to create a system that can make intelligent actions at a strategic level in an RTS game, many researchers have created planning systems. These systems are capable of determining sequences of actions to be taken in a particular situation in order to achieve specified goals. It is a challenging problem because of the incomplete information available — "fog of war" obscures areas of the battlefield that are out of sight of friendly units — as well as the huge state and action spaces and many simultaneous nonhierarchical goals. With planning systems, researchers hope to enable AI to play at a humanlike level, while simultaneously reducing the development effort required when compared with the scripting commonly used in industry. The main techniques used for planning systems are case-based planning (CBP), goal-driven autonomy (GDA) and hierarchical planning.

A basic strategic decision-making system was produced in-house for the commercial RTS game Kohan II: Kings of War10 (Dill 2006). It assigned resources — construction, research, and upkeep capacities — to goals, attempting to maximize the total priority of the goals that could be satisfied. The priorities were set by a large number of hand-tuned values, which could be swapped for a different set to give the AI different personalities (Dill 2006). Each priority value was modified based on relevant factors of the current situation, a goal commitment value (to prevent flip-flopping once a goal has been selected) and a random value (to reduce predictability). It was found that this not only created a fun, challenging opponent, but also made the AI easier to update for changes in game design throughout the development process (Dill 2006).
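A greedy sketch of this kind of priority-driven resource assignment is given below, with situational modifiers, a commitment bonus, and a small random term; the goal definitions, weights, and capacities are invented, and the published system's exact assignment procedure is not described at this level of detail.

```python
import random

def score_goal(goal, situation, committed, noise=0.1):
    """Adjusted priority: a hand-tuned base value, situational modifiers,
    a commitment bonus to avoid flip-flopping, and a little randomness."""
    value = goal["priority"]
    for factor, weight in goal["modifiers"].items():
        value += weight * situation.get(factor, 0.0)
    if goal["name"] in committed:
        value += 0.5
    return value + random.uniform(0.0, noise)

def assign_resources(goals, capacity, situation, committed):
    """Greedily satisfy the highest-priority goals that still fit the capacities."""
    chosen = []
    remaining = dict(capacity)   # e.g. construction, research, upkeep capacities
    for goal in sorted(goals, key=lambda g: score_goal(g, situation, committed),
                       reverse=True):
        cost = goal["cost"]
        if all(remaining.get(r, 0) >= c for r, c in cost.items()):
            for r, c in cost.items():
                remaining[r] = remaining.get(r, 0) - c
            chosen.append(goal["name"])
    return chosen

# Toy usage with invented goals and capacities.
goals = [
    {"name": "expand", "priority": 5, "cost": {"construction": 3},
     "modifiers": {"enemy_pressure": -2}},
    {"name": "defend", "priority": 4, "cost": {"construction": 2},
     "modifiers": {"enemy_pressure": +3}},
    {"name": "research", "priority": 3, "cost": {"research": 1}, "modifiers": {}},
]
print(assign_resources(goals, {"construction": 4, "research": 1},
                       {"enemy_pressure": 1.0}, committed={"defend"}))
```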

Case-Based Planning

CBP is a planning technique that finds similar past situations from which to draw potential solutions to the current situation. In the case of a CBP system, the solutions found are a set of potential plans or subplans that are likely to be effective in the current situation. CBP systems can exhibit poor reactivity at the strategic level and excessive reactivity at the action level, not reacting to high-level changes in situation until a low-level action fails, or discarding an entire plan because a single action failed (Palma et al. 2011).

One of the first applications of CBP to RTS games was by Aha, Molineaux, and Ponsen (2005), who created a system that extended the dynamic scripting concept of Ponsen et al. (2005) to select tactics and strategy based on the current situation. Using this technique, their system was able to play against a nonstatic opponent instead of requiring additional training each time the opponent changed. They reduced the complexity of the state and action spaces by abstracting states into a state lattice of possible orders in which buildings are constructed in a game (build orders) combined with a small set of features, and abstracting actions into a set of tactics generated for each state. This allowed their system to improve its estimate of the performance of each tactic in each situation over multiple games, and eventually learn to consistently beat all of the tested opponent scripts (Aha, Molineaux, and Ponsen 2005).

Ontañón et al. (2007) use the ideas of behaviors, goals, and alive-conditions from A Behavior Language (ABL, introduced by Mateas and Stern [2002]) combined with the ideas from earlier CBP systems to form a case-based system for playing Wargus. The cases are learned from human-annotated game logs, with each case detailing the goals a human was attempting to achieve with particular sequences of actions in a particular state. These cases can then be adapted and applied in-game to attempt to change the game state. By reasoning about a tree of goals and subgoals to be completed, cases can be selected and linked together into a plan to satisfy the overall goal of winning the game (figure 5). During the execution of a plan, it may be modified in order to adapt for unforeseen events or compensate for a failure to achieve a goal.

Mishra, Ontañón, and Ram (2008) extend the work of Ontañón et al. (2007) by adding a decision tree model to provide faster and more effective case retrieval. The decision tree is used to predict a high-level situation, which determines the attributes and attribute weights to use for case selection. This helps by skipping unnecessary attribute calculations and comparisons, and emphasizing important attributes. The decision tree and weightings are learned from game logs that have been human annotated to show the high-level situation at each point throughout the games. This annotation increased the development effort required for the AI system but successfully provided better and faster case retrieval than the original system (Mishra, Ontañón, and Ram 2008).

More recent work using CBP tends to focus on the learning aspects of the system instead of the planning aspects. As such, it is discussed further in the Plan Recognition and Learning section.

A different approach is taken by Cadena and Garrido (2011), who combine the ideas of CBR with those of fuzzy sets, allowing the reasoner to abstract state information by grouping continuous feature values. This allows them to vastly simplify the state space, and it may be a closer representation of human thinking, but could potentially result in the loss of important information. For strategic decision making, their system uses regular cases made up of exact unit and building counts, and selects a plan made up of five high-level actions, such as creating units or buildings. But for tactical reasoning (micromanagement is not explored), their system maintains independent fuzzy state descriptions and carries out independent CBR for each region of the map, thus avoiding reasoning about the map as a whole at the tactical level. Each region's state includes a linguistic fuzzy representation of its area (for example, small, medium, big), choke points, military presence, combat intensity, lost units, and amounts of each friendly and enemy unit type (for example, none, few, many). After building the case base from just one replay of a human playing against the in-built AI, the system was able to win around 60 percent of games (and tie in about 15 percent) against the AI on the same map. However, it is unclear how well the system would fare at the task of playing against different races (unique playable teams) and strategies, or playing on different maps.

Hierarchical Planning

By breaking up a problem hierarchically, planning systems are able to deal with parts of the situation separately at different levels of abstraction, reducing the complexity of the problem, but creating a potential new issue in coordination between the different levels (Marthi et al. 2005; Weber et al. 2010). A hierarchical plan maps well to the hierarchy of goals and subgoals typical in RTS games, from the highest-level goals such as winning the game, to the lowest-level goals, which map directly to in-game actions. Some researchers formalize this hierarchy into the well-defined structure of a hierarchical task network (HTN), which contains tasks, their ordering, and methods for achieving them. High-level, complex tasks in an HTN may be decomposed into a sequence of simpler tasks, which themselves can be decomposed until each task represents a concrete action (Muñoz-Avila and Aha 2004).

HTNs have been used for strategic decision making in RTS games, but not for StarCraft. Muñoz-Avila and Aha (2004) focus on the explanations that an HTN planner is able to provide to a human querying its behavior, or the reasons underlying certain events, in the context of an RTS game. Laagland (2008) implements and tests an agent capable of playing an open source RTS called Spring11 using a hand-crafted HTN. The HTN allows the agent to react dynamically to problems, such as rebuilding a building that is lost or gathering additional resources of a particular type when needed, unlike the built-in scripted AI. Using a balanced strategy, the HTN agent usually beats the built-in AI in Spring, largely due to better resource management. Efforts to learn HTNs, such as Nejati, Langley, and Konik (2006), have been pursued in much simpler domains, but never directly used in the field of RTS AI. This area may hold promise in the future for reducing the work required to build HTNs.
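To make the idea concrete, the sketch below decomposes a compound task into primitive in-game actions using the first applicable method; the domain (tasks, preconditions, and subtasks) is invented and far simpler than a real hand-crafted HTN such as the one described above.

```python
# Primitive tasks map directly to in-game actions; compound tasks have methods,
# each a precondition plus an ordered list of subtasks. This toy domain is made
# up for illustration only.
METHODS = {
    "win_game": [
        (lambda s: s["minerals"] < 400, ["build_economy", "build_army", "attack"]),
        (lambda s: True, ["build_army", "attack"]),
    ],
    "build_economy": [
        (lambda s: s["workers"] < 20, ["train_worker", "train_worker"]),
        (lambda s: True, []),
    ],
    "build_army": [(lambda s: True, ["train_soldier", "train_soldier"])],
}
PRIMITIVE = {"train_worker", "train_soldier", "attack"}

def decompose(task, state):
    """Depth-first HTN decomposition: expand compound tasks with the first
    applicable method until only primitive actions remain."""
    if task in PRIMITIVE:
        return [task]
    for precondition, subtasks in METHODS[task]:
        if precondition(state):
            plan = []
            for subtask in subtasks:
                plan.extend(decompose(subtask, state))
            return plan
    return []   # no applicable method: the task cannot be achieved in this state

print(decompose("win_game", {"minerals": 200, "workers": 19}))
```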

An alternative means of hierarchical planning was used by Weber et al. (2010). They use an active behavior tree in A Behavior Language, which has parallel, sequential, and conditional behaviors and goals in a tree structure (figure 6) very similar to a behavior tree (discussed in the next subsection). However, in this model, the tree is expanded during execution by selecting behaviors (randomly, or based on conditions or priority) to satisfy goals, and different behaviors can communicate indirectly by reading or writing information on a shared whiteboard.

Hierarchical planning is often combined as part of other methods, such as how Ontañón et al. (2007) use a hierarchical CBP system to reason about goals and plans at different levels.

Figure 5. A Case-Based Planning Approach. The approach uses cases of actions extracted from annotated game logs to form plans that satisfy goals in Wargus. Adapted from Ontañón et al. (2007).

Behavior Trees

Behavior trees are hierarchies of decision and action nodes that are commonly used by programmers and designers in the game industry in order to define behaviors (effectively a partial plan) for agents (Palma et al. 2011). They have become popular because, unlike scripts, they can be created and edited using visual tools, making them much more accessible and understandable to nonprogrammers (Palma et al.

2011). Additionally, their hierarchical structure encourages reuse, as a tree defining a specific behavior can be attached to another tree in multiple positions or can be customized incrementally by adding nodes (Palma et al. 2011). Because behavior trees are hierarchical, they can cover a wide range of behavior, from very low-level actions to strategic-level decisions. Palma et al. (2011) use behavior trees to enable direct control of a case-based planner's behavior. With their system, machine learning can be used to create complex and robust behavior through the planner, while allowing game designers to change specific parts of the behavior by substituting a behavior tree instead of an action or a whole plan. This means they can define custom behavior for specific scenarios, fix incorrectly learned behavior, or tweak the learned behavior as needed.
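A minimal behavior tree implementation, using only sequence and selector composites over action leaves, is sketched below; the combat behavior and blackboard keys are invented for illustration.

```python
SUCCESS, FAILURE = "success", "failure"

class Action:
    """Leaf node: runs a function against the game state (blackboard)."""
    def __init__(self, fn):
        self.fn = fn
    def tick(self, state):
        return self.fn(state)

class Sequence:
    """Succeeds only if every child succeeds, evaluated in order."""
    def __init__(self, *children):
        self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) == FAILURE:
                return FAILURE
        return SUCCESS

class Selector:
    """Tries children in priority order and succeeds on the first success."""
    def __init__(self, *children):
        self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) == SUCCESS:
                return SUCCESS
        return FAILURE

# Hypothetical combat behavior: retreat when weak, otherwise engage.
def is_weak(s):  return SUCCESS if s["hp"] < 30 else FAILURE
def retreat(s):  s["order"] = "retreat"; return SUCCESS
def attack(s):   s["order"] = "attack";  return SUCCESS

tree = Selector(Sequence(Action(is_weak), Action(retreat)), Action(attack))
state = {"hp": 25}
tree.tick(state)
print(state["order"])   # "retreat"
```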

Goal-Driven Autonomy

GDA is a model in which "an agent reasons about its goals, identifies when they need to be updated, and changes or adds to them as needed for subsequent planning and execution" (Molineaux, Klenk, and Aha 2010). This addresses the high- and low-level reactivity problem experienced by CBP by actively reasoning about and reacting to why a goal is succeeding or failing.

Figure 6. A Simple Active Behavior Tree Used for Hierarchical Planning. The figure shows mental acts (calculation or processing), physical acts (in-game actions), and an unexpanded goal. Adapted from Weber et al. (2010).

Weber, Mateas, and Jhala (2010) describe a GDA system for StarCraft using A Behavior Language, which is able to form plans with expectations about the outcome. If an unexpected situation or event occurs, the system can record it as a discrepancy, generate an explanation for why it occurred, and form new goals to revise the plan, allowing the system to react appropriately to unforeseen events (figure 7). It is also capable of simultaneously reasoning about multiple goals at differing granularity. It was initially unable to learn goals, expectations, or strategies, so this knowledge had to be input and updated manually, but later improvements allowed these to be learned from demonstration (discussed in the next section) (Weber, Mateas, and Jhala 2012). This system was used in the Artificial Intelligence and Interactive Digital Entertainment (AIIDE) StarCraft AI competition entry EISBot and was also evaluated by playing against human players on a competitive StarCraft ladder called International Cyber Cup (ICCup),12 where players are ranked based on their performance — it attained a ranking indicating it was better than 48 percent of the competitive players (Weber, Mateas, and Jhala 2010; Weber et al. 2010).
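One cycle of the GDA loop (detect a discrepancy, explain it, and formulate a revised goal) can be sketched as follows; the explanation and goal-formulation functions are placeholders, whereas a real system ties them to a planner and goal manager.

```python
def gda_step(state, expectation, goals, explain, formulate_goal):
    """One pass of the GDA cycle: compare the observed state with the active
    plan's expectation, explain any discrepancy, and add a revised goal.

    explain and formulate_goal are assumed domain-specific functions.
    """
    discrepancies = {k: (v, state.get(k)) for k, v in expectation.items()
                     if state.get(k) != v}
    if not discrepancies:
        return goals                       # everything is going to plan
    explanation = explain(discrepancies, state)
    new_goal = formulate_goal(explanation, state)
    if new_goal is not None and new_goal not in goals:
        goals.append(new_goal)
    return goals

# Toy usage: we expected our expansion to survive, but it was destroyed.
def explain(discrepancies, state):
    return "enemy_attack" if "expansion_alive" in discrepancies else "unknown"

def formulate_goal(explanation, state):
    return "rebuild_and_defend_expansion" if explanation == "enemy_attack" else None

goals = ["expand_economy"]
state = {"expansion_alive": False}
expectation = {"expansion_alive": True}
print(gda_step(state, expectation, goals, explain, formulate_goal))
```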

Jaidee, Muñoz-Avila, and Aha (2011) integrate CBR and RL to make a learning version of GDA, allowing their system to improve its goals and domain knowledge over time. This means that less work is required from human experts to specify possible goals, states, and other domain knowledge because missing knowledge can be learned automatically. Similarly, if the underlying domain changes, the learning system is able to adapt to the changes automatically. However, when applied to a simple domain, the system was unable to beat the performance of a nonlearning GDA agent (Jaidee, Muñoz-Avila, and Aha 2011).

Figure 7. GDA Conceptual Model. A planner produces actions and expectations from goals, and unexpected outcomes result in additional goals being produced (Weber, Mateas, and Jhala 2012).

State Space Planning

Automated planning and scheduling is a branch of classic AI research from which heuristic state space planning techniques have been adapted for planning in RTS game AI. In these problems, an agent is given a start and goal state, and a set of actions that have preconditions and effects. The agent must then find a sequence of actions to achieve the goal from the starting state. Existing RTS applications add complexity to the basic problem by dealing with durative and parallel actions, integer-valued state variables, and tight time constraints.

Automated planning ideas have already been applied successfully to commercial first-person shooter (FPS) games within an architecture called Goal-Oriented Action Planning (GOAP). GOAP allows agents automatically to select the most appropriate actions for their current situation in order to satisfy a set of goals, ideally resulting in more varied, complex, and interesting behavior, while keeping code more reusable and maintainable (Orkin 2004). However, GOAP requires a large amount of domain engineering to implement and is limited because it maps states to goals instead of to actions, so the planner cannot tell whether achieving goals is going according to the plan, failing, or has failed (Orkin 2004; Weber, Mateas, and Jhala 2010). Furthermore, Champandard13 states that GOAP has now turned out to be a dead end, as academe and industry have moved away from GOAP in favor of hierarchical planners to achieve better performance and code maintainability.

However, Chan et al. (2007) and Churchill and Buro (2011) use an automated planning-based approach similar to GOAP to plan build orders in RTS games. Unlike GOAP, they are able to focus on a single goal: finding a plan to build a desired set of units and buildings in a minimum duration (makespan). The RTS domain is simplified by abstracting resource collection to an income rate per worker, assuming building placement and unit movement takes a constant amount of time, and completely ignoring opponents. Ignoring opponents is fairly reasonable for the beginning of a game, as there is generally little opponent interaction, and doing so means the planner does not have to deal with uncertainty and external influences on the state. Both of these methods still require expert knowledge to provide a goal state for them to pursue.

The earlier work by Chan et al. (2007) uses a combination of means-ends analysis and heuristic scheduling in Wargus. Means-ends analysis produces a plan with a minimal number of actions required to achieve the goal, but this plan usually has a poor makespan because it doesn't consider concurrent actions or actions that produce greater resources. A heuristic scheduler then reorganizes actions in the plan to start each action as soon as possible, adding concurrency and reducing the makespan. To consider producing additional resources, the same process is repeated with an extra goal for producing more of a resource (for each resource) at the beginning of the plan, and the plan with the shortest makespan is used. The resulting plans, though nonoptimal, were found to be similar in length to plans executed by an expert player, and vastly better than plans generated by state-of-the-art general purpose planners (Chan et al. 2007).

Figure 8. Design of a Chromosome for Evolving RTS Game AI Strategies (Ponsen et al. 2005).

Churchill and Buro (2011) improve upon the earlier work by using a branch-and-bound depth-first search to find optimal build orders within an abstracted simulation of StarCraft. In addition to the simplifications mentioned above, they avoid simulating individual time steps by allowing any action that will eventually complete without further player interaction, and jumping directly to the point at which each action completes for the next decision node. Even so, other smaller optimizations were needed to speed up the planning process enough to use in-game. The search used either the gathering

time or the build time required to reach the goal (whichever was longer) as the lower bound, and a random path to the goal as the upper bound (Churchill and Buro 2011). The system was evaluated against professional build orders seen in replays, using the set of units and buildings owned by the player at a particular time as the goal state. Due to the computational cost of planning later in the game, planning was restricted to 120 seconds ahead, with replanning every 30 seconds. This produced shorter or equal-length plans to the human players at the start of a game, and similar-length plans on average (with a larger variance) later in the game. It remains to be seen how well this method would perform for later stages of the game, as only the first 500 seconds were evaluated and searching took significantly longer in the latter half. However, this appears to be an effective way to produce near-optimal build orders for at least the early to middle game of StarCraft (Churchill and Buro 2011).
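The branch-and-bound idea can be illustrated with the toy sketch below, which uses an invented sequential-action abstraction with a constant income rate per worker and no supply or prerequisites; the published planner simulates StarCraft far more faithfully, allows actions to overlap, and uses the tighter gathering-time and build-time bounds described above.

```python
import math

# Invented, highly simplified domain: sequential actions only, constant income
# per worker, no supply or prerequisites. Costs and times are made up.
INCOME_PER_WORKER = 0.7   # minerals per second per worker (assumed)
ACTIONS = {
    "worker":   {"cost": 50,  "time": 13},
    "barracks": {"cost": 150, "time": 80},
    "soldier":  {"cost": 50,  "time": 25},
}

def finish_time(state, action):
    """Completion time if the action is started as soon as it is affordable."""
    cost, duration = ACTIONS[action]["cost"], ACTIONS[action]["time"]
    wait = max(0.0, (cost - state["minerals"]) / (state["workers"] * INCOME_PER_WORKER))
    return state["time"] + wait + duration, wait

def apply(state, action):
    end, wait = finish_time(state, action)
    new = dict(state)
    new["minerals"] += wait * state["workers"] * INCOME_PER_WORKER - ACTIONS[action]["cost"]
    new["minerals"] += ACTIONS[action]["time"] * state["workers"] * INCOME_PER_WORKER
    new["time"] = end
    new[action] = new.get(action, 0) + 1
    if action == "worker":
        new["workers"] += 1
    return new

def lower_bound(state, goal):
    """Optimistic remaining time: build times still missing, ignoring resources."""
    return sum(ACTIONS[item]["time"] * max(0, need - state.get(item, 0))
               for item, need in goal.items())

def plan_build_order(start, goal):
    best = {"makespan": math.inf, "plan": None}
    def dfs(state, plan):
        if all(state.get(item, 0) >= need for item, need in goal.items()):
            if state["time"] < best["makespan"]:
                best["makespan"], best["plan"] = state["time"], list(plan)
            return
        if state["time"] + lower_bound(state, goal) >= best["makespan"]:
            return                                  # prune: cannot beat the best plan
        for action in ACTIONS:
            needed = goal.get(action, 0) - state.get(action, 0)
            if needed > 0 or (action == "worker" and state["workers"] < 12):
                dfs(apply(state, action), plan + [action])
    dfs(start, [])
    return best["plan"], best["makespan"]

start = {"time": 0.0, "minerals": 50.0, "workers": 4}
print(plan_build_order(start, {"barracks": 1, "soldier": 2}))
```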

Evolutionary Algorithms

Evolutionary algorithms search for an effective solution to a problem by evaluating different potential solutions and combining or randomizing components of high-fitness potential solutions to find new, better solutions. This approach is used infrequently in the RTS game AI field, but it has been effectively applied to the subproblem of tactical decision making in StarCraft (discussed earlier) and learning strategic knowledge in similar RTS titles.

Although evolutionary algorithms have not yet been applied to strategic decision making in StarCraft, they have been applied to its sequel, StarCraft II.14 The Evolution Chamber15 software uses the technique to optimize partially defined build orders. Given a target set of units, buildings, and upgrades to be produced by certain times in the match, the software searches for the fastest or least resource-intensive way of reaching these targets. Although there have not been any academic publications regarding this software, it gained attention by producing an unusual and highly effective plan in the early days of StarCraft II.

Ponsen et al. (2005) use evolutionary algorithms to generate strategies in a game of Wargus. To generate the strategies, the evolutionary algorithm combines and mutates sequences of tactical and strategic-level actions in the game to form scripts (figure 8) that defeat a set of human-made and previously evolved scripts. The fitness of each potential script is evaluated by playing it against the predefined scripts and using the resulting in-game military score combined with a time factor that favors quick wins or slow losses. Tactics are extracted as sequences of actions from the best scripts, and are finally used in a dynamic script that chooses particular tactics to use in a given state, based on its experience of their effectiveness — a form of RL. The resulting dynamic scripts are able to consistently beat most of the static scripts they were tested against after learning for approximately 15 games against that opponent, but were unable to consistently beat some scripts after more than 100 games (Ponsen et al. 2005; 2006). A drawback of this method is that the effectiveness values learned for the dynamic scripts assume that the opponent is static and would not adapt well to a dynamic opponent (Aha, Molineaux, and Ponsen 2005).
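The overall evolutionary loop can be sketched as below, with chromosomes that assign tactic sequences to abstract game states, crossover and mutation over those genes, and a fitness callback that is assumed to play the candidate script against the predefined opponents; all tactic names and parameters here are invented.

```python
import random

TACTICS = ["rush", "expand", "turtle", "tech", "raid"]   # illustrative tactic labels

def random_chromosome(n_states=4, genes_per_state=3):
    """A chromosome assigns a short sequence of tactics to each abstract game state."""
    return [[random.choice(TACTICS) for _ in range(genes_per_state)]
            for _ in range(n_states)]

def crossover(a, b):
    """Swap whole per-state gene blocks between two parents."""
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(chromosome, rate=0.1):
    return [[random.choice(TACTICS) if random.random() < rate else gene
             for gene in state_genes] for state_genes in chromosome]

def evolve(fitness, generations=30, population_size=20, elite=4):
    """Generational loop: keep the best chromosomes, breed the rest from them.

    fitness(chromosome) is assumed to play the scripted opponents and return a
    score that rewards quick wins and slow losses.
    """
    population = [random_chromosome() for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(population_size - elite)]
        population = parents + children
    return max(population, key=fitness)

# Stub fitness for demonstration only: prefer chromosomes that use "rush" often.
best = evolve(lambda c: sum(genes.count("rush") for genes in c))
print(best)
```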

Cognitive Architectures

An alternative method for approaching strategic-level RTS game AI is to model a reasoning mechanism on how humans are thought to operate. This could potentially lead toward greater understanding of how humans reason and allow us to create more humanlike AI. This approach has been applied to StarCraft as part of a project using the Soar cognitive architecture, which adapts the BWAPI interface to communicate with a Soar agent.16 It makes use of Soar's spatial visual system to deal with reconnaissance activities and pathfinding, and Soar's working memory to hold perceived and reasoned state information. However, it is currently limited to playing a partial game of StarCraft, using only the basic barracks and marine units for combat, and using hard-coded locations for building placement.16

A similar approach was taken by Wintermute, Xu, and Laird (2007), but it applied Soar to ORTS instead of StarCraft. They were able to interface the Soar cognitive architecture to ORTS by reducing the complexity of the problem using the concepts of grouping and attention for abstraction. These concepts are based on human perception, allowing the underlying Soar agent to receive information as a human would, postperception — in terms of aggregated and filtered information. The agent could view entire armies of units as a single entity, but could change the focus of its attention, allowing it to perceive individual units in one location at a time, or groups of units over a wide area (figure 9). This allowed the agent to control a simple strategic-level RTS battle situation without being overwhelmed by the large number of units (Wintermute, Xu, and Laird 2007). However, due to the limitations of Soar, the agent could pursue only one goal at a time, which would be very limiting in StarCraft and most complete RTS games.

Spatial Reasoning

RTS AI agents have to be able to reason about the positions and actions of often large numbers of hidden objects, many with different properties, moving over time, controlled by an opponent in a dynamic environment (Weber, Mateas, and Jhala 2011b; Wintermute, Xu, and Laird 2007). Despite the complexity of the problem, humans can reason about this information very quickly and accurately, often predicting and intercepting the location of an enemy attack or escape based on very little information, or using terrain features and the arrangement of their own units and buildings to their advantage. This makes RTS a highly suitable domain for spatial reasoning research in a controlled environment (Buro 2004; Weber, Mateas, and Jhala 2011a; Wintermute, Xu, and Laird 2007).

Even the analysis of the terrain in RTS games, ignoring units and buildings, is a nontrivial task. In order to play effectively, players need to be able to know which regions of the terrain are connected to other regions, and where and how these regions connect. The connections between regions are as important as the regions themselves, because they offer defensive positions through which an army must move to get into or out of the region (choke points). Perkins (2010) describes the implementation and testing of the Brood War Terrain Analyzer, which has become a very common library for creating StarCraft bots capable of reasoning about their terrain. The library creates and prunes a Voronoi diagram using information about the walkable tiles of the map, identifies nodes as regions or choke points, then merges adjacent regions according to thresholds that were determined by trial and error to produce the desired results. The choke point nodes are converted into lines that separate the regions, resulting in a set of region polygons connected by choke points (figure 10). When compared against the choke points identified by humans, it had a 0–17 percent false negative rate, and a 4–55 percent false positive rate, and took up to 43 seconds to analyze the map, so there is still definite room for improvement (Perkins 2010).

Figure 9. Attention Limits the Information the Agent Receives by Hiding or Abstracting Objects Further from the Agent’s Area of Focus.
(Wintermute, Xu, and Laird 2007).

Once a player is capable of simple reasoning about the terrain, it is possible to begin reasoning about the movement of units over this terrain. A particularly useful spatial reasoning ability in RTS games is to be able to predict the location of enemy units while they are not visible to a player. Weber, Mateas, and Jhala (2011b) use a particle model for predicting enemy unit positions in StarCraft, based on the unit’s trajectory and nearby choke points at the time it was seen. A single particle was used for each unit instead of a particle cloud because it is not possible to visually distinguish between two units of the same type, so it would be difficult to update the cloud if a unit was lost then resighted (Weber, Mateas, and Jhala 2011b). In order to account for the differences between the unit types in StarCraft, they divided the types into broad classes and learned a movement model for each class from professional replays on a variety of maps. The model allowed their bot to predict, with decreasing confidence over time, the subsequent locations of enemy units after sighting them, resulting in an increased win rate against other bots (Weber, Mateas, and Jhala 2011b).
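
The following sketch illustrates the single-particle idea in a simplified form: each sighted enemy unit is tracked by one particle that extrapolates the last observed trajectory and decays in confidence while the unit is unseen. The constant-velocity extrapolation and exponential decay here are stand-ins for the class-specific movement models that Weber, Mateas, and Jhala (2011b) learned from replays.

    import math

    class Particle:
        """One particle per sighted enemy unit: last known position, estimated
        velocity, and a confidence that decays the longer the unit stays unseen."""
        def __init__(self, x, y, vx, vy, speed_class_decay=0.05):
            self.x, self.y = x, y
            self.vx, self.vy = vx, vy
            self.decay = speed_class_decay   # faster unit classes could decay quicker
            self.confidence = 1.0

        def step(self, dt=1.0):
            # Extrapolate along the last observed trajectory and lower confidence.
            self.x += self.vx * dt
            self.y += self.vy * dt
            self.confidence *= math.exp(-self.decay * dt)

        def resight(self, x, y, dt=1.0):
            # A new observation resets confidence and re-estimates velocity.
            self.vx, self.vy = (x - self.x) / dt, (y - self.y) / dt
            self.x, self.y = x, y
            self.confidence = 1.0

    # Example: a unit seen moving toward a choke point, then lost for ten steps.
    p = Particle(x=100, y=100, vx=4, vy=0)
    for _ in range(10):
        p.step()
    print(round(p.x), round(p.y), round(p.confidence, 2))  # predicted position, lower confidence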

Figure 10. Terrain After Analysis.
The figure shows impassable areas in blue and choke points as lines between light areas (Perkins 2010).

The bulk of spatial reasoning research in StarCraft and other RTS games is based on potential fields (PFs), and to a lesser extent, influence maps. Each of these techniques helps to aggregate and abstract spatial information by summing the effect of individual points of information into a field over an area, allowing decisions to be made based on the computed field strength at particular positions. They were first applied to RTS games by Hagelbäck and Johansson (2008), before which they were used for robot navigation. Kabanza et al. (2010) use an influence map to evaluate the potential threats and opportunities of an enemy force in an effort to predict the opponent’s strategy, and Uriarte and Ontañón (2012) use one to evaluate threats and obstacles in order to control the movement of units performing a hit-and-run behavior known as kiting. Baumgarten, Colton, and Morris (2009) use a few different influence maps for synchronizing attacks by groups of units, moving and grouping units, and choosing targets to attack. Weber and Ontañón (2010) use PFs to aid a CBP system by taking the field strengths of many different fields at a particular position, so that the position is represented as a vector of field strengths, and can be easily compared to others stored in the case base. Synnaeve and Bessière (2011b) claim that their Bayesian model for unit movement subsumes PFs, as each unit is controlled by Bayesian sensory inputs that are capable of representing threats and opportunities in different directions relative to the unit. However, their system still needs to use damage maps in order to summarize this information for use by the sensory inputs (Synnaeve and Bessière 2011b).

PFs were used extensively in the Overmind StarCraft bot, for both offensive and defensive unit behavior (Huang 2011). The bot used the fields to represent opportunities and threats posed by known enemy units, using information about unit statistics so that the system could estimate how beneficial and how costly it would be to attack each target. This allowed attacking units to treat the fields as attractive and repulsive forces for movement, resulting in them automatically congregating on high-value targets and avoiding defenses.
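
A minimal potential field sketch is shown below, assuming a simple inverse-distance falloff (the field shapes and weights used by actual bots such as Overmind were tuned experimentally): positive weights attract units toward valuable targets, negative weights push them away from threats, and each unit steps toward the adjacent position with the highest combined field strength.

    import math

    def field_strength(pos, sources):
        """Sum the contribution of every source at a given position.  Positive
        weights attract (valuable targets), negative weights repel (threats)."""
        total = 0.0
        for (sx, sy), weight in sources:
            dist = math.hypot(pos[0] - sx, pos[1] - sy) + 1.0
            total += weight / dist              # simple inverse-distance falloff
        return total

    def best_move(pos, sources, step=8):
        """Move one step in the direction with the highest combined field strength."""
        candidates = [(pos[0] + dx, pos[1] + dy)
                      for dx in (-step, 0, step) for dy in (-step, 0, step)]
        return max(candidates, key=lambda p: field_strength(p, sources))

    # A valuable target attracts the unit while a static defense repels it.
    sources = [((200, 50), +60.0),    # high-value target
               ((120, 60), -90.0)]    # static defense to avoid
    unit = (40, 40)
    for _ in range(5):
        unit = best_move(unit, sources)
    print(unit)  # position after five steps of field-following movement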

Additionally, the PFs were combined with temporal reasoning components, allowing the bot to consider the time cost of reaching a faraway target, and the possible movement of enemy units around the map, based on their speed and visibility. The resulting threat map was used for threat-aware pathfinding, which routed units around more threatening regions of the map by giving movement in threatened areas a higher path cost. The major difficulty they experienced in using PFs so much was in tuning the strengths of the fields, requiring them to train the agent in small battle scenarios in order to find appropriate values (Huang 2011). To the authors’ knowledge, this is the most sophisticated spatial reasoning that has been applied to playing StarCraft.

Plan Recognition and Learning

A major area of research in the RTS game AI literature involves learning effective strategic-level game play. By using an AI system capable of learning strategies, researchers aim to make computer opponents more challenging, dynamic, and humanlike, while making them easier to create (Hsieh and Sun 2008). StarCraft is a very complex domain to learn from, so it may provide insights into learning to solve real-world problems. Some researchers have focused on the subproblem of determining an opponent’s strategy, which is particularly difficult in RTS games due to incomplete information about the opponent’s actions, hidden by the “fog of war” (Kabanza et al. 2010). Most plan recognition makes use of an existing plan library to match against when attempting to recognize a strategy, but some methods allow for plan recognition without any predefined plans (Cheng and Thawonmas 2004; Synnaeve and Bessière 2011a). Often, data is extracted from the widely available replay files of expert human players, so a data set was created in order to reduce repeated work (Synnaeve and Bessière 2012). This section divides the plan recognition and learning methods into deductive, abductive, probabilistic, and case-based techniques. Within each technique, plan recognition can be either intended — plans are denoted for the learner and there is often interaction between the expert and the learner — or keyhole — plans are indirectly observed and there is no two-way interaction between the expert and the learner.

Deductive

Deductive plan recognition identifies a plan by comparing the situation with hypotheses of expected behavior for various known plans. By observing particular behavior a deduction can be made about the plan being undertaken, even if complete knowledge is not available. The system described by Kabanza et al. (2010) performs intended deductive plan recognition in StarCraft by matching observations of its opponent against all known strategies that could have produced the situation. It then simulates the possible plans to determine expected future actions of its opponent, judging the probability of plans based on new observations and discarding plans that do not match (figure 11). The method used requires significant human effort to describe all possible plans in a decision tree type structure (Kabanza et al. 2010).

The decision tree machine learning method used by Weber and Mateas (2009) is another example of intended deductive plan recognition. Using training data of building construction orders and timings that have been extracted from a large selection of StarCraft replay files, it creates a decision tree to predict which midgame strategy is being demonstrated. The replays are automatically given their correct classification through a rule set based upon the build order. The learning process was also carried out with a nearest neighbor algorithm and a nonnested generalized exemplars algorithm. The resulting models were then able to predict the build order from incomplete information, with the nearest neighbor algorithm being most robust to incomplete information (Weber and Mateas 2009).
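
The sketch below illustrates the general approach with scikit-learn rather than the tooling used in the original work; the features (the game times at which key buildings are first observed) and the training rows are invented purely for illustration.

    from sklearn.tree import DecisionTreeClassifier

    # Each row holds the game time (in frames) at which key buildings were first
    # seen; 0 means "not yet observed".  Labels are hypothetical midgame strategy
    # classes.  Real systems learn from thousands of labeled replays.
    X = [
        [1500, 3200,    0,    0],   # gateway, cybernetics core, robotics, stargate
        [1450, 3100, 5200,    0],
        [1600,    0,    0,    0],
        [1480, 3150,    0, 5600],
    ]
    y = ["two_gate", "reaver_drop", "proxy_rush", "corsair_opening"]

    model = DecisionTreeClassifier().fit(X, y)

    # Predict from incomplete scouting information: only the gateway and core
    # timings are known so far.
    print(model.predict([[1520, 3180, 0, 0]]))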

Abductive

Abductive plan recognition identifies plans by making assumptions about the situation that are sufficient to explain the observations. The GDA system described by Weber, Mateas, and Jhala (2010) is an example of intended abductive plan recognition in StarCraft, where expectations are formed about the result of actions, and unexpected events are accounted for as discrepancies. The planner handles discrepancies by choosing from a set of predefined explanations that give possible reasons for discrepancies and create new goals to compensate for the change in assumed situation. This system required substantial domain engineering in order to define all of the possible goals, expectations, and explanations necessary for a domain as complex as StarCraft.

Figure 11. New Observations Update an Opponent’s Possible Plan Execution Statuses to Determine Which Plans Are Potentially Being Followed.
(Kabanza et al. 2010).

Later work added the ability for the GDA system to learn domain knowledge for StarCraft by analyzing replays offline (Weber, Mateas, and Jhala 2012). In this modified system, a case library of sequential game states was built from the replays, with each case representing the player and opponent states as numerical feature vectors. Then case-based goal formulation was used to produce goals at run time. The system forms predictions of the opponent’s future state (referred to as explanations in the article) by finding a similar opponent state to the current opponent state in the case library, looking at the future of the similar state to find the difference in the feature vectors over a set period of time, and then applying this difference to the current opponent state to produce an expected opponent state. In a similar manner, it produces a goal state by finding the expected future player state, using the predicted opponent state instead of the current state in order to find appropriate reactions to the opponent. Expectations are also formed from the case library, using changes in the opponent state to make predictions about when new types of units will be produced. When an expectation is not met (within a certain tolerance for error), a discrepancy is created, triggering the system to formulate a new goal. The resulting system appeared to show better results in testing than the previous ones, but further testing is needed to determine how effectively it adapts to unexpected situations (Weber, Mateas, and Jhala 2012).
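
The prediction step of this case-based goal formulation can be sketched as follows, assuming each case is simply an index into a stored trace of numerical feature vectors (the feature meanings and values below are invented): find the most similar stored state, take the change that followed it, and apply the same change to the current state.

    import numpy as np

    def predict_future_state(current, case_library, horizon):
        """Find the most similar stored state, measure how that game changed over
        the next `horizon` steps, and apply the same change to the current state."""
        best = min(case_library,
                   key=lambda case: np.linalg.norm(case[0][case[1]] - current))
        trace, idx = best
        future_idx = min(idx + horizon, len(trace) - 1)
        delta = trace[future_idx] - trace[idx]
        return current + delta

    # Each case is (trace of feature vectors from one replay, index into that trace).
    trace = np.array([[4, 0, 0], [8, 1, 0], [12, 2, 1], [16, 4, 2]], dtype=float)
    library = [(trace, i) for i in range(len(trace))]

    current_opponent = np.array([9.0, 1.0, 0.0])   # e.g. workers, barracks, marines
    print(predict_future_state(current_opponent, library, horizon=2))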

Probabilistic

Probabilistic plan recognition makes use of statistics and expected probabilities to determine the most likely future outcome of a given situation. Synnaeve and Bessière (2011a), Dereszynski et al. (2011), and Hostetler et al. (2012) carry out keyhole probabilistic plan recognition in StarCraft by examining build orders from professional replays, without any prior knowledge of StarCraft build orders. This means they should require minimal work to adapt to changes in the game or to apply to a new situation, because they can learn directly from replays without any human input. The models learned can then be used to predict unobserved parts of the opponent’s current state, or the future strategic direction of a player, given the player’s current and past situations. Alternatively, they can be used to recognize an unusual strategy being used in a game. The two approaches differ in the probabilistic techniques that are used, the scope in which they are applied, and the resulting predictive capabilities of the systems.

Dereszynski et al. (2011) use hidden Markov models to model the player as progressing through a series of states, each of which has probabilities for producing each unit and building type, and probabilities for which state will be transitioned to next. The model is applied to one of the sides in just one of the six possible race matchups, and to only the first seven minutes of game play, because strategies are less dependent on the opponent at the start of the game. State transitions happen every 30 seconds, so the timing of predicted future events can be easily found, but it is too coarse to capture the more frequent events, such as building new worker units. Without any prior information, it is able to learn a state transition graph that closely resembles the commonly used opening build orders (figure 12), but a thorough analysis and evaluation of its predictive power is not provided (Dereszynski et al. 2011).
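
A toy version of such a model is sketched below with invented probabilities: each hidden state carries a production distribution, and propagating the state belief through the transition matrix yields expected production for each future 30-second period.

    import numpy as np

    # Toy hidden Markov model in the style described: each state lasts 30 seconds,
    # has a distribution over what is produced, and a transition distribution to
    # the next state.  All numbers here are invented for illustration.
    transition = np.array([[0.6, 0.3, 0.1],    # state 0: opening
                           [0.0, 0.7, 0.3],    # state 1: expanding
                           [0.0, 0.0, 1.0]])   # state 2: teching up
    production = np.array([[0.9, 0.1, 0.0],    # P(worker, gateway, dragoon | state)
                           [0.5, 0.4, 0.1],
                           [0.2, 0.2, 0.6]])

    def predict(state_belief, periods):
        """Propagate the belief over states forward and report the expected
        production probabilities for each future 30-second period."""
        forecasts = []
        for _ in range(periods):
            state_belief = state_belief @ transition
            forecasts.append(state_belief @ production)
        return forecasts

    belief = np.array([1.0, 0.0, 0.0])          # we believe the opponent just started
    for t, probs in enumerate(predict(belief, 4), 1):
        print(f"t+{t * 30}s  worker/gateway/dragoon: {np.round(probs, 2)}")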

Figure 12. State Transition Graph.
As learned in Dereszynski et al. (2011), showing transitions with probability at least 0.25 as solid edges, and higher-probability transitions with thicker edges. Dotted edges are low-probability transitions shown to make all nodes reachable. Labels in each state are likely units to be produced, while labels outside states are a human analysis of the strategy exhibited. (Dereszynski et al. 2011).

Hostetler et al. (2012) extend previous work by Dereszynski et al. (2011) using a dynamic Bayesian network model for identifying strategies in StarCraft. This model explicitly takes into account the reconnaissance effort made by the player — measured by the proportion of the opponent’s main base that has been seen — in order to determine whether a unit or building was not seen because it was not present, or because little effort was made to find it. This means that failing to find a unit can actually be very informative, provided enough effort was made. The model is also more precise than prior work, predicting exact counts and production of each unit and building type each 30-second time period, instead of just presence or absence. Production of units and buildings each time period is dependent on the current state, based on a hidden Markov model as in Dereszynski et al. (2011). Again, the model was trained and applied to one side in one race matchup, and results are shown for just the first seven minutes of game play. For predicting unit quantities, it outperforms a baseline predictor, which simply predicts the average for the given time period, but only after reconnaissance has begun. This highlights a limitation of the model: it cannot differentiate easily between sequential time periods with similar observations, and therefore has difficulty making accurate predictions for during and after such periods. This happens because the similar periods are modeled as a single state that has a high probability of transitioning to the same state in the next period. For predicting technology structures, the model seems to generally outperform the baseline, and in both prediction tasks it successfully incorporates negative information to infer the absence of units (Hostetler et al. 2012).

Synnaeve and Bessière (2011a) carry out a similar process using a Bayesian model instead of a hidden Markov model. When given a set of thousands of replays, the Bayesian model learns the probabilities of each observed set of buildings existing at one-second intervals throughout the game. These timings for each building set are modeled as normal distributions, such that few or widely spread observations will produce a large standard deviation, indicating uncertainty (Synnaeve and Bessière 2011a). Given a (partial) set of observations and a game time, the model can be queried for the probabilities of each possible building set being present at that time. Alternatively, given a sequence of times, the model can be queried for the most probable building sets over time, which can be used as a build order for the agent itself (Synnaeve and Bessière 2011a).

The model was evaluated and shown to be robust to missing information, producing a building set with a little over one building wrong, on average, when 80 percent of the observations were randomly removed. Without missing observations and allowing for one building wrong, it was able to predict almost four buildings into the future, on average (Synnaeve and Bessière 2011a).
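
The core idea of modeling building-set timings as normal distributions can be sketched as follows; the building sets, observed timings, and normalization used here are invented and far simpler than the full Bayesian model.

    import statistics
    from math import exp, pi, sqrt

    # Observed game times (in seconds) at which each building set first appeared
    # across a handful of imaginary replays.  The real model is learned from
    # thousands of replays; these numbers are for illustration only.
    observations = {
        ("gateway",): [75, 80, 78, 82],
        ("gateway", "cybernetics_core"): [160, 170, 155, 175],
        ("gateway", "cybernetics_core", "robotics_facility"): [300, 320, 310],
    }

    def gaussian(x, mean, std):
        return exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * sqrt(2 * pi))

    def likelihoods(game_time):
        """Relative likelihood of each building set being present at a given time,
        modeling the observed timings as normal distributions; widely spread
        observations give a large standard deviation and thus more uncertainty."""
        scores = {}
        for building_set, times in observations.items():
            mean, std = statistics.mean(times), max(statistics.stdev(times), 1.0)
            scores[building_set] = gaussian(game_time, mean, std)
        total = sum(scores.values())
        return {k: v / total for k, v in scores.items()}

    for building_set, prob in likelihoods(165).items():
        print(building_set, round(prob, 3))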

Case Based

Case-based plan recognition may also be carried out using case-based reasoning as a basis. CBR works by storing cases that represent specific knowledge of a problem and solution, and comparing new problems to past cases in order to adapt and reuse past solutions (Aamodt and Plaza 1994). It is commonly used for learning strategic play in RTS games because it can capture complex, incomplete situational knowledge gained from specific experiences to attempt to generalize about a very large problem space, without the need to transform the data (Aamodt and Plaza 1994; Floyd and Esfandiari 2009; Sánchez-Pelegrín, Gómez-Martín, and Díaz-Agudo 2005).

Hsieh and Sun (2008) use CBR to perform keyhole recognition of build orders in StarCraft by analyzing replays of professional players, similar to Synnaeve and Bessière (2011a) above. Hsieh and Sun (2008) use the resulting case base to predict the performance of a build order by counting wins and losses seen in the professional replays, which allows the system to predict which build order is likely to be more successful in particular situations.
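
A minimal sketch of this kind of case-based prediction is shown below, assuming cases are simply (build order, outcome) pairs and using a crude positional similarity measure; the build orders and outcomes are invented for illustration.

    def similarity(a, b):
        """Crude build-order similarity: fraction of early positions that match."""
        matches = sum(1 for x, y in zip(a, b) if x == y)
        return matches / max(len(a), len(b))

    def predicted_win_rate(query, case_base, k=3):
        """Retrieve the k most similar past build orders and average their
        observed outcomes (1 = win, 0 = loss) to estimate success."""
        ranked = sorted(case_base, key=lambda case: similarity(query, case[0]),
                        reverse=True)
        top = ranked[:k]
        return sum(outcome for _, outcome in top) / len(top)

    # Cases extracted from (hypothetical) professional replays.
    case_base = [
        (["pylon", "gateway", "gateway", "zealot"], 1),
        (["pylon", "gateway", "assimilator", "cybernetics_core"], 1),
        (["pylon", "forge", "cannon", "nexus"], 0),
        (["pylon", "gateway", "gateway", "zealot"], 0),
    ]

    print(predicted_win_rate(["pylon", "gateway", "gateway"], case_base))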

In RTS games, CBR is often used not only for plan recognition but also as part of a more general method for learning actions and the situations in which they should be applied. An area of growing interest for researchers involves learning to play RTS games from a demonstration of correct behavior. These learning from demonstration techniques often use CBR and CBP, but they are discussed in their own section, which follows.

Although much of the recent work using CBR for RTS games learns from demonstration, Baumgarten, Colton, and Morris (2009) use CBR directly without observing human play. Their system uses a set of metrics to measure performance, in order to learn to play the strategy game DEFCON17 through an iterative process similar to RL. The system uses cases of past games played to simultaneously learn which strategic moves it should make as well as which moves its opponent is likely to make. It abstracts lower-level information about unit and structure positions by using influence maps for threats and opportunities in an area and by grouping units into fleets and metafleets. In order for it to make generalizations about the cases it has stored, it groups the cases similar to its current situation using a decision tree algorithm, splitting the cases into more or less successful games based on game score and hand-picked metrics. A path through the resulting decision tree is then used as a plan that is expected to result in a high-scoring game. Attribute values not specified by the selected plan are chosen at random, so the system tries different moves until an effective move is found. In this way, it can discover new plans from an initially empty case base.

Learning by Observation

For a domain as complex as RTS games, gathering and maintaining expert knowledge or learning it through trial and error can be a very difficult task, but games can provide simple access to (some of) this information through replays or traces. Most RTS games automatically create traces, recording the events within a game and the actions taken by the players throughout the game. By analyzing the traces, a system can learn from the human demonstration of correct behavior, instead of requiring programmers to specify its behavior manually. This learning solely by observing the expert’s external behavior and environment is usually called learning by observation, but is also known as apprenticeship learning, imitation learning, behavioral cloning, programming by demonstration, and even learning from demonstration (Ontañón, Montana, and Gonzalez 2011). These learning methods are analogous to the way humans are thought to accelerate learning through observing an expert and emulating their actions (Mehta et al. 2009).

Although the concept can be applied to other areas, learning by observation (as well as learning from demonstration, discussed in the next section) is particularly applicable for CBR systems. It can reduce or remove the need for a CBR system designer to extract knowledge from experts or think of potential cases and record them manually (Hsieh and Sun 2008; Mehta et al. 2009). The replays can be transformed into cases for a CBR system by examining the actions players take in response to situations and events, or to complete certain predefined tasks.

In order to test the effectiveness of different techniques for learning by observation, Floyd and Esfandiari (2009) compared CBR, decision trees, support vector machines, and naïve Bayes classifiers for a task based on RoboCup robot soccer.18 In this task, classifiers were given the perceptions and actions of a set of RoboCup players and were required to imitate their behavior. There was particular difficulty in transforming the observations into a form usable by most of the classifiers, as the robots had an incomplete view of the field, so there could be very few or many objects observed at a given time (Floyd and Esfandiari 2009). All of the classifiers besides k-nearest neighbor — the classifier commonly used for CBR — required single-valued features or fixed-size feature vectors, so the missing values were filled with a placeholder item in those classifiers in order to mimic the assumptions of k-nearest neighbor. Classification accuracy was measured using the f-measure, and results showed that the CBR approach outperformed all of the other learning mechanisms (Floyd and Esfandiari 2009). These challenges and results may explain why almost all research in learning by observation and learning from demonstration in the complex domain of RTS games uses CBR as a basis.

Bakkes, Spronck, and van den Herik (2011) describe a case-based learning by observation system that is customized to playing Spring RTS games at a strategic level (figure 13), while the tactical decision making is handled by a script. In addition to regular CBR, with cases extracted from replays, they record a fitness value with each state, so the system can intentionally select suboptimal strategies when it is winning in order to make the game more evenly matched and more fun to play. This requires a good fitness metric for the value of a state, which is difficult to create for an RTS. In order to play effectively, the system uses hand-tuned feature weights on a chosen set of features, and chooses actions that are known to be effective against its expected opponent. The opponent strategy model is found by comparing observed features of the opponent to those of opponents in its case base, which are linked to the games where they were encountered. In order to make case retrieval efficient for accessing online, the case base is clustered and indexed with a fitness metric while offline. After playing a game, the system can add the replay to its case base in order to improve its knowledge of the game and opponent. A system capable of controlled adaptation to its opponent like this could constitute an interesting AI player in a commercial game (Bakkes, Spronck, and van den Herik 2011).
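
The difficulty-scaling aspect can be sketched as a case-selection rule, assuming each stored case records a fitness value describing how decisively its strategy tended to win (the strategies and fitness scale below are invented): when the system is ahead it retrieves a deliberately weaker strategy, and when behind it retrieves the strongest one it knows.

    def select_strategy(cases, winning):
        """Pick a stored strategy: when ahead, retrieve a case whose recorded
        fitness is close to even (deliberately easing off); otherwise retrieve
        the most effective case available."""
        if winning:
            return min(cases, key=lambda c: abs(c["fitness"]))
        return max(cases, key=lambda c: c["fitness"])

    # Fitness here is a hypothetical score in [-1, 1] recorded with each case,
    # where higher values mean the stored strategy tended to win decisively.
    cases = [
        {"strategy": "all_in_rush", "fitness": 0.8},
        {"strategy": "steady_macro", "fitness": 0.3},
        {"strategy": "passive_turtle", "fitness": -0.2},
    ]

    print(select_strategy(cases, winning=True)["strategy"])   # eases off when ahead
    print(select_strategy(cases, winning=False)["strategy"])  # plays to win when behind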

Learning by observation also makes it possible to create a domain-independent system that can simply learn to associate sets of perceptions and actions, without knowing anything about their underlying meaning (Floyd and Esfandiari 2010; 2011a). However, without domain knowledge to guide decisions, learning the correct actions to take in a given situation is very difficult. To compensate, the system must process and analyze observed cases, using techniques like automated feature weighting and case clustering in order to express the relevant knowledge.

Floyd and Esfandiari (2011a) claim their system is capable of handling complex domains with partial information and nondeterminism, and show it to be somewhat effective at learning to play robot soccer and Tetris, but it has not yet been applied to a domain as complex as StarCraft. Their system has more recently been extended to be able to compare perceptions based on the entire sequence of perceptions — effectively a trace — so that it is not limited to purely reactive behavior (Floyd and Esfandiari 2011b). In the modified model, each perceived state contains a link to the previous state, so that when searching for similar states to the current state, the system can incrementally consider additional past states to narrow down a set of candidates. By also considering the similarity of actions contained in the candidate cases, the system can stop comparing past states when all of the candidate cases suggest a similar action, thereby minimizing wasted processing time. In an evaluation where the correct action was dependent on previous actions, the updated system produced a better result than the original, but it is still unable to imitate an agent whose actions are based on a hidden internal state (Floyd and Esfandiari 2011b).

Learning from Demonstration

Instead of learning purely from observing the traces of interaction of a player with a game, the traces may be annotated with extra information — often about the player’s internal reasoning or intentions — making the demonstrations easier to learn from, and providing more control over the particular behaviors learned. Naturally, adding annotations by hand makes the demonstrations more time-consuming to author, but some techniques have been developed to automate this process. This method of learning from constructed examples is known as learning from demonstration.

Given some knowledge about the actions and tasks (things that we may want to complete) in a game, there are a variety of different methods that can be used to extract cases from a trace for use in learning by observation or learning from demonstration systems. Ontañón (2012) provides an overview of several different case acquisition techniques, from the most basic reactive and monolithic learning approaches to more complex dependency graph learning and time-span analysis techniques. Reactive learning selects a single action in response to the current situation, while monolithic sequential learning selects an entire game plan; the first has issues with preconditions and the sequence of actions, whereas the second has issues managing failures in its long-term plan (Ontañón 2012). Hierarchical sequential learning attempts to find a middle ground by learning which actions result in the completion of particular tasks, and which tasks’ actions are subsets of other tasks’ actions, making them subtasks. That way, ordering is retained, but when a plan fails it must only choose a new plan for its current task, instead of for the whole game (Ontañón 2012).

Sequential learning strategies can alternatively use dependency graph learning, which uses known preconditions and postconditions, and observed ordering of actions, to find a partial ordering of actions instead of using the total ordered sequence exactly as observed. However, these approaches to determining subtasks and dependencies produce more dependencies than really exist, because independent actions or tasks that coincidentally occur at a similar time will be considered dependent (Ontañón 2012). The surplus dependencies can be reduced using time-span analysis, which removes dependencies where the duration of the action indicates that the second action started before the first one finished. In an experimental evaluation against static AI, it was found that the dependency graph and time-span analysis improved the results of each strategy they were applied to, with the best results being produced by both techniques applied to the monolithic learning strategy (Ontañón 2012).
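
Time-span analysis itself is straightforward to sketch: given actions with observed start times and durations, any inferred dependency in which the second action began before the first had finished is discarded. The action names and timings below are invented.

    from dataclasses import dataclass

    @dataclass
    class Action:
        name: str
        start: int
        duration: int

        @property
        def end(self):
            return self.start + self.duration

    def prune_dependencies(dependencies, actions):
        """Time-span analysis: drop an inferred dependency a -> b whenever b
        started before a had finished, since b cannot actually depend on a."""
        by_name = {a.name: a for a in actions}
        return [(a, b) for a, b in dependencies
                if by_name[b].start >= by_name[a].end]

    actions = [
        Action("build_barracks", start=100, duration=80),
        Action("train_marine", start=190, duration=24),
        Action("build_supply_depot", start=120, duration=40),  # overlaps the barracks
    ]
    # Dependencies inferred purely from observed ordering.
    inferred = [("build_barracks", "train_marine"),
                ("build_barracks", "build_supply_depot")]
    print(prune_dependencies(inferred, actions))
    # Only the marine truly depends on the barracks; the overlapping depot is pruned.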

Figure 13. Learning by Observation Applied to an RTS.
Offline processing generalizes observations, initialization chooses an effective strategy, and online adaptation ensures cases are appropriate in the current situation. Adapted from Bakkes, Spronck, and van den Herik (2011).

Mehta et al. (2009) describe a CBR and planning system that is able to learn to play the game Wargus from human-annotated replays of the game (figure 14). By annotating each replay with the goals that the player was trying to achieve at the time, the system can group sequences of actions into behaviors to achieve specific goals, and learn a hierarchy of goals and their possible orderings. The learned behaviors are stored in a behavior base that can be used by the planner to achieve goals while playing the game. This results in a system that requires less expert programmer input to develop a game AI because it may be trained to carry out goals and behavior (Mehta et al. 2009).

The system described by Weber and Ontañón (2010) analyzes StarCraft replays to determine the goals being pursued by the player with each action. Using an expert-defined ontology of goals, the system learns which sequences of actions lead to goals being achieved, and in which situations these actions occurred. Thus, it can automatically annotate replays with the goals being undertaken at each point, and convert this knowledge into a case base that is usable in a case-based planning system. The case-based planning system produced was able to play games of StarCraft by retrieving and adapting relevant cases, but was unable to beat the in-built scripted StarCraft AI. Weber and Ontañón (2010) suggest that the system’s capability could be improved using more domain knowledge for comparing state features and identifying goals, which would make it more specific to StarCraft but less generally applicable.

An alternative to analyzing traces is to gather the cases in real time as the game is being played and the correct behavior is being demonstrated — known as online learning. This method has been used to train particular desired behaviors in robots learning robot soccer, so that humans could guide the learning process and apply more training if necessary (Grollman and Jenkins 2007). The training of particular desired behaviors in this way meant that fewer training examples could be covered, so while the robot could learn individual behaviors quickly, it required being set into explicit states for each behavior (Grollman and Jenkins 2007). To the authors’ knowledge, such an approach has not been attempted in RTS games.

Figure 14. General Architecture for a Learning by Demonstration System.
Adapted from Mehta et al. (2009).

Open Research Areas

As well as the areas covered above, most of which are actively being researched, there are some areas that are applicable to RTS AI but seem to have been given little attention. The first of these areas is found by examining the use of game AI in industry and how it differs from academic AI. The next area — multiscale AI — has had a few contributions that have yet to be thoroughly examined, while the third — cooperation — is all but absent from the literature. Each of these three areas raises problems that are challenging for AI agents, and yet almost trivial for a human player. The final section notes the inconsistency in evaluation methods between various papers in the field and calls for a standardized evaluation method to be put into practice.

Game AI in Industry

Despite the active research in the RTS AI field, there seems to be a large divide between the academic research, which uses new, complex AI techniques, and the games industry, which usually uses older and much simpler approaches. By examining the differences in academic and industry use of AI, we see new opportunities for research that may benefit both groups.

Many papers reason that RTS AI research will be useful for new RTS game development by reducing the work involved in creating AI opponents, or by allowing game developers to create better AI opponents (Baekkelund 2006; Dill 2006; Mehta et al. 2009; Ontañón 2012; Ponsen et al. 2005; Tozour 2002; Woodcock 2002). For example, the RTS game DEFCON was given enhanced, learning AI through collaboration with the Imperial College of London (discussed earlier) (Baumgarten, Colton, and Morris 2009). Similarly, Kohan II: Kings of War was produced with flexible AI through a dynamic goal-selection mechanism based on complex priority calculations (discussed earlier) (Dill 2006). More recently, the RTS game Planetary Annihilation,19 currently in development, is using flow fields for effective unit pathfinding with large numbers of units, and neural networks for controlling squads of units.20

In practice, however, there is a very low rate of industry adoption of academic game AI research. It is typical for industry game producers to specify and encode manually the exact behavior of their agents instead of using learning or reasoning techniques (Mehta et al. 2009; Tozour 2002; Woodcock 2002). Older techniques such as scripting, finite state machines, decision trees, and rule-based systems are still the most commonly used (Ontañón 2012; Tozour 2002; Woodcock 2002)20 — for example, the built-in AI of StarCraft uses a static script that chooses randomly among a small set of predetermined behaviors (Huang 2011). These techniques result in game AI that often has predictable, inflexible behavior, is subject to repeatable exploitation by humans, and doesn’t learn or adapt to unforeseen situations or events (Dill 2006; Huang 2011; Ontañón 2012; Woodcock 2002).

There are two main reasons for this lack of adoption of academic AI techniques. Firstly, there is a notable difference in goals between academe and industry. Most academic work focuses on trying to create rational, optimal agents that reason, learn, and react, while the industry aims to create challenging but defeatable opponents that are fun to play against, usually through entirely predefined behavior (Baumgarten, Colton, and Morris 2009; Davis 1999; Lidén 2004; Ontañón 2012; Tozour 2002). The two aims are linked, as players find a game more fun when it is reasonably challenging (Hagelbäck and Johansson 2009),21 but this difference in goals results in very different behavior from the agents. An agent aiming to play an optimal strategy — especially if it is the same optimal strategy every game — is unlikely to make a desirable RTS opponent, because humans enjoy finding and taking advantage of opportunities and opponent mistakes.22 An optimal agent is also trying to win at all costs, while the industry really wants game AI that is aiming to lose the game, but in a more humanlike way (Davis 1999).22 Making AI that acts more humanlike and intelligent — even just in specific circumstances through scripted behaviors — is important in the industry as it is expected to make a game more fun and interesting for the players (Lidén 2004; Scott 2002; Woodcock 2002).

The second major reason for the lack of adoption is that there is little demand from the games industry for new AI techniques. Industry game developers do not view their current techniques as an obstacle to making game AI that is challenging and fun to play against, and note that it is difficult to evaluate the potential of new, untested techniques (Woodcock 2002).20, 22 Industry RTS games often allow AI opponents to cheat in order to make them more challenging, or emphasize playing against human opponents instead of AI (Davis 1999; Laird and van Lent 2001; Synnaeve and Bessière 2011a). Additionally, game development projects are usually under severe time and resource constraints, so trying new AI techniques is both costly and risky (Buro 2004; Tozour 2002).20

In contrast, the existing techniques are seen as predictable, reliable, and easy to test and debug (Dill 2006; Baekkelund 2006; Tozour 2002; Woodcock 2002).22 Academic AI techniques are also seen as difficult to customize, tune, or tweak in order to perform important custom scripted tasks, which scripted AI is already naturally suited to doing.20, 22

Some new avenues of research come to light considering the use of game AI in industry. Most importantly, creating AI that is more humanlike may also make it more fun to play against. This task could be approached by making an RTS AI that is capable of more difficult human interactions. Compared to AI, human players are good at working together with allies, using surprises, deception, distractions and coordinated attacks, planning effective strategies, and changing strategies to become less predictable (Scott 2002). Players that are able to do at least some of these things appear to be intelligent and are more fun for human players to play against (Scott 2002). In addition, being predictable and exploitable in the same fashion over multiple games means that human players do not get to find and exploit new mistakes, removing a source of enjoyment from the game. AI can even make mistakes and still appear intelligent as long as the mistake appears plausible in the context of the game — the sort of mistakes that a human would make (Lidén 2004).

An alternative way to create AI that is more humanlike is to replicate human play styles and skills. Enabling an AI to replicate particular strategies — for example a heavily defensive turtle strategy or heavily offensive rush strategy — would give the AI more personality and allow players to practice against particular strategies.22 This concept has been used in industry AI before (Dill 2006) but may be difficult to integrate into more complex AI techniques. A system capable of learning from a human player — using a technique such as learning from demonstration (see the section on this topic), likely using offline optimization — could allow all or part of the AI to be trained instead of programmed (Floyd and Esfandiari 2010; Mehta et al. 2009). Such a system could potentially copy human skills — like unit micromanagement or building placement — in order to keep up with changes in how humans play a game over time, which makes it an area of particular interest to the industry.22

Evaluating whether an RTS AI is humanlike is potentially an issue. For FPS games, there is an AI competition, BotPrize,23 for creating the most humanlike bots (AI players), where the bots are judged on whether they appear to be a human playing the game — a form of Turing test.24 This test was finally passed in 2012, with two bots judged more likely to be humans than bots for the first time. Appearing humanlike in an RTS would be an even greater challenge than in an FPS, as there are more ways for the player to act and react to every situation, and many actions are much more visible than the very fast-paced transient actions of an FPS. However, being humanlike is not currently a focus of any StarCraft AI research, to the authors’ knowledge, although it has been explored to a very small extent in the context of some other RTS games. It is also not a category in any of the current StarCraft AI competitions. The reason for this could be the increased difficulty of creating a human-level agent for RTS games compared with FPS games; however, it may simply be due to an absence of goals in this area of game AI research. A Turing test similar to BotPrize could be designed for StarCraft bots by making humans play in matches and then decide whether their opponent was a human or a bot. It could be implemented fairly easily on a competitive ladder like ICCup by simply allowing a human to join a match and asking them to judge the humanness of their opponent during the match. Alternatively, the replay facility in StarCraft could be used to record matches between bots and humans of different skill levels, and other humans could be given the replays to judge the humanness of each player. Due to the popularity of StarCraft, expert participants and judges should be relatively easy to find.

A secondary avenue of research is in creating RTS AI that is more accessible or useful outside of academe. This can partially be addressed by simply considering and reporting how often the AI can be relied upon to behave as expected, how performant the system is, and how easily the system can be tested and debugged. However, explicit research into these areas could yield improvements that would benefit both academe and industry. More work could also be done to investigate how to make complex RTS AI systems easier to tweak and customize, to produce specific behavior while still retaining learning or reasoning capabilities. Industry feedback indicates it is not worthwhile to adapt individual academic AI techniques in order to apply them to individual games, but it may become worthwhile if techniques could be reused for multiple games in a reliable fashion. A generalized RTS AI middleware could allow greater industry adoption — games could be more easily linked to the middleware and then tested with multiple academic techniques — as well as a wider evaluation of academic techniques over multiple games. Research would be required in order to find effective abstractions for such a complex and varied genre of games, and to show the viability of this approach.

Multiscale AI

Due to the complexity of RTS games, current bots require multiple abstractions and reasoning mechanisms working in concert in order to play effectively (Churchill and Buro 2012; Weber et al. 2010; Weber, Mateas, and Jhala 2011a). In particular, most bots have separate ways of handling tactical and strategic level decision making, as well as separately managing resources, construction, and reconnaissance. Each of these modules faces an aspect of an interrelated problem, where actions taken will have long-term strategic trade-offs affecting the whole game, so they cannot simply divide the problem into isolated or hierarchical problems. A straightforward hierarchy of command — like in a real-world military — is difficult in an RTS because the decisions of the top-level commander will depend on, and affect, multiple subproblems, requiring an understanding of each one as well as how they interact. For example, throughout the game, resources could be spent on improving the resource generation, training units for an army, or constructing new base infrastructure, with each option controlled by a different module that cannot assess the others’ situations. Notably, humans seem to be able to deal with these problems very well through a combination of on- and offline, reactive, deliberative, and predictive reasoning.

Weber et al. (2010) define the term multiscale AI problems to refer to these challenges, characterized by concurrent and coordinated goal pursuit across multiple abstractions. They go on to describe several different approaches they are using to integrate parts of their bot. First is a working memory or shared blackboard concept for indirect communication between their modules, where each module publishes its current beliefs for the others to read. Next, they allow for goals and plans generated by their planning and reasoning modules to be inserted into their central reactive planning system, to be pursued in parallel with current goals and plans. Finally, they suggest a method for altered behavior activation, so that modules can modify the preconditions for defined behaviors, allowing them to activate and deactivate behaviors based on the situation.

A simpler approach may be effective for at least some parts of an RTS bot. Synnaeve and Bessière (2011b) use a higher-level tactical command, such as scout, hold position, flock, or fight, as one of the inputs to their micromanagement controller. Similarly, Churchill and Buro (2012) use a hierarchical structure for unit control, with an overall game commander — the module that knows about the high-level game state and makes strategic decisions — giving commands to a macro commander and a combat commander, each of which give commands to their subcommanders. Commanders further down the hierarchy are increasingly focused on a particular task, but have less information about the overall game state, and therefore must rely on their parents to make them act appropriately in the bigger picture. This is relatively effective because the control of units is more hierarchically arranged than other aspects of an RTS. Such a system allows the low-level controllers to incorporate information from their parent in the hierarchy, but they are unable to react and coordinate with other low-level controllers directly in order to perform cooperative actions (Synnaeve and Bessière 2011b). Most papers on StarCraft AI skirt this issue by focusing on one aspect of the AI only, as can be seen in how this review paper is divided into tactical and strategic decision making sections.
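
The hierarchical arrangement can be sketched as a tree of commanders in which each node refines the order it receives from its parent before passing it on; the class names and orders below are invented and greatly simplified compared to an actual bot.

    class Commander:
        """A node in the command hierarchy: it receives an order from its parent,
        acts on the part relevant to its own task, and forwards a refined order
        to its children."""
        def __init__(self, name, children=()):
            self.name = name
            self.children = list(children)

        def command(self, order):
            print(f"{self.name}: {order}")
            for child in self.children:
                child.command(self.refine(order))

        def refine(self, order):
            # Subclasses narrow a broad order into one relevant to their task.
            return order

    class MacroCommander(Commander):
        def refine(self, order):
            return "train army units" if order == "push now" else "expand and build workers"

    class CombatCommander(Commander):
        def refine(self, order):
            return "attack nearest enemy base" if order == "push now" else "hold position"

    game_commander = Commander("game", children=[
        MacroCommander("macro", children=[Commander("production")]),
        CombatCommander("combat", children=[Commander("squad-1"), Commander("squad-2")]),
    ])
    game_commander.command("push now")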

Cooperation

Cooperation is an essential ability in many situations, but RTS games present a particularly complex environment in which the rules and overall goal are fixed, and there is a limited ability to communicate with your cooperative partner(s). It would also be very helpful in commercial games, as good cooperative players could be used for coaching or team games. In team games humans often team up to help each other with coordinated actions throughout the game, like attacking and defending, even without actively communicating. Conversely AI players in most RTS games (including StarCraft) will act seemingly independently of their teammates. A possible beginning direction for this research could be to examine some techniques developed for opponent modeling and reuse them for modeling an ally, thus giving insight into how the player should act to coordinate with the ally. Alternatively, approaches to teamwork and coordination used in other domains, such as RoboCup (Kitano et al. 1998), may be appropriate to be adapted or extended for use in the RTS domain.

Despite collaboration being highlighted as a challenging AI research problem in Buro (2003), to the authors’ knowledge just one research publication focusing on collaborative behavior exists in the domain of StarCraft (and RTS games in general). Magnusson and Balsasubramaniyan (2012) modified an existing StarCraft bot to allow both communication of the bot’s intentions and in-game human control of the bot’s behavior. It was tested in a small experiment in which a player is allied with the bot, with or without the communication and control elements, against two other bots. The players rated the communicating bots as more fun to play with than the noncommunicating bots, and more experienced players preferred to be able to control the bot while novice players preferred a noncontrollable bot. Much more research is required to investigate collaboration between humans and bots, as well as collaboration between bots only.

Standardized Evaluation

Despite games being a domain that is inherently suited to evaluating the effectiveness of the players and measuring performance, it is difficult to make fair comparisons between the results of most literature in the StarCraft AI field.

Almost every paper has a different method for evaluating its results, and many of these experiments are of poor quality. Evaluation is further complicated by the diversity of applications, as many of the systems developed are not suited to playing entire games of StarCraft, but are suited to a specific subproblem. Such a research community, made up of isolated studies that are not mutually comparable, was recognized as problematic by Aha and Molineaux (2004). Their Testbed for Integrating and Evaluating Learning Techniques (TIELT), which aimed to standardize the learning environment for evaluation, attempted to address the problem but unfortunately never became very widely used.

Partial systems — those that are unable to play a full game of StarCraft — are often evaluated using a custom metric, which makes comparison between such systems nearly impossible. A potential solution for this would be to select a common set of parts that could plug in to partial systems and allow them to function as a complete system for testing. This may be possible by compartmentalizing parts of an open-source AI used in a StarCraft AI competition, such as UAlbertaBot (Churchill and Buro 2012), which is designed to be modular, or using an add-on library such as the BWAPI Standard Add-on Library (BWSAL).25 Alternatively, a set of common tests could be made for partial systems to be run against. Such tests could examine common subproblems of an AI system, such as tactical decision making, planning, and plan recognition, as separate suites of tests. Even without these tests in place, new systems should at least be evaluated against representative related systems in order to show that they represent a nontrivial improvement.

Results published about complete systems are similarly difficult to compare against one another due to their varied methods of evaluation. Some of the only comparable results come from systems demonstrated against the inbuilt StarCraft AI, despite the fact that the inbuilt AI is a simple scripted strategy that average human players can easily defeat (Weber, Mateas, and Jhala 2010). Complete systems are more effectively tested in StarCraft AI competitions, but these are run infrequently, making quick evaluation difficult. An alternative method of evaluation is to automatically test the bots against other bots in a ladder tournament, such as in the StarCraft Brood War Ladder for BWAPI Bots.26 In order to create a consistent benchmark of bot strength, a suite of tests could be formed from the top three bots from each of the AIIDE StarCraft competitions on a selected set of tournament maps. This would provide enough variety to give a general indication of bot strength, and it would allow for results to be compared between papers and over different years. An alternative to testing bots against other bots is testing them in matches against humans, such as how Weber, Mateas, and Jhala (2010) tested their bot in the ICCup.

Finally, it may be useful to have a standard evaluation method for goals other than finding the AI best at winning the game. For example, the game industry would be more interested in determining the AI that is most fun to play against, or the most humanlike. A possible evaluation for these alternate objectives was discussed earlier.

Conclusion

This article has reviewed the literature on artificial intelligence for real-time strategy games focusing on StarCraft. It found significant research focus on tactical decision making, strategic decision making, plan recognition, and strategy learning. Three main areas were identified where future research could have a large positive impact. First, creating RTS AI that is more humanlike would be an interesting challenge and may help to bridge the gap between academe and industry. The other two research areas discussed were noted to be lacking in research contributions, despite being highly appropriate for real-time strategy game research: multiscale AI, and cooperation. Finally, the article finished with a call for increased rigor and ideally standardization of evaluation methods, so that different techniques can be compared on even ground. Overall the RTS AI field is small but very active, with the StarCraft agents showing continual improvement each year, as well as gradually becoming more based upon machine learning, learning from demonstration, and reasoning, instead of using scripted or fixed behaviors.

Notes

1. Blizzard Entertainment: StarCraft: blizzard.com/games/sc/.

2. Wargus: wargus.sourceforge.net.

3. Open RTS: skatgame.net/mburo/orts.

4. Brood War API: code.google.com/p/bwapi.

5. AIIDE StarCraft AI Competition: www.starcraftaicompetition.com.

6. CIG StarCraft AI Competition: ls11-www.cs.uni-dortmund.de/rts-competition/.

7. Mad Doc Software. Website no longer available.

8. SparCraft: code.google.com/p/sparcraft/.

9.  Blizzard Entertainment: Warcraft III: blizzard.com/games/war3/.

10.  TimeGate Studios: Kohan II Kings of War: www.timegate.com/games/kohan-2-kings-of-war.

11. Spring RTS: springrts.com.

12. International Cyber Cup: www.iccup.com.

13. See A. J. Champandard, This Year [2010] in Game AI: Analysis, Trends from 2010 and Predictions for 2011. aigamedev.com/open/editorial/2010-retrospective.

14.  Blizzard Entertainment: StarCraft II: blizzard.com/games/sc2/.

15.  Evolution Chamber: code.google.com/p/evolution-chamber/.

16. See A. Turner, 2012, Soar-SC: A Platform for AI Research in StarCraft: Brood War. github.com/bluechill/Soar-SC/tree/master/Soar-SC-Papers.

17.  Introversion Software: DEFCON: www.introversion.co.uk/defcon.

18. RoboCup: www.robocup.org.

19. Uber Entertainment: Planetary Annihilation: www.uberent.com/pa.

20. Personal communication with M. Robbins, 2013. Robbins is a software engineer at Uber Entertainment, formerly game-play engineer at Gas Powered Games.

21. Also see L. Dicken’s 2011 blog, altdevblogaday.com/2011/05/12/a-difficult-subject/.

22. Personal communication with B. Schwab, 2013. Schwab is a senior AI/game-play engineer at Blizzard Entertainment.

23. BotPrize: botprize.org.

24. See L. Dicken’s 2011 blog, A Turing Test for Bots. altdevblogaday.com/2011/09/09/a-turing-test-for-bots/.

25.  BWAPI Standard Add-on Library: code.google.com/p/bwsal.

26.  StarCraft Brood War Ladder for BWAPI Bots: bots-stats.krasi0.com.

References

Aamodt, A., and Plaza, E. 1994. Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. AI Communications 7(1): 39–59.

Aha, D.; Molineaux, M.; and Ponsen, M. 2005. Learning to Win: Case-Based Plan Selection in a Real-Time Strategy Game. In Case-Based Reasoning: Research and Development, volume 3620, Lecture Notes in Computer Science, ed. H. Muñoz-Ávila and F. Ricci, 5–20. Berlin: Springer.

Aha, D. W., and Molineaux, M. 2004. Integrating Learning in Interactive Gaming Simulators. In Challenges in Game AI: Papers from the AAAI Workshop. AAAI Technical Report WS-04-04. Palo Alto, CA: AAAI Press.

Baekkelund, C. 2006. Academic AI Research and Relations with the Games Industry. In AI Game Programming Wisdom, volume 3, ed. S. Rabin, 77–88. Boston, MA: Charles River Media.

Bakkes, S.; Spronck, P.; and van den Herik, J. 2011. A CBR-Inspired Approach to Rapid and Reliable Adaption of Video Game AI. Paper presented at the Case-Based Reasoning for Computer Games Workshop at the International Conference on Case-Based Reasoning (ICCBR), 17–26. Greenwich, London, 12–15 September.

Balla, R., and Fern, A. 2009. UCT for Tactical Assault Planning in Real-Time Strategy Games. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), 40–45. Palo Alto, CA: AAAI Press.

Baumgarten, R.; Colton, S.; and Morris, M. 2009. Combining AI Methods for Learning Bots in a Real-Time Strategy Game. International Journal of Computer Games Technology 2009: Article Number 4.

Buckland, M. 2005. Programming Game AI by Example. Plano,TX: Wordware Publishing, Inc.

Buro, M., ed. 2012. Artificial Intelligence in AdversarialReal-Time Games: Papers from the 2012 AIIDE Workshop,AAAI Technical Report WS-12-15. Palo Alto, CA: AAAI Press.

Buro, M. 2004. Call for AI Research in RTS Games. In Chal-lenges in Game AI: Papers from the AAAI Workshop, 139–142. AAAI Technical Report WS-04-04. Palo Alto, CA: AAAIPress.

Buro, M. 2003. Real-Time Strategy Games: A New AIResearch Challenge. In Proceedings of the Eighteenth Interna-tional Joint Conference on Artificial Intelligence, 1534–1535.San Francisco: Morgan Kaufmann, Inc.

Buro, M., and Churchill, D. 2012. Real-Time Strategy GameCompetitions. AI Magazine 33(3): 106–108.

Buro, M., and Furtak, T. M. 2004. RTS Games and Real-TimeAI Research. Paper presented at the 13th Behavior Repre-sentation in Modeling and Simulation Conference, Arling-ton, VA, USA, 17–20 May.

Cadena, P., and Garrido, L. 2011. Fuzzy Case-Based Reason-ing for Managing Strategic and Tactical Reasoning in Star-Craft. In Advances in Artificial Intelligence, volume 7094, Lec-ture Notes in Computer Science, ed. I. Batyrshin and G.Sidorov, 113–124. Berlin: Springer.

Chan, H.; Fern, A.; Ray, S.; Wilson, N.; and Ventura, C. 2007.Online Planning for Resource Production in Real-TimeStrategy Games. In Proceedings of the 17th International Con-ference on Automated Planning and Scheduling (ICAPS), 65–72.Menlo Park, CA: AAAI Press.

Cheng, D., and Thawonmas, R. 2004. Case-Based PlanRecognition for Real-Time Strategy Games. In Proceedings ofthe 5th Annual European GAME-ON Conference, 36–40. Read-ing, UK: University of Wolverhampton Press.

Chung, M.; Buro, M.; and Schaeffer, J. 2005. Monte Carlo Planning in RTS Games. In Proceedings of the 2005 IEEE Conference on Computational Intelligence and Games, ed. G. Kendall and S. Lucas, 117–124. Piscataway, NJ: Institute for Electrical and Electronics Engineers.

Churchill, D., and Buro, M. 2011. Build Order Optimization in StarCraft. In Proceedings of the Seventh AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 14–19. Palo Alto, CA: AAAI Press.

Churchill, D., and Buro, M. 2012. Incorporating Search Algorithms into RTS Game Agents. In Artificial Intelligence in Adversarial Real-Time Games: Papers from the 2012 AIIDE Workshop, AAAI Technical Report WS-12-15, 2–7. Palo Alto, CA: AAAI Press.

Churchill, D.; Saffidine, A.; and Buro, M. 2012. Fast Heuristic Search for RTS Game Combat Scenarios. In Proceedings of the Eighth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 112–117. Palo Alto, CA: AAAI Press.

Davis, I. L. 1999. Strategies for Strategy Game AI. In Artificial Intelligence and Computer Games: Papers from the AAAI Spring Symposium, 24–27. AAAI Technical Report SS-99-02. Menlo Park, CA: AAAI Press.

Dereszynski, E.; Hostetler, J.; Fern, A.; Dietterich, T.; Hoang, T.; and Udarbe, M. 2011. Learning Probabilistic Behavior Models in Real-Time Strategy Games. In Proceedings of the Seventh AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 20–25. Palo Alto, CA: AAAI Press.

Dill, K. 2006. Prioritizing Actions in a Goal-Based RTS AI. In AI Game Programming Wisdom, volume 3, ed. S. Rabin, 321–330. Boston, MA: Charles River Media.

Floyd, M., and Esfandiari, B. 2009. Comparison of Classifiers for Use in a Learning by Demonstration System for a Situated Agent. Paper presented at the Case-Based Reasoning for Computer Games Workshop at the 8th International Conference on Case-Based Reasoning, Seattle, WA, USA, 20–23 July.

Floyd, M., and Esfandiari, B. 2010. Toward a Domain Independent Case-Based Reasoning Approach for Imitation: Three Case Studies in Gaming. Paper presented at the Case-Based Reasoning for Computer Games Workshop held at the 18th International Conference on Case-Based Reasoning, Alessandria, Italy, 19–22 July.

Floyd, M. W., and Esfandiari, B. 2011a. A Case-Based Reasoning Framework for Developing Agents Using Learning by Observation. In Proceedings of the 2011 IEEE International Conference on Tools with Artificial Intelligence, 531–538. Piscataway, NJ: Institute for Electrical and Electronics Engineers.

Floyd, M., and Esfandiari, B. 2011b. Learning State-Based Behaviour Using Temporally Related Cases. Paper presented at the Nineteenth UK Workshop on Case-Based Reasoning, Cambridge, UK, 9 December.

Gabriel, I.; Negru, V.; and Zaharie, D. 2012. Neuroevolution Based MultiAgent System for Micromanagement in Real-Time Strategy Games. In Proceedings of the Fifth Balkan Conference in Informatics, 32–39. New York: Association for Computing Machinery.

Grollman, D., and Jenkins, O. 2007. Learning Robot Soccer Skills From Demonstration. In Proceedings of the 2007 IEEE International Conference on Development and Learning, 276–281. Piscataway, NJ: Institute for Electrical and Electronics Engineers.

Hagelbäck, J., and Johansson, S. J. 2008. The Rise of Potential Fields in Real Time Strategy Bots. In Proceedings of the Fourth Artificial Intelligence and Interactive Digital Entertainment Conference, 42–47. Palo Alto, CA: AAAI Press.

Hagelbäck, J., and Johansson, S. 2009. Measuring Player Experience on Runtime Dynamic Difficulty Scaling in an RTS Game. In Proceedings of the 2009 IEEE Conference on Computational Intelligence and Games, 46–52. Piscataway, NJ: Institute for Electrical and Electronics Engineers.

Hostetler, J.; Dereszynski, E.; Dietterich, T.; and Fern, A. 2012. Inferring Strategies from Limited Reconnaissance in Real-Time Strategy Games. Paper presented at the Conference on Uncertainty in Artificial Intelligence, Avalon, Catalina Island, CA, 15–17 August.

Hsieh, J., and Sun, C. 2008. Building a Player Strategy Model by Analyzing Replays of Real-Time Strategy Games. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks, 3106–3111. Piscataway, NJ: Institute for Electrical and Electronics Engineers.

Huang, H. 2011. Skynet Meets the Swarm: How the Berkeley Overmind Won the 2010 StarCraft AI Competition. Ars Technica, 18 January 2011. (arstechnica.com/gaming/news/2011/01/skynet-meets-the-swarm-how-the-berkeley-overmind-won-the-2010-starcraft-ai-competition.ars).

Jaidee, U.; Muñoz-Avila, H.; and Aha, D. 2011. Integrated Learning for Goal-Driven Autonomy. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, 2450–2455. Palo Alto, CA: AAAI Press.

Judah, K.; Roy, S.; Fern, A.; and Dietterich, T. G. 2010. Reinforcement Learning via Practice and Critique Advice. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press.

Kabanza, F.; Bellefeuille, P.; Bisson, F.; Benaskeur, A.; and Irandoust, H. 2010. Opponent Behaviour Recognition for Real-Time Strategy Games. In Plan, Activity, and Intent Recognition: Papers from the AAAI Workshop. Technical Report WS-10-15. Palo Alto, CA: AAAI Press.

Kitano, H.; Tambe, M.; Stone, P.; Veloso, M.; Coradeschi, S.; Osawa, E.; Matsubara, H.; Noda, I.; and Asada, M. 1998. The RoboCup Synthetic Agent Challenge 97. In RoboCup-97: Robot Soccer World Cup I, volume 1395, Lecture Notes in Computer Science, ed. H. Kitano, 62–73. Berlin: Springer.

Laagland, J. 2008. A HTN Planner for a Real-Time Strategy Game. Unpublished manuscript. (hmi.ewi.utwente.nl/verslagen/capita-selecta/CS-Laagland-Jasper.pdf).

Laird, J., and van Lent, M. 2001. Human-Level AI’s Killer Application: Interactive Computer Games. AI Magazine 22(2): 15–26.

Lidén, L. 2004. Artificial Stupidity: The Art of Intentional Mistakes. In AI Game Programming Wisdom, volume 2, ed. S. Rabin, 41–48. Hingham, MA: Charles River Media.

Magnusson, M. M., and Balsasubramaniyan, S. K. 2012. A Communicating and Controllable Teammate Bot for RTS Games. Master’s thesis, School of Computing, Blekinge Institute of Technology, Blekinge, Sweden.

Manslow, J. 2004. Using Reinforcement Learning to Solve AI Control Problems. In AI Game Programming Wisdom, volume 2, ed. S. Rabin, 591–601. Hingham, MA: Charles River Media.

Marthi, B.; Russell, S.; Latham, D.; and Guestrin, C. 2005. Concurrent Hierarchical Reinforcement Learning. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, 779–785. San Francisco: Morgan Kaufmann, Inc.

Mateas, M., and Stern, A. 2002. A Behavior Language for Story-Based Believable Agents. IEEE Intelligent Systems 17(4): 39–47.

Mehta, M.; Ontañón, S.; Amundsen, T.; and Ram, A. 2009. Authoring Behaviors for Games Using Learning from Demonstration. Paper presented at the Case-Based Reasoning for Computer Games Workshop at the 8th International Conference on Case-Based Reasoning, Seattle, WA, USA, 20–23 July.

Mishra, K.; Ontañón, S.; and Ram, A. 2008. Situation Assessment for Plan Retrieval in Real-Time Strategy Games. In Advances in Case-Based Reasoning, volume 5239, Lecture Notes in Computer Science, ed. K.-D. Althoff, R. Bergmann, M. Minor, and A. Hanft, 355–369. Berlin: Springer.

Molineaux, M.; Aha, D.; and Moore, P. 2008. Learning Continuous Action Models in a Real-Time Strategy Environment. In Proceedings of the Twenty-First International Florida Artificial Intelligence Research Society (FLAIRS) Conference, 257–262. Palo Alto, CA: AAAI Press.

Molineaux, M.; Klenk, M.; and Aha, D. 2010. Goal-Driven Autonomy in a Navy Strategy Simulation. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press.

Muñoz-Avila, H., and Aha, D. 2004. On the Role of Explanation for Hierarchical Case-Based Planning in Real-Time Strategy Games. In Advances in Case-Based Reasoning, 7th European Conference, ECCBR 2004. Lecture Notes in Computer Science. Berlin: Springer.

Nejati, N.; Langley, P.; and Konik, T. 2006. Learning Hierarchical Task Networks by Observation. In Proceedings of the 23rd International Conference on Machine Learning, 665–672. New York: Association for Computing Machinery.

Ontañón, S. 2012. Case Acquisition Strategies for Case-Based Reasoning in Real-Time Strategy Games. In Proceedings of the Twenty-Fifth International Florida Artificial Intelligence Research Society Conference (FLAIRS). Palo Alto, CA: AAAI Press.

Ontañón, S.; Mishra, K.; Sugandh, N.; and Ram, A. 2007. Case-Based Planning and Execution for Real-Time Strategy Games. In Case-Based Reasoning: Research and Development, volume 4626, Lecture Notes in Computer Science, ed. R. Weber and M. Richter, 164–178. Berlin: Springer.

Ontañón, S.; Montana, J.; and Gonzalez, A. 2011. Towards a Unified Framework for Learning from Observation. Paper presented at the Workshop on Agents Learning Interactively from Human Teachers at the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona, Spain, 16–22 July.

Ontañón, S.; Synnaeve, G.; Uriarte, A.; Richoux, F.; Churchill, D.; and Preuss, M. In press. A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games 5(4): 1–19.

Orkin, J. 2004. Applying Goal-Oriented Action Planning to Games. In AI Game Programming Wisdom, volume 2, ed. S. Rabin, 217–227. Hingham, MA: Charles River Media.

Palma, R.; Sánchez-Ruiz, A.; Gómez-Martín, M.; Gómez-Martín, P.; and González-Calero, P. 2011. Combining Expert Knowledge and Learning from Demonstration in Real-Time Strategy Games. In Case-Based Reasoning Research and Development, volume 6880, Lecture Notes in Computer Science, ed. A. Ram and N. Wiratunga, 181–195. Berlin: Springer.

Perkins, L. 2010. Terrain Analysis in Real-Time Strategy Games: An Integrated Approach to Choke Point Detection and Region Decomposition. In Proceedings of the Sixth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 168–173. Palo Alto, CA: AAAI Press.


Ponsen, M.; Muñoz-Avila, H.; Spronck, P.; and Aha, D. 2005. Automatically Acquiring Domain Knowledge for Adaptive Game AI Using Evolutionary Learning. In Proceedings, The Twentieth National Conference on Artificial Intelligence and the Seventeenth Innovative Applications of Artificial Intelligence Conference, 1535–1540. Palo Alto, CA: AAAI Press.

Ponsen, M.; Muñoz-Avila, H.; Spronck, P.; and Aha, D. 2006. Automatically Generating Game Tactics Through Evolutionary Learning. AI Magazine 27(3): 75–84.

Sailer, F.; Buro, M.; and Lanctot, M. 2007. Adversarial Planning Through Strategy Simulation. In Proceedings of the IEEE Conference on Computational Intelligence and Games, 80–87. Piscataway, NJ: Institute for Electrical and Electronics Engineers.

Sánchez-Pelegrín, R.; Gómez-Martín, M.; and Díaz-Agudo, B. 2005. A CBR Module for a Strategy Videogame. Paper presented at the ICCBR05 Workshop on Computer Gaming and Simulation Environments at the ICCBR, Chicago, IL, 23–26 August.

Schaeffer, J. 2001. A Gamut of Games. AI Magazine 22(3): 29–46.

Scott, B. 2002. The Illusion of Intelligence. In AI Game Programming Wisdom, volume 1, ed. S. Rabin, 16–20. Hingham, MA: Charles River Media.

Shantia, A.; Begue, E.; and Wiering, M. 2011. Connectionist Reinforcement Learning for Intelligent Unit Micro Management in StarCraft. Paper presented at the 2011 International Joint Conference on Neural Networks (IJCNN), San Jose, CA, USA, 31 July–5 August.

Sharma, M.; Holmes, M.; Santamaria, J.; Irani, A.; Isbell, C.; and Ram, A. 2007. Transfer Learning in Real-Time Strategy Games Using Hybrid CBR/RL. In Proceedings of the 20th International Joint Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press.

Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press.

Synnaeve, G., and Bessière, P. 2011a. A Bayesian Model for Plan Recognition in RTS Games Applied to StarCraft. In Proceedings of the Seventh AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 79–84. Palo Alto, CA: AAAI Press.

Synnaeve, G., and Bessière, P. 2011b. A Bayesian Model for RTS Units Control Applied to StarCraft. In Proceedings of the 2011 IEEE Conference on Computational Intelligence and Games, 190–196. Piscataway, NJ: Institute for Electrical and Electronics Engineers.

Synnaeve, G., and Bessière, P. 2012. A Dataset for StarCraft AI and an Example of Armies Clustering. In Artificial Intelligence in Adversarial Real-Time Games: Papers from the 2012 AIIDE Workshop, AAAI Technical Report WS-12-15. Palo Alto, CA: AAAI Press.

Szczepanski, T., and Aamodt, A. 2009. Case-Based Reasoning for Improved Micromanagement in Real-Time Strategy Games. Paper presented at the Case-Based Reasoning for Computer Games Workshop at the 8th International Conference on Case-Based Reasoning, Seattle, WA, USA, 20–23 July.

Tozour, P. 2002. The Evolution of Game AI. In AI Game Programming Wisdom, volume 1, ed. S. Rabin, 3–15. Hingham, MA: Charles River Media.

Uriarte, A., and Ontañón, S. 2012. Kiting in RTS Games Using Influence Maps. In Artificial Intelligence in Adversarial Real-Time Games: Papers from the 2012 AIIDE Workshop, AAAI Technical Report WS-12-15. Palo Alto, CA: AAAI Press.

Weber, B., and Mateas, M. 2009. A Data Mining Approach to Strategy Prediction. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Games, 140–147. Piscataway, NJ: Institute for Electrical and Electronics Engineers.

Weber, B.; Mateas, M.; and Jhala, A. 2010. Applying Goal-Driven Autonomy to StarCraft. In Proceedings of the Sixth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 101–106. Palo Alto, CA: AAAI Press.

Weber, B.; Mateas, M.; and Jhala, A. 2011a. Building Human-Level AI for Real-Time Strategy Games. In Advances in Cognitive Systems: Papers from the AAAI Fall Symposium. Technical Report FS-11-01, 329–336. Palo Alto, CA: AAAI Press.

Weber, B.; Mateas, M.; and Jhala, A. 2011b. A Particle Model for State Estimation in Real-Time Strategy Games. In Proceedings of the Seventh AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 103–108. Palo Alto, CA: AAAI Press.

Weber, B.; Mateas, M.; and Jhala, A. 2012. Learning from Demonstration for Goal-Driven Autonomy. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 1176–1182. Palo Alto, CA: AAAI Press.

Weber, B.; Mawhorter, P.; Mateas, M.; and Jhala, A. 2010. Reactive Planning Idioms for Multiscale Game AI. In Proceedings of the 2010 IEEE Conference on Computational Intelligence and Games, 115–122. Piscataway, NJ: Institute for Electrical and Electronics Engineers.

Weber, B., and Ontañón, S. 2010. Using Automated Replay Annotation for Case-Based Planning in Games. Paper presented at the Case-Based Reasoning for Computer Games Workshop at the 8th International Conference on Case-Based Reasoning, Seattle, WA, USA, 20–23 July.

Wintermute, S.; Xu, J.; and Laird, J. 2007. SORTS: A Human-Level Approach to Real-Time Strategy AI. In Proceedings of the Third Artificial Intelligence and Interactive Digital Entertainment Conference, 55–60. Palo Alto, CA: AAAI Press.

Woodcock, S. 2002. Foreword. In AI Techniques for Game Programming, ed. M. Buckland. Portland, OR: Premier Press.

Ian Watson is an associate professor of artificial intelligence in the Department of Computer Science at the University of Auckland, New Zealand. With a background in expert systems, Watson became interested in case-based reasoning (CBR) to reduce the knowledge engineering bottleneck. Watson has remained active in CBR, focusing on game AI alongside other techniques. Watson also has an interest in the history of computing, having written a popular science book called The Universal Machine.

Glen Robertson is a Ph.D. candidate at the University of Auckland, working under the supervision of Ian Watson. Robertson’s research interests are in machine learning and artificial intelligence, particularly in unsupervised learning for complex domains with large data sets.
