

Playing to Learn: Case-Injected Genetic Algorithms for Learning to Play Computer Games

Sushil J. Louis, Member, IEEE, and Chris Miles

Abstract— We use case-injected genetic algorithms to learn to competently play computer strategy games. Case-injected genetic algorithms periodically inject individuals that were successful in past games into the population of the GA working on the current game, biasing search towards known successful strategies. Computer strategy games are fundamentally resource allocation games characterized by complex long-term dynamics and by imperfect knowledge of the game state. The case-injected genetic algorithm plays by extracting and solving the game's underlying resource allocation problems. We show how case injection can be used to learn to play better from a human's or system's game-playing experience, and our approach to acquiring experience from human players showcases an elegant solution to the knowledge acquisition bottleneck in this domain. Results show that with an appropriate representation, case injection effectively biases the genetic algorithm towards producing plans that contain important strategic elements from previously successful strategies.

I. INTRODUCTION

The computer gaming industry is now almost as big as the movie industry, and both gaming and entertainment drive research in graphics, modeling, and many other computer fields. Although AI and evolutionary computing research has been interested in games like checkers and chess [1], [2], [3], [4], [5], [6], popular computer games such as Starcraft and Counter-Strike are very different and have not received much attention. These games are situated in a virtual world, involve both long-term and reactive planning, and provide an immersive, fun experience. At the same time, we can pose many training, planning, and scientific problems as games in which player decisions bias or determine the final solution.

Developers of computer players (game AI) for popular first-person shooters (FPS) and real-time strategy (RTS) games tend to acquire and encode human-expert knowledge in finite state machines or rule-based systems [7], [8]. This works well until a human player learns the game AI's weaknesses, and it requires significant player and developer time to create competent players. Development of game AI thus suffers from the knowledge acquisition bottleneck that is well known to AI researchers.

This paper, in contrast, describes and uses a Case-Injected Genetic AlgoRithm (CIGAR) that combines genetic algorithms with case-based reasoning to competently play a computer strategy game. The main task in such a strategy game is to continuously allocate (and re-allocate) resources to counter opponent moves. Since RTS games are fundamentally about solving a sequence of resource allocation problems, the genetic algorithm plays by attempting to solve these underlying resource allocation problems. Note that the genetic algorithm (or human) is attempting to solve resource allocation problems with no guarantee that the genetic algorithm (or human) will find the optimal solution to the current resource allocation problem; quickly finding a good solution is usually enough to get good game-play.

Case injection improves the genetic algorithm's performance (quality and speed) by periodically seeding the evolving population with individuals containing good building blocks from a case-based repository of individuals that have performed well on previously confronted problems. Think of this case-base as a repository of past experience. Our past work describes how to choose appropriate cases from the case-base for injection, how to define similarity, and how often to inject chosen cases to maximize performance [9].

This paper reports on results from ongoing work that seeks to develop competent game opponents for tactical and strategic games. We are particularly interested in automated methods for modeling human strategic and tactical game play in order to develop competent opponents and to model a particular doctrine or "style" of human game-play. Our long-term goal is to show that evolutionary computing techniques can lead to robust, flexible, challenging opponents that learn from human game-play. In this paper, we develop and use a strike force planning RTS game as a test-bed (see Figure 1) and show that CIGAR can 1) play the game; 2) learn from experience to play better; and 3) learn trap avoidance from a human player's game play.

Fig. 1. Game Screen-shot

The significance of learning trap avoidance from human game-play arises from the system having to learn a concept that is external to the evaluation function used by the case-injected genetic algorithm. Initially, the system has no conception of a trap (the concept) and has no way of learning about traps through feedback from the evaluation function. Therefore, the problem is for the system to acquire knowledge about traps and trap-avoidance from humans and then to learn to avoid traps. This paper shows how the system "plays to learn." That is, we show how a case-injected genetic algorithm uses cases acquired from human (or system) game-play to learn to avoid traps without changing the game and the evaluation function.

The next section introduces the strike force planning game and case-injected genetic algorithms. We then describe previous work in this area. Section IV describes the specific strike scenarios used for testing, the evaluation computation, the system's architecture, and the encoding. The next two sections describe the test setup and results of using a case-injected genetic algorithm to play the game and to learn trap-avoidance from humans. The last section provides conclusions and directions for future research.

II. STRIKE FORCE PLANNING

Strike force asset allocation maps to a broad category of resource allocation problems in industry and thus makes a suitable test problem for our work. We want to allocate a collection of assets on platforms to a set of targets and threats on the ground. The problem is dynamic; weather and other environmental factors affect asset performance, unknown threats can "popup," and new targets can be assigned. These complications, as well as the varying effectiveness of assets on targets, make the problem suitable for evolutionary computing approaches.

Our game involves two sides: Blue and Red, both seeking to allocate their respective resources to minimize damage received while maximizing the effectiveness of their assets in damaging the opponent. Blue plays by allocating a set of assets on aircraft (platforms) to attack Red's buildings (targets) and defensive installations (threats). Blue determines which targets to attack and which weapons (assets) to use on them, as well as how to route platforms to targets, trying to minimize the risk presented while maximizing weapon effectiveness.

Red has defensive installations (threats) that protect targets by attacking Blue platforms that come within range. Red plays by placing these threats to best protect targets. Potential threats and targets can also popup on Red's command in the middle of a mission, allowing a range of strategic options. By cleverly locating threats, Red can feign vulnerability and lure Blue into a deviously located popup trap, or keep Blue from exploiting such a weakness out of fear of a trap. The scenario in this paper involves Red presenting Blue with a trapped corridor of seemingly easy access to targets.

In this paper, a human plays Red while a Genetic Algorithm Player (GAP) plays Blue. GAP develops strategies for the attacking strike force, including flight plans and weapons targeting for all available aircraft. When confronted with popups, GAP responds by replanning in order to produce a new plan of action that responds to the changes. Beyond purely responding to immediate scenario changes, we use case injection in order to produce plans that anticipate opponent moves. We provide a short introduction to CIGAR next.

A. Case-Injected Genetic Algorithms

A case-injected genetic algorithm works differently than a typical genetic algorithm. A genetic algorithm (GA) randomly initializes its starting population so that it can proceed from an unbiased sample of the search space. We believe that it makes less sense to start a problem solving search attempt from scratch when previous search attempts (on similar problems) may have yielded useful information about the search space. Instead, periodically injecting a genetic algorithm's population with relevant solutions or partial solutions to similar previously solved problems can provide information (a search bias) that reduces the time taken to find a quality solution. Our approach borrows ideas from case-based reasoning (CBR), in which old problem and solution information, stored as cases in a case-base, helps solve a new problem [10], [11], [12]. In our system, the data-base, or case-base, of problems and their solutions supplies the genetic problem solver with a long-term memory. The system does not require a case-base to start with and can bootstrap itself by learning new cases from the genetic algorithm's attempts at solving a problem.

While the genetic algorithm works on a problem, promising members of the population are stored into the case-base through a preprocessor. Subsequently, when starting work on a new problem, suitable cases are retrieved from the case-base and are used to populate a small percentage (say 10%-15%) of the initial population. A case is a member of the population (a candidate solution) together with other information, including its fitness and the generation at which this case was generated [13]. During GA search, whenever the fitness of the best individual in the population increases, the new best individual is stored in the case-base.
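To make the case-base concrete, the sketch below shows one plausible way to represent stored cases and to save the population's best individual whenever its fitness improves. The Case and CaseBase classes and the maybe_store_best helper are hypothetical illustrations, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): storing new best individuals
# as cases while the GA runs. Case and CaseBase are hypothetical names.
from dataclasses import dataclass, field

@dataclass
class Case:
    chromosome: str      # binary string candidate solution
    fitness: float       # fitness at the time of storage
    generation: int      # generation at which the case was created

@dataclass
class CaseBase:
    cases: list = field(default_factory=list)

    def store(self, case: Case) -> None:
        self.cases.append(case)

def maybe_store_best(case_base, population, fitnesses, generation, best_so_far):
    """Store the population's best individual whenever its fitness improves."""
    best_idx = max(range(len(population)), key=lambda i: fitnesses[i])
    if fitnesses[best_idx] > best_so_far:
        case_base.store(Case(population[best_idx], fitnesses[best_idx], generation))
        return fitnesses[best_idx]
    return best_so_far
```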

Like CIGAR, human players playing the game are also solving resource allocation and routing problems. A human player's asset allocation and routing strategy is automatically reverse engineered into CIGAR's chromosomal representation and stored as a case into the case-base. Such cases embody domain knowledge acquired from human players.

The case-base does what it is best at – memory organization; the genetic algorithm handles what it is best at – adaptation. The resulting combination takes advantage of both paradigms; the genetic algorithm component delivers robustness and adaptive learning while the case-based component speeds up the system.

The case-injected genetic algorithm used in this paper operates on the basis of solution similarity. CIGAR periodically injects a small number of solutions similar to the current best member of the GA population into the current population, replacing the worst members. The GA continues searching with this combined population. Apart from using solution similarity, one other feature that distinguishes this CIGAR from the "problem-similarity" CIGAR is that cases are periodically injected. The idea is to cycle through the following steps. Let the GA make some progress. Next, find solutions in the case-base that are similar to the current best solution in the population and inject these solutions into the population. Then, let the GA make some progress, and repeat the previous steps. The detailed algorithm can be found in [9]. If injected solutions contain useful cross-problem information, the GA's performance will be significantly boosted. Figure 2 shows this situation for CIGAR when it is solving a sequence of problems, Pi, 0 < i ≤ n, each of which undergoes periodic injection of cases.

Fig. 2. Solving problems in sequence with CIGAR. Notethe multiple periodic injections in the population as CIGARattempts problem Pi, 0 < i ≤ n.
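The periodic-injection cycle described above can be sketched as a simple loop. In the sketch below, evolve, evaluate, and retrieve_similar are hypothetical stand-ins for the GA step, the fitness function, and the case-retrieval routine; the 10% injection fraction and five-generation period are illustrative choices, not the paper's settings.

```python
# A minimal sketch of the periodic-injection cycle: let the GA run, then
# replace the worst population members with cases similar to the current best.
def cigar_cycle(population, case_base, evolve, evaluate, retrieve_similar,
                injection_fraction=0.10, inject_every=5, generations=50):
    for gen in range(generations):
        population = evolve(population)              # let the GA make some progress
        if case_base and gen % inject_every == 0:
            fitnesses = [evaluate(ind) for ind in population]
            best = max(population, key=evaluate)
            n_inject = max(1, int(injection_fraction * len(population)))
            injected = retrieve_similar(case_base, best, n_inject)
            # replace the worst members with the retrieved cases
            order = sorted(range(len(population)), key=lambda i: fitnesses[i])
            for slot, case in zip(order[:n_inject], injected):
                population[slot] = case
    return population
```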

We have described one particular implementation of such a system. Other, less elitist approaches for choosing population members to replace are possible, as are different strategies for choosing individuals from the case-base. We can also vary the injection percentage: the fraction of the population replaced by chosen injected cases.

CIGAR has to periodically inject cases because we do not know which previously solved problems are similar to the current one. That is, we do not have a problem similarity metric. However, the Hamming distance between binary encoded chromosomes provides a simple and remarkably effective solution similarity metric. We thus find and inject cases that are similar (close in Hamming distance) to the current best individual. Since the current best individual changes, we have to find and inject the closest cases into the evolving population. We are assuming that similar solutions must have come from similar problems and that these similar solutions retrieved from the case-base contain useful information to guide genetic search. Although this is an assumption, results on design, scheduling, and allocation problems show the efficacy of this similarity metric and therefore of CIGAR [9].

An advantage of using solution similarity arises from the string representations typically used by genetic algorithms. A chromosome is a string of symbols. String similarity metrics are relatively easy to create and compute and, furthermore, are domain independent.
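A minimal sketch of this solution-similarity metric follows: Hamming distance over equal-length bit strings, used to rank stored cases by closeness to the current best individual. The helper names are assumptions for illustration.

```python
# Sketch of the solution-similarity metric: Hamming distance between
# binary-encoded chromosomes, used to pick the stored cases closest to
# the current best individual (hypothetical helpers, not the authors' code).
def hamming(a: str, b: str) -> int:
    """Number of differing bit positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def closest_cases(case_base, best_chromosome, n):
    """Return the n stored chromosomes closest (in Hamming distance) to the best."""
    ranked = sorted(case_base, key=lambda c: hamming(c, best_chromosome))
    return ranked[:n]

# Example: retrieve the two cases most similar to the current best individual.
stored = ["1100110011001100", "0000111100001111", "1111000011110000"]
print(closest_cases(stored, "1100110011001111", n=2))
```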

What happens if our similarity measure is noisy and/or leads to unsuitable retrieved cases? By definition, unsuitable cases will have low fitness and will quickly be eliminated from the GA's population. CIGAR may suffer from a slight performance hit in this situation but will not break or fail; the genetic search component will continue making progress towards a solution. In addition, note that diversity in the population, "the grist for the mill of genetic search [14]," can be supplied by the genetic operators and by injection from the case-base. Even if the injected cases are unsuitable, variation is still injected.

The system that we have described injects individuals from the case-base that are deterministically closest, in Hamming distance, to the current best individual in the population. We can also choose schemes other than injecting the closest to the best. For example, we have experimented with injecting cases that are the furthest (in the case-base) from the current worst member of the population. Probabilistic versions of both have also proven effective.

Reusing old solutions has been a traditional performance improvement procedure. The CIGAR approach differs in that 1) we attack a set of tasks, 2) store and reuse intermediate candidate solutions, and 3) do not depend on the existence of a problem similarity metric. CIGAR pseudo-code and more details are provided in [9].

B. CIGAR for RTS games

Within the context of RTS games as resource allocation problems, genetic algorithms can usually robustly search for effective strategies. These strategies usually approach static game optimal strategies, but they do not necessarily approach optima in the real world, as the game is an imperfect reflection of reality. For example, in complex games, humans with past experience "playing" the real-world counterpart of the game tend to include external knowledge when producing strategies for the simulated game. Incorporating knowledge from the way these humans play (through case injection) should allow us to carry over some of this external knowledge into GAP's game play. Since GAP can gain experience (cases) from observing and recording human Blue-players' decisions as well as from playing against human or computer opponents, case injection allows GAP to use current game-state information as well as acquired knowledge to play better. Our game is designed to record all player decisions (moves) on a central server for later storage into a case-base. The algorithm does not consider whether cases come from humans or from past game-playing episodes and can use cases from a variety of sources. We are particularly interested in acquiring and using cases from humans in order to learn to play with a specific human style and to be able to acquire knowledge external to the game.

Specifically, we seek to produce a genetic algorithm player that can play on a strategic level and learn to emulate aspects of strategies used by human players. Our goals in learning to play like humans are:

• We want to use GAP for decision support, whereby GAP provides suggestions and alternative strategies to humans actively playing the game. Strategies more compatible with those being considered by the humans should be more likely to have a positive effect on the decision-making process.

• We would like to make GAP a challenging opponent to play against.

• We would like to use GAP for training. GAP plays not just to win but to teach its opponent how to better play the game, in particular to prepare them for future play against human opponents. This would allow us to use GAP for acquiring knowledge from human experts and transferring that knowledge to human players without the expense of individual training with experts.

These roles require GAP to play with objectives in mind besides that of winning – these objectives would be difficult to quantify inside the evaluator. As humans can function effectively in these regards, learning from them should help GAP better fulfill these responsibilities.

C. Playing the game

A genetic algorithm can generate an initial resource allocation (a plan) to start the game. However, no initial plan survives contact with the enemy.1 The dynamic nature of the game requires replanning in response to opponent decisions (moves) and changing game-state. This replanning has to be fast enough to not interfere with the flow of the game and the new plan has to be good enough to win the game, or at least, not lose. Can genetic algorithms satisfy these speed and quality constraints? Initial results on small scenarios with tens of units showed that a parallelized genetic algorithm on a small ten-node cluster runs fast enough to satisfy both speed and quality requirements. For the case-injected genetic algorithm, replanning is simply solving a similar planning problem. We have shown that CIGAR learns to increase performance with experience at solving similar problems [9], [15], [16], [17], [18]. This implies that when used for replanning, CIGAR should quickly produce better new plans in response to changing game dynamics. In our game, aircraft break off to attack newly discovered targets, reroute to avoid new threats, and re-prioritize to deal with changes to the game state.

1 We paraphrase from a quote attributed to Helmuth von Moltke.

Beyond speeding up GAP's response to scenario changes through replanning, we use case injection in order to produce plans that anticipate opponent moves. This teaches GAP to act in anticipation of changing game states and leads to the avoidance of likely traps and better capitalization on opponent vulnerabilities. GAP learns to avoid areas likely to contain traps from two sources:

• Humans: As humans play the game, GAP adds their play to the case-base, gaining some of their strategic knowledge. Specifically, whenever the human player makes a game move, the system records this move for later storage into the case-base. The system thus acquires knowledge from humans simply by recording their game-play. We do not need to conduct interviews, deploy concept maps, or use other expensive, error-prone, and lengthy knowledge-acquisition techniques.

• Experience: As GAP plays games, it builds a case-base with knowledge of how it should play. Since the system does not distinguish between human players and GAP, it acquires knowledge from GAP's game-play exactly as described above.

Our results indicate GAP's potential in making an effective Blue player with the ability to quickly replan in response to changing game dynamics, and that case injection can bias GAP to produce good solutions that are suboptimal with respect to the game simulation's evaluation function but that avoid potential traps. Instead of changing evaluation function parameters or code, GAP changes its behavior by acquiring and reusing knowledge, stored as cases in a case-base. Case injection also biases the genetic algorithm towards producing strategies similar to those learned from a human player. Furthermore, our novel representation allows the genetic algorithm to re-use learned strategic knowledge across a range of similar scenarios, independent of geographic location.

III. PREVIOUS WORK

Previous work in strike force asset allocation has been done in optimizing the allocation of assets to targets, the majority of it focusing on static pre-mission planning. Griggs [19] formulated a mixed-integer problem (MIP) to allocate platforms and assets for each objective. The MIP is augmented with a decision tree that determines the best plan based upon weather data. Li [20] converts a nonlinear programming formulation into a MIP problem. Yost [21] provides a survey of the work that has been conducted to address the optimization of strike allocation assets. Louis [22] applied case-injected genetic algorithms to strike force asset allocation.

From the computer gaming side, a large body of work exists in which evolutionary methods have been applied to games [2], [23], [4], [24], [3]. However, the majority of this work has been applied to board, card, and other well-defined games. Such games have many differences from popular real-time strategy (RTS) games such as Starcraft, Total Annihilation, and Homeworld [25], [26], [27]. Chess, checkers, and many others use entities (pieces) that have a limited space of positions (such as on a board) and restricted sets of actions (defined moves). Players in these games also have well-defined roles, and the domain of knowledge available to each player is well identified. These characteristics make the game state easier to specify and analyze. In contrast, entities in our game exist and interact over time in continuous three-dimensional space. Entities are not controlled directly by players; instead, sets of parametrized algorithms control them in order to meet goals outlined by players. This adds a level of abstraction not found in more traditional games. In most such computer games, players have incomplete knowledge of the game state, and even the domain of this incomplete knowledge is difficult to determine. Laird [7], [8], [28] surveys the state of research in using AI techniques in interactive computer games. He describes the importance of such research and provides a taxonomy of games. Several military simulations share some of our game's properties [29], [30]; however, these attempt to model reality while ours is designed to provide a platform for research in strategic planning, knowledge acquisition and re-use, and to have fun. The next section describes the scenario being played.

IV. THE SCENARIO

Figure 3 shows an overview of our test scenario. We chose the scenario to be simple and easy to analyze but to still encapsulate the dynamics of traps and anticipation.

The translucent grey hemispheres show the effective radii of Red's threats placed on the game map. The scenario takes place in Northern Nevada; Walker Lake is visible near the bottom of the map, covered by the largest grey hemisphere. Red has eight targets on the right-hand side of the map, with their locations denoted by the cross-hairs.

Fig. 3. The Scenario

Red has a number of threats placed to defend the targets, and the translucent grey hemispheres show the effective radii of some of these threats. Red has the potential to play a popup threat to trap platforms venturing into the corridor formed by the threats.

Blue has eight platforms, all of which start in the lower left-hand corner. Each platform has one weapon, with three classes of weapons being distributed among the platforms. Each of the eight weapons can be allocated to any of the four targets, giving 4^8 = 2^16 = 65,536 allocations. This space can be exhaustively searched, but more complex scenarios quickly become intractable.

In this scenario, GAP’s router can produce threebroad types of routes that we have named black,white, and grey (see figure 3).

1) Black - Flies through the corridor in order to reach the targets.
2) White - Flies around the threats, attacking the targets from behind.
3) Grey - Flies inside the perimeter of known threats (not shown in the figure).

Grey routes expose platforms to unnecessary risk from threats and thus receive low fitness. Ignoring popup threats, the optimal strategy contains black routes, which are the most direct routes to the target that still manage to avoid known threats. However, in the presence of the popup threat and our risk-averse evaluation function, aircraft following the black route are vulnerable and white routes become optimal, although they are longer than black routes. The evaluator looks only at known threats, so plans containing white routes receive lower fitness than those containing black routes. GAP should learn to anticipate traps and to prefer white trap-avoiding routes even though white routes have lower fitness than black routes.

In order to search for good routes and allocations, GAP must be able to compute and compare their fitnesses. Computing this fitness is dependent on the representation of entities' states inside the game, and our way of computing fitness and representing this state is described next.

A. Fitness

We evaluate the fitness of an individual in GAP's population by running the game and checking the game outcome. Blue's goals are to maximize damage done to Red targets while minimizing damage done to its platforms. Shorter, simpler routes are also desirable, so we include a penalty in the fitness function based on the total distance traveled. This gives the fitness calculated as shown in Equation 1.

Fitness(plan) = Damage(Red) − Damage(Blue) − d·c    (1)

where d is the total distance traveled by Blue's platforms and c is chosen such that d·c has a 10-20% effect on the fitness of a plan. Total damage done is calculated as

Damage(Player) = Σ_{E∈F} E_v · (1 − E_s)

where E is an entity in the game and F is the set of all forces belonging to that side. E_v is the value of E, while E_s is the probability of survival for entity E. We use probabilistic health metrics to evaluate entity damage, keeping careful track of time to ensure that the probabilities are calculated at appropriate times during game-play.
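Assuming each entity is described by a value and a survival probability, Equation 1 can be written out as in the sketch below; the dictionary field names and the example numbers are illustrative, not taken from the paper.

```python
# A sketch of Equation 1 under stated assumptions: each entity carries a value
# and a survival probability, and c is scaled so the distance penalty stays
# within roughly 10-20% of a plan's fitness.
def damage(entities):
    """Expected damage to one side: sum of value * (1 - survival probability)."""
    return sum(e["value"] * (1.0 - e["survival"]) for e in entities)

def plan_fitness(red_entities, blue_entities, total_distance, c):
    """Fitness(plan) = Damage(Red) - Damage(Blue) - d*c."""
    return damage(red_entities) - damage(blue_entities) - total_distance * c

red = [{"value": 100.0, "survival": 0.2}, {"value": 80.0, "survival": 0.5}]
blue = [{"value": 60.0, "survival": 0.9}]
print(plan_fitness(red, blue, total_distance=400.0, c=0.05))
```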

B. Probabilistic Health Metrics

In many games, entities (platforms, threats, and targets in our game) possess hit-points that represent their ability to take damage. Each attack removes a number of hit-points, and the entity is destroyed when the number of hit-points is reduced to zero. In reality, weapons have a more hit-or-miss effect, destroying entities or leaving them functional. A single attack may be effective while multiple attacks may have no effect. Although more realistic, this introduces a large degree of stochastic error into the game. Evaluating an individual plan can result in outcomes ranging from total failure to perfect success, making it difficult to compare two plans based on a single evaluation. Lacking a good comparison, it is difficult to search for an optimal strategy. By taking a statistical analysis of survival we can achieve better results.

Consider the state of each entity at the end of the mission as a random variable. Comparing the expected values for those variables allows judging the effectiveness of a plan. These expected values can then be estimated by executing each plan a number of times and averaging the results. However, doing multiple runs to determine a single evaluation increases the computational expense many-fold.

We use a different approach based on probabilistic health metrics. Instead of monitoring whether or not an object has been destroyed, we monitor the probability of its survival. Being attacked no longer destroys objects and removes them from the game; it just reduces their probability of survival according to Equation 2 below.

S(E) = S_t0(E) · (1 − D(E))    (2)

where E is the entity being considered: a platform, target, or threat. S(E) is the probability of survival of entity E after the attack, S_t0(E) is the probability of survival of E up until the attack, and D(E) is the probability of that entity being destroyed by the attack, given by Equation 3 below.

D(E) = S(A) · E(W)    (3)

Here, S(A) is the attacker's probability of survival up until the time of the attack and E(W) is the effectiveness of the attacker's weapon as given in the weapon-entity effectiveness matrix. Our method provides the expected values of survival for all entities in the game within one run of the game, thereby producing a representative evaluation of the value of a plan. As a side effect, we also gain a smoother gradient for the GA to search as well as consistently reproducible evaluations. We expect that this approach will work for games where a continuous approximation to discontinuous events (like death) does not affect game outcomes. Note that this approach does not yet consider 1) ensuring that performance lies above a minimum acceptable threshold and 2) a plan's tolerance to small perturbations. Incorporating additional constraints is ongoing work, but for this paper the evaluation function described above provides an efficient approach to evaluating a plan's effectiveness for the GA.
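A short sketch of Equations 2 and 3, under the assumption that weapon effectiveness is a single probability taken from the weapon-entity effectiveness matrix: an attack only scales the target's survival probability, so repeated attacks compound multiplicatively.

```python
# Sketch of the probabilistic health update (Equations 2 and 3): an attack
# never removes an entity, it only lowers its survival probability.
# weapon_effectiveness stands in for the weapon-entity effectiveness matrix entry.
def attack(target_survival, attacker_survival, weapon_effectiveness):
    """Return the target's survival probability after one attack."""
    destroy_prob = attacker_survival * weapon_effectiveness       # Equation 3
    return target_survival * (1.0 - destroy_prob)                 # Equation 2

# Two successive attacks on a fully healthy target by an attacker at 0.8 survival.
s = 1.0
for _ in range(2):
    s = attack(s, attacker_survival=0.8, weapon_effectiveness=0.6)
print(round(s, 3))  # expected survival after both attacks
```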

The strike force game uses this approach to compute damage sustained by entities in the game. The gaming system's architecture reflects the flow of action in the game and is described next.

C. System Architecture

Fig. 4. System Architecture.

Figure 4 outlines our system's architecture. Starting at the left, Red and Blue, human and GAP respectively, see the scenario and have some initialization time to prepare strategy. GAP applies the case-injected genetic algorithm to the underlying resource allocation and routing problem and chooses the best plan to play against Red. The game then begins. During the game, Red can activate popup threats that GAP can detect upon activation. GAP then runs the case-injected genetic algorithm, producing a new plan of action, and so on.

To play the game, GAP must produce routing data for each of Blue's platforms. Figure 5 shows how routes are built using the A* algorithm [31]. A* builds routes between locations that platforms wish to visit, generally the starting airbase and the targets they are attacking.

Fig. 5. How Routes are Built From an Encoding.

The A* router finds the cheapest route, where cost here is a function of length and risk and leads to a preference for the shortest routes that avoid threats.

We parameterize A* in order to represent and produce routes that are not dependent on geographical location and that have specific characteristics. For example, to avoid traps, GAP must be able to specify that it wants to avoid areas of potential danger. In our game, traps are most effective in areas confined by other threats. If we artificially inflate threat radii, threats expand to fill in potential trap corridors and A* produces routes that go around these expanded threats. We thus introduce a parameter, rc, that encodes threats' effective radii. Larger rc's expand threats and fill in confined areas; smaller rc's lead to more direct routes. Figures 6 and 7 show rc's effect on routing: as rc increases, A* produces routes that avoid the confined area.

Fig. 6. Routing With rc = 1.0

In our scenarios, values of rc < 1.0 produce grey routes, values with 1.0 < rc < 1.35 produce direct black routes, and values of rc > 1.35 produce white trap-avoiding routes. rc is currently limited to the range [0, 3] and encoded with eight (8) bits at the end of our chromosome.

Fig. 7. Routing With rc = 1.3

We encoded a single rc for each plan but are investigating the encoding of rc's for each section of a route. Note that this representation of routes is location independent. We can store and re-use values of rc that have worked in different terrains and different locations to produce more direct or indirect routes.
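One way such an rc parameter might enter the router is sketched below: threat radii are multiplied by rc before risk is accumulated along a route segment, so larger rc values make confined corridors look costly. The cost function, risk weight, and example geometry are assumptions; the actual A* search is omitted.

```python
# Illustrative sketch (not the authors' router): rc scales threat radii before
# risk is added to a segment's cost, so larger rc values push routes away from
# confined areas between threats.
import math

def leg_cost(p, q, threats, rc, risk_weight=10.0):
    """Cost of a straight route segment = length + risk from inflated threats."""
    length = math.dist(p, q)
    mid = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    risk = 0.0
    for (cx, cy, radius) in threats:
        if math.dist(mid, (cx, cy)) < radius * rc:   # inflated effective radius
            risk += 1.0
    return length + risk_weight * risk

threats = [(5.0, 8.0, 2.0)]
print(leg_cost((0, 0), (10, 10), threats, rc=1.0))   # segment clears the threat
print(leg_cost((0, 0), (10, 10), threats, rc=1.6))   # same segment, threat inflated
```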

D. Encoding

Most of the encoding specifies the asset-to-target allocation, with rc encoded at the end as detailed earlier. Figure 8 shows how we represent the allocation data as an enumeration of assets to targets. The scenario involves two platforms (P1, P2), each with a pair of assets, attacking four targets. The left box illustrates the allocation of asset A1 on platform P1 to target T3, asset A2 to target T1, and so on. Tabulating the asset-to-target allocation gives the table in the center. Letting the position denote the asset and reducing the target id to binary then produces a binary string representation for the allocation.

Fig. 8. Allocation Encoding
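The position-denotes-asset encoding can be illustrated with a small helper; with four targets, two bits per asset suffice. Target numbering from 0 and the exact bit layout are assumptions for illustration and may differ from Figure 8.

```python
# Sketch of the allocation encoding: list position denotes the asset, and each
# target id is written in binary (two bits are enough for four targets).
def encode_allocation(targets, bits_per_target=2):
    """Encode an asset-to-target allocation as a binary string."""
    return "".join(format(t, "0{}b".format(bits_per_target)) for t in targets)

def decode_allocation(bitstring, bits_per_target=2):
    """Recover the asset-to-target allocation from the binary string."""
    return [int(bitstring[i:i + bits_per_target], 2)
            for i in range(0, len(bitstring), bits_per_target)]

# Assets A1..A4 allocated to targets 3, 1, 0, 2 (targets numbered from 0).
chromosome = encode_allocation([3, 1, 0, 2])
print(chromosome)                   # "11010010"
print(decode_allocation(chromosome))
```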

Earlier work has shown how we can use CIGAR to learn to increase asset allocation performance with experience [9], and we therefore focus more on rc and routing in this paper.

V. LEARNING TO AVOID TRAPS

We address the problem of learning from experience to avoid traps using a two-part approach. First, from experience, we learn where traps are likely to be; then we apply that acquired knowledge and avoid potential traps in the future. Case injection provides an implementation of these steps: building a case-base of individuals from past games stores important knowledge, and the injection of those individuals applies the knowledge towards future search.

GAP records games played against opponents and runs offline to determine the optimal way to win the previously played game. If the game contains a popup trap, genetic search progresses towards the optimal strategy in the presence of the popup, and GAP saves individuals from this search into the case-base, building a case-base with routes that go around the popup trap – white routes. When faced with other opponents, GAP then injects individuals from the case-base, biasing the current search towards containing this learned anticipatory knowledge.

Specifically, GAP first plays our test scenario, likely picking a black route and falling into Red's trap. Afterward, GAP replays the game while including Red's trap. At this stage black routes receive poor fitness and GAP prefers white trap-avoiding routes. Saving individuals to the case-base from this search stores a cross-section of plans containing "trap avoiding" knowledge.

The process produces a case-base of individuals that contain important knowledge about how we should play, but how can we use that knowledge in order to play smarter in the future? We use case injection when playing the game and periodically inject a number of individuals from the case-base into the population, biasing our current search towards information from those individuals. Injection replaces the worst members of the population with individuals chosen from the case-base through a "probabilistic closest to the best" strategy [9]. These new individuals bring their "trap avoiding" knowledge into the population, increasing the likelihood of that knowledge being used in the final solution and therefore increasing GAP's ability to avoid the trap.

A. Knowledge Acquisition and Application

Imagine playing a game and seeing your opponents do something you had not considered but that worked out to great effect. Seeing something new, you are likely to try to learn some of the dynamics of that move so you can incorporate it into your own play and become a better player. Ideally you would have a perfect understanding of when and where this move is effective and ineffective, and how to best execute it under effective circumstances. Whether the move is using a combination of chess pieces in a particular way, bluffing in poker, or doing a reaver drop in Starcraft, the general idea remains. In order to imitate this process, we use a two-step approach with case injection. First, we learn knowledge from human players by saving their decision making during game play and encoding it for storage in the case-base. Second, we apply this knowledge by periodically injecting these stored cases into GAP's evolving population.

B. Knowledge Acquisition

Knowledge acquisition is a significant problem in rule-based systems. GAP acquires knowledge from human Blue players by recording player plans, reverse engineering these plans into genetic algorithm chromosomes, and storing these chromosomes as cases in our case-base. In the strike force game, we can easily encode the human player's asset allocation. Finding an rc that closely matches the route chosen by the human player may require a search, but note that this reverse-engineering is done offline. When a person plays the game, we store all human moves (solutions) into the case-base. Injecting appropriate cases from a particular person's case-base biases the genetic algorithm to generate candidate solutions that are similar to those from the player. Instead of interviewing an expert game player, deriving rules that govern the player's strategy and style, and then encoding them into a finite state machine or a rule-base, our approach simply and automatically records player interactions while playing the game, automatically transforms player solutions into cases, and uses these cases to bias search toward producing strategies similar to those used by the player. We believe that our approach is less expensive in that we do not need a knowledge engineer. In addition, we gain flexibility and robustness. For example, consider what happens when a system is confronted with an unexpected situation. In rule-based systems, default rules that may or may not be appropriate to the situation control game play. With case-injected genetic algorithms, if no appropriate cases are available, the "default" genetic algorithm finds near-optimal player strategies.

C. Knowledge Application

Consider learning from a human who played a white trap-avoiding route but had a non-optimal allocation. The GA should keep the white route but optimize the allocation, unless the allocation itself was based on some external knowledge (a particular target might seem like a trap), in which case the GA should maintain that knowledge. Identifying which knowledge to maintain and which to replace is a difficult task even for human players. In this research, we thus use GAP to reproduce a simple but useful and easily identifiable aspect of human strategy: avoidance of confined areas.

VI. RESULTS

We designed test scenarios that were non-trivial but tractable. In each of the test scenarios, we know the optimum solution and can thus evaluate GAP's performance against this known optimum. This allows us to evaluate our approach on a well-understood (known) problem. For learning trap-avoidance in the presence of "popup" traps, human experts (the authors) played optimally and chose white trap-avoiding routes with an optimal asset allocation.

Plans consist of an allocation of assets to targets and a parameter to A* (rc) that determines the route taken. For the scenarios considered, reverse-engineering a human plan into a chromosome is non-trivial but manageable. The human asset allocation can be easily reverse-engineered, but we have to search through rc values (offline) to find the closest chromosomal representation of the human route.
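That offline search over rc values could be as simple as the grid search sketched below, where the 256 steps mirror the 8-bit encoding of rc over [0, 3]; route_for_rc and route_distance are hypothetical stand-ins for the A* router and a route-similarity measure, and the whole sketch is an assumption about how such a search might look rather than the authors' procedure.

```python
# Hedged sketch of the offline reverse-engineering step: grid-search rc values
# and keep the one whose generated route is closest to the recorded human route.
def fit_rc_to_human_route(human_route, route_for_rc, route_distance,
                          rc_min=0.0, rc_max=3.0, steps=256):
    best_rc, best_dist = rc_min, float("inf")
    for i in range(steps):
        rc = rc_min + (rc_max - rc_min) * i / (steps - 1)
        dist = route_distance(route_for_rc(rc), human_route)
        if dist < best_dist:
            best_rc, best_dist = rc, dist
    return best_rc
```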

We present results showing that

1) GAP can play the strike force asset allocation game effectively
2) Replanning can effectively react to popups
3) GAP can use case injection to learn to avoid traps
4) GAP can use knowledge acquired from human players
5) With our representation, acquired knowledge can be generalized to different scenarios
6) Fitness biasing can maintain injected information in the search.

Unless stated otherwise, GAP uses a population size of 25, two-point crossover with a probability of 0.95, and point mutation with a probability of 0.01. We use elitist selection, where offspring and parents compete for population slots in the next generation [32]. Experimentation showed that these parameter values satisfied our time and quality constraints. Results are averages over 50 runs and are statistically significant at the 0.05 level of significance or below.
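For reference, a minimal sketch of two-point crossover and point mutation at the probabilities listed above; this is an illustrative implementation on bit strings, not the authors' code, and selection is omitted.

```python
# Sketch of the genetic operators with the reported settings: two-point
# crossover applied with probability 0.95, per-bit mutation with probability 0.01.
import random

def two_point_crossover(a: str, b: str, p_cross=0.95):
    if random.random() < p_cross:
        i, j = sorted(random.sample(range(1, len(a)), 2))
        return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]
    return a, b

def point_mutation(chrom: str, p_mut=0.01):
    # flip each bit independently with probability p_mut
    return "".join(bit if random.random() >= p_mut else "10"[int(bit)]
                   for bit in chrom)

child1, child2 = two_point_crossover("1111111111111111", "0000000000000000")
print(point_mutation(child1), point_mutation(child2))
```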

A. GAP plays the Game

We first show that GAP can generate good strategies. GAP runs 50 times against our test scenario, and we graph the minimum, maximum, and average population fitness against the number of generations in Figure 9. We designed the test scenario to have an optimum fitness of 250, and the graph in Figure 9 shows a strong approach toward the optimum: in more than 95% of runs the final solution is within 5% of the optimum. This indicates that GAP can form effective strategies for playing the game.

B. Playing the game - Complex Mission

Fig. 9. Best/Worst/Average Individual Fitness as a function of Generation - Averaged over 50 runs

Testing GAP on a more realistic/complex mission (in Figure 10) leads to a similar effect, shown in Figure 11. This mission has a wider array of defenses, which are often placed directly on top of targets. Note that the first generation best is now much farther from the optimum compared with Figure 9, but the genetic algorithm quickly makes progress. Sample routes generated by GAP to reach targets in the two widely separated clusters are shown, and there are no popups in this mission.

Fig. 10. Complex Mission

C. Replanning

To analyze GAP’s ability to deal with the dy-namic nature of the game we look at the effectsof replanning. Figure 12 illustrates the effect of

Fig. 11. Best/Worst/Average Individual Fitness as a functionof Generation - Averaged over 50 runs on the Complex Mission

Fig. 12. Final routes used during a mission involvingreplanning.

replanning by showing the final route followedinside a game. A black (direct) route was chosen,and when the popup occurred, trapping the plat-forms, GAP redirected the strike force to retreatand attack from the rear. Replanning allows GAPto rebuild its routing information as well as modifyits allocation to compensate for damaged platforms.The A* routing algorithm’s cost function found thatthe lowest cost route was to retreat and go aroundthe threats rather than simply fly through the popup.Using a different cost function may have allowedthe mission to keep flying through the popup evenin the new plan. The white route shown in the figureis explained next.

12

Page 13: A TEST FOR PlayingIEEETRAN.CLS—miles/papers/ec2005.pdf · 2007-03-28 · A TEST FORPlayingIEEETRAN.CLS— [RtoUNNINGLearn:ENHANCED CLASS V1.6]Case-Injected Genetic 1 Algorithms

D. Learned Trap Avoidance

GAP learns to avoid traps through playing games offline. Specifically, GAP plays (or replays) games that it lost in order to learn how to avoid losing. In our scenario, during GAP's offline play, the popup was included as part of the scenario and cases corresponding to solutions that avoided the popup threat were stored in the case-base. GAP learns to avoid the popup trap through injection of these cases obtained from offline play. This is also shown in Figure 12, where GAP, having learned from past experience, prefers the white trap-avoiding route.

GAP’s ability to learn to avoid the trap can alsobe seen by looking at the numbers of black andwhite routes produced with and without case injec-tion as shown in Figure 13. The figures comparethe histograms of rc values produced by GAP withand without case injection. Case injection leadsto a strong shift in the kinds of rc’s produced,biasing the population towards using white routes.The effect of this bias is a large and statisticallysignificant increase in the frequency at which strate-gies containing white routes were produced (2% to42%). These results were based on 50 independentruns of the system and show that case injection doesbias the search toward avoiding the trap.

E. Case Injection’s Effect on Fitness

Figure 14 compares the fitnesses with and without case injection. Without case injection, the search shows a strong approach toward the optimal black-route plan; with injection, the population quickly converges toward the white-route plan.

Case injection applies a bias towards white routes; however, the GA has a tendency to act in opposition to this bias, trying to search towards ever-shorter routes. GAP's ability to overcome the bias through manipulation of injected material depends on the size of the population and the number of generations run. We will come back to this later in the section.

Fig. 13. Histogram of Routing Parameters Produced without Case Injection (top) and with Case Injection from Offline Play (bottom)

Fig. 14. Effect of Case Injection on Fitness Inside the GA over time

Instead of gaining experience by re-playing past games offline, we can also gain experience by acquiring knowledge from good players. Since we control the game's interface, it is a simple matter to capture all human player decisions during the course of playing the game. We can then convert these decisions into our plan encoding and store them in the case-base for later injection. Using this methodology, we reverse engineer the human route (shown in black in Figure 15) into our chromosome encoding. The closest encoding gives the route shown in white in Figure 15. The plans are not identical because the chromosome does not contain exact routes; it contains the routing parameter rc. The overall fitness difference between these two plans is less than 2%.

Fig. 15. Plans produced by the Human and GAP

The rc values determine the route category produced, and GAP's ability to generate the human route depends on the values of rc found by the GA. Figure 16 shows the distribution of rc produced by the non-injected genetic algorithm and CIGAR. Comparing the figures shows a significant shift in the rc's produced. This shift corresponds to a large increase in the number of white routes generated by CIGAR. Without case injection, GAP produced no (0%) white trap-avoiding routes, but using case injection, 64% of the routes produced by GAP were white trap-avoiding routes. This difference is statistically significant and based on 50 different runs of the system with different random seeds. The figures indicate that case injection does bias the search towards the human strategy.

Moving to the mission shown in Figure 17 and repeating the process produces the histograms shown in Figure 18. The same effect on rc can be observed even though the missions are significantly different in location and in optimal allocation, and even though we use cases from the previous mission. Case injection and the general routing representation allow GAP to generalize and learn to avoid confined areas from play by the human expert.

Fig. 16. Histogram of Routing Parameters Produced withoutInjection (top) and with Injection of Human Cases (bottom)

Fig. 17. Alternate Mission

F. Fitness Biasing

Case injection applies a bias to the GA search, while the number and frequency of individuals injected determines the strength of this bias. However, the fitness function also contains a term that biases against producing longer routes. Thus, we would expect that as the number of evaluations allotted to the GA increases, the bias against longer routes outweighs the bias towards white trap-avoiding longer routes and fewer white routes are produced. The effect is shown in Figure 19.

Fig. 18. Histogram of Routing Parameters on the Alternate Mission without Case Injection (top) and with Case Injection (bottom)

Fig. 19. Percentage of white trap-avoiding routes produced over time

We use fitness biasing to change this behavior. Fitness biasing effectively changes the fitness landscape of the underlying problem by changing the fitness of an individual.

One possible approach to changing the fitness landscape is to change the fitness function. This would either involve re-writing code, or parameterizing the fitness function and using some algorithm to set parameters to produce desired behavior. Either way, changing the fitness function is equivalent to changing the strike force game and is domain dependent. However, we want to bias fitness in a domain independent way without changing the game.

We propose a relatively domain independent way to use information from human-derived cases to bias fitness. An individual's fitness is now the sum of two terms: 1) the fitness returned from evaluation and 2) a bonus term that is directly proportional to the number of injected bits in the individual. Let fb be the biased fitness and let fe be the fitness returned by evaluation. Then the equation below computes the biased fitness.

fb = fe · (1 + g(nb, l))

where nb is the number of injected bits in this individual, l is the chromosome length, and

g(nb, l) = a·nb + b   if nb < l/c
g(nb, l) = l/c        otherwise

where a, b, and c are constants. In our work, we used a = 1, b = 0, and c = 5, resulting in the simple bias function below.

g(nb, l) = nb    if nb < l/5
g(nb, l) = l/5   otherwise

Since the change in fitness depends on the genotype (a bit string), not on the domain dependent phenotype, we do not expect to have to significantly change this fitness biasing equation for other domains.
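Translating the bias function directly into code, with a = 1, b = 0, and c = 5 as reported above (the function and variable names are illustrative):

```python
# Sketch of fitness biasing: fitness is scaled by (1 + g), where g counts the
# bits inherited from injected cases, capped at l/c (here l/5).
def bias_bonus(injected_bits, chromosome_length, a=1.0, b=0.0, c=5.0):
    cap = chromosome_length / c
    g = a * injected_bits + b
    return min(g, cap)

def biased_fitness(raw_fitness, injected_bits, chromosome_length):
    return raw_fitness * (1.0 + bias_bonus(injected_bits, chromosome_length))

print(biased_fitness(100.0, injected_bits=3, chromosome_length=40))   # g = 3
print(biased_fitness(100.0, injected_bits=12, chromosome_length=40))  # capped at 8
```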

With fitness biasing, there is a significant increase in the number of white trap-avoiding routes produced, regardless of the number of evaluations permitted. Figure 20 compares the number of white trap-avoiding routes produced by the genetic algorithm, by CIGAR, and by CIGAR with fitness biasing. Clearly, fitness biasing increases the number of white routes.

Fitness biasing’s long-term behavior is depictedin Figure 21. The figure shows that as the numberof evaluations increases, the number of white routesproduced with fitness biasing remains relativelyconstant and that this number decreases otherwise.

15

Page 16: A TEST FOR PlayingIEEETRAN.CLS—miles/papers/ec2005.pdf · 2007-03-28 · A TEST FORPlayingIEEETRAN.CLS— [RtoUNNINGLearn:ENHANCED CLASS V1.6]Case-Injected Genetic 1 Algorithms

Fig. 20. Times Trapped. Top: Without Injection. Middle: With Injection (No Fitness Biasing). Bottom: With Fitness Biasing.

Fig. 21. Fitness Biasing’s effect over time

Summarizing, the results indicate that CIGAR can produce competent players for RTS games. GAP can learn through experience gained from human Blue players and from playing against Red opponents. Fitness biasing changes the fitness landscape in response to acquired knowledge and leads to better performance in learning to avoid traps. Finally, our novel route representation allows GAP to generalize acquired knowledge to other geographic locations and scenarios.

VII. SUMMARY, CONCLUSIONS, AND FUTURE WORK

In this paper, we developed and used a strike force planning real-time strategy game to show that the case-injected genetic algorithm can 1) play the game; 2) learn from experience to play better; and 3) learn trap avoidance from a human player's game play. We cast our RTS game play as the solving of resource allocation problems and showed that a parallel genetic algorithm running on a ten-node cluster can efficiently solve the problems considered in this paper. Thus, the genetic algorithm can play the strike force RTS game by solving the sequence of resource allocation problems that arise during game play.

Case injection allows the genetic algorithm player to learn from past experience and leads to better and quicker responses to opponent game play. This past experience can come from previous game play or from expert human game play. To show that case-injected genetic algorithms can acquire and use knowledge from human game play, we first defined a structured scenario involving laying and avoiding traps as a test-bed. We then showed how case-injected genetic algorithms use cases saved from human game play in learning to avoid confined areas (potential traps). Although the system has no concept of traps or how to avoid them, we showed that the system acquired and used trap-avoiding knowledge from automatically generated cases that represented human moves (decisions) during game play. Specifically, the system works by automatically recording human player moves during game play. Next, it automatically generates cases for storage into a case-base from these recorded moves.

Finally, the system periodically injects relevant cases into the evolving population of the genetic algorithm. Since humans recognize and avoid confined areas that have high potential for traps, cases generated from human play implicitly contain trap-avoiding knowledge. When injected, these cases bring trap-avoiding knowledge into the evolving genetic algorithm population.
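A minimal sketch of this injection step appears below, assuming bit-string chromosomes and a simple closest-to-the-best notion of relevance; the helper names are ours for exposition and not the system’s actual interface.

# Sketch of periodic case injection, assuming bit-string chromosomes.
# The relevance measure (Hamming distance to the current best) and all
# names are illustrative assumptions.

def hamming(x, y):
    """Number of differing bits between two equal-length chromosomes."""
    return sum(a != b for a, b in zip(x, y))

def inject_cases(population, fitnesses, case_base, k=3):
    """Copy the k stored cases closest to the current best individual over
    the k least-fit members of the population."""
    best = population[max(range(len(population)), key=lambda i: fitnesses[i])]
    similar = sorted(case_base, key=lambda case: hamming(case, best))[:k]
    worst = sorted(range(len(population)), key=lambda i: fitnesses[i])[:k]
    for idx, case in zip(worst, similar):
        population[idx] = list(case)

# Every few generations, after evaluation:
#   inject_cases(population, fitnesses, case_base)
# where case_base holds chromosomes generated from recorded human moves.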

However, the evaluation function does not model the knowledge being acquired from human players: trap-avoiding knowledge in our scenarios. The genetic algorithm player may therefore prematurely lose these low-fitness injected individuals. To ensure that the genetic algorithm player does not lose acquired knowledge, we proposed a new method, fitness biasing, for more effectively retaining and using acquired knowledge. Fitness biasing is a domain-independent method for changing the fitness landscape by changing the value returned from the evaluation function by a factor that depends on the amount of acquired knowledge. This amount of acquired knowledge is measured (domain independently) by the number of bits that were inherited from injected cases in the individual being evaluated.

We parameterized the A* search algorithm in order to define a representation for routes that allows trap-avoidance knowledge to generalize to new game scenarios and locations. Specifically, this new representation allows cases acquired during game play in one scenario (or map) to bias system play in other scenarios. Recent work in adding more parameters to the routing system has shown that the genetic algorithm player can effectively emulate many attack strategies, from pincer attacks to combined assaults.
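The sketch below illustrates the flavor of such a parameterized router on a grid, assuming a single evolved knob (risk_weight) that trades path length against threat exposure; the grid setting, names, and cost model are illustrative assumptions rather than the game’s actual routing code.

import heapq

def manhattan(a, b):
    # Admissible heuristic on a 4-connected grid whose step cost is >= 1.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def routed_a_star(start, goal, passable, threat, risk_weight):
    """A* whose step cost blends distance with a threat penalty.  A GA that
    evolves parameters like risk_weight encodes how to route rather than
    the waypoints themselves, so the genes remain meaningful on a new map."""
    frontier = [(manhattan(start, goal), start)]
    g = {start: 0.0}
    parent = {start: None}
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not passable(nxt):
                continue
            step = 1.0 + risk_weight * threat.get(nxt, 0.0)
            if g[node] + step < g.get(nxt, float("inf")):
                g[nxt] = g[node] + step
                parent[nxt] = node
                heapq.heappush(frontier, (g[nxt] + manhattan(nxt, goal), nxt))
    return None  # goal unreachable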

We plan to build on these results to further develop the game. We would like to make the game more interesting, allow multiple players to play, and develop the code for distribution. In the next phase of our research we will develop a genetic algorithm player for the Red side. Co-evolving competence has a long history in evolutionary computing approaches to game playing, and we would like to explore this area for RTS games.

Acknowledgments

This material is based upon work supported by the Office of Naval Research under contract number N00014-03-1-0104.

REFERENCES

[1] P. J. Angeline and J. B. Pollack, “Competitive environments evolve better solutions for complex tasks,” in Proceedings of the 5th International Conference on Genetic Algorithms (ICGA-93), 1993, pp. 264–270. [Online]. Available: citeseer.ist.psu.edu/angeline93competitive.html

[2] D. B. Fogel, Blondie24: Playing at the Edge of AI. Morgan Kaufmann, 2001.

[3] A. L. Samuel, “Some studies in machine learning using the game of checkers,” IBM Journal of Research and Development, vol. 3, pp. 210–229, 1959.

[4] J. B. Pollack, A. D. Blair, and M. Land, “Coevolution of a backgammon player,” in Artificial Life V: Proc. of the Fifth Int. Workshop on the Synthesis and Simulation of Living Systems, C. G. Langton and K. Shimohara, Eds. Cambridge, MA: The MIT Press, 1997, pp. 92–98.

[5] G. Tesauro, “Temporal difference learning and TD-Gammon,” Communications of the ACM, vol. 38, no. 3, 1995.

[6] D. B. Fogel, T. J. Hays, S. L. Hahn, and J. Quon, “A self-learning evolutionary chess program,” Proceedings of the IEEE, vol. 92, no. 12, pp. 1947–1954, 2004.

[7] J. E. Laird, “Research in human-level AI using computer games,” Communications of the ACM, vol. 45, no. 1, pp. 32–35, 2002.

[8] J. E. Laird and M. van Lent, “The role of AI in computer game genres,” 2000. [Online]. Available: http://ai.eecs.umich.edu/people/laird/papers/book-chapter.htm

[9] S. J. Louis and J. McDonnell, “Learning with case-injected genetic algorithms,” IEEE Transactions on Evolutionary Computation, vol. 8, no. 4, pp. 316–328, 2004.

[10] C. K. Riesbeck and R. C. Schank, Inside Case-Based Reasoning. Cambridge, MA: Lawrence Erlbaum Associates, 1989.

[11] R. C. Schank, Dynamic Memory: A Theory of Reminding and Learning in Computers and People. Cambridge, MA: Cambridge University Press, 1982.

[12] D. B. Leake, Case-Based Reasoning: Experiences, Lessons, and Future Directions. Menlo Park, CA: AAAI/MIT Press, 1996.

[13] S. J. Louis, G. McGraw, and R. Wyckoff, “Case-based reasoning assisted explanation of genetic algorithm results,” Journal of Experimental and Theoretical Artificial Intelligence, vol. 5, pp. 21–37, 1993.

[14] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.

[15] S. J. Louis, “Evolutionary learning from experience,” Journal of Engineering Optimization, vol. 26, no. 2, pp. 237–247, 2004.

[16] ——, “Genetic learning for combinational logic design,” Journal of Soft Computing, vol. 9, no. 1, pp. 38–43, 2004.

[17] ——, “Learning from experience: Case injected genetic algorithm design of combinational logic circuits,” in Proceedings of the Fifth International Conference on Adaptive Computing in Design and Manufacturing. Springer-Verlag, 2002, pp. 295–306.

[18] S. J. Louis and J. Johnson, “Solving similar problems using genetic algorithms and case-based memory,” in Proceedings of the Seventh International Conference on Genetic Algorithms. San Mateo, CA: Morgan Kaufmann, 1997, pp. 283–290.

[19] B. J. Griggs, G. S. Parnell, and L. J. Lemkuhl, “An air mission planning algorithm using decision analysis and mixed integer programming,” Operations Research, vol. 45, no. 5, pp. 662–676, Sep.–Oct. 1997.

[20] V. C.-W. Li, G. L. Curry, and E. A. Boyd, “Strike force allocation with defender suppression,” Industrial Engineering Department, Texas A&M University, Tech. Rep., 1997.

[21] K. A. Yost, “A survey and description of USAF conventional munitions allocation models,” Office of Aerospace Studies, Kirtland AFB, Tech. Rep., Feb. 1995.

[22] S. J. Louis, J. McDonnell, and N. Gizzi, “Dynamic strike force asset allocation using genetic algorithms and case-based reasoning,” in Proceedings of the Sixth Conference on Systemics, Cybernetics, and Informatics, Orlando, 2002, pp. 855–861.

[23] C. D. Rosin and R. K. Belew, “Methods for competitive co-evolution: Finding opponents worth beating,” in Proceedings of the Sixth International Conference on Genetic Algorithms, L. Eshelman, Ed. San Francisco, CA: Morgan Kaufmann, 1995, pp. 373–380.

[24] G. Kendall and M. Willdig, “An investigation of an adaptive poker player,” in Australian Joint Conference on Artificial Intelligence, 2001, pp. 189–200. [Online]. Available: citeseer.nj.new.com/kendall01investgation.html

[25] Blizzard, “Starcraft,” 1998. [Online]. Available: www.blizzard.com/starcraft

[26] Cavedog, “Total Annihilation,” 1997. [Online]. Available: www.cavedog.com/totala

[27] R. E. Inc., “Homeworld,” 1999. [Online]. Available: homeworld.sierra.com/hw

[28] J. E. Laird and M. van Lent, “Human-level AI’s killer application: Interactive computer games,” invited talk at the AAAI-2000 conference, 2000. [Online]. Available: http://ai.eecs.umich.edu/people/laird/papers/AAAI-00.pdf

[29] G. Tidhar, C. Heinze, and M. C. Selvestrel, “Flying together: Modelling air mission teams,” Applied Intelligence, vol. 8, no. 3, pp. 195–218, 1998. [Online]. Available: citeseer.nj.nec.com/tidhar98flying.html

[30] D. McIlroy and C. Heinze, “Air combat tactics implementation in the Smart Whole Air Mission Model,” in Proceedings of the First International SimTecT Conference, Melbourne, Australia, 1996. [Online]. Available: citeseer.nj.nec.com/mcilroy96air.html

[31] B. Stout, “The basics of A* for path planning,” in Game Programming Gems. Charles River Media, 2000, pp. 254–262.

[32] L. J. Eshelman, “The CHC adaptive search algorithm: How to have safe search when engaging in nontraditional genetic recombination,” in Foundations of Genetic Algorithms-1, G. J. E. Rawlins, Ed. Morgan Kaufmann, 1991, pp. 265–283.

Sushil J. Louis Sushil J. Louis is an associate professor and director of the Evolutionary Computing Systems Laboratory in the Department of Computer Science and Engineering, University of Nevada, Reno, Nevada 89557, USA. Dr. Louis received the Ph.D. from Indiana University, Bloomington, in 1993 and is a member of the IEEE and ACM.

More information on his current work is available at his website http://www.cse.unr.edu/∼sushil and he can be reached at [email protected].

Chris Miles Chris Miles is a Ph.D. student in the Evolutionary Computing Systems Laboratory. He is working on using evolutionary computing techniques for real-time strategy games. More information on his current work is available at his website http://www.cse.unr.edu/∼miles and he can be reached at [email protected].
