CS 480: GAME AI ADVERSARIAL SEARCH 2
5/29/2012 Santiago Ontañón [email protected] https://www.cs.drexel.edu/~santi/teaching/2012/CS480/intro.html
Reminders • Check the course's BBVista site regularly • Also: https://www.cs.drexel.edu/~santi/teaching/2012/CS480/intro.html
• Project 4 description is available. • Project 4 due June 7th
Outline • Student Presentations:
• “Game AI as Storytelling” • “Computational Approaches to Story-telling and Creativity”
• Monte-Carlo Search Algorithms • UCT • Strategy Simulation
Board Games • Main characteristic: turn-based
• The AI has a lot of time to decide the next move
Board Games • Not just chess…
Board Games • From an AI point of view:
• Turn-based • Discrete actions • Complete information (mostly)
• Those features make these games amenable to game tree search!
Game Tree Search in Complex Games • Classic minimax assumes (Chess, Checkers, Go…):
• 2 players • Perfect information • Turn-taking game • Given a state and an action, we can predict the next state
• It is easily generalizable to multiplayer turn-taking games (the max^n algorithm)
• Complex games (like RTS games): • Real-time, not turn-taking, simultaneous actions • Lots of possible actions: branching factor too large! • We cannot exactly predict the next state • Imperfect information
Game Tree Search in RTS Games • Problem:
• Lots of possible actions, branching factor too large!
• Solution: • ???
• Problems: • real-time, no turn taking, simultaneous actions
• Solution: • ???
Game Tree Search in RTS Games • Problem:
• Lots of possible actions, branching factor too large!
• Solution: • Sampling (Monte-Carlo Search)
• Problems: • real-time, no turn taking, simultaneous actions
• Solution: • ???
Monte-Carlo Methods • Idea: use sampling instead of exact calculations • Simplest Monte-Carlo method:
• Integration • Imagine a very complex function f(x); we want to compute the definite integral of f(x) between a and b:
∫_a^b f(x) dx
• Generate N random numbers x_1, …, x_N uniformly between a and b. Compute the average of the f(x_i), and multiply it by (b − a).
• For large values of N, this estimate converges to the actual integral:
∫_a^b f(x) dx ≈ (b − a) · (1/N) · Σ_{i=1}^N f(x_i)
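The estimator above can be sketched in a few lines of Python. This is a minimal illustration (the function name and seed are my own choices, not from the slides):

```python
import random

def mc_integrate(f, a, b, n=100_000, seed=0):
    """Estimate the definite integral of f over [a, b] by uniform sampling:
    average f at n random points, then scale by the interval length."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += f(rng.uniform(a, b))
    return (b - a) * total / n

# Example: the integral of x^2 over [0, 3] is exactly 9
print(mc_integrate(lambda x: x * x, 0.0, 3.0))  # close to 9, within sampling error
```

The error shrinks like 1/√N, independently of how complicated f is, which is exactly why the same sampling trick scales to game trees.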
Monte-Carlo Tree Search • Monte-Carlo Search:
• Instead of opening the whole minimax tree • Approximate it by sampling (same idea as for the integral)
• For each possible action: play N random games to the end, starting with that action
• If N is large, the average win ratio converges to the expected utility of the action
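As a concrete sketch of this "flat" Monte-Carlo search, here is a toy implementation. The game (one-pile Nim: take 1–3 stones, whoever takes the last stone wins) is my own stand-in for illustration, not from the lecture:

```python
import random

# Toy game (an assumption for illustration): one pile of stones, a move
# removes 1-3 stones, and the player who takes the last stone wins.
def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def random_playout(stones, to_move, rng):
    """Play uniformly random moves to the end; return the winner (0 or 1)."""
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        to_move = 1 - to_move
    return 1 - to_move  # the player who took the last stone

def monte_carlo_move(stones, n=2000, seed=0):
    """Flat Monte-Carlo search: estimate each move's win rate for player 0
    by n random playouts, and return the move with the best estimate."""
    rng = random.Random(seed)
    best_move, best_rate = None, -1.0
    for move in legal_moves(stones):
        wins = sum(random_playout(stones - move, to_move=1, rng=rng) == 0
                   for _ in range(n))
        if wins / n > best_rate:
            best_move, best_rate = move, wins / n
    return best_move

# From 5 stones the only winning move is to take 1 (leaving a multiple of 4)
print(monte_carlo_move(5))
```

Note that no utility function appears anywhere: every playout runs to the end of the game, and the win/loss outcome itself is the evaluation.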
Minimax vs Monte-Carlo
[figure: minimax tree on the left, Monte-Carlo sampling on the right; minimax leaves labeled with utility values U]
Minimax opens the complete tree (all possible moves) up to a fixed depth. Then, the utility function is applied to the leaves.
Minimax vs Monte-Carlo
[figure: each Monte-Carlo branch is a complete random game played to the end]
Monte-Carlo search runs, for each possible move at the root node, a fixed number K of random complete games. No need for a utility function (but one can be used).
Monte-Carlo Search • Advantages:
• Scales up better than minimax (less sensitive to large branching factors) • No need for a utility function! Just play games to the end and return the move with the highest win rate.
• Disadvantages: • Brittle: a good move by the opponent may simply never be sampled
Monte-Carlo Search Improvements • Each branch of a Monte-Carlo search tree is a random game.
• Instead of generating uniformly random games, bias the probability of each move:
• For example, in chess, favor capturing moves. This is more likely to generate move sequences that make sense! • In general: use game-play data to learn which moves are more frequent, and use those probabilities when generating random games.
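Biasing a playout amounts to sampling moves from a weighted rather than uniform distribution. A minimal sketch (the move names and weights are made-up numbers standing in for learned or hand-tuned frequencies):

```python
import random
from collections import Counter

# Hypothetical candidate moves in one chess position; the capturing move gets
# a larger (invented) weight, so random playouts favor it.
moves = ["capture_pawn", "advance_king", "castle", "push_pawn"]
weights = [5.0, 1.0, 1.0, 1.0]

rng = random.Random(0)
picks = Counter(rng.choices(moves, weights=weights, k=10_000))
# The capturing move is now sampled about 5/8 of the time instead of 1/4
print(picks["capture_pawn"] / 10_000)
```

The playouts remain random (so the estimates stay unbiased explorations of the game), but they concentrate on sequences that plausible players would actually produce.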
Monte-Carlo Search Uses • Extremely useful in complex games when minimax cannot
be used
• When trying to decide between a set of actions: • Just play random games with each action, and select the best one
• Can be used, for example, in: • RTS games • RPG game battles • Board games • Etc.
Outline • Student Presentations:
• “Game AI as Storytelling” • “Computational Approaches to Story-telling and Creativity”
• Monte-Carlo Search Algorithms • UCT • Strategy Simulation
Monte-Carlo Tree Search: UCT • Upper Confidence Tree (UCT) is a simple, state-of-the-art variant of Monte-Carlo search, responsible for the recent success of computer Go programs
• Ideas: • Sample optimally (UCB) • Instead of opening the whole minimax tree or playing N fully random games, open only the upper part of the tree, and play random games from there
UCT
[figure: the current state is the root of a tree search (upper part) that feeds into Monte-Carlo playouts (lower part); the root starts labeled 0/0]
Each node's label w/t records how many of the t games explored through this state in the current search have been found to be wins (w).
UCT
[figure: iteration 1; a random playout from the current state ends in a win, so the root count becomes 1/1]
UCT
[figure: iteration 2; a new node (0/1) is added below the root, its playout ends in a loss, and the root becomes 1/2]
At each iteration, one node of the tree (the upper part) is selected and expanded (one node is added to the tree). From this new node, a complete game is played out at random (Monte-Carlo).
UCT
[figure: iteration 3; another node (1/1) is added, its playout ends in a win; the children are now 0/1 and 1/1 under a 2/3 root]
UCT
[figure: iteration 4; the promising node is revisited and wins again; counts are now 0/1 and 2/2, with a 1/1 node below, under a 3/4 root]
The counts w/t are used to determine which node to explore next.
A naïve exploration/exploitation policy: 50% of the time expand the best node in the tree, 50% of the time expand a node at random.
UCT
Instead of this naïve policy, UCT uses an optimal sampling policy called UCB (Upper Confidence Bounds), which comes from reinforcement learning.
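The standard UCB1 score combines a node's observed win rate w/t with an exploration bonus that shrinks as the node is visited more. A small sketch (the exploration constant c ≈ √2 is the usual textbook choice):

```python
import math

def ucb1(wins, visits, parent_visits, c=1.41):
    """UCB1 score for a child node: win rate plus an exploration bonus
    that is large for rarely visited children."""
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

# Two children with the same 50% win rate: the barely-sampled one gets a
# larger bonus, so UCT explores it next instead of greedily re-exploiting.
print(ucb1(5, 10, 100))  # well-sampled child
print(ucb1(1, 2, 100))   # barely-sampled child: higher score
```

During the selection phase, UCT descends the tree by always moving to the child with the highest UCB1 score, which provably balances exploring under-sampled moves against exploiting good ones.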
UCT
[figure: iteration 5; a new node (0/1) is added and its playout ends in a loss; counts are now 0/1 and 2/3 under a 3/5 root]
The tree ensures all relevant actions are explored (greatly alleviating the randomness that affects plain Monte-Carlo methods).
UCT
The random games played from each node of the tree serve to estimate the utility function. They can be fully random, or use an opponent model (if available).
UCT • After a fixed number of iterations K (or after the assigned time is over), UCT analyzes the resulting tree, and the selected action is the one that has been explored most often.
• UCT can search in games with much larger state spaces than minimax. It is the standard algorithm in modern (2008 to present) Go-playing programs
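Putting the four phases together (selection by UCB1, expansion, random simulation, backpropagation), here is a compact UCT sketch. The toy game (one-pile Nim: take 1–3 stones, taking the last stone wins) and all constants are my own assumptions for illustration:

```python
import math
import random

def legal_moves(stones):
    # Toy Nim game (an assumption, not from the lecture): take 1-3 stones
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones              # stones left; some player to move
        self.parent, self.move = parent, move
        self.children = []
        self.untried = legal_moves(stones)
        self.wins = 0.0                   # wins for the player who just moved
        self.visits = 0

def ucb1(child, c=1.41):
    return (child.wins / child.visits
            + c * math.sqrt(math.log(child.parent.visits) / child.visits))

def uct_best_move(stones, iterations=3000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node, to_move = root, 0
        # 1. Selection: descend fully-expanded nodes by UCB1
        while not node.untried and node.children:
            node = max(node.children, key=ucb1)
            to_move = 1 - to_move
        # 2. Expansion: add one new child to the tree
        if node.untried:
            m = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - m, parent=node, move=m)
            node.children.append(child)
            node, to_move = child, 1 - to_move
        # 3. Simulation: random playout to the end of the game
        s, p = node.stones, to_move
        while s > 0:
            s -= rng.choice(legal_moves(s))
            p = 1 - p
        winner = 1 - p                    # whoever took the last stone
        # 4. Backpropagation: credit the result to each node's mover
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == 1 - to_move:
                node.wins += 1
            node, to_move = node.parent, 1 - to_move
    # The recommended action is the most-visited child of the root
    return max(root.children, key=lambda ch: ch.visits).move

print(uct_best_move(5))  # the only winning move from 5 stones is to take 1
```

Returning the most-visited (rather than highest-scoring) child is the standard final-move rule: visit counts are more robust than win-rate estimates based on few samples.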
Outline • Student Presentations:
• “Game AI as Storytelling” • “Computational Approaches to Story-telling and Creativity”
• Monte-Carlo Search Algorithms • UCT • Strategy Simulation
Game Tree Search in RTS Games • Problem:
• Lots of possible actions, branching factor too large!
• Solution: • Sampling (Monte-Carlo Search)
• Problems: • real-time, no turn taking, simultaneous actions
• Solution: • Strategy simulation, rather than turn-based action taking
Strategy Simulation: Example • Assume we want to use UCT for the strategy module of an RTS game AI
• Define a collection of “high-level actions” (or strategies) that make sense for the game. For example, in S3: • S1: Attack with the units we have • S2: Train 4 footmen • S3: Train 4 archers • S4: Train 4 catapults • S5: Train 4 knights • S6: Build 2 defense towers • S7: Build 2 defense towers around a gold mine • S8: Build 2 defense towers around a group of trees • S9: Bring units back to the base • S10: Train 2 more peasants to gather resources
Strategy Simulation: Example • Instead of taking turns executing actions, we assign a “strategy” to each player, and simulate it until completion:
[figure comparing the two search trees:
Standard minimax branches on individual actions: Player 1, Action 1 → Player 2, Action 2 → Player 1, Action 3 → …
Strategy simulation branches on strategy pairs with completion times, e.g.: Player 1: S2 (ETA 240), Player 2: S3 (ETA 400); Player 1: S1 (ETA 400), Player 2: S3 (ETA 160); Player 1: S1 (ETA 240), Player 2: S1 (ETA 400); Player 1: S1, Player 2: S1]
Strategy Simulation • Requires:
• A way to simulate strategies: typically a very simplified model • E.g. battles just decided by who has more units, or added damage of
units (taking into account air/ground units) • No pathfinding, etc. • Abstracted version of the game, e.g.: divide map in regions, and just
count the number of unit types in each region • Utility function (optional):
• If available, there is no need to simulate games till the end when using Monte-Carlo
• If not available, simply simulate games to the end
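The simulation loop itself can be sketched as an event-driven advance to whichever player's strategy completes first. Everything here (function names, the state as a dict, the toy ETAs) is a hypothetical sketch, not the lecture's implementation:

```python
def simulate_strategies(state, choose, apply_effect, duration, horizon):
    """Simultaneous strategy simulation for two players.
    choose(state, player) -> strategy id; duration(strategy) -> ETA in cycles;
    apply_effect(state, player, strategy) mutates the abstract state."""
    pending = {}
    for player in (0, 1):
        s = choose(state, player)
        pending[player] = (duration(s), s)
    while not state.get("over", False):
        # Advance to the earliest strategy completion
        player = min(pending, key=lambda p: pending[p][0])
        finish, s = pending[player]
        if finish > horizon:
            break
        apply_effect(state, player, s)
        # The finishing player immediately commits to a new strategy
        s2 = choose(state, player)
        pending[player] = (finish + duration(s2), s2)
    return state

# Toy usage: player 0 repeats a 240-cycle strategy, player 1 a 400-cycle one
def completed_once(st, p, s):
    st["completed"][p] += 1

state = {"completed": [0, 0]}
simulate_strategies(
    state,
    choose=lambda st, p: "S2" if p == 0 else "S3",
    apply_effect=completed_once,
    duration=lambda s: 240 if s == "S2" else 400,
    horizon=1000,
)
print(state["completed"])  # player 0 finishes 4 times, player 1 twice
```

Inside UCT, each tree edge would correspond to one such strategy commitment, and the simulation advances in these coarse jumps instead of one game cycle (or one turn) at a time.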
UCT for RTS Games • Applicable to:
• Strategy (previous example) • Attack: where the high-level actions are things like attack enemy X, retreat, etc. • Economy
• In turn-based games, minimax is executed each turn • For RTS games: execute it every K cycles (e.g. once per second), once the current action has finished, or when an important event happens (e.g. a new enemy is sighted)
• State of the art: • No current commercial games use it • Research in experimental games shows its potential
Projects 3 & 4 • Project 4 (and last): Rule-based strategy for an RTS game (S3)
• Idea: • Create a perception layer that builds a simple knowledge base (logical terms) • Create a simple unification algorithm with variable bindings • Define a set of actions the rule-based system can execute • Define a small set of rules (do not overdo it! :-)) • RETE is optional (extra credit) • See how well it plays, and how easy it is to make the AI play well!
• Does anyone want to do a different Project 4? Any ideas?
Next Thursday • Machine Learning in games (last lecture!)