Computing equilibria in extensive form games
Andrew Gilpin
Advanced AI – April 7, 2005
This talk
• Extensive form games
  – Representation
  – Computing equilibrium
• Poker AI
  – History of poker research
  – Current research
Extensive form representation
1. I = {0, 1, …, n} – players
2. (V, E), terminals Z – tree
3. P: V \ Z → H – controlling player
4. H = {H0, …, Hn} – information sets
5. A = {A0, …, An} – actions
6. u: Z → R^n – payoffs
7. p – chance probabilities
Perfect recall assumption: players never forget information.
Game from: Bernhard von Stengel. Efficient Computation of Behavior Strategies. Games and Economic Behavior 14:220-246, 1996.
Computing equilibria via normal form
• The normal form is exponential in the size of the game tree, in the worst case and in practice (e.g. poker)
Sequence form
• Instead of a move for every information set, consider the choices necessary to reach each information set and each leaf
• These choices are sequences and constitute the pure strategies in the sequence form
S1 = {{}, l, r, L, R}
S2 = {{}, c, d}
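Realization plans
• Players' strategies are specified as realization plans over sequences: a realization plan x has x({}) = 1, and for each information set h reached by sequence σ_h, the values x(σ_h c) over the moves c at h sum to x(σ_h)
• Prop. Realization plans are equivalent to behavior strategies.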
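The equivalence can be illustrated in one direction by multiplying behavior probabilities along each sequence. The sketch below is a minimal illustration, not the talk's code: the information-set names `h1` and `h2`, the dictionary layout, and the sample probabilities are assumptions, and both of player 1's information sets are assumed to be reached by the empty sequence, which is consistent with S1 containing only one-move sequences.

```python
from fractions import Fraction

# Hypothetical encoding of player 1 in the example: two information sets,
# each reached by the empty sequence (), with moves (l, r) and (L, R).
INFO_SETS = {
    "h1": {"parent_seq": (), "moves": ["l", "r"]},
    "h2": {"parent_seq": (), "moves": ["L", "R"]},
}


def realization_plan(behavior, info_sets):
    """Turn per-information-set move probabilities into sequence probabilities."""
    plan = {(): Fraction(1)}  # the empty sequence is always realized
    pending = dict(info_sets)
    while pending:
        progressed = False
        for name, h in list(pending.items()):
            parent = h["parent_seq"]
            if parent in plan:
                for move in h["moves"]:
                    # x(parent + move) = x(parent) * behavior probability of move
                    plan[parent + (move,)] = plan[parent] * behavior[name][move]
                del pending[name]
                progressed = True
        if not progressed:
            raise ValueError("an information set has an unknown parent sequence")
    return plan


def satisfies_constraints(plan, info_sets):
    """Check x({}) = 1 and that the moves at each h sum to x(parent sequence)."""
    if plan[()] != 1:
        return False
    return all(
        sum(plan[h["parent_seq"] + (m,)] for m in h["moves"]) == plan[h["parent_seq"]]
        for h in info_sets.values()
    )


behavior = {
    "h1": {"l": Fraction(1, 3), "r": Fraction(2, 3)},
    "h2": {"L": Fraction(1, 4), "R": Fraction(3, 4)},
}
plan = realization_plan(behavior, INFO_SETS)
```

Here `plan` has one entry per sequence in S1, and `satisfies_constraints(plan, INFO_SETS)` confirms that it obeys the realization constraints.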
Computing equilibria via sequence form
• Players 1 and 2 have realization plans x and y
• Realization constraint matrices E and F specify constraints on realizations
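[Figure: constraint matrices E and F, labeled by player 1's sequences {}, l, r, L, R and player 2's sequences {}, c, d, with further labels {}, v, v’ and {}, u.]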
Computing equilibria via sequence form
• Payoffs for players 1 and 2 are x^T A y and x^T B y for suitable matrices A and B
• Creating payoff matrix:
  – Initialize each entry to 0
  – For each leaf, there is a (unique) pair of sequences corresponding to an entry in the payoff matrix
  – Add the leaf's payoff to that entry, weighted by the product of chance probabilities along the path from the root to the leaf
[Figure: payoff matrix indexed by player 2's sequences {}, c, d and player 1's sequences {}, l, r, L, R.]
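The payoff-matrix construction above can be sketched in a few lines. The leaf list here is hypothetical and is not taken from the slide's figure; the tuple layout, the function names, and the sample realization plans are likewise assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical leaves: (player 1 sequence, player 2 sequence,
# chance probabilities on the root-to-leaf path, payoff to player 1).
LEAVES = [
    (("l",), (), [Fraction(1, 2)], 0),
    (("r",), ("c",), [Fraction(1, 2)], 2),
    (("r",), ("d",), [Fraction(1, 2)], -2),
    (("L",), ("c",), [Fraction(1, 2)], -4),
    (("L",), ("d",), [Fraction(1, 2)], 8),
    (("R",), (), [Fraction(1, 2)], 2),
]


def build_payoff_matrix(leaves):
    """Sparse payoff matrix: entries start at 0 and accumulate weighted payoffs."""
    A = defaultdict(Fraction)  # Fraction() is 0, so missing entries read as 0
    for seq1, seq2, chance_probs, payoff in leaves:
        weight = Fraction(1)
        for p in chance_probs:
            weight *= p  # product of chance probabilities along the path
        A[(seq1, seq2)] += weight * payoff
    return A


def expected_payoff(A, x, y):
    """Compute x^T A y for realization plans given as dicts over sequences."""
    return sum(x.get(s1, 0) * a * y.get(s2, 0) for (s1, s2), a in A.items())


A = build_payoff_matrix(LEAVES)
x = {(): 1, ("l",): 0, ("r",): 1, ("L",): 0, ("R",): 1}
y = {(): 1, ("c",): Fraction(1, 2), ("d",): Fraction(1, 2)}
value = expected_payoff(A, x, y)
```

Storing only the nonzero entries reflects that the matrix has at most one contribution per leaf, so it is sparse even when the number of sequence pairs is large.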
Computing equilibria via sequence form
• Holding x fixed, compute best response (primal and dual LPs)
• Holding y fixed, compute best response (primal and dual LPs)
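The primal and dual programs can be written out for the zero-sum case (B = -A). What follows is a reconstruction by LP duality in the style of the cited von Stengel paper, not a copy of the slide's formulas. It assumes the realization constraints take the form Ex = e and Fy = f; the right-hand-side names e and f are introduced here for illustration.

```latex
\begin{align*}
\text{Primal (fixed } y\text{):}\quad & \max_{x}\; x^{\top}(Ay)
  && \text{s.t. } Ex = e,\; x \ge 0 \\
\text{Dual (fixed } y\text{):}\quad & \min_{p}\; e^{\top}p
  && \text{s.t. } E^{\top}p \ge Ay \\
\text{Letting } y \text{ vary:}\quad & \min_{y,\,p}\; e^{\top}p
  && \text{s.t. } -Ay + E^{\top}p \ge 0,\; -Fy = -f,\; y \ge 0
\end{align*}
```

The pair for a fixed x follows by exchanging the roles of the two players. The last program has one inequality per sequence of player 1 and one equality per row of F, which is the same shape as the LP in the example on the next slide.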
Computing equilibria via sequence form: An example
min p1
subject to
 x1: p1 - p2 - p3 >= 0
 x2: 0y1 + p2 >= 0
 x3: -y2 + y3 + p2 >= 0
 x4: 2y2 - 4y3 + p3 >= 0
 x5: -y1 + p3 >= 0
 q1: -y1 = -1
 q2: y1 - y2 - y3 = 0
bounds
 y1 >= 0
 y2 >= 0
 y3 >= 0
 p1 Free
 p2 Free
 p3 Free
end
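Because this instance is tiny, the LP above can be sanity-checked without a solver. The sketch below is a brute-force check, not an LP algorithm: it uses q1 and q2 to fix y1 = 1 and y3 = 1 - y2, takes the smallest p2, p3 and p1 allowed by rows x1 to x5, and scans y2 over a grid. The function names and the grid resolution are assumptions.

```python
from fractions import Fraction


def min_p1_given_y(y2, y3):
    """Smallest feasible p1 for fixed y2, y3 (q1 forces y1 = 1)."""
    y1 = Fraction(1)
    p2 = max(Fraction(0), y2 - y3)   # x2: p2 >= 0,  x3: p2 >= y2 - y3
    p3 = max(-2 * y2 + 4 * y3, y1)   # x4: p3 >= -2*y2 + 4*y3,  x5: p3 >= y1
    return p2 + p3                   # x1: p1 >= p2 + p3


def grid_search(steps=100):
    """Scan y2 in [0, 1]; q2 with y1 = 1 gives y3 = 1 - y2."""
    best = None
    for k in range(steps + 1):
        y2 = Fraction(k, steps)
        y3 = 1 - y2
        value = min_p1_given_y(y2, y3)
        if best is None or value < best[0]:
            best = (value, y2, y3)
    return best


best_value, best_y2, best_y3 = grid_search()
print(best_value, best_y2, best_y3)  # prints: 1 1/2 1/2
```

On this grid the minimum is p1 = 1 at y2 = y3 = 1/2. The objective is piecewise linear in y2 with its kink at 1/2, so the grid point is also the exact optimum.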
Sequence form summary
• Poly-time algorithm for computing Nash equilibria in 2-player zero-sum games
• Poly-size linear complementarity problem (LCP) for computing Nash equilibria in 2-player general-sum games
• Major shortcomings:
  – Not well understood for more than two players
  – Sometimes, polynomial is still slow (e.g. poker)
Poker
• Poker is a wildly popular card game
  – This year’s World Series of Poker is expected to have prizes totaling almost $50 million
• Challenges
  – Incomplete information
  – Risk assessment
  – Deception and counter-deception
• Sequence form does not directly apply
  – Two-player Texas Hold’em has ~10^18 nodes
Hold’em Poker
• Every player receives hole cards
• Some cards are placed on the table (flop, turn, river)
• Betting rounds after each deal of cards
  – Players can bet, raise, check, fold, call
• At end of the game, player with best hand takes the pot
Previous work in poker research
• Rule-based
• Simulation/Learning
• Game-theoretic
  – Manual abstraction
    • “Approximating Game-Theoretic Optimal Strategies for Full-scale Poker”, Billings, Burch, Davidson, Holte, Schaeffer, Schauenberg, Szafron, IJCAI-03. Distinguished Paper Award.
  – Automated abstraction
Finding equilibria in large sequential games of incomplete information
(Joint with Tuomas Sandholm, 2005)
• Outline:
  – Extensive game isomorphism
  – Restricted game isomorphic abstraction transformation
  – GameShrink – automatically shrinking games
  – Application to poker
  – Approximation methods
Extensive game isomorphism: example
Extensive game isomorphism: definition
• Let G = (I,V,E,P,H,A,u,p) and G’ = (I’,V’,E’,P’,H’,A’,u’,p’) be given. A bijection f: V → V’ is an extensive game isomorphism if:
1. f induces a graph isomorphism between (V,E) and (V’,E’)
2. For each information set h in G, f induces a bijection between the nodes of h and some h’ in G’
3. P(x) = P’(f(x)) for all x in V \ Z
4. u(x) = u’(f(x)) for all x in Z
5. p(h,a) = p’(f(h), f(a)) for all h in H0
Restricted game isomorphic abstraction transformation
• The restricted game Gx is obtained from G by removing all nodes except x and its descendants.
• (Gx,Gy) is contractible within G if
  1. x and y are in the same information set
  2. Every node in that information set has the same parent, and the parent is either in a singleton information set or a chance node
  3. Gx and Gy are extensive game isomorphic
• For (Gx,Gy) contractible, the restricted game isomorphic abstraction transformation is the game where Gx and Gy are “merged”
Restricted game isomorphic abstraction transformation: example
Main equilibrium result
• Thm. Let G be a sequential game with observable actions, let G’ be obtained by one application of the restricted game isomorphic abstraction transformation, and let s’ be a Nash equilibrium for G’. Then the corresponding s for G is a Nash equilibrium.
Computing ExtensiveGameIsomorphic?(x,y)
1. If x and y are both leaves, return u(x) == u(y)
2. If x and y have a different number of children, or if a different player controls them, return false
3. Construct bipartite graph Gx,y (see next slide).
4. Return true if Gx,y has a perfect matching; otherwise return false.
Constructing Gx,y
• Each vertex corresponds to an information set containing a child node.
• Edges connect information sets where there exists a bijection between extensive game isomorphic vertices (extensive game isomorphic information sets)
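The recursion and the matching step can be sketched for a simplified setting. This is not the talk's implementation: it assumes every information set is a singleton, so the vertices of the bipartite graph are the child nodes themselves, and it ignores chance probabilities. The dictionary-based tree encoding and all function names are hypothetical.

```python
def leaf(payoff):
    return {"payoff": payoff}


def node(player, *children):
    return {"player": player, "children": list(children)}


def is_leaf(n):
    return not n.get("children")


def has_perfect_matching(adj, n):
    """Augmenting-path bipartite matching; adj[i] lists right vertices for left i."""
    match_right = [-1] * n

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be moved elsewhere
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    return all(try_augment(u, set()) for u in range(n))


def isomorphic(x, y):
    # Step 1: two leaves are isomorphic exactly when payoffs agree.
    if is_leaf(x) and is_leaf(y):
        return x["payoff"] == y["payoff"]
    # Step 2: differing shape or controlling player rules out isomorphism.
    if is_leaf(x) or is_leaf(y):
        return False
    if len(x["children"]) != len(y["children"]) or x["player"] != y["player"]:
        return False
    # Step 3: edge (i, j) when child i of x is isomorphic to child j of y.
    n = len(x["children"])
    adj = [
        [j for j in range(n) if isomorphic(x["children"][i], y["children"][j])]
        for i in range(n)
    ]
    # Step 4: isomorphic iff the children can be paired off completely.
    return has_perfect_matching(adj, n)


a = node(1, leaf(0), node(2, leaf(1), leaf(-1)))
b = node(1, node(2, leaf(-1), leaf(1)), leaf(0))  # a with children reordered
c = node(1, leaf(0), node(2, leaf(1), leaf(2)))   # one payoff differs
```

With these trees, `isomorphic(a, b)` is true and `isomorphic(a, c)` is false. The recursion recomputes child comparisons on every call; a bottom-up pass that caches results for equal-depth pairs avoids that repetition.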
GameShrink: Efficiently computing restricted game isomorphic abstraction transformations
1. Bottom-up pass: Compute the ExtensiveGameIsomorphic relation for each pair of equal-depth nodes.
2. Top-down pass: For i from 0 to height(G):
   • For each information set h at level i whose nodes share a common parent:
     • Apply the restricted game isomorphic abstraction transformation to each applicable x and y in h
Enhancements
• Disjoint-set data structure for storing isomorphisms
• Implicit enumeration of game tree nodes
• Necessary conditions for extensive game isomorphism
• Payoff histogram database
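The first enhancement can be illustrated with a standard disjoint-set (union-find) structure. How GameShrink uses it internally is an assumption here: the sketch records pairs of nodes found isomorphic and answers later queries by comparing representatives, which is valid because isomorphism is an equivalence relation. The class and the node labels are hypothetical.

```python
class DisjointSet:
    """Union-find with path compression and union by rank."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def find(self, item):
        if item not in self.parent:
            # First time we see this item: it forms its own class.
            self.parent[item] = item
            self.rank[item] = 0
            return item
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the shallower tree under the deeper one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

    def same(self, a, b):
        return self.find(a) == self.find(b)


classes = DisjointSet()
classes.union("x", "y")  # x and y were found isomorphic
classes.union("y", "z")  # y and z were found isomorphic
```

After these two unions, `classes.same("x", "z")` is true without a direct comparison of x and z, while `classes.same("x", "w")` is false for an unseen node w.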
Application to poker
• Theorem. In poker, isomorphisms can be computed by considering only the card tree.
[Figure: example card tree with cards J1, J2 and K, and leaf values 0, -1, -1, 0, 1, 1.]
Rhode Island Hold’em
• Invented as a testbed for AI research [Shi & Littman 2001]
• More than 3.1 billion game tree nodes
• Applying sequence form:
  – LP has 91 million rows and columns
• Applying GameShrink:
  – LP has 1.2 million rows and columns
  – Solvable in about 1 week
  – GameShrink itself takes less than 1 second; the LP solve still dominates
Future poker research
• More difficult games
  – Multi-player
    • LP only handles two players
    • Possible mapping of n-player strategy to (n+1)-player strategy
  – Tournament
    • Size of bankroll changes aggressiveness of players
    • Maximally vs. Optimally
  – Opponent modeling