Enhancements for Multi-Player Monte-Carlo Tree Search
J. (Pim) A.M. Nijssen, Mark H.M. Winands
29 September 2010
5 October 2010 Enhancements for Multi-Player Monte-Carlo Tree Search 2
Overview
• Introduction
• Progressive History
• MP-MCTS-Solver
• Test domains
• Experiments and Results
• Conclusions
• Future Research
Introduction
• Enhancements for Multi-Player Monte-Carlo Tree Search
  – More than 2 players
  – Techniques
    • max^n (Luckhardt and Irani, 1986)
    • Paranoid (Sturtevant and Korf, 2000)
  – Games
    • Chinese Checkers
    • Hearts
Introduction
• Enhancements for Multi-Player Monte-Carlo Tree Search
  – Best-first search technique
  – Monte Carlo simulations
  – Four phases
    • Selection (UCT)
    • Expansion (1 node per sample)
    • Playout (ε-greedy)
    • Backpropagation
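The ε-greedy playout step can be illustrated with a minimal sketch. It assumes the usual meaning of ε-greedy: with probability ε a random legal move is played, otherwise the move that scores best under some evaluation. The function name `epsilon_greedy_move` and the `evaluate` callback are hypothetical; the slides do not say which evaluation is used in the playouts.

```python
import random


def epsilon_greedy_move(moves, evaluate, epsilon=0.05, rng=random):
    """Pick one playout move from a non-empty list of legal moves.

    With probability `epsilon` a uniformly random move is returned;
    otherwise the move with the highest `evaluate(move)` is returned.
    `evaluate` is a hypothetical heuristic supplied by the caller.
    """
    if not moves:
        raise ValueError("no legal moves")
    if rng.random() < epsilon:
        return rng.choice(moves)       # exploration: random move
    return max(moves, key=evaluate)    # exploitation: best-looking move
```

For example, `epsilon_greedy_move([3, 7, 5], evaluate=lambda m: m, epsilon=0.0)` always returns 7, because no random move is ever taken when ε is 0.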
Introduction
• Enhancements for Multi-Player Monte-Carlo Tree Search
  – Stores tuple of size N in nodes
  – Game returns tuple of size N
    • Winner gets a score of 1, losers get a score of 0
    • Score is split in case of multiple winners
      – e.g. [½, ½, 0] is returned if Players 1 and 2 both win
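The reward tuple described above can be sketched as follows. The function name `reward_tuple` is hypothetical, players are indexed from 0 in the code (so Players 1 and 2 on the slide are indices 0 and 1), and the requirement of at least one winner is an assumption of this sketch.

```python
from fractions import Fraction


def reward_tuple(num_players, winners):
    """Return one score per player: winners share a total of 1, losers get 0."""
    winners = set(winners)
    if not winners:
        raise ValueError("at least one winner is required")  # assumption of this sketch
    share = Fraction(1, len(winners))  # the score of 1 is split evenly among the winners
    return [share if p in winners else Fraction(0) for p in range(num_players)]


print(reward_tuple(3, [0]))     # single winner: [1, 0, 0] as Fractions
print(reward_tuple(3, [0, 1]))  # shared win:    [1/2, 1/2, 0] as Fractions
```

Exact fractions are used here only to keep the split free of rounding; floating-point scores would work equally well.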
Introduction
• Enhancements for Multi-Player Monte-Carlo Tree Search
  – Progressive History
  – Multi-Player Monte-Carlo Tree Search Solver
Progressive History
• Combination of Progressive Bias (Chaslot et al., 2008) and the history heuristic (Schaeffer, 1983)
• Move selection strategy uses action information
• More information available
• Information is less accurate
• Influence decreases over time
Progressive History
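$$v_i = \underbrace{\frac{s_i}{n_i} + C \times \sqrt{\frac{\ln(n_p)}{n_i}}}_{\text{UCT}} + \underbrace{\frac{s_a}{n_a}}_{\text{History heuristic}} \times \underbrace{\frac{W}{n_i - s_i + 1}}_{\text{Progressive Bias}}$$
• Divide by number of losses: the denominator n_i − s_i + 1 of the Progressive Bias part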
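A minimal sketch of how the Progressive History value might be computed for one child. The symbol meanings are assumptions, since the slide does not define them: s_i and n_i are taken to be the score and visit count of child i, n_p the visit count of its parent, and s_a and n_a the accumulated score and play count of move a in the history table. The function name and the choice to use a history score of 0 for an unseen move are hypothetical.

```python
import math


def progressive_history_value(s_i, n_i, n_p, s_a, n_a, C, W):
    """v_i = s_i/n_i + C*sqrt(ln(n_p)/n_i) + (s_a/n_a) * W/(n_i - s_i + 1).

    Requires n_i >= 1 and n_p >= 1.
    """
    uct = s_i / n_i + C * math.sqrt(math.log(n_p) / n_i)
    # Assumption of this sketch: a move never played before has history score 0.
    history = s_a / n_a if n_a > 0 else 0.0
    # n_i - s_i is the number of losses, so the bonus fades as the child loses more.
    bias = W / (n_i - s_i + 1)
    return uct + history * bias


# C = 0.2 as in the experiments; W = 5 is only an illustrative value.
print(progressive_history_value(s_i=5, n_i=10, n_p=100, s_a=30, n_a=60, C=0.2, W=5.0))
```

With W = 0 the value reduces to plain UCT, and for a fixed visit count the history bonus shrinks as the number of losses n_i − s_i grows.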
MP-MCTS-Solver
• Multi-Player version of MCTS-Solver (Winands et al., 2008)
• Updating game-theoretical values
• Update rules
  – Standard (mate in one, one winner)
  – Paranoid
  – First winner
MP-MCTS-Solver
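[Figure: example search tree with root A, children B, C and D, and leaves E, F, G, H and I; the levels are labelled Player 3 and Player 1. Nodes carry game-theoretical values such as [1,0,0] and [0,1,0], and one node is still unproven ([…]). The value of the node in question is undetermined ([?]); the Paranoid rule gives [0,1,0] and the First winner rule gives [1,0,0].]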
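The three update rules can be sketched in code, under one possible reading of the slides, which do not spell the rules out. Assumed here: "mate in one" means a child that is a proven win for the player to move proves the node; "one winner" means all children are proven with the same value; Paranoid means the player to move prefers a child in which the root player does not win; First winner means the value of the first proven child is taken. The function name, its parameters and the example values are hypothetical.

```python
def resolve_proven_value(child_values, mover, root_player, rule="standard"):
    """Return a proven score tuple for a node, or None if it stays unproven.

    child_values: score tuples of the children; None marks an unproven child.
    mover: index of the player to move at this node.
    root_player: index of the player to move at the root (used by "paranoid").
    """
    # Mate in one: a child that is a proven win for the mover proves the node.
    for value in child_values:
        if value is not None and value[mover] == 1:
            return value
    # Otherwise nothing can be concluded until every child is proven.
    if any(value is None for value in child_values):
        return None
    distinct = []
    for value in child_values:
        if value not in distinct:
            distinct.append(value)
    # One winner: all children agree, so the node inherits that value.
    if len(distinct) == 1:
        return distinct[0]
    if rule == "paranoid":
        # Assumed reading: avoid children in which the root player wins.
        against_root = [v for v in distinct if v[root_player] == 0]
        return against_root[0] if against_root else distinct[0]
    if rule == "first_winner":
        # Assumed reading: take the value of the first proven child.
        return distinct[0]
    return None  # "standard": children disagree, the node stays unproven


# Hypothetical values resembling the figure: Player 3 (index 2) to move,
# Player 1 (index 0) at the root, every child a loss for Player 3.
children = [(1, 0, 0), (0, 1, 0), (1, 0, 0)]
for rule in ("standard", "paranoid", "first_winner"):
    print(rule, resolve_proven_value(children, mover=2, root_player=0, rule=rule))
```

Under these assumptions the standard rule leaves the node unproven, the Paranoid rule yields (0, 1, 0) and the First winner rule yields (1, 0, 0), which agrees with the values shown in the figure.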
Test domains
• Multi-player games
• Zero-sum
• Perfect information
• Focus
• Chinese Checkers
Focus
• Capturing pieces by creating stacks
• Goal
  – Total number of pieces captured
  – Number of pieces captured from each opponent
Focus
• Moving
  – Only stacks one owns
  – Orthogonally
  – Move as many squares as the number of pieces
  – Maximum stack size is 5
• Capture pieces by creating larger stacks
Chinese Checkers
• Goal: move pieces to the other side of the board
• Move pieces to adjacent fields or jump over other pieces
  – Sequential jumps
Experiments and Results
• Processor: AMD64 2.4 GHz
• Programming language: Java 6
• MCTS settings: C = 0.2, ε = 0.05
• Time: 2.5 s per turn
• 3360 games per tournament
• All possible configurations
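One way to read "all possible configurations" is sketched below. It assumes two player types per tournament and that line-ups in which every seat has the same type are excluded; neither assumption is stated on the slide, and the type labels are hypothetical. Under that reading there are 2^N − 2 seat assignments for N players, and 3360 games divide evenly over them for N = 2, 3 and 4.

```python
from itertools import product


def seat_configurations(num_players, player_types=("MCTS", "MCTS+enhancement")):
    """All assignments of player types to seats, excluding uniform line-ups."""
    return [c for c in product(player_types, repeat=num_players) if len(set(c)) > 1]


GAMES_PER_TOURNAMENT = 3360  # from the slide

for n in (2, 3, 4):
    count = len(seat_configurations(n))
    print(n, "players:", count, "configurations,",
          GAMES_PER_TOURNAMENT // count, "games each")
# 2 players: 2 configurations, 1680 games each
# 3 players: 6 configurations, 560 games each
# 4 players: 14 configurations, 240 games each
```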
Experiments and Results
• Progressive History in Focus
W       2 players   3 players   4 players
0       52.0%       51.2%       50.8%
0.05    59.0%       61.1%       57.5%
0.1     59.8%       63.0%       58.9%
0.25    61.3%       62.9%       59.4%
0.5     64.1%       65.5%       59.9%
1       66.0%       65.4%       58.2%
3       62.2%       65.2%       59.6%
5       57.9%       63.8%       59.6%
7.5     51.3%       60.6%       57.1%
10      47.4%       57.8%       56.9%
Experiments and Results
• Progressive History in Chinese Checkers
W       2 players   3 players   4 players
0.25    52.8%       59.0%       56.9%
0.5     58.2%       62.8%       58.3%
1       67.8%       63.5%       61.9%
3       79.9%       66.7%       66.4%
5       83.5%       65.8%       66.8%
10      83.2%       65.3%       69.6%
15      81.0%       65.0%       69.2%
20      60.8%       60.2%       63.2%
Experiments and Results
• Divide by number of losses
Game               2 players   3 players   4 players
Focus              64.8%       61.0%       52.0%
Chinese Checkers   57.6%       54.8%       53.9%
Experiments and Results
• MP-MCTS-Solver in Focus
Update rule    2 players   3 players   4 players
Standard       53.0%       54.9%       53.3%
Paranoid       51.9%       50.4%       44.9%
First winner   52.8%       51.5%       43.4%
Conclusions
• Progressive History
  – Significant enhancement in Chinese Checkers and Focus
  – Dividing by the number of losses in the Progressive Bias part increases performance
• MP-MCTS-Solver
  – Small but significant enhancement in Focus
  – Standard update rule works best
Future Research
• Test Progressive History in other games
• Compare Progressive History with similar techniques, like RAVE, prior knowledge (Gelly and Silver, 2007), Gibbs Sampling (Björnsson and Finnsson, 2009), etc.
• Create new update rules for MP-MCTS-Solver