Foundations of Artificial Intelligence
44. Monte-Carlo Tree Search: Introduction

Thomas Keller

Universität Basel

May 27, 2016

T. Keller (Universität Basel) Foundations of Artificial Intelligence May 27, 2016 1 / 29

44.1 Introduction

44.2 Monte-Carlo Methods

44.3 Sparse Sampling

44.4 MCTS

44.5 Summary

Board Games: Overview

chapter overview:

- 41. Introduction and State of the Art
- 42. Minimax Search and Evaluation Functions
- 43. Alpha-Beta Search
- 44. Monte-Carlo Tree Search: Introduction
- 45. Monte-Carlo Tree Search: Advanced Topics
- 46. AlphaGo and Outlook

44.1 Introduction

Monte-Carlo Tree Search: Brief History

- Starting in the 1930s: first researchers experiment with Monte-Carlo methods
- 1998: Ginsberg's GIB player competes with expert Bridge players (this chapter)
- 2002: Kearns et al. propose Sparse Sampling (this chapter)
- 2002: Auer et al. present UCB1 action selection for multi-armed bandits (Chapter 45)
- 2006: Coulom coins the term Monte-Carlo Tree Search (MCTS) (this chapter)
- 2006: Kocsis and Szepesvári combine UCB1 and MCTS into the most famous MCTS variant, UCT (Chapter 45)

Monte-Carlo Tree Search: Applications

Examples of successful applications of MCTS in games:

- board games (e.g., Go, see Chapter 46)
- card games (e.g., Poker)
- AI for computer games (e.g., for Real-Time Strategy Games or Civilization)
- Story Generation (e.g., for dynamic dialogue generation in computer games)
- General Game Playing

Also many applications in other areas, e.g.,

- MDPs (planning with stochastic effects) or
- POMDPs (MDPs with partial observability)

44.2 Monte-Carlo Methods

Monte-Carlo Methods: Idea

- the term covers a broad family of algorithms
- decisions are based on random samples
- results of samples are aggregated by computing the average
- apart from that, the algorithms can differ significantly

Monte-Carlo Methods: Example

Bridge Player GIB, based on Hindsight Optimization (HOP)

- perform samples as long as resources (deliberation time, memory) allow:
  - sample a hand for all players that is consistent with current knowledge about the game state
  - for each legal action, compute whether the perfect-information game that starts with executing that action is won or lost
- compute the win percentage for each action over all samples
- play the card with the highest win percentage
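
The procedure above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not GIB's implementation: the function name `hindsight_optimization`, the fixed sample budget `num_samples` (standing in for "as long as resources allow"), and the toy game in which our card wins whenever it beats a hidden opponent card are all hypothetical.

```python
import random

def hindsight_optimization(actions, sample_world, wins, num_samples, rng):
    """Pick the action with the highest win percentage over sampled worlds.

    sample_world(rng) draws a full game state consistent with what we know;
    wins(world, action) solves the resulting perfect-information game.
    """
    win_counts = {a: 0 for a in actions}
    for _ in range(num_samples):
        world = sample_world(rng)          # one sample of the hidden information
        for a in actions:                  # evaluate every legal action in it
            if wins(world, a):
                win_counts[a] += 1
    percentages = {a: win_counts[a] / num_samples for a in actions}
    best = max(actions, key=lambda a: percentages[a])
    return best, percentages

# Hypothetical toy game: the opponent holds one hidden card from 2..9,
# and playing card `a` wins exactly if it is higher than that card.
def sample_world(rng):
    return rng.choice(range(2, 10))

def wins(world, action):
    return action > world

best, percentages = hindsight_optimization(
    actions=[4, 7, 10], sample_world=sample_world, wins=wins,
    num_samples=200, rng=random.Random(0))
print(best, percentages)
```

In this toy game the card 10 beats every possible hidden card, so it reaches a win percentage of 100% and is played.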

Hindsight Optimization: Example

[Figure: Hindsight Optimization example annotated with win percentages; values shown: 0%, 100%, 0%, 50%, 100%, 0%, 67%, 100%, 33%]

Hindsight Optimization: Restrictions

- HOP is well-suited for imperfect-information games like most card games (Bridge, Skat, Klondike Solitaire)
- it must be possible to solve or approximate the sampled game efficiently
- often not optimal even if provided with infinite resources

Hindsight Optimization: Suboptimality

[Figure: example with the labels "gamble", "safe", "hit", and "miss"]
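
One way to see why HOP can be suboptimal even with unlimited samples is the following hypothetical example. It is not reconstructed from the slide; the hidden coin, the utilities, and all names are assumptions chosen for illustration. The player either takes a safe utility of 1 or gambles and must then guess a hidden coin (a hit pays 2, a miss pays -1). The sketch computes exact expectations, which corresponds to the limit of infinitely many samples.

```python
# Hypothetical example: a hidden fair coin and two first actions.
WORLDS = {"heads": 0.5, "tails": 0.5}
GUESSES = ("heads", "tails")
SAFE_UTILITY = 1.0
HIT_UTILITY, MISS_UTILITY = 2.0, -1.0

def gamble_utility(world, guess):
    return HIT_UTILITY if guess == world else MISS_UTILITY

def hop_value(action):
    """Hindsight value: the best guess is chosen separately in every world."""
    if action == "safe":
        return SAFE_UTILITY
    return sum(p * max(gamble_utility(w, g) for g in GUESSES)
               for w, p in WORLDS.items())

def true_value(action):
    """Real value: one guess must be fixed without seeing the coin."""
    if action == "safe":
        return SAFE_UTILITY
    return max(sum(p * gamble_utility(w, g) for w, p in WORLDS.items())
               for g in GUESSES)

hop_choice = max(("gamble", "safe"), key=hop_value)
true_choice = max(("gamble", "safe"), key=true_value)
print(hop_value("gamble"), true_value("gamble"), hop_choice, true_choice)
```

Because every sampled world is solved with perfect information, HOP assumes the later guess is always a hit and values the gamble at 2, while its real expected utility is 0.5. HOP therefore picks "gamble" although "safe" is better, and more samples do not change that.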

44.3 Sparse Sampling

Reminder: Minimax for Games

Minimax: alternate maximization and minimization

Excursion: Expectimax for MDPs

Expectimax: alternate maximization and expectation (expectation = probability-weighted sum)
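
As a minimal sketch of this alternation, the function below evaluates a small tree. The tree encoding is an assumption made for illustration: a leaf is a number (its utility), and a decision node is a dict that maps each action to a list of `(probability, child)` outcomes.

```python
def expectimax(node):
    """Return the expectimax value of `node`.

    A leaf is a plain number; a decision node is a dict mapping each
    action to a list of (probability, child) outcomes.
    """
    if not isinstance(node, dict):
        return node                       # leaf utility
    return max(                           # maximization over actions
        sum(p * expectimax(child) for p, child in outcomes)  # expectation
        for outcomes in node.values()
    )

# Hypothetical example tree.
inner = {"a": [(0.5, 10), (0.5, 0)], "b": [(1.0, 4)]}
root = {"x": [(0.5, inner), (0.5, 2)], "y": [(1.0, 3)]}
print(expectimax(inner), expectimax(root))
```

Here action "a" in `inner` has expected value 5 and beats "b" with 4, so at the root action "x" is worth 0.5 · 5 + 0.5 · 2 = 3.5, which beats "y" with 3.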

Sparse Sampling: Idea

- search tree creation: in each state, sample a constant number of outcomes according to their probability and ignore the rest
- update values by replacing the probability-weighted sum with the average over the sampled outcomes
- near-optimal: utility of the resulting policy is close to the utility of the optimal policy
- runtime is independent of the number of states
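
The idea can be sketched as a recursive value estimate. This is a simplified illustration under assumptions, not the algorithm as published by Kearns et al.: it uses a fixed lookahead `depth` without discounting, and the names `sparse_sampling`, `width`, and the generative model `sample_outcome(state, action, rng)` returning `(next_state, reward)` are hypothetical.

```python
import random

def sparse_sampling(state, depth, width, actions, sample_outcome, rng):
    """Estimate the value of `state` with lookahead `depth`.

    For every action, `width` outcomes are drawn from the generative model
    and their average replaces the probability-weighted sum of expectimax.
    """
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions(state):
        total = 0.0
        for _ in range(width):            # constant number of sampled outcomes
            next_state, reward = sample_outcome(state, a, rng)
            total += reward + sparse_sampling(
                next_state, depth - 1, width, actions, sample_outcome, rng)
        best = max(best, total / width)   # average instead of expectation
    return best

# Hypothetical toy MDP: "safe" always pays 1, "risky" pays 3 or 0 (50/50).
def actions(state):
    return ("safe", "risky")

def sample_outcome(state, action, rng):
    if action == "safe":
        return state, 1.0
    return state, 3.0 if rng.random() < 0.5 else 0.0

value = sparse_sampling(0, depth=2, width=20, actions=actions,
                        sample_outcome=sample_outcome, rng=random.Random(0))
print(value)
```

The number of recursive calls depends only on the number of actions, `width`, and `depth`, not on how many states the MDP has. It still grows exponentially with `depth`.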

Sparse Sampling: Search Tree

[Figure: two search trees, labelled "Without Sparse Sampling" and "With Sparse Sampling"]

Sparse Sampling: Problems

- runtime is independent of the number of states, but still exponential in the lookahead horizon
- the constant that gives the number of sampled outcomes must be large for good bounds on near-optimality
- search time is difficult to predict
- tree is symmetric ⇒ resources are wasted in non-promising parts of the tree

44.4 MCTS

Monte-Carlo Tree Search: Idea

- perform iterations as long as resources (deliberation time, memory) allow:
  - build a search tree of nodes n annotated with
    - utility estimate Q(n)
    - visit counter N(n)
  - initially, the tree contains only the root node
- execute the action that leads to the successor of the root node with the highest utility estimate

Monte-Carlo Tree Search: Iterations

Each iteration consists of four phases:

- selection: traverse the tree by applying the tree policy
- expansion: add to the tree the first visited state that is not in the tree
- simulation: continue by applying the default policy until a terminal state is reached (which yields the utility of the current iteration)
- backpropagation: for all visited nodes n,
  - increase N(n)
  - extend the current average Q(n) with the yielded utility
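
Extending a running average does not require storing earlier samples. A minimal sketch of the backpropagation update for a single node follows; the function name `backup_values` is an assumption, and the example numbers (utility 39 applied to nodes with Q = 11, N = 13 and so on) match the annotations in this chapter's tree illustrations.

```python
def backup_values(q, n, utility):
    """Return the updated (Q, N) of a node after one more sampled utility."""
    n += 1                      # increase the visit counter N(n)
    q += (utility - q) / n      # extend the average Q(n) incrementally
    return q, n

print(backup_values(11, 13, 39))   # root
print(backup_values(14, 4, 39))
print(backup_values(18, 2, 39))
print(backup_values(0, 0, 39))     # freshly expanded node
```

The update is the usual incremental mean: the new average equals (Q · N + utility) / (N + 1), for example (11 · 13 + 39) / 14 = 13 for the root.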

Monte-Carlo Tree Search

Selection: apply tree policy to traverse tree

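[Figure: search tree in which every node is annotated with its utility estimate Q(n) and visit counter N(n); the selected path runs through the nodes annotated (11, 13), (14, 4), and (18, 2)]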

Monte-Carlo Tree Search

Expansion: create a node for first state beyond the tree

[Figure: search tree annotated with Q(n) and N(n); a new node annotated (0, 0) has been added at the end of the selected path]

Monte-Carlo Tree Search

Simulation: apply default policy until terminal state is reached

[Figure: search tree annotated with Q(n) and N(n); the simulation from the new node annotated (0, 0) ends in a terminal state with utility 39]

Monte-Carlo Tree Search

Backpropagation: update utility estimates of visited nodes

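[Figure: search tree after backing up utility 39 along the visited path; annotations (Q(n), N(n)) change from (11, 13) to (13, 14), from (14, 4) to (19, 5), from (18, 2) to (25, 3), and from (0, 0) to (39, 1)]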

Monte-Carlo Tree Search: Pseudo-Code

Monte-Carlo Tree Search

    tree := new SearchTree
    n0 = tree.add_root_node()
    while time_allows():
        visit_node(tree, n0)
    n* = arg max_{n ∈ succ(n0)} Q(n)
    return n*.get_action()

Monte-Carlo Tree Search: Pseudo-Code

    function visit_node(tree, n)
        if is_final(n.state):
            return u(n.state)
        s = tree.get_unvisited_successor(n)
        if s ≠ none:
            n′ = tree.add_child_node(n, s)
            utility = apply_default_policy()
            backup(n′, utility)
        else:
            n′ = apply_tree_policy(n)
            utility = visit_node(tree, n′)
        backup(n, utility)
        return utility
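
The pseudo-code can be turned into a small runnable Python sketch. Several parts are assumptions that the chapter leaves open: the toy game (three moves "L" or "R", utility is the number of "R" moves), a uniformly random tree policy and default policy, a fixed iteration count in place of `time_allows()`, and the `Node` class that replaces the `tree` object.

```python
import random

class Node:
    """Search node annotated with utility estimate Q and visit counter N."""
    def __init__(self, state, action=None):
        self.state, self.action = state, action  # action that led here
        self.children = []
        self.Q, self.N = 0.0, 0

# Hypothetical toy game: choose "L" or "R" three times.
ACTIONS, HORIZON = "LR", 3

def is_final(state):
    return len(state) == HORIZON

def u(state):
    return state.count("R")

def get_unvisited_successor(n):
    tried = {c.action for c in n.children}
    for a in ACTIONS:
        if a not in tried:
            return a, n.state + a
    return None

def apply_default_policy(state, rng):
    while not is_final(state):               # assumption: random playout
        state += rng.choice(ACTIONS)
    return u(state)

def apply_tree_policy(n, rng):
    return rng.choice(n.children)            # assumption: uniform random

def backup(n, utility):
    n.N += 1
    n.Q += (utility - n.Q) / n.N

def visit_node(n, rng):
    if is_final(n.state):
        return u(n.state)
    unvisited = get_unvisited_successor(n)
    if unvisited is not None:                # expansion + simulation
        action, s = unvisited
        child = Node(s, action)
        n.children.append(child)
        utility = apply_default_policy(s, rng)
        backup(child, utility)
    else:                                    # selection
        child = apply_tree_policy(n, rng)
        utility = visit_node(child, rng)
    backup(n, utility)                       # backpropagation
    return utility

def mcts(root_state, iterations, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):              # stands in for time_allows()
        visit_node(root, rng)
    best = max(root.children, key=lambda c: c.Q)
    return best.action, root

action, root = mcts("", iterations=500)
print(action, [(c.action, round(c.Q, 2), c.N) for c in root.children])
```

In this toy game every playout that starts with "R" scores at least 1 and on average 2, while playouts starting with "L" average 1, so the search recommends "R". A tree policy that prefers promising nodes would concentrate the iterations instead of spreading them uniformly.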

44.5 Summary

Summary

- Simple Monte-Carlo methods like Hindsight Optimization perform well in some games, but are suboptimal even with unbounded resources

- Sparse Sampling allows near-optimal solutions independent of the number of states, but it wastes time in non-promising parts of the tree

- Monte-Carlo Tree Search algorithms iteratively build a search tree. Algorithms are specified in terms of a tree policy and a default policy. (We analyze their theoretical properties in the next chapter.)
