Transcript
Page 1: Online Knowledge Enhancements for Monte Carlo Tree Search ... · Online Knowledge Enhancements for Monte Carlo Tree Search in Probabilistic Planning Bachelor presentation Author:

Online Knowledge Enhancements for Monte Carlo Tree Search in Probabilistic Planning
Bachelor presentation

Marcel Neidinger <[email protected]>

Department of Mathematics and Computer Science, University of Basel

13. February 2017

Page 2:

What is Probabilistic Planning?

Solve planning tasks with probabilistic transitions
Models a Markov Decision Problem given by $M = \langle V, s_0, A, T, R \rangle$

A set of binary variables $V$ inducing states $S = 2^V$
An initial state $s_0 \in S$
A set of applicable actions $A$
A transition model $T : S \times A \times S \to [0, 1]$
A reward $R(s, a)$

Monte Carlo Tree Search algorithms solve MDPs

Online Knowledge Enhancements for Monte Carlo Tree Search in Probabilistic Planning 2 / 33

Page 3:

Monte Carlo Tree Search Algorithms

Algorithmic framework to solve MDPs
Used especially in computer Go

[Figures: Go board (1), Lee Sedol (2)]

(1) Source: https://commons.wikimedia.org/wiki/File:Go_board.jpg
(2) Source: https://qz.com/639952/googles-ai-won-the-game-go-by-defying-millennia-of-basic-human-instinct/

Page 4:

Four phases - Two components

[Figure: the four MCTS phases: Selection, Expansion, Simulation, Backpropagation]
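As a rough illustration of how the four phases and the two components (tree policy and default policy) fit together, here is a minimal Python sketch of a single trial. Everything in it is an assumption made for illustration: the dictionary-based tree, the `step` function, the fixed `horizon`, and the choice to back up the total trial reward are not taken from the slides.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Hashable
# Hypothetical tree layout: state -> action -> [visit count, mean reward]
Tree = Dict[State, Dict[str, List[float]]]


def run_trial(
    tree: Tree,
    state: State,
    actions: Sequence[str],
    step: Callable[[State, str], Tuple[State, float]],
    tree_policy: Callable[[Dict[str, List[float]]], str],
    default_policy: Callable[[Sequence[str]], str],
    horizon: int,
) -> float:
    """Run one trial and return its total reward."""
    path: List[Tuple[State, str]] = []
    total = 0.0
    depth = 0

    # 1. Selection: follow the tree policy through the known part of the tree.
    while state in tree and depth < horizon:
        action = tree_policy(tree[state])
        path.append((state, action))
        state, reward = step(state, action)
        total += reward
        depth += 1

    # 2. Expansion: add the first unknown state to the tree.
    if depth < horizon and state not in tree:
        tree[state] = {a: [0, 0.0] for a in actions}

    # 3. Simulation: let the default policy finish the trial.
    while depth < horizon:
        state, reward = step(state, default_policy(actions))
        total += reward
        depth += 1

    # 4. Backpropagation: update every (state, action) pair chosen during selection.
    # Simplification (assumption): the whole trial reward is backed up, not the reward-to-go.
    for visited_state, action in path:
        stats = tree[visited_state][action]
        stats[0] += 1
        stats[1] += (total - stats[1]) / stats[0]
    return total
```

The tree policy and the default policy are passed in as plain functions, so the enhancements discussed later in the talk would only replace those two arguments in a design like this.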

Page 5:

Monte Carlo Tree node

MCTS tree for an MDP $M$

Important information in a tree node:
A state $s \in S$
A counter $N^{(i)}(s)$ for the number of visits
A counter $N^{(i)}(s, a)$ for every $a \in A$, counting how often $a$ was selected in $s$
A reward estimate $Q^{(i)}(s, a)$ for action $a$ in state $s$
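The node contents listed above could be held in a small record type. The sketch below is only an illustration: the class name, the field names, and the running-average update are assumptions, not the data structure of any particular planner.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class TreeNode:
    """Hypothetical MCTS tree node holding the four items from the slide."""

    state: Hashable                                               # s
    visits: int = 0                                               # N(s)
    action_visits: Dict[str, int] = field(default_factory=dict)   # N(s, a)
    q_values: Dict[str, float] = field(default_factory=dict)      # Q(s, a)

    def update(self, action: str, reward: float) -> None:
        """Record one trial that selected `action` here and observed `reward`."""
        self.visits += 1
        n = self.action_visits.get(action, 0) + 1
        self.action_visits[action] = n
        q = self.q_values.get(action, 0.0)
        # Assumption: Q(s, a) is the running average of the observed rewards.
        self.q_values[action] = q + (reward - q) / n
```

For example, two updates for the same action with rewards 10 and 0 leave `q_values` at 5.0 and both counters at 2.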

Page 6:

Online Knowledge

AlphaGo used neural networks for the two policies → domain-specific knowledge
We want domain-independent enhancements

Page 7:

Overview

Tree-Policy Enhancements
  All Moves as First
    α-AMAF
    Cutoff-AMAF
  Rapid Action Value Estimation

Default-Policy Enhancements
  Move-Average Sampling Technique

Conclusion

Page 8:

What is a Tree Policy?

Iterate through the known part of the tree and select an action given a node
Use a Q value for a state-action pair to estimate an action's reward

Page 9:

UCT

MCTS implementation first proposed in 2006

[Figure: example search tree with states s1 to s5, moves m, m′, m′′ and a trial reward of 10]

Page 10:

UCT

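Reward approximation, parent node $v_l$, child node $v_j$

$$\mathrm{UCT}(v_l, v_j) = Q^{(i)}(s_l, a_j) + 2 C_p \sqrt{\frac{2 \ln N^{(i)}(s_l)}{N^{(i+1)}(s_j)}} \tag{1}$$

From parent $v_l$ select the child node $v^*$ that maximises the UCT score:

$$v^* = \arg\max_{v_j} \mathrm{UCT}(v_l, v_j) \tag{2}$$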
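The UCT score and the selection rule can be written down directly. The sketch below follows the formula as stated on the slide; the function names, the `(q, visits)` tuple layout, the default value of `cp`, and the treatment of unvisited children as infinitely attractive are assumptions made for illustration.

```python
import math
from typing import Dict, Tuple


def uct_score(q: float, parent_visits: int, child_visits: int,
              cp: float = 1 / math.sqrt(2)) -> float:
    """Q + 2 * Cp * sqrt(2 * ln(N_parent) / N_child), as in equation (1)."""
    if child_visits == 0:
        # Assumption: an unvisited child is always tried first.
        return math.inf
    exploration = math.sqrt(2 * math.log(parent_visits) / child_visits)
    return q + 2 * cp * exploration


def select_child(children: Dict[str, Tuple[float, int]], parent_visits: int,
                 cp: float = 1 / math.sqrt(2)) -> str:
    """Return the key of the child with the highest UCT score, as in equation (2).

    `children` maps a hypothetical child name to (reward estimate, visit count).
    """
    return max(children,
               key=lambda name: uct_score(children[name][0], parent_visits,
                                          children[name][1], cp))
```

With `cp = 0` the rule is purely greedy on the reward estimate; a larger `cp` gives rarely visited children a larger bonus.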

Page 11:

All Moves as First - Idea

UCT score needs several trials to become reliable
Idea: Generalize information extracted from trials
Implementation: Use an additional (node-independent) score that updates unselected actions as well

[Figure: example search tree with states s1 to s5, moves m, m′, m′′ and a trial reward of 10, next to a table with the columns State, Action, Reward whose first row reads s1, m, …]
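One possible reading of a node-independent AMAF score is a table keyed by action only, updated for every action that occurred anywhere in a trial. The sketch below shows that reading; the class name, the per-trial deduplication, and the averaging are assumptions and may differ from the variant evaluated in the talk.

```python
from collections import defaultdict
from typing import Iterable


class AmafTable:
    """Hypothetical action-indexed AMAF statistics shared by all tree nodes."""

    def __init__(self) -> None:
        self.total = defaultdict(float)  # summed trial rewards per action
        self.count = defaultdict(int)    # number of trials containing the action

    def update(self, trial_actions: Iterable[str], reward: float) -> None:
        """Credit the trial reward to every distinct action seen in the trial."""
        for action in set(trial_actions):
            self.total[action] += reward
            self.count[action] += 1

    def score(self, action: str, default: float = 0.0) -> float:
        """Average reward of the trials in which `action` occurred."""
        n = self.count.get(action, 0)
        return self.total[action] / n if n else default
```

In the pictured example, a trial that plays m, m′ and m′′ and ends with reward 10 would raise the score of all three moves, not only of the move selected at the root.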

Page 12:

All Moves as First - α-AMAF

Idea: Combine UCT and AMAF score

$$\mathrm{SCR} = \alpha \cdot \mathrm{AMAF} + (1 - \alpha) \cdot \mathrm{UCT} \tag{3}$$

Choose action with highest SCR
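The combined score is a plain convex combination, so a short sketch suffices. The function names and the `(amaf, uct)` tuple layout are illustrative assumptions.

```python
from typing import Dict, Tuple


def combined_score(amaf: float, uct: float, alpha: float) -> float:
    """SCR = alpha * AMAF + (1 - alpha) * UCT, as in equation (3)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * amaf + (1 - alpha) * uct


def choose_action(scores: Dict[str, Tuple[float, float]], alpha: float) -> str:
    """Pick the action with the highest SCR; `scores` maps action -> (amaf, uct)."""
    return max(scores, key=lambda a: combined_score(scores[a][0], scores[a][1], alpha))
```

Setting `alpha = 0` recovers plain UCT, and `alpha = 1` ranks actions by the AMAF score alone.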

Page 13:

All Moves as First - α-AMAF - Results

[Figure: IPPC score (0 to 1) per domain (wildfire, triangle, academic, elevators, tamarisk, sysadmin, recon, game, traffic, crossing, skill, navigation, total) for AMAF with α = 0, 0.2, 0.4, 0.6, 0.8 and 1.0]

Page 14:

All Moves as First - α-AMAF - Problems

With more trials UCT becomes more reliable
AMAF score has higher variance

We want to discontinue using the AMAF score after some time

Page 16:

All Moves as First - Cutoff-AMAF

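Introduce cutoff parameter $K$

$$\mathrm{SCR} = \begin{cases} \alpha \cdot \mathrm{AMAF} + (1 - \alpha) \cdot \mathrm{UCT}, & \text{for } i \le K \\ \mathrm{UCT}, & \text{else} \end{cases} \tag{4}$$

Use the AMAF score only in the first $K$ trials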
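The hard cutoff only adds one branch to the combined score. In this sketch the parameter names are illustrative, and `trial` is assumed to be the 1-based index i of the current trial.

```python
def cutoff_score(amaf: float, uct: float, alpha: float,
                 trial: int, cutoff_k: int) -> float:
    """Blend AMAF and UCT during the first `cutoff_k` trials, then use UCT alone."""
    if trial <= cutoff_k:
        return alpha * amaf + (1 - alpha) * uct
    return uct
```

With `cutoff_k = 0` the AMAF score is never used, which corresponds to raw UCT.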

Page 17:

All Moves as First - Cutoff-AMAF - Results

[Figure: total IPPC score (axis from 0.5 to 0.75) against the K value (0 to 50); legend: init: IDS, backup: MC; Raw UCT; Plain α-AMAF]

Page 18:

All Moves as First - Cutoff-AMAF - Problems

How to choose the parameter K?
When is the UCT score reliable enough?

Page 19:

Rapid Action Value Estimation - Idea

First introduced in 2007 for computer Go
Use soft cutoff

$$\alpha = \max\left\{0, \frac{V - v(n)}{V}\right\} \tag{5}$$

Use UCT for often-visited nodes and the AMAF score for less-visited ones
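The soft cutoff can be sketched as a weight that decays linearly with the visit count. The slide does not spell out the symbols, so this sketch assumes that v(n) is the number of visits of node n and that V is a tunable threshold; the function names are illustrative.

```python
def rave_alpha(node_visits: int, v_param: float) -> float:
    """alpha = max(0, (V - v(n)) / V), as in equation (5)."""
    if v_param <= 0:
        raise ValueError("V must be positive")
    return max(0.0, (v_param - node_visits) / v_param)


def rave_score(amaf: float, uct: float, node_visits: int, v_param: float) -> float:
    """Blend AMAF and UCT with the visit-dependent weight alpha."""
    alpha = rave_alpha(node_visits, v_param)
    return alpha * amaf + (1 - alpha) * uct
```

Under these assumptions an unvisited node relies entirely on the AMAF score, and once a node has been visited V times or more only the UCT score remains.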

Page 20:

Rapid Action Value Estimation - Results

[Figure: IPPC score (0 to 1) per domain (wildfire, triangle, academic, elevators, tamarisk, sysadmin, recon, game, traffic, crossing, skill, navigation, total) for UCT, RAVE(5), RAVE(15), RAVE(25) and RAVE(50)]

Page 21:

All Moves as First - Conclusion

[Figure: IPPC score (0 to 1) per domain (wildfire, triangle, academic, elevators, tamarisk, sysadmin, recon, game, traffic, crossing, skill, navigation, total) for UCT, RAVE(25) and AMAF(α = 0.2)]

Page 22:

Rapid Action Value Estimation - Problems

PROST uses a problem description with conditional effects
Also no preconditions given
PROST description is more general

[Figure: grid with player, goal field and move path]

In PROST:
Action: move_up

In e.g. computer chess:
Action: move_a2_to_a3

Page 23:

Predicate Rapid Action Value Estimation

A state has predicates that give some context
Idea: Use predicates to find similar states and use their score

$$Q_{\mathrm{PRAVE}}(s, a) = \frac{1}{N} \sum_{p \in P} Q_{\mathrm{RAVE}}(p, a) \tag{6}$$

and weight with

$$\alpha = \max\left\{0, \frac{V - v(n)}{V}\right\} \tag{7}$$
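The predicate average can be sketched as follows. The slide leaves P and N undefined, so this sketch assumes that P is the set of predicates that hold in state s and that N is the size of P; the table layout and the zero default for unseen pairs are also assumptions.

```python
from typing import Dict, Iterable, Tuple


def prave_q(predicates: Iterable[str], action: str,
            q_rave: Dict[Tuple[str, str], float]) -> float:
    """Average the per-predicate RAVE values of `action`, as in equation (6).

    `q_rave` is a hypothetical table mapping (predicate, action) to a RAVE value.
    """
    preds = list(predicates)
    if not preds:
        return 0.0
    # Assumption: (predicate, action) pairs that were never updated count as 0.
    total = sum(q_rave.get((p, action), 0.0) for p in preds)
    return total / len(preds)
```

Two states that share most of their predicates then receive similar scores for the same action, which is the kind of generalisation across similar states that the slide describes.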

Page 24:

All Moves as First - Conclusion - Revisited

[Figure: IPPC score (0 to 1) per domain (wildfire, triangle, academic, elevators, tamarisk, sysadmin, recon, game, traffic, crossing, skill, navigation, total) for UCT, PRAVE, RAVE(25) and AMAF(α = 0.2)]

Page 26:

What is a Default Policy?

[Figure: the Simulation phase of MCTS]

Simulate the outcome of a trial
Basic default policy: random walk

Page 27:

X-Average Sampling Technique

Use tree knowledge to bias the default policy towards moves that are more goal-oriented

Page 28:

Move-Average Sampling Technique - Idea - Sample Game

[Figure: grid with player, goal field and move path]

Introduce $Q(a)$
Use moves that are good on average
Choose action according to:

$$P(a) = \frac{e^{Q(a)/\tau}}{\sum_{b \in A} e^{Q(b)/\tau}} \tag{8}$$
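The sampling rule is a softmax over the action values with temperature τ. The sketch below is illustrative: the function names, the dictionary of Q(a) values, and the subtraction of the maximum for numerical stability are assumptions, and the subtraction does not change the resulting probabilities.

```python
import math
import random
from typing import Dict


def mast_probabilities(q: Dict[str, float], tau: float = 1.0) -> Dict[str, float]:
    """P(a) = exp(Q(a) / tau) / sum_b exp(Q(b) / tau), as in equation (8)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    q_max = max(q.values())
    # Shifting by the maximum avoids overflow and leaves the ratios unchanged.
    weights = {a: math.exp((value - q_max) / tau) for a, value in q.items()}
    normaliser = sum(weights.values())
    return {a: w / normaliser for a, w in weights.items()}


def sample_action(q: Dict[str, float], tau: float = 1.0, rng=random) -> str:
    """Draw one action for the default policy according to P(a)."""
    probs = mast_probabilities(q, tau)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
```

A small τ makes the default policy nearly greedy with respect to Q(a), while a large τ moves it back towards a uniform random walk.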

Page 29:

Move-Average Sampling Technique - Idea - Example

Actions: r, r, u, u, u
Q(r) = 1; N(r) = 2
Q(u) = 6; N(u) = 3

Actions: r, r, u, l, l
Q(r) = 2; N(r) = 4
Q(u) = 7; N(u) = 4
Q(l) = 3; N(l) = 2

Page 30:

Move-Average Sampling Technique - Idea - Example (2)

Actions: l, u, u, r, r
Q(r) = 7; N(r) = 6
Q(u) = 8; N(u) = 6
Q(l) = 2; N(l) = 3

Actions: r, r, r, u, u
Q(r) = 7; N(r) = 9
Q(u) = 9; N(u) = 8
Q(l) = 2; N(l) = 3

Page 31:

Move-Average Sampling Technique - Results

[Figure: IPPC score (0 to 0.5) per domain (wildfire, triangle, academic, elevators, tamarisk, sysadmin, recon, game, traffic, crossing, skill, navigation, total) for UCT(RandomWalk) and UCT(MAST)]

Page 33:

Conclusion

Tree-policy enhancements
  α-AMAF and RAVE perform worse than standard UCT
  PRAVE performs slightly better but still worse than standard UCT

Default-policy enhancements
  MAST outperforms RandomWalk

Page 34:

Questions?

[email protected]

