Monte Carlo Tree Search: Insights and Applications
BCS Real AI Event
Simon Lucas
Game Intelligence Group, University of Essex

Outline
- General machine intelligence: the ingredients
- Monte Carlo Tree Search: a quick overview and tutorial
- Example application: Mapello (note: Game AI is Real AI!)
- Example test problem: Physical TSP
- Results of open competitions
- Challenges and future directions

General Machine Intelligence: the ingredients
- Evolution
- Reinforcement learning
- Function approximation: neural nets, N-tuples, etc.
- Selective search / sample-based planning / Monte Carlo Tree Search

Darwin, Pavlov, and Skinner

Conventional Game Tree Search
- Minimax with alpha-beta pruning, transposition tables
- Works well when:
  - a good heuristic value function is known
  - the branching factor is modest
- E.g. Chess: Deep Blue, Rybka; super-human on a smartphone!
- The tree grows exponentially with search depth

Go
- Much tougher for computers
- High branching factor
- No good heuristic value function

MCTS to the rescue!

"Although progress has been steady, it will take many decades of research and development before world-championship-calibre Go programs exist." (Jonathan Schaeffer, 2001)

Monte Carlo Tree Search (MCTS)
Upper Confidence bounds for Trees (UCT)

Further reading:
- Browne et al., "A Survey of Monte Carlo Tree Search Methods", IEEE Transactions on Computational Intelligence and AI in Games, 2012

Attractive Features
- Anytime
- Scalable
- Tackles complex games and planning problems better than before
- May be logarithmically better with increased CPU
- No need for a heuristic function (though usually better with one)
Next we'll look at general MCTS, and UCT in particular.

MCTS: the main idea
- Tree policy: choose which node to expand (not necessarily a leaf of the tree)
- Default (simulation) policy: random playout until the end of the game

MCTS Algorithm
Decompose into 6 parts:
1. MCTS main algorithm
2. Tree policy
3. Expand
4. Best child (UCT formula)
5. Default policy
6. Back-propagate
We'll run through these, then show demos.

MCTS Main Algorithm
- BestChild simply picks the best child node of the root according to some criterion, e.g. best mean value
- In our pseudo-code BestChild is called from TreePolicy and from MctsSearch, but different versions can be used
- E.g. the final selection can be the max-value child or the most frequently visited one
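A minimal sketch of this structure in Python may help fix ideas; the Node fields and the function names (mcts_search, tree_policy, default_policy, backup, best_child, and a State with legal_actions()) are illustrative assumptions, not the talk's actual code. The pieces named on the following slides fill in the remaining parts.

    import math, random, time

    class Node:
        def __init__(self, state, parent=None):
            self.state = state                          # game state at this node
            self.parent = parent
            self.children = []
            self.untried = list(state.legal_actions())  # actions not yet expanded
            self.visits = 0
            self.value = 0.0                            # sum of rollout rewards

    def mcts_search(root_state, budget_secs=1.0):
        root = Node(root_state)
        deadline = time.time() + budget_secs    # anytime: stop whenever the budget expires
        while time.time() < deadline:
            v = tree_policy(root)               # select a node and expand it
            reward = default_policy(v.state)    # random rollout to a terminal state
            backup(v, reward)                   # propagate the result back to the root
        return best_child(root, c=0)            # final selection: pure exploitation

Calling best_child with c = 0 at the end corresponds to picking the max-value child; returning the most frequently visited child is the common alternative mentioned above.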

TreePolicy
- Note that the node selected for expansion does not need to be a leaf of the tree
- But it must have at least one untried action
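Continuing the sketch above, the tree policy descends through fully expanded nodes via best_child and stops at the first node that still has an untried action:

    def tree_policy(node):
        # Descend until we find a node with an untried action, or hit a terminal state.
        # The node chosen for expansion need not be a leaf of the tree.
        while not node.state.is_terminal():
            if node.untried:
                return expand(node)
            node = best_child(node, c=1.0)      # in-tree selection uses UCT
        return node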

Expand
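Expand conventionally takes one untried action from the selected node and adds the resulting child to the tree; in the same sketch:

    def expand(node):
        action = node.untried.pop()             # pick one untried action
        child = Node(node.state.next_state(action), parent=node)
        node.children.append(child)             # add the new node to the tree
        return child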

Best Child (UCT)
- This is the standard UCT equation, used in the tree:
      UCT(v') = Q(v') / N(v') + c * sqrt(2 ln N(v) / N(v'))
  where Q is total reward, N is visit count, v is the parent and v' a child
- Higher values of c lead to more exploration
- Other terms can be added, and usually are; more on this later
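In the same sketch, best_child scores each child with this formula; c is the exploration constant:

    def best_child(node, c):
        # Mean value plus exploration bonus; higher c explores more, c = 0 exploits only.
        def uct(child):
            return (child.value / child.visits
                    + c * math.sqrt(2 * math.log(node.visits) / child.visits))
        return max(node.children, key=uct)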

DefaultPolicy
- Each time a new node is added to the tree, the default policy randomly rolls out from the current state until a terminal state of the game is reached
- The standard is to do this uniformly at random
- But better performance may be obtained by biasing the rollout with knowledge
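A sketch of the uniform-random version; biasing the random.choice with domain knowledge is where playout heuristics would plug in:

    def default_policy(state):
        # Roll out uniformly at random until the game ends, then score the outcome.
        while not state.is_terminal():
            state = state.next_state(random.choice(state.legal_actions()))
        return state.reward()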

Backup
- Note that v is the new node added to the tree by the tree policy
- Back up the values from the added node up the tree to the root
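Completing the sketch; note that in a two-player game the reward would be negated at alternating levels, which is omitted here for brevity:

    def backup(node, reward):
        # node is the child just added by the tree policy; walk back up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent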

MCTS Builds Asymmetric Trees (demo)

All Moves As First (AMAF), Rapid Action Value Estimates (RAVE)
- Adds an additional term to the UCT equation
- Treats actions/moves the same independently of where they occur in the move sequence
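One common formulation of such a term (Gelly and Silver's RAVE) blends the node's UCT value with its AMAF value, with the AMAF weight decaying as real visits accumulate, along these lines:

    score(v') = (1 - beta) * Q(v')/N(v') + beta * Q_amaf(v')/N_amaf(v')
    beta      = sqrt(k / (3 * N(v') + k))    # k: tunable equivalence parameter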

Using it for a new problem:
- Implement the State interface
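A hypothetical Python rendering of such an interface, matching the methods assumed by the sketches above (the talk's actual interface may differ):

    class State:
        """What a game must provide to plug into the MCTS sketches above."""
        def legal_actions(self):       # actions available in this state
            raise NotImplementedError
        def next_state(self, action):  # new state after applying an action (no mutation)
            raise NotImplementedError
        def is_terminal(self):         # has the game ended?
            raise NotImplementedError
        def reward(self):              # terminal score from the searching player's view
            raise NotImplementedError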

Example Application: Mapello

Othello
- Each move you must pincer one or more opponent counters between the counter you place and an existing counter of your own colour
- Pincered counters are flipped to your own colour
- The winner is the player with the most pieces at the end
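As a concrete illustration of the pincer rule, here is a minimal sketch that collects the counters flipped in one direction, representing the board as a dict from (row, col) to colour; the names are illustrative, not Mapello's actual code:

    def flips(board, row, col, dr, dc, me, opp):
        # Walk from the placed counter in direction (dr, dc), collecting a run of
        # opponent counters; they are pincered only if our own colour ends the run.
        run = []
        r, c = row + dr, col + dc
        while board.get((r, c)) == opp:
            run.append((r, c))
            r, c = r + dr, c + dc
        return run if run and board.get((r, c)) == me else []

A move is legal if flips() is non-empty for at least one of the eight directions, and the union of those runs is recoloured.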

Basics of Good Game Design
- Simple rules
- Balance
- Sense of drama
- The outcome should not be obvious

Othello example: white leads, -58
(from http://radagast.se/othello/Help/strategy.html)

Black wins with a score of 16

Mapello
- Take the counter-flipping drama of Othello and apply it to novel situations:
  - obstacles
  - power-ups (e.g. triple square score)
  - large maps with power-plays, e.g. line fill
  - novel games
- Allow users to design maps that they are expert in: the map design is part of the game
- Research bonus: a large set of games to experiment with

Example Initial Maps

Or how about this?

Need Rapidly Smart AI
- Give players a challenging game, even when the game map can be new each time
- Obvious, easy-to-apply approaches:
  - TD learning
  - Monte Carlo Tree Search (MCTS)
  - combinations of these, e.g. Silver et al., ICML 2008; Robles et al., CIG 2011
- MCTS (see Browne et al., TCIAIG 2012):
  - simple algorithm
  - anytime
  - no need for a heuristic value function
  - exploration-exploitation balance
  - works well across a range of problems

Demo
- TDL learns reasonable weights rapidly
- How well will this play at 1 ply versus limited-rollout MCTS?

For Strong Play, Combine MCTS, TDL, and N-Tuples

Where to play / buy
- Coming to Android (November 2012)
- Nestorgames (http://www.nestorgames.com)

MCTS in Real-Time Games: PTSP
- Hard to get long-term planning without good heuristics

Optimal TSP order != PTSP order

MCTS: Challenges and Future Directions
- Better handling of problems with continuous action spaces (some work already done on this)
- Better understanding of how to handle real-time problems
- Use of approximations and macro-actions
- Stochastic and partially observable problems / games of incomplete and imperfect information
- Hybridisation: with evolution, with other tree search algorithms

Conclusions
- MCTS: a major new approach to AI
- Works well across a range of problems
- Good performance even with vanilla UCT
- Best performance requires tuning and heuristics; sometimes the UCT formula is modified or discarded
- Can be used in conjunction with RL (self-tuning) and with evolution (e.g. evolving macro-actions)

Further reading and links:
- http://ptsp-game.net/
- http://www.pacman-vs-ghosts.net/

