
chapter06-Game playing-Russel.pdf

Date post: 03-Apr-2018
Upload: rlfacanha

Transcript
  • 7/28/2019 chapter06-Game playing-Russel.pdf

    Game playing

    Chapter 6

    Outline

    Games

    Perfect play
       minimax decisions
       α-β pruning

    Resource limits and approximate evaluation

    Games of chance

    Games of imperfect information

    Games vs. search problems

    Unpredictable opponent ⇒ solution is a strategy specifying a move for every possible opponent reply

    Time limits ⇒ unlikely to find goal, must approximate

    Plan of attack:

       Computer considers possible lines of play (Babbage, 1846)

       Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)

       Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948; Shannon, 1950)

       First chess program (Turing, 1951)

       Machine learning to improve evaluation accuracy (Samuel, 1952-57)

       Pruning to allow deeper search (McCarthy, 1956)

    Types of games

                              deterministic                   chance

    perfect information       chess, checkers, go, othello    backgammon, monopoly

    imperfect information     battleships, blind tictactoe    bridge, poker, scrabble, nuclear war

    Game tree (2-player, deterministic, turns)

    [Figure: partial game tree for tic-tac-toe. Levels alternate between MAX (X) and MIN (O) down to TERMINAL states, which are labelled with Utility values -1, 0, +1.]

    Minimax

    Perfect play for deterministic, perfect-information games

    Idea: choose move to position with highest minimax value = best achievable payoff against best play

    E.g., 2-ply game:

    [Figure: 2-ply game tree. MAX chooses among A1, A2, A3. MIN replies with A11, A12, A13 (leaves 3, 12, 8), A21, A22, A23 (leaves 2, 4, 6), or A31, A32, A33 (leaves 14, 5, 2). The MIN nodes have values 3, 2, 2 and the root has value 3.]

    Minimax algorithm

    function Minimax-Decision(state) returns an action
       inputs: state, current state in game
       return the a in Actions(state) maximizing Min-Value(Result(a, state))

    function Max-Value(state) returns a utility value
       if Terminal-Test(state) then return Utility(state)
       v ← -∞
       for a, s in Successors(state) do v ← Max(v, Min-Value(s))
       return v

    function Min-Value(state) returns a utility value
       if Terminal-Test(state) then return Utility(state)
       v ← ∞
       for a, s in Successors(state) do v ← Min(v, Max-Value(s))
       return v
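The pseudocode above can be sketched in Python on the 2-ply example. This is a minimal illustration, not the slides' own code: the nested-list encoding of the tree and the function names are assumptions, and the leaf order (3, 12, 8 / 2, 4, 6 / 14, 5, 2) is read from the garbled figure.

```python
from math import inf

# Assumed encoding: a leaf is a number (its utility for MAX),
# an internal node is a list of child subtrees.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

def is_terminal(node):
    return not isinstance(node, list)

def max_value(node):
    """Value of a node where MAX is to move."""
    if is_terminal(node):
        return node
    v = -inf
    for child in node:
        v = max(v, min_value(child))
    return v

def min_value(node):
    """Value of a node where MIN is to move."""
    if is_terminal(node):
        return node
    v = inf
    for child in node:
        v = min(v, max_value(child))
    return v

def minimax_decision(node):
    """Index of the root move whose Min-Value is highest."""
    return max(range(len(node)), key=lambda i: min_value(node[i]))

print([min_value(child) for child in TREE])  # [3, 2, 2]
print(minimax_decision(TREE))                # 0, i.e. move A1
```

Moves are identified by index here, so index 0 corresponds to A1 in the figure.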

    Properties of minimax

    Complete?? Yes, if tree is finite (chess has specific rules for this). NB a finite strategy can exist even in an infinite tree!

    Optimal?? Yes, against an optimal opponent. Otherwise??

    Time complexity?? $O(b^m)$

    Space complexity?? $O(bm)$ (depth-first exploration)

    For chess, $b \approx 35$, $m \approx 100$ for reasonable games
    ⇒ exact solution completely infeasible

    But do we need to explore every path?

    α-β pruning example

    [Figure: α-β pruning traced step by step on the 2-ply tree. The first MIN node sees leaves 3, 12, 8 and gets value 3. The second MIN node sees leaf 2 and is cut off; its two remaining leaves are pruned (marked X). The third MIN node sees 14, 5, 2, with its bound dropping from 14 to 5 to its final value 2. The root MAX value is 3.]

    Why is it called α-β?

    [Figure: a path from the root alternating MAX and MIN nodes, ending at a MIN node with value V.]

    α is the best value (to MAX) found so far off the current path

    If V is worse than α, MAX will avoid it ⇒ prune that branch

    Define β similarly for MIN

    The α-β algorithm

    function Alpha-Beta-Decision(state) returns an action
       return the a in Actions(state) maximizing Min-Value(Result(a, state))

    function Max-Value(state, α, β) returns a utility value
       inputs: state, current state in game
               α, the value of the best alternative for MAX along the path to state
               β, the value of the best alternative for MIN along the path to state
       if Terminal-Test(state) then return Utility(state)
       v ← -∞
       for a, s in Successors(state) do
          v ← Max(v, Min-Value(s, α, β))
          if v ≥ β then return v
          α ← Max(α, v)
       return v

    function Min-Value(state, α, β) returns a utility value
       same as Max-Value but with roles of α, β reversed
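A possible Python rendering of the α-β pseudocode, run on the same 2-ply tree. The single-function form with a `maximizing` flag, the nested-list tree encoding, and the `visited` list (used only to show which leaves get evaluated) are assumptions made for this sketch.

```python
from math import inf

# Assumed encoding: a leaf is a number, an internal node is a list of children.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

def alpha_beta(node, alpha=-inf, beta=inf, maximizing=True, visited=None):
    """Return the minimax value of node, skipping branches that cannot matter.

    visited, if given, collects the leaves that were actually evaluated.
    """
    if not isinstance(node, list):            # terminal test
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        v = -inf
        for child in node:
            v = max(v, alpha_beta(child, alpha, beta, False, visited))
            if v >= beta:                     # MIN higher up will avoid this node
                return v
            alpha = max(alpha, v)
        return v
    v = inf
    for child in node:
        v = min(v, alpha_beta(child, alpha, beta, True, visited))
        if v <= alpha:                        # MAX higher up already has better
            return v
        beta = min(beta, v)
    return v

seen = []
value = alpha_beta(TREE, visited=seen)
print(value)  # 3
print(seen)   # [3, 12, 8, 2, 14, 5, 2]
```

Leaves 4 and 6 under the second MIN node are never evaluated, while the root value is the same as plain minimax returns.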

    Properties of α-β

    Pruning does not affect final result

    Good move ordering improves effectiveness of pruning

    With perfect ordering, time complexity = $O(b^{m/2})$
    ⇒ doubles solvable depth

    A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)

    Unfortunately, $35^{50}$ is still impossible!

    Resource limits

    Standard approach:

    Use Cutoff-Test instead of Terminal-Test
       e.g., depth limit (perhaps add quiescence search)

    Use Eval instead of Utility
       i.e., evaluation function that estimates desirability of position

    Suppose we have 100 seconds, explore $10^4$ nodes/second
    ⇒ $10^6$ nodes per move $\approx 35^{8/2}$
    ⇒ α-β reaches depth 8 ⇒ pretty good chess program
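The arithmetic behind the depth-8 estimate can be checked directly. The sketch below assumes the perfect-ordering cost of roughly b^(d/2) nodes for α-β and the chess branching factor of 35 quoted earlier; the variable names are illustrative.

```python
import math

seconds_per_move = 100
nodes_per_second = 10**4
branching = 35                      # b for chess, as quoted earlier

budget = seconds_per_move * nodes_per_second    # nodes available per move

# With perfect ordering, alpha-beta examines roughly b**(d/2) nodes,
# so the reachable depth d solves b**(d/2) = budget.
depth = 2 * math.log(budget) / math.log(branching)

print(budget)            # 1000000
print(branching ** 4)    # 1500625, i.e. 35**(8/2)
print(round(depth, 1))   # 7.8
```

So a budget of a million nodes is slightly short of 35^4, which is why the slide rounds to depth 8.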

    Evaluation functions

    [Figure: two chess positions, captioned "Black to move, White slightly better" and "White to move, Black winning".]

    For chess, typically linear weighted sum of features

    $\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$

    e.g., $w_1 = 9$ with
    $f_1(s)$ = (number of white queens) - (number of black queens), etc.
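A linear weighted evaluation of this form might look as follows. Only the queen weight of 9 comes from the slide; the other weights are the conventional material values and, like the dictionary-based position encoding, are assumptions for illustration.

```python
# Assumed weights: only "queen": 9 is taken from the slide.
WEIGHTS = {"queen": 9, "rook": 5, "bishop": 3, "knight": 3, "pawn": 1}

def evaluate(white_counts, black_counts):
    """Eval(s) = sum of w_i * f_i(s), with f_i = white count - black count."""
    return sum(
        w * (white_counts.get(piece, 0) - black_counts.get(piece, 0))
        for piece, w in WEIGHTS.items()
    )

# Hypothetical position: White is a queen up, Black has an extra knight and pawn.
white = {"queen": 1, "rook": 2, "pawn": 6}
black = {"queen": 0, "rook": 2, "knight": 1, "pawn": 7}
print(evaluate(white, black))  # 5
```

Real evaluation functions add positional features (king safety, pawn structure, and so on); the point here is only the weighted-sum shape.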

    Digression: Exact values don't matter

    [Figure: two MAX-over-MIN trees. Left: leaves 1, 2 and 2, 4 give MIN values 1 and 2. Right: leaves 1, 20 and 20, 400 give MIN values 1 and 20. MAX makes the same choice in both.]

    Behaviour is preserved under any monotonic transformation of Eval

    Only the order matters:
    payoff in deterministic games acts as an ordinal utility function
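The invariance can be checked on a tree with the shape of the figure (leaf values 1, 2 and 2, 4 are read from the garbled figure, and the helper names are assumptions). Any strictly increasing transformation of the leaves leaves MAX's choice unchanged.

```python
def best_move(tree):
    """Index of the MAX move whose MIN reply has the highest value."""
    return max(range(len(tree)), key=lambda i: min(tree[i]))

def transform(tree, f):
    """Apply f to every leaf of a 2-ply tree."""
    return [[f(x) for x in branch] for branch in tree]

tree = [[1, 2], [2, 4]]
rescaled = transform(tree, {1: 1, 2: 20, 4: 400}.get)   # mapping as in the figure
cubed = transform(tree, lambda x: x ** 3 + 7)           # another increasing map

print(best_move(tree), best_move(rescaled), best_move(cubed))  # 1 1 1
```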

    Deterministic games in practice

    Checkers: Chinook ended 40-year reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions.

    Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.

    Othello: human champions refuse to compete against computers, who are too good.

    Go: human champions refuse to compete against computers, who are too bad. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

    Nondeterministic games: backgammon

    [Figure: backgammon board with points numbered 1 to 24, plus positions 0 and 25.]

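    Nondeterministic games in general

    In nondeterministic games, chance introduced by dice, card-shuffling

    Simplified example with coin-flipping:

    [Figure: MAX at the root chooses between two CHANCE nodes, each with two outcomes of probability 0.5 leading to MIN nodes. Leaves 2, 4 / 7, 4 / 6, 0 / 5, -2 give MIN values 2, 4, 0, -2, CHANCE values 3 and -1, and root value 3.]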

    Algorithm for nondeterministic games

    Expectiminimax gives perfect play

    Just like Minimax, except we must also handle chance nodes:

    ...
    if state is a Max node then
       return the highest ExpectiMinimax-Value of Successors(state)
    if state is a Min node then
       return the lowest ExpectiMinimax-Value of Successors(state)
    if state is a chance node then
       return average of ExpectiMinimax-Value of Successors(state)
    ...

    Nondeterministic games in practice

    Dice rolls increase b: 21 possible rolls with 2 dice

    Backgammon ≈ 20 legal moves (can be 6,000 with 1-1 roll)

    depth 4 = $20 \times (21 \times 20)^3 \approx 1.5 \times 10^9$

    As depth increases, probability of reaching a given node shrinks
    ⇒ value of lookahead is diminished

    α-β pruning is much less effective

    TDGammon uses depth-2 search + very good Eval
    ≈ world-champion level
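The branching figures can be reproduced with a few lines of Python. The count of 21 treats rolls such as (2, 5) and (5, 2) as the same roll; evaluated exactly, the depth-4 product is about 1.48 × 10^9, the same order of magnitude as the figure on the slide.

```python
from itertools import combinations_with_replacement

# Distinct unordered rolls of two six-sided dice.
rolls = len(list(combinations_with_replacement(range(1, 7), 2)))
moves = 20                          # typical number of legal moves, per the slide

nodes_depth_4 = moves * (rolls * moves) ** 3

print(rolls)           # 21
print(nodes_depth_4)   # 1481760000
```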

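    Digression: Exact values DO matter

    [Figure: two trees with MAX at the root, two DICE (chance) nodes with outcome probabilities .9 and .1, and MIN nodes below. Left: leaves 2, 2 / 3, 3 / 1, 1 / 4, 4 give MIN values 2, 3, 1, 4 and chance values 2.1 and 1.3. Right: leaves 20, 20 / 30, 30 / 1, 1 / 400, 400 give MIN values 20, 30, 1, 400 and chance values 21 and 40.9. MAX's preferred move differs between the two trees.]

    Behaviour is preserved only by positive linear transformation of Eval

    Hence Eval should be proportional to the expected payoff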
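The effect can be reproduced numerically. In this sketch the tree encoding and helper names are assumptions, and the leaf values follow the figure as read from the transcript: an order-preserving but nonlinear rescaling flips MAX's choice, while a positive linear rescaling does not.

```python
def expected_min(branch):
    """Value of a chance node: sum of p * (MIN over that outcome's leaves)."""
    return sum(p * min(leaves) for p, leaves in branch)

def best_move(tree):
    """Index of the MAX move with the highest expected value."""
    return max(range(len(tree)), key=lambda i: expected_min(tree[i]))

def rescale(tree, f):
    """Apply f to every leaf, keeping probabilities unchanged."""
    return [[(p, [f(x) for x in leaves]) for p, leaves in branch]
            for branch in tree]

left = [
    [(0.9, [2, 2]), (0.1, [3, 3])],     # expected value 2.1
    [(0.9, [1, 1]), (0.1, [4, 4])],     # expected value 1.3
]
right = rescale(left, {1: 1, 2: 20, 3: 30, 4: 400}.get)   # monotonic, not linear
linear = rescale(left, lambda x: 10 * x + 5)              # positive linear

print(best_move(left), best_move(right), best_move(linear))  # 0 1 0
```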

    Games of imperfect information

    E.g., card games, where opponent's initial cards are unknown

    Typically we can calculate a probability for each possible deal

    Seems just like having one big dice roll at the beginning of the game

    Idea: compute the minimax value of each action in each deal, then choose the action with highest expected value over all deals

    Special case: if an action is optimal for all deals, it's optimal.

    GIB, current best bridge program, approximates this idea by
       1) generating 100 deals consistent with bidding information
       2) picking the action that wins most tricks on average
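The "average over deals" idea can be sketched as below. This is not GIB's implementation: the function `choose_action`, the `minimax_value` callback, the action and deal names, and the trick counts are all hypothetical, and the deals are enumerated with known probabilities rather than sampled.

```python
def choose_action(actions, deals, minimax_value):
    """Pick the action with the highest expected minimax value.

    deals is a list of (probability, deal) pairs; minimax_value(action, deal)
    is a caller-supplied function that solves one fully known deal.
    """
    def expected(action):
        return sum(p * minimax_value(action, deal) for p, deal in deals)
    return max(actions, key=expected)

# Toy stand-in for a solver: a lookup table of tricks won (made-up numbers).
TRICKS = {
    ("lead_ace", "deal_1"): 3, ("lead_ace", "deal_2"): 3,
    ("lead_low", "deal_1"): 4, ("lead_low", "deal_2"): 1,
}
deals = [(0.5, "deal_1"), (0.5, "deal_2")]

best = choose_action(
    ["lead_ace", "lead_low"],
    deals,
    minimax_value=lambda action, deal: TRICKS[(action, deal)],
)
print(best)  # lead_ace
```

Here `lead_ace` averages 3 tricks against 2.5 for `lead_low`, even though `lead_low` is better in one of the two deals.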

    Example

    Four-card bridge/whist/hearts hand, Max to play first

    [Figure: card-play tree for the four-card hand. The card ranks and suits are not recoverable from the transcript.]

    Commonsense example

    Road A leads to a small heap of gold pieces
    Road B leads to a fork:
       take the left fork and you'll find a mound of jewels;
       take the right fork and you'll be run over by a bus.

    Road A leads to a small heap of gold pieces
    Road B leads to a fork:
       take the left fork and you'll be run over by a bus;
       take the right fork and you'll find a mound of jewels.

    Road A leads to a small heap of gold pieces
    Road B leads to a fork:
       guess correctly and you'll find a mound of jewels;
       guess incorrectly and you'll be run over by a bus.

    Proper analysis

    * Intuition that the value of an action is the average of its values in all actual states is WRONG

    With partial observability, value of an action depends on the information state or belief state the agent is in

    Can generate and search a tree of information states

    Leads to rational behaviors such as
       Acting to obtain information
       Signalling to one's partner
       Acting randomly to minimize information disclosure
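The road example makes the gap concrete. In the sketch below the payoffs (gold 1, jewels 10, bus -1000) and the equal probabilities of the two worlds are assumptions chosen for illustration; the two computations differ only in whether the maximum over forks is taken inside or outside the expectation.

```python
GOLD, JEWELS, BUS = 1, 10, -1000          # hypothetical payoffs
WORLDS = {"jewels_left": 0.5, "jewels_right": 0.5}
FORKS = ("left", "right")

def fork_payoff(world, fork):
    return JEWELS if world == f"jewels_{fork}" else BUS

# Averaging over actual states: solve each world as if it were known,
# then average the resulting values of taking road B.
road_b_averaged = sum(
    p * max(fork_payoff(w, f) for f in FORKS) for w, p in WORLDS.items()
)

# Belief-state analysis: the fork is chosen once, without knowing the world.
road_b_belief = max(
    sum(p * fork_payoff(w, f) for w, p in WORLDS.items()) for f in FORKS
)

print(road_b_averaged)  # 10.0
print(road_b_belief)    # -495.0
```

Averaging per-world values rates road B above the gold on road A, while the belief-state value shows that guessing at the fork is far worse than taking road A.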

    Summary

    Games are fun to work on! (and dangerous)

    They illustrate several important points about AI
       perfection is unattainable ⇒ must approximate
       good idea to think about what to think about
       uncertainty constrains the assignment of values to states
       optimal decisions depend on information state, not real state

    Games are to AI as grand prix racing is to automobile design

