
CS 188: Artificial Intelligence Spring 2006 Lecture 23: Games 4/18/2006 Dan Klein – UC Berkeley.

Description:

Project 2 contest results (Naïve Bayes classifiers):
Runners-up: Chris Crutchfield and Wei Tu (83%) - features: number of curves in the image; ratio of height to width
Runners-up: Danny Guan and Daniel Low (83%) - features: percentage of active pixels; maximum contiguous active pixels per row
Winners: Taylor Berg-Kirkpatrick and Fenna Krienen (84%) - features: color changes across rows and columns
Transcript
Page 1

CS 188: Artificial Intelligence, Spring 2006

Lecture 23: Games, 4/18/2006

Dan Klein – UC Berkeley

Page 2

Game Playing in Practice

Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. An exact solution is imminent.

Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second and used very sophisticated evaluation, plus undisclosed methods for extending some lines of search up to 40 ply.

Othello: human champions refuse to compete against computers, which are too good.

Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

Page 3

Game Playing

Axes:
Deterministic or not
Number of players
Perfect information or not

Want algorithms for calculating a strategy (policy) which recommends a move in each state

Page 4

Deterministic Single Player?

Deterministic, single player, perfect information:
Know the rules
Know what moves will do
Have some utility function over outcomes
E.g. Freecell, 8-Puzzle, Rubik's Cube

… it's (basically) just search!

Slight reinterpretation:
Calculate the best utility from each node
Each node is a max over its children
Note that goal values are on the goal, not path sums as before

(Example search tree with terminal utilities 8, 2, 5, 6.)
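A tiny sketch of "each node is a max over children" (hypothetical code, assuming a node object with is_terminal(), utility(), and children() methods; not from the lecture):

def best_utility(node):
    # Terminal nodes carry their own utility (the value sits on the goal itself).
    if node.is_terminal():
        return node.utility()
    # Internal nodes: the agent picks the best child, so take the max.
    return max(best_utility(child) for child in node.children())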

Page 5

Stochastic Single Player

What if we don't know what the result of an action will be? E.g. solitaire, minesweeper, trying to drive home.

… just an MDP!

Can also do expectimax search:
Chance nodes, like action nodes, except the environment controls the action chosen
Calculate utility for each node
Max nodes as in search
Chance nodes take expectations of their children

(Example expectimax tree with terminal utilities 8, 2, 5, 6.)
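A corresponding expectimax sketch, under the same hypothetical node interface plus is_chance() and outcomes() returning (child, probability) pairs:

def expectimax_value(node):
    if node.is_terminal():
        return node.utility()
    if node.is_chance():
        # The environment picks the child: take the expectation over outcomes.
        return sum(p * expectimax_value(child) for child, p in node.outcomes())
    # The agent picks the child: take the max, as in plain search.
    return max(expectimax_value(child) for child in node.children())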

Page 6

Deterministic Two Player (Turns)

E.g. tic-tac-toe

Minimax search:
Basically, a state-space search tree
Each layer, or ply, alternates players
Choose the move to the position with the highest minimax value = best achievable utility against best play

Zero-sum games:
One player maximizes the result
The other minimizes the result

(Example minimax tree with terminal utilities 8, 2, 5, 6.)
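A minimal minimax sketch for the zero-sum, two-player case (again using the hypothetical node interface above, not the lecture's code):

def minimax_value(node, maximizing):
    if node.is_terminal():
        return node.utility()
    values = [minimax_value(child, not maximizing) for child in node.children()]
    # The maximizer picks the best child, the minimizer the worst.
    return max(values) if maximizing else min(values)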

Page 7
Page 8

Minimax Example

Page 9

Minimax Search

Page 10

Minimax Properties

Optimal against a perfect player. Otherwise?

Time complexity? O(b^m)

Space complexity? O(bm)

For chess, b ≈ 35, m ≈ 100, so an exact solution (roughly 35^100 ≈ 10^154 nodes) is completely infeasible. But do we need to explore the whole tree?

Page 11

Multi-Player Games

Similar to minimax:
Utilities are now tuples
Each player maximizes their own entry at each node
Propagate (or back up) values from the children

(Example tree with utility tuples 1,2,6; 4,3,2; 6,1,2; 7,4,1; 5,1,1; 1,5,2; 7,7,1; 5,4,5.)
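A sketch of the multi-player backup (hypothetical interface; utility() is assumed to return a tuple with one entry per player):

def multiplayer_value(node, player, num_players):
    if node.is_terminal():
        return node.utility()                        # tuple of utilities
    next_player = (player + 1) % num_players
    child_values = [multiplayer_value(c, next_player, num_players)
                    for c in node.children()]
    # The player to move picks the child whose tuple is best in their own slot.
    return max(child_values, key=lambda tup: tup[player])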

Page 12

Games with Chance

E.g. backgammon: expectiminimax search!

The environment is an extra player that moves after each agent.

Chance nodes take expectations; otherwise it works like minimax.

Page 13

Games with Chance

Dice rolls increase b: 21 possible rolls with 2 dice. Backgammon has about 20 legal moves, so searching to depth 4 already means 20 × (21 × 20)^3 ≈ 1.2 × 10^9 nodes.

As depth increases, the probability of reaching a given node shrinks:
So the value of lookahead is diminished
So limiting depth is less damaging
But pruning is less possible…

TD-Gammon uses depth-2 search + a very good evaluation function + reinforcement learning: world-champion level play.

Page 14

Games with Hidden Information

Imperfect information: e.g. card games, where the opponent's initial cards are unknown.
Typically we can calculate a probability for each possible deal.
Seems just like having one big dice roll at the beginning of the game.

Idea: compute the minimax value of each action in each deal, then choose the action with the highest expected value over all deals.
Special case: if an action is optimal for all deals, it's optimal.
GIB, the current best bridge program, approximates this idea by
1) generating 100 deals consistent with the bidding information
2) picking the action that wins the most tricks on average

Drawback to this approach? It's broken! (Though useful in practice.)
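A rough sketch of this GIB-style approximation (the deal sampler and the per-deal minimax evaluator are hypothetical stand-ins, not GIB's actual code):

def best_action_by_averaging(actions, sample_deal, value_in_deal, num_deals=100):
    # Sample deals consistent with the bidding information.
    deals = [sample_deal() for _ in range(num_deals)]
    def average_value(action):
        # Average the (minimax) value of the action across all sampled deals.
        return sum(value_in_deal(action, deal) for deal in deals) / len(deals)
    return max(actions, key=average_value)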

Page 15

Averaging over Deals is Broken

Case 1: Road A leads to a small heap of gold pieces. Road B leads to a fork: take the left fork and you'll find a mound of jewels; take the right fork and you'll be run over by a bus.

Case 2: Road A leads to a small heap of gold pieces. Road B leads to a fork: take the left fork and you'll be run over by a bus; take the right fork and you'll find a mound of jewels.

Case 3: Road A leads to a small heap of gold pieces. Road B leads to a fork: guess correctly and you'll find a mound of jewels; guess incorrectly and you'll be run over by a bus.

In the first two cases Road B is clearly best, and averaging their values suggests taking Road B in the third case too. But in the third case you cannot tell which fork is safe, so Road A is actually the right choice: averaging over deals implicitly assumes you will learn the hidden information before you have to act.

Page 16

Efficient Search

Several options:

Pruning: avoid regions of the search tree which will never enter into (optimal) play

Limited depth: don't search very far into the future; approximate utility with a value function (familiar?)

Page 17

Next Class

More game playing:
Pruning
Limited depth search
Connection to reinforcement learning!

Page 18

α-β Pruning Example

Page 19
Page 20

Q-Learning

Model-free TD learning with Q-functions:
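The update equation on this slide was an image and did not survive transcription. For reference, a minimal sketch of the standard tabular Q-learning update it refers to (alpha is the learning rate, gamma the discount; the dictionary-based Q-table is a hypothetical representation):

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    # Sample-based target from the observed transition (s, a, r, s').
    sample = r + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions)
    # Move the old estimate of Q(s, a) toward the sample.
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (sample - Q.get((s, a), 0.0))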

Page 21

Function Approximation

Problem: too slow to learn each state's utility one by one.

Solution: what we learn about one state should generalize to similar states.
Very much like supervised learning.
If states are treated entirely independently, we can only learn on very small state spaces.

Page 22

Discretization

Can put states into buckets of various sizes.
E.g. all angles between 0 and 5 degrees can share the same Q estimate.
Buckets too fine: takes a long time to learn.
Buckets too coarse: learns suboptimal, often jerky control.

Real systems that use discretization usually require clever bucketing schemes:
Adaptive sizes
Tile coding
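A toy illustration of the bucketing idea (hypothetical code, using the 5-degree angle buckets from the example above):

def angle_bucket(angle_degrees, bucket_size=5):
    # All angles in the same bucket share one Q estimate,
    # e.g. every angle between 0 and 5 degrees maps to bucket 0.
    return int(angle_degrees // bucket_size)

# angle_bucket(0.3) == angle_bucket(4.9) == 0, so these states share a table entry.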

[DEMOS]

Page 23

Linear Value Functions

Another option: values are linear functions of features of states (or state-action pairs).

Good if you can describe states well using a few features (e.g. board evaluations for game playing).

Now we only have to learn a few weights rather than a value for each state.
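Concretely, the linear form being described is Q(s, a) ≈ w1·f1(s, a) + w2·f2(s, a) + … + wn·fn(s, a), where the f_i are hand-chosen features and the w_i are the only parameters that have to be learned (these symbols are placeholders, not names from the lecture).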

(Figure: example state values ranging from 0.60 to 0.95.)

Page 24

TD Updates for Linear Values

Can use TD learning with linear values (actually, it's just like the perceptron!).

Old Q-learning update:

Simply update the weights of the features in Q(s, a).
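The update equations on this slide were also images; here is a minimal sketch of the feature-based version it describes, assuming a hypothetical representation with weights and features as dictionaries keyed by feature name:

def q_value(weights, features):
    # Linear Q-value: weighted sum of feature values.
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def linear_td_update(weights, features, reward, next_q_values, alpha=0.5, gamma=0.9):
    # features: f_i(s, a) for the state-action pair just taken
    # next_q_values: Q(s', a') for each action a' available in the next state
    target = reward + gamma * max(next_q_values, default=0.0)
    error = target - q_value(weights, features)           # TD error
    for f, v in features.items():                         # perceptron-like weight update
        weights[f] = weights.get(f, 0.0) + alpha * error * v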

Page 25

Example: TD for Linear Qs

