Post on 21-Jun-2015
Making Better Decisions - Opponent Modelling
1
Monte Carlo in Poker (Recap)
• Yesterday we saw that Monte Carlo could be used to
estimate the expected reward of an action by evaluating the
delayed reward
• We do this by simulating or "rolling out" games to their end
state.
• Assess the amount we won or lost
2
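As a concrete illustration, the rollout idea can be sketched in a few lines of Python. The toy game, the `step`/`is_terminal`/`reward` callbacks, and all the numbers here are made up for illustration:

```python
import random

def rollout_value(state, step, is_terminal, reward, n_rollouts=1000):
    """Estimate the expected reward of `state` by simulating ("rolling
    out") games to their end state and averaging what we won or lost."""
    total = 0.0
    for _ in range(n_rollouts):
        s = state
        while not is_terminal(s):
            s = step(s)          # advance one move, chosen at random
        total += reward(s)       # amount won or lost at the end state
    return total / n_rollouts

# Toy game: start with 0 chips, win or lose 1 chip per step, game ends
# after 5 steps. The true expected reward is 0.
random.seed(42)
estimate = rollout_value(
    state=(0, 0),                                          # (chips, moves)
    step=lambda s: (s[0] + random.choice([-1, 1]), s[1] + 1),
    is_terminal=lambda s: s[1] >= 5,
    reward=lambda s: s[0],
    n_rollouts=10000)
print(round(estimate, 2))  # close to 0
```

The estimate converges on the true expected reward as the number of rollouts grows, which is exactly the property the game-tree walks below rely on.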
Game Tree and Monte Carlo
3
[Game tree diagram showing alternating player, Opponent, and Chance nodes]
Random Walks in the Game Tree
• When we walk the Game Tree at random, we pick nodes to
follow at random.
• We assume (for now) that this is an unbiased choice
• This means every choice has the same probability of being
chosen
4
Can We Do Better?
• Random walks are all well and good
• But a uniform distribution across action choices isn't
accurate
‣ Certain situations will make sensible players more likely to use
certain actions
• How can we bring this bias into play in the walk?
5
Classifying Opponents
• The way we do this is to work out what type of player
someone is.
• We observe them to get a better understanding of how they
operate.
• In Poker and other games, we can use all sorts of statistical
measures to quantify a player's type.
6
Action Prediction
• Once we know what kind of player someone is, we can flip
things on their head.
• We answered "what is the likelihood this player is of type X
given we have seen this type of play?"
• We can now answer "what is the likelihood this player will
take action Y given they are of type X?"
• Remember from Bayes' Theorem last week: these questions
are closely linked
7
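To make the link concrete, here is a tiny Bayes calculation in Python. The two player types and their raise probabilities are hypothetical numbers, not real Poker statistics:

```python
# Hypothetical numbers: two player types and how often each raises.
p_type = {"tight": 0.3, "loose": 0.7}             # prior over types
p_raise_given_type = {"tight": 0.1, "loose": 0.6}

# Bayes: P(type | raised) = P(raised | type) * P(type) / P(raised)
p_raised = sum(p_raise_given_type[t] * p_type[t] for t in p_type)
posterior = {t: p_raise_given_type[t] * p_type[t] / p_raised
             for t in p_type}

# Flipped around: given our belief about the type, predict the next action.
p_next_raise = sum(p_raise_given_type[t] * posterior[t] for t in posterior)
print(posterior)                 # the "loose" type dominates
print(round(p_next_raise, 3))    # 0.567
```

Observing one raise shifts our belief heavily towards the "loose" type, and that updated belief in turn changes our prediction of the next action.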
Simple (Human) Classification
• Pro Poker players try to quantify their opponents into one of
several classes based on 3 measures
‣ Voluntarily Put in Pot (VPiP)
‣ Won at Showdown (WSD)
‣ Pre-flop Raise (PFR)
8
Player Stereotypes
• Players can be
‣ Tight / Loose (how likely they are to play hands)
‣ Passive / Aggressive pre-flop
‣ Passive / Aggressive post-flop
9
Utilising Stereotypes
• If we can classify players we can use this against them
• For instance, we might discover that passive players can be
chased off by aggressive play
• Or we understand that when a super-conservative player
decides to raise, we need to be careful
• We can build heuristic rule bases around this like we saw
before.
• Or we can be much smarter
10
Better Classifications
• Humans are getting by on 3 dimensions
• But Poker has waaaay more statistics available than this
• We can make a lot of use of this extra data.
11
Poker Tracker
• Poker Tracker is a stats package specifically for Poker
• Analyses play at online casinos
• Real-time access to stats about opponents
• Allows players to review hands later
12
Stats in Poker
• As mentioned a few slides ago, Poker has many statistics
• Poker Tracker keeps tabs on around 150 metrics
• Some of these are somewhat similar, some relate more to
the games than the players
13
Problem of Dimensionality
• The problem now is that we have too much information!
• Trying to learn on cluttered data can be problematic,
assuming it works at all.
14
Dimensionality Reduction
• Somehow we have to reduce the number of dimensions that
our data points are using.
• In many ways, getting the right data into a learning algorithm
is the biggest challenge.
• As much art as it is engineering.
• Two options
‣ Feature selection
‣ Feature extraction
15
Selection vs Extraction
• In Selection, you pick the dimensions you believe to be most
relevant
‣ The human players did this to get their 3 dimensional
representations
• In Extraction, you come up with new dimensions that can
represent your datapoints
16
Principal Components Analysis
• PCA is a common strategy for this.
• Recasts the dimensions of the datapoint into another set of
"basis vectors".
• Smushes together dimensions that have a strong correlation
‣ Some stats measures are looking at fundamentally the same thing, in
different ways
‣ E.g. Various raise frequency metrics might be treated as a single
“aggression” dimension after PCA
17
Principal Components Analysis
• This was going to be a worked example.
• Honestly, that's way too painful.
• For N observations in M dimensions, X is an M × N matrix
where each column is an observation.
• Calculate the mean and std. dev. for each row in the
matrix (each dimension)
18
Principal Components Analysis
• Calculate the covariance matrix, the amount that
the dimensions vary with respect to each other.
• Calculate the eigenvectors and eigenvalues of the
covariance matrix
‣ The eigenvectors are the new basis vectors of the
reduced-dimension datapoints
‣ The eigenvalues represent how significant the
eigenvector is. Large value = significant
19
Principal Components Analysis
• Pick the most significant K of the eigenvectors.
• Project the original datapoint in X onto the new
basis vectors.
20
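The steps above (centre the data, compute the covariance matrix, take its eigen-decomposition, project onto the K most significant eigenvectors) can be sketched with NumPy. The dataset here is synthetic, with two strongly correlated "raise frequency" style dimensions and one unrelated one:

```python
import numpy as np

# Synthetic data: 200 observations in 3 dimensions.
rng = np.random.default_rng(0)
N = 200
aggression = rng.normal(size=N)
X = np.vstack([aggression + 0.05 * rng.normal(size=N),
               2 * aggression + 0.05 * rng.normal(size=N),
               rng.normal(size=N)])        # M x N, columns = observations

Xc = X - X.mean(axis=1, keepdims=True)     # centre each dimension (row)
C = np.cov(Xc)                             # M x M covariance matrix
vals, vecs = np.linalg.eigh(C)             # eigenvalues in ascending order
order = np.argsort(vals)[::-1]             # most significant first
K = 2
basis = vecs[:, order[:K]]                 # top-K eigenvectors = new basis
Y = basis.T @ Xc                           # project: K x N datapoints
print(Y.shape)                             # (2, 200)
```

The two correlated dimensions get "smushed" into one dominant component, so almost all of the variance survives the reduction from 3 dimensions to 2.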
Principal Components Analysis
• Honestly, if anyone ever asks you to do this
‣ Get a textbook
‣ Use Matlab
‣ Be really careful because it’s kind of complicated
• It is possible to do it by hand.
‣ I can’t anymore...
21
Principal Components Analysis
• Assuming that you finish the calculations without
mucking up.
‣ Or, you find something to work it out for you (Matlab
functions for this exist)
• What you have now is a new datapoint that carries
approximately the same information.
• Recast into fewer dimensions.
‣ Note that the new dimensions will not make sense
individually
PCA in Action
23
Clustering Algorithms
• Having performed PCA, we have a much more manageable
set of datapoints, and we’ve eliminated extraneous
dimensions
• Now we need to group them together.
• Clustering algorithms are one approach: they try to find a set
of "clusters" of points that sit close together.
24
Clustering
25
[Scatter plot of datapoints falling into well-separated groups. Blue Peter style example - real data is rarely so neat]
Clustering
• k-means is one of the most popular algorithms
‣ Others exist: fuzzy c-means, FLAME clustering, and more
• Pick a value for k
‣ You can play around a bit to find good values or use
some tricks
‣ Accepted "rule of thumb": k ≈ √(n/2) for n datapoints
26
K-Means Algorithm
• Typically, we run the k-means algorithm as an
“iterative refinement” process
‣ Guess at some initial values, keep running the process
round and round until it stabilises
• Randomly assign datapoints to one of the k clusters
• Step 1 - Calculate centroids of the clusters
• Step 2 - Update assignment based on new centroids
• Rinse and repeat 1 and 2 until convergence.
27
K-Means Algorithm
• Calculating centroids of clusters:
m_i(t+1) = (1 / |S_i(t)|) * Σ x_j, summing over all x_j in S_i(t)
‣ x_j denotes the datapoints being sampled
‣ m_i(t+1) denotes the mean of cluster i at iteration t+1
‣ S_i(t) denotes the set of datapoints assigned to cluster i at
iteration t
• Effectively, the average of the datapoints
28
K-Means Algorithm
• Assigning datapoints to clusters:
S_i(t) = { x_j : ||x_j - m_i(t)|| ≤ ||x_j - m_l(t)|| for all l = 1..k }
• The set of points S_i is all datapoints for which the
centroid of cluster i (m_i) is the nearest centroid.
29
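The iterative refinement described above can be sketched in Python with NumPy. This is a bare-bones version that, among other things, does not handle clusters becoming empty:

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Two-step k-means refinement: compute centroids, reassign,
    repeat until the assignment stops changing."""
    rng = np.random.default_rng(seed)
    assign = rng.integers(0, k, size=len(points))  # random initial assignment
    for _ in range(iters):
        # Step 1: centroid m_i = mean of the datapoints assigned to cluster i
        centroids = np.array([points[assign == i].mean(axis=0)
                              for i in range(k)])
        # Step 2: reassign each datapoint to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :],
                               axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):     # converged
            break
        assign = new_assign
    return centroids, assign
```

On well-separated data this typically converges in a handful of iterations; on messier data it can settle into a poor local optimum, which is why it is usually run several times from different random starts.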
K-Means Worked Example
• Board work
30
From Classification to Prediction
• Once we have our clusters defined, we know what
datapoints constitute the type of player we are analysing
• We can use this to predict what the player will do
‣ We have a collection of "similar" players, we can use
their history.
‣ We may be able to use the raw data from the
observations directly.
• In either case, we can use the classification to predict actions
31
Back to Monte Carlo
• So, back to the game tree.
• We now have an idea of what type of player we are dealing
with.
• We have an idea of what actions the players are going to
take in given situations.
• Can we plug this back into the Monte Carlo simulation?
32
Informed Walks in the Game Tree
• We talked earlier about Opponent nodes in the game tree
• Specifically, when we hit an Opponent node, we would use a
uniform distribution to randomly pick between the options
available.
• Now, we can bias that distribution towards selecting the
action we expect the player to take.
33
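A minimal sketch of the change in Python: at an Opponent node we sample from the model's predicted action distribution instead of the uniform one. The probabilities in `model` are hypothetical opponent-model outputs:

```python
import random

def pick_action(actions, model_probs=None):
    """Pick an opponent action: uniformly if we have no model,
    otherwise biased by the model's predicted distribution."""
    if model_probs is None:
        return random.choice(actions)                # uninformed walk
    weights = [model_probs[a] for a in actions]      # informed walk
    return random.choices(actions, weights=weights, k=1)[0]

# Hypothetical model output for an aggressive player type.
model = {"fold": 0.1, "call": 0.3, "raise": 0.6}
random.seed(0)
picks = [pick_action(["fold", "call", "raise"], model)
         for _ in range(10000)]
print(picks.count("raise"))  # roughly 6000, instead of ~3333 under uniform
```

The rollouts now spend most of their samples on the lines of play the modelled opponent is actually likely to take.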
Does This Work?
• Intuitively, it should
• The more accurate we make the simulation, the
more accurate the results should be.
• The concern is that the prediction process will slow
things down too much
‣ Monte Carlo relies on large numbers of samples; if they
take too long to generate, the extra accuracy isn't helping.
34
Does This Work?
• We don't know.
• It's been proven to aid Monte Carlo for Poker when
k=1
‣ All players are treated as a generic "player"
• This is ongoing research right now in SAIG.
• Look for papers next year. :)
35
What We Do Know
• We've previously attempted Machine Learning for
Opponent Modelling.
• Using 32 different statistical measures (reduced
down to 8 significant dimensions by PCA)
• Training data of 700,000 hands of Poker
• Successfully extracted around 28 different player
stereotypes.
36
The Aim of the Game
• We aren't going to be able to make an AI that
always wins at Poker
• There's too much chance involved
‣ Bad hands come up
‣ Mis-interpreting players
• What we want to do is make an AI that performs
better than the other players under the same
circumstances
37
Evaluation
• Any time we do research we are testing some sort
of scientific hypothesis.
• We need to design experiments to test whether the
hypothesis is true or not
• Science doesn't care whether we're right - it's unbiased. Even if
we're wrong, we have learnt something.
38
Evaluation
• Consider a pro Poker player
• Will win some games and lose others
‣ In fact, a fundamental rule of good poker play is not even
taking part in about 80% of the games you sit through
• Measuring in terms of a single game doesn't work
‣ Need to look at the forest, not the trees
• What counts is how much money the player wins at
the end.
39
Measuring the Strength of an AI
• What we need is a measure of how successful a bot
is on average.
• Poker gives a metric for this - big blinds won per 100 hands (bb/100)
‣ The metric is in terms of the table limit, so it is normalised
• Note that even for a large number of games, the
variance on this measure can be really big.
‣ Recall Black Swan events - low likelihood, high
impact. Large wins are Black Swans here.
40
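The bb/100 metric itself is just a normalisation; a sketch with made-up numbers:

```python
# bb/100: net winnings, expressed in big blinds, per 100 hands played.
def bb_per_100(net_winnings, big_blind, hands):
    return (net_winnings / big_blind) / hands * 100

# Hypothetical session: winning $150 over 2,000 hands at a $1/$2 table.
print(bb_per_100(150, 2, 2000))  # 3.75
```

Because the result is in big blinds rather than dollars, bots (or players) measured at different table limits can be compared directly.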
Stable Experimentation
• We really need a way to remove the variance from
the problem.
• Ordinarily we might repeat the experimentation, take
a large number of samples, and use the law of averages to our
advantage.
• We talked yesterday about the state space of just the
card-dealing component of Poker
‣ We know it's too large for this to be an option
41
Experimentation
• What if we generate experimental scenarios?
• A large number of games, with the deck already
configured.
• We can play the scenario with player A
• Then replay the exact same scenario with player B
• The results that players A and B generate are now
comparable.
42
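The duplicate-scenario idea can be sketched in Python; `play_game` is a hypothetical stand-in for a full Poker simulation:

```python
import random

def make_scenarios(n, seed=123):
    """Pre-generate n shuffled decks so every player sees the same deals."""
    rng = random.Random(seed)
    return [rng.sample(range(52), 52) for _ in range(n)]

def evaluate(player, scenarios, play_game):
    """Total winnings of `player` across the fixed scenarios."""
    return sum(play_game(player, deck) for deck in scenarios)

scenarios = make_scenarios(1000)
# results_a = evaluate(player_a, scenarios, play_game)
# results_b = evaluate(player_b, scenarios, play_game)
# Both players were dealt the identical decks, so the totals are comparable.
```

Fixing the decks removes the card-dealing variance from the comparison: any difference between the two totals comes from the players, not the shuffle.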
Experimental Design
• Designing good experiments is really important
• Not just for AI but for all kinds of things
• Understanding sources of uncertainty means we can
find ways to factor them out
• Design fair unbiased experiments
• For Science!
43
Summary
• More detail on Monte Carlo in Poker
• Explanation of Opponent Modelling in Poker
‣ Dimensionality Reduction
‣ Clustering algorithms
• Exploiting Opponent Models
• Experimental Design
44
Next Week
• Other uses for Opponent Models
• Procedural Content Generation
• AI in Video Games
45