Post on 21-Jun-2015
Making Better Decisions - Opponent Modelling
1
Monte Carlo in Poker (Recap)
• Yesterday we saw that Monte Carlo could be used to
estimate the expected reward of an action by evaluating the
delayed reward
• We do this by simulating or "rolling out" games to their end
state.
• Assess the amount we won or lost
2
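As a concrete illustration, the rollout idea can be sketched in a few lines of Python. The toy game, the `step`/`is_terminal`/`reward` callbacks, and all the numbers here are made up for illustration:

```python
import random

def rollout_value(state, step, is_terminal, reward, n_rollouts=1000):
    """Estimate the expected reward of `state` by simulating ("rolling
    out") games to their end state and averaging what we won or lost."""
    total = 0.0
    for _ in range(n_rollouts):
        s = state
        while not is_terminal(s):
            s = step(s)          # advance one move, chosen at random
        total += reward(s)       # amount won or lost at the end state
    return total / n_rollouts

# Toy game: start with 0 chips, win or lose 1 chip per step, game ends
# after 5 steps. The true expected reward is 0.
random.seed(42)
estimate = rollout_value(
    state=(0, 0),                                          # (chips, moves)
    step=lambda s: (s[0] + random.choice([-1, 1]), s[1] + 1),
    is_terminal=lambda s: s[1] >= 5,
    reward=lambda s: s[0],
    n_rollouts=10000)
print(round(estimate, 2))  # close to 0
```

The estimate converges on the true expected reward as the number of rollouts grows, which is exactly the property the game-tree walks below rely on.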
Game Tree and Monte Carlo
3
[Game tree diagram showing alternating player, Opponent, and Chance nodes]
Random Walks in the Game Tree
• When we walk the Game Tree at random, we pick nodes to
follow at random.
• We assume (for now) that this is an unbiased choice
• This means every choice has the same probability of being
chosen
4
Can We Do Better?
• Random walks are all well and good
• But a uniform distribution across action choices isn't
accurate
‣ Certain situations will make sensible players more likely to use
certain actions
• How can we bring this bias into play in the walk?
5
Classifying Opponents
• The way we do this is to work out what type of player
someone is.
• We observe them to get a better understanding of how they
operate.
• In Poker and other games, we can use all sorts of statistical
measures to quantify a player's type.
6
Action Prediction
• Once we know what kind of player someone is, we can flip
things on their head.
• We answered "what is the likelihood this player is of type X
given we have seen this type of play?"
• We can now answer "what is the likelihood this player will
take action Y given they are of type X?"
• Remember from Bayes' Theorem last week: these questions
are closely linked
7
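To make the link concrete, here is a tiny Bayes calculation in Python. The two player types and their raise probabilities are hypothetical numbers, not real Poker statistics:

```python
# Hypothetical numbers: two player types and how often each raises.
p_type = {"tight": 0.3, "loose": 0.7}             # prior over types
p_raise_given_type = {"tight": 0.1, "loose": 0.6}

# Bayes: P(type | raised) = P(raised | type) * P(type) / P(raised)
p_raised = sum(p_raise_given_type[t] * p_type[t] for t in p_type)
posterior = {t: p_raise_given_type[t] * p_type[t] / p_raised
             for t in p_type}

# Flipped around: given our belief about the type, predict the next action.
p_next_raise = sum(p_raise_given_type[t] * posterior[t] for t in posterior)
print(posterior)                 # the "loose" type dominates
print(round(p_next_raise, 3))    # 0.567
```

Observing one raise shifts our belief heavily towards the "loose" type, and that updated belief in turn changes our prediction of the next action.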
Simple (Human) Classification
• Pro Poker players try to quantify their opponents into one of
several classes based on 3 measures
‣ Voluntarily Put in Pot (VPiP)
‣ Won at Showdown (WSD)
‣ Pre-flop Raise (PFR)
8
Player Stereotypes
• Players can be
‣ Tight / Loose (how likely they are to play hands)
‣ Passive / Aggressive pre-flop
‣ Passive / Aggressive post-flop
9
Utilising Stereotypes
• If we can classify players we can use this against them
• For instance, we might discover that passive players can be
chased off by aggressive play
• Or we understand that when a super-conservative player
decides to raise, we need to be careful
• We can build heuristic rule bases around this like we saw
before.
• Or we can be much smarter
10
Better Classifications
• Humans are getting by on 3 dimensions
• But Poker has waaaay more statistics available than this
• We can make a lot of use of this extra data.
11
Poker Tracker
• Poker Tracker is a stats package specifically for Poker
• Analyses play at online casinos
• Real-time access to stats about opponents
• Allows players to review hands later
12
Stats in Poker
• As mentioned a few slides ago, Poker has many statistics
• Poker Tracker keeps tabs on around 150 metrics
• Some of these are somewhat similar, some relate more to
the games than the players
13
Problem of Dimensionality
• The problem now is that we have too much information!
• Trying to learn on cluttered data can be problematic,
assuming it works at all.
14
Dimensionality Reduction
• Somehow we have to reduce the number of dimensions that
our data points are using.
• In many ways, getting the right data into a learning algorithm
is the biggest challenge.
• As much art as it is engineering.
• Two options
‣ Feature selection
‣ Feature extraction
15
Selection vs Extraction
• In Selection, you pick the dimensions you believe to be most
relevant
‣ The human players did this to get their 3 dimensional
representations
• In Extraction, you come up with new dimensions that can
represent your datapoints
16
Principal Components Analysis
• PCA is a common strategy for this.
• Recasts the dimensions of the datapoint into another set of
"basis vectors".
• Smushes together dimensions that have a strong correlation
‣ Some stats measures are looking at fundamentally the same thing, in
different ways
‣ E.g. Various raise frequency metrics might be treated as a single
“aggression” dimension after PCA
17
Principal Components Analysis
• This was going to be a worked example.
• Honestly, that's way too painful.
• For N observations in M dimensions, X is an M × N matrix
where each column is an observation.
• Calculate the mean and std. dev. for each row in the
matrix (each dimension)
18
Principal Components Analysis
• Calculate the covariance matrix, the amount that
the dimensions vary with respect to each other.
• Calculate the eigenvectors and eigenvalues of the
covariance matrix
‣ The eigenvectors are the new basis vectors of the
reduced-dimension datapoints
‣ The eigenvalues represent how significant the
eigenvector is. Large value = significant
19
Principal Components Analysis
• Pick the most significant K of the eigenvectors.
• Project the original datapoint in X onto the new
basis vectors.
20
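The steps above (centre the data, compute the covariance matrix, take its eigen-decomposition, project onto the K most significant eigenvectors) can be sketched with NumPy. The dataset here is synthetic, with two strongly correlated "raise frequency" style dimensions and one unrelated one:

```python
import numpy as np

# Synthetic data: 200 observations in 3 dimensions.
rng = np.random.default_rng(0)
N = 200
aggression = rng.normal(size=N)
X = np.vstack([aggression + 0.05 * rng.normal(size=N),
               2 * aggression + 0.05 * rng.normal(size=N),
               rng.normal(size=N)])        # M x N, columns = observations

Xc = X - X.mean(axis=1, keepdims=True)     # centre each dimension (row)
C = np.cov(Xc)                             # M x M covariance matrix
vals, vecs = np.linalg.eigh(C)             # eigenvalues in ascending order
order = np.argsort(vals)[::-1]             # most significant first
K = 2
basis = vecs[:, order[:K]]                 # top-K eigenvectors = new basis
Y = basis.T @ Xc                           # project: K x N datapoints
print(Y.shape)                             # (2, 200)
```

The two correlated dimensions get "smushed" into one dominant component, so almost all of the variance survives the reduction from 3 dimensions to 2.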
Principal Components Analysis
• Honestly, if anyone ever asks you to do this
‣ Get a textbook
‣ Use Matlab
‣ Be really careful because it’s kind of complicated
• It is possible to do it by hand.
‣ I can’t anymore...
21
Principal Components Analysis
• Assuming that you finish the calculations without
mucking up.
‣ Or, you find something to work it out for you (Matlab
functions for this exist)
• What you have now is a new datapoint that carries
approximately the same information.
• Recast into fewer dimensions.
‣ Note that the new dimensions will not make sense
individually
PCA in Action
23
Clustering Algorithms
• Having performed PCA, we have a much more manageable
set of datapoints, and we’ve eliminated extraneous
dimensions
• Now we need to group them together.
• Clustering algorithms are one approach: they try to find a set
of "clusters" of points that sit close together.
24
Clustering
25
[Scatter plot of datapoints falling into well-separated groups. Blue Peter style example - real data is rarely so neat]
Clustering
• k-means is one of the most popular algorithms
‣ Others exist: fuzzy c-means, FLAME clustering, and more
• Pick a value for k
‣ You can play around a bit to find good values or use
some tricks
‣ Accepted "rule of thumb": k ≈ √(n/2) for n datapoints
26
K-Means Algorithm
• Typically, we run the k-means algorithm as an
“iterative refinement” process
‣ Guess at some initial values, keep running the process
round and round until it stabilises
• Randomly assign datapoints to one of the k clusters
• Step 1 - Calculate centroids of the clusters
• Step 2 - Update assignment based on new centroids
• Rinse and repeat 1 and 2 until convergence.
27
K-Means Algorithm
• Calculating centroids of clusters:
m_i(t+1) = (1 / |S_i(t)|) * Σ x_j, summing over all x_j in S_i(t)
‣ x_j denotes the datapoints being sampled
‣ m_i(t+1) denotes the mean of cluster i at iteration t+1
‣ S_i(t) denotes the set of datapoints assigned to cluster i at
iteration t
• Effectively, the average of the datapoints
28
K-Means Algorithm
• Assigning datapoints to clusters:
S_i(t) = { x_j : ||x_j - m_i(t)|| ≤ ||x_j - m_l(t)|| for all l = 1..k }
• The set of points S_i is all datapoints for which the
centroid of cluster i (m_i) is the nearest centroid.
29
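The iterative refinement described above can be sketched in Python with NumPy. This is a bare-bones version that, among other things, does not handle clusters becoming empty:

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Two-step k-means refinement: compute centroids, reassign,
    repeat until the assignment stops changing."""
    rng = np.random.default_rng(seed)
    assign = rng.integers(0, k, size=len(points))  # random initial assignment
    for _ in range(iters):
        # Step 1: centroid m_i = mean of the datapoints assigned to cluster i
        centroids = np.array([points[assign == i].mean(axis=0)
                              for i in range(k)])
        # Step 2: reassign each datapoint to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :],
                               axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):     # converged
            break
        assign = new_assign
    return centroids, assign
```

On well-separated data this typically converges in a handful of iterations; on messier data it can settle into a poor local optimum, which is why it is usually run several times from different random starts.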
K-Means Worked Example
• Board work
30
From Classification to Prediction
• Once we have our clusters defined, we know what
datapoints constitute the type of player we are analysing
• We can use this to predict what the player will do
‣ We have a collection of "similar" players, we can use
their history.
‣ We may be able to use the raw data from the
observations directly.
• In either case, we can use the classification to predict actions
31
Back to Monte Carlo
• So, back to the game tree.
• We now have an idea of what type of player we are dealing
with.
• We have an idea of what actions the players are going to
take in given situations.
• Can we plug this back into the Monte Carlo simulation?
32
Informed Walks in the Game Tree
• We talked earlier about Opponent nodes in the game tree
• Specifically, when we hit an Opponent node, we would use a
uniform distribution to randomly pick between the options
available.
• Now, we can bias that distribution towards selecting the
action we expect the player to take.
33
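A minimal sketch of the change in Python: at an Opponent node we sample from the model's predicted action distribution instead of the uniform one. The probabilities in `model` are hypothetical opponent-model outputs:

```python
import random

def pick_action(actions, model_probs=None):
    """Pick an opponent action: uniformly if we have no model,
    otherwise biased by the model's predicted distribution."""
    if model_probs is None:
        return random.choice(actions)                # uninformed walk
    weights = [model_probs[a] for a in actions]      # informed walk
    return random.choices(actions, weights=weights, k=1)[0]

# Hypothetical model output for an aggressive player type.
model = {"fold": 0.1, "call": 0.3, "raise": 0.6}
random.seed(0)
picks = [pick_action(["fold", "call", "raise"], model)
         for _ in range(10000)]
print(picks.count("raise"))  # roughly 6000, instead of ~3333 under uniform
```

The rollouts now spend most of their samples on the lines of play the modelled opponent is actually likely to take.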
Does This Work?
• Intuitively, it should
• The more accurate we make the simulation, the
more accurate the results should be.
• The concern is that the prediction process will slow
things down too much
‣ Monte Carlo relies on large numbers of samples; if they
take too long to generate, the extra accuracy isn't helping.
34
Does This Work?
• We don't know.
• It's been proven to aid Monte Carlo for Poker when
k=1
‣ All players are treated as a generic "player"
• This is ongoing research right now in SAIG.
• Look for papers next year. :)
35
What We Do Know
• We've previously attempted Machine Learning for
Opponent Modelling.
• Using 32 different statistical measures (reduced
down to 8 significant dimensions by PCA)
• Training data of 700,000 hands of Poker
• Successfully extracted around 28 different player
stereotypes.
36
The Aim of the Game
• We aren't going to be able to make an AI that
always wins at Poker
• There's too much chance involved
‣ Bad hands come up
‣ Mis-interpreting players
• What we want to do is make an AI that performs
better than the other players under the same
circumstances
37
Evaluation
• Any time we do research we are testing some sort
of scientific hypothesis.
• We need to design experiments to test whether the
hypothesis is true or not
• Science doesn't care whether we're right - it's unbiased. Even if
we're wrong, we have learnt something.
38
Evaluation
• Consider a pro Poker player
• Will win some games and lose others
‣ In fact, a fundamental rule of good poker play is not even
taking part in about 80% of the games you sit through
• Measuring in terms of a single game doesn't work
‣ Need to look at the forest, not the trees
• What counts is how much money the player wins at
the end.
39
Measuring the Strength of an AI
• What we need is a measure of how successful a bot
is on average.
• Poker gives a metric for this - big blinds won per 100 hands (bb/100)
‣ The metric is in terms of the table limit, so it is normalised
• Note that even for a large number of games, the
variance on this measure can be really big.
‣ Recall Black Swan events - low likelihood, high
impact. Large wins are Black Swans here.
40
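The bb/100 metric itself is just a normalisation; a sketch with made-up numbers:

```python
# bb/100: net winnings, expressed in big blinds, per 100 hands played.
def bb_per_100(net_winnings, big_blind, hands):
    return (net_winnings / big_blind) / hands * 100

# Hypothetical session: winning $150 over 2,000 hands at a $1/$2 table.
print(bb_per_100(150, 2, 2000))  # 3.75
```

Because the result is in big blinds rather than dollars, bots (or players) measured at different table limits can be compared directly.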
Stable Experimentation
• We really need a way to remove the variance from
the problem.
• Ordinarily we might repeat the experimentation, take
a large number of samples, and use the law of averages to our
advantage.
• We talked yesterday about the state space of just the
card-dealing component of Poker
‣ We know it's too large for this to be an option
41
Experimentation
• What if we generate experimental scenarios?
• A large number of games, with the deck already
configured.
• We can play the scenario with player A
• Then replay the exact same scenario with player B
• The results that players A and B generate are now
comparable.
42
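The duplicate-scenario idea can be sketched in Python; `play_game` is a hypothetical stand-in for a full Poker simulation:

```python
import random

def make_scenarios(n, seed=123):
    """Pre-generate n shuffled decks so every player sees the same deals."""
    rng = random.Random(seed)
    return [rng.sample(range(52), 52) for _ in range(n)]

def evaluate(player, scenarios, play_game):
    """Total winnings of `player` across the fixed scenarios."""
    return sum(play_game(player, deck) for deck in scenarios)

scenarios = make_scenarios(1000)
# results_a = evaluate(player_a, scenarios, play_game)
# results_b = evaluate(player_b, scenarios, play_game)
# Both players were dealt the identical decks, so the totals are comparable.
```

Fixing the decks removes the card-dealing variance from the comparison: any difference between the two totals comes from the players, not the shuffle.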
Experimental Design
• Designing good experiments is really important
• Not just for AI but for all kinds of things
• Understanding sources of uncertainty means we can
find ways to factor them out
• Design fair unbiased experiments
• For Science!
43
Summary
• More detail on Monte Carlo in Poker
• Explanation of Opponent Modelling in Poker
‣ Dimensionality Reduction
‣ Clustering algorithms
• Exploiting Opponent Models
• Experimental Design
44
Next Week
• Other uses for Opponent Models
• Procedural Content Generation
• AI in Video Games
45