cs6501: PoKER
Class 3: Probabilistic Reasoning
Spring 2010
University of Virginia
David Evans
Plan
• One-line Proof of Bayes’ Theorem
• Inductive Learning
Home Game this Thursday, 7pm! (Game start: 7:15pm)
This is not an official course activity.
Email me by Wednesday afternoon if you are coming.
Bayes’ Theorem
Bayes’ Theorem Proof
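The one-line proof follows directly from the definition of conditional probability, applied symmetrically to both orderings of the joint event:

```latex
P(A \mid B)\,P(B) \;=\; P(A \cap B) \;=\; P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) \;=\; \frac{P(B \mid A)\,P(A)}{P(B)}
```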
Machine Learning
Inductive Learning: “Learning from Examples”
Learner
Input: training examples (x, f(x))
Output: hypothesis function h approximating f
Limits of Induction
It was the summer of 1919 that I began to feel more and more
dissatisfied with these three theories—the Marxist theory of
history, psycho-analysis, and individual psychology; and I began
to feel dubious about their claims to scientific status. My
problem perhaps first took the simple form, “What is wrong with
Marxism, psycho-analysis, and individual psychology? Why are
they so different from physical theories, from Newton's theory,
and especially from the theory of relativity?”
To make this contrast clear I should explain that few of us at the
time would have said that we believed in the truth of Einstein's
theory of gravitation. This shows that it was not my doubting the
truth of those three other theories which bothered me, but
something else.
One can sum up all this by saying that the criterion of the scientific
status of a theory is its falsifiability, or refutability, or testability.
Karl Popper, Science as Falsification, 1963.
Deciding on h
• Many hypotheses fit the training data
• The choice depends on the hypothesis space: what types of functions are allowed
• Pick between a simple hypothesis function (that may not fit the data exactly) and a complex one
• How many functions are there for X: n bits, Y: 1 bit?
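The count asked for in the last bullet can be checked directly: an n-bit input has 2^n possible values, and a hypothesis assigns each value one output bit independently, giving 2^(2^n) distinct functions. A minimal sketch:

```python
# Number of distinct functions from n input bits to 1 output bit:
# there are 2**n possible inputs, and each can independently map to
# 0 or 1, giving 2**(2**n) functions.
def num_boolean_functions(n):
    return 2 ** (2 ** n)

for n in range(1, 5):
    print(n, num_boolean_functions(n))   # 1→4, 2→16, 3→256, 4→65536
```

Even for modest n the hypothesis space is astronomically large, which is why the learner must prefer some hypotheses over others.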
Forms of Inductive Learning
Supervised Learning
Given: example (input, output) pairs
Output: hypothesis function
Unsupervised Learning (no explicit outputs)
Given: inputs only
Output: clustering
Reinforcement Learning
Given: sequence of inputs
Output: sequence of decisions
Feedback: reward for the overall outcome
No feedback for individual decisions (output), but overall feedback.
First Reinforcement Learner (?)
Arthur Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal, 1959.
Earlier inductive learning paper:
R. J. Solomonoff. An Inductive Inference Machine, 1956.
(and neural networks were studied earlier)
Spam Filtering
Supervised Learning: Spam Filter
Training examples: Message 1 → Not Spam, Message 2 → Spam, …, Message N → Not Spam
Learner
X-Sender-IP: 78.128.95.196
From: Nicole Cox <[email protected]>
Subject: Job Offer
Date: Thu, 22 Apr 2010 10:10:45 +0300 (EEST)
Dear David,
Do you want to participate in the greatest Mystery Shopping quests nationwide? Have you
ever wondered how Mystery Shoppers are recruited and how prosperous companies keep up
doing business in the highly competitive business world? The answer is that many companies
are recruiting young, creative, observant, and responsible individuals like you to give their
feedback on various products and customer services and thus improve their quality.
As a Mystery Shopper you have only one responsibility: Act as a real customer while
evaluating the place you are sent to mystery shop and enjoy all the benefits that go along with
your job. Remember that you have nothing to lose, because you are awarded generously for
your efforts:
-You get paid between $10 and $40 per hour for each mystery shopping assignment;
-You keep all things that you have purchased for free;
-You watch movies, eat in restaurants, and visit amusement parks for free;
-You are turning your most enjoyable hobby into a well-paying activity.
Be aware that as a Mystery Shopper you can earn on average $100 to $300 per week. The
Feature Extraction
Features
F1 = From: <someone in address book>
F2 = Subject: *FREE*
F3 = Body: *enlargement*
F4 = Date: <today>
F5 = Body: <contains URL>
F6 = To: [email protected]
…
Note: this assumes we already know what the features are! Need to learn them.
Bayesian Filtering
Feature                               Number of Spam   Number of Ham
F1: From: <someone in address book>                2            4052
F2: Subject: *FREE*                             3058               2
F3: Body: *enlargement*                          253               1
F4: Date: <today>                                304            5423
F5: Body: <contains URL>                        3630             263
…                                                  …               …
Total messages: 4000 Spam / 6000 Ham
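From these counts, the per-feature likelihoods P(Fi | Spam) and P(Fi | Ham) can be estimated as simple frequency ratios. A minimal sketch using the numbers in the table (feature names abbreviated; no smoothing applied):

```python
# Estimating per-feature likelihoods from the counts in the table above.
# Each entry is (number of spam with feature, number of ham with feature).
spam_total, ham_total = 4000, 6000
counts = {
    "F1_from_addressbook": (2, 4052),
    "F2_subject_free":     (3058, 2),
    "F3_body_enlargement": (253, 1),
    "F4_date_today":       (304, 5423),
    "F5_body_url":         (3630, 263),
}

def likelihoods(feature):
    spam_count, ham_count = counts[feature]
    # Maximum-likelihood estimates of P(Fi | Spam) and P(Fi | Ham)
    return spam_count / spam_total, ham_count / ham_total

p_spam, p_ham = likelihoods("F2_subject_free")
print(p_spam, p_ham)   # ≈ 0.7645 vs ≈ 0.00033
```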
Combining Probabilities
Naïve Bayesian Model: assume all features are independent
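Under the independence assumption, the per-feature likelihoods combine by Bayes’ Theorem into a single posterior:

```latex
P(\mathrm{Spam} \mid F_1, \ldots, F_n)
\;=\;
\frac{P(\mathrm{Spam}) \prod_{i=1}^{n} P(F_i \mid \mathrm{Spam})}
     {P(\mathrm{Spam}) \prod_{i=1}^{n} P(F_i \mid \mathrm{Spam})
      \;+\; P(\mathrm{Ham}) \prod_{i=1}^{n} P(F_i \mid \mathrm{Ham})}
```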
Learning the Features
Learner
Feature                Spam Likelihood
F1: Subject: *Poker*              0.03
F2: Subject: *FREE*              0.999
…                                    …
Make every <context, token> pair a feature. Which ones should we keep?
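One plausible answer to “which ones should we keep?”, in the spirit of Graham’s A Plan for Spam (the token table below is a hypothetical illustration): keep the tokens whose estimated spam likelihood is farthest from the uninformative value 0.5.

```python
# Keep the most "interesting" token features: those whose spam
# likelihood deviates most from 0.5 (0.5 carries no information).
# The likelihood values here are hypothetical.
likelihood = {
    "poker": 0.03,
    "free": 0.999,
    "meeting": 0.10,
    "viagra": 0.998,
    "thursday": 0.45,
}

def most_interesting(likelihood, k):
    # Sort tokens by distance of their spam likelihood from 0.5, descending.
    return sorted(likelihood, key=lambda t: abs(likelihood[t] - 0.5),
                  reverse=True)[:k]

print(most_interesting(likelihood, 3))   # ['free', 'viagra', 'poker']
```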
Bayesian Spam Filtering
Patrick Pantel and Dekang Lin. SpamCop: A
Spam Classification & Organization Program.
AAAI-98 Workshop on Text Classification,
1998.
Paul Graham. A Plan for Spam (2002), Better
Bayesian Filtering (2003)
SpamAssassin Bayesian Filter
Testing Learners
K-Fold Cross Validation
Randomly partition the training data into k folds (usually 10 folds).
Use k-1 folds for training, test on the unused fold, and record the score.
(repeat for each fold left unused)
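The procedure above can be sketched directly; `train` and `evaluate` here are hypothetical stand-ins for a real learner and scoring function:

```python
import random

# K-fold cross validation: randomly partition the data into k folds,
# train on k-1 folds, test on the held-out fold, average the scores.
def k_fold_cv(data, k, train, evaluate, seed=0):
    data = list(data)
    random.Random(seed).shuffle(data)        # random partition
    folds = [data[i::k] for i in range(k)]   # k roughly equal folds
    scores = []
    for i in range(k):
        test_fold = folds[i]
        training = [x for j, f in enumerate(folds) if j != i for x in f]
        h = train(training)
        scores.append(evaluate(h, test_fold))
    return sum(scores) / k

# Toy usage: a "learner" that predicts the majority label it saw in training.
examples = [(x, x % 2) for x in range(100)]
train = lambda data: max((sum(1 for _, y in data if y == v), v)
                         for v in (0, 1))[1]
evaluate = lambda h, fold: sum(1 for _, y in fold if y == h) / len(fold)
print(k_fold_cv(examples, 5, train, evaluate))
```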
Concerns
• Limits of the Naïve Bayesian Model
• How many features?
• Expressiveness of learned features
• What if …
Adversarial Spam
How Many Ways Can You Spell V1@gra?
600,426,974,379,824,381,952
Really Adversarial Spam
• Player 1: Spammer
– Goal: create a spam message that tricks the filter, or make the filter reject ham
• Player 2: Filter
– Goal: don’t be tricked (but don’t reject ham messages)
Does this game have a Nash equilibrium?
Blaine Nelson, Marco Barreno, Fuching Jack Chi, Anthony D. Joseph,
Benjamin I. P. Rubinstein, Udam Saini, Charles Sutton, J. D. Tygar, Kai Xia
Exploiting Machine Learning to Subvert Your Spam Filter. USENIX LEET 2008.
Focused Attack
(try to get a particular non-spam message rejected by the filter)
Hidden Markov Models
Hidden Markov Model
Finite State Machine
+ probabilities on transitions
+ hide the state
+ add observations and output probabilities
[Figure: from Start, states A, K, and Q are each reached with probability 1/3; each state outputs the observations Bet or Check with its own output probabilities.]
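Reading the figure as data: a sketch of the slide’s poker HMM, with the dealt card as the hidden state and the player’s action as the observation. The output probabilities below are illustrative assumptions where the figure’s numbers are ambiguous.

```python
# A small HMM matching the slide's figure: hidden states are the dealt
# card (A, K, Q), observations are actions (Bet, Check).
start = {"A": 1/3, "K": 1/3, "Q": 1/3}

# Output (emission) probabilities: P(observation | state).
# The K row's split is an assumed reading of the figure.
emit = {
    "A": {"Bet": 1.0, "Check": 0.0},   # always bet with the best card
    "K": {"Bet": 1/3, "Check": 2/3},   # sometimes bet, usually check
    "Q": {"Bet": 0.0, "Check": 1.0},   # always check with the worst card
}

# Sanity check: every probability distribution sums to 1.
for dist in [start] + list(emit.values()):
    assert abs(sum(dist.values()) - 1.0) < 1e-9
print("distributions are valid")
```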
Viterbi Path
Given a sequence of observations, what is the most likely sequence of states?
Viterbi Algorithm (1967), Andrew Viterbi
[Trellis: hidden states … x(t-2), x(t-1), x(t) with observations y(t-2), y(t-1), y(t)]
Key assumption: the most likely sequence for time t depends only on
(1) the most likely sequence at time t-1
(2) y(t)
This is true for first-order HMMs: transition probabilities depend only on the current state.
Viterbi Algorithm
Running time: O(n |S|²) for n observations and |S| possible states.
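A sketch of the algorithm: keep, for each state, the probability of the best path ending there, and extend all of them one observation at a time (the key assumption above is exactly what makes this dynamic program correct). The two-state weather model in the usage example is hypothetical.

```python
# Viterbi algorithm for a first-order HMM: given observations, find the
# most likely hidden state sequence. Runs in O(n * |S|^2) time.
def viterbi(obs, states, start, trans, emit):
    # prob[s]: probability of the best path ending in s; path[s]: that path
    prob = {s: start[s] * emit[s][obs[0]] for s in states}
    path = {s: [s] for s in states}
    for y in obs[1:]:
        new_prob, new_path = {}, {}
        for s in states:
            # best predecessor for state s given observation y
            p, prev = max((prob[r] * trans[r][s] * emit[s][y], r)
                          for r in states)
            new_prob[s], new_path[s] = p, path[prev] + [s]
        prob, path = new_prob, new_path
    best = max(states, key=lambda s: prob[s])
    return path[best], prob[best]

# Hypothetical two-state example:
states = ["Rain", "Sun"]
start = {"Rain": 0.6, "Sun": 0.4}
trans = {"Rain": {"Rain": 0.7, "Sun": 0.3},
         "Sun":  {"Rain": 0.4, "Sun": 0.6}}
emit = {"Rain": {"umbrella": 0.9, "none": 0.1},
        "Sun":  {"umbrella": 0.2, "none": 0.8}}
seq, p = viterbi(["umbrella", "umbrella", "none"], states, start, trans, emit)
print(seq)   # ['Rain', 'Rain', 'Sun']
```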
Applications of HMMs
• Noisy Transmission (Convolution Codes)
– Sequence of states: message to transmit
– Observations: received signal
• Speech Recognition
– Sequence of states: utterance
– Observations: recorded sound
• Bioinformatics
• Cryptanalysis
• etc.
Charge
• So far we have assumed the state transition
and output probabilities are all known!
• Thursday’s Class: learning the HMM
If you are coming to the home game Thursday 7pm,
remember to email me by 5pm Wednesday.
Mention whether you need a ride or can drive other people.