Laboratory for Social & Neural Systems Research (SNS)
Institute of Empirical Research in Economics (IEW)

PATTERN RECOGNITION AND MACHINE LEARNING
Computational Neuroeconomics and Neuroscience, 22-09-2010
Course schedule

Date        Topic                                    Chapter  Presenters
13-10-2010  Density Estimation, Bayesian Inference   2       Adrian Etter, Marco Piccirelli, Giuseppe Ugazio
20-10-2010  Linear Models for Regression              3       Susanne Leiberg, Grit Hein
27-10-2010  Linear Models for Classification          4       Friederike Meyer, Chaohui Guo
03-11-2010  Kernel Methods I: Gaussian Processes      6       Kate Lomakina
10-11-2010  Kernel Methods II: SVM and RVM            7       Christoph Mathys, Morteza Moazami
17-11-2010  Probabilistic Graphical Models            8       Justin Chumbley
Course schedule (continued)

Date        Topic                                                                   Chapter  Presenters
24-11-2010  Mixture Models and EM                                                    9       Bastiaan Oud, Tony Williams
01-12-2010  Approximate Inference I: Deterministic Approximations                   10       Falk Lieder
08-12-2010  Approximate Inference II: Stochastic Approximations                     11       Kay Brodersen
15-12-2010  Inference on Continuous Latent Variables: PCA, Probabilistic PCA, ICA   12       Lars Kasper
22-12-2010  Sequential Data: Hidden Markov Models, Linear Dynamical Systems         13       Chris Burke, Yosuke Morishima

Sandra Iglesias
CHAPTER 1: PROBABILITY, DECISION, AND INFORMATION THEORY
Outline

- Introduction
- Probability Theory
  - Probability Rules
  - Bayes' Theorem
  - Gaussian Distribution
- Decision Theory
- Information Theory
Pattern recognition
- computer algorithms for the automatic discovery of regularities in data
- use of these regularities to take actions, such as classifying the data into different categories
- data (patterns) are classified based either on
  - a priori knowledge, or
  - statistical information extracted from the patterns
Machine learning

'How can we program systems to automatically learn and to improve with experience?'

- the machine is programmed to learn from an incomplete set of examples (the training set)
- the core objective of a learner is to generalize from its experience
Polynomial Curve Fitting

Target function: sin(2πx), observed with noise; fit a polynomial y(x, w) = w₀ + w₁x + … + w_M x^M.
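This running example can be sketched numerically. A minimal least-squares fit, assuming NumPy; the sample size, noise level, and random seed are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# N noisy observations of sin(2*pi*x), as in the running example.
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

def fit_polynomial(x, t, M):
    """Least-squares fit of a degree-M polynomial; returns coefficients w."""
    Phi = np.vander(x, M + 1, increasing=True)  # columns x^0, x^1, ..., x^M
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def predict(w, x):
    return np.vander(np.atleast_1d(x), len(w), increasing=True) @ w

w3 = fit_polynomial(x, t, M=3)  # smooth fit, close to sin(2*pi*x)
w9 = fit_polynomial(x, t, M=9)  # passes through all 10 points (over-fits)
```

With M = 9 there are as many coefficients as data points, so the fitted curve interpolates the noisy targets exactly — the over-fitting discussed next.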
Sum-of-Squares Error Function

E(w) = (1/2) Σ_n {y(x_n, w) − t_n}²
Plots of polynomials (fits for different polynomial orders M)
Over-fitting

Root-Mean-Square (RMS) error: E_RMS = √(2 E(w*) / N)
Regularization

Penalize large coefficient values:

Ẽ(w) = (1/2) Σ_n {y(x_n, w) − t_n}² + (λ/2) ‖w‖²
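A minimal sketch of the penalized fit via the normal equations, assuming NumPy; the data set and λ values are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=10)

def fit_ridge(x, t, M, lam):
    """Minimize (1/2)||Phi w - t||^2 + (lam/2)||w||^2 via normal equations."""
    Phi = np.vander(x, M + 1, increasing=True)
    # Solve (Phi^T Phi + lam I) w = Phi^T t.
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)

w_loose = fit_ridge(x, t, M=9, lam=1e-12)  # ~unregularized: huge coefficients
w_tight = fit_ridge(x, t, M=9, lam=1.0)    # penalty shrinks the coefficients
```

Increasing λ trades training error for smaller coefficient magnitudes, which is exactly how the M = 9 fit is tamed.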
Regularization: E_RMS vs. ln λ (M = 9)
Outline

- Introduction
- Probability Theory
- Decision Theory
- Information Theory
Probability Theory

Probability theory provides a consistent framework for the quantification and manipulation of uncertainty.

Sources of uncertainty:
- noise on measurements
- finite size of data sets
Probability Theory

Consider two random variables: X taking values x_i (i = 1, …, M) and Y taking values y_j (j = 1, …, L), over N trials.

- n_ij: number of trials in which X = x_i and Y = y_j
- c_i: number of trials in which X = x_i, irrespective of the value of Y
- r_j: number of trials in which Y = y_j, irrespective of the value of X

Joint probability:       p(X = x_i, Y = y_j) = n_ij / N
Marginal probability:    p(X = x_i) = c_i / N
Conditional probability: p(Y = y_j | X = x_i) = n_ij / c_i
The Rules of Probability

Sum rule:      p(X) = Σ_Y p(X, Y)
Product rule:  p(X, Y) = p(Y | X) p(X)
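Both rules can be checked directly on a toy joint distribution (the numbers below are made up for illustration):

```python
import numpy as np

# Toy joint distribution p(X, Y): rows index x_i, columns index y_j.
p_xy = np.array([[0.10, 0.20],
                 [0.15, 0.25],
                 [0.05, 0.25]])

# Sum rule: marginals are obtained by summing out the other variable.
p_x = p_xy.sum(axis=1)  # p(X) = sum_Y p(X, Y)
p_y = p_xy.sum(axis=0)  # p(Y) = sum_X p(X, Y)

# Conditional from the definition: p(Y | X) = p(X, Y) / p(X).
p_y_given_x = p_xy / p_x[:, None]

# Product rule recovers the joint: p(X, Y) = p(Y | X) p(X).
recovered = p_y_given_x * p_x[:, None]
```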
Bayes' Theorem

Since p(X, Y) = p(Y, X), the product rule gives:

p(Y | X) = p(X | Y) p(Y) / p(X)

T. Bayes (1702-1761)
P.-S. Laplace (1749-1827)
Bayes' Theorem

posterior ∝ likelihood × prior

Applied to the polynomial curve fitting problem:

p(w | D) = p(D | w) p(w) / p(D)
Probability Densities

For a continuous variable x, p(x) is a probability density: p(x ∈ (a, b)) = ∫_a^b p(x) dx, with p(x) ≥ 0 and ∫ p(x) dx = 1.
Expectations

The expectation of f(x) is the average value of the function f(x) under the probability distribution p(x):

Discrete distribution:   E[f] = Σ_x p(x) f(x)
Continuous distribution: E[f] = ∫ p(x) f(x) dx
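A small sketch of both views, assuming NumPy; the distribution and f are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# A discrete distribution p(x) over three values, and a function f(x) = x^2.
x_vals = np.array([0.0, 1.0, 2.0])
p = np.array([0.2, 0.5, 0.3])
f = lambda x: x ** 2

# Discrete expectation: E[f] = sum_x p(x) f(x).
exact = float(np.sum(p * f(x_vals)))  # 0.2*0 + 0.5*1 + 0.3*4 = 1.7

# Sampling approximation: E[f] ~ (1/N) sum_n f(x_n), with x_n drawn from p(x).
samples = rng.choice(x_vals, size=100_000, p=p)
estimate = float(f(samples).mean())
```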
The Gaussian Distribution

N(x | μ, σ²) = (1 / √(2πσ²)) exp{ −(x − μ)² / (2σ²) }
Gaussian Parameter Estimation

Likelihood function for i.i.d. observations x = (x_1, …, x_N):

p(x | μ, σ²) = Π_{n=1..N} N(x_n | μ, σ²)
Maximum (Log) Likelihood

ln p(x | μ, σ²) = −(1 / 2σ²) Σ_n (x_n − μ)² − (N/2) ln σ² − (N/2) ln(2π)

μ_ML = (1/N) Σ_n x_n        σ²_ML = (1/N) Σ_n (x_n − μ_ML)²
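The closed-form ML estimates can be checked on synthetic data; a sketch assuming NumPy, with the true parameters chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(2)

# Samples from a Gaussian with known parameters, to check the ML formulas.
x = rng.normal(loc=1.5, scale=2.0, size=50_000)

mu_ml = x.mean()                       # mu_ML = (1/N) sum_n x_n
sigma2_ml = ((x - mu_ml) ** 2).mean()  # sigma^2_ML divides by N, not N-1,
                                       # so it is biased slightly low
```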
Curve Fitting Re-visited

Model the target as Gaussian noise around the polynomial:

p(t | x, w, β) = N(t | y(x, w), β⁻¹)
Maximum Likelihood

Maximizing the likelihood with respect to w is equivalent to minimizing the sum-of-squares error E(w); this determines w_ML.
Outline

- Introduction
- Probability Theory
- Decision Theory
- Information Theory
Decision Theory

- Used with probability theory to make optimal decisions
- Input vector x, target vector t
- Regression: t is continuous
- Classification: t consists of class labels
- The joint distribution p(x, t) provides a complete summary of the associated uncertainty
- Inference problem: obtain p(x, t) from data
- Decision problem: make a specific prediction for the value of t, and take specific actions based on t

Inference step: determine p(x, t) or p(t | x).   Decision step: for given x, determine the optimal t.
Medical Diagnosis Problem

- X-ray image of a patient; decide whether the patient has cancer or not
- Input vector x: set of pixel intensities
- Output variable t: whether cancer or not
- C1 = cancer; C2 = no cancer
- The general inference problem is to determine the joint distribution p(x, C_k), which gives the most complete description of the situation
- In the end we must decide whether or not to give treatment; decision theory helps us do this
Bayes' Decision

- How do probabilities play a role in making a decision?
- Given input x and classes C_k, apply Bayes' theorem:

p(C_k | x) = p(x | C_k) p(C_k) / p(x)

- Every quantity in Bayes' theorem can be obtained from the joint distribution p(x, C_k), by either marginalizing or conditioning with respect to the appropriate variable
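For the cancer example this is a one-line computation. The priors and likelihoods below are hypothetical numbers chosen for illustration (the slides give none):

```python
# Hypothetical numbers for the two-class cancer example:
p_c = [0.01, 0.99]        # priors: p(C1 = cancer), p(C2 = no cancer)
p_x_given_c = [0.9, 0.1]  # likelihoods p(x | C_k) for one observed image x

# Bayes' theorem: p(C_k | x) = p(x | C_k) p(C_k) / p(x),
# where p(x) = sum_k p(x | C_k) p(C_k) (sum rule + product rule).
evidence = sum(l * pr for l, pr in zip(p_x_given_c, p_c))
posterior = [l * pr / evidence for l, pr in zip(p_x_given_c, p_c)]
# Even a strongly suggestive image leaves p(cancer | x) around 0.083,
# because the prior probability of cancer is low.
```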
Minimum Expected Loss

Example: classify medical images as 'cancer' or 'normal'

- Mistakes have unequal importance: missing a cancer is far worse than a false alarm
- The loss (or cost) function is given by a loss matrix L_kj (rows: truth, columns: decision); utility is the negative of loss
- Minimize the average loss:

E[L] = Σ_k Σ_j ∫_{R_j} L_kj p(x, C_k) dx

- The decision regions R_j are chosen to minimize E[L]
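For a single input x this reduces to picking the column of the loss matrix with the smallest posterior-weighted loss. A sketch assuming NumPy; the loss values and posteriors are hypothetical:

```python
import numpy as np

# Loss matrix L[k, j]: true class k, decision j (index 0 = cancer, 1 = normal).
# Hypothetical values: missing a cancer costs 1000, a false alarm costs 1.
L = np.array([[0.0, 1000.0],
              [1.0, 0.0]])

def decide(posterior):
    """Choose the decision j minimizing sum_k L[k, j] * p(C_k | x)."""
    expected_loss = posterior @ L
    return int(np.argmin(expected_loss))

# With such an asymmetric loss we decide 'cancer' even when its
# posterior probability is well below one half:
d1 = decide(np.array([0.3, 0.7]))        # expected losses [0.7, 300.0]
d2 = decide(np.array([0.0005, 0.9995]))  # only here does 'normal' win
```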
Why Separate Inference and Decision?

The classification problem breaks into two separate stages:
- Inference stage: training data is used to learn a model for p(C_k | x)
- Decision stage: posterior probabilities are used to make optimal class assignments

Three distinct approaches to solving decision problems:
1. Generative models
2. Discriminative models
3. Discriminant functions
Generative models

1. solve the inference problem of determining the class-conditional densities p(x | C_k) for each class separately, and the prior probabilities p(C_k)
2. use Bayes' theorem to determine the posterior probabilities:

p(C_k | x) = p(x | C_k) p(C_k) / p(x)

3. use decision theory to determine class membership
Discriminative models

1. solve the inference problem of determining the posterior class probabilities p(C_k | x) directly
2. use decision theory to determine class membership
Discriminant functions

1. find a function f(x) that maps each input x directly to a class label

e.g. for a two-class problem, f(·) is binary valued: f = 0 represents C1, f = 1 represents C2

Probabilities play no role.
Decision Theory for Regression

Inference step: determine p(t | x).
Decision step: for given x, make an optimal prediction y(x) for t.

Expected loss: E[L] = ∫∫ L(t, y(x)) p(x, t) dx dt

For the squared loss L = {y(x) − t}², the optimal prediction is the conditional mean y(x) = E[t | x].
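That the conditional mean minimizes expected squared loss can be seen numerically; a sketch assuming NumPy, with p(t | x) taken to be a Gaussian with mean 2.0 purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Draws of t from p(t | x) for one fixed x; under squared loss the optimal
# prediction y(x) is the conditional mean E[t | x] (here 2.0).
t = rng.normal(loc=2.0, scale=1.0, size=100_000)

candidates = np.linspace(0.0, 4.0, 81)  # grid of candidate predictions y
losses = np.array([np.mean((t - y) ** 2) for y in candidates])
best = float(candidates[np.argmin(losses)])  # grid point nearest to t.mean()
```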
Outline

- Introduction
- Probability Theory
- Decision Theory
- Information Theory
Information theory

- Quantification of information as degree of surprise:
  highly improbable event → a lot of information
  highly probable event → less information
  certain event → no information
- Based on probability theory
- Most important quantity: entropy
Entropy

H[x] = −Σ_x p(x) ln p(x)

Entropy is the average amount of information expected, weighted by the probability of each value of the random variable; it quantifies the uncertainty involved when we encounter this random variable.
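The definition is a few lines of code; a sketch assuming NumPy, with the example distributions chosen for illustration:

```python
import numpy as np

def entropy(p):
    """H[x] = -sum_x p(x) ln p(x), taking 0 ln 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

h_uniform = entropy([0.25, 0.25, 0.25, 0.25])  # ln 4: maximal for 4 states
h_certain = entropy([1.0, 0.0, 0.0, 0.0])      # 0: no surprise, no information
h_skewed = entropy([0.7, 0.1, 0.1, 0.1])       # in between
```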
The Kullback-Leibler Divergence

KL(p ‖ q) = −∫ p(x) ln( q(x) / p(x) ) dx

- A non-symmetric measure of the difference between two probability distributions
- Also called relative entropy
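The discrete form makes the non-symmetry easy to verify; a sketch assuming NumPy, with p and q chosen arbitrarily:

```python
import numpy as np

def kl(p, q):
    """Discrete KL(p || q) = sum_x p(x) ln( p(x) / q(x) )."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))

p = [0.5, 0.5]
q = [0.9, 0.1]
kl_pq = kl(p, q)
kl_qp = kl(q, p)  # differs from kl_pq: the divergence is not symmetric
kl_pp = kl(p, p)  # 0: a distribution diverges from itself by nothing
```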
Mutual Information

Consider two sets of variables, x and y.

If independent:      p(x, y) = p(x) p(y)
If not independent:  measure how close they are to being independent via the KL divergence between the joint and the product of the marginals:

I[x, y] = KL( p(x, y) ‖ p(x) p(y) )
Mutual Information

Mutual information measures the mutual dependence of x and y: the information they share. It is related to the conditional entropy through

I[x, y] = H[x] − H[x | y] = H[y] − H[y | x]
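Both properties can be checked on a small joint table; a sketch assuming NumPy (the joint distribution is an arbitrary illustrative choice, and the entropy helper is repeated here so the block is self-contained):

```python
import numpy as np

def entropy(p):
    """H = -sum p ln p over the nonzero entries of p."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

# A joint distribution p(x, y) whose variables are clearly dependent.
p_xy = np.array([[0.30, 0.10],
                 [0.10, 0.50]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

# I[x, y] = H[x] + H[y] - H[x, y], equivalent to H[x] - H[x | y].
mi = entropy(p_x) + entropy(p_y) - entropy(p_xy)

# For a product distribution p(x) p(y) the mutual information vanishes.
mi_indep = entropy(p_x) + entropy(p_y) - entropy(np.outer(p_x, p_y))
```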
Course schedule

Date        Topic                                                                   Chapter
22-09-2010  Probability, Decision, and Information Theory                            1
13-10-2010  Density Estimation, Bayesian Inference                                   2
20-10-2010  Linear Models for Regression                                             3
27-10-2010  Linear Models for Classification                                         4
03-11-2010  Kernel Methods I: Gaussian Processes                                     6
10-11-2010  Kernel Methods II: SVM and RVM                                           7
17-11-2010  Probabilistic Graphical Models                                           8
24-11-2010  Mixture Models and EM                                                    9
01-12-2010  Approximate Inference I: Deterministic Approximations                   10
08-12-2010  Approximate Inference II: Stochastic Approximations                     11
15-12-2010  Inference on Continuous Latent Variables: PCA, Probabilistic PCA, ICA   12
22-12-2010  Sequential Data: Hidden Markov Models, Linear Dynamical Systems         13