Laboratory for Social & Neural Systems Research (SNS)
Institute of Empirical Research in Economics (IEW)

PATTERN RECOGNITION AND MACHINE LEARNING
Computational Neuroeconomics and Neuroscience, 22-09-2010
Course schedule

Date        Topic                                    Chapter  Presenters
13-10-2010  Density Estimation, Bayesian Inference   2       Adrian Etter, Marco Piccirelli, Giuseppe Ugazio
20-10-2010  Linear Models for Regression              3       Susanne Leiberg, Grit Hein
27-10-2010  Linear Models for Classification          4       Friederike Meyer, Chaohui Guo
03-11-2010  Kernel Methods I: Gaussian Processes      6       Kate Lomakina
10-11-2010  Kernel Methods II: SVM and RVM            7       Christoph Mathys, Morteza Moazami
17-11-2010  Probabilistic Graphical Models            8       Justin Chumbley
Course schedule (continued)

Date        Topic                                                                   Chapter  Presenters
24-11-2010  Mixture Models and EM                                                    9       Bastiaan Oud, Tony Williams
01-12-2010  Approximate Inference I: Deterministic Approximations                   10       Falk Lieder
08-12-2010  Approximate Inference II: Stochastic Approximations                     11       Kay Brodersen
15-12-2010  Inference on Continuous Latent Variables: PCA, Probabilistic PCA, ICA   12       Lars Kasper
22-12-2010  Sequential Data: Hidden Markov Models, Linear Dynamical Systems         13       Chris Burke, Yosuke Morishima

Sandra Iglesias
CHAPTER 1: PROBABILITY, DECISION, AND INFORMATION THEORY
Outline

- Introduction
- Probability Theory
  - Probability Rules
  - Bayes' Theorem
  - Gaussian Distribution
- Decision Theory
- Information Theory
Pattern recognition
- computer algorithms for the automatic discovery of regularities in data
- use of these regularities to take actions, such as classifying the data into different categories
- data (patterns) are classified based either on
  - a priori knowledge, or
  - statistical information extracted from the patterns
Machine learning

'How can we program systems to automatically learn and to improve with experience?'

- the machine is programmed to learn from an incomplete set of examples (the training set)
- the core objective of a learner is to generalize from its experience
Polynomial Curve Fitting

Target function: sin(2πx), observed with noise; fit a polynomial y(x, w) = w₀ + w₁x + … + w_M x^M.
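This running example can be sketched numerically. A minimal least-squares fit, assuming NumPy; the sample size, noise level, and random seed are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# N noisy observations of sin(2*pi*x), as in the running example.
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

def fit_polynomial(x, t, M):
    """Least-squares fit of a degree-M polynomial; returns coefficients w."""
    Phi = np.vander(x, M + 1, increasing=True)  # columns x^0, x^1, ..., x^M
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def predict(w, x):
    return np.vander(np.atleast_1d(x), len(w), increasing=True) @ w

w3 = fit_polynomial(x, t, M=3)  # smooth fit, close to sin(2*pi*x)
w9 = fit_polynomial(x, t, M=9)  # passes through all 10 points (over-fits)
```

With M = 9 there are as many coefficients as data points, so the fitted curve interpolates the noisy targets exactly — the over-fitting discussed next.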
Sum-of-Squares Error Function

E(w) = (1/2) Σ_n {y(x_n, w) − t_n}²
Plots of polynomials (fits for different polynomial orders M)
Over-fitting

Root-Mean-Square (RMS) error: E_RMS = √(2 E(w*) / N)
Regularization

Penalize large coefficient values:

Ẽ(w) = (1/2) Σ_n {y(x_n, w) − t_n}² + (λ/2) ‖w‖²
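A minimal sketch of the penalized fit via the normal equations, assuming NumPy; the data set and λ values are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=10)

def fit_ridge(x, t, M, lam):
    """Minimize (1/2)||Phi w - t||^2 + (lam/2)||w||^2 via normal equations."""
    Phi = np.vander(x, M + 1, increasing=True)
    # Solve (Phi^T Phi + lam I) w = Phi^T t.
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)

w_loose = fit_ridge(x, t, M=9, lam=1e-12)  # ~unregularized: huge coefficients
w_tight = fit_ridge(x, t, M=9, lam=1.0)    # penalty shrinks the coefficients
```

Increasing λ trades training error for smaller coefficient magnitudes, which is exactly how the M = 9 fit is tamed.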
Regularization: E_RMS vs. ln λ (M = 9)
Outline

- Introduction
- Probability Theory
- Decision Theory
- Information Theory
Probability Theory

Probability theory provides a consistent framework for the quantification and manipulation of uncertainty.

Sources of uncertainty:
- noise on measurements
- finite size of data sets
Probability Theory

Consider two random variables: X taking values x_i (i = 1, …, M) and Y taking values y_j (j = 1, …, L), over N trials.

- n_ij: number of trials in which X = x_i and Y = y_j
- c_i: number of trials in which X = x_i, irrespective of the value of Y
- r_j: number of trials in which Y = y_j, irrespective of the value of X

Joint probability:       p(X = x_i, Y = y_j) = n_ij / N
Marginal probability:    p(X = x_i) = c_i / N
Conditional probability: p(Y = y_j | X = x_i) = n_ij / c_i
The Rules of Probability

Sum rule:      p(X) = Σ_Y p(X, Y)
Product rule:  p(X, Y) = p(Y | X) p(X)
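Both rules can be checked directly on a toy joint distribution (the numbers below are made up for illustration):

```python
import numpy as np

# Toy joint distribution p(X, Y): rows index x_i, columns index y_j.
p_xy = np.array([[0.10, 0.20],
                 [0.15, 0.25],
                 [0.05, 0.25]])

# Sum rule: marginals are obtained by summing out the other variable.
p_x = p_xy.sum(axis=1)  # p(X) = sum_Y p(X, Y)
p_y = p_xy.sum(axis=0)  # p(Y) = sum_X p(X, Y)

# Conditional from the definition: p(Y | X) = p(X, Y) / p(X).
p_y_given_x = p_xy / p_x[:, None]

# Product rule recovers the joint: p(X, Y) = p(Y | X) p(X).
recovered = p_y_given_x * p_x[:, None]
```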
Bayes' Theorem

Since p(X, Y) = p(Y, X), the product rule gives:

p(Y | X) = p(X | Y) p(Y) / p(X)

T. Bayes (1702-1761)
P.-S. Laplace (1749-1827)
Bayes' Theorem

posterior ∝ likelihood × prior

Applied to the polynomial curve fitting problem:

p(w | D) = p(D | w) p(w) / p(D)
Probability Densities

For a continuous variable x, p(x) is a probability density: p(x ∈ (a, b)) = ∫_a^b p(x) dx, with p(x) ≥ 0 and ∫ p(x) dx = 1.
Expectations

The expectation of f(x) is the average value of the function f(x) under the probability distribution p(x):

Discrete distribution:   E[f] = Σ_x p(x) f(x)
Continuous distribution: E[f] = ∫ p(x) f(x) dx
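A small sketch of both views, assuming NumPy; the distribution and f are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# A discrete distribution p(x) over three values, and a function f(x) = x^2.
x_vals = np.array([0.0, 1.0, 2.0])
p = np.array([0.2, 0.5, 0.3])
f = lambda x: x ** 2

# Discrete expectation: E[f] = sum_x p(x) f(x).
exact = float(np.sum(p * f(x_vals)))  # 0.2*0 + 0.5*1 + 0.3*4 = 1.7

# Sampling approximation: E[f] ~ (1/N) sum_n f(x_n), with x_n drawn from p(x).
samples = rng.choice(x_vals, size=100_000, p=p)
estimate = float(f(samples).mean())
```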
The Gaussian Distribution

N(x | μ, σ²) = (1 / √(2πσ²)) exp{ −(x − μ)² / (2σ²) }
Gaussian Parameter Estimation

Likelihood function for i.i.d. observations x = (x_1, …, x_N):

p(x | μ, σ²) = Π_{n=1..N} N(x_n | μ, σ²)
Maximum (Log) Likelihood

ln p(x | μ, σ²) = −(1 / 2σ²) Σ_n (x_n − μ)² − (N/2) ln σ² − (N/2) ln(2π)

μ_ML = (1/N) Σ_n x_n        σ²_ML = (1/N) Σ_n (x_n − μ_ML)²
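The closed-form ML estimates can be checked on synthetic data; a sketch assuming NumPy, with the true parameters chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(2)

# Samples from a Gaussian with known parameters, to check the ML formulas.
x = rng.normal(loc=1.5, scale=2.0, size=50_000)

mu_ml = x.mean()                       # mu_ML = (1/N) sum_n x_n
sigma2_ml = ((x - mu_ml) ** 2).mean()  # sigma^2_ML divides by N, not N-1,
                                       # so it is biased slightly low
```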
Curve Fitting Re-visited

Model the target as Gaussian noise around the polynomial:

p(t | x, w, β) = N(t | y(x, w), β⁻¹)
Maximum Likelihood

Maximizing the likelihood with respect to w is equivalent to minimizing the sum-of-squares error E(w); this determines w_ML.
Outline

- Introduction
- Probability Theory
- Decision Theory
- Information Theory
Decision Theory

- Used with probability theory to make optimal decisions
- Input vector x, target vector t
- Regression: t is continuous
- Classification: t consists of class labels
- The joint distribution p(x, t) provides a complete summary of the associated uncertainty
- Inference problem: obtain p(x, t) from data
- Decision problem: make a specific prediction for the value of t, and take specific actions based on t

Inference step: determine p(x, t) or p(t | x).   Decision step: for given x, determine the optimal t.
Medical Diagnosis Problem

- X-ray image of a patient; decide whether the patient has cancer or not
- Input vector x: set of pixel intensities
- Output variable t: whether cancer or not
- C1 = cancer; C2 = no cancer
- The general inference problem is to determine the joint distribution p(x, C_k), which gives the most complete description of the situation
- In the end we must decide whether or not to give treatment; decision theory helps us do this
Bayes' Decision

- How do probabilities play a role in making a decision?
- Given input x and classes C_k, apply Bayes' theorem:

p(C_k | x) = p(x | C_k) p(C_k) / p(x)

- Every quantity in Bayes' theorem can be obtained from the joint distribution p(x, C_k), by either marginalizing or conditioning with respect to the appropriate variable
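For the cancer example this is a one-line computation. The priors and likelihoods below are hypothetical numbers chosen for illustration (the slides give none):

```python
# Hypothetical numbers for the two-class cancer example:
p_c = [0.01, 0.99]        # priors: p(C1 = cancer), p(C2 = no cancer)
p_x_given_c = [0.9, 0.1]  # likelihoods p(x | C_k) for one observed image x

# Bayes' theorem: p(C_k | x) = p(x | C_k) p(C_k) / p(x),
# where p(x) = sum_k p(x | C_k) p(C_k) (sum rule + product rule).
evidence = sum(l * pr for l, pr in zip(p_x_given_c, p_c))
posterior = [l * pr / evidence for l, pr in zip(p_x_given_c, p_c)]
# Even a strongly suggestive image leaves p(cancer | x) around 0.083,
# because the prior probability of cancer is low.
```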
Minimum Expected Loss

Example: classify medical images as 'cancer' or 'normal'

- Mistakes have unequal importance: missing a cancer is far worse than a false alarm
- The loss (or cost) function is given by a loss matrix L_kj (rows: truth, columns: decision); utility is the negative of loss
- Minimize the average loss:

E[L] = Σ_k Σ_j ∫_{R_j} L_kj p(x, C_k) dx

- The decision regions R_j are chosen to minimize E[L]
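For a single input x this reduces to picking the column of the loss matrix with the smallest posterior-weighted loss. A sketch assuming NumPy; the loss values and posteriors are hypothetical:

```python
import numpy as np

# Loss matrix L[k, j]: true class k, decision j (index 0 = cancer, 1 = normal).
# Hypothetical values: missing a cancer costs 1000, a false alarm costs 1.
L = np.array([[0.0, 1000.0],
              [1.0, 0.0]])

def decide(posterior):
    """Choose the decision j minimizing sum_k L[k, j] * p(C_k | x)."""
    expected_loss = posterior @ L
    return int(np.argmin(expected_loss))

# With such an asymmetric loss we decide 'cancer' even when its
# posterior probability is well below one half:
d1 = decide(np.array([0.3, 0.7]))        # expected losses [0.7, 300.0]
d2 = decide(np.array([0.0005, 0.9995]))  # only here does 'normal' win
```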
Why Separate Inference and Decision?

The classification problem breaks into two separate stages:
- Inference stage: training data is used to learn a model for p(C_k | x)
- Decision stage: posterior probabilities are used to make optimal class assignments

Three distinct approaches to solving decision problems:
1. Generative models
2. Discriminative models
3. Discriminant functions
Generative models

1. solve the inference problem of determining the class-conditional densities p(x | C_k) for each class separately, and the prior probabilities p(C_k)
2. use Bayes' theorem to determine the posterior probabilities:

p(C_k | x) = p(x | C_k) p(C_k) / p(x)

3. use decision theory to determine class membership
Discriminative models

1. solve the inference problem of determining the posterior class probabilities p(C_k | x) directly
2. use decision theory to determine class membership
Discriminant functions

1. find a function f(x) that maps each input x directly to a class label

e.g. for a two-class problem, f(·) is binary valued: f = 0 represents C1, f = 1 represents C2

Probabilities play no role.
Decision Theory for Regression

Inference step: determine p(t | x).
Decision step: for given x, make an optimal prediction y(x) for t.

Expected loss: E[L] = ∫∫ L(t, y(x)) p(x, t) dx dt

For the squared loss L = {y(x) − t}², the optimal prediction is the conditional mean y(x) = E[t | x].
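That the conditional mean minimizes expected squared loss can be seen numerically; a sketch assuming NumPy, with p(t | x) taken to be a Gaussian with mean 2.0 purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Draws of t from p(t | x) for one fixed x; under squared loss the optimal
# prediction y(x) is the conditional mean E[t | x] (here 2.0).
t = rng.normal(loc=2.0, scale=1.0, size=100_000)

candidates = np.linspace(0.0, 4.0, 81)  # grid of candidate predictions y
losses = np.array([np.mean((t - y) ** 2) for y in candidates])
best = float(candidates[np.argmin(losses)])  # grid point nearest to t.mean()
```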
Outline

- Introduction
- Probability Theory
- Decision Theory
- Information Theory
Information theory

- Quantification of information as degree of surprise:
  highly improbable event → a lot of information
  highly probable event → less information
  certain event → no information
- Based on probability theory
- Most important quantity: entropy
Entropy

H[x] = −Σ_x p(x) ln p(x)

Entropy is the average amount of information expected, weighted by the probability of each value of the random variable; it quantifies the uncertainty involved when we encounter this random variable.
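The definition is a few lines of code; a sketch assuming NumPy, with the example distributions chosen for illustration:

```python
import numpy as np

def entropy(p):
    """H[x] = -sum_x p(x) ln p(x), taking 0 ln 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

h_uniform = entropy([0.25, 0.25, 0.25, 0.25])  # ln 4: maximal for 4 states
h_certain = entropy([1.0, 0.0, 0.0, 0.0])      # 0: no surprise, no information
h_skewed = entropy([0.7, 0.1, 0.1, 0.1])       # in between
```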
The Kullback-Leibler Divergence

KL(p ‖ q) = −∫ p(x) ln( q(x) / p(x) ) dx

- A non-symmetric measure of the difference between two probability distributions
- Also called relative entropy
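The discrete form makes the non-symmetry easy to verify; a sketch assuming NumPy, with p and q chosen arbitrarily:

```python
import numpy as np

def kl(p, q):
    """Discrete KL(p || q) = sum_x p(x) ln( p(x) / q(x) )."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))

p = [0.5, 0.5]
q = [0.9, 0.1]
kl_pq = kl(p, q)
kl_qp = kl(q, p)  # differs from kl_pq: the divergence is not symmetric
kl_pp = kl(p, p)  # 0: a distribution diverges from itself by nothing
```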
Mutual Information

Consider two sets of variables, x and y.

If independent:      p(x, y) = p(x) p(y)
If not independent:  measure how close they are to being independent via the KL divergence between the joint and the product of the marginals:

I[x, y] = KL( p(x, y) ‖ p(x) p(y) )
Mutual Information

Mutual information measures the mutual dependence of x and y: the information they share. It is related to the conditional entropy through

I[x, y] = H[x] − H[x | y] = H[y] − H[y | x]
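Both properties can be checked on a small joint table; a sketch assuming NumPy (the joint distribution is an arbitrary illustrative choice, and the entropy helper is repeated here so the block is self-contained):

```python
import numpy as np

def entropy(p):
    """H = -sum p ln p over the nonzero entries of p."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

# A joint distribution p(x, y) whose variables are clearly dependent.
p_xy = np.array([[0.30, 0.10],
                 [0.10, 0.50]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

# I[x, y] = H[x] + H[y] - H[x, y], equivalent to H[x] - H[x | y].
mi = entropy(p_x) + entropy(p_y) - entropy(p_xy)

# For a product distribution p(x) p(y) the mutual information vanishes.
mi_indep = entropy(p_x) + entropy(p_y) - entropy(np.outer(p_x, p_y))
```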
Course schedule

Date        Topic                                                                   Chapter
22-09-2010  Probability, Decision, and Information Theory                            1
13-10-2010  Density Estimation, Bayesian Inference                                   2
20-10-2010  Linear Models for Regression                                             3
27-10-2010  Linear Models for Classification                                         4
03-11-2010  Kernel Methods I: Gaussian Processes                                     6
10-11-2010  Kernel Methods II: SVM and RVM                                           7
17-11-2010  Probabilistic Graphical Models                                           8
24-11-2010  Mixture Models and EM                                                    9
01-12-2010  Approximate Inference I: Deterministic Approximations                   10
08-12-2010  Approximate Inference II: Stochastic Approximations                     11
15-12-2010  Inference on Continuous Latent Variables: PCA, Probabilistic PCA, ICA   12
22-12-2010  Sequential Data: Hidden Markov Models, Linear Dynamical Systems         13