Page 1: Part I: Classifier Performance

Part I: Classifier Performance

Mahesan Niranjan

Department of Computer ScienceThe University of Sheffield

[email protected]

&Cambridge Bioinformatics Limited

[email protected]

Page 2: Part I: Classifier Performance

BCS, Exeter, July 2004, Mahesan Niranjan

Relevant Reading

• Bishop, Neural Networks for Pattern Recognition

• http://www.ncrg.aston.ac.uk/netlab
• David Hand, Construction and Assessment of Classification Rules
• Lovell et al., CUED/F-INFENG/TR.299
• Scott et al., CUED/F-INFENG/TR.323

Reports linked from http://www.dcs.shef.ac.uk/~niranjan

Page 3: Part I: Classifier Performance


Pattern Recognition Framework

Page 4: Part I: Classifier Performance


Two Approaches to Pattern Recognition

• Probabilistic via explicit modelling of probabilities encountered in Bayes’ formula

• Parametric: choose a parametric form for the class boundary and optimise it directly
• In some specific cases (often not) both approaches reduce to the same answer

Page 5: Part I: Classifier Performance


Pattern Recognition: Simple case

Gaussian distributions, isotropic with equal variances

Optimal Classifier:

• Distance to mean
• Linear class boundary
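For this simple case the optimal rule fits in a few lines of code. A minimal sketch, assuming two illustrative class means (not taken from the slides):

```python
import numpy as np

# Bayes-optimal rule for two isotropic, equal-variance Gaussian classes:
# assign each point to the class whose mean is nearer. The resulting
# boundary is linear (the perpendicular bisector of the two means).
mu0 = np.array([0.0, 0.0])   # hypothetical class-0 mean
mu1 = np.array([4.0, 0.0])   # hypothetical class-1 mean

def nearest_mean(x):
    """Return 0 or 1 according to the closer class mean."""
    return 0 if np.linalg.norm(x - mu0) <= np.linalg.norm(x - mu1) else 1

print(nearest_mean(np.array([1.0, 3.0])))   # 0: left of the bisector x = 2
print(nearest_mean(np.array([3.5, -1.0])))  # 1: right of it
```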

Page 6: Part I: Classifier Performance


Distance can be misleading


Mahalanobis Distance

Optimal Classifier for this case is Fisher Linear Discriminant
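A small numerical illustration of the point, with an assumed non-isotropic covariance: two points at the same Euclidean distance from the mean can sit at very different Mahalanobis distances.

```python
import numpy as np

# Mahalanobis distance rescales by the covariance, so "distance to the
# mean" is measured in units of spread along each direction.
mu = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],
                [0.0, 0.25]])      # elongated along x, narrow along y
cov_inv = np.linalg.inv(cov)

def mahalanobis(x):
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Both points are Euclidean distance 2 from the mean:
print(mahalanobis(np.array([2.0, 0.0])))  # 1.0, along the long axis
print(mahalanobis(np.array([0.0, 2.0])))  # 4.0, along the narrow axis
```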

Page 7: Part I: Classifier Performance


Support Vector Machines: Maximum Margin Perceptron

[Figure: two classes, X and O, separated by a maximum-margin linear boundary; the points closest to the boundary are the support vectors]

Page 8: Part I: Classifier Performance


Support Vector Machines: Nonlinear Kernel Functions

[Figure: X and O classes that are not linearly separable; a nonlinear kernel produces a curved class boundary]

Page 9: Part I: Classifier Performance


Support Vector Machines: Computations

• Quadratic Programming

• Class boundary defined only by data that lie close to it - support vectors

• Kernels in data space equal scalar products in higher dimensional space

Minimise the quadratic form ½ xᵀA x subject to the box constraints 0 ≤ xᵢ ≤ C
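The third bullet can be verified for a concrete kernel: the polynomial kernel (1 + x·y)² on 2-D data equals a plain scalar product between explicit 6-D feature vectors. The kernel and test points below are illustrative:

```python
import numpy as np

def poly_kernel(x, y):
    # Kernel evaluated in the 2-D data space
    return (1.0 + x @ y) ** 2

def phi(x):
    # Explicit feature map into 6 dimensions for the same kernel
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(poly_kernel(x, y))  # 4.0
print(phi(x) @ phi(y))    # 4.0: the same value as a higher-dimensional dot product
```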

Page 10: Part I: Classifier Performance


Support Vector Machines: The Hype

• Strong theoretical basis - Computational Learning Theory; complexity controlled by the Vapnik-Chervonenkis dimension

• Not many parameters to tune

• High performance on many practical problems, high dimensional problems in particular

Page 11: Part I: Classifier Performance


Support Vector Machines: The Truth

• Worst case bounds from Learning theory are not very practical

• Several parameters to tune:
– What kernel?
– Internal workings of the optimiser
– Noise in training data

• Performance? – depends on who you ask

Page 12: Part I: Classifier Performance


SVM: a data-driven kernel

• Fisher Kernel [Jaakkola & Haussler]
– Kernel based on a generative model of all the data

Generative model: p(x|θ)

Score: U_x = ∇_θ ln p(x|θ)

Kernel: K(x_i, x_j) = U_{x_i}ᵀ I⁻¹ U_{x_j}, with I the Fisher information matrix
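As a toy instance of the Fisher kernel (a one-dimensional Gaussian with unknown mean and known variance, chosen here for illustration rather than taken from the talk), the score and Fisher information have closed forms:

```python
# Fisher kernel for p(x|mu) = N(mu, sigma2) with sigma2 known:
#   score U_x = d/dmu ln p(x|mu) = (x - mu) / sigma2
#   Fisher information I = E[U_x^2] = 1 / sigma2
mu, sigma2 = 1.0, 2.0        # assumed parameters of the generative model

def score(x):
    return (x - mu) / sigma2

fisher_info = 1.0 / sigma2

def fisher_kernel(xi, xj):
    # K(xi, xj) = U_xi * I^-1 * U_xj
    return score(xi) * (1.0 / fisher_info) * score(xj)

print(fisher_kernel(3.0, 5.0))  # 1.0 * 2.0 * 2.0 = 4.0
```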

Page 13: Part I: Classifier Performance


Classifier Performance

• Error rates can be misleading

– Imbalance in training/test data:
• 98% of population healthy
• 2% of population has disease

– Cost of misclassification can change after design of classifier
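The healthy/disease example can be made concrete: on a 98/2 split, a classifier that always answers "healthy" has an impressive error rate and no clinical value at all.

```python
# 0 = healthy, 1 = disease; 98% / 2% population split as on the slide.
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100            # trivial classifier: always predict "healthy"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sensitivity = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / 2

print(accuracy)      # 0.98: looks excellent
print(sensitivity)   # 0.0: every diseased case is missed
```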

Page 14: Part I: Classifier Performance


[Figure: distributions of scores for adverse and benign outcomes; the chosen threshold sets the class boundary and trades one error type against the other]

Page 15: Part I: Classifier Performance


[Figure: ROC curve, true positive rate against false positive rate]

Area under the ROC Curve: Neat Statistical Interpretation
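The interpretation referred to: the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (the Mann-Whitney statistic). A sketch with made-up scores:

```python
import numpy as np

# Classifier scores for positive and negative examples (illustrative).
pos = np.array([0.9, 0.8, 0.4])
neg = np.array([0.7, 0.3, 0.2, 0.1])

# AUC as the fraction of positive/negative pairs ranked correctly
# (ties count one half).
pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
auc = sum(pairs) / len(pairs)
print(auc)   # 11/12: only the pair (0.4 vs 0.7) is ranked the wrong way
```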

Page 16: Part I: Classifier Performance


Convex Hull of ROC Curves

[Figure: convex hull of several ROC curves, true positive rate against false positive rate]

Page 17: Part I: Classifier Performance


Yeast Gene Example: MATLAB Demo here

Page 18: Part I: Classifier Performance

Part II: Particle Filters for Tracking and Sequential Problems

Mahesan Niranjan

Department of Computer ScienceThe University of Sheffield

Page 19: Part I: Classifier Performance


Overview

• Motivation

• State Space Model

• Kalman Filter and Extensions

• Sequential MCMC Methods

– Particle Filter & Variants

Page 20: Part I: Classifier Performance


Motivation

• Neural Networks for Learning:
– Function Approximation
– Statistical Estimation
– Dynamical Systems
– Parallel Processing

• Guarantee Generalisation:
– Regularise / control complexity
– Cross validate to detect / avoid overfitting
– Bootstrap to deal with model / data uncertainty

• Many of the above tricks won’t work in a sequential setting

Page 21: Part I: Classifier Performance


Interesting Applications

• Speech Signal Processing

• Medical Signals

– Monitoring Liver Transplant Patients

• Tracking the prices of options contracts in computational finance

Page 22: Part I: Classifier Performance


Good References

• Bar-Shalom and Fortmann:

Tracking and Data Association

• Jazwinski:

Stochastic Processes and Filtering Theory

• Arulampalam et al:

“Tutorial on Particle Filters…”; IEEE Transactions on Signal Processing

• Arnaud Doucet:

Technical Report 310, Cambridge University Engineering Department

• Benveniste, A et al:

Adaptive Algorithms and Stochastic Approximation

• Simon Haykin:

Adaptive Filters

Page 23: Part I: Classifier Performance


Matrix Inversion Lemma
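The equation on this slide did not survive extraction. The lemma (the Sherman-Morrison-Woodbury identity) states (A + UCV)⁻¹ = A⁻¹ − A⁻¹U(C⁻¹ + VA⁻¹U)⁻¹VA⁻¹, which a few lines of NumPy can check on a random example:

```python
import numpy as np

# Numerical check of the matrix inversion lemma on random matrices.
rng = np.random.default_rng(0)
n, k = 5, 2
A = np.diag(rng.uniform(1.0, 2.0, n))   # well-conditioned diagonal A
U = rng.standard_normal((n, k))
C = np.eye(k)
V = rng.standard_normal((k, n))

lhs = np.linalg.inv(A + U @ C @ V)
Ainv = np.linalg.inv(A)
rhs = Ainv - Ainv @ U @ np.linalg.inv(np.linalg.inv(C) + V @ Ainv @ U) @ V @ Ainv

print(np.allclose(lhs, rhs))  # the two sides agree
```

The lemma is what makes recursive estimation cheap: a low-rank data update becomes a low-rank update of the inverse, instead of a fresh matrix inversion.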

Page 24: Part I: Classifier Performance


Linear Regression

Page 25: Part I: Classifier Performance


Recursive Least Squares
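A minimal sketch of recursive least squares on synthetic noiseless data (all data below are illustrative): the inverse data covariance P is updated one sample at a time via the matrix inversion lemma, and the weights converge to the batch least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                     # noiseless targets for clarity

w = np.zeros(3)
P = 1e6 * np.eye(3)                # large initial P: a vague prior
for x_t, y_t in zip(X, y):
    k = P @ x_t / (1.0 + x_t @ P @ x_t)   # gain vector
    w = w + k * (y_t - x_t @ w)           # correct by the prediction error
    P = P - np.outer(k, x_t @ P)          # rank-one update of the inverse

print(np.round(w, 3))   # recovers w_true = [1, -2, 0.5]
```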

Page 26: Part I: Classifier Performance


State Space Model

State equation, driven by process noise

Observation equation, corrupted by measurement noise

Page 27: Part I: Classifier Performance


Simple Linear Gaussian Model

Page 28: Part I: Classifier Performance


Kalman Filter

Prediction

Correction

Page 29: Part I: Classifier Performance


Kalman Filter

Innovation

Kalman Gain
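A scalar sketch tying the ingredients together (prediction, innovation, gain, correction) for a random-walk state observed in Gaussian noise; the noise levels and true state are illustrative:

```python
import numpy as np

q, r = 1e-4, 0.5            # process and measurement noise variances
x_est, p_est = 0.0, 1.0     # initial state estimate and its variance

def kf_step(x_est, p_est, y):
    # Prediction: random-walk state, so the mean carries over
    x_pred, p_pred = x_est, p_est + q
    # Innovation: what the measurement says beyond the prediction
    innovation = y - x_pred
    # Kalman gain: how much of the innovation to believe
    gain = p_pred / (p_pred + r)
    # Correction
    return x_pred + gain * innovation, (1.0 - gain) * p_pred

rng = np.random.default_rng(2)
truth = 3.0
for _ in range(200):
    y = truth + rng.normal(scale=np.sqrt(r))
    x_est, p_est = kf_step(x_est, p_est, y)

print(x_est)   # settles near the true value 3.0
```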

Page 30: Part I: Classifier Performance


Bayesian Setting: Prior and Likelihood

Innovation Probability

• Run multiple models and switch (Bar-Shalom)
• Set noise levels to maximum-likelihood values (Jazwinski)

Page 31: Part I: Classifier Performance


Extended Kalman Filter

Lee Feldkamp @ Ford: successful training of Recurrent Neural Networks

Taylor Series Expansion around the operating point

First Order

Second Order

Iterated Extended Kalman Filter

Page 32: Part I: Classifier Performance


Iterated Extended Kalman Filter

Local Linearization of State and / or Observation

Propagation and Update

Page 33: Part I: Classifier Performance


Unscented Kalman Filter

Generate a set of points at the current time

So they can represent the mean and covariance:

Propagate these through the state equations

Recompute predicted mean and covariance:
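The recipe can be written down concretely for the standard sigma-point construction (2n+1 points; the scaling constant κ here is an assumed tuning choice): the weighted sample mean and covariance of the points reproduce (m, P) exactly, and the same recomputation is applied after pushing the points through the state equations.

```python
import numpy as np

def sigma_points(m, P, kappa=1.0):
    """Standard 2n+1 sigma points for mean m and covariance P."""
    n = len(m)
    S = np.linalg.cholesky((n + kappa) * P)   # columns: scaled 'square roots' of P
    pts = [m] + [m + S[:, i] for i in range(n)] + [m - S[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    return np.array(pts), w

m = np.array([1.0, -2.0])
P = np.array([[2.0, 0.3],
              [0.3, 1.0]])
pts, w = sigma_points(m, P)

# Recompute mean and covariance from the weighted points:
mean = w @ pts
cov = sum(wi * np.outer(p - mean, p - mean) for wi, p in zip(w, pts))
print(np.allclose(mean, m), np.allclose(cov, P))  # both recovered exactly
```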

Page 34: Part I: Classifier Performance


Recipe to define the points; recompute the predicted mean and covariance

Page 35: Part I: Classifier Performance


Formant Tracking Example

Linear Filter

Excitation Speech

Page 36: Part I: Classifier Performance


Formant Tracking Example

Page 37: Part I: Classifier Performance


Formant Track Example

Page 38: Part I: Classifier Performance


Grid-based methods

Discretize continuous state into “cells”

Integrating probabilities over each partition

Fixed partitioning of state space

Page 39: Part I: Classifier Performance


Sampling Methods: Bayesian Inference

Parameters

Uncertainty over parameters

Inference:

Page 40: Part I: Classifier Performance


Basic Tool: Composition [Tanner]

To generate samples of

Page 41: Part I: Classifier Performance


Importance Sampling
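A minimal sketch: estimate E_p[x²] under p = N(0,1) by sampling from a broader proposal q = N(0,2²) and weighting each draw by p/q; the true answer is 1. The distributions and sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000
x = rng.normal(0.0, 2.0, N)   # draws from the proposal q = N(0, 4)

# Importance weights w(x) = p(x) / q(x), computed in log space
log_p = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
log_q = -0.5 * (x / 2.0)**2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)
w = np.exp(log_p - log_q)

estimate = np.mean(w * x**2)   # weighted Monte Carlo estimate of E_p[x^2]
print(estimate)                # close to the true value 1.0
```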

Page 42: Part I: Classifier Performance


Particle Filters

Prediction

Weights of Sample

Bootstrap Filters (Gordon et al., tracking)
CONDENSATION Algorithm (Isard et al., vision)
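A sketch in the spirit of the bootstrap filter of Gordon et al., for a 1-D random-walk state observed in Gaussian noise (model and parameters assumed for illustration): predict by propagating particles through the state equation, weight by the observation likelihood, then resample.

```python
import numpy as np

rng = np.random.default_rng(4)
q, r, N, T = 0.01, 0.05, 2000, 100   # noise variances, particles, time steps

# Simulate a trajectory and noisy observations of it
x_true = 2.0 + np.cumsum(rng.normal(0.0, np.sqrt(q), T))
y_obs = x_true + rng.normal(0.0, np.sqrt(r), T)

particles = rng.normal(0.0, 2.0, N)       # samples from a broad prior
for y in y_obs:
    # Prediction: push each particle through the state equation
    particles = particles + rng.normal(0.0, np.sqrt(q), N)
    # Weights: likelihood of the observation under each particle
    w = np.exp(-0.5 * (y - particles) ** 2 / r)
    w /= w.sum()
    # Resampling: multiply high-weight particles, kill low-weight ones
    particles = rng.choice(particles, size=N, p=w)

print(particles.mean())   # tracks the final true state x_true[-1]
```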

Page 43: Part I: Classifier Performance


Sequential Importance Sampling

Recursive update of weights

Only up to a constant of proportionality

Page 44: Part I: Classifier Performance


Degeneracy in SIS

The variance of the weights increases monotonically; all but one decay to zero very rapidly

Effective number of particles: N_eff ≈ 1 / Σᵢ wᵢ²

Resample if N_eff falls below a threshold
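The effective-sample-size diagnostic in code: N_eff = 1/Σwᵢ² equals N for uniform weights and collapses toward 1 when one weight dominates. A common rule (an assumed choice here, not from the slides) is to resample when N_eff drops below N/2.

```python
import numpy as np

def n_eff(w):
    """Effective number of particles for (possibly unnormalised) weights w."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

print(n_eff([0.25, 0.25, 0.25, 0.25]))   # 4.0: all particles contribute
print(n_eff([0.97, 0.01, 0.01, 0.01]))   # about 1.06: effectively one particle

N = 4
print(n_eff([0.97, 0.01, 0.01, 0.01]) < N / 2)   # True: time to resample
```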

Page 45: Part I: Classifier Performance


Sampling, Importance Re-Sampling (SIR)

Multiply samples with high weight; kill off samples in parts of the space that are not relevant ("particle collapse")

Page 46: Part I: Classifier Performance


Marginalizing Part of the State Space

Suppose the state can be split so that, conditioned on one part, the other can be integrated analytically.

Sample with respect to the intractable part; integrate with respect to the tractable part.

Rao-Blackwell

Page 47: Part I: Classifier Performance


Variations to the Basic Algorithm

• Integrate out part of the state space
– Rao-Blackwellised particle filters (e.g. multi-layer perceptron with linear output layer)
• Variational Importance Sampling (Lawrence et al.)
• Auxiliary Particle Filters (Pitt et al.)
• Regularised Particle Filters
• Likelihood Particle Filters

Page 48: Part I: Classifier Performance


Regularised PF: basic idea

Samples

Kernel Density

Resample

Propagate in time

Page 49: Part I: Classifier Performance


Conclusion / Summary

• Collection of powerful algorithms

• New and interesting signal processing problems

