Part I: Classifier Performance
Mahesan Niranjan
Department of Computer Science, The University of Sheffield
& Cambridge Bioinformatics Limited
BCS, Exeter, July 2004 Mahesan Niranjan 2
Relevant Reading
• Bishop, Neural Networks for Pattern Recognition
• http://www.ncrg.aston.ac.uk/netlab
• David Hand, Construction and Assessment of Classification Rules
• Lovell et al., CUED/F-INFENG/TR.299
• Scott et al., CUED/F-INFENG/TR.323
reports linked from http://www.dcs.shef.ac.uk/~niranjan
Pattern Recognition Framework
Two Approaches to Pattern Recognition
• Probabilistic, via explicit modelling of the probabilities encountered in Bayes' formula
• A parametric form for the class boundary, optimised directly
• In some specific cases (often not) both reduce to the same answer
Pattern Recognition: Simple case
Gaussian distributions: isotropic, equal variances
Optimal classifier:
• Distance to mean
• Linear class boundary
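A minimal sketch of the distance-to-mean rule (data and function names are mine, for illustration):

```python
import numpy as np

def nearest_mean_classifier(X_train, y_train, X_test):
    """Classify each test point by Euclidean distance to the class means.

    This is the optimal rule when classes are Gaussian with equal,
    isotropic covariances; the implied class boundary is linear."""
    means = {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}
    labels = np.array(sorted(means))
    # Distance from every test point to every class mean
    d = np.stack([np.linalg.norm(X_test - means[c], axis=1) for c in labels],
                 axis=1)
    return labels[np.argmin(d, axis=1)]
```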
Distance can be misleading
Mahalanobis Distance
The optimal classifier for this case is the Fisher Linear Discriminant
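A sketch of the Fisher discriminant direction, which builds the within-class covariance into the distance exactly as the Mahalanobis distance does (function name and test data are mine):

```python
import numpy as np

def fisher_direction(X0, X1):
    """Fisher linear discriminant direction w = Sw^{-1} (m1 - m0),
    where Sw is the pooled within-class scatter matrix. Projecting
    onto w whitens the within-class covariance, so plain distance
    to the projected means becomes a sensible rule."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S0 = (X0 - m0).T @ (X0 - m0)   # scatter of class 0
    S1 = (X1 - m1).T @ (X1 - m1)   # scatter of class 1
    return np.linalg.solve(S0 + S1, m1 - m0)
```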
Support Vector Machines: Maximum Margin Perceptron
[Figure: two classes of points (X and O) separated by a maximum-margin linear boundary]
Support Vector Machines: Nonlinear Kernel Functions
[Figure: X and O classes that no linear boundary can separate; a nonlinear kernel yields a curved class boundary]
Support Vector Machines: Computations
• Quadratic Programming
• Class boundary defined only by data that lie close to it - support vectors
• Kernels in data space equal scalar products in higher dimensional space
$$\min_{\mathbf{x}} \; \mathbf{x}^t A \mathbf{x} \quad \text{subject to} \quad 0 \le x_i \le C$$
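The kernel-equals-scalar-product point can be checked in miniature. For 2-D inputs the degree-2 polynomial kernel $K(x,y) = (x \cdot y)^2$, evaluated directly in data space, equals an ordinary scalar product after the explicit feature map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ into 3-D space (example values are mine):

```python
import numpy as np

def poly_kernel(x, y):
    # Degree-2 polynomial kernel, computed in the original data space
    return (x @ y) ** 2

def phi(x):
    # Explicit feature map into 3-D: (x1^2, sqrt(2) x1 x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(poly_kernel(x, y), phi(x) @ phi(y))  # both print 1.0
```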
Support Vector Machines: The Hypes
• Strong theoretical basis - Computational Learning Theory; complexity controlled by the Vapnik-Chervonenkis dimension
• Not many parameters to tune
• High performance on many practical problems, high dimensional problems in particular
Support Vector Machines: The Truths
• Worst case bounds from Learning theory are not very practical
• Several parameters to tune:
  – What kernel?
  – Internal workings of the optimiser
  – Noise in training data
• Performance? – depends on who you ask
SVM: data driven kernel
• Fisher Kernel [Jaakkola & Haussler]
  – Kernel based on a generative model of all the data
Generative model: $p(x \mid \theta)$
Fisher score: $U_x = \nabla_\theta \ln p(x \mid \theta)$
Kernel: $K(x_i, x_j) = U_{x_i}^t \, I^{-1} \, U_{x_j}$, where $I$ is the Fisher information matrix
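For the simplest possible generative model, every quantity above is available in closed form. A toy sketch (the 1-D Gaussian model is my choice, not from the slides):

```python
import numpy as np

def fisher_kernel_1d_gaussian(xi, xj, mu):
    """Fisher kernel for the toy generative model p(x|mu) = N(mu, 1).

    Score:  U_x = d/dmu ln p(x|mu) = x - mu
    Fisher information:  I = E[U_x^2] = 1
    Kernel:  K(xi, xj) = U_xi * I^{-1} * U_xj = (xi - mu)(xj - mu)"""
    return (xi - mu) * 1.0 * (xj - mu)
```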
Classifier Performance
• Error rates can be misleading
– Imbalance in training/test data
  • 98% of population healthy
  • 2% of population has disease
– Cost of misclassification can change after design of classifier
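The 98%/2% point in numbers: a classifier that labels everyone "healthy" scores 98% accuracy yet detects no disease at all.

```python
# Trivial classifier on an imbalanced population: predict "healthy" always.
n_healthy, n_disease = 980, 20
accuracy = n_healthy / (n_healthy + n_disease)  # all 980 healthy cases correct
sensitivity = 0 / n_disease                     # zero true positives
print(accuracy, sensitivity)                    # 0.98 0.0
```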
[Figure: two overlapping score distributions, benign outcome and adverse outcome; a threshold on the score sets the class boundary]
[Figure: ROC curve, true positive rate against false positive rate]
Area under the ROC Curve: Neat Statistical Interpretation
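The neat interpretation: AUC equals the probability that a randomly chosen positive example outscores a randomly chosen negative one, i.e. the Wilcoxon-Mann-Whitney statistic. A direct sketch (function name mine):

```python
import numpy as np

def auc_wmw(pos_scores, neg_scores):
    """Area under the ROC curve via its statistical interpretation:
    the fraction of (positive, negative) pairs in which the positive
    scores higher, with ties counting one half."""
    sp = np.asarray(pos_scores)[:, None]
    sn = np.asarray(neg_scores)[None, :]
    return (sp > sn).mean() + 0.5 * (sp == sn).mean()
```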
Convex Hull of ROC Curves
[Figure: convex hull of several ROC curves, true positive rate against false positive rate]
Yeast Gene Example: MATLAB Demo here
Part II: Particle Filters for Tracking and Sequential Problems
Mahesan Niranjan
Department of Computer Science, The University of Sheffield
Overview
• Motivation
• State Space Model
• Kalman Filter and Extensions
• Sequential MCMC Methods
– Particle Filter & Variants
Motivation
• Neural Networks for Learning:
  – Function Approximation
  – Statistical Estimation
  – Dynamical Systems
  – Parallel Processing
• Guarantee Generalisation:
  – Regularise / control complexity
  – Cross validate to detect / avoid overfitting
  – Bootstrap to deal with model / data uncertainty
• Many of the above tricks won’t work in a sequential setting
Interesting Applications
• Speech Signal Processing
• Medical Signals
– Monitoring Liver Transplant Patients
• Tracking the prices of options contracts in computational finance
Good References
• Bar-Shalom and Fortmann: Tracking and Data Association
• Jazwinski: Stochastic Processes and Filtering Theory
• Arulampalam et al.: "Tutorial on Particle Filters…", IEEE Transactions on Signal Processing
• Arnaud Doucet: Technical Report 310, Cambridge University Engineering Department
• Benveniste, A. et al.: Adaptive Algorithms and Stochastic Approximation
• Simon Haykin: Adaptive Filters
Matrix Inversion Lemma
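In its usual (Sherman–Morrison–Woodbury) form the lemma reads:

```latex
(A + UCV)^{-1} \;=\; A^{-1} \;-\; A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}
```

It is what lets recursive least squares update an inverse correlation matrix with no explicit matrix inversion.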
Linear Regression
Recursive Least Squares
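A minimal RLS step, assuming the standard formulation with forgetting factor `lam` (names and shapes are mine):

```python
import numpy as np

def rls_update(w, P, x, y, lam=1.0):
    """One recursive least squares step for the model y ~ w . x.

    P is the inverse input correlation matrix; the matrix inversion
    lemma gives its rank-one update without any explicit inverse."""
    Px = P @ x
    k = Px / (lam + x @ Px)           # gain vector
    w = w + k * (y - w @ x)           # correct weights by prediction error
    P = (P - np.outer(k, Px)) / lam   # update inverse correlation matrix
    return w, P
```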
State Space Model
State Process Noise
Observation Measurement Noise
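A generic state space model consistent with these labels, in a standard form:

```latex
x_k = f(x_{k-1}) + v_{k-1} \quad \text{(state equation, process noise } v\text{)}
\qquad
y_k = h(x_k) + w_k \quad \text{(observation equation, measurement noise } w\text{)}
```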
Simple Linear Gaussian Model
Kalman Filter
Prediction
Correction
Kalman Filter
Innovation
Kalman Gain
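One predict-correct cycle as a sketch, for the linear Gaussian model $x_k = Ax_{k-1} + v$, $y_k = Cx_k + w$ (function name and matrix names are mine):

```python
import numpy as np

def kalman_step(m, P, y, A, C, Q, R):
    """One Kalman filter cycle: prediction, then correction.
    e is the innovation, K the Kalman gain."""
    # Prediction
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Correction
    e = y - C @ m_pred                    # innovation
    S = C @ P_pred @ C.T + R              # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    m_new = m_pred + K @ e
    P_new = P_pred - K @ S @ K.T
    return m_new, P_new
```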
Bayesian Setting
Prior, Likelihood
Innovation Probability
• Run multiple models and switch - Bar-Shalom
• Set noise levels to maximum likelihood values - Jazwinski
Extended Kalman Filter
Lee Feldkamp @ Ford: successful training of recurrent neural networks
Taylor Series Expansion around the operating point
First Order
Second Order
Iterated Extended Kalman Filter
Iterated Extended Kalman Filter
Local Linearization of State and / or Observation
Propagation and Update
Unscented Kalman Filter
Generate a set of points at the current time step, so that they represent the mean and covariance:
Propagate these through the state equations
Recompute predicted mean and covariance:
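A sketch of the sigma point construction in its basic (Julier-style) form; the scaling parameter `kappa` and function name are my choices:

```python
import numpy as np

def sigma_points(m, P, kappa):
    """Generate 2n+1 sigma points whose weighted sample mean and
    covariance match (m, P). The unscented filter propagates these
    through the nonlinear state equations instead of linearising."""
    n = len(m)
    S = np.linalg.cholesky((n + kappa) * P)  # matrix square root
    pts = [m] + [m + S[:, i] for i in range(n)] \
              + [m - S[:, i] for i in range(n)]
    W = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    W[0] = kappa / (n + kappa)
    return np.array(pts), W
```

Recomputing the weighted mean and covariance of the points recovers (m, P) exactly.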
Recipe to define:
Recompute:
Formant Tracking Example
Linear Filter
Excitation Speech
Formant Tracking Example
Formant Tracking Example
Grid-based methods
Discretise the continuous state into "cells"
Integrate probabilities over each partition
Fixed partitioning of the state space
Sampling Methods: Bayesian Inference
Parameters
Uncertainty over parameters
Inference:
Basic Tool: Composition [Tanner]
To generate samples of
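A minimal composition sampler: to draw from a marginal $p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta$, draw $\theta \sim p(\theta)$, then $x \sim p(x \mid \theta)$. The toy densities here are my choice:

```python
import numpy as np

# Composition sampling. With theta ~ N(0, 1) and x | theta ~ N(theta, 1),
# the marginal is x ~ N(0, 2), which the samples should reproduce.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, 200_000)   # theta ~ p(theta)
x = rng.normal(theta, 1.0)              # x ~ p(x | theta)
```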
Importance Sampling
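A self-normalised importance sampling sketch: expectations under a target $p$ are estimated from samples drawn from a proposal $q$, reweighted by $p/q$. The particular densities are my choice:

```python
import numpy as np

# Estimate E_p[x] for target p = N(2, 1) using a proposal q = N(0, 2).
rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, 100_000)            # samples from q
log_p = -0.5 * (x - 2.0) ** 2                # log p, up to a constant
log_q = -0.5 * (x / 2.0) ** 2                # log q, up to a constant
w = np.exp(log_p - log_q)                    # importance weights p/q
w /= w.sum()                                 # self-normalise
estimate = np.sum(w * x)                     # should be close to 2.0
```

Normalising constants cancel in the self-normalised weights, which is what makes the scheme usable when densities are known only up to proportionality.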
Particle Filters
Prediction
Weights of Sample
Bootstrap Filters (Gordon et al., Tracking)
CONDENSATION Algorithm (Isard et al., Vision)
Sequential Importance Sampling
Recursive update of weights
Only up to a constant of proportionality
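The recursive weight update, in its standard form with proposal $q$:

```latex
w_k^{(i)} \;\propto\; w_{k-1}^{(i)}\,
\frac{p\!\left(y_k \mid x_k^{(i)}\right)\, p\!\left(x_k^{(i)} \mid x_{k-1}^{(i)}\right)}
     {q\!\left(x_k^{(i)} \mid x_{k-1}^{(i)}, y_k\right)}
```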
Degeneracy in SIS
The variance of the weights increases monotonically; all except one decay to zero very rapidly
Effective number of particles
Resample if
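The effective number of particles is usually computed as $N_{\text{eff}} = 1 / \sum_i (w^{(i)})^2$ for normalised weights:

```python
import numpy as np

def effective_sample_size(w):
    """N_eff = 1 / sum(w^2) for normalised weights: equals N when the
    weights are uniform and 1 when one particle carries all the weight."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)
```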
Sampling, Importance Re-Sampling (SIR)
Multiply samples of high weight; kill off samples in parts of space not relevant “Particle Collapse”
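One common way to implement this multiply/kill step is systematic resampling, sketched here (this particular scheme is one choice among several):

```python
import numpy as np

def systematic_resample(w, rng):
    """Draw N particle indices with probability proportional to w,
    using a single uniform draw and a stratified grid of offsets.
    High-weight particles are duplicated; negligible ones die off."""
    N = len(w)
    positions = (rng.random() + np.arange(N)) / N   # stratified points in [0,1)
    return np.searchsorted(np.cumsum(w), positions, side="right")
```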
Marginalizing Part of the State Space
Suppose
Possible to analytically integrate with respect to part of the state space
Sample with respect to
Integrate with respect to
Rao-Blackwell
Variations to the Basic Algorithm
• Integrate out part of the state space
  – Rao-Blackwellised particle filters (e.g. multi-layer perceptron with linear output layer)
• Variational Importance Sampling (Lawrence et al.)
• Auxiliary Particle Filters (Pitt et al.)
• Regularised Particle Filters
• Likelihood Particle Filters
Regularised PF: basic idea
Samples
Kernel Density
Resample
Propagate in time
Conclusion / Summary
• Collection of powerful algorithms
• New and interesting signal processing problems