CBCL MIT
9.520, spring 2003
Statistical Learning Theory and Applications
Sayan Mukherjee and Ryan Rifkin + Tomaso Poggio + Alex Rakhlin
Learning: Brains and Machines
Learning is the gateway to understanding the brain and to making intelligent machines.
Problem of learning: a focus for
o modern math
o computer algorithms
o neuroscience
Learning theory + algorithms
Computational Neuroscience: models + experiments
ENGINEERING APPLICATIONS
min_{f in H} (1/l) sum_{i=1}^{l} V(y_i, f(x_i)) + lambda ||f||_K^2
• Information extraction (text, Web…)
• Computer vision and graphics
• Man-machine interfaces
• Bioinformatics (DNA arrays)
• Artificial Markets (society of learning agents)
Learning to recognize objects in visual cortex
Multidisciplinary Approach to Learning
Class
Rules of the game:
o problem sets (2 + 1 optional)
o final project
o grading
o participation!
Web site: www.ai.mit.edu/projects/cbcl/courses/course9.520/index.html
Overview of overview
o Supervised learning: the problem and how to frame it within classical math
o Examples of in-house applications
o Learning and the brain
Learning from Examples
INPUT OUTPUT
EXAMPLES
INPUT1 OUTPUT1
INPUT2 OUTPUT2
…
INPUTn OUTPUTn
f
Learning from Examples: formal setting
Given a set of l examples (past data)

(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)

Question: find a function f such that

f(x) = y_hat

is a good predictor of y for a future input x.
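In code, the setting is just this: a finite table of (input, output) pairs and a rule that turns it into a predictor for unseen inputs. A deliberately naive sketch using a 1-nearest-neighbor rule (our illustration only; the course develops far better-founded estimators):

```python
# Minimal illustration of learning from examples: given pairs (x_i, y_i),
# build a predictor f and evaluate it on a new input x.

def nearest_neighbor_predictor(examples):
    """examples: list of (x, y) pairs with scalar x."""
    def f(x):
        # predict the y of the closest training input
        return min(examples, key=lambda xy: abs(xy[0] - x))[1]
    return f

data = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0), (3.0, 9.0)]  # samples of y = x^2
f = nearest_neighbor_predictor(data)
print(f(1.9))  # closest training x is 2.0, so this predicts 4.0
```

Whether such a predictor generalizes well is exactly the question the formal setting above makes precise.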
Classical equivalent view: supervised learning as a problem of multivariate function approximation
[Figure: y plotted against x, showing the data from f, an approximation of f, and the function f itself.]
Generalization: estimating value of function where there are no data
Regression: the function is real-valued.
Classification: the function is binary.
Problem is ill-posed!
J. S. Hadamard, 1865-1963
Well-posed problems
A problem is well-posed if its solution
• exists
• is unique
• is stable, i.e., depends continuously on the data
Inverse problems (tomography, radar, scattering…)
are typically ill-posed
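A small numerical illustration of the stability requirement (our example, not Hadamard's): fitting a high-degree polynomial exactly through equispaced samples leads to a badly conditioned linear system, so the solution does not depend continuously on the data in any useful sense; adding a regularization term makes the system well conditioned. All numbers are illustrative:

```python
import numpy as np

# Exact polynomial interpolation through 12 equispaced points on [0, 1]:
# the Vandermonde system is nearly singular (an ill-posed-like problem).
x = np.linspace(0.0, 1.0, 12)
V = np.vander(x)

cond_plain = np.linalg.cond(V)        # conditioning of exact interpolation
lam = 1e-3
cond_reg = np.linalg.cond(V.T @ V + lam * np.eye(12))  # Tikhonov-regularized system

print(f"interpolation system condition number: {cond_plain:.1e}")
print(f"regularized system condition number:   {cond_reg:.1e}")
```

The regularized system's condition number drops by orders of magnitude: small perturbations of the data now produce only small changes in the solution.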
Two key requirements to solve the problem of learning from examples: well-posedness and consistency.

A standard way to learn from examples is empirical risk minimization, i.e., minimizing (1/l) sum_i V(y_i, f(x_i)) over f in H.

The problem is in general ill-posed and does not have a predictive (or consistent) solution. By choosing an appropriate hypothesis space H it can be made well-posed. Independently, it is also known that an appropriate hypothesis space can guarantee consistency.

We have proved recently a surprising theorem: consistency and well-posedness are equivalent, i.e., one implies the other (Mukherjee, Niyogi, Poggio, 2002).

Thus a stable solution is also predictive, and vice versa.
A simple, “magic” algorithm ensures consistency and well-posedness
For a review, see Poggio and Smale, Notices of the AMS, 2003
Equation includes Regularization Networks (examples are Radial Basis Functions and Support Vector Machines)
min_{f in H} (1/l) sum_{i=1}^{l} V(y_i, f(x_i)) + lambda ||f||_K^2

implies

f(x) = sum_{i=1}^{l} c_i K(x, x_i)
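With the square loss V(y, f(x)) = (y - f(x))^2 this minimization has a closed-form solution: writing f(x) = sum_i c_i K(x, x_i), the coefficients solve (K + lambda*l*I) c = y. A sketch with a Gaussian kernel and synthetic data (kernel width, lambda and the target function are illustrative choices of ours):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.3):
    # K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 1))                 # l = 30 training inputs
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.standard_normal(30)

l, lam = len(X), 1e-3
K = gaussian_kernel(X, X)
c = np.linalg.solve(K + lam * l * np.eye(l), y)      # (K + lambda*l*I) c = y

def f(Xnew):
    # the representer-theorem form: f(x) = sum_i c_i K(x, x_i)
    return gaussian_kernel(Xnew, X) @ c

Xtest = np.array([[0.2]])
pred = f(Xtest)[0]
print(pred)   # close to sin(0.6) on this synthetic data
```

This is a Regularization Network (kernel ridge regression); other choices of V lead to SVMs, as the later slides show.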
Classical framework but with a more general loss function
Girosi, Caprile, Poggio, 1990
The algorithm uses a quite general space of functions or “hypotheses”: RKHSs. A generalization of the classical framework can provide a better measure of “loss” (for instance for classification)…
min_{f in H} (1/l) sum_{i=1}^{l} V(y_i, f(x_i)) + lambda ||f||_K^2
Formally…
…and the solution can be “written” as the same type of network:

f(x) = sum_{i=1}^{l} c_i K(x, x_i) + b

Equivalence to networks: [diagram with inputs X_1 … X_l feeding kernel units K, whose weighted outputs are summed (+) to give f]
Many different V lead to the same solution…
Unified framework: RN, SVMR and SVMC
Review by Evgeniou, Pontil and Poggio, Advances in Computational Mathematics, 2000
The equation includes Regularization Networks (e.g., Radial Basis Functions), Support Vector Machines (classification and regression), and some multilayer Perceptrons.
min_{f in H} (1/l) sum_{i=1}^{l} V(y_i, f(x_i)) + lambda ||f||_K^2
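The choice of V in the functional is what selects the method: the square loss gives Regularization Networks, the epsilon-insensitive loss gives SVM regression, and the hinge loss gives SVM classification. A minimal sketch of the three losses plugged into the one shared objective (function names and toy values are ours):

```python
import numpy as np

def square_loss(y, fx):                      # Regularization Networks
    return (y - fx) ** 2

def eps_insensitive_loss(y, fx, eps=0.1):    # SVM regression
    return np.maximum(np.abs(y - fx) - eps, 0.0)

def hinge_loss(y, fx):                       # SVM classification, y in {-1, +1}
    return np.maximum(1.0 - y * fx, 0.0)

def regularized_risk(c, K, y, V, lam):
    # (1/l) sum_i V(y_i, f(x_i)) + lambda ||f||_K^2 for f(x) = sum_i c_i K(x, x_i),
    # using f(x_j) = (K c)_j and ||f||_K^2 = c^T K c
    fx = K @ c
    return np.mean(V(y, fx)) + lam * c @ K @ c

print(hinge_loss(1.0, 0.4))   # a margin violation of 0.6
print(square_loss(1.0, 0.4))  # 0.36
```

Only V changes between the three methods; the hypothesis space (the RKHS) and the regularizer stay the same.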
The theoretical foundations of statistical learning are becoming part of mainstream mathematics
Theory summary
In the course we will introduce:
• Stability (well-posedness)
• Consistency (predictivity)
• RKHSs as hypothesis spaces
• Regularization techniques leading to RN and SVMs
• Generalization bounds based on stability
• Alternative bounds (VC and V-gamma dimensions)
• Related topics, extensions beyond SVMs
• Applications
• A new key result
Overview of overview

o Supervised learning: real math
o Examples of recent and ongoing in-house research on applications
o Learning and the brain
Learning from Examples: engineering applications

Bioinformatics, Artificial Markets, Object categorization, Object identification, Image analysis, Graphics, Text Classification, …

INPUT → OUTPUT
Learning from examples paradigm

Training: Examples → Statistical Learning Algorithm
Run time: New sample → Prediction
Bioinformatics application: predicting the type of cancer from DNA chip signals
Bioinformatics application: predicting the type of cancer from DNA chips

New feature selection SVM:

Only 38 training examples, 7,100 features.

AML vs. ALL: with 40 genes, 34/34 correct, 0 rejects; with 5 genes, 31/31 correct, 3 rejects of which 1 is an error.
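As a hedged illustration of the generic recipe behind such results (not the specific "new feature selection SVM" of the slide): score each gene by a signal-to-noise statistic between the two classes, then keep only the top-ranked genes for the classifier. The data below are synthetic; only the sample and feature counts echo the slide:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 38, 7100                     # 38 patients, 7100 genes, as on the slide
y = np.array([1] * 19 + [-1] * 19)  # two cancer classes (toy labels)
X = rng.standard_normal((n, p))
X[:19, :5] += 3.0                   # plant 5 genuinely informative genes

# Signal-to-noise score per gene: |mu1 - mu2| / (s1 + s2)
mu1, mu2 = X[y == 1].mean(0), X[y == -1].mean(0)
s1, s2 = X[y == 1].std(0), X[y == -1].std(0)
score = np.abs(mu1 - mu2) / (s1 + s2 + 1e-12)

top = np.argsort(score)[::-1][:40]  # keep the 40 best-ranked genes
print(sorted(int(g) for g in top[:5]))   # the planted genes should rank highly
```

A classifier (e.g., an SVM) would then be trained on `X[:, top]` only, which is how a 7,100-dimensional problem is reduced to 40 or 5 genes.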
Learning from Examples: engineering applications

Bioinformatics, Artificial Markets, Object categorization, Object identification, Image analysis, Graphics, Text Classification, …

INPUT → OUTPUT
Face identification: example

An old view-based system: 15 views

Performance: 98% on a 68-person database (Beymer, 1995).
Face identification
New face image → Feature extraction → SVM Classifier → Identification Result
(classifier trained on Training Data)
Bernd Heisele, Jennifer Huang, 2002
Real-time detection and identification
Ho, Heisele, Poggio, 2000
Ported to Oxygen’s H21 (handheld device). Identification of rotated faces up to about 45°.

Robust against changes in illumination and background.

Frame rate of 15 Hz.
Weinstein, Ho, Heisele, Poggio, Steele, Agarwal, 2002
Learning from Examples: engineering applications

Bioinformatics, Artificial Markets, Object categorization, Object identification, Image analysis, Graphics, Text Classification, …

INPUT → OUTPUT
Learning Object Detection: Finding Frontal Faces …

Training Database: 1,000+ real and 3,000+ virtual face patterns; 50,000+ non-face patterns.

Sung, Poggio 1995
Recent work on face detection
• Detection of faces in images
• Robustness against slight rotations in depth and in the image plane
• Full face vs. component-based classifier
Heisele, Pontil, Poggio, 2000
The best existing system for face detection?

Test: CBCL set / 1,154 real faces / 14,579 non-faces / 132,365 shifted windows. Training: 2,457 synthetic faces (58x58) / 13,654 non-faces.
[ROC curves: correct detection rate (0 to 1) vs. false positives per number of windows (0 to 1e-4), comparing the component classifier with the whole-face classifier; the component classifier's curve lies above.]
Heisele, Serre, Poggio et al., 2000
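A curve like this is produced by sweeping a threshold over the classifier's real-valued outputs and recording, at each setting, the detection rate on faces against the false-positive rate on non-faces. A generic sketch on synthetic scores (the score distributions are invented; only the test-set sizes echo the slide):

```python
import numpy as np

rng = np.random.default_rng(0)
face_scores = rng.normal(1.5, 1.0, 1154)       # classifier outputs on faces
nonface_scores = rng.normal(-1.5, 1.0, 14579)  # outputs on non-faces

thresholds = np.linspace(-5.0, 5.0, 101)
tpr = [(face_scores >= t).mean() for t in thresholds]     # correct detections
fpr = [(nonface_scores >= t).mean() for t in thresholds]  # false positives

# Operating point: best detection rate at a low false-positive rate
det_at_1pct = max(tp for tp, fp in zip(tpr, fpr) if fp <= 0.01)
print(f"detection rate at <= 1% false positives: {det_at_1pct:.2f}")
```

A better classifier pushes the whole (fpr, tpr) curve toward the top-left corner, which is what the component-vs-whole-face comparison shows.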
Trainable System for Object Detection: Pedestrian detection - Results
Papageorgiou and Poggio, 1998
Trainable System for Object Detection: Pedestrian detection - Training
Papageorgiou and Poggio, 1998
The system was tested in a test car (Mercedes)
System installed in experimental Mercedes
A fast version, integrated with a real-time obstacle
detection system
MPEG
Constantine Papageorgiou
People classification/detection: training the system
Representation: overcomplete dictionary of Haar wavelets; high-dimensional feature space (>1300 features)
. . . . . .
pedestrian detection system
Core learning algorithm: Support Vector Machine classifier
1848 patterns 7189 patterns
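Haar-wavelet features of this kind are differences of rectangle sums, which can be computed in constant time from an integral image. A minimal sketch of that mechanism (this does not reproduce the system's actual overcomplete dictionary of >1300 wavelets; the vertical two-rectangle feature is just one illustrative member):

```python
import numpy as np

def integral_image(img):
    # ii[i, j] = sum of img[:i, :j] (zero border makes lookups uniform)
    return np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def rect_sum(ii, r0, c0, r1, c1):
    # sum of img[r0:r1, c0:c1] in O(1) from four integral-image lookups
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def vertical_haar(ii, r0, c0, h, w):
    # left half minus right half: responds strongly to vertical edges
    return rect_sum(ii, r0, c0, r0 + h, c0 + w // 2) - \
           rect_sum(ii, r0, c0 + w // 2, r0 + h, c0 + w)

img = np.zeros((8, 8))
img[:, 4:] = 1.0                          # a vertical step edge
ii = integral_image(img)
print(vertical_haar(ii, 0, 0, 8, 8))      # strong edge response: -32.0
```

The >1300-dimensional feature vector is built by evaluating many such wavelets at different positions, scales and orientations, and that vector is what the SVM classifies.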
An improved approach: combining component detectors
Mohan, Papageorgiou and Poggio, 1999
Results
The system is capable of detecting partially occluded people
System Performance

Combination systems (ACC) perform best.

All component-based systems perform better than full-body person detectors.
A. Mohan, C. Papageorgiou, T. Poggio
Learning from Examples: Applications

Object identification, Object categorization, Image analysis, Graphics, Finance, Bioinformatics, …

INPUT → OUTPUT
Image Analysis

IMAGE ANALYSIS: OBJECT RECOGNITION AND POSE ESTIMATION
Bear (0° view)
Bear (45° view)
MIT CBCL/AI. NTT, Atsugi, January 2002
Computer vision: analysis of facial expressions
[Plot: estimated facial parameter (degree of mouth openness) across video frames.]
The main goal is to estimate basic facial parameters, e.g. degree of mouth openness, through learning. One of the
main applications is video-speech fusion to improve speech recognition systems.
Kumar, Poggio, 2001
Combining Top-Down Constraints and Bottom-Up Data
Learning Engine
SVM, RBF, etc
Morphable Model
new face = a_1 · (prototype 1) + a_2 · (prototype 2) + a_3 · (prototype 3) + …
Kumar, Poggio, 2001
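The morphable-model idea is that a face (as a vector of pixel or shape coordinates) is a linear combination of prototype faces, and the "analysis" step recovers the coefficients a_i, for instance by least squares. A toy sketch with random prototype vectors (all sizes and values are illustrative, not the actual model):

```python
import numpy as np

rng = np.random.default_rng(0)
prototypes = rng.standard_normal((3, 100))   # 3 toy prototype "faces"
a_true = np.array([0.6, 0.3, 0.1])
face = a_true @ prototypes                   # a face inside the model's span

# Recover the morph coefficients by least squares (the analysis step)
a_est, *_ = np.linalg.lstsq(prototypes.T, face, rcond=None)
print(np.round(a_est, 3))                    # recovers ~ [0.6, 0.3, 0.1]
```

The recovered coefficients are exactly the kind of low-dimensional, top-down description that the learning engine (SVM, RBF, etc.) can then work with.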
The Three Stages
Face Detection → Localization of Facial Features → Analysis of Facial Parts
For more details, see Appendix 2.
Learning from Examples: engineering applications

Bioinformatics, Artificial Markets, Object categorization, Object identification, Image analysis, Graphics, Text Classification, …

INPUT → OUTPUT
Image Synthesis

UNCONVENTIONAL GRAPHICS
= 0° view
= 45° view
Supermodels
MPEG (Steve Lines)
Cc-cs.mpg
A trainable system for TTVS

Input: text

Output: photorealistic talking face uttering the text
Tony Ezzat
From 2D to a better 2D, and from 2D to 3D

Two extensions of our text-to-visual-speech (TTVS) system:

• Morphing of 3D face models to output a 3D model of a speaking face (Blanz)

• Learning facial appearance, dynamics and coarticulation (Ezzat)
Towards 3D
3D face scans were collected. The neutral face shape was reconstructed from a single image, and the animation transferred to the new face.
Blanz, Ezzat, Poggio
Using the same basic learning techniques: Trainable Videorealistic Face Animation
(voice is real, video is synthetic)
Ezzat, Geiger, Poggio, SigGraph 2002
Trainable Videorealistic Face Animation
Phone stream: /SIL/ /B/ /AE/ /JH/ … → Trajectory Synthesis → MMM (Phonetic Models; Image Prototypes)
1. Learning
The system learns, from 4 minutes of video, the person's face appearance (Morphable Model) and speech dynamics.
Tony Ezzat,Geiger, Poggio, SigGraph 2002
2. Run Time
For any speech input the system provides as output a synthetic video stream
Novel !!!: Trainable Videorealistic Face Animation
Ezzat, Poggio, 2002
Let us look at video!!!
Blanz and Vetter, MPI, SigGraph ’99
Reconstructed 3D Face Models from 1 image
Learning from Examples: Applications

Object identification, Object categorization, Image analysis, Graphics, Finance, Bioinformatics, …

INPUT → OUTPUT
Artificial Agents – learning algorithms – buy and sell stocks
Bid/Ask Prices Bid/Ask Sizes
Buy/Sell
Inventory control
All available market Information
User control/calibration
Public Limit/Market Orders
Learning Feedback Loop
Example of a subproject: The Electronic Market Maker
Nicholas Chang
Learning from Examples: engineering applications

Bioinformatics, Artificial Markets, Object categorization, Object identification, Image analysis, Graphics, Text Classification, …

INPUT → OUTPUT
Overview of overview

o Supervised learning: the problem and how to frame it within classical math
o Examples of in-house applications
o Learning and the brain
The Ventral Visual Stream: From V1 to IT
modified from Ungerleider and Haxby, 1994
Hubel & Wiesel, 1959; Desimone, 1991
Summary of “basic facts”
Accumulated evidence points to three (mostly accepted) properties of the ventral visual stream architecture:
• Hierarchical build-up of invariances (first to translation and scale, then to viewpoint, etc.), of receptive-field size, and of the complexity of preferred stimuli
• Basic feed-forward processing of information (for “immediate” recognition tasks)
• Learning of an individual object generalizes to scale and position
The standard model: following Hubel and Wiesel…

Learning (supervised)
Learning (unsupervised)
Riesenhuber & Poggio, Nature Neuroscience, 2000
The standard model
Interprets or predicts many existing data in microcircuits and system physiology, and also in cognitive science:
• What some complex cells and V4 cells do, and how: MAX…
• View-tuning of IT cells (Logothetis)
• Response to pseudo-mirror views
• Effect of scrambling
• Multiple objects
• Robustness to clutter
• Consistent with K. Tanaka’s simplification procedure
• Categorization tasks (cats vs. dogs)
• Invariance to translation, scale, etc.
• Gender classification
• Face inversion effect: experience, viewpoint, other-race, configural vs. featural representation
• Transfer of generalization
• No binding problem, no need for oscillations
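The MAX operation attributed to complex cells can be sketched in a few lines: a complex-cell-like unit pools position-specific afferents by taking their maximum, which buys invariance to where the stimulus falls within the pool. This toy 1-D model (Gaussian tuning, illustrative pool size) is ours, not the model's actual parameters:

```python
import numpy as np

def simple_cell_responses(stimulus_pos, centers, sigma=1.0):
    # Gaussian position tuning of "simple"-like afferent units
    return np.exp(-((centers - stimulus_pos) ** 2) / (2.0 * sigma ** 2))

centers = np.arange(0.0, 10.0)   # afferents tiling positions 0..9

def complex_cell(stimulus_pos):
    # MAX pooling over the afferents
    return simple_cell_responses(stimulus_pos, centers).max()

# The pooled response barely changes as the stimulus shifts position:
print(round(complex_cell(3.0), 3), round(complex_cell(5.5), 3))
```

Any individual afferent's response falls off sharply with stimulus position; the MAX of the pool stays near its peak, which is the invariance the model exploits.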
Model’s early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls, and Poggio, 1995;Logothetis, Pauls, 1995
Recording Sites in Anterior IT
[Anatomical figure: recording sites in anterior IT, with the LUN, LAT, IOS, STS and AMTS sulci marked; Ho = 0.]
Logothetis, Pauls, and Poggio, 1995;Logothetis, Pauls, 1995
…neurons tuned to faces are close by….
The Cortex: Neurons Tuned to Object Views, as predicted by the model
Logothetis, Pauls, Poggio, 1995
View-tuned cells: scale invariance (one training view only)!
Scale Invariant Responses of an IT Neuron
[Figure: eight spike-rate histograms (0-76 spikes/sec over 0-3000 msec) for stimulus sizes 1.0, 1.75, 2.5, 3.25, 4.0, 4.75, 5.5 and 6.25 deg, i.e. 0.4x to 2.5x the 2.5-deg training size; the neuron responds across the whole range of scales.]
Logothetis, Pauls, and Poggio, 1995; Logothetis, Pauls, 1995
Joint Project (Max Riesenhuber) with Earl Miller & David Freedman (MIT): Neural Correlate of Categorization (NCC)

Define categories in morph space:

100% Cat prototypes, 80% Cat morphs, 60% Cat morphs | category boundary | 60% Dog morphs, 80% Dog morphs, 100% Dog prototypes
Categorization task
[Trial structure: Fixation (500 ms) → Sample (600 ms) → Delay (1000 ms) → Test. A matching test stimulus ends the trial; a nonmatch is followed by a further delay and then a match.]
Train monkey on categorization task
After training, record from neurons in IT & PFC
[Plot: firing rate (Hz) vs. time from sample stimulus onset (-500 to 2000 ms), spanning the Sample, Delay and Choice epochs, with traces for Dog 100%, 80%, 60% and Cat 100%, 80%, 60%.]
Single cell example: a PFC neuron that responds more strongly to DOGS than CATS
D. Freedman + E. Miller + M. Riesenhuber+T. Poggio (Science, 2001)
Is this what’s going on in cortex?
HMAX
The model suggests the same computations for different recognition tasks (and objects!)

Task-specific units (e.g., PFC)
General representation (IT)