CSED514 SPRING 2019
Pattern Recognition
Lecture 0: Introduction
Department of Computer Science and Engineering
Pohang University of Science and Technology
77 Cheongam-ro, Nam-gu, Pohang 790-784, Korea
Course Information
• Lecture Time and Place: 15:30-16:45, Tue & Thu, #107, Eng. Building II
• Office Hours: 16:45-18:00, Tue & Thu
• Textbooks
– Christopher Bishop, Pattern Recognition and Machine Learning, 2006.
– I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.
• Required Background
– Linear algebra: vector/matrix manipulations and properties
– Calculus: partial derivatives
– Probability: probability distributions, Bayes' rule
– Statistics: mean and variance, maximum likelihood
– Matlab: basic programming skills
• Grades (do the readings!)
– Homework Assignments 25%: derivations, pen(cil) and paper
– Programming Assignment 50%: do not cheat!!
– Final Exam 50%: do your best.
Course Topics
1) Introduction,
2) Probability Distributions,
3) Linear Models for Regression,
4) Linear Models for Classification,
5) Neural Networks,
6) Kernel Methods,
7) Sparse Kernel Machines,
8) Graphical Models (Bayesian Networks),
9) Mixture Models (Clustering) and EM,
10) Approximate Inference,
11) Sampling Methods,
12) Continuous Latent Variables (PCA),
13) Sequential Data (HMM),
14) Combining Models.
Recent advances in PRML
[Figure: IBM Watson for Jeopardy (1. understands human speech; 2. searches and evaluates hypotheses; 3. learns from user selections) and the Google driverless car.]
Still, many obstacles remain in high-level cognition.
Lecture note, Stat231-CS276A, © S.C.Zhu
ImageNet Challenges (Russakovsky and Deng, IJCV 2015):
200 object classes for detection
1000 classes for classification
[Plot: top-five accuracy for object sizes in the real world.]
This dataset rejuvenated interest in multi-layer neural networks (deep learning).
Recent advances in PRML
ConvNets recursively define filter responses layer by layer.
The 152-layer ResNet model is 0.5-1 GB!
K. He, X. Zhang, S. Ren, J. Sun, Microsoft Research Asia, CVPR 2016
[Plot: classification performance.]
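The layer-by-layer idea can be illustrated with a toy sketch (plain Python, hand-picked 1-D filters; this only illustrates stacked convolution + ReLU, not the actual ResNet architecture):

```python
# Toy sketch of layered filter responses: each layer convolves the
# previous layer's output with a small filter, then applies ReLU.
# The filters below are hand-picked for illustration, not learned.

def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation) of signal with kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    return [max(0, x) for x in xs]

def forward(signal, filters):
    """Stack layers: response_l = ReLU(conv(response_{l-1}, filter_l))."""
    response = signal
    for f in filters:
        response = relu(conv1d(response, f))
    return response

signal = [0, 0, 1, 1, 1, 0, 0, 2, 2, 0]
filters = [[1, -1],   # layer 1: step/edge detector
           [1, 1]]    # layer 2: local pooling-like sum
print(forward(signal, filters))  # → [0, 0, 0, 1, 1, 0, 0, 2]
```

Real ConvNets learn the filters from data and stack dozens to hundreds of such layers; the recursion is the same.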
Recent advances in PRML
PRML Ex1: Classification by Deep Learning
PRML Ex2: Face Detection by Boosting
Example: Human face
Viola and Jones, 2000
PRML Ex2: Face Detection by Faster R-CNN
Recently, the results have improved substantially through engineering of the neural networks.
Lecture note, Stat231-CS276A, © S.C.Zhu
PRML Ex3: Face social attributes
Social dimensions of faces by machine learning
© S.C.Zhu
PRML Ex4: Recognizing human pose and attributes by attribute grammar
These structures can be represented by graphs or by grammars.
This is often called syntactic pattern recognition with generative models.
One may view a compiler for a programming language (e.g., Matlab, C) as a syntactic pattern recognition system. A syntactic pattern recognition system not only classifies the input, but also extracts hierarchical (compositional) structures.
Crystal patterns at atomic and molecular levels
In everyday English, "pattern" usually refers to regular, repeated structures. But in pattern recognition, anything that you can perceive is a pattern.
Examples of Patterns
Constellation patterns in the sky are represented by 2D (often planar) graphs.
Finding patterns helps encode the signals.
Human perception has a strong tendency to find patterns in almost anything.
We see (hallucinate) patterns even in random noise (psychological evidence): we are more likely to believe in a hidden pattern than to deny it, because the risk of missing a pattern (or the reward for discovering one) is often high. This is an important aspect of pattern discovery.
From Philippe Schyns and
Nicola van Rijsbergen, 2014
This is formulated in Bayesian decision theory, which considers the risk of classification.
[Figure: random noise images are reported as positives or negatives; the sum of the positives minus the sum of the negatives produces a face pattern.]
Examples of Patterns
Biological patterns are studied in morphology and biometrics (human ID is now an industry).
Landmarks (keypoints) are identified and matched between instances.
Applications include biometrics, computational anatomy, brain mapping, and
forensics (a fingerprint was first used in 1905 to solve a murder case;
fingerprints are now used in all kinds of ID systems and smartphone login).
But for other forms, like the roots of plants, points cannot be registered across
instances. Such forms are described by stochastic models.
Examples of Patterns
Pattern discovery and association: in plain language, a pattern often means
a set of instances associated with (or caused by) some underlying factors.
Statistics show connections between the shape of an adult's face and
his/her character. There is also evidence that the outline of a child's face
is related to alcohol abuse during pregnancy.
With fMRI, we can now look at internal patterns of brain activity and find
relationships between brain activity, cognition, and behavior.
Examples of Patterns
A pattern often exhibits a wide range of variations due to nuisance factors, e.g. for faces:
1. Expression --- geometric deformation
2. Lighting --- photometric deformation
3. 3D pose transform
4. Noise and occlusion
Each pattern corresponds to a set (sometimes a manifold) in the signal space, spanned
by degrees of freedom much smaller in number than the signal dimensions.
The nuisance factors are called attributes when they are "useful", e.g. as social traits.
Examples of Patterns
Detecting or recognizing human faces in the real world is challenging.
We need to consider many factors to build a robust system.
1. Face detection with cooperative subjects has found wide commercial application,
e.g. face detection in cameras and iPhones.
2. Face recognition is becoming more promising, after many companies failed at it.
Face recognition becomes more tractable in constrained environments.
Outdoor lighting variation is the main obstacle for face recognition.
[Image: infrared camera from Stan Li's lab.]
Examples of Patterns
Lecture note, Stat231-CS276A, © S.C.Zhu
Neurons in our brain detect all kinds of patterns. This example is a neuron recorded at the human medial temporal lobe (MTL).
"Invariant visual representation by single neurons in the human brain," Nature, Vol. 435, June 2005.
Classifying human actions, activities, and events in video. Other activities occur at
city level (your mobility pattern recorded by GPS and phone, e.g. for intelligent cities).
Image from Jason Corso, Action Bank.
Some neurons in the pre-motor area of our brain respond to various actions. How do we
encode actions in our brain? See mirror neurons and the origin of language.
Examples of Patterns
How are these texture patterns represented in a human brain or a computer?
Physics-based models vs. phenomenological models vs. example-based models.
A wide variety of texture patterns are generated by various stochastic processes
(chemical, physical, or biological). Do we need to simulate these processes to
represent the patterns?
A pattern can be represented in many ways for different purposes.
Examples of Patterns
Speech signal and hidden Markov model (level I)
An example is the model for speech recognition: people built physical models to simulate the uttering of phonemes. Now this problem is solved more effectively by collecting large corpora of speech to handle accents and variations.
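The forward algorithm is how an HMM scores an observation sequence; a minimal sketch in plain Python, with a made-up two-state model (the states and probabilities below are hypothetical, not real speech parameters):

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: P(observation sequence) under an HMM."""
    # alpha[s] = P(obs[0..t], state_t = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

# Hypothetical two-state model for a toy "phoneme" with observations 'a'/'b'.
states = ('s1', 's2')
start_p = {'s1': 0.6, 's2': 0.4}
trans_p = {'s1': {'s1': 0.7, 's2': 0.3},
           's2': {'s1': 0.4, 's2': 0.6}}
emit_p = {'s1': {'a': 0.9, 'b': 0.1},
          's2': {'a': 0.2, 'b': 0.8}}

print(forward(['a', 'b', 'a'], states, start_p, trans_p, emit_p))  # P(obs) ≈ 0.109
```

A speech recognizer trains one such model per phoneme or word and picks the model that gives the observed acoustic sequence the highest probability.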
Examples of Patterns
Natural language and stochastic grammar (level II).
Syntactic pattern recognition methods were developed in the 1970s for recognizing
patterns that have wide structural variations (i.e., signals of varying dimensions).
Examples of Patterns
(level III)
I didn't update this slide. Nowadays, games are designed with stochastic grammatical models.
Examples of Patterns
Applications of Pattern Recognition
Lie detection,
Handwritten zip code/digit/letter recognition,
Biometrics: voice, iris, fingerprint, face, and gait recognition,
Speech/voice recognition,
Smell recognition (e-nose, sensor networks),
Defect detection in chip manufacturing,
Reading DNA sequences, medical diagnosis,
Detecting spam mail, ...
Levels of Difficulties in Pattern Recognition Tasks
For example, there are many levels of tasks related to human face patterns:
1. Face authentication (hypothesis test for one class)
2. Face detection (yes/no for many instances)
3. Face recognition (classification)
4. Expression recognition (smile, disgust, surprise, anger) --- an identifiability problem
5. Gender and age recognition
--------------------------------------------------------------
6. Face sketch and image-to-cartoon --- needs generative models
7. Face caricature
... ...
The simpler tasks 1-4 may be solved effectively using discriminative methods,
but tasks 5-7 need generative methods that model faces explicitly.
From this example, we can see a generalization problem with discriminative methods.
Example: Art Authentication
A hard example: art and antique authentication. Is a picture drawn by a master or an amateur? In many cases, we only have small data.
(from S. Lyu, D. Rockmore, and H. Farid, 2004)
A multi-dimensional scaling (MDS) technique projects a high-dimensional feature vector to 3D space so as to preserve the similarities (distances).
The circular and rectangular dots correspond to two types of styles.
Lecture note, Stat231-CS276A, © S.C.Zhu
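The projection step can be sketched with classical MDS (eigendecomposition of the double-centered distance matrix); this is an illustrative numpy sketch, not necessarily the exact MDS variant used in the study:

```python
import numpy as np

def classical_mds(D, dim=3):
    """Embed n points in `dim` dimensions so that Euclidean distances
    approximate the given n x n distance matrix D (classical MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]          # keep the top `dim` eigenpairs
    scale = np.sqrt(np.clip(w[idx], 0.0, None))
    return V[:, idx] * scale                 # n x dim embedding

# Toy "features": 4 points on a line, so distances can be matched exactly.
pts = np.array([[0.0], [1.0], [2.0], [4.0]])
D = np.abs(pts - pts.T)
X = classical_mds(D, dim=3)
```

In the art-authentication setting, D would hold pairwise distances between high-dimensional stroke-statistic feature vectors, and the 3D embedding is what gets plotted.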
Two Schools of Thought
1. Discriminative methods:
The goal is to tell apart a number of patterns, say 100 people in a company, or 10 digits
for zip-code reading. These methods hit the discriminative target directly, without
having to understand the patterns (their structures) or to develop a full mathematical
description.
For example, we may tell that someone is speaking English or Chinese in the hallway
without understanding the words he is speaking.
"You should not solve a problem to an extent more than what you need" --- Vapnik.
2. Generative methods: Bayesian school, pattern theory.
1) Define patterns and regularities (in graphical representations)
2) Specify a likelihood model for how signals are generated from hidden structures
3) Learn probability models from ensembles of signals
4) Inference
"If you cannot solve a simple problem in vision, you may have to solve a complex one."
Recently, the two schools have become increasingly integrated, which leads to lifelong continuous learning.
Methods and Research Streams
Methods for pattern recognition:
Axis I: Generative vs. discriminative (Bayesian vs. non-Bayesian)
Axis II: Deterministic vs. stochastic (logic/syntactic/rule-based vs. statistics)
Axis III: Parametric vs. semi-parametric vs. non-parametric (the number of parameters vs. the size of the training set)
Axis IV: Supervised vs. weakly supervised vs. unsupervised
Examples:
Bayesian decision theory, neural networks, syntactical pattern recognition (AI),
decision trees, support vector machines, boosting techniques, deep learning,
generative adversarial networks (GAN)
A Simple Example of Pattern Recognition
Classifying fish into two classes, salmon and sea bass, by a discriminative method
Features and Distributions
Decision/Classification Boundaries
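A minimal sketch of such a discriminative decision, assuming a single "lightness" feature and made-up Gaussian class-conditional densities (Bayes rule with equal priors; the numbers are illustrative, not fitted to real fish data):

```python
import math

def gaussian(x, mu, sigma):
    """Univariate normal density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical class-conditional densities over a 'lightness' feature.
classes = {
    'salmon':   {'prior': 0.5, 'mu': 3.0, 'sigma': 1.0},
    'sea bass': {'prior': 0.5, 'mu': 7.0, 'sigma': 1.0},
}

def classify(x):
    """Pick the class maximizing prior * likelihood (Bayes rule)."""
    return max(classes, key=lambda c: classes[c]['prior'] *
               gaussian(x, classes[c]['mu'], classes[c]['sigma']))

print(classify(2.5))  # → salmon
print(classify(6.8))  # → sea bass
```

With equal priors and variances, the decision boundary sits at the midpoint (here 5.0); unequal priors or risks shift it, which is exactly the "decisions and risks" issue below.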
Main Issues in Pattern Recognition
1. Feature selection and learning.
--- What are good discriminative features?
2. Modeling and learning
3. Dimension reduction, model complexity
4. Decisions and risks
5. Error analysis and validation.
6. Performance bounds and capacity.
7. Algorithms
Definition of Machine Learning
• "Field of study that gives computers the ability to learn without being explicitly programmed." (1959, Arthur Samuel)
• "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (2006, Tom M. Mitchell)
• "Machine learning is programming computers to optimize a performance criterion using example data or past experience." (2010, Ethem Alpaydın)
• This definition is notable for defining machine learning in fundamentally operational rather than cognitive terms, thus following Alan Turing's proposal in his paper "Computing Machinery and Intelligence" that the question "Can machines think?" be replaced with the question "Can machines do what we (as thinking entities) can do?"
Why Study “Learning” ?
• Learning is used when:
– Human expertise does not exist (navigating on Mars)
– Humans are unable to explain their expertise (speech recognition)
– The solution changes over time (routing on a computer network)
– The solution needs to be adapted to particular cases (user biometrics)
• Develop enhanced computer systems
– Automatically adapt to the user, customize
– Necessary knowledge is often difficult to acquire
• Improve understanding of human and biological learning
– Computational analysis provides concrete theory and predictions
– Explosion of methods to analyze brain activity during learning
• The timing is good
– Ever-growing amounts of data available (big data)
– Cheap and powerful computing (GPU, cloud (cluster) computing)
– A suite of algorithms and theory already developed
– Many grand challenges (big money)
Some examples of tasks that are best solved by machine learning
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual sequences of credit card transactions
– Unusual patterns of sensor readings in a nuclear power plant or unusual sound in your car engine.
• Prediction:
– Future stock prices or currency exchange rates
Applications by Machine Learning
Classifying DNA sequences
Sequence mining
Speech and handwriting recognition
Game playing
Software engineering
Adaptive websites
Robot locomotion
Computational advertising
Computational finance
Structural health monitoring
Sentiment analysis (or opinion mining)
Affective computing
Information retrieval
Recommender systems
Machine Perception
Computer vision, including object
recognition
Natural language processing
Syntactic pattern recognition
Search engines
Medical diagnosis
Bioinformatics
Brain-machine interfaces
Cheminformatics
Detecting credit card fraud
Stock market analysis
Machine Learning & Statistics
• A lot of machine learning is just a rediscovery of things that statisticians already knew. This is often disguised by differences in terminology:
– Ridge regression = weight-decay
– Fitting = learning
– Held-out data = test data
• But the emphasis is very different:
– A good piece of statistics: Clever proof that a relatively simple estimation procedure is asymptotically unbiased.
– A good piece of machine learning: Demonstration that a complicated algorithm produces impressive results on a specific task.
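The "ridge regression = weight-decay" entry can be checked numerically: the closed-form ridge solution is exactly the minimizer of squared error plus the weight-decay penalty λ‖w‖². A numpy sketch on made-up data:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def weight_decay_loss(w, X, y, lam):
    """Squared error plus the weight-decay penalty lam * ||w||^2."""
    return np.sum((X @ w - y) ** 2) + lam * np.sum(w ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
w = ridge(X, y, lam=0.5)
# w minimizes the weight-decay objective: nudging it only increases the loss.
```

The statistician's estimator and the neural-network regularizer are literally the same objective, which is the terminological point of the table above.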
Machine Learning vs Statistics
Machine Learning                          Statistics
• network, graphs                         • model
• weights                                 • parameters
• learning                                • fitting
• generalization                          • test set performance
• supervised learning                     • regression/classification
• unsupervised learning                   • density estimation, clustering
• large grant: $1,000,000                 • large grant: $50,000
• conference location:                    • conference location:
  Snowbird, French Alps                     Las Vegas in August
Types of Learning Task
• Supervised learning: classification, regression, prediction
– Learn to predict the output when given an input vector
• Who provides the correct answer?
• Unsupervised learning: clustering, density estimation
– Create an internal representation of the input, e.g. form clusters, extract features, compress data, detect outliers
• How do we know if a representation is good?
– This is the new frontier of machine learning, because most big datasets do not come with labels.
• Reinforcement learning
– Learn actions to maximize payoff
• There is not much information in a payoff signal
• Payoff is often delayed
– Reinforcement learning is an important area that will not be covered in this course.
Simple Applications of Learning
• Association
• Supervised Learning
– Classification
– Regression
– Prediction
• Unsupervised Learning
• Reinforcement Learning
Learning Associations
• Basket analysis:
P(Y | X): the probability that somebody who buys X also buys Y, where X and Y are products/services.
Example: P(chips | beer) = 0.7
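Such a conditional probability can be estimated directly from transaction counts; a minimal sketch with toy baskets (the products and numbers below are made up):

```python
def conditional(transactions, x, y):
    """Estimate P(y | x): among baskets containing x, the fraction also containing y."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)

baskets = [
    {'beer', 'chips'}, {'beer', 'chips', 'salsa'}, {'beer', 'diapers'},
    {'beer', 'chips'}, {'beer'}, {'milk', 'chips'},
]
print(conditional(baskets, 'beer', 'chips'))  # → 0.6
```

Real basket analysis also filters by support (how often x occurs at all) so that rare items do not produce spuriously confident rules.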
Classification
• Example: credit scoring
• Differentiating between low-risk and high-risk customers from their income and savings
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
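The discriminant above is literally a two-threshold rule; a minimal sketch with hypothetical thresholds θ1, θ2 (the values are made up for illustration):

```python
# Hypothetical thresholds; in practice theta1, theta2 are learned from data.
THETA1 = 30_000   # income threshold (made-up)
THETA2 = 10_000   # savings threshold (made-up)

def credit_risk(income, savings):
    """IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    return 'low-risk' if income > THETA1 and savings > THETA2 else 'high-risk'

print(credit_risk(45_000, 20_000))  # → low-risk
print(credit_risk(45_000, 5_000))   # → high-risk
```

Geometrically, the rule carves an axis-aligned rectangle out of the (income, savings) plane; learning replaces the hand-set thresholds with ones fitted to labeled customers.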
Classification: Applications
• Aka Pattern recognition
• Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style
• Character recognition: Different handwriting styles.
• Speech recognition: Temporal dependency.
• Medical diagnosis: From symptoms to illnesses
• Biometrics: Recognition/authentication using physical and/orbehavioral characteristics: Face, iris, signature, etc
• ...
Regression
• Example: price of a used car
• x: car attributes
y: price
y = g(x | θ)
g(·): model
θ: parameters
y = wx + w0
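Fitting the linear model y = wx + w0 by least squares has a closed form; a minimal sketch with made-up car-price data (x = age in years, y = price, purely illustrative):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w*x + w0 (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return w, my - w * mx

# Hypothetical used-car data: x = age (years), y = price.
xs = [1, 2, 3, 4, 5]
ys = [9.0, 8.1, 6.9, 6.0, 5.0]
w, w0 = fit_line(xs, ys)
print(w, w0)  # slope ≈ -1.01, intercept ≈ 10.03
```

The same closed form generalizes to multiple attributes (multivariate regression) by solving the normal equations.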
Regression Applications
• Navigating a car: angle of the steering wheel
• Kinematics of a robot arm
α1 = g1(x, y)
α2 = g2(x, y)
(joint angles α1, α2 for reaching the point (x, y))
• Response surface design
Temporal Prediction
• Goal: perform classification/regression on new input sequence values at future time points, given input values and corresponding class labels/outputs at some previous time points
Unsupervised Learning
• Learning “what normally happens”
• No output
• Clustering: Grouping similar instances
• Example applications
– Customer segmentation in CRM
– Image compression: Color quantization
– Bioinformatics: Learning motifs
Unsupervised Learning (Cont’d)
• Clustering
– Inputs are vectors or categorical
– Goal: group data cases into a finite number of clusters so that within each cluster all cases have very similar inputs
• Compression
– Inputs are typically vectors
– Goal: deliver an encoder and decoder such that the size of the encoder's output is much smaller than the original input, but the composition of encoder followed by decoder is very similar to the original input
• Outlier detection
– Inputs are anything
– Goal: select highly unusual cases from new and given data
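The clustering goal can be sketched with Lloyd's k-means algorithm (plain Python, 1-D toy data, with a simple spread-out initialization rather than random restarts):

```python
def kmeans(points, k, iters=20):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    # Simple deterministic init: spread starting centers across the sorted data.
    step = max(1, len(points) // k)
    centers = sorted(points)[::step][:k]
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 8.8, 9.0]
print(kmeans(data, 3))  # three centers near 1.0, 5.07, 8.97
```

The same alternation underlies color quantization for image compression: pixels are the points, and the final centers become the palette.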
Reinforcement Learning
• Learning a policy: a sequence of outputs
• No supervised output, but delayed reward
• Credit assignment problem
• Game playing
• Robot in a maze
• Multiple agents, partial observability, ...
Resources: Datasets
• UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
• UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
• Statlib: http://lib.stat.cmu.edu/
• Delve: http://www.cs.utoronto.ca/~delve/
• ImageNet: http://www.image-net.org
• COCO dataset: http://cocodataset.org
• KITTI dataset: http://www.cvlibs.net/datasets/kitti/
• CMU Multi PIE Face dataset: https://old.datahub.io/dataset/multipie
Resources: Journals
• Journal of Machine Learning Research www.jmlr.org
• Machine Learning
• Neural Computation
• Neural Networks
• IEEE Transactions on Neural Networks
• IEEE Transactions on Pattern Analysis and Machine Intelligence
• Pattern Recognition
• International Journal of Computer Vision
• Annals of Statistics
• Journal of the American Statistical Association
• ...
Resources: Conferences
• International Conference on Machine Learning (ICML)
• European Conference on Machine Learning (ECML)
• Neural Information Processing Systems (NIPS)
• Uncertainty in Artificial Intelligence (UAI)
• Computational Learning Theory (COLT)
• International Conference on Artificial Neural Networks (ICANN)
• Computer Vision and Pattern Recognition (CVPR)
• International Conference on Computer Vision (ICCV)
• International Conference on AI & Statistics (AISTATS)
• International Conference on Pattern Recognition (ICPR)
• ...