CSED514 SPRING 2019
Pattern Recognition
Lecture 0: Introduction
Department of Computer Science and Engineering
Pohang University of Science and Technology
77 Cheongam-ro, Nam-gu, Pohang 790-784, Korea
Course Information
• Lecture Time and Place: 15:30-16:45, Tue & Thu, #107, Eng. Building II
• Office Hours: 16:45-18:00, Tue & Thu
• Textbooks
– Christopher Bishop, Pattern Recognition and Machine Learning, 2006.
– I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.
• Required Background
– Linear algebra: vector/matrix manipulations and properties
– Calculus: partial derivatives
– Probability: probability distributions, Bayes' rule
– Statistics: mean and variance, maximum likelihood
– Matlab: basic programming skills
• Grades (do the readings!)
– Homework Assignments 25%: derivations, pen(cil) and paper
– Programming Assignment 50%: do not cheat!!
– Final Exam 50%: do your best.
Course Topics
1) Introduction,
2) Probability Distributions,
3) Linear Models for Regression,
4) Linear Models for Classification,
5) Neural Networks,
6) Kernel Methods,
7) Sparse Kernel Machines,
8) Graphical Models (Bayesian Networks),
9) Mixture Models (Clustering) and EM,
10) Approximate Inference,
11) Sampling Methods,
12) Continuous Latent Variables (PCA),
13) Sequential Data (HMM),
14) Combining Models.
Recent advances in PRML
[Figure: IBM Watson for Jeopardy (1. understands human speech; 2. searches and evaluates hypotheses; 3. learns from user selections) and the Google driverless car.]
Still, many obstacles remain in high-level cognition.
Lecture note, Stat231-CS276A, © S.C.Zhu
ImageNet Challenges (Russakovsky and Deng, IJCV 2015):
200 object classes for detection
1000 classes for classification
[Plot: top-five accuracy for object sizes in the real world.]
This dataset rejuvenated interest in multi-layer neural networks (deep learning).
Recent advances in PRML
ConvNets recursively define filter responses layer by layer.
The 152-layer ResNet model is 0.5-1 GB!
K. He, X. Zhang, S. Ren, J. Sun, Microsoft Research Asia, CVPR 2016
[Plot: classification performance.]
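The layer-by-layer idea can be illustrated with a toy sketch (plain Python, hand-picked 1-D filters; this only illustrates stacked convolution + ReLU, not the actual ResNet architecture):

```python
# Toy sketch of layered filter responses: each layer convolves the
# previous layer's output with a small filter, then applies ReLU.
# The filters below are hand-picked for illustration, not learned.

def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation) of signal with kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    return [max(0, x) for x in xs]

def forward(signal, filters):
    """Stack layers: response_l = ReLU(conv(response_{l-1}, filter_l))."""
    response = signal
    for f in filters:
        response = relu(conv1d(response, f))
    return response

signal = [0, 0, 1, 1, 1, 0, 0, 2, 2, 0]
filters = [[1, -1],   # layer 1: step/edge detector
           [1, 1]]    # layer 2: local pooling-like sum
print(forward(signal, filters))  # → [0, 0, 0, 1, 1, 0, 0, 2]
```

Real ConvNets learn the filters from data and stack dozens to hundreds of such layers; the recursion is the same.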
Recent advances in PRML
PRML Ex1: Classification by Deep Learning
PRML Ex2: Face Detection by Boosting
Example: Human face
Viola and Jones, 2000
PRML Ex2: Face Detection by Faster R-CNN
Recently, the results have improved substantially through engineering of the neural networks.
Lecture note, Stat231-CS276A, © S.C.Zhu
PRML Ex3: Face social attributes
Social dimensions of faces by machine learning
© S.C.Zhu
PRML Ex4: Recognizing human pose and attributes by attribute grammar
These structures can be represented by graphs or by grammars.
This is often called syntactic pattern recognition with generative models.
One may view a compiler for a programming language (e.g., Matlab, C) as a syntactic pattern recognition system. A syntactic pattern recognition system not only classifies the input, but also extracts hierarchical (compositional) structures.
Crystal patterns at atomic and molecular levels
In everyday English, "pattern" usually refers to regular, repeated structures. But in pattern recognition, anything that you can perceive is a pattern.
Examples of Patterns
Constellation patterns in the sky are represented by 2D (often planar) graphs.
Finding patterns helps encode the signals.
Human perception has a strong tendency to find patterns in almost anything.
We see (hallucinate) patterns even in random noise (psychological evidence): we are more likely to believe in a hidden pattern than to deny it, because the risk of missing a pattern (or the reward for discovering one) is often high. This is an important aspect of pattern discovery.
From Philippe Schyns and
Nicola van Rijsbergen, 2014
This is formulated in Bayesian decision theory, which considers the risk of classification.
[Figure: random noise images are reported as positives or negatives; the sum of the positives minus the sum of the negatives produces a face pattern.]
Examples of Patterns
Biological patterns are studied in morphology and biometrics (human ID is now an industry).
Landmarks (keypoints) are identified and matched between instances.
Applications include biometrics, computational anatomy, brain mapping, and
forensics (a fingerprint was first used in 1905 to solve a murder case;
fingerprints are now used in all kinds of ID systems and smartphone login).
But for other forms, like the roots of plants, points cannot be registered across
instances. Such forms are described by stochastic models.
Examples of Patterns
Pattern discovery and association: in plain language, a pattern often means
a set of instances associated with (or caused by) some underlying factors.
Statistics show connections between the shape of an adult's face and
his/her character. There is also evidence that the outline of a child's face
is related to alcohol abuse during pregnancy.
With fMRI, we can now look at internal patterns of brain activity and find
relationships between brain activity, cognition, and behavior.
Examples of Patterns
A pattern often exhibits a wide range of variations due to nuisance factors, e.g. for faces:
1. Expression --- geometric deformation
2. Lighting --- photometric deformation
3. 3D pose transform
4. Noise and occlusion
Each pattern corresponds to a set (sometimes a manifold) in the signal space, spanned
by degrees of freedom much smaller in number than the signal dimensions.
The nuisance factors are called attributes when they are "useful", e.g. as social traits.
Examples of Patterns
Detecting or recognizing human faces in the real world is challenging.
We need to consider many factors to build a robust system.
1. Face detection with cooperative subjects has found wide commercial application,
e.g. face detection in cameras and iPhones.
2. Face recognition is becoming more promising, after many companies failed at it.
Face recognition becomes more tractable in constrained environments.
Outdoor lighting variation is the main obstacle for face recognition.
[Image: infrared camera from Stan Li's lab.]
Examples of Patterns
Lecture note, Stat231-CS276A, © S.C.Zhu
Neurons in our brain detect all kinds of patterns. This example is a neuron recorded at the human medial temporal lobe (MTL).
"Invariant visual representation by single neurons in the human brain," Nature, Vol. 435, June 2005.
Classifying human actions, activities, and events in video. Other activities occur at
city level (your mobility pattern recorded by GPS and phone, e.g. for intelligent cities).
Image from Jason Corso, Action Bank.
Some neurons in the pre-motor area of our brain respond to various actions. How do we
encode actions in our brain? See mirror neurons and the origin of language.
Examples of Patterns
How are these texture patterns represented in a human brain or a computer?
Physics-based models vs. phenomenological models vs. example-based models.
A wide variety of texture patterns are generated by various stochastic processes
(chemical, physical, or biological). Do we need to simulate these processes to
represent the patterns?
A pattern can be represented in many ways for different purposes.
Examples of Patterns
Speech signal and hidden Markov model (level I)
An example is the model for speech recognition: people built physical models to simulate the uttering of phonemes. Now this problem is solved more effectively by collecting large corpora of speech to handle accents and variations.
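The forward algorithm is how an HMM scores an observation sequence; a minimal sketch in plain Python, with a made-up two-state model (the states and probabilities below are hypothetical, not real speech parameters):

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: P(observation sequence) under an HMM."""
    # alpha[s] = P(obs[0..t], state_t = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

# Hypothetical two-state model for a toy "phoneme" with observations 'a'/'b'.
states = ('s1', 's2')
start_p = {'s1': 0.6, 's2': 0.4}
trans_p = {'s1': {'s1': 0.7, 's2': 0.3},
           's2': {'s1': 0.4, 's2': 0.6}}
emit_p = {'s1': {'a': 0.9, 'b': 0.1},
          's2': {'a': 0.2, 'b': 0.8}}

print(forward(['a', 'b', 'a'], states, start_p, trans_p, emit_p))  # P(obs) ≈ 0.109
```

A speech recognizer trains one such model per phoneme or word and picks the model that gives the observed acoustic sequence the highest probability.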
Examples of Patterns
Natural language and stochastic grammar (level II).
Syntactic pattern recognition methods were developed in the 1970s for recognizing
patterns that have wide structural variations (i.e., signals of varying dimensions).
Examples of Patterns
(level III)
I didn't update this slide. Nowadays, games are designed with stochastic grammatical models.
Examples of Patterns
Applications of Pattern Recognition
Lie detection,
Handwritten zip code/digit/letter recognition,
Biometrics: voice, iris, fingerprint, face, and gait recognition,
Speech/voice recognition,
Smell recognition (e-nose, sensor networks),
Defect detection in chip manufacturing,
Reading DNA sequences, medical diagnosis,
Detecting spam mail, ...
Levels of Difficulties in Pattern Recognition Tasks
For example, there are many levels of tasks related to human face patterns:
1. Face authentication (hypothesis test for one class)
2. Face detection (yes/no for many instances)
3. Face recognition (classification)
4. Expression recognition (smile, disgust, surprise, anger) --- an identifiability problem
5. Gender and age recognition
--------------------------------------------------------------
6. Face sketch and image-to-cartoon --- needs generative models
7. Face caricature
... ...
The simpler tasks 1-4 may be solved effectively using discriminative methods,
but tasks 5-7 need generative methods that model faces explicitly.
From this example, we can see a generalization problem with discriminative methods.
Example: Art Authentication
A hard example: art and antique authentication. Is a picture drawn by a master or an amateur? In many cases, we only have small data.
(from S. Lyu, D. Rockmore, and H. Farid, 2004)
A multi-dimensional scaling (MDS) technique projects a high-dimensional feature vector to 3D space so as to preserve the similarities (distances).
The circular and rectangular dots correspond to two types of styles.
Lecture note, Stat231-CS276A, © S.C.Zhu
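The projection step can be sketched with classical MDS (eigendecomposition of the double-centered distance matrix); this is an illustrative numpy sketch, not necessarily the exact MDS variant used in the study:

```python
import numpy as np

def classical_mds(D, dim=3):
    """Embed n points in `dim` dimensions so that Euclidean distances
    approximate the given n x n distance matrix D (classical MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]          # keep the top `dim` eigenpairs
    scale = np.sqrt(np.clip(w[idx], 0.0, None))
    return V[:, idx] * scale                 # n x dim embedding

# Toy "features": 4 points on a line, so distances can be matched exactly.
pts = np.array([[0.0], [1.0], [2.0], [4.0]])
D = np.abs(pts - pts.T)
X = classical_mds(D, dim=3)
```

In the art-authentication setting, D would hold pairwise distances between high-dimensional stroke-statistic feature vectors, and the 3D embedding is what gets plotted.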
Two Schools of Thought
1. Discriminative methods:
The goal is to tell apart a number of patterns, say 100 people in a company, or 10 digits
for zip-code reading. These methods hit the discriminative target directly, without
having to understand the patterns (their structures) or to develop a full mathematical
description.
For example, we may tell that someone is speaking English or Chinese in the hallway
without understanding the words he is speaking.
"You should not solve a problem to an extent more than what you need" --- Vapnik.
2. Generative methods: Bayesian school, pattern theory.
1) Define patterns and regularities (in graphical representations)
2) Specify a likelihood model for how signals are generated from hidden structures
3) Learn probability models from ensembles of signals
4) Inference
"If you cannot solve a simple problem in vision, you may have to solve a complex one."
Recently, the two schools have become increasingly integrated, which leads to lifelong continuous learning.
Methods and Research Streams
Methods for pattern recognition:
Axis I: Generative vs. discriminative (Bayesian vs. non-Bayesian)
Axis II: Deterministic vs. stochastic (logic/syntactic/rule-based vs. statistics)
Axis III: Parametric vs. semi-parametric vs. non-parametric (the number of parameters vs. the size of the training set)
Axis IV: Supervised vs. weakly supervised vs. unsupervised
Examples:
Bayesian decision theory, neural networks, syntactical pattern recognition (AI),
decision trees, support vector machines, boosting techniques, deep learning,
generative adversarial networks (GAN)
A Simple Example of Pattern Recognition
Classifying fish into two classes, salmon and sea bass, by a discriminative method
Features and Distributions
Decision/Classification Boundaries
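A minimal sketch of such a discriminative decision, assuming a single "lightness" feature and made-up Gaussian class-conditional densities (Bayes rule with equal priors; the numbers are illustrative, not fitted to real fish data):

```python
import math

def gaussian(x, mu, sigma):
    """Univariate normal density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical class-conditional densities over a 'lightness' feature.
classes = {
    'salmon':   {'prior': 0.5, 'mu': 3.0, 'sigma': 1.0},
    'sea bass': {'prior': 0.5, 'mu': 7.0, 'sigma': 1.0},
}

def classify(x):
    """Pick the class maximizing prior * likelihood (Bayes rule)."""
    return max(classes, key=lambda c: classes[c]['prior'] *
               gaussian(x, classes[c]['mu'], classes[c]['sigma']))

print(classify(2.5))  # → salmon
print(classify(6.8))  # → sea bass
```

With equal priors and variances, the decision boundary sits at the midpoint (here 5.0); unequal priors or risks shift it, which is exactly the "decisions and risks" issue below.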
Main Issues in Pattern Recognition
1. Feature selection and learning.
--- What are good discriminative features?
2. Modeling and learning
3. Dimension reduction, model complexity
4. Decisions and risks
5. Error analysis and validation.
6. Performance bounds and capacity.
7. Algorithms
Definition of Machine Learning
• "Field of study that gives computers the ability to learn without being explicitly programmed." (1959, Arthur Samuel)
• "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (2006, Tom M. Mitchell)
• "Machine learning is programming computers to optimize a performance criterion using example data or past experience." (2010, Ethem Alpaydın)
• This definition is notable for defining machine learning in fundamentally operational rather than cognitive terms, thus following Alan Turing's proposal in his paper "Computing Machinery and Intelligence" that the question "Can machines think?" be replaced with the question "Can machines do what we (as thinking entities) can do?"
Why Study “Learning” ?
• Learning is used when:
– Human expertise does not exist (navigating on Mars)
– Humans are unable to explain their expertise (speech recognition)
– The solution changes over time (routing on a computer network)
– The solution needs to be adapted to particular cases (user biometrics)
• Develop enhanced computer systems
– Automatically adapt to the user, customize
– Necessary knowledge is often difficult to acquire
• Improve understanding of human and biological learning
– Computational analysis provides concrete theory and predictions
– Explosion of methods to analyze brain activity during learning
• The timing is good
– Ever-growing amounts of data available (big data)
– Cheap and powerful computing (GPU, cloud (cluster) computing)
– A suite of algorithms and theory already developed
– Many grand challenges (big money)
Some examples of tasks that are best solved by machine learning
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual sequences of credit card transactions
– Unusual patterns of sensor readings in a nuclear power plant or unusual sound in your car engine.
• Prediction:
– Future stock prices or currency exchange rates
Applications by Machine Learning
Classifying DNA sequences
Sequence mining
Speech and handwriting recognition
Game playing
Software engineering
Adaptive websites
Robot locomotion
Computational advertising
Computational finance
Structural health monitoring
Sentiment analysis (or opinion mining)
Affective computing
Information retrieval
Recommender systems
Machine Perception
Computer vision, including object
recognition
Natural language processing
Syntactic pattern recognition
Search engines
Medical diagnosis
Bioinformatics
Brain-machine interfaces
Cheminformatics
Detecting credit card fraud
Stock market analysis
Machine Learning & Statistics
• A lot of machine learning is just a rediscovery of things that statisticians already knew. This is often disguised by differences in terminology:
– Ridge regression = weight-decay
– Fitting = learning
– Held-out data = test data
• But the emphasis is very different:
– A good piece of statistics: Clever proof that a relatively simple estimation procedure is asymptotically unbiased.
– A good piece of machine learning: Demonstration that a complicated algorithm produces impressive results on a specific task.
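The "ridge regression = weight-decay" entry can be checked numerically: the closed-form ridge solution is exactly the minimizer of squared error plus the weight-decay penalty λ‖w‖². A numpy sketch on made-up data:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def weight_decay_loss(w, X, y, lam):
    """Squared error plus the weight-decay penalty lam * ||w||^2."""
    return np.sum((X @ w - y) ** 2) + lam * np.sum(w ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
w = ridge(X, y, lam=0.5)
# w minimizes the weight-decay objective: nudging it only increases the loss.
```

The statistician's estimator and the neural-network regularizer are literally the same objective, which is the terminological point of the table above.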
Machine Learning vs Statistics
Machine Learning                          Statistics
• network, graphs                         • model
• weights                                 • parameters
• learning                                • fitting
• generalization                          • test set performance
• supervised learning                     • regression/classification
• unsupervised learning                   • density estimation, clustering
• large grant: $1,000,000                 • large grant: $50,000
• conference location:                    • conference location:
  Snowbird, French Alps                     Las Vegas in August
Types of Learning Task
• Supervised learning: classification, regression, prediction
– Learn to predict the output when given an input vector
• Who provides the correct answer?
• Unsupervised learning: clustering, density estimation
– Create an internal representation of the input, e.g. form clusters, extract features, compress data, detect outliers
• How do we know if a representation is good?
– This is the new frontier of machine learning, because most big datasets do not come with labels.
• Reinforcement learning
– Learn actions to maximize payoff
• There is not much information in a payoff signal
• Payoff is often delayed
– Reinforcement learning is an important area that will not be covered in this course.
Simple Applications of Learning
• Association
• Supervised Learning
– Classification
– Regression
– Prediction
• Unsupervised Learning
• Reinforcement Learning
Learning Associations
• Basket analysis:
P(Y | X): the probability that somebody who buys X also buys Y, where X and Y are products/services.
Example: P(chips | beer) = 0.7
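Such a conditional probability can be estimated directly from transaction counts; a minimal sketch with toy baskets (the products and numbers below are made up):

```python
def conditional(transactions, x, y):
    """Estimate P(y | x): among baskets containing x, the fraction also containing y."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)

baskets = [
    {'beer', 'chips'}, {'beer', 'chips', 'salsa'}, {'beer', 'diapers'},
    {'beer', 'chips'}, {'beer'}, {'milk', 'chips'},
]
print(conditional(baskets, 'beer', 'chips'))  # → 0.6
```

Real basket analysis also filters by support (how often x occurs at all) so that rare items do not produce spuriously confident rules.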
Classification
• Example: credit scoring
• Differentiating between low-risk and high-risk customers from their income and savings
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
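The discriminant above is literally a two-threshold rule; a minimal sketch with hypothetical thresholds θ1, θ2 (the values are made up for illustration):

```python
# Hypothetical thresholds; in practice theta1, theta2 are learned from data.
THETA1 = 30_000   # income threshold (made-up)
THETA2 = 10_000   # savings threshold (made-up)

def credit_risk(income, savings):
    """IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    return 'low-risk' if income > THETA1 and savings > THETA2 else 'high-risk'

print(credit_risk(45_000, 20_000))  # → low-risk
print(credit_risk(45_000, 5_000))   # → high-risk
```

Geometrically, the rule carves an axis-aligned rectangle out of the (income, savings) plane; learning replaces the hand-set thresholds with ones fitted to labeled customers.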
Classification: Applications
• Aka Pattern recognition
• Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style
• Character recognition: Different handwriting styles.
• Speech recognition: Temporal dependency.
• Medical diagnosis: From symptoms to illnesses
• Biometrics: Recognition/authentication using physical and/orbehavioral characteristics: Face, iris, signature, etc
• ...
Regression
• Example: price of a used car
• x: car attributes
y: price
y = g(x | θ)
g(·): model
θ: parameters
y = wx + w0
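Fitting the linear model y = wx + w0 by least squares has a closed form; a minimal sketch with made-up car-price data (x = age in years, y = price, purely illustrative):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w*x + w0 (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return w, my - w * mx

# Hypothetical used-car data: x = age (years), y = price.
xs = [1, 2, 3, 4, 5]
ys = [9.0, 8.1, 6.9, 6.0, 5.0]
w, w0 = fit_line(xs, ys)
print(w, w0)  # slope ≈ -1.01, intercept ≈ 10.03
```

The same closed form generalizes to multiple attributes (multivariate regression) by solving the normal equations.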
Regression Applications
• Navigating a car: angle of the steering wheel
• Kinematics of a robot arm
α1 = g1(x, y)
α2 = g2(x, y)
(joint angles α1, α2 for reaching the point (x, y))
• Response surface design
Temporal Prediction
• Goal: perform classification/regression on new input sequence values at future time points, given input values and corresponding class labels/outputs at some previous time points
Unsupervised Learning
• Learning “what normally happens”
• No output
• Clustering: Grouping similar instances
• Example applications
– Customer segmentation in CRM
– Image compression: Color quantization
– Bioinformatics: Learning motifs
Unsupervised Learning (Cont’d)
• Clustering
– Inputs are vectors or categorical
– Goal: group data cases into a finite number of clusters so that within each cluster all cases have very similar inputs
• Compression
– Inputs are typically vectors
– Goal: deliver an encoder and decoder such that the size of the encoder's output is much smaller than the original input, but the composition of encoder followed by decoder is very similar to the original input
• Outlier detection
– Inputs are anything
– Goal: select highly unusual cases from new and given data
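The clustering goal can be sketched with Lloyd's k-means algorithm (plain Python, 1-D toy data, with a simple spread-out initialization rather than random restarts):

```python
def kmeans(points, k, iters=20):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    # Simple deterministic init: spread starting centers across the sorted data.
    step = max(1, len(points) // k)
    centers = sorted(points)[::step][:k]
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 8.8, 9.0]
print(kmeans(data, 3))  # three centers near 1.0, 5.07, 8.97
```

The same alternation underlies color quantization for image compression: pixels are the points, and the final centers become the palette.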
Reinforcement Learning
• Learning a policy: a sequence of outputs
• No supervised output, but delayed reward
• Credit assignment problem
• Game playing
• Robot in a maze
• Multiple agents, partial observability, ...
Resources: Datasets
• UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
• UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
• Statlib: http://lib.stat.cmu.edu/
• Delve: http://www.cs.utoronto.ca/~delve/
• ImageNet: http://www.image-net.org
• COCO dataset: http://cocodataset.org
• KITTI dataset: http://www.cvlibs.net/datasets/kitti/
• CMU Multi PIE Face dataset: https://old.datahub.io/dataset/multipie
Resources: Journals
• Journal of Machine Learning Research www.jmlr.org
• Machine Learning
• Neural Computation
• Neural Networks
• IEEE Transactions on Neural Networks
• IEEE Transactions on Pattern Analysis and Machine Intelligence
• Pattern Recognition
• International Journal of Computer Vision
• Annals of Statistics
• Journal of the American Statistical Association
• ...
Resources: Conferences
• International Conference on Machine Learning (ICML)
• European Conference on Machine Learning (ECML)
• Neural Information Processing Systems (NIPS)
• Uncertainty in Artificial Intelligence (UAI)
• Computational Learning Theory (COLT)
• International Conference on Artificial Neural Networks (ICANN)
• Computer Vision and Pattern Recognition (CVPR)
• International Conference on Computer Vision (ICCV)
• International Conference on AI & Statistics (AISTATS)
• International Conference on Pattern Recognition (ICPR)
• ...