ECE 627 – Computer Vision
Spring 2017
Lecture 9: Pattern Recognition and Classification Algorithms
Charis Theocharides
Associate Professor, Dept. of Electrical and Computer Engineering
University of Cyprus
Semester Project (DUE MAY 20th!)
• Motion Detection and Estimation / Optical Flow
• Active Contour Model / Snakes
• IP Camera Intruder Detection System for Surveillance
• Face Recognition on Mobile Phones (Android or iOS)
• 3D Reconstruction from Multiview Images
• Raspberry-Pi-based Drone Object Recognition and Identification
• Intel Compute Stick (as above)
• OpenCV Projects on Jetson TK1 (face recognition, car recognition)
• Kinect-based Motion Recognition
• Gesture Recognition on Leap Motion Sensor
• Aerial Object Detection of MOVING Objects
• Movable Object Tracking from a Movable Camera
• Goal-line Optical Technology for Sports Using Multiview Cameras and Real-time Reconstruction
• Pedestrian vs. Animal vs. Car/Truck Classification for Driver Assistance
• SLAM (Robotics)
• License Plate Recognition
• Road Sign Recognition
• Road-Line Detection and Tracking
• Face Expression Recognition
• Handwritten Character Recognition
• Top-view Object Detection (Cars, Buildings) from Google Maps Images (also maybe landmark recognition)
• Hand Gesture Recognition
• Food Recognition (see "On Filter Banks of Texture Features for Mobile Food Classification")
INDEPENDENT STUDY
• Each one of you will do a review
• Submit a 15-page report by the end of the semester – MAY 20th!
• The review should relate to the work you will do for the project
• Present your knowledge and review on a topic of your choice
• Suggested topics:
  – Object Recognition
  – Classification
  – Region Segmentation
  – Motion Detection
  – Gesture and Motion Recognition
  – Contours and Edges
  – Tools/Software (OpenCV)
  – Algorithms (Viola-Jones, SURF, etc.)
COURSE CONTENTS
• Introduction to Computer Vision
• Image Fundamentals: Cameras, Lenses and Optical Sensors, Data Acquisition and Representation, Radiometry & Reflectance
• Image Formation: Sources, Shading, Colour, Metadata
• Linear Filters & Edges, Lines, Textures, Pyramids
• Segmentation: Transforms, Contours, Feature Extraction
• Optical Flow, Silhouettes, Contours, Motion Vectors
• Motion - Continuous and Discrete
• Recognition Algorithms and Introduction to Computational Intelligence for Vision
• Template Matching and Recognition (Classifiers, Neural Nets, SVM, ...)
• Object Detection, Recognition, Tracking
• Epipolar Geometry, Multiple View Geometry and Stereo Matching, Calibration
• 3D Vision - Stereo/Multiview, Structured Light Approaches, Other 3D Approaches
• Embedded and Mobile Computer Vision - Concepts, Constraints, Approaches and Solutions in Emerging Applications
Object Detection – Object Identification – Object Recognition
• Object detection: Is there a face in the image? Where is a face? Where is Jane?
• Object identification: Who is it? Is it Jane or Erik?
What is pattern recognition?
• A pattern is an object, process or event that can be given a name.
• A pattern class (or category) is a set of patterns sharing common attributes and usually originating from the same source.
• During recognition (or classification) given objects are assigned to prescribed classes.
• A classifier is a machine which performs classification.
“The assignment of a physical object or event to one of several prespecified categories” -- Duda & Hart
Examples of applications
• Optical Character Recognition (OCR)
  – Handwritten: sorting letters by postal code, input device for PDAs.
  – Printed texts: reading machines for blind people, digitization of text documents.
• Biometrics
  – Face recognition, verification, retrieval.
  – Fingerprint recognition.
  – Speech recognition.
• Diagnostic systems
  – Medical diagnosis: X-ray, EKG analysis.
  – Machine diagnostics, waster detection.
• Military applications
  – Automated Target Recognition (ATR).
  – Image segmentation and analysis (recognition from aerial or satellite photographs).
Approaches
• Statistical PR: based on an underlying statistical model of patterns and pattern classes.
• Structural (or syntactic) PR: pattern classes represented by means of formal structures such as grammars, automata, strings, etc.
• Neural networks: the classifier is represented as a network of cells modeling neurons of the human brain (connectionist approach).
Overfitting and underfitting
Problem: how rich a class of classifiers q(x; θ) to use.
[Figure: three fits of the same data — underfitting / good fit / overfitting]
Problem of generalization: a small empirical risk R_emp does not imply a small true expected risk R.
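The gap between empirical and true risk can be seen in a toy experiment. Below is a minimal sketch (Python/NumPy; the data, noise level, and polynomial degrees are illustrative assumptions): a degree-1 fit underfits, a moderate degree fits well, and a very high degree drives the training error toward zero while fitting the noise.

```python
import numpy as np

# Toy 1-D regression data: noisy samples of a smooth function.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

# Fit polynomials of increasing degree and compare training error:
# low degree underfits, moderate degree fits well, high degree
# drives the empirical risk R_emp toward zero while overfitting.
for degree in (1, 4, 15):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    emp_risk = np.mean((y - y_hat) ** 2)   # empirical risk R_emp
    print(f"degree {degree:2d}: training MSE = {emp_risk:.4f}")
```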
Basic concepts
Feature vector x = [x_1, x_2, ..., x_n]^T, x ∈ X
- A vector of observations (measurements).
- x is a point in the feature space X.
Hidden state y ∈ Y
- Cannot be directly measured.
- Patterns with equal hidden state belong to the same class.
Task: to design a classifier (decision rule) q: X → Y which decides about a hidden state based on an observation.
Example
Task: horse vs. jockey recognition.
Feature vector: x = [x_1, x_2]^T, where x_1 = height and x_2 = weight.
The set of hidden states is Y = {H, J}; the feature space is X = R^2.
Training examples: {(x_1, y_1), ..., (x_l, y_l)}
Linear classifier:
  q(x) = H if w·x + b ≥ 0
         J if w·x + b < 0
The decision boundary is w·x + b = 0.
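A minimal sketch of this linear decision rule in Python/NumPy follows; the weights w, bias b, and example measurements are made-up illustrative values, not trained parameters.

```python
import numpy as np

# Hypothetical parameters for the horse-vs-jockey example above;
# x = [height_cm, weight_kg]. Weight dominates the decision here.
w = np.array([0.0, 0.01])
b = -2.0

def q(x):
    """Linear decision rule: H if w.x + b >= 0, else J."""
    return "H" if np.dot(w, x) + b >= 0 else "J"

print(q(np.array([160.0, 500.0])))   # -> "H" (horse)
print(q(np.array([165.0, 55.0])))    # -> "J" (jockey)
```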
Components of PR system
[Diagram: Pattern → Sensors and preprocessing → Feature extraction → Classifier → Class assignment; a Teacher and a Learning algorithm adjust the classifier during training]
• Sensors and preprocessing.
• Feature extraction aims to create discriminative features good for classification.
• A classifier.
• A teacher provides information about the hidden state -- supervised learning.
• A learning algorithm sets up the PR system from training examples.
Feature extraction
Task: to extract features which are good for classification.
Good features:
• Objects from the same class have similar feature values.
• Objects from different classes have different feature values.
[Figure: "Good" features vs. "Bad" features]
Feature extraction methods
Feature extraction: a mapping φ = (φ_1, ..., φ_n) computes the feature vector x = [x_1, ..., x_n]^T from the raw measurements m = [m_1, ..., m_k]^T.
Feature selection: selects a subset [x_1, ..., x_n]^T of the measurements [m_1, m_2, m_3, ..., m_k]^T.
The problem can be expressed as optimization of the parameters θ of the feature extractor φ(θ).
Supervised methods: the objective function is a criterion of separability (discriminability) of labeled examples, e.g., linear discriminant analysis (LDA).
Unsupervised methods: a lower-dimensional representation which preserves important characteristics of the input data is sought, e.g., principal component analysis (PCA).
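As one concrete instance of an unsupervised method, here is a minimal PCA sketch in Python/NumPy (the data is random and purely illustrative): it projects k-dimensional measurements onto the n directions of largest variance.

```python
import numpy as np

# Minimal PCA sketch (unsupervised feature extraction): project
# k-dimensional measurements onto the n directions of largest variance.
def pca(M, n):
    """M: (num_samples, k) data matrix; returns (num_samples, n) features."""
    M_centered = M - M.mean(axis=0)
    # Rows of Vt are the principal directions (largest variance first).
    _, _, Vt = np.linalg.svd(M_centered, full_matrices=False)
    return M_centered @ Vt[:n].T

rng = np.random.default_rng(1)
M = rng.normal(size=(100, 5))   # illustrative measurements, k = 5
X = pca(M, n=2)                 # 2-D feature vectors
print(X.shape)                  # (100, 2)
```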
Classifier
A classifier partitions the feature space X into class-labeled regions such that
  X = X_1 ∪ X_2 ∪ ... ∪ X_|Y|  and  X_1 ∩ X_2 ∩ ... ∩ X_|Y| = ∅.
[Figure: a feature space partitioned into regions X_1, X_2, X_3]
Classification consists of determining to which region a feature vector x belongs.
Borders between decision regions are called decision boundaries.
Representation of classifier
A classifier is typically represented as a set of discriminant functions
  f_i: X → R,  i = 1, ..., |Y|.
The classifier assigns a feature vector x to the i-th class if
  f_i(x) > f_j(x)  for all j ≠ i.
[Diagram: feature vector x → discriminant functions f_1(x), f_2(x), ..., f_|Y|(x) → max → class identifier y]
Review
Fig.1 Basic components of a pattern recognition system
Steps
• Data acquisition and sensing
• Pre-processing
  – Removal of noise in data.
  – Isolation of patterns of interest from the background.
• Feature extraction
  – Finding a new representation in terms of features (better for further processing).
Steps
• Model learning and estimation
  – Learning a mapping between features and pattern groups.
• Classification
  – Using learned models to assign a pattern to a predefined category.
• Post-processing
  – Evaluation of confidence in decisions.
  – Exploitation of context to improve performance.
Table 1 : Examples of pattern recognition applications
Image Recognition: 2D Matched Filter
• Functionality
  – Reducing the effect of noise.
  – Computing the similarity of two objects (template matching for images).
• Functional block
[Block diagram: input image I(m,n) → 2D matched filter with impulse response H*(-m,-n), the reversed and conjugated template image → output image Y(m,n)]
2D Matched Filter: Template Matching
[Diagram: input image I(m,n) and template image H(m,n), rotated to H*(-m,-n), fed to the 2D matched filter; output images shown without and with normalization]
2D Matched Filter: Template Matching
• Drawbacks
  – Poor discriminative ability on template shape (ignores the structural relations of patterns).
  – Changes in rotation and magnification of template objects result in an enormous number of templates to test.
• Template matching is usually limited to smaller local features, which are more invariant to size and shape variations of an object.
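A minimal template-matching sketch using OpenCV's normalized cross-correlation follows; the file names are placeholders, and TM_CCORR_NORMED is just one of several available matching scores.

```python
import cv2

# Load the input image I(m,n) and template H(m,n); paths are placeholders.
image = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

# Normalized cross-correlation: correlating with the template is
# equivalent to filtering with the rotated impulse response H*(-m,-n).
response = cv2.matchTemplate(image, template, cv2.TM_CCORR_NORMED)

# The peak of the response marks the best-matching location.
_, max_val, _, max_loc = cv2.minMaxLoc(response)
print(f"best match at {max_loc} with score {max_val:.3f}")
```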
Image Registration
• What is image registration?
  – Aligning images correctly so that systems have better performance.
• Misregistration between images:
  – Translational differences
  – Scale differences
  – Rotational differences
Bayes Statistical Classifiers
• Consideration
  – Randomness of patterns
• Decision criterion
Pattern x is labeled as class w_i if
  Σ_{k=1..W} L_ki p(x/w_k) P(w_k) < Σ_{q=1..W} L_qj p(x/w_q) P(w_q)  for all j ≠ i
where
  L_ij : misclassification loss function
  p(x/w_i) : p.d.f. that a particular pattern x comes from class w_i
  P(w_i) : probability of occurrence of class w_i
Bayes Statistical Classifiers
• Decision criterion:
Given that L_ij is a symmetric loss function:
  – Posterior probability decision rule:
      p(x/w_i) P(w_i) > p(x/w_j) P(w_j)  for all j ≠ i
  – Decision functions:
      d_j(x) = p(x/w_j) P(w_j) ∝ P(w_j/x)
Pattern x is classified to class j if d_j(x) yields the largest value.
Bayes Statistical Classifiers
• Advantages
  – Optimal in minimizing the total average loss in misclassification.
• Disadvantages
  – Both P(w_j) and p(x/w_j) must be known in advance; estimation is required.
  – Performance highly depends on the assumed distributions P(w_j) and p(x/w_j).
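A minimal sketch of this decision rule for the common special case of 1-D features, Gaussian class-conditional densities, and a symmetric 0/1 loss (all simplifying assumptions) follows; with 0/1 loss the rule reduces to maximizing d_j(x) = p(x/w_j) P(w_j).

```python
import numpy as np

# Bayes decision rule sketch: Gaussian class-conditional densities
# estimated from labeled 1-D samples; toy data, symmetric 0/1 loss.
def fit_gaussian(samples):
    return samples.mean(), samples.std()

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(2)
class_a = rng.normal(0.0, 1.0, 200)     # samples from class w1
class_b = rng.normal(3.0, 1.0, 100)     # samples from class w2
priors = np.array([200, 100]) / 300.0   # P(w_j) from class frequencies
params = [fit_gaussian(class_a), fit_gaussian(class_b)]

def classify(x):
    # Maximize d_j(x) = p(x/w_j) * P(w_j) over the classes.
    d = [gaussian_pdf(x, mu, s) * P for (mu, s), P in zip(params, priors)]
    return int(np.argmax(d))

print(classify(0.5), classify(2.8))     # -> 0 1 (typically)
```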
Two Schools of Thought
1. Statistical Pattern Recognition
   The data is reduced to vectors of numbers and statistical techniques are used for the tasks to be performed.
2. Structural Pattern Recognition
   The data is converted to a discrete structure (such as a grammar or a graph) and the techniques are related to computer science subjects (such as parsing and graph matching).
Classification in Statistical PR
• A class is a set of objects having some important properties in common.
• A feature extractor is a program that inputs the data (image) and extracts features that can be used in classification.
• A classifier is a program that inputs the feature vector and assigns it to one of a set of designated classes or to the “reject” class.
With what kinds of classes do you work?
Feature Vector Representation
• X = [x1, x2, ..., xn], each xj a real number
• xj may be an object measurement
• xj may be a count of object parts
• Example object representation: [#holes, #strokes, moments, ...]
[Figure: possible features for character recognition]
Some Terminology
• Classes: a set of m known categories of objects
  (a) might have a known description for each
  (b) might have a set of samples for each
• Reject class: a generic class for objects not in any of the designated known classes
• Classifier: assigns an object to a class based on features
Discriminant functions
• Functions f(x, K) perform some computation on feature vector x
• Knowledge K from training or programming is used
• The final stage determines the class
Classification using nearest class mean
• Compute the Euclidean distance between feature vector X and the mean of each class.
• Choose closest class, if close enough (reject otherwise)
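A minimal nearest-class-mean sketch in Python/NumPy, with an optional reject threshold; the means, labels, and threshold value are illustrative assumptions.

```python
import numpy as np

# Nearest-class-mean classifier sketch with an optional reject option.
def nearest_mean(x, means, labels, reject_dist=None):
    dists = np.linalg.norm(means - x, axis=1)   # Euclidean distances
    i = int(np.argmin(dists))
    if reject_dist is not None and dists[i] > reject_dist:
        return "reject"
    return labels[i]

means = np.array([[0.0, 0.0], [5.0, 5.0]])      # per-class mean vectors
labels = ["class1", "class2"]
print(nearest_mean(np.array([4.2, 5.1]), means, labels))          # class2
print(nearest_mean(np.array([20.0, 20.0]), means, labels, 3.0))   # reject
```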
Nearest mean might yield poor results with complex structure
• Class 2 has two modes; where is its mean?
• But if the modes are detected, two subclass mean vectors can be used
Scaling coordinates by std dev
Receiver Operating Characteristic (ROC) curve
• Plots correct detection rate versus false alarm rate
• Generally, false alarms go up with attempts to detect higher percentages of known objects
Confusion matrix shows empirical performance
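A minimal sketch of building such a confusion matrix from labels (Python/NumPy; the label arrays are made up): rows index the true class, columns the predicted class, and the diagonal holds the correct decisions.

```python
import numpy as np

# Build a confusion matrix from true vs. predicted class labels.
def confusion_matrix(y_true, y_pred, num_classes):
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1        # row = true class, column = predicted class
    return cm

y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)
print("accuracy:", np.trace(cm) / cm.sum())   # diagonal = correct decisions
```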
Classifiers often used in CV
• Decision Tree Classifiers
• Artificial Neural Net Classifiers
• Bayesian Classifiers and Bayesian Networks (Graphical Models)
• Support Vector Machines
Introduction – Neural Nets
• What are Neural Networks?
  – Neural networks are a paradigm of programming computers.
  – They are exceptionally good at performing pattern recognition and other tasks that are very difficult to program using conventional techniques.
  – Programs that employ neural nets are also capable of learning on their own and adapting to changing conditions.
Background
• An Artificial Neural Network (ANN) is an information processing paradigm inspired by biological nervous systems, such as the human brain’s information processing mechanism.
• The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. NNs, like people, learn by example.
• An NN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true of NNs as well.
How the Human Brain Learns
• In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites.
• The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches.
• At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons.
Neural Networks
• Biological approach to AI
• Developed in 1943
• Comprised of one or more layers of neurons
• Several types; we’ll focus on feed-forward networks
[Figure: biological vs. artificial neurons; sources: http://faculty.washington.edu/chudler/color/pic1an.gif, http://research.yale.edu/ysm/images/78.2/articles-neural-neuron.jpg]
A Neuron
• Receives n inputs
• Multiplies each input by its weight
• Applies an activation function to the sum of results
• Outputs the result
[Figure: neuron model; source: http://www-cse.uta.edu/~cook/ai1/lectures/figures/neuron.jpg]
A Neuron Model
• When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.
• We construct these neural networks by first trying to deduce the essential features of neurons and their interconnections.
• We then typically program a computer to simulate these features.
Activation Functions
• Control when a unit is “active” or “inactive”
• Threshold function: outputs 1 when the input is positive and 0 otherwise
• Sigmoid function: σ(x) = 1 / (1 + e^-x)
• Hyperbolic tangent, etc.
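A minimal single-neuron sketch in Python/NumPy showing these activation functions; the weights, bias, and inputs are illustrative values.

```python
import numpy as np

# A single artificial neuron: weighted sum of inputs followed by an
# activation function. Weights, bias, and inputs are illustrative.
def threshold(s):
    return 1.0 if s > 0 else 0.0

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def neuron(x, w, bias, activation):
    return activation(np.dot(w, x) + bias)

x = np.array([0.5, -0.2, 0.1])
w = np.array([0.8, 0.4, -0.6])
print(neuron(x, w, 0.1, threshold))   # hard threshold output
print(neuron(x, w, 0.1, sigmoid))     # smooth sigmoid output
print(neuron(x, w, 0.1, np.tanh))     # hyperbolic tangent output
```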
Neural Network Layers
• Each layer receives its inputs from the previous layer and forwards its outputs to the next layer
[Figure: layered network; source: http://smig.usgs.gov/SMIG/features_0902/tualatin_ann.fig3.gif]
Pattern Recognition
• An important application of neural networks is pattern recognition. Pattern recognition can be implemented by using a feed-forward neural network that has been trained accordingly.
• During training, the network is trained to associate outputs with input patterns.
• When the network is used, it identifies the input pattern and tries to output the associated output pattern.
• The power of neural networks comes to life when a pattern that has no associated output is given as an input.
• In this case, the network gives the output that corresponds to a taught input pattern that is least different from the given pattern.
Pattern Recognition (cont.)
• Suppose a network is trained to recognize the patterns T and H. The associated patterns are all black and all white respectively as shown above.
Pattern Recognition (cont.)
Since the input pattern looks more like a ‘T’, when the network classifies it, it sees the input closely resembling ‘T’ and outputs the pattern that represents a ‘T’.
Pattern Recognition (cont.)
The input pattern here closely resembles ‘H’ with a slight difference. The network in this case classifies it as an ‘H’ and outputs the pattern representing an ‘H’.
Pattern Recognition (cont.)
• Here the top row is 2 errors away from a ‘T’ and 3 errors away from an ‘H’, so the top output is black.
• The middle row is 1 error away from both ‘T’ and ‘H’, so the output is random.
• The bottom row is 1 error away from ‘T’ and 2 away from ‘H’, therefore the output is black.
• Since the input resembles a ‘T’ more than an ‘H’, the output of the network is in favor of a ‘T’.
Learning by Back-Propagation: Illustration
[Illustration from "Artificial Neural Networks", Colin Fahey's Guide (Book CD)]
Computational Complexity
• Could lead to a very large number of calculations
[Diagram: input units → influence map layer 1 → hidden units → influence map layer 2 → output units]
Different types of Neural Networks
• Feed-forward networks
  – Feed-forward NNs allow signals to travel one way only, from input to output. There is no feedback (loops); i.e., the output of any layer does not affect that same layer.
  – Feed-forward NNs tend to be straightforward networks that associate inputs with outputs. They are extensively used in pattern recognition.
  – This type of organization is also referred to as bottom-up or top-down.
Continued
• Feedback networks
  – Feedback networks can have signals traveling in both directions by introducing loops in the network.
  – Feedback networks are dynamic; their 'state' changes continuously until they reach an equilibrium point.
  – They remain at the equilibrium point until the input changes and a new equilibrium needs to be found.
  – Feedback architectures are also referred to as interactive or recurrent, although the latter term is often used to denote feedback connections in single-layer organizations.
Diagram of an NN
Fig: A simple Neural Network
Network Layers
• Input Layer - The activity of the input units represents the raw information that is fed into the network.
• Hidden Layer - The activity of each hidden unit is determined by the activities of the input units and the weights on the connections between the input and the hidden units.
• Output Layer - The behavior of the output units depends on the activity of the hidden units and the weights between the hidden and output units.
Continued
• This simple type of network is interesting because the hidden units are free to construct their own representations of the input.
• The weights between the input and hidden units determine when each hidden unit is active, and so by modifying these weights, a hidden unit can choose what it represents.
Network Structure
• The number of layers and of neurons depends on the specific task. In practice this issue is solved by trial and error.
• Two types of adaptive algorithms can be used:
  – start from a large network and successively remove some neurons and links until network performance degrades.
  – begin with a small network and introduce new neurons until performance is satisfactory.
Network Parameters
• How are the weights initialized?
• How many hidden layers and how many neurons?
• How many examples in the training set?
Weights
• In general, initial weights are randomly chosen, with typical values between -1.0 and 1.0 or -0.5 and 0.5.
• There are two types of NNs:
  – Fixed Networks - where the weights are fixed
  – Adaptive Networks - where the weights are changed to reduce prediction error
Size of Training Data
• Rule of thumb:
  – the number of training examples should be at least five to ten times the number of weights of the network.
• Other rule:
    N > |W| / (1 - a)
  where |W| = number of weights and a = expected accuracy on the test set.
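For example, a network with |W| = 100 weights and an expected test-set accuracy of a = 0.9 would, by this rule, need N > 100 / (1 - 0.9) = 1000 training examples.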
Training Basics
• The most basic method of training a neural network is trial and error.
• If the network isn't behaving the way it should, change the weighting of a random link by a random amount. If the accuracy of the network declines, undo the change and make a different one.
• It takes time, but the trial and error method does produce results.
Training: Backprop algorithm
• The Backprop algorithm searches for weight values that minimize the total error of the network over the set of training examples (training set).
• Backprop consists of the repeated application of the following two passes:
  – Forward pass: the network is activated on one example and the error of (each neuron of) the output layer is computed.
  – Backward pass: the network error is used for updating the weights. Starting at the output layer, the error is propagated backwards through the network, layer by layer, by recursively computing the local gradient of each neuron.
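A minimal backprop sketch in Python/NumPy: a 2-2-1 sigmoid network trained on XOR with plain gradient descent. The architecture, learning rate, and epoch count are illustrative assumptions, and squared error is used for simplicity.

```python
import numpy as np

# Minimal backprop sketch: a 2-2-1 sigmoid network trained on XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)   # target outputs

W1 = rng.normal(scale=0.5, size=(2, 2)); b1 = np.zeros(2)
W2 = rng.normal(scale=0.5, size=(2, 1)); b2 = np.zeros(1)
sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

for epoch in range(20000):
    # Forward pass: activate the network and compute the output error.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    err = y - T                             # dE/dy for squared error
    # Backward pass: propagate the error backwards, layer by layer,
    # by recursively computing each neuron's local gradient.
    d2 = err * y * (1 - y)                  # local gradient, output layer
    d1 = (d2 @ W2.T) * h * (1 - h)          # local gradient, hidden layer
    W2 -= 0.5 * h.T @ d2; b2 -= 0.5 * d2.sum(axis=0)
    W1 -= 0.5 * X.T @ d1; b1 -= 0.5 * d1.sum(axis=0)

print(np.round(y.ravel(), 2))               # typically approaches [0, 1, 1, 0]
```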
Back Propagation
• Learning methodology: the back-propagation training algorithm
• Backprop adjusts the weights of the NN in order to minimize the network's total mean squared error.
[Diagram: forward step (network activation) and backward step (error propagation)]
The Learning Process (cont.)
• Every neural network possesses knowledge which is contained in the values of the connection weights.
• Modifying the knowledge stored in the network as a function of experience implies a learning rule for changing the values of the weights.
The Learning Process (cont.)
• Recall: adaptive networks are NNs that allow the change of weights in their connections.
• The learning methods can be classified in two categories:
  – Supervised Learning
  – Unsupervised Learning
Supervised Learning
• Supervised learning incorporates an external teacher, so that each output unit is told what its desired response to input signals ought to be.
• An important issue concerning supervised learning is the problem of error convergence, i.e., the minimization of error between the desired and computed unit values.
• The aim is to determine a set of weights which minimizes the error. One well-known method, common to many learning paradigms, is least mean square (LMS) convergence.
Supervised Learning
• In this sort of learning, the human teacher's experience is used to tell the NN which outputs are correct and which are not.
• This does not mean that a human teacher needs to be present at all times; only the correct classifications gathered from the human teacher on a domain need to be present.
• The network then learns from its errors, that is, it changes its weights to reduce its prediction error.
Unsupervised Learning
• Unsupervised learning uses no external teacher and is based upon only local information. It is also referred to as self-organization, in the sense that it self-organizes data presented to the network and detects their emergent collective properties.
• The network is then used to construct clusters of similar patterns.
• This is particularly useful in domains where instances are checked against previous scenarios, for example, detecting credit card fraud.
Neural Networks in Use
• Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs including:
  – sales forecasting
  – industrial process control
  – customer research
  – data validation
  – risk management
• ANNs are also used in the following specific paradigms: recognition of speakers in communications; diagnosis of hepatitis; undersea mine detection; texture analysis; three-dimensional object recognition; hand-written word recognition; and facial recognition.
Other (Linear) Classifiers
  f(x, w, b) = sign(w·x + b)
[Scatter plot: points labeled +1 and -1; a candidate line w·x + b = 0 separates the half-planes w·x + b > 0 and w·x + b < 0; x → f → y_est]
How would you classify this data?
Linear Classifiers
  f(x, w, b) = sign(w·x + b)
[Successive slides show different candidate separating lines for the same data]
How would you classify this data?
Any of these would be fine..
..but which is best?
Linear Classifiers
  f(x, w, b) = sign(w·x + b)
[A poorly chosen separating line leaves points misclassified to the +1 class]
How would you classify this data?
Classifier Margin
  f(x, w, b) = sign(w·x + b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
Maximum Margin
  f(x, w, b) = sign(w·x + b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM).
Linear SVM
Support vectors are those datapoints that the margin pushes up against.
1. Maximizing the margin is good.
2. It implies that only support vectors are important; other training examples are ignorable.
3. Empirically it works very, very well.
SVM applications• SVMs were originally proposed by Boser, Guyon and Vapnik in
1992 and gained increasing popularity in late 1990s.
• SVMs are currently among the best performers for a number of classification tasks ranging from text to genomic data.
• SVM techniques have been extended to a number of tasks such as regression [Vapnik et al. ’97], principal component analysis [Schölkopf et al. ’99], etc.
• Most popular optimization algorithms for SVMs are SMO [Platt ’99] and SVMlight [Joachims’ 99], both use decomposition to hill-climb over a subset of αi’s at a time.
• Tuning SVMs remains a black art: selecting a specific kernel and parameters is usually done in a try-and-see manner.
Support Vector Machines
• SVMs pick the best separating hyperplane according to some criterion, e.g., maximum margin.
• The training process is an optimisation.
• The training set is effectively reduced to a relatively small number of support vectors.
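A minimal sketch of training a linear SVM with scikit-learn on a toy separable problem (the data and C value are illustrative); after fitting, clf.support_vectors_ holds the small set of datapoints that define the margin.

```python
import numpy as np
from sklearn import svm

# Maximum-margin linear SVM on a toy two-class problem.
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = [-1, -1, -1, 1, 1, 1]

clf = svm.SVC(kernel="linear", C=1.0)   # near-hard margin for separable data
clf.fit(X, y)

print(clf.support_vectors_)             # the datapoints defining the margin
print(clf.predict([[2.0, 2.0], [5.0, 5.0]]))   # -> [-1  1]
```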
Feature Spaces
• We may separate data by mapping to a higher-dimensional feature space
  – The feature space may even have an infinite number of dimensions!
• We need not explicitly construct the new feature space
• We may use Kernel functions to implicitly map to a new feature space
• Kernel fn:
• Kernel must be equivalent to an inner product in some feature space
( ) Rxx Î21,K
Example Kernels
  Linear:      x·z
  Polynomial:  (x·z)^P
  Gaussian:    exp(-||x - z||^2 / (2σ^2))
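These three kernels can be evaluated directly; a minimal Python/NumPy sketch follows (the vectors and the parameter values P and σ are illustrative).

```python
import numpy as np

# Each kernel value equals an inner product in some (possibly implicit)
# feature space; parameters P and sigma are illustrative choices.
def linear_kernel(x, z):
    return np.dot(x, z)

def polynomial_kernel(x, z, p=3):
    return np.dot(x, z) ** p

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.5])
print(linear_kernel(x, z), polynomial_kernel(x, z), gaussian_kernel(x, z))
```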
Perceptron Revisited: Linear Separators
• Binary classification can be viewed as the task of separating classes in feature space:
  f(x) = sign(w^T x + b)
[Figure: a line w^T x + b = 0 separating the half-planes w^T x + b > 0 and w^T x + b < 0]
• Which of the linear separators is optimal?
Best Linear Separator?
Find Closest Points in Convex Hulls
[Figure: closest points c and d of the two classes' convex hulls]
Plane Bisects Closest Points
  w^T x + b = 0,  with  w = d - c
Classification Margin
• The distance from an example x to the separator is  r = (w^T x + b) / ||w||
• Data closest to the hyperplane are support vectors.
• The margin ρ of the separator is the width of separation between classes.
Maximum Margin Classification
• Maximizing the margin is good according to intuition and theory.
• Implies that only support vectors are important; other training examples are ignorable.
Margins and Complexity
• A skinny margin is more flexible, thus more complex.
• A fat margin is less complex.
Nonlinear SVM - Overview
• SVM locates a separating hyperplane in the feature space and classifies points in that space.
• It does not need to represent the space explicitly; it simply defines a kernel function.
• The kernel function plays the role of the dot product in the feature space.
Properties of SVM
• Flexibility in choosing a similarity function
• Sparseness of solution when dealing with large data sets - only support vectors are used to specify the separating hyperplane
• Ability to handle large feature spaces - complexity does not depend on the dimensionality of the feature space
• Overfitting can be controlled by the soft margin approach
• Nice math property: a simple convex optimization problem which is guaranteed to converge to a single global solution
• Feature selection
SVM Applications
• SVM has been used successfully in many real-world problems:
  - text (and hypertext) categorization
  - image classification
  - bioinformatics (protein classification, cancer classification)
  - hand-written character recognition
Application 1: Cancer Classification
• High dimensional: p > 1000, n < 100
• Imbalanced: fewer positive samples
• Many irrelevant features
• Noisy
[Table: gene-expression data matrix with patients p-1 ... p-n as rows and genes g-1 ... g-p as columns]
Kernel with a regularizing diagonal term:  K[x, x] = k(x, x) + λ·n/N
FEATURE SELECTION
• In the linear case, w_i^2 gives the ranking of dimension i.
• SVM is sensitive to noisy (mis-labeled) data.
Weakness of SVM
• It is sensitive to noise
  - A relatively small number of mislabeled examples can dramatically decrease performance
• It only considers two classes
  - How to do multi-class classification with SVM?
  - Answer:
    1) With output arity m, learn m SVMs
       – SVM 1 learns "Output==1" vs "Output != 1"
       – SVM 2 learns "Output==2" vs "Output != 2"
       – ...
       – SVM m learns "Output==m" vs "Output != m"
    2) To predict the output for a new input, just predict with each SVM and find out which one puts the prediction the furthest into the positive region. (See the sketch below.)
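A minimal sketch of scheme 1) + 2) above using scikit-learn (the toy data, kernel choice, and cluster centers are illustrative assumptions): train one binary SVM per class and pick the class whose decision value is largest.

```python
import numpy as np
from sklearn import svm

# One-vs-rest multi-class SVM: m binary machines, "class k" vs. "rest".
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.5, size=(20, 2)) for c in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 20)

machines = []
for k in range(3):
    clf = svm.SVC(kernel="linear").fit(X, (y == k).astype(int))
    machines.append(clf)

def predict(x):
    # Pick the SVM whose prediction is furthest into the positive region.
    scores = [clf.decision_function([x])[0] for clf in machines]
    return int(np.argmax(scores))

print(predict(np.array([3.1, 2.9])))   # -> 1 (typically)
```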
Application 2: Text Categorization
• Task: the classification of natural text (or hypertext) documents into a fixed number of predefined categories based on their content.
  - email filtering, web searching, sorting documents by topic, etc.
• A document can be assigned to more than one category, so this can be viewed as a series of binary classification problems, one for each category.
Representation of Text
IR's vector space model (aka bag-of-words representation):
• A doc is represented by a vector indexed by a pre-fixed set or dictionary of terms
• Values of an entry can be binary or weights
• Normalization, stop words, word stems
• Doc x => φ(x)
Text Categorization using SVM
• The distance between two documents is φ(x)·φ(z)
• K(x, z) = φ(x)·φ(z) is a valid kernel, so SVM can be used with K(x, z) for discrimination.
• Why SVM?
  - High dimensional input space
  - Few irrelevant features (dense concept)
  - Sparse document vectors (sparse instances)
  - Text categorization problems are linearly separable
Some Issues
• Choice of kernel
  - Gaussian or polynomial kernel is the default
  - if ineffective, more elaborate kernels are needed
  - domain experts can give assistance in formulating appropriate similarity measures
• Choice of kernel parameters
  - e.g., σ in the Gaussian kernel
  - σ is the distance between the closest points with different classifications
  - in the absence of reliable criteria, applications rely on the use of a validation set or cross-validation to set such parameters
• Optimization criterion - hard margin vs. soft margin
  - a lengthy series of experiments in which various parameters are tested
Additional Resources
• An excellent tutorial on VC-dimension and Support Vector Machines:
  C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
• The VC/SRM/SVM Bible:
  Statistical Learning Theory by Vladimir Vapnik, Wiley-Interscience, 1998.
• http://www.kernel-machines.org/
A Case Study on Face Detection and Recognition
Feature-based face matching
[Pipeline: face image (from face detection) → normalization → feature extraction → feature vector → classifier → decision maker → output results]
• You can extract various features
• You can use various classifiers
• You can use various decision makers
Normalization
Eye-location normalization: rotation normalization and scale normalization.
Normalized cross-correlation between object I and template T:
  C_N(y) = (I(y) - mean(I)) (T - mean(T)) / (σ_I σ_T)
averaged over objects.
Feature extraction
• Eyebrow thickness and vertical position at the eye center position
• A coarse description of the left eyebrow's arches
• Nose vertical position and width
• Mouth vertical position, width, height of upper and lower lips
• Eleven radii describing the chin shape
• Bigonial breadth (face width at nose position)
• Zygomatic breadth (face width halfway between nose tip and eyes)
[Figure: example of some geometrical features]
Classifier
Bayes classifier example: for a feature vector x, compute the distance
  D_j(x) = (x - m_j)^T Σ^{-1} (x - m_j),  j = 1, 2, ..., N
to each person's mean feature vector m_j, rank the distance values D_j(x), and output the results.
This is just one example of a classifier!
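A minimal Python/NumPy sketch of this distance-ranking stage; the mean vectors, covariance, and query are toy values, and Σ is assumed to be a covariance matrix shared across identities.

```python
import numpy as np

# Rank identities by Mahalanobis-like distance D_j(x), as above.
def distances(x, means, cov_inv):
    return np.array([(x - m).T @ cov_inv @ (x - m) for m in means])

means = np.array([[0.0, 0.0], [2.0, 1.0], [4.0, 4.0]])  # one mean per person
cov_inv = np.linalg.inv(np.array([[1.0, 0.2], [0.2, 1.0]]))

x = np.array([1.8, 1.1])          # query feature vector
D = distances(x, means, cov_inv)
print(np.argsort(D))              # identities ranked by increasing distance
```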
Template matching
[Pipeline: face image (from face detection) → normalization → produce a template → matching against a templates database → decision maker → output results]
• You have to create the database of templates for all people you want to recognize.
• Different templates are used in various regions of the normalized face.
• Various methods can be used to compress the information in each template.
Example-Based Learning Approach
Three parts:
• The image is divided into many possibly-overlapping windows,
  – each window pattern gets classified as either "a face" or "not a face" based on a set of local image measurements.
• For each new pattern to be classified, the system computes a set of difference measurements between the new pattern and the canonical face model.
• A trained classifier identifies the new pattern as "a face" or "not a face".
Example of a system using EBL
• Kanade et al. first proposed an NN-based approach in 1996.
• Although NN have received significant attention in many research areas, few applications were successful in face recognition.
Why?
Neural Nets
Neural network (NN)
• It's easy to train a neural network with samples which contain faces, but it is much harder to train a neural network with samples which do not.
• The number of "non-face" samples is just too large.
Neural network (NN)
• Neural network-based filter:
  – A small filter window is used to scan through all portions of the image,
  – and to detect whether a face exists in each window.
• Merging overlapping detections and arbitration: by setting a small threshold, many false detections can be eliminated.
Rowley and Kanade’s Approach!
Test results of using NN