Page 1: LINEAR MODELS FOR CLASSIFICATION - Chapter 4 in PRML

LINEAR MODELS FOR CLASSIFICATION
Chapter 4 in PRML

Yun Gu

Department of Automation
Shanghai Jiao Tong University

Shanghai, CHINA

May 12, 2014


Outline

1 Overview of Classification

2 Discriminant Functions
  From Two Classes to Multiple Classes
  Least Squares
  Fisher's Linear Discriminant
  Find a Separating Hyperplane

3 Prob. Generative Models
  Linear Discriminant Analysis
  Quadratic Discriminant Analysis

4 Prob. Discriminative Models
  Logistic Regression
  LDA or Logistic Regression?

5 Bonus
  Multi-Class, Multi-Label, Struct-Output

Y. Gu LINEAR MODELS FOR CLASSIFICATION


Outline of this section

1 Overview of Classification

2 Discriminant Functions

3 Prob. Generative Models

4 Prob. Discriminative Models

5 Bonus


What is Classification?

Definition (Classification)

Classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.

Input: feature vector x, continuous or discrete;

Output: qualitative values (i.e. categorical or discrete values) y, where y ∈ Y, Y = {y1, y2, ..., yn};

Mapping: y = f(x) : x → y;

Setting: supervised or semi-supervised learning.


Example: Image Classification

Figure: An example of image classification. Input: visual features (e.g. SIFT, color moments); Output: scene category (e.g. "Bridge", "Castle"); Mapping: a sparse factor representation (see "Multi-Label Image Categorization With Sparse Factor Representation", Sun et al., IEEE TIP 2014).


Outline of this section

1 Overview of Classification

2 Discriminant Functions
  From Two Classes to Multiple Classes
  Least Squares
  Fisher's Linear Discriminant
  Find a Separating Hyperplane

3 Prob. Generative Models

4 Prob. Discriminative Models

5 Bonus


Two-Class Problem

Let's start with the simplest case: two-class classification, where:

Input: feature vector x, continuous or discrete;

Output: binary value y, where y ∈ Y, Y = {−1, +1} or Y = {0, 1};

Model: y(x) = w^T x + w_0; x is assigned to class +1 if y(x) ≥ 0, and to class −1 otherwise.

Examples: male or female; toy data.


Example of Two-Class Problem: On ”Toy” Dataset

Figure: A demonstration of classification on a "toy" dataset via semi-supervised learning (see "Graph Transduction via Alternating Minimization", Wang et al., ICML 2008).


From Two-Class to Multiple-Class

A more complex case: multiple-class classification, where:

Input: feature vector x, continuous or discrete;

Output: for an n-class problem, a qualitative value C, where C ∈ C, C = {C1, C2, ..., Cn};

Model:
  Binary decomposition: One-vs-One
  Binary decomposition: One-vs-Rest
  Binary decomposition: ECOC
  Directly comprising n linear functions


From Two-Class to Multiple-Class (Cont.)

Scheme 1: decompose an n-class problem into several two-class problems:

One-vs-Rest: use n − 1 binary classifiers, each of which solves the two-class problem of separating points in a particular class Ck from points not in that class.

  Pros: easy and efficient;
  Cons: ambiguous samples; unbalanced samples.

One-vs-One: use n(n − 1)/2 binary classifiers, one for every possible pair of classes. Each point is classified according to a majority vote amongst the discriminant functions.

  Pros: "better" performance than One-vs-Rest;
  Cons: ambiguous samples; too many classifiers.


Problems from 1-vs-1 and 1-vs-R

Figure: Attempting to construct an n-class discriminant from a set of two-class discriminants leads to ambiguous regions, shown in green. On the left is an example involving the use of two discriminants designed to distinguish points in class Ck from points not in class Ck. On the right is an example involving three discriminant functions, each of which is used to separate a pair of classes Ck and Cj. (PRML p. 183)


From Two-Class to Multiple-Class (Cont.)

ECOC (Error-Correcting Output Codes): given a set of n classes, the basis of the ECOC framework consists of designing a codeword for each of the classes. These codewords encode the membership information of each class for a given binary problem.

Pros: robust;
Cons: the construction of the codebook.

Figure: ECOC for a 4-class problem: the one-vs-rest scheme with codewords y1, ..., y4, and its ECOC extension.


From Two-Class to Multiple-Class (Cont.)

A comprehensive evaluation of different schemes for reducing multi-class to two-class problems has been conducted (R. Rifkin and A. Klautau, "In Defense of One-Vs-All Classification", JMLR, 2004). Their main thesis is that a simple one-vs-all scheme is as accurate as any other approach, assuming that the underlying binary classifiers are well-tuned regularized classifiers such as support vector machines. This thesis is interesting in that it disagrees with a large body of published work on multiclass classification. They support their position by means of a critical review of the existing literature, a substantial collection of carefully controlled experiments, and theoretical arguments.


From Two-Class to Multiple-Class (Cont.)

Directly comprising n linear functions:

  y_k(x) = w_k^T x + w_{k0}, k = 1, 2, ..., n

x is assigned to class Ck if y_k(x) ≥ y_j(x) for all j ≠ k (j ∈ {1, 2, ..., n}).

Pros: yields a score for every class; the decision regions are singly connected and convex;
Cons: the joint problem is harder to solve; binary classifiers cannot be reused.


Methods for Classification

Discriminant functions:
  Data fitting: least squares estimation;
  Feature reduction: Fisher's discriminant analysis;
  Find a separating hyperplane: perceptron and SVM.

Probabilistic generative models: LDA/QDA.

Probabilistic discriminative models: logistic regression.


Least Squares For Classification

Least squares: similar to linear regression, we can solve the classification problem via least squares estimation.

Data: given the training set {(x_n, t_n)}, n = 1, 2, ..., N, x_n is the feature vector and t_n is an indicator vector: for a K-class problem, t_{n,k} = 1 and t_{n,i} = 0 (i ≠ k) if x_n belongs to Ck.

Decision function: for class Ck, y_k(x) = w_k^T x + w_{k0}; x is assigned to Ck if y_k(x) ≥ y_i(x) for all i ≠ k.

Optimization problem:

  min_W L(W) = (1/2) Tr{(XW − T)^T (XW − T)}

where X is the N × D matrix whose n-th row is x_n^T, W is the D × K matrix whose k-th column is w_k, and T is the N × K matrix whose n-th row is t_n^T.

Cons: sensitive to outliers; poor performance on asymmetrically distributed data.
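As a concrete sketch, the optimization above has a closed-form least-squares solution. The function names, the bias-column augmentation, and the toy usage below are illustrative, not from the slides:

```python
import numpy as np

def fit_least_squares(X, labels, n_classes):
    """Minimize L(W) = 1/2 * Tr{(XW - T)^T (XW - T)} in closed form.

    X: (N, D) feature matrix; labels: (N,) integer class indices.
    A bias column of ones is prepended so the intercept w_{k0} is learned
    too, and T is the N x K one-of-K indicator target matrix.
    """
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])   # absorb the bias w_{k0}
    T = np.eye(n_classes)[labels]                   # one-of-K indicator targets
    W, *_ = np.linalg.lstsq(Xa, T, rcond=None)      # least-squares solution
    return W

def predict_least_squares(W, X):
    """Assign x to the class k with the largest y_k(x)."""
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.argmax(Xa @ W, axis=1)
```

On two well-separated clusters this recovers the labels exactly; the Cons above show up when outliers pull the fitted hyperplanes around.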


Least Squares For Classification (Cont.)

Figure: Failure cases. Left: data with outliers; right: asymmetrically distributed data.


Fisher’s Linear Discriminant (Fisher, 1936)

Idea: treat the classification problem as feature reduction (projection to 1-D);

Figure: The corresponding projection based on the Fisher linear discriminant.


Fisher’s Linear Discriminant (Cont.)

Criterion: maximize the between-class variance while minimizing the within-class variance.

Data: for a 2-class problem, training set {(x_n, y_n)}, n = 1, 2, ..., N, y_n ∈ {−1, +1}.

Decision function: y = w^T x.

Optimization problem:

  max_w J(w) = (m_2 − m_1)^2 / (s_1^2 + s_2^2) = (w^T S_B w) / (w^T S_W w)

  s.t. m_k = w^T μ_k, where μ_k = (1/N_k) Σ_{x_n ∈ Ck} x_n;

       s_k^2 = Σ_{n ∈ Ck} (y_n − m_k)^2, where y_n = w^T x_n;

       S_B = (μ_2 − μ_1)(μ_2 − μ_1)^T;

       S_W = Σ_k Σ_{n ∈ Ck} (x_n − μ_k)(x_n − μ_k)^T.
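Setting the derivative of J(w) to zero gives the classical solution w ∝ S_W^{-1}(μ_2 − μ_1). A minimal sketch under that result (function name illustrative):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher's discriminant direction for two classes: w ∝ S_W^{-1}(mu2 - mu1).

    X1, X2: (N1, D) and (N2, D) arrays of samples from the two classes.
    Returns the unit-norm projection direction maximizing J(w).
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # within-class scatter S_W, summed over both classes
    S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    w = np.linalg.solve(S_W, mu2 - mu1)
    return w / np.linalg.norm(w)
```

For two classes separated along one axis but with large within-class spread along another, the returned direction lines up with the separating axis, which is exactly what the criterion rewards.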


Fisher’s Linear Discriminant (Cont.)

Multiclass case (projection to a lower-dimensional space):

Data: for a K-class problem, training set {(x_n, t_n)}, n = 1, 2, ..., N, where t_n is a K-dim indicator vector.

Decision function: y = W^T x.

Optimization problem:

  max_W J(W) = Tr{s_W^{−1} s_B}

  s.t. s_W = Σ_{k=1}^{K} Σ_{n ∈ Ck} (y_n − μ_k)(y_n − μ_k)^T;

       s_B = Σ_{k=1}^{K} N_k (μ_k − μ)(μ_k − μ)^T;

       μ_k = (1/N_k) Σ_{n ∈ Ck} y_n;  μ = (1/N) Σ_{k=1}^{K} N_k μ_k.


Find a Separating Hyperplane

This procedure tries to construct linear decision boundaries that explicitly separate the data into different classes.

  Perceptron
  Support Vector Machine

Figure: The orange line is the least squares solution, which misclassifies one of the training points. Two blue separating hyperplanes are found by the perceptron with different random starts.


Preliminary

The distance between the decision boundary and a point:

  Decision boundary: y(x) = w^T x + w_0 = 0;

  Point decomposition: x = x_⊥ + r w/‖w‖;

  Signed distance: r = y(x)/‖w‖ (positive on one side, negative on the other).
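In code, the signed distance is one line (a hypothetical helper, not from the slides):

```python
import numpy as np

def signed_distance(w, w0, x):
    """Signed distance r = y(x)/||w|| from x to the hyperplane w^T x + w0 = 0.

    Positive on the side w points toward, negative on the other side.
    """
    return (w @ x + w0) / np.linalg.norm(w)
```

For example, with w = (0, 2) and w0 = −4 the boundary is the line x2 = 2, and a point at x2 = 3 sits at signed distance +1.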


Perceptron (Rosenblatt,1962)

The perceptron learning algorithm tries to find a separating hyperplane by minimizing the distance of misclassified points to the decision boundary. If a response y_i = +1 is misclassified, then w^T φ(x_i) + w_0 < 0, and the opposite holds for a misclassified response with y_i = −1.

Problem: binary classification;

Decision function: y = w^T φ(x) + w_0;

Optimization model:

  min_w  − Σ_{i ∈ M} y_i (w^T φ(x_i) + w_0)

where M indexes the set of misclassified points. This problem can be solved via the stochastic gradient descent (SGD) algorithm.

Cons: the solution is not unique; the number of iterations can be very large.
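A minimal SGD sketch of the criterion above: each misclassified point contributes gradient −y_i φ(x_i), so the update steps w by η y_i φ(x_i). The identity feature map and the names are illustrative:

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, epochs=100):
    """SGD on the perceptron criterion -sum_{i in M} y_i (w^T x_i + w0).

    X: (N, D) features (phi taken as the identity); y: (N,) labels in {-1, +1}.
    Stops early once no point is misclassified.
    """
    w, w0 = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + w0) <= 0:   # misclassified (or on the boundary)
                w += eta * yi * xi        # step along the negative gradient
                w0 += eta * yi
                errors += 1
        if errors == 0:                   # separating hyperplane found
            break
    return w, w0
```

On linearly separable data this terminates; the hyperplane found depends on the initialization and the order of the points, which is exactly the non-uniqueness noted in the Cons.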


Optimal Separating Hyperplanes (Vapnik, 1996)

This approach tries to provide a unique solution to the separating hyperplane problem by maximizing the margin between the two classes. For a binary problem with N training points, we have:

  max_{w,w_0} C

  s.t. y_i (w^T x_i + w_0)/‖w‖ ≥ C, i = 1, 2, ..., N

Setting C‖w‖ = 1, the problem can be written as:

  min_{w,w_0} (1/2)‖w‖^2

  s.t. y_i (w^T x_i + w_0) ≥ 1, i = 1, 2, ..., N

This is the basic form of the linear Support Vector Machine.
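The slides stop at the hard-margin program. As an illustration only (not from the slides), one common way to fit it in practice is subgradient descent on the soft-margin hinge-loss relaxation, which enforces the same constraints when the data are separable. The names and hyperparameters are assumptions, and production libraries solve the dual QP instead:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, eta=0.01, epochs=500):
    """Subgradient descent on 1/2||w||^2 + C * sum_i max(0, 1 - y_i(w^T x_i + w0)).

    This is the soft-margin relaxation of the max-margin program above,
    not the QP solver used by SVM libraries.
    """
    w, w0 = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + w0)
        viol = margins < 1                               # margin-violating points
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_w0 = -C * y[viol].sum()
        w -= eta * grad_w
        w0 -= eta * grad_w0
    return w, w0
```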


Outline of this section

1 Overview of Classification

2 Discriminant Functions

3 Prob. Generative Models
  Linear Discriminant Analysis
  Quadratic Discriminant Analysis

4 Prob. Discriminative Models

5 Bonus


Generative Model Review

This part is based on Bayesian posterior inference:

  P(Ck|x) = P(x|Ck) P(Ck) / P(x) = P(x|Ck) P(Ck) / Σ_j P(x|Cj) P(Cj)

If P(x|Ck) is Gaussian, then the log-odds is:

  log [P(Ci|x)/P(Cj|x)] = (1/2) log (|Σ_j|/|Σ_i|)
      + (1/2) [(x − μ_j)^T Σ_j^{−1} (x − μ_j) − (x − μ_i)^T Σ_i^{−1} (x − μ_i)]
      + log [P(Ci)/P(Cj)]

The task is to estimate the covariance and expectation of each class.


Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) arises in the special case when we assume that the classes share a common covariance matrix, Σ_k = Σ, ∀k. Then:

  log [P(Ci|x)/P(Cj|x)] = log [P(Ci)/P(Cj)] − (1/2)(μ_i + μ_j)^T Σ^{−1} (μ_i − μ_j) + x^T Σ^{−1} (μ_i − μ_j)

an equation linear in x. For each class, we have the linear discriminant function:

  δ_k(x) = x^T Σ^{−1} μ_k − (1/2) μ_k^T Σ^{−1} μ_k + log P(Ck)


Linear Discriminant Analysis (Cont.)

How to estimate the parameters? MLE:

  P̂(Ck) = N_k/N, where N_k is the number of class-k observations;

  μ̂_k = (1/N_k) Σ_{n ∈ Ck} x_n;

  Σ̂ = (1/(N − K)) Σ_{k=1}^{K} Σ_{n ∈ Ck} (x_n − μ̂_k)(x_n − μ̂_k)^T.
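These three estimators, together with the discriminant δ_k above, are direct to implement. A sketch (function names illustrative):

```python
import numpy as np

def fit_lda(X, labels, n_classes):
    """MLE for LDA: priors N_k/N, class means, pooled covariance over N - K."""
    N, D = X.shape
    priors = np.array([(labels == k).mean() for k in range(n_classes)])
    means = np.array([X[labels == k].mean(axis=0) for k in range(n_classes)])
    Sigma = np.zeros((D, D))
    for k in range(n_classes):
        Xc = X[labels == k] - means[k]          # center each class at its mean
        Sigma += Xc.T @ Xc
    Sigma /= (N - n_classes)                    # pooled (N - K)-divisor estimate
    return priors, means, Sigma

def lda_discriminants(x, priors, means, Sigma):
    """delta_k(x) = x^T Sigma^{-1} mu_k - 1/2 mu_k^T Sigma^{-1} mu_k + log P(C_k)."""
    Sinv = np.linalg.inv(Sigma)
    return np.array([x @ Sinv @ m - 0.5 * m @ Sinv @ m + np.log(p)
                     for p, m in zip(priors, means)])
```

Classification then takes the argmax of the discriminants over k.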


Quadratic Discriminant Analysis

Getting back to the general discriminant problem, if the Σ_k are not assumed to be equal, then the pieces quadratic in x remain. We then get the quadratic discriminant functions:

  δ_k(x) = −(1/2) log |Σ_k| − (1/2)(x − μ_k)^T Σ_k^{−1} (x − μ_k) + log P(Ck)

The decision boundary between each pair of classes k and l is described by a quadratic equation {x : δ_k(x) = δ_l(x)}.

Note: compared with LDA, QDA performs better on nonlinear problems.
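The quadratic discriminant is again a few lines; evaluating it per class and taking the argmax gives the QDA rule (a sketch, names illustrative):

```python
import numpy as np

def qda_discriminant(x, mu, Sigma, prior):
    """delta_k(x) = -1/2 log|Sigma_k| - 1/2 (x - mu_k)^T Sigma_k^{-1} (x - mu_k)
    + log P(C_k), with each class keeping its own covariance Sigma_k."""
    d = x - mu
    return (-0.5 * np.linalg.slogdet(Sigma)[1]          # log-determinant term
            - 0.5 * d @ np.linalg.solve(Sigma, d)       # Mahalanobis term
            + np.log(prior))
```

Because each class keeps its own Σ_k, the log-determinant and Mahalanobis terms differ across classes, which is what bends the boundary into a quadric.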


Outline of this section

1 Overview of Classification

2 Discriminant Functions

3 Prob. Generative Models

4 Prob. Discriminative Models
  Logistic Regression
  LDA or Logistic Regression?

5 Bonus


Logistic Regression

The logistic regression model arises from the desire to model the posterior probabilities of the K classes via linear functions in x, while ensuring that they sum to one and remain in [0, 1]. The model has the form:

  log [P(C1|x)/P(CK|x)] = w_{10} + w_1^T x

  log [P(C2|x)/P(CK|x)] = w_{20} + w_2^T x

  ...

  log [P(C_{K−1}|x)/P(CK|x)] = w_{(K−1)0} + w_{K−1}^T x


Logistic Regression (Cont.)

The model is specified in terms of K − 1 log-odds. Using the logistic function f(σ) = 1/(1 + exp(−σ)), we have:

  P(Ck|x) = exp(w_{k0} + w_k^T x) / (1 + Σ_{l=1}^{K−1} exp(w_{l0} + w_l^T x)), k = 1, 2, ..., K − 1

  P(CK|x) = 1 / (1 + Σ_{l=1}^{K−1} exp(w_{l0} + w_l^T x))

Computation: the weights are fitted by maximum likelihood estimation.
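Given fitted weights, the two display equations amount to a softmax over K logits in which the reference class C_K gets logit 0. A sketch (names illustrative; the max-subtraction is a standard numerical-stability trick, not part of the model):

```python
import numpy as np

def posterior_probs(x, W, b):
    """Posteriors of the K-class logistic model.

    W: (K-1, D) rows w_k; b: (K-1,) intercepts w_{k0}. Class K is the
    reference class with logit 0, as in the equations above.
    """
    logits = np.append(W @ x + b, 0.0)    # reference class C_K gets logit 0
    logits -= logits.max()                # stabilize exp() against overflow
    p = np.exp(logits)
    return p / p.sum()
```

With all weights zero the posteriors are uniform, and they always sum to one by construction.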


LDA or Logistic Regression?

For LDA, we find that the log-posterior odds between classes k and K are linear functions of x:

  log [P(Ck|x)/P(CK|x)] = log [P(Ck)/P(CK)] − (1/2)(μ_k + μ_K)^T Σ^{−1} (μ_k − μ_K) + x^T Σ^{−1} (μ_k − μ_K)
                        = α_{k0} + α_k^T x

For logistic regression, we have:

  log [P(Ck|x)/P(CK|x)] = w_{k0} + w_k^T x

The two models thus share the same linear form, but they estimate the coefficients differently: LDA fits the full joint likelihood under the Gaussian assumption, whereas logistic regression fits only the conditional likelihood. The logistic regression model is therefore more general, in that it makes fewer assumptions.


Outline of this section

1 Overview of Classification

2 Discriminant Functions

3 Prob. Generative Models

4 Prob. Discriminative Models

5 Bonus
  Multi-Class, Multi-Label, Struct-Output


Roadmap of Machine Learning in Computer Vision

Figure: Roadmap of machine learning in computer vision: from binary classification (early 1990s), through multi-class, multi-label, and multi-instance classification (2000s), to structured outputs today (bounding boxes, attributes, pixel-wise recognition, events and behaviour).


Multi-Label and Multi-Instance

Figure: Examples of multi-label learning (Li, CIKM'13)


Structure-Output: Bounding-Box

Figure: Examples of bounding boxes (Lan, ECCV'12)


Structure-Output: Sentence

Figure: Examples of "sentence" outputs (Lan, ECCV'12)


Structure-Output: Semantic Description

Figure: Examples of "semantic description" (Lan, CVPR'12)


Structure-Output: Pixel-wise Annotation

Figure: Examples of "pixel-wise annotation" (Ladicky, CVPR'13)


Fine-grained Categorization

Figure: Examples of "fine-grained categorization" (Yao, CVPR'11)


Challenges & Future Work

Figure: Challenges of visual recognition


Challenges & Future Work

Learning:
  Transfer learning;
  Weakly-supervised learning;
  Deep learning.

Data:
  Visual saliency, objectness, descriptors;
  Semantic networks.


Thank you. Q&A.
