
Classification Techniques


Pattern Classification and Dimensionality Reduction : An Overview

    Dr. D S Guru

    [email protected]

    Department of Studies in Computer Science,

    University of Mysore, Manasagangothri,

    Mysore - 570 006, INDIA.


    Classification

Given are the descriptions of n classes of objects (Learning : Cognition),

and an unknown object X; the task is to

identify the class label of X (Re-learning : Recognition).


    Application : Male or Female

    Classification

    Male Female


    Application : Character Recognition

Classification → "Hello"

In this case, there are 26 classes: A, B, ..., Z.


    Application : Medical diagnostics

Classification → Cancer / Not Cancer


    Application : Speech

    Speech input

Speaker recognition (Who?) : Identification / Verification

Speech recognition (What? How?)


    Classification

    Techniques to recognize or describe

    What : Unknown Pattern / Instance

    How : By means of measured properties

    called features.

    Thus,

Classification = Data Acquisition + Data Analysis


A formal definition :

M-1 : I → P, i.e. Ij = T(Pi)


    Stages in Classification

    Delineation

    Feature Extraction

    Descriptive features

    Discriminating features

    Representation (Knowledge base creation)

    Labeling


    Feature Extraction :

    Feature : An extractable measurement.

    Why ? : For Discrimination.

    What Feature ? : Depends on purpose of classification.

    How many ? : Depends on Qualities of the System.

    When ? : 1. Cognition (Training)

    2. Recognition (Classification)

    How ? : ??!!!


    Feature Extraction

    A fundamental step in Classification

    Influences the performance and simplicity of the

    classifier

Refers to defining new features which are functions of the original features

Depends on application domain and purpose of classification (i.e., Label)

    Representation Problem


Features

Text based features (Eg. keywords)

Visual features :

General (Eg. colour, texture, shape, spatial)

Domain specific


Features

Qualitative (Eg. intelligent, smart, beautiful, liking)

Quantitative (Numeric) :

Crisp : single value, eg. 10 cm

Fuzzy : eg. around 10 AM

Interval-valued : [a..b]

Multivalued : [a1, a2, ..., an]

Single categorical value, eg. Town = Madurai

Multivalued with weightage

Data with logical dependency


Fish Sorting

Fish on a conveyer belt → Fish image → Classifier → Fish species (Sword fish / Golden fish)


    Classifier Design


Fish Length in Centimeter

Length (cm) : 35  40  45  50  55  60  100  200  350  400  425  450  475  500
Golden fish : 20  33  48  55  40  28    0    0    0    0    0    0    0    0
Sword fish  :  0   0   0   0   0   0    0    0   10   25   40   54   32   15


    Fish Length as a Discriminating Factor


Find the best length threshold separating the Golden fish and Sword fish length distributions.

Threshold Selection :

Fish class label = Golden fish, if fish length ≤ Threshold
                 = Sword fish, if fish length > Threshold

Classifying a new sample
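A minimal sketch of the threshold selection (assuming Python with NumPy; the variable names are illustrative, and the counts are the ones tabulated above):

```python
import numpy as np

# Length bins (cm) and per-class counts from the fish-sorting table above.
lengths = np.array([35, 40, 45, 50, 55, 60, 100, 200, 350, 400, 425, 450, 475, 500])
golden  = np.array([20, 33, 48, 55, 40, 28, 0, 0,  0,  0,  0,  0,  0,  0])
sword   = np.array([ 0,  0,  0,  0,  0,  0, 0, 0, 10, 25, 40, 54, 32, 15])

def training_error(threshold):
    # Rule: golden fish if length <= threshold, sword fish otherwise.
    misclassified_golden = golden[lengths > threshold].sum()
    misclassified_sword = sword[lengths <= threshold].sum()
    return misclassified_golden + misclassified_sword

# Try every bin edge and keep the threshold with minimum training error.
errors = [training_error(t) for t in lengths]
best = lengths[int(np.argmin(errors))]
print(best, min(errors))  # here the two classes do not overlap, so the error is 0
```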


    Classifier Design

Use Fish Length as a feature

Threshold Selection : Sea bass vs. Salmon

Fish class label = Sea bass, if fish length > Threshold
                 = Salmon, if fish length ≤ Threshold

Classifying a new sample


    Classifiers



    Nearest Neighbor Classifier schematic

    For a test instance,

    1) Calculate distances from training pts.

    2) Find the first nearest neighbor

    3) Assign class label of the first neighbor



    K-NN classifier schematic

    For a test instance,

    1) Calculate distances from training pts.

    2) Find K-nearest neighbours (say, K = 3)

    3) Assign class label based on majority



K-NN classifier Issues

How good is it?

Susceptible to noisy values

Slow because of distance calculation

Alternate approaches :

Distances to representative points only

Partial distance

How to determine the value of K?

Determine K experimentally. The K that gives minimum error is selected.
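A minimal K-NN sketch (Python/NumPy assumed; the function name and toy data are illustrative, not from the slides). Setting k = 1 gives the nearest neighbour classifier of the earlier slide:

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    # 1) Distances from the test instance to all training points.
    dists = np.linalg.norm(train_X - x, axis=1)
    # 2) Indices of the K nearest neighbours.
    nearest = np.argsort(dists)[:k]
    # 3) Assign the class label by majority vote over the neighbours.
    return Counter(train_y[nearest]).most_common(1)[0][0]

train_X = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]])
train_y = np.array(['golden', 'golden', 'sword', 'sword'])
print(knn_classify(train_X, train_y, np.array([1.1, 1.0]), k=3))  # 'golden'
```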


    Support Vector Machines



    Support Vector Machine (SVM) Classification

    Classification as a problem of finding

    optimal (canonical) linear hyperplanes.

    Optimal Linear Separating Hyperplanes:

    In Input Space

    In Kernel Space

    Can be non-linear


    Linear Separating Hyper-Planes

    How many lines can separate these points?


    Which line should we use?


    Calculating the Margin of a Classifier

    P0

    P2

    P1

P0: Any separating hyperplane

P1: Parallel to P0, passing through the closest point in one class

P2: Parallel to P0, passing through the closest point of the opposite class

    Margin (M): distance measured along

    a line perpendicular to P1 and P2



Different P0s have Different Margins

[Three successive slides show different choices of the separating hyperplane P0, each with its parallels P1 and P2 and the resulting margin M as defined above: different P0s give different margins.]


How Do SVMs Choose the Optimal Separating Hyperplane (boundary)?

Find the separating hyperplane that maximizes the margin!

    Margin (M): distance measured along

    a line perpendicular to P1 and P2

margin (M) = 2 / ||w||
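A minimal sketch of the margin computation (assuming scikit-learn is available; the toy data is illustrative). After fitting a linear SVM, the margin follows from the learned weight vector w as M = 2 / ||w||:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds (illustrative data).
X = np.array([[0, 0], [1, 0], [0, 1], [4, 4], [5, 4], [4, 5]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6)  # a large C approximates the hard-margin SVM
clf.fit(X, y)

w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)  # M = 2 / ||w||
print(margin)
```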


    Neural Network



    Classifiers

    Linear Classifier

    Non Linear Classifier

    Parametric Classifier

    Non- Parametric Classifier

    Hierarchical Classifier

    Adaptive Classifier



    Examples (1) :

    Patterns :

    A B C D E F


    Features?

    Line and Curve Segments


Knowledge Acquired

Object | 0° | 45° | 90° | 145° | Top semicircle | Bottom semicircle | Left semicircle | Right semicircle
A      | 1  | 1   | 0   | 1    | 0 | 0 | 0 | 0
B      | 0  | 0   | 1   | 0    | 0 | 0 | 0 | 2
C      | 0  | 0   | 0   | 0    | 0 | 0 | 1 | 0
D      | 0  | 0   | 1   | 0    | 0 | 0 | 0 | 1
E      | 3  | 0   | 1   | 0    | 0 | 0 | 0 | 0
F      | 2  | 0   | 1   | 0    | 0 | 0 | 0 | 0


Recognition

Given a test pattern : ^

Object | 0° | 45° | 90° | 145° | Top semicircle | Bottom semicircle | Left semicircle | Right semicircle
^      | 0  | 1   | 0   | 1    | 0 | 0 | 0 | 0

Squared Euclidean distances to the stored objects:

Dist with A = 01    Dist with B = 07
Dist with C = 03    Dist with D = 04
Dist with E = 12    Dist with F = 07

The minimum distance (1) is with A, so ^ is assigned the label A.
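A minimal sketch of this recognition step (Python/NumPy assumed; the dictionary layout is illustrative), reproducing the squared Euclidean distances above:

```python
import numpy as np

# Rows: counts of strokes at 0, 45, 90, 145 degrees, then top/bottom/left/right semicircles.
knowledge_base = {
    'A': [1, 1, 0, 1, 0, 0, 0, 0], 'B': [0, 0, 1, 0, 0, 0, 0, 2],
    'C': [0, 0, 0, 0, 0, 0, 1, 0], 'D': [0, 0, 1, 0, 0, 0, 0, 1],
    'E': [3, 0, 1, 0, 0, 0, 0, 0], 'F': [2, 0, 1, 0, 0, 0, 0, 0],
}

test = np.array([0, 1, 0, 1, 0, 0, 0, 0])  # the pattern '^'

# Squared Euclidean distance to every stored prototype; the label of the minimum wins.
dists = {label: int(((np.array(v) - test) ** 2).sum()) for label, v in knowledge_base.items()}
print(dists)                      # {'A': 1, 'B': 7, 'C': 3, 'D': 4, 'E': 12, 'F': 7}
print(min(dists, key=dists.get))  # 'A'
```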


Example (2)

Patterns: a straight line, a circle, and an angle (not a straight line)

Eigenvalues as Features


Eigenvalues as Features

Given a set of points B = {pi | pi = (xi, yi) ∈ Z², i = 1, 2, 3, ..., n}, listed as two columns:

X : x1, x2, x3, ..., xn
Y : y1, y2, y3, ..., yn

where x̄ = (1/n) Σ xi and ȳ = (1/n) Σ yi.

Variance(x) = c11 = (1/n) Σ (xi − x̄)²

Variance(y) = c22 = (1/n) Σ (yi − ȳ)²

Co-Variance(x, y) = c12 = c21 = (1/n) Σ (xi − x̄)(yi − ȳ)

Construct the matrix

C = | c11  c12 |
    | c21  c22 |


Compute eigenvalues : solve for λ in | C − λI | = 0, giving

λL = (1/2) [ c11 + c22 + √((c11 − c22)² + 4 c12²) ]   (large eigenvalue)

λS = (1/2) [ c11 + c22 − √((c11 − c22)² + 4 c12²) ]   (small eigenvalue)

Compute eigenvectors : solve for the eigenvector V in CV = λV.
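A minimal sketch of the recipe (Python/NumPy assumed; the sample shapes are illustrative): build the 2 × 2 variance-covariance matrix of a point set and read off its eigenvalues as features:

```python
import numpy as np

def eigen_features(points):
    # points: n x 2 array of (x, y); C is the 2x2 variance-covariance matrix.
    C = np.cov(points, rowvar=False, bias=True)
    return np.sort(np.linalg.eigvalsh(C))[::-1]  # (large, small)

t = np.linspace(0, 2 * np.pi, 360)
u = np.linspace(0, 50, 200)

line = np.c_[u, 2 * u + 3]                    # points on y = 2x + 3
circle = np.c_[30 * np.cos(t), 30 * np.sin(t)]  # points on x^2 + y^2 = 30^2

print(eigen_features(line))    # small eigenvalue ~ 0, large grows with the length
print(eigen_features(circle))  # both eigenvalues (nearly) equal
```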


Supervised Training

Line : y = mx + c

Orientation (θ) | Length | λ large | λ small
0   | 50 | 50.54 | 0.0
10  | 60 | 60.34 | 0.0
27  | 70 | 70.45 | 0.0
30  | 55 | 55.62 | 0.0


Supervised Training

Circle : x² + y² = r²

Radius (r) | λ large  | λ small
10  | 52.0727  | 52.0727
20  | 201.98   | 201.98
30  | 452.57   | 452.5749
45  | 1016.90  | 1016.90
90  | 4074.90  | 4074.90


Supervised Training

Angle : y = tan⁻¹(90 − θ/2) · |x|

θ   | λ large    | λ small
40  | 1.56 × 10³ | 148.8766
60  | 851.3714   | 0.83 × 10³
90  | 843.9310   | 605.5014
100 | 843.116    | 208.85


    Knowledge Acquired

Straight Line :

* Small eigenvalue is zero, and
* Large eigenvalue is proportional to the length of the line

Circle :

* Both eigenvalues are equal

Angle :

* Eigenvalues are different
* Small eigenvalue is relatively large

    KB : Created through Supervised Learning


    Approaches to classification

    Geometrical or Statistical Approach

    Structural or Syntactic Approach

    Topology Based Methods


Topology Based Methods

Eg: A man vs. an animal : Physical Image → Symbolic Image

[Figure: stick figures of a man and of an animal, each decomposed into labelled parts (H, S, L); the same part labels appear in both, arranged with different topology.]


Shape based Methods:

Pixel representation

Chain code representation

Polygonal approximation

Higher order Moments

Centroidal / Radial profile

Incremental Circle Transform

Axis of least inertia, etc.


ICT and EA : Integrated approach

Definition : Let C(l), 0 ≤ l ≤ L, be a closed curve. The ICT vector of C is

ΔC(l) = (Δx(l), Δy(l)) such that

Δx²(l) + Δy²(l) = r²

and C(l + Δl) = C(l) + ΔC(l),

for some r and 0 ≤ l ≤ L.

Boundary representation scheme:
1) Compute the ICT vector
2) Find the first PCV

Invariant Properties:


Translation invariant

Theorem : Let C(l), 0 ≤ l ≤ L, be the boundary curve of an object and ΔC(l) = (Δx(l), Δy(l)) be the corresponding ICT vector computed with constant radius r. If Ct(l) is the translated version of C(l) and ΔCt(l) = (Δxt(l), Δyt(l)) is its corresponding ICT vector computed with the same radius r, then irrespective of the translation vector the determinants of the variance-covariance matrices of ΔC(l) and ΔCt(l) remain the same.

Rotation Invariant

Theorem : Let C(l), 0 ≤ l ≤ L, be the boundary of an object and ΔC(l) = (Δx(l), Δy(l)) be the corresponding ICT vector computed with constant radius r. If Cr(l) is the rotated version of C(l) and ΔCr(l) = (Δxr(l), Δyr(l)) is its corresponding ICT vector computed with the same radius r, then irrespective of the rotation angle the determinants of the variance-covariance matrices of ΔC(l) and ΔCr(l) remain the same.

Corollary : The eigenvalues of the variance-covariance matrix of the ICT vector of the boundary of a given object are rotational invariants.

Flipping Invariant


Lemma : Let C(l), 0 ≤ l ≤ L, be the boundary curve of an object and ΔC(l) = (Δx(l), Δy(l)) be the corresponding ICT vector computed with constant radius r. If Cf(l) is the flipped version of C(l) about the Y-axis or/and X-axis and ΔCf(l) is its corresponding ICT vector computed with the same radius r, then the eigenvalues of the variance-covariance matrix of ΔCf(l) are the same as those of ΔC(l).

Theorem : Let C(l) be the shape curve of an object and ΔC(l) = (Δx(l), Δy(l)) be its corresponding ICT vector computed with a constant radius r. If Cf(l) is the flipped version of C(l) about an arbitrary line and ΔCf(l) is its corresponding ICT vector computed with the same radius r, then the eigenvalues of ΔCf(l) and ΔC(l) are one and the same.

Proposed Methodology


Algorithm : Create_Knowledge_base

Input : S, set of images of objects to be learnt (say n in number)
Output : Knowledge base of eigenvalues
Method : For each image I in S do
    1. Extract the boundary curve B using a suitable boundary extractor.
    2. Compute the ICT vector V for the boundary B.
    3. Construct the variance-covariance matrix M of the ICT vector V.
    4. Find the largest eigenvalue E of the matrix M.
    5. Store the eigenvalue E to represent the image I in a knowledge base KB.
For end.
Create_Knowledge_base ends.

Algorithm : Recognition

Input : I, the image of an object O to be recognized
Output : Index of I if it is one of the learnt images
Method :
    1. Extract the boundary curve B of I.
    2. Compute the ICT vector V for B.
    3. Construct the variance-covariance matrix M of V.
    4. Find the largest eigenvalue E of M.
    5. Employ a binary search to look for E in the knowledge base KB within some threshold value and return the index.
Recognition ends.
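A minimal sketch of the two algorithms (Python/NumPy assumed; extract_boundary and compute_ict_vector are hypothetical placeholders for a real boundary extractor and ICT implementation):

```python
import numpy as np
from bisect import bisect_left

def largest_eigenvalue(ict_vector):
    # ict_vector: n x 2 array of (dx(l), dy(l)); M is its variance-covariance matrix.
    M = np.cov(ict_vector, rowvar=False, bias=True)
    return float(np.linalg.eigvalsh(M)[-1])

def create_knowledge_base(images, extract_boundary, compute_ict_vector, r=5):
    # Store (largest eigenvalue, index) pairs sorted by eigenvalue, ready for binary search.
    return sorted((largest_eigenvalue(compute_ict_vector(extract_boundary(img), r)), i)
                  for i, img in enumerate(images))

def recognize(image, kb, extract_boundary, compute_ict_vector, r=5, threshold=1e-3):
    e = largest_eigenvalue(compute_ict_vector(extract_boundary(image), r))
    keys = [k for k, _ in kb]
    pos = bisect_left(keys, e)            # binary search in the sorted knowledge base
    for j in (pos - 1, pos):              # check both neighbours of the insertion point
        if 0 <= j < len(kb) and abs(keys[j] - e) <= threshold:
            return kb[j][1]               # index of the learnt image
    return None                           # not a learnt object
```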


36 samples of each object are considered.

No flipped version of any object is considered.

    Data set 3

    A set of Industrial objects



Object Type | Determinant Span | Large eigenvalue Span | Small eigenvalue Span
Model 1 | 0.2184 to 0.2302 | 0.5697 to 0.5808 | 0.3818 to 0.3966
Model 2 | 0.2214 to 0.2323 | 0.4939 to 0.5068 | 0.4457 to 0.4596
Model 3 | 0.2284 to 0.2360 | 0.5461 to 0.5578 | 0.4160 to 0.4255
Model 4 | 0.1656 to 0.1729 | 0.7198 to 0.7361 | 0.2298 to 0.2356
Model 5 | 0.2012 to 0.2068 | 0.4485 to 0.4560 | 0.4481 to 0.4543
Model 6 | 0.2264 to 0.2335 | 0.5319 to 0.5398 | 0.4253 to 0.4333
(a)

Object Type | Determinant Span | Large eigenvalue Span | Small eigenvalue Span
Key A | 0.1824 to 0.1893 | 0.5929 to 0.6070 | 0.3029 to 0.3146
Key B | 0.1873 to 0.1935 | 0.5694 to 0.5793 | 0.3279 to 0.3358
Key C | 0.1988 to 0.2044 | 0.5547 to 0.5639 | 0.3567 to 0.3636
Key D | 0.1920 to 0.1983 | 0.5828 to 0.5913 | 0.3288 to 0.3358
(b)

Object Type | Determinant Span | Large eigenvalue Span | Small eigenvalue Span
Industrial Obj 1 | 0.1175 to 0.1207 | 0.7402 to 0.7442 | 0.1588 to 0.1628
Industrial Obj 2 | 0.0926 to 0.0962 | 0.7958 to 0.8018 | 0.1162 to 0.1206
Industrial Obj 3 | 0.1528 to 0.1554 | 0.7355 to 0.7401 | 0.2076 to 0.2103
Industrial Obj 4 | 0.0664 to 0.0697 | 0.8106 to 0.8155 | 0.0818 to 0.0856
Industrial Obj 5 | 0.0815 to 0.0854 | 0.8156 to 0.8276 | 0.0993 to 0.1039
Industrial Obj 6 | 0.1434 to 0.1477 | 0.6695 to 0.6786 | 0.2142 to 0.2176
Industrial Obj 7 | 0.1476 to 0.1515 | 0.6499 to 0.6561 | 0.2261 to 0.2312
Industrial Obj 8 | 0.1304 to 0.1355 | 0.7352 to 0.7409 | 0.1765 to 0.1839
Industrial Obj 9 | 0.0566 to 0.0601 | 0.7834 to 0.7918 | 0.0717 to 0.0760
(c)

Table 10.1(a-c). Span in determinant, large eigenvalue and small eigenvalue.


    MACHINE LEARNING ?!


Learning : gaining knowledge of ..., or skill in ..., by

study, practice or being taught (Supervised), or

through experience (Unsupervised).

A crucial stage in Machine Perception.


A COW          A COW WITH THREE LEGS AND TWO TAILS

Machine Learning through Vision? Re-Learning?

MACHINE LEARNING ?!


Learning : gaining knowledge of ..., or skill in ..., by

study, practice or being taught (Supervised), or

through experience (Unsupervised).

A crucial stage in Machine Perception.

The process that allows the learner to cope with reality.

A cognitive process.

    Dimensionality Reduction


[Figure: an m × n feature matrix (samples vs. features, indexed 1, 2, 3, ..., m and 1, 2, 3, ..., n).]

Reducing m to c ; c << m


Advantages of DR

Reduction in memory requirement

Data analysis becomes simplified

Cluster analysis, and hence classifier design, becomes easier

Visualization becomes relatively possible

Time efficient classifier

Dimensionality Reduction Methodologies


Feature Subsetting

Feature Transformation

Feature Subsetting : the process of choosing d features out of the collection of n features.

There are 2^n possible subsets.

The problem lies in choosing the best subset : O(2^n), i.e., exponential.

Feature Transformation :

Original features → T → Transformed features

What is T?


Feature Selection Methods

Filter method (Supervised) :

Learning algorithm independent

A feature selection criterion is required

Linear time complexity

Wrapper method (Unsupervised) :

Learning algorithm dependent

No feature selection criterion is required

Quadratic complexity


The Simplest Filter Method :

Repeat

    Merge those two features for which the correlation is the highest

Until the desired level of dimensionality reduction is achieved.
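A minimal sketch of this filter (Python/NumPy assumed); "merging" is read here as averaging the two feature columns, which is one reasonable interpretation of the slide:

```python
import numpy as np

def correlation_filter(X, target_dim):
    # X: m x n data matrix. Repeatedly merge the two most-correlated
    # features until target_dim features remain.
    X = X.astype(float).copy()
    while X.shape[1] > target_dim:
        A = np.abs(np.corrcoef(X, rowvar=False))  # n x n feature correlations
        np.fill_diagonal(A, -1.0)                 # ignore self-correlation
        i, j = np.unravel_index(np.argmax(A), A.shape)
        merged = (X[:, i] + X[:, j]) / 2.0        # merge = average (an assumption)
        X = np.delete(X, [i, j], axis=1)
        X = np.column_stack([X, merged])
    return X

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 3))
X = np.column_stack([base, base[:, 0] + 0.01 * rng.normal(size=100)])  # 4th ~ copy of 1st
print(correlation_filter(X, 3).shape)  # (100, 3)
```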


Wrapper Methods

Sequential Forward Selection (SFS)

Sequential Backward Selection (SBS)

Sequential Floating Forward Selection (SFFS)

Sequential Floating Backward Selection (SFBS)


    Sequential Forward Selection (SFS)

    Method of inclusion

    Starts with empty set

    At each step it adds a best feature such that

    performance of a learning algorithm is

    maximized


    SFS - Example
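The example figure did not survive the transcript, so here is a minimal SFS sketch instead (Python assumed; score stands for any performance measure of the learning algorithm, and the toy score is illustrative):

```python
def sfs(features, score, target_dim):
    # Greedy inclusion: start empty, repeatedly add the feature whose
    # inclusion maximizes the learning algorithm's performance.
    selected = []
    remaining = list(features)
    while remaining and len(selected) < target_dim:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Illustrative toy score: prefer the subset {'length', 'width'}.
toy_score = lambda subset: len(set(subset) & {'length', 'width'}) - 0.01 * len(subset)
print(sfs(['length', 'width', 'lightness', 'girth'], toy_score, 2))  # ['length', 'width']
```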



    Sequential Backward Selection (SBS)

    Method of elimination

    Starts with the set of all features

    At each step it eliminates a worst feature such

    that performance of a learning algorithm is

    maximized


    SBS - Example


    Sequential Floating Forward Selection (SFFS)


    Method of inclusion and elimination

    Starts with empty set

    Forward selection followed by backward elimination

    SFS + SBS at each step


    Sequential Floating Backward Selection (SFBS)


    Method of elimination and inclusion

    Starts with set of all features

    Backward elimination followed by forward selection

    SBS + SFS at each step



Feature Transformation Techniques

Principal Component Analysis

Independent Component Analysis

Latent Semantic Indexing

Manifold Learning

Fisher Linear Discriminant Analysis

Canonical Correlation Analysis

Partial Least Squares

    Principal Component Analysis


PCA

Let F be a feature matrix,

M = Covariance(F) ;

λ = eigenvalues of M ( |M − λI| = 0 ) ;

MV = λV.

[Figure: original feature axes f1, f2, ..., fn rotated into principal component axes PC1, PC2, ..., PCn; samples s1, ..., s4 projected onto pc1 and pc2 with distances d1, ..., d4.]
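A minimal PCA sketch matching the recipe above (Python/NumPy assumed; the data is illustrative):

```python
import numpy as np

def pca(F, c):
    # F: m x n feature matrix; keep the top-c principal components.
    Fc = F - F.mean(axis=0)                 # center the data
    M = np.cov(Fc, rowvar=False)            # M = Covariance(F)
    eigvals, V = np.linalg.eigh(M)          # solves MV = lambda V
    order = np.argsort(eigvals)[::-1][:c]   # largest eigenvalues first
    return Fc @ V[:, order]                 # project samples onto PC1..PCc

rng = np.random.default_rng(1)
F = rng.normal(size=(100, 5))
print(pca(F, 2).shape)  # (100, 2)
```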


    Stereographic Projection Model

    Quadratic Solver



[Figure: a cuboid with vertices numbered (1)-(8) at coordinates (1,15,6), (8,15,6), (8,15,7), (1,15,7), (8,20,6), (8,20,7), (1,20,6), (1,20,7), drawn against the X, Y, Z axes.]


Quadratic Solver Based Model

Triplet (a, b, c) → Root pair

(1, -5, 6)   → (3, 2)        (2, 8, 6)    → (-3, 3/2)
(1, 8, 12)   → (-6, -2)      (4, -14, 6)  → (3, 1/2)
(1, 7, 12)   → (-4, -3)      (2, -15, 13) → (1, 13/2)
(1, -11, 24) → (3, 8)        (1, -8, 15)  → (5, 3)
(1, -7, 10)  → (5, 2)        (5, -7, 2)   → (1, 2/5)
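A minimal sketch of the triplet-to-pair reduction (Python/NumPy assumed): each triplet (a, b, c) maps to the root pair of ax² + bx + c = 0:

```python
import numpy as np

def reduce_triplet(a, b, c):
    # Quadratic formula: roots of a*x^2 + b*x + c = 0.
    d = np.sqrt(b * b - 4 * a * c)
    return ((-b + d) / (2 * a), (-b - d) / (2 * a))

print(reduce_triplet(1, -5, 6))   # (3.0, 2.0)
print(reduce_triplet(4, -14, 6))  # (3.0, 0.5)
```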


[Figure: the reduced representations of the triplets plotted as labelled points (1-8) along axes with ticks from -20 to -14 and from -0.9 to -0.2.]


Quadratic Solver : Dimensionality Reducer

Triplet (a, b, c) → Root pair

(1, -5, 6)   → (3, 2)        (2, 8, 6)    → (-3, 3/2)
(1, 8, 12)   → (-6, -2)      (4, -14, 6)  → (3, 1/2)
(1, 7, 12)   → (-4, -3)      (2, -15, 13) → (1, 13/2)
(1, -11, 24) → (3, 8)        (1, -8, 15)  → (5, 3)
(1, -7, 10)  → (5, 2)        (5, -7, 2)   → (1, 2/5)

[Two 5 × 5 pairwise-distance matrices (rows/columns indexed 1-5), one computed over the triplets and one over the reduced root pairs; the numeric entries are garbled in the transcript.]


    Semantic Gap



Features

Low Level Features :

Extracted directly from data

Easy to extract and analyze

Widely used

Not realistic in nature and hence far away from human perception

Statistical analysis can be carried out

Conventional in nature

High Level Features :

Inferred from low level features

Difficult to extract and analyze

Rarely used

Realistic in nature and hence similar to human perception

Aggregation and abstraction is possible

Unconventional in nature

Semantic Gap


Proximity (Conventional)

Works on crisp type features

The proximity is crisp

It is symmetric

Similarity + Dissimilarity = Constant

But in reality,
- The feature is not necessarily crisp
- Proximity itself may not be crisp
- Proximity might not be symmetric
- Similarity might not be just another aspect of dissimilarity

Semantic Gap : the gap between what Technology Provides and what the User Demands.


Existing classifiers :

Parametric

Exclusive

Uncertain (Inconsistency)

Non-adaptive

Demanding classifiers :

Non-Parametric

Overlapping

Consistent

Adaptive

Semantic Gap between existing and demanding classifiers.

Some Publications from my team for your reference

1. D.S. Guru, K.S. Manjunatha, S. Manjunath., User Dependent Features in Online Signature Verification. Proceedings of ICMCCA12, LNEE, 2012.

2. Harish B S, Guru D S and Manjunath S., Dissimilarity Based Feature Selection for Text Classification: A Cluster Based Approach. Proceedings of ACM International Conference and Workshop on Emerging Trends and Technology, Feb 25-26, Mumbai, India, 2011.

3. Guru D S and Mallikarjuna P B., Fusion of Texture Features and Sequential Forward Selection method for Classification of Tobacco Leaves for Automatic Harvesting. In Proceedings of Second International Conference on Computational Vision and Robotics, Bhubaneshwar, India, August 14-15, 2011, pp. 168-172.

4. B. S. Harish, D. S. Guru, S. Manjunath, Bapu B. Kiranagi., Symbolic Similarity and Symbolic Feature Selection for Text Classification. International Workshop on Emerging Applications on Computer Vision, 2011, pp. 21-28, Moscow (Russia), pp. 141-146.

5. D. S. Guru, P. B. Mallikarjuna., Classification of Tobacco Leaves for Automatic Harvesting: An Approach Based on Feature Level Fusion and SBS Method. International Workshop on Emerging Applications on Computer Vision, 2011, pp. 21-28, Moscow (Russia), pp. 102-109.

6. D. S. Guru, M.G. Suraj, S. Manjunath., Fusion of covariance matrices of PCA and FLD. Pattern Recognition Letters, 32, 2011, pp. 432-440.

7. Harish B S, Guru D S, Manjunath S, Dinesh R., Cluster Based Symbolic Representation and Feature Selection for Text Classification. Proceedings of Advanced Data Mining and Applications, Vol. 2, pp. 158-166, 2010.

8. Punitha P and Guru D S., Symbolic image indexing and retrieval by spatial similarity: An approach based on B-tree. Journal of Pattern Recognition, Elsevier Publishers, Vol. 41, 2008, pp. 2068-2085.

9. Suraj M G and Guru D S., Secondary diagonal FLD for fingerspelling recognition. Proceedings of the International Conference on Computing: Theory and Applications (ICCTA07), Kolkata, India, March 5-7, 2007, pp. 693-697.

10. Kiranagi B B, Guru D S and Ichino M., Exploitation of multivalued type proximity for symbolic feature selection. Proceedings of the International Conference on Computing: Theory and Applications (ICCTA07), Kolkata, India, March 5-7, 2007, pp. 320-324.

11. Nagabhushan P, Guru D S and Shekar B H., (2D)2 FLD: An efficient approach for appearance based object recognition. Journal of Neurocomputing, Elsevier Publishers, Vol. 69, No. 7-9, 2006, pp. 934-940.

12. Nagabhushan P, Guru D S and Shekar B H., Visual Learning and Recognition of 3D Objects Using Two Dimensional Principal Component Analysis: A Robust and an Efficient Approach. Journal of Pattern Recognition, Elsevier Publishers, Vol. 39, No. 4, 2006, pp. 721-725.


There is always a distance between two living things, as it is unlikely that any two living beings are alike. It is true even with artificially made objects, however visually alike they are.

- D S Guru

Now, Questions are Welcome!

    R E S E A R C H ??


    Reading a lot for

    Establishing

    Scientific and

    Engineering

    Aptitude to have a good personal

    Rapport with a

    Commitment to build up a

    Healthy society for the development of Nation

    -D.S. Guru

    No(w) Questions!?


Dr. D S Guru

