Text classification II. CE-324: Modern Information Retrieval, Sharif University of Technology. M. Soleymani, Fall 2017. Some slides have been adapted from Profs. Manning, Nayak & Raghavan (CS-276, Stanford).
Transcript
Page 1

Text classification II

CE-324: Modern Information Retrieval, Sharif University of Technology

M. Soleymani

Fall 2017

Some slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

Page 2

Outline

Vector space classification

Rocchio

Linear classifiers

kNN

2

Page 3

Standing queries

The path from IR to text classification:

You have an information need to monitor, say:

Unrest in the Niger delta region

You want to rerun an appropriate query periodically to find new news items on this topic

You will be sent new documents that are found

I.e., it’s not ranking but classification (relevant vs. not relevant)

Such queries are called standing queries

Long used by “information professionals”

A modern mass instantiation is Google Alerts

Standing queries are (hand-written) text classifiers

Ch. 13

Page 4

4

Recall: vector space representation

Each doc is a vector

One component for each term (= word).

Terms are axes

Usually normalize vectors to unit length.

High-dimensional vector space:

10,000+ dimensions, or even 100,000+

Docs are vectors in this space

How can we do classification in this space?

Sec.14.1

Page 5

5

Classification using vector spaces

Training set: a set of docs, each labeled with its class (e.g., topic)

This set corresponds to a labeled set of points (or, equivalently, vectors) in the vector space

Premise 1: Docs in the same class form a contiguous region of space

Premise 2: Docs from different classes don’t overlap (much)

We define surfaces to delineate classes in the space

Sec.14.1

Page 6

6

Documents in a vector space

Government

Science

Arts

Sec.14.1

Page 7

7

Test document of what class?

Government

Science

Arts

Sec.14.1

Page 8

8

Test document of what class?

Government

Science

Arts

Is this similarity hypothesis true in general?

Our main topic today is how to find good separators

Sec.14.1


Page 9

Relevance feedback: relation to classification

9

In relevance feedback, the user marks docs as relevant/non-relevant.

Relevant/non-relevant can be viewed as classes or categories.

For each doc, the user decides which of these two classes is correct.

Relevance feedback is a form of text classification.

Page 10

Rocchio for text classification

Relevance feedback methods can be adapted for text categorization

Relevance feedback can be viewed as 2-class classification

Use standard tf-idf weighted vectors to represent text docs

For training docs in each category, compute a prototype as the centroid of the vectors of the training docs in the category.

Prototype = centroid of members of class

Assign test docs to the category with the closest prototype vector based on cosine similarity.

10

Sec.14.2

Page 11

Definition of centroid

μ(c) = (1/|D_c|) ∑_{d ∈ D_c} d

𝐷𝑐: docs that belong to class 𝑐

𝑑 : vector space representation of 𝑑.

Centroid will in general not be a unit vector even when the inputs are unit vectors.

11

Sec.14.2

Page 12

Rocchio algorithm

12
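The pseudocode figure on this slide did not survive extraction. As a stand-in, here is a minimal sketch of Rocchio training and classification; the function names and the toy 2-D vectors are illustrative, and real use would start from tf-idf doc vectors:

```python
# Rocchio sketch: prototype = centroid per class; classify by cosine
# similarity to the nearest prototype. Toy vectors are illustrative.
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def train_rocchio(labeled_docs):
    """labeled_docs: dict mapping class -> list of doc vectors."""
    return {c: centroid(vs) for c, vs in labeled_docs.items()}

def apply_rocchio(prototypes, doc):
    """Assign doc to the class whose prototype is most similar."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], doc))

protos = train_rocchio({
    "science": [[0.9, 0.1], [0.8, 0.2]],
    "arts":    [[0.1, 0.9], [0.2, 0.8]],
})
print(apply_rocchio(protos, [0.7, 0.3]))  # -> science
```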

Page 13

Rocchio: example

13

We will see that Rocchio finds linear boundaries between classes

Government

Science

Arts

Page 14

Illustration of Rocchio: text classification

14

Sec.14.2

Page 15

15

Rocchio properties

Forms a simple generalization of the examples in each class (a prototype).

Prototype vector does not need to be normalized.

Classification is based on similarity to class prototypes.

Does not guarantee classifications are consistent with the given training data.

Sec.14.2

Page 16

16

Rocchio anomaly

Prototype models have problems with polymorphic (disjunctive) categories.

Sec.14.2

Page 17

Rocchio classification: summary

Rocchio forms a simple representation for each class:

Centroid/prototype

Classification is based on similarity to the prototype

It does not guarantee that classifications are consistent with the given training data

It is little used outside text classification

It has been used quite effectively for text classification

But in general worse than many other classifiers

Rocchio does not handle nonconvex, multimodal classes correctly.

17

Sec.14.2

Page 18

Linear classifiers

18

Assumption: The classes are linearly separable.

Classification decision: ∑_{i=1}^{m} w_i x_i + w_0 > 0?

First, we only consider binary classifiers.

Geometrically, this corresponds to a line (2D), a plane (3D), or a hyperplane (higher dimensionalities) as decision boundary.

Find the parameters 𝑤0, 𝑤1, … , 𝑤𝑚 based on training set.

Methods for finding these parameters: Perceptron, Rocchio,…

Page 19

19

Separation by hyperplanes

A simplifying assumption is linear separability:

in 2 dimensions, can separate classes by a line

in higher dimensions, need hyperplanes

Sec.14.4

Page 20

Two-class Rocchio as a linear classifier

Line or hyperplane defined by:

w_0 + ∑_{i=1}^{M} w_i d_i = w_0 + wᵀd ≥ 0

For Rocchio, set:

w = μ(c1) − μ(c2)

w_0 = (‖μ(c2)‖² − ‖μ(c1)‖²) / 2

(so a test doc d lands on the positive side exactly when it is closer to μ(c1) than to μ(c2))

20

Sec.14.2
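As a sanity check (not from the slides), the sketch below verifies numerically that Euclidean nearest-centroid assignment coincides with a linear decision built from the two centroids; the bias sign follows the convention w0 + wᵀd ≥ 0, and the random 5-D vectors are illustrative:

```python
# Numerical check: Euclidean nearest-centroid assignment equals the
# linear rule w.x + w0 >= 0 with w = mu1 - mu2 and
# w0 = (|mu2|^2 - |mu1|^2) / 2.
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

random.seed(0)
mu1 = [random.random() for _ in range(5)]  # centroid of class c1
mu2 = [random.random() for _ in range(5)]  # centroid of class c2
w = [a - b for a, b in zip(mu1, mu2)]
w0 = (dot(mu2, mu2) - dot(mu1, mu1)) / 2

for _ in range(100):
    x = [random.random() for _ in range(5)]
    closer_to_mu1 = sqdist(x, mu1) <= sqdist(x, mu2)
    linear_says_c1 = dot(w, x) + w0 >= 0
    assert closer_to_mu1 == linear_says_c1
print("nearest-centroid and linear rule agree on 100 random points")
```

Expanding the squared distances shows why: ‖x − μ2‖² − ‖x − μ1‖² = 2(wᵀx + w0), so the two tests always have the same sign.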

Page 21

21

Linear classifier: example

Class: “interest” (as in interest rate)

Example features of a linear classifier (weight w_i, term t_i):

To classify, find the dot product of the feature vector and the weights

• 0.70 prime

• 0.67 rate

• 0.63 interest

• 0.60 rates

• 0.46 discount

• 0.43 bundesbank

• −0.71 dlrs

• −0.35 world

• −0.33 sees

• −0.25 year

• −0.24 group

• −0.24 dlr

Sec.14.4

Page 22

Linear classifier: example

22

Class “interest” in Reuters-21578, with w_0 = 0:

d_1: “rate discount dlrs world”

d_2: “prime dlrs”

wᵀd_1 = 0.07 ⇒ d_1 is assigned to the “interest” class

wᵀd_2 = −0.01 ⇒ d_2 is not assigned to this class
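These two scores can be reproduced directly from the weight list on the previous slide; the helper below is an illustrative sketch (terms not in the table get weight 0):

```python
# Reproduces the dot products for the "interest" class (weights from
# the previous slide; bias w0 = 0).
weights = {
    "prime": 0.70, "rate": 0.67, "interest": 0.63, "rates": 0.60,
    "discount": 0.46, "bundesbank": 0.43,
    "dlrs": -0.71, "world": -0.35, "sees": -0.33,
    "year": -0.25, "group": -0.24, "dlr": -0.24,
}

def score(doc, w0=0.0):
    """Dot product of the doc's terms with the weight vector."""
    return w0 + sum(weights.get(t, 0.0) for t in doc.split())

print(round(score("rate discount dlrs world"), 2))  # 0.07 -> "interest"
print(round(score("prime dlrs"), 2))                # -0.01 -> not "interest"
```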

Page 23

Naïve Bayes as a linear classifier

23

P(C_1) ∏_{i=1}^{M} P(t_i|C_1)^{tf_{i,d}} > P(C_2) ∏_{i=1}^{M} P(t_i|C_2)^{tf_{i,d}}

Taking logs:

log P(C_1) + ∑_{i=1}^{M} tf_{i,d} × log P(t_i|C_1) > log P(C_2) + ∑_{i=1}^{M} tf_{i,d} × log P(t_i|C_2)

So Naïve Bayes is a linear classifier with:

w_i = log [P(t_i|C_1) / P(t_i|C_2)],  x_i = tf_{i,d},  w_0 = log [P(C_1) / P(C_2)]
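A small sketch of this equivalence, with made-up conditional probabilities: computing the class log-odds directly and via the linear weights w_i, w_0 gives the same number:

```python
# Sketch: multinomial NB decision rewritten as a linear classifier.
# The probabilities below are invented for illustration only.
import math

p_t_c1 = {"rate": 0.20, "dlrs": 0.05, "world": 0.05}
p_t_c2 = {"rate": 0.05, "dlrs": 0.20, "world": 0.10}
prior1, prior2 = 0.5, 0.5

w = {t: math.log(p_t_c1[t] / p_t_c2[t]) for t in p_t_c1}  # w_i
w0 = math.log(prior1 / prior2)                             # w_0

def nb_linear(tf):
    """Linear form: w0 + sum_i tf_i * w_i  (tf: term -> count)."""
    return w0 + sum(count * w[t] for t, count in tf.items())

def nb_loglik_diff(tf):
    """Direct NB computation: log P(C1,d) - log P(C2,d)."""
    s1 = math.log(prior1) + sum(c * math.log(p_t_c1[t]) for t, c in tf.items())
    s2 = math.log(prior2) + sum(c * math.log(p_t_c2[t]) for t, c in tf.items())
    return s1 - s2

doc = {"rate": 2, "world": 1}
assert abs(nb_linear(doc) - nb_loglik_diff(doc)) < 1e-9
print("linear form agrees with direct NB log-odds")
```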

Page 24

24

Linear programming / Perceptron

Find a,b,c, such that

ax + by > c for red points

ax + by < c for blue points

Sec.14.4

Page 25

25

Which hyperplane?

In general, lots of possible solutions for a, b, c.

Sec.14.4

Page 26

26

Which hyperplane?

Lots of possible solutions for a, b, c.

Some methods find a separating hyperplane, but not the optimal one [according to some criterion of expected goodness]

Which points should influence optimality?

All points

E.g., Rocchio

Only “difficult points” close to decision boundary

E.g., Support Vector Machine (SVM)

Sec.14.4

Page 27

27

Support Vector Machine (SVM)

Support vectors

Maximizes margin

SVMs maximize the margin around the separating hyperplane.

A.k.a. large margin classifiers

Solving SVMs is a quadratic programming problem

Seen by many as the most successful current text classification method*

*but other discriminative methods often perform very similarly

Sec. 15.1

Narrower margin

Page 28

28

Linear classifiers

Many common text classifiers are linear classifiers

Classifiers more powerful than linear often don’t perform better on text problems. Why?

Despite the similarity of linear classifiers, there are noticeable performance differences between them

For separable problems, there is an infinite number of separating hyperplanes.

Different training methods pick different hyperplanes.

Also different strategies for non-separable problems

Sec.14.4

Page 29

29

Linear classifiers: binary and multiclass classification

Consider 2-class problems

Deciding between two classes, perhaps government and non-government

Multi-class

How do we define (and find) the separating surface?

How do we decide which region a test doc is in?

Sec.14.4

Page 30

30

More than two classes

One-of classification (multi-class classification)

Classes are mutually exclusive.

Each doc belongs to exactly one class

Any-of classification

Classes are not mutually exclusive.

A doc can belong to 0, 1, or >1 classes.

For simplicity, decompose into K binary problems

Quite common for docs

Sec.14.5

Page 31

31

Set of binary classifiers: any-of

Build a separator between each class and its complementary set (docs from all other classes).

Given a test doc, evaluate it for membership in each class.

Apply the decision criterion of the classifiers independently

It works, although considering dependencies between categories may be more accurate

Sec.14.5

Page 32

32

Multi-class: set of binary classifiers

Build a separator between each class and its complementary set (docs from all other classes).

Given test doc, evaluate it for membership in each class.

Assign doc to class with:

maximum score

maximum confidence

maximum probability


Sec.14.5

Page 33

33

k Nearest Neighbor Classification

kNN = k Nearest Neighbor

To classify a document d:

Define the k-neighborhood as the k nearest neighbors of d

Pick the majority class label in the k-neighborhood

Sec.14.3

Page 34

34

Nearest-Neighbor (1NN) classifier

Learning phase: just store the representations of the training examples in D.

Does not explicitly compute category prototypes.

Testing instance x (under 1NN): compute the similarity between x and all examples in D.

Assign x the category of the most similar example in D.

Rationale of kNN: the contiguity hypothesis

We expect a test doc d to have the same label as the training docs located in the local region surrounding d.

Sec.14.3

Page 35

35

Test Document = Science

Government

Science

Arts

Sec.14.1

Page 36

36

k Nearest Neighbor (kNN) classifier

1NN: subject to errors due to

A single atypical example.

Noise (i.e., an error) in the category label of a single training example.

More robust alternative:

find the k most-similar examples

return the majority category of these k examples.
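The two steps above can be sketched as follows; the toy 2-D vectors are illustrative, and similarity here is a plain dot product (which matches cosine similarity only when vectors are length-normalized, as these toy ones approximately are):

```python
# kNN sketch: rank training docs by similarity to the test doc,
# then take the majority label among the top k.
from collections import Counter

def knn_classify(train, test_vec, k=3):
    """train: list of (vector, label) pairs."""
    ranked = sorted(
        train,
        key=lambda vl: sum(a * b for a, b in zip(vl[0], test_vec)),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train = [([1.0, 0.0], "gov"), ([0.9, 0.1], "gov"),
         ([0.0, 1.0], "arts"), ([0.1, 0.9], "arts")]
print(knn_classify(train, [0.8, 0.2], k=3))  # -> gov
```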

Sec.14.3

Page 37

38

kNN example: k=6

Government

Science

Arts

P(science| )?

Sec.14.3

Page 38

39

kNN decision boundaries

Government

Science

Arts

Boundaries are in principle arbitrary surfaces (polyhedral)

kNN gives locally defined decision boundaries between classes – far away points do not influence each classification decision (unlike Rocchio, etc.)

Sec.14.3

Page 39

1NN: Voronoi tessellation

40

The decision boundaries between classes are piecewise linear.

Page 40

kNN algorithm

41

Page 41

Time complexity of kNN

42

kNN test time is proportional to the size of the training set!

kNN is inefficient for very large training sets.

Page 42

43

Similarity metrics

Nearest neighbor method depends on a similarity (ordistance) metric.

Euclidean distance: Simplest for continuous vector space.

Hamming distance: Simplest for binary instance space.

number of feature values that differ

For text, cosine similarity of tf.idf weighted vectors istypically most effective.

Sec.14.3

Page 43

44

Illustration of kNN (k=3) for text vector space

Sec.14.3

Page 44

45

3-NN vs. Rocchio

Nearest Neighbor tends to handle polymorphic categories better than Rocchio/NB.

Page 45

46

Nearest neighbor with inverted index

Naively, finding the nearest neighbors requires a linear search through the |D| docs in the collection

Similar to determining the k best retrievals using the test doc as a query to a database of training docs.

Use standard vector space inverted index methods to find the k nearest neighbors.

Testing time: O(B|V_t|), where B is the average number of training docs in which at least one word of the test document appears

Typically B << |D| if a large list of stopwords is used.

Sec.14.3
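A sketch of the idea, assuming postings store (doc_id, weight) pairs: only training docs sharing at least one term with the test doc accumulate a score, so docs with no term overlap are never touched. All names and weights are illustrative:

```python
# Inverted-index scoring sketch: term -> postings list of
# (doc_id, tf-idf weight); candidates = docs sharing a query term.
from collections import defaultdict

def build_index(docs):
    """docs: dict doc_id -> dict term -> weight."""
    index = defaultdict(list)
    for doc_id, vec in docs.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index

def knn_candidates(index, query_vec):
    """Accumulate dot-product scores over postings of query terms only."""
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for doc_id, weight in index.get(term, []):
            scores[doc_id] += qw * weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {"d1": {"rate": 0.7, "interest": 0.7},
        "d2": {"art": 1.0},
        "d3": {"rate": 1.0}}
index = build_index(docs)
print(knn_candidates(index, {"rate": 1.0}))  # only d1 and d3 are scored
```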

Page 46

A nonlinear problem

Linear classifiers do badly on this task

kNN will do very well (assuming enough training data)

47

Sec.14.4

Page 47

Overfitting example

48

Page 48

49

kNN: summary

No training phase necessary

Actually: we always preprocess the training set, so in reality the training time of kNN is linear.

May be expensive at test time

kNN is very accurate if the training set is large. In most cases it’s more accurate than linear classifiers

Optimality result: asymptotically zero error if the Bayes rate is zero.

But kNN can be very inaccurate if the training set is small.

Scales well with a large number of classes

Don’t need to train C classifiers for C classes

Classes can influence each other

Small changes to one class can have a ripple effect

Sec.14.3

Page 49

50

Choosing the correct model capacity

Sec.14.6

Page 50

Linear classifiers for doc classification

51

We typically encounter high-dimensional spaces in text applications.

With increased dimensionality, the likelihood of linear separability increases rapidly

Many of the best-known text classification algorithms are linear.

More powerful nonlinear learning methods are more sensitive to noise in the training data.

Nonlinear learning methods sometimes perform better if the training set is large, but by no means in all cases.

Page 51

Which classifier do I use for a given text classification problem?

Is there a learning method that is optimal for all text classification problems?

No, because there is a tradeoff between the complexity of the classifier and its performance on new data points.

Factors to take into account:

How much training data is available?

How simple/complex is the problem?

How noisy is the data?

How stable is the problem over time?

For an unstable problem, it’s better to use a simple and robust classifier.

52

Page 52

Reuters collection

53

Only about 10 out of 118 categories are large

Common categories (#train, #test):

• Earn (2877, 1087)
• Acquisitions (1650, 179)
• Money-fx (538, 179)
• Grain (433, 149)
• Crude (389, 189)
• Trade (369, 119)
• Interest (347, 131)
• Ship (197, 89)
• Wheat (212, 71)
• Corn (182, 56)

Page 53

Evaluating classification

54

Evaluation must be done on test data that are independent of the training data

training and test sets are disjoint.

Measures: precision, recall, F1, accuracy

F1 allows us to trade off precision against recall (harmonic mean of P and R).

Page 54

Precision P and recall R

55

Precision P = tp/(tp + fp)

Recall R = tp/(tp + fn)

                                  actually in the class   actually not in the class
predicted to be in the class              tp                        fp
predicted not to be in the class          fn                        tn
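From the contingency counts, P, R, and F1 (the harmonic mean of P and R) follow directly; the counts below are made up for illustration:

```python
# Precision, recall, and F1 from the contingency-table counts.
def prf1(tp, fp, fn):
    p = tp / (tp + fp)          # of the docs we predicted, how many were right
    r = tp / (tp + fn)          # of the docs in the class, how many we found
    f1 = 2 * p * r / (p + r)    # harmonic mean of P and R
    return p, r, f1

p, r, f1 = prf1(tp=8, fp=2, fn=8)   # made-up counts
print(p, r, round(f1, 3))           # 0.8 0.5 0.615
```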

Page 55

56

Good practice department:

Make a confusion matrix

The (i, j) entry c_ij gives the number of docs actually in class i that the classifier put in class j (e.g., the highlighted entry 53). Rows: actual class; columns: class assigned by the classifier.

In a perfect classification, only the diagonal has non-zero entries

Look at common confusions and how they might be addressed

Sec. 15.2.4

Page 56

57

Per class evaluation measures

Recall: fraction of docs in class i classified correctly:

R_i = c_ii / ∑_j c_ij

Precision: fraction of docs assigned class i that are actually about class i:

P_i = c_ii / ∑_j c_ji

Accuracy (1 − error rate): fraction of docs classified correctly:

∑_i c_ii / ∑_i ∑_j c_ij

Sec. 15.2.4
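A small sketch of these three measures on a made-up 3×3 confusion matrix c[i][j] (row = actual class, column = assigned class):

```python
# Per-class recall/precision and overall accuracy from a confusion
# matrix; the counts are invented for illustration.
c = [[53,  5,  2],
     [ 4, 60,  6],
     [ 1,  3, 66]]

def recall(c, i):
    """c_ii over row sum: fraction of class-i docs classified as i."""
    return c[i][i] / sum(c[i])

def precision(c, i):
    """c_ii over column sum: fraction of docs assigned i that are i."""
    return c[i][i] / sum(row[i] for row in c)

def accuracy(c):
    """Diagonal mass over total mass."""
    return sum(c[i][i] for i in range(len(c))) / sum(map(sum, c))

print(round(recall(c, 0), 3), round(precision(c, 0), 3), round(accuracy(c), 3))
```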

Page 57

Averaging: macro vs. micro

58

We now have an evaluation measure (F1) for one class.

But we also want a single number that shows aggregate performance over all classes

Page 58

59

Micro- vs. Macro-Averaging

If we have more than one class, how do we combine multiple performance measures into one quantity?

Macroaveraging: compute performance for each class, then average.

Compute F1 for each of the C classes

Average these C numbers

Microaveraging: collect decisions for all classes, aggregate them, and then compute the measure.

Compute TP, FP, FN for each of the C classes

Sum these C numbers (e.g., all TP to get aggregate TP)

Compute F1 for aggregate TP, FP, FN

Sec. 15.2.4

Page 59

60

Micro- vs. Macro-Averaging: Example

Class 1:
                  Truth: yes   Truth: no
Classifier: yes       10           10
Classifier: no        10          970

Class 2:
                  Truth: yes   Truth: no
Classifier: yes       90           10
Classifier: no        10          890

Micro-averaged (pooled) table:
                  Truth: yes   Truth: no
Classifier: yes      100           20
Classifier: no        20         1860

Macroaveraged precision: (0.5 + 0.9)/2 = 0.7

Microaveraged precision: 100/120 ≈ 0.83

The microaveraged score is dominated by the score on common classes
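The macro and micro numbers above can be reproduced from the per-class (tp, fp) counts in the two class tables:

```python
# Macro- vs micro-averaged precision, using the slide's counts.
counts = [(10, 10), (90, 10)]          # (tp, fp) for class 1 and class 2

# Macro: average the per-class precisions.
macro = sum(tp / (tp + fp) for tp, fp in counts) / len(counts)

# Micro: pool the counts, then compute one precision.
tp_sum = sum(tp for tp, _ in counts)   # 100, as in the pooled table
fp_sum = sum(fp for _, fp in counts)   # 20
micro = tp_sum / (tp_sum + fp_sum)

print(round(macro, 2))  # 0.7
print(round(micro, 2))  # 0.83
```

The gap between the two numbers is exactly the "dominated by common classes" effect: class 2 contributes 100 of the 120 pooled predictions.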

Sec. 15.2.4

Page 60

61

Evaluation measure: F1

Page 61

62

Amount of data?

Little amount of data: stick to less powerful classifiers (i.e., linear ones)

Naïve Bayes should do well in such circumstances (Ng and Jordan, NIPS 2002)

The practical answer is to get more labeled data as soon as you can

Reasonable amount of data: we can use all our clever classifiers

Huge amount of data: expensive methods like SVMs (train time) or kNN (test time) are quite impractical

Naïve Bayes can come back into its own again!

Or other advanced methods with linear training/test complexity

With enough data the choice of classifier may not matter much, and the best choice may be unclear

Sec. 15.3.1

Page 62

63

How many categories?

A few (well separated ones)?

Easy!

A zillion closely related ones?

Think: Yahoo! Directory

Quickly gets difficult!

May need a hybrid automatic/manual solution

Sec. 15.3.2

Page 63

Yahoo! Hierarchy (www.yahoo.com/Science)

[Figure: a fragment of the category tree, with roughly 30 top-level branches including agriculture, biology, physics, CS, and space; e.g., agriculture → dairy, crops, agronomy, forestry; biology → botany, cell, evolution; physics → magnetism, relativity; CS → AI, HCI, courses; space → craft, missions]

64

Page 64

65

How can one tweak performance?

Aim to exploit any domain-specific useful features that give special meanings or that zone the data

Aim to collapse things that would be treated as different but shouldn’t be.

E.g., ISBNs, part numbers, chemical formulas

Does putting in “hacks” help?

You bet!

Feature design and non-linear weighting are very important in the performance of real-world systems

Sec. 15.3.2

Page 65

66

Upweighting

You can get a lot of value by differentially weighting contributions from different document zones.

That is, you count a word as two instances when you see it in, say, the abstract

Upweighting title words helps (Cohen & Singer, 1996)

Doubling the weight on title words is a good rule of thumb

Upweighting the first sentence of each paragraph helps (Murata, 1999)

Upweighting sentences that contain title words helps (Ko et al., 2002)
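A sketch of the "double the title words" rule of thumb, assuming raw term counts as the representation (the function name and weight value are illustrative):

```python
# Zone upweighting sketch: a title occurrence contributes extra counts
# on top of any body occurrences of the same term.
from collections import Counter

def zone_weighted_tf(title, body, title_weight=2):
    """Raw term counts, with title occurrences counted title_weight times."""
    tf = Counter(body.split())
    for term in title.split():
        tf[term] += title_weight
    return tf

tf = zone_weighted_tf("interest rates", "rates fell as interest waned")
print(tf["rates"], tf["fell"])  # 3 1
```

The same trick generalizes to any zone (abstract, first sentence of a paragraph) by adding further weighted passes.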

Sec. 15.3.2

Page 66

67

Two techniques for zones

1. Have a completely separate set of features/parameters for different zones like the title

2. Use the same features (pooling/tying their parameters) across zones, but upweight the contribution of different zones

Commonly the second method is more successful:

it costs you nothing in terms of sparsifying the data, but can give a very useful performance boost

Which is best is a contingent fact about the data

Sec. 15.3.2

Page 67

68

Does stemming/lowercasing/… help?

As always, it’s hard to tell, and empirical evaluation isnormally the gold standard.

But note that the role of tools like stemming is ratherdifferent for TextCat vs. IR:

For IR, you want to improve recall

For TextCat, with sufficient training data, stemming does nogood.

It only helps in compensating for data sparseness

Sec. 15.3.2

Page 68

69

Resources

IIR, Chapter 14 (recommended)

Ch. 14