Date post: 01-Jan-2016
Upload: morris-fitzgerald
Classification Lecture 12
Page 1: Classification Lecture 12. Topics Classification Frame Terminology and measures Using Classifications –In system use –In system development Creating Classifications.

Classification

Lecture 12

Page 2

Topics

• Classification Frame

• Terminology and measures

• Using Classifications
  – In system use
  – In system development

• Creating Classifications
  – Card sorting

Page 3

Classification Frame

• Classification separates candidates into two or more classes
  – e.g. classifying students by grade of degree

• We will look at the simple case of two classes first:
  – filtering email: good or spam
  – retrieving documents: relevant or irrelevant
  – classifying credit card transactions: valid or fraudulent
  – detecting spelling mistakes: OK or mistake (red line)
  – medical testing: normal or abnormal
  – system requirements: ambiguous or not ambiguous

• METAPHOR : SYSTEM IS A SIEVE

Page 4

Classification Errors (Information Retrieval)

                Relevant                             Irrelevant
Retrieved       true positive (TP)                   false positive (FP, Type I error)
Not retrieved   false negative (FN, Type II error)   true negative (TN)

Precision = TP / (TP + FP) = TP / Retrieved

Recall = TP / (TP + FN) = TP / Relevant

Efficiency = (TP + TN) / (TP + TN + FP + FN) = (TP + TN) / Full Collection
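The three measures translate directly into code. The following is a minimal sketch, not part of the lecture: the function names and the counts are assumptions chosen for illustration, and exact fractions are used so the results read the same way as the formulas.

```python
from fractions import Fraction


def precision(tp: int, fp: int) -> Fraction:
    """TP / (TP + FP): the share of retrieved items that are relevant."""
    return Fraction(tp, tp + fp)


def recall(tp: int, fn: int) -> Fraction:
    """TP / (TP + FN): the share of relevant items that were retrieved."""
    return Fraction(tp, tp + fn)


def efficiency(tp: int, tn: int, fp: int, fn: int) -> Fraction:
    """(TP + TN) / full collection: the share of all decisions that are correct."""
    return Fraction(tp + tn, tp + tn + fp + fn)


# Hypothetical counts, chosen only for illustration.
tp, fp, fn, tn = 40, 10, 20, 30
print(precision(tp, fp))           # 4/5
print(recall(tp, fn))              # 2/3
print(efficiency(tp, tn, fp, fn))  # 7/10
```

Each function raises ZeroDivisionError when its denominator is zero (for example, precision when nothing was retrieved), so a caller has to decide how to report that case.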

Page 5

Example Calculation: email filtering

          Good Email   Spam
accept        3          5
reject
total         7         11

• Precision = TP / (TP + FP) =
• Recall = TP / (TP + FN) =
• Efficiency = (TP + TN) / (TP + TN + FP + FN) =

Page 6

Example Calculation: email filtering

          Good Email   Spam
accept     3 (TP)      5 (FP)
reject     4 (FN)      6 (TN)
total      7           11

• Precision = TP / (TP + FP) = 3/8
• Recall = TP / (TP + FN) = 3/7
• Efficiency = (TP + TN) / (TP + TN + FP + FN) = 9/18 = 50%
• Recall > Precision => not quite balanced
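The counts TP = 3, FP = 5, FN = 4, TN = 6 can also be derived from individual decisions rather than read off a table. The sketch below is an illustration only. It assumes that "accept" is the positive (retrieved) decision and "good" the relevant class, which matches the later trade-off slide where rejecting good email is a false negative. The list of emails and the helper name `confusion_counts` are hypothetical.

```python
from collections import Counter

# Each pair is (actual class, filter decision); 18 emails in total.
emails = (
    [("good", "accept")] * 3    # good email accepted
    + [("spam", "accept")] * 5  # spam accepted
    + [("good", "reject")] * 4  # good email rejected
    + [("spam", "reject")] * 6  # spam rejected
)


def confusion_counts(pairs, relevant_class="good", retrieved_decision="accept"):
    """Tally TP, FP, FN and TN from (actual, decision) pairs."""
    counts = Counter()
    for actual, decision in pairs:
        relevant = actual == relevant_class
        retrieved = decision == retrieved_decision
        if retrieved:
            counts["TP" if relevant else "FP"] += 1
        else:
            counts["FN" if relevant else "TN"] += 1
    return counts


counts = confusion_counts(emails)
precision = counts["TP"] / (counts["TP"] + counts["FP"])   # 3/8
recall = counts["TP"] / (counts["TP"] + counts["FN"])      # 3/7
efficiency = (counts["TP"] + counts["TN"]) / len(emails)   # 9/18
print(counts, precision, recall, efficiency)
```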

Page 7

Trade-off

• The two errors are usually in conflict
  – we can decrease the risk of a False Positive (reject more spam)
  – but we increase the risk of False Negatives (rejecting good email)

• a TRADE-OFF
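One way to see the trade-off is to move a single rejection threshold and watch the two error counts move in opposite directions. This is a sketch under stated assumptions: the spam scores come from a hypothetical scorer, the data are invented, and an email is rejected when its score reaches the threshold.

```python
# (spam score from a hypothetical scorer, actual class)
scored = [
    (0.05, "good"), (0.20, "good"), (0.35, "good"), (0.55, "good"),
    (0.30, "spam"), (0.60, "spam"), (0.75, "spam"), (0.90, "spam"),
]


def error_counts(threshold):
    """Reject every email whose score is >= threshold; return (FP, FN)."""
    # False positive: spam that slips through and is accepted.
    fp = sum(1 for score, actual in scored if actual == "spam" and score < threshold)
    # False negative: good email that is rejected.
    fn = sum(1 for score, actual in scored if actual == "good" and score >= threshold)
    return fp, fn


# Lowering the threshold rejects more email.
for threshold in (0.8, 0.5, 0.25):
    fp, fn = error_counts(threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

With these invented scores, false positives fall from 3 to 1 to 0 as the threshold drops, while false negatives rise from 0 to 1 to 2.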

Page 8

Classification Errors

        Good student   Poor student
Pass
Fail

• Write in the terms: relevant, retrieved, true positive, false positive, etc.

Page 9

Improved Precision

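• Precision = TP / (TP + FP) = TP / Retrieved
• Recall = TP / (TP + FN) = TP / Relevant

[Figure: the "relevant" and "retrieved" sets drawn as overlapping regions. TP (True Positives) is the overlap, FN (False Negatives) is relevant but not retrieved, FP (False Positives) is retrieved but not relevant, and TN (True Negatives) lies outside both.]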

Page 10

Precision and Recall

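• Precision = TP / (TP + FP) = TP / Retrieved
• Recall = TP / (TP + FN) = TP / Relevant
• Efficiency = (TP + TN) / (TP + TN + FP + FN) = (TP + TN) / Full Collection

[Figure: the "relevant" and "retrieved" sets drawn as overlapping regions inside the full collection. TP (True Positives) is the overlap, FN (False Negatives) is relevant but not retrieved, FP (False Positives) is retrieved but not relevant, and TN (True Negatives) is the rest of the full collection.]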
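The set picture maps directly onto set operations. The sketch below is illustrative only: the document ids and the contents of the two sets are invented.

```python
# Hypothetical collection of 20 document ids.
collection = set(range(1, 21))
relevant = {1, 2, 3, 4, 5, 6}
retrieved = {4, 5, 6, 7, 8}

tp = relevant & retrieved                  # relevant and retrieved
fp = retrieved - relevant                  # retrieved but not relevant
fn = relevant - retrieved                  # relevant but not retrieved
tn = collection - (relevant | retrieved)   # neither relevant nor retrieved

precision = len(tp) / len(retrieved)                 # 3 / 5
recall = len(tp) / len(relevant)                     # 3 / 6
efficiency = (len(tp) + len(tn)) / len(collection)   # (3 + 12) / 20

print(precision, recall, efficiency)
```

Growing the retrieved set can only keep or raise recall, while precision depends on how many of the extra items are relevant.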

Page 11

Improved Recall

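• Precision = TP / (TP + FP) = TP / Retrieved
• Recall = TP / (TP + FN) = TP / Relevant

[Figure: the "relevant" and "retrieved" sets drawn as overlapping regions. TP (True Positives) is the overlap, FN (False Negatives) is relevant but not retrieved, FP (False Positives) is retrieved but not relevant, and TN (True Negatives) lies outside both.]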

Page 12

Exercise: Precision and Recall in Assessment

• Precision means ……

• Recall means ….

• Ideal values (as %)
  – Precision =
  – Recall =
  – Efficiency =

• Estimated values
  – Precision =
  – Recall =
  – Efficiency =

Page 13

Classification in the News

• Criminal justice as a classifier
  – Murder, manslaughter or innocent
• What counts as ‘torture’?
• Prisoners of war: the US invents a new category for the Guantanamo Bay prisoners
• Blood groups:
  – A, B, AB, O
  – Rh+, Rh-
• Classification of cloud types (Cumulus, Cirrus…) by Luke Howard, 1802
• Hip evaluation to determine priority for replacement
• Programme classification: where does ‘Information Systems’ go?

Page 14

Categories are Information Structures

• Many systems require the user to classify things in the real world into categories in order to process them:
  – Files and documents into a hierarchical directory structure
  – Subject matter in a dissertation into sections
  – Facilities in the University (helpdesk, reception…)
  – Skills in a Placements system
  – Budget headings, Nominal Ledger headings

• In the computer system, categories can be clearly distinguished:
  – Codes for each category

• In the real world:
  – categories don’t exist (the fallacy of misplaced concreteness)
  – multiple taxonomies are valid: the same things are classified in different ways for different purposes

• Users typically have the task of
  – mapping the real, complex things into the appropriate categories
  – interpreting categorical information

• Implications
  – Users face a ‘matching’ problem: which category does the item fit best?
  – IS designers have to devise support for these tasks as well.
  – Users will not be consistent in their classification (e.g. IS books in the Library)

Page 15

Categories in IS theory

• Much of IS theory is based on a taxonomy:
  – Problem / solution
  – Method / methodology / technique…
  – ER model
  – Data Flow Diagram
  – Soft Systems Analysis: CATWOE
  – Logical / Physical
  – SWOT analysis
    • Strengths / Weaknesses / Opportunities / Threats
  – Objective, Goal, Requirement, Constraint

Page 16

Classification and Systems Design

• Steps in Classification
  – defining the domain (what kinds of things are to be classified)
  – creating the taxonomy (the set of categories), its purpose and force
  – defining the representation of individuals
  – defining the mapping between individuals and categories
  – coding the categories
  – creating automatic classifiers
  – assisting human classifiers
  – assisting users to interpret categorical information
  – evaluating classification performance
  – supporting evolution of taxonomy and classifiers

“An early step towards understanding any set of Phenomena is to learn what kinds of things there are in the set – to develop a taxonomy”

Herbert Simon

Page 17

A Poor Classification?

• The Argentinean writer Jorge Luis Borges (‘Imaginary Beasts’, ‘Labyrinths’…) quotes a ‘certain Chinese encyclopaedia’ in which animals are divided into:

A) belonging to the Emperor
B) embalmed
C) tame
D) suckling pigs
E) sirens
F) fabulous
G) stray dogs
H) included in the present classification
I) frenzied
J) innumerable
K) drawn with a very fine camel hair brush
L) et cetera
M) having just broken the water pitcher
N) that from a long way off look like flies

Page 18

[Figure: a Classifier (Machine or Human) assigning objects to the Categories/Classes A, B and C of a Taxonomy]

Page 19

[Figure: a Classifier (Machine or Human) assigning objects to the Categories/Classes A, B and C of a Taxonomy]

Categories not Mutually Exclusive: an object can be put in any of several categories

Page 20

[Figure: a Classifier (Machine or Human) assigning objects to the Categories/Classes A, B and C of a Taxonomy]

Categories not Complete: some objects don’t belong anywhere

Page 21

[Figure: a Classifier (Machine or Human) assigning objects to the Categories/Classes A, B and C of a Taxonomy]

Categories not Balanced: some categories much larger than others

Page 22

[Figure: a Classifier (Machine or Human) assigning objects to the Categories/Classes A, B and C of a Taxonomy]

Categories Inconsistent: categories lack a single organising principle

Page 23

Characteristics of a good Taxonomy

• Categories must be:
  – Mutually exclusive
    • Every object in at most one category
  – Complete (exhaustive)
    • Every object in at least one category
  – Balanced
    • Categories divide objects evenly
  – Consistent
    • Same characteristics used throughout
  – Hierarchical integrity
    • Categories at one level not confused with categories at another level
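The first three characteristics can be checked mechanically once each category is written down as a set of members. The following sketch is an illustration, not a method from the lecture: the function `check_taxonomy`, the objects and the categories are all hypothetical. Consistency and hierarchical integrity depend on what the categories mean, so they are not checked here.

```python
def check_taxonomy(objects, categories):
    """Report overlap, gaps and category sizes.

    objects:    iterable of the things to be classified
    categories: dict mapping category name -> set of member objects
    """
    membership = {
        obj: [name for name, members in categories.items() if obj in members]
        for obj in objects
    }
    overlapping = sorted(obj for obj, names in membership.items() if len(names) > 1)
    unplaced = sorted(obj for obj, names in membership.items() if not names)
    return {
        "mutually_exclusive": not overlapping,  # every object in at most one category
        "complete": not unplaced,               # every object in at least one category
        "overlapping": overlapping,
        "unplaced": unplaced,
        "sizes": {name: len(members) for name, members in categories.items()},
    }


# Hypothetical taxonomy with one overlap ("cat") and one gap ("trout").
objects = ["cat", "dog", "emu", "robin", "trout"]
categories = {
    "mammal": {"cat", "dog"},
    "bird": {"emu", "robin"},
    "pet": {"cat"},
}
report = check_taxonomy(objects, categories)
print(report)
```

The "sizes" entry gives a rough view of balance; how uneven is too uneven is a judgement the sketch leaves to the designer.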

Page 24

Kinds of classification

• Classical
  – Classes defined by presence of features
    • Square: 4 sides, equal length, equal angles
    • Equilateral triangle: 3 sides, equal length, equal angles
• Probabilistic
  – Classes defined by weighted sum of features
    • ‘bird’: moves, winged, feathered, sings, lays eggs
    • Is a robin a bird? Is an emu a bird?
• Exemplar (prototype)
  – Classes defined by one or more key examples
    • Robin is a central example of ‘bird’
    • Chicken is a more remote example
• Which kind is used in IS Theory?
• Which kind is used in IS Use?
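The three kinds of classification can be contrasted on the ‘bird’ example. This is a minimal sketch under several assumptions: the feature weights, the 0.5 threshold, the extra features given to the robin and the emu, and the use of set overlap (Jaccard similarity) as the measure of closeness to the exemplar are all invented for illustration.

```python
BIRD_FEATURES = {"moves", "winged", "feathered", "sings", "lays eggs"}

# Hypothetical weights for the probabilistic view.
WEIGHTS = {"moves": 0.1, "winged": 0.3, "feathered": 0.4, "sings": 0.1, "lays eggs": 0.1}

# The robin as the key example for the exemplar view.
ROBIN = {"moves", "winged", "feathered", "sings", "lays eggs", "flies", "small"}


def classical_is_bird(features):
    """Classical: every defining feature must be present."""
    return BIRD_FEATURES <= features


def probabilistic_is_bird(features, threshold=0.5):
    """Probabilistic: the weighted sum of present features must reach a threshold."""
    return sum(w for f, w in WEIGHTS.items() if f in features) >= threshold


def similarity_to_robin(features):
    """Exemplar: closeness to the key example, as shared features / all features."""
    return len(features & ROBIN) / len(features | ROBIN)


emu = {"moves", "winged", "feathered", "lays eggs", "large"}

print(classical_is_bird(ROBIN), classical_is_bird(emu))
print(probabilistic_is_bird(emu))
print(similarity_to_robin(ROBIN), similarity_to_robin(emu))
```

With these assumptions the emu fails the classical test because it does not sing, passes the probabilistic test, and sits halfway from the robin in the exemplar view.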

Page 25

Automated Clustering

• Clustering techniques find groups of similar objects

• Used in data mining to identify customer groups with similar buying behaviour…

• Mathematical Techniques
  – k-nearest neighbour
  – ID3 to create decision tree

• Human Techniques
  – Card sorting
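ID3 builds its decision tree by repeatedly choosing the attribute with the highest information gain. The sketch below shows only that selection step, not a full tree builder. The email attributes `has_link` and `known_sender` and the six rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical hand-classified emails.
rows = [
    {"has_link": "yes", "known_sender": "no", "label": "spam"},
    {"has_link": "yes", "known_sender": "no", "label": "spam"},
    {"has_link": "yes", "known_sender": "yes", "label": "good"},
    {"has_link": "no", "known_sender": "yes", "label": "good"},
    {"has_link": "no", "known_sender": "yes", "label": "good"},
    {"has_link": "no", "known_sender": "no", "label": "spam"},
]

attributes = ("has_link", "known_sender")
best = max(attributes, key=lambda a: information_gain(rows, a, "label"))
print(best)
```

In this invented data `known_sender` separates the two classes perfectly, so it is the attribute ID3 would place at the root.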

Page 26

Classifying

• Learning Classifiers
  – Based on sample of population
  – Classified by hand
  – Split into two parts
    • The training set used to compute the classifier
    • The test set used to test the ability of the classifier
  – Many kinds of classifiers available, all need good understanding of statistics, e.g. Naïve Bayesian, Decision Tree, SVM
  – Threshold set to balance recall and precision
• Rule and example based for human classifier, but performance varies with experience and skill
  – E.g. book classification, Yahoo directory classification, medical diagnosis
  – Human classifiers need to be trained too
  – If classification is done by end-users, classification is likely to be inconsistent
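The training-set and test-set procedure can be sketched end to end with a deliberately simple classifier. The example below uses a nearest-class-mean rule as a stand-in; it is not one of the classifiers named above (Naïve Bayesian, Decision Tree, SVM). The single feature (a count of suspicious words), the sample and the fixed split are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical hand-classified sample: (suspicious word count, class).
sample = [
    (0, "good"), (1, "good"), (7, "spam"), (9, "spam"),
    (2, "good"), (8, "spam"), (1, "good"), (6, "spam"),
    (3, "good"), (4, "spam"), (5, "good"), (10, "spam"),
]

# Split into two parts; a fixed split keeps the example reproducible.
training_set, test_set = sample[:8], sample[8:]


def train(pairs):
    """Compute the classifier: the mean feature value of each class."""
    by_class = {}
    for value, label in pairs:
        by_class.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_class.items()}


def classify(model, value):
    """Assign the class whose mean is closest to the value."""
    return min(model, key=lambda label: abs(value - model[label]))


model = train(training_set)

# Evaluate on the test set, treating "good" as the relevant class.
tp = sum(1 for v, c in test_set if c == "good" and classify(model, v) == "good")
fp = sum(1 for v, c in test_set if c == "spam" and classify(model, v) == "good")
fn = sum(1 for v, c in test_set if c == "good" and classify(model, v) == "spam")

print(model, tp / (tp + fp), tp / (tp + fn))
```

Testing on examples the classifier has not seen is what makes the measured precision and recall an estimate of its ability rather than a description of the training data.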

Page 27

Review

• 3 tier web architecture: describe, explain, terminology, typical interactions
• SQL & PHP
• Extended ER models
• Interaction in human and computer systems: sequence diagrams, stateful interaction
• Alternative Development Processes: Agile Development and Extreme Programming (description, application, comparison with SSADM, choice of appropriate development model)
• Frames: rationale, role in IS development, basic recognition in a problem description of simple frames and the following in detail
• Matching Frame: typical applications, fitness function, recognising nominal, ordinal, interval and ratio scales, use of weights
• Classification Frame: typical applications, terminology, calculation of recall and precision, guidelines for constructing a taxonomy

Page 28

Preview

• XML and XSLT

• Business Processes and BPML

• Scenarios and Use cases

• Learning Frame
