Data Complexity Analysis for Classifier Combination

Tin Kam Ho, Bell Laboratories, Lucent Technologies

Transcript
Page 1: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Tin Kam Ho, Bell Laboratories, Lucent Technologies

Page 2: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Outline

• Classifier Combination
  – Motivation, methods, difficulties

• Data Complexity Analysis
  – Motivation, methods, early results

Page 3: Tin Kam Ho                 Bell Laboratories      Lucent Technologies
Page 4: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Supervised Classification: Discrimination Problems

Page 5: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

An Ill-Posed Problem

Page 6: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Where Were We in the Late 1990’s?

• Statistical Methods
  – Bayesian classifiers, polynomial discriminators, nearest neighbors, decision trees, neural networks, support vector machines, …

• Syntactic Methods
  – regular grammars, context-free grammars, attributed grammars, stochastic grammars, …

• Structural Methods
  – graph matching, elastic matching, rule-based systems, …

Page 7: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Classifiers

• Competition among different …
  – choices of features
  – feature representations
  – classifier designs

• Chosen by heuristic judgements

• No clear winners

Page 8: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Classifier Combination Methods

• Decision optimization methods
  – find consensus from a given set of classifiers
  – majority/plurality vote, sum/product rule
  – probability models, Bayesian approaches
  – logistic regression on ranks or scores
  – classifiers trained on confidence scores
  (a minimal voting/sum-rule sketch follows below)
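To make the simplest of these rules concrete, the sketch below combines already-trained classifiers by plurality vote on predicted labels and by the sum rule on class scores. It is a minimal illustration assuming integer class labels 0..C-1 and NumPy arrays of predictions and scores; none of the names come from the talk.

```python
import numpy as np

def plurality_vote(label_votes, n_classes):
    # label_votes: (n_classifiers, n_samples) array of predicted class indices.
    counts = np.zeros((n_classes, label_votes.shape[1]), dtype=int)
    for preds in label_votes:                       # tally one classifier at a time
        counts[preds, np.arange(len(preds))] += 1
    return counts.argmax(axis=0)                    # most-voted class per sample

def sum_rule(score_stack):
    # score_stack: (n_classifiers, n_samples, n_classes) array of class scores
    # (e.g., posterior estimates); the sum/average rule picks the class with
    # the largest total score.
    return score_stack.sum(axis=0).argmax(axis=1)

votes = np.array([[0, 1, 1],
                  [0, 1, 0],
                  [1, 1, 0]])                       # 3 classifiers, 3 samples
print(plurality_vote(votes, n_classes=2))           # -> [0 1 0]
```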

Page 9: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Classifier Combination Methods

• Coverage optimization methods
  – subsampling methods: stacking, bagging, boosting
  – subspace methods: random subspace projection, localized selection
  – superclass/subclass methods: mixture of experts, error-correcting output codes
  – perturbation in training
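As one concrete instance of coverage optimization, here is a minimal random subspace forest: each tree is trained on a random subset of the features and the trees vote; swapping the feature draw for a bootstrap draw of the rows gives bagging. The 50-tree, half-the-features settings and the use of scikit-learn's DecisionTreeClassifier are illustrative assumptions, not the configurations studied in the talk.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class RandomSubspaceForest:
    """Each member tree sees only a random subset of features (simplified
    random subspace method); predictions are combined by plurality vote."""

    def __init__(self, n_trees=50, subspace_frac=0.5, seed=0):
        self.n_trees, self.subspace_frac = n_trees, subspace_frac
        self.rng = np.random.default_rng(seed)
        self.members = []                     # list of (feature indices, tree)

    def fit(self, X, y):
        d = X.shape[1]
        k = max(1, int(self.subspace_frac * d))
        for _ in range(self.n_trees):
            feats = self.rng.choice(d, size=k, replace=False)
            tree = DecisionTreeClassifier().fit(X[:, feats], y)
            self.members.append((feats, tree))
        return self

    def predict(self, X):
        votes = np.stack([tree.predict(X[:, feats])
                          for feats, tree in self.members]).astype(int)
        # plurality vote over the component trees (assumes labels 0..C-1)
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```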

Page 10: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Layers of Choices

Best Features?

Best Classifier?

Best Combination Method?

Best (combination of)* combination methods?

Page 11: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Before We Start …

• Single or multiple classifiers?
  – accuracy vs efficiency

• Feature or decision combination?
  – compatible units, common metric

• Sequential or parallel classification?
  – efficiency, classifier specialties

Page 12: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Difficulties in Decision Optimization

• Reliability versus overall accuracy

• Fixed or trainable combination function

• Simple models or combinatorial estimates

• How to model complementary behavior

Page 13: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Difficulties in Coverage Optimization

• What kind of differences to introduce:
  – Subsamples? Subspaces? Subclasses?
  – Training parameters?
  – Model geometry?

• 3-way tradeoff:
  – discrimination + uniformity + generalization

• Effects of the form of component classifiers

Page 14: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Dilemmas and Paradoxes

• Weaken individuals for a stronger whole?

• Sacrifice known samples for unseen cases?

• Seek agreements or differences?

Page 15: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Model of Complementary Decisions

• Statistical independence of decisions: assumed or observed?

• Collective vs point-wise error estimates

• Related estimates of neighboring samples

Page 16: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Stochastic Discrimination

• Set-theoretic abstraction

• Probabilities in model or feature spaces

• Enrichment / Uniformity / Projectability

• Convergence by law of large numbers

Page 17: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Stochastic Discrimination

• Algorithm for uniformity enforcement

• Fewer, but more sophisticated classifiers

• Other ways to address the 3-way trade-off
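A toy version of the two-class idea, under strong simplifying assumptions: random axis-aligned boxes play the role of weak models, only boxes enriched for class 1 are kept, and a normalized membership indicator is averaged so that its expectation is 1 on class-1 points and 0 on class-2 points, converging by the law of large numbers. This sketch omits the uniformity-enforcement step mentioned above, so it is a caricature of stochastic discrimination, not Ho's full algorithm.

```python
import numpy as np

def sd_toy_predict(X, y, X_test, n_models=2000, beta=0.1, seed=0):
    """Toy stochastic discrimination for two classes labeled 0 and 1."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    X0, X1 = X[y == 0], X[y == 1]
    scores, kept, attempts = np.zeros(len(X_test)), 0, 0
    while kept < n_models and attempts < 100 * n_models:
        attempts += 1
        a, b = rng.uniform(lo, hi), rng.uniform(lo, hi)
        low, high = np.minimum(a, b), np.maximum(a, b)   # one random box = one weak model
        covers = lambda Z: np.all((Z >= low) & (Z <= high), axis=1)
        p1, p0 = covers(X1).mean(), covers(X0).mean()
        if p1 - p0 < beta:            # keep only models "enriched" for class 1
            continue
        kept += 1
        # normalized membership: expected value 1 on class 1, 0 on class 0
        scores += (covers(X_test).astype(float) - p0) / (p1 - p0)
    return (scores / max(kept, 1) > 0.5).astype(int)
```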

Page 18: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Geometry vs Probability

• Geometry of classifiers

• Rule of generalization to unseen samples

• Assumption of representative samples → optimistic error bounds

• Distribution-free arguments → pessimistic error bounds

Page 19: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Sampling Density

(Figures: samples of size N = 2, 10, 100, 500, and 1000)

Page 20: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Summary of Difficulties

• Many theories have inadequate assumptions
• Geometry and probability lack connection
• Combinatorics defies detailed modeling
• Attempts to cover all cases give weak results
• Empirical results are overly specific to problems
• Lack of systematic organization of evidence

Page 21: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

More Questions

• How do confidence scores differ from feature values?
• Is combination a convenience or a necessity?
• What is common among the various combination methods?
• When should the combination hierarchy terminate?

Page 22: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Data Dependent Behavior of Classifiers

• Different classifiers excel in different problems

• So do combined systems

• This complicates theories and interpretation of observations

Page 23: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Questions to ask:

• Does this method work for all problems?

• Does this method work for this problem?

• Does this method work for this type of problems?

Study the interaction of data and classifiers

Page 24: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Characterization of Data and Classifier Behavior in a Common Language

Page 25: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Sources of Difficulty in Classification

• Class ambiguity

• Boundary complexity

• Sample size and dimensionality

Page 26: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Class Ambiguity

• Is the problem intrinsically ambiguous?

• Are the classes well defined?

• What is the information content of the features?

• Are the features sufficient for discrimination?

Page 27: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Boundary Complexity

• Kolmogorov complexity

• Length may be exponential in dimensionality

• Trivial description: list all points, class labels

• Is there a shorter description?

Page 28: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Sample Size & Dimensionality

• Problem may appear deceptively simple or complex with small samples

• Large degree of freedom in high-dim. spaces

• Representativeness of samples vs. generalization ability of classifiers

Page 29: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Mixture of Effects

• Real problems often have mixed effects of:
  – class ambiguity
  – boundary complexity
  – sample size & dimensionality

• Geometrical complexity of class manifolds coupled with a probabilistic sampling process

Page 30: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Easy or Difficult Problems

• Linearly separable problems

Page 31: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Easy or Difficult Problems

• Random noise

(Figures: random noise with 1000, 500, 100, and 10 points)

Page 32: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Easy or Difficult Problems

• Others

Nonlinear boundary, spirals, 4x4 checkerboard, 10x10 checkerboard

Page 33: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Description of Complexity

• What are real-world problems like?

• Need a description of complexity to

– set expectation on recognition accuracy

– characterize behavior of classifiers

• Apparent or true complexity?

Page 34: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Possible Measures

• Separability of classes
  – linear separability
  – length of class boundary
  – intra/inter class scatter and distances

• Discriminating power of features
  – Fisher's discriminant ratio
  – overlap of feature values
  – feature efficiency

Page 35: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Possible Measures

• Geometry, topology, clustering effects
  – curvature of boundaries
  – overlap of convex hulls
  – packing of points in regular shapes
  – intrinsic dimensionality
  – density variations

Page 36: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Linear Separability

• Intensively studied in the early literature

• Many algorithms stop only with a positive conclusion (i.e., when the classes are linearly separable)
  – Perceptrons, Perceptron Cycling Theorem, 1962
  – Fractional Correction Rule, 1954
  – Widrow-Hoff Delta Rule, 1960
  – Ho-Kashyap algorithm, 1965
  – Linear programming, 1968

Page 37: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Length of Class Boundary

• Friedman & Rafsky 1979– Find MST (minimum spanning tree)

connecting all points regardless of class

– Count edges joining opposite classes

– Sensitive to separability

and clustering effects
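A sketch of this boundary-length statistic, assuming SciPy is available: build the minimum spanning tree over all points and report the fraction of its edges that join points of opposite classes.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def boundary_edge_fraction(X, y):
    """Fraction of MST edges joining points with different class labels.
    Assumes no duplicate points (a zero distance would read as a missing edge)."""
    D = cdist(X, X)                           # pairwise Euclidean distances
    mst = minimum_spanning_tree(D).tocoo()    # the n-1 tree edges, stored sparsely
    return float(np.mean(y[mst.row] != y[mst.col]))
```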

Page 38: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Fisher’s Discriminant Ratio

• Defined for one feature:

f = (μ₁ − μ₂)² / (σ₁² + σ₂²)

where μ₁, μ₂, σ₁², σ₂² are the means and variances of classes 1 and 2

• One good feature makes a problem easy

• Take maximum over all features
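In code, the per-feature ratio and the maximum over features might look like this rough sketch for two classes labeled 0 and 1 (the zero-variance guard is an added safeguard):

```python
import numpy as np

def max_fisher_ratio(X, y):
    """max over features of (mu1 - mu2)^2 / (sigma1^2 + sigma2^2)."""
    X1, X2 = X[y == 0], X[y == 1]
    num = (X1.mean(axis=0) - X2.mean(axis=0)) ** 2
    den = X1.var(axis=0) + X2.var(axis=0)
    return float((num / np.maximum(den, 1e-12)).max())
```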

Page 39: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Volume of Overlap Region

• Overlap of class manifolds

• Overlap region of each dimension as a fraction of the range spanned by the two classes

• Multiply fractions to estimate volume

• Zero if no overlap
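A rough sketch of this overlap-volume estimate for two classes labeled 0 and 1, following the per-dimension recipe above (the clipping and small-denominator guard are added safeguards):

```python
import numpy as np

def overlap_volume(X, y):
    """Product over features of the overlap of the two classes' value ranges,
    each normalized by the range spanned by both classes together."""
    X1, X2 = X[y == 0], X[y == 1]
    over_lo = np.maximum(X1.min(axis=0), X2.min(axis=0))
    over_hi = np.minimum(X1.max(axis=0), X2.max(axis=0))
    span = np.maximum(X1.max(axis=0), X2.max(axis=0)) - \
           np.minimum(X1.min(axis=0), X2.min(axis=0))
    frac = np.clip(over_hi - over_lo, 0.0, None) / np.maximum(span, 1e-12)
    return float(np.prod(frac))     # zero if any feature has no overlap
```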

Page 40: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Convex Hulls & Decision Regions

• Hoekstra & Duin 1996

• Measure nonlinearity of a classifier w.r.t. a given dataset

• Sensitive to smoothness of decision boundaries
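One common way to compute such a nonlinearity measure, in the spirit of Hoekstra & Duin, is to interpolate between randomly paired same-class training points and report the trained classifier's error on those interpolated points; the sketch below assumes a scikit-learn-style classifier with fit/predict.

```python
import numpy as np

def nonlinearity(clf, X, y, n_interp=1000, seed=0):
    """Error rate on points interpolated between random same-class pairs."""
    rng = np.random.default_rng(seed)
    clf.fit(X, y)
    Xi, yi = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        i = rng.integers(0, len(Xc), size=n_interp)
        j = rng.integers(0, len(Xc), size=n_interp)
        t = rng.uniform(size=(n_interp, 1))
        Xi.append(t * Xc[i] + (1 - t) * Xc[j])   # convex combinations within class c
        yi.append(np.full(n_interp, c))
    Xi, yi = np.vstack(Xi), np.concatenate(yi)
    return float(np.mean(clf.predict(Xi) != yi))
```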

Page 41: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Shapes of Class Manifolds

• Lebourgeois & Emptoz 1996

• Packing of same-class points in hyperspheres

• Thick and spherical, or thin and elongated manifolds

Page 42: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Measures of Geometrical Complexity

Page 43: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Space of Complexity Measures

• Single measure may not suffice

• Make a measurement space

• See where datasets are in this space

• Look for a continuum of difficulty, from the easiest cases to the most difficult cases

Page 44: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Data Sets: UCI

• UC-Irvine collection

• 14 datasets (no missing values, > 500 pts)

• 844 two-class problems

• 452 linearly separable

• 392 linearly nonseparable

• 2 to 4648 points each

• 8 to 480 dimensional feature spaces
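The two-class problems are obtained by pairing up the classes within each dataset; a sketch of that enumeration (loading of the UCI data itself is left abstract):

```python
from itertools import combinations
import numpy as np

def two_class_problems(X, y):
    """Yield one (X_pair, y_pair) problem per unordered pair of classes,
    relabeled as 0/1."""
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        yield X[mask], (y[mask] == b).astype(int)
```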

Page 45: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Data Sets: Random Noise

• Randomly located and labeled points

• 100 artificial problems

• 1 to 100 dimensional feature spaces

• 2 classes, 1000 points per class
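The noise benchmarks can be reproduced in spirit as below; the unit hypercube and the uniform distribution are assumptions, since the slide only specifies randomly located and randomly labeled points.

```python
import numpy as np

def random_noise_problem(dim, n_per_class=1000, seed=0):
    """Uniformly random points in [0, 1]^dim with randomly assigned labels."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(2 * n_per_class, dim))
    y = rng.permutation(np.repeat([0, 1], n_per_class))
    return X, y

# 100 artificial problems, 1- to 100-dimensional
problems = [random_noise_problem(d, seed=d) for d in range(1, 101)]
```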

Page 46: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Patterns in Measurement Space

Page 47: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Correlated or Uncorrelated Measures

Page 48: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

(Figures: "Separation + Scatter" and "Separation")

Page 49: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Observations

• Noise sets and linearly separable sets occupy opposite ends in many dimensions

• In-between positions tell relative difficulty

• Fan-like structure in most plots

• At least 2 independent factors, joint effects

• Noise sets are far from real data

• Ranges of noise sets: apparent complexity

Page 50: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Principal Component Analysis

Page 51: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Principal Component Analysis
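The principal-component view of the measurement space can be computed by standardizing the datasets-by-measures matrix and projecting onto the top two components; a NumPy-only sketch in which the matrix M is a placeholder:

```python
import numpy as np

def pca_2d(M):
    """Project rows of M (datasets x complexity measures) onto the first
    two principal components after standardization."""
    Z = (M - M.mean(axis=0)) / np.maximum(M.std(axis=0), 1e-12)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)   # principal axes in Vt
    coords = Z @ Vt[:2].T                              # 2-D coordinates per dataset
    explained = S[:2] ** 2 / np.sum(S ** 2)            # variance explained
    return coords, explained
```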

Page 52: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

A Trajectory of Difficulty

Page 53: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

What Else Can We Do?

• Find clusters in this space

• Determine intrinsic dimensionality

Study effectiveness of these measures

Interpret problem distributions

Page 54: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

What Else Can We Do?

• Study specific domains with these measures

• Study alternative formulations and subproblems induced by localization, projection, transformation

Apply these measures to more problems

Page 55: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

What Else Can We Do?

Relate complexity measures to classifier behavior

Page 56: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Bagging vs Random Subspaces for Decision Forests

(Figures: regions where random subspaces vs. subsampling perform better, plotted against Fisher's discriminant ratio and length of class boundary)

Page 57: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Bagging vs Random Subspaces for Decision Forests

(Figures: the same comparison plotted against nonlinearity of nearest-neighbor vs. linear classifiers, % retained adherence subsets, and intra/inter class NN distances)

Page 58: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Error Rates of Individual Classifiers

(Figures: error rate of nearest neighbors vs. linear classifier, and error rate of single trees vs. sampling density)

Page 59: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Observations

• Both types of forests are good for problems of various degrees of difficulty

• Neither is good for extremely difficult cases:
  – many points on the boundary
  – ratio of intra/inter class NN distances close to 1
  – low Fisher's discriminant ratio
  – high nonlinearity of NN or LP classifiers

• Subsampling is preferable for sparse samples

• Subspaces are preferable for smooth boundaries

Page 60: Tin Kam Ho                 Bell Laboratories      Lucent Technologies

Conclusions

• Real-world problems have different types of geometric characteristics

• Relevant measures can be related to classifier accuracies

• Data complexity analysis improves understanding of classifier or combination behavior

• Helpful for combination theory and practice

