
3/27/2003 DASFAA Tutorial, Kyoto 1

Statistical Learning Methods for Emerging Database Applications

Edward Chang
Associate Professor, Electrical Engineering, UC Santa Barbara
CTO, VIMA Technologies

3/27/2003 DASFAA Tutorial, Kyoto 2

Useful Links

Related Publications: http://www-db.stanford.edu/~echang/

Software Free Trial: http://www.imagebeagle.com
Locate objectionable images on your hard drives
Before your boss finds it!!!

3/27/2003 DASFAA Tutorial, Kyoto 3

Outline

Statistical Learning
Emerging Applications: Data Characteristics
Classical Models
Kernel Methods

Linear Model View
Nearest Neighbor View
Geometric View

Dimension Reduction Methods

3/27/2003 DASFAA Tutorial, Kyoto 4

Statistical Learning

Program the computers to learn!
Computers improve performance with experience at some task
Example:

Task: playing checkers
Performance: % games it wins
Experience: expert players

3/27/2003 DASFAA Tutorial, Kyoto 5

Statistical Learning

Task: Ŷ = f(U)
Represented by some model(s)
Implies hypothesis

Performance
Measured by error functions

Experience (L)
Characterized by training data

Algorithm (Φ)

3/27/2003 DASFAA Tutorial, Kyoto 6

Supervised Learning

X: Data
U: Unlabeled pool
L: Labeled pool

G: Labels
Regression
Classification

Φ: Learning algorithm
f = Φ(L)
Ŷ = f(U)


3/27/2003 DASFAA Tutorial, Kyoto 7

Learning Algorithms

Linear Model
K-NN
Neural Networks
Decision Trees
Kernel Methods
Etc.

3/27/2003 DASFAA Tutorial, Kyoto 8

Classical Model

N: number of training instances (N+, N−)

D: dimensionality
N >> D, N → ∞

E.g., PAC learnability
N− ≈ N+

3/27/2003 DASFAA Tutorial, Kyoto 9

Emerging DB Applications

N < D
N+ << N−

Examples
Information Retrieval with relevance feedback
Gene Profiling

3/27/2003 DASFAA Tutorial, Kyoto 10

Image Retrieval Demo

N < D (N < 50, D = 150)

N+ << N-

ACM SIGMOD 01; ACM MM 01,02; IEEE CVPR 03

3/27/2003 DASFAA Tutorial, Kyoto 11

SVMactive

3/27/2003 DASFAA Tutorial, Kyoto 12

SVMactive


3/27/2003 DASFAA Tutorial, Kyoto 13

SVMactive

3/27/2003 DASFAA Tutorial, Kyoto 14

SVMactive

3/27/2003 DASFAA Tutorial, Kyoto 15

Ranking

3/27/2003 DASFAA Tutorial, Kyoto 16

Gene Profiling Example
N = 59 cases, D = 4026 genes

3/27/2003 DASFAA Tutorial, Kyoto 17

Outline

Statistical Learning
Emerging Applications: Data Characteristics
Classical Models (Classification)
Kernel Methods

Linear Model View
Nearest Neighbor View
Geometric View

Dimension Reduction Methods

3/27/2003 DASFAA Tutorial, Kyoto 18

Linear Model

Y = β0 + Σj=1..p βj Xj
Y = Xᵀβ
RSS(β) = (y − Xβ)ᵀ(y − Xβ)

RSS: Residual Sum of Squares
β = (XᵀX)⁻¹Xᵀy
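To make the closed-form fit concrete, here is a minimal numpy sketch of the normal-equations solution above; the synthetic data and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, p))])  # intercept column for beta0
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=N)

# beta = (X^T X)^(-1) X^T y; solve() is numerically safer than an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
rss = (y - X @ beta_hat) @ (y - X @ beta_hat)              # RSS(beta)
print(beta_hat, rss)
```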


3/27/2003 DASFAA Tutorial, Kyoto 19

Linear Model

3/27/2003 DASFAA Tutorial, Kyoto 20

Maximum Likelihood

Y = β0 + Σj=1..p βj Xj
Y = Xᵀβ
Y = Xᵀβ + ε

ε (noise signals) are independent
ε ~ N(0, σ²)

P(y | βx) has a normal distribution with
mean at y = βx
variance σ²

3/27/2003 DASFAA Tutorial, Kyoto 21

Linear Model

P(y | βx) ~ N(βx, σ²)

Training
Given (x1, y1), (x2, y2), …, (xn, yn)
Infer P(β | x1, x2, …, xn, y1, y2, …, yn)
By Bayes rule, or
Maximum Likelihood Estimate

3/27/2003 DASFAA Tutorial, Kyoto 22

Maximum Likelihood

For what β is
P(y1, …, yn | x1, …, xn, β) maximized?
Π P(yi | βxi) maximized?
Π exp(−½((yi − βxi)/σ)²) maximized?
Σ −½((yi − βxi)/σ)² maximized?
Σ (yi − βxi)² minimized?

3/27/2003 DASFAA Tutorial, Kyoto 23

Least Square Linear Model

Solution Method #1
RSS(β) = (y − Xβ)ᵀ(y − Xβ)
β = (XᵀX)⁻¹Xᵀy

Solution Method #2 (for D > N)
Gradient descent
Perceptron
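A minimal sketch of Solution Method #2, assuming plain gradient descent on RSS(β); with D > N the normal equations are singular, but the iteration still drives RSS toward zero. Step size and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 20, 50                            # D > N: (X^T X) is singular
X = rng.normal(size=(N, D))
y = rng.normal(size=N)

beta, lr = np.zeros(D), 1e-3
for _ in range(5000):
    grad = -2 * X.T @ (y - X @ beta)     # gradient of (y - X beta)^T (y - X beta)
    beta -= lr * grad
print(np.sum((y - X @ beta) ** 2))       # RSS shrinks toward zero
```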

3/27/2003 DASFAA Tutorial, Kyoto 24

Other Linear Models

LDA
Find the projection direction that minimizes the overlap of two Gaussian distributions

Separating Hyperplane


3/27/2003 DASFAA Tutorial, Kyoto 25

LDA

3/27/2003 DASFAA Tutorial, Kyoto 26

3/27/2003 DASFAA Tutorial, Kyoto 27

Separating Hyperplane

3/27/2003 DASFAA Tutorial, Kyoto 28

Separating Hyperplane

3/27/2003 DASFAA Tutorial, Kyoto 29

Maximum Margin Hyperplane

3/27/2003 DASFAA Tutorial, Kyoto 30

Linear Model Fits All Data?


3/27/2003 DASFAA Tutorial, Kyoto 31

How about Joining the Dots?

Y(x) = (1/k) Σ yi, xi ∈ Nk(x)

k = 1
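A minimal sketch of this join-the-dots estimate (1-D for readability; a general norm replaces the absolute difference in higher dimensions):

```python
import numpy as np

def knn_predict(x, X_train, y_train, k=1):
    dists = np.abs(X_train - x)          # distances to all training points
    nearest = np.argsort(dists)[:k]      # the k nearest neighbors N_k(x)
    return y_train[nearest].mean()       # Y(x) = (1/k) * sum of their y_i

X_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([0.1, 0.9, 2.2, 2.8])
print(knn_predict(1.4, X_train, y_train, k=1))   # 0.9: joins the nearest dot
```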

3/27/2003 DASFAA Tutorial, Kyoto 32

Linear Models

N ≥ D
Least Squares
LDA

D > N
Perceptron
Maximum Margin Hyperplane

3/27/2003 DASFAA Tutorial, Kyoto 33

Linear Model Fits All?

3/27/2003 DASFAA Tutorial, Kyoto 34

NN with k = 1

3/27/2003 DASFAA Tutorial, Kyoto 35

Nearest Neighbor

Four Things Make a Memory-Based Learner

A distance function
K: number of neighbors to consider?
A weighting function (optional)
How to fit with the local points?

3/27/2003 DASFAA Tutorial, Kyoto 36

Problems

Fitting Noise
Jagged Boundaries


3/27/2003 DASFAA Tutorial, Kyoto 37

Solutions

Fitting Noise
Pick a larger K?

Jagged Boundaries
Introducing a kernel as a weighting function

3/27/2003 DASFAA Tutorial, Kyoto 38

NN with k = 15

3/27/2003 DASFAA Tutorial, Kyoto 39

NN

3/27/2003 DASFAA Tutorial, Kyoto 40

Solutions

Fitting Noise
Pick a larger K?

Jagged Boundaries
Introducing a kernel as a weighting function

3/27/2003 DASFAA Tutorial, Kyoto 41

Nearest Neighbor -> Kernel Method

Four Things Make a Memory-Based Learner

A distance function
K: number of neighbors to consider? All
A weighting function: RBF kernels
How to fit with the local points? Predict weights

3/27/2003 DASFAA Tutorial, Kyoto 42

Kernel Method

RBF Weighting Function
Kernel width holds the key
Use cross validation to find the “optimal” width (sketched below)

Fitting with the Local Points
Where NN meets Linear Model
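A sketch of the RBF-weighted fit (a Nadaraya-Watson-style average, shown here as one plausible instance of fitting with the local points); sigma is the kernel width that would normally be tuned by cross validation.

```python
import numpy as np

def rbf_predict(x, X_train, y_train, sigma=0.3):
    w = np.exp(-(X_train - x) ** 2 / (2 * sigma ** 2))  # RBF weight per point
    return np.sum(w * y_train) / np.sum(w)              # kernel-weighted average

X_train = np.linspace(0, 3, 10)
y_train = np.sin(X_train)
print(rbf_predict(1.5, X_train, y_train))
```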


3/27/2003 DASFAA Tutorial, Kyoto 43

LM vs. NN

Linear Model
f(x) is approximated by a global linear function
More stable, less flexible

Nearest Neighbor
K-NN assumes f(x) is well approximated by a locally constant function
Less stable, more flexible

Between LM and NN
The other models…

3/27/2003 DASFAA Tutorial, Kyoto 44

Decision Theories

Bias & Variance Tradeoff
Bayes Prediction
VC Dimensionality
PAC Learnability

3/27/2003 DASFAA Tutorial, Kyoto 45

Variance vs. Bias

MSE(x0) = E_T[f(x0) − ŷ0]²

= E_T[ŷ0 − E_T(ŷ0)]² + [E_T(ŷ0) − f(x0)]²

Error = Var_T(ŷ0) + Bias²(ŷ0)
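A Monte Carlo sketch of this decomposition, estimating Var and Bias² of a 1-NN prediction at a fixed x0 over many training sets T (the target function and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(3 * x)
x0, preds = 0.5, []
for _ in range(2000):                              # draw many training sets T
    X = rng.uniform(0, 1, 30)
    y = f(X) + rng.normal(scale=0.3, size=30)
    preds.append(y[np.argmin(np.abs(X - x0))])     # 1-NN estimate at x0
preds = np.array(preds)
var, bias2 = preds.var(), (preds.mean() - f(x0)) ** 2
print(var, bias2, var + bias2)                     # Error = Var + Bias^2
```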

3/27/2003 DASFAA Tutorial, Kyoto 46

Outline

Statistical Learning
Emerging Applications: Data Characteristics
Classical Models (Classification)
Kernel Methods
Dimension Reduction Methods

3/27/2003 DASFAA Tutorial, Kyoto 47

Where Are We and Where Am I Heading?

LM and NN
Kernel Method: Three Views

LM view
NN view
Geometric view

3/27/2003 DASFAA Tutorial, Kyoto 48

Linear Model View

Y = β0 + Σ βj Xj
Separating Hyperplane

max‖β‖=1 C
subject to yi f(xi) ≥ C, or
yi (β0 + β·xi) ≥ C


3/27/2003 DASFAA Tutorial, Kyoto 49

Separating Hyperplane

3/27/2003 DASFAA Tutorial, Kyoto 50

Separating Hyperplane

3/27/2003 DASFAA Tutorial, Kyoto 51

Maximum Margin Hyperplane

3/27/2003 DASFAA Tutorial, Kyoto 52

Classifier Margin

Margin: defined as the width of the boundary before hitting a data object

Maximum Margin
Tends to minimize classification variance
No formal theory for this yet

3/27/2003 DASFAA Tutorial, Kyoto 53

Separating Hyperplane

3/27/2003 DASFAA Tutorial, Kyoto 54

M’s Mathematical Representation

Plus-plane: {x : w·x + b = +1}

Minus-plane: {x : w·x + b = −1}

w ⊥ plus-plane
w·(u − v) = 0 if u and v are on the plus-plane

w ⊥ minus-plane


3/27/2003 DASFAA Tutorial, Kyoto 55

Separating Hyperplane

3/27/2003 DASFAA Tutorial, Kyoto 56

M

Let x⁻ be any point on the minus-plane
Let x⁺ be the closest plus-plane point to x⁻

x⁺ = x⁻ + λw. Why?
The line (x⁺, x⁻) ⊥ minus-plane

M = |x⁺ − x⁻|

3/27/2003 DASFAA Tutorial, Kyoto 57

M

1. w·x⁻ + b = −1
2. w·x⁺ + b = +1
3. x⁺ = x⁻ + λw
4. M = |x⁺ − x⁻|
5. w·(x⁻ + λw) + b = 1 (from 2 & 3)
6. w·x⁻ + b + λ(w·w) = 1
7. λ(w·w) = 2

3/27/2003 DASFAA Tutorial, Kyoto 58

M

1. λ(w·w) = 2
2. λ = 2/(w·w)
3. M = |x⁺ − x⁻| = |λw| = λ|w| = 2/|w|

4. Max M
Gradient descent, simulated annealing, EM, Newton’s method?
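A quick numeric check of the derivation (the vectors are illustrative): stepping by λw from a point on the minus-plane lands exactly on the plus-plane, and the gap equals 2/|w|.

```python
import numpy as np

w, b = np.array([3.0, 4.0]), -2.0        # |w| = 5
x_minus = np.array([0.0, 0.25])          # satisfies w.x + b = -1
lam = 2 / (w @ w)                        # step 2: lambda = 2/(w.w)
x_plus = x_minus + lam * w               # step 3
print(w @ x_plus + b)                                           # 1.0: on the plus-plane
print(np.linalg.norm(x_plus - x_minus), 2 / np.linalg.norm(w))  # both 0.4
```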

3/27/2003 DASFAA Tutorial, Kyoto 59

Max M

Max M = 2/|w|
⇔ Min |w|/2
⇔ Min |w|²/2

subject to yi(xi·w + b) ≥ 1, i = 1, …, N

Quadratic criterion with linear inequality constraints

3/27/2003 DASFAA Tutorial, Kyoto 60

Max M

Min |w|²/2
subject to yi(xi·w + b) ≥ 1, i = 1, …, N

Lp = min over w, b of |w|²/2 − Σi=1..N αi [yi(xi·w + b) − 1]

Setting ∂Lp/∂w = 0: w = Σi=1..N αi yi xi

Setting ∂Lp/∂b = 0: 0 = Σi=1..N αi yi


3/27/2003 DASFAA Tutorial, Kyoto 61

Wolfe Dual

Ld = Σi=1..N αi − ½ Σi,j=1..N αi αj yi yj (xi·xj)

Subject to αi ≥ 0
αi [yi(xi·w + b) − 1] = 0 (KKT conditions)
αi > 0 ⇒ yi(xi·w + b) = 1 (support vectors)
αi = 0 ⇒ yi(xi·w + b) > 1

3/27/2003 DASFAA Tutorial, Kyoto 62

Class Prediction
yq = sign(w·xq + b)

w = Σi=1..N αi yi xi

yq = sign(Σi=1..N αi yi (xi·xq) + b)
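As a sketch, this dual prediction can be reconstructed from a fitted linear SVM: in scikit-learn, dual_coef_ holds αi·yi for the support vectors (points with αi = 0 are dropped, per the KKT conditions). The toy data are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])
clf = SVC(kernel="linear", C=1e3).fit(X, y)      # large C: near hard margin

xq = np.array([1.8, 2.1])
# sign(sum_i alpha_i y_i (x_i . x_q) + b), summed over support vectors only
score = clf.dual_coef_ @ (clf.support_vectors_ @ xq) + clf.intercept_
print(np.sign(score), clf.predict([xq]))         # the two agree
```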

3/27/2003 DASFAA Tutorial, Kyoto 63

Non-separable Classes

Soft Margin Hyperplane
Basis Expansion

3/27/2003 DASFAA Tutorial, Kyoto 64

Non-separable Case

3/27/2003 DASFAA Tutorial, Kyoto 65

Soft Margin SVMs

Hard margin:
Min |w|²/2
subject to yi(xi·w + b) ≥ 1, i = 1, …, N

Soft margin:
Min |w|²/2 + C Σ εi
subject to
xi·w + b ≥ +1 − εi if yi = +1
xi·w + b ≤ −1 + εi if yi = −1
εi ≥ 0
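A small sketch of the role of C on overlapping classes (data are synthetic): small C tolerates slack and keeps many support vectors; large C approaches the hard-margin solution.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)               # overlapping classes

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(C, clf.n_support_)                     # support vector count per class
```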

3/27/2003 DASFAA Tutorial, Kyoto 66

Non-separable Case


3/27/2003 DASFAA Tutorial, Kyoto 67

Wolfe Dual

Ld = Σi=1..N αi − ½ Σi,j=1..N αi αj yi yj (xi·xj)

Subject to C ≥ αi ≥ 0
Σ αi yi = 0
KKT conditions

yq = sign(Σi=1..N αi yi (xi·xq) + b)

3/27/2003 DASFAA Tutorial, Kyoto 68

Basis Function

3/27/2003 DASFAA Tutorial, Kyoto 69

Harder 1D Example

3/27/2003 DASFAA Tutorial, Kyoto 70

Basis Function

Φ(x) = (x, x²)
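A tiny sketch of why this lift helps: points that are not linearly separable on the line become separable by a horizontal line in the (x, x²) plane. The data and the threshold 2.5 are illustrative.

```python
import numpy as np

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])        # outer points vs. inner points
phi = np.column_stack([x, x ** 2])            # Phi(x) = (x, x^2)
print(np.all(np.sign(phi[:, 1] - 2.5) == y))  # True: the line x^2 = 2.5 separates
```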

3/27/2003 DASFAA Tutorial, Kyoto 71

Harder 1D Example

3/27/2003 DASFAA Tutorial, Kyoto 72

Some Basis Functions

Φ(X) = Σm γm hm(X), where each hm: R^p → R

Common Functions
Polynomial
Radial basis functions
Sigmoid functions


3/27/2003 DASFAA Tutorial, Kyoto 73

Wolfe Dual
Ld = Σi=1..N αi − ½ Σi,j=1..N αi αj yi yj Φ(xi)·Φ(xj)

Subject to
C ≥ αi ≥ 0
Σ αi yi = 0
KKT conditions

yq = sign(Σi=1..N αi yi (Φ(xi)·Φ(xq)) + b)
K(xi, xj) = Φ(xi)·Φ(xj)

Kernel function!

3/27/2003 DASFAA Tutorial, Kyoto 74

Quadratic Basis Functions

Φ(X) = {1, xi, xi xj}, i, j = 1..p
(p+1)(p+2)/2 ≈ p²/2 terms
O(p²) computational cost

Φ(xi)·Φ(xj) is equivalent to (xi·xj + 1)² (verified numerically below)
O(p) computational cost

Total cost: O(N²p)
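A numeric check of that equivalence; note the exact feature map needs √2 coefficients on the linear and cross terms (a detail the slide omits), after which Φ(a)·Φ(b) = (a·b + 1)² holds exactly.

```python
import numpy as np
from itertools import combinations

def phi(x):
    cross = [np.sqrt(2) * x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return np.concatenate([[1.0], np.sqrt(2) * x, x ** 2, cross])

rng = np.random.default_rng(4)
a, b = rng.normal(size=5), rng.normal(size=5)
print(np.allclose(phi(a) @ phi(b), (a @ b + 1) ** 2))   # True, at O(p) kernel cost
```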

3/27/2003 DASFAA Tutorial, Kyoto 75

Dot Product Saves the Day

With the kernel (dot-product) form, every degree costs O(N²p); explicit basis expansion does not:
Quadratic: O(N²p²)
Cubic: O(N²p³)
Quartic: O(N²p⁴)

3/27/2003 DASFAA Tutorial, Kyoto 76

Quiz

What is the signature of a polynomial kernel function of degree d?
(xi·xj + 1)^d

3/27/2003 DASFAA Tutorial, Kyoto 77

Nearest Neighbor View

Z: a set of zero-mean, jointly Gaussian random variables

Each Zi corresponds to one example Xi

Cov(zi, zj) = K(xi, xj)
yi, the label of zi, is +1 or −1

P(yi | zi) = σ(yi zi)

3/27/2003 DASFAA Tutorial, Kyoto 78

Training Data


3/27/2003 DASFAA Tutorial, Kyoto 79

General Kernel Classifier [Jaakkola et al. 99]

MAP Classification for xt

yt = sign(Σ αi yi K(xt, xi))
K(xi, xj) = Cov(zi, zj) (some similarity function)

Supervised Training: compute αi
Given X and y, and
an error function such as J(α) = −½ Σ αi αj yi yj K(xi, xj) + Σ F(αi)

3/27/2003 DASFAA Tutorial, Kyoto 80

Leave One Out

3/27/2003 DASFAA Tutorial, Kyoto 81

SVMs
yt = sign(Σ αi yi K(xt, xi))
(yi, xi): training data; αi nonnegative; kernel K positive definite

αi is obtained by maximizing
J(α) = −½ Σ αi αj yi yj K(xi, xj) + Σ F(αi)
with F(αi) = αi

αi ≥ 0, Σ yi αi = 0

3/27/2003 DASFAA Tutorial, Kyoto 82

SVMs

3/27/2003 DASFAA Tutorial, Kyoto 83

Important Insight

K(xi, xj) = Cov(zi, zj)
To design a kernel is to design a similarity function that produces a positive definite covariance matrix on the training instances
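A sketch of checking that requirement: the kernel matrix a candidate similarity function produces on the training set should have no negative eigenvalues (RBF shown; data are illustrative).

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(20, 4))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
K = np.exp(-sq / 2.0)                                # RBF kernel matrix
print(np.linalg.eigvalsh(K).min() >= -1e-10)         # True: positive semidefinite
```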

3/27/2003 DASFAA Tutorial, Kyoto 84

Basis Function Selection

Three General Approaches
Restriction methods: limit the class of functions
Selection methods: scan the dictionary adaptively (Boosting)
Regularization methods: use the entire dictionary but restrict coefficients (Ridge Regression)


3/27/2003 DASFAA Tutorial, Kyoto 85

Overfitting?

Probably not, because:

N free parameters (not D)
Maximizing the margin

3/27/2003 DASFAA Tutorial, Kyoto 86

Geometrical View

f(x) = w·x + b, with |w| = 1, b = 0
Version space: V = {w : yi f(xi) > 0, i = 1..n, |w| = 1}
SVM is the center of the largest sphere contained in V

3/27/2003 DASFAA Tutorial, Kyoto 87

SVMs

3/27/2003 DASFAA Tutorial, Kyoto 88

BPMs

Bayes Objective Function
Ŝt = Bayes_Z(xt) = argmin over Si ∈ S of E_{H|Z=x}[ℓ(H(x), Si)]

BPMs [Herbrich et al. 2001]
A_bp = argmin over h ∈ H of E_x[ E_{H|Z=x}[ℓ(H(x), h(x))] ]

3/27/2003 DASFAA Tutorial, Kyoto 89

BPMs

Linear Classifier
Input X possesses a spherical Gaussian density

BP is the Center of Mass of the Version Space

3/27/2003 DASFAA Tutorial, Kyoto 90

BPMs vs. SVMs


3/27/2003 DASFAA Tutorial, Kyoto 91

BPMs

Use SVMs to find a good h in H
Find the BP

Billiard Algorithm [Herbrich et al. 2001]

Perceptron Algorithm [Herbrich et al. 2001]

3/27/2003 DASFAA Tutorial, Kyoto 92

Billiard Ball Algorithm (R. Herbrich)

3/27/2003 DASFAA Tutorial, Kyoto 93

Outline

Statistical Learning
Emerging Applications: Data Characteristics
Classical Models (Classification)
Kernel Methods
Dimension Reduction Methods

3/27/2003 DASFAA Tutorial, Kyoto 94

Dimensionality Curse

D: data dimension
When D increases:

Nearest neighbors are not local
All points are nearly equidistant

3/27/2003 DASFAA Tutorial, Kyoto 95

Sparse High-D Space [C. Aggarwal et al., ICDT 2001]

Hyper-cube Range Queries
P[s, d] = s^d (the probability that a query cube with side s contains a uniform point)
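A two-line sketch of what P = s^d means in practice: a query covering half of each axis captures a vanishing fraction of a uniform hypercube as d grows.

```python
s = 0.5
for d in (1, 10, 50, 100):
    print(d, s ** d)   # 0.5, ~1e-3, ~9e-16, ~8e-31
```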


3/27/2003 DASFAA Tutorial, Kyoto 97

Sparse High-D Space

Spherical Range Queries

3/27/2003 DASFAA Tutorial, Kyoto 98

P[R ∈ sp(Q, 0.5)] = 0.5^d · π^(d/2) / Γ(d/2 + 1)
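A sketch evaluating this formula: the fraction of the unit cube covered by the inscribed ball of radius 0.5 collapses with dimension.

```python
import math

for d in (2, 10, 50, 100):
    p = math.pi ** (d / 2) * 0.5 ** d / math.gamma(d / 2 + 1)
    print(d, p)        # ~0.785, ~0.0025, then astronomically small
```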

3/27/2003 DASFAA Tutorial, Kyoto 99

3/27/2003 DASFAA Tutorial, Kyoto 100

Dimensionality Curse

3/27/2003 DASFAA Tutorial, Kyoto 101

3/27/2003 DASFAA Tutorial, Kyoto 102

So?

Is nearest neighbor estimate cursed in high-D spaces?

Yes! When D is large and N is relatively small, the estimate is off!!


3/27/2003 DASFAA Tutorial, Kyoto 103

Are We Doomed?

How does the curse affect classification?
Similar objects tend to cluster together
Classification makes binary predictions

3/27/2003 DASFAA Tutorial, Kyoto 104

Distribution of Distances

3/27/2003 DASFAA Tutorial, Kyoto 105

Some Solutions to High-D

Restricted Estimators
Specifying the nature of the local neighborhood

Adaptive Feature Reduction
PCA, LDA

Dynamic Partial Function

3/27/2003 DASFAA Tutorial, Kyoto 106

Three Major Paradigms

Preserve data description in a lower-dimensional space: PCA

Maximize discriminability in a lower-dimensional space: LDA

Activate only similar channels: DPF

3/27/2003 DASFAA Tutorial, Kyoto 107

Minkowski Distance

Objects P and Q:
D(P, Q) = (Σi=1..M |pi − qi|ⁿ)^(1/n)

Similar images are similar in all M features
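A minimal implementation of the distance (n = 2 gives Euclidean, n = 1 Manhattan):

```python
import numpy as np

def minkowski(p, q, n=2):
    return (np.abs(p - q) ** n).sum() ** (1 / n)   # D = (sum |p_i - q_i|^n)^(1/n)

p, q = np.array([1.0, 2.0, 3.0]), np.array([2.0, 0.0, 3.0])
print(minkowski(p, q))   # sqrt(5)
```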

3/27/2003 DASFAA Tutorial, Kyoto 108

[Figure: two log-scale histograms of Frequency (1.0E-06 to 1.0E-01) vs. Feature Distance (0 to 0.95).]


3/27/2003 DASFAA Tutorial, Kyoto 109

Weighted Minkowski Distance

D(P, Q) = (Σi=1..M wi |pi − qi|ⁿ)^(1/n)

Similar images are similar in the same subset of the M features

3/27/2003 DASFAA Tutorial, Kyoto 110

[Figure: average per-feature distance vs. feature number (1–144) under four transformations: GIF, scale up/down, cropping, rotation.]

3/27/2003 DASFAA Tutorial, Kyoto 111

Similarity Theories

Objects are similar in all respects (Richardson 1928)
Objects are similar in some respects (Tversky 1977)
Similarity is a process of determining respects, rather than using predefined respects (Goldstone 94)

3/27/2003 DASFAA Tutorial, Kyoto 112

DPF

Which Place is Similar to Kyoto?
Partial
Dynamic
Dynamic Partial Function
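A sketch of the DPF idea, assuming the Li and Chang formulation in which only the m smallest per-feature differences are aggregated, so the "respects" used change dynamically with each pair compared; m and r are illustrative.

```python
import numpy as np

def dpf(p, q, m=100, r=2):
    d = np.sort(np.abs(p - q))[:m]      # activate only the m most similar channels
    return (d ** r).sum() ** (1 / r)

rng = np.random.default_rng(6)
p, q = rng.normal(size=144), rng.normal(size=144)
print(dpf(p, q))
```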

3/27/2003 DASFAA Tutorial, Kyoto 113

Precision/Recall

3/27/2003 DASFAA Tutorial, Kyoto 114

Summary

Statistical Learning
Emerging Applications: Data Characteristics
Classical Models (Classification)
Kernel Methods

Linear Model View
Nearest Neighbor View
Geometric View

Dimension Reduction Methods


3/27/2003 DASFAA Tutorial, Kyoto 115

Emerging DB Applications

N < D
N+ << N−

Examples
Information Retrieval with relevance feedback
Gene Profiling
Bioinformatics

3/27/2003 DASFAA Tutorial, Kyoto 116

Useful Links

Related Publications: http://www-db.stanford.edu/~echang/

Software Free Trial: http://www.imagebeagle.com
Locate objectionable images on your hard drives
Before your boss finds it!!!

3/27/2003 DASFAA Tutorial, Kyoto 117

References
1. The Elements of Statistical Learning, T. Hastie, R. Tibshirani, and J. Friedman, Springer, N.Y., 2001
2. Machine Learning, T. Mitchell, 1997
3. High-dimensional Data Analysis, D. Donoho, American Math. Society Lecture, 2000
4. Support Vector Machine Active Learning for Image Retrieval, S. Tong and E. Chang, ACM MM, 2001
5. Dynamic Partial Function, B. Li and E. Chang, ACM Journal, 2003
6. Pattern Discovery in Sequences under a Markov Assumption, D. Chudova and P. Smyth, ACM KDD, 2002
7. Bayes Point Machines, R. Herbrich, T. Graepel, and C. Campbell, Journal of Machine Learning Research, 2001
8. The Nature of Statistical Learning Theory, V. Vapnik, Springer, N.Y., 1995
9. Probabilistic Kernel Regression Models, T. Jaakkola and D. Haussler, Conference on AI and Statistics, 1999
10. Support Vector Machines, Lecture Notes, A. Moore, CMU
11. On the Surprising Behavior of Distance Metrics in High-dimensional Space, C. Aggarwal, A. Hinneburg, and D. Keim, ICDT 2001

