Learning from data streams via online transduction Shen-Shyang Ho and Harry Wechsler Department of Computer Science, George Mason University IEEE-ICDM-TDM 2004 Presented by Shen-Shyang Ho
Page 1:

Learning from data streams via online transduction

Shen-Shyang Ho and Harry Wechsler

Department of Computer Science, George Mason University

IEEE-ICDM-TDM 2004

Presented by Shen-Shyang Ho

Page 2:

Content:

1. Transduction vs. Induction

2. Motivation and Problem

3. Online transduction

4. Contribution/Solution

5. Speed and Performance: a comparison

6. Transductive Confidence Machine (TCM)

7. Adiabatic (incremental) SVM

8. Modified algorithm

9. An application example: stream-based active learning

10. Conclusion

Page 3:

Transduction and Induction

1. Induction: a general rule is inferred from the training data, and predictions are based on this general rule.

2. Transduction: the unknown (prediction) values at some points of interest are estimated directly from the training data.

3. The practical distinction between transduction and induction is whether we extract and store the general rule or not.

Page 4:

Motivation and Problem:

1. Problem: transduction is slower and more computationally expensive than induction.

2. Show that transduction can be fast.

3. Engineer a fast incremental TCM.

4. Motivate the use of transduction in online/real-time data mining.

Page 5:

Online transduction

1. Online Setting: examples are fed to a learner one by one.

2. Transduction is unsuitable for off-line and batch learning settings: it cannot match induction in terms of speed.

3. In an online setting, the general rule, induced from the current training set, is updated whenever a new packet of data or an individual datum becomes available.

4. What do we mean by the prediction accuracy of a particular (target) rule in an online setting?

5. Vovk et al. point out that "it makes more sense to emphasize the overall frequency of accurate prediction" during the online process.

Page 6:

Contribution/Solution

1. Integrating the adiabatic SVM into the TCM ...

2. ... to construct a fast TCM.

3. Not every incremental classifier can be integrated.

Page 7:

Experiment: Speed

Page 8:

Experiment: Performance

Figure 1: LEFT: cumulative errors; CENTER: cumulative uncertain predictions; RIGHT: cumulative correct predictions from all the certain predictions.

Page 9:

Transductive Confidence Machine: Definitions

Individual Strangeness Measure:

1. Nearest Neighbor:

   α_i = min_{j ≠ i, y_j = y_i} d(x_i, x_j) / min_{j ≠ i, y_j ≠ y_i} d(x_i, x_j)

   i.e. the distance from x_i to its nearest neighbor of the same class, divided by the distance to its nearest neighbor of a different class.

2. SVM:

(a) Lagrange multipliers

(b) Distance from the hyperplane.
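The nearest-neighbor strangeness measure above can be sketched as follows (a minimal illustration, not the authors' code; `nn_strangeness` is a hypothetical helper name, and it assumes every class has at least two members):

```python
import numpy as np

def nn_strangeness(X, y):
    """Nearest-neighbor strangeness: for each example, the distance to its
    nearest same-class neighbor divided by the distance to its nearest
    different-class neighbor (larger value = stranger example)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    # All pairwise Euclidean distances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(y)
    alphas = np.empty(n)
    for i in range(n):
        same = (y == y[i]) & (np.arange(n) != i)
        diff = y != y[i]
        alphas[i] = D[i, same].min() / D[i, diff].min()
    return alphas
```

An example close to its own class and far from the other class gets a small strangeness value; a borderline example gets a value near (or above) 1.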

Page 10:

Transductive Confidence Machine: Definitions

Consider a sequence of labeled examples {z_1, z_2, ..., z_{n−1}} with corresponding strangeness values α_1, α_2, ..., α_{n−1}, and an unlabeled example z_n assigned a particular label, with strangeness value α_n. We define a p-value function t : Z^n → [0, 1] which returns the p-value of z_n (assigned that label) by

t(z_1, z_2, ..., z_n) = #{i = 1, ..., n : α_i ≥ α_n} / n    (1)

which satisfies

P{(z_1, z_2, ..., z_n) : t(z_1, z_2, ..., z_n) ≤ r} ≤ r

for any r ∈ [0, 1] and for any probability distribution P on Z from which the n examples are drawn.
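Equation (1) is simple to compute once the strangeness values are available; a minimal sketch (illustrative, not the authors' code):

```python
def p_value(alphas, alpha_n):
    """p-value of Equation (1): the fraction of strangeness values that
    are at least as large as the new example's value alpha_n.
    `alphas` must already include alpha_n itself, so the result is
    always at least 1/n and at most 1."""
    return sum(a >= alpha_n for a in alphas) / len(alphas)
```

If the new example is the strangest of all n, its p-value is 1/n; if it is the least strange, its p-value is 1.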

Page 11:

Transductive Confidence Machine: Algorithm

Standard Algorithm:

Input: training set T = {z_1, z_2, ..., z_{n−1}} and an unlabeled example z_n = (x_n, ?):

1. FOR y = 1 to c (number of possible classes)

   (a) Label z_n as class y
   (b) Construct a classifier (e.g. SVM) using T and (x_n, y)
   (c) Use the SVM to compute the strangeness values α = {α_1, α_2, ..., α_n} for T and (x_n, y)
   (d) Use the strangeness values α to compute the p-value for (x_n, y)

   ENDFOR

2. Assign the label with the largest p-value to z_n = (x_n, ?)

3. Confidence = 1 − second highest p-value

4. Credibility = highest p-value
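The standard loop above can be sketched as follows (a minimal illustration, not the authors' implementation; `strangeness_fn(X, ys)` is an assumed interface returning one strangeness value per example, standing in for the SVM-based measure):

```python
def tcm_predict(T_x, T_y, x_new, classes, strangeness_fn):
    """One transductive prediction following the standard TCM loop:
    for each tentative label y, "retrain" on T plus (x_new, y), compute
    strangeness values, and record the p-value of the new example."""
    p_values = {}
    for y in classes:
        X = list(T_x) + [x_new]
        ys = list(T_y) + [y]
        alphas = strangeness_fn(X, ys)        # retrain on T ∪ {(x_new, y)}
        alpha_n = alphas[-1]                  # strangeness of (x_new, y)
        p_values[y] = sum(a >= alpha_n for a in alphas) / len(alphas)
    ranked = sorted(p_values.values(), reverse=True)
    label = max(p_values, key=p_values.get)
    confidence = 1 - (ranked[1] if len(ranked) > 1 else 0.0)
    credibility = ranked[0]
    return label, confidence, credibility
```

Note that the classifier is rebuilt once per candidate label, which is exactly the cost that the incremental (adiabatic) SVM in the modified algorithm is meant to reduce.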

Page 12:

Transductive Confidence Machine - Region Predictor

1. A (deterministic) region predictor is a function that, given a training set T, an unlabeled test input z_n, and a significance level δ (confidence level 1 − δ), returns a set of possible predictions Γ^{1−δ}(T, z_n) for z_n, i.e.

   Γ^{1−δ}(T, z_n)

   contains every y ∈ Y that satisfies

   #{i = 1, ..., n : α_i ≥ α_n^y} / n > δ

2. An "uncertain" region predictor is a region predictor that contains more than one label. The region predictor can also be an empty set.
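Given the per-label p-values, forming the region predictor is one filtering step; a sketch (illustrative, `region_predictor` is a hypothetical helper name):

```python
def region_predictor(p_values, delta):
    """Gamma^{1-delta}: every label whose p-value exceeds the
    significance level delta.  The result may hold one label (a
    "certain" prediction), several ("uncertain"), or none (empty)."""
    return {y for y, p in p_values.items() if p > delta}
```

Raising δ (demanding less confidence) shrinks the region, possibly to the empty set; lowering δ grows it.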

Page 13:

Online Transductive Confidence Machine

Randomized TCM (rTCM) defines a randomized region predictor Γ such that for any label y ∈ Y:

1. Include label y in Γ when

   #{i = 1, ..., n : α_i > α_n^y} / n > δ

2. Do not include y in Γ when

   #{i = 1, ..., n : α_i ≥ α_n^y} / n ≤ δ

3. Otherwise, include y in Γ with probability

   (#{i = 1, ..., n : α_i ≥ α_n^y} − nδ) / #{i = 1, ..., n : α_i = α_n^y}

   i.e. y is included when the above expression is greater than a random number drawn from a uniform distribution on [0, 1].
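The three cases can be sketched as one decision function (illustrative, not the authors' code; `rtcm_include` is a hypothetical helper name):

```python
import random

def rtcm_include(alphas, alpha_y, delta, rng=random):
    """Randomized inclusion rule for a label y, following the three
    cases above.  `alphas` are all n strangeness values (including
    alpha_y itself); `rng` supplies the uniform draw used in the
    borderline case."""
    n = len(alphas)
    gt = sum(a > alpha_y for a in alphas)     # strictly stranger examples
    eq = sum(a == alpha_y for a in alphas)    # ties (at least alpha_y itself)
    if gt / n > delta:
        return True                           # case 1: clearly include
    if (gt + eq) / n <= delta:
        return False                          # case 2: clearly exclude
    # case 3: include with probability (#{alpha_i >= alpha_y} - n*delta) / #{alpha_i = alpha_y}
    return (gt + eq - n * delta) / eq > rng.random()
```

The randomized tie-breaking in case 3 is what makes the error probability exactly δ at each trial rather than merely at most δ.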

An rTCM is proven to have error probability δ at each trial n = 1, 2, ..., and the error probability at each trial is independent of the other trials.

Page 14:

Adiabatic Incremental SVM

1. Main idea: when a new example z_{n+1} is added to the training set and its Lagrange multiplier α_{n+1} is incremented, all the examples already in the training set must continue to satisfy the KKT conditions.

2. Express the new solution using the old solution, the new example, and other "state variables" as needed.

3. In order to do so, some "bookkeeping" procedure is necessary.

   • Assume Δα_{n+1} is small enough that no element moves across S (support vectors), E (error vectors) and R (remaining vectors).

   • Upper limit on the increment Δα_{n+1}.
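The S/E/R bookkeeping rests on classifying each example by its KKT status, following the Cauwenberghs-Poggio partition. A sketch (illustrative; `partition_kkt` is a hypothetical helper, and g_i denotes each example's margin value from the current solution):

```python
def partition_kkt(g, alpha, C, tol=1e-8):
    """Partition training examples by KKT status, given each example's
    margin value g_i and Lagrange multiplier alpha_i:
      S: margin support vectors  (g_i = 0,  0 < alpha_i < C)
      E: error support vectors   (g_i <= 0, alpha_i = C)
      R: remaining vectors       (g_i >= 0, alpha_i = 0)"""
    S, E, R = [], [], []
    for i, (gi, ai) in enumerate(zip(g, alpha)):
        if abs(gi) <= tol and tol < ai < C - tol:
            S.append(i)
        elif ai >= C - tol and gi <= tol:
            E.append(i)
        else:
            R.append(i)
    return S, E, R
```

During an increment, an element whose g_i or α_i reaches a boundary value migrates between these sets, and the "bookkeeping" step updates the partition before continuing.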

Page 15:

Modified fast online "tightly-coupled" TCM using SVM - Algorithm

Algorithm (One transduction process for a new unlabeled example):

Input: z_n = (x_n, ?), T, {α, b}, J, Q, S, E, R.

1. Keep a copy of {α, b}, J, Q, S, E, R for re-initialization at the end of each FOR loop.

2. FOR y = 1 to c (number of possible classes)

   (a) Label z_n as y (i.e. z_n = (x_n, y)).
   (b) Initialize α_{z_n}^y to zero.
   (c) Compute Q_{z_n j} for all j ∈ S (the set of support vectors) and include it in Q.
   (d) Compute g_{z_n} using Equation (6).
   (e) If g_{z_n} > 0: set α_{z_n}^y = 0 and go to (g).
   (f) Else if g_{z_n} ≤ 0: apply the largest possible increment Δα_{z_n}^y so that (the first) one of the following conditions occurs:
       i. g_{z_n} = 0: add z_n to S, update J accordingly, go to (g).
       ii. α_{z_n}^y = C: add z_n to E and go to (g).
       iii. Do "bookkeeping" among S, E and R; update J if S changes.
       iv. Repeat (f) until there are no more changes to S, E and R.
   (g) Compute the p-value for z_n = (x_n, ?) labeled y using α = {α_1, α_2, ..., α_{n−1}, α_{z_n}^y}.
   (h) Test whether y is in the randomized region predictor of z_n = (x_n, ?) with confidence level 1 − δ.
   (i) Store the p-value of z_n labeled y.
   (j) Re-initialize {α, b}, J, Q, S, E, R to their respective input values.

   ENDFOR

3. Assign the label with the largest p-value to example z_n.

4. Calculate the confidence as 1 − second largest p-value.

Page 16:

Application example: Active learning

Goals of active learning:

1. less computation, due to smaller training sets, without penalizing the performance of the classifier

2. reduced human labeling effort and cost

Page 17:

Using Statistical Information from TCM:

Deviation from an uncertain region predictor:

U(e) = |p_i − p_j|

where p_i is the p-value for an unlabeled example e labeled as i, and p_j is the p-value for the same example labeled as j.

As U(e) increases from 0 to 1, the example e becomes more clearly associated with one particular label.

Selection criterion: for a given example e, if U(e) ≤ η (a threshold value), add e to the training set T.
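The selection criterion is a single comparison; a sketch (illustrative; `query_for_label` is a hypothetical helper name, and taking the two largest p-values is an assumed multi-class reading of the binary |p_i − p_j| above):

```python
def query_for_label(p_values, eta):
    """Active-learning selection: U(e) = |p_i - p_j|, the gap between
    the two largest p-values of example e.  Query the label of e (and
    add it to the training set) when the gap is at most eta, i.e. when
    the prediction is still uncertain."""
    top = sorted(p_values, reverse=True)[:2]
    return abs(top[0] - top[1]) <= eta
```

Examples with two competing high p-values (small U) are the informative ones; confidently predicted examples (large U) are skipped, which is how the "Active" row in the results table gets away with far fewer labeled examples.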

Page 18:

Application: Data-set

• Image Data-set

• 2310 examples in an 18-dimensional input space.

• 20 different training/testing partitions of the Image data-set, such that each training set contains 1300 examples and each testing set contains 1010 examples.

Page 19:

Experiment: Application

Method   | No. of selected examples | Accuracy estimate
---------|--------------------------|------------------
Standard | 1300                     | 95.65 ± 0.56
Random   | 652.00 ± 14.98           | 94.04 ± 0.88
Active   | 273.30 ± 49.27           | 94.33 ± 1.05

Ho, S.-S. and Wechsler, H. Stream-based active learning using algorithmic randomness theory. Manuscript in preparation.

Page 20:

Conclusion

1. Strength of online TCM:

(a) Well-calibrated predictions under the i.i.d. assumption.

(b) A statistically meaningful confidence/credibility measure, i.e. p-values.

(c) A "confidence interval" (region predictor) for the possible prediction(s) at a specified confidence level.

2. Transduction can be fast and consistent (it just needs some thoughtful engineering).

3. Can it be useful for online/real-time data mining?

4. Current Research: Concept drift and change detection problem.

Page 21:

Acknowledgement

The first author would like to thank Dr. Volodya Vovk for interesting discussions.

The first author would also like to thank Dr. Alex Gammerman and Dr. Volodya Vovk for the manuscript of their book "Algorithmic Learning in a Random World".

Page 22:

Main references used to modify the algorithm

1. Cauwenberghs, G. and Poggio, T. Incremental support vector machine learning. Advances in Neural Information Processing Systems 13, MIT Press, 409-415, 2000. Matlab code available at http://bach.ece.jhu.edu/pub/gert/svm/incremental/

2. Vovk, V. On-line confidence machines are well-calibrated. Proc. 43rd IEEE Symposium on Foundations of Computer Science, 187-196, 2002.
