Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Date post: 12-Jan-2016
Description:
Harnessing the Strengths of Anytime Algorithms for Constant Data Streams. Philipp Kranen, Thomas Seidl. Data Management and Data Exploration Group, RWTH Aachen University, Germany.
Transcript
Page 1: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Philipp Kranen, Thomas Seidl

Data Management and Data Exploration Group, RWTH Aachen University, Germany

Page 2: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Philipp Kranen, Thomas Seidl – Harnessing the Strengths of Anytime Algorithms for Constant Data Streams – ECML PKDD ’09

Agenda

Problem statement
Formal model
Novel approaches
Evaluation
Conclusion

Page 3: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Motivation – data streams in everyday life …

[Figure: constant data stream; items of type 1, type 2, …, type m arrive with arrival interval t_a; t_f and t_d are marked on the stream]

Page 4: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

[Figure: quality over time]

Problem statement

Data streams are ubiquitous: network traffic, sensor measurements, customer data, surveillance data, …

Budget algorithms (constant streams): tailored to a specific application
- no result in less time
- no improvement

Anytime algorithms: natural choice for varying streams
+ result after any time
+ exploit additional time

Goal: improve the resulting quality on constant streams over that of budget algorithms

Basic idea: spend less time on “confident” items

Prerequisite: a confidence measure for the current result

Page 5: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Model – premise

Example: classification of items on a conveyor belt

Given
  Anytime classification algorithm (e.g. anytime nearest neighbor)
  Confidence measure
  (t_d − t_f) ≥ t_a

Time is normalized to [0, 1]
  t = 0 corresponds to the result just after initialization
  t = 1: the complete model has been read, no further improvement possible
  n improvement steps (e.g. n training set items for nearest neighbor)

Confidence measure ranges from 0 to 1
  0: no confidence
  1: certain

First: assume linear dependency between confidence and accuracy
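The premise above can be illustrated with a minimal sketch of an anytime nearest neighbor classifier. The class name, the dictionary-based state and the choice of the majority label as the result at t = 0 are assumptions made for this illustration, not details taken from the slides.

```python
import math
from collections import Counter


class AnytimeNearestNeighbor:
    """Anytime 1-NN sketch: each improvement step reads one more training item."""

    def __init__(self, training_set):
        # training_set: list of (feature_vector, label) pairs; the list order
        # is the order in which the improvement steps read the items.
        self.training_set = training_set
        labels = [label for _, label in training_set]
        # Assumption: the result just after initialization is the majority label.
        self.default_label = Counter(labels).most_common(1)[0][0]

    def start(self, query):
        """Create the state for one stream item (normalized time t = 0)."""
        return {"query": query, "steps": 0,
                "best_dist": math.inf, "label": self.default_label}

    def improve(self, state):
        """Do one improvement step; return False once the model is exhausted."""
        if state["steps"] >= len(self.training_set):
            return False  # t = 1: no further improvement possible
        features, label = self.training_set[state["steps"]]
        dist = math.dist(state["query"], features)
        if dist < state["best_dist"]:
            state["best_dist"], state["label"] = dist, label
        state["steps"] += 1
        return True

    def time(self, state):
        """Normalized time in [0, 1]: fraction of the n steps already used."""
        return state["steps"] / len(self.training_set)


clf = AnytimeNearestNeighbor([((0.0, 0.0), "a"), ((1.0, 1.0), "b"), ((0.9, 1.2), "b")])
state = clf.start((1.0, 1.1))
clf.improve(state)  # may be interrupted after any step; a result is always available
print(state["label"], clf.time(state))
```

The state can be interrupted after any call to `improve`, which is the property the later approaches rely on.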

Page 6: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Model – assumptions

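[Figure: confidence over time, with the budget confidence μ(t), the scattering σ(t), the threshold ĉ and F(ĉ, t) marked]

Individual confidences are scattered around the mean value (budget confidence)

\[ F(\hat{c}, t) = \int_{\hat{c}}^{\infty} g(\mu(t), \sigma(t), x) \, dx \]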
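As a rough illustration of the scattering model, the sketch below evaluates F(ĉ, t) under the assumption that g is a Gaussian density with mean μ(t) and standard deviation σ(t). The Gaussian choice and the functions `mu` and `sigma` are hypothetical; the slides only state that confidences scatter around the budget confidence.

```python
import math


def exceed_probability(c_hat, mu, sigma):
    """P(confidence > c_hat) when the confidence is N(mu, sigma^2) distributed."""
    if sigma <= 0:
        return 1.0 if mu > c_hat else 0.0
    # Gaussian survival function written with the complementary error function.
    z = (c_hat - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


# Hypothetical budget confidence and scattering over normalized time t.
def mu(t):
    return 0.4 + 0.5 * t


def sigma(t):
    return 0.1


for t in (0.0, 0.5, 1.0):
    print(t, exceed_probability(0.6, mu(t), sigma(t)))
```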

Page 7: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Model – expected time to reach confidence ĉ

F(ĉ, t) is the probability that the confidence at time t is larger than ĉ

Use F(ĉ, t) as a cumulative distribution function (n steps!)

h(ĉ, t_j) is the probability that ĉ is first exceeded between t_{j-1} and t_j

Determine the expected time needed to reach ĉ by

\[ F(\hat{c}, t) = p(c(t) > \hat{c}) \]

\[ t(\hat{c}) = F(\hat{c}, 0) \cdot \frac{1}{n} + \sum_{\nu=1}^{n} h(\hat{c}, t_\nu) \cdot \frac{\nu}{n} \]

\[ h(\hat{c}, t_j) = F(\hat{c}, t_j) - F(\hat{c}, t_{j-1}) \]

[Figure: F(0.3, t) and 1 − F(0.3, t) over time]

[Figure: plot over confidence; legend: trad. budget, batch approach]
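The expected time to reach ĉ can be computed from sampled values of F as sketched below. Three conventions here are assumptions made for this sketch: the time steps are t_j = j/n, items that already exceed ĉ at t = 0 are charged one step (1/n), and items that never reach ĉ are charged the full time 1.

```python
def expected_time(F):
    """Expected normalized time to reach the confidence threshold c_hat.

    F[j] holds F(c_hat, t_j) for t_j = j / n, j = 0..n, and is assumed to be
    non-decreasing in j.
    """
    n = len(F) - 1
    if n < 1:
        raise ValueError("need F at t_0 and at least one further step")
    # h(c_hat, t_j) = F(c_hat, t_j) - F(c_hat, t_{j-1}): first exceeded in step j.
    h = [F[j] - F[j - 1] for j in range(1, n + 1)]
    t = F[0] * (1.0 / n)  # assumption: already confident items cost one step
    t += sum(h[j - 1] * (j / n) for j in range(1, n + 1))
    t += (1.0 - F[n]) * 1.0  # assumption: unreached items use the full time
    return t


print(expected_time([0.0, 0.5, 1.0]))
```

With F = [0.0, 0.5, 1.0], half of the items reach ĉ after the first of two steps and the rest after the second, so the expected time is 0.5 · 0.5 + 0.5 · 1.0 = 0.75.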

Page 8: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Batch approach

To improve the overall quality of the results, we have to process several items in parallel

[Figure: batch approach with a buffer, shown at time t_0 and at time t_0 + 5∙t_a; items of type 1, type 2, …, type m arrive with arrival interval t_a; t_f and t_d are marked]
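One way to picture the batch approach is the sketch below: a buffered batch shares a common budget of improvement steps, and each step goes to the item that is currently least confident. This greedy selection rule, the function names and the toy classifier are assumptions for illustration; the slides only state that several items are processed in parallel.

```python
def process_batch(states, improve, confidence, total_steps):
    """Distribute total_steps improvement steps over a batch of items.

    Each step goes to the item whose current result is least confident.
    Returns the number of steps spent on each item.
    """
    spent = [0] * len(states)
    active = set(range(len(states)))  # items that can still be improved
    steps_left = total_steps
    while steps_left > 0 and active:
        target = min(sorted(active), key=lambda i: confidence(states[i]))
        if improve(states[target]):
            spent[target] += 1
            steps_left -= 1
        else:
            active.discard(target)  # model exhausted for this item
    return spent


# Toy stand-in for an anytime classifier (hypothetical): every step raises
# the confidence by 0.25 until it reaches 1.0.
def toy_improve(state):
    if state["conf"] >= 1.0:
        return False
    state["conf"] = min(1.0, state["conf"] + 0.25)
    return True


batch = [{"conf": 0.9}, {"conf": 0.1}]
print(process_batch(batch, toy_improve, lambda s: s["conf"], total_steps=4))
```

A budget algorithm would give both items two steps each; here the already confident item receives none and the uncertain one receives all four.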

Page 9: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

FiFo approach

Use FiFo queue with capacity of r

\[ r = (t_d - t_f) / t_a \]

Initialize and insert newly arriving items
Remove eldest item on overflow
Improve item s with lowest time-weighted confidence
  if confidences are similar, give priority to older items

\[ \mathit{conf}(s) \cdot w(t_r(s)) \]

[Figure: two example weight functions, weight over remaining time]
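The FiFo steps above can be sketched as follows. The class and method names are hypothetical, and the remaining time of an item is approximated here by its position in the queue (the newest item has the full time left); the slides do not specify how the remaining time is measured.

```python
from collections import deque


class FifoScheduler:
    """FiFo queue of at most `capacity` items under refinement."""

    def __init__(self, capacity, weight):
        self.capacity = capacity
        self.weight = weight  # maps a remaining-time fraction to a weight
        self.queue = deque()  # eldest item at index 0

    def arrive(self, state):
        """Insert a new item; on overflow, remove and return the eldest one."""
        finished = None
        if len(self.queue) == self.capacity:
            finished = self.queue.popleft()
        self.queue.append(state)
        return finished

    def next_to_improve(self, confidence):
        """Return the item with the lowest time-weighted confidence."""
        n = len(self.queue)
        if n == 0:
            return None

        def weighted(idx):
            age = n - 1 - idx  # arrivals since this item was inserted
            remaining = (self.capacity - age) / self.capacity
            return confidence(self.queue[idx]) * self.weight(remaining)

        # min() keeps the first minimum, so ties go to the older item.
        return self.queue[min(range(n), key=weighted)]


sched = FifoScheduler(capacity=2, weight=lambda remaining: remaining)
sched.arrive({"name": "a", "conf": 0.5})
sched.arrive({"name": "b", "conf": 0.5})
print(sched.next_to_improve(lambda s: s["conf"])["name"])
```

Between two arrivals, the caller would repeatedly pick `next_to_improve` and apply one improvement step of the anytime classifier to it.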

Page 10: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Evaluation – classifiers and confidence measures

Anytime nearest neighbor classifier (ordered w.r.t. leave-one-out performance on training set)

Anytime support vector machine (m times one class versus all)

Anytime Bayesian classification (hierarchy of mixture densities per class)

\[ \mathit{conf}_{knn}(s) = e^{-\sum_{i=1}^{k} d(s, nn_i(s))} \]

\[ \mathit{conf}_{svm}(s) = 1 - e^{-(h_{j_1}(s) - h_{j_2}(s))} \]

\[ \mathit{conf}_{bt}(s) = P(C_{i_1} \mid s) - P(C_{i_2} \mid s) \]
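A minimal sketch of the SVM and Bayesian confidence measures, under one reading of the slide's formulas: the SVM measure is taken as one minus a decaying exponential of the gap between the two largest decision values, and the Bayesian measure as the gap between the two largest class posteriors. This reading and the function names are assumptions, not confirmed by the slides.

```python
import math


def top_two_gap(values):
    """Difference between the largest and the second largest value."""
    first, second = sorted(values, reverse=True)[:2]
    return first - second


def conf_svm(decision_values):
    """Assumed reading: 1 - exp(-(h_j1(s) - h_j2(s))) over the m one-vs-all outputs."""
    return 1.0 - math.exp(-top_two_gap(decision_values))


def conf_bayes(posteriors):
    """Assumed reading: P(C_i1 | s) - P(C_i2 | s) for the two most probable classes."""
    return top_two_gap(posteriors)


print(conf_svm([2.0, -1.0, 0.5]), conf_bayes([0.7, 0.2, 0.1]))
```

Both values lie in [0, 1]: they are 0 when the two best classes are tied and grow as the best class separates from the runner-up.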

Page 11: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Evaluation – batch approach

Throughout: 4-fold cross validation, time scaled to [0, 1]

Budget: performance increases with allotted time
Batch: accuracy increases with growing window size (equal time)
Largest (relative) improvement for small window sizes

Page 12: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Evaluation – batch approach and model

Results confirm the theoretical model: “linear” dependency between accuracy and confidence

Expected time t(ĉ) decreases with growing window size

[Figure: plot over confidence; legend: trad. budget, batch approach]

Page 13: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Evaluation – FiFo approach

FiFo approach also outperforms the respective budget algorithm

Accuracy increases with larger minimal time factor mtf

Confidence alone yields the best distribution of time allowance

[Figure: weight over remaining time, with axis ticks at mtf and 1]

\[ \mathit{conf}(s) \cdot w(t_r(s)) \]
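A small sketch of a weight function governed by the minimal time factor mtf. The linear shape is an assumption; the slide only shows that the weight lies between mtf and 1 as a function of the remaining time.

```python
def linear_weight(remaining, mtf):
    """Weight rising linearly from mtf (no time left) to 1 (all time left).

    remaining: remaining time as a fraction in [0, 1] (clamped to that range).
    mtf: minimal time factor in [0, 1].
    """
    if not 0.0 <= mtf <= 1.0:
        raise ValueError("mtf must lie in [0, 1]")
    remaining = min(1.0, max(0.0, remaining))
    return mtf + (1.0 - mtf) * remaining


for mtf in (0.0, 0.5, 1.0):
    print(mtf, [linear_weight(r, mtf) for r in (0.0, 0.5, 1.0)])
```

With mtf = 1 the weight is constant, so items are ranked by confidence alone; smaller values of mtf shift priority towards items with little time left.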

Page 14: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Evaluation – comparison

FiFo approach performs better than the batch approach in comparable settings throughout all experiments

Performance improvement even for small window/queue sizes

Page 15: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Conclusion

Data streams are ubiquitous
So far: budget algorithms on constant streams
Achievement: quality improvement over budget algorithms by harnessing the strengths of anytime algorithms
  Two simple yet effective approaches
  Evaluation using three prominent classifiers and simple confidence measures
  Both approaches outperform the respective budget algorithms
Results confirm the theoretical model and motivate further research
  Anytime algorithms
  Confidence measures

Page 16: Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Poster session tonight

Discuss the paper
Investigate stream data items …

