Load Shedding in Stream Databases – A Control-Based Approach Yicheng Tu, Song Liu, Sunil Prabhakar, and Bin Yao Department of Computer Science, Purdue University Presented by Chris Mayfield VLDB Conference, Seoul, Korea September 14, 2006
Page 1:

Load Shedding in Stream Databases – A Control-Based Approach

Yicheng Tu, Song Liu, Sunil Prabhakar, and Bin Yao
Department of Computer Science, Purdue University

Presented by Chris Mayfield

VLDB Conference, Seoul, Korea
September 14, 2006

Page 2:

Data stream management systems

• Applications
  • Financial analysis
  • Mobile services
  • Sensor networks
  • Network monitoring
  • More …
• Continuous data, discarded after being processed
• Continuous query
  • Data-active, query-passive model

[Figure: DSMS diagram with labels Data, DSMS, Query Results, and User]

Page 3:

DSMS architecture

• Network of query operators (O1 – O3)
• Each operator has its own queue (q1 – q4)
• Scheduler decides which operator to execute
• Query results (Q1, Q2) pushed to clients
• Example systems:
  • Aurora/Borealis
  • STREAM
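The architecture above can be sketched in a few lines of Python. This is a toy illustration, not the Aurora/Borealis or STREAM implementation: the `Operator` class, the `run_round_robin` scheduler, and the two example operators in `demo` are hypothetical names chosen here, and the scheduler simply gives each operator one tuple per turn.

```python
from collections import deque


class Operator:
    """A query operator with its own FIFO input queue (hypothetical sketch)."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn              # maps one tuple to an output, or None to filter it out
        self.queue = deque()      # this operator's waiting queue
        self.successors = []      # downstream operators
        self.results = None       # set to a list if the output goes to a client

    def step(self):
        """Process one queued tuple; return False if the queue was empty."""
        if not self.queue:
            return False
        out = self.fn(self.queue.popleft())
        if out is not None:
            for succ in self.successors:
                succ.queue.append(out)
            if self.results is not None:
                self.results.append(out)
        return True


def run_round_robin(operators):
    """Scheduler: visit operators in turn until every queue is drained."""
    busy = True
    while busy:
        busy = False
        for op in operators:
            if op.step():
                busy = True


def demo():
    # O1 keeps even values, O2 scales them; O2's output is query result Q1.
    o1 = Operator("O1", lambda x: x if x % 2 == 0 else None)
    o2 = Operator("O2", lambda x: x * 10)
    q1_results = []
    o1.successors.append(o2)
    o2.results = q1_results
    o1.queue.extend([1, 2, 3, 4])
    run_round_robin([o1, o2])
    return q1_results
```

Calling `demo()` pushes four tuples through the two-operator network and returns what the client would receive.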

Page 4:

Quality in DSMS data processing

• Data processing in DSMS is quality-critical
  • Tuple delay
  • Data loss
  • Sampling rate, window size, …
• Tuple delay is the major concern: results generated from old data are useless!
• Overloading during spikes degrades quality (delay)
• Solution: adjust data loss (i.e., load shedding)
  • On the DSMS side
  • Eliminate excessive load by dropping data items
• The real problem is: how to maintain processing delays while minimizing data loss?
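As a minimal sketch of what "dropping data items" can look like, the function below discards each incoming tuple with a fixed probability. Random dropping is an assumption made here for illustration; the slides do not say which tuples are dropped, and the name `shed` and its parameters are hypothetical.

```python
import random


def shed(tuples, drop_fraction, rng=None):
    """Return the tuples that survive random load shedding.

    drop_fraction is the probability of discarding each tuple (0.0 to 1.0).
    """
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must be between 0 and 1")
    rng = rng or random.Random()
    # rng.random() lies in [0, 1), so 0.0 keeps everything and 1.0 drops everything.
    return [t for t in tuples if rng.random() >= drop_fraction]
```

The open question on the slide is how to choose `drop_fraction` over time so that delay stays near its target without discarding more data than necessary.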

Page 5:

Related work (load shedding)

• Accuracy of aggregate queries under load shedding (Babcock et al., ICDE04)
• Data triage (Reiss & Hellerstein, ICDE05)
  • Put data into an asylum upon overloading
• LoadStar (Chi et al., VLDB05)
• QoS-driven load shedding (Tatbul et al., VLDB03)
  • Key questions
    - When?
    - How much?
    - Where?
  • Use a load shedding roadmap (LSRM) to decide where
  • Intuitive algorithm to decide when and how much

Page 6:

Example Limitations

• Highly dynamic environment is reality
  • Bursty data input
  • Variable unit processing cost
• Fails to capture current system status (queue length) and output (delay)
  • Delay positively related to queue length
• Example 1. Unbounded increase of delay
• Example 2. Unnecessary data loss

Page 7:

Our approach

• The feedback control loop:
  • Plant
  • Monitor
  • Controller
  • Actuator
• How it works
  • Error (e) = desired output (y_r) - measured output (y)
  • Focal point: controller, which maps e to control signal u
  • Disturbances
• View load shedding as a control theory problem
  • Control: manipulation of system behavior by adjusting input
  • Cruise control of automobiles, room temperature control, etc.
  • Open-loop (preset) vs. closed-loop (feedback) control
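The difference between open-loop and closed-loop control can be seen on a toy plant. The sketch below is an illustration only: the plant `y(k+1) = y(k) + u(k) + d`, the constant disturbance `d`, and the proportional gain of 0.5 are assumptions chosen here, not the DSMS model or the controller from the talk.

```python
def simulate(controller, target, disturbance, steps, y0):
    """Run the loop: measure y, compute error e, map e to u, apply u to the plant."""
    y = y0
    trace = []
    for _ in range(steps):
        error = target - y            # e = desired output - measured output
        u = controller(error)         # controller maps e to control signal u
        y = y + u + disturbance       # toy plant with an unmodeled disturbance
        trace.append(y)
    return trace


def open_loop(_error):
    """Preset control: ignores the measured output entirely."""
    return 0.0


def closed_loop(error, gain=0.5):
    """Feedback control: proportional to the current error."""
    return gain * error


open_trace = simulate(open_loop, target=2.0, disturbance=0.1, steps=50, y0=2.0)
closed_trace = simulate(closed_loop, target=2.0, disturbance=0.1, steps=50, y0=2.0)
```

With these numbers the open-loop output keeps drifting upward with every step, while the feedback loop settles at a bounded offset from the target. A purely proportional controller leaves that small steady-state offset; removing it is one of the things a more careful controller design addresses.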

Page 8:

Challenges (theory → practice)

• Can we model the system?
  • Analytical model may not be easy to derive
  • System identification: experimental methods
• How to design the controller?
  • Use control-theoretic tools for guaranteed performance
• DSMS-specific problems
  • Lack of real-time measurement of output signal (y)
  • How to set the control period (T)
• Real system evaluation
  • We use Borealis in our study

Page 9:

Modeling a DSMS

• Borealis data stream manager
  • Round robin operator scheduler
  • FIFO waiting queues
  • For now, fix the per-tuple processing cost c
• Proposed model: y = qc, where q is the number of outstanding data tuples
• Discrete form: y(k) = q(k-1) c
• Denote the input load as f_i and system processing power as f_o:

$$y(k) = q(k-1)\,c = \frac{cT}{H} \sum_{j<k} \left[ f_i(j) - f_o(j) \right]$$
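The discrete model y(k) = q(k-1) c can be exercised with a short simulation. This is a sketch under stated assumptions: the service rate is taken as a constant number of tuples per second, the queue is clamped at zero, and the constant H that appears in the slide's formula is not defined on the slide and is left out. The function name and parameters are hypothetical.

```python
def simulate_delay(input_rates, service_rate, cost, period, q0=0.0):
    """Return the modeled delay y(k) = q(k-1) * c for each control period.

    input_rates  -- arrival rate f_i(k) in tuples/sec, one value per period
    service_rate -- assumed constant processing rate f_o in tuples/sec
    cost         -- per-tuple processing cost c in seconds
    period       -- control period T in seconds
    """
    q = q0
    delays = []
    for f_in in input_rates:
        delays.append(q * cost)   # delay seen in period k depends on q(k-1)
        # Queue grows by the net inflow over one period; it cannot go negative.
        q = max(0.0, q + period * (f_in - service_rate))
    return delays


delays = simulate_delay([150, 150, 50], service_rate=100, cost=0.01, period=1.0)
```

Two overloaded periods at 150 tuples/sec against a service rate of 100 build up a backlog, and the modeled delay rises with it, which matches the earlier observation that delay is positively related to queue length.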

Page 10:

Controller design

• Design based on pole placement
  • Pole locations tell how fast/well the system responds
• Guaranteed performance targeting
  • Convergence rate - responsiveness
  • Damping - smoothness
• The controller: (see appendix for details)
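To make pole placement concrete, here is a minimal sketch on a first-order plant. It is not the controller from the talk, which is given only in the appendix. The plant y(k) = y(k-1) + b u(k-1), the proportional law u = K e, and the names below are assumptions chosen for illustration. Under them the error obeys e(k) = (1 - bK) e(k-1), so the closed-loop pole sits at 1 - bK and choosing a pole p fixes the gain as K = (1 - p) / b.

```python
def proportional_gain(b, pole):
    """Gain K that places the closed-loop pole of the toy plant at `pole`."""
    if not -1.0 < pole < 1.0:
        raise ValueError("pole must lie strictly inside the unit circle")
    return (1.0 - pole) / b


def closed_loop_response(b, gain, target, steps, y0=0.0):
    """Simulate y(k) = y(k-1) + b * u(k-1) with u = gain * (target - y)."""
    y = y0
    outputs = [y]
    for _ in range(steps):
        error = target - y
        u = gain * error
        y = y + b * u
        outputs.append(y)
    return outputs


K = proportional_gain(b=0.5, pole=0.5)
response = closed_loop_response(b=0.5, gain=K, target=2.0, steps=3)
```

A pole closer to 0 makes the error shrink faster (more responsive), while a pole closer to 1 gives a slower, smoother approach. With the pole at 0.5 the error halves every period.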

Page 11:

Control period

• Provides a more complete answer to the question “when to shed load?”
• Empirically set in previous studies
• Case-by-case decision with some systematic rules
• In our problem, a tradeoff between:
  • Sampling theory (Nyquist-Shannon theorem): to capture the moving trends of the disturbances, a higher sampling frequency (shorter period) is preferred
  • Stochastic nature of output (y) and parameter (c): more samples are needed, so a longer period is preferred
• The first factor should be given more weight
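The two sides of this tradeoff can be written as standard relations. The symbols are introduced here for illustration and are assumptions, not values from the talk: f_max is the highest disturbance frequency worth tracking, σ is the standard deviation of individual delay measurements, r is the rate at which measurements arrive, and the measurements within one period are treated as independent.

```latex
% Sampling theorem: the sampling rate 1/T must exceed twice f_max
T < \frac{1}{2 f_{\max}}

% Averaging n \approx rT independent measurements per period
\mathrm{SE}(\bar{y}) = \frac{\sigma}{\sqrt{n}} \approx \frac{\sigma}{\sqrt{rT}}
```

A shorter T satisfies the first relation more easily, while a longer T shrinks the standard error in the second, which is the tension described above.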

Page 12:

Input for experiments

• Controller and load shedder implemented in Borealis

• Synthetic (“Pareto”) and real (“Web”) data streams

• Small query network with variable average processing cost
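A bursty synthetic input of the "Pareto" kind could be generated along the following lines. The slides do not say how the stream was produced, so everything here is an assumption: the per-period arrival rate is drawn from a Pareto distribution, and `base_rate`, `shape`, and the function name are hypothetical.

```python
import random


def pareto_arrival_rates(n_periods, base_rate, shape, seed=None):
    """Draw one arrival rate (tuples/sec) per period from a Pareto distribution.

    random.paretovariate(shape) returns values >= 1, so every rate is at
    least base_rate; smaller shape values give heavier tails (larger bursts).
    """
    rng = random.Random(seed)
    return [base_rate * rng.paretovariate(shape) for _ in range(n_periods)]


rates = pareto_arrival_rates(n_periods=400, base_rate=50.0, shape=1.5, seed=1)
```

Passing a fixed `seed` makes a run repeatable, which helps when the same input has to be fed to several load shedding strategies for comparison.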

Page 13:

Experimental results

• Experiments for comparison
  • Aurora – open-loop solution
  • Baseline – a simple feedback method
• Target delay: 2 sec
• Control period: 1 sec
• Total time: 400 sec
• For both input types, data loss is almost the same for all three load shedding strategies

Page 14:

Future work

• Time-varying DSMS model
  • For example, time-varying cost c
  • Possible solution: adaptive control
• Adaptation other than load shedding
  • New disturbances?
  • Model changes? (i.e., at runtime)
• Other database problems

[Figure: block diagram with labels External Controller, Internal Controller, Internal Dynamics, External Dynamics, and two disturbance inputs]

Page 15:

Summary

• Load shedding is an effective quality adaptation method

• Ad hoc solutions do not work well under dynamic load and system features

• We propose an approach to guide load shedding in a highly dynamic environment based on feedback control theory

• Initial experiments performed in a real-world DSMS show the promising potential of our approach

Page 16:

Backup - 1

Page 17:

Backup - 2

• Lack of robustness of open-loop solution
  • More optimistic policy adapted in Aurora
  • Unstable performance
• Our solution is robust
  • Under input streams with different burstiness

Page 18:

Backup - 3

Page 19:

Backup - 4 (Model verification)

• Feed Borealis with synthetic streams
  • Input rate: step or sinusoidal function of time
  • Average processing cost is fixed
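The two test inputs named above are easy to write down. The sketch below is illustrative only: the function names and every parameter (switch time, rate levels, mean, amplitude, period) are hypothetical, since the slides do not give the values used.

```python
import math


def step_rate(t, switch_time, low, high):
    """Input rate that jumps from `low` to `high` at `switch_time` (seconds)."""
    return high if t >= switch_time else low


def sinusoidal_rate(t, mean, amplitude, period):
    """Input rate that oscillates around `mean` with the given period (seconds)."""
    return mean + amplitude * math.sin(2.0 * math.pi * t / period)


# One rate sample per second over a 400-second run.
step_input = [step_rate(t, switch_time=100, low=100.0, high=300.0) for t in range(400)]
sine_input = [sinusoidal_rate(t, mean=200.0, amplitude=50.0, period=40.0) for t in range(400)]
```

Feeding known inputs like these into the system and comparing the measured delay with the model's prediction is the usual way to check a model of this kind.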

