Date post: 22-Dec-2015

Monitoring Streams -- A New Class of Data Management Applications

Don Carney Brown University

Uğur Çetintemel Brown University

Mitch Cherniack Brandeis University

Christian Convey Brown University

Sangdon Lee Brown University

Greg Seidman Brown University

Michael Stonebraker MIT

Nesime Tatbul Brown University

Stan Zdonik Brown University

Background

• MIT/Brown/Brandeis team
• First Aurora, then Borealis
– Practical system
– Designed for Scalability: 10^6 stream inputs, queries
– QoS-Driven Resource Management
– Stream Storage Management
– Reliability / Fault Tolerance
– Distribution and Adaptivity
• First stream startup: StreamBase
– Financial applications

Example Stream Applications

• Market Analysis
– Streams of Stock Exchange Data

• Critical Care
– Streams of Vital Sign Measurements

• Physical Plant Monitoring
– Streams of Environmental Readings

• Biological Population Tracking
– Streams of Positions from Individuals of a Species

Not Your Average DBMS

1. External, Autonomous Data Sources

2. Querying Time-Series

3. Triggers-in-the-large

4. Real-time response requirements

5. Noisy Data, Approximate Query Results

Outline

2. Aurora Overview/ Query Model

3. Runtime Operation

4. Adaptivity

Aurora from 100,000 Feet

[Figure: several applications (App), each attached to Aurora with its own Query and QoS specification]

Each Application Provides:

• A Query over input data streams

• A Quality-of-Service Specification (QoS) (specifies utility of partial or late results)

Aurora from 100 Feet

[Figure: an Aurora network in which boxes (for example Slide and Tumble) are connected by arcs and feed applications (App), each with a QoS specification]

Queries = Workflow (Boxes and Arcs)

• Workflow Diagram = “Aurora Network”

• Boxes = Query Operators

• Arcs = Streams

Streams (Arcs)

• stream: tuple sequence from common source (e.g., sensor)

• tuples timestamped on arrival (Internal use: QoS)

Query Operators (Boxes)

• Simple: FILTER, MAP, RESTREAM

• Binary: UNION, JOIN, RESAMPLE

• Windowed: TUMBLE, SLIDE, XSECTION, WSORT
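
The boxes-and-arcs model above can be illustrated with a minimal sketch. It is not Aurora's implementation: the tuple layout, the function names, and the count-based window are all assumptions made for illustration (Aurora's own TUMBLE operator is defined differently). Each generator hand-off plays the role of an arc.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Tup = Tuple[int, float]  # (timestamp, value); a hypothetical tuple layout


def filter_box(pred: Callable[[Tup], bool], arc: Iterable[Tup]) -> Iterator[Tup]:
    """FILTER-like box: pass through only the tuples that satisfy pred."""
    return (t for t in arc if pred(t))


def map_box(fn: Callable[[Tup], Tup], arc: Iterable[Tup]) -> Iterator[Tup]:
    """MAP-like box: apply fn to every tuple."""
    return (fn(t) for t in arc)


def tumble_box(size: int, agg: Callable[[List[float]], float],
               arc: Iterable[Tup]) -> Iterator[Tup]:
    """Simplified tumbling window: aggregate disjoint runs of `size` tuples."""
    window: List[Tup] = []
    for t in arc:
        window.append(t)
        if len(window) == size:
            # Emit one tuple per window, stamped with the last timestamp seen.
            yield (window[-1][0], agg([v for _, v in window]))
            window = []


def run_network(source: Iterable[Tup]) -> List[Tup]:
    """Wire three boxes together; each generator hand-off acts as an arc."""
    valid = filter_box(lambda t: t[1] >= 0.0, source)            # drop negative readings
    celsius = map_box(lambda t: (t[0], (t[1] - 32) * 5 / 9), valid)
    return list(tumble_box(2, lambda vs: sum(vs) / len(vs), celsius))


readings = [(1, 32.0), (2, 50.0), (3, -1.0), (4, 212.0), (5, 68.0)]
print(run_network(readings))  # → [(2, 5.0), (5, 60.0)]
```

Generators keep the sketch pull-based and lazy; a real stream engine would instead push tuples into queues and let a scheduler decide which box runs next.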

Aurora in Action

[Figure: an Aurora network in operation, with Slide and Tumble boxes feeding applications (App) that each carry a QoS specification]

“Box-at-a-time” Scheduling

Arcs = Tuple Queues

Outputs Monitored for QoS

Continuous and Historical Queries

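[Figure: an Aurora network with operators O1 to O5 and O7 to O9 showing a continuous query, a view, and an ad-hoc query, each delivering to an application (App) with a QoS specification; other labels: Queues, Connection Point, 3 Days, 1 Hour]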

Quality-of-Service (QoS)

Specifies “Utility” of Imperfect Query Results

• Delay-Based (specify utility of late results)

• Delivery-Based, Value-Based (specify utility of partial results)

QoS Influences…

Scheduling, Storage Management, Load Shedding

[Figure: QoS graphs labeled A, B, and C; axis labels include Delay, % Tuples Delivered, and Output Value]
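
A delay-based QoS specification can be read as a function from delay to utility. The sketch below assumes a piecewise-linear graph given as (delay, utility) points; the function name `make_qos` and the sample graph are hypothetical and are not taken from the slides.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple


def make_qos(points: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Return utility(delay), interpolating linearly between (delay, utility) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def utility(delay: float) -> float:
        if delay <= xs[0]:
            return ys[0]
        if delay >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, delay)          # index of the first point right of delay
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (delay - x0) / (x1 - x0)

    return utility


# Hypothetical graph: full utility up to 1 s, falling linearly to zero at 5 s.
delay_qos = make_qos([(0.0, 1.0), (1.0, 1.0), (5.0, 0.0)])
print(delay_qos(0.5), delay_qos(3.0), delay_qos(9.0))  # → 1.0 0.5 0.0
```

Delivery-based and value-based graphs could be represented the same way, with the percentage of tuples delivered or the output value on the horizontal axis.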

Talk Outline

1. Introduction

2. Aurora Overview

3. Runtime Operation

4. Adaptivity

5. Related Work and Conclusions

Runtime Operation: Basic Architecture

[Figure: architecture diagram. Components shown: Router, Scheduler, QoS Monitor, Box Processors, Catalog, Storage Manager, Buffer holding queues q1 … qn, and Persistent Store; the diagram is labeled with inputs and outputs]

Runtime Operation: Scheduling to Maximize Overall QoS

Choice 1: A: Cost: 1 sec (…, age: 1 sec); Delay = 2 sec, Utility = 0.5

Choice 2: B: Cost: 2 sec (…, age: 3 sec); Delay = 5 sec, Utility = 0.8

Schedule Box A now rather than later

Ideal: Maximize Overall Utility. Presently exploring scalable heuristics (e.g., feedback-based)
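
The scheduling trade-off can be sketched by brute force. The costs and ages below come from the slide; the two QoS functions are hypothetical, chosen only so that they reproduce the slide's utilities (0.5 at a 2 s delay for A, 0.8 at a 5 s delay for B). The delay of a tuple is taken to be its age plus the time until its box finishes.

```python
from itertools import permutations
from typing import Callable, Dict, Tuple

# (cost in seconds, age of the waiting tuple in seconds), as on the slide
BOXES: Dict[str, Tuple[float, float]] = {"A": (1.0, 1.0), "B": (2.0, 3.0)}

# Hypothetical delay-based QoS functions; the slide's actual graphs are not available.
QOS: Dict[str, Callable[[float], float]] = {
    "A": lambda d: max(0.0, 1.0 - d / 4.0),    # utility reaches zero at 4 s
    "B": lambda d: max(0.0, 1.0 - d / 25.0),   # far more tolerant of delay
}


def total_utility(order: Tuple[str, ...]) -> float:
    """Run boxes in the given order; delay = age + time until the box finishes."""
    elapsed, total = 0.0, 0.0
    for name in order:
        cost, age = BOXES[name]
        elapsed += cost
        total += QOS[name](age + elapsed)
    return total


def best_order() -> Tuple[str, ...]:
    """Try every order; only feasible for a handful of boxes."""
    return max(permutations(BOXES), key=total_utility)


print(best_order())  # → ('A', 'B')
```

Under these assumed graphs, running A first yields a total utility of about 1.26, while running B first yields about 0.8 because A's result arrives too late to be useful. Exhaustive search does not scale to large networks, which is why the slide points to heuristics.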

Runtime Operation: Scheduling to Minimize Per-Tuple Processing Overhead

Train Scheduling (input … x y z flowing through boxes A and B; the original figure marks each context switch):

Default Operation: A(x), A(y), A(z), B(A(x)), B(A(y)), B(A(z))

Box Trains: B(A(x)), B(A(y)), B(A(z))

Tuple Trains: A(z, y, x), B(A(z), A(y), A(x))
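
The benefit of tuple trains can be illustrated by counting box invocations, used here as a stand-in for context switches. The `Box` class and both driver functions are hypothetical; box trains (pushing tuples through several boxes without returning to the scheduler) are not modeled.

```python
from typing import Callable, List


class Box:
    """A box that counts how often the scheduler invokes it."""

    def __init__(self, fn: Callable[[int], int]) -> None:
        self.fn = fn
        self.invocations = 0

    def run(self, batch: List[int]) -> List[int]:
        self.invocations += 1            # one scheduling decision per call
        return [self.fn(t) for t in batch]


def tuple_at_a_time(boxes: List[Box], queue: List[int]) -> List[int]:
    """Default operation: every box is invoked once per tuple."""
    for box in boxes:
        out: List[int] = []
        for t in queue:
            out.extend(box.run([t]))
        queue = out
    return queue


def tuple_train(boxes: List[Box], queue: List[int]) -> List[int]:
    """Tuple train: each box consumes its whole input queue in one invocation."""
    for box in boxes:
        queue = box.run(queue)
    return queue


a1, b1 = Box(lambda t: t + 1), Box(lambda t: t * 2)
a2, b2 = Box(lambda t: t + 1), Box(lambda t: t * 2)
slow = tuple_at_a_time([a1, b1], [1, 2, 3])
fast = tuple_train([a2, b2], [1, 2, 3])
print(slow == fast, a1.invocations + b1.invocations, a2.invocations + b2.invocations)
# → True 6 2
```

Both drivers produce the same output; only the number of scheduling steps differs, which is the overhead the slide aims to reduce.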

Runtime Operation: Storage Management

1. Run-time Queue Management

• Prefetch Queues Prior to Being Scheduled

• Drop Tuples from Queues to Improve QoS

2. Connection Point Management

• Support Efficient (Pull-Based) Access to Historical Data (e.g., indexing, sorting, clustering, …)

Talk Outline

1. Introduction

2. Aurora Overview

3. Runtime Operation

4. Adaptivity

5. Related Work and Conclusions

Stream Query Optimization

• Differences with Traditional Query Optimization?

Motivation of ‘Query Migration’

• Continuous query over streams
– Statistics unknown before start
– Statistics changing during execution
• Stream rates, arrival pattern, distribution, etc.

• Need for dynamic adaptation
– Plan re-optimization
• Change the shape of query plan tree

Stream Query Optimization

• New classes of operators (windows) may mean new rewrites
• New execution modes (continuous/pipelining)
• More dynamic fluctuations in statistics, so compile-time optimization is not possible
• Global optimization is not practical because query networks are huge, hence adaptive optimization
• Other cost models: taking memory into account, output rate rather than throughput, etc.
• Query optimization and load shedding

Query Optimization

Compile-time, Global Optimization Infeasible

Too Many Boxes

Too Much Volatility in Network, Data

Dynamic, Local Optimization

Scope re what to optimize

Threshold re when to optimize

Run-time Plan Re-Optimization

• Step 1 – Decide when to optimize
– Statistics Monitoring

• Step 2 – Generate new query plan
– Query Optimization

• Step 3 – Replace current plan by new plan
– Plan Migration
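
The three steps can be sketched for the simplest case, a chain of independent, commuting filters. Everything here is an assumption made for illustration: the `FilterStats` record, the relative-saving threshold, and the use of the classical rank ordering, (selectivity − 1) / cost ascending, as the optimizer. The slides do not specify how Aurora performs these steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FilterStats:
    name: str
    cost: float          # seconds per input tuple (monitored statistic)
    selectivity: float   # fraction of tuples that pass (monitored statistic)


def plan_cost(plan: List[FilterStats]) -> float:
    """Expected cost per input tuple of running the filters in sequence."""
    total, surviving = 0.0, 1.0
    for f in plan:
        total += surviving * f.cost
        surviving *= f.selectivity
    return total


def optimize(plan: List[FilterStats]) -> List[FilterStats]:
    """Step 2: order commuting filters by rank (selectivity - 1) / cost."""
    return sorted(plan, key=lambda f: (f.selectivity - 1.0) / f.cost)


def maybe_migrate(plan: List[FilterStats], threshold: float = 0.1) -> List[FilterStats]:
    """Step 1: re-optimize only if the predicted relative saving exceeds threshold."""
    candidate = optimize(plan)
    saving = 1.0 - plan_cost(candidate) / plan_cost(plan)
    # Step 3 (plan migration) is reduced here to returning the new plan.
    return candidate if saving > threshold else plan


# Fresh statistics: filter "p" has become unselective, "q" very selective.
current = [FilterStats("p", 1.0, 0.9), FilterStats("q", 1.0, 0.1)]
print([f.name for f in maybe_migrate(current)])  # → ['q', 'p']
```

The threshold prevents the system from migrating for negligible gains, since migration itself has a cost.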

Adaptivity in Query Optimization

Dynamic Optimization: Migration

1. Identify Subnetwork
2. Buffer Inputs
3. Drain Subnetwork
4. Optimize Subnetwork
5. Turn on Taps

Stateful Operator in CQ

• But what about stateful operators?
– Need non-blocking operators in CQ
– Operator needs to output partial results
– State data structure keeps received tuples

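[Figure: join of streams A and B with State A and State B; an arriving tuple ax probes the buffered tuples b1 to b5 and produces the results (ax, b2) and (ax, b3)]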

Key Observation: The purge of tuples in states relies on processing of new tuples.

Example: Symmetric NL join w/ window constraints
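
The key observation can be made concrete with a minimal symmetric nested-loop join. The class name, the tuple layout, and the time-based window are assumptions for illustration, and timestamps are assumed to arrive in order. Expired tuples are purged from a state only when a new tuple arrives on the opposite input.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Tup = Tuple[int, str]  # (timestamp, join key); a hypothetical tuple layout


class SymmetricWindowJoin:
    """Symmetric nested-loop join keeping one state per input stream."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.state: Dict[str, Deque[Tup]] = {"A": deque(), "B": deque()}

    def insert(self, side: str, tup: Tup) -> List[Tuple[Tup, Tup]]:
        other = "B" if side == "A" else "A"
        ts, key = tup
        opposite = self.state[other]
        # Purge: expired tuples leave the opposite state only on a new arrival.
        while opposite and ts - opposite[0][0] > self.window:
            opposite.popleft()
        # Probe: nested-loop scan of the opposite state.
        matches = [t for t in opposite if t[1] == key]
        # Insert: remember the new tuple for future probes from the other side.
        self.state[side].append(tup)
        # Results are always emitted as (A tuple, B tuple).
        return [(tup, m) if side == "A" else (m, tup) for m in matches]


join = SymmetricWindowJoin(window=5)
join.insert("B", (1, "k"))
join.insert("B", (4, "k"))
out = join.insert("A", (7, "k"))
print(out, len(join.state["B"]))  # → [((7, 'k'), (4, 'k'))] 1
```

If input A stopped producing tuples, State B would never shrink. This dependence of purging on new input is what makes pausing and draining a plan with stateful operators problematic.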

Naïve Migration Strategy Revisited

• Steps
(1) Pause execution of old plan
(2) Drain out all tuples inside old plan
(3) Replace old plan by new plan
(4) Resume execution of new plan

[Figure: joins AB and BC over inputs A, B, C, annotated “(2) All tuples drained”, “(3) Old replaced by new”, and “(4) Processing resumed”]

Deadlock Waiting Problem:

Adaptivity: Query Optimization

State Movement Protocol

Parallel Track Protocol

Moving State Strategy

• Basic idea
– Share common states between two migration boxes

• Key steps
– State Matching
• Match states based on IDs.
– State Moving
• Create new pointers for matched states in new box
– What’s left?
• Unmatched states in new box

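[Figure: Old Box and New Box, both reading input queues QA, QB, QC, QD and writing output queue QABCD. Old Box: join AB (states SA, SB), join BC (states SAB, SC), join CD (states SABC, SD). New Box: join BC (states SB, SC), join CD (states SBC, SD), join AB (states SA, SBCD)]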
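
State matching and state moving can be sketched with the states named in the figure. Identifying a state by the set of streams it covers, and the helper names `sid` and `move_states`, are assumptions for illustration; the slide leaves open how the unmatched states are filled afterwards.

```python
from typing import Dict, FrozenSet, List, Tuple

StateId = FrozenSet[str]  # a state is identified by the set of streams it covers


def sid(streams: str) -> StateId:
    """Build a state ID from a string of stream names, e.g. 'AB'."""
    return frozenset(streams)


def move_states(old: Dict[StateId, List[tuple]],
                new_ids: List[StateId]) -> Tuple[Dict[StateId, List[tuple]], List[StateId]]:
    """Share matched states by reference; report the IDs that remain unmatched."""
    new: Dict[StateId, List[tuple]] = {}
    unmatched: List[StateId] = []
    for state_id in new_ids:
        if state_id in old:
            new[state_id] = old[state_id]      # pointer copy, no tuples are moved
        else:
            new[state_id] = []                 # must be filled by some other means
            unmatched.append(state_id)
    return new, unmatched


# Old box: SA, SB, SAB, SC, SABC, SD. New box: SA, SBCD, SBC, SD, SB, SC.
old_states = {sid(s): [] for s in ["A", "B", "AB", "C", "ABC", "D"]}
old_states[sid("A")].append(("a1",))
new_states, todo = move_states(old_states, [sid(s) for s in ["A", "BCD", "BC", "D", "B", "C"]])
print(sorted("".join(sorted(s)) for s in todo))  # → ['BC', 'BCD']
```

Only the intermediate states SBC and SBCD have no counterpart in the old box; the four base states are shared as they are.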

Parallel Track Strategy

• Basic idea
– Execute both plans in parallel and gradually “push” old tuples out of old box by purging

• Key steps
– Connect boxes
– Execute in parallel
• Until old box “expired” (no old tuple or sub-tuple)
– Disconnect old box
– Start executing new box only

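[Figure: old box (joins AB, BC, CD with states SA, SB, SAB, SC, SABC, SD) and new box (joins BC, CD, AB with states SB, SC, SBC, SD, SA, SBCD), both connected to input queues QA, QB, QC, QD and output queue QABCD]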

Adaptivity: Load Shedding

1. Two Load Shedding Techniques:

• Random Tuple Drops
– Add DROP box to network (DROP is a special case of FILTER)
– Position to affect queries w/ tolerant delivery-based QoS reqts

• Semantic Load Shedding
– FILTER values with low utility (acc. to value-based QoS)

2. Triggered by QoS Monitor

– e.g., after Latency Analysis reveals certain applications are continuously receiving poor QoS
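
Both load shedding techniques can be written as filters, as the slide suggests. The function names, the sample readings, and the value-based QoS function are hypothetical; the sketch shows only the shape of the two techniques, not where Aurora places the boxes.

```python
import random
from typing import Callable, Iterable, Iterator, List


def drop_box(drop_rate: float, arc: Iterable[float],
             rng: random.Random) -> Iterator[float]:
    """Random tuple drops: a FILTER whose predicate is a biased coin flip."""
    return (t for t in arc if rng.random() >= drop_rate)


def semantic_shed(utility: Callable[[float], float], min_utility: float,
                  arc: Iterable[float]) -> Iterator[float]:
    """Semantic load shedding: FILTER out values whose utility is too low."""
    return (t for t in arc if utility(t) >= min_utility)


def value_qos(reading: float) -> float:
    """Hypothetical value-based QoS: readings far from 37.0 matter most."""
    return min(1.0, abs(reading - 37.0) / 3.0)


readings: List[float] = [36.8, 37.1, 40.2, 36.9, 33.5, 37.0]
kept_semantic = list(semantic_shed(value_qos, 0.5, readings))
kept_random = list(drop_box(0.5, readings, random.Random(0)))
print(kept_semantic)  # → [40.2, 33.5]
print(len(kept_random), "of", len(readings), "tuples survive the random drop")
```

The random drop treats all tuples alike, which suits applications whose delivery-based QoS tolerates missing tuples; the semantic variant keeps the outliers that a monitoring application is most likely to care about.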

Adaptivity: Detecting Overload

Throughput Analysis

Cost = c, Selectivity = s

Input rate = r, Output rate = min(1/c, r) * s

1/c < r: Problem (the box cannot keep up with its input)

[Figure: a network of boxes, each annotated with its cost and selectivity (C, S), input rate (I), and output rate (O); some boxes are marked P]

Latency Analysis

Monitor each application’s Delay-based QoS

Problem: Too many apps in “bad zone”
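
The throughput analysis formula, output rate = min(1/c, r) * s, can be propagated through a chain of boxes to find where input outpaces capacity. The chain, its costs, and the function name are hypothetical, and each box is treated independently, as in the formula.

```python
from typing import List, Tuple


def analyze_chain(input_rate: float,
                  boxes: List[Tuple[str, float, float]]) -> List[Tuple[str, float, bool]]:
    """Propagate rates through a chain of (name, cost c, selectivity s) boxes.

    Returns (name, output rate, overloaded) per box; a box is overloaded
    when its input rate r exceeds its capacity 1/c.
    """
    report: List[Tuple[str, float, bool]] = []
    r = input_rate
    for name, c, s in boxes:
        capacity = 1.0 / c
        out = min(capacity, r) * s
        report.append((name, out, r > capacity))
        r = out                      # this box's output feeds the next box
    return report


# Hypothetical network: 100 tuples/s into three boxes in sequence.
chain = [("filter", 0.005, 0.5), ("join", 0.04, 1.0), ("map", 0.01, 1.0)]
for row in analyze_chain(100.0, chain):
    print(row)
```

In this example only the join is flagged: it receives about 50 tuples/s but can process only about 25 tuples/s, so its input queue grows without bound unless load is shed upstream.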

Implementation: GUI

Implementation: Runtime

Conclusions: Aurora Stream Query Processing System

1. Designed for Scalability

2. QoS-Driven Resource Management

3. Continuous and Historical Queries

4. Stream Storage Management

5. Implemented Prototype

Web site: www.cs.brown.edu/research/aurora/

