
Auto-scaling Techniques for Elastic Data Stream Processing

Description:
An elastic data stream processing system is able to handle changes in workload by dynamically scaling out and scaling in. This allows it to handle unexpected load spikes without constant overprovisioning. One of the major challenges for an elastic system is to find the right point in time to scale in or scale out. Finding such a point is difficult, as it depends on constantly changing workload and system characteristics. In this paper we investigate the application of different auto-scaling techniques to this problem. Specifically: (1) we formulate basic requirements for an auto-scaling technique used in an elastic data stream processing system, (2) we use the formulated requirements to select the best auto-scaling techniques, and (3) we evaluate the selected auto-scaling techniques using real-world data. Our experiments show that the auto-scaling techniques used in existing elastic data stream processing systems perform worse than the strategies used in our work.
Transcript
Page 1: Auto-scaling Techniques for Elastic Data Stream Processing

Public

Auto-scaling Techniques for Elastic Data

Stream Processing Thomas Heinze, Valerio Pappalardo, Zbigniew Jerzak, Christof Fetzer

March 2014

Page 2: Auto-scaling Techniques for Elastic Data Stream Processing


Outline

1. Introduction

2. Auto-scaling Techniques for Elastic Data Stream Processing

3. Evaluation

4. Conclusion and Future Work

Page 3: Auto-scaling Techniques for Elastic Data Stream Processing


Utilization within Cloud Environments

Twitter's cluster[1] has an average CPU utilization below 20%, even though ~80% of the resources are reserved

The Google cluster trace[2] shows an average CPU utilization of 25-35% and an average memory utilization of 40%

The average utilization within public clouds is estimated to be between 6% and 12%[3].

[1] Benjamin Hindman, et al. "Mesos: A platform for fine-grained resource sharing in the data center." NSDI, 2011.
[2] Charles Reiss, et al. "Heterogeneity and dynamicity of clouds at scale: Google trace analysis." SoCC, ACM, 2012.
[3] Arunchandar Vasan, et al. "Worth their watts? An empirical study of datacenter servers." HPCA, 2010.


Page 4: Auto-scaling Techniques for Elastic Data Stream Processing


Elasticity

Users need to reserve the required resources

Limited understanding of the performance of the system

Limited knowledge of the characteristics of the workload

[Figure: workload over time, comparing static provisioning (with phases of underprovisioning and overprovisioning) against elastic provisioning]


Page 5: Auto-scaling Techniques for Elastic Data Stream Processing


Elastic Data Stream Processing

Long-standing continuous queries over potentially infinite data streams

Small memory footprint (MB - GB) for most use cases, enabling fast scale-out

Strict requirements on end-to-end latency

Unpredictable workload with high variability (within seconds to minutes)

Load balancing influences running queries

[Figure: input streams flow into a data stream processing engine, which produces output streams]


Page 6: Auto-scaling Techniques for Elastic Data Stream Processing


Auto-scaling Techniques[1]

Different algorithmic approaches for various domains and use cases.

1) Threshold-based approaches

2) Time series analysis

3) Reinforcement learning

4) Queuing theory

5) Control theory

[1] T. Lorido-Botran, et al. "Auto-scaling techniques for elastic applications in cloud environments." Tech. Rep., 2012.

Page 7: Auto-scaling Techniques for Elastic Data Stream Processing


Requirements

Workload Independence: Independent from workload characteristics; we make no assumptions about the input workload.

Adaptivity: Adapt online to changing conditions like different workload characteristics.

Configurability: Easy to setup and configure by an end user.

Computational feasibility: The algorithm has to be computationally feasible to allow scale-out within seconds.

Feasible: Threshold-based approaches, Reinforcement Learning, Control Theory.

Not feasible: Time Series Analysis, Queuing Models.


Page 8: Auto-scaling Techniques for Elastic Data Stream Processing


Threshold-based Approach

Upper Threshold:

"If the CPU utilization of a host is larger than x for y seconds, the host is marked as overloaded."

Lower Threshold:

"If the CPU utilization of a host is smaller than z for w seconds, the host is marked as underloaded."

Additional parameters: Target Utilization, Grace Period

Two Variants: Local Thresholds vs. Global Thresholds
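The threshold rules above can be sketched as a small classification function. This is a minimal illustration, not the authors' implementation; the function name and the sample-based interpretation of the durations y and w are assumptions.

```python
def classify_host(samples, upper=0.8, lower=0.2, y=3, w=3):
    """Classify a host from its recent CPU utilization samples (newest last).

    Returns "overloaded" if utilization exceeded `upper` for the last `y`
    samples, "underloaded" if it stayed below `lower` for the last `w`
    samples, and "normal" otherwise. Thresholds and durations are the
    slide's parameters x/z and y/w, here taken as one sample per second.
    """
    if len(samples) >= y and all(s > upper for s in samples[-y:]):
        return "overloaded"
    if len(samples) >= w and all(s < lower for s in samples[-w:]):
        return "underloaded"
    return "normal"
```

With local thresholds this check runs per host; with global thresholds the same rule would be applied to an aggregate (e.g. average) utilization across all hosts.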


Page 9: Auto-scaling Techniques for Elastic Data Stream Processing


Reinforcement Learning[1]

Lookup table describing for each state the best action

Best action is adapted based on an online learning algorithm

Extension for Elastic Data Stream Processing:

Local policy

Initialization based on the threshold-based policy

Grace Period

[1] R. Das, et al. "Model-based and model-free approaches to autonomic resource allocation." IBM Research Report, 2005.
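The state/action lookup table with online adaptation can be sketched as a simple Q-learning-style policy. All class and method names, the state discretization, and the constants are illustrative assumptions, not the paper's implementation.

```python
import random

ACTIONS = ["scale_in", "hold", "scale_out"]

class ScalingPolicy:
    """Lookup table mapping discretized utilization states to actions."""

    def __init__(self, n_states=10, alpha=0.1, epsilon=0.1):
        # One row per utilization bucket, one value per action.
        self.q = {s: {a: 0.0 for a in ACTIONS} for s in range(n_states)}
        self.n_states = n_states
        self.alpha = alpha      # learning rate
        self.epsilon = epsilon  # exploration probability

    def state(self, utilization):
        # Map utilization in [0, 1] to a discrete state index.
        return min(int(utilization * self.n_states), self.n_states - 1)

    def act(self, utilization):
        # Epsilon-greedy: mostly exploit the best known action.
        s = self.state(utilization)
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(self.q[s], key=self.q[s].get)

    def update(self, utilization, action, reward):
        # Online update: move the table entry toward the observed reward.
        s = self.state(utilization)
        self.q[s][action] += self.alpha * (reward - self.q[s][action])
```

The extensions named on the slide would sit on top of this: seeding `q` from the threshold-based policy instead of zeros, keeping one table per host (local policy), and suppressing actions during the grace period.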

Page 10: Auto-scaling Techniques for Elastic Data Stream Processing


Elastic Data Stream Processing

[Figure: system architecture - queries and input streams enter a distributed data stream processing engine composed of elastic CEP engines; operator placement, processing coordination, and the auto-scaling technique control the engines, which produce the output streams]

Heinze, Thomas, et al. "Elastic Complex Event Processing under Varying Query Load." BD3, VLDB, 2013.


Page 11: Auto-scaling Techniques for Elastic Data Stream Processing


Integration of Auto-Scaling Techniques

[Figure: the measured host utilization is fed into the auto-scaling technique, which, given the max./min. host utilization and the target utilization, produces a scaling decision]
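The integration implied by the diagram is a monitor-decide-act loop. The sketch below is a hypothetical outline of one iteration; the function names and the string-valued decisions are assumptions for illustration.

```python
def scaling_loop(measure_utilization, technique, apply_decision):
    """One iteration of the monitor -> decide -> act cycle.

    measure_utilization: returns the current per-host CPU utilizations.
    technique: any auto-scaling technique (threshold-based, reinforcement
               learning, ...) mapping utilizations to a decision string.
    apply_decision: carries out "scale_out" / "scale_in" on the cluster.
    """
    utilizations = measure_utilization()
    decision = technique(utilizations)
    if decision != "hold":
        apply_decision(decision)
    return decision
```

Because the technique is passed in as a function, the same loop can host each of the evaluated approaches; for example, a global-threshold technique would decide from the average utilization: `lambda u: "scale_out" if sum(u) / len(u) > 0.8 else "hold"`.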


Page 12: Auto-scaling Techniques for Elastic Data Stream Processing


Setup

Based on data taken from Frankfurt Stock Exchange

35 aggregation queries using varying data selection and time ranges

Up to 10 processing nodes

Experiment duration: ~1 hour

[Charts: event count (events/s) over time for Day 1 and Day 2, ranging from 0 to ~4,500 events/s]


Page 13: Auto-scaling Techniques for Elastic Data Stream Processing


Configurability

Global Thresholds Local Thresholds

Local thresholds are more robust, while global thresholds are highly sensitive.

[Charts: median latency (s) and avg/max/min utilization for the threshold pairs 0.2-0.8, 0.3-0.8, and 0.3-0.9, comparing global and local thresholds]


Page 14: Auto-scaling Techniques for Elastic Data Stream Processing


Configurability

Reinforcement Learning Local Thresholds

Reinforcement Learning achieves the best utilization and latency values.

[Charts: median latency (s) and avg/max/min utilization for the threshold pairs 0.2-0.8, 0.3-0.8, and 0.3-0.9, comparing Reinforcement Learning and Local Thresholds]


Page 15: Auto-scaling Techniques for Elastic Data Stream Processing


Different Workloads

Reinforcement Learning Local Thresholds

Reinforcement Learning reduces variability between different workloads.

[Charts: median latency (s) and avg/max/min utilization across the Day 1, Day 2, and Day 3 workloads, comparing Reinforcement Learning and Local Thresholds]


Page 16: Auto-scaling Techniques for Elastic Data Stream Processing


Lessons Learned

Elasticity for data stream processing systems poses new challenges towards auto-scaling techniques

Global thresholds create many overload situations

Adaptive Reinforcement Learning is more stable than Local Thresholds, improving utilization by 5% and reducing latency by 30%

Need to handle variations within a workload better

Investigate additional parameters like latency and queue length

Study the influence of the placement algorithm
