
Ares: a High Performance and Fault-tolerant Distributed Stream Processing System

Changfu Lin <lcf@hust.edu.cn>

Joint work with JingJing Zhan, Hanhua Chen, Jie Tan & Hai Jin <{zjj, chen, tjmaster, hjin}@hust.edu.cn>

Cluster and Grid Computing Lab
Services Computing Technology and System Lab
School of Computer Science and Technology
Huazhong University of Science and Technology, Wuhan, 430074, China
http://grid.hust.edu.cn/


Real-time Stream Processing


Use Cases: E-commerce Recommendation, Anomaly Detection

Ecosystem


Requirements

Low Latency: extract value from data streams in real time

High Availability: failures are unavoidable for long-running applications


Low Latency vs. High Availability

[Figure: a three-rack network topology (Rack 1, Rack 2, Rack 3, each behind its own switch) alongside an application topology of eight tasks (1-8) grouped into four operators (O1-O4)]


Low Latency: prior work (DEBS'13, CIKM'14, ICDCS'14, Middleware'15, INFOCOM'16) proposes elaborate task allocation schemes that co-locate upstream and downstream task pairs (e.g., tasks 1 and 5 on Node A) to reduce latency.


Recovery: co-location hurts failure recovery. After a failure (e.g., of Rack 1), a downstream task must wait for its upstream tasks to recover: task 8 waits for task 5, which in turn waits for tasks 1, 3, and 4, producing cascaded waiting.


Challenge: Exploit Task Dependency


Placing a dependent task pair on separate racks (task 1 on Rack A, task 5 on Rack B) favors recovery at the cost of latency; placing both on one node (tasks 1 and 5 on Node A) favors latency at the cost of recovery.

Can we achieve the best trade-off between low latency and high availability by exploiting task dependency?


Ares’s Stream Latency Model


Idea: Divide the application topology into multiple source-sink paths


Each path runs from a source to a sink. Counting source-sink paths in the example topology: 2 × 3 + 2 × 3 = 12.

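As a concrete illustration, here is a minimal sketch that enumerates the source-sink paths of a task DAG by depth-first search. The edge list below is hypothetical: the slides only reveal a few pairs (e.g., 1→5, 1→6, 3→5, 5→8), so the wiring was chosen merely to reproduce a count of 12.

```python
from collections import defaultdict

def source_sink_paths(edges):
    """Enumerate every source-to-sink path of a DAG via depth-first search.
    Sources have no incoming edges; sinks have no outgoing edges."""
    out, has_in, nodes = defaultdict(list), set(), set()
    for u, v in edges:
        out[u].append(v)
        has_in.add(v)
        nodes.update((u, v))

    paths = []
    def dfs(node, path):
        if not out[node]:               # no outgoing edges: reached a sink
            paths.append(path)
        for nxt in out[node]:
            dfs(nxt, path + [nxt])
    for s in sorted(nodes - has_in):    # start a walk from every source
        dfs(s, [s])
    return paths

# Hypothetical wiring for tasks 1-8, chosen to yield 12 paths.
edges = [(1, 5), (1, 6), (2, 5), (2, 6), (3, 5), (4, 6),
         (5, 7), (5, 8), (6, 7), (6, 8)]
print(len(source_sink_paths(edges)))    # -> 12
```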


Example: consider the path 1→5→8, with task t1 on Node A, task t5 on Node B, and task t8 on Node C. A path's latency is the sum of its tasks' processing times plus the transferring times between the nodes hosting consecutive tasks:

$$latency(1 \to 5 \to 8) = 0.1 + 0.4 + 0.5 + (0.3 + 0.1) = 1.4$$

(the processing times of tasks 1, 5, and 8, plus the A→B and B→C transferring times)

Ares minimizes the average latency over all source-sink paths:

$$\frac{1}{|P|} \sum_{p \in P} latency(p)$$
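A minimal sketch of this latency model, assuming hypothetical tables `proc` (per-task processing time), `placement` (task → node), and `dist` (inter-node transfer time); the numbers reproduce the slide's example.

```python
def path_latency(path, placement, proc, dist):
    """latency(p): processing times of the tasks on p plus transferring
    times between the nodes hosting each consecutive task pair."""
    total = sum(proc[t] for t in path)
    for u, v in zip(path, path[1:]):
        if placement[u] != placement[v]:   # co-located pairs transfer for free
            total += dist[(placement[u], placement[v])]
    return total

def average_latency(paths, placement, proc, dist):
    """The model's objective: (1/|P|) * sum of latency(p) over all paths."""
    return sum(path_latency(p, placement, proc, dist) for p in paths) / len(paths)

# The slide's example: tasks 1, 5, 8 placed on nodes A, B, C.
proc = {1: 0.1, 5: 0.4, 8: 0.5}
placement = {1: "A", 5: "B", 8: "C"}
dist = {("A", "B"): 0.3, ("B", "C"): 0.1}
print(path_latency([1, 5, 8], placement, proc, dist))  # -> 1.4 (up to float rounding)
```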


Ares’s Stream Recovery Model


Idea: Exploit the dependency between upstream and downstream tasks under rack failures

Task dependency is a main challenge for failure recovery [StreamScope, NSDI'16].


The recovery time is proportional to the sum of the weights of the affected task pairs.

Example: if Rack 3 fails, the affected task pairs are (1,5), (1,6), (3,5), and (5,8):

$$recovery(\text{rack 3}) = c\,(w_{15} + w_{16} + w_{35} + w_{58}), \quad \text{e.g., } w_{58} = \frac{4}{12} = 0.33$$

Ares minimizes the average recovery cost over all racks:

$$\frac{1}{|R|} \sum_{r \in R} recovery(r)$$
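A minimal sketch of this recovery model under stated assumptions: `w` holds hypothetical task-pair weights, and a pair counts as affected when both endpoints sit on the failed rack (matching the recovering-cost condition ψ(π_i) = ψ(π_j) used in the FTS formula below); how Ares actually derives the weights is not shown here.

```python
def recovery_cost(failed_rack, edges, rack_of, w, c=1.0):
    """recovery(r) = c * sum of weights of the task pairs affected by the
    failure of rack r. Assumption: a pair is affected when both of its
    endpoints are hosted on the failed rack, so the dependency must be
    rebuilt (cascaded waiting) during recovery."""
    return c * sum(w[(u, v)] for u, v in edges
                   if rack_of[u] == failed_rack and rack_of[v] == failed_rack)

def average_recovery(racks, edges, rack_of, w, c=1.0):
    """The model's objective: (1/|R|) * sum of recovery(r) over all racks."""
    return sum(recovery_cost(r, edges, rack_of, w, c)
               for r in racks) / len(racks)
```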


Fault Tolerant Scheduler (FTS) Problem


Definition: Given an application topology G(V, E), a task set T, a node set N, a rack set R, and a network topology ψ, find a task allocation π that minimizes the allocation cost u(π).

$$u(\pi) = W \cdot \frac{1}{|P|} \sum_{p \in P} latency(p) + (1 - W) \cdot \frac{1}{|R|} \sum_{r \in R} recovery(r)$$

Generalization:

$$\mu(\pi) = \underbrace{\alpha \sum_{t \in T} \lambda_t(\pi)\, t_t}_{\text{processing cost}} + \underbrace{\beta \sum_{e_{ij} \in E} \lambda_{ij}(\pi)\, d(\pi_i, \pi_j)}_{\text{transferring cost}} + \underbrace{\gamma \sum_{e_{ij} \in E,\ \psi(\pi_i) = \psi(\pi_j)} w_{ij}}_{\text{recovering cost}}$$
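A minimal sketch of the generalized cost µ(π) under assumed inputs; the rate tables `lam_t`/`lam_e`, processing times `t_proc`, distance table `d`, pair weights `w`, and rack map `psi` are all hypothetical placeholders rather than Ares's real interfaces.

```python
def allocation_cost(pi, tasks, edges, lam_t, t_proc, lam_e, d, w, psi,
                    alpha=1.0, beta=1.0, gamma=1.0):
    """mu(pi) = processing cost + transferring cost + recovering cost."""
    processing = alpha * sum(lam_t[t] * t_proc[t] for t in tasks)
    # Distance defaults to 0 for co-located tasks (same node).
    transferring = beta * sum(lam_e[(i, j)] * d.get((pi[i], pi[j]), 0.0)
                              for i, j in edges)
    # Task pairs that land on the same rack suffer cascaded waiting after a
    # rack failure, so each such pair is charged its weight w_ij.
    recovering = gamma * sum(w[(i, j)] for i, j in edges
                             if psi[pi[i]] == psi[pi[j]])
    return processing + transferring + recovering
```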


The Nirvana algorithm


Think like a player: when we can't solve a problem from a holistic perspective, why not solve it from an individual perspective?

Observation: the allocation decision of a task depends only on the decisions of its neighbor tasks

The FTS game in game-theoretic terms:
Players → the task set T
Strategies → the node set N
Cost function → the individual allocation cost $\theta_t(\pi)$

Decomposing the global cost µ(π), each task t is charged its own processing cost plus half of each incident edge's transferring and recovering costs:

$$\theta_t(\pi) = \alpha\, \lambda_t(\pi)\, t_t + \frac{1}{2} \beta \sum_{i \in neighbor(t)} \lambda_{it}(\pi)\, d(\pi_i, \pi_t) + \frac{1}{2} \gamma \sum_{i \in neighbor(t),\ \psi(\pi_i) = \psi(\pi_t)} w_{it}$$

Since every edge cost is split evenly between its two endpoint tasks, the individual costs sum back to the global cost:

$$\mu(\pi) = \sum_{t \in T} \theta_t(\pi)$$
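The slides name best-response dynamics as Nirvana's basis (see the Summary); here is a minimal, hypothetical sketch of such a loop, with `theta` standing in for the individual cost θ_t(π) above. It illustrates the game-theoretic idea, not Ares's actual scheduler code.

```python
def best_response_dynamics(tasks, nodes, theta, pi, max_rounds=1000):
    """Let each task (player) repeatedly move to the node minimizing its
    individual cost theta(t, pi), given the other tasks' placements. Stop
    at a Nash equilibrium: a full round in which no task wants to move."""
    for _ in range(max_rounds):
        changed = False
        for t in tasks:
            # Evaluate every node as a candidate placement for task t.
            best = min(nodes, key=lambda n: theta(t, {**pi, t: n}))
            if theta(t, {**pi, t: best}) < theta(t, pi):
                pi[t] = best
                changed = True
        if not changed:          # Nash equilibrium reached
            break
    return pi
```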


Theoretical Analysis Results


Theorem 1. There exists a Nash equilibrium for the FTS game.

Theorem 2. Our design achieves a 2-approximation ratio for the FTS problem.

Theorem 3. The number of rounds needed to converge to the Nash equilibrium of the FTS game is upper-bounded by h(X+Y+Z).

Theorem 4. The computation complexity of our design is O(2ξ|N|(|T| + 2|E|)), where ξ denotes the number of rounds to converge to the Nash equilibrium of the FTS game.


The Ares Architecture

[Figure: the Ares architecture. The Master Node hosts a Custom Scheduler and the Nirvana Agent (Nirvana-based control), which computes the scheduling solution and publishes it through the Coordinator and a Database; each Worker Node of the distributed stream processing system runs a Supervisor, a Load Monitor, and worker processes containing Executors.]


Evaluation


Setup: a 30-node cluster, implemented on top of Storm; each node is a 16-core Intel Xeon server with 64GB DDR3 memory and a 1Gbps Ethernet interface; baseline: R-Storm [Middleware'15]

Applications: Word Count (following the setting of Heron [SIGMOD'15]); Join (on the TPC-H dataset)

Metrics: throughput, average tuple processing time, average rack recovery time


Is Processing Latency Better?


Average Reduction of 50.2%


Is Recovery Time Better?


Average Reduction of 48.9%


Is Throughput Better?


Average Improvement of 2.24×


Summary


The fault-tolerant scheduling (FTS) problem: the stream latency model and the stream recovery model

The Nirvana algorithm, based on best-response dynamics

Implementation of Ares on top of Storm

Thank you! Any questions?