
Ares: a High Performance and Fault-tolerant Distributed Stream Processing System

Changfu Lin <lcf@hust.edu.cn>

Joint work with JingJing Zhan, Hanhua Chen, Jie Tan & Hai Jin <{zjj, chen, tjmaster, hjin}@hust.edu.cn>

Cluster and Grid Computing Lab
Services Computing Technology and System Lab
School of Computer Science and Technology
Huazhong University of Science and Technology, Wuhan, 430074, China
http://grid.hust.edu.cn/


Real-time Stream Processing


Use Cases: E-commerce Recommendation, Anomaly Detection

Ecosystem


Requirements

Low Latency: extract value from data streams in real time

High Availability: failures are unavoidable for long-running applications


Low Latency vs. High Availability

[Figure: a three-rack network topology (Rack 1, Rack 2, Rack 3, each behind its own switch) alongside an application topology of eight tasks (1-8) grouped into four operators (O1-O4)]


Low Latency: prior work (DEBS'13, CIKM'14, ICDCS'14, Middleware'15, INFOCOM'16) proposes elaborate task allocation schemes that co-locate upstream and downstream task pairs (e.g., tasks 1 and 5 on Node A) to reduce latency.


Recovery: co-location hurts failure recovery. After a failure (e.g., of Rack 1), a downstream task must wait for its upstream tasks to recover: task 8 waits for task 5, which in turn waits for tasks 1, 3, and 4, producing cascaded waiting.


Challenge: Exploit Task Dependency


Placing a dependent task pair on separate racks (task 1 on Rack A, task 5 on Rack B) favors recovery at the cost of latency; placing both on one node (tasks 1 and 5 on Node A) favors latency at the cost of recovery.

Can we achieve the best trade-off between low latency and high availability by exploiting task dependency?


Ares’s Stream Latency Model


Idea: Divide the application topology into multiple source-sink paths


Each path runs from a source to a sink. Counting source-sink paths in the example topology: 2 × 3 + 2 × 3 = 12.

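As a concrete illustration, here is a minimal sketch that enumerates the source-sink paths of a task DAG by depth-first search. The edge list below is hypothetical: the slides only reveal a few pairs (e.g., 1→5, 1→6, 3→5, 5→8), so the wiring was chosen merely to reproduce a count of 12.

```python
from collections import defaultdict

def source_sink_paths(edges):
    """Enumerate every source-to-sink path of a DAG via depth-first search.
    Sources have no incoming edges; sinks have no outgoing edges."""
    out, has_in, nodes = defaultdict(list), set(), set()
    for u, v in edges:
        out[u].append(v)
        has_in.add(v)
        nodes.update((u, v))

    paths = []
    def dfs(node, path):
        if not out[node]:               # no outgoing edges: reached a sink
            paths.append(path)
        for nxt in out[node]:
            dfs(nxt, path + [nxt])
    for s in sorted(nodes - has_in):    # start a walk from every source
        dfs(s, [s])
    return paths

# Hypothetical wiring for tasks 1-8, chosen to yield 12 paths.
edges = [(1, 5), (1, 6), (2, 5), (2, 6), (3, 5), (4, 6),
         (5, 7), (5, 8), (6, 7), (6, 8)]
print(len(source_sink_paths(edges)))    # -> 12
```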


Example: consider the path 1→5→8, with task t1 on Node A, task t5 on Node B, and task t8 on Node C. A path's latency is the sum of its tasks' processing times plus the transferring times between the nodes hosting consecutive tasks:

$$latency(1 \to 5 \to 8) = 0.1 + 0.4 + 0.5 + (0.3 + 0.1) = 1.4$$

(the processing times of tasks 1, 5, and 8, plus the A→B and B→C transferring times)

Ares minimizes the average latency over all source-sink paths:

$$\frac{1}{|P|} \sum_{p \in P} latency(p)$$
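A minimal sketch of this latency model, assuming hypothetical tables `proc` (per-task processing time), `placement` (task → node), and `dist` (inter-node transfer time); the numbers reproduce the slide's example.

```python
def path_latency(path, placement, proc, dist):
    """latency(p): processing times of the tasks on p plus transferring
    times between the nodes hosting each consecutive task pair."""
    total = sum(proc[t] for t in path)
    for u, v in zip(path, path[1:]):
        if placement[u] != placement[v]:   # co-located pairs transfer for free
            total += dist[(placement[u], placement[v])]
    return total

def average_latency(paths, placement, proc, dist):
    """The model's objective: (1/|P|) * sum of latency(p) over all paths."""
    return sum(path_latency(p, placement, proc, dist) for p in paths) / len(paths)

# The slide's example: tasks 1, 5, 8 placed on nodes A, B, C.
proc = {1: 0.1, 5: 0.4, 8: 0.5}
placement = {1: "A", 5: "B", 8: "C"}
dist = {("A", "B"): 0.3, ("B", "C"): 0.1}
print(path_latency([1, 5, 8], placement, proc, dist))  # -> 1.4 (up to float rounding)
```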


Ares’s Stream Recovery Model


Idea: Exploit the dependency between upstream and downstream tasks under rack failures

Task dependency is a main challenge for failure recovery [StreamScope, NSDI'16].


The recovery time is proportional to the sum of the weights of the affected task pairs.

Example: if Rack 3 fails, the affected task pairs are (1,5), (1,6), (3,5), and (5,8):

$$recovery(\text{rack 3}) = c\,(w_{15} + w_{16} + w_{35} + w_{58}), \quad \text{e.g., } w_{58} = \frac{4}{12} = 0.33$$

Ares minimizes the average recovery cost over all racks:

$$\frac{1}{|R|} \sum_{r \in R} recovery(r)$$
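A minimal sketch of this recovery model under stated assumptions: `w` holds hypothetical task-pair weights, and a pair counts as affected when both endpoints sit on the failed rack (matching the recovering-cost condition ψ(π_i) = ψ(π_j) used in the FTS formula below); how Ares actually derives the weights is not shown here.

```python
def recovery_cost(failed_rack, edges, rack_of, w, c=1.0):
    """recovery(r) = c * sum of weights of the task pairs affected by the
    failure of rack r. Assumption: a pair is affected when both of its
    endpoints are hosted on the failed rack, so the dependency must be
    rebuilt (cascaded waiting) during recovery."""
    return c * sum(w[(u, v)] for u, v in edges
                   if rack_of[u] == failed_rack and rack_of[v] == failed_rack)

def average_recovery(racks, edges, rack_of, w, c=1.0):
    """The model's objective: (1/|R|) * sum of recovery(r) over all racks."""
    return sum(recovery_cost(r, edges, rack_of, w, c)
               for r in racks) / len(racks)
```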


Fault Tolerant Scheduler (FTS) Problem


Definition: Given an application topology G(V, E), a task set T, a node set N, a rack set R, and a network topology ψ, find a task allocation π that minimizes the allocation cost u(π).

$$u(\pi) = W \cdot \frac{1}{|P|} \sum_{p \in P} latency(p) + (1 - W) \cdot \frac{1}{|R|} \sum_{r \in R} recovery(r)$$

Generalization:

$$\mu(\pi) = \underbrace{\alpha \sum_{t \in T} \lambda_t(\pi)\, t_t}_{\text{processing cost}} + \underbrace{\beta \sum_{e_{ij} \in E} \lambda_{ij}(\pi)\, d(\pi_i, \pi_j)}_{\text{transferring cost}} + \underbrace{\gamma \sum_{e_{ij} \in E,\ \psi(\pi_i) = \psi(\pi_j)} w_{ij}}_{\text{recovering cost}}$$
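A minimal sketch of the generalized cost µ(π) under assumed inputs; the rate tables `lam_t`/`lam_e`, processing times `t_proc`, distance table `d`, pair weights `w`, and rack map `psi` are all hypothetical placeholders rather than Ares's real interfaces.

```python
def allocation_cost(pi, tasks, edges, lam_t, t_proc, lam_e, d, w, psi,
                    alpha=1.0, beta=1.0, gamma=1.0):
    """mu(pi) = processing cost + transferring cost + recovering cost."""
    processing = alpha * sum(lam_t[t] * t_proc[t] for t in tasks)
    # Distance defaults to 0 for co-located tasks (same node).
    transferring = beta * sum(lam_e[(i, j)] * d.get((pi[i], pi[j]), 0.0)
                              for i, j in edges)
    # Task pairs that land on the same rack suffer cascaded waiting after a
    # rack failure, so each such pair is charged its weight w_ij.
    recovering = gamma * sum(w[(i, j)] for i, j in edges
                             if psi[pi[i]] == psi[pi[j]])
    return processing + transferring + recovering
```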


The Nirvana algorithm


Think like a player: when we can't solve a problem from a holistic perspective, why not solve it from an individual perspective?

Observation: the allocation decision of a task depends only on the decisions of its neighbor tasks

The FTS game in game-theoretic terms:
Players → the task set T
Strategies → the node set N
Cost function → the individual allocation cost $\theta_t(\pi)$

Decomposing the global cost µ(π), each task t is charged its own processing cost plus half of each incident edge's transferring and recovering costs:

$$\theta_t(\pi) = \alpha\, \lambda_t(\pi)\, t_t + \frac{1}{2} \beta \sum_{i \in neighbor(t)} \lambda_{it}(\pi)\, d(\pi_i, \pi_t) + \frac{1}{2} \gamma \sum_{i \in neighbor(t),\ \psi(\pi_i) = \psi(\pi_t)} w_{it}$$

Since every edge cost is split evenly between its two endpoint tasks, the individual costs sum back to the global cost:

$$\mu(\pi) = \sum_{t \in T} \theta_t(\pi)$$
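The slides name best-response dynamics as Nirvana's basis (see the Summary); here is a minimal, hypothetical sketch of such a loop, with `theta` standing in for the individual cost θ_t(π) above. It illustrates the game-theoretic idea, not Ares's actual scheduler code.

```python
def best_response_dynamics(tasks, nodes, theta, pi, max_rounds=1000):
    """Let each task (player) repeatedly move to the node minimizing its
    individual cost theta(t, pi), given the other tasks' placements. Stop
    at a Nash equilibrium: a full round in which no task wants to move."""
    for _ in range(max_rounds):
        changed = False
        for t in tasks:
            # Evaluate every node as a candidate placement for task t.
            best = min(nodes, key=lambda n: theta(t, {**pi, t: n}))
            if theta(t, {**pi, t: best}) < theta(t, pi):
                pi[t] = best
                changed = True
        if not changed:          # Nash equilibrium reached
            break
    return pi
```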


Theoretical Analysis Results


Theorem 1. There exists a Nash equilibrium for the FTS game.

Theorem 2. Our design achieves a 2-approximation ratio for the FTS problem.

Theorem 3. The number of rounds needed to converge to the Nash equilibrium of the FTS game is upper-bounded by h(X+Y+Z).

Theorem 4. The computation complexity of our design is O(2ξ|N|(|T| + 2|E|)), where ξ denotes the number of rounds to converge to the Nash equilibrium of the FTS game.


The Ares Architecture

[Figure: the Ares architecture. The Master Node hosts a Custom Scheduler and the Nirvana Agent (Nirvana-based control), which computes the scheduling solution and publishes it through the Coordinator and a Database; each Worker Node of the distributed stream processing system runs a Supervisor, a Load Monitor, and worker processes containing Executors.]


Evaluation


Setup: a 30-node cluster, implemented on top of Storm; each node is a 16-core Intel Xeon server with 64GB DDR3 memory and a 1Gbps Ethernet interface; baseline: R-Storm [Middleware'15]

Applications: Word Count (following the setting of Heron [SIGMOD'15]); Join (on the TPC-H dataset)

Metrics: throughput, average tuple processing time, average rack recovery time


Is Processing Latency Better?


Average Reduction of 50.2%


Is Recovery Time Better?


Average Reduction of 48.9%


Is Throughput Better?


Average Improvement of 2.24×


Summary


The fault-tolerant scheduling (FTS) problem: the stream latency model and the stream recovery model

The Nirvana algorithm, based on best-response dynamics

Implementation of Ares on top of Storm

Thank you! Any questions?