Clarinet: WAN-Aware Optimization for Analytics Queries

Raajay Viswanathan, Ganesh Ananthanarayanan, Aditya Akella

Overview

• Web apps are hosted in multiple DCs to give end users low-latency access
• Need efficient methods to analyze data located in multiple data centers

Centralized Aggregation is Wasteful

[Figure: data from all DCs is aggregated into one intra-data center analytics framework, which runs SELECT * … FROM .. WHERE .. ;]

• Available WAN bandwidth is limited: aggregation adds latency overhead

[Figure: Measured pairwise bandwidth between EC2 regions. X-axis: directional WAN links sorted by bandwidth; y-axis: bandwidth (Mbps), 0 to 500. Annotated values: 450 Mbps and 20 Mbps.]

• WAN links are expensive: high data transfer cost

Geo-distributed Analytics

[Diagram: an analytics framework treats all DCs as one logical datacenter]
• Distributed Storage Layer
• Distributed Execution Layer
• Query Optimizer: turns SELECT * … FROM .. WHERE .. ; into multi-stage parallelizable jobs

Geo-distributed deployment: requires WAN-aware optimization
Prior WAN-aware work: Iridium [SIGCOMM 15], GeoDe [NSDI 15]

Clarinet (WAN-aware query optimization): 2.7x reduction in query runtime

WAN Aware Query Optimization

Tables T1, T2 and T3 store click logs and reside in DC1, DC2 and DC3, respectively. The three inter-DC WAN links have bandwidths of 80 Gbps, 40 Gbps and 100 Gbps.

QUERY
SELECT T1.user, T1.latency, T2.latency, T3.latency
FROM T1, T2, T3
WHERE T1.user = T2.user AND T1.user = T3.user
AND T1.device = 'mobile' AND T2.device = 'mobile' AND T3.device = 'mobile';

Candidate plans (each filtered input σ_mobile(Ti) is 200 GB; the WAN is assumed to be the only bottleneck):
• Plan A: (σ_mobile(T2) ⋈ σ_mobile(T3)) ⋈ σ_mobile(T1); intermediate result 10 GB; transfers take 40 s + 1 s, so the plan running time is 41 s. Chosen by a network-agnostic query optimizer.
• Plan B: (σ_mobile(T2) ⋈ σ_mobile(T1)) ⋈ σ_mobile(T3); intermediate result 12 GB; plan running time 20.96 s
• Plan C: (σ_mobile(T1) ⋈ σ_mobile(T3)) ⋈ σ_mobile(T2); intermediate result 16 GB; plan running time 17.6 s

A WAN-aware query optimizer instead uses network transfer durations to choose query plans.
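To see where the 41 s, 20.96 s and 17.6 s figures can come from, here is a minimal Python sketch under stated assumptions. The slide does not say which link has which bandwidth; the assignment below (DC1-DC2 = 80 Gbps, DC2-DC3 = 40 Gbps, DC1-DC3 = 100 Gbps) is an inference, chosen because it reproduces all three plan times. The sketch also assumes the WAN is the only bottleneck, each join ships one 200 GB input across a single link, the join output is then shipped to the DC holding the third table, stages run one after another, and a network-agnostic optimizer picks the plan with the least intermediate data. `LINK_GBPS`, `transfer_s` and `plan_runtime` are hypothetical names.

```python
# Assumed link bandwidths in Gbps (inferred: this assignment reproduces the slide's plan times).
LINK_GBPS = {
    frozenset({"DC1", "DC2"}): 80,
    frozenset({"DC2", "DC3"}): 40,
    frozenset({"DC1", "DC3"}): 100,
}


def transfer_s(gb, src, dst):
    """Seconds to ship `gb` gigabytes over the WAN link between src and dst."""
    return gb * 8 / LINK_GBPS[frozenset({src, dst})]


def plan_runtime(stages):
    """WAN-only bottleneck: stages run one after another, one transfer each."""
    return sum(transfer_s(gb, src, dst) for src, dst, gb in stages)


# plan -> (list of (src DC, dst DC, GB shipped), intermediate join output in GB)
PLANS = {
    "A": ([("DC3", "DC2", 200), ("DC2", "DC1", 10)], 10),  # (T2 ⋈ T3) ⋈ T1
    "B": ([("DC2", "DC1", 200), ("DC1", "DC3", 12)], 12),  # (T2 ⋈ T1) ⋈ T3
    "C": ([("DC3", "DC1", 200), ("DC1", "DC2", 16)], 16),  # (T1 ⋈ T3) ⋈ T2
}

runtimes = {name: plan_runtime(stages) for name, (stages, _) in PLANS.items()}
agnostic_choice = min(PLANS, key=lambda name: PLANS[name][1])  # least intermediate data
wan_aware_choice = min(runtimes, key=runtimes.get)             # least WAN transfer time

for name in sorted(runtimes):
    print(f"Plan {name}: {runtimes[name]:.2f} s")
print("network-agnostic:", agnostic_choice, "| WAN-aware:", wan_aware_choice)
```

Under this assumed link assignment, Plan A moves the least intermediate data (10 GB) but pushes a 200 GB input over the slowest (40 Gbps) link, which is why ranking plans by data volume alone picks the slowest plan here.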

Outline

1. Motivation
2. Challenges in choosing a query plan based on WAN transfer durations
3. Solution
   • Single query
   • Multiple simultaneous queries
4. Experimental Evaluation

Other factors also affect query plan run time

Same three DCs (T1 in DC1, T2 in DC2, T3 in DC3) and WAN links (80 Gbps, 40 Gbps, 100 Gbps). Example: a join of σ_C(T1) and σ_C(T2), 200 GB each, runs as a MapReduce job:
• Map: SELECT on T1 and on T2 (200 GB output each)
• Reduce: JOIN

Tasks placed in a single DC: 20 s

Tasks placed uniformly across DC1 and DC2: 10 s
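A small Python sketch of this placement effect, assuming each DC holds 200 GB of map output, the link between DC1 and DC2 carries 80 Gbps in each direction (full duplex), and every DC ships to each other DC the share of its data that matches the fraction of reduce tasks placed there. The grid search over task fractions is a stand-in chosen for brevity, not Clarinet's placement algorithm; `stage_time` and `best_placement` are hypothetical names.

```python
from itertools import product


def stage_time(data_gb, frac, gbps):
    """Finish time of one shuffle stage: the slowest directed inter-DC transfer.

    data_gb: GB of stage input held at each DC.
    frac:    fraction of the stage's tasks placed at each DC.
    gbps:    bandwidth of each directed link (src, dst).
    Every DC ships frac[dst] of its input to each other DC dst, all in parallel.
    """
    worst = 0.0
    for src, gb in data_gb.items():
        for dst, r in frac.items():
            if src != dst and gb * r > 0:
                worst = max(worst, gb * r * 8 / gbps[(src, dst)])
    return worst


def best_placement(data_gb, dcs, gbps, steps=10):
    """Try every split of tasks on a 1/steps grid and keep the fastest one."""
    best = None
    for counts in product(range(steps + 1), repeat=len(dcs)):
        if sum(counts) != steps:
            continue
        frac = {dc: c / steps for dc, c in zip(dcs, counts)}
        t = stage_time(data_gb, frac, gbps)
        if best is None or t < best[0]:
            best = (t, frac)
    return best


data = {"DC1": 200, "DC2": 200}                   # GB produced by the SELECT (map) stage
links = {("DC1", "DC2"): 80, ("DC2", "DC1"): 80}  # assumed full-duplex 80 Gbps
print(stage_time(data, {"DC1": 1.0, "DC2": 0.0}, links))  # all reduce tasks in DC1
print(best_placement(data, ["DC1", "DC2"], links))
```

Under these assumptions the search settles on an even split with a 10 s stage, versus 20 s when every reduce task sits in DC1, matching the two cases above.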

While evaluating different query plans:
1. Plan A: 41 s
2. Plan B: 20.96 s
3. Plan C: 17.6 s
With tasks placed uniformly, the slide annotates revised running times of 20.5 s and 11.2 s.

A WAN link may also be in use by a high-priority application.

Choose the query plan based on:
1. Best available task placements
2. Schedule of network transfers

Joint plan selection, placement and scheduling

SELECT * FROM … WHERE.. ;
• Query Optimizer: multiple query plans (join orders) per query
• Logical plan to physical plan: assign parallelism for each stage
• Clarinet: network-aware task placement and scheduling for each query plan; choose the plan with the smallest run time for execution

Clarinet binds a query to its plan lower in the stack.

Network aware placement and scheduling

• Task placement decided greedily, one stage at a time
  • Minimize per-stage run time
• Scheduling of network transfers
  • Determines start times of inter-DC network transfers
  • Formulated as a Binary Integer Linear Program
  • Factors in transfer dependencies

[Figure: example plan with SELECT stages over T1, T2 and T3 feeding two JOIN stages]
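One way to write such a program is a time-indexed binary formulation. This is a generic sketch for illustration, not necessarily the exact program Clarinet solves. It assumes discrete time slots, that each transfer f occupies its link ℓ(f) exclusively for d_f slots, and a set D of dependency pairs (g, f) in which f may start only after g finishes. The binary variable x_{f,t} is 1 when transfer f starts in slot t, and T is an auxiliary variable for the completion time of the last transfer.

```latex
\begin{aligned}
\min_{x,\,T}\quad & T \\
\text{s.t.}\quad
  & \sum_{t} x_{f,t} = 1 && \forall f \\
  & \sum_{t} (t + d_f)\, x_{f,t} \le T && \forall f \\
  & \sum_{f:\,\ell(f)=\ell}\ \sum_{s=t-d_f+1}^{t} x_{f,s} \le 1 && \forall \ell,\ \forall t \\
  & \sum_{t} t\, x_{f,t} \ge \sum_{t} (t + d_g)\, x_{g,t} && \forall (g, f) \in \mathcal{D} \\
  & x_{f,t} \in \{0, 1\} && \forall f,\ \forall t
\end{aligned}
```

Read top to bottom: every transfer starts exactly once and finishes by T, a link carries at most one transfer in any slot, and a transfer waits for every transfer it depends on.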

How to extend the late-binding strategy to multiple queries?

Queries affect each other's run time

Same three DCs and WAN links (80 Gbps, 40 Gbps, 100 Gbps). Two queries run simultaneously:
QUERY 1: SELECT … device = 'mobile' …;
QUERY 2: SELECT … genre = 'pc' …;

Same query plan (Plan C) for Query 1 and Query 2: (T1 ⋈ T3) ⋈ T2 on each query's filtered tables, with 200 GB inputs and a 16 GB intermediate result.
Contention for the same network links increases query run time.
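A back-of-the-envelope illustration of this contention, assuming the link between DC1 and DC3 is the 100 Gbps one (the assignment that reproduces the single-query plan times earlier) and that the two queries split it equally while both ship a 200 GB input for their first join:

```latex
\begin{aligned}
t_{\text{alone}}  &= \frac{200\ \text{GB} \times 8\ \text{bit/B}}{100\ \text{Gbps}} = 16\ \text{s} \\
t_{\text{shared}} &= \frac{200\ \text{GB} \times 8\ \text{bit/B}}{50\ \text{Gbps}} = 32\ \text{s}
\end{aligned}
```

Under these assumptions the first join of each query takes twice as long as it would if that query ran alone.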

Different query plans for Query 1 (Plan C) and Query 2 (Plan B):
• Query 1, Plan C: (T1 ⋈ T3) ⋈ T2, 16 GB intermediate result
• Query 2, Plan B: (T2 ⋈ T1) ⋈ T3, 12 GB intermediate result
No contention on network links.

Choosing execution plans jointly for multiple queries improves performance.

Iterative Shortest Job First

[Diagram: QUERY A, QUERY B and QUERY C each pass through the query optimizer (QO), which produces several candidate plans per query]

• Choosing the combination of plans that minimizes average completion time is computationally intractable

Clarinet:
• Iterative Shortest Job First (SJF) scheduling heuristic
  1. Pick the shortest physical query plan in each iteration
  • Reserve bandwidth to guarantee completion time

Iter 1: candidate plan durations are 10, 12, 18, 5, 8, 20 and 30. The shortest, B1 (5), is picked and its bandwidth reserved; the timeline for Link 1 and Link 2 shows B1 ending at time 5.
Iterative Shortest Job First

[Figure: Queries A, B, and C each pass through a query optimizer (QO); Clarinet takes the candidate plans from all three QOs.]

• Choosing the best combination of plans to minimize average completion time is computationally intractable
• Iterative Shortest Job First (SJF) scheduling heuristic:
  1. Pick the shortest physical query plan in each iteration
  2. Reserve bandwidth for that plan to guarantee its completion time

[Figure: transfer timelines on Link 1 and Link 2. Iter 1: candidate plan durations 10, 12, 18, 5, 8, 20, 30; the shortest (5) is picked and its transfer B1 is reserved, finishing at t = 5. Iter 2: remaining candidate durations 15, 17, 18, 25, 30; transfers A1 and A2 are then reserved (time ticks at 5, 7, and 15).]
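A minimal sketch of the iterative SJF loop above, under simplifying assumptions that are ours, not the slide's: each candidate plan is a flat list of WAN transfers, a link carries one reserved transfer at a time, and stage dependencies inside a plan are ignored. `Transfer`, `reserve`, and `iterative_sjf` are hypothetical names, not Clarinet's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """A WAN transfer in a candidate plan (hypothetical model)."""
    link: str
    duration: float  # time the transfer occupies its link


def reserve(plan, link_free):
    """Reserve links for the plan's transfers; return (plan finish time, updated reservations).

    Assumes one reserved transfer per link at a time and ignores intra-plan dependencies.
    """
    free = dict(link_free)
    finish = 0.0
    for t in plan:
        start = free.get(t.link, 0.0)  # first instant the link is unreserved
        free[t.link] = start + t.duration
        finish = max(finish, free[t.link])
    return finish, free


def iterative_sjf(queries):
    """queries: {name: [plan, ...]} -> {name: (chosen plan index, finish time)}."""
    link_free, chosen = {}, {}
    pending = dict(queries)
    while pending:
        # Re-estimate every remaining candidate against the reservations made so far.
        options = []
        for q, plans in pending.items():
            for i, plan in enumerate(plans):
                finish, free = reserve(plan, link_free)
                options.append((finish, q, i, free))
        finish, q, i, free = min(options, key=lambda o: o[:3])
        link_free = free  # commit the winning plan's bandwidth reservation
        chosen[q] = (i, finish)
        del pending[q]
    return chosen
```

For example, with hypothetical queries A (a 10-unit plan on Link 1 or a 12-unit plan on Link 2) and B (5 units on Link 1 or 8 on Link 2), iteration 1 picks B's 5-unit plan and reserves Link 1 until t = 5. In iteration 2, A's Link 1 plan would now finish at 15, so its 12-unit Link 2 plan wins. Re-estimating all remaining candidates in every iteration makes the choice depend on the other queries, not just on each query in isolation.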

Avoid fragmentation and improve completion time

• SJF with bandwidth reservation leads to bandwidth fragmentation

[Figure, left panel "Scheduled in SJF order": transfers B1, A1, and A2 on Link 1 and Link 2 (time ticks at 10, 12, and 22). Dominant transfers execute sequentially, and links sit idle for extended periods.]

[Figure, right panel "Alternate schedule with same query plans": the same transfers B1, A1, and A2, rearranged.]

Rearranging transfers can help, even when the result deviates from the SJF schedule.

k-Shortest Jobs First Heuristic

[Figure: offline schedule of transfers on Link 1, Link 2, …, Link n over time t.]

• Identify the transfers of the k shortest jobs that are not yet complete
• Relax their transfer schedule: start each transfer as soon as its link is free and its task is available
• Choose the best k from prior observations or through offline simulations
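The relaxation step above can be sketched as a replay of the offline schedule. The assumptions are ours:

• each reserved transfer is a `Slot` with a reserved start time, a duration, and the transfers whose output it needs;
• each link keeps its offline transfer order;
• "incomplete" is judged using offline completion times.

A transfer whose job is among the k shortest incomplete jobs starts as soon as its link is free and its inputs are ready. Every other transfer keeps its reservation. `best_k` picks k by offline simulation. All names are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One WAN transfer reserved in the offline schedule (hypothetical model)."""
    name: str
    job: str
    link: str
    start: float       # reserved start time
    duration: float
    after: tuple = ()  # transfers that must finish first ("task is available")


def offline_finish(slots):
    """Completion time of every job under the offline, reservation-based schedule."""
    done = defaultdict(float)
    for s in slots:
        done[s.job] = max(done[s.job], s.start + s.duration)
    return dict(done)


def relaxed_run(slots, k):
    """Replay the offline schedule, letting the k shortest incomplete jobs start early.

    Links keep their offline order; a transfer's dependencies must have earlier
    reserved starts. "Incomplete" uses offline finish times (a conservative choice).
    """
    planned = offline_finish(slots)
    rank = sorted(planned, key=planned.get)  # SJF order: shortest job first
    link_free = defaultdict(float)
    end_of, finish = {}, {}
    for s in sorted(slots, key=lambda x: x.start):
        earliest = max([link_free[s.link]] + [end_of[p] for p in s.after])
        # Shorter jobs still running at `earliest`, sorted by when they finish.
        ahead = sorted(planned[j] for j in rank[:rank.index(s.job)] if planned[j] > earliest)
        if len(ahead) < k:
            eligible = earliest              # already among the k shortest incomplete jobs
        elif k > 0:
            eligible = ahead[len(ahead) - k]  # joins once enough shorter jobs finish
        else:
            eligible = math.inf              # k = 0: keep every reservation
        begin = min(max(earliest, eligible), max(earliest, s.start))
        end_of[s.name] = link_free[s.link] = begin + s.duration
        finish[s.job] = max(finish.get(s.job, 0.0), end_of[s.name])
    return finish


def best_k(slots, candidates):
    """Choose k by offline simulation: the value with the lowest average completion time."""
    def avg(k):
        f = relaxed_run(slots, k)
        return sum(f.values()) / len(f)
    return min(candidates, key=avg)


# Hypothetical example: the offline schedule leaves L2 idle over [0, 6) before Y1.
example = [
    Slot("X1", "X", "L1", start=0, duration=4),
    Slot("Y1", "Y", "L2", start=6, duration=3),
]
```

In the example, k = 0 moves nothing (average completion 6.5). With k = 1, Y1 may start once X, the only shorter job, finishes at t = 4 (average 5.5). With k = 2, Y1 starts at t = 0 (average 3.5). This toy model is too simple to show when a smaller k would be preferable. The slide leaves that choice to prior observations or offline simulation.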

Clarinet Implementation

[Figure: a batch of queries (QUERY 1, QUERY 2, QUERY 3), or online query arrivals, passes through existing query optimizers (QO) to Clarinet, which sits above the execution framework.]

Existing query optimizers:
• Modified Hive to generate multiple plans
• QOs control the set of generated plans
• Existing optimizations are still applied, e.g., select push-down and partition pruning

Execution framework, which enforces Clarinet's schedule:
• Modified Tez's DAGScheduler
• Fairness guarantees
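The slide does not say how the modified Hive enumerates plans. As a rough illustration only, the sketch below generates alternative left-deep join orders for one query. Different join orders can move different amounts of intermediate data across WAN links.

The sketch makes these assumptions:

• It caps the output at 64 plans, the count reported in the overhead evaluation.
• It uses a hypothetical `allowed` filter as a stand-in for the QO controlling which plans are generated.
• `candidate_join_orders` and `no_cross_products` are our names, not Hive APIs.

```python
from itertools import islice, permutations


def no_cross_products(edges):
    """Hypothetical plan filter: keep join orders where each new table joins an earlier one."""
    joinable = {frozenset(e) for e in edges}

    def ok(order):
        return all(
            any(frozenset((t, prev)) in joinable for prev in order[:i])
            for i, t in enumerate(order)
            if i > 0
        )

    return ok


def candidate_join_orders(tables, max_plans=64, allowed=None):
    """Return up to `max_plans` left-deep join orders that pass the `allowed` filter."""
    orders = (p for p in permutations(tables) if allowed is None or allowed(p))
    return list(islice(orders, max_plans))
```

For instance, with a hypothetical join graph in which `store_sales` joins both `item` and `date_dim`, the filter drops the two orders that would join `item` directly with `date_dim` and keeps the remaining four.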

Evaluation

Compare Clarinet with the following GDA approaches:
1. Hive: WAN-agnostic task placement + scheduling
2. Hive + Iridium: WAN-aware task placement across DCs
3. Hive + Reducers in single DC: distributed filtering + central aggregation

• Geo-Distributed Analytics (GDA) stack across 10 EC2 regions
• Workload: 30 batches of 12 randomly chosen TPC-DS queries

Evaluation: Reduction in average completion time

GDA approach                    Average gain vs. Hive
Clarinet                        2.7x
Hive + Iridium                  1.5x
Hive + Reducers in single DC    0.6x

Clarinet chooses a different plan than Hive for 75% of queries.

[Figure: CDF over link ID (sorted by bandwidth) of the WAN bandwidth distribution, Hive's bytes distribution, and Clarinet's bytes distribution; data from a single batch of 12 queries.]
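To read the gains above, it may help to spell out the ratio. We assume (the slide does not state it) that a gain is Hive's average completion time divided by the approach's average completion time. If the reported figure is instead an average of per-batch ratios, the conversions below are only approximate.

```latex
\begin{aligned}
G_X &= \frac{\bar{T}_{\text{Hive}}}{\bar{T}_X}
  \quad\Longrightarrow\quad \bar{T}_X = \frac{\bar{T}_{\text{Hive}}}{G_X} \\
G_{\text{Clarinet}} = 2.7 &\;\Longrightarrow\; \bar{T}_{\text{Clarinet}} \approx 0.37\,\bar{T}_{\text{Hive}} \\
G_{\text{single DC}} = 0.6 &\;\Longrightarrow\; \bar{T}_{\text{single DC}} \approx 1.67\,\bar{T}_{\text{Hive}}
\end{aligned}
```

Under this reading, a gain below 1 means the approach is slower than plain Hive on average. Centralizing reducers in a single DC is therefore about 1.7x slower than Hive.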

Evaluation: Optimization overhead

1. Generate multiple query plans: up to 64 plans in less than 5 s
2. Iterative multi-query plan selection: at most 15 s for batches of 12 queries

Insignificant relative to query running times (on the order of tens of minutes)

Summary

• WAN-awareness in the query optimizer (QO) + cross-layer optimization
• Presented a scalable way to implement multi-query optimization with minimal overhead
• 2.7x reduction in average completion time

[Figure: Clarinet alongside the Query Optimizer, the Distributed Execution Layer, and the Distributed Storage Layer.]

