
Post on 27-Mar-2015


Implementing Declarative Overlays

Boon Thau Loo (1), Tyson Condie (1), Joseph M. Hellerstein (1,2), Petros Maniatis (2), Timothy Roscoe (2), Ion Stoica (1)

(1) University of California at Berkeley, (2) Intel Research Berkeley

P2

Overlays Everywhere…

Overlay networks are widely used today:
- Routing and forwarding component of large-scale distributed systems
- Provide new functionality over existing infrastructure

Many examples, variety of requirements:
- Packet delivery: Multicast, RON
- Content delivery: CDNs, P2P file sharing, DHTs
- Enterprise systems: MS Exchange

Overlay networks are an integral part of many large-scale distributed systems.

Problem

Non-trivial to design, build and deploy an overlay correctly:
- Iterative design process: desired properties -> distributed algorithms and protocols -> simulation -> implementation -> deployment -> repeat…
- Each iteration takes significant time and utilizes a variety of expertise

The Goal of P2

Make overlay development more accessible:
- Focus on algorithms and protocol designs, not the implementation

Tool for rapid prototyping of new overlays:
- Specify overlay network at a high level
- Automatically translate specification to protocol
- Provide execution engine for protocol

Aim for “good enough” performance:
- Focus on accelerating the iterative design process
- Can always hand-tune implementation later

Outline

- Overview of P2
- Architecture By Example
  - Data Model
  - Dataflow framework
  - Query Language
- Chord
- Additional Benefits
  - Overlay Introspection
  - Automatic Optimizations
- Conclusion

Traditional Overlay Node

[Figure: a node whose overlay program consumes incoming packets, maintains network state (route, ...), and emits outgoing packets.]

P2 Overlay Node

[Figure: the overlay program is an overlay description in a declarative query language. The planner translates it into an overlay description in a dataflow scripting language. Inside the P2 query processor, runtime dataflows (Network In dataflow, Network Out dataflow) maintain network state in the node's local tables (route, ...). Packets in, packets out.]

Advantages of the P2 Approach

- Declarative query language
  - Concise, high-level expression
  - Statically checkable (termination, correctness)
- Ease of modification
- Unifying framework for introspection and implementation
- Automatic optimizations
  - Query and dataflow level

Data Model

Relational data: relational tables and tuples

Two kinds of tables:
- Stored, soft state, e.g. neighbor(Src,Dst), forward(Src,Dst,NxtHop)
- Transient streams:
  - Network messages: message(Rcvr, Dst)
  - Local timer-based events: periodic(NodeID,10)
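As an illustration of what "stored, soft state" can mean in practice, here is a minimal Python sketch of a table whose tuples expire unless they are re-inserted. The class name `SoftStateTable`, the `lifetime_s` parameter and the injectable clock are assumptions made for this example; they are not taken from P2, whose elements are written in C++.

```python
import time


class SoftStateTable:
    """Relation whose tuples expire unless they are re-inserted (soft state)."""

    def __init__(self, name, lifetime_s, clock=time.monotonic):
        self.name = name
        self.lifetime_s = lifetime_s   # assumed per-table tuple lifetime, in seconds
        self.clock = clock             # injectable clock, convenient for testing
        self._expiry = {}              # tuple -> absolute expiry time

    def insert(self, *fields):
        # Re-inserting an existing tuple simply refreshes its expiry time.
        self._expiry[tuple(fields)] = self.clock() + self.lifetime_s

    def scan(self):
        # Drop expired tuples lazily, then return the live ones.
        now = self.clock()
        self._expiry = {t: e for t, e in self._expiry.items() if e > now}
        return list(self._expiry)


# A fake clock makes the expiry behaviour visible without sleeping.
now = [0.0]
neighbor = SoftStateTable("neighbor", lifetime_s=10, clock=lambda: now[0])
neighbor.insert("IP40", "IP58")    # expires at t = 10
now[0] = 5.0
neighbor.insert("IP40", "IP60")    # expires at t = 15
now[0] = 12.0
live = neighbor.scan()             # only the second tuple is still alive
print(live)
```

Transient streams, by contrast, would never be stored at all: a `message` or `periodic` tuple is handed to the dataflow once and then discarded.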


Dataflow framework

- Dataflow graph
- C++ dataflow elements

Similar to Click:
- Flow elements (mux, demux, queues)
- Network elements (cc, retry, rate limitation)

In addition:
- Relational operators (joins, selections, projections, aggregation)


Outline

- Overview of P2
- Architecture By Example
  - Data Model
  - Dataflow framework
  - Query Language
- Chord in P2
- Additional Benefits
  - Overlay Introspection
  - Automatic Optimizations
- Conclusion

Example: Ring Routing

Simple ring routing example:
- Each node has an address and an identifier.
- Each object has an identifier.
- Every node knows its successor.
- Objects are “served” by their successor.

[Figure: identifier ring with nodes 3, 13, 15, 18, 28, 37, 40, 58 and 60 and objects 0, 22, 24, 33, 42 and 56.]

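Ring State

Stored tables:
- node(NAddr, N)
- succ(NAddr, Succ, SAddr)

Example tuples:
- At node 40: node(IP40,40), succ(IP40,58,IP58)
- At node 58: node(IP58,58), succ(IP58,60,IP60)

[Figure: identifier ring with nodes 3, 13, 15, 18, 28, 37, 40, 58 and 60.]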

Example: Ring lookup

Find the responsible node for a given key k?

n.lookup(k)
  if k in (n, n.successor]
    return n.successor.addr
  else
    return n.successor.lookup(k)

[Figure: the identifier ring from the previous example, with the message lookup(IP40,IP37,59) in flight.]
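The pseudocode can be made runnable with a few lines of Python. This is a sketch under stated assumptions: the node identifiers are the ones read off the ring figure, addresses are formed as `IP<id>` to match the slide's tuples, and the helper names (`in_half_open`, `build_ring`) and the `path` argument are introduced here purely for illustration.

```python
def in_half_open(k, a, b):
    """True if k lies in the circular interval (a, b] on the identifier ring."""
    if a < b:
        return a < k <= b
    # The interval wraps past the largest identifier (whole ring if a == b).
    return k > a or k <= b


class Node:
    def __init__(self, ident):
        self.id = ident
        self.addr = f"IP{ident}"
        self.successor = None

    def lookup(self, k, path=None):
        # `path` records the nodes visited; it is not part of the slide's pseudocode.
        path = [] if path is None else path
        path.append(self.id)
        if in_half_open(k, self.id, self.successor.id):
            return self.successor.addr, path
        return self.successor.lookup(k, path)


def build_ring(ids):
    """Link every node to the node with the next larger identifier, wrapping around."""
    nodes = [Node(i) for i in sorted(ids)]
    for n, nxt in zip(nodes, nodes[1:] + nodes[:1]):
        n.successor = nxt
    return {n.id: n for n in nodes}


ring = build_ring([3, 13, 15, 18, 28, 37, 40, 58, 60])
owner, path = ring[37].lookup(59)
print(owner, path)   # IP60 [37, 40, 58]
```

The lookup for key 59 issued at node 37 visits nodes 37, 40 and 58 before node 58 reports its successor, IP60, as the owner. Each hop only needs the local node identifier and successor entry.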

Ring Lookup Events

Event streams:
- lookup(Addr, Req, K)
- response(Addr, K, Owner)

[Figure: on the ring, lookup(IP37,IP37,59) is forwarded along successors until lookup(IP58,IP37,59) reaches node 58, which answers with response(IP37,59,IP60). Node 40 stores node(IP40,40) and succ(IP40,58,IP58); node 58 stores node(IP58,58) and succ(IP58,60,IP60).]

Pseudocode -> Dataflow “Strands”

Pseudocode:

n.lookup(k)
  if k in (n, n.successor]
    return n.successor.addr
  else
    return n.successor.lookup(k)

Dataflow strand: Event Stream -> Strand Elements (Element1, Element2, ..., Elementn) -> Actions
- Event: incoming network messages, periodic timers
- Condition: process the event using strand elements
- Action: outgoing network messages, local table updates

[Figure: a P2 node in which Strand 1, Strand 2, ... run between the Network In dataflow and the Network Out dataflow and read the node's local tables (node, succ, ...).]

Pseudocode -> Strand 1

Stored tables:
- node(NAddr, N)
- succ(NAddr, Succ, SAddr)

Event streams:
- lookup(Addr, Req, K)
- response(Addr, K, Owner)

Strand 1 covers the first branch of the pseudocode (k in (n, n.successor], return n.successor.addr):
- Event: RECEIVE lookup(NAddr, Req, K)
- Condition: node(NAddr, N) & succ(NAddr, Succ, SAddr) & K in (N, Succ]
- Action: SEND response(Req, K, SAddr) to Req

Dataflow strand:

lookup -> Match lookup.Addr = node.Addr [Join] -> Match lookup.Addr = succ.Addr [Join] -> Filter K in (N, Succ] [Select] -> Format Response(Req, K, SAddr) [Project] -> response

[Figure: Strand 1 placed in the P2 node between the Network In and Network Out dataflows, reading the node and succ tables.]

Pseudocode -> Strand 2

Strand 2 covers the else branch of the pseudocode (return n.successor.lookup(k)):
- Event: RECEIVE lookup(NAddr, Req, K)
- Condition: node(NAddr, N) & succ(NAddr, Succ, SAddr) & K not in (N, Succ]
- Action: SEND lookup(SAddr, Req, K) to SAddr

Dataflow strand:

lookup -> Join lookup.Addr = node.Addr -> Join lookup.Addr = succ.Addr -> Select K not in (N, Succ] -> Project lookup(SAddr, Req, K) -> lookup

[Figure: Strand 2 placed in the P2 node between the Network In and Network Out dataflows, reading the node and succ tables.]
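The two strands can be sketched as chains of small relational elements. The Python below is only an illustration of the join, join, select, project shape. The element constructors (`join`, `select`, `project`, `strand`) are hypothetical names, the tables of nodes 40 and 58 are kept in one list for brevity (in P2 each node holds only its own tuples), and real P2 elements are C++ objects wired into a dataflow graph.

```python
def in_half_open(k, a, b):
    """True if k lies in the circular interval (a, b]."""
    if a < b:
        return a < k <= b
    return k > a or k <= b


def join(table):
    """Element: extend each tuple with the rows of `table` that share its first field."""
    def element(stream):
        for tup in stream:
            for row in table:
                if row[0] == tup[0]:
                    yield tup + row[1:]
    return element


def select(pred):
    """Element: pass on only the tuples that satisfy `pred`."""
    def element(stream):
        return (tup for tup in stream if pred(*tup))
    return element


def project(fn):
    """Element: rewrite each tuple into the outgoing tuple."""
    def element(stream):
        return (fn(*tup) for tup in stream)
    return element


def strand(*elements):
    """Chain elements; running the strand pushes events through all of them."""
    def run(events):
        stream = iter(events)
        for el in elements:
            stream = el(stream)
        return list(stream)
    return run


# Local tables, using the tuples from the Ring State slide.
node = [("IP40", 40), ("IP58", 58)]
succ = [("IP40", 58, "IP58"), ("IP58", 60, "IP60")]

# After both joins a tuple is (NAddr, Req, K, N, Succ, SAddr).
strand1 = strand(
    join(node), join(succ),
    select(lambda NAddr, Req, K, N, Succ, SAddr: in_half_open(K, N, Succ)),
    project(lambda NAddr, Req, K, N, Succ, SAddr: ("response", Req, K, SAddr)),
)
strand2 = strand(
    join(node), join(succ),
    select(lambda NAddr, Req, K, N, Succ, SAddr: not in_half_open(K, N, Succ)),
    project(lambda NAddr, Req, K, N, Succ, SAddr: ("lookup", SAddr, Req, K)),
)

at_40 = [("IP40", "IP37", 59)]   # lookup(IP40, IP37, 59)
at_58 = [("IP58", "IP37", 59)]   # lookup(IP58, IP37, 59)
out_40 = strand1(at_40) + strand2(at_40)
out_58 = strand1(at_58) + strand2(at_58)
print(out_40)   # node 40 forwards the lookup to IP58
print(out_58)   # node 58 answers the requester IP37
```

Because the two selections are complementary, exactly one strand produces output for any given lookup event.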

Strand Execution

[Figure: lookup tuples arrive through the Network In dataflow and are handed to Strand 1 and Strand 2, which read the node's local tables (node, succ, ...). The resulting lookup or response tuples leave through the Network Out dataflow.]

Actual Chord Lookup Dataflow

[Figure: dataflow graph for the Chord lookup rules L1, L2 and L3.
- L1: Join lookup.NI == node.NI -> Join lookup.NI == bestSucc.NI -> Select K in (N, S] -> Project lookupRes
- L2: Join lookup.NI == node.NI -> Agg min<D> on finger, D := K-B-1, B in (N,K)
- L3: Join bestLookupDist.NI == node.NI -> Agg min<BI> on finger, D == K-B-1, B in (N,K)
Other elements shown: Network In, Demux (tuple name), Dup, Mux, Queue, RoundRobin, TimedPullPush 0, Demux (@local?) with local and remote outputs, Network Out, and Insert elements under Materializations. Tables and streams shown: node, finger, bestSucc, bestLookupDist, lookup.]
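The two aggregates in the figure (min<D> with D := K-B-1 over fingers B in (N,K), then min<BI> among the fingers achieving that D) pick the finger that most closely precedes the key. A rough Python sketch follows, with several assumptions made for illustration: identifiers live on a ring of `RING_SIZE` values and the distance is taken modulo that size, fingers are given as (B, BI) pairs of identifier and address, and the function names are hypothetical.

```python
RING_BITS = 6                  # assumed identifier width, just enough for the toy ring
RING_SIZE = 1 << RING_BITS


def in_open(x, a, b):
    """True if x lies in the circular open interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


def best_finger(n, k, fingers):
    """Return (D, BI) for the finger that most closely precedes key k, or None.

    Taking the minimum over (D, BI) pairs minimises D first and breaks ties
    on BI, which mirrors the min<D> and min<BI> aggregates in the figure.
    """
    candidates = [
        ((k - b - 1) % RING_SIZE, bi)
        for b, bi in fingers
        if in_open(b, n, k)
    ]
    return min(candidates, default=None)


fingers = [(40, "IP40"), (58, "IP58"), (3, "IP3")]
choice = best_finger(37, 59, fingers)
print(choice)   # (0, 'IP58'): finger 58 immediately precedes key 59
```

With this choice a lookup for key 59 at node 37 would be forwarded straight to IP58 instead of walking the ring one successor at a time.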

Query Language: Overlog

“SQL” equivalent for overlay networks

Based on Datalog:
- Declarative recursive query language
- Well-suited for querying properties of graphs
- Well-studied in database literature: static analysis, optimizations, etc.

Extensions:
- Data distribution, asynchronous messaging, periodic timers and state modification

Query Language: Overlog

Datalog rule syntax: <head> :- <condition1>, <condition2>, … , <conditionN>.

Overlog rule syntax: <Action> :- <event>, <condition1>, … , <conditionN>.

Query Language: Overlog

Event: RECEIVE lookup(NAddr, Req, K)

Condition: lookup(NAddr, Req, K) & node(NAddr, N) & succ(NAddr, Succ, SAddr) & K in (N, Succ]

Action: SEND response(Req, K, SAddr) to Req

The same rule in Overlog:

response@Req(Req, K, SAddr) :-
    lookup@NAddr(NAddr, Req, K),
    node@NAddr(NAddr, N),
    succ@NAddr(NAddr, Succ, SAddr),
    K in (N, Succ].

Overlog rule syntax: <Action> :- <event>, <condition1>, … , <conditionN>.
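To make the reading of such a rule concrete, the sketch below evaluates the response rule for one incoming event in Python. It is a toy interpreter under explicit assumptions: every term in the rule body is a variable, the range test is supplied as a Python function, and the dictionary layout of `rule` is invented for this example. It does not reflect how the P2 planner compiles rules into dataflows.

```python
def in_half_open(k, a, b):
    """True if k lies in the circular interval (a, b]."""
    if a < b:
        return a < k <= b
    return k > a or k <= b


def match(pattern, fact, env):
    """Bind the variables of one predicate to a tuple; None if they conflict."""
    if len(pattern) != len(fact):
        return None
    env = dict(env)
    for var, value in zip(pattern, fact):
        if env.setdefault(var, value) != value:
            return None           # variable already bound to a different value
    return env


def evaluate(rule, tables, event):
    """Fire one rule for one event tuple; return (destination, head tuple) pairs."""
    first = match(rule["event"][1], event, {})
    envs = [] if first is None else [first]
    for name, pattern in rule["conditions"]:
        next_envs = []
        for env in envs:
            for fact in tables[name]:
                bound = match(pattern, fact, env)
                if bound is not None:
                    next_envs.append(bound)
        envs = next_envs
    envs = [env for env in envs if rule["filter"](env)]
    head_name, location, terms = rule["head"]
    return [(env[location], (head_name,) + tuple(env[t] for t in terms)) for env in envs]


# response@Req(Req, K, SAddr) :- lookup@NAddr(...), node@NAddr(...), succ@NAddr(...), K in (N, Succ].
rule = {
    "head": ("response", "Req", ("Req", "K", "SAddr")),   # name, location variable, terms
    "event": ("lookup", ("NAddr", "Req", "K")),
    "conditions": [("node", ("NAddr", "N")), ("succ", ("NAddr", "Succ", "SAddr"))],
    "filter": lambda env: in_half_open(env["K"], env["N"], env["Succ"]),
}
tables = {"node": [("IP58", 58)], "succ": [("IP58", 60, "IP60")]}   # state at node 58
sent = evaluate(rule, tables, ("IP58", "IP37", 59))
print(sent)
```

The location variable after `@` in the head decides where the derived tuple is sent: here the response is addressed to the requester `Req`, while all body predicates are evaluated at `NAddr`.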

P2-Chord

Chord routing, including:
- Multiple successors
- Stabilization
- Optimized finger maintenance
- Failure recovery

47 OverLog rules, 13 table definitions

[Figure: the rule listing, set in 10 pt font.]

Other examples: Narada, flooding, routing protocols

Performance Validation

Experimental setup:
- 100 nodes on Emulab testbed
- 500 P2-Chord nodes

Main goals:
- Validate expected network properties

Sanity Checks

- Logarithmic diameter and state (“correct”)
- BW-efficient: 300 bytes/s/node

Churn Performance

Metric: Consistency [Rhea et al.]

P2-Chord:
- P2-Chord@64mins: 97% consistency
- P2-Chord@16mins: 84% consistency
- P2-Chord@8mins: 42% consistency

Hand-crafted Chord:
- MIT-Chord@47mins: 99.9% consistency
- Outperforms P2 under higher churn

Not intended to replace a carefully hand-crafted Chord

Benefits of P2

- Introspection with Queries
- Automatic optimizations
- Reconfigurable Transport (WIP)

Introspection with Queries

Unifying framework for debugging and implementation:
- Same query language, same platform

Execution tracing/logging:
- Rule and dataflow level
- Log entries stored as tuples and queried

Correctness invariants, regression tests as queries:
- “Is the Chord ring well formed?” (3 rules)
- “What is the network diameter?” (5 rules)
- “Is Chord routing consistent?” (11 rules)

With Atul Singh (Rice) and Peter Druschel (MPI)
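As a flavour of what an invariant such as “Is the Chord ring well formed?” checks, here is a centralized Python sketch over collected succ tuples. It is not the three-rule Overlog query from the talk, which is not reproduced here. It assumes one successor per node, as in the simple ring example, and takes "well formed" to mean that following successor pointers from any node visits every node once and returns to the start.

```python
def ring_well_formed(succ):
    """succ: iterable of (NAddr, Succ, SAddr) tuples gathered from every node."""
    next_hop = {}
    for naddr, _succ_id, saddr in succ:
        if naddr in next_hop:
            return False          # more than one successor recorded for a node
        next_hop[naddr] = saddr
    if not next_hop:
        return False
    start = next(iter(next_hop))
    seen, cur = set(), start
    while cur not in seen:
        if cur not in next_hop:
            return False          # successor pointer leads to an unknown node
        seen.add(cur)
        cur = next_hop[cur]
    # Well formed only if the walk closed at the start and covered every node.
    return cur == start and len(seen) == len(next_hop)


good = [("IP40", 58, "IP58"), ("IP58", 60, "IP60"), ("IP60", 40, "IP40")]
bad = [("IP40", 58, "IP58"), ("IP58", 60, "IP60"), ("IP60", 58, "IP58")]
print(ring_well_formed(good), ring_well_formed(bad))   # True False
```

In P2 the same kind of check would be written as Overlog rules and run on the nodes themselves, so the invariant is evaluated by the system that implements the overlay.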

Automatic Optimizations

Application of traditional Datalog optimizations to network routing protocols (SIGCOMM 2005)

Multi-query sharing:
- Common “subexpression” elimination
- Caching and reuse of previously computed results
- Opportunistically share message propagation across rules

[Figure: the two lookup strands. Each begins with Join lookup.Addr = node.Addr -> Join lookup.Addr = succ.Addr. One continues with Select K in (N, Succ] -> Project Response(Req, K, SAddr) -> response, the other with Select K not in (N, Succ] -> Project lookup(SAddr, Req, K) -> lookup. The common join prefix is a candidate for sharing.]

Automatic Optimizations

Cost-based optimizations:
- Join ordering affects performance

[Figure: the lookup and response strands, each starting with the joins lookup.Addr = node.Addr and lookup.Addr = succ.Addr, followed by Select K not in (N, Succ] -> Project lookup(SAddr, Req, K) and Select K in (N, Succ] -> Project Response(Req, K, SAddr).]

Open Questions

- The role of rapid prototyping?
- How good is “good enough” performance for rapid prototypes?
- When do developers move from rapid prototypes to hand-crafted code?
- Can we achieve “production quality” overlays from P2?

Future Work

- “Right” language
- Formal data and query semantics
- Static analysis
  - Optimizations
  - Termination
  - Correctness

Conclusion

P2: Declarative Overlays
- Tool for rapid prototyping new overlay networks

Declarative Networks
- Research agenda: Specify and construct networks declaratively
- Declarative Routing: Extensible Routing with Declarative Queries (SIGCOMM 2005)

Thank You

http://p2.cs.berkeley.edu

P2

Latency CDF for P2-Chord

Median and average latency around 1s.