SDN: Google's B4 and Traffic Engineering
Outline
1 B4: Experience with a Globally-Deployed Software Defined WAN
2 Achieving High Utilization with Software-Driven WAN
Introduction
Modern WANs are critical to performance and reliability.
Typically provisioned to 30-40% average utilization (2-3x bandwidth-cost over-provisioning).
Result: high overheads on top of already high bandwidth requirements.
Introduction
Google's WAN is one of the largest in the Internet.
Delivers a range of services: search, video, cloud computing, etc.
Architecturally, two distinct WANs:
1 User-facing network: peers with other Internet domains; carries user traffic.
2 B4:
◮ connectivity between data centers
◮ 90% of internal traffic runs on this network,
e.g., asynchronous data copies, end-user data replication, etc.
Why two different WANs? Different requirements (e.g., priority, latency, etc.).
Internet traffic continues to grow rapidly, but Google's WAN traffic grows even faster.
Introduction
SDN approach for the data-center WAN interconnect.
Motivation:
◮ deploy routing and TE protocols customized to Google's unique requirements
Design goals:
◮ treat failures as common events
◮ switches provide a programmatic interface under central control
Introduction
Why an SDN-based solution?
Limitations of traditional WAN architectures:
Elastic bandwidth demands: the majority of traffic is elastic and tolerant of transient failures.
Moderate number of sites: a few dozen data centers.
End application control: control the network at every level with more flexibility, reducing over-provisioning of resources.
Cost sensitivity: nearly impossible to match the growing demand with traditional approaches.
Others include the success of SDN and OpenFlow, rapid iteration on novel protocols, improved capacity planning, scalability, flexibility, etc.
Introduction
Manage switches using SDN principles.
SDN application: support standard routing protocols + a centralized TE service.
◮ Edge servers make decisions based on resource availability.
◮ Use multipath forwarding based on application priority.
◮ Dynamically reallocate bandwidth on link/switch failures.
This allows B4 to achieve:
◮ near 100% utilization on many B4 links
◮ 70% average utilization across all links
(i.e., a 2-3x efficiency improvement over standard practice)
Design - Overview
Logically, a three-layered architecture.
B4 WAN: consists of multiple sites; within each site, the switch hardware layer forwards traffic.
Site Controller layer: Network Control Servers (NCS) hosting both OpenFlow Controllers (OFCs) and Network Control Applications (NCAs).
- OFC maintains network state based on NCA directives
- Paxos for fault tolerance of individual servers
Global layer: logically centralized applications such as the SDN Gateway and the central TE server.
- enables central control of the entire network
- the SDN Gateway provides abstractions to the TE server
Design - Overview
Options for integrating existing routing protocols with centralized traffic engineering:
Approach 1: build one integrated, centralized service combining both routing and TE.
Approach 2: build routing and centralized TE as separate, independent services.
Which one would you prefer?
Design - Overview
Approach 2: build routing and centralized TE as separate, independent services.
Why?
Focus on SDN infrastructure development.
Debug the SDN architecture before adding new features.
The TE layer sits on top of the routing protocols.
A BIG RED BUTTON to disable TE (falling back to shortest-path forwarding).
Design - Switch Design
Conventional designs need deep buffers, large forwarding tables, and hardware support for high availability.
For B4, Google avoids these requirements by:
◮ adjusting transmission rates through careful endpoint management
◮ exploiting the modest number of data centers + abstraction, yielding smaller forwarding tables
◮ moving software functionality from switches to upper layers
Need for custom switches:
switches that export low-level control over forwarding behavior.
Design - Switch Design
High-radix switch: deploying fewer, larger switches yields easier management and software scalability.
B4 switches: built from multiple merchant silicon switch chips in a two-stage Clos topology.
Figure: High-radix switch
Design - Network Control Functionality
The majority of the functionality runs on the NCS.
Paxos handles leader election for all control functionality:
◮ failure detection
◮ new leader election
Modified Onix for the OFC:
◮ the OFC maintains the Network Information Base (NIB),
e.g., topology info, trunk configurations, link status, etc.
Design - Routing
How to integrate OpenFlow-based switches with existing routing protocols?
Google chose the Quagga stack for BGP/ISIS on the NCS.
Developed an SDN application called the Routing Application Proxy (RAP).
RAP provides connectivity between Quagga and the OF switches for:
◮ BGP/ISIS route updates
◮ routing-protocol packets flowing between switches and Quagga
◮ interface updates from the switches to Quagga
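To make the relaying concrete, here is a minimal Python sketch of how a RAP-like shim might translate a route learned by Quagga into switch flow entries; the RIB-entry format and flow-rule fields are illustrative assumptions, not Google's implementation:

def rib_to_flow_entries(rib_entry):
    # Translate one RIB entry (a plain dict here) into per-switch flow
    # entries; one entry per next hop learned via BGP/ISIS.
    flows = []
    for next_hop, out_port in rib_entry["next_hops"]:
        flows.append({
            "match": {"ipv4_dst": rib_entry["prefix"]},  # LPM match on dest
            "actions": [("set_next_hop", next_hop), ("output", out_port)],
        })
    return flows

# Hypothetical route: prefix reachable via one next hop on port 3.
print(rib_to_flow_entries({"prefix": "10.1.0.0/16",
                           "next_hops": [("10.0.0.2", 3)]}))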
Traffic Engineering
Goal: share bandwidth among competing applications/flow groups.
Objective function: max-min fair allocation.
Traffic Engineering
Notions:
Network Topology: a graph representing sites as vertices and site-to-site connectivity as edges.
Flow Group (FG): applications are aggregated into flow groups, each defined by a {source site, dest site, QoS} tuple.
Tunnel (T): a site-level path in the network, i.e., a sequence of sites (A ⇒ B ⇒ C).
Tunnel Group (TG): maps an FG to a set of tunnels (T) and corresponding split weights.
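To make these notions concrete, a minimal Python sketch of the data model (field names and the QoS label are illustrative assumptions, not B4's actual schema):

from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class FlowGroup:
    # Aggregated application traffic: {source site, dest site, QoS}.
    src_site: str
    dst_site: str
    qos: str

@dataclass(frozen=True)
class Tunnel:
    # A site-level path, e.g., ("A", "B", "C").
    sites: Tuple[str, ...]

@dataclass
class TunnelGroup:
    # Maps an FG onto tunnels with split weights summing to 1.
    fg: FlowGroup
    tunnels: List[Tunnel]
    weights: List[float]

tg = TunnelGroup(fg=FlowGroup("A", "C", "best-effort"),
                 tunnels=[Tunnel(("A", "C")), Tunnel(("A", "B", "C"))],
                 weights=[0.75, 0.25])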
Traffic Engineering
Figure: Overview of Traffic Engineering
TE - Bandwidth Functions
Associate a bandwidth function with every application.
Admin-specified static weights define the slope of the function.
Allocate bandwidth based on each flow's relative priority (fair share).
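A minimal sketch of such a function, assuming a simple linear shape: it maps the fair share a flow is awarded to the bandwidth it should receive, with the admin weight as the slope and the application's demand as a cap (the cap and all numbers are illustrative):

def bandwidth_function(weight: float, demand: float):
    # Returns a map from fair share to bandwidth: a line with slope
    # `weight`, capped at the application's demand.
    def bw(fair_share: float) -> float:
        return min(weight * fair_share, demand)
    return bw

app_hi = bandwidth_function(weight=10.0, demand=15.0)  # high-priority app
app_lo = bandwidth_function(weight=1.0, demand=15.0)   # low-priority app
# At the same fair share of 1.0, the high-priority application is
# entitled to 10x the bandwidth of the low-priority one.
print(app_hi(1.0), app_lo(1.0))  # 10.0 1.0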
TE - Max-Min Fair Allocation
Formal definition:
Resources are allocated to sources in order of increasing demand.
No source gets a resource share larger than its demand.
Sources with unsatisfied demands get an equal share of the resource.
S. Keshav, An Engineering Approach to Computer Networking, Addison-Wesley, Reading, MA, 1997, pp. 215-217.
TE - Max-Min Fair Allocation
Figure: Example of Max-Min Fair Allocation
1 Assign (10 Mbps / 4 flows) = 2.5 Mbps per flow.
2 Sum the over-assigned amounts (the residual): flow 1, with a demand of 2 Mbps, is over-assigned 0.5 Mbps.
3 Assign (residual / number of under-assigned flows) = 0.5/3 ≈ 0.167 Mbps extra to each remaining flow.
4 Repeat steps 2 and 3 with the new residual until no residual is left or no allocation exceeds its demand.
Final assignment: Flow 1 = 2 Mbps, Flow 2 = 2.6 Mbps, Flow 3 = 2.7 Mbps, Flow 4 = 2.7 Mbps.
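A minimal Python sketch of this waterfall procedure (it also takes weights, used on the next slide). The demands of flows 3 and 4 are not given on the slide, so the values below are assumptions chosen to be large enough not to bind:

def weighted_max_min(capacity, demands, weights=None):
    # Repeatedly split the residual capacity among unsatisfied flows in
    # proportion to their weights; satisfied flows return their excess.
    n = len(demands)
    weights = weights or [1.0] * n
    alloc, active, residual = [0.0] * n, set(range(n)), capacity
    while residual > 1e-9 and active:
        unit = residual / sum(weights[i] for i in active)
        residual = 0.0
        for i in list(active):
            share = alloc[i] + unit * weights[i]
            if share >= demands[i]:        # demand met: free the excess
                residual += share - demands[i]
                alloc[i] = demands[i]
                active.remove(i)
            else:
                alloc[i] = share
    return alloc

# Demands of flows 3 and 4 are assumed (anything above 2.7 Mbps works).
print(weighted_max_min(10, [2, 2.6, 4, 5]))
# -> approximately [2, 2.6, 2.7, 2.7] (Mbps), matching the example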
TE - Weighted Max-Min Fair Allocation
Figure: Example of Weighted Max-Min Fair Allocation
1 Normalize weights (so that the smallest weight is 1): W = [5, 8, 1, 2].
2 Unit share = (total resource / sum of normalized weights) = 16/16 = 1.
3 Assign every flow (unit share × its normalized weight) units of resource.
4 Calculate the over-assigned resources and repeat steps 1-3 with this residual.
Final assignment: Flow 1 = 4 Mbps, Flow 2 = 2 Mbps, Flow 3 = 4 Mbps, Flow 4 = 6 Mbps.
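The weighted_max_min sketch above covers this case. With the slide's weights and assumed demands of [4, 2, 10, 6] Mbps (flow 3's demand is not given; any value above 4 works), it reproduces the final assignment:

# Weighted variant; flow 3's demand of 10 Mbps is an assumption.
print(weighted_max_min(16, [4, 2, 10, 6], weights=[5, 8, 1, 2]))
# -> approximately [4, 2, 4, 6] (Mbps)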
TE - Optimization
The LP-optimal solution for allocating fair shares to FGs is expensive and does not scale.
The B4 team designed their own algorithm, achieving at least 99% of optimal utilization while running 25x faster than the LP.
Two main components:
1 Tunnel Group Generation: allocates bandwidth to FGs using their bandwidth functions, prioritizing bottleneck edges.
2 Tunnel Group Quantization: adjusts split ratios in each TG to match the granularity supported by switch hardware tables.
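A minimal sketch of the quantization step under simplifying assumptions: round ideal splits down to multiples of a hardware quantum (B4 uses 1/4, per the evaluation later), then hand leftover quanta to the tunnels with the largest rounding error. This greedy rounding is an illustrative stand-in, not B4's published algorithm:

from fractions import Fraction

def quantize_splits(splits, quantum=Fraction(1, 4)):
    # Round each split down to a multiple of `quantum`, keeping the
    # total equal to 1 by redistributing the leftover quanta.
    down = [s // quantum * quantum for s in splits]
    leftover = int((1 - sum(down)) / quantum)     # quanta left to place
    by_error = sorted(range(len(splits)),
                      key=lambda i: splits[i] - down[i], reverse=True)
    for i in by_error[:leftover]:
        down[i] += quantum
    return down

# Ideal splits 0.55/0.30/0.15 become 0.5/0.25/0.25 with a quantum of 1/4.
print(quantize_splits([Fraction(11, 20), Fraction(3, 10), Fraction(3, 20)]))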
TE Protocol & OF - TE State and OpenFlow
Three modes of a B4 switch:
1 Encapsulating switch
2 Transit switch
3 Decapsulating switch
TE Protocol & OF - TE State and OpenFlow
The encapsulating (source) switch maps packets to an FG using <dest ip> and forwards them to the corresponding TG.
The TG hashes each packet to a tunnel T in the desired ratio.
Each site in the path maintains per-tunnel forwarding rules.
The source site encapsulates the packet with an outer header (i.e., the tunnel ID).
Transit switches match rules on the tunnel ID and forward accordingly.
The decapsulating switch terminates the tunnel based on the tunnel ID.
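A minimal sketch of the hashing step (the hash function and flow-key fields are illustrative; real switches hash packet headers in hardware):

import hashlib

def pick_tunnel(flow_key: str, tunnels, weights):
    # Deterministically map a flow to a tunnel so that, across many
    # flows, traffic splits roughly in the ratio given by `weights`.
    # Hashing the flow key keeps all packets of one flow on one path.
    h = int(hashlib.md5(flow_key.encode()).hexdigest(), 16)
    point = (h % 10**6) / 10**6 * sum(weights)
    for tunnel, w in zip(tunnels, weights):
        if point < w:
            return tunnel
        point -= w
    return tunnels[-1]

# 75%/25% split between a direct and an indirect tunnel.
print(pick_tunnel("10.0.0.1:10.1.0.2:443", ["A=>C", "A=>B=>C"], [0.75, 0.25]))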
TE Protocol & OF - Composing Routing and TE
B4 supports two routing services:
1 shortest-path routing (uses the Longest Prefix Match (LPM) table)
2 TE (uses the Access Control List (ACL) table)
Different flows and groups are mapped to the appropriate table.
ACL entries take strict precedence over LPM entries.
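A minimal sketch of the lookup precedence (the table formats are simplified stand-ins: a dict of exact destination matches for the ACL, a prefix list for the LPM table):

import ipaddress

def forward(dst_ip, acl_table, lpm_table):
    # TE-installed ACL entries win; otherwise fall back to the longest
    # prefix match installed by shortest-path routing.
    if dst_ip in acl_table:
        return acl_table[dst_ip]
    addr, best = ipaddress.ip_address(dst_ip), None
    for prefix, next_hop in lpm_table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, next_hop)
    return best[1] if best else None

acl = {"10.1.0.5": "tunnel-42"}   # installed by TE
lpm = [("10.1.0.0/16", "port-1"), ("10.0.0.0/8", "port-2")]
print(forward("10.1.0.5", acl, lpm))  # -> tunnel-42 (ACL wins)
print(forward("10.1.9.9", acl, lpm))  # -> port-1 (LPM fallback)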
TE Protocol & OF - Coordinating TE State Across Sites
Figure: Overview of Traffic Engineering
The TE server coordinates T/TG/FG rule installation across multiple OFCs.
The Traffic Engineering Database (TED) captures the state needed to forward packets along multiple paths.
The TED is a <key, value> data store.
The TE server computes a per-site TED and generates TE ops for the OFCs.
TE ops add, modify, or delete TED entries at the OFCs.
Each OFC converts TE ops into flow-programming instructions and sends them to all devices in its site.
Finally, the OFC responds to the original TE op.
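A minimal sketch of turning a TED diff into ops (the entry keys and op format are illustrative assumptions):

def ted_diff_to_ops(current, target):
    # Compare an OFC's current TED with the TE server's target TED and
    # emit add/modify/delete ops; both TEDs are {key: value} dicts.
    ops = [("add" if k not in current else "modify", k, v)
           for k, v in target.items() if current.get(k) != v]
    ops += [("delete", k, None) for k in current if k not in target]
    return ops

current = {"Tunnel/A-B-C": {"egress_port": 3}}
target = {"Tunnel/A-B-C": {"egress_port": 4},           # repaired path
          "TG/A:C": {"splits": {"Tunnel/A-B-C": 1.0}}}  # new tunnel group
print(ted_diff_to_ops(current, target))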
TE Protocol & OF - Dependencies and Failures
Dependencies among ops:
◮ to avoid packet drops, ops cannot all run simultaneously,
e.g., a tunnel must be configured at all sites before any TG/FG referencing it.
Synchronizing the TED between TE and OFC:
◮ requires a common TED view
◮ the TE session supports this synchronization
◮ TE synchronizes the TED to persistent storage, to handle simultaneous failures
Ordering issues:
◮ site-specific sequence IDs are assigned to TE ops
◮ this enables ordering among operations
TE op failures:
◮ caused by RPC failures, OFC rejection, etc.
◮ a dirty/clean bit is kept for each TED entry
◮ this enables resuming TE ops from the point of failure
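A minimal sketch of how an OFC might enforce ordering with session-scoped sequence IDs (the acceptance rule is an illustrative reconstruction, not the published protocol):

class OfcSession:
    # Accept TE ops only in increasing sequence order within a session;
    # a new session ID (e.g., after a TE server restart) resets it.
    def __init__(self):
        self.session_id, self.last_seq = None, 0

    def accept(self, session_id: int, seq: int) -> bool:
        if session_id != self.session_id:
            self.session_id, self.last_seq = session_id, 0
        if seq <= self.last_seq:          # stale or duplicate op
            return False
        self.last_seq = seq
        return True

ofc = OfcSession()
print(ofc.accept(1, 1), ofc.accept(1, 2), ofc.accept(1, 1))
# -> True True False (the replayed op with seq 1 is rejected)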
Evaluation - Deployment and Evolution
Network traffic doubled over the course of 2012.
Evaluation - Deployment and Evolution
Observations:
1 Topology aggregation significantly reduces path churn and system load.
2 Edge removals happen multiple times a day.
3 WAN links are susceptible to frequent port flaps and benefit from dynamic centralized management.
Evaluation - TE Ops Performance
100x reduction in the number of TE ops by caching recently used tunnels.
Reduction in failed ops.
Reduced latency.
Evaluation - TE Ops Performance
Notes:
TG ops run for every topology change or change in demand.
Growth in the number of TG ops is due to the addition of network sites.
The reduction in TG op failures is due to optimizations.
Evaluation - Impact of Failures
Figure: Impact of failure between two sites
Failure of a transit router requires a longer convergence time (≈ 3.3 s):
◮ multipath table entries must be updated for potentially several tunnels
◮ each update op is slow
Evaluation - TE Algorithm Evaluation
Throughput improves as more paths are available.
Adding more paths and using finer-granularity traffic splitting gives TE more flexibility, but consumes more hardware table resources.
B4's deployment uses TE with a quantum of 1/4 and 4 paths.
Evaluation - Link Utilization
Utilization close to 100%.
Ability to mix priority classes across all edges.
Use separate edges for different classes.
Evaluation - Link Utilization
Figure: Per-link utilization within a trunk, demonstrating the effectiveness of hashing
For at least 75% of site-to-site edges, the max/min ratio of link utilization is:
◮ 1.05 without failures (i.e., within 5% of optimal)
◮ 2.0 with failures
Conclusion
B4 now serves more traffic than Google's public-facing WAN, with a higher growth rate.
SDN delivered cost-effective WAN bandwidth, running many links at 100% utilization.
A hybrid approach is an effective way to introduce SDN into existing deployments.
Leveraging control at the edge increases WAN utilization and improves fault tolerance.