July NYC Open Networking Meetup

Date post: 16-Apr-2017
Upload: cumulus-networks
Transcript
Page 1: July NYC Open Networking Meetup

BGP in the Datacenter

Pete Lumbis – @PeteCCDE
Datacenter Architect

CCIE #28677, CCDE 2012::3

cumulusnetworks.com

Page 2: July NYC Open Networking Meetup

Pete Who?

CCIE R&S #28677, CCDE 2012::3

Former Cisco TAC Routing Escalation

Current Cumulus Networks SE

DC Automation and Architecture

Page 3: July NYC Open Networking Meetup

Agenda

The history of L2
Routing in the datacenter
BGP in the datacenter
Troubleshooting improvements
BGP on Servers

Page 4: July NYC Open Networking Meetup

In the Beginning…

There was L2…

Page 5: July NYC Open Networking Meetup

In the Beginning…

…but it had problems

50% bandwidth loss due to STP

Page 6: July NYC Open Networking Meetup

In the Beginning…

…but it had problems

Unexpected Root change

Page 7: July NYC Open Networking Meetup

In the Beginning…

…but it had problems

STP Brownout

Flooding!

Temporary loops!

STP Block on TCN!

Page 8: July NYC Open Networking Meetup

Agenda

The history of L2
Routing in the datacenter
BGP in the datacenter
Troubleshooting improvements
BGP on Servers

Page 9: July NYC Open Networking Meetup

Layer 3 Clos

Server gateway is the attached Leaf

Routing Between Spine and Leafs

10.1.1.0/24 10.2.2.0/24 10.3.3.0/24

OSPF or BGP

Page 10: July NYC Open Networking Meetup

Layer 3 – Spine and Leaf

Full ECMP

Page 11: July NYC Open Networking Meetup

Layer 3 – Spine and Leaf

Full ECMP
Manageable Oversubscription

48 x 10Gig = 480 Gigs

2 x 40Gig = 80 Gigs = 6:1 Oversubscription

Page 12: July NYC Open Networking Meetup

Layer 3 – Spine and Leaf

Full ECMP
Manageable Oversubscription

Easy to Adjust

48 x 10Gig = 480 Gigs

2 x 40Gig = 80 Gigs = 6:1 Oversubscription

Page 13: July NYC Open Networking Meetup

Layer 3 – Spine and Leaf

Full ECMP
Manageable Oversubscription

Easy to Adjust

48 x 10Gig = 480 Gigs

3 x 40Gig = 120 Gigs = 4:1 Oversubscription
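
The oversubscription arithmetic on these slides can be checked with a short sketch. The function name and its parameters are illustrative assumptions, not something from the talk; it only divides server-facing bandwidth by uplink bandwidth for a single leaf.

```python
from fractions import Fraction


def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    """Return the downlink:uplink bandwidth ratio for one leaf (hypothetical helper)."""
    downlink = down_ports * down_gbps   # server-facing capacity in Gb/s
    uplink = up_ports * up_gbps         # spine-facing capacity in Gb/s
    return Fraction(downlink, uplink)


# Numbers from the slides: 48 x 10Gig down, 40Gig uplinks.
two_uplinks = oversubscription(48, 10, 2, 40)    # 480 / 80
three_uplinks = oversubscription(48, 10, 3, 40)  # 480 / 120

print(f"2 uplinks: {two_uplinks}:1")    # 2 uplinks: 6:1
print(f"3 uplinks: {three_uplinks}:1")  # 3 uplinks: 4:1
```

Adding a spine (and therefore an uplink per leaf) lowers the ratio; losing one raises it again, which is the "Easy to Adjust" point in numbers.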

Page 15: July NYC Open Networking Meetup

Layer 3 – Spine and Leaf

Full ECMP
Manageable Oversubscription
Easy to Adjust
Massive Scale

48 x 10Gig = 480 Gigs

3 x 40Gig = 120 Gigs = 4:1 Oversubscription

Page 16: July NYC Open Networking Meetup

Layer 3 – Spine and Leaf

Full ECMP
Manageable Oversubscription
Easy to Adjust
Massive Scale
Controlled Failures

Leaf Failure Reduces Compute

Page 17: July NYC Open Networking Meetup

Layer 3 – Spine and Leaf

Full ECMP
Manageable Oversubscription
Easy to Adjust
Massive Scale
Controlled Failures

Spine Failure Increases Oversubscription

Page 18: July NYC Open Networking Meetup

Agenda

The history of L2
Routing in the datacenter
BGP in the datacenter
Troubleshooting improvements
BGP on Servers

Page 19: July NYC Open Networking Meetup

BGP as an IGP

RFC Draft submitted 2014
Microsoft and Facebook
Targeting DC
All the hows and whys

Page 20: July NYC Open Networking Meetup

But I thought BGP was…

…slow
Nope. Not with BFD and timer tuning. Just as fast as OSPF.

…hard to configure
We’ll get to that one later, but it can be easy.

…only for service providers
SPs build for scale and stability. You should too.

…hard to troubleshoot
Nice and easy when everything is defined + recent advances.

Page 21: July NYC Open Networking Meetup

BGP Datacenter Design

Single ASN for Spines
Unique ASN for Leafs
Use Private ASN range
2-byte (1023): 64512 – 65534
4-byte (94 million): 4200000000 – 4294967294

Spines: 65534 65534
Leafs: 64512 64513 64514
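
The ASN plan above can be sketched in a few lines. The ranges are the ones listed on the slide; the `assign_asns` helper and its defaults are hypothetical and only mirror the example diagram (one shared spine ASN, one ASN per leaf).

```python
# Private ASN ranges as listed on the slide.
PRIVATE_2BYTE = range(64512, 65534 + 1)
PRIVATE_4BYTE = range(4200000000, 4294967294 + 1)


def assign_asns(num_leafs, spine_asn=65534, first_leaf_asn=64512):
    """Give every spine one shared ASN and each leaf its own (hypothetical helper)."""
    leaf_asns = [first_leaf_asn + i for i in range(num_leafs)]
    for asn in [spine_asn, *leaf_asns]:
        if asn not in PRIVATE_2BYTE and asn not in PRIVATE_4BYTE:
            raise ValueError(f"ASN {asn} is outside the private ranges")
    if spine_asn in leaf_asns:
        raise ValueError("leaf ASNs must not collide with the spine ASN")
    return {"spines": spine_asn, "leafs": leaf_asns}


print(len(PRIVATE_2BYTE))   # 1023
print(len(PRIVATE_4BYTE))   # 94967295
print(assign_asns(3))       # {'spines': 65534, 'leafs': [64512, 64513, 64514]}
```

The 2-byte range runs out after about a thousand leafs, which is where the 4-byte range becomes useful.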

Page 22: July NYC Open Networking Meetup

Reducing BGP Configuration Complexity

Classically lots to manage

router bgp 65534
  router-id 10.0.0.1
  neighbor 10.1.1.1 remote-as 64512
  neighbor 10.1.1.2 remote-as 64513
  neighbor 10.1.1.3 remote-as 64514
  neighbor 10.1.1.1 timers 1 3
  neighbor 10.1.1.2 timers 1 3
  neighbor 10.1.1.3 timers 1 3
  neighbor 10.1.1.1 timers connect 3
  neighbor 10.1.1.2 timers connect 3
  neighbor 10.1.1.3 timers connect 3

Page 23: July NYC Open Networking Meetup

Reducing BGP Configuration Complexity

First – Simplify Remote AS

router bgp 65534
  router-id 10.0.0.1
  neighbor 10.1.1.1 remote-as 64512
  neighbor 10.1.1.2 remote-as 64513
  neighbor 10.1.1.3 remote-as 64514
  neighbor 10.1.1.1 timers 1 3
  neighbor 10.1.1.2 timers 1 3
  neighbor 10.1.1.3 timers 1 3
  neighbor 10.1.1.1 timers connect 3
  neighbor 10.1.1.2 timers connect 3
  neighbor 10.1.1.3 timers connect 3

Page 24: July NYC Open Networking Meetup

Reducing BGP Configuration Complexity

First – Simplify Remote AS

router bgp 65534
  router-id 10.0.0.1
  neighbor 10.1.1.1 remote-as external
  neighbor 10.1.1.2 remote-as external
  neighbor 10.1.1.3 remote-as external
  neighbor 10.1.1.1 timers 1 3
  neighbor 10.1.1.2 timers 1 3
  neighbor 10.1.1.3 timers 1 3
  neighbor 10.1.1.1 timers connect 3
  neighbor 10.1.1.2 timers connect 3
  neighbor 10.1.1.3 timers connect 3

Page 25: July NYC Open Networking Meetup

Reducing BGP Configuration Complexity

First – Simplify Remote AS

router bgp 65534
  router-id 10.0.0.1
  neighbor 10.1.1.1 remote-as external
  neighbor 10.1.1.2 remote-as external
  neighbor 10.1.1.3 remote-as external
  neighbor 10.1.1.1 timers 1 3
  neighbor 10.1.1.2 timers 1 3
  neighbor 10.1.1.3 timers 1 3
  neighbor 10.1.1.1 timers connect 3
  neighbor 10.1.1.2 timers connect 3
  neighbor 10.1.1.3 timers connect 3

"remote-as internal" is available as well

Page 26: July NYC Open Networking Meetup

Reducing BGP Configuration Complexity

Next – Use Peer Groups

router bgp 65534
  router-id 10.0.0.1
  neighbor 10.1.1.1 remote-as external
  neighbor 10.1.1.2 remote-as external
  neighbor 10.1.1.3 remote-as external
  neighbor 10.1.1.1 timers 1 3
  neighbor 10.1.1.2 timers 1 3
  neighbor 10.1.1.3 timers 1 3
  neighbor 10.1.1.1 timers connect 3
  neighbor 10.1.1.2 timers connect 3
  neighbor 10.1.1.3 timers connect 3

Page 27: July NYC Open Networking Meetup

Reducing BGP Configuration Complexity

Next – Use Peer Groups

router bgp 65534
  router-id 10.0.0.1
  neighbor 10.1.1.1 peer-group leafs
  neighbor 10.1.1.2 peer-group leafs
  neighbor 10.1.1.3 peer-group leafs
  neighbor leafs remote-as external
  neighbor leafs timers 1 3
  neighbor leafs timers connect 3

Page 28: July NYC Open Networking Meetup

Reducing BGP Configuration Complexity

Finally – BGP Unnumbered

router bgp 65534
  router-id 10.0.0.1
  neighbor 10.1.1.1 peer-group leafs
  neighbor 10.1.1.2 peer-group leafs
  neighbor 10.1.1.3 peer-group leafs
  neighbor leafs remote-as external
  neighbor leafs timers 1 3
  neighbor leafs timers connect 3

Page 29: July NYC Open Networking Meetup

Reducing BGP Configuration Complexity

Finally – BGP Unnumbered

router bgp 65534
  router-id 10.0.0.1
  neighbor swp1 peer-group leafs
  neighbor swp2 peer-group leafs
  neighbor swp3 peer-group leafs
  neighbor leafs remote-as external
  neighbor leafs timers 1 3
  neighbor leafs timers connect 3
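
Because the unnumbered peer-group form has one line per interface and nothing per-neighbor to look up, it lends itself to templating. The following is a minimal sketch of that idea; `render_spine_config` is a hypothetical helper that emits only the statements shown on the slide, and a real Quagga configuration may need statements the slide omits.

```python
def render_spine_config(asn, router_id, interfaces, group="leafs"):
    """Build the slide's unnumbered peer-group config as text (hypothetical helper)."""
    lines = [f"router bgp {asn}", f"  router-id {router_id}"]
    # One line per interface: the only part that grows with the number of leafs.
    lines += [f"  neighbor {ifname} peer-group {group}" for ifname in interfaces]
    # Settings shared by every member of the peer group.
    lines += [
        f"  neighbor {group} remote-as external",
        f"  neighbor {group} timers 1 3",
        f"  neighbor {group} timers connect 3",
    ]
    return "\n".join(lines)


print(render_spine_config(65534, "10.0.0.1", ["swp1", "swp2", "swp3"]))
```

Adding a fourth leaf means adding one interface name to the list; no remote ASN or peer IP has to be known in advance.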

Page 30: July NYC Open Networking Meetup

BGP Unnumbered

Uses IPv6 Link Local addresses
Automatically assigned, no address management

No need for infrastructure IPs
Only need Loopbacks

Advertises both IPv4 and IPv6 Routes
RFC 5549. Full interop with Cisco, Arista, Juniper
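
To make "automatically assigned" concrete, here is a sketch of one common way an interface gets its IPv6 link-local address: the modified EUI-64 derivation from the MAC address (RFC 4291). This assumes the interface uses EUI-64 generation; other address generation modes exist, and the MAC below is only an example value.

```python
import ipaddress


def link_local_from_mac(mac):
    """Derive an fe80::/64 address from a MAC using modified EUI-64 (RFC 4291)."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a MAC like 44:38:39:00:00:01")
    octets[0] ^= 0x02                                # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]   # insert ff:fe in the middle
    interface_id = int.from_bytes(bytes(eui64), "big")
    return ipaddress.IPv6Address((0xFE80 << 112) | interface_id)


print(link_local_from_mac("44:38:39:00:00:01"))  # fe80::4638:39ff:fe00:1
```

Since every link gets such an address without any planning, the only addresses left to manage are the loopbacks.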

Page 31: July NYC Open Networking Meetup

Agenda

The history of L2
Routing in the datacenter
BGP in the datacenter
Troubleshooting improvements
BGP on Servers

Page 32: July NYC Open Networking Meetup

BGP Troubleshooting Improvements - Traceroute

How do you troubleshoot links without IPs?

Traceroute improvements
Report back loopback IP

Page 33: July NYC Open Networking Meetup

BGP Troubleshooting Improvements - Hostnames

Who is the peer?

Hostname BGP extension

draft-walton-bgp-hostname-capability

Page 34: July NYC Open Networking Meetup

Comparing BGP Configurations

Traditional Config

router bgp 65534
  router-id 10.0.0.1
  maximum-paths 64
  bgp bestpath as-path multipath-relax
  neighbor 10.1.1.1 remote-as 64512
  neighbor 10.1.1.2 remote-as 64513
  neighbor 10.1.1.3 remote-as 64514
  neighbor 10.1.1.1 timers 1 3
  neighbor 10.1.1.2 timers 1 3
  neighbor 10.1.1.3 timers 1 3
  neighbor 10.1.1.1 timers connect 3
  neighbor 10.1.1.2 timers connect 3
  neighbor 10.1.1.3 timers connect 3

Cumulus Config

router bgp 65534
  router-id 10.0.0.1
  neighbor swp1 peer-group leafs
  neighbor swp2 peer-group leafs
  neighbor swp3 peer-group leafs
  neighbor leafs remote-as external

Page 35: July NYC Open Networking Meetup

Agenda

The history of L2
Routing in the datacenter
BGP in the datacenter
Troubleshooting improvements
BGP on Servers

Page 36: July NYC Open Networking Meetup

BGP to the Server

Why stop at the top of rack?
BGP to the Server!
Cumulus Quagga, GoBGP, BIRD. Just Linux Apps!

No L2, No MLAG, No Infrastructure IPs
Use BGP Unnumbered

Same troubleshooting and monitoring

Page 37: July NYC Open Networking Meetup
Page 38: July NYC Open Networking Meetup

Summary

L3 > L2
At least 1 better
Routing provides better scale and stability

Easy to configure, automate, troubleshoot

BGP all the way to the server!
Smart defaults and Configuration Simplifications

Page 39: July NYC Open Networking Meetup

© 2014 Cumulus Networks. Cumulus Networks, the Cumulus Networks Logo, and Cumulus Linux are trademarks or registered trademarks of Cumulus Networks, Inc. or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. The registered trademark Linux® is used pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide basis.

Thank You!

Page 40: July NYC Open Networking Meetup

25GbE Technology Update

Asaf Wachtel, Sr. Director Enterprise
July 2016

Page 41: July NYC Open Networking Meetup

© 2016 Mellanox Technologies - Mellanox Confidential -

Open Composable Networks

Open APIs
Automation
End-to-End Interconnect
Network OS Choice
SONiC

Page 42: July NYC Open Networking Meetup

Open Networking is Real: OCP Summit March 2016

Page 43: July NYC Open Networking Meetup

25/50/100GbE: The Future is Here!

Compute Nodes: 10GbE → 25GbE (150% Higher Bandwidth)
Storage Nodes: 40GbE → 50GbE (25% Higher Bandwidth)
Network: 40GbE → 100GbE (150% Higher Bandwidth)

Similar Connectors
Similar Infrastructure
Similar Cost / Power
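
The "higher bandwidth" percentages on this slide follow from the before and after link speeds. The sketch below recomputes them; the pairing of old and new speeds per tier is read off the slide's diagram labels and should be treated as an assumption, as should the helper name.

```python
def percent_higher(old_gbps, new_gbps):
    """Percentage increase going from old_gbps to new_gbps (hypothetical helper)."""
    return (new_gbps - old_gbps) / old_gbps * 100


# (old speed, new speed) in GbE per tier, as read from the slide.
upgrades = {
    "Compute Nodes": (10, 25),
    "Storage Nodes": (40, 50),
    "Network": (40, 100),
}
for name, (old, new) in upgrades.items():
    print(f"{name}: {old}GbE -> {new}GbE = {percent_higher(old, new):.0f}% higher")
```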

Page 44: July NYC Open Networking Meetup

Who needs more than 10GbE?

Latest multi-core Intel CPUs can easily drive more than 10Gb/s

Cloud (public or private)
• Multi-tenancy
• Need to deliver higher SLAs with lower predictability

Hyperconverged / Software Defined Storage / NVMe
• Network & Storage on the same wire
• Faster & Cheaper storage media

Database / Big Data
• Increasing volumes
• Moving from batch to real-time

Network Function Virtualization (NFV)
• I/O intensive data plane

Page 45: July NYC Open Networking Meetup

Why 25GbE? Do the Math!

Best match for current PCI technology
• PCIe3 x8 = ~52Gb/s; 2 x 25 = 50Gb/s

Most efficient switch silicon design
• Maximizes both ports and bandwidth
• 40GbE requires 4 lanes per port == cost + power

Unmatched price-performance / Best price per Gb/s
• 25G = 2.5X BW at 1.5x the price

Lower OPEX & TCO
• Cut number of NICs, cables, switch ports in half
• Lower power & cooling

Better switch port density
• Fewer uplinks needed to maintain 1:1 subscription

Uses existing fiber infrastructure (single lane)

Fully backward compatible
• Mix/match new 25GbE components and existing 10GbE

Future proof + economies of scale (50/100GbE)
• 50Gb is 2x25G, 100G is 4x25G

2.5X bandwidth with single-lane technology
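
"Do the Math" can be taken literally for the price-performance bullet. The sketch below uses only the two ratios quoted on the slide (2.5X bandwidth, 1.5x price, both relative to 10GbE); the variable names are illustrative and no actual prices are assumed.

```python
from fractions import Fraction

# Figures quoted on the slide, relative to 10GbE = 1.
BANDWIDTH_RATIO = Fraction(5, 2)   # 25G = 2.5X the bandwidth
PRICE_RATIO = Fraction(3, 2)       # at 1.5x the price

relative_cost_per_gbps = PRICE_RATIO / BANDWIDTH_RATIO
savings_percent = (1 - relative_cost_per_gbps) * 100

print(relative_cost_per_gbps)   # 3/5
print(savings_percent)          # 40

# Lane math from the same slide: 50GbE and 100GbE are multiples of a 25G lane.
LANE_GBPS = 25
lanes = {speed: speed // LANE_GBPS for speed in (25, 50, 100)}
print(lanes)                    # {25: 1, 50: 2, 100: 4}
```

On those ratios, each Gb/s of 25GbE costs 60% of what it costs on 10GbE, a 40% reduction.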

Page 46: July NYC Open Networking Meetup

25GbE Industry Timeline

March 2014: Microsoft presents proposal for 25GbE to IEEE, leveraging existing activities, such as 25G PHY (100GbE) & SFP28 (32G FC)
July 2014: Open Industry Consortium to Bring 25 and 50 Gigabit Ethernet to Cloud-Scale Networks
August 2015: First products ship to end customers
September 2015: The 25G Ethernet Consortium specification draft completed
December 2015: Multi-vendor interoperability validated by multiple customers
Q4 2015 – Q2 2016: Ecosystem grows and matures
June 2016: IEEE 802.3by standard approved by the IEEE-SA Standards Board

Page 47: July NYC Open Networking Meetup

25GbE vs 10GbE

                                  25GbE                          10GbE
Standard                          SFP28                          SFP+
Physical Form Factor              SFP                            SFP
Number of lanes                   1                              1
Lane speed                        25Gbps                         10Gbps
Encoding                          64b/66b                        64b/66b
Backward/Forward Compatibility    Fully interoperable @ 10Gb/s   Fully interoperable @ 10Gb/s
Max Copper Reach                  5m                             7m
MM Fiber Reach                    100m                           300m
SM Fiber Reach                    10km                           10km

Page 48: July NYC Open Networking Meetup

3 Types of Connectivity Products

Direct Attach Copper (DAC)
• Copper wires. Directly attaches one system to another
• Key feature = Lowest Priced Link
• <3m reaches

Optical Transceiver
• Converts electrical signals to optical
• Transmits blinking laser light over optical fiber
• Key feature = long reach, up to 10km

Active Optical Cable
• 2 Transceivers with optical fiber bonded in
• Key feature = Lowest Priced Optical Link
• 100m/200m reaches

“Transceiver”: 4 channels transmit, 4 channels receive

10G/40G, 14G/56G, 25G/50G/100G
InfiniBand: DAC & AOCs
Ethernet: DAC & AOCs & Transceivers
Transceivers: SFP28 LC, QSFP28 LC, QSFP28 MPO
VCSEL & Silicon Photonics
Multi-mode & Single-mode
MPO & LC Connectors

Page 49: July NYC Open Networking Meetup

As Data Rates Increase, Distances Decrease, Favoring Silicon Photonics + Single-mode Fiber

[Chart: Data Rate per Lane (Gb/s) vs. Link Length (m), with regions for Copper, Multi-mode fiber (OM3, OM4), Single mode fiber and Silicon Photonics]

Direct Attach Copper (DACs)
• Zero power
• Demo’d 8m at 100G
• Best fit 3m

Active Optical Cables
• VCSEL 100m
• Silicon Photonics 200m
• Best fit for 5-20m

SR/SR4 VCSEL Transceivers
• Reaches to 100m
• Best fit for MMF
• Structured cabling

Silicon Photonics Transceivers
• Reaches to 2km
• Best fit for SMF
• Parallel PSM4 or WDM4

MMF = multi-mode fiber, SMF = single-mode fiber

Page 50: July NYC Open Networking Meetup

Webscale IT Innovation: QSFP TOR for 4x Density and Lower COGS

Break-out cabling vs standard cabling
• Break-out: single cable, EST = $166
• Standard: Qty (4) cables @ $54, 4 cables = $216
• Benefits: easier cable management, fewer cables, 23% lower cost

Ideal port density and configuration deployment options

Mellanox
• Half-width deployment option
• 16 QSFP28 ports (32 in 1 RU*)
• Up to 128 10/25GbE ports in 1 RU
• Logical configuration options: Redundant “48 + 4” in 1 RU
• Benefits: flexible configuration options, highest port density, lowest power consumption

Competition (to achieve equivalent bandwidth)
• 4 SFP+ plus 4 QSFP+ ports
• Up to 128 ports of 10GbE in 2 RU
• Illogical configuration with wasted ports

* RU = rack unit

$1000 less cable cost per rack
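
The cable comparison on this slide is simple arithmetic and can be reproduced as follows. The prices are the ones quoted on the slide; the helper function is hypothetical and makes no claim about how many cables a rack actually needs.

```python
def cable_savings(breakout_cost, single_cost, cables_needed):
    """Compare one break-out cable with several standard cables (hypothetical helper)."""
    standard_total = single_cost * cables_needed
    saved = standard_total - breakout_cost
    return standard_total, saved, saved / standard_total * 100


total, saved, percent = cable_savings(breakout_cost=166, single_cost=54, cables_needed=4)
print(f"standard: ${total}, saved: ${saved}, {percent:.0f}% lower")
# standard: $216, saved: $50, 23% lower
```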

Page 51: July NYC Open Networking Meetup

Summary: 25/50/100GbE is Here!

Adapter
• 100GbE Adapter
• 150 million messages per second
• 10 / 25 / 40 / 50 / 56 / 100GbE

Switch
• 32 100GbE Ports, 64 25/50GbE Ports
• 10 / 25 / 40 / 50 / 56 / 100GbE
• Throughput of 6.4Tb/s

Interconnect
• Transceivers
• Active Optical and Copper Cables
• 10 / 25 / 40 / 50 / 56 / 100GbE VCSELs, Silicon Photonics and Copper

Software