Scaling the Facebook backbone through Zero Touch Provisioning (ZTP) · 6/27/2018 · Network Engineer David Swafford

Page 1

Network Engineer David Swafford

Scaling the Facebook backbone through Zero Touch Provisioning (ZTP)

Page 2

Page 3

Page 4

Building a new POP (Point of Presence)

[Diagram labels: BB, BB, ORIGIN DATA CENTERS]

- Build out-of-band network
- Build Optical Network
- Build IP network
- Provision compute

Page 5

Building the out-of-band network

- Install Internet service
- Provision firewall
- Provision management switches
- Provision console servers

[Diagram labels: CONSOLE, SWITCH, FIREWALL]

Page 6

Bringing in optical connectivity

- Trench fiber
- Provision optical line system
- Provision waves
- Provision client transponders

[Diagram labels: ORIGIN DATA CENTERS, ROUTER, CLIENT TRANSPONDERS, OPTICAL LINE SYSTEM, CLIENT TRANSPONDERS, ROUTER, POINT OF PRESENCE]

Page 7

Building the IP Network

[Diagram labels: SWITCH, ROUTER, ROUTER, OPTICAL NETWORK, ROUTER, SWITCH, SWITCH]

Page 8

Building the IP Network

[Diagram labels: SWITCH, ROUTER, ROUTER, OPTICAL NETWORK, ROUTER, SWITCH, CONSOLE, SWITCH, FIREWALL, COMPUTE, SWITCH]

Page 9

Provisioning one of our edge routers

- Letting People Know
- Rack and Stack
- Cabling
- Management IP assignment
- Config Generation
- Software Upgrades
- Loading Config
- Validating Config
- Validating Hardware (Fans, Power Supplies, Linecards)
- Validating Physical Connectivity (LLDP and Light Levels)
- Validating Logical Connectivity (Protocols)
- Updating External Systems (Location Data, Status)
- Undraining Traffic

Page 10

What was already solved?

- Letting People Know
- Rack and Stack
- Cabling
- Management IP assignment
- Config Generation
- Software Upgrades
- Loading Config
- Validating Config
- Validating Hardware (Fans, Power Supplies, Linecards)
- Validating Physical Connectivity (LLDP and Light Levels)
- Validating Logical Connectivity (Protocols)
- Updating External Systems (Location Data, Status)
- Undraining Traffic

Page 11

30 steps involving 10+ tools...

MOPs (Methods of Procedure)?

Page 12

CC0 Licensed Image: https://www.pexels.com/photo/accident-action-danger-emergency-260367/

Page 13

We wanted a push button!

BUILD

Page 14

Major Pieces Needed

A method to quickly and reliably:
- apply configuration to a blank device
- upgrade software

Software for:
- notifying people
- checking hardware
- updating our asset management system
- changing BGP policy to enable traffic

Empower and enable our engineers!

Page 15

Options for loading configuration

[Diagram labels: ROUTER, CONSOLE, FIREWALL, OUT-OF-BAND MANAGEMENT]

(>> = sent to the router's console, << = received from it)

CONSOLE >> '\r'
CONSOLE << 'login:'
CONSOLE >> 'root\r'
CONSOLE << 'password:'
CONSOLE >> '\r'
CONSOLE << 'router>'
CONSOLE >> 'enable\r'
CONSOLE << 'router#'
CONSOLE >> 'config t\r'
CONSOLE << 'router(config)#'
...
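Loading configuration over the console amounts to a scripted send/expect dialogue. The sketch below is a minimal illustration, assuming a hypothetical `FakeConsole` in place of a real console-server session; `run_dialogue` and `LOGIN_SCRIPT` are illustrative names, not part of any tool from the talk. Every hard-coded prompt is a point where an unexpected banner, a different software version, or a slow device breaks the script.

```python
from typing import Iterable, List, Tuple


class FakeConsole:
    """Hypothetical stand-in for a console-server session.

    Each send() returns the next canned reply, as if the device printed it.
    """

    def __init__(self, replies: Iterable[str]) -> None:
        self._replies = iter(replies)
        self.sent: List[str] = []

    def send(self, text: str) -> str:
        self.sent.append(text)
        return next(self._replies, '')


# (text to send, prompt we expect back), following the slide's transcript.
LOGIN_SCRIPT: List[Tuple[str, str]] = [
    ('\r', 'login:'),
    ('root\r', 'password:'),
    ('\r', 'router>'),
    ('enable\r', 'router#'),
    ('config t\r', 'router(config)#'),
]


def run_dialogue(console: FakeConsole, script: List[Tuple[str, str]]) -> None:
    """Send each line and fail fast if the expected prompt does not come back."""
    for text, expected in script:
        reply = console.send(text)
        if expected not in reply:
            raise RuntimeError(f'sent {text!r}, expected {expected!r}, got {reply!r}')
```

A real implementation would read from a serial or telnet stream with timeouts, which adds yet more ways for the dialogue to go out of sync.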

Page 16

Options for loading configuration

[Diagram labels: ROUTER, CONSOLE, FIREWALL, OUT-OF-BAND MANAGEMENT, SWITCH]

ETH0 >> DHCPDISCOVER
ETH0 << DHCPOFFER
ETH0 >> DHCPREQUEST
ETH0 << DHCPACK
ETH0 >> HTTP-REQUEST
ETH0 << HTTP-RESPONSE
ETH0 >> HTTP-REQUEST
ETH0 << HTTP-RESPONSE
...

Page 17

Replacing MOPs with Vending Machine

Page 18

Automating the MOPs?

We needed to write a LOT of code.

We needed a workflow automation system

We needed to replace the MOPs

Page 19

How? Divide and conquer!

The system was built for the network engineer

We removed the barriers

We empowered our peer network engineering teams

Page 20

Building for the network engineer

Small, independent pieces of code written in any programming language

Steps should do only one thing

Knowledge of "the system" should not be required

Page 21

How? Isolate "the system" from the workflow

Units of work are called Steps

A Step is a compiled piece of code that is executed as a binary

Testing and development reduced to only your step

Page 22

Giving the system a name

We named it Vending Machine!

Vending Machine is a purpose-built workflow automation system created around Zero Touch Provisioning

Stability in step-level isolation

Page 23

Provisioning redefined

vm configure <name>

- Letting People Know
- Rack and Stack
- Cabling
- Management IP assignment
- Config Generation
- Software Upgrades
- Loading Config
- Validating Config
- Validating Hardware (Fans, Power Supplies, Linecards)
- Validating Physical Connectivity (LLDP and Light Levels)
- Validating Logical Connectivity (Protocols)
- Updating External Systems (Location Data, Status)
- Undraining Traffic

BUILD

MOPs

Page 24

Zero Touch Provisioning

Page 25

Requesting a ZTP agent over DHCP

[Diagram labels: ROUTER, MANAGEMENT LAN, DHCP SERVER]

ROUTER → DHCP SERVER: DHCPDISCOVER, OPTION 60, VENDOR-CLASS: "VENDORX;MODEL1001;ABCD1234"

DHCP SERVER → ROUTER: DHCPOFFER, OPTION 67, BOOTFILE-NAME: HTTP://VM/ABCD1234/AGENT.PY
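The exchange above maps a device identity carried in option 60 to a per-device agent URL returned in option 67. A minimal sketch, assuming the vendor-class string is always `VENDOR;MODEL;SERIAL` (inferred from the slide's example) and a hypothetical base URL `http://vm`; the option codes and the code/length/data layout come from RFC 2132, while the function names are illustrative.

```python
from typing import Dict

DHCP_OPT_VENDOR_CLASS = 60   # RFC 2132: vendor class identifier
DHCP_OPT_BOOTFILE_NAME = 67  # RFC 2132: bootfile name


def parse_vendor_class(value: str) -> Dict[str, str]:
    """Split a 'VENDOR;MODEL;SERIAL' string (format assumed from the slide)."""
    vendor, model, serial = value.split(';')
    return {'vendor': vendor, 'model': model, 'serial': serial}


def bootfile_url(serial: str, base: str = 'http://vm') -> str:
    """Per-device agent URL, mirroring HTTP://VM/<SN>/AGENT.PY."""
    return f'{base}/{serial}/agent.py'


def encode_option(code: int, value: str) -> bytes:
    """Encode a DHCP option as one code byte, one length byte, then the data."""
    data = value.encode('ascii')
    if len(data) > 255:
        raise ValueError('option data longer than 255 bytes')
    return bytes([code, len(data)]) + data
```

Usage: `encode_option(DHCP_OPT_BOOTFILE_NAME, bootfile_url(parse_vendor_class('VENDORX;MODEL1001;ABCD1234')['serial']))` produces the bytes a DHCP server would place in its offer. Embedding the serial number in the path is what lets the HTTP side know which device is asking.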

Page 26

Requesting a ZTP agent over DHCP

[Diagram labels: ROUTER, MANAGEMENT LAN, DHCP SERVER, VENDING MACHINE]

ROUTER → VENDING MACHINE: HTTP-GET HTTP://VM/ABCD1234/AGENT.PY

VENDING MACHINE → ROUTER: HTTP-REPLY OK: <body>...binary data of script</body>

Page 27

Building a feedback loop

ROUTER → VENDING MACHINE: HTTP-GET /start/<SN>
VENDING MACHINE → ROUTER: OK: "{'JOB_ID': '1'}"

ROUTER → VENDING MACHINE: /complete/<SN>
VENDING MACHINE → ROUTER: OK
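On the device side, the feedback loop can be sketched as: report start, do the work, report completion. This is a minimal illustration, not the agent from the talk. It assumes the `/start` reply is JSON with double quotes (the slide shows `{'JOB_ID': '1'}`), and it injects a hypothetical `fetch(url) -> (status, body)` function and a hypothetical `apply_config` callback so it runs without a network; in practice `fetch` could wrap `urllib.request.urlopen`.

```python
import json
from typing import Callable, Tuple

# fetch(url) -> (http_status, body); injected so the sketch runs offline.
Fetch = Callable[[str], Tuple[int, str]]


def run_agent(serial: str, base_url: str, fetch: Fetch,
              apply_config: Callable[[], None]) -> str:
    """Report start, do the work, report completion; return the job ID."""
    status, body = fetch(f'{base_url}/start/{serial}')
    if status != 200:
        raise RuntimeError(f'/start failed with HTTP {status}')
    job_id = json.loads(body)['JOB_ID']  # assumes a JSON body

    apply_config()  # hypothetical: load config, upgrade software, and so on

    status, _ = fetch(f'{base_url}/complete/{serial}')
    if status != 200:
        raise RuntimeError(f'/complete failed with HTTP {status}')
    return job_id
```

Because the device itself signals `/start` and `/complete`, the server side learns when ZTP began and finished without having to poll the device.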

Page 28

Delaying ZTP while running other Steps

ROUTER → VENDING MACHINE: HTTP-GET agent.py
VENDING MACHINE → ROUTER: 404 Not Found

[Diagram: Steps ZTP, SSH CHECK, CONFIG GENERATION]
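The delay trick can be sketched as a request handler that answers 404 until the device's job reaches its ZTP Step. A minimal sketch, assuming a hypothetical `current_step` mapping from serial number to step name (a stand-in for the job state Vending Machine keeps) and assuming the device's ZTP process retries after a failed download; `serve_agent` and `AGENT_SCRIPT` are illustrative names.

```python
from typing import Dict, Tuple

# Placeholder agent body; the real script would be generated per device.
AGENT_SCRIPT = b'#!/usr/bin/python3\n# ... ZTP agent body ...\n'


def serve_agent(path: str, current_step: Dict[str, str]) -> Tuple[int, bytes]:
    """Return (status, body) for a GET of /<SN>/agent.py.

    current_step maps serial number -> name of the Step the device's job is on.
    """
    parts = path.strip('/').split('/')
    if len(parts) != 2 or parts[1] != 'agent.py':
        return 404, b'Not Found'
    serial = parts[0]
    # Until the job reaches its ZTP Step, act as if the agent does not exist;
    # the device retries and waits while earlier Steps (e.g. config generation) run.
    if current_step.get(serial) != 'ztp':
        return 404, b'Not Found'
    return 200, AGENT_SCRIPT
```

The appeal of this design is that the device needs no special "wait" logic: an ordinary failed download is enough to hold it in place.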

Page 29

[Diagram labels: ROUTER, VENDING MACHINE; Steps ZTP, ERASE, SSH CHECK, UNDRAIN TRAFFIC; BUILD]

Page 30

VENDING MACHINE → ROUTER: ARE YOU UP YET?
ROUTER: YES!

VENDING MACHINE → ROUTER: BGP PEERS UP?
ROUTER: YES!

UNDRAIN TRAFFIC

DONE

Page 31

Writing a Vending Machine Step

Page 32

#!/usr/bin/python3
import json
import logging
import sys


def main():
    # Vending Machine passes the job context as JSON on stdin.
    step_input = json.loads(sys.stdin.read().strip())
    hostname = step_input['hostname']

    logging.info(f'Generating configs for {hostname}')
    build_configs(hostname)   # defined on the next slide
    verify_configs(hostname)  # defined on the slide after that


if __name__ == '__main__':
    # INFO-level logs go to stderr (the logging default).
    logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
    main()

Config Generation

Page 33

#!/usr/bin/python3
import logging

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from configservice import ConfigGenerationService
from configservice.ttypes import ConfigGenerationResult


def build_configs(hostname):
    transport = TSocket.TSocket('localhost', 9090)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = ConfigGenerationService.Client(protocol)

    # Thrift clients are not context managers: open and close the transport explicitly.
    transport.open()
    try:
        result = client.generate_configs(hostname)
    finally:
        transport.close()

    if result.status == ConfigGenerationResult.SUCCESS:
        logging.info('Generated new configs!')
    else:
        logging.info('Configs are already up-to-date.')

Apache Thrift's client example: http://thrift-tutorial.readthedocs.io/en/latest/usage-example.html

Config Generation

Page 34

#!/usr/bin/python3
import logging
import sys

import urllib3

VM_VIP = '2a03:2880:f101:83:face:b00c:0:25de'


def verify_configs(hostname):
    # IPv6 literals must be wrapped in brackets inside a URL.
    url = f'http://[{VM_VIP}]/{hostname}/config.conf'
    with urllib3.PoolManager() as http:
        response = http.request('GET', url)

    if response.status == 200:
        logging.info(f'Successfully fetched config from {url}')
        sys.exit(0)

    logging.error(f'Failed to fetch config from {url}')
    sys.exit(1)

Config Generation

Page 35

[Diagram: CONFIG GENERATION STEP ↔ CONFIG GENERATION SERVICE]

STDIN:
'{"asset_id": "10001", "hostname": "router1", "serial": "AAEF0016", "job_id": "1", "attempt_id": "1"}'

STDERR:
INFO: Generating configs for router1...
INFO: Generated new configs!

EXIT_SUCCESS

Config Generation
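From the system's side, the contract on this slide (job context in on stdin, logs out on stderr, success or failure as the exit status) can be sketched with Python's `subprocess` module. This is an illustration of that contract, not Vending Machine's executor; `run_step`, `StepResult`, and the 600-second timeout are assumptions. A Step that hangs past the timeout raises `subprocess.TimeoutExpired`, which a caller could treat as a failed attempt.

```python
import json
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class StepResult:
    succeeded: bool  # exit status 0 (EXIT_SUCCESS)
    stderr: str      # the Step's log output


def run_step(argv: List[str], context: dict, timeout: float = 600.0) -> StepResult:
    """Run one Step: job context as JSON on stdin; report exit status and stderr."""
    proc = subprocess.run(
        argv,
        input=json.dumps(context),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return StepResult(succeeded=proc.returncode == 0, stderr=proc.stderr)


if __name__ == '__main__':
    # Stand-in Step: a one-line Python program instead of a real Step binary.
    fake_step = [sys.executable, '-c',
                 'import json, sys; ctx = json.load(sys.stdin); '
                 'print("INFO: Generating configs for " + ctx["hostname"], file=sys.stderr)']
    result = run_step(fake_step, {'hostname': 'router1', 'job_id': '1'})
    print(result.succeeded, result.stderr.strip())
```

Because the only interface is stdin, stderr, and the exit status, a Step can be written in any language and tested alone by piping JSON into it.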

Page 36

Vending Machine Internals

Page 37

Design Goals

Flexibility and Rapid Development

Scalable

Fast

Resilient

Predictable

Page 38

The System

[Diagram: two CONTROLLERs, a ZOOKEEPER QUEUE, a MYSQL DB, and three EXECUTORs]

Page 39

Coordinating Jobs

[Diagram: two CONTROLLERs, a ZOOKEEPER QUEUE, a MYSQL DB, and three EXECUTORs]

Page 40

Distributing the Work

CONTROLLER QUEUES STEP → ZOOKEEPER QUEUE

QUEUE POSITION | STEP
0              | Job: 1 Step: are_we_up_yet

Page 41

Distributing the Work

CONTROLLER QUEUES STEP → ZOOKEEPER QUEUE

QUEUE POSITION | STEP
0              | Job: 1 Step: are_we_up_yet
1              | Job: 2 Step: erase_device
2              | Job: 2 Step: are_we_up_yet
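The queue above behaves like ZooKeeper's distributed-queue recipe: each enqueued entry gets an increasing sequence number, and consumers take the lowest one first. The sketch below is an in-memory stand-in for that behaviour, not a ZooKeeper client; `StepQueue` and its methods are hypothetical, and a counter plays the role of ZooKeeper's sequential znode numbering.

```python
import itertools
import threading
from collections import deque
from typing import Deque, Optional, Tuple


class StepQueue:
    """In-memory stand-in for a ZooKeeper-backed queue of (job, step) entries."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._seq = itertools.count()  # plays the role of sequential znode numbers
        # Entries are appended in position order, so the left end is always lowest.
        self._entries: Deque[Tuple[int, str, str]] = deque()

    def put(self, job_id: str, step: str) -> int:
        """Controller side: enqueue a Step and return its queue position."""
        with self._lock:
            position = next(self._seq)
            self._entries.append((position, job_id, step))
            return position

    def take(self) -> Optional[Tuple[str, str]]:
        """Executor side: claim the lowest-positioned entry, if any."""
        with self._lock:
            if not self._entries:
                return None
            _, job_id, step = self._entries.popleft()
            return job_id, step
```

Usage mirroring the slide: the controller calls `put('1', 'are_we_up_yet')`, `put('2', 'erase_device')`, and `put('2', 'are_we_up_yet')`, which receive positions 0, 1, and 2; executors calling `take()` then receive them in that order. With a real ZooKeeper ensemble, the lock would be replaced by ZooKeeper's own atomic node creation and deletion, so many executors on different hosts can share the queue.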

Page 42

Distributing the Work

CONTROLLER QUEUES STEP → ZOOKEEPER QUEUE

QUEUE POSITION | STEP
0              | Job: 1 Step: are_we_up_yet
1              | Job: 2 Step: erase_device
2              | Job: 2 Step: are_we_up_yet

EXECUTOR: ARE WE UP YET

EXECUTOR → REPO: NEW VERSION AVAILABLE?
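The executor's "new version available?" question can be sketched as a small cache that re-downloads a Step binary only when the repository reports a newer version. This is a minimal sketch under stated assumptions: `latest_version` and `download` are hypothetical hooks onto a step repository, versions are compared for equality only, and `StepCache` is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class StepCache:
    """Executor-side cache of Step binaries, refreshed when the repo has a newer version."""
    latest_version: Callable[[str], str]   # step name -> newest version in the repo
    download: Callable[[str, str], str]    # (step, version) -> local path of the binary
    _cached: Dict[str, Tuple[str, str]] = field(default_factory=dict, init=False, repr=False)

    def binary_for(self, step: str) -> str:
        """Return a local path to the newest binary for `step`, downloading if needed."""
        version = self.latest_version(step)
        cached = self._cached.get(step)
        if cached is None or cached[0] != version:
            # New version available (or first use): fetch it before running the Step.
            self._cached[step] = (version, self.download(step, version))
        return self._cached[step][1]
```

Checking before every run means a fixed Step reaches all executors on its next execution, without restarting the system.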

Page 43

Distributing the Work

CONTROLLER QUEUES STEP → ZOOKEEPER QUEUE

QUEUE POSITION | STEP
0              | Job: 1 Step: are_we_up_yet
1              | Job: 2 Step: erase_device
2              | Job: 2 Step: are_we_up_yet

EXECUTOR: ARE WE UP YET

STDIN:
'{"asset_id": "10001", "hostname": "router1", "serial": "AAEF0016", "job_id": "1", "attempt_id": "1"}'

Page 44

Distributing the Work

CONTROLLER QUEUES STEP → ZOOKEEPER QUEUE

QUEUE POSITION | STEP
0              | Job: 1 Step: are_we_up_yet
1              | Job: 2 Step: erase_device
2              | Job: 2 Step: are_we_up_yet

EXECUTOR: ARE WE UP YET → EXIT_FAILURE

ERROR: Device not up yet!
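An `are_we_up_yet` Step in the same style as the config-generation example might look like the sketch below. It assumes "up" means the device accepts TCP connections on port 22 (the deck mentions an SSH CHECK Step, but the real check is not shown); `device_is_up` is a hypothetical helper. On failure it logs the slide's error and exits non-zero, leaving any retry decision to the system.

```python
#!/usr/bin/python3
import json
import logging
import socket
import sys


def device_is_up(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds (SSH by default)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure, ...
        return False


def main() -> int:
    step_input = json.loads(sys.stdin.read())
    if device_is_up(step_input['hostname']):
        logging.info('Device is up!')
        return 0
    logging.error('Device not up yet!')
    return 1


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
    sys.exit(main())
```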

Page 45

Transient Failures

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 111] Connection refused
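A common way to absorb failures like this one is to retry with exponential backoff. The sketch below is a generic illustration, not Vending Machine's retry policy: the attempt count, delays, jitter, and the set of exceptions treated as transient are assumptions, and `retry_transient` is a hypothetical helper. In Python 3, the slide's `[Errno 111] Connection refused` surfaces as `ConnectionRefusedError`; the `sleep` parameter is injectable so the behaviour can be tested without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar('T')

# Errors assumed to be worth retrying; anything else propagates immediately.
TRANSIENT_ERRORS: Tuple[Type[BaseException], ...] = (
    ConnectionRefusedError, ConnectionResetError, TimeoutError)


def retry_transient(func: Callable[[], T], attempts: int = 5,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying transient errors with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return func()
        except TRANSIENT_ERRORS:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError('attempts must be at least 1')
```

The jitter keeps many executors that failed at the same moment from retrying in lockstep against the same service.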

Page 46

What to do? MATCH?

Target: SN: * | MAKE: FACEBOOK | MODEL: WEDGE | LOCATION: *
Device: SN: ABCD1234 | MAKE: WELLFLEET | MODEL: BNX | LOCATION: DEN

Page 47

What to do? MATCH? No Match

Target: SN: * | MAKE: FACEBOOK | MODEL: WEDGE | LOCATION: *
Device: SN: ABCD1234 | MAKE: WELLFLEET | MODEL: BNX | LOCATION: DEN

Page 48

What to do? MATCH?

Target: SN: * | MAKE: FACEBOOK | MODEL: WEDGE | LOCATION: *
Target: SN: * | MAKE: WELLFLEET | MODEL: * | LOCATION: *
Device: SN: ABCD1234 | MAKE: WELLFLEET | MODEL: BNX | LOCATION: DEN

Page 49

What to do? MATCH?

Target: SN: * | MAKE: FACEBOOK | MODEL: WEDGE | LOCATION: *
Target: SN: * | MAKE: WELLFLEET | MODEL: * | LOCATION: *
Target: SN: * | MAKE: WELLFLEET | MODEL: * | LOCATION: DEN (MOST SPECIFIC)
Device: SN: ABCD1234 | MAKE: WELLFLEET | MODEL: BNX | LOCATION: DEN
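The matching rule on these slides can be sketched as wildcard comparison followed by picking the most specific match. A minimal sketch, assuming specificity is simply the number of non-wildcard fields and that ties go to the first target listed; the real weighting and tie-breaking are not shown in the deck, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

FIELDS = ('sn', 'make', 'model', 'location')
WILDCARD = '*'


def matches(target: Dict[str, str], device: Dict[str, str]) -> bool:
    """A target matches if every non-wildcard field equals the device's value."""
    return all(target[f] == WILDCARD or target[f] == device[f] for f in FIELDS)


def specificity(target: Dict[str, str]) -> int:
    """Count concrete (non-wildcard) fields: more means more specific."""
    return sum(target[f] != WILDCARD for f in FIELDS)


def best_target(targets: List[Dict[str, str]],
                device: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Most specific matching target, or None if nothing matches."""
    candidates = [t for t in targets if matches(t, device)]
    return max(candidates, key=specificity, default=None)
```

With the slide's data, the FACEBOOK/WEDGE target is rejected, both WELLFLEET targets match, and the one pinned to LOCATION: DEN wins with two concrete fields against one.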

Page 50

Going Beyond the Device

[Diagram labels: EB1.US-WEST, EB2.US-WEST, EB1.US-CENTRAL, EB2.US-CENTRAL, EB1.US-SOUTH, EB2.US-SOUTH, EB1.US-EAST, EB2.US-EAST]

1. DRAIN PLANE 2
2. REBUILD: EB2.US-WEST, EB2.US-CENTRAL, EB2.US-SOUTH, EB2.US-EAST
3. UNDRAIN PLANE 2
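The plane-level workflow can be sketched as drain, rebuild every device on the plane, then undrain. This sketch assumes the `EB<N>.` hostname prefix identifies plane N (the rebuild list above contains only EB2 devices), and `drain`, `rebuild`, and `undrain` are hypothetical hooks, each of which might start a Vending Machine job. If any rebuild raises, `undrain` is never called and the plane stays drained, a deliberately conservative choice here.

```python
from typing import Callable, List


def devices_on_plane(names: List[str], plane: int) -> List[str]:
    """Select devices whose hostname prefix is EB<plane> (naming assumed)."""
    return [n for n in names if n.split('.', 1)[0] == f'EB{plane}']


def rebuild_plane(plane: int, devices: List[str],
                  drain: Callable[[int], None],
                  rebuild: Callable[[str], None],
                  undrain: Callable[[int], None]) -> None:
    """Drain a whole plane, rebuild each device on it, then undrain."""
    drain(plane)
    for device in devices:
        rebuild(device)  # an exception here leaves the plane drained
    undrain(plane)
```

Usage: `rebuild_plane(2, devices_on_plane(all_names, 2), drain=..., rebuild=..., undrain=...)`, where `all_names` holds the eight hostnames from the slide.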

Page 51

vm configure router1

Page 52

vm detail router1

Page 53

router1

vm log tail
