Page 1: Robotron: Top-down Network - SIGCOMMconferences.sigcomm.org/.../Session05-Paper02... · Robotron: Top-down Network Management at Scale Yu-Wei Eric Sung, Xiaozheng Tie, Starsky H.Y.

Robotron: Top-down Network Management at Scale

Yu-Wei Eric Sung, Xiaozheng Tie, Starsky H.Y. Wong, Hongyi Zeng
ACM SIGCOMM 2016
August 25, 2016

Scale of Facebook Community

• 1.7 Billion on Facebook monthly
• 1 Billion on WhatsApp monthly
• 500 Million on Instagram monthly
• 1 Billion on Messenger monthly

Network Management at Facebook
What’s involved?

• Goals: Build and evolve FB network
• Example tasks: circuit/device turnup, network monitoring
• Human interactions -> outages

Network Management at Facebook
Why is it hard?

• Distributed Configurations
• Multiple Domains
• Versioning
• Dependency
• Vendor Differences

Network Management at Facebook
Early days…

Timeline: 2004-2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015

• Manual configuration and monitoring with ad-hoc scripts

Contribution

Timeline: 2004-2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015

• Manual configuration and monitoring with ad-hoc scripts
• Robotron started
• Our paper

• Shed light on
  • Network management tasks
  • Robotron’s usage
  • Evolution of Robotron
  • Our experiences using Robotron

Overview of Facebook’s Network
Lifecycle of user requests

Users -> Internet -> POPs -> Backbone -> Data Centers

Point of Presence (POP)

• Standardized topology
• Services: LB, Cache
• Common tasks
  • Build/upgrade a cluster
  • Provisioning new peering circuits

Backbone

• Irregular, demand-driven topology
• Common tasks:
  • Add/migrate circuits
  • Add/remove routers

Datacenter

• Standardized topology
• Services: Web, Cache, Database
• Common tasks
  • Build/decomm a cluster
  • Cluster capacity upgrade

Overview of Facebook’s Network

Multiple versions of FB cluster architectures co-exist

[Figure: two plots, one for POP and one for DC, of # of clusters (normalized) over time. One plot has series Gen1 and Gen2; the other has series Gen1, Gen2-A, Gen2-B, Gen2-C, Gen2-D, Gen2V6, Gen3, Gen3V6 (8 generations).]

Robotron: “Top-Down” Network Management System @ FB
Overview

[Figure: four stages, Network Design, Config Generation, Deployment, and Monitoring, around the FBNet DB.]

FBNet: Modeling the Network
Example 4-post POP cluster

[Figure: 4-post POP cluster. Labels: Internet, BB1, BB2, PR1, PR2, PSWa, PSWb, PSWc, PSWd, 20G, "To Top-of-Rack switches & servers".]

FBNet: Modeling the Network
Object

[Figure: PSWa and PR1 connected by two 10G circuits. Interfaces et1/1 and et1/2 are bundled into ae0 (2001::1); et2/1 and et3/1 are bundled into ae1 (2001::2). An eBGP session runs between the two addresses.]

Objects in the example:
• NetworkSwitch
• Linecard
• PhysicalInterface (x2)
• AggregatedInterface
• V6Prefix
• BgpV6Session
• Circuit (x2)

FBNet: Modeling the Network
Value

[Figure: PSWa and PR1 connected by two 10G circuits. Interfaces et1/1 and et1/2 are bundled into ae0 (2001::1); et2/1 and et3/1 are bundled into ae1 (2001::2). An eBGP session runs between the two addresses.]

Objects and their values:
• NetworkSwitch: name=PSWa
• Linecard: slot=1, model=X
• PhysicalInterface: name=et1/1
• PhysicalInterface: name=et1/2
• AggregatedInterface: name=ae0
• V6Prefix: prefix=2001::1
• BgpV6Session
• Circuit: speed=10G
• Circuit: speed=10G

FBNet: Modeling the Network
Relationship

[Figure: PSWa and PR1 connected by two 10G circuits. Interfaces et1/1 and et1/2 are bundled into ae0 (2001::1); et2/1 and et3/1 are bundled into ae1 (2001::2). An eBGP session runs between the two addresses.]

Objects, values, and relationships (<ref> marks a field whose value is a reference to another object, drawn as an arrow on the slide):
• NetworkSwitch: name=PSWa
• Linecard: slot=1, model=X, device=<ref>
• PhysicalInterface: name=et1/1, linecard=<ref>, agg_interface=<ref>
• PhysicalInterface: name=et1/2, linecard=<ref>, agg_interface=<ref>
• AggregatedInterface: name=ae0
• V6Prefix: prefix=2001::1, interface=<ref>
• BgpV6Session: a_prefix=<ref>, z_prefix=<ref>
• Circuit: a_endpoint=<ref>, z_endpoint=<ref>, speed=10G
• Circuit: a_endpoint=<ref>, z_endpoint=<ref>, speed=10G

It’s complicated

FBNet Model Snippet

    class PhysicalInterface(Interface):
        linecard = models.ForeignKey(Linecard)
        agg_interface = models.ForeignKey(AggregatedInterface)

• Related models: the ForeignKey fields (linecard, agg_interface) link a PhysicalInterface to its Linecard and AggregatedInterface
• Model inheritance: PhysicalInterface extends Interface
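The relationships in the snippet can be tried out without a database. The following is a minimal sketch that uses plain Python dataclasses as a stand-in for the model classes. Object references play the role of the ForeignKey fields. The helper `members` and the exact field types are assumptions for illustration, not part of FBNet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NetworkSwitch:
    name: str


@dataclass
class Linecard:
    slot: int
    model: str
    device: NetworkSwitch  # stand-in for a ForeignKey to the switch


@dataclass
class AggregatedInterface:
    name: str


@dataclass
class Interface:
    name: str


@dataclass
class PhysicalInterface(Interface):  # model inheritance, as on the slide
    linecard: Linecard  # stand-in for ForeignKey(Linecard)
    agg_interface: Optional[AggregatedInterface] = None  # ForeignKey stand-in


def members(agg: AggregatedInterface, pifs: List[PhysicalInterface]) -> List[str]:
    """Hypothetical reverse lookup: names of physical interfaces in a bundle."""
    return [p.name for p in pifs if p.agg_interface is agg]


# Values taken from the PSWa example on the modeling slides.
psw_a = NetworkSwitch(name="PSWa")
card = Linecard(slot=1, model="X", device=psw_a)
ae0 = AggregatedInterface(name="ae0")
pifs = [
    PhysicalInterface(name=n, linecard=card, agg_interface=ae0)
    for n in ("et1/1", "et1/2")
]
bundle = members(ae0, pifs)
```

Following `pifs[0].linecard.device.name` walks two relationships from an interface back to its switch, which is the kind of traversal the related-model fields make possible.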

FBNet: Architecture
API Layer

[Figure: FBNet API layer with Read API / Read Service replicas and a Write Service.]

• RPC services
• Read: fine-grained per-model query
• Write: task-based
• High Availability: Multiple replicas per DC

FBNet: Architecture
API Layer

[Figure: Read and Write services on top of one primary DB and several secondary DBs, connected by a replication stream.]

• 1 primary, multiple secondary DBs
• Scalability: 1 slave per DC

Robotron’s management life cycle

[Figure: four stages, Network Design, Config Generation, Deployment, and Monitoring, around the FBNet DB.]

Network Design
Design intent -> FBNet objects

Template for a POP cluster:

    Cluster(
        devices={
            PR: DeviceSpec(hardware="Router_Vendor1", num_devices=2),
            PSW: DeviceSpec(hardware="Switch_Vendor2", num_devices=4),
        },
        Link_groups=[
            LinkGroup(a_device=PR, z_device=PSW, pifs_per_agg=2, ip=V6),
        ],
    )

FBNet objects (PR1, PR2; PSWa, PSWb, PSWc, PSWd):
• BackboneRouters: 2
• NetworkSwitches: 4
• Circuits: 16
• PhysicalInterfaces: 32
• AggregatedInterfaces: 16
• V6Prefixes: 16
• BgpV6Sessions: 8

94 objects across 7 models
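The object counts on this slide follow from the template by simple arithmetic. The sketch below reproduces them under assumptions the slide does not spell out: every a_device is linked to every z_device, each device pair gets one aggregated interface and one V6 prefix per side, and one BGP session per pair. `count_objects` and the dataclasses are hypothetical names, not Robotron's API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DeviceSpec:
    hardware: str
    num_devices: int


@dataclass(frozen=True)
class LinkGroup:
    a_device: str
    z_device: str
    pifs_per_agg: int


def count_objects(devices: Dict[str, DeviceSpec], link_groups: List[LinkGroup]) -> Dict[str, int]:
    """Count the objects a cluster template expands to (full-mesh assumption)."""
    counts = {role: spec.num_devices for role, spec in devices.items()}
    counts.update(
        Circuits=0, PhysicalInterfaces=0, AggregatedInterfaces=0,
        V6Prefixes=0, BgpV6Sessions=0,
    )
    for lg in link_groups:
        # One bundle between every (a, z) device pair.
        pairs = devices[lg.a_device].num_devices * devices[lg.z_device].num_devices
        counts["Circuits"] += pairs * lg.pifs_per_agg
        counts["PhysicalInterfaces"] += 2 * pairs * lg.pifs_per_agg  # two ends per circuit
        counts["AggregatedInterfaces"] += 2 * pairs  # one per side
        counts["V6Prefixes"] += 2 * pairs  # one per aggregated interface
        counts["BgpV6Sessions"] += pairs  # one per device pair
    return counts


pop = {
    "PR": DeviceSpec(hardware="Router_Vendor1", num_devices=2),
    "PSW": DeviceSpec(hardware="Switch_Vendor2", num_devices=4),
}
groups = [LinkGroup(a_device="PR", z_device="PSW", pifs_per_agg=2)]
counts = count_objects(pop, groups)
total = sum(counts.values())
```

With 2 PRs and 4 PSWs there are 8 device pairs, which gives 16 circuits, 32 physical interfaces, 16 aggregated interfaces, 16 prefixes, and 8 sessions. Together with the 6 devices, that matches the 94 objects on the slide.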

Config Generation
FBNet objects -> Device configs

[Figure: FBNet objects for PR1, PR2, PSWa, PSWb, PSWc, PSWd are grouped into vendor-agnostic per-device objects that follow the config schema.]

Config Schema:

    struct Device {
      1: list<AggregatedInterface> aggs,
    }
    struct AggregatedInterface {
      1: string name,
      2: i32 number,
      3: string v4_prefix,
      4: string v6_prefix,
      5: list<PhysicalInterface> pifs,
    }
    struct PhysicalInterface {
      1: string name,
    }

Config Generation
FBNet objects -> Device configs

[Figure: vendor-agnostic per-device objects (following the config schema) are combined with per-vendor templates (Vendor 1 and Vendor 2: interface template, BGP template, MPLS template, …) to produce vendor-specific device configs for PR1, PR2, PSWa, PSWb, PSWc, PSWd.]

Interface template:

    {% for agg in device.aggs %}
    interface {{agg.name}}
      mtu 9192
      no switchport
      load-interval 30
      {% if agg.v4_prefix %}
      ip addr {{agg.v4_prefix}}
      {% endif %}
      {% if agg.v6_prefix %}
      ipv6 addr {{agg.v6_prefix}}
      {% endif %}
      no shutdown
    !
    {% endfor %}
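To make the schema-plus-template step concrete, here is a small sketch that mirrors the three schema structs as Python dataclasses and renders the same interface stanza by hand. It deliberately swaps the template engine for plain string formatting so that it runs with the standard library only. `render_interfaces` is a hypothetical helper, and the output indentation is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PhysicalInterface:
    name: str


@dataclass
class AggregatedInterface:
    name: str
    number: int
    v4_prefix: Optional[str] = None
    v6_prefix: Optional[str] = None
    pifs: List[PhysicalInterface] = field(default_factory=list)


@dataclass
class Device:
    aggs: List[AggregatedInterface] = field(default_factory=list)


def render_interfaces(device: Device) -> str:
    """Hand-rolled equivalent of the interface template's for/if logic."""
    lines: List[str] = []
    for agg in device.aggs:
        lines.append(f"interface {agg.name}")
        lines += ["  mtu 9192", "  no switchport", "  load-interval 30"]
        if agg.v4_prefix:  # emitted only when a v4 prefix is set
            lines.append(f"  ip addr {agg.v4_prefix}")
        if agg.v6_prefix:  # emitted only when a v6 prefix is set
            lines.append(f"  ipv6 addr {agg.v6_prefix}")
        lines += ["  no shutdown", "!"]
    return "\n".join(lines)


# Per-device object for PSWa, using values from the modeling slides.
psw_a = Device(aggs=[
    AggregatedInterface(
        name="ae0", number=0, v6_prefix="2001::1",
        pifs=[PhysicalInterface("et1/1"), PhysicalInterface("et1/2")],
    )
])
config = render_interfaces(psw_a)
```

Because the per-device object is vendor agnostic, supporting a second vendor in this sketch would mean adding another render function over the same `Device` data, which is the separation the slide draws between schema and templates.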

Usage Statistics

• How often does the FBNet model change?
• How many FBNet objects change per design change?
• What are the frequency and size of config changes?

FBNet Model Changes
How much does the FBNet model change over time?

• Still many changes over time
• Reasons: new models, values, relationships

Design Changes
How many FBNet objects are changed per design change?

[Figure: two panels, POP/DC and Backbone, each showing the CDF across design changes against # of FBNet objects (log scale, 1 to 10,000). Series: All, Interface, Circuit, v6 Prefix, v4 Prefix, Device.]

• POP/DC: bigger design changes
• Backbone: smaller design changes

Configuration Changes
What’s the frequency and size of configuration change?

• Median number of config lines changed per week
  • POP/DC devices: 500 lines
  • Backbone devices: <100 lines
• Avg number of times changes happen per week
  • POP/DC devices: 2.53
  • Backbone devices: 12.46

• POP/DC: few bigger config changes
• Backbone: many smaller config changes

Evolution of Robotron
Bottom-up, experience driven

Timeline: 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016

Milestones marked on the timeline:
• FBNet modeling started
• Active monitoring
• Passive monitoring
• Basic deployment
• Basic design and config generation
• Robotron

Experience: Modeling is laborious
Problem Scenario: new eBGP session configuration

• A new eBGP session needed a proper import policy
• Robotron was used without proper support -> egress link saturated
• Most development time spent on model changes

• Lesson: Modeling is hard
• Open problem: Lack of a network model widely accepted by vendors

Experience: Coupling changes is key
Problem Scenario: POP cluster switch turnup

1. An engineer updated FBNet to add a new rack, but forgot to generate config
2. The engineer pushed stale config
3. The rack added never came online

• Lesson: Network design, config generation and deployment should be tightly coupled
• Open problem:
  • Atomicity
  • Conflict resolution
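One way to picture the coupling in this lesson is to stamp each generated config with the design revision it was built from, and to refuse deployment when that revision lags the design database. The sketch below illustrates the idea only. `DesignDB`, `GeneratedConfig`, `generate`, `deploy`, and `StaleConfigError` are hypothetical names, and nothing here describes how Robotron actually enforces this.

```python
from dataclasses import dataclass


class StaleConfigError(Exception):
    """Raised when a config was generated from an older design revision."""


@dataclass
class DesignDB:
    revision: int = 0

    def apply_change(self) -> int:
        # Every design change (e.g. adding a rack) bumps the revision.
        self.revision += 1
        return self.revision


@dataclass
class GeneratedConfig:
    device: str
    design_revision: int
    text: str


def generate(db: DesignDB, device: str) -> GeneratedConfig:
    text = f"! config for {device} at design revision {db.revision}"
    return GeneratedConfig(device, db.revision, text)


def deploy(db: DesignDB, config: GeneratedConfig) -> str:
    # Guard: the config must come from the current design revision.
    if config.design_revision != db.revision:
        raise StaleConfigError(
            f"{config.device}: config r{config.design_revision} != design r{db.revision}"
        )
    return f"pushed {config.device} @ r{config.design_revision}"


db = DesignDB()
old = generate(db, "PSWa")
db.apply_change()  # design updated, config not regenerated

try:
    deploy(db, old)
    blocked = False
except StaleConfigError:
    blocked = True

result = deploy(db, generate(db, "PSWa"))  # regenerate, then push
```

The check turns the slide's failure sequence (update design, forget to regenerate, push stale config) into an error at push time. It does not address the open problems of atomicity or conflict resolution.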

Experience: Fallback is important
Problem Scenario: Robotron-less management

• Engineer bypassed Robotron to manually configure devices
  • SSH into device
  • Make config change
  • Log out
• Needed upon emergencies
• Passively curtail with config monitoring

• Lesson: Bypassing mechanism is needed
• Open problem:
  • How to reliably account for such activities?
  • How to safely revert such activities?

Conclusion

• First work sharing experience on a production network management system
• Open research problems:
  • Network modeling
  • Atomicity and conflict resolution across management tasks
  • Make network management system work with manual fallback mechanisms

Questions?

• [email protected]
• Poster session on Thursday

Overview of Facebook’s Network
Backbone: Interconnecting POPs/DCs

[Figure: a mesh of BB routers. PR1 and PR2 attach on the side labeled "To POPs & Internet"; DR1 and DR2 attach on the side labeled "To DCs".]

• Irregular, demand-driven topology
• PRs/DRs form an iBGP mesh
• Common tasks:
  • Add/migrate circuits
  • Add/remove BBs/PRs/DRs

Overview of Facebook’s Network
Point of Presence (POP)

[Figure: POP clusters behind PR1 and PR2. Labels: Internet, BB1, BB2.]

• Standardized topology
• Services: LB (Proxygen), Cache
• Common tasks
  • Build/upgrade a cluster
  • Provisioning new peering circuits

Overview of Facebook’s Network
Data Center

[Figure: DC clusters behind DR1 and DR2. Labels: BB3, BB4.]

• Standardized topology
• Services: Web, Cache (TAO), Database
• Common tasks
  • Build/decomm a cluster
  • Cluster capacity upgrade

FBNet: Modeling the Network
Object, Value, and Relationship

[Figure: PSWa and PR1 connected by two 10G circuits. Interfaces et1/1 and et1/2 are bundled into ae0 (2001::1); et2/1 and et3/1 are bundled into ae1 (2001::2). An eBGP session runs between the two addresses.]

Objects, values, and relationships (<ref> marks a field whose value is a reference to another object, drawn as an arrow on the slide):
• NetworkSwitch: name=PSWa
• Linecard: slot=1, model=X, device=<ref>
• PhysicalInterface: name=et1/1, linecard=<ref>, agg_interface=<ref>
• PhysicalInterface: name=et1/2, linecard=<ref>, agg_interface=<ref>
• AggregatedInterface: name=ae0
• V6Prefix: prefix=2001::1, interface=<ref>
• BgpV6Session: a_prefix=<ref>, z_prefix=<ref>
• Circuit: a_endpoint=<ref>, z_endpoint=<ref>, speed=10G
• Circuit: a_endpoint=<ref>, z_endpoint=<ref>, speed=10G

Dependencies between FBNet models

[Figure: CDF across models against # of related models (x-axis 0 to 30).]

Experience: Fallback is needed
Problem Scenario: manual changes to devices

• Manual config changes on devices are error-prone
• Ideal: All changes made through Robotron
• Reality: Robotron has latency, bugs and missing features. Quick fixes needed upon emergency
• Alternatives to discourage manual changes:
  • Config monitoring
  • Automatic config override after emergency window
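Config monitoring, as listed above, amounts to comparing what a device is running against what was generated for it. The sketch below shows one simple way to detect such drift, using a line diff from Python's standard `difflib`. `config_drift` and the sample configs are assumptions for illustration, and the slides do not describe Robotron's own comparison logic.

```python
import difflib
from typing import List


def config_drift(expected: str, running: str) -> List[str]:
    """Return the changed lines between the generated and the running config."""
    diff = difflib.unified_diff(
        expected.splitlines(),
        running.splitlines(),
        fromfile="generated",
        tofile="running",
        lineterm="",
    )
    # Keep only added/removed lines; drop the file headers and hunk markers.
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


expected = "interface ae0\n  mtu 9192\n  no shutdown\n!"
running = "interface ae0\n  mtu 1500\n  no shutdown\n!"  # manual change on the device

drift = config_drift(expected, running)
clean = config_drift(expected, expected)
```

A non-empty result flags a device that was changed outside the system. An automatic override after the emergency window would then push the generated config back, but that step is not shown here.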

Related Work

• Bottom-up config analysis: [Benson11, Sung09, Kim11, …]
• Abstraction-driven design and config generation:
  • Top down config optimization: [Condor, Sun13]
  • Centralized platform for network management: [Onix, Statesman]
  • Template based config generation: [Enck09]
  • Config modeling: [OpenConfig, DMTF]

FBNet: Modeling the Network
Desired versus Derived

[Figure: inside FBNet, a Desired model of nodes A, B, C is compared ("=?") with a Derived model of the same nodes A, B, C.]

Deployment
Device configs -> Devices

• New device: full config replacement
• Existing devices: Incremental “Live” updates
  • Dryrun, Atomic, Phased, etc.
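The slide names a phased mode without detail. A minimal sketch of what a phased rollout could look like: push to a small batch first, widen in later phases, and stop at the first failure so the remaining devices are untouched. `phased_deploy`, the `push` callback, and the phase sizes are all hypothetical and are not taken from Robotron.

```python
from typing import Callable, List, Sequence


def phased_deploy(
    devices: Sequence[str],
    push: Callable[[str], bool],
    phase_sizes: Sequence[int],
) -> List[str]:
    """Push in growing batches; return the devices updated before any failure."""
    done: List[str] = []
    start = 0
    for size in phase_sizes:
        batch = devices[start:start + size]
        for device in batch:
            if not push(device):
                return done  # halt the rollout, leave the rest untouched
            done.append(device)
        start += size
    return done


devices = ["PSWa", "PSWb", "PSWc", "PSWd"]

# Simulated push that fails on PSWc: phase 1 covers PSWa, phase 2 the rest.
pushed = phased_deploy(devices, push=lambda d: d != "PSWc", phase_sizes=[1, 3])
all_ok = phased_deploy(devices, push=lambda d: True, phase_sizes=[1, 3])
```

A dry-run variant could pass a `push` that only records the diff it would apply. The atomic mode mentioned on the slide would additionally need rollback, which this sketch does not attempt.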

Monitoring
Is the network healthy?

• Passive monitoring
• Active monitoring
• Config monitoring
