Fat-tree Data Center Topology - Electrical Engineering and ...sugih/courses/eecs589/... ·...

Transcript

A Scalable, Commodity Data Center Network Architecture

Al-Fares, Loukissas, Vahdat, "A Scalable, Commodity Data Center Network Architecture," Proc. of ACM SIGCOMM '08, 38(4):63-74, Oct. 2008.

Presenter: William Beyer

Paper Goals

•  Point out faults with current data center designs
•  Propose new architecture based on fat-tree
   – Scalable interconnection bandwidth
   – Economies of scale
   – Backward compatibility

A Typical Data Center

•  Data center topology is typically 2-3 level tree of switches and routers

Oversubscription

•  Ratio of worst-case achievable aggregate bandwidth among end-hosts to the total bisection bandwidth of the network topology
   – Ability of hosts to fully utilize their uplink capacity
•  1:1 – All hosts can use full uplink capacity
•  5:1 – Only 20% of host bandwidth may be available
•  Typical ratio is 2.5:1 (400 Mbps) to 8:1 (125 Mbps)
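
The Mbps figures above follow from dividing a host's link speed by the oversubscription ratio. A minimal sketch, assuming 1 Gbps host links (the slide does not state the link speed; the function name is hypothetical):

```python
def per_host_bandwidth_mbps(oversubscription: float, link_mbps: float = 1000.0) -> float:
    """Worst-case bandwidth available to each host, in Mbps.

    oversubscription is the N in an N:1 ratio; link_mbps is the host
    uplink speed (assumed to be 1 Gbps here, not stated on the slide).
    """
    if oversubscription < 1:
        raise ValueError("oversubscription ratio must be at least 1")
    return link_mbps / oversubscription


if __name__ == "__main__":
    for ratio in (1, 2.5, 5, 8):
        print(f"{ratio}:1 -> {per_host_bandwidth_mbps(ratio)} Mbps per host")
```

With a 5:1 ratio the function returns 200 Mbps, which is the "only 20% of host bandwidth" case from the slide.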


Multi-path Routing

•  “Multi-rooted” tree required to communicate at full bandwidth for large clusters
   – Otherwise limited to max bandwidth of a single expensive switch (128-port 10 GigE)
•  Use multi-path routing technique such as ECMP
   – Performs static load splitting, cannot account for flow sizes
   – Routing tables become very large with multiple paths
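
The "static load splitting" limitation can be illustrated with a hash-based path choice. This is a sketch only: real ECMP implementations use vendor-specific hash functions, and the CRC32 hash, the `ecmp_path` name, and the 5-tuple layout are assumptions made for illustration.

```python
import zlib


def ecmp_path(flow, num_paths):
    """Pick one of num_paths equal-cost paths from a hash of the flow tuple.

    flow is assumed to be (src_ip, dst_ip, src_port, dst_port, protocol).
    The choice depends only on the header fields, never on how much
    traffic the flow carries, which is why the splitting is static.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_paths


if __name__ == "__main__":
    small_flow = ("10.0.1.2", "10.2.0.3", 40000, 80, "tcp")
    large_flow = ("10.0.1.3", "10.2.0.2", 40001, 80, "tcp")
    # Both flows are placed by hash alone; two large flows can collide.
    print(ecmp_path(small_flow, 4), ecmp_path(large_flow, 4))
```

Because a flow always hashes to the same path, packets within a flow stay in order, but two large flows that hash to the same path will share it regardless of load elsewhere.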

Cost Analysis

Fat-tree Architecture

•  k-ary fat-tree: three-layer topology (edge, aggregation, core)
   – k pods, each consists of (k/2)^2 hosts and two layers (edge/aggregate) each with k/2 k-port switches
   – Each edge switch connects to k/2 hosts and k/2 aggregate switches
   – Each aggregate switch connects to k/2 edge and k/2 core switches
   – (k/2)^2 core switches: each connects to k pods
   – Supports k^3/4 hosts!
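
The counts in the bullets above can be checked for any even k. A minimal sketch (the function and dictionary key names are hypothetical):

```python
def fat_tree_sizes(k):
    """Element counts for a k-ary fat-tree built from k-port switches."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 per pod
        "aggregation_switches": k * half,  # k/2 per pod
        "core_switches": half * half,      # (k/2)^2
        "hosts_per_pod": half * half,      # (k/2)^2
        "hosts": k ** 3 // 4,              # k pods * (k/2)^2 hosts
    }


if __name__ == "__main__":
    for k in (4, 48):
        print(k, fat_tree_sizes(k))
```

For k = 4 this gives 16 hosts and 4 core switches; for k = 48 it gives 27,648 hosts.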


Fat-tree Topology with k = 4

Issues with Fat-tree Topologies

•  Backwards compatible with IP/Ethernet
   – Good thing, but routing algorithms will naively choose a single shortest path to use between subnets
   – Leads to bottlenecks quickly
   – (k/2)^2 shortest paths available, should use them all equally
•  Complex wiring due to lack of high speed ports

Addressing in Fat-tree

•  Use 10.0.0.0/8 private addressing block
•  Pod switches have address 10.pod.switch.1
   – Pod and switch in [0, k-1] based on position
•  Core switches have address 10.k.j.i
   – i and j denote core position in (k/2)^2 core switches
•  Hosts have address 10.pod.switch.ID
   – ID is host ID in switch subnet ([2, (k/2) + 1])
   – k < 256, this scheme does not scale indefinitely
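
The addressing scheme above can be enumerated directly. In this sketch two details are assumptions that the slide does not spell out: core coordinates j and i each run over [1, k/2], and hosts attach only to the edge switches, taken here to be switch numbers 0 to k/2 - 1 within each pod. The function name is hypothetical.

```python
def fat_tree_addresses(k):
    """Enumerate switch and host addresses for a k-ary fat-tree."""
    if k < 2 or k % 2 != 0 or k >= 256:
        raise ValueError("k must be even and satisfy 2 <= k < 256")
    half = k // 2
    # Pod switches: 10.pod.switch.1 with pod and switch in [0, k-1].
    pod_switches = [f"10.{pod}.{sw}.1" for pod in range(k) for sw in range(k)]
    # Core switches: 10.k.j.i (assumed: j and i in [1, k/2]).
    core_switches = [
        f"10.{k}.{j}.{i}" for j in range(1, half + 1) for i in range(1, half + 1)
    ]
    # Hosts: 10.pod.switch.ID with ID in [2, k/2 + 1]
    # (assumed: edge switches are numbered 0 .. k/2 - 1).
    hosts = [
        f"10.{pod}.{sw}.{host_id}"
        for pod in range(k)
        for sw in range(half)
        for host_id in range(2, half + 2)
    ]
    return pod_switches, core_switches, hosts


if __name__ == "__main__":
    pod_switches, core_switches, hosts = fat_tree_addresses(4)
    print(core_switches)
    print(hosts[:4])
```

Each address component occupies one octet, which is why the scheme requires k < 256.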

Two-Level Lookup Table

•  Prefixes used for forwarding intra-pod traffic
•  Suffixes used for forwarding inter-pod traffic


Two-Level Lookup Implementation

•  Implemented in hardware using a TCAM
   – Can perform parallel lookups across table
   – Stores don't care bits, suitable for storing variable length prefixes
•  Prefixes preferred over suffixes
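
A software stand-in for the two-level lookup may help make the idea concrete. This is a sketch, not the TCAM implementation: prefixes are checked first (longest match wins), and only if none matches is the host ID in the last octet looked up in the suffix table. The class name and the example entries for a k = 4 pod switch are illustrative assumptions.

```python
import ipaddress


class TwoLevelTable:
    """Prefix table with a fallback suffix table keyed on the host ID."""

    def __init__(self, prefixes, suffixes):
        # prefixes: iterable of (cidr_string, port)
        # suffixes: iterable of (host_id, port), host_id = last address octet
        self.prefixes = sorted(
            ((ipaddress.ip_network(cidr), port) for cidr, port in prefixes),
            key=lambda entry: entry[0].prefixlen,
            reverse=True,  # longest prefix first
        )
        self.suffixes = dict(suffixes)

    def lookup(self, dst):
        """Return the output port for dst, or None if nothing matches."""
        addr = ipaddress.ip_address(dst)
        for network, port in self.prefixes:
            if addr in network:
                return port  # intra-pod traffic stops here
        host_id = int(addr) & 0xFF
        return self.suffixes.get(host_id)  # inter-pod traffic


if __name__ == "__main__":
    table = TwoLevelTable(
        prefixes=[("10.2.0.0/24", 0), ("10.2.1.0/24", 1)],
        suffixes=[(2, 2), (3, 3)],
    )
    print(table.lookup("10.2.1.2"))  # matches a prefix
    print(table.lookup("10.3.0.3"))  # falls through to the suffix table
```

Checking prefixes before suffixes mirrors the "prefixes preferred over suffixes" rule on the slide.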

Routing Algorithm

•  Prefixes in two-level table prevent intra-pod traffic from leaving pod
•  Inter-pod traffic handled by suffix table
   – Suffixes based off host IDs, ensures spread of traffic across core switches
   – Prevents packet reordering by having static path
•  Each host-to-host communication has a single static path
   – Better than having a single path between subnets

Routing Algorithm (cont.)

•  Core switches contain (10.pod.0.0/16, port) entries
   – Statically forwards inter-pod traffic on specified port
•  Aggregate switches contain (10.pod.switch.0/24, port) entries
   – Switch value is the edge switch number
•  Assumes a central entity with full knowledge of topology generates these routing tables
   – Also responsible for detecting switch failures and re-routing traffic
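
The prefix entries described above are regular enough that a central entity could generate them mechanically. A minimal sketch, assuming that a core switch reaches pod p on port p and that an aggregate switch reaches edge switch s on port s (the port numbering and function names are assumptions, not taken from the slide):

```python
def core_switch_table(k):
    """(prefix, port) entries for any core switch: one /16 per pod."""
    return [(f"10.{pod}.0.0/16", pod) for pod in range(k)]


def aggregate_switch_prefixes(k, pod):
    """(prefix, port) entries for an aggregate switch in the given pod:
    one /24 per edge switch, with edge switches numbered 0 .. k/2 - 1."""
    if not 0 <= pod < k:
        raise ValueError("pod must be in [0, k-1]")
    return [(f"10.{pod}.{edge}.0/24", edge) for edge in range(k // 2)]


if __name__ == "__main__":
    print(core_switch_table(4))
    print(aggregate_switch_prefixes(4, pod=2))
```

Only the prefix half of an aggregate switch's table is generated here; the suffix entries that spread inter-pod traffic across core switches are left out of this sketch.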

Routing Algorithm Example


Dynamic Routing Techniques

•  Alternatives to two-level routing table
   – Attempt to classify and schedule flows rather than use static routing
•  Flow Classification
   – Periodically reassigns flow output ports
   – Prevents competition between flows for a single port
•  Flow Scheduling
   – Identify large flows and establish reserved paths for them
   – Requires communication between edge switches and a central flow scheduler
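
The idea of periodically reassigning flows to output ports can be sketched with a simple greedy balancer. This is a stand-in heuristic chosen for illustration, not the classifier or scheduler from the paper; the function name and the flow-size inputs are hypothetical.

```python
def rebalance_flows(flow_sizes, num_ports):
    """Assign flows to ports, largest first, always picking the port
    with the least load so far. Returns (assignment, per-port loads)."""
    if num_ports < 1:
        raise ValueError("need at least one output port")
    loads = [0] * num_ports
    assignment = {}
    ordered = sorted(flow_sizes.items(), key=lambda item: item[1], reverse=True)
    for flow, size in ordered:
        port = loads.index(min(loads))  # least-loaded port, lowest index on ties
        assignment[flow] = port
        loads[port] += size
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical measured flow sizes in Mbit/s.
    flows = {"a": 90, "b": 60, "c": 30, "d": 20}
    assignment, loads = rebalance_flows(flows, num_ports=2)
    print(assignment, loads)
```

Unlike static hashing, the placement here depends on measured flow sizes, so the two largest flows end up on different ports.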

Fault Tolerance

•  Many possible paths between hosts leads to “easy” fault tolerance
•  Each switch maintains Bidirectional Forwarding Detection session with neighbors
   – Allows switch to determine when neighbors fail
•  Two primary types of link failure
   – Between lower and upper switches
   – Between upper and core switches

Router Power and Heat Dissipation

Topology Power/Heat Dissipation


Cafarella, 2013

Hamilton, 2008

Cafarella, 2013

Emerson Network Power, 2007

Cafarella, 2013

EPA, 2007

Software Implementation

•  Validated in software using Click
   – Click is a modular software router architecture
   – Implement routers on PCs, supports experimental router designs
•  Click modules called “elements”
   – Each element performs a specified task
   – Routing table lookup, decrement packet TTL, etc…
•  Implemented elements for two-level table, flow classifier, and flow scheduler


Evaluation Setup

•  Uses a 4-port fat-tree as seen previously
   – Two-level table and flow-based schemes analyzed
   – Compared against hierarchical tree with oversubscription ratio of 3.6:1
•  Both evaluated using Click
   – Emulate switches and hosts on PCs
•  All hosts generate 96 Mbit/s of outgoing traffic
   – This value prevents CPU from throttling test

Evaluation Results

•  Percentages indicate aggregate network bandwidth
   – Measured as amount of incoming traffic received by hosts
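
One way to read the percentages is as received traffic divided by the most the hosts could receive in total. A minimal sketch, assuming the k = 4 test bed has k^3/4 = 16 hosts each sending 96 Mbit/s; the function names and the measured value in the usage lines are hypothetical, not results from the paper.

```python
def ideal_aggregate_mbps(k, per_host_mbps):
    """Total traffic if every host's output is delivered in full."""
    hosts = k ** 3 // 4
    return hosts * per_host_mbps


def percent_of_ideal(measured_mbps, k, per_host_mbps):
    """Aggregate received traffic as a percentage of the ideal."""
    return 100.0 * measured_mbps / ideal_aggregate_mbps(k, per_host_mbps)


if __name__ == "__main__":
    print(ideal_aggregate_mbps(4, 96))     # ideal total for the test bed
    print(percent_of_ideal(768.0, 4, 96))  # hypothetical measurement
```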

Flow Scheduler Requirements

•  Minimal time and memory requirements for flow scheduler
•  Feasible to use at least until k grows extremely large

Packaging Problem

•  Fat-tree has significant cabling overhead
   – 1 GigE switches used to reduce cost
   – Lack of 10 GigE ports leads to more cabling
•  Present a packaging solution for k=48
   – Generalizes to other values of k


Packaging Solution

Strengths

•  Fat-tree architecture seems to outperform hierarchical solution
•  Excellent power and heat reductions over hierarchical approach
•  Evaluation methods were good overall with tests performed
•  Data centers can easily switch to this new method

Weaknesses

•  Language used in paper was confusing at times
   – Referred to pod switches as “aggregate switch”, “upper-layer switch”, and “upper pod switch” at various points
•  Evaluation performed with small value of k=4
   – Would have been nice to see higher values of k tested
   – Academic project and resources were obviously a factor for evaluation

References

•  Al-Fares, Loukissas, Vahdat, "A Scalable, Commodity Data Center Network Architecture," Proc. of ACM SIGCOMM '08, 38(4):63-74, Oct. 2008.

•  Cafarella, M. (2013, April 20). Datacenters. EECS 485. Lecture conducted from University of Michigan, Ann Arbor.

•  "Energy Efficient Cooling Solutions for Data Centers." Emerson Network Power. 2007 Web. 28 Oct. 2013. <http://www.emersonnetworkpower.com/documents/en-us/latest-thinking/edc/documents/white%20paper/energy_efficient_cooling_solutions_for_data_centers.pdf>.

•  Hamilton, James. "Perspectives - Cost of Power in Large-Scale Data Centers." Perspectives @ James Hamilton's Blog. N.p., 28 Nov. 2008. Web. 28 Oct. 2013. <http://perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx>.

•  "Report to Congress on Server and Data Center Energy Efficiency." Energy Star. U.S. Environmental Protection Agency, 2 Aug. 2007. Web. 28 Oct. 2013. <http://www.energystar.gov/ia/partners/prod_development/downloads/EPA_Datacenter_Report_Congress_Final1.pdf?db729bf5a>.

