Networking Best Practices for Hyper-Converged Infrastructure
What to expect
What it is:
• Basic overview of networking best practices for hyper-converged infrastructure (HCI)
• Basic overview of some fabric-related decisions
What it is not:
• Targeted at networking "experts"
• A detailed or expert-level overview of HCI
• An in-depth analysis of the do's and don'ts of HCI
Agenda
• Dell EMC HCI Portfolio
• Networking Foundations
• Key Networking Decisions for HCI Environments
• Networking Items for specific HCI Environments
• Where to Learn More
Networking Foundations and HCI Solutions
Focusing on Hyper-converged infrastructure (HCI)
• Per IDC, hyper-converged systems collapse core storage and compute functionality into a single, virtualized solution. A key characteristic of hyper-converged systems that differentiates from other integrated systems is their ability to provide all compute and storage functions through the same server-based resources.
• Specific to the Dell EMC portfolio, hyper-converged infrastructure (HCI) includes:
* IDC Worldwide Converged Systems Tracker - Mar 2016
• VxRail
• vSAN (Ready Nodes and Bundles)
• XC Series
• ScaleIO (SW only, Ready Nodes and Bundles)
[Figure: leaf-spine topology. VLT leaf pairs (Leaf 1–6) connect servers/nodes to Spine 1 and Spine 2, which uplink to Router 1 and Router 2 at the edge/core. North-south traffic flows through the edge; east-west traffic stays between leaves and spines.]
• North - South = Request and Response to the outside world
• East - West = Sharing resources inside the data center
Is your data center optimized for the right type of traffic?
Leaf-Spine: Intro and Advantages
• Each leaf connects to each spine. The topology is fully non-blocking or, more accurately, non-interfering.
• Every leaf is always one hop away from any other leaf, which minimizes latency and reduces bottlenecks.
• Scalable: capacity grows by increasing port density or by adding additional stages.
[Figure: three-tier hierarchical network vs. leaf-spine network]
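The scaling claim above can be made concrete with some arithmetic. The sketch below is illustrative only (it assumes equal port speeds everywhere and is not Dell EMC sizing guidance): in a two-tier fabric, the number of spines equals the uplinks per leaf, and the leaf count is bounded by the spine port count.

```python
# Sketch: sizing a two-tier leaf-spine fabric (illustrative, equal port speeds assumed).

def fabric_capacity(spine_ports: int, leaf_ports: int, uplinks_per_leaf: int) -> dict:
    """Estimate host-facing capacity of a two-tier leaf-spine fabric.

    spine_ports: ports per spine switch (each spine port connects one leaf)
    leaf_ports: total ports per leaf switch
    uplinks_per_leaf: leaf ports reserved as uplinks (one per spine)
    """
    num_spines = uplinks_per_leaf          # one uplink from each leaf to each spine
    max_leaves = spine_ports               # each spine port serves one leaf
    host_ports_per_leaf = leaf_ports - uplinks_per_leaf
    return {
        "spines": num_spines,
        "max_leaves": max_leaves,
        "max_host_ports": max_leaves * host_ports_per_leaf,
        # oversubscription = host-facing ports / uplink ports per leaf
        "oversubscription": host_ports_per_leaf / uplinks_per_leaf,
    }

# Example: 32-port spines, 48-port leaves with 4 uplinks each
cap = fabric_capacity(spine_ports=32, leaf_ports=48, uplinks_per_leaf=4)
print(cap)  # 4 spines, up to 32 leaves, 32 * 44 = 1408 host ports, 11:1 oversubscription
```

Adding stages (a third tier of "super-spines") multiplies capacity again, which is what "adding additional stages" refers to.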
L2/L3 Boundary: Spine or Leaf?
[Figure: side-by-side comparison for Rack 1. Left: boundary at the spine — leaf pair (Leaf 1A/1B) with standard uplinks in VLT to Spine 1 and Spine 2; L2 extends from the 10GbE servers up through the spines. Right: boundary at the leaf — point-to-point L3 interfaces with ECMP between Leaf 1/Leaf 2 and the spines; L2 is contained within the rack. Both show LAN, management, and compute traffic.]
L2/L3 Boundary: At the Spine
• The top-of-rack (ToR) switch acts as the leaf, and the core switches act as the spine.
• All paths are non-blocking and forwarding.
• Scalability is limited to a single pair of spine switches.
• Ideally suited for protocols that work only at the data-link layer (e.g., DCB and FCoE).
[Figure: Rack 1 with leaf pair 1A/1B in VLT, standard uplinks to Spine 1 and Spine 2; the L2/L3 boundary sits above the spines.]
L2/L3 Boundary: At the Leaf
• Leaf switches act as the gateway for each rack.
• All inter-rack flows are routed rather than switched.
• No dependency on Spanning Tree Protocol (STP).
• Broadcast traffic is minimized within each rack.
• Traffic is load-balanced across the spines with ECMP (Equal-Cost Multi-Path).
[Figure: Rack 1 with Leaf 1 and Leaf 2 connected to Spine 1 and Spine 2 over point-to-point L3 interfaces with ECMP; the L2/L3 boundary sits at the leaves.]
Equal Cost Multipathing (ECMP)
• The routing equivalent of a port-channel configuration.
• Multiple equal-cost best paths to a given destination are installed in the routing table.
[Figure: three racks (Leaf 1–6) connected to Spine 1 and Spine 2; ECMP spreads traffic across the spine links, with the L2/L3 boundary at the leaves.]
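To keep packets of a flow in order, switches typically pick among the equal-cost next hops by hashing the flow's 5-tuple, so one flow always takes one path while different flows spread across paths. A minimal sketch of the idea (real switches do this in hardware with their own hash functions):

```python
# Sketch: ECMP next-hop selection by hashing the flow 5-tuple.
import hashlib

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, next_hops):
    """Pick one of several equal-cost next hops deterministically per flow."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]

spines = ["spine1", "spine2"]
# All packets of one flow hash to the same spine (no reordering)...
a = ecmp_next_hop("10.0.1.10", "10.0.2.20", 49152, 3260, "tcp", spines)
b = ecmp_next_hop("10.0.1.10", "10.0.2.20", 49152, 3260, "tcp", spines)
assert a == b
# ...while different flows can land on different spines.
```

This is why ECMP balances well when there are many flows, but a single elephant flow cannot use more than one path's worth of bandwidth.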
L2/L3 Boundary Decision
• A boundary at the leaf pair reduces STP interaction and complexity.
• At the leaf pair, the broadcast domain is contained at the rack level.
• At the spine, the design fits the comfort level of most networking admins.
• At the spine, lower-latency L2 traffic can traverse between racks.
• When in doubt, place the boundary at the spine.
At Leaf:
• Each rack is an island
• Broadcast domain reduced
• Fewer Spanning Tree interactions

At Spine:
• Some protocols require it
• Lower latency (but not by much)
• Comfort level of most admins
The Routing Decision
• OSPF: Open Shortest Path First
• BGP: Border Gateway Protocol (eBGP preferred)
• In the majority of HCI use cases, there is no right or wrong answer to this decision
• The probability of having internal resources that understand OSPF is higher – which should play a role in your decision.
• For VERY large deployments (>4k nodes) there is a clear recommendation: BGP
OSPF advantages:
• Widespread knowledge
• Configuration ease
• Topology table

BGP advantages:
• Ease of troubleshooting
• Reduced CPU utilization (at scale)
• Scalability
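For the very large deployments where BGP is the clear choice, one common eBGP underlay design (in the spirit of RFC 7938) gives the spines a shared private ASN and each leaf (or leaf pair) its own. The numbering below is one illustrative pattern, not a Dell EMC prescription:

```python
# Sketch: assigning private ASNs for an eBGP leaf-spine underlay.
# The base values are illustrative design choices from the private ASN
# range (64512-65534), not fixed requirements.

SPINE_ASN = 64512                     # all spines share one private ASN
LEAF_ASN_BASE = 64600                 # leaves numbered upward from here

def asn_plan(num_leaves: int) -> dict:
    """Build a name -> ASN map for the fabric."""
    plan = {"spines": SPINE_ASN}
    for i in range(1, num_leaves + 1):
        plan[f"leaf{i}"] = LEAF_ASN_BASE + i   # unique ASN per leaf (or leaf pair)
    return plan

print(asn_plan(4))
# e.g. {'spines': 64512, 'leaf1': 64601, 'leaf2': 64602, 'leaf3': 64603, 'leaf4': 64604}
```

Because every leaf-to-spine session is eBGP between distinct ASNs, loop prevention comes from the AS path itself rather than from an IGP or STP.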
Jumbo Frames Decision
• Frame MTU: the total number of bytes supported in the payload of a transmission.
• Standard MTU is 1,500 bytes; jumbo frames raise it (commonly to 9,000 bytes).
• Remember: jumbo frames must be enabled end to end across the entire network:
  – Physical switches
  – Virtual switches
  – Physical or virtual hosts
  – Storage
• Benefits of jumbo frames, if done correctly: up to +10% throughput and -10% CPU utilization*
[Figure: a VM-to-VM path from ESXi-1 to ESXi-2 crossing leaves, VLT pairs, and spines; every hop along the path must carry the jumbo MTU.]
*Source: Ethernet Alliance http://www.ethernetalliance.org/wp-content/uploads/2011/10/EA-Ethernet-Jumbo-Frames-v0-1.pdf
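Rough arithmetic shows where the gain comes from. Assuming standard Ethernet per-frame wire overhead (preamble, SFD, header, FCS, inter-frame gap), larger frames spend a bigger fraction of the wire on payload:

```python
# Rough wire-efficiency arithmetic for standard vs. jumbo frames.
# Per-frame Ethernet overhead on the wire: 7B preamble + 1B SFD + 14B header
# + 4B FCS + 12B inter-frame gap = 38 bytes (TCP/IP headers ignored here).
OVERHEAD = 38

def wire_efficiency(mtu: int) -> float:
    """Fraction of wire bandwidth carrying payload for back-to-back full frames."""
    return mtu / (mtu + OVERHEAD)

std = wire_efficiency(1500)     # ~0.975
jumbo = wire_efficiency(9000)   # ~0.996
print(f"standard: {std:.3f}, jumbo: {jumbo:.3f}")
```

Note the frame-level efficiency gain alone is only about 2%; the larger throughput and CPU improvements cited above come mainly from the host processing roughly six times fewer frames (fewer interrupts and per-packet operations) for the same data volume.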
Multicast: Internet Group Management Protocol (IGMP) Snooping
• Multicast is one-to-many network communication that goes to a specific group of hosts.
• IGMP (Internet Group Management Protocol) is used to establish group memberships for multicast.
• HCI solutions use multicast as a method for nodes to join and communicate within the cluster.
• vSAN and VxRail have requirements for IPv4 and IPv6 multicast.
• IGMP snooping allows a switch to listen to IGMP traffic and build a table of which devices are participating in each multicast group.
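Conceptually, the table a snooping switch builds is just a map from multicast group to the set of ports that have sent IGMP joins. A minimal sketch of that table logic (real switches learn this in hardware from membership reports and leave messages):

```python
# Sketch: the group-membership table an IGMP-snooping switch maintains.
from collections import defaultdict

class IgmpSnoopingTable:
    def __init__(self):
        self.members = defaultdict(set)   # multicast group -> switch ports

    def join(self, group: str, port: str):
        self.members[group].add(port)     # learned from an IGMP membership report

    def leave(self, group: str, port: str):
        self.members[group].discard(port) # learned from an IGMP leave message

    def forward_ports(self, group: str) -> set:
        # Without snooping, the switch floods multicast to all ports in the VLAN;
        # with snooping, only ports with interested receivers get the frame.
        return self.members.get(group, set())

table = IgmpSnoopingTable()
table.join("239.1.1.1", "port3")   # e.g. two HCI nodes joining the cluster group
table.join("239.1.1.1", "port7")
print(table.forward_ports("239.1.1.1"))  # the two member ports
```

This is why snooping matters for HCI: cluster multicast stays constrained to the node-facing ports instead of flooding every port in the VLAN.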
Leaf-Spine: Virtual Link Trunking (VLT)
• Virtual Link Trunking (VLT) is the Dell EMC Layer 2 aggregation protocol that allows a single device to use a link aggregation group (LAG) across two physical switches, treating them as one logical device.
• Why we recommend it: redundancy, resiliency, and active-active paths.
• HCI solutions that need active/passive separation for initial deployment: VxRail, vSAN Ready Nodes and Bundles, and XC Series.
[Figure: four racks with VLT leaf pairs (Leaf 1A/1B, 2A/2B, 3A/3B) connected to Spine 1 and Spine 2.]
Are you willing to risk your network by not having a physically separate management network?
[Figure: physical wiring and logical views of a compute cluster with a dedicated out-of-band management network. A spine pair (S6010, Z9100) and leaf pair (S4048(T), S6010, Z9100) carry production traffic, while separate management switches (S3048, S4048T) connect the 10GbE R730 servers' management ports.]
Bringing it all together – Fabric Decisions for HCI
• L2/L3 boundary: at the spine
• Routing protocol: OSPF or BGP; unless very large, then BGP (eBGP preferred if BGP)
• Jumbo frames: stick with the default in most cases; fine-tune for jumbo after deployment
• Management network: dedicated out-of-band (OOB)
• VLT/LAG: utilize; deployment may need this off initially
Solution Specific Guidance
Networking specifics for each HCI solution: VxRail
• Same requirements as vSAN, with additional software features
• IPv4 and IPv6 multicast / MLD (Multicast Listener Discovery)
• Verify there are no Link Aggregation Control Protocol (LACP) settings on the ports connecting to the appliance
• vSAN 6.6 removes the multicast requirements
Networking specifics for each HCI solution: XC Series
• Verify IPv4 IGMP snooping is not enabled on the appliance ports
• Spanning Tree Protocol (STP) should be disabled or, at a minimum, the appliance ports set as edge ports
• IPv6 Multicast Listener Discovery (MLD) is enabled
Networking specifics for each HCI solution: vSAN Ready Nodes and Bundles
• Same requirements as VxRail, with fewer software features
• IPv4 multicast
• Verify there are no Link Aggregation Control Protocol (LACP) settings on the ports connecting to the nodes
• vSAN 6.6 removes the multicast requirements
Networking specifics for each HCI solution: ScaleIO Nodes and Bundles
• Metadata Manager (MDM) nodes need very low-latency cross-node connectivity for cluster-management communications
Dell EMC Networking Data Center Portfolio
Fabric Spine
Z-Series
• Purpose-built 40/100GbE fabric switches for modern data center architectures
• Optimized for cost-effective non-blocking performance
Blade IO
MXL for M1000e
FN-IOM for FX2
• 10GbE & 40GbE I/O solutions for M1000e & FX2 deployments
• Optimized for east-west traffic patterns with native local switching capabilities
Top-of-Rack/Leaf
S-Series
• Complete connectivity—1/10/25/40/50/100GbE
• Optimized for diverse virtualization and SDN environments
Resources & Where To Learn More
Where to learn more: details on today's session
Dell EMC TechCenter Networking Guides: http://en.community.dell.com/techcenter/networking/p/guides
Dell EMC TechCenter Networking Wiki: http://en.community.dell.com/techcenter/networking/
Dell.com Network Hardware and Devices: http://www.dell.com/us/business/p/networking-products?~ck=anav
Dell EMC Support for Networking (manuals, software/firmware updates, and general support): http://www.dell.com/support/home/us/en/04/Products/ser_stor_net/networking
Easy Google searches to reach these locations: "Dell EMC TechCenter", "Dell EMC Networking"
Ansible Resources
Ease your deployment with Ansible scripts:
http://ansible-dellos-docs.readthedocs.io/en/latest/
https://galaxy.ansible.com/Dell-Networking/
https://github.com/Dell-Networking/ansible-dellos-examples
Thank You