Evolution of OpenStack Networking at CERN · 2019-02-26


Evolution of OpenStack Networking at CERN

Nova Network, Neutron and SDN

Belmiro Moreira · @belmiromoreira · belmiro.moreira@cern.ch

Ricardo Rocha · @ahcorporto · ricardo.rocha@cern.ch

Founded in 1954

Fundamental Science
● What is 96% of the universe made of?
● Why isn’t there anti-matter in the universe?
● What was the state of matter just after the Big Bang?

[Diagram: cells architecture, a TOP cell above CELL 1, CELL 2, … CELL N; cells host Compute and GPU workloads, running either Nova Network or Neutron]

● Neutron capabilities: CPU pinning, huge pages, SMP, GPU, ...
● Configuration: Neutron vs Nova Network, allowed projects, ...
● See also: "Moving from CellsV1 to CellsV2 at CERN", Mon 21, 11:35

Scalability & Flexibility

[Diagram: a cell containing hypervisor nodes (NODE 1, NODE 2, …), each running virtual machines V1, V2, V3, … VN]

● Order of ~10s of cells (currently 70), with ~200 hypervisors per cell
● Number of virtual machines per hypervisor varies per use case
○ From 4 to 30 VMs per hypervisor

[Diagram: the same cell layout with the network services attached: S513-V-IP123 137.1XX.43.0/24 (Primary Service) and S513-V-VM908 188.1XX.191.0/24 (Secondary Service)]

● Flat but segmented network, with multiple broadcast domains
○ Scalability
○ Segmentation done on Primary Services
● Primary Services can have multiple Secondaries
● No route if Secondary is in a different Primary
○ VM IP allocation must belong to the hypervisor’s Primary (see the sketch below)
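To make the last constraint concrete: a minimal sketch of the routability check, using the Python ipaddress module and made-up documentation addresses (the real subnets are masked above):

import ipaddress

# Hypothetical Primary Service subnet of the hypervisor.
primary = ipaddress.ip_network("192.0.2.0/24")

# One candidate VM address inside the Primary, one outside it.
for vm_ip in ("192.0.2.42", "198.51.100.7"):
    if ipaddress.ip_address(vm_ip) in primary:
        print(vm_ip, "OK: same broadcast domain as the hypervisor")
    else:
        print(vm_ip, "no route: belongs to a different Primary")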

LanDB

Source of Truth
● All devices must be present
● Used for different purposes
○ Security checks
○ DNS/DHCP configuration
○ Switch/router configuration
○ Active Directory, …

[Diagram: LanDB entries for Primary Services, Secondary Services, Hypervisors and Virtual Machines: IPv4, IPv6, DNS, aliases, IPv6 readiness, ownership, ...]

Phase 1. Nova Network
Phase 2. Neutron
Phase 3. SDN

Phase 1. Nova Network

● Custom NetworkManager (sketched below)
● Late IP allocation, after scheduling to compute nodes
● Patching done directly in the Nova code
● Nova Network is being deprecated...
○ Quantum is the new thing… Neutron is the new thing...

[Diagram: NOVA COMPUTE talking to both the NOVA DB and LanDB]
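As a rough illustration of the "custom NetworkManager" and the late IP allocation, a minimal sketch only, not the actual CERN patch: landb_client and its calls are hypothetical, the base class is the old nova-network FlatManager.

import landb_client  # hypothetical CERN-internal helper

from nova.network import manager


class CERNNetworkManager(manager.FlatManager):
    """Choose the VM address only after scheduling, so it lands in the
    Primary Service (broadcast domain) of the selected hypervisor."""

    def allocate_for_instance(self, context, **kwargs):
        host = kwargs["host"]  # compute node picked by the scheduler
        subnet = landb_client.primary_service_of(host)   # hypothetical
        ip = landb_client.reserve_next_free_ip(subnet)   # hypothetical
        landb_client.register_device(kwargs["instance_id"], ip)
        return super(CERNNetworkManager, self).allocate_for_instance(
            context, **kwargs)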

Phase 2. Neutron

● Linuxbridge, Flat / Provider networks
● Better integration using ML2, mechanism driver and extensions (see the sketch after this slide)
○ Quickly became possible to have it out of tree
○ Our extensions have a similar role to Neutron Segments
● Gradual enrollment, cell by cell
● Vanilla upstream packages for Neutron, much smaller patch on Nova
● More split pieces, potential points of failure
○ Periodic consistency checks

[Diagram: VM creation flow: NOVA COMPUTE calls Neutron (steps 1-3), which updates LanDB (steps 4a, 4b)]

https://gitlab.cern.ch/cloud-infrastructure/openstack-neutron-cern
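For flavor, a minimal out-of-tree ML2 mechanism driver skeleton in the spirit of the one above; the landb client object and its methods are hypothetical, the real driver lives in the linked repository.

import landb  # hypothetical LanDB client

from neutron_lib.plugins.ml2 import api


class CERNMechanismDriver(api.MechanismDriver):

    def initialize(self):
        # Load driver configuration, set up the LanDB connection, ...
        pass

    def create_port_postcommit(self, context):
        # Register the new port in LanDB (steps 4a/4b in the diagram).
        port = context.current
        landb.register(port["mac_address"],
                       [ip["ip_address"] for ip in port["fixed_ips"]])

    def delete_port_postcommit(self, context):
        landb.unregister(context.current["mac_address"])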

Phase 2. Neutron

Subnet Cluster

Which subnets belong to this cluster?

$ neutron cluster-list
+-----+----------------------+-----------------------+
| id  | name                 | subnets               |
+-----+----------------------+-----------------------+
| ... | VMPOOL SXXXX-C-IPZZZ | ... 188.xxx.yy.zz/22  |
| ... | VMPOOL SBBBB-C-IPWWW | ... 137.aaa.bb.ccc/25 |
|     |                      | ... 137.bbb.cc.0/25   |
|     |                      | ... 137.bbb.dd.0/25   |
+-----+----------------------+-----------------------+

Phase 2. Neutron

Host Restrictions

Which subnets can I use for this hypervisor?

$ neutron host p06253927y321a1
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| all_subnets                | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| available_random_subnet    | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| available_subnets          | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| least_available_subnet     | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| most_available_subnet      | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
+----------------------------+--------------------------------------+
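The least/most_available_subnet fields suggest a simple selection policy. A hedged guess at the semantics, purely illustrative: the function name, data layout and subnets are made up.

import ipaddress

def most_available_subnet(allowed, used_count):
    # Among the subnets allowed on a host, pick the one with the most
    # unallocated addresses; used_count maps subnet -> addresses in use.
    return max(allowed,
               key=lambda net: net.num_addresses - used_count.get(net, 0))

allowed = [ipaddress.ip_network("192.0.2.0/25"),
           ipaddress.ip_network("198.51.100.0/24")]
print(most_available_subnet(allowed, {allowed[0]: 100, allowed[1]: 40}))
# -> 198.51.100.0/24 (216 free addresses vs 28)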

Phase 2. Neutron

● Single control plane, no partitioning (as with Nova cells)
● Scaling RabbitMQ was (is) a challenge

< 1000 Nodes
→ 3 Virtual Machines → 5x 64GB Virtual Machines
→ ~default rabbit configuration
→ ~default neutron configuration
→ looking ok(ish)

Phase 2. Neutron

● Single control plane, no partitioning (as with Nova cells)
● Scaling RabbitMQ was (is) a challenge

1200 Nodes
Cluster crashes once, crashes constantly
  Cannot allocate 1318267840 bytes of memory (of type "heap").
→ Statistics db issues: collect_statistics_interval = 60000
→ Agents (too) aggressively trying to reconnect: rabbit_retry_backoff = 60
→ Agents not re-connecting properly: restart neutron servers
→ Scale up Rabbit nodes, larger VMs

Phase 2. Neutron

● Single control plane, no partitioning (as with Nova cells)
● Scaling RabbitMQ was (is) a challenge

2000 Nodes
Cluster crashes periodically: lots of queued messages, until it goes

( neutron server )
→ rpc_thread_pool_size = 2048
→ rpc_conn_pool_size = 60
→ rpc_response_timeout = 120
→ rpc_workers = 4

( rabbit )
→ tcp_backlog: 4096
→ tcp_listen_options { reuseaddr: true, keepalive: true }
→ tcp_keepalive = true
→ rabbitmq_server_erl_args = '+K +A128 +P 1048576'
→ vm_memory_high_watermark = 0.8
→ ulimits (65536 for nofile/nproc, soft and hard)
→ cluster_partition_handling = autoheal

Phase 2. Neutron

● Single control plane, no partitioning (as with Nova cells)
● Scaling RabbitMQ was (is) a challenge

2400 Nodes
Cluster crashes less, but still happens: lots of queued messages, until it goes

( rabbit virtual machines )
→ ip link set %k txqueuelen 10000 (likely a udev rule; %k is the kernel device name placeholder)

( neutron agent )
→ report_interval = 43200

( neutron server )
→ agent_down_time = 86500 (kept above report_interval, so agents are not declared dead between reports)

Other considerations (not done, not helpful)
→ increase rpc_state_report_workers
→ heartbeat timeouts on the rabbit cluster

Phase 2. Neutron

● Single control plane, no partitioning (as with Nova cells)
● Scaling RabbitMQ was (is) a challenge

~5000 Nodes
Stable cluster
→ 5x 64GB Virtual Machines
Occasional network partitions
→ recovering most times, but not always
→ procedure for a quick cluster rebuild (~10min downtime)


Phase 2. Neutron

Migrating existing cells from Nova Network

● Puppet for reconfiguration
● Custom command for the live VM changes

$ openstack network cluster migrate --dry-run --host p06146676a327ab
$ openstack network cluster migrate --host p06146676a327ab
$ openstack network cluster migrate --cluster 'VMPOOL SXXXX-C-IPZZZ'

https://gitlab.cern.ch/cloud-infrastructure/python-neutronclient-cern

# Rename the Nova bridge to the name the Neutron agent expects,
# re-attaching the uplink and restoring the default route.
commands.extend([
    "brctl delif %s %s" % (NOVA_BRIDGE, raw_device),
    "ip link set %s down" % NOVA_BRIDGE,
    "ip link set %s name %s" % (NOVA_BRIDGE, CERN_NETWORK_BRIDGE),
    "brctl addif %s %s" % (CERN_NETWORK_BRIDGE, raw_device),
    "ip link set %s up" % CERN_NETWORK_BRIDGE,
    "ip route add default via %s dev %s" % (gw, CERN_NETWORK_BRIDGE),
])

# Move each running VM's tap device over, renaming it to the tap name
# Neutron expects for that port (matched by MAC address).
for instance in instances:
    ip = instance.addresses['CERN_NETWORK'][0]
    mac = ip['OS-EXT-IPS-MAC:mac_addr']
    nova_tap = nova_interfaces[mac]
    neutron_tap = neutron_interfaces[mac]
    commands.extend([
        "brctl delif %s %s" % (NOVA_BRIDGE, nova_tap),
        "ip link set %s name %s" % (nova_tap, neutron_tap),
    ])

Phase 3. SDN

● Current network deployment has significant limitations
● Limited IP Mobility
○ Segmented broadcast domains
○ Live migration limited to a single cluster
○ Ad-hoc tunnels for hardware retirement campaigns
● Hardware Repurposing
○ Multiple network domains (General, Services, …)
○ Services dedicated to a single domain
● No Floating IPs
● No Tenant/Private Networks

Phase 3. SDN

● Small prototype setups to evaluate functionality

                          Neutron/OpenVSwitch    OpenDaylight          OVN
DHCP                      Neutron                Neutron / Built-in    Built-in
Floating IPs              Yes                    Yes                   Yes
Distributed Routing       Only with DVR          Yes                   Yes
Tunneling Protocols       vxlan / GRE / geneve   vxlan / GRE / geneve  vxlan / geneve
Security Groups           IPTables               OpenFlow Native       OpenFlow Native + Logging
Load Balancing            Octavia                Octavia               Octavia / OVN Native
Acceleration              Limited DPDK           DPDK                  DPDK
Tracing                   tcpdump                tcpdump               ovn-trace
Physical Switch Integr.   L2 / L3                L2 / L3               L2 / L3


Phase 3. SDN

● In the end we picked OpenContrail / Tungsten

Phase 3. SDN

● In the end we picked OpenContrail / Tungsten

[Diagram: Tungsten architecture. OPENSTACK drives the CONTROLLERs; controllers peer over BGP with each other and with the WAN GATEWAY; XMPP to the VROUTER on each HYPERVISOR; NETCONF / EVPN / OVSDB to PHYSICAL devices; overlay traffic carried over MPLSoUDP/GRE and VXLAN]

Phase 3. SDN

● In the end we picked OpenContrail / Tungsten

[Same architecture diagram, highlighting the controller components: Cassandra, Config, Analytics, ...]

https://github.com/Juniper/contrail-helm-deployer

Phase 3. SDN

● In the end we picked OpenContrail / Tungsten

[Same architecture diagram]

● Neutron ML2 vs Monolithic
● Separate Region

Summary

● Scaling Neutron was not trivial, mostly due to the agents / RabbitMQ
○ Deployed in production and stable
● Currently finalizing the migration from Nova Network to Neutron
● Evaluated different SDN solutions
● Ongoing work deploying Tungsten in a new Region
● Looking forward to offering Floating IPs, Private Networks and much more

Questions?

Belmiro Moreira · belmiro.moreira@cern.ch · @belmiromoreira

Ricardo Rocha · ricardo.rocha@cern.ch · @ahcorporto