Can you trust Neutron? A tour of scalability and reliability improvements from Havana to Juno
Salvatore Orlando (@taturiello), Aaron Rosen (@aaronorosen)
From Havana to Juno
● 12 months
● 1672 commits
● +147765 -70127 lines of code (excluding changes in neutron/locale/*)
But... did it really get any better?
Measuring scalability - Process
● Goal: Validate agent scalability under varying load
  o In this talk we’ll discuss the L2 agent only, sorry!
● Testbed: single server OpenStack installation
● Methodology: run several experiments increasing the number of servers concurrently created
  o Number of servers ranging from 1 to 20
  o Every experiment is repeated 20 times
  o For each metric, study mean, median, and variance
Measuring scalability - Metrics
Instance metrics (t_start = instance created):
● t_active - time until the instance reaches active state
● t_ping - time until the instance can be pinged
● t_allocate_net - time spent configuring networking for the instance
Port metrics (t_start = VIF plugged):
● t_proc - time until the agent starts processing the port
● t_up - time until the port is wired
● t_dhcp - time for adding DHCP info for the new port
Measuring scalability - Results
t_up in Havana and Juno - a rather remarkable difference!
Measuring scalability - Results
t_allocate_net almost constant in Juno: the growth trend is only 15% of the one seen in Havana
Measuring scalability - Results
● VM failure rate analysis
  o Failure == error while creating the VM, or inability to ping it within a 3-minute timeout
● Juno is decently reliable (Havana not as much…)
Analysing progress
[Diagram: progress across releases - Folsom, Grizzly, Havana, Icehouse, Juno]
How the software improved
● Boot VMs only once network is wired
● Remove choke points from L2 agents
● Streamline security group RPC
● Better router processing in L3 agents
● Reporting floating IP processing status
● Many others… which unfortunately won’t fit into the time allocated to this talk
More results
● Virtually no improvements in time to ping an instance
- As the tests are executed on a single host, IO contention between instances is the main bottleneck.
- “Time to ping” is slowed down by longer instance boot times
● Instances are slower to go to “ACTIVE” than they were in Havana
- This is actually a desired feature
- Indeed, it’s the reason why the failure rate in Juno is 0 even with 20 concurrent instances
Nova/Neutron Event reporting
Problem: Nova displays cached IPAM info about an instance from Neutron, and the cache is updated slowly…
1. Associate floating IP to port (neutron-api)
2. Show me instance! (nova-api) → “Wat? No floating IP?”
Nova/Neutron Event reporting
Solution: Neutron sends events to Nova on IPAM changes, causing Nova to update its cache.
1. Associate floating IP to port (neutron-api)
2. neutron-api sends a network-changed event for instance X to nova-api
3. nova-api dispatches the event to the compute host
4. nova-compute updates the network cache for instance X
5. Show me instance! → “I haz floating IP”
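As a rough illustration, this is how a Neutron-side notifier might push such an event through Nova’s os-server-external-events API using python-novaclient. The client setup and UUID below are purely illustrative; Neutron’s actual notifier lives in neutron/notifiers/nova.py and reads its credentials from configuration.

from novaclient import client as nova_client

# Illustrative credentials; the real notifier gets these from neutron.conf.
nova = nova_client.Client('2', 'user', 'password', 'project',
                          'http://keystone:5000/v2.0')

# Tell Nova that networking info changed for this instance, so it
# refreshes its cached network info instead of serving stale data.
nova.server_external_events.create([{
    'name': 'network-changed',
    'server_uuid': 'INSTANCE-UUID',  # placeholder for the instance UUID
}])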
Nova/Neutron Event reporting
Problem: Instances would go active before the network was wired. Some DHCP clients (such as the one in CirrOS images) don’t keep retrying...
1. Boot instance (nova-api) → “W00T, Active!”
2. Ready?!? … Timeout… Hrm?!?
3. ssh instance…
Nova/Neutron Event reporting
Solution: Neutron sends an event to Nova when the network is ready.
1. Boot instance (nova-api → nova-scheduler → nova-compute)
2. Allocate network for the instance (nova-compute → neutron-api)
3. The VM is started in a paused state
1B. Port X becomes active (Neutron backend)
2B. Event network-vif-plugged: port X (neutron-api → nova-api)
3B. The VM is unpaused
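A minimal sketch of the compute-side handshake, under the assumption of a hypothetical wait_for_event() helper. This is not Nova’s actual code, just the shape of the idea:

import eventlet

VIF_PLUGGING_TIMEOUT = 300    # mirrors vif_plugging_timeout in nova.conf
VIF_PLUGGING_IS_FATAL = True  # mirrors vif_plugging_is_fatal

def wait_for_event(name, tag):
    """Hypothetical helper: block until the named external event
    (e.g. network-vif-plugged for this port) arrives from Neutron."""
    raise NotImplementedError

def boot_instance(driver, instance, port_ids):
    # Start the VM paused so its DHCP client cannot fire (and give up)
    # before the network is wired.
    driver.spawn(instance, paused=True)
    try:
        with eventlet.Timeout(VIF_PLUGGING_TIMEOUT):
            for port_id in port_ids:
                wait_for_event('network-vif-plugged', tag=port_id)
    except eventlet.Timeout:
        if VIF_PLUGGING_IS_FATAL:
            driver.destroy(instance)
            raise
    # All VIFs are wired (or the timeout was non-fatal): let it run.
    driver.unpause(instance)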
Enabling/disabling event reporting
Settings in nova.conf:

vif_plugging_timeout = 300
vif_plugging_is_fatal = True

vif_plugging_timeout is how long nova-compute waits for the network-vif-plugged event; vif_plugging_is_fatal controls whether a timeout fails the boot or lets the instance start anyway.
Speeding up L2 interface processing
Problem - device processing delayed by:
- inefficient server/agent interface
- preemptive behaviour of security group callbacks
- pedantic polling of interfaces on the integration bridge
- superficial analysis of devices to process

Solution:
- ovsdb-monitor triggers interface processing only when changes are detected (see the sketch below)
- The Neutron server performs at most 2 RPC calls over AMQP for each API operation (only 1 call in most cases)
- The L2 agent queries the server only once to retrieve interface details
- Security group updates are processed in the same loop as interfaces, thus avoiding starvation
- The agent only processes interfaces which are ready to be used - and, most importantly, processes them only once!
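A rough sketch of the ovsdb-monitor idea: instead of polling every port on the integration bridge at each loop iteration, spawn ovsdb-client monitor and only run the processing loop when it reports a change. The column list and callback here are illustrative, not the agent’s exact invocation:

import subprocess

def watch_interfaces(on_change):
    # Stream changes to the OVSDB Interface table; any output means an
    # interface was added, removed, or modified on the bridge.
    proc = subprocess.Popen(
        ['ovsdb-client', 'monitor', 'Interface', 'name,ofport',
         '--format=json'],
        stdout=subprocess.PIPE, universal_newlines=True)
    for line in proc.stdout:
        on_change(line)  # trigger the agent's processing loop

# Example: watch_interfaces(lambda change: print('ports changed:', change))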
Streamlining security group RPCs
Problem - exponential complexity
The payload of the RPC call to retrieve security group rules grows exponentially as the number of devices increases.
Solution: restructure the format of the payload exchanged between agent and server, removing data redundancy. With the new payload format, security group rules are not repeated anymore.
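To make the redundancy concrete, here is a hedged before/after sketch of the payload shape; the field names are illustrative, not the exact RPC schema:

# One rule, shared by every port in the group.
RULE_SSH = {'direction': 'ingress', 'protocol': 'tcp',
            'port_range_min': 22, 'port_range_max': 22}

# Before: every device carries full copies of its rules, so a shared
# rule is serialized once per port and the payload grows with
# ports x rules.
old_payload = {
    'port-1': {'security_group_rules': [RULE_SSH]},
    'port-2': {'security_group_rules': [RULE_SSH]},  # duplicated data
}

# After: rules are serialized once per security group, and devices
# merely reference the groups they belong to.
new_payload = {
    'security_groups': {'sg-1': [RULE_SSH]},
    'devices': {
        'port-1': {'security_groups': ['sg-1']},
        'port-2': {'security_groups': ['sg-1']},
    },
}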
Streamlining security group RPCs
Credits: Miguel Angel Ajo Pelayohttp://www.ajo.es/post/95269040924/neutron-security-group-rules-for-devices-rpc-rewrite
[Charts: RPC message payload size vs. # of ports; RPC execution time vs. # of ports]
Reducing router processing times
Problems:
● Router synchronization starves RPC handling
● Not enough parallelism in router and floating IP processing
Solution:● Router synchronization tasks and RPC messages are added to a priority
queue. Items pulled from the queue are processed in separate threads.● Apply iptables command in a non blocking fashion
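A simplified sketch of the priority-queue pattern; the real L3 agent uses green threads and a more elaborate queue, and process_router() here is a hypothetical stand-in:

import queue
import threading

HIGH, LOW = 0, 1
work = queue.PriorityQueue()

def process_router(router_id):
    """Hypothetical stand-in for the agent's per-router processing."""
    print('processing router', router_id)

def enqueue_rpc_update(router_id):
    work.put((HIGH, router_id))  # RPC updates jump ahead of full syncs

def enqueue_full_sync(router_id):
    work.put((LOW, router_id))

def worker():
    while True:
        _priority, router_id = work.get()
        process_router(router_id)
        work.task_done()

# Several workers process routers in parallel instead of one at a time.
for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()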
Know your floating IP status
Problem: there was no way to know whether your floating IP is ready or not (beyond pinging it, obviously).
Solution:
- Introducing the concept of operational status for floating IPs.
- The L3 agent calls back the server to confirm successful floating IP creation (ACTIVE), or an error (DOWN).
- The status defaults to DOWN, goes ACTIVE upon floating IP association, and back to DOWN when the floating IP is disassociated.
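With that status in place, a client can poll instead of pinging. A minimal sketch with python-neutronclient; the credentials and timeout are illustrative:

import time
from neutronclient.v2_0 import client

neutron = client.Client(username='user', password='password',
                        tenant_name='project',
                        auth_url='http://keystone:5000/v2.0')

def wait_until_active(floatingip_id, timeout=60):
    # Poll the floating IP's operational status until the L3 agent
    # reports it as wired (ACTIVE) or the timeout expires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        fip = neutron.show_floatingip(floatingip_id)['floatingip']
        if fip['status'] == 'ACTIVE':
            return fip
        time.sleep(2)
    raise RuntimeError('floating IP %s is still DOWN' % floatingip_id)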
Other enhancements (in brief)
● Multiple REST API workers
● Multiple RPC over AMQP workers (both configurable; see the snippet after this list)
● Better IP address recycling
● Removal of several locking queries
  o i.e., SELECT … FOR UPDATE statements
● Removal of conditions triggering LOCK WAIT timeout errors
  o a bug triggered by eventlet yielding within a transaction
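For reference, the worker counts map to two options in neutron.conf; the values below are illustrative, not recommendations:

[DEFAULT]
# Number of separate processes serving the REST API
api_workers = 4
# Number of separate processes consuming RPC over AMQP
rpc_workers = 4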
Where we are...
● The L2 agent’s scalability improved considerably over the past 12 months
  o Results were measured with OVS only, but the same considerations apply to Linux Bridge as well
● Security groups can now be used even in very large deployments
● The Nova/Neutron interface is much more reliable
  o A server is booted only when the network for it is wired
  o Faster, less chatty communication
● Some progress on resource status tracking
  o Far from optimal, but at least now you can know when your floating IP is ready to use...
… and where we want to be
● There is still a lot of room for improvement in the agents
  o E.g., the OVS agent still scans all ports on the integration bridge at each iteration
● The Nova/Neutron interface is better, but still far from ideal
  o Enhanced caching on the Nova side could avoid a lot of round trips to Neutron
● Little to nothing has been done for tracking async operations and resource status. For example:
  o there is no way to know whether DHCP info is ready for a port
  o security group updates are processed asynchronously, but it is impossible to know when processing completes
Final thoughts
● “Much better” is different from “ideal”
  o ≅ 3 seconds for wiring an interface may not be ideal for many applications
  o scalability limits should be addressed even if they involve architectural changes
● What about data plane scalability?
● What about API usability?