5 scalability Cloudstack Developer Day

Date post: 19-Oct-2014
By Alex Huang, Architect, Cloud Platforms Group, Citrix Systems Inc.
Transcript
Page 1

CloudStack Scalability Testing, Development, Results, and Futures

Page 2

• Secure, multi-tenant cloud orchestration platform

– Turnkey platform for delivering IaaS clouds

– Hypervisor agnostic

– Highly scalable, secure and open

– Complete self-service portal

– Open source, open standards

– Deploys on premises

Apache CloudStack: a project in incubation

Page 3

[Figure: CloudStack deployment. Components shown: Internet, Admin, Load Balancer, Management Server Cluster, Primary MySQL, Backup MySQL, Object Storage, Router, L3 core switch, top-of-rack switches, servers, and Pods 1 to N in Availability Zone 1.]

Manage hosts, create VMs, virtual disks, virtual networks, meter usage, …

Page 4

Thinking about cloud orchestration at scale

• Host management

• Capacity management

• Which host to use when deploying a new VM

• Failure handling

• Security group propagation

• Set a goal

Page 5

We can’t afford this as our QA lab

Page 6

[Figure: test setup with User API, Admin API, a load balancer, four Mgmt. Servers, MySQL, and a Zone Simulator with its own MySQL database.]

Simulator enables scale testing

Page 7

[Figure: the same test setup as on the previous slide.]

Environment

• Mgmt. Server: 2 cores (4 with Hyper-Threading), 2.2 GHz Xeon, 16 GB RAM, 12 GB JVM heap

• MySQL server: single spinning disk (later a single SSD), 32 GB RAM, MySQL 5.5

Page 8

Page 9

Allocator performance is awful with 1000 hosts

• Two minutes to decide which host to use for a new VM!

• Capacity was being computed for every pod, repeatedly

• Fixed that, but it still took 12 seconds to decide

• Using host tags brought it down to 2 seconds

• Major changes were required to improve further

• In 2.2.0, capacity information is stored in the DB and the per-pod step is skipped altogether

• Let SQL SELECT do the work, and all is well
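A minimal sketch of the "store capacity in the DB and let SQL select" idea, using Python's built-in sqlite3 in place of MySQL. The `host_capacity` table, its columns, the sample rows, and the ranking policy (most free RAM first) are hypothetical illustrations, not CloudStack's actual schema or allocator.

```python
import sqlite3

# Hypothetical schema (not CloudStack's): one row of capacity data per host.
SCHEMA = """
CREATE TABLE host_capacity (
    host_id   INTEGER PRIMARY KEY,
    pod_id    INTEGER NOT NULL,
    cpu_total INTEGER NOT NULL,  -- MHz
    cpu_used  INTEGER NOT NULL,
    ram_total INTEGER NOT NULL,  -- MB
    ram_used  INTEGER NOT NULL,
    host_tag  TEXT
);
CREATE INDEX idx_host_tag ON host_capacity (host_tag);
"""


def make_demo_db():
    """Build an in-memory database with three sample hosts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO host_capacity VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 1, 8000, 7000, 16384, 2048, "gpu"),
            (2, 1, 8000, 1000, 16384, 8192, None),
            (3, 2, 8000, 2000, 32768, 4096, None),
        ],
    )
    return conn


def find_hosts(conn, cpu_needed, ram_needed, tag=None, limit=5):
    """Let the database filter and rank candidate hosts in one SELECT."""
    sql = (
        "SELECT host_id FROM host_capacity "
        "WHERE cpu_total - cpu_used >= ? AND ram_total - ram_used >= ?"
    )
    params = [cpu_needed, ram_needed]
    if tag is not None:  # a host tag narrows the candidate set further
        sql += " AND host_tag = ?"
        params.append(tag)
    sql += " ORDER BY ram_total - ram_used DESC, host_id LIMIT ?"
    params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = make_demo_db()
    print(find_hosts(conn, cpu_needed=2000, ram_needed=4096))  # [3, 2]
    print(find_hosts(conn, cpu_needed=500, ram_needed=1024, tag="gpu"))  # [1]
```

The point of the design is that no application code loops over pods: the capacity check, the tag filter, and the ranking all happen in a single query the database can serve from its tables and indexes.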

Page 10

Polling doesn’t scale

TRUE? FALSE? Sometimes, it is good enough

Page 11

Host management

• Check host state via a TCP connection

• Check every minute

• 30,000 checks per minute, i.e. 500 per second

• But each check takes 10 seconds, so 5,000 are in flight at once

• Without async I/O, that requires 5,000 threads…

• A single JVM can support 2,000+ threads, so this is concerning, but it may not be the limiting factor
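The 5,000 figure follows from Little's law, L = λW: 500 checks/s × 10 s per check = 5,000 checks in flight. With one thread per blocking check, that is 5,000 threads. A minimal sketch of the async I/O alternative, written with Python's asyncio rather than Java; `check_host`, `check_all`, and the concurrency cap are hypothetical names, not CloudStack code.

```python
import asyncio


async def check_host(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    await writer.wait_closed()
    return True


async def check_all(targets, max_in_flight=5000, timeout=10.0):
    """Check a list of (host, port) tuples from a single thread.

    A semaphore caps how many connection attempts are outstanding at once,
    so thousands of slow checks cost open sockets, not thousands of threads.
    """
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(host, port):
        async with sem:
            return await check_host(host, port, timeout)

    results = await asyncio.gather(*(bounded(h, p) for h, p in targets))
    return dict(zip(targets, results))
```

A once-per-minute sweep would call `asyncio.run(check_all(targets))`; slow or dead hosts simply hold a pending socket until their timeout instead of pinning a thread.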

Page 12

Host management

• What is the maximum feasible JVM heap size?

• Some people use heaps with hundreds of GB

• Commercial tools can help, but they cost money

• We decided to stay below 20 GB (GC concerns)

• How much CPU is required for background processing?

Page 13

CPU utilization while deploying 30,000 VMs on 30,000 hosts

[Chart: y axis CPU utilization (400% is the maximum); x axis time; segments along the time axis labeled 20,000, 5,000, 5,000 and Idle.]

Page 14

Deploy time from 25,000 to 30,000 VMs

[Chart: y axis seconds to deploy; x axis VM number (25,000 plus X).]

Page 15

Problem: agent load balancing

• Management servers start, stop, fail, and crash

• How do newly started Management Servers get agents and work?

• When a Management Server exits, how do the others pick up its load?

• When new hosts are added, how is the load distributed?

[Diagram: Mgmt Server 1 and Mgmt Server 2]
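A minimal sketch of one way to answer these questions: treat host-to-server ownership as a map and recompute even per-server targets whenever membership changes, moving only hosts whose owner is gone or whose owner is above its target. The `rebalance` function and its policy are hypothetical illustrations, not CloudStack's actual agent load-balancing code.

```python
from collections import defaultdict


def rebalance(assignment, servers):
    """Spread hosts evenly over the live management servers.

    assignment: dict mapping host id -> owning server name (may name a dead server)
    servers: non-empty list of live server names
    Hosts stay on their current live server unless it is above its target.
    """
    live = sorted(servers)
    base, extra = divmod(len(assignment), len(live))
    # Each server should own `base` hosts; the first `extra` servers own one more.
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(live)}

    owned = defaultdict(list)
    orphans = []  # hosts that need a new owner
    for host in sorted(assignment):
        owner = assignment[host]
        if owner in target:
            owned[owner].append(host)
        else:  # owner stopped or crashed
            orphans.append(host)

    for s in live:  # shed hosts from servers above target
        while len(owned[s]) > target[s]:
            orphans.append(owned[s].pop())
    for s in live:  # hand orphans to servers below target
        while len(owned[s]) < target[s]:
            owned[s].append(orphans.pop())

    return {host: s for s in live for host in owned[s]}
```

The same function covers all three cases: a crashed server's hosts become orphans, a newly started server starts below target and absorbs the excess from the others, and new hosts arrive as unowned entries.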

Page 16

Common use case timings at scale

• 30,000 hosts and 4 Management Servers

• 4 Management Servers running, 1 fails: 10 minutes to redistribute 7500 agents

• 3 Management Servers running, add a fourth: 40 minutes to redistribute load evenly

• 0 Management Servers running, start all 4 simultaneously: 16 minutes to connect to all 30,000 hosts
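Dividing the slide's numbers gives the average rates behind these timings. The second line assumes the 30,000 hosts were spread evenly, 10,000 per server, before the fourth server joined; that assumption is not stated on the slide.

```latex
\begin{aligned}
\text{1 of 4 fails:}\quad & \frac{7{,}500\ \text{agents}}{10 \times 60\ \text{s}} = 12.5\ \text{agents/s} \\
3 \to 4\ \text{servers:}\quad & \frac{3 \times (10{,}000 - 7{,}500)\ \text{agents}}{40 \times 60\ \text{s}} = \frac{7{,}500}{2{,}400} \approx 3.1\ \text{agents/s} \\
\text{cold start of 4:}\quad & \frac{30{,}000\ \text{hosts}}{16 \times 60\ \text{s}} = 31.25\ \text{hosts/s}
\end{aligned}
```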

IMPORTANT

Page 17

Understanding security groups

[Diagram: Web VMs in a Web Security Group and DB VMs in a DB Security Group]

Ingress rule: allow VMs in the Web Security Group to access VMs in the DB Security Group on port 3306
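A minimal sketch of what such an ingress rule means as a data model: membership maps VMs to groups, and a rule admits traffic from any member of the source group to any member of the destination group on one port. The group names, VM names, and the `is_allowed` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    src_group: str  # traffic must come from a member of this group
    dst_group: str  # ...and go to a member of this group
    port: int       # ...on this TCP port


# Hypothetical membership: each VM may belong to several security groups.
MEMBERSHIP = {
    "web-1": {"web"},
    "web-2": {"web"},
    "db-1": {"db"},
}

RULES = [IngressRule(src_group="web", dst_group="db", port=3306)]


def is_allowed(src_vm, dst_vm, port, membership=MEMBERSHIP, rules=RULES):
    """Ingress is denied unless some rule matches both VMs' groups and the port."""
    src_groups = membership.get(src_vm, set())
    dst_groups = membership.get(dst_vm, set())
    return any(
        r.port == port and r.src_group in src_groups and r.dst_group in dst_groups
        for r in rules
    )
```

Because the rule names groups rather than addresses, adding one Web VM changes what the firewall in front of every DB VM must accept; that fan-out is what makes propagation expensive at scale.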

Page 18

L3 isolation with distributed firewalls

[Diagram. VMs: Tenant 1 VM 1 (10.1.0.2), Tenant 2 VM 1 (10.1.0.3), Tenant 1 VM 2 (10.1.0.4). Network: Pod 1, Pod 2 and Pod 3 L2 switches, L3 core, load balancer, public Internet. Public IP addresses: 65.37.141.11, 65.37.141.24, 65.37.141.36, 65.37.141.80. Other addresses: 10.1.0.1, 10.1.8.1, 10.1.16.1.]

Page 19

L3 isolation with distributed firewalls

[Diagram: as on the previous slide, with Tenant 1 VM 3 (10.1.16.47) and Tenant 1 VM 4 (10.1.16.85) added.]

Page 20

L3 isolation with distributed firewalls

[Diagram: as on the previous slide, with Tenant 2 VM 2 (10.1.16.12) and Tenant 2 VM 3 (10.1.16.21) added.]

Page 21

One firewall per virtual machine

Page 22

[Figure: a large grid of VMs]

One million firewalls?

Page 23

Orchestrating hundreds of thousands of firewalls

Well-known software scaling techniques

• Message queues

• Consistency tradeoffs

• Idempotent configuration & retries

CloudStack uses

• Special-purpose queues

• Optimized for large security groups

• Eventual consistency for rule updates
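A minimal sketch of a special-purpose queue that combines these techniques: pending work is coalesced per VM, each job pushes the VM's complete desired rule set (so it is idempotent and safe to retry), and VMs that still fail are returned for a later pass, which is what makes the result eventually consistent. `CoalescingRuleQueue` and the `apply_rules` callback are hypothetical, not CloudStack's implementation.

```python
from collections import deque


class CoalescingRuleQueue:
    """Queue of VMs whose firewall rules need refreshing, one entry per VM."""

    def __init__(self):
        self._order = deque()  # FIFO of VM ids
        self._pending = set()  # VM ids currently queued

    def enqueue(self, vm_id):
        # Many membership changes for one VM collapse into a single job,
        # because the job reads the latest desired state when it runs.
        if vm_id not in self._pending:
            self._pending.add(vm_id)
            self._order.append(vm_id)

    def drain(self, apply_rules, max_attempts=3):
        """Run apply_rules(vm_id) for each queued VM, retrying on failure.

        Returns the VMs that still failed; the caller can enqueue them again.
        """
        failed = []
        while self._order:
            vm_id = self._order.popleft()
            self._pending.discard(vm_id)
            for _ in range(max_attempts):
                try:
                    apply_rules(vm_id)  # idempotent: same input, same end state
                    break
                except Exception:
                    continue
            else:  # every attempt raised
                failed.append(vm_id)
        return failed

    def __len__(self):
        return len(self._order)
```

Coalescing matters most for large groups: when many VMs join a group in quick succession, each affected VM is refreshed once with the final membership rather than once per change.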

Page 24

Problem: firewall rules explosion in dom0

Allow Security Group {Web} on TCP port 3060

-A FORWARD -p tcp -m tcp --dport 3060 --src 10.1.16.31 -j ACCEPT
-A FORWARD -p tcp -m tcp --dport 3060 --src 10.1.45.112 -j ACCEPT
-A FORWARD -p tcp -m tcp --dport 3060 --src 10.1.189.5 -j ACCEPT
-A FORWARD -p tcp -m tcp --dport 3060 --src 10.21.9.77 -j ACCEPT

Performance suffers for large security groups

Page 25

Problem: firewall rules explosion in dom0

Fix with ipsets:

ipset -N web_sg iptreemap
ipset -A web_sg 10.1.16.31
ipset -A web_sg 10.1.16.112
ipset -A web_sg 10.1.189.5
ipset -A web_sg 10.21.9.77
-A FORWARD -p tcp -m tcp --dport 3060 -m set --match-set web_sg src -j ACCEPT

See also http://daemonkeeper.net/781/mass-blocking-ip-addresses-with-ipset/
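A small generator makes the difference in rule count concrete. The per-address approach adds one iptables rule per group member; the ipset approach keeps a single iptables rule and adds members to the set. The function names are hypothetical. The command strings mirror the slides, including the `iptreemap` set type, which belongs to older ipset releases; newer releases provide set types such as `hash:ip` (for example `ipset create web_sg hash:ip`).

```python
def per_ip_rules(group_ips, port):
    """One iptables rule per member IP: the chain grows with the group."""
    return [
        f"-A FORWARD -p tcp -m tcp --dport {port} --src {ip} -j ACCEPT"
        for ip in group_ips
    ]


def ipset_rules(set_name, group_ips, port):
    """One set holding every member IP, matched by a single iptables rule."""
    ipset_cmds = [f"ipset -N {set_name} iptreemap"]
    ipset_cmds += [f"ipset -A {set_name} {ip}" for ip in group_ips]
    iptables = [
        f"-A FORWARD -p tcp -m tcp --dport {port} "
        f"-m set --match-set {set_name} src -j ACCEPT"
    ]
    return ipset_cmds, iptables


if __name__ == "__main__":
    web_sg = ["10.1.16.31", "10.1.16.112", "10.1.189.5", "10.21.9.77"]
    print(len(per_ip_rules(web_sg, 3060)))  # 4 iptables rules
    cmds, rules = ipset_rules("web_sg", web_sg, 3060)
    print(len(rules))  # 1 iptables rule
```

With per-IP rules, each packet is compared against the chain one rule at a time, so cost grows with group size. With a set, the single rule performs one set lookup, and a membership change is one `ipset -A` or `ipset -D` without rewriting the chain.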

Page 26

Security group propagation time

[Chart: y axis seconds to fully sync; x axis number of VMs in the security group.]

Page 27

Problem: database connection management

• Scale testing resulted in several “too many open connections” errors from MySQL

• Common problem: holding open connections while doing long-running operations

• Fixing it took some code cleanup and refactoring

• No longer an issue

• MySQL supports 10,000 connections

• CloudStack is far below that
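A minimal sketch of the fix described above, using Python's sqlite3 as a stand-in for MySQL: open a connection only for each short transaction and release it before the long-running step, so a slow operation never pins a connection. `CountingPool`, `deploy_vm`, and the `vm_state` table are hypothetical; the pool is a single-threaded toy that only counts open connections.

```python
import contextlib
import sqlite3


class CountingPool:
    """Toy, single-threaded connection source that tracks open connections."""

    def __init__(self, path):
        self.path = path
        self.open_now = 0
        self.peak = 0

    @contextlib.contextmanager
    def connection(self):
        conn = sqlite3.connect(self.path)
        self.open_now += 1
        self.peak = max(self.peak, self.open_now)
        try:
            yield conn
            conn.commit()  # commit only if the block finished without an error
        finally:
            conn.close()  # closing without commit discards an unfinished transaction
            self.open_now -= 1


def deploy_vm(pool, vm_id, long_running_step):
    """Record a VM deployment without holding a DB connection during slow work."""
    # Short transaction 1: record intent, then give the connection back.
    with pool.connection() as conn:
        conn.execute("INSERT INTO vm_state VALUES (?, 'Starting')", (vm_id,))

    # Slow work (e.g. waiting on a hypervisor) runs with no connection held.
    long_running_step(vm_id)

    # Short transaction 2: record the outcome.
    with pool.connection() as conn:
        conn.execute(
            "UPDATE vm_state SET state = 'Running' WHERE vm_id = ?", (vm_id,)
        )
```

Splitting the work this way keeps the number of open connections proportional to concurrent short transactions rather than to concurrent deployments, which is what keeps a management server well below MySQL's connection limit.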

Page 28

DB connections per Management Server (MS) while deploying 30,000 VMs

[Chart: y axis number of DB connections; x axis time; segments along the time axis labeled 20,000, 5,000 and 5,000.]

Page 29

Other considerations (beyond control plane)

• Network design and devices

• Object store scalability

• Per-host and cluster scalability

• Storage

• Understand your workload

Page 30

Future work

• Improve simulator accuracy

• Publish results of advanced network (VLAN) testing

• Verify the assumption that VM density does not affect scalability

Page 31

More information and joining the project

Project web site:

http://incubator.apache.org/projects/cloudstack.html

Mailing lists:

[email protected]

[email protected]

Scalability study:

http://wiki.cloudstack.org/pages/viewpage.action?pageId=14320020

Page 32

Q&A

