Date posted: 10-May-2015
© MIRANTIS 2013
Scaling Puppet Deployments
Matthew Mosesohn, Senior Deployment Engineer
Configure by hand
● Insert media into system
● Install OS
● Install software
● Configure software
● Verify
● Done?
Automate
● PXE installation
– Imaging
– Cobbler
– Foreman
– Razor
● Configuration
– Puppet
– Chef
– Salt
– Ansible
Puppet
● Powerful tool written in Ruby
● Extensible
● Built-in syntax checking
● Large community
● Used in many major companies, including:
– Google
– Cisco
– PayPal
– VMware
Our purpose
● FUEL is a tool designed to deploy OpenStack
● FUEL consists of:
– Astute: orchestration library built on MCollective
– Library: Puppet manifests
– Web: Python web app to deliver a rich user experience
– Cobbler: provisioning of bare metal
– Bootstrap: lightweight install environment for node discovery
Tiny example
● 1 master Cobbler and Puppet server
● 2-node OpenStack cluster
● OS deployment: 5 minutes
● Puppet configuration: 15 minutes each
● Total time: ~40 minutes
Typical example
● 1 master Cobbler and Puppet server
● 10-node OpenStack cluster
● OS deployment: 30 minutes total
● Puppet configuration: 15 minutes each
● Total time: ~2 hr 45 min
Stretching the limits
● 1 master Cobbler and Puppet server
● 100-node OpenStack cluster
● OS deployment: ?? minutes total
● Puppet configuration: 15 minutes each
● Total time: maybe 24 hours?
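A quick back-of-the-envelope check of the "maybe 24 hours" guess. This is an illustrative sketch, not FUEL code; the helper and its parameters are invented here:

```python
# Back-of-the-envelope model of single-master deployment time.
# total_hours() is a hypothetical helper, not part of FUEL.
def total_hours(nodes, os_minutes_total, puppet_minutes_each, concurrency=1):
    """Total wall-clock hours: OS provisioning plus Puppet runs,
    with at most `concurrency` Puppet runs in flight at once."""
    puppet_minutes = nodes * puppet_minutes_each / concurrency
    return (os_minutes_total + puppet_minutes) / 60.0

# 100 nodes at 15 min of Puppet each, fully serialized:
# 100 * 15 = 1500 min = 25 h of Puppet alone, close to the
# "maybe 24 hours" guess above.
print(total_hours(100, 0, 15))  # → 25.0
```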
How to get to 1,000?
● Physical limitations of disks
● Physical limitations of the network
● Puppet limitations
● Cobbler limitations
● Messaging/orchestration limitations
● Durability/patience of client applications
Approach: Scale the server!
● Pure speed. Don't care about anything else.
● Buy an expensive system with 2 SSDs in RAID-0, 12 cores, 256 GB memory, and bonded NICs
● Peak I/O: ~800 MB/s
How crowded is your network segment?
● More than 500 nodes on one network is bad
● Broadcast traffic will hinder normal traffic
● One lost packet means TFTP must fail and start over
● Make a second network and set up a DHCP relay
● Update your PXE server's DHCP configuration
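Once the relayed segment exists, the PXE server's DHCP configuration needs a pool for it. A minimal sketch of an ISC dhcpd.conf fragment, with hypothetical addresses and filenames:

```
# Hypothetical dhcpd.conf fragment on the PXE server. The second
# subnet sits behind a DHCP relay; requests arrive via the relay's
# giaddr and are answered from the matching pool.
subnet 10.0.1.0 netmask 255.255.255.0 {
  range 10.0.1.100 10.0.1.250;
  next-server 10.0.1.1;        # TFTP/PXE server
  filename "pxelinux.0";
}
subnet 10.0.2.0 netmask 255.255.255.0 {
  range 10.0.2.100 10.0.2.250;
  next-server 10.0.1.1;        # TFTP still served from the master
  filename "pxelinux.0";
}
```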
err: Could not retrieve catalog from remote server: Connection refused - connect(2)
Puppet load
● Catalog compile time: 12 s per node
● Serve files: 12 MB per host
● Receive and store a 500 KB report in YAML format
● Store in PuppetDB
How to avoid failure
● IPMI control of all nodes (expensive)
● Orchestration that can reset a host if it gets “stuck” along the way
● Staggered approach to avoid overload on the master
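The last two points can be sketched as a small batching loop. This is a hypothetical illustration; `deploy_node` and `reset_node` stand in for the real orchestration and IPMI calls:

```python
# Sketch of a staggered rollout with retry-on-stuck, assuming the
# orchestration layer exposes deploy/reset primitives.
def deploy_all(nodes, batch_size, deploy_node, reset_node, max_retries=2):
    """Deploy in fixed-size batches so the master is never hit by
    every node at once; power-cycle and retry hosts that get stuck."""
    failed = []
    for start in range(0, len(nodes), batch_size):
        for node in nodes[start:start + batch_size]:
            for _attempt in range(max_retries + 1):
                if deploy_node(node):   # True = converged successfully
                    break
                reset_node(node)        # e.g. IPMI power cycle
            else:
                failed.append(node)     # gave up after max_retries resets
    return failed
```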
How the pros do it
● Large US bank
● 2 Puppet CA servers
● 3 Puppet catalog masters
● DNS round robin for catalog servers
● 2,000 hosts
● Must stagger initial deployments
Conclusion
● Not fast enough
● Too much data
● Still a bottleneck
● Expensive hardware
Approach: Ditch Puppetmaster!
● Still need to provision a base OS
● Still need a package repository
● Still need to be fast
● Still need some “brain” to identify servers
Speed up provisioning
● Install every nth server to serve as a provisioning mirror, all in RAM
● TFTP must still come from the master server, but 30 minutes of pain for bootstrap is okay
● HTTP for OS installation can be balanced via DNS round robin across the mirrors
● Provision the mirror hosts last
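Picking every nth server as a mirror and installing the mirror hosts last could look like this. `plan_provisioning` is a hypothetical helper, not FUEL code:

```python
# Hypothetical planner: every nth node doubles as an in-RAM package
# mirror, and the mirror hosts themselves are provisioned last.
def plan_provisioning(nodes, mirror_every=50):
    """Return (install_order, mirrors): mirrors are every nth node,
    moved to the end of the install order."""
    mirrors = nodes[::mirror_every]
    mirror_set = set(mirrors)
    regular = [n for n in nodes if n not in mirror_set]
    return regular + mirrors, mirrors
```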
Package repository
● The YUM repository should be located close to the cluster
● Mirror via Cobbler/Foreman
● Or host it somewhere in your organization with fast disks
External Node Classifiers
● Any arbitrary script that tells Puppet what classes and resources to apply to a node
ENC providers include:
– Puppet Dashboard
– Foreman
– Hiera
– LDAP
– Amazon CloudFormation
– YAML file carried by pigeon
External Node Classifiers
● What they can provide:
– Puppet master hostname
– Environment name (production, devel, stage)
– Classes to use
– Puppet facts needed for installation
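Since an ENC is just a script the master runs with a node's certname, expecting YAML on stdout, a minimal sketch might look like this. The classification rule, class names, and parameter values are invented for illustration:

```python
#!/usr/bin/env python
# Minimal ENC sketch: the Puppet master invokes this script with a
# node's certname and reads YAML from stdout. All names below are
# hypothetical examples.
import sys

def classify(certname):
    # Invented rule: hostnames starting with "compute" get the
    # compute role; everything else gets the controller role.
    if certname.startswith("compute"):
        classes = ["openstack::compute"]
    else:
        classes = ["openstack::controller"]
    lines = ["---", "environment: production", "classes:"]
    lines += ["  - %s" % c for c in classes]
    lines += ["parameters:", "  puppet_master: puppet.example.com"]
    return "\n".join(lines)

if __name__ == "__main__":
    print(classify(sys.argv[1] if len(sys.argv) > 1 else "default"))
```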
Getting Puppet manifests to nodes
● How do you place manifests on a node?
● Without relying on one host, pick the most robust system available
Getting Puppet manifests to nodes
● Plain Git
– Version-controlled system
– Widely implemented
– Simple to get started
– Fits into Puppet's environment structure via branches
Getting Puppet manifests to nodes
● Puppet Librarian
– Created by Tim “Rodjek” Sharpe from GitHub
– Flexible manifest sources
– Can specify a Puppet “forge”
– Can retrieve from Git repositories
– Dependency handling
– Version specification optional
– Creates a local Git repository to track changes
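A Puppetfile for librarian-puppet might look like this; the module names and Git URL are examples only:

```
# Hypothetical Puppetfile; librarian-puppet resolves these sources
# and dependencies into a local, tracked modules directory.
forge "https://forge.puppetlabs.com"

mod "puppetlabs/stdlib"                  # version unspecified: take latest
mod "puppetlabs/apache", "0.9.0"         # pinned forge version
mod "openstack_manifests",
  :git => "git://example.com/openstack_manifests.git",
  :ref => "production"                   # branch maps to a Puppet environment
```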
Getting Puppet manifests to nodes
● RPM format
– Technique used by Sam Bashton
– Versioned as well
– As easy to deploy as any other package
– Requires a clever build process
Getting Puppet manifests to nodes
● RPM format magic
– Jenkins job takes Git code with manifests
– Runs puppet-lint on all Puppet code
– Creates a tarball of Puppet manifests and Hiera data
– Wraps it inside a package with a new version number
– Pushes the ready package to the software repository
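The packaging step could be sketched as an RPM spec fragment along these lines; the package name, version scheme, and paths are illustrative, not Bashton's actual spec:

```
# Hypothetical .spec fragment for packaging lint-checked manifests.
Name:      acme-puppet-manifests
# Jenkins bumps the version on every successful build
Version:   1.0.42
Release:   1
Summary:   Versioned Puppet manifests and Hiera data
License:   Proprietary
Source0:   manifests.tar.gz
BuildArch: noarch

%description
Puppet manifests and Hiera data, packaged after puppet-lint passes.

%prep
%setup -q -n manifests

%install
mkdir -p %{buildroot}/etc/puppet/environments/production
cp -r modules manifests hieradata %{buildroot}/etc/puppet/environments/production/

%files
/etc/puppet/environments/production
```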
Running local is better
● Deploying on great new hardware
● Faster catalog build
● No waiting for manifests or uploading reports
● No timeouts or connections refused
What about my precious logs?!
Rsyslog
● Scaling rsyslog requires lots of disk, but the disks don't have to be fast
● Rsyslog can throttle clients effectively
● Clients can hold logs until the server is ready to receive
● Everybody wins
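Client-side queueing is what lets nodes hold their logs. A hypothetical rsyslog.conf fragment along these lines gives disk-backed queueing with indefinite retries; the server name and sizes are examples:

```
# Hypothetical client-side rsyslog.conf fragment: queue on local disk
# and retry forever, so nothing is lost while the central server
# throttles or is down.
$WorkDirectory /var/spool/rsyslog
$ActionQueueType LinkedList
$ActionQueueFileName central_fwd       # spill to disk when memory fills
$ActionQueueMaxDiskSpace 1g
$ActionQueueSaveOnShutdown on
$ActionResumeRetryCount -1             # retry indefinitely
*.* @@logserver.example.com:514        # forward everything over TCP
```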
Doing the math
Stage               | Before                                          | After
Bootstrap OS        | 10 min                                          | 10 min (but that's okay)
Base OS provision   | 8 hrs (10 concurrent)                           | 30 min to set up 20 mirrors; 25-40 min to install (200 concurrent); 30 min to install the mirrors
Puppet provisioning | 10 d 10 hr (15 min × 1000 hosts, one at a time) | 45 min for all 3 controllers, one at a time; 20 min for compute nodes
Totals              | 12 days                                         | 2-3 hours
References
● http://www.tomshardware.com/reviews/ssd-raid-benchmark,3485-3.html
● http://www.masterzen.fr/2012/01/08/benchmarking-puppet-stacks/
● http://theforeman.org/manuals/1.3/index.html#3.5.5FactsandtheENC
● https://github.com/rodjek/librarian-puppet
● http://www.slideshare.net/PuppetLabs/sam-bashton
Ref commands
puppet agent --{summarize,test,debug,evaltrace,noop} | perl -pe 's/^/localtime().": "/e'
Time:
....
Nova paste api ini: 0.02
Package: 0.03
Notify: 0.03
Nova config: 0.10
File: 0.40
Exec: 0.56
Service: 1.39
Augeas: 1.56
Total: 11.85
Last run: 1379522172
Config retrieval: 7.73