PuppetCamp SEA 1 - Puppet Deployment at OnApp

Post on 19-Jun-2015

description

Wai Keen Woon, CTO CDN Division OnApp Malaysia, gave an interesting overview of what the Puppet architecture at OnApp looks like. The CDN division at OnApp is a large provider of CDN services, and as such makes a very interesting candidate for a case study.

transcript

Puppet Deployment at OnApp

Wai Keen Woon CTO, CDN Division waikeen.woon@onapp.com

WARNING

<ObligatoryPlug>

About OnApp

OnApp launched July 1st 2010

Backed by LDC

The leading cloud management software for hosts

The instant global CDN for hosts

Deep industry knowledge

100+ employees in US, EU, APAC

A leading provider of software for hosts

Vital Statistics

1 in 3 public clouds

cloud deployments

global clients

800+

300+

Customer Stories

Instant CDN that gives you…

•  75+ PoPs
•  low cost, high margin
•  get paid for idle capacity

OK.

</ObligatoryPlug>

Systems Overview

•  Core & Development
   •  ~20 physical servers
   •  ~200 VMs
   •  Homogeneous environment – 64-bit Debian everywhere
   •  Mainly use OpenVZ and KVM for virtualization
•  CDN Delivery Edge Servers
   •  100+ servers in 60+ cities
   •  Running on the OnApp platform – either Xen or KVM
•  Puppet integral to our setup – since day 1

Why Puppet?

•  More reliable configuration of servers. Less need to “run ssh in a for loop” and miss something out.
•  Self-documenting – our manifests are almost able to bootstrap an empty server.
   •  Our manifests can't bootstrap an empty environment yet.
   •  Limitation – manifests describe what/where/how something is set up, but don't describe *why*.
•  Nice syntax – easy on the eyes. Comprehensive built-in resource types. Able to fall back to dumb ways of doing things if required (use file, exec et al).

Core Infra Environments

•  Systems manifest describes everything.
•  Three environments: alpha, beta, omega

What Would OnApp Set Up...

•  Essential utilities (tcpdump, less, vim, etc).
•  Users & their SSH keys, sudoers.
   •  Developer's shell => /bin/false if production
•  Base firewall rules.
•  Nagios agent.
•  Set uniform locale settings: UTC timezone, en_US.UTF-8 locale.
•  SMTP that smarthosts to our central relay.
•  Syslogd that sends logs to our central logging server.
•  Finally, the services.

Core Infra Manifest Excerpt

$portal_domain  = "portal.alpha.onappcdn.com"
$portal_db_host = "portal.alpha.onappcdn.com"
$portal_db_user = "aflexi_webportal"

$auth_nameservers = {
  "ns1" => "175.143.72.214",
  "ns2" => "175.143.72.214",
  "ns3" => "175.143.72.214",
  "ns4" => "175.143.72.214",
}

$monitoring_host_server = [
  "monitoring.alpha.onappcdn.com",
  "dns.alpha.onappcdn.com"
]

node "monitoring.alpha.onappcdn.com" {
  include base
  include s_db_monitoring
  include s_monitoring_server
  include collectd::rrdcached
  include s_munin
  include s_monitoring_alerts
  include s_monitoring_graph
}

class collectd::rrdcached {
  package { "rrdcached":
    ensure => latest,
  }

  service { "rrdcached":
    ensure => running,
  }
}

BLUE – env config definitions
RED – node definitions
GREEN – class definitions

Package Repo Integration

•  Jenkins builds debs of our code and stores them in an apt repository for the environment they are built for.
•  Puppet keeps packages up-to-date (ensure => latest) and restarts services on package upgrades.

puppet-agent[25431]: (/Stage[main]/Debian/Exec[apt-get-update]/returns) executed successfully
puppet-agent[25431]: (/Stage[main]/Python::Aflexi::Mq/Package[python-aflexi-mqcore]/ensure) ensure changed '7065.20120530.113915-1' to '7066.20120604.090916-1'
puppet-agent[25431]: (/Stage[main]/S_mq/Service[worker-rabbitmq]) Triggered 'refresh' from 1 events
puppet-agent[25431]: Finished catalog run in 16.08 seconds
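
The upgrade recorded in the log above can be pulled out mechanically. A minimal Python sketch, assuming the message format is exactly the one shown on this slide (other Puppet versions may word it differently); the function name `package_upgrades` is made up for illustration:

```python
import re

# Pattern modeled only on the single "ensure changed" line shown on the slide.
ENSURE_CHANGED = re.compile(
    r"Package\[(?P<package>[^\]]+)\]/ensure\) "
    r"ensure changed '(?P<old>[^']+)' to '(?P<new>[^']+)'"
)


def package_upgrades(log_lines):
    """Return (package, old_version, new_version) for each upgrade logged."""
    upgrades = []
    for line in log_lines:
        match = ENSURE_CHANGED.search(line)
        if match:
            upgrades.append((match["package"], match["old"], match["new"]))
    return upgrades


if __name__ == "__main__":
    sample = [
        "puppet-agent[25431]: (/Stage[main]/Python::Aflexi::Mq/"
        "Package[python-aflexi-mqcore]/ensure) ensure changed "
        "'7065.20120530.113915-1' to '7066.20120604.090916-1'",
        "puppet-agent[25431]: Finished catalog run in 16.08 seconds",
    ]
    for package, old, new in package_upgrades(sample):
        print(f"{package}: {old} -> {new}")
```

Something like this could feed a deploy report, so that it is visible which build of each package an environment actually picked up.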

Nagios Integration

•  Puppet plugs into Nagios – uses “exported resources”

Nagios Integration

Server manifest
*exports the service that is checked

@@nagios_service { "check_load_$fqdn":
  check_command       => "check_nrpe_1arg!check_load",
  use                 => "generic-service",
  host_name           => $fqdn,
  service_description => "check_load",
  tag                 => $domain,
}

Nagios service manifest
*collects the resources to check

Nagios_service <<| tag == "onappcdn.com" |>> {
  target  => "/etc/n3/conf.d/services.cfg",
  require => Package["nagios3"],
  notify  => Exec["reload-nagios"],
}
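
The export/collect pattern can be pictured with a toy model in Python. This is only an illustration of the idea, not how Puppet implements it: the `ResourceStore` class, its method names, and the example hostnames and path are all invented for the sketch.

```python
from collections import defaultdict


class ResourceStore:
    """Toy stand-in for the shared store behind exported resources."""

    def __init__(self):
        self._exported = defaultdict(list)

    def export(self, resource_type, title, tag, **params):
        # Comparable in spirit to declaring @@resource_type { title: ... }
        self._exported[resource_type].append(
            {"title": title, "tag": tag, **params}
        )

    def collect(self, resource_type, tag, **overrides):
        # Comparable in spirit to Resource_type <<| tag == "..." |>> { ... }:
        # pick resources by tag and apply the collector's extra attributes.
        return [
            {**resource, **overrides}
            for resource in self._exported[resource_type]
            if resource["tag"] == tag
        ]


if __name__ == "__main__":
    store = ResourceStore()
    # Each monitored node exports its own check (hostnames are placeholders).
    for fqdn in ("a.example.com", "b.example.com"):
        store.export(
            "nagios_service",
            f"check_load_{fqdn}",
            tag="example.com",
            host_name=fqdn,
            check_command="check_nrpe_1arg!check_load",
        )
    # The monitoring server collects everything carrying its tag.
    for service in store.collect(
        "nagios_service", tag="example.com", target="/tmp/services.cfg"
    ):
        print(service["title"], "->", service["target"])
```

The point of the pattern is that the monitored node owns the definition of its check, while the monitoring server only decides which tag to collect and where to write the result.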

Nagios Integration

•  What's logged on the nagios server when puppet runs?

puppet-agent[15293]: (/Stage[main]/Nagios::Monitor_private/Nagios_host[hrm.onappcdn.com]/ensure) created
puppet-agent[15293]: (/Stage[main]/Nagios::Monitor_private/Nagios_service[check_load_hrm.onappcdn.com]/ensure) created
nagios3: Nagios 3.2.1 starting... (PID=5601)
puppet-agent[15293]: (/Stage[main]/Nagios::Base/Exec[reload-nagios]) Triggered 'refresh' from 8 events

Monitoring Puppet Itself

•  Lots of tools/dashboards out there to achieve this.
•  For us: “grep -i err */syslog”. Dumb, but works until we need to Really Address it.
•  Common issues:
   •  Puppet gets “stuck”. And only one puppet instance can run at any one time.
   •  Manifest errors – syntax, merge issues.
   •  Badly-written manifests (vague dependencies, conditions/commands not robust enough).
   •  An important dependent resource failing (e.g. apt-get install fails due to dpkg-configure error).
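
The grep one-liner translates almost directly into Python. A rough sketch, with one addition that is not in the original command: an optional filter on the program name, so that only puppet-agent lines are reported. The function names are hypothetical.

```python
import glob
import re

ERR = re.compile(r"err", re.IGNORECASE)  # same match as `grep -i err`


def puppet_errors(lines, program="puppet-agent"):
    """Keep lines containing 'err' (any case); optionally only from `program`."""
    return [
        line
        for line in lines
        if ERR.search(line) and (program is None or program in line)
    ]


def scan_syslogs(pattern="*/syslog", program="puppet-agent"):
    """Apply puppet_errors to every file matching the glob, like */syslog."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            hits.extend(
                (path, line.rstrip("\n"))
                for line in puppet_errors(handle, program)
            )
    return hits


if __name__ == "__main__":
    for path, line in scan_syslogs():
        print(f"{path}: {line}")
```

Like the grep, this is a substring match, so it also flags harmless words that happen to contain "err". Passing `program=None` reproduces the unfiltered behaviour.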

File/Dir Organization

•  We use git for revision control of our puppet manifests.
•  Style we adopted mainly comes from Hunter Haugen*
•  A branch for each environment, plus a “common” branch.
•  Each branch checked out as a separate directory in /etc/puppet/environments/$env
•  And puppetmaster's includedir configured to that directory.

* - http://hunnur.com/blog/2010/10/dynamic-git-branch-puppet-environments/

•  Common branch
     manifests/
       alpha.pp
       beta.pp
     modules/
       base/
       users/
•  Alpha env branch
     modules/
       python/
       services/
       nameserver/
•  Beta env branch
     modules/
       python/
       services/
       nameserver/
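
The “one directory per branch” layout can be sketched as a list of git commands. This is an assumption about how one might populate the directories, not a description of the actual OnApp tooling (the linked blog post uses its own mechanism); the repository URL and the function name are placeholders.

```python
import posixpath


def environment_checkouts(repo_url, branches, base_dir="/etc/puppet/environments"):
    """Build one `git clone` argv per branch, each into base_dir/<branch>."""
    return [
        [
            "git", "clone",
            "--branch", branch,
            "--single-branch",
            repo_url,
            posixpath.join(base_dir, branch),
        ]
        for branch in branches
    ]


if __name__ == "__main__":
    # Placeholder URL; printing the commands instead of running them.
    for argv in environment_checkouts(
        "git@git.example.com:puppet.git", ["common", "alpha", "beta", "omega"]
    ):
        print(" ".join(argv))
```

Returning argv lists rather than running git keeps the sketch side-effect free; each list could be handed to `subprocess.run` once the paths have been reviewed.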

File/Dir Organization

•  Common goes into its own branch – for convenience; less merging needed for manifests that we are Really Sure won't differ between environments.
•  System manifest into common/manifests/$env.pp
   •  Initially tried putting manifest into alpha/beta/omega branches as site.pp – merge hell.
•  Introduced extra variable - $effective_env
   •  Abstracts the puppet environment name from the environment that the manifest runs in.

File/Dir Organization

•  Hotfixes branch off omega and are merged to alpha/beta/omega.
•  Development branches off alpha
   •  This branch can be trialed as a separate environment (use --environment to specify custom env on puppet client).
   •  Merge to alpha → beta → omega.
   •  Or merge as feature branch to any other environment.
•  “git diff branchA branchB” - differences are shown clearly between environments.
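
The hotfix flow (branch off omega, merge into every environment) can be written down as a command list. A small Python sketch, assuming plain `git checkout` and `git merge` with no extra flags; the function name and the example branch name are invented.

```python
ENVIRONMENTS = ["alpha", "beta", "omega"]


def hotfix_merge_commands(hotfix_branch, environments=ENVIRONMENTS):
    """git commands that merge a hotfix branch into each environment branch."""
    commands = []
    for env in environments:
        commands.append(["git", "checkout", env])
        commands.append(["git", "merge", hotfix_branch])
    return commands


if __name__ == "__main__":
    # "hotfix-example" is a placeholder branch name.
    for argv in hotfix_merge_commands("hotfix-example"):
        print(" ".join(argv))
```

Only the hotfix case is modelled here, because the slide states its merge targets explicitly.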

Edge Servers

•  Our edge servers are hosted on OnApp cloud (only).
•  When creating an edge server, the cloud control panel
   •  Instantiates a VM from a lightly-customized Debian image.
   •  Configures the package repositories.
   •  Issues a puppet run to set up.
•  Advantage of setting it up through puppet instead of a “gold image” - our system can be installed on bare metal if needed, can be reproducibly installed on $future_debian_release

Edge Servers – External Node Classifier

•  No text manifest – all code, using “external node classifier”.
•  Assign variables and classes specific to the edge server through node classifier. E.g. its password, the services it runs.
•  In Python:
     import yaml  # PyYAML

     output = {}
     output["classes"] = ["class1", "class2"]
     output["parameters"] = {"param1": "value1"}
     print(yaml.dump(output))

Edge Servers – External Node Classifier

•  This YAML-encoded structure...
     $ puppet-nodeclassifier 85206671.onappcdn.com
     classes: [base, nginx]
     parameters: { edge_secret_key: 86zFsrM7Ma, monitoring_domain: monitoring.alpha.onappcdn.com }
•  … is equivalent to this textual manifest:
     node "85206671.onappcdn.com" {
       $edge_secret_key = "86zFsrM7Ma"
       $monitoring_domain = "monitoring.alpha.onappcdn.com"
       include base
       include nginx
     }
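
A slightly fuller classifier might look like the sketch below. It is an illustration only: the in-memory `NODES` table stands in for whatever data source the real classifier queries, and the YAML is written by hand so the example needs nothing beyond the standard library (the slide's own snippet uses PyYAML instead).

```python
import json

# Hypothetical inventory; the values mirror the example on the slide.
NODES = {
    "85206671.onappcdn.com": {
        "classes": ["base", "nginx"],
        "parameters": {
            "edge_secret_key": "86zFsrM7Ma",
            "monitoring_domain": "monitoring.alpha.onappcdn.com",
        },
    },
}


def classify(hostname, nodes=NODES):
    """Return the classes/parameters structure for a node, or None if unknown."""
    return nodes.get(hostname)


def to_yaml(classification):
    """Render the two-level structure as block-style YAML.

    Strings are emitted through json.dumps, which produces double-quoted
    scalars that are also valid YAML. Assumes at least one class is listed.
    """
    lines = ["classes:"]
    lines += [f"  - {json.dumps(name)}" for name in classification["classes"]]
    lines.append("parameters:")
    lines += [
        f"  {key}: {json.dumps(value)}"
        for key, value in classification["parameters"].items()
    ]
    return "\n".join(lines) + "\n"


print(to_yaml(classify("85206671.onappcdn.com")), end="")
```

In a real classifier script the node name would come from the first command-line argument, which is how Puppet invokes an external node classifier, and an unknown node would be signalled with a non-zero exit status.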

Edge Servers Storedconfigs

•  Puppet stores facts about the edge servers into MySQL.
•  We make minimal use of this – for example sizing nginx's in-memory cache depending on the amount of memory the server has.
•  Could probably use more, e.g. set # threads based on cpu core count.
•  The data's always there if we ever want to query it...
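
Fact-based sizing boils down to a little arithmetic. The talk does not give the actual formula, so the numbers below are pure assumptions: a quarter of RAM for the cache with a 64 MB floor, and one thread per core. Function names are hypothetical.

```python
def cache_size_mb(memory_mb, fraction=0.25, floor_mb=64):
    """Pick an in-memory cache size as a fraction of RAM, with a lower bound."""
    return max(floor_mb, int(memory_mb * fraction))


def worker_threads(cpu_cores):
    """One worker per core, never fewer than one. Accepts fact strings like "8"."""
    return max(1, int(cpu_cores))


if __name__ == "__main__":
    # Example inputs are made up; real values would come from the node's facts.
    print(cache_size_mb(4096), "MB cache,", worker_threads("8"), "threads")
```

Keeping the policy in small pure functions like these makes it easy to change the ratio later without touching whatever reads the facts.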

Q&A

•  Questions? Comments?
•  P/S – final plug – we're hiring sysadmins!