Date posted: 19-Jun-2015
Category: Technology
Uploaded by: olindata
WARNING
<ObligatoryPlug>
About OnApp
OnApp launched July 1st 2010
Backed by LDC
The leading cloud management software for hosts
The instant global CDN for hosts
Deep industry knowledge
100+ employees in US, EU, APAC
A leading provider of software for hosts
Vital Statistics
1 in 3 public clouds
800+ cloud deployments
300+ global clients
Customer Stories
Instant CDN that gives you…
75+ PoPs
get paid for idle capacity
low cost, high margin
OK.
</ObligatoryPlug>
Systems Overview
• Core & Development
  - ~20 physical servers
  - ~200 VMs
  - Homogeneous environment: 64-bit Debian everywhere
  - Mainly use OpenVZ and KVM for virtualization
• CDN Delivery Edge Servers
  - 100+ servers in 60+ cities
  - Running on the OnApp platform, on either Xen or KVM
• Puppet has been integral to our setup since day 1
Why Puppet?
• More reliable configuration of servers. Less need to “run ssh in a for loop” and miss something.
• Self-documenting: our manifests are almost able to bootstrap an empty server.
  - Our manifests can't bootstrap an empty environment yet.
  - Limitation: manifests describe what/where/how something is set up, but not *why*.
• Nice syntax, easy on the eyes. Comprehensive built-in resource types. Able to fall back to dumb ways of doing things if required (use file, exec et al).
Core Infra Environments
• Systems manifest describes everything.
• Three environments: alpha, beta and omega.
What Would OnApp Setup...
• Essential utilities (tcpdump, less, vim, etc).
• Users & their SSH keys, sudoers.
  - Developer's shell => /bin/false if production
• Base firewall rules.
• Nagios agent.
• Uniform locale settings: UTC timezone, en_US.UTF-8 locale.
• SMTP that smarthosts to our central relay.
• Syslogd for remote logs to central logging server.
• Finally, the services.
Core Infra Manifest Excerpt
# Env config definitions (blue on the slide)
$portal_domain = "portal.alpha.onappcdn.com"
$portal_db_host = "portal.alpha.onappcdn.com"
$portal_db_user = "aflexi_webportal"
$auth_nameservers = {
  "ns1" => "175.143.72.214",
  "ns2" => "175.143.72.214",
  "ns3" => "175.143.72.214",
  "ns4" => "175.143.72.214",
}
$monitoring_host_server = [
  "monitoring.alpha.onappcdn.com",
  "dns.alpha.onappcdn.com",
]

# Node definitions (red on the slide)
node "monitoring.alpha.onappcdn.com" {
  include base
  include s_db_monitoring
  include s_monitoring_server
  include collectd::rrdcached
  include s_munin
  include s_monitoring_alerts
  include s_monitoring_graph
}

# Class definitions (green on the slide)
class collectd::rrdcached {
  package { "rrdcached":
    ensure => latest,
  }
  service { "rrdcached":
    ensure => running,
  }
}
Package Repo Integration
• Jenkins builds debs of our code and stores them in an apt repository for the environment they are built for.
• Puppet keeps packages up-to-date (ensure => latest) and restarts services on package upgrades.

puppet-agent[25431]: (/Stage[main]/Debian/Exec[apt-get-update]/returns) executed successfully
puppet-agent[25431]: (/Stage[main]/Python::Aflexi::Mq/Package[python-aflexi-mqcore]/ensure) ensure changed '7065.20120530.113915-1' to '7066.20120604.090916-1'
puppet-agent[25431]: (/Stage[main]/S_mq/Service[worker-rabbitmq]) Triggered 'refresh' from 1 events
puppet-agent[25431]: Finished catalog run in 16.08 seconds
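The agent log above already records which package versions changed, so it can double as a crude deployment record. A minimal sketch, assuming log lines shaped like the excerpt above; the function name package_upgrades and the regular expression are illustrative and not part of the original setup:

```python
import re

# Matches the "ensure changed 'old' to 'new'" message that puppet-agent
# logs for a Package resource, in the shape shown in the excerpt above.
UPGRADE_RE = re.compile(
    r"Package\[(?P<package>[^\]]+)\]/ensure\) "
    r"ensure changed '(?P<old>[^']+)' to '(?P<new>[^']+)'"
)


def package_upgrades(log_lines):
    """Return (package, old_version, new_version) tuples found in log_lines."""
    upgrades = []
    for line in log_lines:
        match = UPGRADE_RE.search(line)
        if match:
            upgrades.append(
                (match.group("package"), match.group("old"), match.group("new"))
            )
    return upgrades


# Sample input: the four log lines from the slide.
sample = [
    "puppet-agent[25431]: (/Stage[main]/Debian/Exec[apt-get-update]/returns) executed successfully",
    "puppet-agent[25431]: (/Stage[main]/Python::Aflexi::Mq/Package[python-aflexi-mqcore]/ensure) "
    "ensure changed '7065.20120530.113915-1' to '7066.20120604.090916-1'",
    "puppet-agent[25431]: (/Stage[main]/S_mq/Service[worker-rabbitmq]) Triggered 'refresh' from 1 events",
    "puppet-agent[25431]: Finished catalog run in 16.08 seconds",
]

for package, old, new in package_upgrades(sample):
    print(package, old, "->", new)
```

Only the Package line matches; the apt-get-update, service refresh and summary lines are ignored.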
Nagios Integration
• Plugs into Nagios using “exported resources”
Nagios Integration
Server manifest
* exports the service that is checked

@@nagios_service { "check_load_$fqdn":
  check_command       => "check_nrpe_1arg!check_load",
  use                 => "generic-service",
  host_name           => $fqdn,
  service_description => "check_load",
  tag                 => $domain,
}

Nagios service manifest
* collects the resources to check

Nagios_service <<| tag == "onappcdn.cm" |>> {
  target  => "/etc/n3/conf.d/services.cfg",
  require => Package["nagios3"],
  notify  => Exec["reload-nagios"],
}
Nagios Integration
• What's logged on the nagios server when puppet runs?

puppet-agent[15293]: (/Stage[main]/Nagios::Monitor_private/Nagios_host[hrm.onappcdn.com]/ensure) created
puppet-agent[15293]: (/Stage[main]/Nagios::Monitor_private/Nagios_service[check_load_hrm.onappcdn.com]/ensure) created
nagios3: Nagios 3.2.1 starting... (PID=5601)
puppet-agent[15293]: (/Stage[main]/Nagios::Base/Exec[reload-nagios]) Triggered 'refresh' from 8 events
Monitoring Puppet Itself
• Lots of tools/dashboards out there to achieve this.
• For us: “grep -i err */syslog”. Dumb, but works until we need to Really Address it.
• Common issues:
  - Puppet gets “stuck”, and only one puppet instance can run at any one time.
  - Manifest errors: syntax, merge issues.
  - Badly-written manifests (vague dependencies, conditions/commands not robust enough).
  - An important dependent resource failing (e.g. apt-get install fails due to a dpkg --configure error).
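The grep approach can be narrowed slightly without building a dashboard. A minimal sketch, assuming syslog lines tagged with "puppet-agent" as in the log excerpts elsewhere in this deck; the function name puppet_errors is illustrative, restricting the match to puppet-agent lines is a choice made here (the original grep matches every line), and the two lines marked MADE-UP are invented sample input, not real Puppet output:

```python
import re

# Same test as "grep -i err": the substring "err" in any letter case.
ERR_RE = re.compile(r"err", re.IGNORECASE)


def puppet_errors(log_lines):
    """Return the puppet-agent lines that match 'err' case-insensitively."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if "puppet-agent" in line and ERR_RE.search(line)
    ]


sample = [
    # Real line from the slides: no "err", so it is skipped.
    "puppet-agent[25431]: Finished catalog run in 16.08 seconds\n",
    # MADE-UP line to exercise the filter; not actual Puppet output.
    "puppet-agent[1]: MADE-UP EXAMPLE: error while applying a resource\n",
    # MADE-UP line from another daemon; skipped because it is not puppet-agent.
    "sshd[2]: MADE-UP EXAMPLE: error: unrelated\n",
]

for line in puppet_errors(sample):
    print(line)
```

Passing an open syslog file object as log_lines works the same way, since iterating a file yields its lines.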
File/Dir Organization
• We use git to revision-control our puppet manifests.
• The style we adopted mainly comes from Hunter Haugen*
• A branch for each environment, plus a “common” branch.
• Each branch is checked out as a separate directory in /etc/puppet/environments/$env
• And the puppetmaster's includedir is configured to point at that directory.
* - http://hunnur.com/blog/2010/10/dynamic-git-branch-puppet-environments/
• Common branch
    Manifests/
      alpha.pp
      beta.pp
    Modules/
      Base/
      Users/
• Alpha env branch
    Modules/
      Python/
      Services/
      Nameserver/
• Beta env branch
    Modules/
      Python/
      Services/
      Nameserver/
File/Dir Organization
• Common goes into its own branch for convenience: less merging needed for manifests that we are Really Sure won't differ between environments.
• System manifest goes into common/manifests/$env.pp
  - Initially tried putting the manifest into the alpha/beta/omega branches as site.pp: merge hell.
• Introduced an extra variable, $effective_env
  - Abstracts the puppet environment name from the environment that the manifest runs in.
File/Dir Organization
• Hotfixes branch off omega and are merged to alpha/beta/omega.
• Development branches off alpha
  - This branch can be trialed as a separate environment (use --environment to specify a custom env on the puppet client).
  - Merge to alpha → beta → omega.
  - Or merge as a feature branch to any other environment.
• “git diff branchA branchB”: differences between environments are shown clearly.
Edge Servers
• Our edge servers are hosted on OnApp cloud (only).
• When creating an edge server, the cloud control panel:
  - Instantiates a VM from a lightly-customized Debian image.
  - Configures the package repositories.
  - Issues a puppet run to set it up.
• Advantage of setting it up through puppet instead of a “gold image”: our system can be installed on bare metal if needed, and can be reproducibly installed on $future_debian_release
Edge Servers – External Node Classifier
• No text manifest: all code, using an “external node classifier”.
• Assign variables and classes specific to the edge server through the node classifier, e.g. its password, the services it runs.
• In Python:

import yaml

output = {}
output["classes"] = ["class1", "class2"]
output["parameters"] = {"param1": "value1"}
print(yaml.dump(output))
Edge Servers – External Node Classifier
• This YAML-encoded structure...

$ puppet-nodeclassifier 85206671.onappcdn.com
classes: [base, nginx]
parameters: { edge_secret_key: 86zFsrM7Ma, monitoring_domain: monitoring.alpha.onappcdn.com }

• … is equivalent to this textual manifest:

node "85206671.onappcdn.com" {
  $edge_secret_key = "86zFsrM7Ma"
  $monitoring_domain = "monitoring.alpha.onappcdn.com"
  include base
  include nginx
}
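Putting the two slides together, a classifier is just a program that maps a node name to that structure. A minimal sketch, assuming a hard-coded lookup table: the NODES dict and the classify function are hypothetical, the source does not say where OnApp's classifier gets its data, and JSON is used for output only because its syntax is also valid YAML flow syntax and needs no third-party library:

```python
import json

# Hypothetical per-node data, using the values from the example above.
# A real classifier would look these up somewhere (the source does not say where).
NODES = {
    "85206671.onappcdn.com": {
        "classes": ["base", "nginx"],
        "parameters": {
            "edge_secret_key": "86zFsrM7Ma",
            "monitoring_domain": "monitoring.alpha.onappcdn.com",
        },
    },
}


def classify(hostname):
    """Return the classifier document for hostname as text, or None if unknown."""
    node = NODES.get(hostname)
    if node is None:
        return None
    # JSON flow syntax parses as YAML, so this stands in for yaml.dump().
    return json.dumps(node, indent=2, sort_keys=True)


print(classify("85206671.onappcdn.com"))
```

Puppet invokes the classifier with the node name as its only argument, so a real script would call classify(sys.argv[1]) and exit non-zero when the node is unknown.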
Edge Servers Storedconfigs
• Puppet stores facts about the edge servers into MySQL.
• We make minimal use of this, for example sizing nginx's in-memory cache depending on the amount of memory the server has.
• Could probably use more, e.g. set the number of threads based on CPU core count.
• The data's always there if we ever want to query it...
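The fact-driven sizing mentioned above boils down to a little arithmetic. A minimal sketch, assuming the memory and core-count facts have already been read and converted to numbers; the function names, the 25% fraction and the 64 MB floor are all assumptions for illustration, not values from the original setup:

```python
def nginx_cache_mb(total_memory_mb, fraction=0.25, minimum_mb=64):
    """Size the in-memory cache as a fraction of RAM, never below a floor.

    fraction and minimum_mb are illustrative defaults, not OnApp's values.
    """
    return max(minimum_mb, int(total_memory_mb * fraction))


def worker_threads(cpu_cores):
    """One thread per CPU core, and never fewer than one."""
    return max(1, int(cpu_cores))


# Example: a hypothetical edge server with 2 GB of RAM and 4 cores.
print(nginx_cache_mb(2048))
print(worker_threads(4))
```

Keeping the rule in one small function makes it easy to change the fraction later without touching every node's configuration.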
Q&A
• Questions? Comments?
• P.S. Final plug: we're hiring sysadmins!