Puppet at Opera Software - PuppetCamp Oslo 2013

Post on 14-Dec-2014


description

A bit of history, frustration-driven development, and why and how we started looking into Puppet at Opera Software. What we're doing, successes, pain points and what we're going to do with Puppet and Config Management next.

transcript

Puppet at Opera

Puppet Camp Oslo 2013

cosimo@opera.com

devs sysadmin

devs sysadmin

DevSys?

FDD

Frustration Driven Development

# LVS main config file
#
# Last modified:
# 2012-12-10 Commented out all wlb servers, as they haven't been in use …
# 2012-XX-XX Tons of shifting around servers, upgrading and problems (Everyone)
# 2011-04-01 Removed all old b#-servers (N.....)
# 2010-03-24 Bye bye bigma. (M..../Cosimo)
# 2010-03-03 Restore pre Feb 26th config that seems to ensure stability (Cosimo)
#            When adding bigboy/bigcat, bad site lockups happen
# 2010-03-03 Reducing weight on b12 as it is less powerfull (M....)
# 2010-02-26 re-adding bigdog, and lowering bigunc, also vamping up b12 to 100%
# 2010-02-26 Bigdog is crashing, removing from lvs (M......)
# 2010-02-03 Enabled f8 and b7, first b7, then some hours later f8 … (N......)
# 2010-01-19 Bigant ready to rock and roll! (Cosimo)
# 2010-01-13 Removed bigpa, fatgirl from database pool (Cosimo)
# 2010-01-07 Added b8 to backend pool (Cosimo)
# 2010-01-05 Added bigant to the My Opera databases (Cosimo)
# 2009-11-22 Added bigdog to the My Opera databases (Cosimo)
# 2009-11-18 Added b7 and f8 as back-end servers (M.....)
# 2009-11-18 Removed p23-02 backend, moved to auth (Cosimo)
# 2009-11-12 Removing b7 and f8 from Mysql Load balancers (Cosimo)
# 2009-11-11 Added Lenny backend p23-02 (Cosimo)
# 2009-10-11 phased-in InnoDB-powered bigma in production (Cosimo)
# 2009-09-23 phased-in InnoDB-powered bigma in production (Cosimo)
# 2009-06-27 switched master from bigma to bigsis (w-mlb) \o/ (N.....)
# 2009-06-23 shifting load away from bigbro. it's dying? (Cosimo)
# 2009-03-18 pushing bigbro as much as we can, to test it out (Cosimo)

global_defs {
  lvs_id MY_LVS
  …
}

innodb_buffer_pool_size = 128M # was 64M # was 16M # was 32M

The Pilot – Goals

● New deployment procedure

● Sane configuration files

● Configuration management

CM Tools Evaluation (2009)

CFEngine 2

BCfg2

Puppet 0.25.4

LCFG

CM Tools Evaluation

CFEngine 2

BCfg2

Puppet 0.25.4 → 2.6.2 → 2.7.14

LCFG

The very beginning...

commit 9c54321f51bf969940b63b48d055743ac504035e
Author: Cosimo Streppone <cosimo@opera.com>
Date: Thu Jan 14 13:21:40 2010 +0000

Generic puppet recipes. To be continued.

Our approach

A “conservative” approach, surely

• Keep it simple. No concat/append/modify

• As few dependencies as possible

• Stability and reliability are critical

• No pulls from github or external URLs

• We don't use puppet for deployment

• Even realize() gets me into panic mode

Three Years In

• Modules repository, with 60+ mods

• Some custom facter plugins

• Shared projects conventions & structure

• Shared deployment procedures and libs

• Good server baseline configuration

• Our team, ~200 nodes

• Opera Mini Ops team, thousands of nodes

Datacenters

It's Modules all the way down...

Apache

base_packages

Cassandra

Django

Bash

RRDCached

Munin

Solr 4.0

RabbitMQ

Postfix

Varnish

Statsd

PowerDNS

Tomcat

Ssh

security_upgrades

Projects structure

Master config file /config/production.json

Role-specific files /config/role/<role>/

Puppet manifests /config/puppet/

Deployment scripts /deploy/

Master configuration file

{
  "master_rev" : "20130129",
  "application" : "geodns",
  "environment" : "production",
  "domain" : "localdomain",
  "contact" : "cosimo@opera.com",

  "puppet_vars" : { # Available in manifests
    "some-password" : "hola/amigos"
  },

  "systems" : { # List of all hostnames and their roles
    "node01" : { "puppet_class" : [ "geodns::backend" ] },
    "node02" : {
      "puppet_class" : [ "geodns::frontend" ],
      "puppet_vars" : { … },
    },
    …
  }

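The file above carries variables at two levels: a top-level "puppet_vars" block and an optional per-node one. The slides do not say how the two combine. A minimal Ruby sketch, assuming node-level values override the top-level ones, with a hypothetical helper name (vars_for) and a made-up per-node key ("listen-port"):

```ruby
require 'json'

# Hypothetical helper: the variables one node's manifests would see.
# Assumption: node-level "puppet_vars" override the top-level block.
def vars_for(config, hostname)
  node = config.fetch('systems').fetch(hostname)
  config.fetch('puppet_vars', {}).merge(node.fetch('puppet_vars', {}))
end

# Trimmed copy of the slide's file; "listen-port" is invented for the demo.
config = JSON.parse(<<~CONFIG)
  {
    "application": "geodns",
    "puppet_vars": { "some-password": "hola/amigos" },
    "systems": {
      "node01": { "puppet_class": ["geodns::backend"] },
      "node02": {
        "puppet_class": ["geodns::frontend"],
        "puppet_vars": { "listen-port": "8053" }
      }
    }
  }
CONFIG

puts vars_for(config, 'node02').inspect
```

Note that the "#" comments shown on the slide are not valid JSON, so a strict parser like this one would need them stripped from the real file first.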
/etc/puppet →

puppet.conf (master configuration file)

fileserver.conf

files → {auth, geodns, opcdn} (local project files)

modules → (shared generic modules)

{ntp, apache, varnish, nginx, ...}

manifests → (generic and project specific manifests)

classes/

{basenode, backend, frontend}.pp

classes/ <project> /

<anything goes, project-specific>

Puppet master layout

/etc/puppet/manifests/site.pp

$server = "puppetmaster.opera.com"

import "os/*.pp"
import "classes/*.pp" # generic classes
import "classes/*/*.pp" # project classes

node default {
  include basenode
}

filebucket { "main": server => $server }

File {
  ignore => ['.svn', '.git', 'CVS' ],
  backup => "main",
}

Puppet master - site.pp

/etc/puppet/puppet.conf

external_nodes = /etc/puppet/bin/puppet-node-classifier

node_terminus = exec

/etc/puppet/manifests/nodes/geodns-production.json

{ "application" : "geodns",

"environment" : "production",

"domain" : "localdomain",

"systems" : {

"node01" : {

"puppet_class" : [ "geodns::backend" ],

}, …

}

}

Puppet master – no nodes.pp
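With node_terminus = exec, the master runs the external_nodes script with the node name as its first argument and expects YAML on standard output. The actual puppet-node-classifier is not shown in the slides. The following is only a sketch of what its core could look like, assuming the JSON is keyed by short hostname and that the application name is passed along as a parameter:

```ruby
require 'json'
require 'yaml'

# Builds the classifier answer for one node from a parsed project file.
# Returns nil when the node is not listed in "systems".
def classify(project, certname)
  # Assumption: JSON keys are short hostnames, the master passes an FQDN.
  short = certname.split('.').first
  node = project.fetch('systems', {})[short]
  return nil unless node

  {
    'classes'    => Array(node['puppet_class']),
    # Assumption: exposing "application" as a top-scope variable is useful.
    'parameters' => { 'application' => project['application'] },
  }
end

# Inline sample mirroring geodns-production.json from the slide.
project = JSON.parse(
  '{ "application": "geodns", "environment": "production",' \
  '  "systems": { "node01": { "puppet_class": ["geodns::backend"] } } }'
)

puts classify(project, 'node01.int.opera.com').to_yaml
```

A real script would read the certname from ARGV[0] and load the project files from /etc/puppet/manifests/nodes/ instead of using inline data.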

$ facter --puppet
architecture => amd64
datacenter => nerv
domain => opera.com
facterversion => 1.5.7
fqdn => node01.int.opera.com
hardwareisa => unknown
hardwaremodel => x86_64
hostname => node01
id => root
interfaces => eth0,eth1
ipaddress => 1.2.3.4
ipaddress_eth0 => 1.2.3.4
…

Facter

facter/datacenter.rb

Facter.add("datacenter") do
  setcode do
    datacenter = "unknown"
    # Get current ip address from Facter's own db
    ipaddr = Facter.value(:ipaddress)
    # Regexp literal: in a double-quoted string the "\." escapes are lost
    if ipaddr.match(/^1\.2\.3\./)
      datacenter = "dc1"
    elsif ipaddr.match(...)
      …
    end
  end
end

Facter – custom plugins
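The if/elsif chain in the plugin grows by one branch per datacenter. One alternative, which is not what the slide shows, is to keep the mapping in a table and put the lookup in a plain function that can be exercised without Facter. In this sketch only the "1.2.3." to "dc1" pair comes from the slide. The function name and the second prefix are hypothetical:

```ruby
# Maps an IP address to a datacenter name by address prefix.
# Falls back to "unknown", like the original plugin.
def datacenter_for(ipaddr)
  prefixes = {
    '1.2.3.' => 'dc1', # from the slide
    '10.20.' => 'dc2', # hypothetical prefix, for illustration only
  }
  return 'unknown' if ipaddr.nil?

  match = prefixes.find { |prefix, _name| ipaddr.start_with?(prefix) }
  match ? match.last : 'unknown'
end

puts datacenter_for('1.2.3.4')
```

Inside the fact, the setcode block would then reduce to a single call such as datacenter_for(Facter.value(:ipaddress)).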

case $datacenter {
  "dc1" : { include opera::datacenters::dc1 }
  "dc2" : { include opera::datacenters::dc2 }
  "dc3" : { include opera::datacenters::dc3 }
  …
  default: { include opera::datacenters::base }
}

Facter – custom plugins

class basenode {

  include opera

  # Opera-specific data-center based settings
  case $datacenter {
    "dc1" : { include opera::datacenters::dc1 }
    …
    default: { include opera::datacenters::base }
  }

  include apt-opera
  include base_packages
  include locales
  include logcheck
  include munin
  include nagios
  include cron
  include perl
  include python
  include puppet
  include ntp
  include timezone
  …
}

Basenode class

autosign
+ some preinstalled packages
+ internal apt repository
+ a bit of shell scripting

Bootstrap script

Real world examples – 1 Project

class geodns::backend {

  include opera::admins::devops
  include security-upgrades
  include powerdns
  include geoip::city
  include memcache

  package { [ 'libjson-xs-perl', … ]: ensure => 'present' }

  bash::prompt { '/root/.bashrc':
    description => 'geodns',
    color => 'red',
  }

  munin::plugin::custom { 'geodns_': }

  munin::plugin { [ 'geodns_country', 'geodns_errors', … ]:
    plugin_name => 'geodns_',
  }
}

Real world examples – 2 Varnish

varnish::config { "project-varnish-config":

  vcl_conf => "tvstore.vcl",
  storage_type => "malloc",
  storage_size => "512M",
  listen_port => 8100,
  sess_workspace => 131072,
  ttl => 60,
  thread_pools => 2,
  thread_min => 400,
  thread_max => 3000,

  # Needed for GeoIP support in varnish:
  # http://stackoverflow.com/questions/5906603/
  cc_command => "exec cc -fpic -shared -Wl,-x \
    -L/usr/include/GeoIP.h -lGeoIP -o %o %s"

}

Real world examples – 3 Munin

include munin::server

file { '/etc/munin/munin-conf.d/project-settings.conf': … }

Real world examples – 4 Solr

include solr4

solr4::core { 'core1':
  config => '.../core1/solrconfig.xml',
  properties => '.../core1/solrcore.properties',
  schema => '.../core1/schema.xml',
}

solr4::config { 'solr-search-config':
  cores => ['core1', … ],
}

Pain points AKA wish-list

Speed!

~60 s runtime → ~600 resources

TOO SLOW!

notice: /Stage[main]/Django/Package[Django]/ensure: ensure changed '1.4.3' to '1.4.2'

notice: /Stage[main]/Package[cython]/ensure: created

notice: /Stage[main]/Java::Sun_java6/Exec[debconf-set-selections-sun-java6-bin]/returns: executed successfully

notice: /Stage[main]/Java::Sun_java6/Exec[debconf-set-selections-sun-java6-jre]/returns: executed successfully

Resources that don't go away

Shared resources

cron::logcleanup { … }

• Used by both Apache and Nginx modules

• Getting conflicts if you pull both

Shared environment

Many projects run under the same master.

A syntax error anywhere blocks everyone.
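One common mitigation, which the talk does not describe, is to syntax-check every manifest before it reaches the shared master, for example from a commit hook. The sketch below assumes the installed Puppet provides the parser validate subcommand. The function names are hypothetical, and the checker is injectable so the loop can be tried without Puppet installed:

```ruby
require 'open3'

# Runs `puppet parser validate` on one manifest; returns [ok, output].
# Assumption: the local Puppet version ships this subcommand.
def puppet_validate(path)
  output, status = Open3.capture2e('puppet', 'parser', 'validate', path)
  [status.success?, output]
end

# Returns a hash of path => error output for every manifest that fails.
def broken_manifests(paths, checker = method(:puppet_validate))
  paths.each_with_object({}) do |path, failures|
    ok, output = checker.call(path)
    failures[path] = output unless ok
  end
end

# Stand-in checker for the demo: flags any path containing "broken".
fake = ->(path) { [!path.include?('broken'), "syntax error in #{path}"] }
puts broken_manifests(['site.pp', 'classes/broken.pp'], fake).inspect
```

In a hook, a non-empty result would be the signal to reject the commit, so one project's typo never reaches the master that everyone shares.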

Testing

Would be awesome to be able to test our modules and manifests.

Locally.

Without a puppetmaster.

Future directions

Things we'd like to look into...

• PuppetDB

• Better systems inventory

• Better Nagios integration

• Testing manifests and modules

Q & A

@cstrep

cosimo@opera.com

https://github.com/cosimo/

http://www.streppone.it/cosimo/blog/