Chef at Etsy

Post on 28-Nov-2014

description

Slides from my "Chef at Etsy" talk at the London Chef Meetup on Thurs Oct 10th, 2014

transcript

Chef at Etsy

@jonlives

Jon Cowie

Sr Operations Engineer

30 Million Members

1 Million Active Shops

20 Million Items Listed

60 Million Monthly Unique Visitors

@jonlives

We Love Chef!

@jonlives

Absorb what is useful.

Discard what is useless.

@jonlives

“I am not smart enough to build an ontology … that can encompass all the variations in infrastructure. Nobody is, the world moves too fast.”

@jonlives

There is no magic pill.

@jonlives

You are the expert.

@jonlives

Chef at Etsy

• Chef Server 11.1.4

• ~2000 Nodes

• CentOS, some Mac OS X

@jonlives

Beginning of 2010 Today

@jonlives

Chef at Etsy

@jonlives

Evolution of Chef

@jonlives

2010: The Beginning

• ~250 Nodes (Ubuntu & CentOS)

• The first cookbooks

• Out of the box workflow

@jonlives

2011: Growth

• ~400 Nodes (CentOS)

• Chef still pretty specialised knowledge

• Handlers added

@jonlives

2012: A big year

• ~800 Nodes (CentOS & Mac OS X)

• More in-house Chef expertise

• Workflow tooling

• Debugging tooling

• Monitoring

@jonlives

2013: Chef at Etsy

• ~1500 Nodes

• Workflow tooling enhancements

• Feature flags in Chef

• Chef performance - Chef 11 upgrade

@jonlives

2014: Chef at Etsy

• ~2000 Nodes

• Consolidation

• CI with Chef

• Omnibus

• Work-in-Progress tooling

@jonlives

Patterns & Workflows

@jonlives

Cookbook Workflow

@jonlives

$> review -r jcowie --cc ops

@jonlives

knife-spork

• https://github.com/jonlives/knife-spork

• Workflow tool

• Helps multiple chefs avoid clashing

• Visibility into changes

• Plugins

@jonlives

knife-spork

• knife spork bump

• knife spork upload

• Test change

@jonlives

Test Change

• https://github.com/jonlives/knife-flip

• knife node flip foo.etsy.com testing

• knife role flip MyRole testing

@jonlives

Test Change

• https://github.com/mrtazz/knife-wip

• Uses node tags

<irccat> CHEF: bburry started work cent7 package bugfixing on deploy01.ny5.etsy.com

@jonlives

knife-spork

• knife spork bump

• knife spork upload

• Test change

• knife spork promote --remote

• git commit and push
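
The bump and promote steps above can be pictured with a minimal Ruby sketch. It is not knife-spork's code: `bump_patch`, `promote`, and the in-memory `production` hash are hypothetical stand-ins, and the starting version 0.1.7 is assumed for illustration. The only thing borrowed from Chef is the shape of an environment's `cookbook_versions` map with an `= x.y.z` constraint.

```ruby
# Hypothetical, in-memory stand-ins for a cookbook version and an
# environment; this is an illustration, not knife-spork's implementation.

# Bump the patch level of a "major.minor.patch" version string.
def bump_patch(version)
  major, minor, patch = version.split(".").map(&:to_i)
  "#{major}.#{minor}.#{patch + 1}"
end

# Return a copy of the environment with the cookbook pinned to one version.
def promote(environment, cookbook, version)
  pinned = environment.fetch("cookbook_versions", {})
                      .merge(cookbook => "= #{version}")
  environment.merge("cookbook_versions" => pinned)
end

# Assumed starting state: production pins pentaho at 0.1.7.
production = {
  "name" => "production",
  "cookbook_versions" => { "pentaho" => "= 0.1.7" }
}

new_version = bump_patch("0.1.7")
production = promote(production, "pentaho", new_version)

puts production["cookbook_versions"]["pentaho"]  # prints "= 0.1.8"
```

Pinning an exact version in the environment is what makes a promotion an explicit, reviewable change rather than a side effect of an upload.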

@jonlives

Monitoring & Debugging

@jonlives

knife-spork & CI Job

<irccat> CHEF: Jon Cowie uploaded pentaho@0.1.8
<irccat> CHEF: Jon Cowie promoted pentaho@0.1.8 to production
<snip>
<irccat> Git PUSH -> Sysops/chef
<snip>
<Jenkins> Starting build #5649 for job chef-server-git-sync
<Jenkins> Project chef-server-git-sync build #5649: SUCCESS in 2 min 36 sec: http://ci.etsycorp.com/job/chef-server-git-sync/5649/

@jonlives

IRC Handler

<irccat> Chef run failed on officebackup01.office.etsy.com gist failed, see /var/log/chef/client.log on the host

<irccat> Still Failing on dbnest01.ny4.etsy.com since 2 days ago https://github.etsycorp.com/gist/656d8914fbef5a6bd9aa

@jonlives

Lastrun Data

• https://github.com/jgoulah/knife-lastrun

• knife node lastrun foo.bar.com

@jonlives

Lastrun Data

% knife node lastrun dbnest01.ny4.etsy.com
Status        failed
Elapsed Time  29.055892
Start Time    2014-10-06 12:54:51 +0000
End Time      2014-10-06 12:55:20 +0000
<snip>
Exception
<snip>
Installed package backupd-1.4-1.365657d.el5.centos is newer than candidate package backupd-1.2-1.99ddb8e.el5

@jonlives

Dashboards

@jonlives

Monitoring & Debugging

• https://github.com/etsy/chef-handlers

• https://github.com/etsy/dashboard

• https://github.com/jgoulah/knife-lastrun

• https://github.com/bmarini/knife-inspect

@jonlives

Feature Flags

@jonlives

Downsides of Existing Approach

• Holding cookbook in testing is blocking

• Accidental promotions

• Testing env affects all cookbooks

• “Upgrade” envs often used

• How to make it more “Etsy”?

@jonlives

chef-whitelist

• https://github.com/etsy/chef-whitelist

• Databag driven

• Cookbook library

• Feature flags!

@jonlives

chef-whitelist

{
  "id": "php-5-5-17",
  "patterns": [
    "statsd*.ny5.etsy.com",
    "deploy*.ny5.etsy.com",
    <snip>
  ]
}

@jonlives

chef-whitelist

if node.is_in_whitelist? "php-5-5-17"
  package "php-pecl-opcache" do
    action :remove
  end
end
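
One way to picture the check behind a whitelist flag is a glob match of the node's FQDN against the patterns in the data bag item shown earlier. The sketch below is an assumption about the general technique, not the chef-whitelist library's code: `in_whitelist?` is a hypothetical helper, the data bag item is an in-memory hash, and matching uses Ruby's standard `File.fnmatch`.

```ruby
# Hypothetical in-memory copy of a whitelist data bag item, shaped like
# the one in the slides (the <snip> entries are simply left out).
WHITELIST = {
  "id" => "php-5-5-17",
  "patterns" => ["statsd*.ny5.etsy.com", "deploy*.ny5.etsy.com"]
}

# True when the FQDN matches any shell-style glob in the whitelist.
# Illustrative helper only; not the chef-whitelist API.
def in_whitelist?(fqdn, whitelist)
  whitelist.fetch("patterns", []).any? do |pattern|
    File.fnmatch(pattern, fqdn)
  end
end

puts in_whitelist?("deploy01.ny5.etsy.com", WHITELIST)  # prints "true"
puts in_whitelist?("dbnest01.ny4.etsy.com", WHITELIST)  # prints "false"
```

Because the flag lives in a data bag rather than in an environment, rolling a change out to more hosts would mean editing a list of patterns, not holding a cookbook version in testing.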

@jonlives

Configuration Data

@jonlives

Keep cookbooks:

• Simple

• Modular

• Scalable

• Maintainable

@jonlives

Environments

• Cookbook version constraints

@jonlives

Roles

• Group-level config

• Syslog-ng

• Iptables

• Sudoers

@jonlives

Roles - iptables

"firewall": {
  "ports": {
    "11211": {
      "subnet_group": "prod_subnets"
    },
    <snip>
  }
}
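
As a rough sketch of how a cookbook might turn the firewall attributes above into rules, the Ruby below expands each port and subnet group into one iptables line. Everything beyond the attribute shape is assumed: the `SUBNET_GROUPS` lookup and its addresses, the `iptables_rules` helper, and the choice of TCP with an ACCEPT target are hypothetical and not taken from Etsy's cookbooks.

```ruby
# Hypothetical subnet groups; the real values are not in the slides.
SUBNET_GROUPS = {
  "prod_subnets" => ["10.0.1.0/24", "10.0.2.0/24"]
}

# Role attributes shaped like the "firewall" fragment in the slides.
firewall = {
  "ports" => {
    "11211" => { "subnet_group" => "prod_subnets" }
  }
}

# Build one ACCEPT rule per (port, subnet) pair. TCP is an assumption.
def iptables_rules(firewall, subnet_groups)
  firewall.fetch("ports", {}).flat_map do |port, opts|
    subnet_groups.fetch(opts["subnet_group"]).map do |subnet|
      "-A INPUT -p tcp -s #{subnet} --dport #{port} -j ACCEPT"
    end
  end
end

puts iptables_rules(firewall, SUBNET_GROUPS)
```

Keeping only the port and a named subnet group in the role leaves the cookbook free of per-role logic, which fits the "simple, modular" goal stated earlier in the deck.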

@jonlives

Roles - Syslog-ng

"syslog": {
  "web": {
    "web_apache_access_log": {
      "source": "/var/log/httpd/access_log",
      "source_program_override": "APACHEACCESS: ",
      "destination": "/data/syslog/current/web/access.log",
      "destination_filters": [
        "host('^(web0|dlweb)')",
        "match('APACHEACCESS')"
      ]
    }
  }
}

@jonlives

Data Bags

• Global / Datacenter specific Config

• Ganglia

• Cobbler

• VOIP

• Data Storage

@jonlives

Data Bags - Ganglia

{
  "id": "config_se5",
  "grid_name": "EtsySE5",
  "authority": "http://gangliase5.etsycorp.com",
  "trusted_hosts": <snip>,
  "groups": {
    "Utilities": "239.2.11.71",
    <snip>
  }
  <snip>
}
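
The `config_se5` id suggests one data bag item per datacenter. A minimal sketch of that lookup follows, using an in-memory hash in place of a real data bag. The names here are assumptions: `GANGLIA_BAG`, `datacenter_for`, `ganglia_config`, the `config_ny4` item with its grid name, the host `web01.se5.etsy.com`, and the rule that the datacenter is the second label of the FQDN are all hypothetical.

```ruby
# Hypothetical in-memory stand-in for a data bag, keyed by item id.
# Only "config_se5" and its grid name come from the slides; the
# "config_ny4" entry is invented for illustration.
GANGLIA_BAG = {
  "config_se5" => { "id" => "config_se5", "grid_name" => "EtsySE5" },
  "config_ny4" => { "id" => "config_ny4", "grid_name" => "EtsyNY4" }
}

# Assumes the datacenter is the second label of the FQDN, e.g. "se5".
def datacenter_for(fqdn)
  fqdn.split(".")[1]
end

# Pick the datacenter-specific item; raises KeyError if none exists.
def ganglia_config(fqdn, bag)
  bag.fetch("config_#{datacenter_for(fqdn)}")
end

puts ganglia_config("web01.se5.etsy.com", GANGLIA_BAG)["grid_name"]  # prints "EtsySE5"
```

In a real recipe the hash lookup would presumably be replaced by Chef's `data_bag_item` call, with the bag name being whatever the cookbook uses.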

@jonlives

Data Bags - Cobbler

{
  "id": "config_corp",
  "cobbler_server": "corpking02.corp.etsy.com",
  "dns_servers": [ "10.x.x.x", "10.x.x.x" ],
  "dhcp_ranges": {
    "10.100.x.0": {
      "routers": "10.x.x.1",
      "mask": "255.255.255.0",
      "range": "10.x.x.11 10.x.x.250"
    }
  }
}

@jonlives

Write cookbooks you’ll thank yourself for.

@jonlives

http://jonliv.es/book

Discount Code: AUTHD

40% off Print, 50% off Digital

@jonlives

Thanks! Questions?

@jonlives / http://jonliv.es / jon@etsy.com