
Puppet Camp Melbourne Nov 2014 - A Build Engineering Team’s Journey of Infrastructure as Code

Date post: 20-Aug-2015
Upload: peter-leschev

@peterleschev

Husband, Father of 3 & Atlassian

Build Engineering

Peter Leschev

A Build Engineering Team’s Journey of Infrastructure as Code

Nov-2014

Build Engineering today @ Atlassian

• Build platform & services used internally within the company
• 90k builds per month
• 43k automated tests just for JIRA
• Developers expect a reliable infrastructure & fast CI feedback
• 1000 build agents (own hardware + EC2 instances)
  • include SCM clients, JDKs, JVM build tools, databases, headless browser testing, python builds, NodeJS, installers & more
• Maintain 20 AMIs of various build configurations
• 8 Bamboo Servers
• maven.atlassian.com / 6 Nexus instances
• Monitoring - opsview / graphite / statsd

Infrastructure as Code

= Puppet + SCM ?

4 years ago...

• Manually maintained snowflakes
• Started using puppet

Production rollout

[Diagram: puppetmaster, build agents]

Production rollout failure

[Diagram: puppetmaster, build agents]

Confidence of Change

[Chart: confidence (NONE to HIGH) across the lifecycle of an infra change. Stages: Dev, Rollout, Soak in Prod]

atlassian.com/git

Style in Pull Requests

Puppet Lint
https://github.com/rodjek/puppet-lint
Tim Sharpe, @rodjek

• Automated style checking
• Set up an automated build that runs checks & posts results
• Set up a ratchet build to detect regressions
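A ratchet build can be sketched as follows. This is a minimal illustration, not the team's actual build: the function names are hypothetical, and it assumes the lint tool prints one problem per line.

```python
from typing import Tuple


def count_warnings(lint_output: str) -> int:
    """Count lint problems, assuming one problem per non-empty output line."""
    return sum(1 for line in lint_output.splitlines() if line.strip())


def ratchet(baseline: int, current: int) -> Tuple[bool, int]:
    """Return (passed, new_baseline) for a ratchet check.

    The build fails if the warning count grew. If it shrank, the baseline
    tightens so the improvement cannot silently be lost again.
    """
    if current > baseline:
        return False, baseline
    return True, min(baseline, current)


if __name__ == "__main__":
    # Illustrative output only; real lint output will differ.
    output = "WARNING: example problem on line 3\nWARNING: example problem on line 9\n"
    passed, baseline = ratchet(baseline=5, current=count_warnings(output))
    print("passed" if passed else "regression", "- new baseline:", baseline)
```

Storing the baseline next to the code (or as a build artifact) keeps the ratchet moving in one direction only.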

Confidence of Change

[Chart: confidence (NONE to HIGH) across the lifecycle of an infra change. Stages: Dev, Code review, Rollout, Soak in Prod]
Series: initial + Code review

Using Staging for Development

• Coding on Puppet Master
• Culture of manually modifying production - Configuration Drift
• Impact on Builds

[Diagram: puppetmaster, build agents, staging puppet environment]

Vagrant
http://www.vagrantup.com/
Mitchell Hashimoto, @mitchellh

• Easily spin up Infrastructure locally on your laptop
• Reproducible / disposable environments
• Machine provisioning via Virtual Box / VMWare / AWS
• Configuration applied via Shell Scripts / Puppet / Chef
• Develop and test infrastructure changes locally

Vagrant

[Diagram: Vagrantfile, vagrant basebox]

Vagrant

vagrant up
Spins up a local VM to a known state

vagrant destroy
Destroy the VM when done

Make some puppet changes and then run:
vagrant provision
to apply your changes

SSH into your VM using:
vagrant ssh
to check your changes

Confidence of Change

[Chart: confidence (NONE to HIGH) across the lifecycle of an infra change. Stages: Dev, Code review, Rollout, Soak in Prod]
Series: initial + Code review + Vagrant

Vagrant != Production

• Vagrant basebox differences with production machines
• Originally using publicly available vagrant baseboxes
• Installed packages biggest differences
• Generating a basebox manually was a painful process

Packer
http://packer.io
Mitchell Hashimoto, @mitchellh

[Diagram: packer template JSON, Vagrant box for Virtualbox, Vagrant box for AWS]

Basebox generation via CI

• Latest basebox generated in CI & published to fileshare
• No need to generate baseboxes locally

Confidence in Change

[Chart: confidence (NONE to HIGH) across the lifecycle of an infra change. Stages: Dev, Code review, Rollout, Soak in Prod]
Series: initial + Code review + Vagrant + Packer

Developing locally

Rolling out to production

Broken build agents!

Rolling out to staging

Cucumber
https://github.com/cucumber/aruba

• Behaviour Driven Development

Cucumber & Vagrant

[Diagram: Vagrant, Custom Provisioner, Virtual Box VM; puppet apply and cucumber *.features run via ssh]

Disadvantages

• Requires cucumber dependencies to be installed on tested VM
• Tests run within the VM making testing firewall rules harder

Confidence in Change

[Chart: confidence (NONE to HIGH) across the lifecycle of an infra change. Stages: Dev, Code review, Rollout, Soak in Prod]
Series: initial + Code review + Vagrant + Packer + Cukes

“But it works on my machine!” – Every Developer

Continuous Integration

• ‘From scratch’ provisioning
• Confidence that you can rebuild in disaster

“The Cattle: you give them numbers. When they get ill, you shoot them.

The Pets: you give nice names, you stroke them, and when they get ill, you nurse them back to health, taking a long time over it” – Tim Bell, CERN

Confidence in Change

[Chart: confidence (NONE to HIGH) across the lifecycle of an infra change. Stages: Dev, Code review, CI & Rollout, Soak in Prod]
Series: initial + Code review + Vagrant + Packer + Cukes + CI

Provisioning from scratch is slow

Spread out CI

Moved from sequential to parallel provisioning

[Diagram: sequential: provision VM #1, then #2, then #3, then #4; parallel: provision VM #1, #2, #3 and #4 side by side]
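The move from sequential to parallel provisioning can be illustrated with a small sketch. The `provision` function is a hypothetical stand-in (it only sleeps); in the talk the parallel jobs were spread across CI machines, whereas this example uses threads on one machine purely to show the shape of the change.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def provision(vm_name: str) -> str:
    """Hypothetical stand-in for provisioning one VM from scratch."""
    time.sleep(0.1)  # pretend the provisioning run takes a while
    return f"{vm_name}: ok"


def provision_sequential(vm_names):
    """Provision one VM after another; total time is the sum of all runs."""
    return [provision(name) for name in vm_names]


def provision_parallel(vm_names, max_workers=4):
    """Provision all VMs at once; total time is roughly the slowest run."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input names
        return list(pool.map(provision, vm_names))


if __name__ == "__main__":
    names = ["VM #1", "VM #2", "VM #3", "VM #4"]
    for runner in (provision_sequential, provision_parallel):
        started = time.perf_counter()
        runner(names)
        print(runner.__name__, f"{time.perf_counter() - started:.2f}s")
```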

There are only so many MacPros you can steal

The ones I had my eye on....

Profiling Puppet Runs

Add “--evaltrace” to puppet apply
+
Collect and show the longest occurrences of: “Evaluated in ([\d\.]+) seconds”
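Collecting the slowest resources from an evaltrace log might look like the sketch below. It reuses the pattern from the slide; the sample log lines and resource names are made up for illustration, and the exact prefix of real log lines may differ.

```python
import re

# Same pattern as on the slide, with the text before it captured as a label.
EVAL_RE = re.compile(r"(.+): Evaluated in ([\d\.]+) seconds")


def slowest_resources(log_text, top=3):
    """Return the `top` slowest entries as (seconds, label), slowest first."""
    timings = []
    for line in log_text.splitlines():
        match = EVAL_RE.search(line)
        if match:
            timings.append((float(match.group(2)), match.group(1).strip()))
    timings.sort(reverse=True)
    return timings[:top]


# Illustrative log excerpt; the resources are hypothetical.
SAMPLE = """\
Info: /Stage[main]/Jdk/Package[openjdk]: Evaluated in 12.40 seconds
Info: /Stage[main]/Maven/File[/opt/maven]: Evaluated in 0.02 seconds
Info: /Stage[main]/Nodejs/Exec[install-node]: Evaluated in 48.91 seconds
Notice: Finished catalog run in 61.50 seconds
"""

if __name__ == "__main__":
    for seconds, label in slowest_resources(SAMPLE):
        print(f"{seconds:8.2f}s  {label}")
```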

Profiling Cucumber runs

http://itshouldbeuseful.wordpress.com/2010/11/10/find-your-slowest-running-cucumber-features/

Delta Provisioning

• Provision locally & for CI
• Faster & different class of problems found
• Matches production state

[Diagram: ‘from scratch’ provision: provision VM, on success export VM to fileshare; delta provision: import VM box, provision VM]

Confidence in Change

[Chart: confidence (NONE to HIGH) across the lifecycle of an infra change. Stages: Dev, Code review, CI & Rollout, Soak in Prod]
Series: initial + Code review + Vagrant + Packer + Cukes + CI + Delta CI

Broken builds

[Diagram: master]

Branch builds

[Diagram: branches BUILDENG-5670 and BUILDENG-5669, master]

Confidence in Change

[Chart: confidence (NONE to HIGH) across the lifecycle of an infra change. Stages: Dev, Branch CI, Code review, CI & Rollout, Soak in Prod]
Series: initial + Code review + Vagrant + Packer + Cukes + CI + Delta CI + Branch CI

Slow builds

Vagrant-AWS
https://github.com/mitchellh/vagrant-aws

• MacPros no longer required
• They were limited in supply & old
• 2x speed improvement
• Only limited by our credit card limit

Catalog Diff

Step 1: Generate a hash of a node’s catalog

puppet master --logdest console --compile HOSTNAME

HOSTNAME.json
- Sort elements
- Remove timestamps
- Generate shasum

f50db91e6461f5bdcb56769a8f77da1fac26943d
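The normalise-then-hash step could be sketched like this. It is an assumption-laden illustration rather than the team's script: the key names in `TIMESTAMP_KEYS` are guesses at what needs stripping, and sorting every list is a simplification that would also reorder lists whose order is meaningful.

```python
import hashlib
import json

# Assumed names of volatile keys; adjust to whatever the real catalog contains.
TIMESTAMP_KEYS = {"timestamp", "expiration"}


def normalise(value):
    """Recursively drop timestamp keys and sort lists so ordering is stable."""
    if isinstance(value, dict):
        return {k: normalise(v) for k, v in value.items() if k not in TIMESTAMP_KEYS}
    if isinstance(value, list):
        items = [normalise(v) for v in value]
        # Sort by each item's canonical JSON so mixed types compare safely.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return value


def catalog_hash(catalog_json: str) -> str:
    """Return the SHA-1 hex digest of the normalised catalog."""
    canonical = json.dumps(normalise(json.loads(catalog_json)), sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Tiny made-up catalog, only to show the call.
    sample = json.dumps({"timestamp": "2014-11-01", "resources": [{"title": "jdk"}]})
    print(catalog_hash(sample))
```

Two catalogs that differ only in element order or timestamps produce the same digest, which is what makes the hash usable as a "did anything change?" signal.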

Catalog Diff

Step 2: Compare the hash of master versus your branch to avoid unnecessary provisioning

Example 1:
master: f50db91e6461f5bdcb56769a8f77da1fac26943d
branch: f50db91e6461f5bdcb56769a8f77da1fac26943d
Hash is the same, no build required

Example 2:
master: f50db91e6461f5bdcb56769a8f77da1fac26943d
branch: 18033e4d21b78bab6deb3ae1ff3c147ade5a37ca
Hash is different, build required
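The comparison step reduces to a per-node equality check. A minimal sketch, assuming the hashes have already been collected into dictionaries keyed by node name (the node names here are hypothetical; the hashes are the ones from the examples above):

```python
def nodes_to_build(master_hashes, branch_hashes):
    """Return the nodes whose catalog hash on the branch differs from master.

    A node that is missing on master counts as changed, since there is
    nothing to compare it against.
    """
    return sorted(
        node
        for node, branch_hash in branch_hashes.items()
        if master_hashes.get(node) != branch_hash
    )


if __name__ == "__main__":
    master = {
        "agent-a": "f50db91e6461f5bdcb56769a8f77da1fac26943d",
        "agent-b": "f50db91e6461f5bdcb56769a8f77da1fac26943d",
    }
    branch = {
        "agent-a": "f50db91e6461f5bdcb56769a8f77da1fac26943d",  # same: skip
        "agent-b": "18033e4d21b78bab6deb3ae1ff3c147ade5a37ca",  # differs: build
    }
    print(nodes_to_build(master, branch))
```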

Catalog Diff

Step 3: Profit!

Reduction in feedback time + $$$ saved

Images: http://pixabay.com/p-30984/ https://www.flickr.com/photos/williamnyk/3598113750/

Infrequent Releases

Painful Puppet Rollouts

• Puppet runs impacted running builds
• Disabling all the build agents
• Performing the roll out
• git clone / librarian-puppet / symlink update on puppetmaster
• Manually kick off puppet on all the build agents
• Enabling all the build agents
• Set of Puppet environments for every bamboo server

Graceful Service restarts

Bamboo Agent JVM process watches for a touch file & shuts down when idle (written as a Bamboo Plugin)
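The touch-file idea can be sketched in a few lines. The real implementation is a Bamboo plugin running inside the agent JVM; the Python below only illustrates the decision logic, and removing the touch file on shutdown is an assumption made here so that a restarted agent does not stop again straight away.

```python
import os


def should_shut_down(touch_file: str, is_idle: bool) -> bool:
    """Shut down only when a restart was requested AND no build is running."""
    return is_idle and os.path.exists(touch_file)


def agent_loop(touch_file, idle_states):
    """Walk through idle/busy observations and stop at the first safe moment.

    `idle_states` stands in for polling the agent; returns the number of
    polls made before shutting down, or None if it never shut down.
    """
    for polls, is_idle in enumerate(idle_states, start=1):
        if should_shut_down(touch_file, is_idle):
            os.remove(touch_file)  # assumption: consume the restart request
            return polls
    return None
```

Because the agent waits until it is idle, a puppet run can request the restart at any time without killing a running build.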

Puppet Environments

• BEFORE - Multiple puppet envs for each Bamboo Server
  • jbac_staging
  • jbac_production
  • cbac_staging
  • cbac_production
  • etc
• AFTER - Changed to use ‘staging’ & ‘production’ only

Updates on Puppetmaster

• BEFORE: Manually on puppetmaster
  • git clone the puppet tree
  • run librarian-puppet to pull external modules
  • Update staging / production symlink
• AFTER: Bamboo build which performs the above steps automatically
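The three manual steps lend themselves to a script. The sketch below is one possible shape, not the actual Bamboo build: the function names and any paths passed in are hypothetical, the commands are returned rather than executed, and the symlink switch relies on POSIX rename semantics.

```python
import os


def rollout_commands(repo_url, release_dir):
    """Return the commands for a fresh checkout, in the order they would run."""
    return [
        ["git", "clone", repo_url, release_dir],
        ["librarian-puppet", "install"],  # to be run with cwd=release_dir
    ]


def switch_symlink(link_path, release_dir):
    """Point link_path (e.g. the staging environment) at release_dir.

    A temporary symlink is created and renamed over the old one, so
    readers never observe a missing link.
    """
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear leftovers from an interrupted run
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, link_path)
```

Each command list can be handed to `subprocess.run(cmd, check=True)` so that a failing step stops the rollout before the symlink is touched.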

Bot automation - ‘open prs’

Less Human interaction +

More automation =

Higher Confidence

Less Human Effort =

Increased frequency of releases

Confidence in Change

[Chart: confidence (NONE to HIGH) across the lifecycle of an infra change. Stages: Dev, Branch CI, Code review, CI & Rollout, Soak in Prod]
Series: initial + Code review + Vagrant + Packer + Cukes + CI + Delta CI + Branch CI + Frequent Releases

“Should I be scared?” – Peter Leschev, 3 months ago

“I’m scared!” – Peter Leschev, 3.5 years ago

Hipchat integration

Confidence in Change

[Chart: confidence (NONE to HIGH) across the lifecycle of an infra change. Stages: Dev, Branch CI, Code review, CI & Rollout, Soak in Prod]
Series: initial + Code review + Vagrant + Packer + Cukes + CI + Delta CI + Branch CI + Frequent Releases + Notification

Confidence in Change

[Chart: confidence (NONE to HIGH) across the lifecycle of an infra change. Stages: Dev, Branch CI, Code review, CI & Rollout, Soak in Prod]
Series: before, after

Confidence in Change

or

Finding & fixing problems sooner rather than later

Snowflakes

Pets

Cattle

Stateless Machines

We’re still on the Journey

Come join us!

atlassian.com/jobs

one more thing…

Puppet Module for Sonatype Nexus

• https://forge.puppetlabs.com/atlassian/nexus_rest
• Configure Nexus using Custom Puppet Provider Types rather than XML files

Thank you!

Questions?

