  • The Journal of Information Technology Management

    Cutter IT Journal

    Vol. 24, No. 8, August 2011

    Devops: A Software Revolution in the Making?

    Opening Statement by Patrick Debois . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

    Why Enterprises Must Adopt Devops to Enable Continuous Delivery by Jez Humble and Joanne Molesky . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

    Devops at Advance Internet: How We Got in the Door by Eric Shamow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    The Business Case for Devops: A Five-Year Retrospective by Lawrence Fitzpatrick and Michael Dillon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

    Next-Generation Process Integration: CMMI and ITIL Do Devops by Bill Phifer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

    Devops: So You Say You Want a Revolution? by Dominica DeGrandis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

    At Arm's Length

    Automating the delivery process reduces the time development and operations have to spend together. This means that when dev and ops must interact, they can focus on the issues that really matter.

    Arm in Arm

    Development and operations often play cats and dogs during deployment. To stop the fighting, we must bring them closer together so they will start collaborating again.

    "Some people get stuck on the word 'devops,' thinking that it's just about development and operations working together. Systems thinking advises us to optimize the whole; therefore devops must apply to the whole organization, not only the part between development and operations."

    Patrick Debois, Guest Editor

  • Cutter IT Journal

    Cutter Business Technology Council: Rob Austin, Ron Blitstein, Christine Davis, Tom DeMarco, Lynne Ellyn, Israel Gat, Tim Lister, Lou Mazzucchelli, Ken Orr, and Robert D. Scott

    Editor Emeritus: Ed Yourdon
    Publisher: Karen Fine Coburn
    Group Publisher: Chris Generali
    Managing Editor: Karen Pasley
    Production Editor: Linda M. Dias
    Client Services: [email protected]

    Cutter IT Journal is published 12 times a year by Cutter Information LLC, 37 Broadway, Suite 1, Arlington, MA 02474-5552, USA (Tel: +1 781 648 8700; Fax: +1 781 648 8707; Email: [email protected]; Website: www.cutter.com). Print ISSN: 1522-7383; online/electronic ISSN: 1554-5946.

    © 2011 by Cutter Information LLC. All rights reserved. Cutter IT Journal is a trademark of Cutter Information LLC. No material in this publication may be reproduced, eaten, or distributed without written permission from the publisher. Unauthorized reproduction in any form, including photocopying, downloading electronic copies, posting on the Internet, image scanning, and faxing is against the law. Reprints make an excellent training tool. For information about reprints and/or back issues of Cutter Consortium publications, call +1 781 648 8700 or email [email protected].

    Subscription rates are US $485 a year in North America, US $585 elsewhere, payable to Cutter Information LLC. Reprints, bulk purchases, past issues, and multiple subscription and site license rates are available on request.

    About Cutter IT Journal

    Part of Cutter Consortium's mission is to foster debate and dialogue on the business technology issues challenging enterprises today, helping organizations leverage IT for competitive advantage and business success. Cutter's philosophy is that most of the issues that managers face are complex enough to merit examination that goes beyond simple pronouncements. Founded in 1987 as American Programmer by Ed Yourdon, Cutter IT Journal is one of Cutter's key venues for debate.

    The monthly Cutter IT Journal and its companion Cutter IT Advisor offer a variety of perspectives on the issues you're dealing with today. Armed with opinion, data, and advice, you'll be able to make the best decisions, employ the best practices, and choose the right strategies for your organization.

    Unlike academic journals, Cutter IT Journal doesn't water down or delay its coverage of timely issues with lengthy peer reviews. Each month, our expert Guest Editor delivers articles by internationally known IT practitioners that include case studies, research findings, and experience-based opinion on the IT topics enterprises face today, not issues you were dealing with six months ago, or those that are so esoteric you might not ever need to learn from others' experiences. No other journal brings together so many cutting-edge thinkers or lets them speak so bluntly.

    Cutter IT Journal subscribers consider the Journal a "consultancy in print" and liken each month's issue to the impassioned debates they participate in at the end of a day at a conference.

    Every facet of IT (application integration, security, portfolio management, and testing, to name a few) plays a role in the success or failure of your organization's IT efforts. Only Cutter IT Journal and Cutter IT Advisor deliver a comprehensive treatment of these critical issues and help you make informed decisions about the strategies that can improve IT's performance.

    Cutter IT Journal is unique in that it is written by IT professionals, people like you who face the same challenges and are under the same pressures to get the job done. The Journal brings you frank, honest accounts of what works, what doesn't, and why.

    Put your IT concerns in a business context. Discover the best ways to pitch new ideas to executive management. Ensure the success of your IT organization in an economy that encourages outsourcing and intense international competition. Avoid the common pitfalls and work smarter while under tighter constraints. You'll learn how to do all this and more when you subscribe to Cutter IT Journal.


  • Opening Statement

    by Patrick Debois, Guest Editor


    Despite all the great methodologies we have in IT, delivering a project to production still feels like going to war. Developers are nervous because they have the pressure of delivering new functionality to the customer as fast as possible. On the other side, operations resists making that change a reality because it knows change is a major cause of outages. So the usual finger-pointing begins when problems arise: "It's a development problem"; "Oh no, it's an operations problem." This tragic scene gets repeated over and over again in many companies, much to the frustration of management and the business, which is not able to predict releases and deliver business value as expected and required.

    The problem is further amplified by two industry trends: agile development and large-scale/cloud infrastructure. Agile development by its nature embraces change and fosters small but frequent deployments to production. This higher frequency puts additional pressure on operations, which becomes even more of a bottleneck in the delivery process. What is remarkable is that even though agile actively seeks collaboration from all its stakeholders, most agile projects did not extend themselves toward the operations people. Continuous integration and the testing department served as the traditional buffer between development and operations, and in that buffer, pressure has also been building up.

    The second driver, large-scale/cloud infrastructure, forced operations to change the way they managed their infrastructure. Traditional manual labor didn't cut it at the scale of thousands of machines. It's good to want to do small increments and frequent changes, but you need to have the tools to support that. Virtualization enabled ops to spin up new environments very quickly, and cloud did away with the resource problem. The real differentiator came from two concepts: configuration management and infrastructure as code. The first enables you to describe the desired state of an infrastructure through a domain-specific language, allowing you to both create and manage your infrastructure using the same tools. Infrastructure as code is a result of the increase of SaaS and APIs used to handle infrastructure. Embracing both concepts allows operations to automate a lot of its work, not only the initial provisioning of a new machine, but the whole lifecycle. This gave birth to the concept of agile infrastructure.
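
    To make the "desired state" idea concrete, here is a minimal sketch in Python of the reconciliation loop at the heart of such tools. The resource names and the current_state() helper are hypothetical stand-ins; real tools such as CFEngine or Puppet express the same loop through their own DSLs.

        # A minimal sketch of desired-state configuration management.
        # The resources are illustrative; real tools (CFEngine, Puppet, Chef)
        # offer richer models, but the reconcile loop is the same idea.

        desired_state = {
            "package:nginx": {"ensure": "installed"},
            "service:nginx": {"ensure": "running", "enabled": True},
            "file:/etc/nginx/nginx.conf": {"mode": "0644"},
        }

        def current_state():
            """Report the actual state of each resource. Stubbed out here; a real
            agent would query the package manager, init system, and filesystem."""
            return {
                "package:nginx": {"ensure": "installed"},
                "service:nginx": {"ensure": "stopped", "enabled": True},
                "file:/etc/nginx/nginx.conf": {"mode": "0600"},
            }

        def reconcile(desired, actual):
            """List the changes needed to reach the desired state. Running this
            repeatedly is idempotent: once the states match, nothing is emitted."""
            changes = []
            for resource, want in desired.items():
                have = actual.get(resource, {})
                for key, value in want.items():
                    if have.get(key) != value:
                        changes.append((resource, key, have.get(key), value))
            return changes

        for resource, key, have, want in reconcile(desired_state, current_state()):
            print(f"{resource}: {key} {have!r} -> {want!r}")

    Because the description is declarative, the same definition both builds a new machine and corrects drift on an existing one, which is what lets automation cover the whole lifecycle rather than just initial provisioning.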

    As a response to these two drivers, devops was born. A number of people got together and started a grass-roots movement that set out to remove the traditional boundaries between development and operations. Some consider this picking up where traditional agile left off. After all, software doesn't bring value unless it is deployed in production; otherwise, it's just inventory. To tackle the problem, devops encourages cross-silo collaboration constantly, not only when things fail.

    Operations becomes a valued member of the traditional agile process with an equal voice. Far too often, SLAs negotiated with a customer work counter to the changes requested by the same customer. Now both functional and nonfunctional requirements are evaluated for their business priority. This moves the usual discussion on priorities back before development starts. As both development and operations serve the same customer, the needs of both are discussed at the same time.

    Once priorities have been determined and work can start, developers pair together with operations people to get the job done. This pairing allows for better knowledge diffusion across the two traditionally separate teams. Issues such as stability, monitoring, and backup can be addressed immediately instead of being afterthoughts, and operations gets a better understanding of how the application works before it actually deploys it in production.

    When it's hard, do it more often. Like exercise, the more you practice deployment to production, the better you will get at it. Thanks to automation, operations can introduce small changes much more quickly to the infrastructure. Practicing deployment in development, testing, quality assurance, and production environments, and managing it from the same set of defined models in configuration management, results in more repeatable, predictable results.

    As both development and operations are now responsible for the whole, lines are beginning to blur. Developers are allowed to push changes to production (after going through an extensive set of tests), and just like their operations counterparts, they wear pagers in case of emergencies. They are now aware of the pain that issues cause, resulting in feedback on how they program things.

    People active in continuous integration were among the first to reach out to operations. In the first article in this issue, Jez Humble and Joanne Molesky elaborate on the concept of an automated deployment pipeline, which extends the boundaries of software development to the deployment phase. All changes to a production environment follow the same flow, regardless of whether the changes are to code, infrastructure, or data. The deployment pipeline acts as sonar for issues: the further you get in the pipeline, the more robust the results get. And you want to track the issues as fast as possible, since the cost of fixing them goes up the further you go in the process.

    Feedback is available to all people: those in operations learn what issues they might expect in production, while developers learn about the production environments. Nor is feedback only of a technical nature: management and the business can learn from production trial runs what customers really want and how they react. This moves product development even further into customer development and allows you to test-drive your ideas in almost real time.

    In our second article, Eric Shamow shows that, like any other change, devops is clearly a discovery path: people and processes don't change overnight. A common pattern in devops adoption seems to be that organizational changes go hand in hand with the introduction of new tools. A tool by itself will not change anything, but the way you use a tool can make a difference in behavior, resulting in a cultural change. Therefore, driving change requires management support; otherwise, the impact of the change will be limited to some local optimization of the global process.

    Shamow's story also shows that a lot of organizations were already taking devops-like approaches, so for them it is nothing new. The great thing with the term devops is that people can use it to label their stories. And thanks to this common label, we can search for those stories and learn from others about what works or not.

    All the focus in devops on automation to reduce cycle time and enhance repeatability makes you wonder if we will all become obsolete and be replaced by tools. There is no denying that automation can be an effective cost cutter, as our next article, a case study by Larry Fitzpatrick and Michael Dillon, clearly demonstrates. But we need to recognize the real opportunity: all the time won by automation gains us more time to design and improve our work. This is where the competitive advantage comes in. People sometimes lightly say that their most valuable asset is people, but it's oh so true. They will give your business the edge it needs if you let them.

    Agility in coding, agility in systems: it takes time and effort to nurture these ideas, but the results can be astonishing. The feedback cycle makes the whole difference in quality and results. With all this close collaboration, lines begin to blur. Will we all be developers? Does everybody deploy? In a multidisciplinary approach, you have either the option of generalizing specialists (the approach taken in the organization Fitzpatrick and Dillon discuss) or specializing generalists. Both can work, but if you are generalizing specialists, you get the extra depth of their specialization as well. This is sometimes referred to as having "T-shaped" people in the collaboration: people with in-depth knowledge (vertical) who are also able to understand the impact in a broad spectrum (horizontal). In the traditional organization, this would map to a matrix organization structure, similar to the one the authors describe. It's important to rotate people through different projects and roles, and it's equally important for the specialists to touch base again with their respective specialist groups to catch up with new global changes.

    UPCOMING TOPICS IN CUTTER IT JOURNAL

    SEPTEMBER (Robert D. Scott): 21st-Century IT Personnel: Tooling Up or Tooling Down?

    OCTOBER (Dennis A. Adams): Creative Destruction: How to Keep from Being Technologically Disrupted

    Many of the initial devops stories originated in Web 2.0 companies, startups, or large-scale Internet applications. In contrast with more traditional enterprises, these organizations have a much lighter and more flexible management structure. It's worth noting that several of the large Web 2.0 companies such as Amazon and Flickr didn't use the term "agile" but were working with a similar mindset. Small groups of both developers and operations people delivered new releases and understood that they were working toward the same goals for the same customers.

    Given these origins, many people at first dismiss devops, thinking it will never work in their traditionally managed enterprise. They see it conflicting with existing frameworks like CMMI, agile, and Scrum. Consider, however, that most of the stories in this journal come from traditional enterprises, which should serve to debunk that myth. Traditional enterprises need to care even more about collaboration, as the number of people involved makes one-on-one communication more difficult. The term devops is only a stub for more global company collaboration.

    Our next author, Bill Phifer, clearly shows there is no reason to throw existing good practices away. They might need to be adapted in execution, but nothing in them inherently conflicts with devops. Focusing particularly on CMMI for Development and ITIL V3, Phifer discusses how these two best practice models can be integrated to enable improved collaboration between development and operations. But as useful as this mapping of CMMI and ITIL functions to each other is, Phifer argues that it is not enough. Rather, the result of effective integration between CMMI and ITIL as applied to the devops challenge is a framework that ensures IT is aligned with the needs of the business.

    Indeed, we sometimes forget that the reason we do IT is for the business, and if the business doesn't earn money, we can be out of a job before we know it. As Phifer emphasizes, this relation to the business is an important aspect in devops: some people get stuck on the word "devops," thinking that it's just about development and operations working together. While this feedback loop is important, it must be seen as part of the complete system. Systems thinking advises us to optimize the whole; therefore, devops must apply to the whole organization, not only the part between development and operations.

    In our final article, Dominica DeGrandis focuses on just this kind of systems thinking. She talks about the leadership changes that will be required to enable the devops revolution and outlines how statistical process control can be applied to devops to increase predictability and thus customer satisfaction. DeGrandis concludes by describing her experience at Corbis (the same company where David Anderson developed his ideas on applying Kanban to software development), recounting how devops-friendly practices like continuous integration and smaller, more frequent releases greatly improved the IT organization's ability to deliver business value.

    Only by providing positive results to the business and management can IT reverse its bad reputation and become a reliable partner again. In order to do that, we need to break through blockers in our thought process, and devops invites us to challenge traditional organizational barriers. The days of top-down control are over: devops is a grass-roots movement similar to other horizontal revolutions, such as Facebook. The role of management is changing: no longer just directive, it is taking a more supportive role, unleashing the power of the people on the floor to achieve awesome results.

    Patrick Debois is a Senior Consultant with Cutter Consortium's Agile Product & Project Management practice. To understand current IT organizations, he has made a habit of changing both his consultancy role and the domain in which he works: sometimes as a developer, manager, sys admin, tester, or even as the customer. During 15 years of consultancy, one thing has annoyed him badly: the great divide between all these groups. But times are changing now: being a player on the market requires you to get the battles between these silos under control. Mr. Debois first presented concepts on agile infrastructure at Agile 2008 in Toronto, and in 2009 he organized the first devopsdays conference. Since then he has been promoting the notion of devops to exchange ideas between these different organizational groups and show how they can help each other achieve better results in business. He's a very active member of the agile and devops communities, sharing a lot of information on devops through his blog (http://jedi.be/blog) and Twitter (@patrickdebois). Mr. Debois can be reached at [email protected].


  • THE POWER OF THE PIPELINE

    Why Enterprises Must Adopt Devops to Enable Continuous Delivery

    by Jez Humble and Joanne Molesky

    Roughly speaking, those who work in connection with the [Automated Computing Engine] will be divided into its masters and its servants. Its masters will plan out instruction tables for it, thinking up deeper and deeper ways of using it. Its servants will feed it with cards as it calls for them. They will put right any parts that go wrong. They will assemble data that it requires. In fact the servants will take the place of limbs. As time goes on the calculator itself will take over the functions both of masters and of servants.

    Alan Turing, "Lecture on the Automatic Computing Engine" [1]

    Turing was as usual remarkably prescient in this quote, which predicts the dev/ops division. In this article, we will show that this divide acts to retard the delivery of high-quality, valuable software. We argue that the most effective way to provide value with IT systems, and to integrate IT with the business, is through the creation of cross-functional product teams that manage services throughout their lifecycle, along with automation of the process of building, testing, and deploying IT systems. We will discuss the implications of this strategy in terms of how IT organizations should be managed and show that this model provides benefits for IT governance, not just in terms of improved service delivery, but also through better risk management.

    THE PREDICAMENT OF IT OPERATIONS

    IT organizations are facing a serious challenge. On the one hand, they must enable businesses to respond ever faster to changing market conditions, serving customers who use an increasing variety of devices. On the other hand, they are saddled with ever more complex systems that need to be integrated and maintained with a high degree of reliability and availability. The division between projects and operations has become a serious constraint both on the ability of businesses to get new functionality to market faster and, ironically, on the ability of IT to maintain stable, highly available, high-quality systems and services.

    The way organizations traditionally deliver technology has been codified in the established discipline of project management. However, while the projects that create new systems are often successful, these projects usually end with the first major release of the system, the point at which it gets exposed to its users. At this stage the project team disbands, the system is thrown over the wall to operations, and making further changes involves either creating a new project or work by a "business as usual" team. (The flow of value in this model is shown in Figure 1.) This creates several problems:

    Many developers have never had to run the systems they have built, and thus they don't understand the tradeoffs involved in creating systems that are reliable, scalable, high performance, and high quality. Operations teams sometimes overcompensate for potential performance or availability problems by buying expensive kit that is ultimately never used.

    Operations teams are measured according to the stability of the systems they manage. Their rational response is to restrict deployment to production as much as possible so they don't have to suffer the instability that releases of poor-quality software inevitably generate. Thus, a vicious cycle is created, and an unhealthy resentment between project teams and operations teams is perpetuated.

    Because there are several disincentives for teams to release systems from early on in their lifecycle, solutions usually don't get near a production environment until close to release time. Operations teams' tight control over the build and release of physical servers slows the ability to test the functionality and deployment of solutions. This means that their full production readiness is usually not assessed until it is too late to change architectural decisions that affect stability and performance.

    The business receives little real feedback on whether what is being built is valuable until the first release, which is usually many months after project approval. Several studies over the years have shown that the biggest source of waste in software development is features that are developed but are never or rarely used, a problem that is exacerbated by long release cycles.

    The funding model for projects versus operating expenses creates challenges in measuring the cost of any given system over its lifecycle. Thus, it is nearly impossible to measure the value provided to the business on a per-service basis.

    Due to the complexity of current systems, it is difficult to determine what should be decommissioned when a new system is up and running. The tendency is to let the old system run, creating additional costs and complexity that in turn drive up IT operating costs.

    The upshot is that IT operations must maintain an ever-increasing variety of heterogeneous systems, while new projects add more. In most organizations, IT operations consumes by far the majority of the IT budget. If you could drive operating costs down by preventing and removing bloat within systems created by projects, you'd have more resources to focus on problem solving and continuous improvement of IT services.

    DEVOPS: INTEGRATING PROJECT TEAMS AND IT OPERATIONS

    Devops is about aligning the incentives of everybody involved in delivering software, with a particular emphasis on developers, testers, and operations personnel. A fundamental assumption of devops is that achieving both frequent, reliable deployments and a stable production environment is not a zero-sum game. Devops is an approach to fixing the first three problems listed above through culture, automation, measurement, and sharing. [2] We will address each of these aspects in turn.

    Culture

    In terms of culture, one important step is for operations to be involved in the design and transition (development and deployment) of systems. This principle is in fact stated in the ITIL V3 literature. [3] Representatives from IT operations should attend applicable inceptions, retrospectives, planning meetings, and showcases of project teams. Meanwhile, developers should rotate through operations teams, and representatives from project teams should have regular meetings with the IT operations people. When an incident occurs in production, a developer should be on call to assist in discovering the root cause of the incident and to help resolve it if necessary.

    Automation

    Automation of build, deployment, and testing is key to achieving low lead times and thus rapid feedback. Teams should implement a deployment pipeline [4] to achieve this. A deployment pipeline, as shown in Figure 2, is a single path to production for all changes to a given system, whether to code, infrastructure and environments, database schemas and reference data, or configuration. The deployment pipeline models your process for building, testing, and deploying your systems and is thus a manifestation of the part of your value stream from check-in to release. Using the deployment pipeline, each change to any system is validated to see if it is fit for release, passing through a comprehensive series of automated tests. If it is successful, it becomes available for push-button deployment (with approvals, if required) to testing, staging, and production environments.
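
    As an illustration, the sketch below models a pipeline's gating logic in Python. The stage names and checks are hypothetical placeholders; the point is the structure: every change enters the same path, and only changes that pass each automated stage become available for push-button deployment further along.

        # A minimal sketch of deployment-pipeline gating (illustrative only).
        # Each stage is a function returning True (pass) or False (fail);
        # a release candidate advances only while every stage passes.

        from dataclasses import dataclass, field

        @dataclass
        class ReleaseCandidate:
            revision: str                 # version-control revision that triggered the run
            passed: list = field(default_factory=list)

        def build_and_unit_test(rc):  return True   # compile, run unit tests
        def automated_acceptance(rc): return True   # end-to-end tests, production-like env
        def deploy_to_staging(rc):    return True   # self-service automated deployment

        PIPELINE = [build_and_unit_test, automated_acceptance, deploy_to_staging]

        def run_pipeline(rc: ReleaseCandidate) -> bool:
            """Push a release candidate through every stage in order; fail fast,
            because the earlier a problem surfaces, the cheaper it is to fix."""
            for stage in PIPELINE:
                if not stage(rc):
                    print(f"{rc.revision}: failed at {stage.__name__}")
                    return False
                rc.passed.append(stage.__name__)
            print(f"{rc.revision}: ready for push-button deployment to production")
            return True

        run_pipeline(ReleaseCandidate(revision="a1b2c3d"))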

    [Figure 1: The flow of value in project-based organizations. The value stream (concept to cash) runs from business project teams (measured on delivery) to operations (measured on stability), where tech, ops, and app management, DBAs, and the service desk run the systems that face users.]


    Deployments should include the automated provisioning of all environments, which is where tools such as virtualization, IaaS/PaaS, and data center automation tools such as Puppet, Chef, and BladeLogic come in handy. With automated provisioning and management, all configuration and steps required to recreate the correct environment for the current service are stored and maintained in a central location. This also makes disaster recovery much simpler, provided you regularly back up the source information and your data.

    Measurement

    Measurement includes monitoring high-level business metrics such as revenue or end-to-end transactions per unit time. At a lower level, it requires careful choice of key performance indicators, since people change their behavior according to how they are measured. For example, measuring developers according to test coverage can easily lead to many automated tests with no assertions. One way to help developers focus on creating more stable systems might be to measure the effect of releases on the stability of the affected systems. Make sure key metrics are presented on big, visible displays to everybody involved in delivering software so they can see how well they are doing.

    In terms of process, a critical characteristic of your delivery process is lead time. Mary and Tom Poppendieck ask, "How long would it take your organization to deploy a change that involves just one single line of code? Do you do this on a repeatable, reliable basis?" [5] Set a goal for this number, and work to identify and remove the bottlenecks in your delivery process. Often the biggest obstacle to delivering faster is the lengthy time required to provision and deploy to production-like environments for automated acceptance testing, showcases, and exploratory and usability testing, so this is a good place to start.
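
    One way to make lead time visible, sketched below in Python, is to compute the elapsed time per change from commit and deployment timestamps, assuming you can export them from your version control and release tooling (the sample data here is invented).

        # A minimal sketch: per-change lead time (check-in to production)
        # computed from exported timestamps. Sample data is hypothetical.

        from datetime import datetime
        from statistics import median

        # (revision, committed_at, deployed_at) exported from your tooling
        changes = [
            ("a1b2c3d", "2011-08-01T09:15", "2011-08-03T16:40"),
            ("e4f5a6b", "2011-08-02T11:00", "2011-08-03T16:40"),
            ("c7d8e9f", "2011-08-03T10:30", "2011-08-10T14:05"),
        ]

        def lead_time_hours(committed_at, deployed_at):
            fmt = "%Y-%m-%dT%H:%M"
            delta = datetime.strptime(deployed_at, fmt) - datetime.strptime(committed_at, fmt)
            return delta.total_seconds() / 3600

        lead_times = [lead_time_hours(c, d) for _, c, d in changes]
        print(f"median lead time: {median(lead_times):.1f}h, worst: {max(lead_times):.1f}h")

    Tracking the median and the worst case, rather than a single average, makes it easier to see whether removing a bottleneck actually moves the distribution.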

    Sharing

    Sharing operates at several levels. A simple but effective form of sharing is for development and operations teams to celebrate successful releases together. It also means sharing knowledge, such as making sure the relevant operations team knows what new functionality is coming their way as soon as possible, not on the day of the release. Sharing development tools and techniques to manage environments and infrastructure is also a key part of devops.

    DIY Deployments

    If you implemented all of the practices described above, testers and operations personnel would be able to self-service deployments of the required version of the system to their environments on demand, and developers would get rapid feedback on the production readiness of the systems they were creating. You would have the ability to perform deployments more frequently and have fewer incidents in production. By implementing continuous delivery, in which systems are production-ready and deployable throughout their lifecycle, you would also get rid of the crunches that characterize most projects as they move toward release day.

    However, while the practices outlined above can help fix the new systems you have entering production, they don't help you fix the string and duct tape that is holding your existing production systems together. Let's turn our attention to that problem now.

    [Figure 2: The deployment pipeline. A single path to production for all changes to your system. Everything required to recreate your production system (except data) is kept in version control: source code, infrastructure configuration, automated tests, database scripts, and the tool chain. Every change triggers a new release candidate, which is built and run through automated tests; changes that pass become available for self-service automated deployment to testing environments, and approved changes can be deployed on demand to staging and production. The pipeline tells you the status of what's currently in each environment, gives everyone visibility into the risk of each change, lets people self-service the deployments they want, and provides complete traceability and auditing of all changes.]


    DID RUBE GOLDBERG DRAW THE ARCHITECTURE DIAGRAM FOR YOUR PRODUCTION SYSTEMS?

    We propose an old-fashioned method for simplifying your production systems, and it rests on an approach to managing the lifecycle of your services. Treat each strategic service like a product, managed end to end by a small team that has firsthand access to all of the information required to run and change the service (see Figure 3). Use the discipline of product management, rather than project management, to evolve your services. Product teams are completely cross-functional, including all personnel required to build and run the service. Each team should be able to calculate the cost of building and running the service and the value it delivers to the organization (preferably directly in terms of revenue).

    There are many ways people can be organized to form product teams, but the key is to improve collaboration and share responsibilities for the overall quality of service delivered to the customers. As the teams come together and share, you will develop a knowledge base that allows you to make better decisions on what can be retired and when.

    What is the role of a centrally run operations group in this model? The ITIL V3 framework divides the work of the operations group into four functions: [6]

    1. The service desk

    2. Technical management

    3. IT operations management

    4. Application management

    Although all four functions have some relationship with application development teams, the two most heavily involved in interfacing with development teams are application management and IT infrastructure. The application management function is responsible for managing applications throughout their lifecycle. Technical management is also involved in the design, testing, release, and improvement of IT services, in addition to supporting the ongoing operation of the IT infrastructure.

    In a product development approach, the central application management function goes away, subsumed into product teams. Nonroutine, application-specific requests to the service desk also go to the product teams. The technical management function remains but becomes focused on providing IaaS to product teams. The teams responsible for this work should also work as product teams.

    To be clear, in this model there is more demand for the skills, experience, and mindset of operations people who are willing to work to improve systems, but less for those who create works of art: manually configured production systems that are impossible to reproduce or change without their personal knowledge and presence.

    [Figure 3: The flow of value in a product development model. The value stream (concept to cash) runs from a product owner through cross-functional product/service teams to users, with operations (ops management, service desk) and infrastructure supporting the teams.]

    Once your organization has reached some level of maturity in terms of the basics of devops as described in the previous section, you can start rearchitecting to reduce waste and unnecessary complexity. Select a service that is already in production but is still under active development and of strategic value to the business. Create a cross-functional product team to manage this service, and create a new path to production, implemented using a deployment pipeline, for this service. When you are able to deploy to production using the deployment pipeline exclusively, you can remove the unused, redundant, and legacy infrastructure from your system.

    Finally, we are not proposing that the entire service portfolio be managed this way. This methodology is suitable for building strategic systems where the cost allocation model is not artificial. For utility systems that are necessary for the organization but do not differentiate you in the market, COTS software is usually the correct solution. Some of the principles and practices we present here can be applied to these services to improve delivery, but a dependence on the product owner to complete change will restrict how much you can do and how fast you can go. Certainly, once changes are delivered by a COTS supplier, you can test and deploy the changes much faster if you have the ability to provision suitable test and production environments on demand using automation.

    DEVOPS AT AMAZON: IF IT'S A TRENDY BUZZWORD, THEY'VE BEEN DOING IT FOR YEARS

    In 2001 Amazon made a decision to take its "big ball of mud" [7] architecture and make it service-oriented. This involved not only changing the architecture of the company's entire system, but also its team organization. In a 2006 interview, Werner Vogels, CTO of Amazon, gives a classic statement not only of the essence of devops, but also of how to create product teams and create a tight feedback loop between users and the business:

    Another lesson we've learned is that it's not only the technology side that was improved by using services. The development and operational process has greatly benefited from it as well. The services model has been a key enabler in creating teams that can innovate quickly with a strong customer focus. Each service has a team associated with it, and that team is completely responsible for the service, from scoping out the functionality, to architecting it, to building it, and operating it.

    There is another lesson here: Giving developers operational responsibilities has greatly enhanced the quality of the services, both from a customer and a technology point of view. The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service. [8]

    Many organizations attempt to create small teams, but they often make the mistake of splitting them functionally, based on technology and not on product or service. Amazon, in designing its organizational structure, was careful to follow Conway's Law: "Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations." [9]

    IMPROVED RISK MANAGEMENT WITH CONTINUOUS DELIVERY

    A common reason given for not trying devops and continuous delivery in IT shops is that this approach does not comply with industry standards and regulations. Two controls that are often cited are segregation of duties and change management.

    Regulations and standards require organizations to prove they know what is happening and why, protect information and services, and perform accurate reporting. Most IT organizations are subject to some kind of regulation and implement controls in order to ensure they are in compliance. Controls are also essential to reducing the risk of having bad things happen that may affect the confidentiality, integrity, and availability of information.

    Segregation of duties is a concept derived from the world of accounting to help prevent fraud and reduce the possibility of error. This control is required by regulations and standards such as SOX and PCI DSS. The relevant COBIT control states:

    PO4.11 Segregation of Duties

    Implement a division of roles and responsibilities that reduces the possibility for a single individual to compromise a critical process. Make sure that personnel are performing only authorized duties relevant to their respective jobs and positions. [10]


    The spirit of the IT control is that one person should not have the ability to make preventable errors or introduce nefarious changes. At a basic level, you have checks and balances to make sure this doesn't happen. This control can be implemented many different ways; an extreme interpretation of this control by some organizations is that development, operations, and support functions need to be functionally and physically separated and cannot talk to each other or view each other's systems.

    Those of us who have experienced working in these organizations know that this level of control only serves to increase cycle time, delay delivery of valuable functionality and bug fixes, reduce collaboration, and increase frustration levels for everyone. Furthermore, this type of separation actually increases the risk of error and fraud due to the lack of collaboration and understanding between teams. All IT teams should be able to talk and collaborate with each other on how to best reach the common goal of successful, stable deployments to production. If your IT teams don't talk and collaborate with each other throughout the service/product delivery lifecycle, bad things will happen.

    A Better Way: The Automated Deployment Pipeline

    Reducing the risk of error or fraud in the delivery process is better achieved through the use of an automated deployment pipeline as opposed to isolated and manual processes. It allows complete traceability from deployment back to source code and requirements. In a fully automated deployment pipeline, every command required to build, test, or deploy a piece of software is recorded, along with its output, when and on which machine it was run, and who authorized it. Automation also allows frequent, early, and comprehensive testing of changes to your systems, including validating conformance to regulations, as they move through the deployment pipeline.

    This has three important effects:

    1. Results are automatically documented, and errors can be detected earlier, when they are cheaper to fix. The actual deployment to production, with all associated changes, has been tested before the real event, so everyone goes in with a high level of confidence that it will work. If you have to roll back for any reason, it is easier.

    2. People downstream who need to approve or implement changes (e.g., change advisory board members, database administrators) can be automatically notified, at an appropriate frequency and level of detail, of what is coming their way. Thus, approvals can be performed electronically in a just-in-time fashion.

    3. Automating all aspects of the pipeline, including provisioning and management of infrastructure, allows all environments to be locked down such that they can only be changed using automated processes approved by authorized personnel.

    Thus, the stated goals of the change management process are achieved: [11]

    Responding to the customer's changing business requirements while maximizing value and reducing incidents, disruptions, and rework

    Responding to business and IT requests for changethat will align the services with the business needs

    As well as meeting the spirit of the control, this approach makes it possible to conform to the letter of the control. Segregation of duties is achieved by having your release management system run all commands within the deployment pipeline as a special user created for this purpose. Modern release management systems allow you to lock down who can perform any given action and will record who authorized what, and when they did so, for later auditing. Compensating controls (monitoring, alerts, and reviews) should also be applied to detect unauthorized changes.
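
    As a rough sketch of the audit trail this enables, the Python fragment below records who approved each pipeline command, when and where it ran, and what it returned. The record format and the authorized_users list are hypothetical; real release management systems keep equivalent records for you.

        # A minimal sketch of pipeline command auditing and authorization.
        # Commands run as the pipeline's own user; humans only approve.

        import getpass, socket, subprocess
        from datetime import datetime, timezone

        audit_log = []                                 # a real system would persist this
        authorized_users = {"release-bot", "jsmith"}   # hypothetical authorization list

        def run_audited(command, approved_by):
            """Run one pipeline command, recording approver, user, host,
            time, and result for later auditing."""
            if approved_by not in authorized_users:
                raise PermissionError(f"{approved_by} may not approve this action")
            result = subprocess.run(command, capture_output=True, text=True)
            audit_log.append({
                "command": command,
                "approved_by": approved_by,
                "ran_as": getpass.getuser(),
                "host": socket.gethostname(),
                "at": datetime.now(timezone.utc).isoformat(),
                "exit_code": result.returncode,
                "output": result.stdout[-1000:],       # keep the tail for the trail
            })
            return result.returncode

        run_audited(["echo", "deploy build 42 to staging"], approved_by="jsmith")
        print(audit_log[-1]["exit_code"], audit_log[-1]["at"])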

    IMPLEMENTING CONTINUOUS DELIVERY

    Continuous delivery enables businesses to reduce cycle time so as to get faster feedback from users, reduce the risk and cost of deployments, get better visibility into the delivery process itself, and manage the risks of software delivery more effectively. At the highest level of maturity, continuous delivery means knowing that you can release your system on demand with virtually no technical risk. Deployments become non-events (because they are done on a regular basis), and all team members experience a steadier pace of work with less stress and overtime. IT waits for the business, instead of the other way around. Business risk is reduced because decisions are based on feedback from working software, not vaporware based on hypothesis. Thus, IT becomes integrated into the business.

    Achieving these benefits within enterprises requires the discipline of devops: a culture of collaboration between all team members; measurement of process, value, cost, and technical metrics; sharing of knowledge and tools; and regular retrospectives as an input to a process of continuous improvement.

    From a risk and compliance perspective, continuous delivery is a more mature, efficient, and effective method for applying controls to meet regulatory requirements than the traditional combination of automated and manual activities, handoffs between teams, and last-minute heroics to get changes to work in production.

    It is important not to underestimate the complexity of implementing continuous delivery. We have shown that objections to continuous delivery based on risk management concerns are the result of false reasoning and misinterpretation of IT frameworks and controls. Rather, the main barrier to implementation will be organizational. Success requires a culture that enables collaboration and understanding between the functional groups that deliver IT services.

    The hardest part of implementing this change is to determine what will work best in your circumstances and where to begin. Start by mapping out the current deployment pipeline (path to production), engaging everyone who contributes to delivery to identify all items and activities required to make the service work. Measure the elapsed time and feedback cycles. Keep incrementalism and collaboration at the heart of everything you do, whether it's deployments or organizational change.

    ACKNOWLEDGMENTS

    The authors would like to thank Evan Bottcher, Tim Coote, Jim Fischer, Jim Highsmith, John Kordyback, and Ian Proctor for giving their feedback on an early draft of this article.

    ENDNOTES

    [1] Turing, Alan. The Essential Turing, edited by B. Jack Copeland. Oxford University Press, 2004.

    [2] Willis, John. "What Devops Means to Me." Opscode, Inc., 16 July 2010 (www.opscode.com/blog/2010/07/16/what-devops-means-to-me).

    [3] ITIL Service Transition. ITIL V3. Office of Government Commerce (OGC), 2007. (ITIL published a Summary of Updates in August 2011; see www.itil-officialsite.com.)

    [4] Humble, Jez, and David Farley. "Continuous Delivery: Anatomy of the Deployment Pipeline." informIT, 7 September 2010 (www.informit.com/articles/article.aspx?p=1621865).

    [5] Poppendieck, Mary, and Tom Poppendieck. Implementing Lean Software Development: From Concept to Cash. Addison-Wesley Professional, 2006.

    [6] ITIL Service Transition. See [3].

    [7] Foote, Brian, and Joseph Yoder. "Big Ball of Mud." In Pattern Languages of Program Design 4, edited by Neil Harrison, Brian Foote, and Hans Rohnert. Addison-Wesley, 2000.

    [8] Vogels, Werner. "A Conversation with Werner Vogels." Interview by Jim Gray. ACM Queue, 30 June 2006.

    [9] Conway, Melvin E. "How Do Committees Invent?" Datamation, Vol. 14, No. 5, April 1968, pp. 28-31.

    [10] COBIT 4.1. IT Governance Institute (ITGI), 2007.

    [11] ITIL Service Transition. See [3].

    Jez Humble is a Principal at ThoughtWorks Studios and author of Continuous Delivery, published in Martin Fowler's Signature Series (Addison-Wesley, 2010). He has worked with a variety of platforms and technologies, consulting for nonprofits, telecoms, and financial services and online retail companies. His focus is on helping organizations deliver valuable, high-quality software frequently and reliably through implementing effective engineering practices. Mr. Humble can be reached at [email protected].

    Joanne Molesky is an IT governance professional and works as a Principal Consultant with ThoughtWorks Australia. Her work focuses on helping companies to reduce risk, optimize IT processes, and meet regulatory compliance through the application of agile methodologies and principles in delivering IT services. She is certified in ITIL and COBIT Foundations and holds CISA and CRISC designations through ISACA. Ms. Molesky can be reached at [email protected].

  • DISCIPLES OF DISCIPLINE

    Devops at Advance Internet: How We Got in the Door

    by Eric Shamow

    Advance Internet is a medium-sized company with roughly 75 employees and around 1,000 servers devoted to a wide array of internally written and modified applications. When I arrived here in 2009, the company was in the midst of rolling out a new configuration management system (CMS) that had been in the works for roughly three years. Multiple deployment failures and poor information transfer had resulted in a complete breakdown of communication between operations and development. Developers were finding it increasingly difficult to get code onto test or production servers, or to get new servers provisioned. When releases were deployed, it was often done incorrectly or without following release notes, and the work often had to be redone several times. Operations had no information from development about when to expect releases, and when the releases did come, they would often be incomplete or inconsistent: they had configuration files in varied locations, contained poor or outdated release notes, and often depended on applications not installed from any repository. The entire operations team eventually left under various circumstances, and I was brought in as the new manager of systems operations to evaluate the situation, build a new team, and try to instill a culture of cooperation between development and operations.

    I was not aware of devops as a movement at the time we started this project, although I became aware of it through my work and by reaching out to other professionals for suggestions and support. However, my experience discovering these problems and solving them from scratch led us straight down the devops path.

    THE UNDERLYING ENVIRONMENT

    There were several problems that became evident immediately. We began to tackle these issues individually, beginning with monitoring. At that time, the alerts we received were not relevant and timely, and metrics were unavailable, which led to a lot of guesswork and from-the-gut decision making. The build situation also needed to be fixed. New servers would be built by operations, be handed over to development for vetting, and then bounce back and forth in a three- or four-week cycle as corrections and modifications were made on both sides. By the end of this process, there was no clear record of who had done what, and the next build was inevitably condemned to failure. Even once we had gotten an idea of what went into a successful build, there was no automated means of repeating the process.

    Monitoring and alerting turned out to be the easy part of the equation, but the one that would have the greatest impact on the tenor of the changes to come. Rather than attempting to fix the existing monitoring framework, we built a brand-new one using Nagios and PNP for graphing. We then developed an application severity matrix based around the organization's list of known applications and their relative importance in order to form the basis for a per-application escalation chain. With these established, we then met group by group with each team and the relevant stakeholders for the application or server in question and painstakingly listed which services needed to be monitored, how critical those services were, and when alerts should escalate.

    The key piece of this was involving not just people responsible for monitoring the system, but the people who owned the system on the business side. Because they were participating in the discussion, they could help determine when a software engineer should be paged or when an alert could be ignored until morning, when they themselves should be paged, and how long to wait before senior management was notified. Once these meetings were held, we were able to move over 80% of the servers in our original alerting system to the new one. The remaining outliers enabled us to convert "unknown unknowns" into "known unknowns": we now had a map of those servers with no clear owner and little documentation, so we knew which servers required the most focus. We also established a recurring quarterly schedule with the same stakeholders to review their application settings and determine whether we needed to make adjustments in thresholds and escalations.
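
    An application severity matrix of the kind described above can be as small as a table mapping each service to its criticality and escalation chain. The Python sketch below is a hypothetical reconstruction; the service names, thresholds, and contacts are invented for illustration.

        # A minimal sketch of a per-application severity matrix and
        # escalation chain. All names and timings are hypothetical.

        severity_matrix = {
            "checkout":  {"criticality": "high",
                          "escalation": ["on-call sysadmin", "dev on call", "senior management"],
                          "page_after_minutes": 0},
            "reporting": {"criticality": "medium",
                          "escalation": ["on-call sysadmin", "app owner"],
                          "page_after_minutes": 30},
            "wiki":      {"criticality": "low",
                          "escalation": ["app owner"],
                          "page_after_minutes": None},   # None: wait until morning
        }

        def who_to_page(app, minutes_down):
            """Return the contacts that should have been notified after the
            service has been down for minutes_down minutes."""
            entry = severity_matrix[app]
            threshold = entry["page_after_minutes"]
            if threshold is None or minutes_down < threshold:
                return []                    # ignorable until the morning review
            levels = 1 + (minutes_down - threshold) // 30   # one level per 30 min
            return entry["escalation"][:levels]

        print(who_to_page("checkout", 45))   # ['on-call sysadmin', 'dev on call']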

    Once this process was complete, access to our new Nagios install was given to everyone inside the technical organization, from the lowest-level support representative to the VP overseeing our group. We met with anyone who was interested and had several training classes that demonstrated how to navigate the interface and how to build graph correlation pages. The goal was to open the metrics to the organization. Operations may not be able to keep its eye on everything at all times, so with more people with different interests looking at our data, problems were more likely to become apparent, and decisions could be made and fact-checked immediately.

    In the meantime, the configuration piece was moving much more slowly. Advance had long grown past the point where server deployments were manual, but what was in place was not much better than that manual deploy. There were two build systems in place, CONF REPO and HOMR, and I will stop to discuss them here purely as cautionary tales. Prior groups had clearly attempted to improve server deployment before and failed. The question I faced was, why? And how could I avoid repeating those mistakes?

    PREVIOUS ATTEMPTS AT A TOOLSET, AND MOVING FORWARD

In small-scale environments, making bulk configuration changes is not challenging. Servers are configured individually, and when a change must be made to multiple systems, administrators typically use a tool such as clusterSSH (which sends input from the user's workstation to multiple remote systems) or a shell script that repeats the same changes on each system sequentially. As environments become larger, however, it becomes more difficult to maintain systems this way, as any changes made in an ad hoc fashion must be carefully documented and applied across the board. Therefore, in large-scale environments, it is common to use a configuration management system, which tracks changes and applies them automatically to designated servers.
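
For readers who have not worked with one, the difference is easiest to see in code. The sketch below, written in Puppet (the tool we ultimately chose, as described later), declares a desired end state once; all the names in it are illustrative:

    # Desired state, declared once: every host matching the node rule
    # receives this resolver configuration on every agent run, so any
    # local edits are detected and corrected rather than accumulating.
    class resolver {
      file { '/etc/resolv.conf':
        ensure  => file,
        owner   => 'root',
        group   => 'root',
        mode    => '0644',
        content => "search example.com\nnameserver 10.0.0.53\n",
      }
    }

    node /^web\d+\.example\.com$/ {
      include resolver
    }

The point is not the syntax but the model: the tool converges each designated server toward the declared state and records what changed, instead of replaying ad hoc commands and hoping someone documented them.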

The original configuration environment evident at the time of my arrival at Advance was CONF REPO, which was in fact a heavily modified version of CFEngine. At some point, an inadequately tested change in CFEngine apparently caused a widespread systems outage, and the ops team responded by modifying CFEngine's default behavior. A script called cache_configs now attempted to generate a simulation of CFEngine's cache from configuration files and automatically copied this to each host. The host would then check what was in the new cache against what was on the filesystem, and if there were any discrepancies, it would stop and send an alert rather than replace the files. Although most of this functionality could have been achieved by CFEngine out of the box, the team had elected to rewrite large parts of it in the new CONF REPO framework, with the result that as CONF REPO's codebase diverged from CFEngine's, the framework became unmaintainable. CONF REPO was dropped as an active concern but never fully retired, so many hosts remained on this system.

This was followed by an attempt at writing a configuration management system from scratch, called HOMR. HOMR was really a wrapper around Subversion and functioned as a pull-only tool. Changes would be checked into HOMR, and then homr forceconfigs would be run on the client, pulling down all configs and replacing existing ones without warning. The result was that HOMR wouldn't be run on a host for weeks or months while individual admins tinkered with settings; then another admin, on an entirely unrelated task (changing default SSH or NSCD settings, for instance), would run homr forceconfigs across the board and wipe out a bunch of servers' configs. This often resulted in production outages.

My evaluation of this situation was that operations was trying to solve a discipline problem with a technological solution. CFEngine, the originally chosen configuration management tool, is a well-respected and widely used system. Had it been used properly and left in place, Advance would have had working configuration management years ago. Instead, due to poor discipline surrounding the testing of changes, the system was abandoned and replaced with increasingly convoluted and difficult-to-maintain frameworks. This was not the way forward.

Although there was appeal in simply returning to CFEngine, there was enough mistrust of the tool around the organization that I instead elected to get a fresh start by bringing in Puppet as our new CM system. The idea was to create a greenfield "safe zone" of servers that were properly configured and managed and then to slowly expand that zone to include more and more of our environment.

Advance is a CentOS/RHEL shop, so I started by building a Cobbler server and cleaning up our existing Kickstart environment so that we had a reasonable install configuration. I then took the base configuration out of HOMR, translated it into Puppet, and set up a bootstrap script in our Kickstart so that we could build Puppet machines with our base environment. We now had the ability to bring up new Puppet hosts. I wrapped the Puppet configuration directories in a Mercurial repository and shut off direct write access to this directory for the entire admin team: all changes would have to be made through Mercurial, ensuring accountability.
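
To give a flavor of what a "base environment" means in Puppet terms, here is a rough sketch of the sort of class a freshly Kickstarted host might receive on its first run. The specific packages, services, and file contents are assumptions for illustration, not the actual contents of our HOMR port.

    # Assumed sketch of a base class applied to every newly built host.
    class base {
      package { ['openssh-server', 'ntp', 'sudo']:
        ensure => installed,
      }

      service { 'ntpd':
        ensure  => running,
        enable  => true,
        require => Package['ntp'],
      }

      file { '/etc/motd':
        content => "Managed by Puppet; manual changes will be reverted.\n",
      }
    }

    # Any host without a more specific node definition gets the base,
    # so a machine is under management from its very first check-in.
    node default {
      include base
    }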

The next step was to start porting configs into configuration management (hereafter CM), building more hosts, and training my newly hired team of admins to use this system. More critically, I needed a way to get buy-in from suspicious developers and product owners (who had heard lots of stories of new configuration management systems before) so that I could lift up their existing systems, get their time and involvement in building CM classes reflecting their use, and ultimately rebuild their production environments to prove that we had in fact captured the correct configuration. This needed to work, it needed to be surrounded by accountability and visible metrics, and it needed to be done quickly.

    BOOTSTRAPPING THE PROCESS

There were a few key pieces of learned wisdom I took into this next phase that helped ensure its success. This was information I'd gathered during my years as a system administrator, and I knew that if I could leverage it, I'd get buy-in both from my new staff and from the rest of the organization. To wit:

Unplanned changes are the root of most failures. Almost all organizations resist anything like change management until you can prove this to them in black and white.

Engineers do a much better job of selling internal initiatives to each other than management does, even if the manager in question is an engineer. When possible, train a trainer, then let him or her train the group.

Engineers like to play with fun stuff. If you tie fun stuff to what they consider boring stuff, they will put up with a certain amount of the boring to get at the fun.

Organizations generally distrust operations if operations has a lousy track record. The only way to fix this is to demonstrate, repeatedly and over a long period of time, that operations can execute on the business's priorities.

The "unplanned changes" principle was the first item to tackle and the easiest to prove within my team. Most engineers are at least somewhat aware of this phenomenon, even if they think that they themselves are never responsible for such problems. Clear metrics, graphs, and logs made this stunningly clear. Problems were almost always caused by changes that had been made outside the approved process, even if those problems didn't show up until a change made as part of the process ran up against the unplanned change weeks later. It took a lot longer to prove this to the business, because engineers will see the results in a log file and immediately draw the picture, whereas less technical stakeholders require explanations and summary information. But over time, the concept is one that most people can understand intuitively; logs and audit trails made the case.

I was careful to pick a single person within the group to train on Puppet, a move that had a number of effects. It created something of a mystique around the tool ("Why does he get to play with it? I want in!"). It also enabled me to carefully shape that person's methodology to ensure that it was in line with my thinking, while giving him a lot of latitude to develop his own best practices and standards. This was a true collaboration, and it gave me a second person who could now answer CM questions as well as I could and who could train others.

Finally, we used the carrot approach and moved the most exciting, interesting new projects into CM. Anything new and shiny went there, regardless of its importance to the business. Want to build a new backup server for everybody's local Time Machine backups? Fine, but it's got to go into CM, and I want to see someone else deploy the final build to ensure it works. A few months of this, combined with scheduled training, meant that we quickly had a full group of people at least nominally versed in how Puppet worked, how to interact with Mercurial, and how to push changes out to servers.

At this point we were successful, but we had not yet achieved our goals. We could easily have failed here through the same kind of testing failure that brought down CFEngine within our group. The key step was scaling the toolset to keep pace with the number of people involved.

    SCALING

Once we were prepared to move beyond two people making commits to our repository, we needed to worry about integration testing. At this point I identified two possible strategies for dealing with this:


1. Automated integration testing with a toolset such as Cucumber

    2. Manual testing with a development environment

Although the first was clearly the optimal solution, the two weren't mutually exclusive. While the Cucumber solution would require a long period of bootstrapping and development, the development environment could be set up quickly and be used and tested immediately. We thus forked our efforts: I tasked the team's CM expert with figuring out the development environment and how to move code around, and I began the longer-term work on the Cucumber solution.

An issue that immediately arose was how to deal with potential code conflicts in the repository. The team we had built was comfortable with scripting and low-level coding but had limited familiarity with code repositories. My initial plan had been to use typical developer tools such as branches and tags, or perhaps a more sophisticated Mercurial-specific toolset revolving around Mercurial Queues. As we began to introduce more junior admins into the process, however, we found an enormous amount of confusion around merging branches, the appropriate time to merge, how to deal with conflicts, and the other issues typical of a small group encountering distributed change control for the first time. Rather than attempt to educate a large number of people on this quickly, we decided to build a simple semaphore-based system to prevent two people from working on the same CM module simultaneously.

Some will argue that this solution is suboptimal, but it allowed us to work around the branching issue during the early phases of deployment. There never was, and still isn't, any doubt that the right way to do this is with native repository technologies, but setting them aside enabled us to roll out a functioning environment with minimal conflicts and without the need to debug, and train people on, complex merge issues. This kept our admins focused on the core elements of moving to real change control rather than tied up in the details of running it. The goal was to keep things simple initially, to encourage more focus on getting systems into CM and writing good, clean CM entries.

    DISCIPLINE AND CODING STANDARDS

Again drawing on lessons learned as a systems administrator who had maintained admin-written scripts, I began work on Puppet operating from a series of core principles that informed decision making around the environment:

Coding standards are easy. Many style guides are available online, but if your tool of choice doesn't have one, write your own. It does not have to be deeply complex, but your code should adhere to two principles: code must be clean and readable, and output must be similarly clean and machine-parsable.

Development should be as generic as possible so that nobody wastes time reinventing the wheel. If you are building a ticket server that runs behind Apache, take the time to build a generic Apache module that can be used by the entire team; if it uses MySQL, build that MySQL module. Treat the code pool as a commons and make everyone responsible for growing it and keeping it clean. (A sketch of this pattern follows this list.)

Rely on self-correction. Once a critical mass is built and everyone is contributing to the same code base, peers won't tolerate bad code, and you as a manager won't have to exercise as much day-to-day oversight. Oversight and periodic code review are still important, but it's much more important to have a strong group ethic that contributing quick but poorly written code to the repository is unacceptable. The more that peers find errors in each other's code and help to fix them, the more likely the system is to be stable as a whole and the better the code will integrate.

Enforce the use of a development environment. If changes aren't put into place in a real, running CM environment and exercised there to verify that they function properly, there is almost no point in doing devops; you are just abstracting the process of doing it live. The key isn't just to put code through a code repository; it's to then apply the same rigorous testing and deployment discipline that developers must follow before operations accepts their code as ready to go.
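
As an illustration of the second principle above, the following sketch shows the shape of a generic Apache module and the one-line reuse it enables. The module layout and all names are invented for the example:

    # A generic module, written once and shared as part of the commons.
    class apache {
      package { 'httpd': ensure => installed }

      service { 'httpd':
        ensure  => running,
        enable  => true,
        require => Package['httpd'],
      }
    }

    # A reusable vhost abstraction, so no team hand-rolls Apache configs.
    define apache::vhost($docroot, $port = '80') {
      include apache

      file { "/etc/httpd/conf.d/${name}.conf":
        content => "<VirtualHost *:${port}>\n  ServerName ${name}\n  DocumentRoot ${docroot}\n</VirtualHost>\n",
        require => Package['httpd'],
        notify  => Service['httpd'],
      }
    }

    # The hypothetical ticket server simply consumes the commons:
    apache::vhost { 'tickets.example.com':
      docroot => '/srv/tickets',
    }

The payoff is that the second, third, and tenth Apache-backed services cost one resource declaration apiece, and every fix to the shared module benefits all of them.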

We did all of the above, to good effect. Within two to three months, everybody on the team had a few personal virtual machines hooked up to the development environment, which they could quickly redeploy to match real-world systems. Code was being contributed, shared, fixed, and extended by the entire group. When the CM system was upgraded and offered new features we wanted to use, it took only a little training, and some examples set by the specialist, for the rest of the group to start working with the new tools.



    CLOSING THE LOOP

None of the things we did to this point would have been helpful if we hadn't achieved the goal of improved communication and better deployment. So how did the company respond to these changes?

Most of the concerns from management and other groups focused on the time to deliver new builds and server updates, the accuracy of those builds, and communication with development about the processes it needed to follow to get material onto a server. Prior to our changeover to devops-style configuration management, the CMS project was a particular sore point. Rebuilding an individual server took an average of two weeks and two full days of admin time. After the change, a single engineer could do that same build in half a day, with absolute precision, and without devoting full attention to the build. As a result, we were able to rebuild entire CMS environments in a day. Because developers worked with us to manage configuration files (only operations has commit rights, but we will often grant developers limited read access), it became very clear what needed to be in a package. And to facilitate faster development in nonproduction and QA environments such as development and load, we used the Puppet ensure => latest directive for key packages. Consequently, developers only needed to check the newest version of their application package into a repository, and within 30 minutes Puppet would install it in their environments. In the meantime, our publicly available and transparent metrics enabled them to measure the results of these changes and ensure that changes were consistent across environments.
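
The pattern looked roughly like the sketch below; the package name and version are invented for illustration. Production stays pinned to an explicit, change-controlled release, while everything else tracks the newest build in the repository:

    # Illustrative sketch: pin production, track latest everywhere else.
    case $::environment {
      'production': {
        package { 'cms-app':
          ensure => '2.4.1-1',   # explicit, change-controlled release
        }
      }
      default: {                 # development, QA, load, ...
        package { 'cms-app':
          ensure => latest,      # next agent run installs the newest build
        }
      }
    }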

These developments had two immediate effects. First, the folks on the old CMS project started talking loudly about how well our project was going, and soon other product managers whose applications suffered from a situation similar to the former CMS project's began to request that their servers move into our new CM. We soon had a longer queue than we could handle of projects fighting to be moved into the new environment. For a while, my ability to move servers in lagged behind my ability to train my staff to do so.

Second, senior management immediately noticed the improvement in turnaround time and the disappearance of debugging time caused by environment inconsistency. They immediately mandated that we roll our new CM out to as many environments as possible, with the goal of moving all of Advance into full CM by year's end. Management recognized the importance of this approach and was willing to make it a business priority rather than merely a technological one.

I cannot overstate the role of PR in making this project a success. A tremendous amount of time went into meeting with product owners, project managers, development teams, and other stakeholders to reassure them about what was changing in their new environment. I was also willing to bend a lot of rules in less critical areas to get what we wanted in place; we made some short-term engineering compromises to assuage doubts in exchange for the ability to gain better control of the system in its entirety. For instance, we would often waive our rules barring code releases on Fridays, or we would handle less urgent requests with an out-of-band response normally reserved for emergencies, to ensure that the developers would feel a sense of mutual cooperation. The gamble was that allowing a few expediencies would enable us to establish and maintain trust with the developers so we could accomplish our larger goals. That gamble has paid off.

Puppet is now fully entrenched in every major application stack and is at roughly 60% penetration within our environment. The hosts not yet in the system are generally development or low-use production hosts that for a variety of reasons haven't been ported; we continue slow progress on these. We have also begun to train engineers attached to the development teams in the system's use, so that they can build modules for us that we can vet and deploy for new environments. This gives the development teams increased insight into and control over deployments, while allowing us to maintain oversight and standards compliance. We have also established some parts of our Mercurial repositories where developers can directly commit files (e.g., often-changing configuration files) themselves, while still leveraging our system and change history.

    NEXT STEPS

We're not finished yet; in fact, we're just getting started. Exposure to the devops community has left me full of fresh ideas for where to go next, but I would say that our top three priorities are integration and unit testing, elastic deployments, and visibility and monitoring improvements.

For integration and unit testing, we have already selected Cucumber and have begun writing a test suite, with the goal that all modifications to our Puppet environment must pass these tests in order to be committed. This suite will be not in lieu of, but in addition to, our development environment. The ultimate goal is to enable development teams or product owners to write agile-style user stories in Cucumber, which developers can translate into RSpec and we can then run to ensure that CM classes do what they assert they do. The integration testing piece is complete; unit tests will take considerably more time, but we will roll these out over the next six months.

Elastic deployment is ready to go but has not yet been implemented. A full discussion is beyond the scope of this article, but in summary, we use a combination of an automated deploy script, Cobbler, Puppet, and Nagios to determine thresholds for startup and shutdown and then automatically deploy hosts according to some concrete business rules. The tools are in place, and we are now looking at viable candidate environments for a test deployment.

Finally, visibility and monitoring, like security, are not a destination but a process. We continue to sharpen the saw by examining alternate monitoring tools and different ways to slice and improve metrics, automating simple statistical analyses such as moving trend lines, and developing dashboards. We are currently working on a number of dashboard components that will allow technically minded developers, and even senior management, to put together correlations and information from the servers on the fly.

Devops is now entrenched in the culture and fully supported by management. Now that we've gotten it in the door, it's up to us to use it to drive our environment further and couple operations even more tightly with development. We're on the same team now; we share the same information and tools.

    FOR THOSE BEGINNING THE JOURNEY

If you are reading this and haven't yet begun the transition to a functioning devops environment, the length of the road may seem daunting. There are many organization-specific choices we made that may not be appropriate for you, but the key is not to get hung up on the details. I would offer the following thoughts to anyone looking to implement a solution similar to ours:

Above all else, focus on the culture change and remember that you are working in a business environment. This change in culture will only occur if there is total commitment both from the driver of the change and from the majority of the people who will have to participate in the new culture.

If you must give (and be prepared to give, because at the end of the day there will come an emergency more critical than your adherence to protocol), have a procedure in place for integrating those changes once the crisis is over. Don't allow management or other teams to see devops as something that prevents critical change; rather, emphasize the seriousness of bypassing it, and then facilitate the changes the stakeholders need while working within your own system for addressing exceptions.

Don't worry about the specific toolset; let the tools you choose select themselves based on the culture and skills of the group you have. Choosing the perfect configuration management system for a team whose strengths are not in the same area as the tool will not provide the best results, and the best-in-breed solution may not be the best for you. Keep everyone comfortable using the system and they will be more willing to use it; tools can always be changed once procedures are set.

Your own transparency will open up other groups. The culture of information hiding in IT can be toxic, and it arises for various reasons, many of them unintentional. Whether you are on the dev or the ops side of the divide, ensure that, regardless of the transparency of the other side, your information, your metrics, your documentation, and your procedures are kept open and available for view. Be open and willing to discuss changes in procedure with the people who are affected by them. If you allow the devops change to be just another procedural change, it will fail. The change must fit the contours of your organization, which means give and take on both sides. If there has been a rift, you will need to give first and often to establish trust. The trust you earn at this point will buy you the leeway to do the previously unthinkable once the project really starts rolling.

At the time of writing, Eric Shamow was Manager of Systems Operations for Advance Internet. Mr. Shamow has led and worked in a wide variety of operations teams for the past 10 years. He has spent time in university environments and private industry, worked at a Wall Street startup, and served at the number three comScore-rated news site. Mr. Shamow has managed a variety of Linux and Unix environments, as well as several large Windows-based deployments. He recently began work as a Professional Services Engineer for Puppet Labs.

Mr. Shamow has extensive experience in automating and documenting large-scale operational environments and has long worked to bring metrics-driven transparency to his teams. He has a paper due to be presented at USENIX's LISA '11 on his work building an elastic virtual environment. Mr. Shamow can be reached at [email protected].

The Business Case for Devops: A Five-Year Retrospective
by Lawrence Fitzpatrick and Michael Dillon

IF YOU WANT IT DONE RIGHT, YOU'VE GOT TO DO IT YOURSELF

THE JOURNEY BEGINS

In 2005, a large organization with a fully outsourced IT operation and a sizable application development footprint was relying primarily on manpower for many aspects of development support, engineering, and operations. Having failed to leverage automation, its development, deployment, and operations latencies were measured in months, and manual change controls led to repeated break/fix cycles. Five years into the underperforming outsourcing contract, the organization lacked confidence that the outsourcer could execute on several key initiatives, so it decided to take back control and insource its application development. Today, the speed, capacity, and reliability of the operation are very different, due in large part to a devops group that grew within the new development organization, functioning at the juncture of applications and operations.

The devops movement (see Table 1) grew out of an understanding of the interdependence and importance of both the development and operations disciplines in meeting an organization's goal of rapidly producing software products and services. In this organization, a devops group emerged during the aforementioned insourcing of the company's application development capabilities.1 Out of a desire to achieve consistency, efficiency, visibility, and predictability in development, and to support the transition to iterative/agile methods, the organization consolidated its development support resources from multiple development teams into a centralized Development Services Group (DSG). The DSG's mission was to standardize practice, automate development, and drive staffing efficiencies for software configuration management (SCM)/build, quality assurance (testing), and systems analysis activities.

From its inception five years ago, the DSG has led a transformation that delivers value far beyond initial expectations. Serving a development community of over 500, the group operates a centralized build farm that handles 25,000 software builds a year and an automated deployment stack that executes 1,500 software deployments a year. In addition to improving reliability, velocity, and transparency, this foundation has also been used to automate test frameworks, provisioning, monitoring, and security assurance. Most significantly, the group has captured cost savings and avoidance for the entire IT organization.

    Bumps in the Road

This journey was particularly challenging, and perhaps different from other devops stories, due to the existence of an outsourcer that was highly resistant to change and frequently cloaked itself in the terms of the contract. This single-vendor contract covered substantially all of the IT function, from operations to maintenance and development. Fortunately, certain contractual terms (activated by outsourcer failures) allowed the organization to begin to take control of development. Minor grumbling proved little hindrance when the new development organization took back control of development tool support (SCM, build, issue tracking, etc.), but the resistance escalated substantially when other functions proved inadequate.

Within two years of insourcing application development, development cycle times had been reduced from months to weeks. But this development acceleration exposed a major impedance mismatch between the operations and development teams. Inefficiencies in the outsourcer's ticket handling, system provisioning in both development and production environments, deployment, change management, and monitoring were exposed as major bottlenecks, as well as a source of tension between the applications and operations organizations. Driven by increasing demands for velocity and the inability of the existing operational model to service those needs, the DSG took on some capability historically reserved for operations.

Table 1. Devops in a Nutshell

- Increased collaboration between operations and development
- Reduced cycle times for operations activities (e.g., provisioning, deployment, change controls)
- Extreme focus on automation in tools and processes
- Fostering continuous improvement, both as a means to the above and as a strategy for adapting to the increasingly rapid changes happening in IT

A number of obstacles dogged the organization as traditionally operational roles were augmented and transformed. The infrastructure groups feared loss of control without loss of accountability and felt constrained by the terms of the outsourcing contract. Finance was concerned that the organization was paying twice for the same services. Compliance watchdogs didn't know how to interpret the DSG's role in the new landscape. Isolated development teams fought the loss of their "oilers"2 to automation.

    Focusing on Results

By focusing on project delivery, sticking to solving problems that impacted development productivity, and paying close attention to cultural issues, the DSG was able to sidestep objections and deliver results. Experienced leaders in the group knew the devops approach would yield agility, flexibility, and cost benefits, but they had no way to quantify those benefits a priori. Now, with five years of experience and metrics behind them, the DSG has been able to substantiate the value proposition. Their experience and lessons learned are reflected in the key decision points and guiding principles that follow.

    GUIDING PRINCIPLES FOR THE DSG

Over a 15-month period spanning 2006 and 2007, the newly formed DSG grew within a development organization that went from 50 people and eight projects to 175 people and 25 projects. Within another two years, the development organization's size had tripled again. In order to maintain efficiency and effectiveness during this rapid growth, the DSG focused on three principles:

    1. Centralize

    2. Remain project-focused

    3. Automate wherever possible

Not only did these principles keep the group focused, but they also reinforced the value of the devops approach throughout the organization. Corporate stakeholders intuitively accepted the value of centralization and automation for cost containment, and project stakeholders accepted the sharing of project resources in exchange for the emphasis on project delivery, skill, and automation.

    Centralize Capabilities as Practices

An obvious way to achieve economies of scale and consistency is to centralize. The DSG initially consolidated practices around SCM, testing, and systems analysis by hiring a practice lead for each area. The practice lead was responsible for delivering skilled staff to project teams, managing staff, and developing best practices and procedures to be used across projects. Initially, project leadership resisted allowing the practice leads to hire for their teams, but by letting the experts hire their own kind, the practice leads took more ownership of project problems and recruited more qualified staff.

Remain Project-Focused: A Hybrid Matrix

In order to neutralize the tendency of centralized organizations to lose touch with those they serve, the DSG staff were organizationally managed by their respective practice leads but were funded by the project teams, embedded into the project teams, and operationally managed by the respective project manager or other project leadership (see Figure 1). This dual-reporting tension, between a practice boss (from the DSG) and an operational boss (from the project), was necessary to ensure that both the project's and the larger organization's goals were met. This structure reinforced important values: a commitment to delivery, a bidirectional conduit to harness innovation from the floor and introduce it back through the practice to other project teams, the ability to hold to standards in the face of delivery pressure, and visibility into project team risk and performance through multiple management channels. It also helped resolve funding issues common to shared services groups: project teams were required to fund their own DSG resources, even though those resources were recruited and managed by the DSG, while simultaneously holding the practices to delivery expectations.

    Automate and Manage to Metrics

While the automation of development procedure and the generation of performance metrics tend to drive efficiency and consistency, development teams often fail to automate. Their reluctance is understandable: among the things that preclude automation are a lack of money for tools, project urgencies that trump the investment of time needed to code procedures, and a bias toward hiring nonprogrammers into support roles in order to save money. The DSG eliminated these obstacles by investing in tools, affirming the importance and expectation of automation (tracked by metrics and published in dashboards), and ensuring that skilled professionals were hired for crucial service roles. All this was provided as a service to the delivery teams.

    EARLY WINS AND PRACTICE LEADERSHIP

Because the DSG represented a significant change to the organization's structure (i.e., by introducing matrix practices), the group needed to show stakeholders early successes. A focus on recruiting technical specialists became the single most important step on the road to building DSG credibility and acceptance. The DSG hired staff with a particular profile: a high affinity for automation, a strong sense of service to the development community, a considerable track record of delivery, and expertise in their practice area.

Practice staff were targeted to key projects that had influence across dimensions such as platform (.NET, Java), size, maturity, and importance to the organization.3 Despite being placed and managed from the center, the embedded practice staff were held accountable for delivery as part of the development team in order to defuse sensitivity to "remote control" from the outside. This focus on accountability helped to mitigate the key weakness of a pure matrix organization, the drop-in engagement model, which doesn't breed trust or enable the tight communication loops common to high-functioning teams.

As practice staff merged into the development streams of key projects, their unique positions allowed them to observe patterns across the organization that were hindering development and to identify opportunities to introduce improvements (see


Recommended