2011 09 19 LSPE Dev Ops Cookbook 1a

Date post: 13-May-2015
Upload: gene-kim
@RealGeneKim, [email protected] The DevOps Cookbook: Codifying Kick-Ass Business Practices That Matter Gene Kim, CISA, TOCICO Jonah #lspe September 19, 2011
Transcript
Page 1: 2011 09 19 LSPE Dev Ops Cookbook 1a

@RealGeneKim, [email protected]

The DevOps Cookbook: Codifying Kick-Ass Business Practices That Matter

Gene Kim, CISA, TOCICO Jonah

#lspe

September 19, 2011

Page 2: 2011 09 19 LSPE Dev Ops Cookbook 1a

Where Did The High Performers Come From?

Page 3: 2011 09 19 LSPE Dev Ops Cookbook 1a

Higher Performing IT Organizations Are More Stable, Nimble, Compliant And Secure

High performers maintain a posture of compliance
• Fewest number of repeat audit findings
• One-third the amount of audit preparation effort

High performers find and fix security breaches faster
• 5 times more likely to detect breaches by automated control
• 5 times less likely to have breaches result in a loss event

When high performers implement changes…
• 14 times more changes
• One-half the change failure rate
• One-quarter the first fix failure rate
• 10x faster MTTR for Sev 1 outages

When high performers manage IT resources…
• One-third the amount of unplanned work
• 8 times more projects and IT services
• 6 times more applications

Source: IT Process Institute, 2008
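
The slide compares organizations on change failure rate and MTTR. As a rough illustration of how those two figures can be derived from change records, here is a minimal Python sketch. The `Change` record shape and the sample values are assumptions made for illustration; they are not taken from the IT Process Institute study.

```python
from dataclasses import dataclass
from datetime import timedelta
from statistics import mean
from typing import Optional


@dataclass
class Change:
    """One production change (hypothetical record shape)."""
    change_id: str
    failed: bool                            # did the change cause an incident?
    time_to_restore: Optional[timedelta]    # set only when failed is True


def change_failure_rate(changes):
    """Fraction of changes that failed (0.0 when there are no changes)."""
    if not changes:
        return 0.0
    return sum(c.failed for c in changes) / len(changes)


def mean_time_to_restore(changes):
    """Mean restore time over failed changes, or None if nothing failed."""
    seconds = [c.time_to_restore.total_seconds()
               for c in changes
               if c.failed and c.time_to_restore is not None]
    if not seconds:
        return None
    return timedelta(seconds=mean(seconds))


# Illustrative sample data, not real measurements.
sample = [
    Change("c1", False, None),
    Change("c2", True, timedelta(minutes=30)),
    Change("c3", True, timedelta(minutes=90)),
    Change("c4", False, None),
]
print(change_failure_rate(sample))
print(mean_time_to_restore(sample))
```

Tracking both numbers per team over time gives a baseline against which the ratios on this slide can be compared.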

Page 4: 2011 09 19 LSPE Dev Ops Cookbook 1a

Common Traits of High Performers

Culture of…

Change management
• Integration of IT operations/security via problem/change management
• Processes that serve both organizational needs and business objectives
• Highest rate of effective change

Causality
• Highest service levels (MTTR, MTBF)
• Highest first fix rate (unneeded rework)

Compliance and continual reduction of operational variance
• Production configurations
• Highest level of pre-production staffing
• Effective pre-production controls
• Effective pairing of preventive and detective controls

Source: IT Process Institute

Page 5: 2011 09 19 LSPE Dev Ops Cookbook 1a

Visible Ops: Playbook of High Performers

• The IT Process Institute has been studying high-performing organizations since 1999
– What is common to all the high performers?
– What is different between them and average and low performers?
– How did they become great?

• Answers have been codified in the Visible Ops Methodology

• The “Visible Ops Handbook” is now available from the ITPI

www.ITPI.org

Page 6: 2011 09 19 LSPE Dev Ops Cookbook 1a

2007: Three Controls Predict 60% Of Performance

• To what extent does an organization define, monitor and enforce the following?
– Standardized configuration strategy
– Process discipline
– Controlled access to production systems

Source: IT Process Institute, 2008

Page 7: 2011 09 19 LSPE Dev Ops Cookbook 1a

The Darkest Moment In My Journey

Page 8: 2011 09 19 LSPE Dev Ops Cookbook 1a

Tough Love From Ari Balogh

Page 9: 2011 09 19 LSPE Dev Ops Cookbook 1a

Why Was I So Unsatisfied With The State Of IT Practice?

• IT operations work continued to be viewed as tactical
• Information security and compliance programs were sucking all the air out of the room (due to scoping problems)
• The activation energy for successful improvement programs was still too high
• IT operations issues were overshadowed by development
– Issues are amplified 10x in production: outages, findings, lawsuits
– Technical debt builds up over time
– IT operations is often the constraint in the organization

• Linkage of IT performance to business performance not obvious enough

• “Why doesn’t the business care? I found the pump handle!”

Page 10: 2011 09 19 LSPE Dev Ops Cookbook 1a

Seeing The Bigger Problem

Operations Sees…
• Fragile applications are prone to failure
• Long time required to figure out “which bit got flipped”
• Detective control is a salesperson
• Too much time required to restore service
• Too much firefighting and unplanned work
• Planned project work cannot complete
• Frustrated customers leave
• Market share goes down
• Business misses Wall Street commitments
• Business makes even larger promises to Wall Street

Dev Sees…
• More urgent, date-driven projects put into the queue
• Even more fragile code put into production
• More releases have increasingly “turbulent installs”
• Release cycles lengthen to amortize “cost of deployments”
• Failing bigger deployments more difficult to diagnose
• Most senior and constrained IT ops resources have less time to fix underlying process problems
• Ever increasing backlog of infrastructure projects that could fix root cause and reduce costs
• Ever increasing amount of tension between IT Ops and Development

These aren’t IT Operations problems…

These are business problems!

Page 11: 2011 09 19 LSPE Dev Ops Cookbook 1a

The Dreaded Disease

IT Operations Constipatus (noun)

Occurs when IT Operations creates fatal blockages in project flow. Creates blinding pain in Dev organization.

Blockage worsens with chronic break/fix and security/compliance work, and when technical debt is never paid off.

Causes host to lose energy, become unable to achieve organizational goals. Dangerous to CEOs.

Photo credit: http://www.flickr.com/photos/keenepubliclibrary/2435790649/

Page 12: 2011 09 19 LSPE Dev Ops Cookbook 1a

DevOps Can Break A Core Chronic Conflict In IT *

• Every IT organization is pressured to simultaneously:
– Respond more quickly to urgent business needs
– Provide stable, secure and predictable IT service

Source: The authors acknowledge Dr. Eliyahu Goldratt, creator of the Theory of Constraints and author of The Goal, has written extensively on the theory and practice of identifying and resolving core, chronic conflicts.

Words often used to describe ITIL process owners: “hysterical, irrelevant, bureaucratic, bottleneck, difficult to understand, not aligned with the business, immature, shrill, perpetually focused on irrelevant technical minutiae…”

Page 13: 2011 09 19 LSPE Dev Ops Cookbook 1a

Framed This Way, Help Can Come From A Surprising Place

• The VP Application Development will often have the following complaints:
– IT Operations is the bottleneck
– We complete the code, but it takes too long for IT Operations to get the code into production
– Environments are never available when we need them
– Releases often cause chaos and disruption to all the other production services
– Turbulent installs have become the norm: 30 min installs take 3 days
– Due to slow OS upgrades, applications delayed by 2 quarters
– We are always late getting features to market

Page 14: 2011 09 19 LSPE Dev Ops Cookbook 1a

A Reframed IT Operations Problem Statement

• Increase flow from Dev to Production
– Increase throughput
– Decrease WIP

• Our goal is to create a system of operations that allows
– Planned work to quickly move to production
– Service to be quickly restored when things go wrong

• How does this relate to Visible Ops?
– We focused much on “unplanned work”
– What’s happening to all the planned work?
– At any given time, what should IT Ops be working on?
– Now we are focusing on the flow of planned work
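
One way to see why “increase throughput” and “decrease WIP” both speed up flow is Little's Law from queueing theory (average cycle time = WIP / throughput). The slides do not name this law; it is brought in here as an illustration, and the numbers below are hypothetical.

```python
def average_cycle_time(wip, throughput):
    """Little's Law: average time an item spends in the system.

    wip        -- average number of work items in flight
    throughput -- average number of items completed per time unit
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


# Hypothetical numbers: 40 changes in flight, 5 completed per week.
baseline = average_cycle_time(wip=40, throughput=5)

# Halving WIP and doubling throughput each halve the average cycle time.
less_wip = average_cycle_time(wip=20, throughput=5)
more_flow = average_cycle_time(wip=40, throughput=10)

print(baseline, less_wip, more_flow)
```

The relationship holds for long-run averages in a stable system, so treat it as a planning heuristic rather than a prediction for any single change.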

Page 15: 2011 09 19 LSPE Dev Ops Cookbook 1a

What These Breakthroughs Look Like

Page 16: 2011 09 19 LSPE Dev Ops Cookbook 1a

Goal #1: Decrease Cycle Time Of Releases

• Create determinism in the release process
• Move packaging responsibility to development
• Release early and often
• Decrease cycle time
– Reduce deployment times from 6 hours to 45 minutes
– Refactor deployment process that had 1300+ steps spanning 4 weeks

• Never again “fix forward,” instead “roll back,” escalating any deviation from plan to Dev
• Verify for all handoffs (e.g., correctness, accuracy, timeliness)
• Ensure environments are properly built before deployment begins
• Control code and environments down the preproduction runways
• Hold Dev, QA, Int, and Staging owners accountable for integrity

Page 17: 2011 09 19 LSPE Dev Ops Cookbook 1a

Goal #2: Increase Production Rigor

• Define what work is and where work can come from
• Protect the integrity of the work queue (e.g., are checks being written that won’t clear?)
• To preserve and increase throughput, elevate preventive projects and maintenance tasks
• Document all work, changes and outcomes so that it is repeatable
• Ops builds Agile standardized deployment stories, to be completed after Dev sprints are complete
• Maintain adequate situational awareness so that incidents can be quickly detected and corrected
• Standardize unplanned work and escalations
• Always seek to eradicate unplanned work and increase throughput

Lean Principle: “Better -> Faster -> Cheaper”

Page 18: 2011 09 19 LSPE Dev Ops Cookbook 1a

The Prescriptive DevOps Cookbook

• Capture and codify how to start and finish successful DevOps transformations
– Create isomorphic mapping between plant floors and IT shops
– Co-authoring with Patrick Debois, Mike Orzen, John Willis
– Describe in detail how to replicate the transformations described in “When IT Fails: The Novel”

• Goals
– How do Development, IT Operations and Infosec become dependable partners
– How do they work together to solve business problems (and Infosec, too)

Page 19: 2011 09 19 LSPE Dev Ops Cookbook 1a

Goal Statement

• Build a system of work where Dev and Ops can be relied upon so that they work together to simultaneously achieve:
– fast flow of features into production
– deliver services in production that are:

• Attributes of Rugged DevOps
– Scalability, availability, survivability, sustainability, security, supportability, defensibility

Page 20: 2011 09 19 LSPE Dev Ops Cookbook 1a

Underpinning Principles

• Agile: increase velocity
• Lean: reduce WIP
• Systems thinking: Dev, Test, IT Operations, Project Management, Information Security
• Lean: implementing effective countermeasures

Page 21: 2011 09 19 LSPE Dev Ops Cookbook 1a

Cookbook Outline

• Part 1: Enable IT Operations to become a dependable partner

• Part 2: Enable Dev to become a dependable partner

• Part 3: Dev and IT Operations to create breakthrough results

Page 22: 2011 09 19 LSPE Dev Ops Cookbook 1a

Part 1: IT Ops

• Enable fast, repeatable and predictable flow of planned work
– Create single work queue, master list of commitments, master production schedule
– Create catalog of acceptable work: bill of materials, work centers, routings
  • Runners, repeaters and strangers
– Create job release function
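
“Runners, repeaters and strangers” is a Lean way of sorting work by how often it recurs. A minimal Python sketch of building such a catalog from a request history might look like the following. The function name, the thresholds and the sample request types are all hypothetical; the slides do not prescribe any cut-offs.

```python
from collections import Counter


def classify_work(history, runner_min=10, repeater_min=3):
    """Label each work type by how often it appears in past requests.

    runner_min and repeater_min are hypothetical cut-offs; tune them to
    your own request volume.
    """
    catalog = {}
    for work_type, count in Counter(history).items():
        if count >= runner_min:
            catalog[work_type] = "runner"      # routine, high-volume work
        elif count >= repeater_min:
            catalog[work_type] = "repeater"    # recurring, lower volume
        else:
            catalog[work_type] = "stranger"    # rare or one-off work
    return catalog


# Hypothetical request log for one quarter.
history = (["password reset"] * 25
           + ["provision VM"] * 6
           + ["datacenter migration"])
print(classify_work(history))
```

Runners are the natural first candidates for standardized routings, while strangers usually need individual review before they are released into the work queue.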

Page 23: 2011 09 19 LSPE Dev Ops Cookbook 1a

Part 1: IT Ops

• Minimize disruption from unplanned work
– Standardize unplanned work: make it repeatable
– Modify first response: ensure constrained resources have all data at hand to diagnose
– Elevate preventive activities to reduce incidents
– Stories about reducing reliance on Brent

Page 24: 2011 09 19 LSPE Dev Ops Cookbook 1a

Part 2: Dev

• Continuous deployment and integration in place

• Working through some assumptions about Agile methods in place

Page 25: 2011 09 19 LSPE Dev Ops Cookbook 1a

Part 3: DevOps

• Pick a pilot project
• Baseline current performance
• Create organization
– Someone needs to see the end-to-end flow from Dev to Production to Incident
– Enable correct feedback loops

Page 26: 2011 09 19 LSPE Dev Ops Cookbook 1a

Part 3: DevOps

• Dev and Ops work together in Sprint 0 and 1 to create code and environments
– Create environment that Dev deploys into
– Create downstream environments: QA, Staging, Production
– Create the Agile information radiator
– Integrate infosec and QA into daily sprint activities

Page 27: 2011 09 19 LSPE Dev Ops Cookbook 1a

Part 3: DevOps

• Embed Ops person into Dev structure
– Describes non-functional requirements, use cases and stories from Ops
– Has a vote like other team members
– Responsible for bringing Ops experiences into “quality at the source”
– Has special responsibility for pulling the Andon cord

Page 28: 2011 09 19 LSPE Dev Ops Cookbook 1a

Part 3: DevOps

• Potentially decouple production releases from Sprint boundaries
– Issue: how to enable deployments that are more frequent than the typical 1 or 2 week intervals
– Sprints vs. Kanbans

Page 29: 2011 09 19 LSPE Dev Ops Cookbook 1a

Part 3: DevOps

• Put Dev into Ops escalation chain
– MobBrowser case study: “Waking up developers at 3am is a great feedback loop: defects get fixed very quickly”
– Determine when SOD is a control being relied upon

Page 30: 2011 09 19 LSPE Dev Ops Cookbook 1a

The Prescriptive DevOps Cookbook

• I am seeking fellow travelers who want to capture and codify the best known methods, patterns/anti-patterns, recipes and case studies of how to implement successful DevOps-style transformations.

Page 31: 2011 09 19 LSPE Dev Ops Cookbook 1a

Questions

• Are there areas that we’ve neglected to mention?

• What are the largest barriers to implementing what’s been covered?

• Do you have any tricks/tips/cookbooks you’d like to share?

Page 32: 2011 09 19 LSPE Dev Ops Cookbook 1a

Page 33: 2011 09 19 LSPE Dev Ops Cookbook 1a

The Theory of Constraints Approach To Visible Ops

• Dr. Goldratt wrote The Goal in 1984, describing Alex’s challenge to fix his plant’s cost and due date issues within 90 days

• Some tenets that went against common wisdom:
– Every flow of work has a constraint/bottleneck
– Any improvement not made at the bottleneck is merely an illusion
– Fallacy of cost accounting as operational management tool
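
The claim that improvements away from the bottleneck are an illusion can be illustrated with a toy model of a serial flow, where end-to-end throughput equals the capacity of the slowest stage. This is a simplified Python sketch; the stage names and capacities are hypothetical and are not taken from The Goal or from the slides.

```python
def bottleneck(capacity):
    """Return the stage with the lowest capacity (the constraint)."""
    return min(capacity, key=capacity.get)


def throughput(capacity):
    """A serial flow can move no faster than its slowest stage."""
    return min(capacity.values())


# Hypothetical capacities, in changes per week.
flow = {"dev": 20, "qa": 15, "it_ops": 5}

faster_dev = {**flow, "dev": 40}        # improvement away from the constraint
faster_ops = {**flow, "it_ops": 10}     # improvement at the constraint

print(bottleneck(flow), throughput(flow))
print(throughput(faster_dev), throughput(faster_ops))
```

In this model, doubling Dev capacity leaves throughput at 5 changes per week, while doubling IT Ops capacity raises it to 10. Real flows also have queues and variability, which this sketch ignores.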

Page 34: 2011 09 19 LSPE Dev Ops Cookbook 1a

When IT Fails: The Novel, Day 1

• Steve Masters, CEO
• Dick Landry, CFO

• Parts Unlimited: $4B revenue/year

Page 35: 2011 09 19 LSPE Dev Ops Cookbook 1a

When IT Fails: The Novel, Day 2

• Bill Palmer, VP IT Operations (promoted)
– Wes Davis, Director, Distributed Systems
– Patty McKee, Director, IT Service Support Services

• The payroll outage
– All salaried employees will get paid, but not the hourlies
– CISO put in tokenization application in the factories, breaking database query that uses SSN
– IT Ops thought it was a SAN firmware upgrade failure
– All HR apps go down
– CFO is on front page of news, apologizing to community

Page 36: 2011 09 19 LSPE Dev Ops Cookbook 1a

When IT Fails: The Novel, Day 4

• Chris Allers, VP Application Development
• Sarah Moulton, SVP Retail Products

• “We can deploy by next week by cutting some corners, but IT Ops is in the way… again…”

• “Bill, your team lacks a sense of urgency. We must go. We’ve already bought the newspaper ads – they’re bought, paid for and being printed…”

Page 37: 2011 09 19 LSPE Dev Ops Cookbook 1a

When IT Fails: The Novel, Day 3

• Nancy Mailer, Chief Audit Executive
• John Pesche, CISO

• IT Operations has 980 IT general control deficiencies on critical financial systems, potentially dooming financial statement to having a footnote. Needs management response in 1 week.

• Bill grapples with who to put on the project. 1 yr of work, just to fix issues, even without Phoenix.

Page 38: 2011 09 19 LSPE Dev Ops Cookbook 1a

The Goal For IT: Day 10

• The Deployment

• Database conversion, the point of no return, taking 1000x longer.

• In store POS won’t come up by Sat 8am, maybe by next Tuesday

• Emptying shopping cart shows last successful order credit card #

Page 39: 2011 09 19 LSPE Dev Ops Cookbook 1a

Resources

• From the IT Process Institute www.itpi.org
– Both Visible Ops Handbooks
– ITPI IT Controls Performance Study

• “Lean IT” by Orzen and Bell
– Winner of the Shingo Prize 2011

• “Inspired: How To Create Products That Customers Love” by Cagan

• “Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation” by Humble, Farley

• Follow Gene Kim
– @RealGeneKim
– mailto:[email protected]
– http://realgenekim.me/blog

Page 40: 2011 09 19 LSPE Dev Ops Cookbook 1a

Call To Action

• If you’re interested in reviewing early versions of “When IT Fails: The Novel,” email me.

• If you’re interested in helping build or review the DevOps Cookbook, email me.

• I’m [email protected]

• Thank you for allowing me to join your tribe!

Page 41: 2011 09 19 LSPE Dev Ops Cookbook 1a

Page 42: 2011 09 19 LSPE Dev Ops Cookbook 1a

About Gene Kim

• I’ve spent the last 12 years studying high performing IT organizations, trying to understand:
– What do they have in common?
– What is present in successful transformations, absent in unsuccessful transformations?
– How do we lower the activation energy required to create the transformations?
• Founder and former CTO of Tripwire, Inc.
• Co-author of Visible Ops Handbook, Security Visible Ops Handbook
• Active researcher
– Co-founder of IT Process Institute
– Committee member of Institute of Internal Auditors
– Leader of PCI Security Standards Council Scoping SIG

