Fail Fast, Fail Often

Post on 17-Feb-2017

FAIL FAST, FAIL OFTEN

Gordon Haff @ghaff, Technology Evangelist

William Henry @ipbabble, DevOps Strategy Lead

13 July 2016

FAILURE

FAILURE

FAILURE

ALSO FAILURE

FAILURES HAVE CONSEQUENCES

THE INESCAPABLE CONCLUSION?

DON’T FAIL

DON’T FAIL

FAIL WELL

Experiment by Peter Skillman, former VP of design at Palm

WHAT HE LEARNED

• Kindergarteners do not spend 15 minutes in a bunch of status transactions trying to figure out who is going to be CEO of Spaghetti Corporation.

• They don’t sit around talking about the problem. They just start building to determine what works and what doesn’t.

SOFTWARE = GREAT MATCH FOR FAILING WELL

FIVE PRINCIPLES: THE RIGHT

• scope

• approach

• workflow

• incentives

• culture

THE RIGHT SCOPE

Constrain the impact of failure

• Enable experimentation

• Stop cascading of failures

• Make deployments incremental, frequent, and routine events

• Generally decouple activities and decisions from each other

• Small, autonomous, bounded context services

SMALL

• “Two pizza teams”

• Well-defined functional units

• Organized around business capabilities (Conway's Law)

AUTONOMOUS

• Implementation changes can happen independently of other services

• Data and functionality exposed only through service calls over the network

• Designed to be externalizable

• No back-doors

THE RIGHT APPROACH

Continuously experiment, iterate, and improve

• It’s about the process

• Identify mistakes early

• Establish safety nets

• Fail and move on

THE PROCESS

Involves people and communication

• The most effective processes have continuous communication: think scrum and kanban

• Allows for collaboration that can identify failures before they happen

• Allows for feedback to continuously improve and cultivate growth

• Provides transparency

DEV LESSONS: BREAKING CODE VIOLENTLY

Build in violent failures to highlight issues

• C/C++ lessons:

• Sanity check using assertions

• Invariant checks

• If ever I’m here in the code and these conditions aren’t met, then I have no business being here. Something is wrong and I should fail violently.

• Involves tracing through the failure
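The slide refers to C/C++ assertions; the same idea can be sketched in Python. This is only an illustration under stated assumptions: the `Account` class, its "balance never goes negative" invariant, and the `require` helper are hypothetical and do not come from the talk.

```python
class InvariantViolation(Exception):
    """Raised when the program reaches a state it has no business being in."""


def require(condition: bool, message: str) -> None:
    # Unlike the built-in assert statement, this check is not stripped
    # when Python runs with -O, so it always fails loudly.
    if not condition:
        raise InvariantViolation(message)


class Account:
    """Hypothetical example type; its invariant is balance >= 0."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._check_invariant()

    def _check_invariant(self) -> None:
        require(self.balance >= 0, f"negative balance: {self.balance}")

    def withdraw(self, amount: int) -> None:
        # Sanity checks on entry: if these fail, the caller has a bug.
        require(amount > 0, f"withdrawal must be positive, got {amount}")
        require(amount <= self.balance, f"overdraft: {amount} > {self.balance}")
        self.balance -= amount
        # Invariant check on exit: stop here rather than carry bad state on.
        self._check_invariant()
```

Because the exception is raised at the point where the condition first breaks, the traceback points at the faulty call instead of at some later symptom, which is what makes tracing through the failure practical.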

AUTOMATED REGRESSION TESTING

• As products and services evolved, we discovered that maintaining tests and incrementally adding new ones became valuable

• These tests were/are most often based on experienced failures and bugs

• Scripts were developed to run nightly builds against various developer changes to test for regression

• Testing tools evolved - proprietary and open source
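A regression test of the kind described above pins down a failure that has already happened so that it cannot quietly return. The sketch below uses Python's standard `unittest` module; the version-sorting bug and the names `version_key` and `run_suite` are hypothetical, invented only for illustration.

```python
import unittest


def version_key(version: str) -> tuple:
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


class VersionRegressionTests(unittest.TestCase):
    def test_two_digit_component_sorts_after_single_digit(self):
        # Hypothetical past bug: comparing raw strings put "1.10" before "1.9".
        self.assertGreater(version_key("1.10"), version_key("1.9"))

    def test_sort_order(self):
        versions = ["1.9", "1.10", "1.2"]
        self.assertEqual(
            sorted(versions, key=version_key), ["1.2", "1.9", "1.10"]
        )


def run_suite() -> bool:
    """Run the regression tests and report whether all of them passed."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(VersionRegressionTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

A nightly build script could call `run_suite()` after each build and mark the build as failed when it returns `False`.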

OPS LESSONS: CHAOS MONKEY

Test robustness of recovery using failure

• Platform should provide uninterrupted services to the customer

• Therefore:

• Should always recover in an acceptable amount of time

• We should have random failures to ensure that changes have not regressed or caused new recovery problems

http://understeer.hatenablog.com/entry/2012/02/29/224629
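The random-failure idea can be tried on a toy model before it is pointed at a real platform. The sketch below is not the Chaos Monkey tool and does not use its API; `Instance`, `Cluster`, and `chaos_run` are hypothetical names, and recovery time is not modeled, only whether service continues and whether recovery completes.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    healthy: bool = True


class Cluster:
    """Toy stand-in for a platform; not modeled on any real tool's API."""

    def __init__(self, names):
        self.instances = [Instance(n) for n in names]

    def is_serving(self) -> bool:
        # The service is up as long as at least one instance is healthy.
        return any(i.healthy for i in self.instances)

    def recover(self) -> int:
        # Restart every failed instance; return how many were restarted.
        failed = [i for i in self.instances if not i.healthy]
        for inst in failed:
            inst.healthy = True
        return len(failed)


def chaos_run(cluster: Cluster, rounds: int, seed: int = 0) -> list:
    """Kill one random instance per round, then check service and recovery."""
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    killed = []
    for _ in range(rounds):
        victim = rng.choice(cluster.instances)
        victim.healthy = False
        killed.append(victim.name)
        if not cluster.is_serving():
            raise RuntimeError(f"outage after losing {victim.name}")
        restarted = cluster.recover()
        if restarted != 1 or not all(i.healthy for i in cluster.instances):
            raise RuntimeError(f"recovery failed after losing {victim.name}")
    return killed
```

Running `chaos_run(Cluster(["a", "b", "c"]), rounds=20)` completes, while a single-instance cluster raises on the first round, which is the kind of recovery gap the random failures are meant to expose.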

THE RIGHT WORKFLOW

Repeatably automate for consistency

• Goal is repeatable automation

• Toyota’s yellow cord (the andon cord, which any worker can pull to stop the line)

• Initially pipelines may be very different

• Different tools

• Traditional vs. “cloud native”

• It’s a journey

• Consolidation evolves naturally

DESIRABLE ENTERPRISE CI/CD WORKFLOW

[Diagram. Repositories: myRepo, ProjectRepo, BuildRepo, ReleaseRepo, 3rd Party. Steps: Local Test, Commit, Push, CI (Pass/Fail), Build, Test, Review/Approve, Deliver, Deploy (CD), Monitor.]
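The pass/fail gating in a workflow like this one can be sketched as a runner that executes stages in order and stops at the first failure. The stage names are taken from the diagram labels; `run_pipeline` and the stand-in stage functions are hypothetical and do not represent any particular CI/CD tool.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order; stop at the first one that reports failure."""
    completed: List[str] = []
    for name, step in stages:
        if not step():
            # Fail fast: later stages never see a broken change.
            return False, completed
        completed.append(name)
    return True, completed


# Stand-in stages: here the test stage fails, so nothing is reviewed,
# delivered, or deployed.
stages: List[Stage] = [
    ("build", lambda: True),
    ("test", lambda: False),
    ("review", lambda: True),
    ("deliver", lambda: True),
    ("deploy", lambda: True),
]
ok, completed = run_pipeline(stages)
# ok is False and completed is ["build"]
```

Stopping at the first failed stage keeps the scope of a bad change small, which is the same principle the deck applies to services and teams.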

CI/CD PIPELINE TOOLSET

[Figure: tool logos grouped under "CI/CD Workflow UI"; the only name recoverable from the text is Gerrit.]

OPS LESSONS: RED/GREEN

Configuration as code has built-in failure

[Diagram: Continuous Integration / Continuous Deployment. Labels: src repo; Image & Package & Metadata Repository; environments Dev./Build, QA, Production in OHC; Events.]
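The slide does not spell out the mechanics. Assuming "red/green" refers to the pattern often called blue/green deployment, where a new version goes to the idle environment and traffic is switched only after a health check passes, a minimal sketch looks like this. The `Router` class and its `release` method are hypothetical.

```python
from typing import Callable, Dict


class Router:
    """Toy traffic switch between two environments, 'red' and 'green'."""

    def __init__(self, live: str = "red") -> None:
        self.live = live
        self.versions: Dict[str, str] = {"red": "v1", "green": "v1"}

    @property
    def idle(self) -> str:
        return "green" if self.live == "red" else "red"

    def release(self, version: str, health_check: Callable[[str], bool]) -> bool:
        target = self.idle
        # Deploy to the idle side only; live traffic is not touched yet.
        self.versions[target] = version
        if not health_check(version):
            # The failure is contained: the live side still runs the old version.
            return False
        # Health check passed, so flip traffic to the new version.
        self.live = target
        return True
```

A failed release leaves `live` unchanged, so rolling back requires no action at all; this is one reading of "built-in failure" under the assumption stated above.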

THE RIGHT INCENTIVES

Align rewards and behavior with desirable outcomes

• Incentives (advancement, money, recognition) need to reward trust, cooperation, and innovation

• Peer reward systems also valuable

• Individual has control over their own success

• But people still have responsibility for their actions

THE RIGHT CULTURE

Build systems and organizations that allow for failing well

• Transparency

• Even good decisions can have bad outcomes

• Innovation inherently risky

• Cut losses (avoid sunk cost fallacy)

This is why open source is so successful!

BUT CULTURE ISN’T SOMETHING YOU JUST CHANGE

• Lack of agreed-to model of what “right” culture looks like

• Different organizations require different behaviors

• Culture change is difficult to measure and quantify

• Culture is very hard to impose

• Culture is an output, not an input

CULTURE IS:

• emergent

• pervasive

• the keystone

THANK YOU

CREDITS

Tacoma Narrows Bridge: Barney Elliott, The Camera Shop. Screenshot taken from 16 mm Kodachrome motion picture film by Barney Elliott.

Time cover: Time, Inc.

Wipeout: Flickr/CC, https://www.flickr.com/photos/andymorffew/15843725192

Marshmallow challenge: http://marshmallowchallenge.com/Welcome.html

Linux Collaboration Summit: Linux Foundation.

Two pizzas: Flickr/CC, https://www.flickr.com/photos/dongkwan/283076601

Frog: Kathy, Flickr/CC, https://flic.kr/p/b9fFV

Square peg: Flickr/CC, https://www.flickr.com/photos/epublicist/3546059144/