
Embracing Failure - chriswu.me/talks/embracing-failure.pdf

Date post: 01-Jun-2020
Embracing Failure (not my life story)
Transcript
Page 1: Embracing Failure - chriswu.mechriswu.me/talks/embracing-failure.pdf · (fool me once. shame on you. fool me twice. shame on me.) Postmortems 1. Reconstruct the factual timeline 2.

Embracing Failure (not my life story)


Setting the Mood

•Understand that failures WILL happen
•Failures are not binary
•Impact determines importance
  •deadlines for fixes are variable


Terminology

•Website
•Production
•Downtime


Monitor Failures


What is Monitoring?

•Graphs. Everywhere.
•Alerts on failures
  •phone calls
  •texts
•Answers: Are we failing?
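
One way to turn "Are we failing?" into an alert is to track the recent error rate and notify someone when it crosses a threshold. The sketch below is only an illustration: the class name `ErrorRateMonitor`, the `notify` callback, the window size, and the threshold are all assumptions, not something the talk specifies.

```python
from collections import deque
from typing import Callable, Deque


class ErrorRateMonitor:
    """Tracks recent request outcomes and alerts when too many fail.

    Window size and threshold are illustrative assumptions.
    """

    def __init__(self, notify: Callable[[str], None],
                 window: int = 100, threshold: float = 0.05) -> None:
        self.notify = notify  # hypothetical hook, e.g. sends a text or places a call
        self.threshold = threshold
        self.outcomes: Deque[bool] = deque(maxlen=window)
        self.alerting = False

    def error_rate(self) -> float:
        """Fraction of failed requests in the current window."""
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def record(self, ok: bool) -> None:
        """Record one request outcome and alert on the healthy-to-failing edge."""
        self.outcomes.append(ok)
        failing = self.error_rate() > self.threshold
        if failing and not self.alerting:
            self.notify(f"error rate {self.error_rate():.0%} exceeds {self.threshold:.0%}")
        self.alerting = failing
```

Alerting only on the transition from healthy to failing keeps a sustained outage from producing a phone call per request.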


healthcare.gov

•Know when you’re down before CNN


Postmortems (fool me once. shame on you. fool me twice. shame on me.)


Postmortems

1. Reconstruct the factual timeline

2. Root cause analysis

3. Remediation items
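
The three steps above can be captured as a small record that a postmortem moderator fills in. This is a sketch under stated assumptions: the `Event` and `Postmortem` types and their field names are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Event:
    at: datetime
    source: str  # e.g. "alerts", "deploy log", "chat"
    what: str


@dataclass
class Postmortem:
    title: str
    timeline: List[Event] = field(default_factory=list)
    root_causes: List[str] = field(default_factory=list)
    remediation_items: List[str] = field(default_factory=list)

    def add_events(self, events: List[Event]) -> None:
        """Merge events from one source, keeping the timeline chronological."""
        self.timeline = sorted(self.timeline + list(events), key=lambda e: e.at)

    def is_complete(self) -> bool:
        """True once all three steps have produced at least one entry."""
        return bool(self.timeline and self.root_causes and self.remediation_items)
```

Merging events from several sources into one sorted list is what makes the timeline factual: the ordering comes from timestamps rather than from anyone's memory of the incident.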


Postmortems

•Why did we fail?
•Blameless
•Moderated


Gamedays (You wouldn’t wing a talk. Don’t wing a hot fix)


Gameday

•Best defense is a good offense
•Simulate possible failures
•Do it in production


kill -9

1. Draw a block diagram

2. Cut every connection

3. Watch the fireworks
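
The steps above can be sketched as a tiny planning script: write the block diagram down as data, then list one exercise per connection. The `Diagram` shape and the function names are assumptions made for illustration, and any component names passed in would be hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# Block diagram as data: component -> components it talks to.
Diagram = Dict[str, List[str]]


def connections(diagram: Diagram) -> List[Tuple[str, str]]:
    """List every (caller, callee) connection in the block diagram."""
    return [(src, dst) for src, dsts in diagram.items() for dst in dsts]


def gameday_plan(diagram: Diagram) -> Iterator[str]:
    """Yield one exercise per connection: cut it, then observe."""
    for src, dst in connections(diagram):
        yield f"cut {src} -> {dst}; watch what {src} does and whether an alert fires"
```

For a hypothetical diagram such as `{"web": ["api"], "api": ["db", "cache"]}` this yields three exercises, one for each connection.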


SafeMachine (like a state machine … but safer)


Try, Try, Try again

•What if we could just retry failures?
•Side effects are the root of all evil
•Safe failures vs Unsafe failures
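
A minimal sketch of the retry idea, assuming that a "safe" failure means nothing with side effects has happened yet and an "unsafe" failure means a side effect may already have happened. The exception classes and `retry_safe` are hypothetical names and are not the talk's SafeMachine implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SafeFailure(Exception):
    """Assumed meaning: failed before any side effect, so a retry is harmless."""


class UnsafeFailure(Exception):
    """Assumed meaning: a side effect may have happened; do not retry blindly."""


def retry_safe(action: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Retry `action` on SafeFailure only; UnsafeFailure propagates immediately."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except SafeFailure:
            if attempt == attempts:
                raise  # out of attempts: surface the last safe failure
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")
```

Because `UnsafeFailure` is never caught here, an action that may have half-completed stops the loop and reaches a human instead of being repeated.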


What’s in a SafeMachine

•Actions
•States

[Diagram: START → (compute) → Computed File → (upload) → Uploaded File → (record successful) → END]


initialize_succeeded

initialize_failed

initialize_inprogress

computed_succeeded
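
The state names above suggest that each action is bracketed by an in-progress state and a succeeded or failed state. A sketch of that bookkeeping follows, assuming an `<action>_<status>` naming pattern read off the slide; the `journal` list and the function names are hypothetical.

```python
from typing import Callable, List

STATUSES = ("inprogress", "succeeded", "failed")


def state_name(action: str, status: str) -> str:
    """Build a state name such as 'initialize_inprogress'."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    return f"{action}_{status}"


def run_action(name: str, action: Callable[[], None], journal: List[str]) -> bool:
    """Run one action, recording a state before and after it."""
    journal.append(state_name(name, "inprogress"))
    try:
        action()
    except Exception:
        journal.append(state_name(name, "failed"))
        return False
    journal.append(state_name(name, "succeeded"))
    return True
```

Recording the in-progress state before the action starts means that a crash halfway through still leaves evidence of which action was running.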


The Pipeline

[Diagram: START → a1 → a2 → a3 → END; a1 appears twice, a2 and a3 three times each]


The Pipeline

[Diagram: START → Computed File → Uploaded File → END; the three transitions are labeled Safe, Unsafe, Safe]
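
One possible reading of the Safe / Unsafe / Safe labels is that they decide what happens after a crash: an interrupted safe action is simply retried, while an interrupted unsafe action needs a person to look at it. The sketch below assumes that reading, along with hypothetical action names (`compute`, `upload`, `record`) and an `<action>_<status>` state naming pattern; none of it is confirmed as the talk's implementation.

```python
from typing import Dict, List, Optional

# Hypothetical actions, labeled in the Safe / Unsafe / Safe order from the slide.
PIPELINE: List[str] = ["compute", "upload", "record"]
IS_SAFE: Dict[str, bool] = {"compute": True, "upload": False, "record": True}


def next_step(last_state: Optional[str]) -> str:
    """Decide what to do next, given the last recorded state (None if nothing ran)."""
    if last_state is None:
        return f"run {PIPELINE[0]}"
    action, _, status = last_state.rpartition("_")
    if action not in IS_SAFE:
        raise ValueError(f"unknown action in state: {last_state}")
    if status == "succeeded":
        i = PIPELINE.index(action)
        return "done" if i + 1 == len(PIPELINE) else f"run {PIPELINE[i + 1]}"
    if status in ("inprogress", "failed"):
        # Assumption: safe actions can be repeated; unsafe ones need a human.
        return f"retry {action}" if IS_SAFE[action] else f"escalate {action}"
    raise ValueError(f"unknown status in state: {last_state}")
```

Under these assumptions `next_step("compute_inprogress")` returns `"retry compute"`, while `next_step("upload_inprogress")` returns `"escalate upload"`, since a half-finished upload may already have had a side effect.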


Embracing Failure

•Monitor
•Postmortems
•Gamedays - you wouldn’t wing a talk?
•SafeMachine


@chriswu_


Additional resources

• Postmortems - https://codeascraft.com/2012/05/22/blameless-postmortems/

• Gamedays - https://stripe.com/blog/game-day-exercises-at-stripe

• links at the bottom of this post are also great

• Error Tracking - https://getsentry.com/welcome/

