SRE From Scratch

Date post: 16-Nov-2014
Upload: grier-johnson
Description:
How to bootstrap an SRE team in your company: how to hire SREs, what to have them work on, and how to interact with them as a team. Finally, some thoughts on general practices to consider before your SREs arrive. There are also kitten pictures.
Transcript
Page 1: SRE From Scratch

SRE FROM SCRATCH

Page 2: SRE From Scratch

SITE RELIABILITY ENGINEERING

Page 3: SRE From Scratch

PRODUCTION ENGINEERING

Page 4: SRE From Scratch

DEVOPS?

Page 5: SRE From Scratch

WHAT DO SRE DO?

Page 6: SRE From Scratch

KEEP THE SITE UP

Page 7: SRE From Scratch

KNOW THE PRODUCTION ENVIRONMENT

Page 8: SRE From Scratch

KNOW THEIR PRODUCT

Page 9: SRE From Scratch

LIAISON, ADVISOR, CONSULTANT

Page 10: SRE From Scratch

TOOLING AND AUTOMATION

Page 11: SRE From Scratch

TRIAGE

Page 12: SRE From Scratch

SO? WHY DO I NEED THEM?

Page 13: SRE From Scratch

UPTIME

Page 14: SRE From Scratch

THE ENVIRONMENT IS A PRODUCT

Page 15: SRE From Scratch

THEY’VE DONE THIS BEFORE

Page 16: SRE From Scratch

OK... LET’S HIRE SOME

Page 17: SRE From Scratch

WHAT TO LOOK FOR...

Page 18: SRE From Scratch

SRES!

Page 19: SRE From Scratch

SYSADMINS THAT PROGRAM

Page 20: SRE From Scratch

PROGRAMMERS THAT DO SYSADMIN

Page 21: SRE From Scratch

EXPERIENCE WITH SCALE

Page 22: SRE From Scratch

HOW DO I INTERVIEW THEM?

Page 23: SRE From Scratch

FUNDAMENTALS

Page 24: SRE From Scratch

HARDWARE

Page 25: SRE From Scratch

SYSTEM INTERNALS

Page 26: SRE From Scratch

UNIX ENVIRONMENT

Page 27: SRE From Scratch

NETWORKING

Page 28: SRE From Scratch

APPLICATION SUPPORT

Page 29: SRE From Scratch

OPERATING AT SCALE

Page 30: SRE From Scratch

PROGRAMMING

Page 31: SRE From Scratch

DON’T HIRE HEROES

Page 32: SRE From Scratch

OK, I’VE HIRED SOME, WHAT SHOULD THEY DO?

Page 33: SRE From Scratch

DESIGN REVIEW

Page 34: SRE From Scratch

DATA FLOWS

Page 35: SRE From Scratch

DEPENDENCIES

Page 36: SRE From Scratch

FAILURE CONDITIONS

Page 37: SRE From Scratch

SCALING

Page 38: SRE From Scratch

LAUNCH PREPAREDNESS

Page 39: SRE From Scratch

DOCUMENTATION

Page 40: SRE From Scratch

BUILD INFRASTRUCTURE

Page 41: SRE From Scratch

MONITORING

Page 42: SRE From Scratch

DEPLOYMENT

Page 43: SRE From Scratch

OPERATOR TOOLS

Page 44: SRE From Scratch

CONFIGURATION MANAGEMENT

Page 45: SRE From Scratch

SELF-SERVICE

Page 46: SRE From Scratch

HOW SHOULD THE TEAMS INTERACT...

Page 47: SRE From Scratch

DON’T GIVE ALL THE DAY-TO-DAY TASKS TO THE SRES

Page 48: SRE From Scratch

SHARE THE LOAD

Page 49: SRE From Scratch

HAVE YOUR SRES SIT WITH YOU

Page 50: SRE From Scratch

INCLUDE THEM IN DISCUSSIONS THAT AFFECT THE PRODUCTION ENVIRONMENT

Page 51: SRE From Scratch

SOFTWARE IS NEVER THROWN OVER THE WALL

Page 52: SRE From Scratch

HAND-OFFS

Page 53: SRE From Scratch

SRES SHOULD BLOCK DANGEROUS CHANGES

Page 54: SRE From Scratch

IF YOUR SRES ARE FIGHTING FIRES, THEY’RE NOT BUILDING INFRASTRUCTURE

Page 55: SRE From Scratch

IF YOUR SOFTWARE IS CAUSING FIRES, FIX IT

Page 56: SRE From Scratch

ASK YOUR SRE TO HELP MAKE FLAME-PROOF SOFTWARE

Page 57: SRE From Scratch

DON’T HIDE YOUR PROBLEMS FROM SRE

Page 58: SRE From Scratch

SRE SHOULD BE INVOLVED TO UNDERSTAND THE PROBLEM

Page 59: SRE From Scratch

EVERYONE SHOULD BE WRITING CODE OR MAKING HARD DECISIONS

Page 60: SRE From Scratch

OF COURSE THERE ARE OPTIONS...

Page 61: SRE From Scratch

SRE CAN DO ALL THE SUPPORT

Page 62: SRE From Scratch

SRES ARE A LIMITED RESOURCE

Page 63: SRE From Scratch

SWE CAN SUPPORT PRODUCTS...

Page 64: SRE From Scratch

APP SUPPORT BY SWE, INFRASTRUCTURE SUPPORT BY SRE

Page 65: SRE From Scratch

OR JUST ROTATE AROUND

Page 66: SRE From Scratch

ANY PRODUCTION ADVICE?

Page 67: SRE From Scratch

SELF-SERVICE

Page 68: SRE From Scratch

ALL TOOLS SHOULD BE WRITTEN WITH THE IDEA THAT ROBOTS CAN RUN THEM

Page 69: SRE From Scratch

BEFORE ROBOTS RUN THEM, ANYONE IN THE COMPANY SHOULD BE ABLE TO

Page 70: SRE From Scratch

PEOPLE SHOULD MAKE HARD DECISIONS, NOT PUSH BUTTONS

Page 71: SRE From Scratch

GIVE PEOPLE ACCESS

Page 72: SRE From Scratch

SWE SHOULD HAVE AS MUCH ACCESS AS THEY NEED.

Page 73: SRE From Scratch

SWE ALREADY WRITES CODE THAT HAS ACCESS TO SENSITIVE DATA

Page 74: SRE From Scratch

PRODUCTION DATA STAYS IN PRODUCTION

Page 75: SRE From Scratch

MAKE GOOD SYNTHETIC DATA

Page 76: SRE From Scratch

MAKE GOOD WAYS TO TEST IN PROD

Page 77: SRE From Scratch

CANARY, A/B TEST, ETC.
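
One way to test in production is to send a small, stable slice of users to the new code path. The following is a minimal sketch of that idea, not the author's method: the function name `in_canary`, the `salt` value, and the 10,000-bucket granularity are all assumptions made for illustration.

```python
import hashlib


def in_canary(user_id: str, percent: float, salt: str = "canary-v1") -> bool:
    """Return True if this user falls inside the canary slice.

    Hypothetical helper: hashing the user id keeps the decision stable
    across requests, so a given user always sees the same version.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto buckets 0..9999.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # percent=5.0 selects buckets 0..499, i.e. roughly 5% of users.
    return bucket < percent * 100
```

Changing the salt reshuffles which users land in the canary, which helps avoid always exposing the same people to risky releases.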

Page 78: SRE From Scratch

LEARN TO TRIAGE

Page 79: SRE From Scratch

THINGS BREAK, YOU MUST FIX THEM

Page 80: SRE From Scratch

MONITORING, METRICS, OPERATOR TOOLS, FAST BUILD AND DEPLOY

Page 81: SRE From Scratch

TO FIX, YOU NEED TO KNOW IT’S BROKEN

Page 82: SRE From Scratch

MONITORING

Page 83: SRE From Scratch

MONITOR APPLICATIONS

Page 84: SRE From Scratch

MONITOR BEHAVIOR

Page 85: SRE From Scratch

STANDARDIZE YOUR METRICS

Page 86: SRE From Scratch

PUSH METRICS OUT

Page 87: SRE From Scratch

DECOUPLE YOUR SYSTEMS

Page 88: SRE From Scratch

WATCH SYSTEMS AS A FUNCTION OF CAPACITY
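
Watching a system as a function of capacity means comparing current load to a known limit rather than looking at the raw number alone. A minimal sketch, assuming the capacity figure comes from load testing; the function names and the 0.8 threshold are hypothetical.

```python
def utilization(current: float, capacity: float) -> float:
    """Return load as a fraction of known capacity (1.0 means full)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return current / capacity


def near_capacity(current: float, capacity: float, threshold: float = 0.8) -> bool:
    """Flag a system whose load has reached the chosen fraction of capacity.

    The 0.8 default is an illustrative assumption, not a recommendation.
    """
    return utilization(current, capacity) >= threshold
```

For example, 400 requests per second means little on its own, but against a measured limit of 500 it is 80% of capacity and worth watching.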

Page 89: SRE From Scratch

ONLY ALERT ON SYSTEM METRICS KNOWN TO HURT YOU

Page 90: SRE From Scratch

DATA STORES

Page 91: SRE From Scratch

BEWARE THE RDBMS

Page 92: SRE From Scratch

LEARN TO SHARD
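
The simplest form of sharding picks a database by hashing the record key. This is a sketch under that assumption, with a hypothetical `shard_for` helper; the slides do not say which scheme the author has in mind.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in the range 0..num_shards-1.

    Uses a stable hash (not Python's built-in hash(), which varies
    between processes) so every host agrees on the placement.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo placement moves most keys when the shard count changes; consistent hashing or a lookup table are common alternatives when resharding is expected.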

Page 93: SRE From Scratch

DITCH THE DURABILITY WHERE YOU CAN

Page 94: SRE From Scratch

BUT FIGURE OUT HOW TO BOOTSTRAP NON-DURABLE STORES

Page 95: SRE From Scratch

MEMCACHE IS A BLESSING AND A CURSE

Page 96: SRE From Scratch

ALWAYS CONSIDER A SITE-WIDE POWER OUTAGE

Page 97: SRE From Scratch

USE DURABLE AND NON-DURABLE STORES TOGETHER
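
The last few slides can be illustrated with a cache-aside pattern: a fast non-durable store in front of a durable one, plus a way to refill the cache after it is lost. This is a minimal sketch using plain dictionaries as stand-ins for both stores; the class and method names are hypothetical.

```python
from typing import Dict, Iterable, Optional


class CachedStore:
    """Cache-aside sketch: a non-durable cache in front of a durable store."""

    def __init__(self, durable: Dict[str, str]) -> None:
        self.durable = durable            # stand-in for a database
        self.cache: Dict[str, str] = {}   # stand-in for memcache

    def put(self, key: str, value: str) -> None:
        # Write to the durable store first so a crash never loses data.
        self.durable[key] = value
        self.cache[key] = value

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        value = self.durable.get(key)
        if value is not None:
            self.cache[key] = value       # refill the cache on a miss
        return value

    def lose_cache(self) -> None:
        """Simulate a site-wide power outage wiping the non-durable store."""
        self.cache.clear()

    def warm(self, keys: Iterable[str]) -> None:
        """Bootstrap the cache from the durable store before taking traffic."""
        for key in keys:
            self.get(key)
```

Warming the hot keys before reopening traffic keeps an empty cache from sending the full read load to the durable store at once.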

Page 98: SRE From Scratch

ASK YOUR SRE FOR MORE INFO

Page 99: SRE From Scratch

DESPITE ALL THIS, YOU CAN STILL FAIL...

Page 100: SRE From Scratch

OBVIOUS FAILURE

Page 101: SRE From Scratch

DOWNTIME

Page 102: SRE From Scratch

DOWNTIME WITHOUT KNOWING

Page 103: SRE From Scratch

NON-OBVIOUS FAILURES

Page 104: SRE From Scratch

HEROIC ACTS

Page 105: SRE From Scratch

WERE YOU UP ALL NIGHT?

Page 106: SRE From Scratch

DID YOU DO THAT SAME TASK ALL DAY?

Page 107: SRE From Scratch

DID A WHOLE TEAM STOP WHAT THEY WERE DOING?

Page 108: SRE From Scratch

THESE ARE HEROIC ACTS, THEY ARE POISON

Page 109: SRE From Scratch

HEROISM = FAILURE

Page 110: SRE From Scratch

COMES FROM LEGACY SYSTEMS, PROCEDURES

Page 111: SRE From Scratch

ALSO FROM PERSONALITY TRAITS...

Page 112: SRE From Scratch

QUESTIONS?

• Grier Johnson

• @grierj


