TriAgile 2021

Date post: 01-Dec-2021
TriAgile 2021: Despite Good Code, Production Failures - Why?
Kevin S. Green, IBM Service Management Architect

TriAgile

Anthony J D’Angelo

Kahoot.it - Question 1: Your Experience

Kahoot.it - Question 2: Your Experience

Kahoot.it - Question 3: Your Experience

Kahoot.it - Question 4: Your Experience

Kahoot.it - Question 5: Your Experience

Agenda

- What is CSMO?
- What is SRE?
- Diversify Agile teams with SRE
- Getting Started
- Q&A

IBM Cloud © 2018 IBM Corporation

Cloud Service Management and Operations

What is CSMO?

Cloud Service Management and Operations (CSMO)

[Diagram labels: Development, IT Operations, DevSecOps, Site Reliability Engineering (SRE), Environment Ops (EnvOps), AIOps, Service Management (ITIL, IT4IT, ZeroOutage), NoOps?, ITIL?, aaS, Shift Left, Shift Right, Concept, Client]

What is Site Reliability Engineering (SRE)?

What is SRE?

• System Thinking
• Data-Driven Decisions
• Engineering Rigor
• Embracing Risk
• Eliminating Toil
• Technical Debt
• Simplicity
• Collaboration
• Shared Responsibility
• Trust & Transparency

• Ops to scale with load through Automation, but don’t stop at Automation
• Cap operational load: 50% time spent on toil - 50% on engineering projects (improvements)
• Excess Ops work overflows to the Dev Team, share 5% of Ops work with Dev Team
• Have an SLA / SLO for the service, measure against the SLA / SLO
• Error budget to control velocity. Effective self-regulation of features vs. stability
• Observability, including the Golden Signals: Latency, Traffic, Errors, Saturation, Requests
• Actionable symptom-based alerts, from the user perspective. (Automated) runbooks to govern actions.
• Blameless Post Mortem for every event
• Hire (only) developers; Common staffing pool for SRE and Dev

[Diagram labels, in slide order: Monitoring, Incident Response, Post Mortem / RCA, Testing & Release Procedures, Capacity Planning, Development, Product]

“Fundamentally, it’s what happens when you ask a software engineer to design an operations function.”

[Diagram labels: response, analysis, preparation, design]

Client Example

Diversify w/ SRE

Questionable Progress

The majority of clients focus on Development and DevOps to get code created and delivered to clients quickly. Ops modernization waned.

• Scenario: The client's DevOps achieved high velocity. Development/Software Engineers and Ops were working separately. Development was successfully modernizing its work.
• Outcome: The client's Agile development practices resulted in significant progress, but Ops struggled to keep pace. The client experienced regular monthly outages during peak demand.
  – Stability outages

The client implemented CSMO practices to achieve stability!

Manifesto for Agile Software Development

Diversify w/ SRE

We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.

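SRE topics mapped to the four Agile values:
• Individuals and interactions: SRE Roles
• Working software: NFRs, IM, B2M
• Customer collaboration: Service Levels
• Responding to change: Error Budget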

Individuals and Interactions

Diversify w/ SRE

• Application SREs – work closely with Application Development team.

• Platform SREs – focus on platforms such as cloud or other foundation infrastructure.

• Transformation SREs – drive the transformation of the organization to adopt SRE.

• Solution SREs – focus on products such as monitoring tools, CI/CD pipeline.

Working Software

Diversify w/ SRE

SREs utilize several practices to ensure that the target software is operational. While development focuses on functional requirements, SREs focus on non-functional requirements to ensure the software works. An example practice is Build to Manage (B2M).

Customer Collaboration

Diversify w/ SRE

Uptime     Downtime per month   Downtime per year
99.999 %   0.4 min              5 min
99.99 %    4 min                52 min
99.9 %     43 min               8h 46m
99.5 %     3h 36m               1d 19h 48m
99 %       7h 12m               3d 15h 36m
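
The downtime figures in the table follow directly from the availability percentage. The sketch below reproduces the arithmetic, assuming a 30-day month and a 365-day year; the function names are illustrative and do not come from the slides.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float) -> float:
    """Minutes of downtime permitted over a period at a given availability."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_pct / 100)


def format_minutes(minutes: float) -> str:
    """Render minutes as 'Xd Yh Zm', rounded to the nearest whole minute."""
    total = round(minutes)
    days, rest = divmod(total, 24 * 60)
    hours, mins = divmod(rest, 60)
    parts = []
    if days:
        parts.append(f"{days}d")
    if hours:
        parts.append(f"{hours}h")
    parts.append(f"{mins}m")
    return " ".join(parts)


if __name__ == "__main__":
    for pct in (99.999, 99.99, 99.9, 99.5, 99.0):
        per_month = allowed_downtime_minutes(pct, 30)    # assumption: 30-day month
        per_year = allowed_downtime_minutes(pct, 365)    # assumption: 365-day year
        print(f"{pct:>7} %  {format_minutes(per_month):>8}  {format_minutes(per_year):>12}")
```

Because the formatter rounds to whole minutes, the sub-minute monthly budget at 99.999 % prints as 0m; the unrounded value is about 0.43 minutes.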

[Diagram labels: Well engineered software, Well engineered operations, Well engineered business, Well engineered infrastructure]

How many 9’s do you need?

Service levels are mechanisms SREs utilize to determine how important a service is to the customer. Each SRE-supported service has key measures, based on collaboration, that inform the team and the business of the level of resilience required. Key metrics utilized include:
• Service Level Indicator (SLI) – a quantitative measure of service reliability.
• Service Level Objective (SLO) – a goal, a reliability target for a given SLI.
• Service Level Agreement (SLA) – consequences for not meeting the SLO.
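
As a rough illustration of how the three measures relate, the sketch below computes an availability SLI from request counts and compares it against targets. All names and thresholds are hypothetical. Modeling the SLA as a separate, looser threshold is an assumption made for the example; the slide only defines the SLA as the consequences of missing the SLO.

```python
from dataclasses import dataclass


@dataclass
class ServiceLevel:
    """Hypothetical targets for one service (illustrative names and values)."""
    slo_target: float  # internal goal for the SLI, e.g. 0.999
    sla_target: float  # assumed contractual threshold, looser than the SLO


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests served successfully in the measurement window."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return good_requests / total_requests


def evaluate(sli: float, level: ServiceLevel) -> str:
    """Classify a measured SLI against the SLO and the assumed SLA threshold."""
    if sli < level.sla_target:
        return "SLA breached"
    if sli < level.slo_target:
        return "SLO missed"
    return "SLO met"


if __name__ == "__main__":
    level = ServiceLevel(slo_target=0.999, sla_target=0.99)
    for good in (9995, 9950, 9800):
        sli = availability_sli(good, 10_000)
        print(f"SLI={sli:.4f}: {evaluate(sli, level)}")
```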

Responding to change

Diversify w/ SRE

Error Budget

[Diagram: availability axis from 99% to 100%. With an SLA of, e.g., 99%, the error budget is the gap between the SLA and 100% availability.]

• SLA at risk: No more releases this cycle!
• SLA overachieved: Be more aggressive in rolling out changes. Explore new things.
  OR
  Very advanced approach: Force downtime to set realistic expectations on contracted availability (for instance, for internal services).

Error Budgets are utilized by SREs and Software Engineers to drive the velocity of changes released.
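
A minimal sketch of how an error budget could gate releases, assuming a request-based availability target. The function names and the `freeze_below` and `bold_above` thresholds are hypothetical and do not come from the slides.

```python
def error_budget_remaining(target: float, good: int, total: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed_failures = (1 - target) * total
    if allowed_failures == 0:
        return 0.0  # a 100% target (or no traffic) leaves no budget to spend
    actual_failures = total - good
    return 1 - actual_failures / allowed_failures


def release_decision(remaining: float,
                     freeze_below: float = 0.0,
                     bold_above: float = 0.5) -> str:
    """Map remaining budget to a release posture (thresholds are assumptions)."""
    if remaining <= freeze_below:
        return "freeze releases"            # SLA at risk
    if remaining >= bold_above:
        return "release more aggressively"  # SLA overachieved
    return "release normally"


if __name__ == "__main__":
    for good in (99_800, 99_300, 98_900):
        remaining = error_budget_remaining(0.99, good, 100_000)
        print(f"{remaining:+.2f} of budget left: {release_decision(remaining)}")
```

With a 99% target, 100,000 requests allow roughly 1,000 failures; 200 failures leave about 80% of the budget, while 1,100 failures overspend it and trigger the freeze.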

CSMO Resources (including SRE Information)

IBM Garage Architecture Center - https://www.ibm.com/cloud/architecture/architectures/serviceManagementArchitecture
• CSMO Field Guide https://www.ibm.com/cloud/architecture/content/field-guide/csmo-field-guide/
• CSMO Ref Arch https://www.ibm.com/cloud/architecture/architectures/serviceManagementArchitecture/referenceArchitecture
• CSMO Course https://www.ibm.com/cloud/architecture/content/course/explore-csmo

Getting Started

TriAgile 2021

Thank You! How can we help?
