#unidevops
Software Operability,
Run Book Collaboration,
and DevOps
Matthew Skelton
18th December 2013
DevOps Summit,
Bangalore, India
www.devops-summit.org
@matthewpskelton
softwareoperability.com
Agenda
• Software Operability
• Run Book Collaboration
• Making Operability Work
• Questions
Background
• Software systems since 1998
• Software build & deployment
specialist & DevOps enthusiast
• London Continuous Delivery
meetup group - londoncd.org.uk
• Experience DevOps workshops
Software
Operability
Software Operability
• Definitions
• Examples
• Why focus on operability?
• How DevOps can help
Operability?
Etymology of Operability?
• Cognates:
– Opera
– Operate
– Operational
– Inter-operability
Software Operability
• Operability: the properties of a
system which make it work well in
Production
Operable Systems
Since 1929,
Mallorca, Spain
Software Operability
• David Copeland (@davetron5000):
“How your software runs in
production is all that matters. The
most amazing abstractions, cleanest
code, or beautiful algorithms are
meaningless if your code doesn’t run
well on production.”
• http://www.naildrivin5.com/blog/2013/06/16/production-is-all-that-matters.html
Operational Criteria
• Deploy
• Monitor
• Diagnose
• Debug
• Query
• Control
• Inspect
• Clear
• ...
“Non-Functional”
Shaped by Operability
• Hooks (internal APIs) for:
– Logging
– Monitoring
– Diagnostics
– Health checks
– Data clear-down
– Service / daemon / container control
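As a rough illustration of one such hook, the sketch below lets components register named health checks and aggregates them into a single report that Ops tooling could poll. The `HealthRegistry` class, the check names, and the report shape are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class HealthRegistry:
    """Hypothetical internal API: components register checks, Ops tooling reads the report."""

    checks: Dict[str, Callable[[], bool]] = field(default_factory=dict)

    def register(self, name: str, check: Callable[[], bool]) -> None:
        """Add a named check; the callable returns True when the component is healthy."""
        self.checks[name] = check

    def report(self) -> dict:
        """Run every check and summarise the results in one dictionary."""
        results = {}
        for name, check in self.checks.items():
            try:
                results[name] = "ok" if check() else "failing"
            except Exception as exc:  # a crashing check is itself a finding
                results[name] = f"error: {exc}"
        healthy = all(status == "ok" for status in results.values())
        return {"healthy": healthy, "checks": results}


registry = HealthRegistry()
registry.register("database", lambda: True)     # stand-in for a real connectivity probe
registry.register("disk_space", lambda: False)  # stand-in for a free-space threshold check
print(registry.report())
```

Keeping the checks behind one small registry means the same report can feed monitoring, diagnostics, and deployment verification without each tool knowing the application's internals.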
Ops Folk are Users Too!
Why focus on Operability?
• Deploy more rapidly, frequently
• High cost of Production outage
• Systems now more complicated
Outages are Embarrassing!
Operational considerations
How DevOps can help
• DevOps is one way to address poor operability
• Improved collaboration and communication between Dev teams and Ops teams
• Example: Run Book Collaboration
Run Book
Collaboration
Run Book Collaboration
• Feedback loops and learning
• What is a run book?
• How can run book collaboration
help operability?
Feedback Loops
Gene Kim:
http://itrevolution.com/the-three-ways-principles-underpinning-devops/
Run Book
Templates
Example
• 1 Table of Contents
• 2 System Overview
– 2.1 Service Overview
– 2.2 Contributing Applications, Daemons, and Windows Services
– 2.3 Hours of Operation
– 2.4 Execution Design
– 2.5 Infrastructure and Network Design
– 2.6 Resilience, Fault Tolerance and High-Availability
– 2.7 Throttling and Partial Shutdown
– 2.8 Required Resources
– 2.9 Expected Traffic and Load
  • 2.9.1 Hot or Peak Periods
  • 2.9.2 Warm Periods
  • 2.9.3 Cool or Quiet Periods
– 2.10 Environmental Differences
– 2.11 Tools
• 3 Security and Access Control
• 4 System Configuration
– 4.1 Configuration Management
• 5 System Backup and Restore
– 5.1 Backup Requirements
  • 5.1.1 Special Files
– 5.2 Backup Procedures
– 5.3 Restore Procedures
• 6 Monitoring and Alerting
– 6.1 Error Messages
– 6.2 Events
– 6.3 Health Checks
– 6.4 Other Messages
• 7 Operational Tasks
– 7.1 Deployment
– 7.2 Batch Processing
– 7.3 Power Procedures
– 7.4 Routine Checks
  • 7.4.1 System Rebuilds
– 7.5 Troubleshooting
• 8 Maintenance Tasks
– 8.1 Maintenance Procedures
  • 8.1.1 Patching
    – 8.1.1.1 Normal Cycle
    – 8.1.1.2 Zero-Day Vulnerabilities
  • 8.1.2 GMT/BST time changes
  • 8.1.3 Cleardown Activities
    – 8.1.3.1 Log Rotation
– 8.2 Testing
  • 8.2.1 Technical Testing
  • 8.2.2 Post-Deployment
• 9 Failure and Recovery Procedures
– 9.1 Failover
– 9.2 Recovery
– 9.3 Troubleshooting Failover and Recovery
• 10 Contact Details
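One possible way to use a template like this during Dev and Ops collaboration is to check which top-level sections a draft run book has not yet addressed. The sketch below assumes the draft is plain text whose top-level headings look like "2 System Overview"; the `missing_sections` function and the sample draft are hypothetical.

```python
import re

# Top-level sections from the example template (the table of contents itself is omitted).
REQUIRED_SECTIONS = [
    "System Overview",
    "Security and Access Control",
    "System Configuration",
    "System Backup and Restore",
    "Monitoring and Alerting",
    "Operational Tasks",
    "Maintenance Tasks",
    "Failure and Recovery Procedures",
    "Contact Details",
]


def missing_sections(run_book_text: str) -> list:
    """Return the template sections that have no matching top-level heading in the draft."""
    # Assumed heading format: a line such as "2 System Overview" (number, space, title).
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^\s*\d+\s+(.+)$", run_book_text, re.MULTILINE)
    }
    return [section for section in REQUIRED_SECTIONS if section.lower() not in headings]


draft = """\
2 System Overview
Handles card payments for the web shop.
6 Monitoring and Alerting
Alerts go to the shared Dev and Ops channel.
"""
print(missing_sections(draft))
```

The list of gaps is a prompt for conversation between the teams rather than a compliance check; a section may be legitimately empty for a given system.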
Example
• 1 Table of Contents
• 2 System Overview – 2.1 Service Overview
– 2.2 Contributing Applications, Daemons, and Windows Services
– 2.3 Hours of Operation
– 2.4 Execution Design
– 2.5 Infrastructure and Network Design
– 2.6 Resilience, Fault Tolerance and High-Availability
– 2.7 Throttling and Partial Shutdown
– 2.8 Required Resources
– 2.9 Expected Traffic and Load
• 3 Security and Access Control
• 4 System Configuration
• 5 System Backup and Restore
• 6 Monitoring and Alerting
• 7 Operational Tasks
• 8 Maintenance Tasks
• 9 Failure and Recovery Procedures
• 10 Contact Details
Example
2.1 Service Overview
2.2 Contributing Applications, Daemons, and Windows Services
2.3 Hours of Operation
2.4 Execution Design
2.5 Infrastructure and Network Design
2.6 Resilience, Fault Tolerance and High-Availability
2.7 Throttling and Partial Shutdown
2.8 Required Resources
2.9 Expected Traffic and Load
It’s Not Documentation
Focus on Collaboration
Outcomes
• Better understanding
• Better cross-team working
• Reduction in operational problems
• Fewer outages
• Reduced long-term cost-of-ownership
Run Book as Collaboration
• Focus on the collaboration
• Run book is a means, not an end
• Throw it away when complete (?)
• Aim to automate more over time
• See http://runbookcollab.info/
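The idea of automating more over time can be sketched by treating run book steps as data, where each step is either still manual or has a script attached. The `RunBookStep` class, the helper functions, and the example steps below are hypothetical illustrations, not part of the talk.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RunBookStep:
    """One run book task; `automation` stays None until someone scripts it."""

    description: str
    automation: Optional[Callable[[], str]] = None


def execute(steps: List[RunBookStep]) -> List[str]:
    """Run the automated steps and list the manual ones for a person to carry out."""
    log = []
    for step in steps:
        if step.automation is not None:
            log.append(f"AUTO   {step.description}: {step.automation()}")
        else:
            log.append(f"MANUAL {step.description}")
    return log


def automation_ratio(steps: List[RunBookStep]) -> float:
    """Fraction of steps that are automated; a figure the team can track over time."""
    if not steps:
        return 0.0
    return sum(1 for step in steps if step.automation is not None) / len(steps)


steps = [
    RunBookStep("Rotate application logs", lambda: "rotated"),  # stand-in for a real script
    RunBookStep("Confirm failover with the Ops lead"),          # still a manual step
]
for line in execute(steps):
    print(line)
print(f"{automation_ratio(steps):.0%} automated")
```

Tracking the ratio gives Dev and Ops a shared, simple measure of how much of the run book has moved from prose into tooling.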
Making Operability
Work
Making Operability Work
• NFRs vs Operational Features
• Budget changes
• Organisation changes
• Responsibility changes
• Avoid on-call anti-patterns
“Non-Functional”
Operational Features
Features
Taking Operability Seriously
• Single product backlog
– End-user + Operational features
– New features + bugs
• Product Owner on call
– Accountable for operational failures
– Seriously!
Budget changes
• “What is your budget code?”
• Capex vs. Opex?
• Remove budget barriers to
regular, effective communication
Niek Bartholomeus (@niekbartho) - http://niek.bartholomeus.be/
https://speakerdeck.com/niekbartho/self-organization-vs-global-optimization-a-comparison-between-traditional-and-modern-organizations
Organisation changes
• “I’ll need to ask my manager first”
• Lack of autonomy
• Remove reporting barriers to regular, effective communication
• More at http://bit.ly/DevOpsTopologies
“I just want to write code”
Mysterious Coding Tricks
On-call for Responsibility
On-call Anti-Patterns
• Too much overtime pay
• Too little overtime pay
• Rota team too small
• No training in incident response
• No team ownership of product
• No team autonomy for changes
On call - Goal
• Team members want to help
make things better
• Empowered to fix problems
• Reduce the times they are woken
up
The operability of operability
• Operational Features, not “NFRs”
• Sustainable collaboration
• Sensible, fair on-call rotas
• Over-compensate in time off
• Avoid burn-out
Recapitulation
Software Operability
Making software
systems work well
in Production
Run Book Collaboration
Shared focus on operability throughout the delivery cycle
Making Operability Operable
Use DevOps team patterns for sustainable operability
What’s Next?
Further Reading
• Patterns for
Performance and
Operability
– Ford, Gileadi, Purba,
Moerman
• http://whoownsmyoperability.com/
– Recommended reading lists
Operability Book
• Software Operability – How to make software work well in Production
– Due early 2014
• Sign up at OperabilityBook.com
• Discount code for DevOps Summit attendees
Experience DevOps
• A hands-on workshop for DevOps
culture
• Forthcoming dates:
– Bangalore: 19th December 2013
– London: February 2014 (tbc)
• http://experiencedevops.org/
PIPELINE Conference
• Continuous Delivery
• Tuesday 8th April 2014
• London, UK
• http://pipelineconf.info/
• @PipelineConf
Questions &
Discussion
Matthew Skelton
@matthewpskelton
softwareoperability.com
operabilitybook.com
bit.ly/DevOpsTopologies
Acknowledgements
http://pianofortekeys.files.wordpress.com/2013/04/ariadnne_wideweb__470x3300.jpg
http://www.blinkenlights.nl/images/blinkenlights-big.jpeg
http://www.danatronics.com/s db_apps.html
http://riverbankoftruth.com/wp-content/uploads/2013/07/embarrassed-chimp22.jpg
http://www.thinkgeek.com/edm/20040709.html
http://indianaohindiana.com/wp-content/uploads/2013/10/Tome.jpg
http://www.guavaworks.com/company-blog/guava-doesnt-do-cookie-cutter.html
http://www.carpages.co.uk/ford/ford-sand-sculptures-05-09-11.asp
http://www.thisismoney.co.uk/money/experts/article-2324270/Take-smaller-pension-pots-tax-free-leave-final-salary-untouched.html
http://paranoidnews.org/wp-content/uploads/2010/10/Alien-Hunt-Alarm-Clock.jpg
http://particulations.blogspot.co.uk/2010/08/headingley-hole.html
http://marvel.wikia.com/Stephen_Strange_(Earth-616)
Further Slides
The Phoenix Project
Continuous Delivery