Date posted: 06-Aug-2015
Category: Software
Uploaded by: devan-stormont
“Big Picture” Goals
What should we be aiming for?
■ Don’t try to do everything perfectly
■ Do tighten every feedback loop to respond as quickly as possible to problems
Goals in Practice
How do we accomplish this?
■ One-click manual steps
■ Monitoring results at every phase
■ Automatic reporting or action
Development
■ One-click builds
■ Continuous builds
■ Test suites
■ Static code analysis
Deployment
■ One-click deployments
■ Minimizing down time
■ Monitoring rollout health
■ Incremental node rollout
Production
■ Instance health
■ Process/service health
■ Third-party service health
■ User activity
Analysis
■ Automatic data collection
■ Scheduled analysis
■ Report notifications
■ Automatic rollbacks
Development
The primary goals of development automation are:
■ Notify developers of errors
■ Prevent bad code from being released
Development
(There shouldn’t be anything new here for most developers)
■ One-click (or one-command) builds
Ideally, this is exactly the same regardless of developer OS
■ Sanity checks proactively fail builds
Unit testing
Property testing
Static code analysis
■ Continuous build systems
More thorough functional/integration tests
Every customer-reported issue should have an automated regression test!
Deeper code analysis
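As an illustration of a sanity check that can proactively fail a build, here is a minimal property-test sketch using only the Python standard library. The function under test (dedupe) and the checker name are hypothetical, chosen only for this example; a real project would more likely use a dedicated property-testing library.

```python
import random


def dedupe(items):
    """Hypothetical function under test: drop repeats, keep first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def check_dedupe_properties(runs=200, seed=0):
    """Check properties on random inputs; an AssertionError fails the build."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(runs):
        items = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
        result = dedupe(items)
        assert len(result) == len(set(result)), "output contains duplicates"
        assert set(result) == set(items), "output lost or invented elements"
        assert dedupe(result) == result, "dedupe is not idempotent"
    return runs
```

Unlike a unit test, which pins one input to one output, a property test states what must hold for every input and lets random data look for counterexamples.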
Development - Feedback Loop
(There shouldn’t be anything new here for most developers)
■ Developer systems: Failed builds should prevent code check-ins
■ Continuous build failures
Send out notifications
Automatically roll back check-ins to release branches
Alternatively, success automatically integrates into release branches
Push system - the system is the gatekeeper
■ Continuous builds also generate reports about lower-threshold warnings
Static code analysis, test code coverage
Pull system - up to developers to be pro-active
Minimize these as much as possible!
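The push/pull distinction above can be sketched roughly as follows. Everything here (BuildResult, gate, warning_report, and the action names) is hypothetical and is not tied to any particular CI product: hard failures are pushed to developers and block the release branch, while warnings only land in a report that developers have to go and read.

```python
from dataclasses import dataclass, field


@dataclass
class BuildResult:
    """Hypothetical summary of one continuous build."""
    branch: str
    tests_failed: int
    warnings: list = field(default_factory=list)


def gate(result, notify):
    """Push system: the gatekeeper decides; developers are told on failure."""
    if result.tests_failed > 0:
        notify(f"Build failed on {result.branch}: "
               f"{result.tests_failed} failing test(s)")
        return "roll_back"  # undo the check-in on the release branch
    return "promote"        # success integrates into the release branch


def warning_report(result):
    """Pull system: lower-threshold warnings are only reported, never block."""
    lines = [f"{len(result.warnings)} warning(s) on {result.branch}"]
    lines.extend(f"  - {warning}" for warning in result.warnings)
    return "\n".join(lines)
```

Because nobody is forced to read the warning report, anything that really matters should arguably be moved into the gate instead.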
Deployment
The primary goals of deployment automation are:
■ Automatically push out changes
■ Actively monitor rollout for problems
■ Automatically roll back to known good states
Deployment
■ One-click (one-command) automatic rollouts
Should be staged across instances/regions
Should minimize down time - hot swap!
■ Monitor rollout health
Node availability
Process/service availability
Data migration health
■ Failure thresholds
Developer notifications
Rollouts automatically unwind to known-good states
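A staged rollout with a failure threshold might look roughly like the sketch below. The deploy, is_healthy, and rollback callables are placeholders for whatever the real deployment tooling provides, and max_failures is an assumed per-stage threshold.

```python
def staged_rollout(stages, deploy, is_healthy, rollback, max_failures=0):
    """Deploy stage by stage; unwind every deployed node if a stage fails.

    stages is a list of node lists, for example one list per region.
    Returns True if the whole rollout succeeded, False if it was unwound.
    """
    deployed = []
    for stage in stages:
        failures = 0
        for node in stage:
            deploy(node)
            deployed.append(node)
            if not is_healthy(node):
                failures += 1
        if failures > max_failures:
            # Unwind in reverse order back to the known-good state.
            for node in reversed(deployed):
                rollback(node)
            return False
    return True
```

Checking health after each stage, rather than only at the end, means a bad release stops before it reaches the later stages.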
Deployment
With enough seamless monitoring in place:
■ Deployments should be invisible to users
■ A “good” code check-in can automatically drive a new deployment
Production
The primary goals of end-user production automation are:
■ Monitor for problems
■ Proactively address problems
■ If necessary, roll back to known good states
Production
There are two distinct elements to monitoring in production
■ Detecting system problems
■ Monitoring users
Production
■ System monitoring
Notifications if systems/instances go down or are overloaded
Automatically scale up resources as needed
■ Service watchdogs
Automatic service restarts
Capture and storage of logs
Pushed by client, service, or cron job/scheduled task
■ Third-party APIs
Periodically check health/accessibility
Notifications upon failure
A problem with a necessary third-party service is a problem for your service
Your users will blame you for every issue
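A service watchdog can be sketched in a few lines. The is_alive, restart, and notify callables are assumptions standing in for real process checks, restart commands, and alerting; the same shape would work for a periodic third-party API check, minus the restart step.

```python
def watch_service(name, is_alive, restart, notify, max_restarts=3):
    """Restart a dead service a few times, then escalate to a human.

    Returns the number of restart attempts that were made.
    """
    attempts = 0
    while not is_alive() and attempts < max_restarts:
        restart()
        attempts += 1
    if not is_alive():
        notify(f"{name} is still down after {attempts} restart(s)")
    return attempts
```

In practice this would be called on a schedule (a cron job or scheduled task), with a pause between restart attempts.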
Production
■ User monitoring
How many users are active?
What services are those users using?
What services are users hitting errors with?
■ Extended user monitoring
Social media
App store reviews
Automatic notifications!
■ Users like interaction - people like to be noticed
Immediate, graceful interaction is likely to earn positive public feedback
Even from users who were complaining about a problem!
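The three user-monitoring questions above can be answered from a stream of usage events. The sketch below assumes a hypothetical event shape of (user_id, service, ok) tuples; real telemetry would carry timestamps and much more detail.

```python
from collections import Counter


def summarize_activity(events):
    """Summarize (user_id, service, ok) events in a single pass."""
    active_users = set()
    usage = Counter()   # which services are users using?
    errors = Counter()  # which services are users hitting errors with?
    for user_id, service, ok in events:
        active_users.add(user_id)
        usage[service] += 1
        if not ok:
            errors[service] += 1
    return {
        "active_users": len(active_users),
        "usage": usage,
        "errors": errors,
    }
```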
Production
Resolve problems
■ Automatically spin up/down resources to adapt to user load
■ Proactive notifications about errors
■ For critical issues, allow the production environment to automatically roll back to the last known-good state
■ Users who feel like you helped them personally are likely to become your evangelists
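Adapting resources to user load usually comes down to a sizing rule. The sketch below uses simple proportional scaling; the function name, the 60% target, and the instance limits are all assumptions made for illustration. Loads are integer percentages to keep the arithmetic exact.

```python
def desired_instances(current, load_percent, target_percent=60,
                      min_instances=1, max_instances=20):
    """Size the pool so that average load lands near target_percent.

    current is the number of running instances and load_percent is their
    average load (0-100). The result is clamped to the allowed range.
    """
    if current < 1 or target_percent < 1:
        raise ValueError("current and target_percent must be at least 1")
    total_load = current * load_percent
    # Integer ceiling division: round up so the target is not exceeded.
    needed = (total_load + target_percent - 1) // target_percent
    return max(min_instances, min(max_instances, needed))
```

For example, four instances averaging 90% load would be resized to six, while four instances averaging 15% load would shrink to one.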
Analysis
The primary goals of analysis automation are:
■ Proactive, early warning of known problems
Notifications of significant issues
Automatically resolving where possible
Unwinding bad deployments upon certain thresholds
■ Ability to more easily detect unknown problems
Requires prior collection of good enough information to resolve
Usually feeds into the next development iteration
■ Really touches all of the previous pieces, as already shown
Listed here because analysis should be treated as a first-class citizen
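Threshold-driven analysis can be as simple as the sketch below, which turns collected error counts into one of three actions. The function name and the 1% and 5% thresholds are assumptions for illustration only; real thresholds should come from measurements of a known-good baseline.

```python
def classify_error_rate(errors, requests,
                        notify_at=0.01, roll_back_at=0.05):
    """Map an observed error rate to an action: ok, notify, or roll_back."""
    if requests <= 0:
        return "ok"  # no traffic yet, nothing to judge
    rate = errors / requests
    if rate >= roll_back_at:
        return "roll_back"  # unwind the bad deployment
    if rate >= notify_at:
        return "notify"     # significant issue, tell a human
    return "ok"
```

Run on a schedule against freshly collected data, a check like this gives early warning of known problems without anyone watching a dashboard.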
Analysis
■ If you’re not driving development (or even features) through the use of measurement, all you’re really doing is educated guessing
Not this problem
Fix this problem first
Recap - Problem Resolution
The main points applicable to every stage
■ Automatic notifications of failures
■ Rollback to known-good state
■ Automatic resource scaling (up/down)
What this buys us
■ Immediate visibility into every link in the chain
■ Rapid, iterative releases for problem resolution
■ Rapid learning about your users