Date post: | 07-Jan-2017 |
Category: | Presentations & Public Speaking |
Upload: | lee-atchison |
5 Keys to Building High Availability Web Applications
for Service and Microservice Based Systems
Lee Atchison, Principal Cloud Architect and Advocate
Confidential ©2008–16 New Relic, Inc. All rights reserved.
You had power most of the time.
Why are you complaining?
How do you keep an
application operational?
5 Keys to High Availability Web Applications
Key 1: Build applications keeping availability in mind
OR
Develop for failure
As a Service Developer…
Your response to a dependency failure must be:
§ Predictable
§ Understandable
§ Reasonable for the given dependency failure
How should I respond when a dependency fails?
Provide a graceful backoff
Don’t know something? Don’t show it!
§ Don’t show a drop-down list of accounts if you can’t contact the account service
§ Don’t show an image (or show a placeholder) if you can’t determine which image to show
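A minimal sketch of "Don't know something? Don't show it!", assuming a Python web tier. The names `AccountServiceUnavailable`, `render_account_picker`, and `render_page` are hypothetical and only illustrate the idea: when the account service cannot be reached, the drop-down is left out and the rest of the page still renders.

```python
from html import escape
from typing import Callable, List, Optional


class AccountServiceUnavailable(Exception):
    """Hypothetical error raised when the account service cannot be reached."""


def render_account_picker(fetch_accounts: Callable[[], List[str]]) -> Optional[str]:
    """Return HTML for the account drop-down, or None if the accounts are unknown."""
    try:
        accounts = fetch_accounts()
    except AccountServiceUnavailable:
        # We don't know the accounts, so we don't show the list.
        return None
    options = "".join(f"<option>{escape(name)}</option>" for name in accounts)
    return f"<select>{options}</select>"


def render_page(fetch_accounts: Callable[[], List[str]]) -> str:
    """Render the page, leaving out the picker when the dependency has failed."""
    parts = ["<h1>Dashboard</h1>"]
    picker = render_account_picker(fetch_accounts)
    if picker is not None:
        parts.append(picker)
    return "\n".join(parts)
```

The page degrades instead of failing: a missing drop-down is a smaller problem than a missing page.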
Example (Real Life)
Our web application showing a page…
An avatar was representing the customer on each page
A 3rd party system generated the avatar
One day, that 3rd party system failed
The app didn’t know what to do, so it failed, too
Our application was completely down, all because of a minor icon missing...
Why did this cause your application to fail?
It didn’t know how to respond. It could have:
§ Recognized the failure of the 3rd party provider as soon as possible
§ Substituted a generic image (or removed it) when the service failure was detected
§ Used the Circuit Breaker pattern, which would help a lot here
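One way the Circuit Breaker pattern could look for the avatar case, as a rough Python sketch. The class, its parameters (`max_failures`, `reset_after`), and the placeholder path are assumptions made for illustration, not code from the talk. After a few consecutive failures the breaker "opens" and returns the fallback immediately, then allows a trial call once the reset window has passed.

```python
import time
from typing import Any, Callable, Optional

PLACEHOLDER_AVATAR = "/static/generic-avatar.png"  # hypothetical placeholder path


class CircuitBreaker:
    """Skip a failing dependency for a while instead of failing along with it."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], Any], fallback: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: do not touch the dependency, answer with the fallback.
                return fallback
            # Reset window has passed: allow one trial call (half-open).
            self.opened_at = None
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback
        self.failures = 0
        return result
```

A caller would wrap the avatar lookup, for example `breaker.call(fetch_avatar, PLACEHOLDER_AVATAR)` with a hypothetical `fetch_avatar` function, so that a third-party outage costs an icon rather than the whole page.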
How should I respond when a dependency fails?
Provide a graceful backoff
Fail as early as possible:
§ Don’t propagate bad data… once you determine a piece of data is invalid, discard it as soon as possible
§ Validate input given… reject bad input immediately
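A small sketch of rejecting bad input immediately, before any real work starts. The request shape (`account_id`, `page_size`) and the limits are hypothetical; the point is only that validation happens at the edge of the service and invalid data goes no further.

```python
from dataclasses import dataclass


class InvalidRequest(ValueError):
    """Raised before any work is done when a request is obviously bad."""


@dataclass(frozen=True)
class AccountLookup:
    account_id: int
    page_size: int


def _is_int(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def parse_account_lookup(raw: dict) -> AccountLookup:
    """Validate a raw request (hypothetical shape) and fail early if it is bad."""
    account_id = raw.get("account_id")
    page_size = raw.get("page_size", 20)
    if not _is_int(account_id) or account_id <= 0:
        raise InvalidRequest("account_id must be a positive integer")
    if not _is_int(page_size) or not 1 <= page_size <= 100:
        raise InvalidRequest("page_size must be between 1 and 100")
    return AccountLookup(account_id, page_size)
```

Everything past `parse_account_lookup` can then assume well-formed input, and a bad request costs one cheap check rather than a slow, doomed attempt to process it.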
Example (Real Life)
Account service was having performance problems…
Customers felt a performance problem
Someone was sending bad requests
Service tried to process the request… (and eventually failed)
System had “browned out”
So, what brought our
application to its knees?
§ Input to the service was obviously bad
§ Yet, we attempted to use the input
§ Result was a failed service
Key 2: Always think about scaling
OR
Just because your application works now does not mean it will work tomorrow…
Just because your application works now does not mean it will work tomorrow… Why?
§ Most web applications have increasing traffic patterns
§ Traffic will increase, double, triple, 10x… sooner than you think
§ Don’t build it for today’s traffic; build it for tomorrow’s traffic
Build for tomorrow might mean:
§ Build in the ability to increase the size and capacity of your databases.
§ Determine what logical limits exist to your data scaling. What happens when your database tops out in its capabilities?
§ Build your application so that you can add additional application servers easily. This often involves being observant about where and how state is maintained, and how traffic is routed.*
§ Think about caching. What information can be cached? What can't? Why can't it?
§ Redirect static traffic to offline providers.
§ Think about whether specific pieces of dynamic content can actually be generated statically.
* This topic is large enough for an entire chapter, even an entire book, on its own.
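To make the caching question concrete, here is a small time-to-live cache sketch in Python. The class name `TTLCache`, the `loader` callback, and the 60-second lifetime in the usage note are all assumptions for illustration; the sketch only covers data that can tolerate being slightly stale.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache loaded values for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            # Still fresh: answer without touching the backing service.
            return entry[1]
        value = loader(key)
        self._entries[key] = (now, value)
        return value
```

With `TTLCache(60)`, repeated requests for the same key within a minute reach the backing store once. Whether a minute of staleness is acceptable is exactly the "What can't be cached? Why can't it?" question, and it has to be answered per piece of data.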
Example: Is It Static or Dynamic?
Non-static content
Banner is now static
Personalized content can be added in browser
Key 3: Mitigate risk
All Systems Have Risk in Them
Risk is a measure of the likelihood of a surprise occurring
There is risk that a …
§ Server will crash
§ Database will get corrupted
§ Returned answer will be incorrect
§ Network connection will fail
§ Newly deployed piece of software will fail
Risk
§ Keeping a system available requires removing risk… hence, removing surprise
§ But as systems become more and more complicated… this becomes less and less possible
Risk
Risk Management is at the heart of building highly available systems
§ Managing what your risk is
§ Managing how much risk is acceptable
§ Knowing what you can do to mitigate the risk
Risk Mitigation
Risk mitigation is part of risk management
Risk mitigation:
§ Knowing what to do when a problem occurs in order to reduce the impact of the problem
§ Making sure your application works as well and as completely as possible, even when services and resources fail
Risk Mitigation
Risk mitigation requires thinking about the things that can go wrong
… and putting a plan together, now…
to be able to handle the situation when it does happen.
Key 4: Monitor availability
OR
Yes, we can help you
Monitor Availability
§ Understand how your application is performing
§ Use application monitoring:
  § Keep an eye on how your app is performing
  § Generate notifications when the application performs in abnormal ways
§ Make sure your app is properly instrumented
  § Internal as well as external to your app
Monitor Availability
§ Have your tools monitor continuously
§ Establish a baseline for how your application is performing
§ Look for trends and patterns
§ Look for outliers and deviations from the trends
  § Treat these as potential availability issues
§ As your system grows:
  § Examine how your baseline changes
  § Make sure your scalability plan will continue to work
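A rough illustration of "establish a baseline, then look for deviations", using only the Python standard library. The rolling window of 20 samples and the three-standard-deviation threshold are arbitrary assumptions, and `find_outliers` is a hypothetical helper rather than any monitoring product's API.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_outliers(samples: Sequence[float], window: int = 20,
                  threshold: float = 3.0) -> List[int]:
    """Return indexes of samples that deviate from the rolling baseline."""
    if window < 2:
        raise ValueError("window must contain at least two samples")
    outliers = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        centre = mean(baseline)
        spread = stdev(baseline)
        if spread == 0:
            # A perfectly flat baseline: any change counts as a deviation.
            if samples[i] != centre:
                outliers.append(i)
        elif abs(samples[i] - centre) > threshold * spread:
            outliers.append(i)
    return outliers
```

Fed with response times sampled at a steady interval, the flagged indexes are candidates to treat as potential availability issues. Because the baseline is recomputed as the window slides, it also shifts as the system grows.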
Service Level Agreements
Establish Internal SLAs
Quick diagnoses
“Hot spots” to optimize performance
Critical to building scalable applications
The only way to scale an organization reliably is with reliable SLAs
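As one possible reading of an internal SLA, the sketch below checks what fraction of requests finished within a latency target. The slides do not define the form of an SLA, so the latency-based definition, the 200 ms target, and the 99% requirement are all assumptions chosen purely for illustration.

```python
from typing import Sequence


def sla_compliance(latencies_ms: Sequence[float], target_ms: float) -> float:
    """Fraction of requests that completed within the latency target."""
    if not latencies_ms:
        raise ValueError("no samples to evaluate")
    within = sum(1 for value in latencies_ms if value <= target_ms)
    return within / len(latencies_ms)


def meets_sla(latencies_ms: Sequence[float], target_ms: float = 200.0,
              required_fraction: float = 0.99) -> bool:
    """True if enough requests met the (assumed) internal latency target."""
    return sla_compliance(latencies_ms, target_ms) >= required_fraction
```

A check like this, run per service, gives consuming teams a concrete number to depend on and points at the "hot spots" where a service is drifting away from what it promised.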
Key 5: Availability response
OR
Yes, that was your pager that went off
Responsiveness
When a problem occurs…
§ Do you know what to do to fix the problem?
§ Does everyone on your team know what to do?
§ Do you have playbooks?
§ Does your pager rotation and notification system work?
Responsiveness
You must be prepared to act on issues.
This means:
§ Alerts that reach the needed individuals
§ Prepared processes and procedures for common failure modes (this is part of the risk mitigation process)
Responsiveness
When an alert is triggered…
§ Owners of that service must be the first ones alerted
§ Other teams may want to be alerted as well…
  § Services that are tightly dependent on the triggered service
  § Early warning notification for upstream or downstream issues
  § May want a “second level” notification for dependencies
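The routing rule above (owner first, dependent teams at a second level) could be sketched like this. The service names, team names, and the `OWNERS` / `DEPENDENTS` maps are made up for the example; a real system would load them from its own service registry.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical service map: who owns each service, and which services depend on it.
OWNERS: Dict[str, str] = {
    "accounts": "team-accounts",
    "billing": "team-billing",
    "web": "team-web",
}
DEPENDENTS: Dict[str, Set[str]] = {
    "accounts": {"billing", "web"},
    "billing": {"web"},
}


def route_alert(service: str,
                owners: Dict[str, str] = OWNERS,
                dependents: Dict[str, Set[str]] = DEPENDENTS) -> List[Tuple[str, str]]:
    """Return (team, level) pairs: the owning team first, then dependent teams."""
    owner = owners[service]
    notifications = [(owner, "page")]
    seen = {owner}
    for dependent in sorted(dependents.get(service, ())):
        team = owners[dependent]
        if team not in seen:
            seen.add(team)
            notifications.append((team, "second-level"))
    return notifications
```

For example, `route_alert("accounts")` pages `team-accounts` and sends a second-level notification to the billing and web teams, giving them early warning of an upstream issue.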
Responsiveness
BEFORE the problem occurs:
§ Well established plans
§ Documented processes and cheat sheets
§ Contact lists for critical consuming service owners
§ Clear, precise escalation plan:
  § Who to contact if problem becomes too big for responder to handle
  § If scope of problem extends significantly and critically beyond failing system
§ Know who to escalate to if first responder doesn’t respond
5 Keys to High Availability Web Apps
Key 1: Build applications keeping availability in mind
Key 2: Always think about scaling
Key 3: Mitigate risk
Key 4: Monitor availability
Key 5: Availability response
Thank you for your time!
Questions?
Lee [email protected] @leeatchison leeatchison
Architecting for Scale
Published by: O’Reilly Media
Available: May 2016
www.architectingforscale.com