Date posted: 15-Apr-2017
Category: Technology
Uploaded by: vikash-kodati
FRONTLINE SYSTEMS
Circuit Breaker Pattern
Vikash Kodati
13th July 2016
T-Mobile Confidential
AGENDA
4/6/2016
• Problem Statement
• Circuit Breaker Definition
• Solution Landscape
• Live Demo
• Q&A
CHARACTERISTICS OF MICROSERVICES
6/13/2016
• Componentization via services
• Organized around business capabilities
• Products not projects
• Smart endpoints and dumb pipes
• Decentralized Data Management
• Infrastructure Automation
• Design for failure
DESIGN FOR FAILURE
Typical first year for a new cluster:
~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
~1 network rewiring (rolling ~5% of machines down over 2-day span)
~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
~5 racks go wonky (40-80 machines see 50% packet loss)
~8 network maintenances (4 might cause ~30-minute random connectivity losses)
~12 router reloads (takes out DNS and external vips for a couple minutes)
~3 router failures (have to immediately pull traffic for an hour)
~dozens of minor 30-second blips for dns
~1000 individual machine failures
~thousands of hard drive failures
slow disks, bad memory, misconfigured machines, flaky machines, etc.
Note: Data taken from Jeff Dean’s slides
PROBLEM STATEMENT
Given the types of failures that can occur, we need a fault-tolerant system that:
• Continues to operate when a subset of its components fails
• Is highly available (HA)
• Handles failure gracefully
SOLUTION LANDSCAPE
Development Phase
• Avoiding cascading failures
  • Circuit breaker
  • Timeouts
  • Retry
  • Bulkhead
  • Cache optimizations
• Avoid malicious clients
  • Rate limiting
Pre-Deploy Phase
• Load test
• A/B test
• Longevity
Post-Deploy Phase
• Health check
• Metrics
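Of the development-phase techniques listed above, retry is the simplest to sketch. The Python example below is a minimal sketch and is not taken from the slides: the function name `call_with_retry`, its parameters, and the backoff values are assumptions made for illustration. It retries a failing call a bounded number of times with exponentially growing delays, then re-raises so the caller can handle the failure.

```python
import time


def call_with_retry(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on exception, retry with exponential backoff.

    All names and defaults here are hypothetical. `sleep` is injectable so
    the delays can be observed or skipped in tests.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller handle the failure
            # Delay doubles each time: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Bounding the number of attempts matters: unbounded retries can add load to a service that is already struggling, which is one reason retries are usually combined with timeouts and a circuit breaker.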
CIRCUIT BREAKER PATTERN
• If a power surge occurs in the electrical wiring, the breaker trips (“On” to “Off”)
• Netflix Hystrix follows the circuit breaker pattern
• If a service’s error rate exceeds a threshold, the circuit breaker trips and blocks requests to that service for a specific period of time
• The threshold is configurable, for example:
  • Endpoint takes > 1 sec to respond
  • Endpoint returns a 500 error
  • Endpoint returns a 500 error 6 times in a row
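The threshold examples above can be sketched as a small policy object. This is an illustrative Python sketch under stated assumptions, not Hystrix code: the class name `FailureThreshold` and its fields are hypothetical, and treating every 5xx status (rather than only 500) as a failure is an assumption. Hystrix itself evaluates an error percentage over a rolling window rather than a simple consecutive count.

```python
from dataclasses import dataclass


@dataclass
class FailureThreshold:
    """Decides when a breaker should trip. All names are hypothetical."""
    max_latency_seconds: float = 1.0   # slower responses count as failures
    max_consecutive_failures: int = 6  # trip after this many failures in a row
    consecutive_failures: int = 0

    def is_failure(self, status_code: int, latency_seconds: float) -> bool:
        # Assumption: any 5xx status or a slow response counts as a failure.
        return status_code >= 500 or latency_seconds > self.max_latency_seconds

    def record(self, status_code: int, latency_seconds: float) -> bool:
        """Record one response; return True if the breaker should trip."""
        if self.is_failure(status_code, latency_seconds):
            self.consecutive_failures += 1
        else:
            self.consecutive_failures = 0  # any success resets the streak
        return self.consecutive_failures >= self.max_consecutive_failures
```

Keeping the failure definition separate from the breaker's state handling lets each endpoint use its own latency limit and failure count.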
CIRCUIT BREAKER ILLUSTRATION
CIRCUIT BREAKER STATE TRANSITIONS
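States: Closed, Open, Half-Open
• Closed: calls pass through (Success)
• Closed → Open: Trip Breaker
• Open: calls failing fast
• Open → Half-Open: Attempt Reset
• Half-Open → Open: Trip Breaker
• Half-Open → Closed: Reset Breaker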
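The three states and their transitions can be sketched as a small state machine. The Python example below is a minimal, single-threaded sketch and does not reflect how Hystrix is implemented: the class names, the consecutive-failure trip rule, and the default values are all assumptions made for illustration. The clock is injectable so the reset timeout can be exercised without waiting.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker with Closed, Open and Half-Open states."""
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = self.CLOSED
        self.failures = 0
        self.opened_at = None

    def call(self, operation):
        if self.state == self.OPEN:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open: failing fast")
            self.state = self.HALF_OPEN  # attempt reset: allow one trial call
        try:
            result = operation()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Closed stays closed; a successful trial call resets the breaker.
        self.state = self.CLOSED
        self.failures = 0

    def _on_failure(self):
        self.failures += 1
        if self.state == self.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = self.OPEN  # trip breaker
            self.opened_at = self.clock()
```

While the breaker is open, callers get an immediate `CircuitOpenError` instead of waiting on a failing dependency. A production version would also need locking for concurrent callers.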
DEMO TOPOLOGY
Components shown in the diagram:
• Web browser
• Zuul (Proxy)
• Eureka Server
• Reading Service
• BookStore
ROLES
The pattern includes:
• Service Discovery (Eureka)
• Circuit Breaker (Hystrix)
• Intelligent Routing & Reverse Proxy (Zuul)
• Microservices (Spring Cloud)
HYSTRIX DASHBOARD
HYSTRIX DASHBOARD DRILL DOWN
SUMMARY
• Like a physical circuit breaker, the circuit breaker pattern allows a subsystem to fail gracefully without a complete system failure
• Failure is inevitable; be prepared for it
• Primarily used in aggregation scenarios
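To illustrate the aggregation scenario, here is a minimal Python sketch of an aggregator that substitutes a fallback value when one downstream call fails. It is not from the slides or the demo: the function `aggregate` and the section names in the usage example are hypothetical, and the callables stand in for remote service calls.

```python
def aggregate(sources, fallbacks):
    """Build a combined response from several downstream calls.

    sources:   dict of section name -> zero-argument callable (hypothetical
               stand-ins for remote service calls)
    fallbacks: dict of section name -> value to use if that call fails
    """
    response = {}
    degraded = []
    for name, fetch in sources.items():
        try:
            response[name] = fetch()
        except Exception:
            # One failing subsystem degrades its own section only.
            response[name] = fallbacks.get(name)
            degraded.append(name)
    return response, degraded


def failing_recommendations():
    raise TimeoutError("recommendation service is slow")


page, degraded = aggregate(
    {"profile": lambda: {"name": "A"}, "recommendations": failing_recommendations},
    {"recommendations": []},
)
```

Here `page` still contains the profile section, with an empty recommendations list as the fallback. In practice each `fetch` would typically be wrapped in a circuit breaker so that repeated failures are rejected quickly.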
THANK YOU & Q&A
Vikash Kodati
• Email: [email protected]
• Yammer: https://www.yammer.com/t-mobile.com/users/vikashkodati
• Github: https://github.com/vikashkodati
• LinkedIn: /in/vikashkodati
• Twitter: @vikashkodati
• Blog: https://tmobileusa.sharepoint.com/portals/hub/personal/vikashkodati