Date posted: 10-Feb-2017 | Category: Engineering | Uploaded by: michael-kehoe
LinkedIn Traffic Shifting
Michael Kehoe, Senior Site Reliability Engineer
SouthBay SRE Meetup
$ whoami
Michael Kehoe
• Sr. Site Reliability Engineer (SRE)
• Member of PROD-SRE
• https://www.linkedin.com/in/michaelkkehoe
Why do we do traffic shifts?
• To mitigate user impact from problems with a third-party provider or with LinkedIn's infrastructure/services
• To validate Disaster Recovery (DR) in case of a datacenter failure
• To validate and test capacity headroom across our datacenters
• To expose bugs and suboptimal configurations by load testing one or more datacenters
• To perform planned maintenance
• To validate and exercise the traffic shift automation
Edge Traffic Shifts: How does it work?
• We use IPVS to load balance at our edges
• We can withdraw a PoP's anycast routes to remove traffic from that PoP
• DNS providers poll health checks on our edge proxies to determine whether a PoP is in rotation
• We can deliberately fail those health checks to remove unicast traffic from that PoP
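The health-check side of an edge shift can be sketched as follows. This is a minimal illustration under assumptions not stated in the talk: the drain marker file path, the 503 status code, and the names `health_check` and `HealthHandler` are all hypothetical. A DNS provider polling the endpoint sees a failing check while the marker exists and stops directing unicast traffic to the PoP. Anycast route withdrawal happens in the routing layer and is not shown.

```python
import os
from http.server import BaseHTTPRequestHandler

# Hypothetical marker file: while it exists, the edge proxy reports itself
# unhealthy so DNS providers take this PoP out of rotation for unicast traffic.
DRAIN_FLAG_PATH = "/var/run/edge-proxy/drain"


def health_check(drain_flag_path: str = DRAIN_FLAG_PATH) -> tuple[int, str]:
    """Return (HTTP status, body) for the edge proxy's health endpoint."""
    if os.path.exists(drain_flag_path):
        # Deliberately fail the check so the PoP is treated as out of rotation.
        return 503, "DRAINED"
    return 200, "OK"


class HealthHandler(BaseHTTPRequestHandler):
    """Serves health_check() over HTTP for the DNS provider's monitors."""

    def do_GET(self) -> None:
        status, body = health_check()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())
```

To serve it, one could run `HTTPServer(("", 8080), HealthHandler).serve_forever()` from `http.server` (port 8080 is arbitrary). Creating the marker file drains the PoP and deleting it returns the PoP to rotation. Keeping the drain decision in a file rather than in process state lets an operator or automation flip it without restarting the proxy.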
Datacenter Traffic Shifts: How does it work?
• Different traffic types are partitioned and controlled separately: logged-in vs. logged-out, CDN, monitoring, and microsites
• Logged-in users are placed into "buckets", each with primary and secondary datacenter assignments
• Buckets are marked online or offline to move site traffic between datacenters
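A minimal sketch of bucket-based routing for logged-in traffic, assuming a stable hash of the member ID selects the bucket and that an offline bucket is served from its secondary datacenter. The bucket count, the hashing scheme, the datacenter names, and every function name here are hypothetical; the talk does not describe LinkedIn's actual implementation.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical bucket count; the talk does not say how many buckets exist.
NUM_BUCKETS = 100


@dataclass
class Bucket:
    primary: str         # datacenter that normally serves this bucket
    secondary: str       # datacenter that takes over when the bucket is offline
    online: bool = True  # offline => route to the secondary


def bucket_for(member_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a member to a bucket with a stable hash so the mapping never changes."""
    digest = hashlib.sha256(str(member_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def route(member_id: int, buckets: list[Bucket]) -> str:
    """Pick the datacenter that should serve a logged-in member's request."""
    bucket = buckets[bucket_for(member_id, len(buckets))]
    return bucket.primary if bucket.online else bucket.secondary


def mark_datacenter_offline(buckets: list[Bucket], dc: str) -> None:
    """Take every bucket whose primary is `dc` offline, shifting it to its secondary."""
    for bucket in buckets:
        if bucket.primary == dc:
            bucket.online = False


def make_buckets(dcs: list[str], num_buckets: int = NUM_BUCKETS) -> list[Bucket]:
    """Round-robin primaries; each bucket's secondary is the next datacenter in the list."""
    return [Bucket(dcs[i % len(dcs)], dcs[(i + 1) % len(dcs)]) for i in range(num_buckets)]
```

For example, `buckets = make_buckets(["dc-a", "dc-b", "dc-c"])` followed by `mark_datacenter_offline(buckets, "dc-a")` moves every dc-a bucket to dc-b, after which `route()` never returns "dc-a". Because a member's bucket is fixed by the hash, a shift moves whole groups of users together, and marking the buckets online again returns them to their primary. The sketch does not handle a secondary that is itself offline.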
Single Master Failover: How does it work?
• Only used in extreme cases
• Leverages distributed locking in Apache ZooKeeper
• Single-master services have a Spring component that checks whether the service holds mastership in a particular datacenter
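The mastership check can be sketched as below, assuming a lock service that exposes holder, try-acquire, and release operations and drops a lock when its owner's session ends (as ZooKeeper does with ephemeral nodes). The talk describes a Java Spring component; this Python version, the `/mastership/<service>` lock path, the in-memory store, and all class and method names are hypothetical stand-ins. A production version would use a real ZooKeeper client instead of `InMemoryLockStore`.

```python
from typing import Callable, Optional, Protocol, TypeVar

T = TypeVar("T")


class LockStore(Protocol):
    """The few operations this sketch needs from a distributed lock service."""

    def holder(self, lock_path: str) -> Optional[str]: ...
    def try_acquire(self, lock_path: str, owner: str) -> bool: ...
    def release(self, lock_path: str, owner: str) -> None: ...


class InMemoryLockStore:
    """Single-process stand-in for ZooKeeper, for illustration and tests only."""

    def __init__(self) -> None:
        self._locks: dict[str, str] = {}

    def holder(self, lock_path: str) -> Optional[str]:
        return self._locks.get(lock_path)

    def try_acquire(self, lock_path: str, owner: str) -> bool:
        # Succeeds if the lock is free or already held by the same owner.
        if self._locks.get(lock_path) in (None, owner):
            self._locks[lock_path] = owner
            return True
        return False

    def release(self, lock_path: str, owner: str) -> None:
        if self._locks.get(lock_path) == owner:
            del self._locks[lock_path]

    def expire_session(self, owner: str) -> None:
        # Mimics ZooKeeper deleting an owner's ephemeral nodes when its session ends.
        for path in [p for p, o in self._locks.items() if o == owner]:
            del self._locks[path]


class MastershipGuard:
    """Checks whether this datacenter holds mastership before doing master-only work."""

    def __init__(self, store: LockStore, service: str, datacenter: str) -> None:
        self.store = store
        self.lock_path = f"/mastership/{service}"  # hypothetical path layout
        self.datacenter = datacenter

    def is_master(self) -> bool:
        return self.store.holder(self.lock_path) == self.datacenter

    def run_if_master(self, task: Callable[[], T]) -> Optional[T]:
        # Non-master datacenters skip the task instead of acting concurrently.
        return task() if self.is_master() else None
```

Failover in this model: once dc-a's session expires (or it releases the lock during a planned failover), dc-b can acquire the lock and its `run_if_master` calls start executing, while dc-a's become no-ops.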
Conclusion
• The best way to prepare for a disaster is to practice one regularly!
• Tooling and automation are your best friends during an outage
• Capacity planning/management is extremely important