Date post: | 16-Apr-2017 |
Category: | Technology |
Upload: | lee-atchison |
Flying Two Mistakes High
A Guide to Not Crashing
Lee Atchison, Principal Cloud Architect and Advocate at New Relic, Inc.
©2008-16 New Relic, Inc. All rights reserved.
Safe Harbor
This document and the information herein (including any information that may be incorporated by reference) is provided for informational purposes only and should not be construed as an offer, commitment, promise or obligation on behalf of New Relic, Inc. (“New Relic”) to sell securities or deliver any product, material, code, functionality, or other feature. Any information provided hereby is proprietary to New Relic and may not be replicated or disclosed without New Relic’s express written permission.
Such information may contain forward-looking statements within the meaning of federal securities laws. Any statement that is not a historical fact or refers to expectations, projections, future plans, objectives, estimates, goals, or other characterizations of future events is a forward-looking statement. These forward-looking statements can often be identified as such because the context of the statement will include words such as “believes,” “anticipates,” “expects” or words of similar import.
Actual results may differ materially from those expressed in these forward-looking statements, which speak only as of the date hereof, and are subject to change at any time without notice. Existing and prospective investors, customers and other third parties transacting business with New Relic are cautioned not to place undue reliance on this forward-looking information. The achievement or success of the matters covered by such forward-looking statements are based on New Relic’s current assumptions, expectations, and beliefs and are subject to substantial risks, uncertainties, assumptions, and changes in circumstances that may cause the actual results, performance, or achievements to differ materially from those expressed or implied in any forward-looking statement. Further information on factors that could affect such forward-looking statements is included in the filings we make with the SEC from time to time. Copies of these documents may be obtained by visiting New Relic’s Investor Relations website at http://ir.newrelic.com or the SEC’s website at www.sec.gov.
New Relic assumes no obligation and does not intend to update these forward-looking statements, except as required by law. New Relic makes no warranties, expressed or implied, in this document or otherwise, with respect to the information provided.
Who am I?
Specialize in:
Cloud computing
Services & Microservices
Scalability, Availability
28 years in industry
7 in Amazon Retail & AWS (Built SW/VG App Store, AWS Elastic Beanstalk)
4 in New Relic (Architecture Lead, Cloud, Service Migration)
@leeatchison leeatchison
I want to tell you a story…
This was a recently overheard conversation…
You tell me if this is ok or not…
Is this ok?
“We were wondering how changing a setting on our MySQL database might impact our performance…
…but we were worried that the change may cause our production database to fail…”
“…Since we didn’t want to bring down production, we decided to make the change to our backup (replica) database instead…
…After all, it wasn’t being used for anything at the moment.”
[Image: “Under Construction”]
Until, of course, the backup was needed…
This was a true story.
I fly radio controlled model airplanes
There’s an old adage:
“Keep your plane at least two mistakes high.”
But why?
Why Two Mistakes High?
You perform some stunt, and it fails… You lose altitude
Now, you are lower, and you are trying to recover
You want to still be high enough, so that if you make another mistake, you won’t crash
You always want to be high enough to make a mistake, even if you’ve just made a mistake…
Put another way…
…flying two mistakes high, you can always have a backup plan for recovering from a mistake
…even if you are currently recovering from a mistake
Don’t screw up...
…while you are screwing up
The same applies when building highly available, high-scale applications
How do we keep “Two Mistakes High” in an application?
Walk through ramifications and recovery plan
Make sure recovery plan works
§ Has no mistakes
§ Has its own recovery plan
If recovery plan doesn’t work… it’s not a good recovery plan
EXAMPLE: How many nodes do we need?
How many nodes do I need to handle my traffic demands?
Building a Service
§ Designed to handle 1,000 req/sec (assume single node = 300 req/sec)
§ ceil[1,000/300] = 4 nodes
§ With four nodes, can handle our traffic
§ PLUS we have enough nodes that we can lose one! We have redundancy!
Right???
EXAMPLE: Well no…
You think 4 nodes give you redundancy, but they don’t...
If you lose one of those nodes:
§ Remaining nodes can only handle 300 * 3 = 900 req/sec
§ Cannot handle the 1,000 req/sec load
EXAMPLE: How many do we need?
4 nodes... allows handling our traffic, but we cannot handle a node failure
5 nodes... allows handling a single node failure
But… no upgrading
6 nodes... allows handling a multi-node failure
Or… handling a failure during an upgrade
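
The node arithmetic above can be sketched in a few lines of Python. This is only an illustration under the slide's assumptions (1,000 req/sec of load, 300 req/sec per node); the function names `nodes_needed` and `surviving_capacity` are hypothetical and not from the talk.

```python
import math


def nodes_needed(peak_rps: int, per_node_rps: int, tolerated_failures: int = 0) -> int:
    """Nodes required to carry peak_rps even with tolerated_failures nodes down."""
    base = math.ceil(peak_rps / per_node_rps)  # nodes needed when nothing is broken
    return base + tolerated_failures


def surviving_capacity(total_nodes: int, per_node_rps: int, nodes_down: int) -> int:
    """Requests per second the cluster can still serve with nodes_down nodes lost."""
    return max(total_nodes - nodes_down, 0) * per_node_rps


# Walk the 4-, 5-, and 6-node cases from the slides.
for nodes in (4, 5, 6):
    for down in (0, 1, 2):
        capacity = surviving_capacity(nodes, 300, down)
        verdict = "ok" if capacity >= 1000 else "OVERLOADED"
        print(f"{nodes} nodes, {down} down: {capacity} req/sec -> {verdict}")
```

A node that is out for an upgrade counts as "down" here, which is why surviving a failure during an upgrade takes the same six nodes as surviving two failures.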
LESSON: Fly Two Mistakes High
Even if you think you have redundancy…
§ Think through the failure modes
§ …and make sure
EXAMPLE: Rolling upgrades
You need 10 nodes to run your application
You have 11 nodes, so that you can do rolling upgrades
§ Bring one node down at a time to upgrade…
§ Always at least 10 available...
Are you safe?
EXAMPLE: Well no…
§ What if that node fails during upgrade?
§ What if you now have to roll back?
With the failed server to contend with… you have no room to do an upgrade or rollback, and you are at risk for another failure
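
A rough way to check upgrade headroom, assuming nodes are interchangeable and one node is upgraded at a time. The helper names below are hypothetical; the numbers (10 required, 11 provisioned) are the ones from the slide.

```python
def available_during_upgrade(total_nodes: int, failed_nodes: int = 0,
                             upgrading_nodes: int = 1) -> int:
    """Nodes still serving traffic while some are failed and some are mid-upgrade."""
    return max(total_nodes - failed_nodes - upgrading_nodes, 0)


def safe_to_upgrade(total_nodes: int, required_nodes: int, failed_nodes: int = 0,
                    upgrading_nodes: int = 1) -> bool:
    """True if enough nodes keep serving while the upgrade (or rollback) proceeds."""
    serving = available_during_upgrade(total_nodes, failed_nodes, upgrading_nodes)
    return serving >= required_nodes


print(safe_to_upgrade(11, 10))                  # True: no failures, 10 still serving
print(safe_to_upgrade(11, 10, failed_nodes=1))  # False: only 9 left serving
print(safe_to_upgrade(12, 10, failed_nodes=1))  # True: one extra node restores headroom
```

The same check applies to a rollback, since rolling back also takes a node out of service.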
LESSON: Fly Two Mistakes High
Make sure you can handle failures
§ Even during “exceptional” events, such as upgrades
§ Exceptional events can cause failures
EXAMPLE: Unknown dependencies
You have your application running on 20 servers…
§ You can run on 15 servers if necessary
§ Plenty of redundancy
Are you safe?
EXAMPLE: Well, depends…
Are any of the 20 servers in the same rack?
Share the same power supply?
Share the same power source?
Share the same A/C system?
LESSON: Fly Two Mistakes High
Redundancy is not redundancy when the resources are not independent
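
One way to test the independence question is to group servers by a shared resource and ask how many survive if the largest group goes down at once. The sketch below assumes a simple server-to-rack mapping; the rack layouts and the name `worst_case_survivors` are made up for illustration. The same check works for power supplies, power sources, or A/C systems.

```python
from collections import Counter


def worst_case_survivors(server_to_domain: dict) -> int:
    """Servers left running if the most heavily shared failure domain goes down."""
    if not server_to_domain:
        return 0
    largest_group = max(Counter(server_to_domain.values()).values())
    return len(server_to_domain) - largest_group


# Hypothetical layouts: 20 servers spread over 2 racks versus 4 racks.
two_racks = {f"server-{i}": f"rack-{i % 2}" for i in range(20)}
four_racks = {f"server-{i}": f"rack-{i % 4}" for i in range(20)}

MIN_SERVERS = 15  # the application can run on 15 servers if necessary

for name, layout in (("two racks", two_racks), ("four racks", four_racks)):
    survivors = worst_case_survivors(layout)
    verdict = "safe" if survivors >= MIN_SERVERS else "NOT safe"
    print(f"{name}: {survivors} survive a rack loss -> {verdict}")
```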
EXAMPLE: Failure loop
You live in an apartment…
§ The apartment provides an enclosed garage to store things in
§ The power goes out in your place a lot…
§ ...you buy a generator, store it in the garage
Are you safe from power outages?
Oops… the garage:
§ Has a single door, the big garage door
§ It has a garage door opener
§ That requires electricity to open...
§ The generator is only available... when you already have power…
LESSON: Fly Two Mistakes High
Make sure your recovery plans actually are operational when you are in a failure mode
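
A failure loop like the garage example can be caught by writing down what each recovery resource depends on and checking whether it transitively needs the very thing it is meant to replace. This is a minimal sketch; the dependency map and the function name `depends_on` are assumptions for illustration.

```python
def depends_on(graph: dict, start: str, target: str) -> bool:
    """True if `start` transitively depends on `target`.

    `graph` maps each resource to the list of resources it needs in order to work.
    """
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        for dep in graph.get(node, []):
            if dep == target:
                return True
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return False


# The garage from the example: the generator sits behind an electric door.
deps = {
    "generator": ["garage door"],
    "garage door": ["garage door opener"],
    "garage door opener": ["mains power"],
}

if depends_on(deps, "generator", "mains power"):
    print("Recovery plan is broken: the generator needs mains power to be reachable")
```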
EXAMPLE: High redundancy in action
EXAMPLE: A real system…
Highly independent
Multi-level error recovery
Highly recoverable system
Redundant
In fact, one of the very first large scale software applications utilizing extreme redundancy and failure management
EXAMPLE: What is this system?
EXAMPLE: US Space Shuttle Program
§ They had problems… serious mechanical problems...
§ But the software system utilized state of the art:
• Redundancy techniques
• Error recovery techniques
EXAMPLE: US Space Shuttle System
Five onboard computers
§ Four were identical (we’ll talk about the fifth later)
§ All four:
– Ran the exact same program during critical periods
– Given same data
– Expected to generate the same result
EXAMPLEFourcomputers
©2008-16NewRelic,Inc.Allrightsreserved.
Computersvotedontheproperoutcome
Ifanyonecomputerdidnotgeneratethesameresults:
53
EXAMPLEFourcomputers
©2008-16NewRelic,Inc.Allrightsreserved.
Computersvotedontheproperoutcome
Thosethatdisagreedwiththeoutcomewereturnedoffforremainderoftheflight
Ifanyonecomputerdidnotgeneratethesameresults:
EXAMPLE: Four computers
Computers voted on the proper outcome
If any one computer did not generate the same results:
Those that disagreed with the outcome were turned off for remainder of the flight
Ultimate in democratic systems…
Could FLY with only THREE computers working
Could LAND with only TWO computers working
EXAMPLE: Ties
What if the four computers couldn’t decide? (software bug or multiple failures)
Fifth computer was used as a tie breaker
§ Much simpler version of software… only used for key decisions
§ Software written by independent software team, unconnected with rest of software developers
§ (In theory) would not introduce same software errors…
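
The voting idea can be illustrated with a toy majority vote. To be clear, this is not the Shuttle's actual algorithm, only a sketch of the general pattern described above: identical replicas vote, dissenters are flagged, and an independent tie breaker is consulted when there is no clear winner. The names `vote` and `tie_breaker` are hypothetical.

```python
from collections import Counter


def vote(results: dict, tie_breaker=None):
    """Pick the most common result and list the replicas that disagreed.

    `results` maps a replica id to the (hashable) value it produced.
    `tie_breaker` is an optional zero-argument callable consulted on a tie.
    """
    if not results:
        raise ValueError("no results to vote on")
    ranked = Counter(results.values()).most_common()
    outcome, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if tied:
        if tie_breaker is None:
            raise ValueError("no majority and no tie breaker available")
        outcome = tie_breaker()  # independent implementation decides
    dissenters = sorted(rid for rid, value in results.items() if value != outcome)
    return outcome, dissenters


# Three replicas agree; the fourth is flagged so it can be taken offline.
print(vote({"c1": 42, "c2": 42, "c3": 42, "c4": 41}))
# A 2-2 split is resolved by the independent fifth implementation.
print(vote({"c1": 1, "c2": 1, "c3": 2, "c4": 2}, tie_breaker=lambda: 2))
```

Keeping the tie breaker as a separate callable mirrors the point of the slide: it only helps if it is built independently of the replicas it arbitrates.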
Highly Successful
30-year operation of Space Shuttle:
§ Never a case where a serious life threatening problem occurred that was a result of a software problem
§ Even though software was the most complex software ever built for a space program
US Space Shuttle
This is extreme (not needed by most projects)
§ Shows what is possible...
§ Independence is critical to high availability
LESSON: Fly Two Mistakes High
Use availability solution consistent with the risk
Higher the risk, higher the focus on availability
Don’t over invest, don’t under invest
But think ahead, avoid the surprise
And remember…
“Keep your plane at least two mistakes high.”
Want to Learn More?
Architecting for Scale
By: Lee Atchison
Published by: O’Reilly Media, Available: June 2016
www.architectingforscale.com
Thank you.
Lee Atchison
Principal Cloud Architect and Advocate at New Relic, Inc.
@leeatchison leeatchison