Learning Goals
• Broad scope of software engineering
• Importance of nontechnical issues
• Introduction to key challenges
Gov't example: Software is integral to DoD systems.
Quoting an Air Force lieutenant general, “The only thing you can do with an F-22 that does not require software is take a picture of it.”
Crouching Dragon, Hidden Software: Software in DoD Weapon Systems (Ferguson, IEEE Software, 2001)
Failed Software Projects
• SAGE (Semi-Automatic Ground Environment); started 1951, almost obsolete when finished in 1963; higher costs than the Manhattan Project
• FBI Virtual Case File stopped in 2005 after 3 years and $170M
• London Stock Exchange stopped the Taurus project in 1993 after 11 years, when it was 13,200% over budget
Envy of Engineers
• Producing a car/bridge
– Estimable costs and risks
– Expected results
– High quality
• Separation between plan and production
• Simulation before construction
• Quality assurance through measurement
• Potential for automation
Software Engineering?
“The establishment and use of sound engineering principles in order to obtain economically software that is reliable and works efficiently on real machines.”
[Bauer 1975, p. 524]
What happened with HealthCare.gov?
• Poor team and process coordination.
• Changing requirements.
• Inadequate quality assurance infrastructure.
• Architecture unsuited to the ultimate system load.
How to develop software?
1. Discuss the software that needs to be written
2. Write some code
3. Test the code to identify the defects
4. Debug to find causes of defects
5. Fix the defects
6. If not done, return to step 1
Software Process
“The set of activities and associated results that produce a software product”
Sommerville, SE, ed. 8
What makes a good process?
[Figure: percent of effort (0% to 100%) over time, from project beginning to project end, divided into thrashing/rework, productive coding, and process. Process: cost and time estimates, writing requirements, design, change management, quality assurance plan, development and integration plan.]
Example process issues
• Change Control: Mid-project informal agreement to changes suggested by customer or manager. Project scope expands 25-50%.
• Quality Assurance: Late detection of requirements and design issues. Test-debug-reimplement cycle limits development of new features. Release with known defects.
• Defect Tracking: Bug reports collected informally, forgotten.
• System Integration: Integration of independently developed components at the very end of the project. Interfaces out of sync.
• Source Code Control: Accidentally overwritten changes, lost work.
• Scheduling: When project is behind, developers are asked weekly for new estimates.
“Large teams (29 people) create around six times as many defects as small teams (3 people) and obviously burn through a lot more money. Yet, the large team appears to produce about the same amount of output in only an average of 12 days' less time. This is a truly astonishing finding, though it fits with my personal experience on projects over 35 years.” - Phillip Armour, 2006, CACM 49:9
Conway's Law
“Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.”
- Mel Conway, 1967
“If you have four groups working on a compiler, you'll get a 4-pass compiler.”
Microsoft's Small Team Practices
• Vision statement and milestones (2-4 months), no formal spec
• Feature selection, prioritized by market, assigned to milestones
• Modular architecture
– Allows small federated teams (Conway's law)
• Small teams of overlapping functional specialists
Windows 95: 200 developers and testers, one of 250 products
Microsoft's Small Team Practices
• Feature Team
– 3-8 developers (design, develop)
– 3-8 testers (validation, verification, usability, market analysis)
– 1 program manager (vision, schedule communication; leader, facilitator), working on several features
– 1 product manager (marketing research, plan, betas)
Microsoft's Small Team Practices
• "Synchronize and stabilize"
• For each milestone
– 6-10 weeks feature development and continuous testing
  • frequent merges, daily builds
– 2-5 weeks integration and testing (“zero-bug release”, external betas)
– 2-5 weeks buffer
Agile Practices (e.g., Scrum)
• 7 +/- 2 team members, collocated
• Self-managing
• Scrum master (potentially shared among 2-3 teams)
• Product owner / customer representative
Measuring Progress?
• “I'm almost done with the X. Component A is almost fully implemented. Component B is finished except for the one stupid bug that sometimes crashes the server. I only need to find the one stupid bug, but that can probably be done in an afternoon?”
Almost Done Problem
• Last 10% of work -> 40% of time (or 20/80)
• Make progress measurable
• Avoid depending entirely on developer estimations
[Figure: reported progress, % completed (axis marks at 90% and 100%) over time, planned vs. actual.]
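One way to make progress measurable can be sketched in Python: count only the effort of tasks that have passed a binary "done" check, instead of averaging self-reported percentages. The `Task` structure, the task names, and the numbers are hypothetical and only illustrate the gap between the two views.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    estimated_days: float   # planned effort for the task
    reported_percent: int   # developer's own estimate, 0-100
    done: bool              # True only once the task passed its acceptance check

def reported_progress(tasks):
    """Effort-weighted average of the self-reported percentages."""
    total = sum(t.estimated_days for t in tasks)
    return sum(t.estimated_days * t.reported_percent for t in tasks) / total

def measured_progress(tasks):
    """Percent of planned effort that sits in verifiably finished tasks."""
    total = sum(t.estimated_days for t in tasks)
    return 100 * sum(t.estimated_days for t in tasks if t.done) / total

# Hypothetical project state resembling the "almost done" report.
tasks = [
    Task("component A", 10, 95, False),
    Task("component B", 10, 90, False),
    Task("build setup", 5, 100, True),
]
print(reported_progress(tasks), measured_progress(tasks))
```

With these made-up numbers the self-reported figure is far higher than the measured one, which is the kind of gap the "almost done" report hides.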
Project Planning
[Figure: project planning flowchart. Steps: identify constraints (budget, personnel, deadlines); estimate project parameters; define milestones; create schedule; activities begin; check progress; technical review; re-estimate project parameters; refine schedule; renegotiate constraints. Decision points (yes/no): Done? Problem? Abort? New feature requests appear as an external input.]
Reasons for Missed Deadlines
• Insufficient staff (illnesses, staff turnover, ...)
• Insufficient qualification
• Unanticipated difficulties
• Unrealistic time estimations
• Unanticipated dependencies
• Changing requirements, additional requirements
• Especially in student projects
– Underestimated time for learning technologies
– Uneven work distribution
– Last-minute panic
Recognize Scheduling Issues Early
• Monitoring and formal reporting necessary
– Establish who, when, what
– Compare planned/actual data
• Measurable milestones
• Outdated schedules are not a meaningful management mechanism
Task: Estimate Time
• A: Java version of the Monopoly board game with Pittsburgh street names
– (you)
• B: Bank smartphone app
– (you with team of 4 developers, one experienced with iPhone apps, one with background in security)
• Estimate in 8h days (20 work days in a month, 220 per year)
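The unit conversions given in the task can be written down as a small Python helper. The sample figure of 440 person-days and the `team_size` parameter are hypothetical, and the sketch assumes that work divides evenly across the team, which is optimistic.

```python
HOURS_PER_DAY = 8          # "8h days" from the task
WORKDAYS_PER_MONTH = 20    # from the task
WORKDAYS_PER_YEAR = 220    # from the task

def summarize(person_days, team_size=1):
    """Convert an effort estimate in person-days into hours and elapsed time.

    Assumption: the work parallelizes perfectly across `team_size` people.
    """
    elapsed_days = person_days / team_size
    return {
        "person_hours": person_days * HOURS_PER_DAY,
        "elapsed_months": elapsed_days / WORKDAYS_PER_MONTH,
        "elapsed_years": elapsed_days / WORKDAYS_PER_YEAR,
    }

# Hypothetical estimate: 440 person-days spread over 4 people.
print(summarize(440, team_size=4))
```

Note that 12 months of 20 work days is 240 days, not 220, so the month and year figures are two separate rules of thumb and will not agree exactly.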
Anda, Bente C. D., Dag I. K. Sjøberg, and Audris Mockus. "Variability and reproducibility in software engineering: A study of four companies that developed the same system." IEEE Transactions on Software Engineering 35.3 (2009): 407-429.
Innovative vs Routine Projects
• Most software projects are innovative
– Google, Amazon, Ebay, Netflix
– Vehicles and robotics
– Language processing, graphics
• Routine (now, not 10 years ago)
– E-commerce websites?
– Many control systems?
– Routine gets automated -> innovation cycle
Sources of Uncertainty
• Unpredictable operating environment
– Cybersecurity threats, device drivers
– Unanticipated usage scenarios
• Limited predictive power of models
– Halting, abstract interpretation, testing
• Bounded rationality of humans
– Designers, developers
– Customers, users
Risk management
• Key task of a project manager
• Identify and evaluate risks early
• If necessary, plan mitigation strategies
• Document results of risk analysis in project plan
• Project risks: scheduling and resources
– e.g., staff illness/turnover
• Product risks: quality and functionality of the product
– e.g., used component too slow
• Business risks:
– e.g., competitor introduces similar product
Software Architecture
"The software architecture of a computing system is the set of structures needed to reason about the system, which comprise software elements, relations among them, and properties of both." [Clements et al. 2010]
Beyond functional correctness
• Quality matters, e.g.,
– Availability
– Modifiability, portability
– Performance, scalability
– Security
– Testability
– Usability
– Cost to build, cost to operate
Design vs. Architecture
Design Questions
• How do I add a menu item in Eclipse?
• How can I make it easy to add menu items in Eclipse?
• What lock protects this data?
• How does Google rank pages?
• What encoder should I use for secure communication?
• What is the interface between objects?
Architectural Questions
• How do I extend Eclipse with a plugin?
• What threads exist and how do they coordinate?
• How does Google scale to billions of hits per day?
• Where should I put my firewalls?
• What is the interface between subsystems?
Decision to Rearchitect Twitter
"After that experience, we determined we needed to step back. We then determined we needed to re-architect the site to support the continued growth of Twitter and to keep it running smoothly."
Redesign Goals
• Improve median latency; lower outliers
• Reduce number of machines 10x
• Isolate failures
• "We wanted cleaner boundaries with “related” logic being in one place"
– encapsulation and modularity at the systems level (rather than at the class, module, or package level)
• Quicker release of new features
– "run small and empowered engineering teams that could make local decisions and ship user-facing changes, independent of other teams"
JVM vs Ruby VM
• Rails servers capable of 200-300 requests/sec/host
• Experience with Scala on the JVM; level of trust
• Rewrite for JVM allowed 10-20k requests/sec/host
Programming Model
• Ruby model: Concurrency at process level; request queued to be handled by one process
• Twitter response aggregated from several services: additive response times
• "As we started to decompose the system into services, each team took slightly different approaches. For example, the failure semantics from clients to services didn't interact well: we had no consistent back-pressure mechanism for servers to signal back to clients and we experienced “thundering herds” from clients aggressively retrying latent services."
• Goal: Single and uniform way of thinking about concurrency
– Implemented in a library for RPC (Finagle), connection pooling, failover strategies and load balancing
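The "thundering herd" effect in the quote comes from many clients retrying at the same moment. A common client-side mitigation, sketched here in Python, is exponential backoff with random jitter. This is a generic technique and an assumption on our part; the slides do not say that Finagle works this way, and the function name and default parameters are hypothetical.

```python
import random

def backoff_delays(attempts, base=0.1, cap=5.0, rng=None):
    """Return one randomized delay (in seconds) per retry attempt.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    so clients that failed together do not all retry together.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)  # exponential growth, capped
        delays.append(rng.uniform(0, ceiling))
    return delays

# Two clients that failed at the same time get different retry schedules.
print(backoff_delays(5, rng=random.Random(1)))
print(backoff_delays(5, rng=random.Random(2)))
```

Backoff only spreads retries out on the client side; it is not a substitute for the server-to-client back-pressure signal that the quote says was missing.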
Independent Systems
• "In our monolithic world, we either needed experts who understood the entire codebase or clear owners at the module or class level. Sadly, the codebase was getting too large to have global experts and, in practice, having clear owners at the module or class level wasn't working. Our codebase was becoming harder to maintain, and teams constantly spent time going on “archeology digs” to understand certain functionality. Or we'd organize “whale hunting expeditions” to try to understand large scale failures that occurred."
• From monolithic system to multiple services
– Agree on RPC interfaces, develop system internals independently
– Self-contained teams
Storage
• Single-master MySQL database was a bottleneck despite more modular code
• Temporal clustering
– Short-term solution
– Skewed load balance
– One machine + replications every 3 weeks
• Move to distributed database (Gizzard on MySQL) with "roughly sortable" ids
• Stability over features: using older MySQL version
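A "roughly sortable" id can be illustrated in Python by packing a timestamp into the high bits of an integer, so that ids sort by creation time up to the timestamp resolution. The field widths and function names below are assumptions for illustration, not a description of Twitter's actual id format.

```python
WORKER_BITS = 10     # assumed width of the machine/worker field
SEQUENCE_BITS = 12   # assumed width of the per-millisecond counter

def make_id(timestamp_ms, worker_id, sequence):
    """Pack (time, worker, sequence) into one integer, time in the high bits."""
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (timestamp_ms << (WORKER_BITS + SEQUENCE_BITS))
        | (worker_id << SEQUENCE_BITS)
        | sequence
    )

def timestamp_of(id_value):
    """Recover the millisecond timestamp from an id."""
    return id_value >> (WORKER_BITS + SEQUENCE_BITS)

earlier = make_id(1000, worker_id=7, sequence=0)
later = make_id(1001, worker_id=1, sequence=0)
print(earlier < later)  # ids from a later millisecond always sort higher
```

The ordering is only rough: two ids created in the same millisecond on different workers are ordered by worker number, not by which was created first. In exchange, each machine can generate ids without coordinating with a single master.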
Data-Driven Decisions
• Many small independent services, number growing
• Own dynamic analysis tool on top of RPC framework
• Framework to configure large numbers of machines
– Including facility to expose feature to parts of users only
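Exposing a feature to only part of the user base is often done by hashing each user into a stable bucket. The Python sketch below shows that general idea; the function name, the 100-bucket scheme, and the feature names are hypothetical and are not taken from Twitter's framework.

```python
import hashlib

def is_enabled(feature, user_id, percent):
    """Return True if this user falls into the first `percent` of 100 buckets.

    Hashing feature name and user id together keeps the decision stable
    per user and independent between features.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent

# Hypothetical rollout: the same users stay enabled as the percentage grows.
at_10 = [u for u in range(1000) if is_enabled("new-search", u, 10)]
at_50 = [u for u in range(1000) if is_enabled("new-search", u, 50)]
print(len(at_10), len(at_50))
```

Because the bucket depends only on the feature name and user id, raising the percentage only adds users and never removes anyone who already had the feature.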
"On Saturday, August 3 in Japan, people watched an airing of Castle in the Sky, and at one moment they took to Twitter so much that we hit a one-second peak of 143,199 Tweets per second."
Outcome: Rearchitecting Twitter
"This re-architecture has not only made the service more resilient when traffic spikes to record highs, but also provides a more flexible platform on which to build more features faster, including synchronizing direct messages across devices, Twitter cards that allow Tweets to become richer and contain more content, and a rich search experience that includes stories and users."
Key Insights: Twitter Case Study
• Architectural decisions affect entire systems, not only individual modules
• Abstract, different abstractions for different scenarios
• Reason about quality attributes early
• Make architectural decisions explicit
QA Tradeoffs
• Understand limitations of QA approaches
– e.g., testing vs static analysis, formal verification vs inspection, …
• Mix and match techniques
• Different techniques for different qualities
• … When am I done?
Quick aside on bug fixing and the tricky relationship between design, intent, implementation, and your cranky users…
Race conditions
• Races can occur when:
– Multiple threads of control access shared data
– Data gets corrupted when internal integrity assumptions are violated.
• How we protect against races
– Use “lock” objects that enable access by one thread at a time
  • E.g., event dispatch
  • A language feature in Java, Ada 95, etc.
– Follow a thread discipline in which only one thread can access critical data (common in GUI APIs, e.g., graphical toolkit redraw)
• Issue: Basically the hardest bugs to find, fix, and protect against.
– Why?
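The "lock object" approach can be sketched in Python with `threading.Lock`. The `Counter` class and `run` helper are hypothetical names for illustration. The shared read-modify-write is the critical section, and the lock admits one thread at a time.

```python
import threading

class Counter:
    """Shared counter whose update is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # "self.value += 1" is a read-modify-write on shared data; the lock
        # ensures that only one thread executes it at a time.
        with self._lock:
            self.value += 1

def run(threads=4, increments=10_000):
    """Increment one shared counter from several threads; return the total."""
    counter = Counter()

    def work():
        for _ in range(increments):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value

print(run())
```

Without the lock, a lost update would show up only under some interleavings, so a test could pass many times before failing. That nondeterminism is one reason such bugs are hard to find.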
java.io.BufferedInputStream
• Buffering wrapper for unbuffered stream input: read, close, reset, skip, mark, etc.
• JDK < 1.2: Race condition between methods read and close: interleaved execution could cause read to throw NullPointerException
– But not always; concurrency -> non-deterministic!
• JDK 1.2 fixes by synchronizing the methods, preventing close and read from interleaving.
(Aaron Greenhouse)
Reaction to bug fix
“This really sucks. Now just to convert to [JDK 1.2] I've got to rewrite code that has worked since JDK 1.02 … It's pretty obvious that syncing close would break things.”
Comment in Bug ID #4225348: “Attempt to close while reading causes deadlock”
Why was everyone so mad?
• Java socket programming idiom that requires the ability to close mid-read:
“Hung” socket stream: Use separate thread to close and interrupt “hung” read or write
• In other words: clients assumed read and close can interleave!
– Bug fix prevents interleaving.
– Intent inferred: is it correct?
• Design choices: What is/was the design intent?
– Interleaving intended: Fix race while allowing interleaving
– Interleaving not intended: Provide alternative idiom to get the same effect.
• What should the Java designers have done? What's a good solution to this problem? Whose fault was it?
Upshot
• Fix was undone in JDK 1.3
– Re-enabled socket idiom.
– Compromises safety of the class by re-enabling the race condition
• BufferedInputStream was fixed to both prevent the race and allow socket idiom for JDK 1.5
• Issue #1: Race condition in deployed production library code
• Issue #2: Lack of documentation of design intent with respect to concurrency.
• Moral: bugs are hard, and correctness depends on context and user expectations.
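One way to "fix the race while allowing interleaving" can be sketched in Python: `close` takes no lock, and `read` takes a single snapshot of the shared buffer and fails with a defined error if the stream is already closed. This is a toy model with hypothetical names. The slides do not describe how JDK 1.5 implemented its fix, and this is not the JDK code.

```python
import threading

class ToyBufferedStream:
    """Toy stand-in for a buffered input stream (not the JDK implementation)."""

    def __init__(self, data):
        self._buf = bytes(data)             # set to None by close()
        self._pos = 0
        self._read_lock = threading.Lock()  # serializes readers only

    def close(self):
        # Deliberately does NOT take _read_lock, so another thread can
        # close the stream while a read is blocked or in progress.
        self._buf = None

    def read(self, n):
        with self._read_lock:
            buf = self._buf   # one snapshot of the shared field
            if buf is None:   # closed: report a defined error
                raise ValueError("stream closed")
            # From here on only the local snapshot is used, so a concurrent
            # close() cannot make this method dereference None.
            chunk = buf[self._pos:self._pos + n]
            self._pos += len(chunk)
            return chunk

stream = ToyBufferedStream(b"hello")
print(stream.read(2))
stream.close()
```

The design choice is to document the intent explicitly: closing mid-read is allowed, and a read that observes the closed stream raises a defined error instead of crashing on a missing buffer.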