Date post: | 08-Apr-2017 |
Category: |
Software |
Upload: | aviran-mordo |
The Road to Continuous Delivery
Aviran Mordo, Head of Engineering
@aviranm
linkedin.com/in/aviran
aviransplace.com
How many of you have built a website?
Wix is a web publishing platform
Wix In Numbers
- Over 66,000,000 users
- Static storage is >2PB of data
- 3 Data centers + 3 Clouds (Google, Amazon)
- 2B HTTP requests/day
- 1000 people work at Wix
Traditional Dev Pipeline
Waterfall
- Long development cycle
- Time waste (Wait)
- Late feedback
- Hard to fix
- 1-2 Releases a year
Scrum
Scrum != Agile
Let's Go Back In Time
Where We Were
- Working traditional waterfall
- With fear of change
- With low product quality
- With slow development velocity
- With traditional enterprise development lifecycle
  - Three months of a VERSION development and QA
  - Six months of crisis mode stabilizing system
Production System
Approach to Production
- Build only what is needed
- Stop if something goes wrong
- Eliminate anything which does not add value
Philosophy of Work
- Respect for workers
- Full utilization of workers' capabilities
- Entrust workers with responsibility & authority
Taiichi Ohno (1912-1990)
Taiichi Ohno, Toyota's chief of production in the post-WWII period. He was THE main developer of Toyota Production System (TPS).
Seeing Waste
Seven Wastes of Manufacturing
- Inventory
- Extra Processing
- Overproduction
- Transportation
- Waiting
- Motion
- Defects
Seven Wastes of Software Development
- Partially Done Work
- Paperwork
- Extra Features
- Building the wrong thing
- Waiting for information
- Task switching
- Defects
The Biggest Source Of Waste
Features and functions used in a typical system
- Often or Always used: 20%
- Rarely or never used: 64%
Lean Product development
Top 5 Most-Used Commands in Microsoft Word
- Paste
- Save
- Copy
- Undo
- Bold
Paste itself accounts for more than 11% of all commands used, and has more than twice as much usage as the #2 entry on the list, Save.
32% of the total command usage
Scaling challenges: Product
Minimum Viable Product (MVP)
- Does the MVP meet your product standards?
- What about tooltips, help, first-time UX, etc.?
- And that can win in an A/B test
To Be Implemented
Get out of thought land
The law of failure
- Most new "its" will fail even if they are flawlessly executed
- Invest less, get less attached, and it is easier to admit failure
- Data beats opinions - let the customer decide
- Make sure you are building the right "it" before you build "it" right
Quick Feedback
Continuous Delivery
Risk
- Waterfall - minimize number of deployments
- CD - minimize number of changes and impact in $$
Risk = #deployments * chance of something going wrong (~ number of changes) * impact of something going wrong in $$
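As a rough worked example of the risk formula, the sketch below plugs in made-up numbers. The function name, the per-change failure chance and the dollar figures are all illustrative assumptions, not Wix data. The chance of failure is modelled as growing linearly with the number of changes, capped at 1.

```python
def deployment_risk(deployments, changes_per_deployment,
                    failure_chance_per_change, impact_dollars):
    """Risk = #deployments * chance of failure * impact in $$.

    The chance of failure is assumed to grow linearly with the number of
    changes in a deployment (capped at 1.0). This is an illustrative model.
    """
    chance = min(1.0, failure_chance_per_change * changes_per_deployment)
    return deployments * chance * impact_dollars


# Hypothetical numbers, chosen only to show the shape of the trade-off.
# Waterfall: 2 releases a year, 1000 changes each, a failure is expensive.
waterfall = deployment_risk(2, 1000, 0.001, 100_000)

# CD: 1000 releases a year, 2 changes each. A failure is assumed to be
# cheaper because a small change is easier to isolate and roll back.
continuous = deployment_risk(1000, 2, 0.001, 1_000)

print(f"waterfall: ${waterfall:,.0f}  continuous delivery: ${continuous:,.0f}")
```

With the same total number of changes, the difference in this model comes entirely from the assumed smaller impact of each CD failure.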
Small Development Iterations
- No Waterfall
- No Scrum
- No Iterations
- No long documents
- Build something small
- When it is ready, deploy it
  - Measure it
  - Then fix it
  - Repeat, until Dev, Product and Customers are happy
Product / Dev / QA / Ops boundaries are going down
What Is The Common Denominator?
- Product manager
- Project manager
- QA
- Operations
- DBA
Developers can do these jobs
CD is culture & mindset
- Trust the developers
  - Empower developers to change production
  - Developer knows his system best
- Automation as a default choice
  - No more "is it worth automating?"
  - Everything should be automated
- Welcome to the twilight zone
  - Product/Dev/QA boundaries are going down
  - Everyone needs to care about everything
  - Less formality: Corridor - IN, Meeting Room - OUT
Dev Centric Culture
Involve the Developer
- Product definition (with product)
- Development (with architect)
- Testing (with QA developers)
- Deployment / Rollback (with ops)
- Monitoring / BI (with BI team)
DevOps to enable deployment and rollback, fully automated
Support Circle
Continuous Delivery principles
- The process for releasing/deploying software MUST be repeatable and reliable
- Automate everything!
- If something's difficult or painful, do it more often
- Keep everything in source control
- Done means released
- Built-in quality
- Everybody has responsibility for the release process
- Improve continuously
Test Driven Development
- No new code is pushed to Git without being fully tested
  - We currently have over 40,000 automated tests
- Before fixing a bug, first write a test to reproduce the bug
- Cover legacy (untested) systems with integration tests
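A minimal sketch of "write a test to reproduce the bug first", using Python's standard `unittest`. The `parse_price` function and the bug itself are hypothetical, invented only to illustrate the workflow: the first test is written while the bug still exists, fails, and then passes once the fix is in.

```python
import unittest


def parse_price(text):
    """Parse a price string such as "1,474" or "12.50" into a float."""
    # The fix: strip thousands separators before converting.
    return float(text.replace(",", ""))


class ParsePriceRegressionTest(unittest.TestCase):
    def test_price_with_thousands_separator(self):
        # Written first to reproduce the (hypothetical) bug report:
        # parse_price("1,474") raised ValueError before the fix.
        self.assertEqual(parse_price("1,474"), 1474.0)

    def test_plain_price_still_works(self):
        # Guards the behaviour that already worked before the fix.
        self.assertEqual(parse_price("12.50"), 12.5)
```

Saved as a module, this runs with `python -m unittest`. The regression test stays in the suite so the bug cannot quietly return.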
What people think of TDD
- TDD slows down development
- With TDD we write more code (product + test code)
- TDD has no significant impact on quality
TDD Actual impact on development
- We develop products faster
- Removes fear of change
- Easier to enter someone else's project
- Do we still need QA? (Yes, they code automation tests)
  - We don't have QA for back-end applications
- Writing a feature is 10-30% slower, with 45-90% fewer bugs
- 50% faster to reach production
- Considerably less time to fix bugs (almost no need for a debugger)
Guidelines for successful TDD
- Tests should run on project checkout to a random computer
- Tests should be debugged on a developer's machine
- Tests should run fast
- Tests have to be readable
  - They are the project's specs
- Fixture is evil!
Refactoring
Is Refactoring Rework?
- Absolutely NOT!
- Refactoring is the outcome of learning
- Refactoring is the cornerstone of improvement
- Refactoring builds the capacity to change
- Refactoring doesn't cost, it pays
Refactoring
- Refactor from inside out
  - Small iterations with tests
  - Refactor small methods, make sure the tests don't break
  - Deploy often
- Re-write from the outside in
  - Write from scratch (one piece at a time)
  - Code duplication sometimes needed (temporary)
  - Protected by Feature Toggle
- Before refactoring, cover everything with tests
  - Legacy code is usually covered by integration (IT) tests
Feature Toggles
One of the key components to successful CD
Usage example
Simple if statement in your code
(Diagram: Code branch. FT Opened? Yes -> New Code, No -> Old Code)
Feature Toggles
- Everyone develops on the Trunk
- Every piece of code can get to production at any time
  - Unused new code can go to production, no harm done
  - Operational new code goes with a guard: use new or old code by feature toggle
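The "simple if statement" guard can be sketched as follows. The toggle name, the in-memory toggle store and the two checkout functions are hypothetical placeholders. A real system would read toggle state from configuration so it can change without a redeploy.

```python
# Hypothetical in-memory toggle store (an assumption for illustration).
FEATURE_TOGGLES = {"new_checkout": False}


def is_open(toggle_name):
    """Return True if the toggle is open; unknown toggles count as closed."""
    return FEATURE_TOGGLES.get(toggle_name, False)


def old_checkout(cart):
    return "old flow"


def new_checkout(cart):
    # New code path: it ships to production unused until the toggle opens.
    return "new flow"


def checkout(cart):
    # The guard: a simple if statement choosing new or old code.
    if is_open("new_checkout"):
        return new_checkout(cart)
    return old_checkout(cart)
```

Because the guard defaults to the old code, the new code can sit on the trunk and reach production at any time without changing behaviour.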
DB Schema Changes Without Downtime
- Adding columns
  - Use another table linked by primary key
  - Use blob field for schema flexibility
- Removing fields
  - Stop using. Do not do any DB schema changes
New DB schema with data migration
Plan a lazy migration path controlled by feature toggle
- Write to old / Read from old
- Write to both / Read from old
- Write to both / Read from new, fallback to old
- Write to new / Read from new, fallback to old
- Eagerly migrate data in the background
- Write to new / Read from new
Backward compatibility is a must
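The migration phases can be sketched with two dictionaries standing in for the old and new schemas. The class, the phase names and the copy-forward-on-read behaviour are assumptions made for illustration. In practice the current phase would be driven by a feature toggle rather than set by hand.

```python
from enum import Enum


class Phase(Enum):
    """Migration phases, in rollout order (names are hypothetical)."""
    OLD_ONLY = 1             # write old / read old
    DUAL_WRITE_READ_OLD = 2  # write both / read old
    DUAL_WRITE_READ_NEW = 3  # write both / read new, fallback to old
    NEW_WRITE_FALLBACK = 4   # write new / read new, fallback to old
    NEW_ONLY = 5             # write new / read new


class MigratingStore:
    def __init__(self, old, new, phase=Phase.OLD_ONLY):
        self.old, self.new, self.phase = old, new, phase

    def write(self, key, value):
        if self.phase in (Phase.OLD_ONLY, Phase.DUAL_WRITE_READ_OLD,
                          Phase.DUAL_WRITE_READ_NEW):
            self.old[key] = value
        if self.phase is not Phase.OLD_ONLY:
            self.new[key] = value

    def read(self, key):
        if self.phase in (Phase.OLD_ONLY, Phase.DUAL_WRITE_READ_OLD):
            return self.old.get(key)
        if key in self.new:
            return self.new[key]
        if self.phase is Phase.NEW_ONLY:
            return None
        # Lazy migration: copy the record forward the first time it is read.
        value = self.old.get(key)
        if value is not None:
            self.new[key] = value
        return value

    def migrate_all(self):
        """Eager background migration of records that were never read."""
        for key, value in self.old.items():
            self.new.setdefault(key, value)
```

Each phase can be rolled back to the previous one without data loss until the final switch, which is why the old store keeps receiving writes through the third phase.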
Feature Toggle Strategies (gradually expose users)
- Company employees
- Specific users or group of users
- Percentage of traffic
- By GEO
- By Language
- By user-agent
- User Profile based
- By context (site id or some kind of hash on site id)
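One way to combine "percentage of traffic" with "some kind of hash on site id" is to hash the site id into a stable bucket. This is a sketch under assumptions: the function names and the choice of SHA-256 are illustrative, not taken from the Wix implementation.

```python
import hashlib


def bucket(site_id, toggle_name, buckets=100):
    """Map a site id to a stable bucket in [0, buckets) for a given toggle."""
    # hashlib is used instead of the built-in hash(), which is salted per
    # process for strings and would not be stable across servers.
    digest = hashlib.sha256(f"{toggle_name}:{site_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def is_open_for(site_id, toggle_name, percent):
    """Open the toggle for roughly `percent` percent of sites."""
    return bucket(site_id, toggle_name) < percent
```

Including the toggle name in the hash keeps the same sites from always landing in the first group of every rollout. Raising `percent` only ever adds sites, so a site that already sees the feature keeps it.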
Feature Toggle Override
- By specific server
  - Used to test system load
  - New database flows/migration
  - Refactoring that may affect performance and memory usage
- By URL parameter
  - Enable internal testing
  - Product acceptance
  - Faking GEO
- By FT cookie value
  - Testing
  - When working with API on a single page application
Full load on a single server
Override size limitation by setting a cookie on the client
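A sketch of resolving an override from a URL parameter or a cookie, using only the Python standard library. The `ft_<name>` naming for the parameter and the cookie is a made-up convention, and the precedence (URL first, then cookie, then the regular toggle value) is an assumption.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit


def toggle_override(toggle_name, url, cookie_header=""):
    """Return True/False if the request overrides the toggle, else None.

    The `ft_<name>` parameter and cookie naming is a hypothetical convention.
    """
    key = f"ft_{toggle_name}"
    params = parse_qs(urlsplit(url).query)
    if key in params:
        return params[key][-1].lower() == "true"
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    if key in cookies:
        return cookies[key].value.lower() == "true"
    return None


def is_open(toggle_name, url, cookie_header="", default=False):
    """Use the override when present, otherwise the regular toggle value."""
    override = toggle_override(toggle_name, url, cookie_header)
    return default if override is None else override
```

For example, `is_open("newEditor", "https://example.com/editor?ft_newEditor=true")` opens the toggle for that single request, which suits internal testing and product acceptance.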
A/B Test
- Every new feature is A/B tested
- We open the new feature to a % of users
  - Define KPIs to check if the new feature is better
  - If it is better, keep it
  - If worse, check why and improve
- Impact of flaws is just for % of our users
A link to purchase in the editor was causing a drop in conversion because users went there too soon, without intent
An interesting side effect on product
- How many times did you have the conversation "what is better?"
  - Put the menu on top / on the side
- Well, how about building both and A/B testing?
Marking users for persistent UX
- Anonymous user
  - Toss is randomly determined
  - Cannot guarantee persistent experience if changing browser
- Registered user
  - Toss is determined by the user ID
  - Guarantees toss persistency across browsers
  - Allows setting additional tossing criteria (for example new users only)
  - Only use this for sections where a user has to be authenticated
- Do not mix anonymous and registered tests
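The two kinds of toss can be sketched as below. The function names, the hashing scheme and the two-group "A"/"B" split are illustrative assumptions. The point is only that a registered user's group is derived from the user ID, while an anonymous user's group is random and must be remembered by the browser.

```python
import hashlib
import random


def toss_registered(user_id, experiment, percent_b):
    """Deterministic toss: the same user id always lands in the same group."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return "B" if int.from_bytes(digest[:8], "big") % 100 < percent_b else "A"


def toss_anonymous(percent_b, rng=random):
    """Random toss. The caller would store the result in a browser cookie,
    so the experience persists only while that browser keeps the cookie."""
    return "B" if rng.random() * 100 < percent_b else "A"
```

Since `toss_registered` needs no stored state, any server can recompute the group from the user ID, which is what makes the experience persistent across browsers.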
A/B test percentage of users with optional filters
- New Users Only (Registered users only)
- By language
- By GEO
- By Browser user-agent, OS
- Any other criteria you have on your users
A/B Test Features
- A/B Test Override
- Start
- Stop
- Pause
- Bots are always excluded from the test
Wix PETRI
NOT !!!
Gradual Deployment
- Assume two components
- We shut down one and install the new version on it. It is not active yet
- Do self test
- Activate the new server if it passes self test
- Continue deploying the other servers, a few at a time, checking each one with self test
(Diagram: servers running A 1.1 and B 1.1, with the B servers moving to 1.2 a few at a time)
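The gradual deployment steps can be sketched as a loop over small batches of servers. The `install`, `self_test` and `activate` callables are hypothetical hooks into whatever deployment tooling is in use. The single-server first batch and the default batch size are assumptions.

```python
def gradual_deploy(servers, version, install, self_test, activate, batch_size=2):
    """Deploy `version` a few servers at a time, stopping on a failed self test.

    Returns the list of servers that were activated on the new version.
    """
    activated = []
    # First a single server, then the rest in small batches.
    batches = [servers[:1]] + [servers[i:i + batch_size]
                               for i in range(1, len(servers), batch_size)]
    for batch in batches:
        for server in batch:
            install(server, version)  # installed but not yet active
        for server in batch:
            if not self_test(server):
                return activated      # stop: remaining servers are untouched
            activate(server)
            activated.append(server)
    return activated
```

A failed self test halts the rollout, so at worst one batch is out of rotation while every other server keeps serving the previous version.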
Self Test / Post Deployment Test
- After each server deployment, run a self test before deploying the next server
- Checking server configuration and topology
  - Make sure DB is accessible
  - Is the schema the one I expect
  - Access required local resources (files, config, templates, etc.)
  - Access remote resources
  - RPC / REST endpoints reachable and operational
- Server will refuse requests unless it passes the self test
- Allow a way to skip self test (and continue deployment)
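A toy sketch of a server that refuses requests until its self test passes. The class, the check names and the 503 response are assumptions for illustration. Real checks would probe the database, the schema version, local files and remote endpoints.

```python
class Server:
    """Toy server that refuses requests until its self test has passed."""

    def __init__(self, checks, skip_self_test=False):
        # `checks` maps a check name to a zero-argument callable returning
        # True on success, e.g. "db reachable" or "schema version matches".
        self.checks = checks
        self.ready = skip_self_test  # escape hatch to continue a deployment
        self.failures = []

    def run_self_test(self):
        self.failures = []
        for name, check in self.checks.items():
            try:
                ok = check()
            except Exception:
                ok = False  # a crashing check counts as a failed check
            if not ok:
                self.failures.append(name)
        self.ready = not self.failures
        return self.ready

    def handle(self, request):
        if not self.ready:
            return 503, "self test not passed"
        return 200, f"handled {request}"
```

Recording the names of the failed checks makes it easy to show on a status page exactly why a freshly deployed server did not come up.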
Tools - App-info Self Test
Backward and Forward compatible
- Assume two components
- We release a new version of one
- Now roll back the other
(Diagram: components A and B running mixed versions 1.0, 1.1 and 1.2 during release and rollback)
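One common way to stay compatible in both directions is for every reader to ignore fields it does not know and to supply defaults for fields it does not receive. The sketch below assumes JSON messages; the field names and version numbers are hypothetical.

```python
import json


def parse_session_v1_1(payload):
    """Reader for a hypothetical service at version 1.1.

    Forward compatible: unknown fields sent by a newer peer are ignored.
    Backward compatible: fields missing from an older peer get defaults.
    """
    data = json.loads(payload)
    return {
        "user_id": data["user_id"],          # present in every version
        "locale": data.get("locale", "en"),  # added in 1.1, absent in 1.0
    }
```

With readers like this, component A at 1.1 keeps working whether component B is rolled forward to 1.2 or rolled back to 1.0.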
A Story on Wix Time Machine
Time machine event =
- Deployment capabilities: no-click deployment
  - Dozens of services, 130+ servers, 3 Data Centers
- Backward and forward compatibility in an extreme field test case
  - Mixed versions of services / DB with no service downtime
- Empowerment
  - The power we give to individuals
- Taking risks and embracing failure
Wix in 2014
- 17,000 Deployments (production changes) a year
- Double the velocity from last year
- Every 7 minutes production changes its state (during working hours)
Do You Have The Guts To Deploy 60 Times A Day?
CD: prepare to invest
- Dev infrastructure
  - Refactor, Refactor, Refactor
- Testing infrastructure & know-how
- Deployment infrastructure & tools
- Automation, Automation, Automation
- Monitoring (business and technical)
  - Hundreds of aspects
  - Use of thresholds is a must
  - Monitor business KPIs
  - Internal & external
- Endless tuning & learning
How does it work: CD Practices
- Test driven development
- Small Development Iterations
- Backwards and Forwards compatible
- Gradual Deployment & Self-Test
- Feature Toggle
- A/B Testing
- Exception Classification
- Production visibility
Tools - App-info - Dashboard
Tools - App-info Running Experiments
App-Info Resource Pools
Tools Monitoring - New Relic
Tools Frying Pan
Tools Lifecycle To Rule Them All
Where are we today?
- We have re-written our Flash editor product as an HTML5 editor
  - In just 4 months
- Introduced Wix 3rd party applications (developers API)
  - In just 6 weeks
- We are easily replacing significant parts of our infrastructure
- And we are doing ~60 releases a day!
- Production state changes every 7 minutes
The Road to Continuous Delivery
Aviran Mordo, Head of Back-end Engineering
@aviranm
linkedin.com/in/aviran
aviransplace.com
Read more: The Road to Continuous Delivery: http://goo.gl/K6zEK
Dev-Centric Culture: http://goo.gl/0Vo70t
How would you do it?
How will you change the Wix session encryption key with as little service interruption as possible?
- Encryption key is currently hard coded in the framework
- All the services have the encryption key
- User server creates a session
- Services can renew a session
- External services not in the framework also have the encryption key