Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalability

Post on 13-Jan-2017


1 @Dynatrace

Application Quality Metrics for your Pipeline (and why Docker is not the solution to all of your problems)

Andreas (Andi) Grabner - @grabnerandi

Metrics-Driven DevOps

700 deployments / year

10 + deployments / day

50 – 60 deployments / day

Every 11.6 seconds

Example #1: Online Casino

282! Objects on that page

9.68MB Page Size

8.8s Page Load Time

Most objects are images delivered from your main domain

Very long Connect time (1.8s) to your CDN

879! SQL Queries

8! Missing CSS & JS Files

340! Calls to GetItemById

Example #2: Lawyer Website based on SharePoint

11s! To load Landing Page

• Waterfall to Agile: 3 years
• 220 Apps - 1 deployment per month

“EVERYONE can do Continuous Delivery”

“Every manual tester does AUTOMATION”

“WE DON’T LOG BUGS – WE FIX THEM!”

Measures Built-In, Visible to Everyone

Promote your Wins, Educate your Peers

Challenges

Deploy Faster!!

Fail Faster!?

It’s not about blindly automating the push of more bad code through a shiny pipeline

Metrics-based Decisions!

Time of Deployment

Availability dropped to 0%

Bad Deployment based on Resource Consumption

With increasing load: Which LAYER doesn’t SCALE?

Usage by Channel? Errors on Devices?

App with Regular Load supported by 10 Containers

Twice the Load but 48 (=4.8x!) Containers! App doesn’t scale!!
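The container numbers above can be turned into a simple scalability check: compare how much the resources grew with how much the load grew. The following is a minimal sketch; the class name `ScalingCheck`, its methods, and the 20% tolerance are assumptions made for illustration and do not come from the slides or from any tool.

```java
// Hypothetical helper for illustration; not part of any tool named in this deck.
final class ScalingCheck {

    /**
     * Returns (container growth) / (load growth).
     * 1.0 means linear scaling; larger values mean resources grow faster than load.
     */
    static double scalingRatio(double baseLoad, int baseContainers,
                               double newLoad, int newContainers) {
        if (baseLoad <= 0 || newLoad <= 0 || baseContainers <= 0) {
            throw new IllegalArgumentException("load and container counts must be positive");
        }
        double loadFactor = newLoad / baseLoad;
        double containerFactor = (double) newContainers / baseContainers;
        return containerFactor / loadFactor;
    }

    /** Accepts the ratio if it stays within the given tolerance above linear scaling. */
    static boolean scalesAcceptably(double ratio, double tolerance) {
        return ratio <= 1.0 + tolerance;
    }

    public static void main(String[] args) {
        // Numbers from the slide: 10 containers at regular load, 48 at twice the load.
        double ratio = scalingRatio(1.0, 10, 2.0, 48);
        System.out.println("container growth / load growth = " + ratio);
        System.out.println("scales acceptably (20% tolerance assumed): "
                + scalesAcceptably(ratio, 0.2));
    }
}
```

With the slide's numbers the containers grow 4.8x for 2x the load, so the ratio is about 2.4 and the check fails.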

Technical Debt!

80%

$60B

Insufficient Focus on Quality

The “War Room”

Facebook – December 2012

20%

80%

I love learning from others

4 use cases

WHY did it happen?

HOW to avoid it!

METRICS to guide you.

#1: Not every Architect makes good decisions

Project: Online Room Reservation System

• Symptoms
  • HTML takes between 60 and 120s to render
  • High GC Time
• Developer Assumptions
  • Bad GC Tuning
  • Probably bad Database Performance as rendering was simple
• Result: 2 Years of Finger pointing between Dev and DBA

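Developers built their own monitoring:

    void roomreservationReport(int officeId) {
        // Take a timestamp before loading the data.
        long startTime = System.currentTimeMillis();
        Object data = loadDataForOffice(officeId);
        // Elapsed time covers everything inside loadDataForOffice,
        // not just the execution of individual SQL statements.
        long dataLoadTime = System.currentTimeMillis() - startTime;
        generateReport(data, officeId);
    }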

Result: Avg. Data Load Time: 45s!

DB Tool says: Avg. SQL Query: <1ms!

#1: Loading too much data

24889! Calls to the Database API!

High CPU and High Memory Usage to keep all data in Memory

#2: On individual connections

12444! individual connections

Classical N+1 Query Problem

Individual SQL really <1ms
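The N+1 pattern explains why each query can take under 1ms while the overall data load is slow: the cost is in the number of round trips, not in any single statement. The following is a minimal sketch that makes this visible with an in-memory stand-in for a database. `FakeDb`, its methods, and the room/reservation data are hypothetical and only illustrate the pattern; they are not the code of the project described above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: counts "database" round trips for two loading strategies.
final class NPlusOneDemo {

    /** Hypothetical in-memory stand-in for a database that counts round trips. */
    static final class FakeDb {
        private final Map<Integer, List<String>> reservationsByRoom = new LinkedHashMap<>();
        int roundTrips = 0;

        FakeDb(int rooms) {
            for (int id = 1; id <= rooms; id++) {
                reservationsByRoom.put(id, Collections.singletonList("reservation-" + id));
            }
        }

        /** One round trip returning all room ids. */
        List<Integer> findRoomIds() {
            roundTrips++;
            return new ArrayList<>(reservationsByRoom.keySet());
        }

        /** One round trip per room: the "N" in N+1. */
        List<String> findReservations(int roomId) {
            roundTrips++;
            return reservationsByRoom.get(roomId);
        }

        /** One round trip for all requested rooms (think of a join or an IN clause). */
        Map<Integer, List<String>> findReservationsForRooms(Collection<Integer> roomIds) {
            roundTrips++;
            Map<Integer, List<String>> result = new LinkedHashMap<>();
            for (Integer id : roomIds) {
                result.put(id, reservationsByRoom.get(id));
            }
            return result;
        }
    }

    /** N+1 style: one query for the ids, then one query per id. */
    static int roundTripsOneByOne(int rooms) {
        FakeDb db = new FakeDb(rooms);
        for (int roomId : db.findRoomIds()) {
            db.findReservations(roomId);
        }
        return db.roundTrips;
    }

    /** Batched style: one query for the ids, one query for all reservations. */
    static int roundTripsBatched(int rooms) {
        FakeDb db = new FakeDb(rooms);
        db.findReservationsForRooms(db.findRoomIds());
        return db.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("one by one: " + roundTripsOneByOne(100)); // prints "one by one: 101"
        System.out.println("batched:    " + roundTripsBatched(100));  // prints "batched:    2"
    }
}
```

Counting round trips per request, rather than timing individual statements, is what exposes the problem.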

#3: Putting all data in temp Hashtable

Lots of time spent in Hashtable.get

Called from their Entity Objects

Lessons Learned – Don’t Assume …

• … you know what the code you inherited is doing!!
• … you are not making mistakes like this

• Explore the Right Tools
  • Built-In Database Analysis Tools
  • “Logging” options of Frameworks such as Hibernate, …
  • JMX, Perf Counters, … of your Application Servers
  • Performance Tracing Tools: Dynatrace, Ruxit, NewRelic, AppDynamics, Your Profiler of Choice …

Key Metrics

# of SQL Calls

# of same SQL Execs (1+N)

# of Connections

Rows/Data Transferred
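Two of these metrics, the number of SQL calls and the number of executions of the same statement, can be collected with very little code once every statement passes through one place. The following is a minimal sketch, assuming a hypothetical `SqlMetrics` class that the application or a test harness calls for each executed statement. Treating statements that differ only in numeric literals as "the same" is a simplification chosen here for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical collector for illustration; not an API of any tool named in this deck.
final class SqlMetrics {
    private final Map<String, Integer> executionsByStatement = new LinkedHashMap<>();
    private int totalCalls = 0;

    /** Records one executed statement. */
    void record(String sql) {
        totalCalls++;
        executionsByStatement.merge(normalize(sql), 1, Integer::sum);
    }

    /** Collapses whitespace and replaces standalone numeric literals with '?'. */
    static String normalize(String sql) {
        return sql.trim()
                  .replaceAll("\\s+", " ")
                  .replaceAll("\\b\\d+\\b", "?");
    }

    /** Metric: # of SQL calls. */
    int totalCalls() {
        return totalCalls;
    }

    /** Metric: highest number of executions of one and the same statement. */
    int maxSameStatementExecutions() {
        return executionsByStatement.values().stream()
                .mapToInt(Integer::intValue)
                .max()
                .orElse(0);
    }

    public static void main(String[] args) {
        SqlMetrics metrics = new SqlMetrics();
        for (int roomId = 1; roomId <= 5; roomId++) {
            metrics.record("SELECT * FROM reservation WHERE room_id = " + roomId);
        }
        metrics.record("SELECT name FROM office");
        System.out.println("# of SQL calls:        " + metrics.totalCalls());
        System.out.println("# of same SQL (max):   " + metrics.maxSameStatementExecutions());
    }
}
```

A high value for the second metric relative to the first is a typical sign of the 1+N pattern.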


#2: There is no easy "Migration" to Micro(Services)


26.7s Execution Time

33! Calls to the same Web Service

171! SQL Queries through LINQ by this Web Service – request similar data for each call

Architecture Violation: Direct access to the DB from frontend logic


Key Metrics

# Service Calls, # Containers

# of Threads, Sync and Wait

# SQL executions

# of SAME SQLs

Payload (kB) of Service Calls


#3: Don't ASSUME you know the environment

Distance calculation issues

480km biking in 1 hour!

Solution: Unit Test in Live App reports Geo Calc Problems

Finding: Only happens on certain Android versions
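A check like the one described above can be as small as a plausibility test on the computed distance. The following is a minimal sketch, assuming a haversine great-circle distance and a maximum plausible cycling speed. The class `GeoSanityCheck`, its method names, and the 80 km/h limit are assumptions for illustration; the slides do not show how the app actually computed or validated distances.

```java
// Hypothetical sanity check for illustration; not the code of the app described above.
final class GeoSanityCheck {
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance between two coordinates (degrees) using the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Returns false when the implied average speed exceeds the given limit. */
    static boolean isPlausibleSpeed(double distanceKm, double hours, double maxKmh) {
        if (hours <= 0) {
            return false;
        }
        return distanceKm / hours <= maxKmh;
    }

    public static void main(String[] args) {
        double maxCyclingKmh = 80.0; // assumed upper bound for a bike ride

        // The case from the slide: 480km in 1 hour is reported as a functional error.
        System.out.println("480km in 1h plausible: " + isPlausibleSpeed(480.0, 1.0, maxCyclingKmh));

        // One degree of longitude along the equator, covered in 5 hours.
        double km = haversineKm(0.0, 0.0, 0.0, 1.0);
        System.out.println("distance km: " + km);
        System.out.println("plausible:   " + isPlausibleSpeed(km, 5.0, maxCyclingKmh));
    }
}
```

Reporting each failed check together with device and OS version is what makes a finding such as "only on certain Android versions" possible.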

3rd party issues

Impact of bad 3rd party calls


Key Metrics

# of functional errors

# and Status of 3rd party calls

Payload of Calls

12 000 000 $


#4: Thinking Big? Then Start Small!

Availability dropped to 0%

Load Spike resulted in Unavailability

Ad on air


Alternative: “GoDaddy goes DevOps”

Response time improved 4x

1h before SuperBowl KickOff

1h after Game ended


Key Metrics

# Domains

Total Size of Content


What have we learned so far?


1. # Resources
2. Size of Resources
3. Page Size
4. # Functional Errors
5. 3rd Party calls
6. # SQL Executions
7. # of SAME SQLs

Metric-Based Decisions Are Cool

We want to get from here …

To here!

Use these application metrics as additional Quality Gates


Quality Metrics in your pipeline

What you currently measure
• # Test Failures
• Overall Duration

What you should measure
• Execution Time per test
• # calls to API
• # executed SQL statements
• # Web Service Calls
• # JMS Messages
• # Objects Allocated
• # Exceptions
• # Log Messages
• # HTTP 4xx/5xx
• Request/Response Size
• Page Load/Rendering Time
• …
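Metrics like these become a quality gate once each one has a limit that a build must not exceed. The following is a minimal sketch, assuming the metrics of a test run are already available as name/value pairs. The class `QualityGate`, the metric names, and the limits are hypothetical and chosen for illustration; the deck does not prescribe specific thresholds.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical quality gate for illustration; not an API of any CI server or monitoring tool.
final class QualityGate {

    /** Returns one message per metric that is missing or above its allowed maximum. */
    static List<String> violations(Map<String, Double> measured, Map<String, Double> maxAllowed) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> limit : maxAllowed.entrySet()) {
            Double value = measured.get(limit.getKey());
            if (value == null) {
                result.add(limit.getKey() + ": not measured");
            } else if (value > limit.getValue()) {
                result.add(limit.getKey() + ": " + value + " exceeds limit " + limit.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed limits; real values would come from your own baselines.
        Map<String, Double> limits = new LinkedHashMap<>();
        limits.put("sqlStatements", 20.0);
        limits.put("exceptions", 0.0);
        limits.put("http5xx", 0.0);

        // Assumed measurements for one test run.
        Map<String, Double> measured = new LinkedHashMap<>();
        measured.put("sqlStatements", 75.0);
        measured.put("exceptions", 0.0);
        measured.put("http5xx", 0.0);

        List<String> problems = violations(measured, limits);
        problems.forEach(System.out::println);
        System.out.println(problems.isEmpty() ? "gate passed" : "gate failed");
    }
}
```

In a pipeline, a non-empty result would mark the build as failed or unstable, in the same way a failing functional test does.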

Extend your Continuous Integration

Test & Monitoring Framework Results   | Architectural Data
Build #   Test Case      Status       | # SQL   # Excep   CPU
17        testPurchase   OK           | 12      0         120ms
17        testSearch     OK           | 3       1         68ms
18        testPurchase   FAILED       | 12      5         60ms
18        testSearch     OK           | 3       1         68ms
19        testPurchase   OK           | 75      0         230ms
19        testSearch     OK           | 3       1         68ms
20        testPurchase   OK           | 12      0         120ms
20        testSearch     OK           | 3       1         68ms

Test & Monitoring Framework Results alone:
• Build 18: We identified a regression
• Build 19: Problem solved

With Architectural Data:
• Build 18: Exceptions probably reason for failed tests
• Build 19: Problem fixed but now we have an architectural regression
• Build 20: Now we have the functional and architectural confidence

Let’s look behind the scenes

#1: Analyzing every Unit & Integration test

#2: Metrics for each test

#3: Detecting regression based on measure

Unit/Integration Tests are auto baselined! Regressions auto-detected!
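Baselining can be approximated with very little logic: keep the values of builds that were accepted and flag a new value that rises well above their average. The following is a minimal sketch under that assumption. The class `BaselineDetector`, the averaging approach, and the 50% tolerance are illustrative choices and are not a description of how Dynatrace or any CI server computes its baselines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical baseline check for one metric of one test; for illustration only.
final class BaselineDetector {
    private final double tolerance;
    private final List<Double> accepted = new ArrayList<>();

    /** @param tolerance allowed relative increase over the baseline, e.g. 0.5 for 50% */
    BaselineDetector(double tolerance) {
        this.tolerance = tolerance;
    }

    /**
     * Checks a new measurement against the average of accepted measurements.
     * Accepted values extend the baseline; regressions are left out of it.
     */
    boolean checkForRegression(double value) {
        if (accepted.isEmpty()) {
            accepted.add(value); // first build defines the initial baseline
            return false;
        }
        double baseline = accepted.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(value);
        boolean regression = value > baseline * (1.0 + tolerance);
        if (!regression) {
            accepted.add(value);
        }
        return regression;
    }

    public static void main(String[] args) {
        // "# SQL" values of a test such as testPurchase over four builds: 12, 12, 75, 12.
        double[] sqlPerBuild = {12, 12, 75, 12};
        BaselineDetector detector = new BaselineDetector(0.5);
        for (int i = 0; i < sqlPerBuild.length; i++) {
            boolean regression = detector.checkForRegression(sqlPerBuild[i]);
            System.out.println("build " + (17 + i) + ": # SQL = " + sqlPerBuild[i]
                    + (regression ? " -> regression" : " -> ok"));
        }
    }
}
```

With these values only the third build (75 SQL statements) is flagged, although its functional test status would be OK.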

Build-by-Build Quality View

Build Quality Overview in Dynatrace or Jenkins

Build Quality Overview in Dynatrace & your CI server

Production Data: Real User & Application Monitoring

Recap!

#1: Pick your App Metrics

# of Service Calls, Bytes Sent & Received

# of Worker Threads

# of SQL Calls, # of Same SQLs, # of DB Connections

#2: Figure out how to monitor them

http://bit.ly/dtpersonal

#3: Automate it into your Pipeline

#4: Also do it in Production

Better Software, Faster!!

Draw better Unicorns


Questions and/or Demo

Slides: slideshare.net/grabnerandi
Get Tools: bit.ly/dtpersonal
YouTube Tutorials: bit.ly/dttutorials
Contact Me: agrabner@dynatrace.com
Follow Me: @grabnerandi
Read More: blog.dynatrace.com


Andreas Grabner
Dynatrace Developer Advocate
@grabnerandi
http://blog.dynatrace.com