Date post: 14-Jan-2016
1 Mean Time to Innocence Your Dashboards are Green – but your end users are still complaining. Now What? Phil Stanhope October 2015
Page 1: Mean Time to Innocence

Your Dashboards are Green – but your end users are still complaining. Now What?

Phil Stanhope

October 2015

Page 2: Some Numbers

30B real-time steering decisions per day

6B traceroute and RUM latency measurements per day

That’s over 6 light years!

13 hops per traceroute

Traffic covering 80% of ASNs on the internet seen every few minutes

52K ASNs monitored

200M BGP updates per day

No major CDN can deliver 99.9% uptime from the end user’s perspective. But is it their fault?

Real-Time Feeds

Cooked Time Series Data – Near Real Time

Pre-cooked across ~1000 dimensions every 5 minutes (geography, mobile networks, fixed-line networks, target markets, cities and IP sets)

Outages & Hijacks

Pairwise Comparisons

Performance Alarms
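A quick back-of-the-envelope sketch of what these daily volumes mean per second, and of how little downtime a 99.9% target leaves. The input figures are the ones quoted on this slide; the function names and the 30-day window are illustrative assumptions, not something from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(per_day: float) -> float:
    """Convert a daily event count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Minutes of downtime allowed in a period at a given availability target."""
    return (1 - availability_pct / 100) * period_days * 24 * 60


# Daily volumes as quoted on the slide.
daily_volumes = {
    "steering decisions": 30e9,
    "traceroute and RUM measurements": 6e9,
    "BGP updates": 200e6,
}

rates = {name: per_second(count) for name, count in daily_volumes.items()}
monthly_budget = downtime_budget_minutes(99.9, 30)

for name, rate in rates.items():
    print(f"{name}: about {rate:,.0f} per second on average")
print(f"99.9% over 30 days leaves about {monthly_budget:.1f} minutes of downtime")
```

These are averages only; real traffic is bursty, so peak rates will be higher than what this prints.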

Page 3: Business Impacting

● Major Outages

• Major Impact

• Rare

● Regional Outages and Degradations

• Variable Impact

• Always Happening

“We experienced an Internet connectivity issue with a provider outside of our network which affected traffic from some end-user networks.” AWS

Page 4: What is Internet Intelligence?

● Consolidated view across your Internet Infrastructure

● Determine the impact to Cloud, CDN and Hosting Infrastructure globally

● Immediate time to information

Page 5: How is it Done?

Leverage Currently Deployed Dyn Assets

● Global Monitoring Infrastructure

● Custom Cloud Monitoring Infrastructure

● Real User Monitoring data

● Global Routing Infrastructure Monitors

Page 6: Global Monitoring Infrastructure

Page 7: Reachability Markets

Page 8: What is being Monitored?

Page 9: Waterfalls & RUM – Where do you start?

Page 10: Let’s Dive in – Some Context

Rather than focusing on whole-page RUM and waterfalls, focus on what happens OUTSIDE your normal span of control as a cloud, content & security consumer:

Monitor the critical content servers (CDNs, both public and private)

Monitor the cloud providers, DNS providers & core SaaS providers

Give you the tooling to start answering mean-time-to-innocence questions:

Is it a problem you have the ability to address? Not if it’s your cloud provider’s transit, or the ISP’s recursive DNS.

Is your CDN provider overloaded? Is there a more generalized congestion problem on the internet?

Are the network paths to your users suboptimal – maybe even hijacked?

Can you see a micro-outage? Can you see patterns with providers?

Did a user come via a proxy gateway? Does the gateway fail to forward websockets?

Page 11: Live Demo

NOTE: This is a fake URL – it won’t work for you. Sorry.

A single web page that shows a combination of real-time and near-real-time forensic data

Intentionally unbranded – what can you do with our datasets?

Covers the internal APIs that we use – they are all becoming public. Talk to me!

A common set of UX controls can be applied to a variety of real-time and batch data:

GeoViews, Sunburst, Matrix & Long-Term Trending

Under the covers: ReactJS, D3, GeoJSON/TopoJSON, jQuery, Go, Varnish, Nginx, WebSockets

Page 12: Telemetry Data Cooking Pipeline

Users: cover 80% of the ASNs on the internet every minute

Relays: globally distributed network, handling 50K/sec per relay

ORIGINS

DNS RECURSIVES

Probers: network of 300+ probers performing 10K traces/second AND synthetic DIG & HTTP[S]

Geo annotated real-time API

Time Series analyzed API

Gatherers: real-time geo annotation, data transformation & filtering, handling 100K/sec events

Cookers: statistical analysis and aggregation services
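To make the gatherer stage concrete, here is a minimal sketch of "geo annotation, data transformation & filtering" on a stream of measurement events. Everything in it is an assumption for illustration: the event fields, the tiny prefix table (RFC 5737 documentation prefixes with invented labels) and the function names are hypothetical, and this is not how Dyn's gatherers are actually built.

```python
import ipaddress
from typing import Iterable, Iterator, Optional

# Hypothetical prefix table: documentation prefixes with invented labels.
# A real gatherer would load a full geo/ASN database instead.
PREFIX_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), {"country": "US", "asn": 64500}),
    (ipaddress.ip_network("198.51.100.0/24"), {"country": "DE", "asn": 64501}),
]


def annotate(ip: str) -> Optional[dict]:
    """Return the labels of the first prefix containing ip, or None."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return None  # not a parseable IP address
    for network, labels in PREFIX_TABLE:
        if addr in network:
            return labels
    return None


def gather(events: Iterable[dict]) -> Iterator[dict]:
    """Annotate each event with geo labels; drop malformed or unmapped events."""
    for event in events:
        latency = event.get("latency_ms")
        if not isinstance(latency, (int, float)) or latency < 0:
            continue  # filtering: discard malformed measurements
        labels = annotate(event.get("client_ip", ""))
        if labels is None:
            continue  # filtering: discard events that cannot be placed
        yield {**event, **labels}  # transformation: merge labels into the event


sample = [
    {"client_ip": "192.0.2.10", "latency_ms": 42},
    {"client_ip": "198.51.100.7", "latency_ms": 88.5},
    {"client_ip": "203.0.113.9", "latency_ms": 15},   # no matching prefix
    {"client_ip": "192.0.2.11", "latency_ms": -1},    # malformed latency
    {"client_ip": "not-an-ip", "latency_ms": 20},     # malformed address
]
annotated = list(gather(sample))
```

A linear scan over prefixes is fine for a sketch; at the event rates quoted on the slide, a longest-prefix-match structure such as a radix tree would be the more likely choice.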

Page 13:

Browser, Recursive, Authoritative, Injector & Beacon

GET - http://dyninsight.com/inject/CUST_ID/CUST_DATA

beacon = HMAC(secret, token)

JavaScript “injection” – just like injecting an advertisement into a page

Writes a transparent iframe into the page

Loading the iframe requires resolving beacon

Guaranteed to cause recursive DNS cache miss

time, client_ip, beacon

time, recursive_ip, beacon

HTTP, DNS, LOG & ANALYZE

Collect

GET - http://beacon.dyninsight.com/CUST_ID/CUST_DATA/token

time, client_ip

Dynamic HTML - containing customer resources to test

Resources 1 .. N @ target origins tested

Resource timing information sent to collector

Per resource timing info

Diagram steps: 1, 2, 3, 4, 5

KEY:

Gatherer

token = encode(cust_id, client_ip, time, nodeid, referer)

Time – 2 – Authoritative

Time – 2 – Recursive (inferred)
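The token and beacon formulas on this slide, plus the two log records, can be sketched as follows. This is an illustration under stated assumptions: the slide gives only `token = encode(...)` and `beacon = HMAC(secret, token)`, so the JSON-plus-base64url encoding, the SHA-256 digest, the truncation to 32 hex characters and all function names here are hypothetical choices, not the real implementation.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical shared secret, for illustration only


def encode_token(cust_id, client_ip, ts, node_id, referer) -> str:
    """Pack the token fields into a URL-safe string (encoding is an assumption)."""
    payload = json.dumps([cust_id, client_ip, ts, node_id, referer],
                         separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode()).decode().rstrip("=")


def make_beacon(secret: bytes, token: str) -> str:
    """beacon = HMAC(secret, token); the digest and truncation are assumptions."""
    digest = hmac.new(secret, token.encode(), hashlib.sha256).hexdigest()
    return digest[:32]  # stays well under the 63-octet limit for a DNS label


def map_clients_to_recursives(http_log, dns_log) -> dict:
    """Join (time, client_ip, beacon) with (time, recursive_ip, beacon) on beacon."""
    recursive_by_beacon = {beacon: rip for _, rip, beacon in dns_log}
    return {
        client_ip: recursive_by_beacon[beacon]
        for _, client_ip, beacon in http_log
        if beacon in recursive_by_beacon
    }


token = encode_token("cust42", "192.0.2.10", 1444000000, "node-1",
                     "https://example.com/")
beacon = make_beacon(SECRET, token)

# One record from each side, as listed on the slide.
http_log = [(1444000000.0, "192.0.2.10", beacon)]
dns_log = [(1444000000.2, "198.51.100.53", beacon)]
mapping = map_clients_to_recursives(http_log, dns_log)
print(mapping)
```

Because the beacon differs for every page load, the recursive resolver cannot have it cached, so the lookup reaches the authoritative server and the recursive's address can be paired with the client's.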

Page 14: Cooking – What’s going on in our Data Kitchen?

Aggregated @ 5min, 1H & 1D

MHD: raw MHD formatted data at one minute granularity

Client IP STATS Histograms: 5 minute timing histograms across 6 latency features

DNS IP STATS Histograms: 5 minute timing histograms across 6 latency features

IP Maps: Client Recursive, Recursive Client

Client IP Sets: Typed Label IP Sets; Latencies; Country, City, Continent, ASN

DNS IP Sets: Typed Label IP Sets; Latencies; Country, City, Continent, ASN

Correlation: Scores and Ranks, daily by Origin for every TYLIP feature

All data is GEO Redundant: Gathering, Raw, Intermediates & Aggregates

Geo annotated real-time API

Gatherers
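As a rough illustration of the cooking step, the sketch below builds 5-minute latency histograms from raw samples and then rolls them up into hourly windows by summing bin counts. The bin edges, the sample format and the function names are assumptions made for this example; the slide does not say how the histograms are actually laid out.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bin edges in milliseconds. Bin 0 holds latencies below the first
# edge; bin i holds latencies in [edge[i-1], edge[i]); the last bin is open-ended.
BIN_EDGES_MS = [10, 25, 50, 100, 250, 500, 1000]


def window_start(ts: float, width_s: int) -> int:
    """Start of the fixed-width window that contains timestamp ts."""
    return int(ts // width_s) * width_s


def empty_histogram() -> list:
    return [0] * (len(BIN_EDGES_MS) + 1)


def histogram_by_window(samples, width_s: int = 300) -> dict:
    """samples: iterable of (unix_ts, latency_ms) -> {window_start: bin counts}."""
    hists = defaultdict(empty_histogram)
    for ts, latency_ms in samples:
        hists[window_start(ts, width_s)][bisect_right(BIN_EDGES_MS, latency_ms)] += 1
    return dict(hists)


def roll_up(hists: dict, width_s: int) -> dict:
    """Merge fine-grained histograms into coarser windows by summing bin counts."""
    merged = defaultdict(empty_histogram)
    for start, counts in hists.items():
        target = merged[window_start(start, width_s)]
        for i, count in enumerate(counts):
            target[i] += count
    return dict(merged)


samples = [(0, 5), (10, 30), (299, 30), (300, 120), (3599, 2000), (3600, 7)]
five_min = histogram_by_window(samples, 300)
hourly = roll_up(five_min, 3600)
```

Histograms with shared bin edges can be merged by simple addition, which is what makes the 5-minute, 1-hour and 1-day levels cheap to derive from one another without revisiting raw data.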

Page 15: QUESTIONS?

