Managing Virtual Sprawl

Post on 19-Jun-2015



http://twitter.com/jhitchco


Managing Virtual Sprawl (How not to let this happen to you)

Jeremy Hitchcock, jeremy@dyn.com

Why care?

What you have

What you want

What you got

Managing clouds the way you manage single systems multiplies "system" management work by 10x to 20x

Clouds Promise

• Greater efficiency

• Faster deploys/less management

• Little/no capital costs and no step functions

Sprawl Eats Potential

• Greater efficiency

• Faster deploys/less management

• Little/no capital costs and no step functions

15 years 3 years

Just not good, yet

Don’t just change broken light bulbs

Wait until it gets dark, then change them all

Let’s get started

1. Architectures

2. Pain points

3. Best practices

4. What do we get?

1: Architectures

• Architecture changes

• Decoupling

• Geography/load balancing

• Disaster recovery

Architecture diagrams: 2004 and 2007-2008 (Opera dynamic resource pricing model)

Decoupling

• Apps and infrastructure mirror each other

• Years of coupled development

• Hard to retrofit, easier to do from start

Decoupling

Old: Web, App, DB

New: Dispatcher, Processing, Storage
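The dispatcher/processing/storage split above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: an in-process `queue.Queue` stands in for the dispatcher, threads stand in for processing nodes, and a dict stands in for the storage tier. Squaring a number is a placeholder for real work.

```python
import queue
import threading


def dispatcher(job_queue, items, num_workers):
    """Hand out work, then one stop marker (None) per processing worker."""
    for item in items:
        job_queue.put(item)
    for _ in range(num_workers):
        job_queue.put(None)


def processing_worker(job_queue, storage, lock):
    """Pull jobs until a stop marker arrives; write results to shared storage."""
    while True:
        item = job_queue.get()
        if item is None:
            break
        result = item * item  # placeholder for real processing
        with lock:
            storage[item] = result


def run_pipeline(items, num_workers=3):
    job_queue = queue.Queue()
    storage = {}  # stand-in for a separate storage tier
    lock = threading.Lock()
    workers = [
        threading.Thread(target=processing_worker, args=(job_queue, storage, lock))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    dispatcher(job_queue, items, num_workers)
    for w in workers:
        w.join()
    return storage


print(sorted(run_pipeline([1, 2, 3, 4]).items()))  # [(1, 1), (2, 4), (3, 9), (4, 16)]
```

Because each tier only talks to the queue or the store, the number of processing workers can change without touching the dispatcher or the storage code.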

Decoupling is Hard

• Logging/debugging

• Common scratch

• Images and provisioning

• Configuration data (run/boot)

• Job dispatch (async/sync)

Images and provisioning

Publish New Code

Add __ new front ends

Even better if that is automatic
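One way to make "add __ new front ends" automatic is a sizing rule that a controller evaluates periodically. The sketch below is hypothetical: the per-front-end capacity, the 25% headroom, and the function name are assumptions, and actually launching instances is left out.

```python
import math


def frontends_to_add(current_rps, rps_per_frontend, running, headroom=0.25):
    """Hypothetical sizing rule: cover current load plus headroom, never negative."""
    if rps_per_frontend <= 0:
        raise ValueError("rps_per_frontend must be positive")
    needed = math.ceil(current_rps * (1 + headroom) / rps_per_frontend)
    return max(0, needed - running)


print(frontends_to_add(1000, 200, running=4))  # 3
print(frontends_to_add(100, 200, running=4))   # 0
```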

Configuration Data

• Most config data is on each image

• Instead, auto populate into source control

• Config, image, controller re-architected
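A minimal sketch of pulling configuration out of the image: one template and a per-role settings table, both assumed to live in source control, are rendered when an instance boots. The template fields, role names, and hostnames are hypothetical.

```python
from string import Template

# Hypothetical central template, kept in source control instead of baked into images.
APP_CONF = Template("listen_port = $port\ndb_host = $db_host\nrole = $role\n")

# Hypothetical per-role settings, also tracked in source control.
ROLES = {
    "frontend": {"port": 80, "db_host": "db1.internal"},
    "worker": {"port": 8080, "db_host": "db1.internal"},
}


def render_config(role):
    """Build the config file a freshly booted instance of this role should use."""
    settings = ROLES[role]
    return APP_CONF.substitute(role=role, **settings)


print(render_config("worker"))
```

The image stays generic; changing a setting becomes a commit rather than a rebuilt image.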

Job Dispatch (sync)

Read photo off disc

Request for photo

Log

Resize/reformat

Return photo to user

Job Dispatch (async)

Read photo off disc

Request for photo

Log

Resize/reformat

Return photo to user

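The difference between the sync and async photo flows can be sketched with Python's `concurrent.futures`. The read, resize, and log functions are hypothetical stand-ins; the point is only that the async version hands logging to a background worker instead of making the user wait for it.

```python
from concurrent.futures import ThreadPoolExecutor

log_pool = ThreadPoolExecutor(max_workers=1)  # background logging worker
log_lines = []


def read_photo(name):
    return b"raw:" + name.encode()  # stand-in for reading off disc


def resize(data):
    return data.upper()  # stand-in for resize/reformat


def write_log(name):
    log_lines.append(f"served {name}")


def serve_sync(name):
    data = resize(read_photo(name))
    write_log(name)  # the user waits for the log write too
    return data


def serve_async(name):
    log_pool.submit(write_log, name)  # logging proceeds without blocking the reply
    return resize(read_photo(name))
```

Call `log_pool.shutdown(wait=True)` on exit so queued log writes are not lost.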

Geography/load balancing

• Data centers do not house eyeballs

• Intra/inter-site load balancing

• Names to numbers (users think names)

• Between clouds/interoperability?
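Inter-site load balancing ultimately maps a user to a site. A minimal, hypothetical policy: prefer a healthy site in the client's region, otherwise the healthy site with the lowest measured latency. The site list, regions, and latency figures are made up for illustration.

```python
# Hypothetical sites with health and latency as seen by monitoring.
SITES = [
    {"name": "nyc", "region": "us-east", "latency_ms": 20, "healthy": True},
    {"name": "sjc", "region": "us-west", "latency_ms": 70, "healthy": True},
    {"name": "ams", "region": "eu", "latency_ms": 90, "healthy": False},
]


def pick_site(sites, client_region):
    """Prefer a healthy local site; fall back to the fastest healthy site."""
    healthy = [s for s in sites if s["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy sites")
    local = [s for s in healthy if s["region"] == client_region]
    candidates = local or healthy
    return min(candidates, key=lambda s: s["latency_ms"])["name"]


print(pick_site(SITES, "us-west"))  # sjc
print(pick_site(SITES, "eu"))       # nyc (ams is down)
```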

Disaster Recovery

• Practice them

• Failovers should be automatic

• DNS (Quick DNS nit: use short TTLs)

• Contingency plans
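Why short TTLs matter for automatic failover: a rough upper bound on how long clients keep going to a dead site is failure detection time plus the time to publish the new record plus the record's TTL, since resolvers may keep serving the cached answer until it expires. The health-check interval and failure threshold below are assumptions, and some resolvers do not honor TTLs exactly.

```python
def worst_case_outage_seconds(ttl, check_interval, failures_to_trip, update_delay=0):
    """Rough bound: detect the failure, publish the change, wait out cached answers."""
    detection = check_interval * failures_to_trip
    return detection + update_delay + ttl


print(worst_case_outage_seconds(300, 30, 3))    # 390, about 6.5 minutes
print(worst_case_outage_seconds(86400, 30, 3))  # 86490, just over a day
```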

Case Study: Authorize.net

; QUESTION SECTION:
;secure.authorize.net. IN A

;; ANSWER SECTION:
secure.authorize.net. 86400 IN A 64.94.118.32
secure.authorize.net. 86400 IN A 64.94.118.33

GAH!
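The problem in the answer above is the 86400-second (24-hour) TTL: after a failover, resolvers may keep handing out the old addresses for up to a day. A minimal sketch that scans `dig`-style answer lines and flags long TTLs; the 300-second threshold is an assumption, not a standard.

```python
def long_ttl_records(answer_text, max_ttl=300):
    """Return (name, type, data, ttl) for answer records whose TTL exceeds max_ttl."""
    flagged = []
    for line in answer_text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blanks and dig comment/section lines
        fields = line.split()
        # Expected layout: name TTL class type data
        if len(fields) >= 5 and fields[1].isdigit():
            name, ttl, _klass, rtype, data = fields[:5]
            if int(ttl) > max_ttl:
                flagged.append((name, rtype, data, int(ttl)))
    return flagged


ANSWER = """\
;; ANSWER SECTION:
secure.authorize.net. 86400 IN A 64.94.118.32
secure.authorize.net. 86400 IN A 64.94.118.33
"""

for record in long_ttl_records(ANSWER):
    print(record)
```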

2: Pain Points

• Inventory

• Delivery speed

• Supply/demand

• Configuration

• Points of failure

“I can ping it but I don’t know where it is!”

Inventory

• Does it matter?

• Not asset tags, but provisioning scripts

• Audit bills (operational costs)
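Auditing bills against provisioning scripts answers "I can ping it but I don't know where it is!" A minimal sketch, assuming you can export instance IDs from both the provider's bill and your provisioning records (the IDs below are made up):

```python
def reconcile(billed_ids, provisioned_ids):
    """Compare what you pay for against what your provisioning scripts know about."""
    billed, provisioned = set(billed_ids), set(provisioned_ids)
    return {
        "unknown": sorted(billed - provisioned),  # billed, but no script accounts for it
        "missing": sorted(provisioned - billed),  # expected, but nothing is billed
    }


print(reconcile(["i-1", "i-2", "i-9"], ["i-1", "i-2", "i-3"]))
# {'unknown': ['i-9'], 'missing': ['i-3']}
```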

Delivery Speed

• May actually suffer (more pieces, not iron)

• Be analytical about what can be slow

• Limiting factor of what’s virtualized

• Were you looking before?

• Where is the testing from?

• Is this load dependent?

• Do users notice/care?

• Does it matter?

• Cost to make it faster?

• Savings to make it slower?
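Being analytical about what can be slow starts with summarizing response times per test location rather than one global average, since a single slow outlier can dominate an average. A minimal sketch using the standard `statistics` module; the locations and samples (in milliseconds) are hypothetical.

```python
import statistics


def summarize(samples_by_location):
    """Median and approximate 95th percentile of response times per test location."""
    summary = {}
    for location, samples in samples_by_location.items():
        cuts = statistics.quantiles(samples, n=20, method="inclusive")  # 19 cut points
        summary[location] = {"p50": statistics.median(samples), "p95": cuts[-1]}
    return summary


print(summarize({"boston": [100, 120, 110, 130, 500]}))
```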

Delivery Speed (graph from Gomez)

Supply/demand

• Capital investments versus operating costs

• From big architecture changes to constant tuning

• Sampling time
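The capital-versus-operating trade-off can be made concrete by pricing the same demand curve two ways: servers provisioned for the peak (a step function) versus servers resized every sampling interval. A minimal sketch with hypothetical demand, capacity, and price; the elastic figure assumes you really can resize at every sample, which is where sampling time matters.

```python
import math


def fixed_cost(demand, capacity_per_server, cost_per_server_interval):
    """Provision once for the peak and pay for it every interval."""
    servers = math.ceil(max(demand) / capacity_per_server)
    return servers * cost_per_server_interval * len(demand)


def elastic_cost(demand, capacity_per_server, cost_per_server_interval):
    """Resize every sampling interval to just cover that interval's demand."""
    return sum(math.ceil(d / capacity_per_server) * cost_per_server_interval for d in demand)


demand = [100, 150, 400, 120]  # hypothetical requests/sec per sampling interval
print(fixed_cost(demand, 100, 1), elastic_cost(demand, 100, 1))  # 16 9
```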

Configuration

• Configuration in source control

• Has to move to a centralized location

• Patches, updates, revision images

• A lot of hard work here (no direct return)
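Once the reference configuration lives in one central place, drift is cheap to detect: fingerprint the source-controlled copy and compare it with what each host actually runs. A minimal sketch; how you collect the deployed files is left open, and the host names are hypothetical.

```python
import hashlib


def fingerprint(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def drifted_hosts(expected_config, deployed_by_host):
    """Hosts whose deployed config differs from the source-controlled version."""
    want = fingerprint(expected_config)
    return sorted(host for host, cfg in deployed_by_host.items() if fingerprint(cfg) != want)


print(drifted_hosts("a=1\n", {"web1": "a=1\n", "web2": "a=2\n"}))  # ['web2']
```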

Points of Failure

• It’s about risk

• All in the name, DNS

• 99.9% is different from 99.99%

• Any page is better than nothing
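How different 99.9% is from 99.99% is easiest to see as allowed downtime per year (365-day year assumed): roughly 8.8 hours versus roughly 53 minutes.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_minutes_per_year(availability_pct):
    """Minutes per year a service may be down and still meet the availability target."""
    return HOURS_PER_YEAR * 60 * (100 - availability_pct) / 100


print(round(downtime_minutes_per_year(99.9), 1))   # 525.6 (about 8.8 hours)
print(round(downtime_minutes_per_year(99.99), 1))  # 52.6
```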

3: Best Practices

• App rewrite

• Controller (code, monitoring)

• Configuration (Chef, Puppet, etc.)

• Dev/staging/production (Django/Rails)

• Security

• Monitoring and verification

Dev/Staging/Production

• This stuff works, use it

• Clouds make this possible

• ONLY exception is load testing (big exception)

• Nothing is going to work out of the box
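Frameworks already encode the dev/staging/production split (Rails via `RAILS_ENV`, Django via `DJANGO_SETTINGS_MODULE`). Outside a framework, a minimal sketch is one table of per-environment differences selected at startup; the `APP_ENV` variable name and the hosts are hypothetical.

```python
import os

# Hypothetical per-environment settings; only the differences live here.
SETTINGS = {
    "dev": {"debug": True, "db_host": "localhost"},
    "staging": {"debug": False, "db_host": "staging-db.internal"},
    "production": {"debug": False, "db_host": "prod-db.internal"},
}


def load_settings(env=None):
    """Pick settings from an explicit argument or the (hypothetical) APP_ENV variable."""
    env = env or os.environ.get("APP_ENV", "dev")
    if env not in SETTINGS:
        raise ValueError(f"unknown environment: {env}")
    return {"env": env, **SETTINGS[env]}


print(load_settings("staging"))
```

Keeping staging identical to production except for this table is what makes a staging test meaningful.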

Security

• No “behind the firewall”

• Not an afterthought, but a core feature

• Something to test

• Two hash encryption (private data)

• Centralized management makes security easier (at least, double or nothing)
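The slide does not spell out the "two hash" scheme, so as a swapped-in, clearly labeled illustration: for private data that only ever needs to be checked (not read back), salted, iterated hashing with PBKDF2 from Python's standard library is one common approach. The iteration count is an assumption to tune for your hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumption; raise as hardware allows


def protect(secret):
    """Return (salt, digest); store both, never the secret itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify(secret, salt, digest):
    """Recompute with the stored salt; compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```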

Monitoring and Verification

What you monitor

What your user sees

Are they the same? Test transactions

4: What do we get?

• More choice on availability

• Fewer step functions (capacity, cost)

• Lower marginal cost of computation

Final Remarks

• Sprawl eats away at the promised good

• Never truly decoupled; apps dictate architecture

• Management tools still lacking, more homegrown

• Make it all automatic, not easy

Questions?

Jeremy Hitchcock, jeremy@dyn.com

DynDNS.com offers a suite of DNS, email, domain registration and virtual servers for the home and small business user.

The Dynect Platform provides the enterprise with external managed DNS and traffic management services.