Public Clouds (EC2, Azure, Rackspace, …)

Date post: 23-Feb-2016
Upload: yin
Transcript
Page 1: Public Clouds (EC2, Azure, Rackspace, …)

Multi-tenancy: different customers’ virtual machines (VMs) share the same server

Provider: Why multi-tenancy?
• Improved resource utilization
• Benefits of economies of scale

Tenant: Why Cloud?
• Pay-as-you-go
• Infinite Resources
• Cheaper Resources

Page 2: Public Clouds (EC2, Azure, Rackspace, …)

Implications of Multi-tenancy

• VMs share many resources
– CPU, cache, memory, disk, network, etc.
• Virtual Machine Managers (VMMs)
– Goal: Provide isolation
• Deployed VMMs don’t perfectly isolate VMs
– Side-channels [Ristenpart et al. ’09, Zhang et al. ’12]

[Figure: two VMs on one VMM]

Page 3: Public Clouds (EC2, Azure, Rackspace, …)

Lies Told by the Cloud

• Infinite resources
• All VMs are created equally
• Perfect isolation

Page 4: Public Clouds (EC2, Azure, Rackspace, …)

This Talk

Taking control of where your instances run
• Are all VMs created equally?
• How much variation exists and why?
• Can we take advantage of the variation to improve performance?

Gaining performance at any cost
• Can users impact each other’s performance?
• Is there a way to maliciously steal another user’s resource?
• Is there …

Page 5: Public Clouds (EC2, Azure, Rackspace, …)

Heterogeneity in EC2

• Causes of heterogeneity:
– Contention for resources: you are sharing!
– CPU variation:
• Upgrades over time
• Replacement of failed machines
– Network variation:
• Different path lengths
• Different levels of oversubscription

Page 6: Public Clouds (EC2, Azure, Rackspace, …)

Are All VMs Created Equally?

• Inter-architecture:
– Are there differences between architectures?
– Can this be used to predict performance a priori?
• Intra-architecture:
– Within an architecture
– If large, then you can’t predict performance
• Temporal:
– On the same VM over time?
– There is no hope!

Page 7: Public Clouds (EC2, Azure, Rackspace, …)

Benchmark Suite & Methodology

• Methodology:
– 6 workloads
– 20 VMs (small instances) for 1 week
– Each VM runs micro-benchmarks every hour

Page 8: Public Clouds (EC2, Azure, Rackspace, …)

Inter-Architecture

Page 9: Public Clouds (EC2, Azure, Rackspace, …)

Intra-Architecture

CPU is predictable: less than 15%
Storage is unpredictable: as high as 250%
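
The slide does not say how the 15% and 250% figures were computed. As a minimal sketch, assuming variation is measured as the spread between the best and worst instance relative to the worst, the arithmetic might look like the following. The function name `percent_spread` and the sample scores are made up for illustration; they are not measurements from the talk.

```python
def percent_spread(scores):
    """Spread between best and worst score, as a percentage of the worst."""
    lowest, highest = min(scores), max(scores)
    if lowest <= 0:
        raise ValueError("scores must be positive")
    return (highest - lowest) / lowest * 100.0


# Hypothetical benchmark scores for instances of the same architecture.
cpu_scores = [100.0, 105.0, 110.0, 114.0]   # made-up values
storage_scores = [20.0, 35.0, 70.0]         # made-up values

cpu_spread = percent_spread(cpu_scores)
storage_spread = percent_spread(storage_scores)
print(f"CPU spread: {cpu_spread:.0f}%")
print(f"Storage spread: {storage_spread:.0f}%")
```

Other definitions (for example, standard deviation over the mean) would give different numbers for the same data, so the metric matters when comparing against the percentages quoted here.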

Page 10: Public Clouds (EC2, Azure, Rackspace, …)

Temporal

Page 11: Public Clouds (EC2, Azure, Rackspace, …)

Overall

CPU type can only be used to predict CPU performance

For memory- or I/O-bound jobs, you need to learn empirically how good an instance is

Page 12: Public Clouds (EC2, Azure, Rackspace, …)

What Can We Do About It?

• Goal: Run VMs on the best instances
• Constraints:
– Can’t control placement: can’t control which instance the cloud gives us
– Can’t migrate
• Placement gaming:
– Try to find the best instances simply by starting and stopping VMs

Page 13: Public Clouds (EC2, Azure, Rackspace, …)

Measurement Methodology

• Deploy on Amazon EC2
– A=10 instances
– 12 hours
• Compare against no strategy:
– Run initial machines with no strategy
• Baseline varies for each run
– Re-use machines for strategy

Page 14: Public Clouds (EC2, Azure, Rackspace, …)

EC2 results

[Figure: Baseline vs. Strategy over runs 1 to 3, in two panels: Apache (MB/sec, axis 0 to 100) and NER (Records/sec, axis 8 to 12). Annotation: 16 migrations.]

Page 15: Public Clouds (EC2, Azure, Rackspace, …)

Placement Gaming

• Approach:
– Start a bunch of extra instances
– Rank them based on performance
– Kill the underperforming instances
• Those performing worse than average
– Start new instances
• Interesting questions:
– How many instances should be killed in each round?
– How frequently should you evaluate the performance of instances?
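
The approach on this slide can be sketched as a short loop. This is an illustration under stated assumptions, not the authors' implementation: `placement_gaming` and its parameters are hypothetical names, `launch` stands in for whatever starts an instance and benchmarks it, and the random performance model is invented. The number of extra instances and rounds correspond to the open questions on the slide, so they are left as parameters.

```python
import random


def placement_gaming(launch, n_needed, n_extra, rounds):
    """Return the n_needed best performance scores found.

    launch: hypothetical callable that starts one instance and returns
            its measured performance (higher is better).
    """
    # Start the instances we need plus some extras.
    pool = [launch() for _ in range(n_needed + n_extra)]
    for _ in range(rounds):
        average = sum(pool) / len(pool)
        # Kill instances performing worse than average...
        survivors = [score for score in pool if score >= average]
        # ...and start new ones in their place.
        replacements = [launch() for _ in range(len(pool) - len(survivors))]
        pool = survivors + replacements
    # Rank by performance and keep only what we need.
    return sorted(pool, reverse=True)[:n_needed]


# Stand-in for a real cloud: performance drawn at random (an assumption).
rng = random.Random(0)
best = placement_gaming(lambda: rng.uniform(50, 100), n_needed=10,
                        n_extra=5, rounds=3)
print(best)
```

Because the best instance in a round is never below the average, it always survives, so the top score in the pool cannot get worse from one round to the next. A real deployment would also have to weigh the cost of the extra instances and of each restart.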

Page 16: Public Clouds (EC2, Azure, Rackspace, …)

Resource-Freeing Attacks: Improve Your Cloud Performance (at Your Neighbor's Expense)

(Venkat)anathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift

DEPARTMENT OF COMPUTER SCIENCES

Page 17: Public Clouds (EC2, Azure, Rackspace, …)

Contention in Xen

• Same core
– Same core, same L1 cache, and same memory
• Same package
– Different cores, but share the last-level cache and memory
• Different package
– Different cores and different caches, but share memory

Page 18: Public Clouds (EC2, Azure, Rackspace, …)

I/O contends with self

• VMs contend for the same resource
– Network with Network:
• More VMs, so each fair share is smaller
– Disk I/O with Disk I/O:
• More disk accesses, so longer seek times
• Xen does N/W batching to give better performance
– BUT: this adds jitter and delay
– ALSO: you can get more than your fair share because of the batch

Page 20: Public Clouds (EC2, Azure, Rackspace, …)

Everyone Contends with Cache

• No contention on same core
– VMs run serially, so access to the cache is serial
• No contention on different packages
– VMs use different caches
• Lots of contention on the same package
– VMs run in parallel but share the same cache
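
The three cases can be written down as a small decision rule. The sketch below is only an illustration: the `Placement` type and the `cache_contention` function are hypothetical names, and describing a VM's location as a (package, core) pair is an assumption about how placement might be represented.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """Where a VM's virtual CPU is pinned (hypothetical representation)."""
    package: int
    core: int


def cache_contention(a: Placement, b: Placement) -> str:
    """Classify cache contention between two VMs using the three cases."""
    if a.package != b.package:
        # Different packages: each VM has its own cache.
        return "none: different caches"
    if a.core == b.core:
        # Same core: the VMs take turns, so cache access is serial.
        return "none: serialized on one core"
    # Same package, different cores: parallel execution on a shared cache.
    return "high: parallel execution, shared cache"


for other in (Placement(0, 0), Placement(0, 1), Placement(1, 0)):
    print(other, cache_contention(Placement(0, 0), other))
```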

Page 21: Public Clouds (EC2, Azure, Rackspace, …)

Contention in Xen

[Figure: performance degradation (%) under CPU, Net, Disk, and Cache contention; axis 0 to 600. Labels: Non-work-conserving CPU scheduling; Work-conserving scheduling.]

Local Xen Testbed
Machine: Intel Xeon E5430, 2.66 GHz
CPU: 2 packages, each with 2 cores
Cache Size: 6MB per package

3x-6x performance loss, and so higher cost

Page 22: Public Clouds (EC2, Azure, Rackspace, …)

What can a tenant do?

• Pack up VM and move (see our SOCC 2012 paper)
– … but not all workloads are cheap to move
• Ask provider for better isolation
– … requires overhaul of the cloud
• This work: a greedy customer can recover performance by interfering with other tenants
– Resource-Freeing Attack

Page 23: Public Clouds (EC2, Azure, Rackspace, …)

Resource-freeing attacks (RFAs)

• What is an RFA?
• RFA case studies
1. Two highly loaded web server VMs
2. Last Level Cache (LLC) bound VM and highly loaded webserver VM
• Demonstration on Amazon EC2

Page 24: Public Clouds (EC2, Azure, Rackspace, …)

The Setting

Victim:
– One or more VMs
– Public interface (e.g., http)
Beneficiary:
– VM whose performance we want to improve
Helper:
– Mounts the attack

Beneficiary and victim are fighting over a target resource

[Figure: Helper, Victim VM, and Beneficiary VM]

Page 25: Public Clouds (EC2, Azure, Rackspace, …)

Example: Network Contention

• Beneficiary & Victim
– Apache webservers hosting static and dynamic (CGI) web pages
• Target Resource: Network Bandwidth
• Work-conserving scheduler
– network bandwidth

What can you do?

[Figure: Local Xen testbed with Victim, Beneficiary, Net, and Clients]

Page 26: Public Clouds (EC2, Azure, Rackspace, …)

Recipe for a Successful RFA

Shift resource use away from the target resource towards the bottleneck resource

Shift resource usage via public interface

[Figure: proportion of CPU usage vs. proportion of network usage, with static pages and CPU-intensive dynamic pages marked. Annotations: Push towards CPU bottleneck; Reduce target resource usage; Limits.]

Page 27: Public Clouds (EC2, Azure, Rackspace, …)

An RFA in Our Example

[Figure: Helper, CGI request, CPU utilization, Net, and Clients]

Result in our testbed: increases beneficiary’s share of bandwidth
• Share of bandwidth: 50% to 85%
• No RFA: 1800 page requests/sec
• W/ RFA: 3026 page requests/sec
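
One way to see why the beneficiary's share grows: under a work-conserving scheduler, bandwidth that one VM does not use is handed to the others. The sketch below models this as max-min fair sharing, which is an assumption made for illustration and not necessarily how Xen schedules the network. The function name and the demand values (in percent of link capacity) are hypothetical; the victim's reduced demand of 15 is chosen so that the result lines up with the 85% share reported here.

```python
def work_conserving_shares(capacity, demands):
    """Max-min fair split of capacity; unused share is redistributed."""
    allocation = {}
    remaining = float(capacity)
    unsatisfied = dict(demands)
    while unsatisfied:
        fair = remaining / len(unsatisfied)
        # VMs that want no more than the fair share get their full demand.
        satisfied = [name for name, d in unsatisfied.items() if d <= fair]
        if not satisfied:
            # Everyone left wants more than the fair share: split evenly.
            for name in unsatisfied:
                allocation[name] = fair
            break
        for name in satisfied:
            allocation[name] = float(unsatisfied.pop(name))
            remaining -= allocation[name]
    return allocation


# Both webservers saturate the link: each gets half.
no_rfa = work_conserving_shares(100, {"beneficiary": 100, "victim": 100})
# Under the RFA the victim is CPU-bound and can only use 15 (made-up value).
with_rfa = work_conserving_shares(100, {"beneficiary": 100, "victim": 15})
print(no_rfa, with_rfa)
```

In this toy model the helper never touches the network directly. It only pushes the victim onto a different bottleneck (CPU), and the scheduler then hands the freed bandwidth to the beneficiary.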

Page 28: Public Clouds (EC2, Azure, Rackspace, …)

Resource-freeing attacks
1) Send targeted requests to victim
2) Shift resource use from target to a bottleneck

Can we mount RFAs when the target resource is CPU cache?

Shared CPU Cache:
– Ubiquitous: Almost all workloads need cache
– Hardware controlled: Not easily isolated via software
– Performance sensitive: High performance cost!

Page 29: Public Clouds (EC2, Azure, Rackspace, …)

Cache Contention

[Figure: cache performance degradation (%) (axis 0 to 250) vs. webserver request rate (1000 to 3000). Annotation: RFA Goal.]

Page 30: Public Clouds (EC2, Azure, Rackspace, …)

Case Study: Cache vs. Network

• Victim: Apache webserver hosting static and dynamic (CGI) web pages
• Beneficiary: Synthetic cache-bound workload (LLCProbe)
• Target Resource: Cache
• No cache isolation:
– ~3x slower when sharing cache with webserver

[Figure: Local Xen testbed with Victim and Beneficiary on two cores, a shared Cache, Net, and Clients]

Page 31: Public Clouds (EC2, Azure, Rackspace, …)

Cache vs. Network

Victim webserver frequently interrupts and pollutes the cache
– Reason: Xen gives higher priority to the VM consuming less CPU time

[Figure: cache state timeline for a heavily loaded web server. Labels: Beneficiary starts to run; Webserver receives a request; decreased cache efficiency.]

Page 32: Public Clouds (EC2, Azure, Rackspace, …)

Cache vs. Network w/ RFA

RFA helps in two ways:
1. The webserver loses its priority.
2. The capacity of the webserver is reduced.

[Figure: cache state timeline for heavily loaded webserver requests under RFA. Labels: Helper; CGI request; Beneficiary starts to run; Webserver receives a request.]

Page 33: Public Clouds (EC2, Azure, Rackspace, …)

RFA: Performance Improvement

RFA intensities: time in ms per second

[Figure annotations: 196% slowdown; 86% slowdown; 60% performance improvement]
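
The 60% figure is consistent with one reading of the two slowdowns: if a slowdown of s% means running at (1 + s/100) times the uncontended runtime, then going from a 196% to an 86% slowdown is a speedup of 2.96/1.86, roughly 1.59x. This interpretation is an assumption; the slide does not state how the improvement was computed. The helper name below is hypothetical.

```python
def improvement_percent(slowdown_before, slowdown_after):
    """Speedup, in percent, from reducing a slowdown (both in percent)."""
    # Convert each slowdown to a runtime relative to the uncontended case.
    runtime_before = 1 + slowdown_before / 100
    runtime_after = 1 + slowdown_after / 100
    return (runtime_before / runtime_after - 1) * 100


gain = improvement_percent(196, 86)
print(f"{gain:.1f}% improvement")
```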

Page 34: Public Clouds (EC2, Azure, Rackspace, …)

Discussion: Practical Aspects

RFA case studies used CPU intensive CGI requests
– Alternative: DoS vulnerabilities (e.g., hash-collision attacks)
Identifying co-resident victims
– Easy on most clouds (co-resident VMs have predictable internal IP addresses)
No public interface?
– Paper discusses possibilities for RFAs

Page 35: Public Clouds (EC2, Azure, Rackspace, …)

Limitations

• Experimental setup:
– Only 1 VM in each experiment
– Don’t vary the number of each type of job
