1
Black-box and Gray-box Strategies for Virtual Machine Migration
Timothy Wood, Prashant Shenoy, Arun Venkataramani, and Mazin Yousif †
Univ. of Massachusetts Amherst, †Intel
4th USENIX Symposium on Networked Systems Design & Implementation (NSDI 2007)
2
Introduction
• Applications operate in data centers.
– Data center resources must be managed effectively while meeting SLAs.
• Virtualization helps achieve this.
• Benefits of virtualization:
– Application isolation
– Server consolidation (multiplexing)
– Handling workload dynamics
3
Motivation
• Efficient data center resource management
– Live migration
• However, detecting workload hotspots and initiating a migration is currently handled manually.
– This lacks the agility to respond to sudden workload changes.
– Multiple resources must be considered:
• CPU, network, and memory
4
Solution
• Sandpiper: automated black-box and gray-box strategies for virtual machine migration
– Monitoring system resource usage
– Hotspot detection
– Determining a new mapping of VMs to physical servers
– Initiating the necessary migrations
5
The Sandpiper Architecture
• Monitoring engine: gathers resource usage statistics on each server.
• Nucleus (one per physical server): gathers processor, network, and memory swap statistics for each VM; in the gray-box case, a daemon inside each VM gathers OS-level statistics and application logs.
• Profiling engine: constructs resource usage profiles for each virtual server (and predicts PM workload).
• Hotspot detector: monitors usage profiles to detect hotspots. Hotspot: any resource exceeds a threshold (or an SLA is violated) for a sustained period.
• Migration manager: determines which virtual servers should migrate, where to move them, and how much of each resource to allocate to them after migration.
6
Black-box monitoring(1/4)
• VM workload usage is inferred solely from external observations.
– Made from Domain-0
• Monitored parameters:
– CPU usage
– Network bandwidth
– Memory swap rate
• Monitoring interval
7
Black-box monitoring(2/4)-CPU monitoring
• VM CPU usage can be determined by tracking scheduling events in the hypervisor.
– This does not include the CPU overhead of the VM’s disk I/O and network processing.
• That overhead is accounted to Domain-0.
• Each VM is therefore additionally charged:
– Domain-0’s CPU usage × (VM’s I/O requests / total I/O requests)
• Assumption: the overhead of the monitoring engine and the nucleus is negligible
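The charging rule above can be sketched as follows (a minimal illustration; the function and VM names are hypothetical, not Sandpiper’s actual code):

```python
def charge_vm_cpu(vm_cpu, dom0_cpu, io_requests):
    """Apportion Domain-0's CPU usage to VMs by their share of I/O requests.

    vm_cpu:      {vm: directly measured CPU fraction of that VM}
    dom0_cpu:    Domain-0's CPU fraction (I/O processing done on VMs' behalf)
    io_requests: {vm: number of I/O requests it issued in the interval}
    """
    total_io = sum(io_requests.values())
    charged = {}
    for vm, cpu in vm_cpu.items():
        share = io_requests.get(vm, 0) / total_io if total_io else 0.0
        charged[vm] = cpu + dom0_cpu * share
    return charged
```

For example, with Domain-0 at 20% CPU and VM1 issuing 75% of all I/O requests, VM1 is charged an extra 15%.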
8
Black-box monitoring(3/4)-Network monitoring
• Background:
– Domain-0 in Xen implements the network interface driver.
– VMs access the driver via clean device abstractions (the virtual firewall-router (VFR) interface).
• The monitoring engine can read each virtual NIC’s usage via the Linux /proc interface
– /proc/net/dev
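A minimal sketch of reading per-interface byte counters in the /proc/net/dev format (field positions follow the standard Linux layout; parsed from an embedded sample string here so the sketch runs anywhere):

```python
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
vif1.0:    1000      10    0    0    0     0          0         0     2000      20    0    0    0     0       0          0
"""

def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    stats = {}
    for line in text.splitlines()[2:]:      # the first two lines are headers
        iface, data = line.split(":", 1)
        fields = data.split()
        # field 0 = receive bytes, field 8 = transmit bytes
        stats[iface.strip()] = (int(fields[0]), int(fields[8]))
    return stats
```

In deployment the text would come from reading /proc/net/dev itself, and bandwidth usage is the byte-count delta between two samples divided by the monitoring interval.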
9
Black-box monitoring(4/4)-Memory monitoring
• Challenge:
– Domain-0 cannot directly monitor each VM’s actual memory usage/utilization.
• It only knows the amount of memory assigned to each VM.
• Solution:
– Observing swap activity from Domain-0 can be used to infer working set sizes [11].
[11] S. Jones, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. Geiger: Monitoring the buffer cache in a virtual machine environment. In Proc. ASPLOS’06, pages 13–23, October 2006.
10
Gray-box monitoring
• Motivation:
– Black-box monitoring cannot “peek inside” a VM to gather usage statistics.
• Solution:
– Install a light-weight monitoring daemon inside each virtual server.
– Use the /proc interface to gather OS-level statistics:
• CPU, network, memory
– Application-level statistics:
• The daemon obtains statistics from functions provided by the application itself.
• E.g., web/database server: request rate, request drop rate, service time
11
Profile Generation(1/2)
• Profile: a compact description of that server’s resource usage over a sliding time window W.
• Profile contents:
– Black-box parameters:
• CPU utilization, network bandwidth utilization, and memory swap rate
– Gray-box parameters:
• memory utilization, service time, request drop rate, and incoming request rate (assuming an Apache web server)
12
Profile Generation(2/2)
• Profile types:
– Distribution profile:
• The probability distribution of resource usage over the window W
– Time-series profile:
• The temporal fluctuations: simply a list of all reported observations within the window W
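Both profile types can be maintained from one sliding window; this sketch (hypothetical class and method names, window of 200 as on the implementation slide) keeps the raw series as the time-series profile and answers distribution queries by sorting on demand:

```python
from collections import deque

class UsageProfile:
    """Sliding-window profile: the raw series is the time-series profile;
    sorting it on demand yields the empirical distribution profile."""

    def __init__(self, window=200):
        self.series = deque(maxlen=window)   # old observations fall off

    def report(self, usage):
        self.series.append(usage)

    def percentile(self, p):
        """Value at percentile p (0-100) of the empirical distribution."""
        s = sorted(self.series)
        idx = min(len(s) - 1, int(p / 100.0 * len(s)))
        return s[idx]
```

A provisioning step would then read a high percentile (the distribution’s tail) as its peak estimate, while hotspot detection walks the raw series.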
13
Hotspot detection
• Goal:
– Signal a need for VM migration whenever SLA violations are detected.
• A hotspot is flagged only if thresholds or SLAs are exceeded for a sustained time:
– at least k of the n most recent observations, as well as the next predicted value, exceed the threshold.
• Uses the time-series profile
• The next value is predicted with the auto-regressive family of predictors, AR(1).
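The k-out-of-n test combined with a one-step forecast can be sketched as below; the AR(1) form and the coefficient value are illustrative assumptions, not the paper’s fitted model:

```python
def ar1_predict(series, a=0.7):
    """One-step AR(1) forecast: next = mean + a * (last - mean).
    The coefficient a = 0.7 is an illustrative choice."""
    mean = sum(series) / len(series)
    return mean + a * (series[-1] - mean)

def is_hotspot(series, threshold, k=3, n=5):
    """Flag a hotspot only if at least k of the n most recent
    observations AND the predicted next value exceed the threshold."""
    over = sum(1 for x in series[-n:] if x > threshold)
    return over >= k and ar1_predict(series) > threshold
```

Requiring both the recent history and the forecast to exceed the threshold filters out transient spikes that would otherwise trigger needless migrations.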
14
Resource Provisioning
• Goal:
– Ensure that SLAs are not violated even in the presence of peak workloads.
• Estimate the peak CPU, network, and memory requirements of each overloaded VM.
• Black-box provisioning
• Gray-box provisioning
15
Black-box provisioning(1/3)
• Estimating peak CPU & network bandwidth needs:
– Use the distribution profile
• The tail of the historical distribution predicts the peak.
• Challenge: estimation error!
• Background:
– Both the CPU scheduler and the network packet scheduler in Xen are work-conserving.
16
Black-box provisioning(2/3)
• Estimation error:
– Example: two virtual machines assigned CPU weights of 1:1 (a 50% share each)
• Assume VM1 is overloaded and requires 70% of the CPU to meet its peak needs.
– Case 1: VM2 uses only 20%, so the work-conserving scheduler lets VM1 consume 70%; the observation reflects VM1’s true peak need.
– Case 2: VM2 uses its full 50%, so VM1 is capped at 50%; the observed usage underestimates VM1’s true 70% need.
17
Black-box provisioning(3/3)
• Solution to the estimation error:
– Add a constant Δ to scale up the estimate.
• Estimating peak memory needs:
– If swap activity exceeds a threshold,
– then the current allocation is deemed insufficient and is increased by a constant amount Δm.
18
Gray-box provisioning(1/3)
• The gray-box approach has access to application-level logs.
– It can estimate the peak resource needs of the application even when the resource is fully utilized.
• Estimating peak CPU needs:
– An application model is necessary to estimate the peak CPU needs.
19
Gray-box provisioning(2/3)
• Estimating peak CPU needs (cont.):
– Applications such as web and database servers can be modeled as G/G/1 queuing systems [23].
– G/G/1 queuing system parameters [13]:
• s = mean service time (obtained from server logs)
• d = mean response time of a request (given by the SLA)
• λ = request arrival rate
• σ_a² = variance of inter-arrival time (obtained from server logs)
• σ_b² = variance of service time (obtained from server logs)
[23] B. Urgaonkar, P. Shenoy, A. Chandra, and P. Goyal. Dynamic provisioning for multi-tier internet applications. In Proc. ICAC ’05, June 2005.
[13] L. Kleinrock. Queueing Systems, Volume 2: Computer Applications. John Wiley and Sons, Inc., 1976.
20
Gray-box provisioning(3/3)
• Estimating peak CPU needs(cont.):– We can map the current CPU usage with , then the
peak CPU usage can be calculated:
• Estimating peak network bandwidth
– b = mean requested file size
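The provisioning arithmetic under the G/G/1 approximation λ_cap ≈ [s + (σ_a² + σ_b²)/(2(d − s))]⁻¹ can be sketched as follows (function names are hypothetical; the scaling step assumes CPU usage grows linearly with request rate, as the comment slide at the end notes):

```python
def lambda_cap(s, d, var_arrival, var_service):
    """Max request rate keeping mean response time within the SLA value d,
    per the G/G/1 approximation: 1 / (s + (var_a + var_b) / (2*(d - s)))."""
    return 1.0 / (s + (var_arrival + var_service) / (2.0 * (d - s)))

def peak_cpu(current_cpu, current_rate, cap_rate):
    """Scale current CPU usage linearly with request rate (linearity assumed)."""
    return current_cpu * cap_rate / current_rate

def peak_bandwidth(cap_rate, mean_file_size):
    """Peak network need: max request rate times mean requested file size b."""
    return cap_rate * mean_file_size
```

E.g., with s = 10 ms, SLA d = 50 ms, and small variances, λ_cap comes out near 1/s, and the peak estimates scale accordingly.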
21
Hotspot mitigation(1/3)
• Hotspot mitigation alg:
– Goal:
• Determine which VMs should migrate, and where, to dissipate the hotspot.
– Challenge:
• NP-hard: a multi-dimensional bin-packing problem
– Bins = physical servers, dimensions = resource constraints
– Solution:
• A heuristic that decides:
– Which overloaded VMs to migrate
– Where to migrate them, such that migration overhead is minimized
» Migration overhead cannot be neglected
22
Hotspot mitigation(2/3)
• Hotspot mitigation alg (cont.):
– Intuition:
• Move load from the most overloaded servers to the least-loaded servers,
• while minimizing the data copying incurred during migration.
– Volume: captures the degree of load along multiple dimensions in a unified fashion:
• Vol = 1 / ((1 − cpu) · (1 − net) · (1 − mem))
• where cpu, net, and mem are the corresponding utilizations of that resource for the virtual or physical server
23
Hotspot mitigation(3/3)
• Hotspot mitigation alg (cont.):
– Volume-to-size ratio (VSR):
• Volume/Size (Size = the memory footprint of the VM)
– Migration decision:
• Move the highest-VSR VM from the highest-volume server, and determine whether it can be housed on the lowest-volume physical server.
– Swap decision (only 2-way swaps are considered):
• Activated when a simple migration cannot resolve the hotspot.
• Swap the highest-VSR VM on the highest-volume hotspot server with the k lowest-VSR VMs on the lowest-volume server.
– If no swap can be found, the next least-loaded server is considered.
• Note: a swap may require a third server as temporary space (due to RAM constraints).
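One greedy step of the heuristic can be sketched as below, using the volume definition Vol = 1/((1−cpu)(1−net)(1−mem)); the data layout and names are illustrative, and a full implementation would also check destination capacity and fall back to swaps:

```python
def volume(cpu, net, mem):
    """Multi-dimensional load: grows sharply as any resource nears saturation."""
    return 1.0 / ((1.0 - cpu) * (1.0 - net) * (1.0 - mem))

def pick_migration(vms, pms):
    """One greedy step: the highest-VSR VM on the highest-volume PM,
    destined for the lowest-volume PM.

    pms: [{"name": str, "usage": (cpu, net, mem)}]
    vms: [{"name": str, "pm": str, "usage": (cpu, net, mem), "mem_mb": int}]
    """
    src = max(pms, key=lambda p: volume(*p["usage"]))
    dst = min(pms, key=lambda p: volume(*p["usage"]))
    on_src = [v for v in vms if v["pm"] == src["name"]]
    vm = max(on_src, key=lambda v: volume(*v["usage"]) / v["mem_mb"])
    return vm["name"], dst["name"]
```

Picking by VSR rather than raw volume frees the most load per byte of memory copied, which is what keeps migration overhead low.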
24
Implementation
• Virtualization platform:
– Xen
• Sandpiper control plane:
– Runs on the control node (implemented in Python)
• Profiling engine:
– Uses the past 200 measurements to generate profiles
• Hotspot trigger:
– k/n = 3/5: 3 of the 5 past readings, plus the next predicted value, over the threshold
• Default threshold:
– 75%
• Monitoring engine
• Gray-box monitoring daemon:
– Linux OS daemon; Apache module (service time, request rate, drop rate, file size)
25
Evaluation Environment
• Data center:
– 20 servers (2.4 GHz Pentium-4)
– Connected by Gigabit Ethernet
– At least 1 GB of RAM each
• OS:
– Linux 2.6.16 + Xen 3.0.2-3
• Workload generator:
– A cluster of Pentium-3 Linux servers
26
Experiment 1-Migration Effectiveness
• Experiment 1 uses 3 physical servers and 5 VMs with fixed memory allocations.
• All VMs run Apache serving dynamic PHP web pages.
• httperf is used to inject the workload.
27
Experiment 1-Migration Effectiveness(cont.)
• t=166: hotspot detected; VM1 has the highest VSR and PM3 has the lowest volume, so VM1 is migrated to PM3.
• t=362: hotspot detected; VM4 has the 2nd-highest VSR (no PM has enough spare capacity to host VM3, the highest-VSR VM), and PM1 has the lowest volume, so VM4 is migrated to PM1.
• In the final phase, VM1 and VM5 have the same volume, but VM5 uses less memory (higher VSR); PM2 has the lowest volume, so VM5 is migrated to PM2.
28
Experiment 2- Virtual Machine Swaps
• Experiment setting:

VM ID  RAM (MB)  Host Machine  CPU load type
VM1    384       PM1           Steadily increasing
VM2    384       PM1           Constant
VM3    256       PM2           Constant
VM4    256       PM2           Constant

• As before, clients use httperf to request dynamic PHP pages.
29
Experiment 2- Virtual Machine Swaps(cont.)
• Hotspot detected on PM1.
• The only viable solution is to swap VM2 with VM4 (a three-way swap using a scratch server).
• VM4 uses the smallest amount of memory, so it is migrated twice:
– When the migration of VM4 completes, VM2 starts migrating to PM2.
– When the migration of VM2 completes, VM4 starts migrating to PM1.
• The swap incurs visible migration overhead.
30
Experiment 3- Mixed resource workloads
• Experiment setting:
– VM2 is a database that stores its tables in memory.
– PM2 has more physical memory.

VM ID  Description                                 Host Machine  RAM (MB)
VM1    Network intensive                           PM1           256
VM2    Network intensive + memory grows over time  PM1           256
VM3    CPU intensive                               PM2           256
VM4    CPU intensive                               PM2           256
31
Experiment 3- Mixed resource workloads(cont.)
• PM1 has a network hotspot and PM2 has a CPU hotspot.
• Sandpiper swaps a network-intensive VM for a CPU-intensive VM at t=130.
32
Experiment 3- Mixed resource workloads(cont.)
• Sandpiper responds by increasing the VM’s RAM allocation in steps of 32 MB every time swapping is observed.
• When no additional RAM is available, the VM is swapped to the second physical server at t=430.
– The two network-intensive VMs (VM1 and VM2) are swapped.
33
Experiment 4- Gray v. Black: Memory Allocation
• Goal:
– Compare the effectiveness of the black-box and gray-box approaches in mitigating memory hotspots.
• The SPECjbb 2005 benchmark is used to generate memory pressure.
• Settings:

Host Machine  PM RAM (MB)  VM ID  VM RAM (MB)  Description
PM1           384          VM1    256          Gray-box strategy
PM2           384          VM2    256          Black-box strategy
PM3           1024         none   none         Idle server (waiting for migrations)
34
Experiment 4- Gray v.s. Black: Memory Allocation(cont.)
• Experiment Result:
• Observation:– The gray-box system can reduce or eliminate swapping without significant
overprovisioning of memory.
35
Experiment 4- Gray v.s. Black: Apache Performance
• Settings:
– httperf is used to generate requests for CPU-intensive PHP scripts on all VMs.

Host Machine  Hosting VMs  Actual VM CPU requirement
PM1           VM1~VM2      70%
PM1           VM3          33%
PM2           VM4          7%
PM3           none         none
36
Experiment 4- Gray v.s. Black: Apache Performance
• The black-box strategy makes erroneous estimates of CPU needs.
– (The original slide shows a four-panel figure of the resulting migrations.)
37
Experiment 4- Gray v.s. Black: Apache Performance(cont.)
• Comparing the gray-box strategy with the black-box strategy:
– The gray-box strategy can migrate VM3 to PM2 and VM1 to PM3 concurrently.
38
Experiment 5-Prototype Data Center Evaluation
• Data center environment:
– 16 servers running a total of 35 VMs
– 1 additional server runs the control plane
– 1 additional server is reserved as a scratch node for swaps
• Settings:
– Six physical servers, running a total of 14 VMs, are driven into overload.
• Four servers experience a CPU hotspot and two a network hotspot.
39
Experiment 5-Prototype Data Center Evaluation
• Result:
– Sandpiper eliminates the hotspots on all six servers by interval 60 (migration overhead is visible during this period).
40
Sandpiper overhead and scalability
• Sandpiper’s CPU and network overhead:
– Depends on the number of PMs and VMs in the data center.
– The gray-box strategy’s overhead is also affected by the size of the application-level statistics gathered.
41
Sandpiper overhead and scalability(cont.)
• Nucleus overhead:
– Network:
• Each report uses only 288 bytes per VM.
• The resulting overhead on a gigabit LAN is negligible.
– CPU usage:
• Measured by comparing the performance of a CPU benchmark with and without the resource monitors running.
• On a single physical server running 24 concurrent VMs, nucleus overhead reduces CPU benchmark performance by approximately 1%.
42
Sandpiper overhead and scalability(cont.)
• Control plane scalability:
– Main source of computational complexity:
• Computing a new mapping of virtual machines to physical servers after a hotspot is detected.
43
Conclusion&future work
• This paper proposed Sandpiper, an automated system that can:
– monitor resource usage and detect hotspots
– determine a new mapping of virtual to physical resources
– initiate the necessary migrations
• Both a black-box strategy and a gray-box strategy were discussed.
• The evaluation showed rapid hotspot elimination in data center environments.
• Future work:
– Support replicated services
• automatically determining whether to migrate a VM or to spawn a replica
44
Comment
• Advantages:
– Separating the monitoring strategies into black-box and gray-box is a good design choice.
– Sandpiper’s architecture and strategy may fit our “Plan A”.
• Limitations:
– The relationship between CPU utilization and request rate may not be linear.
– The hotspot mitigation algorithm only balances workload across physical machines.
• It should also consider how to drive each PM to its highest utilization without creating a hotspot.