Don't Lose Sleep Over Availability: The GreenUp Decentralized Wakeup Service
Siddhartha Sen, Princeton University
Jacob R. Lorch, Richard Hughes, Carlos G. J. Suarez, Brian Zill, Weverton Cordeiro, and Jitendra Padhye
Enterprise networks
• Users and IT admins access machines over the WAN
• Despite the cloud, this is a common scenario
Enterprise networks
• Energy savings: Stay Green!
• Availability: Stay Up!
• GreenUp delivers both
Sleep proxy
• A sleep proxy sits on each subnet, alongside the machines it covers
• When a machine goes to sleep, the proxy announces "Send traffic to me!" on its behalf
• On a remote request (TCP SYN) for a sleeping machine, the proxy wakes it with Wake-on-LAN (WoL)
• Once awake, the machine handles the request and responds directly
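The wakeup step uses a standard WoL magic packet: six 0xFF bytes followed by the target's MAC address repeated 16 times, broadcast over UDP. A minimal illustrative sketch (not GreenUp's C# code; the port and helper names are conventional placeholders):

```python
import socket

def make_magic_packet(mac: str) -> bytes:
    """Build a WoL magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def send_wol(mac: str, broadcast_addr: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local subnet (UDP port 9 is conventional)."""
    packet = make_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast_addr, port))
```

Because the packet is a subnet broadcast, any machine on the subnet can send it, which is exactly what lets an ordinary peer stand in for a dedicated proxy.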
Sleep proxy: pros and cons
• Pros: no special hardware, no environment changes, no app changes
• Cons: dedicated server per subnet
Dedicated servers are a problem
• High deployment and management cost
• Single point of failure
• High availability becomes expensive!
GreenUp: a decentralized, minimal, software-only sleep proxy
• Any machine can act as a proxy (manager) for sleeping machines on the subnet
Outline
1. How does GreenUp work?
   – Distributed management
   – Subnet state coordination
   – Guardians
2. What can I learn from GreenUp?
3. How effective is GreenUp?
   – Evaluation on 100 user machines, currently deployed on 1,100 machines
GreenUp's environment
• Subnet domains
• Load-sensitive, unreliable machines
• Single administrative domain
• Availability most important
Running example (not to scale)
• Figure: a subnet of nine machines, M1 through M9
• Each machine is in one of three states: awake (possibly acting as a manager), asleep and managed, or asleep and unmanaged
Distributed management: who manages M9?
• Wait for notification?
  – No guarantees before sleep
  – M1 failure abandons M8
• Probe randomly, and repeat since machines are unreliable
• Machines are load-sensitive, so distribute the probing
  – Robust to manager issues
Distributed management: who manages M9?
• Let n = total # machines, a = # awake machines, m_i = # machines managed by machine i
• Each awake machine i sends about ((n − m_i) / a) · ln(1/(1 − p)) probes per round at randomly chosen targets
  – p = Pr(a given machine is probed that round)
• Coupon collector analysis: this much probing suffices for every sleeping machine to be found quickly
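The coupon-collector argument can be checked with an illustrative simulation (a Python sketch, not the C# implementation): if the awake machines collectively send about n · ln(1/(1 − p)) probes at uniformly random targets, each machine is probed in a round with probability roughly p.

```python
import math
import random

def round_coverage(n: int, p: float, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which target machine 0 receives at least one probe,
    when ~n*ln(1/(1-p)) probes are sent at uniformly random targets per round."""
    rng = random.Random(seed)
    probes_per_round = round(n * math.log(1.0 / (1.0 - p)))
    hit = 0
    for _ in range(trials):
        if any(rng.randrange(n) == 0 for _ in range(probes_per_round)):
            hit += 1
    return hit / trials

# With n = 100 machines and a target per-round probe probability p = 0.9,
# the measured coverage should land near 0.9.
coverage = round_coverage(100, 0.9)
```

Splitting those probes across the awake machines, weighted down by each one's current load m_i, is what keeps any single machine from doing all the work.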
Multiple managers
• Two probers may both discover M9 and start managing it at once
• Availability most important, so temporary duplicates are tolerated
• Simple resolution protocol leaves a single manager
Load balance
• Induction analysis: equivalent to balls-in-bins!
  – Max load ≈ ln(n/2) / ln ln(n/2) after n/2 sleeps
• Distributed management elects leaders in a robust and load-balanced way, assuming temporary conflicts are tolerable.
Subnet state coordination
• Distributed management relies on global state
  – Who to probe? How to manage?
  – Per-machine state: IP address, MAC address, TCP listen ports
• Replicated state machine?
  – Unreliable machines, correlated behavior
  – Strong consistency overkill
• External database?
  – Lose instant deployability
• Exploit the subnet and weaker consistency
Subnet state coordination
1. Periodic broadcast while awake
2. Rebroadcast by managers while asleep
3. Daily roll call to garbage-collect state
• Subnet state coordination distributes per-machine state on a subnet when strong consistency is not required.
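Step 1 can be sketched as a periodic subnet broadcast of the per-machine state a manager needs (a minimal illustration, assuming a JSON-over-UDP encoding; the port number and field names are placeholders, not GreenUp's actual wire format):

```python
import json
import socket

# Hypothetical port for illustration only.
STATE_PORT = 50000

def encode_state(name: str, ip: str, mac: str, listen_ports: list) -> bytes:
    """Serialize the per-machine state a manager needs to act as a proxy:
    who the machine is, how to wake it, and which ports to intercept."""
    return json.dumps({
        "name": name,
        "ip": ip,
        "mac": mac,
        "tcp_listen_ports": listen_ports,
    }).encode()

def broadcast_state(payload: bytes) -> None:
    """Broadcast this machine's state on the local subnet (run periodically)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, ("255.255.255.255", STATE_PORT))
```

Every peer that hears the broadcast caches the record, so when the sender later sleeps, any of them has enough state to intercept its traffic and wake it.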
Outline
1. How does GreenUp work?
   – Distributed management
   – Subnet state coordination
   – Guardians: protect against simultaneous sleep, cap the max load
2. What can I learn from GreenUp?
3. How effective is GreenUp?
   – Evaluation on 100 user machines, currently deployed on 1,100 machines
Deployment in Microsoft
• C# code
  – Interfaces with packet sniffer/network driver
• Client GUI for users and easy deployment
• Pilot on 1,100 machines
Evaluation
• Logs from 101 Windows 7 machines, Feb.-Sep. 2011
• Questions:
  – Does GreenUp consistently wake machines when accessed?
  – Does it do so in time to meet user patience?
  – Can GreenUp scale to large subnets?
GreenUp wakes machines reliably
• Connect to machines using Samba (TCP port 139)
• 11 different days (weekends, evenings):
  – 496 already awake, 278 woken, 5 unwakeable
  – Most failures due to WoL
• 99.4% success rate
• WoL is the availability bottleneck!
GreenUp wakes machines quickly
• GreenUp relies on some user patience
  – Wakeup delay
  – User retry logic
• Side-effect of WoL failure: manager logs how long the user waits
  – 48 events
• 87% of wakeups take < 9 sec
• 13% of users give up after 3 sec (port scanners?)
• Convolving the two distributions: GreenUp wakes machines before the user gives up 85% of the time
GreenUp scales to large subnets
• Sources of manager load:
  – Intercept traffic for asleep machines
  – Broadcast state
  – Probe/respond to probes
• Good load balance + enough awake machines = few managed machines!
• Simulated probing load on a 2.4-GHz, dual-core Windows 7 machine with 4 GB memory and a 1 Gb/s NIC:

  # of managed machines | CPU utilization
  100                   | 12%
  200                   | 21%
  300                   | 29%

• Guardians ensure the max load is 100
Does GreenUp save more energy?
• Energy savings depend on sleep time
  – Average 31% savings, $17.50/machine/year
• IT enforces a sleep policy at Microsoft, so hard to tell
Extension: higher availability via explicit load hand-off
• Figure: before sleeping, a manager explicitly hands its managed machines (e.g., M8, M9) off to awake machines
• Theorem. Expected max load = (n/d) · H_d
  – d = # awake machines; H_d = the d-th harmonic number
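The theorem's bound is cheap to evaluate numerically (an illustrative sketch; the function name is ours, and n is the total machine count from earlier slides):

```python
def expected_max_load_bound(n: int, d: int) -> float:
    """Evaluate (n/d) * H_d from the theorem, where H_d = 1 + 1/2 + ... + 1/d
    is the d-th harmonic number and d is the number of awake machines."""
    h_d = sum(1.0 / k for k in range(1, d + 1))
    return (n / d) * h_d

# With n = 1000 machines and d = 100 awake, the expected max load is
# 10 * H_100, only a modest ln(d) factor above the perfect n/d = 10.
bound = expected_max_load_bound(1000, 100)
```

Since H_d grows only logarithmically, explicit hand-off keeps the busiest manager within a ln(d) factor of a perfectly even split.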
Other solutions
• Sleep proxy idea: Christensen & Gulledge '98
• Recently (each with barriers to deployment):

  System                                            | Technique
  Somniloquy, NSDI '09                              | augmented NICs
  LiteGreen, ATC '10; Jettison, EuroSys '12         | VM migration
  SleepServer, ATC '10                              | application stubs
  Nedevschi et al., NSDI '08; Reich et al., ATC '10 | dedicated servers
GreenUp
• Completely decentralized, software-only sleep proxy
• Useful distributed systems techniques
• High availability at low cost, even as machines sleep!

http://research.microsoft.com/en-us/projects/greenup/