Don't Lose Sleep Over Availability: The GreenUp Decentralized Wakeup Service
Siddhartha Sen, Princeton University
Jacob R. Lorch, Richard Hughes, Carlos G. J. Suarez, Brian Zill, Weverton Cordeiro, and Jitendra Padhye
Enterprise networks
• Users and IT admins access machines over the WAN
• Despite the cloud, this is a common scenario
Enterprise networks
• Energy savings: Stay Green!
• Availability: Stay Up!
• GreenUp delivers both
Sleep proxy
• A sleep proxy sits on each subnet, alongside the machines it covers
• When a machine goes to sleep, the proxy announces "Send traffic to me!" on its behalf
• On a remote request (TCP SYN) for a sleeping machine, the proxy wakes it with Wake-on-LAN (WoL)
• Once awake, the machine handles the request and responds directly
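The wakeup step uses a standard WoL magic packet: six 0xFF bytes followed by the target's MAC address repeated 16 times, broadcast over UDP. A minimal illustrative sketch (not GreenUp's C# code; the port and helper names are conventional placeholders):

```python
import socket

def make_magic_packet(mac: str) -> bytes:
    """Build a WoL magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def send_wol(mac: str, broadcast_addr: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local subnet (UDP port 9 is conventional)."""
    packet = make_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast_addr, port))
```

Because the packet is a subnet broadcast, any machine on the subnet can send it, which is exactly what lets an ordinary peer stand in for a dedicated proxy.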
Sleep proxy: pros and cons
• Pros: no special hardware, no environment changes, no app changes
• Cons: dedicated server per subnet
Dedicated servers are a problem
• High deployment and management cost
• Single point of failure
• High availability becomes expensive!
GreenUp: a decentralized, minimal, software-only sleep proxy
• Any machine can act as a proxy (manager) for sleeping machines on the subnet
Outline
1. How does GreenUp work?
   – Distributed management
   – Subnet state coordination
   – Guardians
2. What can I learn from GreenUp?
3. How effective is GreenUp?
   – Evaluation on 100 user machines, currently deployed on 1,100 machines
GreenUp's environment
• Subnet domains
• Load-sensitive, unreliable machines
• Single administrative domain
• Availability most important
Running example (not to scale)
• Figure: a subnet of nine machines, M1 through M9
• Each machine is in one of three states: awake (possibly acting as a manager), asleep and managed, or asleep and unmanaged
Distributed management: who manages M9?
• Wait for notification?
  – No guarantees before sleep
  – M1 failure abandons M8
• Probe randomly, and repeat since machines are unreliable
• Machines are load-sensitive, so distribute the probing
  – Robust to manager issues
Distributed management: who manages M9?
• Let n = total # machines, a = # awake machines, m_i = # machines managed by machine i
• Each awake machine i sends about ((n − m_i) / a) · ln(1/(1 − p)) probes per round at randomly chosen targets
  – p = Pr(a given machine is probed that round)
• Coupon collector analysis: this much probing suffices for every sleeping machine to be found quickly
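The coupon-collector argument can be checked with an illustrative simulation (a Python sketch, not the C# implementation): if the awake machines collectively send about n · ln(1/(1 − p)) probes at uniformly random targets, each machine is probed in a round with probability roughly p.

```python
import math
import random

def round_coverage(n: int, p: float, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which target machine 0 receives at least one probe,
    when ~n*ln(1/(1-p)) probes are sent at uniformly random targets per round."""
    rng = random.Random(seed)
    probes_per_round = round(n * math.log(1.0 / (1.0 - p)))
    hit = 0
    for _ in range(trials):
        if any(rng.randrange(n) == 0 for _ in range(probes_per_round)):
            hit += 1
    return hit / trials

# With n = 100 machines and a target per-round probe probability p = 0.9,
# the measured coverage should land near 0.9.
coverage = round_coverage(100, 0.9)
```

Splitting those probes across the awake machines, weighted down by each one's current load m_i, is what keeps any single machine from doing all the work.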
Multiple managers
• Two probers may both discover M9 and start managing it at once
• Availability most important, so temporary duplicates are tolerated
• Simple resolution protocol leaves a single manager
Load balance
• Induction analysis: equivalent to balls-in-bins!
  – Max load ≈ ln(n/2) / ln ln(n/2) after n/2 sleeps
• Distributed management elects leaders in a robust and load-balanced way, assuming temporary conflicts are tolerable.
Subnet state coordination
• Distributed management relies on global state
  – Who to probe? How to manage?
  – Per-machine state: IP address, MAC address, TCP listen ports
• Replicated state machine?
  – Unreliable machines, correlated behavior
  – Strong consistency overkill
• External database?
  – Lose instant deployability
• Exploit the subnet and weaker consistency
Subnet state coordination
1. Periodic broadcast while awake
2. Rebroadcast by managers while asleep
3. Daily roll call to garbage-collect state
• Subnet state coordination distributes per-machine state on a subnet when strong consistency is not required.
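Step 1 can be sketched as a periodic subnet broadcast of the per-machine state a manager needs (a minimal illustration, assuming a JSON-over-UDP encoding; the port number and field names are placeholders, not GreenUp's actual wire format):

```python
import json
import socket

# Hypothetical port for illustration only.
STATE_PORT = 50000

def encode_state(name: str, ip: str, mac: str, listen_ports: list) -> bytes:
    """Serialize the per-machine state a manager needs to act as a proxy:
    who the machine is, how to wake it, and which ports to intercept."""
    return json.dumps({
        "name": name,
        "ip": ip,
        "mac": mac,
        "tcp_listen_ports": listen_ports,
    }).encode()

def broadcast_state(payload: bytes) -> None:
    """Broadcast this machine's state on the local subnet (run periodically)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, ("255.255.255.255", STATE_PORT))
```

Every peer that hears the broadcast caches the record, so when the sender later sleeps, any of them has enough state to intercept its traffic and wake it.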
Outline
1. How does GreenUp work?
   – Distributed management
   – Subnet state coordination
   – Guardians: protect against simultaneous sleep, cap the max load
2. What can I learn from GreenUp?
3. How effective is GreenUp?
   – Evaluation on 100 user machines, currently deployed on 1,100 machines
Deployment in Microsoft
• C# code
  – Interfaces with packet sniffer/network driver
• Client GUI for users and easy deployment
• Pilot on 1,100 machines
Evaluation
• Logs from 101 Windows 7 machines, Feb.-Sep. 2011
• Questions:
  – Does GreenUp consistently wake machines when accessed?
  – Does it do so in time to meet user patience?
  – Can GreenUp scale to large subnets?
GreenUp wakes machines reliably
• Connect to machines using Samba (TCP port 139)
• 11 different days (weekends, evenings):
  – 496 already awake, 278 woken, 5 unwakeable
  – Most failures due to WoL
• 99.4% success rate
• WoL is the availability bottleneck!
GreenUp wakes machines quickly
• GreenUp relies on some user patience
  – Wakeup delay
  – User retry logic
• Side-effect of WoL failure: manager logs how long the user waits
  – 48 events
• 87% of wakeups take < 9 sec
• 13% of users give up after 3 sec (port scanners?)
• Convolving the two distributions: GreenUp wakes machines before the user gives up 85% of the time
GreenUp scales to large subnets
• Sources of manager load:
  – Intercept traffic for asleep machines
  – Broadcast state
  – Probe/respond to probes
• Good load balance + enough awake machines = few managed machines!
• Simulated probing load on a 2.4-GHz, dual-core Windows 7 machine with 4 GB memory and a 1 Gb/s NIC:

  # of managed machines | CPU utilization
  100                   | 12%
  200                   | 21%
  300                   | 29%

• Guardians ensure the max load is 100
Does GreenUp save more energy?
• Energy savings depend on sleep time
  – Average 31% savings, $17.50/machine/year
• IT enforces a sleep policy at Microsoft, so hard to tell
Extension: higher availability via explicit load hand-off
• Figure: before sleeping, a manager explicitly hands its managed machines (e.g., M8, M9) off to awake machines
• Theorem. Expected max load = (n/d) · H_d
  – d = # awake machines; H_d = the d-th harmonic number
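The theorem's bound is cheap to evaluate numerically (an illustrative sketch; the function name is ours, and n is the total machine count from earlier slides):

```python
def expected_max_load_bound(n: int, d: int) -> float:
    """Evaluate (n/d) * H_d from the theorem, where H_d = 1 + 1/2 + ... + 1/d
    is the d-th harmonic number and d is the number of awake machines."""
    h_d = sum(1.0 / k for k in range(1, d + 1))
    return (n / d) * h_d

# With n = 1000 machines and d = 100 awake, the expected max load is
# 10 * H_100, only a modest ln(d) factor above the perfect n/d = 10.
bound = expected_max_load_bound(1000, 100)
```

Since H_d grows only logarithmically, explicit hand-off keeps the busiest manager within a ln(d) factor of a perfectly even split.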
Other solutions
• Sleep proxy idea: Christensen & Gulledge '98
• Recently (each with barriers to deployment):

  System                                            | Technique
  Somniloquy, NSDI '09                              | augmented NICs
  LiteGreen, ATC '10; Jettison, EuroSys '12         | VM migration
  SleepServer, ATC '10                              | application stubs
  Nedevschi et al., NSDI '08; Reich et al., ATC '10 | dedicated servers
GreenUp
• Completely decentralized, software-only sleep proxy
• Useful distributed systems techniques
• High availability at low cost, even as machines sleep!

http://research.microsoft.com/en-us/projects/greenup/