Reactive Spin-locks: A Self-tuning Approach
Phuong Hoai Ha
Marina Papatriantafilou
Philippas Tsigas
I-SPAN ’05, Las Vegas, Dec. 7th–9th, 2005
Outline
• Mutual exclusion
  – Overhead
  – Available reactive spin-locks
• New reactive spin-lock
  – Model
  – Algorithm
  – Evaluation
• Conclusions
Mutual exclusion
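• Performance goals:
  – Low latency
  – Low contention
  – …
[Figure: a process cycles through entry section → critical section → exit section → noncritical section; at a hand-off the lock is released, requests are issued, arbitration takes place, and the lock is sent to the winner]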
Spin-lock categories
• Arbitrating locks:
  – Determine the next lock holder in advance, e.g., ticket locks and queue locks.
  – Advantages:
    • Prevent processors from causing bursts of network traffic and high contention on the lock.
• Non-arbitrating locks:
  – E.g., test-and-set locks.
  – Advantages:
    • Exploit locality/caches.
    • Tolerate failures in the entry section.
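To make the two categories concrete, here is a minimal Python sketch of a ticket lock (arbitrating) and a test-and-test-and-set lock (non-arbitrating). It is an illustration under stated assumptions, not the authors' code: Python exposes no hardware atomics, so a hypothetical AtomicInt class emulates fetch-and-add and test-and-set with an internal threading.Lock, and time.sleep(0) stands in for a pause instruction inside the spin loops.

```python
import threading
import time


class AtomicInt:
    """Hypothetical simulated atomic integer.

    The private threading.Lock only emulates the atomicity that hardware
    fetch-and-add / test-and-set instructions provide.
    """

    def __init__(self, value: int = 0) -> None:
        self._guard = threading.Lock()
        self._value = value

    def load(self) -> int:
        with self._guard:
            return self._value

    def store(self, value: int) -> None:
        with self._guard:
            self._value = value

    def exchange(self, value: int) -> int:
        """Test-and-set style swap: write value, return the old one."""
        with self._guard:
            old, self._value = self._value, value
            return old

    def fetch_add(self, delta: int) -> int:
        """Add delta and return the value seen before the addition."""
        with self._guard:
            old = self._value
            self._value += delta
            return old


class TicketLock:
    """Arbitrating: the ticket order fixes the next holder in advance."""

    def __init__(self) -> None:
        self.next_ticket = AtomicInt(0)
        self.now_serving = AtomicInt(0)

    def acquire(self) -> None:
        my_ticket = self.next_ticket.fetch_add(1)
        while self.now_serving.load() != my_ticket:
            time.sleep(0)  # yield; a real lock would pause or back off here

    def release(self) -> None:
        self.now_serving.fetch_add(1)  # hand the lock to the next ticket


class TTASLock:
    """Non-arbitrating: whoever wins the test-and-set gets the lock."""

    def __init__(self) -> None:
        self.flag = AtomicInt(0)

    def acquire(self) -> None:
        while True:
            while self.flag.load():          # spin on a read first
                time.sleep(0)
            if self.flag.exchange(1) == 0:   # then try test-and-set
                return

    def release(self) -> None:
        self.flag.store(0)
```

In the ticket lock, the order of fetch_add calls on next_ticket decides who goes next, so each release hands the lock to one predetermined waiter. In the TTAS lock, waiters spin on a read and then race on exchange, which lets a processor that recently held the lock reacquire it cheaply but can cause a burst of test-and-set traffic at every release.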
Arbitrating vs. non-arbitrating locks
[Figure: processors 1–6 connected through an interconnection network, shown for an arbitrating and a non-arbitrating lock]
Available reactive spin-lock algorithms
• Drawbacks: their reactive schemes rely on
  – Fixed experimental thresholds
    • The thresholds frequently become inappropriate in variable and unpredictable environments such as multiprogramming systems.
    • E.g., ticket locks with proportional backoff, test-and-test-and-set locks with exponential backoff.
  – Known probability distributions of some inputs
    • This assumption is usually not feasible.
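As an illustration of such fixed thresholds, the sketch below shows the usual shape of the two backoff policies named above. The constants BASE_DELAY, MAX_DELAY and TICKET_UNIT are hypothetical values that would have to be tuned per machine and workload, which is exactly the dependency a self-tuning lock tries to remove.

```python
# Hypothetical tuning constants: fixed, experimentally chosen thresholds.
BASE_DELAY = 1      # smallest backoff, in arbitrary time units
MAX_DELAY = 1024    # cap on the exponential backoff
TICKET_UNIT = 50    # assumed critical-section length for proportional backoff


def exponential_backoff(attempt: int) -> int:
    """TTAS style: double the delay after each failed attempt, up to a cap."""
    return min(BASE_DELAY << attempt, MAX_DELAY)


def proportional_backoff(my_ticket: int, now_serving: int) -> int:
    """Ticket-lock style: wait in proportion to the number of holders ahead."""
    return max(my_ticket - now_serving, 0) * TICKET_UNIT
```

If the real critical section is much shorter or longer than TICKET_UNIT, or the cap is wrong for the current load, both policies misjudge the delay; nothing in them adapts the constants at run time.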
New reactive spin-lock algorithm
• Idea
  – A non-arbitrating lock with an adaptive, sensible backoff delay.
• Advantages
  – Its reactive scheme is self-tuning.
    • Neither experimentally tuned thresholds nor probability distributions of inputs are needed.
  – It combines the advantages of both arbitrating and non-arbitrating spin-lock categories.
    • It can exploit locality as well as reduce contention on the lock.
Finding a sensible backoff delay
• Need to optimize the trade-off between:
  – Latency
    • The interval between a lock release and the next lock acquisition
  – Contention on the lock
• This is an online problem.
[Figure: load on the lock; delay = ?]
Reactive scheme
– Increase the delay only when the load on the lock is the highest so far.
– When increasing the delay, increase it just enough to keep the competitive ratio $c = P - \frac{P-1}{P^{1/(P-1)}}$.
• Bounds on the load on the lock: $1 \le load_t \le P$.
• During a load-rising phase:
• Similarly for a load-dropping phase.
• In each load-rising or load-dropping phase, the reactive scheme is competitive with competitive ratio $c = \Theta(\ln P)$.
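The ratio can be evaluated numerically to see how slowly it grows with the number of processors P. A short Python sketch (the function name competitive_ratio is illustrative):

```python
import math


def competitive_ratio(p: int) -> float:
    """Return c = P - (P - 1) / P**(1 / (P - 1)) for loads in [1, P]."""
    if p < 2:
        raise ValueError("P must be at least 2")
    return p - (p - 1) / p ** (1 / (p - 1))


# Compare c with ln P for a few machine sizes.
for p in (2, 4, 8, 16, 28, 1024):
    print(f"P={p:5d}  c={competitive_ratio(p):6.3f}  ln P={math.log(p):6.3f}")
```

Since $P^{-1/(P-1)} = e^{-\ln P/(P-1)} \ge 1 - \frac{\ln P}{P-1}$, we get $c \le 1 + \ln P$, which is where the logarithmic growth comes from; for P = 28, the size of the evaluation machine, c is roughly 4.1.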
Algorithm
[Figure: processors and the lock connected through an interconnection network]
• The algorithm guarantees mutual exclusion and freedom from livelock. Its space complexity is O(log P) bits.
Evaluation
• Benchmarks
  – Spark98 kernel: lmv
  – SPLASH-2 suite: Volrend and Radiosity
• Representatives
  – Arbitrating: ticket lock with (tuned) proportional backoff
  – Non-arbitrating: test-and-test-and-set lock with (tuned) exponential backoff
• System
  – A ccNUMA SGI Origin2000 with 28 250 MHz MIPS R10000 processors
Experimental results
[Figure: Spark98 execution time (ms) vs. number of processors (1–28) on the SGI Origin2000; series: tts (test-and-test-and-set), ticket, reactive]
Experimental results (2)
[Figure: Volrend execution time (ms) vs. number of processors (4–28) on the SGI Origin2000; series: tts (test-and-test-and-set), ticket, reactive]
Experimental results (3)
[Figure: Radiosity execution time (ms) vs. number of processors (4–28) on the SGI Origin2000; series: tts (test-and-test-and-set), ticket, reactive]
Conclusions
• We have designed and implemented a new reactive spin-lock:
  – It is self-tuning.
  – It combines the advantages of both arbitrating and non-arbitrating locks.
  – Its reactive scheme is competitive with $c = \Theta(\ln P)$.
• The lock automatically adjusts its backoff delay to the load on the lock and to the application.
Thanks for your attention!
Estimating the delay base
• Fairness
  – A fair lock helps a parallel application gain performance, since the application threads can execute their noncritical sections in parallel.
  – Definition: $fairness_t = \frac{\sum_{i=1}^{N} n_i}{N \cdot \max_i n_i}$, where $n_i$ is the number of lock acquisitions by processor $i$ in interval $t$ and $N$ is the number of processors.
• Heuristic to estimate $base_l$: $base_l = a \cdot DoCS^2 + b \cdot DoCS$, where $a$ and $b$ are system-documented constants and $DoCS$ is the delay outside the critical section.
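A minimal sketch of the fairness measure, assuming it is the average number of lock acquisitions per processor divided by the maximum over processors in one interval (so 1 means perfectly even service and 1/N means a single processor took every acquisition). The function name and input format are illustrative.

```python
def fairness(acquisitions: list[int]) -> float:
    """Average over maximum lock acquisitions per processor in one interval.

    acquisitions[i] is n_i, the number of times processor i acquired the
    lock during the interval; len(acquisitions) is N.
    """
    if not acquisitions or max(acquisitions) == 0:
        raise ValueError("need at least one acquisition in the interval")
    n = len(acquisitions)
    return sum(acquisitions) / (n * max(acquisitions))
```

For example, four processors with 4, 2, 2 and 0 acquisitions give 8 / (4 × 4) = 0.5.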
NUMA
• Another factor that makes the problem harder is NUMA:
  – Memory-access latency differs greatly between processors, depending on where the data resides.
  – E.g., the ccNUMA SGI Origin2000.
Model: An online problem
• A sequence of loads on the lock unfolds on the fly.
• When observing a load, the algorithm must decide how much to lengthen its current backoff delay.
  – If it increases the delay too soon, it wastes time in a long delay when the lock becomes available.
  – If it does not increase the delay in time, it causes high contention on the lock.
  – Hence it must increase the delay at high loads, and only by a reasonable amount.
• Goal: maximize $\sum_t delay_t \cdot load_t$, where $\sum_t delay_t \le P$.
Algorithm
• LockType: <lock, counter>
• Initial delay = L.counter × $base_l$
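Acquire(Lock pL)
    L = FAA(pL.L, <1,1>)              // set lock, register as a contender
    if L.lock then                    // lock was held: back off
        delay = ComputeDelay(L)
        cond = <1,0>                  // keeps the loop test true after "continue"
        do
            sleep(delay)
            L = pL.L                  // read only (test before test-and-set)
            if L.lock then
                delay = ComputeDelay(L)
                continue              // jump to the loop test
            cond = FAA(pL.L, <1,0>)   // lock looked free: try to take it
        while cond.lock

Release(Lock pL)
    do
        L = pL.L
    while not CAS(pL.L, L, <0, L.counter-1>)   // clear lock, deregister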
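The Acquire/Release pseudocode can be turned into a runnable Python sketch under two explicit assumptions. First, a hypothetical AtomicPair class emulates the single hardware word <lock, counter> and its FAA and CAS operations with an internal threading.Lock. Second, the hypothetical compute_delay applies only the initial rule delay = counter × base_l with an arbitrary BASE_L; the actual reactive adjustment of the delay is not reproduced here.

```python
import threading
import time


class AtomicPair:
    """Hypothetical simulated hardware word holding the pair <lock, counter>.

    The private threading.Lock only emulates the atomicity of the FAA and
    CAS instructions; it is not part of the spin-lock algorithm.
    """

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._value = (0, 0)  # (lock, counter)

    def load(self) -> tuple:
        with self._guard:
            return self._value

    def faa(self, d_lock: int, d_counter: int) -> tuple:
        """Fetch-and-add on both fields at once; returns the old pair."""
        with self._guard:
            old = self._value
            self._value = (old[0] + d_lock, old[1] + d_counter)
            return old

    def cas(self, expected: tuple, new: tuple) -> bool:
        """Compare-and-swap on the whole pair; True if the swap happened."""
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


BASE_L = 1e-5  # hypothetical delay base, in seconds


def compute_delay(word: tuple) -> float:
    # Hypothetical stand-in for ComputeDelay: only delay = counter * base_l.
    return word[1] * BASE_L


class ReactiveSpinLock:
    def __init__(self) -> None:
        self.word = AtomicPair()

    def acquire(self) -> None:
        # Set the lock field and register as a contender in one FAA.
        old = self.word.faa(1, 1)
        if old[0] == 0:
            return  # the lock was free: we now hold it
        delay = compute_delay(old)
        while True:
            time.sleep(delay)
            current = self.word.load()  # test without writing
            if current[0] != 0:
                delay = compute_delay(current)  # still held: re-estimate
                continue
            # The lock looked free: try to take it without re-counting ourselves.
            if self.word.faa(1, 0)[0] == 0:
                return

    def release(self) -> None:
        # Clear the lock field and deregister, retrying if the word changed.
        while True:
            current = self.word.load()
            if self.word.cas(current, (0, current[1] - 1)):
                return
```

Because counter includes the holder and every registered waiter, the back-off delay grows with the number of contenders, while a waiter only writes the lock word once a read has shown the lock free, as in a test-and-test-and-set lock. Usage is the familiar pattern: call acquire() before the critical section and release() after it, ideally in a try/finally block.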