How Good can a Transactional Memory be?
R. Guerraoui, EPFL
From the New York Times
San Francisco, May 7, 2004
Intel announces a drastic change in its business strategy:
« Multicore is THE way to boost performance »
Multicores are almost everywhere
Dual-core commonplace in laptops
Quad-core in desktops
Dual quad-core in servers
All major chip manufacturers produce multicore CPUs
SUN Niagara (8 cores, 64 concurrent threads)
Intel Xeon (6 cores)
AMD Opteron (4 cores)
…
The free ride is over
Everyone will need to fork threads
Forking threads is easy
Handling their conflicts is hard
Traditional scaling
[Figure: speedup of user code on a traditional CPU over time (Moore’s Law): 1x, 2x, 4x]
Ideal multicore scaling
[Figure: speedup of user code on a multicore CPU over time (Moore’s Law): 1x, 2x, 4x]
Real multicore scaling
[Figure: speedup of user code on a multicore CPU over time (Moore’s Law): 1x, 1.4x, 2.2x]
Parallelization & synchronization require great care!
Coarse-grained locks => slow
Fine-grained locks => errors
Double-ended queue
Enqueue Dequeue
Wanted
A synchronization abstraction that is:
Simple to use
Efficient to implement
Transactions
Back to the undergraduate level
atomic {
  accessing object 1;
  accessing object 2;
}
Historical perspective
Eswaran et al (CACM’76): Database
Papadimitriou (JACM’79): Theory
Liskov/Scheifler (TOPLAS’82): Language
Knight (ICFP’86): Architecture
Herlihy/Moss (ISCA’93): Hardware
Shavit/Touitou (PODC’95): Software
Simple example (consistency invariant)
0 < x < y
Simple example (transaction)
T: x := x+1 ; y := y+1
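A minimal sketch of the example, assuming a single global lock as a stand-in for the atomic block (the names `state`, `atomic`, `transaction_t` and `invariant_holds` are hypothetical, not part of any TM library). It only illustrates the intended semantics: both increments appear as one indivisible step, so the invariant 0 < x < y is never observed broken.

```python
import threading

# Shared state; the invariant 0 < x < y must hold whenever no transaction is running.
state = {"x": 1, "y": 2}
atomic = threading.Lock()  # hypothetical stand-in for the atomic { } block

def transaction_t():
    # T: x := x+1 ; y := y+1, executed as one indivisible step
    with atomic:
        state["x"] += 1
        state["y"] += 1

def invariant_holds():
    # Read both variables inside the same atomic section to get a consistent view.
    with atomic:
        return 0 < state["x"] < state["y"]

threads = [threading.Thread(target=transaction_t) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state, invariant_holds())
```

A global lock is the coarse-grained extreme; a TM aims to offer the same programming model while letting non-conflicting transactions run in parallel.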
Two-phase locking
To write O, T requires a write-lock on O; T waits if some T’ acquired a lock on O
To read O, T requires a read-lock on O; T waits if some T’ acquired a write-lock on O
Before committing, T releases all its locks
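The waiting rules above amount to a small compatibility check between the requested lock and the locks other transactions already hold on the same object. A sketch, where the function name `must_wait` and the string encoding of lock modes are assumptions made for illustration:

```python
def must_wait(requested_mode, held_modes_by_others):
    """requested_mode is 'read' or 'write'; held_modes_by_others lists the modes
    of the locks that other transactions currently hold on the same object."""
    if requested_mode == "write":
        # A writer waits if some other transaction holds any lock on the object.
        return len(held_modes_by_others) > 0
    # A reader waits only if some other transaction holds a write-lock.
    return "write" in held_modes_by_others

print(must_wait("read", ["read"]), must_wait("write", ["read"]))
```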
Why two phases?
To write O, T requires a write-lock on O; T waits if some T’ acquired a lock on O
To read O, T requires a read-lock on O; T waits if some T’ acquired a write-lock on O
T releases the lock on O when done with O
Why two phases?
[Figure: T1 performs read(0) on O1, then write(1) on O2; T2 performs read(0) on O2, then write(1) on O1]
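Reading the figure as T1 reading 0 from O1 and then writing 1 to O2, while T2 reads 0 from O2 and then writes 1 to O1 (an interpretation of the diagram, with both objects assumed to start at 0), this execution becomes possible when a lock is released as soon as a transaction is done with the object. The sketch below checks by brute force that no serial order explains what the two transactions observed; the names `observed` and `explains` are hypothetical.

```python
from itertools import permutations

# Each transaction: (object read, value observed, object written, value written).
# This encodes the execution as read off the figure (an assumption of this sketch).
observed = {
    "T1": ("O1", 0, "O2", 1),
    "T2": ("O2", 0, "O1", 1),
}

def explains(order):
    """Return True if running the transactions one after another in `order`
    yields exactly the values each transaction observed."""
    memory = {"O1": 0, "O2": 0}
    for name in order:
        read_obj, seen, write_obj, new_value = observed[name]
        if memory[read_obj] != seen:
            return False
        memory[write_obj] = new_value
    return True

serial_orders = [order for order in permutations(observed) if explains(order)]
print(serial_orders)
```

In either serial order, the second transaction would have read 1, not 0, so the list of explaining orders is empty. Holding every lock until commit rules this interleaving out.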
Two-phase locking - better dead than wait -
To write O, T requires a write-lock on O; T aborts if some T’ acquired a lock on O
To read O, T requires a read-lock on O; T aborts if some T’ acquired a lock on O
Before committing, T releases all its locks
A transaction that aborts restarts again
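A single-threaded sketch of the abort-instead-of-wait rule, simplified to exclusive locks only (no read/write distinction). The lock table `locks` and the helpers `acquire`, `release_all` and `attempt` are hypothetical names used for illustration.

```python
class Abort(Exception):
    """Raised when a transaction gives up instead of waiting for a lock."""

locks = {}  # object name -> id of the transaction holding its lock

def acquire(tx, obj):
    holder = locks.get(obj)
    if holder is not None and holder != tx:
        raise Abort(f"{tx} aborts: {obj} is locked by {holder}")
    locks[obj] = tx

def release_all(tx):
    for obj in [o for o, holder in locks.items() if holder == tx]:
        del locks[obj]

def attempt(tx, objects):
    """Try to lock every object; on conflict, drop all locks and report failure."""
    try:
        for obj in objects:
            acquire(tx, obj)
        return True
    except Abort:
        release_all(tx)
        return False

# T1 holds O1 while T2 tries to lock O2 and O1: T2 aborts and keeps nothing.
acquire("T1", "O1")
first_try = attempt("T2", ["O2", "O1"])
locks_after_abort = dict(locks)

# Once T1 commits and releases its locks, the restarted T2 succeeds.
release_all("T1")
second_try = attempt("T2", ["O2", "O1"])
print(first_try, locks_after_abort, second_try, locks)
```

Because a transaction never waits while holding locks, deadlock cannot occur; the price is that work done before the abort is thrown away.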
Two-phase locking - better kill than wait -
To write O, T requires a write-lock on O; T aborts T’ if T’ acquired a lock on O
To read O, T requires a read-lock on O; T aborts T’ if T’ acquired a write-lock on O
Before committing, T releases all its locks
A transaction that aborts restarts again
Two-phase locking - invisible reads -
To write O, T requires a write-lock on O; T aborts T’ if some T’ acquired a write-lock on O
To read O, T checks if all objects read remain valid - else T aborts
Before committing, T checks if all objects read remain valid and releases all its locks
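The read rule can be sketched with per-object version numbers. This is a simplified, single-threaded illustration rather than the algorithm of any particular TM: writes are buffered until commit instead of taking a write-lock, and the classes `Obj` and `Tx` are hypothetical.

```python
class Abort(Exception):
    """Raised when validation finds that a previously read object has changed."""

class Obj:
    def __init__(self, value):
        self.value = value
        self.version = 0  # bumped on every committed write

class Tx:
    def __init__(self):
        self.read_set = {}   # Obj -> version seen at first read
        self.write_set = {}  # Obj -> pending new value

    def validate(self):
        # Invisible reads: nobody knows what we read, so we re-check it ourselves.
        for obj, seen in self.read_set.items():
            if obj.version != seen:
                raise Abort("a read object was overwritten")

    def read(self, obj):
        if obj in self.write_set:
            return self.write_set[obj]
        value, version = obj.value, obj.version
        self.read_set.setdefault(obj, version)
        self.validate()  # check all objects read so far, on every read
        return value

    def write(self, obj, value):
        self.write_set[obj] = value

    def commit(self):
        self.validate()
        for obj, value in self.write_set.items():
            obj.value = value
            obj.version += 1

x, y = Obj(1), Obj(2)

t2 = Tx()
t2.read(x)                      # T2 sees x = 1

t1 = Tx()                       # T1: x := x+1 ; y := y+1
t1.write(x, t1.read(x) + 1)
t1.write(y, t1.read(y) + 1)
t1.commit()

try:
    t2.read(y)                  # validation notices that x changed
    t2_aborted = False
except Abort:
    t2_aborted = True
print(t2_aborted)
```

Validating on every read costs time proportional to the read set, which is the price paid for keeping reads invisible to writers.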
“It is better for Intel to get involved in this [Transactional Memory] now so when we get to the point of having …tons… of cores we will have the answers”
Justin Rattner, Intel Chief Technology Officer
“…we need to explore new techniques like transactional memory that will allow us to get the full benefit of all those transistors and map that into higher and higher performance.”
Bill Gates, President of a transparently operated private foundation
“…manual synchronization is intractable…transactions are the only plausible solution….”
Tim Sweeney, Epic Games
Multicore scaling
[Figure: speedup of user code on a multicore CPU over time (Moore’s Law): 1x, 2x, 4x]
Micro-benchmarks …
Linked-lists; red-black trees, etc.
Consider specific loads: typically read-only transactions
Challenging TMs
STMBench7 (EuroSys’07)
Large data structure: challenge memory overhead
Long operations: kill non-linear algorithms
Complex access patterns: stretches flexibility
STMBench7: Transaction Terminator (Transact’08)
All TMs collapsed because of internal bugs or memory usage (except X)
Certain urban legends about performance proved inaccurate, e.g., that obstruction-free (OF) TMs are inherently slower than lock-based TMs
Real-world scaling
[Figure: speedup of user code on a multicore CPU over time (Moore’s Law): 1x, 1.4x, 2.2x]
Parallelization & synchronization require great care!
Software Transactional Memory: Why is it only a Research Toy?
C. Cascaval, C. Blundell, M. Michael, H. Cain, P. Wu, S. Chiras, S. Chatterjee
Why STM can be more than a Research Toy?
A. Dragojević, P. Felber, V. Gramoli, R. Guerraoui
How much commit can a TM perform?
A closer look at TM
What really prevents commitment? (Safety)
How much commitment is practically possible? (Progress)
Two-phase locking - invisible reads -
Killer write (ownership)
To write an object O, a transaction acquires O and aborts “the” transaction that owns O
Careful read (validation)
To read an object, a transaction T takes a snapshot to see if the system hasn’t changed since T’s last reads; else T is aborted
Two-phase locking - invisible reads -
More efficient algorithm
Apologizing versus asking permission
Killer write
Lazy read: validity check at commit time
The old safety (Pap79)
A history is atomic if its restriction to committed transactions is serializable
A history H of committed transactions is serializable if there is a history S(H) that is
(1) equivalent to H
(2) sequential
(3) legal
Back to the example
Invariant: 0 < x < y
Initially: x := 1; y := 2
Division by zero
T1: x := x+1 ; y := y+1
T2: z := 1 / (y - x)
Infinite loop
T1: x := 3; y := 6
T2: a := y; b := x; repeat b := b + 1 until a = b
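Both failures come from the same interleaving: T2 reads y before T1 commits and x after. A small simulation of that schedule, where the helper `interleave` is a hypothetical name and T1's writes are applied in one step to model its commit:

```python
def interleave(initial, t1_writes):
    """T2 reads y, then T1 commits all its writes, then T2 reads x."""
    memory = dict(initial)
    y_seen = memory["y"]
    memory.update(t1_writes)
    x_seen = memory["x"]
    return x_seen, y_seen

# Division by zero: T1 is x := x+1 ; y := y+1
x_seen, y_seen = interleave({"x": 1, "y": 2}, {"x": 2, "y": 3})
denominator = y_seen - x_seen   # T2 would compute z := 1 / denominator

# Infinite loop: T1 is x := 3 ; y := 6
b, a = interleave({"x": 1, "y": 2}, {"x": 3, "y": 6})
loop_can_terminate = b < a      # b only grows, so it must start below a

print(denominator, loop_can_terminate)
```

T2 would eventually be aborted in both cases, so a property that constrains only committed transactions is satisfied, yet the damage (a crash or a loop that never ends) happens before the abort.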
We need to restrict ALL transactions (as with critical sections)
The old safety property restricts only committed transactions
Candidate property: Opacity (PPoPP’08)
A history H is opaque if for every transaction T in H, at least one history in committed(T,H) is serializable
Two-phase locking - invisible reads -
Killer write (ownership)
Careful read (validation)
Visible vs Invisible Read (SXM; RSTM)
Write is mega killer: to write an object, a transaction aborts any live one which has read or written the object
Visible but not so careful read: when a transaction reads an object, it says so
Theorem
Reads are either visible or careful
NB. Assuming minimal progress and single versions
Intuition of the proof
[Figure: timelines of T1 and T2 with read() on I1, I2, .., Im, write() on O1, O2, .., On followed by commit, and read() on Ik]
The theorem does not hold for classical atomicity
i.e., the theorem does not hold for database transactions
A closer look at TM
What really prevents commitment? (Opacity)
How to prove opacity?
Model checking
Check that the conflict graph is acyclic
Number of nodes is unbounded (STM)
NP-Complete problem
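For one fixed, finite history, the acyclicity check itself is simple. The sketch below builds a conflict graph and looks for a cycle by depth-first search. It tests conflict-serializability of a single given history only, which is an assumption of this illustration and not a proof of opacity for a TM; the history encoding and the function names are hypothetical.

```python
def conflict_graph(history):
    """history: list of (transaction, operation, object) with operation 'r' or 'w'.
    Adds an edge A -> B when an operation of A precedes a conflicting one of B."""
    edges = {tx: set() for tx, _, _ in history}
    for i, (tx_a, op_a, obj_a) in enumerate(history):
        for tx_b, op_b, obj_b in history[i + 1:]:
            if tx_a != tx_b and obj_a == obj_b and "w" in (op_a, op_b):
                edges[tx_a].add(tx_b)
    return edges

def is_acyclic(edges):
    state = {}  # node -> "active" while on the DFS stack, "done" afterwards

    def visit(node):
        if state.get(node) == "done":
            return True
        if state.get(node) == "active":
            return False  # back edge: cycle found
        state[node] = "active"
        ok = all(visit(succ) for succ in edges[node])
        state[node] = "done"
        return ok

    return all(visit(node) for node in edges)

bad = [("T1", "r", "O1"), ("T2", "r", "O2"), ("T1", "w", "O2"), ("T2", "w", "O1")]
good = [("T1", "r", "O1"), ("T1", "w", "O2"), ("T2", "r", "O2"), ("T2", "w", "O1")]
print(is_acyclic(conflict_graph(bad)), is_acyclic(conflict_graph(good)))
```

The difficulty for a TM is that the check must hold for every history the algorithm can produce, over an unbounded number of transactions and variables.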
Model checking opacity?
Hint: reduce the verification space
Uniform system
All transactions are treated equally
Transaction and variable names do not matter
Atomic system
Read, write and commit operations are atomic
TM verification theorem (PLDI’08)
A TM either violates opacity with 2 transactions and 3 variables or satisfies it with any number of variables and transactions
Examples
It takes 15 min to check the correctness of TL2 and DSTM
Reverse two lines in TL2: bug found in 10 min
A closer look at TM
What really prevents commitment? Opacity
How much commitment is possible (given opacity)?
Ideal progress
No correct transaction aborts
Aborting is a fatality
[Figure: timelines of T1 and T2 with read() on O1, write() on O1, write() on O2, commit, then read() on O2 followed by abort]
Eventual progress
Every correct transaction eventually commits
NB. We allow the possibility for a transaction to abort a finite number of times as long as it eventually commits
[Figure: timelines of T1 and T2 with read() on O1, write() on O1, write() on O2, commit, then read() on O2 followed by abort]
Eventual progress is impossible
NB. This impossibility is fundamentally different from FLP: It holds no matter what underlying object is used
A closer look at TM
What really prevents commitment? Opacity
How much commitment is possible (given opacity)?
Conditional progress - obstruction-freedom -
A correct transaction that does not encounter contention eventually commits
DSTM
To write O, T requires a write-lock on O (use C&S);
T aborts T’ if some T’ acquired a write-lock on O (use C&S)
To read O, T checks if all objects read remain valid - else abort (use C&S)
Before committing, T releases all its locks (use C&S)
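A rough sketch of how C&S can drive ownership and abort decisions in this style of TM. It tracks only object ownership and transaction status, not object values, and it is not the DSTM implementation. Python has no native compare-and-swap, so the `AtomicCell` class emulates one with a lock; all class and method names here are assumptions.

```python
import threading

class AtomicCell:
    """Hypothetical compare-and-swap cell, emulated with a lock."""
    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def get(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False

ACTIVE, ABORTED, COMMITTED = "active", "aborted", "committed"

class Tx:
    def __init__(self, name):
        self.name = name
        self.status = AtomicCell(ACTIVE)

    def commit(self):
        # Commit succeeds only if nobody aborted this transaction in the meantime.
        return self.status.compare_and_swap(ACTIVE, COMMITTED)

class TObject:
    def __init__(self):
        self.owner = AtomicCell(None)

    def acquire_for_write(self, tx):
        while True:
            current = self.owner.get()
            if current is not None and current is not tx:
                # Abort the current owner if it is still active.
                current.status.compare_and_swap(ACTIVE, ABORTED)
            if self.owner.compare_and_swap(current, tx):
                return

obj = TObject()
t1, t2 = Tx("T1"), Tx("T2")
obj.acquire_for_write(t1)
obj.acquire_for_write(t2)       # T2 aborts T1 and takes over the object
print(t1.commit(), t2.commit())
```

Committing with a single C&S on the status word makes commit and abort race safely: whichever C&S succeeds first decides the outcome.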
DSTM uses C&S
C&S is the strongest synchronization primitive
Is OF-TM possible with less than C&S?
e.g., R/W objects
The consensus number of OF-TM is 2 (SPAA’08)
OF-TM cannot be implemented with R/W objects only
OF-TM does not need C&S
A closer look at TM
What really prevents commitment? Opacity
How much commitment is practically possible? conditional
More?
Permissiveness (DISC’08)
A TM is permissive if it never aborts when it should not
More precisely
Let P be any safety property and H any P-safe history prefix of a deterministic TM
We say that a TM is permissive w.r.t P if
– Whenever <H;commit> satisfies P
– <H;commit> can be generated by the TM
Permissive TM
• But….
• Very expensive (maintain global conflict graphs)
Probabilistic permissiveness
• Let P be any safety property and H any history generated by a TM
• The TM is probabilistically permissive with respect to P if
– Whenever <H;commit> satisfies P:
– <H;commit> can be generated by the TM with a positive probability
AVSTM
• AVSTM: a probabilistically permissive TM w.r.t opacity
• Idea: at commit, a transaction guesses a serialization time in the future
AVSTM
• Non-blocking
• Good performance under high contention
• Can be combined with an OF-TM
A closer look at TM
• What really prevents commitment? Opacity
• How much commitment is practically possible? conditional; permissiveness
• How fast?
Off-line scheduler (PODC’95)
• Compare the TM protocol with an off-line scheduler that knows:
– The starting time of transactions
– Which objects are accessed (i.e., conflicts)
Greedy contention manager
• State
  • Priority (based on start time)
  • Waiting flag (set while waiting)
• Wait if other has
  • Higher priority AND not waiting
• Abort other if
  • Lower priority OR waiting
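The two rules above can be written as one decision function. A sketch, assuming that an earlier start time means higher priority and that start times are distinct; the `Tx` dataclass and `on_conflict` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Tx:
    start_time: int        # earlier start means higher priority (assumption)
    waiting: bool = False  # set while the transaction waits for another one

def on_conflict(me, other):
    """Decide what `me` does when it conflicts with `other`."""
    other_has_higher_priority = other.start_time < me.start_time
    if other_has_higher_priority and not other.waiting:
        me.waiting = True
        return "wait"
    # Otherwise `other` has lower priority or is itself waiting.
    return "abort other"

older, younger = Tx(start_time=1), Tx(start_time=2)
decision_young = on_conflict(younger, older)
decision_old = on_conflict(older, younger)
print(decision_young, decision_old)
```

The two conditions are complements of each other, so every conflict is resolved one way or the other, and the oldest transaction is never aborted by a younger one unless it is waiting.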
Example of competitive ratio
• Let s be the number of objects accessed by all transactions
• Compare time to commit all transactions
• Greedy is O(s)-competitive with the off-line scheduler
– GHP’05: O(s^2)
– AEST’06: O(s)
Joint work with
• A. Dragojević
• T. Henzinger
• M. Herlihy
• M. Kapalka
• V. Singh
• See Transactions@epfl: lpdwww.epfl.ch
The one slide to remember
Transactions are conquering the parallel programming world
They look simple and familiar and might make the programmer happy
They are in fact very tricky and should make US happy