Ganesh Gopalakrishnan
Associate Professor
Computer Science
University of Utah
www.cs.utah.edu/~ganesh
* Verification of Coherence Protocols against Shared Memory Consistency Models using Test Model-Checking
* Overview of the Utah Verifier Group
Intel MPG talk of 11/12/99, Santa Clara:
Past Utah Verifier group Members
• Ratan Nalumasu, PhD ‘98 (HP)
– new partial-order reduction algorithm and model-checker PV
– approach to write high-level specs for coherency protocols and obtain split transaction protocols automatically
– test model-checking approach
• Abdel Mokkedem, Postdoc (Compaq)
– help in above, plus modeling & verifying the PCI 2.1 protocol
• Rajnish Ghughal, MS ‘99 (Intel, Oregon)
– test model-checking for weak memory models
Present group Members
• Ravi Hosabettu, PhD student
– approach to pipelined processor modeling and verification using layered abstraction map
– recently finished verification of high-level design model of CPU with reorder buffer, branches, speculation, exceptions (PVS proof - 35 days)
• Michael Jones, PhD student
– verifying the PCI 2.1 protocol using an abstraction map to PCI_abstract followed by a special-purpose SML model-checker for PCI_abstract
• Annette Bunker, PhD student
– background research
• New group members: Ritwik Bhattacharya, Jason Yang, Ali Sezgin, Prosenjit Chatterjee
Verification of Coherence Protocols Against Shared Memory Consistency Models
Using Test Model-Checking
FM and shared-memory system design
• Processor-speed growth faster than memory speed-growth
• Mismatch exacerbated by shared memory multiprocessors
• Complex protocols employed to hide memory latencies
• Need for formal verification techniques that can be employed during design
• Handle strong (e.g. seq consistency) and weak (e.g. TSO) memory models
11/12/99 Ganesh, Utah Verifier group -- Intel MPG talk
Related Work
• Graf (CAV’94)
– for more than SC (hence unsound for SC)
– properties depend on design
• Alur, McMillan, Peled (LICS’96)
– undecidable if data can be compared
• Nalumasu, Ghughal, Mokkedem, Gopalakrishnan (CAV’98)
• Henzinger, Qadeer, Rajamani (CAV’99)
– needs invariants
– invariants depend on design
– assumes address-symmetry
• Collier (‘80s)
– not available at design-time
Memory Models
• Describe the memory system’s behavior in response to memory operations
[Figure: memory operations (read or write) from various processes entering the Memory System]
Uniprocessor Memory Model: the von Neumann model
• Memory operations (reads and writes) execute in the order in which they appear in program
[Figure: a single processor P connected to Memory]
Sequential Consistency: A multiprocessor memory model
• Memory operations complete in program order
• A Write becomes instantly visible to all processors
[Figure: processors P1, P2, . . . Pn connected to a single Memory]
Weaker Memory Models
• Sequential Consistency : intuitive and strong memory model, but..
– Does not allow many architectural optimizations
• Weaker memory models :
– Memory operations can occur out of order
– Allows for more architectural optimizations to enable significant performance gain
• Many real processors implement weaker memory models, e.g. Sun Ultra 4, Alpha, PowerPC, Intel etc.
An Example Weaker Memory Model: SPARC Total Store Order (TSO)
• The presence of local caches + write buffers + out of order memory accesses
• Performance vs. programming complexity
[Figure: processors P1, P2, . . . Pn connected to Memory]
Memory Model Verification Problem
[Figure: CPUs sharing a single Mem, shown as equal (=) to CPU+Cache nodes on a snooping bus with Mem]
Why are informal methods insufficient?
• Danger of using incorrect optimizations
– uniprocessor opt may not be legal for multiprocessors
• Danger of incorrect implementations of legal optimizations
• Concurrency - informal methods inadequate
• Memory system semantics are complex and non-intuitive
– more so for weaker memory models
An optimization: fine for uni-processor...
P1
F1 := 1
R1 := F2
Writes have higher latencies than reads
A simple optimization: let the read of F2 bypass the write of F1
Works fine for uni-processor machines
… but not so for multiprocessors
P1
F1 := 1
R1 := F2
if (R1 == 0) critical section
P2
F2 := 1
R2 := F1
if (R2 == 0) critical section
Many optimizations in uni-processor designs are not applicable to multiprocessors
If Read bypasses Write, then both P1 and P2 can be in the critical section!
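The failure can be reproduced by brute force. The following is a minimal sketch, not the group's tooling: it enumerates every interleaving of the two two-instruction programs, first in program order and then with each read moved ahead of its write. The helper names (`interleavings`, `both_enter`) are made up for this illustration.

```python
# Each process writes its own flag, then reads the other process's flag.
P1 = [("wr", "F1"), ("rd", "F2")]
P2 = [("wr", "F2"), ("rd", "F1")]

def interleavings(a, b):
    """Yield every merge of a and b that preserves the order inside each list."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def both_enter(p1, p2):
    """Return True if some interleaving lets both processes read 0."""
    t1 = [("P1",) + op for op in p1]
    t2 = [("P2",) + op for op in p2]
    for run in interleavings(t1, t2):
        mem = {"F1": 0, "F2": 0}
        seen = {}
        for proc, kind, var in run:
            if kind == "wr":
                mem[var] = 1           # F1 := 1 or F2 := 1
            else:
                seen[proc] = mem[var]  # R1 := F2 or R2 := F1
        if seen["P1"] == 0 and seen["P2"] == 0:
            return True
    return False

in_order = both_enter(P1, P2)                # program order preserved
bypassed = both_enter(P1[::-1], P2[::-1])    # each read issued before its write
print(in_order, bypassed)
```

Reversing each two-element program is only a crude stand-in for "read bypasses write"; a real write buffer delays the write rather than swapping instructions.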
Our main example: A Symmetric Multi-Processor (SMP) bus
[Figure: three CPUs with caches (CPU$) and Memory on a coherent snooping bus]
Problem studied: how can the CPU designer
- specify desired orderings of reads and writes
- verify the implementation for adherence (in appearance)
The `Utah Runway Bus Model’ (URM)
[Figure: Client0 and Client1 (cache lines b, a), Host, and Broadcast, connected through the Runout, Runin, Noncoh, and Coh_chans channels]
How test model-checking works
[Figure: the URM with Client0, Client1 (b, a), Host, and Broadcast]
- Drive memory system model using test automata
- See if error-state(s) reached
Deriving Test-automata
• Assume that memory-systems do not decode ‘data’ and use addresses only in = and != tests
• Establish Limited Address Theorems for the chosen memory model (PO in our case)
– for an interesting class of programs, examining all two-address programs is sufficient
• List all possible violations over 1- and 2-addresses
• Abstract these violations into test-automata
• Test automata
– are sound
– completeness results under investigation
– found effective in practice
An Illustrative Example
Suppose the observed executions are:
P1: A := 1; A := 2; A := 3; .... A := k
P2: X1 := A; X2 := A; X3 := A; .... Xk := A
There exists some i,j s.t. j < i /\ X(j) > X(i)
Then a_i are:
[Figure: test automata for P1 and P2. P1 transitions: wr(0), wr(1), wr(1). P2 transitions: rd(0), rd(0), rd(1), rd(1), with an edge into the Error state.]
- Achieves the effect of k = infinity
- Considers all interleavings
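The reader-side automaton can be approximated by a two-state observer. This is a rough sketch that assumes the data-independent abstraction in which the writer only issues wr(0) followed by wr(1); the function name `reaches_error` and the state names are invented here.

```python
def reaches_error(reads):
    """Observer for P2: True if a read of 0 follows a read of 1.

    reads is the sequence of abstract values (0 or 1) returned to P2.
    """
    state = "s0"                 # s0: only the old value 0 has been seen
    for value in reads:
        if state == "s0" and value == 1:
            state = "s1"         # the new value has been observed
        elif state == "s1" and value == 0:
            return True          # stale value after the new one: Error state
    return False

print(reaches_error([0, 0, 1, 1]), reaches_error([0, 1, 0]))
```

Because the observer tracks only "old" versus "new", it does not depend on k, which is one way to read the "k = infinity" remark.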
All one-address PO violations (1-3 of 5)
(1) P_i: ... rd(a, v) …
    P_j: ... … ...
    v is not the initial value T of a, and a is not written anywhere
(2) P_i: ... rd(a,v1) … rd(a,v2) ...
    P_j: … wr(a,v2) … wr(a,v1) ...
    P_i and P_j could be the same process
(3) P_i: ... rd(a,v) … rd(a,T) ...
    P_j: … wr(a,v) …
    P_i and P_j could be the same process
...All one-address PO violations (4-5 of 5)
(4) P_i: ... rd(a,v) … wr(a,v) ...
    v is not the initial value T of a, and a is not written before being read
(5) P_i: ... wr(a,v) … rd(a,T) ...
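Patterns (4) and (5) involve a single process, so they can be checked on one program-order trace. The sketch below is an illustration only. It assumes (these are assumptions of this example, not claims from the slides) that every write carries a distinct value, that only this process writes the values it reads back, and that the initial value T is never written. The name `own_trace_violations` is hypothetical.

```python
def own_trace_violations(trace, initial):
    """Return the set of pattern numbers (4, 5) found in one process's trace.

    trace: list of ("rd", value) or ("wr", value) on address a, in program order.
    initial: the initial value T of a.
    """
    found = set()
    written = set()      # values this process has written so far
    early_reads = set()  # non-initial values read before this process wrote them
    for kind, value in trace:
        if kind == "wr":
            if value in early_reads:
                found.add(4)          # rd(a,v) ... wr(a,v)
            written.add(value)
        elif value == initial and written:
            found.add(5)              # wr(a,v) ... rd(a,T)
        elif value != initial and value not in written:
            early_reads.add(value)
    return found

print(own_trace_violations([("rd", 7), ("wr", 7)], initial=0))
print(own_trace_violations([("wr", 7), ("rd", 0)], initial=0))
```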
Verification of Program Ordering for all one-address programs
[Figure: two test automata, each with error states E1, E2, driving Client0 and Client1 (b, a) of the model with Host and Broadcast. Edge label x means Write(A,x); other edges are Read(A,-).]
Verification of Program Ordering for all two-address programs
[Figure: test automaton with error states E1, E2 driving Client0 and Client1 (b, a) of the model with Host and Broadcast. Edge label x,y means Write(A,x), Write(B,y); other edges are Read(A,-) and Read(B,-).]
Can run demo of this model-checking on this laptop if there is interest (need to boot linux..)
How to Handle Weaker Memory Models?
• Identify new rules (if necessary)
• Create new tests and test model-checking automata
• Consider memory operations other than read and write
– fences, barriers etc.
Weaker memory models - relaxations
• Partial-PO Relaxation :
– Relaxes PO partially - WR is always relaxed
– May relax WA in various orders
– examples : SPARC V9 TSO, PSO, Intel Pentium Pro, Processor consistency etc.
• Complete-PO relaxation :
– Relaxes PO completely
– typically does not relax WA
– examples : SPARC V9 RMO, Alpha, PowerPC, Release Consistency
SPARC Total Store Order (TSO)
• Relaxes Write-Read (WR) sub-rule
• Also relaxes WA in a subtle way
[Figure: processors P1, P2, . . . Pn connected to Memory]
TSO and PSO Specification (Ghughal, MS ‘99)
• TSO = (UPO,RO,WO,RW,WA-S,MB-WR)
• PSO = (UPO,RO,RW,WA-S,MB-WR,MB-WW)
• A series of “pure tests” are defined to test for individual ordering rules (e.g. RO) in isolation
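Treating each specification as a set of rule names makes the difference between the two models easy to read off. This is only a bookkeeping sketch; the rule names are copied from the tuples above and nothing here models their semantics.

```python
# Rule sets exactly as listed in the TSO and PSO tuples.
TSO = {"UPO", "RO", "WO", "RW", "WA-S", "MB-WR"}
PSO = {"UPO", "RO", "RW", "WA-S", "MB-WR", "MB-WW"}

dropped_in_pso = TSO - PSO   # rules TSO has and PSO lacks
added_in_pso = PSO - TSO     # rules PSO has and TSO lacks
shared = TSO & PSO           # rules with a pure test common to both

print(sorted(dropped_in_pso), sorted(added_in_pso), sorted(shared))
```

Read this way, PSO gives up write ordering (WO) and offers the MB-WW barrier rule in its place.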
Motivation for Pure Tests
P1: A := 1; A := 2; A := 3; .... A := k
P2: X1 := A; X2 := A; X3 := A; .... Xk := A
There exists some i,j s.t. j < i /\ X(j) > X(i)
[Figure: test automata for P1 and P2. P1 transitions: wr(0), wr(1), wr(1). P2 transitions: rd(0), rd(0), rd(1), rd(1), with an edge into the Error state.]
A visit to the Error state tells us that ONE OF RO, WO, RW, or WR is violated -- NOT which one
Steps for creating test-automata
• Identify violation in the setting of a simple example
• Argue that regardless of WO, this violates RO
• Generalize error to execution sequence (next slide)
• Build test automata (following that)
P1: A := 1; A := 2;
P2: X := A; Y := A; Z := A;
Initialize all variables to 0
Finally A==2; X==Z==1 or 2, Y==1 or 2, Y!=X
Pure Test for RO over the same operand (WO is NOT assumed!)
P1: A:=1; A:=2; .. A:=k
P2: X[1]:=A; X[2]:=A; .. X[k]:=A
Condition : for all p, q, r : p < q < r : X[p] = X[r] => X[p] = X[q] = X[r]
• New Test for RO
• Formally proved that this (+ all others) are pure tests
• Completeness still open.
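The RO condition can be evaluated directly on the list of values that P2 read. A small sketch follows; the function name `ro_same_operand_ok` is invented for this example, and the list is 0-indexed where the slide is 1-indexed.

```python
def ro_same_operand_ok(x):
    """Check: for all p < q < r, x[p] == x[r] implies x[q] == x[p].

    x holds the values X[1..k] read by P2, in program order.
    """
    n = len(x)
    for p in range(n):
        for r in range(p + 2, n):
            if x[p] == x[r] and any(x[q] != x[p] for q in range(p + 1, r)):
                return False   # a value reappeared after a different one was read
    return True

print(ro_same_operand_ok([0, 1, 1, 2]), ro_same_operand_ok([1, 2, 1]))
```

The check never compares values by size, only by equality, which fits the remark that WO is not assumed.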
Test Automata for RO on Same Operand Obtained Assuming Data Independence
[Figure: P1 automaton (states s0, s1) with write transitions A := 0 (twice) and A := 1. P2 automaton (states s0, s1, s2) with read(A) transitions and, through a non-deterministic switch, the recording transitions X1 := read(A), X2 := read(A), X3 := read(A).]
Safety Property : Finally, X1 = X3 = 1 => X1 = X2 = X3
Pure Test for RO - different operands - WO not assumed
P1: B:=1
P2: X := A; Y := B; C := Y;
P3: U := C; A := U;
• Initially all vars == 0
• Finally all vars == 1 => In P2, B must have been read before A
Pure Test for RO - different operands - WO not assumed
P1: B:=1; B:=2; .. B:=k
P2: Y[0] := 0;
    X[1] := A; Y[1] := B; C := Y[1];
    X[2] := A; Y[2] := B; C := Y[2];
    …
    X[k] := A; Y[k] := B; C := Y[k];
Condition : Exists i : 1 <= i <= k, Forall j : 0 <= j < i : X[i] != Y[j]
P3: U[1] := C; A := U[1];
    U[2] := C; A := U[2];
    ...
    U[k] := C; A := U[k];
“X is getting ahead of all the Y’s so far” -- need to examine a history of values...
Turn into OR accumulator via data-independence!
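Test Automata for RO (diff opnds)
[Figure: P1 automaton (state s0) with transitions B:=0 and B:=1. P2 automaton (states s0, s1) with transitions "read(A); t := read(B); C := t; y := y \/ t;", "x := read(A); t := read(B); C := t;", and "read(A); t := read(B); C := t;". P3 automaton (state s0) with transition "u := read(C); A := u;".]
Safety Property : (P2 in S1 /\ y==0) => x==0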
A Pure Test for (UPO, WO)
P1              P2
A := 1;         B := 1;
B := 2;         A := 2;
U[1] := B;      V[1] := A;
...             ...
A := 2k-1;      B := 2k-1;
B := 2k;        A := 2k;
U[k] := B       V[k] := A
Condition : forall i,j : U[i] is even or U[i] >= 2j or V[j] is even or V[j] >= 2i
will need 2 bits for test model-checking automata
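The (UPO, WO) condition is a plain predicate over the two result arrays, so it can be spelled out as a checker. This is a sketch under the slide's numbering (odd values written first, even values second); the name `upo_wo_ok` is hypothetical, and the Python lists are 0-indexed where the slide is 1-indexed.

```python
def upo_wo_ok(u, v):
    """Check: forall i, j: U[i] even or U[i] >= 2j or V[j] even or V[j] >= 2i.

    u[i-1] holds U[i] (P1's reads of B); v[j-1] holds V[j] (P2's reads of A).
    """
    for i in range(1, len(u) + 1):
        for j in range(1, len(v) + 1):
            ui, vj = u[i - 1], v[j - 1]
            if not (ui % 2 == 0 or ui >= 2 * j or vj % 2 == 0 or vj >= 2 * i):
                return False
    return True

# U[1] = 1 and V[1] = 1: each process read the other's first write after
# finishing its own second write, which is the forbidden outcome.
print(upo_wo_ok([1], [1]), upo_wo_ok([2, 4], [2, 4]))
```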
Test Automata for UPO,WO (diff opnds)
[Figure: P1 automaton (states s0, s1) with transitions "A := 01; B := 00; read(B);", "A := 01; B := 00; u := read(B);", and "A := 11; B := 10; read(B);". P2 automaton (states s0, s1) with transitions "B := 01; A := 00; read(A);", "B := 01; A := 00; v := read(A);", and "B := 11; A := 10; read(A);".]
(P1 and P2 in their S1) => u is even \/ u = 11 \/ v is even \/ v = 11
WA-Relaxation of TSO
• Execution valid under TSO but not under SC.
• WA Relaxation - captured by new rule WA-S
P1: A := 1; C := 1; U := C; X := B;
P2: B := 1; D := 1; V := D; Y := A;
Initially A = B = C = D = U = V = X = Y = 0;
Finally, A = B = C = D = U = V = 1; X = Y = 0;
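One way to see why this outcome separates TSO from SC is to search all executions of a toy machine. The sketch below assumes a textbook operational model (a FIFO store buffer per processor, with loads forwarded from the processor's own buffer); it is not the operational model used in the VIS experiments, and every name in it is made up for the illustration.

```python
# Program from the slide: ("st", var, value) or ("ld", var, register).
P1 = [("st", "A", 1), ("st", "C", 1), ("ld", "C", "U"), ("ld", "B", "X")]
P2 = [("st", "B", 1), ("st", "D", 1), ("ld", "D", "V"), ("ld", "A", "Y")]
PROGS = (P1, P2)
TARGET = {"U": 1, "V": 1, "X": 0, "Y": 0}

def freeze(d):
    """Turn a dict into a hashable, order-independent tuple."""
    return tuple(sorted(d.items()))

def reachable(use_buffers):
    """Explore every execution; True if the TARGET register values can occur."""
    start = ((0, 0), ((), ()), (), ())  # program counters, store buffers, memory, registers
    stack, seen = [start], {start}
    while stack:
        pcs, bufs, mem_items, reg_items = stack.pop()
        mem, regs = dict(mem_items), dict(reg_items)
        done = all(pc == len(p) for pc, p in zip(pcs, PROGS))
        if done and not any(bufs) and regs == TARGET:
            return True
        succs = []
        for i in (0, 1):
            if bufs[i]:  # drain the oldest buffered store of processor i to memory
                (var, val), rest = bufs[i][0], bufs[i][1:]
                new_bufs = bufs[:i] + (rest,) + bufs[i + 1:]
                succs.append((pcs, new_bufs, freeze({**mem, var: val}), reg_items))
            if pcs[i] < len(PROGS[i]):  # execute processor i's next instruction
                op, var, arg = PROGS[i][pcs[i]]
                new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                new_bufs, new_mem, new_regs = bufs, mem_items, reg_items
                if op == "st" and use_buffers:
                    new_bufs = bufs[:i] + (bufs[i] + ((var, arg),),) + bufs[i + 1:]
                elif op == "st":
                    new_mem = freeze({**mem, var: arg})
                else:
                    own = [val for v, val in bufs[i] if v == var]
                    value = own[-1] if own else mem.get(var, 0)  # forward from own buffer
                    new_regs = freeze({**regs, arg: value})
                succs.append((new_pcs, new_bufs, new_mem, new_regs))
        for s in succs:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return False

tso_allows = reachable(use_buffers=True)    # stores may wait in a buffer
sc_allows = reachable(use_buffers=False)    # stores hit memory immediately
print(tso_allows, sc_allows)
```

With buffers, each processor sees its own C or D store early (U = V = 1) while the other processor's store is still buffered (X = Y = 0), which is the "visible to all other processors" flavour of atomicity.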
Rule of WA-S
• WA :
– a write becomes visible to all processors “instantly”
– atomic set of events - all write events
• WA-S :
– a write becomes visible to all other processors “instantly”
– atomic set of events - all write events in stores of other processors
Memory Barriers - membar
• A special type of memory operation that enforces additional PO constraints as required
• could select a particular sub-rule of PO
• example : R1 := A; membar LoadStore; B := R2;
• also known as fences etc.
Rule of MB (MemBar)
• Define one event corresponding to each membar instruction
Pi
L : membar storestore
• Enforce orderings between all relevant operations before and after membar
• Consists of 4 sub-rules : MB-RR, MB-RW, MB-WW, MB-WR
What about Rule of MB?
• only orders some reads and writes with respect to each other
• Hence, could use test for sub-rules of PO to check for various sub-rules of MB
– e.g. (CMP, RO) could be used for (CMP,MB-RR)
• will need a MB-RR instruction between every two reads in Tests, but only 1 in test model-checking automata
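Test Automata for (CMP, MB-RR)
[Figure: P1 automaton (states s0, s1) with write transitions A := 0 and A := 1. P2 automaton (states s0, s1, s2) with read(A) transitions and, through a non-deterministic switch, the recording transitions "X1 := read(A) ; MB-RR", "X2 := read(A) ; MB-RR", "X3 := read(A) ; MB-RR".]
Finally, X1 = X3 => X1 = X2 = X3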
New Tests and Test model-checking automata
• Also, developed new tests for
– CMP, UPO, RO - checks for read ordering between two different operands
– CMP, UPO, WO - checks for write ordering
– CMP, UPO,CON - checks for coherency
• Developed corresponding test automata
• Provided formal proofs for each test and the test model-checking automata abstraction
How to handle models such as Alpha weaker memory model?
• Relaxes Program Order completely
• Orderings guaranteed by explicit membar when needed
• Write atomicity is relaxed in a manner similar to TSO
• Specification as (UPO, ROO, WA-S, MB, MB-WW)
• Tests developed for the same
Memory Systems Verified
• Verified three memory systems using VIS for SC
• Also did last example in Promela and SPIN / PV
• Serial Memory : a simple memory system
• Lazy Caching : A simple bus-based protocol involving queues
• Runway-PA8000 Memory system : A fairly complex commercial multiprocessor memory system from Hewlett Packard (the URM)
Experimental Results (VIS)
WA             States  Bdds  Time
Serial Memory  212k    10k   34 sec
Lazy Caching   1.9M    513k  59 min
PA8000         985k    1.7M  40 hrs

PO             States  Bdds  Time
Serial Memory  7k      7k    9 sec
Lazy Caching   7.8M    306k  36 min
PA8000         953k    1.6M  27 hrs
SC verification of the HP/Runway model
Promela, with SPIN and PV (#states)
        Spin       PV
PO-1    56K        2794
PO-2    > 5M/DNF   11M
SC-1    499K       7880
SC-2a   > 5M/DNF   5.9M
SC-2b   > 4M/DNF   574K
Experimental Results for TSO operational model (in VIS)
TA            States  Bdds  Time
CMP, RO, WO   3k      4k    < 1 s
CMP, PO       6.5M    50k   2:38 s
CMP, WR       6.5k    50k   1:25 s
CMP, RW       6.5k    50k   3:02 s
CMP, RO       10k     2k    1:25 s
Green is Pass ; Red is Fail (as expected for TSO)
Memory Ordering Rules defined
• Seq Consistency (completeness under assumptions still being refined - Nalumasu’98)
• Total Store Ordering
• Partial Store Ordering
• IBM 370
• Alpha
– Cross-checked our definitions for agreement against “Litmus Tests” defined in the Alpha Architecture Manual
Test automata available for these (Ghughal MS’99)
Conclusions
• Test model-checking :
– A new methodology for memory model verification
– could be effectively integrated in typical design cycle
• Weaker memory model verification :
– Developed test model-checking methodology for weaker memory models
– new rules to specify weaker memory models
– new tests and test automata
Possible Future Work
• Explore instruction - data consistency : e.g. self-modifying code in shared data space?
• Explore other memory operations than read, write and barriers : e.g. load-locked, store-conditional, TLB-flush, Cache flush, Cache Sync,...
• Explore issues related with explicit instruction fetching in shared data space
• Use to study various speculative memory access schemes
• Work towards completeness
Possible Future Work (contd…)
• Explore multi-threaded executions
• Explore speculation schemes for memory accesses
• How to build reliable test-automaton coding techniques in the framework of a model-checker
• Use specialized reachability analysis procedures
• Exploit symmetry - “semantic ones” too
• See how it actually fits in design-flow
• Explore possibilities to derive test-benches
BACKUP SLIDES
Formal Methods for Shared Memory System Design
[Figure: diagram with the labels Verification, Provably-correct Synthesis, Theorem-proving, Model-checking, Finite-state Reachability, and Protocol, the last split into low-level concerns (e.g. deadlocks, progress,...) and higher-level concerns (e.g. shared memory consistency models)]
Example of problems due to “unexpected msgs”
[Figure: Cache Ctrlr and Directory Ctrlr exchange Req and Ack; then Another Req arrives: ? ? ?]
Usually don’t know what to say… ...saying nothing causes deadlock!
Overview of Synthesis Method
[Figure: Cache Ctrlr (states I, E) and Dir Ctrlr (states F, E), each shown twice, communicating via Req and (N)ack]
Model-checking Efficiency
Protocol   N   states / time (low level)   states / time (high level)
Mig 2 23,164 / 2.8 54 / 0.14 235 / 0.48 965 / 0.5
Inv 2 193,389 / 19.23 546 / 0.64 18,686 / 18.4
A Generic Example
[Figure: processes P, Q, R with communication actions Q!a, R!b, P?x, Q!c, R?y]
Async Implementation of Example (i)
[Figure: processes P, Q, R with communication actions Q!a, R!b, P?x, R?y, Q!c and asynchronous sends R!!b, Q!!a]
1 msg buffer location for Ack/Nack
Async Implementation of Example (ii)
[Figure: processes P, Q, R with communication actions Q!a, R!b, P?x, R?y, Q!c and asynchronous sends R!!b, Q!!a, Q!!c, P!!ack; Progress Buffer]
Proctype broadcast
• Read <trans, from, val, addr> from runout0 and runout1
• Send to runin0, runin1, and runin2
These are coherent transactions being acquired by all clients and the host (bus controller)
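The broadcast step can be pictured with ordinary queues. This Python sketch is an analogy to the Promela proctype, not a translation of it: the function name, the first-non-empty channel choice (the Promela model would presumably choose non-deterministically), and the sample transaction tuple are all assumptions of this example.

```python
from collections import deque

def broadcast_step(runout, runin):
    """Move one transaction from a runout channel to every runin channel.

    runout: list of deques (runout0, runout1); runin: list of deques
    (runin0, runin1, runin2). Returns the transaction moved, or None.
    """
    for channel in runout:
        if channel:
            txn = channel.popleft()      # <trans, from, val, addr>
            for dst in runin:
                dst.append(txn)          # every client and the host sees it
            return txn
    return None

# Hypothetical transaction: client 1 requests address "a".
runout = [deque(), deque([("req", 1, 0, "a")])]
runin = [deque(), deque(), deque()]
moved = broadcast_step(runout, runin)
print(moved, [list(q) for q in runin])
```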
Proctype host
• Wait for a transaction to appear in runin, and its coherency responses to appear in coh_chans
• Decide whether to
– merely ingest the data being put-out (cache to cache copy happening), or
– to supply the data (in which case determine return ‘mode’ - Shared_return or Private_return)
Proctype client
• One client decides to behave like “P”
• The other behaves like “Q” (test automata for POS)
• P and Q first check for all “read moves”
– done while line is readable (shared, private-clean, or dirty)
• Then check for write possibilities ...
Proctype client ..write possibilities:
• Write possibilities checked only if similar request not already made
• Also line must be writeable (private-clean or dirty)
• Line invalid => request via rsp or rp
• Clients snoop transactions (including own) via channel runin
• Every client also sends coherency response to host
..Proctype client
• Either host or another client supplies data through one of the non_coh inputs (if client, it supplies via c2cw - if host, it supplies via hdr)
• Correct sharing status also indicated when data is sent out.