Page 1: Program Analysis and Synthesis of Parallel Systems

Program Analysis and Synthesis of Parallel Systems

Roman Manevich Ben-Gurion University

Page 2: Program Analysis and Synthesis of Parallel Systems

Three papers

1. A Shape Analysis for Optimizing Parallel Graph Programs [POPL’11]

2. Elixir: a System for Synthesizing Concurrent Graph Programs [OOPSLA’12]

3. Parameterized Verification of Transactional Memories [PLDI’10]

Page 3: Program Analysis and Synthesis of Parallel Systems

What’s the connection?

A Shape Analysis for Optimizing Parallel Graph Programs [POPL'11]

Elixir: a System for Synthesizing Concurrent Graph Programs [OOPSLA'12]

Parameterized Verification of Transactional Memories [PLDI'10]

From analysis to language design

Creates opportunities for more optimizations. Requires other analyses.

Similarities between abstract domains


Page 5: Program Analysis and Synthesis of Parallel Systems

A Shape Analysis for Optimizing Parallel Graph Programs

Dimitrios Prountzos1

Keshav Pingali1,2

Roman Manevich2

Kathryn S. McKinley1

1: Department of Computer Science, The University of Texas at Austin
2: Institute for Computational Engineering and Sciences, The University of Texas at Austin

Page 6: Program Analysis and Synthesis of Parallel Systems

6

Motivation

• Graph algorithms are ubiquitous: computational biology, social networks, computer graphics

• Goal: compiler analysis for optimization of parallel graph algorithms

Page 7: Program Analysis and Synthesis of Parallel Systems

7

Minimum Spanning Tree Problem

[Figure: example undirected graph with nodes a–g and edge weights 1–7]


Page 9: Program Analysis and Synthesis of Parallel Systems

9

Boruvka’s Minimum Spanning Tree Algorithm

Build MST bottom-up:

repeat {
  pick arbitrary node 'a'
  merge with lightest neighbor 'lt'
  add edge 'a-lt' to MST
} until graph is a single node

[Figure: the example graph before and after one step – node a is merged with its lightest neighbor lt = c, yielding the contracted node a,c]
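The merge loop above can be sketched as a minimal sequential Python program. This is an illustrative model of the algorithm, not the Galois implementation; the graph encoding {node: {neighbor: weight}} is my own.

```python
# Minimal sequential sketch of Boruvka-style MST by edge contraction,
# assuming an undirected graph encoded as {node: {neighbor: weight}}.
def boruvka_mst_weight(g):
    g = {u: dict(nbrs) for u, nbrs in g.items()}  # work on a copy
    total = 0
    while len(g) > 1:
        a = next(iter(g))                 # pick an arbitrary node
        lt = min(g[a], key=g[a].get)      # lightest neighbor
        total += g[a][lt]                 # add edge a-lt to the MST
        # merge lt into a, keeping the lighter of parallel edges
        for n, w in g[lt].items():
            if n == a:
                continue
            g[a][n] = min(g[a].get(n, w), w)
            g[n].pop(lt, None)
            g[n][a] = g[a][n]
        del g[a][lt]
        del g[lt]
    return total
```

The parallel versions on the following slides speculatively run several such merges at once; this sketch only captures the sequential semantics.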

Page 10: Program Analysis and Synthesis of Parallel Systems

10

Parallelism in Boruvka

[Figure: example graph with nodes a–g and edge weights 1–7; several merge steps could proceed in parallel]

Build MST bottom-up:

repeat {
  pick arbitrary node 'a'
  merge with lightest neighbor 'lt'
  add edge 'a-lt' to MST
} until graph is a single node

Page 11: Program Analysis and Synthesis of Parallel Systems

11

Non-conflicting iterations

[Figure: two non-adjacent regions of the example graph, whose merge steps do not conflict and can run in parallel]

Build MST bottom-up:

repeat {
  pick arbitrary node 'a'
  merge with lightest neighbor 'lt'
  add edge 'a-lt' to MST
} until graph is a single node

Page 12: Program Analysis and Synthesis of Parallel Systems

12

Non-conflicting iterations

Build MST bottom-up:

repeat {
  pick arbitrary node 'a'
  merge with lightest neighbor 'lt'
  add edge 'a-lt' to MST
} until graph is a single node

[Figure: the contracted graph after the two non-conflicting merges, with nodes a,c and f,g]

Page 13: Program Analysis and Synthesis of Parallel Systems

13

Conflicting iterations

[Figure: two overlapping regions of the example graph; the corresponding merge steps touch shared nodes and conflict]

Build MST bottom-up:

repeat {
  pick arbitrary node 'a'
  merge with lightest neighbor 'lt'
  add edge 'a-lt' to MST
} until graph is a single node

Page 14: Program Analysis and Synthesis of Parallel Systems

Optimistic parallelization in Galois

• Programming model
  – Client code has sequential semantics
  – Library of concurrent data structures
• Parallel execution model
  – Thread-level speculation (TLS)
  – Activities executed speculatively
• Conflict detection
  – Each node/edge has an associated exclusive lock
  – Graph operations acquire locks on read/written nodes/edges
  – Lock owned by another thread ⇒ conflict ⇒ iteration rolled back
  – All locks released at the end
• Two main overheads
  – Locking
  – Undo actions

[Figure: three concurrent activities i1, i2, i3]

Page 15: Program Analysis and Synthesis of Parallel Systems

Generic optimization structure

Program AnnotatedProgram

ProgramAnalyzer

ProgramTransformer

OptimizedProgram

Page 16: Program Analysis and Synthesis of Parallel Systems

Overheads (I): locking

• Optimizations
  – Redundant locking elimination
  – Lock removal for iteration-private data
  – Lock removal for lock domination
• ACQ(P): set of definitely acquired locks per program point P
• Given a method call M at P: Locks(M) ⊆ ACQ(P) ⇒ the locking is redundant

Page 17: Program Analysis and Synthesis of Parallel Systems

18

Overheads (II): undo actions

Lockset Grows

Lockset Stable

Failsafe

foreach (Node a : wl) {
  Set<Node> aNghbrs = g.neighbors(a);
  Node lt = null;
  for (Node n : aNghbrs) {
    minW,lt = minWeightEdge((a,lt), (a,n));
  }
  g.removeEdge(a, lt);
  Set<Node> ltNghbrs = g.neighbors(lt);
  for (Node n : ltNghbrs) {
    Edge e = g.getEdge(lt, n);
    Weight w = g.getEdgeData(e);
    Edge an = g.getEdge(a, n);
    if (an != null) {
      Weight wan = g.getEdgeData(an);
      if (wan.compareTo(w) < 0)
        w = wan;
      g.setEdgeData(an, w);
    } else {
      g.addEdge(a, n, w);
    }
  }
  g.removeNode(lt);
  mst.add(minW);
  wl.add(a);
}

Program point P is failsafe if: ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)

Page 18: Program Analysis and Synthesis of Parallel Systems

GSet<Node> wl = new GSet<Node>();
wl.addAll(g.getNodes());
GBag<Weight> mst = new GBag<Weight>();

(Boruvka loop body as on the previous slide)

Lockset analysis

• Redundant locking: Locks(M) ⊆ ACQ(P)
• Undo elimination: ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)
• Need to compute ACQ(P) statically, without runtime overhead
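As a toy illustration of computing ACQ(P) statically, here is a standard must-analysis fixpoint over a small control-flow graph. The CFG encoding and lock labels are hypothetical; the paper's analysis works over the heap abstraction, not this simplification.

```python
import collections

# Toy must-analysis: ACQ(P) = locks definitely acquired on every path
# reaching P. cfg maps each node to [(successor, locks_acquired_on_edge)].
def compute_acq(cfg, entry):
    preds = collections.defaultdict(list)
    for u, outs in cfg.items():
        for v, locks in outs:
            preds[v].append((u, frozenset(locks)))
    nodes = set(cfg) | set(preds)
    ALL = set().union(set(), *[set(l) for outs in cfg.values() for _, l in outs])
    acq = {n: (set() if n == entry else set(ALL)) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry or not preds[n]:
                continue
            # must-analysis: intersect over all predecessors
            new = set.intersection(*[acq[u] | l for u, l in preds[n]])
            if new != acq[n]:
                acq[n] = new
                changed = True
    return acq
```

A lock operation acquiring lock l at point P is then redundant whenever l is already in acq[P].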

Page 19: Program Analysis and Synthesis of Parallel Systems

The optimization technically

• Each graph method m(arg1,…,argk, flag) contains optimization level flag• flag=LOCK – acquire locks • flag=UNDO – log undo (backup) data• flag=LOCK_UNO D – (default) acquire locks and

log undo• flag=NONE – no extra work

• Example:Edge e = g.getEdge(lt, n, NONE)

Page 20: Program Analysis and Synthesis of Parallel Systems

Analysis challenges

• The usual suspects:
  – Unbounded memory ⇒ undecidability
  – Aliasing, destructive updates
• Specific challenges:
  – Complex ADTs: unstructured graphs
  – Heap objects are locked
  – Adapt the abstraction to the ADTs
• We use Abstract Interpretation [CC'77]
  – Balance precision and realistic performance

Page 21: Program Analysis and Synthesis of Parallel Systems

Shape analysis overview

HashMap-Graph

Tree-based Set

……

Graph { @rep nodes @rep edges …}

Graph Spec

Concrete ADTImplementationsin Galois library

Predicate Discovery

Shape Analysis

Boruvka.javaOptimizedBoruvka.java

Set { @rep cont …}

Set Spec

ADT Specifications

Page 22: Program Analysis and Synthesis of Parallel Systems

ADT specification

Graph<ND,ED> {

@rep set<Node> nodes @rep set<Edge> edges

Set<Node> neighbors(Node n);

}

Graph Spec

...Set<Node> S1 = g.neighbors(n);

...

Boruvka.java

Abstract ADT state by virtual set fields

@locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)@op( nghbrs = n.rev(src).dst + n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) )

Assumption: Implementation satisfies Spec

Page 23: Program Analysis and Synthesis of Parallel Systems

Graph<ND,ED> {

@rep set<Node> nodes@rep set<Edge> edges

@locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)@op( nghbrs = n.rev(src).dst + n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) ) Set<Node> neighbors(Node n);}

Modeling ADTs

c

a bGraph Spec

dst

src

srcdst

dst

src

Page 24: Program Analysis and Synthesis of Parallel Systems

Modeling ADTs

c

a b

nodes edges

Abstract State

cont

ret nghbrs

Graph Spec

dst

src

srcdst

dst

src

Graph<ND,ED> {

@rep set<Node> nodes@rep set<Edge> edges

@locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)@op( nghbrs = n.rev(src).dst + n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) ) Set<Node> neighbors(Node n);}

Page 25: Program Analysis and Synthesis of Parallel Systems

Abstraction scheme

[Figure: two set objects S1 ≠ S2 with cont fields, abstracted as (S1 ≠ S2) ∧ L(S1.cont) ∧ L(S2.cont)]

• Parameterized by a set of LockPaths: L(Path) ≡ ∀o . o ∈ Path ⇒ Locked(o)
  – Tracks a subset of the must-be-locked objects
• Abstract domain elements have the form: Aliasing-configs → 2^LockPaths …
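A minimal model of the L(Path) predicate, assuming a toy heap in which each field maps to a single object (the real abstraction works over the set-valued @rep fields of the ADT specs):

```python
# Heap is {obj: {field: obj}}; `locked` is the set of locked objects.
def path_objects(heap, root, path):
    """Objects reached from root by following the field path."""
    objs = {root}
    for fld in path:
        objs = {heap[o][fld] for o in objs if fld in heap.get(o, {})}
    return objs

def lockpath_holds(heap, locked, root, path):
    """L(Path): every object denoted by the path is locked."""
    return all(o in locked for o in path_objects(heap, root, path))
```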

Page 26: Program Analysis and Synthesis of Parallel Systems

Joining abstract states

[Figure: joining (x≠y ∧ L(y.nd)) with (x=y ∧ L(x.nd)) keeps a separate lockset per aliasing configuration]

Aliasing is crucial for precision.
May-be-locked does not enable our optimizations.

#Aliasing-configs: small constant (6)

Page 27: Program Analysis and Synthesis of Parallel Systems

28

(Boruvka code as on the earlier slides)

Example invariant in Boruvka: the immediate neighbors of a and lt are locked

(a ≠ lt) ∧ L(a) ∧ L(a.rev(src)) ∧ L(a.rev(dst)) ∧ L(a.rev(src).dst) ∧ L(a.rev(dst).src) ∧ L(lt) ∧ L(lt.rev(dst)) ∧ L(lt.rev(src)) ∧ L(lt.rev(dst).src) ∧ L(lt.rev(src).dst)

Page 28: Program Analysis and Synthesis of Parallel Systems

Heuristics for finding LockPaths

• Hierarchy Summarization (HS)
  – Paths of the form x.(fld)*
  – Type-hierarchy graph acyclic ⇒ bounded number of paths
  – Preflow-Push example: L(S.cont) ∧ L(S.cont.nd)
    • Nodes in set S and their data are locked

[Figure: Set<Node> S –cont→ Node –nd→ NodeData]

Page 29: Program Analysis and Synthesis of Parallel Systems

30

Footprint graph heuristic

• Footprint Graphs (FG) [Calcagno et al. SAS'07]
  – All acyclic paths from arguments of an ADT method to locked objects
  – Paths of the form x.(fld | rev(fld))*
  – Delaunay Mesh Refinement: L(S.cont) ∧ L(S.cont.rev(src)) ∧ L(S.cont.rev(dst)) ∧ L(S.cont.rev(src).dst) ∧ L(S.cont.rev(dst).src)
  – Nodes in set S and all of their immediate neighbors are locked
• Composition of HS and FG
  – Preflow-Push: L(a.rev(src).ed)

Page 30: Program Analysis and Synthesis of Parallel Systems

Experimental evaluation

• Implemented on top of TVLA
  – Encode the abstraction by 3-valued shape analysis [SRW TOPLAS'02]
• Evaluated on 4 Lonestar Java benchmarks
• Inferred all available optimizations
• # abstract states practically linear in program size

Benchmark                   Analysis time (sec)
Boruvka MST                 6
Preflow-Push Maxflow        7
Survey Propagation          12
Delaunay Mesh Refinement    16

Page 31: Program Analysis and Synthesis of Parallel Systems

Impact of optimizations for 8 threads

[Bar chart: baseline vs. optimized running time (sec) for Boruvka MST, Delaunay Mesh Refinement, Survey Propagation, and Preflow-Push Maxflow, with speedups of 5.6×, 4.7×, 11.4×, and 2.9×; 8-core Intel Xeon @ 3.00 GHz]

Page 32: Program Analysis and Synthesis of Parallel Systems

Note 1

• How to map the abstract domain presented so far to TVLA?
  – Example invariant: (x≠y ∧ L(y.nd)) ∨ (x=y ∧ L(x.nd))
  – Unary abstraction predicate x(v) for pointer x
  – Unary non-abstraction predicate L[x.p] for pointer x and path p
  – Use partial join
  – Resulting abstraction similar to the one shown

Page 33: Program Analysis and Synthesis of Parallel Systems

Note 2

• How to come up with an abstraction for similar problems?
  1. Start by constructing a manual proof (Hoare logic)
  2. Examine the resulting invariants and generalize them into a language of formulas
     • May need to be further specialized for a given program – an interesting problem (machine learning/refinement)
  – How to get sound transformers?

Page 34: Program Analysis and Synthesis of Parallel Systems

Note 3

• How did we avoid considering all interleavings?

• Proved non-interference side theorem

Page 35: Program Analysis and Synthesis of Parallel Systems

Elixir : A System for Synthesizing Concurrent Graph Programs

Dimitrios Prountzos1

Roman Manevich2

Keshav Pingali1

1. The University of Texas at Austin2. Ben-Gurion University of the Negev

Page 36: Program Analysis and Synthesis of Parallel Systems

Goal

Allow the programmer to easily implement correct and efficient parallel graph algorithms.

• Graph algorithms are ubiquitous: social network analysis, computer graphics, machine learning, …
• Difficult to parallelize due to their irregular nature
• Best algorithm and implementation are usually
  – Platform dependent
  – Input dependent
• Need to easily experiment with different solutions
• Focus: fixed graph structure
  – Only change labels on nodes and edges
  – Each activity touches a fixed number of nodes

Page 37: Program Analysis and Synthesis of Parallel Systems

Example: Single-Source Shortest-Path

• Problem formulation
  – Compute the shortest distance from source node S to every other node
• Many algorithms
  – Bellman-Ford (1957)
  – Dijkstra (1959)
  – Chaotic relaxation (Miranker 1969)
  – Delta-stepping (Meyer et al. 1998)
• Common structure
  – Each node has a label dist with the known shortest distance from S
• Key operation
  – relax-edge(u,v)

[Figure: example graph with source S and nodes A–G; relaxing edge (A,C) with weight W_AC]

if dist(A) + W_AC < dist(C) then dist(C) = dist(A) + W_AC

Page 38: Program Analysis and Synthesis of Parallel Systems

Dijkstra's algorithm

Scheduling of relaxations:
• Use a priority queue of nodes, ordered by label dist
• Iterate over nodes u in priority order
• On each step: relax all neighbors v of u
  – Apply relax-edge to all (u,v)

[Figure: the example graph processed in priority order; the queue evolves as <C,3> <B,5> → <B,5> <E,6> <D,7> → <B,5> …]

Page 39: Program Analysis and Synthesis of Parallel Systems

40

Chaotic relaxation

• Scheduling of relaxations:
  – Use an unordered set of edges
  – Iterate over the edges (u,v) in any order
  – On each step: apply relax-edge to the edge (u,v)

[Figure: the example graph; the worklist holds (S,A) (B,C) (C,D) (C,E)]
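The chaotic-relaxation schedule above can be sketched in Python with an unordered worklist of edges (illustrative; the edge-list encoding is my own — any processing order converges to the shortest distances):

```python
from collections import defaultdict

# Chaotic relaxation for SSSP: pull edges from an unordered worklist;
# relaxing an edge may activate the target node's outgoing edges.
def sssp_chaotic(edges, src):
    INF = float('inf')
    dist = defaultdict(lambda: INF)
    dist[src] = 0
    out = defaultdict(list)                  # adjacency: u -> [(v, w)]
    for u, v, w in edges:
        out[u].append((v, w))
    wl = set(edges)                          # unordered worklist of edges
    while wl:
        u, v, w = wl.pop()                   # any order is fine
        if dist[u] + w < dist[v]:            # relax-edge(u, v)
            dist[v] = dist[u] + w
            wl.update((v, x, wx) for x, wx in out[v])  # new activities
    return dict(dist)
```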

Page 40: Program Analysis and Synthesis of Parallel Systems

Insights behind Elixir

[Diagram: Parallel Graph Algorithm = Operators + Schedule. Operators say what should be done (unordered/ordered algorithms; the operator delta identifies new activities); the schedule says how it should be done (static and dynamic schedules order activity processing)]

"TAO of parallelism" [PLDI 2011]

Page 41: Program Analysis and Synthesis of Parallel Systems

Insights behind Elixir

[Diagram: Operators + Schedule, as on the previous slide, instantiated to a Dijkstra-style algorithm]

q = new PrQueue
q.enqueue(SRC)
while (!q.empty) {
  a = q.dequeue
  for each e = (a,b,w) {
    if dist(a) + w < dist(b) {
      dist(b) = dist(a) + w
      q.enqueue(b)
    }
  }
}
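The Dijkstra-style pseudocode above can be written as runnable Python (illustrative; it uses a binary heap with lazy deletion of stale entries instead of a decrease-key priority queue):

```python
import heapq

# Dijkstra over an adjacency map out: {u: [(v, w)]}.
def dijkstra(out, src):
    dist = {src: 0}
    q = [(0, src)]
    while q:
        d, a = heapq.heappop(q)
        if d > dist.get(a, float('inf')):
            continue                      # stale heap entry, skip
        for b, w in out.get(a, []):
            if d + w < dist.get(b, float('inf')):
                dist[b] = d + w
                q.append  # (placeholder removed)
                heapq.heappush(q, (dist[b], b))
    return dist
```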

Page 42: Program Analysis and Synthesis of Parallel Systems

Contributions

• Language
  – Operators/Schedule separation
  – Allows exploration of the implementation space
• Operator delta inference
  – A precise delta is required for efficient fixpoint computations
• Automatic parallelization
  – Inserts synchronization to execute operators atomically
  – Avoids data races / deadlocks
  – Specializes the parallelization based on scheduling constraints

Page 43: Program Analysis and Synthesis of Parallel Systems

44

SSSP in Elixir

Graph [ nodes(node : Node, dist : int)
        edges(src : Node, dst : Node, wt : int) ]      // Graph type

relax = [ nodes(node a, dist ad)
          nodes(node b, dist bd)
          edges(src a, dst b, wt w)
          bd > ad + w ] ➔ [ bd = ad + w ]              // Operator

sssp = iterate relax ≫ schedule                        // Fixpoint statement

Page 44: Program Analysis and Synthesis of Parallel Systems

Operators

Graph [ nodes(node : Node, dist : int)
        edges(src : Node, dst : Node, wt : int) ]

relax = [ nodes(node a, dist ad)
          nodes(node b, dist bd)
          edges(src a, dst b, wt w)
          bd > ad + w ]                // Redex pattern + Guard
      ➔ [ bd = ad + w ]               // Update

sssp = iterate relax ≫ schedule

[Figure: edge (a,b) with weight w; if bd > ad + w, the update sets bd to ad + w]

Cautious by construction – easy to generalize

Page 45: Program Analysis and Synthesis of Parallel Systems

46

Fixpoint statement

Graph [ nodes(node : Node, dist : int)
        edges(src : Node, dst : Node, wt : int) ]

relax = [ nodes(node a, dist ad)
          nodes(node b, dist bd)
          edges(src a, dst b, wt w)
          bd > ad + w ] ➔ [ bd = ad + w ]

sssp = iterate relax ≫ schedule    // apply the operator until fixpoint; scheduling expression

Page 46: Program Analysis and Synthesis of Parallel Systems

Scheduling examples

Graph [ nodes(node : Node, dist : int)
        edges(src : Node, dst : Node, wt : int) ]

relax = [ nodes(node a, dist ad)
          nodes(node b, dist bd)
          edges(src a, dst b, wt w)
          bd > ad + w ] ➔ [ bd = ad + w ]

sssp = iterate relax ≫ schedule

Locality-enhanced label-correcting:
  group b ≫ unroll 2 ≫ approx metric ad

Dijkstra-style:
  metric ad ≫ group b

q = new PrQueue
q.enqueue(SRC)
while (!q.empty) {
  a = q.dequeue
  for each e = (a,b,w) {
    if dist(a) + w < dist(b) {
      dist(b) = dist(a) + w
      q.enqueue(b)
    }
  }
}

Page 47: Program Analysis and Synthesis of Parallel Systems

Operator Delta Inference

Parallel Graph Algorithm

Operators Schedule

Order activity processing

Identify new activities

Static Schedule

Dynamic Schedule

Page 48: Program Analysis and Synthesis of Parallel Systems

Identifying the delta of an operator

[Figure: after relax1 updates edge (a,b), which other edges may become newly active?]

Page 49: Program Analysis and Synthesis of Parallel Systems

Delta Inference Example

[Figure: relax1 is applied to edge (a,b) with weight w1; edge (c,b) with weight w2 points into b]

Query program for the SMT solver:
  assume (da + w1 < db)
  assume ¬(dc + w2 < db)
  db_post = da + w1
  assert ¬(dc + w2 < db_post)

The solver discharges the assertion: edge (c,b) does not become active.
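The query above can also be checked by brute force over small bounded integers; this is only a stand-in for the SMT call, and the variable ranges are arbitrary:

```python
from itertools import product

# After relax1 sets db' = da + w1, check that the guard of relax on
# (c, b), namely dc + w2 < db', is unsatisfiable under the assumptions.
def query_holds(bound=8):
    for da, db, dc, w1, w2 in product(range(bound), repeat=5):
        if da + w1 < db and not (dc + w2 < db):   # assumptions
            db_post = da + w1                     # effect of relax1
            if dc + w2 < db_post:                 # negated assertion
                return False                      # counterexample found
    return True
```

The assertion holds because dc + w2 ≥ db > da + w1 = db_post, so the guard on (c,b) can never fire.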

Page 50: Program Analysis and Synthesis of Parallel Systems

Delta inference example – active

[Figure: relax1 is applied to edge (a,b) with weight w1; edge (b,c) with weight w2 leaves b]

Query program for the SMT solver:
  assume (da + w1 < db)
  assume ¬(db + w2 < dc)
  db_post = da + w1
  assert ¬(db_post + w2 < dc)

The assertion fails, so apply relax on all outgoing edges (b,c) such that dc > db + w2 and c ≠ a.

Page 51: Program Analysis and Synthesis of Parallel Systems

Influence patterns

[Diagrams: the node-aliasing cases between the updated edge (a,b) and a candidate edge (c,d): b=c, a=c, a=d, b=d, a=c ∧ b=d, a=d ∧ b=c]

Page 52: Program Analysis and Synthesis of Parallel Systems

System architecture

Algorithm Spec → Elixir → C++ Program → Galois/OpenMP Parallel Runtime

• Elixir: synthesizes code, inserts synchronization
• Runtime: parallel thread pool, graph implementations, worklist implementations

Page 53: Program Analysis and Synthesis of Parallel Systems

Experiments

Explored dimensions:

Grouping             Statically group multiple instances of the operator
Unrolling            Statically unroll operator applications by a factor K
Dynamic Scheduler    Choose different policy/implementation for the dynamic worklist
...

Compared against hand-written parallel implementations.

Page 54: Program Analysis and Synthesis of Parallel Systems

SSSP results

[Chart: running time (ms) of Elixir vs. Lonestar for 1–24 threads; the Elixir implementation variant with Group + Unroll improves locality]

• 24-core Intel Xeon @ 2 GHz
• USA Florida road network (1M nodes, 2.7M edges)

Page 55: Program Analysis and Synthesis of Parallel Systems

Breadth-First Search results

[Charts: running time (ms) vs. threads (1–24) for two Elixir variants, Lonestar, and Cilk, on a scale-free graph (1M nodes, 8M edges) and on the USA road network (24M nodes, 58M edges)]

Page 56: Program Analysis and Synthesis of Parallel Systems

Conclusion

• Graph algorithm = operators + schedule
  – Elixir language: imperative operators + declarative schedule
• Allows exploring the implementation space
• Automated reasoning for efficiently computing fixpoints
• Correct-by-construction parallelization
• Performance competitive with hand-parallelized code

Page 57: Program Analysis and Synthesis of Parallel Systems

Parameterized Verificationof Software Transactional Memories

Michael Emmi Rupak MajumdarRoman Manevich

Page 58: Program Analysis and Synthesis of Parallel Systems

Motivation

• Transactional memories [Herlihy '93]
  – Programmer writes code with coarse-grained atomic blocks
  – Transaction manager takes care of conflicts, providing the illusion of sequential execution
• Strict serializability – correctness criterion
  – Formalizes the "illusion of sequential execution"
• Parameterized verification
  – Formal proof for a given implementation
  – For every number of threads
  – For every number of memory objects
  – For every number and length of transactions

59

Page 59: Program Analysis and Synthesis of Parallel Systems

STM terminology

• Statements: reads, writes, commit, abort
• Transaction: reads and writes of variables followed by commit (committing transaction) or abort (aborting transaction)
• Word: interleaved sequence of transactions of different threads
• Conflict: two statements conflict if
  – One is a read of variable X and the other is a commit of a transaction that writes to X
  – Both are commits of transactions that write to X

60

Page 60: Program Analysis and Synthesis of Parallel Systems

Safety property: strict serializability

• There is a serialization for the committing threads such that order of conflicts is preserved

• Order of non-overlapping transactions remains the same

61

Page 61: Program Analysis and Synthesis of Parallel Systems

Safety property: strict serializability

• There is a serialization of the committing threads such that the order of conflicts is preserved
• The order of non-overlapping transactions remains the same
• Example word: (rd X t1), (rd Y t2), (wr X t2), (commit t2), (commit t1)
  – conflict: (rd X t1) with (commit t2), since t2 writes X
• Can be serialized to: (rd X t1), (commit t1), (rd Y t2), (wr X t2), (commit t2)
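A brute-force checker for this definition on tiny words (illustrative: it enumerates thread serializations and checks conflict-order preservation, omitting the non-overlapping-transactions condition for brevity):

```python
from itertools import permutations

# A word is a list of (action, var, thread) events; commit uses var=None.
def writes(word, t):
    return {v for a, v, th in word if th == t and a == 'wr'}

def conflicting(word, i, j):
    a1, v1, t1 = word[i]; a2, v2, t2 = word[j]
    if t1 == t2:
        return False
    if a1 == 'rd' and a2 == 'commit' and v1 in writes(word, t2):
        return True
    if a2 == 'rd' and a1 == 'commit' and v2 in writes(word, t1):
        return True
    if a1 == a2 == 'commit' and writes(word, t1) & writes(word, t2):
        return True
    return False

def strictly_serializable(word):
    threads = sorted({t for _, _, t in word})
    for order in permutations(threads):
        serial = [e for t in order for e in word if e[2] == t]
        pos = {id(e): k for k, e in enumerate(serial)}
        if all(pos[id(word[i])] < pos[id(word[j])]
               for i in range(len(word)) for j in range(i + 1, len(word))
               if conflicting(word, i, j)):
            return True
    return False
```

On the example word above it finds the serialization t1-before-t2; with a conflict cycle between two threads it correctly reports failure.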

Page 62: Program Analysis and Synthesis of Parallel Systems

Main results

• First automatic verification of strict serializability for transactional memories
  – TPL, DSTM, TL2
• New proof technique:
  – Template-based invisible-invariant generation
  – Abstract checking algorithm to check inductive invariants

Challenging – requires reasoning about both universal and existential properties.

Page 63: Program Analysis and Synthesis of Parallel Systems

Outline

• Strict serializability verification approach• Automating the proof• Experiments• Conclusion• Related work

64

Page 64: Program Analysis and Synthesis of Parallel Systems

Proof roadmap 1

Goal: prove that model M is strictly serializable
1. Given a strict-serializability reference model RSS, reduce checking strict serializability to checking that M refines RSS
2. Reduce refinement to checking safety
   – Safety property SIM: whenever M can execute a statement, so can RSS
   – Check SIM on the product system M × RSS

Page 65: Program Analysis and Synthesis of Parallel Systems

Proof roadmap 2

3. Model the STMs M and RSS in first-order logic
   • TM models use set data structures and typestate bits
4. Check safety by generating a strong enough candidate inductive invariant and checking inductiveness
   – Use observations on the structure of transactional memories
   – Use sound first-order reasoning

Page 66: Program Analysis and Synthesis of Parallel Systems

Reference strict serializability model

• Guerraoui, Singh, Henzinger, Jobstmann [PLDI'08]
• RSS: the most liberal specification of a strictly serializable system
  – Allows the largest language of strictly-serializable executions
• M is strictly serializable iff every word of M is also a word of RSS
  – Language(M) ⊆ Language(RSS)
  – M refines RSS

Page 67: Program Analysis and Synthesis of Parallel Systems

Modeling transactional memories

• M(n,k) = (predicates, actions)
  – Predicate: ranked relation symbol p(t), q(t,v), …
  – Binary predicates are used for sets, so instead of rs(t,v) I'll write v∈rs(t)
  – Action: a(t,v) = if pre(a) then p'(v)=…, q'(u,v)=…
• Universe = set of k thread individuals and n memory individuals
• State S = a valuation of the predicates

Page 68: Program Analysis and Synthesis of Parallel Systems

Reference model (RSS) predicates

• Typestates:– RSS.finished(t), RSS.started(t), RSS.pending(t),

RSS.invalid(t)• Read/write sets– RSS.rs(t,v), RSS.ws(t,v)

• Prohibited read/write sets– RSS.prs(t,v), RSS.pws(t,v)

• Weak-predecessor– RSS.wp(t1,t2)

69

Page 69: Program Analysis and Synthesis of Parallel Systems

DSTM predicates

• Typestates:– DSTM.finished(t), DSTM.validated(t),

DSTM.invalid(t), DSTM.aborted(t)• Read/own sets– DSTM.rs(t,v), DSTM.os(t,v)

70

Page 70: Program Analysis and Synthesis of Parallel Systems

RSS commit(t) action

if ¬RSS.invalid(t) ∧ ¬RSS.wp(t,t) then
  ∀t1,t2 . RSS.wp'(t1,t2) ⟺ t1≠t ∧ t2≠t ∧
    (RSS.wp(t1,t2) ∨ (RSS.wp(t,t2) ∧ (RSS.wp(t1,t) ∨ ∃v . v∈RSS.ws(t1) ∧ v∈RSS.ws(t))))
  …

Here wp' is the post-state predicate and wp the current-state predicate; the first line is the action precondition; t is the executing thread; the innermost existential captures a write-write conflict.

Page 71: Program Analysis and Synthesis of Parallel Systems

DSTM commit(t) action

if DSTM.validated(t) then
  ∀t1 . DSTM.validated'(t1) ⟺ t1≠t ∧ DSTM.validated(t1) ∧ ¬∃v . v∈DSTM.rs(t1) ∧ v∈DSTM.os(t)
  …

The existential captures a read-own conflict with the committing thread.

Page 72: Program Analysis and Synthesis of Parallel Systems

FOTS states and execution

[Figure: state S1 – thread individuals t1 and t2 and a memory-location individual v; the action rd v t1 is executed]

Page 73: Program Analysis and Synthesis of Parallel Systems

FOTS states and execution

[Figure: state S2 after rd v t1 – predicate evaluations DSTM.started(t1)=1 and DSTM.rs(t1,v)=1]

Page 74: Program Analysis and Synthesis of Parallel Systems

FOTS states and execution

[Figure: state S3 after wr v t2 – both threads have started; DSTM.rs holds for (t1,v) and DSTM.ws for (t2,v)]

Page 75: Program Analysis and Synthesis of Parallel Systems

Product system

• The product of two systems: A × B
  – Predicates = A.predicates ∪ B.predicates
  – Actions: commit(t) = { if (A.pre ∧ B.pre) then … }, rd(t,v) = { if (A.pre ∧ B.pre) then … }, …
• M refines RSS iff on every execution SIM holds:
  ∀action a . M.pre(a) ⇒ RSS.pre(a)

Page 76: Program Analysis and Synthesis of Parallel Systems

Checking that DSTM refines RSS

• The only precondition in RSS is for commit(t)
• We need to check that SIM = ∀t . DSTM.validated(t) ⇒ ¬RSS.invalid(t) ∧ ¬RSS.wp(t,t) holds in all reachable states of DSTM × RSS
• Proof rule: if DSTM × RSS ⊨ SIM then DSTM refines RSS

But how do we check this safety property?

Page 77: Program Analysis and Synthesis of Parallel Systems

Checking safety by invisible invariants

• How do we prove that a property holds in all reachable states of system M?
• Pnueli, Ruah, Zuck [TACAS'01]
• Come up with an inductive invariant φ that contains the reachable states of M and strengthens SIM:

I1: Initial ⇒ φ
I2: φ ∧ transition ⇒ φ'
I3: φ ⇒ SIM

Page 78: Program Analysis and Synthesis of Parallel Systems

Strict serializability proof rule

I1: Initial ⇒ φ
I2: φ ∧ transition ⇒ φ'
I3: φ ⇒ SIM
─────────────────
DSTM × RSS ⊨ SIM

Proof roadmap:
1. Divine a candidate invariant φ
2. Prove I1, I2, I3

Page 79: Program Analysis and Synthesis of Parallel Systems

Two challenges

Proof roadmap:
1. Divine a candidate invariant φ
2. Prove I1, I2, I3

But
• How do we find a candidate φ? The space of possibilities is infinite.
• Given a candidate, how do we check the proof rule? Checking A ⇒ B is undecidable for first-order logic.

Page 80: Program Analysis and Synthesis of Parallel Systems

Our solution

Proof roadmap:
1. Divine a candidate invariant φ
2. Prove I1, I2, I3

But
• How do we find a candidate φ? Use templates and iterative weakening.
• Given a candidate, how do we check the proof rule? Use abstract checking.

Both utilize insights on transactional memory implementations.

Page 81: Program Analysis and Synthesis of Parallel Systems

Invariant for DSTM × RSS

P1: ∀t,t1 . RSS.wp(t,t1) ⇒ RSS.invalid(t) ∨ RSS.pending(t) ∨ ∃v . v∈RSS.ws(t1) ∧ v∈RSS.ws(t)

P2: ∀t,v . v∈RSS.rs(t) ⇒ DSTM.aborted(t) ∨ v∈DSTM.rs(t)

P3: ∀t,v . v∈RSS.ws(t) ⇒ DSTM.aborted(t) ∨ v∈DSTM.os(t)

P4: ∀t . DSTM.validated(t) ⇒ ¬RSS.wp(t,t)

P5: ∀t . DSTM.validated(t) ⇒ ¬RSS.invalid(t)

P6: ∀t . DSTM.validated(t) ⇒ ¬RSS.pending(t)

An inductive invariant involving only RSS can be reused in all future proofs.

Page 82: Program Analysis and Synthesis of Parallel Systems

Templates for DSTM × RSS

P1: ∀t,t1 . φ1(t,t1) ⇒ φ2(t) ∨ φ3(t) ∨ ∃v . v∈φ4(t1) ∧ v∈φ5(t)

P2: ∀t,v . v∈φ1(t) ⇒ φ2(t) ∨ v∈φ3(t)

P3: ∀t,v . v∈φ1(t) ⇒ φ2(t) ∨ v∈φ3(t)

P4: ∀t . φ1(t) ⇒ φ2(t,t)

P5: ∀t . φ1(t) ⇒ φ2(t)

P6: ∀t . φ1(t) ⇒ φ2(t)

Page 83: Program Analysis and Synthesis of Parallel Systems

Templates for DSTM × RSS

∀t,t1 . φ1(t,t1) ⇒ φ2(t) ∨ φ3(t) ∨ ∃v . v∈φ4(t1) ∧ v∈φ5(t)

∀t,v . v∈φ1(t) ⇒ φ2(t) ∨ v∈φ3(t)

∀t . φ1(t) ⇒ φ2(t,t)

Why templates?
• Makes the invariant separable
• Controls the complexity of invariants
• Adding templates enables refinement

Page 84: Program Analysis and Synthesis of Parallel Systems

Mining candidate invariants

• Use a predefined set of templates to specify the structure of candidate invariants
  – ∀t,v . φ1 ∨ φ2 ∨ φ3
  – φ1, φ2, φ3 are predicates of M or their negations
  – Existential formulas capturing 1-level conflicts: ∃v . v∈φ4(t1) ∧ v∈φ5(t2)
• Mine candidate invariants from a concrete execution
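A sketch of the mining step for a toy two-literal implication template over boolean predicate valuations observed during an execution; the template shape and state encoding are simplifications of the paper's:

```python
from itertools import product

# states: list of {predicate_name: bool} valuations seen in a run.
# Candidates have the shape (p = sp) => (q = sq) and must hold in
# every observed state to be kept.
def mine_candidates(states, preds):
    lits = [(p, True) for p in preds] + [(p, False) for p in preds]
    cands = []
    for (p, sp), (q, sq) in product(lits, lits):
        if p == q:
            continue
        if all((s[p] != sp) or (s[q] == sq) for s in states):
            cands.append(((p, sp), (q, sq)))
    return cands
```

The mined candidates are only true of the observed states; the weakening loop on the next slide is what separates the genuinely inductive ones.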

Page 85: Program Analysis and Synthesis of Parallel Systems

Iterative invariant weakening

• Initial candidate invariant C0 = P1 ∧ P2 ∧ … ∧ Pk
• Try to prove I2: φ ∧ transition ⇒ φ'
  C1 = { Pi ∈ C0 | C0 ∧ transition ⇒ Pi' }
• If C1 = C0 then we have an inductive invariant
• Otherwise, compute C2 = { Pi ∈ C1 | C1 ∧ transition ⇒ Pi' }
• Repeat until either
  – an inductive invariant is found – then check I3: Ck ⇒ SIM
  – we reach top {} – the trivial inductive invariant


86
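The weakening loop is essentially a Houdini-style fixed point. A sketch, abstracting the inductiveness check of a single conjunct into a callback `preserved` (a hypothetical interface standing in for the paper's abstract checker):

```python
# Iteratively drop conjuncts that are not preserved by the transition
# relation under the current candidate set, until a fixed point.
def weaken_to_inductive(candidates, preserved):
    C = set(candidates)
    while True:
        C_next = {p for p in C if preserved(C, p)}
        if C_next == C:
            return C          # fixed point: C is inductive
        C = C_next
```

Dropping one conjunct can make others fail, which is why the loop must re-check until nothing changes.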

Page 86: Program Analysis and Synthesis of Parallel Systems

Weakening illustration

[Figure: the candidate set {P1, P2, P3} is weakened by dropping conjuncts that are not preserved, until the remainder is inductive]

Page 87: Program Analysis and Synthesis of Parallel Systems

Abstract proof rule

Concrete rule:
I1: Initial ⇒ φ
I2: φ ∧ transition ⇒ φ'
I3: φ ⇒ SIM
⟹ DSTM × RSS ⊨ SIM

Abstract rule, with formula abstraction α, concretization γ, an abstract transformer, and approximate entailment:
I1: α(Initial) ⊑ φ
I2: abs_transition(γ(φ)) ⊑ φ'
I3: γ(φ) ⇒ SIM
⟹ DSTM × RSS ⊨ SIM

Page 88: Program Analysis and Synthesis of Parallel Systems

Conclusion

• Novel invariant generation using templates – extends the applicability of invisible invariants
• Abstract domain and reasoning to check invariants without state explosion
• Proved strict serializability for TPL, DSTM, TL2
  – BLAST and TVLA failed

Page 89: Program Analysis and Synthesis of Parallel Systems

Verification results

property                  TPL     DSTM    TL2      RSS
Bound for invariant gen.  (2,1)   (2,1)   (2,1)    (2,1)
No. cubes                 8       184     344      7296
Bounded time              4       8       10       23
Invariant mining time     6       13      26       57
#templates                28      28      28       28
#candidates               22      53      97       19
#proved                   22      30      68       14
#minimal                  4       8       5        -
avg. time per invariant   3.7     20.3    36       43.4
avg. abs. size            31.7    256.9   1.19k    2.86k
Total time                3.5m    54.3m   129.3m   30.9m

90

Page 90: Program Analysis and Synthesis of Parallel Systems

Insights on transactional memories

• The transition relation is symmetric – thread identifiers are not used: p'(t,v) = … t1 … t2 …
• The executing thread t interacts only with an arbitrary thread or with a conflict-adjacent thread
• Arbitrary thread: ∃v . v∈TL2.rs(t1) ∨ v∈TL2.ws(t1)
• Conflict-adjacent: ∃v . v∈DSTM.rs(t1) ∧ v∈DSTM.ws(t)

91

Page 91: Program Analysis and Synthesis of Parallel Systems

Conflict adjacency

[Figure: thread t is conflict-adjacent to t2 via location v1 (read-write conflict: v1 is in t's read set and t2's write set), and t2 to t3 via v2 (write-write conflict)]

∃v . v∈rs(t) ∧ v∈DSTM.ws(t2)
∃v . v∈ws(t1) ∧ v∈DSTM.ws(t2)

Page 92: Program Analysis and Synthesis of Parallel Systems

Conflict adjacency

[Figure: conflict adjacency extends transitively along a chain of threads … t2, t, t3 …]

Page 93: Program Analysis and Synthesis of Parallel Systems

Related work

• Reduction theorems – Guerraoui et al. [PLDI'08, CONCUR'08]
• Manually-supplied invariants – fixed number of threads and variables + PVS – Cohen et al. [FMCAS'07]
• Predicate abstraction + shape analysis – SLAM, BLAST, TVLA

Page 94: Program Analysis and Synthesis of Parallel Systems

Related work

• Invisible invariants – Arons et al. [CAV'01], Pnueli et al. [TACAS'01]
• Templates – for arithmetic constraints
• Indexed predicate abstraction – Shuvendu et al. [TOCL'07]
• Thread quantification – Berdine et al. [CAV'08], Segalov et al. [APLAS'09]

Page 95: Program Analysis and Synthesis of Parallel Systems


Thank You!

