Dataflow Analysis for Datarace-Free Programs
(ESOP '11)
Arnab De
Joint work with Deepak D'Souza and Rupesh Nasre
Indian Institute of Science, Bangalore
Why Datarace-Free Programs?
Java, C++, … programs: sequentially consistent semantics for DRF programs ("SC for DRF"), but very weak guarantees for racy programs.
Dataraces are often indicators of bugs.
Verifier: is the program DRF? No: a bug, or memory-model-specific reasoning is required. Yes: use an analysis for DRF programs.
Compiler: assume the program is DRF, perform optimizations, produce optimized code.
Both need: analysis for DRF programs!
Datarace-Free Programs
In an execution, a release action synchronizes-with (sw) all acquire actions on the same variable that come after it.
In an execution, the happens-before (hb) relation is the reflexive, transitive closure of synchronizes-with and program-order.
A program is datarace-free if, in all its SC executions, all conflicting accesses are ordered by happens-before.
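Example (one execution):
Thread 1: t1++; lock l; x = 1; unlock l;
Thread 2: t2++; lock l; x = 2; unlock l;
po edges order the statements within each thread, and an sw edge goes from Thread 1's unlock l to Thread 2's lock l, so the conflicting writes x = 1 and x = 2 are ordered by hb.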
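The definitions above can be checked mechanically for a single SC execution. The sketch below is a minimal illustration, not part of the original work: the `Event` encoding, the one-character `obj` field naming a variable or lock, and `execution_is_race_free` are all hypothetical. It builds po ∪ sw as a boolean matrix, takes the reflexive, transitive closure (Warshall's algorithm) to obtain hb, and checks that every pair of conflicting accesses is hb-ordered. Deciding DRF for a program would require this to hold for every SC execution.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_EVENTS 16

/* Hypothetical encoding of the events of one SC execution. */
typedef enum { READ, WRITE, ACQUIRE, RELEASE } Kind;

typedef struct {
    int thread;  /* thread that performed the event */
    Kind kind;   /* access or synchronization action */
    char obj;    /* variable accessed or lock used */
} Event;

/* hb[i][j] holds iff event i happens-before event j; events are given
   in the order in which the SC execution performs them. */
static bool hb[MAX_EVENTS][MAX_EVENTS];

static void compute_hb(const Event *ev, int n) {
    memset(hb, 0, sizeof hb);
    for (int i = 0; i < n; i++) {
        hb[i][i] = true;                               /* reflexive */
        for (int j = i + 1; j < n; j++) {
            if (ev[i].thread == ev[j].thread)
                hb[i][j] = true;                       /* program order */
            else if (ev[i].kind == RELEASE && ev[j].kind == ACQUIRE &&
                     ev[i].obj == ev[j].obj)
                hb[i][j] = true;                       /* synchronizes-with */
        }
    }
    for (int k = 0; k < n; k++)                        /* transitive closure */
        for (int i = 0; i < n; i++)
            if (hb[i][k])
                for (int j = 0; j < n; j++)
                    if (hb[k][j]) hb[i][j] = true;
}

static bool is_access(Kind k) { return k == READ || k == WRITE; }

/* Conflicting: different threads, same variable, at least one write. */
static bool conflicting(Event a, Event b) {
    return is_access(a.kind) && is_access(b.kind) && a.thread != b.thread &&
           a.obj == b.obj && (a.kind == WRITE || b.kind == WRITE);
}

/* True iff all conflicting accesses of this execution are hb-ordered. */
bool execution_is_race_free(const Event *ev, int n) {
    compute_hb(ev, n);
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (conflicting(ev[i], ev[j]) && !hb[i][j]) return false;
    return true;
}
```

Encoding t1++ and t2++ as writes to distinct variables and running Thread 1 before Thread 2, the sw edge from Thread 1's unlock l to Thread 2's lock l orders x = 1 before x = 2, so the check passes; two writes to x from different threads with no lock between them fail it.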
buf *p; lock l;
p = new (...);
p->data = new (...);
*p->data = VAL;
spawn ("prod"); spawn ("cons");

prod () {
  while (1) {
    lock (l);
    oldv = *p->data;
    free (p->data);
    newv = nextv (oldv);
    p->data = new (...);
    *p->data = newv;
    unlock (l);
  }
}

cons () {
  while (1) {
    lock (l);
    v = *p->data;
    unlock (l);
  }
}
Dataflow Analysis for Concurrent Programs
Existing techniques vs. our technique:
– Existing: kill dataflow facts conservatively. Ours: more precise.
– Existing: track interleavings precisely. Ours: more efficient.
– Existing: handle simple program constructs. Ours: handle modern language constructs.
– Existing: handle simple analyses. Ours: handle more complex analyses.
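[Figure: non-null lvalues annotated at each program point of the producer/consumer program. Both p and p->data are non-null everywhere, except from free (p->data) up to p->data = new (...) in prod, where only p is non-null.]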
buf *p; lock l;
p = new (...);
p->data = new (...);
*p->data = VAL;
spawn ("prod"); spawn ("cons");

prod () {
  while (1) {
    lock (l);
    oldv = *p->data;
    free (p->data);
    unlock (l);
    newv = nextv (oldv);
    lock (l);
    p->data = new (...);
    *p->data = newv;
    unlock (l);
  }
}

cons () {
  while (1) {
    lock (l);
    v = *p->data;
    unlock (l);
  }
}
[Figure: the same annotation for the modified program. Only p is non-null from free (p->data) up to the second lock (l) in prod, and also in cons after lock (l).]
Our Algorithm for Lifting Sequential Analyses for Concurrent Programs
Build the sync-CFG: add may-synchronize-with edges from release instructions to the corresponding acquire instructions, if they can run in parallel.
– From a fork to the first instruction of the child thread.
– From unlock to lock instructions on the same lock variable.
– From the last instruction of a child thread to the join instruction waiting for it.
– …
– The edges may need to be over-approximated.
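A minimal sketch of this construction, under assumptions that are not from the slides: instructions are numbered nodes with a hypothetical `Node` encoding, intra-thread (program-order) edges are already present, and the "can run in parallel" test is replaced by its coarsest over-approximation, so every unlock may synchronize with every lock on the same variable.

```c
#include <stdbool.h>

#define MAX_NODES 32

/* Hypothetical node encoding for one instruction of the program. */
typedef enum { PLAIN, LOCK, UNLOCK, FORK, JOIN } NodeKind;

typedef struct {
    NodeKind kind;
    int lock;    /* LOCK/UNLOCK: lock variable; otherwise -1 */
    int target;  /* FORK: child's first node; JOIN: joined thread's last node; otherwise -1 */
} Node;

typedef struct {
    int n;
    Node node[MAX_NODES];
    bool edge[MAX_NODES][MAX_NODES];  /* all sync-CFG edges */
    bool sync[MAX_NODES][MAX_NODES];  /* the may-synchronize-with subset */
} SyncCFG;

static void add_sync_edge(SyncCFG *g, int from, int to) {
    g->edge[from][to] = true;
    g->sync[from][to] = true;
}

/* Adds may-synchronize-with edges to a graph whose program-order edges
   are already in g->edge. All unlock/lock pairs on the same variable are
   assumed to be able to run in parallel (an over-approximation). */
void add_sync_edges(SyncCFG *g) {
    for (int i = 0; i < g->n; i++) {
        Node a = g->node[i];
        if (a.kind == FORK)
            add_sync_edge(g, i, a.target);      /* fork -> child's first node */
        else if (a.kind == JOIN)
            add_sync_edge(g, a.target, i);      /* child's last node -> join */
        else if (a.kind == UNLOCK)
            for (int j = 0; j < g->n; j++)      /* unlock -> lock, same lock */
                if (g->node[j].kind == LOCK && g->node[j].lock == a.lock)
                    add_sync_edge(g, i, j);
    }
}
```

For the producer/consumer example, this connects each spawn to the first lock (l) of the spawned thread, and each unlock (l) in prod and cons to each lock (l) in both threads.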
Our Algorithm for Lifting Sequential Analyses for Concurrent Programs (2)
Sequential analysis on the sync-CFG:
– Treat the flow function of synchronization instructions as the identity.
– Construct the flow equations on the sync-CFG.
– Compute the least fixed point (lfp) of the flow equations.
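The sketch below runs the lifted analysis on the producer/consumer example. It is a simplified stand-in for the null-pointer analysis, not the authors' implementation: it assumes a gen/kill encoding (one bit per lvalue, out = (in & ~kill) | gen) and tracks the lvalues that may be null or freed, treating freed like null, so join is union. `Graph`, `append`, `solve` and `cons_read_facts` are hypothetical names. Lock, unlock and spawn get the identity flow function, every unlock (l) is connected to every lock (l), and round-robin iteration computes the least fixed point.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 32
#define LV_P    1u  /* bit for lvalue p */
#define LV_DATA 2u  /* bit for lvalue p->data */

/* Flow function of a node: out = (in & ~kill) | gen. Lock, unlock and
   spawn nodes have gen = kill = 0, i.e. the identity. */
typedef struct {
    int n;
    unsigned gen[MAX_NODES], kill[MAX_NODES];
    bool edge[MAX_NODES][MAX_NODES];  /* program-order and may-sync-with edges */
} Graph;

/* Appends a node and, if prev >= 0, a program-order edge prev -> node. */
static int append(Graph *g, int prev, unsigned gen, unsigned kill) {
    int v = g->n++;
    g->gen[v] = gen;
    g->kill[v] = kill;
    if (prev >= 0) g->edge[prev][v] = true;
    return v;
}

/* Least fixed point by round-robin iteration, join = union. in[v] is the
   set of lvalues that may be null or freed on entry to node v; node 0 is
   the program entry, whose incoming facts are `entry`. */
static void solve(const Graph *g, unsigned entry, unsigned in[]) {
    unsigned out[MAX_NODES] = {0};
    memset(in, 0, sizeof(unsigned) * (size_t)g->n);
    for (bool changed = true; changed;) {
        changed = false;
        for (int v = 0; v < g->n; v++) {
            unsigned s = (v == 0) ? entry : 0;
            for (int u = 0; u < g->n; u++)
                if (g->edge[u][v]) s |= out[u];
            unsigned o = (s & ~g->kill[v]) | g->gen[v];
            if (s != in[v] || o != out[v]) {
                in[v] = s;
                out[v] = o;
                changed = true;
            }
        }
    }
}

/* Builds the sync-CFG of the producer/consumer example and returns the
   facts where cons reads *p->data. If split is true, prod releases l
   around nextv (oldv), as in the modified program. */
unsigned cons_read_facts(bool split) {
    Graph g;
    memset(&g, 0, sizeof g);
    int v = append(&g, -1, 0, LV_P);           /* p = new (...)        */
    v = append(&g, v, 0, LV_DATA);             /* p->data = new (...)  */
    v = append(&g, v, 0, 0);                   /* *p->data = VAL       */
    int sp = append(&g, v, 0, 0);              /* spawn ("prod")       */
    int sc = append(&g, sp, 0, 0);             /* spawn ("cons")       */

    int lk1 = append(&g, -1, 0, 0), lk2 = -1, ul1 = -1;  /* prod: lock (l) */
    g.edge[sp][lk1] = true;                    /* fork edge            */
    v = append(&g, lk1, 0, 0);                 /* oldv = *p->data      */
    v = append(&g, v, LV_DATA, 0);             /* free (p->data)       */
    if (split) v = ul1 = append(&g, v, 0, 0);  /* unlock (l)           */
    v = append(&g, v, 0, 0);                   /* newv = nextv (oldv)  */
    if (split) v = lk2 = append(&g, v, 0, 0);  /* lock (l)             */
    v = append(&g, v, 0, LV_DATA);             /* p->data = new (...)  */
    v = append(&g, v, 0, 0);                   /* *p->data = newv      */
    int ul2 = append(&g, v, 0, 0);             /* unlock (l)           */
    g.edge[ul2][lk1] = true;                   /* while (1)            */

    int lk3 = append(&g, -1, 0, 0);            /* cons: lock (l)       */
    g.edge[sc][lk3] = true;                    /* fork edge            */
    int rd = append(&g, lk3, 0, 0);            /* v = *p->data         */
    int ul3 = append(&g, rd, 0, 0);            /* unlock (l)           */
    g.edge[ul3][lk3] = true;                   /* while (1)            */

    int locks[] = {lk1, lk2, lk3}, unlocks[] = {ul1, ul2, ul3};
    for (int i = 0; i < 3; i++)                /* unlock -> lock edges */
        for (int j = 0; j < 3; j++)
            if (unlocks[i] >= 0 && locks[j] >= 0)
                g.edge[unlocks[i]][locks[j]] = true;

    unsigned in[MAX_NODES];
    solve(&g, LV_P | LV_DATA, in);
    return in[rd];
}
```

`cons_read_facts (false)` returns 0, so p->data is non-null where cons dereferences it; `cons_read_facts (true)` returns `LV_DATA`, because the state after free (p->data) reaches cons along the new unlock (l) → lock (l) edge.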
Restrictions on Analysis
Value set analysis:
– Collects the set of values of each lvalue at each program point; it loses the correlation between lvalues.
– l := e: evaluate e on the input value sets and update the value set of l.
– if (e): propagate the values that can make e true to the true branch, and similarly for the false branch.
– The join operation is point-wise union.
– Treats aliases conservatively.
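A sketch of these transfer functions, assuming a hypothetical 5-bit value domain {0, ..., 31} (so a value set fits in a `uint32_t`), two lvalues x and y, and arithmetic that wraps modulo 32. Only `l := r + c` and `if (x < k)` are shown, and all names are illustrative.

```c
#include <stdint.h>

/* A value set over the hypothetical domain {0, ..., 31}:
   bit v is set iff value v is possible. */
typedef uint32_t ValueSet;

#define NVARS 2  /* lvalues x (index 0) and y (index 1) */

/* One value set per lvalue; no correlation between lvalues is kept. */
typedef struct { ValueSet vs[NVARS]; } State;

/* Join: point-wise union. */
State vs_join(State a, State b) {
    for (int i = 0; i < NVARS; i++) a.vs[i] |= b.vs[i];
    return a;
}

/* l := r + c, evaluated on every value in r's set; arithmetic is
   assumed to wrap modulo 32. */
State vs_assign_add(State s, int l, int r, unsigned c) {
    ValueSet out = 0;
    for (unsigned v = 0; v < 32; v++)
        if (s.vs[r] & (UINT32_C(1) << v))
            out |= UINT32_C(1) << ((v + c) % 32);
    s.vs[l] = out;
    return s;
}

/* if (x < k): the true branch keeps the values of x below k, the false
   branch the others. A branch that no value can take becomes bottom. */
void vs_branch_less(State s, int x, unsigned k, State *t, State *f) {
    ValueSet below = (k >= 32) ? UINT32_MAX : ((UINT32_C(1) << k) - 1);
    *t = s;
    *f = s;
    t->vs[x] &= below;
    f->vs[x] &= ~below;
    if (t->vs[x] == 0) *t = (State){{0}};
    if (f->vs[x] == 0) *f = (State){{0}};
}
```

Starting from x ∈ {3, 5}, y := x + 1 gives y ∈ {4, 6}. On the true branch of if (x < 4), x ∈ {3} but y is still {4, 6}: this is the loss of correlation between lvalues.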
Restrictions on Analysis (2)
Abstractions of value set analysis:
– An analysis A is an abstraction of VS if there are functions α and γ such that α(lfp of VS) ≤ lfp of A and lfp of VS ≤ γ(lfp of A).
– Examples: null-pointer analysis, interval analysis, constant propagation, may-pointer analysis, …
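To make the condition concrete for null-pointer analysis, here is a sketch of α and γ for a two-bit abstract domain, assuming pointer values are drawn from a hypothetical set {0, ..., 31} with 0 standing for null. Given the value set that VS computes for a pointer at a program point and the abstract value that the null-pointer analysis computes there, `np_abstracts` checks both inequalities of the definition. All names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value set of one pointer lvalue over hypothetical addresses {0, ..., 31}:
   bit v set iff v is possible, with v = 0 standing for null. */
typedef uint32_t ValueSet;

/* Null-pointer abstract values as a 2-bit mask, ordered by inclusion:
   NP_BOT below NP_NULL and NP_NONNULL, both below NP_MAYBE. */
enum { NP_BOT = 0, NP_NULL = 1, NP_NONNULL = 2, NP_MAYBE = 3 };

#define NULL_VALUES    UINT32_C(1)     /* {0}          */
#define NONNULL_VALUES (~UINT32_C(1))  /* {1, ..., 31} */

unsigned np_alpha(ValueSet vs) {
    return ((vs & NULL_VALUES) ? NP_NULL : 0) |
           ((vs & NONNULL_VALUES) ? NP_NONNULL : 0);
}

ValueSet np_gamma(unsigned a) {
    return ((a & NP_NULL) ? NULL_VALUES : 0) |
           ((a & NP_NONNULL) ? NONNULL_VALUES : 0);
}

static bool np_leq(unsigned a, unsigned b) { return (a & ~b) == 0; }
static bool vs_leq(ValueSet x, ValueSet y) { return (x & ~y) == 0; }

/* Both conditions of the definition for one lvalue at one program point:
   vs comes from the lfp of VS, a from the lfp of the abstract analysis. */
bool np_abstracts(ValueSet vs, unsigned a) {
    return np_leq(np_alpha(vs), a) && vs_leq(vs, np_gamma(a));
}
```

If VS finds that p may be null or hold address 4 at some point, a null-pointer result of "non-null" there fails the check, while "may be null" passes.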
Interpreting the Result
We assume that the value set of an lvalue (or its abstraction) is relevant only at the program points where that lvalue is read.
– The result of null-pointer analysis (NPA) matters only where the pointer is dereferenced.
– The result of constant propagation (CP) matters only where the variable is read.
Our result is sound only for the lvalues that are relevant at a given program point.
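One way a client can respect this restriction is to consult the result only where the lvalue is read. The sketch assumes the client knows which lvalues each program point reads and that the result is a may-null bitmask; both encodings and all names are hypothetical. The restriction is needed because, at a point where a thread does not read l, another thread may write l without a race, so the computed value set need not cover every value l holds there.

```c
/* Verdicts a hypothetical null-dereference client might report. */
typedef enum { NOT_RELEVANT, SAFE, MAYBE_NULL } Verdict;

/* reads: bitmask of lvalues read at this program point.
   maybe_null: analysis result at this point (lvalues that may be null).
   l: bit of the lvalue being queried. */
Verdict np_query(unsigned reads, unsigned maybe_null, unsigned l) {
    if ((reads & l) == 0)
        return NOT_RELEVANT;  /* no soundness guarantee for l here */
    return (maybe_null & l) ? MAYBE_NULL : SAFE;
}
```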
Why does it work?
For Value Set analysis:
– The LFP of the sequential analysis over-approximates the join over all paths in the sync-CFG.
– So it is enough to show that if an execution produces a value v for an lvalue l that is relevant at a program point E, then there is a path in the sync-CFG that puts v in VS(l) at E.
Path in Sync-CFG
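W: x = y
R: … = x
• Induction over the length of the execution.
• If R reads the value written by W, the two accesses conflict, so in a DRF program W and R are related by hb.
• hb = (po ∪ sw)*, so some path in the sync-CFG leads from W to R.
• Flow functions of po edges over-approximate the execution behavior.
• Flow functions of sw edges are the identity.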
Context-Sensitive Analysis
Analysis domain: call string -> abstract state.
At a call site c: [s -> a] becomes [sc -> a].
On return to call site c: [sc -> a] becomes [s -> a].
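A sketch of the call-string bookkeeping only (no flow functions), assuming single-character call-site labels, capacity-limited call strings, and bitmask abstract states joined by union. `CtxMap` and the `ctx_*` functions are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_CTX 16  /* capacity limits are assumptions of this sketch */
#define MAX_CS  16  /* call strings must stay shorter than MAX_CS - 1 */

/* A map from call strings to abstract states (bitmasks joined by union).
   "ab" means: called at site a, then at site b. */
typedef struct {
    int n;
    char cs[MAX_CTX][MAX_CS];
    unsigned state[MAX_CTX];
} CtxMap;

/* Adds [s -> a], joining with an existing entry for s. */
void ctx_add(CtxMap *m, const char *s, unsigned a) {
    for (int i = 0; i < m->n; i++)
        if (strcmp(m->cs[i], s) == 0) {
            m->state[i] |= a;
            return;
        }
    assert(m->n < MAX_CTX && strlen(s) < MAX_CS);
    strcpy(m->cs[m->n], s);
    m->state[m->n++] = a;
}

/* Returns the state for s, or 0 if s has no entry. */
unsigned ctx_get(const CtxMap *m, const char *s) {
    for (int i = 0; i < m->n; i++)
        if (strcmp(m->cs[i], s) == 0) return m->state[i];
    return 0;
}

/* At call site c: [s -> a] becomes [sc -> a]. */
CtxMap ctx_call(const CtxMap *m, char c) {
    CtxMap r = {0};
    for (int i = 0; i < m->n; i++) {
        char s[MAX_CS];
        size_t len = strlen(m->cs[i]);
        assert(len + 1 < MAX_CS);
        memcpy(s, m->cs[i], len);
        s[len] = c;
        s[len + 1] = '\0';
        ctx_add(&r, s, m->state[i]);
    }
    return r;
}

/* On return to call site c: [sc -> a] becomes [s -> a]. Entries whose call
   string does not end in c belong to other call sites and are dropped. */
CtxMap ctx_return(const CtxMap *m, char c) {
    CtxMap r = {0};
    for (int i = 0; i < m->n; i++) {
        size_t len = strlen(m->cs[i]);
        if (len > 0 && m->cs[i][len - 1] == c) {
            char s[MAX_CS];
            memcpy(s, m->cs[i], len - 1);
            s[len - 1] = '\0';
            ctx_add(&r, s, m->state[i]);
        }
    }
    return r;
}
```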
Context-Sensitive Analysis for Concurrent Programs
Use a summary component at each may-synchronize-with edge.
Join the states of all contexts at the release and put the result in the summary.
Join the summary with all (non-bottom) states at the acquire.
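A sketch of the summary component on one may-synchronize-with edge, assuming contexts are numbered 0 to MAX_CTX - 1, each context carries a reached flag (non-bottom) and a bitmask of facts, and facts flow along the edge from release to acquire. The release and the acquire typically belong to different threads whose call strings are unrelated, so the summary joins over all contexts instead of matching them one to one. `CtxState`, `summarize_release` and `apply_summary` are hypothetical names.

```c
#include <stdbool.h>

#define MAX_CTX 8

/* Context-sensitive state at one node: for each context, whether it
   reaches the node (non-bottom) and its facts (a bitmask joined by union). */
typedef struct {
    bool reached[MAX_CTX];
    unsigned fact[MAX_CTX];
} CtxState;

/* Summary of one may-synchronize-with edge: the join, over all reached
   contexts, of the state at the release. */
unsigned summarize_release(const CtxState *rel) {
    unsigned s = 0;
    for (int i = 0; i < MAX_CTX; i++)
        if (rel->reached[i]) s |= rel->fact[i];
    return s;
}

/* Joins the summary into every non-bottom context at the acquire; bottom
   contexts are left alone, so the edge does not make them reachable.
   Returns true if some state grew. */
bool apply_summary(CtxState *acq, unsigned summary) {
    bool changed = false;
    for (int i = 0; i < MAX_CTX; i++)
        if (acq->reached[i] && (acq->fact[i] | summary) != acq->fact[i]) {
            acq->fact[i] |= summary;
            changed = true;
        }
    return changed;
}
```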
Results
[Chart: % of dereferences proved safe (0 to 100) for jdbf, jtds and jdbm. Legend labels: our technique / our analysis, sequential analysis, actually safe, all derefs.]
Comparison with RADAR
Sources of Imprecision
Alias analysis, may-happen-in-parallel analysis, …
Representation of multiple dynamic threads by a single static thread.
Paths in the sync-CFG that do not correspond to any real execution.
foo() {
lock l;
x++;
unlock l;
}
main() {
fork(foo);
…
fork(foo);
}
bar() {
lock l;
x++;
unlock l;
}
baz() {
lock l;
x++;
unlock l;
}
Conclusion
A dataflow analysis technique for DRF programs.
Defined the conditions for soundness.
Demonstrated scalability and precision.