Date posted: 02-Jul-2015
Abhik Roychoudhury
National University of Singapore
ISSTA 2013 Workshop - July 2013
HOW SYMBOLIC REASONING CAN HELP PROGRAM (DEBUGGING AND) REPAIR
DEBUGGING VS. BUG HUNTING
(a) Debugging: program P run with input = 0 produces output = 0, but we should have (output > input).
(b) Model checking: program P and the property G(pc = end ⇒ output > input) are given to a model checker, which returns the counter-example input = 0, output = 0.
EXECUTION WITH SYMBOLIC INPUTS
Program P: out = in + 1
Program Q: out = in * 2
Symbolic input (to both P and Q): in == α
Output of P: out == α + 1
Output of Q: out == 2 * α
To expose the difference, try to find α such that α + 1 ≠ 2 * α
PATH CONDITION COMPUTATION
1 input in;
2 z = 0; x = 0;
3 if (in > 0){
4 z = in * 2;
5 x = in + 2;
6 x = x + 2;
7 }
8 else …
9 if (z > x){
return error;
}
Concrete input: in == 5
Line#  Assignment store          Path condition
1      {}                        true
2      {(z,0), (x,0)}            true
3      {(z,0), (x,0)}            in > 0
4      {(z,2*in), (x,0)}         in > 0
5      {(z,2*in), (x,in+2)}      in > 0
6      {(z,2*in), (x,in+4)}      in > 0
7      {(z,2*in), (x,in+4)}      in > 0
9      {(z,2*in), (x,in+4)}      in > 0 ∧ (2*in > in + 4)
Using the assignment store, we can also compute a symbolic
expression for the output along each path.
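The final path condition can be cross-checked against a concrete run with a small C sketch. It assumes the elided else branch at line 8 leaves z and x unchanged; the function names are illustrative and do not come from the slides.

```c
#include <stdbool.h>

/* Concrete execution of the program above. Returns true when the error
   branch guarded by (z > x) at line 9 is taken.
   Assumption: the elided else branch at line 8 does nothing. */
bool reaches_error(int in) {
    int z = 0, x = 0;
    if (in > 0) {
        z = in * 2;
        x = in + 2;
        x = x + 2;
    }
    return z > x;
}

/* Path condition accumulated along the path of in == 5:
   in > 0 && 2*in > in + 4. */
bool path_condition(int in) {
    return in > 0 && 2 * in > in + 4;
}
```

For in == 5 both functions return true. Any other input that satisfies path_condition follows the same path and reaches the same error.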
USAGE OF DSE
input x, y;
a = 0; b = 0;
if (x > y)
a = x;
else
a = y;
if (x + y > 10)
b = a;
return b;
Passing inputs: continue the search for failing inputs, namely those which do not go through the “same” path.
Concrete input: x == 0, y == 0
Path condition of (x == 0, y == 0): x ≤ y ∧ x + y ≤ 10
[Figure: control-flow graph with branch x > y (a = x / a = y), branch x + y > 10 (b = a), and return b]
Cover more paths by negating branch constraints:
x ≤ y ∧ x + y ≤ 10
x ≤ y ∧ ¬(x + y ≤ 10)
¬(x ≤ y)
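As a minimal sketch of this exploration, the C code below runs the program and records which path an input takes. The inputs (0, 11) and (1, 0) are hand-picked stand-ins for what a constraint solver might return after negating one branch constraint of x ≤ y ∧ x + y ≤ 10; path_id is a hypothetical helper, not part of any DSE tool.

```c
/* The program from the slide. */
int program(int x, int y) {
    int a = 0, b = 0;
    if (x > y)
        a = x;
    else
        a = y;
    if (x + y > 10)
        b = a;
    return b;
}

/* Encodes the path taken: bit 0 is set if (x > y) holds,
   bit 1 is set if (x + y > 10) holds. */
int path_id(int x, int y) {
    return (x > y ? 1 : 0) | (x + y > 10 ? 2 : 0);
}
```

The seed input (0, 0) takes path 0. Input (0, 11) keeps x ≤ y but flips the second branch (path 2), and input (1, 0) flips the first branch (path 1), so the three inputs exercise three different paths.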
IMPLICIT ASSUMPTION IN DSE
Inputs executing a path are “similar”.
If we test one of them, no need to test the others.
Use “similarity” to skip over parts of a large search
space.
DSE is a tool to achieve this goal.
Testing is search over paths, not search over inputs.
Coarser-grained notion of similarity?
THE SEARCH FOR “SIMILARITY”
Testing
No need to test “similar” inputs.
Can look for “similarity” beyond paths.
Debugging
Given a failing input – find “similar” inputs that pass
Logical comparison to detect “deviations” – bug report.
Repair
Find “similar” inputs showing the “same” error
Group all executions through which a failing execution is rescued.
Symbolic execution used to capture “intended behavior”.
“SIMILARITY” BEYOND PATHS
1 int x,y,z; // input variables
2 int out; // output variable
3 int a;
4 int b = 2;
5 if(x - y > 0) //b1
6 a = x;
7 else
8 a = y;
9 if (x + y > 10) //b2
10 b = a;
11 if(z*z > 3) //b3
12 printf("square(z) > 3 \n");
13 else
14 printf("square(z) <= 3 \n");
15 out = b; //slicing criteria
If x − y > 0 and x + y > 10, then out == x
Paths: 1,2,3,4,5,6,9,10,11,12,15
       1,2,3,4,5,6,9,10,11,13,14,15
If x − y ≤ 0 and x + y > 10, then out == y
Paths: 1,2,3,4,5,7,8,9,10,11,12,15
       1,2,3,4,5,7,8,9,10,11,13,14,15
If x + y ≤ 10, then out == 2
Paths: 1,2,3,4,5,6,9,11,12,15
       1,2,3,4,5,6,9,11,13,14,15
       1,2,3,4,5,7,8,9,11,12,15
       1,2,3,4,5,7,8,9,11,13,14,15
PROGRAM SUMMARY
¬(x + y > 10) ⇒ (out == 2)
(x − y > 0) ∧ (x + y > 10) ⇒ (out == x)
¬(x − y > 0) ∧ (x + y > 10) ⇒ (out == y)
Group inputs which produce the same symbolic output.
- Efficient testing, and debugging
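A short C sketch can make the grouping concrete: the three-case summary computes the same output as the full program, whatever value z takes. The names program_out and summary_out are illustrative only.

```c
#include <stdio.h>

/* The example program; z only influences what is printed. */
int program_out(int x, int y, int z) {
    int out;
    int a;
    int b = 2;
    if (x - y > 0)
        a = x;
    else
        a = y;
    if (x + y > 10)
        b = a;
    if (z * z > 3)
        printf("square(z) > 3 \n");
    else
        printf("square(z) <= 3 \n");
    out = b;
    return out;
}

/* The program summary: one case per symbolic output. */
int summary_out(int x, int y) {
    if (!(x + y > 10))
        return 2;      /* out == 2 */
    if (x - y > 0)
        return x;      /* out == x */
    return y;          /* out == y */
}
```

For example, program_out(8, 5, 0) and program_out(8, 5, 9) take different paths through the z branch, yet both agree with summary_out(8, 5), which is 8.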
RELEVANT SLICE CONDITION
1 int x,y,z; //input variables
2 int out; // output variable
3 int a;
4 int b = 2;
5 if(x - y > 0) //b1
6 a = x;
7 else
8 a = y;
9 if (x + y > 10) //b2
10 b = a;
11 if(z*z > 3) //b3
12 printf("square(z)>3 \n");
13 else
14 printf("square(z)<=3\n");
15 out = b; //slicing criteria
Relevant slice condition: path condition computed over the relevant slice
Relevant slicing: backward dynamic slicing over control, data and potential dependence
Precisely captures the input-output relationship
Groups several paths together
PROPERTIES
t, t’ program inputs
π(t): execution trace of input t
RSC(π(t)): relevant slice condition computed on π(t)
Same symbolic output:
Given a path π(t), if an input t’ satisfies RSC(π(t)), then
RSC(π(t’)) is the same as RSC(π(t)). π(t) and π(t’)
compute the same symbolic output.
Complete RSC coverage:
Path exploration (based on reordered RSC) can explore all
symbolic outputs.
Property for path condition: Suppose f = (ψ1 ∧ ψ2 ∧ … ∧ ψm) is a path condition. If t’ satisfies (ψ1 ∧ ψ2 ∧ … ∧ ψi) for some i ≤ m, then the path condition for π(t’) contains (ψ1 ∧ ψ2 ∧ … ∧ ψi) as a prefix.
However, this does NOT hold for the relevant slice condition, making the exploration completely out of order.
(EXPECTED) VALIDATION
[Figure: two bar charts, “Paths explored” and “Average formula size”; the only surviving series label is “Relevant Slice Condition”]
REGRESSION DEBUGGING
[Figure: the same test input t is run on the old stable program P and on the new buggy program P’]
ADAPTING TRACE COMPARISON
Directly Compare σ and π
[Figure: test input t follows path σ in the old stable program P and path π in the new buggy program P’; a new input t’ is also shown]
THE SEARCH FOR “SIMILARITY”
[Figure: old program P and new program P’, with the buggy input and the new test input]
DARWIN
Inputs: old stable program P, new buggy program P’, test input t
Concrete and symbolic execution computes:
f: path condition of t in P
f’: path condition of t in P’
STP solver and input validation: satisfiable sub-formulae from f ∧ ¬f’ give an alternative input t’
Bug report (assembly level)
Bug report (source level)
CHOOSING ALTERNATIVE INPUTS
[Figure: execution tree with branches b1 … b6, labeled with constraints ψ1 … ψ5]
Solve f ∧ ¬f’, where f’ = (ψ1 ∧ ψ2 ∧ … ∧ ψm)
Check for satisfiability of
f ∧ ¬ψ1
f ∧ ψ1 ∧ ¬ψ2
f ∧ ψ1 ∧ ψ2 ∧ ¬ψ3
…
At most m alternate inputs!
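The enumeration of sub-formulae can be sketched in C as follows. This is only an illustration: the constraints are invented single-integer predicates, and an exhaustive search over a small range stands in for the STP solver. It shows why at most m alternative inputs are produced, one per branch constraint of f’.

```c
#include <stdbool.h>
#include <stddef.h>

typedef bool (*pred)(int);

/* Hypothetical example constraints (not taken from the slides). */
bool f_old(int x) { return x > 0; }       /* f: path condition of t in P  */
bool psi1(int x)  { return x > 0; }       /* f' = psi1 && psi2 && psi3    */
bool psi2(int x)  { return x < 50; }
bool psi3(int x)  { return x % 2 == 0; }

/* Brute-force stand-in for the solver: looks for x in [lo, hi] with
   f(x) && psi[0..i-1](x) && !psi[i](x). Stores x and returns true on success. */
bool solve_deviation(pred f, const pred *psi, size_t i, int lo, int hi, int *out) {
    for (int x = lo; x <= hi; x++) {
        bool ok = f(x);
        for (size_t k = 0; ok && k < i; k++)
            ok = psi[k](x);
        if (ok && !psi[i](x)) {
            *out = x;
            return true;
        }
    }
    return false;
}

/* Tries each of the m sub-formulae once, so it yields at most m inputs.
   branch[n] records which constraint was negated for inputs[n]. */
size_t alternative_inputs(pred f, const pred *psi, size_t m, int lo, int hi,
                          int *inputs, size_t *branch) {
    size_t n = 0;
    for (size_t i = 0; i < m; i++) {
        int x;
        if (solve_deviation(f, psi, i, lo, hi, &x)) {
            inputs[n] = x;
            branch[n] = i;
            n++;
        }
    }
    return n;
}
```

With these example constraints and the range [-100, 100], the first sub-formula is unsatisfiable, the second yields x = 50, and the third yields x = 1, so two alternative inputs are produced for m = 3.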
BUG REPORT COMPUTATION
[Figure: execution tree with branches b1 … b6, labeled with constraints ψ1 … ψ5; the traces of tbug and tnew are marked]
Solve f ∧ ¬f’, where f’ = (ψ1 ∧ ψ2 ∧ … ∧ ψm)
tnew = input obtained by solving f ∧ ψ1 ∧ ψ2 ∧ ¬ψ3
Bug report obtained by comparing the traces of tbug and tnew should be the branch b3!
At most m alternate inputs ⇒ at most m lines in the bug report.
COARSER-GRAINED “SIMILARITY”
[Figure: old program P and new program P’, with the buggy input and the new test input]
Solve rsc ∧ ¬rsc’ instead of f ∧ ¬f’
rsc, rsc’: relevant slice conditions
RESULTS ON DARWIN
Time
Program    Path condition   Relevant slice condition
JLex       543 min          15 min
Jtopas     81 min           5 min
NanoXML    3 min            43 s
Results
Program    Path condition   Relevant slice condition
JLex       50 LOC           3 LOC
Jtopas     4 LOC            4 LOC
NanoXML    8 LOC            6 LOC
Less time, better result
Smaller formulae and fewer formulae to solve -> more accurate bug report, obtained faster.
IF WE ARE INTERESTED IN STATISTICS
Program    Size         Versions             Diff
JLex       ~7290 LoC    v1.2.1 vs. v1.1.1    518 LoC
Jtopas     ~5754 LoC    v0.7 vs. v0.8        2489 LoC
NanoXML    ~5244 LoC    v2.1 vs. v2.2        2496 LoC
Other results in DARWIN paper
First implementation on top of
BitBlaze (thanks to BitBlaze team)
Results on
libPNG – 36K LoC
TCPflow – 1000 LoC
Different implementations of
web-servers
Miniweb, Savant against Apache.
PROGRAM REPAIR
Correctness specification: test suite
Program repair: passing all tests
Repair strategy: rescue failing executions
Use of symbolic execution
Group together all executions through which a failing execution
could be rescued.
New notion of “similarity”
0. THE PROBLEM
1 int is_upward( int inhibit, int up_sep, int down_sep){
2 int bias;
3 if (inhibit)
4 bias = down_sep; // bias= up_sep + 100
5 else bias = up_sep ;
6 if (bias > down_sep)
7 return 1;
8 else return 0;
9 }
inhibit  up_sep  down_sep  Observed output  Expected output  Result
1        0       100       0                0                pass
1        11      110       0                1                fail
0        100     50        1                1                pass
1        -20     60        0                1                fail
0        0       10        0                0                pass
1. FIND A SUSPECT
1 int is_upward( int inhibit, int up_sep, int down_sep){
2 int bias;
3 if (inhibit)
4 bias = down_sep; // bias= up_sep + 100
5 else bias = up_sep ;
6 if (bias > down_sep)
7 return 1;
8 else return 0;
9 }
Line Score Rank
4 0.75 1
8 0.6 2
3 0.5 3
6 0.5 3
5 0 5
7 0 5
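The scores in this table are consistent with the Tarantula suspiciousness formula, score = f / (f + p), where f is the fraction of failing tests that execute the line and p is the fraction of passing tests that execute it. The slide does not name the metric, so the choice of formula here is an assumption; the C sketch below recomputes the scores from the five tests shown earlier, with line numbers as on the slide.

```c
typedef struct { int inhibit, up_sep, down_sep, expected; } test_case;

/* The five tests from the slide. */
const test_case TESTS[5] = {
    {1,   0, 100, 0},
    {1,  11, 110, 1},
    {0, 100,  50, 1},
    {1, -20,  60, 1},
    {0,   0,  10, 0},
};

/* Buggy program from the slide (line 4 should be up_sep + 100). */
int is_upward(int inhibit, int up_sep, int down_sep) {
    int bias;
    if (inhibit)
        bias = down_sep;
    else
        bias = up_sep;
    if (bias > down_sep)
        return 1;
    else
        return 0;
}

/* Whether a source line (numbered as on the slide) runs for a test. */
int line_executed(int line, const test_case *t) {
    int out = is_upward(t->inhibit, t->up_sep, t->down_sep);
    switch (line) {
    case 3: case 6: return 1;
    case 4: return t->inhibit != 0;
    case 5: return t->inhibit == 0;
    case 7: return out == 1;
    case 8: return out == 0;
    default: return 0;
    }
}

/* Tarantula-style score (assumed metric): f / (f + p). */
double suspiciousness(int line) {
    int failed_cov = 0, passed_cov = 0, total_failed = 0, total_passed = 0;
    for (int i = 0; i < 5; i++) {
        const test_case *t = &TESTS[i];
        int failed = is_upward(t->inhibit, t->up_sep, t->down_sep) != t->expected;
        int covered = line_executed(line, t);
        if (failed) { total_failed++; failed_cov += covered; }
        else        { total_passed++; passed_cov += covered; }
    }
    double f = total_failed ? (double)failed_cov / total_failed : 0.0;
    double p = total_passed ? (double)passed_cov / total_passed : 0.0;
    return (f + p) > 0.0 ? f / (f + p) : 0.0;
}
```

Line 4 is executed by both failing tests and by one of the three passing tests, giving 1 / (1 + 1/3) = 0.75, the top rank.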
2. WHAT IT SHOULD HAVE BEEN
1 int is_upward( int inhibit, int up_sep, int down_sep){
2 int bias;
3 if (inhibit)
4 bias = down_sep; // bias= up_sep + 100
5 else bias = up_sep ;
6 if (bias > down_sep)
7 return 1;
8 else return 0;
9 }
inhibit  up_sep  down_sep  Observed output  Expected output  Result
1        11      110       0                1                fail
Line 4: inhibit = 1, up_sep = 11, down_sep = 110, bias = X, path condition = true
Line 7: inhibit = 1, up_sep = 11, down_sep = 110, bias = X, path condition = X > 110
Line 8: inhibit = 1, up_sep = 11, down_sep = 110, bias = X, path condition = X ≤ 110
2. WHAT IT SHOULD HAVE BEEN
1 int is_upward( int inhibit, int up_sep, int down_sep){
2 int bias;
3 if (inhibit)
4 bias = f(inhibit, up_sep, down_sep);
5 else bias = up_sep ;
6 if (bias > down_sep)
7 return 1;
8 else return 0;
9 }
Test input: inhibit == 1, up_sep == 11, down_sep == 110
Symbolic execution yields the requirement: f(1,11,110) > 110
3. FIX THE SUSPECT
Accumulated constraints
f(1,11, 110) > 110
f(1,0,100) ≤ 100
…
Find an f satisfying these constraints
By fixing the set of operators appearing in f
Candidate methods:
Search over the space of expressions
Program synthesis with a fixed set of operators (more efficient!)
Generated fix
f(inhibit,up_sep,down_sep) = up_sep + 100
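To illustrate the first candidate method, here is a naive search over a tiny expression space, written as a C sketch. It is not SemFix's component-based synthesis: the template f = variable + constant, the constant range [-200, 200], and all names are assumptions made for this example. The constraints come from the three tests with inhibit == 1, the only ones that execute line 4.

```c
#include <stdbool.h>

typedef struct { int inhibit, up_sep, down_sep, expected; } test_case;

/* Tests from the slide that execute line 4 (inhibit == 1). */
const test_case INHIBIT_TESTS[3] = {
    {1,   0, 100, 0},   /* requires f(1,0,100)   <= 100 */
    {1,  11, 110, 1},   /* requires f(1,11,110)  >  110 */
    {1, -20,  60, 1},   /* requires f(1,-20,60)  >   60 */
};

/* Candidate f = var + c, with var 0 = inhibit, 1 = up_sep, 2 = down_sep. */
int eval_candidate(int var, int c, const test_case *t) {
    int v = (var == 0) ? t->inhibit : (var == 1) ? t->up_sep : t->down_sep;
    return v + c;
}

/* A candidate satisfies the repair constraint if the patched program
   returns the expected output on every test. */
bool satisfies_repair_constraint(int var, int c) {
    for (int i = 0; i < 3; i++) {
        const test_case *t = &INHIBIT_TESTS[i];
        int bias = eval_candidate(var, c, t);
        int out = (bias > t->down_sep) ? 1 : 0;
        if (out != t->expected)
            return false;
    }
    return true;
}

/* Enumerates candidates in a fixed order and returns the first fix found. */
bool search_fix(int *var, int *c) {
    for (int v = 0; v < 3; v++) {
        for (int k = -200; k <= 200; k++) {
            if (satisfies_repair_constraint(v, k)) {
                *var = v;
                *c = k;
                return true;
            }
        }
    }
    return false;
}
```

Within this template the search returns var = 1 and c = 100, that is up_sep + 100, matching the generated fix. No constant works for inhibit or down_sep, because the first two tests then impose contradictory bounds.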
TO RECAPITULATE
Ranked Bug report
Hypothesize the error causes – suspect
Symbolic execution
Specification of the suspicious statement
Input-output requirements from each test
Repair constraint
Program synthesis
Decide operators which can appear in the fix
Generate a fixed statement by solving repair constraint.
WHAT IT SHOULD HAVE BEEN
Buggy program
…
var = a + b - c;   (suspect expression, replaced by an unknown x)
Concrete test input
Concrete execution
Symbolic execution with x as the only unknown
Path conditions, output expressions
EXAMPLE
1 int is_upward( int inhibit, int up_sep, int down_sep){
2 int bias;
3 if (inhibit)
4 bias = f(inhibit, up_sep, down_sep); // X
5 else bias = up_sep ;
6 if (bias > down_sep)
7 return 1;
8 else return 0;
9 }
Test input t: inhibit == 1, up_sep == 11, down_sep == 110
Symbolic execution
Repair constraint for test t:
( ⋁ over j ∈ Paths of (pcj ∧ outj == expected_out(t)) ) ∧ f(t) == X
For this test:
( (X > 110 ∧ 1 == 1) ∨ (X ≤ 110 ∧ 0 == 1) ) ∧ f(1,11,110) == X
TO RECAPITULATE
Ranked Bug report
Hypothesize the error causes – suspect
Symbolic execution
Specification of the suspicious statement
Input-output requirements from each test
Repair constraint
Program synthesis
Decide operators which can appear in the fix
Generate a fix by solving repair constraint.
WHY PROGRAM SYNTHESIS
Instead of solving the repair constraint directly:
Repair constraint: f(1,11,110) > 110 ∧ f(1,0,100) ≤ 100 ∧ f(1,-20,60) > 60
Select primitive components to be used by the synthesized program, based on complexity
Look for a program that uses only these primitive components and satisfies the repair constraint
Where to place each component?
What are the parameters?
Candidate programs:
int tmp = down_sep - 1;
return up_sep + tmp;

int tmp = down_sep + 1;
return tmp - inhibit;

int tmp = down_sep - 1;
return tmp + inhibit;
[Figure: two + components with inputs inhibit and up_sep]
LOCATION VARIABLES
Define location variables for each component
Constraints on location variables are solved by SMT:
Well-formedness, e.g. each value is defined before being used
Output constraint from each test (repair constraint)
Meaning of the components
Lines determine the value: Lx == Ly ⇒ x == y
Once locations are found, the program is constructed.
Example: Components = {+}
Lin == 0, Lout == 1, Lout+ == 1, Lin1+ == 0, Lin2+ == 0
0 r0 = input;
1 r = r0 + r0;
2 return r;
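To show how a solved assignment of location variables determines a program, the C sketch below decodes the example with a single + component. The struct, the field names, and the two-line limit are assumptions made for illustration; a real synthesizer would encode these checks as SMT constraints rather than evaluate them directly.

```c
#include <stdbool.h>

/* Location variables for one input and one '+' component.
   Values are defined at lines 0 and 1; line 2 is the return. */
typedef struct {
    int l_in;        /* line defining the program input          */
    int l_out;       /* line whose value the program returns     */
    int l_out_plus;  /* line defining the output of '+'          */
    int l_in1_plus;  /* line supplying the first operand of '+'  */
    int l_in2_plus;  /* line supplying the second operand of '+' */
} locations;

/* Well-formedness: distinct definition lines, operands defined before
   use, and the returned line is one that is actually defined. */
bool well_formed(locations loc) {
    bool in_range = loc.l_in >= 0 && loc.l_in <= 1 &&
                    loc.l_out_plus >= 0 && loc.l_out_plus <= 1;
    bool distinct = loc.l_in != loc.l_out_plus;
    bool defined_before_use =
        loc.l_in1_plus == loc.l_in && loc.l_in2_plus == loc.l_in &&
        loc.l_in < loc.l_out_plus;
    bool valid_output = loc.l_out == loc.l_in || loc.l_out == loc.l_out_plus;
    return in_range && distinct && defined_before_use && valid_output;
}

/* Runs the program described by a well-formed assignment:
   equal locations carry equal values (Lx == Ly implies x == y). */
int run_program(locations loc, int input) {
    int value[2];
    value[loc.l_in] = input;
    value[loc.l_out_plus] = value[loc.l_in1_plus] + value[loc.l_in2_plus];
    return value[loc.l_out];
}
```

The assignment from the slide, {0, 1, 1, 0, 0}, decodes to r0 = input; r = r0 + r0; return r; so run_program returns twice its input.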
SUBJECTS USED
SIR programs
Subject     LoC    # Versions  Description
TCAS        135    41          Air traffic control
Schedule    304    9           Process scheduler
Schedule2   262    9           Process scheduler
Replace     518    29          Text processing
Grep        9366   2           Text search engine
GNU CoreUtils
Subject     LoC
mknod       183
mkdir       159
mkfifo      107
cp          2272
Repaired by both GP and SEMFIX.
Ours/GP = 0.63 (time)
WHY IS SEMFIX MORE STABLE?
[Figure: number of programs repaired (y-axis, 0 to 45) against number of tests (x-axis, 10 to 50) for TCAS; series: Total, Semfix, GenProg]
Overall: 90 programs from SIR
With 50 tests, SemFix repaired 48/90 and GenProg repaired 16/90.
GenProg's running time is more than 3 times that of SemFix.
Time bound = 4 mins.
TYPE OF BUGS (SIR)
Bug type        Total  SemFix  GenProg
Constant        14     10      3
Arithmetic      14     6       0
Comparison      16     12      5
Logic           10     10      3
Code missing    27     5       3
Redundant code  9      5       2
ALL             90     48      16
EXAMPLE FIXES
enabled = High_Confidence && (Own_Tracked_Alt_Rate <= OLEV); /* && (Cur_Vertical_Sep > MAXALTDIFF); missing code */
Synthesizes missing code
tmp = Up_Separation;
Synthesizes
tmp = ((OtherCapability < Alt_Layer_Value) ? Two_of_Three_Reports_Valid : Cur_Vertical_Sep);
STEPPING BACK, PERSPECTIVE
[Obvious] Level of automation
Never completely ~ Programming environments!
Program synthesis likely to play a useful role.
Is debugging required?
Testing search and repair combined.
Avoid statistical fault localization.
Find the location to fix via symbolic reasoning and
MAXSAT – not clear about quality of repair produced.
Can generate suggestions instead of repairs?
What is a repair (or not) may depend on context.
SPECIFIC APPLICATIONS OF REPAIR
Role-based sanitization of HTML output
XSS attacks – insert scripts into web-pages
Role-based XSS sanitization – reduce false positives
WordPress, a popular blogging application, groups users into roles.
• A user in the author role can create a new post in the blog with most non-code tags permitted.
• An anonymous commenter can use only a few text formatting tags. (S)he cannot insert images, but authors can.
• Neither can insert <script> tag or …
Un-trusted input flows into HTML tag context, but sanitizer applies
changes as function of the user role.
- Weinberger et al, ESORICS 2011.
Given the policy, a hand-in-hand test generation followed by (context-sensitive) repair?
REFERENCES
Path Exploration based on Symbolic Output. Dawei Qi, Hoang D.T. Nguyen, Abhik Roychoudhury. ESEC-FSE 2011. To appear in TOSEM.
DARWIN: An Approach for Debugging Evolving Programs. Dawei Qi, Abhik Roychoudhury, Zhenkai Liang, Kapil Vaswani. ESEC-FSE 2009; TOSEM 21(3), 2012.
SemFix: Program Repair via Semantic Analysis. Hoang D.T. Nguyen, Dawei Qi, Abhik Roychoudhury, Satish Chandra. ICSE 2013.
Co-authors
Dawei Qi, Zhenkai Liang, HDT Nguyen – NUS.
Satish Chandra – IBM.
Kapil Vaswani – MSR.
Collaborator (ongoing)
Prateek Saxena – NUS, Mattia Fazzini (visiting)