University of Maryland
Problem
How much do you know about your 10-year-old code base?
– What types of bugs have been most common?
Implicit rules build up over time
– What do you do with a return value from a function?
– Didn’t someone rewrite the matrix objects?
  • How do you apply a transformation to an image now?
Failure to understand implicit rules leads to bugs
– 32% of bugs detected during maintenance [1]
[1] Matsumura, T., Monden, A., Matsumoto, K., The Detection of Faulty Code Violating Implicit Coding Rules, IWPSE ’02
Source Code Change History
We can discover important properties of the code by looking at code changes
– every change is committed
– changes highlight misunderstood code
– changes highlight new code
Studying each commit gives fine-grained knowledge
– how quickly does a property emerge?
– how fast is a property adopted?
– how often is it used later?
Applications
Bug finding
– what types of bugs have been fixed in the past?
– what functions were involved?
– Return Value Check Bug Finder
Code writing
– how do we use that API?
– how do we access that data structure?
– Function Usage Pattern Miner
    open(f)
    tmp = cnt = 0
    while (cnt < sz && tmp != -1)
        tmp = read(f, sz)
        if (tmp != -1) cnt += tmp
    close(f)
Return Value Check Bug
Returning error code and valid data from a function is a common C idiom
    int foo() {
        …
        if (error) {
            return error_code;
        }
        …
        return data;
    }
    …
    value = foo();
    newPosition += value; // ???
– the return value should be checked before being used
– lint checks for this error
This type of bug pattern has a high false positive rate
– no error value returned
– no useful return value
Build a bug checker
– improve its results with data from CVS
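The idiom on this slide can be made concrete with a small compilable sketch. Everything named here is hypothetical: `ERROR_CODE`, the body of `foo`, and the two `advance_*` helpers are illustrative stand-ins, since the slide elides the real function body.

```c
#define ERROR_CODE (-1)  /* hypothetical sentinel; the slide only says "error_code" */

/* Hypothetical stand-in for foo(): returns ERROR_CODE on failure, data otherwise. */
int foo(int input)
{
    if (input < 0) {
        return ERROR_CODE;
    }
    return input * 2;
}

/* Unchecked use: an error code is silently added to the position. */
int advance_unchecked(int position, int input)
{
    int value = foo(input);
    position += value;
    return position;
}

/* Checked use: the return value is tested before it is used. */
int advance_checked(int position, int input)
{
    int value = foo(input);
    if (value != ERROR_CODE) {
        position += value;
    }
    return position;
}
```

With a failing input, `advance_unchecked(10, -5)` moves the position backwards by one, while `advance_checked(10, -5)` leaves it at 10. This is the kind of difference a return value checker is meant to surface.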
Goal
Which are most likely true errors?
– where has the source code been changed to add such a check?
– look at each revision of each file in CVS
– flag a function as involved in a return value check in the CVS repository
Produce a ranking of the errors
– group warnings by called function
– rank functions that most likely need their return value checked higher
Before the CVS commit:
    value = foo();
    newPosition += value; // ???
After the CVS commit:
    value = foo();
    if (value != Error) // Check
        newPosition += value;
HistoryAware Ranking
Split functions into two groups
– flagged with a likely bug fix in a commit
– not flagged with a likely bug fix in a commit
Rank by how often the function’s return value is checked in the latest version
– current context
[Figure: two groups of functions, "Flagged with likely bug fix in CVS" and "Not flagged with likely bug fix in CVS", each ranked by current context data; example values shown: 0.99, 0.10, 0.99, 0.51]
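One way to read the two-level ranking is as a sort with a compound key: group membership first, then the current-context check ratio. The sketch below assumes that reading. The `func_stat` record and its field names are hypothetical, and the slides do not say how ties or functions with no call sites are handled.

```c
#include <stdlib.h>

/* Hypothetical record for one called function; field names are illustrative. */
struct func_stat {
    const char *name;
    int flagged_in_history;  /* 1 if some commit added a return value check */
    int checked_calls;       /* call sites in the latest version that check the result */
    int total_calls;         /* all call sites in the latest version */
};

/* Fraction of call sites that check the return value (current context). */
double check_ratio(const struct func_stat *f)
{
    return f->total_calls > 0 ? (double)f->checked_calls / f->total_calls : 0.0;
}

/* qsort comparator: flagged group first, then higher check ratio first. */
int history_aware_cmp(const void *a, const void *b)
{
    const struct func_stat *x = a;
    const struct func_stat *y = b;
    if (x->flagged_in_history != y->flagged_in_history) {
        return y->flagged_in_history - x->flagged_in_history;
    }
    double rx = check_ratio(x);
    double ry = check_ratio(y);
    if (rx > ry) return -1;
    if (rx < ry) return 1;
    return 0;
}

/* Sorts funcs in place so the functions most likely to need a check come first. */
void history_aware_rank(struct func_stat *funcs, size_t n)
{
    qsort(funcs, n, sizeof funcs[0], history_aware_cmp);
}
```

Under this ordering a flagged function with a low check ratio still outranks an unflagged function with a high one, which is the difference from ranking by current context alone.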
Case Studies
Does the HistoryAware ranking push likely bugs to the top?
Compare HistoryAware Ranking to Naïve Ranking
– current context
Inspection criteria for warnings
– functions flagged with a bug fix in a commit
– functions with return value checked >50% in current context
Apache web server
– 1,129 C source files
– 41,000 CVS commits
Wine, OSS Windows API
– 3,099 C source files
– 70,000 CVS commits
Results - Apache
                                     Warnings   Likely Bugs   False Positive Rate
CVS Bug Fix flagged functions        284        101           64%
Non-CVS Bug Fix flagged functions    283        70            75%
Total                                567        171           70%
[Figure: precision (0 to 0.8) versus inspected warnings (0 to 120), Naïve Ranking compared with HistoryAware Ranking]
Statistical Significance
– Chi-square test finds the difference between the false positive rate of the CVS bug fix flagged functions and that of functions checked > 50% in the current context to be significant
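The false positive rates in the table follow directly from the other two columns: for the CVS-flagged row, (284 − 101) / 284 ≈ 64%. The sketch below shows that arithmetic together with the standard Pearson chi-square statistic for a 2x2 table. It is an illustration only: the slides do not give the counts for the "checked > 50%" group or say which variant of the test was used, so this is not a reproduction of the reported test.

```c
/* False positive rate: share of warnings that were not judged likely bugs. */
double false_positive_rate(int warnings, int likely_bugs)
{
    if (warnings <= 0) {
        return 0.0;
    }
    return (double)(warnings - likely_bugs) / warnings;
}

/* Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]],
   without continuity correction:
   N * (a*d - b*c)^2 / ((a+b) * (c+d) * (a+c) * (b+d)). */
double chi_square_2x2(double a, double b, double c, double d)
{
    double n = a + b + c + d;
    double denom = (a + b) * (c + d) * (a + c) * (b + d);
    if (denom == 0.0) {
        return 0.0;  /* degenerate table: an empty row or column */
    }
    double diff = a * d - b * c;
    return n * diff * diff / denom;
}
```

To compare two groups of warnings, each row of the table would hold (likely bugs, false positives) for one group. With one degree of freedom, a statistic above about 3.84 corresponds to significance at the 5% level.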
Results - Wine
                                     Warnings   Likely Bugs   False Positive Rate
CVS Bug Fix flagged functions        778        260           67%
Non-CVS Bug Fix flagged functions    1537       285           81%
Total                                2315       545           76%
Statistical Significance
– Chi-square test finds the difference between the false positive rate of the CVS bug fix flagged functions and that of functions checked > 50% in the current context to be significant
[Figure: precision (0 to 1) versus inspected warnings (0 to 160), Naïve Ranking compared with HistoryAware Ranking]
Function Usage Pattern Miner
System-specific rules that source code must follow
Function Usage Pattern
– how functions are invoked with respect to each other in the source code
Find new instances of patterns added to the source code
    mdi = HeapAlloc(GetProcessHeap());
    if (!mdi) HeapFree(GetProcessHeap(), 0, cs);

    HDC hdc = BeginPaint( hwnd, &ps );
    if( hdc ) DrawIcon( hdc, x, y, hIcon );
    EndPaint( hwnd, &ps );
Pattern types: Called After, Conditionally Called After
Our Tool
Analyze each revision of each file
– record instances of the function usage patterns
Find new instances of the patterns
– instances of a pattern in a revision of a file where that instance was not found in the revision immediately prior
– per file, not per function
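The "new instance" rule above amounts to a per-file set difference between consecutive revisions. A minimal sketch follows, assuming each revision of a file has already been reduced to a list of binary patterns. The `pattern` struct and function names are hypothetical, and duplicates within a revision are treated as a single instance, which the slides do not specify.

```c
#include <stddef.h>
#include <string.h>

/* One instance of a binary usage pattern: `second` is called after `first`. */
struct pattern {
    const char *first;
    const char *second;
};

/* Two patterns match when both function names are equal. */
int same_pattern(const struct pattern *a, const struct pattern *b)
{
    return strcmp(a->first, b->first) == 0 && strcmp(a->second, b->second) == 0;
}

/* Linear search; returns 1 if p occurs in set, 0 otherwise. */
int contains_pattern(const struct pattern *set, size_t n, const struct pattern *p)
{
    for (size_t i = 0; i < n; i++) {
        if (same_pattern(&set[i], p)) {
            return 1;
        }
    }
    return 0;
}

/* Copies into out every pattern of the current revision that is absent from
   the immediately prior revision of the same file; returns how many were copied.
   out must have room for at least curr_n entries. */
size_t new_instances(const struct pattern *prev, size_t prev_n,
                     const struct pattern *curr, size_t curr_n,
                     struct pattern *out)
{
    size_t count = 0;
    for (size_t i = 0; i < curr_n; i++) {
        if (!contains_pattern(prev, prev_n, &curr[i])) {
            out[count++] = curr[i];
        }
    }
    return count;
}
```

Because the comparison is per file rather than per function, moving a call pair from one function to another inside the same file would not count as a new instance under this sketch.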
Filtering
Lots of instances identified in the Wine software repository
– 50 million
Preliminary filtering heuristic
– only look at pairs of functions that are separated by no more than 10 source lines
  • minimal control flow information computed
– many APIs contain functions that are called in quick succession
– error handling code is close to the error-producing function
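The 10-line heuristic can be sketched as a windowed pairing over call sites sorted by line number. The types and names below are hypothetical, and the sketch uses raw line distance only, with none of the minimal control flow information the slide mentions.

```c
#include <stddef.h>

#define MAX_LINE_GAP 10  /* from the slide: no more than 10 source lines apart */

/* A call to `name` at source line `line` (hypothetical input format). */
struct call_site {
    const char *name;
    int line;
};

/* Candidate pair: `second` is called after `first`. */
struct call_pair {
    const char *first;
    const char *second;
};

/* calls must be sorted by ascending line number. Writes every ordered pair
   whose line distance is at most MAX_LINE_GAP into out, up to out_cap
   entries, and returns the number of pairs written. */
size_t nearby_pairs(const struct call_site *calls, size_t n,
                    struct call_pair *out, size_t out_cap)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (calls[j].line - calls[i].line > MAX_LINE_GAP) {
                break;  /* input is sorted, so later calls are even farther away */
            }
            if (count < out_cap) {
                out[count].first = calls[i].name;
                out[count].second = calls[j].name;
                count++;
            }
        }
    }
    return count;
}
```

For calls to `BeginPaint`, `DrawIcon`, and `EndPaint` at lines 10, 12, and 14, plus `HeapFree` at line 40, this produces three candidate pairs and drops every pairing with `HeapFree`.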
Transitive Patterns
Called after may be a transitive pattern
– only a binary pattern
– allow larger patterns to be built
– may need to add more context information
Patterns Identified
– SelectObject called after BeginPaint
– SetTextColor called after SelectObject
– TextOutA called after SetTextColor
– DeleteObject called after TextOutA
– EndPaint called after DeleteObject
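If called after is treated as transitive, the binary patterns listed above can be chained into one longer call sequence. The sketch below follows successor links from a starting function. The struct and function names are hypothetical, and taking the first matching successor is a simplification: real data may branch, which is where the extra context information would be needed.

```c
#include <stddef.h>
#include <string.h>

/* Binary pattern: `later` is called after `earlier`. */
struct called_after {
    const char *later;
    const char *earlier;
};

/* Follows called-after links from start and writes the function names in
   call order into chain. Stops when no successor exists or when cap entries
   have been written (the cap also bounds cyclic input). Returns the length. */
size_t build_chain(const struct called_after *pairs, size_t n,
                   const char *start, const char **chain, size_t cap)
{
    size_t len = 0;
    const char *current = start;
    while (current != NULL && len < cap) {
        chain[len++] = current;
        const char *next = NULL;
        for (size_t i = 0; i < n; i++) {
            if (strcmp(pairs[i].earlier, current) == 0) {
                next = pairs[i].later;  /* first matching successor wins */
                break;
            }
        }
        current = next;
    }
    return len;
}
```

Starting from `BeginPaint`, the five pairs on this slide yield a six-function sequence that ends at `EndPaint`.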
Preliminary Case Study
Mined Wine CVS repository
– 2,175 unique patterns added to the code 10 or more times
– 65 unique patterns added 100 or more times
Different categories of function pairs
– Debug functionality
– Heap management
– Paired functionality
– Error handling
    wine_tsx11_lock();
    XInternAtoms(thr_dis(), names, cnt, 0, atoms );
    wine_tsx11_unlock();

    if (RegOpenKeyA(HKEY, name, &key)) {
        TRACE(message);
        RegCloseKey(key);
        SetLastError(NOT_FOUND);
    }
Called After Pattern
                         New Instances
Category                 > 99    99 - 25    24 - 10
Debug                    17      80         278
Heap                     14      16         16
GUI                      3       22         271
Paired Functionality     0       8          39
Error Handling           0       9          30
1,253 unique patterns added 10 or more times
    wndClass.hCursor = LoadCursorA (0, (LPSTR)IDC_ARROW);
    RegisterClassA (&wndClass);
Obvious patterns
– serve to validate our results
Surprising patterns
– point to interesting relationships between functions
    RtlDeleteCriticalSection(&det->waiters_count_lock);
    …
    HeapFree(GetProcessHeap(), 0, det);
Conditionally Called After
922 unique patterns added 10 or more times
                         New Instances
Category                 > 99    99 - 25    24 - 10
Debug                    14      95         341
Heap                     7       8          11
Paired Functionality     0       6          26
Error Handling           0       3          34
    if (!(hModule = LoadLibraryExA(fileName, 0, LLDF)))
        WINE_ERR("LoadLibraryExA (%s) failed, %ld\n", fileName, GetLastError());
Error handling code
– conditionally report error
– which functions need errors handled
Debug code
– conditionally call a debug function
RtlHeapFree Called After RtlHeapAlloc
Value: 8
dlls/kernel/heap.c
dlls/ntdll/loader.c
Future Work
Apply our tool to more projects
Track removed usage patterns
Better filtering heuristic
– control flow based
– data flow based
How do we use the patterns we find?
– documentation
– feed patterns to static source code checkers to find violations
    hdc = BeginPaint( hwnd, &ps );
    if( hdc ) DrawIcon( hdc, x, y, hIcon );
    EndPaint( hwnd, &ps );