
University of Maryland Mining Source Code Change History for Program Understanding Chadd Williams.

Date post: 14-Dec-2015

University of Maryland

Mining Source Code Change History for Program Understanding

Chadd Williams


Problem

How much do you know about your 10-year-old code base?
– What types of bugs have been most common?

Implicit rules build up over time
– What do you do with a return value from a function?
– Didn’t someone rewrite the matrix objects?
• How do you apply a transformation to an image now?

Failure to understand implicit rules leads to bugs
– 32% of bugs detected during maintenance [1]

[1] Matsumura, T., Monden, A., Matsumoto, K., The Detection of Faulty Code Violating Implicit Coding Rules, IWPSE ’02


Source Code Change History

We can discover important properties of the code by looking at code changes
– every change is committed
– changes highlight misunderstood code
– changes highlight new code

Studying each commit gives fine-grained knowledge
– how quickly does a property emerge?
– how fast is a property adopted?
– how often is it used later?


Applications

Bug finding
– what types of bugs have been fixed in the past?
– what functions were involved?
– Return Value Check Bug Finder

Code writing
– how do we use that API?
– how do we access that data structure?
– Function Usage Pattern Miner

    open(f)
    tmp = cnt = 0
    while(cnt < sz && tmp != -1)
        tmp = read(f,sz)
        if(tmp != -1)
            cnt += tmp
    close(f)


Return Value Check Bug

Returning error code and valid data from a function is a common C idiom

    int foo() {
        …
        if( error ) {
            return error_code;
        }
        …
        return data;
    }

    …
    value = foo();
    newPosition += value; // ???

– the return value should be checked before being used
– lint checks for this error

This type of bug pattern has a high false positive rate
– no error value returned
– no useful return value

Build a bug checker
– improve its results with data from CVS


Goal

Which warnings are most likely true errors?
– where has the source code been changed to add such a check?
– look at each revision of each file in CVS
– flag a function as involved in a return value check in the CVS repository

Produce a ranking of the errors
– group warnings by called function
– rank functions that most likely need their return value checked higher

    value = foo();
    newPosition += value; // ???

CVS commit:

    value = foo();
    if( value != Error) // Check
        newPosition += value;
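The flagging step can be sketched roughly as follows. This is a minimal sketch, not the tool itself: it assumes call sites have already been extracted from each revision, and the `CallSite` record, its `checked` flag, and the matching of call sites by enclosing function name are all hypothetical simplifications.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one call site in one revision of a file. */
typedef struct {
    const char *caller;   /* enclosing function */
    const char *callee;   /* function whose return value is used */
    bool checked;         /* true if the return value is tested before use */
} CallSite;

/* Returns true if `callee` has a call site that was unchecked in the old
   revision and is checked in the new revision (a likely bug fix).
   Call sites are matched by caller name, which is a simplification. */
bool check_added(const CallSite *old_rev, size_t n_old,
                 const CallSite *new_rev, size_t n_new,
                 const char *callee)
{
    for (size_t i = 0; i < n_old; i++) {
        if (old_rev[i].checked || strcmp(old_rev[i].callee, callee) != 0)
            continue;
        for (size_t j = 0; j < n_new; j++) {
            if (new_rev[j].checked &&
                strcmp(new_rev[j].callee, callee) == 0 &&
                strcmp(new_rev[j].caller, old_rev[i].caller) == 0)
                return true;
        }
    }
    return false;
}
```

Running this over every pair of consecutive revisions of every file would yield the set of functions flagged with a likely bug fix.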


HistoryAware Ranking

Split functions into two groups
– flagged with a likely bug fix in a commit
– not flagged with a likely bug fix in a commit

Rank by how often the function’s return value is checked in the latest version
– current context

[Figure: two lists, each ranked by current context data. Flagged with likely bug fix in CVS: 0.99 … 0.10. Not flagged with likely bug fix in CVS: 0.99 … 0.51.]
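One way to express this two-level ordering is a sort comparator. The sketch below is an illustration under stated assumptions: the `FuncStats` record and its field names are hypothetical, and placing the flagged group ahead of the unflagged group is our reading of the slides.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-function summary. */
typedef struct {
    const char *name;
    bool flagged;        /* a CVS commit added a check on its return value */
    int checked_calls;   /* call sites in the latest version that check it */
    int total_calls;     /* all call sites in the latest version */
} FuncStats;

/* Current context: fraction of call sites that check the return value. */
static double checked_ratio(const FuncStats *f)
{
    return f->total_calls > 0 ? (double)f->checked_calls / f->total_calls : 0.0;
}

/* qsort comparator: flagged functions first, then higher ratio first. */
static int history_aware_cmp(const void *a, const void *b)
{
    const FuncStats *x = a, *y = b;
    if (x->flagged != y->flagged)
        return x->flagged ? -1 : 1;
    double rx = checked_ratio(x), ry = checked_ratio(y);
    return (rx < ry) - (rx > ry);
}

void history_aware_rank(FuncStats *funcs, size_t n)
{
    qsort(funcs, n, sizeof funcs[0], history_aware_cmp);
}
```

A naive ranking would drop the `flagged` comparison and sort on the ratio alone.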


Case Studies

Does the HistoryAware ranking push likely bugs to the top?

Compare HistoryAware Ranking to Naïve Ranking
– current context

Inspection criteria for warnings
– functions flagged with a bug fix in a commit
– functions with return value checked >50% in current context

Apache web server
– 1,129 C source files
– 41,000 CVS commits

Wine, OSS Windows API
– 3,099 C source files
– 70,000 CVS commits


Results - Apache

                                    Warnings   Likely Bugs   False Positive Rate
CVS Bug Fix flagged functions       284        101           64%
Non-CVS Bug Fix flagged functions   283        70            75%
Total                               567        171           70%

[Figure: Precision (0 to 0.8) versus Inspected Warnings (0 to 120) for Naive Ranking and HistoryAware Ranking.]

Statistical Significance
– Chi-square test finds the difference between the false positive rate of the CVS bug fix flagged functions and functions checked > 50% in the current context to be significant
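The arithmetic behind these figures can be reproduced with a short sketch. It assumes the comparison is a 2x2 table of likely bugs against false positives for the two rows of the Apache table (101 and 183 against 70 and 213); the slides do not state the exact table or whether a continuity correction was applied, so treat this as an illustration only. Under that assumption the statistic comes out near 7.9, above the 3.84 threshold for significance at the 0.05 level with one degree of freedom.

```c
/* False positive rate: fraction of inspected warnings that were not
   judged to be likely bugs. */
double false_positive_rate(int warnings, int likely_bugs)
{
    return (double)(warnings - likely_bugs) / warnings;
}

/* Pearson chi-square statistic (no continuity correction) for the table
       group 1: a likely bugs, b false positives
       group 2: c likely bugs, d false positives */
double chi_square_2x2(double a, double b, double c, double d)
{
    double n = a + b + c + d;
    double diff = a * d - b * c;
    return n * diff * diff / ((a + b) * (c + d) * (a + c) * (b + d));
}
```

For example, `false_positive_rate(284, 101)` gives roughly 0.64, matching the 64% reported for the CVS bug fix flagged functions.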


Results - Wine

                                    Warnings   Likely Bugs   False Positive Rate
CVS Bug Fix flagged functions       778        260           67%
Non-CVS Bug Fix flagged functions   1537       285           81%
Total                               2315       545           76%

Statistical Significance
– Chi-square test finds the difference between the false positive rate of the CVS bug fix flagged functions and functions checked > 50% in the current context to be significant

[Figure: Precision (0 to 1) versus Inspected Warnings (0 to 160) for Naive Ranking and HistoryAware Ranking.]


Function Usage Pattern Miner

System-specific rules that source code must follow

Function Usage Pattern
– how functions are invoked with respect to each other in the source code

Find new instances of patterns added to the source code

Pattern types: Called After, Conditionally Called After

    mdi = HeapAlloc(GetProcessHeap());
    if (!mdi)
        HeapFree(GetProcessHeap(), 0, cs);

    HDC hdc = BeginPaint( hwnd, &ps );
    if( hdc )
        DrawIcon( hdc, x, y, hIcon );
    EndPaint( hwnd, &ps );


Our Tool

Analyze each revision of each file
– record instances of the function usage patterns

Find new instances of the patterns
– instances of a pattern in a revision of a file where that instance was not found in the revision immediately prior
– per file, not per function
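Finding new instances amounts to a per-file set difference between consecutive revisions. The sketch below assumes each revision has already been reduced to a list of pattern records; the `Pattern` struct and function names are hypothetical, and repeated instances within one file are not counted separately, which is a simplification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: `second` is called after `first` somewhere in a file. */
typedef struct {
    const char *first;
    const char *second;
} Pattern;

static bool contains(const Pattern *set, size_t n, const Pattern *p)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i].first, p->first) == 0 &&
            strcmp(set[i].second, p->second) == 0)
            return true;
    return false;
}

/* Copies into `out` every pattern present in `cur` (this revision of the
   file) but absent from `prev` (the revision immediately prior).
   Returns the count. `out` must have room for n_cur entries. */
size_t new_instances(const Pattern *prev, size_t n_prev,
                     const Pattern *cur, size_t n_cur,
                     Pattern *out)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_cur; i++)
        if (!contains(prev, n_prev, &cur[i]))
            out[n_out++] = cur[i];
    return n_out;
}
```

Because the comparison is per file, a pattern moved from one function to another within the same file is not reported as new.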


Filtering

Lots of instances identified in the Wine software repository
– 50 million

Preliminary filtering heuristic
– only look at pairs of functions that are separated by no more than 10 source lines
• minimal control flow information computed
– many APIs contain functions that are called in quick succession
– error handling code is close to the error producing function
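The 10-line heuristic could look something like the sketch below. It assumes the calls in a file are available as records sorted by line number; the `Call` and `CallPair` structs and the `nearby_pairs` function are hypothetical names used only for illustration.

```c
#include <stddef.h>

#define MAX_LINE_GAP 10   /* heuristic threshold from the slide */

/* Hypothetical record for one function call found in a source file. */
typedef struct {
    const char *callee;
    int line;
} Call;

typedef struct {
    const char *first;    /* called earlier */
    const char *second;   /* called after `first` */
} CallPair;

/* `calls` must be sorted by line number. Writes every ordered pair whose
   calls are at most MAX_LINE_GAP lines apart into `out` (up to `cap`
   entries) and returns the number written. */
size_t nearby_pairs(const Call *calls, size_t n, CallPair *out, size_t cap)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (calls[j].line - calls[i].line > MAX_LINE_GAP)
                break;   /* sorted input: later calls are even farther away */
            if (n_out == cap)
                return n_out;
            out[n_out].first = calls[i].callee;
            out[n_out].second = calls[j].callee;
            n_out++;
        }
    }
    return n_out;
}
```

Line distance is a cheap stand-in for control flow, so this keeps pairs such as an API call and its nearby error handler while discarding unrelated calls far apart in the file.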


Transitive Patterns

called after may be a transitive pattern
– only a binary pattern
– allow larger patterns to be built
– may need to add more context information

Patterns Identified
– SelectObject called after BeginPaint
– SetTextColor called after SelectObject
– TextOutA called after SetTextColor
– DeleteObject called after TextOutA
– EndPaint called after DeleteObject
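Chaining the binary patterns into a larger one can be sketched as a simple walk. This is an assumption about how such chaining might work, not a description of the tool: the `CalledAfter` struct and `build_chain` function are hypothetical, and the walk follows only the first matching pattern at each step.

```c
#include <stddef.h>
#include <string.h>

/* One binary pattern: `second` is called after `first`. */
typedef struct {
    const char *first;
    const char *second;
} CalledAfter;

/* Starting at `start`, repeatedly follows the first pattern whose `first`
   matches the current function, writing the visited names into `chain`.
   Stops at `cap` names or when no pattern continues the chain, which also
   bounds the walk if the patterns form a cycle. Returns the chain length. */
size_t build_chain(const CalledAfter *pats, size_t n, const char *start,
                   const char **chain, size_t cap)
{
    size_t len = 0;
    const char *cur = start;
    while (cur != NULL && len < cap) {
        chain[len++] = cur;
        const char *next = NULL;
        for (size_t i = 0; i < n; i++) {
            if (strcmp(pats[i].first, cur) == 0) {
                next = pats[i].second;
                break;
            }
        }
        cur = next;
    }
    return len;
}
```

Applied to the five patterns listed above and started at BeginPaint, the walk visits six functions and ends at EndPaint.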


Preliminary Case Study

Mined Wine CVS repository
– 2,175 unique patterns added to the code 10 or more times
– 65 unique patterns added 100 or more times

Different categories of function pairs
– Debug functionality
– Heap management
– Paired functionality
– Error Handling

    wine_tsx11_lock();
    XInternAtoms(thr_dis(), names, cnt, 0, atoms );
    wine_tsx11_unlock();

    if (RegOpenKeyA(HKEY, name, &key)) {
        TRACE(message);
        RegCloseKey(key);
        SetLastError(NOT_FOUND);
    }


Called After Pattern

Category               New Instances
                       > 99    99 - 25    24 - 10
Debug                  17      80         278
Heap                   14      16         16
GUI                    3       22         271
Paired Functionality   0       8          39
Error Handling         0       9          30

1,253 unique patterns added 10 or more times

    wndClass.hCursor = LoadCursorA (0, (LPSTR)IDC_ARROW);
    RegisterClassA (&wndClass);

Obvious patterns
– serve to validate our results

Surprising patterns
– point to interesting relationships between functions

    RtlDeleteCriticalSection(&det->waiters_count_lock);
    …
    HeapFree(GetProcessHeap(), 0, det);


Conditionally Called After

922 unique patterns added 10 or more times

Category               New Instances
                       > 99    99 - 25    24 - 10
Debug                  14      95         341
Heap                   7       8          11
Paired Functionality   0       6          26
Error Handling         0       3          34

    if (!(hModule = LoadLibraryExA(fileName, 0, LLDF)))
        WINE_ERR("LoadLibraryExA (%s) failed, %ld\n", fileName, GetLastError());

Error handling code
– conditionally report error
– which functions need errors handled

Debug code
– conditionally call a debug function


RtlHeapFree Called After RtlHeapAlloc

Value: 8

dlls/kernel/heap.c
dlls/ntdll/loader.c


Future Work

Apply our tool to more projects

Track removed usage patterns

Better filtering heuristic
– control flow based
– data flow based

How do we use the patterns we find?
– documentation
– feed patterns to static source code checkers to find violations

    hdc = BeginPaint( hwnd, &ps );
    if( hdc )
        DrawIcon( hdc, x, y, hIcon );
    EndPaint( hwnd, &ps );


Demo

Demo of the visualization tool tomorrow

