Improving Programmer Productivity via Mining Program Source Code Tao Xie Department of Computer Science North Carolina State University http://ase.csc.ncsu.edu/dmse/
Page 1

Improving Programmer Productivity via Mining Program Source Code

Tao Xie

Department of Computer Science

North Carolina State University

http://ase.csc.ncsu.edu/dmse/

Page 2

T. Xie Mining Program Source Code 2

Mining SE Data

• MAIN GOAL
– Transform static record-keeping SE data to active data
– Make SE data actionable by uncovering hidden patterns and trends

(Figure: SE data sources: mailings, Bugzilla, code repository, execution traces, CVS)

Page 3

Overview of Mining SE Data

(Figure: three groups.
Software engineering data: code bases, change history, program states, structural entities, bug reports/NL
Data mining techniques: classification, association/patterns, clustering
Software engineering tasks helped by data mining: programming, defect detection, testing, debugging, maintenance)

Page 4

Overview of Mining SE Data

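(Figure: software engineering data types (code bases, change history, program states, structural entities, bug reports/NL, …) annotated with related publications by year and venue:)

• 1999: ASE; 2000: ICSE; 2005: FSE*2, ASE, PLDI, POPL, OSDI; 2006: PLDI, OOPSLA, KDD; 2007: ICSE*3, FSE*3, ASE, PLDI*2, ISSTA*2, KDD
• 2004: ICSE; 2005: FSE*2; 2006: ASE; 2007: ICSE*2
• 1999: ICSE; 2002: ICSE; 2003: PLDI; 2005: FSE, PLDI; 2006: ISSTA; 2007: ISSTA
• 1999: FSE; 2001: ICSE, FSE; 2002: ISSTA, POPL, KDD; 2003: PLDI; 2004: ASE, ISSTA; 2005: ICSE, ASE; 2006: ICSE, FSE*2; 2007: PLDI
• 2003: ICSE; 2006: ICSE, ASE; 2007: ICSE, SOSP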

Page 6

Overview of Mining SE Data

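(Figure: software engineering tasks helped by data mining (programming, defect detection, testing, debugging, maintenance) annotated with related publications by year and venue:)

• 1999: ASE; 2000: ICSE; 2005: FSE, PLDI, POPL; 2006: FSE, OOPSLA, PLDI; 2007: FSE, ASE, ISSTA, KDD
• 2001: SOSP; 2004: OSDI; 2005: FSE*2; 2006: ICSE*2; 2007: ICSE*2, FSE*2, ISSTA, PLDI*2, SOSP
• 1999: ICSE; 2001: ICSE*2, FSE; 2002: ICSE, ISSTA, POPL; 2004: ISSTA; 2006: ISSTA
• 2003: ICSE, PLDI*2; 2005: ICSE, FSE, ASE, PLDI; 2006: ICSE, FSE; 2007: ICSE, ISSTA, PLDI
• 2002: KDD; 2004: ICSE, ASE; 2005: FSE, ASE*2; 2006: KDD; 2007: ICSE*3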

Page 8

Sample Projects on Mining Program Source Code

Data | Algorithm | Task | Project

• Set of functions, variables, etc. in a C function | Frequent itemset | Programming-rules-related bug finding | UIUC [FSE 05]
• Statement seq in a basic block in C | Frequent subsequence | Copy-paste bug finding | UIUC [OSDI 04]
• Method seq in a Java method from code search engine | Frequent subsequence | API usage patterns | NCSU [MSR 06]
• Function seq in whole C program | Frequent partial order | API usage patterns/properties | NCSU [FSE 07]
• System dependence graph in whole C program | Frequent subgraph | Neglected-condition bug finding | CASE [ISSTA 07]
• Java API method signatures | Plan generation | API Jungloids | Berkeley [PLDI 05]
• Method seq in a Java method from code search engine | Frequent sequences | API Jungloids | NCSU [ASE 07]
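The first row (frequent itemsets over what each C function uses, applied to programming-rule bug finding) can be illustrated with a toy sketch. It is not the UIUC tool: it only counts item pairs, keeps rules "x implies y" whose support and confidence reach caller-chosen thresholds, and reports functions that use x without y as candidate violations. The class PairRuleMiner, its thresholds, and the lock/unlock items are hypothetical; records need Java 16 or later.

```java
import java.util.*;

/** Hypothetical toy miner: pairwise association rules over the items used in each function. */
final class PairRuleMiner {

    /** A rule "lhs implies rhs" with its support count and confidence. */
    record Rule(String lhs, String rhs, int support, double confidence) {}

    /** Counts single items and ordered item pairs, then keeps rules that pass both thresholds. */
    static List<Rule> mineRules(List<Set<String>> itemSets, int minSupport, double minConfidence) {
        Map<String, Integer> single = new HashMap<>();
        Map<List<String>, Integer> pair = new HashMap<>();
        for (Set<String> items : itemSets) {
            for (String x : items) {
                single.merge(x, 1, Integer::sum);
                for (String y : items) {
                    if (!x.equals(y)) pair.merge(List.of(x, y), 1, Integer::sum);
                }
            }
        }
        List<Rule> rules = new ArrayList<>();
        for (Map.Entry<List<String>, Integer> e : pair.entrySet()) {
            String x = e.getKey().get(0);
            String y = e.getKey().get(1);
            int support = e.getValue();
            double confidence = (double) support / single.get(x);
            if (support >= minSupport && confidence >= minConfidence) {
                rules.add(new Rule(x, y, support, confidence));
            }
        }
        rules.sort(Comparator.comparing(Rule::lhs).thenComparing(Rule::rhs));
        return rules;
    }

    /** Indices of item sets that contain the rule's lhs but not its rhs: candidate violations. */
    static List<Integer> violations(List<Set<String>> itemSets, Rule rule) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < itemSets.size(); i++) {
            Set<String> items = itemSets.get(i);
            if (items.contains(rule.lhs()) && !items.contains(rule.rhs())) result.add(i);
        }
        return result;
    }
}
```

For example, if three of four functions that use lock also use unlock, the rule lock → unlock has support 3 and confidence 0.75, and the fourth function is reported as a candidate violation.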

Page 9

Some Recent Trends

• Data: dynamic execution data + static code bases

• Task: productivity (programming) + quality (defect detection, testing, debugging)

• Mining algorithm: simple ones (association rule) + frequent itemset/subsequence/partial order/subgraph

• Data scope: local repositories → public repositories with code search engines

Page 11

Mining API Usage Patterns

• How should an API be used correctly?
– An API may serve multiple functionalities
– Different styles of API usage

• MAPO: “I know what method call I need, but I don’t know how to write code before and after this method call” [Xie&Pei MSR 06]

Page 12

Example Task -- MAPO

• “instrument the bytecode of a Java class by adding an extra method to the class”
– org.apache.bcel.generic.ClassGen: public void addMethod(Method m)

Page 13

First Try: ClassGen Java API Doc

addMethod

public void addMethod(Method m) Add a method to this class.

Parameters:

m - method to add

Page 14

Second Try: Code Search Engine

Page 15

MAPO Approach

• Analyze code segments relevant to a given API and disclose the inherent usage patterns
– Input: an API characterized by a method, class, or package
– Code search engine: used to search relevant source files from open source repositories
– Frequent sequence miner: uses BIDE [Wang&Han 04] to mine closed sequential patterns from extracted method-call sequences
– Output: a short list of frequent API usage patterns related to the API
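BIDE mines closed sequential patterns without maintaining a candidate set; the sketch below does not reimplement it. It only illustrates the two notions the miner relies on: the support of a pattern (how many extracted call sequences contain it as a possibly non-contiguous subsequence) and closedness (no proper super-pattern has the same support). The class SequencePatterns is hypothetical, and its closedness check only compares against a caller-supplied candidate list.

```java
import java.util.*;

/** Hypothetical helpers illustrating support and closedness for sequential patterns. */
final class SequencePatterns {

    /** True if pattern occurs in sequence as a (not necessarily contiguous) subsequence. */
    static boolean isSubsequence(List<String> pattern, List<String> sequence) {
        int i = 0;
        for (String call : sequence) {
            if (i < pattern.size() && pattern.get(i).equals(call)) i++;
        }
        return i == pattern.size();
    }

    /** Number of sequences in the database that contain the pattern. */
    static int support(List<String> pattern, List<List<String>> database) {
        int count = 0;
        for (List<String> seq : database) {
            if (isSubsequence(pattern, seq)) count++;
        }
        return count;
    }

    /** Closed (relative to the candidates): no longer candidate containing it has equal support. */
    static boolean isClosed(List<String> pattern, Collection<List<String>> candidates,
                            List<List<String>> database) {
        int s = support(pattern, database);
        for (List<String> other : candidates) {
            if (other.size() > pattern.size() && isSubsequence(pattern, other)
                    && support(other, database) == s) {
                return false;
            }
        }
        return true;
    }
}
```

With the sequences A B C, A C, and A B C D, the pattern A C has support 3 and is closed, while A B (support 2) is not, because A B C has the same support.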

Page 16

Sequence Extraction

• Method sequences: extracted from Java source files returned by code search engines

Source code:

public void generateStubMethod(ClassGen c) {
    InstructionList il = new InstructionList();
    MethodGen m = genFromISList(il);
    m.setMaxLocals();
    m.setMaxStack();
    c.addMethod(m.getMethod());
    System.out.println("…");
    …
}

Call sequence:

InstructionList.<init>()
genFromISList(InstructionList)
MethodGen.setMaxStack()
MethodGen.setMaxLocals()
MethodGen.getMethod()
ClassGen.addMethod(Method)
PrintStream.println(String)
…

Page 17

Sequence Preprocessing

• Remove common Java library calls
• Inline callees of the same class
• Remove sequences that contain no query words: ClassGen and addMethod
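Two of the three preprocessing steps can be sketched on calls written as "Class.method(args)", the notation used in the call sequences on these slides. Inlining same-class callees needs a parsed AST and is not shown. The set COMMON_LIBRARY_CLASSES is a hypothetical stand-in, since the slide does not list which library calls MAPO treats as common, and the class SequencePreprocessor is likewise hypothetical.

```java
import java.util.*;
import java.util.stream.*;

/** Hypothetical sketch of two MAPO-style preprocessing steps on textual call sequences. */
final class SequencePreprocessor {

    // Assumed stand-in for "common Java library" classes; not MAPO's actual list.
    static final Set<String> COMMON_LIBRARY_CLASSES =
            Set.of("PrintStream", "String", "StringBuilder", "Object");

    /** Class part of a call written as "Class.method(args)"; empty for unqualified calls. */
    static String classOf(String call) {
        int paren = call.indexOf('(');
        String head = paren >= 0 ? call.substring(0, paren) : call;
        int dot = head.lastIndexOf('.');
        return dot >= 0 ? head.substring(0, dot) : "";
    }

    /** Drops calls whose receiver class is in the common-library set. */
    static List<String> removeCommonCalls(List<String> calls) {
        return calls.stream()
                .filter(c -> !COMMON_LIBRARY_CLASSES.contains(classOf(c)))
                .collect(Collectors.toList());
    }

    /** Drops sequences in which no call mentions any of the query words. */
    static List<List<String>> keepRelevant(List<List<String>> sequences, List<String> queryWords) {
        return sequences.stream()
                .filter(seq -> seq.stream().anyMatch(call -> queryWords.stream().anyMatch(call::contains)))
                .collect(Collectors.toList());
    }
}
```

Applied to the generateStubMethod sequence with the query words ClassGen and addMethod, the PrintStream.println(String) call is dropped and the sequence is kept because it contains ClassGen.addMethod(Method).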


Page 18

Frequent Seq Postprocessing

• Remove sequences that contain no query words: ClassGen and addMethod

• Compress consecutive calls of the same method into one, e.g., abbba → aba

• Remove duplicate frequent sequences after the compression, e.g., aba, aba → aba

• Remove a sequence if it is a subsequence of another, e.g., aba, abab → abab
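The compression, deduplication, and subsequence-removal steps can be sketched using the slide's own notation, where each character stands for one method call. The query-word filter is omitted here, and the class PatternPostprocessor is hypothetical.

```java
import java.util.*;

/** Hypothetical sketch of frequent-sequence postprocessing; each char is one method call. */
final class PatternPostprocessor {

    /** Collapses runs of the same call: "abbba" becomes "aba". */
    static String compress(String seq) {
        StringBuilder out = new StringBuilder();
        for (char c : seq.toCharArray()) {
            if (out.length() == 0 || out.charAt(out.length() - 1) != c) out.append(c);
        }
        return out.toString();
    }

    /** True if small occurs in big as a (not necessarily contiguous) subsequence. */
    static boolean isSubsequence(String small, String big) {
        int i = 0;
        for (int j = 0; j < big.length() && i < small.length(); j++) {
            if (small.charAt(i) == big.charAt(j)) i++;
        }
        return i == small.length();
    }

    /** Compresses, removes duplicates, then drops any pattern contained in another one. */
    static List<String> postprocess(List<String> patterns) {
        Set<String> unique = new LinkedHashSet<>();
        for (String p : patterns) unique.add(compress(p));
        List<String> result = new ArrayList<>();
        for (String p : unique) {
            boolean subsumed = false;
            for (String q : unique) {
                if (!p.equals(q) && isSubsequence(p, q)) {
                    subsumed = true;
                    break;
                }
            }
            if (!subsumed) result.add(p);
        }
        return result;
    }
}
```

Running all three steps on abbba, aba, abab, and c leaves abab and c: abbba compresses to aba, the duplicate aba collapses, and aba is then removed as a subsequence of abab.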

Page 19

Tool Architecture

(Figure: MAPO tool architecture; the code search engine is, e.g., koders.com)

Page 20

Sample Mined API Sequence

InstructionList.<init>()

InstructionFactory.createLoad(Type, int)

InstructionList.append(Instruction)

InstructionFactory.createReturn(Type)

InstructionList.append(Instruction)

MethodGen.setMaxStack()

MethodGen.setMaxLocals()

MethodGen.getMethod()

ClassGen.addMethod(Method)

InstructionList.dispose()

Page 22

Mining API Usage Patterns

• MAPO: “I know what method call I need, but I don’t know how to write code before and after this method call” [Xie&Pei MSR 06]

• Apiartor: “I know what possible set of APIs I need, but I don’t know what need to be used and what orders to use” [Acharya et al. FSE 07]

Page 23

Usage Patterns as Partial Order

(a) Example code

#include <abcdef.h>

void p() { b(); c(); }
void q() { c(); b(); }
void r() { e(); f(); }
void s() { f(); e(); }

int main() {
    int i, j, k;
    a();
    if (i == 1) {
        f(); e(); c(); exit();
    } else {
        if (j == 1) p(); else q();
        d();
        if (k == 1) r(); else s();
    }
}

(b) Static program traces

1: a f e c
2: a b c d e f
3: a c b d e f
4: a b c d f e
5: a c b d f e

(c) Frequent subsequence patterns

a b d e
a b d f
a c d e
a c d f

(d) Frequent partial order R: a precedes b and c; b and c each precede d; d precedes e and f (b and c are unordered, as are e and f)
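The step from traces (b) to the partial order (d) can be approximated by counting, for every pair of calls, in how many traces the first precedes the second, then keeping pairs above a support threshold and dropping edges implied by transitivity. This pairwise-precedence counting is a simplification swapped in for illustration, not the miner Apiartor actually uses; the class PartialOrderSketch and the threshold of 4 traces are assumptions.

```java
import java.util.*;

/** Hypothetical pairwise-precedence approximation of a frequent partial order. */
final class PartialOrderSketch {

    /** Pairs "x<y" where x's first occurrence precedes y's in at least minSupport traces. */
    static Set<String> frequentPrecedences(List<List<String>> traces, int minSupport) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> trace : traces) {
            Map<String, Integer> firstPos = new HashMap<>();
            for (int i = 0; i < trace.size(); i++) firstPos.putIfAbsent(trace.get(i), i);
            for (String x : firstPos.keySet()) {
                for (String y : firstPos.keySet()) {
                    if (firstPos.get(x) < firstPos.get(y)) counts.merge(x + "<" + y, 1, Integer::sum);
                }
            }
        }
        Set<String> result = new TreeSet<>();
        counts.forEach((pair, n) -> { if (n >= minSupport) result.add(pair); });
        return result;
    }

    /** Removes x<z whenever some y gives x<y and y<z (keeps only the covering edges). */
    static Set<String> reduce(Set<String> order) {
        Set<String> result = new TreeSet<>(order);
        for (String xz : order) {
            String x = xz.substring(0, xz.indexOf('<'));
            String z = xz.substring(xz.indexOf('<') + 1);
            for (String xy : order) {
                if (xy.startsWith(x + "<")) {
                    String y = xy.substring(x.length() + 1);
                    if (order.contains(y + "<" + z)) {
                        result.remove(xz);
                        break;
                    }
                }
            }
        }
        return result;
    }
}
```

On the five traces in (b) with a threshold of 4, the reduced relation is a<b, a<c, b<d, c<d, d<e, d<f, which matches the order described in (d). Trace 1 lacks b and d, so the threshold has to tolerate one infeasible-looking trace.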

Page 24

Apiartor Overview

(Figure: Apiartor architecture. Components: User-specified APIs, Trigger Generator, Triggers, Model Checker, Traces, Scenario Extractor, Independent Scenarios, Miner, Partial Orders, Source Code, Specification Extractor, Specifications, Frequent Usage Scenarios, Related APIs, Trace Generator)

Page 25

Example Partial Orders

(Figure: a partial order over XOpenDisplay, XCloseDisplay, XCreateWindow, XGetWindowAttributes, XCreateGC, XSetForeground, XGetBackground, XMapWindow, XChangeWindowAttributes, XSelectInput, XGetAtomName, XFreeGC, XNextEvent)

A usage scenario around the XOpenDisplay API as a partial order. Specifications are shown with dotted lines.

Page 27

Mining API Usage Patterns

• MAPO: “I know what method call I need, but I don’t know how to write code before and after this method call” [Xie&Pei MSR 06]

• Apiartor: “I know what possible set of APIs I need, but I don’t know what need to be used and what orders to use” [Acharya et al. FSE 07]

• PARSEWeb: “I know what type of object I need, but I don’t know how to write the code to get the object” [Thummalapenta&Xie ASE 07]

Page 28

Example Task - OpenJMS

• Query: “javax.jms.QueueConnectionFactory -> javax.jms.QueueSender”

• PARSEWeb Solution:
FileName:0_UserBean.java MethodName:ingest Rank:1 NumberOfOccurrences:23 Confidence:True Path: 1 2 3
javax.jms.QueueConnectionFactory,createQueueConnection() ReturnType:javax.jms.QueueConnection
javax.jms.QueueConnection,createQueueSession(boolean,javax.jms.Session.AUTO_ACKNOWLEDGE) ReturnType:javax.jms.QueueSession
javax.jms.QueueSession,createSender(javax.jms.Queue) ReturnType:javax.jms.QueueSender

(Figure: Sun Java Message Service API specification)
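The mined path above can be read as a chain through types: each invocation takes an object of one type and returns another. The sketch below swaps in a simpler idea than PARSEWeb's code analyzer: it treats each known invocation as an edge from receiver type to return type and runs a breadth-first search from the query's source type to its destination type. The class TypeGraphSearch and its Invocation record are hypothetical, and type names are abbreviated.

```java
import java.util.*;

/** Hypothetical type-graph search for "Source -> Destination" queries. */
final class TypeGraphSearch {

    /** One known method invocation: calling method on receiver yields returnType. */
    record Invocation(String receiver, String method, String returnType) {}

    /** Breadth-first search for a shortest invocation chain from source to destination. */
    static List<Invocation> findChain(List<Invocation> edges, String source, String destination) {
        Map<String, Invocation> reachedVia = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>(List.of(source));
        Set<String> visited = new HashSet<>(List.of(source));
        while (!queue.isEmpty()) {
            String type = queue.poll();
            if (type.equals(destination)) break;
            for (Invocation inv : edges) {
                if (inv.receiver().equals(type) && visited.add(inv.returnType())) {
                    reachedVia.put(inv.returnType(), inv);
                    queue.add(inv.returnType());
                }
            }
        }
        if (!visited.contains(destination)) return List.of();
        LinkedList<Invocation> chain = new LinkedList<>();
        for (String t = destination; !t.equals(source); t = reachedVia.get(t).receiver()) {
            chain.addFirst(reachedVia.get(t));
        }
        return chain;
    }
}
```

Given edges for createQueueConnection, createQueueSession, and createSender, the query QueueConnectionFactory -> QueueSender returns those three invocations in order. Unlike this sketch, PARSEWeb ranks sequences actually observed in downloaded code rather than assembling arbitrary paths.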

Page 29

PARSEWeb Overview

(Figure: PARSEWeb architecture. Components: Query, Code Search Engine, Open Source Repositories, Code Downloader, Local Source Code Repository, Code Analyzer, Method Invocation Sequences, Sequence Miner, Clustered Method Invocation Sequences, Query Splitter, Final Method Invocation Sequences)

Page 31

Code Analyzer

• Collect Source-to-Destination method sequences invoked by each public method
– Deal with local method calls by inlining methods
– Deal with conditionals/loops by traversing control flow graphs

• Resolve types in sequences
– Challenge: downloaded files are partial
– Solution: heuristics have been developed

Page 32

Type Heuristics

• Heuristic 1: The return type of a method invocation contained in an initialization expression is the same as the type of the declared variable.

e.g., QueueConnection connect; QueueSession session = connect.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);

• Heuristic 2: The return type of the outermost method invocation contained in a return statement is the same as the return type of the enclosing method declaration.

e.g., public QueueSession test() {
    ...
    return connect.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
}
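Both heuristics can be approximated on raw source text, which is roughly the situation with partial downloaded files. The sketch uses two regular expressions, one per heuristic, and maps each matched method name to its inferred return type. It is a toy assumption, not PARSEWeb's analyzer: the regexes ignore generics, arrays, nested braces, and chained calls. The class TypeHeuristics is hypothetical.

```java
import java.util.*;
import java.util.regex.*;

/** Hypothetical text-level approximation of the two return-type heuristics. */
final class TypeHeuristics {

    // Heuristic 1: "Type var = receiver.method(" means method returns Type.
    private static final Pattern INIT = Pattern.compile(
            "([A-Za-z_][\\w.]*)\\s+[A-Za-z_]\\w*\\s*=\\s*[A-Za-z_]\\w*\\.([A-Za-z_]\\w*)\\s*\\(");

    // Heuristic 2: "Type name(...) { ... return receiver.method(" means method returns Type.
    private static final Pattern RETURN = Pattern.compile(
            "([A-Za-z_][\\w.]*)\\s+[A-Za-z_]\\w*\\s*\\([^)]*\\)\\s*\\{[^}]*?return\\s+"
            + "[A-Za-z_]\\w*\\.([A-Za-z_]\\w*)\\s*\\(");

    /** Maps method name to inferred return type using only the two textual heuristics. */
    static Map<String, String> inferReturnTypes(String source) {
        Map<String, String> inferred = new LinkedHashMap<>();
        Matcher m = INIT.matcher(source);
        while (m.find()) inferred.put(m.group(2), m.group(1));
        m = RETURN.matcher(source);
        while (m.find()) inferred.put(m.group(2), m.group(1));
        return inferred;
    }
}
```

On the Heuristic 1 example, createQueueSession is mapped to QueueSession. A method such as `public QueueSender build() { return session.createSender(queue); }` yields createSender mapped to QueueSender through Heuristic 2.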

Page 34

Sequence Miner

• Candidate sequences produced by the code analyzer may be too many

Solutions:

• Cluster similar sequences
– Clustering heuristics have been developed

• Rank sequences
– Ranking heuristics have been developed

Page 35

Clustering Heuristics

• Heuristic 1: Method-invocation sequences with the same set of statements can be considered similar, even if the statements appear in a different order, e.g., "2 3 4 5" and "2 4 3 5".

• Heuristic 2: Method-invocation sequences that differ by at most a given cluster precision value can be considered similar, e.g., "8 9 6 7" and "8 6 10 7" are similar under a cluster precision value of one.
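The slide does not define exactly how two sequences "differ by" a precision value. The sketch assumes an order-insensitive comparison in which each sequence may contain at most `precision` statement IDs that the other lacks; a precision of 0 then reduces to Heuristic 1. The greedy single-pass clustering and the class SequenceClustering are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch of the two clustering heuristics over statement-ID sequences. */
final class SequenceClustering {

    /** Parses a space-separated sequence of statement IDs, e.g. "2 3 4 5". */
    static List<Integer> parse(String seq) {
        List<Integer> ids = new ArrayList<>();
        for (String tok : seq.trim().split("\\s+")) ids.add(Integer.parseInt(tok));
        return ids;
    }

    /** Assumed test: ignore order, allow at most `precision` unmatched IDs on each side. */
    static boolean similar(String s1, String s2, int precision) {
        Set<Integer> a = new HashSet<>(parse(s1));
        Set<Integer> b = new HashSet<>(parse(s2));
        Set<Integer> onlyA = new HashSet<>(a);
        onlyA.removeAll(b);
        Set<Integer> onlyB = new HashSet<>(b);
        onlyB.removeAll(a);
        return onlyA.size() <= precision && onlyB.size() <= precision;
    }

    /** Greedy clustering: each sequence joins the first cluster whose first member is similar. */
    static List<List<String>> cluster(List<String> seqs, int precision) {
        List<List<String>> clusters = new ArrayList<>();
        for (String s : seqs) {
            List<String> home = null;
            for (List<String> c : clusters) {
                if (similar(c.get(0), s, precision)) {
                    home = c;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(s);
        }
        return clusters;
    }
}
```

Under this assumption "8 9 6 7" and "8 6 10 7" each have one ID the other lacks (9 and 10), so they are similar at precision one but not at precision zero.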

Page 36

Ranking Heuristics

• Heuristic 1: Higher frequency -> Higher rank

• Heuristic 2: Shorter length -> Higher rank
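The slide does not say how the two ranking heuristics combine. The sketch assumes frequency is the primary key (descending) and length is the tie-breaker (ascending), expressed as a single Comparator. The class SequenceRanking and its Candidate record are hypothetical.

```java
import java.util.*;

/** Hypothetical ranking of candidate method-invocation sequences. */
final class SequenceRanking {

    /** A candidate sequence together with how often it was observed. */
    record Candidate(List<String> calls, int frequency) {}

    /** Assumed combination: higher frequency first, then shorter length. */
    static final Comparator<Candidate> RANK =
            Comparator.comparingInt((Candidate c) -> c.frequency()).reversed()
                      .thenComparingInt(c -> c.calls().size());

    /** Returns a new list sorted by RANK; the input is left unchanged. */
    static List<Candidate> rank(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(RANK);
        return sorted;
    }
}
```

Swapping the two keys, or scoring them jointly, would be an equally plausible reading of the slide. The comparator keeps that decision in one place.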

Page 38

Query Splitter

• Lack of code samples in the results of code search engines
– Code samples are split among different files

Solution:

• Split the user query into multiple queries
• Compose the results for each split query

Page 39

Query Splitting Example

1. User query: “org.eclipse.jface.viewers.IStructuredSelection -> java.io.ObjectInputStream”
Results: None

2. Query: “java.io.ObjectInputStream”
Results: 3
Most used sources are: java.io.InputStream, java.io.ByteArrayInputStream, java.io.FileInputStream

3. Three queries to be fired:
“org.eclipse.jface.viewers.IStructuredSelection -> java.io.InputStream”
Results: 1
“org.eclipse.jface.viewers.IStructuredSelection -> java.io.ByteArrayInputStream”
Results: 5
“org.eclipse.jface.viewers.IStructuredSelection -> java.io.FileInputStream”
Results: None
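The splitting step in this example can be sketched as follows: if the original query has no results, its destination is replaced by each of the most-used source types for that destination, and the sub-queries that do return results are kept for composition. The function `resultCount` stands in for the code search engine and is an assumption of this sketch, as are the class QuerySplitter and its Query record. Composing each kept sub-query with the second hop is not shown.

```java
import java.util.*;
import java.util.function.Function;

/** Hypothetical sketch of PARSEWeb-style query splitting. */
final class QuerySplitter {

    /** A "Source->Destination" query. */
    record Query(String source, String destination) {
        @Override
        public String toString() {
            return source + "->" + destination;
        }
    }

    /** Returns the original query if it has results, else the split queries that have results. */
    static List<Query> split(Query original, List<String> mostUsedSources,
                             Function<Query, Integer> resultCount) {
        if (resultCount.apply(original) > 0) return List.of(original);
        List<Query> subQueries = new ArrayList<>();
        for (String intermediate : mostUsedSources) {
            Query q = new Query(original.source(), intermediate);
            if (resultCount.apply(q) > 0) subQueries.add(q);
        }
        return subQueries;
    }
}
```

Fed the result counts from the example (1 for InputStream, 5 for ByteArrayInputStream, none for FileInputStream), the sketch keeps the InputStream and ByteArrayInputStream sub-queries.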

Page 40

Eclipse Plugin

Page 41

Evaluations

• Real programming problems: address problems posted in developer forums

• Real projects: show that solutions recommended by PARSEWeb are
– available in real projects
– on average better than solutions recommended by the related tools PROSPECTOR, Strathcona, and Google Code Search

Page 42

Jakarta BCEL User Forum

• Jakarta BCEL user forum, 2001

Problem: “How to disassemble java byte code”

Query: “Code Instruction”

Solution Sample Code:

Code code;

InstructionList il = new InstructionList(code.getCode());

Instruction[] ins = il.getInstructions();

Page 43

Dev2Dev Newsgroups

• Dev2Dev Newsgroups, 2006

Problem: “how to connect db by sesseionBean”

Query: javax.naming.InitialContext -> java.sql.Connection

Solution Sequence: FileName:3_AddressBean.java MethodName:getNextUniqueKey Rank:1 NumberOfOccurrences:34
javax.naming.InitialContext,lookup(java.lang.String) ReturnType:javax.sql.DataSource
javax.sql.DataSource,getConnection() ReturnType:java.sql.Connection
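Written out as Java, the mined sequence is a JNDI lookup followed by DataSource.getConnection(). Because InitialContext.lookup returns Object, the cast to DataSource is what connects the two calls; in mined code, that cast is what lets the sequence's return type be resolved as javax.sql.DataSource. A minimal sketch follows, assuming the caller supplies the JNDI name. The class DataSourceLookup is hypothetical, and without a configured JNDI provider the lookup throws a NamingException.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

/** Hypothetical wrapper around the mined InitialContext -> Connection sequence. */
final class DataSourceLookup {

    /** Looks up a DataSource by JNDI name and opens a connection from it. */
    static Connection connect(String jndiName) throws NamingException, SQLException {
        InitialContext context = new InitialContext();
        try {
            // lookup returns Object; the cast supplies the DataSource type.
            DataSource dataSource = (DataSource) context.lookup(jndiName);
            return dataSource.getConnection();
        } finally {
            // Closing the naming context does not close the returned connection.
            context.close();
        }
    }
}
```

In a Java EE container the name would typically be something under java:comp/env. The caller remains responsible for closing the returned Connection.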

Page 44

Challenges in Mining Code

• Sometimes too few data samples
– Scalability is usually not an issue
– Static code bases vs. change histories

• Data preparation/preprocessing
– Related to traditional program analysis

• Pattern postprocessing (filtering and ranking)
– Heuristics play important roles

• Demand-driven mining vs. any-gold mining
– Programming vs. bug finding

Page 45

Conclusion

• Mining various types of software engineering data to aid software engineering tasks

• Mining program source code to improve programmer productivity
– MAPO: mining API usage patterns for a given API
– Apiartor: mining API usage patterns for a given set of APIs
– PARSEWeb: mining API usage patterns for input-output-type queries

Page 46

Questions?

Mining Software Engineering Data Bibliography: http://ase.csc.ncsu.edu/dmse/
• What software engineering tasks can be helped by data mining?
• What kinds of software engineering data can be mined?
• How are data mining techniques used in software engineering?
• Resources

Page 47

Demand-Driven Or Not

Any-gold mining vs. Demand-driven mining

• Examples: DynaMine, … | MAPO, BugTriage, …

• Advantages: Surface only cases that are applicable | Exploit demands to filter out irrelevant information

• Issues: How much gold is good enough given the amount of data to be mined? | What percentage of cases would work well?

Page 48

Code vs. Non-Code

Code/Programming Langs vs. Non-Code/Natural Langs

• Examples: MAPO, DynaMine, … | BugTriage, CVS/Code comments, emails, docs

• Advantages: Relatively stable and consistent representation | Common source of capturing programmers’ intentions

• Issues: What project/context-specific heuristics to use?

Page 49

Static vs. Dynamic

Static Data (code bases, change histories) vs. Dynamic Data (prog states, structural profiles)

• Examples: MAPO, DynaMine, … | Spec discovery, …

• Advantages: No need to set up exec environment; more scalable | More-precise info

• Issues: How to reduce false positives? | How to reduce false negatives? Where do tests come from?

Page 50

Snapshot vs. Changes

Code snapshot vs. Code change history

• Examples: MAPO, … | DynaMine, …

• Advantages: Larger amount of available data | Revision transactions encode more-focused entity relationships

• Issues: How to group CVS changes into transactions?

