
Presentation 7

Date post: 01-Jan-2016
Upload: portia-carlson
Description:
Presentation 7. Cross Language Clone Analysis, Team 2, November 22, 2010. Agenda: Feasibility Study, Release Plan, Architecture, Parsing, CodeDOM, Clone Analysis, Testing, Demonstration, Team Collaboration, Path Forward. Our Team: Allen Tucker, Patricia Bradford, Greg Rodgers, Brian Bentley.
Page 1: Presentation 7

Presentation 7
Cross Language Clone Analysis

Team 2
November 22, 2010

Page 2: Presentation 7

• Feasibility Study
• Release Plan
• Architecture
• Parsing
• CodeDOM
• Clone Analysis
• Testing
• Demonstration
• Team Collaboration
• Path Forward

Agenda

2

Page 3: Presentation 7

• Allen Tucker
• Patricia Bradford
• Greg Rodgers
• Brian Bentley
• Ashley Chafin

Our Team

3

Page 4: Presentation 7

Feasibility Study
Our evaluation of the project to determine the difficulty in carrying out the task.

4

Page 5: Presentation 7

Our Customers: Dr. Etzkorn and Dr. Kraft

Customer Request:
◦ A tool that will abstract programs in C++, C#, Java, and (Python or VB) to the Dagstuhl Middle Metamodel, Microsoft CodeDOM, or something similar, and detect cross-language clones.

Areas to Note:
◦ the user interface
◦ easy comparisons of clones
◦ visualization of clones
◦ sub-clones
◦ clone detection for large bodies of code

Task Summary

5

Page 6: Presentation 7

Per our task, in order to find clones across different programming languages, we will have to first convert the code from each language over to a language independent object model.

Some Language Independent Object Models:
◦ Dagstuhl Middle Metamodel (DMM)
◦ Microsoft CodeDOM

Both of these models provide a language independent object model for representing the structure of source code.

Task Summary (cont.)

6

Page 7: Presentation 7

Detecting clones across multiple programming languages is on the cutting edge of research.

A preliminary version of this was done by Dr. Kraft and his students for C# and VB.
◦ They compared the Mono C# parser (written in C#) to the Mono VB parser (written in VB).
◦ Publication: Nicholas A. Kraft, Brandon W. Bonds, Randy K. Smith: Cross-language Clone Detection. SEKE 2008: 54-59

Related Research

7

Page 8: Presentation 7

Task Understanding

Three Step Process
• Step 1: Code Translation
• Step 2: Clone Detection
• Step 3: Visualization

[Diagram: Source Files → Translator → Common Model → Inspector → Detected Clones → UI → Clone Visualization]

8

Page 9: Presentation 7

Step 1: Code Translation
◦ C#, C++, Java, VB (or Python)
◦ CodeDOM

Step 2: Clone Detection
◦ Leverage current clone detection techniques and research

Step 3: Clone Visualization
◦ Need for an intuitive user interface

Task Understanding (cont.)

9
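The three steps above can be sketched end to end in a few lines. This is only an illustration under loud assumptions: the `Statement` class, the `KEYWORDS` tables, and the `translate`, `detect`, and `report` functions are hypothetical names invented here, and the keyword lookup stands in for real parsing and CodeDOM translation.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Statement:
    kind: str   # language-neutral statement kind, e.g. "branch" or "return"
    file: str
    line: int


# Hypothetical keyword tables; a real translator would use a full parser.
KEYWORDS = {
    "csharp": {"if": "branch", "foreach": "loop", "return": "return"},
    "vb": {"If": "branch", "For": "loop", "Return": "return"},
}


def translate(language, file, lines):
    """Step 1: map each source line to a language-neutral Statement."""
    table = KEYWORDS[language]
    model = []
    for number, text in enumerate(lines, start=1):
        words = text.split()
        kind = table.get(words[0], "other") if words else "blank"
        model.append(Statement(kind, file, number))
    return model


def detect(models):
    """Step 2: report pairs of files whose statement-kind sequences match."""
    clones = []
    for left, right in combinations(models, 2):
        same = [s.kind for s in left] == [s.kind for s in right]
        if left and right and same:
            clones.append((left[0].file, right[0].file))
    return clones


def report(clones):
    """Step 3: render the detected clones as text lines for display."""
    return [f"{a} <-> {b}" for a, b in clones]


cs_model = translate("csharp", "Account.cs", ["if (x > 0)", "return x;"])
vb_model = translate("vb", "Account.vb", ["If x > 0 Then", "Return x"])
print(report(detect([cs_model, vb_model])))
```

The point of the sketch is the shape of the pipeline: once both files are in the same model, the detection step no longer needs to know which language they came from.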

Page 10: Presentation 7

Clone Detection as a Product

Commercial Product

What are the benefits of Software Clone Detection?

Main Goal: Decrease Coding Errors (bugs)

10

Page 11: Presentation 7

Benefits
Fact: Modularity is a key characteristic in today’s software world

Why? It allows us to divide software according to a separation of concerns
◦ Contributes to maintainability, reusability, testability, and reliability

Clone Detection allows us to detect common software spread across large bodies of code
◦ Identify code that is a candidate for further modularization

11

Page 12: Presentation 7

Benefits (cont)
But not all code can be cleanly decomposed

Crosscutting Concerns
◦ Responsible for tangling and scattering (code duplication) in an implementation

Logging
◦ Scattered across unrelated functions

How do you manage large areas of (usually) duplicated crosscuts?
◦ Errors, Changes

12

Page 13: Presentation 7

Benefits (cont)
Aspect Oriented Programming
◦ Modularize Crosscuts using Advice and Join Points
◦ Example: Spring Framework

Identifying Aspects (crosscuts)
◦ Time Consuming task

Use Clone Detection to Identify Aspects
◦ Define Rule

13

Page 14: Presentation 7

Benefits (cont)
Summarize

What?
◦ Detect code that is a candidate for modularity
◦ Identify Crosscuts in modules
  Am I a candidate for AOP?

How?
◦ Continuous Integration
  Generate Reports every time new code is added

14

Page 15: Presentation 7

Features
Clone Detection Software Suite
◦ Identifies
◦ Tracks
◦ Manages Software Clones

Multi-language support
◦ C++
◦ C#
◦ Java

15

Page 16: Presentation 7

Features (cont)

Provides complete code coverage

Multi-Application Support
◦ Stand-alone
◦ Plug-in based (Eclipse)
◦ Backend service (Ant task)

16

Page 17: Presentation 7

Features (cont)
Extendible
◦ Built on a Plug-in Framework
◦ Add new languages

Easy to Navigate between Clones

Persists Clones for easy Retrieval

17

Page 18: Presentation 7

Designing to meet user needs

◦ User-centered approach
  Need for an intuitive user interface
  Clone Visualization techniques

Human Factors

18

Page 19: Presentation 7

Intellectual Property
The University of Alabama in Huntsville would own and manage any and all intellectual property associated with the research and developmental artifacts of this project.

19

Page 20: Presentation 7

Project and Development Issues
Fast, Good, and Cheap…choose two.
◦ Fast…time required to deliver products
◦ Good…quality of product
◦ Cheap…cost of designing and building

20

Page 21: Presentation 7

Complexity of problem proves more difficult than initial estimates.

Technology to be applied is either not well established or has yet to be developed.

Unable to complete defined project scope within schedule.

Volatile user requirements leading to redefinition of project objectives.

Risk Analysis

21

Page 22: Presentation 7

Our initial approach…maximize the use of existing open-source components in order to reduce the project timeline.

◦ Instability in harvested projects.

◦ Lack of support…documentation, forums, etc.

◦ Disjoint project code bases.

◦ Non-existent code bases to harvest from.

Project Scale-Down Factors

22

Page 23: Presentation 7

Release Plan
Release Plan and User Stories

23

Page 24: Presentation 7

User Story Approach
User Stories Applied…Mike Cohn’s suggested formal approach
◦ As a (role) I want (something) so that (benefit).

Quality Attributes
◦ Independent
◦ Negotiable
◦ Valuable to user or customers
◦ Estimatable
◦ Small
◦ Testable

24

Page 25: Presentation 7

Came out with original Release Plan on 9/15/20

Due to customer wants/needs, we had to re-tool our user stories.
◦ Dr. Etzkorn’s main concerns:
  Load source code and translate to a language independent model
  Analyze the translated source code for clones
◦ Results from meeting:
  Created two new user stories (see next two slides)
  These two user stories have been pushed to the front of our card stack

Re-tooled User Stories

25

Page 26: Presentation 7

Analysis

Story ID  Story Title                           Priority (1 - 10)  Estimate (Days)  Score
17        Source Code Load & Translate          1                  14               14
18        Source Code Analyze                   1                  14               14
2         Code Clone Highlights                 1                  14               14
13        Auto-Navigate                         2                  7                14
3         Visual Reports                        1                  21               21
14        Clone Density Graph                   1                  21               21
1         Project Management                    10                 5                50
9         Source Code Association               11                 5                55
5         Analysis Options                      3                  20               60
8         Clone Categorization                  5                  14               70
4         False Positive Identification         7                  14               98
10        Project Language Auto-Detection       8                  14               112
7         Development Environment Integration   4                  30               120
12        Project History                       6                  21               126
11        Detection Updates                     9                  21               189
15        Interactive Help                      10                 21               210
6         Build Environment Integration         10                 30               300
Total                                                              286

Note: Task 9 should be skipped….customer indicated that this feature brought no value to the project.

26
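The Score column in the release plan table appears to be Priority multiplied by Estimate, with stories ranked by ascending score. The slide does not state this formula; it is inferred from the numbers, so the sketch below is a check of that assumption rather than the team's documented method. The `stories` list and `score` function are names introduced here for illustration.

```python
# (story_id, title, priority, estimate_days) copied from the release plan table.
stories = [
    (17, "Source Code Load & Translate", 1, 14),
    (18, "Source Code Analyze", 1, 14),
    (2, "Code Clone Highlights", 1, 14),
    (13, "Auto-Navigate", 2, 7),
    (3, "Visual Reports", 1, 21),
    (14, "Clone Density Graph", 1, 21),
    (1, "Project Management", 10, 5),
    (9, "Source Code Association", 11, 5),
    (5, "Analysis Options", 3, 20),
    (8, "Clone Categorization", 5, 14),
    (4, "False Positive Identification", 7, 14),
    (10, "Project Language Auto-Detection", 8, 14),
    (7, "Development Environment Integration", 4, 30),
    (12, "Project History", 6, 21),
    (11, "Detection Updates", 9, 21),
    (15, "Interactive Help", 10, 21),
    (6, "Build Environment Integration", 10, 30),
]


def score(priority, estimate_days):
    """Assumed scoring rule: lower scores are scheduled first."""
    return priority * estimate_days


# sorted() is stable, so stories with equal scores keep their table order.
ranked = sorted(stories, key=lambda s: score(s[2], s[3]))
total_days = sum(s[3] for s in stories)

for story_id, title, priority, days in ranked:
    print(f"{story_id:>3}  {title:<38} {score(priority, days):>4}")
print("Total estimate (days):", total_days)
```

Under this assumption the ranking reproduces the order of the table, and the estimates sum to the 286 shown at the bottom of the slide.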

Page 27: Presentation 7

CS 666 Studio I User Stories

Phase I

Page 28: Presentation 7

Summary ~ 68 remaining development days

Focus on top 3 user stories

Focus on Translation and Analysis

28

Page 29: Presentation 7

Source Code Load & Translate

Story ID: 017
Priority: 1
Estimate: 14 Days

As an analyst I want to load and translate my source code projects so I can analyze the source for clones.

29

Page 30: Presentation 7

Source Code Analyze

Story ID: 018
Priority: 1
Estimate: 14 Days

As an analyst I want to analyze my source code projects so I can see the clones.

30

Page 31: Presentation 7

Code Clone Highlights

Story ID: 002
Priority: 1
Estimate: 14 Days

As an analyst I want the capability to have the source code associated with clones highlighted within source files so that they are easy to identify.

31

Page 32: Presentation 7

CS 668 Software Studio II

Phase II

Page 33: Presentation 7

Summary ~ 80 development days

Focus on next 5 user stories

Focus on analysis capabilities

33

Page 34: Presentation 7

Auto-Navigate

Story ID: 013
Priority: 2
Estimate: 7 Days

As a developer I want the capability to auto-browse to the code segment associated with a clone so I do not have to manually search for it.

34

Page 35: Presentation 7

Visual Reports

Story ID: 003
Priority: 1
Estimate: 21 Days

As an analyst I want the capability to generate reports on clones within projects in a number of formats (e.g. html, csv, etc.) so that I can include them in presentations.

35

Page 36: Presentation 7

Clone Density Graph

Story ID: 014
Priority: 1
Estimate: 21 Days

As an analyst I want the capability to have a project’s clone density reported in graph form so I can visually see the distribution of detected clones within a project.

36

Page 37: Presentation 7

Project Management

Story ID: 001
Priority: 10
Estimate: 5 Days

As an analyst I want the capability to load and manage multiple projects within the application so that I can perform analysis on them at various times without having to reload them.

37

Page 38: Presentation 7

Analysis Options

Story ID: 005
Priority: 3
Estimate: 20 Days

As an analyst I want the capability to view summary analysis data (e.g. clones per file, package, projects, etc.) so that I can identify the distribution of clones within a project.

38

Page 39: Presentation 7

Follow-On Work
Future Capabilities

Page 40: Presentation 7

Project Language Auto-Detection

Story ID: 010
Priority: 8
Estimate: 14 Days

As an analyst I want the capability to have the language of a source code project auto-detected so I do not have to define it.

40

Page 41: Presentation 7

Clone Categorization

Story ID: 008
Priority: 5
Estimate: 14 Days

As an analyst I want the capability to have the detected clones categorized by a number of criteria (e.g. type, priority, etc.) so that work prioritization can be established.

41

Page 42: Presentation 7

False Positive Identification

Story ID: 004
Priority: 7
Estimate: 14 Days

As an analyst I want the capability to label a prospective clone as a false positive so that it will be ignored in analysis and reports.

42

Page 43: Presentation 7

Development Environment Integration

Story ID: 007
Priority: 4
Estimate: 30 Days

As a developer I want the capability to integrate the clone detection tool directly into my development environment (e.g. Eclipse, NetBeans, Visual Studio, etc.) so that I have a single application with all development tools integrated.

43

Page 44: Presentation 7

Project History

Story ID: 012
Priority: 6
Estimate: 21 Days

As an analyst I want the capability to see project change history (e.g. initial project, xx clones found, clone id yyy removed, project updated, xx new clones found, etc.) so I can assess the impact of code changes within a project.

44

Page 45: Presentation 7

Detection Updates

Story ID: 011
Priority: 9
Estimate: 21 Days

As an analyst I want the capability to update a project’s associated source code and have the tool detect these changes and offer to re-run detection so I can make corrections to clones and see resolutions in action.

45

Page 46: Presentation 7

Interactive Help

Story ID: 015
Priority: 10
Estimate: 21 Days

As a general user I want an interactive help system with context sensitive search so I can learn the system with ease.

46

Page 47: Presentation 7

Build Environment Integration

Story ID: 006
Priority: 10
Estimate: 30 Days

As a configuration manager I want the capability to integrate clone detection into an automated build environment (e.g. ant, nmake, msbuild, etc.) so that I can view reports on code projects as they are built.

47

Page 48: Presentation 7

Dropped User Stories
Cut By Customer

Page 49: Presentation 7

Source Code Association

Story ID: 009
Priority: 11
Estimate: 5 Days

As an analyst I want the capability to retain or not to retain the associated source code with a project so I can reduce my project size footprint.

Customer priority of 11 (normal range is 1 – 10)…customer indicated this story would be cut from scope.

49

Page 50: Presentation 7

Current Tasks
Requirements & Models

50

Page 51: Presentation 7

Requirements modeling for the first user story “Source Code Load & Translate”:
◦ Load & parse C#, Java, C++ source code.
◦ Translate the parsed C#, Java, C++ source code to CodeDOM.
◦ Associate the CodeDOM to the original source code.

Requirements modeling for the second user story “Source Code Analyze”:
◦ Analyze CodeDOM for clones.

Current Tasks’ Requirements

51

Page 52: Presentation 7

UML Model – Load & Parse

52

Page 53: Presentation 7

UML Model – Translate

53

Page 54: Presentation 7

UML Model – Associate

54

Page 55: Presentation 7

UML Model – Analyze

55

Page 56: Presentation 7

Architecture
Design and Architecture

56

Page 57: Presentation 7

Key Architecture Points
Multilanguage support

Configurable for different platforms
◦ Stand-alone application
◦ Plug-in
◦ Backend service

Extendable

57

Page 58: Presentation 7

Architecture

[Diagram: Application (User Interface), Service, Eclipse Plug-in, Web Interface, etc. → Core (API, Code Model, Clone Detection Algorithms, Language Support Interface) → C# Service, Java Service, C++ Service]

58

Page 59: Presentation 7

Core Unit
Code Model
◦ Stores the code in common format
Application Programming Interface
◦ Used to embed clone detection in applications
Language Service Interface
◦ Communication layer between the core and the specific language services

[Diagram: Core, containing the API, Code Model, Clone Detection Algorithms, and Language Service Interface]

59
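The role of the Language Service Interface, a communication layer between the core and each language-specific service, can be sketched as a small plug-in registry. Everything below is a hypothetical Python stand-in: the `LanguageService`, `Core`, and toy service classes are invented for illustration and are not the project's actual `ILanguageService` API.

```python
from abc import ABC, abstractmethod


class LanguageService(ABC):
    """Hypothetical stand-in for the language service interface."""

    extensions = ()  # file extensions this service claims, e.g. ("cs",)

    @abstractmethod
    def parse(self, source):
        """Return a language-neutral representation of the source text."""


class ToyCSharpService(LanguageService):
    extensions = ("cs",)

    def parse(self, source):
        # Placeholder: whitespace tokens instead of a real parse.
        return source.split()


class ToyJavaService(LanguageService):
    extensions = ("java",)

    def parse(self, source):
        return source.split()


class Core:
    """Routes each file to whichever registered service claims its extension."""

    def __init__(self):
        self._services = {}

    def register(self, service):
        for extension in service.extensions:
            self._services[extension] = service

    def load(self, filename, source):
        extension = filename.rsplit(".", 1)[-1].lower()
        service = self._services.get(extension)
        if service is None:
            raise ValueError(f"no language service registered for .{extension}")
        return service.parse(source)


core = Core()
core.register(ToyCSharpService())
core.register(ToyJavaService())
print(core.load("Account.cs", "return balance;"))
```

With this shape, adding a language means registering one more service; the core and the clone detection code stay unchanged, which is the extensibility the architecture slides describe.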

Page 60: Presentation 7

Visual Studio Solution

60

Page 61: Presentation 7

Core

61

Page 62: Presentation 7

Core - API

62

Page 63: Presentation 7

Language Service

63

Page 64: Presentation 7

Language Service

64

Page 65: Presentation 7

Language Service

65

Page 66: Presentation 7

App Configuration

66

Page 67: Presentation 7

CRC Cards
Class Responsibility Collaboration Cards

67

Page 68: Presentation 7

Java Parser
Responsibilities:
◦ Parse Java source code
◦ Construct Java token tree
Collaborators:
◦ LALRParser (Gold Parser)

Java Parser CRC

68

Page 69: Presentation 7

Parser
Responsibilities:
◦ Parse C# source code
◦ Construct C# token tree
Collaborators:
◦ LALRParser (Gold Parser)

C# Parser CRC

69

Page 70: Presentation 7

LanguageService
Responsibilities:
◦ Defines standard interface for all language providers.
Collaborators:
◦ ILanguageService

Language Service CRC

70

Page 71: Presentation 7

JavaService
Responsibilities:
◦ Reads Java source code
◦ Understands Java grammar production rules
◦ Construct CodeDOM compilation unit
Collaborators:
◦ Java Parser
◦ CloneDetection
◦ JavaCodeProvider
◦ ILanguageService

Java Service CRC

71

Page 72: Presentation 7

CsService
Responsibilities:
◦ Reads C# source code
◦ Understands C# grammar production rules
◦ Construct CodeDOM compilation unit
Collaborators:
◦ C# Parser
◦ CloneDetection
◦ CsCodeProvider
◦ ILanguageService

Cs Service CRC

72

Page 73: Presentation 7

CloneDetection
Responsibilities:
◦ Loads and manages language services.
◦ Controls parsing
◦ Establishes CodeDOM compilation units to source code file associations
◦ Compares code segments
◦ Provides bookkeeping for code segments
Collaborators:
◦ ILanguageService
◦ CodeDomComparer
◦ CodeDomSummary

CloneDetection CRC

73

Page 74: Presentation 7

FileSetNode
Responsibilities:
◦ Manages file set tree information for a CloneProject

FileSetNode CRC

74

Page 75: Presentation 7

ProjectNode
Responsibilities:
◦ Manages project tree information for a CloneProject

ProjectNode CRC

75

Page 76: Presentation 7

SourceFileNode
Responsibilities:
◦ Manages source file tree information for a CloneProject

SourceFileNode CRC

76

Page 77: Presentation 7

EnabledValueConverter
Responsibilities:
◦ Manages enabled state for visual components bound to an object

EnabledValueConverter CRC

77

Page 78: Presentation 7

VisibilityValueConverter
Responsibilities:
◦ Manages visibility state for visual components bound to an object

VisibilityValueConverter CRC

78

Page 79: Presentation 7

CloneProject
Responsibilities:
◦ Manages project information
◦ Knows the file sets associated with a project
◦ Knows the files associated with each file set
◦ Knows the name of the project
◦ Can add a file
◦ Can remove a file
Collaborators:
◦ PresentationModel
◦ ILanguageService

CloneProject CRC

79

Page 80: Presentation 7

ProjectIO
Responsibilities:
◦ Save a CloneProject
◦ Open a CloneProject
Collaborators:
◦ CloneProject

ProjectIO CRC

80

Page 81: Presentation 7

RecentProjectList
Responsibilities:
◦ Manages a list of recently viewed projects
Collaborators:
◦ CloneProject

RecentProjectList CRC

81

Page 82: Presentation 7

ProjectView
Responsibilities:
◦ Visual display of project tree
Collaborators:
◦ CloneProject
◦ PresentationModel
◦ ProjectNode
◦ FileSetNode
◦ SourceFileNode
◦ ILanguageService

ProjectView CRC

82

Page 83: Presentation 7

App
Responsibilities:
◦ Startup class
◦ Manage visual theme

App CRC

83

Page 84: Presentation 7

MainFrame
Responsibilities:
◦ Manage application frame
◦ Manage user input – Save
◦ Manage user input – Open
◦ Manage user input – Close
◦ Manage user input – Exit
◦ Manage user input – Add File Set
◦ Manage user input – Create New
Collaborators:
◦ PresentationModel
◦ CloneProject
◦ ProjectView

MainFrame CRC

84

Page 85: Presentation 7

PresentationModel
Responsibilities:
◦ Manage current project state
◦ Current Project
◦ Clone Detection
◦ Currently Selected File
Collaborators:
◦ ICloneDetection
◦ CloneProject

PresentationModel CRC

85

Page 86: Presentation 7

Parsing
Our struggles and our successes.

86

Page 87: Presentation 7

We explored and conducted spikes on CSParser and CS CodeDOM Parser.
◦ They both had advantages and disadvantages.
◦ We came to the conclusion that neither of them was going to fit our needs.

We explored and conducted a spike on GOLD Parser.
◦ We ultimately chose the GOLD Parser because it best fit our needs.
  This gave us a way to manage multiple language grammars with one engine.

Parsing Struggles & Successes

87

Page 88: Presentation 7

C# Spike

88

Page 89: Presentation 7

Spike Objectives:
◦ Associated risks/shortfalls
◦ Project feasibility
◦ Familiarization

CSParser
◦ A utility which parses the C# source code and creates a CodeDOM tree of the code
◦ Open source
◦ Supports most language features
◦ Error handling for features not supported

C# Spike Review

89

Page 90: Presentation 7

C# Spike: CSParser Output

90

Page 91: Presentation 7

Spike Conclusion:
◦ Some limitations, but workarounds exist
◦ Wrapper code needed

Moving on from Spike:
◦ This past iteration, we downloaded CSParser and familiarized ourselves with it more.
◦ Due to several programs having the same name, we came across CS CodeDOM Parser, as well.

C# Spike Review (cont)

91

Page 92: Presentation 7

The good & the bad for both…

CS Parser:
◦ Good parser - Parsed a lot of C# language features
◦ No GUI - It is all command line
◦ Came with a large number of test cases
◦ Does not use CodeDOM

CS CodeDOM Parser:
◦ General parsing
◦ GUI
◦ Uses CodeDOM

CS Parser & CS CodeDOM Parser

92

Page 93: Presentation 7

Since both programs have good and bad features, our plan is to combine them.

CSParser + CS CodeDOM Parser

Planned combined features:
◦ Good parsing
◦ GUI
◦ CodeDOM
◦ Test cases

C# Plan

93

Page 94: Presentation 7

GOLD Parsing System
Spike

94

Page 95: Presentation 7

Topics To Discuss
• What is it?
• How does it work?
• What can we use it for?
• How can we extend it?

95

Page 96: Presentation 7

What Is GOLD?
“GOLD is a free parsing system that you can use to develop your own programming languages, scripting languages and interpreters. It strives to be a development tool that can be used with numerous programming languages and on multiple platforms.” – www.devincook.com/goldparser

96

Page 97: Presentation 7

How It Works (Block Structure)

[Diagram: Grammar → Builder → Compiled Grammar Table (*.cgt) → Engine → Parsed Data; Source Code also feeds the Engine]

97

Page 98: Presentation 7

How It Works (Components)

[Diagram: Grammar → Builder → Compiled Grammar Table (*.cgt) → Engine → Parsed Data; Source Code also feeds the Engine]

Three Major Components
1. Builder – Reads a source grammar to construct a Compiled Grammar Table
2. Compiled Grammar Table – Stores LALR and DFA parse tables
3. Engine – Performs actual parsing

98

Page 99: Presentation 7

How It Works (Process)

[Diagram: Grammar → Builder → Compiled Grammar Table (*.cgt) → Engine → Parsed Data; Source Code also feeds the Engine]

Step 1
• Write the grammar for the language being implemented. (GOLD-Meta Language)
• Rules: Backus-Naur Form
• Terminals: Regular Expressions
• Character sets: Set Notation

99

Page 100: Presentation 7

How It Works (Process)

[Diagram: Grammar → Builder → Compiled Grammar Table (*.cgt) → Engine → Parsed Data; Source Code also feeds the Engine]

Step 2
• Analyze Grammar
• Construct LALR and DFA parse tables which are saved in a Compiled Grammar Table file.

100

Page 101: Presentation 7

How It Works (Process)

[Diagram: Grammar → Builder → Compiled Grammar Table (*.cgt) → Engine → Parsed Data; Source Code also feeds the Engine]

Step 3
• Analyze source text with parser engine and construct parse tree
• Engine can be implemented in any number of programming languages

101

Page 102: Presentation 7

Usage within CloneDigger

[Diagram: Compiled Grammar Table (*.cgt) and Source Code → Engine → Parsed Data → CodeDOM Conversion → AST]

CodeDOM Conversion
• Need to write routine to move data from Parsed Tree to CodeDOM
• Parsed data trees from parser are stored in consistent data structure, but are based on rules defined within grammars

102
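One way to picture the conversion routine described above is a table of handlers keyed by production rule, applied bottom-up over the parse tree. The sketch below is hypothetical throughout: `Reduction`, `Method`, the rule names, and `HANDLERS` are invented for illustration and do not reflect the GOLD engine's real data structures or the project's converter.

```python
from dataclasses import dataclass, field


@dataclass
class Reduction:
    """A parse-tree node: the rule that fired plus its children."""
    rule: str
    children: list = field(default_factory=list)  # Reductions or token strings


@dataclass
class Method:
    """Tiny stand-in for a CodeDOM-style method node."""
    name: str
    statements: list = field(default_factory=list)


# One handler per production rule (rule names are made up for this sketch).
HANDLERS = {
    "MethodDecl": lambda parts: Method(name=parts[0], statements=parts[1]),
    "StatementList": lambda parts: list(parts),
    "ReturnStmt": lambda parts: ("return", parts[0]),
}


def convert(node, handlers):
    """Convert children first, then hand the results to the rule's handler."""
    if isinstance(node, str):
        return node  # terminals pass through unchanged
    parts = [convert(child, handlers) for child in node.children]
    handler = handlers.get(node.rule)
    if handler is None:
        # Surfacing unhandled rules makes per-rule progress easy to track.
        raise NotImplementedError(f"no handler for rule {node.rule!r}")
    return handler(parts)


tree = Reduction("MethodDecl", [
    "getBalance",
    Reduction("StatementList", [Reduction("ReturnStmt", ["balance"])]),
])
print(convert(tree, HANDLERS))
```

Because the tree shape is the same for every grammar and only the rule names differ, each language needs its own handler table but can share the traversal.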

Page 103: Presentation 7

Task Understanding

Three Step Process
• Step 1: Code Translation
• Step 2: Clone Detection
• Step 3: Visualization

[Diagram: Source Files → Translator → Common Model → Inspector → Detected Clones → UI → Clone Visualization]

103

Page 104: Presentation 7

Extension and Enhancements

[Diagram: Grammar → Builder → Compiled Grammar Table (*.cgt) → Engine → Parsed Data; Source Code also feeds the Engine]

Enhance Grammars
• Update Java
• Update C#
• Define C++

• Share among other classmates with similar interest
• Share with greater community

104

Page 105: Presentation 7

Grammars
What is a grammar?
◦ A set of rules of a specific kind, for forming strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.

105

Page 106: Presentation 7

Gold Parser Grammars
Gold Parser uses context-free grammars that can be used to do Look-Ahead LR (LALR) parsing.

LALR compliant grammars that we already have:
◦ C#
◦ Java
◦ Visual Basic .Net

106

Page 107: Presentation 7

Grammar Example

107

Page 108: Presentation 7

C++ Grammar Issue
Currently no LALR compliant C++ grammar exists due to the overall complexity.

Other C++ parsers exist, but give an output format different than the other languages we already have grammars for using Gold Parser.

We are still searching for C++ parsing solutions.

108

Page 109: Presentation 7

We plan to use GOLD Parsing System.

Tasks we have to complete:
◦ Update Java grammar
◦ Update C# grammar
◦ Research “Define C++ grammar”
◦ Create a CodeDOM conversion to move data from Parsed Tree to CodeDOM

GOLD Parser Conclusion

109

Page 110: Presentation 7

GOLD Parsing System
GOLD Parsing Populating CodeDOM

110

Page 111: Presentation 7

Topics To Discuss
• What are we doing?
• Compiled Grammar Table
• Bookkeeping
• Testing

111

Page 112: Presentation 7

How It Works (Block Structure)

[Diagram: Grammar → Builder → Compiled Grammar Table (*.cgt) → Engine → Parsed Data; Source Code also feeds the Engine]

112

Page 113: Presentation 7

How It Works (Process)

[Diagram: Grammar → Builder → Compiled Grammar Table (*.cgt) → Engine → Parsed Data; Source Code also feeds the Engine]

Typical output from engine: a long nested tree

113

Page 114: Presentation 7

Usage within CloneDigger

[Diagram: Compiled Grammar Table (*.cgt) and Source Code → Engine → Parsed Data → CodeDOM Conversion → AST]

CodeDOM Conversion
• Need to write routine to move data from Parsed Tree to CodeDOM
• Parsed data trees from parser are stored in consistent data structure, but are based on rules defined within grammars

114

Page 115: Presentation 7

Grammar Updates
GOLD Parser Grammar Updates

115

Page 116: Presentation 7

Grammar Updates
Currently the grammars we have for the Gold parser are outdated.

Current Gold Grammars
◦ C# version 2.0
◦ Java version 1.4

Current available software versions
◦ C# version 4.0
◦ Java version 6

116

Page 117: Presentation 7

Grammars for C# and Java are very complex and require a lot of work to build.

ANTLR and Gold Parser grammars use completely different syntax.

Positive note: Other development not halted by use of older grammars.

Grammar Update Issues

117

Page 118: Presentation 7

Our Bookkeeping
Bookkeeping for parsing the multiple grammars

118

Page 119: Presentation 7

For Java, there are…
◦ 359 production rules
◦ 249 distinct symbols (terminal & non-terminal)

For C#, there are…
◦ 415 production rules
◦ 279 distinct symbols (terminal & non-terminal)

Compiled Grammar Table

119

Page 120: Presentation 7

Production Rule Dependencies

120

Page 121: Presentation 7

Since there are so many production rules, we came up with the following bookkeeping:

A spreadsheet of the compiled grammar table (for each language) with each production rule indexed.
◦ This spreadsheet covers:
  various aspects of language
  what we have/have not handled from the parser
  what we have/have not implemented into CodeDOM
  percentage complete

Our Grammar Bookkeeping

121

Page 122: Presentation 7

Our Grammar Bookkeeping

122

Page 123: Presentation 7

Parsing Handlers’ Status:
◦ C# = 100% complete
◦ Java = 100% complete

Parsing & CodeDOM Status

123

Page 124: Presentation 7

CodeDOM
Language Independent Object Model

124

Page 125: Presentation 7

CodeDOM
Document Object Model for Source Code

API - [System.CodeDom]

Only supports certain aspects of the language since it’s language agnostic
◦ Good Enough

What Does it Do?
◦ Programmatically Constructs Code

What Doesn’t it Do?
◦ Does NOT parse

125

Page 126: Presentation 7

CodeDOM Example
CodeCompileUnit
◦ CodeNamespace
  Imports
  Types
    Members
      Event
      Field
      Method
        Statements
          Expression
      Property

126
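The containment hierarchy on this slide can be mirrored with a few plain data classes. The classes below are hypothetical Python stand-ins chosen only to show the nesting (compile unit, namespace, type, member, statement); they are not the `System.CodeDom` API, and the `walk` helper is an assumption added for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    expression: str


@dataclass
class Member:
    kind: str   # "event", "field", "method", or "property"
    name: str
    statements: List[Statement] = field(default_factory=list)


@dataclass
class TypeDeclaration:
    name: str
    members: List[Member] = field(default_factory=list)


@dataclass
class Namespace:
    name: str
    imports: List[str] = field(default_factory=list)
    types: List[TypeDeclaration] = field(default_factory=list)


@dataclass
class CompileUnit:
    namespaces: List[Namespace] = field(default_factory=list)


def walk(unit):
    """Yield (qualified name, member) pairs in declaration order."""
    for namespace in unit.namespaces:
        for type_decl in namespace.types:
            for member in type_decl.members:
                yield f"{namespace.name}.{type_decl.name}.{member.name}", member


unit = CompileUnit([
    Namespace("Bank", imports=["System"], types=[
        TypeDeclaration("Account", [
            Member("field", "balance"),
            Member("method", "Deposit",
                   [Statement("balance = balance + amount")]),
        ]),
    ]),
])

for path, member in walk(unit):
    print(path, member.kind, len(member.statements))
```

A tree like this, built from either a C# or a Java parse, is what the later clone analysis steps tokenize and compare.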

Page 127: Presentation 7

Clone Analysis
Clones & Dr. Kraft’s Tool

127

Page 128: Presentation 7

Software Clones: (Definitions from Wikipedia)

◦ Duplicate code: a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity.

◦ Clones: sequences of duplicate code.

“Clones are segments of code that are similar according to some definition of similarity.”

—Ira Baxter, 2002

Software Clones

128

Page 129: Presentation 7

3 Types of Clones (Definition of Similarity):
◦ Type 1: An exact copy without modifications (except for whitespace and comments)
◦ Type 2: A syntactically identical copy
  Only variable, type, or function identifiers have been changed
◦ Type 3: A copy with further modifications
  Statements have been changed, reordered, added, or removed

Clone Types

129
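The difference between Type 1 and Type 2 clones can be shown with a toy token comparison. This is a simplified sketch, not the team's detector: the `KEYWORDS` set and regular expression are assumptions, comments are not stripped, and type names such as `int` are treated as fixed keywords, so renamed types would not be caught here.

```python
import re

# Hypothetical keyword list; a real tool would take this from the grammar.
KEYWORDS = {"int", "return", "if", "else", "for", "while"}
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(code):
    """Split code into tokens, which discards whitespace differences."""
    return TOKEN.findall(code)


def normalize(code):
    """Replace every non-keyword identifier with the placeholder ID."""
    return [
        "ID" if IDENTIFIER.fullmatch(tok) and tok not in KEYWORDS else tok
        for tok in tokens(code)
    ]


def classify(first, second):
    if tokens(first) == tokens(second):
        return "Type 1"   # identical apart from whitespace
    if normalize(first) == normalize(second):
        return "Type 2"   # identical apart from identifier names
    return "not Type 1 or 2"


print(classify("int a = b + 1;", "int  a=b+1 ;"))
print(classify("int a = b + 1;", "int total = count + 1;"))
print(classify("int a = b + 1;", "int a = b * 1;"))
```

Type 3 clones need a tolerance for inserted, removed, or reordered statements, which is where an edit-distance measure becomes useful.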

Page 130: Presentation 7

Copy and Paste Programming
◦ Ctrl-C, Ctrl-V Virus

Multiple Developers
◦ Similar Functionality, Similar Code

Plagiarism
◦ Code Theft

How Clones are Created

130

Page 131: Presentation 7

Multi-Language Clone Detection
◦ Cutting Edge of Research

Preliminary Research
◦ Dr. Kraft and Students at UAB
  C# and VB
◦ Publication
  Nicholas A. Kraft, Brandon W. Bonds, Randy K. Smith: Cross-language Clone Detection. SEKE 2008: 54-59
◦ Utilizes Mono Parsers
  C#
  VB

Clone Research

131

Page 132: Presentation 7

Performs Comparisons of Code Files

For each File, a CodeDOM tree is tokenized

Uses Levenshtein Distance Calculation
◦ Minimum number of edits needed to transform one sequence into the other

Distances Calculated
◦ Distance determines Probability of a Clone

Dr. Kraft Clone Analysis

132
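The Levenshtein distance mentioned above can be computed with the standard dynamic-programming recurrence. The sketch works on any two sequences, so it applies equally to strings and to token lists from a tokenized tree. The `similarity` function is an assumption added here for illustration; the slides say only that the distance determines the probability of a clone, not how it is scaled.

```python
def levenshtein(first, second):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence `first` into sequence `second`."""
    previous = list(range(len(second) + 1))
    for i, x in enumerate(first, start=1):
        current = [i]
        for j, y in enumerate(second, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def similarity(first, second):
    """Assumed scaling: 1.0 for identical sequences, 0.0 for fully different."""
    longest = max(len(first), len(second))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(first, second) / longest


left = ["method", "assign", "if", "return"]
right = ["method", "assign", "loop", "if", "return"]
print(levenshtein(left, right), similarity(left, right))
```

Comparing every file against every other file this way is the brute-force behavior that the limitations slide calls out, since each pair costs time proportional to the product of the two sequence lengths.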

Page 133: Presentation 7

Dr. Kraft Application

133

Page 134: Presentation 7

Limitations
Only does file-to-file comparisons

◦ Does not detect clones in same source file

Can only detect Type 1 and some Type 2 clones

Not very efficient (brute force)

134

Page 135: Presentation 7

Add Support for Same File Clone Detection

Add Support for Type 3 Clone Detection
◦ Requires more Research

Provide a more efficient clone analysis algorithm

Enhancements

135

Page 136: Presentation 7

Testing
White Box & Black Box Testing

136

Page 137: Presentation 7

White Box Testing:
◦ Unit Testing

Black Box Testing:
◦ Production Rule Testing
  Allows us to test the robustness of our engine because we can force rule production errors.
  Regression Testing
  Automated
◦ Functional Testing

Testing Our Project

137

Page 138: Presentation 7

Unit Testing

138

Page 139: Presentation 7

Production Rule Test Input File Example

139

Page 140: Presentation 7

Functional Tests

140

Page 141: Presentation 7

Metrics
Project Metrics

141

Page 142: Presentation 7

As of Nov 8, 2010

SLOC:
◦ CS666_Client = 553 lines
◦ CS666_Core = 114 lines
◦ CS666_CppParser = 117 lines
◦ CS666_CsParser = 1678 lines
◦ CS666_JavaParser = 3350 lines
◦ CS666_LanguageSupport = 48 lines
◦ CS666_UnitTests = 3384 lines

Total = 9244 lines (including unit tests)

SLOC For Our Project

142

Page 143: Presentation 7

Demonstration
Demonstration of our progress.

143

Page 144: Presentation 7

Demonstration
These are the things we would like to show you today:
◦ GUI work
◦ Project setup
  Save project
  Load project
◦ Loading of source code
◦ Parsing of source code
◦ Translation of source code

144

Page 145: Presentation 7

Team Collaboration
Team 2 & Team 3

145

Page 146: Presentation 7

Team Collaboration
Due to Team 3’s team size, we have taken responsibility for gathering & sharing grammars.

Team 3 has the responsibility of the C++ Parsing.

Both Teams will…
◦ Use the same grammars & engines
  We will both have limitations based on this.
  Ex: The Java grammar is based on version 1.4, so we are limited to Java 1.4.
◦ Test the same grammars & engines
  We will have two test beds.

146

Page 147: Presentation 7

Team Collaboration
Method of collaboration:
◦ Google code project site: http://code.google.com/p/uah-studio-2010-2011/
  Team 4 team members have access to this site.
◦ Meetings
◦ Email

What does our google code project contain?
◦ Source control for grammars & engines
◦ Bugs/Issues
  Team 4 will have ability to document new bugs.
◦ Documents/Artifacts

147

Page 148: Presentation 7

Team Collaboration
Both teams met Monday (11-8-10) after class and performed the required Pair Programming.

Current Status:
◦ Team 2
  All project source code has been made available.
  We are researching and working to update the Java and C# grammars.
◦ Team 3
  Team 3 is working on C++ parsing.
  Looking into another parser, ELSA.

148

Page 149: Presentation 7

Path Forward
Current Status & Path Forward for Next Semester

149

Page 150: Presentation 7

Iteration 1: Parsing -> 85%
◦ Completed parsing for Java & C#
◦ No parsing for C++
  But we have a foundation and design to start from.

Iteration 2: Translation to CodeDOM -> 60%
◦ We have the foundation and design completed.
◦ Now, it is a matter of turning the crank for the languages.

Iteration 3: Clone Analysis -> 30%
◦ Ported majority of Dr. Kraft’s student project code.
◦ Started focusing on the GUI

Where we stand…

150

Page 151: Presentation 7

Task Understanding

Three Step Process
• Step 1: Code Translation
• Step 2: Clone Detection
• Step 3: Visualization

[Diagram: Source Files → Translator → Common Model → Inspector → Detected Clones → UI → Clone Visualization]

151

Page 152: Presentation 7

Schedule

152

Page 153: Presentation 7

Our next step is to re-evaluate where we currently stand.
◦ Revisit Release Plan
  Pull in Software Studio I work that was not completed.
◦ Revisit User Stories
◦ Start off strong with unit tests not completed.

Path Forward

153

