
Code Analysis

Date post: 23-Jan-2016
Page 1: Code Analysis

Code Analysis

Zhengong (才振功), 2011-07-27

Page 2: Code Analysis

Agenda

• Software Analysis Overview
• Methodology & Methods
• Tools Demo
• Project Candidates & Discussion

Page 3: Code Analysis

Software Analysis for What?

• Scenario 1
Company B buys another company, C, to meet market needs. C has a running system with a small maintenance group, and B wants to expand the business built on that system. However, the system documents are outdated and the original developers have left. Moreover, the platform of the system no longer gets enough support from its providers. On the other hand, developing a replacement system would bring great cost and risk, because of the lack of comprehension of the system and its business logic.

Page 4: Code Analysis

Software Analysis for What?

• Scenario 2
Project A has been in progress for several months. A new developer, D, joins as a replacement because an original member has left. D needs to understand the source code, documents and other materials before starting development.

• Scenario 3
A bug is reported by QA or a customer. As a developer, how do you locate the bug and fix it within a given time? In addition, the customer hopes to add a new function: where should it be added, and which source code has to be modified?

Page 5: Code Analysis

What’s Software Analysis

• Definition
Software analysis is a process or action to validate, verify or locate software features (or constraints), manually or automatically.

• Similar terms
▫ Program comprehension / reverse engineering

• Scope
▫ Development phase: know the progress, predict development actions, eliminate defects and program changes, etc.
▫ Maintenance phase: program comprehension and software maintenance
▫ Reuse phase: analyze and reuse the available parts

Page 6: Code Analysis

The Goals

• Program comprehension
▫ Functionality
▫ Architecture
• Feature location
▫ Locate the bug
▫ Locate where to add new functions
• Code review
▫ Coding styles
▫ Program optimization

In a word: identify the code architecture and map source code to abstract models.

Page 7: Code Analysis

Objects for Analysis

• Source code – code analysis
• Models
▫ Requirements
▫ Design models
▫ Software architecture
• Documents, including requirements, design, test, etc.
• Comments

Page 8: Code Analysis

Problems With Code Analysis

[Figure: build pipeline from Source Code and the Compilation Environment through Compile and Link to the Application System, with a Code Analysis Tool fed by "Syntactic Data for Code Analysis" and by "Syntactic & Semantic Data for Code Analysis"]

1. Business domain vs application domain
2. Source code vs abstract business
3. Tools are costly

1. Physical actions vs logics
2. Structural program vs unstructured semantic data

Page 9: Code Analysis

Methodology and Methods

• Static analysis
• Dynamic analysis
• Hybrid approaches

Page 10: Code Analysis

Static Analysis

• Basic approaches
▫ Control flow analysis
▫ Data flow analysis
▫ Information flow analysis
▫ Symbolic execution
▫ Slice analysis
▫ Clone analysis
▫ Syntax analysis
▫ Type analysis

Page 11: Code Analysis

Static Analysis

• Basic approaches (continued)
▫ Range checking
▫ Structure analysis
▫ Alias analysis
▫ Pointer analysis
• Formal approaches
▫ Model checking
▫ Theorem proving

Not limited to these; there are more methods.

Page 12: Code Analysis

Control Flow Analysis

• Goal: to construct the CFG (control flow graph)
▫ Analyze the execution path
▫ Abstract the code structure
▫ Locate dead code
▫ Evaluate loops and recursion
• Methods
▫ Sequence diagram
▫ Call graph
▫ Structure analysis
▫ Program slice
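One of the goals above, locating dead code, reduces to a reachability question once the CFG exists. The following is a minimal sketch under assumptions: the class name `CfgReachability`, the map-based CFG representation and the sample graph are made up for illustration and are not taken from the slides.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: a CFG is a map from a node label to its successor labels.
final class CfgReachability {

    // Returns the nodes that no path from `entry` can reach (candidate dead code).
    static Set<String> unreachable(Map<String, List<String>> cfg, String entry) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(entry);
        while (!work.isEmpty()) {
            String node = work.pop();
            if (!seen.add(node)) {
                continue; // already visited
            }
            for (String succ : cfg.getOrDefault(node, Collections.<String>emptyList())) {
                work.push(succ);
            }
        }
        Set<String> dead = new TreeSet<>(cfg.keySet());
        dead.removeAll(seen);
        return dead;
    }

    // Made-up CFG: "log" sits after a return, so no edge leads to it.
    static Map<String, List<String>> sample() {
        Map<String, List<String>> cfg = new HashMap<>();
        cfg.put("entry", Arrays.asList("test"));
        cfg.put("test", Arrays.asList("then", "exit"));
        cfg.put("then", Arrays.asList("exit"));
        cfg.put("log", Arrays.asList("exit"));
        cfg.put("exit", Collections.<String>emptyList());
        return cfg;
    }

    public static void main(String[] args) {
        System.out.println(unreachable(sample(), "entry")); // prints [log]
    }
}
```

Real tools build the CFG from the parsed source; here it is written by hand so that only the traversal is shown.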

Page 13: Code Analysis

Control Flow Analysis

• Example

Page 14: Code Analysis

Data Flow Analysis

• Goal: evaluate the definition and use of variables in each statement
▫ Variable definition
▫ Input should not be re-assigned
▫ Output should be assigned
▫ Proper use of global variables
• DFA usually starts with CFA
▫ Forward analysis: reaching definitions
▫ Backward analysis: live variables, eliminating dead code

Page 15: Code Analysis

Classical Data-flow Problems

• Reaching definitions (Reach)
• Live uses of variables (Live)
• Def-use chains, built from Reach, and the dual use-def chains, built from Live, play a role in many optimizations
• Sets of variables
▫ Gen(N) = set of variables defined by node N
▫ Kill(N) = set of variables killed by node N
▫ In(N) = set of variables from the previous nodes
▫ Forward order: Out(N) = Gen(N) ∪ (In(N) − Kill(N))

Page 16: Code Analysis

Reaching Definitions

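• Definition: a statement that may change the value of a variable (e.g., x = i + 5)
• A definition of a variable x at node k reaches node n if there is a path from k to n that is clear of any other definition of x.

[Figure: flow graph with a definition x = … at node k, a use … = x at node n, and a further definition x = …]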
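The forward equation Out(N) = Gen(N) ∪ (In(N) − Kill(N)) can be solved by iterating until nothing changes. The sketch below applies it to the summation loop that this deck uses in its slicing slides; the class name `ReachingDefinitions`, the node numbering and the definition labels d1, d2, d4, d5 are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ReachingDefinitions {

    // Iterates Out(N) = Gen(N) + (In(N) - Kill(N)) to a fixed point and returns In.
    // preds.get(n) lists the predecessors of node n; the sets hold definition labels.
    static Map<Integer, Set<String>> solveIn(Map<Integer, List<Integer>> preds,
                                             Map<Integer, Set<String>> gen,
                                             Map<Integer, Set<String>> kill) {
        Map<Integer, Set<String>> in = new TreeMap<>();
        Map<Integer, Set<String>> out = new TreeMap<>();
        for (Integer n : preds.keySet()) {
            in.put(n, new TreeSet<String>());
            out.put(n, new TreeSet<String>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Integer n : preds.keySet()) {
                Set<String> newIn = new TreeSet<>();
                for (Integer p : preds.get(n)) {
                    newIn.addAll(out.get(p)); // In(N) = union of Out(P) over predecessors
                }
                Set<String> newOut = new TreeSet<>(newIn);
                newOut.removeAll(kill.getOrDefault(n, Collections.<String>emptySet()));
                newOut.addAll(gen.getOrDefault(n, Collections.<String>emptySet()));
                in.put(n, newIn);
                if (!newOut.equals(out.get(n))) {
                    out.put(n, newOut);
                    changed = true;
                }
            }
        }
        return in;
    }

    // Node 1: sum = 0 (d1), 2: i = 1 (d2), 3: while (i < 11),
    // 4: sum = sum + i (d4), 5: i = i + 1 (d5), 6: printf(sum), 7: printf(i).
    static Map<Integer, Set<String>> sample() {
        Map<Integer, List<Integer>> preds = new TreeMap<>();
        preds.put(1, Collections.<Integer>emptyList());
        preds.put(2, Arrays.asList(1));
        preds.put(3, Arrays.asList(2, 5)); // loop header: entry edge and back edge
        preds.put(4, Arrays.asList(3));
        preds.put(5, Arrays.asList(4));
        preds.put(6, Arrays.asList(3));
        preds.put(7, Arrays.asList(6));
        Map<Integer, Set<String>> gen = new HashMap<>();
        Map<Integer, Set<String>> kill = new HashMap<>();
        define(gen, kill, 1, "d1", "d4");
        define(gen, kill, 2, "d2", "d5");
        define(gen, kill, 4, "d4", "d1");
        define(gen, kill, 5, "d5", "d2");
        return solveIn(preds, gen, kill);
    }

    // Records that `node` generates definition `def` and kills definition `other`.
    private static void define(Map<Integer, Set<String>> gen, Map<Integer, Set<String>> kill,
                               int node, String def, String other) {
        gen.put(node, Collections.singleton(def));
        kill.put(node, Collections.singleton(other));
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```

In this sketch the sets contain definition labels rather than variable names, so that two definitions of the same variable can be told apart.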

Page 17: Code Analysis

Live Uses of Variables

• Use: appearance of a variable as an operand of a 3-address statement (e.g., y = x + 4)
• A use of a variable x at node n is live on exit from k if there is a path from k to n that is clear of any definition of x.

[Figure: flow graph with a definition x = … at node k, a use … = x at node n, and a further definition x = …]
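Liveness is the backward counterpart: information flows from successors to predecessors. A minimal sketch, assuming the usual equations In(N) = Use(N) ∪ (Out(N) − Def(N)) and Out(N) = union of In(S) over the successors S; the class name `LiveVariables` and the 0-based node numbering of the deck's summation loop are illustrative choices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class LiveVariables {

    // Returns, for each node n, the variables that are live on exit from n.
    static List<Set<String>> liveOut(int[][] succ, String[][] use, String[][] def) {
        int n = succ.length;
        List<Set<String>> in = new ArrayList<>();
        List<Set<String>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            in.add(new TreeSet<String>());
            out.add(new TreeSet<String>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = n - 1; i >= 0; i--) { // backward problem: visit nodes in reverse
                Set<String> newOut = new TreeSet<>();
                for (int s : succ[i]) {
                    newOut.addAll(in.get(s));
                }
                Set<String> newIn = new TreeSet<>(newOut);
                newIn.removeAll(Arrays.asList(def[i])); // a definition ends liveness
                newIn.addAll(Arrays.asList(use[i]));    // a use starts it
                if (!newIn.equals(in.get(i)) || !newOut.equals(out.get(i))) {
                    in.set(i, newIn);
                    out.set(i, newOut);
                    changed = true;
                }
            }
        }
        return out;
    }

    // 0: sum = 0, 1: i = 1, 2: while (i < 11), 3: sum = sum + i,
    // 4: i = i + 1, 5: printf(sum), 6: printf(i)
    static List<Set<String>> sample() {
        int[][] succ = {{1}, {2}, {3, 5}, {4}, {2}, {6}, {}};
        String[][] use = {{}, {}, {"i"}, {"sum", "i"}, {"i"}, {"sum"}, {"i"}};
        String[][] def = {{"sum"}, {"i"}, {}, {"sum"}, {"i"}, {}, {}};
        return liveOut(succ, use, def);
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```

An assignment whose target is not in the live-out set of its node is a candidate for dead code elimination; the sample program has no such assignment.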

Page 18: Code Analysis

Def-use Relations

• A use-def chain links a use to a definition that reaches that use
• A def-use chain links a definition to a use that it reaches

[Figure: flow graph with a definition x = … at node k, a use … = x at node n, and a further definition x = …]

Page 19: Code Analysis

Optimizations Enabled

•Dead code elimination (Def-use)

•Loop invariant code motion (Use-def)

•Constant propagation (Use-def)

•Strength reduction (Use-def)

•Copy propagation (Def-use)

Page 20: Code Analysis

Information Flow Analysis

• Goals:
▫ Trace the dependencies from output back to input
▫ Validate the dependencies against the initial constraints
• IFA methods:
▫ Intra-procedural analysis
▫ Inter-procedural analysis

Example:
    X := A + B;
    Y := D - C;
    if X > 0 then
        Z := Y + 1;
    end if;

Here:
▫ X depends on A & B
▫ Y depends on C & D
▫ Z depends on A, B, C & D, and implicitly on Z's initial value
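The dependencies listed for this example can be derived mechanically. The sketch below is a hypothetical, intra-procedural tracker (the class `InfoFlow` and its methods are invented for the illustration): an unconditional assignment replaces the dependencies of the target, while a guarded assignment adds the dependencies of the guard (implicit flow) and keeps the old ones, because the branch may be skipped.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class InfoFlow {
    private final Map<String, Set<String>> deps = new TreeMap<>();

    // A variable that was never assigned depends only on its own initial value.
    private Set<String> depsOf(String v) {
        Set<String> d = deps.get(v);
        return d != null ? d : Collections.singleton(v);
    }

    // target := f(operands), executed unconditionally
    void assign(String target, String... operands) {
        Set<String> d = new TreeSet<>();
        for (String o : operands) {
            d.addAll(depsOf(o));
        }
        deps.put(target, d);
    }

    // if g(conditionVars) then target := f(operands)
    void assignIf(String[] conditionVars, String target, String... operands) {
        Set<String> d = new TreeSet<>(depsOf(target)); // branch may be skipped
        for (String c : conditionVars) {
            d.addAll(depsOf(c)); // implicit flow through the condition
        }
        for (String o : operands) {
            d.addAll(depsOf(o)); // explicit flow through the right-hand side
        }
        deps.put(target, d);
    }

    Set<String> dependencies(String v) {
        return depsOf(v);
    }

    // X := A + B; Y := D - C; if X > 0 then Z := Y + 1; end if;
    static InfoFlow slideExample() {
        InfoFlow f = new InfoFlow();
        f.assign("X", "A", "B");
        f.assign("Y", "D", "C");
        f.assignIf(new String[] {"X"}, "Z", "Y");
        return f;
    }

    public static void main(String[] args) {
        System.out.println(slideExample().dependencies("Z")); // prints [A, B, C, D, Z]
    }
}
```

Constants such as the 1 in Y + 1 carry no dependency and are simply left out of the operand list.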

Page 21: Code Analysis

Symbolic Execution

• Goals
▫ Verify properties of a program by algebraic manipulation of the source text, without requiring a formal specification
• Methods:
▫ Typically the program is "executed" statically by performing back-substitution
▫ Converts sequential logic into a set of parallel assignments in which output values are expressed in terms of input values

Previous example:
    X := A + B;
    Y := D - C;
    if X > 0 then
        Z := Y + 1;
    end if;

A + B <= 0:
    X = A + B
    Y = D - C
    Z = not defined

A + B > 0:
    X = A + B
    Y = D - C
    Z = D - C + 1
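The two result tables can be reproduced by substituting symbolic values for variables. This is a deliberately tiny sketch that handles only this one program: the class `SymbolicExec`, the space-separated expression format and the output format are assumptions, and compound values are parenthesised when substituted, so Z appears as (D - C) + 1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SymbolicExec {

    // Replaces every variable in a space-separated expression by its symbolic value.
    static String subst(String expr, Map<String, String> state) {
        StringBuilder out = new StringBuilder();
        for (String tok : expr.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            String v = state.get(tok);
            if (v == null) {
                out.append(tok);                           // input symbol, operator or constant
            } else if (v.contains(" ")) {
                out.append('(').append(v).append(')');     // compound value
            } else {
                out.append(v);
            }
        }
        return out.toString();
    }

    // Executes: X := A + B; Y := D - C; if X > 0 then Z := Y + 1; end if;
    // Returns one "path condition : final state" line per execution path.
    static List<String> run() {
        Map<String, String> state = new LinkedHashMap<>();
        state.put("X", subst("A + B", state));
        state.put("Y", subst("D - C", state));
        String cond = subst("X > 0", state);

        Map<String, String> taken = new LinkedHashMap<>(state); // fork at the branch
        taken.put("Z", subst("Y + 1", taken));

        List<String> paths = new ArrayList<>();
        paths.add(cond + " : " + taken);
        paths.add("not (" + cond + ") : " + state); // Z stays undefined on this path
        return paths;
    }

    public static void main(String[] args) {
        for (String path : run()) {
            System.out.println(path);
        }
    }
}
```

A realistic symbolic executor works on a parsed program and hands the path conditions to a constraint solver; neither is attempted here.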

Page 22: Code Analysis

Slicing Analysis

• Goals
▫ Extract the source code related to the concern, i.e. the slice
• Method:
▫ Obtain the concern-related variables
▫ Analyze the related statements and predicates to form a slice
▫ Analyze the slice to comprehend the program
• Analysis approaches
▫ Data flow analysis
▫ Dependency analysis

Page 23: Code Analysis

Slicing Analysis

• Example

Page 24: Code Analysis

Backward Slice

Backward slice with respect to "printf("%d\n", i)"

    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }

Page 26: Code Analysis

Slice Extraction

Backward slice with respect to "printf("%d\n", i)"

    int main() {
        int i = 1;
        while (i < 11) {
            i = i + 1;
        }
        printf("%d\n", i);
    }

Page 27: Code Analysis

Forward Slice

Forward slice with respect to "sum = 0"

    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }

Page 29: Code Analysis

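Control Flow Graph

    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }

[Figure: control flow graph. Enter → sum = 0 → i = 1 → while (i < 11); on T the loop body sum = sum + i → i = i + 1 runs and control returns to while (i < 11); on F control continues to printf(sum) → printf(i).]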

Page 30: Code Analysis

Flow Dependence Graph

    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }

Flow dependence p → q: the value of a variable assigned at p may be used at q.

[Figure: flow dependence edges over the nodes Enter, sum = 0, i = 1, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]

Page 31: Code Analysis

Control Dependence Graph

Control dependence p →T q: q is reached from p if condition p is true (T), not otherwise. Similar for false (F), written p →F q.

    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }

[Figure: control dependence edges, all labelled T, over the nodes Enter, sum = 0, i = 1, while (i < 11), printf(sum), printf(i), sum = sum + i, i = i + 1]

Page 32: Code Analysis

Program Dependence Graph (PDG)

    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }

[Figure: PDG combining control dependence edges (labelled T) and flow dependence edges over the nodes Enter, sum = 0, i = 1, while (i < 11), printf(sum), printf(i), sum = sum + i, i = i + 1]

Page 33: Code Analysis

Program Dependence Graph (PDG)

    int main() {
        int i = 1;
        int sum = 0;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }

Opposite order of the first two statements, same PDG.

[Figure: the same PDG over the nodes Enter, sum = 0, i = 1, while (i < 11), printf(sum), printf(i), sum = sum + i, i = i + 1]

Page 34: Code Analysis

Backward Slice

    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }

[Figure: PDG over the nodes Enter, sum = 0, i = 1, while (i < 11), printf(sum), printf(i), sum = sum + i, i = i + 1, used to compute the backward slice]

Page 35: Code Analysis

Backward Slice (2)

[Figure: the same program and PDG as on page 34, showing the next step of the backward slice]

Page 36: Code Analysis

Backward Slice (3)

[Figure: the same program and PDG as on page 34, showing the next step of the backward slice]

Page 37: Code Analysis

Backward Slice (4)

[Figure: the same program and PDG as on page 34, showing the final step of the backward slice]

Page 38: Code Analysis

Slice Extraction

    int main() {
        int i = 1;
        while (i < 11) {
            i = i + 1;
        }
        printf("%d\n", i);
    }

[Figure: the extracted sub-graph of the PDG with the nodes Enter, i = 1, while (i < 11), i = i + 1, printf(i)]
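On a PDG, a backward slice is the set of nodes from which the slicing criterion can be reached along dependence edges, so it can be computed by walking the edges in reverse. The sketch below assumes a hand-derived dependence table for the summation program (the figures in this transcript do not preserve the edges, so the table is a reconstruction from the program text) and a made-up class name `BackwardSlice`.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class BackwardSlice {

    // dependsOn.get(q) lists every p with a control or flow dependence edge from p to q.
    static Set<String> slice(Map<String, List<String>> dependsOn, String criterion) {
        Set<String> slice = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(criterion);
        while (!work.isEmpty()) {
            String q = work.pop();
            if (slice.add(q)) {
                for (String p : dependsOn.getOrDefault(q, Collections.<String>emptyList())) {
                    work.push(p);
                }
            }
        }
        return slice;
    }

    // Dependence table derived by hand from the summation program (an assumption).
    static Map<String, List<String>> samplePdg() {
        Map<String, List<String>> d = new HashMap<>();
        d.put("sum = 0", Arrays.asList("Enter"));
        d.put("i = 1", Arrays.asList("Enter"));
        d.put("while (i < 11)", Arrays.asList("Enter", "i = 1", "i = i + 1"));
        d.put("sum = sum + i",
              Arrays.asList("while (i < 11)", "sum = 0", "sum = sum + i", "i = 1", "i = i + 1"));
        d.put("i = i + 1", Arrays.asList("while (i < 11)", "i = 1", "i = i + 1"));
        d.put("printf(sum)", Arrays.asList("Enter", "sum = 0", "sum = sum + i"));
        d.put("printf(i)", Arrays.asList("Enter", "i = 1", "i = i + 1"));
        return d;
    }

    public static void main(String[] args) {
        System.out.println(slice(samplePdg(), "printf(i)"));
    }
}
```

For the criterion printf(i) the result contains Enter, i = 1, while (i < 11), i = i + 1 and printf(i), which are the statements kept in the extracted slice. A forward slice uses the same traversal with the edges followed in their original direction.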

Page 39: Code Analysis

Clone Analysis

• A code clone is a code fragment in source files that is identical or similar to another fragment
[Figure labels: Clone Pair, Clone Class]
• Code clones are one of the factors that make software maintenance more difficult
▫ If a fault is found in a code clone, it is necessary to consider the pros and cons of modifying all of its clones
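To make the terms concrete, here is a deliberately naive, line-based detector for identical fragments. It is a sketch under assumptions and not how CCFinder or Gemini work internally; the class `CloneFinder`, the window parameter and the sample lines are invented for the illustration. Lines are compared after removing whitespace, and every group of equal windows forms one clone class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CloneFinder {

    // Returns clone classes: groups of 0-based start lines whose `window`
    // consecutive lines are identical once whitespace is ignored.
    static List<List<Integer>> cloneClasses(List<String> lines, int window) {
        Map<String, List<Integer>> byText = new LinkedHashMap<>();
        for (int i = 0; i + window <= lines.size(); i++) {
            StringBuilder key = new StringBuilder();
            for (int j = i; j < i + window; j++) {
                key.append(lines.get(j).replaceAll("\\s+", "")).append('\n');
            }
            byText.computeIfAbsent(key.toString(), k -> new ArrayList<Integer>()).add(i);
        }
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> starts : byText.values()) {
            if (starts.size() > 1) {
                result.add(starts); // two members form a clone pair
            }
        }
        return result;
    }

    // Made-up input: two method bodies that share their last two statements.
    static List<String> sample() {
        return Arrays.asList(
            "methodZ();",
            "System.out.println(\"name:\" + name);",
            "System.out.println(\"amount:\" + i);",
            "methodY();",
            "System.out.println(\"name:\"+name);",
            "System.out.println(\"amount:\" + i);");
    }

    public static void main(String[] args) {
        System.out.println(cloneClasses(sample(), 2)); // prints [[1, 4]]
    }
}
```

Detecting similar rather than identical fragments needs token or tree comparison, which this sketch does not attempt.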

Page 40: Code Analysis

Clone Analysis

• Improvements for cloned code
▫ Extract method
▫ Pull up method
• Tools
▫ CCFinder
▫ Gemini

Page 41: Code Analysis

Extract Method

Before:

    void methodA(int i) {
        methodZ();
        System.out.println("name:" + name);
        System.out.println("amount:" + i);
    }

    void methodB(int i) {
        methodY();
        System.out.println("name:" + name);
        System.out.println("amount:" + i);
    }

After:

    void methodA(int i) {
        methodZ();
        methodC(i);
    }

    void methodB(int i) {
        methodY();
        methodC(i);
    }

    // The duplicated statements now live in one place.
    void methodC(int i) {
        System.out.println("name:" + name);
        System.out.println("amount:" + i);
    }

Page 42: Code Analysis

Pull Up Method

[Figure: before, the subclasses class B and class C of class A each contain method A; after, method A is defined once in class A]
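In code, the result of the refactoring might look like the following sketch. The wrapper class `PullUpDemo` and the method body are placeholders; only the names A, B, C and method A come from the slide.

```java
// Wrapper used only to keep the example in one compilation unit (hypothetical).
final class PullUpDemo {

    // After the refactoring: methodA is declared once, in the superclass.
    static class A {
        String methodA() {
            return "shared behaviour";
        }
    }

    // B and C no longer carry their own copies; they inherit methodA from A.
    static class B extends A { }

    static class C extends A { }

    public static void main(String[] args) {
        System.out.println(new B().methodA());
        System.out.println(new C().methodA());
    }
}
```

A fault found in the shared method now has to be fixed in exactly one place.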

Page 43: Code Analysis

Syntax Analysis

• Goals
▫ Construct the AST (abstract syntax tree)
▫ Validate the AST against the BNF grammar
• Methods
▫ Bottom-up: operator-precedence methods
▫ Top-down: recursive descent
▫ Context-free methods such as LR (left-to-right), etc.

Syntax analysis is also the foundation of compiling and of other analysis approaches.
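As an illustration of the top-down approach, the sketch below is a recursive descent parser for arithmetic expressions. The grammar, the class name `ExprParser` and the prefix notation used to print the AST are all assumptions made for the example.

```java
final class ExprParser {
    private final String src;
    private int pos = 0;

    private ExprParser(String src) {
        this.src = src.replaceAll("\\s+", "");
    }

    // Parses the input and returns the AST in prefix form, e.g. (+ 1 (* 2 3)).
    static String parse(String src) {
        ExprParser p = new ExprParser(src);
        String ast = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return ast;
    }

    // expr ::= term { ("+" | "-") term }
    private String expr() {
        String left = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            String right = term();
            left = "(" + op + " " + left + " " + right + ")";
        }
        return left;
    }

    // term ::= factor { ("*" | "/") factor }
    private String term() {
        String left = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            String right = factor();
            left = "(" + op + " " + left + " " + right + ")";
        }
        return left;
    }

    // factor ::= NUMBER | "(" expr ")"
    private String factor() {
        if (peek() == '(') {
            pos++;
            String inner = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("expected ) at " + pos);
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected number at " + pos);
        }
        return src.substring(start, pos);
    }

    // Current character, or '\0' at the end of the input.
    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 * 3")); // prints (+ 1 (* 2 3))
    }
}
```

Each grammar rule becomes one method, and input that does not fit the grammar is rejected with an exception, which is the validation step named in the goals.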

Page 44: Code Analysis

Type Analysis

• Goals
▫ Locate type errors
• Methods
▫ Most are based on static analysis, to eliminate type errors and verify software quality
▫ Some are based on dynamic analysis

Page 45: Code Analysis

Pointer Analysis

• Goals
▫ Find the locations to which a pointer may point
▫ Lies at the heart of many program optimization and verification problems
• Pointer analysis is undecidable in static analysis
▫ There exist many conservative approximations
▫ Smaller points-to set → more precision
• Factors
▫ Flow sensitivity
▫ Context sensitivity
▫ Etc.

Page 46: Code Analysis

Alias Analysis

• Why?
▫ More accurate memory dependence analysis and data flow analysis
▫ More aggressive optimization and scheduling. Without alias analysis, data flow analysis, optimization and scheduling have to be conservative.
• Example
    r1 = arr[1];
    arr[2] = r2;
    r3 = arr[1];
    val = r3 + arr[3];

Page 47: Code Analysis

Alias Analysis

• Challenges
▫ Formal parameters
▫ Function pointers
▫ Struct & union
▫ Type casts
• Alias analysis: computes pairs of pointers that may point to the same memory location
▫ Used primarily by older pointer analyses for C
▫ Can be computed using a points-to analysis: may-alias(v1, v2) if points-to(v1) ∩ points-to(v2) ≠ Ø
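The may-alias rule translates directly into a set intersection test. In this sketch the points-to sets are simply given as input; computing them is the job of the points-to analysis itself. The class `MayAlias`, the variables p, q, r and the locations x, y, z are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class MayAlias {

    // may-alias(v1, v2) holds when the two points-to sets share at least one location.
    static boolean mayAlias(Map<String, Set<String>> pointsTo, String v1, String v2) {
        Set<String> a = pointsTo.getOrDefault(v1, Collections.<String>emptySet());
        Set<String> b = pointsTo.getOrDefault(v2, Collections.<String>emptySet());
        return !Collections.disjoint(a, b);
    }

    // Made-up analysis result: p may point to x or y, q to y, r to z.
    static Map<String, Set<String>> sample() {
        Map<String, Set<String>> pointsTo = new HashMap<>();
        pointsTo.put("p", new HashSet<>(Arrays.asList("x", "y")));
        pointsTo.put("q", new HashSet<>(Arrays.asList("y")));
        pointsTo.put("r", new HashSet<>(Arrays.asList("z")));
        return pointsTo;
    }

    public static void main(String[] args) {
        System.out.println(mayAlias(sample(), "p", "q")); // prints true
        System.out.println(mayAlias(sample(), "p", "r")); // prints false
    }
}
```

Smaller points-to sets make the intersection empty more often, so fewer pairs are reported as possible aliases.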

Page 48: Code Analysis

Alias Analysis

• Example
    class Quad { uint32 ulow; uint32 uhigh; };
    class qpart { ushort c, d, a, b; };
    Quad quad;
    qpart *s = (qpart *) &quad;

[Figure: memory layout with the labels ulow, uhigh and c d a b]

Page 49: Code Analysis

Range Checking

• Goals:
▫ Ensure data values lie within the specified ranges
▫ Ensure data maintains the specified accuracy
• Methods:
▫ Overflow and underflow analysis
▫ Range checking analysis
▫ Array bounds checking
▫ Rounding error analysis

Discrete static bounds can often be checked automatically.
Checking is straightforward for enumeration types.
Proving the absence of overflow for real types can be demanding.
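A common way to check discrete bounds statically is interval arithmetic: each variable is abstracted to a lower and an upper bound, and operations are applied to the bounds. The sketch below covers addition only; the class `IntRange` is hypothetical, and it assumes that the bounds themselves fit comfortably in a long.

```java
final class IntRange {
    final long lo;
    final long hi;

    IntRange(long lo, long hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("empty range");
        }
        this.lo = lo;
        this.hi = hi;
    }

    // Range of a + b when a lies in this range and b lies in other.
    IntRange plus(IntRange other) {
        return new IntRange(lo + other.lo, hi + other.hi);
    }

    // True when every value in the range is representable as a 32-bit int,
    // i.e. the addition can never overflow or underflow.
    boolean fitsInt() {
        return lo >= Integer.MIN_VALUE && hi <= Integer.MAX_VALUE;
    }

    // True when every value in the range is a valid index for an array of this length.
    boolean validIndexFor(int length) {
        return lo >= 0 && hi < length;
    }

    public static void main(String[] args) {
        IntRange sum = new IntRange(0, 55); // assumed bounds for sum in the loop example
        IntRange i = new IntRange(1, 10);   // assumed bounds for i inside the loop
        System.out.println(sum.plus(i).fitsInt()); // prints true
    }
}
```

Loops need extra machinery (for example widening) to reach such bounds automatically; the ranges here are supplied by hand.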

Page 50: Code Analysis

Structure Analysis

• Goals
▫ How artifacts build into higher-level artifacts
▫ How artifacts depend on each other
▫ Visualization
• Methods:
▫ Dependency analysis
▫ Impact analysis
• Tools
▫ STAN – a structure analysis tool for Java
▫ IBM Rational Rose
▫ MS Visio

Page 51: Code Analysis

Structure Analysis

• Directed dependency graph

Page 52: Code Analysis

Model Checking

• Goals
▫ Verify the system models against the requirements
• Methods
▫ State transition
▫ Modal / temporal logics
▫ Define and validate the mathematical problem "can the state transitions satisfy the logic?"
• Potential problem: abstracting models from code may lose some information
• Tools: SLAM, Java PathFinder2, etc.
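The state-transition view can be illustrated with an explicit-state search that checks an invariant, the simplest kind of property. This is a sketch only and says nothing about how SLAM or Java PathFinder are implemented; the class `TinyModelChecker`, the integer state encoding and the sample transition system are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntFunction;
import java.util.function.IntPredicate;

final class TinyModelChecker {

    // Explores every state reachable from `init` in breadth-first order.
    // Returns the first state that violates the invariant, or null if none does.
    static Integer findViolation(int init, IntFunction<int[]> next, IntPredicate invariant) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        seen.add(init);
        queue.add(init);
        while (!queue.isEmpty()) {
            int state = queue.remove();
            if (!invariant.test(state)) {
                return state; // counterexample state
            }
            for (int successor : next.apply(state)) {
                if (seen.add(successor)) {
                    queue.add(successor);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up model: a counter that advances by 2 modulo 6, starting at 0.
        IntFunction<int[]> step = s -> new int[] {(s + 2) % 6};
        System.out.println(findViolation(0, step, s -> s != 3)); // prints null
        System.out.println(findViolation(0, step, s -> s != 4)); // prints 4
    }
}
```

Temporal properties beyond invariants, and state spaces too large to enumerate, are what dedicated model checkers add on top of this basic search.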

Page 53: Code Analysis

Theorem Proving

• Objective
▫ Theoretical proof of the system logic
• Features
▫ Based on a formal or mathematical approach
▫ The most complex and the most precise
▫ Relies on manual translation and configuration
• Tools: ESC, ESC/Java

Page 54: Code Analysis

Pros and Cons of Static Analysis

• Pros
▫ Can cover all the code and paths
▫ Prior knowledge is not mandatory; applicable to unfamiliar code
▫ Suitable for both complete and partial comprehension
• Cons
▫ The precision is affected by dynamic features
▫ Relies on programming languages and coding styles
▫ Some data dependencies are too complicated

Page 55: Code Analysis

Dynamic Analysis

• Dynamic tracing
• Off-line validation
• Online detecting

Page 56: Code Analysis

Dynamic Tracing

• Output-based
▫ Use the system output, logs, etc.
• Code instrumentation
▫ Source code instrumentation
▫ Binary code instrumentation
▫ Interceptor for the communication between caller and callee
• Through platform interfaces
▫ Use the development platform, like the OS, JVM, etc.
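In Java, one way to place an interceptor between caller and callee without touching either side's source is a dynamic proxy from `java.lang.reflect`. The sketch below records the entry and exit of every call made through an interface; the class `TracingInterceptor`, the `Account` interface and the log format are made up for the illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class TracingInterceptor {

    // Hypothetical service interface standing in for the callee.
    interface Account {
        int deposit(int amount);
    }

    // Wraps `target` so that every call through `iface` is recorded in `log`.
    @SuppressWarnings("unchecked")
    static <T> T wrap(final T target, Class<T> iface, final List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("enter " + method.getName());
            try {
                return method.invoke(target, args); // forward to the real callee
            } finally {
                log.add("exit " + method.getName());
            }
        };
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    // Runs one traced call and returns the recorded trace.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Account real = new Account() {
            private int balance = 0;

            @Override
            public int deposit(int amount) {
                balance += amount;
                return balance;
            }
        };
        Account traced = wrap(real, Account.class, log);
        traced.deposit(5);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [enter deposit, exit deposit]
    }
}
```

Dynamic proxies only cover calls made through interfaces; tracing arbitrary methods calls for source or bytecode instrumentation, as listed above.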

Page 57: Code Analysis

Off-line validation

• Input generation
▫ Static analysis first
▫ Goal-driven input generation
• Constraint description
▫ Describe the constraints using a linear sequence diagram
• Execution trace analysis
▫ Plenty of internal and output data at runtime
▫ Trace generation from these data
▫ Verify the trace against the constraints

Page 58: Code Analysis

Online Detecting

• In-line
▫ The monitor runs in the same space as the system
▫ High efficiency, quick response
▫ May affect the system itself
• Out-of-line
▫ The monitor runs in an independent space
▫ Can deal with multiple outputs
▫ Low efficiency

Page 59: Code Analysis

Online Detecting

• Features
▫ The input should not affect the normal execution of the system
▫ The monitor is always part of the system

Page 60: Code Analysis

Pros and cons

• Pros
▫ The running results are more trustworthy
▫ Incremental comprehension by adding test cases
▫ High precision
• Cons
▫ Prior knowledge is required to design test cases
▫ Test cases are often not thorough
▫ Cannot filter out dead code
▫ Requires the ability to run the program

Page 61: Code Analysis

Hybrid Analysis

• Dynamic Analysis (DA) + Information Retrieval (IR)
• DA + Impact Analysis
• DA + Web Mining
• DA + IR + dependency analysis
• IR + BRCG
• Analysis + Test
• etc.

Page 62: Code Analysis

IR+BRCG

• Approach overview

Page 63: Code Analysis

IR+BRCG

• A BRCG example

Page 64: Code Analysis

Tool Demo

• Instrumentation
▫ Bytecode instrumentation – IBM BIPTK
• Information retrieval
▫ Lucene 3.0
• Static analysis
▫ PMD
▫ Architexa @ http://www.architexa.com/
▫ Code Analysis Plugin @ http://sourceforge.net/projects/cap4e/
▫ AppPerfect @ http://www.appperfect.com/download/files/index.html

Page 65: Code Analysis

Projects

• Project candidates
▫ JDK
▫ JUnit
▫ Log4j
▫ Struts 2.0
▫ Spring
▫ Hibernate
• Deliverables
▫ Analysis report, including at least the architecture, a flow chart or sequence diagram, and a function description

Page 66: Code Analysis

Thank You !

