
Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and...

Date post: 11-Jan-2016
Upload: leon-wilcox
Transcript
Page 1: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Getting Started in Program Analysis Research: Outline

Background and useful skills
- Ana

Using and developing analysis
- Mary Lou

Identifying and building infrastructure
- Lori

Evaluating your analysis
- Ana

Page 2

Ana Milanova

I am from Bulgaria
National High School for Math and Science
American University in Bulgaria, 1997
- I have a degree in Business Administration

Rutgers University, PhD in CS, 2003
Now Assistant Professor at RPI
Research: program analysis for software tools
Family
- Husband Tony
- Katarina, 5 and Petar, 2

Page 3

Program Analysis: Useful Background and Skills

Page 4

Program Analysis

Static program analysis
- Analyzes the source code of the program
- Reasons about run-time behavior properties without running the program
  E.g., “The object values that flow to reference variable x are only of classes A and B, but not C.”

Static analyses are conservative: they consider all possible run-time behaviors of the program

Page 5

Program Analysis

Dynamic program analysis
- Analyzes a set of program executions
- Reasons about run-time behavior properties over observed executions
  E.g., “The object values that flowed to reference variable x during observed executions were only of classes A and B, but not C.”

Dynamic analyses are incomplete: they consider only behaviors over particular executions

Goal: combine with static analysis

Page 6

Uses of Static Program Analysis

Compilers – traditional application domain
- Enables optimizing transformations

Software engineering tools
- Static debugging, verification, security
  Uncover difficult errors and security flaws
- Testing
  Evaluate and improve test suites
- Software understanding
  Calling structure
  Complex dependences
  Change impacts

Page 7

Uses of Program Analysis

Analysis for compiler optimization is different from analysis for software tools

Different requirements, different success criteria (more later…)

Page 8

Static Analysis Methodologies

Data-flow analysis

Constraint-based program analysis

Abstract interpretation

Type and effect systems

Model checking

Page 9

Example: Data-flow Analysis

Flow facts
- Information that we are propagating
- E.g., set of definitions {(i,1), (i,4), (i,6)…}

Transfer functions
- The effect of a statement on the incoming flow facts
- E.g., statement i=11 at 6 “kills” the incoming definition (i,4), and “generates” definition (i,6)

[Figure: control-flow graph with nodes 1. i=11; read x,y   2. if x<y   3. p(i)   4. i=j+5   5. p(i)   6. i=11   7. i=i*i, with edges annotated by the definition sets {(i,1)}, {(i,4)} and {(i,6)}]
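The kill/generate behavior of a transfer function can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `transfer` and the encoding of a definition as a `(variable, statement number)` pair are chosen here to mirror the slide's notation and do not come from the talk.

```python
def transfer(stmt_number, defined_var, incoming):
    """Reaching-definitions transfer function for one assignment statement.

    incoming: set of (variable, statement_number) definitions that reach
    the statement. Returns the set of definitions that leave it.
    """
    # "Kill": drop every incoming definition of the variable being assigned.
    survivors = {(var, num) for (var, num) in incoming if var != defined_var}
    # "Generate": add the definition created by this statement.
    return survivors | {(defined_var, stmt_number)}


# Statement 6 (i=11) with incoming definition (i,4), as on the slide.
out_6 = transfer(6, "i", {("i", 4)})
print(out_6)  # {('i', 6)}
```

Definitions of other variables pass through untouched, so `transfer(6, "i", {("i", 4), ("j", 2)})` keeps `("j", 2)` and swaps `("i", 4)` for `("i", 6)`.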

Page 10

Theory

Data-flow frameworks
- Control-flow graph CFG
- Space of flow facts L
- Space of transfer functions F
- Certain properties of L and F allow a general solution procedure

Fixed-point iteration
- Termination: the iterative computation terminates
- Safety (correctness, soundness): the solution is conservative
  For most problems the analysis produces “noise”
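A rough sketch of fixed-point iteration, assuming a forward analysis whose flow facts are finite sets joined by union. The names `solve_forward`, `reaching_defs`, and the four-node loop CFG are hypothetical and introduced only for illustration. Termination follows because each node's set can only grow and the universe of facts is finite.

```python
from collections import deque


def solve_forward(nodes, succs, entry, entry_fact, transfer):
    """Iterate a forward 'may' analysis (join = set union) to a fixed point."""
    preds = {n: [] for n in nodes}
    for n in nodes:
        for s in succs.get(n, []):
            preds[s].append(n)

    out = {n: frozenset() for n in nodes}  # start from the empty set everywhere
    worklist = deque(nodes)
    while worklist:
        n = worklist.popleft()
        incoming = [out[p] for p in preds[n]]
        if n == entry:
            incoming.append(entry_fact)
        in_n = frozenset().union(*incoming)  # join over all predecessors
        new_out = frozenset(transfer(n, in_n))
        if new_out != out[n]:  # facts changed: successors must be revisited
            out[n] = new_out
            for s in succs.get(n, []):
                if s not in worklist:
                    worklist.append(s)
    return out


# Hypothetical CFG with a loop: 1: x=0, 2: loop test, 3: x=x+1, 4: use x.
succs = {1: [2], 2: [3, 4], 3: [2], 4: []}
defs = {1: "x", 3: "x"}  # node -> variable it defines


def reaching_defs(node, facts):
    """Kill old definitions of the assigned variable, generate the new one."""
    if node not in defs:
        return facts
    var = defs[node]
    return {(v, n) for (v, n) in facts if v != var} | {(var, node)}


solution = solve_forward([1, 2, 3, 4], succs, 1, frozenset(), reaching_defs)
print(sorted(solution[4]))  # [('x', 1), ('x', 3)]
```

Both definitions of `x` reach node 4 because the solution covers the path that skips the loop body as well as the paths that execute it.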

Page 11

Theory and Practice

Analysis cost – how much time, memory

Analysis precision – how much noise
- a.m(): A more precise analysis reports a: {B}, and a less precise analysis reports a: {A,B,C}

Typically, there is a tradeoff between cost and precision!

In practice, we need to analyze very large programs, 100K LOC, even 1M LOC

Page 12

Theory and Practice

Approximations - introduce noise
- make the CFG “smaller”
- make the set of flow facts “smaller”
- make the transfer functions converge faster

Approximations are necessary
- But be careful: different approximations for different analyses

Page 13

Standard Approximations

Flow-sensitive vs. flow-insensitive

Flow-sensitive (one solution after each statement):
x = true;     x: {true}
y = false;    x: {true}, y: {false}
x = y;        x: {false}, y: {false}

Flow-insensitive (one solution for the whole program):
x: {true,false}, y: {false}
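The difference can be reproduced with a toy analysis of the three assignments on the slide. This is a sketch under assumptions: the statement encoding `(target, source)` and the function names are invented here, and a real analysis would work over a CFG rather than a straight-line list.

```python
# x = true; y = false; x = y   (a source that is a str names another variable)
stmts = [("x", True), ("y", False), ("x", "y")]


def flow_sensitive(stmts):
    """Respect statement order: each assignment overwrites the target's set."""
    env = {}
    history = []
    for target, source in stmts:
        values = env[source] if isinstance(source, str) else {source}
        env = {**env, target: set(values)}  # strong update of the target
        history.append({var: set(vals) for var, vals in env.items()})
    return history


def flow_insensitive(stmts):
    """Ignore statement order: accumulate values until nothing changes."""
    env = {target: set() for target, _ in stmts}
    changed = True
    while changed:
        changed = False
        for target, source in stmts:
            values = env.get(source, set()) if isinstance(source, str) else {source}
            if not values <= env[target]:
                env[target] |= values
                changed = True
    return env


print(flow_sensitive(stmts)[-1])  # {'x': {False}, 'y': {False}}
print(flow_insensitive(stmts))    # {'x': {False, True}, 'y': {False}}
```

The flow-insensitive result is cheaper to compute (one set per variable) but reports `true` for `x` at the end of the program, which is the "noise" the earlier slides refer to.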

Page 14

Standard Approximations

Context-sensitive vs. context-insensitive

A(bool X) { this.f = X; }
a = new A(true);
b = new A(false);

Context-sensitive (each call analyzed separately):
a.f = true
b.f = false
a.f: {true}, b.f: {false}

Context-insensitive (merged flow):
a.f = true/false
b.f = true/false
a.f: {true,false}, b.f: {true,false}
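A minimal sketch of the same contrast, assuming the two constructor calls are encoded as `(receiver, argument)` pairs. The function names are hypothetical, and the constructor body `this.f = X` is modeled directly rather than parsed.

```python
# a = new A(true); b = new A(false)
calls = [("a", True), ("b", False)]


def context_sensitive(calls):
    """Analyze the constructor once per call site: X is bound separately."""
    return {f"{receiver}.f": {arg} for receiver, arg in calls}


def context_insensitive(calls):
    """Keep a single summary for parameter X: all call sites are merged."""
    x_values = {arg for _, arg in calls}
    return {f"{receiver}.f": set(x_values) for receiver, _ in calls}


print(context_sensitive(calls))    # {'a.f': {True}, 'b.f': {False}}
print(context_insensitive(calls))  # {'a.f': {False, True}, 'b.f': {False, True}}
```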

Page 15

Useful Background and Skills

Higher-level undergraduate or graduate courses on:
- Programming Languages, Compilers, Algorithms, Logic, Software Engineering, Architecture

Analytical and programming skills
Step 1: Design a program analysis algorithm
  Understand your target language (e.g., Java, C++, C)
Step 2: Implement the analysis algorithm
  Understand the language(s) of the infrastructure
Step 3: Evaluate the analysis algorithm

Page 16

Useful Resources

Books (my personal list)
- “Compilers: Principles, Techniques and Tools” by Aho, Sethi, Ullman, Ch. 10
  An introduction to data-flow analysis
- “Principles of Program Analysis” by Nielson, Nielson, Hankin
  An excellent reference for advanced students
- “Model Checking” by Clarke, Grumberg, Peled

Course material on the web
- Classes taught by professors
- My class (there are better ones, of course): www.cs.rpi.edu/~milanova/csci6961/lectures/

Page 17

Using and Developing Program Analysis

Mary Lou Soffa

University of Virginia

Page 18

About Mary Lou Soffa

Confused about what I wanted to be

Ph.D. programs:
- Mathematics; Sociology; Philosophy; Environmental Acoustics: disenchanted
- Found what I really loved – computer science

After 25+ years at Pitt, moved to UVA
- Small farm – grow “crops”; love my tractor
- Passion – increasing the participation of women and minorities in computer science
- Professional achievement – 24 Ph.D. students; ½ are women.

Page 19

Program analysis

How to apply program analysis in your research

What are the questions and what do you have to do

Page 20

Solve a problem

Program behavior: static or dynamic

What parts of program are involved

Determine information needed

Develop appropriate representation

Develop analysis

Develop algorithm

Page 21

Have a goal – program code

Problem
- Improve performance
- Understand program
- Find errors
- Locate cause of errors

Need to collect information about the program that helps you infer properties of the program

Static or dynamic code

Page 22

Determine information needed

What questions are you asking

What do you need to gather to answer questions
- Examples:
  Statements needed to compute an expression
  Values that are always constant at a particular program point
  Locations of dead statements
  Branches that are correlated

Page 23

Example: redundancy

Remove redundancies with the goal of improving performance
- Redundant expressions
- Redundant loads
- Redundant stores
- Dead code

Static
Remove redundant expressions from program representation

Page 24

Redundant expressions

Does the value need to be computed for correct semantics?

X := A * B

F := C + E

C := C + 1

If (cond) then R := A * B; S := C + E

Else X := A * B; A := 6

End if

G := A * B

Page 25

What parts of program involved

Given information you need, what parts of program are involved

Examples:
- branches and statements that change values in conditional
- all possible execution paths
- Array definitions and uses
- Types
- Loops

Page 26

Example: Redundant expressions

Expressions

Definitions

Control flow among definitions and expressions

Program paths

Page 27

Program representation

Program representation that enables collection of information

Granularity
- Source, intermediate, binary

Issues: how to get representation from another representation

Page 28

Example: redundant expressions

Want to know how expressions flow

Is the value of an expression the same as when the expression is used again

Need control flow graph with statements in nodes – intermediate level

X := A + B

Page 29

Available Expressions
Control flow graph

Entry block:
X := A * B
F := C + E
C := C + 1

Then branch:
R := A * B
S := C + E

Else branch:
X := A * B
A := 6

Join block:
G := A * B

Page 30

Formulate analysis over representation

How to gather information from representation

How many analyses

Direction of flow of analysis

Along all paths or any path

Local solution

Global solution

Page 31

Example: Redundant expressions

Local - basic block – single entry/exit
- What expressions are generated
- What expressions are “killed” by a definition

Global flow over flow graph
- Forward flow
- Must be true on all paths
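The local step can be sketched as a single pass over one basic block. The encoding is an assumption made for this example: a statement is a `(target, expr)` pair, an expression is a `(left, op, right)` tuple, and `local_gen_kill` is a hypothetical helper name. The block used is the first three statements of the running example.

```python
def local_gen_kill(block, all_exprs):
    """Compute the generated and killed expression sets of one basic block.

    block: list of (target, expr); expr is (left, op, right) or None when the
    right-hand side is not a tracked expression (e.g. a constant).
    all_exprs: every expression that occurs anywhere in the program.
    """
    gen, kill = set(), set()
    for target, expr in block:
        if expr is not None:
            gen.add(expr)  # the statement computes this expression
        # Defining `target` invalidates every expression that reads it.
        killed = {e for e in all_exprs if target in (e[0], e[2])}
        gen -= killed
        kill |= killed
    return gen, kill


AB, CE, C1 = ("A", "*", "B"), ("C", "+", "E"), ("C", "+", "1")
entry_block = [("X", AB), ("F", CE), ("C", C1)]  # X := A*B; F := C+E; C := C+1
gen, kill = local_gen_kill(entry_block, {AB, CE, C1})
print(gen)  # {('A', '*', 'B')}
```

Only `A * B` survives to the end of the block, because the final assignment to `C` kills both `C + E` and `C + 1`.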

Page 32

Redundant Expressions
Control flow graph

Entry block:
X := A * B
F := C + E
C := C + 1

Then branch:
R := A * B
S := C + E

Else branch:
X := A * B
A := 6

Join block:
G := A * B

Available-expression sets annotated on the graph: {A * B}, {A * B}, {A * B, C + E}, {A * B}

Page 33

Develop analyses

Data flow equations – use data flow framework

Algorithm

Preciseness

Expense

Page 34

Data flow equations

Gen(B) = all expressions computed in B and not killed later in B

Kill(B) = all expressions killed by definitions in B (a definition kills every incoming available expression that uses the defined variable)

Out(B) = Gen(B) ∪ (In(B) - Kill(B))

In(B) = ∩ Out(j), over all predecessors j of B
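These equations can be solved by simple iteration. The sketch below applies them to the four-block graph of the running example. The block names B1 to B4, the string encoding of expressions, and the hand-computed Gen/Kill sets are assumptions made for illustration, and the solver expects every non-entry block to have at least one predecessor.

```python
def available_expressions(blocks, preds, gen, kill, entry, universe):
    """Solve Out(B) = Gen(B) | (In(B) - Kill(B)), In(B) = intersection of Out(j)."""
    # Optimistic start: everything is available except at the entry block.
    out = {b: set(gen[b]) if b == entry else set(universe) for b in blocks}
    in_ = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b != entry:
                # "Must be true on all paths": intersect over predecessors.
                in_[b] = set.intersection(*(out[p] for p in preds[b]))
            new_out = gen[b] | (in_[b] - kill[b])
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return in_, out


universe = {"A*B", "C+E", "C+1"}
blocks = ["B1", "B2", "B3", "B4"]  # entry, then branch, else branch, join
preds = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
gen = {"B1": {"A*B"}, "B2": {"A*B", "C+E"}, "B3": set(), "B4": {"A*B"}}
kill = {"B1": {"C+E", "C+1"}, "B2": set(), "B3": {"A*B"}, "B4": set()}

in_sets, out_sets = available_expressions(blocks, preds, gen, kill, "B1", universe)
print(in_sets["B4"])  # set()
```

Under this encoding `A * B` is not available on entry to the join block, because `A := 6` on the else branch kills it, so `G := A * B` has to be recomputed even though the then branch computes the same expression.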

Page 35

Dynamic Optimization

Static optimizations
- Apply before execution

Dynamic optimizations
- Apply during execution – redundant expressions
- Binary code
- Program traces

Page 36

 1. A = 4
 2. T1 = A*B
 3. L1: T2 = T1/C
 4. If T2 < W go to L2
 5. M = T1 * K
 6. T3 = M + 1
 7. L2: H = I
 8. M = T3 - H
 9. If T3 > 0 go to L3
10. Go to L1
11. L3: halt

Basic blocks in the control flow graph:
B1: statements 1-2
B2: statements 3-4
B3: statements 5-6
B4: statements 7-9
B5: statement 10
B6: statement 11
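The partition into B1 through B6 follows the usual "leader" rule: a block starts at the first statement, at every jump target, and at every statement that follows a jump. A minimal sketch, assuming each statement is encoded as a hypothetical `(label, text, jump_target)` tuple:

```python
def basic_blocks(instrs):
    """Split a statement list into basic blocks of 1-based statement numbers.

    instrs: list of (label, text, jump_target); label and jump_target are
    None when absent.
    """
    label_at = {label: i for i, (label, _, _) in enumerate(instrs) if label}
    leaders = {0}  # the first statement always starts a block
    for i, (_, _, target) in enumerate(instrs):
        if target is not None:
            leaders.add(label_at[target])  # a jump target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)  # so does the statement after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [list(range(s + 1, e + 1)) for s, e in zip(starts, ends)]


program = [
    (None, "A = 4", None),
    (None, "T1 = A*B", None),
    ("L1", "T2 = T1/C", None),
    (None, "if T2 < W go to L2", "L2"),
    (None, "M = T1 * K", None),
    (None, "T3 = M + 1", None),
    ("L2", "H = I", None),
    (None, "M = T3 - H", None),
    (None, "if T3 > 0 go to L3", "L3"),
    (None, "go to L1", "L1"),
    ("L3", "halt", None),
]

print(basic_blocks(program))
# [[1, 2], [3, 4], [5, 6], [7, 8, 9], [10], [11]]
```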

Page 37

Program Trace

A = 4
T1 = A*B
T2 = T1/C
If T2 !< W jump out
H = I
M = T3 - H
If T3 > 0 go to L3
T2 = T1/C
If T2 !< W jump out
M = T1 * K
T3 = M + 1
H = I
M = T3 - H
halt

Binary code
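Because a trace is straight-line code, redundant computations can be found in one forward scan with no fixed-point iteration. The sketch below is an assumption-laden illustration rather than the algorithm from the talk: branches are omitted, copies and constant assignments are recorded only as definitions, and it flags recomputations without checking that the earlier result is still held in a variable.

```python
def redundant_in_trace(trace):
    """Return indices of statements that recompute an available expression.

    trace: list of (target, expr); expr is (left, op, right) or None for
    statements whose right-hand side is not tracked.
    """
    available = set()
    redundant = []
    for i, (target, expr) in enumerate(trace):
        if expr is not None:
            if expr in available:
                redundant.append(i)
            available.add(expr)
        # Redefining `target` invalidates expressions that read it.
        available = {e for e in available if target not in (e[0], e[2])}
    return redundant


trace = [
    ("A", None),               # A = 4
    ("T1", ("A", "*", "B")),   # T1 = A*B
    ("T2", ("T1", "/", "C")),  # T2 = T1/C
    ("H", None),               # H = I
    ("M", ("T3", "-", "H")),   # M = T3 - H
    ("T2", ("T1", "/", "C")),  # T2 = T1/C again
    ("M", ("T1", "*", "K")),   # M = T1 * K
    ("T3", ("M", "+", "1")),   # T3 = M + 1
    ("H", None),               # H = I
    ("M", ("T3", "-", "H")),   # M = T3 - H
]

print(redundant_in_trace(trace))  # [5]
```

The second `T2 = T1/C` is flagged because neither `T1` nor `C` changes in between. The second `M = T3 - H` is not, because `T3` is redefined first.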

Page 38

Dynamic optimization

Note:
- Single entry; multiple exits
- No loops
- Need a representation – bring up a level from binary code

Page 39

Applying optimizations

Not as complicated

But, cannot tolerate much overhead
- Phases in static
- Developed algorithm that can apply multiple optimizations
- Demand driven
- Limit study of dynamic optimizations

Page 40

Conclusion

Need analysis in many different applications
- Virtual execution environments
  Multicore
  Wireless sensor networks
- Testing
  Testing for wireless sensor networks
  Testing for security

Page 41

Identifying and Building Infrastructure

Page 42

Lori’s Journey

Science/Math love: Started in chemistry at liberal arts college.

Field Trip and first cs course -> CS major.

Advisor’s strong push for grad school -> U Pitt.

Took compilers course from Mary Lou -> PhD in compiler optimization.

Big year: 10/85-married Mark. 1/86-started at Rice. 4/86-PhD

Family: The yankees returned north 3 years later!

University of Delaware: 15+ yrs. Visiting, Assistant, Associate, Full

Family: Lauren (HS senior), Lindsay (16 and driving), Matt (11)

Support: Mark, Mark, Mark,… Mary Lou, Errol, Sandee, CRA-W

Currently: software tools, testing, compiler optimization

Page 43

Identifying and Building Infrastructure for Analysis Research

What kinds of infrastructure do you need?

How to identify and build infrastructure

Examples

Page 44

What kinds of infrastructure do you need?

Analysis Framework Software

Hardware

Workloads

People

Lab space

Analysis Research and Evaluation

Page 45

Identifying Analysis Framework Software

Determine Goals
- Short term
- Long term

Specify Requirements
- Needed
- Desired (Prioritized)

Search for Possibilities
- Peers/Experts
- Technical papers
- Internet search

Try Them Out
- Install + Run Tests
- Read docs
- Examine code
- Try small task

Weigh Choices
- Meet Requirements?
- Ease of Use/Change?
- ...

Page 46

Example: Identifying Analysis Framework Software

Determine Goals
- Evaluate new analysis on Java
- On its own and in client tool

Specify Requirements
- Needed: call graph, cfg, chg
- Realistic environment/apps
- Easy to extend/build client tools

Search for Possibilities
- Common environment is IDE, Java. Eclipse platform

Try Them Out
- Install + explore
- Write a small plugin
- Use call graph, chg, cfg for small task

Weigh Choices
- Learning curve vs Available analyses, realism

Page 47

Implementing Your Analysis

Once you have decided on an infrastructure:
- Think Reuse!! Think modularity!!
- Think prototype, but extensible and scalable
- Test, test, test - try to be systematic
- Debug – not easy

Page 48

Example: Implementing My NL Analysis

Build small modular components -> reuse
- Analyzing method signatures to extract NL
- Building program representation for NL
- Traversing program rep
- Building program rep for IR

Design reps to avoid loss of info -> reuse
- Id’s and their roles and locations in code
- Verb, Direct object rep -> extensible

Page 49

Managing the Evolving Software Infrastructure

Managing change over time and people
- CVS, subversion

Tracking tasks, bugs, deadlines/goals
- TRAC, bugzilla, gforge

Maintaining documentation
- JavaDocs, Doxygen

Testing, testing, testing
- Unit, system, regression -- test suites

Sounds like software engineering…

Page 50

Selecting Appropriate Hardware

Determine Goals
- Short term
- Long term

Specify Requirements
- Needed
- Desired (Prioritized)

Search for Possibilities
- Peers/Experts
- System Staff

Weigh Choices
- Meet Requirements?
- Costs within budget?
- Need to ask for money?

Page 51

Gathering Good Workloads

Controlled Experiment

Synthesized Benchmarks

Case Studies

Representative

Kind of Evaluation Desired

Try to reduce threats to validity of experiments:
- varied/similar
- domain
- size
- complexity/form
- known and available to others

Page 52

Example: Gathering Good Workloads

Case Studies

Representative

Kind of Evaluation Desired

Try to reduce threats to validity of experiments:
- varied/similar
- domain
- size
- complexity/form
- known/available to others

Research Questions:
- How effective is our FindConceptTool versus other code search tools? (versus lexical search and IR) (precision and recall)
- How does the human effort compare?

Sourceforge:
- very large
- many cvs updates (active)
- varied in domain
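Precision and recall for a code search tool can be computed from two sets: what the tool returned and what a human judged relevant. The sketch below uses the standard definitions. The function name and the method names in the example are hypothetical placeholders, not data from the study.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved results, recall = hits / relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: the tool returns four methods, three are truly relevant.
returned_methods = {"addEntry", "removeEntry", "saveFile", "printLog"}
relevant_methods = {"removeEntry", "saveFile", "loadFile"}
precision, recall = precision_recall(returned_methods, relevant_methods)
print(precision)  # 0.5
```

Here two of the four returned methods are relevant (precision 0.5) and two of the three relevant methods were found (recall 2/3).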

Page 53: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Identifying Strong StudentsIdentifying Strong Students

Teach a compiler or program analysis course Teach a compiler or program analysis course regularlyregularlyIdentify students from the courseIdentify students from the courseIdeal = Ideal =

Creative + quick to understand analysis Creative + quick to understand analysis + good problem solver + good problem solver + hard working + hard working + good coder + good coder + good communicator + good writer + good communicator + good writer + show initiative and interest in analysis+ show initiative and interest in analysis

Some training will be required.Some training will be required.Start Small. Create a Pipeline.Start Small. Create a Pipeline.

Page 54: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Building a Working Lab Space

Needs:
- one workspace/computer/storage per grad student
- room for growth and undergrad researchers
- current technology: minimize old machines (maintenance?)
- lab printer
- lab library of research-oriented background books

Make it somewhere students want to work:
- posters/pictures/plants
- open and pleasant: microwave, fridge, coffeepot…?
- all needed resources/supplies easily available
- conference room for larger research meetings

Page 55: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Static Program Analysis: Evaluating Your Analysis

Page 56: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

A Typical Program Analysis Research Project

Step 1: Design your analysis
– Reason about safety
– Reason about complexity in terms of program size

Step 2: Implement your analysis
– Hard!
– Complex and difficult to test, debug and verify: a real problem

Step 3: EVALUATE!

Page 57: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Evaluation of a Compiler Analysis

Strict requirements for the analysis
– Safety is crucial!
    An unsafe analysis may miss an execution path, and the resulting optimization may then change the behavior of the original program
– Analysis time (and space)
    Constrained by normal compilation time

Objective success criteria
– Show improvement in execution time
– Show reduction in memory footprint

Page 58: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Evaluation of a Compiler Analysis

Established benchmarks
– E.g., the SPEC JVM98
    General evaluation of Java compilers
– E.g., the DaCapo benchmark suite
    Memory intensive Java applications

Ideally you would say something like this:

“our analysis increases compilation time by at most 10%, and results in speed-up of 10-16% on the SPEC JVM98 benchmarks”.
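The percentages in such a claim are simple ratios of measured times. A minimal sketch, assuming that speed-up is reported as the relative reduction in execution time (the slides do not say which definition is meant); the class name, method names, and timings are invented for illustration and are not measured results:

```java
// Hypothetical helper for turning raw timings into the percentages quoted in a paper.
final class BenchmarkReport {
    // Relative increase in compilation time, in percent.
    static double compileOverheadPercent(double baselineCompile, double withAnalysisCompile) {
        return 100.0 * (withAnalysisCompile - baselineCompile) / baselineCompile;
    }

    // Relative reduction in execution time, in percent (one common way to state speed-up).
    static double speedupPercent(double baselineRun, double optimizedRun) {
        return 100.0 * (baselineRun - optimizedRun) / baselineRun;
    }

    public static void main(String[] args) {
        // Invented timings in seconds.
        System.out.printf("compile overhead: %.1f%%%n", compileOverheadPercent(20.0, 22.0));
        System.out.printf("speed-up: %.1f%%%n", speedupPercent(50.0, 44.0));
    }
}
```

Whichever definition is used, it should be stated in the paper and applied to every benchmark in the suite.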

Page 59: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Evaluation of an Analysis for a Software Tool

Requirements for the analysis - not so strict
– Relaxing safety is OK!
– Analysis time (space) is not so crucial
    Developers would definitely wait if the analysis finds “difficult” bugs such as data races and memory leaks

Success criteria - not so objective
– Precision: low noise
– Practicality: practical time/space requirements, works on 100K LOC
– Usability of tool
– Bugs found: absolutely sure

Page 60: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Evaluation of an Analysis for a Software Tool

Precision is CRUCIAL: noise is really bad!
– E.g., there are 10 buffer overflow bugs in program P
– Safe analysis A issues 1000 warnings, 10 are real and 990 are false positives
– Unsafe analysis B issues 13 warnings, 8 are real and 5 are false positives
– Analysis B is much more useful than analysis A!

Absolute precision: done more and more often
– Choose a subset of analyzed programs
– Manually find the real solution
– Compare with analysis solution
    Precision: how much noise is there?
    Recall (if the analysis is unsafe): how much did it miss?
– E.g., a.m(): The real solution a: {B}, a safe analysis solution a: {A,B,C}. Only 1 of the 3 reported classes is real: 67% noise!
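The numbers in these examples can be checked with a few lines of code. A minimal sketch, assuming the usual definitions (precision = real reports / all reports, recall = real reports / all real items); the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper that reproduces the arithmetic of the slide's examples.
final class WarningMetrics {
    // Fraction of reported warnings that are real.
    static double precision(int realReported, int reported) {
        return (double) realReported / reported;
    }

    // Fraction of the real bugs that were reported.
    static double recall(int realReported, int actual) {
        return (double) realReported / actual;
    }

    // Set-based precision for solutions such as points-to or receiver-class sets.
    static <T> double setPrecision(Set<T> real, Set<T> reported) {
        Set<T> hits = new HashSet<T>(reported);
        hits.retainAll(real); // keep only the reported items that are real
        return (double) hits.size() / reported.size();
    }

    public static void main(String[] args) {
        // Program P has 10 real buffer overflow bugs.
        System.out.printf("A: precision %.3f, recall %.2f%n", precision(10, 1000), recall(10, 10));
        System.out.printf("B: precision %.3f, recall %.2f%n", precision(8, 13), recall(8, 10));

        // a.m(): real solution {B}, safe analysis solution {A, B, C}.
        Set<String> real = new HashSet<String>(Arrays.asList("B"));
        Set<String> reported = new HashSet<String>(Arrays.asList("A", "B", "C"));
        System.out.printf("a.m(): precision %.2f%n", setPrecision(real, reported));
    }
}
```

Analysis A has perfect recall but 1% precision, while analysis B gives up two bugs in exchange for a warning list a developer can actually read.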

Page 61: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Evaluation of an Analysis for a Software Tool

Finding a benchmark set
– Depends on analysis application
– Large programs
– Diverse programs, as many as is feasible
– Publicly available: sourceforge.org
– Look at benchmark suites in published work!

Ideally, you will have a large set of diverse programs, and will show acceptable absolute precision (low false positive rate) and practical cost

Page 62: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Comparison with Existing Analysis

Well-known program analysis problems
– “Haven’t we solved that problem yet?”
– E.g., Points-to analysis

Design a new analysis A

Compare with best known analysis B
– Show improvement in one or more of: analysis cost, analysis precision

Page 63: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

What Not to Do

Propose a new analysis without any evaluation
– E.g., “We describe this new great points-to analysis.”

Design your own metric, different from established metrics
– E.g., “We propose a novel points-to analysis A and points-to analysis A’ which improves on A. Therefore, both A and A’ are great.”

Use non-standard benchmark
– Report on a subset: the ones for which the analysis works

Page 64: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Questions

Page 65: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.
Page 66: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

An Example: Devirtualization in Object-oriented Programs

Polymorphism and dynamic dispatch
    class A { void m() { … } }
    class B extends A { void m() { … } }
    class C extends A { void m() { … } }
– Virtual call: a.m() is dispatched at run-time, based on the class of the receiver, A, B or C
– Powerful: enables modern software engineering
– But costly: 13% of time spent in virtual dispatch

Analysis: “only B objects ever flow to a”
Optimization: virtual call a.m() => direct call to B.m()
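The effect of the optimization can be sketched in plain Java. This is an illustration under stated assumptions, not what a real compiler emits: the method bodies are invented (the slide elides them), `m()` returns a `String` here so the chosen target is observable, and the static method `B.body()` is a hypothetical stand-in for a "direct call to B.m()", since Java source cannot express a non-virtual call to an overridable method.

```java
// Hypothetical demo; nested classes mirror the A/B/C hierarchy on the slide.
final class DevirtualizationDemo {
    static class A { String m() { return "A.m"; } }

    static class B extends A {
        @Override String m() { return body(); }
        // Static, so a call to it is bound at compile time with no dispatch.
        static String body() { return "B.m"; }
    }

    static class C extends A { @Override String m() { return "C.m"; } }

    // Virtual call: the target depends on the run-time class of the receiver.
    static String virtualCall(A a) {
        return a.m();
    }

    // What the call site becomes once the analysis proves "only B objects ever flow to a".
    static String directCall() {
        return B.body();
    }

    public static void main(String[] args) {
        System.out.println(virtualCall(new B()));
        System.out.println(virtualCall(new C()));
        System.out.println(directCall());
    }
}
```

The rewrite is only safe if the analysis result is safe: if a `C` object could reach the call site after all, `directCall()` would silently run the wrong method.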

Page 67: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Uses of Static Program Analysis

Software engineering tools
– Static debugging, verification, security
    Uncover difficult errors and security flaws
– Testing
    Evaluate and improve test suites
– Software understanding
    Calling structure
    Complex dependences
    Change impacts

Many (unexplored) areas of application

Page 68: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Static Debugging

Analyze the program and look for bugs
– Memory and pointer bugs: memory leaks, null pointer dereferences, double frees, buffer overflows, etc.
– Concurrency bugs: races, deadlocks
– Issue warnings

Microsoft:
– PREfix and PREfast tools in use since 2000
– Many new tools developed

IBM:
– Tools for static debugging of production J2EE
– Tools for security auditing of J2EE

Page 69: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Software Testing

Coverage-based testing
– Improve test quality with good “coverage”
– E.g., cover all possible receiver classes at virtual calls

Step 1: analyze the tested code
– What are all possible receiver classes at virtual calls?
    a.m(): Analysis: “only B objects ever flow to a”

Step 2: insert instrumentation

Step 3: run tests and report coverage
– What were the receiver classes actually observed while running the tests?

Compare: possible receiver classes (Step 1) vs. observed receiver classes (Step 3)
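The three steps can be sketched as a small program. This is a minimal illustration with hypothetical names: the set of possible receiver classes is passed in by hand where a real tool would take it from the static analysis, and the example assumes an analysis result of {B, C} so that the comparison has something to report.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical receiver-class coverage tracker for one virtual call site a.m().
final class ReceiverCoverage {
    static class A { void m() { } }
    static class B extends A { @Override void m() { } }
    static class C extends A { @Override void m() { } }

    // Filled in by the instrumentation while tests run.
    private final Set<String> observed = new TreeSet<String>();

    // Step 2: the instrumented call site records the run-time class, then makes the call.
    void callSite(A a) {
        observed.add(a.getClass().getSimpleName()); // inserted probe
        a.m();                                      // original virtual call
    }

    // Step 3: receiver classes the analysis considers possible but no test exercised.
    Set<String> uncovered(Set<String> possible) {
        Set<String> missing = new TreeSet<String>(possible);
        missing.removeAll(observed);
        return missing;
    }

    // Fraction of the possible receiver classes that the tests covered.
    double coverage(Set<String> possible) {
        Set<String> hit = new TreeSet<String>(possible);
        hit.retainAll(observed);
        return (double) hit.size() / possible.size();
    }

    public static void main(String[] args) {
        // Step 1 (assumed analysis result): B and C objects may flow to a.
        Set<String> possible = new TreeSet<String>(Arrays.asList("B", "C"));
        ReceiverCoverage site = new ReceiverCoverage();
        site.callSite(new B()); // the only "test" exercises a B receiver
        System.out.println("uncovered: " + site.uncovered(possible));
        System.out.println("coverage: " + site.coverage(possible));
    }
}
```

An imprecise analysis inflates the set of possible classes and makes full coverage unreachable, which is one reason precision matters for testing tools.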

Page 70: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Software Understanding

Navigate through calling structure:
[Diagram: call graph with nodes X.n() and B.m()]

Reason about (im)mutability
– Powerful, central to imperative programming
– Many real bugs are due to unintended mutability
    Q1: is a method A.m(…) side-effect free?
    Q2: can a private field in a class A be mutated by untrusted clients of A (i.e., classes that use A)?

Reason about other quality attributes
Find code related to a change, etc.
Reverse engineering
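Question Q2 is easiest to see on a concrete case. A minimal sketch with hypothetical classes `Leaky` and `Guarded` (not from the slides): both keep a private list, but `Leaky` hands out the list itself, so a client can mutate state that the `private` modifier appears to protect.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example of a private field that clients can (or cannot) mutate.
final class MutabilityDemo {
    static class Leaky {
        private final List<String> items = new ArrayList<String>();
        Leaky() { items.add("x"); }
        // Returns the private list itself: clients get a mutable alias.
        List<String> getItems() { return items; }
        int size() { return items.size(); }
    }

    static class Guarded {
        private final List<String> items = new ArrayList<String>();
        Guarded() { items.add("x"); }
        // Returns a read-only view: mutation attempts throw.
        List<String> getItems() { return Collections.unmodifiableList(items); }
        int size() { return items.size(); }
    }

    public static void main(String[] args) {
        Leaky leaky = new Leaky();
        leaky.getItems().clear(); // client mutates the private field
        System.out.println("Leaky size after client call: " + leaky.size());

        Guarded guarded = new Guarded();
        try {
            guarded.getItems().clear();
        } catch (UnsupportedOperationException e) {
            System.out.println("Guarded rejected the mutation");
        }
        System.out.println("Guarded size after client call: " + guarded.size());
    }
}
```

A static analysis answering Q2 would have to discover that the reference returned by `Leaky.getItems()` aliases the private field.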

Page 71: Getting Started in Program Analysis Research: Outline Background and useful skills –Ana Using and developing analysis –Mary Lou Identifying and building.

Program Representations

if (x<y) then z=1; else z=2;

Control Flow Graph
– Linear
– 3-address statements
– Flow of control
[Diagram: node "if x<y" with a T edge to "z = 1" and an F edge to "z = 2"]

Syntax Tree
– Tree
– Parse tree of the program
[Diagram: If-then-else with children Expr (x<y), Stmt (z=1), Stmt (z=2)]
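Both representations of the example statement can be built as small data structures. A minimal sketch with hypothetical class names; real infrastructures use much richer node types, and the node labels here are plain strings taken from the two diagrams.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature versions of the two program representations.
final class ProgramRepresentations {
    // Syntax tree node: a label plus ordered children.
    static class TreeNode {
        final String label;
        final List<TreeNode> children = new ArrayList<TreeNode>();
        TreeNode(String label, TreeNode... kids) {
            this.label = label;
            for (TreeNode kid : kids) {
                children.add(kid);
            }
        }
    }

    // Control flow graph node: one statement plus labelled successor edges.
    static class CfgNode {
        final String stmt;
        final List<String> edgeLabels = new ArrayList<String>();
        final List<CfgNode> successors = new ArrayList<CfgNode>();
        CfgNode(String stmt) { this.stmt = stmt; }
        void addEdge(String label, CfgNode target) {
            edgeLabels.add(label);
            successors.add(target);
        }
    }

    // Syntax tree for: if (x<y) then z=1; else z=2;
    static TreeNode buildSyntaxTree() {
        return new TreeNode("If-then-else",
                new TreeNode("Expr: x<y"),
                new TreeNode("Stmt: z=1"),
                new TreeNode("Stmt: z=2"));
    }

    // Control flow graph for the same statement: a branch with T and F edges.
    static CfgNode buildCfg() {
        CfgNode branch = new CfgNode("if x<y");
        branch.addEdge("T", new CfgNode("z = 1"));
        branch.addEdge("F", new CfgNode("z = 2"));
        return branch;
    }

    public static void main(String[] args) {
        TreeNode tree = buildSyntaxTree();
        System.out.println(tree.label + " has " + tree.children.size() + " children");
        CfgNode cfg = buildCfg();
        for (int i = 0; i < cfg.successors.size(); i++) {
            System.out.println(cfg.stmt + " --" + cfg.edgeLabels.get(i) + "--> " + cfg.successors.get(i).stmt);
        }
    }
}
```

The tree keeps the syntactic nesting of the source, while the graph keeps only statements and the possible transfers of control between them.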

