Getting Started in Program Analysis Research: Outline
Background and useful skills - Ana
Using and developing analysis - Mary Lou
Identifying and building infrastructure - Lori
Evaluating your analysis - Ana
Ana Milanova
I am from Bulgaria
National High School for Math and Science
American University in Bulgaria, 1997
- I have a degree in Business Administration
Rutgers University, PhD in CS, 2003
Now Assistant Professor at RPI
Research: program analysis for software tools
Family
- Husband Tony
- Katarina, 5 and Petar, 2
Program Analysis: Useful Background and Skills
Program Analysis
Static program analysis
- Analyzes the source code of the program
- Infers run-time behavior properties without running the program
  E.g., "The object values that flow to reference variable x are only of classes A and B, but not C."
Static analyses are conservative: they consider all possible run-time behaviors of the program
Program Analysis
Dynamic program analysis
- Analyzes a set of program executions
- Reasons about run-time behavior properties over observed executions
  E.g., "The object values that flowed to reference variable x during observed executions were only of classes A and B, but not C."
Dynamic analyses are incomplete: they consider only behaviors over particular executions
Goal: combine with static analysis
Uses of Static Program Analysis
Compilers - traditional application domain
- Enables optimizing transformations
Software engineering tools
- Static debugging, verification, security
  Uncover difficult errors and security flaws
- Testing
  Evaluate and improve test suites
- Software understanding
  Calling structure
  Complex dependences
  Change impacts
Uses of Program Analysis
Analysis for compiler optimization is different from analysis for software tools
Different requirements, different success criteria (more later…)
Static Analysis Methodologies
Data-flow analysis
Constraint-based program analysis
Abstract interpretation
Type and effect systems
Model checking
Example: Data-flow Analysis
Flow facts
- Information that we are propagating
- E.g., set of definitions {(i,1), (i,4), (i,6), …}
Transfer functions
- The effect of a statement on the incoming flow facts
- E.g., statement i=11 at 6 "kills" the incoming definition (i,4), and "generates" definition (i,6)
[Figure: control-flow graph with nodes 1. i=11; read x,y, 2. if x<y, 3. p(i), 4. i=j+5, 5. p(i), 6. i=11, 7. i=i*i. Edges are annotated with sets of reaching definitions of i: {(i,1)}, {(i,4)}, {(i,6)}.]
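The "kill" and "generate" behavior of a transfer function can be sketched in a few lines of Python. This is only an illustration: the function name `transfer` and the encoding of a definition as a `(variable, node)` pair are assumptions made for this sketch, chosen to mirror the (i,4) and (i,6) notation above.

```python
def transfer(node, defined_var, incoming):
    """Reaching-definitions transfer function for an assignment at `node`.

    `incoming` is a set of (variable, node) pairs. The assignment kills every
    incoming definition of `defined_var` and generates (defined_var, node).
    """
    killed = {d for d in incoming if d[0] == defined_var}
    generated = {(defined_var, node)}
    return (incoming - killed) | generated


# Statement 6 (i=11) with the incoming definition (i,4), as in the example.
out_6 = transfer(6, "i", {("i", 4)})
print(out_6)  # {('i', 6)}
```

Definitions of other variables pass through unchanged, which is why the kill set is restricted to the variable being assigned.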
Theory
Data-flow frameworks
- Control-flow graph CFG
- Space of flow facts L
- Space of transfer functions F
- Certain properties of L and F allow a general solution procedure
Fixed-point iteration
- Termination: the iterative computation terminates
- Safety (correctness, soundness): the solution is conservative
  For most problems the analysis produces "noise"
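As a rough sketch of fixed-point iteration, the worklist solver below computes reaching definitions over a small CFG. The CFG here is hypothetical (a counter loop, not the graph from the slides), and the names `solve_reaching_definitions`, `succ`, and `assigns` are assumptions made for illustration.

```python
def solve_reaching_definitions(succ, assigns):
    """Iterate to a fixed point over a CFG.

    succ:    dict node -> list of successor nodes
    assigns: dict node -> variable assigned at that node, or None
    Returns  dict node -> set of (variable, node) definitions reaching its entry.
    """
    nodes = list(succ)
    pred = {n: [] for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)

    def out(n, in_set):
        # Transfer function: kill definitions of the assigned variable, add ours.
        var = assigns.get(n)
        if var is None:
            return set(in_set)
        return {d for d in in_set if d[0] != var} | {(var, n)}

    in_sets = {n: set() for n in nodes}
    worklist = list(nodes)
    while worklist:
        n = worklist.pop(0)
        new_in = set()
        for p in pred[n]:
            new_in |= out(p, in_sets[p])  # union: a definition may reach on any path
        if new_in != in_sets[n]:
            in_sets[n] = new_in
            worklist.extend(s for s in succ[n] if s not in worklist)
    return in_sets


# Hypothetical CFG: 1: i = 0; 2: loop test; 3: i = i + 1 (back to 2); 4: exit.
succ = {1: [2], 2: [3, 4], 3: [2], 4: []}
assigns = {1: "i", 2: None, 3: "i", 4: None}
reaching = solve_reaching_definitions(succ, assigns)
for n in sorted(reaching):
    print(n, sorted(reaching[n]))
```

The loop terminates because each set can only grow and there are finitely many definitions, which is the kind of property of L and F that the framework relies on.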
Theory and Practice
Analysis cost - how much time, memory
Analysis precision - how much noise
- a.m(): a more precise analysis reports a: {B}, and a less precise analysis reports a: {A,B,C}
Typically, there is a tradeoff between cost and precision!
In practice, we need to analyze very large programs, 100K LOC, even 1M LOC
Theory and Practice
Approximations - introduce noise
- make the CFG "smaller"
- make the set of flow facts "smaller"
- make the transfer functions converge faster
Approximations are necessary
- But be careful: different approximations for different analyses
Standard Approximations
Flow-sensitive vs. flow-insensitive
Statement      Flow-sensitive solution after the statement
x = true;      x: {true}
y = false;     x: {true}, y: {false}
x = y;         x: {false}, y: {false}
Flow-insensitive solution (one set per variable for the whole program): x: {true,false}, y: {false}
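The difference between the two solutions can be reproduced with a small sketch. The statement encoding and the function names `flow_sensitive` and `flow_insensitive` are assumptions made for this illustration; real analyses work over a CFG rather than a straight-line list.

```python
# Program from the slide: each statement is (target, source), where the
# source is either a boolean constant or the name of another variable.
program = [("x", True), ("y", False), ("x", "y")]


def flow_sensitive(stmts):
    """Keep one state per program point; an assignment overwrites the target."""
    state, states = {}, []
    for target, source in stmts:
        values = state[source] if isinstance(source, str) else {source}
        state = {**state, target: set(values)}
        states.append(state)
    return states


def flow_insensitive(stmts):
    """Ignore statement order: one set per variable, grown until nothing changes."""
    sol = {}
    changed = True
    while changed:
        changed = False
        for target, source in stmts:
            values = sol.get(source, set()) if isinstance(source, str) else {source}
            before = len(sol.setdefault(target, set()))
            sol[target] |= values
            changed |= len(sol[target]) != before
    return sol


print(flow_sensitive(program)[-1])
print(flow_insensitive(program))
```

The flow-insensitive version never discards a value, so x keeps both true and false; that is the noise paid for a cheaper analysis.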
Standard Approximations
Context-sensitive vs. context-insensitive
A(bool X) { this.f = X; }
a = new A(true);
b = new A(false);
Context-sensitive (each call analyzed separately): a.f = true, b.f = false
a.f: {true}, b.f: {false}
Context-insensitive (merged flow): a.f = true/false, b.f = true/false
a.f: {true,false}, b.f: {true,false}
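A minimal sketch of the same contrast, assuming the two constructor calls are the only call sites. The list `calls` and both function names are hypothetical and exist only for this illustration.

```python
# Call sites of the constructor A(bool X) { this.f = X; } from the slide:
# (variable receiving the new object, argument passed for X).
calls = [("a", True), ("b", False)]


def context_insensitive(call_sites):
    """Analyze the constructor once: X is the merge of all arguments."""
    merged_x = {arg for _, arg in call_sites}
    return {f"{var}.f": set(merged_x) for var, _ in call_sites}


def context_sensitive(call_sites):
    """Analyze the constructor separately for each call site."""
    return {f"{var}.f": {arg} for var, arg in call_sites}


print(context_sensitive(calls))
print(context_insensitive(calls))
```

Merging the parameter across call sites is what makes both fields appear to hold both values.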
Useful Background and Skills
Higher-level undergraduate or graduate courses on:
- Programming Languages, Compilers, Algorithms, Logic, Software Engineering, Architecture
Analytical and programming skills
Step 1: Design a program analysis algorithm
- Understand your target language (e.g., Java, C++, C)
Step 2: Implement the analysis algorithm
- Understand the language(s) of the infrastructure
Step 3: Evaluate the analysis algorithm
Useful Resources
Books (my personal list)
- "Compilers: Principles, Techniques, and Tools" by Aho, Sethi, Ullman, Ch. 10
  An introduction to data-flow analysis
- "Principles of Program Analysis" by Nielson, Nielson, Hankin
  An excellent reference for advanced students
- "Model Checking" by Clarke, Grumberg, Peled
Course material on the web
- Classes taught by professors
- My class (there are better ones, of course): www.cs.rpi.edu/~milanova/csci6961/lectures/
Using and Developing Program Analysis
Mary Lou Soffa
University of Virginia
About Mary Lou Soffa
Confused about what I wanted to be
Ph.D. programs:
- Mathematics; Sociology; Philosophy; Environmental Acoustics: disenchanted
- Found what I really loved - computer science
After 25+ years at Pitt, moved to UVA
- Small farm - grow "crops"; love my tractor
- Passion - increasing the participation of women and minorities in computer science
- Professional achievement - 24 Ph.D. students; ½ are women.
Program analysis
How to apply program analysis in your research
What are the questions, and what do you have to do?
Solve a problem
Program behavior: static or dynamic
What parts of program are involved
Determine information needed
Develop appropriate representation
Develop analysis
Develop algorithm
Have a goal - program code
Problem
- Improve performance
- Understand program
- Find errors
- Locate cause of errors
Need to collect information about the program that helps you infer properties of the program
Static or dynamic code
Determine information needed
What questions are you asking?
What do you need to gather to answer the questions?
- Examples:
  Statements needed to compute an expression
  Values that are always constant at a particular program point
  Locations of dead statements
  Branches that are correlated
Example: redundancy
Remove redundancies with the goal of improving performance
- Redundant expressions
- Redundant loads
- Redundant stores
- Dead code
Static: remove redundant expressions from the program representation
Redundant expressions
Does the value need to be computed for correct semantics?
X := A * B
F := C + E
C := C + 1
If (cond) then
  R := A * B; S := C + E
Else
  X := A * B; A := 6
End if
G := A * B
What parts of program involved
Given the information you need, what parts of the program are involved?
Examples:
- Branches and statements that change values in conditionals
- All possible execution paths
- Array definitions and uses
- Types
- Loops
Example: Redundant expressions
Expressions
Definitions
Control flow among definitions and expressions
Program paths
Program representation
Program representation that enables collection of information
Granularity
- Source, intermediate, binary
Issues: how to get one representation from another representation
Example: redundant expressions
Want to know how expressions flow
Is the value of an expression the same as when the expression is used again?
Need control flow graph with statements in nodes - intermediate level
X := A + B
Available Expressions: control flow graph
[Figure: control flow graph with nodes (X := A * B; F := C + E; C := C + 1), (R := A * B; S := C + E), (X := A * B; A := 6), and (G := A * B)]
Formulate analysis over representation
How to gather information from the representation
How many analyses
Direction of flow of analysis
Along all paths or any path
Local solution
Global solution
Example: Redundant expressions
Local - basic block - single entry/exit
- What expressions are generated
- What expressions are "killed" by a definition
Global - flow over flow graph
- Forward flow
- Must be true on all paths
Redundant Expressions: control flow graph
[Figure: control flow graph with nodes (X := A * B; F := C + E; C := C + 1), (R := A * B; S := C + E), (X := A * B; A := 6), and (G := A * B), annotated with sets of available expressions: {A * B}, {A * B}, {A * B, C + E}, {A * B}]
Develop analyses
Data flow equations - use data flow framework
Algorithm
Preciseness
Expense
Data flow equations
Gen(B) = all expressions computed in B (and not killed later in B)
Kill(B) = all expressions killed by definitions in B: a definition kills every incoming available expression that uses the defined variable
$\mathrm{Out}(B) = \mathrm{Gen}(B) \cup (\mathrm{In}(B) - \mathrm{Kill}(B))$
$\mathrm{In}(B) = \bigcap_{j \in \mathrm{pred}(B)} \mathrm{Out}(j)$
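The equations can be exercised on the running redundant-expressions example. The sketch below is an assumption-laden illustration: the block names B1 to B4, the edges (B1 branches to B2 and B3, which both join at B4), and placing C := C + 1 in the first block are one reading of the example, and the helper names are hypothetical.

```python
# Each block is a list of (target, expression); an expression is a tuple
# (operand, operator, operand), or None for a constant right-hand side.
blocks = {
    "B1": [("X", ("A", "*", "B")), ("F", ("C", "+", "E")), ("C", ("C", "+", "1"))],
    "B2": [("R", ("A", "*", "B")), ("S", ("C", "+", "E"))],
    "B3": [("X", ("A", "*", "B")), ("A", None)],
    "B4": [("G", ("A", "*", "B"))],
}
preds = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
all_exprs = {e for stmts in blocks.values() for _, e in stmts if e}


def gen_kill(stmts):
    """Local solution for one basic block."""
    gen, kill = set(), set()
    for target, expr in stmts:
        if expr is not None:
            gen.add(expr)
        # A definition of `target` kills every expression that uses it.
        dead = {e for e in all_exprs if target in (e[0], e[2])}
        gen -= dead
        kill |= dead
    return gen, kill


def available_expressions():
    """Global solution: forward flow, intersection over all predecessors."""
    gk = {b: gen_kill(s) for b, s in blocks.items()}
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set(all_exprs) for b in blocks}  # optimistic start for a must-analysis
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if preds[b]:
                in_sets[b] = set.intersection(*(out_sets[p] for p in preds[b]))
            else:
                in_sets[b] = set()  # nothing is available at program entry
            gen, kill = gk[b]
            new_out = gen | (in_sets[b] - kill)
            if new_out != out_sets[b]:
                out_sets[b] = new_out
                changed = True
    return in_sets, out_sets


in_sets, out_sets = available_expressions()
for b in blocks:
    print(b, "In:", sorted(in_sets[b]), "Out:", sorted(out_sets[b]))
```

Under these assumptions, A * B is available on entry to B2 and B3, so recomputing it there is redundant, while nothing is available on entry to B4 because A := 6 kills A * B on one path.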
Dynamic Optimization
Static optimizations
- Apply before execution
Dynamic optimizations
- Apply during execution - redundant expressions
Binary code
Program traces
Three-address code:
1. A = 4
2. T1 = A*B
3. L1: T2 = T1/C
4. if T2 < W go to L2
5. M = T1 * K
6. T3 = M + 1
7. L2: H = I
8. M = T3 - H
9. if T3 > 0 go to L3
10. go to L1
11. L3: halt
Basic blocks:
B1: statements 1-2
B2: statements 3-4
B3: statements 5-6
B4: statements 7-9
B5: statement 10
B6: statement 11
[Figure: control flow graph over basic blocks B1-B6]
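The split of the three-address code into basic blocks follows the usual "leader" rule: the first statement, every jump target, and every statement after a jump start a new block. The sketch below applies that rule to the listing; the tuple encoding and the name `basic_blocks` are assumptions made for illustration.

```python
# (label or None, instruction text, jump target label or None)
code = [
    (None, "A = 4", None),
    (None, "T1 = A*B", None),
    ("L1", "T2 = T1/C", None),
    (None, "if T2 < W go to L2", "L2"),
    (None, "M = T1 * K", None),
    (None, "T3 = M + 1", None),
    ("L2", "H = I", None),
    (None, "M = T3 - H", None),
    (None, "if T3 > 0 go to L3", "L3"),
    (None, "go to L1", "L1"),
    ("L3", "halt", None),
]


def basic_blocks(instrs):
    """Return basic blocks as lists of 1-based statement numbers."""
    label_index = {lab: i for i, (lab, _, _) in enumerate(instrs) if lab}
    leaders = {0}  # the first statement always starts a block
    for i, (_, _, target) in enumerate(instrs):
        if target is not None:
            leaders.add(label_index[target])  # a jump target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)  # so does the statement after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [list(range(s + 1, e + 1)) for s, e in zip(starts, ends)]


for number, block in enumerate(basic_blocks(code), start=1):
    print(f"B{number}: statements {block}")
```

On this input the rule yields six blocks, 1-2, 3-4, 5-6, 7-9, 10, and 11.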
Program Trace
A = 4
T1 = A*B
T2 = T1/C
If T2 !< W jump out
H = I
M = T3 - H
If T3 > 0 go to L3
T2 = T1/C
If T2 !< W jump out
M = T1 * K
T3 = M + 1
H = I
M = T3 - H
halt
Binary code
Dynamic optimization
Note:
- Single entry; multiple exits
- No loops
Need a representation: bring it up a level from binary code
Applying optimizations
Not as complicated
But, cannot tolerate much overhead
- Phases in static
- Developed algorithm that can apply multiple optimizations
- Demand driven
- Limit study of dynamic optimizations
Conclusion
Need analysis in many different applications
- Virtual execution environments
  Multicore
  Wireless sensor networks
- Testing
  Testing for wireless sensor networks
  Testing for security
Identifying and Building Infrastructure
Lori's Journey
Science/Math love: Started in chemistry at liberal arts college.
Field trip and first CS course -> CS major.
Advisor's strong push for grad school -> U Pitt.
Took compilers course from Mary Lou -> PhD in compiler optimization.
Big year: 10/85-married Mark. 1/86-started at Rice. 4/86-PhD
Family: The yankees returned north 3 years later!
University of Delaware: 15+ yrs. Visiting, Assistant, Associate, Full
Family: Lauren (HS senior), Lindsay (16 and driving), Matt (11)
Support: Mark, Mark, Mark,… Mary Lou, Errol, Sandee, CRA-W
Currently: software tools, testing, compiler optimization
Identifying and Building Infrastructure for Analysis Research
What kinds of infrastructure do you need?
How to identify and build infrastructure
Examples
What kinds of infrastructure do you need?
Analysis Framework Software
Hardware
Workloads
People
Lab space
Analysis Research and Evaluation
Identifying Analysis Framework Software
Determine Goals
Specify Requirements
- Needed
- Desired (Prioritized)
Search for Possibilities
- Peers/Experts
- Technical papers
- Internet search
Try Them Out
- Install + Run Tests
- Read docs
- Examine code
- Try small task
Weigh Choices
- Meet Requirements?
- Ease of Use/Change?...
- Short term
- Long term
Example: Identifying Analysis Framework Software
Determine Goals
- Evaluate new analysis on Java
- On its own and in client tool
Specify Requirements
- Needed: call graph, cfg, chg
- Realistic environment/apps
- Easy to extend/build client tools
Search for Possibilities
- Common environment is IDE, Java. Eclipse platform
Try Them Out
- Install + explore
- Write a small plugin
- Use call graph, chg, cfg for small task
Weigh Choices
- Learning curve vs. available analyses, realism
Implementing Your Analysis
Once you have decided on an infrastructure:
- Think reuse!! Think modularity!!
- Think prototype, but extensible and scalable
- Test, test, test - try to be systematic
- Debug - not easy
Example: Implementing My NL Analysis
Build small modular components -> reuse
- Analyzing method signatures to extract NL
- Building program representation for NL
- Traversing program rep
- Building program rep for IR
Design reps to avoid loss of info -> reuse
- Id's and their roles and locations in code
- Verb, Direct object rep -> extensible
Managing the Evolving Software Infrastructure
Managing change over time and people
- CVS, Subversion
Tracking tasks, bugs, deadlines/goals
- Trac, Bugzilla, GForge
Maintaining documentation
- Javadoc, Doxygen
Testing, testing, testing
- Unit, system, regression -- test suites
Sounds like software engineering…
Selecting Appropriate Hardware
Determine Goals
Specify Requirements
- Needed
- Desired (Prioritized)
Search for Possibilities
- Peers/Experts
- System Staff
Weigh Choices
- Meet Requirements?
- Costs within budget?
- Need to ask for money?
- Short term
- Long term
Gathering Good Workloads
Controlled Experiment
Synthesized Benchmarks
Case Studies
Representative
Kind of Evaluation Desired
Try to reduce threats to validity of experiments:
- varied/similar
- domain
- size
- complexity/form
- known and available to others
Example: Gathering Good Workloads
Case Studies
Representative
Kind of Evaluation Desired
Try to reduce threats to validity of experiments:
- varied/similar
- domain
- size
- complexity/form
- known/available to others
Research Questions:
- How effective is our FindConceptTool versus other code search tools? (versus lexical search and IR) (precision and recall)
- How does the human effort compare?
Sourceforge:
- very large
- many CVS updates (active)
- varied in domain
Identifying Strong Students
Teach a compiler or program analysis course regularly
Identify students from the course
Ideal =
  Creative + quick to understand analysis
  + good problem solver
  + hard working
  + good coder
  + good communicator + good writer
  + shows initiative and interest in analysis
Some training will be required.
Start small. Create a pipeline.
Building a Working Lab Space
Needs:
- one workspace/computer/storage per grad student
- room for growth and undergrad researchers
- current technology - minimize old machines - maintenance?
- lab printer
- lab library of research-oriented background books
Make it somewhere students want to work:
- posters/pictures/plants
- open and pleasant - microwave, fridge, coffeepot…?
- all needed resources/supplies easily available
- conference room for larger research meetings
Static Program Analysis: Evaluating Your Analysis
A Typical Program Analysis Research Project
Step 1: Design your analysis
- Reason about safety
- Reason about complexity in terms of program size
Step 2: Implement your analysis
- Hard!
- Complex and difficult to test, debug and verify - a real problem
Step 3: EVALUATE!
Evaluation of a Compiler Analysis
Strict requirements for the analysis
- Safety is crucial!
  An unsafe analysis may miss an execution path, and the resulting transformation may change the behavior of the original program
- Analysis time (and space)
  Constrained by normal compilation time
Objective success criteria
- Show improvement in execution time
- Show reduction in memory footprint
Evaluation of a Compiler Analysis
Established benchmarks
- E.g., the SPEC JVM98
  General evaluation of Java compilers
- E.g., the DaCapo benchmark suite
  Memory intensive Java applications
Ideally you would say something like this:
"Our analysis increases compilation time by at most 10%, and results in speed-up of 10-16% on the SPEC JVM98 benchmarks."
Evaluation of an Analysis for a Software Tool
Requirements for the analysis - not so strict
- Relaxing safety is OK!
- Analysis time (space) is not so crucial
  Developers would definitely wait if the analysis finds "difficult" bugs such as data races and memory leaks
Success criteria - not so objective
- Precision - low noise
- Practicality - practical time/space requirements, works on 100K LOC
- Usability of tool
- Bugs found - absolutely sure
Evaluation of an Analysis for a Software Tool
Precision is CRUCIAL - noise is really bad!
- E.g., there are 10 buffer overflow bugs in program P
- Safe analysis A issues 1000 warnings, 10 are real and 990 are false positives
- Unsafe analysis B issues 13 warnings, 8 are real and 5 are false positives
- Analysis B is much more useful than analysis A!
Absolute precision - done more and more often
- Choose a subset of analyzed programs
- Manually find the real solution
- Compare with analysis solution
  Precision - how much noise is there?
  Recall (if the analysis is unsafe) - how much did it miss?
- E.g., a.m(): The real solution a: {B}, a safe analysis solution a: {A,B,C}. Precision - 67% noise!
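The numbers in the buffer-overflow and a.m() examples can be checked with a short calculation. The function names `precision_recall` and `noise_ratio` are assumptions introduced for this sketch; they simply apply the standard definitions of precision (real warnings over all warnings) and recall (real warnings found over real bugs present).

```python
def precision_recall(warnings, real_warnings, real_bugs):
    """Precision: share of warnings that are real. Recall: share of bugs found."""
    return real_warnings / warnings, real_warnings / real_bugs


def noise_ratio(real_solution, reported_solution):
    """Share of the reported solution that is not in the real solution."""
    return len(reported_solution - real_solution) / len(reported_solution)


# Program P has 10 real buffer overflow bugs.
precision_a, recall_a = precision_recall(warnings=1000, real_warnings=10, real_bugs=10)
precision_b, recall_b = precision_recall(warnings=13, real_warnings=8, real_bugs=10)
print(f"A: precision {precision_a:.1%}, recall {recall_a:.0%}")
print(f"B: precision {precision_b:.1%}, recall {recall_b:.0%}")

# a.m(): real solution {B}, safe analysis solution {A, B, C}.
print(f"noise: {noise_ratio({'B'}, {'A', 'B', 'C'}):.0%}")
```

Analysis A finds every bug but only 1% of its warnings are real; analysis B misses two bugs but more than 60% of its warnings are real, which is why B is the more useful tool in practice.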
Evaluation of an Analysis for a Software Tool
Finding a benchmark set
- Depends on analysis application
- Large programs
- Diverse programs, as many as is feasible
- Publicly available: sourceforge.org
- Look at benchmark suites in published work!
Ideally, you will have a large set of diverse programs, and will show acceptable absolute precision (low false positive rate) and practical cost
Comparison with Existing Analysis
Well-known program analysis problems
- "Haven't we solved that problem yet?"
- E.g., points-to analysis
Design a new analysis A
Compare with best known analysis B
- Show improvement in one or more of: analysis cost, analysis precision
What Not to Do
Propose a new analysis without any evaluation
- E.g., "We describe this new great points-to analysis."
Design your own metric, different from established metrics
- E.g., "We propose a novel points-to analysis A and points-to analysis A' which improves on A. Therefore, both A and A' are great."
Use non-standard benchmark
- Report on a subset: the ones for which the analysis works
Questions
An Example: Devirtualization in Object-oriented Programs
Polymorphism and dynamic dispatch
class A { void m() { … } }
class B extends A { void m() { … } }
class C extends A { void m() { … } }
- Virtual call: a.m() is dispatched at run-time, based on the class of the receiver, A, B or C
- Powerful: enables modern software engineering
- But costly: 13% of time spent in virtual dispatch
Analysis: "only B objects ever flow to a"
Optimization: virtual call a.m() => direct call to B.m()
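The devirtualization decision can be sketched as follows: given the set of classes that an analysis reports for the receiver, the call can be made direct only if every class dispatches to the same method. The dictionaries and the names `dispatch` and `devirtualize` are assumptions made for this illustration, not part of any real compiler API.

```python
# Class hierarchy from the slide: B and C extend A, and each defines m().
parent = {"A": None, "B": "A", "C": "A"}
defines_m = {"A", "B", "C"}


def dispatch(cls):
    """Return the class whose m() runs for a receiver object of class `cls`."""
    while cls not in defines_m:
        cls = parent[cls]  # walk up to the nearest class that defines m()
    return cls


def devirtualize(receiver_classes):
    """Return the single direct-call target, or None if the call must stay virtual."""
    targets = {dispatch(c) for c in receiver_classes}
    return f"{targets.pop()}.m" if len(targets) == 1 else None


print(devirtualize({"B"}))            # B.m
print(devirtualize({"A", "B", "C"}))  # None
```

A more precise analysis (receiver set {B}) enables the optimization, while a less precise one ({A,B,C}) leaves the virtual call in place.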
Uses of Static Program Analysis
Software engineering tools
- Static debugging, verification, security
  Uncover difficult errors and security flaws
- Testing
  Evaluate and improve test suites
- Software understanding
  Calling structure
  Complex dependences
  Change impacts
Many (unexplored) areas of application
Static Debugging
Analyze the program and look for bugs
- Memory and pointer bugs: memory leaks, null pointer dereferences, double frees, buffer overflows, etc.
- Concurrency bugs: races, deadlocks
- Issue warnings
Microsoft:
- PREfix and PREfast tools in use since 2000
- Many new tools developed
IBM:
- Tools for static debugging of production J2EE
- Tools for security auditing of J2EE
Software Testing
Coverage-based testing
- Improve test quality with good "coverage"
- E.g., cover all possible receiver classes at virtual calls
Step 1: analyze the tested code
- What are all possible receiver classes at virtual calls?
  a.m(): Analysis: "only B objects ever flow to a"
Step 2: insert instrumentation
Step 3: run tests and report coverage
- What were the receiver classes actually observed while running the tests?
Compare the results of Step 1 and Step 3
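The comparison between Step 1 and Step 3 amounts to a set difference. In the sketch below, `receiver_coverage` is a hypothetical helper, and the second call uses made-up receiver sets purely to show what an uncovered class looks like.

```python
def receiver_coverage(possible, observed):
    """Compare statically possible receiver classes with those seen in tests.

    Returns (fraction of possible classes covered, classes never observed).
    """
    covered = possible & observed
    missing = possible - observed
    ratio = len(covered) / len(possible) if possible else 1.0
    return ratio, missing


# a.m() from the slide: the analysis reports only B, and the tests observed B.
print(receiver_coverage({"B"}, {"B"}))  # (1.0, set())

# Hypothetical call site where the analysis reports A, B and C but tests saw A and B.
ratio, missing = receiver_coverage({"A", "B", "C"}, {"A", "B"})
print(f"{ratio:.0%} covered, missing {sorted(missing)}")
```

A noisy analysis inflates the set of possible receivers, so some reported gaps may correspond to classes that can never actually reach the call.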
Software Understanding
Navigate through calling structure:
[Figure: calling structure involving X.n() and B.m()]
Reason about (im)mutability
- Powerful, central to imperative programming
- Many real bugs are due to unintended mutability
  Q1: is a method A.m(…) side-effect free?
  Q2: can a private field in a class A be mutated by untrusted clients of A (i.e., classes that use A)?
Reason about other quality attributes
Find code related to a change, etc.
Reverse engineering
Program Representations
if (x<y) then z=1; else z=2;
Control Flow Graph
- Linear
- 3-address statements
- Flow of control
[Figure: control flow graph with node "if x<y", a T edge to "z = 1" and an F edge to "z = 2"]
Syntax Tree
- Tree
- Parse tree of the program
[Figure: syntax tree with root If-then-else and children Expr (x<y), Stmt (z=1), Stmt (z=2)]
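Both representations can be illustrated in Python for the same if-then-else. The syntax tree below comes from the standard `ast` module applied to a Python rendering of the statement (an assumption, since the slide uses a different surface syntax; `ast.unparse` needs Python 3.9 or later), and the control flow graph is a hand-written dictionary rather than the output of any tool.

```python
import ast

# Python rendering of: if (x<y) then z=1; else z=2;
source = "if x < y:\n    z = 1\nelse:\n    z = 2\n"

# Syntax tree: an If node with a test expression and two statement lists.
tree = ast.parse(source)
if_node = tree.body[0]
print(type(if_node).__name__)            # If
print(ast.unparse(if_node.test))         # x < y
print(ast.unparse(if_node.body[0]))      # z = 1
print(ast.unparse(if_node.orelse[0]))    # z = 2

# Control flow graph: nodes are statements, labeled edges are flow of control.
cfg = {
    "if x<y": {"T": "z = 1", "F": "z = 2"},
    "z = 1": {},
    "z = 2": {},
}
for node, edges in cfg.items():
    for label, target in edges.items():
        print(f"{node} --{label}--> {target}")
```

The tree records how the statement is nested, while the graph records only which statement can execute after which, which is the view data-flow analysis needs.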