Infrastructure for Correctness Tools Jon Pincus Reliability Group (PPRC) Microsoft Research.

Post on 15-Jan-2016

Jon Pincus (Microsoft Research)

Outline

• Definitions

• Case study

• Key infrastructure challenges

• Opportunities

What is software?

• A software system is a collection
  • … of multiple executables …
• … which are decomposed into modules …
  • (… each possibly present in multiple exes …)
• … which are in turn made up of functions …
  • (… which interact – e.g., caller/callee …)
• … which exist in source and object form …
• … and change over time.

Some structure, please!

• First-class artifacts
  • E.g., executables, modules, classes, functions, variables, types, …
• Multiple views of each artifact
  • E.g., source code, parse trees, object code, coverage info, traces, performance data, specifications, …
  • Internally, there may be view-specific structure
    • (e.g., source line)
• Relations between different views and artifacts
  • Hierarchy, versioning, dependency, …
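
One way to picture the three concepts on this slide is as plain C data structures. This is only a sketch: the enum members come from the examples on the slide, but the type names, the `location` field, and the two helper functions are hypothetical and not taken from any PREfix code.

```c
#include <stddef.h>

/* Kinds of first-class artifacts, taken from the slide's examples. */
typedef enum {
    ART_EXECUTABLE, ART_MODULE, ART_CLASS, ART_FUNCTION, ART_VARIABLE, ART_TYPE
} ArtifactKind;

/* Kinds of views an artifact can have (a subset of the slide's list). */
typedef enum {
    VIEW_SOURCE, VIEW_PARSE_TREE, VIEW_OBJECT_CODE, VIEW_COVERAGE, VIEW_TRACE
} ViewKind;

/* Kinds of relations: hierarchy, versioning, dependency. */
typedef enum { REL_CONTAINS, REL_NEXT_VERSION, REL_DEPENDS_ON } RelationKind;

typedef struct {
    ViewKind kind;
    const char *location;   /* hypothetical: where this view's data is stored */
} View;

typedef struct {
    ArtifactKind kind;
    const char *name;
    const View *views;      /* every view recorded for this artifact */
    size_t view_count;
} Artifact;

typedef struct {
    RelationKind kind;
    const Artifact *from;
    const Artifact *to;
} Relation;

/* Return the artifact's view of the requested kind, or NULL if it has none. */
const View *find_view(const Artifact *a, ViewKind kind)
{
    for (size_t i = 0; i < a->view_count; i++) {
        if (a->views[i].kind == kind)
            return &a->views[i];
    }
    return NULL;
}

/* Count the relations of one kind that start at the given artifact. */
size_t count_relations_from(const Relation *rels, size_t n,
                            const Artifact *from, RelationKind kind)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (rels[i].from == from && rels[i].kind == kind)
            count++;
    }
    return count;
}
```

Keeping views and relations as separate records, rather than as fields of the artifact, means a new view or relation kind can be added without touching the artifact type.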

Infrastructure = support for artifacts, views, and relations

• Storing
• Viewing
• Manipulating
  • Programmatically
  • Graphically
• Transforming between views
  • Parsing creates parse trees from source code
  • Decompilation goes from object code to source
• Verifying consistency of different views
  • E.g., checking transformations

Tools = Usable Solutions

• Tools are used to accomplish user goals
  • Organizational
    • “get higher-quality software to market”
    • “develop more efficiently”
    • “stop breaking the build”
  • Personal
    • “fix that killer bug”
    • “stop getting blamed for breaking the build”
    • “perform experiment X and write a PLDI paper”

Infrastructure != Tools

• Tools build on infrastructure
  • Thus, infrastructure is useful if it supports tools
    • (i.e., the kind of tools people want to build)
• Infrastructure can be extracted from tools
  • May be problematic if not designed as infrastructure
• Ideally, tools can be packaged into systems with common infrastructure …
  • … but that’s stage 2 (or 3 or N)

Sounds familiar?

• The rough model (and much of the terminology) is similar to that used in EDA [Electronic Design Automation]

• That’s no coincidence

(No, I’m not equating software and hardware design; there are definite differences. But I digress …)

Correctness Tools

• A “correctness tool” helps improve the correctness (reliability, robustness) of the code

• It must do more than just identify defects
  • Help people understand, prioritize, and fix defects

• Difficult to separate from general “program understanding”

Outline

• Definitions

• Case study

• Key infrastructure challenges

• Opportunities

PREfix

• Analyzes C/C++ source code

• Identifies defects

• GUI to aid understanding and prioritization
  • Viewing individual defects
  • Sorting/filtering sets of defects

• Integrates smoothly into existing builds

• Stores results in database

PREfix 3.X Architecture

[Architecture diagram; component labels: Source Code, C/C++ Parser, Simulator, Virtual Machine, Execution Control, AutoModeler, Error Analysis, Model Database, Defect Database, Web Browser]

Example source code shown in the diagram:

    #include <std.h>
    int PwrOf2(int a)
    {
        if (a & (a - 1))
            return 0;
        else
            return 1;
    }

Example model shown in the diagram:

    (mod “PwrOf2” (c “a” init) (t t1 (& a (-a 1))) (g t1<0:4> !0 (r 0 success)) (g t1<0:4> 0) (r 1 success)))

Web Based User Interface

PREfix’ views of the data

• Parse trees
• Defects [more generally, “messages”]
  • A distinguished location in the code
  • Properties of that location and the path that was traversed to get there
• Models
  • External behavior of function
  • Collection of paths through the function
• Source code
• Existing build structure
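
To make "a model is a collection of paths through the function" concrete, here is a small sketch for the `PwrOf2` example from the architecture slide. It assumes the model there can be read as two guarded outcomes: return 0 when `a & (a - 1)` is nonzero, return 1 when it is zero. That reading of the notation, and every type and function name below, is an assumption made for illustration, not PREfix's actual representation.

```c
#include <stddef.h>

/* Hypothetical: one path through a function, reduced to a guard on a
   tested value plus the value returned along that path. */
typedef struct {
    int guard_is_zero;   /* 1: path taken when the tested value is 0;
                            0: path taken when it is nonzero */
    int return_value;
} ModelPath;

/* Hypothetical: the external behavior of a function as a set of paths. */
typedef struct {
    const char *function_name;
    const ModelPath *paths;
    size_t path_count;
} FunctionModel;

static const ModelPath pwr_of_2_paths[] = {
    { 0, 0 },   /* a & (a - 1) is nonzero: return 0 */
    { 1, 1 },   /* a & (a - 1) is zero:    return 1 */
};

const FunctionModel pwr_of_2_model = { "PwrOf2", pwr_of_2_paths, 2 };

/* Predict the function's result from the model alone, without looking
   at its source. Returns -1 if no path's guard matches. */
int model_result(const FunctionModel *m, int tested_value)
{
    int is_zero = (tested_value == 0);
    for (size_t i = 0; i < m->path_count; i++) {
        if (m->paths[i].guard_is_zero == is_zero)
            return m->paths[i].return_value;
    }
    return -1;
}
```

A caller's analysis can then use `model_result(&pwr_of_2_model, a & (a - 1))` in place of re-analyzing the body of `PwrOf2`. Note that for `a = 0` both the source and this model yield 1.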

Key infrastructure operations

• Parsing

• Calculating function dependencies

• Generating and storing models

• Generating and storing defect information

• Viewing/sorting/filtering sets of defects

• Viewing paths through source code

• Build integration
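
As an illustration of the "viewing/sorting/filtering sets of defects" operation, the sketch below sorts and filters an in-memory array with the standard `qsort`. PREfix itself stores results in a database, so this only shows the operations, not how PREfix performs them. The `Defect` fields and the kind labels used in the example are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical defect record: a distinguished location plus properties. */
typedef struct {
    const char *file;
    int line;
    const char *kind;      /* hypothetical category label */
    int path_length;       /* hypothetical property of the path to the defect */
} Defect;

/* qsort comparator: order by file name, then by line number. */
int compare_by_location(const void *a, const void *b)
{
    const Defect *x = a;
    const Defect *y = b;
    int c = strcmp(x->file, y->file);
    if (c != 0)
        return c;
    return (x->line > y->line) - (x->line < y->line);
}

/* Filter in place, keeping only defects of the given kind and preserving
   their relative order. Returns the number of defects kept. */
size_t filter_by_kind(Defect *defects, size_t n, const char *kind)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(defects[i].kind, kind) == 0)
            defects[kept++] = defects[i];
    }
    return kept;
}
```

Typical use is `qsort(defects, n, sizeof defects[0], compare_by_location)` followed by `filter_by_kind(defects, n, "some-kind")`.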

PREfix as infrastructure consumer

• Part of PREfix (built and/or used in build)
  • EDG Parser
  • zlib [for compression]
  • Installer
  • Perl
  • ANTLR
  • STL
• Totals ~ 80%-90% of “code mass”
• Used (i.e., required) by PREfix
  • Database [SQL Server]
  • Web Server [IIS]
  • Web Browser [IE]

PREfix as (potential) infrastructure provider

• Various separable/reusable components, e.g.
  • Defect schema (and tools to operate on it)
  • Models
  • Viewing
  • Parser isolation module
  • Build integration
• ~ 60-75% of the remaining code mass
  • We would have preferred to reuse infrastructure for these – but it wasn’t possible/practical
  • Implies that total infrastructure could be 95-98%
• Not all are packaged as infrastructure yet
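
The arithmetic behind the "total infrastructure" figure can be checked by combining the two estimates: the share that is already third-party infrastructure (~80-90%, from the previous slide), plus the reusable fraction (~60-75%) of whatever remains. The function below is a hypothetical helper written for this check. It works in hundredths of a percent so the result stays an exact integer.

```c
#include <assert.h>

/* Total infrastructure share, in hundredths of a percent (9750 = 97.50%).
   existing_pct:         share of code mass that is existing infrastructure
   reusable_pct_of_rest: share of the REMAINING code mass that is reusable */
int total_infrastructure_share(int existing_pct, int reusable_pct_of_rest)
{
    assert(existing_pct >= 0 && existing_pct <= 100);
    assert(reusable_pct_of_rest >= 0 && reusable_pct_of_rest <= 100);
    return existing_pct * 100 + reusable_pct_of_rest * (100 - existing_pct);
}
```

The four corner cases give 92.00% (80, 60), 95.00% (80, 75), 96.00% (90, 60), and 97.50% (90, 75). The slide's 95-98% is therefore a rounded range that matches every corner except the most pessimistic one.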

Now what?

• PREfix is being used for heavy-duty, “overnight” analysis

• Developers want a desktop tool for quick feedback
  • Decent information, fast, is useful in ways that better (but slower) information isn’t
  • Technique: trade completeness for performance

• How to get there quickly?

Counterintuitively …

Actual analysis is only a small part of any program analysis tool …

… so even if analysis is reimplemented, leveraging infrastructure is a big win

PREfast

• Reuse:
  • AST Parsing
  • PREfix models
    • Use info generated in central runs for local analysis
  • Subset of UI
• Implement local defect detection
  • PREfix’ semi-exhaustive path walking and global analysis aren’t a good match for this use model

Some “infrastructure” benefits

• Initial prototype up and running quickly
  • And it demos well, too!
• Easy to test on large code bases
  • Side benefit: already getting value
• Decent usability, immediately
  • Leverage all PREfix’ ease-of-use work

• Major savings in documentation and testing

Outline

• Definitions

• Case study

• Key infrastructure challenges (or at least one person’s opinion thereof)

• Opportunities

Infrastructure must be high quality

• Including:
  • Robustness
  • Scalability
  • Usability
  • Clarity
  • Stability and maintenance over time
• It’s so easy to develop “okay” software …
  • Unless infrastructure is good, why bother using it?

Approaches to quality

• Invest appropriately
  • Practice software engineering
  • Don’t treat quality as an afterthought
  • Measure (and improve) quality attributes
• Reuse existing (C)OTS components, e.g.
  • Databases. They work. Use them.
  • XML (schemas, storage, viewing, …)
  • Parsers.

Infrastructure must be “real”

• For its target audience, it must:
  • Support the mainstream languages
    • (yes, even if they’re hard)
  • Handle “large enough” software
  • Be usable by the typical consumer (tool developer or user)
• There are definitely domain differences, e.g.
  • “Simple web stuff”: 100s or 1000s of lines of HTML, JScript, VBScript
  • Glue code/WebUIs: 1000s of lines of Perl, DHTML, VB
  • Hard-core software dev: millions+ of lines of C/C++
• Full generality is hard!

What languages are needed?
(Please assume appropriate trademark/copyright symbols)

• Perl
• C [K&R, ANSI], C++ [Cfront, GCC extensions, MSVC extensions], Java, C#
• VB, TCL
• ECMAScript [JScript, JavaScript], VBScript, Python
• HTML [HTML3.2, Netscape/MS variants], DHTML
• SQL
• XML, XSL, XSL-T, XML Schemas
• FORTRAN, COBOL for legacy code
• Make, sh, InstallShield, IDL, Excel macros, …

Approaches to “reality”

• Even if the technology is general, consider an initial narrow focus
  • Look for underserved niches
• Understand your audience
  • Typical environment
  • Requirements
  • Goals
  • Constraints

• Test on real-world code bases

Infrastructure must not require change

• It’s very difficult to get people to
  • Change their processes
  • Change language variants (let alone languages)
  • Modify their existing code in any way
• Infrastructure alone can’t offer enough value to force a change
  • E.g., if they don’t write specs now, infrastructure won’t get them to start

Approaches to not requiring change

• Enable change, but do not require it
  • Example: do something useful even without programmer annotations; do more if annotations exist
• Understand existing processes and make integration seamless
  • A narrow focus helps here

• Don’t overestimate your influence
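
The "useful without annotations, more with them" idea from the first bullet can be sketched as a single decision function. Everything here is hypothetical, including the `ANNOT_NOT_NULL` annotation and the notion that the analysis knows whether the callee dereferences its parameter. It only illustrates the policy and is not taken from PREfix or PREfast.

```c
/* Hypothetical annotation a programmer may (or may not) have written
   on a pointer parameter. */
typedef enum { ANNOT_NONE, ANNOT_NOT_NULL } ParamAnnotation;

/* What the analysis knows about the argument passed at a call site. */
typedef enum { ARG_NOT_NULL, ARG_MAYBE_NULL, ARG_DEFINITELY_NULL } ArgState;

/* Decide whether to report a warning at a call site.
   callee_derefs: nonzero if analysis of the callee shows that it
                  dereferences the parameter.
   Without an annotation, report only the case the analysis can fully
   justify on its own. With an annotation, the stated contract lets the
   tool also report arguments that are merely possibly null. */
int should_warn(ArgState arg, int callee_derefs, ParamAnnotation annotation)
{
    if (annotation == ANNOT_NOT_NULL)
        return arg != ARG_NOT_NULL;
    return arg == ARG_DEFINITELY_NULL && callee_derefs;
}
```

Unannotated code still gets the high-confidence warnings, so nobody has to change anything to see value. Adding an annotation simply unlocks more checks.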

Infrastructure must enable useful tools

• Infrastructure has direct value to tool-builders

• Infrastructure has indirect value to end users
  • It’s the tools enabled by the infrastructure that have the direct value

Approaches to enabling tools

• Develop infrastructure and tool simultaneously
  • Caveat: it’s a lot more work …
• Develop infrastructure jointly with a tool developer
  • Caveat: requires joint work; goals may differ
• Re-implement infrastructure for existing tools
  • Caveat: must show a major benefit

Outline

• Definitions

• Case study

• Key infrastructure challenges

• Opportunities

Opportunities for infrastructure

• Anything to do with user interaction
  • Visualization of relations

• Presenting data from multiple views

• Abstracting information

• Properties of program artifacts

• Paths and information about paths

• A general object model (too ambitious?)

Questions?
