July 2001 OASIS --- Santa Fe

Dependence Graphs for Information Assurance

Paul Anderson

GrammaTech, Inc.

Ithaca, NY

http://www.grammatech.com

Tim Teitelbaum

(Cornell)


Situation

• Front door open

– Eligible Receiver: 65% of attacks successful; 63% undetected [Tenet; Minihan]

• Back door (installed and) open

– Back Orifice, etc. demonstrate the potential of automated intrusion maintenance

• Blinds are up

– Open sources expose critical software and all its flaws

• Foundation is rotten

– Buggy software is the norm [Weise; Engler]

– A dozen new buffer-overrun attacks every week [Epstein]

• The domestic is foreign

– 195,000 H1B visas issued per year


DISCEX II Keynotes

• DoD Perspective

– “Nation states are our greatest concern”

LTG Edward G. Anderson III, Deputy Commander in Chief, United States Space Command

• Characteristics

– Planning

– Long term view

– Strategic investment

– Stealth

• Questions for OASIS

– What is your model of a nation-state attack?

– How do you address it?


DISCEX II Keynotes

• Industry Perspective

– “DARPA should focus on tools for building safer code”

– “More emphasis needed on correct implementation, especially for security products!”

Jeremy Epstein, Director, Product Security, webMethods, Inc.



Questions

• Front door open

– What have nation states been doing to us while we have been so exposed?

• Back door (installed and) open

– What percentage of observed attacks are nation-state attacks?

• Blinds are up

– Are nation states investing heavily in vulnerability analysis of open source code?

• Foundation is rotten

– Are nation-state insider code-level attacks distinguishable from bugs?

• The domestic is foreign

– How many foreign agents program routers for Cisco?

– How does Cisco defend its products from its own malicious employees?

– Do you consider firmware part of the TCB? On what basis?


President’s Critical Infrastructure Protection Plan

Recommendations [page 61]

6.1 Critical Infrastructure Protection Research and Development

Intrusion Detection and Monitoring

. . .

Development of Advanced [...] Software Tools for Trap Door Analysis and Malicious Code Detection

This program is designed to develop advanced software tools and techniques that can detect and eliminate trap doors and other malicious code in software. Although detecting subtle but intentional alterations to computer code is problematic, these tools will increase the integrity of software products, and thereby reduce the probability of future penetrations and compromises of computers and networks.

DARPA

HARD


Role of Static Code Analysis in IA

• Assumptions

– Implementations are critical

– Tools for understanding code are strategic

• Attacks and Vulnerabilities

– Trap doors and exploitable bugs

• Approach

– Statically detect and eliminate

• Other applications of core-technology

– Policy enforcement by code rewriting

– Model extraction from code

• Scope

– Mission-critical and mass-market

– Open-source and closed-source

– Source and binary


Accomplishments [past 6 months]

• Core Static-Analysis Research

– Context-Sensitive GMOD / GREF Analysis

– Fine-grained discrimination by structure-fields

– Variable-based queries and function-based queries

– Non-structured control constructs, e.g., switch, break, continue, goto

• better precision

– Pointer Analysis

• better performance [about x6 faster]

– Interprocedurally-precise model checker (mu-calculus)

• Information Assurance Workbench

– Buffer overrun vulnerability detection and analysis

– Pattern matching on AST fragments


Analysis of Buffer Overrun Vulnerabilities

• Code Red attack

– Begins by exploiting a buffer-overrun vulnerability

• Static analysis

– Can detect potential buffer-overrun vulnerabilities



Analysis of Buffer Overrun Vulnerabilities, cont.

• Code Red attack

– Exploits an unchecked byte-string to wide-character-string conversion

– Assume the operation used was

mbstowcs(char *dst, char *src, int length)

– Can 2*length be bigger than size of dst?

• Dependence queries

– Reveal potential information-flows, e.g.,

• from data sources under user control (external strings)

• to dangerous operations (unchecked length arguments)
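
The question on this slide can be made concrete with a small C sketch. It assumes the standard C signature `size_t mbstowcs(wchar_t *dst, const char *src, size_t n)` rather than the simplified one shown above, and the helper name `checked_mbstowcs` and its `dst_bytes` parameter are hypothetical, introduced only for illustration. The helper rejects any request whose length argument could write past the destination, which is the role a bounds-check plays in the analysis.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from the slides): convert at most `length`
 * characters of `src` into `dst`, whose capacity is `dst_bytes` bytes.
 * Returns the number of wide characters written, or -1 if the request
 * could overrun `dst` or the input is not a valid multibyte string. */
long checked_mbstowcs(wchar_t *dst, size_t dst_bytes,
                      const char *src, size_t length)
{
    /* The bounds-check: `length` wide characters need
     * length * sizeof(wchar_t) bytes. Dividing instead of multiplying
     * keeps the check itself free of integer overflow. */
    if (length > dst_bytes / sizeof(wchar_t))
        return -1;

    size_t written = mbstowcs(dst, src, length);
    if (written == (size_t)-1)
        return -1;
    return (long)written;
}
```

With a four-element `wchar_t` buffer, a request for eight characters is refused, while a request for three goes through.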



Can users influence the length argument of a string-to-wide-character conversion?

[Figure labels]

• Sources of external strings

• Sources of internal strings

• Bounds-check

• Unchecked variable-length argument: mbstowcs(wide_buf, buf, strlength(buf));

• Unchecked fixed-length argument: mbstowcs(wide_buf, buf, SIZE);

• Checked variable-length argument: mbstowcs(wide_buf, buf, r);


Dependences from data sources to mbstowcs arguments

[Figure label: Bounds-check]


Chop from sources to targets shows all possible information flows

[Figure labels: sources; targets; bounds-check]


Good news: find all flows; Bad news: false positive (flow through bounds-check)

[Figure label: bounds-check]
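
One way to picture a chop is as the intersection of what is reachable forward from the source with what is reachable backward from the target. The C sketch below does this on a tiny hand-made graph. The node names and edges are hypothetical stand-ins for the figure, and plain reachability is a simplification of the dependence-graph traversal a real tool would perform. It also reproduces the false positive: the bounds-check node shows up in the chop because one flow passes through it.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical nodes: a user-controlled source, a bounds-check, two
 * length arguments, and an unrelated internal string source. */
enum { SRC, CHECK, LEN_CHECKED, LEN_UNCHECKED, OTHER, N_NODES };

/* edge[u][v] is true when v depends on u. */
static const bool edge[N_NODES][N_NODES] = {
    [SRC]   = { [CHECK] = true, [LEN_UNCHECKED] = true },
    [CHECK] = { [LEN_CHECKED] = true },
    [OTHER] = { [LEN_CHECKED] = true },
};

/* Mark every node reachable from `start`, following edges forward
 * (forward == true) or backward (forward == false). */
static void reach(int start, bool forward, bool seen[N_NODES])
{
    int stack[N_NODES], top = 0;   /* each node is pushed at most once */
    memset(seen, 0, N_NODES * sizeof seen[0]);
    seen[start] = true;
    stack[top++] = start;
    while (top > 0) {
        int u = stack[--top];
        for (int v = 0; v < N_NODES; v++) {
            bool linked = forward ? edge[u][v] : edge[v][u];
            if (linked && !seen[v]) {
                seen[v] = true;
                stack[top++] = v;
            }
        }
    }
}

/* chop(source, target): the nodes lying on some path from source to
 * target, i.e. forward-reachable AND backward-reachable. */
void chop(int source, int target, bool out[N_NODES])
{
    bool fwd[N_NODES], bwd[N_NODES];
    reach(source, true, fwd);
    reach(target, false, bwd);
    for (int v = 0; v < N_NODES; v++)
        out[v] = fwd[v] && bwd[v];
}
```

Chopping from `SRC` to `LEN_CHECKED` reports `SRC`, `CHECK` and `LEN_CHECKED`; the chop alone cannot say that this flow has been made safe by the check.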


Code Red source-code mock-up, showing chop-sources, chop-targets, and query-results.

[Figure labels: chop-sources; chop-targets; bounds-check]


• Model checking queries

– Can assert and check properties about flow paths

– Counter-examples: reveal possible vulnerabilities

• Sample (false) assertion

Every path from a user data source to the length argument of mbstowcs goes through a bounds-check

• Sample counter-example

– Path from data source to unchecked length argument of mbstowcs
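
For illustration only, the sample assertion can be approximated by a plain graph search: "every path from the source to the target goes through the bounds-check" fails exactly when the target is still reachable once the bounds-check node is excluded. The sketch below is not a mu-calculus model checker and ignores interprocedural precision; the graph and every name in it are hypothetical.

```c
#include <stdbool.h>

enum { USER_DATA, BOUNDS_CHECK, TMP, LENGTH_ARG, N_NODES };

/* flows[u][v]: data can flow from u to v (hypothetical example). */
static const bool flows[N_NODES][N_NODES] = {
    [USER_DATA]    = { [BOUNDS_CHECK] = true, [TMP] = true },
    [BOUNDS_CHECK] = { [LENGTH_ARG] = true },
    [TMP]          = { [LENGTH_ARG] = true },
};

/* Breadth-first search from `source` to `target` that never visits
 * `avoid`. On success, writes the path (source first) to `path` and
 * returns its length. Returns 0 when no such path exists, i.e. when
 * the assertion holds. */
int find_counter_example(int source, int target, int avoid,
                         int path[N_NODES])
{
    int parent[N_NODES], queue[N_NODES], head = 0, tail = 0;
    bool seen[N_NODES] = { false };

    if (source == avoid || target == avoid)
        return 0;
    seen[source] = true;
    parent[source] = -1;
    queue[tail++] = source;
    while (head < tail) {
        int u = queue[head++];
        if (u == target)
            break;
        for (int v = 0; v < N_NODES; v++) {
            if (flows[u][v] && v != avoid && !seen[v]) {
                seen[v] = true;
                parent[v] = u;
                queue[tail++] = v;
            }
        }
    }
    if (!seen[target])
        return 0;

    /* Walk parent links back from the target, then reverse in place. */
    int len = 0;
    for (int v = target; v != -1; v = parent[v])
        path[len++] = v;
    for (int i = 0; i < len / 2; i++) {
        int tmp = path[i];
        path[i] = path[len - 1 - i];
        path[len - 1 - i] = tmp;
    }
    return len;
}
```

On this graph the search returns the three-node path `USER_DATA`, `TMP`, `LENGTH_ARG`, which is the kind of counter-example the slide describes.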










Good news: counter-example avoids bounds-check

[Figure labels: bounds-check; No bounds-check]

Counter-example in query-results; Chop result in _______ background



• Constraint satisfaction [Wagner, et al.]

– Assert required constraints between destination buffer sizes and corresponding copy length arguments

– Report all cases where constraints are not satisfied

• Use of CodeSurfer [future work]

– Implement in industrial-strength framework

– Reduce false positives reported

• Context-sensitive constraint satisfaction

• Better pointer analysis

– Interactive tool for analysis of false positives
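
The constraint-satisfaction idea can be sketched with integer ranges: each destination gets a range of possible sizes, each copy a range of possible lengths, and a copy is reported unless its largest possible length fits in the smallest possible destination. This is only a toy rendering of that idea. The `Range` and `CopySite` types and the sample data are hypothetical, and inferring the ranges from code (the hard part) is assumed to have been done already.

```c
#include <stddef.h>

/* Inclusive range of values a quantity may take at run time. */
typedef struct { long lo, hi; } Range;

/* One copy operation: how big the destination may be and how much
 * may be written into it. */
typedef struct {
    const char *where;   /* label used when reporting */
    Range dst_size;      /* possible destination sizes, in bytes */
    Range copy_len;      /* possible number of bytes written */
} CopySite;

/* The required constraint: the worst-case length must fit in the
 * smallest possible destination. */
int constraint_holds(const CopySite *s)
{
    return s->copy_len.hi <= s->dst_size.lo;
}

/* Stores the indices of all violating sites in `out` and returns how
 * many there are. `out` must have room for `n` entries. */
size_t report_violations(const CopySite *sites, size_t n, size_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (!constraint_holds(&sites[i]))
            out[count++] = i;
    return count;
}
```

A site whose length range is wider than the analysis can bound (for example, a user-controlled length) fails the check even if the program is in fact safe, which is one source of the false positives mentioned above.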


Context-Sensitive GMOD / GREF Analysis

• Accurate dependence analysis for reference parameters

– Previously, a major source of imprecision

– Now, context-sensitive analysis of non-local variable usage

– Substantial improvement

– Relevant for buffer-overrun analysis

• Example

– Instead of

mbstowcs(char *dst, char *src, int length)

consider

assign(char *dst, char *src)

{

*dst = *src;

}



Context-Sensitive GMOD / GREF Analysis

• When procedure P has formal parameter F of pointer type T*, the flow-insensitive, context-insensitive points-to set of F is the union of the points-to sets of all corresponding actual parameters in calls to P (plus any other pointers assigned to F in P)

assign(&a1, &a2); assign(&b1, &b2);

assign(char *dst, char *src)

{

*dst = *src;

}
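
For the two calls shown, the context-insensitive points-to sets work out to dst → {a1, b1} and src → {a2, b2}. The sketch below computes that union with bit sets. The encoding (one bit per variable, one `Call` record per call site) is a hypothetical illustration, not how any particular analyzer represents points-to information.

```c
/* One bit per program variable. */
enum { A1 = 1u << 0, A2 = 1u << 1, B1 = 1u << 2, B2 = 1u << 3 };

/* Points-to sets of the two actual parameters at one call to assign(). */
typedef struct { unsigned dst_actual, src_actual; } Call;

/* assign(&a1, &a2); assign(&b1, &b2); */
static const Call calls[] = { { A1, A2 }, { B1, B2 } };

/* Context-insensitive points-to set of formal `dst`: the union, over
 * all call sites, of what the matching actual may point to. */
unsigned points_to_dst(void)
{
    unsigned set = 0;
    for (unsigned i = 0; i < sizeof calls / sizeof calls[0]; i++)
        set |= calls[i].dst_actual;
    return set;
}

/* Same computation for formal `src`. */
unsigned points_to_src(void)
{
    unsigned set = 0;
    for (unsigned i = 0; i < sizeof calls / sizeof calls[0]; i++)
        set |= calls[i].src_actual;
    return set;
}
```

Because the sets are merged across call sites, the body of assign() alone cannot tell which call it is serving; *dst = *src appears to connect every variable in src's set to every variable in dst's set.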


Context-Sensitive GMOD / GREF Analysis, cont.

• Additional (hidden) actual and formal parameters are generated to represent the variables modified or referenced via formal F (as well as variables modified or referenced via global pointer variables)


assign(char *dst, char *src)

[hidden formal parameters: a2, b2 (referenced via src); a1, b1 (modified via dst)]

{

*dst = *src;

}



• The generated actual parameters are wired to the accessible defs and uses of the variables accessible via F.

• In general, there is more than one def for each actual-in, and more than one use for each actual-out.



[Figure: body { *dst = *src; } with caller nodes a2=…, b2=…, …=a1, …=b1]



• Previously, the wiring was based on the points-to sets of the corresponding formal parameters. Thus, the additional edges (in blue) were also wired.






• A backward slice from a use of variable b1 shows what influences its value.

• It is computed by following dependence edges backward.

• Only feasible paths are followed, i.e., edges shown in gold are not followed. Good!






• But the path to variable a2 would also be followed. Bad! Variable a2 has no influence on variable b1.
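
That a2 cannot influence b1 is easy to confirm by running the example. In the sketch below the wrapper `run` and its parameters are hypothetical, and assign() is given a `void` return type so that it compiles as modern C.

```c
/* The procedure from the slides, with an explicit return type. */
static void assign(char *dst, char *src)
{
    *dst = *src;
}

/* Runs both calls from the slides with the given initial values of a2
 * and b2, and returns the final value of b1. */
char run(char a2, char b2)
{
    char a1 = 0, b1 = 0;
    assign(&a1, &a2);
    assign(&b1, &b2);
    (void)a1;   /* a1 is written but not needed for this experiment */
    return b1;
}
```

Changing the first argument of `run` never changes its result, so any slice of b1 that includes the definition of a2 is reporting a spurious dependence.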






• … and other spurious paths would also be followed. Very bad!






• This bad behavior has now been eliminated.

• We now distinguish between variables accessible because of actual parameters and variables accessible because of globals.
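
The effect of the wiring on a backward slice can be sketched on a hand-built graph for the assign() example. Everything here is a hypothetical model: the node names, the `blue` flag marking edges that only the points-to-based wiring creates, and the single-level call context (0 means "in the caller", 1 or 2 means "inside assign() entered from that call site"), which is enough for this example but is a simplification of feasible-path slicing in general.

```c
#include <stdbool.h>
#include <string.h>

/* Defs and uses in the caller, hidden formals and the body of assign(). */
enum { DEF_A2, DEF_B2, FI_A2, FI_B2, STMT, FO_A1, FO_B1, USE_A1, USE_B1,
       N_NODES };

enum Kind { INTRA, PARAM_IN, PARAM_OUT };

typedef struct {
    int from, to;
    enum Kind kind;
    int site;      /* call site (1 or 2) for parameter edges, else 0 */
    bool blue;     /* true for edges only the old wiring creates */
} Edge;

static const Edge edges[] = {
    /* wiring that matches the actual calls */
    { DEF_A2, FI_A2, PARAM_IN, 1, false },
    { DEF_B2, FI_B2, PARAM_IN, 2, false },
    { FO_A1, USE_A1, PARAM_OUT, 1, false },
    { FO_B1, USE_B1, PARAM_OUT, 2, false },
    /* extra wiring derived from the merged points-to sets */
    { DEF_A2, FI_A2, PARAM_IN, 2, true },
    { DEF_B2, FI_B2, PARAM_IN, 1, true },
    { FO_A1, USE_A1, PARAM_OUT, 2, true },
    { FO_B1, USE_B1, PARAM_OUT, 1, true },
    /* body of assign(): *dst = *src */
    { FI_A2, STMT, INTRA, 0, false },
    { FI_B2, STMT, INTRA, 0, false },
    { STMT, FO_A1, INTRA, 0, false },
    { STMT, FO_B1, INTRA, 0, false },
};

typedef struct { int node, ctx; } Item;

/* Backward slice from `start` (a caller node). Entering the callee
 * through call site s records s; leaving it is only allowed through
 * the same site. `old_wiring` enables the blue edges. */
void backward_slice(int start, bool old_wiring, bool in_slice[N_NODES])
{
    bool seen[N_NODES][3] = { { false } };
    Item work[N_NODES * 3];   /* each (node, ctx) is pushed at most once */
    int top = 0;

    memset(in_slice, 0, N_NODES * sizeof in_slice[0]);
    seen[start][0] = true;
    work[top++] = (Item){ start, 0 };
    while (top > 0) {
        Item cur = work[--top];
        in_slice[cur.node] = true;
        for (size_t i = 0; i < sizeof edges / sizeof edges[0]; i++) {
            const Edge *e = &edges[i];
            int ctx = cur.ctx;
            if (e->to != cur.node || (e->blue && !old_wiring))
                continue;
            if (e->kind == PARAM_OUT) {
                if (ctx != 0) continue;             /* no nested calls here */
                ctx = e->site;                      /* step into the callee */
            } else if (e->kind == PARAM_IN) {
                if (ctx != 0 && ctx != e->site) continue;   /* infeasible */
                ctx = 0;                            /* back in the caller */
            }
            if (!seen[e->from][ctx]) {
                seen[e->from][ctx] = true;
                work[top++] = (Item){ e->from, ctx };
            }
        }
    }
}
```

In this model, slicing backward from `USE_B1` with the blue edges enabled reaches `DEF_A2` even though call sites are matched; without them only `DEF_B2` is reached.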






• Big win in precision

• Big win in time and space

• But a new time and space problem [example not shown]

– Conjecture: Greater precision makes previously identical sets (with shared representations) different (and therefore unshared)

Program    LOC      Size of SDG   Summary edge time   Build time   Forward slice time   Backward slice time
compress   1,937    -48%          0%                  -62%         -50%                 0%
cpp        4,079    -16%          -33%                -19%         -47%                 -29%
byacc      6,626    3%            -23%                -27%         -34%                 -25%
cadp       12,787   -21%          14%                 -39%         -32%                 -22%
flex       12,400   -6%           4%                  -28%         -33%                 -29%
ijpeg      28,177   -24%          -81%                6%           -71%                 -71%
go         29,246   -21%          -31%                -35%         -46%                 -53%
ntpd       61,068   -12%          -16%                -27%         -12%                 -7%


Discrimination by Structure Field

• Previously, all fields participated in every operation on any field

– e.g., predecessors of p->f were defs of every field of struct pointed to by p

• Now, there is an option to discriminate on structure fields

– e.g., predecessors of p->f are only defs of field f of structs pointed to by p

• But casts and unions must be taken into account

– For portable analysis, cannot use explicit offsets

– Two fields f1 and f2 in different struct types T1 and T2 have the same offsets if the field sequences leading up to f1 and f2 have pair-wise compatible types

• Explicit offsets could be used in the future for precise platform-dependent analysis

• New problem to be solved

– Unless calls to malloc are immediately cast to their intended types, field discrimination is lost
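
Two of the C facts behind this slide can be checked directly. In the sketch below the struct types `T1` and `T2`, their fields, and the function names are hypothetical. The first function writes one field and leaves another untouched, which is what field discrimination exploits. The second compares the offsets of fields whose preceding field sequences have pairwise compatible types; the equality is what mainstream compilers produce, but it should be read as an assumption about the platform, since the C standard only spells out a guarantee for common initial sequences of structs inside a union.

```c
#include <stddef.h>

typedef struct { int tag;  char *name;  double weight; } T1;
typedef struct { int kind; char *label; float ratio;   } T2;

/* Writes only field `weight`; a field-sensitive analysis need not
 * treat this as a definition of `tag`. Returns the unchanged tag. */
int tag_after_weight_update(T1 *p, double w)
{
    p->weight = w;
    return p->tag;
}

/* The leading fields (int, char *) of T1 and T2 have pairwise
 * compatible types, so `name` and `label` are expected to sit at the
 * same offset even though the struct types differ. */
int second_fields_share_offset(void)
{
    return offsetof(T1, name) == offsetof(T2, label);
}
```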



Transitions (Spin-off SBIR Research)

• Current SBIR Phase I projects to transition research to products

– Malicious Code Detection in Firmware (Air Force)

• CodeSurfer for x86; use to detect malicious code

– Model Checking of Hierarchical Graph Structures (DARPA / ITO)

• CodeSurfer model checking plug-in for QA

– Inlined Reference Monitors for Java Bytecode (NIST)

• Use of dependence-graph technology for insertion of efficient IRMs

– Model Checking of UML designs (Navy / Aegis)

• Model-checking to assure properties in UML Rose/RealTime designs

– Dependence Graphs for Dynamic Internet Technologies (NSF)

• CodeSurfer for Java; decision support for test coverage

– Static Analysis for AOP (DARPA / PCES)

• Aspect C (separate take from Gregor’s)

– New Technique for Efficient Compression of Information (BMDO) *

• BDD variant, potential for double-exponential decision tree compression

* [unrelated to DARPA research]


Transitions, cont.

• Recent Product Release

– CodeSurfer 1.5

• Open APIs to C program representation and analysis operations

• Paper

– Software Inspection using CodeSurfer, WISE’01 Workshop on Inspection in Software Engineering, July 23rd, 2001, Paris.


Workshop Topics

• Integration Opportunities

– Projects exploring code rewriting or reorganization, or developing vulnerability scanners

• Client of our open APIs to program representation and analyses

– Projects relying on a system model

• Potential to extract the model automatically from the code

• Validation

– Scalability

• Performance on benchmarks

– Vulnerability detection

• False positive rate w.r.t. “truth”, e.g., known buffer overrun attacks


Future Work

• Core Technology

– Pointer analysis

– Dependence analysis

• concurrency, asynchronous control, reused storage, types, array strides

– Model checker (model reduction and abstraction relaxation)

– Constraint satisfaction: sets and numeric ranges

– Summary information for libraries

– Rewriting support

– Performance

• Information Assurance Workbench

– Scanners for buffer overruns, race conditions, covert channels