July 2001 OASIS --- Santa Fe 1
Dependence Graphs for Information Assurance
Paul Anderson
GrammaTech, Inc.
Ithaca, NY
http://www.grammatech.com
Tim Teitelbaum
(Cornell)
Situation
• Front door open
– Eligible Receiver: 65% of attacks successful; 63% undetected [Tenet; Minihan]
• Back door (installed and) open
– Back Orifice, etc. demonstrate the potential of automated intrusion maintenance
• Blinds are up
– Open sources expose critical software and all its flaws
• Foundation is rotten
– Buggy software is the norm [Weise; Engler]
– A dozen new buffer-overrun attacks every week [Epstein]
• The domestic is foreign
– 195,000 H1B visas issued per year
DISCEX II Keynotes
• DoD Perspective
– “Nation states are our greatest concern”
LTG Edward G. Anderson III
Deputy Commander in Chief
United States Space Command
• Characteristics
– Planning
– Long term view
– Strategic investment
– Stealth
• Questions for OASIS
– What is your model of a nation-state attack?
– How do you address it?
DISCEX II Keynotes
• Industry Perspective
– “DARPA should focus on tools for building safer code”
– “More emphasis needed on correct implementation, especially for security products!”
Jeremy Epstein
Director, Product Security
webMethods, Inc.
• DoD Perspective
– “Nation states are our greatest concern”
LTG Edward G. Anderson III
Deputy Commander in Chief
United States Space Command
Questions
• Front door open
– What have nation states been doing to us while we have been so exposed?
• Back door (installed and) open
– What percentage of observed attacks are nation-state attacks?
• Blinds are up
– Are nation states investing heavily in vulnerability analysis of open source code?
• Foundation is rotten
– Are nation-state insider code-level attacks distinguishable from bugs?
• The domestic is foreign
– How many foreign agents program routers for Cisco?
– How does Cisco defend its products from its own malicious employees?
– Do you consider firmware part of the TCB? On what basis?
President’s Critical Infrastructure Protection Plan
Recommendations [page 61]
6.1 Critical Infrastructure Protection Research and Development
Intrusion Detection and Monitoring
. . .
Development of Advanced [...] Software Tools for Trap Door Analysis and Malicious Code Detection
This program is designed to develop advanced software tools and techniques that can detect and eliminate trap doors and other malicious code in software. Although detecting subtle but intentional alterations to computer code is problematic, these tools will increase the integrity of software products, and thereby reduce the probability of future penetrations and compromises of computers and networks.
[Slide callouts: DARPA; HARD]
Role of Static Code Analysis in IA
• Assumptions
– Implementations are critical
– Tools for understanding code are strategic
• Attacks and Vulnerabilities
– Trap doors and exploitable bugs
• Approach
– Statically detect and eliminate
• Other applications of core-technology
– Policy enforcement by code rewriting
– Model extraction from code
• Scope
– Mission-critical and mass-market
– Open-source and closed-source
– Source and binary
Accomplishments [past 6 months]
• Core Static-Analysis Research
– Context-Sensitive GMOD / GREF Analysis
– Fine-grained discrimination by structure-fields
– Variable-based queries and function-based queries
– Non-structured control constructs, e.g., switch, break, continue, goto
• better precision
– Pointer Analysis
• better performance [about x6 faster]
– Interprocedurally-precise model checker (mu-calculus)
• Information Assurance Workbench
– Buffer overrun vulnerability detection and analysis
– Pattern matching on AST fragments
Analysis of Buffer Overrun Vulnerabilities
• Code Red attack
– Begins by exploiting a buffer-overrun vulnerability
• Static analysis
– Can detect potential buffer-overrun vulnerabilities
Analysis of Buffer Overrun Vulnerabilities, cont.
• Code Red attack
– Exploits an unchecked byte-string to wide-character-string conversion
– Assume the operation used was
mbstowcs(char *dst, char *src, int length)
– Can 2*length be bigger than size of dst?
• Dependence queries
– Reveal potential information-flows, e.g.,
• from data sources under user control (external strings)
• to dangerous operations (unchecked length arguments)
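The question on this slide (can the converted data be bigger than dst?) can be made concrete with a small C sketch. The slide assumes a simplified signature; the standard C library declares `size_t mbstowcs(wchar_t *dest, const char *src, size_t n)`, where `n` counts wide characters rather than bytes. The helper name `widen_checked` and its return convention are assumptions for illustration only; this is not code from Code Red or from CodeSurfer.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts src into dst, which holds dst_elems
 * wide characters. Returns 0 on success, -1 if the input might not fit
 * or contains an invalid multibyte sequence. */
int widen_checked(wchar_t *dst, size_t dst_elems, const char *src)
{
    /* Each wide character consumes at least one byte of src, so the
     * byte length is an upper bound on the number of wide characters. */
    size_t len = strlen(src);

    /* This is the bounds-check: reject input that could overrun dst
     * (one element is reserved for the terminator). */
    if (dst == NULL || dst_elems == 0 || len >= dst_elems)
        return -1;

    size_t n = mbstowcs(dst, src, dst_elems - 1);
    if (n == (size_t)-1)
        return -1;          /* invalid multibyte sequence */

    dst[n] = L'\0';         /* terminate explicitly */
    return 0;
}
```

A call such as `widen_checked(wide_buf, 4, "abcd")` is rejected, because four characters plus a terminator do not fit in four elements. The unchecked pattern on the slide passes a user-influenced length straight to the conversion.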
Analysis of Buffer Overrun Vulnerabilities, cont.
Can users influence the length argument of a string-to-wide-character conversion?
[Figure: dependence diagram. Data sources: sources of external strings; sources of internal strings. Intermediate node: bounds-check. Calls:]
– mbstowcs(wide_buf, buf, strlength(buf)); (unchecked variable-length argument)
– mbstowcs(wide_buf, buf, SIZE); (unchecked fixed-length argument)
– mbstowcs(wide_buf, buf, r); (checked variable-length argument)
Analysis of Buffer Overrun Vulnerabilities, cont.
Dependences from data sources to mbstowcs arguments
[Figure: dependence diagram of the external and internal string sources, the bounds-check, and the three mbstowcs calls with length arguments strlength(buf) (unchecked variable-length), SIZE (unchecked fixed-length), and r (checked variable-length)]
Analysis of Buffer Overrun Vulnerabilities, cont.
Chop from sources to targets shows all possible information flows
[Figure: dependence diagram with the string sources marked as chop sources, the three mbstowcs calls (length arguments strlength(buf), SIZE, and r) marked as chop targets, and the bounds-check between them]
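A chop can be sketched as the intersection of what is forward-reachable from the sources with what is backward-reachable from the targets. The C sketch below does this over a tiny hand-written dependence graph. The node names and the edges are assumptions loosely modeled on the diagram, and plain reachability stands in for the interprocedurally precise chop a real tool would compute.

```c
#include <stdbool.h>
#include <string.h>

#define N 6  /* number of nodes in the hypothetical graph */
enum { EXT_SRC, INT_SRC, BOUNDS_CHECK, CALL_STRLENGTH, CALL_SIZE, CALL_R };

/* edge[u][v] is true when v depends on u (assumed edges). */
static const bool edge[N][N] = {
    [EXT_SRC][CALL_STRLENGTH] = true,  /* external string reaches unchecked call */
    [EXT_SRC][BOUNDS_CHECK]   = true,  /* external string is also bounds-checked */
    [BOUNDS_CHECK][CALL_R]    = true,  /* checked length r reaches its call */
    [INT_SRC][CALL_SIZE]      = true,  /* internal string reaches fixed-length call */
};

/* Mark every node reachable from the seeds. With forward == true the
 * edges are followed u -> v, otherwise they are followed backward. */
static void reach(const bool seeds[N], bool forward, bool out[N])
{
    int stack[N], top = 0;
    memset(out, 0, N * sizeof out[0]);
    for (int i = 0; i < N; i++)
        if (seeds[i]) { out[i] = true; stack[top++] = i; }
    while (top > 0) {
        int u = stack[--top];
        for (int v = 0; v < N; v++) {
            bool e = forward ? edge[u][v] : edge[v][u];
            if (e && !out[v]) { out[v] = true; stack[top++] = v; }
        }
    }
}

/* Chop: nodes lying on some path from a source to a target. */
void chop(const bool sources[N], const bool targets[N], bool out[N])
{
    bool fwd[N], bwd[N];
    reach(sources, true, fwd);
    reach(targets, false, bwd);
    for (int i = 0; i < N; i++)
        out[i] = fwd[i] && bwd[i];
}
```

With `EXT_SRC` as the only source and the three calls as targets, the chop in this sketch contains the source, the bounds-check, and the calls taking `strlength(buf)` and `r`. It includes the flow through the bounds-check, which is the kind of false positive a chop alone cannot rule out.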
Analysis of Buffer Overrun Vulnerabilities, cont.
Good news: find all flows; Bad news: false positive (flow through bounds-check)
[Figure: dependence diagram of the external and internal string sources, the bounds-check, and the three mbstowcs calls with length arguments strlength(buf) (unchecked variable-length), SIZE (unchecked fixed-length), and r (checked variable-length)]
Analysis of Buffer Overrun Vulnerabilities, cont.
[Screenshot: Code Red source-code mock-up, showing chop-sources, chop-targets, and query-results; the bounds-check is labeled]
Analysis of Buffer Overrun Vulnerabilities, cont.
• Model checking queries
– Can assert and check properties about flow paths
– Counter-examples: reveal possible vulnerabilities
• Sample (false) assertion
– Every path from a user data source to the length argument of mbstowcs goes through a bounds-check
• Sample counter-example
– Path from data source to unchecked length argument of mbstowcs
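One way to picture the counter-example is as a path search that is forbidden from entering any bounds-check node: if such a path exists, the assertion is false and the path is the witness. The C sketch below uses breadth-first search over a small assumed graph. The node names, edges, and the function `unchecked_path` are hypothetical, and ordinary graph search stands in for the mu-calculus model checker mentioned earlier.

```c
#include <stdbool.h>

#define N 6  /* number of nodes in the hypothetical graph */
enum { EXT_SRC, INT_SRC, BOUNDS_CHECK, CALL_STRLENGTH, CALL_SIZE, CALL_R };

/* edge[u][v] is true when v depends on u (assumed edges). */
static const bool edge[N][N] = {
    [EXT_SRC][CALL_STRLENGTH] = true,
    [EXT_SRC][BOUNDS_CHECK]   = true,
    [BOUNDS_CHECK][CALL_R]    = true,
    [INT_SRC][CALL_SIZE]      = true,
};

/* Search for a path from src to dst that never enters a node marked in
 * is_check. On success the path is written to path[] (src first) and
 * its node count is returned; 0 means no such path exists. */
int unchecked_path(int src, int dst, const bool is_check[N], int path[N])
{
    int parent[N], queue[N], head = 0, tail = 0;
    for (int i = 0; i < N; i++)
        parent[i] = -1;
    if (is_check[src])
        return 0;
    parent[src] = src;
    queue[tail++] = src;

    while (head < tail) {
        int u = queue[head++];
        if (u == dst)
            break;
        for (int v = 0; v < N; v++)
            if (edge[u][v] && !is_check[v] && parent[v] < 0) {
                parent[v] = u;
                queue[tail++] = v;
            }
    }
    if (parent[dst] < 0)
        return 0;

    /* Walk the parent links back from dst, then reverse in place. */
    int len = 0;
    for (int v = dst; ; v = parent[v]) {
        path[len++] = v;
        if (v == src)
            break;
    }
    for (int i = 0, j = len - 1; i < j; i++, j--) {
        int t = path[i]; path[i] = path[j]; path[j] = t;
    }
    return len;
}
```

In this sketch the search from `EXT_SRC` to `CALL_STRLENGTH` succeeds with a two-node path, which refutes the sample assertion, while the search to `CALL_R` fails because every route passes through `BOUNDS_CHECK`.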
Analysis of Buffer Overrun Vulnerabilities, cont.
Good news: counter-example avoids bounds-check
[Figure: dependence diagram of the external and internal string sources, the bounds-check, and the three mbstowcs calls with length arguments strlength(buf) (unchecked variable-length), SIZE (unchecked fixed-length), and r (checked variable-length)]
Analysis of Buffer Overrun Vulnerabilities, cont.
[Screenshot: counter-example in query-results; chop result in _______ background; callout: No bounds-check]
Analysis of Buffer Overrun Vulnerabilities, cont.
• Constraint satisfaction [Wagner, et al.]
– Assert required constraints between destination buffer sizes and corresponding copy length arguments
– Report all cases where constraints are not satisfied
• Use of CodeSurfer [future work]
– Implement in industrial-strength framework
– Reduce false positives reported
• Context-sensitive constraint satisfaction
• Better pointer analysis
– Interactive tool for analysis of false positives
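The constraint idea can be illustrated with integer ranges: each copy site gets a range of possible destination sizes and a range of possible length arguments, and a site is reported when the largest possible length exceeds the smallest possible buffer. The C sketch below is a minimal illustration under that assumption. The `Range` and `CopyConstraint` types and the function names are hypothetical, and the sketch does not reproduce the constraint solver of Wagner et al. or any CodeSurfer API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Inclusive integer interval. */
typedef struct { long lo, hi; } Range;

/* One copy operation, as a hypothetical analysis might summarize it. */
typedef struct {
    const char *site;  /* label for the call site (illustrative only) */
    Range dst_size;    /* possible sizes of the destination buffer */
    Range copy_len;    /* possible values of the length argument */
} CopyConstraint;

/* The constraint holds only if the largest possible copy fits in the
 * smallest possible destination. */
bool constraint_holds(const CopyConstraint *c)
{
    return c->copy_len.hi <= c->dst_size.lo;
}

/* Write the indices of all unsatisfied constraints to out[] and return
 * how many were found. out must have room for n entries. */
size_t report_violations(const CopyConstraint *cs, size_t n, size_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (!constraint_holds(&cs[i]))
            out[k++] = i;
    return k;
}
```

Because the check is conservative, a site whose ranges are imprecise is reported even when no overrun is possible. That is one source of the false positives that context-sensitive analysis and better pointer analysis are meant to reduce.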
Context-Sensitive GMOD / GREF Analysis
• Accurate dependence analysis for reference parameters
– Previously, a major source of imprecision
– Now, context-sensitive analysis of non-local variable usage
– Substantial improvement
– Relevant for buffer-overrun analysis
• Example
– Instead of
mbstowcs(char *dst, char *src, int length)
consider
void assign(char *dst, char *src)
{
    *dst = *src;
}
Context-Sensitive GMOD / GREF Analysis
• When procedure P has formal parameter F of type *T, the flow-insensitive, context-insensitive points-to set of F is the union of the points-to sets of all corresponding actual parameters in calls to P (plus any other pointers assigned to F in P)
assign(&a1, &a2); assign(&b1, &b2);
void assign(char *dst, char *src)
{
    *dst = *src;
}
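The union described above can be sketched with bit sets, one bit per variable. The C sketch below merges the points-to sets of the actuals at each call to `assign` into the sets for the formals `dst` and `src`. The bit encoding, the `Call` and `FormalPointsTo` types, and `merge_calls` are assumptions for illustration, and pointers assigned to the formals inside the procedure are left out.

```c
#include <stddef.h>

/* Hypothetical variable ids, one bit each. */
enum { A1 = 1u << 0, A2 = 1u << 1, B1 = 1u << 2, B2 = 1u << 3 };

/* Points-to sets of the two actual parameters at one call to assign. */
typedef struct { unsigned dst_actual, src_actual; } Call;

/* Context-insensitive points-to sets of the formals dst and src. */
typedef struct { unsigned dst, src; } FormalPointsTo;

/* Union the actuals of every call into the formals. */
FormalPointsTo merge_calls(const Call *calls, size_t n)
{
    FormalPointsTo f = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        f.dst |= calls[i].dst_actual;
        f.src |= calls[i].src_actual;
    }
    return f;
}
```

For the two calls on the slide, `dst` ends up pointing to both a1 and b1 and `src` to both a2 and b2. The merged sets no longer record which call contributed which variable.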
Context-Sensitive GMOD / GREF Analysis, cont.
• Additional (hidden) actual and formal parameters are generated to represent the variables modified or referenced via formal F (as well as variables modified or referenced via global pointer variables)
assign(&a1, &a2); assign(&b1, &b2);
void assign(char *dst, char *src) [hidden formal-ins: a2, b2; hidden formal-outs: a1, b1]
{
    *dst = *src;
}
Context-Sensitive GMOD / GREF Analysis, cont.
• The generated actual parameters are wired to the accessible defs and uses of the variables accessible via F.
• In general, there is more than one def for each actual-in, and more than one use for each actual-out.
assign(&a1, &a2); assign(&b1, &b2);
void assign(char *dst, char *src) [hidden formal-ins: a2, b2; hidden formal-outs: a1, b1]
{
    *dst = *src;
}
[Defs wired to actual-ins: a2=…, b2=…; uses wired to actual-outs: …=a1, …=b1]
Context-Sensitive GMOD / GREF Analysis, cont.
• Previously, the wiring was based on the points-to sets of the corresponding formal parameters. Thus, the additional edges (in blue) were also wired.
[Figure: the calls assign(&a1, &a2) and assign(&b1, &b2) and the body of assign, with hidden parameters a2, b2 (in) and a1, b1 (out) wired to the defs a2=…, b2=… and the uses …=a1, …=b1]
Context-Sensitive GMOD / GREF Analysis, cont.
• A backward slice from a use of variable b1 shows what influences its value.
• It is computed by following dependence edges backward.
[Figure: the calls assign(&a1, &a2) and assign(&b1, &b2) and the body of assign, with hidden parameters a2, b2 (in) and a1, b1 (out) wired to the defs a2=…, b2=… and the uses …=a1, …=b1]
Context-Sensitive GMOD / GREF Analysis, cont.
• A backward slice from a use of variable b1 shows what influences its value.
• It is computed by following dependence edges backward.
• Only feasible paths are followed, i.e., edges shown in gold are not followed. Good!
[Figure: the calls assign(&a1, &a2) and assign(&b1, &b2) and the body of assign, with hidden parameters a2, b2 (in) and a1, b1 (out) wired to the defs a2=…, b2=… and the uses …=a1, …=b1]
Context-Sensitive GMOD / GREF Analysis, cont.
• But the path to variable a2 would also be followed. Bad! Variable a2 has no influence on variable b1.
[Figure: the calls assign(&a1, &a2) and assign(&b1, &b2) and the body of assign, with hidden parameters a2, b2 (in) and a1, b1 (out) wired to the defs a2=…, b2=… and the uses …=a1, …=b1]
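The spurious path can be reproduced with a minimal backward slice, computed as backward reachability over dependence edges. The C sketch below hand-codes a small graph in which the body of `assign` connects every hidden formal-in to every hidden formal-out, as the merged points-to sets imply. The node names and edges are assumptions, actual and formal parameter nodes are collapsed into one node per variable, and call-site matching (feasible paths) is not modeled.

```c
#include <stdbool.h>
#include <string.h>

#define N 8  /* number of nodes in the hypothetical graph */
enum { DEF_A2, DEF_B2, FIN_A2, FIN_B2, FOUT_A1, FOUT_B1, USE_A1, USE_B1 };

/* dep[u][v] is true when v depends on u (assumed edges). */
static const bool dep[N][N] = {
    /* defs in the callers feed the hidden formal-ins */
    [DEF_A2][FIN_A2] = true,  [DEF_B2][FIN_B2] = true,
    /* *dst = *src with merged points-to sets: every in feeds every out */
    [FIN_A2][FOUT_A1] = true, [FIN_A2][FOUT_B1] = true,
    [FIN_B2][FOUT_A1] = true, [FIN_B2][FOUT_B1] = true,
    /* hidden formal-outs feed the uses in the callers */
    [FOUT_A1][USE_A1] = true, [FOUT_B1][USE_B1] = true,
};

/* Mark every node that can reach the criterion along dependence edges. */
void backward_slice(int criterion, bool in_slice[N])
{
    int stack[N], top = 0;
    memset(in_slice, 0, N * sizeof in_slice[0]);
    in_slice[criterion] = true;
    stack[top++] = criterion;
    while (top > 0) {
        int v = stack[--top];
        for (int u = 0; u < N; u++)
            if (dep[u][v] && !in_slice[u]) {
                in_slice[u] = true;
                stack[top++] = u;
            }
    }
}
```

Slicing backward from `USE_B1` in this sketch pulls in `DEF_A2` as well as `DEF_B2`, even though a2 never flows into b1 in the two calls shown. Removing the cross edges `FIN_A2 -> FOUT_B1` and `FIN_B2 -> FOUT_A1` would give the slice that matches the program's real behavior.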
Context-Sensitive GMOD / GREF Analysis, cont.
• … and other spurious paths would also be followed. Very bad!
[Figure: the calls assign(&a1, &a2) and assign(&b1, &b2) and the body of assign, with hidden parameters a2, b2 (in) and a1, b1 (out) wired to the defs a2=…, b2=… and the uses …=a1, …=b1]
Context-Sensitive GMOD / GREF Analysis, cont.
• This bad behavior has now been eliminated.
• We now distinguish between variables accessible because of actual parameters and variables accessible because of globals.
[Figure: the calls assign(&a1, &a2) and assign(&b1, &b2) and the body of assign, with hidden parameters a2, b2 (in) and a1, b1 (out) wired to the defs a2=…, b2=… and the uses …=a1, …=b1]
Context-Sensitive GMOD / GREF Analysis, cont.
• Big win in precision
• Big win in time and space
• But a new time and space problem [example not shown]
– Conjecture: Greater precision makes previously identical sets (with shared representations) different (and therefore unshared)

Program   LOC      Size of SDG   Summary edge time   Build time   Forward slice time   Backward slice time
compress   1,937   -48%            0%                -62%         -50%                   0%
cpp        4,079   -16%          -33%                -19%         -47%                 -29%
byacc      6,626     3%          -23%                -27%         -34%                 -25%
cadp      12,787   -21%           14%                -39%         -32%                 -22%
flex      12,400    -6%            4%                -28%         -33%                 -29%
ijpeg     28,177   -24%          -81%                  6%         -71%                 -71%
go        29,246   -21%          -31%                -35%         -46%                 -53%
ntpd      61,068   -12%          -16%                -27%         -12%                  -7%
Discrimination by Structure Field
• Previously, all fields participated in every operation on any field
– e.g., predecessors of p->f were defs of every field of struct pointed to by p
• Now, there is an option to discriminate on structure fields
– e.g., predecessors of p->f are only defs of field f of structs pointed to by p
• But casts and unions must be taken into account
– For portable analysis, cannot use explicit offsets
– Two fields f1 and f2 in different struct types T1 and T2 have the same offsets if the field sequences leading up to f1 and f2 have pair-wise compatible types
• Explicit offsets could be used in the future for precise platform-dependent analysis
• New problem to be solved
– Unless calls to malloc are immediately cast to their intended types, field discrimination is lost
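The portable offset rule above (fields f1 and f2 share an offset when the field sequences leading up to them are pair-wise compatible) can be sketched as a comparison of field-type prefixes. The C sketch below is only an illustration. The `Field` type and `same_offset` are hypothetical, type compatibility is approximated by equality of type names, and the compared prefix includes the field itself, on the assumption that the field's own type affects its alignment and therefore its offset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one struct field. */
typedef struct { const char *type; const char *name; } Field;

/* Return true when field i1 of struct t1 and field i2 of struct t2 can
 * be assumed to share an offset without knowing the platform layout:
 * same position, and pair-wise equal types up to and including it. */
bool same_offset(const Field *t1, size_t i1, const Field *t2, size_t i2)
{
    if (i1 != i2)
        return false;
    for (size_t k = 0; k <= i1; k++)
        if (strcmp(t1[k].type, t2[k].type) != 0)
            return false;
    return true;
}
```

For two struct types that both begin with `int tag; char *name;`, the sketch treats the two `name` fields as aliases of one another under a cast, while fields after the common prefix are kept apart. A platform-dependent analysis could use explicit offsets instead.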
Transitions (Spin-off SBIR Research)
• Current SBIR Phase I projects to transition research to products
– Malicious Code Detection in Firmware (Air Force)
• CodeSurfer for x86; use to detect malicious code
– Model Checking of Hierarchical Graph Structures (DARPA / ITO)
• CodeSurfer model checking plug-in for QA
– Inlined Reference Monitors for Java Bytecode (NIST)
• Use of dependence-graph technology for insertion of efficient IRMs
– Model Checking of UML designs (Navy / Aegis)
• Model-checking to assure properties in UML Rose/RealTime designs
– Dependence Graphs for Dynamic Internet Technologies (NSF)
• CodeSurfer for Java; decision support for test coverage
– Static Analysis for AOP (DARPA / PCES)
• Aspect C (separate take from Gregor’s)
– New Technique for Efficient Compression of Information (BMDO) *
• BDD variant, potential for double-exponential decision tree compression
* [unrelated to DARPA research]
Transitions, cont.
• Recent Product Release
– CodeSurfer 1.5
• Open APIs to C program representation and analysis operations
• Paper
– Software Inspection using CodeSurfer, WISE’01 Workshop on Inspection in Software Engineering, July 23rd, 2001, Paris.
Workshop Topics
• Integration Opportunities
– Projects exploring code rewriting or reorganization, or developing vulnerability scanners
• Client of our open APIs to program representation and analyses
– Projects relying on a system model
• Potential to extract the model automatically from the code
• Validation
– Scalability
• Performance on benchmarks
– Vulnerability detection
• False positive rate w.r.t. “truth”, e.g., known buffer overrun attacks
Future Work
• Core Technology
– Pointer analysis
– Dependence analysis
• concurrency, asynchronous control, reused storage, types, array strides
– Model checker (model reduction and abstraction relaxation)
– Constraint satisfaction: sets and numeric ranges
– Summary information for libraries
– Rewriting support
– Performance
• Information Assurance Workbench
– Scanners for buffer overruns, race conditions, covert channels