Date posted: 19-Dec-2015
Outline: Introduction / The Approach’s Overview / A Language of Pointers / The Type System / Operational Semantics / Type Safety / Type Inference / The Rest of C / Experiments / Summary
CCURED: TYPE-SAFE
RETROFITTING OF LEGACY CODE
George Necula Scott McPeak Wes Weimer
Presented by Anastasia Braginsky
Some slides were taken from George Necula's presentation:
http://www.slidefinder.net/c/ccured_taming_pointers_george_necula/6827275
Problem
C is popular; it is part of the infrastructure
C is also unsafe, and its weak type system can cause subtle bugs
Solution
Add type safety to C: make C “feel” as safe as Java
Catch memory-safety errors by static analysis as much as possible
Add as few run-time checks to C programs as possible (for performance)
Minimal user effort: add type inference to C
The CCured System
C Program → CCured Translator → Instrumented C Program → Compile & Execute → Success, or Halt: Memory Safety Violation
Two Main Premises
Usually a large part of a C program can be verified statically to be type safe
The remaining part can be instrumented with run-time checks to ensure that the execution is memory safe
In many applications, some loss of performance due to run-time checks is an acceptable price for type safety
Example C Program
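Boxed integer: one word holding a 31-bit integer or pointer, followed by a 1-bit tag
The C type int* is used to represent a boxed integer
Un-boxing: a word whose tag is 0 is a pointer to another boxed word and is followed; a word whose tag is 1 holds the integer in its upper 31 bits
[Figure: sample boxed words, each a 31-bit integer-or-pointer field followed by its tag bit]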
Example C Program
int * * a;     // array
int i;         // index
int acc;       // accumulator
int * * p;     // element ptr
int * e;       // unboxer
acc = 0;
for (i = 0; i < 100; i++) {
    p = a + i;                      // ptr arithmetic
    e = *p;                         // read element
    while ( (int)e % 2 == 0 ) {     // check tag
        e = * (int * * ) e;         // unbox
    }
    acc += ((int)e >> 1);           // strip tag
}
[Figure: a points to an array of boxed words, p points to one element, and e follows tagged pointers until it reaches an integer]
Pointer kinds: a is a SEQuence pointer (pointer arithmetic), p is a SAFE pointer, e is a DYNamic pointer (cast between int * and int * *)
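The fragment above assumes that a pointer fits in a 32-bit int. As a hedged illustration only, the same boxing scheme and loop can be written portably with intptr_t. The names word, box_int, box_ptr, unbox and sum_boxed are hypothetical, and the encoding (tag bit 1 for an integer, tag bit 0 for a pointer to another boxed word) is the one implied by the loop's tag check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical boxed word: (n * 2 + 1) for an integer n (tag bit 1), or the
   address of another boxed word (tag bit 0, relying on intptr_t alignment). */
typedef intptr_t word;

static word box_int(intptr_t n) {
    return n * 2 + 1;                 /* shift the integer up and set the tag */
}

static word box_ptr(const word *target) {
    word w = (word)target;
    assert(w % 2 == 0);               /* the tag bit of an aligned address is 0 */
    return w;
}

/* Follow pointer-tagged words until an integer-tagged word is reached. */
static intptr_t unbox(word e) {
    while (e % 2 == 0) {              /* check tag */
        e = *(const word *)e;         /* unbox */
    }
    return (e - 1) / 2;               /* strip tag (exact, also for negative n) */
}

/* The accumulator loop from the slide, over n boxed elements starting at a. */
static intptr_t sum_boxed(const word *a, int n) {
    intptr_t acc = 0;
    for (int i = 0; i < n; i++) {
        const word *p = a + i;        /* ptr arithmetic: a is used as a SEQ pointer */
        acc += unbox(*p);             /* read element: p is only dereferenced, so SAFE */
    }
    return acc;
}
```

An array element may be a boxed integer such as box_int(5) or a chain such as box_ptr(&y) where y itself is boxed; unbox walks the chain exactly as the while loop does.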
But due to aliasing, all of them are considered to point to DYNAMIC values!
SAFE Pointers
SAFE pointer to type t: a single word, ptr, pointing to one t
On use: null check
Can do: dereference
SEQuence Pointers
SEQ pointer to type t: three words, base, ptr and end, delimiting an area of several t elements
On use: null check, bounds check
Can do: dereference, pointer arithmetic
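The SEQ representation can be sketched in C as follows. This is an illustration under assumptions, not CCured's generated code: seq_int_ptr and its helpers are hypothetical names, and to avoid undefined behaviour from out-of-range C pointer arithmetic the sketch keeps the current position as an offset from base instead of a raw ptr, and keeps end as a length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical SEQ pointer to int: the area [base, base + len) plus the
   current position, kept as an offset so arithmetic itself is never undefined. */
typedef struct {
    int *base;      /* start of the area ("base" on the slide) */
    size_t len;     /* number of elements ("end" = base + len) */
    ptrdiff_t off;  /* current position ("ptr" = base + off) */
} seq_int_ptr;

static seq_int_ptr seq_make(int *base, size_t len) {
    assert(base != NULL || len == 0);
    seq_int_ptr s = { base, len, 0 };
    return s;
}

/* Pointer arithmetic: unchecked, it only moves the position. */
static seq_int_ptr seq_add(seq_int_ptr s, ptrdiff_t k) {
    s.off += k;
    return s;
}

/* The checks performed on use: null check and bounds check. */
static bool seq_ok(seq_int_ptr s) {
    return s.base != NULL && s.off >= 0 && (size_t)s.off < s.len;
}

static int *seq_checked(seq_int_ptr s) {
    if (!seq_ok(s)) {
        fprintf(stderr, "SEQ pointer: null or out-of-bounds access\n");
        abort();    /* halt: memory safety violation */
    }
    return s.base + s.off;
}

static int seq_read(seq_int_ptr s) { return *seq_checked(s); }
static void seq_write(seq_int_ptr s, int v) { *seq_checked(s) = v; }
```

Moving past the end and back again is allowed; only an access while the position lies outside the area halts the program.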
DYNamic Pointers
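DYN pointer: two words, ptr and home. The home records its length (len) and holds tag bits, one per word, marking which words hold pointers
[Figure: a home containing the words DYN, DYN, int, with tags 1 1 0]
On use: null check, bounds check, tag check/update
Can do: dereference, pointer arithmetic, arbitrary typecasts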
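A simplified model of DYN pointers and their homes, as a sketch under assumptions: dyn_ptr, home, HOME_MAX and the helpers are hypothetical names, each word is modelled with separate arrays rather than raw memory, and a home has a fixed maximum size. It shows why the tags matter: after an integer overwrites a pointer, the stale home recorded for that word must not be reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define HOME_MAX 8          /* fixed capacity keeps the sketch short */

typedef struct home home;

/* Hypothetical DYN pointer: the home it points into plus a word offset.
   h == NULL is the null home: the value is just the integer off. */
typedef struct {
    home *h;
    intptr_t off;
} dyn_ptr;

/* Hypothetical home: len words, each with a tag (true = holds a pointer). */
struct home {
    size_t len;
    intptr_t bits[HOME_MAX];    /* an integer, or the offset part of a stored pointer */
    home *homes[HOME_MAX];      /* home part of a stored pointer; stale when the tag is false */
    bool tags[HOME_MAX];
};

static void home_init(home *h, size_t len) {
    assert(len > 0 && len <= HOME_MAX);
    h->len = len;
    for (size_t i = 0; i < len; i++) {
        h->bits[i] = 0;
        h->homes[i] = NULL;
        h->tags[i] = false;
    }
}

/* Pointer arithmetic: unchecked, it only moves the offset. */
static dyn_ptr dyn_add(dyn_ptr p, intptr_t k) {
    p.off += k;
    return p;
}

/* Null-home check and bounds check. */
static bool dyn_ok(dyn_ptr p) {
    return p.h != NULL && p.off >= 0 && (size_t)p.off < p.h->len;
}

static void dyn_require(dyn_ptr p) {
    if (!dyn_ok(p)) {
        fprintf(stderr, "DYN pointer: null-home or out-of-bounds access\n");
        abort();
    }
}

/* Writes update the tag of the word written. */
static void dyn_write_int(dyn_ptr p, intptr_t n) {
    dyn_require(p);
    p.h->bits[p.off] = n;
    p.h->tags[p.off] = false;   /* homes[off] may now be stale; the tag records that */
}

static void dyn_write_ptr(dyn_ptr p, dyn_ptr v) {
    dyn_require(p);
    p.h->bits[p.off] = v.off;
    p.h->homes[p.off] = v.h;
    p.h->tags[p.off] = true;
}

/* Reading a word as a pointer checks its tag: an integer word yields a value
   with the null home, which fails any later memory access. */
static dyn_ptr dyn_read_ptr(dyn_ptr p) {
    dyn_require(p);
    dyn_ptr v = { NULL, p.h->bits[p.off] };
    if (p.h->tags[p.off]) v.h = p.h->homes[p.off];
    return v;
}
```

For a home laid out like the figure (DYN, DYN, int), writing a pointer into word 0 and later an integer into the same word leaves the old home in homes[0], but dyn_read_ptr ignores it because the tag is now false.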
A Formal Language
To simplify the presentation, the approach is first described formally for a small language, CCured
Then it is described informally how to extend the approach to handle the remaining C constructs
The Syntax
Types: τ ::= int | τ ref SAFE | τ ref SEQ | DYNAMIC
Only integers or pointers; “τ ref” uses the ML syntax of references; DYNAMIC doesn’t carry the type of the pointed value
Expressions: e ::= x | n | e1 op e2 | (τ)e | e1 ⊕ e2 | !e
n: integer literals; e1 op e2: an assortment of binary integer operations; (τ)e: casting; e1 ⊕ e2: pointer arithmetic; !e: like *e in C
Commands: c ::= skip | c1; c2 | e1 := e2
e1 := e2: memory update through a pointer, like *e1 = e2 in C
Example C Program, translated to CCured
int *1 *2 a;     // array
int i;           // index
int acc;         // accumulator
int *3 *4 p;     // element ptr
int *5 e;        // unboxer
acc = 0;
for (i = 0; i < 100; i++) {
    p = a + i;                      // ptr arithmetic
    e = *p;                         // read element
    while ( (int)e % 2 == 0 ) {     // check tag
        e = * (int *6 *7 ) e;       // unbox
    }
    acc += ((int)e >> 1);           // strip tag
}
(The subscripts *1 to *7 number each occurrence of the pointer type constructor.)

DYNAMIC ref SEQ a;                  // array
int ref SAFE p_i;                   // index
int ref SAFE p_acc;                 // accumulator
DYNAMIC ref SAFE ref SAFE p_p;      // element ptr
DYNAMIC ref SAFE p_e;               // unboxer
p_acc := 0;
for ( p_i := 0 ; !p_i < 100 ; p_i := !p_i + 1 ) {
    p_p := (DYNAMIC ref SAFE) (a ⊕ !p_i);       // ptr arith
    p_e := !!p_p;                               // read element
    while ( (int) !p_e % 2 == 0 ) {             // check tag
        p_e := !! p_e;                          // unbox
    }
    p_acc := !p_acc + ((int)!p_e >> 1);         // strip tag
}
a: sequence pointer to DYN; p: safe pointer to DYN; e: DYNAMIC
The CCured Type System
The purpose is to maintain the separation between the statically typed and the untyped worlds
The presented type system assumes that the program already contains complete pointer-kind information
A type environment provides the type of every variable name
Derivation rules assign types to expressions and commands
The derivation rules: convertibility
“a ≤ b”: it is possible to convert type a to type b
τ ≤ τ: reflexivity
τ ≤ int: reading addresses
int ≤ τ ref SEQ (pointer arithmetic) and int ≤ DYN: dereferences are prevented by run-time checks, since the pointer has lost its capability to perform memory operations
τ ref SEQ ≤ τ ref SAFE: the referenced type can’t change; bounds are checked by a run-time check
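The convertibility rules can be transcribed directly into a small checker. This is a sketch: the type encoding and the names type_kind, type, type_eq and convertible are hypothetical, and the function implements only the rules listed above, with no transitive closure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of CCured types: int, t ref SAFE, t ref SEQ, DYNAMIC. */
typedef enum { T_INT, T_SAFE, T_SEQ, T_DYN } type_kind;

typedef struct type {
    type_kind kind;
    const struct type *elem;    /* referenced type for T_SAFE and T_SEQ, else NULL */
} type;

static bool type_eq(const type *a, const type *b) {
    assert(a != NULL && b != NULL);
    if (a->kind != b->kind) return false;
    if (a->kind == T_SAFE || a->kind == T_SEQ) return type_eq(a->elem, b->elem);
    return true;
}

/* a <= b: a value of type a may be converted to type b. */
static bool convertible(const type *a, const type *b) {
    if (type_eq(a, b)) return true;                              /* t <= t */
    if (b->kind == T_INT) return true;                           /* t <= int */
    if (a->kind == T_INT && (b->kind == T_SEQ || b->kind == T_DYN))
        return true;                                             /* int <= t ref SEQ, int <= DYN */
    if (a->kind == T_SEQ && b->kind == T_SAFE)
        return type_eq(a->elem, b->elem);                        /* t ref SEQ <= t ref SAFE */
    return false;
}
```

For example, int ref SEQ converts to int ref SAFE, but not the other way around, and int ref SEQ does not convert to DYNAMIC ref SAFE because the referenced type would change.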
The derivation rules: expressions
“x : τ”: expression x has type τ
(τ ref SAFE) 0 : τ ref SAFE (creating a safe null pointer)
IF e : τ ref SAFE THEN !e : τ
IF e : DYN THEN !e : DYN
(memory operations only for safe and dynamic pointers)
IF (e : τ’ AND τ’ ≤ τ) THEN (τ)e : τ (casting rule)
IF (e1 : int AND e2 : int) THEN e1 op e2 : int (binary integer operations)
IF (e1 : τ ref SEQ AND e2 : int) THEN e1 ⊕ e2 : τ ref SEQ
IF (e1 : DYN AND e2 : int) THEN e1 ⊕ e2 : DYN
(pointer arithmetic only for sequence and dynamic pointers)
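The rules for dereference, pointer arithmetic and integer operations can be sketched as functions that return the type of an expression, or NULL when no rule applies. The encoding and the names (INT_T, DYN_T, type_deref, type_arith, type_binop) are hypothetical and are defined from scratch so the block stands alone; casts would be checked with the convertibility relation instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding of CCured types: int, t ref SAFE, t ref SEQ, DYNAMIC. */
typedef enum { T_INT, T_SAFE, T_SEQ, T_DYN } type_kind;

typedef struct type {
    type_kind kind;
    const struct type *elem;    /* referenced type for T_SAFE and T_SEQ */
} type;

static const type INT_T = { T_INT, NULL };
static const type DYN_T = { T_DYN, NULL };

/* !e: memory reads are typed only through SAFE and DYNAMIC pointers. */
static const type *type_deref(const type *e) {
    assert(e != NULL);
    switch (e->kind) {
    case T_SAFE: return e->elem;    /* e : t ref SAFE  =>  !e : t */
    case T_DYN:  return &DYN_T;     /* e : DYN  =>  !e : DYN */
    default:     return NULL;       /* int or SEQ: no rule applies */
    }
}

/* e1 (+) e2: pointer arithmetic only for SEQ and DYNAMIC, with an int offset. */
static const type *type_arith(const type *e1, const type *e2) {
    assert(e1 != NULL && e2 != NULL);
    if (e2->kind != T_INT) return NULL;
    if (e1->kind == T_SEQ || e1->kind == T_DYN) return e1;   /* result has e1's type */
    return NULL;
}

/* e1 op e2: binary integer operations. */
static const type *type_binop(const type *e1, const type *e2) {
    assert(e1 != NULL && e2 != NULL);
    return (e1->kind == T_INT && e2->kind == T_INT) ? &INT_T : NULL;
}
```

In the translated example, a ⊕ !p_i has type DYNAMIC ref SEQ, which type_deref rejects; this is why the translation casts it to DYNAMIC ref SAFE before reading through it.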
The derivation rules: commands
IF (e1 : τ ref SAFE AND e2 : τ) THEN e1 := e2 is well typed
IF (e1 : DYN AND e2 : DYN) THEN e1 := e2 is well typed
Homes
H is a set of allocated memory areas, called homes
A home is represented by its starting address and its size
All homes are disjoint
A special null home: 0 ∈ H, with size(0) = 1
Safe pointers and integers have no representation overhead over C
Sequence and dynamic pointers carry their home with them
[Figure: two disjoint homes, starting at h1 and h2]
Casts
Any integer with value n can be cast to a sequence or dynamic pointer with value n and the null home; no further memory operations are possible through it
Any sequence or dynamic pointer with value n, whose home starts at address h, can be cast to the integer n+h
Any dynamic pointer can be cast to a different dynamic pointer with the same value and home
There are no dynamic ↔ sequence casts, since the type system does not allow them
Any sequence pointer with value n, whose home starts at address h, can be cast to a safe pointer with value n+h, but only if 0 ≤ n < size(home) (run-time check)
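The cast rules can be sketched over the formal model, in which a home is just a start address and a size. This is a hedged illustration: home, NULL_HOME, fat_value and the cast_* functions are hypothetical names, and addresses are plain integers rather than real memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A home in the formal model: a start address and a size (abstract units). */
typedef struct { intptr_t start; intptr_t size; } home;

/* The null home: start 0, size 1. */
static const home NULL_HOME = { 0, 1 };

/* A sequence or dynamic value: its home plus the value n (an offset into it). */
typedef struct { const home *h; intptr_t n; } fat_value;

/* int -> SEQ/DYN: value n with the null home. */
static fat_value cast_int_to_fat(intptr_t n) {
    fat_value v = { &NULL_HOME, n };
    return v;
}

/* SEQ/DYN -> int: the address n + h. */
static intptr_t cast_fat_to_int(fat_value v) {
    return v.h->start + v.n;
}

/* SEQ -> SAFE: address n + h, only if 0 <= n < size(home) (run-time check).
   Returns false where the real program would halt. */
static bool cast_seq_to_safe(fat_value v, intptr_t *safe_out) {
    assert(v.h != NULL && safe_out != NULL);
    if (v.n < 0 || v.n >= v.h->size) return false;
    *safe_out = v.h->start + v.n;
    return true;
}
```

Casting a pointer to an integer and back yields a value with the null home, so the bounds check of the SEQ to SAFE cast rejects it; only the value 0, a null pointer, passes. This is the behaviour the run-time checks slide describes for pointer-integer round trips.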
Run-time checks
A null-pointer check for memory operations that use a safe pointer
A bounds check and a non-pointer check (null home) for memory operations through sequence and dynamic pointers
As a consequence, programs that cast pointers to integers and then back to pointers cannot use the resulting pointers as memory addresses
Well-typed CCured programs
Can fail:
due to a failed run-time check
Cannot fail:
due to unexpected types
due to an attempt to access an invalid memory location
Theorem I (Progress and type preservation)
IF e : τ (for valid type τ)
AND
The contents of each memory address correspond to the typing constraints of the home to which it belongs
THEN EITHER
One of the run-time checks fails during the evaluation of the expression e
OR ELSE
e evaluates to a value v, AND v is a valid value of type τ
Theorem II (Progress for commands)
For any command c which is built from valid types
IF the contents of each memory address correspond to the typing constraints of the home to which it belongs
THEN EITHER
The command execution fails due to run-time checks
OR ELSE
The command succeeds, and the contents of each memory address still correspond to the typing constraints of the home to which it belongs
Type inference algorithm
Given a C program, translate the pointer types to make the program well-typed in the CCured type system
The C program already uses types of the form “τ ref”; for each of them we need to discover whether it should be SAFE, SEQ or DYN
We write τ ref q, where q is a qualifier ranging over the set {SAFE, SEQ, DYN}
The overall strategy is to find as many SAFE and SEQ pointers as possible
Algorithm overview
1. Introduce a qualifier variable for each syntactic occurrence of the pointer type constructor in the C program
2. Scan the program and collect a set of constraints C on these qualifier variables
3. Solve the system of constraints to produce a substitution S of qualifier variables with qualifier values, extended to types as:
S(int) = int
S(τ ref q) = DYNAMIC if S(q) = DYN, and S(τ) ref S(q) otherwise
4. Apply the substitution to the types of the C program to produce a CCured program
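The substitution S can be sketched as a recursive function that prints S(τ). The representation is hypothetical: each pointer level of a C type records the index of its qualifier variable, mirroring the subscripts in int *1 *2 a, where the inner level (int *1) carries variable 1 and the outer level carries variable 2.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { Q_SAFE, Q_SEQ, Q_DYN } qual;

/* Hypothetical C type with qualifier variables: elem == NULL means int;
   otherwise the type is (*elem) ref q[var]. */
typedef struct ctype {
    const struct ctype *elem;
    int var;
} ctype;

/* Write S(t) into buf:
     S(int)     = int
     S(t ref q) = DYNAMIC          if S(q) = DYN
     S(t ref q) = S(t) ref S(q)    otherwise */
static void subst(const ctype *t, const qual *S, char *buf, size_t cap) {
    assert(t != NULL && cap > 0);
    if (t->elem == NULL) {
        snprintf(buf, cap, "int");
        return;
    }
    qual q = S[t->var];
    if (q == Q_DYN) {
        snprintf(buf, cap, "DYNAMIC");      /* the pointed-to type is dropped */
        return;
    }
    char inner[128];
    subst(t->elem, S, inner, sizeof inner);
    snprintf(buf, cap, "%s ref %s", inner, q == Q_SAFE ? "SAFE" : "SEQ");
}
```

With q1 = DYN and q2 = SEQ, for instance, int *1 *2 a becomes DYNAMIC ref SEQ, as in the translated example; with q1 = SAFE instead, it would become int ref SAFE ref SEQ.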
Constraint Generation Rules
Convertibility:
int ≤ τ ref q adds {q ≠ SAFE} to C
τ1 ref q1 ≤ τ2 ref q2 adds {q1 ← q2} and {q1 = q2 = DYN OR τ1 = τ2 = int} to C
q1 ← q2 means that the qualifiers are equal, or that q1 is SEQ and q2 is SAFE (a SEQ pointer can be cast to a SAFE one)
Expressions and commands:
If e1 : τ ref q and e2 : int, then e1 ⊕ e2 : τ ref q, adding {q ≠ SAFE} to C (pointer arithmetic)
Constraint Collection
Additional rules bridge the gap between C and CCured:
Allow memory access through SEQ pointers (not just SAFE ones)
Allow ints to be read or written through DYNAMIC pointers
In both cases: an implicit cast, with no run-time checks
In a memory write, allow a conversion of the value being written to the referenced type
For each type of the form τ ref q’ ref q, collect a constraint q = DYN => q’ = DYN
Final Set of Constraints
ARITH: q ≠ SAFE
CONV: q ← q’
POINTSTO: q = DYN => q’ = DYN
ISDYN: q = DYN
EQ: q = q’
Constraint Solving
1. Propagate the ISDYN constraints using the constraints EQ, CONV, and POINTSTO
2. Set all qualifier variables involved in ARITH constraints (and not already DYN) to SEQ, and propagate this information using the constraints EQ and CONV
3. Make all the other variables SAFE
The whole type inference process is linear in the size of the program!
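The three solving steps can be sketched in C. This is an illustrative sketch, not CCured's solver: qual, constraint, assign_if_unknown and solve are hypothetical names, and for brevity each step repeats passes over the constraint list until nothing changes, which is simpler than, and not linear like, the propagation the slide describes (a linear version would follow per-variable edge lists with a worklist). CONV is read as defined on the constraint generation slide: q ← q’ holds when the qualifiers are equal or q is SEQ and q’ is SAFE, so SEQ only flows from q’ back to q.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { Q_UNKNOWN, Q_SAFE, Q_SEQ, Q_DYN } qual;
typedef enum { C_ARITH, C_CONV, C_POINTSTO, C_ISDYN, C_EQ } ckind;

/* A constraint over qualifier variables a and b (for ARITH and ISDYN, set b = a).
   CONV a <- b:    a = b, or a = SEQ and b = SAFE
   POINTSTO a, b:  a = DYN => b = DYN  (a: outer pointer, b: pointed-to pointer) */
typedef struct { ckind kind; int a, b; } constraint;

static bool assign_if_unknown(qual *q, int v, qual val) {
    if (q[v] != Q_UNKNOWN) return false;    /* a DYN or SEQ choice is never revised */
    q[v] = val;
    return true;
}

static void solve(qual *q, size_t nvars, const constraint *cs, size_t n) {
    for (size_t v = 0; v < nvars; v++) q[v] = Q_UNKNOWN;
    for (size_t i = 0; i < n; i++)
        assert(cs[i].a >= 0 && (size_t)cs[i].a < nvars &&
               cs[i].b >= 0 && (size_t)cs[i].b < nvars);

    /* 1. Seed ISDYN; spread DYN through EQ and CONV (both ways) and POINTSTO. */
    for (size_t i = 0; i < n; i++)
        if (cs[i].kind == C_ISDYN) assign_if_unknown(q, cs[i].a, Q_DYN);
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            const constraint *c = &cs[i];
            if (c->kind == C_EQ || c->kind == C_CONV) {
                if (q[c->a] == Q_DYN) changed |= assign_if_unknown(q, c->b, Q_DYN);
                if (q[c->b] == Q_DYN) changed |= assign_if_unknown(q, c->a, Q_DYN);
            } else if (c->kind == C_POINTSTO && q[c->a] == Q_DYN) {
                changed |= assign_if_unknown(q, c->b, Q_DYN);
            }
        }
    }

    /* 2. ARITH variables that are not DYN become SEQ; spread through EQ and CONV. */
    for (size_t i = 0; i < n; i++)
        if (cs[i].kind == C_ARITH) assign_if_unknown(q, cs[i].a, Q_SEQ);
    changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            const constraint *c = &cs[i];
            if (c->kind == C_EQ) {
                if (q[c->a] == Q_SEQ) changed |= assign_if_unknown(q, c->b, Q_SEQ);
                if (q[c->b] == Q_SEQ) changed |= assign_if_unknown(q, c->a, Q_SEQ);
            } else if (c->kind == C_CONV && q[c->b] == Q_SEQ) {
                changed |= assign_if_unknown(q, c->a, Q_SEQ);  /* SAFE cannot become SEQ */
            }
        }
    }

    /* 3. Every variable still unconstrained becomes SAFE. */
    for (size_t v = 0; v < nvars; v++)
        if (q[v] == Q_UNKNOWN) q[v] = Q_SAFE;
}
```

As a usage example, take a hand-derived reading of the example program (not the paper's exact constraint set): ISDYN on q5 and q7 (the unbox cast), POINTSTO from q7 to q6, ARITH on q2 (a + i), CONV q2 ← q4 and EQ q1 = q3 (p = a + i), and CONV q3 ← q5 (e = *p). The solver then yields DYN for q1, q3, q5, q6 and q7, SEQ for q2 and SAFE for q4, matching the kinds in the translated program.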
Handling the rest of C
In the DYNAMIC world, structures and arrays are simply alternative notations for saying how many bytes of storage to allocate
Explicit de-allocation is ignored (a garbage collector is used)
The address-of operator in C can yield a pointer to a stack-allocated variable: an additional run-time check ensures that a stack pointer is not copied into the heap or into globals
DYNAMIC function pointers and variable-argument functions are handled by passing a hidden argument that specifies the types of all arguments passed (checked by the callee)
…
Source Changes
There are still a few cases in which a legal program will stop with a failed run-time check, so some manual intervention is still necessary:
Casting a pointer to an integer and then back to a pointer: make it all void*
Some programs attempt to store pointers to stack variables into memory allocated on the heap
Calling functions in libraries that were not compiled with CCured: write wrapper functions
Experimental Results
Program    LOC     %Safe  %Seq  %Dyn  CCured ratio  Purify ratio
compress   1590    87     12    0     1.25          28
go         29315   96     4     0     2.01          51
ijpeg      31371   36     1     62    2.15          30
li         7761    93     6     0     1.86          50
bh         2053    80     18    0     1.53          94
bisort     707     90     10    0     1.03          42
em3d       557     85     15    0     2.44          7
ks         973     92     8     0     1.47          31
health     725     93     7     0     0.94          25
(Ratios: execution time of the instrumented program relative to the original C program)
Bugs Found
ks: passes FILE* to printf, not char*
compress, ijpeg: array bound violations
go: 8 array bound violations, and 1 uninitialized variable used as an array index
Many involve multi-dimensional arrays
Purify found only the go uninitialized-variable bug
ftpd: buffer overrun bug
Conclusions
C is a popular and useful programming language, but it needs type safety
Even in C programs, most pointers can be verified statically to be type safe; the rest can be checked at run time
This work lets us infer, simply and accurately, which pointers need to be checked at run time
Since the majority of the pointers are safe, the overheads are smaller than those of comparable tools
The presented type system is formally defined, and its safety is proved
QUESTIONS? Thank you!