Korea University, G. Lee - 2009
CRE652 Processor Architecture
Making it Trustworthy
Trustworthy Computing and Branch Prediction
Trusting Buggy Software
No Guarantee at Any Stage
No design proven correct! No implementation proven bug-free!
Development and installation of computer systems in general, and software in particular
Trustworthiness in Computing, Example: SSH Communications SSH Server
void do_authentication(char *user, ...) {
    int auth = 0;
    ...
    while (!auth) {
        /* Get a packet from the client */
        type = packet_read();
        switch (type) {
        ...
        case SSH_CMSG_AUTH_PASSWORD:
            if (auth_password(user, password))
                auth = 1;
        case ...
        }
        if (auth) break;
    }
    /* Perform session preparation. */
    do_authenticated(…);
}
Attack: auth is forced to 1 without a valid password, so the session starts with invalid authentication.
My View
[Figure: expected, described, and executed program behavior, separated by unknown (?) gaps]
Difference in Program Behavior Space
• Description not proven correct w.r.t. Expected
• Execution with no cross-checking w.r.t. Description
First Issue: Protection Measure Precision
[Figure: the set of reachable states in program execution vs. the set of secure (or certified) states captured in the digest; cases: Secure (w/ false negatives), Precise, Broad (insecure, w/ false positives)]
Neither set is clearly defined or known!
Second Issue: Semantic Gap
…
next:   read(a);
…
assign: X := a
        if not_in(X, set) then goto next else goto print;
…
print:  print(whatever);
…
        return
Program semantics, i.e. the behavior specified by the program's control flow and data flow, exists only in the user's/programmer's mind; the processor sees only:
• Isolated instruction instances
• Blind instruction sequencing
Program Behavior Validation
1. Empirical capture of {Executed Behavior} for {Expected Behavior}
2. SW-transparent, micro-architecture-level validation
[Figure: expected, described, and executed behavior, with validation bridging the unknown (?) gaps]
Representing Program Behavior from the Processor’s Perspective
[Figure: code fragment … jr $r6 … with its pc]
Validate at each indirect branch:
• Legitimacy of in-flow
• Legitimacy of out-flow
at the micro-architecture level
Where it comes from, AND where it goes, AND where it is?
For control flow:
• Unique ID for each dynamic instance of an instruction
• Build up the legitimate {Unique ID} set empirically over time
Program Attribute Triplet (PAT)
• PC + target + branch history (EP)
• Dynamic behavior signature
…
jr $r3
…
if (b0) then { … b1 = 1 … } else …
…
if (b1) then … else …
…
jr $r6          <- pc
PAT = (pc) || ($r6) || (b1 b0): the Unique ID
Protection via PAT Validation, Example: SSH Communications SSH Server
void do_authentication(char *user, ...) {
    int auth = 0;
    ...
    while (!auth) {
        /* Get a packet from the client */
        type = packet_read();   /* <- integer overflow */
        switch (type) {
        ...
        case SSH_CMSG_AUTH_PASSWORD:
            if (auth_password(user, password))
                auth = 1;
        case ...
        }
        if (auth) break;
    }
    /* Perform session preparation. */
    do_authenticated(…);
}
auth = 1 without a successful auth_password(): the resulting PAT = (pc | TPC | EP) is invalid!
Execution Path Based Validation
[Chart: number of PATs (0 to 16,000) for apache, ftpd, sshd, and telnetd, with PAT = IBP + 8-bit EP, PAT = IBP + 6-bit EP, and PAT = IBP + 4-bit EP, where IBP = BPC | TPC]
• How many PATs? Training: convergence in {PATs}
{PAT} = Expected Behavior Space
Validation Flow in Micro-Architecture
[Figure: micro-architecture with branch prediction.
Branch prediction (BTB, global BHSR): pc = branch instruction address looks up its TPC, and the next instructions are fetched with the predicted target.
Branch verification: on a misprediction, the next instructions are fetched with the verified target.
An attack shows up as a modified TPC, or as a PC reached out of sequence.]
Validation Flow in Micro-Architecture
[Figure: with an enhanced branch predictor for validation.
Branch prediction: the BTB is extended with an EP buffer holding the preceding EP; pc = branch instruction address looks up its TPC, and the next instructions are fetched with the predicted target.
Branch verification: on a misprediction or EP miss, pc, TPC, and the preceding EP are hashed into the Bloom filter for PATs (a bit vector of 0/1 entries); the next instructions are fetched with the validated target, or an "invalid" exception is raised.]
Ref. Yixin Shi and Gyungho Lee, “Augmenting Branch Predictor for Secure Program Execution”, Proc. 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2007), pp. 10-19, June 2007
Validation Unit (Outside the Branch Predictor)
Time-multiplexed H3 hash functions
Hashing (H3) logic | 256K-bit vector | Output buffer | Total delay
1.48 ns | 1.062 ns | 0.99 ns | 3.532 ns
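[Figure: BPC||TPC||EP feeds four H3 hash units (1.48 ns), time-multiplexed over the matrices Q00/Q01, Q10/Q11, Q20/Q21, Q30/Q31; each hash indexes the 256K-bit array (1.062 ns), and the output buffer (0.99 ns) reports Found?; total 3.532 ns × n]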
• Delays estimated from a Verilog HDL implementation synthesized with TSMC’s 0.09 µm library
Performance Simulation
Parameter | Value
BTB | 512 sets, 4-way set-associative
RAS | 8 entries
Branch miss penalty | 7 cycles
Pipeline stages | 9
Branch predictor | gshare, 12-bit history, 2048 entries
Fetch/dispatch/issue width | 4
RUU size | 64 entries
Load/store queue | 32 entries
I-cache | 64 KB, 2-way set-associative, 2-cycle hit time, LRU
D-cache | 64 KB, 4-way set-associative, 2-cycle hit time, LRU
L2 cache | unified, 512 KB, 4-way set-associative
L2 access time | 10 cycles
Function units | 4 int ALUs, 1 int MUL/DIV, 4 FP adders, 1 FP MUL/DIV
Memory | 100-cycle access time, 2 memory ports
(4-issue processor)
Performance Impact
[Chart: IPC (0 to 2.5) for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, perl, twolf, vortex, vpr, and AVG, with no validation, a 20-cycle validation delay, and a 25-cycle validation delay]
<2% performance overhead on average
(~3 GHz) 4-issue processor with EP length = 8 and EP buffer = 4; Bloom filter = 256K-bit vector accessed twice to obtain 8 hashes, taking 7 to 8 ns
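A rough check on the quoted 7 to 8 ns, assuming the 8 hashes take n = 2 passes through the validation unit (four H3 units used twice) at the 3.532 ns per-pass delay, and a 3 GHz clock:

```latex
\begin{aligned}
t_{\text{pass}} &= 1.48 + 1.062 + 0.99 = 3.532\ \text{ns} \\
t_{\text{val}}  &= n \cdot t_{\text{pass}} = 2 \times 3.532 = 7.064\ \text{ns} \\
\text{cycles at } 3\ \text{GHz} &= 7.064\ \text{ns} \times 3\ \text{cycles/ns} \approx 21.2
\end{aligned}
```

This lands between the 20- and 25-cycle validation delays used in the simulation.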
Performance Overhead
Order of magnitude less than other approaches:
• CFI in-line instrumentation (applicable to static linking only): crafty 45%, gcc 10%, 21% on average
• Program shepherding with trace cache (w/ monitoring overhead): crafty 4% (209%), gcc 625% (760%), 12% (32%) on average; includes some FP benchmarks
• PAT validation in HW | in SW (w/ interrupt overhead): crafty 2.4% | 17% (120%); gcc 0.3% | 6% (24%); avg 0.9% | 14% (29%)
Ref. M. Abadi et al., “Control-Flow Integrity: Principles, Implementations, and Applications”, Proc. ACM CCS, 2005
Ref. V. Kiriansky et al., “Secure Execution via Program Shepherding”, Proc. USENIX Security Symposium, 2002
Summary
Issues:
• Generating {PATs}
  • at various stages of program development/use: testing, compile-time flow analysis, training, etc.
• Managing {PATs}
  • How to incorporate them into program code: as part of the object code, similar to the PLT
  • What to do on an "invalid" exception: criteria for a new legitimate flow vs. an attack; control flow integrity policy and supporting tools
• How to secure {PATs}
  • the attack focus moves to {PATs}: encryption and read-only storage
Behavior Monitoring-Analysis Tool
System Software Changes
Security Policy: access control on control flow; {PAT} behavior proof; public-key-based DRM with support from the TPM
Trusting Program Behavior
{PATs}: Fine-Grain Program Behavior Signature
• Server applications
• Industrial control systems
• SCADA
• Embedded SW
• Other key software
• OS kernel
Empirical Build-Up of Trust over Time
Signatures for Dynamic Data Flow
Program Counter (PC) Encoding
Encode PC-bound data at definition and decode it before de-reference, i.e. when it is loaded into the PC
• Tight security: no gap between the object to be protected and the protection
• Little performance penalty: just one machine instruction (XOR); only PC-bound data is checked
• No compatibility issue: nothing in code or memory layout changes
• No new HW or architecture change
• Encoding/decoding key: the stack (or frame) pointer, or a value read from the TSC
PC-encoding
• Encode a PC-bound variable at its definition
• Decode it before loading the PC-bound variable into the PC
• PC-bound variables:
  • Return address
  • Old frame (base) pointer
  • Function pointer
  • Function pointer passed as a parameter
  • longjmp buffer pointer
  • longjmp buffer pointer passed as a parameter
PC-encoding:
Code under attack: VULPROG.c
…
static int (*funcptr)(..);
static char buf[BUFSIZE];
funcptr = goodfunc;            /* encoding */
/* Overflow funcptr */
strncpy(buf, argv[1], …);
…
(void)(*funcptr)(..);          /* decoding */
…
Attack:
Guess the address of system().
Add the address to the end of buf[BUFSIZE].
execl(VULPROG, VULPROG, buf, …)
The program the attacker specified is NOT executed, due to the decoding failure.
(Function pointer attack example)
PC-Encoding at Linking
Explicit PC-bound variables, PC-encoded by the compiler:
• RET address in the stack
• longjmp() buffer pointer (PC-encoding of the return address at setjmp())
• (static) function pointer
Identifying when to encode PC-bound data beyond the explicit ones, e.g. dynamic function calls, trap vector table, dynamically linked libraries, etc.
PC-Encoding at (Dynamic) Linking
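[Figure: dynamic linking of lib_f (steps 1 to 5). In the text segment, call lib_f goes to the PLT entry PLTf: jump *GOT[f]; on the first call the stub pushes an offset onto the stack and jumps to PLT0, the linker resolves f and fills GOT[f] in the data segment, and control then reaches lib_f in shared library f through GOT[f].]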
PC-Encoding at Linking
[Figure: the same dynamic-linking flow (text: call lib_f; PLTf: jump *GOT[f]; push offset into stack, jump PLT0; GOT[f] in data; linker; lib_f in shared library f; steps 1 to 5), annotated with the encoding and decoding points]
PC-Encoding at Linking
function pointers and function label
int (*funcptr)(..);
int (*funcptrcp)(..);
…
funcptr = goodfunc;
…
funcptrcp = funcptr;
…
(void)(*funcptr)(..);
(void)(*funcptrcp)(..);
…
Decoding at de-referencing of function pointers at run time
Encoding of the function label at linking
No need for pointer variable tracking
PC-Encoding Issues
• Replay attack vulnerable
• Guessing the encoding key, i.e. $sp (or $fp)
• Recompilation needed: applicable to open source only
• Unusual function pointers: arithmetic expressions, e.g.
static int (*funcptr)(..);
static int (*anotherfuncptr)(..);
…
unsigned int tmp;
…
funcptr = goodfunc;
…
anotherfuncptr = funcptr + tmp + 4;
PC-Encoding Key
Desirable:
• Random: no lucky guess
• No repeated sequence: no replay
• Simple: no overhead
NOTE:
• Crypto key: too much overhead
• Physical/natural random
PC-Encoding Key
Time Stamp Counter (TSC)
• Increases every cycle
• Non-sequential reads, i.e. no guarantee on the order of TSC reads relative to the machine instructions before and after them
Chi-Square test:
Source | Entropy (bits/byte) | Reduce (%) | Chi-square value, % | Arith. mean | Monte Carlo Pi error (%) | Serial correlation coefficient
TSC | 7.99963 | 0 | 245.08, 52.75% | 127.506224 | 0.8526 | -0.106322
C rand | 7.99961 | 0 | 267.51, 25% | 127.4948 | 0.22 | -0.000333
PC-Encoding Key Storage
• Stack/frame pointer: register
• Procedure specific: object header
• Separate protected area
  • TPM and its extended memory area
PC-Encoding Efficacy
Protection from control-flow-altering attacks, including buffer overflows and format string errors.
e.g. buffer overflow: 20 different attack cases
Tool | Attacks prevented | Attacks missed | Error
StackGuard | 4 (20%) | 16 (80%) | 0
Stack Shield (global & range check) | 6 (30%) | 14 (70%) | 0
Libsafe | 4 (20%) | 16 (80%) | 0
ProPolice | 10 (50%) | 9 (45%) | 1 (memory fault)
PC-encoding | 20 (100%) | 0 | 0
Ref. J. Wilander and M. Kamkar, “A Comparison of Publicly Available Tools for Dynamic Buffer Overflow Prevention”, Proc. Network and Distributed System Security Symp. (NDSS), 2003
PC-Encoding Efficacy
• No protection from:
  • Impossible-path (mimicry) attacks due to data corruption or unchecked traps
  • Trojan horses
• Encoding: weak as cryptography
• Key: vulnerable
PC-encoding provides tamper resistance to most control-flow-altering attempts, but no protection from control-flow changes by untrusted software or from impossible paths induced by compromised data.
• Trade-off between complexity and efficacy
PC-Encoding: Performance effects
Program Counter Encoding with gcc and recompiled Linux
(w/o = without PC-encoding, w/ = with PC-encoding)
# of clients | Conn. rate w/o (conn/sec) | Conn. rate w/ (conn/sec) | Overhead (%) | Avg. latency w/o (sec) | Avg. latency w/ (sec) | Overhead (%) | Avg. throughput w/o (Mbit/sec) | Avg. throughput w/ (Mbit/sec) | Overhead (%)
4 | 165.15 | 160.27 | 2.95 | 0.024 | 0.023 | -4.17 | 23.14 | 22.48 | 2.85
8 | 168.18 | 159.18 | 5.35 | 0.046 | 0.049 | 6.52 | 24.06 | 21.66 | 9.98
12 | 184.00 | 173.87 | 5.51 | 0.064 | 0.067 | 4.69 | 25.82 | 23.91 | 7.4
16 | 184.33 | 184.4 | -0.04 | 0.08 | 0.084 | 5.00 | 26.94 | 27.29 | -1.3
20 | 192.62 | 191.53 | 0.57 | 0.10 | 0.091 | -9.00 | 27.28 | 27.0 | 1.03
24 | 187.77 | 183.77 | 2.13 | 0.120 | 0.120 | 0 | 27.79 | 26.74 | 3.78
28 | 193.2 | 192.53 | 0.35 | 0.129 | 0.135 | 4.65 | 28.1 | 26.65 | 5.16
32 | 199.2 | 204.27 | -2.55 | 0.147 | 0.142 | -3.4 | 27.78 | 26.73 | 3.78
Overhead (%) = 100 × nic/IC, where nic is the number of extra instructions added for PC-encoding and IC is the instruction count without PC-encoding (all instruction counts are dynamic).
Apache Web Server Performance with and without PC-Encoding
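The overhead columns in the Apache table can be reproduced from the measured values as relative changes with respect to the run without PC-encoding; worked for the 4-client row:

```latex
\begin{aligned}
\text{Overhead}_{\text{rate}} &= 100 \times \frac{R_{\text{w/o}} - R_{\text{w/}}}{R_{\text{w/o}}}
  = 100 \times \frac{165.15 - 160.27}{165.15} \approx 2.95 \\
\text{Overhead}_{\text{latency}} &= 100 \times \frac{L_{\text{w/}} - L_{\text{w/o}}}{L_{\text{w/o}}}
  = 100 \times \frac{0.023 - 0.024}{0.024} \approx -4.17 \\
\text{Overhead}_{\text{throughput}} &= 100 \times \frac{T_{\text{w/o}} - T_{\text{w/}}}{T_{\text{w/o}}}
  = 100 \times \frac{23.14 - 22.48}{23.14} \approx 2.85
\end{aligned}
```

A negative entry therefore means the run with PC-encoding measured better than the run without it.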
Architecture Support for PC-Encoding
Instruction extension:
• Incorporate encoding/decoding into store/load
• Incorporate decoding into indirect branches
e.g.
• key-register $key
• pc-store $n, $m(c);   Mem[($m) + c] := ($n) xor ($key)
• pc-load $n, $m(c);    $n := (Mem[($m) + c]) xor ($key)
• decode-&-jmp $n;      pc := ($n) xor ($key)
int (*funcptr)(..);
int (*funcptrcp)(..);
funcptr = goodfunc;
funcptrcp = funcptr;
…
(void)(*funcptr)(..);
…
(void)(*funcptrcp)(..);
…
becomes:
mov #goodfunc, funcptr
mov funcptr, funcptrcp
…
mov funcptr, $r1
dec-&-jmp $r1
…
mov funcptrcp, $r2
dec-&-jmp $r2