Self-Adjusting Machines
Matthew A. Hammer
University of Chicago / Max Planck Institute for Software Systems
Thesis Defense, July 20, 2012
Chicago, IL
Static Computation Versus Dynamic Computation
Static Computation:
Fixed Input → Compute → Fixed Output

Dynamic Computation:
Changing Input → Compute → Changing Output
(the update mechanism reads input changes and writes output updates)
Matthew A. Hammer Self-Adjusting Machines 2
Dynamic Data is Everywhere
Software systems often consume/produce dynamic data
Scientific Simulation
Reactive Systems
Analysis of Internet data
Tractability Requires Dynamic Computations
Changing Input → Compute → Changing Output

Static case (re-evaluation “from scratch”):
  compute:      1 sec
  # of changes: 1 million
  Total time:   11.6 days

Dynamic case (uses update mechanism):
  compute:      10 sec
  update:       1 × 10⁻³ sec
  # of changes: 1 million
  Total time:   16.7 minutes

Speedup: 1000×
Dynamic Computations can be Hand-Crafted
As an input sequence changes, maintain a sorted output.
Changing Input: 1,7,3,6,5,2,4 → compute → Changing Output: 1,2,3,4,5,6,7
Remove 6:              1,7,3,5,2,4 → update → 1,2,3,4,5,7
Reinsert 6, remove 2:  1,7,3,6,5,4 → update → 1,3,4,5,6,7
A binary search tree would suffice here (e.g., a splay tree). What about more exotic/complex computations?
Self-Adjusting Computation
Offers a systematic way to program dynamic computations
Self-Adjusting Program
Domain knowledge + Library primitives
The library primitives:
1. Compute initial output and trace from initial input
2. Change propagation updates output and trace
High-level versus low-level languages
Existing work uses/targets high-level languages (e.g., SML).
In low-level languages (e.g., C), there are new challenges.
Language feature   High-level help        Low-level gap
Type system        Indicates mutability   Everything mutable
Functions          Higher-order traces    Closures are manual
Stack space        Alters stack profile   Bounded stack space
Heap management    Automatic GC           Explicit management
C is based on a low-level machine model; this model lacks self-adjusting primitives.
Thesis statement
By making their resources explicit, self-adjusting machines give an operational account of self-adjusting computation suitable for interoperation with low-level languages; via practical compilation and run-time techniques, these machines are programmable, sound and efficient.

Contributions
Surface language, C-based   Programmable
Abstract machine model      Sound
Compiler                    Realizes static aspects
Run-time library            Realizes dynamic aspects
Empirical evaluation        Efficient
Example: Dynamic Expression Trees
Objective: As tree changes, maintain its valuation
[Expression tree diagrams]
Initial tree:  ((3 + 4) − 0) + (5 − 6) = 6
After inserting leaf 5 under a new + node:  ((3 + 4) − 0) + ((5 − 6) + 5) = 11
Consistency: the output is the correct valuation.
Efficiency: update time is O(# of affected intermediate results).
Expression Tree Evaluation in C
typedef struct node_s* node_t;
struct node_s {
  enum { LEAF, BINOP } tag;
  union {
    int leaf;
    struct {
      enum { PLUS, MINUS } op;
      node_t left, right;
    } binop;
  } u;
};

int eval (node_t root) {
  if (root->tag == LEAF)
    return root->u.leaf;
  else {
    int l = eval (root->u.binop.left);
    int r = eval (root->u.binop.right);
    if (root->u.binop.op == PLUS) return (l + r);
    else return (l - r);
  }
}
The Stack “Shapes” the Computation
int eval (node_t root) {
  if (root->tag == LEAF)
    return root->u.leaf;
  else {
    int l = eval (root->u.binop.left);
    int r = eval (root->u.binop.right);
    if (root->u.binop.op == PLUS) return (l + r);
    else return (l - r);
  }
}

Stack usage breaks the computation into three parts:
- Part A: Return value if LEAF; otherwise, evaluate BINOP, starting with the left child
- Part B: Evaluate the right child
- Part C: Apply BINOP to the intermediate results; return
Dynamic Execution Traces
Input tree: ((3 + 4) − 0) + (5 − 6)

Execution trace (one A/B/C triple per BINOP, one A step per leaf):
  A+ B+ C+
  ├─ A− B− C−
  │   ├─ A+ B+ C+  (over A3, A4)
  │   └─ A0
  └─ A− B− C−
      ├─ A5
      └─ A6
Updating inputs, traces and outputs
[Before/after diagrams: the input tree from the previous slide, and the same tree with leaf 5 inserted under a new + node.]

In the updated trace, the subtrace for (3 + 4) − 0 is reused unchanged; a new A+ B+ C+ triple wraps the reused subtrace for 5 − 6 together with the new leaf step A5.
Core self-adjusting primitives
Stack operations: push & pop
Trace checkpoints: memo & update points

[Diagram: over the updated trace from the previous slide, memo points mark where old subtraces are reused, and update points mark where new evaluation begins.]
Abstract model: Self-adjusting machines
Overview of abstract machines
- IL: Intermediate language
  - Uses static single-assignment representation
  - Distinguishes local from non-local mutation
- Core IL constructs:
  - Stack operations: push, pop
  - Trace checkpoints: memo, update
- Additional IL constructs:
  - Modifiable memory: alloc, read, write
  - (Other extensions possible)
Abstract machine semantics
Two abstract machines given by small-step transition semantics:
- Reference machine: defines the normal semantics
- Self-adjusting machine: defines the self-adjusting semantics
  - Computes an output and a trace
  - Updates the output/trace when memory changes
  - Automatically marks garbage in memory
We prove that these abstract machines are consistent, i.e., the updated output is always consistent with the normal semantics.
Needed property: Store agnosticism
An IL program is store agnostic when each stack frame has a fixed return value; hence it is not affected by update points.
Destination-passing style (DPS) transformation:
- Assigns a destination in memory for each stack frame
- Return values become these destinations
- Converts stack dependencies into memory dependencies
- memo and update points reuse and update destinations
- Lemma: DPS conversion preserves program meaning
- Lemma: DPS conversion achieves store agnosticism
Consistency theorem, Part 1: No Reuse
[Commutation diagram: Input → Self-adj. Machine Run → Output (and Trace) agrees with Input → Reference Machine Run → Output.]

The self-adjusting machine is consistent with the reference machine when the self-adjusting machine runs “from scratch”, with no reuse.
Consistency theorem, Part 2: Reuse vs No Reuse
[Diagram: a self-adjusting machine run that reuses an existing trace Trace₀ produces the same Trace and Output as a from-scratch run.]

The self-adjusting machine is consistent with its from-scratch runs even when it reuses some existing trace Trace₀.
Consistency theorem: Main result
[Diagram: Input → Tracing Machine Run of P (reusing Trace₀, producing Trace) → Output agrees with Input → Reference Machine Run of P → Output.]

The main result uses Part 1 and Part 2 together: the self-adjusting machine is consistent with the reference machine.
Concrete Self-adjusting machines
From abstract to concrete machines
Overview of design and implementation
- Abstract model guides the design
- Compiler addresses static aspects
- Run-time (RT) library addresses dynamic aspects

Phases
- Front-end translates the CEAL surface language into IL
- Compiler analyzes and transforms IL
- Compiler produces C target code and links it with the RT library
- Optional optimizations cross-cut the compiler and RT library
Compiler transformations
Destination-passing style (DPS) conversion
- Required by our abstract model
- Converts stack dependencies into memory dependencies
- Inserts additional memo and update points

Normalization
- Required by the C programming model
- Lifts update points into top-level functions
- Exposes those code blocks for re-evaluation by the RT library
Compiler analyses
Compiler analyses
- guide necessary transformations
- guide optional optimizations

Special uses:
  memo/update analysis    →  selective DPS conversion
  live-variable analysis  →  translation of memo/update points
  dominator analysis      →  normalization, spatial layout of the trace
From compiler to run-time system
Trace nodes
- Indivisible blocks of traced operations
- Operations share overhead (e.g., closure information)
- Compiler produces trace node descriptors in the target code

Run-time system
- RT interface is based on the trace node descriptors (from the compiler)
  - redo callback: code at update points
  - undo callback: reverts traced operations
- Change propagation incorporates garbage collection
Optimizations
Sparser traces: avoid tracing when possible
1. Stable references: programmer uses a type qualifier
2. Selective DPS: compiler analysis of update points

Cheaper traces: more efficient representation
3. Write-once memory: programmer uses a type qualifier
4. Trace node sharing: compiler analysis coalesces traced operations
Evaluation
From-scratch time: Constant overhead
[Plot: from-scratch time (s, 0 to 1.6) vs. input size (0 to 750K) for exptrees, comparing Self-Adj and Static.]
Average update time: Constant time
[Plot: average update time (ms, roughly 0.011 to 0.022) vs. input size (250K to 750K) for exptrees, Self-Adj.]
Speed up = From-scratch / Update
[Plot: speedup (from-scratch / update) vs. input size (0 to 750K) for exptrees, Self-Adj, reaching roughly 2.5 × 10⁴.]
Evolution of our approach
Stage 1: First run-time library
+ Change propagation & memory management
− Very high programmer burden
Stage 2: First compiler
+ Lower programmer burden
− No return values
− Memo points are non-orthogonal (conflated with the read and alloc primitives)
− No model for consistency or optimizations
Stage 3: New compiler & run-time library
+ Self-adjusting machine semantics guides reasoning about consistency & optimizations
+ Very low programmer burden
Stage 1, RT library: vs. SML library
[Plots: quicksort from-scratch time (s) and average update time (ms) vs. input size (n × 10³, 0 to 300), comparing SML+GC, SML−GC, and C.]

- SML−GC is comparable to C
- SML+GC is 10× slower
Stage 2, Basic compiler : CEAL vs Delta-ML
Normalized Measurements [(CEAL / DeltaML) × 100]
App         From-Scratch   Ave. Update   Max Live
filter          11%            16%          23%
map             11%            14%          23%
reverse         13%            17%          24%
minimum         22%            11%          38%
sum             22%            29%          34%
quicksort        4%             6%          21%
quickhull       20%            30%          91%
diameter        17%            23%          67%
Averages        15%            18%          40%
Stage 3, Machine model : Multiple targets
1. Stable references: programmer uses a type qualifier
2. Selective DPS: compiler analysis of update points
3. Write-once memory: programmer uses a type qualifier
4. Trace node sharing: compiler analysis coalesces traced operations
Stage 3, Machine model : Average update times
[Bar chart: average update time, normalized by no-opt, for exptrees, map, reverse, filter, sum, minimum, quicksort, mergesort, quickhull, diameter, distance, and mean; variants: all-opt, no-seldps, no-share, no-stable, no-owcr.]
Stage 3, Machine model : Maximum live space
[Bar chart: maximum live space, normalized by no-opt, for the same benchmarks and variants.]
Stage 3, Machine model : Previous approaches
[Plots: quicksort from-scratch time (s) and average update time (ms) vs. input size (0 to 100K), comparing ∆ML, all-opt, and CEAL.]

- Delta-ML: an order of magnitude slower
- CEAL (stage 2) is slightly faster than all-opt (stage 3); CEAL uses a non-orthogonal allocation primitive
Thesis statement
By making their resources explicit, self-adjusting machines give an operational account of self-adjusting computation suitable for interoperation with low-level languages; via practical compilation and run-time techniques, these machines are programmable, sound and efficient.

Contributions
Surface language, C-based   Programmable
Abstract machine model      Sound
Compiler                    Realizes static aspects
Run-time library            Realizes dynamic aspects
Empirical evaluation        Efficient