Post on 14-Dec-2015
Synthesizing System-Level Software
Requirements
· Correctness
· Scalability
· Response time
Challenges
· Crossing abstraction levels
· Hardware complexity
· Time to market
Highly Concurrent Algorithms
· Parallel pattern matching
· Anomaly detection
· …
· Voxel trees
· Polyhedrons
· …
· Scene graph traversal
· Physics simulation
· Collision detection
· …
· Cartesian tree (fast fits)
· Lock-free queue
· Garbage collection
· …
Goal
Generate efficient, provably correct components of concurrent systems from higher-level specs
· Verification/checking integrated into the design process
· Automatic exploration of implementation details
Synthesize critical components
· System-level code
· Explore tradeoffs
“Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance.” – D. Knuth
Current Approach: Manual Construction
SYSTEM SPEC: High(er) level description
ENVIRONMENT: Memory model, Thread model, Concurrency primitives, CPU primitives, …
REQUIREMENTS: Throughput, Memory consumption, Pause time, …
BAG OF TRICKS: Optimistic concurrency, Adding metadata, Adding space, …
??
Implementation
Manual Construction
• Hard to verify/test
• Often buggy
• Did the programmer choose well?
• One time deal
Our Vision
SYSTEM SPEC: High(er) level description
Implementation
Implementation
Alternative impls
Machine Assistance
• Auto checking/verification
• Auto exploration of implementation details
• Repeatable
Example: Concurrent Set Algorithm
Systematically derived with machine assistance
Correctness – automatically verified
Performance – only uses CAS
Why Should You Care?
Correctness
· Checking/verification integrated into the design process
Performance
· Systematic exploration beats human in crossing levels of abstraction, leveraging non-intuitive memory models, etc.
· Systematic exploration produces many candidates with varying tradeoffs
Adaptability
· Shorter development cycle for adapting system to a new environment
Why Is There Hope?
Designer effort
· Provide insights that are also required in manual construction
Correctness
· Checking helps eliminate a large number of incorrect candidates
· Designer can focus on remaining candidates
Performance
· …
Adaptability
· …
Why Is There Hope? II
Transformational derivation
· Concurrent garbage collection algorithms [PLDI’06]
Combinatorial exploration
· Concurrent GC algorithms [PLDI’07]
· Concurrent set algorithms [PLDI’08]
Automatic verification
· Comparison under Abstraction for Verifying Linearizability [Amit, CAV’07]
· Shape Analysis for Concurrent Programs [TAU]
· …
Risk Summary
Designer effort
· Return on designer “investment”
· Is the result competitive with manually crafted system?
· Is the tool working in the right level of abstraction?
Verification
· Scalability
Outline
Technical details
· Commonalities between concurrent algorithms
· Adapting to a changing environment
· Preliminary experience: our combinatorial approach
Plan
· Succeed Early
Many open questions
· Common representation
· “more efficient”
· …
Example: “The Origin of GCs”
[Figure: diagram relating concurrent GC algorithms. Nodes: Steele(C) ’75, Dijkstra(C) ’78, Ben-Ari Base ’84, Ben-Ari Extended ’84, Pixley ’88, Yuasa ’90, Boehm ’91, Doligez(C) ’93, Doligez ’94, Azatchi ’03, Barabash ’03, Domani ’03. Legend: Incorrect; Correct; (C) Corrected. Labels: ALGORITHMS, PROOFS, FAMILY.]
Example: Concurrent Set Algorithms
Harris ’01
Michael ’02
Heller ’05
Valois ’95
Ruppert ’04
Massalin ’91
Greenwald ’99
Adapting to a Changing Environment
Algorithm
Synch primitives
Memory model
Thread model
Memory manager
Scheduler …
…
Families of algorithms sharing a common skeleton with parametric functions
Trace Step
Mutator Step
Expose
Mutator
Collector
Machine Assisted Design Process
Overview
Generation
· High-level design: find algorithm outline, find building blocks
· Low-level search: explore algorithm space
Verification
· High-level design: find a sufficient local invariant, find a sufficient abstraction
· Low-level search: verify local invariant
Coarse-Grained to Fine-Grained Synchronization
Mutator Step (source, field, new)
atomic {
  M1: old = source.field
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log U {new}
  M5: w → old.MC--
  M6: source.field = new
}
Trace Step (source, field)
atomic {
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}
Set Expose (log)
atomic {
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
  return V
}
What now? Can we remove atomics?
Result is incorrect, may lose objects!
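The three coarse-grained steps can be read as ordinary code. Below is a minimal Python sketch, not the authors' implementation. It assumes that the `w` prefix on M3 to M5 is a guard ("only if `w` is true"), that `WF` is a per-field flag, and that `MC` is a per-object counter. The `Obj` class, the global lock standing in for `atomic`, and the `None` checks are hypothetical additions for illustration.

```python
import threading
from dataclasses import dataclass, field

# Hypothetical stand-in for the slide's `atomic` sections: one global lock.
_atomic = threading.Lock()


@dataclass(eq=False)  # eq=False keeps identity hashing, so objects can live in a set
class Obj:
    name: str
    fields: dict = field(default_factory=dict)  # field name -> Obj or None
    wf: dict = field(default_factory=dict)      # field name -> WF flag (assumed per field)
    mc: int = 0                                 # the slide's MC counter
    marked: bool = False


def mutator_step(source, fld, new, log):
    with _atomic:
        old = source.fields.get(fld)       # M1
        w = source.wf.get(fld, False)      # M2
        if w:                              # assumed reading of the `w` guard
            new.mc += 1                    # M3
            log.add(new)                   # M4
            if old is not None:            # None check is an addition
                old.mc -= 1                # M5
        source.fields[fld] = new           # M6


def trace_step(source, fld):
    with _atomic:
        dst = source.fields.get(fld)       # C1
        source.wf[fld] = True              # C2
        if dst is not None:                # None check is an addition
            dst.marked = True              # C3


def expose(log):
    with _atomic:
        exposed = set()                    # the slide's V
        if not log:                        # empty-log check is an addition
            return exposed
        o = log.pop()                      # E1
        mc = o.mc                          # E2
        if mc > 0:
            o.marked = True                # E3
            exposed.add(o)                 # E4
        return exposed


# Usage: the collector traces a.f, then the mutator overwrites it.
a, b, c = Obj("a"), Obj("b"), Obj("c")
a.fields["f"] = b
log = set()
trace_step(a, "f")
mutator_step(a, "f", c, log)
exposed = expose(log)
```

In this run the write happens after the field was traced, so `c` is logged and later marked by `expose`. This is the case the coarse-grained version is meant to cover.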
Coarse-Grained to Fine-Grained Synchronization
Trace Step (source, field)
{
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}
Mutator Step (source, field, new)
{
  M1: old = source.field
  M2: w = source.field.WF
  M5: w → old.MC--
  M3: w → new.MC++
  M4: w → log = log U {new}
  M6: source.field = new
}
Set Expose (log)
{
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
  return V
}
What now? Can we remove atomics?
“When in doubt, use brute force.” -- Ken Thompson
System Input – Building Blocks
Mutator Building Blocks
  M1: old = source.field
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log U {new}
  M5: w → old.MC--
  M6: source.field = new
Tracing Step Building Blocks
  C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true
Expose Building Blocks
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
Input Constraints
• Mutator blocks: [M3, M4]
• Tracing blocks: [C1, C3]
• Expose blocks: [E1, E2, E3, E4]
• Dataflow, e.g. M2 < M3
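A rough sketch of how dataflow constraints prune the space of block orderings, written in Python. Only `M2 < M3` is stated on the slide. The other pairs in `DATAFLOW` are assumptions read off the definitions and uses of `old` and `w` in the block text, and the function names are hypothetical.

```python
from itertools import permutations

MUTATOR_BLOCKS = ["M1", "M2", "M3", "M4", "M5", "M6"]

# (before, after) pairs. Only ("M2", "M3") comes from the slide; the rest are
# assumed: M2 defines w, used by M3/M4/M5; M1 defines old, used by M5.
DATAFLOW = [("M2", "M3"), ("M2", "M4"), ("M2", "M5"), ("M1", "M5")]


def respects(order, constraints):
    """True if every (before, after) pair appears in that order."""
    pos = {block: i for i, block in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in constraints)


def candidate_orders(blocks, constraints):
    """Yield each ordering of `blocks` that satisfies the constraints."""
    for order in permutations(blocks):
        if respects(order, constraints):
            yield order


orders = list(candidate_orders(MUTATOR_BLOCKS, DATAFLOW))
print(len(orders), "of 720 orderings remain")
```

Under these assumed constraints 108 of the 720 orderings survive. A checker then only has to look at those.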
System Output – (Verified) Algorithms
Mutator Step (source, field, new)
{
  M1: old = source.field
  M6: source.field = new
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log U {new}
  M5: w → old.MC--
}
Set Expose (log)
{
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
}
Trace Step (source, field)
{
  C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true
}
Explored 306 variations in around 2 mins
Least atomic (verified) algorithm with given blocks
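One way to picture the search for a "least atomic" candidate, as a Python sketch under stated assumptions: fix a block order, enumerate every way to cut it into contiguous atomic sections, keep the cuts a checker accepts, and prefer the one with the most sections. `toy_verifier` is a hypothetical placeholder, not the real verifier. It only encodes the assumption that `[M3, M4]` must stay in one atomic section.

```python
from itertools import product


def groupings(blocks):
    """Yield every way to cut `blocks` into contiguous atomic sections."""
    if not blocks:
        return
    for cuts in product([False, True], repeat=len(blocks) - 1):
        sections, current = [], [blocks[0]]
        for block, cut in zip(blocks[1:], cuts):
            if cut:                       # start a new atomic section here
                sections.append(tuple(current))
                current = [block]
            else:                         # extend the current section
                current.append(block)
        sections.append(tuple(current))
        yield tuple(sections)


def toy_verifier(sections):
    # Hypothetical stand-in for a real checker: accept only if M3 and M4
    # share an atomic section (assumed reading of the [M3, M4] constraint).
    return any("M3" in s and "M4" in s for s in sections)


def least_atomic(blocks, verify):
    """Return the accepted grouping with the most (smallest) sections."""
    accepted = [g for g in groupings(blocks) if verify(g)]
    return max(accepted, key=len, default=None)


order = ["M1", "M6", "M2", "M3", "M4", "M5"]
best = least_atomic(order, toy_verifier)
```

For six blocks there are 32 groupings per ordering. Combined with reorderings across all three steps, this is the kind of space that brute-force exploration has to cover.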
But What Now?
How do we get further improvement?
Need more insights
Need new building blocks
· Example: start and end of collector reading a field
Coordination
Meta-data
Atomicity
Ordering
Continuing the Search…
We derived a non-atomic algorithm (at the granularity of blocks)
· Non-atomic write barrier, collector step and expose
· System explored over 1,600,000 algorithms (took ~34 hours)
All experiments took ~41 machine hours and ~3 human hours
Plan
Identify application domain
Case studies
· Concurrent garbage collection algorithms
· Concurrent set algorithms
· Concurrent memory allocator (used in metronome)
· …
Dynamic tool for testing systems (ParaDyn)
Abstraction-guided synthesis
Automatic verification using local abstractions
Representation
Choosing the right starting point
Succeed Early
Choose “the right” domain
· Correctness is critical
· High performance
· Highly dynamic (concurrent changes)
· Custom architecture (?)
· Irregular structures (?)
· Workloads unknown at compile time
· Examples: VM components, drivers for embedded devices…
Choosing the Right Starting Point?
“Higher-level specification”?
A sequential program?
Start with something else?
Add(S,x): S’ = S ∪ { x }
Remove(S,x): S’ = S \ { x }
Contains(S,x): x ∈ S
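The set specification above can be written down as an executable sequential reference. This is a minimal Python sketch with hypothetical function names. Such a reference is what the results of a concurrent implementation could be compared against.

```python
def add(s, x):
    """Add(S, x): S' = S ∪ {x}."""
    return s | {x}


def remove(s, x):
    """Remove(S, x): S' = S \\ {x}."""
    return s - {x}


def contains(s, x):
    """Contains(S, x): x ∈ S."""
    return x in s


# Usage: each operation returns a new set and leaves the old state untouched.
s0 = frozenset()
s1 = add(s0, 3)
s2 = add(s1, 5)
s3 = remove(s2, 3)
```

The spec says nothing about locks, CAS, or memory layout. Those are the implementation details left to exploration.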
What is “More Efficient”?
Multiple dimensions
· Scalability
· Response time
· …
Theoretical models exist
· Disjoint-access parallelism
· …
Not clear whether existing theoretical models capture reality
Abstraction-Guided Synthesis
Guarantee correctness
· Synthesize only programs that can be proved with your abstraction
Summary
Machine assisted design and implementation of correct efficient highly-concurrent algorithms
Designer provides insights, system explores implementation details
Business impact
· Change the way concurrent systems are built
· (More) Reliable high-performance systems. Shorter time to market
Scientific impact
· Realistic semi-automated synthesis of concurrent systems
Why us?
Our team has expertise in concurrency and verification of concurrent systems
We have preliminary experience with synthesizing concurrent algorithms in the domain of concurrent garbage collectors
We have ongoing collaborations with world experts on verification of concurrent programs, and with researchers working on parallel computing
Parallelization
Higher-level
Underlying structure does not change during computation
System can be broken into independent parts
Synthesizing Concurrent Systems
Designing practical and efficient concurrent systems is hard
· Trading off simplicity for performance
· Fine-grained coordination
Result: sub-optimal, buggy algorithms
Need a more structured approach to synthesize correct and optimal implementations out of coarse-grained specifications
“Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance.” – D. Knuth