Hints for Computer System Design
Paper by B. W. Lampson
Presentation by Emerson Murphy-Hill
Some Background
• B.W. Lampson – same guy who wrote “Experience with Processes and Monitors in Mesa”
• Currently at Microsoft
• Worked on hardware, operating systems, programming environments, and applications
• Here he presents a laundry list of folk wisdom for system design
<Advice>
• <Short Explanation>
• <Example>
• 26 Pieces of Advice
Separate Normal and Worst Case
• Normal case must be fast, but worst case must still work
• Specialization. Special code generated for best case, “replugging” for worst case
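A minimal sketch of the fast-path/slow-path split, in Python. This is not from the paper; the function name `display_width` and the width rules are assumptions chosen for illustration. The normal case (plain ASCII) is answered cheaply, while the worst case (wide or combining characters) takes a slower route that still gives a sensible answer.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal columns needed to show `text`.

    Hypothetical helper for illustration only.
    """
    # Normal case: pure ASCII, one column per character. Must be fast.
    if text.isascii():
        return len(text)

    # Worst case: walk every character. Slower, but it must still work.
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks sit on the previous character
        # Wide ("W") and fullwidth ("F") characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width
```

The two paths are kept apart on purpose: the fast path stays trivial to reason about, and the slow path is free to be as careful as it needs to be.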
Do One Thing At a Time, Well
• Capture the minimum possible interface, deliver what the interface promised, don’t promise too much
• Exokernel. The minimum possible abstraction, only promises resource protection.
Don’t Generalize
• Don’t try to anticipate all possible uses of interface (no general implementation)
• Microkernels. Interface built generally, but few assumptions made about implementation
Get It Right
• As Dick Cheney would say, this is “non-actionable intelligence” and falls into the class of “known unknowns”
• Abstraction does not imply correctness
Don’t Hide Power
• When low level is high performance, don’t mask it in abstraction
• Scheduler Activations. Rather than kernel multiplexing threads across processors, have user space decide how to allocate a processor
Use Procedure Arguments
• Pass code as a parameter
• Asynchronous I/O. Function callback used as parameter, rather than doing some sort of generic lookup upon return.
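A small sketch of passing code as a parameter, loosely modeled on the asynchronous I/O example. The names `read_async`, `fetch`, and `on_complete` are hypothetical and a thread stands in for real asynchronous I/O; the point is only that the caller hands over the code to run on completion.

```python
import threading


def read_async(fetch, on_complete):
    """Run `fetch` in another thread and hand its result to `on_complete`.

    Both arguments are procedures supplied by the caller (assumed names).
    """
    def worker():
        # The callback arrives as a parameter, so no lookup table is
        # needed to decide what to do when the result is ready.
        on_complete(fetch())

    thread = threading.Thread(target=worker)
    thread.start()
    return thread  # returned so the caller can join() if it wants to wait
```

A caller might collect results with `results = []` and `read_async(lambda: "data", results.append).join()`.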
Leave it to the Client
• Attain flexibility and performance by “doing one thing,” letting the client do the rest
• Monitors. Provide synchronization as a language-level construct, but leave protecting resources to client
Keep Basic Interfaces Stable
• Avoid changing the interface out from under client
• Trampoline. To maintain Linux binary compatibility, system calls are trampolined into a user-space event handler
Keep a Place to Stand
• If you have to change the interface, keep backward compatibility
• Virtual Machines. Rather than building an OS on top of raw hardware, building on top of a virtual machine allows the VM implementation to be changed
Plan to Throw One Away
• Throw away the first version of the system, and start over
• Interestingly, we didn’t see an example of this. Some of the most worthwhile research papers are about systems that have failed
Keep Secrets
• In other words, encapsulation.
• Layers. Each layer contains state that is private from other layers.
Use Good Design Again
• Rather than being general, use a good idea multiple times
• Variations on Cache. TLB, L1+L2, virtual memory, file system cache, disk controller, hierarchical RAID…
Divide and Conquer
• Take a complex problem and split it up into easier ones
• Threading. A number of threads can be used to do a number of subtasks.
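A rough sketch of splitting one task into subtasks handled by threads. `parallel_sum` and its `parts` parameter are made up for illustration. In CPython, threads will not speed up CPU-bound work like this, so read it as a picture of the structure rather than a performance recipe.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(numbers, parts=4):
    """Sum `numbers` by splitting the list into chunks, one subtask each."""
    # Divide: cut the input into at most `parts` roughly equal chunks.
    size = max(1, -(-len(numbers) // parts))  # ceiling division
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]

    # Conquer: each thread solves one easy subproblem.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partial_sums = pool.map(sum, chunks)
        # Combine: merge the partial answers into the final one.
        return sum(partial_sums)
```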
Shed Load
• Don’t try to handle all requests; eliminate some
• Web servers. Rather than try to serve all requests, deny some. Apache does this by putting an upper limit on number of threads
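A minimal sketch of load shedding with a bounded queue. The `Server` class and its `capacity` parameter are hypothetical and not how Apache is implemented; the idea is only that a full server says "no" immediately instead of letting work pile up.

```python
import queue


class Server:
    """Toy server that accepts at most `capacity` pending requests."""

    def __init__(self, capacity: int):
        # capacity must be positive: Queue(maxsize=0) would be unbounded.
        self.pending = queue.Queue(maxsize=capacity)

    def submit(self, request) -> bool:
        """Return True if the request was accepted, False if it was shed."""
        try:
            self.pending.put_nowait(request)
            return True
        except queue.Full:
            # Shed load: reject now rather than degrade everyone's service.
            return False
```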
Safety First
• Prefer avoiding disaster to attaining optimal results
• High level languages. Programmer doesn’t have to worry about type safety and array bounds checking, for example (at a performance cost)
Split Resources
• Divide up resources, rather than scheduling them.
• Scheduler Activations. Rather than multiplexing across processors, have one user level thread per processor.
Static Analysis
• Analyze code without running it, wherever possible
• Deadlock/Race detection (Sun’s lock_lint). As we have seen, dynamic race detection depends on the system entering all states.
Dynamic Translation
• Translate/compile code when needed
• Packet filtering / collocation (Exokernel). Code is interpreted in kernel when needed (runtime) to run in kernel-mode.
Cache Answers
• Don’t recompute or fetch, when possible
• Virtual memory. Acts as a cache to hold frequently used pieces of memory.
Use Hints
• System may provide a hint as to desired results, or where desired results may be found
• URPC. Calling process will provide a hint to the kernel as to where its processor should next be allocated (server)
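A small sketch of a hint in the sense used here: a guess that is usually right, is cheap to check, and is never trusted blindly. The `find` function and its `hint` parameter are assumptions for illustration, unrelated to URPC, and the list is assumed to be sorted with unique items.

```python
import bisect


def find(sorted_items, key, hint=None):
    """Return the index of `key` in `sorted_items`, or -1 if absent.

    `hint` is a guessed index. It may be stale or wrong, so it is
    verified against the real data before being used.
    """
    # Check the hint first: one comparison if the guess is right.
    if hint is not None and 0 <= hint < len(sorted_items):
        if sorted_items[hint] == key:
            return hint

    # Hint missing or wrong: fall back to the reliable (slower) search.
    i = bisect.bisect_left(sorted_items, key)
    if i < len(sorted_items) and sorted_items[i] == key:
        return i
    return -1
```

A wrong hint costs a little time but never a wrong answer, which is what separates a hint from a cache entry.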
Use Brute Force
• If an elegant solution is not possible, fall back on a long calculation
• Specialization/RPC variants. Both do something clever when possible, but do the standard thing when not possible
Compute In Background
• If work is not immediately necessary, do it during downtime
• Cleanup in log-based file systems. Segment cleaning could be scheduled for nighttime.
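A rough sketch of deferring non-urgent work to a background thread. `BackgroundWorker`, `defer`, and `wait_idle` are hypothetical names, and this is a toy rather than how a log-structured file system schedules its cleaner.

```python
import queue
import threading


class BackgroundWorker:
    """Runs deferred tasks on a separate daemon thread."""

    def __init__(self):
        self.tasks = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            task = self.tasks.get()  # blocks until there is work to do
            try:
                task()  # note: an exception here ends this toy worker
            finally:
                self.tasks.task_done()

    def defer(self, task):
        """Queue `task` to run later; the caller returns immediately."""
        self.tasks.put(task)

    def wait_idle(self):
        """Block until every deferred task has finished."""
        self.tasks.join()
```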
Batch Processing
• Amortize cost by doing a bunch of operations at once
• Page Protection. In VMM systems, we’ve seen that protecting/unprotecting multiple pages is faster
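A minimal sketch of amortizing a per-call cost by batching. `BatchWriter`, `sink`, and `batch_size` are made-up names; `sink` stands for any operation with a high fixed cost per call, such as a system call or a network round trip.

```python
class BatchWriter:
    """Collects items and hands them to `sink` in groups of `batch_size`."""

    def __init__(self, sink, batch_size: int):
        self.sink = sink              # called once per batch, not per item
        self.batch_size = batch_size
        self.buffer = []

    def write(self, item):
        self.buffer.append(item)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered, even if the batch is not full."""
        if self.buffer:
            self.sink(list(self.buffer))
            self.buffer.clear()
```

Seven writes with a batch size of three reach the sink as three calls instead of seven, provided the caller flushes at the end.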
End-to-end
• Error detection and recovery at the application level is necessary; at lower levels it is not strictly necessary and is there only for performance.
• Layers. Error detection could be handled at any layer… really depends on the application
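A small sketch of an end-to-end check: the sender attaches a digest and the final receiver verifies it, whatever the layers in between did or did not check. The names `send` and `receive` are hypothetical, and SHA-256 is just one reasonable choice of checksum.

```python
import hashlib


def send(payload: bytes):
    """Return the payload together with a digest computed at the source."""
    return payload, hashlib.sha256(payload).hexdigest()


def receive(payload: bytes, digest: str) -> bytes:
    """Verify the payload at the destination; raise if it was corrupted."""
    if hashlib.sha256(payload).hexdigest() != digest:
        # Lower layers may have their own checks, but only this one
        # covers the whole path from sender to receiver.
        raise ValueError("payload corrupted in transit")
    return payload
```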
Log Updates
• Periodically record and back up the state of a system, so that it can be recovered
• Log-based file systems. RAID 5 in Elephant, too.
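One reading of this hint is an append-only log of updates from which the state can be rebuilt. The sketch below assumes a hypothetical update format of `(op, key, value)` tuples; real log-based file systems are far more involved.

```python
def apply_update(state: dict, update) -> None:
    """Apply one logged update to `state` in place."""
    op, key, value = update
    if op == "set":
        state[key] = value
    elif op == "delete":
        state.pop(key, None)  # deleting a missing key is harmless
    else:
        raise ValueError(f"unknown operation: {op!r}")


def replay(log) -> dict:
    """Recover the current state by replaying the log from the start."""
    state = {}
    for update in log:
        apply_update(state, update)
    return state
```

After a crash, only the log needs to have survived: replaying it reproduces the state that was lost.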
Make Actions Atomic
• Either have operations complete or fail without residue
• RCU. Changes are seen as atomic to all processes.
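RCU itself is hard to show in a few lines, so here is a different, simpler technique with the same flavor: write the new version off to the side, then switch to it in one step. `atomic_write` is a hypothetical helper; it relies on `os.replace`, which replaces the destination atomically on POSIX systems when source and destination are on the same file system.

```python
import os
import tempfile


def atomic_write(path: str, data: str) -> None:
    """Replace the file at `path` with `data`, all or nothing."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new version to a temporary file in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk
        # The switch: readers see the old file or the new one, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        # Failure leaves no residue: remove the partial temporary file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```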
Summary
• Make it simple
• Do one thing well
• Easiest thing possible
• Delegate work
• Tackle one aspect only
• Be consistent
References
• http://research.microsoft.com/~lampson/
• http://the-age-of-reason.blogspot.com/2004_06_20_the-age-of-reason_archive.html