Paintable Computing
A Presentation of: “Programming A Paintable Computer”
William Butera, PhD Thesis, MIT 2002
all images (c) their respective owners
The Goal
● Computing by the Liter
The Big Idea
● The Superlative Multi-Processor
● Inverse of Current Architecture Paradigm
● What are the hard problems?
– Are they worse than what has already been solved?
Architecture Problems
● Asynchronous devices
– No easy way to make synchronous
● Highly Unreliable Processors
Architecture Problems
● No Global Communication
● Unknown (and Unknowable) Topology
Architecture Problems
● Code must be compact
– Nodes cannot support large processes
– Working sets must be small
● Infinitely many paths to failure
The Solution
● New Architecture => New Solution
– Out with the old assumptions
● Self Assembly
– Better paradigm
– Redefines “success”
Complex Adaptive Systems
● Aggregate Behavior – simple parts => arbitrarily complex systems
● Statistical Output
– Local Interactions => Global State
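The local-interactions-to-global-state claim can be made concrete with a toy demo. The sketch below is my illustration, not from the thesis: nodes on a ring repeatedly replace their value with the average of themselves and their two immediate neighbors. No node sees the whole system, yet the ensemble converges to a single global consensus value.

```python
# Hypothetical illustration of a complex adaptive system: purely local
# smoothing on a ring drives the population to a global consensus.

def smooth(values, steps=200):
    v = list(values)
    n = len(v)
    for _ in range(steps):
        # Each node only talks to its left and right neighbors.
        v = [(v[i - 1] + v[i] + v[(i + 1) % n]) / 3 for i in range(n)]
    return v

start = [float(i) for i in range(10)]  # spread of 9 across the ring
end = smooth(start)
print(max(end) - min(end))  # spread collapses toward 0
```

The update is doubly stochastic, so the global mean is preserved while the spread decays geometrically: global structure emerges with no global communication, which is exactly the property the paintable architecture leans on.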
Implementing a Solution
● What sort of hardware is a good target?
– Cannot be too small
● Must be able to do useful work
– Cannot be too large
● Must be hard enough
Reference Standard “Paintable” Computer
-- Processing --
● Really tiny “traditional” architecture
– CPU: 10-200 MHz
– RAM: 50K words
– Bus: 16+ bit
– Programmable in traditional languages
● C, Java, etc.
Reference Standard “Paintable” Computer
-- Power --
● Unspecified interface
– Does not impinge on the architecture
● Examples
– Batteries
– Chemical substrate
– Photo-cell
– Structural power routing
– Fuel Cells
Reference Standard “Paintable” Computer
-- Networking --
● Directionless
● Bandwidth: 100 kbps Full Duplex
● Radius: ~8 particles
– Gaussian Random distribution of connectivity
● Example Technologies:
– luminescence
– electrostatic
– near-field RF
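The radio model above can be sketched as a random geometric graph. Everything in the snippet below is my construction (particle count, area, and radius are illustrative, with the radius tuned so the average degree lands near the slide's ~8 neighbors):

```python
import math
import random

# Sketch of the paintable radio neighborhood: scatter particles uniformly
# over a surface and connect every pair within a fixed radio radius.

def neighbor_counts(n=330, area=1.0, radius=0.09, seed=1):
    rng = random.Random(seed)
    pts = [(rng.random() * area, rng.random() * area) for _ in range(n)]
    counts = []
    for i, (xi, yi) in enumerate(pts):
        # Count every other particle inside this particle's radio radius.
        c = sum(1 for j, (xj, yj) in enumerate(pts)
                if i != j and math.hypot(xi - xj, yi - yj) <= radius)
        counts.append(c)
    return counts

counts = neighbor_counts()
print(sum(counts) / len(counts))  # average degree, near the slide's ~8
```

Because placement is random, the per-node degree varies around the mean, which is the “random distribution of connectivity” the slide refers to: software must tolerate nodes with few neighbors as well as many.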
The Pushpin Computer
● A real system
– An example of a paintable computer
● Model architecture
– 330 nodes
System Layout
● Separate communication, ground, and power
● Planes separated by flexible silicone insulation
Programming Model
● Program Fragments (PFrags)
– Computational Elements
● Shared Memory Partitions
– Inter-Process Communication
● Embedded OS
– Local Resource Control
– Special PFrag Services
Shared Memory Layout
● PFrag I/O
– Bassinet: Pre-Load Store
– Launch Pad: Post-Unload Store
● Data I/O
– Home Page: Output to Neighbors
– Mirrored Home Pages: Input From Neighbors
– Organized as key-value pairs
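The Home Page / Mirrored Home Page scheme can be sketched in a few lines. The names follow the slides, but the dict-based API below is my invention, not Butera's actual interface:

```python
# Sketch of paintable data I/O: each node publishes key-value pairs on its
# Home Page; the runtime copies neighbors' Home Pages into local mirrors,
# so all inter-node input is read locally.

class Node:
    def __init__(self, name):
        self.name = name
        self.home_page = {}   # output: key-value pairs visible to neighbors
        self.mirrors = {}     # input: neighbor name -> copy of their page
        self.neighbors = []

    def publish(self, key, value):
        self.home_page[key] = value

    def sync(self):
        # In hardware this copy happens over the radio; here we just copy.
        for nb in self.neighbors:
            self.mirrors[nb.name] = dict(nb.home_page)

a, b = Node("a"), Node("b")
a.neighbors = [b]
b.neighbors = [a]
b.publish("hops", 3)
a.sync()
print(a.mirrors["b"]["hops"])  # -> 3
```

The point of the design is that a PFrag never addresses a remote node directly: it writes to its own page and reads mirrored copies, which is what makes the model tolerant of nodes appearing and disappearing.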
OS Services 1 and 2 of 4
● Housekeeping
– Defragmenting Memory
– Resizing I/O Zone
– Mediates PFrag Access to I/O Regions
● Network Access
– Inter-Processor Communication
– Manages Joins/Leaves
OS Services 3 and 4 of 4
● Running PFrags
– Installs / Uninstalls the PFrag
– Runs the PFrag
● PFrag Services
– Mathematics
– Random Numbers
– Access to Memory
– Transit Request Messages
– etc.
PFrag Implementation
● Implements Five Functions
– Install
● Moves Self From Bassinet to Main Memory
– DeInstall
● Cleans Up and Erases Self
– Update
● Runs the process
PFrag Transit
● Transfer-Granted
– Cleans Up and Moves to Launch Pad for Transit
● Transfer-Refused
– Allows PFrag to Dequeue Transfer Request
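Taken together, the two slides above name the five PFrag entry points. The interface below is my sketch of that contract (the class shape and method signatures are assumptions, not Butera's actual API), with a trivial concrete PFrag to show how the OS would drive it:

```python
# Sketch of the five-function PFrag contract described on the slides.

class PFrag:
    def install(self, node):
        """Move self from the Bassinet into main memory; claim resources."""

    def deinstall(self, node):
        """Clean up state and erase self from the node."""

    def update(self, node):
        """Run one step of the process; called repeatedly by the OS."""

    def transfer_granted(self, node):
        """A neighbor accepted the move: clean up, go to the Launch Pad."""

    def transfer_refused(self, node):
        """A neighbor declined: dequeue the pending transfer request."""

class Counter(PFrag):
    """A minimal concrete PFrag: counts how often the OS schedules it."""
    def __init__(self):
        self.n = 0
    def update(self, node):
        self.n += 1

c = Counter()
for _ in range(3):   # the embedded OS would call update() in its loop
    c.update(None)
print(c.n)  # -> 3
```

Keeping the whole life cycle behind five small callbacks is what lets code migrate: a node only needs to invoke this fixed interface on whatever fragment happens to be resident.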
Does It Work?
● Need to prove Viability
– Simple Applications that we can use to:
● Test
● Validate
– Butera Implements:
● BreadCrumbs
● Near Sighted Mailman
● Knitting Club
What Is It Good For?
● What software will Motivate?
– Need a “Killer App”
● Only works well on a Paintable Architecture
– Butera Implements:
● Gradient
● MultiGrad
● Tessellation Operator
● Diffusion
● Channel Operator
● Coordinate Operator
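The Gradient operator is the most foundational of these primitives. The snippet below is my breadth-first rendering of its core idea, not Butera's code: a source emits value 0, and every node stores one plus the smallest value heard from a neighbor, so each node ends up knowing its hop-count distance to the source.

```python
from collections import deque

# Sketch of the Gradient operator: flood a hop count outward from a source
# node using only neighbor-to-neighbor messages.

def gradient(neighbors, source):
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in neighbors[u]:
            if v not in dist:          # first message heard wins: fewest hops
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

# A three-node chain: 0 - 1 - 2
nbrs = {0: [1], 1: [0, 2], 2: [1]}
print(gradient(nbrs, 0))  # -> {0: 0, 1: 1, 2: 2}
```

On real paintable hardware this runs asynchronously, with each node re-broadcasting its current value, but the fixed point is the same hop-count field; operators like MultiGrad and the Coordinate Operator can then be built by combining several such fields.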
Can It Do What I Want It To?
● Need to Prove Utility
– Simple Applications that do something Useful
● Traditional Services We Cannot Live Without
– Butera Implements:
● Streaming Audio
● Holistic Data Storage
● Surface Bus
● Image Segmentation
Where do we go from here?
● AMD and Intel
– 2-4 cores per processor
– cannot go faster
● Go wider!
● How far can we expand sideways?
● A job for Architecture!
How Good Is It?
● Can We Compare to Traditional?
– Apples = Oranges?
● Consider Two Cases
– Serial Operation
– Embarrassingly Parallel Operation
The Worst Case: Serial
● Cannot “optimize away” all serial operations
● Interactive Programs
– Shells will work
● low system requirements
● low communication overhead
– Will need a new device to do graphics
● Build output into the paintable?
● Integrate a larger processor for graphics?
● Can still do Multiprocessing
● Can make it massively fault tolerant
The Best Case: Parallel
● Where is our overhead?
– Getting the problem to the device
– Getting the problem off the device
– Sharing intermediates
● The Computation Scales
– with number of units
– Cost is critical
How Does Cost Scale?
● Butera's Die Assumptions:
– Large Die = 100 mm²
– Medium Die = 25 mm²
– Small Die = 1 mm²
● Current Processor Dies
– Pentium M = 84 mm²
– Pentium 4 = 131 mm²
– “Smithfield” Dual Core = 206 mm²
– Opteron Dual Core = 199 mm²
Peering Into the Process
● Butera's Defect Rate Analysis
– 200, 500, 1000 defects
● Class 1 Cleanroom
– 1 particle per ft³
– 30 cm (diameter) wafers = 0.785 ft²
● 250 to 1270 ft of linear air motion
– If process takes 5 days:
● 2 to 10.5 ft per hour air motion
● Assumptions are reasonable, even optimistic
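The per-hour figures follow directly from the slide's numbers; the short check below is mine (only the 250-1270 ft range and the 5-day duration come from the slide):

```python
# Checking the cleanroom arithmetic: linear feet of air motion spread
# over a 5-day fabrication process.

hours = 5 * 24                     # 5 days of process time
for feet in (250, 1270):           # linear air motion range from the slide
    print(round(feet / hours, 1))  # -> 2.1 then 10.6 ft per hour
```

That matches the slide's "2 to 10.5 ft per hour" to rounding, supporting the claim that the defect-rate assumptions are plausible for a Class 1 cleanroom.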
How Does Cost Scale?
● Butera's Calculations
Why Is This Relevant?
● Yield Ratio is ~200:1
– As much as 20,000% greater yield
How Can This Help?
● Consider a motivating problem
– Embarrassingly Parallel
– O(n²) computation
– O(n) input
– O(n) output
– No inter-node communication
● Only Need to Consider Problem Input/Output
– O(n · log8(n))
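This cost structure implies a crossover point, which the toy model below locates. The constants are illustrative assumptions of mine, not values from the thesis: the traditional machine pays the full O(n²) computation, while the paintable device pays only the O(n·log8(n)) I/O cost (the base-8 log reflecting the ~8-neighbor radius) plus a fixed startup overhead.

```python
import math

# Toy crossover model: serial machine does the O(n^2) work itself;
# the paintable device only pays I/O plus a fixed startup cost.

def traditional(n):
    return float(n * n)                      # O(n^2) computation

def paintable(n, startup=5000.0):
    return startup + n * math.log(max(n, 2), 8)   # startup + O(n log8 n) I/O

# Smallest problem size where the paintable device wins.
crossover = next(n for n in range(1, 10**6) if paintable(n) < traditional(n))
print(crossover)
```

Past that problem size the gap widens rapidly, since n² grows much faster than n·log8(n); this is the shape of the benefit-crossover chart on the next slide.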
How Does It Scale?
[Chart: “Paintable Benefit Crossover” — Run Time vs. Problem Size O(n²), comparing three curves: Embarrassing, With Startup Cost, and Traditional.]