Paintable Computing
A Presentation of: “Programming A Paintable Computer”
William Butera, PhD Thesis, MIT 2002
all images (c) their respective owners
The Goal
● Computing by the Liter
The Big Idea
● The Superlative Multi-Processor
● Inverse of Current Architecture Paradigm
● What are the hard problems?
– Are they worse than what has already been solved?
Architecture Problems
● Asynchronous devices
– No easy way to make synchronous
● Highly Unreliable Processors
Architecture Problems
● No Global Communication
● Unknown (and Unknowable) Topology
Architecture Problems
● Code must be compact
– Nodes cannot support large processes
– Working sets must be small
● Infinitely many paths to failure
The Solution
● New Architecture => New Solution
– Out with the old assumptions
● Self Assembly
– Better paradigm
– Redefines “success”
Complex Adaptive Systems
● Aggregate Behavior – simple parts => arbitrarily complex systems
● Statistical Output
– Local Interactions => Global State
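The local-interactions-to-global-state claim can be made concrete with a toy demo. The sketch below is my illustration, not from the thesis: nodes on a ring repeatedly replace their value with the average of themselves and their two immediate neighbors. No node sees the whole system, yet the ensemble converges to a single global consensus value.

```python
# Hypothetical illustration of a complex adaptive system: purely local
# smoothing on a ring drives the population to a global consensus.

def smooth(values, steps=200):
    v = list(values)
    n = len(v)
    for _ in range(steps):
        # Each node only talks to its left and right neighbors.
        v = [(v[i - 1] + v[i] + v[(i + 1) % n]) / 3 for i in range(n)]
    return v

start = [float(i) for i in range(10)]  # spread of 9 across the ring
end = smooth(start)
print(max(end) - min(end))  # spread collapses toward 0
```

The update is doubly stochastic, so the global mean is preserved while the spread decays geometrically: global structure emerges with no global communication, which is exactly the property the paintable architecture leans on.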
Implementing a Solution
● What sort of hardware is a good target?
– Cannot be too small
● Must be able to do useful work
– Cannot be too large
● Must be hard enough
Reference Standard “Paintable” Computer
-- Processing --
● Really tiny “traditional” architecture
– CPU: 10-200 MHz
– RAM: 50K words
– Bus: 16+ bit
– Programmable in traditional languages
● C, Java, etc.
Reference Standard “Paintable” Computer
-- Power --
● Unspecified interface
– Does not impinge on the architecture
● Examples
– Batteries
– Chemical substrate
– Photo-cell
– Structural power routing
– Fuel Cells
Reference Standard “Paintable” Computer
-- Networking --
● Directionless
● Bandwidth: 100 kbps Full Duplex
● Radius: ~8 particles
– Gaussian Random distribution of connectivity
● Example Technologies:
– luminescence
– electrostatic
– near-field RF
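The radio model above can be sketched as a random geometric graph. Everything in the snippet below is my construction (particle count, area, and radius are illustrative, with the radius tuned so the average degree lands near the slide's ~8 neighbors):

```python
import math
import random

# Sketch of the paintable radio neighborhood: scatter particles uniformly
# over a surface and connect every pair within a fixed radio radius.

def neighbor_counts(n=330, area=1.0, radius=0.09, seed=1):
    rng = random.Random(seed)
    pts = [(rng.random() * area, rng.random() * area) for _ in range(n)]
    counts = []
    for i, (xi, yi) in enumerate(pts):
        # Count every other particle inside this particle's radio radius.
        c = sum(1 for j, (xj, yj) in enumerate(pts)
                if i != j and math.hypot(xi - xj, yi - yj) <= radius)
        counts.append(c)
    return counts

counts = neighbor_counts()
print(sum(counts) / len(counts))  # average degree, near the slide's ~8
```

Because placement is random, the per-node degree varies around the mean, which is the “random distribution of connectivity” the slide refers to: software must tolerate nodes with few neighbors as well as many.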
The Pushpin Computer
● A real system
– An example of a paintable computer
● Model architecture
– 330 nodes
System Layout
● Separate communication, ground, and power
● Planes separated by flexible silicone insulation
Programming Model
● Program Fragments (PFrags)
– Computational Elements
● Shared Memory Partitions
– Inter-Process Communication
● Embedded OS
– Local Resource Control
– Special PFrag Services
Shared Memory Layout
● PFrag I/O
– Bassinet: Pre-Load Store
– Launch Pad: Post-Unload Store
● Data I/O
– Home Page: Output to Neighbors
– Mirrored Home Pages: Input From Neighbors
– Organized as key-value pairs
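The Home Page / Mirrored Home Page scheme can be sketched in a few lines. The names follow the slides, but the dict-based API below is my invention, not Butera's actual interface:

```python
# Sketch of paintable data I/O: each node publishes key-value pairs on its
# Home Page; the runtime copies neighbors' Home Pages into local mirrors,
# so all inter-node input is read locally.

class Node:
    def __init__(self, name):
        self.name = name
        self.home_page = {}   # output: key-value pairs visible to neighbors
        self.mirrors = {}     # input: neighbor name -> copy of their page
        self.neighbors = []

    def publish(self, key, value):
        self.home_page[key] = value

    def sync(self):
        # In hardware this copy happens over the radio; here we just copy.
        for nb in self.neighbors:
            self.mirrors[nb.name] = dict(nb.home_page)

a, b = Node("a"), Node("b")
a.neighbors = [b]
b.neighbors = [a]
b.publish("hops", 3)
a.sync()
print(a.mirrors["b"]["hops"])  # -> 3
```

The point of the design is that a PFrag never addresses a remote node directly: it writes to its own page and reads mirrored copies, which is what makes the model tolerant of nodes appearing and disappearing.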
OS Services 1 and 2 of 4
● Housekeeping
– Defragmenting Memory
– Resizing I/O Zone
– Mediates PFrag Access to I/O Regions
● Network Access
– Inter-Processor Communication
– Manages Joins/Leaves
OS Services 3 and 4 of 4
● Running PFrags
– Installs / Uninstalls the PFrag
– Runs the PFrag
● PFrag Services
– Mathematics
– Random Numbers
– Access to Memory
– Transit Request Messages
– etc.
PFrag Implementation
● Implements Five Functions
– Install
● Moves Self From Bassinet to Main Memory
– DeInstall
● Cleans Up and Erases Self
– Update
● Runs the process
PFrag Transit
● Transfer-Granted
– Cleans Up and Moves to Launch Pad for Transit
● Transfer-Refused
– Allows PFrag to Dequeue Transfer Request
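Taken together, the two slides above name the five PFrag entry points. The interface below is my sketch of that contract (the class shape and method signatures are assumptions, not Butera's actual API), with a trivial concrete PFrag to show how the OS would drive it:

```python
# Sketch of the five-function PFrag contract described on the slides.

class PFrag:
    def install(self, node):
        """Move self from the Bassinet into main memory; claim resources."""

    def deinstall(self, node):
        """Clean up state and erase self from the node."""

    def update(self, node):
        """Run one step of the process; called repeatedly by the OS."""

    def transfer_granted(self, node):
        """A neighbor accepted the move: clean up, go to the Launch Pad."""

    def transfer_refused(self, node):
        """A neighbor declined: dequeue the pending transfer request."""

class Counter(PFrag):
    """A minimal concrete PFrag: counts how often the OS schedules it."""
    def __init__(self):
        self.n = 0
    def update(self, node):
        self.n += 1

c = Counter()
for _ in range(3):   # the embedded OS would call update() in its loop
    c.update(None)
print(c.n)  # -> 3
```

Keeping the whole life cycle behind five small callbacks is what lets code migrate: a node only needs to invoke this fixed interface on whatever fragment happens to be resident.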
Does It Work?
● Need to prove Viability
– Simple Applications that we can use to:
● Test
● Validate
– Butera Implements:
● BreadCrumbs
● Near Sighted Mailman
● Knitting Club
What Is It Good For?
● What software will Motivate?
– Need a “Killer App”
● Only works well on a Paintable Architecture
– Butera Implements:
● Gradient
● MultiGrad
● Tessellation Operator
● Diffusion
● Channel Operator
● Coordinate Operator
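The Gradient operator is the most foundational of these primitives. The snippet below is my breadth-first rendering of its core idea, not Butera's code: a source emits value 0, and every node stores one plus the smallest value heard from a neighbor, so each node ends up knowing its hop-count distance to the source.

```python
from collections import deque

# Sketch of the Gradient operator: flood a hop count outward from a source
# node using only neighbor-to-neighbor messages.

def gradient(neighbors, source):
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in neighbors[u]:
            if v not in dist:          # first message heard wins: fewest hops
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

# A three-node chain: 0 - 1 - 2
nbrs = {0: [1], 1: [0, 2], 2: [1]}
print(gradient(nbrs, 0))  # -> {0: 0, 1: 1, 2: 2}
```

On real paintable hardware this runs asynchronously, with each node re-broadcasting its current value, but the fixed point is the same hop-count field; operators like MultiGrad and the Coordinate Operator can then be built by combining several such fields.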
Can It Do What I Want It To?
● Need to Prove Utility
– Simple Applications that do something Useful
● Traditional Services We Cannot Live Without
– Butera Implements:
● Streaming Audio
● Holistic Data Storage
● Surface Bus
● Image Segmentation
Where do we go from here?
● AMD and Intel
– 2-4 cores per processor
– cannot go faster
● Go wider!
● How far can we expand sideways?
● A job for Architecture!
How Good Is It?
● Can We Compare to Traditional?
– Apples = Oranges?
● Consider Two Cases
– Serial Operation
– Embarrassingly Parallel Operation
The Worst Case: Serial
● Cannot “optimize away” all serial operations
● Interactive Programs
– Shells will work
● low system requirements
● low communication overhead
– Will need a new device to do graphics
● Build output into the paintable?
● Integrate a larger processor for graphics?
● Can still do Multiprocessing
● Can make it massively fault tolerant
The Best Case: Parallel
● Where is our overhead?
– Getting the problem to the device
– Getting the problem off the device
– Sharing intermediates
● The Computation Scales
– with number of units
– Cost is critical
How Does Cost Scale?
● Butera's Die Assumptions:
– Large Die = 100 mm²
– Medium Die = 25 mm²
– Small Die = 1 mm²
● Current Processor Dies
– Pentium M = 84 mm²
– Pentium 4 = 131 mm²
– “Smithfield” Dual Core = 206 mm²
– Opteron Dual Core = 199 mm²
Peering Into the Process
● Butera's Defect Rate Analysis
– 200, 500, 1000 defects
● Class 1 Cleanroom
– 1 particle per ft³
– 30 cm (diameter) wafers = 0.785 ft²
● 250 to 1270 ft of linear air motion
– If process takes 5 days:
● 2 to 10.5 ft per hour air motion
● Assumptions are reasonable, even optimistic
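The per-hour figures follow directly from the slide's numbers; the short check below is mine (only the 250-1270 ft range and the 5-day duration come from the slide):

```python
# Checking the cleanroom arithmetic: linear feet of air motion spread
# over a 5-day fabrication process.

hours = 5 * 24                     # 5 days of process time
for feet in (250, 1270):           # linear air motion range from the slide
    print(round(feet / hours, 1))  # -> 2.1 then 10.6 ft per hour
```

That matches the slide's "2 to 10.5 ft per hour" to rounding, supporting the claim that the defect-rate assumptions are plausible for a Class 1 cleanroom.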
How Does Cost Scale?
● Butera's Calculations
Why Is This Relevant?
● Yield Ratio is ~200:1
– As much as 20,000% greater yield
How Can This Help?
● Consider a motivating problem
– Embarrassingly Parallel
– O(n²) computation
– O(n) input
– O(n) output
– No inter-node communication
● Only Need to Consider Problem Input/Output
– O(n · log8(n))
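This cost structure implies a crossover point, which the toy model below locates. The constants are illustrative assumptions of mine, not values from the thesis: the traditional machine pays the full O(n²) computation, while the paintable device pays only the O(n·log8(n)) I/O cost (the base-8 log reflecting the ~8-neighbor radius) plus a fixed startup overhead.

```python
import math

# Toy crossover model: serial machine does the O(n^2) work itself;
# the paintable device only pays I/O plus a fixed startup cost.

def traditional(n):
    return float(n * n)                      # O(n^2) computation

def paintable(n, startup=5000.0):
    return startup + n * math.log(max(n, 2), 8)   # startup + O(n log8 n) I/O

# Smallest problem size where the paintable device wins.
crossover = next(n for n in range(1, 10**6) if paintable(n) < traditional(n))
print(crossover)
```

Past that problem size the gap widens rapidly, since n² grows much faster than n·log8(n); this is the shape of the benefit-crossover chart on the next slide.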
How Does It Scale?
[Chart: “Paintable Benefit Crossover” — Run Time vs. Problem Size O(n²), comparing three curves: Embarrassing, With Startup Cost, and Traditional.]