The von Neumann Syndrome

Transcript
Page 1: The von Neumann Syndrome

The von Neumann Syndrome

Reiner Hartenstein

TU Kaiserslautern

TU Delft, Sept 28, 2007

http://hartenstein.de

(v.2)

Page 2: The von Neumann Syndrome

© 2007, [R.H.] http://hartenstein.de

TU Kaiserslautern

von Neumann Syndrome

this term has been coined by “RAM” (C.V. Ramamoorthy, emeritus, UC Berkeley)

Page 3: The von Neumann Syndrome

The first Reconfigurable Computer

•prototyped 1884 by Herman Hollerith

•a century before FPGA introduction

•data-stream-based

•60 years later the von Neumann (vN) model took over

•instruction-stream-based

Reiner Hartenstein
Herman Hollerith, born 29 Feb 1860 in Buffalo
Page 4: The von Neumann Syndrome

Outline

• von Neumann overhead hits the memory wall
• The manycore programming crisis
• Reconfigurable Computing is the solution
• We need a twin paradigm approach
• Conclusions

Page 5: The von Neumann Syndrome

The spirit of the Mainframe Age

•For decades, we’ve trained programmers to think sequentially, breaking complex parallelism down into atomic instruction steps …

•Even in “hardware” courses (the unloved child of the CS scene) we often teach von Neumann machine design – deepening this tunnel view

•… finally leading to code sizes of astronomic dimensions

•1951: Hardware Design going von Neumann (Microprogramming)

Page 6: The von Neumann Syndrome

von Neumann: array of massive overhead phenomena

overhead | von Neumann machine
instruction fetch | instruction stream
state address computation | instruction stream
data address computation | instruction stream
data meet PU | instruction stream
i/o to / from off-chip RAM | instruction stream
… other overhead | instruction stream

… piling up to code sizes of astronomic dimensions

Page 7: The von Neumann Syndrome

von Neumann: array of massive overhead phenomena

piling up to code sizes of astronomic dimensions

•[R.H. 1975] universal bus considered harmful

•[Dijkstra 1968] the “go to” considered harmful

•temptations by von Neumann style software engineering

•massive communication congestion

•Backus, 1978: Can programming be liberated from the von Neumann style?

•Arvind et al., 1983: A critique of Multiprocessing von Neumann Style

Page 8: The von Neumann Syndrome

von Neumann overhead: just one example

[1989]: 94% of the computation load only for moving this window (image processing example)

Page 9: The von Neumann Syndrome

the Memory Wall

[Figure: Dave Patterson’s Law, the “Performance” gap: performance on a log scale from 1 to 1000, 1980 to 2000; µProc 60%/yr vs. DRAM 7%/yr]

gap growth 50% / year, ends in 2005 (2005: ~1000)

… needs off-chip RAM which fully hits instruction stream code size of astronomic dimensions …

CPU clock speed ≠ performance: processor’s silicon is mostly cache

better compare off-chip vs. fast on-chip memory

Reiner Hartenstein
processors are not that good
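The 50% / year figure follows from the two growth rates on the slide: if µProc performance grows 60% per year and DRAM performance 7% per year, their ratio grows by a factor of 1.60 / 1.07 ≈ 1.50 each year. A minimal sketch of that arithmetic, assuming both rates stay constant (the function names are illustrative, not from the slides):

```python
def gap_growth_per_year(cpu_rate: float, dram_rate: float) -> float:
    """Yearly growth of the CPU/DRAM performance ratio, given each side's yearly growth rate."""
    return (1.0 + cpu_rate) / (1.0 + dram_rate) - 1.0


def gap_after(years: int, cpu_rate: float, dram_rate: float) -> float:
    """Factor by which the gap widens after `years`, assuming both rates stay constant."""
    return ((1.0 + cpu_rate) / (1.0 + dram_rate)) ** years


if __name__ == "__main__":
    rate = gap_growth_per_year(0.60, 0.07)  # slide values: µProc 60%/yr, DRAM 7%/yr
    print(f"gap grows by about {rate:.0%} per year")
```

This prints "gap grows by about 50% per year", matching the slide.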
Page 10: The von Neumann Syndrome

Benchmarked Computational Density

[BWRC, UC Berkeley, 2004]

[Figure: SPECfp2000/MHz/Billion Transistors (scale 0 to 200), 1990 to 2005, for DEC alpha, SUN, HP and IBM; alpha: down by 100 in 6 yrs; IBM: down by 20 in 6 yrs]

stolen from Bob Colwell

CPU caches ...

CPU clock speed ≠ performance:processor’s silicon is mostly cache

Reiner Hartenstein
Intel curve removed; meanwhile all curves have been removed from the RAMP website
Page 11: The von Neumann Syndrome

Outline

• von Neumann overhead hits the memory wall
• The manycore programming crisis
• Reconfigurable Computing is the solution
• We need a twin paradigm approach
• Conclusions

Page 12: The von Neumann Syndrome

The Manycore future

• we are embarking on a new computing age -- the age of massive parallelism [Burton Smith]

• multiple von Neumann CPUs on the same µprocessor chip lead to exploding (vN) instruction stream overhead [R.H.]

• Even mobile devices will exploit multicore processors, also to extend battery life [B.S.]

• everyone will have multiple parallel computers [B.S.]

Page 13: The von Neumann Syndrome

Several overhead phenomena

The instruction-stream-based parallel von Neumann approach: the watering pot model [Hartenstein] has several von Neumann overhead phenomena per CPU!

[Figure: array of CPUs]

Page 14: The von Neumann Syndrome

Explosion of overhead by von Neumann parallelism

overhead | von Neumann machine
monoprocessor: local overhead, proportionate to the number of processors
instruction fetch | instruction stream
state address computation | instruction stream
data address computation | instruction stream
data meet PU | instruction stream
i / o to / from off-chip RAM | instruction stream
… other overhead | instruction stream
parallel: global overhead, disproportionate to the number of processors
inter PU communication | instruction stream
message passing | instruction stream

[R.H. 2006] MPI considered harmful

[Figure: array of CPUs]
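The slide does not give a formula for “disproportionate”. One simple way to see it, assuming all-to-all message passing in which every CPU exchanges messages with every other CPU, is to compare a fixed per-CPU (local) overhead, which grows linearly, with the number of communicating sender/receiver pairs, which grows quadratically. Both cost functions below are hypothetical illustrations, not measurements:

```python
def local_overhead(n_cpus: int, per_cpu: float = 1.0) -> float:
    """Local vN overhead: a fixed amount per CPU, so it grows linearly (proportionate)."""
    return n_cpus * per_cpu


def all_to_all_channels(n_cpus: int) -> int:
    """Ordered sender/receiver pairs in all-to-all message passing: n * (n - 1)."""
    return n_cpus * (n_cpus - 1)


if __name__ == "__main__":
    for n in (2, 4, 8, 16, 32):
        # Channels per CPU keep growing with n: the global overhead is disproportionate.
        print(n, local_overhead(n), all_to_all_channels(n), all_to_all_channels(n) / n)
```

Other communication patterns (neighbour-only meshes, trees) grow more slowly; the quadratic case is just the easiest to count.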

Page 15: The von Neumann Syndrome

Rewriting Applications

•more processors means rewriting applications

•we need to map an application onto manycore configurations of different sizes

•most applications are not readily mappable onto a regular array.

[Figure: a CPU array (manycore) next to an rDPU array (Reconfigurable Computing)]

•Mapping is much less problematic with Reconfigurable Computing

Page 16: The von Neumann Syndrome

Disruptive Development

•Computer industry is probably going to be disrupted by some very fundamental changes. [Ian Barron]

•I don’t agree: we have a model.

•A parallel [vN] programming model for manycore machines will not emerge for five to 10 years [experts from Microsoft Corp].

•We must reinvent computing. [Burton J. Smith]

•Reconfigurable Computing: Technology is Ready, Users are Not

•It’s mainly an education problem: The Education Wall

Reiner Hartenstein
… does not support massive parallelism in large systems …
Page 17: The von Neumann Syndrome

Outline

• von Neumann overhead hits the memory wall
• The manycore programming crisis
• Reconfigurable Computing is the solution
• We need a twin paradigm approach
• Conclusions

Page 18: The von Neumann Syndrome

The Reconfigurable Computing Paradox

•The spirit of the Mainframe Age is collapsing under the von Neumann syndrome

• There is something fundamentally wrong in using the von Neumann paradigm

•Up to 4 orders of magnitude speedup, plus a drastically reduced electricity bill, by migration to FPGA

•Bad FPGA technology: reconfigurability overhead, wiring overhead, routing congestion, slow clock speed

• The reason for this paradox?

Page 19: The von Neumann Syndrome

beyond von Neumann Parallelism

The instruction-stream-based von Neumann approach: the watering pot model [Hartenstein] has several von Neumann overhead phenomena per CPU!

We need an approach like this: it’s data-stream-based RC*

*) “RC” = Reconfigurable Computing

Page 20: The von Neumann Syndrome

von Neumann overhead vs. Reconfigurable Computing

overhead | von Neumann machine | hardwired anti machine / reconfigurable anti machine
instruction fetch | instruction stream | none*
state address computation | instruction stream | none*
data address computation | instruction stream | none*
data meet PU + other overh. | instruction stream | none*
i / o to / from off-chip RAM | instruction stream | none*
Inter PU communication | instruction stream | none*
message passing overhead | instruction stream | none*

von Neumann machine: using program counter; hardwired anti machine: using data counters; reconfigurable anti machine: using reconfigurable data counters

*) configured before run time

[Figure: rDPA: reconfigurable datapath array (coarse-grained rec.) built from rDPUs; no instruction fetch at run time]

Page 21: The von Neumann Syndrome

von Neumann overhead vs. Reconfigurable Computing

(same overhead table as on the previous slide)

*) configured before run time

**) just by reconfigurable address generator

[Figure: rDPA: reconfigurable datapath array (coarse-grained rec.)]

[1989]: x 17 speedup by GAG** (image processing example)

[1989]: x 15,000 total speedup from this migration project

Page 22: The von Neumann Syndrome

Reconfigurable Computing means …

• Reconfigurable Computing means moving overhead from run time to compile time**

• For HPC, run time is more precious than compile time

• Reconfigurable Computing replaces “looping” at run time* …

http://www.tnt-factory.de/videos_hamster_im_laufrad.htm

… by configuration before run time

*) e. g. complex address computation

**) or loading time
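A rough software analogy for moving address computation from run time to configuration time, assuming a 3 x 3 window over a row-major image: the window’s address offsets are computed once in a configuration step, and the run-time loop only adds a running base address to that fixed table. In Reconfigurable Computing the address sequence would come from configured hardware (such as a GAG) rather than from a loop; the function names here are hypothetical.

```python
from typing import List, Sequence


def configure_window_offsets(width: int, radius: int = 1) -> List[int]:
    """Configuration step (done once): flat-address offsets of a square window
    in a row-major image of the given width."""
    return [dy * width + dx
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]


def window_sums(image: Sequence[int], width: int, height: int,
                offsets: List[int], radius: int = 1) -> List[int]:
    """Run step: sum each full window using only the precomputed offset table.
    Pixels whose window would leave the image are skipped."""
    out: List[int] = []
    for y in range(radius, height - radius):
        base = y * width + radius  # address of the first window centre in this row
        for _ in range(radius, width - radius):
            out.append(sum(image[base + off] for off in offsets))
            base += 1  # slide the window one pixel to the right
    return out


if __name__ == "__main__":
    offsets = configure_window_offsets(width=4)  # computed once, before the "run"
    print(window_sums(list(range(16)), 4, 4, offsets))
```

For the 4 x 4 image holding 0 to 15 this prints [45, 54, 81, 90]. The `radius` passed to both functions must match.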

Page 23: The von Neumann Syndrome

Data meeting the Processing Unit (PU)

We have 2 choices:

by Software: routing the data by memory-cycle-hungry instruction streams through shared memory

by Configware: data-stream-based: placement* of the execution locality ... pipe network generated by configware compilation

... explaining the RC advantage

*) before run time

Page 24: The von Neumann Syndrome

pipe network, organized at compile time

rDPA = rDPU array, i. e. coarse-grained

rDPU = reconf. datapath unit (no program counter)

What pipe network?

[Figure: two rDPAs, each an array of rDPUs, with array ports receiving or sending a data stream]

Generalization* of the systolic array [R. Kress, 1995]

*) supporting non-linear pipes on free form hetero arrays

depending on connect fabrics
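As a small illustration of a pipe network of datapath cells without program counters, the sketch below simulates a transposed-form FIR filter clock by clock: each cell holds a fixed weight (its “configuration”) and one pipeline register, and one sample of the data stream enters per tick. This is a simplification chosen for brevity, not the KressArray mapping: the input sample is broadcast to all cells, which a strict systolic array avoids. The function names are hypothetical.

```python
from typing import Iterable, List, Sequence


def fir_pipe(weights: Sequence[float], samples: Iterable[float]) -> List[float]:
    """Clock-by-clock model of a linear pipe of multiply-add cells
    (transposed-form FIR). Cell i holds the fixed weight weights[i]."""
    k = len(weights)
    regs = [0.0] * k  # regs[i]: partial sum latched by cell i on the previous tick
    out: List[float] = []
    for x in samples:  # one tick per incoming sample of the data stream
        # All cells fire in parallel: weight * sample plus the partial sum
        # that the right-hand neighbour latched on the previous tick.
        new = [weights[i] * x + (regs[i + 1] if i + 1 < k else 0.0) for i in range(k)]
        out.append(new[0])  # cell 0 delivers a finished output sample
        regs = new  # clock edge: every cell latches its result
    return out


def fir_direct(weights: Sequence[float], samples: Sequence[float]) -> List[float]:
    """Reference: y[n] = sum_i weights[i] * samples[n - i]."""
    return [sum(w * samples[n - i] for i, w in enumerate(weights) if n - i >= 0)
            for n in range(len(samples))]


if __name__ == "__main__":
    print(fir_pipe([1, 2, 3], [1, 0, 0, 0]))  # impulse response
```

Feeding an impulse prints [1.0, 2.0, 3.0, 0.0], i.e. the configured weights stream out of the pipe; `fir_direct` is only there as a sequential reference to compare against.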

Page 25: The von Neumann Syndrome

Migration benefit by on-chip RAM

Some RC chips have hundreds of on-chip RAM blocks, orders of magnitude faster than off-chip RAM,

so that the drastic code size reduction by software to configware migration can beat the memory wall

multiple on-chip RAM blocks are the enabling technology for ultra-fast anti machine solutions

[Figure: an rDPA surrounded by ASMs; each ASM contains a GAG (data counter) and a RAM block]

ASM: Auto-Sequencing Memory

GAG = generic address generator; GAGs inside ASMs generate the data streams

rDPA = rDPU array, i. e. coarse-grained

rDPU = reconf. datapath unit (no program counter)
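To illustrate what a data counter does, the sketch below models a GAG as a two-level address generator whose parameters (base address, counts, strides) are fixed before run time; at run time it only steps counters and emits addresses, with no instruction fetch. An ASM is approximated as a RAM list read out in GAG order. The parameter set and names are assumptions for illustration, not the actual GAG design.

```python
from typing import Iterator, List


def gag_2d(base: int, inner_count: int, inner_stride: int,
           outer_count: int, outer_stride: int) -> Iterator[int]:
    """Hypothetical 2-D generic address generator built from two data counters."""
    outer_addr = base
    for _ in range(outer_count):
        addr = outer_addr
        for _ in range(inner_count):
            yield addr
            addr += inner_stride  # inner data counter steps
        outer_addr += outer_stride  # outer data counter steps


def asm_stream(ram: List[int], addresses: Iterator[int]) -> Iterator[int]:
    """Simplified auto-sequencing memory: emit RAM words in the order the GAG dictates."""
    for a in addresses:
        yield ram[a]


if __name__ == "__main__":
    ram = [10, 11, 12, 20, 21, 22]  # a 2 x 3 matrix stored row by row
    # Column-by-column scan: 3 columns (stride 1), 2 rows per column (stride 3).
    print(list(asm_stream(ram, gag_2d(0, 2, 3, 3, 1))))
```

The column scan yields addresses 0, 3, 1, 4, 2, 5 and prints [10, 20, 11, 21, 12, 22]; changing only the parameters (for example `gag_2d(0, 3, 1, 2, 3)`) turns it into a row scan without touching any code that runs per datum.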

Page 26: The von Neumann Syndrome

Coarse-grained Reconfigurable Array example

image processing: SNN filter (mainly a pipe network)

note: kind of software perspective, but without instruction streams: data streams + pipelining

compiled by Nageldinger’s KressArray Xplorer (Juergen Becker’s CoDe-X inside)

array size: 10 x 16 = 160 such rDPUs, 32 bits wide, mesh-connected; exceptions: see legend

3 x 3 fast on-chip RAM

[Figure: mapped SNN filter with ASMs around the array; legend: rDPU not used, used for routing only, operator and routing, port location marker, backbus connect]

coming close to programmer’s mind set (much closer than FPGA)
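For readers unfamiliar with the SNN (symmetric nearest neighbour) filter, here is a plain sequential reference of one common 3 x 3 variant, assumed here because the slide does not define it: for each of the four pairs of opposite neighbours, keep the value closer to the centre pixel, and output the mean of the four kept values. It only shows what the pipe network computes, not how KressArray Xplorer maps it; copying border pixels unchanged is also an assumption.

```python
from typing import List


def snn3x3(img: List[List[float]]) -> List[List[float]]:
    """3 x 3 symmetric nearest neighbour filter (one common variant).
    Border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [list(row) for row in img]
    # The four pairs of opposite neighbours around the centre pixel.
    pairs = [((-1, -1), (1, 1)), ((-1, 0), (1, 0)),
             ((-1, 1), (1, -1)), ((0, -1), (0, 1))]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            chosen = []
            for (ay, ax), (by, bx) in pairs:
                a, b = img[y + ay][x + ax], img[y + by][x + bx]
                # Keep whichever pair member is closer in value to the centre.
                chosen.append(a if abs(a - c) <= abs(b - c) else b)
            out[y][x] = sum(chosen) / len(chosen)
    return out


if __name__ == "__main__":
    edge = [[0, 0, 10], [0, 0, 10], [0, 0, 10]]
    print(snn3x3(edge)[1][1])  # the edge is preserved
```

On this vertical edge the centre stays 0.0, whereas a plain 3 x 3 mean would blur it to 10/3; that edge-preserving behaviour is why SNN is used in image processing.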

Page 27: The von Neumann Syndrome

Outline

• von Neumann overhead hits the memory wall
• The manycore programming crisis
• Reconfigurable Computing is the solution
• We need a twin paradigm approach
• Conclusions

Page 28: The von Neumann Syndrome

Software / Configware Co-Compilation

apropos compilation: The CoDe-X co-compiler [Juergen Becker, 1996]

[Figure: C language source -> Analyzer / Profiler and Partitioner -> SW code -> SW compiler (“vN” machine paradigm); CW code -> CW compiler (anti machine paradigm); FW code]

But we need a dual paradigm approach: to run legacy software together with configware

Reconfigurable Computing: Technology is Ready. Users are Not?

Page 29: The von Neumann Syndrome

Curricula from the mainframe age

non-von-Neumann accelerators

(procedural) structurally disabled

no common model

the education wall

not really taught

the main problem

the common model is ready, but users are not

(this is not a lecture on brain regions)

Page 30: The von Neumann Syndrome

We need a twin paradigm education

Brain Usage: both Hemispheres

each side needs its own common model

procedural / structural

(this is not a lecture on brain regions)

Page 31: The von Neumann Syndrome

RCeducation 2008

http://fpl.org/RCeducation/

The 3rd International Workshop on Reconfigurable Computing Education

April 10, 2008, Montpellier, France

teaching RC?

Page 32: The von Neumann Syndrome

We need new courses

“We urgently need a Mead-&-Conway-like text book” [R. H., Dagstuhl Seminar 03301, Germany, 2003]

We need undergraduate lab courses with HW / CW / SW partitioning

We need new courses with extended scope on parallelism and algorithmic cleverness for HW / CW / SW co-design

2007: Here it is!

Page 33: The von Neumann Syndrome

Outline

• von Neumann overhead hits the memory wall
• The manycore programming crisis
• Reconfigurable Computing is the solution
• We need a twin paradigm approach
• Conclusions

Page 34: The von Neumann Syndrome

Conclusions

•Data streaming is the key model of parallel computation – not vN

•Von-Neumann-type instruction streams considered harmful [RH]

•But we still need von Neumann for some small code sizes, old legacy software, etc. …

•The twin paradigm approach is inevitable, also in education [R. H.].

•We need to increase the population of HPC-competent people [B.S.]

•We need to increase the population of RC-competent people [R.H.]

Page 35: The von Neumann Syndrome

An Open Question

please, reply to:

• Coarse-grained arrays: technology ready*, users not ready

• Much closer to programmer’s mind set: really much closer than FPGAs**

•Which effect is delaying the breakthrough?

*) offered by startups (PACT Corp. and others)

**) “FPGAs? Do we need to learn hardware design?”

Page 36: The von Neumann Syndrome

thank you

Page 37: The von Neumann Syndrome

END

Page 39: The von Neumann Syndrome

Disruptive Development

The way the industry has grown up writing software - the languages we chose, the model of synchronization and orchestration, do not lead toward uncovering parallelism for allowing large-scale composition of big systems. [Ian Barron]

Reiner Hartenstein
… does not support massive parallelism in large systems …
Page 40: The von Neumann Syndrome

Dual paradigm mind set: an old hat

Software mind set: instruction-stream-based: flow chart -> control instructions

Mapped into a Hardware mind set: action box = Flipflop, decision box = (de)multiplexer

(mapping from procedural to structural domain)

[Figure: flip-flops (FF) passing a token bit that evokes each action]

1967: W. A. Clark: Macromodular Computer Systems; 1967 SJCC, AFIPS Conf. Proc.

1972: C. G. Bell et al: The Description and Use of Register-Transfer Modules (RTM's); IEEE Trans. C-21/5, May 1972
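The hardware mapping above can be sketched as a behavioural simulation of a one-hot controller for a small, hypothetical flow chart that sums 1..n: each action box becomes a flip-flop, exactly one of which holds the token bit, and the decision box acts as a demultiplexer that routes the token. As a simplification, the simulation applies each register transfer before routing the token and ignores clock-edge timing details.

```python
def sum_to_n_token_machine(n: int) -> int:
    """Behavioural sketch of the flow chart INIT -> [i == 0 ?] -> STEP -> back to the test, else DONE."""
    if n < 0:
        raise ValueError("n must be non-negative")
    token = {"INIT": True, "STEP": False, "DONE": False}  # one flip-flop per box, one-hot
    acc = i = 0
    while not token["DONE"]:
        # The action box whose flip-flop holds the token evokes its register transfer.
        if token["INIT"]:
            acc, i = 0, n
        elif token["STEP"]:
            acc, i = acc + i, i - 1
        # Decision box as a demultiplexer: route the token on the condition i == 0.
        target = "DONE" if i == 0 else "STEP"
        token = {box: box == target for box in token}
        assert sum(token.values()) == 1  # exactly one token in the circuit
    return acc


if __name__ == "__main__":
    print(sum_to_n_token_machine(4))
```

This prints 10. No instruction is fetched: control is just the position of the token bit, which is the procedural-to-structural mapping the slide describes.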

