Enabling Technologies for Reconfigurable Computing
Part 2: Stream-based Computing for RC
Wednesday, November 21, 10.30 – 12.00 hrs.
Reiner Hartenstein
University of Kaiserslautern
November 21, 2001, Tampere, Finland
© 2001, reiner@hartenstein.de http://www.fpl.uni-kl.de
University of Kaiserslautern
Xputer Lab
Schedule
time slot        topic
08.30 – 10.00    Reconfigurable Computing (RC)
10.00 – 10.30    coffee break
10.30 – 12.00    Stream-based Computing for RC
12.00 – 14.00    lunch break
14.00 – 15.30    Resources for RC
15.30 – 16.00    coffee break
16.00 – 17.30    FPGAs: recent developments

>> EDA revolution
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation
http://www.uni-kl.de

EDA: where Electronics begins [Richard Newton]
[Figure: EDA index compared with the NASDAQ index; labels: Dataquest Initiative, new book]
[Richard Newton]

The End is near
[Figure: transistors/chip (log scale, 10^0 to 10^15) vs. year to market (1960 to 2040)]
growth: x1.6/year, i.e. x100/decade
The end of Hypergrowth?
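The two growth figures on this slide are the same rate, stated once per year and once per decade. A minimal sketch of the arithmetic (the function name is only illustrative):

```python
def growth_per_decade(factor_per_year: float) -> float:
    """Compound a yearly growth factor over ten years."""
    return factor_per_year ** 10


decade_factor = growth_per_decade(1.6)
# 1.6 ** 10 is about 110, which the slide rounds to x100/decade
print(f"x{decade_factor:.0f} per decade")
```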
Paradigm Shift
Mainstream
Tornado
Development of Hypergrowth Markets
Harper Business 1995

Makimoto’s 3rd wave
The next EDA Industry Revolution
EDA industry paradigm switching every 7 years [Keutzer / Newton]
1978 Transistor entry: Applicon, Calma, CV ...
1985 Schematics entry: Daisy, Mentor, Valid ...
1992 Synthesis: Cadence, Synopsys ...
1999 (Co-) Compilation: Stream-based DPU arrays [Hartenstein]
2006

Biggest Mistake in History

Innovation Stalled? [Richard Newton]
What is next after VHDL?

What is next after VHDL?
Motivations
• HDL-savvy designers needed
• New Business Model
• Co-Design never ending
• HDLs?
• Extended HDLs – how far?
• Automatic Partitioning

>> Dead Supercomputer
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation
http://www.uni-kl.de

Dead Supercomputer Society
• 37 university and corporate R&D projects: 2 or 3 successes…
• All the rest failed to work or to be successful (Research 1985-1995)

Dead Supercomputer Society [Gordon Bell, keynote at ISCA 2000]
• ACRI • Alliant • American Supercomputer • Ametek • Applied Dynamics • Astronautics
• BBN • CDC • Convex • Cray Computer • Cray Research • Culler-Harris • Culler Scientific
• Cydrome • Dana/Ardent/Stellar/Stardent • DAP (ICL) • Denelcor • Elexsi • ETA Systems
• Evans and Sutherland Computer • Floating Point Systems • Galaxy YH-1
• Goodyear Aerospace MPP • Gould NPL • Guiltech • Intel Scientific Computers
• International Parallel Machines • Kendall Square Research • Key Computer Laboratories
• MasPar • Meiko • Multiflow • Myrias • Numerix • Prisma • Tera • Thinking Machines
• Saxpy • Scientific Computer Systems (SCS) • Soviet Supercomputers • Supertek
• Supercomputer Systems • Suprenum • Vitesse Electronics

>> Stream-based Computing
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation
http://www.uni-kl.de

Coarse Grain Reconfigurable Arrays vs. Parallel Processes
[Figure, left: an array of processors, each an instruction sequencer (I-Seq) with an ALU; hardwired. Parallelism at Process Level]
[Figure, right: one Data Sequencer feeding a 3 x 3 array of rALUs; reconfigurable. Parallelism at Datapath Level]
no instruction sequencing!

Concurrent Computing
[Figure: several CPUs, each a DPU with its own instruction sequencer, connected by bus(es) or a switch box]
extremely inefficient

Stream-based Computing
[Figure: a chain of DPUs]
driven by data streams from / to memory, or from / to a peripheral interface
transport-triggered execution: no instruction sequencer inside!
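Transport-triggered execution can be pictured in a few lines of software. The sketch below is only an analogy, not the Xputer Lab implementation: each DPU is modelled as a Python generator that fires whenever a data item arrives from upstream, and the names `dpu` and `pipeline` are assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator, List


def dpu(op: Callable[[int], int], stream: Iterable[int]) -> Iterator[int]:
    """One DPU: fires once per arriving data item, no instruction fetch."""
    for item in stream:
        yield op(item)


def pipeline(stream: Iterable[int], ops: List[Callable[[int], int]]) -> Iterator[int]:
    """Chain DPUs so that the output stream of one feeds the next."""
    for op in ops:
        stream = dpu(op, stream)
    return iter(stream)


# Data stream "from memory" through three DPUs: +1, then *2, then -3
memory = [1, 2, 3, 4]
result = list(pipeline(memory, [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]))
print(result)  # [1, 3, 5, 7]
```

The operators are fixed before the stream starts, which mirrors configuration at load time; at run time only data moves.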

Stream-based Computing: (r)DPU array
for both: reconfigurable, and hardwired
[Figure: 3 x 3 array of DPUs]
driven by data streams

>>> extremely high efficiency
• avoiding address computation overhead
• avoiding instruction fetch and interpretation overhead
• high parallelism, massively multiple deep pipelines
• much less configuration memory
• no routing areas to configure functions from CLBs

Systolic Stream-based Computing System
Systolic Array [H. T. Kung, 1980]: an array of DPUs (Data Path Units)
[Figure: systolic array with data streams x1..x3 and a11..a33; initial values y10, y20, y30 enter, results y1, y2, y3 leave]
DPU architecture: multiply-add on inputs y, x and a (operators +, *)
The Mathematician’s Synthesis Method: from the equations, placement by linear projection or algebraic mapping, yielding the data streams
linear pipelines and uniform arrays only
no routing!
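A small cycle-by-cycle simulation may help to see how such an array computes y = A·x + y0 with nothing but multiply-add DPUs. This is a sketch under assumptions: it uses one particular projection (x stays in the cells, partial sums move one cell per cycle, the coefficients arrive skewed in time), which is not necessarily the arrangement drawn on the slide, and the function name is made up.

```python
def systolic_matvec(a, x, y0):
    """Simulate a linear systolic array of len(x) multiply-add DPUs.

    Cell j holds x[j]. The partial sum of row i enters cell 0 at cycle i
    and reaches cell j at cycle i + j, together with coefficient a[i][j].
    """
    n, rows = len(x), len(a)
    pipe = [None] * n          # output register of each cell
    results = []
    for t in range(rows + n - 1):
        new_pipe = [None] * n
        for j in range(n):
            i = t - j          # row whose partial sum is in cell j now
            if 0 <= i < rows:
                y_in = y0[i] if j == 0 else pipe[j - 1]
                new_pipe[j] = y_in + a[i][j] * x[j]   # the DPU operation
        pipe = new_pipe
        if pipe[n - 1] is not None:
            results.append(pipe[n - 1])   # a finished y leaves the array
    return results


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y = systolic_matvec(a, x=[1, 0, 2], y0=[10, 20, 30])
print(y)  # [17, 36, 55]
```

There is no address computation and no instruction fetch inside the loop over cells; the timing of the streams alone decides which operands meet.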

Computing in space and time
[Figure: the same systolic array with data streams x1..x3, a11..a33, y10..y30 and results y1..y3]
computing in space: placement (systolic arrays etc.)
computing in time: data streams
migration by re-timing and other transformations
this dichotomy is completely ignored by our CS curricula

General Stream-based Computing System
heterogeneous array of DPUs (data path units)
[Figure: an expression tree and the DPU architectures (e.g. multiply-add on y, x, a) go into the Mapper, which does simultaneous placement & routing by simulated annealing; the Scheduler forms the data streams; the result is a free-form pipe network of DPUs with operators such as +, *, sh, xf, -]
Kress DPSS [1995]
The same mapper for both: reconfigurable, or hardwired
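To give a feel for mapping by simulated annealing, here is a heavily simplified sketch. It is not the DPSS algorithm: it only places operators on a grid and estimates routing cost by Manhattan distance, whereas the slide speaks of simultaneous placement and routing. All names, the cooling schedule and the cost function are assumptions.

```python
import math
import random


def wire_length(placement, edges):
    """Sum of Manhattan distances over all connections."""
    return sum(abs(placement[u][0] - placement[v][0]) +
               abs(placement[u][1] - placement[v][1]) for u, v in edges)


def to_placement(slots, cols):
    """Turn the flat slot list into {node: (row, col)}."""
    return {n: divmod(i, cols) for i, n in enumerate(slots) if n is not None}


def anneal_placement(nodes, edges, rows, cols, steps=4000, seed=0):
    if len(nodes) > rows * cols:
        raise ValueError("array too small for this expression")
    rng = random.Random(seed)
    slots = list(nodes) + [None] * (rows * cols - len(nodes))
    rng.shuffle(slots)
    current = wire_length(to_placement(slots, cols), edges)
    best, best_cost = list(slots), current
    temp = 2.0
    for _ in range(steps):
        i, j = rng.randrange(len(slots)), rng.randrange(len(slots))
        slots[i], slots[j] = slots[j], slots[i]          # trial move
        candidate = wire_length(to_placement(slots, cols), edges)
        accept = (candidate <= current or
                  rng.random() < math.exp((current - candidate) / temp))
        if accept:
            current = candidate
            if current < best_cost:
                best, best_cost = list(slots), current
        else:
            slots[i], slots[j] = slots[j], slots[i]      # undo the move
        temp = max(0.01, temp * 0.999)                   # cool down
    return to_placement(best, cols), best_cost


# Expression y + a * x as a small graph of inputs and operators
nodes = ["a", "x", "mul", "y", "add"]
edges = [("a", "mul"), ("x", "mul"), ("mul", "add"), ("y", "add")]
placement, cost = anneal_placement(nodes, edges, rows=3, cols=3)
print(placement, cost)
```

Because the move set and cost function do not care whether a cell is reconfigurable or hardwired, the same mapper idea serves both, which is the point made on the slide.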

Converging Design Flows
the same synthesis method may be used for mapping an algorithm onto both: rDPA [Kress, 1995], and DPA [Broderson, 2000]
this synthesis method is a generalization of systolic array synthesis: super systolic synthesis
terms:
DPU: datapath unit
DPA: data path array
rDPU: reconfigurable DPU
rDPA: reconfigurable DPA

Super Pipe Networks
systolic array:
  applications: regular data dependencies only
  pipeline shape: linear only
  pipeline resources: uniform only
  mapping and scheduling (data stream formation): linear projection or algebraic synthesis
super-systolic rDPA **):
  applications, pipeline shape, pipeline resources: no restrictions
  mapping: simulated annealing or P&R algorithm
  scheduling (data stream formation): (e.g. force-directed) scheduling algorithm
The key is mapping, rather than architecture
**) KressArray [1995]

>> Stream-based Memory Architecture
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation
http://www.uni-kl.de

Hot Research Topic: Memory Architectures
• High Performance Embedded Memory Architectures
• High Performance Memory Communication Architectures [Herz]
• Custom Memory Management Methodology [Cathoor]
• Data Reuse Transformations [Kougia et al.]
• Data Reuse Exploration [Soudris, Wuytak]

Processor-Memory Performance Gap
[Figure: performance (log scale, 1 to 1000) vs. year (1980 to 2000) for CPU and DRAM]
µProc: 60%/yr
DRAM: 7%/yr
Processor-Memory Performance Gap: grows 50% / year
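The 50% figure follows from the two growth rates on the slide: the gap is the ratio of the two curves, so its yearly growth factor is the ratio of the two yearly factors. A minimal check (function name chosen for illustration):

```python
def gap_growth(cpu_rate: float, dram_rate: float) -> float:
    """Yearly growth factor of the CPU/DRAM performance ratio."""
    return (1 + cpu_rate) / (1 + dram_rate)


factor = gap_growth(0.60, 0.07)
# 1.60 / 1.07 is about 1.495, i.e. close to 50% per year
print(f"{(factor - 1) * 100:.0f}% per year")
```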

RAs: Cache does not help
• the memory bandwidth problem is often more dramatic than for microprocessors
• interleaving is not practicable, since based on sequential instruction streams
• classical caches do not help, since instruction sequencing is not used
• the problem: throughput of parallel data streams, not instruction streams
• super pipe networks, no parallel computers !
• Stream-based arrays are a memory bandwidth problem

Efficient Memory Communication should be directly supported by the Mapper Tools
[Figure: an example by Nageldinger’s KressArray Xplorer, with an Optimized Parallel Memory Controller. Legend: sequencers, memory ports, application, not used]
Synthesizable Memory Communication
http://kressarray.de

The Disk Farm? or a System On a Card? [Gordon Bell, Jim Gray, ISCA2000]
The 500GB disc card (14"): LOTS of bandwidth
A few disks replaced by >10s Gbytes RAM and a processor
MicroDrive: 1.7” x 1.4” x 0.2”
1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
2006: 9 GB, 50 MB/s ? (1.6X/yr capacity, 1.4X/yr BW)
Integrated IRAM processor, 2x height
Connected via crossbar switch, growing like Moore’s law
16 Mbytes; 1.6 Gflops; 6.4 Gops
10,000+ nodes in one rack! 100/board = 1 TB; 0.16 Tflops
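The 2006 projection and the per-board totals can be reproduced from the numbers on the slide. The sketch below simply compounds the stated yearly factors over 1999 to 2006 and multiplies the per-node figure by 100 nodes per board; the variable names are illustrative.

```python
def project(value: float, factor_per_year: float, years: int) -> float:
    """Compound a yearly growth factor over a number of years."""
    return value * factor_per_year ** years


years = 2006 - 1999
capacity_gb = project(340, 1.6, years) / 1000    # from 340 MB, in GB
bandwidth_mb_s = project(5, 1.4, years)          # from 5 MB/s
board_tflops = 100 * 1.6 / 1000                  # 100 nodes at 1.6 Gflops

print(f"{capacity_gb:.1f} GB, {bandwidth_mb_s:.0f} MB/s, "
      f"{board_tflops:.2f} Tflops per board")
```

The results land near the slide's rounded values of 9 GB, 50 MB/s and 0.16 Tflops; 100 drives of about 9 GB each also give roughly the 1 TB per board quoted.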

Memory Communication Architecture
• hot research topic in embedded systems
• storage context transformations [Herz, others]
• for low power
• for high performance
• startups provide memory IP or generators

Stream-based Soft Machine
[Figure: block diagram with Compiler, Scheduler, Sequencers (data stream generator), Memory (data memory: several memory banks) and rDPA; label: “instructions”]

>> Design Space Explorers
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation
http://www.uni-kl.de
• domain-specific Reconfigurable Platforms will be suitable to cope with the 2nd Design Crisis
• just as the general purpose massively parallel computer system
general purpose is unrealistic
an Illusion ...
KressArray Explorer ...
• fully general purpose reconfigurable sometimes is ....

Universal RAs: is it feasible?
Motivation
The General Purpose (coarse grain) Reconfigurable Array appears to be an Illusion ...
... such as obviously also the Universal Massively Parallel Computer Architecture
... counter-example: Application Domain of Image Processing

-> Design Space Exploration
• Design Space Exploration
– Design Space Explorers (DSEs)
– Platform Space Explorers (PSEs)
– Compiler / PSE symbiosis
– Parallel computing vs. reconfigurable

Design Space Exploration Systems
Explorer System      year  source     interactive  status evaluation        status generation
DPE                  1991  [66]       no           abstract models          rule-based
Clio                 1992  [67]       yes          prediction models        device generator
DIA                  1998  [68]       yes          prediction from library  rule-based
DSE for RAW          1998  [49]       no           analytical models        analytical
ICOS                 1998  [76]       no           fuzzy logic              greedy search
DSE for Multimedia   1999  [77]       no           simulation               branch and bound
Xplorer              1999  [11] [50]  yes          fuzzy rule-based         simulated annealing

DSEs: an overview
• for VLSI design in general
• for parallel Computer Systems
• Xplorer the only one for reconfigurable platforms (also MATRIX?)

>> KressArray Xplorer
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation
http://www.uni-kl.de

KressArray DPSS
published at ASP-DAC 1995
[Figure: KressArray Xplorer (Platform Design Space Explorer) built around the DPSS. Labelled blocks: Application Set, User, Architecture Editor, Mapping Editor, ALE-X Compiler (ALE-X code, expression tree, intermediate forms 2 and 3), Compiler, Mapper, Scheduler (data stream schedule), Analyzer, Architecture Estimator, Delay Estimator, Power Estimator (power data), statistical data, Improvement Proposal Generator with Inference Engine (FOX), Suggestion Selection User Interface, Design Rules, Datapath Generator, Kress rDPU Layout, HDL Generator (VHDL, Verilog), Simulator, KressArray family parameters]

KressArray DPSS
[Figure: KressArray Xplorer, a (Design Space) Platform Space Explorer around the DPSS. Labelled blocks: Application Set, Source Input, User, Architecture & Mapping Editor, Statistics, Datastream Generator, HDL Generator, Simulator, Datapath Generator, Delay & Power Estimator, Improvement Proposal Generator]
http://kressarray.de

Design Flow of Domain-specific Architecture Optimization
Nageldinger’s KressArray Design Space Xplorer: including a Fuzzy Logic Improvement Proposal Generator
[Figure: flow chart with the steps Application Set, Application Selection, Application Compilation, Application Mapping, Mapping Analysis, Modification Suggestion, Architecture Modification, Architecture Verification, Optimized Architecture; entry via Initial Arch. Estimation or benchmark]
accessible by internet: http://kressarray.de
runs best with Netscape 4.6.1

KressArray Design Space Xplorer
DPSS-N: Data Path Synthesis System
[Figure: tool flow. Labelled blocks and files: User, Editor / User Interface, ALE-X Code (.alex), ALE-X Compiler, Intermediate Format (.map), Architecture Estimation, Mapper, Mapping (Technology Mapping, Placement & Routing), Intermediate Format (.map, including configware code), Scheduler, Sequencing Code (.seq), Analyser, Statistical Data (.stat), Module Generator, Kress IP Library (.krs), other IP, Kress rDPU Layout (.krs), HDL Generator, HDL Description (.v), to Synthesis Environment]

>> Machine paradigms
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation
http://www.uni-kl.de

Computer: the wrong Machine Paradigm (“von Neumann”)
[Figure: Compiler, Memory, Sequencer (state register: program counter), hardwired Datapath; instructions]
tightly coupled by compact instruction code
“von Neumann” does not support soft data paths
Xputer: The Soft Machine Paradigm
[Figure: Compiler, Scheduler, Memory, multiple sequencer (state register: data counter), reconfigurable Datapath Array; “instructions”]
loosely coupled by decision data bits only
reconfigurable; also for hardwired

Soft Machine Paradigm
[Figure, Xputer: Compiler, Scheduler, Memory, Sequencer (data counter), reconfigurable Datapath; “instructions”]
[Figure, Parallel Xputer: Compiler, Scheduler, multiple memories, multiple Sequencers (data counters), reconfigurable Datapath; “instructions”]
Decision data only, i.e. loose coupling

Computer: the wrong Machine Paradigm
[Figure, “von Neumann”: Compiler, Memory, Instruction Sequencer (program counter), Decoder, hardwired Datapath; instructions]
tightly coupled by a compact instruction code
“von Neumann” does not support soft data paths (reconfigurable Datapath): at run time: no instruction fetch

Machine Paradigms
machine category            Computer (“v. Neumann”)   Xputer (no transputer!)
driven by:                  control flow              data streams (no “dataflow”)
engine principles           instruction sequencing    data sequencing
state register              program counter           (multiple) data counter(s)
communication path set-up   at run time               at load time
resource                    single ALU                array of ALUs & other rDPUs
datapath operation          sequential                parallel pipe network

Machine Paradigms
machine category            Computer (“v. Neumann”)   Xputer [8] (no transputer!)
Machine paradigm            procedural sequencing: deterministic (both)
driven by:                  control flow              data stream(s) (no dataflow [13])
RA support                  no                        yes
engine principles           instruction sequencing    data sequencing
state register              program counter           (multiple) data counter(s)
communication path set-up   at run time               at load time
resource                    single ALU                array of ALUs
datapath operation          sequential                parallel
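The "data counter" row can be made concrete with a toy data sequencer. The sketch assumes a plain row-by-row scan over a window of a row-major 2D array; real Xputer sequencers support other scan patterns, and the name `data_sequencer` and its parameters are hypothetical.

```python
from typing import Iterator


def data_sequencer(base: int, width: int, rows: int, cols: int) -> Iterator[int]:
    """Yield memory addresses for a row-by-row scan of a rows x cols window
    inside a row-major array that is `width` words wide."""
    for r in range(rows):
        for c in range(cols):
            yield base + r * width + c   # the data counter value


memory = list(range(100, 116))           # a 4 x 4 array stored row-major
stream = [memory[addr] for addr in data_sequencer(base=5, width=4, rows=2, cols=2)]
print(stream)  # [105, 106, 109, 110]
```

The state that advances here is an address into data memory, not a program counter: the datapath downstream just consumes the stream.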

Fundamental Ideas available
• Data Sequencer Methodology
• Data-procedural Languages (Duality w. v. N.)
• ... supporting memory bandwidth optimization
• Soft Data Path Synthesis Algorithms
• Parallelizing Loop Transformation Methods
• Compilers supporting Soft Machines
• SW / CW Partitioning Co-Compilers

>> Co-Compilation
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation
http://www.uni-kl.de

FPGA-Style Mapping for coarse grain reconfigurable arrays
mapping systems compared: Kress DPSS, CHESS, RaPiD, Colt
placement: simulated annealing; genetic algorithm
routing: simulated annealing; Pathfinder; greedy algorithm
KressArray DPSS (Datapath Synthesis System): Compiler, Mapper, Scheduler
Scheduler: specifies and assembles the data streams from / to array

Changing Models of Computing
“von Neumann”: (procedural) Software, downloading into RAM; data path, instruction sequencer, I / O
contemporary: host with hardwired accelerator(s) (ASICs); Software downloading into RAM; Hardware by CAD, designer needed, done at vendor site
reconfigurable computing: host with reconfigurable accelerator(s); Software and Configware downloading into RAM, both done at customer site

Changing Models of Computation
contemporary: host (Compiler, Software in RAM, Machine paradigm) with hardwired accelerator(s) (ASICs, by CAD); EDA tools needed*; done at vendor site
reconfigurable computing: host with reconfigurable accelerator(s); Co-Compiler; Software and Configware in RAM; Machine paradigm for both; no hardware experts needed; both done at customer site
*) even 80% hardware people hate their tools

Co-Compilation
Jürgen Becker’s Co-DE-X Co-Compiler [ASP-DAC’95]
[Figure: high level programming language source (X-C) enters the partitioning compiler (Analyzer / Profiler, Partitioner, Resource Parameters supporting different platforms); its outputs go to the GNU C compiler for the Processor and to the X-C compiler with KressArray DPSS for the Reconfigurable Accelerators; Processor and Accelerators are connected by an interface]
Software running on Computer (Machine Paradigm)
Configware running on Xputer (“Soft” Machine Paradigm)
Hardware / Software Co-Design turns to Configware / Software Co-Design

Co-Compilation
We introduce: Co-Compilation
[Figure: high level programming language source enters the partitioning compiler, which targets the Processor and the Reconfigurable Accelerators, connected by an interface]
Reconfigurable Architecture (RA) instead of hardwired: no CAD! Compilation instead!
Software running on Computer (Machine Paradigm)
Configware running on Xputer (“Soft” Machine Paradigm)
Hardware / Software Co-Design turns to Configware / Software Co-Design

Jürgen Becker’s Co-DE-X Co-Compiler
[Figure: X-C source enters the Analyzer / Profiler and the Partitioner (Loop Transformations, Resource Parameters); the host part goes to the GNU C compiler (Computer machine paradigm), the accelerator part to the X-C compiler and DPSS for the KressArray (Xputer machine paradigm)]
X-C is C language extended by MoPL
Resource Parameters: supporting different platforms
supporting platform-based design

Loop Transformation Examples
sequential processes (host):
  loop 1-16
    body
  endloop
loop unrolling:
  loop 1-8
    body
    body
  endloop
strip mining:
  fork
    loop 1-8 body endloop
    loop 9-16 body endloop
  join
reconf. array:
  loop 1-8 trigger endloop
  loop 1-4 trigger endloop
  loop 1-2 trigger endloop
resource parameter driven Co-Compilation
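The two classic transformations named on this slide can be written out in a few lines. This is a sketch in ordinary Python with 0-based indices (the slide counts from 1); the loop body is a made-up placeholder, and the two strips are run one after the other here although they are independent and could be forked.

```python
def body(v: int) -> int:
    """Placeholder loop body."""
    return v * v + 1


def plain_loop(data):
    out = [0] * 16
    for i in range(16):                 # loop 1-16
        out[i] = body(data[i])
    return out


def unrolled(data):
    out = [0] * 16
    for i in range(0, 16, 2):           # 8 iterations, two bodies each
        out[i] = body(data[i])
        out[i + 1] = body(data[i + 1])
    return out


def strip_mined(data, strip=8):
    out = [0] * 16
    for start in range(0, 16, strip):   # strips 0-7 and 8-15
        for i in range(start, start + strip):
            out[i] = body(data[i])
    return out


data = list(range(16))
assert plain_loop(data) == unrolled(data) == strip_mined(data)
```

In a resource-parameter-driven setting, the unroll factor and the strip size would be chosen from the size of the target array rather than fixed as above.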

History of Loop Transformations
1970s - 1980s: at Process Level: David Loveman, 1977, Allen and Kennedy, et al.
• Sequential to Parallel Processes, incl. Vectorization: Loop Unrolling, Loop Fusion, Strip Mining ....
1995/97 [Karin Schmidt / Jürgen Becker]: down to Datapath Level:
• (Parameter-driven) Time to Time/Space Partitioning, e. g.: Transformation from Sequential Process to Super-systolic
2000 [Michael Herz]: optimized RA to Memory Communication Bandwidth:
• Multi-dimensional Loop Unrolling / Storage Scheme Optimization supporting burst-mode & parallel Memory Banks
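Why a storage scheme matters for parallel memory banks can be shown with textbook low-order interleaving. This is a generic illustration, not the scheme from Herz's work; it assumes each bank delivers one word per memory cycle, and the function names are made up.

```python
def bank_of(address: int, banks: int) -> tuple:
    """Low-order interleaving: consecutive addresses go to consecutive banks.
    Returns (bank number, offset inside the bank)."""
    return address % banks, address // banks


def cycles_needed(addresses, banks: int) -> int:
    """Memory cycles for one group of accesses, one word per bank per cycle."""
    load = [0] * banks
    for a in addresses:
        load[bank_of(a, banks)[0]] += 1
    return max(load)


# Four consecutive words hit four different banks: one cycle
print(cycles_needed(range(8, 12), banks=4))      # 1
# Four words with stride 4 all hit bank 0: four cycles
print(cycles_needed(range(0, 16, 4), banks=4))   # 4
```

A loop transformation that changes the access order, or a storage scheme that changes the address-to-bank mapping, moves a data stream from the second case towards the first.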

History of Loop Transformations
• For Sequential Programs on Parallel Computers: David Loveman, 1977, Allen and Kennedy, etc.: Loop Unrolling, Loop Fusion, Strip Mining ....
• For parallel Datapaths: Jürgen Becker (1997):
  • Sequential to Super-Systolic Transformation
  • Optimize Throughput of Reconfigurable Arrays (RAs)
• For memory communication: Michael Herz (2000): Multi-Level Loop Unrolling to reduce Memory Cycles needed to create RA Data Streams
Instruction Code vs. Reconfiguration Code

Future Coarse Grain RA Development
• It is indispensable to operate within the Convergence Area of Compilers, Co-Compilers, Architecture and full-custom-style VLSI Design (array cells).
• It is a must that Products come with a Development Platform which encourages users, especially also those with a limited Hardware Background.

>> Design Space Explorers
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation
http://www.uni-kl.de
END