February 12, 1998 Aman Sareen
DPGA-Coupled Microprocessors
Commodity IC’s for the Early 21st Century
by
Aman Sareen
School of Electrical Engineering and Computer Science
Ohio University
What’s going to be covered ??
Part 1
* Technology Trends
* Application Outlook
* Some Developed Reconfigurable Engines
* Applications of Reconfigurable Logic
* Common Objectives of Reconfigurable Devices
* Limitations of the Current Systems
What’s going to be covered ?? (cont.)
Part 2
Unified Computational Array Model
* FPGA
* SIMD Arrays
* Hybrid Arrays
DPGA
* Applications
* Benefits
DPGA Prototype
* Highlights
* Architecture
* Implementation
What’s going to be covered ?? (cont...)
Part 3
* DPGA-Coupled Processor Applications
* Costs and Benefits of Reconfiguration
* Challenges
* Conclusion
Technology Trends
What's going on in the industry ??
* Operational performance of microprocessors is increasing by 60% each year.
* More and more transistors (25% increase per year) fit on a single chip.
* An estimated 12 million transistors on a single chip by the end of the century.
Disadvantages ??
* High performance is not always what we get.
* Not cost-effective.
* Risks overspecialization.
* Reduced volume utilization per design investment.
So what do we do ?? => Reconfigurable Design
What does it do ??
* Application acceleration.
* Implements system-specific functions.
Application Outlook
There is always scope for additions and modifications.
So what do we do ?? => Reconfigurable Design
What does it do ??
* It allows applications to specialize the hardware.
Some Developed Reconfigurable Engines
PRISM (Processor Reconfiguration through Instruction-Set Metamorphosis), built by Athanas and Silverman.
* Couples a programmable element with a microprocessor.
* Each application synthesizes new processor instructions for acceleration.
CM-2, built at the Supercomputing Research Center by Cuccaro and Reese.
* The processor is augmented with reconfigurable logic to perform common operations.
SPLASH, built at the Supercomputing Research Center.
* Used in genome sequence matching.
Applications of Reconfigurable Logic
Binary operations.
Arithmetic.
Encryption/Decryption/Compression.
Sequence and string matching.
Sorting.
Physical system simulation.
Video and image processing.
Common Objectives in Reconfigurable Applications
High performance.
Clear potential for application acceleration.
Exploring bit-level parallel computation.
High performance through parallelism.
Customized data paths.
Limitations of the Current Systems
Low Bandwidth and High Latency Interface
* Expected acceleration not achievable.
* Prevents close cooperation between fixed and reconfigurable logic circuits.
* Expensive.
* Limits throughput.
High Reconfiguration Overhead
* Single configuration must be maintained throughout an application.
* Multitasking/time sharing not possible.
Unified Computational Array Model
[Figure: Computational Block of AE. The Array Element Computational Unit receives an Instruction and inputs from local state or from other array elements, and drives outputs to local state or to other array elements.]
Unified Computational Array Model
Lookup Models for AE Computational Unit
[Figure: two lookup models for the AE computational unit.
Model 1: a Lookup Table (Memory) whose address inputs are the Instruction together with the inputs from local state or from other array elements; its data outputs go to local state or to other array elements.
Model 2: Instruction = Memory Programming. A Lookup Table (Memory) whose address inputs are the inputs from local state or from other array elements; its data outputs go to local state or to other array elements.]
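The second lookup model (Instruction = Memory Programming) can be sketched in a few lines of Python. This is only an illustration, not the circuit from the slides. The helper names `make_lut` and `lut_eval`, the 3-input width, and the example functions are assumptions chosen for the sketch.

```python
from itertools import product

def make_lut(func, k):
    """'Program' a k-input lookup table: store one output bit per input combination."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=k)]

def lut_eval(table, inputs):
    """Treat the input bits as a memory address (first input is the MSB) and read the stored bit."""
    address = 0
    for bit in inputs:
        address = (address << 1) | (bit & 1)
    return table[address]

# Two different "instructions" are just two different memory programmings
# of the same 3-input table (hypothetical example functions).
xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)
majority = make_lut(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)

print(lut_eval(xor3, (1, 0, 1)), lut_eval(majority, (1, 0, 1)))
```

In the first model the instruction bits would simply be extra address inputs into one larger table. In the second, changing the instruction means rewriting the table contents.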
Unified Computational Array Model
Instruction Distribution
Ideally, a different instruction for each AE on each computational cycle.
Drawback: the instruction distribution resource requirement increases, and the instruction bandwidth (IBW) becomes unmanageable.
IBW = P * log2(Nf) / tcycle
Example: P = 100, Nf = 64, operating frequency = 10 MHz
IBW = 6 Gbits/sec
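The arithmetic behind the 6 Gbits/sec figure can be checked with a short Python sketch. The slide does not define its symbols, so the code assumes P is the number of array elements, Nf is the number of distinct functions each element can be instructed to perform, and tcycle is the reciprocal of the operating frequency. The function name `instruction_bandwidth` is made up for this example.

```python
import math

def instruction_bandwidth(p, n_f, freq_hz):
    """IBW = P * log2(Nf) / tcycle, with tcycle = 1 / freq_hz (assumed interpretation)."""
    bits_per_instruction = math.ceil(math.log2(n_f))  # bits needed to select one of Nf functions
    return p * bits_per_instruction * freq_hz         # bits per second

ibw = instruction_bandwidth(p=100, n_f=64, freq_hz=10_000_000)
print(ibw / 1e9, "Gbits/sec")
```

With the slide's values, each instruction needs log2(64) = 6 bits, so 100 elements at 10 MHz require 100 * 6 * 10^7 = 6 * 10^9 bits per second.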
Unified Computational Array Model
Weakening Instruction Distribution
FPGA
* Instruction / AE
* Uniform in time
* Slow programming phase
SIMD Array
* Instruction / cycle
* Uniform in space
[Figure: Array Element Computational Unit with inputs from local state or from other array elements, outputs to local state or to other array elements, and an Instruction input. SIMD Array: Global Instruction (common to all elements in array). FPGA: Static Instruction (distinct for each array element, effectively constant during operation).]
FPGA v/s SIMD Computation
FPGA:
Fixed Function in Time
Spatially Varying Computation
Bit-Parallel Computation
Build Computation Spatially
* Low Latency
SIMD Array:
Operation Varies in Time
Homogeneous Computation in Space
Bit-Serial Computation
Build Computation in Time
* High Throughput on Homogeneous Data
Dynamically Programmable Gate Arrays
Hybrid Model
* Multiple Context FPGA
* Broadcast a Context Identifier
* Indirect Instruction Lookup
Features:
* Rapid Context Switch
* Exploits local, on-chip Bandwidth
* Spatially and Temporally Varying Computation
* High Logic Density
* Reuse Gates and Wires in Time
Dynamically Programmable Gate Arrays
Configurable Instruction-Store View of DPGA AE
[Figure: DPGA array element. A Global Context Identifier (common to all elements) drives the address inputs of the Instruction Store (Lookup Table), whose programming may differ for each array element. The data outputs of the Instruction Store form the Instruction. The function of the Computational Unit (Lookup Table) is configured by the Instruction Store output. The Computational Unit takes its address inputs from local state or from other array elements, and its data outputs go to local state or to other array elements.]
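The indirect instruction lookup in the DPGA array element can be illustrated with a small Python model. This is a behavioral sketch under simplifying assumptions, not the prototype's circuit. The class name `DpgaElement`, the 2-input table width, the two-context store, and the gate functions are all hypothetical choices for the example.

```python
class DpgaElement:
    """Behavioral model of one array element with a local instruction store."""

    def __init__(self, contexts):
        # One LUT programming (truth table) per context; contents may differ per element.
        self.instruction_store = [list(table) for table in contexts]

    def evaluate(self, context_id, inputs):
        # Indirect lookup: the broadcast context id selects the local instruction,
        # and that instruction (a truth table) configures the computational unit.
        table = self.instruction_store[context_id]
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return table[address]

# Hypothetical 2-input truth tables, indexed by (a << 1) | b.
AND2, OR2, XOR2, NAND2 = [0, 0, 0, 1], [0, 1, 1, 1], [0, 1, 1, 0], [1, 1, 1, 0]

ae0 = DpgaElement([AND2, XOR2])   # context 0 -> AND, context 1 -> XOR
ae1 = DpgaElement([OR2, NAND2])   # context 0 -> OR,  context 1 -> NAND

def step(elements, context_id, inputs):
    """Broadcast one context identifier to every element for this cycle."""
    return [ae.evaluate(context_id, inputs) for ae in elements]

print(step([ae0, ae1], 0, (1, 0)), step([ae0, ae1], 1, (1, 0)))
```

Only the small context identifier is broadcast each cycle. The full per-element instruction is read from local storage, which is how the model varies computation both across elements and across cycles.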
Dynamically Programmable Gate Arrays
Applications
Rapid Context Switch FPGA
Time-Slice Computation
Temporal Pipelining
Operation Cache
Processor Assistance
Multi-Stream SIMD
Boundary Condition handling
Virtual Cells
DPGA Prototype - Highlights
4 on-chip configuration contexts
DRAM configuration cells
Automatic refresh of dynamic memory elements
Non-intrusive background loading
Wide bus architecture for high-speed context loading
Two-level routing architecture
DPGA Prototype - Overview
DPGA Prototype - Context Memory
DPGA Prototype - Array Element
DPGA Prototype - Local Interconnect
DPGA Prototype - Subarray Interconnect
DPGA Prototype - Areas
3-metal CMOS process, 1µ drawn, 0.85µ effective
DPGA Prototype - Area Percentages
DPGA Prototype - Estimated Timings
tcycle = tmem + nl * tlut + nx * txbar
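The timing estimate is a simple sum, sketched here in Python. The slide gives no numeric values, so the delays below are placeholder numbers chosen only to show the arithmetic. They are not measurements from the prototype. The reading of the symbols (tmem as context memory read time, nl and nx as the number of LUT and crossbar delays on the path) is likewise an assumption.

```python
def cycle_time(t_mem, n_l, t_lut, n_x, t_xbar):
    """tcycle = tmem + nl * tlut + nx * txbar (all times in the same unit)."""
    return t_mem + n_l * t_lut + n_x * t_xbar

# Hypothetical delays in nanoseconds, for illustration only.
t_cycle = cycle_time(t_mem=3, n_l=1, t_lut=2, n_x=2, t_xbar=1)
print(t_cycle, "ns")
```

Under this reading, every context switch pays tmem once, while each extra LUT or crossbar stage traversed in a cycle adds its own delay.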
DPGA-Coupled Processor Applications
General-Purpose Workstations and Personal Computers.
Special-Purpose Computing Machines.
Embedded Systems.
Multiprocessor Systems
Costs and Benefits of Reconfiguration
Specialized design limits range of application.
Moving exception handling into reconfigurable logic.
* Feature interaction.
* Migrating critical control of fixed resources to reconfigurable logic.
Challenges
Interfacing the processor with reconfigurable logic.
Grain Size.
Area and Pin allocation.
Multitasking and state interaction.
Conclusion
• Prototype demonstrates that efficient DPGAs can be implemented
• DPGAs allow computation to vary both spatially and temporally
• DPGAs require no additional bandwidth
• Both bit-parallel and bit-serial computation in a single array structure
• Higher performance
• Higher flexibility
• Lower part count
• Microprocessors with tightly integrated, rapidly reconfigurable logic promise to be a prime commodity building block.