Post on 25-Feb-2016
MIT Lincoln Laboratory
HPEC 2010 - 1
Hendry, et al. 04/22/2023
* This work is sponsored by the Defense Advanced Research Projects Agency (DARPA) under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.
Enabling High Performance Embedded Computing through Memory Access via Photonic Interconnects
Gilbert Hendry, Eric Robinson, Vitaliy Gleyzer, Johnnie Chan, Luca P. Carloni, Nadya Bliss, Keren Bergman
Photonics: Advantages and Disadvantages
Advantages
• Very fast transfer rate
• Very low latency for long distances
• Low power
Disadvantages
• High upfront cost in time to send a packet
• High upfront cost in power to send a packet
Photonic interconnects hold potential for on-chip computing. However, the target applications must be considered to determine whether photonics will benefit them.
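The trade-off above (a high upfront cost per packet against a very fast transfer rate) can be illustrated with a simple cost model. This is a minimal sketch, not the model used in this work: the setup times and bandwidths below are hypothetical placeholders, and the function names are made up for illustration. It estimates the message size above which a link with a higher setup cost but higher bandwidth becomes faster.

```python
import math


def transfer_time_ns(size_bytes, setup_ns, bandwidth_gbps):
    """Time to move one message: fixed setup cost plus serialization time."""
    # 1 Gb/s moves 1 bit per nanosecond, so bits / (Gb/s) is nanoseconds.
    return setup_ns + (size_bytes * 8) / bandwidth_gbps


def break_even_bytes(setup_a_ns, bw_a_gbps, setup_b_ns, bw_b_gbps):
    """Message size above which link B (higher setup, higher bandwidth) beats link A."""
    # Solve setup_a + 8*s/bw_a == setup_b + 8*s/bw_b for s.
    per_byte_saving_ns = 8 / bw_a_gbps - 8 / bw_b_gbps
    if per_byte_saving_ns <= 0:
        return math.inf  # link B never catches up
    return max(0.0, (setup_b_ns - setup_a_ns) / per_byte_saving_ns)


# Hypothetical link parameters, chosen only to make the example concrete.
ELECTRONIC = {"setup_ns": 2.0, "bandwidth_gbps": 64.0}
PHOTONIC = {"setup_ns": 50.0, "bandwidth_gbps": 640.0}

if __name__ == "__main__":
    threshold = break_even_bytes(
        ELECTRONIC["setup_ns"], ELECTRONIC["bandwidth_gbps"],
        PHOTONIC["setup_ns"], PHOTONIC["bandwidth_gbps"],
    )
    print(f"break-even message size: {threshold:.1f} bytes")
    for size in (64, 4096):
        t_e = transfer_time_ns(size, **ELECTRONIC)
        t_p = transfer_time_ns(size, **PHOTONIC)
        print(f"{size} bytes: electronic {t_e:.1f} ns, photonic {t_p:.1f} ns")
```

Under these assumed numbers, small messages favor the low-setup link and large messages favor the high-bandwidth link, which is why the message sizes of the target application matter.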
Embedded Computing: ISR Applications
Image Registration
Where is the image in relation to other images already taken?
Image Sharpening
Can image fidelity be improved through using additional information or multiple pictures?
SAR Image Formation
How many pulses can feasibly be combined and what size of an image can we take?
Image Registration
Image Registration Involves:
• Image Orientation and Scaling
• Image Alignment
Produces an image that “fits” properly with other registered images to get a global view of the area.
Image Sharpening
Image Fusion: Fuses two low-resolution images to form a high-resolution result.
Filtering: Enhances image fidelity by combining filters with the original image (Bicubic, Bilinear, Halfband...)
SAR Image Formation
Synthetic Aperture Radar (SAR) is an imaging technique that uses RADAR pulses rather than photography.
SAR Processing:
• Image formation nontrivial, requires combining pulses
• The more pulses that can be processed, the higher the image resolution
• SAR can operate in conditions where traditional photography fails (low light, cloud cover)
ISR Application Kernels
[Figure panels: Matrix Multiply, Fourier Transform, Projective Transform]
ISR Kernels:
• Matrix Multiply, Projective Transform, Fourier Transform
• Used in a broad range of ISR applications
• Typically a performance bottleneck
• Demand high throughput from the memory and network modules
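As an illustration of one of these kernels, the following is a minimal sketch of a projective transform applied to a small image by inverse mapping. It is not the implementation evaluated in this work; the function names, the nested-list image layout, and the nearest-neighbor sampling are assumptions made for the example. Each output pixel reads a source location determined by the 3x3 matrix, so the source reads are generally not contiguous in memory.

```python
def apply_homography(h, x, y):
    """Map the point (x, y) through a 3x3 projective matrix h (nested lists)."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    # Divide by the homogeneous coordinate to return to image coordinates.
    return xh / w, yh / w


def warp_nearest(image, h_inv, fill=0):
    """Warp an image by inverse mapping with nearest-neighbor sampling.

    h_inv maps output pixel coordinates (column, row) to source coordinates.
    Output pixels whose source falls outside the image receive `fill`.
    """
    rows, cols = len(image), len(image[0])
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            sx, sy = apply_homography(h_inv, c, r)
            sc, sr = round(sx), round(sy)
            if 0 <= sr < rows and 0 <= sc < cols:
                out[r][c] = image[sr][sc]  # scattered read from the source
    return out


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6]]
    # Hypothetical transform: each output pixel reads one column to its right.
    shift = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
    for row in warp_nearest(image, shift):
        print(row)
```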
Characteristics of ISR Applications
• Large Memory Access Size
• Low Power Requirements
• High Memory Access to Compute Ratio
• High Throughput Requirements
ISR Applications Ideal Candidates for Photonic Interconnects
Ring Resonators
• Modulator/filter
[Figure: ring resonator; labels: λ, Broadband]
Circuit-switched P-NoCs
[Figure: switch control. Electronic Control (n-region, p-region; 0V, 1V) and Thermal Control (Ohmic Heater; 0V, 1V). Plot of Transmission against Injected Wavelengths, showing the off-resonance profile and the on-resonance profile. Other labels: S, D]
Peripheral Memory Access
[Figure legend: Processor Core, Network Router, Memory Access Point]
Memory Access Point
[Figure: memory access point; labels: To Memory Module, From Memory Module, Memory Control, To/From Network-on-Chip, Chip Boundary (On Chip, Off Chip), Control plane, Data plane, Modulators]
[V. R. Almeida et al. Cornell]
Photonic TDM Network
• Mesh topology
• Distributed switch control
• Single dimension transmission
• Controlled by fixed time slots
Vertical Memory Access
Vertical Coupler
[J. Schrauwen et al. U of Ghent.]
SDRAM DIMM Anatomy
[Figure: SDRAM DIMM, chip, and bank hierarchy; labels: DRAM_DIMM, Ranks, SDRAM device, DRAM_Chip, Banks (usually 8), IO, Cntrl, Addr/cntrl, data, DRAM_Bank, Row Decoder, Col Decoder, DRAM cell arrays, Sense Amps, Row addr/en, Col addr/en]
Optical Circuit Memory (OCM) Anatomy
[Figure: optical circuit memory module; labels: Waveguide Coupling, VDD, Gnd, Laser Source, Waveguide, Laser In, Mux Chip, AWG, To Mux Chip, From Mux Chip, DRAM Chip, drivers, Bank, IO, Cntrl, Addr/cntrl, Rx, Dec., IO Gating, data]
Results: Circuit Switched Application Performance
Performance (GOPS) by Network Type

Network    Projective Transform   Matrix Multiply   FFT
Emesh      1.04                   0.78              1.75
EmeshCS    47.3                   31.82             4.74
PS-1       27.8                   26.51             4.32
PS-2       17.76                  13.48             3.12

EmeshCS yields the best performance, but PS-1 and PS-2 are competitive
Results: Circuit Switched Power
Network Power (W) by Network Type

Network    Projective Transform   Matrix Multiply   FFT
Emesh      11.2                   11.1              11.4
EmeshCS    19                     15.8              11.2
PS-1       4.37                   4.35              4.28
PS-2       2.21                   2.17              2.15

PS-1 and PS-2 use much less power than electronic alternatives
Results: Circuit Switched Performance/Watt Comparison
Performance per Watt Improvement (Improvement Factor) by Network Type

Network    Projective Transform   Matrix Multiply   FFT
Emesh      1                      1                 1
EmeshCS    26.9                   29.01             2.82
PS-1       68.6                   87.64             6.72
PS-2       86.7                   89.33             9.67

PS-1 and PS-2 give the best performance per unit of power
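The improvement factors can be cross-checked against the performance and power figures from the two preceding results slides. The sketch below assumes that each chart's data labels are grouped by kernel in legend order (Projective Transform, Matrix Multiply, FFT) and that the improvement factor is performance per watt normalized to Emesh; both are readings of the charts rather than statements from the slides. Under that reading, the recomputed factors agree with the reported ones to within a few percent.

```python
NETWORKS = ["Emesh", "EmeshCS", "PS-1", "PS-2"]

# Performance in GOPS, as read from the performance chart (assumed grouping).
GOPS = {
    "Projective Transform": [1.04, 47.3, 27.8, 17.76],
    "Matrix Multiply": [0.78, 31.82, 26.51, 13.48],
    "FFT": [1.75, 4.74, 4.32, 3.12],
}

# Network power in watts, as read from the power chart (assumed grouping).
WATTS = {
    "Projective Transform": [11.2, 19.0, 4.37, 2.21],
    "Matrix Multiply": [11.1, 15.8, 4.35, 2.17],
    "FFT": [11.4, 11.2, 4.28, 2.15],
}

# Improvement factors as reported on the performance/watt chart.
REPORTED = {
    "Projective Transform": [1.0, 26.9, 68.6, 86.7],
    "Matrix Multiply": [1.0, 29.01, 87.64, 89.33],
    "FFT": [1.0, 2.82, 6.72, 9.67],
}


def perf_per_watt_improvement(gops, watts):
    """Return GOPS/W for each network, normalized to the first entry (Emesh)."""
    ratios = [g / w for g, w in zip(gops, watts)]
    baseline = ratios[0]
    return [r / baseline for r in ratios]


if __name__ == "__main__":
    for kernel in GOPS:
        computed = perf_per_watt_improvement(GOPS[kernel], WATTS[kernel])
        print(kernel)
        for name, value, reported in zip(NETWORKS, computed, REPORTED[kernel]):
            print(f"  {name}: computed {value:.2f}, reported {reported}")
```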
Results: Circuit Switched Power Budget Breakdown
Projective Transform
• Electronic components dominate the power of all the systems in question
• PS-1 and PS-2 both dominated by Electronic Crossbar
• Emesh dominated by Electronic Buffer
• EmeshCS dominated by Crossbar and Electronic Wire
The Electronic Crossbar requires a significant amount of power. However, in the Electronic Mesh, the Electronic Buffers dominate the energy consumption.
Results: TDM
Projective Transform

Network    Network Power (W)   Performance (GOPS)   Performance per Watt Improvement
Emesh      11.22               1.11                 (baseline)
Pmesh      15.49               7.55                 5x
P-TDM      16.02               20.87                13x
P-ETDM     23.97               51.04                22x

TDM Results:
• Performed on a smaller image (256x256)
• Yields the best performance when packets can be sent in a single time slice
• Constant setup cost means smaller packets can be sent with less overhead
TDM yields advantages when message sizes are smaller
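The single-time-slice observation can be illustrated with a toy slot model. This is a sketch under stated assumptions, not the arbitration scheme of the network presented here: it assumes a sender owns exactly one slot per repeating frame, and the slot capacity, frame length, and function names are hypothetical. A message that fits in one slot finishes in that slot, while a message that spills over must wait a full frame for the sender's next slot.

```python
import math


def slots_needed(message_bytes, slot_bytes):
    """Number of fixed-size time slots a message occupies (at least one)."""
    return max(1, math.ceil(message_bytes / slot_bytes))


def completion_slots(message_bytes, slot_bytes, frame_slots):
    """Slot times from the start of the sender's first slot until the message is done.

    Assumes the sender owns one slot in every frame of `frame_slots` slots,
    so each extra slot needed costs one whole frame of waiting.
    """
    k = slots_needed(message_bytes, slot_bytes)
    return (k - 1) * frame_slots + 1


if __name__ == "__main__":
    SLOT_BYTES = 512   # hypothetical slot capacity
    FRAME_SLOTS = 8    # hypothetical frame length
    for size in (256, 512, 513, 2048):
        k = slots_needed(size, SLOT_BYTES)
        done = completion_slots(size, SLOT_BYTES, FRAME_SLOTS)
        print(f"{size} bytes: {k} slot(s), done after {done} slot time(s)")
```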
Conclusions
• ISR front-end application performance is of increasing importance in the community
• These applications put large demands on the memory and network subsystems
• Photonics offers a low-powered approach to meeting these performance demands
For the full details on these photonic architectures, see our other publications in the Journal of Parallel and Distributed Computing (JPDC) 2011 and Supercomputing (SC) 2010.
References
• TDM Arbitration in a Silicon Nanophotonic Network-On-Chip for High Performance CMPs. Gilbert Hendry, Eric Robinson, Vitaliy Gleyzer, Johnnie Chan, Luca P. Carloni, Nadya Bliss, Keren Bergman. Journal of Parallel and Distributed Computing, 2011.
• Circuit-Switched Memory Access in Photonic Networks-on-Chip for High Performance Embedded Computing. Gilbert Hendry, Eric Robinson, Vitaliy Gleyzer, Johnnie Chan, Luca P. Carloni, Nadya Bliss, Keren Bergman. Supercomputing, 2010.