MemCon 2015
Serial Memories Fill a Need
Agenda
Michael Sporer – Director of Marketing
The future of parallel versus serial interface for memory
Mark Baumann – Director of Applications Engineering
Based on experience at MoSys developing and introducing the GigaChip interface and 1st, 2nd and 3rd generations of Bandwidth Engine ICs we will describe several options for future memory interface solutions.
Copyright ©MoSys, Inc. 2015. All rights reserved. 2 MemCon 2015 - October 12th
Discrete DRAM doesn’t do Serial… yet
Memory is the last holdout that still hasn’t gone serial
Challenges of Implementing DDR
Source: Agilent
DRAM bus trace length matching requirements
Design, Development & Qualification
Tradeoffs: Serial vs. Parallel
On the Chip
• SerDes adds costs on chip: MUX / deMUX; 2.5GHz chip with 25 Gbps IO
• IO Bandwidth / Chip Area: roughly the same on chip; depends on the range
• IO Bandwidth / Power: it depends on reach
On the Board
• Fewer lanes: 25GHz is more challenging, but is solvable
• Longer reach than parallel: easier board floor planning; distributed thermal loads
• Greater noise immunity
Is it a balanced tradeoff?
HMC gives them the bandwidth they need
“DDR has run out of pins on the package”
Source: Xilinx Technology Outlook - Liam Madden, FPL, Sept-2014
TSV Based DRAM Stacks
The performance potential of TSV based DRAM stacks can be realized with two very different interface and packaging solutions:
• High Bandwidth Memory (HBM): evolutionary wide, parallel interface
• Hybrid Memory Cube (HMC): high performance serial interface
Both solutions have their place in new systems design, and there are advancements in both options on the horizon.
and HBM is coming …
Just look at what AMD and NVIDIA have planned
HBM Gen1 shipping now
HBM Gen2 coming soon
Interposer based MCM
Xilinx highlighted that the technology wasn’t the critical element; it was the supply chain.
Source: Xilinx Technology Outlook - Liam Madden, FPL, Sept-2014
Economics of Direct Attach HBM
@Customer: Can customer afford Direct Attach HBM?
• Interposer development costs
• Fixed memory footprint
• Special supply chain
What is the volume required to recoup incremental costs?
@Manufacturer: Can DA-HBM exist in a low volume, high mix manufacturing environment?
Serial HBM: High Performance, Low Pin count
Serial HBM Solution
Serial HBM Reduces Risk at the Customer
Lower Technology Risk
• Pin count advantage for host device
• Ease of routing a serial interface
• Standard CEI interface
• Scalable and versatile
Component type Supply Chain
• Inventories
• Test and Burn-In
Cost Advantages
• Standard board assembly
Serial HBM Markets
Networking
• Packet buffering and high capacity tables
Embedded
• Supports a range of capacity and speeds with long product lifecycles
• Protects customers from changing HBM memory interface on host
All the Bandwidth but none of the headaches of DA-HBM
Serial Interface HBM
shim GCI
Flexible Capacity Expansion : Serial
One host port of 16 lanes can connect to 1, 2 or 4 devices
No additional bus loading or pin count
No throughput degradation
Expansion example shows MoSys Bandwidth Engine
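The lane split described above can be sketched in a few lines. This is a minimal illustration, assuming the 16 host lanes are divided evenly and contiguously among the attached devices; the function name `split_lanes` and the contiguous lane numbering are assumptions made for the example, not part of any MoSys specification.

```python
def split_lanes(host_lanes: int, devices: int) -> list[range]:
    """Divide one host port's lanes evenly among 1, 2 or 4 devices."""
    if devices not in (1, 2, 4):
        raise ValueError("the slide shows 1, 2 or 4 devices per host port")
    if host_lanes % devices:
        raise ValueError("lanes must divide evenly among devices")
    per_device = host_lanes // devices
    # Each device gets a contiguous group of lanes (an assumption for illustration).
    return [range(i * per_device, (i + 1) * per_device) for i in range(devices)]


for n in (1, 2, 4):
    groups = split_lanes(16, n)
    print(n, "device(s):", [len(g) for g in groups], "lanes each")
```

The host pin count stays at 16 lanes in every configuration; only the grouping changes.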
[Figure: one 16-lane host port shown in three configurations: 1x (one device, 16 lanes), 2x (two devices, 8 lanes each) and 4x (four devices, 4 lanes each)]
HBM MCM Yield Analysis
HBM Memory Solutions
Direct Attach HBM – 4 HBM MCM
• Yield
• Single sourced
• Interface support longevity
• Memory controller complexity and power added to ASIC
Serial HBM Package on Package
• Tested and optional burn-in of component HBM before MCM assembly
• Shim features optimized for application
• Incremental power for additional shim ASIC
• USR SerDes for MCM
Serial HBM On Motherboard
• VSR SerDes for motherboard
• Lowest cost, highest yield solution
• 30% board area increase
• Easiest thermal solution
[Figure: three packaging diagrams. ASIC (55 um) with four HBM stacks; ASIC (180 um) with four HBM + shim devices; ASIC (180 um) with four HBM + shim devices]
Serial vs. Direct Attach Value Comparison
Attribute | Serial HBM | Direct Attach HBM
Technical Risk | + Smaller interposer; + Discrete component BI & test | - MCM yield; - HBM repair
Cost | + Lower yielded cost; + Supply chain inventory | - MCM development cost; - MCM yield
Power | - Incremental power / BW | + Lower power
Thermal | + Distributed sources | - Higher thermal density
Time to Market | + Proven standard SerDes; + Discrete component design | - HBM interface IP availability; - MCM complexity
Flexibility | + On or off substrate; + Memory expansion; + Fungible SerDes | - Depopulate or not; - Single purpose HBM IO block
Reliability | + Burn-in option; + Field repair managed in Serial HBM | - JEDEC field repair in host ASIC
Supply Chain Ownership | + Single point; + Discrete component; + Multi-sourced | - Multiple or single points; - MCM model; - Single sourced
Board Area | - 0% to 30% larger | + Baseline
Normalized Yielded Cost of HBM
Assembly yield expected to be 95%
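The effect of assembly yield on cost can be illustrated with a short calculation. This is a sketch under stated assumptions: every part placed on a failed assembly is scrapped, and all cost figures below are hypothetical placeholders (the chart data from this slide is not reproduced here). Only the 95% assembly yield comes from the slide.

```python
def yielded_cost(component_costs, assembly_cost, assembly_yield):
    """Cost per good unit when failed assemblies scrap all of their parts."""
    if not 0 < assembly_yield <= 1:
        raise ValueError("yield must be in (0, 1]")
    return (sum(component_costs) + assembly_cost) / assembly_yield


# Hypothetical costs in arbitrary units: one ASIC plus four HBM stacks.
parts = [100, 10, 10, 10, 10]
cost_at_95 = yielded_cost(parts, assembly_cost=5, assembly_yield=0.95)
cost_ideal = yielded_cost(parts, assembly_cost=5, assembly_yield=1.0)

# Normalizing to the ideal case isolates the yield penalty.
print(f"normalized yielded cost: {cost_at_95 / cost_ideal:.4f}")
```

Because the ASIC is scrapped together with the memory on a failed assembly, the yield penalty scales with the total value placed on the module, not just the memory.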
HMC – Hybrid Memory Cube
Breakthrough in power due to TSV based construction
• 5 pJ/b DRAM only
Combined with logic die, resulting in 24.5 W per 1 Tbps
• 3 links @ 12.5G
• 24.5 pJ/b total (vs. 39 for DDR4)
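The relationship between energy per bit and power quoted above is a unit conversion: power equals energy per bit times bit rate. A minimal sketch, using only the pJ/b figures from this slide (the function name is an assumption for illustration):

```python
def power_watts(pj_per_bit: float, bandwidth_gbps: float) -> float:
    """Power in watts = (energy per bit in joules) x (bits per second)."""
    return (pj_per_bit * 1e-12) * (bandwidth_gbps * 1e9)


# Figures quoted on the slide, evaluated at 1 Tbps (1000 Gbps).
for label, pj in [("HMC DRAM only", 5.0), ("HMC total", 24.5), ("DDR4", 39.0)]:
    print(f"{label:14s} {power_watts(pj, 1000):5.1f} W per Tbps")
```

At 1 Tbps the number of picojoules per bit and the number of watts coincide, which is why 24.5 pJ/b corresponds to 24.5 W per 1 Tbps.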
Serial vs. Parallel Memory Comparison
Attribute | Bandwidth Engine BE-2 | Bandwidth Engine BE-3 | Hybrid Memory Cube (HMC) | High Bandwidth Memory (JEDEC) | DDR4 (JEDEC)
Physical Interface | Serial CEI Standard | Serial CEI Standard | Serial CEI Std | JEDEC HBM IO | JEDEC DDR4 IO
Protocol | GigaChip™ Interface | GigaChip™ Interface | HMC Consortium | RAS/CAS | RAS/CAS
Source of Supply | Dual-Sourced | Dual-Sourced | Single Sourced | Multi-Sourced | Multi-Sourced
Access | TDM Scheduler | TDM Scheduler | Sched./Switch | Banked RAM | Banked RAM
Capacity | 576 Mb | 1152 Mb | 16~32 Gb | 32-64 Gb | 4-8 Gb
Buffer Bandwidth | 400 Gbps | 800 Gbps | 1280 Gbps | 2048 Gbps | 38 Gbps
Transaction Rate | >4.5 Bt/s | >10 Bt/s | 2.6~2.9 Bt/s | TBD | 0.2 Bt/s
Signal Pins | 66 | 66 | 272 | ~1600 | 42
Package | BGA 19x19 | BGA 25x25 | BGA 31x31 | KGSD | BGA 8x12
Power | 7-11W | TBA | ~28W | 8W estimated | 0.7W
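One way to read the Buffer Bandwidth and Signal Pins rows together is as bandwidth per signal pin. The sketch below simply divides one row by the other; treating HBM's "~1600" pins as exactly 1600 is an assumption, and the metric ignores power, ground and reference pins.

```python
# Values copied from the "Buffer Bandwidth" and "Signal Pins" rows.
# HBM's "~1600" pins is treated as exactly 1600 for this estimate.
memories = {
    "BE-2": (400, 66),
    "BE-3": (800, 66),
    "HMC": (1280, 272),
    "HBM": (2048, 1600),
    "DDR4": (38, 42),
}


def gbps_per_pin(bandwidth_gbps: float, signal_pins: int) -> float:
    """Buffer bandwidth delivered per signal pin."""
    return bandwidth_gbps / signal_pins


per_pin = {name: gbps_per_pin(bw, pins) for name, (bw, pins) in memories.items()}
for name, value in sorted(per_pin.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:5s} {value:6.2f} Gbps per signal pin")
```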
[Figure: access architecture diagrams. Labels: DDR4 ~ 16+20; Switch with Serial IO (16, 16, 16, 16); TDM / Scheduler with Serial IO (8, 8); Channel 0, Channel 1; HBM – 8 channels & 128 banks, ~1600 pins, Si Interposer]
Future TSV DRAM Comparison
Attribute | Direct Attach HBM | Serial HBM concept | HMC
Bandwidth | equal | equal | equal
Interposer / Yield cost | CPU | Memory | Memory
Power | 1x | <2x | >3x
Latency | Lowest | Low | ?
Deterministic | Yes | Yes | No
Longevity of Interface | 5 years | indefinitely | indefinitely
Field Repair | Host based | Serial HBM based | HMC based
Host IO (PHY & pins) | Single Purpose | General Purpose and LP SerDes | General Purpose and LP SerDes
Test or Burn-In | Not possible | Possible | Possible
Supply Chain | MCM-type | Component | Component
Application Performance | none | Optimized for application | Generic HMC Specification
Source Multi-sourced Single Source
What to build with? It depends…
The Ultimate Network Processor’s Memory Implementation
At MemCon 2014, MoSys presented on extreme memories for networking and showed the relative position and value of different memories for a 1.2 Tbps network processor.
• HBM for buffering
• Serial memories for header processing and search
• Off-chip PHY to optimize datapath
This is a great point solution for a 1.2 Tbps datapath
What about less extreme systems?
Example 400G Line Card w/ EZchip NPS Z30 Adds 50% System Memory Bandwidth
[Figure: line card block diagram from Front Panel to Backplane. Labels: MoSys Framer/Gear Box; Packet Forwarding Engine with Embedded Memory, 16 uP cores and Hardware Accelerators (Flexibility + Performance: “C” Programmable Processors + L2-L7 Accelerators); Packet Buffer of 24 x DDR4 devices; Memory I/O (memory bandwidth for packet buffering, cores and HW accelerators); MoSys MSRZ30 over 8-16 serial lanes (Intelligent Offload: flexible feature & performance expansion); FIC]
800GE Using Serial HBM & BE3
[Figure: 800GE block diagram. Labels: two 400G PFEs (ASIC/FPGA), each with 4 x 100G; two Optics Modules, each with a GB/RT (LineSpeed Gearbox, Retimer); Bandwidth Engine Gen 3; shim; GCI. Shared: FIB Tables, Statistics, Metering, Semaphores, Packet Buffers]
Conclusion
Serial memory offers advantages over Direct Attach HBM
• Economics driven by supply chain
• Flexible and adaptable
• Scalable performance
• Quality and reliability
• Simplifying board design and cooling
Pick your memory for your application
• Memory core performance and capacity (DRAM vs. others)
• Architecture (point to point versus chainable)
• IO: serial vs. parallel
DDR DRAM is the de facto standard based on decades of evolution and optimization. If DDR doesn’t meet your needs, there are other options available.
Mark Baumann Director of Applications
Bandwidth Engine Serial Interface (GCI)
Topics
• Parallel interface evolution – faster, wider. How long can this last?
• Serial interface evolution – NRZ, PAM4 emerging
• Interface efficiency – HMC vs. GCI vs. ILA
• Standards based solutions vs. proprietary
• Interface for offload (abstracted): serial is better (variable size transfers)
• Splitting transaction layer from transport layer
• Purpose built vs. fungible IO
NPU Interface Options Today
[Figure: NPU interface block diagram. Network & Backplane Interfaces (Serial Style, SerDes): XAUI, 10G KR, Interlaken, PCIex. Memory & CoProcessor (DDR Style, SSTL/HSTL): DDR-3 SDRAM, RLDRAM, QDR SRAM; KBP/TCAM shown with SSTL/HSTL and SerDes]
NPU Interfaces Using Serial
[Figure: NPU with SerDes on both sides (Serial Style, Serial Style). Network & Backplane Interfaces: XAUI, 10G KR, Interlaken, PCIex. Memory & CoProcessor: DDR-3 SDRAM behind a DDR-3 Bridge (SSTL/HSTL), BE, Serial SRAM?, KBP/TCAM; link labels GCI, GCI, Interlaken. Callouts: Enabled by 10G KR GCI enabled SerDes; 3x to 4x Bandwidth Density per mm2]
NPU Interfaces Using Serial
[Figure: NPU with SerDes on both sides (Serial Style, Serial Style). Network & Backplane Interfaces: XAUI, 10G KR, Interlaken, PCIex. Memory & CoProcessor: HMC or Ser. HBM, BE, Serial SRAM?, KBP/TCAM; labels GCI, Interlaken, SSTL/HSTL. Callouts: Enabled by 10G KR GCI enabled SerDes; 3x to 4x Bandwidth Density per mm2]
Parallel vs Serial
GigaChip Interface Layers & Frame Format
Layers (between BE, QDR, TCAM… devices over PC board trace)
• Transaction: application specific
• Data Link: reliable transport of frames via CRC & positive Ack
• Physical Coding Sublayer (PCS): link initialization, lane deskew, scrambling
• Physical Media Access / Electrical: CEI compatible SerDes
The Data Link and PCS layers are labeled GigaChip Interface Protocol.
Data Link Layer Frame Format
Payload (72b) | DLL (1b) | Rx Ack (1b) | CRC (6b)
• Frame striped across SerDes lanes (1, 2, 4, 8, 16)
• Modulo 10 UI, fixed size
• Sized to meet needs of application
• >90% bandwidth efficiency at 80b
Data Link Layer operations
• DLL indicates if payload is Transaction Link Layer operation or Data Payload
• Data Link Layer operations: Replay, Pause (no-op)
Data Payload format up to application
• Op codes, address, data… formatting left to higher level
• For memory transactions: 1 frame = transaction
• For packets: variable number of frames can be used
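The 80-bit frame layout (72b payload, 1b DLL, 1b Rx Ack, 6b CRC) can be illustrated with a small packing routine. This is a sketch only: the field order on the wire follows the order the fields are listed on the slide, and the CRC-6 polynomial (x^6 + x + 1), zero initial value and MSB-first bit order are assumptions chosen for illustration; the slides do not state the actual GCI CRC definition.

```python
PAYLOAD_BITS, DLL_BITS, ACK_BITS, CRC_BITS = 72, 1, 1, 6
FRAME_BITS = PAYLOAD_BITS + DLL_BITS + ACK_BITS + CRC_BITS  # 80

CRC6_POLY = 0b000011  # low 6 bits of x^6 + x + 1 (assumed, not from the GCI spec)


def crc6(value: int, nbits: int) -> int:
    """Bitwise CRC-6 over an nbits-wide integer, MSB first, zero initial value."""
    reg = 0
    for i in range(nbits - 1, -1, -1):
        feedback = ((reg >> 5) & 1) ^ ((value >> i) & 1)
        reg = (reg << 1) & 0x3F
        if feedback:
            reg ^= CRC6_POLY
    return reg


def pack_frame(payload: int, dll: int, rx_ack: int) -> int:
    """Build an 80-bit frame: payload | DLL | Rx Ack | CRC."""
    if payload >> PAYLOAD_BITS or dll >> DLL_BITS or rx_ack >> ACK_BITS:
        raise ValueError("field does not fit its bit width")
    body = (payload << 2) | (dll << 1) | rx_ack        # 74 bits
    return (body << CRC_BITS) | crc6(body, 74)


def check_frame(frame: int) -> bool:
    """A frame is good when the CRC over all 80 bits is zero."""
    return crc6(frame, FRAME_BITS) == 0


def unpack_frame(frame: int):
    """Return (payload, dll, rx_ack) without checking the CRC."""
    body = frame >> CRC_BITS
    return body >> 2, (body >> 1) & 1, body & 1


frame = pack_frame(0x123456789ABCDEF012, dll=0, rx_ack=1)
print(check_frame(frame), PAYLOAD_BITS / FRAME_BITS)
```

With these field widths, 72 of every 80 bits carry payload, which is the frame-level ratio behind the bandwidth efficiency figure on the slide.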
CRC Error Handling w/Positive Ack
[Figure: Device A (CSI Tx) to Device B (CSI Rx) link. Tx side: Tx Request Transactor Queue, Tx Replay Queue, CRC Gen, Prev Rx Ack Count, Compare Ack. Rx side: CRC Error Check, Rx Ack Counter, Rx Target Transactor Queue. Datapath: 72-bit payload, 72 + 6 bits with CRC, 1-bit Ack Count, PISO/SIPO through Tx and Rx SerDes]
• Tx: compare Ack count, replay when “stuck”
• Rx: post if CRC OK, freeze if not OK, resume posting on Replay frame
• Rx: freeze Ack if CRC error, resume on Replay frame
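The positive-Ack scheme on this slide can be modeled with a short simulation. This is a simplified sketch, not the GCI implementation: Acks are assumed to return instantly and without loss, the Replay frame itself is assumed never to be corrupted, and the `stuck_limit` threshold and the function name are hypothetical.

```python
from collections import deque


def run_link(payloads, corrupt_at, stuck_limit=3):
    """Deliver payloads in order over a link with CRC errors.

    corrupt_at holds the 0-based positions of data frames on the wire
    that arrive with a bad CRC.
    """
    delivered = []           # frames posted by the receiver
    replay_queue = deque()   # indices sent but not yet acknowledged
    rx_frozen = False        # receiver stops posting after a CRC error
    next_to_send = 0
    stuck = 0                # frame times without Ack count progress
    wire_count = 0           # data frames put on the wire so far

    while len(delivered) < len(payloads):
        if stuck >= stuck_limit and replay_queue:
            # Ack count is "stuck": send a Replay frame and rewind
            # to the oldest unacknowledged frame.
            rx_frozen = False
            next_to_send = replay_queue[0]
            replay_queue.clear()
            stuck = 0
            continue
        progressed = False
        if next_to_send < len(payloads):
            idx = next_to_send
            next_to_send += 1
            replay_queue.append(idx)
            crc_ok = wire_count not in corrupt_at
            wire_count += 1
            if not rx_frozen:
                if crc_ok:
                    delivered.append(payloads[idx])  # post, Ack count advances
                    replay_queue.popleft()
                    progressed = True
                else:
                    rx_frozen = True                 # freeze posting and Ack
        stuck = 0 if progressed else stuck + 1
    return delivered, wire_count


print(run_link(list("abcd"), corrupt_at={1}))
```

In this model a single CRC error costs the frames sent while the Ack count was stuck, but delivery order is preserved without per-frame sequence numbers.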
Multi Core => Multi-Partition & Multi-bank
[Figure: packet processors 0, 1, … n-1, n, each connected by a serial link (ingress and egress) to the Bandwidth Engine: multi-cycle scheduler, ALU, BIST self-repair; 10 GA, 800 Gb/s]
• Multi-threaded multi-cores allow for high processing throughput
• Multi-linked: allows for concurrent transport operations
• Multi-bank multi-partitions allow for high access availability
• ALU for functional acceleration: local processing minimizes intra-chip traffic
• BIST self-repair: allows extended carrier class & in-package repair
Protocol Transfer Efficiency Comparison: Range of Payload Sizes and Applications
Transfer Efficiency = Data / (CMD + Address + Data + Transport Protocol)
[Figure: Read-Only Data Efficiency (0% to 100%) vs. Payload Size (0 to 40 B) for BE, ILA and HMC]
[Figure: Read/Write Data Transfer Efficiency (0% to 100%) vs. Payload Size (0 to 180 B) for BE 50:50 and HMC 50:50, with HMC 32B, 64B and 128B block sizes]
Application regions marked on the plots: Packet Header Processing Application; Packet Buffering Applications
Efficiency includes Transaction & Transport protocol. Note GCI: GCI + TL 2.0
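The transfer efficiency formula above can be evaluated for a fixed-size frame. The sketch below assumes the 80-bit frame with 72 payload bits from the GCI frame format slide and counts unused payload bits in the last frame as transport overhead; the 8-bit command and 32-bit address widths are hypothetical values chosen only to make the example concrete.

```python
import math


def transfer_efficiency(data_bits, cmd_bits, addr_bits,
                        frame_payload_bits=72, frame_bits=80):
    """Data / (CMD + Address + Data + Transport Protocol) for fixed-size frames."""
    carried = cmd_bits + addr_bits + data_bits
    frames = math.ceil(carried / frame_payload_bits)
    # frames * frame_bits covers CMD, address, data, padding and per-frame overhead.
    return data_bits / (frames * frame_bits)


# Hypothetical 8-bit command and 32-bit address with several data sizes.
for data_bytes in (4, 8, 16, 32):
    eff = transfer_efficiency(data_bytes * 8, cmd_bits=8, addr_bits=32)
    print(f"{data_bytes:3d} B data: {eff:.1%}")
```

The step-like behavior comes from frame quantization: efficiency rises as data fills a frame and drops when one more frame is needed.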
Protocol Transport Efficiency Comparison: GCI Optimized For Smaller Transfers
[Figure: transport efficiency (0% to 100%) vs. Frame Size (0 to 80 Bytes) for ILA, Interlaken, GCI 2.0 and GCI + TL 2.0. Annotations: Header Processing, GCI ~ 2x Interlaken; Packet Transfers, GCI ≈ Interlaken]
Serial Link Rate Road Map
Xilinx UltraScale+ 2016 33G GTY SerDes
BE3 2016 Q1 31G SerDes
56G PAM4 is being demonstrated now
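PAM4 carries two bits per symbol, so a 56G PAM4 lane runs at half the symbol rate of an NRZ lane with the same bit rate. The sketch below illustrates this with a Gray-coded level mapping; the specific mapping is a common convention and an assumption here, and the rates are nominal, ignoring any coding or FEC overhead.

```python
# Gray-coded PAM4 levels (a common convention, assumed for illustration):
# adjacent levels differ by one bit.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def pam4_encode(bits):
    """Map a bit sequence of even length to PAM4 symbol levels."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def symbol_rate_gbaud(bit_rate_gbps: float, bits_per_symbol: int) -> float:
    """Nominal symbol rate for a given line bit rate."""
    return bit_rate_gbps / bits_per_symbol


print(pam4_encode([0, 0, 0, 1, 1, 1, 1, 0]))
print("NRZ :", symbol_rate_gbaud(56, 1), "GBd")
print("PAM4:", symbol_rate_gbaud(56, 2), "GBd")
```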
CEI-56G Will Address Chip to Chip, Module, +
Summary
GCI is a proven chip to chip reliable transport protocol
Multiple designs in FPGA, ASIC and ASSP in production systems
GCI specification is freely available without restriction on use (same as the Interlaken model)
GCI protocol is designed to evolve as the CEI standard evolves
The inherent performance efficiency of GCI naturally equates to improved energy efficiency
Thank You
CMOS Memory Core Technologies
[Figure: memory core technologies positioned by #BitCells per SenseAmp, with trends in transaction rate, power, mm2/bit and cost. Labels: DDR, Mobile DRAM, LL/RL DRAM, HMC, HBM, eDRAM, SRAM, TCAM; Logic Fab; DRAM Fab (limited metal)]