Recent Advances in Die Stacking and 3D FPGA
Arif Rahman, Program Director & Architect
December 9th 2013
Summary
Early adoption of die stacking technology is underway in logic and logic-memory applications − Major semi companies across the supply chain have programs in place − High-volume application driver is needed for broader adoption
Next generation high-end FPGA applications require 2.5D/3D integrated memory − Driven by application requirements − Addresses pin bandwidth limitations
Altera is well positioned to leverage die stacking and Stratix 10 products will include stacking capabilities
Agenda
− Background
− Technology Trends
− 3D FPGA and Application Space Exploration
− Die Stacking Initiatives at Altera
− Summary
What’s Driving the Need for 3D?
Enhanced Capabilities
Bandwidth expansion − High-bandwidth chip-to-chip interface (wide IO interface) − Optical interconnect
Additional processing capabilities − Memory enhancement
Product feature set expansion − Derivative products
Energy efficiency
Integration − Fewer components
Enhanced Memory Capability
Integration and Form Factor Reduction
Today’s Remote Radio Units Yesterday’s Cell Towers
Energy Efficiency
Die-stacking enables up to 10X reduction in off-chip signaling power
Normalized Energy

Operation | Monolithic | 2.5D/3D
Write 64b DFF | 1X | 0.8X
64b Integer Add | 2X | 1.6X
Read 64b Register (64 x 32 bank) | 7X | 6X
8192-point FFT (per transform) | 20X | 15X
Read 64b from DRAM | 4,000X | <400X

Sources: Bill Dally, Advanced Computing Symposium, September 16, 2009; R. Davis, A. Sule, and P. Franzon, "An 8192-Point Fast Fourier Transform 3D-IC Case Study"
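The normalized energy figures can be turned into per-operation reduction factors with a few lines of Python. This is only a sketch of the arithmetic: the dictionary and function names are illustrative, and the DRAM entry uses 400 as an upper bound because the slide gives "<400X", so the real reduction would be at least the value computed here.

```python
# Normalized energy per operation as listed on the slide: (monolithic, 2.5D/3D).
# Assumption: the "<400X" DRAM entry is represented by its upper bound, 400.
ENERGY = {
    "Write 64b DFF": (1.0, 0.8),
    "64b Integer Add": (2.0, 1.6),
    "Read 64b Register (64 x 32 bank)": (7.0, 6.0),
    "8192-point FFT (per transform)": (20.0, 15.0),
    "Read 64b from DRAM": (4000.0, 400.0),
}


def reduction_factor(monolithic, stacked):
    """Return how many times lower the stacked energy is than the monolithic one."""
    return monolithic / stacked


def summarize(table=ENERGY):
    """Map each operation to its reduction factor, rounded to two decimals."""
    return {op: round(reduction_factor(m, s), 2) for op, (m, s) in table.items()}


if __name__ == "__main__":
    for op, factor in summarize().items():
        print(f"{op:35s} {factor:6.2f}x lower energy")
```

On-chip operations improve by roughly 1.2x to 1.3x, while the DRAM read improves by at least 10x, which is where the off-chip signaling claim comes from.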
Enabling Technologies for 3D Integration

Technology | Feature size | Relative Scale
On-chip interconnect | <0.1-0.8 um pitch | 1-10X
Through-silicon via (TSV) | 10-100 um pitch | 100-1KX
Micro-bump | 30-55 um pitch | 300-500X
C4 interconnect | 100-250 um pitch | 1K-2.5KX
Solder balls | 0.5-1 mm pitch | 5K-10KX

Other enablers: chip-to-chip bonding; thin wafer handling & processing (25-100 um thick)
Figure: IC1 and IC2 stacking arrangements on a package substrate
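The Relative Scale figures appear to be each pitch range divided by the smallest on-chip pitch of about 0.1 um. That base value is an inference from the numbers, not something the slide states, and the slide rounds some results (for example 8X to 10X and 550X to 500X). A minimal sketch of that arithmetic, with illustrative names:

```python
# Assumption: relative scale = pitch / 0.1 um (the smallest on-chip pitch).
# The slide lists on-chip pitch as "<0.1"; 0.1 is used here as the base.
BASE_PITCH_UM = 0.1

# Pitch ranges in micrometers, taken from the slide.
PITCH_RANGES_UM = {
    "On-chip interconnect": (0.1, 0.8),
    "Through-silicon via": (10.0, 100.0),
    "Micro-bump": (30.0, 55.0),
    "C4 interconnect": (100.0, 250.0),
    "Solder balls": (500.0, 1000.0),
}


def relative_scale(pitch_range_um, base_um=BASE_PITCH_UM):
    """Return the (low, high) pitch range expressed in multiples of base_um."""
    low, high = pitch_range_um
    return (low / base_um, high / base_um)


if __name__ == "__main__":
    for name, pitch_range in PITCH_RANGES_UM.items():
        low, high = relative_scale(pitch_range)
        print(f"{name:22s} {low:8.0f}X - {high:8.0f}X")
```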
Comparison of 3D Integration Schemes

Technology* | Wire Bonding | 2.5D (Silicon Interposer) | 3D (TSV-on-active integration)
Die Size | Small | Medium to Large | Small to Large
Number of Interconnects | A few 100’s | 1,000-10,000 | 1,000-10,000+ (limited by TSV area overhead)
Int. Power | 1X | 0.1X | 0.01X
Aggregate BW/W | 1X | ~0.01X – 0.001X | ~0.001X – 0.0001X
Tech. Availability | In volume production | a. F2F stacking (50 um pitch): 2008; b. 2.5D (50 um pitch): 2012/13 | a. Limited applications (TSV 150 um pitch): 2009; b. Full 3D: > 2013/2014

Figure: Intel 4Mb stacked SRAM in 65nm (courtesy: Chipworks)
*: Other alternative solutions include glass or organic interposer, wafer-level fan-out, etc.
Design and Architectural Consideration: System-level Partitioning and Feasibility
Application requirements and product realization − Partitioning of functions in different die − Chip-to-chip interface − Manufacturing aspects (KGD, test) − Thermal and thermo-mechanical co-design
Implementation strategy − Relative dimensions of on-chip and chip-chip features, design rules and impact to floorplan − Keep out zone for TSVs − Design trade-off
Other considerations − High-frequency signaling through TSV and u-bump − Signal isolation between stacked components
Traditional system and board level design considerations are now part of 2.5D/3D integration
Design Consideration: Manufacturing Aspects
Scalability of 2.5D/3D interconnect vs. traditional interconnects
Thermal budget and heat removal − Thermally aware design
Component reliability − Effects on die size, structural attributes, and material properties − FPGA specific requirements
Testability and KGD
Design and technology interaction has to be considered much earlier in product planning process
Altera’s Product Portfolio
Products
− CPLDs: Lowest Cost, Lowest Power
− FPGAs: Cost/Power Balance, SoC & Transceivers
− FPGAs: Mid-range FPGAs, SoC & Transceivers
− FPGAs: Optimized for High Bandwidth
− PowerSoCs: High-efficiency Power Management
Resources
− Design Software
− Development Kits
− Embedded Soft and Hard Processors
− Intellectual Property (IP)
Markets shown: Industrial, Computing, Enterprise
2.5D/3D Technology Segmentation and Product Alignment
Figure: packaging technologies placed on two axes, Front End Centric vs. Back End Centric and Cost Driven vs. Performance Driven
− 3D Integration (HMC, HBM, etc.)
− Silicon Interposer (FPGA, mixed signal, SoC, Proc + Memory)
− Face-to-Face Stacking (Sony PSP)
− Wire bond stacking
− Glass interposer
− Organic Interposer
− Fine-Pitch Substrate
− Wafer level FO
− Multi-Chip Package
Altera’s 3D Silicon Vision
Customer & application driven heterogeneous system integration in package − Mix and match silicon IP − Integrated design flow − Integrated system test methodology
Maximum system performance
Minimum system power
Smallest form factor
Reduced system cost
Figure labels: FPGA + HardCopy ASIC, ASSP, Memory, ASIC, CPU, SoC
Breakthrough Advantage with Generation 10

Reinventing the Midrange: TSMC 20 nm process
− 15% higher performance than current high-end with 40% lower midrange power
− 5x higher customer commitment dollar value at time of launch
− 1.9x processor system improvement

Delivering Unimaginable Performance: Intel 14 nm Tri-Gate process
− 2x performance increase
− 70% power savings
− 3rd-generation processor system
− 3D-capable for integrating SRAM, DRAM, ASIC
TPR, Univ. Minnesota
Homogeneous 2.5D/3D Integration of FPGA
Identical programmable fabric in multiple tiers − Shorter interconnects & fewer tracks/LAB − Lower power, smaller form factor
Architectural Considerations − Attributes of inter-tier connections & scalability − Thermal aspects
3D FPGA requires 5-10X finer pitch inter-tier interconnects compared to today’s volume manufacturing capability
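To see what a 5-10X finer pitch means for connection count, note that the density of an area array scales with the inverse square of its pitch. The sketch below assumes a regular square grid of inter-tier connections, which the slide does not state, and takes a 50 um pitch as the present-day baseline purely for illustration. Function names are illustrative.

```python
def density_gain(pitch_reduction):
    """Gain in connections per unit area when pitch shrinks by pitch_reduction.

    Assumption: connections sit on a regular square grid, so density ~ 1/pitch^2.
    """
    return pitch_reduction ** 2


def connections_per_mm2(pitch_um):
    """Connections per square millimeter for a square grid at the given pitch."""
    per_mm = 1000.0 / pitch_um  # connections along one millimeter
    return per_mm ** 2


if __name__ == "__main__":
    baseline_um = 50.0  # illustrative baseline pitch, an assumption
    for reduction in (5, 10):
        pitch = baseline_um / reduction
        print(
            f"{reduction}X finer pitch ({pitch:.0f} um): "
            f"{density_gain(reduction)}X density, "
            f"{connections_per_mm2(pitch):,.0f} connections/mm^2"
        )
```

Under these assumptions a 5-10X pitch reduction corresponds to a 25-100X increase in inter-tier connections per unit area.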
Heterogeneous 2.5D/3D Integration: Partitioning of FPGA Functions in Different Tiers
Driven by smaller form factor and shorter interconnects − Lower power, higher performance
Design considerations − High-density inter-tier connections − Standard vs. specialized process tech − Thermal aspects and testability
Interesting value proposition but requires unique process technology
M. Lin, FPGA’06 Tier Logic, SLIP’10
Heterogeneous 2.5D/3D Integration: Mixed Functionalities and Technologies
Discrete Solution 2.5D Integration
FPGA-Memory integration addresses package pin bandwidth limitation and system power constraints
FPGA-mixed signal or optical integration enables form factor reduction and optimal use of process technology
Altera’s Market Segments
Communications
− Networking: Switches, Routers
− Wireline: Optical, Metro, Access
− Wireless: Remote Radio Head, Basestations, Wireless LAN
− Broadcast: Studio, Satellite, Broadcasting

Industrial and Automotive
− Automation and Process Control: PLC and I/O Modules, Motion and Motor Control, Industrial Networking, Sensor/Encoder Interfaces
− Building Control and Security: Video Surveillance, Access Control, HVAC Control
− Automotive: Displays, Infotainment, Driver Assistance
− Smart Energy: Smart Grid/Meter, Energy Management, Power Distribution

Military and Aerospace
− Intelligence: Deep Packet Inspection, Data Analysis, High Performance Computing, Acceleration, Access
− EW/Radar: Counter-IED, Jammers, Decoys, Early Warning Radar; Airborne, Ship-Borne and Stationary Radar
− Secure Communications: In-Line Network Encryptors; Airborne, Vehicular, Tower and Tactical Radios
− Guidance & Control: Aircraft, Missile, Vehicle and Robot Guidance and Control, Instrumentation Clusters

Computing, Consumer, Storage, Test, and Medical
− Computer and Storage: Servers, RAID, High Performance Computing, Flash Storage, MFP
− Consumer: Displays, Set-Top-Boxes
− Test: IP Video Testers, Protocol Testers
− Medical: CT Equipment, Ultrasound
Communications segment contains the most demanding product requirements
Memory Intensive Networking Applications
Figure: line card data path with Front End Optics (& Processing), Packet Processing (PP), Traffic Manager (TM), and Backplane Switch (FIC)

PP Function | Memories Used
Parsing | M20K*
Packet Store | M20K, DDR
Classification | TCAM
Packet Editing | M20K, QDR, RLD
Statistics | M20K, DDR
Policing | M20K, QDR, RLD
Forwarding | DDR

TM Function | Memories Used
Free List | M20K, QDR, RLD
Linked List | M20K, QDR, RLD
Queue & Buffer Management | QDR, DDR
nQ, dQ (head, tail ptrs) | QDR, RLD
Congestion Mgt. | QDR, RLD
Scheduler | QDR, RLD
* M20K: Distributed embedded SRAM in Altera FPGA
Wireline Application Memory Requirements
Data plane memory − Temporary storage of packets while they await a forwarding decision − Requires high capacity and bandwidth
Control plane memory − Storage of data for forwarding decision − Requires low latency and high random transaction rate
Memory Bandwidth Scaling Trends and System Requirements
Potential solution requires innovations in architecture, chip-to-chip interface, and system integration
Figure: Memory Interface Bandwidth vs. Timeline
− DDR memory or IO bandwidth: 2X every 4-5 years
− Application requirements: ~2X every 2 years
FPGA IO & packaging solution will be challenged to meet system-level power & performance requirements. Inflection point at 200G
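The two doubling rates on this slide (memory or IO bandwidth 2X every 4-5 years, application requirements roughly 2X every 2 years) imply a gap that itself grows exponentially. The sketch below only illustrates that compounding; the function names and the 8-year horizon are illustrative choices, not figures from the talk.

```python
def growth(years, doubling_period_years):
    """Growth factor after `years` for a quantity that doubles every period."""
    return 2.0 ** (years / doubling_period_years)


def demand_supply_gap(years, demand_doubling_years, supply_doubling_years):
    """Ratio of required bandwidth to available bandwidth, both starting at 1X."""
    return growth(years, demand_doubling_years) / growth(years, supply_doubling_years)


if __name__ == "__main__":
    horizon_years = 8  # illustrative horizon, an assumption
    for supply_period in (4.0, 5.0):
        gap = demand_supply_gap(horizon_years, 2.0, supply_period)
        print(
            f"After {horizon_years} years, with bandwidth doubling every "
            f"{supply_period:.0f} years: demand outpaces supply by {gap:.1f}x"
        )
```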
100G-400G Wireline Memory Requirements for FPGAs
Figure: traffic manager (TM) memory requirements vs. offered load (100, 200, 400 Gb/sec). Axes: Random Trans./Sec (M) and Full Duplex BW (Gb/sec).
− Package pin constraint for control plane
− Data plane constraint: 4x72b DDR4 @ 1200 MHz
− Inflection point at 200G: beyond 200G, serial HMC is recommended
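The data plane constraint of 4x72b DDR4 @ 1200 MHz can be converted into a peak bandwidth figure. The sketch assumes that 1200 MHz is the interface clock with two transfers per clock (double data rate), and that each 72-bit interface carries 64 data bits plus 8 ECC bits; neither assumption is stated on the slide. The result is a theoretical peak and ignores refresh, bank conflicts, and bus turnaround.

```python
def ddr_peak_gbps(interfaces, width_bits, clock_mhz, transfers_per_clock=2):
    """Theoretical peak bandwidth in Gb/s across all interfaces.

    Assumption: clock_mhz is the interface clock and data moves on both edges.
    """
    bits_per_second = interfaces * width_bits * clock_mhz * 1e6 * transfers_per_clock
    return bits_per_second / 1e9


if __name__ == "__main__":
    raw = ddr_peak_gbps(interfaces=4, width_bits=72, clock_mhz=1200)
    # Assumption: 64 of the 72 bits are data, the remaining 8 are ECC.
    data_only = ddr_peak_gbps(interfaces=4, width_bits=64, clock_mhz=1200)
    print(f"Raw peak:       {raw:.1f} Gb/s")
    print(f"Data-only peak: {data_only:.1f} Gb/s")
```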
Emerging High Performance Memory
Memory vendors are addressing IO bandwidth constraints by architecting both the memory and the IO interface
Both serial and wide IO solutions will likely co-exist − Power constrained systems will prefer the wide IO solution
Figure labels: Control & Data Plane Memory (QDR, RLDRAM, DDR); Serial IO Interface; Parallel (HBM) Interface; Legacy Products; 2.5D/3D Capable Memory
Power Savings with Integrated Memory
Memory power accounts for up to 30% of total line card power
20-40% memory power reduction can be achieved by 2.5D/3D integration
Figure: Discrete Solution vs. 2.5D Integration
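Combining the two figures on this slide gives a rough bound on total line card savings: if memory accounts for up to 30% of line card power and integration removes 20-40% of memory power, the saving is at most 6-12% of total line card power. A minimal sketch of that arithmetic, with illustrative names:

```python
def line_card_savings(memory_share, memory_reduction):
    """Fraction of total line card power saved.

    memory_share: memory power as a fraction of total line card power.
    memory_reduction: fractional reduction in memory power from integration.
    """
    return memory_share * memory_reduction


if __name__ == "__main__":
    memory_share = 0.30  # "up to 30%" from the slide, used as an upper bound
    for reduction in (0.20, 0.40):
        saved = line_card_savings(memory_share, reduction)
        print(f"{reduction:.0%} memory power reduction -> {saved:.0%} of line card power")
```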
Summary of Product Direction
Figure: normalized chip-to-chip connections (log scale, 1 to 1000) vs. time, moving from wirebond to 2.5D integration to 3D integration. Product labels: Spartan, Flash, FPGA, NVM, Mixed-Signal.
Product Enablement and Risk Mitigation
Risk mitigation for 2.5D/3D technology through product-like test vehicles
Different flavors of test vehicles of varying complexity − Passive test chips for short-loop learning and technology qualification − Active test chips for electrical characterization, functional validation, and technology qualification
Collaboration with industry consortia, foundries, equipment vendors, and universities
Extensive modeling and simulation work − Thermal and thermo-mechanical − Electrical − Application use cases
Face-to-Face Integration for Low-Cost and Mid-Range Products
Figure: face-to-face stack of IC1 and IC2 (Die 1 and Die 2) with heat spreader and substrate
− 40 µm bump pitch and 200 µm bump pitch
− Micro-ball + Cu post
− Cu pillar with LF solder + micro-bump
Silicon Interposer Based Active Test Vehicle for High-Performance Applications
R&D vehicle for design and manufacturing enablement
Uses TSMC’s CoWoS process
Cross-Section of R&D Vehicle
TSV Daisy Chain Resistance: Pre- and Post-TCB Stress
No shift in TSV resistance was observed after TCB stress for either process “A” or “B”. Processes A and B are equivalent for TSVs.
Micro-bump Daisy Chain Resistance: Pre- and Post-TCB Stress
All resistance values are normalized to the median value of “Process A” resistance distribution.
Process A: Un-optimized Process B: Optimized
Electrical Characterization of TSV and SerDes
Insertion loss of TSV at 10G ~ 0.6dB
Comparable transceiver performance at 10G
Figure: signal path through Die, u-bump, Silicon Interposer TSV, C4, and Package Substrate, with eye diagrams without and with TSV
− Jitter increased 1.1 ps in -5dB 28Gbps system (simulated)
− Comparable jitter at 10G (measured)
− Plot annotation: -0.9 dB
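For a sense of scale, the ~0.6 dB TSV insertion loss at 10G can be converted to linear ratios using the standard decibel definitions. The sketch below is only the unit conversion, with illustrative function names; it says nothing about how the loss was measured.

```python
def db_to_amplitude_ratio(loss_db):
    """Fraction of signal amplitude (voltage) remaining after loss_db of loss."""
    return 10 ** (-loss_db / 20)


def db_to_power_ratio(loss_db):
    """Fraction of signal power remaining after loss_db of loss."""
    return 10 ** (-loss_db / 10)


if __name__ == "__main__":
    loss_db = 0.6  # TSV insertion loss at 10G, from the slide
    print(f"Amplitude remaining: {db_to_amplitude_ratio(loss_db):.1%}")
    print(f"Power remaining:     {db_to_power_ratio(loss_db):.1%}")
```

A 0.6 dB loss leaves about 93% of the amplitude and about 87% of the power.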
MIM Cap Integration in Silicon Interposer
10G SerDes

MIM Cap | RJ(rms) (ps) | DJ (ps) | TJ(1e-12) (ps) | Eye height (mV) | Eye width (ps)
2.2nF | 0.760 | 25.10 | 35.93 | 619 | 70
None | 0.770 | 28.16 | 39.20 | 512 | 65
• Metal-insulator-metal cap between interposer’s Mz interconnect layers
• Decoupling for power planes improved TX jitter (~10%)
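The ~10% jitter improvement can be checked against the measured values for the 2.2nF and no-capacitor cases. The sketch below computes the percent change of each metric relative to the no-capacitor case; the dictionary keys and function name are illustrative.

```python
# Measured 10G SerDes values from the slide, keyed by MIM capacitor setting.
MEASUREMENTS = {
    "2.2nF": {"rj_rms_ps": 0.760, "dj_ps": 25.10, "tj_ps": 35.93,
              "eye_height_mv": 619.0, "eye_width_ps": 70.0},
    "None": {"rj_rms_ps": 0.770, "dj_ps": 28.16, "tj_ps": 39.20,
             "eye_height_mv": 512.0, "eye_width_ps": 65.0},
}


def percent_change(with_cap, without_cap):
    """Percent change of the with-capacitor value relative to the baseline."""
    return 100.0 * (with_cap - without_cap) / without_cap


def compare(measurements=MEASUREMENTS):
    """Return the percent change for every metric, 2.2nF vs. None."""
    baseline = measurements["None"]
    return {
        metric: percent_change(value, baseline[metric])
        for metric, value in measurements["2.2nF"].items()
    }


if __name__ == "__main__":
    for metric, change in compare().items():
        print(f"{metric:15s} {change:+6.1f}%")
```

Deterministic jitter falls by about 11% and total jitter by about 8%, while eye height grows by about 21%.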
Multi-Chip Integration of Optical Interconnect
Optical interconnect eliminates the signal integrity issues of electrical interconnect
Optically-enabled FPGA − Architectural optimization to reduce link power and latency − Form factor reduction
Applications − Rack-to-rack − Card-to-card − Backplane − Chip-to-chip
Figure labels: ROSA, TOSA
Technology Demonstrator
Optical FPGA Board & Driving 100GE
Stratix IV GT FPGA − 4S100G5 (530KLE) − 28 Full Duplex @ 11.3Gbps
Electrical: 16 @ 11.3 Gbps Optical: 12 @ 11.3 Gbps
Optical Interface − Connector: MTP at faceplate − Wavelength: 850 nm (VCSEL) − Reach: 100m multimode fiber − Power: 2-3W total power
FPGA Design Demo − 100GE MAC − Random packet generator/checker − 10 x 10.3125 Gbps
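The lane counts and rates above can be tallied into aggregate bandwidth per direction. The sketch also converts the 10 x 10.3125 Gbps demo lanes to payload rate assuming 64b/66b line coding, which is the usual coding for 100GE lanes at this rate but is not stated on the slide. Function names are illustrative.

```python
def aggregate_gbps(lanes, rate_gbps):
    """Aggregate line rate in Gb/s, per direction, for identical lanes."""
    return lanes * rate_gbps


def payload_gbps_64b66b(lanes, line_rate_gbps):
    """Payload rate after 64b/66b coding overhead (assumed, not from the slide)."""
    return lanes * line_rate_gbps * 64 / 66


if __name__ == "__main__":
    print(f"All transceivers: {aggregate_gbps(28, 11.3):.1f} Gb/s")
    print(f"Electrical lanes: {aggregate_gbps(16, 11.3):.1f} Gb/s")
    print(f"Optical lanes:    {aggregate_gbps(12, 11.3):.1f} Gb/s")
    print(f"100GE line rate:  {aggregate_gbps(10, 10.3125):.3f} Gb/s")
    print(f"100GE payload:    {payload_gbps_64b66b(10, 10.3125):.1f} Gb/s")
```

The 12 optical lanes provide more raw capacity than the 10 lanes the 100GE demo needs, and the 16 electrical plus 12 optical lanes account for all 28 transceivers.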
Current Status of 2.5D/3D Industry
Still some challenges with design and manufacturing enablement but they are being addressed − EDA tool flow − Test and KGD − Flexible supply chain − Thermal/cooling solution
Several standards activities are underway − JEDEC Wide IO and HBM − Si2 Open 3D − 3D DFT architecture (IEEE 1149.1/1500) − SEMI
Summary
Early adoption of die stacking technology is underway in logic and logic-memory applications − Major semi companies across the supply chain have programs in place − High-volume application driver is needed for broader adoption
Next generation high-end FPGA applications require 2.5D/3D integrated memory − Driven by application requirements − Addresses pin bandwidth limitations
Altera is well positioned to leverage die stacking and Stratix 10 products will include stacking capabilities
Thank You