REVISITING AMDAHL'S WAFER SCALE INTEGRATION WITH ADVANCED PACKAGING
Moore’s Law for Heterogeneous Integration
Subramanian S. Iyer
chips.ucla.edu
Center for Heterogeneous Integration and Performance Scaling
UCLA CHIPS
A UCLA-led partnership to develop the Applications, Enablement and Core technologies, and the ecosystem, required to continue Moore’s Law at the package and system-integration levels, and to develop our students and scholars to lead this effort
What are we trying to do? (from UCLA CHIPS 2015)
Develop an “app-like” environment for Hardware that can
• Cut the time to market by 5-10X
• Cut the NRE cost by 10-20X
• Allow extreme heterogeneity including extensions to cyber-physical systems
• Develop a sophisticated manufacturing workforce
Technologies
• Fine Pitch interconnects
• Rigid and Flexible Fabrics
• On-Fabric communication
• CTT
• Reliability
Enablement
• Power delivery
• Connections outside
• Thermal management
• Yield management
• Design Infrastructure
• Supply Chain & security
• Manufacturing
Applications
• SuperCHIPS protocol
• Hardware-based IP reuse
• Energy-efficient computing
• Neuromorphic Computing
• Medical Engineering
Inter-related Themes
Manycore • Mobile sensors + processors • Cognitive implantables
Heterogeneous integration is key to the new Moore’s law
• Processors
• Memory
• Sensors
• Communication
• Semi base
• Node
• Technology
• For the raw device: scaling = performance
• For the interconnect: not straightforward
• For system functionality: heterogeneity = performance
• For cost/function: heterogeneity
• For time to market: heterogeneity
• For power/energy: scaling, interconnects, heterogeneity, and architecture
Energy Efficiency is key to massively scaled systems
You can’t scale yourself out of the power gap
You need a different architecture driven by and enabled by the ability to interconnect devices more efficiently in three dimensions
True 3D systems
and it’s all about interconnects
What would Yogi say?
1925-2015
Silicon features: 1000X
Packaging features: ~4X
Package/Board Features have scaled modestly
• We packed more function on die: the SoC era
• Serialization-deserialization (SerDes) techniques allowed high data rates over fewer channels
• But at an unsustainable power and real-estate cost
• SerDes, even when standardized, tend to be proprietary
[Figure: board channel between a serializer and a deserializer]
We Need a Moore’s Law for Packaging
One way to scale the package is to get rid of it!
Packages merely fan out the tight Si pitches to coarse PCB pitches and then fan them in again
Complex hierarchy : die/Xposer/Laminate/board
Disparate materials (Si, Cu, FR4, Molding compound etc.)
Limited heat sinking
Solder based interconnects
Interconnects always surrounded by underfill
Die-to-die connections limited by BGA pitch
[Figure: cross-section of a conventional package (heat sink, chip, interposer with TSVs (Ø = 20 µm) and underfill, laminate, board), connected by micro bumps (P: 50 µm), C4 bumps, BGA and PCB traces; labeled dimensions 50 µm, 180 µm, 400 µm and 1000 µm]
Morphing “Packaging” from a cost center to a high value add operation
• Without question the smallest board is a single large die with all the system functions on it
Gene Amdahl @ Trilogy Systems
Amdahl eventually declared the idea would only work with a 99.99% yield, which wouldn't happen for 100 years.
circa late ’70s
Let’s revisit this concept, but with a different approach
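Amdahl’s objection can be illustrated with the textbook Poisson yield model, Y = exp(-A·D0), used here as an assumption rather than as the model Trilogy or UCLA CHIPS used. The sketch compares one wafer-sized die with small dielets that are tested before assembly (known-good dies), and then treats the per-dielet bond yield as the remaining risk. The defect density, areas and bond yield are hypothetical illustration values.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson model Y = exp(-A * D0)."""
    return math.exp(-area_cm2 * defects_per_cm2)


def assembly_yield(bond_yield_per_dielet: float, n_dielets: int) -> float:
    """Yield of a system built from pre-tested (known-good) dielets,
    assuming each dielet bond succeeds independently."""
    return bond_yield_per_dielet ** n_dielets


# Hypothetical illustration values (not from the source).
D0 = 0.1              # defects per cm^2
WAFER_AREA = 700.0    # cm^2, roughly one 300 mm wafer
DIELET_AREA = 0.04    # cm^2, a 2 mm x 2 mm dielet
N_DIELETS = 1000      # similar in scale to the ~1000-dielet Si-IF demonstration

monolithic = poisson_yield(WAFER_AREA, D0)
per_dielet = poisson_yield(DIELET_AREA, D0)
system = assembly_yield(0.9999, N_DIELETS)

print(f"one wafer-sized die:    {monolithic:.1e}")
print(f"one 2 mm x 2 mm dielet: {per_dielet:.4f}")
print(f"1000 KGD dielets, 0.9999 bond yield each: {system:.3f}")
```

In this toy model the monolithic yield collapses exponentially with area, while testing dielets first moves the problem to bonding: a per-dielet bond yield around 99.99% keeps a 1000-dielet assembly near 90%.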
Can we do this in practice?
• The concept of direct die attach to the board is not new
• But doing it at fine pitch, with precision alignment and close spacing, is
• Fine pitch? Like “fat wires” on a silicon wafer: 2-10 µm
• Precision alignment? Similar to fat-wire alignment: <0.2 µm
• Close spacing? As close as possible: <20 µm
Some nomenclature:
Interconnects refer to the die-to-substrate connections (e.g., C4, BGA, etc.)
Wire refers to wires on the substrate (e.g., traces on a laminate or board)
Vias refer to connections between wiring levels
The “Right” Rigid Interconnect Fabric
Requirements:
• Mechanically robust (flat, stiff, tough…)
• Processability: fine pitch wiring, & interconnects
• Thermally conductive
• Can have passive (and active) built-in components
• Economical
Candidate fabrics: Organic (e.g., FR-4) • Silicon • Hybrid approaches (EMIB)
[Figure: heterogeneous dielets 1 and 2, each with its own heat sink, mounted directly on a Si-Interconnect Fabric (IF) at a spacing of 20-100 µm; the Si-IF gives a single-level hierarchy]
Conventional package vs. Si-IF (this is not an interposer!)
Several packaging levels
Disparate materials (Si, Cu, FR4, Molding compound etc.)
Limited heat sinking
Solder based interconnects
Interconnects always surrounded by underfill
Die-to-die connections limited by BGA pitch
[Figure: left, conventional package cross-section (heat sink, chip, interposer with TSVs (Ø = 20 µm), laminate, board; micro bumps at 50 µm pitch, C4 bumps, BGA, underfill); right, dielets (Si, III-V, etc.) with heat sinks bonded to the Si-IF, with Cu pillars on the Si-IF (Ø = 1-5 µm, pitch 2-10 µm) mating with Cu pads on the dielet; labels: PECVD oxide, recess (1 µm), 5 µm]
One packaging level
Mainly three materials (Si, Cu, Oxide)
Excellent integrated heat sinking
Metal-metal interconnects
No underfill
Interconnect pitch: 2-10 µm
Only pads on the dies
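The pitch change is what drives the density gain: for a full square area array, pads per mm² scale as (1000 / pitch in µm)². The sketch assumes an ideal array with no keep-out zones; the 10 µm, 50 µm (Si interposer) and 400 µm (FR4 PCB) pitches are the ones listed in Table 2.

```python
def pads_per_mm2(pitch_um: float) -> float:
    """Pads per mm^2 for an ideal square area array at the given pitch (no keep-out)."""
    pads_per_mm = 1000.0 / pitch_um
    return pads_per_mm ** 2


PITCHES_UM = {
    "Si-IF Cu pillars": 10.0,
    "Si interposer micro bumps": 50.0,
    "FR4 PCB": 400.0,
}

for name, pitch in PITCHES_UM.items():
    print(f"{name:26s} {pitch:6.0f} um -> {pads_per_mm2(pitch):9.2f} pads/mm^2")
```

Going from 50 µm to 10 µm pitch is a 25× gain in areal pad density; going from 400 µm is a 1600× gain.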
[Figure: thermal compression bonding on the Si-IF. Au-capped Cu pillars (pitch 10 µm, Ø = 5 µm) in PECVD SiO2 on the Si-IF; Au-capped Cu pads in PECVD SiO2 on the Si dielet. Heated chuck: <120 °C; bond head: <350 °C; bonding pressure: 100 MPa]
How do we assemble dies on the Si-IF?
Evolution of TCB process parameters for Si-IF assembly
[Figure: chip pad over interconnect-fabric pillar under three conditions: nominal pressure (100 MPa); thermal compression bonding (250 °C, 100 MPa) producing plastic deformation; bonding pressure applied]
Thermal Compression Bonding: Pristine and Flat
But High Force Can Damage Low-k Dielectrics
Fine Pitch Assembly on Si IF
• Si-IF fabricated using a dual-damascene process
• ~1000 dielets assembled (1 mm² to 4 mm²)
• 10 µm pitch (±1 µm alignment; θ < 6 mdeg)
• Can mix and match different diameters
• 100 µm spacing
• >3000 mm² total dielet area
• Passivated with Parylene C
[Figure: dielet and pillars on the Si-IF (10 µm scale); interconnected dies at 10 µm pitch]
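An angular tilt specification converts to a height difference across the die as Δh = L·tan θ. The sketch uses the 6 mdeg tilt limit quoted for the Si-IF assembly and assumes square dielets, so the 1 mm² and 4 mm² ends of the assembled range correspond to 1 mm and 2 mm edges.

```python
import math


def height_difference_um(die_edge_mm: float, tilt_deg: float) -> float:
    """Height difference across a die edge of the given length for a rigid tilt."""
    return die_edge_mm * 1000.0 * math.tan(math.radians(tilt_deg))


TILT_DEG = 6e-3              # 6 millidegrees, the quoted upper bound
for edge_mm in (1.0, 2.0):   # 1 mm^2 and 4 mm^2 dielets, assumed square
    dh = height_difference_um(edge_mm, TILT_DEG)
    print(f"{edge_mm:.0f} mm edge: {dh:.3f} um")
```

Even for the largest dielets, the worst-case tilt amounts to only about 0.2 µm of height difference from one edge to the other.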
[Figure: specific contact resistance (and shear strength) of pillar interconnects. Effective contact resistance vs. contact area (µm²) and contact diameter (113, 36, 11, 4 µm) for BGA, solder µbumps [3],[4], Cu pillars [5], and this work (Au-capped Cu pillar)]
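[Figure: PCB with SerDes vs. Si-IF with SuperCHIPS; labeled dimensions 10 µm, 50 µm, 100 µm and ~5 mm]
SuperCHIPS Protocol (Simple Universal Parallel intERface for CHIPS)
SuperCHIPS: simple inverter (area: ~2 µm²) in place of a SerDes transceiver (area: ~1 mm²)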
PCB links
• Long links (several mm to cm)
  – High parasitic inductance (~12 nH)
  – RLC link behavior
• Transmission-line model: signal reflections & matching
• Inter-symbol interference (ISI)
  – Large transceiver: ~0.81 mm² [1]
  – Energy/bit: >23 pJ/bit [1]
• Synchronous data transfer
Interposer links
• Long links (few mm)
  – High parasitic inductance (~3 nH)
  – RLC link behavior
• Transmission-line model at higher operating frequency (typical); RC model at lower operating frequency
• Lower ISI than PCB
  – Still a large transceiver
  – Energy/bit: >6 pJ/bit [3]
• Synchronous data transfer
Si-IF links
• Short links (<500 µm, typically 100 µm)
  – Low parasitic inductance (~0.5 nH)
  – RC link behavior
• RC line model: no signal reflections
• No inter-symbol interference
  – Simple inverter driver: ~2.5×10⁻⁶ mm²
  – Energy/bit: <0.1 pJ/bit
• Can be asynchronous
[Figure: transmission-line link model (source Vs, source impedance Zs, line impedance Zo, load Zl) vs. RC line model (source Vs, wire resistance Rw, wire capacitance Cw, load Zl)]
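Whether a link behaves like a transmission line or like a lumped network depends on its length relative to the signal rise time. One common rule of thumb, used here as an assumption, treats a line as electrically short when its one-way flight time is under about one sixth of the rise time. The sketch applies it to a 100 µm Si-IF link in SiO2 (εr ≈ 3.9) and a 5 cm FR4 trace (εr ≈ 4.3, an assumed typical value) with a hypothetical 10 ps rise time. It ignores wire resistance, which is what actually makes thin Si-IF wires RC-dominated.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def flight_time_s(length_m: float, eps_r: float) -> float:
    """One-way propagation delay of a line in a dielectric with relative permittivity eps_r."""
    return length_m * math.sqrt(eps_r) / C0


def is_electrically_short(length_m: float, eps_r: float, rise_time_s: float,
                          fraction: float = 1.0 / 6.0) -> bool:
    """Rule of thumb (assumed): lumped behavior if flight time < fraction * rise time."""
    return flight_time_s(length_m, eps_r) < fraction * rise_time_s


RISE_TIME = 10e-12  # hypothetical 10 ps edge

links = [
    ("Si-IF link, 100 um in SiO2", 100e-6, 3.9),
    ("PCB trace, 5 cm in FR4", 0.05, 4.3),
]
for name, length, eps_r in links:
    t = flight_time_s(length, eps_r)
    short = is_electrically_short(length, eps_r, RISE_TIME)
    print(f"{name}: flight time {t * 1e12:.2f} ps, electrically short: {short}")
```

The 100 µm link has a sub-picosecond flight time, so no reflections need to be managed, while the centimeter-scale PCB trace is hundreds of picoseconds long and must be treated as a terminated transmission line.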
Experimental Verification
• Measured insertion loss: <1 dB up to 30 GHz. Dimensions: 450 μm, 6×2 μm², 7 μm spacing
• The preliminary results show RC-like transfer characteristics.
[Figure: insertion loss (dB, -2.4 to -0.4) vs. frequency (10⁸ to 10¹⁰ Hz), simulated and measured]
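If the link is idealized as a single-pole RC low-pass filter (an assumption suggested by the RC-like behavior reported here, not a fitted model), its insertion loss is IL(f) = 10·log10(1 + (f/fp)²). A measured loss bound at one frequency then gives a lower bound on the pole frequency fp. The sketch applies this to the <1 dB at 30 GHz measurement.

```python
import math


def insertion_loss_db(f_hz: float, f_pole_hz: float) -> float:
    """Insertion loss (positive dB) of an ideal single-pole low-pass response."""
    return 10.0 * math.log10(1.0 + (f_hz / f_pole_hz) ** 2)


def min_pole_frequency_hz(f_hz: float, max_loss_db: float) -> float:
    """Smallest pole frequency consistent with loss <= max_loss_db at f_hz."""
    return f_hz / math.sqrt(10.0 ** (max_loss_db / 10.0) - 1.0)


fp = min_pole_frequency_hz(30e9, 1.0)
print(f"pole frequency >= {fp / 1e9:.1f} GHz")
print(f"check: loss at 30 GHz = {insertion_loss_db(30e9, fp):.3f} dB")
```

Under this idealization, the measurement implies a pole of roughly 59 GHz or higher.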
• GSG link dimensions: 585 μm, 2.1×1.3 μm², 10 μm pitch
• Insertion loss with interconnects: <2 dB up to 30 GHz
• Demonstrates RC-like single-pole behavior
• Low interconnect resistance (100-150 mΩ) compared to link resistance (3 Ω)
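The 3 Ω link resistance can be sanity-checked with R = ρL/A from the GSG dimensions. The sketch assumes bulk copper resistivity (1.7×10⁻⁸ Ω·m), on the reasoning that the Si-IF wiring is made with a dual-damascene process. Thin-film resistivity is usually higher, so treat the result as an order-of-magnitude check only.

```python
def wire_resistance_ohm(length_um: float, width_um: float, thickness_um: float,
                        resistivity_ohm_m: float = 1.7e-8) -> float:
    """DC resistance R = rho * L / (w * t) of a rectangular wire; default rho is bulk Cu."""
    area_m2 = (width_um * 1e-6) * (thickness_um * 1e-6)
    return resistivity_ohm_m * (length_um * 1e-6) / area_m2


r_link = wire_resistance_ohm(585.0, 2.1, 1.3)
r_interconnect = 0.15  # upper end of the measured 100-150 mOhm for one pillar interconnect
print(f"estimated link resistance: {r_link:.2f} ohm")
print(f"share of one interconnect: {r_interconnect / (r_link + r_interconnect):.1%}")
```

The estimate lands in the same few-ohm range as the quoted 3 Ω, and a single interconnect adds only about 4% on top of it.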
Interconnect pitch / protocol | 10 µm on Si-IF / SuperCHIPS | 50 µm on Si interposer / HBM2 | 400 µm on FR4 PCB / SerDes
No. of signal links per mm | 133 | 20 | 2.5
Inter-die distance (µm) | <100 | <5,000 | 10,000
Link latency (ps) | 0.19 (a), 2.01 (b) | - | -
Overall latency (ps) | 27 (a), 55.6 (b) | 300 [2] | ~1,000
Max data rate per link (Gbps) | 25 (a), 5 (b) | 2 [4] | 40 [1]
Energy per bit (pJ/b) | <0.02 (a), <0.09 (b) | 6 [3] | 23.2 [1]
Max bandwidth per mm* (Gbps/mm) | 3,325 (a), 665 (b) | 40 | 100
Table 2: Si-IF vs. interposer vs. conventional PCB
a. Without ESD capacitance
b. With ESD capacitance
* Calculated assuming only peripheral links. Bandwidth can be further improved using an array configuration of links.
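The bandwidth row of Table 2 follows directly from two rows above it: edge bandwidth density = (signal links per mm) × (data rate per link). The sketch reproduces the table values from those two inputs and, like the table, counts only peripheral links.

```python
def edge_bandwidth_gbps_per_mm(links_per_mm: float, gbps_per_link: float) -> float:
    """Peripheral (die-edge) bandwidth density in Gbps per mm."""
    return links_per_mm * gbps_per_link


# (technology, signal links per mm, max data rate per link in Gbps), from Table 2
TABLE_2 = [
    ("Si-IF, no ESD capacitance", 133, 25.0),
    ("Si-IF, with ESD capacitance", 133, 5.0),
    ("Si interposer (HBM2)", 20, 2.0),
    ("FR4 PCB (SerDes)", 2.5, 40.0),
]

for name, links, rate in TABLE_2:
    print(f"{name:28s} {edge_bandwidth_gbps_per_mm(links, rate):8.1f} Gbps/mm")
```

The Si-IF advantage comes mostly from link count: 133 links per mm against 2.5 on the PCB more than offsets the lower per-link rate.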
SuperCHIPS Protocol
• High bandwidth and low power using a large number of parallel short links
  – Hardware-based, agnostic to soft protocols
  – Simple inverter-based drivers (no SerDes)
  – Packet-less data transfer
  – No clock and data recovery circuitry needed
• Data rate per link: 25 Gbps
• Data bandwidth: 3.3 Tbps/mm
• Energy/bit: 0.09 pJ/b
• Enablers
  – Fine-pitch interconnects (10 µm)
  – Package-less dielet assembly at <100 µm spacing
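Multiplying bandwidth density by energy per bit gives the I/O power spent per mm of die edge, which is where the energy advantage shows up at the system level. The sketch pairs each bandwidth with the energy figure from the same Table 2 column; because several of those entries are upper bounds (<), the results are upper-bound estimates too.

```python
def edge_io_power_w_per_mm(bandwidth_gbps_per_mm: float, energy_pj_per_bit: float) -> float:
    """I/O power per mm of die edge: (bits/s per mm) * (J per bit)."""
    return bandwidth_gbps_per_mm * 1e9 * energy_pj_per_bit * 1e-12


# (link type, bandwidth in Gbps/mm, energy in pJ/bit), paired from the same Table 2 column
CASES = [
    ("Si-IF, no ESD", 3325.0, 0.02),
    ("Si-IF, with ESD", 665.0, 0.09),
    ("Si interposer", 40.0, 6.0),
    ("FR4 PCB SerDes", 100.0, 23.2),
]

for name, bw, e_bit in CASES:
    print(f"{name:16s} {edge_io_power_w_per_mm(bw, e_bit):6.3f} W/mm")
```

At only 100 Gbps/mm, PCB SerDes links already spend about 2.3 W per mm of edge, while both Si-IF cases stay below 0.1 W/mm at several times the bandwidth.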
SuperCHIPS protocol recommendations
Parameter | Option 1 | Option 2 | Option 3 | Option 4
Link length (µm) | 100 | 150 | 500 | 1000
Data rate/link (Gbps), no ESD / with ESD | 25 / 5 | 16 / 4.5 | 4 / 2.5 | 2.5 / 2
Data bandwidth (Tbps/mm), no ESD / with ESD | 3.32 / 0.66 | 2.1 / 0.59 | 0.53 / 0.33 | 0.33 / 0.27
• Configuration: SGS (shared ground)
• Number of peripheral links: 133 per mm
• Data rates may be increased by increasing driver size
* Calculations done assuming 250 Ω equivalent driver resistance
[Figure: SGS (signal-ground-signal, shared ground) link configuration between dielet 1 and dielet 2 across the die edge, with the link length marked]
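The note that data rates rise with driver size can be illustrated with an idealized single-pole model: the driver’s equivalent resistance R charges the link and load capacitance C, and each bit is given a few time constants to settle. Only the 250 Ω driver resistance comes from the table notes. The 50 fF load and the three-time-constant settling criterion are hypothetical, and the model ignores wire resistance and the extra input capacitance a bigger driver presents, so it shows the trend rather than the table’s numbers.

```python
def single_pole_rate_gbps(r_driver_ohm: float, c_load_f: float, n_tau: float = 3.0) -> float:
    """Idealized NRZ data rate if each bit period must span n_tau RC time constants."""
    tau_s = r_driver_ohm * c_load_f
    return 1.0 / (n_tau * tau_s) / 1e9


C_LOAD = 50e-15   # hypothetical total load (wire + ESD + receiver), farads
R_BASE = 250.0    # equivalent driver resistance from the table note, ohms

for scale in (1, 2, 4):  # driver width multiplier; R scales roughly as 1/width
    r = R_BASE / scale
    print(f"{scale}x driver ({r:.1f} ohm): ~{single_pole_rate_gbps(r, C_LOAD):.1f} Gbps")
```

The same model also shows why ESD capacitance costs bandwidth: any extra load capacitance lengthens the RC time constant and lowers the achievable rate in proportion.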
Partitioning and Reintegration Flow
System fabrication and assembly using the CHIPS approach
Dielet size and interconnect pitch: case study of BlueGene® Q
• Most dielets are small (<3 mm × 3 mm)
• Large dielets are dominated by interconnect congestion, which can be mitigated by extra wiring levels
• An interconnect pitch of 5 µm would suffice
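To see why a 5 µm pitch leaves headroom for small dielets, the sketch counts how many pads fit on a 3 mm × 3 mm dielet as a full area array versus a single peripheral row. It is a geometric upper bound under assumed ideal packing (no keep-out, power or ground allocation), not the BlueGene® Q partitioning analysis itself.

```python
def pads_per_side(edge_mm: float, pitch_um: float) -> int:
    """Number of pad positions along one die edge at the given pitch."""
    return int(edge_mm * 1000.0 // pitch_um)


def area_array_pads(edge_mm: float, pitch_um: float) -> int:
    """Pads in a full square area array covering the die."""
    return pads_per_side(edge_mm, pitch_um) ** 2


def single_row_perimeter_pads(edge_mm: float, pitch_um: float) -> int:
    """Pads in one row around the perimeter (corner positions counted once)."""
    n = pads_per_side(edge_mm, pitch_um)
    return 4 * n - 4


for pitch in (5.0, 10.0):
    print(f"3 mm x 3 mm dielet at {pitch:.0f} um: "
          f"{area_array_pads(3.0, pitch):,} area-array pads, "
          f"{single_row_perimeter_pads(3.0, pitch):,} perimeter pads")
```

Even a single perimeter row at 5 µm offers a few thousand connections, and an area array offers hundreds of thousands.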
Several Challenges Still Remain
• Large-area patterning
• Connections to the external world (connectors, RF, optical)
• On-IF long-haul communication
• Network on IF
• Reliability
• Power management
• Thermal management
• Developing the IP ecosystem
• And many others
But Not Everything Is Rigid
High-performance flexible systems will require high-performance heterogeneous chips and high interconnect density
Self-contained implantable and autonomous electronics have widespread applications, including neuroprosthetics with electrodes and optogenetics with µLEDs
But they do need high-performance processors, heterogeneous components, innovative power delivery and communication, and fine-pitch interconnects on flexible substrates that can conform and be inserted
Grand Challenges
• Depression
• Parkinson's
• Epilepsy
• Neuroprosthetics
• And many more
This work incorporates two “novel” concepts:
• Flexible substrate
• Small/thin Si dielets
[Figure: this work combines high-density wiring, low stress and wafer-level processing; cross-section and bird’s-eye views]
Similar to a bicycle chain: rigid segments with flexible links
Semi-rigid dielets on an ultra-flexible substrate (bicycle-chain approach)
Fan-out wafer-level packaging applied to flexible, biocompatible molding compounds, which is die-technology agnostic and capable of fine-pitch interconnects
A truly flexible, high-performance assembly
Property | PDMS (MDX4-4210 / Dow) | Polyimide
Tensile strength | 5 MPa | 231 MPa
Elongation at break | ~500% | <100%
Dielectric constant | 3.01 (@100 Hz) | 3.4 (@1 kHz)
Dissipation factor | 0.0009 (@100 Hz) | 0.005 (@1 kHz)
CTE | ~300 ppm/K | 20 ppm/K
Young’s modulus | 0.5 MPa | 2.5 GPa
Glass transition temp. (Tg) | -120 ˚C | >350 ˚C
Thermal decomposition temp. | 200 ˚C or more | 500 ˚C
Curing temp. | 25 ˚C to 80 ˚C | 250 ˚C or more
Biocompatibility (screening test) | Passed (up to 29 days for implantation in the human body) | Yes
Properties of a Biocompatible PDMS (SILASTIC® MDX4-4210 / Dow)
Biocompatible PDMS acts as a flexible substrate with appropriate curing temperature
A Fabrication Flow of FlexTrate™ using a Flexible FOWLP Process with a Biocompatible PDMS
This transfer process allows wafer-level processing based on FOWLP to make flexible inorganic semiconductor dielets with fine-pitch interconnects.
FlexTrate™
Fan-out Challenges
[Figure: Si dielets on a Si wafer (2nd handler); cross-section of the measured structure (100 µm scale); die tilt and coplanarity]
Dies can:
• Tilt
• Pop up
• Shift
• The lowest die tilt is obtained with the lowest curing temperature and the thinnest adhesion layer (<6 µm)
• Coplanarity error decreases (<1 µm) as the curing temperature decreases
• We can reduce die tilt and coplanarity error by controlling the curing temperature and adhesion-layer thickness
Metallization Challenges on Elastomers
Less than 0.4% change in R for corrugated lines
[Figure: corrugated vs. planar interconnects, cross-section X-X’ (500 µm scale)]
UCLA bend tester
Line width: 95 µm; length: 40 mm; Au thickness: 200 nm
Automated Medical Diagnosis Using TN
• Identifying & diagnosing back-pain related issues by analyzing spinal CT-scan images
• Current techniques rely on a qualified radiologist to visually examine the x-ray/CT-scan and provide their opinion
• Easy to miss or overlook certain features
• Very difficult to diagnose the presence of objects pinching the spinal nerve by visual inspection
• Collaboration with Department of Neurosurgery at UCLA (Special thanks to Dr. Luke Macyszyn & Dr. Bilwaj Gaonkar)
• Access to one of the largest databases of anonymous patient data; currently working with 457 training and 106 testing CT scans
[Figure: power (W) and throughput (FPS) for CPU, GPU* and single-TN** implementations; labeled values 50 W, 5 W, 0.1 W and 5, 100, 5000 FPS. Image panels: original, human-generated, automated]
3D-WSI for scaling Neuromorphic Systems
Scaling-out the system to host larger network
− More neurons
− Sufficient synaptic connections
Two chips integrated on a wafer (or CHIPS Si-IF)
Wafers stacked by 3D-WSI
PCB vs. 3D-WSI
Source: Zhe Wan, S. S. Iyer, "Three-Dimensional Wafer Scale Integration for Ultra Large Scale
Cognitive Systems", IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference
(S3S), 2017.
[Figure: four chips (Chip1 to Chip4) in the PCB and 3D-WSI configurations]
How we do it @CHIPS
Integration: a processing facility for Interconnect Fabric (IF) & assembly
• Silicon processing
• Glass
• Flexible substrates
• Additive manufacturing
• Thermal compression bonding
• Wafer thinning
• Wafer-wafer integration
Applications & Architecture: Heterogeneous systems; Approximate computing; Cognitive computing; Fault tolerance; Supply chain integrity; Security; Memory subsystems; Processing in memory; DFT; Network on board
Materials: Fine-pitch interconnect; Substrate materials; Warpage, stress; Flexible materials; Thermal solutions; Materials for additive mfg.; Reliability
Devices/Components: Novel switches; New memory; MEMS; Sensors; Passives, antennae; Medical devices
Design Infrastructure: Thermo-mechanical; Electrical; Tools; Partitioning; DFT; Active IF design
Tier 1 Equipment Partners: New tool concepts; Tool development
Scale-Up
Tier 1 Foundry Partners: Si, compound semis, MEMS, and OSATs
Aspirational Goals for Future Interconnected Heterogeneous Systems
• True three-dimensional form factor & connectivity
• Data-center-type scale with desktop-type “morphable” form factors
• Limitless heterogeneity and communication modes
• Thermal solutions
• Infinite customization
• Infinitesimal time to market
• Manufacturing solutions
Materials and Integration Processes (Hard engineering and Rocket Science)
• Silicon will continue to expand its role as a versatile substrate for heterogeneous integration
  – How do you engineer its properties for a versatile, low-cost, low-loss interconnect fabric?
  – Physically flexible links
  – Biocompatibility
• Interconnects will continue to be the roadblock
  – Can we engineer 2D materials (or something else) to be good conductors (vs. switches) and pattern them?
  – How do we attach dies to free-form interconnect fabrics at sub-micrometer pitches reliably, with high throughput and low cost?
• Customization requires new patterning schemes that are maskless and can conform to arbitrary sizes, form factors and 3D shapes
• Small-form-factor passives with high Q at high frequencies
• Get the power in & take the heat out
• Revisit (based on state-of-the-art miniaturization potential):
  – Vacuum “electronics”
  – NEMS-based solutions: eliminate stiction?
  – Fluidic logic
  – Already emerged memories, e.g., SRAM and DRAM with wafer-level stacking
Summary
• We are in the midst of a significant hardware transformation
  – Semiconductor scaling is saturating
  – The application space is getting very diverse
  – Systems are getting more heterogeneous
  – SoC design costs and times-to-market are huge
• Our approach @UCLA CHIPS drives a much more holistic Moore’s Law and enables this new reality
  – Silicon Interconnect Fabric
  – FlexTrate™
  – Energy-efficient computing
• But it does take a village!
12/12/2017
UCLA CHIPS thanks you for your support!
Acknowledgements
Faculty: Mark Goorsky, Puneet Gupta, Sudhakar Pamarti
Visiting Scholars: Adeel Bajwa, Amir Hanna, Boris Vaisband
PhD Students: J. Sivachandra, Zhe Wan, Saptadeep Pal, Steven Moran, Goutham Ezhilarasu, Faraz Khan, Xuefeng Gu, Prem Kittur, Arsalan Alam, K.T. Kannan, JP Santos, Partia Naghebi, Niloofar ShakoorZadeh, Jonathan Cox, Masoud Monjatipour
MS Students: Hannah Marvin, Yandong Luo, Arshiya Vohra
Undergraduate Students: William Whitehead, Randall Irwin, Aidan Wolk
Alums: Tak Fukushima (Tohoku U), Menglu Li (Apple)
The Charge Trap Transistor (CTT)
[Figure: ΔVt (V) of the CTT as a digital memory and as a plastic synapse; fit parameters a = 0.31 V, b = 0.38, τ = 22.8 ms]
Charge-trapping mechanism: oxygen vacancies in HfOx
The Physics
CTT as a non-volatile memory element
[Figure: VT (mV, 0 to 200) vs. switching cycle # (0 to 750)]
With GlobalFoundries
CTTs as Electronic Synapses
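[Figure: long-term potentiation (LTP) and long-term depression (LTD) characteristics; label: $f_{\mathrm{LTD}}/(f_{\mathrm{LTP}}+f_{\mathrm{LTD}})$]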
Unsupervised Learning Using CTT
X. Gu and S. S. Iyer, IEEE EDL, 2017
• Proof-of-concept network simulated using CTT characteristics to cluster stylized letters
• Perfect clustering achieved after an average of 24 presentations