Chung Huang, Amjad Qureshi, and Kishore Kasamsetty
Complexities in Developing a High-Performance DDR Subsystem at 3200Mbps on 16FF+ and 10FF
2 © 2015 Cadence Design Systems, Inc. All rights reserved.
Table of contents
• Challenge of timing budget
• Challenge of system uncertainty vs. training/calibration
• Challenge of signal and power integrity
• T16FF+ LP4-PoP 3200 test chip data
• T16FF+ LP4-DSC 3200 test chip data
• 10nm vs. 16FF data
Challenge of high-speed DDR timing budget
[Figure: Critical timing, DDR4 timing budget breakdown between the DDR4 SDRAM and the Cadence PHY]
[Figure: Shrinking trend of the read eye window]
DDR subsystem timing budget trend (read timing)
[Chart: DRAM spec trend (read timing). Bars for LPDDR4-3200, DDR4-2400, DDR3-2133, LPDDR3-1866, and LPDDR2-1066 on an axis from 0 to 600 (units not preserved in the extracted text). Legend: DRAM spec, PHY, channel]
• The DRAM spec consumes the largest percentage of the timing budget
• The channel budget gets tighter in absolute picoseconds as frequency scales
• The SoC needs to pick up the slack left by DRAM timing
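As a rough illustration of why the budget tightens in absolute terms, the sketch below computes the unit interval (one bit time) for each speed grade named on the chart. It assumes the nominal data rate in the grade name and does not reproduce the chart's values; the function and dictionary names are illustrative only.

```python
# Unit interval (UI) per speed grade, assuming the nominal rate in the name.
# SPEED_GRADES_MBPS and both function names are illustrative, not from the slides.

SPEED_GRADES_MBPS = {
    "LPDDR2-1066": 1066,
    "LPDDR3-1866": 1866,
    "DDR3-2133": 2133,
    "DDR4-2400": 2400,
    "LPDDR4-3200": 3200,
}


def unit_interval_ps(data_rate_mbps: float) -> float:
    """Return one bit time in picoseconds for a per-pin data rate in Mb/s."""
    return 1e6 / data_rate_mbps


def ui_table(grades=SPEED_GRADES_MBPS):
    """Map each speed grade to its unit interval in ps, rounded to 0.1 ps."""
    return {name: round(unit_interval_ps(rate), 1) for name, rate in grades.items()}


if __name__ == "__main__":
    for name, ui in ui_table().items():
        print(f"{name:12s} {ui:7.1f} ps")
```

Every picosecond of DRAM, PHY, and channel uncertainty has to fit inside this bit time, so a fixed amount of jitter costs a larger fraction of the eye at 3200 Mb/s than at 1066 Mb/s.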
VT drift implications on training and leveling
DDR training offsetting system uncertainty
DDR training supported by JEDEC spec

Item  Training                    Uncertainty in question
1     CA (Vref)                   DRAM Vref variation
2     CA (data eye)               Delay variation across CA bit lines against CLK
3     Read gate                   Read DQS preamble placement
4     Read data eye               Delay variation across DQ bit lines against DQS
5     Write leveling              WR DQS-CLK timing variation
6     Write DQ                    Delay variation across DQ bit lines against DQS
7     DQ Vref                     DRAM Vref variation
8     Per-bit deskew              Delay variation across bit lines
9     IO PVT calibration          Process variation, VT drift of IO impedance/Ibias
10    Delay line PVT calibration  Process variation, VT drift of delay line against CLK
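Several of the trainings listed above (CA data eye, read data eye, write DQ) come down to sweeping a delay and centering the sample point in the passing window. The following is a minimal sketch of that idea, not the Cadence PHY algorithm: `passes` is a hypothetical callback that runs a pattern test at a given delay tap, and `num_taps` is an assumed delay-line length.

```python
from typing import Callable, Optional, Tuple


def find_eye_center(
    passes: Callable[[int], bool], num_taps: int
) -> Optional[Tuple[int, int, int]]:
    """Sweep taps 0..num_taps-1 and return (left, right, center) of the
    widest contiguous passing window, or None if no tap passes."""
    best = None   # (left, right) of the widest window found so far
    start = None  # first tap of the window currently being tracked
    # Iterate one step past the last tap so an open window gets closed.
    for tap in range(num_taps + 1):
        ok = tap < num_taps and passes(tap)
        if ok and start is None:
            start = tap
        elif not ok and start is not None:
            left, right = start, tap - 1
            if best is None or (right - left) > (best[1] - best[0]):
                best = (left, right)
            start = None
    if best is None:
        return None
    left, right = best
    return left, right, (left + right) // 2


if __name__ == "__main__":
    # Hypothetical eye: taps 10 through 30 pass on a 64-tap delay line.
    print(find_eye_center(lambda tap: 10 <= tap <= 30, 64))
```

Centering in the widest window leaves equal margin on both sides, which is what allows the link to tolerate some VT drift before retraining is needed.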
• Supply noise induces jitter along the clock or data path across multiple voltage domains
• Clock or data jitter consumes timing budget, which means running the bus at lower speed
Minimizing supply noise improves clock and data jitter
[Figure panels: DDR4-3200 with non-ideal VDDQ; DDR4-3200 with ideal VDDQ = 1.2 V. Supply noise is annotated on the plots]
DDR supply noise reduction techniques
• Proper die/package/board decap per speed bin
• PLL wake-up power sequencing
• Die-level voltage domain isolation: VDDPLL, VDDCLK, VDDQ, VDD
• Signal:power:ground (SPG) ratio recommendation (after trading off pin count against SI/PI)
• DBI mode support per JEDEC standards (DDR4/LPDDR3)
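To show why DBI helps with supply noise, here is a small sketch of a DDR4-style write DBI rule: when more than four bits of a byte would be driven low, the byte is inverted and DBI_n is driven low, so at most four DQ bits per byte are ever low at once. This is an illustration under that stated rule; the function names are hypothetical, and the JEDEC specification remains the reference for exact behavior.

```python
from typing import Tuple


def dbi_encode(byte: int) -> Tuple[int, int]:
    """Return (data_on_bus, dbi_n) for one byte lane.

    If more than four bits are 0, invert the byte and drive dbi_n = 0.
    """
    byte &= 0xFF
    zeros = 8 - bin(byte).count("1")
    if zeros > 4:
        return (~byte) & 0xFF, 0
    return byte, 1


def dbi_decode(data_on_bus: int, dbi_n: int) -> int:
    """Recover the original byte from the bus value and the DBI_n flag."""
    data_on_bus &= 0xFF
    return data_on_bus if dbi_n else (~data_on_bus) & 0xFF


if __name__ == "__main__":
    for value in (0x00, 0x01, 0x0F, 0xFF):
        data, dbi_n = dbi_encode(value)
        print(f"in=0x{value:02X} bus=0x{data:02X} dbi_n={dbi_n}")
```

On a bus terminated to VDDQ, a driven low is the state that draws DC current, so capping the number of lows per byte limits both the current and the simultaneous switching noise.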
DDR supply domain isolation
[Figure: supply domain isolation across PCB, package (PKG), and DDR PHY. Labels: Vreg1, Vreg2, VDDA (PLL), VDD (PHY/IO), VDDQ (IO), VDDQCK (CK), and two low-pass filters (LPF)]
DDR bump SPG 2:1:1 ratio
[Figure: bump map for two byte lanes. PAD_MEM_DATA[0..15], PAD_MEM_DM[0..1], PAD_MEM_DQS_P[0..1], and PAD_MEM_DQS_M[0..1] are interleaved with VDDQ and VSS bumps; VDD, VDDPLL1, and VDDPLL2 bumps are also present]
DDR BGA SPG 4:1:1 ratio
DQ[3] VDDQ DQ[2] DQ[1] VDDQ DQ[0]
DQ[7] VSS  DQ[6] DQ[5] VSS  DQ[4]
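The SPG ratios quoted on this slide can be checked by counting ball names. The sketch below classifies names by prefix (an assumption: anything starting with `VSS` is ground, anything starting with `VDD` is power, everything else is signal) and reduces the counts to lowest terms. The function name is illustrative.

```python
from functools import reduce
from math import gcd
from typing import Iterable, Tuple


def spg_ratio(ball_names: Iterable[str]) -> Tuple[int, int, int]:
    """Return the (signal, power, ground) counts reduced to lowest terms."""
    signal = power = ground = 0
    for name in ball_names:
        if name.startswith("VSS"):
            ground += 1
        elif name.startswith("VDD"):
            power += 1
        else:
            signal += 1
    divisor = reduce(gcd, (signal, power, ground))
    if divisor == 0:  # empty input: nothing to reduce
        return (0, 0, 0)
    return (signal // divisor, power // divisor, ground // divisor)


if __name__ == "__main__":
    # The two BGA rows listed on the slide.
    rows = [
        "DQ[3] VDDQ DQ[2] DQ[1] VDDQ DQ[0]",
        "DQ[7] VSS DQ[6] DQ[5] VSS DQ[4]",
    ]
    balls = [name for row in rows for name in row.split()]
    print(spg_ratio(balls))
```

Applied to the two BGA rows, the count is 8 signals, 2 power, and 2 ground, which reduces to the 4:1:1 ratio in the slide title.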
Importance of on-die decoupling
On-die decap sources:
• Device cap
• MIM cap
• Metal cap
[Figure: DDR4 PHY floor plan showing data slices, address and address/control slices, memclk, PLLs, IO calibration, DDR4 IO pads (data, AC, clk), decap, and the digital PHY]
[Figure: ZPDN_VDDQ (die + package + PCB) PDN input impedance vs. on-die decap density. Die decap: 21 pF, 50 pF, 80 pF, 107 pF, and 135 pF per IO]
• Die decap affects the Q of the high-frequency PDN resonance (typically ~100-800 MHz)
• Adding on-die decap suppresses the peaking amplitude and lowers the peaking frequency
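The two PDN bullets follow from a first-order parallel-resonance model: the die capacitance C resonates with the package/board loop inductance L at f = 1/(2π√(LC)), and with a series loop resistance R the impedance peak is roughly L/(RC). The sketch below evaluates both for the per-IO decap values on the slide. The inductance and resistance are hypothetical placeholders chosen only for illustration; they are not values from the presentation.

```python
import math

# Per-IO die decap values listed on the slide, in farads.
DECAP_PER_IO_F = [21e-12, 50e-12, 80e-12, 107e-12, 135e-12]

# ASSUMPTIONS (not from the slides): effective per-IO loop inductance
# and series loop resistance of the package/board path.
LOOP_L_H = 2e-9
LOOP_R_OHM = 0.1


def resonance_hz(l_henry: float, c_farad: float) -> float:
    """Resonant frequency of an LC tank: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def peak_impedance_ohm(l_henry: float, c_farad: float, r_ohm: float) -> float:
    """High-Q approximation of the parallel-resonance peak: L / (R*C)."""
    return l_henry / (r_ohm * c_farad)


if __name__ == "__main__":
    for c in DECAP_PER_IO_F:
        f_mhz = resonance_hz(LOOP_L_H, c) / 1e6
        z_pk = peak_impedance_ohm(LOOP_L_H, c, LOOP_R_OHM)
        print(f"{c * 1e12:5.0f} pF/IO  f_res = {f_mhz:6.1f} MHz  Z_peak ~ {z_pk:6.1f} ohm")
```

In this simplified model both the resonant frequency and the peak impedance fall as C grows, which matches the trend described on the slide. A real PDN has several resonances and frequency-dependent loss, so extracted models are needed for actual numbers.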
Benefits of decap on the data eye (SI/PI)
[Figure panels: DQS, DQ, and VDDQ, each with a non-ideal PDN and with an ideal PDN. Cases shown: decap = 50 pF/IO, decap = 160 pF/IO, and ideal PDN, all with SSO, DBI disabled]
• Increasing decap density → less supply noise → less jitter
• Some jitter is correlated between DQ and DQS → differential DQ-to-DQS jitter < single-ended DQ jitter
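The point about correlated jitter can be made concrete with the standard formula for the standard deviation of a difference of two correlated random variables. The sketch below treats DQ and DQS jitter as RMS quantities with a correlation coefficient `rho`; all numbers are made-up inputs for illustration, not measurements from the test chip.

```python
import math


def relative_jitter_rms(sigma_dq: float, sigma_dqs: float, rho: float) -> float:
    """RMS jitter of DQ measured relative to DQS.

    sigma_rel^2 = sigma_dq^2 + sigma_dqs^2 - 2*rho*sigma_dq*sigma_dqs
    """
    variance = sigma_dq ** 2 + sigma_dqs ** 2 - 2.0 * rho * sigma_dq * sigma_dqs
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding


if __name__ == "__main__":
    sigma = 10.0  # hypothetical RMS jitter in ps on both DQ and DQS
    for rho in (0.0, 0.5, 0.8, 1.0):
        print(f"rho = {rho:.1f}  DQ-to-DQS jitter = {relative_jitter_rms(sigma, sigma, rho):5.2f} ps")
```

With equal jitter on both signals, the relative jitter drops below the single-ended value only when the correlation exceeds 0.5. Supply noise that DQ and DQS share in common is one source of such correlation.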
DDR4-3200 signal integrity challenges
[Figure: DDR 1-DIMM topology. DDR4-3200 (3200 Mb/s), 1DPC, 1R, DQ write eye]
[Figure: DDR 2-DIMM topology. DDR4-3200 (3200 Mb/s), 2DPC, 1R, DQ write eye at the near DIMM (DIMM0) and at the far DIMM (DIMM1)]
Channel impairment due to ISI and crosstalk at DDR4-3200
At 3200 Mb/s, crosstalk consumes a substantial share of the link budget in a DIMM topology.
At 3200 Mb/s, the DDR4 channel dominates the link budget. SI becomes critical, and the I/O must meet a very tight budget.
T16FF+ LPDDR4 test chip
Item       Value
Process    TSMC 16FF LL+
Protocol   LPDDR4 3200
Bus width  One x16 LPDDR4 channel, 2 rank
Package    2-2-2 FC-BGA
[Figures: LPDDR4 die floor plans; CDNS T16FF+ LP4-PoP test board]
T16FF+ LP4-PoP 3.2Gbps silicon correlation
[Figures: measured DQ/DQS eye vs. simulated DQ eye; measured vs. simulated DQ (WR burst); measured vs. simulated VDDQ]
T16FF+ LP4-Discrete 3.2Gbps silicon correlation
[Figures: CDNS T16FF+ LP4-DSC test board; measured DQ/DQS eye vs. simulated DQ eye]
Challenges in measuring and de-embedding DDR signals
[Figures: DRAM eye at the pin (large reflection on Rx pin); DRAM eye at the pad (little reflection on Rx pad)]
DDR IO & PHY comparison
                                           IO                            PHY
                                           28HPM     16FF      10FF      28HPM     16FF      10FF
Performance (PHY STA margin or IO jitter)  2667Mbps  3200Mbps  4267Mbps  2667Mbps  3200Mbps  4267Mbps
Area                                       100%      88%       65%       100%      40%       20%
Achieving 3200 performance on 16FF/10FF
Simulation conditions
• DDR4 memory-down channel
• 34 ohm PU/PD driver strength
• 60 ohm termination
• Nominal PVT conditions
• PRBS data pattern
• 3200 Mbps data rate
[Figures: VDDIO transient current; receiving data eye]
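For a sense of the signal levels these simulation conditions imply, the sketch below computes ideal DC levels for a pseudo-open-drain link, where the receiver termination connects to VDDQ as in DDR4. It assumes VDDQ = 1.2 V and ignores channel loss, reflections, and package parasitics, so it is a back-of-envelope estimate rather than a substitute for the simulated eye. The function name is illustrative.

```python
from typing import Tuple


def pod_dc_levels(vddq: float, r_on: float, r_tt: float) -> Tuple[float, float, float]:
    """Ideal DC levels for a driver into a termination tied to VDDQ.

    Returns (v_high, v_low, i_low):
      v_high: driving high, no DC current flows, so the line sits at VDDQ
      v_low:  the pull-down r_on and termination r_tt form a divider
      i_low:  DC current drawn from VDDQ while driving low
    """
    v_high = vddq
    v_low = vddq * r_on / (r_on + r_tt)
    i_low = vddq / (r_on + r_tt)
    return v_high, v_low, i_low


if __name__ == "__main__":
    # 34 ohm driver and 60 ohm termination from the slide; VDDQ = 1.2 V assumed.
    v_high, v_low, i_low = pod_dc_levels(vddq=1.2, r_on=34.0, r_tt=60.0)
    print(f"V_high = {v_high:.3f} V, V_low = {v_low:.3f} V, swing = {v_high - v_low:.3f} V")
    print(f"I_low  = {i_low * 1e3:.2f} mA per driven-low pin")
```

Because only a driven low draws DC current in this scheme, the VDDIO transient current depends strongly on the data pattern, which is one reason a PRBS pattern is used for the simulation.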
Signal integrity modeling and simulation flow