DDR4 Designing for Power and Performance - MemCon

Date post: 11-Feb-2017
Upload: doandung

DDR4: Designing for Power and Performance


Agenda

Comparison between DDR3 and DDR4
Designing for power
− DDR4 power savings
Designing for performance
− Creating a data valid window
− Good layout practices for DDR4
− Board debug tools to minimize issues

Looking ahead and conclusion


Comparison Between DDR3 and DDR4


DRAM Technology Comparison

| Parameter | DDR3 | DDR4 | GDDR5 |
| --- | --- | --- | --- |
| Voltage | 1.5 V / 1.35 V | 1.2 V | 1.5 V / 1.35 V |
| Strobe | Bi-directional differential | Bi-directional differential | Free-running differential WRITE clock |
| Strobe configuration | Per byte | Per byte | Per word |
| READ data capture | Strobe based | Strobe based | Clock data recovery |
| Data termination | VDDQ/2 | VDDQ | VDDQ |
| Address/command termination | VDDQ/2 | VDDQ/2 | VDDQ |
| Burst length | BC4, 8 | BC4, 8 | 8 |
| Bank grouping | No | 4 | 4 |
| On-chip error detection | No | Command/address parity; CRC for data bus | CRC for data bus |
| Configuration | x4, x8, x16 | x4, x8, x16 | x16, x32 |
| Package | 78-ball / 96-ball FBGA | 78-ball / 96-ball FBGA | 170-ball FBGA |
| Data rate (Mbps/pin) | 800 – 2,133 | 1,600 – 3,200+ | 4,000 – 7,000 |
| Component density | 1 GB – 8 GB | 2 GB – 16 GB | 512 MB – 2 GB |
| Stacking options | DDP, QDP | Up to 8H (128-GB stack); single load | No |


DDR4 Power Savings


DDR4 Power Savings Features

DDR4 voltage is 1.2 V (up to 40% savings)
− Lower voltage than DDR3 (1.5 V)
− On-die VREF
− Pseudo-open drain I/Os
Manages refreshes (up to 20% savings)
− Based on temperature
New DDR4 low-power auto self-refresh (LPASR) capability
− Changes refresh rate based on temperature
− Only refreshes the parts of the array that are in use
  Controller must allow fine-granularity refresh based on memory utilization
Supports data bus inversion (DBI)
− Limits the number of signals transitioning, reducing simultaneous switching output (SSO) and saving power
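
The data bus inversion bullet can be illustrated with a minimal sketch. It assumes the commonly described DDR4 rule: when more than four bits of a byte lane would be driven low, the byte is inverted and DBI_n is driven low to flag the inversion. With pseudo-open drain I/Os terminated to VDDQ, a low level is the one that draws termination current, which is why lows are the quantity being limited. The function names are illustrative and not taken from the slides.

```python
def dbi_encode(byte: int) -> tuple[int, int]:
    """Return (dq_byte, dbi_n) for one 8-bit lane.

    Assumed rule: invert when more than four data bits are 0.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    zeros = 8 - bin(byte).count("1")
    if zeros > 4:
        # Too many lines would be driven low: send the inverted byte
        # and flag the inversion with DBI_n = 0.
        return (~byte) & 0xFF, 0
    return byte, 1


def dbi_decode(dq_byte: int, dbi_n: int) -> int:
    """Undo the inversion at the receiver."""
    return (~dq_byte) & 0xFF if dbi_n == 0 else dq_byte


def lines_driven_low(dq_byte: int, dbi_n: int) -> int:
    """Count low lines among the eight DQ pins plus the DBI_n pin."""
    return (8 - bin(dq_byte).count("1")) + (1 - dbi_n)
```

Under this rule, no more than four of the nine lines (eight DQ plus DBI_n) are ever low at once, which bounds both the termination current and the number of simultaneously switching outputs.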


Creating a Data Valid Window


Timing Margins Are Shrinking

Shrinking Timing Margins in Picoseconds (400 Mbps to 3,200 Mbps)

| Generation | Data valid window | DRAM margin | Package/board margin | Chip margin |
| --- | --- | --- | --- | --- |
| DDR1 | 2,500 | 900 | 800 | 800 |
| DDR2 | 938 | 425 | 256 | 256 |
| DDR3 | 469 | 188 | 140 | 140 |
| DDR4 | 313 | 125 | 93 | 93 |
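
As a quick check on the figures above, the data valid window at each end of the chart is one unit interval (1 / data rate), and the three margins add up to roughly the whole window. The sketch below uses only the numbers on this slide. The helper names are illustrative.

```python
def unit_interval_ps(data_rate_mbps: float) -> float:
    """One bit time in picoseconds for a per-pin data rate in Mbps."""
    return 1e6 / data_rate_mbps


# (data valid window, DRAM margin, package/board margin, chip margin) in ps
MARGINS_PS = {
    "DDR1": (2500, 900, 800, 800),
    "DDR2": (938, 425, 256, 256),
    "DDR3": (469, 188, 140, 140),
    "DDR4": (313, 125, 93, 93),
}


def unallocated_ps(generation: str) -> int:
    """Window left over after the three margins are subtracted."""
    window, dram, board, chip = MARGINS_PS[generation]
    return window - (dram + board + chip)


for name in MARGINS_PS:
    print(name, "unallocated:", unallocated_ps(name), "ps")
```

At 400 Mbps the unit interval is 2,500 ps and at 3,200 Mbps it is 312.5 ps, matching the DDR1 and DDR4 rows. In every row the margins consume the window to within a couple of picoseconds, so nothing is left over for unbudgeted effects.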


Shrinking the Window Even More: DDR4 VREF Training (1/2)

DDR4 VREF training
− Training: sweep VREF setting, find maximum passing window
  Lump sum of DCD, RX offset, etc.
  Resolution error is the combination of (VREF, PI, or delay chain)
− Margin loss calculation
  VREF step size: from 0.5% VDDQ to 0.8% VDDQ
  VREF set tolerance: 1.625% or 0.15%
  Calibration error: 1 step size
  − 0.8% * VDDQ = 0.8% * 1.2 V = 9.6 mV
  Margin loss (due to VREF calibration error)
  − 9.6 mV * 2 / slew_rate = 4.8 ps (assume slew rate = 4 V/ns)
  Calibration error = half step size
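
The margin loss arithmetic above can be reproduced with a short sketch. The factor of 2 is taken directly from the slide's formula. The `error_steps` parameter is an added, hypothetical knob for trying the half-step case under the same formula.

```python
def vref_margin_loss_ps(step_pct_vddq: float, vddq_v: float,
                        slew_v_per_ns: float, error_steps: float = 1.0) -> float:
    """Timing margin lost to a VREF calibration error, in picoseconds.

    step_pct_vddq : VREF step size as a percentage of VDDQ
    error_steps   : calibration error in units of one step (1.0 = one step)
    """
    error_mv = step_pct_vddq / 100.0 * vddq_v * 1000.0 * error_steps
    # 1 V/ns equals 1 mV/ps, so mV divided by V/ns gives ps directly.
    return 2.0 * error_mv / slew_v_per_ns


print(vref_margin_loss_ps(0.8, 1.2, 4.0))       # one-step error, about 4.8 ps
print(vref_margin_loss_ps(0.8, 1.2, 4.0, 0.5))  # half-step error, about 2.4 ps
```

A slower edge makes the same voltage error more expensive: at 2 V/ns the one-step loss doubles to about 9.6 ps.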

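| Parameter | Symbol | Min | Typ | Max | Unit | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| VREF step size | Vref step | 0.50% | 0.65% | 0.80% | VDDQ | 2 |
| VREF set tolerance | Vref_set_tol | -1.625% | 0.00% | 1.625% | VDDQ | 3, 4, 6 |
| | | -0.15% | 0.00% | 0.15% | VDDQ | 3, 5, 7 |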
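
The training step on this slide ("sweep VREF setting, find maximum passing window") might look like the sketch below. It assumes a hypothetical callback, `measure_window_ps`, that returns the passing timing window observed at a given VREF code. A real controller would obtain this by sweeping delay taps at each code.

```python
from typing import Callable, Optional, Sequence


def train_vref(codes: Sequence[int],
               measure_window_ps: Callable[[int], float]) -> Optional[int]:
    """Return the VREF code with the widest passing window, or None.

    When several codes tie for the widest window, the middle one is
    chosen so the setting sits away from the edge of the plateau.
    """
    widths = [measure_window_ps(code) for code in codes]
    best = max(widths, default=0.0)
    if best <= 0:
        return None  # no code produced a passing window
    tied = [code for code, width in zip(codes, widths) if width == best]
    return tied[len(tied) // 2]


# Example with a made-up eye whose window peaks at code 12.
eye = lambda code: max(0.0, 100.0 - 10.0 * abs(code - 12))
print(train_vref(range(32), eye))
```

Because the chosen code can only land on a step boundary, the residual error is bounded by the step size, which is the calibration error the margin loss calculation accounts for.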


Shrinking the Window Even More: DDR4 VREF Training (2/2)

Discussion with JEDEC members
− DDR4 specification section 13.4: any DRAM component level variation must be accounted for within the DRAM RX mask. This means that the VREF calibration error is included in VdIVW_total.
− VREF_DQ internally aligns to VCENT_DQs with training. VCENT_DQs has variation. VREF_DQ training error should increase with this variation, internal voltage noise, etc.


Shrinking the Window Even More: Duty Cycle Error

DDR4 specification is +/-2% tCK = +/-0.04 UI
− IPD current budget +/-3% tCK
Margin loss is 4% tCK
With proper link timing calibration
− 2% tCK margin loss
Assume same for read
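
To put the duty cycle numbers in picoseconds, the sketch below converts the percentages above for a given data rate. It relies on two bits being transferred per clock (so one UI is half of tCK) and takes the 4% and 2% loss figures from the slide. The function names are illustrative.

```python
def pct_tck_to_ui(pct_tck: float) -> float:
    """Convert a percentage of tCK to a fraction of one UI (UI = tCK / 2)."""
    return pct_tck / 100.0 * 2.0


def dcd_margin_loss_ps(data_rate_mbps: float, dcd_pct_tck: float = 2.0,
                       calibrated: bool = False) -> float:
    """Margin lost to duty cycle error, in picoseconds.

    Uncalibrated, a +/-x% tCK error costs 2x% of tCK. With link timing
    calibration the slide budgets half of that.
    """
    tck_ps = 2e6 / data_rate_mbps  # double data rate: two bits per clock
    loss_pct = dcd_pct_tck if calibrated else 2.0 * dcd_pct_tck
    return tck_ps * loss_pct / 100.0


print(pct_tck_to_ui(2.0))                         # 0.04 UI
print(dcd_margin_loss_ps(3200))                   # 25.0 ps
print(dcd_margin_loss_ps(3200, calibrated=True))  # 12.5 ps
```

At 3,200 Mbps the uncalibrated loss of 25 ps is a noticeable share of a data valid window of about 313 ps.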

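[Figure: DQS and DQ waveforms, each showing +/-2% duty cycle error]

Timing Parameters by Speed Bin for DDR4-2400 to DDR4-3200

Clock Timing

| Parameter | Symbol | DDR4-2400 MIN | DDR4-2400 MAX | DDR4-2666 MIN | DDR4-2666 MAX | DDR4-3200 MIN | DDR4-3200 MAX | Units | Note |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Minimum clock cycle time (DLL off mode) | tCK (DLL_OFF) | 8 | - | 8 | - | 8 | - | ns | 22 |
| Average clock period | tCK (avg) | TBD | | | | | | ps | |
| Average high pulse width | tCH (avg) | 0.48 | 0.52 | 0.48 | 0.52 | 0.48 | 0.52 | tCK (avg) | |
| Average low pulse width | tCL (avg) | 0.48 | 0.52 | 0.48 | 0.52 | 0.48 | 0.52 | tCK (avg) | |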


Shrinking the Window Even More: Calculating the PLL Jitter

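[Figure: four frequency-domain plots, current profile I(f), PDN impedance Z(f), jitter sensitivity S(f), and PSRR of PLL P(f), multiplied to give the jitter spectrum J(f); an inverse FFT then yields the TIE jitter j(t), from which the p-p jitter is read]

$I(f) \times Z(f) \times S(f) \times P(f) = J(f) \xrightarrow{\text{iFFT}} j_{\text{TIE}}(t)$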
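
The flow on this slide (multiply the four spectra, then inverse-transform to get time-domain jitter) can be sketched in a few lines. The example below uses a plain inverse DFT so that it needs no third-party library. All spectra and units in the usage example are made-up placeholders, not measured data.

```python
import cmath
import math


def inverse_dft(spectrum):
    """Inverse discrete Fourier transform of a list of complex bins."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def tie_jitter(current, impedance, sensitivity, psrr):
    """j(t) = iDFT of J(f), where J(f) = I(f) * Z(f) * S(f) * P(f) per bin."""
    if not (len(current) == len(impedance) == len(sensitivity) == len(psrr)):
        raise ValueError("all spectra must have the same number of bins")
    spectrum = [i * z * s * p
                for i, z, s, p in zip(current, impedance, sensitivity, psrr)]
    # A physical spectrum is conjugate-symmetric, so the result is real.
    return [sample.real for sample in inverse_dft(spectrum)]


def peak_to_peak(samples):
    return max(samples) - min(samples)


# Usage: a single current tone at bin 1 (and its mirror bin), flat Z, S, and P.
n = 16
current = [0.0] * n
current[1] = current[n - 1] = 8.0
jitter = tie_jitter(current, [0.5] * n, [2.0] * n, [0.1] * n)
print(peak_to_peak(jitter))  # about 0.2, in arbitrary units
```

In practice each spectrum would come from simulation or measurement, and an FFT library would replace the O(n²) loop.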


DDR4 Bank Group Timing

Different timing within a group and between groups (tCCD, tWTR, tRRD)
− “Long” timing: bank-to-bank within a group
− “Short” timing: access to different bank groups
Maintain array timing requirements within bank group
Maintain speed between different bank groups
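
A small sketch shows why a controller prefers to alternate bank groups. It models only the column-to-column delay (tCCD) and uses placeholder values of 4 clocks for the short timing and 6 clocks for the long timing. These values are assumptions for illustration, since the actual long timing depends on the speed bin.

```python
def ccd_cycles(prev_group: int, next_group: int,
               tccd_s: int = 4, tccd_l: int = 6) -> int:
    """Clocks required between two column commands.

    Same bank group uses the long timing, different groups the short one.
    """
    return tccd_l if prev_group == next_group else tccd_s


def issue_cycles(groups, tccd_s: int = 4, tccd_l: int = 6):
    """Earliest issue cycle of each back-to-back column command."""
    cycles = []
    for index, group in enumerate(groups):
        if index == 0:
            cycles.append(0)
        else:
            gap = ccd_cycles(groups[index - 1], group, tccd_s, tccd_l)
            cycles.append(cycles[-1] + gap)
    return cycles


print(issue_cycles([0, 0, 0, 0]))  # all in one bank group
print(issue_cycles([0, 1, 2, 3]))  # rotating across bank groups
```

With these placeholder values, four accesses to one bank group finish issuing at cycle 18, while four accesses rotated across groups finish at cycle 12.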

[Figure: four bank groups (Bank Group 0 to 3), each containing Bank 0 to Bank 3; short timings apply between bank groups, long timings apply within a bank group]


Calibration Is Critical to Shrinking Margins

[Chart: margin (ns), axis from -0.1 to 0.5, broken down into FPGA effects, external effects, calibration effects, and calibration uncertainty]

No Margin Without Calibration


What is Calibration?

Resync Calibration
Benefit: Accurate strobe placement; more resync margin
[Figure: valid data window across DQ0 to DQ71 over a phase sweep from 0 to 360]

VT Compensation
Voltage and temperature tracking: data shifts due to VT variations
Benefit: Dynamic phase adjustment to match shifting data valid window; robust over VT

Capture Calibration (De-skew)
Benefit: Reduce skew between data group; more capture margin
[Figure: DQ0 to DQ7 over a phase sweep from 0 to 180. Before de-skew: small valid capture window. After de-skew: maximized valid capture window]
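
The effect of capture calibration can be sketched as follows. Each DQ bit has its own valid window, the usable capture window is the overlap of all of them, and per-bit de-skew adds delay to the early bits until the windows line up. The window positions in the usage example are made up, and the sketch assumes delay can only be added, never removed.

```python
def common_window(windows):
    """Overlap (start, end) of all per-bit valid windows, or None if empty."""
    start = max(s for s, _ in windows)
    end = min(e for _, e in windows)
    return (start, end) if end > start else None


def deskew(windows):
    """Return (delays, aligned_windows).

    Each bit is delayed so its window starts where the latest bit's does.
    """
    latest_start = max(s for s, _ in windows)
    delays = [latest_start - s for s, _ in windows]
    aligned = [(s + d, e + d) for (s, e), d in zip(windows, delays)]
    return delays, aligned


# Four DQ bits, each valid for 90 phase units but skewed against each other.
before = [(15, 105), (45, 135), (30, 120), (60, 150)]
delays, after = deskew(before)
print(common_window(before))  # (60, 105): only 45 units usable
print(delays)                 # [45, 15, 30, 0]
print(common_window(after))   # (60, 150): the full 90 units
```

After de-skew, the common window is as wide as the narrowest individual bit window, which is the most that alignment alone can recover.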


High-Level Output Topology

Calibration knobs
− DQ-out1 and DQ-out2 delay: Control the delay applied to outgoing DQ pins
− DQS-out1 and DQS-out2 delay: Control the delay applied to outgoing DQS pins
− Write leveling output: Changes the delay on both DQ and DQS relative to the memory clock, in phase taps

[Figure: output path. CLK drives the X and X+90 phases under ptap control; DQS passes through the DQS OUT1 and DQS OUT2 delays (DQS out dtap1 and dtap2 control); DQ passes through the DQ OUT1 and DQ OUT2 delays (DQ out dtap1 and dtap2 control)]


High-Level Input Topology

Calibration knobs
− DQ-in delay: Control the delay applied to incoming DQ pins
− DQS-in delay: Control the delay applied to incoming DQS pins
− LFIFO: Controls number of cycles after read command that data is read out of the LFIFO
− DQS-En phase: Control the delay on DQS En in phase taps
− DQS-En delay: Control the delay on DQS En in dtaps
− VFIFO: Adjusts the delay in cycles applied to controller-provided DQS burst signal to generate DQS enable

[Figure: input path. DQS passes through the DQS IN delay (DQS in dtap control) and the DQS delay chain; DQ passes through the DQ IN delay (DQ in dtap control) into DDIOin and the LFIFO (lfifo control); DQS enable is generated from the VFIFO (vfifo control), the X phase (dqs_en ptap control), and the DQS En delay (DQS en dtap control)]


Calibration Stages

DQS-enable calibration
− Calibrate DQS enable (delayed read data valid) relative to DQS
Post-amble tracking
− Track DQS-enable across temperature variation
Read data deskew
− Calibrate DQS relative to read command (read leveling)
− Calibrate DQ versus DQS (per-bit deskew) for reads
LFIFO training
− Calibrate LFIFO delay cycles (read latency)
Write leveling
− Calibrate DQS and DM to write command (write leveling)
Write data deskew
− Calibrate DQ versus DQS (per-bit deskew) for writes
Address/command training (leveling and deskew)
− Calibrate CS, CAS, RAS, and ODT versus memory clock
VREF training (FPGA and memory)
− Calibrates receiver voltage threshold (for DDR4 with pseudo open drain DQs)

[Flowchart: Start; wait for PLL/DLL locking; initialize INST/AC ROM for all pins on this memory interface; initialize the memory (mode registers, etc.); calibrate the memory interface; repeat the calibration loop until all memory interfaces are calibrated; then enter the user mode loop, which processes any user command found in DPRIO or in RAM]


Calibration Is Critical to Shrinking Margins

[Chart: margin (ns), axis from -0.1 to 0.5, broken down into FPGA effects, external effects, calibration effects, and calibration uncertainty]

No Margin Without Calibration


Good Layout Practices for DDR4


DDR4 Output Driver

DDR3 – Push-Pull DDR4 – Pseudo Open Drain


Content Courtesy of Micron


Unadjusted, Non-Terminated Data Eye

[Figure: unterminated data eye between VDD and VSS, annotated with jitter, overshoot, and undershoot]

Terminated Data Eye

[Figure: terminated data eye annotated with VIHac, VIHdc, Vref, VILdc, VILac, hi-ringback, lo-ringback, overshoot, and undershoot]

OCT from the Controller Standpoint

DQ and CA pins are terminated differently in DDR4

| Category | Specification | DDR3 | DDR4 |
| --- | --- | --- | --- |
| Density / Speed | | 512 Mb ~ 8 GB, 1.6 ~ 2.1 Gbps | 2 GB ~ 16 GB, 1.6 ~ 3.2 Gbps |
| Interface | Voltage (VDD / VDDQ / VPP) | 1.5 V / 1.5 V / NA (1.35 V / 1.35 V / NA) | 1.2 V / 1.2 V / 2.5 V |
| Interface | VREF | External VREF (VDD / 2) | Internal VREF (need training) |
| Interface | Data I/Os | CTT (34 ohm) | POD (34 ohm) |
| Interface | CMD/ADDR I/Os | CTT | CTT |
| Interface | Strobe | Bi-directional / differential | Bi-directional / differential |
| Core architecture | Number of banks | 8 | 16 (4 bank groups) |
| Core architecture | Page size (x4 / x8 / x16) | 1 KB / 1 KB / 2 KB | 512 B / 1 KB / 2 KB |
| Core architecture | Number of prefetch | 8 bits | 8 bits |
| Core architecture | Added function | RESET / ZQ / Dynamic ODT | + CRC / DBI / Multi preamble |
| Physical | Package type / balls (x4, x8 / x16) | 78 / 96 BGA | 78 / 96 BGA |
| Physical | DIMM type | R, LR, U, SoDIMM | + ECC SoDIMM |
| Physical | DIMM pins | 240 (R, LR, U) / 204 (So) | 284 (R, LR, U) / 256 (So) |


OCT Calibration Scheme to Support DDR4

OCT can calibrate 2 times with 2 sets of pins (DQ/CA)
DQ and CA pins will have 2 different sets of codes in DDR4

[Figure: OCT calibration scheme for DDR3 versus DDR4]

General Layout Concerns

Avoid crossing splits in the power plane
SSO on controller can collapse strobes/clocks
− Separate supplies and/or flip-chip packaging helps
Low-pass VREF filtering on controller helps
Minimize VREF noise
Minimize intersymbol interference (ISI)
Minimize crosstalk

Layout and Termination (1/12)

Signal integrity review
− Importance of transmission line theory
  Today’s clock rates are too fast to ignore
− Matched impedance line is important for good signaling
  Mismatched impedance lines result in reflections
  Termination schemes are used to reduce / eliminate reflections
− Good power bussing is paramount to reducing SSO
  SSO reduces voltage and timing margins
− Decoupling capacitor needs and requirements
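
The point about mismatched impedance can be made concrete with the standard transmission-line reflection coefficient, Γ = (Z_load − Z0) / (Z_load + Z0). The sketch below is a generic illustration of that formula and the impedance values are arbitrary examples, not figures from the slides.

```python
def reflection_coefficient(z_load_ohm: float, z0_ohm: float) -> float:
    """Fraction of the incident voltage wave reflected at the load."""
    if z_load_ohm + z0_ohm == 0:
        raise ValueError("z_load_ohm + z0_ohm must be non-zero")
    return (z_load_ohm - z0_ohm) / (z_load_ohm + z0_ohm)


for z_load in (50.0, 100.0, 25.0):
    gamma = reflection_coefficient(z_load, 50.0)
    print(f"Z_load = {z_load:5.1f} ohm on a 50 ohm line: gamma = {gamma:+.3f}")
```

A termination equal to the line impedance gives Γ = 0, so no energy is reflected. Halving or doubling the load impedance reflects one third of the incident wave, with opposite signs.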


Layout and Termination (2/12)

Signal integrity analysis is paramount to developing cost-effective high-speed memory systems
− Develop timing budget for proof of concept
− Use models to simulate
− Board skews are important and should be accounted for
− ISI, crosstalk, VREF noise, path length matching, Cin and RTT mismatch: employ industry practices and assumptions
− Model vias too
− Eliminate return path discontinuities (RPDs)
− Minimize SSO effects
  Difficult to model

Layout and Termination (3/12)

DRAM and controller package parasitics are fixed
− SSO effects already contained in their specified timings
  However, these are to test conditions with specific decoupling
Power delivery network (PDN) for the controller and DRAM needs to be properly designed
Lowering power supply inductance minimizes signaling variations between devices
− Use power and ground planes wherever possible
− Make all power and ground traces as fat as possible
− Couple power and ground as much as possible
  Lowers inductance (mutual effects)

Layout and Termination (4/12)

SSO
− Timing and noise issues generated due to rapid changes in voltage and current caused by multiple circuits switching simultaneously in the same direction
Problems caused by SSO
− False triggers due to power/ground bounce
− Reduced timing margin due to SSO-induced skew
− Reduced voltage margin due to power/ground noise
− Slew rate variation

Layout and Termination (5/12)

Good power bussing is paramount to reducing SSO

Reduce L (power delivery effective inductance)
− Use planes for power and ground distribution
− Proper routing of power and ground traces to devices
− Proper use of decoupling capacitance
  Locate as close as possible to the component pins
Reduce dI/dt (switching current slew rate)
− Use the slowest drive edge that will work
− Use reduced drive strength instead of full drive where possible

$\Delta V = L \cdot \dfrac{dI}{dt}$
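
A rough feel for the ΔV = L · dI/dt relation comes from plugging in numbers. The sketch below treats dI/dt as a simple current step over a rise time and scales it by the number of outputs switching together. The inductance, current, and edge-rate values are hypothetical and chosen only for illustration.

```python
def supply_bounce_v(inductance_nh: float, delta_i_ma: float,
                    rise_time_ps: float, n_drivers: int = 1) -> float:
    """Approximate supply/ground bounce in volts: L * dI/dt.

    dI/dt is approximated as (n_drivers * delta_i) / rise_time.
    Unit check: nH * mA / ps = 1e-9 * 1e-3 / 1e-12 = 1 V.
    """
    if rise_time_ps <= 0:
        raise ValueError("rise_time_ps must be positive")
    return inductance_nh * (n_drivers * delta_i_ma) / rise_time_ps


print(supply_bounce_v(1.0, 10.0, 100.0))               # one driver
print(supply_bounce_v(1.0, 10.0, 100.0, n_drivers=8))  # eight switching together
print(supply_bounce_v(1.0, 10.0, 200.0, n_drivers=8))  # slower edge
```

The three levers named on the slide appear directly: lower L, fewer outputs switching at once, and a slower edge each reduce the bounce proportionally.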


Layout and Termination (6/12)

RPDs induce board noise and are difficult to model
− Splits/holes in reference planes
− Connector discontinuities
− Layer changes
Avoid RPDs if at all possible
− Avoid crossing holes/splits in reference plane
− Route signals so they reference the proper domain
− Add power/ground vias to board
  Especially in dense layer-change areas
− Place decoupling capacitors near connectors

[Figure: solid return path versus split return path]

Layout and Termination (7/12)

VREF noise
− Induces strobe-to-data skews and reduces voltage margins
− Power/ground plane noise
− Crosstalk
Minimize VREF noise
− Use widest trace practical to route
  From chip to decoupling capacitor
− Use large spacing between VREF and neighboring traces

Layout and Termination (8/12)

ISI
− Occurs when data is random
  Clocks do not have ISI
− Multiple bits on the bus at the same time
  Bus cannot settle from bit #1 before bit #2, etc.
− Signal edges jitter due to previous bit’s energy still on the bus
− Ringing due to impedance mismatches
− Low-pass structures can cause ISI
Minimize ISI
− Optimize layout
− Keep board/DIMM impedances matched
  Drive impedance should be same as Zo of transmission line
− Terminate nets
  Termination values should be the same as Zo of transmission line
− Select high-quality connector
  Matched to board/DIMM impedance
  Low mutual coupling

Layout and Termination (9/12)

Crosstalk
− Coupling on board, package, and connector from other signals, including RPDs
  Inductive coupling is typically stronger than capacitive coupling
− When aggressors fire at the same time as victim (e.g. data-to-data coupling)
  Victim edge speeds up or slows down, causing jitter
− When aggressors do not fire at the same time as victim (e.g. data-to-command/address coupling)
  Noise couples onto victim at time of aggressor switching

Layout and Termination (10/12)

Minimize crosstalk
− Keep bits that switch on same “clock” edge routed together
  Route data bits next to other data bits; never next to CMD/ADDR bits
− Isolate sensitive bits (strobes)
  If need be, route next to signals that rarely switch
− Separate traces by at least two to three (three preferred) conductor widths (more accurately, one would define by trace pitch and height above reference plane)
  Example: 5-mil trace located 5 mils from a reference plane should have a 15-mil gap to its nearest neighbors to minimize crosstalk
− Choose a high-quality connector
− Run traces as stripline (as opposed to microstrip)
  Not at the cost of additional vias
− Maintain good references for signals and their return paths
− Avoid RPDs
− Keep driver, board Zo, and ODT selections well matched

Layout and Termination (11/12)

Cin mismatch
− Differing input capacitances on receiver pins
− Adds skew to input timings
RTT mismatch
− Termination resistors not at nominal value
− Internal ODT on data pins has smaller variation than on DDR2
  They are calibrated (so is DRAM’s Ron)
− External termination resistor variation must be accounted for
  Consider one-percent resistors

Layout and Termination (12/12)

High-speed signals must maintain a solid reference plane
− Reference plane may be either VDD or ground
− For DDR3 UDIMM systems, the DQ busses are referenced to ground while the ADDR/CMD and clock are referenced to VDD
− All signals may be referenced to ground if the layout allows
Best signaling is obtained when a constant reference plane is maintained
− If this is not possible, try to make the transitions near decoupling capacitors

[Figure: signal changing reference between the power plane and the ground plane next to a decoupling capacitor]

Board Debug Tools to Minimize Issues


TimeQuest DDR Timing: Read Capture

[Figure: TimeQuest read capture margin report, annotated as follows]
− “Before calibration” is the standard timing analysis
− Calibrating to the FPGA variations (deskew + pessimism removal)
− Calibrating out some of the process variation in the memory
− Errors in the calibration algorithm
− Effects of temperature and voltage changes on the calibration
− Total margin after calibration

EMIF Debug Toolkit Features

Reports results of the last calibration to the user
− Reports interface details, margins observed before calibration, settings made during calibration, and post-calibration margins
− In the case of a calibration failure, toolkit reports the stage at which calibration failed and the group
Provides eye monitor support
Provides loopback support
Allows user interaction with memory interface
− Send commands to the memory interface to recalibrate, mask groups and ranks
− Eye monitor support of data valid window
− Loopback support for bit error rate (BER) testing

[Screenshot: EMIF Debug Toolkit with a TimeQuest-like GUI, showing the tasks section, the reports section, and the commands run in the console]

“On-Chip” EMIF Debug Toolkit

Core access to calibration data − Access same calibration data as the EMIF toolkit, now via FPGA logic

Via Avalon® Memory-Mapped (Avalon-MM) interface


Looking Ahead and Conclusion


Will There Be a DDR5?

Very unlikely
− SI for a parallel bus of 2 GHz and above would be very difficult
− Timing budget would be consumed in the package
  PDN noise
  Package skew
Transition to stacked memory
− Hybrid Memory Cube and serialized memory
− 3D memories integrated into ASICs

Conclusion

DDR4 has many ways to reduce overall system power
− ~50% lower power than DDR3 at 1.5 V
DDR4 is 33% faster than DDR3 2133
But there are challenges...
− Shrinking data valid window
− Increased signal integrity and power integrity concerns
These can be overcome by good controller design
− Innovative calibration
− Good ODT
− Careful board design
− Good board debug tools

Thank You

