
DesignCon 2005

Features and Implementation of High-Performance 667Mbs and 800Mbs DDRII Memory Systems

Dail Robert “Bob” Cox, Micron Technology, Inc.
Randy Wolff, Micron Technology, Inc.
Doug Burns, Signal Integrity Software, Inc. (SiSoft)
Barry Katz, Signal Integrity Software, Inc. (SiSoft)
Walter Katz, Signal Integrity Software, Inc. (SiSoft)

Abstract

With ever-increasing CPU speeds, the need for higher bandwidth memory systems is greater than ever. With signaling rates up to 800Mbs, DDR2 memory technology provides the required bandwidth and growth path needed for current and next generation CPUs and systems. Signal integrity, timing, and crosstalk analysis on these interfaces has become significantly more complex than for previous DDR1 technology. Historic analysis approaches do not properly calculate margin, thereby forcing costly over-design, system over-constraint, reduced margin, or reduced performance. This paper highlights key issues that must be incorporated into the design methodology for engineers performing pre-layout solution space analysis to identify topology and termination schemes or post-layout verification to validate physical implementation of DDR2 designs.

Authors Biography

Dail Robert “Bob” Cox, Simulation Engineer, Micron Technology, Inc.
Bob works as a simulation engineer for Micron Technology, where he performs signal integrity analysis, system timing analysis, and timing analysis tool development to support the module product group. Over the past couple of years, Bob has focused on JEDEC reference card architecture and performed the supporting simulations for much of the DDR1 and DDR2 JEDEC small modules task group. Bob spent time architecting, simulating, and designing motherboards for desktop computers with Micron's motherboard group. Prior to joining Micron, Bob's expertise in microprocessor, controller, video, PCI, and memory subsystems was highly effective in the development of many of Hewlett Packard’s Laserjet products. Bob has won a number of industry awards, including IEEE best product for a modem and several Mentor Graphics technology awards for DFM and High Speed Design. Bob holds patents at Micron for memory termination on modules and has patents pending.

Randy Wolff, Simulation Engineer, Micron Technology, Inc.
Randy Wolff has managed the simulation team within Micron's Module Products Group for the last three years. This team performs electrical and thermal simulations on modules as well as characterization of component and packaging parasitics. He developed Micron's IBIS and HSPICE modeling program and is currently responsible for IBIS and HSPICE model development for all DRAM products. Randy currently serves as secretary of the IBIS Open Forum. Randy graduated cum laude from Montana State University with a BSEE degree.

Barry Katz, President and CTO, Signal Integrity Software, Inc. (SiSoft)
Barry Katz is President and CTO of Signal Integrity Software, Inc. (SiSoft). Barry founded SiSoft in 1995. Throughout his career, Barry has been actively involved with leading edge high-speed design. Barry has extensive experience in developing tools to integrate and automate a wide range of signal integrity analysis processes. As CTO, Barry has played a key role in leading tool development efforts for SiSoft’s products. He has devoted much of his efforts at SiSoft to solving the problems faced by designers of leading edge high-speed systems. Barry has assembled a team of world-class experts committed to solving the most challenging high-speed design problems facing the industry today by delivering a comprehensive design methodology, software tools, and expert consulting. Barry has been a major influence on the signal integrity methodology utilized by numerous companies and has personally led multiple signal integrity design teams. He has expertise in all aspects of high-speed design including: timing analysis, interconnect analysis, crosstalk analysis, electro-magnetic modeling of interconnect, packages and connectors, IO buffer analysis and selection, decoupling analysis, simultaneously switching output analysis, interconnect topology, termination selection, clock distribution and skew analysis, and high-speed bus design. He currently serves as chairman of the IBIS Quality committee. Barry received an MSEE degree from Carnegie Mellon and a BSEE degree from the University of Florida.

Douglas J. Burns, V.P. Consulting / Chief Scientist, Signal Integrity Software, Inc. (SiSoft)
Doug Burns is Vice President Consulting Services/Chief Scientist of Signal Integrity Software, Inc. (SiSoft). Doug has over 20 years of experience leading teams in designing hardware, performing signal integrity analysis, and implementing ASICs for Honeywell, Digital, and Compaq Computer. Doug was a key contributor in developing the system architecture of Digital’s giga-Hz ALPHA processor systems. Doug provided consulting expertise across many groups within Digital and Compaq, and has a successful track record of bringing products to market. Doug holds four patents in computer system design and has three patents pending. Doug’s expertise spans a wide range of engineering disciplines including: system architecture, VLSI design, package design, chip and system timing analysis, interconnect analysis, crosstalk analysis, electro-magnetic modeling, I/O buffer analysis and selection, simultaneously switching output analysis, termination selection, clock distribution and skew analysis, and high-speed bus design. Doug graduated Magna cum Laude from the University of Massachusetts with a BSEE degree and received an MSEE degree from Northeastern University.

Walter M. Katz, PhD, Chief Scientist, Signal Integrity Software, Inc. (SiSoft)
Dr. Katz is a pioneer in the development of constraint driven printed circuit board routers. He developed SciCards, the first commercially successful auto-router. Dr. Katz founded Layout Concepts and sold routers through Cadence, Zuken, Daisix, Intergraph and Accel. More than 20,000 copies of his tools have been used worldwide. Dr. Katz developed the first signal integrity tools for a 17 MHz 32-bit minicomputer in the seventies. In 1991, IBM used his software to design a 1 GHz computer. Dr. Katz holds a PhD from the University of Rochester and a BS from Polytechnic Institute of Brooklyn.

Acknowledgements
Mike Mayer, Senior FAE, Signal Integrity Software, Inc. (SiSoft). We would like to acknowledge Mike’s contribution in the setup and support of the SiAuditor/Quantum-SI DDR2 analysis environment.

Introduction

With ever-increasing CPU speeds, the need for higher bandwidth memory systems is greater than ever. With signaling rates up to 800Mbs, DDR2 memory technology provides the required bandwidth and growth path needed for current and next generation CPUs and systems. Signal integrity, timing, and crosstalk analysis on these interfaces has become significantly more complex than for previous DDR1 technology. Historic analysis approaches do not properly calculate margin, thereby forcing costly over-design, system over-constraint, reduced margin, or reduced performance. This paper highlights key issues that must be incorporated into the design methodology for engineers performing pre-layout solution space analysis to identify topology and termination schemes or post-layout verification to validate physical implementation of DDR2 designs. These include:

• On-die termination (ODT): ODT analysis involves the dynamic setting of SDRAM and memory controller terminator characteristics based upon the network configuration and active driver location. A detailed review of DDR2 ODT rules and configurations is discussed.

• Slew rate derating and frequency dependent timing: Slew rate affects the intrinsic delay of receivers, which in turn affects the final on-die eye timing characteristics. This paper introduces the concepts of a Virtual Eye Diagram and Frequency Agile Timing models for the proper timing margin analysis of DDR2 systems.

• Populations: At DDR2 signaling rates, designs exhibit very tight waveform quality and timing margins. It is vital to explore all possible system configurations and memory module populations to identify boundary condition cases.

• Inter-Symbol Interference (ISI): With large and varied loading configurations, network topologies may not settle between switching transitions. This effect is referred to as ISI or resonance and requires the use of pseudo-random bit sequence (PRBS) patterns to identify margin impacts.

• Process, Voltage, and Temperature (PVT): Simulations must be performed over environmental and process extremes to properly account for variations in silicon and etch characteristics. With source-synchronous timing integral to DDR designs, proper accounting of correlative and non-correlative effects is required.

The organization of the paper is as follows: First we present an introduction to DDR memory. This is followed by a section describing the key enhancements of DDR2 over DDR1 and the unique challenges associated with designing DDR2 memory systems. This leads into a discussion on I/O buffer modeling challenges. The paper culminates with analysis methodology requirements for DDR2 and results for a standard 667Mbs and 800Mbs DDR2 reference design. The DDR2 SoDIMM will be used as the design and analysis example for the paper.

Overview of DDR Memory

Theory of Operation

Double Data-Rate (DDR) SDRAM memory refers to a type of memory architecture in which data is transferred on both the rising and falling edges of the clock. In practice, multiple clocks are used to enable DDR operation. For both DDR1 and DDR2, the rising edge of the main clock (CK) is used to clock address, command and control (ADDCMD and CTRL) into the memory device and is also used as the clock for the memory core. DDR1 devices have a CK period that ranges from 10ns (100MHz) to 5ns (200MHz) and DDR2 devices have a CK period that ranges from 5ns (200MHz) to 2.5ns (400MHz). Either one bi-directional strobe or two uni-directional strobes (DQS) are source-synchronously driven with the data (DQ) and data mask (DM) from the source device and are used to capture the data at the respective target device. DQ, DM, and DQS operate at the same rate as CK, with the DQ being clocked by both edges of DQS. As part of their physical implementation, both the memory controller and memory devices have two latches on the same DQ bit (Figure 1) with one latch capturing data on the rising edge of DQS and the other capturing data on the falling edge.

Figure 1: Data and Data Strobe Simple Schematic

For the source-synchronous system to capture the appropriate value at the core, two conditions must be true. First, both the rising and falling edges of DQS must be positioned to ensure that they meet the setup and hold requirements of DQ. Second, since the memory and controller cores are clocked on the CK positive edge and DQ is clocked by DQS, a fixed relationship between CK and DQS must be maintained. For proper operation, the rising edge of CK must occur after the falling edge of DQS, but before the next rising edge of DQS. In practice, ADDCMD and CTRL are pre-launched by ½ of the CK period (Figure 2). This is defined as a minimum launch of the address on the falling edge of the previous clock. In a registered system where the CK and ADDCMD/CTRL drivers are similar and each of the respective nets goes to a single receiver on a memory module, this pre-launch is desirable and the effect is to center the clock in the middle of the ADDCMD/CTRL valid window. For unbuffered systems, the load variation between ADDCMD, CTRL, and CK increases (see Table 1 and Table 2) and an additional CK adjustment may be required. In addition, heavy loads in the unbuffered case may require that the address be driven over multiple cycles. For Writes, DQ and CK are driven from the controller at the same time and DQS is delayed by ¼ of the CK period (Figure 2). For Reads, the incoming DQS signal is delayed by ¼ of the CK period. The controller generally performs these shifts; otherwise, a delay must be built into the system. This would typically be done with PCB traces, but doing so is less than ideal because of the excessive etch lengths required and the inherent PCB delay variations, which would introduce skew.

Figure 2: Simple DDR Timing Diagram
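The half-period and quarter-period shifts described above follow directly from the data rate. The short Python sketch below works through that arithmetic; the function name and dictionary keys are illustrative and do not come from any tool referenced in this paper.

```python
def ddr_timing(data_rate_mbps):
    """Derive basic CK timing from a per-pin data rate in Mb/s.

    Assumes two data transfers per CK cycle (double data rate), a 1/2 CK
    ADDCMD/CTRL pre-launch, and a 1/4 CK DQS shift, as described in the text.
    """
    ck_mhz = data_rate_mbps / 2.0        # one CK cycle carries two transfers
    ck_period_ns = 1000.0 / ck_mhz       # period in ns for a frequency in MHz
    return {
        "ck_mhz": ck_mhz,
        "ck_period_ns": ck_period_ns,
        "addcmd_prelaunch_ns": ck_period_ns / 2.0,
        "dqs_shift_ns": ck_period_ns / 4.0,
    }


if __name__ == "__main__":
    for rate in (400, 667, 800):
        print(rate, ddr_timing(rate))
```

For an 800Mbs interface this gives a 2.5ns CK period, a 1.25ns pre-launch, and a 0.625ns DQS shift, which shows how little time is left to absorb skew at these rates.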

Electrical Signaling Environment

Stub Series Terminated Logic (SSTL) is the parent signaling architecture from which both DDR1 and DDR2 memory systems evolved. SSTL introduced the concept of AC and DC signaling levels. By defining AC and DC signaling levels, tighter setup and hold specifications could be developed based upon the receiver input waveform. DDR1 is based on a 2.5V VDDQ while DDR2 is based on a 1.8V VDDQ. For both DDR1 and DDR2, signals are required to transition from a DC level to the next logic state AC level and then settle out to be at or above the same logic level DC value. Nominally, interconnect delays are measured at VREF, relative to the standard load waveform. DDR2 extends how interconnect delays are calculated based on slew rate, as discussed later in the paper. The SSTL signaling levels for DDR1 and DDR2 are shown in Figure 3 and Figure 4 respectively.

Figure 3: SSTL_2/DDR1 Signaling Levels

Figure 4: SSTL_18/DDR2 Signaling Levels
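The AC/DC level rule above can be expressed as a simple waveform check. The Python sketch below is a minimal illustration for a logic-high transition. The function name is hypothetical, and the threshold values in the usage example (VREF of 0.9V, an AC level 0.25V above VREF, a DC level 0.125V above VREF) are assumptions for illustration only; the actual levels should be taken from the device data sheet.

```python
def is_valid_high(samples, vih_ac, vih_dc):
    """Return True if the waveform reaches vih_ac and afterwards never
    falls below vih_dc (the AC-then-DC rule for a logic high).

    samples: receiver voltages in time order (volts).
    """
    reached_ac = False
    for v in samples:
        if not reached_ac:
            if v >= vih_ac:
                reached_ac = True      # AC level reached, state is now valid
        elif v < vih_dc:
            return False               # ring-back below the DC level
    return reached_ac


if __name__ == "__main__":
    VREF = 0.9                         # assumed: VDDQ / 2 for a 1.8V VDDQ
    VIH_AC = VREF + 0.25               # assumed example AC level
    VIH_DC = VREF + 0.125              # assumed example DC level
    print(is_valid_high([0.6, 0.9, 1.2, 1.1, 1.15], VIH_AC, VIH_DC))
    print(is_valid_high([0.6, 0.9, 1.2, 1.0, 1.2], VIH_AC, VIH_DC))
```

A logic-low check would mirror this with the low AC and DC levels and the comparisons reversed.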

System Configuration Loading and Transfers

Generically, memory systems can have from one to four memory modules. However, memory module loading is dependent on the type of memory modules being utilized. There are different types of memory modules including registered DIMMs, unbuffered DIMMs, SoDIMMs, micro DIMMs, and mini DIMMs. Some of these are inherently 64 bit, and others are 72 bit and support ECC but can be configured as 64 bit. Some have registers for the ADDCMD/CTRL signals and PLLs for clock distribution and tuning. In addition, the number of SDRAMs per memory module can vary through the use of different width components, stacked components, or banks/ranks.

Figure 5: Memory Interface Block Diagram (a DDR memory controller connected to two DDR DIMMs through the CTRL, DQ, DQS, DM, CK, and ADDCMD net classes)

Figure 5 shows a system level block diagram of a two DIMM system with DIMM loading per net class as well as net class directionality. Table 1 and Table 2 show the range of loading configurations for one-slot and two-slot SoDIMM configurations for DDR1 and DDR2 respectively. The most notable differences are the removal of ECC from DDR2 SoDIMMs and the reduction from three to two copies of CK to the SoDIMM.

Table 1: DDR1 SoDIMM Loading and Transfers

Net Class   DIMMs/Net Class   Loads/SoDIMM      Total Load Variation   Valid Transfers
            Min     Max       Min     Max       Min     Max
DQ          1       2         1       2         1       4              Controller↔SDRAM
DQS         1       2         1       2         1       4              Controller↔SDRAM
DM          1       2         1       2         1       4              Controller→SDRAM
ADDCMD      1       2         4*/5    16*/18    4*/5    32*/36         Controller→SDRAM
CTRL        1       1         4*/5    8*/9      4*/5    8*/9           Controller→SDRAM
CK          1       1         2       6         2       6              Controller→SDRAM

* Loading without ECC

Table 2: DDR2 SoDIMM Loading and Transfers

Net Class   DIMMs/Net Class   Loads/SoDIMM      Total Load Variation   Valid Transfers
            Min     Max       Min     Max       Min     Max
DQ          1       2         1       2         1       4              Controller↔SDRAM
DQS         1       2         1       2         1       4              Controller↔SDRAM
DM          1       2         1       2         1       4              Controller→SDRAM
ADDCMD      1       2         4       16        4       32             Controller→SDRAM
CTRL        1       1         4       8         4       8              Controller→SDRAM
CK          1       1         2       8         2       8              Controller→SDRAM

DDR2 SoDIMM Raw Cards and Populations

There are six basic SoDIMM memory module designs. These are referred to as JEDEC Raw Cards A-F (RCA-RCF). Table 3 shows the configuration of each of these modules. The differences between the modules are the SDRAM width, stacked devices, component count, and number of ranks.

Table 3: DDR2 SoDIMM Raw Card Configurations

DDR2 SoDIMM Raw Card   SDRAM Width    # SDRAMs   # Ranks
A                      x16            8          2
B                      x8             8          1
C                      x16            4          1
D                      x8 (stacked)   16         2
E                      x8             16         2
F                      x8             16         2

For a one-slot implementation, it is easy to see that there are six unique cases that need to be analyzed. For a two-slot system, there are 48 unique populations that need to be analyzed for complete analysis coverage as is shown in Table 4.

Table 4: DDR2 SoDIMM two-slot populations

Population Name      Slot 1  Slot 2   Population Name      Slot 1  Slot 2   Population Name      Slot 1  Slot 2
mb_2slot_RCA_Empty   RCA     Empty    mb_2slot_RCC_Empty   RCC     Empty    mb_2slot_RCE_Empty   RCE     Empty
mb_2slot_RCA_RCA     RCA     RCA      mb_2slot_RCC_RCA     RCC     RCA      mb_2slot_RCE_RCA     RCE     RCA
mb_2slot_RCA_RCB     RCA     RCB      mb_2slot_RCC_RCB     RCC     RCB      mb_2slot_RCE_RCB     RCE     RCB
mb_2slot_RCA_RCC     RCA     RCC      mb_2slot_RCC_RCC     RCC     RCC      mb_2slot_RCE_RCC     RCE     RCC
mb_2slot_RCA_RCD     RCA     RCD      mb_2slot_RCC_RCD     RCC     RCD      mb_2slot_RCE_RCD     RCE     RCD
mb_2slot_RCA_RCE     RCA     RCE      mb_2slot_RCC_RCE     RCC     RCE      mb_2slot_RCE_RCE     RCE     RCE
mb_2slot_RCA_RCF     RCA     RCF      mb_2slot_RCC_RCF     RCC     RCF      mb_2slot_RCE_RCF     RCE     RCF
mb_2slot_RCB_Empty   RCB     Empty    mb_2slot_RCD_Empty   RCD     Empty    mb_2slot_RCF_Empty   RCF     Empty
mb_2slot_RCB_RCA     RCB     RCA      mb_2slot_RCD_RCA     RCD     RCA      mb_2slot_RCF_RCA     RCF     RCA
mb_2slot_RCB_RCB     RCB     RCB      mb_2slot_RCD_RCB     RCD     RCB      mb_2slot_RCF_RCB     RCF     RCB
mb_2slot_RCB_RCC     RCB     RCC      mb_2slot_RCD_RCC     RCD     RCC      mb_2slot_RCF_RCC     RCF     RCC
mb_2slot_RCB_RCD     RCB     RCD      mb_2slot_RCD_RCD     RCD     RCD      mb_2slot_RCF_RCD     RCF     RCD
mb_2slot_RCB_RCE     RCB     RCE      mb_2slot_RCD_RCE     RCD     RCE      mb_2slot_RCF_RCE     RCF     RCE
mb_2slot_RCB_RCF     RCB     RCF      mb_2slot_RCD_RCF     RCD     RCF      mb_2slot_RCF_RCF     RCF     RCF
mb_2slot_Empty_RCA   Empty   RCA      mb_2slot_Empty_RCC   Empty   RCC      mb_2slot_Empty_RCE   Empty   RCE
mb_2slot_Empty_RCB   Empty   RCB      mb_2slot_Empty_RCD   Empty   RCD      mb_2slot_Empty_RCF   Empty   RCF
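The count of 48 follows from allowing each slot to hold any of the six raw cards or be empty, and excluding the case where both slots are empty (7 x 7 - 1 = 48). The Python sketch below enumerates the populations using the naming pattern of Table 4; the function name is illustrative.

```python
from itertools import product

RAW_CARDS = ["RCA", "RCB", "RCC", "RCD", "RCE", "RCF"]


def two_slot_populations():
    """Enumerate every two-slot SoDIMM population as (name, slot1, slot2).

    Each slot holds one of the six raw cards or is empty; the case with
    both slots empty is excluded because there is nothing to analyze.
    """
    options = RAW_CARDS + ["Empty"]
    return [
        (f"mb_2slot_{s1}_{s2}", s1, s2)
        for s1, s2 in product(options, repeat=2)
        if not (s1 == "Empty" and s2 == "Empty")
    ]


if __name__ == "__main__":
    populations = two_slot_populations()
    print(len(populations))
    for name, slot1, slot2 in populations[:3]:
        print(name, slot1, slot2)
```

Generating the list programmatically makes it harder to miss a boundary case when the analysis is scripted.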

DDR2 Memory Enhancements

At this point, we have reviewed the basic operation of the DDR memory architecture and explored the large number of variables in loading and populations that must be examined to prove reliable system operation. Moving on to DDR2, numerous enhancements and innovations in technology and methodology were required to achieve performance gains. On the technology side, there were changes to power, packaging, termination, circuit design, and memory module topologies. The VDDQ power rail was reduced from a nominal value of 2.5V to 1.8V and the associated switching thresholds were scaled accordingly (Figure 3 and Figure 4). This benefited overall power consumption but reduced noise margins. The packaging changed to an FBGA style package for enhanced electrical performance. An optionally differential DQS was added to reduce skew uncertainty. On-die termination (ODT) was added to improve signal quality and reduce component count. Standard memory module topologies and terminations were optimized for ADDCMD and CTRL net classes to mitigate the effects of ISI and resonance, which were amplified in unbalanced populations (Table 4).

On the methodology side, two major challenges needed to be addressed. First, slew rate derating tables were introduced to allow more accurate prediction of setup and hold margins based on the knowledge of the actual receiver response to waveforms at the receiver. Accurate analysis requires that the methodology be able to capture the slew rate of each edge and apply the appropriate slew rate derating on an edge-by-edge basis. Second, ODT introduces even more combinations of events that must be simulated. To be able to fully examine DDR2 interfaces, a comprehensive, accurate, and automated analysis methodology that can analyze all boundary condition cases for system level noise, waveform quality, and timing margin is required.

To clearly communicate the enhancements of DDR2, this paper will focus on three of the enhancements that affect designers: on-die termination, slew rate derating, and ADDCMD/CTRL topology and termination. Signal integrity and timing analysis methodology for DDR2 is also discussed.

On-Die Termination (ODT)

In DDR1, signal termination was accomplished with discrete resistor terminations on the motherboard. For DDR2, the DQ, DM, and DQS nets all utilize ODT. JEDEC’s design analysis of various memory configurations determined the desired termination programmability to be 50Ω, 75Ω, 150Ω, and ‘off’ for both controller and memory receivers. This termination is in addition to the 22Ω series resistor on all memory module DQ, DM, and DQS nets. ODT functionality is not available for ADDCMD or CTRL nets in DDR2. ODT allows a number of variable termination settings to be made through controller and memory device ODT selection and enablement. This is very desirable, as motherboard/memory module configurations for data nets can vary depending upon the architecture one may need to support. An example of this would be a 4 module system with 1 rank of memory per module versus a 2 module system with 2 ranks of memory on each module. Each would yield a total of 4 memory devices on the net, but the location of the memory devices could vary dramatically, thus requiring completely different ODT enablement to optimize signal integrity.

During write operations, ODT is turned off at the controller and enabled at a memory that is not being written to (if one exists). Be aware that the signal integrity at the receiver with active ODT can be poor, but this is generally not of interest for timing since that receiver is not active. One does, however, need to check all receivers, including those that have ODT enabled, to ensure that voltage overshoot thresholds that could damage the memory input structure are not violated. Table 5 provides the recommended ODT setting for SDRAMs and the controller based on SDRAM location, system population, and write destination.

Table 5: Recommended ODT Matrix for Write operation

Write Configurations: DQ Active-Term Resistance

Configuration   Write to   Controller   DRAM at Slot 1                 DRAM at Slot 2
                                        Front Side      Back Side      Front Side      Back Side
2R / 2R         Slot 1     No Term      No Term         No Term        50 or 75 ohm    No Term
2R / 2R         Slot 2     No Term      50 or 75 ohm    No Term        No Term         No Term
2R / 1R         Slot 1     No Term      No Term         No Term        50 or 75 ohm    Empty
2R / 1R         Slot 2     No Term      50 or 75 ohm    No Term        No Term         Empty
1R / 2R         Slot 1     No Term      No Term         Empty          50 or 75 ohm    No Term
1R / 2R         Slot 2     No Term      50 or 75 ohm    Empty          No Term         No Term
1R / 1R         Slot 1     No Term      No Term         Empty          50 or 75 ohm    Empty
1R / 1R         Slot 2     No Term      50 or 75 ohm    Empty          No Term         Empty
2R / Empty      Slot 1     No Term      150 ohm         No Term        Empty           Empty
Empty / 2R      Slot 2     No Term      Empty           Empty          150 ohm         No Term
1R / Empty      Slot 1     No Term      150 ohm         Empty          Empty           Empty
Empty / 1R      Slot 2     No Term      Empty           Empty          150 ohm         Empty

During read operations, ODT is set to either 150Ω or 75Ω at the controller depending upon the memory configuration and which memory is driving. As with write operations, ODT is enabled at a memory that is not being read from (if one exists). As with writes, signals on the memory modules are not a signal integrity or timing concern, other than to ensure that overvoltage conditions do not exist. Table 6 provides the recommended ODT setting for SDRAMs and the controller based on SDRAM location, system population, and read transfer.

Table 6: Recommended ODT Matrix for Read operation

Read Configurations: DQ Active-Term Resistance

Configuration   Read from   Controller      DRAM at Slot 1                 DRAM at Slot 2
                                            Front Side      Back Side      Front Side      Back Side
2R / 2R         Slot 1      150 ohm         No Term         No Term        50 or 75 ohm    No Term
2R / 2R         Slot 2      150 ohm         50 or 75 ohm    No Term        No Term         No Term
2R / 1R         Slot 1      150 ohm         No Term         No Term        50 or 75 ohm    Empty
2R / 1R         Slot 2      150 ohm         50 or 75 ohm    No Term        No Term         Empty
1R / 2R         Slot 1      150 ohm         No Term         Empty          50 or 75 ohm    No Term
1R / 2R         Slot 2      150 ohm         50 or 75 ohm    Empty          No Term         No Term
1R / 1R         Slot 1      150 ohm         No Term         Empty          50 or 75 ohm    Empty
1R / 1R         Slot 2      150 ohm         50 or 75 ohm    Empty          No Term         Empty
2R / Empty      Slot 1      50 or 75 ohm    No Term         No Term        Empty           Empty
Empty / 2R      Slot 2      50 or 75 ohm    Empty           Empty          No Term         No Term
1R / Empty      Slot 1      50 or 75 ohm    No Term         Empty          Empty           Empty
Empty / 1R      Slot 2      50 or 75 ohm    Empty           Empty          No Term         Empty
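The entries in Table 5 and Table 6 follow a regular pattern that can be captured in a few rules, which is convenient when ODT settings must be generated for many simulation cases. The Python sketch below is one possible encoding of that pattern, not a JEDEC-defined algorithm; the function name, argument layout, and return structure are illustrative.

```python
NO_TERM, EMPTY = "No Term", "Empty"


def recommended_odt(op, ranks, target_slot):
    """Return the recommended DQ termination following Tables 5 and 6.

    op: "write" or "read".
    ranks: (ranks in slot 1, ranks in slot 2), each 0, 1, or 2.
    target_slot: 1 or 2, the slot being written to or read from.
    Each slot is reported as a (front side, back side) pair.
    """
    if op not in ("write", "read"):
        raise ValueError("op must be 'write' or 'read'")
    if target_slot not in (1, 2) or ranks[target_slot - 1] == 0:
        raise ValueError("target slot must be populated")

    def slot_state(n):
        # Front side is populated for 1R and 2R, back side only for 2R.
        return [NO_TERM if n >= 1 else EMPTY, NO_TERM if n == 2 else EMPTY]

    slots = {1: slot_state(ranks[0]), 2: slot_state(ranks[1])}
    other = 2 if target_slot == 1 else 1

    if ranks[0] > 0 and ranks[1] > 0:
        # Two modules: terminate at the front side of the inactive module.
        controller = NO_TERM if op == "write" else "150 ohm"
        slots[other][0] = "50 or 75 ohm"
    elif op == "write":
        # Single module write: terminate at the front side being written.
        controller = NO_TERM
        slots[target_slot][0] = "150 ohm"
    else:
        # Single module read: only the controller terminates.
        controller = "50 or 75 ohm"

    return {"controller": controller,
            "slot1": tuple(slots[1]),
            "slot2": tuple(slots[2])}


if __name__ == "__main__":
    print(recommended_odt("write", (2, 1), 2))
    print(recommended_odt("read", (1, 0), 1))
```

Deriving the settings from rules rather than hand-entering every row lets a simulation setup be checked against the tables programmatically.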

The use of ODT greatly reduces component count on the motherboard and simplifies system implementation. The tradeoff is that the designer must now keep track of more variables in their simulations. Design methodology is critical to obtaining accurate and repeatable results. Traditional simulation methodologies perform simulations on all driver/receiver combinations. This generates unneeded simulation results. The ability to specify driver/receiver configurations is a significant improvement. This reduces simulation time by eliminating the unneeded simulations and eliminates the need for data management of unwanted cases.

Slew Rate Measurements and Derating

Slew rate derating tables were introduced in DDR2 to allow more accurate prediction of setup and hold margins. They are based on the knowledge of how the intrinsic delay of a receiver varies as a function of the input receiver waveform. At a high level, the challenge for any chip is to specify setup and hold requirements with respect to a common switching point of reference on both the input signal and its associated clock. A common switching point of reference consists of a reference location (e.g. pad of the receiver), reference input waveform, and one or more reference thresholds from which minimum and maximum delays are to be measured. The intrinsic delays of a receiver are then determined through simulation using the common switching point of reference and are factored into setup and hold requirements (Figure 6).

Figure 6: Intrinsic Receiver Delay (a reference input ramp with the reference slew rate, showing the high, low, and VDD/2 reference thresholds and the resulting intrinsic receiver delay)

The accounting problem arises when the actual input waveforms do not match the reference input waveform. The reference input waveform is typically a ramp with a constant slew rate (e.g. 1V/ns). However, actual input waveforms may have different slew rates or plateaus in the transition region defined by the reference thresholds. These variations from the reference input waveform will directly affect the intrinsic receiver delays and thus the setup and hold requirements. The DDR2 slew rate derating tables attempt to define the receiver intrinsic delay variations as a function of slew rate variations from the common switching point of reference. Application of these tables allows us to essentially map an eye diagram at the pad of a receiver to an eye diagram that would result at the output of the receiver in the core of the chip. We refer to the eye diagram at the core, generated from the eye diagram at the pad by applying the slew rate derating tables, as a virtual eye diagram.

How Slew Rate Derating Works

For setup derating, DDR2 slew rate measurements are made from Vref to VihAC and from Vref to VilAC for rising and falling ADDCMD/CTRL/DQ/DM/DQS signals respectively. The straight line drawn from the Vref threshold to the respective VinAC threshold is used to calculate the signal slew rate and is called the nominal slew rate for the transition, or the nominal waveform. The actual waveform must then be compared to the nominal waveform. If the actual waveform exceeds the nominal waveform for all points on the waveform, then the nominal waveform is considered to represent the signal slew rate and is used to determine the setup derating from the associated derating table (Figure 7). If the actual waveform crosses the nominal waveform, a tangent line that passes through the VihAC (rising edge) or VilAC (falling edge) threshold must be defined (Figure 8). The slew rate of this tangent line is then used to determine the setup derating. Hold slew rate derating is performed in a similar manner. The primary difference is that slew rate measurements are made from VilDC to Vref or VihDC to Vref for rising and falling signals respectively.

Figure 7: Linear Line Slew Rate Approximation

Figure 8: Tangent Line Slew Rate Approximation
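The nominal and tangent line measurements can be sketched for a rising edge and setup derating as follows. This Python example is a simplified illustration under stated assumptions: the waveform is given as time/voltage samples, threshold crossings are linearly interpolated, and the tangent line is taken to be the steepest line through the VihAC crossing that touches a sample between the Vref and VihAC crossings. That interpretation of the tangent rule, the function names, and the threshold values used in the example are assumptions and should be checked against the device specification.

```python
def crossing_time(times, volts, level):
    """First upward crossing of `level`, linearly interpolated."""
    for i in range(1, len(times)):
        if volts[i - 1] <= level < volts[i]:
            frac = (level - volts[i - 1]) / (volts[i] - volts[i - 1])
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses the requested level")


def setup_slew_rising(times, volts, vref, vih_ac):
    """Return (nominal, tangent) slew rates for a rising edge.

    nominal: slope of the straight line from the Vref crossing to the
             VihAC crossing.
    tangent: steepest slope from any sample between those crossings to
             the VihAC crossing point. It equals the nominal slope when
             the waveform stays on or above the nominal line.
    Units follow the inputs (for example V and ns give V/ns).
    """
    t_ref = crossing_time(times, volts, vref)
    t_ac = crossing_time(times, volts, vih_ac)
    nominal = (vih_ac - vref) / (t_ac - t_ref)
    tangent = nominal
    for t, v in zip(times, volts):
        if t_ref < t < t_ac:
            tangent = max(tangent, (vih_ac - v) / (t_ac - t))
    return nominal, tangent


if __name__ == "__main__":
    t = [0.0, 0.1, 0.2, 0.3, 0.4]                 # ns
    ramp = [0.8, 0.9, 1.0, 1.1, 1.2]              # clean 1 V/ns ramp
    plateau = [0.8, 0.9, 0.92, 1.1, 1.2]          # dips below nominal line
    print(setup_slew_rising(t, ramp, 0.9, 1.1))
    print(setup_slew_rising(t, plateau, 0.9, 1.1))
```

For the clean ramp both values are about 1 V/ns, while the waveform with a plateau yields a tangent slew rate of about 1.8 V/ns, which is the value that would then be looked up in the derating table.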

In DDR2, interconnect delays are measured to the VinDC and VinAC thresholds for minimum and maximum delays respectively. These delays must then be compensated for by slew rate as described previously. As the actual receiver waveform slew rate increases above 1V/ns, additional derating time must be factored into the analysis to account for the additional time required for the receiver to respond. Similarly, as the slew rate decreases below 1V/ns, time must be prorated (negative derating) to account for the faster receiver response than theorized by the slew rate. Slew rate derating is performed on the differential CK in addition to ADDCMD, CTRL, DQ, DM, and DQS. This can be seen for ADDCMD in the derating table shown in Figure 9 or similar tables for the other net classes.

Figure 9: Setup and Hold Derating Table

Derating/Prorating the setup and hold requirements of a device is a fairly alien concept to the industry at large, but in reality it makes very good sense and is absolutely necessary for more accurate timing margin prediction. Unfortunately, these tables, while always conservative and an improvement over standard approaches, still leave margin on the table. We believe a more accurate charge model approach, which will use charge area to determine device switching, will be required for future technologies.

ADDCMD/CTRL Topology and Termination

During the development of the DDR2 SoDIMM memory modules, simulations on routing topologies and termination for ADDCMD and CTRL nets were performed to optimize the eye aperture. These evaluations were useful in identifying several improvements over DDR1 topologies. Two examples of these improvements are: 1) the addition of a resonance-damping stub resistor on the memory module at the edge connector, and 2) on-DIMM topology balancing and load etch balancing to reduce DIMM variations.

Stub Resistor Development

With the assistance of simulation we can see that the stub resistor actually performs two functions. Primarily, it isolates one module from the other. Secondarily, it acts as a damping resistor or wave shaping resistor between the driver and the receiver. The original DDR1 designs did not have this structure, and this advance in termination technique led to an improvement in the address and command bus performance under mixed loading configurations in two module systems. What is unique in the DDR2 application of this stub resistor is how the value of the resistor changes as DIMM loading increases. With original DDR1, when a larger number of memory devices were used, a smaller resistor was placed on the module with the most memories. This made sense, especially since the apertures become smaller as this value increases in any given single module system. What was discovered turned out to be a radical change over existing concepts. Looking at the structure strictly from the point of view of energy distribution, a larger stub resistor value was placed on the heavily loaded module (RCD). As the value of the stub resistor increased, the ability of the RCD module to absorb energy slowed, leaving more energy for the lightly loaded module (RCC) and thus opening up the aperture size and increasing the slew rates. Figure 10 shows the eye opening at the RCC when the DDR1 style of termination was employed (small resistor at RCD, large resistor at RCC). Figure 11 shows the improvement in eye opening utilizing the new DDR2 termination strategy.

Figure 10: DIMM Eye Opening with DDR1 Style Termination

Figure 11: DIMM Eye Opening with Improved DDR2 Style Termination

The resistor value had to be a minimum of 10Ω to realize this, and the eye aperture continued to improve as the resistor value on the more heavily loaded DIMM increased. Depending upon the strength of the driver used, increasing the stub resistor value improved the aperture size by approximately 1 to 2.5 ns. This was a huge improvement over previous techniques and provided a solution that worked well for mismatched module combinations from 4 to 16 parts. The advantages of this different approach to stub resistor values (the smaller resistor goes on the lightly loaded module and the larger resistor goes on the more heavily loaded module) are very important in achieving acceptable performance in an unbuffered, multi-drop, two module system. This technique significantly reduces the resonance problems created by mismatched modules with little or no loss in margins in the systems where more balanced modules would be used together. Resonance at all frequencies has not been completely eliminated, but the frequency distribution analysis provides adequate information to show that this solution is very good up to around 500-600 MHz (well above any 1T DDR2 addressing frequencies). The frequency distribution resonances that do occur around 800 MHz are mild by comparison to those of the previous termination techniques, which were failing at primary frequencies for DDR2 addressing.

DDR2 ADDCMD Topology

For DDR1 designs, routing on the DIMM modules was not balanced between the loads. When devices of various configurations were used together, reflections on the bus (sometimes referred to as resonance on the bus) caused reductions in eye apertures. Figure 12 shows a typical DDR1 topology and the resulting eye characteristics.

Figure 12: Example of unbalanced (T-daisy) topology (used on DDR1 modules)

[Topology diagram: connector, Rstub, and DRAM1 through DRAM8, with trace segments labeled 127, 34, 572, 350, and 150.]

For DDR2 SoDIMMs (RCA-RCD), the address and command net topologies for each DIMM type were designed with balanced (symmetrical) routing. An additional benefit of the balanced topology is that loaded flight times are more closely matched from memory device to memory device, which affords greater timing margin for each of the respective busses. Better symmetry reduces signal jitter and can greatly reduce the non-monotonic signal behavior that is present on most SDRAM and DDR1 memory module address busses (which primarily used T-daisy structures). Figure 13 shows a typical DDR2 topology and the resulting eye characteristics. The implementation of balanced routing and the isolation effects of the stub resistor have proven an effective combination for increasing DDR2 SoDIMM system performance.

Figure 13: Example of balanced (symmetrical) topology (used on DDR2 modules)

[Topology diagram: connector, Rstub, and DRAM1 through DRAM8, with trace segments labeled 127, 34, 400, 350, and 150.]

I/O Buffer Modeling Challenges

“Garbage in, garbage out” is a saying that should not be taken lightly when it comes to simulation of high-speed signaling. At DDR2-667 and DDR2-800, tens of picoseconds become significant when closing the timing budget. To ensure accurate simulation of these high-speed memory busses, it is essential to have highly accurate and reliable simulation models of drivers and receivers. A SPICE model is the most accurate option: it provides a detailed model of actual device characteristics and allows for any process, voltage, or temperature variation. This accuracy typically comes at the expense of simulation performance and can prove impractical for modeling all of the cases that need to be examined, so it is highly desirable to have IBIS models. Validated IBIS models closely match the SPICE models they are derived from, offer higher simulation performance, are prevalent in the industry, are supported by multiple simulators, and protect critical IP.

The challenge encountered with IBIS was modeling of the ODT circuits. Many questions were raised about the best way to accurately represent the ODT circuit characteristics while at the same time creating a model that would be compatible with most simulation software programs. The investigation led to three possible solutions, but one clear winner.

The first method is to model the full I/O characteristics with the normal pullup, pulldown, power clamp, and ground clamp I-V curves (no ODT). ODT DC circuit behavior can then be modeled separately by I-V curves showing the relationship between current in the termination and voltage applied to the signal pin. This I-V relationship includes not only the characteristics of the termination circuit, but also the characteristics of the power and ground clamping structures. Separating the I-V characteristics of the termination from the clamping diodes requires subtracting the power and ground clamp I-V characteristics from the ODT-enabled I-V characteristics. The end result is six separate I-V curve tables in the IBIS model. The only way to include more than the standard four tables of characteristics for an I/O model is through use of the IBIS [Submodel] keyword. The [Submodel] keyword can be used to tell a simulator to sum the characteristics of the clamp diodes with the characteristics of the termination circuit when the DQ circuit is in a non-driving state. This is where a potential compatibility issue comes into play. The [Submodel] keyword is part of the IBIS 3.2 standard, but not all simulation programs support it. Given that the model must work across the broadest set of simulators, this is not a viable alternative.

The second method of ODT modeling is to create a model for the DQ with ODT disabled that includes the usual four curves. A separate model is created for the DQ in the input state with ODT enabled. This model has a power clamp curve including half of the termination characteristics and a ground clamp curve including the other half. See Figure 14 and Figure 15 for an example of this method. Note that the best method for splitting the termination characteristics between a power clamp curve and a ground clamp curve is to split the I-V curve where the current goes to zero. For the center termination structure of DDR2, this voltage would typically be VTT (or VDDQ/2). The drawback of this solution is that separate models are used when the I/O cell acts as an input versus an output. For tools using a schematic front end, two schematics would need to be created, one with the cell as an input and the other as an output. As the number of loads on the bus increases, the number of schematics that need to be managed increases.
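The split at the zero-current point described for the second method can be illustrated with a short sketch. It assumes an ideal, linear center-tapped termination with an effective value of 50Ω and VDDQ = 1.8V, and it keeps all voltages referenced to ground. Real IBIS power clamp tables use a supply-relative voltage axis, which is omitted here. The function names are hypothetical, and which clamp table receives which part is left to the modeler.

```python
VDDQ = 1.8      # DDR2 I/O supply voltage in volts
R_ODT = 50.0    # effective termination value in ohms (illustrative)


def odt_current(v_pin):
    """Current into an ideal center-tapped termination at pin voltage v_pin."""
    return (v_pin - VDDQ / 2) / R_ODT


def split_at_zero_current(currents):
    """Split a monotonic I-V table into two tables that sum to the original."""
    negative_part = [i if i < 0 else 0.0 for i in currents]
    positive_part = [i if i >= 0 else 0.0 for i in currents]
    return negative_part, positive_part


voltages = [k / 10 for k in range(-5, 24)]   # -0.5 V to 2.3 V in 0.1 V steps
currents = [odt_current(v) for v in voltages]
negative_part, positive_part = split_at_zero_current(currents)

# Lowest voltage at which the termination current is no longer negative
crossing_voltage = min(v for v, i in zip(voltages, currents) if i >= 0)
print(f"zero-current crossing at {crossing_voltage:.2f} V")
```

Because each table is zero wherever the other is non-zero, a simulator that sums the two clamp tables recovers the full termination curve.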

The third and preferred method is to model the I/O characteristics with ODT enabled using only the 4 normally included pullup, pulldown, power clamp, and ground clamp I-V curves. This creates an I/O model that shows normal output driver characteristics along with ODT characteristics when the DQ is switched into the input state. The ODT characteristics are lumped in with the power and ground clamp tables. This is fine when the model is set as an input, because in input mode, one would always see the termination and the diode characteristics at the same time (Figure 14 and Figure 15). However, the pullup and pulldown data must be manipulated, because in the output mode, the power and ground clamp characteristics are always summed together with the pullup or pulldown characteristics by the simulation tool. For this reason, the ODT characteristics must be subtracted from both the pullup and pulldown data. This creates pullup and pulldown curves that don’t look like normal transistor curves by themselves (Figure 16 and Figure 17), but in reality are fine when combined with the clamping curves (Figure 18 and Figure 19).
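The subtraction used by the third method can be checked numerically with a toy model. The sketch below assumes a linear 18Ω pulldown (a hypothetical value), an ideal 50Ω center-tapped ODT, ideal clamp diodes that conduct nothing between the rails, and a single combined clamp table with a ground-referenced voltage axis. A real IBIS model keeps separate power and ground clamp tables and uses supply-relative axes for the pullup and power clamp data.

```python
VDDQ = 1.8          # DDR2 I/O supply voltage in volts
R_ODT = 50.0        # effective ODT value in ohms
R_DRIVER = 18.0     # hypothetical linear pulldown strength in ohms

voltages = [k / 10 for k in range(0, 19)]              # 0.0 V to 1.8 V

odt = [(v - VDDQ / 2) / R_ODT for v in voltages]        # termination current
diode = [0.0 for _ in voltages]                         # ideal clamps, off inside the rails
true_pulldown = [v / R_DRIVER for v in voltages]        # driver alone, ODT disabled

# Tables as they would be stored in the model
clamp_table = [d + t for d, t in zip(diode, odt)]               # clamps with ODT lumped in
pulldown_table = [p - t for p, t in zip(true_pulldown, odt)]    # ODT subtracted

# What a simulator reconstructs in output mode: pulldown table plus clamp table
combined = [p + c for p, c in zip(pulldown_table, clamp_table)]

print(f"stored pulldown current at 0 V: {pulldown_table[0] * 1000:.1f} mA")
print(f"combined current at 0 V: {combined[0] * 1000:.1f} mA")
```

The stored pulldown table carries a non-zero current at 0V, which is why it does not resemble a normal transistor curve on its own, while the combined curve matches the driver with ODT disabled.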

Figure 14: 50Ω ODT model ground clamp I-V curve

Figure 15: 50Ω ODT model power clamp I-V curve

Figure 16: DQ model Pulldown I-V curve with 50Ω ODT characteristics subtracted

Figure 17: DQ model Pullup I-V curve with 50Ω ODT characteristics subtracted

Figure 18: DQ model Combined Pulldown/Clamp I-V curve with 50Ω ODT

Figure 19: DQ model Combined Pullup/Clamp I-V curve with 50Ω ODT

DDR2 Analysis Methodology Requirements

The analysis of DDR2 systems is very complex and requires a comprehensive analysis methodology that supports the ODT tables defined in the On-Die Termination (ODT) section (Table 5 and Table 6), the slew rate derating tables defined in the section on slew rate measurements (Figure 9), and the configurations and populations defined in the Overview of DDR Memory section (Table 3 and Table 4). This analysis methodology must support simultaneous signal integrity and timing analysis so that design tradeoffs between waveform quality and timing can be made efficiently. In addition, the tool must be able to account for crosstalk and its associated impacts. Below is a list of other key requirements:

• Automated parametric solution space sweeping of topology, termination, drivers, and receivers.
• Support for pseudo-random bit sequence (PRBS) patterns.
• Automated waveform and eye diagram processing supporting AC, DC, and Vref thresholds and slew rate extraction.
• Proper extraction of interconnect delays relative to a standard load.
• Simulation across process, voltage, and temperature.
• Source synchronous timing on rising and falling edges with proper accounting of correlative and non-correlative effects.
• Frequency agile timing models.
• Parameterized and programmable IBIS and timing models.
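As a rough illustration of what parametric solution space sweeping involves, the sketch below enumerates every combination of a few swept variables. All variable names and values are hypothetical placeholders chosen for illustration. They are not the settings used in this study, and the sketch does not represent any particular tool's interface.

```python
from itertools import product

# All names and values below are hypothetical placeholders for illustration.
sweep = {
    "population": ["slot1_only", "slot2_only", "both_slots"],
    "odt_ohms": [50, 75, 150],
    "stub_resistor_ohms": [10, 22, 33],
    "corner": ["fast", "slow"],
}

# One dictionary per simulation case, covering the full cross product
cases = [dict(zip(sweep, values)) for values in product(*sweep.values())]

print(f"{len(cases)} simulation cases")
print(cases[0])
```

Even four variables with two or three values each produce dozens of cases, which is why automation of the sweep and of the downstream waveform processing matters.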

DDR2 Analysis Results

In general, Micron has required a mixture of simulation tools including internally developed eye diagram analysis software and numerous Excel macro scripts to perform downstream analysis of simulation engine results for margin analysis and signal integrity evaluation. During the development cycle for the DDR2 DIMMs, work began in earnest with SiSoft’s SiAuditor/Quantum-SI tool suite at Micron. SiSoft worked closely with Micron to expand their already comprehensive software and simulation methodologies to include the new features required by DDR2. Working together, Micron and SiSoft expanded the software feature sets to include even greater user interface capabilities for waveform and margin analysis that ported easily into standard office products to produce documentation for sharing simulation results with JEDEC and industry partners.

Micron continues to subscribe to a multiple simulation tool environment, but has found SiSoft to be a highly proactive, customer-focused company whose technological innovations and advanced concepts bring a higher level of analysis capability, accuracy, and productivity. The following simulation results were produced entirely within this single tool environment.

Slew Rate Derating

Figure 20 shows the results for an address bit of an actual memory design running at 800Mbs. Based upon the defined AC and DC thresholds, a valid eye opening of 3.6ns is found. However, it is clear from the figure that the signal is non-monotonic within the switching region. The Vref to AC (derates setup) and DC to Vref (derates hold) signal slew rates range from 0.11V/ns to 0.544V/ns and from 0.3V/ns to 1.4V/ns, respectively. To calculate the Virtual Eye opening, we use the minimum slew rate for the setup adjustment and the maximum slew rate for the hold adjustment. Referring to Figure 9, and assuming a 2V/ns CK, we see that a 0.1V/ns slew rate (approximating the 0.11V/ns minimum) will add 1000ps to the eye, while a 1.4V/ns slew rate will add 15ps. This results in a 1.015ns improvement in the eye beyond the 3.6ns measured at the driver pad, so we would predict a 4.615ns eye at the receiver output. Figure 21 shows the eye opening at the receiver output for the same simulation run used to generate Figure 20. The measured eye at the receiver output was 4.75ns, which correlates well with the predicted eye of 4.615ns. Without the slew rate adjustment, over 1ns of margin would have been lost.
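The Virtual Eye arithmetic above can be reproduced in a few lines. The sketch uses only the two derating values quoted in the text for a 2V/ns CK, since the full table from Figure 9 is not reproduced here. The variable names are illustrative, and times are kept in integer picoseconds to avoid rounding artifacts.

```python
# Derating adjustments quoted in the text for a 2 V/ns CK, keyed by slew rate in mV/ns.
# Only these two entries are given in the text; the full table is in Figure 9.
DERATING_PS = {100: 1000, 1400: 15}

pad_eye_ps = 3600                      # eye measured at the pad (Figure 20)
setup_adjust_ps = DERATING_PS[100]     # minimum Vref-to-AC slew rate, about 0.1 V/ns
hold_adjust_ps = DERATING_PS[1400]     # maximum DC-to-Vref slew rate, 1.4 V/ns

predicted_core_eye_ps = pad_eye_ps + setup_adjust_ps + hold_adjust_ps
measured_core_eye_ps = 4750            # eye measured at the receiver output (Figure 21)
difference_ps = measured_core_eye_ps - predicted_core_eye_ps

print(f"predicted {predicted_core_eye_ps} ps, "
      f"measured {measured_core_eye_ps} ps, "
      f"difference {difference_ps} ps")
```

The prediction falls 135ps short of the measured eye, a difference of roughly 3 percent of the 4.75ns opening.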

Figure 20: At Pad

Figure 21: At Core

DDR2 800Mbs

For simulation, a DDR2 design was constructed using a standard two-DIMM memory topology. A DIMM with 4 loads (RCA) was inserted in one slot, and a DIMM with 16 loads (RCD) was inserted in the other. This case was determined by JEDEC to be the worst-case population. Simulations were run for all net classes, and the timing margins for ADDCMD and DQ are shown in Table 7.

Table 7: ADDCMD/DQ mb_2slot_rcc_rcd Timing Margins

Net Class                           Setup Margin (ns)   Hold Margin (ns)
addcmd_mb_2slot_rcc_rcd             0.351               0.716
dq_mb_2slot_rcc_rcd (Write)         0.013               0.131
dq_mb_2slot_rcc_rcd (Slot 1 Read)   0.081               0.123
dq_mb_2slot_rcc_rcd (Slot 2 Read)   0.059               0.066
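The margins in Table 7 can be scanned for the limiting cases with a short script. The sketch assumes that the first value in each row is the setup margin and the second is the hold margin, following the order of the column headings. The dictionary layout is illustrative.

```python
# Table 7 margins in ns as (setup, hold); column order assumed from the table headings
margins_ns = {
    "addcmd_mb_2slot_rcc_rcd": (0.351, 0.716),
    "dq_mb_2slot_rcc_rcd (Write)": (0.013, 0.131),
    "dq_mb_2slot_rcc_rcd (Slot 1 Read)": (0.081, 0.123),
    "dq_mb_2slot_rcc_rcd (Slot 2 Read)": (0.059, 0.066),
}

# Net class with the smallest setup margin and the smallest hold margin
worst_setup = min(margins_ns.items(), key=lambda item: item[1][0])
worst_hold = min(margins_ns.items(), key=lambda item: item[1][1])
all_positive = all(setup > 0 and hold > 0 for setup, hold in margins_ns.values())

print(f"worst setup: {worst_setup[0]} at {worst_setup[1][0] * 1000:.0f} ps")
print(f"worst hold:  {worst_hold[0]} at {worst_hold[1][1] * 1000:.0f} ps")
print(f"all margins positive: {all_positive}")
```

Under that column assumption, every net class closes timing, with DQ writes limiting setup at 13ps and Slot 2 reads limiting hold at 66ps.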

Figure 22 and Figure 23 show the representative eye diagrams for the DQ with aperture and associated DQS across fast and slow process corners for writes and reads respectively. Figure 24 shows the representative eye diagrams for ADDCMD across fast and slow process corners at worst-case RCC and RCD locations.

Figure 22: DQ Write with DQS, Fast and Slow Corners

Figure 23: DQ Read with DQS, Fast and Slow Corners

Figure 24: ADDCMD Eye Diagrams at RCC/RCD, Fast and Slow Corners

Summary

This paper has shown that 800Mbs DDR2 interfaces can be designed with margin. Successful design requires a comprehensive analysis methodology that can incorporate the large number of variables associated with populations, ODT settings, and DIMM configuration. We believe the importance of this cannot be overstated. In addition, slew rate derating was shown to be an important concept for the successful implementation of DDR2 designs and must be included as part of any design methodology.
