
ISSCC 2010 / SESSION 20 / NEXT-GENERATION OPTICAL &...


2010 IEEE International Solid-State Circuits Conference

ISSCC 2010 / SESSION 20 / NEXT-GENERATION OPTICAL & ELECTRICAL INTERFACES / 20.6

20.6 A 32mW 7.4Gb/s Protocol-Agile Source-Series-Terminated Transmitter in 45nm CMOS SOI

Wayne D Dettloff¹, John C Eble¹, Lei Luo¹, Pravin Kumar², Fred Heaton¹, Teva Stone¹, Barry Daly¹

¹Rambus, Chapel Hill, NC; ²Rambus, Bangalore, India

Source-series-terminated (SST) transmitters consume ¼ the output-stage power of CML drivers [1], but their adoption in industry-standard multi-protocol SerDes has been stunted by difficulties in achieving flexible swings and constant-current equalization, and in supporting DC-coupled voltage standards drafted with CML in mind. Fundamentally, CML drivers separate the termination control from the switching devices, allowing current-summing techniques to implement output level control and transmitter equalization (EQ). This paper presents the architecture and circuits of an equally flexible SST transmitter that overcomes these challenges through the use of ground regulation, P-to-N shunting legs, and partially weighted segments. The clocks and datapath dissipate 32mW at 7.4Gb/s with an 800mV differential swing. Target protocols include PCIe Gen1/2, XAUI, Fibre Channel (FC) 1/2/4, CEI6 SR and SATA 1/2.
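The ¼ output-stage figure quoted in the opening paragraph can be reproduced with a textbook DC model. The sketch below is not taken from the paper: it assumes ideal 50Ω near-end and far-end terminations, a fully steered CML tail current, and equal supply voltages for both drivers, and the function names are illustrative.

```python
R_TERM = 50.0  # ohms, assumed single-ended termination at both ends of the link


def sst_supply_current(v_diff_pp, r_term=R_TERM):
    """Static supply current of an ideal SST driver (assumed model).

    The current loop is: supply -> r_term (pull-up) -> 2*r_term (far-end
    differential load) -> r_term (pull-down) -> return. Half of the supply
    span appears across the load, so the differential peak-to-peak swing
    equals the supply span.
    """
    v_supply_span = v_diff_pp
    return v_supply_span / (4.0 * r_term)


def cml_tail_current(v_diff_pp, r_term=R_TERM):
    """Tail current of an ideal CML driver for the same swing (assumed model).

    Each output sees the near-end and far-end terminations in parallel
    (r_term / 2), so the differential peak-to-peak swing is I * r_term.
    """
    return v_diff_pp / r_term


swing = 0.8  # volts, the 800mV differential swing quoted in the text
i_sst = sst_supply_current(swing)
i_cml = cml_tail_current(swing)
ratio = i_sst / i_cml
print(f"SST: {i_sst * 1e3:.1f} mA, CML: {i_cml * 1e3:.1f} mA, ratio: {ratio:.2f}")
```

Under these assumptions the current ratio is independent of the swing; the power ratio matches it only if both drivers run from the same supply voltage.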

Figure 20.6.1 shows the block diagram of the transmitter. The entire datapath, including the clocks, switches between Vtt and a regulated supply, Vs, adjustable between 780 and 920mV below Vtt. The reduced swing allows thin-oxide devices to be used in the output stages with a Vtt of 1.0 to 1.65V. By referencing the regulated supply to Vtt, the design complies with protocols specifying CML voltage levels (e.g., CEI6). The regulator minimizes both the amount of impedance adjustability required and the supply-induced jitter. Since the entire datapath operates at this reduced level, less power is consumed and the requisite level shifting is done in the parallel domain, well in advance of the jitter-critical stages.

A 1/12 replica of the driver is used to calibrate a single slice. Separate calibration codes are generated for the pull-up (PU) and pull-down (PD) paths by comparing the series combination of the resistor (a tight tolerance across PVT reduces the required adjustability range) and transistors to an external reference. These codes are applied to the final stage of the pre-driver using delay-matching generic gates for NAND and NOR functions. These gates are also used to bypass the clocks in AC-JTAG and beacon modes. Impedance adjustment is orthogonal to the level/EQ control.

Previously, SST drivers have required such a high power, current-variation and area penalty for level/EQ control that these controls are often left unimplemented [2], thereby sacrificing compliance with a number of protocols. Two EQ design challenges for SST drivers are 1) mitigating the data-dependent supply current variation and 2) achieving enough granularity to meet exact specification requirements across different packages. Traditionally [3, 4], an impedance-matched equalizing level is accomplished by replacing one PU/PD slice with its opposite at the full data rate. For example, one can achieve a level reduction of about 1.58dB by applying opposite data to one slice of a 12-slice output stage. However, this technique creates undesired data-dependent supply current fluctuations that increase with larger EQ settings. Furthermore, only 5 useful settings are available (-1.58 to -15.56dB), corresponding to the swapping of 1 to 5 PU/PD legs. By employing shunting devices between the differential output pins (Fig. 20.6.2) to implement lower swings, power is saved and a near-constant current draw from the output supply is achieved. (Theoretically, a set of PU, PD and shunt impedances exists that consumes equal supply current for each output level while maintaining proper differential and common-mode termination.) In addition, replacing a PU/PD slice with a shunt slice in the same 12-slice transmitter reduces the swing by 0.76dB, or half that of the classical method, thereby doubling the available settings.
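The level figures quoted above (-1.58dB, -15.56dB and -0.76dB) follow from simple slice counting. The sketch below assumes an ideal driver of n equal slices in which the output amplitude is proportional to the net number of slices driving toward the intended rail; the function names are illustrative and not from the paper.

```python
import math


def swap_level_db(k, n=12):
    """Classical method: k of n slices are driven with opposite data.

    Each swapped slice removes one slice pulling the intended way and adds
    one pulling the opposite way, so the amplitude scales as (n - 2k) / n.
    """
    return 20.0 * math.log10((n - 2 * k) / n)


def shunt_level_db(k, n=12):
    """Shunt method: k of n slices are replaced by shunt slices.

    A shunt slice stops driving but keeps the impedance, so the amplitude
    scales as (n - k) / n (ideal AC ground at the shunt midpoint assumed).
    """
    return 20.0 * math.log10((n - k) / n)


swap_levels = [round(swap_level_db(k), 2) for k in range(1, 6)]
shunt_levels = [round(shunt_level_db(k), 2) for k in range(1, 12)]
print("swap  (k=1..5): ", swap_levels)
print("shunt (k=1..11):", shunt_levels)
```

In this model a single shunt substitution gives about half the dB step of a single swap, which is consistent with the doubling of available settings described in the text.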

By dynamically replacing a PU or PD slice with a shunt slice of equivalent impedance, different output level/EQ settings are obtained. This work implements a single-tap equalizer, but a multi-tap design would only require changing the data streams into the pre-driver. By inspection, the differential impedance is unchanged with the substitution of shunting slices. Maintaining the AC common-mode impedance requires a good AC ground on the node between the shunting devices. FC requires a -12dB common-mode limit at 50MHz, which sets the minimum capacitance on this node.

PCIe requires transmitter EQ settings of -3.5 and -6dB ±0.5dB and additional settings for margining (Gen2). Achieving these specifications without shunt slices requires a large number of slices (44 in [3]) or a division of segments (22×5 in [4]). More slices translate to more pre-driver circuitry, clock load, half-data-rate power and pin capacitance. With just twelve 600Ω “shunt-able” slices, these settings can be achieved with an integral number of slice substitutions. Still, some applications may require more granularity to keep within the tight tolerance, particularly if package loss is considered; therefore, this work adopted a “partially weighted” slice scheme. Instead of uniform 600Ω slices, the slices are grouped by data stream and designed for different values between 526 and 702Ω. The transistors in the output stage and pre-driver are identical across the slices to maintain PVT matching. The widths of the resistors are also identical so that the variation is based solely on the ratio of lengths. A subset of the 496 different level/EQ settings for AC-coupled mode and one regulator setting is shown in Fig. 20.6.3, including some reduced SATA swings. Since only 12 slices connect to the pins, no T-coil is necessary. Even with robust ESD and ancillary circuits (RX detect, fault control, etc.), ~500fF of output capacitance was predicted and confirmed with S11 measurements.
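The claim that twelve 600Ω slices reach the PCIe settings with whole-slice substitutions can be checked numerically. The sketch below assumes an ideal conductance-weighted model (each slice contributes to the output in proportion to 1/R, and shunted slices contribute impedance but no drive); the helper names are illustrative. Only the uniform 600Ω case is evaluated, since the per-group values of the partially weighted scheme are not listed in the text.

```python
import math


def parallel_ohms(slice_ohms):
    """Equivalent resistance of all slices in parallel."""
    return 1.0 / sum(1.0 / r for r in slice_ohms)


def level_db(slice_ohms, shunted):
    """Output level in dB when the slices at the indices in `shunted` are shunted.

    Assumed model: amplitude = (conductance still driving) / (total conductance).
    Works for uniform or unequal slice values.
    """
    g = [1.0 / r for r in slice_ohms]
    g_driving = sum(gi for i, gi in enumerate(g) if i not in shunted)
    return 20.0 * math.log10(g_driving / sum(g))


def substitutions_meeting(target_db, slice_ohms, tol_db=0.5):
    """Counts k of shunted slices (first k indices) that land within tolerance."""
    n = len(slice_ohms)
    return [k for k in range(1, n)
            if abs(level_db(slice_ohms, set(range(k))) - target_db) <= tol_db]


uniform = [600.0] * 12
r_out = parallel_ohms(uniform)
k_3p5 = substitutions_meeting(-3.5, uniform)
k_6p0 = substitutions_meeting(-6.0, uniform)
print(f"output impedance: {r_out:.1f} ohm")
print(f"-3.5dB met with k = {k_3p5}, -6dB met with k = {k_6p0}")
```

Passing a list of unequal resistances to `level_db` shows how weighting the slices adds intermediate levels between the uniform steps.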

The supply sensitivity and latency from the final clock edge to the package pin determine the contribution of the driver to the overall transmitter jitter. Two gate delays balance the transitions and allow fine tuning of the capacitance in the final stage to match timing with only a single transistor in the output path (Fig. 20.6.4). Transmitters can select a full-rate clock from two independent, central LC-PLLs (Fig. 20.6.5). Optimizing jitter performance, including duty-cycle distortion and power-supply-induced jitter, is critical to meet the output jitter limits of 0.25UI @ 10⁻¹² and 0.3UI @ 10⁻¹⁵ for PCIe and CEI6, respectively. To minimize reflections and standing waves, the global clock distribution is designed as a terminated transmission line. Duty-cycle correction on the full-rate clocks reduces accumulated distortion. Given clock frequency information, the slew-rate control circuit maximizes the clock slew with enough clock swing to provide phase-rotating linearity and maintain good power-supply noise rejection. AC coupling is used to reduce the half-rate clock duty-cycle distortion. Finally, a calibrated closed-loop duty-cycle-correction circuit further reduces the remaining distortion by sampling the output of the clock buffer.

Chips with ×4 and ×8 configurations of transceivers have been fabricated and fully characterized. Figure 20.6.6 shows a 7.4Gb/s eye diagram with -2.6dB pre-emphasis to overcome board and cable attenuation. The total jitter is 209.6mUI or 28.3ps at 10⁻¹² BER (or 224.7mUI at 10⁻¹⁵). The measured power efficiency is 4.32mW/(Gb/s). Figure 20.6.7 is a micrograph of the chip between the two transmitter bumps.
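The measured figures above are mutually consistent, as a short unit-conversion check shows. The sketch uses only numbers quoted in the text (data rate, power, total jitter and the PCIe/CEI6 jitter limits); the variable names are illustrative.

```python
DATA_RATE_GBPS = 7.4   # measured data rate
POWER_MW = 32.0        # clocks and datapath power
TJ_MUI_1E12 = 209.6    # total jitter at 1e-12 BER, in milli-UI
TJ_MUI_1E15 = 224.7    # total jitter at 1e-15 BER, in milli-UI

# One unit interval in picoseconds: 1 / (7.4 Gb/s)
ui_ps = 1000.0 / DATA_RATE_GBPS

# Total jitter converted from mUI to picoseconds
tj_ps = TJ_MUI_1E12 * 1e-3 * ui_ps

# Power efficiency in mW/(Gb/s), numerically equal to pJ/bit
efficiency = POWER_MW / DATA_RATE_GBPS

# Compare against the protocol limits quoted in the text
meets_pcie = TJ_MUI_1E12 * 1e-3 <= 0.25   # 0.25UI @ 1e-12
meets_cei6 = TJ_MUI_1E15 * 1e-3 <= 0.30   # 0.3UI @ 1e-15

print(f"UI = {ui_ps:.1f} ps, TJ = {tj_ps:.1f} ps, efficiency = {efficiency:.2f} mW/(Gb/s)")
print(f"PCIe limit met: {meets_pcie}, CEI6 limit met: {meets_cei6}")
```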

Acknowledgements:
Todd Forkner, Grant Holst, Renu Rangnekar, Sangeeth George, Michael Bucher, Ravi Kollipara, Will Ng, John Poulton, Farhad Zarkeshvari, Prateek Goyal, Kashinath Prabhu, Vijay Khawshe, Wes Ficken and Bill Cornwell.

References:
[1] H. Hatamkhani, K-L. J. Wong, R. Drost, C-K. K. Yang, “A 10mW 3.6Gbps I/O Transmitter,” Symp. VLSI Circuits, pp. 97-98, Jun. 2003.
[2] J. Poulton, R. Palmer, W. Dally, et al., “A 14-mW 6.25Gb/s Transceiver in 90-nm CMOS,” IEEE J. Solid-State Circuits, vol. 42, no. 12, pp. 2745-2757, Dec. 2007.
[3] C. Menolfi, T. Toifl, P. Buchmann, et al., “A 16Gb/s Source-Series-Terminated Transmitter in 65nm CMOS SOI,” ISSCC Dig. Tech. Papers, pp. 446-447, Feb. 2007.
[4] M. Kossel, C. Menolfi, J. Weiss, et al., “A T-Coil-Enhanced 8.5Gb/s High-Swing Source-Series-Terminated Transmitter in 65nm Bulk CMOS,” ISSCC Dig. Tech. Papers, pp. 110-111, Feb. 2008.

978-1-4244-6034-2/10/$26.00 ©2010 IEEE


ISSCC 2010 / February 10, 2010 / 11:15 AM

Figure 20.6.1: Source-series-terminated transmitter block diagram.

Figure 20.6.2: Output drivers showing equalization circuitry.

Figure 20.6.3: Sample of equalization settings available for various swing settings (dB).

Figure 20.6.4: Output-stage pull-up with pre-driver circuitry.

Figure 20.6.5: Transmitter clock generation block diagram.

Figure 20.6.6: Measured eye diagram - 7.4Gb/s at 200mV/div, Vtt=1.08V, 127b PRBS, 32mW.


ISSCC 2010 PAPER CONTINUATIONS

Figure 20.6.7: Micrograph of transmitter including output bumps.

