
446 • 2007 IEEE International Solid-State Circuits Conference

ISSCC 2007 / SESSION 24 / MULTI-GB/s TRANSCEIVERS / 24.6

24.6 A 16Gb/s Source-Series Terminated Transmitter in 65nm CMOS SOI

Christian Menolfi, Thomas Toifl, Peter Buchmann, Marcel Kossel, Thomas Morf, Jonas Weiss, Martin Schmatz

IBM, Rueschlikon, Switzerland

The quest for high data rates at low power consumption and area has renewed interest in the source-series terminated (SST) driver. While SST drivers may not necessarily boast better performance than their counterparts using CML output stages, their advantage lies in their potential for lower-power operation [1] and their ability to cope with a large range of termination voltages, which makes them a prime candidate for multi-standard TX. Given the increasing challenge in achieving acceptable analog performance in advanced digital CMOS technologies, the SST driver principle is attractive because it is based entirely on digital switching devices that are optimized for high-speed operation and continue to scale with technology. In this paper, the architecture and design of key components of a half-rate SST TX are presented. The TX implements a versatile, power- and area-efficient equalization and impedance-tuning scheme. Furthermore, it achieves low jitter and negligible duty-cycle distortion (DCD), thanks to a clock duty-cycle cleanup circuit.

Figure 24.6.1 shows a block diagram of the implemented differential half-rate SST TX. All local clocks are derived from a global half-rate CML clock (ck2cml), which is converted to CMOS half-rate (ck2) and quarter-rate (ck4) clocks. The 4b quarter-rate input data d[0:3] is transformed in a first multiplexer stage into half-rate interleaved even and odd data streams. Both the even and the odd data stream then pass through a 4b shift register that consists of 4 latches driven on opposing phases of the half-rate clock (ck2) and implements the delayed data taps of a 4-tap FIR pre-emphasis filter (tap[0:3][even, odd]). In order to set the sign of any pre-emphasis filter tap, each shift-register latch output is followed by an XOR gate that selectively (sign[0:3]) inverts the corresponding filter tap. The resulting 4 even/odd (=8) half-rate tap data streams, along with the half-rate CMOS clock ck2, are then globally distributed to 44 identical differential half-rate SST driver slices, each of which can be configured to select one even/odd tap stream out of the available 4×2 tap data streams. Consequently, each SST driver slice can be assigned to any of the 4 FIR taps, which adds versatile power- and area-efficient equalization capabilities to the TX.
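The slice-to-tap assignment described above can be illustrated with a small behavioral sketch. It is not the authors' implementation: it works at full rate and ignores the even/odd interleaving, it assumes the data before the first bit is 0, and it assumes tap 0 carries the undelayed bit. The function name `sst_fir_output` and the 24/6 slice split are hypothetical choices for illustration.

```python
from typing import List, Sequence


def sst_fir_output(bits: Sequence[int],
                   slice_tap: Sequence[int],
                   sign: Sequence[int]) -> List[float]:
    """Normalized output level per bit for a bank of enabled SST slices.

    bits      : serial data stream (0/1), full-rate abstraction (assumption)
    slice_tap : FIR tap index (0..3) assigned to each enabled slice
    sign      : per-tap inversion flag, 1 = invert (models the XOR gate)
    """
    n_slices = len(slice_tap)
    levels = []
    for n in range(len(bits)):
        total = 0
        for tap in slice_tap:
            # Tap k sees the data delayed by k bit times (0 before the start).
            delayed = bits[n - tap] if n - tap >= 0 else 0
            value = delayed ^ sign[tap]
            # A slice either pulls up (+1) or pulls down (-1).
            total += 1 if value else -1
        levels.append(total / n_slices)
    return levels


# Hypothetical setting: 30 enabled slices, 24 on tap 0 and 6 on an
# inverted tap 1, which gives FIR weights of +0.8 and -0.2.
slice_tap = [0] * 24 + [1] * 6
sign = [0, 1, 0, 0]
levels = sst_fir_output([0, 1, 1, 1, 0], slice_tap, sign)
print(levels)
```

In this sketch the FIR coefficient of a tap is simply the fraction of enabled slices assigned to it, so the equalization resolution is one slice, and reassigning slices changes the weights without changing the total number of slices driving the line.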

Figure 24.6.2 depicts a schematic of a half-rate differential SST slice. Each slice is composed of 2 single-ended SST drivers that form a pseudo-differential output stage. One single-ended slice contains 2 pull-up/pull-down branches with the corresponding even and odd data and select transistors, along with a common pull-up/pull-down polysilicon linearization resistor. Even and odd data for both single-ended SST slices are derived from two 4:1 input multiplexers that select the data stream corresponding to the assigned FIR position. In order to guarantee stable data and the appropriate timing at the multiplexing output SST stage, re-timing latches are introduced. Note that the output impedance of the driver is equal to the parallel combination of all the SST slices, and it does not matter whether a particular slice is pulling up or down. Termination-impedance tuning is obtained with additional logic that disables a selectable number of SST slices by setting them to high impedance (enable=0). A nominal 50Ω impedance is achieved when 30 SST slices are enabled, leaving a tuning range of ±14 slices to comfortably cope with process tolerances. A MOS-to-poly resistance ratio of 1:3 is chosen in this implementation for an optimum accuracy/area trade-off.
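The impedance-tuning arithmetic implied by these numbers can be worked through as follows. This is a sketch under stated assumptions, not data from the paper: it treats all slices as ideal and identical, takes the 50Ω figure as the single-ended output impedance, and derives the per-slice resistance and its MOS/poly split from the 30-slice nominal setting and the 1:3 ratio. The per-slice values are therefore inferred.

```python
N_NOMINAL = 30        # slices enabled for the nominal setting (from the text)
N_TUNE = 14           # tuning range of +/-14 slices (from the text)
Z_TARGET_OHM = 50.0   # nominal output impedance (from the text)

# Enabled slices appear in parallel, so each slice must present N * Z.
r_slice_ohm = N_NOMINAL * Z_TARGET_OHM   # inferred, not stated in the paper

# MOS-to-poly ratio of 1:3 splits the slice resistance into 1/4 and 3/4.
r_mos_ohm = r_slice_ohm * 1 / 4
r_poly_ohm = r_slice_ohm * 3 / 4


def z_out_ohm(n_enabled: int, r_slice: float = r_slice_ohm) -> float:
    """Output impedance with n_enabled identical slices in parallel."""
    return r_slice / n_enabled


n_min, n_max = N_NOMINAL - N_TUNE, N_NOMINAL + N_TUNE
print(f"per-slice resistance: {r_slice_ohm:.0f} ohm "
      f"(MOS {r_mos_ohm:.0f} ohm, poly {r_poly_ohm:.0f} ohm)")
print(f"tuning range at nominal process: "
      f"{z_out_ohm(n_max):.1f} to {z_out_ohm(n_min):.1f} ohm")

# Per-slice resistance spread that can still be tuned back to 50 ohm.
r_lo, r_hi = n_min * Z_TARGET_OHM, n_max * Z_TARGET_OHM
print(f"correctable slice resistance: {r_lo:.0f} to {r_hi:.0f} ohm")
```

Under these assumptions, the 16 to 44 slice range corrects a per-slice resistance anywhere between 800Ω and 2200Ω back to 50Ω, which is consistent with the statement that the range comfortably covers process tolerances.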

A particular concern in a half-rate SST TX is that it transmits on both phases of the half-rate CMOS clock ck2, so any imbalance or DCD has a direct impact on the jitter performance. Special measures are taken in the CML-to-CMOS clock converter, shown in Fig. 24.6.3(a), to clean up DCD. The first CML buffer stage uses DC suppression and acts as a first DCD-cleanup stage [2]. In addition, it provides some gain to maximize the CML output signal swing (out, outb). An AC-coupled inverter with resistive feedback then follows the first CML buffer and acts as a CML-to-CMOS conversion stage [3]. AC-coupling is a simple and efficient way to remove any DC component and to perform a voltage level shift to the input of the inverter, which is DC biased to its trip point by means of the feedback resistor. Three tapered CMOS inverters then follow to provide enough drive strength to globally distribute the half-rate CMOS clock ck2/ck2b to the 44 unit SST slices. Care is taken to minimize delay (~43ps) in the CMOS clock path for minimum power-supply-noise-induced jitter. Furthermore, special attention is paid in the circuit layout to keep the fully differential clock nets ck2/ck2b symmetrical.
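Why clock DCD translates directly into jitter in a half-rate TX can be seen from a short calculation. The sketch below assumes that one bit is sent during the high phase of ck2 and the next during the low phase, and that DCD is expressed as the deviation of the duty cycle from 50%; neither convention is spelled out in the text, and the function name is hypothetical.

```python
from typing import Tuple


def bit_widths_ps(data_rate_gbps: float, duty_cycle: float) -> Tuple[float, float]:
    """Widths of two consecutive bits for a half-rate clock with this duty cycle.

    Assumes one bit is transmitted per clock phase, so the clock period
    spans two unit intervals (UI).
    """
    ui_ps = 1000.0 / data_rate_gbps
    period_ps = 2.0 * ui_ps
    high_ps = duty_cycle * period_ps   # bit sent during the high phase
    low_ps = period_ps - high_ps       # bit sent during the low phase
    return high_ps, low_ps


# Example: 16Gb/s with a 51% duty cycle (1% DCD by the assumed definition).
even_ps, odd_ps = bit_widths_ps(16.0, 0.51)
ui_ps = 1000.0 / 16.0
print(f"UI = {ui_ps} ps, bit widths = {even_ps:.2f} ps / {odd_ps:.2f} ps")
print(f"edge displacement = {even_ps - ui_ps:.2f} ps")
```

With these assumptions, a 1% duty-cycle error at 16Gb/s displaces every second data edge by 1.25ps, which is already larger than the sub-ps random jitter reported for this TX.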

A circuit layout of the implemented SST TX is shown in Fig. 24.6.3(b). The macro occupies an area of 230×56µm² including ESD protection. In order to characterize the performance of the SST TX, a wafer-probeable test chip is implemented (Fig. 24.6.4). The test chip consists of 2 SST TX lanes that share a common external differential half-rate CML clock input. A serial 3-wire interface allows control of the TX settings and of the 2 bit-pattern generators, which generate independently programmable quarter-rate data for the two TX lanes. Furthermore, supply decoupling capacitors on the order of 50pF/lane are placed close to each TX lane. A chip micrograph is shown in Fig. 24.6.7.
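A software stand-in for one of the bit-pattern generators can be sketched as follows. The paper does not give the generator polynomial, so the commonly used PRBS15 polynomial x^15 + x^14 + 1 is an assumption here, as are the all-ones seed, the function names, and the order in which serial bits are packed into the 4b quarter-rate word d[0:3].

```python
from typing import Iterator, List


def prbs15(seed: int = 0x7FFF) -> Iterator[int]:
    """Yield PRBS15 bits from a 15-bit Fibonacci LFSR (x^15 + x^14 + 1, assumed)."""
    state = seed & 0x7FFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        # Feedback is the XOR of the two oldest bits in the register.
        new_bit = ((state >> 14) ^ (state >> 13)) & 1
        state = ((state << 1) | new_bit) & 0x7FFF
        yield new_bit


def quarter_rate_words(bit_source: Iterator[int], n_words: int) -> List[List[int]]:
    """Group a serial stream into 4b words [d0, d1, d2, d3] (d0 first, assumed)."""
    return [[next(bit_source) for _ in range(4)] for _ in range(n_words)]


gen = prbs15()
period = [next(gen) for _ in range(2**15 - 1)]
words = quarter_rate_words(prbs15(), 3)
print(len(period), sum(period))
print(words)
```

Each 4b word corresponds to one ck4 cycle, so at 16Gb/s the pattern generator in this sketch would have to deliver words at 4GHz.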

Figure 24.6.5 shows 3 measured PRBS15 differential data eyes at a 16Gb/s data rate and at 3 different termination voltages Vtt, along with the measured jitter numbers. Sub-ps RJ is measured, which is essentially at the resolution limit of the measurement equipment, while the measured DJ is ~8.5ps and dominated by ISI. Split-wafer-lot measurements of DCD at data rates between 5.2 and 12.5Gb/s remain below 600fs. A DC jitter supply sensitivity of -4.6ps/100mV (at Vdd=1V) is observed. Only a slight degradation in DJ of <1ps is observed when the neighboring lane is switched into operation. No perceivable difference in either the jitter performance or the eye opening is observed for any termination voltage Vtt from 0 to 1V, which demonstrates the versatility of the SST TX in coping with different termination standards. The quality of the SST TX clocking path is confirmed not only by the sub-ps RJ, but also by the DCD-cleanup performance. Figure 24.6.6 shows the measured output peak-to-peak DCD versus the incoming-clock DCD at a data rate of 5Gb/s (2.5GHz CML clock) and different termination voltages Vtt. The measured output DCD at all termination voltages remains below 0.5%pp at an input DCD of ±10%. The measured supply current of one SST TX lane including the bit-pattern generator at 16Gb/s and differential 100Ω termination is 57.5mA at a nominal Vdd of 1V, corresponding to a power-dissipation efficiency of 3.6mW/Gb/s.
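The headline numbers in this paragraph can be cross-checked with a few lines of arithmetic. The power efficiency and the DCD conversion follow directly from the quoted values. The jitter check additionally assumes the common dual-Dirac relation TJ = DJ + 2·Q·RJ at BER 1e-12, which the paper does not state, and uses the Vtt=1.0V values printed in Fig. 24.6.5 (RJ 0.176ps rms, TJ 10.79ps).

```python
from statistics import NormalDist

# Power efficiency from the quoted supply current, supply voltage, and data rate.
i_supply_a = 57.5e-3
vdd_v = 1.0
data_rate_gbps = 16.0
power_mw = i_supply_a * vdd_v * 1e3
efficiency_mw_per_gbps = power_mw / data_rate_gbps

# Output DCD of 0.5% pk-pk expressed in time at the 2.5GHz clock used in Fig. 24.6.6.
clock_period_ps = 1e12 / 2.5e9
dcd_out_ps = 0.5 / 100.0 * clock_period_ps

# Dual-Dirac check (assumed model): TJ = DJ + 2*Q*RJ, Q from the Gaussian tail.
q_ber = -NormalDist().inv_cdf(1e-12)
rj_ps, tj_ps = 0.176, 10.79           # Vtt=1.0V values from Fig. 24.6.5
dj_implied_ps = tj_ps - 2.0 * q_ber * rj_ps

print(f"efficiency  = {efficiency_mw_per_gbps:.2f} mW/Gb/s")
print(f"output DCD  = {dcd_out_ps:.1f} ps pk-pk at 2.5GHz")
print(f"Q(1e-12)    = {q_ber:.3f}")
print(f"implied DJ  = {dj_implied_ps:.2f} ps")
```

Under the dual-Dirac assumption the implied DJ lands close to the ~8.5ps quoted in the text, so the reported RJ, DJ, and TJ values are mutually consistent.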

Acknowledgements:
The authors would like to thank Bhavna Agrawal, Michael Beakes, Nick Perez, Steve Walker, and Carl Wermer for analog design enablement and SOI modeling support, and the IBM foundry team for chip manufacturing.

References:
[1] H. Hatamkhani, K. J. Wong, R. Drost, et al., “A 10mW 3.6Gbps I/O Transmitter,” Symp. VLSI Circuits, pp. 97-98, Jun. 2003.
[2] C. Menolfi, T. Toifl, R. Reutemann, et al., “A 25Gb/s PAM4 Transmitter in 90nm CMOS SOI,” ISSCC Dig. Tech. Papers, pp. 72-73, Feb. 2005.
[3] J. Savoj, B. Razavi, “A CMOS Interface Circuit for Detection of 1.2Gb/s RZ Data,” ISSCC Dig. Tech. Papers, pp. 278-279, Feb. 1999.

1-4244-0852-0/07/$25.00 ©2007 IEEE.


ISSCC 2007 / February 14, 2007 / 11:15 AM

Figure 24.6.1: Half-rate SST TX block diagram.
[Diagram not reproduced. Blocks: 4:2 input MUX (d0 to d3, clocked by ck4), two 4-tap FIR shift registers (even and odd), tap sign logic, CML-to-CMOS clock generator, configuration register, global distribution to 44 differential half-rate unit SST drivers. Signals: ck2cml, ck2/ck2b, ck4/ck4b, tap 0 to tap 3 (even/odd), sign[0:3], enable[0:28], config[0,1][0:43], out+/out-.]

Figure 24.6.2: SST driver half-rate slice architecture.
[Schematic not reproduced. Blocks: even data mux (4:1, inputs de_0, do_1, de_2, do_3), odd data mux (4:1, inputs do_0, de_1, do_2, de_3), both selected by config (tap0, tap1, tap2, tap3); re-timing latches clocked by ck2/ck2b; two single-ended half-rate SST slices driving out+ and out-. Signals: de, deb, do, dob, dep, dop, den, don, ck2, ck2b. enable: 0 = off (high impedance), 1 = on.]

Figure 24.6.3: (a) CML to CMOS converter with DCD cleanup, (b) Layout of one SST TX core.
[Schematic and layout not reproduced. (a) Labels: M1, M1B, C1/C1B, inputs ck2cml/ck2cmlb, Out/Outb, outputs ck2/ck2b, 4x FO2.5 stages, ~43ps. (b) Labels: ESD+, ESD-, two groups of 22 differential SST slices, CMOS clock generator, MUX 4:2, FIR; dimensions 230µm × 56µm.]

Figure 24.6.4: Test-chip block diagram.
[Diagram not reproduced. Two TX lanes (lane 0 and lane 1), each with a 4-tap FIR and a half-rate SST driver with 44 unit slices, fed by a pattern generator (PRBS[7|15|31] or 8b programmable pattern) delivering d0:d3 at ck4 (CMOS). Shared ck2cml clock input, outputs out_lane0 and out_lane1, 3-wire serial interface (data, clk, load) with configuration register.]

Figure 24.6.5: Measured PRBS15 data eye at 16Gb/s and at different termination voltages Vtt.
[Eye diagrams for Vtt=0V, Vtt=0.5V, and Vtt=1.0V not reproduced. Measured jitter numbers:]

16Gb/s, PRBS15   RJ [ps] rms   TJ [ps] (BER 1e-12)   DCD [ps]   DJ(d-d) [ps]
Vtt=1.0V         0.176         10.79                 0.250      8.35
Vtt=0.5V         0.176         11.16                 0.170      8.72
Vtt=0V           0.177         10.99                 0.120      8.53

Figure 24.6.6: Measured output DCD versus input-clock DCD at 5Gb/s and different termination voltages.
[Plot not reproduced. Applied input clock signal at 2.5GHz (400ps period), duty cycle 50% with -10% to +10% distortion (20% p-p). X-axis: input DCD in %, from -11 to 11. Y-axis: output DCD in % pk-pk, from 0 to 0.6. Curves: Vtt=0.0, Vtt=0.5, Vtt=1.0.]


Figure 24.6.7: Test-chip micrograph.

[Micrograph not reproduced. Labeled regions: SST TX lane 0 and SST TX lane 1 (each with out+/out- outputs), data pattern generators, 3-wire interface and config register, decoupling caps, clock input (ck2cml, ck2cmlb). Dimension label: 1mm x 1mm.]

