
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 48, NO. 12, DECEMBER 2013

A 10.3-GS/s, 6-Bit Flash ADC for 10G Ethernet Applications

Aida Varzaghani, Member, IEEE, Athos Kasapi, Member, IEEE, Dimitri N. Loizos, Member, IEEE, Song-Hee Paik, Shwetabh Verma, Member, IEEE, Sotirios Zogopoulos, and Stefanos Sidiropoulos, Member, IEEE

Abstract: This paper presents the design of a 40-nm CMOS 10.3-GS/s 6-bit Flash ADC used as the analog frontend of a universal DSP-based receiver that meets the requirements for all the NRZ 10G Ethernet (10GE) standards, for both fiber and copper channels. The 4-way interleaved ADC consists of a pair of frontend variable gain amplifiers (VGAs) driving four sets of track-and-hold (T/H) switches, followed by fine VGAs that drive 6-bit comparator arrays. A Wallace-tree adder is used as the thermometer-to-binary encoder, allowing comparator re-ordering and redundancy. Also integrated is an 8-bit calibration DAC that is used as a reference to nullify the accumulated offset of the entire signal path, as well as to compensate for the nominal nonlinearity of the fine VGA and the resistor ladder. After calibration, the peak SNDR of the ADC is about 34 dB, with bandwidth ranging from 3.5 to 6 GHz over all VGA gain settings. The ADC, along with its entire clock path, occupies 0.27 mm² and consumes 242 mW from a 0.9-V supply.

Index Terms: 10G Ethernet, A/D conversion, ADC, comparator re-ordering, CX1, DFE, DSP-based receiver, FFE, flash ADC, KR, LRM, MMF, SR, time-interleaved ADC, Wallace-tree adder.

I. INTRODUCTION

Increasing demand for higher bandwidth has led to extensive use of 10G Ethernet in all parts of the network infrastructure: from high-density, short optical and electrical links within the data center to long optical fibers connecting these data centers together. A wide array of standards has been developed to support 10GE over both fiber and copper media. For example, the long-reach multi-mode (LRM) standard [1] targets installed-base multi-mode fibers with lengths less than 1 km. The short-reach (SR) standard [2] is used for shorter optical links, generally less than 100 m over multi-mode fiber, using low-cost VCSEL-based optics. The LR and ZR optical standards [2] have been designed for long-reach optical links that run for 10–100 km over single-mode fiber. Similarly, KR [3] and CX1 [4] are standards for 10GE serial links over copper media. The KR links are copper backplane channels up to 1 m in length, while the CX1 copper cables can be several meters in length.

The diversity of the physical media, as well as the severe channel impairments of the legacy fibers at 10.3 Gb/s, motivate the use of flexible DSP-based receivers [5]–[8] that can be designed to meet the performance requirements of multiple 10GE standards at once. Such receivers incorporate sophisticated equalization and timing recovery techniques, while allowing for better power and area scaling with technology. A key block in such receivers is the frontend ADC that digitizes the incoming data. Running at the aggregate data rate, the ADC has to provide sufficient resolution bandwidth under reasonable power dissipation and area constraints.

This paper describes a 10.3-GS/s, 6-bit Flash ADC for 10GE applications. The resolution of the ADC can be reduced to save power for easier channels. The Flash architecture enables lower latency, superior flexibility, and a lower predicted metastability error rate than other high-speed, low-to-medium-resolution ADCs. With pipeline or successive-approximation (SAR) architectures, the relatively longer latency to produce a sample may adversely impact the clock and data recovery (CDR) performance. In addition, due to the nature of successive bit decoding, substantial time-interleaving may be necessary to ensure a metastability error rate better than 10^(−…), which is a typical requirement in such applications [6], [9].

Implemented in 40-nm CMOS technology, the ADC described in this paper is used as the analog frontend for a DSP-based receiver that meets the requirements for all the NRZ 10GE standards. While this paper focuses on the design and implementation of the ADC, a complete block diagram of the DSP-based receiver is shown in Fig. 1. The incoming signal is first digitized by the ADC and then processed by an adaptive equalizer to recover the transmitted data. The receiver also includes baud-rate CDR logic, adaptation and gain control, and a histogram engine to compute the ADC accuracy and the expected bit-error rate (BER). An on-chip microcontroller (μC) is used to coordinate the activities of the entire DSP. It is also used to calibrate and monitor the ADC.

The rest of this paper describes the design and architecture of the 4-way interleaved ADC. This ADC design targets simplicity, architectural flexibility, and extensive use of calibration techniques. Section II discusses the architecture of the ADC and highlights some key features. Section III describes the design of the major circuit blocks within the ADC. Section IV covers the details of the calibration scheme along with its enabling hardware. Section V presents the measurement results. Finally, Section VI summarizes the paper and draws conclusions.

Manuscript received April 19, 2013; revised June 28, 2013; accepted July 05, 2013. This paper was approved by Guest Editor Brian Brandt.

A. Varzaghani, D. Loizos, S.-H. Paik, S. Verma, and S. Zogopoulos are with Broadcom Corporation, Santa Clara, CA 95054 USA.

A. Kasapi is with Athos Kasapi Consulting LLC, San Francisco, CA 94117 USA.

S. Sidiropoulos is with Barefoot Networks Inc., Palo Alto, CA 94306 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2013.2279419

0018-9200 © 2013 IEEE

Fig. 1. Block diagram of frontend ADC within the DSP-based receiver.

Fig. 2. ADC top-level block diagram.

II. ADC ARCHITECTURE

Fig. 2 shows the architecture of the 6-bit 4-way interleaved ADC. The on-chip μC first calibrates the ADC upon receiver start-up, and then continuously compensates the inter-channel mismatch during the normal operation of the ADC. The ADC consists of a pair of frontend VGAs (G1) that provide coarse gain control. Each instance of G1 drives a pair of T/H switches that are clocked by complementary phases of a 2.575-GHz clock. Each T/H output is buffered by a fine-resolution VGA (G2) that drives the input load of the comparator array. The 5-bit offset DAC at the output of each G2 is used by the μC to continuously compensate the residual offset of each interleave. Each comparator within the array can be individually powered down, allowing the flexibility to save both comparator and clock distribution power for the less-impaired channels. The thermometer output of the comparator array goes through a series of metastability-hardening flip-flops. A Wallace-tree adder then converts the metastability-hardened thermometer code into binary. Finally, the output is retimed to a single clock phase and demultiplexed to a 644-MHz rate.

The quadrature 2.575-GHz clock phases are derived from a divide-by-two block following a 5.15-GHz phase interpolator. These clocks go through CML fan-up buffers before splitting into two separate buffer chains. The T/H clocks are distributed via additional CML buffers that can adjust the timing skew with a resolution of 0.5 ps, while the comparator array strobe clocks are driven by CMOS buffers. The strobe clocks can be adjusted with respect to the sampling clocks with a resolution of 20 ps.

The rest of this section discusses some architectural choices made in this design.
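The clock frequencies quoted in this section are tied together by simple ratios. The short Python sketch below checks them: the 5.15-GHz interpolator output divided by two gives the per-interleave clock, four quadrature phases of that clock add up to the aggregate rate, and the 0.5-ps skew trim is compared with the sample spacing. The 16 samples per output word is inferred from the ratio of the two quoted rates; it is an assumption, not a figure stated in the text.

```python
FS = 10.3e9          # aggregate sample rate (S/s)
N_INTERLEAVE = 4     # interleaves, one per quadrature clock phase
F_PI = 5.15e9        # phase-interpolator output (Hz)
F_OUT = 644e6        # demultiplexed output rate (Hz), rounded in the text

f_sub = F_PI / 2                     # divide-by-two -> per-interleave clock, 2.575 GHz
assert abs(f_sub * N_INTERLEAVE - FS) < 1.0   # four phases cover the aggregate rate

t_sample_ps = 1e12 / FS              # spacing between consecutive samples (ps)
# Sampling instant of each interleave within one 2.575-GHz period (0/90/180/270 degrees).
phase_delays_ps = [k * t_sample_ps for k in range(N_INTERLEAVE)]

# Samples delivered per output clock (inferred from the two quoted rates).
samples_per_word = round(FS / F_OUT)

# The 0.5-ps T/H skew trim step as a fraction of the sample spacing.
skew_step_fraction = 0.5 / t_sample_ps

print(f"{f_sub / 1e9:.3f} GHz per interleave, {t_sample_ps:.1f} ps between samples")
print([round(d, 1) for d in phase_delays_ps], samples_per_word, f"{skew_step_fraction:.2%}")
```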

A. ADC Frontend Signal Path

Fig. 3. Mitigating the charge sharing problem between the T/H switches.

The frontend signal path in this ADC is a cascade of G1, the T/H switches, and G2. This configuration enables high tracking bandwidth, mitigates charge sharing between the T/H switches, and allows independent gain and offset calibration of each interleave. Since G1 drives a small capacitive load, high tracking bandwidth is possible. G1 absorbs most of the input signal dynamic range to help mitigate dynamic harmonic distortion produced by the T/H switches. The split design of G1 alleviates charge sharing between the T/H switches, which can degrade the accuracy of the sampled signals. This mechanism is shown in Fig. 3. Consider network A, where the 4-way interleaved T/H switches are driven by a single VGA. In this case, the onset of sampling in one T/H overlaps with the sampling operation in the previous T/H, corrupting its output due to charge sharing between the capacitors. In this ADC, this problem is mitigated by splitting the single frontend VGA into two instances, as shown in network B. Each VGA drives a pair of T/H switches clocked by complementary phases of a 2.575-GHz clock. Since the sampling phases of each pair of T/H switches do not overlap, corruption due to charge sharing does not occur. Lastly, the slower G2 in each interleave provides fine gain control while enabling per-interleave gain and offset calibration.
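To make the charge-sharing argument concrete, the Python sketch below models each T/H as tracking while its clock phase is high and checks which pairs of tracking windows overlap in time. The ideal 50% duty cycle is an assumption (the text does not state it). In network A every pair shares one VGA output, so any overlap matters; in network B only the complementary pairs (phases 0/2 and 1/3) share a VGA.

```python
F_CLK = 2.575e9          # per-interleave clock (Hz)
T = 1 / F_CLK            # clock period (s)
N = 4                    # quadrature phases: 0, 90, 180, 270 degrees

def tracking_window(phase):
    """(start, end) of the tracking interval of T/H `phase`, assuming 50% duty cycle."""
    start = phase * T / N
    return start, start + T / 2

def windows_overlap(i, j, tol=1e-15):
    """True if T/H i and T/H j track simultaneously; edges that merely touch do not count."""
    a0, a1 = tracking_window(i)
    b0, b1 = tracking_window(j)
    # The clock is periodic, so also compare against window j shifted by one period.
    return any(min(a1, b1 + s) - max(a0, b0 + s) > tol for s in (-T, 0.0, T))

# Network A: a single VGA drives all four switches, so every pair is relevant.
shared = {(i, j): windows_overlap(i, j) for i in range(N) for j in range(i + 1, N)}
# Network B: each VGA drives only a complementary pair.
split = {pair: windows_overlap(*pair) for pair in [(0, 2), (1, 3)]}
print("network A overlaps:", [p for p, hit in shared.items() if hit])
print("network B overlaps:", [p for p, hit in split.items() if hit])
```

Under this model the adjacent phases in network A always overlap by a quarter period, while each complementary pair in network B hands over exactly at the clock edge.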

B. Comparator Re-Ordering and Redundancy

In a Flash ADC, random and systematic offsets in the comparator array may result in nonlinearity and, in extreme cases, non-monotonicity in the ADC transfer function. These offsets can be reduced at the expense of increased device sizes, which results in substantial power and area overhead in the comparator array, as well as in the preceding stages. A more power-efficient approach is to reduce the comparator size and compensate for the voltage offset due to random mismatch using an embedded offset calibration DAC within every comparator. The viability of such an approach may be limited by the range and precision of this offset DAC.

To relax the requirement on the DAC tuning range, this ADC allows dynamic reconfiguration of the comparator order, similar to [10]. In addition to comparator re-ordering, residual offset cancellation is used, leading to a low-power ADC design with smaller residual error. The comparators are aggressively sized down to save power, while firmware-driven offset calibration is used to maximize the ADC yield. The benefit of comparator re-ordering is illustrated in Fig. 4. Consider a sequence of four comparators (0, 1, 2, 3) with ideal trip points at four consecutive reference levels. Due to mismatch in the comparators and the resistor ladder, the actual trip points of these comparators will be randomly distributed. The offset tuning range of the embedded DAC within each comparator needs to be large enough to restore all the trip points to their nominal positions. Alternatively, comparator re-ordering allows a reduction of the comparator offset tuning range. In this scheme, the comparators are first sorted in the order of their actual trip points, and then the residual offsets are removed using the embedded offset tuning DACs. In the example of Fig. 4, the comparators are first re-ordered as (1, 2, 0, 3), and their trip points are then adjusted to the first, second, third, and fourth ideal levels, respectively. As indicated by the arrows, this relaxes the offset DAC tuning range requirement.

This re-ordering flexibility is realized by using a Wallace-tree adder as the thermometer-to-binary encoder. As shown in the example in Fig. 5, the adder naturally disregards the comparator switching order and guarantees output monotonicity [11]. To prevent metastability-related error propagation, the thermometer code goes through a sequence of three flip-flops before arriving at the adder inputs. Together with the comparator and the SR latch, this guarantees a metastability error rate better than 10^(−…).

Using an adder as a decoder in combination with individual comparator offset calibration leads to a power-efficient Flash ADC design. To analyze the benefits of comparator re-ordering for this ADC, a total of 10,000 hypothetical 6-bit comparator arrays are simulated. The nominal LSB size is around 6 mV, while the comparator offset distribution in each ADC has a raw standard deviation of 15 mV. Fig. 6(a) shows the histogram of comparator offset values at their original index positions for these 10,000 ADCs. Depending upon the distribution of the random offsets for a given ADC instance, comparators will not always turn on in the desired order. With an adder as the decoder, the effective standard deviation of the re-sorted array is found to be 7 mV (Fig. 6(b)). With a comparator offset tuning range of 35 mV, up to 5-sigma effective offset variation can be corrected for every comparator. Assuming a Gaussian distribution of offset values, the probability of having insufficient tuning range for a single comparator can be found as

$$P_{\mathrm{fail}} = 2\left[1 - \Phi\!\left(\frac{35\ \mathrm{mV}}{7\ \mathrm{mV}}\right)\right] = 2\left[1 - \Phi(5)\right] \tag{1}$$

where Φ(·) is the Gaussian cumulative distribution function with mean and standard deviation of 0 and 1, respectively. Since there are 4 × 63 comparators in the time-interleaved ADC, the overall offset-related yield can be found as

$$Y = \left(1 - P_{\mathrm{fail}}\right)^{4 \times 63} \tag{2}$$
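Equations (1) and (2) can be evaluated numerically. The Python sketch below uses the standard identity 2[1 − Φ(k)] = erfc(k/√2). The second case, which applies the same 35-mV range to the raw 15-mV spread without re-ordering, is a hypothetical comparison added for illustration rather than a result reported in the text.

```python
import math

def p_insufficient_range(tuning_range_mv, sigma_mv):
    """Eq. (1): two-sided Gaussian tail beyond +/- tuning_range_mv."""
    k = tuning_range_mv / sigma_mv              # tuning range in units of sigma
    return math.erfc(k / math.sqrt(2))          # equals 2 * (1 - Phi(k))

def offset_yield(p_fail, n_comparators=4 * 63):
    """Eq. (2): every comparator in the 4-way ADC must be trimmable."""
    return (1.0 - p_fail) ** n_comparators

p_reordered = p_insufficient_range(35.0, 7.0)   # 5-sigma range after re-ordering
p_raw = p_insufficient_range(35.0, 15.0)        # hypothetical: no re-ordering

print(f"re-ordered: P = {p_reordered:.2e}, yield = {offset_yield(p_reordered):.5f}")
print(f"raw spread: P = {p_raw:.2e}, yield = {offset_yield(p_raw):.4f}")
```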

In this design, re-ordering yields a 2X reduction in the effective offset. Because random offset scales inversely with the square root of device area, this is equivalent to a hypothetical 4X increase in the size, and therefore power, of a single comparator. Because the comparator power is a significant fraction of the overall ADC power, this technique results in significant power savings. Three extra comparators are included at each end of the array to provide sufficient redundancy for re-ordering. These comparators do not add to the power dissipation, as only 63 comparators remain active after calibration.
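The effect of re-ordering on the offset spread can be reproduced with a small Monte Carlo experiment. The Python sketch below is an assumption-laden stand-in for the simulation described above, not the authors' setup. It places 63 + 2 × 3 comparators on an ideal 6-mV grid and adds Gaussian offsets with a 15-mV standard deviation. It then compares the offset each comparator would need at its original index with the residual left after sorting the array and dropping the three spares at each end. A counting function stands in for the Wallace-tree adder to show that the encoded output stays monotonic even when comparators fire out of order. Under these assumptions the re-sorted spread should come out close to the 7 mV quoted in the text. Squaring the ratio of spreads gives the equivalent comparator-area factor.

```python
import random
import statistics

LSB = 6.0            # nominal LSB, mV ("around 6 mV")
SIGMA_RAW = 15.0     # raw comparator offset std dev, mV
N_ACTIVE = 63        # comparators kept on after calibration
N_SPARE = 3          # redundant comparators at each end
N_TOTAL = N_ACTIVE + 2 * N_SPARE
N_ARRAYS = 2000      # the text uses 10,000 arrays; fewer here to keep the run short

def ideal_levels(n):
    """Ideal trip points, one LSB apart and centred on 0 mV."""
    mid = (n - 1) / 2
    return [(k - mid) * LSB for k in range(n)]

def counting_encoder(vin, trips):
    """Stand-in for the Wallace-tree adder: count the comparators that fired.
    Which comparator fired does not matter, only how many did."""
    return sum(1 for t in trips if vin > t)

levels = ideal_levels(N_TOTAL)
targets = levels[N_SPARE:N_SPARE + N_ACTIVE]      # levels the 63 kept comparators must hit
rng = random.Random(1)
raw_err, eff_err = [], []
for _ in range(N_ARRAYS):
    trips = [v + rng.gauss(0.0, SIGMA_RAW) for v in levels]
    # Without re-ordering: comparator k is trimmed back to its own ideal level.
    raw_err += [t - v for t, v in zip(trips, levels)][N_SPARE:N_SPARE + N_ACTIVE]
    # With re-ordering: sort, drop the spares at each end, trim the k-th survivor to the k-th target.
    kept = sorted(trips)[N_SPARE:N_SPARE + N_ACTIVE]
    eff_err += [t - v for t, v in zip(kept, targets)]

sigma_raw = statistics.pstdev(raw_err)
sigma_eff = statistics.pstdev(eff_err)
area_factor = (sigma_raw / sigma_eff) ** 2        # offset ~ 1/sqrt(area)
print(f"raw {sigma_raw:.1f} mV, re-ordered {sigma_eff:.1f} mV, area factor ~{area_factor:.1f}x")

# Sweep the input across the last simulated array: the encoded output never decreases.
encoded = [counting_encoder(x * 0.5, trips) for x in range(-600, 601)]
assert all(a <= b for a, b in zip(encoded, encoded[1:]))
```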

III. CIRCUIT IMPLEMENTATION

A. Track-and-Hold and Variable Gain Amplifiers

Fig. 4. Comparison of comparator offset adjustment: (a) using the original comparator order and (b) after comparator re-ordering.

Fig. 5. Example of using an adder as the thermometer-to-binary encoder.

Fig. 6. Numerical results of comparator re-ordering benefits. Distribution of comparator offset values (a) before and (b) after re-ordering.

The frontend design comprises a cascade of two VGAs with a T/H stage in between (Fig. 2). The differential T/H consists of a pair of passive PMOS switches, with dummy devices to reduce charge injection and cross-coupled replica capacitors to minimize signal feed-through [9]. The T/H clocks are provided by CML buffers because of their superior supply noise immunity. Because of the extensive switching activity on the supply, this supply immunity is critical to the performance of the ADC. The delay of the T/H clock is digitally tuned via a 6-bit binary-weighted capacitive load at the output of the driving CML buffer. This scheme provides a skew resolution of 0.5 ps. The number of buffer stages and the delay tuning range are constrained to ensure minimal jitter accumulation.

The VGAs adjust the input signal to provide an optimal swing of nearly 400 mV differential peak-to-peak (dpp) to the comparator array. The amplifier G1 provides coarse gain control at high bandwidth, while a slower G2 provides fine gain control over a relatively narrow gain range. Fig. 7 shows the design of the two source-degenerated differential amplifiers. For both amplifiers, source degeneration is implemented via a digitally programmable resistor array that allows the gain to be varied. The gain ranges for G1 and G2 are [−6 dB, 6 dB] and [−1 dB, 5 dB], respectively, with 5-bit dB-linear gain control. The input signal can be as high as 800 mVdpp. G1, which absorbs most of the input signal dynamic range, uses shunt peaking [12] to achieve a nominal bandwidth of 6 GHz. Shunt peaking in G1 provides a 1.6X bandwidth improvement without any additional power consumption. The cascode devices M3 and M4, which help extend bandwidth during normal operation, also disconnect the input signal from the load during calibration. The total harmonic distortion of G1 is below −36 dB under worst-case conditions.

The bandwidth requirement for G2 is somewhat relaxed (2 GHz), since its output can settle for up to around 200 ps after the T/H enters the hold state. Charge feedback via the gate-to-drain overlap capacitance of M1 and M2 can disturb the voltage held on the sampling capacitors as the output of G2 settles. Cross-coupled devices M3 and M4 perform first-order cancellation of this effect. The accumulated offset of the entire signal path is suppressed by a 5-bit current-steering DAC at the output of G2. This DAC has an offset tuning range of 20 mV.
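A minimal Python sketch of how the two 5-bit dB-linear gain ranges could be combined to reach the roughly 400-mVdpp comparator swing. The code-to-gain mapping (code 0 at the bottom of each range, equal dB steps up to code 31) and the coarse-then-fine selection rule are assumptions for illustration; the text states only the ranges, the 5-bit dB-linear control, and the target swing.

```python
import math

N_CODES = 32                  # 5-bit gain control
G1_RANGE_DB = (-6.0, 6.0)     # coarse VGA
G2_RANGE_DB = (-1.0, 5.0)     # fine VGA
TARGET_VDPP = 400.0           # desired swing at the comparator array, mVdpp

def gain_db(code, lo, hi):
    """dB-linear control: equal dB steps from lo (code 0) to hi (code 31)."""
    return lo + (hi - lo) * code / (N_CODES - 1)

def nearest_code(want_db, lo, hi):
    """Code whose gain is closest to want_db within the given range."""
    return min(range(N_CODES), key=lambda c: abs(gain_db(c, lo, hi) - want_db))

def choose_gains(vin_vdpp):
    """Coarse-then-fine: G1 takes most of the needed gain, G2 trims the remainder."""
    need_db = 20 * math.log10(TARGET_VDPP / vin_vdpp)
    g2_mid = sum(G2_RANGE_DB) / 2
    c1 = nearest_code(need_db - g2_mid, *G1_RANGE_DB)      # leave G2 near mid-range
    c2 = nearest_code(need_db - gain_db(c1, *G1_RANGE_DB), *G2_RANGE_DB)
    total_db = gain_db(c1, *G1_RANGE_DB) + gain_db(c2, *G2_RANGE_DB)
    return c1, c2, total_db, need_db

for vin in (800.0, 400.0, 200.0):
    c1, c2, total, need = choose_gains(vin)
    print(f"{vin:5.0f} mVdpp in: G1 code {c1}, G2 code {c2}, "
          f"{total:+.2f} dB (needed {need:+.2f} dB) -> {vin * 10 ** (total / 20):.0f} mVdpp out")
```

Letting G1 absorb the large steps mirrors the division of labor described above. The residual error is then bounded by half of G2's finer step (6 dB / 31 ≈ 0.19 dB) whenever the remainder falls inside G2's range.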

B. Comparator Array

Fig. 7. Source-degenerated variable gain amplifiers: (a) G1 and (b) G2.

Fig. 8. Dynamic comparator with offset tuning and kickback cancellation.

The comparator array consists of 63 active comparators and a resistor ladder. The ladder range is 400 mVdpp, resulting in a nominal LSB value of 6.25 mV.

To achieve a power-efficient design, the dynamic comparators in this ADC operate without any preamplifiers. To save power in the clock tree, each comparator uses a local clock buffer that can be disabled when the individual comparator is powered down. Fig. 8 shows the design of the comparator, which is based on a dynamic sense-amplifier latch [13] with several modifications. The comparator samples the differential input signal and compares it with the differential reference generated by the resistor ladder. This operation is performed by the differential pairs M1–M2 and M3–M4 in Fig. 8. This input device configuration accommodates the large signal range at the edges of the ladder.

Offset tuning is accomplished by turning on extra devices in parallel on the reference path (M2 and M4). This offset DAC has been placed away from the high-speed signal path in order to minimize its impact on the dynamic sampling behavior of the comparator. The offset tuning range is [−35 mV, +35 mV] with 5-bit control. The offset tuning is robust, with less than 0.5 LSB drift across a 0–110 °C temperature range and a 0.85–0.95 V supply range.
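The ladder and trim numbers above can be related with a few lines of arithmetic. The Python sketch assumes the 5-bit offset DAC is used as a signed code from −15 to +15 with equal steps across ±35 mV (the text gives only the range and the bit count). It also applies the standard 6.02N + 1.76 dB relation between resolution and full-scale-sine SNDR, which converts the 5.35-bit ENOB reported in Section V into an equivalent SNDR.

```python
LADDER_VDPP = 400.0      # resistor-ladder span, mVdpp
BITS = 6
OFFS_RANGE_MV = 35.0     # comparator offset DAC reaches +/-35 mV
OFFS_MAX_CODE = 15       # assumption: signed codes -15..+15, equal steps

lsb_mv = LADDER_VDPP / 2 ** BITS                 # nominal LSB
offs_step_mv = OFFS_RANGE_MV / OFFS_MAX_CODE     # trim resolution per code
drift_limit_mv = 0.5 * lsb_mv                    # "< 0.5 LSB" drift bound in mV

def ideal_sndr_db(bits):
    """Full-scale sine through an ideal N-bit quantizer."""
    return 6.02 * bits + 1.76

def enob(sndr_db):
    """Effective number of bits for a measured SNDR."""
    return (sndr_db - 1.76) / 6.02

print(f"LSB {lsb_mv:.2f} mV, trim step {offs_step_mv:.2f} mV ({offs_step_mv / lsb_mv:.2f} LSB)")
print(f"drift bound {drift_limit_mv:.3f} mV, ideal 6-bit SNDR {ideal_sndr_db(BITS):.2f} dB")
print(f"5.35-bit ENOB <-> {ideal_sndr_db(5.35):.2f} dB SNDR; 34 dB -> {enob(34.0):.2f} bits")
```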

In general, input kickback noise is a common problem for all dynamic comparators. As shown in Fig. 9(a), upon activation of a dynamic comparator, the reset devices are disengaged, and the source and drain of the input transistors are pulled to ground. This action draws charge from the gates of the input devices and creates kickback noise at the input, which may interfere with the comparator decision. A power-inefficient way to reduce the kickback noise is to use a preamplifier to reduce the impedance seen at the comparator inputs. Instead, in this circuit, kickback noise cancellation is accomplished using dummy transistors at the comparator inputs. As shown in Fig. 9(b), device M1d is a dummy replica of M1, with its source/drain node held at ground during reset. Upon comparator activation, the source and drain nodes of M1d are pulled up to the supply voltage. The complementary turn-off action of this dummy device supplies all the charge drawn by the input transistor as it turns on, so ideally no additional charge is drawn from the input. If a delay exists between the transition times of the input and the dummy transistors, some charge may temporarily leak to the input terminals, causing a residual error. To mitigate this, complementary switches (Mn1, Mp1, Mn2, and Mp2) are used to allow better transition tracking across process, voltage, and temperature (PVT) corners. Even though the kickback cancellation scheme increases the input capacitance of the comparators, it nominally eliminates the kickback noise, allowing power savings in G2. Simulation results show that without such kickback cancellation, the trip voltages of the comparators at the ends of the array could move by 250 mV. The charge kickback to the resistor ladder is suppressed via bypass capacitors placed at each ladder node. Operating at 2.575 GHz, the comparator achieves better than 0.2-mV input sensitivity across all PVT corners.

IV. ADC CALIBRATION

A. Initial Power-on Calibration

Fig. 9. (a) The kickback mechanism in a dynamic comparator, and (b) charge kickback cancellation using dummy replica transistors.

Fig. 10. Shared calibration DAC.

Upon chip power-up, the signal path is disabled and a common 8-bit calibration DAC is used to sequentially calibrate the ADC interleaves. The design of this calibration DAC is shown in Fig. 10. It is a binary 8-bit PMOS current DAC array, with segmentation for the three most significant bits. During calibration, a single cascoded current mirror is used to steer current into the loads of the two G1 amplifiers. Since the cascode devices can be turned off by grounding their gate nodes, they also act as multiplexers. The calibration DAC generates calibration voltages across a range of 200 mV, with an LSB size of 1.6 mV. It operates from a 1.8-V supply and is powered down during the normal operation of the ADC.

Fig. 11 shows the path for the initial calibration of each interleave. During this calibration phase, the output of G1 is disengaged from the input via transistors M3–M4, shown in Fig. 7(a), and the shared 8-bit calibration DAC drives the G1 resistors. The corresponding T/H switches are forced on. The μC firmware sweeps the calibration DAC voltage to determine the trip points of all the comparators in the array. It then sorts the array in trip-point order and powers down the three redundant comparators at each end of the re-sorted array. Finally, the μC sets the calibration DAC voltage to each desired trip point and drives the target comparator's 5-bit offset DAC to cancel the residual offset. This comparator array calibration is performed only upon chip start-up and is robust against changes in supply and temperature. Since the calibration signal is injected at the output of G1, this procedure also compensates for the nonlinearity of G2 as well as mismatch within the reference ladder of each interleave. Moreover, since the common calibration DAC is used to calibrate each interleave, the calibration procedure also corrects for any initial gain and offset mismatch between the interleaves upon chip start-up.
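The start-up procedure above (sweep, sort, power down the spares, then trim each survivor at its target) can be sketched behaviorally in Python. Everything about the hardware interface here is hypothetical: the bipolar code-to-voltage mapping of the calibration DAC, the ideal comparator model, the sign convention of the offset DAC, and the use of signed codes −15..+15 for the ±35-mV trim. Trip points that never fire within the sweep are simply sorted to the top end.

```python
import random
import statistics

CAL_LSB = 1.6           # calibration-DAC step, mV
CAL_CODES = 256         # 8-bit calibration DAC
ADC_LSB = 6.25          # nominal ladder LSB, mV
OFFS_MAX_CODE = 15      # assumption: signed offset codes -15..+15
OFFS_STEP = 35.0 / OFFS_MAX_CODE   # +/-35 mV trim range, mV per code
N_ACTIVE, N_SPARE = 63, 3

def cal_voltage(code):
    """Assumed bipolar mapping of the 8-bit calibration-DAC code to input voltage, mV."""
    return (code - CAL_CODES // 2) * CAL_LSB

def comparator(v_in, trip, offs_code=0):
    """Behavioural comparator; each offset code shifts the trip point by OFFS_STEP."""
    return v_in > trip + offs_code * OFFS_STEP

def measure_trip(trip):
    """Step the calibration DAC upward and return the first level that fires the comparator."""
    for code in range(CAL_CODES):
        if comparator(cal_voltage(code), trip):
            return cal_voltage(code)
    return float("inf")   # never fired within the sweep

def calibrate(trips):
    """Return {physical index: (target mV, offset code)} for the comparators left on."""
    measured = [measure_trip(t) for t in trips]
    order = sorted(range(len(trips)), key=lambda i: measured[i])   # re-order by trip point
    kept = order[N_SPARE:len(order) - N_SPARE]                      # spares at both ends off
    result = {}
    for rank, idx in enumerate(kept):
        target = (rank - (N_ACTIVE - 1) / 2) * ADC_LSB
        v_cal = cal_voltage(round(target / CAL_LSB) + CAL_CODES // 2)   # nearest DAC level
        # Raise the offset code until the comparator, held at v_cal, stops firing.
        code = next((c for c in range(-OFFS_MAX_CODE, OFFS_MAX_CODE + 1)
                     if not comparator(v_cal, trips[idx], c)), OFFS_MAX_CODE)
        result[idx] = (target, code)
    return result

# Hypothetical array: 69 comparators on the ideal grid plus 15-mV random offsets.
rng = random.Random(7)
n_total = N_ACTIVE + 2 * N_SPARE
trips = [(k - (n_total - 1) / 2) * ADC_LSB + rng.gauss(0.0, 15.0) for k in range(n_total)]
cal = calibrate(trips)
residuals = [trips[i] + c * OFFS_STEP - t for i, (t, c) in cal.items()]
print(len(cal), "comparators kept; median |residual| =",
      round(statistics.median([abs(r) for r in residuals]), 2), "mV")
```

Because each survivor is trimmed against its own rank's target rather than its physical index, the offset DAC only has to cover the re-sorted spread. Within that range, the final trip point lands within one trim step plus half a calibration-DAC step of the target.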

B. Continuous Background Calibration

As described earlier, the gain and offset mismatches between the interleaves are first corrected during the initial calibration process. During runtime operation, continuous gain adjustment and offset cancellation are performed by the μC on each ADC interleave, based on information derived from the ADC samples. The offset of each interleave is continuously monitored by the DSP, and the μC acts on the offset DAC at the output of the respective G2 amplifier to minimize the offset. Meanwhile, the gains of G1 and G2 are continuously adapted to preserve a target swing of nearly 400 mVdpp at the input of the comparator arrays.

Fig. 12 shows the calibration process for correcting the timing mismatch between the interleaves. The feed-forward equalizer (FFE) is split four ways, with the adaptation of each FFE coefficient based on samples received from a particular interleave. In the absence of inter-channel timing and gain mismatch, the corresponding FFE coefficients are identical. In the presence of timing mismatch, the FFE coefficients deviate from each other to compensate for the delay mismatch. The on-chip μC computes the error difference between the coefficients and adjusts the inter-channel skews to minimize the error using a steepest-descent method. The algorithm assumes a statistically balanced input signal. The continuously adapting LMS feedback loop that controls the FFE compensates for any residual skew mismatch (within 0.5 ps), as well as for subsequent drifts in both the skew and gain mismatches of the four ADC interleaves.
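A minimal Python sketch of the background offset loop. It assumes the DSP estimates each interleave's offset as the mean of a block of that interleave's samples, which is valid only for a balanced, zero-mean input (the same kind of assumption the text makes for skew calibration). It also assumes the firmware moves the G2 offset DAC by one code per update. The ±20-mV range mapped onto signed codes −16..+15 (1.25 mV per code), the data amplitude, the noise level, the block length, and the half-code dead band are all hypothetical choices, not values from the design.

```python
import random

N_INTERLEAVE = 4
DAC_STEP = 1.25              # assumed mV per code of the 5-bit G2 offset DAC
CODE_MIN, CODE_MAX = -16, 15
AMPLITUDE = 150.0            # assumed data amplitude at the comparator input, mV

def balanced_block(rng, n):
    """Hypothetical balanced input: equal numbers of +A and -A symbols in random order."""
    block = [AMPLITUDE] * (n // 2) + [-AMPLITUDE] * (n // 2)
    rng.shuffle(block)
    return block

def background_offset_loop(offsets_mv, iterations=60, block_len=512, seed=0):
    """Per interleave: estimate the offset from a block mean, then nudge the DAC one code."""
    rng = random.Random(seed)
    codes = [0] * N_INTERLEAVE
    for _ in range(iterations):
        for i in range(N_INTERLEAVE):
            net = offsets_mv[i] - codes[i] * DAC_STEP        # offset left after the DAC
            samples = [s + net + rng.gauss(0.0, 1.0) for s in balanced_block(rng, block_len)]
            mean = sum(samples) / len(samples)               # DSP offset estimate
            # Move one code against the sign of the estimate; the half-code dead band
            # lets the loop settle rather than dither on every update.
            if mean > DAC_STEP / 2:
                codes[i] = min(CODE_MAX, codes[i] + 1)
            elif mean < -DAC_STEP / 2:
                codes[i] = max(CODE_MIN, codes[i] - 1)
    return codes

offsets = [3.0, -7.5, 12.0, 0.4]     # hypothetical per-interleave offsets, mV
codes = background_offset_loop(offsets)
residual = [round(o - c * DAC_STEP, 2) for o, c in zip(offsets, codes)]
print(codes, residual)
```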

V. MEASUREMENT RESULTS

Fig. 11. Calibration of a single interleave.

Fig. 12. Inter-channel skew calibration.

Fig. 13. Chip photo.

Fig. 13 shows the chip micrograph, with the key blocks highlighted. The interleaves with complementary clocks are placed adjacent to each other. Fabricated in 40-nm standard CMOS, the ADC occupies 0.27 mm². The nominal supply voltage of the ADC is 0.9 V.

A. ADC Performance

ADC differential nonlinearity (DNL) and integral nonlinearity (INL) are measured by sweeping the calibration DAC code across the ADC input range. Fig. 14 shows the INL before and after calibration. Random mismatch in the comparators and the resistor ladder is the dominant source of nonlinearity before calibration. ADC calibration improves the INL from 3.2 LSB to 0.4 LSB. Similarly, the DNL is reduced to 0.5 LSB after calibration (Fig. 15). Post-calibration nonlinearity is limited by the implementation details of the calibration algorithm and by the resolution of the comparator offset-tuning DAC.

Fig. 14. Typical pre-/post-calibration INL.

Fig. 15. Typical pre-/post-calibration DNL.

The limited chip area in this production version of the ADC does not allow a large enough on-chip memory to produce an FFT plot for an input sinusoid. Instead, the dynamic performance of the ADC is measured and characterized by collecting the sampled histogram of such an input sinusoid [14].

Fig. 16 shows the peak SNDR as a function of the input frequency. The input signal at each frequency was adjusted for a fixed comparator array loading of . The peak SNDR is nearly constant, between 33 and 35 dB, from 200 MHz to 6 GHz, and is distortion-limited. The inter-channel skews, as well as the offset and gain mismatches, which are continuously monitored and corrected by the on-chip μC, are guaranteed to be bounded to 0.5 ps, 1 mV, and 0.2 dB, respectively. The SNDR degradation due to the thermal noise of the ADC and the sampling clock jitter is measured to be well below the quantization noise floor. The effective number of bits (ENOB) at 200-MHz input frequency is 5.35 bits. Once calibrated, the SNDR of the ADC always exceeds 31 dB over the 0.8–1.0 V supply range and the 0–110 °C temperature range.

Fig. 16. Peak SNDR versus input frequency.

The measured overall effective resolution bandwidth (ERBW) of the system varies from 3.5 to 6 GHz over all G1 gain settings. Since G1 uses a variable source resistor to adjust the gain, it introduces a source-dependent zero in the overall system transfer function, creating a dependency between gain and ERBW. This bandwidth variation does not affect system performance, since the firmware and the DSP continuously optimize the overall system bandwidth in conjunction with the required gain.

Table I shows the detailed power breakdown of the ADC in the 6-bit mode. The ADC consumes 242 mW from a 0.9-V supply at 100 °C junction temperature. The comparator array and the digital backend, along with the clock path, are the dominant contributors, so significant power can be saved if the ADC resolution is reduced.

TABLE I. ADC POWER BREAKDOWN

The figure of merit, calculated as

$$\mathrm{FOM} = \frac{P}{2^{\mathrm{ENOB}} \cdot 2\,\mathrm{ERBW}} \qquad (3)$$

where P is the ADC power consumption, is between 0.49 and 0.84 pJ/conv-step across the G1 gain settings. The ADC performance is summarized in Table II.

TABLE II. ADC PERFORMANCE SUMMARY
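As a worked check on the quoted figure-of-merit range, the sketch below assumes that (3) takes the Walden form P / (2^ENOB · 2·ERBW) and plugs in the reported 242-mW power, the 5.35-bit ENOB measured at 200 MHz, and the two ERBW extremes (3.5 and 6 GHz). Reusing the 200-MHz ENOB at both ERBW extremes is itself an assumption. With these inputs the FOM comes out at about 0.49 pJ/conv-step for a 6-GHz ERBW and about 0.85 pJ/conv-step for 3.5 GHz, matching the stated range to within rounding. The standard sine-wave relation ENOB = (SNDR − 1.76)/6.02 likewise maps the roughly 34-dB peak SNDR to about 5.36 bits. Function names are hypothetical.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (full-scale sine input)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w, enob, erbw_hz):
    """Walden-style FOM in J/conv-step: P / (2**ENOB * 2 * ERBW)."""
    return power_w / (2.0 ** enob * 2.0 * erbw_hz)


if __name__ == "__main__":
    print(f"ENOB at 34 dB SNDR: {enob_from_sndr(34.0):.2f} bits")
    for erbw in (3.5e9, 6.0e9):
        fom_pj = walden_fom(0.242, 5.35, erbw) * 1e12  # J -> pJ
        print(f"ERBW = {erbw / 1e9:.1f} GHz -> FOM = {fom_pj:.2f} pJ/conv-step")
```

Because the FOM divides by 2·ERBW, the gain-dependent bandwidth of G1 maps directly onto the FOM spread: the widest-bandwidth gain setting gives the best (lowest) value.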
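The static INL/DNL measurement described at the start of this subsection amounts to locating, for each of the 63 thresholds of a 6-bit converter, the calibration-DAC code at which the output code steps up, and comparing the spacing of those transitions with an ideal LSB. A minimal sketch follows, assuming the DAC is linear in voltage and using an endpoint fit for the ideal LSB; the text does not say whether an endpoint or best-fit line was used, and the function name is hypothetical.

```python
import numpy as np


def dnl_inl_from_transitions(transitions):
    """DNL and INL (in LSB) from measured code-transition levels.

    transitions[k] is the input level (here a calibration-DAC code, assumed
    linear in voltage) at which the ADC output steps from code k to k + 1.
    A 6-bit flash ADC has 63 such transitions. The ideal LSB comes from an
    endpoint fit through the first and last transitions.
    """
    t = np.asarray(transitions, dtype=float)
    lsb = (t[-1] - t[0]) / (len(t) - 1)  # endpoint-fit average step
    dnl = np.diff(t) / lsb - 1.0          # inner code widths vs. ideal LSB
    ideal = t[0] + lsb * np.arange(len(t))
    inl = (t - ideal) / lsb               # transition error, in LSB
    return dnl, inl


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic transfer curve (illustrative only, not measured data):
    # ideal steps of 4 DAC codes plus random threshold offsets.
    measured = 4.0 * np.arange(63) + rng.normal(0.0, 0.8, 63)
    dnl, inl = dnl_inl_from_transitions(measured)
    print(f"max |DNL| = {np.abs(dnl).max():.2f} LSB, "
          f"max |INL| = {np.abs(inl).max():.2f} LSB")
```

With this definition the INL at each transition is the running sum of the preceding DNL values, which is why comparator offset errors accumulate into INL while appearing only locally in DNL.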

B. Receiver Performance

The ADC performance is borne out by the overall receiver system performance, which exceeds all the relevant specifications by a significant margin in a production environment. The system performance measurements are carried out under worst-case conditions, with a 0.85-V supply and 110 °C temperature, using commercial stress generators that conform to the various 10GE standards. Curves of BER versus optical modulation amplitude (OMA) are derived for the receiver under various stressors using a 2 1 PRBS data pattern. The receiver sensitivity, defined as the OMA value for which the BER drops to 10 , is determined for these stressors and is summarized in Table III. It should be noted that the LRM, SR, and CX1 stressors have an additional 5-inch FR4 trace and a double-stacked connector in the signal path. As the table shows, despite this additional impairment, the receiver performance comfortably exceeds all specifications with no error floor. The receiver also successfully operates over KR channels with Nyquist loss greater than 35 dB, representing a margin greater than 10 dB beyond the required specification. Additionally, the raw electrical sensitivity of the receiver is measured to be better than 25 mVdpp, enabling operation over single-mode fiber of lengths greater than 80 km.
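The sensitivity figure is read off a measured BER-versus-OMA curve at the target BER. A minimal sketch of that read-out follows, assuming linear interpolation of log10(BER) between measured OMA points (interpolating on a Q-scale, i.e. inverse-erfc, axis is a common alternative) and a BER curve that falls monotonically as OMA increases. The function name and the numbers in the usage line are hypothetical, not measured data from this receiver.

```python
import numpy as np


def sensitivity_oma(oma_dbm, ber, target_ber):
    """OMA (dBm) at which a measured BER curve crosses target_ber.

    Interpolates log10(BER) linearly between measured points and assumes
    BER decreases monotonically as OMA increases.
    """
    oma = np.asarray(oma_dbm, dtype=float)
    log_ber = np.log10(np.asarray(ber, dtype=float))
    target = np.log10(target_ber)
    if not (log_ber.min() <= target <= log_ber.max()):
        raise ValueError("target BER lies outside the measured range")
    # np.interp needs increasing x; log10(BER) falls with OMA, so reverse.
    return float(np.interp(target, log_ber[::-1], oma[::-1]))


# Illustrative curve only: crossing 1e-9 between 1e-8 and 1e-10.
print(sensitivity_oma([-12, -11, -10, -9], [1e-6, 1e-8, 1e-10, 1e-12], 1e-9))
```

The usage line returns approximately −10.5 dBm, halfway between the two bracketing points on the logarithmic BER axis. An error floor would show up as a BER curve that flattens above the target, which the range check above reports as an error rather than extrapolating.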

VI. CONCLUSIONS

This work presents a 4-way interleaved, 10.3-GS/s, 6-bit, 242-mW flash ADC for a universal 10GE DSP-based receiver. The dynamic comparators are aggressively sized down to reduce power consumption. Comparator re-ordering and redundancy, along with offset tuning, are utilized to meet the linearity requirement. An on-chip 8-bit DAC calibrates the gains and offsets of the interleaves. Inter-channel timing mismatch is corrected by the μC DSP using the difference between the channel FFE coefficients. Thanks to the ADC performance, the receiver exceeds all 10G Ethernet specifications with significant margin.

TABLE III. RECEIVER PERFORMANCE SUMMARY

ACKNOWLEDGMENT

The authors would like to thank Manu Agarwal, Joe Dao, Venkata M. Kumar, Li-Min Lee, Dean Liu, Marc Loinaz, Craig Moriyama, Hong Ngo, and Senaid Tahirovic for their valuable assistance.

REFERENCES

[1] “Physical Layer and Management Parameters for 10 Gb/s Operation, Type 10GBASE-LRM,” IEEE Standard 802.3aq-2006, Sep. 2006.

[2] “10 Gb/s Ethernet task force,” IEEE Standard 802.3ae-2002, p. 802, Jun. 2002.

[3] “Ethernet Operation Over Electrical Backplanes,” IEEE Standard 802.3ap-2007, May 2007.

[4] SFP+ Direct Attach Cable Specifications, SFF-8431 Appendix E, 2009.

[5] M. Harwood et al., “A 12.5 Gb/s SerDes in 65 nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, 2007, vol. 1, pp. 436–437.

[6] O. A. Agazzi et al., “A 90 nm CMOS DSP MLSD transceiver with integrated AFE for electronic dispersion compensation of multimode optical fibers at 10 Gb/s,” IEEE J. Solid-State Circuits, vol. 43, no. 12, pp. 2939–2957, Dec. 2008.

[7] J. Cao et al., “A 500 mW ADC-based CMOS AFE with digital calibration for 10 Gb/s serial links over KR-backplane and multimode fiber,” IEEE J. Solid-State Circuits, vol. 45, no. 6, pp. 1172–1185, Jun. 2010.

[8] S. Verma et al., “A 10.3GS/s 6b flash ADC for 10G Ethernet applications,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, 2013, vol. 1, pp. 462–463.

[9] P. S. Schvan et al., “A 24GS/s 6b ADC in 90nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, 2008, vol. 1, pp. 544–634.

[10] M. F. Flynn et al., “Digital calibration incorporating redundancy of flash ADCs,” IEEE Trans. Circuits Syst. II, vol. 50, no. 5, pp. 205–213, May 2003.

[11] F. K. Kaess et al., “New encoding scheme for high-speed flash ADC’s,” in Proc. IEEE ISCAS, 1997, vol. 1, pp. 5–8.

[12] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits. Cambridge, U.K.: Cambridge Univ. Press, 1998.

[13] J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703–1714, Nov. 1996.

[14] J. Doernberg et al., “Full-speed testing of A/D converters,” IEEE J. Solid-State Circuits, vol. 19, no. 6, pp. 820–827, Dec. 1984.

Aida Varzaghani (S’05–M’06) was born in Rasht, Iran. She received the B.S. and M.S. degrees from Sharif University of Technology, Tehran, Iran, in 1999 and 2001, respectively, and the Ph.D. degree from the University of California, Los Angeles, CA, USA, in 2007, all in electrical engineering.

In 2004, she was a Summer Intern at IBM, Yorktown Heights, NY, USA. She joined Rambus Inc. in 2007. Since 2008 she has been with NetLogic Microsystems, now part of Broadcom Corporation, Santa Clara, CA, USA. Her technical interests include high-speed, high-performance, mixed-mode integrated circuit and system design.

Athos Kasapi (M’12) received the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 1994.

He is presently Principal at Athos Kasapi Consulting LLC. His interests include algorithms and signal processing for wired and wireless communications systems.

Dimitri N. Loizos (S’06–M’08) received the Diploma in electrical and computer engineering from the National Technical University of Athens, Athens, Greece, in 2003 and the M.Sc.E. and Ph.D. degrees in electrical and computer engineering from the Johns Hopkins University, Baltimore, MD, USA, in 2005 and 2007, respectively.

He then joined the Division of Biological Sciences, University of California at San Diego, La Jolla, CA, USA, as a Postdoctoral Fellow. Since 2008, he has been with NetLogic Microsystems Inc., now part of Broadcom Corporation, Santa Clara, CA, USA. His research interests include model-free optimization techniques and their VLSI implementation, analog and mixed-signal IC design for high-speed wire-line transceivers, and system and circuit design for optical network communications.

Dr. Loizos was the recipient of the Best Paper Award at the IEEE Symposium on Integrated Circuits and Systems Design 2007, and received third place for the Best Student Paper at the IEEE International Symposium on Circuits and Systems 2007.


Song-Hee Paik received the B.S. and M.Eng. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, MA, USA, in 2002 and 2004, respectively.

From 2004 to 2008, she was with Analog Devices, Wilmington, MA, USA, working on high-speed I/O for networking and video applications. Since 2008, she has been with NetLogic Microsystems, now part of Broadcom, Santa Clara, CA, USA, where she designs mixed-signal circuits for wire-line transceivers.

Shwetabh Verma (S’97–M’04) received the B.S. degree from the University of Toronto, Toronto, ON, Canada, in 1998, and the M.S. and Ph.D. degrees from Stanford University, Stanford, CA, USA, in 2000 and 2005, respectively. His graduate work focused on the design of low-cost, low-power technology for wireless personal area networks.

Since 2005 he has been with Aeluros Inc., now part of Broadcom, designing circuits and systems for broadband data communications.

Sotirios Zogopoulos received the B.S. degree in electrical engineering and computer science from the Technical University of Crete, Greece, in 2001 and the M.S. and Ph.D. degrees from the University of Southern California, Los Angeles, CA, USA, in 2003 and 2007, respectively.

Since 2007 he has been with Aeluros Inc., which later merged with NetLogic Microsystems and is now part of Broadcom Corporation. His research interests include low-power and high-speed transceiver architectures, calibration schemes, and DSP architectures.

Stefanos Sidiropoulos (S’93–M’98) received the B.Sc. and M.Sc. degrees in computer science from the University of Crete, Greece, in 1991, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 1997.

From 1993 to 2001 he worked in R&D and management positions at various established and start-up companies (Rambus, MIPS, DEC, 8x8 Inc.). In 2001 he co-founded Aeluros Inc., a mixed-signal semiconductor company focused on CMOS ICs for the optical Ethernet market. He was CEO of Aeluros until the company’s acquisition by NetLogic Microsystems in 2007. He was the VP/GM of NetLogic’s Physical Layer Products Group until 2011, when NetLogic was acquired by Broadcom Corporation. Subsequently he was with Broadcom’s Infrastructure Networking Group until 2013, and he is currently with Barefoot Networks Inc., a start-up company in Palo Alto, CA.

He has published more than 30 peer-reviewed technical papers and holds over 50 patents in areas ranging from mixed-signal circuit design to networking algorithms and memory systems. His current technical interests are in the design of hardware and software for communication systems.

