
Adaptive Decision Feedback Equalization With

Continuous-Time Infinite Impulse Response Filters

by

Shayan Shahramian

A thesis submitted in conformity with the requirements for the degree of Doctorate of Philosophy

Graduate Department of Electrical and Computer Engineering

University of Toronto

Copyright © 2016 by Shayan Shahramian

Adaptive Decision Feedback Equalization With Continuous-Time Infinite Impulse Response Filters

Shayan Shahramian

Doctorate of Philosophy, 2016

Graduate Department of Electrical and Computer Engineering

University of Toronto

Abstract

In high-speed (10+Gb/s) chip-to-chip links, the primary impairments to signal integrity are noise, crosstalk, and a smooth tail in the pulse response resulting in inter-symbol interference (ISI) that sometimes spans more than 10 unit intervals (UIs). Although often simple in their implementation, continuous-time linear equalizers amplify high-frequency noise and crosstalk and consume extra power. A conventional discrete-time (DT) decision feedback equalizer (DFE) is well suited and power efficient for channels with a few dominant post-cursor ISI terms; however, its power can become prohibitive for channels with many post-cursor ISI terms. Infinite impulse response (IIR) DFEs can equalize post-cursor ISI persisting for 10 or more UIs while consuming power comparable to that of just one DT tap. DFE architectures with varying numbers of DT and IIR taps are compared for use in typical wireline channels, and it is found that 2 IIR taps can offer an excellent compromise between power consumption and performance. However, an IIR DFE’s performance degrades significantly as the feedback loop delay increases. Fortunately, adding a single DT tap can eliminate the degradation. The first ever hybrid DFE combining 1 DT tap and multiple (2) IIR taps is presented, equalizing 24dB of loss at half the bitrate while consuming 4.1mW at 10Gb/s. A novel edge-based adaptation algorithm is also presented for DT DFEs; it converges faster than previous algorithms while using the same high-speed circuitry and signals required for clock recovery. The edge-based algorithm is extended to work for a 1 DT + 1 IIR DFE. The 1 DT + 1 IIR DFE, along with integrated clock recovery and adaptation, is demonstrated in 28nm FD-SOI CMOS. At 16Gb/s with a 30dB-loss channel, a BER below 10⁻¹² is measured over a 0.3UI timing window. The novel edge-based algorithm adapts both IIR and discrete-tap equalizer coefficients using the same high-speed circuitry and signals required for clock recovery. The adaptive DFE converges within 5µs and is robust in the presence of poorly conditioned data.


Acknowledgements

I would like to thank professor Tony Chan Carusone for his guidance, kindness, and

patience. Tony, you have been a role model for me in all aspects of life. I have learned

countless things from you throughout this journey and you have strongly helped shape

the person I am today. To anyone that has the opportunity to work with professor Chan

Carusone, I could not recommend anyone more highly.

I would like to thank the members of my defense committee, professor Antonio Liscidini,

professor Ali Sheikholeslami, professor Wai Tung Ng, and professor Ravi Adve

for their constructive comments which all helped improve the thesis. I would also like

to thank Professor Samuel Palermo for taking the time to be my external examiner and

making the trip to Toronto to attend the thesis defense in person.

I would like to thank my parents and my in-laws for their support throughout this

degree. Thank you for always encouraging me and pushing me to strive for greatness.

To my brother Shahriar Shahramian, in whose footsteps I have followed and from whom I

have learned countless things, thank you for always being there for me.

I would like to thank my good friend Behzad Dehlaghi for his help with my project.

Behzad, you cared about this project and worked on it with an amazing level of passion

and interest.

To my other close friends who all played your part in me being able to complete this

degree: I want to thank Sadegh Jalali for being a solid friend and becoming a member

of my family. I want to thank Michal Fulmyk (PCB Expert), Alireza Sharif-Bakhtiar,

Mario Milicevic, Joshua Liang, Aynaz Vatankhahghadim, Ravi Shivnaraine, Rosanah


Murugesu, Meysam Zargham, Dustin Dunwell, Mike Bichan, and Luke Wang for the

great memories throughout my degree.

Finally, to my wife Tina Tahmoureszadeh, you are my light in darkness; you have

always been there for me and none of this would be possible without your love and

support.


Contents

1 Background
1.1 Motivation
1.2 Equalization
1.2.1 Continuous Time Linear Equalization
1.2.2 Decision Feedback Equalization
1.2.3 Infinite Impulse Response Decision Feedback Equalization
1.2.4 First-order CTLE vs IIR DFE
1.3 Adaptive Equalization
1.4 Clock and Data Recovery
1.5 Objectives
1.6 Outline

2 Generalized IIR + DT DFE architecture
2.1 Generalized IIR + DT DFE Architecture
2.2 IIR DFE Performance Analysis
2.3 Proposed 2-IIR DFE + 1-DT Receiver
2.3.1 Input Stage
2.3.2 Summing and Latches
2.3.3 Data re-multiplexing & IIR Filters
2.3.4 Clocking and Output Buffers
2.3.5 Measurement Results
2.4 Conclusion

3 Robust Edge-Based Adaptation
3.1 Adaptation Metrics
3.1.1 Adaptation Time
3.1.2 Mean-Square Error
3.1.3 Tolerance to Repeating Patterns
3.2 Discrete-time DFE Adaptation
3.2.1 One 6-bit Pattern Used
3.2.2 Using Patterns of Varying Sizes
3.2.3 Proposed: Utilizing all patterns
3.2.3.1 Ensuring Pattern Diversity
3.3 Comparing Adaptation Schemes
3.4 Conclusion

4 Edge Based IIR DFE Adaptation
4.1 Prior Art
4.2 IIR DFE Adaptation Algorithm
4.3 Proposed System Architecture
4.3.1 Double-tail Latch Architecture
4.3.2 Clockless Multiplexer
4.3.3 IIR Filter Structure
4.3.4 Demultiplexer Structure
4.3.5 Phase Interpolator based Clock Recovery
4.3.6 Adaptation Simulation Results
4.4 Measurement Results
4.4.1 Measurement Setup
4.4.2 Clock Recovery Measurement Results
4.4.3 DFE Adaptation Measurement Results
4.4.3.1 Measured Channel Responses
4.4.3.2 Measured Channel 1 Results
4.4.4 Clock Recovery and DFE Measurement Results
4.4.5 Performance Comparison
4.5 Conclusion

5 Conclusion
5.1 Future Work
5.1.1 Edge Based Adaptation Improvements
5.1.2 Adapting Multiple IIR DFEs

References

List of Tables

2.1 Reduction in vertical eye opening and increase in peak-to-peak jitter for variations in DFE coefficients
4.1 Chip Power Breakdown
4.2 Comparison to previous work.

List of Figures

1.1 Google data center [1].
1.2 Pipeline inside a Google data center used for cooling [1].
1.3 Various architectures for link equalization: A) A passive equalizer and amplification at the receiver. B) Amplification in the transmitter along with a passive equalizer at the receiver. C) A decision feedback equalizer at the receiver.
1.4 Energy efficiency of state-of-the-art DFEs plotted versus the channel loss they compensate at one-half the bitrate.
1.5 A) Insertion loss of two channels dominated by skin-effect and dielectric loss. B) Pulse response for each of the channels showing the ISI terms.
1.6 DFE with 1 IIR tap.
1.7 (A) A receiver with a passive equalizer. (B) A receiver with an IIR DFE redrawn to show that the IIR DFE can be viewed as having access to the transmitted data if the comparator output is error free. (C) Passive equalizer implementation. (D) IIR DFE implementation.
1.8 Block diagram of sign-sign least mean square (SS-LMS) adaptation implementation for a 1-tap DT DFE.
1.9 BER based adaptation requiring an additional high-speed comparator to generate the error signal.
1.10 Histogram based adaptation trying to modify the probability density function (PDF) of the received signal to match that of the ideal transmitted data.
1.11 DFE adaptation without an additional high-speed comparator.
1.12 Clock alignment to the center of the eye using a CDR.
1.13 Alexander bang-bang phase detector for “full-rate” systems where the clock frequency is equal to the bitrate [2].

2.1 DFE with 2 IIR taps.
2.2 Generic DFE architecture consisting of K DT taps and N continuous-time IIR filters.
2.3 Pulse response for DFEs employing different numbers of DT taps and/or IIR filters.
2.4 Simulated eye diagram with clock alignment and jitter histogram.
2.5 Channel and DFE feedback pulse response.
2.6 Measured channel insertion loss.
2.7 Simulated eye diagrams for various DFE architectures.
2.8 Simulated DFE architectures at 10 Gb/s (FR-4 backplane channel).
2.9 Simulated DFE architectures at 10 Gb/s (coax cable).
2.10 (A) A 1 discrete-tap DFE with varying loop delay and the channel pulse response. (B) A 1 IIR DFE with varying loop delay and the channel and resulting equalized pulse responses.
2.11 (A)-(E) Simulated bathtub curves with various latch delays for different DFE architectures for a 32” backplane channel having ∼20dB loss at one-half the bit rate. (F) Post-layout simulations of latch-delay increase as a function of process and VDD variations in a 28nm CMOS technology.
2.12 (A) Block diagram of a 1 IIR + 1 discrete-tap DFE. (B) Block diagram of a 2 IIR DFE.
2.13 (A) Simulated bathtub curves for a 2-IIR DFE with VDD and temperature changes. The coefficients are re-adjusted to compensate for the change in circuit performance. (B) The same simulations and conditions as (A) but for a 2-IIR DFE with 1 discrete tap.
2.14 Block diagram of the receiver.
2.15 Input termination, passive equalizer with disable, and preamplifier.
2.16 Double-tail latch with DFE subtraction directly performed inside the latch. The subtraction for IIR2, not shown, is identical to and in parallel with IIR1.
2.17 Two dynamic 2:1 multiplexers are used to create a differential 2:1 multiplexer.
2.18 IIR filters created using a resistor and switched capacitor circuits. The faster time constant IIR1 (A) includes a varactor to allow for finer tuning.
2.19 (A) ILO1 and ILO2 block diagram showing the delay cells and the ring ILO used. (B) ILO delay cell schematic.
2.20 Prototype 28nm-LP CMOS die photo.
2.21 Measurement setup.
2.22 (A) ILO measurement setup by setting offset voltages to their maximum and minimum values. (B) Clock pattern generated at the output is measured to determine phase shift introduced by the ILOs.
2.23 ILO tuning characteristics for both ILO1 and ILO2 at 10Gb/s and 8Gb/s.
2.24 (A) Latch offset calibration setup to compensate for offset on the odd path. (B) Pattern generated at the output is measured to determine amount of offset present in the latch.
2.25 A, B) Channel insertion loss including the receiver characterization PCB and QFN package loss. The coax and backplane channel losses are dominated by skin effect and dielectric loss, respectively. C, D) Channel pulse response for coax and backplane channels, respectively. E) Model for the characterization PCB + QFN package.
2.26 Full-rate retimed output from the chip.
2.27 (A,B) Bathtub curves at 10Gb/s using the passive EQ only for various TX swings. (C) Input to the receiver when the passive equalizer is used to equalize the signal with a swing of 1200 mVpp-diff. (D,E) Bathtub curves at 10 Gb/s for various DFE settings and with the passive EQ disabled. (F) Eye diagram at the channel output and input to the receiver with a swing of 150mVpp-diff when the DFE is used for equalization.
2.28 Power breakdown and comparison to previous work.

3.1 Metrics for comparing adaptation algorithms.
3.2 A sample repeating pattern where Pseudo Random Bit Sequence (PRBS) data alternates over 10,000 UI with specific repeating patterns.
3.3 a) Block diagram of a DFE and adaptation engine only using phase detector outputs. b) Pulse response showing ISI on the edge samples is removed using the edge based DFE adaptation.
3.4 a) Two patterns that are used to obtain information about ISI at h_{0.5+k}. b) Example patterns of length N+2 used for obtaining ISI information at h_{0.5+k}.
3.5 (left) Adaptation with repeating patterns protect feature disabled. (right) Adaptation with repeating patterns protect enabled and κ = 10.
3.6 Number of different patterns in Λ bits vs. UI for a repeating patterns input.
3.7 Adaptation curves for the three schemes for a simple RC filter with PRBS-31 input. The first half of the adaptation curve is the result of 100 Monte Carlo runs ensemble averaged to determine the settling time.
3.8 Adaptation curves for the three schemes for a 24” backplane channel with PRBS-31 input. The first half of the adaptation curve is the result of 100 Monte Carlo runs ensemble averaged to determine the settling time.
3.9 Adaptation curves for the three schemes for a 50 meter coax cable with PRBS-31 input. The first half of the adaptation curve is the result of 100 Monte Carlo runs ensemble averaged to determine the settling time.
3.10 Adaptation time and normalized coefficient error shown for the three schemes and three different channels with varying amount of attenuation. The results are for 100 Monte Carlo runs with different PRBS31 and noise seeds.
3.11 Adaptation curves for the three schemes for a 16” backplane channel with repeating patterns input.
3.12 Adaptation curves for a 24” backplane channel with varying initial conditions.
3.13 The adaptation time and normalized coefficient error as a function of the number of patterns per decision, Λ.

4.1 Adaptation scheme for an IIR DFE using additional high-speed comparators.
4.2 Proposed 1 IIR + 1 DT architecture without using additional high-speed comparators.
4.3 Pulse response showing which ISI terms are used for the DFE coefficients.
4.4 Block diagram of the proposed 1 DT + 1 IIR DFE with adaptation and a digital clock recovery unit.
4.5 Half-rate IIR DFE implementation with edge comparators included for the adaptation/digital clock recovery unit (CRU).
4.6 Latch architecture with subtraction performed directly inside the latch. There are 5 slices each for the discrete tap and the IIR tap.
4.7 Clk-to-Q delay of the double-tail latch plus the RS latch as a function of input amplitude and body bias voltage.
4.8 Amount of subtraction for the discrete tap as a function of the gain settings. Simulations are for a TT corner with VDD=1V and are RCc extracted.
4.9
4.10 The Clk-to-Q delay of the double-tail latch + 2:1 clockless mux.
4.11 (left) Filter code vs. filter bandwidth for two different IIR filter architectures. (right) Pulse responses for the two architectures of binary weighted resistors (BWR) and binary weighted capacitors (BWC).
4.12 IIR filter implementation using binary weighted resistors.
4.13 (a) 2:8 demultiplexer architecture including clock dividers. (b) A 1:2 demultiplexer implementation using TSPC latches. (c) TSPC latch schematic.
4.14 Phase rotator based digital CRU block diagram.
4.15 (a) Phase code vs. time for the digital CRU without frequency offset. (b) Phase code vs. time for the digital CRU with 50 ppm frequency offset.
4.16 (left) Channel insertion loss for an 8” backplane, 24” backplane, 10 feet coax cable, and 20 feet coax cable. (right) Normalized pulse response for each of the channels.
4.17 1-IIR + 1-DT DFE adaptation curves for various channels showing the tap weights for G1, B1, τ1.
4.18 Chip die photo and area breakdown.
4.19 Measurement setup.
4.20 Main GUI screenshot.
4.21 DFE GUI screenshot.
4.22 CDR GUI screenshot.
4.23 Adaptation GUI screenshot.
4.24 (a) High-frequency board housing the DUT and decoupling capacitors. (b) DC board including regulators, uC, and DACs.
4.25 (a) Half-rate re-timed output eye at 8Gb/s (DEV EN). (b) Phase rotator output clock (CLK0).
4.26 Measured phase code vs. time for 0ppm, 100ppm, and 150ppm frequency error.
4.27 Measured jitter tolerance for 0ppm, 100ppm and 150ppm frequency offset for input amplitudes of 2Vpp-diff and 0.8Vpp-diff.
4.28 Measured channel insertion losses including setup loss.
4.29 Measured channel output eye diagrams not including characterization PCB + QFN package for an input amplitude of 2Vpp-diff.
4.30 Measured channel output eye diagrams not including characterization PCB + QFN package for an input amplitude of 0.8Vpp-diff.
4.31 Repeating patterns used to characterize adaptation robustness.
4.32 Measured adaptation curves for channel 1 with various types of inputs.
4.33 Measured channel 1 bathtub curves for various input amplitudes along with the adapted coefficient values.
4.34 Measured adaptation curves for channel 2 with various types of inputs.
… with the adapted coefficient values.
4.36 Measured adaptation curves for channel 3 with a PRBS7 input (DT Gain, G, is set manually).
… with the adapted coefficient values (DT Gain, G, is set manually).
4.38 Measured adaptation curves for channel 4 with a PRBS7 input (DT Gain, G, is set manually).
… with the adapted coefficient values (DT Gain, G, is set manually).
4.40 Measured jitter tolerance for 0ppm, 100ppm and 150ppm frequency offset for input amplitudes of 2Vpp-diff and 0.8Vpp-diff for channel 1.
4.41 Measured bathtub curves showing the degradation in eye opening with different PI codes.

1 Background

1.1. Motivation

The demand for higher speed data networking and computer infrastructures is constantly increasing due to the increasing amount of internet traffic. A data center from Google is shown in Fig. 1.1. Advances in video quality such as 4K streaming are gaining popularity and the number of users is increasing. In 2014, 48% of the internet traffic in North America was from Netflix and YouTube [3]. To meet this demand, data center I/O bandwidths have been doubling approximately every 24 months [4]. This has led to the next generation of Ethernet standards such as IEEE Std 802.3bj-2014 for 100GE and IEEE P802.3bs for 400GE. The number of 10Gb/s Ethernet switch ports will increase by 25X from 2009 to 2018 [5]. Similarly, demand for faster chip-to-chip communication links has been increasing. Communication between chips is required at increasing speeds over links ranging from ultra-short reach (< 2.5cm) all the way up to long reach (up to 100cm) over a backplane. The next generation of required data rates for such links is now 400Gb/s to 1Tb/s [6].

Figure 1.1: Google data center [1].

While increasing data rates presents challenges, things become much more difficult once thermal issues are also considered. Fig. 1.2 shows the cooling infrastructure for a data center. Massive infrastructures are needed to cool these growing data centers, which makes energy efficiency a critical parameter. Furthermore, chip-to-chip communication reaches a thermal limit at the point where an air-cooled chip heats to the point of catastrophic breakdown. The maximum power density for an air-cooled chip is approximately 100W/cm² [7]. To obtain a reasonable yield, the die size is limited to around 1.5cm x 1.5cm, and such a die can dissipate somewhere between 100W and 200W before thermal failure. Around 10-20% of the power is allocated for the I/O of the chip, which is around 10-40W. Finally, to be able to support the next generation of required data rates, the chips need to have an aggregate bandwidth of 1-4Tb/s. This puts a requirement on the energy efficiency of the link to be 2.5-10 pJ/bit while operating at 10+Gb/s to meet the bandwidth requirements in the near future.
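The energy-efficiency target above follows from dividing the I/O power budget by the aggregate bandwidth. The short Python sketch below enumerates the corner cases using only the figures quoted in the text; the text does not state which budget is paired with which bandwidth, so the enumeration itself is an assumption, and the function name `energy_per_bit_pj` is purely illustrative.

```python
def energy_per_bit_pj(io_power_w, bandwidth_tbps):
    """Energy per bit in pJ/bit: 1 W / (1 Tb/s) = 1e-12 J/bit = 1 pJ/bit."""
    return io_power_w / bandwidth_tbps

# Figures quoted in the text (assumed pairings are enumerated, not prescribed).
IO_POWER_W = (10.0, 40.0)       # 10-20% of a 100-200 W chip budget
BANDWIDTH_TBPS = (1.0, 4.0)     # required aggregate bandwidth

for power in IO_POWER_W:
    for bandwidth in BANDWIDTH_TBPS:
        efficiency = energy_per_bit_pj(power, bandwidth)
        print(f"{power:4.0f} W over {bandwidth:.0f} Tb/s -> {efficiency:5.1f} pJ/bit")
```

With these inputs, a 10W budget spread over 1-4Tb/s corresponds to 10-2.5 pJ/bit, and 40W over 4Tb/s also gives 10 pJ/bit, which appears consistent with the quoted 2.5-10 pJ/bit range; the 40W over 1Tb/s corner (40 pJ/bit) falls outside it.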

Figure 1.2: Pipeline inside a Google data center used for cooling [1].

In many high-speed (10+Gb/s) chip-to-chip links, the primary impairments to signal integrity are noise, crosstalk, and a smooth tail in the pulse response resulting in inter-symbol interference (ISI) spanning more than 10 unit intervals (UIs). As the aggregate bandwidth inside high-performance computing, networking infrastructure, and mobile platforms increases, data rates must be increased throughout the communication hierarchy. Hence, the energy efficiency of all links must improve to stay within limited power envelopes, requiring low-power receivers that can overcome channel impairments. Chip-to-chip links over lossy printed circuit boards (approximately 1m in length at 10Gb/s) [8], die-to-die links over silicon interposers [9] (up to a few centimeters in length at 10+Gb/s), or coaxial cable links (whose maximum length for robust communication at 10+Gb/s depends upon the cable cross-section) have between 20 and 30dB of loss at one-half the bitrate [8], [9], [10]. In many cases, these links do not exhibit major discontinuities. Even backplane links may employ high-frequency connectors and/or backdrilling to mitigate the impact of the daughtercard-backplane discontinuity [11]. Hence, the pulse response for many links does not suffer from major reflections, but rather exhibits predominantly a long smooth tail of post-cursor ISI spanning 10 UI or more. These links require equalization to overcome the channel ISI and allow for data recovery at the receiver with low power consumption. Moreover, a receiver with improved sensitivity in the presence of noise and crosstalk can permit lower transmit swings, thereby improving the links’ energy efficiency [12], [13].

The remainder of the chapter is organized as follows. Section 1.2 provides background on different equalization methods. Section 1.3 discusses the most prominent methods of adaptive equalization. Finally, Section 1.4 introduces some of the important receiver features needed to perform clock and data recovery (CDR).

1.2. Equalization

Fig. 1.3 shows various architectures that can be used for equalization of the channels

discussed in section 1.1.

1.2.1. Continuous Time Linear Equalization

Although often simple in their implementation, continuous-time linear equalizers amplify high-frequency noise and crosstalk and consume extra power. For example, a passive equalizer followed by a gain stage (e.g. [14]), as shown in Fig. 1.3A, can be used to cancel the long tail of the pulse response. Equivalently, the two can be combined into a continuous-time active linear equalizer [13]. In both cases, high frequencies experience more gain in the receiver’s linear front end than do low frequencies. Since low frequencies determine the baseline received eye opening, this means that the noise (which is broadband) and crosstalk (typically concentrated at high frequencies) are amplified with respect to the eye opening. Alternatively, the amplification can be performed at the transmitter so that a wider dc swing is transmitted and only a passive equalizer is used at the receiver, as shown in Fig. 1.3B, resulting in the same eye opening as in Fig. 1.3A. In this case, the receiver’s input-referred noise is not amplified, but assuming near- and/or far-end crosstalk arises from similarly architected links, the wider transmit swing will still mean more high-frequency crosstalk. Furthermore, transmit swing cannot be increased indefinitely since it is ultimately limited by the power supply voltage of the transmitter.
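To make the relative high-frequency boost concrete, the following Python sketch evaluates a generic first-order equalizer response with one zero and one pole. The transfer function and the zero, pole, and Nyquist frequencies are illustrative assumptions chosen for a 10Gb/s link, not values taken from the thesis.

```python
import math

def ctle_gain_db(f_hz, f_zero_hz, f_pole_hz, dc_gain=1.0):
    """Magnitude in dB of H(s) = dc_gain * (1 + s/wz) / (1 + s/wp) at s = j*2*pi*f."""
    numerator = math.hypot(1.0, f_hz / f_zero_hz)
    denominator = math.hypot(1.0, f_hz / f_pole_hz)
    return 20.0 * math.log10(dc_gain * numerator / denominator)

# Hypothetical values for a 10 Gb/s link (Nyquist = 5 GHz); assumptions only.
F_ZERO_HZ = 1e9
F_POLE_HZ = 10e9
F_NYQUIST_HZ = 5e9

dc_db = ctle_gain_db(0.0, F_ZERO_HZ, F_POLE_HZ)
nyquist_db = ctle_gain_db(F_NYQUIST_HZ, F_ZERO_HZ, F_POLE_HZ)
print(f"gain at dc:      {dc_db:6.2f} dB")
print(f"gain at Nyquist: {nyquist_db:6.2f} dB")
print(f"peaking:         {nyquist_db - dc_db:6.2f} dB")
```

Under these assumed values the response is flat at dc and roughly 13 dB higher at Nyquist, so any noise or crosstalk near Nyquist is boosted by about that amount relative to the low-frequency content that sets the eye opening.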


Figure 1.3: Various architectures for link equalization: A) A passive equalizer and amplification at the receiver. B) Amplification in the transmitter along with a passive equalizer at the receiver. C) A decision feedback equalizer at the receiver.

1.2.2. Decision Feedback Equalization

A conventional discrete-time decision feedback equalizer (DFE), shown in Fig. 1.3C, is

well-suited and power efficient for channels with a few dominant post-cursor ISI terms.

Since the input to the DFE is the recovered digital data pattern free from channel noise

and crosstalk, it is able to cancel ISI without amplifying noise or crosstalk and without

attenuating the channel’s main-cursor response. Fig. 1.4 shows the energy efficiency

of some of the best DFE implementations plotted against the amount of attenuation

they compensate. State-of-the-art DFE implementations consume 0.083-0.25 pJ/bit/tap

[15], [16], [17], [18], [19]. Hence, DFEs are efficient equalizers for the cancellation of

a few taps of post-cursor ISI compared to continuous-time linear equalizers, which have

recently been reported at 0.27 pJ/bit [13]. For the cancellation of reflections, DFE taps

with programmable delays, called roving taps, have been used [10], [20], [21]. Roving taps

allow the system to cancel the most significant post-cursor ISI terms while only adding

a few extra DFE taps; for example, additional taps in [10].
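The feedback principle described in this section can be illustrated with a minimal behavioural sketch in Python. It is not the circuit implementation discussed in the thesis: the pulse response values, the bit pattern, and the function names `channel_output` and `dfe_receive` are all hypothetical, the channel is noise free, and the tap weights are simply set equal to the post-cursor terms.

```python
def channel_output(symbols, pulse):
    """Noise-free received samples: y[n] = sum_k pulse[k] * symbols[n - k]."""
    return [
        sum(pulse[k] * symbols[n - k] for k in range(len(pulse)) if n - k >= 0)
        for n in range(len(symbols))
    ]

def dfe_receive(samples, taps):
    """Slice each sample after subtracting the ISI predicted from past decisions."""
    decisions = []
    for y in samples:
        recent = decisions[::-1][:len(taps)]  # most recent decision first
        feedback = sum(h * d for h, d in zip(taps, recent))
        decisions.append(1 if y - feedback >= 0 else -1)
    return decisions

# Hypothetical pulse response: main cursor followed by two post-cursor terms.
PULSE = [1.0, 0.7, 0.4]
tx = [-1, -1, 1, 1, -1, 1, 1, 1, -1, -1]
rx = channel_output(tx, PULSE)

no_eq = dfe_receive(rx, [])            # plain slicer, no feedback taps
with_dfe = dfe_receive(rx, PULSE[1:])  # taps set equal to the post-cursors

print("errors without DFE:", sum(a != b for a, b in zip(no_eq, tx)))
print("errors with DFE:   ", sum(a != b for a, b in zip(with_dfe, tx)))
```

Because the feedback is built from sliced decisions rather than from the analog input, the subtraction removes the post-cursor ISI without scaling the main cursor or any noise on the input, which is the property the text highlights. In this sketch the plain slicer makes errors whenever the two preceding bits oppose the current one, while the DFE recovers the pattern exactly.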

It is important to note that the channel loss at one-half the bitrate alone does not

indicate the number of post-cursor ISI terms that are present. Fig. 1.5A shows the

frequency response of two exemplar channels normalized to an arbitrary bitrate, fbit: one

dominated by skin effect loss and another dominated by dielectric loss. Both channels

have 25dB of loss at one-half the bitrate. Fig. 1.5B shows the pulse responses of both

channels with a transmitted pulse amplitude of 1 and 1 UI in duration with the post-

cursor ISI terms shown. Channels dominated by dielectric loss have a faster roll-off

(proportional to frequency) in their frequency response and a few dominant post-cursor

ISI terms. The channel dominated by skin effect loss has a slower roll-off in its frequency

response (proportional to √f) and hence many more post-cursor ISI terms, in spite of

having the same magnitude response at one-half the bitrate.
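The comparison above can be illustrated with a minimal sketch. It assumes idealized loss models (loss in dB proportional to √f for skin effect and proportional to f for dielectric loss), both scaled to 25 dB at one-half the bitrate. The function names and the scaling are illustrative assumptions and are not the channel models used to generate Fig. 1.5.

```python
import math

NYQUIST = 0.5            # one-half the bitrate, in units of f_bit
LOSS_AT_NYQUIST = 25.0   # dB, as for the two exemplar channels


def skin_loss_db(f):
    """Idealized skin-effect loss: dB loss grows as sqrt(f) (assumed model)."""
    return LOSS_AT_NYQUIST * math.sqrt(f / NYQUIST)


def dielectric_loss_db(f):
    """Idealized dielectric loss: dB loss grows linearly with f (assumed model)."""
    return LOSS_AT_NYQUIST * (f / NYQUIST)


# Tabulate both models at a few normalized frequencies.
for f in (0.05, 0.1, 0.25, 0.5):
    print(f"f = {f:4.2f} f_bit: skin {skin_loss_db(f):5.1f} dB, "
          f"dielectric {dielectric_loss_db(f):5.1f} dB")
```

Under these assumed models both channels show 25 dB of loss at 0.5 fbit, but the √f model attenuates more at lower frequencies, which is consistent with the longer ISI tail described for the skin-effect channel.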

For channels exhibiting a long tail of ISI, discrete-time (DT) DFE complexity becomes

Figure 1.4: Energy efficiency (energy per bit, in pJ/bit) of state-of-the-art DFEs plotted versus the channel loss they compensate at one-half the bitrate (channel attenuation at Nyquist, in dB). Discrete-tap DFEs and IIR DFEs are marked separately; data points are from [9], [16]*, [17], [18], [19], [22], [23], [24], [25]. (* Power includes clock buffers.)


Figure 1.5: A) Insertion loss of two channels dominated by skin-effect and dielectric loss (magnitude response in dB versus frequency normalized to fbit). B) Pulse response for each of the channels showing the ISI terms (normalized pulse response versus time in UI).

prohibitive. For example, consider the channel response for the skin effect loss channel

(Fig. 1.5B); note that 10 UI of post-cursor ISI terms exceed 5% of the main-cursor’s

amplitude. Based upon the state-of-the-art 0.083-0.25 pJ/bit/tap, at 10Gb/s a 10-tap

DFE will consume 8.3-25 mW.
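The power estimate above follows from a unit conversion (1 pJ/bit at 1 Gb/s is 1 mW). A small sketch of the arithmetic, with a hypothetical helper name:

```python
def dfe_power_mw(energy_pj_per_bit_per_tap, bitrate_gbps, taps):
    """Power in mW: (pJ/bit/tap) x (Gb/s) x taps, since 1 pJ x 1 Gb/s = 1 mW."""
    return energy_pj_per_bit_per_tap * bitrate_gbps * taps


# 10-tap DFE at 10 Gb/s using the quoted 0.083-0.25 pJ/bit/tap range.
low = dfe_power_mw(0.083, 10, 10)
high = dfe_power_mw(0.25, 10, 10)
print(f"{low:.1f} mW to {high:.1f} mW")
```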

1.2.3. Infinite Impulse Response Decision Feedback Equalization

An alternative approach, illustrated in Fig. 1.6 (left), uses an integrating infinite impulse

response (IIR) DFE to equalize long pulse responses [26], [9], [24], [27], [28]. In this

approach, several DT DFE taps are replaced by a single feedback tap with an infinite

impulse response. The feedback path’s response is designed to match and cancel the tail

Figure 1.6: DFE with 1 IIR tap. (Left) Block diagram with a single IIR feedback filter of time constant τ1. (Right) Pulse responses versus time: channel output (In), IIR1 feedback response, and equalized response.


of the channel response shown in Fig. 1.6 (right). This allows a single filter to cancel

multiple post-cursor ISI terms, saving power. Fig. 1.4 shows the power efficiency of IIR

DFEs relative to the amount of equalization they provide. The downside of an IIR DFE

is the inability to cancel reflections in the pulse response. Since the output of the filter is

a smooth RC response, a good fit cannot be obtained to channels with large amounts of

reflections. However, it is shown in chapter 3 that for channels with moderate reflections,

IIR DFEs can still provide a reasonable performance.

1.2.4. First-order CTLE vs IIR DFE

A continuous-time equalizer and an IIR DFE can be shown to have a similar effect on the

received signal. Let H(s) represent the transfer function of the channel and G(s) represent

the transfer function of a continuous-time (in this case, passive) equalizer as shown in Fig.

1.7A. The circuit parameters of the passive equalizer, RG1, RG2, and CG, are defined in Fig.

1.7C. Let α = RG2/(RG1+RG2), ωGZ = 1/(RG1×CG), and ωGP = 1/(CG×(RG1//RG2));

then the passive equalizer transfer function can be written as

$$G(s) = \alpha \times \frac{1 + s/\omega_{GZ}}{1 + s/\omega_{GP}}. \qquad (1.1)$$

If the recovered data is error free, the comparator output in the receiver is identical

to the transmitted data. Hence, in Fig. 1.7B the IIR DFE is modeled as having access to

the transmitted data. The circuit parameters of the DFE, RI and CI, are defined in Fig.

1.7D. Let I(s) represent the transfer function of the IIR DFE tap. If ωIP = 1/(CI×RI),

then the transfer function for the IIR filter can be written as

$$I(s) = \beta \times \frac{1}{1 + s/\omega_{IP}}. \qquad (1.2)$$

To compare the effect of the passive equalizer with the IIR DFE, we set H(s) = 1

so that only the equalizer transfer functions are compared. In that case, the overall link


response in Fig. 1.7A is simply G(s), whereas the overall response of the IIR DFE link

in Fig. 1.7B is,

$$1 - I(s) = (1 - \beta) \times \frac{1 + s/(\omega_{IP} \times (1 - \beta))}{1 + s/\omega_{IP}}. \qquad (1.3)$$

The two approaches (continuous-time linear equalizer and IIR DFE) will be equivalent

when (1.1) and (1.3) are equal. The DC gain and the pole of the two transfer functions

are equal when β is set to 1 − α and ωIP = ωGP which means that CI = CG and

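RI = RG1//RG2. Under these conditions, it may also be shown that ωIP × (1 − β) = ωGZ,

as follows:

$$\omega_{IP} \times (1 - \beta) = \frac{1}{C_G \times (R_{G1} \parallel R_{G2})} \times \frac{R_{G2}}{R_{G1} + R_{G2}} = \frac{1}{C_G \times R_{G1}} = \omega_{GZ}. \qquad (1.4)$$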

From (1.4) it can be seen that both the passive equalizer and IIR DFE are performing

similar signal conditioning on the link, except that the IIR DFE operates on the recovered

data, free from noise and crosstalk. When H(s) ≠ 1, the continuous-time linear equalizer

and IIR DFE can still be made equivalent if the IIR DFE is modified to take into

account the response of H(s). The one proviso is that the model in Fig. 1.7B does not

include the delay of the channel which will impact the DFE response, I(s). The phase

shift due to the delay will capture the fact that the DFE can only cancel post-cursor ISI.

This model also does not capture the effect of error propagation.
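The equivalence in (1.1)-(1.4) for H(s) = 1 can be checked numerically. The sketch below uses hypothetical component values chosen only for illustration; it evaluates G(s) and 1 − I(s) at a few frequencies with β = 1 − α and ωIP = ωGP and confirms that they agree.

```python
import math


def parallel(r1, r2):
    """Resistance of r1 in parallel with r2."""
    return r1 * r2 / (r1 + r2)


# Hypothetical component values (ohms, farads), for illustration only.
RG1, RG2, CG = 200.0, 50.0, 1e-12

alpha = RG2 / (RG1 + RG2)
w_gz = 1.0 / (RG1 * CG)
w_gp = 1.0 / (CG * parallel(RG1, RG2))

# Matching conditions stated in the text.
beta = 1.0 - alpha
w_ip = w_gp


def G(s):
    """Passive equalizer transfer function, (1.1)."""
    return alpha * (1 + s / w_gz) / (1 + s / w_gp)


def one_minus_I(s):
    """Overall IIR DFE link response with H(s) = 1, left-hand side of (1.3)."""
    return 1 - beta / (1 + s / w_ip)


for f_hz in (1e8, 1e9, 5e9, 2e10):
    s = 2j * math.pi * f_hz
    print(f"{f_hz:.0e} Hz: |G - (1 - I)| = {abs(G(s) - one_minus_I(s)):.2e}")

# The zero of 1 - I(s) should coincide with the equalizer zero, as in (1.4).
print(math.isclose(w_ip * (1 - beta), w_gz, rel_tol=1e-9))
```

As in the text, this check ignores the channel delay and error propagation.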

If there is a continuous-time equalizer in front of the DFE, it alters the pulse response

and often makes it difficult to precisely cancel the ISI with a simple IIR DFE. If the

channel attenuation is high and both a continuous-time equalizer and an IIR DFE are

to be used, special care needs to be taken to make sure the IIR DFE can still match the

shape of the pulse response after the continuous-time equalizer.


Figure 1.7: (A) A receiver with a passive equalizer. (B) A receiver with an IIR DFE redrawn to show that the IIR DFE can be viewed as having access to the transmitted data if the comparator output is error free. (C) Passive equalizer implementation, G(s), with RG1, RG2, and CG. (D) IIR DFE implementation, I(s), with gain β, RI, and CI.


This thesis will focus on IIR DFEs with low power, mainly to determine the DFE

architecture which has the best performance in terms of power, amount of equalization,

and system complexity. The thesis will also present an adaptation algorithm to determine

the IIR DFE coefficients.

1.3. Adaptive Equalization

All the types of equalization discussed in section 1.2 have some control parameters that

determine the amount of equalization. For many applications, the behavior of the channel

is not known a priori and it is desirable to have the equalizer coefficients determined

automatically by the system. Furthermore, the channel properties may change over time

and the system needs to be able to track these changes. All adaptation schemes can be

powered off after convergence to save power; however, they would then no longer track slowly

varying changes in the channel.

Traditional sign-sign least mean square (SS-LMS) based adaptation schemes require

an extra comparator in the signal path to guide the coefficients [29–32]. Fig. 1.8 shows

a block diagram for a SS-LMS adaptation of a 1-tap DT DFE. The SS-LMS algorithm

removes the correlation between the error signal, ek, and the previous bit, dk−1. The

error signal goes to zero and the correlation is removed once the tap cancels the

ISI introduced by the first post-cursor term on the main cursor. If additional DFE taps

are required, the correlation between the error signal and dk−2, dk−3, etc. is removed via

a similar loop with an integrator.
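The SS-LMS loop described above can be sketched behaviourally as follows. This is a simplified model under stated assumptions rather than the circuit of Fig. 1.8: the channel has only a main cursor h0 and one post-cursor h1 (hypothetical values), there is no noise, the error comparator's target level is assumed to equal h0, and the step size mu is arbitrary.

```python
import random


def sign(v):
    """Two-level sign: +1 for v >= 0, otherwise -1."""
    return 1 if v >= 0 else -1


def ss_lms_one_tap(h0=1.0, h1=0.3, mu=1e-3, n_bits=5000, seed=1):
    """Adapt a single DFE tap weight w with the sign-sign LMS update."""
    rng = random.Random(seed)
    w = 0.0
    d_prev = 1       # previously transmitted bit, d_{k-1}
    dhat_prev = 1    # previous decision
    for _ in range(n_bits):
        d = rng.choice((-1, 1))
        x = h0 * d + h1 * d_prev          # channel output with one post-cursor term
        y = x - w * dhat_prev             # DFE subtracts the estimated ISI
        dhat = sign(y)                    # data decision
        e = y - h0 * dhat                 # error relative to the assumed target level
        w += mu * sign(e) * dhat_prev     # sign-sign update: decorrelate e_k and d_{k-1}
        d_prev, dhat_prev = d, dhat
    return w


w_final = ss_lms_one_tap()
print(f"adapted tap weight: {w_final:.3f}")
```

In this model the tap weight ramps toward h1 in steps of mu and then dithers around it, at which point the error is no longer correlated with the previous bit.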

In [29], whose block diagram is shown in Fig. 1.9, a comparator with an adjustable

threshold and independent phase alignment is used, which allows for a bit error rate

(BER) based adaptation. By comparing the recovered data with the output of the comparator,

information about the vertical eye opening can be obtained and used to guide

equalization. In [30], an error comparator is used with a desired threshold which also

needs to be adjusted; this leads to two adaptation loops, one for the equalizer coefficients


Figure 1.8: Block diagram of SS-LMS adaptation implementation for a 1-tap DT DFE.

and another to obtain the optimal comparator threshold value.

There are also adaptation techniques which utilize a histogram based approach, shown

in Fig. 1.10. In these schemes, data from the output of a comparator with an adjustable

threshold is collected and a decision is made regarding equalization based on the probability

density function (PDF) of the data samples [31], [32]. The transmitted data has a

PDF as shown in Fig. 1.10 (left) where a transmitted 1 (VH) and a 0 (VL) have an ideal

PDF matching a delta function. As the data is transmitted through the channel, ISI

causes the signal’s PDF to deviate from the ideal delta function and spread out with a

Gaussian distribution. The receiver’s adaptation algorithm uses an additional comparator

to obtain information about the PDF of the received signal and modifies

the equalization parameters to match the PDF of the signal as closely as possible to that of the

transmitted signal.

In all of these schemes, an additional high-speed comparator is required to generate

the error signal for the adaptation algorithm. Some of the designs also allow the clock

of the comparators to be adjustable to create an eye-monitor [29], [33]. These additional

high-speed slices consume extra power, load a critical node in the DFE and require


Figure 1.9: BER based adaptation requiring an additional high-speed comparator to generate the error signal.

Figure 1.10: Histogram based adaptation trying to modify the PDF of the received signal to match that of the ideal transmitted data.


additional hardware to adjust their phase.

An alternative architecture involves using only the outputs of the phase detector

as will be discussed in section 1.4. These types of adaptation algorithms require no

additional high-speed hardware and therefore save power as well as avoid loading of the

high-speed node. Fig. 1.11 shows an example architecture where the data(Dm) and

edge(Em) samples from the phase detector are used to guide the adaptation algorithm.

The phase detector architecture is discussed further in section 1.4.

As discussed in section 1.1 this thesis focuses on low-power solutions for wireline

links and in keeping with that goal, these are the types of adaptation algorithms that

are investigated further. Chapters 3 and 4 look at existing adaptation algorithms in this

space in addition to presenting the proposed adaptation scheme.

1.4. Clock and Data Recovery

The CDR will adjust the clock phase of the receiver comparators to ensure the data is

sampled in the middle of the eye [34]. A popular example of particular interest in this

work is a phase interpolator based CDR, shown in Fig. 1.12 [35], [36], [37]. The Alexander

bang-bang phase detector [2] determines whether the clock is sampling early or late

relative to the zero crossings of the signal. The early/late outputs are passed through a

proportional and an integral path. The proportional path helps align the phase of the

clock while the integral path helps track frequency offsets between the input data and

CLKref . The signal is integrated using an accumulator and finally sets the phase code

for the phase interpolator. This system can track small frequency offsets between the

input data and CLKref .
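The early/late decision and the proportional and integral paths described above can be sketched as follows. The decision rule is the commonly described Alexander logic (an edge sample equal to the previous data bit indicates an early clock). The sign convention, the gain values, and the function names are illustrative assumptions, not details taken from Fig. 1.12.

```python
def alexander_pd(d_prev, edge, d_curr):
    """Bang-bang phase detector decision from binary (0/1) samples.

    edge is sampled between d_prev and d_curr.
    Returns +1 (clock early), -1 (clock late), or 0 (no data transition).
    """
    if d_prev == d_curr:
        return 0                      # no transition, no phase information
    return 1 if edge == d_prev else -1


def loop_filter_step(pd_out, integ, kp=4, ki=1):
    """One update of a proportional + integral loop filter (hypothetical gains).

    Returns (increment for the phase-code accumulator, new integral-path state).
    """
    integ = integ + ki * pd_out
    return kp * pd_out + integ, integ


# Example: a 0 -> 1 transition where the edge sample still reads 0 (early clock).
decision = alexander_pd(0, 0, 1)
increment, state = loop_filter_step(decision, integ=0)
print(decision, increment, state)
```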

The block diagram for an Alexander bang-bang phase detector is shown in Fig.

1.13 [2]. Four comparators are required to detect the phase. In the conventional adaptation

schemes discussed in section 1.3, an additional high-speed comparator is required to

guide the adaptation of the coefficients. In this thesis, chapters 3 and 4 will investigate

adaptation schemes that use only the outputs already available in a conventional

Alexander phase detector, reducing the overhead of high-speed comparators and their

loading effect on a critical node.

Figure 1.11: DFE adaptation without an additional high-speed comparator.

Figure 1.12: Clock alignment to the center of the eye using a CDR.

Figure 1.13: Alexander bang-bang phase detector for “full-rate” systems where the clock frequency is equal to the bitrate [2].

1.5. Objectives

The first objective is to understand and improve IIR DFEs because of their potential

for equalizing high-loss channels with a low power consumption. The second objective is

to improve DFE adaptation algorithms to both reduce adaptation time and reduce the

number of high-speed comparators required for the adaptation. The final objective is to

extend the improved DFE adaptation to IIR DFEs.

1.6. Outline

Chapter 2 will compare DFE architectures including IIR taps using some analysis on

how to select an appropriate DFE structure for particular channels of interest. Chapter

2 also presents a 28nm LP CMOS prototype implementing one such architecture along

with measurement results. Chapter 3 presents an edge-based adaptation algorithm that

is faster and more robust than previously implemented algorithms for a conventional

DT DFE. Chapter 4 will discuss how the edge-based adaptation algorithm discussed in

chapter 3 can be further modified to operate on an IIR DFE. A prototype of the IIR

DFE with an integrated adaptation algorithm in a 28nm FDSOI CMOS process is also

presented along with measurement results.

Chapter 2. Generalized IIR + DT DFE architecture

This chapter focuses on the analysis of a generic IIR DFE structure to be used for

wireline channels described in chapter 1. It will be shown that, depending on the shape

of the pulse response, a single time constant IIR DFE filter may not be able to provide

a good fit to cancel all of the post-cursor ISI. Instead, a generalized IIR-DFE structure

is considered which may comprise multiple IIR DFE taps, each having a different gain

and time constant, as shown in Fig. 2.1(left). The case of two IIR DFE taps is given

particular consideration. One filter is used to cancel the first few prevalent post-cursor

ISI terms while the second filter will cancel the remaining pulse response tail as shown

in Fig. 2.1(right). For channels with significant reflections, additional discrete-time taps

can be used. The IIR filters can cancel the general shape of the pulse response while

the additional DT taps cancel remaining ISI due to reflections. Using a continuous-time

linear equalizer with multiple DT DFE taps to cancel the reflections is also possible,

however, the reflections would be boosted by the continuous-time linear equalizer. This

may require the system to have more DT taps since even small reflections may be boosted

and become significant. Moreover, crosstalk and high-frequency noise are amplified by

the continuous-time linear equalizer. Hence, we focus here on receivers that rely upon a

DFE for equalization.


Figure 2.1: DFE with 2 IIR taps. (Left) Block diagram with two IIR feedback filters of time constants τ1 and τ2. (Right) Pulse responses versus time: channel output (In), IIR1, IIR2 (cancels remaining ISI), and equalized response.

2.1. Generalized IIR + DT DFE Architecture

Given a generalized IIR-DFE architecture comprising (possibly) multiple IIR and/or

DT DFE taps, a natural question would be what is the best DFE architecture to use.

This analysis considers a DFE consisting of K DT taps and N IIR filters as shown

in Fig. 2.2. For the conventional DT DFE, N=0, and K taps are used; an example

pulse response is shown in Fig. 2.3a. The tap weights (G1, G2, G3, ...) are chosen to

subtract the post-cursor ISI at the sampling point. It is evident that for a channel with

several post-cursor ISI terms, a large number of DT DFE taps are required resulting

in higher power consumption. In [9], K=1 and N=1, and the parameters G1, B1, and

τ1 are varied. In fact, multiple IIR filters can be used to improve performance further.

Using 2 IIR filters, K=0 and N=2, each filter’s time constant (τ1, τ2) and gain (B1, B2)

can be varied. By having even more degrees of freedom (4 variables), a better fit can be

obtained to the channel pulse response compared with the previous case of K=1, N=1

(3 variables). Fig. 2.3b shows the channel pulse response and the response of the two

feedback filters. In this case, one filter’s parameters (τ1, B1) are chosen to cancel the

first few post-cursor ISI taps, while the second filter (τ2, B2) cancels the small residual

ISI present at the end of the pulse response. In the past, comparisons between IIR and

DT DFEs have not described how the coefficients may be optimized and the choice of

architectures (i.e. the number of DT and IIR taps) has therefore not been rigorously

justified. In this chapter, these architectures are systematically compared revealing that


an architecture with N=2 and K=1 provides an excellent combination of performance

and low complexity for common wireline channels. Also reported are the implementation

and measurement of an integrated DFE incorporating more than one IIR filter.

It is evident that by increasing the degrees of freedom in a DFE, i.e. increasing the

number of taps or adding multiple IIR filters, the system can cancel more ISI. It would be

Figure 2.2: Generic DFE architecture consisting of K DT taps and N continuous-time IIR filters.

Figure 2.3: Pulse response for DFEs employing different number of DT taps and/or IIR filters. (a) Conventional discrete-tap DFE, K=10, N=0: channel output, post-cursor ISI, and discrete-tap DFE response (H1, H2, H3, ..., H10), plotted as amplitude versus time (UI). (b) 2-IIR filter DFE, K=0, N=2: channel output, post-cursor ISI, and the IIR filter responses B1e^(−t/τ1) and B2e^(−t/τ2), plotted as amplitude versus time (UI).


beneficial to determine a DFE architecture providing adequate performance while keeping

the system complexity and power consumption as low as possible. Two things need to be

considered to correctly identify the optimal DFE architecture. Firstly, metrics need to

be determined that will allow for a comparison between the different DFE architectures.

Secondly, for each architecture, the DFE variables (filter time-constants, filter gains, and

tap weights) need to be optimized to obtain the best resulting metrics.

Two properties of the equalized signal are used here as metrics: vertical eye opening

and jitter. Specifically, the peak-to-peak jitter including only deterministic jitter caused

by ISI is used. Vertical eye opening is measured as the eye opening at the sampling

point (rising edge) of a recovered clock. The falling edges of the clock are aligned to the

point where h−0.5 = h+0.5 on the channel pulse response. This corresponds to the median

zero-crossing instances as would arise in a CDR using an Alexander bang-bang phase

detector [2] as shown in Fig. 2.4.

A cost function is defined here combining both vertical and horizontal eye opening.

Fig. 2.5 shows the channel pulse response (in grey) and the pulse response of the IIR

filters in the DFE as well as the DT DFE taps (in black). At the rising edges of the clock,

the difference between the two pulse responses is ed(i). This error affects the vertical eye

opening at the data sampling point. At the falling edges of the clock (zero crossings of

data), the difference between the pulse responses is ee(i). The cost function in (2.1) is a

Figure 2.4: Simulated eye diagram with clock alignment and jitter histogram, showing the vertical eye opening and the peak-to-peak jitter.


sum of the two errors.

$$\text{Cost} = \sum_{i=1}^{Q} |e_d(i)| + \sum_{i=1}^{Q} |e_e(i)| \qquad (2.1)$$

This expression captures the sum of the absolute values of the residual ISI sampled

at 0.5 UI intervals. This metric is used instead of the mean-squared error because it is a

better measure of the eye opening at very low BER since summing the absolute values

considers the worst-case ISI rather than the average ISI power. Note that, to limit the

optimization’s numerical complexity, only Q post-cursor ISI contributors are considered

in the cost function (2.1). The value of Q can be determined by observing how many

post-cursor ISI terms contribute significantly to the channel response. Another reasonable

upper bound on Q would be the largest number of consecutive identical digits the

system needs to support, since the predominantly positive ISI terms will all superimpose

constructively only when all of the Q preceding bits are identical. A value of Q=120 was

used in the remainder of this chapter.
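The cost function (2.1) can be written as a short routine. The sketch below assumes that the channel pulse response and the DFE feedback response have already been sampled at the data instants and at the edge instants; the argument names and the example values (with Q = 4) are hypothetical.

```python
def dfe_cost(channel_data, fb_data, channel_edge, fb_edge):
    """Sum of absolute residual ISI at data and edge sampling instants, as in (2.1)."""
    e_d = [c - f for c, f in zip(channel_data, fb_data)]   # errors at rising clock edges
    e_e = [c - f for c, f in zip(channel_edge, fb_edge)]   # errors at falling clock edges
    return sum(abs(e) for e in e_d) + sum(abs(e) for e in e_e)


# Hypothetical samples for Q = 4 post-cursor terms.
channel_data = [0.30, 0.20, 0.10, 0.05]
fb_data = [0.28, 0.21, 0.10, 0.04]
channel_edge = [0.40, 0.25, 0.15, 0.07]
fb_edge = [0.35, 0.25, 0.13, 0.08]

cost = dfe_cost(channel_data, fb_data, channel_edge, fb_edge)
print(f"cost = {cost:.2f}")
```

Because absolute values are summed, positive and negative residuals cannot cancel each other, which is what makes the metric track the worst-case data pattern.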

Naturally, depending on the frequency response of the channel, the amount of ISI will

vary. This in turn could lead to different optimal DFE architectures for each channel.

Two channels have been considered here for operation at 10 Gb/s: a 50 meter coax

Figure 2.5: Channel and DFE feedback pulse response (channel pulse response and the combined response of the IIR filters and DFE taps, with the errors ed(1) and ee(1) marked relative to CLK).


cable with 26 dB attenuation at 5 GHz and a 16” FR-4 backplane channel with 17 dB

attenuation at 5 GHz and the insertion loss shown in Fig. 2.6. In both cases, no linear

equalization is assumed. For each of the channels and each DFE architecture, K ∈ {0, ..., 10}

and N ∈ {0, ..., 4}, the DFE parameters Gk, BN, and τN are optimized to minimize the cost

function (2.1) at a 10 Gb/s data rate. A constrained nonlinear minimization is performed

using MATLAB’s fmincon function [38] to determine the optimal coefficients for each

DFE. The computation of DFE coefficients offline is practical only when the channel is

fixed and known a priori.
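The structure of this offline optimization can be illustrated with a much simpler stand-in. The sketch below does not use fmincon or the measured channels: it fits a single IIR tap (gain and time constant) to a synthetic exponential tail by coarse grid search, minimizing the sum of absolute residuals at 0.5 UI spacing. The synthetic channel, the grid ranges, and Q = 20 are all assumptions made for illustration.

```python
import math

Q = 20  # post-cursor UIs considered here (hypothetical; the text uses Q = 120)


def tail(gain, tau, t):
    """First-order exponential tail, gain * exp(-t / tau), with t and tau in UI."""
    return gain * math.exp(-t / tau)


# Sampling instants at 0.5 UI spacing: 0.5, 1.0, ..., Q.
times = [0.5 * i for i in range(1, 2 * Q + 1)]

# Synthetic "channel" tail standing in for a measured pulse response.
channel = [tail(0.25, 3.0, t) for t in times]


def cost(gain, tau):
    """Sum of absolute residual ISI over all data and edge instants."""
    return sum(abs(c - tail(gain, tau, t)) for c, t in zip(channel, times))


def grid_search():
    """Exhaustive search over a coarse (gain, tau) grid; returns (cost, gain, tau)."""
    best = None
    for i in range(1, 11):        # gain = 0.05, 0.10, ..., 0.50
        for j in range(2, 13):    # tau = 1.0, 1.5, ..., 6.0 UI
            gain, tau = i / 20, j / 2
            c = cost(gain, tau)
            if best is None or c < best[0]:
                best = (c, gain, tau)
    return best


best_cost, best_gain, best_tau = grid_search()
print(best_cost, best_gain, best_tau)
```

A gradient-based constrained solver replaces the grid search when several taps are optimized jointly, since the grid grows exponentially with the number of parameters.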

Fig. 2.7 shows the resulting equalized eye diagrams for the 16” FR-4 backplane

channel. For each DFE configuration, the system is simulated and the vertical eye opening

and jitter are measured. In Fig. 2.7, the first row corresponds to having a conventional DT

DFE with various numbers of taps (i.e. N=0, K=0..10). Each subsequent row adds one

IIR filter tap to the architecture. The measurements from each eye diagram are compiled

together in Fig. 2.8.

Figure 2.6: Measured channel insertion loss (dB) versus frequency (GHz) for the 16” FR-4 backplane and the 50 m coax cable.


Figure 2.7: Simulated eye diagrams for various DFE Architectures

Figure 2.8: Simulated DFE architectures at 10 Gb/s (FR-4 backplane channel). (a) FR-4 backplane (16”) vertical eye opening (%) and (b) FR-4 backplane (16”) peak-to-peak jitter (UI), each plotted versus the number of discrete-time DFE taps, K, for N = 0 to 4 IIR filters.


Fig. 2.8a and 2.8b show the results for the 16” FR-4 backplane channel. Each curve

corresponds to a certain number of IIR filters while the x-axis refers to the number of

DT DFE taps, K, and the y-axes plot the two metrics. It can be seen that by increasing

the number of IIR filters in the DFE, N, there is a drastic improvement in both quality

metrics. While three or more IIR filters provide the best result, two IIR filters provide

a nearly optimal architecture while keeping system complexity low. It is also evident that

adding additional DT DFE taps to the system results in minuscule improvements and

can be avoided to reduce complexity.

Fig. 2.9a and 2.9b summarize similar results for the 50 meter coax channel. Again

it is evident that increasing the number of IIR filters and DT DFE taps improves performance.

Once again, trying to minimize the system complexity while considering the

quality metrics, it is evident that two IIR filters provide a nearly optimal architecture.

It should be noted that looking only at the jitter performance, Fig. 2.8b, it may seem

that the performance degrades when increasing the number of DT taps, for example with

N=1 and K increasing from 0 to 1. However, the eye opening, Fig. 2.8a, has increased,

so the overall cost function improves. Table 2.1 shows the degradation in system performance

when all the DFE coefficients vary from their optimal value for both the 50 meter

coax cable and the 16” FR-4 backplane channel (K=0, N=2). To determine the DFE

architecture to use, factors other than vertical and horizontal eye opening need to be

considered (i.e. system complexity, area, power consumption, etc). While for both of the

channels analyzed, two IIR filters would provide a near optimal performance (in terms

of vertical and horizontal eye opening), in this chapter, adapting the coefficients is not

considered. In chapter 4, only 1 IIR tap is used in the DFE to simplify the adaptation

algorithm implementation.

It is interesting to compare the architecture having K=0, N=2 with one having K=1,

N=1. Both can be thought of as having similar hardware complexity since both require

two taps into the DFE summing node (although, of course an additional passive filter


Figure 2.9: Simulated DFE architectures at 10 Gb/s (coax cable). (a) Coax cable (50m) vertical eye opening (%) and (b) Coax cable (50m) peak-to-peak jitter (UI), each plotted versus the number of discrete-time DFE taps, K, for N = 0 to 4 IIR filters.

Table 2.1: Reduction in vertical eye opening and increase in peak-to-peak jitter for variations in DFE coefficients

Coefficient Variation | 50 Meter Coax Cable: Vertical Eye Reduction | 50 Meter Coax Cable: Jitter Increase | 16” FR-4 Backplane: Vertical Eye Reduction | 16” FR-4 Backplane: Jitter Increase
-20% | 44 % | 0.48 UI | 26.6 % | 0.1 UI
-10% | 9.8 % | 0.06 UI | 11.8 % | 0.013 UI
+10% | 10.1 % | 0.25 UI | 12.2 % | 0.08 UI
+20% | 45 % | 0.56 UI | 28.4 % | 0.17 UI


is required when N=2). With K=0, N=2, Figs. 2.8-2.9 illustrate a significant potential

improvement in performance. However, at the highest data rates, it may become difficult

to cancel the first post-cursor ISI using only IIR filter taps due to additional delays in

the feedback path associated with the IIR filter. Using a DT tap can alleviate the timing

issues as discussed in section 2.2.

The adaptation of IIR filters has been studied in the literature [24, 39]. Unlike DT

DFEs, adapting the poles of IIR filters leads to local minima in a mean squared error

cost function which creates challenges for any gradient descent adaptation algorithm,

including the popular least mean square algorithm [40]. Solutions have been proposed

to guide the adaptation of such systems [41]. Implementation of such algorithms will

naturally rely upon some digital signal processing which certainly factors into the circuit’s

complexity. An adaptation algorithm for a system with K=1 & N=1 is presented in

chapter 4.

2.2. IIR DFE Performance Analysis

The benefit of an IIR DFE is that a single tap can cancel many UI of post-cursor ISI.

However, the performance of an IIR-DFE varies with loop delay, even for loop delays less

than 1 UI, whereas a DT DFE remains effective as long as the feedback loop delay is less

than 1 UI. To illustrate this, Fig. 2.10A shows a 1 discrete-tap DFE with loop delay ∆

which models the delay through the flip-flop, feedback gain path and summer. It can be

seen that as the loop delay (∆) increases, the retimed data (VDT ) is shifted by the amount

of the delay. However, the 1st post-cursor ISI term can still be canceled as long as the

delay is less than 1 UI. Fig. 2.10B shows the same plot for the 1 IIR DFE architecture.

As the loop delay is increased, the IIR gain and time constant are readjusted to best fit

the shape of the pulse response. However, it is evident that as the delay increases, the

amount of residual uncanceled 1st post-cursor ISI increases. Therefore, even for feedback

loop delays less than 1 UI, the performance of the receiver will degrade significantly. It


should be noted that the rising slope of the IIR filter output (VIIR), which is 1/τ1 × B1,

is limited by the time-constant (τ1) that is selected so that the decaying tail of the pulse

response at VIIR matches that of the channel. Any additional loop delay (∆) decreases

the cancellation which the IIR DFE tap can provide for the first post-cursor ISI; the

reduction is given by 1/τ1 × B1 × ∆. The performance degrades regardless

of how the loop delay is split between the flip-flop, summer, and gain path since this

feedback path is linear.
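One way to see where a term of the form 1/τ1 × B1 × ∆ comes from is to assume that the IIR tap output rises as a first-order step response, B1(1 − e^(−t/τ1)), starting ∆ after the decision, and to evaluate it one UI later. This model, the sampling instant, and the numerical values below are assumptions for illustration and are not the simulation setup behind Fig. 2.10.

```python
import math


def iir_step(b1, tau1, t):
    """Assumed first-order step response of the IIR tap (zero before t = 0)."""
    return b1 * (1.0 - math.exp(-t / tau1)) if t > 0 else 0.0


def cancellation_loss(b1, tau1, delay, t_sample=1.0):
    """Drop in feedback amplitude at t_sample (UI) when the response starts `delay` UI late."""
    return iir_step(b1, tau1, t_sample) - iir_step(b1, tau1, t_sample - delay)


B1, TAU1 = 0.5, 5.0   # hypothetical gain and time constant (in UI)
for delay in (0.0, 0.2, 0.4, 0.6, 0.8):
    exact = cancellation_loss(B1, TAU1, delay)
    linear = B1 / TAU1 * delay          # slope x delay approximation from the text
    print(f"delay {delay:.1f} UI: exact {exact:.4f}, slope x delay {linear:.4f}")
```

Under this model the slope-times-delay expression is an upper estimate of the lost cancellation, and it becomes tighter as τ1 grows relative to one UI.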

Fig. 2.11A shows a simulated bathtub curve for a 10 discrete-tap DFE and it can

be seen that the loop delay does not affect the horizontal eye opening. By contrast, the

loop delay is much more critical in an IIR DFE. Fig. 2.11B shows the bathtub curve

for an IIR DFE with increasing amounts of delay in the critical path and it is evident

that the performance is heavily dependent on the loop delay. To minimize the effect of

the delay in the critical timing path, a single DT tap can be dedicated to the

cancellation of the first post-cursor ISI term. With this tap, the simulated horizontal eye

opening, shown in Fig. 2.11C, remains similar for varying loop delays because the first

post-cursor ISI is effectively canceled. This approach has been implemented in [9], [23]

and although the sensitivity to delay variations is addressed with the discrete-tap, since

only one IIR filter is used there is limited freedom to shape the DFE response. The block

diagram for [9] is shown in Fig. 2.12A. In [9] the design requires a sample and hold in

order to avoid the 3.92dB loss at half the bitrate due to the integrating latch, which can

be difficult to achieve at high speeds.

Using 2 IIR DFE taps provides a significant improvement over 1 IIR filter [42] as seen

in Fig. 2.11B vs. Fig. 2.11D for a 32” backplane channel with ∼20dB of loss at one-half

the bitrate. In [25] 2-IIR DFE taps are implemented with two separate feedback paths

to minimize feedback loop delay, as shown in Fig. 2.12B. The additional feedback path

necessitates a second 2:1 multiplexer operating at the full data rate and consuming extra

power. Even so, because there is no DT tap in that work, the architecture’s performance


Figure 2.10: (A) A 1 discrete-tap DFE with varying loop delay (∆ = 0, 0.2, 0.4, 0.6, and 0.8 UI) and the channel pulse response. (B) A 1 IIR DFE with varying loop delay and the channel and resulting equalized pulse responses.

remains sensitive to latch clock-to-output delay which is in turn sensitive to VDD and

process variations. Post layout simulations in a 28nm LP CMOS technology show that

a 10% decrease in VDD results in a ∼0.2UI increase in latch clock-to-output delay at

10Gb/s, shown in Fig. 2.11F. (Delays are normalized to a VDD of 1V at the typical

(TT) corner.) Process variations can also cause significant increases in the latch delay.

Increasing latch-delay by 0.2UI reduces the eye opening anywhere from 0.1UI to 0.3UI

in a single tap IIR DFE without a discrete-tap for a ∼20dB loss channel (Fig. 2.11B).

Under identical operating conditions, the 2-IIR DFE is not only more sensitive to

loop delay than the 2-IIR + 1 DT DFE, it is also more sensitive to coefficient variations.

Fig. 2.13 directly compares the 2-IIR DFE architecture (Fig. 2.13A) with a 2-IIR +

1-DT DFE (Fig. 2.13B). The black curve shows the bathtub curve for VDD = 1V and

T = 25°C, where the loop delay is conservatively chosen to be 0.5UI. As the VDD drops

to 0.9V and the temperature drops to −40°C (blue curve), the loop delay increases by


Figure 2.11: (A) - (E) Simulated bathtub curves (BER vs. sampling phase in UI) with various latch delays for different DFE architectures for a 32” backplane channel having ∼20dB loss at one-half the bit rate: (A) 10 discrete-tap DFE, (B) IIR DFE, (C) IIR + 1 discrete-tap DFE, (D) 2-IIR DFE, (E) 2 IIR + 1 discrete-tap DFE. (F) Post-layout simulations of latch-delay increase (UI) as a function of process corner (SS, TT, FF) and VDD decrease (%) in a 28nm CMOS technology.


Figure 2.12: (A) Block diagram of a 1 IIR + 1 discrete-tap DFE [9]. (B) Block diagram of a 2 IIR DFE [25].


0.2UI based on post-layout extracted simulations of the latch. Under this condition, the

2-IIR DFE eye is completely closed, whereas the 2-IIR + 1 DT DFE shows only a minor

degradation. Finally, the red curve shows both systems at the reduced VDD and

temperature, but with the DFE coefficients re-optimized for minimum post-cursor ISI.

The eye opening of the 2-IIR DFE is only partially restored,

whereas the 2-IIR + 1 DT DFE is completely restored to the original eye opening,

once again showing its insensitivity to loop delay. This analysis also shows that the

2-IIR + 1-DT DFE is less sensitive to coefficient variations since, even without coefficient

re-adjustment, the bathtub curve is still open.

The remainder of this chapter describes the first work to combine the benefits of

2-IIR DFE taps plus one DT DFE tap [43], [44]. The two IIR DFE taps cancel the

long tail of the channel pulse response better than one tap can, and the DT DFE tap

makes its performance insensitive to latch timing delays (Fig. 2.11E). Moreover, unlike

past work, the proposed design is implemented in a low-power (LP) CMOS process

suitable for devices requiring low standby power, but where in general it can be difficult to

realize the high gain-bandwidth product required for analog equalization. The proposed

DFE implementation relies only upon dynamic logic, which also contributes to the low power

consumption. The entirely dynamic-logic DFE allows its power consumption to scale

linearly with the bit rate and facilitates porting between CMOS technologies.

2.3. Proposed 2-IIR DFE + 1-DT Receiver

Fig. 2.14 shows a block diagram of the proposed half-rate receiver. The front-end com-

prises a passive equalizer and preamplifier. The passive linear equalizer can be disabled

to compare different methods of equalization. Dynamic logic is used throughout. Un-

like [9], a current integrating latch is not used which would require a sample and hold

to avoid the 3.92dB loss at one-half the bitrate. The 1 DT plus 2-IIR DFE taps all feed

directly into latch inputs. A single 2:1 multiplexer, shown in Fig. 2.17 and followed by


Figure 2.13: (A) Simulated bathtub curves (BER vs. sampling phase in UI) for a 2-IIR DFE with VDD and temperature changes: VDD = 1V, T = 25°C; VDD = 0.9V, T = −40°C; and VDD = 0.9V, T = −40°C with the coefficients re-adjusted to compensate for the change in circuit performance. (B) The same simulations and conditions as (A) but for a 2-IIR DFE with 1 discrete tap.

cross-coupled buffers, is used to drive both IIR filters. By contrast, [25] used two separate

2:1 multiplexers to minimize the loop delay for the fast IIR filter, while allowing more

settling time for the second IIR filter. In this work, the architecture includes a DT tap

making the performance relatively insensitive to small variations in loop delay and obvi-

ating the need for the additional multiplexer. The half-rate architecture necessitates the

use of a 2:1 multiplexer since the “memory” elements in the analog IIR filters are capac-

itors which must be exposed to every recovered bit in sequence in order to produce the

correct analog feedback waveform. Therefore, the system needs to have some mechanism

to multiplex the half-rate data and generate the original full-rate data pattern.
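The re-multiplexing requirement can be sketched behaviorally: the even and odd half-rate decisions are interleaved so that the IIR filters see every recovered bit in its original order. The function name and the convention that the even path holds the first bit are assumptions for illustration only.

```python
def remultiplex(even_bits, odd_bits):
    """Interleave half-rate even/odd decisions into the full-rate bit sequence."""
    if len(even_bits) != len(odd_bits):
        raise ValueError("even and odd streams must have equal length")
    full_rate = []
    for even, odd in zip(even_bits, odd_bits):
        # One even decision followed by one odd decision per clock period.
        full_rate.extend((even, odd))
    return full_rate
```

For example, even decisions [1, 0, 1] and odd decisions [0, 0, 1] give the full-rate pattern [1, 0, 0, 0, 1, 1], which is what the IIR filter capacitors must be driven with.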

2.3.1. Input Stage

The input termination consists of two 50Ω resistors with a large capacitor connected

in the middle to ground. Fig. 2.15 shows the CMOS inverter with resistive feedback

used as a pre-amp. At the input, C1 and R1 provide attenuation at low-frequencies


Figure 2.14: Block diagram of the receiver.

creating a relative boost at high-frequencies. The boost can be turned off by activating

the transmission gate which shorts out C1 and R1. The input resistance to the pre-amp

is designed to be more than 10× larger than the required 50Ω termination resistance to

minimize its impact on the matching network. The input common-mode is also set by

the pre-amp assuming the incoming data is AC coupled off-chip.

Figure 2.15: Input termination, passive equalizer with disable, and preamplifier. (All widths in µm; L = 30nm.)


2.3.2. Summing and Latches

The DFE subtraction is directly performed inside the latch to reduce the feedback loop

delay. Fig. 2.16 shows a double-tail latch implementation [45] with three additional

differential inputs subtracting the DFE feedback signals: one for the discrete-tap and

two for the IIR taps. For each DFE feedback input, three binary-weighted transistor

pairs sized 1×, 2× and 4× relative to the input pair can be selectively enabled to set the

tap gains. In this work, the enable transistors are placed closer to the output to reduce

the coupling from the fed-back data in the DFE (DODDp, DODDn, IIRp, IIRn, etc.) to

the latch summing node. The polarity of the subtraction is fixed under the assumption

that the channel behavior is low pass and the post-cursor ISI will always be positive.

A pair of transistors is introduced in parallel with the input pair to allow for offset

compensation in the latch by adjusting Voffp & Voffn as shown in Fig. 2.16. The offset

compensation transistor sizes were determined by post-layout Monte Carlo simulations

and were set to ensure the DC offset can be compensated well beyond 3σ.
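As a small illustration of the tap-gain control described above, the three binary-weighted pairs (1×, 2×, 4×) give eight relative tap strengths, from 0 to 7 times the unit pair. The function below is a hypothetical helper, and the bit ordering (least significant bit first) is an assumption.

```python
def tap_weight(enable_bits):
    """Relative tap strength for enable bits (b0, b1, b2) gating the 1x, 2x, 4x pairs."""
    weights = (1, 2, 4)
    return sum(w for w, bit in zip(weights, enable_bits) if bit)

# Enumerate all 3-bit enable codes and collect the reachable tap strengths.
all_codes = sorted(
    tap_weight(((code >> 0) & 1, (code >> 1) & 1, (code >> 2) & 1))
    for code in range(8)
)
```

Enabling only the 2× and 4× pairs, for instance, yields a relative strength of 6.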

Figure 2.16: Double-tail latch with DFE subtraction directly performed inside the latch. The subtraction for IIR2, not shown, is identical to and in parallel with IIR1. (All widths in µm; L = 30nm.)


2.3.3. Data re-multiplexing & IIR Filters

Two single-ended 2:1 multiplexers choose between each of the even and odd inputs and are

followed by cross-coupled buffers. The implementation of the 2:1 differential multiplexer

is shown in Fig. 2.17. The clock is placed closer to the output of the multiplexer to

provide a shorter clock-to-output delay. The clock edges are aligned midway between

data transitions to ensure the data is stable while selected by the multiplexer.

The IIR filter time constants can be adjusted to fit the DFE response to that of

the channel as shown in Fig. 2.18. The two IIR filters have time constants an order of

magnitude apart; hence, one is intended primarily to cancel the first 6 UI of post-cursor

ISI while the other is primarily intended to cancel ISI that persists for more than 6 UI

beyond the main symbol sample. The higher bandwidth filter, IIR1, can be adjusted

from 200MHz to 3.2GHz while the lower bandwidth filter, IIR2, can be adjusted

from 20MHz to 320MHz. Fig. 2.18A shows the IIR filter with a faster time constant

(IIR1), which can be adjusted with 3 binary-weighted switched capacitors as well as a

varactor. Since the DFE performance is more sensitive to the first few large post-cursor

ISI contributors, having the varactor allows for finer tuning of the fast time constant

to better match this critical portion of the pulse response. The tuning range of the

varactor was designed to be greater than the 50fF LSB capacitor to ensure the entire

range 200MHz to 3.2GHz can be covered. The filter IIR2, shown in Fig. 2.18B, has a 4-

bit binary-weighted switched capacitor bank for tuning its time constant, but no varactor

since the accuracy of this time constant is not as critical. The time constant of the IIR2

filter only needs to roughly match the long tail of the response to adequately cancel the

remaining post-cursor ISI. Any process variations causing a change in the resistance or

capacitance values can be compensated for by adjusting the filter setting.
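The quoted tuning ranges can be related to component values with the first-order relation f = 1/(2πRC). The sketch below assumes each IIR filter behaves as a single-pole RC section and assumes that the 2kΩ resistor labeled in Fig. 2.18 belongs to IIR1 and the 20kΩ resistor to IIR2. That assignment is inferred from the bandwidth ranges rather than stated in the text.

```python
import math

def rc_corner_hz(r_ohm, c_farad):
    """Corner frequency of a single-pole RC section."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

def cap_for_corner(r_ohm, f_hz):
    """Total capacitance needed to place the RC corner at f_hz."""
    return 1.0 / (2.0 * math.pi * r_ohm * f_hz)

R_IIR1 = 2e3    # ohms, assumed assignment of the 2 kOhm label
R_IIR2 = 20e3   # ohms, assumed assignment of the 20 kOhm label

# (minimum, maximum) total capacitance implied by the stated bandwidth ranges.
iir1_caps = (cap_for_corner(R_IIR1, 3.2e9), cap_for_corner(R_IIR1, 200e6))
iir2_caps = (cap_for_corner(R_IIR2, 320e6), cap_for_corner(R_IIR2, 20e6))
```

Under these assumptions both filters need roughly 25fF to 400fF of total capacitance, which is of the same order as the switched capacitor banks labeled in Fig. 2.18 plus a small fixed or varactor capacitance.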


Figure 2.17: Two dynamic 2:1 multiplexers are used to create a differential 2:1 multiplexer. (All widths in µm; L = 30nm.)

Figure 2.18: IIR filters created using a resistor and switched capacitor circuits. The faster time constant IIR1 (A) includes a varactor to allow for finer tuning. Labeled component values: resistors of 2kΩ and 20kΩ; a three-capacitor bank with C0 = 50 fF, C1 = 100 fF, C2 = 200 fF; and a four-capacitor bank with C0 = 25 fF, C1 = 50 fF, C2 = 100 fF, C3 = 200 fF. (All widths in µm; L = 30nm.)


2.3.4. Clocking and Output Buffers

Two injection locked oscillators (ILOs) are included on-chip to sweep the input half-rate

clock phase for BER bathtub curve measurements. Providing two variable clock delays

allows for independent control of the clock phases applied to the latches and to the 2:1

multiplexer. A block diagram of the clocking circuits is shown in Fig. 2.19A. ILO1 is

tuned to provide an adjustable phase shift covering 1UI at data rates of 10-12Gb/s [46].

For testing at 7-10Gb/s, additional delay tuning was required. Hence, additional tunable

delay cells were included at the input prior to the ILO1 ring. Incorporating the additional

delay stages within the ring would not have been practical since that would reduce the

frequency lock range of the ILO [47] and thereby limit the achievable phase shifts. Placing

the delay stages prior to the ILO ring allows the ring to clean up any duty cycle distortion

introduced by the delay stages before the clock is applied to the DFE latches. However,

if the additional delay stages had been placed at the output of the ILO they could have

acted as both delay cells and clock buffers, which would have saved power.

The second ILO (ILO2) adjusts the clock delay between the multiplexer sampling

clock and the latch output. This delay is used to account for the clock-to-Q delay of the

latches, and hence requires only fine tuning. Nevertheless, the same wide tuning range

ILO used for ILO1 was reused for ILO2 to reduce design and verification time, while

ensuring the two ILOs have overlapping lock ranges. If a separate narrow-tuning range

ILO had been designed for ILO2, power savings may have been realized. It is generally

desirable to make the clock-to-Q latch delay, and hence the phase shift through ILO2,

small; otherwise the IIR tap may not settle prior to the second post-cursor. In that case,

the same problem described in Fig. 2.10 would then apply to canceling the 2nd post-

cursor ISI term. The delay stage schematic is shown in Fig. 2.19B and allows for tuning

the delay by varying the voltage across the pull-up PMOS device [48].

Both ILOs and all clock buffers consume 18.5mW to 35.4mW depending on the ILO

control voltages. As described above, the power consumption of the clocking was not


Figure 2.19: (A) ILO1 and ILO2 block diagram showing the delay cells and the ring ILO used. (B) ILO delay cell schematic. (All widths in µm; L = 30nm.)


optimized and hence the ILOs and clock buffers are connected to a separate supply

voltage and not included in the receiver’s power consumption. This is consistent with

the other works cited in Fig. 1.4 and Fig. 2.28 except for [16].

2.3.5. Measurement Results

The chip die photo along with an area breakdown is shown in Fig. 2.20. The measure-

ment setup is shown in Fig. 2.21. A Centellax TG1B1-A PRBS/BERT unit is used to

provide PRBS data to the chip and measure BER. A pair of broadband attenuators are

used in conjunction with the swing adjustment available from the source to obtain the

desired input data swing levels. A Centellax TG1C1A provides a half-rate, 5GHz, clock

to the DUT. The 10 GHz clock required as an input to the BERT/PRBS is from an

Agilent 83732B Signal Generator. The BERT clock and the 5GHz clock for the DUT are

synchronized using 10MHz reference ports in the test equipment. A PC sets the DFE

coefficients, latch offset cancellation, and the phase of the clocks using the on-chip ILOs

and obtains BER information from the BERT to create a bathtub curve, all via a USB

interface to an off-chip microcontroller, and from there through a serial interface integrated

onto the DUT.

To obtain the most accurate information regarding the ILOs’ tuning range, the ILOs

Figure 2.20: Prototype 28nm-LP CMOS die photo (0.9mm × 0.9mm) with area breakdown:

Block   Description   Area (µm²)
A       Pre-Amp       2,030
B       DFE Core      1,620
C       IIR Filters   3,240
D       ILO 1           802
E       ILO 2         1,070
        Total         8,762


Figure 2.21: Measurement setup.

were characterized in situ using the data path. The configuration for this measurement

is shown in Fig. 2.22A. The input was removed from the system and the digital offset

controls shown in Fig. 2.16 were set to their maximum values. This ensures that the

even path in the half-rate receiver always outputs a logic one and the odd path would

always output a logic zero. This leads to an oscillating data pattern at the output of the

2:1 multiplexer which is transmitted off chip. The ILO voltage was then set to zero and

the output phase was recorded as a baseline. As the ILO control voltage was increased,

the difference in phase was measured as shown in Fig. 2.22B. Once ILO1 was completely

characterized, its control voltage was set to 0 and the second ILO was characterized using

the same approach. The measured ILO delay vs. control voltage is shown in Fig. 2.23

for 10Gb/s and 8Gb/s clocks.

Calibration of the offset in each of the odd/even latches is performed as shown in Fig.

2.24A. First, the even path digital offset control is set to its maximum value, so that the

even path output is always a logic one. The digital offset control in the odd path (VA) is

then adjusted and the output (DOUT ) is observed on an oscilloscope. Starting the offset

control at a high voltage ensures the odd path output is always a logic one, and therefore

DOUT = 111...11 as shown in Fig. 2.24B. The offset control VA is then decreased one

LSB at a time until the output begins to sometimes switch to a logic zero as shown in Fig


Figure 2.22: (A) ILO measurement setup by setting offset voltages to their maximum and minimum values. (B) Clock pattern generated at the output is measured to determine phase shift introduced by the ILOs.

Figure 2.23: ILO tuning characteristics (measured phase shift in UI vs. ILO control voltage) for both ILO1 (latch clock) and ILO2 (MUX clock) at 10Gb/s and 8Gb/s.


2.24C,D. Finally VA is decreased until the output completely switches between a logic

one and a logic zero leading to the pattern DOUT = 10101...01. The offset voltage for the

odd path is set to the point where the odd path output is low approximately one-half of

the time. The same approach is used to characterize the offset for the even path.
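The calibration procedure above can be sketched as a simple search. The latch is modeled here as a noisy comparator between the applied offset code and an unknown internal offset expressed in LSBs. This model, the Gaussian noise level, and all names are assumptions for illustration and do not describe the measured hardware.

```python
import random

def odd_path_low_fraction(code, true_offset_lsb, noise_lsb, n_bits, rng):
    """Fraction of odd-path decisions that resolve to logic zero at a given code."""
    lows = 0
    for _ in range(n_bits):
        # Hypothetical latch model: output is low when code plus noise
        # falls below the latch's internal offset.
        if code + rng.gauss(0.0, noise_lsb) < true_offset_lsb:
            lows += 1
    return lows / n_bits

def calibrate_offset(true_offset_lsb, max_code=31, noise_lsb=0.5,
                     n_bits=2000, seed=1):
    """Step the offset code down from its maximum, one LSB at a time, and stop
    at the first code where the odd path is low at least half of the time."""
    rng = random.Random(seed)
    for code in range(max_code, -1, -1):
        fraction = odd_path_low_fraction(code, true_offset_lsb,
                                         noise_lsb, n_bits, rng)
        if fraction >= 0.5:
            return code
    return 0
```

With a hypothetical internal offset of 12.3 LSB, the search stops at code 12, the first code at which the output is low for more than half of the observed bits.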

The measured frequency responses of the channels used for DFE characterization are

shown in Fig. 2.25A and B: a 6 meter coax channel and a 34” backplane channel,

respectively. The plots also contain the simulated losses of the characterization PCB and

the QFN package based on the model shown in Fig. 2.25E. A 25mm PCB trace is used on

the characterization board to connect SMA connectors to the QFN package housing the

prototype. An approximately 2.5mm bondwire is used to connect the package pads to the

Figure 2.24: (A) Latch offset calibration setup to compensate for offset on the odd path. (B) Pattern generated at the output (DOUT on oscilloscope vs. time) is measured to determine the amount of offset present in the latch. The remaining panels are labeled with offset control settings VA = 20mV, 15mV, 10mV, and 5mV.


die, modeled as a 2.5nH inductance, and a 70fF pad capacitance based on post-layout

extraction. The losses of the characterization board are ∼2.5dB at 5GHz. The pulse

responses for the coax and backplane channels are shown in Fig. 2.25C, D, respectively.

Fig. 2.26 shows an eye diagram at the output of the chip and Fig. 2.27 C, F show

measured eye diagrams at the output of the channel at two different amplitudes. With

Figure 2.25: (A, B) Channel insertion loss (dB) vs. frequency (GHz), including the receiver characterization PCB and QFN package loss. The coax and backplane channel losses are dominated by skin effect and dielectric loss, respectively. Annotated losses: 2.5dB, 16.5dB, and 19dB for the coax channel; 2.5dB, 21.5dB, and 24dB for the backplane channel. (C, D) Normalized channel pulse response vs. time (UI) at 10Gb/s for the coax and backplane channels, respectively. (E) Model for the characterization PCB + QFN package (transmission line, bondwire inductance, and pad capacitances).


Figure 2.26: Full-rate retimed data output from the chip at 10Gb/s (scale markers: 200 mV-diff, 20 ps).

the passive equalizer disabled, the DFE can successfully equalize a signal launched with

a swing of only 150mVpp differential (mVpp-diff) and transmitted over a backplane

channel with 24 dB attenuation, or a 19 dB-loss coax cable driven single-endedly with

only 75mVpp swing.

In these measurements, the DFE coefficients are not adapted automatically, but rather

Figure 2.27: (A, B) Bathtub curves at 10Gb/s using the passive EQ only for various TX swings (coax channel: 75mV, 125mV, 175mV, 200mV; backplane channel: 150mV, 350mV, 450mV, 900mV, 1200mV). (C) Input to the receiver when the passive equalizer is used to equalize the signal with a swing of 1200 mVpp-diff. (D, E) Bathtub curves at 10Gb/s for various DFE settings (no DFE, 1 discrete tap, 1 IIR + DT, 2 IIR + DT) with the passive EQ disabled, at TX swings of 75 mVpp single-ended (coax channel) and 150 mVpp-diff (backplane channel). Annotated DFE settings: DT Gain = 4, IIR1 Gain = 1, IIR1 BW = 4; and DT Gain = 6, IIR1 Gain = 1, IIR1 BW = 2, IIR2 Gain = 1, IIR2 BW = 10. (F) Eye diagram at the channel output and input to the receiver with a swing of 150 mVpp-diff when the DFE is used for equalization.


manually adjusted. Finding the DFE coefficients requires iteration since the coefficients

are not independent. The discrete-tap is adjusted to lower the BER, then the gain and

bandwidth of IIR2, and finally IIR1 are adjusted. The process is repeated a few times

to improve the eye-opening on the bathtub curve. Adaptation of the gains in an IIR

DFE can be performed using the LMS algorithm, similar to discrete-taps, because as

long as the IIR time-constants remain fixed the feedback filter is simply an adaptive

linear combiner. When the IIR time constants are also to be adapted, adaptation is

more difficult since a multi-modal performance surface arises. IIR DFEs that include

adaptation are presented in [24] and [49]; adaptation of IIR DFEs is the topic of Chapter 4.
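The statement that fixed time constants reduce the feedback filter to an adaptive linear combiner can be illustrated with a generic LMS sketch. It is not the adaptation used in this work (the coefficients here were adjusted manually). The continuous-time IIR sections are replaced by discrete-time first-order sections with fixed per-UI decay factors, and all names and values are hypothetical.

```python
import random

def lms_identify_gains(true_gains, poles, mu=0.5, n_bits=20000, seed=0):
    """Adapt only the gains of fixed first-order sections with the LMS rule."""
    rng = random.Random(seed)
    states = [0.0 for _ in poles]     # outputs of the fixed IIR sections
    weights = [0.0 for _ in poles]    # adapted gains
    prev_bit = 0.0
    for _ in range(n_bits):
        # Fixed-pole sections driven by the previous decision (+1 or -1).
        states = [a * s + (1.0 - a) * prev_bit for a, s in zip(poles, states)]
        # Post-cursor ISI to be cancelled, generated by the "true" gains.
        isi = sum(g * s for g, s in zip(true_gains, states))
        # Error between the ISI and the linear combiner output.
        error = isi - sum(w * s for w, s in zip(weights, states))
        # LMS update: each gain moves along error times its own basis signal.
        weights = [w + mu * error * s for w, s in zip(weights, states)]
        prev_bit = rng.choice((-1.0, 1.0))
    return weights

adapted = lms_identify_gains(true_gains=(0.3, 0.1), poles=(0.5, 0.95))
```

Because the section outputs are fixed basis signals, the error is linear in the gains and the noise-free update converges to the true gains. Once the poles are also adapted, the error is no longer linear in the parameters, which is the origin of the multi-modal surface mentioned above.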

Fig. 2.27A, B show measured bathtub curves for the two channels with the receiver

configured using only the passive equalizer (DFE disabled) for various transmit swing

amplitudes. Fig. 2.27D, E show the bathtub curve for the receiver with the DFE en-

abled and passive equalizer disabled at a transmit swing of 75mVpp (single-ended), and

150mVpp-diff (shown in Fig. 2.27F), respectively. For the backplane channel, to obtain

similar horizontal eye openings, the passive equalizer requires an input swing which is

8× higher than using the DFE (1.2Vpp-diff shown in Fig. 2.27C vs. 150mVpp-diff in

Fig. 2.27F). The larger swing is required to compensate for the continuous-time passive

equalizer’s low frequency attenuation of the signal. A continuous-time linear equalizer

with gain could have been used to improve input sensitivity but the additional power

consumption of a continuous-time linear equalizer is expected to be approximately 0.27

pJ/bit [13].

In comparison, the power overhead for the proposed DFE is only that of the 2:1 CMOS

multiplexer, the extra dynamic power of the differential pairs performing subtraction in

the DFE, and the pre-amp all totaling only 0.16 pJ/bit based upon post-layout simula-

tions.

Furthermore, a continuous-time linear equalizer amplifies crosstalk and high frequency

noise whereas the proposed DFE-based receiver does not. The improved receiver sensi-


tivity here can be translated into a minimum of 11mW (1.1 pJ/bit) power savings at the

transmitter assuming a 150mVpp-diff driver instead of 700mVpp-diff ( [9], [25]) over a

doubly-terminated 50-Ohm-per-side link.
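The 11mW figure can be reproduced with a short calculation under two assumptions that are not stated explicitly in the text: the transmitter is a current-mode driver whose tail current is steered into the doubly-terminated load, and it operates from a 1V supply.

```python
def driver_current_a(vpp_diff, r_per_side=50.0):
    """Tail current of a current-mode driver into a doubly-terminated link.

    Each output sees r_per_side / 2 (source and far-end terminations in
    parallel), so the peak-to-peak differential swing is
    2 * I * (r_per_side / 2) = I * r_per_side.
    """
    return vpp_diff / r_per_side

def tx_power_saving_mw(vpp_high, vpp_low, vdd=1.0):
    """Static driver power saved by lowering the swing (assumed supply vdd)."""
    delta_i = driver_current_a(vpp_high) - driver_current_a(vpp_low)
    return delta_i * vdd * 1e3

saving_mw = tx_power_saving_mw(0.700, 0.150)
# Energy per bit at 10 Gb/s: (mW -> W) / (bits per second), expressed in pJ.
energy_pj_per_bit = saving_mw * 1e-3 / 10e9 * 1e12
```

Under these assumptions the driver current drops from 14mA to 3mA, giving 11mW, or 1.1 pJ/bit at 10Gb/s.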

Fig. 2.28 shows a power breakdown of the receiver along with a table of comparison

to previous work. The DFE power consumption consists of only dynamic power and as

a result scales with frequency. Among the compared receivers, this work occupies the

least area and can offer the lowest overall link power consumption owing to the greatly

reduced transmit swing requirement.

Figure 2.28: Power breakdown and comparison to previous work ([9], [13], [23], [24], [25]).

2.4. Conclusion

Behavioral simulations showed that feedback loop delay has a tremendous impact on

the performance of IIR DFEs, but the addition of a DT tap was shown to make the

architecture robust. A circuit implementation of the DFE with two IIR taps and one DT

tap was developed in a 28nm-LP process exhibiting only dynamic power consumption.

This was the first implementation of multiple IIR DFE taps together with a DT DFE

tap, an architecture that has since been duplicated in [50]. Digital foreground calibration of

ILO-based phase shifters and offset cancellation was described. The DFE consumes 4.1

mW at 10Gb/s. The design has a lower input swing requirement and a smaller circuit area

than all previous designs. The DFE was able to compensate 24dB

of loss with a transmit swing of only 150mVpp-diff, 8× lower than the swing required

using the passive equalizer [43], [44].

3. Robust Edge-Based Adaptation

The objective is to seek equalizer adaptation algorithms for low power receivers. Hence,

the focus of this chapter is on adaptation techniques which require no additional high-

speed signal-path circuitry, and which offer the fastest possible convergence. Additional

high-speed circuitry on the signal path increases receiver power. Not only do the addi-

tional circuits themselves consume power, they also load critical nodes of the signal path,

which in turn has a ripple effect on the design of preceding high-speed circuit blocks,

further increasing receiver power. The adaptation algorithm will utilize readily available

signals from a bang-bang phase detector.

Solutions that can offer fast convergence are sought since they can reduce the energy

overhead of powering-on the receiver. Firstly, there has been much work on the use of

burst-mode communication to save power in wireline links by powering-on the receiver

and recovering the clock quickly [51], [52], [53]. However, in fact the time required to

adapt equalizer coefficients can be much longer than that required for clock recovery, and

therefore will limit the power-on time of links whenever there is significant change in the

channel condition, power supply, temperature, etc. between bursts. Secondly, the faster

adaptation time can always be traded for more accuracy by reducing the step size in

an adaptation algorithm. If two algorithms converge to the same final value, the faster

algorithm is better since the coefficient step size can be reduced until its convergence

time is the same as the other, but with less variations in steady-state in the equalizer


coefficients. Lastly, a long adaptation time leads to a longer testing time for products

which will lead to an increase in cost. If the adaptation algorithm takes too long to

converge it might even become prohibitive to test each product sold.

In this chapter, robust adaptation algorithms are defined as algorithms which can

converge for a variety of input data and do not rely on only a specific pattern to set the

equalizer coefficients. Furthermore, robust algorithms do not wander from their adapted

values during bursts of repeating/idle patterns as will be discussed in section 3.1.3.

This chapter will propose an adaptation scheme for DT DFEs and compare it with

previous work. The adaptation metric used for comparison will be discussed in section

3.1. Chapter 4 will expand the base of the adaptation algorithm to a 1 IIR + 1 DT tap

DFE.

3.1. Adaptation Metrics

To quantify and compare the performance of various adaptation schemes, a set

of metrics is outlined in the following sections [54].

3.1.1. Adaptation Time

One important aspect of adaptation is the time it takes for the coefficients to converge

to their final value. Since random data and noise can have an effect on adaptation time,

multiple simulation runs with different numerical seeds for random number generation are

performed to quantify this. The ensemble averaged results are used and the 95% settling

time is obtained. To determine the settling time for a DFE with multiple coefficients, the

95% settling time is recorded for the coefficient that takes the longest to adapt. The final

value is determined by waiting for all the coefficients to toggle around a certain value and

then performing averaging to determine a numerical final value. This is shown graphically

in Fig. 3.1a. The 95% settling time provides a way to compare the relative convergence

speed of different adaptation algorithms; the absolute value of the adaptation time is


not used. As long as the same percentage settling time is used for all the algorithms, it

provides a fair comparison of the convergence time.
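A minimal sketch of the settling-time metric follows. It assumes the 95% settling time is the first time step at which the ensemble-averaged coefficient has covered 95% of the distance from its initial value to its final value, with the final value taken as the mean of the last samples of the averaged trace. These conventions and all names are assumptions for illustration.

```python
import math

def ensemble_average(runs):
    """Average several coefficient traces sample by sample."""
    n = min(len(r) for r in runs)
    return [sum(r[i] for r in runs) / len(runs) for i in range(n)]

def settling_index(avg, fraction=0.95, tail=100):
    """First index where avg has covered `fraction` of its total excursion."""
    final = sum(avg[-tail:]) / tail          # final value from the trace tail
    start = avg[0]
    target = start + fraction * (final - start)
    rising = final >= start
    for i, value in enumerate(avg):
        reached = value >= target if rising else value <= target
        if reached:
            return i
    return len(avg) - 1

# Two synthetic runs: the same exponential approach with opposite dither,
# so the dither cancels in the ensemble average.
tau = 100.0
base = [1.0 - math.exp(-n / tau) for n in range(2000)]
run_a = [v + 0.01 * (-1) ** n for n, v in enumerate(base)]
run_b = [v - 0.01 * (-1) ** n for n, v in enumerate(base)]
t95 = settling_index(ensemble_average([run_a, run_b]))
```

For this synthetic exponential with a time constant of 100 steps, the 95% point falls at step 300, close to three time constants as expected.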

3.1.2. Mean-Square Error

Another important criterion, which relates to the quality of the equalized signal, is the

mean-square error (MSE). The coefficient MSE provides a measure of the difference between

the actual and optimal performance averaged over time. Let G_k(j) represent the kth

gain coefficient for the DFE at time step j, and let G_k^opt be its optimal value determined from

the channel pulse response. The MSE for a particular simulation run is calculated by

squaring the difference between the coefficient value and the optimal value and averaging

it for W samples after the 95% settling point.

\[ G_k^{MSE} = \frac{1}{W} \times \sum_{j=0}^{W} \left( G_k(j) - G_k^{opt} \right)^2 \tag{3.1} \]

This is shown graphically in Fig. 3.1b. The MSE is ensemble averaged over M

simulation runs, each having different random data and noise seeds.

\[ G_k^{MSE} = \frac{1}{M} \times \sum_{i=0}^{M} \left( G_k^{MSE}(i) \right). \tag{3.2} \]

Finally, the normalized coefficient error is calculated by,

\[ G_k^{\%} = \frac{\sqrt{G_k^{MSE}}}{G_k^{opt}} \cdot 100\%. \tag{3.3} \]
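The three metrics (3.1) to (3.3) can be computed as in the following sketch. It averages over exactly the samples in the window after the settling index, which is one reading of the summation limits. The function names and the example traces are hypothetical.

```python
import math

def coefficient_mse(trace, g_opt, settle_index, window):
    """Eq. (3.1): mean squared deviation from g_opt after the settling point."""
    samples = trace[settle_index:settle_index + window]
    return sum((g - g_opt) ** 2 for g in samples) / len(samples)

def ensemble_mse(traces, g_opt, settle_index, window):
    """Eq. (3.2): average of the per-run MSE over all simulation runs."""
    per_run = [coefficient_mse(t, g_opt, settle_index, window) for t in traces]
    return sum(per_run) / len(per_run)

def normalized_error_percent(mse, g_opt):
    """Eq. (3.3): RMS coefficient error as a percentage of the optimal value."""
    return math.sqrt(mse) / g_opt * 100.0

# Two synthetic runs that sit 0.1 above and 0.1 below an optimum of 0.5.
example_traces = [[0.6] * 50, [0.4] * 50]
example_mse = ensemble_mse(example_traces, g_opt=0.5, settle_index=10, window=30)
example_percent = normalized_error_percent(example_mse, g_opt=0.5)
```

In this example each run has an MSE of 0.01, so the normalized coefficient error is 20%.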

Generally, faster adaptation times result in greater coefficient MSE since the coeffi-

cients are updated in larger steps causing them to bounce and wander further away from

the optimal values in response to randomness in steady-state. Heuristics can be employed

to mitigate the tradeoff; for example, gear-shifting between large coefficient adaptation

updates during convergence and smaller updates in steady-state, but being heuristics,


Figure 3.1: Metrics for comparing adaptation algorithms. (a) Adaptation curves (equalizer coefficient vs. time) for multiple runs with different initial conditions. Performing an ensemble average of the individual runs allows for a way to determine the 95% settling time independent of the initial condition of the system. (b) Adaptation curve where the mean square error of the coefficient relative to G_k^opt is calculated after the 95% settling time; this number is also averaged over multiple runs.

there can be no guarantee of their effectiveness. In this work, adaptation algorithms are

sought that exhibit a superior combination of settling time and steady-state MSE.

3.1.3. Tolerance to Repeating Patterns

In certain applications, the input pattern can consist of random data followed by bursts

of repeating idle or test patterns for several thousand UIs which can cause the DFE

coefficients to diverge [55]. To check the adaptation’s robustness in the presence of such

data, Pseudo Random Bit Sequence (PRBS) data is transmitted for several thousand UIs

followed by one of three repeating patterns for another few thousand UIs. The pattern

is shown in Fig. 3.2.


Repeating patterns input (six consecutive segments of 10,000 bits each): PRBS, A, PRBS, B, PRBS, C, where

PRBS = PRBS7
A = 0000001100111111
B = 10101010101010
C = 111111000000

Figure 3.2: A sample repeating pattern where PRBS data alternates over 10,000 UI with specific repeating patterns.
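A test stimulus of this form can be generated with a short script. The sketch below is an assumption-laden illustration: it uses the common PRBS7 polynomial x^7 + x^6 + 1, restarts the PRBS from the same seed in every segment, and maps bits 0/1 to symbols −1/+1; none of these details are specified in the text.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of PRBS7 using the polynomial x^7 + x^6 + 1."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        out.append(new_bit)
    return out


def repeat_pattern(pattern, n_bits):
    """Repeat a bit string such as '111111000000' until n_bits are produced."""
    reps = -(-n_bits // len(pattern))  # ceiling division
    return [int(c) for c in (pattern * reps)[:n_bits]]


# Repeating patterns A, B and C as listed for Fig. 3.2.
PATTERNS = {"A": "0000001100111111", "B": "10101010101010", "C": "111111000000"}


def repeating_pattern_stimulus(segment_bits=10_000):
    """PRBS, A, PRBS, B, PRBS, C, each segment_bits long, as +/-1 symbols."""
    bits = []
    for name in "ABC":
        bits += prbs7(segment_bits)
        bits += repeat_pattern(PATTERNS[name], segment_bits)
    return [2 * b - 1 for b in bits]
```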

3.2. Discrete-time DFE Adaptation

Fig. 3.3a shows a block diagram repeated from Fig. 1.11 of a wireline receiver system

where two clocked comparators (within the phase detector) sample the equalized data

and provide the edge and data samples to the adaptation engine. All the adaptation

schemes to be compared correlate early-late phase detector outputs with the received bit

pattern to infer the amount of edge ISI as shown in Fig. 3.4a. DFE tap weights are

adjusted until the early/late outputs are independent of the preceding bit values. Over

time, the algorithm determines the DFE coefficients that remove ISI present at the edges

as shown in Fig. 3.3b.

To express the adaptation schemes mathematically, let $d_m \in \{-1, 1\}$ represent the transmitted data, c(t) the channel response including the transmitter and receiver front end, T the bit period, and $t_{samp} \in [0, T]$ the sampling point for recovering data. The received pulse response sampled at the data points is

$$h_m = c(t_{samp} + mT) \tag{3.4}$$

where $m \in \mathbb{Z}$. The received pulse response edge samples are

$$h_{m+0.5} = c(t_{samp} + (m + 0.5)T), \tag{3.5}$$


Figure 3.3: a) Block diagram of a DFE and adaptation engine only using phase detector outputs. b) Pulse response showing ISI on the edge samples is removed using the edge based DFE adaptation.

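which is also graphically shown in Fig. 3.3b for a pulse input. Let $G^k$ be the kth DFE tap gain, and N the total number of taps. The equalized data at the sampling point can be expressed as

$$D_m = \sum_{i=-\infty}^{\infty} d_i \cdot h_{m-i} - \sum_{k=1}^{N} G^k \cdot \mathrm{sign}(D_{m-k}), \tag{3.6}$$

and the equalized signal at the edge sampling point as

$$E_m = \sum_{i=-\infty}^{\infty} d_i \cdot h_{m+0.5-i} - \sum_{k=1}^{N} G^k \cdot \mathrm{sign}(D_{m-k}). \tag{3.7}$$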
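As an illustration of (3.6) and (3.7), the following sketch computes the equalized data and edge samples for a finite bit sequence. The dictionary layout (`h_data[n]` holding h_n, `h_edge[n]` holding h_{n+0.5}), the treatment of bits outside the sequence as zero, and the choice sign(0) = +1 are assumptions made for this example.

```python
def sign(x):
    """Slicer decision; sign(0) is taken as +1 in this sketch."""
    return 1 if x >= 0 else -1


def equalize(d, h_data, h_edge, gains):
    """Return (D, E), the equalized data and edge samples of (3.6) and (3.7).

    d      : list of transmitted symbols (+1 / -1)
    h_data : dict, h_data[n] = h_n        (data-sampled pulse response)
    h_edge : dict, h_edge[n] = h_{n+0.5}  (edge-sampled pulse response)
    gains  : list, gains[k-1] = G^k
    Bits outside the sequence are treated as zero (an assumption).
    """
    n_sym = len(d)
    D, E = [], []
    for m in range(n_sym):
        # Decision feedback from previously equalized data samples.
        fb = sum(g * sign(D[m - k])
                 for k, g in enumerate(gains, start=1) if m - k >= 0)
        data_isi = sum(d[m - n] * h for n, h in h_data.items()
                       if 0 <= m - n < n_sym)
        edge_isi = sum(d[m - n] * h for n, h in h_edge.items()
                       if 0 <= m - n < n_sym)
        D.append(data_isi - fb)
        E.append(edge_isi - fb)
    return D, E
```

With a single post-cursor term h_1 = 0.5 and a matching tap G^1 = 0.5, the data samples returned by `equalize` are free of ISI, which is a quick sanity check of the feedback indexing.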

Using (3.7) it can be shown that bang-bang clock recovery will generally converge to a sampling phase, $t_{samp}$, such that $h_{-0.5} = h_{0.5}$. For the case when there is a transition ($d_m \neq d_{m+1}$), $E_m$ can be written as

$$E_m = \sum_{i=-\infty}^{m-1} d_i \cdot h_{m+0.5-i} + d_m \cdot h_{0.5} + d_{m+1} \cdot h_{-0.5} + \sum_{i=m+2}^{\infty} d_i \cdot h_{m+0.5-i} - \sum_{k=1}^{N} G^k \cdot \mathrm{sign}(D_{m-k}). \tag{3.8}$$

Assuming $d_i$ are i.i.d. and equally likely to be either ±1, the first, fourth, and fifth terms in (3.8) are equally likely to be negative or positive. The bang-bang clock recovery will lock to the point where there are 50% early pulses and 50% late pulses, which translates to sign($E_m$) being equally likely to be positive or negative when there is a transition. To satisfy this condition, the terms $d_m \cdot h_{0.5}$ and $d_{m+1} \cdot h_{-0.5}$ must cancel, which leads to $h_{0.5} = h_{-0.5}$ since $d_m = -d_{m+1}$.
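The lock condition can be made explicit with one line of algebra. At a transition, $d_{m+1} = -d_m$, so the second and third terms of (3.8) combine as

```latex
d_m \cdot h_{0.5} + d_{m+1} \cdot h_{-0.5}
  = d_m \cdot \left( h_{0.5} - h_{-0.5} \right),
  \qquad d_{m+1} = -d_m .
```

The remaining terms of (3.8) are symmetric about zero under the i.i.d. assumption, so sign($E_m$) is balanced between early and late only when this combined term vanishes, that is, when $h_{0.5} = h_{-0.5}$.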

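The recovered data can be described as $a_m = \mathrm{sign}(D_m)$, with $a_m \in \{-1, 1\}$. Generally, (N+2)-bit sequences, $[a_{m-N-1}\ a_{m-N}\ \ldots\ a_m]$, can be used to adapt the N DT taps of a DFE with N ≥ 1. Assume that $d_m$ and $d_{m-k}$ are uncorrelated for k ≠ 0 and that the BER is relatively low so that $a_m \approx d_m$; this implies that $E(a_m \cdot d_m) = 1$ and $E(a_m \cdot d_{m-k}) = 0$ for k ≠ 0. Using (3.7) it can then be shown that the expected value of $a_{m-k-1} \cdot E_{m-1}$ is

$$\begin{aligned}
E(a_{m-k-1} \cdot E_{m-1}) &= E\left( a_{m-k-1} \cdot \sum_{i=-\infty}^{\infty} d_i \cdot h_{m-0.5-i} - a_{m-k-1} \cdot \sum_{j=1}^{N} G^j \cdot \mathrm{sign}(D_{m-1-j}) \right) \\
&= E(a_{m-k-1} \cdot d_{m-k-1} \cdot h_{k+0.5}) - E(a_{m-k-1} \cdot \mathrm{sign}(D_{m-k-1}) \cdot G^k) \\
&= h_{k+0.5} - G^k.
\end{aligned}$$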
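This expectation can be checked numerically for a short pulse response by enumerating every equally likely bit combination. The sketch below assumes error-free decisions (a = d) and a finite edge-sampled pulse response stored as `h_edge[n]` = h_{n+0.5}; both the function name and this data layout are illustrative assumptions.

```python
from itertools import product


def expected_correlation(k, h_edge, gains):
    """Exact E[a_{m-k-1} * E_{m-1}] over equally likely +/-1 data.

    h_edge[n] = h_{n+0.5}; gains[j-1] = G^j; decisions are assumed
    error-free (a = d). Offset n denotes the bit d_{m-1-n}.
    """
    offsets = sorted(set(h_edge) | set(range(1, len(gains) + 1)) | {k})
    total = 0.0
    count = 0
    for bits in product((-1, 1), repeat=len(offsets)):
        d = dict(zip(offsets, bits))
        # Edge sample E_{m-1} per (3.7), evaluated at index m-1.
        edge = sum(d[n] * h for n, h in h_edge.items())
        edge -= sum(g * d[j] for j, g in enumerate(gains, start=1))
        total += d[k] * edge  # d[k] is a_{m-k-1}
        count += 1
    return total / count
```

With h_{1.5} = 0.2 and G^1 = 0.15 the function returns approximately 0.05, which is the residual edge ISI h_{1.5} − G^1; the contributions of all other bits average out.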

Ideally, the product $a_{m-k-1} \cdot E_{m-1}$ would be integrated over time to force this term to zero, hence forcing $G^k = h_{k+0.5}$ and eliminating the zero-crossing ISI. The main problem is that integrating the product $a_{m-k-1} \cdot E_{m-1}$ would require the analog value $E_{m-1}$ to be digitized, which would require a high-speed, high-resolution ADC. Instead, a 1-bit quantized version of $E_{m-1}$, which is already available inside Alexander bang-bang phase detectors, can be used [56]. This is the same approach as sign-sign LMS algorithms [57], [58], [59], which use 1-bit quantized versions of the gradient estimates to perform adaptation.

Fig. 3.4a illustrates an adaptation rule that uses the product $a_{m-k-1} \cdot \mathrm{sign}(E_{m-1})$ to iteratively update $G^k$, forcing $E[a_{m-k-1} \cdot E_{m-1}] = h_{k+0.5} - G^k$ towards zero and, hence, $G^k$ towards its ideal value $h_{k+0.5}$ as illustrated in Fig. 3.3b. Note that the bang-bang phase detector only provides useful information when there is a transition in the data; i.e. $a_{m-1} \neq a_m$. In practice, to obtain ISI information for $h_{k+0.5}$, occurrences of two patterns which are identical in all bits except $a_{m-k-1}$ are considered:

$$[a_{m_1-N-1}\ \ldots\ a_{m_1-k-2}\ \ +1\ \ a_{m_1-k}\ \ldots\ a_{m_1}], \tag{3.9}$$

$$[a_{m_2-N-1}\ \ldots\ a_{m_2-k-2}\ \ -1\ \ a_{m_2-k}\ \ldots\ a_{m_2}]. \tag{3.10}$$

For each occurrence of patterns in the form of (3.9) & (3.10), the phase detector’s early/late samples, sign($E_{m_1-1}$) & sign($E_{m_2-1}$), can be used to determine whether to increase/decrease equalization.

Unfortunately, if the data received over a particular window of time happens to have a

preponderance of certain patterns and not others, it can bias the adaptation and lead the

equalizer coefficients to wander away from their optimal point. To combat this, one can

wait for the occurrence of particular patterns before updating the equalizer coefficients.

However, in past works this has had the effect of slowing down the adaptation time

compared with what is otherwise possible. An example of this approach is described in

section 3.2.1 [60]. One downside of slowing down the adaptation is an increase in the test

time required for each chip, which in turn increases the cost. An example of an approach

suffering from coefficients wandering away from their optimal point is presented in section

3.2.2 [61].

All schemes using the product of am−k−1 · sign(Em−1) rely upon the edge samples of

the phase detector for the adaptation algorithm’s “error” information. As a result, they

are adapting to minimize ISI and noise at the edge sample times. They are therefore

referred to as “edge adaptation” schemes, and they serve to minimize jitter rather than

maximize noise margin (i.e. vertical eye opening).


Fig. 3.4a (adaptation rule): Pattern 1 and Pattern 1' differ only in bit $a_{m-k-1}$. If $a_{m-k-1} \times \mathrm{sign}(E_{m-1}) = +1$, increase $G^k$; if $a_{m-k-1} \times \mathrm{sign}(E_{m-1}) = -1$, decrease $G^k$.

Fig. 3.4b (example patterns, N = 4; bits listed from $a_{m-5}$ on the left to $a_m$ on the right):

    Pattern [A,k]   Sequence
    [1,1]           -1 -1 -1 -1 -1 +1
    [1',1]          -1 -1 -1 +1 -1 +1
    [2,1]           -1 -1 +1 -1 -1 +1
    [2',1]          -1 -1 +1 +1 -1 +1
    ...
    [1,2]           -1 -1 -1 -1 -1 +1
    [1',2]          -1 -1 +1 -1 -1 +1
    [2,2]           -1 +1 -1 -1 -1 +1
    [2',2]          -1 +1 +1 -1 -1 +1
    ...

Figure 3.4: a) Two patterns that are used to obtain information about ISI at $h_{0.5+k}$. b) Example patterns of length N+2 used for obtaining ISI information at $h_{0.5+k}$.

3.2.1. One 6-bit Pattern Used

Ref. [60] utilizes 6-bit patterns to guide adaptation for a linear equalizer; however, this

can be straightforwardly modified to guide equalization for 4-taps of a DFE. The algo-

rithm waits for the occurrence of a particular 6-bit pattern and depending on sign(Em−1)

updates the equalizer coefficients. The algorithm then waits for the occurrence of the

pattern with all bits identical except one and again depending on the early/late sam-

ples, the amount of equalization is updated. Fig. 3.4a shows example patterns and they

are defined by (3.9)-(3.10). Let $\alpha^k_m = 1$ when pattern (3.10) occurs sometime after the occurrence of (3.9), $\alpha^k_m = -1$ if pattern (3.9) occurs sometime after the occurrence of (3.10), and 0 otherwise. Let µ be the coefficient step size, and k refer to the DFE tap which cancels the ISI at $h_{k+0.5}$. Then the DFE tap weight, $G^k$, is updated by

$$G^k_{m+1} = G^k_m + \mu \cdot \mathrm{sign}(E_{m-1}) \cdot \alpha^k_m. \tag{3.11}$$

Over time, alternating between the two patterns ensures that only ISI term hk+0.5

influences the adaptation of tap weight Gk. This edge based equalization allows the DFE

to equalize for horizontal eye opening by removing the edge ISI. This is often acceptable


since jitter often limits the BER of a receiver.

3.2.2. Using Patterns of Varying Sizes

The adaptation algorithm in [61] uses the two previously detected bits ($a_{m-k-1}$ & $a_{m-k-2}$) and the edge value ($E_{m-1}$) to guide adaptation. Effectively, the pattern length used for different $G^k$ varies and is 2k. The DFE update equation is shown in (3.12) and (3.13).

$$G^k_{m+1} = G^k_m + \mu \cdot \mathrm{sign}(E_{m-1}) \cdot (a_{m-k-1} + a_{m-k-2}) \cdot W_m \tag{3.12}$$

$$W_m = \begin{cases} 0 & \text{if } a_m = a_{m-1} \\ 1 & \text{otherwise} \end{cases} \tag{3.13}$$
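A direct transcription of (3.12) and (3.13) into code may help make the update concrete. This is a sketch only: the function name, the list-based storage of decisions `a` and edge samples `edge`, and the choice sign(0) = +1 are assumptions.

```python
def sign(x):
    """Sign of an edge sample; sign(0) is taken as +1 in this sketch."""
    return 1 if x >= 0 else -1


def varying_pattern_update(gains, a, edge, m, mu):
    """Apply one step of (3.12)-(3.13) to every tap and return the new gains.

    gains : list, gains[k-1] = G^k
    a     : list of recovered bits (+1 / -1); requires m - len(gains) - 2 >= 0
    edge  : list of edge samples, edge[m-1] = E_{m-1}
    """
    w = 0 if a[m] == a[m - 1] else 1          # W_m of (3.13)
    s = sign(edge[m - 1])
    return [g + mu * s * (a[m - k - 1] + a[m - k - 2]) * w
            for k, g in enumerate(gains, start=1)]
```

Note that the term $a_{m-k-1} + a_{m-k-2}$ is zero whenever the two bits differ, so a tap is only updated when both bits agree, and the step is then 2µ.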

The algorithm relies on random data to cancel the effect of other ISI terms from bits

not considered in the patterns. For this reason, if certain patterns occur with different

probabilities, it could lead to the algorithm settling to incorrect coefficient values or

diverging. Since the algorithm requires the two bits $a_{m-k-1}$ and $a_{m-k-2}$ preceding the data transition to be identical, it uses only half of the available patterns for adaptation.

3.2.3. Proposed: Utilizing all patterns

In this work, instead of considering only one [60] or half [61] of (N+2)-bit patterns at a time, all possible (N+2)-bit patterns having a transition, $a_m \neq a_{m-1}$, are used. In this way, to obtain information about $h_{k+0.5}$, a total of $2^{N+1}$ patterns are considered.

The algorithm will next be described more rigorously. Consider a data pattern (N+2) bits long, $[a_{m-N-1}\ a_{m-N}\ \ldots\ a_m]$. The pattern is assigned index [A, k], where $A \in \{1, \ldots, 2^N\}$, when $a_{m-1} \neq a_m$ and $a_{m-k-1} = -1$, while the index [A′, k] is assigned to the same pattern except with bit $a_{m-k-1} = 1$. Together, these indices identify pattern pairs of the form (3.9) and (3.10). An example for N=4 is shown in Fig. 3.4b. The patterns [A, k] and [A′, k]


are used together with the corresponding edge samples $E_{m-1}$ to inform the adaptation of DFE tap k. Note each pattern is assigned N indices since each pattern can contribute to the adaptation of all N DFE taps. Let $\Gamma^{[A,k]}_m = 1$ when pattern [A, k] occurs and 0 otherwise. The equalizer parameters are to be updated in accordance with

$$P^{[A,k]}_{m+1} = P^{[A,k]}_m + \mathrm{sign}(E_{m-1}) \cdot a_{m-k-1} \cdot \Gamma^{[A,k]}_m, \tag{3.14}$$

where a similar equation can be written for $P^{[A',k]}_m$.

In this notation, the adaptation algorithm of [60] operates with N=4, and updates each equalizer parameter only when one particular pair of patterns [A, k] and [A′, k] occur alternately. Hence, assuming random data, the parameter is updated only $1/2^{N+1}$ of the time, and if the targeted pattern happens not to occur for a long time, the adaptation will simply halt. Similarly, the algorithm in [61] operates for N=4 and updates the equalizer parameters $2^N/2^{N+1} = 1/2$ of the time. In [61], the patterns are chosen in such a way that can cause a systematic error in the coefficient value from its optimal value as shown in section 3.3.

In the next step in the proposed adaptation, the pattern occurrences are summed and the step size the adaptation will take, $\zeta^k_m$, is calculated as shown in (3.15). By utilizing all patterns that contribute to the ISI term, this approach allows larger steps to be taken by the algorithm in determining DFE coefficients. Let ψ be 1 every Λ bits and zero otherwise. Then the adaptation gain step can be written as

$$\zeta^k_m = \mu \cdot \sum_{A=1}^{2^N} \left( P^{[A,k]}_m + P^{[A',k]}_m \right) \cdot \psi \tag{3.15}$$

The counters in (3.14) are also reset every Λ bits. The DFE update equation can then be expressed as

$$G^k_{m+1} = G^k_m + \zeta^k_m, \tag{3.16}$$

which will update every Λ bits as a result of the counter ψ. The value of Λ affects the adaptation time and the coefficient MSE, and through simulations it was determined that Λ = 50 provided optimal results. The simulation results for determining Λ are shown in section 3.3. The proposed adaptation process requires counting and storing pattern information, which will require more digital hardware than previous schemes; however, it can all be relatively low-speed synthesized logic.
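The accumulate-then-update flow of (3.14) to (3.16) can be sketched as a small class. This is an illustrative behavioral model rather than the implemented logic: the class name, the dictionary of per-pattern counters, and the choice sign(0) = +1 are assumptions. Keying the counters by the full pattern covers both the [A, k] and [A′, k] indices, since the pattern itself contains the bit $a_{m-k-1}$.

```python
from collections import defaultdict


def sign(x):
    """Sign of an edge sample; sign(0) is taken as +1 in this sketch."""
    return 1 if x >= 0 else -1


class AllPatternAdapter:
    """Behavioral sketch of the all-pattern adaptation of (3.14)-(3.16)."""

    def __init__(self, n_taps, mu, window=50, gains=None):
        self.n_taps = n_taps
        self.mu = mu
        self.window = window                    # Lambda
        self.gains = list(gains) if gains is not None else [0.0] * n_taps
        self.counters = defaultdict(int)        # (pattern, k) -> P^{[A,k]}
        self.bits_seen = 0

    def step(self, pattern, edge_sample):
        """pattern = (a_{m-N-1}, ..., a_m); edge_sample = E_{m-1}."""
        if len(pattern) != self.n_taps + 2:
            raise ValueError("pattern must be N+2 bits long")
        if pattern[-1] != pattern[-2]:          # only transitions carry edge info
            for k in range(1, self.n_taps + 1):
                a_ref = pattern[-(k + 2)]       # a_{m-k-1}
                self.counters[(tuple(pattern), k)] += sign(edge_sample) * a_ref
        self.bits_seen += 1
        if self.bits_seen == self.window:       # psi = 1 every Lambda bits
            for k in range(1, self.n_taps + 1):
                zeta = self.mu * sum(v for (_, kk), v in self.counters.items()
                                     if kk == k)
                self.gains[k - 1] += zeta       # eq. (3.16)
            self.counters.clear()               # counters reset every Lambda bits
            self.bits_seen = 0
        return self.gains
```

Keeping separate per-pattern counters, instead of a single running sum per tap, is what allows pattern statistics such as the diversity check of section 3.2.3.1 to be evaluated before an update is committed.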

3.2.3.1. Ensuring Pattern Diversity

The proposed adaptation engine ensures there is a good pattern diversity before updating

any of the coefficients. This means that a certain number of different patterns need to

occur before the algorithm will update the coefficients. Let κ represent the minimum

number of different patterns for which the equalizer coefficients are updated in Λ bits.

Fig. 3.5 shows the proposed adaptation scheme with the repeating pattern input (shown

in Fig. 3.2) with and without the repeating patterns protect feature enabled. In these

simulations, κ = 10 and Λ = 50.

There are trade-offs in choosing a value of κ. If κ is made too large, the adaptation

will slow down due to a stricter requirement on the number of different patterns that

must occur. If κ is made too small, then the adaptation algorithm may diverge during

bursts of idle or repeating patterns. Fig. 3.6 shows the number of different patterns that occur in Λ bits, both when a random input is applied and when the repeating patterns outlined in Fig. 3.2 are applied. Based on these results and similar analysis on different types of repeating

patterns present in various encoding schemes, a value of κ = 10 was chosen. This

value provides a good balance between maintaining a fast adaptation time while also not

diverging during repeating patterns.
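The diversity check can be illustrated by counting the distinct transition patterns seen in a Λ-bit block. In this sketch the function names are hypothetical, patterns are taken as sliding 6-bit windows within the block, and an update is permitted when the count is at least κ; the exact comparison used in the implementation is not stated in the text.

```python
def distinct_transition_patterns(bits, pattern_len=6):
    """Count distinct pattern_len-bit windows whose last two bits differ."""
    seen = set()
    for end in range(pattern_len, len(bits) + 1):
        window = tuple(bits[end - pattern_len:end])
        if window[-1] != window[-2]:
            seen.add(window)
    return len(seen)


def update_allowed(bits, kappa=10, pattern_len=6):
    """Assumed gating rule: update only if at least kappa distinct patterns occurred."""
    return distinct_transition_patterns(bits, pattern_len) >= kappa
```

For a 50-bit block, the repeating patterns A, B and C of Fig. 3.2 produce only 4, 2 and 2 distinct transition patterns under this counting, well below κ = 10, so the coefficient update would be skipped during those bursts.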


Figure 3.5: (left) Adaptation with repeating patterns protect feature disabled. (right) Adaptation with repeating patterns protect enabled and κ = 10. (Both panels plot the gain settings G1 to G4 vs. UI.)

Figure 3.6: Number of different patterns in Λ bits vs. UI for a repeating patterns input. (Random data segments, PRBS 31, alternate with idle input segments.)


3.3. Comparing Adaptation Schemes

Given an (N+2)-bit sequence, there are $2^{N+1}$ patterns that have a transition between the final two bits. The proposed scheme utilizes 100% of those patterns to guide equalization, provided there is sufficient diversity in the received patterns. Ref. [61] uses $2^N$ patterns for equalization, which is only 50% of patterns with a transition. Ref. [60] uses 1 pattern at a time to guide equalization, which is only 3.125% of the transition patterns of length N+2=6. We expect the adaptation time to be heavily affected by the number of patterns that are utilized in the adaptation.
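These fractions can be verified by brute-force enumeration. The short sketch below (the function name is an assumption) lists all (N+2)-bit patterns with a transition in the final two bits and counts the subset usable by the scheme of [61] for tap k = 1, where the two bits preceding the transition must be identical.

```python
from itertools import product


def transition_patterns(n_taps):
    """All (N+2)-bit patterns, as +/-1 tuples, with a transition in the last two bits."""
    return [p for p in product((-1, 1), repeat=n_taps + 2) if p[-1] != p[-2]]


patterns = transition_patterns(4)                    # N = 4, 6-bit patterns
equal_pair = [p for p in patterns if p[-3] == p[-4]]  # a_{m-2} == a_{m-3}, tap k = 1

fraction_proposed = len(patterns) / len(patterns)    # all transition patterns
fraction_ref61 = len(equal_pair) / len(patterns)     # half of them
fraction_ref60 = 1 / len(patterns)                   # one pattern at a time
```

For N = 4 this gives 32 transition patterns, of which 16 satisfy the condition of [61], and a single pattern corresponds to 1/32 = 3.125%.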

The metrics outlined in section 3.1 will be used to compare the schemes, however,

a few other factors need to be considered for a fair comparison. Firstly, the minimum

step size for the coefficients, µ, should be the same for all schemes and remain the same

for the entire duration of adaptation. Next, all adaptation schemes are applied to the

same DFE architecture with the same behavior. Lastly, a clocking scheme is presumed

to ensure h−0.5 = h0.5, as in a bang-bang phase detector.

Fig. 3.7 shows typical 4-tap DFE adaptation curves for the three schemes for a 1st-

order lowpass channel with a pole at 0.1 times the bitrate. The gain settings for the DFE

tap weights are shown as a function of time in UI. The first half of each plot shows the results for an ensemble-averaged set of 100 Monte Carlo runs. The runs, which are for an RC channel, each use a different PRBS-31 initial condition and noise seed. From these plots,

the algorithms’ 95% settling time can be determined. In Fig. 3.7(left) the adaptation is

shown using only one 6-bit pattern to adapt each DFE coefficient, as in [60]; it is evident

that the adaptation is slower compared to the others shown in Fig. 3.7. The same

simulations for a 24” backplane channel and a 50 meter coax cable are shown in Fig.

3.8 and Fig. 3.9, respectively. For all the different channels the proposed adaptation

scheme has the fastest settling time with no overshoot on any of the coefficients. In initial simulations, several hundred Monte Carlo runs were ensemble averaged; however, the results looked identical to the 100-run case. In the final simulations, 100 Monte Carlo runs were used to keep the simulation time lower while providing enough accuracy to compare the adaptation schemes.

To better compare the adaptation results, the performance metrics described in Sec-

tion 3.1 are shown in Fig 3.10, where adaptation time is measured as the 95% settling

time and shown on the left axis. The normalized coefficient error, (3.3), is shown for

each of the four DFE tap weights on the right axis. This process was performed for three

channels including a simple single pole RC channel, a 24” backplane channel, and a 50

meter coax cable with attenuation at one-half the bit rate of 15dB, 17dB, and 26dB,

respectively. From the adaptation curves in Fig. 3.9, it can be seen that with the varying pattern sizes algorithm [61], the coefficients adapt to different values than with the other two algorithms, which converge much closer to the optimal DFE tap weights $G^k_{opt}$. This bias worsens the steady-state MSE metric of [61], which compares the coefficient variations relative to the optimal DFE coefficient $G^k_{opt}$. From Fig. 3.10 it can be seen that the proposed scheme has the fastest adaptation time and smallest normalized coefficient error ($G^k_{\%}$) defined in (3.3).

Lastly, the adaptation schemes are compared based on their tolerance to repeating

patterns. A PRBS31 is transmitted for 10,000 UI followed by one of the three repeating

patterns for 10,000 UI (0000000011, 1111111100, 0000011111). Figure 3.11 shows the results of 100 Monte Carlo runs for a 16” backplane channel. The algorithm used

in [61] diverges to incorrect values during the repeating patterns, while the algorithm

in [60] and the proposed scheme are simply idle during the periods of repeating patterns.

The reason for this behaviour in the proposed scheme is the repeating patterns protect

feature described in section 3.2.3.1.

The proposed adaptation scheme’s performance as a function of initial conditions is shown in Fig. 3.12. The initial conditions of all of the coefficients are varied and the adaptation curves are plotted for the 24” backplane channel. Regardless of the initial condition, the coefficients converge to the same final value.

Figure 3.7: Adaptation curves for the three schemes for a simple RC filter with PRBS-31 input. The first half of each adaptation curve is the result of 100 Monte Carlo runs ensemble averaged to determine the settling time. (Panels: One Pattern Used; Varying Pattern Sizes Used; Proposed: All Patterns Used. Each plots the DFE tap weights G1 to G4 vs. UI, with the 95% settling point marked.)

Figure 3.8: Adaptation curves for the three schemes for a 24” backplane channel with PRBS-31 input. The first half of each adaptation curve is the result of 100 Monte Carlo runs ensemble averaged to determine the settling time. (Same panels and axes as Fig. 3.7.)

Figure 3.9: Adaptation curves for the three schemes for a 50 meter coax cable with PRBS-31 input. The first half of each adaptation curve is the result of 100 Monte Carlo runs ensemble averaged to determine the settling time. (Same panels and axes as Fig. 3.7.)

Figure 3.10: Adaptation time and normalized coefficient error shown for the three schemes and three different channels with varying amount of attenuation. The results are for 100 Monte Carlo runs with different PRBS31 and noise seeds. (Channels: RC Channel, Backplane 16”, Coax 50m. Legend: A: One Pattern Used; B: Varying Pattern Sizes Used.)

Figure 3.11: Adaptation curves for the three schemes for a 16” backplane channel with repeating patterns input. (Each panel plots the DFE tap weights G1 to G4 vs. UI.)

As discussed in section 3.2.3, the proposed algorithm updates the coefficients every

Λ bits. Simulation results for different values of Λ are shown in Fig. 3.13. A value of

Λ = 50 is used elsewhere in this chapter as a good compromise between the adaptation

time and coefficient MSE. In the prototype implementation of chapter 4, the adaptation

algorithm will use Λ = 64; the power-of-2 provides for an easier implementation using

demultiplexed data and edge samples.

Figure 3.12: Adaptation curves for a 24” backplane channel with varying initial conditions. (Four panels, one per DFE tap weight G1 to G4, each vs. UI.)

Figure 3.13: The adaptation time and normalized coefficient error as a function of the number of patterns per decision, Λ. (Adaptation time in UI and normalized coefficient error $G^k_{\%}$ for G1 to G4, both plotted vs. Λ.)

3.4. Conclusion

Adaptation schemes which utilize readily available data from a bang-bang phase detector were compared. Adaptation criteria were introduced to capture the important aspects of adaptation algorithms. A scheme was proposed which improves adaptation speed by ∼20× relative to [60] and has lower normalized coefficient error than previous works. Unlike [61], the proposed adaptation scheme was also able to handle repeating pattern inputs without diverging to incorrect coefficient values.

Chapter 4. Edge Based IIR DFE Adaptation

While chapter 3 focused on edge-based adaptation for a DT DFE, this chapter focuses

on using the same principle to adapt a 1-IIR + 1-DT DFE. The algorithm will still use

the outputs of the bang-bang phase detector. As before, it is assumed that the edge-

sampling phase is aligned to the median zero-crossings of the data. Section 4.2 will

discuss the theory of operation for the adaptation algorithm. Section 4.3 will show the

implementation details for the system. Finally, section 4.4 will present the measurement

results for a 28nm CMOS FDSOI prototype DFE integrating the adaptation algorithm.

4.1. Prior Art

There have been two previously-reported and implemented adaptation algorithms for IIR

DFEs. In [49], the adaptation algorithm utilizes additional comparators on each of the

even/odd paths as shown in Fig. 4.1. The additional paths’ offset and sampling phase

are adjusted to monitor the shape of the received eye diagram. The adaptation algorithm

required a BER better than $10^{-5}$ to function correctly. Therefore, initially, a repeating

pattern is transmitted through the system, and over time, the shape of the pulse re-

sponse is determined off-chip. Based on this pulse response shape, an initial setting is

determined to provide a BER better than $10^{-5}$. Once the initial set of coefficients has

been determined, the system will perturb all the coefficients and observe if there is an

improvement in BER. This process is repeated until changing the coefficients does not


yield an improvement and all adjacent coefficient settings also do not yield an improve-

ment. This off-chip implementation of the algorithm takes 1000 iterations and 1 hour

and 25 minutes to converge. As described in chapter 3, a long adaptation time leads to increased test time and, in turn, increased cost. The system equalizes 15dB loss at 2.5GHz for 5Gb/s data using a passive equalizer and a 1-DT + 1-IIR DFE.

In [24], a pattern based adaptation algorithm is presented without using additional

high-speed comparators. The outputs of the bang-bang phase detector comparators are

used to determine the 1-IIR DFE coefficients. However, similar to [60], only 1 pattern is

used to guide the IIR gain coefficient and another pattern to guide the IIR time constant

setting. This results in a slow adaptation time and can potentially fail to adapt if the

chosen pattern does not occur. The IIR gain is determined to cancel the ISI at h1.5

while the time constant is chosen to cancel ISI at h2.5, which may not result in the

best cancellation of the long tail of ISI (which, of course, IIR taps are generally intended to cancel). Furthermore, the work does not discuss where the final coefficient converges relative to the optimal code for the DFE. Although a BER of $10^{-12}$ is achieved, the work does not provide a bathtub curve, so the quality of the eye cannot be determined. Moreover, this work does not include a DT tap for the DFE, which leaves its performance sensitive to any process or voltage variations as discussed in section 2.2. If a first post-cursor DT tap were added to the architecture, its adaptation would certainly interact with and confuse the adaptation of the IIR tap using that scheme.

Figure 4.1: Adaptation scheme for an IIR DFE using additional high-speed comparators [49]

4.2. IIR DFE Adaptation Algorithm

Fig. 4.2 shows the general proposed architecture to be used for the DFE adaptation.

There are no additional comparators used beyond those in the bang-bang phase detector.

The adaptation engine will need to determine the gain for the discrete-tap (G1), the gain

of the IIR path (B1), and the time constant of the IIR filter (τ1).

The adaptation algorithm uses 6-bit patterns to guide the IIR DFE coefficients. Using 6-bit patterns provides information regarding post-cursor ISI terms $h_{1.5}$, $h_{2.5}$, $h_{3.5}$, and $h_{4.5}$. A longer pattern length could be used, which would provide information on post-cursor ISI further out; however, the implementation complexity would continue to grow (the memory required doubles for every additional bit). Using 6-bit patterns is a good compromise between providing enough ISI information for the 1 IIR + 1 DT coefficients

and the digital implementation complexity. The initial process of the 1-IIR + 1-DT DFE

adaptation is similar to the algorithm discussed in section 3.2.3. Equation (3.14) is still

implemented to keep a record of all the patterns that have occurred. That information

is again summed as in (3.15), repeated here for convenience

$$\zeta^k_m = \mu \cdot \sum_{A=1}^{2^N} \left( P^{[A,k]}_m + P^{[A',k]}_m \right) \cdot \psi \tag{4.1}$$

Recall that ζkm will provide information regarding the amount of ISI present at hk+0.5.


Figure 4.2: Proposed 1 IIR + 1 DT architecture without using additional high-speed comparators

The discrete-time tap gain, $G^1$, is updated as before:

$$G^1_{m+1} = G^1_m + \zeta^1_m \cdot \psi \tag{4.2}$$

The IIR gain, $B^1$, is determined using information at $h_{2.5}$ inferred from $\zeta^2_m$. The update equation for the IIR gain is

$$B^1_{m+1} = B^1_m + \zeta^2_m. \tag{4.3}$$

This will ensure that the IIR gain is adjusted until $h_{2.5} = 0$. Finally, the IIR filter time constant, $\tau^1$, is updated using information from $\zeta^3_m$ & $\zeta^4_m$ representing ISI present at $h_{3.5}$ & $h_{4.5}$. It should be noted that changing the time constant of the IIR filter will also affect $h_{2.5}$ and, hence, the gain $B^1$. To slow down the interaction between these two loops, the gain is updated more frequently than the IIR time constant. This allows the gain to always adjust to cancel $h_{2.5}$ as the time constant is adapted. The update equation for the IIR filter time constant is

$$\tau^1_{m+1} = \tau^1_m + (\zeta^3_m + \zeta^4_m) \cdot \chi, \tag{4.4}$$


where χ = 1 every 3×Λ bits and zero otherwise. This minimizes the interaction between the gain loop and the IIR filter time-constant loop. In the proposed design, Λ = 64, which allows the data to be demultiplexed by 64, and each group of 64 samples is used for a single update of the coefficients.
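The two update rates in (4.2) to (4.4) can be sketched as follows. This is a behavioral illustration under stated assumptions: the class name is hypothetical, the steps ζ^1 to ζ^4 are assumed to be supplied once per Λ-bit block (ψ = 1) as integer code increments, and the time-constant update uses only the steps of the block in which χ = 1.

```python
class HybridDfeAdapter:
    """Sketch of the 1-DT + 1-IIR coefficient updates of (4.2)-(4.4)."""

    def __init__(self, g1=0, b1=0, tau1=0, tau_interval=3):
        self.g1 = g1                      # DT tap gain code
        self.b1 = b1                      # IIR gain code
        self.tau1 = tau1                  # IIR time-constant code
        self.tau_interval = tau_interval  # chi = 1 every 3 blocks of Lambda bits
        self.blocks = 0

    def apply(self, zeta):
        """zeta: dict {k: zeta^k} for k = 1..4, computed over one Lambda-bit block."""
        self.blocks += 1
        self.g1 += zeta[1]                          # eq. (4.2), cancels h_1.5
        self.b1 += zeta[2]                          # eq. (4.3), cancels h_2.5
        if self.blocks % self.tau_interval == 0:    # chi = 1
            self.tau1 += zeta[3] + zeta[4]          # eq. (4.4), uses h_3.5 and h_4.5
        return self.g1, self.b1, self.tau1
```

Updating the time constant at one third of the gain update rate gives the gain loop time to re-cancel h_2.5 after each change of the time constant.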

Fig. 4.3 is a channel pulse response showing which ISI terms are used for the DFE

coefficients. Using two edge ISI samples (h3.5 & h4.5) allows for a better fit to the

channel pulse response compared to [24]. In this implementation, all 6-bit patterns with a transition are used to adapt the coefficients, so 32 of the 64 possible patterns are used. In [24], one 4-bit pattern is used for the adaptation of the IIR time constant, so 1 of the 16 possible patterns is used. Therefore, assuming random uncorrelated data, there is an

8X improvement in adaptation time as a result of the greater variety of patterns being

used if all other parameters such as the update frequency and adaptation gain are kept

identical.
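Written out, the improvement factor is the ratio of the two pattern utilization fractions, under the stated assumption of random uncorrelated data:

```latex
\frac{32/64}{1/16} = \frac{1/2}{1/16} = 8 .
```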

Figure 4.3: Pulse response showing which ISI terms are used for the DFE coefficients. (G1 cancels $h_{1.5}$; B1 is set from $h_{2.5}$; τ1 is set from $h_{3.5}$ and $h_{4.5}$.)

4.3. Proposed System Architecture

Fig. 4.4 shows the block diagram of the implemented system. A half-rate DFE with 1-IIR + 1-DT tap is implemented; the half-rate latches are used both for the DFE and the clock recovery unit (CRU) phase detector. The data and edge samples, both needed by the adaptation algorithm and the digital CRU, are demultiplexed down to 64, allowing the digital logic for adaptation and clock recovery to be synthesized using standard cells.

The output of the DFE adaptation algorithm is 15 bits: 5 for the DT gain, 5 for the IIR

gain, and 5 for the IIR bandwidth. The digital CRU outputs codes needed for the phase

rotator which adjusts the phase of the 4 sampling clocks (clk0, clk90, clk180, clk270) for

the 4 comparators in the half-rate DFE. The phase rotator block was designed by Behzad

Dehlaghi, a PhD candidate at the University of Toronto. The phase rotator block will be

described in terms of its function in the entire system, however, the exact implementation

details will not be included.

Fig. 4.5 shows the implementation of the half-rate IIR DFE. The received data is

captured alternately at $D_{ODD}$ and $D_{EVEN}$, and the edge samples $E_{ODD}$ and $E_{EVEN}$ are

used for coefficient adaptation and the phase detector. For the DT tap, the output of

the SR latch is fed back into the other half of the DFE receiver, similar to the structure

presented in chapter 2.3. The main difference between this design and the previous one is

the output of the double-tail latch is directly connected to a new clockless 2:1 multiplexer

structure. This allows the loop delay to be reduced for the IIR DFE, which as was shown

in chapter 2, has a significant impact on the performance of the system. The output of

the 2:1 multiplexer is fed into the 1 IIR filter which is then connected to all the latches

in the DFE.

The system is implemented in a 28nm CMOS FDSOI process from STMicroelectronics. The body bias for all the analog blocks is adjustable off-chip and can be used to lower the thresholds of the NMOS/PMOS transistors. The body bias can range from 0V to 1.3V for the NMOS transistors and from -1.3V to 0V for the PMOS transistors. This allows the NMOS threshold to be adjusted from 275mV down to 165mV and the PMOS threshold from 235mV down to 125mV. Varying the body bias makes it possible to compare and evaluate the benefits of body biasing on different blocks. The digital standard cells were fast enough for the synthesized logic and did not require any additional speedup from body biasing. Therefore, the


Figure 4.4: Block diagram of the proposed 1 DT + 1 IIR DFE with adaptation and a digital clock recovery unit.

synthesized digital portion of the chip does not employ body biasing and both PMOS

and NMOS body biases are tied to their minimum value of 0V.

4.3.1. Double-tail Latch Architecture

The double-tail latch schematic corresponding to the top and bottom latches in Fig. 4.5 is shown in Fig. 4.6. Feedback signals for the DT and IIR taps are subtracted inside the latch. There are 5 bits of control for both the DT gain and the IIR gain, whereas only 3 bits of control were provided in the design of chapter 3. The finer granularity allows ISI to be canceled more accurately, leading to better performance. This number of control bits would have been impractical in the previous test chip, as there would have been too many combinations of settings for manual optimization.


Figure 4.5: Half-rate IIR DFE implementation with edge comparators included for the adaptation/digital CRU.


Figure 4.6: Latch architecture with subtraction performed directly inside the latch. There are 5 slices each for the discrete tap and the IIR tap. (All widths in µm, L=30nm.)

However, with the adaptation algorithm in this design, the coefficients can be determined automatically. The latch offset can be adjusted via transistor pairs placed in parallel with the input pair and driven by the gate voltages Voffn and Voffp. These voltages are set using digital control bits and an R-2R DAC [62].

The delay of the double-tail latch + the SR latch was characterized and is shown in Fig. 4.7. Each of the three plots shows the Clk-to-Q delay of the comparator vs. the input pk-to-pk-diff amplitude for a different corner. All simulation results are from post-layout RCc-extracted netlists. The delay increases exponentially as the input amplitude is reduced. The target for this chip is 20Gb/s, which means that a delay of less than 50ps (1 UI) is required to close timing for the DT tap of the DFE. The different curves correspond to different body bias voltages applied to the transistors in the 28nm FDSOI process. As the body bias is increased, the thresholds of the NMOS/PMOS transistors are lowered, reducing the delay.

Fig. 4.8 shows the simulated comparator threshold normalized to the input amplitude in the absence of mismatch. By normalizing the comparator threshold to the input, the gain factor applied to the discrete-tap feedback can be obtained. There are 5 binary


Figure 4.7: Clk-to-Q delay of the double-tail latch plus the SR latch as a function of input amplitude and body bias voltage. (Panels: TT [27°C], FF [-40°C], and SS [80°C] corners with VDD=1V, RCc extracted; x-axis: input pk-to-pk-diff (mV); y-axis: Clk-to-Q (ps); curves for BB = 0mV, 300mV, 600mV, 900mV, and 1300mV.)

weighted slices in the latch which can be enabled to change the amount of subtraction in

the DFE. The different curves correspond to the amount of subtraction that is performed.

Naturally, as the input amplitude increases, the relative impact of the subtraction slices

decreases and the amount of subtraction goes down. Fig. 4.8 shows the results for the

DFE with and without body biasing for various input amplitudes. All of the simulations

are performed on post-layout RCc extracted netlists.
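The mapping from a 5-bit gain setting to the enabled subtraction slices can be sketched as follows. This is a minimal illustration that assumes the five slices have relative sizes of 1X, 2X, 4X, 8X, and 16X and that bit 0 of the code enables the smallest slice; the names `SLICE_WEIGHTS`, `enabled_slices`, and `relative_strength` are hypothetical. It gives only the relative strength of the enabled slices; the resulting gain factor also depends on the input amplitude and the body bias, as shown in Fig. 4.8.

```python
# Relative sizes of the five subtraction slices (assumed: 1X, 2X, 4X, 8X, 16X).
SLICE_WEIGHTS = (1, 2, 4, 8, 16)


def enabled_slices(code):
    """Return the relative sizes of the slices enabled by a 5-bit gain code."""
    if not 0 <= code <= 31:
        raise ValueError("gain code must be in the range 0-31")
    # Bit i of the code is assumed to enable slice i (LSB = smallest slice).
    return [w for bit, w in enumerate(SLICE_WEIGHTS) if (code >> bit) & 1]


def relative_strength(code):
    """Total relative strength of the enabled slices (equals the code)."""
    return sum(enabled_slices(code))


if __name__ == "__main__":
    for code in (0, 5, 19, 31):
        print(code, enabled_slices(code), relative_strength(code))
```

Because the slices are binary weighted, every code from 0 to 31 maps to a distinct relative strength, which is what gives the adaptation loop a monotonic control over the amount of subtraction.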

Figure 4.8: Amount of subtraction for the discrete-tap as a function of the gain settings. Simulations are for a TT corner with VDD=1V and are RCc extracted. (x-axis: gain setting (5 bits); y-axis: gain factor for subtraction; curves for BodyBias=0V and BodyBias=1.3V, one panel per input amplitude.)


Figure 4.9: Clockless 2:1 multiplexer schematic. (All widths in µm, L=30nm.)

4.3.2. Clockless Multiplexer

The clockless multiplexer schematic is shown in Fig. 4.9. It has several advantages over the type of multiplexer used in Fig. 2.17. The first advantage is that the multiplexer is connected to the output of the double-tail latch, as shown in Fig. 4.5, as opposed to the output of the SR latch. This allows the delay to be minimized, improving the

performance of the equalizer. The multiplexer is, in fact, two SR latches in parallel

which alternately control the output. When the output of the even path of the DFE is

evaluating, the output of the SR latches in the multiplexer is set by in1 coming from the

even data latch (clocked by CLK0). It therefore latches to the even value. Meanwhile, the odd-path double-tail latch is reset to zero and has no impact on the SR latch

output. Similarly, during the other phase of the half-rate DFE, the odd double-tail latch

output is valid and determines the multiplexer output. Therefore, together the two SR

latches function as a 2:1 clockless multiplexer. The second advantage is the fact that this

multiplexer does not need a separate clock with a different phase applied to it. Recall

that in the previous implementation (chapter 2), the multiplexer required an additional

ILO so that its clock could be adjusted to minimize the delay for the IIR path. In

this implementation, this is no longer required, saving both the ILO power consumption

and additional clock buffers. The downside is that the clockless multiplexer will have

more delay dependency on the input amplitude to the latches compared with a clocked


multiplexer. Having a clocked multiplexer also gives a degree of freedom to adjust the

delay of the IIR feedback path in the rare case that the multiplexer is too fast and is

interfering with the discrete-tap.
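The behaviour of the clockless multiplexer described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: each double-tail latch is modelled as a differential (p, n) pair that drives (0, 0) while it is reset, and the two parallel SR latches are modelled as a single set/reset storage element. The names `sr_mux_output`, `decision`, and `serialize` are hypothetical and do not correspond to the transistor-level circuit of Fig. 4.9.

```python
RESET = (0, 0)  # assumed output state of a double-tail latch while it is reset


def sr_mux_output(even_out, odd_out, q_prev):
    """One evaluation of the clockless 2:1 multiplexer.

    even_out / odd_out are (p, n) output pairs of the two double-tail latches.
    A latch in reset drives (0, 0) and therefore neither sets nor resets the
    output, so the output follows whichever latch is currently evaluating.
    """
    set_q = even_out[0] or odd_out[0]
    reset_q = even_out[1] or odd_out[1]
    if set_q and not reset_q:
        return 1
    if reset_q and not set_q:
        return 0
    return q_prev  # both latches in reset: hold the previous value


def decision(bit):
    """Differential (p, n) output of a double-tail latch that resolved to bit."""
    return (1, 0) if bit else (0, 1)


def serialize(even_bits, odd_bits):
    """Interleave the even and odd decisions as the half-rate clock toggles."""
    q, stream = 0, []
    for even_bit, odd_bit in zip(even_bits, odd_bits):
        q = sr_mux_output(decision(even_bit), RESET, q)  # even path evaluates
        stream.append(q)
        q = sr_mux_output(RESET, decision(odd_bit), q)   # odd path evaluates
        stream.append(q)
    return stream


if __name__ == "__main__":
    print(serialize([1, 0, 1], [0, 0, 1]))
```

No select clock appears anywhere in the model: the reset state of the idle path is what performs the selection, which is why no additional clock phase is needed.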

Fig. 4.10 shows the Clk-to-Q delay of the double-tail latch + the 2:1 clockless mul-

tiplexer. This delay needs to be minimized to cancel the second post-cursor ISI tap

effectively. The delay is shown for different corners as well as different input amplitudes

into the double-tail latch. The delay is also shown as a function of the body bias voltage.

Just as there are requirements for the delay of the DT path, the multiplexer output delay

needs to be less than 2 UI to ensure that the signal settles before feeding back to the DFE

to cancel the 2nd post-cursor ISI. As shown in chapter 2, smaller IIR feedback delay will

lead to a better cancellation of the post-cursor ISI. If the delay through the double-tail

latch or the 2:1 clockless multiplexer increases past the mentioned limits, the DFE will

start to resemble a soft-decision DFE [63]. This is because the information being fed back in the DFE is still resolving and is not yet a rail-to-rail signal (hard decision).
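The two timing limits mentioned above (less than 1 UI for the DT path and less than 2 UI for the multiplexer output) can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names and the example delays are hypothetical and are not simulation results from this work.

```python
def unit_interval_ps(bit_rate_gbps):
    """Duration of one unit interval in picoseconds."""
    return 1e3 / bit_rate_gbps


def meets_feedback_budget(delay_ps, bit_rate_gbps, budget_ui):
    """True if a feedback-path delay is below its budget, given in UI."""
    return delay_ps < budget_ui * unit_interval_ps(bit_rate_gbps)


if __name__ == "__main__":
    rate = 20.0                                  # target bit rate in Gb/s
    print(unit_interval_ps(rate))                # 1 UI at the target rate
    # Hypothetical delays, used only to exercise the check:
    print(meets_feedback_budget(45.0, rate, 1))  # DT path, 1 UI budget
    print(meets_feedback_budget(60.0, rate, 2))  # mux path, 2 UI budget
```

At 20Gb/s one UI is 50ps, so the same 60ps delay that would violate the DT budget still fits within the 2 UI budget of the IIR path.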


Figure 4.10: The Clk-to-Q delay of the double-tail latch + 2:1 clockless mux. (Panels: TT [27°C], FF [-40°C], and SS [80°C] corners with VDD=1V, RCc extracted; y-axis: Clk-to-Q for 2:1 mux (ps); curves for BB = 0mV, 300mV, 600mV, 900mV, and 1300mV.)


4.3.3. IIR Filter Structure

Fig. 4.11 (left) shows the filter bandwidth vs. filter code for two possible implementations of the IIR filter. The first uses binary weighted capacitors (BWCs) with the implementation shown in Fig. 2.18. The alternative implementation uses binary weighted resistors (BWRs) to create the filter. Both approaches can be designed to give the same total tuning range for the filter bandwidth. Using BWCs gives finer resolution in the time constant at low bandwidths, while using BWRs gives uniformly spaced filter bandwidths. Fig. 4.11 (right) shows pulse responses for each of the two cases. The BWR approach allows more granularity in matching the shape of the pulse response to cancel the first few post-cursor ISI terms, whereas the BWC approach gives more accuracy in canceling post-cursor ISI terms that are much further out. Since the first few post-cursor ISI terms affect the performance more severely, a BWR scheme is chosen to allow better cancellation of the dominant post-cursor ISI. The filter implementation

is shown in Fig. 4.12. The resistors are chosen to be binary weighted with a constant

capacitor. The switches are also binary weighted to keep the total resistance in each

branch binary weighted. The filter is designed to have 32 bandwidth settings separated

by 50MHz.
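The difference in code spacing between the two architectures follows from the first-order RC bandwidth, f = 1/(2πRC). The sketch below compares the two under simplifying assumptions: the BWR filter is modelled as a fixed conductance plus `code` unit conductances in parallel with a constant capacitor, and the BWC filter as a constant resistor with a fixed capacitor plus `code` unit capacitors. Apart from the 50MHz step, all component values (including the 200MHz minimum bandwidth) are hypothetical and chosen only so that both filters span the same range.

```python
import math


def bwr_bandwidth_hz(code, g_fixed_s, g_unit_s, c_f):
    """-3 dB bandwidth with binary weighted resistors and a constant capacitor.

    The enabled parallel branches are assumed to add code * g_unit_s of
    conductance to a fixed conductance g_fixed_s.
    """
    return (g_fixed_s + code * g_unit_s) / (2 * math.pi * c_f)


def bwc_bandwidth_hz(code, r_ohm, c_fixed_f, c_unit_f):
    """-3 dB bandwidth with a constant resistor and binary weighted capacitors."""
    return 1.0 / (2 * math.pi * r_ohm * (c_fixed_f + code * c_unit_f))


# Illustrative values only (assumptions, apart from the 50 MHz step).
C_F = 130e-15                        # constant capacitor (assumed)
STEP_HZ = 50e6                       # bandwidth step quoted in the text
F_MIN_HZ = 200e6                     # hypothetical lowest bandwidth setting
G_UNIT_S = 2 * math.pi * C_F * STEP_HZ
G_FIXED_S = 2 * math.pi * C_F * F_MIN_HZ

bwr = [bwr_bandwidth_hz(k, G_FIXED_S, G_UNIT_S, C_F) for k in range(32)]

# Size the BWC filter to cover the same range (code 0 = highest bandwidth).
R_OHM = 485.0                        # assumed resistor value
C_FIXED_F = 1.0 / (2 * math.pi * R_OHM * bwr[-1])
C_UNIT_F = (1.0 / (2 * math.pi * R_OHM * bwr[0]) - C_FIXED_F) / 31
bwc = [bwc_bandwidth_hz(k, R_OHM, C_FIXED_F, C_UNIT_F) for k in range(32)]

bwr_steps = [b - a for a, b in zip(bwr, bwr[1:])]   # uniform spacing
bwc_steps = [a - b for a, b in zip(bwc, bwc[1:])]   # shrinks at low bandwidth

if __name__ == "__main__":
    print("BWR steps (MHz):", [round(s / 1e6, 1) for s in bwr_steps[:3]], "...")
    print("BWC steps (MHz):", [round(s / 1e6, 1) for s in bwc_steps[:3]], "...",
          round(bwc_steps[-1] / 1e6, 1))
```

In this model the BWR bandwidth is linear in the code, while the BWC bandwidth is proportional to the reciprocal of the total capacitance, so its codes crowd together at the low-bandwidth end of the range.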

4.3.4. Demultiplexer Structure

Fig. 4.13a shows the 2:8 demultiplexer that is implemented in the analog domain. The overall system requires a 1:64 demultiplexer for both data and edge samples. The initial 1:2 demultiplexing is performed by the latches in the half-rate DFE as shown in Fig. 4.5. The 2:8 demultiplexer is built from 1:2 demultiplexer stages using the TSPC latch architecture, as shown in Fig. 4.13b. Each 1:2 demultiplexer samples the data on both rising and falling edges of the clock; however, its output only changes on the rising edge of the clock. This helps with the alignment of the signals required by the next demultiplexer stage. The TSPC latch is shown in Fig. 4.13c.


Figure 4.11: (left) Filter code vs. filter bandwidth for two different IIR filter architectures. (right) pulse responses for the two architectures of binary weighted resistors (BWR) and binary weighted capacitors (BWC), shown for codes 0-4 and codes 27-31.

Figure 4.12: IIR filter implementation using binary weighted resistors. (Branches of 16R, 8R, 4R, 2R, and R are switched by C[0] to C[4] with 1X, 2X, 4X, 8X, and 16X switches; R=485Ω, capacitor 130fF, T-gate NMOS = 1µm, T-gate PMOS = 2µm, L=30nm.)

Figure 4.13: (a) 2:8 demultiplexer architecture including clock dividers. (b) a 1:2 demultiplexer implementation using TSPC latches. (c) TSPC latch schematic.


The remaining 8:64 demultiplexer is implemented in the digital domain using synthesized

logic. The clock dividers are implemented using TSPC latches and are shared between

the data and edge demultiplexers. The dividers generate appropriate phases to ensure

the sampling point on each of the demultiplexers is near the center of the eye opening.
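The per-lane data rate after each demultiplexing stage follows directly from the stage ratios. The sketch below assumes the partitioning described above (1:2 in the DFE latches, a factor of 4 in the analog 2:8 stage, and a factor of 8 in the synthesized 8:64 stage); the function name `demux_lane_rates` is hypothetical.

```python
def demux_lane_rates(bit_rate_gbps, stage_ratios=(2, 4, 8)):
    """Return (number of lanes, per-lane rate in Gb/s) after each stage."""
    lanes, result = 1, []
    for ratio in stage_ratios:
        lanes *= ratio
        result.append((lanes, bit_rate_gbps / lanes))
    return result


if __name__ == "__main__":
    for rate in (20.0, 16.0):
        print(rate, demux_lane_rates(rate))
```

At the 20Gb/s target, the 64 parallel lanes each run at 312.5Mb/s, which is why the adaptation and clock recovery logic can be synthesized from standard cells.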

4.3.5. Phase Interpolator based Clock Recovery

The digital CRU uses the same 64 demultiplexed edge and data samples as the adaptation

algorithm to perform the phase detection and align the clock. Fig. 4.14 shows the block

diagram of the CRU. The bang-bang phase detector logic looks at all the 64 bits of

incoming data and edge samples to determine the number of early/late clock occurrences.

The early/late outputs are then subtracted and passed through two different paths with gains of Kp (proportional path) and Ki (integral path). The proportional path tracks variations in phase and adjusts the phase of the recovered clock, while the integral path helps the phase rotator based CRU track frequency offsets between the incoming data and the forwarded clock. The gains of both the proportional and integral paths are adjustable by factors of 2, and the integral path can also be disabled. The

outputs of the two paths are then integrated producing a 24-bit output. The output

is then truncated to the 7 MSB bits which are converted to thermometer and one-hot

encoded signals for the phase rotator. The truncation acts as an averaging so that the

phase code is only updated when the 7 MSB bits change out of the total 24 bits used at

the output of the integrator.
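The signal flow described above can be sketched as a behavioural model. This is not the synthesized logic of this work: the early/late decision rule is the standard bang-bang (Alexander) rule, and the sign convention, the exact placement of the integrators, and the names `bang_bang_votes` and `LoopFilter` are assumptions made for illustration.

```python
def bang_bang_votes(data, edge):
    """Count early/late votes from data and edge samples.

    edge[i] is assumed to be sampled between data[i] and data[i + 1].
    """
    early = late = 0
    for i in range(len(data) - 1):
        if data[i] == data[i + 1]:
            continue                      # no transition, no phase information
        if edge[i] == data[i]:
            early += 1                    # edge sample taken before the transition
        else:
            late += 1                     # edge sample taken after the transition
    return early, late


class LoopFilter:
    """Proportional-integral filter followed by a wrapping phase accumulator."""

    def __init__(self, kp, ki, acc_bits=24, code_bits=7):
        self.kp, self.ki = kp, ki
        self.acc_bits, self.code_bits = acc_bits, code_bits
        self.integral = 0      # integral-path state (tracks frequency offset)
        self.phase_acc = 0     # 24-bit phase accumulator

    def update(self, early, late):
        error = early - late   # sign convention is an assumption
        self.integral += self.ki * error
        step = self.kp * error + self.integral
        self.phase_acc = (self.phase_acc + step) % (1 << self.acc_bits)
        # Only the MSBs reach the phase rotator; the LSBs act as averaging.
        return self.phase_acc >> (self.acc_bits - self.code_bits)


if __name__ == "__main__":
    print(bang_bang_votes([0, 1, 1, 0], [0, 1, 1]))
    loop = LoopFilter(kp=1 << 11, ki=0)
    print([loop.update(64, 0) for _ in range(4)])
```

With Kp = 2^11 and 64 early votes per update, the accumulator advances by 2^17 per update, which is exactly one step of the 7-bit code; smaller net votes need several updates before the code changes, which is the averaging effect of the truncation.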

The CRU has several other programmable features which can be used to modify the

tracking bandwidth of the CDR. The skew between the data sampling clocks and edge

clocks can be adjusted. This will allow calibration to make sure the phases are in fact

90 degrees apart. The phase rotator code is down sampled by 7X and can be monitored

off-chip. This allows the phase code vs. time to be plotted and to see the locking

behaviour of the CDR during testing. The CRU employs a power saving mode where


Figure 4.14: Phase rotator based digital CRU block diagram.

some of the phase rotators are turned off when they are not being used. Fig. 4.15a shows

a behavioural simulation of the CRU phase code as a function of time for a case with

no frequency offset. Fig. 4.15b shows the same simulation but with a 50ppm frequency

offset between the incoming data and the receiver clock. In this case, the phase needs to

continuously wrap around to track any frequency changes.
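The rate at which the phase must advance under a frequency offset can be estimated with simple arithmetic. The sketch below is illustrative; the function names are hypothetical, and the 20Gb/s bit rate used in the example is an assumption (the target rate of the chip), not a stated parameter of the behavioural simulation.

```python
def ui_per_ui_of_drift(offset_ppm):
    """Number of UIs after which the accumulated phase drift reaches 1 UI."""
    return 1e6 / offset_ppm


def drift_period_us(offset_ppm, bit_rate_gbps):
    """Time in microseconds for the phase to drift by 1 UI."""
    ui_per_us = bit_rate_gbps * 1e3
    return ui_per_ui_of_drift(offset_ppm) / ui_per_us


if __name__ == "__main__":
    print(ui_per_ui_of_drift(50.0))
    print(drift_period_us(50.0, 20.0))
```

A 50ppm offset therefore accumulates one full UI of phase error every 20,000 UI, so the phase code has to keep moving in one direction and wrap around indefinitely.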

Figure 4.15: (a) phase code vs. time for the digital CRU without frequency offset. (b) phase code vs. time for the digital CRU with 50 ppm frequency offset.


4.3.6. Adaptation Simulation Results

Fig. 4.16(left) shows the insertion loss for backplane and coax channels of different length,

while Fig. 4.16(right) shows the normalized 20Gb/s pulse response. The adaptation en-

gine and CRU were behaviourally simulated working together for each of these channels.

The adaptation curves are shown in Fig. 4.17 for the DFE coefficients: the gain of the

discrete-tap G1, the gain of the IIR Filter B1, and the IIR time-constant setting τ1 where

each coefficient can be varied between 0-31. Since the IIR time constant is updated less

frequently than the other two coefficients it limits the adaptation time.

4.4. Measurement Results

4.4.1. Measurement Setup

The 28nm FDSOI chip die photo along with an area breakdown is shown in Fig. 4.18.

The measurement setup is shown in Fig. 4.19. An Agilent N4951B Pattern Generator

is used to provide PRBS data to the chip. A Centellax TG1B1-A BERT unit is used

to measure BER of the half-rate output data transmitted off-chip. An Agilent N4960A

clock synthesizer provides a half-rate, 8GHz, clock to the DUT. The clock synthesizer also

provides another 8GHz clock for the BERT. For jitter tolerance measurements, either the

jittered or divided clock is used for the BERT as shown in Fig. 4.19. An Agilent DSA-X

91604A Digital Signal Analyzer is used to capture the adaptation curves and phase code

vs. time from the chip.

A PC communicates with an Atmel micro-controller through the USB port to set all the required digital control bits. A Matlab GUI allows all the parameters of the chip to be set graphically and programmed into the chip. Fig. 4.20 shows the main GUI window; clicking on any of the sub-blocks opens a new window that shows all the controls for that specific sub-block. In the main window, the body bias can be adjusted for the chip, the shift register can be programmed, and the clock dividers or digital core can be reset.


Figure 4.16: (left) Channel insertion loss for an 8” backplane, 24” backplane, 10 feet coax cable, and 20 feet coax cable. (right) normalized pulse response (20Gb/s) for each of the channels.

Figure 4.17: 1-IIR + 1-DT DFE adaptation curves for various channels showing the tap weights for G1, B1, τ1. (Panels: discrete tap gain (G1), IIR gain (B1), and IIR time constant (τ1); x-axis: UI (10,000); y-axis: coefficient weight [0-31].)

Figure 4.18: Chip die photo and area breakdown. (Annotated die dimensions: 0.9 mm by 1.17 mm.)

Figure 4.19: Measurement setup


All of the settings can be saved and re-loaded into the GUI. The GUI establishes the

COM port to communicate with the Atmel micro-controller.

Fig. 4.21 shows the GUI for the DFE parameters. The initial conditions of the DFE coefficients can be set, or the adaptation can be disabled and the coefficients set externally. The offset for the latches can also be adjusted via on-chip R-2R DACs [62]. A mux selects which of the four comparator outputs is transmitted off-chip. The GUI for the CDR, shown in Fig. 4.22, allows the gain of the proportional/integral paths to be adjusted. The initial code for the PIs can be set as well as the skew between the edge/data clocks.

Fig. 4.23 shows the GUI for the adaptation engine. The gain of the coefficients can be varied and the repeating patterns protect threshold can be adjusted. This GUI also determines which digital information is transmitted off-chip (to be captured using the Agilent DSA-X 91604A Digital Signal Analyzer). This can be the adaptation coefficients, the phase code for the phase rotator, or a snapshot of all the data/edge samples every 128 bits, which can be used to run the adaptation off-chip.

The DUT is packaged in a 36-pin QFN package and mounted on the PCB shown in Fig. 4.24a. This board contains co-planar waveguide strips for the input/output data as well as the input/output clock signals. The board also includes decoupling capacitors of various sizes. This high-speed board plugs into the control board shown in Fig. 4.24b. The control board includes regulators for the various VDDs required on chip as well as the uC and level shifters required to program the chip. All the equipment is controlled through GPIB to allow for automation of bathtub curves and other measurements.

Fig. 4.25a shows a half-rate re-timed output of the chip at 8Gb/s, which has 2.55 ps RMS jitter. The chip contains a mux which allows any of D_EVEN, D_ODD, E_EVEN, or E_ODD to be transmitted off-chip. Fig. 4.25b shows the output clock, which is buffered off-chip, with 1.54 ps RMS jitter. The chip contains a mux which allows any of CLK0, CLK90, CLK180, or CLK270 to be transmitted off-chip.


Figure 4.20: Main GUI screenshot.

Figure 4.21: DFE GUI screenshot.

Figure 4.22: CDR GUI screenshot.

Figure 4.23: Adaptation GUI screenshot.

Figure 4.24: (a) High-frequency board housing the DUT and decoupling capacitors. (b) DC board including regulators, uC, and DACs

Figure 4.25: (a) Half-rate re-timed output eye at 8Gb/s (D_EVEN). (b) Phase rotator output clock (CLK0).


4.4.2. Clock Recovery Measurement Results

Fig. 4.27 shows the measured jitter tolerance with a PRBS7 input at 16Gb/s. The total setup loss is 2.7dB at 8GHz. The setup loss includes the loss of the characterization PCB, which consists of a 1” co-planar waveguide trace on Rogers RO 4003 material, as well as the losses of the QFN package, mainly due to the pad capacitance and the bondwire inductance. In Fig. 4.27, measurements are shown for both mesochronous and plesiochronous (100-150ppm frequency error) half-rate receiver input clocks. Both show similar low-frequency jitter tolerance, demonstrating proper phase rotation as plotted in Fig. 4.26. The jitter tolerance is provided for two different input amplitudes, 2Vpp-diff and 0.8Vpp-diff, and shows only a slight degradation at the lower input amplitude.

4.4.3. DFE Adaptation Measurement Results

To characterize the DFE adaptation algorithm, several channels were used; their insertion losses are shown in section 4.4.3.1. For each of the channels, the DFE was adapted for various input amplitudes and bathtub curves were generated. Repeating patterns were also introduced to demonstrate the functionality of the repeating patterns protect feature.

4.4.3.1. Measured Channel Responses

The measured channel insertion losses are shown in Fig. 4.28. Four different channels

were used for the characterization of the DFE ranging from 15.7dB to 30dB of attenuation

at half the bit-rate. The eye diagrams using a PRBS7 pattern for each channel and an

input amplitude of 2Vpp-diff are shown in Fig. 4.29. These eye diagrams do not include

the characterization PCB + QFN package (2.7dB of loss at half the bit-rate). Fig. 4.30

shows the channel eye diagrams for an input amplitude of 0.8Vpp-diff.


Figure 4.26: Measured phase code vs. time for 0ppm, 100ppm, and 150ppm frequency error. (x-axis: time (us); y-axis: phase code (0-127).)

Figure 4.27: Measured jitter tolerance for 0ppm, 100ppm and 150ppm frequency offset for input amplitudes of 2Vpp-diff and 0.8Vpp-diff. (Measured at 16Gb/s for BER < 10^-12; x-axis: frequency (Hz); y-axis: jitter tolerance (UI pk-to-pk); the 0ppm curves use Kp=2^11, Ki=1.)


Figure 4.28: Measured Channel insertion losses including setup loss. (Curves: setup loss (PCB + QFN package), Ch. 1 (10" FR-4 trace + setup loss), Ch. 2 (14" backplane + setup loss), Ch. 3 (18" backplane + setup loss), and Ch. 4 (26" backplane + setup loss). Annotated losses at 8GHz: 2.7dB, 15.7dB, 22dB, 28dB, and 30dB.)

Figure 4.29: Measured Channel output eye diagrams not including characterization PCB + QFN package for an input amplitude of 2Vpp-diff

Figure 4.30: Measured Channel output eye diagrams not including characterization PCB + QFN package for an input amplitude of 0.8Vpp-diff

4.4.3.2. Measured Channel 1 Results

In addition to PRBS7 inputs, other pattern types are used to test the repeating patterns protect feature and the robustness of the adaptation algorithm. Fig. 4.31 shows the three

different pattern types that were used. For the STM64 Input pattern [64], the length of

the random pattern was shortened to 50,000 to allow the results to be plotted and easily

compared.
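The repeating patterns input of Fig. 4.31A (10,000-bit PRBS7 blocks alternating with 10,000-bit blocks of the repeating words A, B, and C) can be generated with a short script. This is only a sketch: the LFSR uses the standard PRBS7 polynomial x^7 + x^6 + 1, the choice to restart the PRBS in every block is an assumption, and the function names are hypothetical rather than the pattern generator configuration used in the measurements.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 sequence (polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        out.append(new_bit)
    return out


def repeat_to_length(pattern, n_bits):
    """Repeat a bit string such as '1010' until n_bits have been produced."""
    bits = [int(ch) for ch in pattern]
    return [bits[i % len(bits)] for i in range(n_bits)]


# Repeating words as listed in Fig. 4.31A.
PATTERN_A = "0000001100111111"
PATTERN_B = "10101010101010"
PATTERN_C = "111111000000"


def repeating_patterns_input(block_bits=10_000):
    """PRBS, A, PRBS, B, PRBS, C, each block_bits long."""
    sequence = []
    for word in (PATTERN_A, PATTERN_B, PATTERN_C):
        sequence += prbs7(block_bits)          # PRBS restarted per block (assumed)
        sequence += repeat_to_length(word, block_bits)
    return sequence


if __name__ == "__main__":
    seq = repeating_patterns_input()
    print(len(seq), seq[10_000:10_016])
```

Blocks such as B (an alternating pattern) contain transitions on every bit but carry very little information about the channel tail, which is the kind of ill-conditioned input the repeating patterns protect feature is meant to handle.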

The adaptation curves for channel 1 for a PRBS7 input are shown in Fig. 4.32a. Increasing τ corresponds to increasing the IIR filter time constant. All the coefficients converge within 80,000 UI, over an order of magnitude faster than in [24], after which the BER is below 10^-12. Fig. 4.32b shows measured adaptation curves for channel 1 when repeating patterns are inserted. It is evident that the equalizer coefficients are not updated while the repeating patterns are present. With this feature deactivated, the coefficients diverge in Fig. 4.32b and the BER increases when the repeating patterns arise. Fig. 4.32c and 4.32d show the adaptation curves with the STM64 and SSPS64


Figure 4.31: Repeating Patterns used to characterize adaptation robustness.

A) Repeating Patterns Input: PRBS (10,000 bits), A (10,000 bits), PRBS (10,000 bits), B (10,000 bits), PRBS (10,000 bits), C (10,000 bits), where PRBS = PRBS7, A = 0000001100111111, B = 10101010101010, C = 111111000000.

B) STM 64 Input Pattern: A1 (1,536 bits), A2 (1,536 bits), J0 (512 bits), NAT (1,024 bits), Random (1,239,560 bits), where A1 = 11110110, A2 = 00101000, J0 = 00000011, NAT = 10101010, Random = PRBS7.

C) SSPS64 Input Pattern: A1 (1,536 bits), A2 (1,536 bits), NU (1,026 bits), PRBS (4,071 bits), CID (73 bits), PRBS (8,139 bits), where A1 = 11110110, A2 = 00101000, NU = 10101010, PRBS = PRBS28, CID = 1, 72 0's.

patterns [64], respectively. There is some variation in the coefficients from their adapted values once the repeating patterns protect feature is disabled. The variations increase as the duration of the repeating patterns increases.

The bathtub curves for various input amplitudes for channel 1 are shown in Fig. 4.33. There is very little degradation in the horizontal eye opening as the input amplitude is reduced. Fig. 4.33 also shows the adapted coefficient settings for each of the input amplitudes.


4.4.3.3. Measured Channel 2 Results

The adaptation curves for channel 2 are shown in Fig. 4.34. Fig. 4.34a shows the adaptation curves for a PRBS7 input pattern. Fig. 4.34b shows the adaptation curves for the repeating pattern input shown in Fig. 4.31A.

The bathtub curves for various input amplitudes for channel 2 are shown in Fig. 4.35, which also shows the adapted coefficient settings for each of the input amplitudes.


Figure 4.32: Measured Adaptation curves for channel 1 with various types of inputs: (a) PRBS7 input pattern, (b) repeating input pattern with repeating patterns protect ON and OFF, (c) STM64 input pattern, (d) SSPS64 input pattern. (Each panel plots the coefficient settings (0-31) for G (DT gain), B (IIR gain), and τ (IIR bandwidth) vs. time (us).)


Amplitude (pk-to-pk diff)   G [0-31]   B [0-31]   τ [0-31]
2 V                         5          13         16
1.6 V                       5          11         15
1.2 V                       4          9          15
0.8 V                       3          6          15
0.6 V                       2          5          12

Figure 4.33: Measured Channel 1 bathtub curves for various input amplitudes along with the adapted coefficient values.

Figure 4.34: Measured Adaptation curves for channel 2 with various types of inputs: (a) PRBS7 input pattern, (b) repeating input pattern. (Each panel plots the coefficient settings (0-31) for G (DT gain), B (IIR gain), and τ (IIR bandwidth) vs. time (us).)


Amplitude (pk-to-pk diff)   G [0-31]   B [0-31]   τ [0-31]
2 V                         5          14         12
1.6 V                       5          11         12
1.2 V                       4          9          10
0.8 V                       3          6          8
0.6 V                       2          5          9

Figure 4.35: Measured Channel 2 bathtub curves for various input amplitudes along with the adapted coefficient values.


4.4.3.4. Measured Channel 3 Results

The adaptation curves for channel 3 are shown in Fig. 4.36. The discrete-tap gain was set manually in these measurements. In all edge based adaptation algorithms [24], [61], [60], removing the edge ISI leads to the best horizontal eye opening. However, even though the edge ISI is canceled, the first post-cursor tap h1 may not be perfectly canceled. This can lead to a scenario where the vertical eye opening is not sufficient for the latches to operate at the required speed. In the measurements for channels 3 and 4, insufficient cancellation of h1 prevented the system from achieving a BER of 10^-12 with the adapted coefficients. To show that this is in fact the problem, the discrete-tap was fixed and the IIR coefficients were adapted. The discrete-tap value is chosen such that there is enough cancellation of h1 to allow for an eye opening larger than the latch sensitivity. The bathtub curves for channel 3 are shown in Fig. 4.37 for various input amplitude voltages. The discrete-tap is manually set for each of the input amplitudes.


4.4.3.5. Measured Channel 4 Results

The adaptation and bathtub curves for channel 4 are shown in Fig. 4.38 and Fig. 4.39, respectively. The DT gain was set manually in these results (explained in section 4.4.3.4).


Figure 4.36: Measured Adaptation curves for channel 3 with a PRBS7 input (DT Gain, G, is set manually). (Coefficient settings (0-31) for G (DT gain), B (IIR gain), and τ (IIR bandwidth) vs. time (us).)

Amplitude (pk-to-pk diff)   G [0-31]   B [0-31]   τ [0-31]
2 V                         11         15         15
1.6 V                       10         12         8
1.2 V                       8          10         0
0.8 V                       6          7          0

Figure 4.37: Measured Channel 3 bathtub curves for various input amplitudes along with the adapted coefficient values (DT Gain, G, is set manually).


Figure 4.38: Measured Adaptation curves for channel 4 with a PRBS7 input (DT Gain, G, is set manually). (Coefficient settings (0-31) for G (DT gain), B (IIR gain), and τ (IIR bandwidth) vs. time (us).)

Amplitude (pk-to-pk diff)   G [0-31]   B [0-31]   τ [0-31]
2 V                         13         16         8
1.6 V                       12         14         10
1.2 V                       9          12         10
1.0 V                       7          10         10
0.8 V                       7          8          14

Figure 4.39: Measured Channel 4 bathtub curves for various input amplitudes along with the adapted coefficient values (DT Gain, G, is set manually).


4.4.4. Clock Recovery and DFE Measurement Results

To characterize the clock recovery and the DFE together, jitter tolerance was measured with channel 1. The DFE coefficients were fixed at their adapted point during the measurement. Fig. 4.40 shows the measured jitter tolerance for channel 1 with 0ppm, 100ppm, and 150ppm frequency offsets for both 2Vpp-diff and 0.8Vpp-diff input amplitudes. The degradation at the lower input amplitude with a frequency offset is due to the slight amplitude variation in the phase interpolator's output at different phase codes. Different amplitudes from the phase rotator cause different latch delays, which affects the BER of the DFE. This effect is demonstrated in Fig. 4.41 by showing bathtub curves for channel 3 with various phase code settings while the DFE coefficients are kept constant. The bathtub curves are generated by varying the phase of the clock externally on the Agilent N4960A clock synthesizer while keeping the phase rotator code constant.

Figure 4.40: Measured jitter tolerance for 0ppm, 100ppm and 150ppm frequency offset for input amplitudes of 2Vpp-diff and 0.8Vpp-diff for channel 1. (Measured at 16Gb/s for BER < 10^-12; x-axis: frequency (Hz); y-axis: jitter tolerance (UI pk-to-pk); curves: 0ppm with Kp=2^11, Ki=1; 100ppm with Kp=2^10, Ki=1; 150ppm with Kp=2^9, Ki=1; each at Vin=2Vpp-diff and Vin=0.8Vpp-diff.)

Figure 4.41: Measured bathtub curves showing the degradation in eye opening with different PI codes. (Measured at 16Gb/s; x-axis: UI; y-axis: BER; curves for PI codes 0, 5, 10, 15, 20, 25, and 30.)

4.4.5. Performance Comparison

Table 4.1 shows the power breakdown for the chip at 16 Gb/s. The adaptation engine, CRU, and 8:64 demux consume 24.3 mW and occupy an area of 41,000 µm². Table 4.2 compares this work with previous work. The adaptation algorithm in this work is at least 18X faster, measured in UI, than other IIR adaptation implementations. The DFE data path consumes 0.99 mW/Gbps operating at 16 Gb/s while equalizing 30 dB.
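The efficiency and adaptation-speed figures quoted above follow directly from the entries in Table 4.2. The short sketch below reproduces that arithmetic. The function names are hypothetical, and the inputs are the data-path power, data rates, and adaptation times listed in the table for this work and for [24].

```python
def mw_per_gbps(power_mw: float, data_rate_gbps: float) -> float:
    """Energy efficiency of the data path in mW/Gbps."""
    return power_mw / data_rate_gbps


def seconds_to_ui(time_s: float, data_rate_gbps: float) -> float:
    """Convert an adaptation time into unit intervals at the given data rate."""
    return time_s * data_rate_gbps * 1e9


if __name__ == "__main__":
    efficiency = mw_per_gbps(15.8, 16)       # this work: 15.8 mW at 16 Gb/s
    ui_this_work = seconds_to_ui(5e-6, 16)   # this work: 5 us at 16 Gb/s
    ui_ref_24 = seconds_to_ui(250e-6, 6)     # [24]: 250 us at 6 Gb/s
    print(f"Efficiency: {efficiency:.2f} mW/Gbps")
    print(f"Adaptation: {ui_this_work:.0f} UI vs {ui_ref_24:.0f} UI")
    print(f"Speed-up in UI: {ui_ref_24 / ui_this_work:.2f}x")
```

Measured in UI, the ratio is 1.5x10^6 / 80,000 = 18.75, which is consistent with the "at least 18X" figure. Measured in absolute time, the same two entries give 250 us / 5 us = 50X.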

4.5. Conclusion

A 16 Gb/s 1 IIR + 1 DT DFE was demonstrated in 28nm FD-SOI CMOS with integrated clock recovery and adaptation. The novel edge-based adaptation algorithm reuses the high-speed circuitry and signals required for clock recovery, is robust in the presence of ill-conditioned data statistics, and converges more than an order of magnitude faster than previous techniques.


Table 4.1: Chip Power Breakdown

| Block | Power Consumption |
|---|---|
| DFE (Data + Edge Latches + 2:1 Mux) | 27 mW |
| Demux (2:8) | 14.4 mW |
| Digital Logic (Adaptation, CRU, 8:64 demux) | 24.3 mW |
| Phase Rotator | 41.9 mW |
| Clock Buffers | 33.5 mW |

Table 4.2: Comparison to previous work.

| | [9] | [24] | [25] | [49] | [44] | This Work |
|---|---|---|---|---|---|---|
| Data Rate (Gb/s) | 10 | 6 | 10 | 5 | 10 | 16 |
| Architecture | 1 IIR + 1 DT DFE | 1 IIR DFE | 2 IIR DFE | 1 IIR + 1 DT + Pass. EQ | 2 IIR + 1 DT DFE | 1 IIR + 1 DT DFE |
| Loss @ half bitrate | 27 dB | 32.7 dB | 35 dB | 15 dB | 24 dB | 28 dB |
| Technology | 65nm | 90nm | 65nm | 65nm (LP) | 28nm (LP) | 28nm FDSOI |
| Supply (V) | 1 | NA | 1 | 1.2 | 1 | 1 |
| DFE Power (mW) [Data path only] | 3.5 | 4 | 9.9 | 2.3 | 4.1 | 15.8 |
| mW/Gbps | 0.35 | 0.67 | 0.99 | 0.46 | 0.41 | 0.99 |
| Area (µm²) | 17,250 | 89,000 | 30,400 (a) | 23,321 | 8,760 (a) | 8,100 (a) |
| Adaptive Equalization | NO | YES | NO | YES | NO | YES |
| Adaptation Time | - | 250 us (1.5x10^6 UI) | - | Off chip (85 min, 2.55x10^13 UI) | - | 5 us (80,000 UI) |

(a) Area of DFE core only

5 Conclusion

To address the need for energy-efficient high-speed I/O, IIR DFEs were analyzed and implemented. Section 1.1 provided background on conventional receiver architectures that are used to tackle challenges in high-speed low-power links. A procedure was outlined for determining an optimal DT + IIR DFE architecture given a channel response. The procedure observed the vertical and horizontal eye opening to determine the number of DT taps (K) and the number of IIR filters (N) that would provide the best eye opening. For a 16-inch backplane and a 50-meter coax cable, values of K=1 and N=2 were found to provide good performance while keeping the system complexity low. This procedure was published in [42].

The first ever hybrid DFE combining a DT tap with N>1 IIR filters (1 DT + 2 IIR) was implemented in a 28nm LP CMOS process and measured. The design included a passive-equalizer front end, which allowed for a comparison between passive equalizers and the DFE. An energy efficiency of 0.41 mW/Gbps was achieved while equalizing 24 dB at half the bit-rate with a transmit swing of only 150 mVpp-diff. This work was published in [43] and [44].

An edge-based adaptation algorithm for a conventional DT DFE that is faster and more robust than previously implemented algorithms was presented. Specifically, the algorithm was compared with two previous publications [49], [24] and showed at least an 18X reduction in adaptation time and was more robust in the presence of repeating and idle patterns. The edge-based adaptation was modified and extended to work with a 1 DT + 1 IIR DFE. A 1 DT + 1 IIR adaptive DFE along with clock recovery was implemented in a 28nm SOI CMOS process. Measurement results showed the DFE could equalize 30 dB at half the bit-rate operating at 16 Gb/s. The adaptation algorithm was tested with PRBS patterns as well as several other repeating patterns. It was demonstrated that the repeating-pattern protection feature stopped the coefficients from diverging during periods of idle/repeating patterns. This work has been accepted for publication in [65].

5.1. Future Work

To improve the performance of IIR DFEs further, some additional features can be investigated. These include improving specific implementation details of the edge-based adaptive IIR DFE in Chapter 4 as well as extending the adaptation to DFEs with multiple IIR filters.

5.1.1. Edge Based Adaptation Improvements

As discussed in Sections 4.4.3.4 and 4.4.3.5, the edge-based adaptation does not lead to an optimal vertical eye opening at the sampling point. Instead, the algorithm optimizes the horizontal eye opening and removes the ISI at the edges. It was also shown that increasing the gain of the DT tap to better cancel h1 instead of h1.5 can lead to a better BER for certain channels. A good addition to the DFE would be to allow the DFE data latches and edge latches to receive different coefficients from the adaptation algorithm for the DT tap [66, 67]. This would allow for an offset between the DT gain for the data and edge latches, which could lead to a significant improvement in the DFE performance. Using this coefficient offset, the data latches can cancel the ISI for the optimal vertical eye opening while the edge latches continue to optimize the horizontal eye opening. This technique can be used alongside the proposed adaptation algorithm with very little overhead. The only additional parameter that would need to be adapted is the gain offset between the data and edge latches. This can be done by using the information already available from the algorithm for h1.5 to h4.5 and extrapolating to obtain an estimate of the value of h1.
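One possible way to perform such an extrapolation is sketched below. It is not the method used in this work: it assumes, purely for illustration, that the post-cursor tail decays as a single exponential, fits a straight line to the logarithm of the edge samples h1.5 to h4.5, and evaluates the fit at t = 1 UI. The function name and the sample values are hypothetical.

```python
import math


def extrapolate_h1(edge_samples: dict, t_target: float = 1.0) -> float:
    """Estimate h(t_target) from {time_in_UI: value} samples.

    Assumes h(t) = A * exp(-t / tau), so log(h) is linear in t and an
    ordinary least-squares line can be extrapolated to t_target.
    """
    times = sorted(edge_samples)
    if len(times) < 2:
        raise ValueError("need at least two samples")
    if any(edge_samples[t] <= 0 for t in times):
        raise ValueError("samples must be positive for a log-domain fit")

    logs = [math.log(edge_samples[t]) for t in times]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return math.exp(intercept + slope * t_target)


if __name__ == "__main__":
    # Hypothetical edge samples from an exponential tail with tau = 2 UI.
    samples = {t: 0.5 * math.exp(-t / 2.0) for t in (1.5, 2.5, 3.5, 4.5)}
    h1_estimate = extrapolate_h1(samples)
    print(f"estimated h1 = {h1_estimate:.4f}")
    print(f"gain offset (h1 - h1.5) = {h1_estimate - samples[1.5]:.4f}")
```

The difference between the extrapolated h1 and the adapted h1.5 would then serve as the gain offset between the data and edge latches. A real channel tail is not a pure exponential, so the estimate would only be a starting point.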

5.1.2. Adapting Multiple IIR DFEs

In Chapter 2, it was shown that a 2 IIR + 1 DT DFE provided good performance while keeping the system complexity low. In Chapter 4, a 1 IIR + 1 DT DFE was implemented due to the complexity of the adaptation for multiple IIR filters. The next step would be an adaptation algorithm that can adapt multiple IIR filters and DT taps together. This would allow the DFE architecture that leads to the optimal vertical and horizontal eye opening to be used. Having the flexibility to adapt multiple IIR filters would also extend the applications of the DFE to more channel response shapes. The main issue that needs to be resolved is the interaction between the IIR filter responses. Since each filter cancels multiple post-cursor ISI terms, there needs to be a way to remove their interactions when adapting the filter gains and time constants.
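The interaction can be illustrated with a simple model. The sketch below assumes each IIR filter contributes a first-order exponential tail, gain * exp(-t / tau), and measures how similar two such tails look at the sampled post-cursor positions using a normalized correlation. The model, the function names, and the sample times are assumptions made for illustration only.

```python
import math


def iir_tail(gain: float, tau_ui: float, t_ui: float) -> float:
    """First-order IIR feedback tail, assumed to be gain * exp(-t / tau)."""
    return gain * math.exp(-t_ui / tau_ui)


def tail_overlap(tau_a: float, tau_b: float, sample_times) -> float:
    """Normalized correlation between two unit-gain tails at sample_times.

    A value close to 1 means a gain change in one filter is nearly
    indistinguishable from a gain change in the other.
    """
    a = [iir_tail(1.0, tau_a, t) for t in sample_times]
    b = [iir_tail(1.0, tau_b, t) for t in sample_times]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm


if __name__ == "__main__":
    times = [1.5 + k for k in range(10)]  # hypothetical edge sample points in UI
    for tau_a, tau_b in ((1.0, 8.0), (4.0, 8.0)):
        overlap = tail_overlap(tau_a, tau_b, times)
        print(f"tau = {tau_a} UI and {tau_b} UI: overlap = {overlap:.3f}")
```

Under these assumptions, filters with similar time constants overlap strongly, so their gain updates are highly coupled. Well-separated time constants reduce the overlap but do not eliminate it.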

References

[1] Google Inc. Google data centers. http://www.google.ca/about/datacenters/inside/, 2012.

[2] J.D.H. Alexander. “Clock recovery from random binary signals”. Electronics Letters, 11(22):541–542, 30 Oct 1975.

[3] Sandvine Incorporated. Global internet phenomena report. Technical report, Sandvine Incorporated, https://www.sandvine.com/downloads/general/global-internet-phenomena/2014/2h-2014-global-internet-phenomena-report.pdf, 2014.

[4] A. Healey. “F6: I/O design at 25Gb/s and beyond: Enabling the future communication infrastructure for big data - Challenges and solutions for next-generation 40 to 56 Gb/s transceivers”. In Solid-State Circuits Conference - (ISSCC), 2015 IEEE International, pages 1–2, Feb 2015.

[5] J. Cao. “F6: I/O design at 25Gb/s and beyond: Enabling the future communication infrastructure for big data - ADC-and-DAC-based Transceivers for 100Gb Ethernet”. In Solid-State Circuits Conference - (ISSCC), 2015 IEEE International, pages 1–2, Feb 2015.

[6] O. Agazzi. “F6: I/O design at 25Gb/s and beyond: Enabling the future communication infrastructure for big data - Digital Signal Processing Chips for 100-400Gb/s Optical Communications”. In Solid-State Circuits Conference - (ISSCC), 2015 IEEE


[7] G.G. Shahidi. “Evolution of CMOS Technology at 32 nm and Beyond”. In Custom Integrated Circuits Conference, 2007. CICC ’07. IEEE, pages 413–416, Sept 2007.

[8] Y.-H. Song, H.-W. Yang, H. Li, P.Y. Chiang, and S. Palermo. “An 8-16 Gb/s, 0.65-1.05 pJ/b, Voltage-Mode Transmitter With Analog Impedance Modulation Equalization and Sub-3 ns Power-State Transitioning”. Solid-State Circuits, IEEE Journal of, 49(11):2631–2643, Nov 2014.

[9] Byungsub Kim, Yong Liu, T.O. Dickson, J.F. Bulzacchelli, and D.J. Friedman. “A 10-Gb/s Compact Low-Power Serial I/O With DFE-IIR Equalization in 65-nm CMOS”. Solid-State Circuits, IEEE Journal of, 44(12):3526–3538, Dec 2009.

[10] J. Savoj, K. Hsieh, P. Upadhyaya, Fu-Tai An, J. Im, Xuewen Jiang, J. Kamali, Kang Wei Lai, D. Wu, E. Alon, and Ken Chang. “Design of high-speed wireline transceivers for backplane communications in 28nm CMOS”. In Custom Integrated Circuits Conference (CICC), 2012 IEEE, pages 1–4, Sept 2012.

[11] Ki Jin Han, Xiaoxiong Gu, Y.H. Kwark, Lei Shan, and M.B. Ritter. “Modeling On-Board Via Stubs and Traces in High-Speed Channels for Achieving Higher Data Bandwidth”. Components, Packaging and Manufacturing Technology, IEEE Transactions on, 4(2):268–278, Feb 2014.

[12] J. Poulton, R. Palmer, A.M. Fuller, T. Greer, J. Eyles, W.J. Dally, and M. Horowitz. “A 14-mW 6.25-Gb/s Transceiver in 90-nm CMOS”. Solid-State Circuits, IEEE Journal of, 42(12):2745–2757, Dec 2007.

[13] M. Mansuri, J.E. Jaussi, J.T. Kennedy, Tzu-Chien Hsueh, S. Shekhar, G. Balamurugan, F. O’Mahony, C. Roberts, R. Mooney, and B. Casper. “A Scalable 0.128-1 Tb/s, 0.8-2.6 pJ/bit, 64-Lane Parallel I/O in 32-nm CMOS”. Solid-State Circuits, IEEE Journal of, 48(12):3229–3242, Dec 2013.

[14] S. Gondi and B. Razavi. “Equalization and Clock and Data Recovery Techniques for 10-Gb/s CMOS Serial-Link Receivers”. Solid-State Circuits, IEEE Journal of, 42(9):1999–2011, Sept 2007.

[15] C. Thakkar, N. Narevsky, C.D. Hull, and E. Alon. “A mixed-signal 32-coefficient RX-FFE 100-coefficient DFE for an 8Gb/s 60GHz receiver in 65nm LP CMOS”. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE

[16] M.H. Nazari and A. Emami-Neyestanak. “A 15-Gb/s 0.5-mW/Gbps Two-Tap DFE Receiver With Far-End Crosstalk Cancellation”. Solid-State Circuits, IEEE Journal of, 47(10):2420–2432, Oct 2012.

[17] T.O. Dickson, J.F. Bulzacchelli, and D.J. Friedman. “A 12-Gb/s 11-mW Half-Rate Sampled 5-Tap Decision Feedback Equalizer With Current-Integrating Summers in 45-nm SOI CMOS Technology”. Solid-State Circuits, IEEE Journal of, 44(4):1298–1305, April 2009.

[18] T. Toifl, C. Menolfi, M. Ruegg, R. Reutemann, D. Dreps, T. Beukema, A. Prati, D. Gardellini, M. Kossel, P. Buchmann, M. Brandli, P.A. Francese, and T. Morf. “A 2.6 mW/Gbps 12.5 Gbps RX With 8-Tap Switched-Capacitor DFE in 32 nm CMOS”. Solid-State Circuits, IEEE Journal of, 47(4):897–910, April 2012.

[19] Huaide Wang and Jri Lee. “A 21-Gb/s 87-mW Transceiver With FFE/DFE/Analog Equalizer in 65-nm CMOS Technology”. Solid-State Circuits, IEEE Journal of, 45(4):909–920, April 2010.

[20] F. Zhong, Shaolei Quan, Wing Liu, P. Aziz, Tai Jing, Jen Dong, C. Desai, Hairong Gao, M. Garcia, G. Hom, T. Huynh, H. Kimura, R. Kothari, Lijun Li, C. Liu, S. Lowrie, K. Ling, A. Malipatil, R. Narayan, T. Prokop, C. Palusa, A. Rajashekara, A. Sinha, C. Zhong, and E. Zhang. “A 1.0625 to 14.025 Gb/s Multi-Media Transceiver With Full-Rate Source-Series-Terminated Transmit Driver and Floating-Tap Decision-Feedback Equalizer in 40 nm CMOS”. Solid-State Circuits, IEEE Journal of, 46(12):3126–3139, Dec 2011.

[21] J.L. Zerbe, C.W. Werner, V. Stojanovic, F. Chen, J. Wei, G. Tsang, D. Kim, W.F. Stonecypher, A. Ho, T.P. Thrush, R.T. Kollipara, M.A. Horowitz, and K.S. Donnelly. “Equalization and clock recovery for a 2.5-10-Gb/s 2-PAM/4-PAM backplane transceiver cell”. Solid-State Circuits, IEEE Journal of, 38(12):2121–2130, Dec 2003.

[22] A. Emami-Neyestanak, A. Varzaghani, J.F. Bulzacchelli, A. Rylyakov, C.-K.K. Yang, and D.J. Friedman. “A 6.0-mW 10.0-Gb/s Receiver With Switched-Capacitor Summation DFE”. Solid-State Circuits, IEEE Journal of, 42(4):889–896, April 2007.

[23] Seuk Son, Han-Seok Kim, Myeong-Jae Park, Kyunghoon Kim, E-Hung Chen, B. Leibowitz, and Jaeha Kim. “A 2.3-mW, 5-Gb/s Low-Power Decision-Feedback Equalizer Receiver Front-End and its Two-Step, Minimum Bit-Error-Rate Adaptation Algorithm”. Solid-State Circuits, IEEE Journal of, 48(11):2693–2704, Nov 2013.

[24] Yi-Chieh Huang and Shen-Iuan Liu. “A 6Gb/s receiver with 32.7dB adaptive DFE-IIR equalization”. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2011 IEEE International, pages 356–358, Feb 2011.

[25] O. Elhadidy and S. Palermo. “A 10 Gb/s 2-IIR-tap DFE receiver with 35 dB loss compensation in 65-nm CMOS”. In VLSI Circuits (VLSIC), 2013 Symposium on, pages C272–C273, June 2013.

[26] N. Sitthimahachaikul, J.P. Keane, and P.J. Hurst. “An adaptive DFE using an IIR feedback equalizer for 100Base-TX Ethernet”. In Circuits and Systems, 2004. NEWCAS 2004. The 2nd Annual IEEE Northeast Workshop on, pages 173–176, June 2004.

[27] E. Mensink, D. Schinkel, E.A.M. Klumperink, E. van Tuijl, and B. Nauta. “Power Efficient Gigabit Communication Over Capacitively Driven RC-Limited On-Chip Interconnects”. Solid-State Circuits, IEEE Journal of, 45(2):447–457, Feb 2010.

[28] E. Mensink, D. Schinkel, E. Klumperink, E. van Tuijl, and B. Nauta. “A 0.28pJ/b 2Gb/s/ch Transceiver in 90nm CMOS for 10mm On-Chip interconnects”. In Solid-State Circuits Conference, 2007. ISSCC 2007. Digest of Technical Papers. IEEE

[29] E-Hung Chen, Jihong Ren, B. Leibowitz, Hae-Chang Lee, Qi Lin, Kyung Oh, F. Lambrecht, V. Stojanovic, J. Zerbe, and C.-K.K. Yang. “Near-Optimal Equalizer and Timing Adaptation for I/O Links Using a BER-Based Metric”. Solid-State Circuits, IEEE Journal of, 43(9):2144–2156, Sept 2008.

[30] V. Stojanovic, A. Ho, B.W. Garlepp, F. Chen, J. Wei, G. Tsang, E. Alon, R.T. Kollipara, C.W. Werner, J.L. Zerbe, and M.A. Horowitz. “Autonomous dual-mode (PAM2/4) serial link transceiver with adaptive equalization and data recovery”. Solid-State Circuits, IEEE Journal of, 40(4):1012–1026, 2005.

[31] Wang-Soo Kim, Chang-Kyung Seong, and Woo-Young Choi. “A 5.4Gb/s adaptive equalizer using asynchronous-sampling histograms”. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2011 IEEE International, pages 358–359, 2011.

[32] D. Dunwell and A.C. Carusone. “Gain and equalization adaptation to optimize the vertical eye opening in a wireline receiver”. In Custom Integrated Circuits Conference (CICC), 2010 IEEE, pages 1–4, 2010.

[33] H. Noguchi, Nobuhide Yoshida, H. Uchida, M. Ozaki, S. Kanemitsu, and S. Wada. “A 40-Gb/s CDR Circuit With Adaptive Decision-Point Control Based on Eye-Opening Monitor Feedback”. Solid-State Circuits, IEEE Journal of, 43(12):2929–2938, Dec 2008.

[34] Behzad Razavi. Design of Integrated Circuits for Optical Communications. McGraw-Hill, 2003.

[35] R. Kreienkamp, Ulrich Langmann, C. Zimmermann, T. Aoyama, and H. Siedhoff. “A 10-Gb/s CMOS clock and data recovery circuit with an analog phase interpolator”. Solid-State Circuits, IEEE Journal of, 40(3):736–743, March 2005.

[36] Nikola Nedovic, A. Kristensson, S. Parikh, S. Reddy, Scott McLeod, N. Tzartzanis, K. Kanda, T. Yamamoto, S. Matsubara, M. Kibune, Y. Doi, S. Ide, Y. Tsunoda, T. Yamabana, T. Shibasaki, Y. Tomita, T. Hamada, M. Sugawara, T. Ikeuchi, N. Kuwata, Hirotaka Tamura, J. Ogawa, and W. Walker. “A 3 Watt 39.8-44.6 Gb/s Dual-Mode SFI5.2 SerDes Chip Set in 65 nm CMOS”. Solid-State Circuits, IEEE Journal of, 45(10):2016–2029, Oct 2010.

[37] T.O. Dickson, Yong Liu, S.V. Rylov, A. Agrawal, Seongwon Kim, Ping-Hsuan Hsieh, J.F. Bulzacchelli, M. Ferriss, H.A. Ainspan, A. Rylyakov, B.D. Parker, M.P. Beakes, C. Baks, Lei Shan, Young Kwark, J.A. Tierno, and D.J. Friedman. “A 1.4 pJ/bit, Power-Scalable 16x12 Gb/s Source-Synchronous I/O With DFE Receiver in 32 nm SOI CMOS Technology”. Solid-State Circuits, IEEE Journal of, 50(8):1917–1931, Aug 2015.

[38] “Optimization Toolbox”. The Mathworks, Inc., 2010.

[39] P.M. Crespo and M.L. Honig. “Pole-zero decision feedback equalization with a rapidly converging adaptive IIR algorithm”. Selected Areas in Communications, IEEE Journal on, 9(6):817–829, Aug 1991.

[40] Z. Ding and R.A. Kennedy. “On the whereabouts of local minima for blind adaptive equalizers”. Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, 39(2):119–123, Feb 1992.

[41] S.C. Ng and S.H. Leung. “On solving the local minima problem of adaptive learning by using deterministic weight evolution algorithm”. In Evolutionary Computation, 2001. Proceedings of the 2001 Congress on, volume 1, pages 251–255, 2001.

[42] S. Shahramian, H. Yasotharan, and A.C. Carusone. “Decision Feedback Equalizer Architectures With Multiple Continuous-Time Infinite Impulse Response Filters”. Circuits and Systems II: Express Briefs, IEEE Transactions on, 59(6):326–330, June 2012.

[43] S. Shahramian and A.C. Carusone. “A 10Gb/s 4.1mW 2-IIR + 1-discrete-tap DFE in 28nm-LP CMOS”. In European Solid State Circuits Conference (ESSCIRC), ESSCIRC 2014 - 40th, pages 439–442, Sept 2014.

[44] S. Shahramian and A. Chan Carusone. “A 0.41 pJ/Bit 10 Gb/s Hybrid 2 IIR and 1 Discrete-Time DFE Tap in 28 nm-LP CMOS”. Solid-State Circuits, IEEE Journal of, 50(7):1722–1735, July 2015.

[45] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl, and B. Nauta. “A Double-Tail Latch-Type Voltage Sense Amplifier with 18ps Setup+Hold Time”. In Solid-State Circuits Conference, 2007. ISSCC 2007. Digest of Technical Papers. IEEE

[46] F. O’Mahony, S. Shekhar, M. Mansuri, G. Balamurugan, J.E. Jaussi, J. Kennedy, B. Casper, D.J. Allstot, and R. Mooney. “A 27Gb/s Forwarded-Clock I/O Receiver Using an Injection-Locked LC-DCO in 45nm CMOS”. In Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International, pages 452–627, Feb 2008.

[47] A. Mirzaei, M.E. Heidari, R. Bagheri, S. Chehrazi, and A.A. Abidi. “Injection-Locked Frequency Dividers based on Ring Oscillators with Optimum Injection for Wide Lock Range”. In VLSI Circuits, 2006. Digest of Technical Papers. 2006 Symposium on, pages 174–175, 2006.

[48] Joonsuk Lee and Beomsup Kim. “A low-noise fast-lock phase-locked loop with adaptive bandwidth control”. Solid-State Circuits, IEEE Journal of, 35(8):1137–1145, Aug 2000.

[49] Seuk Son, Han-Seok Kim, Myeong-Jae Park, Kyunghoon Kim, E-Hung Chen, B. Leibowitz, and Jaeha Kim. “A 2.3-mW, 5-Gb/s Low-Power Decision-Feedback Equalizer Receiver Front-End and its Two-Step, Minimum Bit-Error-Rate Adaptation Algorithm”. Solid-State Circuits, IEEE Journal of, 48(11):2693–2704, Nov 2013.

[50] O. Elhadidy, A. Roshan-Zamir, Hae-Woong Yang, and S. Palermo. “A 32 Gb/s 0.55 mW/Gbps PAM4 1-FIR 2-IIR tap DFE receiver in 65-nm CMOS”. In VLSI Circuits (VLSI Circuits), 2015 Symposium on, pages C224–C225, June 2015.

[51] P. Ossieur, N.A. Quadir, S. Porto, C. Antony, W. Han, M. Rensing, P. O’Brien, and P.D. Townsend. “A 10 Gb/s Linear Burst-Mode Receiver in 0.25um SiGe BiCMOS”. Solid-State Circuits, IEEE Journal of, 48(2):381–390, Feb 2013.

[52] M. Hossain and A.C. Carusone. “5-10 Gb/s 70 mW Burst Mode AC Coupled Receiver in 90-nm CMOS”. Solid-State Circuits, IEEE Journal of, 45(3):524–537, March 2010.

[53] M. Nakamura, Y. Imai, Y. Umeda, J. Endo, and Y. Akatsu. “1.25-Gb/s burst-mode receiver ICs with quick response for PON systems”. Solid-State Circuits, IEEE Journal of, 40(12):2680–2688, Dec 2005.

[54] Bernard Widrow and Samuel D. Stearns. Adaptive Signal Processing. Prentice-Hall PTR, 1985.

[55] B.S. Leibowitz, J. Kizer, Haechang Lee, F. Chen, A. Ho, M. Jeeradit, A. Bansal, T. Greer, S. Li, R. Farjad-Rad, W. Stonecypher, Y. Frans, B. Daly, F. Heaton, B.W. Garlepp, C.W. Werner, Nhat Nguyen, V. Stojanovic, and J.L. Zerbe. “A 7.5Gb/s 10-Tap DFE Receiver with First Tap Partial Response, Spectrally Gated Adaptation, and 2nd-Order Data-Filtered CDR”. In Solid-State Circuits Conference, 2007. ISSCC 2007. Digest of Technical Papers. IEEE International, pages 228–599, Feb 2007.

[56] A.C. Carusone. “An Equalizer Adaptation Algorithm to Reduce Jitter in Binary Receivers”. Circuits and Systems II: Express Briefs, IEEE Transactions on, 53(9):807–811, 2006.

[57] Hyung-Joon Chi, Jae seung Lee, Seong-Hwan Jeon, Seung-Jun Bae, Young-Soo Sohn, Jae-Yoon Sim, and Hong-June Park. “A Single-Loop SS-LMS Algorithm With Single-Ended Integrating DFE Receiver for Multi-Drop DRAM Interface”. Solid-State Circuits, IEEE Journal of, 46(9):2053–2063, Sept 2011.

[58] Z.-H. Hong, Y.-C. Liu, and W.-Z. Chen. “A 3.12 pJ/bit, 19-27 Gbps Receiver With 2-Tap DFE Embedded Clock and Data Recovery”. Solid-State Circuits, IEEE Journal of, PP(99):1–10, 2015.

[59] C. Thakkar, Lingkai Kong, Kwangmo Jung, A. Frappe, and E. Alon. “A 10 Gb/s 45 mW Adaptive 60 GHz Baseband in 65 nm CMOS”. Solid-State Circuits, IEEE Journal of, 47(4):952–968, April 2012.

[60] Y. Hidaka, Weixin Gai, Takeshi Horie, Jian Hong Jiang, Y. Koyanagi, and H. Osone. “A 4-Channel 1.25-10.3 Gb/s Backplane Transceiver Macro With 35 dB Equalizer and Sign-Based Zero-Forcing Adaptive Control”. Solid-State Circuits, IEEE Journal of, 44(12):3547–3559, 2009.

[61] R. Payne, P. Landman, B. Bhakta, S. Ramaswamy, Song Wu, J.D. Powers, M.U. Erdogan, Ah-Lyan Yee, R. Gu, Lin Wu, Yiqun Xie, B. Parthasarathy, K. Brouse, W. Mohammed, K. Heragu, V. Gupta, L. Dyson, and Wai Lee. “A 6.25-Gb/s binary transceiver in 0.13-µm CMOS for serial data transmission across high loss legacy backplane channels”. Solid-State Circuits, IEEE Journal of, 40(12):2646–2657, 2005.

[62] T. Chan Carusone, D. Johns, and K. Martin. Analog Integrated Circuit Design. 2nd Edition. J. Wiley & Sons, 2011.

[63] K.-L.J. Wong, A. Rylyakov, and C.-K.K. Yang. “A 5-mW 6-Gb/s Quarter-Rate Sampling Receiver With a 2-Tap DFE Using Soft Decisions”. Solid-State Circuits, IEEE Journal of, 42(4):881–888, April 2007.

[64] Pete Anslow. “CEI Short Stress Patterns White Paper”. Optical Internetworking Forum, 2007.

[65] S. Shahramian, B. Dehlaghi, and A.C. Carusone. “16 Gb/s 1 IIR + 1 DT DFE Compensating 28dB Loss with Edge-Based Adaptation Converging in 5us”. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2016 IEEE International, Feb 2016.

[66] Seok Kim, Eun-Young Jin, Kee-Won Kwon, Jintae Kim, and Jung-Hoon Chun. “A 6.4-Gb/s Voltage-Mode Near-Ground Receiver With a One-Tap Data and Edge DFE”. Circuits and Systems II: Express Briefs, IEEE Transactions on, 61(6):438–442, June 2014.

[67] K.-L.J. Wong, E-Hung Chen, and C.-K.K. Yang. “Edge and Data Adaptive Equalization of Serial-Link Transceivers”. Solid-State Circuits, IEEE Journal of, 43(9):2157–2169, Sept 2008.
