OPTICAL INTERCONNECTS TO SILICON CHIPS USING

SHORT PULSES

a dissertation

submitted to the department of electrical engineering

and the committee on graduate studies

of stanford university

in partial fulfillment of the requirements

for the degree of

doctor of philosophy

Diwakar Agarwal

September 2002

I certify that I have read this dissertation and that in

my opinion it is fully adequate, in scope and quality, as

a dissertation for the degree of Doctor of Philosophy.

David A. B. Miller (Principal Adviser)

Joseph W. Goodman

Mark A. Horowitz

Approved for the University Committee on Graduate

Studies:

Abstract

Processor speeds continue to increase rapidly due to the scaling of CMOS line-widths,

but electrical interconnect speeds have not grown at the same rate. The loss mecha-

nisms in electrical interconnects limit their ultimate capacity. Optical interconnects

have the potential to alleviate this interconnect bottleneck. At short scales such as

board-to-board, chip-to-chip, and on-chip, the important requirements for these opti-

cal interconnects are low latency, high throughput, high density, high bandwidth, and

simple integration with mainstream silicon technology. This thesis investigates optical

interconnects designed to meet these requirements using short pulses, in conjunction

with multiple quantum well (MQW) diodes flip-chip bonded to silicon CMOS chips.

The use of short optical pulses (100 fs to a few ps), equivalent to a return-to-zero

(RZ) format with very low duty cycle, has many potential advantages. We show

that using short pulses in optical links can a) enhance the sensitivity of the receiver;

b) remove skew and jitter from an array of transmitters (modulators); c) deliver a

precise clock signal; d) reduce the latency of the receiver; and e) enable wavelength

division multiplexing. Furthermore, the sensitivity of the receivers can be enhanced

by 3 dB or more by using short pulses, which improves the overall system power

budget. The latency of trans-impedance and integrating receivers can be reduced by

greater than 60%, which might make global on-chip optical interconnects feasible.

The latency can be even further reduced by using a totem-pole diode pair without

amplification at the expense of optical power. All these benefits are investigated

through simulations and a series of experimental demonstrations.

Acknowledgments

There are a lot of people who have made it possible for me to be at this stage, and in

the process helped me in my academic and personal growth. I would like to express

my sincere thanks to all of them.

First, thanks to Dr. David Miller for his constant guidance. During the course

of my research he has always been very encouraging. I have learned a lot of stuff

from him, but one thing stands out in my mind. He has always said that if there is a

problem, which is getting difficult to figure out, go to the basics. It is amazing how

many times we forget to do this, even though it is such common sense.

I would like to thank Dr. Joseph Goodman for getting me interested in the area of

optics when I came to Stanford. He has been providing valuable advice and guidance

whenever required. I would also like to thank Dr. Mark Horowitz for reading my

thesis and giving me excellent critiques. Dr. Horowitz has patiently listened to my

ideas and given helpful suggestions over the course of my research. Access to his

hardware lab was also very helpful in testing my chips. Thanks are also due to Dr.

Fabian Pease for serving in my examination committee.

The optical interconnect project required the collaboration of several people to

make it successful. I would like to thank Gordon and Noah for their tireless work on

processing and flip-chip bonding of modulators, and for the squash games that took

the frustration out. Bianca designed the baseplates for the optical setups. Without

these components, this work would not have been possible. Christof Debaes of Vrije

Universiteit Brussel worked very closely with me on the design of the chip fabricated

through National Semiconductor. It was a lot of fun and a learning experience working

with him. Optical testing was a joint effort with all the students mentioned above.

Vijit, Ryo, and Helen worked on the initial development of the flip-chip bonding

process. Aparna and Ray have been very helpful by asking detailed questions during

their circuit design learning process. Micah has been a constant source of inspiration

for finishing my work. Volkan and Helen have been a sounding board for my ideas

and complaints. Coffee breaks with Martina provided relief from the “hard routine”

in which Christof was also a participant whenever he was visiting. Late night chats

with Sameer were quite refreshing and tennis with Henry was a lot of fun. I am also

grateful for administrative support of Ingrid Tarien. I do want to thank everybody

in the Miller group for making my experience enjoyable as well as valuable.

Ted Woodward and Ashok Krishnamoorthy kindly educated me in optical inter-

connect and receiver design during my summer internship at Bell Laboratory. Bill

Ellersick, Stefanos Sidiropoulos, Ken Yang, Amrutur Bharadwaj, and Evelina Yeung

in Dr. Horowitz’s group have helped by answering questions and providing circuit

design help. Gibong Jeong, Jane Lam, and Edmund Lam, former students of Dr.

Goodman have also been very helpful. I would also like to thank National Semicon-

ductor for the fabrication of the CMOS chip and JSEP, DARPA, and MARCO for

funding the research.

Friends outside the laboratory have given me company in many sports and fun

activities, which I enjoyed very much. My sisters, Anamika and Swarnima, reminded

me at regular intervals of possible life after Ph.D. I would like to thank my wife Kokila

for motivating me to finish my thesis and making it a worthwhile experience. She has

tried to read the thesis from an architect’s point of view and reminded me to make

it look more artistic. Thanks to her, my thesis has fewer mistakes. Any remaining

mistakes are solely my fault. Finally, I would like to thank my parents who have

always given me their unconditional support and without whom I would never have

made it here in the first place.

Contents

Abstract
Acknowledgments
1 Introduction
1.1 Potential advantages of optics
1.1.1 Limitations of electrical interconnects
1.1.2 Other advantages of optics
1.2 Components of an optical interconnect
1.2.1 Optoelectronic devices
1.2.2 Receivers
1.2.3 Free space optical interconnects
1.3 Challenges in current optical communication
1.4 Organization
2 Short Pulses in Interconnects
2.1 Improved receiver performance
2.2 Low latency in receivers
2.3 Better synchronization
2.4 Wavelength division multiplexing (WDM)
3 Optical Interconnect Setup and Components
3.1 Optical test bench
3.2 MQW diodes
3.3 Silicon chips
3.3.1 Modulator driver
3.3.2 Pseudo random bit sequence (PRBS) generator and tester
3.3.3 Samplers
3.4 Hybrid integration of GaAs devices
3.5 Summary
4 Receivers
4.1 Transimpedance receiver
4.2 Integrating receiver
4.3 Totem-pole diode pair receiver
4.4 Fabrication and testing
4.4.1 Transimpedance receiver
4.4.2 Integrating receiver
4.4.3 Measurement with supply noise
4.5 Summary
5 Latency in Interconnects
5.1 Transimpedance receivers
5.1.1 Modeling of latency
5.1.2 Measurement of latency
5.2 Integrating receiver
5.3 Totem-pole diode receiver
5.4 Scaling of latency with technology
5.5 Summary
6 Timing in Silicon Chips
6.1 Jitter and skew removal
6.2 Optical clock injection
6.2.1 Silicon detectors
6.2.2 Frequency response of silicon detectors
6.2.3 Receiverless clock injection
6.3 Summary
7 Wavelength Division Multiplexing System
7.1 Concept of WDM with short pulses
7.2 System implementation
7.2.1 Optical setup
7.2.2 Measurement results
7.3 Summary
8 Conclusions
Bibliography

List of Tables

5.1 Receiver latency with NRZ and short pulse inputs. Optical energy per bit for the transimpedance and integrating receivers is ∼ 50 fJ, and for the recless receiver is 450 fJ.
6.1 The dimensions and the capacitances of the silicon detectors implemented in this work. Two n-well detectors and two interdigitated detectors of different sizes were chosen.

List of Figures

1.1 Interconnects at different levels
1.2 MQW modulator operation
1.3 Wavelength vs. contrast ratio curve for MQW modulator for different voltage swings.
1.4 Schematic demonstration of NRZ and RZ coding
2.1 A pulse train and its spectrum
2.2 Short pulse properties
2.3 Sensitivity enhancement in transimpedance receiver with short pulses
2.4 Timing diagram of the integrating receiver with short pulse and NRZ inputs. Energy incident during the evaluation phase is not integrated.
2.5 Skew removal from multiple parallel channels using short pulses. The three waveforms are electrical drive signals and they are read by a short pulse which samples all the channels at the same time.
2.6 Spectral slicing of short pulse spectrum for WDM
3.1 Schematic diagram of an optical interconnect system
3.2 Optomechanical setup for testing
3.3 Schematic and the picture of totem-pole connected diodes
3.4 Layout of the chip fabricated in the 0.5 µm HP process
3.5 Layout of the chip fabricated in the 0.25 µm National Semiconductor process
3.6 Eye diagram of modulator driver operation at 800 Mb/s obtained by optical readout of the modulator.
3.7 Schematic of a LFSR generating a pseudo random sequence of length $2^7 - 1$, where a square corresponds to a D flip-flop.
3.8 Schematic of the circuit to verify the sequence generated by the LFSR shown earlier.
3.9 The circuit schematic of the on-chip sampler in 0.25 µm CMOS technology. All transistors are minimum length. (Yeung et al. [61])
3.10 Integration of GaAs devices on silicon chips
3.11 Picture of a CMOS chip with flip chip bonded diodes
4.1 Transimpedance receiver structure
4.2 Schematic of the transimpedance frontend and the small-signal equivalent circuit of its implementation.
4.3 Pulse and step response of the transimpedance stage
4.4 Pulse and step response of the transimpedance stage with varying feedback resistance.
4.5 Pulse response of the transimpedance stage with varying feedback resistances normalized to the maximum of step response.
4.6 Pulse and step response of the transimpedance stage with varying front-end capacitance.
4.7 Pulse response of the transimpedance stage with varying pulse width
4.8 Schematic of the integrating receiver frontend
4.9 Timing diagram of the operation of integrating receiver with NRZ and short pulse inputs.
4.10 Input data arrival-tolerance margins illustrated for NRZ and short pulse inputs.
4.11 Totem-pole diode pair connected to a high impedance input node of inverter.
4.12 Schematic of the transimpedance receiver. Transistor widths mentioned here are in λ, where λ = 0.2 µm for the technology used. All transistors are minimum length.
4.13 SPICE simulation of the transimpedance receiver with 10 µA average photocurrent. Voltage at node out is shown. Top curve is for 1 Gbps operation of the receiver with 260 fF of diode capacitance. Bottom curve shows the operation at 1.5 Gbps with 100 fF of diode capacitance.
4.14 Eye diagram of the transimpedance receiver operation with NRZ input at 600 Mb/s. 26 µA average photocurrent is injected in each beam.
4.15 Eye diagram of the transimpedance receiver output voltage with short pulse input at 80 Mb/s.
4.16 Schematic of the integrating receiver fabricated in the 0.5 µm technology. Transistor widths are shown in λ, where λ is 0.35 µm. All transistors are minimum length.
4.17 Operation of the integrating receiver with optical readout at 600 Mb/s.
4.18 Sensitivity comparison for NRZ and short pulse data for integrating receiver operating at 400 Mbps in a chip-to-chip link.
4.19 Transimpedance receiver delay variation as a function of supply voltage. This measurement was done via the pump-probe technique. The nominal supply voltage was 2.5 V.
4.20 Bit error rate curves of integrating receiver operation in a link at 100 Mbps with NRZ data. Sinusoid noise was injected in the supply with different peak-to-peak values at 1 kHz.
5.1 ITRS projection of on-chip electrical interconnect delays with technology scaling [42]
5.2 Components of latency in a modulator-based interconnect system
5.3 Mechanism of latency reduction in a transimpedance receiver with short pulse input.
5.4 First order model of a transimpedance receiver with variable length post-amplifier chain
5.5 Pulse energy vs. delay for short pulse and NRZ input for the first-order model. Corresponding SPICE simulations are denoted with “x”.
5.6 Variation of delay vs. number of post-amplifier stages for different total gain, assuming a constant gain-bandwidth product for all stages.
5.7 Number of post-amplifier stages vs. delay for different pulse energy
5.8 Pulse energy vs. receiver delay for 2 and 3 post-amplifier stages
5.9 Pump-probe setup for transceiver latency measurement
5.10 Receiver transmitter module used for testing latency via pump-probe method. The numbers mentioned here are the sizes of PMOS and NMOS transistors in λ, where λ = 0.2 µm.
5.11 Comparison of the latency of the transimpedance receiver-transmitter module with short pulse and NRZ inputs.
5.12 Circuit schematic of the integrating receiver frontend
5.13 Latency with respect to clock in the integrating receiver with NRZ and short pulse inputs.
5.14 Latency of the entire integrating receiver, including the SR latch, with short pulse input computed by using SPICE circuit simulator.
5.15 Schematic of the totem-pole diode pair receiver connected to the high impedance input of the inverter buffer.
5.16 Voltage vs. time at node “in” of the recless receiver for NRZ and short pulse inputs with minimum optical energy to swing the node by supply voltage.
5.17 Comparing the delay of the transimpedance receiver with short pulse data for 0.25 µm and 0.5 µm technologies by normalizing to FO4 delay in respective technologies.
5.18 FO4 gate delay scaling with technology [107]
6.1 Transmitted signals from two channels readout with a cw laser. Channels are skewed by 3/8 of a bit period.
6.2 Skew removal by short pulse readout of two modulator channels skewed by 3/8 of a bit period. Ones and zeros are alternately read by these pulses.
6.3 Jitter removal from a single interconnect channel. Upper trace is the electrical drive signal with jitter and the bottom trace is the optical readout of the receiver.
6.4 A cross-sectional view of two silicon detector topologies
6.5 The sampled signal trace showing the response of the first interdigitated detector to an optical short pulse. The optical energy in the pulse was 0.74 pJ.
6.6 The frequency behavior of the various silicon detectors. The response of the second interdigitated detector was not included for clarity. The curves have been normalized with respect to their first frequency component for comparison reasons.
6.7 Schematic of receiverless optical clock injection with optical short pulses using a totem-pole diode pair. The inverter provides very little capacitive loading, though it can be eliminated and clock can be injected directly at the desired node.
6.8 Equivalent circuit of the totem-pole pair implementation with interdigitated diodes. Due to substrate connection this device was self-resetting.
6.9 Receiverless optical clock injection with optical short pulses of 6 pJ onto the totem-pole configuration of interdigitated detectors.
6.10 Histogram of the pulse signals crossing at marker level at half their swing. The histograms correspond to two experiments one of which is delayed 10 ps more compared to the reference clock.
7.1 An exaggerated view of the frequency comb incident on the modulator array. Frequency components of the 80 MHz pulse train are separated in space by a blazed grating.
7.2 Schematic of the WDM system implementation
7.3 First generation optical setup using Spindler and Hoyer components. The portion on the transmitter side is visible.
7.4 Second generation WDM link optical setup. A closeup of the receiver side is shown in the picture.
7.5 CCD scan of the wavelength of the modulated transmitter output. Solid and dashed lines represent two snapshots at different times. The corresponding modulators are shown below the wavelength scan.
7.6 80 Mb/s operation of a single channel in a WDM link

Chapter 1

Introduction

Modern computer processors run at clock speeds of many GHz, but the processor-to-memory
interface runs at only a few hundred MHz. A key reason for this difference,

and a problem for computing in general, is that the interface connection speeds are

not able to keep up with the increase in the processor speeds. This is mainly because

of design issues of electrical busses and their underlying physical properties. Due

to the capacity limitations of electrical wires, all long distance communication is

now done via optics. For medium distance communication, e.g. LAN, MAN, WAN

(about 300 m to 100 km), optics is making inroads specifically because only optics can

support the high data rates required by these applications. At shorter distances (a

few meters to a few hundred meters), primarily in data links, optics is rapidly gaining

entry. Even at distances shorter than a few meters, research is underway to use optics

for communication purposes.

A categorization of optical links is shown in Fig. 1.1. Short distance communica-

tion can be divided into the following categories: machine-to-machine (a few meters

to 100s of meters), inter-shelf or possibly on large boards or backplanes (a few cm to

a few meters), chip-to-chip (a few cm) and on-chip (up to a few cm). There are a

few products in the market for machine-to-machine communication using optics but

other categories are still in research stages. The practicality and the feasibility of

chip-to-chip and on-chip communication using optics is still an open question.

Figure 1.1: Interconnects at different levels (distance scale from 1 mm to 1000 km; categories shown: inter-chip, chip-to-chip, inter-shelf, racks/chassis, LAN/WAN, and long haul; technologies shown: 2-D free space, parallel interconnects, single and multimode fiber, coarse WDM and TDM, single mode fiber, dense WDM and TDM)

Optical interconnects to chips still face many technical challenges. Optics might
need to provide very dense interconnects, probably 1000’s of interconnects per chip.

Small and efficient optoelectronic components are needed to satisfy this requirement.

Even though very sophisticated optics and components are available for long distance

communication, new technology is needed for connection to chips because the re-

quirements are very different. Small latency, low noise, low power dissipation, and

the ability to coexist with mainstream silicon technology are required for dense inter-

connects. This dissertation investigates the role of return-to-zero signaling in meeting

these requirements.

We will first discuss the potential benefits of optics in interconnects in Section 1.1.

Then in Section 1.2 we will briefly introduce the devices, technologies, and components

required for optical interconnects. Challenges for optical interconnects and the focus

of this work will be discussed in Section 1.3. We will finally conclude in Section 1.4

by giving an overview of all the chapters.

1.1 Potential advantages of optics

Optical interconnects to chips have been studied for a long time. This study started

with a seminal paper by Goodman et al. [1]. Since then many authors have addressed

the benefits and limits of optical interconnects ([2] [3] [4] [5] [6] [7] [8] [9] [10] [11]), and

the analysis of optics vs. electronics ([3] [12] [13] [14]). We will look at the potential

benefits of optical interconnects, which come out of the references mentioned.

Optics has a very high frequency carrier (hundreds of THz), a very short wavelength

(∼ 1 µm) and large photon energy. The very high optical carrier frequency eliminates

frequency dependent loss in the modulation band, and makes short pulse communi-

cation feasible. The short wavelength allows imaging with a single lens, low loss

in waveguides, impedance matching with very low overhead, and wavelength divi-

sion multiplexing (WDM). Voltage isolation is a result of the large photon energy of

optics [6].
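To put rough numbers on the carrier frequency argument, the short sketch below computes the optical carrier frequency for the ∼ 1 µm wavelength quoted above and expresses a modulation bandwidth as a fraction of that carrier. The 10 GHz modulation bandwidth is an assumed example value, not a figure from this work, and the function names are hypothetical.

```python
# Illustrative arithmetic only. The 1 um wavelength comes from the text;
# the 10 GHz modulation bandwidth is an assumed example value.
C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def carrier_frequency_hz(wavelength_m: float) -> float:
    """Optical carrier frequency f = c / lambda."""
    return C_LIGHT / wavelength_m


def fractional_bandwidth(modulation_bw_hz: float, wavelength_m: float) -> float:
    """Modulation bandwidth expressed as a fraction of the carrier frequency."""
    return modulation_bw_hz / carrier_frequency_hz(wavelength_m)


if __name__ == "__main__":
    f_c = carrier_frequency_hz(1e-6)
    frac = fractional_bandwidth(10e9, 1e-6)
    print(f"carrier frequency: {f_c / 1e12:.0f} THz")
    print(f"10 GHz modulation is {frac:.1e} of the carrier")
```

Because the modulation band occupies such a small fraction of the carrier, the medium looks essentially flat across it, which is the point made in the paragraph above.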

1.1.1 Limitations of electrical interconnects

It is important to understand the limitations and issues of electrical interconnects to

realize the benefits of optics.

i. Aspect ratio limit. As observed by Miller and Ozaktas [4], in digital electrical

interconnects, the total number of bits per second is limited by the “architecture”

of the interconnect, i.e., the length and the cross sectional area. For capacitive-

resistive (RC) lines, the limit to the total number of bits per second (B) depends

only on the “aspect ratio” of the line, which is defined as the ratio of the length

(l) of the interconnect to the square root of the total cross-sectional area ($\sqrt{A}$) of
the interconnect (see footnote 1). B depends to some degree on the design of the electrical lines.
Roughly speaking, $B \sim B_0 A / l^2$, with $B_0 \sim 10^{16}$ bits/s for unequalized lines. For
inductive-capacitive (LC) lines, the formula for bit rate capacity is the same as for
RC lines, though the factor $B_0$ is slightly smaller, $\sim 10^{15}$ bits/s, limited by the skin effect.

Equalization [15], multilevel modulation and the use of repeaters can increase the

total number of bits per second. These schemes, however, add complexity to the

system, which will limit the density of interconnects. The added complexity may

also increase the latency of interconnects. In comparison, optics does not suffer

from this limit because the mechanisms of loss and signal distortion are different.

Footnote 1: This meaning of “aspect ratio”, which might better be referred to as the “architectural aspect

ratio”, differs from the use of the term “aspect ratio” that is the ratio of the height to width for

metal connections on chips.

ii. Frequency dependent loss and equalization. The loss profile of electrical wires

has a significant frequency dependence over the entire frequency band of interest

for high speed communication. In baseband communication, commonly used on

electrical wires, a frequency response from DC to the signal bandwidth is required

when no coding is used. The response of electrical wires is not flat for multiple

decades in the frequency domain and requires equalization to compensate for

large loss at high frequencies. In optics a very high frequency carrier is used

and the signal modulation of multiple gigahertz is a small fraction of the carrier

frequency. The response of the medium is quite flat over this frequency range,

requiring no equalization, thus simplifying the system design.

iii. Signal integrity. Driving large pad capacitance by off-chip interconnects can

generate electrical noise on the supply, affecting the signals on the chip. Electrical

chips have signal pads with a capacitance of about 1-2 pF. Large drivers are

required to drive this capacitance. Switching the voltage on these capacitors

generates large current transients, which act as noise sources for the circuits

on the chip. This noise can corrupt signals. Optical devices, if miniaturized and

integrated on the chip, can have much lower capacitances requiring lower currents

to drive them. Consequently, less electrical noise may be injected into the chip.

iv. Distance dependent loss. Loss is very significant in electrical wires at high fre-

quencies because of the skin effect. The design of the interconnects needs to be

customized for different lengths to account for these losses. Also, the frequency

response of the cable changes with the change in length, and a redesign of equal-

ization is required if it is used. In comparison, the losses in transmission of light

are very small. An optical interconnect designed for a few meters can easily be

used for kilometers. To give an idea of the numbers involved, in a 12 m RG-55U

cable the loss at 2 GHz is ∼ 10 dB [16], while in a 1000 m long single-mode optical

fiber at 42 THz carrier frequency the loss is a mere 3 dB. The propagation loss

in free space is also low for optics.
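The aspect ratio limit and the loss figures in the list above can be made concrete with a few lines of arithmetic. The sketch below evaluates the scaling B ∼ B0·A/l² using the order-of-magnitude B0 values quoted in item i, and converts the cable and fiber losses of item iv to a per-metre figure. The wire geometries are hypothetical examples chosen only for illustration, and the results are rough bounds, not design values.

```python
# Sketch of the aspect-ratio limit B ~ B0 * A / l^2 quoted in item i.
# B0 values are the order-of-magnitude figures from the text; the example
# wire geometries below are assumptions chosen for illustration.
B0_RC = 1e16  # bits/s, unequalized RC lines
B0_LC = 1e15  # bits/s, LC lines (skin-effect limited)


def capacity_bits_per_s(area_m2: float, length_m: float, b0: float) -> float:
    """Rough bound on total bit rate through cross-section A over length l."""
    return b0 * area_m2 / length_m ** 2


def loss_db_per_m(total_loss_db: float, length_m: float) -> float:
    """Average loss per metre, for comparing the cable and fiber figures."""
    return total_loss_db / length_m


if __name__ == "__main__":
    # Hypothetical on-chip RC wire: 1 um x 1 um cross-section, 1 cm long.
    on_chip = capacity_bits_per_s(1e-6 * 1e-6, 1e-2, B0_RC)
    # Hypothetical board-level LC line: 1 mm^2 total cross-section, 1 m long.
    board = capacity_bits_per_s(1e-6, 1.0, B0_LC)
    print(f"on-chip wire: {on_chip:.1e} bit/s")
    print(f"board line:   {board:.1e} bit/s")
    # Loss figures quoted in item iv: 10 dB over 12 m of cable at 2 GHz,
    # 3 dB over 1000 m of single-mode fiber.
    print(f"coax:  {loss_db_per_m(10.0, 12.0):.3f} dB/m")
    print(f"fiber: {loss_db_per_m(3.0, 1000.0):.3f} dB/m")
```

Note that the bound depends only on the ratio A/l²: doubling the length of a line at fixed cross-section reduces the bound by a factor of four, regardless of the absolute size of the interconnect.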

1.1.2 Other advantages of optics

i. Density of interconnects. Electronics can provide very high densities of intercon-

nects at the on-chip level. Electrical chips can have a large number of pins to

increase the density of inter-chip interconnects, as in a ball grid array (BGA),

though lots of pins need to be used for supply and ground for a reliable high

speed interconnection. For off-chip or board-to-board interconnects, optics can

offer very large densities. Optical devices can be made very small and 1000’s of

input-outputs (IO) can be achieved on a chip. An experimental chip with 4000

optical IO in a 7 × 7 mm² area has been demonstrated [17]. Optical interconnects

can utilize the third dimension by being able to cross the beams. In free space,

a few optical elements can handle a large number of beams easily, retaining very

high interconnect densities.

ii. Impedance matching. Most electrical lines are designed for 50 Ω impedance, which

requires a 50 Ω termination to avoid reflections. A lot of power is absorbed in

this termination. In optics, a quarter-wavelength-thick index matching material

(anti-reflection coating) can match the impedance of two dissimilar materials to

remove reflections. This is equivalent to the termination in electrical lines; in

optics, though, there is virtually no power dissipation in this index matching

material. In optics, a beam splitter can be used to tap the optical signal for

monitoring, with small or negligible reflections. A similar tap in electrical lines

needs to be very well designed to minimize the impedance discontinuities, and

hence to reduce the reflections.

iii. Voltage isolation. Optical communication is accomplished by sending photons be-

tween two physically separate transmitting and receiving nodes. The voltages on

the two sides need not be related to each other and can be completely electrically

isolated. This provides noise immunity from one side to the other. With scaling

in electronic chips, supply currents are increasing and so are resistive drops in

DC supply and ground bounce effects. Hence this voltage isolation property of

optics may become progressively more important for future generations.
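Two of the points in the list above lend themselves to a quick calculation. The sketch below first works out the IO density and average pitch implied by 4000 optical IO in a 7 × 7 mm² area, assuming (hypothetically) a uniform square grid. It then evaluates the textbook single-layer anti-reflection condition, in which the coating index is the geometric mean of the two media and the layer is a quarter of the wavelength inside the coating. The refractive index of 3.5 and the 850 nm wavelength are assumed example values, not parameters taken from this work.

```python
import math
from typing import Tuple


def io_density_per_mm2(num_io: int, side_mm: float) -> float:
    """Number of optical IO per square millimetre on a square chip area."""
    return num_io / (side_mm * side_mm)


def average_pitch_um(num_io: int, side_mm: float) -> float:
    """Centre-to-centre spacing if the IO sat on a uniform square grid."""
    return 1000.0 * math.sqrt(side_mm * side_mm / num_io)


def quarter_wave_coating(n_in: float, n_out: float,
                         wavelength_nm: float) -> Tuple[float, float]:
    """Ideal single-layer anti-reflection coating between two media.

    Returns (coating index, physical thickness in nm). The index is the
    geometric mean of the two media, and the thickness is a quarter of
    the wavelength measured inside the coating.
    """
    n_coat = math.sqrt(n_in * n_out)
    thickness_nm = wavelength_nm / (4.0 * n_coat)
    return n_coat, thickness_nm


if __name__ == "__main__":
    # Figures quoted in item i: 4000 optical IO in a 7 x 7 mm^2 area.
    print(f"density: {io_density_per_mm2(4000, 7.0):.1f} IO/mm^2")
    print(f"pitch:   {average_pitch_um(4000, 7.0):.0f} um")
    # Assumed example: air (n = 1.0) to a semiconductor with n = 3.5 at 850 nm.
    n_coat, t_nm = quarter_wave_coating(1.0, 3.5, 850.0)
    print(f"coating: n = {n_coat:.2f}, thickness = {t_nm:.0f} nm")
```

The coating is only on the order of a hundred nanometres thick and is lossless in the ideal case, in contrast with a 50 Ω termination resistor, which absorbs the signal power it terminates.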

1.2 Components of an optical interconnect

So, what does an optical interconnect consist of? At the physical layer, an optical link

has three main components: a transmitter, the transmission medium, and a receiver.

In digital circuits, binary data in the form of voltage levels (whose value depends on

the technology) needs to be transmitted. Data in the form of these voltage levels

is fed to a transmitter driver, which converts these levels into the voltage or current

signal required to drive the optical transmitter device. The optical transmitter device

converts these electrical signals into the modulation of light beams, which then travel

through some propagation medium to the destination. The photodiode on the receiver

side converts the optical signal into current, which is then converted into logic level

by the receiver.

We will first consider different optoelectronic devices, then look at issues in receiver

circuits, and finally we will discuss free space optical interconnects.

1.2.1 Optoelectronic devices

Vertical cavity surface emitting lasers and quantum well modulators are the leading

contenders as output devices for dense optical interconnects. Lasers are current mode

devices while modulators are voltage mode devices. Quantum well modulators can

also be used as photodiodes.

We will first look into these devices and then look at optoelectronic devices in

silicon since it might be beneficial to have optical devices in mainstream silicon tech-

nology from the point of view of cost.

Vertical cavity surface emitting lasers (VCSEL)

VCSELs are a strong candidate as a transmitter device as they have improved signif-

icantly in the last few years. Oxide confined VCSELs can achieve very low threshold

currents [18]. Sub-mA threshold currents are now easily achieved in VCSELs. Re-

cently, optical interconnects with arrays of VCSELs have been demonstrated [19] [20]

[21] [22].

There are many issues that still need to be addressed for using large VCSEL arrays

in optical interconnects [23].

i. Uniformity of threshold current. The threshold currents of lasers need to be

uniform to have uniform behavior of VCSELs across the entire array. If the array

is non-uniform, individual lasers would need to be monitored and controlled,

which will make the entire design complex.

ii. Thermal issues. To avoid the turn-on delay of the laser, it is typically biased

near the threshold and driven well above the threshold when on. In a large array,

in particular, the resulting high current densities can heat up the lasers changing

their properties.

iii. Wavelength stability. The wavelength of these lasers drifts (with temperature and

aging), and is difficult to precisely specify in manufacturing. Many components

in the interconnect can be wavelength-sensitive, and it is important to maintain

the wavelength. Control of the wavelength is even more important in wavelength

division multiplexing.

Multiple quantum well (MQW) modulators

MQW modulators are p-i-n diodes with quantum wells in the i region. The structure

of these modulators and their operation is shown in Fig. 1.2. By applying a voltage

across MQW diodes, the wavelength of the absorption peak can be shifted. This

effect is called the quantum-confined Stark effect (QCSE) [24] [25]. If the modulator

is operated at a single wavelength, varying the voltage across the device changes

the absorption for that wavelength. GaAs-based MQW diodes show a strong QCSE

shift around 850 nm wavelength of light. The ratio of reflected light intensity in low

absorption state vs. the reflected light intensity in high absorption state is defined as

the contrast ratio (CR). By using a reflecting surface at the bottom of the modulator,

the light makes two passes, increasing the contrast ratio of the device. The typical

contrast ratio for this reflection modulator is 2:1 for a 3V swing as shown in Fig. 1.3.
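The benefit of the double pass can be written out under a simple Beer-Lambert assumption. The symbols below are introduced only for illustration: α_low and α_high are the absorption coefficients in the two states and d is the absorbing thickness; surface reflections and mirror losses are neglected.

```latex
\mathrm{CR}
  = \frac{R_{\text{low}}}{R_{\text{high}}}
  = \frac{e^{-2\alpha_{\text{low}} d}}{e^{-2\alpha_{\text{high}} d}}
  = e^{2(\alpha_{\text{high}} - \alpha_{\text{low}}) d}
  = \left( e^{(\alpha_{\text{high}} - \alpha_{\text{low}}) d} \right)^{2}
```

In this idealized picture the double-pass contrast ratio is the square of the single-pass value.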

This contrast ratio is limited, but it can be enhanced by using Fabry-Perot

effects [26]. Moreover, modulators can be used in a differential fashion to double

Figure 1.2: MQW modulator operation (labels: indium bump, p-contact, quantum wells, n-contact)

Figure 1.3: Wavelength vs. contrast ratio curve for MQW modulator for different voltage swings.

the swing. The voltage swings available to modulators diminish as the CMOS line-

widths decrease. Research is in progress to develop new modulators which can operate

with low voltage swings. Details of the physical operation of modulators and their

properties are dealt with in Refs. [27] [28] [29]. These devices are used as transmitters

in the work described in this dissertation.

MQW modulators are well suited for integration in large arrays. There are many

advantages in using these large arrays of modulators:

i. High yields have been demonstrated. Diode yields greater than 99.99% in large arrays

were reported in Ref. [30]. With this kind of yield, large arrays of

interconnects can be fabricated. High IO count optically-interconnected systems

have been demonstrated because of these yields [17] [31] [32] [33].

ii. Single off-chip laser source. The laser for a modulator system can be placed

away from the chip, removing the source of heat generation from the proximity

of modulators. Also, by using a single source to operate all modulators, it is

relatively easy to synchronize the whole system.

iii. Modulator or a photodiode. The diode structure described here can be used both

as a modulator and a detector depending on the circuit to which it is connected.

Being able to use this device as an input or an output device simplifies the

fabrication.

iv. Operation with short pulses. Another big advantage of modulators is that they

can be used to modulate short optical pulses (100 fs to 1 ps) [29]. Short pulses

have many benefits in optical interconnects as mentioned in the next chapter.

Modulators also have their share of problems. The QCSE is temperature de-

pendent; variation in temperature can move the exciton peak and severely reduce the

contrast ratio. Bringing external beams on modulators can be a disadvantage because

more optics is required to handle incident and reflected beams.

Various optical transmitter technologies are compared in literature [34] [35] [36].

1.2.2 Receivers

A receiver consists of a photodiode to convert an optical signal into an electrical

current and circuitry to convert the current into a full logic swing. Circuits can

be fabricated, for example, in silicon or in GaAs. Silicon foundries are very well

established, and even though silicon circuits are slower than their GaAs counterparts,

very high circuit densities can be achieved, making silicon a preferred technology.

For best receiver performance, the capacitance of the photodiode and the receiver

circuit should be as low as possible. A small capacitance design requires fewer gain

stages and has better noise immunity. To reduce the capacitance, monolithic detec-

tors can be made in silicon but silicon has a large absorption depth at wavelengths

near 850 nm, much deeper than junction depths in silicon CMOS. Most photons are

absorbed deep inside the substrate causing generated carriers to come to the surface

over a long period, also leading to inefficient photodetectors. Effectively, every bit

generates a long electrical response tail. There are ways in which faster responses can

be generated, but they reduce the responsivity. Metal-semiconductor-metal (MSM)

photodiodes can also be potentially used to reduce the capacitance, though these

cannot be made in a standard CMOS process.

Another approach is to use GaAs detectors. GaAs is a good absorber at 850 nm

and it is possible to obtain a very fast response with the quantum efficiency reaching

nearly one. The diode structure used for a modulator (as described in the last section)

can also be used as a photodiode. This simplifies the system because a single device

works as both a modulator and a detector, depending on the circuitry to which it is

connected. An alternative would be to make MSM detectors in GaAs, which could

lead to fast, efficient, and low-capacitance detectors.

To acquire both the advantages of an advanced silicon foundry and the perfor-

mance of GaAs devices, a hybrid integration scheme can be used. This is explained

in Chapter 3, and receiver circuits are discussed in Chapter 4.

1.2.3 Free space optical interconnects

Light beams with modulated data need to propagate in some medium to reach the

destination. In non-line-of-sight communication, data needs to be sent through a

guided medium. In long distance telecommunications, data is sent through a single

mode fiber, which has a very low loss of 0.2 dB/km at 1550 nm. For relatively shorter

distances multimode fibers are used, because the loss and dispersion of these fibers is

tolerable at these distances.

For very high density interconnects at very short distances, a guided medium

may not be appropriate. Since beams need to travel short distances, bulk optics

can be used to direct many parallel beams with a few elements. In “Introduction

to Fourier Optics” by Goodman [37], an analysis of optical elements used in the

design of systems is presented. For chip-to-chip and on-chip interconnects, the free

space approach provides required densities of interconnects. Waveguides can also be

used for short distances though they can have very high losses and the density of

interconnects is typically much lower than in free space. A comparison of the free

space and guided approaches is given in Ref. [38] and discussion about the free space

approach is given in Refs. [12] and [39]. Dispersion and losses can be very low in

free space optical interconnects (FSOI). In our current work, we primarily use free

space interconnects on slotted stainless steel baseplates. These baseplates act like a

breadboard system for optics and are described in Chapter 3.

1.3 Challenges in current optical communication

Optics is making inroads into short reach interconnections. Many technical advances in

devices and packaging have taken place recently. Optical interconnect products are

available for local area network (LAN) and wide area network (WAN) applications [40]

[41]. But can optics provide a solution for chip-to-chip and on-chip interconnects?

For very short reach applications, the density of interconnects required is very

high. There are still device and integration challenges to get high densities of optical

interconnects. When optical devices are integrated close to digital circuits, because

of noise from digital circuits, the operation of interconnects is affected. The heating

of devices due to power dissipation may also limit the density of interconnects. It is

still an open issue as to what densities for optical interconnects, on and off chip, can

be achieved.

On-chip global interconnects require low latency, possibly less than a clock cycle.

With the continuing scaling of silicon CMOS technology, the delay of global wires

with and without repeaters is increasing, at least relative to the clock cycle [42] [43].

One solution for this problem is to change the architecture of the chip. Or, if we want

to use optics, can it provide low latencies at these scales? Can the data be delivered

within a clock cycle, accounting for the transmitter driver and receiver delay?

For high density parallel interconnects, synchronization to a local clock is a chal-

lenging task. Synchronizing each individual channel will be very inefficient, and a

limiting factor on the number of channels. Are there ways in which all the channels

can be synchronized in the optical domain itself?

Clock distribution on chips with very low jitter and skew is becoming increasingly

difficult. Even with optical clock distribution, receiver circuits add a lot of jitter

and skew. Many applications, such as analog to digital conversion, and high speed

multiplexing and demultiplexing, require a very precise clock with low jitter. Can we

reduce the skew and jitter, and have a very precise clock delivered to the chip?

Using short pulse signaling in interconnects might provide the answer to many of

these questions.

Optical communication is mostly done in binary format, i.e., the messages are put

into a sequence of zeros and ones. At the physical level, ones can be encoded in at

least two ways, while for a zero, no light beam is transmitted. One way to encode a

one is to send a constant light intensity for the entire bit period; the other method

is to send a pulse shorter than the bit period. The first method of encoding is called

non-return-to-zero (NRZ) and the second method is termed return-to-zero (RZ).

When the pulse duration is much shorter than the bit period, the pulses are referred

to as short pulses (Fig. 1.4).
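The two encodings can be sketched as sampled on/off waveforms. This is a minimal illustration, assuming an arbitrary grid of 8 samples per bit and a pulse placed at the start of the bit; the function names and these numbers are hypothetical and chosen only to make the difference visible.

```python
def encode_nrz(bits, samples_per_bit):
    """NRZ: the light stays on for the whole bit period of every 1."""
    wave = []
    for b in bits:
        wave.extend([b] * samples_per_bit)
    return wave

def encode_rz(bits, samples_per_bit, pulse_samples):
    """RZ: the light is on only for the first pulse_samples of each 1 bit."""
    if not 0 < pulse_samples <= samples_per_bit:
        raise ValueError("pulse must fit inside the bit period")
    wave = []
    for b in bits:
        wave.extend([b] * pulse_samples + [0] * (samples_per_bit - pulse_samples))
    return wave

bits = [0, 1, 0, 1, 0]
nrz = encode_nrz(bits, samples_per_bit=8)
rz = encode_rz(bits, samples_per_bit=8, pulse_samples=1)

print("NRZ:", "".join(str(s) for s in nrz))
print("RZ: ", "".join(str(s) for s in rz))
```

Shrinking `pulse_samples` relative to `samples_per_bit` lowers the duty cycle; the short-pulse regime corresponds to a pulse far narrower than this coarse grid can show.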

Short pulses provide some unique advantages in communications [44], though it

is important that the medium be able to support the propagation of these pulses.

On electrical wires the frequency dependent losses are very high for short pulses, and

there is substantial dispersion that spreads the pulses, making their use impractical.

In optics, as mentioned earlier, because of the high frequency carrier, the losses for the

entire spectrum of short pulses are nearly constant. Also, the dispersion in an optical

medium for small distances is tolerable and does not cause significant broadening.

Short pulse benefits in optical interconnects are briefly enumerated below.

i. Receiver sensitivity enhancement. By using short pulses, it is possible to improve

Figure 1.4: Schematic demonstration of NRZ and RZ coding (bit sequence 0 1 0 1 0; labels: bit period, RZ with short pulses)

the sensitivity of the receiver, and hence reduce the number of gain stages. With

smaller receiver size, it might be possible to increase the density of interconnects.

Or, with the same number of stages, optical power required can be reduced for

improved system power budget.

ii. Latency reduction. Because of very sharp rise and fall times, short pulses can

reduce the latency of receivers and might reduce the overall latency of the link.

Global on-chip interconnects might be feasible with short pulses.

iii. Synchronization and clocking. Short pulses generated from a modelocked laser

have very low pulse-to-pulse jitter. A sharp rising edge with very low jitter can

be utilized to distribute a very precise clock. Also, short pulses that are much

shorter than the bit period can be used to read out the modulator at the nominal

center of the bit. This can effectively eliminate skew and jitter of up to half a bit

period from the transmitting channels and synchronize the entire array without

any extra processing.

iv. Wavelength division multiplexing (WDM). Short pulses (150 fs) have very broad

bandwidth (∼ 5 nm). Multiple separate channels can be created by spectral slic-

ing this bandwidth and modulating each slice individually. Channels generated

from a single source eliminate the need for wavelength monitoring of each channel.

Also, in a system using WDM all the benefits of short pulses can be utilized.

Short pulse signaling can potentially make optical interconnects feasible at the

chip-to-chip and on-chip level. This dissertation investigates some of the issues of

short distance optical interconnects operating with short pulses, including their prac-

ticality and feasibility. There are many compelling reasons for using optics at these

short distances.

1.4 Organization

The organization of this thesis is as follows. Chapter 2 gives an overview of short

pulses. The properties of short pulses, their generation, and their propagation in an

optical medium are addressed. A broad overview of the benefits of short pulses in

interconnects is presented.

The components of the chip-to-chip link used in this work are presented in Chap-

ter 3. These components include optomechanics, GaAs based MQW diodes, and

silicon CMOS chips. Circuits designed on silicon chips are discussed in detail. The

hybrid integration process used for flip-chip bonding of MQW diodes on silicon chips

is also mentioned.

Chapter 4 gives the details of three receiver architectures: transimpedance, integrating,

and totem-pole. Fabrication of these receivers and their experimental measurement

results are presented in this chapter. These receivers are operated with

NRZ and short pulse data and their performances are compared. It is shown that

short pulses can improve the receiver sensitivity significantly.

Latency in optical links is considered in Chapter 5. It is a very important criterion

for global on-chip interconnects. Short pulses can reduce the latency of a receiver,

hence making optical interconnects a potential solution for on-chip interconnects.

Chapter 6 presents clocking and synchronization with short pulses. These pulses

can remove skew and jitter of up to half a bit period from the entire array of mod-

ulators by nominally reading the whole array at the center of the bit period. Using

silicon detectors for low capacitance, and eliminating the receiver circuit, a very pre-

cise clock can be injected with short pulses. A totem-pole diode pair is used as a

push-pull device to generate the clock by alternately putting the pulses on the top

and bottom diode.

Wavelength division multiplexing (WDM) using short pulses is demonstrated in

Chapter 7. A short pulse beam is spectrally sliced to generate multiple wavelength

channels. A single short pulse source generating all the optical channels keeps the

output from the entire modulator array synchronized. Wavelength monitoring for

each separate channel is not required.

Finally the conclusions are presented in Chapter 8.

Chapter 2

Short Pulses in Interconnects

The non-return-to-zero (NRZ) format is the most commonly used format for data

communication. For a given speed it is bandwidth-efficient, which is very useful in

bandwidth-limited systems. An alternative to NRZ is the return-to-zero (RZ) format

which has no transition for a logic ’0’ and two transitions for a logic ’1’. The short

pulses referred to in this thesis are RZ encoding with very low duty-cycle. These

pulses are of the order of a few picoseconds or shorter, which for a 1 GHz link have

a duty-cycle of about 10⁻³.
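The duty-cycle figure follows directly from the ratio of pulse width to bit period. As a worked check, with the symbols D, τ_p, and T_bit introduced here only for illustration and a pulse width of 1 ps taken as a representative value:

```latex
D = \frac{\tau_p}{T_{\text{bit}}} = \tau_p \, f_{\text{bit}}
  = (1 \times 10^{-12}\,\text{s}) \times (1 \times 10^{9}\,\text{Hz})
  = 10^{-3}
```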

Short pulses are typically generated by using a modelocked laser with modelock-

ing done actively or passively. Modelocking ensures very low pulse-to-pulse jitter.

Repetition rates of many gigahertz have been demonstrated; past research work in

modelocked lasers is comprehensively summarized by Avrutin et al. [45]. Short-pulse

sources in general are summarized by Tamura [44]. For the current work, a commercial

Ti:sapphire laser with an 80 MHz repetition rate is used. Due to limited commercial

applications at this time, high repetition rate short pulse commercial lasers are not

readily available at wavelengths convenient for the present work.

In an optical medium, attenuation is frequency-independent for a broad frequency

range. In absolute terms, attenuation per unit distance can be very small as well. Low

attenuation for large bandwidth allows the propagation of short pulses for distances of

interest in interconnects. Similarly, dispersion is also very small in an optical medium

for distances of interest for interconnects, making propagation of short pulses feasible.

Shen et al. [46] have demonstrated a short pulse-based WDM transmission with only

3 ps of skew over the entire transmission band. This is in contrast to electrical wires

where both attenuation and dispersion are very high. This difference occurs because

a high frequency carrier is used in optics while baseband communication is used in

the electronic domain. Even very high speed modulation rates are small compared

to optical carrier frequencies (∼ 10¹⁴ to 10¹⁵ Hz), so such modulation makes little

difference to optical propagation.

A stream of ideal pulses can be represented mathematically by Dirac-delta functions:

$$\sum_{n=-\infty}^{+\infty} \delta(t - nT) \tag{2.1}$$

where T is the period of repetition and δ(t) is a Dirac-delta impulse. In the frequency

domain, this impulse train corresponds to another comb of Dirac-delta functions or

modes with a frequency separation of 1/T, as given by the following equation.

$$\sum_{n=-\infty}^{+\infty} \delta\left(f - \frac{n}{T}\right) \tag{2.2}$$

Such an ideal pulse stream contains an infinite set of frequency components separated

by the repetition rate of the pulses. The pulses generated by the laser are not ideal

impulses, but are more likely approximately Gaussian in shape. For very short pulses

the spectrum is still very close to the spectrum of an ideal impulse train for a large

number of modes.

For non-ideal pulses with a pulse shape p(t), the pulse train and the corresponding

spectrum are given by

$$\sum_{n=-\infty}^{+\infty} p(t - nT) \;\Leftrightarrow\; P(f) \sum_{n=-\infty}^{+\infty} \delta\left(f - \frac{n}{T}\right) \tag{2.3}$$

where P (f) is the Fourier transform of p(t). The spectrum of the train of pulses

consists of ideal impulses, and the envelope is determined by the Fourier transform

of an individual pulse. If the pulses are Gaussian in nature then the spectrum of the

pulse train is also Gaussian in its envelope. This is illustrated in Fig. 2.1. For the

laser used in the current work, the pulse width is about 150 fs and the spectral width

is ∼ 5 nm. These pulses are much shorter than any time scale on the chip and can

effectively be treated as impulses.

Figure 2.1: A pulse train and its spectrum
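The comb-with-envelope structure of Eq. (2.3) can be checked numerically with a small discrete Fourier transform. The sketch below is only an illustration on an arbitrary sample grid (240 samples, one Gaussian pulse every 24 samples, each pulse truncated at the midpoint between neighbors); the grid, the pulse width, and the function names are assumptions, not parameters of the laser used in this work.

```python
import cmath
import math

def gaussian_pulse_train(n_samples, period, sigma):
    """Periodic train of Gaussian pulses, one pulse every `period` samples."""
    train = []
    for i in range(n_samples):
        offset = i % period
        distance = min(offset, period - offset)  # distance to nearest pulse centre
        train.append(math.exp(-0.5 * (distance / sigma) ** 2))
    return train

def dft_magnitude(x):
    """Magnitude of the DFT for non-negative frequency bins (direct O(n^2) sum)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

N, PERIOD, SIGMA = 240, 24, 1.5
spectrum = dft_magnitude(gaussian_pulse_train(N, PERIOD, SIGMA))

line_spacing = N // PERIOD  # comb lines fall on multiples of this bin (1/T spacing)
lines = [spectrum[k] for k in range(0, len(spectrum), line_spacing)]
between = max(spectrum[k] for k in range(len(spectrum)) if k % line_spacing)

print("first comb lines:", [round(v, 3) for v in lines[:4]])
print("largest value between lines:", between)
```

The energy appears only at multiples of the repetition frequency, and the line heights fall off smoothly, following the transform of the single pulse.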

Large amplitude, large bandwidth, fast rising and falling edges, and low pulse-to-

pulse jitter (Fig. 2.2) are very useful properties of short pulses in optical interconnects.

The following sections give a brief overview of the different advantages of using short

pulses in interconnects, which form the motivation for this work.
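The bandwidth figures quoted in wavelength and in frequency can be related with the small-span approximation Δν ≈ cΔλ/λ². The sketch below applies it to a 5 nm span, assuming a center wavelength of 850 nm (the operating wavelength of the modulators; the exact laser center wavelength is an assumption here). The function name is hypothetical.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def wavelength_span_to_frequency_span(center_wavelength_m, span_m):
    """Small-span approximation: delta_nu = c * delta_lambda / lambda^2."""
    return C * span_m / center_wavelength_m ** 2

delta_nu = wavelength_span_to_frequency_span(850e-9, 5e-9)
time_bandwidth_product = delta_nu * 150e-15  # using the 150 fs pulse width

print(f"frequency span: {delta_nu / 1e12:.2f} THz")
print(f"time-bandwidth product: {time_bandwidth_product:.2f}")
```

Under these assumptions a 5 nm span corresponds to roughly 2 THz. How the resulting time-bandwidth product compares with the transform limit depends on the actual pulse shape and on how the two widths are defined.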

Figure 2.2: Short pulse properties (labels: low pulse-to-pulse jitter (< 3 ps rms), large amplitude, very large bandwidth (> 2 THz), 150 fs)

2.1 Improved receiver performance

The optical power budget is an important constraint in providing a large number

of IO between chips. Reducing the amount of optical power required will allow a

larger number of IO. Keeping everything else the same while reducing optical power

in a link requires larger amplification in the receiver, thus increasing the size of

the receiver and the amount of electrical power dissipation. By using short pulses,

the optical power required by the receiver can be reduced, without increasing the

amplification, because short pulses have all the energy concentrated in a very short

period. Sensitivity enhancement of transimpedance receivers with short pulses was

first mentioned by Boivin et al. [47]. In the case of NRZ data, while the input is

being charged, the charge leaks away through the feedback resistor, giving a smaller

peak swing for the same amount of energy compared to the short pulse input. This

is schematically illustrated in Fig. 2.3. For an integrating receiver, all the energy

is concentrated in the integrating period when short pulses are used. With NRZ

data, optical energy incident during the resetting period is wasted as illustrated in

Fig. 2.4. Thus the sensitivity of an integrating receiver improves by using short pulses,

though the extent of this improvement depends on the fraction of clock cycle used

for resetting. Chapter 4 expands on this idea, and presents the advantages of short

pulses with different receiver architectures.
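The leakage argument for the transimpedance receiver can be illustrated with a first-order model. This is a deliberately simplified sketch, assuming the input node behaves as a single capacitance discharging with one effective time constant; the component values and function names are hypothetical and are not the receiver designs of Chapter 4.

```python
import math

def peak_swing_short_pulse(charge, capacitance):
    """All charge arrives at once, so the node steps by Q/C before any leaks away."""
    return charge / capacitance

def peak_swing_nrz(charge, capacitance, tau, bit_period):
    """Same charge delivered as a constant current over one bit period.

    The node follows I*R*(1 - exp(-t/tau)) with R = tau/C and peaks at t = bit_period.
    """
    current = charge / bit_period
    resistance = tau / capacitance
    return current * resistance * (1.0 - math.exp(-bit_period / tau))

def swing_ratio(tau_over_bit_period):
    """NRZ peak swing divided by short-pulse peak swing (dimensionless)."""
    x = tau_over_bit_period
    return x * (1.0 - math.exp(-1.0 / x))

# Hypothetical values: 10 fC per bit, 100 fF input node, tau equal to the bit period.
v_short = peak_swing_short_pulse(10e-15, 100e-15)
v_nrz = peak_swing_nrz(10e-15, 100e-15, tau=1e-9, bit_period=1e-9)
penalty_db = 10.0 * math.log10(v_short / v_nrz)

print(f"short pulse swing: {v_short * 1e3:.1f} mV")
print(f"NRZ swing:         {v_nrz * 1e3:.1f} mV")
print(f"NRZ energy penalty in this model: {penalty_db:.2f} dB")
```

In this model the NRZ swing is always smaller for the same energy, and the gap widens as the time constant becomes short compared with the bit period. The actual enhancement depends on the receiver design.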

Figure 2.3: Sensitivity enhancement in transimpedance receiver with short pulses

Figure 2.4: Timing diagram of the integrating receiver with short pulse and NRZ inputs. Energy incident during the evaluation phase is not integrated. (Labels: NRZ input, short pulse input, integration phase, evaluation phase.)

2.2 Low latency in receivers

For on-chip connections, because of increasing clock speeds and reducing line-widths,

it is becoming increasingly difficult to send data across the chip in one clock cycle.

For example, on a 2 cm wide chip, repeatered global interconnections would require

∼ 330 ps assuming the speed of propagation of the signal to be roughly c/5 (c is the

velocity of light in vacuum) [6]. For optical interconnects to be a viable alternative,

the latency of optical links needs to be lower. It is potentially possible to reduce the

latency of an optical link by using short pulses instead of NRZ data format. The

latency of a transimpedance receiver can be reduced by ∼ 65%, if short pulses are

used. The latency of the integrating receiver can also be reduced significantly at the

expense of timing margin. The lowest latency in a receiver can be achieved by using

an amplifier-less scheme. A totem-pole structure of detectors connected to a high

impedance node can be charged or discharged to full supply levels using short pulses

in a very short period. The latency of transimpedance, integrating and totem-pole

receiver architectures is analyzed in Chapter 5.
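The ∼ 330 ps figure for the 2 cm chip follows from the stated propagation speed of c/5. As a worked check, with L denoting the wire length and c rounded to 3 × 10⁸ m/s:

```latex
t = \frac{L}{c/5} = \frac{5L}{c}
  = \frac{5 \times 0.02\,\text{m}}{3 \times 10^{8}\,\text{m/s}}
  \approx 3.3 \times 10^{-10}\,\text{s} \approx 330\,\text{ps}
```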

2.3 Better synchronization

There are two aspects to synchronization in a system. One is to have all the channels

in a parallel link synchronized to each other, and the other is to provide accurate

clock to all parts of the chip. Short pulses can improve the synchronization of the

system because of fast rise and fall times, and low cycle-to-cycle jitter. In an imple-

mentation with multiple parallel channels, the drive waveforms have skew and jitter

due to process variations, temperature variations, and noise on supply lines. This

causes phase misalignment at the receiver, and imposes a system power penalty. As

mentioned earlier in this chapter, short pulses are like impulses and they effectively

sample the state of the modulator. By using short pulses to read out the parallel

channels at a nominal bit center, the effect of skew and jitter from those channels can

be removed. Fig. 2.5 conceptually shows the removal of skew from different channels.

Similarly, the effect of jitter can be removed. There is a limit though, to the amount

of skew and jitter that can be removed, namely up to half a bit period.
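The half-bit-period limit can be illustrated with a small idealized sketch, in which each drive waveform is a perfect NRZ signal delayed by a fixed skew and the short pulse is modeled as an ideal sampling instant at the nominal bit center. The function names, the bit pattern, and the skew values are hypothetical.

```python
def drive_level(bits, bit_period, skew, t):
    """Level of an ideal NRZ drive waveform delayed by `skew`, observed at time t."""
    index = int((t - skew) // bit_period)
    if 0 <= index < len(bits):
        return bits[index]
    return None  # outside the transmitted data

def sample_at_bit_centers(bits, bit_period, skew):
    """Read the channel with ideal impulses placed at the nominal bit centres."""
    return [
        drive_level(bits, bit_period, skew, (k + 0.5) * bit_period)
        for k in range(len(bits))
    ]

bits = [1, 0, 1, 1, 0, 0, 1, 0]
bit_period = 1.0

for skew in (-0.4, 0.0, 0.4, 0.6):
    read = sample_at_bit_centers(bits, bit_period, skew)
    status = "recovered" if read == bits else "corrupted"
    print(f"skew = {skew:+.1f} bit periods: {status}")
```

Channels whose skew stays within half a bit period are all read correctly by the same sampling instants, while a channel skewed by more than half a bit period is read one bit off.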

Figure 2.5: Skew removal from multiple parallel channels using short pulses. The three waveforms are electrical drive signals and they are read by a short pulse which samples all the channels at the same time.

It is also possible to inject a very precise clock using short pulses. This clock

can be potentially used to retime the data coming in on parallel optical IO. Apart

from this, the precise clock can find application in testing and debugging. In these

applications, a very low jitter clock is required to characterize waveforms on the chip.

Typically, amplifiers in the optical receiver also introduce jitter. To circumvent this

problem, an amplifier-less scheme is proposed, which is capable of providing a very

precise clock. Synchronization and clocking are dealt with in Chapter 6.

2.4 Wavelength division multiplexing (WDM)

Figure 2.6: Spectral slicing of short pulse spectrum for WDM (axis: wavelength; marked width ∼ 5 nm)

In backplanes of current routers, the volume available for wiring is limited. The use

of WDM can potentially reduce the number of wires by transmitting multiple channels

on one fiber. A 150 fs pulse has a bandwidth of ∼ 5 nm. This broad bandwidth can be

split into multiple frequency bands, and each band can be modulated independently.

These bands are orthogonal to each other and can be combined to be sent through, say,

a single fiber and then split again at the receiving end. Fig. 2.6 shows the concept

of splitting the spectrum to generate multiple channels. By using a single source

to generate multiple channels, many system aspects are also simplified. Different

channels are carved out of a single spectrum, hence they automatically maintain the

wavelength separation and require no monitoring. In contrast, a laser-based WDM

system requires a very careful monitoring of the wavelengths of lasers so they do not

drift into neighboring channels. Removing the monitoring requirement reduces the

cost of the system. In the case of short pulses, the channels are also synchronized

while going to the receiver as shown in the previous section. The received data on

all the channels can then be sampled using a single clock, reducing the complexity of

the system.
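The bookkeeping behind spectral slicing can be sketched as dividing the available span into contiguous bands. The example below assumes a flat 5 nm span centered at 850 nm, five equal channels, and no guard bands; the channel count, the center wavelength, and the function name are hypothetical, and a real pulse spectrum is not flat, so equal-width slices would not carry equal power.

```python
def slice_spectrum(center_nm, span_nm, n_channels):
    """Split [center - span/2, center + span/2] into equal contiguous bands."""
    if n_channels < 1:
        raise ValueError("need at least one channel")
    width = span_nm / n_channels
    start = center_nm - span_nm / 2.0
    return [(start + i * width, start + (i + 1) * width) for i in range(n_channels)]

channels = slice_spectrum(center_nm=850.0, span_nm=5.0, n_channels=5)
for i, (low, high) in enumerate(channels):
    print(f"channel {i}: {low:.1f} nm to {high:.1f} nm")
```

Because every band is cut from the same source spectrum, the channel edges are fixed relative to one another, which is the property that removes the need for per-channel wavelength monitoring.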

Chapter 7 goes into the details of the implementation of a short pulse-based WDM

Chapter 3

Optical Interconnect Setup and

Components

In this chapter we will describe technology that is common to the work in subsequent

chapters. Specifically, we will discuss the optical apparatus, the optoelectronic de-

vices, the integration technology, the overall layout of the silicon CMOS chips, and

some of the relatively standard circuits used on the chips.

A schematic of a generic dense chip-to-chip optical link based on modulators is

shown in Fig. 3.1. Either a short pulse beam from a modelocked laser or a continuous

wave (cw) beam is incident on a diffractive optical element (DOE), which fans out this

beam into multiple beams. These beams are modulated by an array of modulators

driven by the electrical signals from the chip. The modulated beams are imaged on

the receiver chip. The output of the receiver drives either, a) an output electrical pad

for direct testing; b) an on-chip bit error rate tester for evaluating link performance; or

c) another modulator for optical verification of the received data. All-optical testing

by reading out the modulator driven by the receiver eliminates the need for high-speed

output electrical pads from the chip.

For the present work, the optomechanics was designed on a breadboard style

system based on slotted stainless steel baseplates. GaAs-based MQW diodes acting

as modulators and photodetectors were flip-chip bonded to silicon CMOS chips. The

optomechanics and optical test bench setup are described in the next section. The

Figure 3.1: Schematic diagram of an optical interconnect system

properties and operation of MQW diodes are presented in Section 3.2. Section 3.3

describes the silicon chips designed for this work. Finally, Section 3.4 deals with the

hybrid integration of MQW diodes and silicon chips.

3.1 Optical test bench

The implementation of the dense chip-to-chip optical link was done using slotted

stainless steel baseplates (Fig. 3.2). The input optical beam (cw or short pulse) was

fanned out into 20 beams for 10 linear differential channels using a diffractive optical

element (spot array generator). Beam steering was done by a pair of Risley prisms,

which moved the beam by small amounts when they were rotated. The chips were

mounted on XYZ stages, external to the baseplate, to provide better controllability

of the placement. The alignment of the beams was done visually by viewing with the

imaging cameras, shown at the top of Fig. 3.2.

In optical testing, optomechanics is an essential element. Precision in alignment

and stability are required for repeatable measurements. The slotted stainless steel

Figure 3.2: Optomechanical setup for testing (labels: short pulse beam, slotted baseplate, spot array generator, imaging cameras, beam readout)

baseplates used in the current work satisfy these requirements, simultaneously pro-

viding easy reconfigurability for a low setup time. This kind of setup was reported by

Brubaker et al. [48]. These baseplates are precision milled to 1 µm flatness over the

entire surface. All the components in a given slot are aligned to a common optical

axis, which is the same as the mechanical axis of the slots. The overall assembly with

baseplates is mechanically and thermally very stable. The baseplate setup minimizes

the time required for assembly and alignment, because all the components are on a

single optical axis. The optical components are mounted in circular cells which are

custom designed and placed on precision milled slots in the baseplate. The compo-

nents are held in place by using ceramic magnets, providing a stable arrangement

after alignment. For a given implementation, a custom layout of the slots is generally

required, unless optical path lengths are not critical. For the case of non-critical path

lengths, an arbitrary grid of slots can be used, providing flexibility and convenience.

The DOE used in this setup was an eight level phase-only mask fabricated by

Digital Optics Corporation. It was a representation of the Fourier transform of the

required pattern. The intensity of 20 fan out beams generated by the DOE was

uniform within 90% at the wavelength of interest, i.e. 850 nm. The design of a DOE

is explained in Refs. [49] [50].

There are many ways of generating short pulses. One way is to drive a laser

with very short current spikes as in Refs. [51] [52] [53]. Another way is to use either

active, passive, or hybrid modelocking in lasers. Active modelocking is typically done

by driving the laser from an external modulation source [54]. Passive modelocking

involves a saturable absorber in the laser cavity or Kerr nonlinear lensing [55]. For

this work, short pulses were generated by a commercial modelocked Ti:sapphire laser

at 80 MHz. The availability of a high power and a high-repetition-rate commercial

laser is presently limited by a relatively small demand, though in research, many

high-repetition-rate modelocked lasers have been demonstrated, e.g. in Ref. [56].

3.2 MQW diodes

Chapter 1 gave a basic overview of the design and operation of MQW diodes. These

diodes are p-i-n structures with quantum wells in the i region. They work as modula-

tors on the basis of quantum-confined Stark effect (QCSE). MQW diodes fabricated

in GaAs exhibit strong QCSE around a wavelength of 850 nm. These diodes can not

only be used as modulators, but also as photodiodes. Being able to use the same

device for modulation and reception simplifies the design of an interconnect system.

The MQW diodes used in this work were first-generation devices fabricated at

Stanford. They exhibited the Fabry-Perot effect, because an anti-reflection coating

could not be used for processing reasons. This Fabry-Perot effect degraded the per-

formance of the devices.

The overall size of these diodes after fabrication was 40 × 80 µm². As photodiodes

these devices had a responsivity of about 0.13 A/W, a fourth of the expected value of

0.5 A/W. The maximum responsivity of a GaAs photodiode can be 0.66 A/W, a limit

corresponding to one electron per photon at a photon energy of 1.5 eV (850 nm). The

responsivity of 0.5 A/W is routinely achieved with anti-reflection coating [30]. With

a voltage swing of ∼ 3 V, the contrast ratio of these diodes when used as modulators

was about 1.3:1, which was much below the expected value of 2:1.
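The responsivity figures quoted above can be checked with a few lines of arithmetic. The sketch below is purely illustrative: it evaluates the one-electron-per-photon limit from the photon energy and from the wavelength, and the quantum efficiency implied by a given responsivity. The function names are hypothetical and are not part of the original work.

```python
# Physical constants (exact SI values).
Q = 1.602176634e-19   # elementary charge, C
H = 6.62607015e-34    # Planck constant, J*s
C = 2.99792458e8      # speed of light, m/s


def ideal_responsivity_from_energy(photon_energy_ev: float) -> float:
    """Responsivity in A/W for one electron per photon: R = q / (h*nu) = 1 / E[eV]."""
    return 1.0 / photon_energy_ev


def ideal_responsivity_from_wavelength(wavelength_m: float) -> float:
    """The same limit written with the wavelength: R = q * lambda / (h * c)."""
    return Q * wavelength_m / (H * C)


def quantum_efficiency(responsivity: float, photon_energy_ev: float) -> float:
    """Fraction of the one-electron-per-photon limit that is actually reached."""
    return responsivity / ideal_responsivity_from_energy(photon_energy_ev)


print("limit at 1.5 eV  :", ideal_responsivity_from_energy(1.5), "A/W")
print("limit at 850 nm  :", ideal_responsivity_from_wavelength(850e-9), "A/W")
print("QE at 0.13 A/W   :", quantum_efficiency(0.13, 1.5))
print("QE at 0.50 A/W   :", quantum_efficiency(0.50, 1.5))
```

With the rounded photon energy of 1.5 eV the limit is about 0.67 A/W; using the 850 nm wavelength directly gives a slightly higher value, because the photon energy at 850 nm is a little below 1.5 eV.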

Because of low contrast ratio, these modulators were used differentially to increase

the signal strength. For single-ended electrical circuits, these diodes can be connected

in a totem-pole configuration to provide differential optical output, as in Fig. 3.3. In

this figure, a schematic and the corresponding picture of a totem-pole connected diode

pair is shown. The same configuration can be used at the receiver for differential

optical and single-ended electrical input. Two diodes can also be used separately in

a fully differential configuration.

Figure 3.3: Schematic and the picture of totem-pole connected diodes

The bonded capacitance of these diodes was originally expected to be 100 fF but

it was actually ∼ 260 fF. This capacitance was measured by using ring-oscillators on

the silicon chip. The oscillation frequencies of an unloaded oscillator and an MQW

diode-loaded oscillator were compared [57]. This large deviation in the capacitance

of these diodes affected the performance of the circuits quite adversely.

3.3 Silicon chips

Most of the receiver testing was done on two CMOS chips fabricated using different

technologies. One chip was fabricated in the 0.5 µm HP process and the other was fabricated

in the 0.25 µm National Semiconductor process. This section describes

the circuits on these chips.

The layout of the chip fabricated in the 0.5 µm process is shown in Fig. 3.4. The

chip consists of linear arrays of transceivers. The receiver output is connected to

Figure 3.4: Layout of the chip fabricated in the 0.5 µm HP process. (Figure labels: PRBS generators 1 and 2, BER testers 1 and 2, transceiver circuits, VCO for noise testing.)

a modulator driver, so that the received data can be verified by reading the state

of the modulator. For this testing, the modulator is driven by the receiver output,

and a continuous wave beam reads out the modulator state. The modulated beam

can be observed by using a commercial photodiode. This allows for all-optical test-

ing of receivers, eliminating issues associated with high speed electrical pads. Both

transimpedance and integrating receivers are designed on this chip. Because of some

fabrication issues, the transimpedance receivers did not function correctly on this chip.

To test the performance of the receiver in terms of bit error rate, pseudo-random bit

sequence (PRBS) generators were incorporated on the chip. The details of the design

of PRBS generators and bit error rate tester circuits are given in Section 3.3.2. To

test the robustness of the receivers with substrate noise, voltage-controlled oscillators

were also designed on this chip. These oscillators were capable of generating substrate

noise at different frequencies. Receiver test circuits with outputs to electrical pads

were also accommodated.

Figure 3.5: Layout of the chip fabricated in the 0.25 µm National Semiconductor process. (Figure labels: receiver-transmitter pairs, receivers connected to samplers, ring oscillators to measure silicon detector capacitance, silicon detectors connected to samplers.)

The layout of the chip fabricated in the 0.25 µm process is shown in Fig. 3.5.

This chip has receiver-transmitter pairs for all-optical testing as in the previous chip.

Receiver-transmitter pairs on this chip are designed so that the latency measurements

of on-chip interconnects can be performed. Apart from all-optical testing, electrical

samplers are put on this chip for the probing of internal nodes of the circuits, described

in Section 3.3.3. The chip also contains silicon detectors for tests on optical clock

injection. The sampler circuits fabricated on this chip are high voltage samplers,

which detect voltages above ∼ 1 V.

The design of the receivers and their operation with short pulses forms an impor-

tant part of this dissertation and is separately dealt with in detail in Chapter 4.

3.3.1 Modulator driver

The contrast ratio of modulators improves with higher voltage swing [27]. If the

modulators can be driven with a swing equal to the supply voltage

of the chip, however, the driver design is simplified. As technology and

supply voltage scale down, it might become necessary to use higher-than-supply swings to get a

large enough contrast from modulators. Some circuits for high voltage swing drive

to modulators are described in Ref. [58]. Here, the modulator driver is designed to

generate the supply swing on the modulator.

Figure 3.6: Eye diagram of modulator driver operation at 800 Mb/s obtained by optical readout of the modulator.

A modulator driver is a chain of buffers designed to drive ∼ 100 fF of capacitance.

Because the modulator capacitance was larger than expected, the driver chain was

not able to operate at very high speeds. In simulation for both 0.5 µm and 0.25 µm

technology with 100 fF of capacitance, the modulator driver was able to operate in

excess of 1 Gbps, while the fabricated modulator driver with bonded diodes operated

only up to 800 Mbps in 0.5 µm technology. The eye diagram is shown in Fig. 3.6.

A similar performance was obtained on the 0.25 µm chip. Because of limited drive

capability, the modulator driver was not able to drive certain pulse outputs related

to short pulse testing.

3.3.2 Pseudo random bit sequence (PRBS) generator and

tester

A pseudo random sequence can be generated by using storage elements (e.g. flip-

flops) and an XOR gate in a feedback loop [59]. The connection of the XOR gate

depends on a polynomial reported by many researchers, for example see Ref. [60].

This structure is referred to as a linear feedback shift register (LFSR). In general,

the maximum period for an n-stage LFSR is $2^n - 1$. The signals generated from

an LFSR are not truly random, but pseudo-random sequences are well suited to test-pattern

generation because they can be reproduced and verified easily. For example, a length $2^7 - 1$

sequence generator is shown in Fig. 3.7. A feedforward circuit, such as the one shown

in Fig. 3.8, can verify a sequence generated from the earlier circuit. This circuit can

be used for bit error rate (BER) testing on the receiver side. It is important to note

that an LFSR must never enter the all-zero state, because it would then generate

zeros indefinitely. This state can occur only when the circuit starts up, so a

mechanism can be inserted to start the circuit from a fixed state that is not all zeros.
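The LFSR idea can be illustrated in software. The following is a minimal sketch, assuming the feedback recurrence b[n] = b[n-6] XOR b[n-7] (the widely used polynomial x^7 + x^6 + 1); the taps of the fabricated circuit are not stated here, so this choice, like the names `prbs7` and `period`, is an assumption made only for illustration.

```python
from itertools import islice


def prbs7(seed: int = 0x7F):
    """Yield the bits of a 2**7 - 1 pseudo-random sequence.

    Recurrence (assumed for this sketch): b[n] = b[n-6] XOR b[n-7].
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("an all-zero state would lock the LFSR at zero")
    while True:
        # Bit k of `state` holds b[n-1-k], so bits 5 and 6 are the two taps.
        new_bit = ((state >> 5) ^ (state >> 6)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def period(bits):
    """Smallest shift p for which the first half of `bits` repeats."""
    half = len(bits) // 2
    for p in range(1, half + 1):
        if all(bits[i] == bits[i + p] for i in range(half)):
            return p
    return None


bits = list(islice(prbs7(), 2 * 127))
print("period:", period(bits), "ones per period:", sum(bits[:127]))
```

The guard against a zero seed plays the role of the start-up mechanism mentioned above: any nonzero seed cycles through all nonzero register states before repeating.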

Figure 3.7: Schematic of an LFSR generating a pseudo random sequence of length $2^7 - 1$, where a square corresponds to a D flip-flop. (Figure labels: x0 to x7.)

Figure 3.8: Schematic of the circuit to verify the sequence generated by the LFSR shown earlier. (Figure labels: x1 to x7, data input, output.)

PRBS generators and testers were designed on the 0.5 µm technology chip. There

were two PRBS generators: the first one generating a $2^7 - 1$ length sequence and

the second one generating a $2^{22} - 1$ length sequence. All the tests were done with

the longer sequence generator; the short one was fabricated so that the entire bit

sequence could be seen on an oscilloscope for debugging. The $2^{22} - 1$ length sequence

generator was connected to an array of modulators to simulate a random bit stream.

Corresponding BER testers based on verification of the sequence generation logic were

also designed. In such a circuit, every error generated one transition at the output.

The total number of transitions in a given time period was counted to compute the

bit error rate.
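A feedforward checker of this kind can be sketched in software as follows. This is an illustration under stated assumptions, not the on-chip design: it assumes the recurrence b[n] = b[n-6] XOR b[n-7] for a $2^7 - 1$ sequence, and the helper names are hypothetical. In this particular sketch an isolated bit error raises the flag three times (once when the bad bit arrives and once as it passes each tap), so the way flags map onto counted errors here need not match the fabricated tester.

```python
from collections import deque
from itertools import islice


def prbs7(seed=0x7F):
    """Reference generator: b[n] = b[n-6] XOR b[n-7] (assumed taps)."""
    state = seed & 0x7F
    while True:
        new_bit = ((state >> 5) ^ (state >> 6)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def error_flags(received):
    """Feedforward check: flag = r[n] XOR r[n-6] XOR r[n-7]."""
    history = deque(maxlen=7)      # last seven received bits, oldest first
    flags = []
    for bit in received:
        if len(history) == 7:
            predicted = history[1] ^ history[0]   # r[n-6] XOR r[n-7]
            flags.append(bit ^ predicted)
        history.append(bit)
    return flags


clean = list(islice(prbs7(), 1000))
corrupted = list(clean)
corrupted[500] ^= 1                # inject one isolated bit error

print("flags, clean stream:    ", sum(error_flags(clean)))
print("flags, one bit in error:", sum(error_flags(corrupted)))
```

Because the checker predicts each bit from the received stream itself, it needs no seed or synchronization with the transmitter, which is the main attraction of the feedforward structure.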

3.3.3 Samplers

On-chip samplers allow the measurement of relatively high frequency content inside

the chip. This technique was proposed by Larsson and Svensson [62], and many

authors have since published different sampling methodologies, e.g. Ref. [63]. The

Figure 3.9: The circuit schematic of the on-chip sampler in 0.25 µm CMOS technology. All transistors are minimum length. (Yeung et al. [61]) (Figure labels: vsignal, vcalib, smpClk, smpClk_b, sample, calibrate, enable, vdd, gnd; transistor widths 1.6u, 3.2u, 4.8u, 12u; output to shared current mirror.)

high bandwidth of MOS transmission gates makes this idea possible. By the sub-

sampling of a repetitive signal with varying clock phase, high speed analog signals

can be reconstructed. Samplers were fabricated on the 0.25 µm technology chip for

sampling the silicon detector and receiver responses.
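The sub-sampling principle can be sketched numerically. The example below is only an illustration: the 1 GHz test waveform, its voltage levels, and the function names are hypothetical. One sample is taken per repetition of the signal, and the sampling phase is advanced by a small step each time, so a fast waveform is reconstructed from samples taken at a rate below the signal frequency.

```python
import math

PERIOD = 1e-9   # repetition period of the hypothetical on-chip signal, s


def waveform(t: float) -> float:
    """Hypothetical repetitive signal: a 1 GHz sine riding on 1.5 V."""
    return 1.5 + 0.5 * math.sin(2 * math.pi * t / PERIOD)


def equivalent_time_samples(signal, period: float, n_points: int):
    """Take one sample per repetition, stepping the clock phase each time.

    Returns (phase, value) pairs that trace out a single period.
    """
    phase_step = period / n_points
    samples = []
    for k in range(n_points):
        phase = k * phase_step
        t_sample = k * period + phase      # k-th repetition, k-th phase
        samples.append((phase, signal(t_sample)))
    return samples


reconstructed = equivalent_time_samples(waveform, PERIOD, 100)
for phase, value in reconstructed[:5]:
    print(f"phase {phase * 1e12:6.1f} ps -> {value:.4f} V")
```

The real-time spacing between consecutive samples is slightly more than one signal period, which is why the sampling clock and the readout electronics can be slow even though the reconstructed waveform has fine time resolution.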

A master-slave sample-and-hold switched-capacitor circuit forms the core of the

on-chip sampler, as shown in Fig. 3.9. A source follower buffer is placed between

master and slave nodes (marked by smp and hold on the schematic) to remove the

bandwidth limitation due to charge sharing. The hold voltage is transformed into

current and extracted out of the chip. Since the relationship between the sampled

voltage and the output current is not linear, we have multiplexed a calibration signal

at the input of the sampler. The sampler output current is calibrated with this input

signal before using the sampler. Every sampler is independently calibrated to account

for process and environmental variations across the chip. The transmission gates of

samplers are formed by PMOS transistors; therefore it is only possible to measure

signals above the threshold of the transistor (∼ 1 V). By extensive simulation, the

3 dB bandwidth of samplers was found to be ∼ 4 GHz.
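The calibration step amounts to inverting a measured, nonlinear voltage-to-current curve. A minimal sketch is given below; the square-law transfer function is a made-up stand-in for the real sampler characteristic, and all names are hypothetical. A table of (current, voltage) pairs is recorded by sweeping the calibration input, and a measured current is then mapped back to a voltage by linear interpolation.

```python
from bisect import bisect_left


def hypothetical_sampler_current(v: float) -> float:
    """Stand-in for the unknown nonlinear transfer of one sampler (amperes).

    Monotonic for v > 0.9 V, loosely mimicking a sampler that only
    responds to inputs above about 1 V.
    """
    return 1e-4 * (v - 0.9) ** 2


def build_calibration(v_points):
    """Sweep the calibration input and record (current, voltage) pairs."""
    return sorted((hypothetical_sampler_current(v), v) for v in v_points)


def current_to_voltage(table, i_meas: float) -> float:
    """Invert the calibration table by linear interpolation."""
    currents = [i for i, _ in table]
    k = bisect_left(currents, i_meas)
    k = min(max(k, 1), len(table) - 1)     # clamp to a valid segment
    (i0, v0), (i1, v1) = table[k - 1], table[k]
    return v0 + (v1 - v0) * (i_meas - i0) / (i1 - i0)


table = build_calibration([1.0 + 0.01 * k for k in range(151)])  # 1.0 V to 2.5 V
v_true = 1.737
v_est = current_to_voltage(table, hypothetical_sampler_current(v_true))
print(f"true {v_true:.4f} V, recovered {v_est:.4f} V")
```

Keeping one table per sampler mirrors the per-sampler calibration described above, since each sampler has its own transfer curve.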

3.4 Hybrid integration of GaAs devices

Ideally, one would like to integrate optoelectronic devices monolithically on silicon

chips. Silicon detectors have problems of speed and sensitivity in the near infrared,

and there are no viable silicon modulators or emitters for the kinds of densities,

efficiencies, and speeds required for interconnects to CMOS chips. One problem

with III-V devices for monolithic integration is that III-V compounds are not lattice

matched with silicon and hence cannot be grown without introducing many defects.

Also, the introduction of GaAs in a silicon foundry is often not acceptable because it

might have detrimental effects on silicon circuits.

A hybrid approach is more promising, because it avoids the aforementioned process

incompatibility issues. Using this approach, well established high performance silicon

circuits are combined with optically superior GaAs devices [64]. One technique of

integrating these devices is shown in Fig. 3.10. This technique is used for bonding

devices at Stanford. High yields have been demonstrated with large arrays of bonded

devices [30] [65] [66] [67] [68] [69].

Figure 3.10: Integration of GaAs devices on silicon chips. (Figure labels: anti-reflection coating, n+, i MQW, p AlGaAs, indium solder, epoxy, silicon.)

For bonding GaAs devices to silicon chips, pads for contact are placed at appro-

priate places on silicon chips. The completed silicon chips are then post-processed

to deposit a barrier layer, followed by gold and then indium. GaAs-based devices

are etched to form mesa structures. To make both n and p contacts planar, the n

and i regions are etched all the way down to the p layer and a thick indium bump is

deposited to get it to the level of indium contact on the n side. Step 1 of Fig. 3.10

shows a silicon chip and GaAs-based devices at this stage. In step 2, both are brought

together with heat and pressure, to join them. The GaAs substrate absorbs light at

850 nm, and the devices must be illuminated from the side of the GaAs

wafer. Hence, the GaAs wafer is removed by etching and an anti-reflection coating

is optionally deposited on the devices. After removing the wafer, the devices stand

apart as mesa structures. GaAs and silicon have different expansion coefficients with

temperature and by removing the GaAs wafer, the problem of thermal stress between

silicon and GaAs is eliminated.

Fig. 3.11 shows a picture of the CMOS chip with flip-chip bonded MQW diodes.

These diodes were 80 × 40 µm² in size. The diodes were fabricated and flip-chip

bonded at Stanford. It is quite possible to make smaller devices, which might be

preferable to reduce the photodiode capacitance. At Lucent, very small devices with

flip-chip pads of size 15 × 15 µm² have been fabricated [30]. Ten rows of these diodes

were fabricated with 20 diodes in each row. The spacing between the diodes in a row

was 62.5 µm and the rows were separated by 125 µm.

Figure 3.11: Picture of a CMOS chip with flip chip bonded diodes

3.5 Summary

A slotted-baseplate-based optical system was implemented for a dense chip-to-chip

optical link. An eight phase level DOE was used as a fan out element to gener-

ate 20 beams for modulation by the modulators. Short pulses were generated by a

modelocked Ti:Sapphire laser operating at 80 MHz. The GaAs-based MQW diodes

were used as modulators and photodetectors after flip-chip bonding on silicon CMOS

chips. As photodiodes, their responsivity was 0.13 A/W and their capacitance was

∼ 260 fF. As modulators, their contrast ratio was ∼ 1.3:1. These devices were first-

generation devices and these numbers were quite different from the expected values.

The performance of the circuits fabricated in the 0.5 µm and 0.25 µm technology was

adversely affected because of this variation. The modulator driver operated up to

800 Mbps. The BER tester, the pseudo random bit sequence generator, and on-chip

samplers were designed on the chips to facilitate the link testing.

Chapter 4

Receivers

Earlier chapters introduced the technology used in this work and the concept of a very

low duty cycle return-to-zero scheme for improved performance of links. This chapter

looks into the design of receiver circuits for short distance links, and highlights the

differences in operation with short pulse and NRZ data.

The design of optical receivers for short distances is similar to that of telecommunications

receivers in some ways, but the requirements are very different. Sensitivity is very important

for telecommunications receivers because they operate with very few photons

per bit. In contrast, receivers for interconnects trade off sensitivity for lower power

dissipation. The area and the cost of the receivers are more critical in short links

than in telecommunications. In telecommunications, the serial data rate is increased

for higher throughput (wavelength division multiplexing is also used) requiring re-

ceivers to run at very high speeds. In short links the throughput is increased by

increasing parallelism. The receiver for telecommunications is noise-limited while the

short link receiver is typically gain-limited. To get high overall throughput, receivers

in interconnects need to be densely packed with other circuits, where supply and

substrate noise generated from surrounding circuits and the electrical crosstalk from

other receivers can impair the performance. Hence, for receivers to operate in this

environment they should be immune to noise generated from surrounding circuits.

A lot of literature has been devoted to the design of receivers for telecommunica-

tions starting with the ground-breaking work by Personick [70]. A typical telecom-

munications receiver consists of a transimpedance stage, gain stages, a decision stage,

and an automatic gain control (AGC) module. In interconnects the number of stages

is generally minimized to reduce power consumption and total delay of the receiver.

Optical receivers can be operated with a single modulated optical beam, or with

differentially modulated optical beams. With a single beam implementation, especially

if the system is to be DC coupled, a reference signal needs to be generated on

the chip. With a differential beam implementation, by contrast, the reference information

is carried by the beams. It is also possible to have a fixed threshold determined by

the devices forming the receiver [71] [72]. In a single beam implementation, perfor-

mance can possibly be degraded by optical intensity variations and the noise in the

reference generation mechanism. Variations in the received optical power require the

reference to be dynamically varied. Due to difficulties in generating a good reference

on the chip, receiver sensitivity is enhanced by using differential beams. In telecom-

munications, it is very expensive to incorporate two fibers to carry differential beams

for every channel, but in the case of free space interconnects, doubling the number

of beams is not such a significant problem. Differential beams also double the signal

contrast, which is required in current modulator-based systems because of their limited

contrast ratio.

Receivers can be implemented in many different technologies. High performance

receivers have been demonstrated in BiCMOS [73], GaAs [74] [75], and silicon CMOS

[76] [77]. Because of advances in silicon CMOS technology and widespread use, the

cost of fabrication in this technology is very low. Also, a very high density of circuits

can only be achieved in silicon CMOS, making this a preferred choice for fabrication

of circuits. As detailed in Chapter 3, which also describes the overall chip design,

the circuits for the present work were fabricated in silicon CMOS.

Three receiver topologies are considered in this chapter: transimpedance, integrating,

and totem-pole stacked diode pair. Earlier work has primarily focused

on NRZ data input to receivers. We believe that the use of short pulses with receivers

leads to useful advantages, as will be shown in later sections.

A transimpedance receiver is a commonly used architecture in telecommunica-

tions. There is no clock required at the frontend which makes this receiver potentially

very fast. Synchronization to the local clock domain can be done after recovering the

signal to a full logic level. In this receiver the current generated by the photodiode

is converted to voltage by the transimpedance stage, which is amplified to full signal

swing by further amplification stages. The speed of this receiver is typically limited

by the frontend time constant, which is determined by the total capacitance at the

input node and the effective feedback resistance seen by the frontend. When a short

pulse format is used instead of NRZ, the large amplitude of short pulses increases

the sensitivity of the transimpedance receiver. The operation of this receiver and the

effect of changing various physical parameters are covered in Section 4.1.

An integrating receiver integrates input photocurrent and uses positive feedback

to make a decision. It is the most sensitive of the three architectures considered here.

This receiver requires a clock synchronized to the data input at the frontend. The

use of short pulses improves the timing margin of this receiver. The latency of this

receiver can be reduced at the expense of the timing margin, which is explained in

detail in Chapter 5. The sensitivity of the integrating receiver also improves by using

short pulses as mentioned in Section 4.2.

A totem-pole stacked diode pair is the simplest form of receiver. It works on the

principle of integrating the input optical power directly at the input node. Full swing

is generated at this node, which eliminates the need of any further amplification.

Removing amplification stages has the advantage of eliminating possible skew and

jitter introduced by the amplification circuitry. The operation of this receiver is

explained in Section 4.3.

The organization of this chapter is as follows. First the principles of operation of

three different receiver architectures are presented along with a comparison of their

performance with short pulse and NRZ input. The later sections give the fabrication

details and testing results of transimpedance and integrating receivers. The totem-

pole diode pair receiver is explained in detail in Chapter 6, as it is primarily used for

clock injection.

4.1 Transimpedance receiver

Transimpedance is the most common architecture for receivers for both telecommunications

and interconnects. This receiver does not require any clock at the frontend

(it is asynchronous), which makes it relatively easy to use. The design of this receiver

has been discussed in detail in many places in the literature for telecommunications [73]

[77] and interconnects [76] [78] [79] [80] [81]. We will summarize the operation of this

receiver frontend without going into a lot of detail, and then compare the performance

of this receiver for NRZ and short pulse input.

Figure 4.1: Transimpedance receiver structure. (Figure labels: vin, vout, Rf, frontend, post-amplifier chain.)

The transimpedance receiver structure is shown in Fig. 4.1. It consists of photodi-

odes connected to an inverting amplifier with resistive feedback and a post-amplifier

chain. The difference of the currents from the two photodiodes flows into the circuit.

A high-input-impedance inverting amplifier and the resistor form the transimpedance

stage, which converts the photocurrent flowing into the circuit into voltage. This volt-

age is then amplified by the post-amplifier chain and a decision is made about the logic

level. The transimpedance receiver has been analyzed for various optimizations in the literature,

e.g. in Refs. [81] [82].

This receiver was implemented in silicon CMOS with inverters acting as ampli-

fiers. This implementation was originally proposed by Woodward et al. [83] and later

analyzed in detail for NRZ data by Forbes [82]. Fig. 4.2 shows the schematic of the

transimpedance frontend and the small-signal equivalent circuit of its implementation.

The two stacked diodes convert an optically differential signal into a single-ended pho-

tocurrent input (iin). The DC light intensity incident on the two diodes is cancelled

out and only the difference current flows into the circuit. In the equivalent circuit

shown, gm and gds are the total transconductance and output conductance of the

MOS transistors respectively, CL is the total capacitive loading of all the components

connected at the output of the frontend, Ci is the total input capacitance of the

receiver, and Rf is the feedback resistance. The gain of the amplifier can also be

expressed as A = gm/gds. The transimpedance gain of the first stage is given by

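$$Z(s) = \frac{V_{out}(s)}{I_{in}(s)} = \frac{1/g_{ds} - A R_f}{(1 + A) + \tau s + s^2/\omega_o^2} \tag{4.1}$$

where $\tau = C_i R_f + C_L/g_{ds} + C_i/g_{ds}$ and $1/\omega_o^2 = C_i R_f C_L/g_{ds}$. $\zeta = \omega_o \tau/2$ is the damping

factor, which determines the settling time of the transient response.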
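For readers who want to see where the expression for the transimpedance gain comes from, the following is a short derivation sketch. It assumes the usual small-signal model: the photodiodes act as an ideal current source $i_{in}$ flowing into the input node, $C_i$ is lumped from the input node to ground, and the inverter is represented by $g_m$, $g_{ds}$, and $C_L$. These modelling choices are assumptions of the sketch rather than statements from the original analysis.

```latex
% Kirchhoff's current law at the input and output nodes
\begin{align}
  i_{in} &= s C_i\, v_{in} + \frac{v_{in} - v_{out}}{R_f}, \\
  0      &= g_m v_{in} + (g_{ds} + s C_L)\, v_{out} + \frac{v_{out} - v_{in}}{R_f}.
\end{align}
% Solve the second equation for v_in and substitute into the first
\begin{align}
  v_{in} &= -\,\frac{g_{ds} + s C_L + 1/R_f}{g_m - 1/R_f}\; v_{out}, \\
  \frac{v_{out}}{i_{in}} &=
    \frac{1 - g_m R_f}
         {(g_m + g_{ds}) + s\,(g_{ds} C_i R_f + C_L + C_i) + s^2\, C_i R_f C_L}.
\end{align}
% Divide numerator and denominator by g_ds and use A = g_m / g_ds
\begin{equation}
  Z(s) = \frac{1/g_{ds} - A R_f}
              {(1 + A) + s\left(C_i R_f + \dfrac{C_L}{g_{ds}} + \dfrac{C_i}{g_{ds}}\right)
               + s^2\, \dfrac{C_i R_f C_L}{g_{ds}}}.
\end{equation}
```

Matching the coefficients of $s$ and $s^2$ in the last line against $\tau s$ and $s^2/\omega_o^2$ reproduces the definitions of $\tau$ and $\omega_o$ given in the text.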

Figure 4.2: Schematic of the transimpedance frontend and the small-signal equivalent circuit of its implementation. (Figure labels: vin, vout, gds, CL.)

Based on Eq. 4.1, the transimpedance frontend was analyzed for short pulse and

NRZ inputs. Parameter values corresponding to 0.25 µm technology were assumed,

this being the technology in which this receiver was fabricated (Rf = 5 kΩ, Cin =

90 fF, A = 15, gds = 0.125 mS, CL = 60 fF; Cin is the Ci of Eq. 4.1). To simulate a current pulse input, a

pulse of 10 ps was assumed at the input. NRZ data was simulated by a step stimulus.

In the NRZ case, operation at 1 Gbps was assumed to compute energy in a bit period.

For equal energy in pulse and NRZ inputs, the time response of the frontend is

shown in Fig. 4.3. Short pulse input provides a larger peak response than does NRZ

input. This is because the large amplitude of the high frequency components of the

short pulse helps produce larger output amplitude maxima. Another way to explain

this is that the charge does not leak away at the input with short pulses while the

output peaks, in contrast to NRZ, and a larger output is generated for the same

total charge. Effectively, short pulses enhance the sensitivity of the transimpedance

receiver. Boivin et al. [47] first described this sensitivity enhancement with short

pulses for telecommunication applications. Winzer et al. [84] later extended this

analysis. Both of these references looked at the sensitivity enhancement in a

bandwidth-constrained transimpedance receiver for optimum thermal and shot noise

performance. For short pulse interconnects, thermal and shot noise do not limit the

performance and the bandwidth of the receiver can be quite large. This might allow

larger sensitivity gains.
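The comparison between pulse and step inputs can be reproduced approximately with a small time-domain simulation. The sketch below integrates the two node equations of the small-signal frontend model with a fourth-order Runge-Kutta scheme, using the parameter values quoted above. Several things are assumptions of this sketch: gds is taken as 0.125 mS, the photocurrent level of 10 µA is arbitrary, and the function and variable names are hypothetical.

```python
def simulate(i_in, t_end=1e-9, dt=1e-13,
             rf=5e3, ci=90e-15, gain=15.0, gds=0.125e-3, cl=60e-15):
    """Integrate the frontend node equations with RK4.

    i_in(t) is the input photocurrent in amperes; returns the vout samples.
    """
    gm = gain * gds

    def deriv(t, vin, vout):
        i_f = (vin - vout) / rf                      # current through Rf
        dvin = (i_in(t) - i_f) / ci                  # KCL at the input node
        dvout = (i_f - gm * vin - gds * vout) / cl   # KCL at the output node
        return dvin, dvout

    vin = vout = 0.0
    trace = [vout]
    for n in range(round(t_end / dt)):
        t = n * dt
        k1 = deriv(t, vin, vout)
        k2 = deriv(t + dt / 2, vin + dt / 2 * k1[0], vout + dt / 2 * k1[1])
        k3 = deriv(t + dt / 2, vin + dt / 2 * k2[0], vout + dt / 2 * k2[1])
        k4 = deriv(t + dt, vin + dt * k3[0], vout + dt * k3[1])
        vin += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        vout += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        trace.append(vout)
    return trace


BIT_PERIOD = 1e-9       # 1 Gbps NRZ bit period, s
PULSE_WIDTH = 10e-12    # electrical pulse width, s
I_NRZ = 10e-6           # hypothetical NRZ photocurrent, A
I_PULSE = I_NRZ * BIT_PERIOD / PULSE_WIDTH   # same charge per bit

step_trace = simulate(lambda t: I_NRZ)
pulse_trace = simulate(lambda t: I_PULSE if t < PULSE_WIDTH else 0.0)

step_peak = max(abs(v) for v in step_trace)
pulse_peak = max(abs(v) for v in pulse_trace)
print(f"step peak  : {step_peak * 1e3:.1f} mV")
print(f"pulse peak : {pulse_peak * 1e3:.1f} mV")
print(f"ratio      : {pulse_peak / step_peak:.2f}")
```

Holding the charge per bit fixed is what makes the comparison fair: the pulse delivers in 10 ps what the NRZ bit delivers over the full bit period, and the peak output is correspondingly larger.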

Figure 4.3: Pulse and step response of the transimpedance stage. (Horizontal axes: time in ns, 0 to 1; curves: pulse, step.)

We now look at the effect of variations of different components on the relative

performance with short pulses and NRZ data.

Effect of feedback resistance

The feedback resistance (Rf ) determines the transimpedance gain. Increasing the

value of the feedback resistance increases the transimpedance gain while reducing the

damping factor. The time response with different feedback resistance values is plotted

in Fig. 4.4. A larger feedback resistance causes a bigger amplitude for both pulse and

step responses.
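The two trends stated here, larger gain and smaller damping factor with larger feedback resistance, can be checked directly from Eq. 4.1. The sketch below evaluates the low-frequency transimpedance Z(0) and the damping factor as defined in the text, ζ = ωoτ/2, for the three resistances plotted in Fig. 4.4. The remaining parameter values are those quoted earlier, with gds assumed to be 0.125 mS; the function name is hypothetical.

```python
import math


def frontend_figures(rf, ci=90e-15, gain=15.0, gds=0.125e-3, cl=60e-15):
    """Return (Z(0) in ohms, damping factor) from the terms of Eq. 4.1."""
    tau = ci * rf + cl / gds + ci / gds
    omega_o = math.sqrt(gds / (ci * rf * cl))
    dc_gain = (1 / gds - gain * rf) / (1 + gain)
    zeta = omega_o * tau / 2          # damping factor as defined in the text
    return dc_gain, zeta


for rf in (3e3, 5e3, 7e3):
    dc_gain, zeta = frontend_figures(rf)
    print(f"Rf = {rf / 1e3:.0f} kOhm: Z(0) = {dc_gain:9.1f} Ohm, zeta = {zeta:.3f}")
```

The negative sign of Z(0) simply reflects the inverting amplifier; its magnitude is what grows with the feedback resistance.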

Figure 4.4: Pulse and step response of the transimpedance stage with varying feedback resistance. (Horizontal axes: time in ns, 0 to 1; curves: 3k, 5k, 7k.)

Comparing the response of a pulse input to that of a step input with different

feedback resistances (Fig. 4.5), a smaller feedback resistance gives larger relative

sensitivity enhancement for short pulses. This is because the bandwidth of the receiver

increases with lower feedback resistance and more frequency components appear at the

output of the transimpedance stage, even though the absolute amplitude reduces due

to smaller gain. Simultaneously, the pulse width also reduces with smaller feedback

resistance. Resistance cannot be reduced to a very small value because later stages

require a minimum pulse width to propagate the pulse, and also, very low gain is not

acceptable. For maximum gain, we would like to have the largest possible feedback

resistance, which would not broaden the pulse to the extent of causing inter-symbol-

interference (ISI) in the system at the bit rate of operation. The appropriate resistance

value is determined given the bit rate.

Figure 4.5: Pulse response of the transimpedance stage with varying feedback resistances normalized to the maximum of step response. (Horizontal axis: time in ns, 0 to 1; curves: 3k, 5k, 7k.)

Effect of input capacitance

The input capacitance (Cin) is dominated by the photodiode capacitance. Based on

Eq. 4.1, the pulse and step responses for different input capacitances are plotted in

Fig. 4.6. A smaller capacitance value produces larger amplitude with a given pulse

input. It is desirable to reduce the frontend capacitance to as small a number as

possible to improve the sensitivity of the receiver.

Effect of pulse width

In current simulations, an electrical pulse of 10 ps is assumed, based on device char-

acteristics. The carrier sweep-out time in a typical p-i-n photodiode is expected to

Figure 4.6: Pulse and step response of the transimpedance stage with varying frontend capacitance. (Horizontal axes: time in ns, 0 to 1; curves: 50 fF, 90 fF, 130 fF.)

be of this magnitude, limited by carrier transport. Fig. 4.7 shows the behavior of the

transimpedance stage when the electrical pulses generated at the output of the photodiode

are wider. For the same energy in the pulses, wider pulses produce

smaller peak amplitude at the output. If we broaden the pulses to a bit period, we get

the NRZ case. To get the largest peak amplitude we would like to use the narrowest

pulses possible.

Advantages and issues with short pulse operation

As seen above, short pulses can improve the sensitivity of the transimpedance receiver.

This sensitivity improvement for the entire receiver can be more than 3 dB. The

latency of this receiver can also be reduced significantly (up to 65%) by using short

pulses.

Apart from the above advantages of short pulses, there are some issues with us-

ing short pulses. Since short pulses create a fast transient response in the receiver,

Figure 4.7: Pulse response of the transimpedance stage with varying pulse width. (Horizontal axis: time in ns, 0 to 1; annotation: increasing pulse width.)

this transient could cause inductive supply noise (Ldi/dt), which might reduce the

signal-to-noise ratio in large arrays. Also, the output pulse width generated by the

transimpedance frontend can vary due to parameter variation. If the pulses at the

output are too short, then they will not be detected by the decision stage and if the

pulses are broader than the bit period, data dependent effects would degrade the

receiver performance.

The integrating receiver described in the next section solves these problems.

4.2 Integrating receiver

The concept of positive feedback for amplification and logic decision has been reported

in many places [85] [86] [87]. Based on a similar principle, the implementation of an

integrating receiver is shown in Fig. 4.8. This receiver is based on the strongarm latch

mentioned in Ref. [88]. In this implementation, differential input data is integrated

at the input nodes for half the clock cycle (clock low), during which the rest of the

circuit is put into a metastable state with both outputs at the supply voltage. In the

next half cycle (clock high), a decision is made about the received data. At the end of

this half cycle the input nodes are reset so that new data can be integrated (Fig. 4.9).

The output of this receiver has valid data output for only half the bit period, and for

the other half the output is at the precharge voltage. To convert this output to valid

data for the entire cycle, a set-reset (SR) latch is used. The SR latch limits the speed

of this receiver. The performance of the entire receiver can be improved by using a

modified latch to get a valid bit for the entire bit period. This has been demonstrated

in [89]. A modified latch was not implemented in this work.

Figure 4.8: Schematic of the integrating receiver frontend

Figure 4.9: Timing diagram of the operation of integrating receiver with NRZ and short pulse inputs.

The sensitivity of an integrating receiver is typically better than a transimpedance

receiver because of the use of positive feedback in the latch, though it requires a clock

synchronized to the data input at the frontend. The synchronized clock is typically

generated using a phase-locked loop (PLL) or a delay-locked loop (DLL). It can even

be generated by using a totem-pole diode pair as mentioned in Chapter 6. Data

extraction by automatic clock synchronization was not implemented in the present

work. The clock phase was aligned manually.

The voltage difference ∆V at the input nodes is a function of the total capacitance

of all the devices connected at the input nodes (Cin) and the difference of input optical

powers in the two beams (∆Pin). In a simplified form, this voltage difference can be

represented as

$$\Delta V = \frac{\Delta P_{in}\, R\, t}{C_{in}}$$

where R is the responsivity of the photodiodes and t is the time of integration.

A larger ∆V gives a faster response and is also less likely to be affected by the

noise sources. This equation suggests that by reducing Cin, the differential voltage

generated by the input light can be increased. Cin is dominated by the photodiode

capacitance, which can be reduced by reducing the size of the diode or by using a

different kind of diode like a metal-semiconductor-metal (MSM) diode, or by using

a silicon-on-insulator (SOI) process. For short pulses, the time for which the charge

integrates is the pulse width, which is very short compared to the bit period. For

NRZ, the integration time is half the cycle.
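As a rough numerical illustration of the simplified relation above, the following sketch evaluates ΔV = ΔP_in R t / C_in, with the input capacitance C_in in the denominator as the discussion of C_in implies. The 20 µW power difference, 0.5 A/W responsivity, and 0.5 ns integration time are assumed values chosen only for illustration; the 260 fF and 40 fF capacitances echo values quoted elsewhere in this thesis, and the function name is hypothetical.

```python
def input_voltage_difference(delta_p_in, responsivity, t_int, c_in):
    """Simplified integrating-frontend swing: dV = dP_in * R * t / C_in (SI units)."""
    return delta_p_in * responsivity * t_int / c_in


# Assumed illustrative operating point (not measured values from this work).
DELTA_P_IN = 20e-6    # W, difference in optical power between the two beams
RESPONSIVITY = 0.5    # A/W
T_INT = 0.5e-9        # s, half of a 1 ns bit period

dv_260 = input_voltage_difference(DELTA_P_IN, RESPONSIVITY, T_INT, 260e-15)
dv_40 = input_voltage_difference(DELTA_P_IN, RESPONSIVITY, T_INT, 40e-15)

print(f"C_in = 260 fF: dV = {dv_260 * 1e3:.1f} mV")
print(f"C_in =  40 fF: dV = {dv_40 * 1e3:.1f} mV")
```

Under these assumptions, the same integrated charge produces a swing that scales inversely with C_in, which is why the text emphasizes reducing the photodiode capacitance.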

Advantages with short pulses

The problems encountered in short pulse operation with the transimpedance receiver

are not present in this receiver. Since short pulses are integrated at the receiver

frontend, no spikes are generated in the receiver supply. Also, the receiver does not

generate short pulses; instead, it generates a 50% duty cycle output, which can easily

be converted into NRZ data for subsequent use.

In a short pulse link, an integrating receiver with a clocked frontend has significant

advantages over NRZ signaling. These advantages include sensitivity enhancement,

tolerance to pulse arrival time, latency reduction, and improved supply noise perfor-

mance. The following sections discuss these advantages.

Sensitivity enhancement

If the integration period is half the clock cycle, then the timing diagram for this

receiver is as shown in Fig. 4.9. NRZ input incident during the evaluation period is

not integrated or utilized. By contrast, short pulses have all the energy concentrated

in a very brief period and it is integrated during the integration period. When the

integration and the evaluation period are the same, half of the energy in the NRZ

input is wasted. Effectively, the use of short pulses gives a 3 dB enhancement in

sensitivity.
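The 3 dB figure follows directly from the fraction of the NRZ bit energy that falls inside the integration window. The sketch below is a minimal illustration, assuming an ideal rectangular NRZ bit and a short pulse that lands entirely inside the window; the function name is hypothetical.

```python
import math


def short_pulse_gain_db(integration_fraction):
    """Sensitivity gain (dB) of short pulses over NRZ when only
    `integration_fraction` of the NRZ bit energy is integrated."""
    if not 0.0 < integration_fraction <= 1.0:
        raise ValueError("integration_fraction must be in (0, 1]")
    return 10.0 * math.log10(1.0 / integration_fraction)


# Integration and evaluation periods equal: half of the NRZ energy is used.
gain_db = short_pulse_gain_db(0.5)
print(f"Sensitivity enhancement: {gain_db:.2f} dB")
```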

Tolerance to pulse arrival

Figure 4.10: Input data arrival-tolerance margins illustrated for NRZ and short pulse inputs.

If short pulses are used, then the pulses can arrive anytime during the integration

period and all the charge will be integrated. For these very short pulses, the flexibility

in arrival time is about half a bit period. A similar flexibility exists with NRZ data,

but it depends on the rise and fall times of the bit, which are typically much larger

than the rise and fall times of the short pulses. The margin is then reduced by the

sum of the rise and fall times of NRZ input.

Fig. 4.10 shows how the rise and fall times of NRZ input reduce the tolerance to

the arrival time of data. In this figure tnrz and tsp are tolerances to arrival for NRZ

and short pulse input respectively.
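The margins can be compared numerically with a small sketch. It assumes, as a simplification of the description above, that the tolerance equals the integration window minus the sum of the input rise and fall times. The 400 Mbps bit rate matches the link tested later in this chapter, while the 200 ps and 1 ps edge times are illustrative assumptions and the function name is hypothetical.

```python
def arrival_tolerance(integration_window, rise_time, fall_time):
    """Arrival-time margin: window minus the input rise and fall times (seconds)."""
    return max(0.0, integration_window - (rise_time + fall_time))


BIT_PERIOD = 2.5e-9           # s, 400 Mbps
WINDOW = BIT_PERIOD / 2       # integration lasts half the clock cycle

t_nrz = arrival_tolerance(WINDOW, 200e-12, 200e-12)  # assumed NRZ edge times
t_sp = arrival_tolerance(WINDOW, 1e-12, 1e-12)       # assumed short pulse edges

print(f"t_nrz = {t_nrz * 1e12:.0f} ps, t_sp = {t_sp * 1e12:.0f} ps")
```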

Latency reduction

The time taken to generate a valid output from the time of arrival of data can be

reduced by using short pulses instead of NRZ data. The latency (the total delay

between input and output) is reduced at the expense of the timing margin. If the

pulses arrive closer to the end of the integration period, then the delay from input to

output can be reduced in this receiver, though the receiver performance is susceptible

to variation in pulse arrival time. For example, if the pulses are delayed too much,

then they fall outside the integration period and the charge is not integrated, causing

an error at the output. The latency reduction with short pulses is explained in detail

in Chapter 5.

4.3 Totem-pole diode pair receiver

The receivers mentioned above have a relatively high latency because they have one or

more stages of amplification, each of which introduces delay. Controlling or limiting

latency is, however, crucial for on-chip interconnects. Also, the amplification stages

in receivers add skew and jitter, which could be a problem in a large receiver array

or optical clock injection. By eliminating the amplification stages and generating

full swing at the diodes, the latency of the receiver can be reduced, and skew and

jitter associated with amplification stages can be avoided. This leads to a totem-

pole diode-pair-based receiver implementation, or, in short, a “totem-pole” receiver

implementation.

A stacked diode pair (“totem-pole”) is connected to a high impedance node, pos-

sibly a buffer, to create an integrating frontend (Fig. 4.11). This receiver qualifies

as an integrating receiver but is treated separately here because of its interesting

characteristics for short pulse operation, especially for clock injection. If the data

beam is incident on the top diode, then the current flows into the circuit; if the data

Figure 4.11: Totem-pole diode pair connected to a high impedance input node of an inverter.

is incident on the bottom diode, then the current flows out of the circuit. As soon

as the node “in” is charged to a supply rail, the diodes become forward biased, clamping the

voltage on node “in”.

This receiver trades off the electrical gain stages for additional optical power. A

few researchers have thought of using only the diode as a receiver, but primarily for

telecommunications applications where photons are scarce. Williams et al. used the

photodiode with an erbium doped fiber amplifier (EDFA) to boost the light intensity

to generate a large voltage swing [90]. Yoneyama et al. have hypothesized a receiver

consisting only of a photodiode and estimated power dissipation in links as a function

of bit error rate [91]. In contrast to telecommunications, interconnects tend to have

a larger optical power at the receiver, making this receiver architecture more feasible.

Also, in the references mentioned above, the output of the photodiodes is connected

to a 50 Ω resistance, which might require larger optical power for operation compared

to driving a high-impedance node (e.g. a capacitance). The capacitance seen by the

flip-chip bonded photodiode connected to an inverter circuit could be below 100 fF,

making this high impedance application more attractive.

There are many advantages of using this structure apart from its simplicity. This

receiver can operate at very high speeds by using short pulses, because the charging

time of the input node is determined by the carrier transit time inside the diodes,

which is of the order of a few picoseconds. Full swing signals are generated in a single

stage eliminating the jitter and skew from amplifier stages. A single stage also reduces

the latency of the receiver, which can potentially make on-chip optical interconnects

feasible.

The optical input power requirement for this receiver depends on the responsivity

of the diodes and the capacitance of the input node. The capacitance on this node

is dominated by the diode capacitance. In the current work, p-n diodes in silicon are

implemented. Monolithic diodes are more appealing for clock injection because of

their potential for low capacitance.
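A first-order estimate of the optical energy needed per transition can be sketched as follows. It assumes that the full charge C·V must be supplied by photocurrent and ignores diode clamping, leakage, and the fact that the buffer switches before the node reaches the rail. The 100 fF node capacitance follows the bound quoted above; the 2.5 V swing and 0.5 A/W responsivity are assumed for illustration, and the function name is hypothetical.

```python
def full_swing_optical_energy(c_node, v_swing, responsivity):
    """Optical energy (J) needed to move a capacitive node by v_swing,
    assuming all charge comes from photocurrent: E = C * V / R."""
    charge = c_node * v_swing          # coulombs
    return charge / responsivity       # joules, since R is in A/W = C/J


# Assumed values: 100 fF node, 2.5 V swing, 0.5 A/W responsivity.
energy = full_swing_optical_energy(100e-15, 2.5, 0.5)
print(f"Optical energy per transition: {energy * 1e15:.0f} fJ")
```

The estimate scales linearly with the node capacitance, which is consistent with the interest in low-capacitance monolithic diodes.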

The analysis of latency of this receiver is given in Chapter 5 and the details of its

implementation for clock injection are in Chapter 6.

4.4 Fabrication and testing

Fabrication details and measurements of the transimpedance and integrating receivers

are presented in this section. Measurements for the effect of supply noise on these

receivers are also given.

4.4.1 Transimpedance receiver

There are many ways of implementing a transimpedance receiver. In this work, an

inverter-amplifier-based implementation is chosen because of the simplicity and small

footprint. This possibly allows for very large densities of optical IO.

The transimpedance receiver was fabricated in 0.25 µm technology. A schematic

of the entire circuit is shown in Fig. 4.12. This architecture is analyzed in detail in

Ref. [82]. The first stage of this receiver is a transimpedance stage with the feedback

resistor implemented with a PMOS transistor. The effective feedback resistance of

PMOS can be changed by changing the voltage at node vtune. A small PMOS device

is capable of providing large resistance values. The transimpedance stage also consists

of the clamping diodes, formed by source and gate connected NMOS transistors, to

limit the output swing. By limiting the output swing, the dynamic range of this

receiver is increased.

This receiver has a very small footprint of 15 µm × 17 µm, which allows for

high density integration. This circuit was simulated in circuit simulator SPICE with

vtune at 0 V. The transimpedance gain of the first stage was about 5 kΩ. Power

Figure 4.12: Schematic of the transimpedance receiver. Transistor widths mentioned here are in λ, where λ = 0.2 µm for the technology used. All transistors are minimum length.

dissipation of this receiver was approximately 3 mW. Simulations show that for 100 fF

of diode capacitance this receiver can work up to ∼ 1.5 Gbps for 10 µA of average

photocurrent for NRZ input into the circuit. The capacitance of the bonded devices

was about 260 fF, for which this receiver worked at much lower speeds. The simulated

performance of this receiver with 10 µA of average photocurrent is shown in Fig. 4.13.

With no light incident on the receiver, the output stays on one supply rail and

when a short pulse is incident on a photodiode, it either switches to the other sup-

ply rail or continues to stay on the same supply rail depending on which diode is

illuminated. If the receiver switches to the other supply rail then it has to switch

back to the earlier rail within a bit period to avoid inter-symbol interference. Hence the

recovery time of the receiver determines the maximum speed the receiver can operate

with short pulses. For short pulse input with ∼ 520 fF total device capacitance, the

receiver works up to 200 Mbps in simulation. With lower capacitance this receiver

can operate at much higher speeds with short pulses in simulation.

Receivers on the chip were tested by using optical readout. The bit error rate

tester was not connected on this chip. The receiver was tested at 600 Mbps with

NRZ data generated from directly modulated lasers, and the eye diagram obtained

Figure 4.13: SPICE simulation of the transimpedance receiver with 10 µA average photocurrent. Voltage at node out is shown. Top curve is for 1 Gbps operation of the receiver with 260 fF of diode capacitance. Bottom curve shows the operation at 1.5 Gbps with 100 fF of diode capacitance.

by optical readout of the modulator is shown in Fig. 4.14. The eye is barely open at

this speed because of the limited ability of the modulator driver to drive 260 fF. The speed of

operation was also limited because of the large capacitance.

The receiver performance was also tested using electrical samplers. This way the

modulator driver was not a limitation since high capacitance modulator diodes were

not involved. The receiver was verified to work up to 900 Mbps with NRZ input.

Short pulse testing of the receiver was done at only 80 Mbps because of the speed

limitation of the laser. The receiver output was driving a modulator driver, which was

Figure 4.14: Eye diagram of the transimpedance receiver operation with NRZ input at 600 Mb/s. 26 µA average photocurrent is injected in each beam.

Figure 4.15: Eye diagram of the transimpedance receiver output voltage with short pulse input at 80 Mb/s.

read out by a cw beam as shown in Fig. 4.15. A 400 Mbps short-pulse laser available

did not have sufficient output power to test the receiver in a chip-to-chip link.

Because of the limitation in driving large capacitance, no sensitivity enhancement

was measured for this receiver. According to the simulations, with 40 fF of diode

capacitance, there is ∼ 5 dB of sensitivity enhancement.

4.4.2 Integrating receiver

The schematic of the integrating receiver circuit is shown in Fig. 4.16. This circuit

was fabricated in the 0.5 µm technology. According to the simulations, the average

electrical power consumption of this receiver was 2.3 mW. A higher device capacitance

for this circuit requires a larger optical power for the same speed of operation.

The integrating receiver was operated at 600 Mbps with roughly 50 µW (∼ 14 µA

Figure 4.16: Schematic of the integrating receiver fabricated in the 0.5 µm technology. Transistor widths are shown in λ, where λ is 0.35 µm. All transistors are minimum length.

Figure 4.17: Operation of the integrating receiver with optical readout at 600 Mb/s.

photocurrent) average power in each beam from directly modulated lasers with NRZ

data input. The readout was done optically. The eye diagram of the receiver operation

is shown in Fig. 4.17.

The receiver performance with short pulses and NRZ input data was compared

by operating the receiver in an optical link. The link was operated at 400 Mbps with

the pseudo random sequence generator driving the modulator. A custom externally-

driven modelocked laser was used to generate short pulses at 400 MHz repetition rate.

The output power of this laser was approximately 5 mW, which was just sufficient

for the testing of this link [92].

The receiver output was put into the BER tester and the number of errors was

Figure 4.18: Sensitivity comparison for NRZ and short pulse data for integrating receiver operating at 400 Mbps in a chip-to-chip link (BER vs. power per beam in dBm; the short pulse and NRZ curves are separated by ∼ 3.1 dB).

counted to get the bit error rate. Fig. 4.18 shows the BER vs. signal power per beam

at the receiver. The operation with short pulses required 3 dB less power compared

to NRZ input, verifying the sensitivity enhancement mentioned in the last section.

4.4.3 Measurement with supply noise

The digital circuits placed close to the receivers can inject noise in them through the

supply line and the substrate [93] [94]. Also, a large number of receivers connected to

the same supply may switch simultaneously and generate large current spikes on the

supply line. Because of the impedance of the supply line, voltage variations occur on

these lines with current spikes. Since the receivers amplify small-magnitude analog

signals, they are susceptible to these noise sources. The voltage noise on supply lines

might cause jitter at the output of the receivers.

Using the pump-probe method, given in detail in Chapter 5, the delay of the

transimpedance receiver with supply voltage was mapped [95]. This curve is shown

Figure 4.19: Transimpedance receiver delay variation as a function of supply voltage. This measurement was done via the pump-probe technique. The nominal supply voltage was 2.5 V.

in Fig. 4.19. The delay varies as ∼ 11 ps/100 mV, which shows that this receiver is

quite sensitive to supply variations. With a large receiver array connected to a single

supply, it is possible to have a few hundred mV of supply voltage fluctuation, which

would result in quite significant delay variation. The large amount of resulting jitter

can add a performance penalty to the link.
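To put the measured slope in perspective, the sketch below converts a supply fluctuation into a delay variation. It assumes the ∼ 11 ps/100 mV slope stays linear over the whole range, which the measurement supports only near the nominal supply; the 300 mV fluctuation and the 1 Gbps bit rate are illustrative assumptions, and the names are hypothetical.

```python
DELAY_SLOPE_PS_PER_MV = 11.0 / 100.0   # measured slope, assumed linear


def delay_variation_ps(supply_fluctuation_mv):
    """Delay variation (ps) caused by a given supply fluctuation (mV)."""
    return DELAY_SLOPE_PS_PER_MV * supply_fluctuation_mv


fluctuation_mv = 300.0                  # assumed "few hundred mV" case
variation_ps = delay_variation_ps(fluctuation_mv)
bit_period_ps = 1000.0                  # assumed 1 Gbps operation

print(f"Delay variation: {variation_ps:.0f} ps "
      f"({100.0 * variation_ps / bit_period_ps:.1f}% of the bit period)")
```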

To test the performance of the integrating receiver with supply noise, a chip-to-

chip link was operated at 100 Mbps with NRZ modulation. Since high frequency

supply noise could not be injected because of the by-pass capacitors, a sinusoidal

noise signal at 1 kHz was injected on the receiver supply from an external source.

Using the on-chip BER tester it was possible to quantify the effect of injected supply

noise. Bit error rate curves vs. total optical power in the link are plotted in Fig. 4.20.

The power penalty was only 0.12 dB for 100 mV of supply noise [96].

To characterize the effect of substrate noise on the integrating receiver, voltage-

controlled oscillators in the vicinity of the receivers were operated. There was no

Figure 4.20: Bit error rate curves of integrating receiver operation in a link at 100 Mbps with NRZ data. Sinusoidal noise was injected in the supply with different peak-to-peak values (no noise, 0.1 Vpp, 0.2 Vpp, and 0.3 Vpp) at 1 kHz.

measurable power penalty on the operation of the link while running these noise

generators.

4.5 Summary

Transimpedance, integrating, and totem-pole receiver topologies were discussed in

this chapter. Even though these topologies have been examined in the literature, not

much work has been done to analyze them for short pulse operation. This chapter

has looked at the operation of these receivers with short pulse (RZ) input and the

possible advantages and issues with this operation.

Transimpedance receiver sensitivity can be enhanced by using short pulses, as

compared to NRZ data, though larger supply noise might be generated with short

pulse operation. A receiver fabricated in the 0.25 µm technology was shown to be

prone to jitter with supply noise as it had a delay variation of ∼ 11 ps/100 mV of

supply voltage variation measured using the pump-probe technique. The operation of

this receiver was verified up to 600 Mbps with optical readout and up to 900 Mbps

with on-chip samplers. The performance of this receiver was affected by the larger-

than-designed capacitance of flip-chip bonded diodes.

The integrating receiver mentioned above has higher sensitivity than the tran-

simpedance receiver because it amplifies the signal with positive feedback. Being

fully differential, this receiver is more immune to supply and substrate noise. In a

chip-to-chip link with every 100 mV of supply noise, an optical power penalty of only

0.12 dB was measured. 600 Mbps NRZ operation with directly modulated lasers was

demonstrated. The receiver had ∼ 3 dB of sensitivity enhancement in the link with

short pulse operation compared to NRZ.

It would seem that a fully differential integrating receiver is well suited for oper-

ation with short optical pulses.

Chapter 5

Latency in Interconnects

In connections between and within electronic chips, total latency is a very important

parameter in determining system performance. As the CMOS linewidth scales, the

processor clock speed increases, making it difficult to run an entire chip synchronously.

In other words, transferring data within a clock cycle is becoming difficult. According

to the International Technology Roadmap for Semiconductors (ITRS) estimate [42],

gate delay and local interconnect delay are being reduced as the technology is scal-

ing (Fig. 5.1), but the delay of global interconnects with and without repeaters is

continuously increasing relative to the clock period.

The propagation velocity of global interconnects with repeaters is a small fraction

of the velocity of light (10% - 20%) and is not expected to improve significantly [7]

[16] [97]. For 0.25 µm technology the delay of global lines is less than a clock cycle,

but for future technologies the delay will be longer than a clock cycle. If the signals

can be propagated at a significant fraction of the velocity of light, e.g. > 0.3 c, the

delay in communication will be less than a clock cycle up to 0.1 µm technology [7].
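The effect of propagation velocity on cross-chip delay can be illustrated with a short calculation. The 2 cm path length is an assumed value, not a figure from the cited references, and the function name is hypothetical; the velocity fractions are the ones mentioned in the text.

```python
C_LIGHT = 299_792_458.0   # m/s, speed of light in vacuum


def flight_time(length_m, velocity_fraction):
    """Propagation delay (s) over length_m at velocity_fraction * c."""
    return length_m / (velocity_fraction * C_LIGHT)


PATH_LENGTH = 0.02        # m, assumed global interconnect length

for fraction in (0.1, 0.2, 0.3, 0.67):
    delay_ps = flight_time(PATH_LENGTH, fraction) * 1e12
    print(f"{fraction:.2f} c: {delay_ps:.0f} ps")
```

Under this assumption, moving from 0.1 c to 0.3 c cuts the flight time by a factor of three, from roughly 670 ps to roughly 220 ps.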

It might be possible to use optics to provide communication across chips at a

significant fraction of the velocity of light. For optics to be feasible, the delay in the

transmitter and the receiver has to be very low, of the order of a few gate delays. The

delay of propagation in optical media cannot be altered though it is relatively fast

(∼ 0.67c in glass). Transmitter and receiver circuits are designed in silicon CMOS,

hence they are likely to keep pace with silicon chips to perform logic operations as


the technology scales [8].

Dambre et al. [98] have shown that with low latency optical links, three-dimensional

optoelectronic multi-FPGAs outperform two-dimensional electronic FPGAs. In a re-

cent paper, Collet et al. [99] have concluded that since the most critical issue in

computer architecture is the access time to the main memory, the signal latency is of

critical importance in implementing optical interconnects. In Ref. [100] concerns are

expressed about increased latency in optical interconnects compared to their electri-

cal counterpart because of the added functions of electrical-to-optical and optical-to-

electrical conversion. But because of advanced integration techniques, as mentioned

in Chapter 3, parasitics associated with optical components can be reduced

significantly, reducing the latency in driving them. Kyriakis-Bitzaros et al. [101],

on the basis of a realistic model in 0.8 µm CMOS technology, demonstrated that the

latency of an optical link is lower than the electrical link even for sub-centimeter line

length.

Most work until now has looked at the latency of an optical link with NRZ data

Figure 5.1: ITRS projection of on-chip electrical interconnect delays with technology scaling [42]

format with a VCSEL or an edge-emitting laser as a transmitter. Turn-on delay

of lasers could add significant latency, which depends on the electrical drive signal

strength and waveform [18]. Turn-on delay can be eliminated by using modulators

instead of VCSELs. It is possible to significantly reduce the latency of optical in-

terconnects by using short pulses with modulators. The fast optical rise time and

concentration of all the energy in short pulses both work towards reducing the la-

tency. In this chapter we will explore the latency in optical interconnects operating

with short pulses.

Figure 5.2: Components of latency in a modulator-based interconnect system (modulator driver, propagation, and receiver)

Optical interconnects have three components: the transmitter, the medium of

propagation and the receiver. A schematic of a modulator-based optical interconnect

system is shown in Fig. 5.2. The transmitter can be easily optimized because it es-

sentially consists only of digital components (its input is a digital logic level). For a

MQW modulator, the driver is typically an electrical buffer chain. The optimization

of a buffer chain is mentioned in Ref. [102]. The receiver, having an analog input,

provides the largest room for improvement. A similar viewpoint was also expressed in

Ref. [103]. In the following sections we will analyze different receiver architectures for

latency with short pulse operation. Signal latency, here, is defined as the maximum

of rise or fall delay between input and output waveforms, measured at 50% of the

signal amplitude.
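The latency definition above can be sketched as a small measurement routine for sampled waveforms. This is a minimal illustration, assuming a non-inverting path, one rising and one falling edge per waveform, and linear interpolation between samples; the function names and the sample waveforms are hypothetical.

```python
def crossing_time(t, v, level, rising):
    """Time of the first rising (or falling) crossing of `level`,
    linearly interpolated between samples."""
    for i in range(1, len(t)):
        a, b = v[i - 1], v[i]
        crosses = (a < level <= b) if rising else (a > level >= b)
        if crosses:
            return t[i - 1] + (level - a) * (t[i] - t[i - 1]) / (b - a)
    raise ValueError("waveform never crosses the requested level")


def latency_50(t, v_in, v_out):
    """Maximum of the rise and fall delays, measured at 50% of amplitude."""
    mid_in = (min(v_in) + max(v_in)) / 2.0
    mid_out = (min(v_out) + max(v_out)) / 2.0
    rise = crossing_time(t, v_out, mid_out, True) - crossing_time(t, v_in, mid_in, True)
    fall = crossing_time(t, v_out, mid_out, False) - crossing_time(t, v_in, mid_in, False)
    return max(rise, fall)


# Hypothetical sampled waveforms (arbitrary time units).
t = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
v_in = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
v_out = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

print(latency_50(t, v_in, v_out))
```

In this example the rise delay is 2 time units and the fall delay is 3, so the reported latency is the larger of the two.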

The organization of this chapter is as follows. The next three sections address

the latency of three different receiver architectures: transimpedance, integrating, and

totem-pole diode pair. The latency analysis of receivers via modeling is verified by

SPICE simulations. Experimental measurements of the latency of the transimpedance

receiver are also presented. The scaling of latency with technology is considered in

Section 5.4. Finally, the conclusions are presented.

5.1 Transimpedance receivers

The transimpedance receiver is the most commonly used receiver in optical commu-

nication. The latency of the transimpedance receiver with NRZ data format has

been analyzed in Ref. [103] and a measurement of latency for one implementation

is reported in [104]. For this work such receivers were fabricated in 0.25 µm CMOS

technology as mentioned in Chapter 4. This circuit was analyzed by simulating in

the circuit simulator SPICE and by using a first-order analytic model. Using the model

helps in a better understanding of the latency in this receiver. Intuitively, we would

expect to lower the latency of the transimpedance receiver by using short pulses, as

compared to NRZ. This is because for the same energy in the pulse, a larger maximum

amplitude at the output of the transimpedance stage is generated, which reduces the

gain required from later stages, hence reducing the latency. Also the transimpedance

stage is charged faster with a short pulse, as compared to NRZ input, because of

the concentration of energy in a very short period (Fig. 5.3). The following section

deals with the modeling of the latency and the section after that gives the details of

measurement setup and results.

Figure 5.3: Mechanism of latency reduction in a transimpedance receiver with short pulse input. Transimpedance stage: the short pulse charges the output faster and gives a larger amplitude at this node. Post-amplifier chain: smaller gain is required.

5.1.1 Modeling of latency

To understand the mechanism of latency in receivers, a first-order model of the tran-

simpedance receiver is analyzed. This model is shown in Fig. 5.4. The first stage is the

transimpedance amplifier with a finite gain-bandwidth product. An ideal amplifier

with a series output impedance Ra (same as 1/gds of transistors) together with the

output capacitance model the finite gain-bandwidth amplifier. All the capacitances

at the output of the frontend amplifier, including the input capacitance of the next

stage, are combined into a single capacitance represented by CL in the figure. The

gain stages are modeled as open loop amplifiers with a finite gain-bandwidth prod-

uct. After computing the swing at the output of the transimpedance amplifier, the

required gain-per-stage (Gps) is calculated for the post-amplifier chain. Due to the

finite gain-bandwidth product, the time constant of the stage can be deduced given

the required Gps. A first order estimation of latency can be done by adding the time

constants of all these stages. A step input simulates the NRZ response and a 10 ps

pulse simulates the pulse response.

The following parameter values are assumed for simulation, which correspond to

the parameters of 0.25 µm technology in which this receiver was fabricated, with

low-capacitance high-responsivity photodetectors.

• Total capacitance at the input of the receiver (Cin) = 90 fF

• Feedback resistance (Rf ) = 5 kΩ

• Output impedance of the amplifier (Ra) = 8 kΩ

• Total capacitive loading at the output of the amplifier (CL) = 60 fF

• Open loop gain of the amplifier (A1) = 15

• Gain-bandwidth product of each post amplifier stage = 10 GHz

• Speed of operation = 1 Gbps

• Photodiode responsivity = 0.5 A/W

• Number of post amplifier stages = 2

• Pulse width of electrical current pulses generated from photodiode (limited by

the transit time of carriers in intrinsic region) = 10 ps

Figure 5.4: First order model of a transimpedance receiver with variable length post-amplifier chain

The capacitance of each photodiode is assumed to be 40 fF which is close to the

value of capacitance reported for flip-chip bonded MQW diodes [57]. The capacitance

value achieved in the current work was much higher, but in future runs it is expected

to be below 40 fF. The transfer function of the transimpedance stage is given by

$$H(s) = \frac{V_{out}(s)}{I_{in}(s)} = \frac{R_a - A_1 R_f}{1 + A_1 + s\,(R_f C_{in} + R_a C_{in} + R_a C_L) + s^2\,(R_f C_{in} R_a C_L)} \qquad (5.1)$$

Based on this transfer function, pulse and step responses were computed for the

transimpedance stage. Energies per bit (also referred to as pulse energies) were computed

based on 1 Gbps operation of the receiver. Latency in the receiver with different

input pulse energies is shown in Fig. 5.5. This result shows a latency reduction of ∼65% for large pulse energies by using short pulses as compared to NRZ [105]. This is

a very large reduction in latency which could make optical interconnects competitive

for on-chip connections. These results match very well with the results of precise

simulations in circuit-simulator SPICE of the transimpedance receiver fabricated on

this chip. This validates our first-order model, which we can therefore use to explore

the effect of different parameters on the latency of the receiver.
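As a quick check on the first-order model, the sketch below evaluates the coefficients of the transfer function in Eq. 5.1 for the parameter values listed earlier and derives the low-frequency transimpedance, natural frequency, and damping ratio of the resulting second-order response. The derived quantities are standard second-order system properties computed here for illustration, not results reported in this work, and the function name is hypothetical.

```python
import math

# Parameter values from the list above (SI units).
R_F = 5e3       # feedback resistance
R_A = 8e3       # amplifier output impedance
C_IN = 90e-15   # total input capacitance
C_L = 60e-15    # capacitive loading at the amplifier output
A_1 = 15.0      # open loop gain


def transimpedance_coefficients(r_f, r_a, c_in, c_l, a_1):
    """Return (numerator, d0, d1, d2) of H(s) = numerator / (d0 + d1*s + d2*s^2)."""
    numerator = r_a - a_1 * r_f
    d0 = 1.0 + a_1
    d1 = r_f * c_in + r_a * c_in + r_a * c_l
    d2 = r_f * c_in * r_a * c_l
    return numerator, d0, d1, d2


numerator, d0, d1, d2 = transimpedance_coefficients(R_F, R_A, C_IN, C_L, A_1)

dc_gain = numerator / d0                       # ohms (negative: inverting stage)
natural_freq_hz = math.sqrt(d0 / d2) / (2.0 * math.pi)
damping_ratio = d1 / (2.0 * math.sqrt(d0 * d2))

print(f"Low-frequency transimpedance: {dc_gain:.1f} ohm")
print(f"Natural frequency: {natural_freq_hz / 1e9:.2f} GHz")
print(f"Damping ratio: {damping_ratio:.2f}")
```

With these values the damping ratio is below one, so the modeled stage is underdamped and its poles form a complex pair.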

Figure 5.5: Pulse energy vs. delay for short pulse and NRZ input for the first-order model (horizontal axis: optical energy per bit in fJ). Corresponding SPICE simulations are denoted with “x”.

To see how the number of stages in the post-amplifier chain affects latency, let the

total gain needed from the post-amplifier chain be A and the number of stages be N.

The gain required per stage is then $G_{ps} = A^{1/N}$. Due to the finite gain-bandwidth

product of each stage, the delay per stage (inverse of bandwidth) is proportional to

the gain required per stage. Hence the total delay of the chain (τ) scales as

$$\tau \propto N \cdot A^{1/N} \qquad (5.2)$$

$G_{ps}$ falls steeply with the number of stages N when N is small. For low N, this

decay of $A^{1/N}$ dominates in Eq. 5.2, while for a larger N the linear increase of N

dominates. This behavior can be seen in Fig. 5.6. Intuitively, for a large N , when N

is increased to N+1, the reduction in gain per stage is very small. Since the reduction

in gain is small, the reduction in delay per stage is also small, but because of the extra

stage the total delay (which is the sum of the delays of all stages) increases. On the

other hand, for a small N , when N is increased to N + 1, the reduction in gain per

stage is relatively large causing a large reduction in the delay. Even with one extra

stage, the overall delay is reduced.
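The trade-off in Eq. 5.2 can be explored numerically. The sketch below evaluates the normalized delay N·A^(1/N), with the constant of proportionality omitted, and searches for the integer N that minimizes it. Treating N as continuous and setting the derivative to zero gives N = ln A, which the integer search tracks closely. The function names are hypothetical; the gain values are the ones plotted in Fig. 5.6.

```python
import math


def normalized_chain_delay(total_gain, n_stages):
    """Total delay of the chain up to a constant factor: N * A**(1/N)."""
    return n_stages * total_gain ** (1.0 / n_stages)


def best_stage_count(total_gain, max_stages=40):
    """Integer number of stages that minimizes the normalized delay."""
    return min(range(1, max_stages + 1),
               key=lambda n: normalized_chain_delay(total_gain, n))


for gain in (20, 200, 2000):
    n_best = best_stage_count(gain)
    print(f"A = {gain}: best N = {n_best}, ln(A) = {math.log(gain):.2f}")
```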

Figure 5.6: Variation of delay vs. number of post-amplifier stages for different total gain (A = 20, A = 200, and A = 2000), assuming a constant gain-bandwidth product for all stages.

The receiver delay vs. the number of stages for different input optical energies per

bit (also referred to as pulse energy) is plotted in Fig. 5.7. This plot follows the same

Figure 5.7: Number of post-amplifier stages vs. delay for different pulse energy

Figure 5.8: Pulse energy vs. receiver delay for 2 and 3 post-amplifier stages

pattern as in Fig. 5.6. The calculated latencies of the receiver vs. pulse energy for

a 2 stage and a 3 stage post amplifier are shown in Fig. 5.8. This figure illustrates

that as the pulse energy is increased, the amount of gain required reduces, causing

the delay to be minimized by a lower number of stages for a pulse energy higher than

a certain crossover pulse energy. Crossover occurs at ∼ 70 fJ for NRZ in this figure

but for short pulses this crossover occurs below the the plotted pulse energies.

In this section we saw that by using short pulses the latency in the transimpedance

receiver can be reduced very significantly (∼ 65%) compared to NRZ data. The results

of the first order model and SPICE simulation match very closely. By using the first

order model, it was also concluded that for a given input pulse energy there is an

optimum number of stages to minimize latency, which may not be the same for short

pulses and NRZ input. For reasonable pulse energies, as a rule of thumb, the latency

is minimized by using between two and five post-amplifier stages.

Measurement results and setup details are given in the next section and the results

will be seen to verify the simulated results of this section.

5.1.2 Measurement of latency

The results in the earlier section suggest that the latency is significantly improved by

using short pulses. To verify this concept, the latency of the receiver-modulator driver

pair was measured experimentally. Circuits were fabricated in 0.25 µm CMOS tech-

nology and the optical devices, multiple-quantum-well diodes, were flip-chip bonded

with the process mentioned in Chapter 3. An optical pump-probe setup was used for

measurement [95]. Short pulses (∼ 150 fs) generated from a Ti:sapphire modelocked

laser at 850 nm were used as pump and probe beam as illustrated in Fig. 5.9. Short

pulses at 80 MHz (repetition rate of the laser) as pump beam and a cw laser output

as balance beam are incident on the differential diode pair at the receiver input. The

pump beam excites the receiver, while the balance beam brings the receiver back

to its original state over time. The electrical output of the receiver drives a modu-

lator driver. The voltage output of the modulator driver is sampled optically with

a readout beam marked as probe beam in the figure, at the same rate as the pump

beam. Varying the delay between pump and probe beam maps the response of the

transceiver pair. Since the optical pulses are only 150 fs, sub-picosecond resolution

can be achieved in measurements [95]. This approach allows one transceiver transi-

tion to be accurately measured. By interchanging the pump and the balance beam,

it is possible to measure the other transition too.

Figure 5.9: Pump-probe setup for transceiver latency measurement. (Diagram labels: modelocked laser, cw diode laser, delay stage, chopper, pump beam, balance beam, probe beam, CMOS chip with MQW diodes, oscilloscope or lock-in.)

The latency of the entire interconnect can be easily computed by adding the delay

in propagation to the measured latency of the transceiver pair. The measurement of

latency for NRZ data was not done with the same setup because the delays required

for that measurement were much larger. These NRZ measurements were done using

a high speed detector (2.5 GHz bandwidth), and directly evaluating the waveforms

on an oscilloscope. This is justified because the latency in this case was significantly

larger. The pump-probe method in particular can be used to characterize the variation

in latency due to supply voltage variation, which translates into jitter at the output

of the receiver. The results of those measurements were presented in Chapter 4.

Fig. 5.11 shows the measured values of latency for NRZ and short pulses for the circuit in Fig. 5.10. These results match the SPICE simulations of the circuit within the error in estimating the parameters. Short pulses reduce the latency of the transceiver pair compared to NRZ input by a very significant amount. Most of this reduction comes from the receiver. The latency of the receiver can be further reduced by reducing the capacitance of the diodes.

Figure 5.10: Receiver transmitter module used for testing latency via pump-probe method. The numbers mentioned here are the sizes of PMOS and NMOS transistors in λ, where λ = 0.2 µm. (Schematic blocks: receiver, buffers, modulator driver; output node: out; transistor size labels: 30:10, 30:10, 24:10, 24:10, 24:9, 90:30, 30:10.)

Figure 5.11: Comparison of the latency of the transimpedance receiver-transmitter module with short pulse and NRZ inputs. (Plot title: NRZ and short pulse latency, measured and simulated; horizontal axis: pulse energy (fJ); curves: measured NRZ, simulated NRZ, measured short pulse, simulated short pulse.)

Measurements of latency of a transimpedance receiver implemented in bipolar

technology were presented by Wieland et al. [104]. The overall delay of their receiver

was 1.5 ns at 1 Gbps operation. The latency measured with NRZ data here for the

transceiver is of the same order, though the latency is quite low with short pulses.

5.2 Integrating Receiver

Fig. 5.12 shows the circuit schematic diagram of the integrating receiver. The opera-

tion of this receiver was explained in detail in Chapter 4. This receiver integrates the

current at the input for half a cycle. It evaluates and precharges for the remaining

half cycle.

Figure 5.12: Circuit schematic of the integrating receiver frontend. (Schematic labels: clk, Vsupply.)

In this receiver, the latency is a function of the total integrated charge. A typical

integration period is half of the clock cycle. If the energy is spread over the entire

bit period, as in the case of NRZ, the latency is half of the bit period plus the

time to resolve the logic level. Fig. 5.13 illustrates the details of the timing of this

receiver. In the case of short pulses, the pulses can arrive at the end of the integration

period and dump all the energy in an instant. As seen in the figure, the latency

with a short pulse is only the evaluation time. The evaluation time of this receiver

depends logarithmically on the amount of integrated charge due to positive feedback

amplification [106]. Certainly, for a practical system, there needs to be some timing

margin to account for the jitter and other variability in the system. This could be

incorporated after knowing the details of the system design.
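The timing argument above can be written as a two-line latency model. The sketch below is illustrative only: it assumes a 1 Gb/s bit rate (the rate quoted later in this chapter for the recless comparison) and takes the ∼ 150 ps evaluation time from the SPICE result for this receiver. The function name and the zero timing margin are assumptions made for the example.

```python
def integrating_receiver_latency_ps(bit_period_ps: float,
                                    evaluation_ps: float,
                                    short_pulse: bool) -> float:
    """Latency of an integrating receiver in ps.

    NRZ: energy arrives over the whole integration phase (half a bit
    period), so the wait is T/2 plus the evaluation time.
    Short pulse: the pulse can arrive at the end of the integration
    phase, so only the evaluation time remains.
    No timing margin for jitter is included (assumption).
    """
    integration_wait_ps = 0.0 if short_pulse else bit_period_ps / 2.0
    return integration_wait_ps + evaluation_ps


BIT_PERIOD_PS = 1000.0   # assumed 1 Gb/s operation
EVALUATION_PS = 150.0    # simulated evaluation time at about 50 fJ

nrz = integrating_receiver_latency_ps(BIT_PERIOD_PS, EVALUATION_PS, short_pulse=False)
sp = integrating_receiver_latency_ps(BIT_PERIOD_PS, EVALUATION_PS, short_pulse=True)
print(f"NRZ latency: {nrz:.0f} ps, short pulse latency: {sp:.0f} ps")
```

With these assumed inputs the model gives 650 ps for NRZ and 150 ps for short pulses, the same values listed for the integrating receiver in Table 5.1.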

Figure 5.13: Latency with respect to clock in the integrating receiver with NRZ and short pulse inputs. (Timing diagram labels: integrating phase, evaluation phase, short pulse data input, NRZ data input, valid output; t1: latency with short pulses, t2: latency with NRZ.)

This receiver operates on the principle of positive feedback, hence it is very sen-

sitive. For modeling of this receiver, the parameter values of the 0.25 µm CMOS

technology are assumed so that the results can be compared with the transimpedance

receiver. Other parameters assumed in simulation are: photodetector capacitance is

40 fF, photodetector responsivity is 0.5 A/W, and the pulse width of electrical pulses

generated from photodiode is 10 ps.

SPICE simulation of the latency of the entire integrating receiver circuit (including the SR latch) is plotted in Fig. 5.14. According to this simulation, the total latency of the receiver is ∼ 150 ps for 50 fJ of pulse energy. The delay is approximately logarithmic with input optical energy.

Figure 5.14: Latency of the entire integrating receiver, including the SR latch, with short pulse input computed by using SPICE circuit simulator. (Horizontal axis: pulse energy (fJ), 0 to 80.)

5.3 Totem-pole diode receiver

Very low latency at the expense of larger power can be achieved by using a diode pair connected in the totem-pole configuration as shown in Fig. 5.15. This design is effectively receiverless (“recless”) as there is no voltage amplifier involved. This receiver needs to be connected to a high impedance node like the gate of a buffer so that the charge can be integrated. Here the input capacitance is charged to the supply rails by providing sufficient optical power. The optical power required to charge the node “in” to the supply rails is a linear function of the frontend capacitance, which is typically dominated by the photodiode capacitance. If the total capacitance at the node “in” is $C_{in}$ and the total voltage swing required at the node is $V_{sup}$, then the total charge required is $Q_{tot} = C_{in} V_{sup}$. For photodiode responsivity $R$, the minimum optical energy required is $E_{opt} = C_{in} V_{sup} / R$. This optical energy can either be delivered in a very brief period or it can be spread out over the entire bit period ($T$). If the input to this receiver is NRZ data, with the minimum required pulse energy, the input node will reach half of the supply voltage in half a cycle ($t_{nrz} = T/2$), which is a very long latency. Instead, if short pulses are used, the input node will be charged immediately ($t_{sp}$), limited only by the carrier transit time in the intrinsic region of the MQW diode. The timing diagram of the charging of the input node with NRZ and short pulses is shown in Fig. 5.16.

Figure 5.15: Schematic of the totem-pole diode pair receiver connected to the high impedance input of the inverter buffer.

Figure 5.16: Voltage vs. time at node “in” of the recless receiver for NRZ and short pulse inputs with minimum optical energy to swing the node by supply voltage. (Panel labels: NRZ input, with T and T/2 marked; short pulse input, with t_sp marked.)

If the flip-chip bonded photodiode capacitance is 40 fF and responsivity is 0.5 A/W

then for a total capacitance of 90 fF (assuming 10 fF capacitance of the buffer) the

optical energy required to charge the input node by 2.5 V (supply voltage for 0.25 µm

CMOS technology) is 450 fJ. By using a metal-semiconductor-metal photodiode, or

a silicon photodiode in a silicon-on-insulator process, the photodiode capacitance can

be reduced, which will reduce the optical energy required. For a 1 µm long intrinsic

region, the carrier transit time is roughly 10 ps, which determines the latency with

short pulses in this receiver. By comparison, for 1 Gbps operation, the latency with

NRZ data with minimum optical power required will be 0.5 ns. This receiver gives

the minimum latency with short pulses of all the three receivers mentioned in this

chapter, though the amount of optical energy required can be much higher, depending

on the capacitance.
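The energy and latency figures quoted above follow directly from $E_{opt} = C_{in} V_{sup} / R$. The sketch below reproduces the arithmetic. Reading the 90 fF total as two 40 fF photodiodes plus the 10 fF buffer is an interpretation of the numbers in the text, and the function names are chosen for the example.

```python
def min_optical_energy_fj(c_in_ff: float, v_swing_v: float,
                          responsivity_a_per_w: float) -> float:
    """Minimum optical energy E_opt = C_in * V_sup / R.

    fF * V gives charge in fC; dividing by A/W gives energy in fJ.
    """
    return c_in_ff * v_swing_v / responsivity_a_per_w


def nrz_latency_ns(bit_rate_gbps: float) -> float:
    """NRZ latency at minimum energy: the node needs half a bit period (T/2)."""
    bit_period_ns = 1.0 / bit_rate_gbps
    return bit_period_ns / 2.0


# Assumed breakdown of the 90 fF quoted in the text: two diodes plus buffer.
C_TOTAL_FF = 2 * 40.0 + 10.0

energy = min_optical_energy_fj(C_TOTAL_FF, v_swing_v=2.5, responsivity_a_per_w=0.5)
print(f"E_opt = {energy:.0f} fJ")
print(f"NRZ latency at 1 Gb/s = {nrz_latency_ns(1.0):.1f} ns")
```

Because $E_{opt}$ is linear in $C_{in}$, halving the photodiode capacitance in this model roughly halves the required optical energy, which is the motivation for the lower-capacitance detectors mentioned above.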

5.4 Scaling of latency with technology

Figure 5.17: Comparing the delay of the transimpedance receiver with short pulse data for 0.25 µm and 0.5 µm technologies by normalizing to FO4 delay in respective technologies. (Horizontal axis: optical energy per bit (fJ), 0 to 400; curves for 0.25 µm and 0.5 µm.)

A fan-out-of-4 (FO4) delay is defined as the delay of a gate driving four gates of

the same size. The latency of the receiver is expected to scale roughly as FO4 delay for

the technology. The delay of a transimpedance receiver, scaled by the corresponding

FO4 delay, is compared (Fig. 5.17) in 0.5 µm and 0.25 µm CMOS technologies with

short pulse input. The comparison is done through SPICE simulations. In 0.5 µm

technology the FO4 delay is 270 ps while in 0.25µm technology it is 90 ps. The two

curves follow each other closely.
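The normalization used in this comparison is simply the receiver latency divided by the FO4 delay of the same technology. The sketch below shows the calculation with the two FO4 values quoted above. The 120 ps input is the short pulse latency of the transimpedance receiver listed in Table 5.1 for 0.25 µm technology; the 0.5 µm figure it prints is only a projection under the assumption that latency scales exactly with FO4 delay, not a simulated or measured result.

```python
# FO4 delays quoted in the text, in ps.
FO4_PS = {"0.5um": 270.0, "0.25um": 90.0}


def normalized_latency(latency_ps: float, technology: str) -> float:
    """Latency expressed in units of the FO4 delay of the given technology."""
    return latency_ps / FO4_PS[technology]


def projected_latency_ps(latency_ps: float, from_tech: str, to_tech: str) -> float:
    """Project a latency to another technology, assuming it scales as FO4 delay."""
    return normalized_latency(latency_ps, from_tech) * FO4_PS[to_tech]


in_fo4 = normalized_latency(120.0, "0.25um")
projected = projected_latency_ps(120.0, "0.25um", "0.5um")
print(f"120 ps in 0.25 um technology = {in_fo4:.2f} FO4 delays")
print(f"projected latency in 0.5 um technology = {projected:.0f} ps (assumption-based)")
```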

As seen from the normalized latency in two different technologies, we can conclude that the latency of the receiver scales as the FO4 delay in the technology. This would allow optical interconnects to keep pace with the performance of silicon chips. The scaling of FO4 gate delay with technology scaling is predicted in Fig. 5.18.

Figure 5.18: FO4 gate delay scaling with technology [107]. (Horizontal axis: technology L_drawn (µm).)

5.5 Summary

The latencies of three different receiver architectures (transimpedance, integrating, and recless, i.e. the totem-pole diode pair) with NRZ and short pulse data inputs are shown in Table 5.1. Short pulses significantly improve the performance of all three receivers. A recless receiver with short pulses has the shortest delay, but at the expense of optical power. The optical power required depends on the photodiode capacitance. The transimpedance and integrating receivers have a similar performance with short pulses.

Receiver type     NRZ delay (ps)   Short pulse delay (ps)
Transimpedance    340              120
Integrating       650              150
Recless           500              10

Table 5.1: Receiver latency with NRZ and short pulse inputs. Optical energy per bit for the transimpedance and integrating receivers is ∼ 50 fJ, and for the recless receiver is 450 fJ.

Chip sizes are expected to increase modestly with future generations and will remain around 2 cm across. Assuming a global interconnect of 2 cm, the latency of a repeatered electrical line is ∼ 330 ps (at 20% of the velocity of light). For optical interconnects the propagation time for a 2 cm distance in glass is ∼ 100 ps. The latency in the transmitter can be brought down to ∼ 70 ps (assuming a single buffer driving the modulator capacitance), and as we have shown, the receiver latency can be reduced to ∼ 70 ps with short pulses. This shows that optical interconnects can achieve latencies comparable to electrical interconnects or even less, at least theoretically, for on-chip global communication.
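The latency budget in this comparison can be checked with a few lines of arithmetic. In the sketch below, the refractive index of 1.5 for glass is an assumption (the text only states the resulting ∼ 100 ps), and the function and constant names are chosen for the example. The transmitter and receiver latencies are the ∼ 70 ps values quoted above.

```python
C_CM_PER_PS = 0.0299792458  # speed of light in vacuum, cm per ps


def propagation_ps(length_cm: float, velocity_fraction: float) -> float:
    """Propagation time over length_cm at a given fraction of c."""
    return length_cm / (velocity_fraction * C_CM_PER_PS)


LENGTH_CM = 2.0
GLASS_INDEX = 1.5  # assumed refractive index of glass

electrical_ps = propagation_ps(LENGTH_CM, 0.20)           # repeatered line at 0.2 c
glass_ps = propagation_ps(LENGTH_CM, 1.0 / GLASS_INDEX)   # light in glass
optical_total_ps = 70.0 + glass_ps + 70.0                 # transmitter + flight + receiver

print(f"electrical line: {electrical_ps:.0f} ps")
print(f"optical flight time: {glass_ps:.0f} ps")
print(f"optical link total: {optical_total_ps:.0f} ps")
```

Under these assumptions the optical link total comes out below the repeatered electrical line, in line with the conclusion stated above.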

Chapter 6

Timing in Silicon Chips

We have already seen in earlier chapters that large amplitude, and sharp rising and

falling edges of short pulses can be used for improving the sensitivity of the re-

ceivers, and reducing the latency of interconnects. Apart from these benefits, the low

pulse-to-pulse jitter in short pulses generated from a modelocked laser can help in

synchronization of the system. Phase aligning a large number of parallel intercon-

nect channels and providing a precise, skew-and-jitter free clock are two ways we will

consider to improve the synchronization of the system.

In providing a large number of parallel IOs, synchronization of all the channels to

a local clock on the receiving end is a challenging task. One way to synchronize all the

channels is to provide per channel timing management via electronics, though this is

cumbersome and requires silicon area to be devoted to each channel, which in turn

reduces the density of interconnects. Instead, the use of short pulses with modulators

automatically synchronizes all the channels by eliminating skew and jitter from the

modulator drive signals because of low pulse-to-pulse jitter in the short pulse train

as detailed in Section 6.1.

In current high-performance integrated circuits, precise clock signals are crucial

for the operation. In fact, the accuracy of clocks is a limiting factor in multiplexing

systems and analog-to-digital conversion systems. For example, the time resolution

of an NMOS sampling switch in a standard 0.8 µm CMOS technology is ∼ 21 ps

(48 Gb/s) [108] when there is no jitter on the clock, while in practical systems the

attainable speeds are much slower due to jitter on the clock. We can generate large

enough swings to drive logic without any amplifier by using short pulses on the de-

tectors with low capacitance in a totem-pole diode pair (receiverless clock injection).

This eliminates delay, skew, and jitter from the receiving circuit, which can achieve

very precise clock input. We monolithically integrated silicon detectors to reduce the

capacitance of the diodes and to reduce the cost associated with hybrid integration

for this implementation. A proof-of-principle demonstration of precise clock injec-

tion with silicon detectors is described in Section 6.2. Characterization of the high

frequency response of silicon detectors using on-chip samplers is also presented.

6.1 Jitter and skew removal

High speed electrical links are typically serial links, where the entire data stream

is sent on a single channel, and the data is recovered by the receiver by extracting

the clock simultaneously. Jitter on this channel reduces the timing margins of the

receiver. In high-density parallel interconnects, if a single clock is used to extract all

the channels, the situation is even more difficult because in addition to jitter there can

be skew among the channels. Per channel skew compensation (e.g. as implemented by

Yeung and Horowitz [61]) can eliminate the skew from various sources at the receiver.

But it requires additional silicon area and does not remove jitter.

By employing short pulses in a modulator-based system, all parallel channels can be

resynchronized (illustrated in Fig. 2.5), removing both skew and jitter, and by making

the optical path lengths of all the channels equal, all the channels will be synchronized

at the receiving end. Up to half a bit of skew and jitter can be removed by this method.

To demonstrate skew removal experimentally, two channels were driven externally

from a bit stream skewed by 3/8 of a bit period [109]. Readout with a cw beam maps

the electrical drive of the modulators as seen in Fig. 6.1. These modulator channels

were then read by short pulses, which were nominally placed at the center of the

bit period. As shown in Fig. 6.2, skew was completely eliminated by the short pulse

readout.
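The half-bit limit on skew removal can be illustrated with a toy timing model. The sketch below is not a model of the experiment: it treats each modulator drive as an ideal NRZ waveform delayed by a given skew (in bit periods) and samples it with an ideal, zero-width pulse at the nominal center of each bit slot. The bit pattern and function names are hypothetical.

```python
import math


def drive_level(bits, t, skew):
    """Bit present on the modulator drive at time t (in bit periods).

    `skew` delays the data edges of this channel by that many bit periods.
    Returns None when t falls outside the bit stream.
    """
    index = math.floor(t - skew)
    return bits[index] if 0 <= index < len(bits) else None


def short_pulse_readout(bits, skew):
    """Sample each bit slot k at its nominal center, t = k + 0.5."""
    return [drive_level(bits, k + 0.5, skew) for k in range(len(bits))]


data = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical test pattern

aligned = short_pulse_readout(data, 0.0)
skewed = short_pulse_readout(data, 3 / 8)   # skew used in the experiment
excessive = short_pulse_readout(data, 0.6)  # more than half a bit

print(aligned == skewed)    # True: 3/8-bit skew is removed by the readout
print(excessive == data)    # False: beyond half a bit the wrong bit is sampled
```

As long as the magnitude of the skew stays below half a bit period, every sampling instant still falls inside the intended bit, so the read-out streams are identical and aligned to the pulse train.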

Figure 6.1: Transmitted signals from two channels read out with a cw laser. Channels are skewed by 3/8 of a bit period.

Figure 6.2: Skew removal by short pulse readout of two modulator channels skewed by 3/8 of a bit period. Ones and zeros are alternately read by these pulses.

Jitter from modulator channels can be similarly removed by reading out the modulator with a short pulse at the nominal center of the bit. To demonstrate jitter removal

experimentally, an optical link was operated with a modulator driven by a signal with

± 3/8 bit of jitter. The receiver output was connected to another modulator, which

was read by a cw beam to give the received data shown in Fig. 6.3. The jitter has

clearly been removed by this approach.

The removal of skew and jitter demonstrates that a low jitter, periodic pulse train

from a modelocked laser can phase align the signals from an array of modulator

channels. This synchronization is achieved solely because of the short pulse readout.

Figure 6.3: Jitter removal from a single interconnect channel. Upper trace is the electrical drive signal with jitter and the bottom trace is the optical readout of the receiver. (Trace labels: modulator drive signal with jitter; short pulse readout of modulator.)

A single clock can therefore recover data from all of these phase-aligned channels,

simplifying the system implementation. The optical power requirement from the

modelocked laser scales only linearly as the number of channels is increased.

6.2 Optical clock injection

The requirement of a precise clock is becoming a bottleneck in many applications.

Precise clock injection is required in analog-to-digital conversion, high speed multi-

plexing and demultiplexing, and test and measurement of high speed signals. To run

a chip synchronously, a skew and jitter-free clock needs to be distributed across the

chip [110]. To distribute the clock symmetrically, interconnections in the form of H

trees [111, 112], grid [113], and many other topologies are used. Researchers have also

used coupled oscillators [114, 115] to distribute precise clock across the chip. It is even

possible to intentionally skew the clock to improve the performance of circuits [116].

These techniques do help in clock distribution, but at the cost of significantly in-

creased complexity. Also, an extremely careful design is required to reduce the skew.

As the technology scales, the clock skew problem will get worse [117, 118].

Many attempts have been made to distribute the clock optically [112] [119]. Op-

tical clock distribution with short pulses has also been investigated by Delfyett et

al. [120] and Kawanishi et al. [121]. Delfyett et al. achieved 12 ps of jitter between

two ports under test, which is a remarkable result. All the work mentioned here con-

sisted of a receiver at each end-node to generate logic levels from the optical signal.

When distributing the clock to a large number of nodes, variation in this receiver in-

troduces skew and jitter in the received signal. We propose a receiverless scheme with

short pulses for clock injection. By using only the photodetectors and eliminating the

receiver circuit, the source of skew and jitter is also eliminated. It is then possible to

inject a precise clock to a large number of nodes with short pulses.

Monolithically integrated silicon detectors can potentially reduce the cost, simplify

the fabrication (by using standard CMOS fabrication process), and reduce the capac-

itance, as compared to hybrid integrated photodetectors. At 850 nm (wavelength of

operation), though, due to large absorption depth there are many issues with silicon

detectors. A discussion of these issues with the implementation of silicon detectors

is presented next. On-chip samplers are used to characterize the high speed response

of the silicon detectors. A demonstration of precise clock injection with a receiverless

scheme, implemented with silicon detectors, is presented at the end of this section.

6.2.1 Silicon detectors

As silicon has an indirect bandgap, it has poor absorption at the wavelength of interest (850 nm) [122] [123] [124]. Direct bandgap materials have an abrupt absorption change

near the band-edge, while silicon has a gradual onset of absorption near the band-

edge. Because of this weak indirect absorption, the absorption length (1/α where α

is the absorption coefficient) is roughly 14 µm. This absorption length is larger than

the typical well depth in current CMOS technologies. For example, in 0.25 µm CMOS

technology, the technology used in this work, the n-well depth is 1.2 µm. The deple-

tion region, formed near the well edge, absorbs very little optical energy. Most of the

light generates carriers deep into the substrate, which slowly diffuse to the depletion

region and generate a long tail response. This long tail inhibits the capability of using silicon photodetectors at very high speeds. By spatially blocking the light in certain regions, and taking the difference of responses from blocked and unblocked regions, a faster response could be obtained from silicon detectors [87]. In this method, however, the responsivity reduces by a significant amount. Even with lower responsivity and a long response tail, high speed receiver operation with monolithic silicon detectors has been demonstrated [122] [125] [126] [127] [128].

Figure 6.4: A cross-sectional view of two silicon detector topologies. (Diagram labels: N-well detector, interdigitated detector (IDT), n+, p+, p-substrate, finger spacing.)
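To put a number on how little light is absorbed near the surface, the sketch below applies the Beer-Lambert law, $1 - e^{-\alpha d}$, with the 14 µm absorption length and the 1.2 µm n-well depth quoted in this section. It is a rough estimate only: surface reflection and the exact extent of the depletion region are ignored, and the function name is chosen for the example.

```python
import math


def absorbed_fraction(depth_um: float, absorption_length_um: float = 14.0) -> float:
    """Fraction of light absorbed within depth_um of the surface.

    Uses the Beer-Lambert law, 1 - exp(-alpha * d) with alpha = 1 / length.
    Surface reflection is ignored (assumption).
    """
    return 1.0 - math.exp(-depth_um / absorption_length_um)


well_fraction = absorbed_fraction(1.2)   # n-well depth in 0.25 um CMOS
print(f"absorbed within 1.2 um: {100 * well_fraction:.1f} %")
print(f"absorbed within 14 um:  {100 * absorbed_fraction(14.0):.1f} %")
```

Under these assumptions less than a tenth of the light is absorbed within the well depth, so most carriers are generated deeper in the substrate and contribute to the slow diffusion tail.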

In this work we have used two different kinds of silicon detectors. Their cross-sections are shown in Fig. 6.4. The first detector consists of a diode formed by

n-well and p-substrate. The second detector has interdigitated p-diffusion and n-

diffusion areas contained within an n-well. The p-diffusion and n-diffusion fingers of

the interdigitated detector are connected by metal. Using interdigitation increases

the depletion region in the device. For each topology we implemented two detectors

with slightly different dimensions, as summarized in Table 6.1.

The DC responsivities, measured by an optical probe-station setup, were around 0.025 A/W. The capacitance of the detectors was measured on-chip by using ring oscillators [57]. A five stage inverter ring formed an oscillator. Each inverter was loaded with a copy of the silicon detector whose capacitance was being measured. The frequency of the oscillator was divided by 32 before it was extracted electrically at an output pad, so that the output pad would not need to drive a very high speed signal outside the chip. By comparing this oscillation frequency with the frequency of an unloaded oscillator, the capacitance of the detector was extracted. The capacitance values are shown in Table 6.1.

Detector            Area (µm²)    Finger spacing (µm)   Capacitance (fF)
Nwell 1             10 × 11.6     n/a                   30
Nwell 2             20 × 21.6     n/a                   85
Interdigitated 1    19.2 × 20     4.4                   122
Interdigitated 2    22.4 × 20     5.2                   124

Table 6.1: The dimensions and the capacitances of the silicon detectors implemented in this work. Two n-well detectors and two interdigitated detectors of different sizes were chosen.
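One simple way to turn the two oscillation frequencies into a capacitance is sketched below. It assumes a first-order model in which the stage delay is proportional to the capacitance on each inverter output, so the ring frequency is inversely proportional to that capacitance. The unloaded node capacitance and both frequencies in the example are hypothetical placeholders, not measured values, and the extraction actually used in this work may differ in detail.

```python
def detector_capacitance_ff(f_unloaded_hz: float, f_loaded_hz: float,
                            c_node_unloaded_ff: float) -> float:
    """Extract the added load capacitance from two ring-oscillator frequencies.

    Model (assumption): stage delay is proportional to node capacitance, so
    f_unloaded / f_loaded = (C_node + C_det) / C_node.
    The divide-by-32 factor cancels in the ratio, so divided frequencies
    can be used directly.
    """
    return c_node_unloaded_ff * (f_unloaded_hz / f_loaded_hz - 1.0)


# Hypothetical example values, for illustration only.
c_det = detector_capacitance_ff(f_unloaded_hz=50e6, f_loaded_hz=20e6,
                                c_node_unloaded_ff=20.0)
print(f"extracted detector capacitance: {c_det:.1f} fF")
```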

6.2.2 Frequency response of silicon detectors

To be able to use silicon detectors for high-speed clock injection, we need to know

their frequency response. One way of characterizing the frequency response of a silicon

detector is by exciting it with a broadband source such as a short pulse and measuring

the response using probes [129]. Instead of loading with the large capacitance of the

probe, here we measure the frequency response of silicon detectors with realistic

loading, using an on-chip sampler and short pulse excitation. Details of the design

and operation of the sampler are given in Chapter 3. The bandwidth of the sampler

was found to be ∼ 4 GHz through simulations.

The voltage signal created by the short pulse excitation of the silicon detector

connected to a high impedance node of the sampler was sampled. Since the short

pulses were about 150 fs long, the impulse response of the detector was measured to a

good approximation. To make the signal periodic, the detector input was reset every

period to the supply rail. Clocks were aligned in such a way that short pulses arrived

slightly after the reset was released. A typical sampled trace is shown in Fig. 6.5.

The small glitch at the beginning of the trace was due to the release of the reset on the detector. The optical pulse induced a fast falling edge followed by a long tail at the detector.

Figure 6.5: The sampled signal trace showing the response of the first interdigitated detector to an optical short pulse. The optical energy in the pulse was 0.74 pJ. (Horizontal axis: time (ns), −6 to 8; annotations: reset released, impact of optical pulse, long tail.)

By taking the derivative of measured voltage we can get the profile of the photo-

current (i(t) = CdV/dt) assuming the capacitance remains the same. The Fourier

transform of the current signal (i(t)) gives the frequency response of the detector.

This response for different diodes is shown in Fig. 6.6. The plot shows that the

frequency response falls off as sub-20 dB/decade until roughly 2 GHz [130]. This

fall off indicates that detectors can not be modeled as a first-order system; in fact

the response falls off roughly as 10 dB/decade suggesting that the detector frequency

response can be modeled as:

$$h(s) = \frac{1}{1 + \sqrt{s\tau}} \qquad (6.1)$$

With the present setup it is hard to measure τ because it is very small. The curve

falls off very rapidly around 4 GHz because of the frequency response of the samplers.

The sub-20 dB/decade fall-off of the detector frequency response allows a large enough signal to be obtained at high frequencies, making it easier to use silicon detectors at high speeds.

Figure 6.6: The frequency behavior of the various silicon detectors. The response of the second interdigitated detector was not included for clarity. The curves have been normalized with respect to their first frequency component for comparison reasons. (Horizontal axis: frequency (Hz); curves: NWELL1, NWELL2, IDT1, and a root-s reference; annotations: square root s dependence, roll-off due to finite bandwidth of samplers.)
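The 10 dB/decade roll-off implied by Eq. 6.1 can be checked numerically. The sketch below evaluates $|h(j\omega)|$ for the square-root model and, for comparison, for a first-order low-pass $1/(1 + s\tau)$, at two frequencies a decade apart and well above the corner. Only the product $\omega\tau$ enters the calculation, so no value of $\tau$ needs to be assumed; the function names are chosen for the example.

```python
import cmath
import math


def sqrt_model_db(omega_tau: float) -> float:
    """Magnitude of h = 1 / (1 + sqrt(j * omega * tau)) in dB (Eq. 6.1)."""
    h = 1.0 / (1.0 + cmath.sqrt(1j * omega_tau))
    return 20.0 * math.log10(abs(h))


def first_order_db(omega_tau: float) -> float:
    """Magnitude of a first-order low-pass 1 / (1 + j * omega * tau) in dB."""
    h = 1.0 / (1.0 + 1j * omega_tau)
    return 20.0 * math.log10(abs(h))


def slope_db_per_decade(response_db, omega_tau: float) -> float:
    """Change in response over one decade of frequency starting at omega_tau."""
    return response_db(10.0 * omega_tau) - response_db(omega_tau)


HIGH = 1e4  # omega * tau well above the corner frequency
print(f"sqrt model:  {slope_db_per_decade(sqrt_model_db, HIGH):.2f} dB/decade")
print(f"first order: {slope_db_per_decade(first_order_db, HIGH):.2f} dB/decade")
```

The square-root model loses about half as many dB per decade as a first-order system, which is why more signal remains available at high frequencies.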

The output of the sampler settles on a much smaller time scale than the repetition

period of the short pulse laser. This allows us to extract more information from the

sampler by connecting its output to a high speed digital oscilloscope. The oscilloscope

itself constructs the trace by consecutively sampling different periods of the real signal.

Thus, by acquiring the oscilloscope’s whole signal, we actually get several versions of

the sampled signal. From this information, an estimate of noise in the signal can be

made for a certain delay. In this technique there is a complication that the sampler

output is affected by clock feedthrough at the slave switch (refer to Chapter 3 for

the schematic of the sampler). This can be compensated by calibrating the ripple

beforehand for different output currents. This method is used to estimate the jitter

in the injected clock signal.

6.2.3 Receiverless clock injection

By eliminating the receiver amplifier circuit, a potential source of skew and jitter can

be removed. Optical path lengths can be controlled very precisely to distribute a

virtually skew-free signal to different points of the chip. Very low pulse-to-pulse jitter

short pulses from a modelocked laser, along with a receiverless logic recovery scheme,

can provide a very precise clock. We present a proof-of-principle demonstration of

this concept in this section.

Williams et al. used a single detector to generate full logic swing for a telecommunications receiver [90]. They used an erbium doped fiber amplifier (EDFA) to amplify

the optical power and the output of the detector was driving a 50 Ω resistance. In-

stead, we propose a high impedance load connected to the diode. Low capacitance of

the monolithically integrated diode may reduce the optical power requirement.

To implement this scheme, two silicon diodes are connected in a totem-pole config-

uration. The top diode is connected to the supply and the bottom diode is connected

to the ground as shown in Fig. 6.7. When the optical pulse is incident on the top diode, the node marked “in” is charged to the supply voltage, and when the optical pulse falls on the bottom diode, node “in” discharges to ground (see footnote 1). By alternating the

pulses on the top and bottom diodes, we were able to inject a very precise clock into

the chip. In the present case, a small CMOS inverter was connected to the output of

the totem-pole diode configuration as a high impedance load. To verify the operation

of this scheme, the output of this inverter was sampled by on-chip samplers. Since

the detectors in this design were very small, the output of the inverter was sampled

rather than the detector to minimize the distortion. We could set the node of the

detector to an external bias voltage for testing via a pass gate designed on the chip.

This scheme was implemented with two different kinds of detectors. In the first implementation a totem-pole was created with an n-well/p-substrate detector and a p-diffusion/n-well detector at 5 µm spacing. We were not able to create a very significant swing on this device because of the large difference in the responsivity of the detectors and the diffusion of carriers from one detector to another. One possible solution to this problem may be to increase the separation of the diodes.

Footnote 1: This is a simplified description. In fact the diodes can go into forward bias under illumination, which means the node “in” can actually go above the supply voltage or below ground.

Figure 6.7: Schematic of receiverless optical clock injection with optical short pulses using a totem-pole diode pair. The inverter provides very little capacitive loading, though it can be eliminated and the clock can be injected directly at the desired node. (Schematic labels: set pulse, reset pulse delayed by T/2, node of interest, sampled node.)

Figure 6.8: Equivalent circuit of the totem-pole pair implementation with interdigitated diodes. Due to the substrate connection this device was self-resetting. (Schematic labels: set pulse, sampled node, 600 Ω.)

The second scheme was implemented with interdigitated detectors with two fingers. The area of the detector was 14.4 × 10.4 µm². Unfortunately, this scheme created a substrate connection, tying the device node to ground. However, this ground connection was effectively through a 600 Ω resistance, which made this scheme self-resetting and required only one beam input. There are also disadvantages of self-resetting: first, operation now requires more optical power, and second, the resetting edge is completely controlled by the substrate connection and not through the beam.

Fig. 6.9 shows the operation of this device via the sampled output of the inverter when a 6 pJ pulse was incident on the device. Assuming 600 Ω resistance, this curve is very similar to the one predicted by simulations. Simulations also indicate that the slew rate of this curve is limited by the inverter.

Figure 6.9: Receiverless optical clock injection with optical short pulses of 6 pJ onto the totem-pole configuration of interdigitated detectors. (Horizontal axis: time (ns), 7.2 to 8.4.)

Fig. 6.9 is a grey scale image with the intensity of a point proportional to the

probability of the sample. At the midpoint of the swing on the rising edge at the

device (the falling edge on the curve in Fig. 6.9 because this signal is after the inverter)

we can obtain a histogram to determine the jitter on the signal. Fig. 6.10 shows this

histogram. The standard deviation of this histogram is 4 ps and the peak-to-peak

jitter is 20 ps. These measurements are close to the accuracy of our measurement

setup; when the optical pulse was directly put into the oscilloscope, through the

oscilloscope’s optical input, a similar jitter was obtained with a standard deviation

of 3.7 ps.
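
To illustrate how close the measured jitter is to the instrument floor, the following sketch subtracts the two standard deviations in quadrature. It assumes, which the text above does not state, that the setup jitter and the device jitter are independent and Gaussian so that their variances add. The function name is hypothetical.

```python
import math


def residual_jitter_ps(measured_ps: float, setup_ps: float) -> float:
    """Jitter left after removing the setup contribution in quadrature.

    Assumes independent Gaussian jitter sources, so variances add.
    Returns 0.0 if the measurement is at or below the setup floor.
    """
    diff = measured_ps ** 2 - setup_ps ** 2
    return math.sqrt(diff) if diff > 0 else 0.0


# Values quoted in the text: 4 ps at the inverter output, 3.7 ps for the
# optical pulse fed directly into the oscilloscope's optical input.
print(f"{residual_jitter_ps(4.0, 3.7):.2f} ps")
```

Under this assumption the residual is about 1.5 ps, which should be read only as an order-of-magnitude indication because both inputs are themselves near the resolution of the setup.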

The incidence time of a pulse can be very precisely controlled by changing the

path length of the beam. This allows for very precise clock phase variation. Jitter

histograms of normal incidence and 10 ps delayed incidence in Fig. 6.10 illustrate

the controllability of the clock phase [130]. The mean of the falling edge after the

inverter is shifted by 10 ps, demonstrating the accuracy of the technique. In this

proof-of-principle demonstration, we measured the output of the inverter, which not

only reduced the slew rate of the signal but also added jitter. Consequently, the

signal at the detector itself may have even lower jitter.

Figure 6.10: Histogram of the pulse signals crossing the marker level at half their swing. The histograms correspond to two experiments, one of which is delayed by 10 ps more relative to the reference clock. Early curve: µ = 7.4678 ns, σ = 3.9253 ps, 298 hits. Late curve: µ = 7.4778 ns, σ = 3.988 ps, 319 hits. (Horizontal axis: time in ns, 7.45 to 7.50.)
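
The relation between path length and clock phase can be made concrete with a short calculation. This is a minimal sketch, assuming free-space propagation (refractive index 1) unless another index is passed. The function names are illustrative. It also checks the 10 ps shift against the two histogram means reported for Fig. 6.10.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def extra_path_mm(delay_ps: float, refractive_index: float = 1.0) -> float:
    """Additional one-way path length (mm) that delays a pulse by delay_ps."""
    return C_M_PER_S / refractive_index * delay_ps * 1e-12 * 1e3


def mean_shift_ps(early_ns: float, late_ns: float) -> float:
    """Shift between two histogram means, converted from ns to ps."""
    return (late_ns - early_ns) * 1e3


# A 10 ps clock phase step corresponds to roughly 3 mm of extra free-space path.
print(f"{extra_path_mm(10.0):.2f} mm")
# Means of the early and late histograms quoted for Fig. 6.10.
print(f"{mean_shift_ps(7.4678, 7.4778):.1f} ps")
```

A millimetre-scale path change is easy to set mechanically, which is why the clock phase can be adjusted so finely by this method.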

This scheme can be improved further by using lower capacitance diodes. The

silicon-on-insulator process reduces the capacitance of integrated diodes quite sig-

nificantly, which will reduce the amount of optical power required. To improve the

responsivity of these diodes, the frequency of the light can be doubled. At 425 nm

the absorption length reduces to ∼ 0.2 µm. This will make a large impact on the

performance of these detectors.

6.3 Summary

Short pulses can potentially help with synchronization. As we saw in Section 6.1,

short pulses can synchronize an array of modulators by eliminating skew and jitter.

By nominally placing the pulses in the center of the bit period, skew and jitter of up

to half of a bit period can be removed. We demonstrated skew and jitter removal

of 3/8 of a bit period with this method. Synchronizing channels with short pulses

should eliminate the need for per-channel skew compensation, reducing the overall

complexity of the design. In conclusion, short optical pulses provide a simple and

scalable solution to data recovery in very large parallel interconnects.
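
The half-bit-period limit can be seen in an idealized timing model. The sketch below is an assumption-laden illustration, not the experiment of Section 6.1: it treats modulator transitions as instantaneous, expresses time in units of the bit period, and places the read pulse exactly at the nominal bit center. The function names are hypothetical.

```python
import math


def bit_index_read(k: int, skew_fraction: float) -> int:
    """Index of the bit held by a modulator when read pulse k arrives.

    The modulator is assumed to hold bit n during [n + s, n + 1 + s), in
    units of the bit period, where s is the channel skew. The read pulse
    is assumed to arrive at the nominal bit center, k + 0.5.
    """
    return math.floor(k + 0.5 - skew_fraction)


def read_is_correct(skew_fraction: float, n_bits: int = 8) -> bool:
    """True if every read pulse samples the bit it was meant to sample."""
    return all(bit_index_read(k, skew_fraction) == k for k in range(n_bits))


# Skews of +/- 3/8 of a bit period are absorbed; 0.6 of a period is not.
for s in (-0.375, 0.0, 0.375, 0.6):
    print(s, read_is_correct(s))
```

Because every channel is read out at the instant the pulse arrives, the outputs of all channels are aligned to the pulse regardless of their individual electrical skew, as long as that skew stays within half a bit period.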

Very precise clock injection is also possible with short pulses using a receiverless

scheme, with potential applications in analog-to-digital conversion, high-speed

multiplexing and demultiplexing, and low-skew on-chip clock distribution. Silicon

detectors were investigated because of their ease of integration with the current CMOS

process and their low capacitance. Using on-chip electrical samplers, the high-frequency

response of these detectors was obtained. A proof-of-principle experiment presented

here demonstrated the operation of precise clock injection with very low jitter.

Chapter 7

Wavelength Division Multiplexing System

It is common practice to use wavelength-division multiplexing (WDM) in telecommunications.

The use of WDM enhances the capacity of the fibers by using passive

optical components to separate the different wavelength channels, with each channel

then processed by electronic circuits. Moreover, currently deployed WDM systems

typically operate in the wavelength range of erbium doped fiber amplifiers. A single

amplifier can amplify all the channels simultaneously, a significant advantage in the

system.

For short distance interconnects the issues are significantly different. The through-

put, the latency, and the cost are of paramount importance. By operating each chan-

nel at the maximum speed of the silicon technology without time-division multiplexing

and providing a large number of parallel channels, the throughput can be increased

while keeping transmitters and receivers very simple. Silicon CMOS allows cheap yet

dense circuits, and by using multiple parallel channels, a higher throughput can be

achieved at a low cost. Compared to point-to-point links, wavelength-division mul-

tiplexing allows communication using a single fiber. In space-constrained backplanes

and non-line-of-sight links, WDM might be a preferred solution.

Typical WDM implementations involve one laser for each channel emitting at a

specified wavelength, which is monitored very closely to avoid drift. As the number

of channels increases, the number of wavelengths also increases, increasing the cost

and the complexity of the system. Also, the channels need to be synchronized at the

receiver end to remove skew and jitter for a synchronous system implementation.

A broadband optical source can be spectrally sliced to generate WDM channels,

which could then be modulated using modulators. This concept was implemented by

Wagner et al. [131] and Sampson et al. [132] using a superluminescent light-emitting

diode (LED) as a broadband source. This implementation removes the wavelength

monitoring requirement from each individual channel, though synchronization

still needs to be done. Spectrally sliced WDM can also be implemented using short

pulses [46] [133] [134] [135]. Using femtosecond pulses as a broadband source can not

only remove the monitoring requirement of each channel, but also synchronize all the

channels in the readout of modulators.

In this chapter we present a proof-of-principle demonstration WDM system, using

spectral slicing of short pulses for short distance interconnects. This system can po-

tentially utilize all the advantages of short pulses mentioned in the earlier chapters.

The concept of a WDM system using spectral slicing of short pulses is presented in

Section 7.1. The first and second generation system implementations and measurement

results are presented in Section 7.2. Finally, the conclusions and possible improvements

to the system are discussed in Section 7.3.

7.1 Concept of WDM with short pulses

Femtosecond pulses have a very large bandwidth. A 150 fs pulse has a spectral width

of roughly 5 nm. This spectrum can be divided to form different channels. A train of

short pulses is represented by a train of Dirac-delta impulses in the frequency domain

with the envelope of these impulses determined by the Fourier transform of the pulse

shape.

\[
\sum_{n=-\infty}^{+\infty} p(t - nT) \;\Leftrightarrow\; P(f) \sum_{n=-\infty}^{+\infty} \delta\!\left(f - \frac{n}{T}\right) \qquad (7.1)
\]

Figure 7.1: An exaggerated view of the frequency comb incident on the modulator array. Frequency components of the 80 MHz pulse train are separated in space by a blazed grating. (Figure labels: fs pulses at 80 MHz; blazed grating; comb of frequencies separated by the grating; dimension labels 17 and 45.5 µm.)

where P(f) is the Fourier transform of the pulse p(t). The separation of impulses in the

frequency domain is the same as the repetition frequency of pulses in the time domain.

To keep the pulse sufficiently short for each channel, it is important to have a

large number of impulse components in the frequency domain for each channel. Us-

ing a single component (effectively a single wavelength) for each channel would make

the WDM implementation similar to a broadband source implementation and many

advantages of short pulses in interconnects would not be utilized. Fig. 7.1 shows an

exaggerated view of different frequency components incident on an array of modula-

tors. The modulator spacing shown in the figure corresponds to the implementation

in this work. Because the envelope of the pulse spectrum, P(f), is not flat, different channels

will receive different optical powers. This variation in power needs to be accounted

for in the optical power budget of the system.
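
The content of Eq. (7.1) can be checked numerically on a small discrete example. The sketch below is only an illustration under stated assumptions: the period of 8 samples, the record of 8 periods, and the two-sample pulse shape are arbitrary choices, and a direct DFT sum is used so that only the standard library is needed.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of the discrete Fourier transform (direct O(N^2) sum)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * m / n)
                for m, x in enumerate(samples)))
        for k in range(n)
    ]


PERIOD = 8          # samples per repetition period T (illustrative)
N_PERIODS = 8       # number of pulses in the record (illustrative)
pulse = [1.0, 0.5]  # short pulse shape p(t), two samples long (illustrative)

# Build the periodic pulse train: one copy of the pulse per period.
train = [0.0] * (PERIOD * N_PERIODS)
for n in range(N_PERIODS):
    for i, v in enumerate(pulse):
        train[n * PERIOD + i] += v

mags = dft_magnitudes(train)
# Bins that carry energy: they fall only at multiples of N / PERIOD,
# i.e. at multiples of the repetition frequency 1/T.
lines = [k for k, m in enumerate(mags) if m > 1e-9]
print(lines)
# The heights of those lines follow the transform of the single pulse.
print([round(mags[k], 3) for k in lines])
```

The non-zero bins are equally spaced by the repetition frequency, and their unequal heights are the discrete counterpart of the non-flat envelope P(f) that leads to different optical powers in different channels.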

A single short pulse source generating all the channels simplifies many system

criteria and provides multiple advantages. The benefits of the short pulse WDM

system with MQW diodes hybrid-integrated to silicon CMOS chips are summarized

below:

i. In traditional WDM, different lasers generate different channels, which need to

be carefully monitored. If the laser frequency drifts, it can generate crosstalk

to the neighboring channel. However, using a single source to generate all the

channels eliminates this problem. The linear dimension of the modulator defines

the spectral width of a channel, and the separation of the modulators defines the

guard band between the channels. Since these are fixed dimensions, no monitoring

is required.

ii. All the advantages of short pulses mentioned in the earlier chapters can be utilized.

iii. Hybrid integration enables each channel to be placed very close to the electrical

origin of the signal, reducing the latency of the propagation in the wires. In contrast,

if multiple streams are time-multiplexed onto a single channel, the streams

need to be routed to the multiplexer, incurring extra latency in the process. In

WDM, channel multiplexing and demultiplexing can be accomplished using a

passive optical component without incurring any latency penalty.

Figure 7.2: Schematic of the WDM system implementation. (Figure labels: fs pulses; transmitter chip; gratings; receiver; high speed detector; readout beam.)

The schematic in Fig. 7.2 shows the principle of operation of the WDM system.

Short optical pulses (∼ 150 fs), generated by a Ti:Sapphire modelocked laser, are

dispersed by the first grating into a wavelength spread in space. A lens collimates

the different wavelengths, which are then incident on the array of modulators. Each

modulator modulates a small band of wavelengths. The modulated light is reflected

back to the grating where it is again combined into a single beam (multiplexing). A

single mode fiber transports this beam to the destination. A second grating disperses

the beam into a spatial wavelength spread (demultiplexing). The modulated channels

are directed onto the corresponding receiver diodes. Received data is converted to a full logic

swing by the receiver.

It is important to note here that the pulses after modulation and multiplexing

are of the order of a few picoseconds. The width of the individual pulse is still short

compared to the bit period, thus retaining all the advantages of the short pulse link.

Dispersion in the fiber is not a concern because the distances involved are of the order

of a few tens of meters. Even for longer distances, Shen et al. have demonstrated the

transmission of spectrally sliced channels with a total span of 15 nm over a 2.5 km

standard single mode fiber/dispersion-compensating fiber link with less than 3 ps

timing skew [46].

The implementation of the proof-of-principle demonstration system is described in

the next section.

7.2 System implementation

The diodes integrated on the silicon chips had a pitch of 62.5 µm. The ∼ 5 nm

bandwidth of the short pulse train was spatially distributed on 20 diodes, with each

two adjacent diodes forming a differential channel. The light falling between the

diodes is ideally absorbed, which forms a guard band between adjacent diodes. In a

system prone to misalignment this guard band avoids crosstalk between channels,

but it is not really required in a properly aligned system. Removing the guard band

will improve the power budget of the system. A channel spacing of 0.5 nm was used,

which corresponded to a frequency separation of ∼ 200 GHz. The modulators had

a window of 17 µm × 17 µm, which modulated the light. This modulator spacing was

spectrally inefficient, but the pitch of the diode array was fixed for future operation with fiber

ribbon. At 80 MHz (the repetition frequency of the laser), around 340 frequency

components fall on the modulator and 900 components fall on the space between the

diodes. The number of frequency components modulated by the modulator was large

enough to retain the pulse width in picoseconds, which was still much shorter than

the bit period.
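
The numbers in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below makes several assumptions that are not stated in the text: a transform-limited sech² pulse (time-bandwidth product 0.315), a center wavelength of 850 nm, and a linear mapping of frequency onto position across the array. It takes the rounded 200 GHz channel spacing as spanning two diode pitches, since two adjacent diodes form one differential channel. All names are illustrative.

```python
C = 299_792_458.0  # speed of light, m/s


def wavelength_span_to_frequency(delta_lambda_m, center_lambda_m):
    """Frequency span (Hz) of a small wavelength span around center_lambda_m."""
    return C * delta_lambda_m / center_lambda_m ** 2


def spectral_width_nm(pulse_fwhm_s, center_lambda_m, time_bandwidth_product):
    """Spectral FWHM (nm) of a transform-limited pulse."""
    delta_nu = time_bandwidth_product / pulse_fwhm_s
    return center_lambda_m ** 2 * delta_nu / C * 1e9


CENTER = 850e-9      # assumed center wavelength
TBP_SECH2 = 0.315    # assumed sech^2 pulse shape
REP_RATE = 80e6      # laser repetition frequency from the text
PITCH_UM = 62.5      # diode pitch from the text
WINDOW_UM = 17.0     # modulator window from the text
CHANNEL_HZ = 200e9   # rounded channel spacing from the text (two diode pitches)

width_nm = spectral_width_nm(150e-15, CENTER, TBP_SECH2)
spacing_ghz = wavelength_span_to_frequency(0.5e-9, CENTER) / 1e9

hz_per_um = (CHANNEL_HZ / 2) / PITCH_UM
lines_on_window = WINDOW_UM * hz_per_um / REP_RATE
lines_in_gap = (PITCH_UM - WINDOW_UM) * hz_per_um / REP_RATE

print(f"spectral width of a 150 fs pulse: {width_nm:.2f} nm")
print(f"0.5 nm channel spacing: {spacing_ghz:.0f} GHz")
print(f"comb lines on a modulator window: {lines_on_window:.0f}")
print(f"comb lines in the gap between windows: {lines_in_gap:.0f}")
```

With these inputs the sketch gives a spectral width of about 5.1 nm, a channel spacing of roughly 207 GHz, 340 comb lines on each modulator window, and 910 in each gap, consistent with the approximate figures quoted above.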

A silicon chip designed in the 0.5 µm technology was used in testing. The chip

consisted of a pseudo-random bit sequence (PRBS) generator driving an array of ten

differential channels. Integrating receivers were used because of their high sensitivity. The

receivers were designed to either drive modulators for all-optical testing, or to drive

on-chip circuits for bit error rate testing.

The system was implemented in two iterations; the second fixed the shortcomings of the

first generation, as described below.

7.2.1 Optical setup

A picture of the first generation optical setup is shown in Fig. 7.3. Spindler and

Hoyer components were used to assemble the system. Short pulses were directed to

the chip by a polarizing beam-splitter, so that the modulated beam reflected from

the chip could be rotated in polarization by 90° to redirect it in a different direction

and couple it into a single mode fiber. A pellicle beam splitter was used to illuminate

the chip with an infra-red LED to help align the input pulses by viewing the chip

through a camera. This pellicle was removable to minimize the loss. The pellicle was

very thin, and it did not shift the position of the beam.

There were a few problems with this first generation optical setup:

i. The components were mounted at a height on thin rod structures, which made

them susceptible to mechanical vibrations.

ii. Any vibration causing angular variation in the grating caused large lateral motion

of the spots, and sometimes the spots moved off the optical devices.

iii. Losses in the system were too high to make the entire link work.

Another setup was designed to fix these issues. This second setup was built

on baseplates, which provide a much more stable platform. The vibration problem

encountered in the first setup was eliminated in this implementation. Baseplates are

discussed in detail by Brubaker et al. [48] and a brief description is given in Chapter 3.

Figure 7.3: First generation optical setup using Spindler and Hoyer components. The portion on the transmitter side is visible. (Figure labels: transmitter chip; grating.)

Figure 7.4: Second generation WDM link optical setup. A closeup of the receiver side is shown in the picture. (Figure labels: receiver chip; pellicle beam splitter; multiplexed beam through the fiber; grating for demultiplexing; imaging camera; IR LED.)

Fig. 7.4 shows a closeup of the optical setup on the receiver side, where the receiver

chip, the imaging camera, and the grating are visible. Gold-coated echelle gratings

were used for multiplexing and demultiplexing different wavelength channels on the

MQW diode array. To align the input beam to the right receiver, the chip could be

observed through a camera. Once aligned, the pellicle beam splitter could be removed

from the setup without beam deviation to reduce the overall loss in the link.

Figure 7.5: CCD scan of the wavelength of the modulated transmitter output. Solid and dashed lines represent two snapshots at different times. The corresponding modulators are shown below the wavelength scan. (Horizontal axis: wavelength from 846.5 nm to 849.5 nm; channels ch.1 to ch.4 and the modulator array are marked, with markers at λ = 847.1 nm and λ = 849.1 nm.)

7.2.2 Measurement results

By replacing the receiver chip with a CCD camera, channel definitions in the received

beam could be visualized. Different wavelengths were imaged linearly across the

camera. Fig. 7.5 shows the first four channels at two different time instances. All

four channels have changed their state. The finite contrast ratio of the modulators is

visible in the picture.

The losses in the second generation system were still quite high. Due to a large

coupling loss into the single mode fiber, the link operation with the fiber was not

feasible. It was possible to operate the link with light propagating in free space. A

single channel was tested by externally driving a 32-bit pseudo-random sequence on a

modulator. The drive signal of the modulator was correctly replicated by the receiver

(Fig. 7.6). The receiver output was connected to an electrical pad to view the signal

directly on the oscilloscope. This result demonstrates, in principle, the operation of

a short-pulse-based WDM system [136].

Figure 7.6: 80 Mb/s operation of a single channel in a WDM link. (Trace labels: transmitted data; received data.)

The main reasons for the low power in the system were: a) larger-than-expected

losses in optical components, such as the gratings; b) the low contrast ratio of

the modulators (∼ 1.3); c) a large spacing between modulators, effectively reducing

spectral utilization efficiency. The next step would be to reduce the losses in the

system to be able to operate the WDM link with the fiber.
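
To give a feel for how much a contrast ratio of about 1.3 costs, the sketch below evaluates the fraction of the reflected optical power that actually carries signal. This is an illustrative calculation and not one made in this work: it uses the common definition (P_high − P_low)/(P_high + P_low) and compares against an ideal modulator with infinite contrast at the same total power. The function names are hypothetical.

```python
import math


def modulation_fraction(contrast_ratio: float) -> float:
    """(P_high - P_low) / (P_high + P_low) for a given contrast ratio."""
    return (contrast_ratio - 1.0) / (contrast_ratio + 1.0)


def power_penalty_db(contrast_ratio: float) -> float:
    """Optical power penalty relative to an ideal (infinite contrast) modulator."""
    return -10.0 * math.log10(modulation_fraction(contrast_ratio))


# Contrast ratio of about 1.3, as measured for the modulators in this link.
print(f"signal fraction: {modulation_fraction(1.3):.3f}")
print(f"power penalty:   {power_penalty_db(1.3):.1f} dB")
```

Under these definitions only about 13% of the light contributes to the signal, a penalty of nearly 9 dB, which is why improving the contrast ratio is listed among the main ways to recover power budget.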

7.3 Summary

Combining all the advantages of short pulses, a spectrally sliced WDM system could

be implemented. The main features of such a system are: no need for wavelength

monitoring, receiver sensitivity enhancement, latency reduction in the receivers, and

synchronization of all the channels.

In this WDM implementation, fiber coupling losses were the final obstacle in mak-

ing the interconnect operate through the fiber. There are many ways in which this

system and components can be improved to increase the power budget and the per-

formance. The spacing between the modulators can either be eliminated, or some

method can be used to be more spectrally efficient, i.e., all the light intended for one

channel can be focused on the modulator. Using micro-lenses is one way of focusing

all the light on the modulator. Alternatively, using a frequency comb, the energy can be

concentrated on the appropriate modulator locations [137]. Improving the contrast ratio

of the modulators will also improve the system performance. MQW diodes bonded

on the silicon chips had a capacitance of ∼ 260 fF. By reducing this capacitance,

the sensitivity of the integrating receiver can be improved, relaxing the system power

budget.

In conclusion, a proof-of-principle operation of a short-pulse-based WDM inter-

connect system, without fiber, was demonstrated.

Chapter 8

Conclusions

In providing dense interconnects with large bandwidths to silicon chips, optics can

be an alternative to electrical wires. Latency, power budget, and synchronization are

critical issues in interconnects. Optical links might be able to address these issues by

using unconventional means, i.e., short pulse signaling.

This dissertation has shown that the RZ data format with a low duty cycle (short

pulses), instead of NRZ, can bring significant improvements in the interconnect per-

formance. Short pulses have a) all the energy concentrated in a very short time

(sub-picosecond and picoseconds); b) very sharp rising and falling edges; c) wide

bandwidth (a few nm); and d) very low pulse-to-pulse jitter, depending on the mode of

generation. Because of these properties, short pulses in interconnects may provide

sensitivity enhancement and latency reduction in receivers, synchronization of large

modulator arrays, precise clock injection to silicon chips, and single-source WDM. A

short pulse system is feasible only in optics because of low attenuation and dispersion

during propagation.

The flip-chip bonding process as described in Chapter 3 allows the integration

of well-established high-performance silicon circuits with optically superior GaAs de-

vices. This integration enables the use of short pulses in optical interconnects to

silicon chips with very little added parasitics.

Chapter 4 showed that the sensitivity of transimpedance and integrating receivers

could be enhanced by using short pulses. A 3 dB sensitivity enhancement for an inte-

grating receiver was demonstrated. A transimpedance receiver operating with short

pulses may generate voltage spikes on the supply, which can degrade its performance.

In contrast, an integrating receiver integrates the charge in the pulses and will not

generate current spikes because of short pulses. This receiver was found to be better

suited for operation with short pulses.

The latency of optical interconnects can be reduced to make them feasible for

global on-chip interconnects as shown in Chapter 5. The latency of three receiver

architectures (transimpedance, integrating, and totem-pole diode pair) was analyzed.

Short pulses significantly improved the performance of all three receivers. It was

demonstrated that the latency of the transimpedance receiver could be reduced by ∼65% by using short pulses compared to NRZ data. A totem-pole diode pair (“recless”)

receiver had the shortest delay in short pulse interconnects, but at the expense of

optical power.

In dense parallel interconnects, synchronization of all the bit streams on the receiver

side for easy data recovery is a critical task. Skew and jitter of up to half a bit period

can be removed from an entire array of modulators with a short pulse readout. Skew

and jitter removal of 3/8 of a bit period was demonstrated in Chapter 6. Compared to

schemes such as per-pin skew compensation, this scheme is simple and easily scalable

without reducing the density of interconnects. The laser output power requirement

scales linearly with the number of channels.

A precise skew and jitter-free clock is required in applications such as high speed

multiplexing and demultiplexing, analog-to-digital conversion, and precise sampling

of on-chip signals. It was demonstrated that by eliminating the receiver amplifier circuit

and using only the diode pair, a precise clock can be injected into the circuit.

Even though silicon detectors have a long-tail response at 850 nm because of carriers

generated deep in the substrate, they were used because of the potential for lower capacitance and cost. The

high frequency response of these detectors was obtained using on-chip samplers. A

proof-of-principle experiment presented in Chapter 6 demonstrated precise clock

injection with very low jitter.

Many advantages of short pulses were incorporated in the demonstration of a

spectrally-sliced WDM interconnect system in Chapter 7. The main features of such

a system are: no need for wavelength monitoring, receiver sensitivity enhancement,

latency reduction in receivers, and synchronization of all the channels. This system

was operated at 80 MHz, the repetition rate of the short pulse laser. The losses were

very high in the system and the transportation of all the channels via fiber could not

be demonstrated.

A very large I/O throughput can be achieved by using flip-chip bonded MQW

diodes. In the present work, 200 diodes were integrated in an area of ∼ 1.2 × 1.2 mm².

With a differential scheme, a total of 100 I/O channels could potentially be operated. Assuming

a conservative speed of operation at 600 Mbps (it improves with technology scaling),

the total throughput could be 60 Gbps from this chip. This demonstrates the huge

throughput possible with optical interconnects.
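
The throughput estimate above is simple arithmetic, reproduced here as a short sketch. The per-area figure is an added illustration derived from the quoted 1.2 × 1.2 mm² footprint. The function name is hypothetical.

```python
def aggregate_throughput_gbps(n_diodes: int, diodes_per_channel: int,
                              rate_mbps: float) -> float:
    """Total throughput in Gbps for an array of diodes grouped into channels."""
    channels = n_diodes // diodes_per_channel
    return channels * rate_mbps / 1000.0


# 200 diodes, two per differential channel, 600 Mbps per channel.
total_gbps = aggregate_throughput_gbps(200, 2, 600.0)
area_mm2 = 1.2 * 1.2
print(f"total throughput: {total_gbps:.0f} Gbps")
print(f"throughput density: {total_gbps / area_mm2:.1f} Gbps/mm^2")
```

This reproduces the 60 Gbps figure and corresponds to roughly 42 Gbps per square millimetre of diode array.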

Future work

This dissertation has tried to explore short pulse (RZ) signaling in interconnects. This

work has just scratched the surface of this potentially vast field. Dense interconnects

to silicon chips, and global on-chip interconnects might be practical using short pulses.

To demonstrate the feasibility, systems with the possibility of miniaturization need

to be built. This means that the packaging of optical systems becomes a critical issue.

Traditionally, the packaging has been one of the bottlenecks in widespread implemen-

tation of optics. A lot of effort is being put in to miniaturize optical systems and

to improve the optomechanics. Modelocked semiconductor lasers are getting small

enough to fit into a reasonably sized system, though more research is required in this

area. Work on the optical bridges [138] to simplify and miniaturize the optomechanics

is a step forward for improved packaging.

The interconnect system in the present work can be improved in many ways.

On the component side, lower capacitance and high contrast ratio devices will be

very helpful. Devices used in this dissertation work had a capacitance of ∼ 260 fF.

Smaller MQW devices can be fabricated to reduce the capacitance to below 50 fF [30].

Flip-chip bonding on silicon-on-insulator circuits will further reduce the capacitance

of these devices. The contrast ratio of these devices needs to be enhanced for bet-

ter signal-to-noise ratio in interconnects. Low contrast devices are easily saturated

because of the high power required to get sufficient signal strength. The circuits pre-

sented in this work are meant to demonstrate the properties and advantages of short

pulses. Optimizing these circuits specifically for operation with short

pulses will create a more efficient interconnect system.
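
The benefit of lower device capacitance for an integrating receiver can be sketched with the relation ΔV = Q/C. The example below is illustrative only: the required voltage swing (50 mV) and the detector responsivity (0.5 A/W) are hypothetical values chosen for the calculation and are not measurements from this work. Only the two capacitances, 260 fF and 50 fF, come from the text.

```python
def energy_for_swing_fj(capacitance_ff: float, swing_v: float,
                        responsivity_a_per_w: float) -> float:
    """Optical pulse energy (fJ) needed to produce swing_v on the input node.

    Charge Q = C * dV (fF * V = fC); energy = Q / responsivity, since a
    responsivity in A/W is equivalently C/J.
    """
    charge_fc = capacitance_ff * swing_v
    return charge_fc / responsivity_a_per_w


SWING_V = 0.05       # hypothetical required input swing
RESPONSIVITY = 0.5   # hypothetical detector responsivity, A/W

for c_ff in (260.0, 50.0):
    e_fj = energy_for_swing_fj(c_ff, SWING_V, RESPONSIVITY)
    print(f"{c_ff:.0f} fF -> {e_fj:.1f} fJ per pulse")
```

Whatever swing and responsivity are assumed, the required energy scales linearly with capacitance, so going from 260 fF to 50 fF reduces it by a factor of about 5.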

The scaling of CMOS technology will help improve the performance of modulator

drivers and receivers. However, it will also create new challenges. A lower supply

voltage will make it harder to get a high contrast ratio from the modulators. These

modulators will need to be redesigned to operate with smaller swings. Or, failing

that, the circuits will need to provide a larger-than-supply swing to operate the mod-

ulators.

Bibliography

[1] J. Goodman, F. Leonberger, S. Kung, and R. Athale, “Optical Interconnections

for VLSI Systems,” Proceedings of the IEEE, vol. 72, pp. 850–866, 1984.

[2] J. Goodman, “Fan-in and fan-out with optical interconnects,” Optica Acta,

vol. 32, pp. 1489–1496, 1985.

[3] M. Feldman, S. Esener, C. Guest, and S. Lee, “Comparison between optical

and electrical interconnects based on power and speed considerations,” Applied

Optics, vol. 27, no. 9, pp. 3820–3829, 1988.

[4] D. Miller and H. Ozaktas, “Limit to the Bit-Rate Capacity of Electrical Inter-

connects from the Aspect Ratio of the System Architecture,” Journal of Parallel

and Distributed Computing, vol. 41, pp. 42–52, Feb. 1997.

[5] D. Miller, “Physical Reasons for Optical Interconnection,” International Jour-

nal of Optoelectronics, vol. 11, pp. 155–68, May 1997.

[6] D. Miller, “Dense Optical Interconnections for Silicon Electronics,” in Trends in

Optics: Research, Developments, and Applications, vol. 3 of Ed: A. Consortini,

pp. 207–222, 1996.

[7] D. Miller, “Rationale and Challenges for Optical Interconnects to Electronic

Chips,” Proceedings of the IEEE, vol. 88, pp. 728–749, June 2000.

[8] A. Krishnamoorthy and D. Miller, “Scaling Optoelectronic-VLSI Circuits into

the 21st Century: A Technology Roadmap,” IEEE Journal of Selected Topics in Quantum

Electronics, vol. 2, pp. 55–76, Apr. 1996.

[9] D. Miller, “Optics for low-energy communication inside digital processors:

quantum detectors, sources, and modulators as efficient impedance converters,”

Optics Letters, vol. 14, no. 2, pp. 146–148, 1989.

[10] A. Krishnamoorthy and D. Miller, “Firehose architectures for free-space opti-

cally interconnected VLSI circuits,” Journal of Parallel and Distributed Com-

puting, vol. 41, pp. 109–114, Feb. 1997.

[11] H. Ozaktas and J. Goodman, “Implications of interconnection theory for optical

digital computing,” Applied Optics, vol. 31, no. 26, pp. 5559–5567, 1992.

[12] M. Haney and M. Christensen, “Performance Scaling Comparison for Free-

Space Optical and Electrical Interconnection Approaches,” Applied Optics,

vol. 37, pp. 2886–2894, May 1998.

[13] G. Yayla, P. Marchand, and S. Esener, “Speed and Energy Analysis of Digital

Interconnections: Comparison of On-Chip, Off-Chip, and Free-Space Technolo-

gies,” Applied Optics, vol. 37, pp. 205–227, Jan. 1998.

[14] E. Berglind, L. Thylen, B. Jaskorzynska, and C. Svensson, “A comparison of

dissipated power and signal-to-noise ratios in electrical and optical intercon-

nects,” Journal of Lightwave Technology, vol. 17, pp. 68–73, Jan. 1999.

[15] W. Dally and J. Poulton, “Transmitter equalization for 4-Gbps signaling,” IEEE

Micro, vol. 17, pp. 48–56, Jan. 1997.

[16] M. Horowitz, C. Yang, and S. Sidiropoulos, “High-speed electrical signaling:

overview and limitations,” IEEE Micro, pp. 12–24, Jan. 1998.

[17] A. Lentine, K. Goossen, J. Walker, L. Chirovsky, L. D’Asaro, S. Hui, B. Tseng,

R. Leibenguth, D. Kossives, D. Dahringer, D. Bacon, T. Woodward, and

D. Miller, “Arrays of optoelectronic switching nodes comprised of flip-chip-

bonded MQW modulators and detectors on silicon CMOS circuitry,” IEEE

Photonics Technology Letters, vol. 8, pp. 221–223, Feb. 1996.

[18] D. Cutrer and K. Lau, “Ultralow power optical interconnect with zero-biased,

ultralow threshold laser-how low a threshold is low enough?,” IEEE Photonics

Technology Letters, vol. 7, pp. 4–6, Jan. 1995.

[19] R. Pu, C. Duan, and C. Wilmsen, “Hybrid integration of VCSEL’s to CMOS

integrated circuits,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 5,

pp. 201–208, Mar. 1999.

[20] A. Andreou, Z. Kalayjian, A. Apsel, P. Pouliquen, R. Athale, G. Simonis, and

R. Reedy, “Silicon on sapphire CMOS for optoelectronic microsystems,” IEEE

Circuits and Systems Magazine, vol. 1, no. 3, pp. 22–30, 2001.

[21] K. Choquette, V. Hietala, K. Geib, S. Mukherjee, and A. Allerman, “Hybrid

integrated VCSEL and driver arrays for optical interconnects,” in 13th Annual

Meeting of IEEE Lasers and Electro-Optics Society, vol. 2, pp. 424–425, 2000.

[22] F. Delpiano, B. Bostica, M. Burzio, P. Pellegrino, and L. Pesando, “10-channel

optical transmitter module operating over 10 Gb/s based on VCSEL and hybrid

integrated silicon optical bench,” in Electronic Components and Technology

Conference, pp. 759–762, 1999.

[23] K. Ebeling, “VCSELs: prospects and challenges for optical interconnects,” in

13th Annual Meeting of IEEE Lasers and Electro-Optics Society, vol. 1, pp. 7–8,

[24] D. Miller, D. Chemla, T. Damen, A. Gossard, W. Wiegmann, T. Wood, and

C. Burrus, “Band edge Electro-absorption in Quantum Well Structures: The

Quantum Confined Stark Effect,” Physical Review Letters, vol. 53, pp. 2173–

2177, Nov. 1984.

[25] G. Boyd, D. Miller, D. Chemla, S. McCall, A. Gossard, and J. English, “Mul-

tiple Quantum Well Reflection Modulator,” Applied Physics Letters, vol. 50,

pp. 1119–1121, Apr. 1987.

[26] R. Simes, R. Yan, C. Barron, D. Derrickson, D. Lishan, J. Karin, L. Coldren,

M. Rodwell, S. Elliot, and B. Hughes, “High-frequency electrooptic Fabry-Perot

modulators,” IEEE Photonics Technology Letters, vol. 3, pp. 513–515, June

[27] K. Goossen, J. Cunningham, W. Jan, and R. Leibenguth, “On the operational

and manufacturing tolerances of GaAs-AlAs MQW modulators,” IEEE Journal

of Quantum Electronics, vol. 34, pp. 431–438, Mar. 1998.

[28] M. Islam, R. Hillman, D. Miller, D. Chemla, A. Gossard, and J. English, “Elec-

troabsorption in GaAs/AlGaAs Coupled Quantum Well Waveguides,” Applied

Physics Letters, vol. 50, pp. 1098–1100, Apr. 1987.

[29] G. Livescu, D. Miller, T. Sizer, D. Burrows, J. Cunningham, A. Gossard, and

J. English, “High-speed absorption recovery in quantum well diodes by diffusive

electrical conduction,” Applied Physics Letters, vol. 54, pp. 748–750, 1989.

[30] K. Goossen, J. Walker, L. D’Asaro, S. Hui, B. Tseng, R. Leibenguth, D. Kos-

sives, D. Bacon, D. Dahringer, L. Chirovsky, A. Lentine, and D. Miller, “GaAs

MQW modulators integrated with silicon CMOS,” IEEE Photonics Technology

Letters, vol. 7, pp. 360–362, Apr. 1995.

[31] F. Kiamilev, J. Lambirth, R. Rozier, and A. Krishnamoorthy, “Design of a 64-

bit, 100 MIPS microprocessor core IC for hybrid CMOS-SEED technology,” in

Proceedings of the Third International Conference on Massively Parallel Pro-

cessing Using Optical Interconnections, Oct. 1996.

[32] R. Rozier and F. Kiamilev, “Design of an MCM FFT processor,” IEEE

Multi-Chip-Module Conference, pp. 83–88, Feb. 1997.

[33] A. Walker, T. Yang, J. Gourlay, J. Dines, M. Forbes, S. Prince, D. Baillie,

D. Neilson, R. Williams, L. Wilkinson, and G. Smith, “Optoelectronic systems

based on InGaAs-complementary-metal-oxide-semiconductor smart-pixel arrays

and free-space optical interconnects,” Applied Optics, vol. 37, pp. 2822–2830,

May 1998.

[34] O. Kibar, D. Van Blerkom, F. Chi, and S. Esener, “Power minimization and

technology comparisons for digital free-space optoelectronic interconnections,”

Journal of Lightwave Technology, vol. 17, pp. 546–555, Apr. 1999.

[35] C. Fan, B. Mansoorian, D. Vanblerkom, M. Hansen, V. Ozguz, S. Esener, and

G. Marsden, “Digital free-space optical interconnections: a comparison of trans-

mitter technologies,” Applied Optics, vol. 34, pp. 3103–3115, June 1995.

[36] T. Nakahara, S. Matsuo, S. Fukushima, and T. Kurokawa, “Performance com-

parison between multiple-quantum-well modulator-based and vertical-cavity-

surface-emitting laser-based smart pixels,” Applied Optics, vol. 35, pp. 860–871,

Feb. 1996.

[37] J. Goodman, Introduction to Fourier Optics. New York: McGraw-Hill, 1968.

[38] L. Camp, R. Sharma, and M. Feldman, “Guided-wave and free-space optical

interconnects for parallel-processing systems: a comparison,” Applied Optics,

vol. 33, pp. 6168–6180, Sept. 1994.

[39] S. Esener, “Implementation and prospects for chip-to-chip free-space optical

interconnects,” in Electron Devices Meeting, 2001.

[40] P. Rosenberg, K. Giboney, A. Yuen, J. Straznicky, D. Haritos, L. Buckman,

R. Schneider, S. Corzine, F. Kiamilev, and D. Dolfi, “The PONI-1

parallel-optical link,” in Proceedings of the Electronic Components and Technology

Conference, pp. 763–769, June 1999.

[41] N. Boden, D. Cohen, R. Felderman, A. Kulawik, C. Seitz, J. Seizovic, and

Wen-King Su, “Myrinet: a gigabit-per-second local area network,” IEEE Micro,

vol. 15, pp. 29–36, Feb. 1995.

[42] “The International Technology Roadmap for Semiconductors (2001 Edition).”

[43] R. Ho, K. Mai, and M. Horowitz, “The Future of Wires,” Proceedings of the

IEEE, vol. 89, pp. 490–504, Apr. 2001.

[44] K. Tamura, “Short pulse lasers and their applications to optical communica-

tions,” in IEEE Lasers and Electro-Optics Society, vol. 2, pp. 537–538, 1999.

[45] E. Avrutin, J. Marsh, and E. Portnoi, “Monolithic and multi-gigahertz

mode-locked semiconductor lasers: constructions, experiments, models and

applications,” IEE Proceedings of Optoelectronics, vol. 147, pp. 251–278, Aug. 2000.

[46] S. Shen and A. Weiner, “Demonstration of timing skew compensation for

bit-parallel WDM data transmission with picosecond precision,” IEEE Photonics

Technology Letters, vol. 11, pp. 566–568, May 1999.

[47] L. Boivin, M. Nuss, J. Shah, D. Miller, and H. Haus, “Receiver sensitivity

improvement by impulsive coding,” IEEE Photonics Technology Letters, vol. 9,

pp. 684–686, May 1997.

[48] J. Brubaker, F. McCormick, F. Tooley, J. Sasian, T. Cloonan, A. Lentine,

S. Hinterlong, and M. Herron, “Optomechanics of a free-space photonic switch:

the components,” in Proceedings of the SPIE, vol. 1533, Dec. 1991.

[49] H. P. Herzig, Micro-Optics: Elements, Systems and Applications. Taylor

& Francis Inc., 1997.

[50] J. Jahns and S. Sinzinger, Microoptics. John Wiley & Sons, 1999.

[51] J. Lin, J. Gamelin, S. Wang, M. Hong, and J. Mannaerts, “Short pulse gen-

eration by electrical gain switching of vertical cavity surface emitting laser,”

Electronics Letters, vol. 27, pp. 1956–1958, Oct. 1991.

[52] N. Stelmakh, J.-M. Lourtioz, G. Marquebielle, G. Volluet, and J.-P. Hirtz,

“Generation of high-energy (0.3 μJ) short pulses (400 ps) from a

gain-switched laser diode stack with subnanosecond electrical pump pulses,” Journal

on Selected Topics in Quantum Electronics, vol. 3, pp. 245–249, Apr. 1997.

[53] C. Chang, C. Sun, D. Albares, and E. Jacobs, “High-energy (59 pJ) and

low-jitter (250 fs) picosecond pulses from gain-switching of a tapered-stripe

laser diode via resonant driving,” IEEE Photonics Technology Letters, vol. 8,

pp. 1157–1159, Sept. 1996.

[54] B.-L. Lee and C.-F. Lin, “Short-pulse generation with broad-band tunability

from semiconductor lasers in an external ring cavity,” IEEE Photonics Tech-

nology Letters, vol. 12, pp. 618–620, June 2000.

[55] S. Arahira, Y. Matsui, T. Kunii, S. Oshiba, and Y. Ogawa, “Optical short pulse

generation at high repetition rate over 80 GHz from a monolithic passively

modelocked DBR laser diode,” Electronics Letters, vol. 29, pp. 1013–1015, May

[56] L. Krainer, R. Paschotta, G. Spuhler, I. Klimov, C. Teisset, K. Weingarten,

and U. Keller, “Tunable picosecond pulse-generating laser with repetition rate

exceeding 10 GHz,” Electronics Letters, vol. 38, pp. 225–227, Feb. 2002.

[57] A. Krishnamoorthy, T. Woodward, R. Novotny, K. Goossen, J. Walker,

A. Lentine, L. D’Asaro, S. Hui, B. Tseng, R. Leibenguth, D. Kossives,

D. Dahringer, L. Chirovsky, G. Aplin, R. Rozier, F. Kiamilev, and D. Miller,

“Ring oscillators with optical and electrical readout based on hybrid GaAs

MQW modulators bonded to 0.8 μm silicon VLSI circuits,” Electronics Letters,

vol. 31, pp. 1917–1918, Oct. 1995.

[58] T. Woodward, A. Krishnamoorthy, K. Goossen, J. Walker, B. Tseng, J. Lothian,

S. Hui, and R. Leibenguth, “Modulator-driver circuits for optoelectronic VLSI,”

IEEE Photonics Technology Letters, vol. 9, pp. 839–841, June 1997.

[59] E. McCluskey, Logic Design Principles: with Emphasis on Testable Semicustom

Circuits. Prentice-Hall, 1986.

[60] S. Golomb, Shift Register Sequences. Aegean Park Press, 1982.

[61] E. Yeung and M. Horowitz, “A 2.4 Gb/s/pin Simultaneous Bidirectional Parallel

Link with Per-Pin Skew Compensation,” Journal of Solid State Circuits, vol. 35,

pp. 1619–1628, Nov. 2000.

[62] P. Larsson and C. Svensson, “Measuring high-bandwidth signals in CMOS

circuits,” Electronics Letters, vol. 29, pp. 1761–1762, Sept. 1993.

[63] R. Ho, B. Amrutur, K. Mai, B. Wilburn, T. Mori, and M. Horowitz,

“Applications of on-chip samplers for test and measurement of integrated circuits,”

IEEE Symposium on VLSI Circuits, pp. 138–139, June 1998.

[64] S. Tewksbury, L. Hornak, H. Nariman, S. Langsjoen, and S. McGinnis, “Coin-

tegration of optoelectronics and submicron CMOS,” in Proceedings of Wafer

Scale Integration, pp. 358–367, Jan. 1993.

[65] A. Krishnamoorthy and K. Goossen, “Optoelectronic-VLSI: photonics

integrated with VLSI circuits,” IEEE Journal on Selected Topics in Quantum

Electronics, vol. 4, pp. 899–912, Nov. 1998.

[66] A. Krishnamoorthy, A. Lentine, K. Goossen, J. Walker, T. Woodward,

J. Ford, G. Aplin, L. D’Asaro, S. Hui, B. Tseng, R. Leibenguth, D. Kossives,

D. Dahringer, L. Chirovsky, and D. Miller, “3-D integration of MQW

modulators over active submicron CMOS circuits: 375 Mb/s transimpedance

receiver-transmitter circuit,” IEEE Photonics Technology Letters, vol. 7, pp. 1288–1290,

Nov. 1995.

[67] H. Wang, J. Luo, K. Shenoy, Y. Royter, C. Fonstad, and D. Psaltis,

“Monolithic integration of SEEDs and VLSI GaAs circuits by epitaxy on elec-

tronics,” IEEE Photonics Technology Letters, vol. 9, pp. 607–609, May 1997.

[68] M. Oren, A. McCarthy, F. Tooley, A. Laprise, D. Plant, A. Kirk, Y. Lu, and

J. Zhao, “Device processing technology for free-space optical interconnect sys-

tem,” in Electronic Components and Technology Conference, pp. 886–889, 2001.

[69] H. Chen, K. Liang, Q. Zeng, X. Li, Z. Chen, Y. Du, and R. Wu, “Flip-chip

bonded hybrid CMOS/SEED optoelectronic smart pixels,” IEE Proceedings of

Optoelectronics, vol. 147, pp. 2–6, Feb. 2000.

[70] S. Personick, “Receiver design for optical fiber systems,” Proceedings of the

IEEE, vol. 65, no. 12, pp. 1670–1678, 1977.

[71] T. Woodward, A. Krishnamoorthy, A. Lentine, and L. Chirovsky, “Optical re-

ceivers for optoelectronic VLSI,” IEEE Journal on Selected Topics in Quantum

Electronics, vol. 2, pp. 106–116, Apr. 1996.

[72] T. Nakahara, H. Tsuda, K. Tateno, S. Matsuo, and T. Kurokawa, “Hybrid

integration of GaAs pin-photodiodes with CMOS transimpedance amplifier cir-

cuits,” Electronics Letters, vol. 34, pp. 1352–1353, June 1998.

[73] G. Halkias, N. Haralabidis, E. Kyriakis-Bitzaros, and S. Katsafouros, “1.7

GHz bipolar optoelectronic receiver using conventional 0.8 μm

BiCMOS process,” in IEEE International Symposium on Circuits and Systems,

vol. 5, pp. 417–420, 2000.

[74] N. Dutta, K. Tu, and B. Levine, “Optoelectronic integrated receiver,” Electronics

Letters, vol. 33, pp. 1254–1255, July 1997.

[75] J. Choi, B. Sheu, and O. Chen, “A monolithic GaAs receiver for optical inter-

connect systems,” IEEE Journal of Solid State Circuits, vol. 29, pp. 328–331,

Mar. 1994.

[76] H. Zimmermann, T. Heide, and A. Ghazi, “Monolithic high-speed CMOS-

photoreceiver,” Photonics Technology Letters, vol. 11, pp. 254–256, Feb. 1999.

[77] A. Tanabe, M. Soda, Y. Nakahara, T. Tamura, Y. Yoshida, and A. Furukawa,

“A Single-Chip 2.4-Gb/s CMOS Optical Receiver IC with Low Substrate Cross-

Talk Preamplifier,” IEEE Journal of Solid State Circuits, vol. 33, pp. 2148–

2153, Dec. 1998.

[78] A. Krishnamoorthy, T. Woodward, K. Goossen, J. Walker, A. Lentine, L. Chi-

rovsky, S. Hui, B. Tseng, R. Leibenguth, J. Cunningham, and W. Jan, “Op-

eration of a single-ended 550 Mbit/s, 41 fJ, hybrid CMOS/MQW receiver-

transmitter,” Electronics Letters, vol. 32, pp. 764–766, Apr. 1996.

[79] T. Woodward and L. Chirovsky, “Operation of diode-clamped FET-SEED op-

tical receivers with low-contrast single-ended signals,” Photonics Technology

Letters, vol. 7, pp. 1489–1492, Dec. 1995.

[80] T. Yoon and B. Jalali, “1 Gbit/s fibre channel CMOS transimpedance ampli-

fier,” Electronics Letters, vol. 33, pp. 588–589, Mar. 1997.

[81] D. Blerkom, Chi Fan, M. Blum, and S. Esener, “Transimpedance receiver design

optimization for smart pixel arrays,” IEEE Journal of Lightwave Technology,

vol. 16, pp. 119–126, Jan. 1998.

[82] M. Forbes, Electronic design issues in high-bandwidth parallel optical interfaces

to VLSI circuits. PhD thesis, Heriot-Watt University, Mar. 1999.

[83] T. Woodward, “Optical receivers for smart pixel applications,” in Lasers and

Electro-Optics Society Annual Meeting, vol. 1, pp. 67–68, 1995.

[84] P. Winzer and A. Kalmar, “Sensitivity enhancement of optical receivers by

impulsive coding,” Journal of Lightwave Technology, vol. 17, pp. 171–177, Feb.

[85] J. Dines, “Smart pixel optoelectronic receiver based on a charge sensitive ampli-

fier design,” IEEE Journal on Selected Topics in Quantum Electronics, vol. 2,

pp. 117–120, Apr. 1996.

[86] T. Woodward, A. Krishnamoorthy, K. Goossen, J. Walker, J. Cunningham,

W. Jan, L. Chirovsky, S. Hui, B. Tseng, D. Kossives, D. Dahringer, D. Ba-

con, and R. Leibenguth, “Clocked-sense-amplifier-based smart-pixel optical re-

ceivers,” Photonics Technology Letters, vol. 8, pp. 1067–1069, Aug. 1996.

[87] M. Kuijk, D. Coppee, and R. Vounckx, “Spatially modulated light detector in

CMOS with sense-amplifier receiver operating at 180 Mb/s for optical data link

applications and parallel optical interconnects between chips,” IEEE Journal

on Selected Topics in Quantum Electronics, vol. 4, pp. 1040–1045, Nov. 1998.

[88] M. Matsui, H. Hara, Y. Uetani, Lee-Sup Kim, T. Nagamatsu, Y. Watanabe,

A. Chiba, K. Matsuda, and T. Sakurai, “A 200 MHz 13 mm² 2-D DCT

macrocell using sense-amplifying pipeline flip-flop scheme,” IEEE Journal of

Solid-State Circuits, vol. 29, pp. 1482–1490, Dec. 1994.

[89] B. Nikolic, V. Oklobdzija, V. Stojanovic, J. Wenyan, K. James, and L. Ming-

Tak, “Improved sense-amplifier-based flip-flop: design and measurements,”

IEEE Journal of Solid-State Circuits, vol. 35, pp. 876–884, June 2000.

[90] K. Williams, M. Dennis, I. Duling, C. Villarruel, and R. Esman, “A simple

high-speed high-output voltage digital receiver,” Photonics Technology Letters,

vol. 10, pp. 588–590, Apr. 1998.

[91] M. Yoneyama, K. Takahata, T. Otsuji, and Y. Akazawa, “Analysis and applica-

tion of a novel model for estimating power dissipation of optical interconnections

as a function of transmission bit error rate,” Journal of Lightwave Technology,

vol. 14, pp. 13–22, Jan. 1996.

[92] G. Keeler, D. Agarwal, B. Nelson, N. Helman, and D. Miller, “Performance

enhancement of an optical interconnect using short pulses from a modelocked

diode laser,” in Conference on Lasers and Electro-Optic Society, 2002.

[93] W. Dally and J. Poulton, Digital Systems Engineering. Cambridge University

Press, 1998.

[94] A. Dowlatabadi, “Challenges in CMOS mixed-signal designs for analog circuit

designers,” in Midwest Symposium on Circuits and Systems, vol. 1, pp. 47–50,

[95] G. Keeler, D. Agarwal, C. Debaes, B. Nelson, N. Helman, H. Thienpont, and

D. Miller, “Optical pump-probe measurements of the latency of silicon CMOS

optical interconnects,” IEEE Photonics Technology Letters, vol. 14, pp. 1214–

1216, Aug. 2002.

[96] D. Agarwal, G. Keeler, B. Nelson, N. Helman, and D. Miller, “Optical inter-

connect operation with high noise immunity,” in Conference on Lasers and

Electro-Optic Society, 2002.

[97] A. Deutsch, P. Coteus, G. Kopcsay, H. Smith, C. Surovic, B. Krauter, D. Edel-

stein, and P. Restle, “On-chip wiring design challenges for gigahertz operation,”

Proceedings of the IEEE, vol. 89, pp. 529–555, Apr. 2001.

[98] J. Dambre, H. Van Marck, and J. Van Campenhout, “Quantifying the impact

of optical interconnect latency on the performance of optoelectronic FPGAs,”

in The 6th International Conference on Parallel Interconnects, pp. 91–97, Oct.

[99] J. Collet, D. Litaize, J. VanCampenhout, C. Jesshope, M. Desmulliez, H. Thien-

pont, J. Goodman, and A. Louri, “Architectural approach to the role of optics in

monoprocessor and multiprocessor machines,” Applied Optics, vol. 39, pp. 671–

682, Feb. 2000.

[100] H. Neefs, P. Van Heuven, and J. Van Campenhout, “Latency requirements of

optical interconnects at different memory hierarchy levels of a computer sys-

tem,” in Proceedings of SPIE on Optics Computing, vol. 3490, pp. 552–555,

[101] E. Kyriakis-Bitzaros, N. Haralabidis, Y. Moisiadis, M. Lagadas, A. Georgakilas,

and G. Halkias, “Comparison of the signal latency in optical and electrical

interconnections for interchip links,” Optical Engineering, vol. 40, pp. 144–146,

Jan. 2001.

[102] B. Cherkauer and E. Friedman, “A unified design methodology for CMOS ta-

pered buffers,” IEEE Transactions on Very Large Scale Integration (VLSI) Sys-

tems, vol. 3, pp. 99–111, Mar. 1995.

[103] E. Kyriakis-Bitzaros, N. Haralabidis, M. Lagadas, A. Georgakilas, Y. Moisiadis,

and G. Halkias, “Realistic end-to-end simulation of the optoelectronic links

and comparison with the electrical interconnections for system-on-chip applica-

tions,” Journal of Lightwave Technology, vol. 19, pp. 1532–1542, Oct. 2001.

[104] J. Weiland, H. Melchior, M. Kearley, C. Morris, A. Moseley, M. Goodwin,

and R. Goodfellow, “Optical receiver array in silicon bipolar technology with

selfaligned, low parasitic III/V detectors for DC-1 Gbit/s parallel links,” Elec-

tronics Letters, vol. 27, pp. 2211–2213, Nov. 1991.

[105] D. Agarwal and D. Miller, “Latency in short pulse based optical interconnects,”

in The 14th Annual Meeting of the IEEE Lasers and Electro-Optics Society,

vol. 2, pp. 812–813, 2001.

[106] B. Wooley, “EE315 class notes.”

[107] M. Horowitz, “EE372 class notes.”

[108] H. Johansson and C. Svensson, “Time Resolution of NMOS Sampling Switches

Used on Low-Swing Signals,” IEEE Journal of Solid State Circuits, vol. 33,

pp. 237–245, Feb. 1998.

[109] G. Keeler, B. Nelson, D. Agarwal, and D. Miller, “Skew and jitter removal using

short optical pulses for optical interconnection,” IEEE Photonics Technology

Letters, vol. 12, pp. 1041–1135, June 2000.

[110] P. Restle, T. McNamara, D. Webber, P. Camporese, K. Eng, K. Jenkins,

D. Allen, M. John, M. Quaranta, D. Boerstler, C. Alpert, C. Carter, R. Bai-

ley, and J. Petrovick, “A clock distribution network for microprocessors,” IEEE

Journal of Solid-State Circuits, vol. 36, pp. 792–799, June 2000.

[111] X. Jiang and S. Horiguchi, “Optimization of wafer scale H-tree clock distribu-

tion network based on a new statistical skew model,” in IEEE International

Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 96–104, 2000.

[112] J.-H. Yeh, R. Kostuk, and Kun-Yii Tu, “Board level H-tree optical clock dis-

tribution with substrate mode holograms,” Journal of Lightwave Technology,

vol. 13, pp. 1566–1578, July 1995.

[113] H. Fair and D. Bailey, “Clocking design and analysis for a 600 MHz Alpha

microprocessor,” in IEEE Solid-State Circuits Conference, pp. 398–399, 1998.

[114] Y. Ismail, E. Friedman, and J. Neves, “Exploiting the on-chip inductance in

high-speed clock distribution networks,” IEEE Transactions on Very Large

Scale Integration (VLSI) Systems, vol. 9, pp. 963–973, Dec. 2001.

[115] G. Pratt and J. Nguyen, “Distributed synchronous clocking,” IEEE Transac-

tions on Parallel and Distributed Systems, pp. 316–330, Mar. 1995.

[116] P. Varma and K. Ramganesh, “Skewing clock to decide races - double-edge-

triggered flip-flop,” Electronics Letters, vol. 37, pp. 1506–1507, Dec. 2001.

[117] P. Zarkesh-Ha, T. Mule, and J. Meindl, “Characterization and modeling of clock

skew with process variations,” in Proceedings of the IEEE Custom Integrated

Circuits, pp. 441–444, May 1999.

[118] V. Mehrotra and D. Boning, “Technology scaling impact of variation on clock

skew and interconnect delay,” in Interconnect Technology Conference, pp. 122–

124, June 2001.

[119] C. Zhao and R. Chen, “Performance consideration of three-dimensional opto-

electronic interconnection for intra-multichip-module clock signal distribution,”

Applied Optics, pp. 2537–2544, Apr. 1997.

[120] P. Delfyett, D. Hartman, and S. Ahmad, “Optical clock distribution using a

mode-locked semiconductor laser diode system,” Journal of Lightwave Technol-

ogy, vol. 9, pp. 1646–1649, Dec. 1991.

[121] S. Kawanishi, Y. Yamabayashi, T. Takada, H. Takara, M. Saruwatari, and

K. Nakagawa, “2 Gb/s operation of an optical-clock-driven monolithically in-

tegrated GaAs D-flip-flop with metal-semiconductor-metal photodetectors for

high-speed synchronous circuits,” Photonics Technology Letters, vol. 4, pp. 160–

162, Feb. 1992.

[122] T. Woodward and A. Krishnamoorthy, “1-Gb/s integrated optical detectors and

receivers in commercial CMOS technologies,” IEEE Journal on Selected Topics

in Quantum Electronics, vol. 5, pp. 146–156, Mar. 1999.

[123] G. E. Stillman, V. M. Robbins, and N. Tabatabaie, “III-V compound

semiconductor devices: Optical detectors,” IEEE Transactions on Electron Devices,

vol. ED-31, pp. 1643–1655, 1984.

[124] R. Perry, “Analysis and characterization of the spectral response of CMOS

based integrated circuit (IC) photodetectors,” in Proceedings of the Thir-

teenth Biennial University/Government/Industry Microelectronics Symposium,

pp. 170–175, June 1999.

[125] S. Csutak, J. Schaub, W. Wu, and J. Campbell, “High-speed monolithically in-

tegrated silicon optical receiver fabricated in 130-nm CMOS technology,” IEEE

Photonics Technology Letters, vol. 14, pp. 516–518, Apr. 2002.

[126] T. Heide, A. Ghazi, H. Zimmermann, and P. Seegebrecht, “Monolithic CMOS

photoreceivers for short-range optical data communications,” Electronics Let-

ters, vol. 35, pp. 1655–1656, Sept. 1999.

[127] C. Rooman, D. Coppee, and M. Kuijk, “Asynchronous 250-Mb/s optical re-

ceivers with integrated detector in standard CMOS technology for optocoupler

applications,” IEEE Journal of Solid-State Circuits, vol. 35, pp. 953–958, July

[128] C. Schow, J. Schaub, R. Li, J. Qi, and J. Campbell, “A monolithically in-

tegrated 1-Gb/s silicon photoreceiver,” IEEE Photonics Technology Letters,

vol. 11, pp. 120–121, Jan. 1999.

[129] J.-F. Roux, J.-L. Coutaz, and S. Tedjini, “All-optical high-frequency charac-

terization of optical devices for optomicrowave applications,” IEEE Photonics

Technology Letters, vol. 12, pp. 1031–1033, Aug. 2000.

[130] C. Debaes, D. Agarwal, A. Bhatnagar, H. Thienpont, and D. Miller, “High-

impedance high-frequency silicon detector response for precise receiverless op-

tical clock injection,” in Proceedings of the SPIE, vol. 4654, pp. 78–88, 2002.

[131] S. Wagner and T. Chapuran, “Broadband high-density WDM transmission us-

ing superluminescent diodes,” Electronics Letters, vol. 26, pp. 696–697, May

[132] D. Sampson and W. Holloway, “100 mW spectrally-uniform broadband

ASE source for spectrum-sliced WDM systems,” Electronics Letters, vol. 30,

pp. 1611–1612, Oct. 1994.

[133] M. Nuss, W. Knox, and D. Miller, “Dense WDM with femtosecond laser pulses,”

in Lasers and Electro-Optics Society Annual Meeting, vol. 2, pp. 199–200, 1994.

[134] L. Boivin, M. Nuss, S. Cundiff, W. Knox, and J. Stark, “103-channel chirped-

pulse WDM transmitter,” in Conference on Optical Fiber Communication,

pp. 276–277, 1997.

[135] B. Collings, M. Mitchell, L. Boivin, and W. Knox, “A 1021 channel WDM

system,” IEEE Photonics Technology Letters, vol. 12, pp. 906–908, July 2000.

[136] B. Nelson, G. Keeler, D. Agarwal, N. Helman, and D. Miller, “Demonstration

of a wavelength division multiplexed chip-to-chip optical interconnect,” Con-

ference on Lasers and Electro-Optic Society, 2002.

[137] H. Shi, J. Finlay, G. Alphonse, J. Connolly, and P. Delfyett, “Multiwavelength

10-GHz picosecond pulse generation from a single-stripe semiconductor diode

laser,” IEEE Photonics Technology Letters, vol. 9, pp. 1439–1441, Nov. 1997.

[138] H. Thienpont, C. Debaes, V. Baukens, H. Ottevaere, P. Vynck, P. Tuteleers,

G. Verschaffelt, B. Volckaerts, A. Hermanne, M. Hanney, and I. Veretennicoff,

“Plastic Micro-Optical Interconnection Modules for Parallel Free-space inter-

and intra-MCM Data Communication,” in Proceedings of the IEEE, 2000.
