OPTICAL INTERCONNECTS TO SILICON CHIPS USING
SHORT PULSES
a dissertation
submitted to the department of electrical engineering
and the committee on graduate studies
of stanford university
in partial fulfillment of the requirements
for the degree of
doctor of philosophy
Diwakar Agarwal
September 2002
© Copyright 2002 by Diwakar Agarwal
All Rights Reserved
I certify that I have read this dissertation and that in
my opinion it is fully adequate, in scope and quality, as
a dissertation for the degree of Doctor of Philosophy.
David A. B. Miller (Principal Adviser)
I certify that I have read this dissertation and that in
my opinion it is fully adequate, in scope and quality, as
a dissertation for the degree of Doctor of Philosophy.
Joseph W. Goodman
I certify that I have read this dissertation and that in
my opinion it is fully adequate, in scope and quality, as
a dissertation for the degree of Doctor of Philosophy.
Mark A. Horowitz
Approved for the University Committee on Graduate
Studies:
Abstract
Processor speeds continue to increase rapidly due to the scaling of CMOS line-widths,
but electrical interconnect speeds have not grown at the same rate. The loss mecha-
nisms in electrical interconnects limit their ultimate capacity. Optical interconnects
have the potential to alleviate this interconnect bottleneck. At short scales such as
board-to-board, chip-to-chip, and on-chip, the important requirements for these opti-
cal interconnects are low latency, high throughput, high density, high bandwidth, and
simple integration with mainstream silicon technology. This thesis investigates optical
interconnects designed to meet these requirements using short pulses, in conjunction
with multiple quantum well (MQW) diodes flip-chip bonded to silicon CMOS chips.
The use of short optical pulses (100 fs to a few ps), equivalent to a return-to-zero
(RZ) format with very low duty cycle, has many potential advantages. We show
that using short pulses in optical links can (a) enhance the sensitivity of the receiver;
(b) remove skew and jitter from an array of transmitters (modulators); (c) deliver a
precise clock signal; (d) reduce the latency of the receiver; and (e) enable wavelength
division multiplexing. In particular, short pulses can enhance receiver sensitivity
by 3 dB or more, which improves the overall system power budget, and can reduce
the latency of transimpedance and integrating receivers by more than 60%, which
might make global on-chip optical interconnects feasible.
The latency can be even further reduced by using a totem-pole diode pair without
amplification at the expense of optical power. All these benefits are investigated
through simulations and a series of experimental demonstrations.
Acknowledgments
There are a lot of people who have made it possible for me to be at this stage, and in
the process helped me in my academic and personal growth. I would like to express
my sincere thanks to all of them.
First, thanks to Dr. David Miller for his constant guidance. During the course
of my research he has always been very encouraging. I have learned a great deal
from him, but one thing stands out in my mind. He has always said that if a
problem is getting difficult to figure out, go back to the basics. It is amazing how
often we forget to do this, even though it is such common sense.
I would like to thank Dr. Joseph Goodman for getting me interested in the area of
optics when I came to Stanford. He has been providing valuable advice and guidance
whenever required. I would also like to thank Dr. Mark Horowitz for reading my
thesis and giving me excellent critiques. Dr. Horowitz has patiently listened to my
ideas and given helpful suggestions over the course of my research. Access to his
hardware lab was also very helpful in testing my chips. Thanks are also due to Dr.
Fabian Pease for serving on my examination committee.
The optical interconnect project required the collaboration of several people to
make it successful. I would like to thank Gordon and Noah for their tireless work on
processing and flip-chip bonding of modulators, and for the squash games that took
the frustration out. Bianca designed the baseplates for the optical setups. Without
these components, this work would not have been possible. Christof Debaes of Vrije
Universiteit Brussel worked very closely with me on the design of the chip fabricated
through National Semiconductor. It was a lot of fun and a learning experience working
with him. Optical testing was a joint effort with all the students mentioned above.
Vijit, Ryo, and Helen worked on the initial development of the flip-chip bonding
process. Aparna and Ray have been very helpful by asking detailed questions during
their circuit design learning process. Micah has been a constant source of inspiration
for finishing my work. Volkan and Helen have been a sounding board for my ideas
and complaints. Coffee breaks with Martina, joined by Christof whenever he was visiting,
provided relief from the “hard routine”. Late night chats
with Sameer were quite refreshing and tennis with Henry was a lot of fun. I am also
grateful for the administrative support of Ingrid Tarien. I want to thank everybody
in the Miller group for making my experience enjoyable as well as valuable.
Ted Woodward and Ashok Krishnamoorthy kindly educated me in optical interconnects
and receiver design during my summer internship at Bell Laboratories. Bill
Ellersick, Stefanos Sidiropoulos, Ken Yang, Amrutur Bharadwaj, and Evelina Yeung
in Dr. Horowitz’s group have helped by answering questions and providing circuit
design help. Gibong Jeong, Jane Lam, and Edmund Lam, former students of Dr.
Goodman, have also been very helpful. I would also like to thank National
Semiconductor for the fabrication of the CMOS chip, and JSEP, DARPA, and MARCO for
funding the research.
Friends outside the laboratory have given me company in many sports and fun
activities, which I enjoyed very much. My sisters, Anamika and Swarnima, reminded
me at regular intervals of possible life after Ph.D. I would like to thank my wife Kokila
for motivating me to finish my thesis and making it a worthwhile experience. She has
tried to read the thesis from an architect’s point of view and reminded me to make
it look more artistic. Thanks to her, my thesis has fewer mistakes. Any remaining
mistakes are solely my fault. Finally, I would like to thank my parents who have
always given me their unconditional support and without whom I would never have
made it here in the first place.
Contents
Abstract
Acknowledgments
1 Introduction
1.1 Potential advantages of optics
1.1.1 Limitations of electrical interconnects
1.1.2 Other advantages of optics
1.2 Components of an optical interconnect
1.2.1 Optoelectronic devices
1.2.2 Receivers
1.2.3 Free space optical interconnects
1.3 Challenges in current optical communication
1.4 Organization
2 Short Pulses in Interconnects
2.1 Improved receiver performance
2.2 Low latency in receivers
2.3 Better synchronization
2.4 Wavelength division multiplexing (WDM)
3 Optical Interconnect Setup and Components
3.1 Optical test bench
3.2 MQW diodes
3.3 Silicon chips
3.3.1 Modulator driver
3.3.2 Pseudo random bit sequence (PRBS) generator and tester
3.3.3 Samplers
3.4 Hybrid integration of GaAs devices
3.5 Summary
4 Receivers
4.1 Transimpedance receiver
4.2 Integrating receiver
4.3 Totem-pole diode pair receiver
4.4 Fabrication and testing
4.4.1 Transimpedance receiver
4.4.2 Integrating receiver
4.4.3 Measurement with supply noise
4.5 Summary
5 Latency in Interconnects
5.1 Transimpedance receivers
5.1.1 Modeling of latency
5.1.2 Measurement of latency
5.2 Integrating Receiver
5.3 Totem-pole diode receiver
5.4 Scaling of latency with technology
5.5 Summary
6 Timing in Silicon Chips
6.1 Jitter and skew removal
6.2 Optical clock injection
6.2.1 Silicon detectors
6.2.2 Frequency response of silicon detectors
6.2.3 Receiverless clock injection
6.3 Summary
7 Wavelength Division Multiplexing System
7.1 Concept of WDM with short pulses
7.2 System implementation
7.2.1 Optical setup
7.2.2 Measurement results
7.3 Summary
8 Conclusions
Bibliography
List of Tables
5.1 Receiver latency with NRZ and short pulse inputs. Optical energy per bit for the transimpedance and integrating receivers is ∼50 fJ, and for the recless receiver it is 450 fJ.
6.1 The dimensions and the capacitances of the silicon detectors implemented in this work. Two n-well detectors and two interdigitated detectors of different sizes were chosen.
List of Figures
1.1 Interconnects at different levels
1.2 MQW modulator operation
1.3 Wavelength vs. contrast ratio curve for MQW modulator for different voltage swings.
1.4 Schematic demonstration of NRZ and RZ coding
2.1 A pulse train and its spectrum
2.2 Short pulse properties
2.3 Sensitivity enhancement in transimpedance receiver with short pulses
2.4 Timing diagram of the integrating receiver with short pulse and NRZ inputs. Energy incident during the evaluation phase is not integrated.
2.5 Skew removal from multiple parallel channels using short pulses. The three waveforms are electrical drive signals and they are read by a short pulse which samples all the channels at the same time.
2.6 Spectral slicing of short pulse spectrum for WDM
3.1 Schematic diagram of an optical interconnect system
3.2 Optomechanical setup for testing
3.3 Schematic and the picture of totem-pole connected diodes
3.4 Layout of the chip fabricated in the 0.5 µm HP process
3.5 Layout of the chip fabricated in the 0.25 µm National Semiconductor process
3.6 Eye diagram of modulator driver operation at 800 Mb/s obtained by optical readout of the modulator.
3.7 Schematic of an LFSR generating a pseudo random sequence of length $2^7 - 1$, where a square corresponds to a D flip-flop.
3.8 Schematic of the circuit to verify the sequence generated by the LFSR shown earlier.
3.9 The circuit schematic of the on-chip sampler in 0.25 µm CMOS technology. All transistors are minimum length. (Yeung et al. [61])
3.10 Integration of GaAs devices on silicon chips
3.11 Picture of a CMOS chip with flip-chip bonded diodes
4.1 Transimpedance receiver structure
4.2 Schematic of the transimpedance frontend and the small-signal equivalent circuit of its implementation.
4.3 Pulse and step response of the transimpedance stage
4.4 Pulse and step response of the transimpedance stage with varying feedback resistance.
4.5 Pulse response of the transimpedance stage with varying feedback resistances normalized to the maximum of step response.
4.6 Pulse and step response of the transimpedance stage with varying front-end capacitance.
4.7 Pulse response of the transimpedance stage with varying pulse width
4.8 Schematic of the integrating receiver frontend
4.9 Timing diagram of the operation of integrating receiver with NRZ and short pulse inputs.
4.10 Input data arrival-tolerance margins illustrated for NRZ and short pulse inputs.
4.11 Totem-pole diode pair connected to a high impedance input node of inverter.
4.12 Schematic of the transimpedance receiver. Transistor widths mentioned here are in λ, where λ = 0.2 µm for the technology used. All transistors are minimum length.
4.13 SPICE simulation of the transimpedance receiver with 10 µA average photocurrent. Voltage at node out is shown. Top curve is for 1 Gbps operation of the receiver with 260 fF of diode capacitance. Bottom curve shows the operation at 1.5 Gbps with 100 fF of diode capacitance.
4.14 Eye diagram of the transimpedance receiver operation with NRZ input at 600 Mb/s. 26 µA average photocurrent is injected in each beam.
4.15 Eye diagram of the transimpedance receiver output voltage with short pulse input at 80 Mb/s.
4.16 Schematic of the integrating receiver fabricated in the 0.5 µm technology. Transistor widths are shown in λ, where λ is 0.35 µm. All transistors are minimum length.
4.17 Operation of the integrating receiver with optical readout at 600 Mb/s.
4.18 Sensitivity comparison for NRZ and short pulse data for integrating receiver operating at 400 Mbps in a chip-to-chip link.
4.19 Transimpedance receiver delay variation as a function of supply voltage. This measurement was done via the pump-probe technique. The nominal supply voltage was 2.5 V.
4.20 Bit error rate curves of integrating receiver operation in a link at 100 Mbps with NRZ data. Sinusoidal noise was injected in the supply with different peak-to-peak values at 1 kHz.
5.1 ITRS projection of on-chip electrical interconnect delays with technology scaling [42]
5.2 Components of latency in a modulator-based interconnect system
5.3 Mechanism of latency reduction in a transimpedance receiver with short pulse input.
5.4 First order model of a transimpedance receiver with variable length post-amplifier chain
5.5 Pulse energy vs. delay for short pulse and NRZ input for the first-order model. Corresponding SPICE simulations are denoted with “x”.
5.6 Variation of delay vs. number of post-amplifier stages for different total gain, assuming a constant gain-bandwidth product for all stages.
5.7 Number of post-amplifier stages vs. delay for different pulse energy
5.8 Pulse energy vs. receiver delay for 2 and 3 post-amplifier stages
5.9 Pump-probe setup for transceiver latency measurement
5.10 Receiver transmitter module used for testing latency via pump-probe method. The numbers mentioned here are the sizes of PMOS and NMOS transistors in λ, where λ = 0.2 µm.
5.11 Comparison of the latency of the transimpedance receiver-transmitter module with short pulse and NRZ inputs.
5.12 Circuit schematic of the integrating receiver frontend
5.13 Latency with respect to clock in the integrating receiver with NRZ and short pulse inputs.
5.14 Latency of the entire integrating receiver, including the SR latch, with short pulse input computed by using SPICE circuit simulator.
5.15 Schematic of the totem-pole diode pair receiver connected to the high impedance input of the inverter buffer.
5.16 Voltage vs. time at node “in” of the recless receiver for NRZ and short pulse inputs with minimum optical energy to swing the node by supply voltage.
5.17 Comparing the delay of the transimpedance receiver with short pulse data for 0.25 µm and 0.5 µm technologies by normalizing to FO4 delay in respective technologies.
5.18 FO4 gate delay scaling with technology [107]
6.1 Transmitted signals from two channels read out with a cw laser. Channels are skewed by 3/8 of a bit period.
6.2 Skew removal by short pulse readout of two modulator channels skewed by 3/8 of a bit period. Ones and zeros are alternately read by these pulses.
6.3 Jitter removal from a single interconnect channel. Upper trace is the electrical drive signal with jitter and the bottom trace is the optical readout of the receiver.
6.4 A cross-sectional view of two silicon detector topologies
6.5 The sampled signal trace showing the response of the first interdigitated detector to an optical short pulse. The optical energy in the pulse was 0.74 pJ.
6.6 The frequency behavior of the various silicon detectors. The response of the second interdigitated detector was not included for clarity. The curves have been normalized with respect to their first frequency component for comparison reasons.
6.7 Schematic of receiverless optical clock injection with optical short pulses using a totem-pole diode pair. The inverter provides very little capacitive loading, though it can be eliminated and clock can be injected directly at the desired node.
6.8 Equivalent circuit of the totem-pole pair implementation with interdigitated diodes. Due to substrate connection this device was self-resetting.
6.9 Receiverless optical clock injection with optical short pulses of 6 pJ onto the totem-pole configuration of interdigitated detectors.
6.10 Histogram of the pulse signals crossing at marker level at half their swing. The histograms correspond to two experiments one of which is delayed 10 ps more compared to the reference clock.
7.1 An exaggerated view of the frequency comb incident on the modulator array. Frequency components of the 80 MHz pulse train are separated in space by a blazed grating.
7.2 Schematic of the WDM system implementation
7.3 First generation optical setup using Spindler and Hoyer components. The portion on the transmitter side is visible.
7.4 Second generation WDM link optical setup. A closeup of the receiver side is shown in the picture.
7.5 CCD scan of the wavelength of the modulated transmitter output. Solid and dashed lines represent two snapshots at different times. The corresponding modulators are shown below the wavelength scan.
7.6 80 Mb/s operation of a single channel in a WDM link
Chapter 1
Introduction
Modern computer processors run at clock speeds of several GHz, but the processor-to-memory
interface runs at only a few hundred MHz. A key reason for this difference,
and a problem for computing in general, is that interface connection speeds have
not kept up with the increase in processor speeds. This is mainly because
of the design issues of electrical buses and their underlying physical properties. Due
to the capacity limitations of electrical wires, all long distance communication is
now done via optics. For medium distance communication, e.g., LANs, MANs, and WANs
(about 300 m to 100 km), optics is making inroads specifically because only optics can
support the high data rates required by these applications. At shorter distances (a
few meters to a few hundred meters), primarily in data links, optics is rapidly gaining
entry. Even at distances shorter than a few meters, research is underway to use optics
for communication purposes.
A categorization of optical links is shown in Fig. 1.1. Short distance communica-
tion can be divided into the following categories: machine-to-machine (a few meters
to 100s of meters), inter-shelf or possibly on large boards or backplanes (a few cm to
a few meters), chip-to-chip (a few cm), and on-chip (up to a few cm). A few
products using optics for machine-to-machine communication are on the market, but the
other categories are still at the research stage. The practicality and feasibility of
chip-to-chip and on-chip communication using optics remain open questions.
Optical interconnects to chips still face many technical challenges.
[Figure: interconnect categories from inter-chip, chip-to-chip, inter-shelf and racks/chassis to LAN/WAN and long-haul, arranged along a distance axis from 1 mm to 1000 km and annotated with 2-D free-space parallel interconnects, single- and multimode fiber, coarse WDM and TDM, single-mode fiber, and dense WDM and TDM.]
Figure 1.1: Interconnects at different levels
Optics might need to provide very dense interconnects, probably thousands of interconnects per chip.
Small and efficient optoelectronic components are needed to satisfy this requirement.
Even though very sophisticated optics and components are available for long distance
communication, new technology is needed for connection to chips because the
requirements are very different. Low latency, low noise, low power dissipation, and
the ability to coexist with mainstream silicon technology are required for dense inter-
connects. This dissertation investigates the role of return-to-zero signaling in meeting
these requirements.
We will first discuss the potential benefits of optics in interconnects in Section 1.1.
Then in Section 1.2 we will briefly introduce the devices, technologies, and components
required for optical interconnects. Challenges for optical interconnect and the focus
of this work will be discussed in Section 1.3. We will finally conclude in Section 1.4
by giving an overview of all the chapters.
1.1 Potential advantages of optics
Optical interconnects to chips have been studied for a long time, beginning with a
seminal paper by Goodman et al. [1]. Since then many authors have addressed
the benefits and limits of optical interconnects ([2] [3] [4] [5] [6] [7] [8] [9] [10] [11]) and
have analyzed optics vs. electronics ([3] [12] [13] [14]). Below we summarize the potential
benefits of optical interconnects that emerge from these references.
Optics has a very high frequency carrier (hundreds of THz), a very short wavelength
(∼1 µm), and a large photon energy. The very high optical carrier frequency eliminates
frequency-dependent loss in the modulation band and makes short pulse communication
feasible. The short wavelength allows imaging with a single lens, low loss
in waveguides, impedance matching with very low overhead, and wavelength division
multiplexing (WDM). Voltage isolation is a result of the large photon energy of
optics [6].
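As a rough check on the carrier-frequency argument, the short sketch below computes the optical carrier frequency $f = c/\lambda$ and the fraction of it occupied by a multi-gigahertz signal. The 1 µm wavelength and 10 GHz signal bandwidth are illustrative assumptions, and the function names are hypothetical helpers rather than part of any tool used in this work.

```python
"""Optical carrier frequency vs. modulation bandwidth (illustrative values only)."""

C = 299_792_458.0  # speed of light in vacuum, m/s


def carrier_frequency_hz(wavelength_m: float) -> float:
    """Optical carrier frequency f = c / lambda."""
    return C / wavelength_m


def fractional_bandwidth(signal_bw_hz: float, wavelength_m: float) -> float:
    """Signal bandwidth expressed as a fraction of the optical carrier frequency."""
    return signal_bw_hz / carrier_frequency_hz(wavelength_m)


if __name__ == "__main__":
    wavelength = 1e-6  # assumed 1 um wavelength
    signal_bw = 10e9   # assumed 10 GHz of signal bandwidth
    print(f"carrier frequency: {carrier_frequency_hz(wavelength) / 1e12:.0f} THz")
    print(f"fractional bandwidth: {fractional_bandwidth(signal_bw, wavelength):.1e}")
```

Under these assumptions the carrier is near 300 THz and a 10 GHz signal occupies only a few parts in 10⁵ of it, which is why the propagation medium looks essentially flat across the modulation band.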
1.1.1 Limitations of electrical interconnects
It is important to understand the limitations and issues of electrical interconnects to
realize the benefits of optics.
i. Aspect ratio limit. As observed by Miller and Ozaktas [4], in digital electrical
interconnects, the total number of bits per second is limited by the “architecture”
of the interconnect, i.e., its length and cross-sectional area. For resistive-capacitive
(RC) lines, the limit to the total number of bits per second, $B$, depends
only on the “aspect ratio” of the line, defined as the ratio of the length
$l$ of the interconnect to the square root of its total cross-sectional area, $\sqrt{A}$.¹
$B$ depends to some degree on the design of the electrical lines.
Roughly speaking, $B \sim B_0 A/l^2 = B_0 (l/\sqrt{A})^{-2}$, with $B_0 \sim 10^{16}$ bits/s
for unequalized lines. For inductive-capacitive (LC) lines, the formula for bit-rate
capacity is the same as for RC lines, though the factor $B_0$ is smaller, about
$10^{15}$ bits/s, limited by the skin effect.
Equalization [15], multilevel modulation and the use of repeaters can increase the
total number of bits per second. These schemes, however, add complexity to the
system, which will limit the density of interconnects. The added complexity may
also increase the latency of interconnects. In comparison, optics does not suffer
from this limit because the mechanisms of loss and signal distortion are different.
¹ This meaning of “aspect ratio”, which might better be called the “architectural aspect
ratio”, differs from the common use of the term for the ratio of height to width of
metal connections on chips.
ii. Frequency dependent loss and equalization. The loss profile of electrical wires
has a significant frequency dependence over the entire frequency band of interest
for high speed communication. In baseband communication, commonly used on
electrical wires, a frequency response from DC to the signal bandwidth is required
when no coding is used. The response of electrical wires is not flat over multiple
decades in the frequency domain and requires equalization to compensate for
large loss at high frequencies. In optics a very high frequency carrier is used
and the signal modulation of multiple gigahertz is a small fraction of the carrier
frequency. The response of the medium is quite flat over this frequency range,
requiring no equalization, thus simplifying the system design.
iii. Signal integrity. Driving the large pad capacitance of off-chip interconnects can
generate electrical noise on the supply, affecting the signals on the chip. Electrical
chips have signal pads with a capacitance of about 1-2 pF. Large drivers are
required to drive this capacitance. Switching the voltage on these capacitors
generates large current transients, which act as noise sources for the circuits
on the chip. This noise can corrupt signals. Optical devices, if miniaturized and
integrated on the chip, can have much lower capacitances requiring lower currents
to drive them. Consequently, less electrical noise may be injected into the chip.
iv. Distance dependent loss. Loss is very significant in electrical wires at high fre-
quencies because of the skin effect. The design of the interconnects needs to be
customized for different lengths to account for these losses. Also, the frequency
response of the cable changes with the change in length, and a redesign of equal-
ization is required if it is used. In comparison, the losses in transmission of light
are very small. An optical interconnect designed for a few meters can easily be
used for kilometers. To give an idea of the numbers involved, in a 12 m RG-55U
cable the loss at 2 GHz is ∼ 10 dB [16], while in a 1000 m long single-mode optical
fiber at 42 THz carrier frequency the loss is a mere 3 dB. The propagation loss
in free space is also low for optics.
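The scaling argument in item i and the figures in items iii and iv can be made concrete with a few lines of arithmetic. The sketch below assumes the order-of-magnitude constants quoted in item i ($B_0 \sim 10^{16}$ bits/s for unequalized RC lines and $\sim 10^{15}$ bits/s for LC lines) and the cable and fiber losses quoted in item iv; the wire geometry, the 2.5 V swing, and the 100 ps edge time are hypothetical example values, and the helper names are illustrative only.

```python
"""Order-of-magnitude checks for electrical interconnect limits.

B0 values and link losses follow the text; geometry and edge rates are hypothetical.
"""

B0_RC = 1e16  # bits/s, unequalized RC lines (order of magnitude)
B0_LC = 1e15  # bits/s, LC lines limited by the skin effect (order of magnitude)


def aspect_ratio_limit(area_m2: float, length_m: float, b0: float = B0_RC) -> float:
    """Total bit rate B ~ b0 * A / l**2, i.e. b0 / (l / sqrt(A))**2."""
    return b0 * area_m2 / length_m ** 2


def loss_db_per_m(total_loss_db: float, length_m: float) -> float:
    """Average attenuation per meter of a link."""
    return total_loss_db / length_m


def pad_current_a(capacitance_f: float, swing_v: float, edge_time_s: float) -> float:
    """Average current I = C * dV/dt needed to slew a pad during one edge."""
    return capacitance_f * swing_v / edge_time_s


if __name__ == "__main__":
    # Hypothetical wire bundle: 1 cm long, 10 um x 10 um total cross-section.
    area, length = (10e-6) ** 2, 1e-2
    print(f"RC limit: {aspect_ratio_limit(area, length):.1e} bit/s")
    # Scaling the length and both cross-section dimensions by 10x leaves B unchanged.
    print(f"10x scaled: {aspect_ratio_limit(area * 100, length * 10):.1e} bit/s")
    print(f"LC limit: {aspect_ratio_limit(area, length, B0_LC):.1e} bit/s")

    coax = loss_db_per_m(10.0, 12.0)    # 12 m RG-55U cable at 2 GHz, ~10 dB
    fiber = loss_db_per_m(3.0, 1000.0)  # 1000 m single-mode fiber, ~3 dB
    print(f"coax {coax:.3f} dB/m, fiber {fiber:.3f} dB/m, ratio ~{coax / fiber:.0f}x")

    # 2 pF pad (upper end of the quoted 1-2 pF) swung 2.5 V in an assumed 100 ps edge.
    print(f"pad current: {pad_current_a(2e-12, 2.5, 100e-12) * 1e3:.0f} mA")
```

The scaled case illustrates that only the ratio $l/\sqrt{A}$ matters, and the loss comparison shows the cable attenuating nearly 300 times more per meter than the fiber; the pad estimate gives tens of milliamperes of transient current per output, which is the noise source described in item iii.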
1.1.2 Other advantages of optics
i. Density of interconnects. Electronics can provide very high densities of intercon-
nects at the on-chip level. Electrical chips can have a large number of pins to
increase the density of inter-chip interconnects, as in a ball grid array (BGA),
though many pins must be dedicated to supply and ground for reliable high-speed
interconnection. For off-chip or board-to-board interconnects, optics can
offer very large densities. Optical devices can be made very small, and thousands of
inputs/outputs (I/O) can be achieved on a chip. An experimental chip with 4000
optical I/O in a 7 × 7 mm² area has been demonstrated [17]. Optical interconnects
can utilize the third dimension by being able to cross the beams. In free space,
a few optical elements can handle a large number of beams easily, retaining very
high interconnect densities.
ii. Impedance matching. Most electrical lines are designed for 50 Ω impedance, which
requires a 50 Ω termination to avoid reflections. Significant power is dissipated in
this termination. In optics, a quarter-wavelength-thick index matching layer
(anti-reflection coating) can match the impedance of two dissimilar materials to
remove reflections. This is equivalent to the termination in electrical lines; in
optics, though, there is virtually no power dissipation in this index matching
material. In optics, a beam splitter can be used to tap the optical signal for
monitoring, with small or negligible reflections. A similar tap in electrical lines
needs to be very well designed to minimize the impedance discontinuities, and
hence to reduce the reflections.
iii. Voltage isolation. Optical communication is accomplished by sending photons be-
tween two physically separate transmitting and receiving nodes. The voltages on
the two sides need not be related to each other and can be completely electrically
isolated. This provides noise immunity from one side to the other. With scaling
in electronic chips, supply currents are increasing and so are resistive drops in
DC supply and ground bounce effects. Hence this voltage isolation property of
optics may become progressively more important for future generations.
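Two of the points in this list are easy to quantify. The sketch below computes the I/O density of the 4000-I/O, 7 × 7 mm² demonstration chip cited in item i, and, for item ii, the ideal single-layer quarter-wave anti-reflection coating (index $\sqrt{n_1 n_2}$, physical thickness $\lambda/(4n)$) alongside the power dissipated in a resistive 50 Ω termination. The GaAs index of 3.5, the 850 nm wavelength, and the 0.5 V rms signal are assumed illustrative values, not parameters taken from this work, and the helper names are hypothetical.

```python
"""Illustrative numbers for I/O density, AR coatings, and 50-ohm termination."""
import math


def io_density_per_mm2(n_io: int, area_mm2: float) -> float:
    """Optical I/O per square millimeter of chip area."""
    return n_io / area_mm2


def fresnel_reflectance(n1: float, n2: float) -> float:
    """Normal-incidence power reflectance of a bare n1/n2 interface."""
    return ((n1 - n2) / (n1 + n2)) ** 2


def quarter_wave_coating(n1: float, n2: float, wavelength_m: float) -> tuple[float, float]:
    """Ideal single-layer AR coating: index sqrt(n1*n2), thickness lambda/(4*n)."""
    n_coat = math.sqrt(n1 * n2)
    return n_coat, wavelength_m / (4.0 * n_coat)


def termination_power_w(v_rms: float, r_ohm: float = 50.0) -> float:
    """Power dissipated in a resistive line termination, P = V^2 / R."""
    return v_rms ** 2 / r_ohm


if __name__ == "__main__":
    print(f"I/O density: {io_density_per_mm2(4000, 7 * 7):.0f} per mm^2")
    n_air, n_gaas, wl = 1.0, 3.5, 850e-9  # assumed indices and wavelength
    print(f"bare GaAs/air reflectance: {fresnel_reflectance(n_air, n_gaas):.0%}")
    n_c, t = quarter_wave_coating(n_air, n_gaas, wl)
    print(f"AR coating: n = {n_c:.2f}, thickness = {t * 1e9:.0f} nm")
    print(f"50-ohm termination at 0.5 V rms: {termination_power_w(0.5) * 1e3:.0f} mW")
```

Under these assumptions a single layer of index about 1.87 and thickness about 114 nm cancels the roughly 31% Fresnel reflection of a bare GaAs surface at the design wavelength without dissipating power, whereas the electrical termination dissipates a few milliwatts per line.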
1.2 Components of an optical interconnect
So, what does an optical interconnect consist of? At the physical layer, an optical link
has three main components: a transmitter, the transmission medium, and a receiver.
In digital circuits, binary data in the form of voltage levels (whose value depends on
the technology) needs to be transmitted. Data in the form of these voltage levels
is fed to a transmitter driver, which converts these levels into the voltage or current
signal required to drive the optical transmitter device. The optical transmitter device
converts these electrical signals into the modulation of light beams, which then travel
through some propagation medium to the destination. The photodiode on the receiver
side converts the optical signal into a current, which is then converted into a logic level
by the receiver.
We will first consider different optoelectronic devices, then look at issues in receiver
circuits, and finally we will discuss free space optical interconnects.
1.2.1 Optoelectronic devices
Vertical cavity surface emitting lasers and quantum well modulators are the leading
contenders as output devices for dense optical interconnects. Lasers are current mode
devices while modulators are voltage mode devices. Quantum well modulators can
also be used as photodiodes.
We will first look into these devices and then look at optoelectronic devices in
silicon since it might be beneficial to have optical devices in mainstream silicon tech-
nology from the point of view of cost.
Vertical cavity surface emitting lasers (VCSEL)
VCSELs are a strong candidate as a transmitter device as they have improved signif-
icantly in the last few years. Oxide confined VCSELs can achieve very low threshold
currents [18]. Sub-mA threshold currents are now easily achieved in VCSELs. Re-
cently, optical interconnects with arrays of VCSELs have been demonstrated [19] [20]
[21] [22].
There are many issues that still need to be addressed for using large VCSEL arrays
in optical interconnects [23].
i. Uniformity of threshold current. The threshold currents of lasers need to be
uniform to have uniform behavior of VCSELs across the entire array. If the array
is non-uniform, individual lasers would need to be monitored and controlled,
which will make the entire design complex.
ii. Thermal issues. To avoid the turn-on delay of the laser, it is typically biased
near the threshold and driven well above the threshold when on. In a large array,
in particular, the resulting high current densities can heat up the lasers, changing
their properties.
iii. Wavelength stability. The wavelength of these lasers drifts (with temperature and
aging), and is difficult to precisely specify in manufacturing. Many components
in the interconnect can be wavelength-sensitive, and it is important to maintain
the wavelength. Control of the wavelength is even more important in wavelength
division multiplexing.
Multiple quantum well (MQW) modulators
MQW modulators are p-i-n diodes with quantum wells in the i region. The structure
of these modulators and their operation is shown in Fig. 1.2. By applying a voltage
across MQW diodes, the wavelength of the absorption peak can be shifted. This
effect is called the quantum-confined Stark effect (QCSE) [24] [25]. If the modulator
is operated at a single wavelength, varying the voltage across the device changes
the absorption for that wavelength. GaAs-based MQW diodes show a strong QCSE
shift around a wavelength of 850 nm. The ratio of the reflected light intensity in the
low-absorption state to that in the high-absorption state is defined as the contrast
ratio (CR). With a reflecting surface at the bottom of the modulator, the light makes
two passes through the device, increasing its contrast ratio. The typical contrast ratio
for this reflection modulator is 2:1 for a 3 V swing, as shown in Fig. 1.3.
This contrast ratio is limited, but it can be enhanced by using Fabry-Perot
effects [26]. Moreover, modulators can be used in a differential fashion to double
the swing. The voltage swings available to modulators diminish as CMOS line-widths
decrease. Research is in progress to develop new modulators that can operate
with low voltage swings. Details of the physical operation of modulators and their
properties are dealt with in Refs. [27] [28] [29]. These devices are used as transmitters
in the work described in this dissertation.

Figure 1.2: MQW modulator operation (indium bump, p-contact, quantum wells, n-contact, applied voltage V)
Figure 1.3: Wavelength vs. contrast ratio curve for MQW modulator for different voltage swings.
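As a rough illustration of why the reflecting back surface helps, the sketch below applies the Beer-Lambert law to the quantum-well region of a modulator. The absorption values `ALPHA_L_LOW` and `ALPHA_L_HIGH` are hypothetical numbers chosen so that the double-pass contrast ratio comes out near the 2:1 quoted above; the model ignores surface reflections, mirror loss, and Fabry-Perot effects.

```python
import math

# Hypothetical absorption-length products (alpha * L) of the quantum-well region
# in the low- and high-absorption states at the operating wavelength.
ALPHA_L_LOW = 0.15
ALPHA_L_HIGH = 0.50


def transmission(alpha_l: float, passes: int) -> float:
    """Beer-Lambert power transmission through the absorbing region `passes` times."""
    return math.exp(-passes * alpha_l)


def contrast_ratio(alpha_l_low: float, alpha_l_high: float, passes: int) -> float:
    """Output intensity in the low-absorption state divided by that in the high-absorption state."""
    return transmission(alpha_l_low, passes) / transmission(alpha_l_high, passes)


cr_single = contrast_ratio(ALPHA_L_LOW, ALPHA_L_HIGH, passes=1)  # transmission modulator
cr_double = contrast_ratio(ALPHA_L_LOW, ALPHA_L_HIGH, passes=2)  # reflection modulator
print(f"single pass CR = {cr_single:.2f}, double pass CR = {cr_double:.2f}")
```

In this simple model, making two passes squares the single-pass contrast ratio, which is why the reflection geometry is preferred.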
MQW modulators are well suited for integration in large arrays. There are many
advantages in using these large arrays of modulators:
i. High yields have been demonstrated. Yields greater than 99.99% for diodes in large arrays
are reported in Ref. [30]. With this kind of yield, large arrays of
interconnects can be fabricated. High IO count optically-interconnected systems
have been demonstrated because of these yields [17] [31] [32] [33].
ii. Single off-chip laser source. The laser for a modulator system can be placed
away from the chip, removing the source of heat generation from the proximity
of modulators. Also, by using a single source to operate all modulators, it is
relatively easy to synchronize the whole system.
iii. Modulator or a photodiode. The diode structure described here can be used both
as a modulator and a detector depending on the circuit to which it is connected.
Being able to use this device as an input or an output device simplifies the
fabrication.
iv. Operation with short pulses. Another big advantage of modulators is that they
can be used to modulate short optical pulses (100 fs to 1 ps) [29]. Short pulses
have many benefits in optical interconnects, as discussed in the next chapter.
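To see why a per-diode yield above 99.99% matters for item i, the following sketch estimates the expected number of failed diodes and the probability that an entire array works. It assumes defects are independent and uniformly random, which is a simplification, and the array size of 4000 diodes is hypothetical (of the same order as the 4000-IO chip mentioned earlier).

```python
def expected_failures(per_diode_yield: float, n_diodes: int) -> float:
    """Mean number of non-working diodes, assuming independent random defects."""
    return (1.0 - per_diode_yield) * n_diodes


def prob_all_working(per_diode_yield: float, n_diodes: int) -> float:
    """Probability that every diode in the array works (independent defects)."""
    return per_diode_yield ** n_diodes


N_DIODES = 4000  # hypothetical array size
for y in (0.99, 0.999, 0.9999):
    print(f"yield {y}: expected failures = {expected_failures(y, N_DIODES):.1f}, "
          f"P(all working) = {prob_all_working(y, N_DIODES):.3g}")
```

At 99% yield a fully working 4000-diode array is essentially impossible, whereas at 99.99% about two arrays in three have no failed diode at all.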
Modulators also have their share of problems. The QCSE is temperature de-
pendent; variation in temperature can move the exciton peak and severely reduce the
contrast ratio. Bringing external beams onto modulators can be a disadvantage because
more optics are required to handle the incident and reflected beams.
Various optical transmitter technologies are compared in the literature [34] [35] [36].
1.2.2 Receivers
A receiver consists of a photodiode to convert an optical signal into an electrical
current and circuitry to convert the current into a full logic swing. Circuits can
be fabricated, for example, in silicon or in GaAs. Silicon foundries are very well
established, and even though silicon circuits are slower than GaAs circuits, very
high circuit densities can be achieved, making silicon a preferred technology.
For best receiver performance, the capacitance of the photodiode and the receiver
circuit should be as low as possible. A small capacitance design requires fewer gain
stages and has better noise immunity. To reduce the capacitance, monolithic detectors
can be made in silicon, but silicon has a large absorption depth at wavelengths
near 850 nm, much deeper than junction depths in silicon CMOS. Most photons are
absorbed deep inside the substrate, so the generated carriers reach the surface
over a long period; this also makes the photodetectors inefficient. Effectively, every bit
generates a long electrical response tail. There are ways in which faster responses can
be generated, but they reduce the responsivity. Metal-semiconductor-metal (MSM)
photodiodes can also be potentially used to reduce the capacitance, though these
cannot be made in a standard CMOS process.
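The statement that most photons are absorbed far below the junction follows from the Beer-Lambert law. The sketch below assumes an absorption length of roughly 18.7 µm for silicon near 850 nm (an approximate value from the general literature, not a number from this work) and uses hypothetical depths; it ignores surface reflection.

```python
import math

# Approximate absorption length (1/alpha) of crystalline silicon near 850 nm.
# Assumed literature-order value, not a number from this dissertation.
SI_ABSORPTION_LENGTH_UM = 18.7


def fraction_absorbed(depth_um: float,
                      absorption_length_um: float = SI_ABSORPTION_LENGTH_UM) -> float:
    """Fraction of the light entering the silicon that is absorbed within depth_um."""
    return 1.0 - math.exp(-depth_um / absorption_length_um)


for depth in (0.2, 2.0, 20.0):  # hypothetical depths in micrometres
    print(f"within {depth:5.1f} um: {fraction_absorbed(depth):.1%} absorbed")
```

Under these assumptions only about a tenth of the light is absorbed in the top 2 µm, so most photogenerated carriers start far from a shallow CMOS junction.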
Another approach is to use GaAs detectors. GaAs is a good absorber at 850 nm
and it is possible to obtain a very fast response with the quantum efficiency reaching
nearly one. The diode structure used for a modulator (as described in the last section)
can also be used as a photodiode. This simplifies the system because a single device
works as both a modulator and a detector, depending on the circuitry to which it is
connected. An alternative would be to make MSM detectors in GaAs, which could
lead to fast, efficient, and low-capacitance detectors.
To acquire both the advantages of an advanced silicon foundry and the perfor-
mance of GaAs devices, a hybrid integration scheme can be used. This is explained
in Chapter 3, and receiver circuits are discussed in Chapter 4.
1.2.3 Free space optical interconnects
Light beams with modulated data need to propagate in some medium to reach the
destination. In non-line-of-sight communication, data needs to be sent through a
guided medium. In long distance telecommunications, data is sent through a single
mode fiber, which has a very low loss of 0.2 dB/km at 1550 nm. For shorter
distances, multimode fibers are used, because their loss and dispersion are
tolerable at these distances.
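A fiber loss quoted in dB/km converts to a remaining power fraction of 10^(-αL/10), where α is the loss in dB/km and L the length in km. The short sketch below applies this to the 0.2 dB/km figure; the lengths are arbitrary examples, included only to show how small the propagation loss becomes at short distances.

```python
def transmitted_fraction(loss_db_per_km: float, length_km: float) -> float:
    """Fraction of optical power remaining after length_km of fiber with the given loss."""
    return 10.0 ** (-loss_db_per_km * length_km / 10.0)


# 0.2 dB/km is the single-mode figure at 1550 nm quoted above; lengths are examples.
for length_km in (0.001, 1.0, 100.0):
    frac = transmitted_fraction(0.2, length_km)
    print(f"{length_km:8.3f} km: {frac:.4f} of the power remains")
```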
For very high density interconnects at very short distances, a guided medium
may not be appropriate. Since beams need to travel short distances, bulk optics
can be used to direct many parallel beams with a few elements. In “Introduction
to Fourier Optics” by Goodman [37], an analysis of optical elements used in the
design of systems is presented. For chip-to-chip and on-chip interconnects, the free
space approach provides required densities of interconnects. Waveguides can also be
used for short distances though they can have very high losses and the density of
interconnects is typically much lower than in free space. A comparison of the free
space and guided approaches is given in Ref. [38] and discussion about the free space
approach is given in Refs. [12] and [39]. Dispersion and losses can be very low in
free space optical interconnects (FSOI). In our current work, we primarily use free
space interconnects on slotted stainless steel baseplates. These baseplates act like a
breadboard system for optics and are described in Chapter 3.
1.3 Challenges in current optical communication
Optics is making inroads into short-reach interconnections. Many technical advances in
devices and packaging have taken place recently. Optical interconnect products are
available for local area network (LAN) and wide area network (WAN) applications [40]
[41]. But can optics provide a solution for chip-to-chip and on-chip interconnects?
For very short reach applications, the density of interconnects required is very
high. There are still device and integration challenges to get high densities of optical
interconnects. When optical devices are integrated close to digital circuits, noise
from those circuits affects the operation of the interconnects. The heating
of devices due to power dissipation may also limit the density of interconnects. It is
still an open issue as to what densities for optical interconnects, on and off chip, can
be achieved.
On-chip global interconnects require low latency, possibly less than a clock cycle.
With the continuing scaling of silicon CMOS technology, the delay of global wires
with and without repeaters is increasing at least relative to the clock cycle [42] [43].
One solution to this problem is to change the architecture of the chip. Alternatively,
if we want to use optics, can it provide low latencies at these scales? Can the data be
delivered within a clock cycle, accounting for the transmitter driver and receiver delay?
For high density parallel interconnects, synchronization to a local clock is a chal-
lenging task. Synchronizing each individual channel will be very inefficient, and a
limiting factor on the number of channels. Are there ways in which all the channels
can be synchronized in the optical domain itself?
Clock distribution on chips with very low jitter and skew is becoming increasingly
difficult. Even with optical clock distribution, receiver circuits add a lot of jitter
and skew. Many applications, such as analog to digital conversion, and high speed
multiplexing and demultiplexing, require a very precise clock with low jitter. Can we
reduce the skew and jitter, and have a very precise clock delivered to the chip?
Using short pulse signaling in interconnects might provide the answer to many of
these questions.
Optical communication is mostly done in binary format, i.e., the messages are put
into a sequence of zeros and ones. At the physical level, ones can be encoded in at
least two ways, while for a zero, no light beam is transmitted. One way to encode a
one is to send a constant light intensity for the entire bit period; the other method
is to send a pulse shorter than the bit period. The first method of encoding is called
non-return-to-zero (NRZ) and the second method is termed as return-to-zero (RZ).
When the pulse duration is much shorter than the bit period, the pulses are referred
to as short pulses (Fig. 1.4).
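The difference between the three codings in Fig. 1.4 can be sketched by sampling each bit period and turning the light on for a fraction of it (the duty cycle). This Python sketch is illustrative only: the sampling resolution and the position of the pulse at the start of the bit are assumptions.

```python
def encode(bits, samples_per_bit, duty_cycle):
    """Sampled optical intensity (1 = light on, 0 = off) for a bit sequence.

    duty_cycle = 1.0 gives NRZ; duty_cycle < 1.0 gives RZ, where each '1' is a
    pulse lasting duty_cycle of the bit period. Placing the pulse at the start of
    the bit is an arbitrary choice for this sketch.
    """
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")
    on = max(1, round(duty_cycle * samples_per_bit))
    one = [1] * on + [0] * (samples_per_bit - on)
    zero = [0] * samples_per_bit
    waveform = []
    for b in bits:
        waveform.extend(one if b else zero)
    return waveform


bits = [0, 1, 0, 1, 0]                 # the sequence drawn in Fig. 1.4
nrz = encode(bits, 10, 1.0)
rz = encode(bits, 10, 0.5)
short = encode(bits, 10, 0.1)          # short-pulse RZ: one sample per '1'
for name, w in (("NRZ", nrz), ("RZ", rz), ("short RZ", short)):
    print(f"{name:9s}", "".join("#" if x else "_" for x in w))
```

The printed strings show NRZ holding the light on for the whole bit, RZ returning to zero mid-bit, and short-pulse RZ concentrating each '1' in a single narrow slot.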
Short pulses provide some unique advantages in communications [44], though it
is important that the medium be able to support the propagation of these pulses.
On electrical wires the frequency dependent losses are very high for short pulses, and
there is substantial dispersion that spreads the pulses, making their use impractical.
In optics, as mentioned earlier, because of the high frequency carrier, the losses for the
entire spectrum of short pulses are nearly constant. Also, the dispersion in an optical
medium for small distances is tolerable and does not cause significant broadening.
Figure 1.4: Schematic demonstration of NRZ and RZ coding (NRZ, RZ, and RZ with short pulses for the bit sequence 0 1 0 1 0; one bit period marked)

Short pulse benefits in optical interconnects are briefly enumerated below.
i. Receiver sensitivity enhancement. By using short pulses, it is possible to improve
the sensitivity of the receiver, and hence reduce the number of gain stages. With a
smaller receiver, it might be possible to increase the density of interconnects.
Alternatively, with the same number of stages, the required optical power can be
reduced for an improved system power budget.
ii. Latency reduction. Because of very sharp rise and fall times, short pulses can
reduce the latency of receivers and might reduce the overall latency of the link.
Global on-chip interconnects might be feasible with short pulses.
iii. Synchronization and clocking. Short pulses generated from a modelocked laser
have very low pulse-to-pulse jitter. A sharp rising edge with very low jitter can
be utilized to distribute a very precise clock. Also, short pulses that are much
shorter than the bit period can be used to read out the modulator at the nominal
center of the bit. This can effectively eliminate skew and jitter of up to half a bit
period from the transmitting channels and synchronize the entire array without
any extra processing.
iv. Wavelength division multiplexing (WDM). Short pulses (150 fs) have very broad
bandwidth (∼5 nm). Multiple separate channels can be created by spectrally slicing
this bandwidth and modulating each slice individually. Channels generated
from a single source eliminate the need for wavelength monitoring of each channel.
Also, in a system using WDM all the benefits of short pulses can be utilized.
Short pulse signaling can potentially make optical interconnects feasible at the
chip-to-chip and on-chip level. This dissertation investigates some of the issues of
short distance optical interconnects operating with short pulses, including their prac-
ticality and feasibility. There are many compelling reasons for using optics at these
short distances.
1.4 Organization
The organization of this thesis is as follows. Chapter 2 gives an overview of short
pulses. The properties of short pulses, their generation, and their propagation in an
optical medium are addressed. A broad overview of the benefits of short pulses in
interconnects is presented.
The components of the chip-to-chip link used in this work are presented in Chap-
ter 3. These components include optomechanics, GaAs based MQW diodes, and
silicon CMOS chips. Circuits designed on silicon chips are discussed in detail. The
hybrid integration process used for flip-chip bonding of MQW diodes on silicon chips
is also described.
Chapter 4 gives the details of three receiver architectures: transimpedance, integrating,
and totem-pole. Fabrication of these receivers and their experimental measurement
results are presented in this chapter. These receivers are operated with
NRZ and short pulse data and their performances are compared. It is shown that
short pulses can improve the receiver sensitivity significantly.
Latency in optical links is considered in Chapter 5. It is a very important criterion
for global on-chip interconnects. Short pulses can reduce the latency of a receiver,
hence making optical interconnects a potential solution for on-chip interconnects.
Chapter 6 presents clocking and synchronization with short pulses. These pulses
can remove skew and jitter of up to half a bit period from the entire array of mod-
ulators by nominally reading the whole array at the center of the bit period. Using
silicon detectors for low capacitance, and eliminating the receiver circuit, a very precise
clock can be injected with short pulses. A totem-pole diode pair is used as a
push-pull device to generate the clock by alternately putting the pulses on the top
and bottom diodes.
Wavelength division multiplexing (WDM) using short pulses is demonstrated in
Chapter 7. A short pulse beam is spectrally sliced to generate multiple wavelength
channels. A single short pulse source generating all the optical channels keeps the
output from the entire modulator array synchronized. Wavelength monitoring for
each separate channel is not required.
Finally, the conclusions are presented in Chapter 8.
Chapter 2
Short Pulses in Interconnects
The non-return-to-zero (NRZ) format is the most commonly used format for data
communication. For a given speed it is bandwidth-efficient, which is very useful in
bandwidth-limited systems. An alternative to NRZ is the return-to-zero (RZ) format
which has no transitions for a logic '0' and two transitions for a logic '1'. The short
pulses referred to in this thesis are RZ encoding with a very low duty cycle. These
pulses are of the order of a few picoseconds or shorter, which for a 1 GHz link gives
a duty cycle of about 10⁻³.
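The duty cycle quoted above is the pulse width multiplied by the bit rate (equivalently, divided by the bit period). The sketch below evaluates it for a 1 ps pulse at 1 GHz and, for comparison, for pulses of about 150 fs at the 80 MHz repetition rate of the laser used in this work, treating the repetition rate as the bit rate (an assumption for illustration).

```python
def duty_cycle(pulse_width_s: float, bit_rate_hz: float) -> float:
    """RZ duty cycle: pulse width divided by the bit period (1 / bit rate)."""
    return pulse_width_s * bit_rate_hz


# A 1 ps pulse at 1 Gb/s, and 150 fs pulses at an 80 MHz repetition rate
print(f"{duty_cycle(1e-12, 1e9):.1e}")     # about 1e-3
print(f"{duty_cycle(150e-15, 80e6):.1e}")  # about 1.2e-5
```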
Short pulses are typically generated by using a modelocked laser with modelock-
ing done actively or passively. Modelocking ensures very low pulse-to-pulse jitter.
Repetition rates of many gigahertz have been demonstrated; past research work in
modelocked lasers is comprehensively summarized by Avrutin et al. [45]. Short-pulse
sources in general are summarized by Tamura [44]. For the current work, a commercial
Ti:sapphire laser with an 80 MHz repetition rate is used. Due to limited commercial
applications at this time, high repetition rate short pulse commercial lasers are not
readily available at wavelengths convenient for the present work.
In an optical medium, attenuation is frequency-independent for a broad frequency
range. In absolute terms, attenuation per unit distance can be very small as well. Low
attenuation for large bandwidth allows the propagation of short pulses for distances of
interest in interconnects. Similarly, dispersion is also very small in an optical medium
for distances of interest for interconnects, making propagation of short pulses feasible.
Shen et al. [46] have demonstrated a short pulse-based WDM transmission with only
3 ps of skew over the entire transmission band. This is in contrast to electrical wires
where both attenuation and dispersion are very high. This difference occurs because
a high frequency carrier is used in optics while baseband communication is used in
the electronic domain. Even very high speed modulation rates are small compared
to optical carrier frequencies (∼10¹⁴ to 10¹⁵ Hz), so such modulation makes little
difference to optical propagation.
A stream of ideal pulses can be represented mathematically by Dirac-delta functions:

$$\sum_{n=-\infty}^{+\infty} \delta(t - nT) \tag{2.1}$$

where T is the period of repetition and δ(t) is a Dirac-delta impulse. In the frequency
domain, this impulse train corresponds to another comb of Dirac-delta functions, or
modes, with a frequency separation of 1/T, as given by the following equation:

$$\frac{1}{T}\sum_{n=-\infty}^{+\infty} \delta\left(f - \frac{n}{T}\right) \tag{2.2}$$

Such an ideal pulse stream contains an infinite set of frequency components separated
by the repetition rate of the pulses. The pulses generated by the laser are not ideal
impulses, but are more likely approximately Gaussian in shape. For very short pulses
the spectrum is still very close to the spectrum of an ideal impulse train for a large
number of modes.
For non-ideal pulses with a pulse shape p(t), the pulse train and the corresponding
spectrum are given by

$$\sum_{n=-\infty}^{+\infty} p(t - nT) \;\Longleftrightarrow\; \frac{1}{T}\,P(f)\sum_{n=-\infty}^{+\infty} \delta\left(f - \frac{n}{T}\right) \tag{2.3}$$

where P(f) is the Fourier transform of p(t). The spectrum of the train of pulses
consists of ideal impulses, and the envelope is determined by the Fourier transform
of an individual pulse. If the pulses are Gaussian, then the spectrum of the
pulse train also has a Gaussian envelope. This is illustrated in Fig. 2.1. For the
laser used in the current work, the pulse width is about 150 fs and the spectral width
is ∼ 5 nm. These pulses are much shorter than any time scale on the chip and can
effectively be treated as impulses.
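The quoted pulse width and spectral width are linked by the time-bandwidth product. A minimal check is sketched below, assuming transform-limited pulses and the standard FWHM time-bandwidth products (about 0.441 for a Gaussian and 0.315 for a sech² shape), and converting bandwidth to wavelength with Δλ = λ²Δν/c.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

# FWHM time-bandwidth products of transform-limited pulses (standard values).
TIME_BANDWIDTH = {"gaussian": 0.441, "sech2": 0.315}


def min_bandwidth(pulse_fwhm_s: float, wavelength_nm: float, shape: str):
    """Minimum FWHM bandwidth of a pulse, in Hz and in nm at the given wavelength."""
    dnu_hz = TIME_BANDWIDTH[shape] / pulse_fwhm_s
    lam_m = wavelength_nm * 1e-9
    dlam_nm = lam_m ** 2 * dnu_hz / C * 1e9
    return dnu_hz, dlam_nm


for shape in ("sech2", "gaussian"):
    dnu, dlam = min_bandwidth(150e-15, 850.0, shape)
    print(f"{shape:8s}: {dnu / 1e12:.2f} THz, {dlam:.2f} nm")
```

A 150 fs pulse at 850 nm comes out at about 2.1 THz (roughly 5 nm) for a sech² shape and about 2.9 THz (roughly 7 nm) for a Gaussian. These estimates are of the same order as the ∼5 nm quoted above; the exact figure depends on the pulse shape and on how the widths are defined and measured.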
Figure 2.1: A pulse train and its spectrum (pulses p(t) repeating with period T; spectrum with envelope P(f) and line spacing 1/T)
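Equation (2.3) and Fig. 2.1 can be checked numerically with a discrete Fourier transform. The NumPy sketch below builds a periodic train of Gaussian pulses with a hypothetical width σ = 0.05T, then verifies that the spectrum contains only lines at multiples of 1/T and that the line heights follow the envelope exp(-2π²σ²f²) expected from the transform of a single Gaussian pulse. Time is in arbitrary units, and all parameter values are assumptions.

```python
import numpy as np

T = 1.0                    # repetition period (arbitrary time units)
SAMPLES_PER_PERIOD = 128   # dt = T / 128 is exact in binary floating point
N_PERIODS = 64
SIGMA = 0.05 * T           # hypothetical Gaussian pulse width (standard deviation)

dt = T / SAMPLES_PER_PERIOD
n = N_PERIODS * SAMPLES_PER_PERIOD
t = np.arange(n) * dt

# Periodic train of Gaussian pulses centred at t = kT: use the distance to the
# nearest pulse centre (other pulses are at least 10 sigma away, so negligible).
phase = t % T
dist = np.minimum(phase, T - phase)
train = np.exp(-dist ** 2 / (2 * SIGMA ** 2))

spectrum = np.abs(np.fft.rfft(train))
freqs = np.fft.rfftfreq(n, d=dt)

# Comb lines sit at multiples of 1/T, i.e. every N_PERIODS-th FFT bin.
line_bins = np.arange(0, len(freqs), N_PERIODS)
off_comb = np.delete(spectrum, line_bins)

# Line heights relative to f = 0 should follow the single-pulse Gaussian envelope.
m = np.arange(6)
measured_envelope = spectrum[line_bins[m]] / spectrum[0]
expected_envelope = np.exp(-2 * np.pi ** 2 * SIGMA ** 2 * (m / T) ** 2)

print("comb spacing:", freqs[line_bins[1]], " largest off-comb bin:", off_comb.max())
print("measured envelope:", np.round(measured_envelope, 4))
print("expected envelope:", np.round(expected_envelope, 4))
```

Shortening `SIGMA` flattens the envelope, approaching the uniform comb of Eq. (2.2) for ideal impulses.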
Large amplitude, large bandwidth, fast rising and falling edges, and low pulse-to-
pulse jitter (Fig. 2.2) are very useful properties of short pulses in optical interconnects.
The following sections give a brief overview of the different advantages of using short
pulses in interconnects, which form the motivation for this work.
Figure 2.2: Short pulse properties: 150 fs pulse width, large amplitude, very large bandwidth (> 2 THz), and low pulse-to-pulse jitter (< 3 ps rms)
2.1 Improved receiver performance
The optical power budget is an important constraint in providing a large number
of IO between chips. Reducing the amount of optical power required will allow a
larger number of IO. Keeping everything else the same while reducing optical power
in a link requires larger amplification in the receiver, thus increasing the size of
the receiver and the amount of electrical power dissipation. By using short pulses,
the optical power required by the receiver can be reduced, without increasing the
amplification, because short pulses have all the energy concentrated in a very short
period. Sensitivity enhancement of transimpedance receivers with short pulses was
first mentioned by Boivin et al. [47]. In the case of NRZ data, while the input is
being charged, the charge leaks away through the feedback resistor, giving a smaller
peak swing for the same amount of energy compared to the short pulse input. This
is schematically illustrated in Fig. 2.3. For an integrating receiver, all the energy
is concentrated in the integrating period when short pulses are used. With NRZ
data, optical energy incident during the resetting period is wasted as illustrated in
Fig. 2.4. Thus the sensitivity of an integrating receiver improves by using short pulses,
though the extent of this improvement depends on the fraction of clock cycle used
for resetting. Chapter 4 expands on this idea, and presents the advantages of short
pulses with different receiver architectures.
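A first-order sketch of both effects is given below. It assumes a single-pole model for the transimpedance front end (the input capacitance C discharging through an effective resistance R, with τ = RC), an isolated '1' bit following a '0' for the NRZ case, and an ideal integrating receiver that loses exactly the reset fraction of each NRZ bit. These are simplifying assumptions for illustration, not the receiver analysis of this work.

```python
import math


def tia_short_pulse_advantage(bit_period: float, tau: float) -> float:
    """Peak-voltage ratio (short pulse / NRZ) for equal energy in an isolated '1'.

    A short pulse deposits its charge Q almost instantly, so the peak is Q / C.
    NRZ delivers the same Q as a constant current Q / T_b over the whole bit,
    while charge leaks away through R; the peak is (Q / T_b) * R * (1 - exp(-T_b / tau)).
    The ratio of the two peaks depends only on x = T_b / tau.
    """
    x = bit_period / tau
    return x / (1.0 - math.exp(-x))


def integrating_short_pulse_advantage(reset_fraction: float) -> float:
    """Collected-charge ratio (short pulse / NRZ) when part of each cycle is reset time."""
    if not 0.0 <= reset_fraction < 1.0:
        raise ValueError("reset_fraction must be in [0, 1)")
    return 1.0 / (1.0 - reset_fraction)


for x in (0.1, 1.0, 3.0):  # hypothetical bit period / time constant ratios
    print(f"T_b/tau = {x}: short-pulse peak is {tia_short_pulse_advantage(x, 1.0):.2f}x NRZ")
print(f"reset for 25% of the cycle: {integrating_short_pulse_advantage(0.25):.2f}x")
```

In this model the short-pulse advantage grows with the ratio of bit period to time constant, because NRZ charge has longer to leak away before the peak is reached.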
Figure 2.3: Sensitivity enhancement in transimpedance receiver with short pulses (input voltage vs. time for NRZ and short pulse inputs)
Figure 2.4: Timing diagram of the integrating receiver with short pulse and NRZ inputs (clock, NRZ input, short pulse input; integration and evaluation phases). Energy incident during the evaluation phase is not integrated.
2.2 Low latency in receivers
For on-chip connections, because of increasing clock speeds and reducing line-widths,
it is becoming increasingly difficult to send data across the chip in one clock cycle.
For example, on a 2 cm wide chip, repeatered global interconnections would require
∼ 330 ps assuming the speed of propagation of the signal to be roughly c/5 (c is the
velocity of light in vacuum) [6]. For optical interconnects to be a viable alternative,
the latency of optical links needs to be lower. It is potentially possible to reduce the
latency of an optical link by using short pulses instead of NRZ data format. The
latency of a transimpedance receiver can be reduced by ∼ 65%, if short pulses are
used. The latency of the integrating receiver can also be reduced significantly at the
expense of timing margin. The lowest latency in a receiver can be achieved by using
an amplifier-less scheme. A totem-pole structure of detectors connected to a high
impedance node can be charged or discharged to full supply levels using short pulses
in a very short period. The latency of transimpedance, integrating and totem-pole
receiver architectures is analyzed in Chapter 5.
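The ∼330 ps figure follows directly from the chip width and the assumed signal speed. The short sketch below reproduces it and expresses the delay in clock cycles for a few hypothetical clock frequencies; it models the repeatered wire only through the c/5 effective speed quoted above.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def wire_delay_s(length_m: float, velocity_fraction: float = 0.2) -> float:
    """Delay of a global wire modeled as a signal moving at velocity_fraction * c."""
    return length_m / (velocity_fraction * C)


def clock_cycles(delay_s: float, clock_hz: float) -> float:
    """Number of clock periods spanned by the given delay."""
    return delay_s * clock_hz


delay = wire_delay_s(0.02)  # 2 cm chip, signal speed c/5
print(f"{delay * 1e12:.0f} ps")
for f_clk in (1e9, 3e9, 5e9):  # hypothetical clock frequencies
    print(f"{f_clk / 1e9:.0f} GHz clock: {clock_cycles(delay, f_clk):.2f} cycles")
```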
2.3 Better synchronization
There are two aspects to synchronization in a system. One is to have all the channels
in a parallel link synchronized to each other, and the other is to provide an accurate
clock to all parts of the chip. Short pulses can improve the synchronization of the
system because of fast rise and fall times, and low cycle-to-cycle jitter. In an imple-
mentation with multiple parallel channels, the drive waveforms have skew and jitter
due to process variations, temperature variations, and noise on supply lines. This
causes phase misalignment at the receiver, and imposes a system power penalty. As
mentioned earlier in this chapter, short pulses are like impulses and they effectively
sample the state of the modulator. By using short pulses to read out the parallel
channels at a nominal bit center, the effect of skew and jitter from those channels can
be removed. Fig. 2.5 conceptually shows the removal of skew from different channels.
Similarly, the effect of jitter can be removed. There is a limit though, to the amount
of skew and jitter that can be removed, namely up to half a bit period.
Figure 2.5: Skew removal from multiple parallel channels (ch.1, ch.2, ch.3) using short pulses. The three waveforms are electrical drive signals, and they are read by a short pulse that samples all the channels at the same time.
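The half-bit limit can be illustrated with an idealized model in which each channel's drive waveform is the same bit sequence shifted by a fixed skew, and an ideal short pulse (treated as an instantaneous sample) reads every channel at the nominal bit centres. The Python sketch below is such a model; the skew values are hypothetical, and jitter could be treated the same way by letting the shift vary from bit to bit.

```python
import math


def drive_level(bits, t, bit_period, skew):
    """Electrical drive of one channel at time t.

    Bit k is held from k*T + skew to (k+1)*T + skew; outside the data the drive is 0.
    """
    k = math.floor((t - skew) / bit_period)
    return bits[k] if 0 <= k < len(bits) else 0


def read_with_short_pulses(bits, bit_period, skew):
    """Sample the modulator state with an ideal short pulse at each nominal bit centre."""
    return [drive_level(bits, (k + 0.5) * bit_period, bit_period, skew)
            for k in range(len(bits))]


bits = [1, 0, 1, 1, 0, 0, 1, 0]
T = 1.0
channel_skews = {"ch.1": -0.3, "ch.2": 0.0, "ch.3": 0.45}  # hypothetical, in units of T
for name, skew in channel_skews.items():
    ok = read_with_short_pulses(bits, T, skew) == bits
    print(f"{name}: skew {skew:+.2f} T -> data recovered: {ok}")

# Beyond half a bit period the pulse samples the neighbouring bit instead.
print(read_with_short_pulses(bits, T, 0.6) == bits)
```

All three skewed channels are recovered because each shift is smaller than half a bit period; a skew of 0.6T moves every sample into the previous bit.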
It is also possible to inject a very precise clock using short pulses. This clock
can potentially be used to retime the data coming in on parallel optical IO. Apart
from this, the precise clock can find application in testing and debugging. In these
applications, a very low jitter clock is required to characterize waveforms on the chip.
Typically, amplifiers in the optical receiver also introduce jitter. To circumvent this
problem, an amplifier-less scheme is proposed, which is capable of providing a very
precise clock. Synchronization and clocking are dealt with in Chapter 6.
2.4 Wavelength division multiplexing (WDM)
Figure 2.6: Spectral slicing of short pulse spectrum for WDM (spectral width ∼5 nm, plotted vs. wavelength)
In backplanes of current routers, the volume available for wiring is limited. The use
of WDM can potentially reduce the number of wires by transmitting multiple channels
on one fiber. A 150 fs pulse has a bandwidth of ∼ 5 nm. This broad bandwidth can be
split into multiple frequency bands, and each band can be modulated independently.
These bands are orthogonal to each other and can be combined to be sent through, say,
a single fiber and then split again at the receiving end. Fig. 2.6 shows the concepts
of splitting the spectrum to generate multiple channels. By using a single source
to generate multiple channels, many system aspects are also simplified. Different
channels are carved out of a single spectrum, hence they automatically maintain the
wavelength separation and require no monitoring. In contrast, a laser-based WDM
system requires a very careful monitoring of the wavelengths of lasers so they do not
drift into neighboring channels. Removing the monitoring requirement reduces the
cost of the system. In the case of short pulses, the channels also arrive at the
receiver synchronized, as shown in the previous section. The received data on
all the channels can then be sampled using a single clock, reducing the complexity of
the system.
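As a simple illustration of spectral slicing, the sketch below divides a ∼5 nm spectrum centred at 850 nm into equal slices and converts the slice width into an approximate frequency spacing using Δν = cΔλ/λ². The channel count and the use of equal, ideal slices with no guard bands are assumptions; practical filters need some spacing between channels.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def slice_spectrum(center_nm: float, total_width_nm: float, n_channels: int):
    """Divide a spectrum into equal wavelength slices.

    Returns the slice centre wavelengths (nm) and the approximate channel spacing
    in GHz, using d_nu = c * d_lambda / lambda**2 at the centre wavelength.
    """
    width_nm = total_width_nm / n_channels
    start_nm = center_nm - total_width_nm / 2.0
    centres = [start_nm + (i + 0.5) * width_nm for i in range(n_channels)]
    spacing_ghz = C * (width_nm * 1e-9) / (center_nm * 1e-9) ** 2 / 1e9
    return centres, spacing_ghz


centres, spacing = slice_spectrum(850.0, 5.0, 4)  # 4 channels is a hypothetical choice
print([f"{c:.3f} nm" for c in centres], f"spacing ~ {spacing:.0f} GHz")
```

Because all slices come from one spectrum, their relative positions are fixed by the slicing optics rather than by separate lasers, which is the source of the monitoring advantage described above.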
Chapter 7 goes into the details of the implementation of a short pulse-based WDM
link.
Chapter 3
Optical Interconnect Setup and
Components
In this chapter we will describe technology that is common to the work in subsequent
chapters. Specifically, we will discuss the optical apparatus, the optoelectronic de-
vices, the integration technology, the overall layout of the silicon CMOS chips, and
some of the relatively standard circuits used on the chips.
A schematic of a generic dense chip-to-chip optical link based on modulators is
shown in Fig. 3.1. Either a short pulse beam from a modelocked laser or a continuous
wave (cw) beam is incident on a diffractive optical element (DOE), which fans out this
beam into multiple beams. These beams are modulated by an array of modulators
driven by the electrical signals from the chip. The modulated beams are imaged on
the receiver chip. The output of the receiver drives one of the following: a) an output electrical pad
for direct testing; b) an on-chip bit error rate tester for evaluating link performance; or
c) another modulator for optical verification of the received data. All-optical testing
by reading out the modulator driven by the receiver eliminates the need for high-speed
output electrical pads from the chip.
For the present work, the optomechanics was designed on a breadboard style
system based on slotted stainless steel baseplates. GaAs-based MQW diodes acting
as modulators and photodetectors were flip-chip bonded to silicon CMOS chips. The
optomechanics and optical test bench setup are described in the next section. The
properties and operation of MQW diodes are presented in Section 3.2. Section 3.3
describes the silicon chips designed for this work. Finally, Section 3.4 deals with the
hybrid integration of MQW diodes and silicon chips.

Figure 3.1: Schematic diagram of an optical interconnect system
3.1 Optical test bench
The implementation of the dense chip-to-chip optical link was done using slotted
stainless steel baseplates (Fig. 3.2). The input optical beam (cw or short pulse) was
fanned out into 20 beams for 10 linear differential channels using a diffractive optical
element (spot array generator). Beam steering was done by a pair of Risley prisms,
which moved the beam by small amounts when they were rotated. The chips were
mounted on XYZ stages, external to the baseplate, to provide better controllability
of the placement. The alignment of the beams was done visually by viewing with the
imaging cameras, shown at the top of Fig. 3.2.
Figure 3.2: Optomechanical setup for testing (short pulse beam, spot array generator, slotted baseplate, Tx and Rx chips, beam readout, imaging cameras)

In optical testing, optomechanics is an essential element. Precision in alignment
and stability are required for repeatable measurements. The slotted stainless steel
baseplates used in the current work satisfy these requirements, simultaneously
providing easy reconfigurability for a short setup time. This kind of setup was reported by
Brubaker et al. [48]. These baseplates are precision milled to 1 µm flatness over the
entire surface. All the components in a given slot are aligned to a common optical
axis, which is the same as the mechanical axis of the slots. The overall assembly with
baseplates is mechanically and thermally very stable. The baseplate setup minimizes
the time required for assembly and alignment, because all the components are on a
single optical axis. The optical components are mounted in circular cells which are
custom designed and placed on precision milled slots in the baseplate. The compo-
nents are held in place by using ceramic magnets, providing a stable arrangement
after alignment. For a given implementation, a custom layout of the slots is generally
required, unless optical path lengths are not critical. For the case of non-critical path
lengths, an arbitrary grid of slots can be used, providing flexibility and convenience.
The DOE used in this setup was an eight-level phase-only mask fabricated by
Digital Optics Corporation. It was a representation of the Fourier transform of the
required pattern. The intensities of the 20 fan-out beams generated by the DOE were
uniform within 90% at the wavelength of interest, 850 nm. The design of a DOE
is explained in Refs. [49] [50].
There are many ways of generating short pulses. One way is to drive a laser
with very short current spikes as in Refs. [51] [52] [53]. Another way is to use either
active, passive, or hybrid modelocking in lasers. Active modelocking is typically done
by driving the laser from an external modulation source [54]. Passive modelocking
involves a saturable absorber in the laser cavity or Kerr nonlinear lensing [55]. For
this work, short pulses were generated by a commercial modelocked Ti:sapphire laser
at 80 MHz. Commercial lasers that combine high power with a high repetition rate are
presently scarce because demand is relatively small, although many high-repetition-rate
modelocked lasers have been demonstrated in research, e.g. in Ref. [56].
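To put the very low duty cycle of such a source in numbers, the short sketch below computes the period, duty cycle and approximate peak power of a pulse train. Only the 80 MHz repetition rate comes from the text; the 100 fs pulse width, the 10 mW average power, the rectangular pulse shape and the function name `pulse_train_stats` are assumptions for illustration.

```python
# Sketch: period, duty cycle and approximate peak power of a modelocked pulse train.
# Assumptions (not from the text): rectangular pulses; pulse width and average power
# are hypothetical values.


def pulse_train_stats(rep_rate_hz: float, pulse_width_s: float, avg_power_w: float):
    """Return (period, duty cycle, peak power) for rectangular pulses."""
    period_s = 1.0 / rep_rate_hz
    duty = pulse_width_s / period_s        # fraction of each period the light is on
    peak_w = avg_power_w / duty            # same energy compressed into the pulse
    return period_s, duty, peak_w


# 80 MHz repetition rate (from the text), hypothetical 100 fs pulses at 10 mW average
period_s, duty, peak_w = pulse_train_stats(80e6, 100e-15, 10e-3)
print(f"period = {period_s * 1e9:.1f} ns, duty cycle = {duty:.1e}, peak = {peak_w:.0f} W")
```

For a real modelocked pulse (e.g. sech² shaped) the peak-power estimate changes by a factor of order one, but the duty cycle, of order 10⁻⁵, is what makes the format a very low duty cycle RZ signal.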
3.2 MQW diodes
Chapter 1 gave a basic overview of the design and operation of MQW diodes. These
diodes are p-i-n structures with quantum wells in the i region. They work as modulators
based on the quantum-confined Stark effect (QCSE). MQW diodes fabricated
in GaAs exhibit strong QCSE around a wavelength of 850 nm. These diodes can be used
not only as modulators but also as photodiodes. Being able to use the same
device for modulation and reception simplifies the design of an interconnect system.
The MQW diodes used in this work were first-generation devices fabricated at
Stanford. They exhibited the Fabry-Perot effect, because an anti-reflection coating
could not be used for processing reasons. This Fabry-Perot effect degraded the per-
formance of the devices.
The overall size of these diodes after fabrication was 40 × 80 µm². As photodiodes
these devices had a responsivity of about 0.13 A/W, roughly a fourth of the expected value of
0.5 A/W. The maximum responsivity of a GaAs photodiode is about 0.66 A/W, a limit
corresponding to one electron per photon at a photon energy of ∼ 1.5 eV (850 nm). The
responsivity of 0.5 A/W is routinely achieved with anti-reflection coating [30]. With
a voltage swing of ∼ 3 V, the contrast ratio of these diodes when used as modulators
was about 1.3:1, which was much below the expected value of 2:1.
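The responsivity limit quoted above follows from R = ηq/E_photon with quantum efficiency η = 1. The minimal sketch below evaluates it with exact SI constants; the function names are illustrative. The 0.66 A/W figure corresponds to a photon energy rounded to 1.5 eV; at exactly 850 nm the photon energy is about 1.46 eV and the ideal responsivity is about 0.69 A/W. The same relation converts the measured 0.13 A/W into an external quantum efficiency of roughly 0.19.

```python
# Sketch: ideal photodiode responsivity R = eta * q / E_photon (eta = 1 means one
# electron per photon). Constants are exact SI values; function names are illustrative.

Q_E = 1.602176634e-19    # elementary charge (C)
H = 6.62607015e-34       # Planck constant (J s)
C = 2.99792458e8         # speed of light (m/s)


def photon_energy_ev(wavelength_m):
    """Photon energy in eV."""
    return H * C / (wavelength_m * Q_E)


def responsivity(wavelength_m, eta=1.0):
    """Responsivity in A/W for quantum efficiency eta."""
    return eta * Q_E * wavelength_m / (H * C)


def quantum_efficiency(resp_a_per_w, wavelength_m):
    """External quantum efficiency implied by a measured responsivity."""
    return resp_a_per_w / responsivity(wavelength_m)


lam = 850e-9
print(f"photon energy at 850 nm: {photon_energy_ev(lam):.3f} eV")
print(f"ideal responsivity:      {responsivity(lam):.3f} A/W")
print(f"efficiency at 0.13 A/W:  {quantum_efficiency(0.13, lam):.2f}")
print(f"efficiency at 0.5 A/W:   {quantum_efficiency(0.5, lam):.2f}")
```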
Because of the low contrast ratio, these modulators were used differentially to increase
the signal strength. For single-ended electrical circuits, these diodes can be connected
in a totem-pole configuration to provide a differential optical output, as in Fig. 3.3,
which shows the schematic and a photograph of a totem-pole-connected diode
pair. The same configuration can be used at the receiver for differential
optical and single-ended electrical input. Two diodes can also be used separately in
a fully differential configuration.
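The benefit of differential operation with a poor contrast ratio can be seen with a toy power calculation. In this sketch, which assumes equal readout power on both modulators and a hypothetical "high" reflectivity of 0.5 (only the contrast ratios 1.3:1 and 2:1 come from the text), the difference between two complementary beams has twice the swing of either beam alone.

```python
# Sketch: signal swing from one modulator versus a complementary (differential) pair.
# Assumptions: equal readout power on both modulators and a hypothetical "high"
# reflectivity of 0.5; contrast ratio CR = R_high / R_low.


def reflected_powers(bit, p_read, r_high, cr):
    """Reflected powers (P_a, P_b) of two modulators driven with complementary data."""
    r_low = r_high / cr
    if bit:
        return p_read * r_high, p_read * r_low
    return p_read * r_low, p_read * r_high


def swings(p_read, r_high, cr):
    """Return (single-ended swing of beam a, swing of the difference P_a - P_b)."""
    a1, b1 = reflected_powers(1, p_read, r_high, cr)
    a0, b0 = reflected_powers(0, p_read, r_high, cr)
    return a1 - a0, (a1 - b1) - (a0 - b0)


for cr in (1.3, 2.0):
    single, diff = swings(1.0, 0.5, cr)
    print(f"CR = {cr}: single-ended swing {single:.3f}, differential swing {diff:.3f}")
```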
Figure 3.3: Schematic and the picture of totem-pole connected diodes
The bonded capacitance of these diodes was originally expected to be 100 fF but
it was actually ∼ 260 fF. This capacitance was measured by using ring-oscillators on
the silicon chip. The oscillation frequencies of an unloaded oscillator and a MQW
diode-loaded oscillator were compared [57]. This large deviation in the capacitance
of these diodes affected the performance of the circuits quite adversely.
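The ring-oscillator capacitance measurement can be illustrated with a simple linear delay model. This is a sketch under stated assumptions, not the procedure of Ref. [57]: each stage delay is taken as proportional to the capacitance on its output node, the unloaded node capacitance is assumed to be known from simulation, and the 11-stage ring, 20 fF node capacitance and delay constant are hypothetical values.

```python
# Sketch: extract a load capacitance from ring-oscillator frequencies.
# Assumptions (hypothetical, not from the text): every stage delay is proportional to
# the capacitance on its output node (t_d = k * C), the unloaded node capacitance c_node
# is known from simulation, and the diode loads `loaded_nodes` of the ring's nodes.


def ring_frequency(n_stages, c_node, k, extra_c=0.0, loaded_nodes=0):
    """Frequency of an n-stage ring whose stage delays are k times the node capacitance."""
    total_delay = k * (n_stages * c_node + loaded_nodes * extra_c)
    return 1.0 / (2.0 * total_delay)   # an edge travels around the ring twice per period


def extract_load_c(f_unloaded, f_loaded, n_stages, c_node, loaded_nodes=1):
    """Invert the linear model: the period grows in proportion to the added capacitance."""
    return n_stages * c_node * (f_unloaded / f_loaded - 1.0) / loaded_nodes


# Round trip with hypothetical values: 11 stages, 20 fF per node, k = 5 kOhm
f0 = ring_frequency(11, 20e-15, 5e3)
f1 = ring_frequency(11, 20e-15, 5e3, extra_c=260e-15, loaded_nodes=1)
c_est = extract_load_c(f0, f1, 11, 20e-15)
print(f"f0 = {f0 / 1e6:.0f} MHz, f1 = {f1 / 1e6:.0f} MHz, C_diode = {c_est * 1e15:.0f} fF")
```

The extraction only needs the frequency ratio, so the unknown delay constant k cancels; the accuracy then rests on how well the unloaded node capacitance is known.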
3.3 Silicon chips
Most of the receiver testing was done on two CMOS chips fabricated in different
technologies: one in the 0.5 µm HP process and the other in the 0.25 µm National
Semiconductor process. This section describes the circuits on these chips.
Figure 3.4: Layout of the chip fabricated in the 0.5 µm HP process (labeled blocks: transceiver arrays, PRBS generators 1 and 2, BER testers 1 and 2, VCO for noise testing, test circuits)
The layout of the chip fabricated in the 0.5 µm process is shown in Fig. 3.4. The
chip consists of linear arrays of transceivers. The receiver output is connected to
a modulator driver, so that the received data can be verified by reading the state
of the modulator. For this testing, the modulator is driven by the receiver output,
and a continuous wave beam reads out the modulator state. The modulated beam
can be observed by using a commercial photodiode. This allows for all-optical test-
ing of receivers, eliminating issues associated with high speed electrical pads. Both
transimpedance and integrating receivers are designed on this chip. Because of some
fabrication issues, the transimpedance receivers did not function correctly on this chip.
To test the performance of the receiver in terms of bit error rate, pseudo-random bit
sequence (PRBS) generators were incorporated on the chip. The details of the design
of PRBS generators and bit error rate tester circuits are given in Section 3.3.2. To
test the robustness of the receivers with substrate noise, voltage-controlled oscillators
were also designed on this chip. These oscillators were capable of generating substrate
noise at different frequencies. Receiver test circuits with outputs to electrical pads
were also included.
Figure 3.5: Layout of the chip fabricated in the 0.25 µm National Semiconductor process (labeled blocks: receiver-transmitter pairs, receivers connected to samplers, ring oscillators to measure silicon detector capacitance, silicon detectors connected to samplers)
The layout of the chip fabricated in the 0.25 µm process is shown in Fig. 3.5.
Like the previous chip, this chip has receiver-transmitter pairs for all-optical testing;
these pairs are also designed to allow latency measurements of on-chip interconnects.
In addition to all-optical testing, the chip includes electrical samplers, described in
Section 3.3.3, for probing internal nodes of the circuits. The chip also contains silicon
detectors for optical clock injection tests. The sampler circuits fabricated on this chip
are high voltage samplers,
which detect voltages above ∼ 1 V.
The design of the receivers and their operation with short pulses forms an impor-
tant part of this dissertation and is separately dealt with in detail in Chapter 4.
3.3.1 Modulator driver
The contrast ratio of modulators improves with higher voltage swing [27]. However, if
the modulators can be operated with a swing equal to the supply voltage of the chip,
the driver design is simplified. With the scaling of technology and supply voltage, it
might become necessary to use higher-than-supply swings to get a large enough contrast
from the modulators. Some circuits for driving modulators with high voltage swings are
described in Ref. [58]. Here, the modulator driver is designed to generate the supply
swing on the modulator.
Figure 3.6: Eye diagram of modulator driver operation at 800 Mb/s obtained by optical readout of the modulator.
A modulator driver is a chain of buffers designed to drive ∼ 100 fF of capacitance.
Because the modulator capacitance was larger than expected, the driver chain was
not able to operate at very high speeds. In simulation for both 0.5 µm and 0.25 µm
technology with 100 fF of capacitance, the modulator driver was able to operate in
excess of 1 Gbps, while the fabricated modulator driver with bonded diodes operated
only up to 800 Mbps in 0.5 µm technology. The eye diagram is shown in Fig. 3.6.
A similar performance was obtained on the 0.25 µm chip. Because of limited drive
capability, the modulator driver was not able to drive certain pulse outputs related
to short pulse testing.
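Why a driver chain designed for ∼100 fF slows down with a 260 fF load can be sketched with the common tapered-buffer rule of thumb, in which each stage drives about four times its own input capacitance. This is a generic heuristic, not necessarily how the driver in this work was sized; the 2 fF first-stage capacitance and the function names are hypothetical.

```python
import math

# Sketch: sizing a tapered inverter chain with a fixed per-stage fanout.
# Assumptions (generic rule of thumb, not the thesis design): stage delay grows with
# fanout, a fanout near 4 is close to optimal, and the first stage has a hypothetical
# 2 fF input capacitance.


def stages_for_load(c_in, c_load, fanout=4.0):
    """Number of stages that gives roughly the target fanout per stage."""
    return max(1, round(math.log(c_load / c_in) / math.log(fanout)))


def tapered_chain(c_in, c_load, n_stages):
    """Input capacitance of each stage and the resulting per-stage fanout."""
    f = (c_load / c_in) ** (1.0 / n_stages)
    return [c_in * f**i for i in range(n_stages)], f


c_in = 2e-15
n_design = stages_for_load(c_in, 100e-15)            # chain designed for ~100 fF
sizes, f_design = tapered_chain(c_in, 100e-15, n_design)
last_fanout_actual = 260e-15 / sizes[-1]              # same chain driving ~260 fF
print(n_design, f"{f_design:.2f}", f"{last_fanout_actual:.2f}")
print("stages for 260 fF:", stages_for_load(c_in, 260e-15))
```

With the chain fixed, the last stage sees 2.6 times its design fanout, so its delay, roughly proportional to fanout in this model, grows by about the same factor, which is consistent in direction with the reduced maximum data rate.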
3.3.2 Pseudo random bit sequence (PRBS) generator and tester
A pseudo random sequence can be generated by using storage elements (e.g. flip-flops)
and an XOR gate in a feedback loop [59]. The connections of the XOR gate are
determined by a polynomial; suitable polynomials have been tabulated by many
researchers, for example in Ref. [60]. This structure is referred to as a linear feedback
shift register (LFSR). In general, the maximum period of an n-stage LFSR is 2^n − 1.
The signals generated by an LFSR are not truly random. Pseudo-random sequences are
nevertheless better suited to test-pattern generation because they are deterministic,
so they can be easily reproduced and verified. For example, a generator for a sequence
of length 2^7 − 1 is shown in Fig. 3.7. A feedforward circuit, such as the one shown
in Fig. 3.8, can verify a sequence generated by this circuit. This checker can
be used for bit error rate (BER) testing on the receiver side. Note that an LFSR must
not enter the all-zero state, because it would then continue to generate zeros
indefinitely. This can occur only at start-up, and a mechanism can be added to start
the circuit from a fixed state that is not all zeros.
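The generator and the feedforward checker described above can be modeled in a few lines. This sketch assumes the recurrence b[n] = b[n−6] ⊕ b[n−7], a maximal-length PRBS7 form; the tap positions of the actual on-chip circuit are only shown schematically in Fig. 3.7, and the function names are illustrative. Because the checker uses only received bits, it synchronizes itself after seven bits.

```python
# Sketch: a Fibonacci LFSR pattern generator and a feed-forward sequence checker.
# Assumption: the recurrence b[n] = b[n-6] XOR b[n-7] (a maximal-length PRBS7 form);
# the tap positions of the on-chip circuit are only shown schematically in Fig. 3.7.


def lfsr_sequence(taps, n_stages, seed, length):
    """Generate `length` bits; state[0] holds the most recent bit."""
    if seed == 0:
        raise ValueError("all-zero seed: the LFSR would output zeros forever")
    state = [(seed >> i) & 1 for i in range(n_stages)]
    bits = []
    for _ in range(length):
        new_bit = 0
        for t in taps:                     # tap t = bit generated t clocks ago
            new_bit ^= state[t - 1]
        bits.append(new_bit)
        state = [new_bit] + state[:-1]     # shift the register by one stage
    return bits


def check_sequence(received, taps):
    """Flag each received bit that disagrees with the recurrence applied to earlier bits."""
    start = max(taps)
    flags = []
    for i in range(start, len(received)):
        expected = 0
        for t in taps:
            expected ^= received[i - t]
        flags.append(received[i] ^ expected)
    return flags


TAPS = (6, 7)
seq = lfsr_sequence(TAPS, 7, seed=1, length=300)
print("period repeats after 127 bits:", seq[:127] == seq[127:254])
rx = list(seq)
rx[100] ^= 1                                # inject a single bit error
print("flags raised:", sum(check_sequence(rx, TAPS)))
```

In this particular software model a single bit error raises three flags, one per nonzero term of the polynomial, because the corrupted bit is reused as a tap by two later checks; a hardware tester has to account for such a fixed multiplicity when converting counts to errors.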
Figure 3.7: Schematic of an LFSR generating a pseudo random sequence of length 2^7 − 1, where a square corresponds to a D flip-flop.
Figure 3.8: Schematic of the circuit to verify the sequence generated by the LFSR shown in Fig. 3.7.
PRBS generators and testers were designed on the 0.5 µm technology chip. There
were two PRBS generators: the first generating a sequence of length 2^7 − 1 and
the second a sequence of length 2^22 − 1. All the tests were done with
the longer sequence generator; the shorter one was fabricated so that the entire bit
sequence could be seen on an oscilloscope for debugging. The 2^22 − 1 length sequence
generator was connected to an array of modulators to simulate a random bit stream.
Corresponding BER testers, which verify the received sequence against the sequence
generation logic, were also designed. In such a circuit, every error generated one
transition at the output. The total number of transitions in a given time period was
counted to compute the bit error rate.
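Turning an error count into a BER, and deciding how long an error-free run must last to support a claimed BER, can be sketched as follows. The sketch assumes independent errors (Poisson statistics), which gives the familiar rule that about 3/BER error-free bits bound the BER at 95% confidence; the error counts, durations and function names are hypothetical.

```python
import math

# Sketch: BER from an error count, and the error-free test time needed to bound the BER.
# Assumption: errors are independent (Poisson statistics); the numbers are hypothetical.


def ber_estimate(error_count, bit_rate_bps, duration_s):
    """Measured BER = errors / transmitted bits."""
    return error_count / (bit_rate_bps * duration_s)


def zero_error_test_time(target_ber, bit_rate_bps, confidence=0.95):
    """Seconds of error-free operation needed to claim BER < target at this confidence."""
    bits_needed = -math.log(1.0 - confidence) / target_ber   # about 3 / BER at 95 %
    return bits_needed / bit_rate_bps


print(ber_estimate(12, 800e6, 60.0))              # 12 errors in one minute at 800 Mb/s
print(zero_error_test_time(1e-12, 800e6) / 60.0)  # minutes of error-free testing
```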
3.3.3 Samplers
Figure 3.9: The circuit schematic of the on-chip sampler in 0.25 µm CMOS technology. All transistors are minimum length. (Yeung et al. [61])
On-chip samplers allow the measurement of relatively high frequency content inside
the chip. This technique was proposed by Larsson and Svensson [62], and many
authors have since published different sampling methodologies, e.g. Ref. [63]. The
high bandwidth of MOS transmission gates makes this idea possible. By subsampling
a repetitive signal with a varying clock phase, high speed analog signals
can be reconstructed. Samplers were fabricated on the 0.25 µm technology chip to
sample the responses of the silicon detectors and the receivers.
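The subsampling idea can be illustrated in software. The sketch below takes one sample per repetition of a periodic waveform and delays the sampling instant by one extra phase step each time, so the samples trace out a single period at a time resolution equal to the phase step. The damped test waveform, the 12.5 ns repetition period, the 10 ps step and the function names are hypothetical; in hardware the usable resolution is also limited by the sampler bandwidth.

```python
import math

# Sketch of equivalent-time (sub)sampling: one sample is taken per repetition of the
# signal, and the sampling clock is delayed by an extra `phase_step_s` each time.
# The test waveform and all timing values are hypothetical.

PERIOD_S = 12.5e-9          # repetition period of the signal (80 MHz)


def repetitive_signal(t):
    """Hypothetical damped ringing response that repeats every PERIOD_S."""
    tau = t % PERIOD_S
    return math.exp(-tau / 0.2e-9) * math.sin(2 * math.pi * tau / 0.5e-9)


def equivalent_time_sample(signal, period_s, phase_step_s, n_points):
    """Return (equivalent times, samples) reconstructing one period of `signal`."""
    times, samples = [], []
    for k in range(n_points):
        t_actual = k * (period_s + phase_step_s)   # k-th repetition, k phase steps later
        times.append(k * phase_step_s)             # position within a single period
        samples.append(signal(t_actual))
    return times, samples


times, samples = equivalent_time_sample(repetitive_signal, PERIOD_S, 10e-12, 300)
print(f"reconstructed {len(samples)} points spanning {times[-1] * 1e9:.2f} ns")
```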
A master-slave sample-and-hold switched-capacitor circuit forms the core of the
on-chip sampler, as shown in Fig. 3.9. A source follower buffer is placed between
master and slave nodes (marked by smp and hold on the schematic) to remove the
bandwidth limitation due to charge sharing. The hold voltage is transformed into
current and extracted out of the chip. Since the relationship between the sampled
voltage and the output current is not linear, we have multiplexed a calibration signal
at the input of the sampler. The sampler output current is calibrated with this input
signal before using the sampler. Every sampler is independently calibrated to account
for process and environmental variations across the chip. The transmission gates of
samplers are formed by PMOS transistors; therefore it is only possible to measure
signals above the threshold of the transistor (∼ 1 V). By extensive simulation, the
3 dB bandwidth of samplers was found to be ∼ 4 GHz.
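The calibration step can be sketched as building a lookup table from a known input sweep and inverting it by piecewise-linear interpolation. The square-law transfer curve with a 1 V threshold, the 50 mV calibration step and the function names are hypothetical stand-ins for the measured, monotonic sampler characteristic.

```python
import bisect

# Sketch: calibrating a sampler whose output current is a nonlinear function of the
# sampled voltage. The square-law transfer curve and all numbers are hypothetical.


def sampler_current(v):
    """Hypothetical transfer curve: no current below a ~1 V threshold, square law above."""
    return 1e-4 * max(v - 1.0, 0.0) ** 2


def build_table(cal_voltages, measure):
    """Sweep the calibration input and store (current, voltage) pairs sorted by current."""
    pairs = sorted((measure(v), v) for v in cal_voltages)
    return [p[0] for p in pairs], [p[1] for p in pairs]


def current_to_voltage(i_meas, table_i, table_v):
    """Piecewise-linear inverse of the calibrated (monotonic) transfer curve."""
    if not table_i[0] <= i_meas <= table_i[-1]:
        raise ValueError("measurement outside the calibrated range")
    j = bisect.bisect_left(table_i, i_meas)
    if table_i[j] == i_meas:
        return table_v[j]
    i0, i1 = table_i[j - 1], table_i[j]
    v0, v1 = table_v[j - 1], table_v[j]
    return v0 + (v1 - v0) * (i_meas - i0) / (i1 - i0)


cal_v = [1.0 + 0.05 * k for k in range(31)]        # 1.0 V to 2.5 V calibration sweep
table_i, table_v = build_table(cal_v, sampler_current)
print(f"{current_to_voltage(sampler_current(1.83), table_i, table_v):.3f} V")
```

Calibrating each sampler separately, as described above, simply means building one such table per sampler.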
3.4 Hybrid integration of GaAs devices
Ideally, one would like to integrate optoelectronic devices monolithically on silicon
chips. Silicon detectors have problems of speed and sensitivity in the near infrared,
and there are no viable silicon modulators or emitters for the kinds of densities,
efficiencies, and speeds required for interconnects to CMOS chips. One problem
with III-V devices for monolithic integration is that III-V compounds are not lattice
matched with silicon and hence cannot be grown without introducing many defects.
Also, the introduction of GaAs in a silicon foundry is often not acceptable because it
might have detrimental effects on silicon circuits.
A hybrid approach is more promising, because it avoids the aforementioned process
incompatibility issues. Using this approach, well established high performance silicon
circuits are combined with optically superior GaAs devices [64]. One technique of
integrating these devices is shown in Fig. 3.10. This technique is used for bonding
devices at Stanford. High yields have been demonstrated with large arrays of bonded
devices [30] [65] [66] [67] [68] [69].
Figure 3.10: Integration of GaAs devices on silicon chips
For bonding GaAs devices to silicon chips, contact pads are placed at appropriate
locations on the silicon chips. The completed silicon chips are then post-processed
to deposit a barrier layer, followed by gold and then indium. The GaAs-based devices
are etched to form mesa structures. To make the n and p contacts planar, the n
and i regions are etched all the way down to the p layer, and a thick indium bump is
deposited on the p contact to bring it to the level of the indium contact on the n side.
Step 1 of Fig. 3.10 shows a silicon chip and the GaAs-based devices at this stage. In
step 2, the two are brought together with heat and pressure to join them. After
flip-chip bonding, the devices must be illuminated from the side of the GaAs wafer, but
the GaAs substrate absorbs light at 850 nm. Hence, the GaAs wafer is removed by etching,
and an anti-reflection coating is optionally deposited on the devices. After the wafer is
removed, the devices stand apart as mesa structures. Because GaAs and silicon have
different thermal expansion coefficients, removing the GaAs wafer also eliminates the
problem of thermal stress between silicon and GaAs.
Fig. 3.11 shows a picture of the CMOS chip with flip-chip bonded MQW diodes.
These diodes were 80 × 40 µm² in size. The diodes were fabricated and flip-chip
bonded at Stanford. It is quite possible to make smaller devices, which might be
preferable to reduce the photodiode capacitance. At Lucent, very small devices with
flip-chip pads of size 15 × 15 µm² have been fabricated [30]. Ten rows of these diodes
were fabricated with 20 diodes in each row. The spacing between the diodes in a row
was 62.5 µm and the rows were separated by 125 µm.
Figure 3.11: Picture of a CMOS chip with flip chip bonded diodes
3.5 Summary
A slotted-baseplate-based optical system was implemented for a dense chip-to-chip
optical link. An eight-level phase-only DOE was used as a fan-out element to generate
20 beams for modulation by the modulators. Short pulses were generated by a
modelocked Ti:Sapphire laser operating at 80 MHz. The GaAs-based MQW diodes
were used as modulators and photodetectors after flip-chip bonding on silicon CMOS
chips. As photodiodes, their responsivity was 0.13 A/W and their capacitance was
∼ 260 fF. As modulators, their contrast ratio was ∼ 1.3:1. These were first-generation
devices, and these numbers were quite different from the expected values.
The performance of the circuits fabricated in the 0.5 µm and 0.25 µm technologies was
adversely affected by these deviations. The modulator driver operated up to
800 Mbps. The BER tester, the pseudo random bit sequence generator, and on-chip
samplers were designed on the chips to facilitate the link testing.
Chapter 4
Receivers
Earlier chapters introduced the technology used in this work and the concept of a very
low duty cycle return-to-zero scheme for improved performance of links. This chapter
looks into the design of receiver circuits for short distance links, and highlights the
differences in operation with short pulse and NRZ data.
The design of optical receivers for short distances is similar to telecommunications
receivers in some ways but the requirements are very different. Sensitivity is very im-
portant for telecommunications receivers because they operate with very few photons
per bit. In contrast, receivers for interconnects trade off sensitivity for lower power
dissipation. The area and the cost of the receivers are more critical in short links
than in telecommunications. In telecommunications, the serial data rate is increased
for higher throughput (wavelength division multiplexing is also used) requiring re-
ceivers to run at very high speeds. In short links the throughput is increased by
increasing parallelism. The receiver for telecommunications is noise-limited while the
short link receiver is typically gain-limited. To get high overall throughput, receivers
in interconnects need to be densely packed with other circuits, where supply and
substrate noise generated from surrounding circuits and the electrical crosstalk from
other receivers can impair the performance. Hence, for receivers to operate in this
environment they should be immune to noise generated from surrounding circuits.
CHAPTER 4. RECEIVERS 39
A lot of literature has been devoted to the design of receivers for telecommunica-
tions starting with the ground-breaking work by Personick [70]. A typical telecom-
munications receiver consists of a transimpedance stage, gain stages, a decision stage,
and an automatic gain control (AGC) module. In interconnects the number of stages
is generally minimized to reduce power consumption and total delay of the receiver.
Optical receivers can be operated with a single modulated optical beam, or with
differentially modulated optical beams. With a single beam implementation, especially
if the system is to be DC coupled, a reference signal needs to be generated on
the chip, whereas with a differential beam implementation, the reference information
is carried by the beams. It is also possible to have a fixed threshold determined by
the devices forming the receiver [71] [72]. In a single beam implementation, perfor-
mance can possibly be degraded by optical intensity variations and the noise in the
reference generation mechanism. Variations in the received optical power require the
reference to be dynamically varied. Due to difficulties in generating a good reference
on the chip, receiver sensitivity is enhanced by using differential beams. In telecom-
munications, it is very expensive to incorporate two fibers to carry differential beams
for every channel, but in the case of free space interconnects, doubling the number
of beams is not such a significant problem. Differential beams also double the signal
swing, which is required in the current modulator-based system because of the limited
contrast ratio of the modulators.
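The robustness argument for differential beams can be checked with a toy decision model. In the sketch below, a single beam is compared against a reference fixed for the nominal power, while the differential receiver compares the two beams against each other; when the received power drifts by 20%, the fixed-threshold decisions fail and the differential ones do not. The beam model, power levels, bit pattern and function names are hypothetical; only the 1.3:1 contrast ratio comes from Chapter 3.

```python
# Sketch: fixed-threshold single-beam decisions versus differential decisions when the
# received optical power drifts. The beam model and power levels are hypothetical; only
# the 1.3:1 contrast ratio comes from Chapter 3.

CONTRAST = 1.3


def beam_pair(bit, p_avg):
    """Complementary beam powers (P_a, P_b) with average p_avg and the given contrast."""
    p_high = 2.0 * p_avg * CONTRAST / (1.0 + CONTRAST)
    p_low = 2.0 * p_avg / (1.0 + CONTRAST)
    return (p_high, p_low) if bit else (p_low, p_high)


def count_errors(bits, p_avg, threshold=1.0):
    """Errors for a single beam against a fixed reference and for the differential pair."""
    single = diff = 0
    for b in bits:
        p_a, p_b = beam_pair(b, p_avg)
        single += (int(p_a > threshold) != b)   # reference set for the nominal power 1.0
        diff += (int(p_a > p_b) != b)           # reference carried by the second beam
    return single, diff


bits = [1, 0, 1, 1, 0, 0, 1, 0]
for p_avg in (1.0, 0.8, 1.2):
    print(p_avg, count_errors(bits, p_avg))
```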
Receivers can be implemented in many different technologies. High performance
receivers have been demonstrated in BiCMOS [73], GaAs [74] [75], and silicon CMOS
[76] [77]. Because of advances in silicon CMOS technology and widespread use, the
cost of fabrication in this technology is very low. Also, a very high density of circuits
can only be achieved in silicon CMOS, making this a preferred choice for fabrication
of circuits. As detailed in Chapter 3, which also describes the overall chip design,
the circuits for this work were fabricated in silicon CMOS.
Three receiver topologies are considered in this chapter: transimpedance, integrating,
and totem-pole stacked diode pair. Earlier work has focused primarily
on NRZ data input to receivers. We believe that the use of short pulses with receivers
leads to useful advantages, as will be shown in later sections.
A transimpedance receiver is a commonly used architecture in telecommunica-
tions. There is no clock required at the frontend which makes this receiver potentially
very fast. Synchronization to the local clock domain can be done after recovering the
signal to a full logic level. In this receiver the current generated by the photodiode
is converted to a voltage by the transimpedance stage, and this voltage is amplified to
full signal swing by further amplification stages. The speed of this receiver is typically limited
by the frontend time constant, which is determined by the total capacitance at the
input node and the effective feedback resistance seen by the frontend. When a short
pulse format is used instead of NRZ, the large amplitude of short pulses increases
the sensitivity of the transimpedance receiver. The operation of this receiver and the
effect of changing various physical parameters are covered in Section 4.1.
An integrating receiver integrates input photocurrent and uses positive feedback
to make a decision. It is the most sensitive of the three architectures considered here.
This receiver requires a clock synchronized to the data input at the frontend. The
use of short pulses improves the timing margin of this receiver. The latency of this
receiver can be reduced at the expense of the timing margin, which is explained in
detail in Chapter 5. The sensitivity of the integrating receiver also improves by using
short pulses as mentioned in Section 4.2.
A totem-pole stacked diode pair is the simplest form of receiver. It works on the
principle of integrating the input optical power directly at the input node. Full swing
is generated at this node, which eliminates the need for any further amplification.
Removing amplification stages has the advantage of eliminating possible skew and
jitter introduced by the amplification circuitry. The operation of this receiver is
explained in Section 4.3.
The organization of this chapter is as follows. First the principles of operation of
three different receiver architectures are presented along with a comparison of their
performance with short pulse and NRZ input. The later sections give the fabrication
details and testing results of transimpedance and integrating receivers. The totem-
pole diode pair receiver is explained in detail in Chapter 6, as it is primarily used for
clock injection.
4.1 Transimpedance receiver
Transimpedance is the most common receiver architecture for both telecommunications
and interconnects. This receiver does not require any clock at the frontend
(asynchronous), which makes it relatively easy to use. The design of this receiver
has been discussed in detail in the literature for telecommunications [73]
[77] and interconnects [76] [78] [79] [80] [81]. We will summarize the operation of this
receiver frontend without going into a lot of detail, and then compare the performance
of this receiver for NRZ and short pulse input.
Figure 4.1: Transimpedance receiver structure (frontend: inverting amplifier −A with feedback resistor Rf, input current iin, input vin and output vout, followed by a post-amplifier chain)
The transimpedance receiver structure is shown in Fig. 4.1. It consists of photodi-
odes connected to an inverting amplifier with resistive feedback and a post-amplifier
chain. The difference of the currents from the two photodiodes flows into the circuit.
A high-input-impedance inverting amplifier and the resistor form the transimpedance
stage, which converts the photocurrent flowing into the circuit into voltage. This volt-
age is then amplified by the post-amplifier chain and a decision is made about the logic
level. Various optimizations of the transimpedance receiver have been analyzed in the
literature, e.g. in Refs. [81] [82].
This receiver was implemented in silicon CMOS with inverters acting as ampli-
fiers. This implementation was originally proposed by Woodward et al. [83] and later
analyzed in detail for NRZ data by Forbes [82]. Fig. 4.2 shows the schematic of the
transimpedance frontend and the small-signal equivalent circuit of its implementation.
The two stacked diodes convert an optically differential signal into a single-ended pho-
tocurrent input (iin). The DC light intensity incident on the two diodes is cancelled
out and only the difference current flows into the circuit. In the equivalent circuit
shown, gm and gds are the total transconductance and output conductance of the
MOS transistors respectively, CL is the total capacitive loading of all the components
connected at the output of the frontend, Ci is the total input capacitance of the
receiver, and Rf is the feedback resistance. The gain of the amplifier can also be
expressed as A = gm/gds. The transimpedance gain of the first stage is given by
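$$Z(s) = \frac{V_{out}(s)}{I_{in}(s)} = \frac{1/g_{ds} - A R_f}{(1 + A) + \tau s + s^2/\omega_o^2} \qquad (4.1)$$
where τ = CiRf + CL/gds + Ci/gds and 1/ωo² = CiRfCL/gds. Writing the denominator in the
standard second-order form (1 + A)(1 + 2ζs/ωn + s²/ωn²) gives the natural frequency
ωn = ωo√(1 + A) and the damping factor ζ = ωoτ/(2√(1 + A)), which determines the
settling time of the transient response.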
Figure 4.2: Schematic of the transimpedance frontend and the small-signal equivalent circuit of its implementation (input node: iin, Ci, vin; output node: gm·vin, gds, CL, vout; feedback resistor Rf).
Based on Eq. 4.1, the transimpedance frontend was analyzed for short pulse and
NRZ inputs. Parameter values corresponding to the 0.25 µm technology in which this
receiver was fabricated were assumed (Rf = 5 kΩ, Ci = 90 fF, A = 15, gds = 0.125 mS,
CL = 60 fF). To simulate a current pulse input, a 10 ps pulse was assumed at the input.
NRZ data was simulated by a step stimulus. In the NRZ case, operation at 1 Gbps was
assumed to compute the energy in a bit period.
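The comparison can be reproduced approximately by integrating the differential equation behind Eq. 4.1, a2·v'' + a1·v' + a0·v = K·i(t), with the parameter values above. This is a minimal sketch with a fixed-step RK4 solver, not the simulation used for the figures; the 1 µA NRZ current (which sets the charge per bit), the function names and the step size are assumptions. NRZ is modeled as the photocurrent held for the whole 1 ns bit, which over a 1 ns window is the same as a step stimulus.

```python
# Sketch: time-domain response of the transimpedance frontend of Eq. 4.1, obtained by
# integrating a2*v'' + a1*v' + a0*v = K*i(t) with a fixed-step RK4 solver.
# Parameter defaults are the values quoted in the text; the charge per bit is hypothetical.


def tia_response(i_of_t, t_end=1e-9, dt=0.5e-12,
                 rf=5e3, ci=90e-15, cl=60e-15, a=15.0, gds=0.125e-3):
    """Return the output voltage samples of the frontend driven by current i_of_t(t)."""
    gain = 1.0 / gds - a * rf                 # numerator of Eq. 4.1 (ohms)
    a0 = 1.0 + a
    a1 = ci * rf + cl / gds + ci / gds        # tau
    a2 = ci * rf * cl / gds                   # 1 / omega_o^2

    def deriv(t, v, dv):
        return dv, (gain * i_of_t(t) - a0 * v - a1 * dv) / a2

    v = dv = t = 0.0
    out = []
    for _ in range(round(t_end / dt)):
        k1 = deriv(t, v, dv)
        k2 = deriv(t + dt / 2, v + dt / 2 * k1[0], dv + dt / 2 * k1[1])
        k3 = deriv(t + dt / 2, v + dt / 2 * k2[0], dv + dt / 2 * k2[1])
        k4 = deriv(t + dt, v + dt * k3[0], dv + dt * k3[1])
        v += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        dv += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
        out.append(v)
    return out


def rect_current(charge, width):
    """Rectangular photocurrent pulse that delivers `charge` coulombs in `width` seconds."""
    return lambda t: charge / width if 0.0 <= t < width else 0.0


def peak(samples):
    return max(abs(s) for s in samples)


Q_BIT = 1e-6 * 1e-9          # hypothetical: 1 uA NRZ current held for a 1 ns (1 Gb/s) bit
pulse_peak = peak(tia_response(rect_current(Q_BIT, 10e-12)))   # 10 ps short pulse
nrz_peak = peak(tia_response(rect_current(Q_BIT, 1e-9)))       # current held for the whole bit
print(f"short pulse / NRZ peak ratio: {pulse_peak / nrz_peak:.2f}")
```

Because the charge is fixed, calling `rect_current` with intermediate widths (e.g. 200 ps) gives peaks between the two extremes, in line with the discussion of pulse width later in this section.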
For equal energy in pulse and NRZ inputs, the time response of the frontend is
shown in Fig. 4.3. Short pulse input provides a larger peak response than NRZ
input does. This is because the large amplitude of the high frequency components of the
short pulse helps produce a larger output maximum. Equivalently, with short pulses the
charge is delivered before it can leak away from the input node, so all of it is present
while the output peaks; with NRZ input, part of the charge leaks away during the bit, and
the same total charge produces a smaller output. Effectively, short pulses enhance the
sensitivity of the transimpedance receiver. Boivin et al. [47] first described this
sensitivity enhancement with short pulses for telecommunication applications, and
Winzer et al. [84] later extended the analysis. Both of these references looked at the
sensitivity enhancement in a bandwidth-constrained transimpedance receiver for optimum
thermal and shot noise performance. For short pulse interconnects, thermal and shot noise
do not limit the performance and the bandwidth of the receiver can be quite large. This
might allow larger sensitivity gains.
Figure 4.3: Pulse and step response of the transimpedance stage (output voltage in a.u. versus time in ns)
We now look at the effect of variations of different components on the relative
performance with short pulses and NRZ data.
Effect of feedback resistance
The feedback resistance (Rf ) determines the transimpedance gain. Increasing the
value of the feedback resistance increases the transimpedance gain while reducing the
damping factor. The time response with different feedback resistance values is plotted
in Fig. 4.4. A larger feedback resistance causes a bigger amplitude for both pulse and
step responses.
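The gain and damping trends can also be read directly from Eq. 4.1 without a time-domain simulation. The sketch below computes the DC transimpedance, the natural frequency ωn and the damping factor ζ, taking ζ from the denominator of Eq. 4.1 written in the standard form (1 + A)(1 + 2ζs/ωn + s²/ωn²). The remaining parameters default to the values quoted earlier; the function name is illustrative.

```python
import math

# Sketch: DC transimpedance, natural frequency and damping factor from Eq. 4.1 as the
# feedback resistance is varied. Other parameters default to the values quoted in the text.


def tia_parameters(rf, ci=90e-15, cl=60e-15, a=15.0, gds=0.125e-3):
    """Return (DC transimpedance in ohms, omega_n in rad/s, damping factor zeta)."""
    tau = ci * rf + cl / gds + ci / gds
    inv_wo2 = ci * rf * cl / gds                     # 1 / omega_o^2
    z_dc = (1.0 / gds - a * rf) / (1.0 + a)          # negative: the stage inverts
    omega_n = math.sqrt((1.0 + a) / inv_wo2)         # omega_o * sqrt(1 + A)
    zeta = tau / (2.0 * math.sqrt((1.0 + a) * inv_wo2))
    return z_dc, omega_n, zeta


results = {rf: tia_parameters(rf) for rf in (3e3, 5e3, 7e3)}
for rf, (z_dc, omega_n, zeta) in results.items():
    print(f"Rf = {rf / 1e3:.0f} kOhm: |Z(0)| = {abs(z_dc):.0f} Ohm, "
          f"f_n = {omega_n / (2 * math.pi) / 1e9:.2f} GHz, zeta = {zeta:.2f}")
```

Over the 3 to 7 kΩ range plotted here, increasing Rf raises the DC transimpedance while lowering both ωn and ζ, which matches the qualitative behavior described above.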
Figure 4.4: Pulse and step response of the transimpedance stage with varying feedback resistance (Rf = 3 kΩ, 5 kΩ, 7 kΩ; panels: pulse response and step response in a.u. versus time in ns).
Comparing the pulse response with the step response for different feedback
resistances (Fig. 4.5) shows that a smaller feedback resistance gives a larger relative
sensitivity enhancement for short pulses. This is because the bandwidth of the receiver
increases with lower feedback resistance, so more high frequency components appear at
the output of the transimpedance stage, even though the absolute amplitude is reduced by
the smaller gain. At the same time, the output pulse width also decreases with smaller
feedback resistance. The resistance cannot be reduced to a very small value, because later
stages require a minimum pulse width to propagate the pulse, and because very low gain is
not acceptable. For maximum gain, we would like to have the largest possible feedback
resistance that does not broaden the pulse enough to cause inter-symbol interference
(ISI) at the bit rate of operation. The appropriate resistance value is therefore set by
the bit rate.
Figure 4.5: Pulse response of the transimpedance stage with varying feedback resistance (3 kΩ, 5 kΩ, 7 kΩ), normalized to the maximum of the step response (output voltage in a.u. versus time in ns).
Effect of input capacitance
The input capacitance (Cin) is dominated by the photodiode capacitance. Based on
Eq. 4.1, the pulse and step responses for different input capacitances are plotted in
Fig. 4.6. A smaller capacitance value produces larger amplitude with a given pulse
input. It is desirable to reduce the frontend capacitance as much as
possible to improve the sensitivity of the receiver.
Figure 4.6: Pulse and step response of the transimpedance stage with varying frontend capacitance (50 fF, 90 fF, 130 fF; panels: pulse response and step response in a.u. versus time in ns).
Effect of pulse width
In the simulations above, an electrical pulse of 10 ps is assumed, based on device
characteristics. The carrier sweep-out time in a typical p-i-n photodiode is expected to
be of this magnitude, limited by carrier transport. If the electrical pulses generated
at the output of the photodiode are wider, the transimpedance stage behaves as shown
in Fig. 4.7. For the same energy in the pulses, wider pulses produce a smaller peak
amplitude at the output; broadening the pulses to a full bit period gives the NRZ case.
To get the largest peak amplitude, we would like to use the narrowest pulses possible.
Figure 4.7: Pulse response of the transimpedance stage with varying pulse width (output voltage in a.u. versus time in ns; arrow indicates increasing pulse width)
Advantages and issues with short pulse operation
As seen above, short pulses can improve the sensitivity of the transimpedance receiver.
This sensitivity improvement for the entire receiver can be more than 3 dB. The
latency of this receiver can also be reduced significantly (up to 65%) by using short
pulses.
Short pulses also raise some issues. Since they create a fast transient response in the
receiver, this transient could cause inductive supply noise (L di/dt), which might reduce
the signal-to-noise ratio in large arrays. Also, the output pulse width generated by the
transimpedance frontend can vary with parameter variations. If the output pulses are too
short, they will not be detected by the decision stage, and if they are broader than the
bit period, data-dependent effects will degrade the receiver performance.
The integrating receiver described in the next section solves these problems.
4.2 Integrating receiver
The use of positive feedback for amplification and logic decision has been widely reported [85] [86] [87]. Based on a similar principle, the implementation of an integrating receiver is shown in Fig. 4.8. This receiver is based on the StrongARM latch described in Ref. [88]. In this implementation, differential input data is integrated at the input nodes for half the clock cycle (clock low), during which the rest of the circuit is held in a metastable state with both outputs at the supply voltage. In the next half cycle (clock high), a decision is made about the received data. At the end of this half cycle the input nodes are reset so that new data can be integrated (Fig. 4.9).
The output of this receiver is therefore valid for only half the bit period; for the other half it sits at the precharge voltage. A set-reset (SR) latch converts this output into data that is valid for the entire cycle, but the SR latch limits the speed of the receiver. The performance of the entire receiver can be improved by using a modified latch that produces a valid bit for the entire bit period, as demonstrated in [89]. A modified latch was not implemented in this work.
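The half-cycle behavior just described, and the way the SR latch restores a full-cycle output, can be sketched as an idealized logic-level model. In this Python sketch the function names, the choice of which output is pulled low for a 1, and the initial latch state are all assumptions; the analog integration itself is reduced to the value of the received bit.

```python
def sr_latch_nand(s_n, r_n, q):
    """Active-low SR latch: low S sets, low R resets, both high holds."""
    if s_n == 0 and r_n == 0:
        raise ValueError("both inputs low is not allowed")
    if s_n == 0:
        return 1
    if r_n == 0:
        return 0
    return q  # both inputs high: hold the previous state


def integrating_receiver(bits, q_init=0):
    """Logic-level trace: two (phase, (out, out_b), latched) entries per bit."""
    q = q_init
    trace = []
    for bit in bits:
        # Clock low: inputs integrate photocurrent; both outputs precharged high.
        out, out_b = 1, 1
        q = sr_latch_nand(out_b, out, q)
        trace.append(("integrate", (out, out_b), q))
        # Clock high: positive feedback resolves the sign of Delta V and pulls
        # one output low (which one encodes a 1 is an assumed convention).
        out, out_b = (1, 0) if bit else (0, 1)
        q = sr_latch_nand(out_b, out, q)
        trace.append(("evaluate", (out, out_b), q))
    return trace


for phase, outputs, latched in integrating_receiver([1, 0, 0, 1]):
    print(f"{phase:9s} outputs={outputs} latched={latched}")
```

Each value latched during an evaluation phase is held through the following integration phase, so the latched data is valid for the whole cycle, delayed by half a clock period.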
[Schematic: node labels in, clk, out, Vsupply, and Vc.]
Figure 4.8: Schematic of the integrating receiver front end.
[Timing diagram: clock alternating between integration and evaluation phases; NRZ input and its output; short-pulse RZ input and its output.]
Figure 4.9: Timing diagram of the operation of the integrating receiver with NRZ and short-pulse inputs.
The sensitivity of an integrating receiver is typically better than that of a transimpedance receiver because of the positive feedback in the latch, though it requires a clock synchronized to the data input at the front end. The synchronized clock is typically generated using a phase-locked loop (PLL) or a delay-locked loop (DLL); it can even be generated using a totem-pole diode pair, as described in Chapter 6. Data extraction with automatic clock synchronization was not implemented in the present work; the clock phase was aligned manually.
The voltage difference ∆V at the input nodes is a function of the total capacitance
of all the devices connected at the input nodes (Cin) and the difference of input optical
powers in the two beams (∆Pin). In a simplified form, this voltage difference can be
represented as
$$\Delta V = \frac{\Delta P_{in}\, R\, t}{C_{in}} \qquad (4.2)$$
where R is the responsivity of the photodiodes and t is the time of integration.
A larger ∆V gives a faster response and is less susceptible to noise. Eq. 4.2 suggests that the differential voltage generated by the input light can be increased by reducing Cin. Cin is dominated by the photodiode capacitance, which can be reduced by shrinking the diode, by using a different kind of diode such as a metal-semiconductor-metal (MSM) diode, or by using a silicon-on-insulator (SOI) process. For short pulses, the charge integrates over the pulse width, which is very short compared to the bit period; for NRZ, the integration time is half the clock cycle.
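As a numerical illustration of Eq. 4.2, the sketch below evaluates ∆V for NRZ input integrated over half a clock cycle at 600 Mbps. The responsivity, input capacitance, and power difference are hypothetical values chosen only to show the scaling, not measured parameters of the fabricated receiver.

```python
def delta_v(delta_p, responsivity, t_int, c_in):
    """Differential input voltage of Eq. 4.2: dV = dP * R * t / C_in."""
    return delta_p * responsivity * t_int / c_in


RESP = 0.5               # A/W, photodiode responsivity (hypothetical)
C_IN = 100e-15           # F, capacitance at each input node (hypothetical)
DELTA_P = 50e-6          # W, power difference between the two beams (hypothetical)
T_HALF = 0.5 / 600e6     # s, half a clock cycle at 600 Mbps (NRZ integration time)

dv = delta_v(DELTA_P, RESP, T_HALF, C_IN)
dv_small_c = delta_v(DELTA_P, RESP, T_HALF, C_IN / 2)
print(f"dV = {dv * 1e3:.0f} mV; with half the capacitance, {dv_small_c * 1e3:.0f} mV")
```

Because ∆V is inversely proportional to Cin, halving the input capacitance doubles the differential signal for the same light.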
Advantages of short pulses
The problems encountered in short-pulse operation with the transimpedance receiver are not present in this receiver. Because short pulses are integrated at the receiver front end, no spikes are generated in the receiver supply. Also, the receiver does not generate short pulses; instead it generates a 50% duty cycle output, which can easily be converted into NRZ data for subsequent use.
In a short-pulse link, an integrating receiver with a clocked front end has significant advantages over NRZ signaling. These advantages include sensitivity enhancement, tolerance to pulse arrival time, latency reduction, and improved supply noise performance. The following sections discuss these advantages.
Sensitivity enhancement
If the integration period is half the clock cycle, the timing diagram for this receiver is as shown in Fig. 4.9. NRZ input incident during the evaluation period is not integrated and is therefore not used. By contrast, a short pulse concentrates all its energy in a very brief interval, which falls entirely within the integration period. When the integration and evaluation periods are equal, half of the energy in the NRZ input is wasted, so the use of short pulses effectively gives a 3 dB (factor of two) enhancement in sensitivity.
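The 3 dB figure follows from counting how much of each bit's optical energy reaches the integrating node. This sketch assumes equal integration and evaluation periods, an NRZ bit with negligible edge times, and a hypothetical energy per bit and responsivity. Since ∆V in Eq. 4.2 is proportional to the integrated charge, the ratio of charges gives the optical power advantage, expressed as 10 log10 because it is a ratio of optical powers.

```python
import math


def integrated_charge(energy_per_bit, responsivity, fraction_in_window):
    """Charge collected during the integration window for one bit."""
    return responsivity * energy_per_bit * fraction_in_window


E_BIT = 100e-15  # J per bit (hypothetical)
RESP = 0.5       # A/W (hypothetical)

q_nrz = integrated_charge(E_BIT, RESP, 0.5)    # NRZ: the evaluation half is lost
q_short = integrated_charge(E_BIT, RESP, 1.0)  # short pulse: all energy is in the window

# Delta V scales with the integrated charge, so the optical power needed for
# the same Delta V scales inversely with the collected fraction.
advantage_db = 10 * math.log10(q_short / q_nrz)
print(f"short-pulse sensitivity advantage: {advantage_db:.2f} dB")
```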
Tolerance to pulse arrival time
[Timing diagram: clock alternating between integration and evaluation phases; NRZ and short-pulse inputs with arrival-tolerance margins t_nrz and t_sp.]
Figure 4.10: Input data arrival-tolerance margins illustrated for NRZ and short-pulse inputs.
If short pulses are used, they can arrive at any time during the integration period and all the charge will still be integrated. For very short pulses, the flexibility in arrival time is therefore about half a bit period. A similar flexibility exists with NRZ data, but it depends on the rise and fall times of the bit, which are typically much larger than those of the short pulses; the margin is reduced by the sum of the rise and fall times of the NRZ input.
Fig. 4.10 shows how the rise and fall times of the NRZ input reduce the tolerance to the arrival time of data. In this figure, t_nrz and t_sp are the arrival-time tolerances for NRZ and short-pulse input, respectively.
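The margins t_sp and t_nrz can be estimated under a simple assumption: the integration window lasts half the bit period, a short pulse must fall entirely inside it, and for NRZ the window must stay on the flat part of the bit, with both transitions assumed to lie within the bit period. The 400 Mbps rate matches the link tested later in this chapter; the pulse width and edge times are hypothetical.

```python
def tolerance_short_pulse(bit_period, pulse_width):
    """Arrival-time range over which a short pulse lands entirely inside an
    integration window assumed to last half the bit period."""
    return max(0.0, bit_period / 2 - pulse_width)


def tolerance_nrz(bit_period, t_rise, t_fall):
    """Arrival-time range over which the half-period integration window stays
    on the flat top of an NRZ bit, assuming both transitions lie within the bit."""
    return max(0.0, bit_period / 2 - (t_rise + t_fall))


T_BIT = 1 / 400e6                               # s, 400 Mbps
t_sp = tolerance_short_pulse(T_BIT, 1e-12)      # 1 ps optical pulse (hypothetical)
t_nrz = tolerance_nrz(T_BIT, 200e-12, 200e-12)  # 200 ps edges (hypothetical)
print(f"t_sp = {t_sp * 1e12:.0f} ps, t_nrz = {t_nrz * 1e12:.0f} ps")
```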
Latency reduction
The time taken to generate a valid output from the time of arrival of data can be reduced by using short pulses instead of NRZ data. The latency (the total delay between input and output) is reduced at the expense of the timing margin: if the pulses arrive closer to the end of the integration period, the delay from input to output is reduced, but the receiver becomes more sensitive to variations in pulse arrival time. For example, if the pulses are delayed too much, they fall outside the integration period and their charge is not integrated, causing an error at the output. The latency reduction with short pulses is explained in detail in Chapter 5.
4.3 Totem-pole diode pair receiver
The receivers described above have a relatively high latency because they have one or more stages of amplification, each of which introduces delay. Controlling or limiting latency, however, is crucial for on-chip interconnects. The amplification stages also add skew and jitter, which could be a problem in a large receiver array or for optical clock injection. Eliminating the amplification stages and generating a full swing directly at the diodes reduces the latency of the receiver and avoids the skew and jitter associated with amplification. This leads to a receiver based on a totem-pole diode pair, or, in short, a “totem-pole” receiver.
A stacked diode pair (“totem-pole”) is connected to a high-impedance node, possibly the input of a buffer, to create an integrating front end (Fig. 4.11). This receiver qualifies as an integrating receiver but is treated separately here because of its interesting characteristics for short-pulse operation, especially for clock injection. If the data beam is incident on the top diode, current flows into the circuit; if it is incident on the bottom diode, current flows out of the circuit. As soon as node “in” is charged to a supply rail, the diodes become forward biased, clamping the voltage on node “in”.
[Figure: totem-pole diode pair driving node in.]
Figure 4.11: Totem-pole diode pair connected to a high-impedance input node of an inverter.
This receiver trades the electrical gain stages for additional optical power. A few researchers have considered using only the diode as a receiver, but primarily for telecommunications applications, where photons are scarce. Williams et al. used the photodiode with an erbium-doped fiber amplifier (EDFA) to boost the light intensity and generate a large voltage swing [90]. Yoneyama et al. have considered a hypothetical receiver consisting only of a photodiode and estimated the power dissipation in links as a function of bit error rate [91]. In contrast to telecommunications, interconnects tend to have larger optical power at the receiver, making this receiver architecture more feasible. Also, in the references mentioned above, the output of the photodiode is connected to a 50 Ω resistance, which might require more optical power than driving a high-impedance node (e.g., a capacitance). The capacitance seen by a flip-chip bonded photodiode connected to an inverter could be below 100 fF, making this high-impedance configuration more attractive.
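A rough comparison shows why a high-impedance node needs far less light than a 50 Ω load. The sketch assumes a full swing of 2.5 V, a responsivity of 0.5 A/W, a 100 fF node, and 1 Gbps operation; all are illustrative values rather than parameters of the cited receivers.

```python
def power_for_resistor(v_swing, r_load, responsivity):
    """Optical power needed to hold v_swing across a resistive load."""
    return v_swing / r_load / responsivity


def energy_for_capacitor(v_swing, c_node, responsivity):
    """Optical energy needed to charge a capacitive node through v_swing (Q = C*V)."""
    return c_node * v_swing / responsivity


V_SWING = 2.5   # V, full swing (hypothetical 2.5 V supply)
RESP = 0.5      # A/W (hypothetical)
BIT = 1e-9      # s, 1 Gbps (hypothetical)

p_50 = power_for_resistor(V_SWING, 50.0, RESP)
e_50 = p_50 * BIT                      # energy if that power lasts one bit
e_cap = energy_for_capacitor(V_SWING, 100e-15, RESP)
print(f"50 ohm load: {p_50 * 1e3:.0f} mW, {e_50 * 1e12:.0f} pJ per bit")
print(f"100 fF node: {e_cap * 1e15:.0f} fJ per bit ({e_50 / e_cap:.0f}x less)")
```

Under these assumptions the 50 Ω load needs 100 mW of optical power (100 pJ per bit), while the 100 fF node needs about 500 fJ per bit, roughly 200 times less.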
Apart from its simplicity, this structure has several advantages. The receiver can operate at very high speeds with short pulses, because the charging time of the input node is determined by the carrier transit time in the diodes, which is of the order of a few picoseconds. Full-swing signals are generated in a single stage, eliminating the jitter and skew of amplifier stages. A single stage also reduces the latency of the receiver, which could make on-chip optical interconnects feasible.
The optical input power requirement for this receiver depends on the responsivity of the diodes and the capacitance of the input node, which is dominated by the diode capacitance. In the current work, silicon p-n diodes are implemented. Monolithic diodes are more appealing for clock injection because of their potential for low capacitance.
The analysis of latency of this receiver is given in Chapter 5 and the details of its
implementation for clock injection are in Chapter 6.
4.4 Fabrication and testing
Fabrication details and measurements of the transimpedance and integrating receivers
are presented in this section. Measurements for the effect of supply noise on these
receivers are also given.
4.4.1 Transimpedance receiver
There are many ways of implementing a transimpedance receiver. In this work, an inverter-amplifier-based implementation is chosen because of its simplicity and small footprint, which could allow very high densities of optical I/O.
The transimpedance receiver was fabricated in a 0.25 µm technology. A schematic of the entire circuit is shown in Fig. 4.12; this architecture is analyzed in detail in Ref. [82]. The first stage of this receiver is a transimpedance stage whose feedback resistor is implemented with a PMOS transistor. The effective feedback resistance of the PMOS device can be changed by changing the voltage at node vtune, and a small PMOS device can provide large resistance values. The transimpedance stage also includes clamping diodes, formed by source- and gate-connected NMOS transistors, that limit the output swing and thereby increase the dynamic range of the receiver.
This receiver has a very small footprint of 15 µm × 17 µm, which allows high-density integration. The circuit was simulated in the circuit simulator SPICE with vtune at 0 V. The transimpedance gain of the first stage was about 5 kΩ, and the power dissipation of the receiver was approximately 3 mW. Simulations show that with 100 fF of diode capacitance this receiver can work up to ∼1.5 Gbps with 10 µA of average photocurrent for NRZ input. The capacitance of the bonded devices was about 260 fF, with which this receiver worked at much lower speeds. The simulated performance of this receiver with 10 µA of average photocurrent is shown in Fig. 4.13.
[Schematic: nodes in, out, and vtune; clamping diodes at the transimpedance stage; transistor widths annotated in λ.]
Figure 4.12: Schematic of the transimpedance receiver. Transistor widths are given in λ, where λ = 0.2 µm for the technology used. All transistors are minimum length.
With no light incident, the receiver output stays at one supply rail. When a short pulse is incident on a photodiode, the output either switches to the other rail or stays on the same rail, depending on which diode is illuminated. If the receiver switches to the other rail, it must switch back to the original rail within a bit period to avoid inter-symbol interference. The recovery time of the receiver therefore determines the maximum speed at which it can operate with short pulses. For short-pulse input with ∼520 fF of total device capacitance, the receiver works up to 200 Mbps in simulation; with lower capacitance it can operate at much higher speeds with short pulses in simulation.
Receivers on the chip were tested using optical readout; no bit error rate tester was connected on this chip. The receiver was tested at 600 Mbps with NRZ data generated from directly modulated lasers, and the eye diagram obtained by optical readout of the modulator is shown in Fig. 4.14. The eye is barely open at this speed because of the limited ability of the modulator driver to drive 260 fF; the large capacitance also limited the speed of operation.
[Figure: SPICE output voltage (0–2.5 V) vs. time (0–30 ns); top: 1 Gbps with 260 fF diode capacitance; bottom: 1.5 Gbps with 100 fF diode capacitance.]
Figure 4.13: SPICE simulation of the transimpedance receiver with 10 µA average photocurrent. The voltage at node out is shown. The top curve is for 1 Gbps operation of the receiver with 260 fF of diode capacitance; the bottom curve shows operation at 1.5 Gbps with 100 fF of diode capacitance.
The receiver performance was also tested using electrical samplers. In this way the modulator driver was not a limitation, since the high-capacitance modulator diodes were not involved. The receiver was verified to work up to 900 Mbps with NRZ input.
Short-pulse testing of the receiver was done at only 80 Mbps because of the speed limitation of the laser. The receiver output drove a modulator driver, which was read out with a cw beam, as shown in Fig. 4.15. An available 400 Mbps short-pulse laser did not have sufficient output power to test the receiver in a chip-to-chip link.
Figure 4.14: Eye diagram of the transimpedance receiver operation with NRZ input at 600 Mb/s. 26 µA of average photocurrent is injected in each beam.
Figure 4.15: Eye diagram of the transimpedance receiver output voltage with short-pulse input at 80 Mb/s.
Because of the limitation in driving large capacitance, no sensitivity enhancement
was measured for this receiver. According to the simulations, with 40 fF of diode
capacitance, there is ∼ 5 dB of sensitivity enhancement.
4.4.2 Integrating receiver
The schematic of the integrating receiver circuit is shown in Fig. 4.16. This circuit was fabricated in a 0.5 µm technology. According to simulations, the average electrical power consumption of this receiver is 2.3 mW. A higher device capacitance for this circuit requires a larger optical power for the same speed of operation.
The integrating receiver was operated at 600 Mbps with roughly 50 µW (∼14 µA photocurrent) of average power in each beam from directly modulated lasers with NRZ data input. The readout was done optically; the eye diagram of the receiver operation is shown in Fig. 4.17.
[Schematic: node labels in, clk, out, Vsupply, and Vc, annotated with transistor widths in λ.]
Figure 4.16: Schematic of the integrating receiver fabricated in the 0.5 µm technology. Transistor widths are shown in λ, where λ is 0.35 µm. All transistors are minimum length.
Figure 4.17: Operation of the integrating receiver with optical readout at 600 Mb/s.
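The quoted operating point can be cross-checked with two conversions: the effective responsivity implied by 14 µA from 50 µW, and the same power expressed in dBm, the unit used on the BER plots of this section. The helper `dbm` is a hypothetical name, and the effective responsivity is simply the ratio of the two quoted numbers (it may include coupling losses), not a separately measured device parameter.

```python
import math


def dbm(power_watts):
    """Optical power in dBm (decibels relative to 1 mW)."""
    return 10 * math.log10(power_watts / 1e-3)


P_BEAM = 50e-6   # W, average power per beam quoted above
I_PHOTO = 14e-6  # A, approximate average photocurrent quoted above

print(f"effective responsivity: {I_PHOTO / P_BEAM:.2f} A/W")
print(f"power per beam: {dbm(P_BEAM):.1f} dBm")
```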
The receiver performance with short-pulse and NRZ input data was compared by operating the receiver in an optical link. The link was operated at 400 Mbps with the pseudo-random sequence generator driving the modulator. A custom, externally driven mode-locked laser was used to generate short pulses at a 400 MHz repetition rate. The output power of this laser was approximately 5 mW, which was just sufficient for testing this link [92].
The receiver output was fed to the BER tester, and the number of errors was counted to obtain the bit error rate. Fig. 4.18 shows the BER vs. signal power per beam at the receiver. Operation with short pulses required about 3 dB less power than NRZ input (∼3.1 dB in Fig. 4.18), verifying the sensitivity enhancement discussed in Section 4.2.
[Figure: bit error rate (10^-3 to 10^-9) vs. power per beam (−16 to −11.5 dBm) for short-pulse and NRZ input; the curves are separated by ∼3.1 dB.]
Figure 4.18: Sensitivity comparison for NRZ and short-pulse data for the integrating receiver operating at 400 Mbps in a chip-to-chip link.
4.4.3 Measurement with supply noise
Digital circuits placed close to the receivers can inject noise into them through the supply line and the substrate [93] [94]. Also, a large number of receivers connected to the same supply may switch simultaneously and generate large current spikes on the supply line; because of the impedance of the supply line, these current spikes cause voltage variations. Since the receivers amplify small analog signals, they are susceptible to these noise sources, and voltage noise on the supply lines might cause jitter at the receiver outputs.
Using the pump-probe method described in detail in Chapter 5, the delay of the transimpedance receiver was mapped as a function of supply voltage [95]. This curve is shown in Fig. 4.19. The delay varies by ∼11 ps per 100 mV, which shows that this receiver is quite sensitive to supply variations. With a large receiver array connected to a single supply, supply voltage fluctuations of a few hundred mV are possible, which would result in significant delay variation. The resulting jitter can add a performance penalty to the link.
[Figure: receiver delay (530–630 ps) vs. supply voltage (2.1–3.0 V).]
Figure 4.19: Transimpedance receiver delay variation as a function of supply voltage. This measurement was done via the pump-probe technique. The nominal supply voltage was 2.5 V.
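To put the measured supply sensitivity in perspective, the sketch below converts supply fluctuations into delay variation using the ∼11 ps per 100 mV slope of Fig. 4.19. It assumes the delay-vs-supply curve is linear over the range of interest and expresses the result as a fraction of a 1 ns bit period (1 Gbps, a hypothetical data rate).

```python
SLOPE = 11e-12 / 0.1  # s per V, ~11 ps per 100 mV (Fig. 4.19)
BIT = 1e-9            # s, 1 Gbps bit period (hypothetical data rate)


def delay_variation(supply_swing):
    """Delay change for a supply excursion, assuming a linear delay-vs-supply curve."""
    return SLOPE * supply_swing


for swing in (0.1, 0.2, 0.3):  # "a few hundred mV" of supply fluctuation
    dt = delay_variation(swing)
    print(f"{swing * 1e3:.0f} mV -> {dt * 1e12:.0f} ps ({100 * dt / BIT:.1f}% of a bit)")
```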
To test the performance of the integrating receiver with supply noise, a chip-to-chip link was operated at 100 Mbps with NRZ modulation. Because high-frequency supply noise could not be injected past the bypass capacitors, a sinusoidal noise signal at 1 kHz was injected on the receiver supply from an external source. Using the on-chip BER tester, it was possible to quantify the effect of the injected supply noise. Bit error rate curves vs. total optical power in the link are plotted in Fig. 4.20. The power penalty was only 0.12 dB for 100 mV of supply noise [96].
To characterize the effect of substrate noise on the integrating receiver, voltage-controlled oscillators in the vicinity of the receivers were operated. No measurable power penalty on the operation of the link was observed while these noise generators were running.
[Figure: bit error rate (10^-3 to 10^-9) vs. power per beam (−8 to −7 dBm) with no noise and with 0.1, 0.2, and 0.3 Vpp supply noise.]
Figure 4.20: Bit error rate curves of integrating receiver operation in a link at 100 Mbps with NRZ data. Sinusoidal noise at 1 kHz with different peak-to-peak amplitudes was injected in the supply.
4.5 Summary
Transimpedance, integrating, and totem-pole receiver topologies were discussed in this chapter. Although these topologies have been examined in the literature, little work has been done to analyze them for short-pulse operation. This chapter has examined the operation of these receivers with short-pulse (RZ) input and the possible advantages and issues of this operation.
Transimpedance receiver sensitivity can be enhanced by using short pulses instead of NRZ data, though short-pulse operation might generate larger supply noise. A receiver fabricated in the 0.25 µm technology was shown to be prone to jitter from supply noise, with a delay variation of ∼11 ps per 100 mV of supply voltage variation measured using the pump-probe technique. The operation of this receiver was verified up to 600 Mbps with optical readout and up to 900 Mbps with on-chip samplers. The performance of this receiver was degraded by the larger-than-designed capacitance of the flip-chip bonded diodes.
The integrating receiver described above has higher sensitivity than the transimpedance receiver because it amplifies the signal with positive feedback. Being fully differential, it is also more immune to supply and substrate noise: in a chip-to-chip link, an optical power penalty of only 0.12 dB was measured for 100 mV of supply noise. 600 Mbps NRZ operation with directly modulated lasers was demonstrated, and the receiver showed ∼3 dB of sensitivity enhancement in the link with short-pulse operation compared to NRZ.
These results suggest that a fully differential integrating receiver is well suited for operation with short optical pulses.
Chapter 5
Latency in Interconnects
In connections between and within electronic chips, total latency is a very important parameter in determining system performance. As CMOS linewidths scale, processor clock speeds increase, making it difficult to run an entire chip synchronously; in other words, transferring data across a chip within a clock cycle is becoming difficult. According to the International Technology Roadmap for Semiconductors (ITRS) estimate [42], gate delay and local interconnect delay decrease as the technology scales (Fig. 5.1), but the delay of global interconnects, with and without repeaters, is continuously increasing relative to the clock period.
The propagation velocity of global interconnects with repeaters is a small fraction of the velocity of light (10–20%) and is not expected to improve significantly [7] [16] [97]. For 0.25 µm technology the delay of global lines is less than a clock cycle, but for future technologies it will be longer than a clock cycle. If signals can be propagated at a significant fraction of the velocity of light, e.g., > 0.3c, the communication delay will be less than a clock cycle down to 0.1 µm technology [7].
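The effect of propagation velocity on delay is simple time-of-flight arithmetic. The sketch assumes a 2 cm global line, a hypothetical length chosen only for illustration, and compares the 10–20% of c achievable with repeated wires against the 0.3c target mentioned above.

```python
C_LIGHT = 3.0e8  # m/s, approximate speed of light in vacuum


def time_of_flight(length, velocity_fraction):
    """Propagation delay over `length` metres at velocity_fraction * c."""
    return length / (velocity_fraction * C_LIGHT)


LENGTH = 0.02  # m, a 2 cm global line (hypothetical)
for fraction in (0.1, 0.2, 0.3):
    print(f"{fraction:.1f}c: {time_of_flight(LENGTH, fraction) * 1e12:.0f} ps")
```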
It might be possible to use optics to provide communication across chips at a significant fraction of the velocity of light. For optics to be feasible, the delay in the transmitter and the receiver has to be very low, of the order of a few gate delays. The propagation delay in optical media cannot be altered, though propagation is relatively fast (∼0.67c in glass). Because transmitter and receiver circuits are designed in silicon CMOS, their speed is likely to keep pace with that of the logic on silicon chips as the technology scales [8].
Dambre et al. [98] have shown that with low-latency optical links, three-dimensional optoelectronic multi-FPGAs outperform two-dimensional electronic FPGAs. In a recent paper, Collet et al. [99] concluded that since the most critical issue in computer architecture is the access time to main memory, signal latency is of critical importance in implementing optical interconnects. In Ref. [100], concerns are expressed about increased latency in optical interconnects compared to their electrical counterparts because of the added electrical-to-optical and optical-to-electrical conversions. However, with the advanced integration techniques described in Chapter 3, the parasitics associated with optical components can be reduced significantly, reducing the latency of driving them. Kyriakis-Bitzaros et al. [101], on the basis of a realistic model in 0.8 µm CMOS technology, demonstrated that the latency of an optical link is lower than that of an electrical link even for sub-centimeter line lengths.
Most work until now has looked at the latency of an optical link with the NRZ data format and a VCSEL or an edge-emitting laser as the transmitter. The turn-on delay of lasers, which depends on the strength and waveform of the electrical drive signal, could add significant latency [18]. Turn-on delay can be eliminated by using modulators instead of VCSELs, and the latency of optical interconnects can be reduced significantly by using short pulses with modulators: the fast optical rise time and the concentration of all the energy in short pulses both work towards reducing latency. In this chapter we explore the latency of optical interconnects operating with short pulses.
Figure 5.1: ITRS projection of on-chip electrical interconnect delays with technology scaling [42].
[Figure: chain of driver delay, modulator delay, delay in propagation, and receiver delay.]
Figure 5.2: Components of latency in a modulator-based interconnect system.
Optical interconnects have three components: the transmitter, the propagation medium, and the receiver. A schematic of a modulator-based optical interconnect system is shown in Fig. 5.2. The transmitter can be easily optimized because it essentially consists only of digital components (its input is a digital logic level). For an MQW modulator, the driver is typically an electrical buffer chain, whose optimization is discussed in Ref. [102]. The receiver, having an analog input, offers the most room for improvement; a similar viewpoint was expressed in Ref. [103]. In the following sections we analyze the latency of different receiver architectures with short-pulse operation. Here, signal latency is defined as the larger of the rise and fall delays between the input and output waveforms, each measured at 50% of the signal amplitude.
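The latency definition above can be applied to sampled waveforms as in the following sketch: each edge delay is the time between the 50% crossings of input and output, found by linear interpolation between samples, and the latency is the larger of the rise and fall delays. The function names and the synthetic ramp waveforms are hypothetical; the 50% level is taken halfway between each waveform's minimum and maximum.

```python
def crossing_time(t, v, level, rising):
    """First time v crosses `level` in the given direction (linear interpolation)."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def edge_delay(t, v_in, v_out, in_rising, out_rising):
    """Delay between the 50% points of an input edge and the output edge it causes."""
    mid_in = (min(v_in) + max(v_in)) / 2
    mid_out = (min(v_out) + max(v_out)) / 2
    return (crossing_time(t, v_out, mid_out, out_rising)
            - crossing_time(t, v_in, mid_in, in_rising))


def latency(rise_delay, fall_delay):
    """Signal latency as defined above: the larger of the two edge delays."""
    return max(rise_delay, fall_delay)


def ramp(t, t0, t1, v_start, v_end):
    """Synthetic waveform: constant, linear transition from t0 to t1, constant."""
    return [v_start if x <= t0 else v_end if x >= t1
            else v_start + (v_end - v_start) * (x - t0) / (t1 - t0) for x in t]


t = [0.5 * k for k in range(41)]  # 0 to 20 ps in 0.5 ps steps
rise = edge_delay(t, ramp(t, 1, 3, 0, 1), ramp(t, 5, 7, 0, 2.5), True, True)
fall = edge_delay(t, ramp(t, 1, 3, 1, 0), ramp(t, 6, 8, 2.5, 0), False, False)
print(f"rise delay {rise} ps, fall delay {fall} ps, latency {latency(rise, fall)} ps")
```

For an inverting receiver, `out_rising` is simply set opposite to `in_rising`.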
The organization of this chapter is as follows. The next three sections address
the latency of three different receiver architectures: transimpedance, integrating, and
totem-pole diode pair. The latency analysis of receivers via modeling is verified by
SPICE simulations. Experimental measurements of the latency of the transimpedance
receiver are also presented. The scaling of latency with technology is considered in
Section 5.4. Finally, the conclusions are presented.
5.1 Transimpedance receivers
The transimpedance receiver is the most commonly used receiver in optical communication. The latency of the transimpedance receiver with the NRZ data format has been analyzed in Ref. [103], and a measurement of latency for one implementation is reported in [104]. For this work, such receivers were fabricated in 0.25 µm CMOS technology, as mentioned in Chapter 4. The circuit was analyzed by simulation in the circuit simulator SPICE and with a first-order analytic model, which gives a better understanding of the latency in this receiver. Intuitively, we would expect short pulses to lower the latency of the transimpedance receiver compared to NRZ. For the same pulse energy, a short pulse generates a larger maximum amplitude at the output of the transimpedance stage, which reduces the gain required from later stages and hence the latency. The transimpedance stage is also charged faster by a short pulse than by NRZ input, because the energy is concentrated in a very short period (Fig. 5.3). The following section deals with the modeling of the latency, and the section after that gives the details of the measurement setup and results.
[Figure: transimpedance stage, which a short pulse charges faster, producing a larger amplitude at its output node out; the following post-amplifier chain then requires a smaller gain.]
Figure 5.3: Mechanism of latency reduction in a transimpedance receiver with short-pulse input.
5.1.1 Modeling of latency
To understand the mechanism of latency in receivers, a first-order model of the transimpedance receiver is analyzed. This model is shown in Fig. 5.4. The first stage is the transimpedance amplifier with a finite gain-bandwidth product, modeled by an ideal amplifier with a series output impedance Ra (equal to 1/gds of the transistors) together with the output capacitance. All the capacitances at the output of the front-end amplifier, including the input capacitance of the next stage, are combined into a single capacitance, CL in the figure. The gain stages are modeled as open-loop amplifiers with a finite gain-bandwidth product. After the swing at the output of the transimpedance amplifier is computed, the required gain per stage (Gps) is calculated for the post-amplifier chain. Because the gain-bandwidth product is finite, the time constant of each stage follows from the required Gps. A first-order estimate of the latency is obtained by adding the time constants of all these stages. A step input simulates the NRZ response, and a 10 ps pulse simulates the pulse response.
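The post-amplifier part of this first-order estimate can be sketched as follows. The code assumes that the chain must raise the transimpedance-stage swing to a hypothetical 2.5 V logic swing, that the total gain is split equally among N stages, and that each stage's time constant is 1/(2π f3dB) with f3dB = GBW/Gps; the 10 GHz gain-bandwidth product is the value assumed for this model. The transimpedance stage's own delay is not included.

```python
import math


def stage_delay(gain, gbw_hz):
    """Time constant of one open-loop stage, 1/(2*pi*f_3dB) with f_3dB = GBW/gain."""
    return gain / (2 * math.pi * gbw_hz)


def chain_latency(v_tia, v_required, n_stages, gbw_hz):
    """First-order post-amplifier latency: the total gain A = v_required / v_tia is
    split equally, G_ps = A**(1/N), and the stage time constants are added."""
    total_gain = max(1.0, v_required / v_tia)
    g_ps = total_gain ** (1.0 / n_stages)
    return n_stages * stage_delay(g_ps, gbw_hz)


GBW = 10e9        # Hz, gain-bandwidth product of each post-amplifier stage
V_REQUIRED = 2.5  # V, full logic swing (hypothetical)
for v_tia in (0.05, 0.1, 0.25):  # hypothetical transimpedance-stage swings
    delay = chain_latency(v_tia, V_REQUIRED, 2, GBW)
    print(f"swing {v_tia * 1e3:4.0f} mV -> chain delay {delay * 1e12:5.1f} ps")
```

A larger swing from the transimpedance stage, as produced by a short pulse, lowers the required total gain and therefore the chain delay.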
The following parameter values are assumed for the simulation; they correspond to the 0.25 µm technology in which this receiver was fabricated, with low-capacitance, high-responsivity photodetectors.
• Total capacitance at the input of the receiver (Cin) = 90 fF
• Feedback resistance (Rf) = 5 kΩ
• Output impedance of the amplifier (Ra) = 8 kΩ
• Total capacitive loading at the output of the amplifier (CL) = 60 fF
• Open-loop gain of the amplifier (A1) = 15
• Gain-bandwidth product of each post-amplifier stage = 10 GHz
• Speed of operation = 1 Gbps
• Photodiode responsivity = 0.5 A/W
• Number of post-amplifier stages = 2
• Pulse width of the electrical current pulses generated by the photodiode (limited by the transit time of carriers in the intrinsic region) = 10 ps
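[Schematic: input current iin and capacitance Cin at node vin; inverting amplifier −A1 with feedback resistor Rf, output resistance Ra, and load capacitance CL at node vout, followed by a variable-length amplifier chain (amp1, amp2, amp3).]
Figure 5.4: First-order model of a transimpedance receiver with a variable-length post-amplifier chain.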
The capacitance of each photodiode is assumed to be 40 fF, which is close to the capacitance reported for flip-chip bonded MQW diodes [57]. The capacitance achieved in the current work was much higher, but in future runs it is expected to be below 40 fF. The transfer function of the transimpedance stage is given by
$$H(s) = \frac{V_{out}(s)}{I_{in}(s)} = \frac{R_a - A_1 R_f}{1 + A_1 + s\,(R_f C_{in} + R_a C_{in} + R_a C_L) + s^2 R_f C_{in} R_a C_L} \qquad (5.1)$$
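Eq. 5.1 can be turned into time-domain responses numerically. The Python sketch below rewrites it as the differential equation b2·v'' + b1·v' + (1 + A1)·v = (Ra − A1·Rf)·i(t), with b1 = RfCin + RaCin + RaCL and b2 = RfCinRaCL, and integrates it with a fixed-step fourth-order Runge-Kutta method. The solver, step size, and the 100 fJ example energy per bit are our choices, not necessarily those used for Fig. 5.5. The default parameters are the values listed above, and the photocurrent R·E is spread over one 1 ns bit (NRZ step) or concentrated in a 10 ps rectangle (short pulse).

```python
def tia_response(i_in, t_end, dt, c_in=90e-15, r_f=5e3, r_a=8e3,
                 c_l=60e-15, a1=15.0):
    """Output voltage of the transimpedance stage of Eq. 5.1 driven by i_in(t),
    integrated with a fixed-step fourth-order Runge-Kutta method."""
    k = r_a - a1 * r_f                         # numerator of Eq. 5.1
    a0 = 1.0 + a1                              # constant term of the denominator
    b1 = r_f * c_in + r_a * c_in + r_a * c_l   # coefficient of s
    b2 = r_f * c_in * r_a * c_l                # coefficient of s^2

    def deriv(t, v, dv):
        return dv, (k * i_in(t) - a0 * v - b1 * dv) / b2

    v, dv, t = 0.0, 0.0, 0.0
    samples = [v]
    for _ in range(int(round(t_end / dt))):
        k1v, k1d = deriv(t, v, dv)
        k2v, k2d = deriv(t + dt / 2, v + dt / 2 * k1v, dv + dt / 2 * k1d)
        k3v, k3d = deriv(t + dt / 2, v + dt / 2 * k2v, dv + dt / 2 * k2d)
        k4v, k4d = deriv(t + dt, v + dt * k3v, dv + dt * k3d)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        dv += dt / 6 * (k1d + 2 * k2d + 2 * k3d + k4d)
        t += dt
        samples.append(v)
    return samples


RESPONSIVITY = 0.5        # A/W
BIT = 1e-9                # s, 1 Gbps
WIDTH = 10e-12            # s, electrical pulse width
E_BIT = 100e-15           # J, example optical energy per bit (hypothetical)
Q = RESPONSIVITY * E_BIT  # photogenerated charge per bit


def step_current(t):
    """NRZ: the bit's charge delivered as a constant current (step input)."""
    return Q / BIT


def pulse_current(t):
    """Short pulse: the same charge delivered within 10 ps."""
    return Q / WIDTH if t < WIDTH else 0.0


DT = 0.1e-12
v_step = tia_response(step_current, 2e-9, DT)
v_pulse = tia_response(pulse_current, 2e-9, DT)
print(f"NRZ step peak:    {max(abs(v) for v in v_step):.3f} V")
print(f"short-pulse peak: {max(abs(v) for v in v_pulse):.3f} V")
```

For the same energy, the short pulse produces a much larger peak at the transimpedance output than the NRZ step, whose response settles to the low-frequency gain (Ra − A1·Rf)/(1 + A1) ≈ −4.2 kΩ times the photocurrent. Re-running with `c_in` set to 50 fF or 130 fF shows the pulse peak falling as Cin grows.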
Based on this transfer function, pulse and step responses were computed for the transimpedance stage. Energies per bit (also referred to as pulse energies) were computed for 1 Gbps operation of the receiver. The latency of the receiver for different input pulse energies is shown in Fig. 5.5. This result shows a latency reduction of ∼65% for large pulse energies when short pulses are used instead of NRZ [105]. This is a very large reduction in latency, which could make optical interconnects competitive for on-chip connections. These results agree very well with detailed SPICE simulations of the transimpedance receiver fabricated on this chip. This validates our first-order model, which we can therefore use to explore the effect of different parameters on the latency of the receiver.
[Figure: receiver delay (0–500 ps) vs. optical energy per bit (0–400 fJ) for short-pulse and NRZ input; SPICE simulation results marked “x”.]
Figure 5.5: Pulse energy vs. delay for short-pulse and NRZ input for the first-order model. Corresponding SPICE simulations are denoted with “x”.
Consider how the number of stages in the post-amplifier chain affects latency. Suppose the total gain needed from the chain is A and the number of stages is N. The gain required per stage is then $G_{ps} = A^{1/N}$. Because each stage has a finite gain-bandwidth product, the delay per stage (the inverse of its bandwidth) is proportional to the gain required per stage. Hence the total delay of the chain (τ) scales as

$$\tau \propto N A^{1/N} \qquad (5.2)$$

$G_{ps} = A^{1/N} = e^{(\ln A)/N}$ falls rapidly as the number of stages N increases. For small N, the decrease of $A^{1/N}$ dominates Eq. 5.2. For larger N, the linear increase of N dominates. This behavior can be seen in Fig. 5.6.

Intuitively, for a large N, increasing N to N + 1 reduces the gain per stage only slightly. The delay per stage therefore also falls only slightly, and the extra stage increases the total delay (the sum of the delays of all stages). For a small N, on the other hand, increasing N to N + 1 reduces the gain per stage substantially. This causes a large reduction in the delay per stage, so the overall delay falls even with one extra stage.
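Eq. 5.2 also yields the optimum stage count directly. Treating N as continuous, $d\tau/dN \propto A^{1/N}(1 - \ln A / N)$. This vanishes at $N^* = \ln A$, where $\tau_{min} \propto e \ln A$.

The sketch below assumes only the proportionality of Eq. 5.2, and its function names are illustrative. It searches the integer stage counts. For the three total gains of Fig. 5.6 it selects N = 3, 5, and 8, close to ln A ≈ 3.0, 5.3, and 7.6.

```python
import math

def chain_delay(total_gain, n_stages):
    """Relative delay N * A**(1/N) of an N-stage chain (Eq. 5.2, arbitrary units)."""
    return n_stages * total_gain ** (1.0 / n_stages)

def best_stage_count(total_gain, max_stages=40):
    """Integer stage count in [1, max_stages] that minimizes the chain delay."""
    return min(range(1, max_stages + 1), key=lambda n: chain_delay(total_gain, n))

for A in (20, 200, 2000):
    n = best_stage_count(A)
    print(f"A = {A:5d}: best N = {n}, ln A = {math.log(A):.2f}, "
          f"relative delay = {chain_delay(A, n):.2f}")
```

In the real receiver the required gain A itself depends on the pulse energy. This is why the optimum shifts toward fewer stages as the pulse energy rises.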
Figure 5.6: Total delay ($N A^{1/N}$) vs. number of post-amplifier stages N for total gain A = 20, 200, and 2000, assuming a constant gain-bandwidth product for all stages.
Fig. 5.7 plots the receiver delay vs. the number of stages for different input optical energies per bit (pulse energies). It follows the same pattern as Fig. 5.6.

Fig. 5.8 shows the calculated latencies of the receiver vs. pulse energy for 2-stage and 3-stage post-amplifiers. As the pulse energy increases, less gain is required. Above a certain crossover pulse energy, the delay is therefore minimized by a smaller number of stages. In this figure the crossover occurs at ∼70 fJ for NRZ, whereas for short pulses it lies below the range of plotted pulse energies.

Figure 5.7: Receiver delay (ps) vs. number of post-amplifier stages for short-pulse and NRZ input at pulse energies of 30 fJ and 100 fJ.

Figure 5.8: Receiver delay (ps) vs. pulse energy (fJ) for 2 and 3 post-amplifier stages with short-pulse and NRZ input.
In this section we saw that short pulses can reduce the latency of the transimpedance receiver very significantly (∼65%) compared to NRZ data. The first-order model and SPICE simulations agree closely. Using the first-order model, we also found that for a given input pulse energy there is an optimum number of stages that minimizes latency, and this optimum may differ for short-pulse and NRZ input. As a rule of thumb, for reasonable pulse energies the latency is minimized with two to five post-amplifier stages. Measurement results and setup details are given in the next section, where the results are seen to verify the simulations of this section.
5.1.2 Measurement of latency
The results in the previous section suggest that short pulses significantly improve latency. To verify this, the latency of the receiver-modulator driver pair was measured experimentally. The circuits were fabricated in 0.25 µm CMOS technology. The optical devices, multiple-quantum-well diodes, were flip-chip bonded using the process described in Chapter 3.

An optical pump-probe setup was used for the measurement [95]. Short pulses (∼150 fs) from a Ti:sapphire modelocked laser at 850 nm were used as the pump and probe beams, as illustrated in Fig. 5.9. Two beams are incident on the differential diode pair at the receiver input:

- the pump beam, short pulses at the laser's 80 MHz repetition rate, which excites the receiver;
- the balance beam, the output of a cw laser, which returns the receiver to its original state over time.

The electrical output of the receiver drives a modulator driver. The driver's output voltage is sampled optically by a readout beam (the probe beam in the figure) at the same rate as the pump beam. Varying the delay between the pump and probe beams maps out the response of the transceiver pair. Since the optical pulses are only 150 fs long, sub-picosecond resolution can be achieved [95]. This approach allows one transceiver transition to be measured accurately. Interchanging the pump and balance beams allows the other transition to be measured as well.
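The pump-probe scan amounts to equivalent-time sampling. Each delay-stage setting yields one sample of the repetitive transceiver response, and the latency can then be read off the reconstructed waveform.

The sketch below illustrates this under stated assumptions. The response is a synthetic, hypothetical exponential edge that starts 100 ps after the pump. Latency is taken as the 50% crossing, a threshold we chose for illustration rather than one specified in the text.

```python
import math

def synthetic_response(delay_ps, t0_ps=100.0, tau_ps=20.0):
    """Hypothetical normalized transceiver output seen by the probe at a given delay."""
    if delay_ps < t0_ps:
        return 0.0
    return 1.0 - math.exp(-(delay_ps - t0_ps) / tau_ps)

def latency_from_scan(delays, samples, level=0.5):
    """First delay at which the scanned waveform crosses `level` (linear interpolation)."""
    for k in range(len(samples) - 1):
        v0, v1 = samples[k], samples[k + 1]
        if v0 < level <= v1:
            d0, d1 = delays[k], delays[k + 1]
            return d0 + (level - v0) * (d1 - d0) / (v1 - v0)
    return None  # no crossing inside the scanned window

# One probe sample per delay-stage setting, in 0.5 ps steps from 0 to 300 ps.
delays = [0.5 * k for k in range(601)]
samples = [synthetic_response(d) for d in delays]
print(f"50% crossing at {latency_from_scan(delays, samples):.1f} ps")
```

Swapping the pump and balance beams and repeating the scan would give the opposite transition in the same way.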
Figure 5.9: Pump-probe setup for transceiver latency measurement. The setup comprises a modelocked laser, a cw laser diode, a delay stage, a chopper, the pump, balance and probe beams, the CMOS chip with MQW diodes, and a lock-in amplifier or oscilloscope readout.
The latency of the entire interconnect can be computed simply by adding the propagation delay to the measured latency of the transceiver pair. The latency for NRZ data was not measured with the same setup because the delays required for that measurement were much larger. Instead, the NRZ measurements were made with a high-speed detector (2.5 GHz bandwidth), evaluating the waveforms directly on an oscilloscope. This is justified because the latency in this case was significantly larger. The pump-probe method can also characterize the variation in latency caused by supply voltage variation, which translates into jitter at the receiver output. Those measurements were presented in Chapter 4.
Fig. 5.11 shows the measured latency for NRZ and short-pulse input for the circuit in Fig. 5.10. These results match the SPICE simulations of the circuit within the uncertainty of the estimated parameters. Short pulses reduce the latency of the transceiver pair very significantly compared to NRZ input, and most of this reduction comes from the receiver. The latency of the receiver can be reduced further by reducing the capacitance of the diodes.

Figure 5.10: Receiver-transmitter module (receiver, buffers, and modulator driver) used for testing latency via the pump-probe method. In the schematic, PMOS and NMOS transistor sizes are given in λ, where λ = 0.2 µm.

Figure 5.11: Comparison of the latency of the transimpedance receiver-transmitter module with short-pulse and NRZ inputs: measured and simulated delay (ns) vs. pulse energy (fJ, logarithmic axis from 10^1 to 10^5).
Wieland et al. [104] presented latency measurements of a transimpedance receiver implemented in bipolar technology. The overall delay of their receiver was 1.5 ns at 1 Gbps. The latency measured here for the transceiver with NRZ data is of the same order, whereas the latency with short pulses is much lower.
5.2 Integrating Receiver
Fig. 5.12 shows the circuit schematic of the integrating receiver, whose operation was explained in detail in Chapter 4. This receiver integrates the input current for half a cycle. It then evaluates and precharges during the remaining half cycle.
Figure 5.12: Circuit schematic of the integrating receiver front end
In this receiver, the latency is a function of the total integrated charge. A typical integration period is half of the clock cycle. If the energy is spread over the entire bit period, as with NRZ, the latency is half of the bit period plus the time needed to resolve the logic level. Fig. 5.13 illustrates the timing of this receiver.

With short pulses, the pulses can arrive at the end of the integration period and deposit all their energy in an instant. As seen in the figure, the latency is then only the evaluation time. Because of positive-feedback amplification, the evaluation time of this receiver depends logarithmically on the amount of integrated charge [106]. A practical system will, of course, need some timing margin to account for jitter and other variability. This margin can be set once the details of the system design are known.
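The timing argument above can be written as a small latency model. For illustration only, the sketch assumes that the latch amplifies the initial imbalance $\Delta V = R E / C$ exponentially with a regeneration time constant $\tau_{regen}$. The evaluation time is then $\tau_{regen} \ln(V_{swing}/\Delta V)$, which is logarithmic in the integrated charge.

The values of `V_SWING` and `TAU_REGEN` are hypothetical. The model also omits the SR latch, so it does not reproduce the SPICE numbers of Fig. 5.14.

```python
import math

# C_DET and RESP follow the text's assumptions; the other two are hypothetical.
C_DET = 40e-15       # photodetector capacitance (F)
RESP = 0.5           # responsivity (A/W)
V_SWING = 2.5        # swing needed at the latch output (V), assumed
TAU_REGEN = 15e-12   # latch regeneration time constant (s), assumed

def eval_time(e_int):
    """Time for positive feedback to grow the initial imbalance dV to V_SWING.

    The integrated photocharge Q = RESP * e_int gives dV = Q / C_DET. The latch
    grows it as dV * exp(t / TAU_REGEN), so t_eval = TAU_REGEN * ln(V_SWING / dV).
    """
    dv = RESP * e_int / C_DET
    return max(0.0, TAU_REGEN * math.log(V_SWING / dv))

def receiver_latency(e_int, bit_period=1e-9, short_pulse=True):
    """Short pulse at the end of integration: evaluation time only.
    NRZ: half a bit period of integration plus the evaluation time."""
    t_eval = eval_time(e_int)
    return t_eval if short_pulse else bit_period / 2 + t_eval

for e in (10e-15, 20e-15, 50e-15):
    print(f"{e * 1e15:4.0f} fJ: short pulse {receiver_latency(e) * 1e12:5.1f} ps, "
          f"NRZ {receiver_latency(e, short_pulse=False) * 1e12:5.1f} ps")
```

In this model, doubling the integrated charge shortens the evaluation by only $\tau_{regen} \ln 2$, while NRZ input always pays the extra half bit period.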
Figure 5.13: Latency with respect to the clock in the integrating receiver with NRZ and short-pulse inputs, showing the integrating and evaluation phases. t1 is the latency with short pulses and t2 is the latency with NRZ.
Because this receiver operates on the principle of positive feedback, it is very sensitive. For modeling this receiver, the parameter values of the 0.25 µm CMOS technology are assumed, so that the results can be compared with the transimpedance receiver. The other parameters assumed in simulation are a photodetector capacitance of 40 fF, a photodetector responsivity of 0.5 A/W, and a width of 10 ps for the electrical pulses generated by the photodiode.

The SPICE-simulated latency of the entire integrating receiver circuit (including the SR latch) is plotted in Fig. 5.14. According to this simulation, the total latency of the receiver is ∼150 ps for 50 fJ of pulse energy. The delay varies approximately logarithmically with the input optical energy.
Figure 5.14: Latency of the entire integrating receiver, including the SR latch, with short-pulse input, computed with the SPICE circuit simulator (receiver delay in ps vs. pulse energy in fJ).
5.3 Totem-pole diode receiver
Very low latency can be achieved, at the expense of higher optical power, by using a diode pair connected in the totem-pole configuration shown in Fig. 5.15. This design is effectively receiverless (“recless”), as no voltage amplifier is involved. The receiver must be connected to a high-impedance node, such as the gate of a buffer, so that the charge can be integrated. The input capacitance is charged to the supply rails by providing sufficient optical power.

The optical energy required to charge node “in” to the supply rails is a linear function of the front-end capacitance, which is typically dominated by the photodiode capacitance. Let $C_{in}$ be the total capacitance at node “in” and $V_{sup}$ the total voltage swing required there. The total charge required is then $Q_{tot} = C_{in} V_{sup}$. For a photodiode responsivity R, the minimum optical energy required is $E_{opt} = C_{in} V_{sup}/R$.

This energy can be delivered in a very brief period or spread out over the entire bit period (T). If the input is NRZ data with the minimum required energy, the input node reaches half of the supply voltage only after half a bit period ($t_{nrz} = T/2$), which is a very long latency. If short pulses are used instead, the input node is charged almost immediately ($t_{sp}$), limited only by the carrier transit time in the intrinsic region of the MQW diode. The charging of the input node with NRZ and short pulses is shown in Fig. 5.16.
Figure 5.15: Schematic of the totem-pole diode pair receiver connected to the high-impedance input of the inverter buffer.

Figure 5.16: Voltage vs. time at node “in” of the recless receiver for NRZ and short-pulse inputs, with the minimum optical energy needed to swing the node by the supply voltage ($t_{nrz} = T/2$ for NRZ input, $t_{sp}$ for short-pulse input).
Assume a flip-chip bonded photodiode capacitance of 40 fF and a responsivity of 0.5 A/W. The total capacitance is then 90 fF (two 40 fF photodiodes plus an assumed 10 fF buffer capacitance). The optical energy required to charge the input node by 2.5 V (the supply voltage for 0.25 µm CMOS technology) is 450 fJ. Using a metal-semiconductor-metal photodiode, or a silicon photodiode in a silicon-on-insulator process, would reduce the photodiode capacitance and hence the optical energy required.

For a 1 µm long intrinsic region, the carrier transit time is roughly 10 ps, which determines the latency of this receiver with short pulses. By comparison, for 1 Gbps operation the latency with NRZ data at the minimum optical power is 0.5 ns. Of the three receivers discussed in this chapter, this one gives the lowest latency with short pulses, though it can require much more optical energy, depending on the capacitance.
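The energy and timing estimates in the preceding paragraphs reduce to a few lines of arithmetic. The sketch assumes a switching threshold of $V_{sup}/2$ and an NRZ input that charges node “in” linearly over the bit period. The carrier saturation velocity of about $10^7$ cm/s used for the transit time is a typical textbook value we supply, not a figure from the text.

```python
def recless_energy(c_total, v_sup, responsivity):
    """Minimum optical energy E_opt = C_in * V_sup / R to swing node 'in' by V_sup."""
    return c_total * v_sup / responsivity

def nrz_latency(bit_period):
    """NRZ at minimum energy charges 'in' linearly, reaching V_sup / 2 after T / 2."""
    return bit_period / 2

def transit_time(intrinsic_length, v_sat=1e5):
    """Short-pulse latency ~ carrier transit time; v_sat ~ 1e7 cm/s = 1e5 m/s (assumed)."""
    return intrinsic_length / v_sat

c_in = 2 * 40e-15 + 10e-15               # two 40 fF photodiodes + 10 fF buffer gate
e_opt = recless_energy(c_in, 2.5, 0.5)   # 2.5 V supply, 0.5 A/W responsivity
print(f"E_opt = {e_opt * 1e15:.0f} fJ")
print(f"NRZ latency at 1 Gbps = {nrz_latency(1e-9) * 1e9:.2f} ns")
print(f"short-pulse latency = {transit_time(1e-6) * 1e12:.0f} ps")
```

Because $E_{opt}$ is proportional to the total node capacitance, any reduction in photodiode capacitance lowers the required optical energy in direct proportion to the capacitance saved.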
5.4 Scaling of latency with technology
Figure 5.17: Delay of the transimpedance receiver with short-pulse data in 0.25 µm and 0.5 µm technologies, normalized to the FO4 delay of the respective technology, vs. optical energy per bit (fJ).
A fan-out-of-4 (FO4) delay is defined as the delay of a gate driving four gates of the same size. The latency of the receiver is expected to scale roughly with the FO4 delay of the technology.

Fig. 5.17 compares, through SPICE simulations, the delay of a transimpedance receiver with short-pulse input in 0.5 µm and 0.25 µm CMOS technologies. Each curve is normalized to the corresponding FO4 delay: 270 ps in 0.5 µm technology and 90 ps in 0.25 µm technology. The two normalized curves follow each other closely, so we conclude that the receiver latency scales with the FO4 delay of the technology. This would allow optical interconnects to keep pace with the performance of silicon chips. The predicted scaling of FO4 gate delay with technology is shown in Fig. 5.18.
Figure 5.18: FO4 gate delay (ps) vs. drawn gate length L_drawn (µm) with technology scaling [107]
5.5 Summary
Table 5.1 lists the latency of three receiver architectures with NRZ and short-pulse data inputs: transimpedance, integrating, and recless (totem-pole diode pair). Short pulses significantly improve the performance of all three receivers. A recless receiver with short pulses has the shortest delay, but at the expense of optical power, which depends on the photodiode capacitance. The transimpedance and integrating receivers perform similarly with short pulses.
Chip sizes are expected to increase only modestly in future generations and to remain around 2 cm across. For a 2 cm global interconnect, the latency of a repeatered electrical line is ∼330 ps (at 20% of the velocity of light). The optical propagation time over 2 cm in glass is ∼100 ps. The latency of the transmitter can be brought down to ∼70 ps, assuming a single buffer driving the modulator capacitance. As we have shown, the receiver latency can also be reduced to ∼70 ps with short pulses, for an optical total of roughly 240 ps. Optical interconnects can therefore achieve latencies comparable to, or even lower than, those of electrical interconnects for on-chip global communication, at least in theory.

Receiver type      NRZ delay (ps)   Short-pulse delay (ps)
Transimpedance     340              120
Integrating        650              150
Recless            500              10

Table 5.1: Receiver latency with NRZ and short-pulse inputs. The optical energy per bit is ∼50 fJ for the transimpedance and integrating receivers and 450 fJ for the recless receiver.
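The 2 cm latency budget can be checked with simple time-of-flight arithmetic. The sketch assumes a glass refractive index of about 1.5, which is our assumption rather than a value from the text. It uses the ∼70 ps transmitter and receiver latencies quoted in the preceding paragraph.

```python
C0 = 3.0e8  # speed of light in vacuum (m/s)

def propagation_delay(length_m, velocity_fraction):
    """Time of flight over length_m at a fraction of the vacuum speed of light."""
    return length_m / (velocity_fraction * C0)

L = 0.02                                    # 2 cm global interconnect
t_elec = propagation_delay(L, 0.20)         # repeatered electrical line at 0.2 c
t_glass = propagation_delay(L, 1 / 1.5)     # glass, assumed index n = 1.5
t_opt = 70e-12 + t_glass + 70e-12           # transmitter + flight + receiver

print(f"electrical: {t_elec * 1e12:.0f} ps")
print(f"optical:    {t_opt * 1e12:.0f} ps (flight {t_glass * 1e12:.0f} ps)")
```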
Chapter 6
Timing in Silicon Chips
We have already seen in earlier chapters that the large amplitude and sharp rising and falling edges of short pulses can improve the sensitivity of receivers and reduce the latency of interconnects. Beyond these benefits, the low pulse-to-pulse jitter of short pulses generated by a modelocked laser can help synchronize the system. We consider two ways of doing so: phase aligning a large number of parallel interconnect channels, and providing a precise, skew- and jitter-free clock.
When a large number of parallel IOs is provided, synchronizing all the channels to a local clock at the receiving end is a challenging task. One way to synchronize the channels is per-channel timing management in electronics. However, this is cumbersome and requires silicon area to be devoted to each channel, which in turn reduces the density of interconnects.

Instead, reading out modulators with short pulses automatically synchronizes all the channels. Because the short pulse train has low pulse-to-pulse jitter, skew and jitter on the modulator drive signals are eliminated, as detailed in Section 6.1.
Precise clock signals are crucial to the operation of current high-performance integrated circuits. In fact, clock accuracy is a limiting factor in multiplexing and analog-to-digital conversion systems. For example, the time resolution of an NMOS sampling switch in a standard 0.8 µm CMOS technology is ∼21 ps (48 Gb/s) [108] when the clock has no jitter. The speeds attainable in practical systems are much lower because of clock jitter.

By illuminating low-capacitance detectors in a totem-pole diode pair with short pulses, we can generate swings large enough to drive logic without any amplifier (receiverless clock injection). This eliminates the delay, skew, and jitter of a receiving circuit and allows a very precise clock to be injected. For this implementation we monolithically integrated silicon detectors, to reduce the capacitance of the diodes and to avoid the cost associated with hybrid integration.

Section 6.2 describes a proof-of-principle demonstration of precise clock injection with silicon detectors. It also characterizes the high-frequency response of silicon detectors using on-chip samplers.
6.1 Jitter and skew removal
High-speed electrical links are typically serial links. The entire data stream is sent on a single channel, and the receiver recovers the data by extracting the clock from that stream. Jitter on this channel reduces the timing margins of the receiver.

In high-density parallel interconnects where a single clock is used to recover all the channels, the situation is even more difficult, because in addition to jitter there can be skew among the channels. Per-channel skew compensation (e.g. as implemented by Yeung and Horowitz [61]) can eliminate skew from various sources at the receiver. However, it requires additional silicon area and does not remove jitter.
By employing short pulses in a modulator-based system, all parallel channels can be resynchronized (illustrated in Fig. 2.5), removing both skew and jitter. If the optical path lengths of all the channels are also made equal, all the channels arrive synchronized at the receiving end. Up to half a bit of skew and jitter can be removed by this method.

To demonstrate skew removal experimentally, two channels were driven externally from a bit stream skewed by 3/8 of a bit period [109]. Reading out with a cw beam maps the electrical drive of the modulators, as seen in Fig. 6.1. These modulator channels were then read by short pulses nominally placed at the center of the bit period. As shown in Fig. 6.2, the short-pulse readout completely eliminated the skew.
Figure 6.1: Transmitted signals from two channels (ch. 1 and ch. 2) read out with a cw laser. The channels are skewed by 3/8 of a bit period.

Figure 6.2: Skew removal by short-pulse readout of two modulator channels skewed by 3/8 of a bit period. Ones and zeros are alternately read by these pulses.
Jitter from modulator channels can be removed in the same way, by reading out the modulator with a short pulse at the nominal center of the bit. To demonstrate jitter removal experimentally, an optical link was operated with a modulator driven by a signal with ±3/8 bit of jitter. The receiver output was connected to another modulator, which was read by a cw beam to give the received data shown in Fig. 6.3. The jitter has clearly been removed.
The removal of skew and jitter demonstrates that a low-jitter, periodic pulse train from a modelocked laser can phase align the signals from an array of modulator channels. This synchronization is achieved solely by the short-pulse readout. A single clock can therefore recover data from all of these phase-aligned channels, simplifying the system implementation. The optical power required from the modelocked laser scales only linearly with the number of channels.

Figure 6.3: Jitter removal from a single interconnect channel. The upper trace is the electrical modulator drive signal with jitter. The bottom trace is the short-pulse optical readout of the receiver.
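A toy model shows why short-pulse readout removes up to half a bit of skew and jitter. An ideal pulse samples each channel at the nominal bit center, so it returns the intended bit as long as no edge moves by T/2 or more.

The sketch below is an idealized illustration, and its function name is hypothetical. It ignores pulse width, modulator rise time, and noise.

```python
import random

def short_pulse_readout(bits, bit_period, edge_offsets):
    """Sample a modulator channel once per bit with an ideal short pulse.

    edge_offsets[k] is the displacement (skew plus jitter) of the edge that
    starts bit k, with len(edge_offsets) == len(bits) + 1. The pulse sits at the
    nominal bit center (k + 0.5) * bit_period.
    """
    edges = [k * bit_period + off for k, off in enumerate(edge_offsets)]
    out = []
    for k in range(len(bits)):
        t = (k + 0.5) * bit_period
        # Find the bit interval [edges[j], edges[j+1]) that the pulse lands in.
        j = next((i for i in range(len(bits)) if edges[i] <= t < edges[i + 1]), None)
        out.append(bits[j] if j is not None else None)
    return out

T = 1.0                                   # bit period (normalized)
bits = [1, 0, 1, 1, 0, 0, 1, 0]
n_edges = len(bits) + 1

skewed = [0.375 * T] * n_edges                                      # 3/8-bit skew
rng = random.Random(1)
jittered = [rng.uniform(-0.375, 0.375) * T for _ in range(n_edges)]  # +/- 3/8-bit jitter
too_late = [0.6 * T] * n_edges                                      # beyond half a bit

print(short_pulse_readout(bits, T, skewed) == bits)    # skew removed
print(short_pulse_readout(bits, T, jittered) == bits)  # jitter removed
print(short_pulse_readout(bits, T, too_late) == bits)  # fails past T/2
```

The last case shows the limit stated in Section 6.1: once an edge moves past the sampling instant, the pulse reads the neighboring bit.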
6.2 Optical clock injection
The requirement for a precise clock is becoming a bottleneck in many applications. Precise clock injection is required in analog-to-digital conversion, high-speed multiplexing and demultiplexing, and the test and measurement of high-speed signals. To run a chip synchronously, a skew- and jitter-free clock must be distributed across the chip [110].

To distribute the clock symmetrically, interconnections in the form of H-trees [111, 112], grids [113], and many other topologies are used. Researchers have also used coupled oscillators [114, 115] to distribute a precise clock across the chip. It is even possible to skew the clock intentionally to improve circuit performance [116]. These techniques do help in clock distribution, but at the cost of significantly increased complexity, and extremely careful design is required to reduce the skew. As technology scales, the clock skew problem will get worse [117, 118].
Many attempts have been made to distribute the clock optically [112] [119]. Optical clock distribution with short pulses has also been investigated by Delfyett et al. [120] and Kawanishi et al. [121]. Delfyett et al. achieved a remarkable 12 ps of jitter between two ports under test. All of this work used a receiver at each end node to generate logic levels from the optical signal. When the clock is distributed to a large number of nodes, variation among these receivers introduces skew and jitter in the received signal.

We propose a receiverless scheme with short pulses for clock injection. Using only the photodetectors and eliminating the receiver circuit also eliminates this source of skew and jitter. It is then possible to inject a precise clock into a large number of nodes with short pulses.
Compared to hybrid integrated photodetectors, monolithically integrated silicon detectors can potentially reduce cost, simplify fabrication (by using the standard CMOS process), and reduce capacitance. At the 850 nm operating wavelength, however, the large absorption depth of silicon raises many issues. These issues and the implementation of the silicon detectors are discussed next. On-chip samplers are then used to characterize the high-speed response of the silicon detectors. A demonstration of precise clock injection with a receiverless scheme, implemented with silicon detectors, is presented at the end of this section.
6.2.1 Silicon detectors
Because silicon has an indirect bandgap, it absorbs poorly at the wavelength of interest (850 nm) [122] [123] [124]. Direct-bandgap materials show an abrupt change in absorption near the band edge, whereas silicon has a gradual onset of absorption near the band edge. Because of this weak indirect absorption, the absorption length (1/α, where α is the absorption coefficient) is roughly 14 µm. This is larger than the typical well depth in current CMOS technologies. For example, in the 0.25 µm CMOS technology used in this work, the n-well depth is 1.2 µm.

The depletion region, formed near the well edge, absorbs very little of the optical energy. Most of the light generates carriers deep in the substrate, which slowly diffuse to the depletion region and produce a long tail in the response. This long tail limits the use of silicon photodetectors at very high speeds. A faster response can be obtained from silicon detectors by spatially blocking the light in certain regions and taking the difference between the responses of the blocked and unblocked regions [87]. This method, however, reduces the responsivity significantly. Even with lower responsivity and a long response tail, high-speed receiver operation with monolithic silicon detectors has been demonstrated [122] [125] [126] [127] [128].

Figure 6.4: Cross-sectional views of two silicon detector topologies: the n-well detector and the interdigitated detector (IDT), both on a p-substrate with a 1.2 µm n-well depth.
In this work we used two kinds of silicon detectors, whose cross-sections are shown in Fig. 6.4. The first detector is a diode formed by the n-well and the p-substrate. The second detector has interdigitated p-diffusion and n-diffusion areas contained within an n-well. The p-diffusion and n-diffusion fingers are connected by metal. Interdigitation increases the extent of the depletion region in the device. For each topology we implemented two detectors with slightly different dimensions, as summarized in Table 6.1.

The DC responsivities, measured with an optical probe-station setup, were around 0.025 A/W. The capacitance of the detectors was measured on-chip using ring oscillators [57]. Each oscillator was a five-stage inverter ring in which every inverter was loaded with a copy of the silicon detector under test. The oscillation frequency was divided by 32 before being brought off-chip at an output pad, so that the pad would not need to drive a very high-speed signal. The detector capacitance was extracted by comparing this oscillation frequency with the frequency of an unloaded oscillator. The capacitance values are shown in Table 6.1.

Detector           Area (µm²)   Finger spacing (µm)   Capacitance (fF)
Nwell 1            10 × 11.6    n/a                   30
Nwell 2            20 × 21.6    n/a                   85
Interdigitated 1   19.2 × 20    4.4                   122
Interdigitated 2   22.4 × 20    5.2                   124

Table 6.1: The dimensions and capacitances of the silicon detectors implemented in this work. Two n-well detectors and two interdigitated detectors of different sizes were chosen.
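The ring-oscillator extraction can be sketched as follows, assuming that the delay of each inverter stage is proportional to its total load capacitance. Under that assumption the number of stages and the divide-by-32 factor cancel in the frequency ratio. The intrinsic load per stage (`c_node` below) must still be known, for example from simulation. The value used here and the function name are hypothetical.

```python
def detector_capacitance(f_unloaded_pad, f_loaded_pad, c_node, divide_by=32):
    """Extract the detector capacitance from two ring-oscillator frequencies.

    Assumes each stage delay is proportional to its total load, so that
    f_unloaded / f_loaded = (c_node + c_det) / c_node.
    """
    f0 = f_unloaded_pad * divide_by   # undo the on-chip frequency divider
    f1 = f_loaded_pad * divide_by
    return c_node * (f0 / f1 - 1.0)

# Hypothetical pad frequencies and a hypothetical 20 fF intrinsic load per stage.
c_det = detector_capacitance(f_unloaded_pad=50e6, f_loaded_pad=20e6, c_node=20e-15)
print(f"extracted detector capacitance: {c_det * 1e15:.1f} fF")
```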
6.2.2 Frequency response of silicon detectors
To use silicon detectors for high-speed clock injection, we need to know their frequency response. One way to characterize it is to excite the detector with a broadband source, such as a short pulse, and measure the response with probes [129]. Rather than loading the detector with the large capacitance of a probe, we measure its frequency response with realistic loading, using an on-chip sampler and short-pulse excitation. The design and operation of the sampler are described in Chapter 3. Simulations indicate a sampler bandwidth of ∼4 GHz.
We connected a silicon detector to a high-impedance node of the sampler, excited it with a short pulse, and sampled the resulting voltage. Since the pulses were only about 150 fs long, this measures the impulse response of the detector to a good approximation. To make the signal periodic, the detector input was reset to the supply rail every period. The clocks were aligned so that the short pulses arrived slightly after the reset was released.

A typical sampled trace is shown in Fig. 6.5. The small glitch at the beginning of the trace is due to the release of the reset on the detector. The optical pulse induces a fast falling edge at the detector, followed by a long tail.

Figure 6.5: Sampled signal trace (sampler voltage vs. time in ns) showing the response of the first interdigitated detector to an optical short pulse. The reset release, the impact of the optical pulse, and the long tail are marked. The optical energy in the pulse was 0.74 pJ.
Taking the derivative of the measured voltage gives the profile of the photocurrent, $i(t) = C\,dV/dt$, assuming the capacitance remains constant. The Fourier transform of this current gives the frequency response of the detector, shown for different diodes in Fig. 6.6. The plot shows that the response falls off at less than 20 dB/decade up to roughly 2 GHz [130]. This indicates that the detectors cannot be modeled as a first-order system, which would fall off at 20 dB/decade. In fact the response falls off at roughly 10 dB/decade, suggesting that the detector frequency response can be modeled as
$$h(s) = \frac{1}{1 + \sqrt{s\tau}} \qquad (6.1)$$
With the present setup it is hard to measure τ because it is very small. The curve
falls off very rapidly around 4 GHz because of the frequency response of the samplers.
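Two pieces of the analysis above can be sketched in plain Python: the processing $i(t) = C\,dV/dt$ followed by a Fourier transform, and the slope of the model in Eq. 6.1. Several inputs are our own choices:

- The voltage trace is synthetic and hypothetical, a fast and a slow exponential pull-down loosely shaped like Fig. 6.5.
- The capacitance is that of the first interdigitated detector in Table 6.1.
- Normalizing to the lowest nonzero frequency bin is our reading of “first frequency component”.

The model is evaluated in terms of the normalized frequency $x = \omega\tau$, so no value of τ is assumed.

```python
import cmath
import math

def photocurrent(v, dt, c):
    """i(t) = C dV/dt from a sampled voltage trace (forward differences, constant C)."""
    return [c * (v[k + 1] - v[k]) / dt for k in range(len(v) - 1)]

def spectrum_db(x, n_bins):
    """DFT magnitude of x at bins 1..n_bins in dB, normalized to bin 1."""
    n = len(x)
    mags = [abs(sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)))
            for m in range(1, n_bins + 1)]
    return [20 * math.log10(a / mags[0]) for a in mags]

def sqrt_model_db(x):
    """|h| in dB for h = 1 / (1 + sqrt(j x)), with x = omega * tau (Eq. 6.1)."""
    return -20 * math.log10(abs(1 + cmath.sqrt(1j * x)))

def first_order_db(x):
    """|h| in dB for a single-pole response h = 1 / (1 + j x), for comparison."""
    return -20 * math.log10(abs(1 + 1j * x))

# Hypothetical trace: 2.4 V node pulled down by a fast edge plus a slow tail.
dt = 10e-12
v = [2.4 - 0.5 * (1 - math.exp(-k * dt / 50e-12)) - 0.3 * (1 - math.exp(-k * dt / 2e-9))
     for k in range(800)]
i = photocurrent(v, dt, c=122e-15)
response_db = spectrum_db(i, 20)

# Slopes per decade well above the corner (x from 1e4 to 1e5).
sqrt_slope = sqrt_model_db(1e5) - sqrt_model_db(1e4)     # about -10 dB/decade
pole_slope = first_order_db(1e5) - first_order_db(1e4)   # about -20 dB/decade
print(f"sqrt(s) model: {sqrt_slope:.1f} dB/decade, first order: {pole_slope:.1f} dB/decade")
```

The √s model loses only about half as many decibels per decade as a single pole. This is the property that keeps a usable signal at high frequencies.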
The sub-20 dB/decade fall-off of the detector frequency response allows a large enough signal to be obtained at high frequencies. This makes it easier to use silicon detectors at high speeds.

Figure 6.6: Normalized amplitude (dB) vs. frequency (Hz) for the silicon detectors NWELL1, NWELL2, and IDT1, with a √s dependence shown for reference. The roll-off near 4 GHz is due to the finite bandwidth of the samplers. The response of the second interdigitated detector is not included for clarity. The curves have been normalized with respect to their first frequency component for comparison.
The output of the sampler settles on a much shorter time scale than the repetition period of the short-pulse laser. This allows more information to be extracted from the sampler by connecting its output to a high-speed digital oscilloscope. The oscilloscope constructs its trace by sampling different periods of the real signal consecutively. By acquiring the whole oscilloscope record, we therefore obtain several versions of the sampled signal, from which the noise in the signal can be estimated for a given delay.

One complication of this technique is that the sampler output is affected by clock feedthrough at the slave switch (see Chapter 3 for the schematic of the sampler). This can be compensated by calibrating the ripple beforehand for different output currents. This method is used to estimate the jitter in the injected clock signal.
6.2.3 Receiverless clock injection
Eliminating the receiver amplifier circuit removes a potential source of skew and jitter. Optical path lengths can be controlled very precisely to distribute a virtually skew-free signal to different points on the chip. Short pulses with very low pulse-to-pulse jitter from a modelocked laser, combined with a receiverless logic recovery scheme, can therefore provide a very precise clock. We present a proof-of-principle demonstration of this concept in this section.
Williams et al. used a single detector to generate a full logic swing for a telecommunications receiver [90]. They used an erbium-doped fiber amplifier (EDFA) to amplify the optical power, and the detector output drove a 50 Ω resistance. Instead, we propose connecting the diode to a high-impedance load. The low capacitance of the monolithically integrated diode may reduce the optical power requirement.
To implement this scheme, two silicon diodes are connected in a totem-pole configuration. The top diode is connected to the supply and the bottom diode is connected to ground, as shown in Fig. 6.7. When the optical pulse is incident on the top diode, the node marked in is charged to the supply voltage, and when the optical pulse falls on the bottom diode, node in discharges to ground¹. By alternating the pulses on the top and bottom diodes, we were able to inject a very precise clock into the chip. In the present case, a small CMOS inverter was connected to the output of the totem-pole diode configuration as a high impedance load. To verify the operation of this scheme, the output of this inverter was sampled by on-chip samplers. Since the detectors in this design were very small, the inverter output, rather than the detector node, was sampled to minimize distortion. For testing, the detector node could be set to an external bias voltage through an on-chip pass gate.
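The set/reset behavior of the totem-pole node can be illustrated with an idealized charge model. This is a sketch under explicit assumptions, not a model of the fabricated detectors.

- Each absorbed pulse is assumed to deliver photocharge Q = R·E onto a fixed node capacitance.
- The node is clamped between ground and the supply, so the forward-bias overshoot noted in the footnote is ignored.
- The supply voltage (`vdd`), node capacitance (`c_node_ff`) and responsivity (`responsivity_a_per_w`) are hypothetical values. Only the 6 pJ pulse energy comes from the text.

```python
def totem_pole_node(events, vdd=3.3, c_node_ff=50.0, responsivity_a_per_w=0.1, v_start=0.0):
    """Idealized charge-based model of node `in` in the totem-pole diode pair.

    events: sequence of (diode, pulse_energy_pj), diode being "top" or "bottom".
    Each absorbed pulse delivers photocharge Q = responsivity * energy. Top-diode
    charge pulls the node toward the supply, bottom-diode charge toward ground.
    The node is clamped to [0, vdd], i.e. the forward-bias overshoot is ignored.
    Returns the node voltage after each pulse.
    """
    c_node = c_node_ff * 1e-15
    v = v_start
    history = []
    for diode, energy_pj in events:
        dv = responsivity_a_per_w * energy_pj * 1e-12 / c_node   # dV = Q / C
        if diode == "top":
            v = min(vdd, v + dv)
        elif diode == "bottom":
            v = max(0.0, v - dv)
        else:
            raise ValueError(f"unknown diode {diode!r}")
        history.append(v)
    return history

# Alternating set (top) and reset (bottom, delayed by T/2) pulses of 6 pJ each.
clock = totem_pole_node([("top", 6.0), ("bottom", 6.0)] * 3)
print(clock)
```

With these hypothetical parameters, each 6 pJ pulse deposits far more charge than is needed to swing the node rail to rail. The node therefore toggles between ground and the supply on every pulse, which is the behavior the scheme relies on.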
This scheme was implemented with two different kinds of detectors. In the first implementation, a totem-pole was created with an n-well/p-substrate detector and a p-diffusion/n-well detector at 5 µm spacing. We were not able to create a very significant swing with this device because of the large difference in the responsivity of the two detectors and the diffusion of carriers from one detector to the other. One possible solution to this problem may be to increase the separation of the diodes.

¹This is a simplified description. In fact, the diodes can go into forward bias under illumination, which means that node in can actually go above the supply voltage or below ground.

Figure 6.7: Schematic of receiverless optical clock injection with optical short pulses using a totem-pole diode pair (labels: Vsup; set pulse; reset pulse, delayed by T/2; node in; sampled node; node of interest). The inverter provides very little capacitive loading, though it can be eliminated and the clock can be injected directly at the desired node.

Figure 6.8: Equivalent circuit of the totem-pole pair implementation with interdigitated diodes (labels: Vsup; set pulse; node in; sampled node; 600 Ω). Due to the substrate connection, this device was self-resetting.
The second scheme was implemented with interdigitated detectors with two fingers. The area of the detector was 14.4 × 10.4 µm². Unfortunately, this scheme created a substrate connection, tying the device node to ground. However, this ground connection was effectively through a 600 Ω resistance, which made the scheme self-resetting so that only one beam input was required. Self-resetting also has disadvantages: first, operation now requires more optical power, and second, the resetting edge is controlled entirely by the substrate connection rather than by the beam.
Fig. 6.9 shows the operation of this device via the sampled output of the inverter when a 6 pJ pulse was incident on the device. Assuming a 600 Ω resistance, this curve is very similar to the one predicted by simulations. Simulations also indicate that the slew rate of this curve is limited by the inverter.

Figure 6.9: Receiverless optical clock injection with optical short pulses of 6 pJ onto the totem-pole configuration of interdigitated detectors. (Plot: sampled voltage (V), 1.2 to 2.6 V, versus time (ns), 7.2 to 8.4 ns.)
Fig. 6.9 is a grey scale image with the intensity of a point proportional to the
probability of the sample. At the midpoint of the swing on the rising edge at the
device (the falling edge on the curve in Fig. 6.9 because this signal is after the inverter)
we can obtain a histogram to determine the jitter on the signal. Fig. 6.10 shows this
histogram. The standard deviation of this histogram is 4 ps and the peak-to-peak jitter is 20 ps. These values are close to the resolution limit of our measurement setup. When the optical pulse was fed directly into the oscilloscope through its optical input, a similar jitter, with a standard deviation of 3.7 ps, was obtained.
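The histogram measurement can be sketched as follows. This is a minimal illustration, assuming each acquisition is available as lists of sample times and voltages.

- The crossing time at the mid-swing level is found by linear interpolation between samples.
- Mean, standard deviation and peak-to-peak values are then taken over all acquisitions.
- The function names and the two edges below are hypothetical. This is not the oscilloscope's internal algorithm.

The falling edge is used because, as noted above, the signal is observed after the inverter.

```python
from statistics import mean, pstdev

def crossing_time(times, volts, level, rising=True):
    """First time a sampled waveform crosses `level`, found by linear interpolation."""
    points = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    return None

def jitter_stats(waveforms, level, rising=True):
    """Histogram statistics of crossing times over repeated acquisitions.

    waveforms: iterable of (times, volts) pairs. Returns (mean, standard deviation,
    peak-to-peak) of the crossing times, in the units of `times`.
    """
    crossings = [crossing_time(t, v, level, rising) for t, v in waveforms]
    crossings = [c for c in crossings if c is not None]
    return mean(crossings), pstdev(crossings), max(crossings) - min(crossings)

# Hypothetical falling edges (after the inverter) from 2.6 V to 1.2 V, times in ps.
edges = [([0.0, 10.0, 20.0], [2.6, 2.6, 1.2]),
         ([4.0, 14.0, 24.0], [2.6, 2.6, 1.2])]
mid = (2.6 + 1.2) / 2
print(jitter_stats(edges, mid, rising=False))
```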
The incidence time of a pulse can be very precisely controlled by changing the path length of the beam, which allows very precise variation of the clock phase. Jitter histograms for normal incidence and for incidence delayed by 10 ps, shown in Fig. 6.10, illustrate the controllability of the clock phase [130]. The mean of the falling edge after the inverter is shifted by exactly 10 ps (from 7.4678 ns to 7.4778 ns), confirming the accuracy of the technique. In this proof-of-principle demonstration, we measured the output of the inverter, which not only reduced the slew rate of the signal but also added jitter. Consequently, the signal at the detector itself might perform even better.

Figure 6.10: Histograms of the signal crossing the marker level at half its swing. The histograms correspond to two experiments, one of which is delayed 10 ps more than the reference clock. (Plot: occurrences versus time, 7.45 to 7.50 ns. Early curve: µ = 7.4678 ns, σ = 3.9253 ps, 298 hits. Late curve: µ = 7.4778 ns, σ = 3.988 ps, 319 hits.)
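The phase control above rests on the fact that delay equals path length divided by the propagation speed. The sketch below converts a desired delay into a mechanical displacement. It assumes free-space propagation with a group index of about 1. The `passes` parameter is a hypothetical convenience: a retroreflector stage lengthens the path by twice its displacement.

```python
C_M_PER_S = 299_792_458.0   # speed of light in vacuum (m/s)

def path_change_for_delay(delay_s, group_index=1.0, passes=1):
    """Mechanical displacement needed to delay a pulse by `delay_s`.

    group_index: ~1 for propagation in air (assumption).
    passes: 1 if the displacement adds directly to the optical path; 2 for a
    retroreflector stage, where moving the stage by d lengthens the path by 2d.
    """
    return C_M_PER_S * delay_s / (group_index * passes)

for passes in (1, 2):
    d = path_change_for_delay(10e-12, passes=passes)
    print(f"10 ps delay, {passes} pass(es): move {d * 1e3:.3f} mm")
```

A 10 ps shift corresponds to roughly 3 mm of extra free-space path. Conversely, each micrometre of path adds only about 3.3 fs, which is why path length is such a fine control on clock phase.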
This scheme can be improved further by using lower capacitance diodes. The silicon-on-insulator process reduces the capacitance of integrated diodes quite significantly, which will reduce the amount of optical power required. To improve the responsivity of these diodes, the frequency of the light can be doubled. At 425 nm, the absorption length in silicon reduces to ∼0.2 µm, so most of the light is absorbed near the surface rather than generating carriers deep in the substrate. This should have a large impact on the performance of these detectors.
6.3 Summary
Short pulses can potentially help in synchronization issues. As we saw in Section 6.1,
short pulses can synchronize an array of modulators by eliminating skew and jitter.
By nominally placing the pulses in the center of the bit period, skew and jitter of up to half of a bit period can be removed. We demonstrated skew and jitter removal
of 3/8 of a bit period with this method. Synchronizing channels with short pulses
should eliminate the need for per-channel skew compensation, reducing the overall
complexity of the design. In conclusion, short optical pulses provide a simple and
scalable solution to data recovery in very large parallel interconnects.
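Why a center-placed readout pulse tolerates up to half a bit period of skew can be checked with a small idealized model. This is a sketch under stated assumptions, not the circuit of Section 6.1.

- The readout pulse is treated as having zero width.
- Modulator transitions are treated as instantaneous.
- Each channel's skew (including any jitter) is treated as a fixed offset `skew` in units of the bit period.
- The bit pattern is hypothetical.

Bit k is read correctly whenever the pulse at the nominal bit center still falls inside that bit's skewed interval, that is, whenever |skew| < T/2.

```python
import math

def modulator_state(bits, t, period, skew):
    """Bit applied to a modulator at time t when its data edges are delayed by `skew`.

    Bit k occupies [k*period + skew, (k+1)*period + skew); returns None outside the data.
    """
    k = math.floor((t - skew) / period)
    return bits[k] if 0 <= k < len(bits) else None

def short_pulse_readout(bits, period, skew):
    """Read the modulator with one ideal (zero-width) pulse per bit at the nominal bit center."""
    return [modulator_state(bits, (k + 0.5) * period, period, skew) for k in range(len(bits))]

bits = [1, 0, 1, 1, 0, 0, 1]
for skew in (-0.45, 0.0, 0.3, 0.45, 0.55):
    ok = short_pulse_readout(bits, 1.0, skew) == bits
    print(f"skew = {skew:+.2f} bit periods -> recovered correctly: {ok}")
```

Because one common pulse train reads every modulator in the array, each channel is recovered as long as its own skew stays within this window. Finite pulse width and modulator switching time, both ignored here, would reduce the margin in practice.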
Very precise clock injection is also possible with short pulses using a receiver-
less scheme, with potential applications in analog to digital conversion, high speed
multiplexing and demultiplexing, and low-skew on-chip clock distribution. Silicon
detectors were investigated because of their ease of integration with the current CMOS process and their low capacitance. Using on-chip electrical samplers, the high frequency
response of these detectors was obtained. A proof-of-principle experiment presented
here demonstrated the operation of precise clock injection with very low jitter.
Chapter 7
Wavelength Division Multiplexing System
It is common practice to use wavelength-division multiplexing (WDM) in telecommunications. WDM enhances the capacity of a fiber by using passive optical components to separate the different wavelength channels, so that each channel can be processed by the electronic circuits. Moreover, currently deployed WDM systems typically operate in the wavelength range of erbium doped fiber amplifiers. A single amplifier can therefore amplify all the channels simultaneously, which is a significant advantage for the system.
For short distance interconnects, the issues are significantly different: throughput, latency, and cost are of paramount importance. By operating each channel at the maximum speed of the silicon technology without time-division multiplexing, and by providing a large number of parallel channels, the throughput can be increased while keeping transmitters and receivers very simple. Silicon CMOS allows cheap yet dense circuits, and by using multiple parallel channels, a higher throughput can be achieved at a low cost. Compared to point-to-point links, wavelength-division multiplexing allows communication using a single fiber. In space-constrained backplanes and non-line-of-sight links, WDM might be the preferred solution.
Typical WDM implementations involve one laser for each channel emitting at a specified wavelength, which must be monitored very closely to avoid drift. As the number
CHAPTER 7. WAVELENGTH DIVISION MULTIPLEXING SYSTEM 96
of channels increases, the number of wavelengths also increases, increasing the cost
and the complexity of the system. Also, the channels need to be synchronized at the
receiver end to remove skew and jitter for a synchronous system implementation.
A broadband optical source can be spectrally sliced to generate WDM channels,
which could then be modulated individually. This concept was implemented by Wagner et al. [131] and Sampson et al. [132] using a superluminescent light-emitting diode (LED) as a broadband source. This implementation removes the wavelength monitoring requirement from each individual channel, though synchronization still needs to be done. Spectrally sliced WDM can also be implemented using short pulses [46] [133] [134] [135]. Using femtosecond pulses as a broadband source not only removes the monitoring requirement of each channel, but also synchronizes all the channels in the readout of the modulators.
In this chapter we present a proof-of-principle demonstration of a WDM system using spectral slicing of short pulses for short distance interconnects. This system can potentially utilize all the advantages of short pulses mentioned in the earlier chapters. The concept of a WDM system using spectral slicing of short pulses is presented in Section 7.1. First and second generation system implementations and measurement results are presented in Section 7.2. Finally, the conclusions and possible improvements to the system are discussed.
7.1 Concept of WDM with short pulses
Femtosecond pulses have a very large bandwidth. A 150 fs pulse has a spectral width
of roughly 5 nm. This spectrum can be divided to form different channels. A train of
short pulses is represented by a train of Dirac-delta impulses in the frequency domain
with the envelope of these impulses determined by the Fourier transform of the pulse
shape.
$$\sum_{n=-\infty}^{+\infty} p(t-nT) \;\Longleftrightarrow\; \frac{1}{T}\,P(f)\sum_{n=-\infty}^{+\infty}\delta\!\left(f-\frac{n}{T}\right) \tag{7.1}$$

where P(f) is the Fourier transform of the pulse p(t) and T is the pulse repetition period. The separation of impulses in the frequency domain, 1/T, is the same as the repetition frequency of pulses in the time domain.

Figure 7.1: An exaggerated view of the frequency comb incident on the modulator array. Frequency components of the 80 MHz pulse train are separated in space by a blazed grating. (Annotations: fs pulses at 80 MHz, lens, blazed grating; 17 µm and 45.5 µm dimensions on the modulator array; ∼30 GHz and ∼73 GHz spectral extents.)
To keep the pulse sufficiently short for each channel, it is important to have a large number of impulse components in the frequency domain for each channel, because the duration of a channel's pulse is roughly inversely proportional to the bandwidth it occupies. Using a single component (effectively a single wavelength) for each channel would make the WDM implementation similar to a broadband source implementation, and many advantages of short pulses in interconnects would not be utilized. Fig. 7.1 shows an exaggerated view of different frequency components incident on an array of modulators. The modulator spacing shown in the figure corresponds to the implementation in this work. Because the envelope of the pulse spectrum, P(f), is not flat, different channels will encounter different optical powers. This variation in power needs to be accounted for in the optical power budget of the system.
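The comb structure of Eq. (7.1) can be checked numerically with a discrete Fourier transform. This is an illustration, not part of the original work.

- The samples per period (`PERIOD`), number of pulses (`N_PULSES`) and Gaussian pulse width (`SIGMA`) are arbitrary hypothetical values.
- A direct O(N²) DFT is used so the example needs only the standard library.

Because the sampled pulse train repeats every `PERIOD` samples, only every `N_PULSES`-th DFT bin is nonzero; this bin spacing corresponds to a frequency spacing of 1/T. Each line equals `N_PULSES` times the DFT of a single period, which is the discrete counterpart of the envelope P(f).

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) discrete Fourier transform (standard library only)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]

PERIOD = 16      # samples per repetition period T (hypothetical)
N_PULSES = 12    # number of periods in the record (hypothetical)
SIGMA = 1.5      # Gaussian pulse width in samples (hypothetical)

# One period containing a single short pulse p(t), centered in the period.
pulse = [math.exp(-((k - PERIOD / 2) ** 2) / (2 * SIGMA ** 2)) for k in range(PERIOD)]
# Pulse train sum_n p(t - nT): the same period repeated N_PULSES times.
train = pulse * N_PULSES

spectrum = dft(train)   # spectrum of the whole record
envelope = dft(pulse)   # P(f) sampled at multiples of 1/T

peak = max(abs(c) for c in spectrum)
lines = [m for m, c in enumerate(spectrum) if abs(c) > 1e-9 * peak]
print("nonzero bins:", lines)
print("line spacing in bins:", lines[1] - lines[0])
```

Making the pulse shorter (smaller `SIGMA`) spreads the envelope over more lines without changing their spacing. This mirrors why a femtosecond source offers many comb components per channel.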
A single short pulse source generating all the channels simplifies many system
criteria and provides multiple advantages. The benefits of the short pulse WDM
system with MQW diodes hybrid-integrated to silicon CMOS chips are summarized
below:
i. In traditional WDM, different lasers generate different channels, which need to be carefully monitored. If a laser frequency drifts, it can generate crosstalk in the neighboring channel. Using a single source to generate all the channels eliminates this problem. The linear dimension of the modulator defines the spectral width of a channel, and the separation of the modulators defines the guard band between the channels. Since these are fixed dimensions, no monitoring is required.
ii. All the advantages of short pulses mentioned in the earlier chapters can be utilized.
iii. Hybrid integration enables each channel to be placed very close to the electrical origin of the signal, reducing the propagation latency in the wires. In contrast, if multiple streams are time-multiplexed onto a single channel, the streams need to be routed to the multiplexer, incurring extra latency in the process. In WDM, channel multiplexing and demultiplexing can be accomplished using a passive optical component without incurring any latency penalty.

Figure 7.2: Schematic of the WDM system implementation (labels: fs pulses, transmitter chip, gratings, fiber, readout beam, receiver chip, high speed detector).
The schematic in Fig. 7.2 shows the principle of operation of the WDM system.
Short optical pulses (∼150 fs), generated by a Ti:Sapphire modelocked laser, are dispersed by the first grating into a spatial wavelength spread. A lens collimates the different wavelengths, which are then incident on the array of modulators. Each modulator modulates a small band of wavelengths. The modulated light is reflected back to the grating, where it is recombined into a single beam (multiplexing). A single mode fiber transports this beam to the destination. A second grating disperses the beam into a spatial wavelength spread (demultiplexing). The modulated channels are directed onto the corresponding receiver diodes, and the received data is converted to a full logic swing by the receiver.
It is important to note here that the pulses after modulation and multiplexing
are of the order of a few picoseconds. The width of the individual pulse is still short
compared to the bit period, thus retaining all the advantages of the short pulse link.
Dispersion in the fiber is not a concern because the distances involved are of the order of a few tens of meters. Even for longer distances, Shen et al. have demonstrated the transmission of spectrally sliced channels with a total span of 15 nm over a 2.5 km standard single mode fiber/dispersion-compensating fiber link with less than 3 ps timing skew [46].
The implementation of the proof-of-principle demonstration system is described in the next section.
7.2 System implementation
The diodes integrated on the silicon chips had a pitch of 62.5 µm. The ∼ 5 nm
bandwidth of the short pulse train was spatially distributed on 20 diodes, with each
two adjacent diodes forming a differential channel. The light falling between the
diodes is ideally absorbed, which forms a guard band between each diode. In a
system prone to misalignment this guard band avoids crosstalk between channels,
but it is not really required in a properly aligned system. Removing the guard band
will improve the power budget of the system. A channel spacing of 0.5 nm was used,
which corresponded to a frequency separation of ∼200 GHz. Each modulator had a 17 µm × 17 µm optical window. This modulator spacing was spectrally inefficient, but the pitch of the diode array was fixed for future operation with fiber ribbon. At 80 MHz (the repetition frequency of the laser), around 340 frequency components fall on each modulator and around 900 components fall on the space between the diodes. The number of frequency components modulated by each modulator was large enough to keep the pulse width in the picosecond range, which was still much shorter than the bit period.
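The spectral numbers in this section can be reproduced with a short calculation. The sketch makes these assumptions:

- The pulse is a transform-limited sech² shape (time-bandwidth product 0.315, typical of modelocked Ti:Sapphire lasers). A Gaussian shape (0.441) is included for comparison.
- The center wavelength is taken as 848 nm, inferred from the scan range of Fig. 7.5.
- The optical frequency is assumed to map linearly across the diode array.

The function names are hypothetical. The comb-line counts use the text's nominal ∼200 GHz per two-diode differential channel.

```python
C_M_PER_S = 299_792_458.0          # speed of light in vacuum

# FWHM time-bandwidth products of transform-limited pulses.
TBP = {"sech2": 0.315, "gaussian": 0.441}

def transform_limited_width_nm(pulse_fwhm_s, center_nm, shape="sech2"):
    """Minimum spectral FWHM (nm) for a pulse of the given temporal FWHM."""
    delta_nu = TBP[shape] / pulse_fwhm_s
    lam = center_nm * 1e-9
    return lam ** 2 * delta_nu / C_M_PER_S * 1e9

def wavelength_to_frequency_hz(delta_lambda_nm, center_nm):
    """Small wavelength interval to frequency interval: df = c * dlambda / lambda^2."""
    lam = center_nm * 1e-9
    return C_M_PER_S * delta_lambda_nm * 1e-9 / lam ** 2

def comb_lines(bandwidth_hz, rep_rate_hz):
    """Number of comb lines, spaced by the repetition rate, inside a bandwidth."""
    return bandwidth_hz / rep_rate_hz

CENTER_NM = 848.0                   # assumed center wavelength (Fig. 7.5 scan range)
PITCH_UM, WINDOW_UM = 62.5, 17.0    # diode pitch and modulator window
REP_RATE_HZ = 80e6                  # laser repetition rate
PER_DIODE_HZ = 200e9 / 2            # nominal ~200 GHz per two-diode differential channel

spectrum_nm = transform_limited_width_nm(150e-15, CENTER_NM)
spacing_hz = wavelength_to_frequency_hz(0.5, CENTER_NM)
on_modulator = comb_lines(PER_DIODE_HZ * WINDOW_UM / PITCH_UM, REP_RATE_HZ)
in_gap = comb_lines(PER_DIODE_HZ * (PITCH_UM - WINDOW_UM) / PITCH_UM, REP_RATE_HZ)

print(f"150 fs sech^2 pulse: {spectrum_nm:.1f} nm spectral FWHM")
print(f"0.5 nm channel spacing: {spacing_hz / 1e9:.0f} GHz")
print(f"comb lines on modulator: {on_modulator:.0f}, in gap: {in_gap:.0f}")
```

Under these assumptions, the script gives:

- about 5.0 nm of spectrum, consistent with "roughly 5 nm";
- about 208 GHz for a 0.5 nm spacing, which the text rounds to ∼200 GHz;
- 340 comb lines on each modulator and 910 in each gap, matching the ∼340 and ∼900 quoted above.

A Gaussian pulse of the same duration would need about 7 nm.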
A silicon chip designed in a 0.5 µm CMOS technology was used in testing. The chip consisted of a pseudo-random bit sequence (PRBS) generator driving an array of ten differential channels. Integrating receivers were used because of their high sensitivity. The
receivers were designed to either drive modulators for all-optical testing, or to drive
on-chip circuits for bit error rate testing.
Two generations of the system were implemented, the second fixing the shortcomings of the first, as described below.
7.2.1 Optical setup
A picture of the first generation optical setup is shown in Fig. 7.3. Spindler and
Hoyer components were used to assemble the system. Short pulses were directed to the chip by a polarizing beam-splitter. As a result, the modulated beam reflected from the chip, once its polarization was rotated by 90°, could be redirected in a different direction and coupled into a single mode fiber. A pellicle beam splitter was used to illuminate the chip with an infra-red LED, which helped align the input pulses by viewing the chip through a camera. This pellicle was removable to minimize the loss. Because the pellicle was very thin, it did not shift the position of the beam.
There were a few problems with this first generation optical setup:
i. The components were mounted at a height on thin rod structures, which made
them susceptible to mechanical vibrations.
ii. Any vibration causing angular variation in the grating caused large lateral motion
of the spots, and sometimes the spots moved off the optical devices.
iii. Losses in the system were too high to make the entire link work.
Another setup was designed to fix these issues. This second setup was built on baseplates, which provide a much more stable platform. The vibration problem encountered in the first setup was eliminated in this implementation. Baseplates are discussed in detail by Brubaker et al. [48] and a brief description is given in Chapter 3.

Figure 7.3: First generation optical setup using Spindler and Hoyer components. The portion on the transmitter side, including the transmitter chip and grating, is visible.

Figure 7.4: Second generation WDM link optical setup. A closeup of the receiver side is shown in the picture (labels: receiver chip, grating for demultiplexing, imaging camera, IR LED, pellicle beam splitter, multiplexed beam through the fiber).
Fig. 7.4 shows a closeup of the optical setup on the receiver side, where the receiver
chip, the imaging camera, and the grating are visible. Gold-coated echelle gratings
were used for multiplexing and demultiplexing different wavelength channels on the
MQW diode array. To align the input beam to the right receiver, the chip could be
observed through a camera. Once aligned, the pellicle beam splitter could be removed
from the setup without beam deviation to reduce the overall loss in the link.
Figure 7.5: CCD scan of the wavelength of the modulated transmitter output. Solid and dashed lines represent two snapshots at different times. The corresponding modulators are shown below the wavelength scan. (Plot: normalized reflected intensity, 0.1 to 1.0, versus wavelength, 846.5 to 849.5 nm; channels ch.1 to ch.4 marked, with λ = 847.1 nm and λ = 849.1 nm indicated; modulator array drawn below.)
7.2.2 Measurement results
By replacing the receiver chip with a CCD camera, channel definitions in the received
beam could be visualized. Different wavelengths were imaged linearly across the
camera. Fig. 7.5 shows the first four channels at two different time instances. All
four channels have changed their state. The finite contrast ratio of the modulators is
visible in the picture.
The losses in the second generation system were still quite high. Due to a large coupling loss into the single mode fiber, link operation through the fiber was not feasible. It was possible, however, to operate the link with light propagating in free space. A single channel was tested by externally driving a modulator with a 32-bit pseudo-random sequence. The drive signal of the modulator was correctly replicated by the receiver (Fig. 7.6). The receiver output was connected to an electrical pad to view the signal directly on the oscilloscope. This result demonstrates, in principle, the operation of a short-pulse-based WDM system [136].

Figure 7.6: 80 Mb/s operation of a single channel in a WDM link (traces: transmitted data, received data).
The main reasons contributing to the low received power in the system were: a) larger than
expected losses in optical components, such as the gratings; b) low contrast ratio of
the modulators (∼ 1.3); c) a large spacing between modulators, effectively reducing
spectral utilization efficiency. The next step would be to reduce the losses in the
system to be able to operate the WDM link with the fiber.
7.3 Summary
Combining all the advantages of short pulses, a spectrally sliced WDM system could
be implemented. The main features of such a system are: no need for wavelength
monitoring, receiver sensitivity enhancement, latency reduction in the receivers, and
synchronization of all the channels.
In this WDM implementation, fiber coupling losses were the final obstacle to operating the interconnect through the fiber. There are many ways in which the system and its components can be improved to increase the power budget and the performance. The spacing between the modulators can either be eliminated, or a more spectrally efficient method can be used, i.e., all the light intended for one channel can be focused on its modulator. Using micro-lenses is one way of focusing all the light on the modulator. Alternatively, using a frequency comb, the energy can be concentrated on the appropriate modulator locations [137]. Improving the contrast ratio of the modulators will also improve the system performance. The MQW diodes bonded on the silicon chips had a capacitance of ∼260 fF. By reducing this capacitance, the sensitivity of the integrating receiver can be improved, relaxing the system power budget.
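The link between capacitance and sensitivity follows from charge conservation in an integrating receiver. The photocharge Q = R·E must charge the input capacitance C to the required swing V, so the optical energy per bit scales as E = C·V/R.

The sketch below compares the ∼260 fF devices with the sub-50 fF devices mentioned in the conclusions. It makes these assumptions:

- The input swing (50 mV) and responsivity (0.5 A/W) are hypothetical.
- The total input capacitance is dominated by the MQW diode.
- Receiver noise is ignored.

```python
def min_energy_per_bit_fj(c_in_ff, v_swing_mv, responsivity_a_per_w):
    """Optical energy per bit an integrating receiver needs to develop `v_swing_mv`.

    The photocharge Q = R * E must charge the input capacitance to V, so E = C * V / R.
    """
    charge_c = c_in_ff * 1e-15 * v_swing_mv * 1e-3   # Q = C * V, in coulombs
    energy_j = charge_c / responsivity_a_per_w          # E = Q / R, in joules
    return energy_j * 1e15                              # femtojoules

# Hypothetical operating point: 50 mV input swing, 0.5 A/W responsivity.
for c_ff in (260.0, 50.0):
    print(f"C = {c_ff:.0f} fF -> {min_energy_per_bit_fj(c_ff, 50.0, 0.5):.1f} fJ per bit")
```

Whatever the exact swing and responsivity, the required energy falls in proportion to the capacitance. Going from 260 fF to 50 fF therefore relaxes the per-channel optical budget by about a factor of five.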
In conclusion, a proof-of-principle operation of a short-pulse-based WDM inter-
connect system, without fiber, was demonstrated.
Chapter 8
Conclusions
In providing dense interconnects with large bandwidths to silicon chips, optics can
be an alternative to electrical wires. Latency, power budget, and synchronization are
critical issues in interconnects. Optical links might be able to address these issues by
using unconventional means, i.e., short pulse signaling.
This dissertation has shown that the RZ data format with a low duty cycle (short pulses), instead of NRZ, can bring significant improvements in interconnect performance. Short pulses have a) all the energy concentrated in a very short time
(sub-picosecond and picoseconds); b) very sharp rising and falling edges; c) wide
bandwidth (few nm); and d) very low pulse-to-pulse jitter depending on the mode of
generation. Because of these properties, short pulses in interconnects may provide the
sensitivity enhancement and latency reduction in receivers, synchronization of large
modulator arrays, precise clock injection to silicon chips, and single-source WDM. A
short pulse system is feasible only in optics because of low attenuation and dispersion
during propagation.
The flip-chip bonding process described in Chapter 3 allows the integration of well-established high-performance silicon circuits with optically superior GaAs devices. This integration enables the use of short pulses in optical interconnects to silicon chips with very little added parasitics.
Chapter 4 showed that the sensitivity of transimpedance and integrating receivers
CHAPTER 8. CONCLUSIONS 107
could be enhanced by using short pulses. A 3 dB sensitivity enhancement for an integrating receiver was demonstrated. A transimpedance receiver operating with short pulses may generate voltage spikes on the supply, which can degrade its performance. In contrast, an integrating receiver integrates the charge in the pulses and does not generate such spikes. This receiver was therefore found to be better suited for operation with short pulses.
The latency of optical interconnects can be reduced to make them feasible for
global on-chip interconnects as shown in Chapter 5. The latency of three receiver
architectures (transimpedance, integrating, and totem-pole diode pair) was analyzed.
Short pulses significantly improved the performance of all three receivers. It was
demonstrated that the latency of the transimpedance receiver could be reduced by ∼65% by using short pulses compared to NRZ data. A totem-pole diode pair (“recless”)
receiver had the shortest delay in short pulse interconnects, but at the expense of
optical power.
In dense parallel interconnects, synchronization of all the bit streams on the receiver side for easy data recovery is a critical task. Skew and jitter of up to half a bit period can be removed from an entire array of modulators with a short pulse readout. Skew
and jitter removal of 3/8 of a bit period was demonstrated in Chapter 6. Compared to
schemes such as per-pin skew compensation, this scheme is simple and easily scalable
without reducing the density of interconnects. The laser output power requirement
scales linearly with the number of channels.
A precise skew and jitter-free clock is required in applications such as high speed
multiplexing and demultiplexing, analog-to-digital conversion, and precise sampling
of on-chip signals. It was demonstrated that by eliminating the receiver amplifier circuit and using only the diode pair, a precise clock can be injected into the circuit. Even though silicon detectors have a long-tail response at 850 nm because of carriers generated deep in the substrate, they were used because of their potential for lower capacitance and cost. The
high frequency response of these detectors was obtained using on-chip samplers. A
proof-of-principle experiment presented in Chapter 6 demonstrated the precise clock
injection with very low jitter.
Many advantages of short pulses were incorporated in the demonstration of a spectrally-sliced WDM interconnect system in Chapter 7. The main features of such a system are: no need for wavelength monitoring, receiver sensitivity enhancement, latency reduction in receivers, and synchronization of all the channels. This system was operated at 80 MHz, the repetition rate of the short pulse laser. Because the losses in the system were very high, transport of all the channels through fiber could not be demonstrated.
A very large IO throughput can be achieved by using flip-chip bonded MQW
diodes. In the present work, 200 diodes were integrated in an area of ∼ 1.2 × 1.2 mm2.
With a differential scheme, a total of 100 IO could be potentially operated. Assuming
a conservative speed of operation at 600 Mbps (it improves with technology scaling),
the total throughput could be 60 Gbps from this chip. This demonstrates the huge
throughput possible with optical interconnects.
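The throughput estimate above is simple arithmetic on the numbers in the text: 200 diodes, two diodes per differential channel, 600 Mb/s per channel, and an array of ∼1.2 × 1.2 mm². The sketch below reproduces it and adds the corresponding throughput per unit area. The function names are hypothetical, and the area is only the approximate figure quoted.

```python
def aggregate_throughput_gbps(n_diodes, rate_mbps, diodes_per_channel=2):
    """Aggregate optical I/O throughput of a diode array (differential: two diodes per channel)."""
    channels = n_diodes // diodes_per_channel
    return channels * rate_mbps / 1000.0

def areal_density_gbps_per_mm2(throughput_gbps, width_mm, height_mm):
    """Throughput per unit of chip area occupied by the diode array."""
    return throughput_gbps / (width_mm * height_mm)

total = aggregate_throughput_gbps(200, 600)
print(f"{total:.0f} Gb/s, {areal_density_gbps_per_mm2(total, 1.2, 1.2):.1f} Gb/s per mm^2")
```

Because the per-channel rate enters linearly, any speed increase from technology scaling raises the aggregate figure proportionally.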
Future work
This dissertation has tried to explore short pulse (RZ) signaling in interconnects. This
work has just scratched the surface of this potentially vast field. Dense interconnects
to silicon chips, and global on-chip interconnects might be practical using short pulses.
To demonstrate the feasibility, systems with the possibility of miniaturization need
to be built. It means that the packaging of optical systems becomes a critical issue.
Traditionally, packaging has been one of the bottlenecks in the widespread implementation of optics. Considerable effort is being devoted to miniaturizing optical systems and improving the optomechanics. Modelocked semiconductor lasers are getting small
enough to fit into a reasonably sized system, though more research is required in this
area. Work on the optical bridges [138] to simplify and miniaturize the optomechanics
is a step forward for improved packaging.
The interconnect system in the present work can be improved in many ways.
On the component side, lower capacitance and high contrast ratio devices will be
very helpful. Devices used in this dissertation work had a capacitance of ∼ 260 fF.
Smaller MQW devices can be fabricated to reduce the capacitance to below 50 fF [30].
Flip-chip bonding on silicon-on-insulator circuits will further reduce the capacitance of these devices. The contrast ratio of these devices needs to be enhanced for a better signal-to-noise ratio in interconnects. Low contrast devices are easily saturated because of the high power required to get sufficient signal strength. The circuits presented in this work are meant to demonstrate the properties and advantages of short pulses. Optimizing these circuits specifically for operation with short pulses will create a more efficient interconnect system.
The scaling of CMOS technology will help improve the performance of modulator
drivers and receivers. However, it will also create new challenges. A lower supply
voltage will make it harder to get a high contrast ratio from the modulators. These
modulators will need to be redesigned to operate with smaller swings; failing that, the circuits will need to provide a larger-than-supply swing to operate the modulators.
Bibliography
[1] J. Goodman, F. Leonberger, S. Kung, and R. Athale, “Optical Interconnections
for VLSI Systems,” Proceedings of the IEEE, vol. 72, pp. 850–866, 1984.
[2] J. Goodman, “Fan-in and fan-out with optical interconnects,” Optica Acta,
vol. 32, pp. 1489–1496, 1985.
[3] M. Feldman, S. Esener, C. Guest, and S. Lee, “Comparison between optical
and electrical interconnects based on power and speed considerations,” Applied
Optics, vol. 27, no. 9, pp. 3820–3829, 1988.
[4] D. Miller and H. Ozaktas, “Limit to the Bit-Rate Capacity of Electrical Interconnects from the Aspect Ratio of the System Architecture,” Journal of Parallel and Distributed Computing, vol. 41, pp. 42–52, Feb. 1997.
[5] D. Miller, “Physical Reasons for Optical Interconnection,” International Journal of Optoelectronics, vol. 11, pp. 155–168, May 1997.
[6] D. Miller, “Dense Optical Interconnections for Silicon Electronics,” in Trends in Optics: Research, Developments, and Applications (A. Consortini, ed.), vol. 3, pp. 207–222, 1996.
[7] D. Miller, “Rationale and Challenges for Optical Interconnects to Electronic
Chips,” Proceedings of the IEEE, vol. 88, pp. 728–749, June 2000.
[8] A. Krishnamoorthy and D. Miller, “Scaling Optoelectronic-VLSI Circuits into the 21st Century: A Technology Roadmap,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 2, pp. 55–76, Apr. 1996.
BIBLIOGRAPHY 111
[9] D. Miller, “Optics for low-energy communication inside digital processors:
quantum detectors, sources, and modulators as efficient impedance converters,”
Optics Letters, vol. 14, no. 2, pp. 146–148, 1989.
[10] A. Krishnamoorthy and D. Miller, “Firehose architectures for free-space optically interconnected VLSI circuits,” Journal of Parallel and Distributed Computing, vol. 41, pp. 109–114, Feb. 1997.
[11] H. Ozaktas and J. Goodman, “Implications of interconnection theory for optical
digital computing,” Applied Optics, vol. 31, no. 26, pp. 5559–5567, 1992.
[12] M. Haney and M. Christensen, “Performance Scaling Comparison for Free-
Space Optical and Electrical Interconnection Approaches,” Applied Optics,
vol. 37, pp. 2886–2894, May 1998.
[13] G. Yayla, P. Marchand, and S. Esener, “Speed and Energy Analysis of Digital Interconnections: Comparison of On-Chip, Off-Chip, and Free-Space Technologies,” Applied Optics, vol. 37, pp. 205–227, Jan. 1998.
[14] E. Berglind, L. Thylen, B. Jaskorzynska, and C. Svensson, “A comparison of dissipated power and signal-to-noise ratios in electrical and optical interconnects,” Journal of Lightwave Technology, vol. 17, pp. 68–73, Jan. 1999.
[15] W. Dally and J. Poulton, “Transmitter equalization for 4-Gbps signaling,” IEEE
Micro, vol. 17, pp. 48–56, Jan. 1997.
[16] M. Horowitz, C. Yang, and S. Sidiropoulos, “High-speed electrical signaling:
overview and limitations,” IEEE Micro, pp. 12–24, Jan. 1998.
[17] A. Lentine, K. Goossen, J. Walker, L. Chirovsky, L. D’Asaro, S. Hui, B. Tseng,
R. Leibenguth, D. Kossives, D. Dahringer, D. Bacon, T. Woodward, and
D. Miller, “Arrays of optoelectronic switching nodes comprised of flip-chip-
bonded MQW modulators and detectors on silicon CMOS circuitry,” IEEE
Photonics Technology Letters, vol. 8, pp. 221–223, Feb. 1996.
[18] D. Cutrer and K. Lau, “Ultralow power optical interconnect with zero-biased,
ultralow threshold laser-how low a threshold is low enough?,” IEEE Photonics
Technology Letters, vol. 7, pp. 4–6, Jan. 1995.
[19] R. Pu, C. Duan, and C. Wilmsen, “Hybrid integration of VCSEL’s to CMOS
integrated circuits,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 5,
pp. 201–208, Mar. 1999.
[20] A. Andreou, Z. Kalayjian, A. Apsel, P. Pouliquen, R. Athale, G. Simonis, and
R. Reedy, “Silicon on sapphire CMOS for optoelectronic microsystems,” IEEE
Circuits and Systems Magazine, vol. 1, no. 3, pp. 22–30, 2001.
[21] K. Choquette, V. Hietala, K. Geib, S. Mukherjee, and A. Allerman, “Hybrid
integrated VCSEL and driver arrays for optical interconnects,” in 13th Annual
Meeting of IEEE Lasers and Electro-Optics Society, vol. 2, pp. 424–425, 2000.
[22] F. Delpiano, B. Bostica, M. Burzio, P. Pellegrino, and L. Pesando, “10-channel
optical transmitter module operating over 10 Gb/s based on VCSEL and hybrid
integrated silicon optical bench,” in Electronic Components and Technology
Conference, pp. 759–762, 1999.
[23] K. Ebeling, “VCSELs: prospects and challenges for optical interconnects,” in
13th Annual Meeting of IEEE Lasers and Electro-Optics Society, vol. 1, pp. 7–8,
2000.
[24] D. Miller, D. Chemla, T. Damen, A. Gossard, W. Wiegmann, T. Wood, and
C. Burrus, “Band edge Electro-absorption in Quantum Well Structures: The
Quantum Confined Stark Effect,” Physical Review Letters, vol. 53, pp. 2173–
2177, Nov. 1984.
[25] G. Boyd, D. Miller, D. Chemla, S. McCall, A. Gossard, and J. English, “Mul-
tiple Quantum Well Reflection Modulator,” Applied Physics Letters, vol. 50,
pp. 1119–1121, Apr. 1987.
[26] R. Simes, R. Yan, C. Barron, D. Derrickson, D. Lishan, J. Karin, L. Coldren,
M. Rodwell, S. Elliot, and B. Hughes, “High-frequency electrooptic Fabry-Perot
modulators,” IEEE Photonics Technology Letters, vol. 3, pp. 513–515, June
1991.
[27] K. Goossen, J. Cunningham, W. Jan, and R. Leibenguth, “On the operational
and manufacturing tolerances of GaAs-AlAs MQW modulators,” IEEE Journal
of Quantum Electronics, vol. 34, pp. 431–438, Mar. 1998.
[28] M. Islam, R. Hillman, D. Miller, D. Chemla, A. Gossard, and J. English, “Elec-
troabsorption in GaAs/AlGaAs Coupled Quantum Well Waveguides,” Applied
Physics Letters, vol. 50, pp. 1098–1100, Apr. 1987.
[29] G. Livescu, D. Miller, T. Sizer, D. Burrows, J. Cunningham, A. Gossard, and
J. English, “High-speed absorption recovery in quantum well diodes by diffusive
electrical conduction,” Applied Physics Letters, vol. 54, pp. 748–750, 1989.
[30] K. Goossen, J. Walker, L. D’Asaro, S. Hui, B. Tseng, R. Leibenguth, D. Kos-
sives, D. Bacon, D. Dahringer, L. Chirovsky, A. Lentine, and D. Miller, “GaAs
MQW modulators integrated with silicon CMOS,” IEEE Photonics Technology
Letters, vol. 7, pp. 360–362, Apr. 1995.
[31] F. Kiamilev, J. Lambirth, R. Rozier, and A. Krishnamoorthy, “Design of a 64-
bit, 100 MIPS microprocessor core IC for hybrid CMOS-SEED technology,” in
Proceedings of the Third International Conference on Massively Parallel Pro-
cessing Using Optical Interconnections, Oct. 1996.
[32] R. Rozier and F. Kiamilev, “Design of an MCM FFT processor,” in IEEE
Multi-Chip-Module Conference, pp. 83–88, Feb. 1997.
[33] A. Walker, T. Yang, J. Gourlay, J. Dines, M. Forbes, S. Prince, D. Baillie,
D. Neilson, R. Williams, L. Wilkinson, and G. Smith, “Optoelectronic systems
based on InGaAs-complementary-metal-oxide-semiconductor smart-pixel arrays
and free-space optical interconnects,” Applied Optics, vol. 37, pp. 2822–2830,
May 1998.
[34] O. Kibar, D. Van Blerkom, F. Chi, and S. Esener, “Power minimization and
technology comparisons for digital free-space optoelectronic interconnections,”
Journal of Lightwave Technology, vol. 17, pp. 546–555, Apr. 1999.
[35] C. Fan, B. Mansoorian, D. Van Blerkom, M. Hansen, V. Ozguz, S. Esener, and
G. Marsden, “Digital free-space optical interconnections: a comparison of trans-
mitter technologies,” Applied Optics, vol. 34, pp. 3103–3115, June 1995.
[36] T. Nakahara, S. Matsuo, S. Fukushima, and T. Kurokawa, “Performance com-
parison between multiple-quantum-well modulator-based and vertical-cavity-
surface-emitting laser-based smart pixels,” Applied Optics, vol. 35, pp. 860–871,
Feb. 1996.
[37] J. Goodman, Introduction to Fourier Optics. New York: McGraw-Hill, 1968.
[38] L. Camp, R. Sharma, and M. Feldman, “Guided-wave and free-space optical
interconnects for parallel-processing systems: a comparison,” Applied Optics,
vol. 33, pp. 6168–6180, Sept. 1994.
[39] S. Esener, “Implementation and prospects for chip-to-chip free-space optical
interconnects,” in Electron Devices Meeting, 2001.
[40] P. Rosenberg, K. Giboney, A. Yuen, J. Straznicky, D. Haritos, L. Buckman,
R. Schneider, S. Corzine, F. Kiamilev, and D. Dolfi, “The PONI-1 parallel-
optical link,” in Proceedings of the Electronic Components and Technology Conference,
pp. 763–769, June 1999.
[41] N. Boden, D. Cohen, R. Felderman, A. Kulawik, C. Seitz, J. Seizovic, and
W.-K. Su, “Myrinet: a gigabit-per-second local area network,” IEEE Micro,
vol. 15, pp. 29–36, Feb. 1995.
[42] “The International Technology Roadmap for Semiconductors (2001 Edition).”
[43] R. Ho, K. Mai, and M. Horowitz, “The Future of Wires,” Proceedings of the
IEEE, vol. 89, pp. 490–504, Apr. 2001.
[44] K. Tamura, “Short pulse lasers and their applications to optical communica-
tions,” in IEEE Lasers and Electro-Optics Society, vol. 2, pp. 537–538, 1999.
[45] E. Avrutin, J. Marsh, and E. Portnoi, “Monolithic and multi-gigahertz mode-locked
semiconductor lasers: constructions, experiments, models and applications,”
IEE Proceedings of Optoelectronics, vol. 147, pp. 251–278, Aug. 2000.
[46] S. Shen and A. Weiner, “Demonstration of timing skew compensation for bit-parallel
WDM data transmission with picosecond precision,” IEEE Photonics
Technology Letters, vol. 11, pp. 566–568, May 1999.
[47] L. Boivin, M. Nuss, J. Shah, D. Miller, and H. Haus, “Receiver sensitivity
improvement by impulsive coding,” IEEE Photonics Technology Letters, vol. 9,
pp. 684–686, May 1997.
[48] J. Brubaker, F. McCormick, F. Tooley, J. Sasian, T. Cloonan, A. Lentine,
S. Hinterlong, and M. Herron, “Optomechanics of a free-space photonic switch:
the components,” in Proceedings of the SPIE, vol. 1533, Dec. 1991.
[49] H. P. Herzig, Micro-Optics: Elements, Systems and Applications. Taylor
& Francis Inc., 1997.
[50] J. Jahns and S. Sinzinger, Microoptics. John Wiley & Sons, 1999.
[51] J. Lin, J. Gamelin, S. Wang, M. Hong, and J. Mannaerts, “Short pulse gen-
eration by electrical gain switching of vertical cavity surface emitting laser,”
Electronics Letters, vol. 27, pp. 1956–1958, Oct. 1991.
[52] N. Stelmakh, J.-M. Lourtioz, G. Marquebielle, G. Volluet, and J.-P. Hirtz,
“Generation of high-energy (0.3 µJ) short pulses (400 ps) from a gain-switched
laser diode stack with subnanosecond electrical pump pulses,” IEEE Journal
of Selected Topics in Quantum Electronics, vol. 3, pp. 245–249, Apr. 1997.
[53] C. Chang, C. Sun, D. Albares, and E. Jacobs, “High-energy (59 pJ) and
low-jitter (250 fs) picosecond pulses from gain-switching of a tapered-stripe
laser diode via resonant driving,” IEEE Photonics Technology Letters, vol. 8,
pp. 1157–1159, Sept. 1996.
[54] B.-L. Lee and C.-F. Lin, “Short-pulse generation with broad-band tunability
from semiconductor lasers in an external ring cavity,” IEEE Photonics Tech-
nology Letters, vol. 12, pp. 618–620, June 2000.
[55] S. Arahira, Y. Matsui, T. Kunii, S. Oshiba, and Y. Ogawa, “Optical short pulse
generation at high repetition rate over 80 GHz from a monolithic passively
modelocked DBR laser diode,” Electronics Letters, vol. 29, pp. 1013–1015, May
1993.
[56] L. Krainer, R. Paschotta, G. Spuhler, I. Klimov, C. Teisset, K. Weingarten,
and U. Keller, “Tunable picosecond pulse-generating laser with repetition rate
exceeding 10 GHz,” Electronics Letters, vol. 38, pp. 225–227, Feb. 2002.
[57] A. Krishnamoorthy, T. Woodward, R. Novotny, K. Goossen, J. Walker,
A. Lentine, L. D’Asaro, S. Hui, B. Tseng, R. Leibenguth, D. Kossives,
D. Dahringer, L. Chirovsky, G. Aplin, R. Rozier, F. Kiamilev, and D. Miller,
“Ring oscillators with optical and electrical readout based on hybrid GaAs
MQW modulators bonded to 0.8 µm silicon VLSI circuits,” Electronics Letters,
vol. 31, pp. 1917–1918, Oct. 1995.
[58] T. Woodward, A. Krishnamoorthy, K. Goossen, J. Walker, B. Tseng, J. Lothian,
S. Hui, and R. Leibenguth, “Modulator-driver circuits for optoelectronic VLSI,”
IEEE Photonics Technology Letters, vol. 9, pp. 839–841, June 1997.
[59] E. McCluskey, Logic Design Principles: with Emphasis on Testable Semicustom
Circuits. Prentice-Hall, 1986.
[60] S. Golomb, Shift Register Sequences. Aegean Park Press, 1982.
[61] E. Yeung and M. Horowitz, “A 2.4 Gb/s/pin Simultaneous Bidirectional Parallel
Link with Per-Pin Skew Compensation,” IEEE Journal of Solid-State Circuits, vol. 35,
pp. 1619–1628, Nov. 2000.
[62] P. Larsson and C. Svensson, “Measuring high-bandwidth signals in CMOS circuits,”
Electronics Letters, vol. 29, pp. 1761–1762, Sept. 1993.
[63] R. Ho, B. Amrutur, K. Mai, B. Wilburn, T. Mori, and M. Horowitz, “Applications
of on-chip samplers for test and measurement of integrated circuits,” in
IEEE Symposium on VLSI Circuits, pp. 138–139, June 1998.
[64] S. Tewksbury, L. Hornak, H. Nariman, S. Langsjoen, and S. McGinnis, “Cointegration
of optoelectronics and submicron CMOS,” in Proceedings of Wafer Scale Integration,
pp. 358–367, Jan. 1993.
[65] A. Krishnamoorthy and K. Goossen, “Optoelectronic-VLSI: photonics integrated
with VLSI circuits,” IEEE Journal of Selected Topics in Quantum Electronics,
vol. 4, pp. 899–912, Nov. 1998.
[66] A. Krishnamoorthy, A. Lentine, K. Goossen, J. Walker, T. Woodward,
J. Ford, G. Aplin, L. D’Asaro, S. Hui, B. Tseng, R. Leibenguth, D. Kossives,
D. Dahringer, M. Chirovsky, and D. Miller, “3-D integration of MQW modulators
over active submicron CMOS circuits: 375 Mb/s transimpedance receiver-transmitter
circuit,” IEEE Photonics Technology Letters, vol. 7, pp. 1288–1290,
Nov. 1995.
[67] H. Wang, J. Luo, K. Shenoy, Y. Royter, C. G. Fonstad, Jr., and D. Psaltis,
“Monolithic integration of SEEDs and VLSI GaAs circuits by epitaxy on elec-
tronics,” IEEE Photonics Technology Letters, vol. 9, pp. 607–609, May 1997.
[68] M. Oren, A. McCarthy, F. Tooley, A. Laprise, D. Plant, A. Kirk, Y. Lu, and
J. Zhao, “Device processing technology for free-space optical interconnect sys-
tem,” in Electronic Components and Technology Conference, pp. 886–889, 2001.
[69] H. Chen, K. Liang, Q. Zeng, X. Li, Z. Chen, Y. Du, and R. Wu, “Flip-chip
bonded hybrid CMOS/SEED optoelectronic smart pixels,” IEE Proceedings of
Optoelectronics, vol. 147, pp. 2–6, Feb. 2000.
[70] S. Personick, “Receiver design for optical fiber systems,” Proceedings of the
IEEE, vol. 65, no. 12, pp. 1670–1678, 1977.
[71] T. Woodward, A. Krishnamoorthy, A. Lentine, and L. Chirovsky, “Optical receivers
for optoelectronic VLSI,” IEEE Journal of Selected Topics in Quantum
Electronics, vol. 2, pp. 106–116, Apr. 1996.
[72] T. Nakahara, H. Tsuda, K. Tateno, S. Matsuo, and T. Kurokawa, “Hybrid
integration of GaAs pin-photodiodes with CMOS transimpedance amplifier cir-
cuits,” Electronics Letters, vol. 34, pp. 1352–1353, June 1998.
[73] G. Halkias, N. Haralabidis, E. Kyriakis-Bitzaros, and S. Katsafouros, “1.7
GHz bipolar optoelectronic receiver using conventional 0.8 µm BiCMOS
process,” in IEEE International Symposium on Circuits and Systems,
vol. 5, pp. 417–420, 2000.
[74] N. Dutta, K. Tu, and B. Levine, “Optoelectronic integrated receiver,” Electronics
Letters, vol. 33, pp. 1254–1255, July 1997.
[75] J. Choi, B. Sheu, and O. Chen, “A monolithic GaAs receiver for optical interconnect
systems,” IEEE Journal of Solid-State Circuits, vol. 29, pp. 328–331,
Mar. 1994.
[76] H. Zimmermann, T. Heide, and A. Ghazi, “Monolithic high-speed CMOS-photoreceiver,”
IEEE Photonics Technology Letters, vol. 11, pp. 254–256, Feb. 1999.
[77] A. Tanabe, M. Soda, Y. Nakahara, T. Tamura, Y. Yoshida, and A. Furukawa,
“A Single-Chip 2.4-Gb/s CMOS Optical Receiver IC with Low Substrate Cross-Talk
Preamplifier,” IEEE Journal of Solid-State Circuits, vol. 33, pp. 2148–2153,
Dec. 1998.
[78] A. Krishnamoorthy, T. Woodward, K. Goossen, J. Walker, A. Lentine, L. Chi-
rovsky, S. Hui, B. Tseng, R. Leibenguth, J. Cunningham, and W. Jan, “Op-
eration of a single-ended 550 Mbit/s, 41 fJ, hybrid CMOS/MQW receiver-
transmitter,” Electronics Letters, vol. 32, pp. 764–766, Apr. 1996.
[79] T. Woodward and L. Chirovsky, “Operation of diode-clamped FET-SEED optical
receivers with low-contrast single-ended signals,” IEEE Photonics Technology
Letters, vol. 7, pp. 1489–1492, Dec. 1995.
[80] T. Yoon and B. Jalali, “1 Gbit/s fibre channel CMOS transimpedance ampli-
fier,” Electronics Letters, vol. 33, pp. 588–589, Mar. 1997.
[81] D. Van Blerkom, C. Fan, M. Blum, and S. Esener, “Transimpedance receiver design
optimization for smart pixel arrays,” Journal of Lightwave Technology,
vol. 16, pp. 119–126, Jan. 1998.
[82] M. Forbes, Electronic design issues in high-bandwidth parallel optical interfaces
to VLSI circuits. PhD thesis, Heriot-Watt University, Mar. 1999.
[83] T. Woodward, “Optical receivers for smart pixel applications,” in Lasers and
Electro-Optics Society Annual Meeting, vol. 1, pp. 67–68, 1995.
[84] P. Winzer and A. Kalmar, “Sensitivity enhancement of optical receivers by
impulsive coding,” Journal of Lightwave Technology, vol. 17, pp. 171–177, Feb.
1999.
[85] J. Dines, “Smart pixel optoelectronic receiver based on a charge sensitive ampli-
fier design,” IEEE Journal on Selected Topics in Quantum Electronics, vol. 2,
pp. 117–120, Apr. 1996.
[86] T. Woodward, A. Krishnamoorthy, K. Goossen, J. Walker, J. Cunningham,
W. Jan, L. Chirovsky, S. Hui, B. Tseng, D. Kossives, D. Dahringer, D. Bacon,
and R. Leibenguth, “Clocked-sense-amplifier-based smart-pixel optical receivers,”
IEEE Photonics Technology Letters, vol. 8, pp. 1067–1069, Aug. 1996.
[87] M. Kuijk, D. Coppee, and R. Vounckx, “Spatially modulated light detector in
CMOS with sense-amplifier receiver operating at 180 Mb/s for optical data link
applications and parallel optical interconnects between chips,” IEEE Journal
of Selected Topics in Quantum Electronics, vol. 4, pp. 1040–1045, Nov. 1998.
[88] M. Matsui, H. Hara, Y. Uetani, L.-S. Kim, T. Nagamatsu, Y. Watanabe,
A. Chiba, K. Matsuda, and T. Sakurai, “A 200 MHz 13 mm² 2-D DCT
macrocell using sense-amplifying pipeline flip-flop scheme,” IEEE Journal of
Solid-State Circuits, vol. 29, pp. 1482–1490, Dec. 1994.
[89] B. Nikolic, V. Oklobdzija, V. Stojanovic, W. Jia, J. Chiu, and M. Leung,
“Improved sense-amplifier-based flip-flop: design and measurements,”
IEEE Journal of Solid-State Circuits, vol. 35, pp. 876–884, June 2000.
[90] K. Williams, M. Dennis, I. Duling, C. Villarruel, and R. Esman, “A simple
high-speed high-output voltage digital receiver,” IEEE Photonics Technology Letters,
vol. 10, pp. 588–590, Apr. 1998.
[91] M. Yoneyama, K. Takahata, T. Otsuji, and Y. Akazawa, “Analysis and applica-
tion of a novel model for estimating power dissipation of optical interconnections
as a function of transmission bit error rate,” Journal of Lightwave Technology,
vol. 14, pp. 13–22, Jan. 1996.
[92] G. Keeler, D. Agarwal, B. Nelson, N. Helman, and D. Miller, “Performance
enhancement of an optical interconnect using short pulses from a modelocked
diode laser,” in Conference on Lasers and Electro-Optic Society, 2002.
[93] W. Dally and J. Poulton, Digital Systems Engineering. Cambridge University
Press, 1998.
[94] A. Dowlatabadi, “Challenges in CMOS mixed-signal designs for analog circuit
designers,” in Midwest Symposium on Circuits and Systems, vol. 1, pp. 47–50,
1997.
[95] G. Keeler, D. Agarwal, C. Debaes, B. Nelson, N. Helman, H. Thienpont, and
D. Miller, “Optical pump-probe measurements of the latency of silicon CMOS
optical interconnects,” IEEE Photonics Technology Letters, vol. 14, pp. 1214–
1216, Aug. 2002.
[96] D. Agarwal, G. Keeler, B. Nelson, N. Helman, and D. Miller, “Optical inter-
connect operation with high noise immunity,” in Conference on Lasers and
Electro-Optic Society, 2002.
[97] A. Deutsch, P. Coteus, G. Kopcsay, H. Smith, C. Surovic, B. Krauter, D. Edel-
stein, and P. Restle, “On-chip wiring design challenges for gigahertz operation,”
Proceedings of the IEEE, vol. 89, pp. 529–555, Apr. 2001.
[98] J. Dambre, H. Van Marck, and J. Van Campenhout, “Quantifying the impact
of optical interconnect latency on the performance of optoelectronic FPGAs,”
in The 6th International Conference on Parallel Interconnects, pp. 91–97, Oct.
1999.
[99] J. Collet, D. Litaize, J. Van Campenhout, C. Jesshope, M. Desmulliez, H. Thienpont,
J. Goodman, and A. Louri, “Architectural approach to the role of optics in
monoprocessor and multiprocessor machines,” Applied Optics, vol. 39, pp. 671–
682, Feb. 2000.
[100] H. Neefs, P. Van Heuven, and J. Van Campenhout, “Latency requirements of
optical interconnects at different memory hierarchy levels of a computer sys-
tem,” in Proceedings of SPIE on Optics Computing, vol. 3490, pp. 552–555,
1998.
[101] E. Kyriakis-Bitzaros, N. Haralabidis, Y. Moisiadis, M. Lagadas, A. Georgakilas,
and G. Halkias, “Comparison of the signal latency in optical and electrical
interconnections for interchip links,” Optical Engineering, vol. 40, pp. 144–146,
Jan. 2001.
[102] B. Cherkauer and E. Friedman, “A unified design methodology for CMOS ta-
pered buffers,” IEEE Transactions on Very Large Scale Integration (VLSI) Sys-
tems, vol. 3, pp. 99–111, Mar. 1995.
[103] E. Kyriakis-Bitzaros, N. Haralabidis, M. Lagadas, A. Georgakilas, Y. Moisiadis,
and G. Halkias, “Realistic end-to-end simulation of the optoelectronic links
and comparison with the electrical interconnections for system-on-chip applica-
tions,” Journal of Lightwave Technology, vol. 19, pp. 1532–1542, Oct. 2001.
[104] J. Weiland, H. Melchior, M. Kearley, C. Morris, A. Moseley, M. Goodwin,
and R. Goodfellow, “Optical receiver array in silicon bipolar technology with
selfaligned, low parasitic III/V detectors for DC-1 Gbit/s parallel links,” Elec-
tronics Letters, vol. 27, pp. 2211–2213, Nov. 1991.
[105] D. Agarwal and D. Miller, “Latency in short pulse based optical interconnects,”
in The 14th Annual Meeting of the IEEE Lasers and Electro-Optics Society,
vol. 2, pp. 812–813, 2001.
[106] B. Wooley, “EE315 class notes.”
[107] M. Horowitz, “EE372 class notes.”
[108] H. Johansson and C. Svensson, “Time Resolution of NMOS Sampling Switches
Used on Low-Swing Signals,” IEEE Journal of Solid-State Circuits, vol. 33,
pp. 237–245, Feb. 1998.
[109] G. Keeler, B. Nelson, D. Agarwal, and D. Miller, “Skew and jitter removal using
short optical pulses for optical interconnection,” IEEE Photonics Technology
Letters, vol. 12, pp. 1041–1135, June 2000.
[110] P. Restle, T. McNamara, D. Webber, P. Camporese, K. Eng, K. Jenkins,
D. Allen, M. John, M. Quaranta, D. Boerstler, C. Alpert, C. Carter, R. Bai-
ley, and J. Petrovick, “A clock distribution network for microprocessors,” IEEE
Journal of Solid-State Circuits, vol. 36, pp. 792–799, June 2000.
[111] X. Jiang and S. Horiguchi, “Optimization of wafer scale H-tree clock distribu-
tion network based on a new statistical skew model,” in IEEE International
Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 96–104, 2000.
[112] J.-H. Yeh, R. Kostuk, and K.-Y. Tu, “Board level H-tree optical clock distribution
with substrate mode holograms,” Journal of Lightwave Technology,
vol. 13, pp. 1566–1578, July 1995.
[113] H. Fair and D. Bailey, “Clocking design and analysis for a 600 MHz Alpha
microprocessor,” in IEEE Solid-State Circuits Conference, pp. 398–399, 1998.
[114] Y. Ismail, E. Friedman, and J. Neves, “Exploiting the on-chip inductance in
high-speed clock distribution networks,” IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, vol. 9, pp. 963–973, Dec. 2001.
[115] G. Pratt and J. Nguyen, “Distributed synchronous clocking,” IEEE Transac-
tions on Parallel and Distributed Systems, pp. 316–330, Mar. 1995.
[116] P. Varma and K. Ramganesh, “Skewing clock to decide races - double-edge-
triggered flip-flop,” Electronics Letters, vol. 37, pp. 1506–1507, Dec. 2001.
[117] P. Zarkesh-Ha, T. Mule, and J. Meindl, “Characterization and modeling of clock
skew with process variations,” in Proceedings of the IEEE Custom Integrated
Circuits, pp. 441–444, May 1999.
[118] V. Mehrotra and D. Boning, “Technology scaling impact of variation on clock
skew and interconnect delay,” in Interconnect Technology Conference, pp. 122–
124, June 2001.
[119] C. Zhao and R. Chen, “Performance consideration of three-dimensional opto-
electronic interconnection for intra-multichip-module clock signal distribution,”
Applied Optics, pp. 2537–2544, Apr. 1997.
[120] P. Delfyett, D. Hartman, and S. Ahmad, “Optical clock distribution using a
mode-locked semiconductor laser diode system,” Journal of Lightwave Technol-
ogy, vol. 9, pp. 1646–1649, Dec. 1991.
[121] S. Kawanishi, Y. Yamabayashi, T. Takada, H. Takara, M. Saruwatari, and
K. Nakagawa, “2 Gb/s operation of an optical-clock-driven monolithically in-
tegrated GaAs D-flip-flop with metal-semiconductor-metal photodetectors for
high-speed synchronous circuits,” IEEE Photonics Technology Letters, vol. 4,
pp. 160–162, Feb. 1992.
[122] T. Woodward and A. Krishnamoorthy, “1-Gb/s integrated optical detectors and
receivers in commercial CMOS technologies,” IEEE Journal on Selected Topics
in Quantum Electronics, vol. 5, pp. 146–156, Mar. 1999.
[123] G. E. Stillman, V. M. Robbins, and N. Tabatabaie, “III-V compound semiconductor
devices: Optical detectors,” IEEE Transactions on Electron Devices,
vol. ED-31, pp. 1643–1655, 1984.
[124] R. Perry, “Analysis and characterization of the spectral response of CMOS
based integrated circuit (IC) photodetectors,” in Proceedings of the Thir-
teenth Biennial University/Government/Industry Microelectronics Symposium,
pp. 170 – 175, June 1999.
[125] S. Csutak, J. Schaub, W. Wu, and J. Campbell, “High-speed monolithically in-
tegrated silicon optical receiver fabricated in 130-nm CMOS technology,” IEEE
Photonics Technology Letters, vol. 14, pp. 516–518, Apr. 2002.
[126] T. Heide, A. Ghazi, H. Zimmermann, and P. Seegebrecht, “Monolithic CMOS
photoreceivers for short-range optical data communications,” Electronics Let-
ters, vol. 35, pp. 1655–1656, Sept. 1999.
[127] C. Rooman, D. Coppee, and M. Kuijk, “Asynchronous 250-Mb/s optical re-
ceivers with integrated detector in standard CMOS technology for optocoupler
applications,” IEEE Journal of Solid-State Circuits, vol. 35, pp. 953–958, July
2000.
[128] C. Schow, J. Schaub, R. Li, J. Qi, and J. Campbell, “A monolithically in-
tegrated 1-Gb/s silicon photoreceiver,” IEEE Photonics Technology Letters,
vol. 11, pp. 120–121, Jan. 1999.
[129] J.-F. Roux, J.-L. Coutaz, and S. Tedjini, “All-optical high-frequency charac-
terization of optical devices for optomicrowave applications,” IEEE Photonics
Technology Letters, vol. 12, pp. 1031–1033, Aug. 2000.
[130] C. Debaes, D. Agarwal, A. Bhatnagar, H. Thienpont, and D. Miller, “High-
impedance high-frequency silicon detector response for precise receiverless op-
tical clock injection,” in Proceedings of the SPIE, vol. 4654, pp. 78–88, 2002.
[131] S. Wagner and T. Chapuran, “Broadband high-density WDM transmission us-
ing superluminescent diodes,” Electronics Letters, vol. 26, pp. 696–697, May
1990.
[132] D. Sampson and W. Holloway, “100 mW spectrally-uniform broadband
ASE source for spectrum-sliced WDM systems,” Electronics Letters, vol. 30,
pp. 1611–1612, Oct. 1994.
[133] M. Nuss, W. Knox, and D. Miller, “Dense WDM with femtosecond laser pulses,”
in Lasers and Electro-Optics Society Annual Meeting, vol. 2, pp. 199–200, 1994.
[134] L. Boivin, M. Nuss, S. Cundiff, W. Knox, and J. Stark, “103-channel chirped-
pulse WDM transmitter,” in Conference on Optical Fiber Communication,
pp. 276–277, 1997.
[135] B. Collings, M. Mitchell, L. Boivin, and W. Knox, “A 1021 channel WDM
system,” IEEE Photonics Technology Letters, vol. 12, pp. 906–908, July 2000.
[136] B. Nelson, G. Keeler, D. Agarwal, N. Helman, and D. Miller, “Demonstration
of a wavelength division multiplexed chip-to-chip optical interconnect,” Con-
ference on Lasers and Electro-Optic Society, 2002.
[137] H. Shi, J. Finlay, G. Alphonse, J. Connolly, and P. Delfyett, “Multiwavelength
10-GHz picosecond pulse generation from a single-stripe semiconductor diode
laser,” IEEE Photonics Technology Letters, vol. 9, pp. 1439–1441, Nov. 1997.
[138] H. Thienpont, C. Debaes, V. Baukens, H. Ottevaere, P. Vynck, P. Tuteleers,
G. Verschaffelt, B. Volckaerts, A. Hermanne, M. Hanney, and I. Veretennicoff,
“Plastic Micro-Optical Interconnection Modules for Parallel Free-space inter-
and intra-MCM Data Communication,” in Proceedings of the IEEE, 2000.