The Internet and the Foundations of Computer Science
Catherine C. McGeoch
January 22, 2013
Contents
1 Introduction
1.1 What is the Internet?
1.1.1 The Layered Architecture
1.1.2 Where do standards come from?
1.2 What is Computer Science?
1.2.1 Process: Protocols and Algorithms
1.2.2 Data: Codes, Formats, and Structures
1.2.3 Machines and Languages
1.3 What’s in the book?
1.4 Further Reading
2 The Physical Layer: Bits in Motion
2.1 Physical Transmission Media
2.2 Bit Transmission Rates
2.3 Storage Media
2.4 Resources
2.5 Questions
3 Binary Codes
3.1 Numbers
3.2 Letters and Symbols
3.3 Sounds and images
3.4 Further Reading
3.5 Questions
4 Protocols
4.1 Two Dining Philosophers
4.1.1 Two Bad Protocols
4.1.2 A random backoff protocol.
4.1.3 A channel partitioning protocol.
4.1.4 A token-passing protocol.
4.2 Designing protocols for real networks.
4.2.1 Network topologies.
4.2.2 Physical Properties of Networks
4.2.3 Frame Formats and Binary Codes
4.2.4 Error control.
4.3 Example Link Layer Protocols
4.3.1 Ethernet (IEEE 802.3)
4.3.2 Wireless Ethernet
4.3.3 ATM
4.4 Chapter Review
4.5 Problems
5 State Machines
5.1 Describing Protocols With State Machines.
5.1.1 A Simple State Machine
5.1.2 State Machines for Dining Philosophers
5.2 Analyzing Protocols
5.2.1 Deadlock Avoidance
5.2.2 Liveness properties
5.2.3 Efficiency
5.3 Other Uses For State Machines
5.3.1 Representing Patterns and Formats
5.3.2 Representing Languages
5.3.3 Playing Games
6 The Network Layer
6.0.4 Networks and Internetworks, Simplified
6.0.5 Network Service Model
6.0.6 Routing
6.1 Example: The IP Protocol
7 Algorithms and Data Structures
7.1 Reading pseudocode.
7.1.1 Our First Algorithm
7.1.2 An Algorithm with Iteration
7.1.3 Algorithms 102
7.2 Algorithms 102: Data Structures and Procedures
7.2.1 Data Structures
7.3 Finding Shortest Paths
7.3.1 The Link State Algorithm
7.4 Looking Things Up
7.4.1 Linear Search
7.4.2 Binary Search
7.4.3 Binary Search Trees
7.4.4 Hash Tables
7.5 Algorithm Analysis
8 The Transport Layer
8.1 Example: TCP
8.2 Operating Systems
9 Programming Languages
9.1 What Programmers Do
9.2 A Brief History of Programming Languages
9.3 Some Example Languages
9.3.1 Java: A high-level language
9.3.2 MINI: A simple assembly language
9.3.3 MINI-CODE: A simple machine language
9.4 Software and Society
9.4.1 Copyrights and Patents: Is software text or machine?
9.4.2 The Open Source Debate
9.4.3 Safety, security, and responsibility
10 The Internet And Cryptography
10.1 Secret Codes
10.2 Public Key Encryption
10.2.1 Cryptographic Protocols
10.2.2 One-Way Trapdoor Functions
10.3 Encryption on the Internet.
10.4 Notable Encryption Technologies
10.5 Privacy, Security, and the Law
Preface
To The Student
In 1908, James E. Homans published “Self-Propelled Vehicles, a Comprehensive
Treatise on the Theory, Construction, Operation, Care,
and Management of all forms of automobiles.” His preface points out
that the motor vehicle is an extremely complex machine:
Its construction and operation involve the consideration
of an extensive range of facts in several widely separated
departments. The study of its construction and operation
is a liberal education in itself.
In order to answer every question that must occur .... one
must produce a whole library of books, rather than a single
volume of convenient size. Virtually all such questions
may be forestalled, however by clear explanations of the
principles governing the design and construction of the
machine.
Substitute “Internet” for “motor vehicle” in the above, and the
claim holds today: the study of the construction and operation of the
Internet requires a collection of information from a wide variety of
sources; but a clear explanation of the principles governing its design
can forestall a great deal of perplexity.
This book sets out to do for the Internet what Homans did for the
automobile: to present a comprehensive overview of the major Inter-
net technologies, with an emphasis on the scientific principles under-
lying their design, rather than on instructions for their use. You will
find no information on how to install and use the Netscape browser:
but you will learn how browsers work in general, and what makes one
browser design better than another. You will not learn how to install
an Ethernet card, but you will be able to evaluate the pros and cons
of the Ethernet media access protocol.
Together with a tour of Internet technologies, the purpose of this
book is to provide a broad overview of the major questions and research
areas that make up the discipline of Computer Science: throughout,
the technology is used to introduce related fundamental questions, and
conceptual tools for attacking those questions. If you intend to major
in Computer Science, this book will give you a solid foundation
on which to base your studies; if you do not intend to study Computer
Science further, this book will give you an appreciation for the
intellectual challenges we face, and a better understanding of what
technological advances the future might hold.
The book assumes no prior experience with the Internet or with
computers, beyond a mild familiarity with its most visible parts: if
you have used email, if you know what a web browser is, you’ll be
fine. Some sections assume mathematical background at about the
middle school level: if you can multiply and divide, and if you vaguely
remember how exponentials and logarithms work, you’ll be fine.
To the Professor
This book presents a broad first introduction to the discipline of
computer science. Most of the topics recommended as fundamental
computer science concepts by the joint ACM/IEEE-CS task force in
Computing Curricula 2001¹ are presented here: algorithmic thinking
(design and analysis); data structures and data representation; the
importance of abstraction; and the role of programming languages
in the computational enterprise. Several topics relating to computer
science and society are also covered.
What makes this book different from similar introductory text-
books is that these topics are tied to a survey of Internet technologies,
rather than to computing technologies that reside “inside the box.”
For example, a conventional textbook might present various storage
media (RAM and ROM, magnetic disks, magnetic tapes, etc.) and
use that presentation to motivate a discussion of bit codes for rep-
resenting information; this textbook, instead, surveys physical trans-
mission media (copper wire, fiber optics, wireless) on the way to a
discussion of bit codes and bit patterns.
In a similar way, the discussion of data structures for the search
problem is naturally introduced by a networking question: how might
a router organize a routing table (which contains keyed informa-
tion) for most efficient lookups? The discussion of intractability is
prompted by questions arising in public-key cryptography; and the
survey of programming languages informs a discussion of the Digital
Millennium Copyright Act and whether software should be considered
free speech, a patentable machine, or copyrightable intellectual prop-
erty. Old wine, new bottles: the computer science part is fundamental
and largely unchanged; the technology part joins the 21st century.
¹ www.computer.org/education/cc2001/final/
Why reorganize old material in this new way? Because it is fun
and topical. The Internet has rocketed to the forefront of modern con-
sciousness, yet its inner workings remain mysterious to most people.
Students are deeply interested in learning more about how it works,
and the study of Internet technologies is rich in jumping-off places for
investigation of the deeper questions of computer science, and of the
social issues that determine how the Internet will evolve.
Chapter 1
Introduction
Internal links are in red. External links are in blue.
This book provides a broad survey of Internet technologies, and of the
discipline of Computer Science. Although the two areas are related,
do not confuse one with the other: Science is concerned with discovery of
universal and demonstrable truths, and exploring the limits of what
we can know; Technology is about building useful and reliable tools.
New scientific knowledge is gained through rigorous effort and adopted
after careful review by trained skeptics. New technologies can sweep
through societies like wildfire or disappear overnight.
Check out the companion website, http://www.cs.amherst.edu/ccm/,
for fun.
Or you can visit my home page.
Here is how you would add a link to an internal label. See the
definition of the application layer in the next section.
1.1 What is the Internet?
Here is a picture of twisted pair wire.
Our exploration follows a well-documented conceptual structure called
the layered architecture of the Internet. This layered architecture or-
ganizes the enormous complexity of the Internet into distinct inter-
acting parts, called “layers.” Most Internet products are designed
to run “within” specific layers of this architecture, and to interact
with one another “across” these layers. We shall consider these lay-
ers in bottom-up order in this text; each new layer will prompt some
“deeper” questions of computer science.
This is a right-side margin. You can read more in Chapter 3.
We start with some definitions. A host is a generic term for any
computer-like device that is connected to a network, including a per-
sonal computer, a multi-user computer, printer, router, and other
types of devices. Computers and hosts run programs. When a pro-
gram is running inside a host, it is called a process. Depending on the
capacity and complexity of a host, between one and several hundred
processes might be resident at any given time.
A network is a collection of hosts that are all owned and con-
trolled by a single organization. The Internet is a huge collection of
public and privately-owned networks, all cooperating to ship messages
from source hosts to destination hosts.
A LAN (local area network) is small, perhaps ranging over a group
of offices: LANs make up the neighborhood roads and driveways of the
so-called Information Highway. A MAN (metropolitan area network)
covers a regional area: these networks form the highways and beltways
that get you across town quickly. A WAN (wide area network)
may span a nation or continent: these are the high speed Interstate
highways.
Occasionally, a process sends a message to another process, which
may be on the same host, or on a host within the same network, or on a
remote host somewhere on a distant network.
Figure 1.1: Two pictures of twisted pair wire
1.1.1 The Layered Architecture
In fact, a typical Internet transaction involves not only the browser
and the server processes, but several other processes running on the
client and server hosts, as well as processes running on hosts located
in between these two. These various processes belong to different layers of
the Internet Protocol Stack, according to the services they
are expected to provide. Figure 1.2 shows the five main layers of the
Internet (some subdivisions into more layers are ignored here). In the
next few sections we will see what services are provided at each layer.
The Internet layered architecture provides a conceptual device for
organizing the huge variety of software and hardware products that work
together to make the Internet run. Any particular Internet software
or hardware product can be categorized as “belonging” to some layer
according to its basic functions and responsibilities. We say the product
provides specific services to products in the layer above it, and
it relies upon services provided by products in the layer below it.
Layer                                    Examples
The User
Application Layer: User services         Email, Web browsers
Transport Layer: Concierge desk          TCP, UDP
Network Layer: Routing and delivery      IP
Data Link Layer: Direct contact          Ethernet
Physical Layer: Transmission media       Copper wire, Radio waves
Figure 1.2: The Layered Internet
Check out this margin too. Can you add a url?
No matter where they live in the layered architecture, a pair of
processes performing an Internet transaction are known by their roles
as the client and the server. In the usual scenario, the server process
waits patiently on its host for something to happen; the client process
initiates contact and sends a request message to the server; and the
server responds to the request.
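The client and server roles described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both processes run on the same host (address 127.0.0.1), the request text "index.html" and the reply wording are made up, and the function names serve_once and run_exchange are hypothetical.

```python
import socket
import threading


def serve_once(listener):
    """Server role: wait for one client, read its request, send a reply."""
    conn, _addr = listener.accept()          # waits patiently for contact
    with conn:
        request = conn.recv(1024).decode("ascii")
        conn.sendall(("You asked for: " + request).encode("ascii"))


def run_exchange(request="index.html"):
    """Run one client/server transaction on this host and return the reply."""
    # The server starts first and waits on a port chosen by the operating system.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()

    # Client role: initiate contact, send a request, read the response.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(request.encode("ascii"))
        reply = client.recv(1024).decode("ascii")

    server.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(run_exchange("index.html"))
```

Note that the server never starts the conversation: it only answers after the client has made contact, which is the division of roles the text describes.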
The Application Layer The application layer contains the software
that interacts with and provides services to Internet users, also known
as human beings. This layer presents the “look and feel” of the Inter-
net, and is the one you are probably most familiar with.
For example, your favorite web browser is an application-layer
product.
If you want to view a particular page on the Internet, you can
click on a link, type in a URL (Uniform Resource Locator), or type
something into a query box and click “Submit.” This causes your
browser, acting as client, to send a request message across the Internet
to the distant computer where the desired web page resides. A web
server process running on that host is responsible for receiving the
message from your browser and sending the requested web page back
to it. Firefox and Internet Explorer are examples of popular browser
software packages; the most popular web server software package is
called Apache.
In fact, the process is a little more complicated. The Internet
Domain Name System makes it easier for humans to remember which
hosts are where, by assigning Internet domain names,
like www.cs.amherst.edu or cnn.com, to particular hosts. But an
Internet transaction must use the “real” host names, which are
called IP numbers (IP means Internet Protocol). An IP number is
a 4-part number that looks like this: 148.85.77.95.
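As background not stated above: in the version of IP that uses 4-part numbers (IPv4), each part is a value from 0 to 255, so the four parts together fit in 32 bits. The sketch below shows that conversion for the example address. The function name ip_number_to_int is hypothetical; the check against the standard library's ipaddress module is there only to confirm the arithmetic.

```python
import ipaddress


def ip_number_to_int(dotted):
    """Combine the four parts of an IP number into one 32-bit integer."""
    parts = [int(part) for part in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("expected four parts, each between 0 and 255")
    value = 0
    for part in parts:
        value = value * 256 + part   # make room for 8 more bits, then add them
    return value


if __name__ == "__main__":
    address = "148.85.77.95"
    as_int = ip_number_to_int(address)
    # The standard library performs the same conversion.
    assert as_int == int(ipaddress.IPv4Address(address))
    print(format(as_int, "032b"))    # the same address written as 32 bits
```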
Every time you use a domain name with your browser (via link or
typed-in URL), it must find out the IP number of the host associated
with that domain name. So when you want to view that page, your
browser first acts as a client to send an inquiry to a DNS (Domain
Name System) server somewhere on the Internet, and the server sends
back the IP number that corresponds to that domain name. Only then
can your browser act as client to send the request message to the web
server on the remote host.
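The two-step sequence above (look up the IP number, then contact the web server) can be sketched with Python's standard socket module, which hands the lookup to the operating system's resolver. The function names are hypothetical, the second step is only described in a string rather than carried out, and the name "localhost" is used so that the sketch does not depend on any outside server.

```python
import socket


def lookup_ip_number(domain_name):
    """Step 1: ask the system's resolver for the IP number of a domain name."""
    return socket.gethostbyname(domain_name)


def describe_page_request(domain_name):
    """Step 2 (described, not performed): the request goes to that IP number."""
    ip_number = lookup_ip_number(domain_name)
    return "send web request for " + domain_name + " to " + ip_number


if __name__ == "__main__":
    # "localhost" is the conventional name for the host you are sitting at.
    print(describe_page_request("localhost"))
```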
Domain name lookup is an example of a service that is provided
at the application layer, but which doesn’t involve direct interaction
with users, so you don’t usually notice it. There may be other “invis-
ible” services involved in simple web-page access, which may provide
encryption/decryption of some messages, host authentication (is this
ebay.com or a fake website?), and so forth.
Here are more examples of services found at the application layer
of the Internet.
• Email. When you want to send a message to your best friend
Sam (sam@somewhere.com), your email client will perform a domain
name lookup to find the IP number of Sam’s email server
host (somewhere.com), then transmit the message to the email
server running on that host, which stores it for safekeeping.
Sam’s email client will eventually ask the server if any mail
has arrived, and the server will respond with a “yes” message
(Sam’s client goes bing - you’ve got mail). Sam’s client will send
a request for the message to the server, and then will show it to
Sam.
• Instant Messaging.
• File transfer and exchange. This does not have to be illegal.
• Remote login. A program such as ssh allows you to sit at
the keyboard of one computer and have all your keystrokes and
commands transferred to a distant computer for execution over
there. This is called having remote access to the other com-
puter. The computer you sit at runs the ssh client program,
which initiates the connection, and the distant computer runs
the ssh server program, which receives commands that you type.
Transport Layer. Your browser doesn’t contact the web server di-
rectly. Instead, it composes a request message, and asks another
process to deal with it. In fact, in order to accomplish all this sending
and receiving, all application layer processes rely on the services of a
transport layer process.
The transport layer process works like the concierge desk at a
hotel, managing the communication between guests and the outside
world. The concierge receives letters from residents and hands them
to the mail carrier, and she receives letters from the mail carrier and
notifies the appropriate resident that something has arrived.
In the context of the Internet, transport layer services are usually
provided by two software products known as TCP (Transmission
Control Protocol) and UDP (User Datagram Protocol). The former
is slower but more reliable, the latter is quicker but less reliable. Each
product contains both client and server software.
Network Layer In our hotel analogy, the network layer works
like an international postal service made up of local, regional, and
national offices. Local offices communicate with concierges at the
sending and receiving ends of the mail shipment, but in between, the
offices contact one another to transmit the mail in stages. Typically an
Internet message must make several hops, from host to host, network
to network, along the way to its destination.
To make it all work, every host connected to the Internet must
run a network layer program called Internet Protocol (IP). When
IP receives a message, it checks the message destination and figures
out which neighboring host to send the packet to next: this decision-making
operation is called routing. Routers must be able to monitor
the changing status of nearby hosts and come up with alternative
routes around congested, crashed, or forbidden areas of the network.
Chapter ?? looks at major challenges and choices associated with
Transport and Network layer services and Chapter ?? explores routing
in more detail.
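The routing decision described above can be illustrated with a toy lookup table. This is a simplified sketch, not how IP itself is implemented: the table layout, the network and neighbor names, and the function next_hop are all hypothetical. Each destination lists its candidate neighbors in order of preference, and neighbors known to be congested or crashed are skipped.

```python
def next_hop(destination, routing_table, unavailable=frozenset()):
    """Pick the neighboring host that should receive the message next.

    routing_table maps a destination to a list of neighbors, best first.
    unavailable holds neighbors that are currently congested or crashed.
    Returns None when no usable neighbor is known.
    """
    for neighbor in routing_table.get(destination, []):
        if neighbor not in unavailable:
            return neighbor
    return None


# A made-up table for one router with three neighbors.
TABLE = {
    "network-A": ["neighbor-1", "neighbor-2"],
    "network-B": ["neighbor-3"],
}

if __name__ == "__main__":
    print(next_hop("network-A", TABLE))                   # usual route
    print(next_hop("network-A", TABLE, {"neighbor-1"}))   # route around a failure
```

The second call shows the "alternative route" idea: when the preferred neighbor is unavailable, the router falls back to the next candidate.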
Data Link Layer A journey of a dozen hops begins with a single
hop. The network layer process decides which host should next receive
the message, and then contacts a process at the data link layer to
take care of the actual transmission of the message to the adjacent host.
This process is responsible for contacting the neighboring host and for
ensuring that the packet is properly transmitted. The most widely
used suite of services at this layer is provided by Ethernet products,
but many alternatives are available. Chapter ?? surveys data link
layer services.
Physical Layer The direct connection between two hosts must be
made of some physical transmission medium, perhaps a wire, or per-
haps by radio waves. The bottom layer of the Internet is concerned
with the variety of technologies available for connecting hosts one
to another, and for transmitting messages as efficiently, reliably, and
cheaply as possible.
1.1.2 Where do standards come from?
A standard (including but not limited to binary code standards) is
a written document containing a precise technical specification of how
some type of information domain is to be represented or how some
process should work. A draft proposal for a standard, called an RFC
(Request for Comments), is made available to industry professionals
who are invited to criticize it and to suggest modifications. After a
few rounds of criticism, the standard is officially adopted (but still
called an RFC even after adoption). The technical definition of the
Internet is contained within a large and widely dispersed collection of
RFCs produced by several different standard-writing organizations.
Standards developed through open dialog and made public in tech-
nical documents are called open standards. In contrast, a closed stan-
dard is developed privately by a company and protected as a trade
secret. As Chapter ?? suggests, however, there is much controversy
over the open-standard vs closed-standard ideals: on the one hand,
open standards are generally recognized as more robust, error-free,
and cost-effective than closed standards; and manufacturers can in-
crease sales by claiming that their computing devices and software
products are fully compatible with products sold by others. On the
other hand, open standards conflict with traditional views of copy-
right, patent protections, trade secrets, and licensing rights.
Here are some of the major Internet open standards organizations.
Many of the codes mentioned below will be discussed in this chapter.
• ANSI. The American National Standards Institute publishes
standards in all areas of manufacturing, science, and information
technology. They are responsible for the ASCII code, which is
used for representing letters and characters. Visit www.ansi.org.
• IEEE (pronounced eye-triple-eee). The Institute of Electrical
and Electronics Engineers publishes standards for number representation
and for communication protocols such as Ethernet.
Visit www.ieee.org.
• ISO/IEC. The International Organization for Standardization
develops standards in many technical fields, excepting electrical
engineering, which is covered by the affiliated International
Electrotechnical Commission. Subcommittees of these organizations
publish standards in many areas of computing information
technology, such as the JPEG and MPEG data compression
standards for images, sound, and video. Visit www.iso.ch.
• ISOC. The Internet Society publishes network protocol standards.
Their sub-organization IETF (the Internet Engineering
Task Force) has produced standards for HTTP, IP, TCP, PPP,
ICMP, and many other Internet and network communication
protocols. Visit www.isoc.org.
• The Unicode Consortium publishes the Unicode standard,
which is a new expanded alternative to ASCII for representing
characters and symbols. Visit www.unicode.org.
• W3C. The World Wide Web Consortium publishes standards
for data formats, languages, and protocols having to do with web
browsers and web servers. They are responsible for HTML and
its variations (used for creating web pages). See www.w3c.org.
• JPEG.
• MPEG.
1.2 What is Computer Science?
Although this book is organized around aspects of Internet technol-
ogy, much of the information contained here concerns fundamental
principles of Computer Science. This book is intended to provide a
comprehensive survey of the problems and methodologies of the sci-
ence, with examples and questions drawn from the technology. Most
of the questions of Computer Science are found at the conjunction
of two central activities: design and analysis of algorithmic pro-
cesses; and the design and analysis of representation schemes for
information.
1.2.1 Process: Protocols and Algorithms
An algorithm is a description of a set of steps necessary to solve
some given problem. A problem is described in terms of input and
output: for example the problem might be, given a list of n numbers
as input, to produce a list of the same numbers in sorted order, as
output. An algorithm would be a description of steps for sorting a
list of numbers – of course we need a better idea of what constitutes
a “step,” but all in good time. A protocol is a step-by-step process
similar to an algorithm, but with a different purpose. Protocols are
used for reliable communication and transfer of information among
computing devices. We use the term algorithmic process to refer
to algorithms and protocols collectively.
As it turns out, some algorithms are efficient, and some are not: for
example, a fast sorting algorithm implemented on a typical personal
computer could sort a million numbers in just a couple of seconds, while
a naive sorting algorithm (likely the one you would think of first)
might require a few hours for the same set of numbers. A major
research effort of computer science is the search for better algorithms
for given problems.
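To make the difference concrete, here is a sketch of one naive sorting method, selection sort, instrumented to count how many pairs of numbers it compares. Selection sort is chosen here only as a plausible example of a "first idea" algorithm; the text does not name a specific one, and the function name and the counting are illustrative assumptions.

```python
def selection_sort(numbers):
    """Sort a list the naive way and report how many comparisons were made.

    Repeatedly find the smallest remaining number and move it into place.
    Returns (sorted_list, comparisons).
    """
    items = list(numbers)              # work on a copy; leave the input alone
    comparisons = 0
    for i in range(len(items)):
        smallest = i
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[j] < items[smallest]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items, comparisons


if __name__ == "__main__":
    result, count = selection_sort([5, 2, 9, 1])
    print(result, count)
```

For a list of n numbers this method always makes n(n-1)/2 comparisons, which for a million numbers is roughly 5 x 10^11. Faster sorting algorithms need far fewer steps on the same input, which is the gap the paragraph above describes.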
The efficiency of any given algorithm depends partially on how
exactly its input is represented and organized. Indeed, the correctness
and efficiency of the whole enterprise may depend heavily on how the
algorithm itself is represented. The study of data organization and
representation includes such topics as bit codes, data structures, and
memory management hierarchies, depending on the scale of the data
items involved (small to large). The study of ways of representing
algorithms includes areas such as software engineering, programming
language design, and models of computation. At one end
of this scale we try to find better methods and practices for writing
computer programs (a program is just an algorithm represented in a
particularly useful way); at the other end we consider the power of
various ways of representing algorithmic computation in general.
• An example algorithm.
• A language for describing algorithms.
• A trace of the algorithm on a particular instance.
• Two ways to represent things
• How representation affects the algorithm.
1.2.2 Data: Codes, Formats, and Structures
Process and representation – algorithm and data, protocol and data
format, program and programming language – form the yin and yang
of computer science. On the one hand we study how to do it ; on the
other we investigate how to say it. This book highlights the interplay
between these two fundamental modes of inquiry, as they arise in
several key areas of computer science.
1.2.3 Machines and Languages
How you say the algorithm can be as important as how you represent
the data. Machine is language.
1.3 What’s in the book?
Of course, progress in science both determines and is influenced by
development in technology. The Internet provides an especially rich
area for exploring the boundary between these two domains of knowl-
edge, and the purpose of this text is to explore several facets of that
intersection.
Section ?? sets out some of the major themes and concepts of
Computer Science that will be explored. Two major concerns: design
and evaluation of algorithmic processes, and design and evaluation of
languages and representation schemes. Example?
Chapter 4 presents media access protocols, by which hosts sharing
a communication line can negotiate ways to take turns using it. A
protocol is an algorithmic process: this chapter compares protocols with
respect to properties like correctness, efficiency, and reliability.
Chapter 3 describes the wide variety of bit codes that are in use
to assign meanings to patterns of bits. This is a representation prob-
lem concerning small-scale items like numbers, symbols, sounds, and
pictures. We desire to find representation schemes that are compact
and that are compatible with efficient algorithms.
Chapter ?? describes a powerful notational device for representing
protocols so that their properties can be analyzed. This representation
device, known as a State Machine or a Finite Automaton, is useful in a
wide variety of contexts and is a fundamental abstraction of computer
science. It can be used to investigate many kinds of questions of both
representation and process.
Chapter ?? describes data structures, which are ways to represent
data and information at a medium scale – about the level of phone
books, small company databases, and diagrams of small networks.
The chapter also presents algorithms (processes) for tasks that arise
continually in Internet processing. Methods of algorithm analysis are
also introduced.
A program is an algorithm written in a programming language
that can be understood by a computer. Chapter ?? describes varieties
of programming languages, and the basic blocks upon which they
are constructed. Programming languages are representation methods
for algorithmic processes, that can be compared in terms of their
functionality and practicality.
Chapter crypto describes algorithms for cryptography, which prompts
more fundamental questions about computational processes in gen-
eral: what can be computed? What can be computed efficiently?
Chapter privacy considers issues of privacy, including the use of
public and private databases on the Internet. The design of efficient
and trustworthy databases raises questions, about both representation
and process, that are explored in this chapter.
Finally, both technology and science inform our discussion of so-
cial issues relating to the rise of the Internet. And so forth. Many
more fundamental and intriguing questions will be raised in subse-
quent chapters. Enjoy!
1.4 Further Reading
Paragraph about further reading. Paragraph about online resources.
Chapter 2
The Physical Layer: Bits in Motion
All the information and data transmitted over the Internet – numbers,
text, photographs, video – must first be encoded into sequences of the
binary digits 0 and 1, called bits. For example, to send the message
Hi! in the widely-used ASCII code (pronounced ask-kee), you would
need to transmit this sequence of 24 bits (8 bits per character):
H i !
0100 1000 0110 1001 0010 0001
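The 24-bit sequence above can be reproduced with a few lines of Python. This is only a minimal sketch; the helper name to_ascii_bits is ours and is not part of the text.

```python
def to_ascii_bits(message):
    """Return the message as 8-bit ASCII codes, one group per character."""
    # ord() gives the character's numeric code; format(..., "08b") writes
    # that number as exactly 8 binary digits, padding with leading zeros.
    return " ".join(format(ord(ch), "08b") for ch in message)

bits = to_ascii_bits("Hi!")
print(bits)  # 01001000 01101001 00100001
```

Each group of 8 digits corresponds to one character, so a three-character message needs 24 bits.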
Chapter 3 describes ASCII and several other binary codes. This
chapter looks at the problem of how to move the bits from one host
to another. (Host is the generic term for a device connected to a
network: computer, printer, router, phone, etc.)
We consider three types of physical transmission media: copper
wire, glass fiber, and pure thin air (also known as wireless). Section 2.1
explains how they work and compares some of their properties.
Section 2.2 looks at how the medium affects bit transmission
Figure 2.1: Panel (a) shows CAT5 cable, containing four twisted pairs of wires; panel (b) shows a cable containing a bundle of optical fibers.
rates. Section 2.3 considers the related question of how to store bits
in memories.
2.1 Physical Transmission Media
We start with a quick overview of the properties of copper, fiber optic,
and wireless transmission media.
Copper wire and optical fiber are called guided media because the signal follows the contours of the wire. Unguided is another word for wireless transmission technologies.
Copper wire transmission works by sending electrical signals
along a wire made of (wait for it ...) copper.
A common example is twisted pair wire, made of two thin wires
coated with plastic insulation: one wire carries the electrical signal,
and the other is twisted around it to reduce sensitivity to electrostatic
interference, also known as “noise.” This is simply telephone wire,
used to connect land-line phones the world over. Figure 2.1 (a) shows
the four twisted pairs in a Category 5 (CAT5) cable.
LAN: Local Area Network.
CAT5 is the most common type of cable used in LANs, which
are the small networks found in homes and offices. The coax cable
that connects you to your broadband cable service is another type of
copper wire technology.
Fiber optic technology works by sending light signals (typically
in the infrared range) down a thin strand made of clear glass or plastic.
The strand is made of an inner core surrounded by a tube made of
a slightly different material, called the outer core (more commonly
known as the cladding). The two cores are designed so that the light
bounces off the boundary between them and is transmitted by
reflection along the inner core.
As Figure 2.1 (b) illustrates, hundreds of fiber optic strands may
be bundled in a cable, together with a stiffening wire to prevent too
much bending (which will break the fibers).
MAN: Metropolitan Area Network. WAN: Wide Area Network.
Like copper wires, optical fibers are inexpensive to make. Because
they support extremely fast transmission rates (about which
more below), they have replaced copper cable in most city-sized MANs
and continent-sized WANs.
Figure 2.2: The Electromagnetic Spectrum (bands from low to high frequency: Radio, Microwave, Infrared, Visible; frequency axis labeled 1MHz, 1GHz, 1THz, 1PHz)
Wireless transmission technologies work by sending electromagnetic
signals through the air. As Figure 2.2 illustrates, electromagnetic
waves have different properties depending on their frequencies.
Frequencies are measured in Hertz (Hz), or cycles per second.
See Figure 2.3 for definitions of G, T, and related prefixes.
Electromagnetic waves at frequencies below 300 GHz are called
radio waves. Microwaves have frequencies at the high end of the radio
range, between 1 GHz and 300 GHz. Infrared waves (used in fiber
optics) have frequencies between 300 GHz and about 400 THz, much
higher than microwaves and just below the visible light spectrum.
Wireless networks employ waves of different frequencies depend-
ing on their suitability to the situation. For example, radio and low-
frequency microwaves can pass through solid objects like buildings,
while higher-frequency waves tend to bounce off (or be absorbed by)
solid objects. Consequently, wireless LANs (WLANs), which are usu-
ally found inside buildings, tend to use frequencies at the low end of
the microwave range. In the U.S., popular wireless technologies such
as WiFi and Bluetooth use frequencies near 2.4GHz.
On the other hand, higher-frequency signals travel farther per unit
of power and tend to yield faster transmission rates. The tower-to-tower
and tower-to-satellite transmitters found in MANs and WANs
typically use microwaves in the 4GHz to 18GHz range; in these systems,
each transmitter needs to have a clear line of sight to the receiver
at the other end, with no obstacles like hills or buildings in the
way. That’s why telecommunications towers are commonly found on
hilltops with transmitters/receivers placed high above the treelines.
Prefix  Symbol  Magnitude       Numeric   What happens in this many seconds
pico    p       a trillionth    10^-12    half-life of a bottom quark
nano    n       a billionth     10^-9     light travels 1 foot
micro   µ       a millionth     10^-6     strobe light flash
milli   m       a thousandth    10^-3     bullet travels 1 foot
centi   c       a hundredth     10^-2     3 movie frames
deci    d       a tenth         10^-1
deka    D       ten             10^1
hecto   h       hundred         10^2      Indy 500: 2 laps
kilo    k       thousand        10^3      16.7 minutes
mega    M       million         10^6      11.6 days
giga    G       billion         10^9      31.7 years
tera    T       trillion        10^12     317.7 centuries
peta    P       quadrillion     10^15     32 megayears
Figure 2.3: Number magnitudes and their notations.
Higher signal frequencies in the infrared and visible ranges don’t
work well for long-distance wireless transmission because the signal
can be scattered by atmospheric conditions such as clouds and raindrops.
PAN: Personal Area Network.
Infrared transmission is occasionally found in PANs, where the
hosts are a few inches or feet apart and have a direct line of sight.
Remote media controllers like TV clickers use wireless infrared.
Cost comparison. The raw materials needed to support these transmission
technologies – copper, glass, plastic – are fairly inexpensive to
acquire. And the transmitter/receiver devices that sit at the end of
wired technologies are not too difficult to build. Most of the costs
associated with these technologies are due to installation and operation.
As a general rule, unguided transmission technologies are cheaper
to install and operate than wired technologies. In LANs it is only
necessary to plug a transmitter/receiver into an incoming Internet cable,
and the network is good to go – no need to string wires throughout
the building. On the other hand, CAT5 cable is already in place in
many buildings, so there may be no installation cost. Optical cable
doesn’t bend much and is difficult to snake through walls and around
corners, so it is less often used in wired LANs.
Long distance guided networks can be expensive to install because
of the cost of acquiring property rights for the cables. One elegant
solution to this problem was devised by the Southern Pacific Railroad,
which owns rights to narrow strips of property all over the western
U.S. In 1972 they began laying fiber optic cables along their tracks
and selling network capacity. That division of the company eventually
became SPRINT, now a telecommunications giant.
Wireless long distance networks have the advantage that transmission
towers only require property rights to small plots of land.
However, in most parts of the world it is also necessary to purchase a
license to broadcast in specific frequency bands. In the U.S., for
example, since 1994 the FCC has raised over $60 billion for the U.S.
Treasury by conducting auctions to sell spectrum licenses to
telecommunications companies for use in MANs and WANs. Wireless LANs
are allowed to operate without licenses, using frequency bands near
2.4GHz (in North America) that are allocated for short-range
transmissions.
Security. These types of media can also be compared in terms of
security, or how much protection they provide against eavesdropping.
Anyone who has seen a spy movie knows that it is easy to place a tap
on telephone wire, although taps can be detected if you know to look
for them. Fiber optic cables are much more secure, since attaching
a device to read the signals would require breaking and splicing the
fibers, which would be easy to detect.
WEP: Wired Equivalent Privacy. WPA: WiFi Protected Access.
Wireless transmission is very insecure compared to wired trans-
mission, since anyone could eavesdrop on a wireless broadcast and it
would be impossible to detect who is listening. For this reason, most
wireless networks in commercial use employ some type of encryption
– scrambling the signal – to prevent casual or accidental eavesdrop-
ping. For example, WiFi networks encrypt signals using one of two
methods, called WEP and WPA. WEP, the older standard, is easy
to break using software widely available on the Internet; WPA and
its newer version WPA2 are considered much more secure, but not
perfectly impermeable. The National Security Agency (NSA) has
made available strong (highly secure) encryption methods for use by
telecommunication companies in tower and satellite communications.
2.2 Bit Transmission Rates
The main question about a given transmission technology is: how fast
can it move the bits?
The question does not refer to latency, which is how quickly the
signal moves from point A to point B. All the major technologies
work by sending electrical and electromagnetic signals, which travel
at nearly the speed of light (impeded just a little by the physical
medium). Signal latency is about the same whether the signal is sent
over copper, glass, or air, and near-speed-of-light is about as good as
it gets on this planet.
Instead, the question is about the bit rate, measured in bits per
second (bps), which corresponds to how fast bits can follow one
another in the transmission medium. The bit rate depends on several
factors that we consider next.
Figure 2.4: Transmitting the bits 0100. Panel (a) illustrates frequency modulation in an analog signal (the signal is the data plus a carrier). Panel (b) shows Manchester Encoding using digital signals (a data line and a clock line).
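To make the distinction between latency and bit rate concrete, here is a small sketch that adds the two contributions together. The function name, the 1,000 km distance, and the two bit rates are all illustrative assumptions, and the signal speed is approximated by the speed of light in a vacuum.

```python
SPEED_OF_LIGHT = 299_792_458  # meters per second; real media are a little slower

def delivery_time(distance_m, num_bits, bit_rate_bps, speed=SPEED_OF_LIGHT):
    """Seconds until the last bit arrives at the far end of a single link."""
    latency = distance_m / speed            # time for any one bit to cross the link
    transmission = num_bits / bit_rate_bps  # time to push all the bits onto the link
    return latency + transmission

# Hypothetical link: 1,000 km long, carrying 8 million bits (one megabyte).
slow = delivery_time(1_000_000, 8_000_000, 1_000_000)    # 1 Mbps
fast = delivery_time(1_000_000, 8_000_000, 100_000_000)  # 100 Mbps
print(slow, fast)
```

In both cases the latency term is the same few milliseconds; only the bit rate changes how long the whole message takes.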
Analog and digital signals. Figure 2.4 presents two examples of how
bit values can be represented by changes in electromagnetic signals.
Panel (a) shows a sinuous analog wave form and panel (b) shows a
squared-off digital wave form. Either type of waveform can be used
in any type of physical medium, but newer guided technologies tend to
be digital, and unguided technologies are commonly associated with
analog methods.
The amplitude of a signal is the vertical difference between the highest and lowest points in adjacent wave crests.
In analog transmission, binary values can be encoded using fre-
quency modulation (FM) or amplitude modulation (AM). We
concentrate on FM transmission since it is more common in network
communications. Recall that signal frequency refers to how quickly
one wave crest follows another (cycles per second). Frequency is in-
versely related to wavelength, which is the horizontal distance be-
tween successive wave crests. For example, 300MHz radio waves have
1 meter wavelengths, while 300THz infrared waves have wavelengths
near 1 micrometer.
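The two wavelengths quoted above follow from the standard relationship wavelength = speed / frequency. A minimal sketch, assuming the waves travel at the speed of light in a vacuum (the function name is ours):

```python
SPEED_OF_LIGHT = 299_792_458  # meters per second

def wavelength_m(frequency_hz):
    """Wavelength in meters of an electromagnetic wave of the given frequency."""
    return SPEED_OF_LIGHT / frequency_hz

print(wavelength_m(300e6))   # 300MHz radio wave: close to 1 meter
print(wavelength_m(300e12))  # 300THz infrared wave: close to 1 micrometer
```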
Panel (a) of Figure 2.4 shows how an FM transmitter combines
a binary data signal with an analog carrier signal to create the
frequency-modulated result. At the other end of the transmission, a
receiver tunes in to the carrier frequency and decodes the bit values
by measuring the differences between the carrier it expects and the
signal it receives.
Our first definition of bandwidth.
The range of spectrum used in a particular transmission is called
a band, and bandwidth refers to the difference between maximum
and minimum frequencies in use. Bandwidth is measured in Hertz.
For example, if 0 and 1 are represented by 11kHz and 31kHz signals,
the bandwidth would be 20kHz.
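The following sketch imitates this two-frequency scheme in software. It is a deliberately simplified model (often called frequency-shift keying), not a description of real FM hardware: each bit is sent as a short burst of a pure tone, and the receiver tells the tones apart by counting zero crossings. The sample rate and the time per bit are assumptions chosen for illustration.

```python
import math

F0, F1 = 11_000, 31_000   # Hz: the tones standing for 0 and 1 in the example above
SAMPLE_RATE = 200_000     # samples per second (assumed)
SYMBOL_TIME = 0.001       # seconds spent on each bit (assumed)
SAMPLES_PER_BIT = round(SAMPLE_RATE * SYMBOL_TIME)

def modulate(bits):
    """Turn a list of bits into signal samples, one tone burst per bit."""
    samples = []
    for bit in bits:
        freq = F1 if bit else F0
        for k in range(SAMPLES_PER_BIT):
            samples.append(math.sin(2 * math.pi * freq * k / SAMPLE_RATE))
    return samples

def demodulate(samples):
    """Recover the bits by counting sign changes within each burst."""
    # A tone of frequency f crosses zero about 2 * f * SYMBOL_TIME times per bit,
    # so the threshold sits midway between the two expected counts.
    threshold = (F0 + F1) * SYMBOL_TIME
    bits = []
    for start in range(0, len(samples), SAMPLES_PER_BIT):
        chunk = samples[start:start + SAMPLES_PER_BIT]
        crossings = sum(1 for a, b in zip(chunk, chunk[1:]) if (a >= 0) != (b >= 0))
        bits.append(1 if crossings > threshold else 0)
    return bits

bandwidth_hz = abs(F1 - F0)  # 20 kHz, as in the example above
received = demodulate(modulate([0, 1, 0, 0]))
print(received, bandwidth_hz)
```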
Panel (b) of Figure 2.4 shows a common type of digital transmission method
called Manchester Encoding. This example uses two signals, which
would be carried by two separate wires in a guided medium. A controller
at one end of the wires sends a steady clock signal to pace and
synchronize the controller at the other end. The data wire is inspected
by the receiving controller only when the clock signal is high. During
that time, a transition from low to high on the data line indicates a 0
bit, and a transition from high to low indicates a 1 bit. (Some digital
transmission methods do not use separate clock lines.)
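The transition rule just described can be sketched in a few lines. Here each bit is modeled as two half-bit levels on the data line (0 for low, 1 for high); the clock line is left implicit, and the function names are ours.

```python
def manchester_encode(bits):
    """Return data-line levels, two half-bit levels (0=low, 1=high) per bit."""
    levels = []
    for bit in bits:
        # Convention from the text: a 0 bit is a low-to-high transition,
        # a 1 bit is a high-to-low transition.
        levels.extend([1, 0] if bit else [0, 1])
    return levels

def manchester_decode(levels):
    """Recover the bits from pairs of half-bit levels."""
    bits = []
    for first, second in zip(levels[0::2], levels[1::2]):
        if first == second:
            raise ValueError("no mid-bit transition: not a valid Manchester signal")
        bits.append(1 if first > second else 0)
    return bits

signal = manchester_encode([0, 1, 0, 0])
print(signal)  # [0, 1, 1, 0, 0, 1, 0, 1]
```

Because every bit contains a transition, a missing transition is an immediate sign that something went wrong on the line.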
Baud rates and bit rates. It is not necessary to limit the signal to
just two values representing 0 and 1. Instead, we could speed things
up by letting the signals represent pairs of bits. In FM transmission
this would correspond to using four different signal frequencies.
To see how it works, consider again our sound wave analogy: in this
scenario, different frequencies correspond to different musical notes
(do, re, mi, fa ...). To transmit 01001101 using just two frequencies
(do, re), you would have to sing eight notes:
do re do do re re do re.
But with four frequencies (do, re, mi, fa) to represent (00, 01, 10,
11), you could transmit 01001101 using only four notes:
re do fa re,
thus sending the same bit sequence in half the time.
Memorize this fundamental relationship: if a = 2^b, then b = log2 a.
In general, a transmitter needs 2^k different signals to encode k bits
per signal: 4 different signals for blocks of 2 bits, 8 different signals
for blocks of 3 bits, and so forth. We use the term word size for the
number of bits encoded per signal, and signal variety to refer to the
number of different signals needed to support the word size.
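The singing example can be written out as a short sketch. The helper names word_size and to_signals are ours; the note names stand in for signal frequencies.

```python
def word_size(signal_variety):
    """Bits per signal: the k for which 2**k equals the signal variety."""
    k = signal_variety.bit_length() - 1
    if 2 ** k != signal_variety:
        raise ValueError("signal variety must be a power of two")
    return k

def to_signals(bit_string, notes):
    """Split the bits into blocks of k and map each block to one signal."""
    k = word_size(len(notes))
    if len(bit_string) % k:
        raise ValueError("bit string length must be a multiple of the word size")
    # Each k-bit block, read as a binary number, selects one of the notes.
    return [notes[int(bit_string[i:i + k], 2)] for i in range(0, len(bit_string), k)]

two = to_signals("01001101", ["do", "re"])
four = to_signals("01001101", ["do", "re", "mi", "fa"])
print(" ".join(two))   # do re do do re re do re
print(" ".join(four))  # re do fa re
```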
Wireless transmitters can achieve faster transmission rates by in-
creasing the word size this way, but only up to a limit. Adding one
more bit to the word size means doubling the signal variety. Greater
signal variety makes the signal harder to decode, so the transmission
must either slow down (and then what’s the point) or the error rate
will increase. Higher error rates mean missed signals have to be re-
sent, which slows down the overall bit rate. The problem is to find
the “sweet spot” that gives the best balance between word size and
reliability.
Word size is also limited by the available bandwidth. Greater sig-
nal variety requires use of a wider frequency band, since the different
frequencies have to be spaced reasonably far apart in the spectrum.
In practice, physical and regulatory limits are placed on how much
bandwidth a single transmitter can use.
Wired media can enjoy faster bit rates by increasing the signal
variety, or simply by adding more wires to the cable. Copper-based
media typically use the latter method, with between 1 and 32 data
lines bundled together in a cable. Optical fiber networks may exploit
both approaches, since it is possible to transmit dozens of different
signals reliably in one fiber.
The baud rate is the maximum frequency at which the data sig-
nal can change, measured in cycles per second (Hz). That is, in analog
transmission, the baud rate corresponds to how quickly one signal can follow
another, independent of how many bits each signal represents.
Note that baud rate cannot be faster than the maximum signal fre-
quency within the band being used. For example if an FM transmitter
uses 11kHz and 31kHz frequencies, the baud rate cannot be higher
than 31kHz. In digital transmission with a clock the baud rate is the
same as clock frequency.
The (maximum) bit rate for a given network technology is calculated
by multiplying word size (bits per signal) by baud rate (signals
per second):
max bit rate = word size × baud rate.
Bit rates are measured in bits per second (bps). Thus a cable
containing 8 data wires and one clock with cycle frequency 10MHz
would have a baud rate of 10MHz and a bit rate of 80Mbps. An FM
transmitter operating at 2GHz (max) with a word size of 2 (signal
variety = 4) would have a baud rate of 2GHz and a top bit rate of
4Gbps.
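Both worked examples can be checked directly against the formula. This is only a restatement of the arithmetic above; the function name is ours.

```python
def max_bit_rate(word_size_bits, baud_rate_hz):
    """Maximum bit rate in bits per second: word size times baud rate."""
    return word_size_bits * baud_rate_hz

cable = max_bit_rate(8, 10_000_000)  # 8 data wires, 10MHz clock
fm = max_bit_rate(2, 2_000_000_000)  # word size 2, 2GHz baud rate
print(cable, fm)  # 80000000 4000000000
```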
Our second definition of bandwidth.
In networking terminology, bit rate is also known as bandwidth.
Note this definition of the term is completely different from the first:
In FM transmission, bandwidth is a difference measured in cycles per
second; here it is a product measured in bits per second. Confus-
ing, yes: it helps to be aware of this distinction when reading about
network technologies.
Channel hopping. One drawback to the simple FM transmission
scheme described above is the need to assign a specific frequency band
to each transmitter. This works fine for stationary transmitters (like
towers) and corporations that are willing to pay for licenses to trans-
mit on specific bandwidths, but not so much for small wireless LANs
in, say, Internet cafes. If two laptops use the same transmission
frequencies in the same general area, their signals would interfere
with one another, making reception impossible.
For example, suppose a (laptop style) transmitter is capable of
broadcasting at any frequency in a given range, corresponding to any
note on a piano keyboard (excluding black keys for simplicity). Notes
are labelled by octave, with C4 at middle C, like this:
...D2 E2 F2 G2 A2 B2 C3 D3 E3 F3 G3 A3 B3 C4 D4 E4 F4 G4 A4 B4 C5 D5 E5 F5 G5 A5 B5 C6 D6 E6 F6 G6 A6...
For those who don’t read music: the clef spaces correspond to the notes as shown (D4, F4, etc.); the clef lines correspond to the notes between the spaces.
The simple approach would be to assign each transmitter a band
of adjacent notes, called a channel. For example, assuming one-
bit words, transmitter T1 could be assigned to channel (C4, D4) for
(0,1), and T2 could be assigned (D5, E5) for (0,1). Receivers R1 and
R2 could tune in to the separate channels, and the transmitters can
broadcast simultaneously with no problems:
[Figure: musical staff (spaces labeled D4, F4, A4, C5, E5) showing the notes broadcast by T1 and T2 on their separate channels.]
But if a new transmitter T3 that is also assigned channel (D5, E5)
enters the region, chaos results. The T2 and T3 signals jam each other
and become impossible to decode.
[Figure: the same staff with T3 added; the T2 and T3 notes overlap on channel (D5, E5).]
Spread spectrum is also known as wideband. Channel hopping is also called frequency hopping.
Wireless LANs solve this problem by adopting some type of spread
spectrum approach such as channel hopping.
By this method, every transmitter broadcasts on all channels (i.e.
over the entire keyboard), by hopping from channel to channel in a
pre-set “random” pattern. For example suppose each channel has a
pair of adjacent frequencies (lower=0, higher=1), and the T1 and T2
transmitters hop from channel to channel at time steps 1, 2, 3 . . .,
according to different patterns, like this:
Channel (0 1)   Transmitters using the channel during time steps 1 2 3 4 5 ...
(F5 G5)         T2
(D5 E5)
(B5 C5)         T2
(G4 A5)
(E4 F4)         T1 T2
(C4 D4)
(A4 B4)         T2
(F3 G3)         T1
(D3 E3)         T1
(B3 C3)         T1
(G2 A3)
(E2 F2)         T1 T2
Now transmitters T1 and T2 would broadcast something like this:
[Figure: musical staff spanning F2 to G5 showing the notes broadcast by T1 and T2 as they hop from channel to channel.]
Receivers R1 and R2 must also hop channels in synchrony with
T1 and T2 in order to decode their respective transmissions. (Try
it: Can you decode the bits sent by each transmitter?) It is possible
for two transmitters to hit the same channel at the same time, but
interference of this sort is rare and easily fixed.
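A toy model of synchronized hopping is sketched below. It assumes that the pre-set "random" pattern comes from a pseudo-random generator seeded with a number shared by a transmitter and its receiver; the text does not say how real devices agree on a pattern, so the seed values and function names here are purely illustrative.

```python
import random

# Twelve two-note channels covering the keyboard range: (note for 0, note for 1).
CHANNELS = [("E2", "F2"), ("G2", "A2"), ("B2", "C3"), ("D3", "E3"),
            ("F3", "G3"), ("A3", "B3"), ("C4", "D4"), ("E4", "F4"),
            ("G4", "A4"), ("B4", "C5"), ("D5", "E5"), ("F5", "G5")]

def hop_sequence(seed, steps):
    """Channel index to use at each time step, reproducible from the seed."""
    rng = random.Random(seed)
    return [rng.randrange(len(CHANNELS)) for _ in range(steps)]

def transmit(bits, seed):
    """Send each bit as the low or high note of the current channel."""
    hops = hop_sequence(seed, len(bits))
    return [CHANNELS[ch][bit] for ch, bit in zip(hops, bits)]

def receive(notes, seed):
    """Follow the same hops; None marks a note that was not on our channel."""
    hops = hop_sequence(seed, len(notes))
    bits = []
    for ch, note in zip(hops, notes):
        low, high = CHANNELS[ch]
        bits.append(0 if note == low else 1 if note == high else None)
    return bits

message = [0, 1, 1, 0, 1, 0, 0, 1]
on_air = transmit(message, seed=7)
print(on_air)
print(receive(on_air, seed=7))  # a receiver with the shared seed recovers the bits
```

A receiver that follows a different hopping pattern listens on the wrong channel at most time steps, so it sees mostly None values instead of the message.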
Channel hopping offers several advantages over the simpler fixed-channel
method for popular WLAN networks like WiFi and Bluetooth.
First, it is not necessary to deal with frequencies jamming
as hosts join and leave the network. Second, it is much harder to
eavesdrop without knowing the key to the particular hopping sequence
upon which a transmitter/receiver pair is synchronized. Finally, the
problem of interference from outside sources (which usually occurs at
fixed narrow frequencies) is greatly reduced, so signals can be
transmitted using much less power.
Interesting historical note: the original patent holders on this
channel-hopping technique were Hedy Lamarr – glamorous Hollywood
actress and part-time engineer – and George Antheil, a composer of
film scores (among other things). Their 1942 patent on a secret com-
munication system described the use of a paper roll from a player
piano to create radio transmitters and receivers that thwarted eaves-
droppers by hopping from channel to channel in synchrony.
Putting it all together. Bit rates can vary widely depending on sev-
eral factors. First, every type of signal attenuates, or loses power,
when transmitted through a medium. The range of a medium is
the maximum distance a signal can travel without needing some kind
of booster or repeater. Range can be increased by adding power to
the transmitter, but this requires more energy and may not be cost-
effective. Second, bit rate depends on the reliability of a medium,
a measure of how (in)vulnerable it is to interference from outside
sources. High attenuation and low reliability mean bits can be
“dropped” during transmission, which leads to slower bit rates because
they have to be re-sent.
We can understand how these factors interrelate by appealing once
again to our sound transmission analogy. Your voice has a certain
range depending on whether you are speaking in a quiet room (low
interference) or on a city street corner (high interference). Your range
can be increased by shouting (adding power). But even within a given
range limit, and depending on the amount of ambient noise in the
area, you have to speak more carefully and slowly (or repeat yourself)
when your listener is several yards away than when you are speaking
face-to-face.
If your message has to be sent farther than your voice range car-
ries, you’ll have to line up some friends (signal boosters) to retransmit
your message. For example, the People’s Mic became popular in the
Occupy Wall Street protests in places where use of electric micro-
phones was banned: by this method, the speaker pauses after every
few words and the nearby crowd echoes the same phrase so it can be
heard farther away; multiple echoes are required if the crowd is very
large. This is perfectly effective for increasing range, but having to
echo phrases reduces the overall rate at which messages can be sent.
The engineers who design physical transmission technologies for
the Internet must balance bit rate against many factors including
attenuation, interference, range, power, and cost. We can’t assign a
specific number to, say, the bit rate of a CAT5 cable, but we can
identify some common points in the space of modern design options.
Ethernet is described in Section 4.3.1.
• Copper cables have range around 100 meters (328 feet) when
used to support 10Gbps in a popular type of Ethernet LAN.
Fiber optic cables can support the same bit rate at ranges around
40 miles. Copper wire has range as high as 10 kilometers (6.2
miles) when used in voice telephone lines, which require much
lower bit rates.
• Microwave transmission towers are typically spaced 25 to 50
miles apart. Earth-to-satellite transmissions span distances of
1,200 to 22,000 miles. The smaller number corresponds to Low
Earth Orbit satellites common in phone networks; the higher
number to geosynchronous satellites used in network communi-
cations. The round-trip latency of an earth-to-satellite trans-
mission is typically around 40 milliseconds for the former and
about half a second (500 milliseconds) for the latter.
• The WiFi wireless LAN standard specifies a range of 70 meters
indoors and 100 meters outdoors; this network operates at a bit
rate of 150Mbps, about 100 times slower than comparable wired
LANs.
• Bit error rate, a measure of reliability, refers to the proportion
of incorrect bits received in a given transmission over a given
time span. When used in Ethernet LANs, twisted pair cables
have error rates in the range of one to ten bits per million bits
transferred; error rates in WiFi LANs are comparable. Long
distance microwave transmitters have error rates around one
per billion. Optical cable is nearly impervious to noise since the
black plastic cover protects the fibers from light interference,
and bit error rates are negligible.
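To get a feel for what these bit error rates mean in practice, the sketch below works out two quantities for an error rate of one bit per million. It assumes that bit errors occur independently of one another, which is a simplification, and the 12,000-bit frame size is an illustrative choice rather than a figure from the text.

```python
def expected_bit_errors(num_bits, bit_error_rate):
    """Average number of incorrect bits in a transmission of num_bits."""
    return num_bits * bit_error_rate

def frame_success_probability(frame_bits, bit_error_rate):
    """Chance that every bit of a frame arrives intact (independent errors assumed)."""
    return (1 - bit_error_rate) ** frame_bits

errors = expected_bit_errors(8_000_000, 1e-6)      # one megabyte at 1 error per million
intact = frame_success_probability(12_000, 1e-6)   # hypothetical 12,000-bit frame
print(errors, intact)
```

Under these assumptions a one-megabyte transfer contains about 8 bad bits on average, and roughly 1 frame in 100 would need to be re-sent.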
The attentive reader will have noticed that optical fiber technology
soundly beats the competition in most categories: bit rate, security,
range, and reliability. This is why fiber optic cables have almost
completely replaced copper cables in long-distance communications
networks, and are gradually taking over LANs as well.
Indeed, optical controllers can achieve signal switching rates as
high as 50 Tbps at ranges up to 50 miles. This is about 500 times
faster than any modern computer could even produce bits to be trans-
mitted: fiber optic technology is waiting for computer technology to
catch up. It is not unusual for several network hosts to share the
use of a single fiber in a communications network by taking turns
using it, which reduces the awful gap between high capacity and low
utilization.
Bit rates are so high partly because infrared light has much higher
frequency than other types of electromagnetic waves, and partly be-
cause the signal attenuates very little over long distances, and is in-
vulnerable to interference. Fiber cable is also cheap to build and fairly
inexpensive to operate. The drawbacks are the expense of building
controllers (which gets smaller every year) and the inconvenience of
installing (unbendable) cables indoors.
Wireless networks tend to have slower bit rates than their wired
counterparts, but are relatively inexpensive to build and operate.
And of course wired media cannot compete with the convenience –
and sometimes necessity – of not having to be tethered to the rest of
the network.
The next time you download a file from the Internet, the bits in
the file will probably cross several types of physical media on their
way to you: perhaps by twisted pair cable from the file server to the
building exterior; then by fiber optic across town; then via microwave
satellite to cross an ocean; then more fiber optic to your home; then by
wireless transmitter to your computer screen. The overall bit trans-
mission rate that you experience will be determined by the slowest
transmission medium the bits encounter along the way. In most cases
the slowdown occurs at the first and final legs of the transmission.
One modern challenge for telecommunications companies is to find a
cost-effective way to speed up that so-called “last mile” of service to
the customer, especially in rural areas.
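The "slowest medium wins" rule can be stated as a one-line computation. The path below mirrors the download example above, but every bit rate in it is a made-up illustrative value, not a measurement.

```python
def end_to_end_rate(links):
    """Return (name, rate) of the slowest link, which limits the whole path."""
    return min(links, key=lambda link: link[1])

# Hypothetical path from file server to your computer, rates in bits per second.
path = [
    ("twisted pair in the server building", 1e9),
    ("fiber optic across town", 10e9),
    ("microwave satellite link", 500e6),
    ("fiber optic to the home", 1e9),
    ("wireless LAN", 150e6),
]
bottleneck = end_to_end_rate(path)
print(bottleneck)
```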
2.3 Storage Media
We finish this chapter with a quick survey of storage media, which
are used for representing bits inside computers and in carry-along
devices. Here are some options. Very roughly, the technologies in the
list below are ordered by access speed (time to read bits) from fastest
to slowest.
• A computer RAM (random access memory) is the kind of memory
the computer uses when running a program. RAM memory
is contained in small black chips (see Figure ??). Inside a RAM,
a 0-bit may be represented by 0 volts of electricity, and a 1-bit
by 5 volts of electricity, stored in a tiny circuit or capacitor.
This is known as a volatile storage medium because the bits
are lost when the power goes off.
• A ROM (read-only memory) is non-volatile, which means bits
are preserved when the power goes off. “Read-only” signifies
that the data can’t be over-written. ROMs are used for storing
files and programs when they are not being used; they contain
bit values that were fixed at the factory by setting tiny fuses and
antifuses to create electrical conductors and non-conductors in
the material.
• An EPROM is an erasable and programmable read-only memory
(note that “programmable” and “read-only” are opposites,
creating an oxymoron). In these types of memories the original
factory-set values can be erased by exposure to strong ultraviolet
light and then re-set with high-voltage programming devices;
but they eventually wear out and can’t be reprogrammed.
An EEPROM is erased electrically instead of with light, and a
flash memory is a newer type of EEPROM that supports
high speed reading and writing.
• A computer hard drive is another type of non-volatile storage
device. It stores binary values by changing the magnetic
orientations of tiny particles on the surface of a spinning disc.
The bit values are arranged in concentric rings called tracks.
A read-write head hovers over a track on the spinning disk, ei-
ther changing the magnetic orientations (writing to the disk),
or detecting the orientations (reading from it).
• CD-ROMs, audio CDs, and DVDs use optical methods to store
bits arranged on a disc. In this technology the 0’s and 1’s are
represented by microscopically-small patterns of smooth and
rough areas on the surface of the disc. The smooth areas are
called lands and the rough spots are called pits. The pits and
lands are arranged in tracks; a laser beam is aimed at a track
as the CD spins, and an optical sensor detects how the beam
reflects off the pits and lands.
• Another common type of magnetic storage medium is found on
the black strips on the backs of credit cards and other types
of swiping media. The black material contains microscopically
small magnets that can be oriented to represent binary values.
2.4 Resources
• Greg Sanger, “How Fiber Optics Works,” The Industrial Physicist,
Feb/Mar 2002, pp. 18-21. www.aip.org/tip/INPHFA/vol-8/iss-2/p18.pdf
• Wikipedia articles TBA
• Demos of AM vs FM TBA
2.5 Questions
Projects. Find a friend, and get your hands on some type of audio
transmission device: drum, trumpet, electric keyboard, etc. Design
a transmission technology around your device by filling out the ques-
tionnaire in Figure X.
Find a friend and get your hands on some type of long distance
visual transmission medium: signal flags, flashlights, stadium cards,
etc. Design a transmission technology around your device by filling
out the questionnaire in Figure X.
Find a friend and get your hands on a slinky. Develop an analog
transmission technology using the slinky.
Internet Surf: Here is another great idea for transmitting a long
message across a continent: burn everything into CD-ROM disks,
pack the disks into boxes, and ship everything by overnight delivery
service (such as Federal Express) to the destination host. Do some
online research and answer the following questions about this trans-
mission method. Justify your answers with a clear description of your
assumptions about how to apply each term to this particular technol-
ogy.
• What is the maximum range of an overnight shipment from your area?
• What is the transmission delay of this medium, at maximum range?
• How much total time is required to ship 10 Megabytes of data from one host site to another? How much time for 10 Gigabytes?
• How much would it cost to send 10 Megabytes of data a distance of 2000 miles? How much for 10 Gigabytes?
• How does this medium compare, in terms of transmission delay, total transmission time, and installation/transmission costs, to the other five transmission media discussed in this chapter?
• How does this medium compare to the others in terms of reliability and security?
• Can you think of a situation where this type of bulk-shipment transmission might be preferred over the other five?
Here are some other properties of physical media we may con-
sider. Answer the questions below; justify your answers with a clear
description of your assumptions about how to apply each term to this
medium.
• Was this medium synchronous or asynchronous (or perhaps both)?
• Was this a half-duplex or a full-duplex medium?
• What was the direct host-to-host range of this medium?
• What was the typical transmission delay?
• What were typical baud rates and the bandwidths?
• How did reliability and security compare to other transmission
  technologies in use at the time?
• How much total transmission time was required to send a
  message containing 50 letters?
Evaluate one of the following antique transmission technologies in
terms of our criteria: range, delay, cost, reliability, security, baud rate,
and bandwidth. You may need to do some online research to learn
more about how it works. What factors led to its demise?
• The telegraph system flourished in the United States in the middle 1800’s. Telegraph transmissions used Morse Code, which can
be thought of as a ternary (as opposed to binary) system using
the three symbols “dot” and “dash” and “space” to encode infor-
mation (you can’t decode the message without knowing where
the spaces are).
• Pony Express
• Ship-to-ship signaling with hand-held flags
• Ship-to-ship signaling with signal lamps.
(See Open spectrum @ Wikipedia to learn more about the Free Spectrum movement.)
Essays. Technology experts have pointed out that it is possible to
apply the spread-spectrum idea to long-distance wireless WAN broad-
casting as well. This would make broadcast licenses unnecessary and
eliminate a huge source of income for the U.S. government. The Open
Spectrum movement is aimed at expanding the range of frequencies
that can be used license-free.
The FCC regulates content (no dirty words) on TV broadcast. Do
they have the right to regulate content on the Internet as well?
Homework. Do the math.
http://www.wikipedia.org/wiki/Open_spectrum
1. Which is faster (assuming maximum signal switching rates for
each medium), a radio transmitter that can encode 8 bits per
signal, or a twisted pair transmitter that can encode only 2 bits
per signal? Why do you say so?
2. Suppose you need to transmit a collection of 50 files, each con-
taining 10 Mbytes of data. What kind of bandwidth is required
to be able to transmit this data in less than one second? Less
than 30 minutes? Which of the transmission technologies pre-
sented in this chapter are able to achieve those bandwidths?
3. Suppose you could stuff 100 Gigabytes of data into a pneumatic
tube carrier (commonly used in department stores at the turn
of the last century – look it up on Wikipedia). What would the
bandwidth be, assuming that carriers can follow one another at
a rate of one every two seconds?
4. A typical computer file containing a digital photograph may hold
a grid of 1200 x 1024 pixels (short for “picture elements,” which
look like tiny dots in the image). Each pixel is represented by a
24-bit value indicating its color and intensity. What is the total
size, in bits, of such a file? How long would it take to transmit
this file using the five transmission technologies discussed in this
chapter?
5. A typical digital audio file fresh from the recording studio re-
quires over a million bits per second to represent sound with
good fidelity.
• How many bits would your favorite 3-minute pop tune require?
• How long would it take to send the digital representation of
  that song over a wireless radio transmitter?
• The audio file could be compressed to about one-twelfth its
  original size using MP3 compression: how much transmission
  time would be required after compression?
• Is this more or less time than would be required for a
  commercial radio station to play the song?
• What accounts for the time differences between digital and
  audio types of radio transmissions?
Chapter 3
Binary Codes
As pointed out in Chapter 2, all information transmitted on the Inter-
net – whether numbers, email messages, web pages, photographs, or
tunes – must first be encoded as sequences of the binary values 0 and
1. A binary code is a convention that assigns meanings to patterns
of bits.
This chapter surveys some of the more common binary codes. Section
3.1 considers codes for representing numbers. Section 3.2 surveys
codes for representing letters and symbols, and Section 3.3 looks at
codes for representing images and sounds.
The last two sections survey codes that are especially useful in
network communications: Section ?? describes data compression codes
and Section ?? looks at codes for detecting and correcting transmis-
sion errors.
CHAPTER 3. BINARY CODES 50
3.1 Numbers
Numbers are represented in computers using the binary (base two)
number system instead of the decimal (base ten) system. To under-
stand how it works, recall some key facts about the base ten number
system that you learned in elementary school:
1. It uses ten digits, 0 through 9.
2. The position of a digit within a number tells you which power
of ten it represents.
3. The value of the digit is a multiplier for that power of ten.
For example, 5,082 represents:
    2 ones        $2 \times 10^0$  plus
    8 tens        $8 \times 10^1$  plus
    0 hundreds    $0 \times 10^2$  plus
    5 thousands   $5 \times 10^3$.
The principle is the same with binary numbers, except that only
the digits 0 and 1 are used, and the position indicates a power of two
rather than a power of ten. For example, 13 (thirteen) in base ten is
1101 in base two because those digits represent:
    1 one     $1 \times 2^0$  plus
    0 twos    $0 \times 2^1$  plus
    1 four    $1 \times 2^2$  plus
    1 eight   $1 \times 2^3$.
(Figure 3.3 has a table of powers of 2.)
Notice that 13 = 8 + 4 + 1. Not a coincidence.
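The place-value rule can be checked mechanically. The sketch below is written in Python, a language chosen here only for illustration; the helper name place_values is hypothetical and does not come from the text. It lists the amount contributed by each digit, rightmost digit first, and assumes a base of ten or less so that every digit is one of the characters 0 through 9.

```python
def place_values(digits: str, base: int) -> list[int]:
    """Return the value contributed by each digit, rightmost digit first."""
    # The digit in position i (counting from the right, starting at 0)
    # multiplies the i-th power of the base.
    return [int(d) * base ** i for i, d in enumerate(reversed(digits))]


print(place_values("5082", 10))      # [2, 80, 0, 5000]
print(place_values("1101", 2))       # [1, 0, 4, 8]
print(sum(place_values("1101", 2)))  # 13
```

Summing the list gives back the number itself, which is why 13 = 8 + 4 + 1.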
By convention, binary numbers are not written with interrupting
commas, but rather with interrupting spaces between blocks of
four digits. So the binary value 1110101 is written as 111 0101, not
1,110,101. This number is pronounced “one-one-one oh-one-oh-one.”

    Algorithm: Binary to Decimal(B)
      B: Binary number with digits $b_{n-1} b_{n-2} \ldots b_1 b_0$
      D: Decimal number with digits $d_{m-1} d_{m-2} \ldots d_1 d_0$

      D ← 0                          (1)
      for i ← 0 to n − 1             (2)
        do if $b_i$ = 1              (3)
             then D ← D + $2^i$      (4)
      output (D)                     (5)

Figure 3.1: An algorithm for converting a number from base two to base ten.
Converting among bases. Figure 3.1 shows an algorithm for con-
verting from base two to base ten. An algorithm is a step-by-step
procedure for accomplishing a given task.
Take a look at the algorithmic notation we shall use throughout
the book. The name of the algorithm is on the first line; the (B) in
parentheses signifies that B is the input to the algorithm. The top
section introduces the notation for the input and output variables in
the algorithm. A variable is like the x in an algebra problem - it is
a place-holder that can be set to a specific value when you work out
the problem. In this case our variables are the binary number B, the
decimal number D, and their individual digits. B contains n digits
identified by $b_0$ through $b_{n-1}$ (reading right to left) and D
contains m digits $d_0$ through $d_{m-1}$.
The (numbered) steps of the algorithm are interpreted like this:
(1) Start by setting D equal to 0. The ← is pronounced “gets,” as in D gets 0.
(2) The for loop notation specifies a repeated process: for each
value of i counting from 0 to n − 1, do steps (3) and (4).
(3)(4) The if-then statement specifies an alternative: if $b_i$
equals 1, then D gets D plus $2^i$; if $b_i$ does not equal 1, skip step
(4).
(5) The output reports the value calculated for D. Note that
this does not happen until after the for loop has run through all
its values for i.
Algorithms describe processes: when you carry out the process
using a specific input value, we say you are “running the algorithm.”
Let’s try running the algorithm for B = 1101 (so n = 4), keeping
track of D as we go. Notice that steps (2, 3, 4) are repeated once for
each value of i, because of the for loop.
(1) D = 0
(2, 3, 4) i = 0, $b_0 = 1$, so D = 0 + 1 = 1
(2, 3, 4) i = 1, $b_1 = 0$, so D remains 1
(2, 3, 4) i = 2, $b_2 = 1$, so D = 1 + 4 = 5
(2, 3, 4) i = 3, $b_3 = 1$, so D = 5 + 8 = 13
(5) The answer is D = 13.
We have converted the binary number 1101 to its decimal equiva-
lent 13. Try running the algorithm on other numbers to make sure you
understand how it works. (Demo: demos/demo-bin2dec/DemoApplet.html)
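For readers who want to run the algorithm on a computer, here is one possible translation of Figure 3.1 into Python. It is only a sketch: the function name binary_to_decimal is an assumption of this example, and the binary number is assumed to arrive as a string of 0s and 1s, optionally with spaces between blocks of four digits. The comments give the matching step numbers from the figure.

```python
def binary_to_decimal(bits: str) -> int:
    """Follow Figure 3.1: add 2**i for every digit b_i that equals 1."""
    digits = bits.replace(" ", "")   # allow spaced groups such as "111 0101"
    n = len(digits)
    d = 0                            # step (1): D gets 0
    for i in range(n):               # step (2): i counts from 0 to n - 1
        b_i = digits[n - 1 - i]      # b_0 is the rightmost digit
        if b_i == "1":               # step (3)
            d += 2 ** i              # step (4): D gets D plus 2^i
    return d                         # step (5)


print(binary_to_decimal("1101"))      # 13
print(binary_to_decimal("111 0101"))  # 117
```

The index n - 1 - i is needed because the digits are numbered from right to left, while a Python string is indexed from left to right.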
The reverse algorithm for converting a number from decimal to
binary is shown in Figure 3.2. The basic idea is to divide D in half
repeatedly, assigning a value to a digit of B depending on whether
the result is even or odd. Here’s how to read this algorithm:
(1) The algorithm uses an extra variable x. First, x gets the
value of input D.
(2) The for loop runs through every value of i, counting from
0 to n − 1, and assigning values to the digits $b_i$ one by one.
(3) The if-then-else statement specifies two alternatives: if x is
even, then $b_i$ gets 0; otherwise (if x is odd) $b_i$ gets 1.
(4) After $b_i$ is assigned a value, divide x in half. The notation
$\lfloor x/2 \rfloor$ means to round x/2 down to the next integer.
(5) The output line reports the result B.

    Algorithm: Decimal to Binary(D)
      D: Decimal number with digits $d_{m-1} d_{m-2} \ldots d_1 d_0$
      B: Binary number with digits $b_{n-1} b_{n-2} \ldots b_1 b_0$

      x ← D                              (1)
      for i ← 0 to n − 1                 (2)
        do if x is even                  (3)
             then $b_i$ ← 0
             else $b_i$ ← 1
           x ← $\lfloor x/2 \rfloor$     (4)
      output (B)                         (5)

Figure 3.2: An algorithm for converting from Decimal to Binary.

Let’s try running the algorithm with D = 13 as input.
(1) x = 13
(3) i = 0, x is odd, so $b_0 = 1$
(4) $x = \lfloor 13/2 \rfloor = 6$
(3) i = 1, x is even, so $b_1 = 0$
(4) $x = \lfloor 6/2 \rfloor = 3$
(3) i = 2, x is odd, so $b_2 = 1$
(4) $x = \lfloor 3/2 \rfloor = 1$
(3) i = 3, x is odd, so $b_3 = 1$
(4) $x = \lfloor 1/2 \rfloor = 0$
(5) The answer is B = 1101
Try running the algorithm on other decimal numbers.
(Demo: demos/demo-dec2bin/DemoApplet.html)
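The reverse conversion can be sketched in Python in the same way. The function name decimal_to_binary and the choice to pass the number of binary digits n as a second argument are assumptions of this example; Figure 3.2 simply takes n as given.

```python
def decimal_to_binary(d: int, n: int) -> str:
    """Follow Figure 3.2: produce the n binary digits of d, b_0 first."""
    x = d                                        # step (1): x gets D
    digits = []                                  # collects b_0, b_1, ..., b_{n-1}
    for _ in range(n):                           # step (2): i counts from 0 to n - 1
        digits.append("0" if x % 2 == 0 else "1")  # step (3): even gives 0, odd gives 1
        x = x // 2                               # step (4): round x/2 down
    return "".join(reversed(digits))             # step (5): write b_{n-1} ... b_0


print(decimal_to_binary(13, 4))   # 1101
print(decimal_to_binary(13, 8))   # 00001101
```

If n is too small for the number, the leftover high-order digits are silently dropped, which is a preview of the overflow problem discussed below.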
Word size and overflow. The word size is the number of bits
allowed in a given binary code. The word size determines how many
different patterns are possible. For example, a 4-bit word allows 16
different patterns:
    0000 0001 0010 0011
    0100 0101 0110 0111
    1000 1001 1010 1011
    1100 1101 1110 1111
The general rule is this:
    A b-bit word size permits $2^b$ different patterns. To
    represent x different patterns, you need a word size of at
    least $\lceil \lg x \rceil$ bits.
(Where have you seen this notation before?)
The notation lg denotes the base-2 logarithm, and $\lceil x \rceil$ means x
rounded up to the next integer. Thus we would need a 5-bit word to
encode all 26 letters of the alphabet, because $5 = \lceil \lg 26 \rceil$. Since a
5-bit word encodes 32 bit patterns, there would be 6 unused patterns
left over. Figure 3.3 shows a handy table of powers of 2.
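The general rule is easy to try out. The Python sketch below uses two hypothetical helper names, pattern_count and bits_needed, that are not part of the text. Instead of computing a logarithm with floating-point arithmetic, bits_needed relies on the fact that the rounded-up base-2 logarithm of x equals the number of binary digits in x − 1; that shortcut is a choice made for this example.

```python
def pattern_count(b: int) -> int:
    """Number of different patterns in a b-bit word: 2 to the power b."""
    return 2 ** b


def bits_needed(x: int) -> int:
    """Smallest word size that can hold x different patterns (x >= 1)."""
    if x < 1:
        raise ValueError("need at least one pattern")
    # int.bit_length() counts the binary digits of a number, so the
    # bit length of x - 1 equals lg x rounded up to the next integer.
    return (x - 1).bit_length()


print(pattern_count(4))                     # 16
print(bits_needed(26))                      # 5
print(pattern_count(bits_needed(26)) - 26)  # 6 unused patterns
```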
Most modern computers use 32-bit or sometimes 64-bit words to
represent integers. From Figure 3.3 we see that a 32-bit word permits
4,294,967,296 different patterns. This means that only the nonnegative
integers 0 through 4,294,967,295 can be represented as binary numbers.
Computer arithmetic has this fundamental limitation: whenever
an arithmetic operation results in a number that can’t be repre-
sented in the given word size, the calculation will be wrong. In
particular, the high-order bits (on the left) of the answer are simply
lost. This is called numerical overflow. Here is a demonstration
of overflow arising in a simple computation.
(Demo: demos/demo-factorial/DemoApplet.html)
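Overflow can be imitated in a few lines of Python. Python's own integers grow as large as needed and never overflow, so this sketch assumes an unsigned 32-bit word and throws away the high-order bits by hand after every multiplication. The names wrap and factorial_wrapped are hypothetical, and the code is not taken from the demo mentioned above.

```python
import math


def wrap(value: int, word_size: int = 32) -> int:
    """Keep only the low-order word_size bits, losing everything to the left."""
    return value & ((1 << word_size) - 1)


def factorial_wrapped(n: int, word_size: int = 32) -> int:
    """Compute n! the way a machine limited to word_size-bit words would."""
    result = 1
    for k in range(2, n + 1):
        result = wrap(result * k, word_size)
    return result


print(factorial_wrapped(12), math.factorial(12))  # 479001600 479001600
print(factorial_wrapped(13), math.factorial(13))  # 1932053504 6227020800
```

The answer for 12 is right because 12! still fits in 32 bits. The answer for 13 is wrong because 13! is larger than 4,294,967,295.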
Programmers are expected to be aware of the possibility of over-
flow and to write programs that avoid the problem. For the most
part, they do; but programmers don’t always manage to catch every
error. Section 3.5 has some scary stories about things gone wrong due
to numerical overflow.
     x   $2^x$
     0   1
     1   2
     2   4
     3   8
     4   16
     5   32
     6   64
     7   128
     8   256
     9   512
    10   1,024
    11   2,048
    12   4,096
    13   8,192
    14   16,384
    15   32,768
    16   65,536
    17   131,072
    18   262,144
    19   524,288
    20   1,048,576
    21   2,097,152
    22   4,194,304
    23   8,388,608
    24   16,777,216
    25   33,554,432
    26   67,108,864
    27   134,217,728
    28   268,435,456
    29   536,870,912
    30   1,073,741,824
    31   2,147,483,648
    32   4,294,967,296

Figure 3.3: Powers of two.

Negative integers. Of course, computers would be useless if they
couldn’t represent negative as well as positive numbers. To explain
how negative numbers are represented, we need to choose a fixed
word size. Real computers use word sizes of 32 (sometimes 64) bits
for positive and negative numbers, but the illustrations in this section
use 12-bit words for simplicity. A 12-bit word yields 4096 different
patterns.
There are three different codes in common use for representing
sets of positive and negative integers. The first and the simplest code
is called signed magnitude notation. In this code the leftmost bit
represents the sign of the number, 0 for positive and 1 for negative,
and the remaining bits represent the number’s magnitude in base two.
For example, decimal 13 is represented in a 12-bit signed magni-
tude as
0000 0000 1101.
Notice how the number is “padded” with zeros on the left to fill out
the word. The decimal value -13 is represented by
1000 0000 1101.
If we were interpreting this as a simple binary number it would
represent decimal 2061. The signed magnitude code gives us a different
interpretation of this bit pattern.
The signed magnitude code splits the set of possible patterns into
two groups. Patterns that start with 0 represent positive integers,
and patterns that start with 1 represent negative integers. Thus,
with a 12-bit word, the largest number that can be represented is
0111 1111 1111, equal to +2047. The smallest number is binary
1111 1111 1111, equal to -2047. The interpretation is the same no
matter what the word size. (Note $2^{11} = 2048$.)
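Here is a small Python sketch of the signed magnitude code for 12-bit words. The function names to_signed_magnitude and from_signed_magnitude are hypothetical, and bit patterns are assumed to be strings of 0s and 1s (spaces between blocks are allowed on input).

```python
def to_signed_magnitude(value: int, w: int = 12) -> str:
    """Encode value as a sign bit followed by a (w - 1)-bit magnitude."""
    limit = 2 ** (w - 1) - 1                  # 2047 for a 12-bit word
    if abs(value) > limit:
        raise ValueError(f"{value} does not fit in a {w}-bit word")
    sign = "1" if value < 0 else "0"          # leftmost bit: 0 positive, 1 negative
    return sign + format(abs(value), f"0{w - 1}b")   # pad with zeros on the left


def from_signed_magnitude(bits: str) -> int:
    """Decode a signed magnitude bit pattern back into an integer."""
    digits = bits.replace(" ", "")
    magnitude = int(digits[1:], 2)            # everything after the sign bit
    return -magnitude if digits[0] == "1" else magnitude


print(to_signed_magnitude(13))                  # 000000001101
print(to_signed_magnitude(-13))                 # 100000001101
print(from_signed_magnitude("1111 1111 1111"))  # -2047
print(int("100000001101", 2))                   # 2061, the plain binary reading
```

The last line shows that the same twelve bits mean something different when they are read as a simple binary number.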
This is a perfectly logical idea, but it has some drawbacks. For
one thing, there are two ways to represent 0, as a positive number
0000 0000 0000 and a negative number 1000 0000 0000, which adds
complications when the computer performs arithmetic. Second, the
computer circuitry needed to perform subtraction using this code is
unnecessarily slow and expensive to build.
An alternative code that solves these two problems is called two’s
complement notation. Positive integers in two’s complement look
exactly the same as positive integers in signed magnitude. Also, like
signed magnitude, two’s complement notation uses the leftmost bit
as a sign bit, 0 for positive and 1 for negative. Negative numbers,
however, use a different scheme, as illustrated by the algorithm in
Figure 3.4.

    Algorithm: Negate Two’s Complement(P)
      P: Binary number with digits $p_{w-1} p_{w-2} \ldots p_1 p_0$
      N: Binary number with digits $n_{w-1} n_{w-2} \ldots n_1 n_0$

      for i ← 0 to w − 1             (1)
        do if $p_i$ = 0
             then $n_i$ ← 1
             else $n_i$ ← 0
      N ← N + 1                      (2)
      output (N)

Figure 3.4: Negating a number in two’s complement notation.
The algorithm basically takes two steps: the for loop in step (1)
flips all the bits from 0 to 1 and from 1 to 0, and then step (2)
adds one to the result. To run the algorithm you need to be able to
add numbers in base two. It’s not so hard, since there are only four
possible outcomes of adding two binary digits:
      0      0      1      1
     +0     +1     +0     +1
     --     --     --     --
      0      1      1     10
To add larger numbers, you follow the base ten method, working
right to left and carrying the one when necessary. Here are some
examples of addition using 12-bit words; the top row shows the carry
digits. The decimal version of each sum is shown below it.
                               111         1 1111 1
  0000 0001 1000     0000 1111 0111     0000 1111 1111
+ 0000 0000 0001   + 0000 0000 0101   + 0000 0000 0100
  --------------     --------------     --------------
  0000 0001 1001     0000 1111 1100     0001 0000 0011

              24                247                255
             + 1                + 5                + 4
             ---                ---                ---
              25                252                259
Here’s a placeholder for a DEMO for you to practice adding num-
bers in binary.
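In the meantime, the right-to-left method can be sketched in Python. The function name add_binary is hypothetical, and the sketch assumes 12-bit words given as strings of 0s and 1s. A carry out of the leftmost position is dropped, as it would be in a fixed-size word.

```python
def add_binary(a: str, b: str, w: int = 12) -> str:
    """Add two w-bit binary numbers digit by digit, carrying the one."""
    x = a.replace(" ", "").zfill(w)      # pad with zeros on the left
    y = b.replace(" ", "").zfill(w)
    if len(x) > w or len(y) > w:
        raise ValueError(f"inputs must fit in {w} bits")
    carry = 0
    out = []
    for i in range(w - 1, -1, -1):       # work from the rightmost digit leftward
        total = int(x[i]) + int(y[i]) + carry
        out.append(str(total % 2))       # the digit written below the line
        carry = total // 2               # the digit carried to the next column
    return "".join(reversed(out))        # a final carry, if any, is lost


print(add_binary("0000 0001 1000", "0000 0000 0001"))  # 000000011001
print(add_binary("0000 1111 0111", "0000 0000 0101"))  # 000011111100
print(add_binary("0000 1111 1111", "0000 0000 0100"))  # 000100000011
```

These are the same three sums worked out by hand above: 24 + 1, 247 + 5, and 255 + 4.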
Let’s run the algorithm to find the two’s complement representa-
tion of -19.
Input: P = 0000 0001 0011 is the binary representation of 19.
(1) After the for loop runs, N = 1111 1110 1100.
(2) Adding N + 1 = 1111 1110 1101.
(3) The answer is 1111 1110 1101.
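The two steps of Figure 3.4 can also be written as a short Python sketch. The function name negate_twos_complement is hypothetical, and the word size w is taken to be the number of digits in the input string; both are assumptions of this example.

```python
def negate_twos_complement(p: str) -> str:
    """Follow Figure 3.4: flip every bit, then add one."""
    digits = p.replace(" ", "")
    w = len(digits)
    # Step (1): flip all the bits from 0 to 1 and from 1 to 0.
    flipped = "".join("1" if bit == "0" else "0" for bit in digits)
    # Step (2): add one, dropping any carry out of the leftmost position.
    n = (int(flipped, 2) + 1) % (2 ** w)
    return format(n, f"0{w}b")           # pad back out to w digits


print(negate_twos_complement("0000 0001 0011"))  # 111111101101
print(negate_twos_complement("1111 1110 1101"))  # 000000010011
```

Running the function on its own output returns the original pattern for 19, so negating twice gives back the number you started with.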
The algorithm in Figure 3.4 works