
The Internet and the Foundations of Computer Science

Catherine C. McGeoch

January 22, 2013

Contents

1 Introduction
  1.1 What is the Internet?
    1.1.1 The Layered Architecture
    1.1.2 Where do standards come from?
  1.2 What is Computer Science?
    1.2.1 Process: Protocols and Algorithms
    1.2.2 Data: Codes, Formats, and Structures
    1.2.3 Machines and Languages
  1.3 What’s in the book?
  1.4 Further Reading

2 The Physical Layer: Bits in Motion
  2.1 Physical Transmission Media
  2.2 Bit Transmission Rates
  2.3 Storage Media
  2.4 Resources
  2.5 Questions

3 Binary Codes
  3.1 Numbers
  3.2 Letters and Symbols
  3.3 Sounds and images
  3.4 Further Reading
  3.5 Questions

4 Protocols
  4.1 Two Dining Philosophers
    4.1.1 Two Bad Protocols
    4.1.2 A random backoff protocol.
    4.1.3 A channel partitioning protocol.
    4.1.4 A token-passing protocol.
  4.2 Designing protocols for real networks.
    4.2.1 Network topologies.
    4.2.2 Physical Properties of Networks
    4.2.3 Frame Formats and Binary Codes
    4.2.4 Error control.
  4.3 Example Link Layer Protocols
    4.3.1 Ethernet (IEEE 802.3)
    4.3.2 Wireless Ethernet
    4.3.3 ATM
  4.4 Chapter Review
  4.5 Problems

5 State Machines
  5.1 Describing Protocols With State Machines.
    5.1.1 A Simple State Machine
    5.1.2 State Machines for Dining Philosophers
  5.2 Analyzing Protocols
    5.2.1 Deadlock Avoidance
    5.2.2 Liveness properties
    5.2.3 Efficiency
  5.3 Other Uses For State Machines
    5.3.1 Representing Patterns and Formats
    5.3.2 Representing Languages
    5.3.3 Playing Games

6 The Network Layer
    6.0.4 Networks and Internetworks, Simplified
    6.0.5 Network Service Model
    6.0.6 Routing
  6.1 Example: The IP Protocol

7 Algorithms and Data Structures
  7.1 Reading pseudocode.
    7.1.1 Our First Algorithm
    7.1.2 An Algorithm with Iteration
    7.1.3 Algorithms 102
  7.2 Algorithms 102: Data Structures and Procedures
    7.2.1 Data Structures
  7.3 Finding Shortest Paths
    7.3.1 The Link State Algorithm
  7.4 Looking Things Up
    7.4.1 Linear Search
    7.4.2 Binary Search
    7.4.3 Binary Search Trees
    7.4.4 Hash Tables
  7.5 Algorithm Analysis

8 The Transport Layer
  8.1 Example: TCP
  8.2 Operating Systems

9 Programming Languages
  9.1 What Programmers Do
  9.2 A Brief History of Programming Languages
  9.3 Some Example Languages
    9.3.1 Java: A high-level language
    9.3.2 MINI: A simple assembly language
    9.3.3 MINI-CODE: A simple machine language
  9.4 Software and Society
    9.4.1 Copyrights and Patents: Is software text or machine?
    9.4.2 The Open Source Debate
    9.4.3 Safety, security, and responsibility

10 The Internet And Cryptography
  10.1 Secret Codes
  10.2 Public Key Encryption
    10.2.1 Cryptographic Protocols
    10.2.2 One-Way Trapdoor Functions
  10.3 Encryption on the Internet.
  10.4 Notable Encryption Technologies
  10.5 Privacy, Security, and the Law

Preface

To The Student

In 1908, James E. Homans published “Self-Propelled Vehicles, a Comprehensive
Treatise on the Theory, Construction, Operation, Care, and Management of all
forms of automobiles.” His preface points out

that the motor vehicle is an extremely complex machine:

Its construction and operation involve the consideration

of an extensive range of facts in several widely separated

departments. The study of its construction and operation

is a liberal education in itself.

In order to answer every question that must occur .... one

must produce a whole library of books, rather than a single

volume of convenient size. Virtually all such questions

may be forestalled, however by clear explanations of the

principles governing the design and construction of the

machine.

Substitute “Internet” for “motor vehicle” in the above, and the
claim holds today: The study of the construction and operation of the
Internet requires a collection of information from a wide variety of
sources; but a clear explanation of the principles governing its design
can forestall a great deal of perplexity.

This book sets out to do for the Internet what Homans did for the

automobile: to present a comprehensive overview of the major Inter-

net technologies, with an emphasis on the scientific principles under-

lying their design, rather than on instructions for their use. You will

find no information on how to install and use the Netscape browser:

but you will learn how browsers work in general, and what makes one

browser design better than another. You will not learn how to install

an Ethernet card, but you will be able to evaluate the pros and cons

of the Ethernet media access protocol.

Together with a tour of Internet technologies, this book provides a
broad overview of the major questions and research areas that make up
the discipline of Computer Science: throughout, the technology is used
to introduce related fundamental questions, and conceptual tools for
attacking those questions. If you intend to major in Computer Science,
this book will give you a solid foundation on which to base your
studies; if you do not intend to study Computer Science further, this
book will give you an appreciation for the intellectual challenges we
face, and a better understanding of what technological advances the
future might hold.

The book assumes no prior experience with the Internet or with

computers, beyond a mild familiarity with its most visible parts: if

you have used email, if you know what a web browser is, you’ll be

fine. Some sections assume mathematical background at about the

middle school level: if you can multiply and divide, and if you vaguely

remember how exponentials and logarithms work, you’ll be fine.


To the Professor

This book presents a broad first introduction to the discipline of

computer science. Most of the topics recommended as fundamental

computer science concepts by the joint ACM/IEEE-CS task force in

Computing Curricula 2001 1 are presented here: algorithmic thinking

(design and analysis); data structures and data representation; the

importance of abstraction; and the role of programming languages

in the computational enterprise. Several topics relating to computer

science and society are also covered.

What makes this book different from similar introductory textbooks
is that these topics are tied to a survey of Internet technologies,
rather than to computing technologies that reside “inside the box.”
For example, a conventional textbook might present various storage
media (RAM and ROM, magnetic disks, magnetic tapes, etc.) and use that
presentation to motivate a discussion of bit codes for representing
information; this textbook, instead, surveys physical transmission
media (copper wire, fiber optics, wireless) on the way to a discussion
of bit codes and bit patterns.

In a similar way, the discussion of data structures for the search
problem is naturally introduced by a networking question: how might
a router organize a routing table (which contains keyed information)
for most efficient lookups? The discussion of intractability is
prompted by questions arising in public-key cryptography; and the
survey of programming languages informs a discussion of the Digital
Millennium Copyright Act and whether software should be considered
free speech, a patentable machine, or copyrightable intellectual
property. Old wine, new bottles: the computer science part is
fundamental and largely unchanged; the technology part joins the 21st
century.

1 www.computer.org/education/cc2001/final/


Why reorganize old material in this new way? Because it is fun

and topical. The Internet has rocketed to the forefront of modern con-

sciousness, yet its inner workings remain mysterious to most people.

Students are deeply interested in learning more about how it works,

and the study of Internet technologies is rich in jumping-off places for

investigation of the deeper questions of computer science, and of the

social issues that determine how the Internet will evolve.

Chapter 1

Introduction

Internal links are in red. External links are in blue.

This book provides a broad survey of Internet technologies, and of the

discipline of Computer Science. Although the two areas are related,

do not confuse one with the other: Science is concerned with discovery of

universal and demonstrable truths, and exploring the limits of what

we can know; Technology is about building useful and reliable tools.

New scientific knowledge is gained through rigorous effort and adopted

after careful review by trained skeptics. New technologies can sweep

through societies like wildfire or disappear overnight.

Check out the companion website, http://www.cs.amherst.edu/ccm/, for fun.

Or you can visit my home page.

Here is how you would add a link to an internal label. See the

definition of the application layer in the next section.


1.1 What is the Internet?

Here is a picture of twisted pair wire.

Our exploration follows a well-documented conceptual structure called
the layered architecture of the Internet. This layered architecture
organizes the enormous complexity of the Internet into distinct
interacting parts, called “layers.” Most Internet products are designed
to run “within” specific layers of this architecture, and to interact
with one another “across” these layers. We shall consider these layers
in bottom-up order in this text; each new layer will prompt some
“deeper” questions of computer science.

This is a rightside margin. You can read more in Chapter 3.

We start with some definitions. A host is a generic term for any
computer-like device that is connected to a network, including a
personal computer, a multi-user computer, printer, router, and other
types of devices. Computers and hosts run programs. When a program is
running inside a host, it is called a process. Depending on the
capacity and complexity of a host, between one and several hundred
processes might be resident at any given time.

A network is a collection of hosts that are all owned and con-

trolled by a single organization. The Internet is a huge collection of

public and privately-owned networks, all cooperating to ship messages

from source hosts to destination hosts.

A LAN (local area network) is small, perhaps ranging over a group
of offices: LANs make up the neighborhood roads and driveways of the
so-called Information Highway. A MAN (metropolitan area network)
covers a regional area: these networks form the highways and beltways
that get you across town quickly. A WAN (wide area network) may span a
nation or continent: these are the high speed Interstate highways.

Occasionally, a process sends a message to another process, which
may be on the same host, or on a host within the same network, or a
really remote host somewhere on a distant network.

Figure 1.1: Two pictures of twisted pair wire

1.1.1 The Layered Architecture

In fact, a typical Internet transaction involves not only the browser

and the server processes, but several other processes running on the

client and server hosts, as well as processes running on hosts located

in between these two. These various processes belong to different layers

of the Internet Protocol Stack, according to the services they

are expected to provide. Figure 1.2 shows the five main layers of the

Internet (some subdivisions into more layers are ignored here). In the

next few sections we will see what services are provided at each layer.

The Internet layered architecture provides a conceptual device for
organizing the huge variety of software and hardware products that work
together to make the Internet run. Any particular Internet software
or hardware product can be categorized as “belonging” to some layer
according to its basic functions and responsibilities. We say the
product provides specific services to products in the layer above it,
and it relies upon services provided by products in the layer below it.

                     The User

Layer               Service                 Examples
Application Layer   User services           Email, Web browsers
Transport Layer     Concierge desk          TCP, UDP
Network Layer       Routing and delivery    IP
Data Link Layer     Direct contact          Ethernet
Physical Layer      Transmission media      Copper wire, Radio waves

Figure 1.2: The Layered Internet

Check out this margin too. Can you add a url?

No matter where they live in the layered architecture, a pair of
processes performing an Internet transaction are known by their roles
as the client and the server. In the usual scenario, the server process
waits patiently on its host for something to happen; the client process
initiates contact and sends a request message to the server; and the
server responds to the request.

The Application Layer The application layer contains the software

that interacts with and provides services to Internet users, also known

as human beings. This layer presents the “look and feel” of the Inter-

net, and is the one you are probably most familiar with.


For example, your favorite web browser is an application-layer product.

If you want to view a particular page on the Internet, you can
click on a link, type in a URL (Uniform Resource Locator), or type
something into a query box and click “Submit.” This causes your
browser, acting as client, to send a request message across the
Internet to the distant computer where the desired web page resides. A
web server process running on that host is responsible for receiving
the message from your browser and sending the requested web page back
to it. Firefox and Internet Explorer are examples of popular browser
software packages; the most popular web server software package is
called Apache.

In fact, the process is a little more complicated. The Internet
Domain Name System makes it easier for humans to remember which
hosts are where, by assigning Internet domain names, like
www.cs.amherst.edu or cnn.com, to particular hosts. But an Internet
transaction must use the “real” host names, which are called IP
numbers (IP means Internet Protocol). An IP number is a 4-part number
that looks like this: 148.85.77.95.
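To make the four-part form concrete, here is a minimal Python sketch that splits an IP number into its four parts and packs them into a single 32-bit value. It assumes each part is a decimal number from 0 to 255, which holds for addresses of the kind shown here; the function names are illustrative and are not taken from any library.

```python
def ip_to_parts(ip_number):
    """Split a dotted IP number such as '148.85.77.95' into four integers."""
    parts = [int(p) for p in ip_number.split(".")]
    if len(parts) != 4 or any(p < 0 or p > 255 for p in parts):
        raise ValueError("expected four parts, each between 0 and 255")
    return parts


def ip_to_int(ip_number):
    """Pack the four parts into one 32-bit integer, 8 bits per part."""
    value = 0
    for part in ip_to_parts(ip_number):
        value = value * 256 + part  # shift left by 8 bits, then add the next part
    return value


if __name__ == "__main__":
    print(ip_to_parts("148.85.77.95"))  # [148, 85, 77, 95]
    print(ip_to_int("148.85.77.95"))    # 2488618335
```

Each part fits in 8 bits, so the whole IP number fits in 32 bits; the dotted form is simply easier for people to read.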

Every time you use a domain name with your browser (via link or

typed-in URL), it must find out the IP number of the host associated

with that domain name. So when you want to view that page, your

browser first acts as a client to send an inquiry to a DNS (Domain

Name System) server somewhere on the Internet, and the server sends

back the IP number that corresponds to that domain name. Only then

can your browser act as client to send the request message to the web

server on the remote host.
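The two-step sequence described above can be sketched in Python. This is a simulation only: the dictionaries stand in for a DNS server and a web server, the pairing of www.cs.amherst.edu with 148.85.77.95 is made up for illustration, and no real network traffic is involved.

```python
# Hypothetical records standing in for a DNS server somewhere on the Internet.
DNS_TABLE = {"www.cs.amherst.edu": "148.85.77.95"}

# Hypothetical pages held by the web server on each host, keyed by IP number.
WEB_PAGES = {"148.85.77.95": {"/": "<html>Welcome</html>"}}


def dns_lookup(domain_name):
    """Step 1: the browser, acting as client, asks a DNS server for the IP number."""
    if domain_name not in DNS_TABLE:
        raise KeyError("unknown domain name: " + domain_name)
    return DNS_TABLE[domain_name]


def web_request(ip_number, path):
    """Step 2: the browser, acting as client again, asks the web server for a page."""
    return WEB_PAGES[ip_number][path]


def fetch(domain_name, path="/"):
    """Run both steps in order; step 2 cannot start until step 1 has answered."""
    ip_number = dns_lookup(domain_name)
    return web_request(ip_number, path)


if __name__ == "__main__":
    print(fetch("www.cs.amherst.edu"))
```

The browser plays the client role twice, once with each server, which is why a single click can involve more than one Internet transaction.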

Domain name lookup is an example of a service that is provided

at the application layer, but which doesn’t involve direct interaction


with users, so you don’t usually notice it. There may be other “invis-

ible” services involved in simple web-page access, which may provide

encryption/decryption of some messages, host authentication (is this

ebay.com or a fake website?), and so forth.

Here are more examples of services found at the application layer

of the Internet.

• Email. When you want to send a message to your best friend Sam
  (sam@somewhere.com), your email client will perform a domain name
  lookup to find the IP number of Sam’s email server host
  (somewhere.com), then transmit the message to the email server
  running on that host, which stores it for safekeeping. Sam’s email
  client will eventually ask the server if any mail has arrived, and
  the server will respond with a “yes” message (Sam’s client goes
  bing - you’ve got mail). Sam’s client will send a request for the
  message to the server, and then will show it to Sam.

• Instant Messaging.

• File transfer and exchange. This does not have to be illegal.

• Remote login. A program such as ssh allows you to sit at the
  keyboard of one computer and have all your keystrokes and commands
  transferred to a distant computer for execution over there. This is
  called having remote access to the other computer. The computer you
  sit at runs the ssh client program, which initiates the connection,
  and the distant computer runs the ssh server program, which receives
  commands that you type.


Transport Layer. Your browser doesn’t contact the web server di-

rectly. Instead, it composes a request message, and asks another

process to deal with it. In fact, in order to accomplish all this sending

and receiving, all application layer processes rely on the services of a

transport layer process.

The transport layer process works like the concierge desk at a

hotel, managing the communication between guests and the outside

world. The concierge receives letters from residents and hands them

to the mail carrier, and she receives letters from the mail carrier and

notifies the appropriate resident that something has arrived.

In the context of the Internet, transport layer services are usually

provided by two software products known as TCP (Transmission

Control Protocol) and UDP (User Datagram Protocol). The former

is slower but more reliable, the latter is quicker but less reliable. Each

product contains both client and server software.

Network Layer In our hotel analogy, the network layer works

like an international postal service made up of local, regional, and

national offices. Local offices communicate with concierges at the

sending and receiving ends of the mail shipment, but in between, the

offices contact one another to transmit the mail in stages. Typically an

Internet message must make several hops, from host to host, network

to network, along the way to its destination.

To make it all work, every host connected to the Internet must

run a network layer program called Internet Protocol (IP). When

IP receives a message, it checks the message destination and figures

out which neighboring host to send the packet to next: this decision-

making operation is called routing. Routers must be able to monitor

the changing status of nearby hosts and come up with alternative


routes around congested, crashed, or forbidden areas of the network.
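As a rough illustration of the routing decision, the Python sketch below picks the next neighboring host for a packet and falls back to an alternative when the preferred neighbor is unavailable. The table layout, the names net-A and router-1, and the function next_hop are all hypothetical; real IP routing is more involved than this.

```python
# Hypothetical routing table: for each destination network, the neighboring
# hosts that can carry a packet toward it, listed in order of preference.
ROUTING_TABLE = {
    "net-A": ["router-1", "router-2"],
    "net-B": ["router-2"],
}


def next_hop(destination, routing_table, unavailable):
    """Pick the first preferred neighbor that is not congested, crashed, or forbidden."""
    for neighbor in routing_table.get(destination, []):
        if neighbor not in unavailable:
            return neighbor
    return None  # no usable route is known right now


if __name__ == "__main__":
    print(next_hop("net-A", ROUTING_TABLE, unavailable=set()))         # router-1
    print(next_hop("net-A", ROUTING_TABLE, unavailable={"router-1"}))  # router-2
    print(next_hop("net-B", ROUTING_TABLE, unavailable={"router-2"}))  # None
```

The set of unavailable neighbors plays the part of the status monitoring mentioned above: as it changes, the same table can yield a different route.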

Chapter ?? looks at major challenges and choices associated with

Transport and Network layer services and Chapter ?? explores routing

in more detail.

Data Link Layer A journey of a dozen hops begins with a single

hop. The network layer process decides which host should next receive

the message, and then contacts a process at the data link layer to

take care of the actual transmission of the message to the adjacent host.

This process is responsible for contacting the neighboring host and for

ensuring that the packet is properly transmitted. The most widely

used suite of services at this layer is provided by Ethernet products,

but many alternatives are available. Chapter ?? surveys data link

layer services.

Physical Layer The direct connection between two hosts must be

made of some physical transmission medium, perhaps a wire, or perhaps

radio waves. The bottom layer of the Internet is concerned

with the variety of technologies available for connecting hosts one

to another, and for transmitting messages as efficiently, reliably, and

cheaply as possible.
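The relationship among the five layers can be summarized in a few lines of Python. This is a simplified sketch that ignores the intermediate hosts a message hops through; the names LAYERS, layer_below, and layers_visited are illustrative only.

```python
# The five layers of Figure 1.2, listed from top to bottom.
LAYERS = ["Application", "Transport", "Network", "Data Link", "Physical"]


def layer_below(layer):
    """Return the layer whose services the given layer relies on, or None at the bottom."""
    position = LAYERS.index(layer)
    return LAYERS[position + 1] if position + 1 < len(LAYERS) else None


def layers_visited():
    """List (host, layer) steps for one message: down on the sender, up on the receiver."""
    going_down = [("sender", layer) for layer in LAYERS]
    going_up = [("receiver", layer) for layer in reversed(LAYERS)]
    return going_down + going_up


if __name__ == "__main__":
    print(layer_below("Network"))  # Data Link
    for host, layer in layers_visited():
        print(host, layer)
```

Each layer hands the message to the layer directly beneath it on the sending host, and the receiving host passes it back up in the opposite order.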

1.1.2 Where do standards come from?

A standard (including but not limited to binary code standards) is
a written document containing a precise technical specification of how
some type of information is to be represented or how some process
should work. A draft proposal for a standard, called an RFC (Request
for Comments), is made available to industry professionals, who are
invited to criticize it and to suggest modifications. After a few
rounds of criticism, the standard is officially adopted (but still
called an RFC even after adoption). The technical definition of the
Internet is contained within a large and widely dispersed collection of
RFCs produced by several different standard-writing organizations.

Standards developed through open dialog and made public in tech-

nical documents are called open standards. In contrast, a closed stan-

dard is developed privately by a company and protected as a trade

secret. As Chapter ?? suggests, however, there is much controversy

over the open-standard vs closed-standard ideals: on the one hand,

open standards are generally recognized as more robust, error-free,

and cost-effective than closed standards; and manufacturers can in-

crease sales by claiming that their computing devices and software

products are fully compatible with products sold by others. On the

other hand, open standards conflict with traditional views of copy-

right, patent protections, trade secrets, and licensing rights.

Here are some of the major Internet open standards organizations.

Many of the codes mentioned below will be discussed in this chapter.

• ANSI. The American National Standards Institute publishes standards in all areas of manufacturing, science, and information

technology. They are responsible for the ASCII code, which is

used for representing letters and characters. Visit www.ansi.org.

• IEEE (pronounced eye-triple-eee). The Institute of Electrical and Electronics Engineers publishes standards for number

representation and for communication protocols such as Ethernet.

Visit www.ieee.org.

• ISO/IEC. The International Organization for Standardization develops standards in many technical fields, excepting

electrical engineering, which is covered by the affiliated International


Electrotechnical Commission. Subcommittees of these organi-

zations publish standards in many areas of computing informa-

tion technology, such as the JPEG and MPEG data compression

standards for images, sound, and video. Visit www.iso.ch.

• ISOC. The Internet Society publishes network protocol standards.
  Their sub-organization IETF (the Internet Engineering Task Force)
  has produced standards for HTTP, IP, TCP, PPP, ICMP, and many other
  Internet and network communication protocols. Visit www.isoc.org.

• The Unicode Consortium publishes the Unicode standard, which is a new expanded alternative to ASCII for representing

characters and symbols. Visit www.unicode.org.

• W3C. The World Wide Web Consortium publishes standards for data formats, languages, and protocols having to do with web

browsers and web servers. They are responsible for HTML and

its variations (used for creating web pages). See www.w3c.org.

• JPEG.

• MPEG.

1.2 What is Computer Science?

Although this book is organized around aspects of Internet technol-

ogy, much of the information contained here concerns fundamental

principles of Computer Science. This book is intended to provide a

comprehensive survey of the problems and methodologies of the sci-

ence, with examples and questions drawn from the technology. Most

of the questions of Computer Science are found at the conjunction


of two central activities: design and analysis of algorithmic pro-

cesses; and the design and analysis of representation schemes for

information.

1.2.1 Process: Protocols and Algorithms

An algorithm is a description of a set of steps necessary to solve

some given problem. A problem is described in terms of input and

output: for example the problem might be, given a list of n numbers

as input, to produce a list of the same numbers in sorted order, as

output. An algorithm would be a description of steps for sorting a

list of numbers – of course we need a better idea of what constitutes

a “step,” but all in good time. A protocol is a step-by-step process

similar to an algorithm, but with a different purpose. Protocols are

used for reliable communication and transfer of information among

computing devices. We use the term algorithmic process to refer

to algorithms and protocols collectively.

As it turns out, some algorithms are efficient, and some are not: for

example, a fast sorting algorithm implemented on a typical personal

computer could sort a million numbers in just a couple of seconds, while

a naive sorting algorithm (likely the one you would think of first)

might require a few hours for the same set of numbers. A major

research effort of computer science is the search for better algorithms

for given problems.
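The gap between a naive and a fast sorting algorithm can be seen by counting comparisons rather than measuring time. The Python sketch below uses insertion sort as the naive algorithm and merge sort as the fast one; both are stand-ins chosen for illustration, not necessarily the algorithms the later chapters use.

```python
def insertion_sort(numbers):
    """Naive sort: insert each item into the already-sorted prefix."""
    items = list(numbers)
    comparisons = 0
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if items[j] <= current:
                break
            items[j + 1] = items[j]  # shift the larger item one place right
            j -= 1
        items[j + 1] = current
    return items, comparisons


def merge_sort(numbers):
    """Faster sort: split in half, sort each half, then merge the halves."""
    items = list(numbers)
    if len(items) <= 1:
        return items, 0
    middle = len(items) // 2
    left, left_count = merge_sort(items[:middle])
    right, right_count = merge_sort(items[middle:])
    merged, comparisons = [], left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two
    merged.extend(right[j:])  # still has items left over
    return merged, comparisons


if __name__ == "__main__":
    worst_case = list(range(100, 0, -1))  # 100 numbers in reverse order
    print(insertion_sort(worst_case)[1])  # 4950, which is 100 * 99 / 2
    print(merge_sort(worst_case)[1])      # fewer than 700
```

Doubling the list roughly quadruples the work for the naive version but only a little more than doubles it for merge sort, which is why the difference becomes so large at a million numbers.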

The efficiency of any given algorithm depends partially on how
exactly its input is represented and organized. Indeed, the correctness
and efficiency of the whole enterprise may depend heavily on how the
algorithm itself is represented. The study of data organization and
representation includes such topics as bit codes, data structures, and
memory management hierarchies, depending on the scale of the data
items involved (small to large). The study of ways of representing
algorithms includes areas such as software engineering, programming
language design, and models of computation. At one end of this scale
we try to find better methods and practices for writing computer
programs (a program is just an algorithm represented in a particularly
useful way); at the other end we consider the power of various ways of
representing algorithmic computation in general.

• An example algorithm.

• A language for describing algorithms.

• A trace of the algorithm on a particular instance.

• Two ways to represent things

• How representation affects the algorithm.

1.2.2 Data: Codes, Formats, and Structures

Process and representation – algorithm and data, protocol and data

format, program and programming language – form the yin and yang

of computer science. On the one hand we study how to do it; on the

other we investigate how to say it. This book highlights the interplay

between these two fundamental modes of inquiry, as they arise in

several key areas of computer science.

1.2.3 Machines and Languages

How you say the algorithm can be as important as how you represent

the data. Machine is language.


1.3 What’s in the book?

Of course, progress in science both determines and is influenced by

development in technology. The Internet provides an especially rich

area for exploring the boundary between these two domains of knowl-

edge, and the purpose of this text is to explore several facets of that

intersection.

Section ?? sets out some of the major themes and concepts of

Computer Science that will be explored. Two major concerns: design

and evaluation of algorithmic processes, and design and evaluation of

languages and representation schemes. Example?

Chapter 4 presents media access protocols, by which hosts sharing

a communication line can negotiate ways to take turns using it. A

protocol is an algorithmic process: this chapter compares protocols with

respect to properties like correctness, efficiency, and reliability.

Chapter 3 describes the wide variety of bit codes that are in use

to assign meanings to patterns of bits. This is a representation prob-

lem concerning small-scale items like numbers, symbols, sounds, and

pictures. We desire to find representation schemes that are compact

and that are compatible with efficient algorithms.

Chapter ?? describes a powerful notational device for representing

protocols so that their properties can be analyzed. This representation

device, known as a State Machine or a Finite Automaton, is useful in a

wide variety of contexts and is a fundamental abstraction of computer

science. It can be used to investigate many kinds of questions of both

representation and process.

Chapter ?? describes data structures, which are ways to represent

data and information at a medium scale – about the level of phone


books, small company databases, and diagrams of small networks.

The chapter also presents algorithms (processes) for tasks that arise

continually in Internet processing. Methods of algorithm analysis are

also introduced.

A program is an algorithm written in a programming language

that can be understood by a computer. Chapter ?? describes varieties

of programming languages, and the basic blocks upon which they

are constructed. Programming languages are representation methods

for algorithmic processes, that can be compared in terms of their

functionality and practicality.

Chapter crypto describes algorithms for cryptography, which prompts

more fundamental questions about computational processes in gen-

eral: what can be computed? What can be computed efficiently?

Chapter privacy considers issues of privacy, including the use of

public and private databases on the Internet. The design of efficient

and trustworthy databases raises questions, about both representation

and process, that are explored in this chapter.

Finally, both technology and science inform our discussion of so-

cial issues relating to the rise of the Internet. And so forth. Many

more fundamental and intriguing questions will be raised in subse-

quent chapters. Enjoy!

1.4 Further Reading

Paragraph about further reading. Paragraph about online resources.

Chapter 2

The Physical Layer: Bits in Motion

All the information and data transmitted over the Internet – numbers,

text, photographs, video – must first be encoded into sequences of the

binary digits 0 and 1, called bits. For example, to send the message

Hi! in the widely-used ASCII code (pronounced ask-kee), you would

need to transmit this sequence of 24 bits (8 bits per character):

H i !

0100 1000 0110 1001 0010 0001
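To check a bit pattern like this one yourself, here is a minimal Python sketch. The function name to_ascii_bits is our own invention (it is not part of any standard library); the sketch relies only on Python's built-in ord and format functions, and it assumes every character in the message is plain ASCII.

```python
def to_ascii_bits(message):
    """Return the 8-bit ASCII pattern of each character, split into groups of 4."""
    groups = []
    for ch in message:
        bits = format(ord(ch), "08b")            # e.g. 'H' -> '01001000'
        groups.append(bits[:4] + " " + bits[4:])  # '0100 1000'
    return " ".join(groups)

print(to_ascii_bits("Hi!"))   # 0100 1000 0110 1001 0010 0001
```

Three characters at 8 bits each gives the 24 bits shown above.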

Chapter 3 describes ASCII and several other binary codes. This

chapter looks at the problem of how to move the bits from one host

to another. (Host is the generic term for a device connected to a

network: computer, printer, router, phone, etc.)

We consider three types of physical transmission media: copper
wire, glass fiber, and pure thin air (also known as wireless).
Section 2.1 explains how they work and compares some of their
properties. Section 2.2 looks at how the medium affects bit
transmission rates. Section 2.3 considers the related question of
how to store bits in memories.

Figure 2.1: Panel (a) shows CAT5 cable, containing four twisted pairs of wires; panel (b) shows a cable containing a bundle of optical fibers.

2.1 Physical Transmission Media

We start with a quick overview of the properties of copper, fiber optic,

and wireless transmission media.

Copper wire and optical fiber are called guided media because the signal follows the contours of the wire. Unguided is another word for wireless transmission technologies.

Copper wire transmission works by sending electrical signals

along a wire made of (wait for it ...) copper.

A common example is twisted pair wire, made of two thin wires

coated with plastic insulation: one wire carries the electrical signal,

and the other is twisted around it to reduce sensitivity to electromagnetic

interference, also known as “noise.” This is simply telephone wire,


used to connect land-line phones the world over. Figure 2.1 (a) shows

the four twisted pairs in a Category 5 (CAT5) cable.

LAN: Local Area Network.

CAT5 is the most common type of cable used in LANs, which

are the small networks found in homes and offices. The coax cable

that connects you to your broadband cable service is another type of

copper wire technology.

Fiber optic technology works by sending light signals (typically

in the infrared range) down a thin strand made of clear glass or plastic.

The strand is made of an inner core surrounded by a tube made of

a slightly different material, called the outer core. The two cores are

designed so that the light bounces off the boundary between them

and is transmitted by reflection along the inner core, like this:

As Figure 2.1 (b) illustrates, hundreds of fiber optic strands may

be bundled in a cable, together with a stiffening wire to prevent too

much bending (which will break the fibers).

MAN: Metropolitan Area Network. WAN: Wide Area Network.

Like copper wires, optical fibers are inexpensive to make. Because
optical fiber supports extremely fast transmission rates (about which
more below), it has replaced copper cable in most city-sized MANs
and continent-sized WANs.

Wireless transmission technologies work by sending electromagnetic
signals through the air. As Figure 2.2 illustrates, electromagnetic
waves have different properties depending on their frequencies.
Frequencies are measured in Hertz (Hz), or cycles per second.

Figure 2.2: The Electromagnetic Spectrum. (Frequency axis from 10^4 to 10^16 Hz, marked at 1MHz, 1GHz, 1THz, and 1PHz; bands labeled Radio, Microwave, Infrared, and Visible.)

See Table 2.3 for definitions of G, T, and related prefixes.

Electromagnetic waves at frequencies below 300 GHz are called

radio waves. Microwaves have frequencies at the high end of the radio

range, between 1 GHz and 300 GHz. Infrared waves (used in fiber
optics) have frequencies between 300 GHz and about 400 THz, much
higher than microwaves and just below the low end of the visible light spectrum.

Wireless networks employ waves of different frequencies depend-

ing on their suitability to the situation. For example, radio and low-

frequency microwaves can pass through solid objects like buildings,

while higher-frequency waves tend to bounce off (or be absorbed by)

solid objects. Consequently, wireless LANs (WLANs), which are usu-

ally found inside buildings, tend to use frequencies at the low end of

the microwave range. In the U.S., popular wireless technologies such

as WiFi and Bluetooth use frequencies near 2.4GHz.

On the other hand, higher-frequency signals travel farther per unit
of power and tend to yield faster transmission rates. The
tower-to-tower and tower-to-satellite transmitters found in MANs and
WANs typically use microwaves in the 4GHz to 18GHz range; in these
systems, each transmitter needs to have a clear line of sight to the
receiver at the other end, with no obstacles like hills or buildings
in the way. That’s why telecommunications towers are commonly found on
hilltops with transmitters/receivers placed high above the treeline.

                                         What happens in
Prefix  Symbol  Magnitude      Numeric   this many seconds
pico    p       a trillionth   10^-12    half-life of a bottom quark
nano    n       a billionth    10^-9     light travels 1 foot
micro   µ       a millionth    10^-6     strobe light flash
milli   m       a thousandth   10^-3     bullet travels 1 foot
centi   c       a hundredth    10^-2     3 movie frames
deci    d       a tenth        10^-1
deka    D       ten            10^1
hecto   h       hundred        10^2      Indy 500: 2 laps
kilo    k       thousand       10^3      16.7 minutes
mega    M       million        10^6      11.6 days
giga    G       billion        10^9      31.7 years
tera    T       trillion       10^12     317.7 centuries
peta    P       quadrillion    10^15     32 megayears

Figure 2.3: Number magnitudes and their notations.

Higher signal frequencies in the infrared and visible ranges don’t

work well for long-distance wireless transmission because the signal

can be scattered by atmospheric conditions such as clouds and raindrops.

PAN: Personal Area Network.

Infrared transmission is occasionally found in PANs, where the

hosts are a few inches or feet apart and have a direct line of sight.

Remote media controllers like TV clickers use wireless infrared.


Cost comparison. The raw materials needed to support these
transmission technologies (copper, glass, plastic) are fairly
inexpensive to acquire. And the transmitter/receiver devices that sit
at the ends of wired technologies are not too difficult to build. Most
of the costs associated with these technologies are due to
installation and operation.

As a general rule, unguided transmission technologies are cheaper

to install and operate than wired technologies. In LANs it is only

necessary to plug a transmitter/receiver into an incoming Internet cable,

and the network is good to go – no need to string wires throughout

the building. On the other hand, CAT5 cable is pretty much already

in place nearly everywhere in the civilized world, so there may be no

installation cost. Optical cable doesn’t bend much and is difficult
to snake through walls and around corners, so it is less often used in
wired LANs.

Long distance guided networks can be expensive to install because

of the cost of acquiring property rights for the cables. One elegant

solution to this problem was devised by the Southern Pacific Railroad,

which owns rights to narrow strips of property all over the western

U.S. In 1972 they began laying fiber optic cables along their tracks

and selling network capacity. That division of the company eventually

became SPRINT, now a telecommunications giant.

Wireless long distance networks have the advantage that trans-

mission towers only require property rights to small plots of land.

However, in most parts of the world it is also necessary to purchase a
license to broadcast in specific frequency bands. In the U.S., for

example, since 1994 the FCC has raised over $60 billion for the U.S.

Treasury by conducting auctions to sell spectrum licenses to telecom-

munications companies for use in MANs and WANs. Wireless LANs

are allowed to operate without licenses, using frequency bands near


2.4GHz (in North America) that are allocated for short-range trans-

missions.

Security. These types of media can also be compared in terms of

security, or how much protection they provide against eavesdropping.

Anyone who has seen a spy movie knows that it is easy to place a tap

on telephone wire, although taps can be detected if you know to look

for them. Fiber optic cables are much more secure, since attaching

a device to read the signals would require breaking and splicing the

fibers, which would be easy to detect.

WEP: Wired Equivalent Privacy. WPA: WiFi Protected Access.

Wireless transmission is very insecure compared to wired trans-

mission, since anyone could eavesdrop on a wireless broadcast and it

would be impossible to detect who is listening. For this reason, most

wireless networks in commercial use employ some type of encryption

– scrambling the signal – to prevent casual or accidental eavesdrop-

ping. For example, WiFi networks encrypt signals using one of two

methods, called WEP and WPA. WEP, the older standard, is easy

to break using software widely available on the Internet; WPA and

its newer version WPA2 are considered much more secure, but not

perfectly impermeable. The National Security Agency (NSA) has

made available strong (highly secure) encryption methods for use by

telecommunication companies in tower and satellite communications.

2.2 Bit Transmission Rates

The main question about a given transmission technology is: how fast

can it move the bits?

The question does not refer to latency, which is how quickly the
signal moves from point A to point B. All the major technologies
work by sending electrical and electromagnetic signals, which travel
at nearly the speed of light (impeded just a little by the physical
medium). Signal latency is about the same whether the signal is sent
over copper, glass, or air, and near-speed-of-light is about as good as
it gets on this planet.

Figure 2.4: Transmitting the bits 0100. Panel (a) illustrates frequency modulation in an analog signal (traces: data; signal = data plus carrier). Panel (b) shows Manchester Encoding using digital signals (traces: data; clock).

Instead, the question is about the bit rate, measured in bits per

second (bps), which corresponds to how fast bits can follow one

another in the transmission medium. The bit rate depends on several

factors that we consider next.

Analog and digital signals. Figure 2.4 presents two examples of how

bit values can be represented by changes in electromagnetic signals.

Panel (a) shows a sinuous analog wave form and panel (b) shows a

squared-off digital wave form. Either type of waveform can be used

in any type of physical media, but newer guided technologies tend to


be digital, and unguided technologies are commonly associated with

analog methods.

The amplitude of a signal is the vertical difference between the highest and lowest points in adjacent wave crests.

In analog transmission, binary values can be encoded using fre-

quency modulation (FM) or amplitude modulation (AM). We

concentrate on FM transmission since it is more common in network

communications. Recall that signal frequency refers to how quickly

one wave crest follows another (cycles per second). Frequency is in-

versely related to wavelength, which is the horizontal distance be-

tween successive wave crests. For example, 300MHz radio waves have

1 meter wavelengths, while 300THz infrared waves have wavelengths

near 1 micrometer.
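The inverse relationship can be written as wavelength = c / frequency, where c is the speed of light. The short Python sketch below reproduces the two examples just given. The function name wavelength_m is hypothetical, and we assume the vacuum value of c (signals in copper or glass travel somewhat slower).

```python
SPEED_OF_LIGHT = 299_792_458   # meters per second, in a vacuum

def wavelength_m(frequency_hz):
    """Wavelength in meters of a wave with the given frequency in Hz."""
    return SPEED_OF_LIGHT / frequency_hz

print(wavelength_m(300e6))    # 300MHz radio wave: about 1 meter
print(wavelength_m(300e12))   # 300THz infrared wave: about 1e-6 meter
```

Multiplying the frequency by a million divides the wavelength by a million, which is all "inversely related" means here.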

Panel (a) of Figure 2.4 shows how an FM transmitter combines
a binary data signal with an analog carrier signal to create the

frequency-modulated result. At the other end of the transmission, a

receiver tunes in to the carrier frequency and decodes the bit values

by measuring the differences between the carrier it expects and the

signal it receives.

Here is an animation showing how frequency modula-

tion is used to encode binary signals.

Our first definition of bandwidth.

The range of spectrum used in a particular transmission is called

a band, and bandwidth refers to the difference between maximum

and minimum frequencies in use. Bandwidth is measured in Hertz.

For example, if 0 and 1 are represented by 11kHz and 31kHz signals,

the bandwidth would be 20kHz.

Panel (b) of Figure 2.4 shows a common type of digital transmission method

called Manchester Encoding. This example uses two signals, which

would be carried by two separate wires in a guided medium. A con-

troller at one end of the wires sends a steady clock signal to pace and


synchronize the controller at the other end. The data wire is inspected

by the receiving controller only when the clock signal is high. During

that time, a transition from low to high on the data line indicates a 0

bit, and a transition from high to low indicates a 1 bit. (Some digital

transmission methods do not use separate clock lines.)
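As a rough illustration of the transition rule just described, the Python sketch below represents each bit as the pair of levels seen on the data line while the clock is high: (low, high) for a 0 and (high, low) for a 1. This is a simplification under our own assumptions: the clock line is left implicit, levels are written as 0 and 1, and the function names manchester_encode and manchester_decode are hypothetical.

```python
def manchester_encode(bits):
    """Turn a string of '0'/'1' characters into a list of signal levels."""
    levels = []
    for b in bits:
        if b == "0":
            levels += [0, 1]   # low-to-high transition indicates a 0 bit
        elif b == "1":
            levels += [1, 0]   # high-to-low transition indicates a 1 bit
        else:
            raise ValueError("bits must contain only '0' and '1'")
    return levels

def manchester_decode(levels):
    """Recover the bit string by reading one transition per pair of levels."""
    bits = ""
    for i in range(0, len(levels), 2):
        pair = tuple(levels[i:i + 2])
        if pair == (0, 1):
            bits += "0"
        elif pair == (1, 0):
            bits += "1"
        else:
            raise ValueError("no transition found: not a valid pair")
    return bits

signal = manchester_encode("0100")
print(signal)                      # [0, 1, 1, 0, 0, 1, 0, 1]
print(manchester_decode(signal))   # 0100
```

Because every bit produces a transition, a pair with no transition (two lows or two highs) can be flagged as an error by the receiver.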

Baud rates and bit rates. It is not necessary to limit the signal to

just two values representing 0 and 1. Instead, we could speed things

up by letting the signals represent pairs of bits. In FM transmission

this would correspond to using four different signal frequencies.

To see how it works, consider again our sound wave analogy: in this

scenario, different frequencies correspond to different musical notes

(do, re, mi, fa ...). To transmit 01001101 using just two frequencies

(do, re), you would have to sing eight notes:

do re do do re re do re.

But with four frequencies (do, re, mi, fa) to represent (00, 01, 10,

11), you could transmit 01001101 using only four notes:

re do fa re,

thus sending the same bit sequence in half the time.

Memorize this fundamental relationship: if a = 2^b, then b = log2 a.

In general, a transmitter needs 2^k different signals to encode k bits
per signal: 4 different signals for blocks of 2 bits, 8 different signals

for blocks of 3 bits, and so forth. We use the term word size for the

number of bits encoded per signal, and signal variety to refer to the

number of different signals needed to support the word size.
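The singing example can be sketched in a few lines of Python. The function names word_size and encode are our own, and we assume the notes are listed in order so that the first note stands for the all-zeros word (do = 00, re = 01, and so on).

```python
import math

def word_size(signal_variety):
    """Bits per signal: b = log2(a) when a = 2**b."""
    if signal_variety < 2:
        raise ValueError("need at least 2 different signals")
    k = int(math.log2(signal_variety))
    if 2 ** k != signal_variety:
        raise ValueError("signal variety must be a power of 2")
    return k

def encode(bits, notes):
    """Split the bit string into words and map each word to one note."""
    k = word_size(len(notes))
    if len(bits) % k != 0:
        raise ValueError("bit string length must be a multiple of the word size")
    words = [bits[i:i + k] for i in range(0, len(bits), k)]
    return [notes[int(w, 2)] for w in words]

print(encode("01001101", ["do", "re"]))
# ['do', 're', 'do', 'do', 're', 're', 'do', 're']
print(encode("01001101", ["do", "re", "mi", "fa"]))
# ['re', 'do', 'fa', 're']
```

With a signal variety of 4 the same eight bits need four notes instead of eight; a variety of 8 would need sixteen notes' worth of bits to show the same saving, since the bit string must divide evenly into 3-bit words.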

Wireless transmitters can achieve faster transmission rates by in-

creasing the word size this way, but only up to a limit. Adding one


more bit to the word size means doubling the signal variety. Greater

signal variety makes the signal harder to decode, so the transmission

must either slow down (and then what’s the point) or the error rate

will increase. Higher error rates mean missed signals have to be re-

sent, which slows down the overall bit rate. The problem is to find

the “sweet spot” that gives the best balance between word size and

reliability.

Word size is also limited by the available bandwidth. Greater sig-

nal variety requires use of a wider frequency band, since the different

frequencies have to be spaced reasonably far apart in the spectrum.

In practice, physical and regulatory limits are placed on how much

bandwidth a single transmitter can use.

Wired media can enjoy faster bit rates by increasing the signal

variety, or simply by adding more wires to the cable. Copper-based

media typically use the latter method, with between 1 and 32 data

lines bundled together in a cable. Optical fiber networks may exploit

both approaches, since it is possible to transmit dozens of different

signals reliably in one fiber.

The baud rate is the maximum frequency at which the data sig-

nal can change, measured in cycles per second (Hz). That is, in analog

transmission, baud rate corresponds to how quickly one signal can
follow another, independent of how many bits each signal represents.

Note that baud rate cannot be faster than the maximum signal fre-

quency within the band being used. For example if an FM transmitter

uses 11kHz and 31kHz frequencies, the baud rate cannot be higher

than 31kHz. In digital transmission with a clock the baud rate is the

same as clock frequency.

The (maximum) bit rate for a given network technology is calcu-

lated by multiplying word size (bits per signal) by baud rate (signals


per second):

max bit rate = word size × baud rate.

Bit rates are measured in bits per second (bps). Thus a cable

containing 8 data wires and one clock with cycle frequency 10MHz

would have a baud rate of 10MHz and a bit rate of 80Mbps. An FM

transmitter operating at 2GHz (max) with a word size of 2 (signal

variety = 4) would have a baud rate of 2GHz and a top bit rate of

4Gbps.
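The two worked examples can be checked with a one-line function. This is only a sketch: the name max_bit_rate is hypothetical, and we follow the text in treating the 8 parallel data wires as a word size of 8 bits per clock cycle.

```python
def max_bit_rate(word_size_bits, baud_rate_hz):
    """max bit rate = word size x baud rate, in bits per second."""
    return word_size_bits * baud_rate_hz

# Cable with 8 data wires and a 10MHz clock.
print(max_bit_rate(8, 10_000_000))      # 80000000, i.e. 80Mbps

# FM transmitter at 2GHz (max) with a word size of 2.
print(max_bit_rate(2, 2_000_000_000))   # 4000000000, i.e. 4Gbps
```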

Our second definition of bandwidth.

In networking terminology, bit rate is also known as bandwidth.

Note this definition of the term is completely different from the first:

In FM transmission, bandwidth is a difference measured in cycles per

second; here it is a product measured in bits per second. Confus-

ing, yes: it helps to be aware of this distinction when reading about

network technologies.

Channel hopping. One drawback to the simple FM transmission

scheme described above is the need to assign a specific frequency band

to each transmitter. This works fine for stationary transmitters (like

towers) and corporations that are willing to pay for licenses to trans-

mit on specific bandwidths, but not so much for small wireless LANs

in, say, Internet cafes. If two laptops use the same transmission
frequencies in the same general area, their signals would interfere

with one another, making reception impossible.

For example, suppose a (laptop style) transmitter is capable of

broadcasting at any frequency in a given range, corresponding to any

note on a piano keyboard (excluding black keys for simplicity). Notes

are labelled by octave, with C4 at middle C, like this:


...D2 E2 F2 G2 A2 B2 C3 D3 E3 F3 G3 A3 B3 C4 D4 E4 F4 G4 A4 B4 C5 D5 E5 F5 G5 A5 B5 C6 D6 E6 F6 G6 A6...

For those who don’t read music: the clef spaces correspond to the notes as shown (D4, F4, etc.); the clef lines correspond to the notes between the spaces.

The simple approach would be to assign each transmitter a band

of adjacent notes, called a channel. For example, assuming one-

bit words, transmitter T1 could be assigned to channel (C4, D4) for

(0,1), and T2 could be assigned (D5, E5) for (0,1). Receivers R1 and

R2 could tune in to the separate channels, and the transmitters can

broadcast simultaneously with no problems:

[Figure: musical staff with spaces labeled D4, F4, A4, C5, E5, showing the notes broadcast by T1 and T2 on their separate channels.]

But if a new transmitter T3 that is also assigned channel (D5, E5)
enters the region, chaos results. The T2 and T3 signals jam each other

and become impossible to decode.

[Figure: the same staff, with spaces labeled D4, F4, A4, C5, E5, showing T1 on its own channel while T2 and T3 broadcast on the same channel.]

Spread spectrum is also known as wideband. Channel hopping is also called frequency hopping.

Wireless LANs solve this problem by adopting some type of spread

spectrum approach such as channel hopping.

By this method, every transmitter broadcasts on all channels (i.e.

over the entire keyboard), by hopping from channel to channel in a


pre-set “random” pattern. For example suppose each channel has a

pair of adjacent frequencies (lower=0, higher=1), and the T1 and T2

transmitters hop from channel to channel at time steps 1, 2, 3 . . .,

according to different patterns, like this:

Channel (0 1)   Transmitters hopping through it during time steps 1 2 3 4 5 ...
(F5 G5)         T2
(D5 E5)
(B4 C5)         T2
(G4 A4)
(E4 F4)         T1 T2
(C4 D4)
(A3 B3)         T2
(F3 G3)         T1
(D3 E3)         T1
(B2 C3)         T1
(G2 A2)
(E2 F2)         T1 T2

Now transmitters T1 and T2 would broadcast something like this:

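[Figure: musical staff with spaces labeled F2, A2, C3, E3, G3, B3, D4, F4, A4, C5, E5, G5, showing the notes broadcast by T1 and T2 as they hop from channel to channel.]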

Receivers R1 and R2 must also hop channels in synchrony with

T1 and T2 in order to decode their respective transmissions. (Try

it: Can you decode the bits sent by each transmitter?) It is possible

for two transmitters to hit the same channel at the same time, but

interference of this sort is rare and easily fixed.
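The idea of hopping in synchrony can be sketched in Python. Everything named here is an assumption made for illustration: the channel list pairs up adjacent white keys from the keyboard shown earlier, a seeded random number generator stands in for the pre-set "random" pattern, and the functions hop_pattern, transmit, and receive are hypothetical. Real wireless hardware follows hopping sequences defined by its own standard.

```python
import random

# Each channel is a pair of adjacent notes: (lower = 0, higher = 1).
CHANNELS = [("E2", "F2"), ("G2", "A2"), ("B2", "C3"), ("D3", "E3"),
            ("F3", "G3"), ("A3", "B3"), ("C4", "D4"), ("E4", "F4"),
            ("G4", "A4"), ("B4", "C5"), ("D5", "E5"), ("F5", "G5")]

def hop_pattern(seed, steps):
    """Channel index to use at each time step; same seed, same pattern."""
    rng = random.Random(seed)
    return [rng.randrange(len(CHANNELS)) for _ in range(steps)]

def transmit(bits, seed):
    """Send one bit per time step as a note on that step's channel."""
    pattern = hop_pattern(seed, len(bits))
    return [CHANNELS[ch][int(b)] for ch, b in zip(pattern, bits)]

def receive(notes, seed):
    """Decode notes by listening on the same channel at each time step."""
    pattern = hop_pattern(seed, len(notes))
    bits = ""
    for ch, note in zip(pattern, notes):
        low, high = CHANNELS[ch]
        if note == low:
            bits += "0"
        elif note == high:
            bits += "1"
        else:
            bits += "?"   # note is not on the channel we are tuned to
    return bits

notes = transmit("01001", seed=7)
print(notes)
print(receive(notes, seed=7))   # 01001
```

A receiver that uses a different seed is usually tuned to the wrong channel at each step and decodes mostly question marks, which is the sketch's version of an eavesdropper who doesn't know the hopping sequence.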


Channel hopping offers several advantages over the simpler fixed-

channel method for popular WLAN networks like WiFi and Bluetooth.
First, it is not necessary to deal with frequencies jamming
as hosts join and leave the network. Second, it is impossible to

eavesdrop without knowing the key to the particular hopping sequence

upon which a transmitter/receiver pair is synchronized. Finally, the

problem of interference from outside sources (which usually occurs at

fixed narrow frequencies) is greatly reduced, so signals can be trans-

mitted using much less power.

Interesting historical note: the original patent holders on this

channel-hopping technique were Hedy Lamarr – glamorous Hollywood

actress and part-time engineer – and George Antheil, a composer of

film scores (among other things). Their 1942 patent on a secret com-

munication system described the use of a paper roll from a player

piano to create radio transmitters and receivers that thwarted eaves-

droppers by hopping from channel to channel in synchrony.

Putting it all together. Bit rates can vary widely depending on sev-

eral factors. First, every type of signal attenuates, or loses power,

when transmitted through a medium. The range of a medium is

the maximum distance a signal can travel without needing some kind

of booster or repeater. Range can be increased by adding power to

the transmitter, but this requires more energy and may not be cost-

effective. Second, bit rate depends on the reliability of a medium,

a measure of how (in)vulnerable it is to interference from outside

sources. High attenuation and low reliability mean bits can be

“dropped” during transmission, which leads to slower bit rates be-

cause they have to be re-sent.

We can understand how these factors interrelate by appealing once


again to our sound transmission analogy. Your voice has a certain

range depending on whether you are speaking in a quiet room (low

interference) or on a city street corner (high interference). Your range

can be increased by shouting (adding power). But even within a given

range limit, and depending on the amount of ambient noise in the

area, you have to speak more carefully and slowly (or repeat yourself)

when your listener is several yards away than when you are speaking

face-to-face.

If your message has to be sent farther than your voice range car-

ries, you’ll have to line up some friends (signal boosters) to retransmit

your message. For example, the People’s Mic became popular in the

Occupy Wall Street protests in places where use of electric micro-

phones was banned: by this method, the speaker pauses after every

few words and the nearby crowd echoes the same phrase so it can be

heard farther away; multiple echoes are required if the crowd is very

large. This is perfectly effective for increasing range, but having to

echo phrases reduces the overall rate at which messages can be sent.

The engineers who design physical transmission technologies for

the Internet must balance bit rate against many factors including

attenuation, interference, range, power, and cost. We can’t assign a

specific number to, say, the bit rate of a CAT5 cable, but we can

identify some common points in the space of modern design options.

Ethernet is described in Section 4.3.1.

• Copper cables have range around 100 meters (328 feet) when used to support 10Gbps in a popular type of Ethernet LAN.

Fiber optic cables can support the same bit rate at ranges around

40 miles. Copper wire has range as high as 10 kilometers (6.2

miles) when used in voice telephone lines, which require much

lower bit rates.


• Microwave transmission towers are typically spaced 25 to 50 miles apart. Earth-to-satellite transmissions span distances of

1,200 to 22,000 miles. The smaller number corresponds to Low

Earth Orbit satellites common in phone networks; the higher

number to geosynchronous satellites used in network communi-

cations. The round-trip latency of an earth-to-satellite trans-

mission is typically around 40 milliseconds for the former and

about half a second (500 milliseconds) for the latter.

• The WiFi wireless LAN standard specifies a range of 70 meters indoors and 100 meters outdoors; this network operates at a bit

rate of 150Mbps, about 100 times slower than comparable wired

LANs.

• Bit error rate, a measure of reliability, refers to the proportion of incorrect bits received in a given transmission over a given

time span. When used in Ethernet LANs, twisted pair cables

have error rates in the range of one to ten bits per million bits

transferred; error rates in WiFi LANs are comparable. Long

distance microwave transmitters have error rates around one

per billion. Optical cable is nearly impervious to noise since the

black plastic cover protects the fibers from light interference,

and bit error rates are negligible.
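To get a feel for what these error rates mean in practice, the sketch below estimates how many bad bits to expect when sending a file. The function name expected_bit_errors is hypothetical, the 1-megabyte file size is an arbitrary choice of ours, and we assume errors strike bits independently at the stated rate.

```python
def expected_bit_errors(num_bits, bit_error_rate):
    """Average number of incorrect bits in a transmission of num_bits bits."""
    return num_bits * bit_error_rate

file_bits = 8 * 1_000_000   # a 1-megabyte file, taking 1 megabyte = 10**6 bytes

print(expected_bit_errors(file_bits, 1e-6))   # twisted pair, one per million
print(expected_bit_errors(file_bits, 1e-5))   # twisted pair, ten per million
print(expected_bit_errors(file_bits, 1e-9))   # microwave, one per billion
```

At one error per million bits the file arrives with about 8 bad bits on average; at one per billion, most copies of the file arrive with none at all.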

The attentive reader will have noticed that optical fiber technology

soundly beats the competition in most categories: bit rate, security,

range, and reliability. This is why fiber optic cables have almost

completely replaced copper cables in long-distance communications

networks, and are gradually taking over LANs as well.

Indeed, optical controllers can achieve signal switching rates as

high as 50 Tbps at ranges up to 50 miles. This is about 500 times


faster than any modern computer could even produce bits to be trans-

mitted: fiber optic technology is waiting for computer technology to

catch up. It is not unusual for several network hosts to share the

use of a single fiber in a communications network by taking turns

using it, which reduces the awful gap between high capacity and low

utilization.

Bit rates are so high partly because infrared light has much higher

frequency than other types of electromagnetic waves, and partly be-

cause the signal attenuates very little over long distances, and is in-

vulnerable to interference. Fiber cable is also cheap to build and fairly

inexpensive to operate. The drawbacks are the expense of building

controllers (which gets smaller every year) and the inconvenience of

installing (unbendable) cables indoors.

Wireless networks tend to have slower bit rates than their wired

counterparts, but are relatively inexpensive to build and operate.

And of course wired media cannot compete with the convenience –

and sometimes necessity – of not having to be tethered to the rest of

the network.

The next time you download a file from the Internet, the bits in

the file will probably cross several types of physical media on their

way to you: perhaps by twisted pair cable from the file server to the

building exterior; then by fiber optic across town; then via microwave

satellite to cross an ocean; then more fiber optic to your home; then by

wireless transmitter to your computer screen. The overall bit trans-

mission rate that you experience will be determined by the slowest

transmission medium the bits encounter along the way. In most cases

the slowdown occurs at the first and final legs of the transmission.

One modern challenge for telecommunications companies is to find a

cost-effective way to speed up that so-called “last mile” of service to


the customer, especially in rural areas.
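The "slowest medium wins" rule is just a minimum over the legs of the journey. In the sketch below the function name bottleneck is hypothetical and, more importantly, the bit rates are made-up numbers chosen only for illustration; they are not measurements of any real route.

```python
def bottleneck(legs):
    """Return (name, rate) of the slowest leg; legs maps name -> bits per second."""
    name = min(legs, key=legs.get)
    return name, legs[name]

# Hypothetical rates for each leg of the download described above.
legs = {
    "twisted pair (file server)": 1_000_000_000,
    "fiber optic (across town)": 10_000_000_000,
    "microwave satellite": 500_000_000,
    "fiber optic (to home)": 1_000_000_000,
    "wireless (in home)": 150_000_000,
}

print(bottleneck(legs))   # ('wireless (in home)', 150000000)
```

Speeding up any leg other than the slowest one leaves the overall rate unchanged, which is why so much attention goes to the last mile.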

2.3 Storage Media

We finish this chapter with a quick survey of storage media, which

are used for representing bits inside computers and in carry-along

devices. Here are some options. Very roughly, the technologies in the

list below are ordered by access speed (time to read bits) from fastest

to slowest.

• A computer RAM (random access memory) is the kind of memory the computer uses when running a program. RAM memory

is contained in small black chips (see Figure ??). Inside a RAM,

a 0-bit may be represented by 0 volts of electricity, and a 1-bit

by 5 volts of electricity, stored in a tiny circuit or capacitor.

This is known as a volatile storage medium because the bits

are lost when the power goes off.

• A ROM (read-only memory) is non-volatile, which means bits are preserved when the power goes off. “Read-only” signifies

that the data can’t be over-written. ROMs are used for storing

files and programs when they are not being used; they contain

bit values that were fixed at the factory by setting tiny fuses and

antifuses to create electrical conductors and non-conductors in

the material.

• An EPROM is an erasable and programmable read-only memory (note that “programmable” and “read-only” are opposites,
creating an oxymoron). In these types of memories the original
factory-set values can be erased by exposure to strong ultraviolet
light and then re-set with high-voltage programming devices; but
they eventually wear out and can’t be reprogrammed. An EEPROM is
a variant that is erased electrically rather than with light, and
a flash memory is a newer type of EEPROM that supports
high speed reading and writing.

• A computer hard drive is another type of non-volatile storage device. It stores binary values by changing the magnetic

orientations of tiny particles on the surface of a spinning disc.

The bit values are arranged in concentric rings called tracks.

A read-write head hovers over a track on the spinning disk, ei-

ther changing the magnetic orientations (writing to the disk),

or detecting the orientations (reading from it).

• CD-ROMs, audio CDs, and DVDs use optical methods to store bits arranged on a disc. In this technology the 0’s and 1’s are

represented by microscopically-small patterns of smooth and

rough areas on the surface of the disc. The smooth areas are

called lands and the rough spots are called pits. The pits and

lands are arranged in tracks; a laser beam is aimed at a track

as the CD spins, and an optical sensor detects how the beam

reflects off the pits and lands.

• Another common type of magnetic storage medium is found on the black strips on the backs of credit cards and other types

of swiping media. The black material contains microscopically

small magnets that can be oriented to represent binary values.

2.4 Resources

• Greg Sanger, “How Fiber Optics Works,” The Industrial Physicist,
Feb/Mar 2002, pp. 18-21. www.aip.org/tip/INPHFA/vol-8/iss-2/p18.pdf


• Wikipedia articles TBA

• Demos of AM vs FM TBA

2.5 Questions

Projects. Find a friend, and get your hands on some type of audio

transmission device: drum, trumpet, electric keyboard, etc. Design

a transmission technology around your device by filling out the ques-

tionnaire in Figure X.

Find a friend and get your hands on some type of long distance

visual transmission medium: signal flags, flashlights, stadium cards,

etc. Design a transmission technology around your device by filling

out the questionnaire in Figure X.

Find a friend and get your hands on a slinky. Develop an analog

transmission technology using the slinky.

Internet Surf: Here is another great idea for transmitting a long

message across a continent: burn everything into CD-ROM disks,

pack the disks into boxes, and ship everything by overnight delivery

service (such as Federal Express) to the destination host. Do some

online research and answer the following questions about this trans-

mission method. Justify your answers with a clear description of your

assumptions about how to apply each term to this particular technol-

ogy.

• What is the maximum range of an overnight shipment from your area?

• What is the transmission delay of this medium, at maximum range?

• How much total time is required to ship 10 Megabytes of data from one host site to another? How much time for 10 Gigabytes?

• How much would it cost to send 10 Megabytes of data a distance of 2000 miles? How much for 10 Gigabytes?

• How does this medium compare, in terms of transmission delay, total transmission time, and installation/transmission costs, to the other five transmission media discussed in this chapter?

• How does this medium compare to the others in terms of reliability and security?

• Can you think of a situation where this type of bulk-shipment transmission might be preferred over the other five?

Here are some other properties of physical media we may con-

sider. Answer the questions below; justify your answers with a clear

description of your assumptions about how to apply each term to this

medium.

• Was this medium synchronous or asynchronous (or perhaps both)?

• Was this a half-duplex or a full-duplex medium?

• What was the direct host-to-host range of this medium?

• What was the typical transmission delay?

• What were typical baud rates and the bandwidths?

• How did reliability and security compare to other transmission technologies in use at the time?

• How much total transmission time was required to send a message containing 50 letters?

Evaluate one of the following antique transmission technologies in

terms of our criteria: range, delay, cost, reliability, security, baud rate,

and bandwidth. You may need to do some online research to learn

more about how it works. What factors led to its demise?

• The telegraph system flourished in the United States in the middle 1800’s. Telegraph transmissions used Morse Code, which can

be thought of as a ternary (as opposed to binary) system using

the three symbols “dot” and “dash” and “space” to encode infor-

mation (you can’t decode the message without knowing where

the spaces are).

• Pony Express

• Ship-to-ship signaling with hand-held flags

• Ship-to-ship signaling with signal lamps.

(See Openspectrum@wikipedia to learn more about the Free Spectrum movement.)

Essays. Technology experts have pointed out that it is possible to

apply the spread-spectrum idea to long-distance wireless WAN broad-

casting as well. This would make broadcast licenses unnecessary and

eliminate a huge source of income for the U.S. government. The Open

Spectrum movement is aimed at expanding the range of frequencies

that can be used license-free.

The FCC regulates content (no dirty words) on TV broadcasts. Do

they have the right to regulate content on the Internet as well?

Homework. Do the math.

http://www.wikipedia.org/wiki/Open_spectrum


1. Which is faster (assuming maximum signal switching rates for

each medium), a radio transmitter that can encode 8 bits per

signal, or a twisted pair transmitter that can encode only 2 bits

per signal? Why do you say so?

2. Suppose you need to transmit a collection of 50 files, each con-

taining 10 Mbytes of data. What kind of bandwidth is required

to be able to transmit this data in less than one second? Less

than 30 minutes? Which of the transmission technologies pre-

sented in this chapter are able to achieve those bandwidths?

3. Suppose you could stuff 100 Gigabytes of data into a pneumatic

tube carrier (commonly used in department stores at the turn

of the last century – look it up on wikipedia). What would the

bandwidth be, assuming that carriers can follow one another at

a rate of one every two seconds?

4. A typical computer file containing a digital photograph may hold

a grid of 1200 x 1024 pixels (short for “picture elements,” which

look like tiny dots in the image). Each pixel is represented by a

24-bit value indicating its color and intensity. What is the total

size, in bits, of such a file? How long would it take to transmit

this file using the five transmission technologies discussed in this

chapter?

5. A typical digital audio file fresh from the recording studio re-

quires over a million bits per second to represent sound with

good fidelity.

• How many bits would your favorite 3-minute pop tune require?

• How long would it take to send the digital representation of that song over a wireless radio transmitter?

• The audio file could be compressed to about one-twelfth its original size using MP3 compression: how much transmission time would be required after compression?

• Is this more or less time than would be required for a commercial radio station to play the song?

• What accounts for the time differences between digital and audio types of radio transmissions?

Chapter 3

Binary Codes

As pointed out in Chapter 2, all information transmitted on the Inter-

net – whether numbers, email messages, web pages, photographs, or

tunes – must first be encoded as sequences of the binary values 0 and

1. A binary code is a convention that assigns meanings to patterns

of bits.

This chapter surveys some of the more common binary codes. Section 3.1 considers codes for representing numbers. Section 3.2 surveys

codes for representing letters and symbols, and Section 3.3 looks at

codes for representing images and sounds.

The last two sections survey codes that are especially useful in

network communications: Section ?? describes data compression codes

and Section ?? looks at codes for detecting and correcting transmis-

sion errors.


3.1 Numbers

Numbers are represented in computers using the binary (base two)

number system instead of the decimal (base ten) system. To under-

stand how it works, recall some key facts about the base ten number

system that you learned in elementary school:

1. It uses ten digits, 0 through 9.

2. The position of a digit within a number tells you which power

of ten it represents.

3. The value of the digit is a multiplier for that power of ten.

For example, 5,082 represents:

    2 ones         2 × 10^0  plus
    8 tens         8 × 10^1  plus
    0 hundreds     0 × 10^2  plus
    5 thousands    5 × 10^3.

The principle is the same with binary numbers, except that only

the digits 0 and 1 are used, and the position indicates a power of two

rather than a power of ten. For example, 13 (thirteen) in base ten is

1101 in base two because those digits represent:

    1 one      1 × 2^0  plus
    0 twos     0 × 2^1  plus
    1 four     1 × 2^2  plus
    1 eight    1 × 2^3.

(Figure 3.3 has a table of powers of 2.)

Notice that 13 = 8 + 4 + 1. Not a coincidence.
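The positional rule is the same in both bases, so one short routine can print the breakdown for either example. Here is a minimal sketch in Python; the function names are made up for illustration and are not part of the book's notation.

```python
def positional_terms(numeral: str, base: int) -> list[tuple[int, int, int]]:
    """Return (digit, position, digit * base**position) for each digit, rightmost first."""
    terms = []
    for position, char in enumerate(reversed(numeral)):
        digit = int(char, base)  # read one character as a digit in the given base
        terms.append((digit, position, digit * base ** position))
    return terms


def positional_value(numeral: str, base: int) -> int:
    """Add up the contribution of every digit."""
    return sum(value for _, _, value in positional_terms(numeral, base))


if __name__ == "__main__":
    for digit, position, value in positional_terms("1101", 2):
        print(f"{digit} x 2^{position} = {value}")
    print("total:", positional_value("1101", 2))
```

Calling positional_value("5082", 10) adds up the same four terms listed for the base ten example.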

By convention, binary numbers are not written with interrupting commas, but rather with interrupting spaces between blocks of four digits. So the binary value 1110101 is written as 111 0101, not 1,110,101. This number is pronounced “one-one-one oh-one-oh-one.”

    Algorithm: Binary to Decimal(B)

    B: Binary number with digits b_{n−1} b_{n−2} . . . b_1 b_0
    D: Decimal number with digits d_{m−1} d_{m−2} . . . d_1 d_0

    D ← 0                          (1)
    for i ← 0 to n − 1             (2)
        do if b_i = 1              (3)
              then D ← D + 2^i     (4)
    output (D)                     (5)

Figure 3.1: An algorithm for converting a number from base two to base ten.

Converting among bases. Figure 3.1 shows an algorithm for con-

verting from base two to base ten. An algorithm is a step-by-step

procedure for accomplishing a given task.

Take a look at the algorithmic notation we shall use throughout the book. The name of the algorithm is on the first line; the (B) in parentheses signifies that B is the input to the algorithm. The top section introduces the notation for the input and output variables in the algorithm. A variable is like the x in an algebra problem: it is a place-holder that can be set to a specific value when you work out the problem. In this case our variables are the binary number B, the decimal number D, and their individual digits. B contains n digits identified by b_0 through b_{n−1} (reading right to left) and D contains m digits d_0 through d_{m−1}.

The (numbered) steps of the algorithm are interpreted like this:

(1) Start by setting D equal to 0. The ← is pronounced “gets,” as in D gets 0.

(2) The for loop notation specifies a repeated process: for each value of i counting from 0 to n − 1, do steps (3) and (4).

(3)(4) The if-then statement specifies an alternative: if b_i equals 1, then D gets D plus 2^i; if b_i does not equal 1, skip step (4).

(5) The output reports the value calculated for D. Note that

this does not happen until after the for loop has run through all

its values for i.

Algorithms describe processes: when you carry out the process

using a specific input value, we say you are “running the algorithm.”

Let’s try running the algorithm for B = 1101 (so n = 4), keeping

track of D as we go. Notice that steps (2, 3, 4) are repeated once for

each value of i, because of the for loop.

(1) D = 0

(2, 3, 4) i = 0, b_0 = 1, so D = 0 + 1 = 1

(2, 3, 4) i = 1, b_1 = 0, so D remains 1

(2, 3, 4) i = 2, b_2 = 1, so D = 1 + 4 = 5

(2, 3, 4) i = 3, b_3 = 1, so D = 5 + 8 = 13

(5) The answer is D = 13.


We have converted the binary number 1101 to its decimal equiva-

lent 13. Try running the algorithm on other numbers to make sure you

understand how it works. demos/demo-bin2dec/DemoApplet.html
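If you would rather let a computer do the running, the algorithm in Figure 3.1 can be sketched in Python as follows. This is one possible translation, not the book's own code: it assumes the binary number arrives as a string of 0s and 1s, and the name binary_to_decimal is chosen just for this example. The comments give the matching step numbers.

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a string of 0s and 1s to an integer, following Figure 3.1."""
    digits = bits.replace(" ", "")   # allow spaced blocks such as "111 0101"
    n = len(digits)
    d = 0                            # step (1): D gets 0
    for i in range(n):               # step (2): i counts from 0 to n - 1
        b_i = digits[n - 1 - i]      # b_0 is the rightmost digit
        if b_i == "1":               # step (3)
            d = d + 2 ** i           # step (4): D gets D plus 2^i
    return d                         # step (5): report D


if __name__ == "__main__":
    print(binary_to_decimal("1101"))      # the worked example: 13
    print(binary_to_decimal("111 0101"))
```

Python's built-in int("1101", 2) does the same conversion; the loop is spelled out here only so that each line can be matched to a numbered step.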

The reverse algorithm for converting a number from decimal to

binary is shown in Figure 3.2. The basic idea is to divide D in half

repeatedly, assigning a value to a digit of B depending on whether

the result is even or odd. Here’s how to read this algorithm:

(1) The algorithm uses an extra variable x. First, x gets the

value of input D.

(2) The for loop runs through every value of i, counting from 0 to n − 1, and assigning values to the digits b_i one by one.

(3) The if-then-else statement specifies two alternatives: if x is even, then b_i gets 0; otherwise (if x is odd) b_i gets 1.

(4) After b_i is assigned a value, divide x in half. The notation ⌊x/2⌋ means to round x/2 down to the next integer.

(5) The output line reports the result B.

Let’s try running the algorithm with D = 13 as input.

(1) x = 13

(3) i = 0, x is odd, so b_0 = 1

(4) x = ⌊13/2⌋ = 6

(3) i = 1, x is even, so b_1 = 0

(4) x = ⌊6/2⌋ = 3

(3) i = 2, x is odd, so b_2 = 1

(4) x = ⌊3/2⌋ = 1

(3) i = 3, x is odd, so b_3 = 1

(4) x = ⌊1/2⌋ = 0

(5) The answer is B = 1101

    Algorithm: Decimal to Binary(D)

    D: Decimal number with digits d_{m−1} d_{m−2} . . . d_1 d_0
    B: Binary number with digits b_{n−1} b_{n−2} . . . b_1 b_0

    x ← D                          (1)
    for i ← 0 to n − 1             (2)
        do if x is even            (3)
              then b_i ← 0
              else b_i ← 1
           x ← ⌊x/2⌋               (4)
    output (B)                     (5)

Figure 3.2: An algorithm for converting from Decimal to Binary.

Try running the algorithm on other decimal numbers. demos/demo-dec2bin/DemoApplet.html
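The reverse conversion in Figure 3.2 can be sketched the same way. The version below is an illustration only: it assumes the caller supplies the number of binary digits n, and the name decimal_to_binary is invented for this example.

```python
def decimal_to_binary(d: int, n: int) -> str:
    """Convert a nonnegative integer to an n-digit binary string, following Figure 3.2."""
    x = d                            # step (1): x gets the input D
    b = ["0"] * n                    # b[i] will hold digit b_i
    for i in range(n):               # step (2): i counts from 0 to n - 1
        if x % 2 == 0:               # step (3): is x even?
            b[i] = "0"
        else:
            b[i] = "1"
        x = x // 2                   # step (4): halve x, rounding down
    return "".join(reversed(b))      # step (5): write the digits b_{n-1} ... b_0


if __name__ == "__main__":
    print(decimal_to_binary(13, 4))    # the worked example: 1101
    print(decimal_to_binary(13, 12))   # the same value padded out to 12 digits
```

If n is too small for the input, the loop stops before x reaches 0 and the leftmost digits are never produced.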

Word size and overflow. The word size is the number of bits

allowed in a given binary code. The word size determines how many

different patterns are possible. For example, a 4-bit word allows 16

different patterns:

    0000  0001  0010  0011
    0100  0101  0110  0111
    1000  1001  1010  1011
    1100  1101  1110  1111


The general rule is this:

A b-bit word size permits 2^b different patterns. To represent x different patterns, you need a word size of at least ⌈lg x⌉ bits.

(Where have you seen this notation before?)

The notation lg denotes the base-2 logarithm, and ⌈x⌉ means x rounded up to the next integer. Thus we would need a 5-bit word to encode all 26 letters of the alphabet, because 5 = ⌈lg 26⌉. Since a 5-bit word encodes 32 bit patterns, there would be 6 unused patterns left over. Figure 3.3 shows a handy table of powers of 2.
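Both halves of the rule are easy to check by machine. In this small Python sketch (the function names are illustrative), the word size is computed with exact integer arithmetic instead of a floating-point logarithm, so the rounding up can never go wrong.

```python
def patterns(b: int) -> int:
    """Number of different bit patterns in a b-bit word."""
    return 2 ** b


def bits_needed(x: int) -> int:
    """Smallest word size b such that 2**b >= x, for x >= 1 (the rounded-up lg x)."""
    # (x - 1).bit_length() counts the binary digits of x - 1, which is exactly
    # the number of bits needed to number x patterns from 0 through x - 1.
    return (x - 1).bit_length()


if __name__ == "__main__":
    letters = 26
    word = bits_needed(letters)
    print(word, "bits,", patterns(word) - letters, "patterns left over")
```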

Most modern computers use 32-bit or sometimes 64-bit words to

represent integers. From Figure 3.3 we see that a 32-bit word permits

4,294,967,296 different patterns. This means that only the nonnegative integers 0 through 4,294,967,295 can be represented as binary numbers.

Computer arithmetic has this fundamental limitation: whenever

an arithmetic operation results in a number that can’t be repre-

sented in the given word size, the calculation will be wrong. In

particular, the high-order bits (on the left) of the answer are simply lost. This is called numerical overflow. Here is a demonstration of overflow arising in a simple computation. demos/demo-factorial/DemoApplet.html
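The demo itself is not reproduced here. As a rough stand-in, the Python sketch below assumes the computation is a factorial (a guess based only on the demo's path name) and assumes a 32-bit word holding nonnegative values. Python's own integers never overflow, so the sketch throws away the high-order bits by hand after every multiplication.

```python
WORD_SIZE = 32                      # assumed word size, in bits
MASK = 2 ** WORD_SIZE - 1           # a word of thirty-two 1 bits


def factorial_in_word(n: int) -> int:
    """Compute n! but keep only the low-order WORD_SIZE bits after each step."""
    result = 1
    for k in range(2, n + 1):
        result = (result * k) & MASK    # the high-order bits are simply lost
    return result


def factorial_exact(n: int) -> int:
    """Compute n! with Python's unlimited-size integers, for comparison."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


if __name__ == "__main__":
    for n in (12, 13):
        print(n, factorial_exact(n), factorial_in_word(n))
```

Under these assumptions 12! = 479,001,600 still fits, but 13! = 6,227,020,800 does not, and the 32-bit result comes out as 1,932,053,504.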

Programmers are expected to be aware of the possibility of over-

flow and to write programs that avoid the problem. For the most

part, they do; but programmers don’t always manage to catch every

error. Section 3.5 has some scary stories about things gone wrong due

to numerical overflow.

Negative integers. Of course, computers would be useless if they couldn’t represent negative as well as positive numbers. To explain how negative numbers are represented, we need to choose a fixed word size. Real computers use word sizes of 32 (sometimes 64) bits for positive and negative numbers, but the illustrations in this section use 12-bit words for simplicity. A 12-bit word yields 4096 different patterns.

     x    2^x
     0    1
     1    2
     2    4
     3    8
     4    16
     5    32
     6    64
     7    128
     8    256
     9    512
    10    1,024
    11    2,048
    12    4,096
    13    8,192
    14    16,384
    15    32,768
    16    65,536
    17    131,072
    18    262,144
    19    524,288
    20    1,048,576
    21    2,097,152
    22    4,194,304
    23    8,388,608
    24    16,777,216
    25    33,554,432
    26    67,108,864
    27    134,217,728
    28    268,435,456
    29    536,870,912
    30    1,073,741,824
    31    2,147,483,648
    32    4,294,967,296

Figure 3.3: Powers of two.

There are three different codes in common use for representing

sets of positive and negative integers. The first and the simplest code

is called signed magnitude notation. In this code the leftmost bit

represents the sign of the number, 0 for positive and 1 for negative,

and the remaining bits represent the number magnitude in base two.

For example, decimal 13 is represented in 12-bit signed magnitude as

0000 0000 1101.

Notice how the number is “padded” with zeros on the left to fill out

the word. The decimal value -13 is represented by

1000 0000 1101.

If we were interpreting this as a simple binary number it would represent decimal 2061. The signed magnitude code gives us a different

interpretation of this bit pattern.
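To see the two interpretations side by side, here is a small Python sketch of signed magnitude encoding and decoding. It assumes the 12-bit word used in this section; the function names are invented for the example.

```python
WORD_SIZE = 12   # the word size used for the illustrations in this section


def to_signed_magnitude(value: int, word_size: int = WORD_SIZE) -> str:
    """Encode an integer as a sign bit followed by its magnitude in base two."""
    magnitude = abs(value)
    if magnitude >= 2 ** (word_size - 1):
        raise OverflowError("magnitude does not fit in the remaining bits")
    sign = "1" if value < 0 else "0"
    # Pad the magnitude with zeros on the left to fill out the word.
    return sign + format(magnitude, f"0{word_size - 1}b")


def from_signed_magnitude(bits: str) -> int:
    """Decode a signed magnitude bit pattern (spaces between blocks are ignored)."""
    digits = bits.replace(" ", "")
    magnitude = int(digits[1:], 2)
    return -magnitude if digits[0] == "1" else magnitude


if __name__ == "__main__":
    pattern = to_signed_magnitude(-13)
    print(pattern)                          # the signed magnitude code for -13
    print(from_signed_magnitude(pattern))   # read back as signed magnitude
    print(int(pattern, 2))                  # the same bits read as a simple binary number
```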

The signed magnitude code splits the set of possible patterns into

two groups. Patterns that start with 0 represent positive integers,

and patterns that start with 1 represent negative integers. Thus,

with a 12-bit word, the largest number that can be represented is 0111 1111 1111, equal to +2047. The smallest number is binary 1111 1111 1111, equal to -2047. (Note 2^11 = 2048.) The interpretation is the same no matter what the word size.

This is a perfectly logical idea, but it has some drawbacks. For one thing, there are two ways to represent 0, as a positive number 0000 0000 0000 and a negative number 1000 0000 0000, which adds complications when the computer performs arithmetic. Second, the computer circuitry needed to perform subtraction using this code is unnecessarily slow and expensive to build.

    Algorithm: Negate Two’s Complement(P)

    P: Binary number with digits p_{w−1} p_{w−2} . . . p_1 p_0
    N: Binary number with digits n_{w−1} n_{w−2} . . . n_1 n_0

    for i ← 0 to w − 1             (1)
        do if p_i = 0
              then n_i ← 1
              else n_i ← 0
    N ← N + 1                      (2)
    output (N)

Figure 3.4: Negating a number in two’s complement notation.

An alternative code that solves these two problems is called two’s

complement notation. Positive integers in two’s complement look

exactly the same as positive integers in signed magnitude. Also, like

signed magnitude, two’s complement notation uses the leftmost bit

as a sign bit, 0 for positive and 1 for negative. Negative numbers,

however, use a different scheme, as illustrated by the algorithm in

Figure 3.4.

The algorithm basically takes two steps: the for loop in step (1)

flips all the bits from 0 to 1 and from 1 to 0, and then step (2)

adds one to the result. To run the algorithm you need to be able to

add numbers in base two. It’s not so hard, since there are only four

possible outcomes of adding two binary digits:


      0      0      1      1
    + 0    + 1    + 0    + 1
    ---    ---    ---    ---
      0      1      1     10

To add larger numbers, you follow the base ten method, working

right to left and carrying the one when necessary. Here are some

examples of addition using 12-bit words; the top row shows the carry

digits. The decimal version of each sum is shown below it.

                                111          1 1111 1
  0000 0001 1000      0000 1111 0111      0000 1111 1111
+ 0000 0000 0001    + 0000 0000 0101    + 0000 0000 0100
  --------------      --------------      --------------
  0000 0001 1001      0000 1111 1100      0001 0000 0011

              24                 247                 255
             + 1                 + 5                 + 4
  --------------      --------------      --------------
              25                 252                 259

Here’s a placeholder for a DEMO for you to practice adding num-

bers in binary.
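Until then, you can check your practice sums with a short program. The Python sketch below adds two binary numbers the way the text describes, working right to left and carrying the one when necessary. The function name, the default 12-bit word, and the choice to drop a final carry are assumptions made for this example, and each input is assumed to have no more digits than the word size.

```python
def add_binary(a: str, b: str, word_size: int = 12) -> str:
    """Add two binary numbers digit by digit, returning a word_size-digit result."""
    # Remove the spaces between blocks and pad with zeros on the left.
    a_bits = a.replace(" ", "").zfill(word_size)
    b_bits = b.replace(" ", "").zfill(word_size)
    carry = 0
    result = []
    for i in range(word_size - 1, -1, -1):      # rightmost digit first
        total = int(a_bits[i]) + int(b_bits[i]) + carry
        result.append(str(total % 2))           # the digit written in this column
        carry = total // 2                      # 1 if we must carry, otherwise 0
    # A carry left over at this point does not fit in the word and is dropped.
    return "".join(reversed(result))


if __name__ == "__main__":
    print(add_binary("0000 1111 0111", "0000 0000 0101"))   # 247 + 5
    print(add_binary("0000 1111 1111", "0000 0000 0100"))   # 255 + 4
```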

Let’s run the algorithm to find the two’s complement representa-

tion of -19.

Input: P = 0000 0001 0011 is the binary representation of 19.

(1) After the for loop runs, N = 1111 1110 1100.

(2) Adding N + 1 = 1111 1110 1101.

(3) The answer is 1111 1110 1101.
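The same run can be reproduced with a few lines of Python. This is a sketch of the flip-then-add-one procedure in Figure 3.4, not the book's code: the function name is invented, and it assumes the word size w is simply the number of digits supplied.

```python
def negate_twos_complement(p: str) -> str:
    """Negate a two's complement bit pattern: flip every bit, then add one."""
    digits = p.replace(" ", "")      # ignore the spaces between blocks
    w = len(digits)                  # word size
    # Step (1): flip all the bits from 0 to 1 and from 1 to 0.
    flipped = "".join("1" if bit == "0" else "0" for bit in digits)
    # Step (2): add one, dropping any carry out of the leftmost position.
    n = (int(flipped, 2) + 1) % (2 ** w)
    return format(n, f"0{w}b")       # pad back out to w digits


if __name__ == "__main__":
    minus_19 = negate_twos_complement("0000 0001 0011")
    print(minus_19)                             # 111111101101, as in the run above
    print(negate_twos_complement(minus_19))     # negating again gives back 19
```

Notice that negating the all-zeros pattern gives all zeros again, so this code has only one way to write 0.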

The algorithm in Figure 3.4 works
