Link Layer
4/19/2012
Admin
Written Assignment (Network): new due date is Monday, April 23
If you are considering replacement work, please stop by to talk to me
Any feedback/suggestions on the course will be appreciated.
Recap: Internet Routing
Intradomain routing and interdomain routing
CIDR allows flexible aggregation of destination addresses to improve routing scalability
Longest prefix matching determines the next hop to a destination
Basic switching fabric design
Putting it Together: Example 1 (same network): A->B
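Look up the destination address; find that the destination is on the same network
Hand the datagram to the link layer to send inside a link-layer frame
Datagram: misc. fields | src 223.1.1.1 | dst 223.1.1.3 | data
Figure (not reproduced): subnets 223.1.1/24, 223.1.2/24 and 223.1.3/24 connected by a router with interfaces 223.1.1.4, 223.1.2.9 and 223.1.3.27; A is 223.1.1.1, B is 223.1.1.3; other hosts are 223.1.1.2, 223.1.2.1, 223.1.2.2, 223.1.3.1 and 223.1.3.2; 223.1.4.1 leads to the Internet
Forwarding table in A:
Dest. Net.    next router   Nhops
223.1.1/24    -             1
223.1.2/24    223.1.1.4     2
223.1.3/24    223.1.1.4     2
0.0.0.0/0     223.1.1.4     -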
Putting it Together: Example 2 (Different Networks): A-> E
Look up the destination address in the forwarding table: the next-hop router to the destination is 223.1.1.4
Hand the datagram to the link layer to send to router 223.1.1.4 inside a link-layer frame; the destination of the link-layer frame is the router interface 223.1.1.4
Datagram: misc. fields | src 223.1.1.1 | dst 223.1.2.3 | data
Figure (not reproduced): same topology and forwarding table in A as in Example 1; E is 223.1.2.3
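Both examples come down to one lookup in A's forwarding table. A minimal Python sketch of longest prefix matching, assuming the slide's "223.1.1/24" notation means 223.1.1.0/24; the function name `link_layer_target` is our own, and it returns the IP address whose link-layer frame should carry the datagram.

```python
import ipaddress

# Forwarding table in A from the slides, written with full dotted prefixes.
# Each entry: (destination prefix, next-hop router, or None if directly attached).
FORWARDING_TABLE_A = [
    ("223.1.1.0/24", None),
    ("223.1.2.0/24", "223.1.1.4"),
    ("223.1.3.0/24", "223.1.1.4"),
    ("0.0.0.0/0", "223.1.1.4"),   # default route toward the Internet
]

def link_layer_target(dest: str) -> str:
    """IP address the frame should be sent to, chosen by longest prefix matching."""
    addr = ipaddress.ip_address(dest)
    best_len, best_hop = -1, None
    for prefix, hop in FORWARDING_TABLE_A:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, hop
    if best_len < 0:
        raise LookupError(f"no route to {dest}")
    # Directly attached destination: deliver to it; otherwise hand the frame to the router.
    return dest if best_hop is None else best_hop

print(link_layer_target("223.1.1.3"))  # Example 1 (B): 223.1.1.3
print(link_layer_target("223.1.2.3"))  # Example 2 (E): 223.1.1.4
```

The default route 0.0.0.0/0 matches everything but has prefix length 0, so it only wins when nothing more specific matches.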
Summary of Network Layer
We have covered the basics of the network layer: routing and forwarding
There are multiple other topics that we did not cover, e.g., multicast/anycast and QoS; slides will be linked on the schedule page in case you want reading over the summer
Recap: The Hourglass Architecture of the Internet
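Figure (not reproduced): the hourglass, with IP as the narrow waist
Applications: Telnet, Email, FTP, WWW
Transport: TCP, UDP
Network: IP
Links: Ethernet, FDDI, Wireless, ADSL, Cable (DOCSIS)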
Link Layer: Introduction
Some terminology: hosts and routers are nodes (bridges and switches too)
Communication channels that connect adjacent nodes along a communication path are links: wired or wireless, dedicated or shared
The layer-2 PDU is called a frame; it encapsulates the layer-3 PDU (datagram)
Link layer: Context
The data-link layer is responsible for transferring a datagram from one node to an adjacent node over a link
A datagram may be transferred by different link protocols over different links, e.g., Ethernet on the first link, frame relay on intermediate links, 802.11 on the last link
Transportation analogy: a trip from New Haven to San Francisco
taxi: home to Union Station
train: Union Station to JFK
plane: JFK to San Francisco airport
shuttle: airport to hotel
Link Layer Services
Framing
o encapsulate a datagram into a frame, adding a header, a trailer and error detection/correction bits
Multiplexing/demultiplexing
o frame headers identify src, dest
Media access control
Forwarding/switching within a link-layer (Layer 2) domain
Reliable delivery between adjacent nodes
o we learned how to do this already!
o seldom used on low bit-error links (fiber, some twisted pair)
o common for wireless links: high error rates
Adaptors Communicating
The link layer is typically implemented in an “adaptor” (aka NIC): Ethernet card, modem, 802.11 card
The adapter is semi-autonomous, implementing the link and physical layers
Sending side:
o encapsulates the datagram in a frame
o adds error checking bits, rdt, flow control, etc.
Receiving side:
o looks for errors, rdt, flow control, etc.
o extracts the datagram and passes it to the receiving node
Figure (not reproduced): sending node and receiving node, each with an adapter; the adapters exchange frames using the link layer protocol, and datagrams pass between each node and its adapter
LAN/MAC/Physical Address
In most link-layer technologies, each adapter has a unique link-layer address (also called MAC address)
• used as the address in data-link frames to identify the interface
• 48-bit MAC address (for most types of LANs), burned into the adapter ROM
• MAC address allocation is administered by IEEE; a manufacturer buys a portion of the MAC address space (to assure uniqueness)
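The 48-bit layout can be unpacked in a few lines. A minimal sketch: the function names are ours, and the facts it relies on are that the first 24 bits (the OUI) identify the block IEEE assigned to a manufacturer, and that the two low-order bits of the first byte flag group (multicast) and locally administered addresses.

```python
def parse_mac(mac: str) -> bytes:
    """Parse a colon-separated MAC address such as '00:06:5B:3F:6E:21' into 6 bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets, got {len(parts)}")
    return bytes(int(p, 16) for p in parts)

def describe_mac(mac: str) -> dict:
    b = parse_mac(mac)
    return {
        "bits": len(b) * 8,                          # 48-bit address
        "oui": ":".join(f"{x:02X}" for x in b[:3]),   # manufacturer block assigned by IEEE
        "multicast": bool(b[0] & 0x01),               # I/G bit: 1 = group address
        "locally_administered": bool(b[0] & 0x02),    # U/L bit: 1 = not the burned-in global address
    }

print(describe_mac("00:06:5B:3F:6E:21"))
```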
Recall Earlier Routing Discussion
Starting at A, given IP datagram addressed to E:
look up the network address of E; find that the next hop is router C
the link layer sends the datagram to C inside a link-layer frame; the frame's destination address should be C’s MAC address
Frame: [dest: C’s MAC addr | src: A’s MAC addr | datagram: [src: A’s IP addr | dest: E’s IP addr | IP payload]]
o the frame source and destination addresses are MAC addresses; the datagram source and destination addresses are IP addresses
Figure (not reproduced): the same three-subnet topology, with A, B, E, and C (the router interface on A’s subnet)
Question: how to determine the MAC address of C knowing C’s IP address?
ARP: Address Resolution Protocol
Each IP node (Host, Router) on LAN has ARP table
ARP Table: IP/MAC address mappings for some LAN nodes
<IP address; MAC address; TTL>
TTL (Time To Live): time after which the address mapping will be forgotten (typically 20 min)
[yry3@cicada yry3]$ /sbin/arp
Address                   HWtype  HWaddress           Flags Mask  Iface
zoo-gatew.cs.yale.edu     ether   AA:00:04:00:20:D4   C           eth0
artemis.zoo.cs.yale.edu   ether   00:06:5B:3F:6E:21   C           eth0
lab.zoo.cs.yale.edu       ether   00:B0:D0:F3:C7:A5   C           eth0
ARP Protocol
ARP is “plug-and-play”: nodes create their ARP tables without intervention from the network administrator
A broadcast protocol: source A broadcasts a query frame containing the queried IP address
• all machines on the LAN receive the ARP query
Destination D receives the ARP frame and replies
• the reply frame is sent to A’s MAC address (unicast)
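The ARP table and the query/reply exchange can be sketched together. This is a toy model, not a real ARP implementation: `ArpTable` and `resolve` are hypothetical names, the `lan` dict stands in for every host that hears the broadcast, and pairing 223.1.1.4 with the gateway MAC from the `/sbin/arp` listing is purely illustrative.

```python
import time

class ArpTable:
    """Toy ARP table of <IP address; MAC address; TTL> entries."""

    def __init__(self, ttl_seconds: float = 20 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # ip -> (mac, expiry time)

    def learn(self, ip: str, mac: str) -> None:
        self.entries[ip] = (mac, self.clock() + self.ttl)

    def lookup(self, ip: str):
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, expiry = entry
        if self.clock() >= expiry:      # TTL expired: forget the mapping
            del self.entries[ip]
            return None
        return mac

def resolve(table: ArpTable, ip: str, lan: dict) -> str:
    """Return the MAC for ip, 'broadcasting' a query on the LAN on a cache miss.

    lan is a hypothetical {ip: mac} map of the hosts that hear the broadcast;
    only the owner of ip answers, with a unicast reply to the requester."""
    mac = table.lookup(ip)
    if mac is None:
        mac = lan.get(ip)
        if mac is None:
            raise LookupError(f"no ARP reply for {ip}")
        table.learn(ip, mac)  # cache the reply: no administrator involved
    return mac
```

Passing a fake clock makes the TTL behavior easy to observe: after 20 simulated minutes the entry disappears and the next `resolve` would broadcast again.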
Comparison of IP address and MAC Address
IP address is a locator
o depends on the network to which an interface is attached
• NOT portable
o introduces features (e.g., CIDR) for routing scalability
o needs to be globally unique (if no NAT)
MAC address is an identifier
o dedicated to a device
• portable
o flat
o does not need to be globally unique, but the current assignment ensures uniqueness
Outline
Admin
Link layer overview
Error detection
Error Detection
D = data protected by error checking, may include header fields
ED = error detection bits (redundancy)
• Error detection is not 100% reliable!
• a good error detector may miss some errors, but rarely
• a larger ED field generally yields better detection
• Error detection design considers computation primitives.
Cyclic Redundancy Check: Background
Widely used in practice, e.g., Ethernet, DOCSIS (cable modem), FDDI, PKZIP, WinZip, PNG
For given data D, consider it as a polynomial D(x): the string of 0s and 1s gives the coefficients of the polynomial
• e.g., the string 10011 is $x^4 + x + 1$
Addition and subtraction are modulo 2, thus the same as XOR
Choose a generator polynomial G(x) with r+1 bits, where r is called the degree of G(x)
Cyclic Redundancy Check: Encode
Given G(x) and data D(x), choose R(x) with r bits such that $D(x)x^r + R(x)$ is exactly divisible by G(x)
The bits corresponding to $D(x)x^r + R(x)$ are sent to the receiver
Ethernet Frame Structure
Sending adapter encapsulates IP datagram (or other network layer protocol packet) in Ethernet frame
Preamble: 8 bytes, 7 bytes with pattern 10101010 followed by one byte with pattern 10101011 (why the preamble?)
Source and dest. addresses: 6 bytes each
Type: indicates the higher layer protocol, mostly IP but others may be supported, such as Novell IPX and AppleTalk
CRC: CRC-32 checked at the receiver; if an error is detected, the frame is simply dropped
Field sizes (bytes): preamble 8 | dest. address 6 | source address 6 | type 2 | data 46-1500 (including padding) | CRC 4
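The field sizes above can be checked by assembling a frame. A minimal sketch (`build_frame` is a hypothetical helper): it omits the preamble, which the adapter adds on the wire, uses Python's `zlib.crc32` (the same CRC-32 polynomial Ethernet uses) for the CRC field, and assumes the common convention of appending that checksum least-significant byte first. 0x0800 is the type value for IPv4.

```python
import struct
import zlib

def build_frame(dst: bytes, src: bytes, eth_type: int, payload: bytes) -> bytes:
    """dest (6) | src (6) | type (2) | data padded to >= 46 | CRC-32 (4)."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses are 6 bytes")
    if len(payload) > 1500:
        raise ValueError("payload exceeds 1500 bytes")
    data = payload.ljust(46, b"\x00")                  # pad short payloads to the 46-byte minimum
    body = dst + src + struct.pack("!H", eth_type) + data
    fcs = struct.pack("<I", zlib.crc32(body))          # assumption: LSB-first byte order for the CRC
    return body + fcs

frame = build_frame(b"\xff" * 6, bytes.fromhex("00065b3f6e21"), 0x0800, b"hello")
print(len(frame))  # 6 + 6 + 2 + 46 + 4 = 64
```

The padding is why even a 5-byte payload produces a 64-byte frame, and a full 1500-byte payload gives 1518 bytes.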
Cyclic Redundancy Check: Decode
Since G(x) is known to both sides, when the receiver receives the transmission T’(x), it divides T’(x) by G(x):
o non-zero remainder: error detected!
o zero remainder: assume no error
Figure (not reproduced): D is encoded with CRC(G) into $T = D(x)x^r + R(x)$; the receiver gets T’ and checks it
CRC: Steps and an Example
Suppose the degree of G(x) is r
Append r zeros to D, i.e., consider $D(x)x^r$
Divide $D(x)x^r$ by G(x); let R(x) denote the remainder
Send <D, R> to the receiver
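The three steps translate directly into XOR long division on bit strings. A minimal sketch with hypothetical function names; the values D = 101110 and G = 1001 (i.e., $x^3 + 1$, so r = 3) are chosen for illustration.

```python
def mod2_remainder(bits: str, generator: str) -> str:
    """Remainder of polynomial division over GF(2), where subtraction is XOR."""
    r = len(generator) - 1                 # degree of G(x)
    work = list(bits)
    for i in range(len(work) - r):
        if work[i] == "1":                 # leading term present: subtract G(x) aligned here
            for j, g in enumerate(generator):
                work[i + j] = "1" if work[i + j] != g else "0"
    return "".join(work[-r:])

def crc_encode(data: str, generator: str) -> str:
    """Append r zeros, divide by G(x), send <D, R>."""
    r = len(generator) - 1
    return data + mod2_remainder(data + "0" * r, generator)

def crc_check(received: str, generator: str) -> bool:
    """Receiver side: a zero remainder means 'assume no error'."""
    return "1" not in mod2_remainder(received, generator)

print(crc_encode("101110", "1001"))  # 101110011, i.e., R = 011
```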
The Power of CRC
Let T(x) denote $D(x)x^r + R(x)$, and E(x) the polynomial of the error bits; the received signal is $T'(x) = T(x) + E(x)$
Since T(x) is divisible by G(x), we only need to consider whether E(x) is divisible by G(x): the error goes undetected exactly when it is
Designing CRC
Detect a single-bit error: $E(x) = x^i$
o if G(x) contains two or more terms, E(x) is not divisible by G(x)
Detect an odd number of errors: E(x) has an odd number of terms
o lemma: if E(x) has an odd number of terms, E(x) cannot be divisible by (x+1)
• suppose $E(x) = (x+1)F(x)$ and let x = 1: the left-hand side is 1 (an odd number of 1s, mod 2), while the right-hand side is $(1+1)F(1) = 0$
o thus if G(x) contains x+1 as a factor, E(x) will not be divisible by G(x)
Many more errors can be detected by designing the right G(x)
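Both arguments can be confirmed exhaustively on a small case. A sketch under illustrative assumptions: G(x) = $x^3 + 1$, which has two terms and factors as $(x+1)(x^2+x+1)$, and 9-bit codewords; polynomials are stored as integers, with bit i holding the coefficient of $x^i$.

```python
from itertools import combinations

def poly_mod(e: int, g: int) -> int:
    """Remainder of E(x) / G(x) over GF(2)."""
    dg = g.bit_length() - 1
    while e and e.bit_length() - 1 >= dg:
        e ^= g << (e.bit_length() - 1 - dg)
    return e

def undetected(g: int, n: int, weight: int) -> list:
    """All error patterns on n bits with `weight` flipped bits that G(x) divides."""
    misses = []
    for positions in combinations(range(n), weight):
        e = sum(1 << i for i in positions)
        if poly_mod(e, g) == 0:
            misses.append(e)
    return misses

G = 0b1001  # x^3 + 1
for w in (1, 2, 3):
    print(w, len(undetected(G, 9, w)))
```

Single-bit and 3-bit errors are all caught, as the two arguments predict. Some 2-bit errors slip through: $x^i + x^j$ is divisible by $x^3 + 1$ exactly when 3 divides j - i, which gives 9 undetected pairs on 9 bits.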
Example G(x), 32-bit CRC:
CRC-32: $x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1$
used by Ethernet, FDDI, PKZIP, WinZip, and PNG
For more details see the link below and further links it contains: http://en.wikipedia.org/wiki/Cyclic_redundancy_check
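The polynomial above can be run as a bitwise CRC-32. A sketch for illustration: real CRC-32 adds details not covered by the plain division on the earlier slides (bits processed least-significant first, register initialized to all ones, result XORed with all ones), and Python's `zlib.crc32` computes the same checksum, so the two can be compared.

```python
import zlib

# Exponents of the CRC-32 generator below x^32 (the x^32 term is implicit in a 32-bit register).
EXPONENTS = (26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0)
CRC32_POLY = sum(1 << k for k in EXPONENTS)                     # 0x04C11DB7
CRC32_POLY_REFLECTED = int(f"{CRC32_POLY:032b}"[::-1], 2)       # 0xEDB88320, for LSB-first processing

def crc32(data: bytes) -> int:
    """Bitwise CRC-32: reflected bits, init 0xFFFFFFFF, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

print(hex(crc32(b"123456789")), crc32(b"hello") == zlib.crc32(b"hello"))
```

The bit-at-a-time loop is for readability; production code uses lookup tables, as zlib does.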
Outline
Admin
Link layer overview
Error detection/correction
Link access
Multiple Access Links and Protocols
Two types of “links”:
point-to-point
o e.g., a leased dedicated line, PPP for dial-up access
broadcast (shared wire or medium)
o traditional Ethernet; cable networks; 802.11 wireless LAN; cellular networks; satellite
Multiple Access Protocols
Single shared broadcast channel
o thus, if two or more nodes transmit simultaneously, the transmissions interfere: only one node can send successfully at a time (see CDMA later for an exception)
Multiple access protocol
o a protocol that determines how nodes share the channel, i.e., determines when nodes can transmit
o communication about channel sharing must use the channel itself!
Discussion: properties of an ideal multiple access protocol.
Ideal Multiple Access Protocol
Broadcast channel of rate R bps
Efficiency: when only one node wants to transmit, it can send at full rate R
Rate allocation:
o simple fairness: when N nodes want to transmit, each can send at average rate R/N
o we may need more complex rate control
Decentralized:
o no special node to coordinate transmissions
o no synchronization of clocks
Simple
MAC Protocols: a Taxonomy
Goals: efficient, rate control, decentralized, simple
Three broad classes:
channel partitioning
o divide the channel into smaller “pieces” (time slot, frequency, code)
non-partitioning
o random access
• allow collisions
o “taking turns”
• a token coordinates shared access to avoid collisions
Outline
Admin. and recap
Link layer overview
Error detection and correction
Media access control (MAC) protocols
o channel partitioning
Channel Partitioning: TDMA
TDMA: time division multiple access
Access to the channel in “rounds”
Each station gets a fixed-length slot (length = pkt transmission time) in each round
Unused slots go idle
Example: 6-station LAN; stations 1, 3, 4 have pkts, slots 2, 5, 6 are idle
Channel Partitioning: FDMA
FDMA: frequency division multiple access
The channel spectrum is divided into frequency bands
Each station is assigned a fixed frequency band
Unused transmission time in frequency bands goes idle
Example: 6-station LAN; stations 1, 3, 4 have pkts, frequency bands 2, 5, 6 are idle
Figure (not reproduced): frequency bands 1-6 plotted against time
GSM - TDMA/FDMA
Downlink: 935-960 MHz, 124 channels (200 kHz each)
Uplink: 890-915 MHz, 124 channels (200 kHz each)
GSM TDMA frame: 8 time slots (1-8), 4.615 ms
GSM time slot (normal burst): 577 µs, of which the burst occupies 546.5 µs
Burst layout: tail (3 bits) | user data (57 bits) | S (1 bit) | training (26 bits) | S (1 bit) | user data (57 bits) | tail (3 bits) | guard space
S: indicates data or control
Channel Partitioning: CDMA
CDMA (Code Division Multiple Access) is used mostly in wireless broadcast channels (cellular, satellite, etc.)
A spread-spectrum technique
History: http://people.seas.harvard.edu/~jones/cscie129/nu_lectures/lecture7/hedy/lemarr.htm
CDMA: Encoding
All users share the same frequency, but each user m has its own unique “chipping” sequence (i.e., code) $c_m$ to encode data, i.e., code set partitioning
o e.g., $c_m$ = 1 1 1 -1 1 -1 -1 -1
Assume the original data are represented by 1 and -1
Encoded signal = (original data) modulated by (chipping sequence)
o if the data bit is d, send $d \cdot c_m$
• if d is 1, send $c_m$
• if d is -1, send $-c_m$
CDMA: Encoding
Figure (not reproduced): user data d(t) (bits 1, -1) multiplied chip by chip by the chipping sequence c(t) gives the resulting signal; $t_b$: bit period, $t_c$: chip period
CDMA: Decoding
Inner product (sum of the chip-by-chip products) of the encoded signal and the chipping sequence: if the inner product > 0, the data is 1; else -1
CDMA Encode/Decode
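Code of user m, $c_m$: 1 1 1 -1 1 -1 -1 -1
- The number of chips in each chipping sequence is M
Encode: for data bit $d_i$, send chips $Z_{i,j} = d_i\, c_{m,j}$, $j = 1, \dots, M$
Decode: $d_i = \frac{1}{M} \sum_{j=1}^{M} Z_{i,j}\, c_{m,j}$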
CDMA: Deal with Multiple-User Interference
Two codes $c_i$ and $c_j$ are orthogonal if $c_i \cdot c_j = 0$, where we use “·” to denote the inner product, e.g.:
C1: 1 1 1 -1 1 -1 -1 -1
C2: 1 -1 1 1 1 -1 1 1
C1 · C2 = 1 + (-1) + 1 + (-1) + 1 + 1 + (-1) + (-1) = 0
If codes are orthogonal, multiple users can “coexist” and transmit simultaneously with minimal interference:
$c_i \cdot \big(\sum_j d_j c_j\big) = d_i\,(c_i \cdot c_i)$, since $c_i \cdot c_j = 0$ for $j \neq i$
Analogy: speak in different languages!
CDMA: Two-Sender Interference
Code 1: 1 1 1 -1 1 -1 -1 -1
Code 2: 1 -1 1 1 1 -1 1 1
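Encoding, decoding and two-sender interference can be simulated in a few lines using the two codes above. A minimal sketch: the data bits of the two senders are hypothetical, and the channel is assumed to simply add the two senders' chips.

```python
C1 = [1, 1, 1, -1, 1, -1, -1, -1]
C2 = [1, -1, 1, 1, 1, -1, 1, 1]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def encode(bits, code):
    """Send d * c_m for each data bit d in {1, -1}: M chips per bit."""
    return [d * c for d in bits for c in code]

def channel_sum(*signals):
    """Simultaneous transmissions add chip by chip on the shared channel."""
    return [sum(chips) for chips in zip(*signals)]

def decode(signal, code):
    """d_i = (1/M) * (chips of bit i) . c_m, then map to +1/-1 by sign."""
    M = len(code)
    bits = []
    for i in range(0, len(signal), M):
        value = inner(signal[i:i + M], code) / M
        bits.append(1 if value > 0 else -1)
    return bits

d1, d2 = [1, -1], [-1, -1]  # hypothetical data of sender 1 and sender 2
received = channel_sum(encode(d1, C1), encode(d2, C2))
print(inner(C1, C2), decode(received, C1), decode(received, C2))  # 0 [1, -1] [-1, -1]
```

Because C1 · C2 = 0, the other sender's chips contribute nothing to the inner product, so each receiver recovers its own bits exactly.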
Discussions
Advantages of channel partitioning
Problems of channel partitioning
Outline
Recap
Link layer overview
Error detection and correction
MAC protocols
o Partitioning protocols
o Non-partitioning MAC protocols
• Random access
Random Access Protocols
When a node has packets to send:
o transmit at the full channel data rate R
o no a priori coordination among nodes
Two or more transmitting nodes -> “collision”
A random access MAC protocol specifies:
o when to access the channel?
o how to detect collisions?
o how to recover from collisions?
Examples of random access MAC protocols: slotted ALOHA and pure ALOHA; CSMA, CSMA/CD, CSMA/CA
Slotted Aloha [Norm Abramson]
Time is divided into equal size slots (= pkt trans. time)
Node with new arriving pkt: transmit at beginning of next slot
If collision: retransmit pkt in future slots with probability p, until successful.
Success (S), Collision (C), Empty (E) slots
Slotted Aloha Efficiency
Q: What is the fraction of successful slots?
suppose n stations have packets to send
suppose each transmits in a slot with probability p
- prob. of success by a specific node: $p(1-p)^{n-1}$
- prob. of success by any one of the n nodes: $S(p) = n \cdot P(\text{only one transmits}) = n p (1-p)^{n-1}$
Goodput vs. Offered Load
Figure (not reproduced): S = throughput = “goodput” (success rate) vs. G = offered load = np (0.5 to 2.0), for slotted Aloha
When pn < 1, as p (or n) increases, the probability of empty slots decreases while the probability of collision is still low, thus goodput increases
When pn > 1, as p (or n) increases, the probability of empty slots does not decrease much, but the probability of collision increases, thus goodput decreases
Goodput is optimal when pn = 1
Maximum Efficiency vs. n
Figure (not reproduced): maximum efficiency vs. n (n from 2 to 17, efficiency axis 0 to 0.4), approaching 1/e ≈ 0.37
At best: the channel is used for useful transmissions 37% of the time!
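The maximum-efficiency curve follows from S(p). Setting dS/dp = 0 gives $p^* = 1/n$ (offered load np = 1), so the best efficiency is $(1 - 1/n)^{n-1}$, which tends to 1/e. A small numeric sketch of that derivation (function names are ours):

```python
import math

def slotted_success(n: int, p: float) -> float:
    """S(p) = n p (1-p)^(n-1): probability that exactly one of n nodes transmits in a slot."""
    return n * p * (1 - p) ** (n - 1)

def slotted_max_efficiency(n: int) -> float:
    """S evaluated at the optimum p* = 1/n."""
    return slotted_success(n, 1 / n)

for n in (2, 5, 10, 100, 10_000):
    print(n, round(slotted_max_efficiency(n), 4))
print("1/e =", round(1 / math.e, 4))
```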
Pure (unslotted) Aloha
Unslotted Aloha: simpler, no clock synchronization
Whenever a pkt needs transmission:
o send without waiting for the beginning of a slot
Collision probability increases: a pkt sent at $t_0$ collides with other pkts sent in $[t_0 - 1, t_0 + 1]$
Pure Aloha (cont.)
Assume a node transmits with probability p in one unit of time
P(success by a given node) = P(node transmits) · P(no other node transmits in $[t_0-1, t_0]$) · P(no other node transmits in $[t_0, t_0+1]$)
$= p \cdot (1-p)^{n-1} \cdot (1-p)^{n-1} = p (1-p)^{2(n-1)}$
P(success by any of the n nodes) $= n p (1-p)^{2(n-1)}$
- Bound: $1/(2e) \approx 0.18$
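The 1/(2e) bound comes from the same kind of optimization. For $f(p) = p(1-p)^{2(n-1)}$ the derivative vanishes at $p^* = 1/(2n-1)$, and the success probability there tends to 1/(2e) as n grows. A sketch (function names are ours):

```python
import math

def pure_success(n: int, p: float) -> float:
    """n p (1-p)^(2(n-1)): the vulnerable period spans two transmission times."""
    return n * p * (1 - p) ** (2 * (n - 1))

def pure_max_efficiency(n: int) -> float:
    """Success probability at the optimum p* = 1/(2n - 1)."""
    return pure_success(n, 1 / (2 * n - 1))

for n in (2, 10, 10_000):
    print(n, round(pure_max_efficiency(n), 4))
print("1/(2e) =", round(1 / (2 * math.e), 4))
```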
Goodput vs. Offered Load
Figure (not reproduced): S = throughput = “goodput” (success rate), 0.1 to 0.4, vs. G = offered load = np, 0.5 to 2.0, for pure Aloha and slotted Aloha
The protocol constrains the effective channel throughput!
Dynamics of (Slotted) Aloha
In reality, the number of backlogged stations changes; we need to study the dynamics when using a fixed transmission probability p
Assume we have a total of m stations (the machines on a LAN):
o n of them are currently backlogged, each trying with a (fixed) probability p
o the remaining m-n stations are not backlogged; they may start to generate packets with a probability $p_a$, where $p_a$ is much smaller than p
Model
n backlogged: each transmits with prob. p
m-n unbacklogged: each transmits with prob. $p_a$
Dynamics of Aloha: Effects of Fixed Probability
Figure (not reproduced): departure and arrival rates of backlogged stations vs. n, the number of backlogged stations (0 to m), assuming a total of m stations and $p_a \ll p$
o successful transmission rate at offered load $np + (m-n)p_a$; the success rate is the departure rate, i.e., the rate at which the backlog is reduced
o new arrival rate: $(m-n)p_a$
o marked on the plot: offered load = 1, a desirable stable point, and an undesirable stable point
Lesson: if we fix p but n varies, we may have an undesirable stable point
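The crossings in the figure can be located numerically. A sketch under hypothetical parameters (m = 100, p = 0.1, $p_a$ = 0.002): it treats the probability of exactly one transmission in a slot as the departure rate, as the figure does, and reports each n at which arrival minus departure changes sign between n and n + 1.

```python
def success_rate(n, m, p, pa):
    """P(exactly one transmission): n backlogged nodes send w.p. p, m-n others w.p. pa."""
    backlogged = n * p * (1 - p) ** (n - 1) * (1 - pa) ** (m - n) if n > 0 else 0.0
    fresh = (m - n) * pa * (1 - pa) ** (m - n - 1) * (1 - p) ** n if m > n else 0.0
    return backlogged + fresh

def drift(n, m, p, pa):
    """Arrival rate minus departure rate: positive means the backlog tends to grow."""
    return (m - n) * pa - success_rate(n, m, p, pa)

def sign_changes(m, p, pa):
    d = [drift(n, m, p, pa) for n in range(m + 1)]
    return [n for n in range(m) if (d[n] > 0) != (d[n + 1] > 0)]

print(sign_changes(100, 0.1, 0.002))  # [0, 28, 99]
```

With these numbers the backlog is pushed toward n near 0 (the desirable stable point) or toward n near m (the undesirable one, where almost every station is backlogged and nearly every slot collides); the crossing near n = 28 separates the two regions.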
Summary of Problems of Aloha Protocols
Problems:
o slotted Aloha has better efficiency than pure Aloha, but clock synchronization is hard to achieve
o Aloha protocols have low efficiency due to collisions or empty slots
• when the offered load is optimal (p = 1/n), the goodput is only about 37%
• when the offered load is not optimal, the goodput is even lower
o undesirable steady state at a fixed transmission rate, when the number of backlogged stations varies
Ethernet design addresses the problems:
o approximate slotted Aloha without clock synchronization
o reduce the penalty of collisions or empty slots
o infer the optimal transmission rate
The Basic MAC Mechanisms of Ethernet
get a packet from upper layer;
K := 0; n := 0;    // K: controls the wait time; n: number of collisions
repeat:
    wait for K * 512 bit-times;
    while (network busy) wait;
    wait for 96 bit-times after detecting no signal;
    transmit and detect collision;
    if no collision detected: done (frame sent);
    stop and transmit a 48-bit jam signal;
    n++;
    m := min(n, 10);
    choose K randomly from {0, 1, 2, …, 2^m - 1};
    if n < 16 goto repeat else give up
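The retransmission loop above can be sketched in Python. This is a simplified model: `collides` is a hypothetical stand-in for the channel, and carrier sensing, the 96-bit-time gap and the jam signal are not modeled. It keeps only the collision count, the choice of K, and the give-up rule.

```python
import random

SLOT_BIT_TIMES = 512   # one backoff unit = 512 bit times
MAX_ATTEMPTS = 16      # give up once n reaches 16

def choose_k(n: int, rng: random.Random) -> int:
    """After the n-th collision, pick K uniformly from {0, ..., 2^m - 1} with m = min(n, 10)."""
    m = min(n, 10)
    return rng.randrange(2 ** m)

def send_frame(collides, rng=None):
    """Return (transmission attempts, total backoff in bit times), or raise after 16 collisions.

    collides: hypothetical callable(collisions_so_far) -> bool modeling the channel."""
    if rng is None:
        rng = random.Random()
    k, n, waited = 0, 0, 0
    while True:
        waited += k * SLOT_BIT_TIMES
        if not collides(n):
            return n + 1, waited
        n += 1                      # collision (a jam signal would be sent here)
        if n >= MAX_ATTEMPTS:
            raise RuntimeError("excessive collisions: give up")
        k = choose_k(n, rng)

print(send_frame(lambda n: n < 3, random.Random(0)))  # succeeds on the 4th attempt
```

The expected value of K doubles with each collision up to the tenth, which is how the protocol adapts its effective transmission probability to the load.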
Ethernet
“Dominant” LAN technology:
o first widely used LAN technology
o kept up with the speed race: 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps
Figure (not reproduced): Metcalfe’s Ethernet sketch
Course Topics Summary
The Internet is a general-purpose, large-scale, distributed computer network
Major design features/principles
o packet switching/statistical multiplexing
o hour-glass architecture
o end-to-end principle
o decentralized architecture
• e.g., DNS, interdomain routing
o resource allocation framework
• optimization decomposition through duality
o adaptive control
• e.g., AIMD, sliding window, self-clocking, Ethernet
o queueing modeling/performance analysis and design
o tradeoff between theoretical impossibility and practice
Evolution driven by technology, infrastructure, policy, applications, and understanding:
o technology
• e.g., wireless/optical communication technologies and device miniaturization (sensors)
o infrastructure
• e.g., cloud computing
o applications
• e.g., content distribution, games, telepresence, sensing, grid computing, VoIP
o understanding
• e.g., resource sharing principle, routing principles, mechanism design, optimal stochastic control (randomized access)
Complexity comes from evolution. Don’t be afraid to challenge the foundation and redesign!
Backup Slides
Ethernet’s Exponential Backoff:
Goal: adapt retransmission attempts to the estimated current load
o compared with CSMA, $1/2^m$ can be considered as p
o not a static p: adjusted using exponential backoff
• first collision: choose K from {0,1}; delay is K × 512 bit transmission times
• after the second collision: choose K from {0,1,2,3}…
• after ten or more collisions: choose K from {0,1,2,3,4,…,1023}
Many Issues
How to make it faster
How to make it more efficient
How to make it more reliable/robust/secure
CSMA: Carrier Sense Multiple Access
CSMA: listen before transmit
Objective: approximate slotted Aloha (compared with pure Aloha)
If backlogged, wait until the channel is sensed idle, then transmit the pkt with prob. p
Human analogy: don’t interrupt others!
CSMA Collisions
Collisions can still occur: propagation delay means two nodes may not hear each other’s transmission
Collision: the entire packet transmission time is wasted; still not very efficient!
Figure (not reproduced): spatial layout of nodes A, B, C, D along the Ethernet vs. time, starting at $t_0$
CSMA/CD (Collision Detection)
Human analogy: the polite conversationalist
CSMA/CD observations:
• collisions can be detected within a short time
• if colliding transmissions are aborted, we can reduce channel wastage
Carrier sensing, deferral as in CSMA
Collision detection:
• easy in wired LANs: measure signal strengths, compare transmitted and received signals
• difficult in wireless LANs: the receiver is shut off while transmitting
CSMA/CD: Collision Detection
Figure (not reproduced): spatial layout of nodes A, B, C, D along the Ethernet vs. time from $t_0$; B detects the collision and aborts, D detects the collision and aborts
Instead of wasting the whole packet transmission time, abort after detection.
Efficiency of CSMA/CD
Given collision detection, instead of wasting the whole packet transmission time (a slot), we waste only the time needed to detect a collision.
Use a contention slot of 2T, where T is the one-way propagation delay (why 2T?)
When the transmission probability p is approximately optimal (p = 1/N), we try approximately e times before each successful transmission
Figure (not reproduced): contention slots followed by a packet transmission lasting P/C
P: packet size, e.g., 1000 bits; C: link capacity, e.g., 10 Mbps
Efficiency of CSMA/CD
The efficiency (the percentage of useful time) is approximately
$\frac{P/C}{P/C + 2Te} = \frac{1}{1 + 2e\,a} \approx \frac{1}{1 + 5.4a}$, where $a = \frac{TC}{P}$
The value of a plays a fundamental role in the efficiency of CSMA/CD protocols.
Question: you want to increase the capacity of a link layer technology (e.g., 10 Mbps Ethernet to 100 Mbps), but still want to maintain the same efficiency; what can you do?
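Plugging numbers into the approximation shows how sensitive it is to a. A sketch using the slide's P = 1000 bits and C = 10 Mbps together with a hypothetical one-way propagation delay T = 5 µs (roughly 1 km of cable); the function name is ours.

```python
import math

def csma_cd_efficiency(P_bits: float, C_bps: float, T_sec: float) -> float:
    """Efficiency ~ (P/C) / (P/C + 2 T e) = 1 / (1 + 2 e a), with a = T C / P."""
    a = T_sec * C_bps / P_bits
    return 1 / (1 + 2 * math.e * a)

T = 5e-6  # hypothetical one-way propagation delay
base = csma_cd_efficiency(1000, 10e6, T)      # a = 0.05
faster = csma_cd_efficiency(1000, 100e6, T)   # 10x capacity, same P and T: a = 0.5
print(round(base, 3), round(faster, 3))       # 0.786 0.269
```

Since T, C and P enter only through a = TC/P, any change that leaves a unchanged leaves this approximate efficiency unchanged.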
Summary of Problems to be Addressed
Approximate slotted Aloha
Reduce the penalty of collision or empty slots
Infer optimal transmission rate
Physical Layer
Internet Bandwidth Growth
Figure (not reproduced). Source: TeleGeography Research
What Determines Transmission Rate?
Service: transmit a bit stream from a sender to a receiver
Figure (not reproduced): at the sender, the input bit stream is encoded onto the channel; at the receiver, it is decoded into the output bit stream
Question to be addressed: how much can we send through the channel?
Basic Theory: Channel Capacity
The maximum number of bits that can be transmitted per second (bps) by a physical medium is:
$C = W \log_2(1 + S/N)$
where W is the frequency range (bandwidth, in Hz) and S/N is the signal-to-noise ratio. We assume Gaussian noise.
Fourier Transform
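Suppose the period of a data unit is T, i.e., its frequency is f = 1/T; then the data unit can be represented as the sum of many harmonics (sin(), cos()) with frequencies f, 2f, 3f, 4f, …
A reasonably behaved periodic function g(t), with minimal period T, can be constructed as the sum of a series of sines and cosines:
$g(t) = \frac{1}{2}c + \sum_{n=1}^{\infty} a_n \sin(2\pi n f t) + \sum_{n=1}^{\infty} b_n \cos(2\pi n f t)$, where $f = 1/T$ and
$a_n = \frac{2}{T}\int_0^T g(t)\sin(2\pi n f t)\,dt, \quad b_n = \frac{2}{T}\int_0^T g(t)\cos(2\pi n f t)\,dt, \quad c = \frac{2}{T}\int_0^T g(t)\,dt$
Figure (not reproduced): rms amplitudes $\sqrt{a_n^2 + b_n^2}$ of the harmonics for the character “b”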
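The coefficients for the character “b” can be computed exactly, because g(t) is a square wave that is constant over each bit. A sketch assuming T = 1, 8-bit ASCII sent most significant bit first (“b” = 01100010), and each bit held for T/8; the function names are ours.

```python
import math

def bit_pattern(ch: str) -> list:
    """8-bit ASCII pattern of ch, most significant bit first."""
    return [int(b) for b in format(ord(ch), "08b")]

def fourier_coefficients(bits, n):
    """a_n, b_n with T = 1 (so f = 1): each 1-bit contributes the exact integral over its interval."""
    N = len(bits)
    a = b = 0.0
    w = 2 * math.pi * n
    for k, bit in enumerate(bits):
        if bit:
            t0, t1 = k / N, (k + 1) / N
            a += 2 * (math.cos(w * t0) - math.cos(w * t1)) / w   # 2 * integral of sin(w t)
            b += 2 * (math.sin(w * t1) - math.sin(w * t0)) / w   # 2 * integral of cos(w t)
    return a, b

def dc_coefficient(bits):
    """c = (2/T) * integral of g(t): twice the fraction of 1 bits."""
    return 2 * sum(bits) / len(bits)

bits = bit_pattern("b")
print("c =", dc_coefficient(bits))
for n in range(1, 5):
    a_n, b_n = fourier_coefficients(bits, n)
    print(n, round(math.hypot(a_n, b_n), 3))  # sqrt(a_n^2 + b_n^2)
```

Summing only the first few harmonics gives a rounded version of the square wave, which is why a channel that passes a limited range of frequencies distorts the signal.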
Signal Attenuation
The quality of a signal degrades as it travels: loss, and only a limited range of frequencies passes
Frequency Dependent Attenuation
The received signal will be distorted even when there is no interference and the transmitted signal is a “perfect” square waveform
Example: voltage-attenuation magnitude ratios of Category 5 cable. For example, 500 feet of cable attenuates a 10-MHz, 1-V signal to 0.32 V, which corresponds to about -9.90 dB (= 20 log10 0.32)
Example
Example: W = 3000 Hz, S/N = 4000
max rate = $3000 \log_2(1 + 4000) \approx 36$ kbps
Figure (not reproduced): the sender's modem modulates the input bit stream (digital -> analog) onto the telephone network, a 3 kHz bandwidth channel that adds white noise; the signal is quantized (analog to digital) for transmission through the digital telephone backbone; the ISP modem demodulates it into the output bit stream
V.34 (33.6 kbps dial-up modem)
Example: ADSL
Spectrum allocation: divided into a total of 256 downstream and 32 upstream tones, where each tone is a standard 4 kHz voice channel
During initial negotiation, a tone is used only if the S/N is above 6 dB (a ratio of about 4)
down: $256 \times 4000 \times \log_2(1 + 4) \approx 2.4$ Mbps
up: $32 \times 4000 \times \log_2(1 + 4) \approx 297$ kbps
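The telephone and ADSL figures are direct applications of $C = W \log_2(1 + S/N)$. A sketch reproducing that arithmetic with the examples' own parameters; the helper names are ours, and S/N is a power ratio, so 6 dB converts to $10^{0.6} \approx 4$.

```python
import math

def shannon_capacity(bandwidth_hz: float, snr: float) -> float:
    """C = W log2(1 + S/N) in bits per second (S/N as a power ratio, not dB)."""
    return bandwidth_hz * math.log2(1 + snr)

def db_to_ratio(db: float) -> float:
    """Power ratio for a value in dB: 10^(dB/10)."""
    return 10 ** (db / 10)

phone = shannon_capacity(3000, 4000)          # telephone channel example
adsl_down = shannon_capacity(256 * 4000, 4)   # 256 downstream tones at S/N = 4
adsl_up = shannon_capacity(32 * 4000, 4)      # 32 upstream tones
print(f"{phone / 1e3:.1f} kbps, {adsl_down / 1e6:.2f} Mbps down, {adsl_up / 1e3:.0f} kbps up")
```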
Faster
The Wire: Fiber
A look at a fiber
How it works?
A graded index fiber
The Wire: Fiber
Advantages of Fibers
Wide spectrum at low loss: ~0.3 dB/km (cf. copper ~190 dB/km @ 100 MHz); 30-100 km without a repeater
Bandwidth of a single fiber: theoretically 100-200 Tbps
http://www.trnmag.com/Stories/080101/Study_shows_fiber_has_room_to_grow_080101.html
Lightweight: 33 tons of copper are needed to transmit the same amount of information carried by ¼ pound of optical fiber
How to Do Switching?
Optical-Electrical-Optical
Optical switch: optical micro-electro-mechanical systems (MEMS)
Figure (not reproduced): optical path through one optical switch
http://www.qwest.com/largebusiness/enterprisesolutions/networkMaps/preloader.swf
Example: MEMS optical switch using mirrors, e.g., Lambda Router
Implications
Fine-grained switching may not be feasible
What is the architecture of optical networks: packet switching, circuit switching, or others?
More Efficient
Large deployment of highly adaptive, multipoint applications
An iterative process between two sets of adaptation:
o ISP: traffic engineering to change routing to shift traffic away from highly utilized links
• current traffic pattern -> new routing matrix
o App: direct traffic to better performing end points
• current routing matrix -> new traffic pattern
Problem: Inefficient Interactions
ISP optimizer interacts poorly with App.
Figure (not reproduced): ISP Traffic Engineering + App Latency Optimizer; red: App adjusts alone, with fixed ISP routing; blue: ISP traffic engineering adapts alone, with fixed App communications
The Fundamental Problem
Traditional Internet architectural feedback to applications for efficiency is limited:
o routing (hidden)
o rate control through coarse-grained TCP congestion feedback
To achieve better efficiency, we need explicit communication between network resource providers and applications
P4P Framework – Design Goals
Performance improvement
Scalability and extensibility: support diverse ISP objectives and application scenarios in large networks
Privacy preservation
Ease of implementation
Open standard: any ISP, provider, or application can easily implement it
Current Status
P4P-WG
Next step:
o wider integration
o IETF standard
• AT&T, Bezeq Intl, BitTorrent, CacheLogic, Cisco Systems, Grid Networks, Joost, LimeWire, Manatt, Oversi, Pando Networks, PeerApp, Telefonica Group, VeriSign, Verizon, Vuze, Univ of Washington, Yale University
• Abacast, AHT Intl, Akamai, Alcatel Lucent, CableLabs, Cablevision, Comcast, Cox Comm, Juniper Networks, Microsoft, MPAA, NBC Universal, Nokia, RawFlow, Solid State Networks, Thomson, Time Warner Cable, Turner Broadcasting
Reliability
Is the Internet Reliable?
A key design objective of the “Internet” (i.e., packet-switched networks) is robustness
Does the Internet infrastructure achieve the target reliability objective of a highly reliable system (99.999%)?
Perspective
911 phone service (1993 NRIC report +): 29 minutes per year per line, 99.994% availability
Std. phone service (various sources): 53+ minutes per line per year, 99.99+% availability
…what about the Internet? Various studies: about 99.5%
Need to reduce downtime by 500 times to achieve five nines; 50 times to match phone service
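The availability figures and the 500x / 50x factors are simple arithmetic on downtime fractions. A small sketch (function names are ours) converting minutes of downtime per year into availability and computing how much downtime must shrink to reach a target:

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600

def availability(down_minutes_per_year: float) -> float:
    return 1 - down_minutes_per_year / MINUTES_PER_YEAR

def downtime_reduction(current: float, target: float) -> float:
    """Factor by which downtime (1 - availability) must shrink to reach target."""
    return (1 - current) / (1 - target)

print(f"{availability(29):.4%}")                  # 911 service, 29 min/year: 99.9945%
print(round(downtime_reduction(0.995, 0.99999)))  # to five nines: 500
print(round(downtime_reduction(0.995, 0.9999)))   # to phone service: 50
```

At 99.5%, the Internet is down about 0.005 × 8760 ≈ 44 hours per year, versus about 5 minutes per year for five nines.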
Unreachable Networks: 10 days
Internet Disaster Recovery Response
Why slow response?
o cable repair is slow: not until 21 days after the quake
o BGP is not designed to create business relationships
Objective: a meta-BGP to facilitate the discovery and creation of BGP business relationships
Backup: IP Multicast
IP Fragmentation & Reassembly
Network links have an MTU (max. transfer size): the largest possible link-level frame
o different link types, different MTUs, e.g., Ethernet MTU is 1500 bytes
A large IP datagram is divided (“fragmented”)
o one datagram becomes several datagrams
o “reassembled” only at the final destination
o IP header bits are used to identify and order related fragments
Figure (not reproduced): fragmentation (in: one large datagram; out: 3 smaller datagrams), then reassembly
IP Fragmentation and Reassembly
Example: 4000-byte datagram, MTU = 1500 bytes; one large datagram becomes several smaller datagrams
Original datagram: ID=x, offset=0, fragflag=0, length=4000
Fragment 1: ID=x, offset=0, fragflag=1, length=1500
Fragment 2: ID=x, offset=1480, fragflag=1, length=1500
Fragment 3: ID=x, offset=2960, fragflag=0, length=1040
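The three fragments follow mechanically from the MTU. A sketch assuming a 20-byte IP header with no options (the function name is ours). It reports offsets in bytes, as in the example. The actual IP header field stores the offset in 8-byte units, which is why every fragment except the last carries a multiple of 8 payload bytes.

```python
IP_HEADER = 20  # bytes, assuming no IP options

def fragment(total_length: int, mtu: int):
    """Split a datagram of total_length bytes (header included) for a link with this MTU.

    Returns (offset_bytes, more_fragments_flag, fragment_length) tuples."""
    payload = total_length - IP_HEADER
    max_chunk = (mtu - IP_HEADER) // 8 * 8    # largest multiple of 8 that fits
    fragments, offset = [], 0
    while payload > 0:
        chunk = min(max_chunk, payload)
        payload -= chunk
        fragments.append((offset, 1 if payload > 0 else 0, chunk + IP_HEADER))
        offset += chunk
    return fragments

for off, flag, length in fragment(4000, 1500):
    print(f"offset={off} (field value {off // 8}) fragflag={flag} length={length}")
```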
IP Multicast: Service Model
Multicast group concept: use of indirection
o a group is identified by a location-independent logical address (class D IP address: prefix 1110)
Open group model
o anyone can send packets to the “logical” group address
o anyone can join a group and receive packets
Normal, best-effort delivery semantics of IP
Figure (not reproduced): hosts 128.119.40.186, 128.59.16.12, 128.34.108.63 and 128.34.108.60 in multicast group 226.17.30.197
Needed: infrastructure to deliver mcast-addressed datagrams to all hosts that have joined that multicast group
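The class D test is just a check of the top four bits. A sketch (our own helper, compared against the standard library's `IPv4Address.is_multicast` for reference) using the group and host addresses from the figure:

```python
import ipaddress

def is_class_d(addr: str) -> bool:
    """True if the IPv4 address starts with bits 1110, i.e., 224.0.0.0 to 239.255.255.255."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    return first_octet >> 4 == 0b1110

for a in ("226.17.30.197", "128.119.40.186"):
    print(a, is_class_d(a), ipaddress.IPv4Address(a).is_multicast)
```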
Multicast Across LANs
Figure (not reproduced): shared tree vs. source-based trees
Goal: find a tree (or trees) connecting routers having local mcast group members
o source-based: a different tree from each sender to the receivers
– Distance-vector multicast routing protocol (DVMRP)
– Protocol-independent multicast, dense mode (PIM-DM)
o shared-tree: the same tree used by all group members
– Core-Based Tree (CBT)
– Protocol-independent multicast, sparse mode (PIM-SM)
Source Tree: Reverse Path Flooding (RPF)
A router x forwards a packet from source (S) iff it arrives via neighbor y, and y is on the shortest path from x back to S
A packet is replicated to all but the incoming interface
Figure (not reproduced): source S and routers x, y, t, z, a; every link has cost 1
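The forwarding rule fits in a few lines once each router knows its neighbor on the shortest path back to S. A sketch on a hypothetical topology with unit link costs (not the slide's figure). Breadth-first search from S supplies those parents, and ties between equal-length paths are broken by adjacency-list order, which is an assumption of this sketch.

```python
from collections import deque

def rpf_parents(graph: dict, source: str) -> dict:
    """With unit link costs, BFS from S gives each router's neighbor on a shortest path back to S."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def rpf_forward(graph: dict, parent: dict, router: str, came_from: str) -> list:
    """Forward iff the packet arrived from the neighbor on the shortest path back to S;
    then replicate it to every neighbor except the incoming one."""
    if parent.get(router) != came_from:
        return []                    # not on the reverse shortest path: drop
    return [n for n in graph[router] if n != came_from]

# Hypothetical undirected topology, every link cost 1.
GRAPH = {
    "S": ["a", "y"],
    "a": ["S", "x"],
    "y": ["S", "x", "z"],
    "x": ["a", "y", "t"],
    "z": ["y"],
    "t": ["x"],
}
parents = rpf_parents(GRAPH, "S")
print(rpf_forward(GRAPH, parents, "x", "a"))  # ['y', 't']
print(rpf_forward(GRAPH, parents, "x", "y"))  # []
```

Note that x still floods a copy to y, which y then drops. Avoiding such copies is what the child-link improvement addresses.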
Reverse Path Forwarding: Improvement
Basic idea: forward a packet from S only on child links for S
A child link of router x for source S: a link that has x as parent on the shortest path from the link to S
A child x notifies its parent y (through the routing protocol) that it has selected y as its parent
Figure (not reproduced): source S and routers x, y, t, z, a
Reverse Path Forwarding: Pruning
No need to forward datagrams down a subtree with no mcast group members
“Prune” msgs are sent upstream by a router with no downstream group members
Figure (not reproduced): source S and routers R1-R7; legend: router with attached group member, router with no attached group member, prune message (P), links with multicast forwarding
Pruning
Prune (Source, Group) at a leaf router if it has no members: send a No-Membership Report (NMR) up the tree
If all children of router R prune (S,G), R propagates a prune for (S,G) to its parent
What do you do when a member of a group (re)joins? Send a Graft message to the upstream parent
How to deal with failures? The prune is dropped and the flow is reinstated; downstream routers re-prune
Note: again a soft-state approach
Implementation of Source Trees in the Internet
Multicast OSPF (MOSPF)
o membership is part of the link state distribution; calculate source-specific, pre-pruned trees
Reverse Path Forwarding
o Distance Vector Multicast Routing Protocol (DVMRP)
o Protocol Independent Multicast – Dense Mode (PIM-DM)
• very similar to DVMRP
Difference: PIM uses any unicast routing algorithm to determine the path from a router to the source; DVMRP uses distance vector
Question: the state requirement of Reverse Path Forwarding
Building a Shared Tree
Steiner tree: minimum cost tree connecting all routers with attached group members
A Steiner tree is not a spanning tree because you do not need to connect all nodes in the network
The problem is NP-hard
Excellent heuristics exist
Not used in practice:
o computational complexity
o information about the entire network needed
o monolithic: rerun whenever a router needs to join/leave
Center (Core) based Shared Tree
Single delivery tree shared by all
One router identified as “center” of the tree
Tree construction is receiver-based:
edge router sends unicast join-msg addressed to center router
join-msg “processed” by intermediate routers and forwarded towards center
join-msg either hits existing tree branch for this center, or arrives at center
path taken by join-msg becomes new branch of tree for this router
A sender unicasts a packet to the center; the packet is distributed on the tree when it hits the tree
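Receiver-based tree construction can be sketched as a join message walking unicast next hops toward the center until it reaches the existing tree. A toy model: `next_hop_to_core` is a hypothetical unicast routing table, and the router names are illustrative, not the figure's.

```python
def join(router: str, next_hop_to_core: dict, on_tree: set) -> list:
    """Forward a join toward the core hop by hop until it hits the tree (or the core).

    Every router the join passes through becomes part of the tree; returns the path taken,
    ending at the router where the new branch attaches."""
    path = [router]
    node = router
    while node not in on_tree:
        on_tree.add(node)
        node = next_hop_to_core[node]
        path.append(node)
    return path

# Hypothetical state: the tree already spans core, r1, r2 (serving earlier members).
tree = {"core", "r1", "r2"}
routes = {"m3": "r4", "r4": "r2", "r2": "core", "r1": "core"}
print(join("m3", routes, tree))  # ['m3', 'r4', 'r2']: new branch grafted at r2
```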
Example: M3 Joins
Group members: M1, M2
Figure (not reproduced): core, members M1, M2, M3 and sender S1 on the shared tree; M3's join message travels toward the core
Discussion: what is a property of the constructed tree?
Example: M1 Sends Data
Group members: M1, M2, M3
M1 sends data
Figure (not reproduced): core, M1, M2, M3 and S1; control (join) messages and data on the shared tree
Shared Tree Protocols in the Internet
Core Based Tree
Protocol Independent Multicast (PIM), sparse mode
The catch: how do you know the center?
o session announcement
Mbone: Tunneling
Q: How to connect “islands” of multicast routers in a “sea” of unicast routers?
mcast datagram encapsulated inside “normal” (non-multicast-addressed) datagram
normal IP datagram sent thru “tunnel” via regular IP unicast to receiving mcast router
receiving mcast router unencapsulates to get mcast datagram
Figure (not reproduced): physical topology vs. logical topology