Page 1: Sundar Iyer

Sundar Iyer

Winter 2012, Lecture 7

Packet Buffers

EE384: Packet Switch Architectures

Page 2: Sundar Iyer

The Problem

• All packet switches (e.g. Internet routers, Ethernet switches) require packet buffers for periods of congestion.

• Size: A commonly used “rule of thumb” says that buffers need to hold one RTT (about 0.25s) of data. Even if this could be reduced to 10ms, a 4x10Gb/s linecard would require 400Mbits of buffering.

• Speed: Clearly, the buffer needs to store (retrieve) packets as fast as they arrive (depart). At 4x10Gb/s, minimum-sized packets must arrive and depart every 8ns.
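
A quick back-of-the-envelope check of these numbers (a sketch, not from the slides; all figures are taken from the bullets above):

```python
# Back-of-the-envelope check of the buffer-size and packet-timing numbers above.
RATE_BPS = 4 * 10e9          # 4 x 10 Gb/s linecard
RTT_S = 0.25                 # "rule of thumb" round-trip time
REDUCED_RTT_S = 0.010        # optimistic 10 ms of buffering
MIN_PKT_BITS = 40 * 8        # minimum-sized (40-byte) packet

print(f"Buffer for one RTT : {RATE_BPS * RTT_S / 1e9:.0f} Gbit")
print(f"Buffer for 10 ms   : {RATE_BPS * REDUCED_RTT_S / 1e6:.0f} Mbit")  # ~400 Mbit
print(f"Packet interarrival: {MIN_PKT_BITS / RATE_BPS * 1e9:.0f} ns")     # ~8 ns
```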

Page 3: Sundar Iyer

An Example: Packet buffers for a 40Gb/s linecard

[Figure: a buffer manager sits in front of the buffer memory, between the write path (one 40B packet arriving every 8ns, rate R) and the read path (one 40B packet departing every 8ns, rate R, driven by unpredictable scheduler requests).]

Memory needs to be accessed for a write or a read every 4ns.

Page 4: Sundar Iyer

Memory Operations Per Second (MOPS)

What is MOPS?
• Number of unique memory operations per second
• Refers to the speed of the address (not data) bus
• Inverse of random access time

Examples
• SRAM with 4ns access time = 250M MOPS
• DRAM with 50ns access time = 20M MOPS
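
As a small worked example of the definition above (illustrative only), MOPS is just the reciprocal of the random access time:

```python
# MOPS is the reciprocal of the random access time:
# the address bus can issue one new operation per access time.
def mops(access_time_ns: float) -> float:
    return 1e9 / access_time_ns

print(f"SRAM, 4 ns access : {mops(4) / 1e6:.0f}M MOPS")    # 250M
print(f"DRAM, 50 ns access: {mops(50) / 1e6:.0f}M MOPS")   # 20M
```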

Page 5: Sundar Iyer

Memory Technology

Use SRAM?
+ Fast enough random access time, but
- Low density, high cost, high power.

Use DRAM?
+ High density means we can store data, but
- Can't meet random access time.

Page 6: Sundar Iyer

The Problem: No single memory technology is a good match

Ideally, we want the accesses per second of SRAM with the cost and density of DRAM.

SRAM (S):          800M MOPS,  $1 per Mb,  800 Mb/s per pin
FCRAM/RLDRAM (F):   50M MOPS,  4c per Mb,  1000 Mb/s per pin
XDRAM (X):          25M MOPS,  2c per Mb,  3200 Mb/s per pin
DDR3 (D):           25M MOPS,  1c per Mb,  1600 Mb/s per pin

Page 7: Sundar Iyer

Sol 1: Can’t we just use lots of DRAMs as separate memories in parallel?

[Figure: the buffer manager spreads 40B packets over eight parallel DRAM buffer memories; every 4ns a 40B packet is written to or read from a different '32ns access time' memory.]

Solution
– Write 40B packets to available banks
– Read 40B packets from specified banks

Problem
– What if back-to-back reads occur from a small number of banks?
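
To see why back-to-back reads to a few banks are a problem, here is a minimal sketch (not from the slides) assuming 8 banks, a 32ns bank access time, and one new request every 4ns, as in the figure:

```python
# Minimal sketch: parallel DRAM banks keep up only if requests spread across banks.
BANKS, ACCESS_NS, REQ_PERIOD_NS = 8, 32, 4

def finish_time(requests):
    """Time at which all requests complete if each bank is busy ACCESS_NS per access."""
    bank_free = [0] * BANKS                # time at which each bank next becomes idle
    for i, bank in enumerate(requests):
        arrival = i * REQ_PERIOD_NS        # one new request every 4 ns
        bank_free[bank] = max(arrival, bank_free[bank]) + ACCESS_NS
    return max(bank_free)

spread = [i % BANKS for i in range(64)]    # requests rotate over all eight banks
hotspot = [0] * 64                         # adversarial: every request hits bank 0
print("spread over banks:", finish_time(spread), "ns")   # ~284 ns: the banks just keep up
print("single hot bank:  ", finish_time(hotspot), "ns")  # ~2048 ns: far behind the line rate
```

With requests spread evenly, the eight 32ns banks together sustain one access per 4ns; an adversarial pattern that keeps reading the same bank falls behind by roughly the number of banks.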

Page 8: Sundar Iyer

Sol 2: Can’t we just use lots of DRAMs as one monolithic memory in parallel?

[Figure: the buffer manager aggregates the arriving 40B packets (one every 8ns, rate R) into one wide memory built from eight DRAMs. Bytes 0-39, 40-79, ..., 280-319 of each block are spread across the eight devices, and every write or read transfers a full 320B block once every 32ns.]
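
A minimal sketch of this idea for a single FIFO, assuming fixed 40B packets (the class WideFifoWriter and its names are illustrative, not from the slides):

```python
# Minimal sketch of Sol 2 for a single FIFO with fixed 40B packets:
# accumulate eight 40B packets, then issue one 320B write to the slow memory.
PKT, BLOCK = 40, 320

class WideFifoWriter:
    def __init__(self):
        self.pending = bytearray()   # packets not yet written to the wide DRAM
        self.dram_blocks = []        # each entry is one 320B block in slow memory

    def write_packet(self, pkt: bytes):
        assert len(pkt) == PKT
        self.pending += pkt
        if len(self.pending) == BLOCK:            # one DRAM write per 8 packets (every 32 ns)
            self.dram_blocks.append(bytes(self.pending))
            self.pending.clear()

w = WideFifoWriter()
for i in range(16):
    w.write_packet(bytes([i]) * PKT)
print(len(w.dram_blocks), "blocks of", BLOCK, "bytes written to DRAM")   # 2 blocks
```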

Page 9: Sundar Iyer

Sol 2: Works fine if there is only one FIFO

[Figure: with a single FIFO, arriving 40B packets (one every 8ns, rate R) are packed by the buffer manager into 320B blocks; each full block is written to the slow buffer memory, and 320B blocks are read back and unpacked to supply the departing 40B packets (one every 8ns, rate R).]

Page 10: Sundar Iyer

Sol 2: Works fine if there is only one FIFO, and supports variable-length packets

[Figure: the same arrangement as the previous slide, but with variable-length packets: a partial block of ?B can sit in the buffer manager on the write side and on the read side, while full 320B blocks (bytes 0-39, 40-79, ..., 280-319) move to and from the buffer memory.]

Page 11: Sundar Iyer

Sol 2: In practice, buffer holds many FIFOs

[Figure: the buffer now holds Q separate FIFOs (Q might be 1k-64k), each stored as a chain of 320B blocks, while the buffer manager still handles one 40B packet arriving every 8ns (rate R) and one departing every 8ns (rate R), with partial ?B blocks on each side.]

How can we write multiple variable-length packets into different queues?

Page 12: Sundar Iyer

Problem

A block contains packets for different queues, which must be written to, or read from, different memory locations.

Page 13: Sundar Iyer

Sol 3: Hybrid Memory Hierarchy

[Figure: arriving packets (rate R) and departing packets (rate R) flow through the packet processor, which uses a small fast cache (SRAM) in front of a big slow memory (DRAM).]

A CPU cache is probabilistic: there is always a small probability of a miss.

Q: Why is randomness a problem in this context?

Page 14: Sundar Iyer

Sol 4: Hybrid Memory Hierarchy with 100% Cache Hit Rate

[Figure: arriving packets (rate R) are written into a small SRAM holding the tails of FIFOs 1..Q; a large DRAM memory holds the FIFO bodies; a second small SRAM holds the heads of FIFOs 1..Q, from which departing packets (rate R) are read in response to unpredictable scheduler requests. Data moves between SRAM and DRAM b bytes at a time: b bytes are written from a tail into the DRAM, and b bytes are read from the DRAM to replenish a head.]
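
A minimal sketch of the data structures this figure implies, assuming transfers of b bytes between SRAM and DRAM; the class and method names are illustrative, not from the slides:

```python
# Sketch of the Sol 4 hierarchy: per-queue SRAM tail and head buffers,
# with the FIFO body kept in DRAM and moved b bytes at a time.
from collections import deque

class HybridPacketBuffer:
    def __init__(self, num_queues: int, b: int):
        self.b = b
        self.tails = [bytearray() for _ in range(num_queues)]   # SRAM: FIFO tails
        self.heads = [bytearray() for _ in range(num_queues)]   # SRAM: FIFO heads
        self.dram = [deque() for _ in range(num_queues)]        # DRAM: FIFO bodies, b-byte blocks

    def write(self, q: int, data: bytes):
        """Arriving bytes go into the SRAM tail; full b-byte blocks spill into DRAM."""
        self.tails[q] += data
        while len(self.tails[q]) >= self.b:
            self.dram[q].append(bytes(self.tails[q][: self.b]))
            del self.tails[q][: self.b]

    def replenish(self, q: int):
        """Move one b-byte block from DRAM into the SRAM head of queue q."""
        if self.dram[q]:
            self.heads[q] += self.dram[q].popleft()

    def read(self, q: int, n: int) -> bytes:
        """Departing bytes are served from the SRAM head only (100% hit by design)."""
        assert len(self.heads[q]) >= n, "head cache under-run"
        out = bytes(self.heads[q][:n])
        del self.heads[q][:n]
        return out
```

What the sketch leaves open is exactly when to replenish which queue; that replenishment policy, and how much head SRAM it needs, is what the remaining slides analyze.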

Page 15: Sundar Iyer

Design questions

1. What is the minimum SRAM needed to guarantee that a byte is always available in SRAM when requested?

2. What algorithm minimizes the SRAM size?

Page 16: Sundar Iyer

An Example: Q = 5, w = 9+, b = 6

[Figure: the byte contents of the Q = 5 SRAM head FIFOs at timeslots t = 0 through t = 7, with 'Replenish' marking the steps at which one queue is refilled with b = 6 bytes from DRAM.]

Page 17: Sundar Iyer

An Example: Q = 5, w = 9+, b = 6 (continued)

[Figure: the same five FIFOs at timeslots t = 8 through t = 13, t = 19 and t = 23, again with 'Replenish' marking refills from DRAM and 'Read' marking bytes being read out.]

Page 18: Sundar Iyer

The size of the SRAM cache

Necessity
– How large does the SRAM cache need to be under any management algorithm?
– Claim: wQ > Q(b - 1)(2 + lnQ)

Sufficiency
– For any pattern of arrivals, what is the smallest SRAM cache needed so that a byte is always available when requested?
– For one particular algorithm: wQ = Qb(2 + lnQ)

[Figure: the SRAM cache holds Q FIFOs, each w bytes deep.]

Page 19: Sundar Iyer

Definitions

Occupancy, X(q,t): the number of bytes in FIFO q (in SRAM) at time t.

Deficit: D(q,t) = w - X(q,t)

[Figure: each of the Q SRAM FIFOs is w bytes deep; the filled portion is its occupancy and the remainder is its deficit.]

Page 20: Sundar Iyer

Smallest SRAM cache

1st iteration: read 1 byte from every queue. While these Q bytes are being read, only Q/b of the queues can be replenished, so Q(1 - 1/b) queues are left with a deficit of 1 byte.

2nd iteration: read 1 byte from every queue with a deficit of 1 byte. Again only a 1/b fraction of them can be replenished, so Q(1 - 1/b)^2 queues now have a deficit of 2 bytes.

...

x-th iteration: at the end of the iteration, Q(1 - 1/b)^x queues have a deficit of x bytes.

After some number of iterations, we are down to one queue:

Q(1 - 1/b)^x = 1, i.e. x ln(b/(b - 1)) = lnQ, so x ≈ (b - 1)lnQ, since ln(1 + y) ≈ y for small y.

If that queue has fewer than x bytes in it, then successive reads make the queue under-run before it can be replenished, and up to b - 1 further bytes can be requested from it before its replenishment arrives. So, w > b - 1 + (b - 1)lnQ.
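
A small numeric check of this counting argument (a sketch, not from the slides), using the 40Gb/s linecard parameters b = 640 and Q = 128 that appear later in the deck:

```python
# In each round the adversary reads one byte from every still-un-replenished queue,
# while only a 1/b fraction of them can be replenished. Count rounds until one remains.
import math

def rounds_until_one_queue(Q: int, b: int) -> int:
    remaining, x = float(Q), 0
    while remaining > 1:
        remaining *= (1 - 1 / b)     # fraction of queues not replenished this round
        x += 1                       # every surviving queue's deficit grows by 1 byte
    return x

Q, b = 128, 640
x = rounds_until_one_queue(Q, b)
print("rounds x          :", x)                                  # close to (b-1)*lnQ
print("(b-1)*lnQ         :", round((b - 1) * math.log(Q)))
print("lower bound on w  :", round(b - 1 + (b - 1) * math.log(Q)), "bytes per queue")
```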

Page 21: Sundar Iyer

Smallest SRAM cache

In addition, each queue needs to hold (b – 1) bytes in case it is replenished with b bytes when only 1 byte has been removed.

Therefore, SRAM size must be at least: Qw > Q(b – 1)(2 + lnQ).

Page 22: Sundar Iyer

Most Deficit Queue First

Algorithm: Every b timeslots, replenish the queue with the largest deficit.

Claim: An SRAM cache of size Qw > Qb(2 + lnQ) is sufficient.

Examples:
1. 40Gb/s linecard, b=640, Q=128: SRAM = 560 kBytes
2. 160Gb/s linecard, b=2560, Q=512: SRAM = 10 MBytes
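
As a quick check of these two examples, a short sketch (not from the slides) that evaluates Qb(2 + lnQ):

```python
# Evaluate the MDQF sufficiency bound Qw = Qb(2 + lnQ) for the two example linecards.
import math

def mdqf_sram_bytes(Q: int, b: int) -> float:
    return Q * b * (2 + math.log(Q))

for label, Q, b in [("40Gb/s linecard", 128, 640), ("160Gb/s linecard", 512, 2560)]:
    print(f"{label}: Q={Q}, b={b} -> {mdqf_sram_bytes(Q, b) / 1e6:.2f} MBytes")
# 40Gb/s  -> ~0.56 MBytes (560 kBytes)
# 160Gb/s -> ~10.8 MBytes (about 10 MBytes)
```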

Page 23: Sundar Iyer


Intuition for Theorem

The maximum number of un-replenished requests for any i queues, w(i), is the solution of the difference equation

w(i) = w(i-1) * i/(i-1) + b,   i in {2, 3, ..., Q}

with boundary condition w(1) = b. Its solution satisfies w(Q) ≤ Qb(2 + lnQ).
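
A sketch (not from the slides) that iterates the recurrence as reconstructed above, assuming the boundary condition w(1) = b, and compares the per-queue result with the b(2 + lnQ) figure used in the examples:

```python
# Iterate the reconstructed recurrence w(i) = w(i-1) * i/(i-1) + b and compare the
# per-queue requirement w(Q)/Q with the b(2 + lnQ) bound quoted in the claim.
# (The recurrence form and the w(1) = b boundary are reconstructed, not verbatim.)
import math

def w_Q(Q: int, b: int) -> float:
    w = float(b)                     # boundary condition w(1) = b
    for i in range(2, Q + 1):
        w = w * i / (i - 1) + b
    return w

Q, b = 128, 640
print("w(Q)/Q     :", round(w_Q(Q, b) / Q))           # per-queue SRAM from the recurrence
print("b*(2 + lnQ):", round(b * (2 + math.log(Q))))   # per-queue bound from the claim
```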

Examples:

1. 40Gb/s line card, b=640, Q=128: SRAM = 560 kBytes
2. 160Gb/s line card, b=2560, Q=512: SRAM = 10 MBytes

