Presentation given in 2006 at the Annual Simulation Symposium.

39th Annual Simulation Symposium

Modeling, Simulation and Performance Evaluation for a CIOQ Switch Architecture

Sponsored by FAPEMIG

Antonio M. Alberti, INATEL – National Institute of Telecommunications, MG, Brazil.

Sebastiao R. de Aguiar Filho, FEMC – Fundação Educacional Montes Claros, MG, Brazil.

Anilton Salles Garcia, UFES – Federal University of Espirito Santo, ES, Brazil.

Presentation Outline

• Introduction

• Single Input Buffer CIOQ Architecture

• Class Based Input Buffer CIOQ Architecture

• Developed Models

• Performance Evaluation

• Final Remarks

Introduction

• Over the past decade, data traffic has grown enormously, mainly because of the popularization of the Internet.

• Telephony operators built new networks to transport end users' multimedia traffic.

• Technologies such as ADSL and ATM (Asynchronous Transfer Mode) emerged in access and core networks, respectively.

• Powerful routers have also been developed to carry Internet traffic.

Introduction

• Packet switching nodes and their architectures have advanced considerably, not only in capacity and scalability but also in efficiency and QoS support.

• An important part of this development occurred in the context of ATM networks.

• Most ATM switch architectures are built by arranging multistage switching elements to form an interconnection network.

Introduction

• They can be classified as:
  • Blocking or non-blocking, according to their capacity to control packet loss events or to eliminate blocking.
  • Input-Queueing (IQ), Output-Queueing (OQ) or Shared-Queueing (SQ), depending on where buffering is necessary.

• Output-Queueing:
  • Advantage:
    • It has 100% theoretical throughput.
  • Disadvantages:
    • It requires an internal speedup factor in order to transfer several packets to a single output queue in every cycle.
    • Output queue capacity must be large enough to store all the transferred packets.

Introduction

• Input-Queueing:
  • Advantage:
    • Overcomes the scalability problem: the queues run only as fast as the input line rate, which makes it possible to build very fast switches.
  • Disadvantages:
    • It requires an internal speedup factor in order to transfer several packets to a single output queue in every cycle.
    • Suffers from HOLB (Head-of-Line Blocking), which limits the throughput to just 58.6%.

• Virtual Output Queueing (VOQ):
  • Advantage:
    • Eliminates HOLB.
  • Disadvantage:
    • High complexity and poor scalability, since the number of virtual queues in the input ports grows quadratically with the number of input ports.
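The 58.6% HOLB limit quoted for Input-Queueing is the large-N saturation throughput 2 − √2 ≈ 0.586 of a FIFO input-queued switch under uniform random traffic. The Python sketch below is not part of the presentation. It is a minimal Monte Carlo estimate under stated assumptions: every input is always backlogged, destinations are uniform and independent, and each output picks one contending head-of-line packet at random. The function name `hol_saturation_throughput` and its parameters are illustrative.

```python
import math
import random


def hol_saturation_throughput(n_ports, n_slots, seed=0):
    """Estimate the saturation throughput of a FIFO input-queued switch.

    Assumptions (not from the slides): saturated inputs, uniform i.i.d.
    destinations, random choice among contending inputs.
    """
    rng = random.Random(seed)
    # Only the destination of each head-of-line (HOL) packet matters.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group the inputs by the output their HOL packet wants.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each requested output serves exactly one contender per slot.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            # The winner's next packet moves to the head of its queue;
            # the losers stay blocked with the same destination.
            hol[winner] = rng.randrange(n_ports)
            delivered += 1
    return delivered / (n_ports * n_slots)


# Asymptotic (N -> infinity) limit for this traffic model.
LIMIT = 2 - math.sqrt(2)

if __name__ == "__main__":
    print(f"simulated (N=32): {hol_saturation_throughput(32, 2000):.3f}")
    print(f"asymptotic limit: {LIMIT:.3f}")
```

For a finite number of ports the estimate typically sits slightly above the asymptotic value and approaches it as N grows.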

Introduction

• Combined Input/Output Queue (CIOQ):
  • Advantages:
    • Combines input and output queueing and achieves a good balance between performance and scalability.
    • Can remove S packets from each input port and transfer up to S packets to every output during an input time slot.
  • Disadvantage:
    • According to Luo et al., CIOQ is very complex when compared with CICQ (Combined Input-Crosspoint-Queueing).

Introduction

• Santos-Motoyama (SM) CIOQ:
  • Advantages:
    • Doesn't need internal speedup.
    • Can reduce HOLB while improving throughput.
    • Simpler than the original CIOQ.

• These features motivated us to model, simulate and evaluate SM CIOQ architectures. We also wanted to validate our results against those of the original SM paper.

• Santos-Motoyama developed two CIOQ architectures:
  • Single Input Buffer CIOQ Architecture
  • Class Based CIOQ Architecture

Single Input Buffer CIOQ Architecture

• It has one simple FIFO queue for each input port, a crossbar with m internal links (or channels) from each input to each output port and m output queues in every output port.

• Each input queue has a control unit (CRT), which monitors the head of the queue to determine whether there is a packet to be transferred.

• If there is, the CRT sends a request (REQ) to the scheduler module (SCH) of the desired output port, asking for a crossbar link to that output port.

• Each CRT can issue only one request per time slot.

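• Overview

[Figure: block diagram of the single input buffer CIOQ switch. Input ports 1 to N are connected through a REQ bus (N bits) and an ACK bus (N bits) to the schedulers SCH 1 to SCH N. Each of the output ports 1 to N is reached over links 1 to m and holds output queues 1 to m.]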

• The SCH grants up to m links to the requesting CRTs on a round-robin basis.

• This is done through acknowledgement signals (ACKs).

• To be fair, in the next cycle the SCH starts granting from the input that wasn't granted in the previous cycle.

• The output queues are also served on a round-robin basis.
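The request/grant cycle described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' Arena model: the class name `OutputScheduler`, the `grant` method and the representation of the REQ bus as a set of input indices are all assumptions. One more assumption covers a case the slides leave open: when every request is granted, the next scan starts just after the last granted input.

```python
class OutputScheduler:
    """Hypothetical SCH for one output port with m internal links."""

    def __init__(self, n_inputs, m_links):
        self.n = n_inputs
        self.m = m_links
        self.start = 0  # input where the next round-robin scan begins

    def grant(self, requests):
        """Return the inputs that receive an ACK in this time slot.

        requests: set of input indices whose CRT asserted REQ for this
        output port. At most m of them are granted.
        """
        granted = []
        first_denied = None
        for offset in range(self.n):
            inp = (self.start + offset) % self.n
            if inp not in requests:
                continue
            if len(granted) < self.m:
                granted.append(inp)
            elif first_denied is None:
                first_denied = inp
        if first_denied is not None:
            # Fairness rule from the slides: resume from the input that
            # was not granted in this cycle.
            self.start = first_denied
        elif granted:
            # Assumption: nobody was denied, so simply move past the
            # last granted input.
            self.start = (granted[-1] + 1) % self.n
        return granted


if __name__ == "__main__":
    sch = OutputScheduler(n_inputs=4, m_links=2)
    print(sch.grant({0, 1, 2, 3}))  # first two requesters in scan order
    print(sch.grant({0, 1, 2, 3}))  # scan resumes at the first denied input
```

With m = 1 the scheduler behaves like a plain round-robin arbiter for a single crossbar link; raising m lets more head-of-line packets leave in the same slot, which is how the architecture reduces HOLB without internal speedup.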

Class Based Input Buffer CIOQ Architecture

• Extended version of the previous architecture that supports traffic class prioritization.

• It has five logical FIFO queues in each input port, one for every priority class.

• The priority classes are named according to ATM service categories: CBR, rtVBR, nrtVBR, ABR and UBR.

• The incoming packets are classified and stored in the appropriate class queues.

• The architecture also uses two buses: REQ and ACK.

• At each output port, 5×m physical queues are needed, where m is the number of internal links.

• It also has one scheduler for each output port.

• Both input and output schedulers use a round-robin service discipline to determine the service order.
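The per-class queueing described above can be sketched in Python. Everything here is illustrative: `ClassBasedInputPort`, `enqueue`, `dequeue` and `output_queues_per_port` are hypothetical names. The slides say only that the schedulers use round-robin, so the sketch assumes a plain round-robin over the non-empty class queues. How that interacts with class priority in the actual architecture is not specified in this presentation.

```python
from collections import deque

# Class names taken from the ATM service categories listed on the slide.
ATM_CLASSES = ("CBR", "rtVBR", "nrtVBR", "ABR", "UBR")


class ClassBasedInputPort:
    """Hypothetical input port with one logical FIFO queue per class."""

    def __init__(self):
        self.queues = {name: deque() for name in ATM_CLASSES}
        self._next = 0  # class index inspected first on the next service

    def enqueue(self, service_class, packet):
        """Classify a packet by storing it in its class queue."""
        self.queues[service_class].append(packet)

    def dequeue(self):
        """Serve the next non-empty class queue in round-robin order.

        Assumption: simple round-robin across classes. Returns a
        (class, packet) pair, or None if all queues are empty.
        """
        for offset in range(len(ATM_CLASSES)):
            idx = (self._next + offset) % len(ATM_CLASSES)
            name = ATM_CLASSES[idx]
            if self.queues[name]:
                self._next = (idx + 1) % len(ATM_CLASSES)
                return name, self.queues[name].popleft()
        return None


def output_queues_per_port(m_links):
    """Physical queues needed at each output port: 5 x m."""
    return len(ATM_CLASSES) * m_links


if __name__ == "__main__":
    port = ClassBasedInputPort()
    port.enqueue("UBR", "u1")
    port.enqueue("CBR", "c1")
    print(port.dequeue())
    print(output_queues_per_port(2))
```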

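• Overview

[Figure: block diagram of the class based CIOQ switch. Input ports 1 to N, each with per-class queues (the nrtVBR queue is labeled), are connected through a REQ bus (N bits) and an ACK bus (N bits) to output ports 1 to N, each reached over links 1 to m.]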

Developed Models

• We used Arena 5.0™ to develop and implement simulation models for the SM CIOQ architectures.

• For each architecture we developed a basic model and implemented several derived models, varying the number of input-output ports (N), the number of internal links (m) and the offered load (r).

• In total, we developed 181 simulation models.

• Model example: N8M2R09 (N=8, m=2 and r=0.9), a single buffer CIOQ model.
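The naming scheme of the example can be sketched in Python. The slides give only the single example N8M2R09, so the rule used here is an assumption generalized from it: the suffix after R is read as the offered load in tenths, written with two digits. The function names `model_name` and `parse_model_name` are hypothetical.

```python
import re

# Assumed pattern: N<ports>M<links>R<load in tenths, two digits>.
_PATTERN = re.compile(r"^N(\d+)M(\d+)R(\d{2})$")


def model_name(n_ports, m_links, load):
    """Build a model name such as 'N8M2R09' from N, m and r."""
    return f"N{n_ports}M{m_links}R{round(load * 10):02d}"


def parse_model_name(name):
    """Split a model name back into (N, m, r)."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized model name: {name!r}")
    n_ports, m_links, tenths = (int(group) for group in match.groups())
    return n_ports, m_links, tenths / 10


if __name__ == "__main__":
    print(model_name(8, 2, 0.9))
    print(parse_model_name("N8M2R09"))
```

A helper like this is convenient when sweeping N, m and r to enumerate a family of derived models.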

Developed Models

• N8M2R09 Arena™ Model

[Figure: the N8M2R09 model in Arena.]

Developed Models

• N8M2R09 Block Diagram

[Figure: block diagram of the N8M2R09 model, with sections labeled Cell Generation, Regulation, Output Port Definition, Input Port Queues, Schedulers and Crossbar, and Output Ports. The diagram is built from Arena Create, Decide, Assign, Hold, Process, Delay and Dispose modules.]

Performance Evaluation

[Figure: HOLB vs. number of internal links under 90% traffic load for switch sizes N=16, N=32 and N=64. a) Our results. b) Santos-Motoyama results. Horizontal axis: internal links (1 to 10). Vertical axis: average blocking probability.]

[Figure: HOLB vs. number of internal links for a 64x64 switch under several traffic loads. a) Our results. b) Santos-Motoyama results. Horizontal axis: internal links (1 to 10). One panel plots traffic loads 0.7, 0.8, 0.9 and 1.0; the other plots ρ = 0.7, 0.8 and 0.9.]

[Figure: Mean input buffer occupation vs. m under 90% traffic load for switch sizes N=8, N=16, N=32 and N=64. a) Our results. b) Santos-Motoyama results.]

Maximum occupation for input queues under 90% traffic load.

m    N=8   N=16   N=32   N=64
3    5     7      7      9
4    3     3      4      4
5    2     2      2      3
6    2     2      2      2

[Figure: Per class mean input queue occupation vs. m for a 16x16 switch under 90% traffic load. a) Our results. b) Santos-Motoyama results. Horizontal axis: internal links. Vertical axis: average queue length. Traffic share per class: class 1 - 40%, class 2 - 20%, class 3 - 20%, class 4 - 10%, class 5 - 10%.]

[Figure: Per class mean output queue occupation vs. m for a 16x16 switch under 90% traffic load. a) Our results. b) Santos-Motoyama results. Horizontal axis: internal links (2 to 6). Vertical axis: average queue length. One curve per class, classes 1 to 5.]

Final Remarks

• We presented the modeling, simulation and performance evaluation of two Santos-Motoyama CIOQ architectures.

• We validated our results and compared them with the previous SM work.

• We showed that the studied CIOQs can reduce HOLB with a simple solution and without high speed rates inside the switch, giving a good improvement over Input Queueing both in lower queue occupation and in reduced HOLB.

• Future work includes performance evaluation under other traffic patterns, traffic classes, load situations, numbers of internal links and packet sizes (focusing on IP/MPLS/DiffServ networks).

Thank You!

alberti@inatel.br

tiao@femc.edu.br

anilton@inf.ufes.br
