
Throughput analysis of input-buffered ATM switch

I.I. Makhamreh

Indexing terms: Packet switching, ATM switches, Queueing theory

Abstract: The limiting throughput of an ATM (asynchronous transfer mode) switch with input queueing under the FCFS (first-come-first-served) scheme is 0.59. Three policies are proposed in the literature to improve the throughput of the switch: policy A (input expansion policy), policy B (window selection policy) and policy C (cell discarding policy). The author considers, for policy C, correlated traffic where all cells belonging to a burst are routed to the same output port and each traffic source is modelled by an interrupted Bernoulli process (IBP). The results for policy C show that the destination correlation does not affect the throughput, and that the throughput is higher when all input traffic is balanced. Better results for policy B are obtained compared with the simulation results found in the literature. Results show that policy A achieves the highest throughput, higher by 0.07 than that obtained by policy B. For example, with expansion size s = 8, the achievable throughput obtained by policy A is 0.94, while with window size w = 8 the throughput obtained by policy B is 0.87. The throughput obtained by policy C exceeds 0.8 when the input traffic rate is less than 0.5. The throughput for policy C is p0 = 1 − (1 − p/N)^N, whether the input is IBP or Bernoulli and whether there is destination correlation or not. Simulation data, which are available in the literature for policy B, show better agreement with the author's analytical results than with other analytical results in the literature.

1 Introduction

In a packet switch, buffering could be done at the input ports, at the output ports or at both input and output ports of the switching element. Input-buffered packet switches have lower throughput than output-buffered switches; an output-buffered switch can achieve 100% throughput. On the other hand, input buffering is done at the input controller outside the switching fabric, which imposes less implementation complexity on the switching element. For this reason, an input-buffered ATM switch might be a good candidate for high-performance switches. The performance of an input-buffered nonblocking packet switch is mainly restricted by output blocking. To alleviate the effect of head-of-line (HOL) blocking, many schemes have been suggested in the literature. The simplest discipline from a control and implementation point of view is FCFS, in which only the HOL cell of each input queue may contend for the switching array. If a HOL cell wins the contention, it is removed from the HOL position and the next cell in line moves to the HOL position. The cells which lose the contention remain at their HOL positions and contend again in the next time slot. It has been shown that the maximum achievable throughput decreases with switch size N and reaches 0.59 for very large switch size.

© IEE, 1998. IEE Proceedings online no. 19981660. Paper first received 27th September 1996 and in revised form 27th August 1997. The author is with the Electrical Engineering Department, Jordan University of Science and Technology, Irbid, Jordan.
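The saturated FCFS throughput quoted above is easy to reproduce by Monte Carlo simulation. The sketch below is not from the paper: it assumes saturation (every input always has a HOL cell), uniform destinations, and uniform tie-breaking among contenders; the function name is ours.

```python
import random

def fcfs_saturation_throughput(n_ports, n_slots=20000, seed=1):
    """Monte Carlo estimate of the saturated FCFS throughput of an
    N x N input-queued switch: each slot, every HOL cell contends for
    its output, one contender per output wins, losers keep their HOL
    cell and contend again next slot."""
    rng = random.Random(seed)
    # destination of the current HOL cell at each input
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    switched = 0
    for _ in range(n_slots):
        contenders = {}
        for inp, dest in enumerate(hol):
            contenders.setdefault(dest, []).append(inp)
        for dest, inputs in contenders.items():
            winner = rng.choice(inputs)       # one cell per output per slot
            hol[winner] = rng.randrange(n_ports)  # next cell moves to HOL
            switched += 1
    return switched / (n_ports * n_slots)

print(round(fcfs_saturation_throughput(32), 2))  # close to 0.59 for large N
```

For N = 32 the estimate already sits near the asymptotic value 2 − √2 ≈ 0.586.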

The queueing model in this paper might be applied in other contexts. The memory interference problem appears in multiprocessor systems, where N processors access M memory modules. Each processor may send a request to a memory module, so contention might occur.

In this paper, we consider three selection policies which improve the throughput of the ATM switch with input queueing: policy A (input expansion policy), policy B (window selection policy) and policy C (cell discarding policy). Among the three policies considered in this paper, policy A achieves the highest throughput. An approximate analysis of the maximum throughput as a function of window size has been carried out in [1] for an infinitely large switch size.

The interrupted Bernoulli process is suggested frequently in the literature as a traffic model. The IBP can capture bursty traffic sources because it is basically an ON/OFF process. For policy C, we model each individual input process as an IBP which alternates between busy and idle periods. We consider two routing mechanisms: correlated routing (also called correlated destination) and uncorrelated routing. In the case of correlated routing, all cells belonging to a burst are switched to the same output port. The maximum throughput of a space-division packet switch with correlated destinations is studied in [2], where it has been shown that, under the FCFS scheme, the correlation decreases the throughput from 0.59 to 0.51 when ten consecutive cells are sent to the same destination.
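The ON/OFF structure of the IBP can be sketched directly. This is a minimal generator, not the paper's code; the parameter names alpha and beta mirror the per-source rates 1/αᵢ (mean busy length) and 1/βᵢ (mean idle length) used later in the paper.

```python
import random

def ibp_arrivals(alpha, beta, n_slots, seed=0):
    """Sample an interrupted Bernoulli process: geometrically
    distributed busy periods (mean 1/alpha), during which a cell
    arrives in every slot, alternate with geometrically distributed
    idle periods (mean 1/beta) with no arrivals."""
    rng = random.Random(seed)
    busy = rng.random() < beta / (alpha + beta)  # stationary start
    slots = []
    for _ in range(n_slots):
        slots.append(1 if busy else 0)
        if busy:
            busy = not (rng.random() < alpha)  # burst ends w.p. alpha
        else:
            busy = rng.random() < beta         # burst starts w.p. beta
    return slots

cells = ibp_arrivals(alpha=0.1, beta=0.05, n_slots=100000)
print(sum(cells) / len(cells))  # near beta/(alpha+beta) = 1/3
```

The long-run arrival rate is the stationary busy probability β/(α + β), which the sample average approaches.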

2 Switch model

We model an N × N nonblocking ATM switch with input queueing (Fig. 1). Time is slotted, with one time slot equal to one cell duration. Each input port has an infinite buffer. In a time slot, if k cells are addressed to a particular output, only one cell is switched to its destination. The remaining k − 1 are stored or discarded, depending on the policy. The speed of the input port is equal to the speed of the output port and equal to one cell per slot. The switching fabric can switch at most one cell to each output during a time slot (speedup of 1).

IEE Proc.-Commun., Vol. 145, No. 1, February 1998

Fig. 1: Switch model with input buffers (N × N nonblocking packet switch)

3.1 Policy A
In this scheme, each input port is expanded into s ports before cells enter an asymmetric Ns × N switch. In a time slot, up to s cells from each input queue can be presented to the switch fabric for contention.

In the following analysis we follow [3], which analyses the FCFS policy, and we keep its notation. Here we include the modifications needed to account for the expansion policy. We assume random traffic and that there are always cells at the input queues waiting for transmission (saturation analysis). This policy clearly provides higher throughput than the HOL selection policy because it decreases the effect of HOL blocking. The disadvantage of the expansion policy is that the switch controller has to search a larger space in order to select a cell for transmission. As the space increases, such a search might become difficult, both in terms of time and circuitry.

Let B_m^i be the number of cells blocked at the mth time slot and destined to output i, found by searching the Ns HOL cells. Also, let A_m^i be the number of cells arriving at a free input port and destined to output i at the mth time slot. We have

B_m^i = max(B_{m-1}^i + A_m^i − 1, 0)   (1)

As N → ∞ and s = 1, A_m^i becomes Poisson distributed with mean p0 (see [3]). In the same spirit, we assume that this holds for s greater than 1.

Eqn. 1 represents an M/D/1 queue, which has a mean queue length given by

B̄^i = p0² / (2(1 − p0))   (2)

The number of free input queues at the end of the (m − 1)th time slot, representing the total number of cells transmitted through the switch during the (m − 1)th time slot, is given by

F_{m-1} = Ns − Σ_{i=1}^{N} B̄^i   (3)

Notice that N is multiplied by s, the expansion size. The throughput of output port i is then

p0 = F_{m-1} / N   (4)

From solving eqns. 2, 3 and 4 we get

p0² − 2(s + 1)p0 + 2s = 0   (5)

Therefore, the throughput is given by

p0 = s + 1 − √(s² + 1)   (6)

Table 1 shows the throughput for several expansion sizes obtained by eqn. 5. Notice that, with s = 1, the policy corresponds to input queueing with FCFS buffers. Referring to Table 1, we can see that the throughput increases with s and that this policy already gives improved performance with small values of s. An expansion size of 8 provides a throughput of about 0.94.

Table 1: Switch throughput against expansion size s obtained by analysis

Exp. size s   1     2     3     4     5     6
Formula       0.59  0.76  0.84  0.88  0.90  0.92
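The closed form of eqn. 6 reproduces Table 1 directly; a short check (the function name is ours):

```python
from math import sqrt

def policy_a_throughput(s):
    """Saturation throughput of the input expansion policy,
    p0 = s + 1 - sqrt(s**2 + 1)  (eqn. 6), for expansion size s."""
    return s + 1 - sqrt(s * s + 1)

for s in range(1, 9):
    print(s, round(policy_a_throughput(s), 2))  # s = 1 gives 0.59, s = 8 gives 0.94
```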

3.2 Policy B
A higher switch throughput can also be achieved by relaxing the strict FCFS discipline at the input buffers. Each input still sends at most one cell into the switch fabric per time slot, but not necessarily the first cell in its queue, and no more than one cell is allowed to pass through the switch fabric to each output. Consider a scheme where, at the beginning of each time slot, up to the first w cells (w is called the 'window size') in each input queue sequentially contend for access to the switch outputs. The cells at the heads of the input queues contend first. Those inputs not selected to transmit the first cells in their queues then contend with their second cells for access to any remaining idle outputs, and so on. A window size of w = 1 corresponds to input queueing with FCFS buffers.
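The round-by-round contention described above can be sketched for a single time slot. This is a simplified, hypothetical implementation (ties broken uniformly at random; names are ours), not the paper's algorithm text.

```python
import random

def window_match(queues, w, seed=None):
    """One time slot of the window selection policy: in round r
    (r = 0..w-1), each input not yet matched offers the (r+1)th cell
    of its queue to that cell's output, provided the output is still
    idle; each contended output accepts one contender at random.
    Returns {input_index: queue_position_of_cell_sent}."""
    rng = random.Random(seed)
    matched_in, busy_out, sent = set(), set(), {}
    for r in range(w):
        offers = {}
        for i, q in enumerate(queues):
            if i in matched_in or r >= len(q) or q[r] in busy_out:
                continue
            offers.setdefault(q[r], []).append(i)
        for out, contenders in offers.items():
            winner = rng.choice(contenders)
            matched_in.add(winner)
            busy_out.add(out)
            sent[winner] = r
    return sent

# three inputs; each inner list holds the destinations of the first w cells
print(window_match([[0, 1], [0, 2], [0, 1]], w=2, seed=3))
```

Note that each input sends at most one cell and each output receives at most one, as the policy requires.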

This policy clearly provides higher throughput than the FCFS scheme because it decreases the effect of HOL blocking. With s = w, policy A (expansion policy) provides higher throughput than the window selection policy because the first s cells from each input queue are available in a time slot for contention.

The disadvantage of the window selection policy over the FCFS scheme is that the switch controller has to search a larger space in order to select a cell for transmission. As w increases, such a search might become difficult, in terms of time and circuitry.

We have noticed, by comparing the switch throughput of policy A and the available simulation results [4] for the window selection policy, that the throughput obtained by policy A is always greater by about 0.07 than the throughput obtained by the window selection policy. This observation provides us with a formula to calculate the throughput obtained by the window selection policy. The throughput is given by (with s = w)

p0 = s + 1 − √(s² + 1) − 0.07   (7)

Table 2 shows the throughput for several window sizes obtained by analysis and by simulation. The simulation results are taken from [4]. The results show very good agreement between the analysis and the simulation. It is worth stating that the results obtained by the above formula are closer to the simulation data than those obtained by a formula found in [1].



Table 2: Switch throughput against window size w obtained by simulation and formula

Window size  1     2     3     4     5     6
Simulation   0.59  0.70  0.76  0.80  0.83  0.85
Formula      0.59  0.69  0.77  0.81  0.83  0.85
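Eqn. 7 can likewise be evaluated against the formula row of Table 2 (function name ours; the 0.07 offset is the empirical gap observed above, so the formula is meaningful for w ≥ 2):

```python
from math import sqrt

def policy_b_throughput(w):
    """Empirical formula of eqn. 7 for the window selection policy:
    the policy A throughput with s = w, minus the observed 0.07 gap."""
    return w + 1 - sqrt(w * w + 1) - 0.07

for w in range(2, 9):
    print(w, round(policy_b_throughput(w), 2))  # w = 8 gives 0.87
```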

Referring to Table 2, notice that the throughput increases with w and that this policy gives improved performance with small values of w. A window size of 8 provides throughput of about 0.87.

3.3 Policy C: the case of the interrupted Bernoulli process
In this policy, whenever k cells are addressed to a particular output port in a time slot, only one can be switched to the output port. The remaining k − 1 cells are simply discarded. Thus, the HOL blocking is totally removed. In each time slot, the new arrivals compete alone, without any effect from previous arrivals.

3.3.1 Routing mechanism 1: Each input traffic source to the switch is modelled by an interrupted Bernoulli process (IBP). The IBP for source i alternates between two geometrically distributed periods: a busy period with mean 1/α_i and an idle period with mean 1/β_i. During a busy period, cells arrive in every time slot, while during an idle period no cells arrive. This type of traffic model is usually preferred to a random traffic model (Bernoulli traffic) because it captures the bursty characteristic of the arrival traffic. We assume that cells belonging to a burst are routed randomly (uncorrelated routing, which we call routing mechanism 1) with routing probability r_ij from input port i to output port j, with Σ_j r_ij = 1. For symmetric routing, r_ij = 1/N.

Now let k_i be the number of cells arriving in a time slot whose destination is output port i, which is associated with a transition matrix W. The algorithm to construct the transition matrix W (and, with some modification, W*) is given below and can be found in [5]. The throughput of the output is

p = 1 − π_W(0)   (8)

where π_W(k_i) is the steady-state probability of k_i, which can be obtained by solving the following linear system of equations:

Π_W = Π_W W   (9)

which is of size N + 1.
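The linear system of eqn. 9 can be solved by any standard method; below is a minimal power-iteration sketch in pure Python. The 2-state matrix is a toy stand-in for W (it is not from the paper), used only to show the mechanics.

```python
def stationary(P, iters=2000):
    """Left stationary vector of a row-stochastic matrix P, i.e. the
    solution of pi = pi P, found by repeated left-multiplication
    (power iteration).  P is a list of rows."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# toy 2-state chain standing in for the (N+1)-state matrix W
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(P)
print([round(x, 3) for x in pi])  # -> [0.833, 0.167]
```

With the real (N + 1)-state W, the throughput of eqn. 8 is then 1 − pi[0].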

3.3.2 Routing mechanism 2: In this routing mechanism, we assume that cells belonging to a burst are switched to the same output port, say i (we call this destination correlation). Let k_i be the number of cells that arrive in a time slot whose destination is output port i, which is associated with a transition matrix W*. The steady-state distribution vector π_{W*}(k_i) of W* can be obtained by solving the following linear set of equations:

Π_{W*} = Π_{W*} W*   (10)

The throughput of the output is

p = 1 − π_{W*}(0)   (11)

Aggregation of input traffic (constructing the transition matrix W*): For 1 ≤ j, k ≤ N, define the transition matrix Q_i for source i as:


Q_i[(1, j), (1, j)] = 1 − α_i
Q_i[(1, j), (0, j)] = α_i
Q_i[(1, j), (1, y)] = 0,  y ≠ j
Q_i[(0, j), (0, j)] = 1 − β_i
Q_i[(0, j), (1, k)] = β_i r_ik

Notice that the equation Q_i[(1, j), (1, y)] = 0 for y ≠ j forces all cells in a burst to be switched to the same output port. Also define π_i(x_i, y_i) as the steady-state probability of the state (x_i, y_i) for the ith source, so that π_i(0, j) = (1 − p_i) r_ij and π_i(1, j) = p_i r_ij for i ∈ {1, 2, ..., N} and j ∈ {1, 2, ..., N}.
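The transition rules above can be checked mechanically. The sketch below builds Q_i for one hypothetical source; the values of alpha, beta and the routing vector r are illustrative, not from the paper.

```python
def build_Q(alpha, beta, r):
    """Transition matrix Q_i for one IBP source under destination
    correlation (routing mechanism 2).  States are pairs (x, y):
    x = 1 busy / 0 idle, y = current output port; r[k] is the
    routing probability to output k (the r[k] sum to 1)."""
    N = len(r)
    states = [(x, y) for x in (0, 1) for y in range(N)]
    idx = {st: n for n, st in enumerate(states)}
    Q = [[0.0] * len(states) for _ in states]
    for (x, y) in states:
        row = Q[idx[(x, y)]]
        if x == 1:                        # busy
            row[idx[(1, y)]] = 1 - alpha  # burst continues, same output
            row[idx[(0, y)]] = alpha      # burst ends
        else:                             # idle
            row[idx[(0, y)]] = 1 - beta   # stays idle
            for k in range(N):            # new burst picks output k
                row[idx[(1, k)]] += beta * r[k]
    return Q

Q = build_Q(alpha=0.2, beta=0.1, r=[0.25] * 4)
print(all(abs(sum(row) - 1) < 1e-12 for row in Q))  # rows sum to 1 -> True
```

The zero entries from busy state (1, y) to any (1, y') with y' ≠ y are exactly the destination-correlation constraint.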

The aggregated transition matrix W* can be calculated as follows. First, the steady-state probabilities of the aggregated source, π_{W*}(t), t = 0, 1, ..., N, can be calculated in 2N² operations by multiplying out recursively the following equation:

Σ_{t=0}^{N} π_{W*}(t) s^t = Π_{i=1}^{N} [(1 − p_i) + p_i(1 − r_ij) + p_i r_ij s]   (12)
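Eqn. 12 amounts to multiplying N linear polynomials, one per source, which is the recursive 2N²-operation procedure mentioned above. A sketch (the p_i and r_ij values are illustrative, not from the paper):

```python
def aggregate_pgf(p, r_j):
    """Coefficients pi_{W*}(t), t = 0..N, of eqn. 12: the PGF of the
    number of sources simultaneously sending a cell to tagged output j
    is the product of N linear factors.  p[i] is the busy probability
    of source i; r_j[i] is its routing probability to output j."""
    coeff = [1.0]
    for p_i, r_ij in zip(p, r_j):
        a = (1 - p_i) + p_i * (1 - r_ij)  # idle, or busy but aimed elsewhere
        b = p_i * r_ij                    # sending a cell to output j
        nxt = [0.0] * (len(coeff) + 1)
        for t, c in enumerate(coeff):     # multiply by (a + b*s)
            nxt[t] += a * c
            nxt[t + 1] += b * c
        coeff = nxt
    return coeff

N = 8
pi_star = aggregate_pgf([0.5] * N, [1.0 / N] * N)
print(round(1 - pi_star[0], 4))  # throughput p = 1 - pi_{W*}(0) -> 0.4033
```

For uniform routing and identical rates p, the constant coefficient collapses to (1 − p/N)^N, consistent with the closed form reported later in the paper.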

The above equation calculates the probability generating function of the induced steady-state probabilities π_{W*}(t) by multiplying out the steady-state probability generating functions of the N input traffic processes, which are assumed to be independent. Secondly, the probability generating function of the induced transition probabilities W*(i, i') can be calculated by multiplying out the N transition probability generating functions of the input traffic processes (which are independent processes), as given by the following equation:

Σ_i Σ_{i'} π_{W*}(i) W*(i, i') s^i t^{i'} = Π_{n=1}^{N} [ Σ_{(x,y)} Σ_{(x',y')} π_n(x, y) Q_n[(x, y), (x', y')] s^{1[(x,y)=(1,j)]} t^{1[(x',y')=(1,j)]} ]   (13)

where the states with (x, y) ≠ (1, j) form the set F, those with (x', y') ≠ (1, j) form the set F', and 1[·] equals 1 when its argument holds and 0 otherwise; a source contributes a factor s (respectively t) exactly when it is sending a cell to output j in the current (respectively next) slot. It takes roughly 4N² calculations to get the coefficients of s^i t^{i'}. Now, by dividing these coefficients by π_{W*}(i) from eqn. 12, we get the transition probabilities W*(i, i') of the matrix W*.

In the following we provide some numerical results, where we assume uniform routing (i.e. routing probabilities r_ij = 1/N) and a switch size of 8 (higher switch sizes are possible in the analysis because the linear system of equations is of size N + 1).



To find the effect of the burst length on the throughput, Table 3 is developed. It is clear from Table 3 that increasing the burst length has no effect on the throughput. The explanation is that, since in policy C we do not queue the incoming cells (unswitched cells are dropped), the correlation in the traffic does not affect the steady-state process of switching cells: the probability that a large number of cells arrive for the same destination in the next time slot is unchanged by how many arrived in the current slot. To find the effect of the traffic rate on the normalised throughput (the throughput divided by the traffic rate), we develop Table 4.

Table 3: Effect of burst length on throughput

Policy C (switch size N = 8; input traffic rate p = 0.1)

Burst length (cells)    5      10     20     30     100
Normalised throughput   0.957  0.957  0.957  0.957  0.957

From Table 4, it is clear that the normalised throughput decreases with the traffic rate and that the saturated throughput when the traffic rate p → 1 is about 0.656. This result tells us that a switch operating under policy C performs better when the traffic rate is, say, less than or equal to 0.5, where the throughput is greater than 0.8. Table 4 also shows that the throughput obtained by policy C is better than the FCFS scheme, whose throughput saturates at 0.59 for large switch size. It can be seen, by comparing the above result with that obtained by considering Bernoulli input, that the throughput is given by p0 = 1 − (1 − p/N)^N, where p is the steady-state probability that a time slot contains a cell. This result holds whether the input is IBP or Bernoulli and whether the routing mechanism is 1 or 2.
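The closed form p0 = 1 − (1 − p/N)^N reproduces the Table 4 entries to within rounding; a short check (function name ours):

```python
def normalised_throughput(p, N=8):
    """Policy C normalised throughput p0/p, with p0 = 1 - (1 - p/N)**N
    and p the per-input traffic rate."""
    return (1 - (1 - p / N) ** N) / p

for p in (0.1, 0.3, 0.5, 0.7, 0.9, 0.999):
    print(p, round(normalised_throughput(p), 4))  # p = 0.5 gives 0.8066
```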

3.4 Policy C: the case of the Bernoulli traffic process
The result obtained in this subsection is in [3], but we include it here for continuity. For a Bernoulli traffic input process (BP), which is the limiting case of the IBP, the steady-state distribution of k is

π(k) = (N choose k) (p/N)^k (1 − p/N)^{N−k},  k = 0, 1, ..., N   (14)

The throughput of an output port is

p0 = 1 − π(0) = 1 − (1 − p/N)^N

For N → ∞ the throughput is p0 = 1 − e^{−p}. If we let p → 1, the maximum throughput obtained is 0.63.
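The binomial distribution of eqn. 14 and the large-N limit can be checked with the standard library (function name ours):

```python
from math import comb, exp

def pi_bernoulli(k, p, N):
    """Eqn. 14: binomial probability that k cells are addressed to a
    given output in a slot, under Bernoulli input of rate p."""
    return comb(N, k) * (p / N) ** k * (1 - p / N) ** (N - k)

N, p = 8, 1.0
print(round(1 - pi_bernoulli(0, p, N), 3))  # finite-N saturated value -> 0.656
print(round(1 - exp(-1), 3))                # N -> infinity limit -> 0.632
```

Note that the finite-N value 0.656 is exactly the policy C saturation figure of Table 4.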

4 Conclusions

We have considered three policies which improve the throughput of the ATM switch with input queueing: policy A (input expansion policy), policy B (window selection policy) and policy C (cell discarding policy). For policy C, we have considered an interrupted Bernoulli traffic process and the Bernoulli traffic process, which is a limiting case of the IBP. The results show, for policy C, that the destination correlation does not affect the throughput. The throughput is given by p0 = 1 − (1 − p/N)^N, where p is the steady-state probability that a time slot contains a cell. This result holds whether the input is IBP or Bernoulli and whether the routing mechanism is 1 or 2. Among the three policies considered in the paper, policy A achieved the highest throughput, higher by 0.07 than that obtained by policy B. For example, with expansion size s = 8, the achievable throughput obtained by policy A was 0.94; with window size w = 8, the throughput obtained by policy B was 0.87. Simulation data available in the literature for policy B showed very good agreement with the analytical results.

5 Acknowledgments

The author would like to thank Dr D. McDonald and Dr N.D. Georganas for their helpful discussions related to the subject while he was a PhD student at the University of Ottawa. He also thanks Dr Anil Gupta for providing his PhD thesis, which contains a good review of the subject.

6 References

1 LEE, T.T.: ‘A modular architecture for very large packet switches’, IEEE Trans. Commun., 1990, 38, (7), pp. 1097-1106

2 CAO, X.-R.: ‘The maximum throughput of a nonblocking space- division packet switch with correlated destinations’, IEEE Trans. Commun., 1995, 43, (5), pp. 1898-1901

3 KAROL, M.J., HLUCHYJ, M.G., and MORGAN, S.P.: 'Input versus output queueing on a space-division packet switch', IEEE Trans. Commun., 1987, 35, (12)

4 HLUCHYJ, M.G., and KAROL, M.J.: 'Queueing in high-performance packet switching', IEEE J. Sel. Areas Commun., 1988, 6, (9), pp. 1587-1597

5 MAKHAMREH, I.I., MCDONALD, D., and GEORGANAS, N.D.: 'Approximate analysis of a packet switch with finite output buffering and imbalanced correlated traffic'. Proceedings of IEEE ICC'94, New Orleans, 1994, session 330.2


Table 4: Normalised throughput against traffic rate

Policy C (switch size N = 8; burst length = 30)

Traffic rate p          0.1    0.3    0.5     0.7     0.9     0.999
Normalised throughput   0.957  0.878  0.8066  0.7418  0.6834  0.6566


