# A Fast Parallel-Tree Switch Architecture for ATM Networks

Mayez Al-Mouhamed, Habib Youssef, and Wasif Hasan \*

#### Abstract

In this paper we present a novel fast packet switch architecture based on Banyan interconnection networks, called *Parallel-Tree Banyan Switch Fabric* (PTBSF). It consists of parallel Banyans (multiple outlets) arranged in a tree topology. The packets enter at the topmost Banyan. Internal conflicts are eliminated by using a conflict-free  $3 \times 4$  switching element which distributes conflicting cells over different Banyans. Thus, cell loss may occur only at the lowest Banyan. Increasing the number of Banyans leads to noticeable decrease in the cell loss rate. The switch can be engineered to provide arbitrarily high throughput and low cell loss rate without the use of input buffering nor cell pre-processing. The performance of the switch is evaluated analytically under uniform traffic load and by simulation under a variety of ATM traffic loads. Compared to other proposed architectures, the switch exhibited stable and excellent performance with respect to cell loss and switching delay for all studied conditions as required by ATM traffic sources. The advantages of PTBSF are modularity, regularity, self-routing, low processing overhead, high throughput and robustness under a variety of ATM traffic conditions.

Keywords: ATM switch architecture, ATM traffic, multistage networks, performance evaluation, simulation, traffic modeling.

# 1 Introduction

Modern telecommunication networks are evolving at a fast pace. Many newly emerging applications have diverse service features and require huge bandwidth.

In 1995, nearly 80% of the computers sold are multimedia capable as well as networking capable. Furthermore, there has been a sizable growth in the number of distributed applications (groupware, video conferencing, the World Wide Web, etc.). There is a general consensus in the computer networking community that current networks are ill equipped to accommodate this major influx of a variety of new services and the tremendous increase in bandwidth requirements. Broadband ISDN is one networking technology that can efficiently provide all of these new services. Efficiency is due to the adoption of the Asynchronous Transfer Mode

<sup>\*</sup>The authors are with the Department of Computer Engineering, College of Computer Science and Engineering, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia.

(ATM) as the transfer mode of B-ISDN. Furthermore, ATM combines the good features of circuit switching and packet switching techniques.

ATM technology is rapidly gaining ground. A number of ATM switches are already available in the market. Tremendous efforts are currently underway to resolve questions related to ATM signalling standards, traffic control, switch architecture, etc.

ATM values itself as a technology capable of scaling up smoothly and efficiently to increased demands for higher transmission speeds and larger transfer loads. It is limited only by the technology barriers of current transmission media. Therefore, an ATM switch should always be capable of forwarding packets (ATM cells) at higher speeds than the cell arrival rates to the switch. Moreover, the switch should be capable of supporting diverse traffic types and accommodating bit rates of the order of 100 Mbps and higher per input port. Thus, tremendous efforts are being made in order to design and implement a switch which is capable of switching cells at an extremely high rate and can handle diverse traffic types with least delay and high throughput (small cell loss rate) [1]. Our work is an effort in this direction.

Banyan networks, a class of multistage interconnection networks, are among the most favored building blocks for fast packet switches. Banyan networks possess several desirable characteristics such as space division, self routing, low hardware complexity, and regular structure which make them suitable for VLSI implementation. However, they suffer from two major drawbacks which are the potential of internal and external blocking which severely limits the throughput.

Several strategies have been suggested in order to overcome these problems. One is to use internal buffering as employed in the Buffered-Banyan networks [2]. But this increases the switch complexity (and hence large hardware overhead) and head of the line (HOL) queuing delays. In order to avoid this, either the internal links may be speeded up [3] or a distribution network may be installed in front of the switch to randomly distribute the traffic over the input ports and thus reduce blocking. The effect of HOL blocking can be decreased by allowing other queued packets to bypass the blocked HOL packet [4]. However, this leads to sequencing problem. Internal blocking can be avoided totally if we use a sorting network in front of the routing network. This has been done in the Batcher-Banyan network [5]. But the problem here is the complexity of the sorting network. The number of stages within the sorter is  $(N/4) * ((Nlog_2N)^2 + log_2N)$ , where N is the number of inputs [6]. Moreover output conflicts are still there.

The Starlite switch [7] employs recirculation of the conflicting packets back to the inputs which results in sequencing problems. Moreover, the necessity for feedback links restricts the size of the network. To overcome the sequencing problem, an aging (time stamp) mechanism gives the old packets priority for inclusion in the next batch of cells to be sorted. Schemes like three phase algorithm [8] and token passing [9] have been proposed in order to resolve external conflicts before any packet is sent. This involves a lot of processing overhead.

The Sunshine switch [10] incorporates several features in order to increase the throughput such as parallel Banyans, a sorting network, and recirculation. Though it succeeds in achieving high throughput, it involves considerable pre-processing overhead. Moreover, recirculation again restricts the size of the network in addition to causing sequencing problems.

An alternative strategy [11] to increase throughput is to distribute the incoming traffic

over parallel Banyans so that the routed packets are forwarded to the corresponding output buffers. Here the throughput increases slowly with the number of planes because internal blocking is still there. Moreover the complexity of the input and output port controllers increases significantly. The throughput of parallel Banyans [12] can be improved when separate control plane and data planes are used together with input buffering for conflicting cells. Higher throughput can be obtained at the cost of increased cell delay and delay jitter especially when the load is high.

Another switch architecture employing multiple Banyans is the tandem Banyan switching fabric (TBSF) [13]. This architecture consists of arranging Banyans in series one after the other, with the output of one Banyan connected to the input of the next and to the corresponding output buffer. When a conflict occurs, one of the packets is routed correctly while the other is misrouted. At the output, those packets which have been routed properly are distinguished from those that have been misrouted. The correctly routed packets go to the corresponding output buffers while the misrouted packets go into the next Banyan in series and are given another chance. The main drawback of this architecture is the possibility of packets getting out-of-sequence and the large delay jitter in routing the packets. Moreover, when one of the conflicting packets is misrouted, this packet has to be routed from the first stage onwards in the succeeding Banyan. Thus the advantage gained in routing the packet correctly up-to a certain stage is lost.

The piled Banyan switching fabric (PBSF) [14] also employs multiple Banyan networks arranged one above the other. Switching elements are  $4 \times 4$  except in the highest and lowest layers. Packets enter the topmost Banyan. The packets at the output of each Banyan go to the corresponding output buffer. If two packets conflict in some Banyan, one of the conflicting packets is routed to the correct output while the other is routed to the corresponding switching element in the Banyan plane in the next lower level. If three packets contend for the same output in a switching element, the packet at the vertical input is routed to the correct output, one of the packets at the horizontal inputs is routed vertically downward, while the third one is lost. Thus, in piled Banyan, loss can occur at any level including the lowest. Hence, the throughput of the piled Banyan remains constant at 98% under full load even when large number of Banyans is used.

In this paper, we present a Banyan based switch architecture called the *Parallel-Tree* Banyan Switching Fabric (PTBSF), which has the following desirable characteristics:

- 1. The switch has simple control, self-routing, no internal buffering, and no recirculation to minimize switching delay and reduce hardware as possible.
- 2. The switch is scalable. Its performance scales up well with the addition of reasonable extra hardware. The cell loss rate must meet ATM conditions.
- 3. The switch has a regular architecture to facilitate VLSI implementation.

The paper is organized as follows. In Section 2 we describe the proposed switch architecture, operations, hardware complexity, and switching delay. In section 3, we discuss the switch performance under uniform traffic conditions. Then in section 4, we study the switch performance under a simulated ATM workload. We conclude in section 5.



Figure 1: The parallel-tree switch architecture



Figure 2: Switching elements

# 2 The parallel-tree Banyan switch fabric

The parallel-tree Banyan switching fabric (PTBSF) is a fast packet switch architecture employing Banyan networks. It is a space division architecture which consists of arranging Banyan networks in a *parallel-tree structure* from layer 2 onwards as shown in Figure 1.

Each Banyan network within the switch consists of  $3 \times 4$  switching elements whose structure is shown in Figure 2-a.

Each  $3 \times 4$  switching element (SW) has two horizontal inputs ( $I_0$  and  $I_1$ ) and one vertical input ( $I_v$ ). There are 2 horizontal outputs ( $O_0$  and  $O_1$ ) and 2 vertical outputs ( $O_{v0}$  and  $O_{v1}$ ). The packets arrive at the inputs of the topmost Banyan. When a conflict occurs between two packets in the first level, one of the packets is routed correctly, while the other is routed vertically downward to the corresponding switching element in the next lower level.

If two cells arrive at a switching element, destined to the same output, one cell is routed

to the correct output while the other is routed through  $O_{v0}$  to a switching element in the same position in the next layer Banyan. From level two onwards, there can be three cells at a switch input requesting the same output. In this case, the switch routes one cell to  $O_0$  or  $O_1$  and the other two cells are routed vertically through the outputs  $O_{v0}$  and  $O_{v1}$  to the corresponding switching elements in the next lower layer Banyans. Therefore, internal conflicts are eliminated by locally distributing the conflicting cells over different Banyans. This strategy alleviates the problem of partitioning the input cells into non-conflicting groups in a completely distributed manner without pre-processing. This arrangement guarantees conflict-free routing without loss of cells regardless of the number of conflicting cells.

Based on the above routing strategy, the throughput on  $O_{v0}$  is higher than that of  $O_{v1}$ . Hence some load balancing must take place. A cell being routed vertically downwards, can be routed either to the left or to the right Banyan plane in the next lower level in the tree structure. The choice depends on the position of the switching element in the plane where conflict has occurred. The N/2 switches of each stage of every Banyan plane are numbered from 0 to N/2 - 1. The outputs  $O_{v0}$  ( $O_{v1}$ ) of all even (odd) numbered switches of a given Banyan plane are connected the left (right) Banyan plane of the next lower plane. This static strategy, which does not require additional hardware, helps achieve load balancing in terms of distributing traffic evenly to the Banyan planes in the tree structure. Thus, the routing algorithm used by the switch achieves two objectives:

- Cell loss occurs only at the vertical outputs of the lowest layer, and
- forwarding the cell vertically downward instead of misrouting it, preserves the routing achieved up to that stage. In other terms, the cell continues routing from the same stage in the next level of Banyan.

From second layer onwards, there are no cells at the horizontal inputs in the first stage. Hence, there can be no cells forwarded vertically downward at the first stage. Thus, from level three onwards, one stage reduces at each level for all the Banyan planes in that level. Each Banyan network, from layer two onwards, is made up of two types of switching elements. The first stage consists of  $1 \times 2$  *Demultiplexer* (DM) (see Figure 2-b), while the other stages consist of  $3 \times 4$  switching elements (SW).

At each level, the SWs in the last stage of each Banyan have the capability to route two packets to the same output as shown by the dual outputs in Figure 1. Thus, if two or more cells arrive at a switching element in the last stage destined to the same output, two cells can be routed to the correct output in that level itself. This reduces the number of cells traveling to the lower layers and hence reduces cell loss. The outputs of the Banyan networks in all the levels are connected to the corresponding output buffer. Appendix A presents the complexity analysis of PTBSF, TBSF, and PBSF hardware and their interconnection.

### 2.1 Switching delay characteristic

In the following, we analyze the switching delay in each of PTBSF, PBSF, and TBSF. The switching element SW in PTBSF is a  $3 \times 4$  switch which indicates that its state is a boolean function of the form  $ST_{PTBSF} = \sum d_2 d_1 d_0 pr_2 pr_1 pr_0$ , where  $d_2$ ,  $d_1$ , and  $d_0$  are the destination bits of the incoming cells and  $pr_2$ ,  $pr_1$ , and  $pr_0$  are the corresponding cell priorities. Using

dual input gates, the number of gate levels  $(n_{gate})$  required to implement a sum of product  $\Sigma b_1.b_2....b_v$  is  $n_{gate} = log_2(2^{n_v} \times n_v)$ , where  $n_v$  is the number of boolean variables per *minterm*. Evaluating the state of a switch in the PTBSF requires  $log_2(2^6 \times 6)$  gate delays which gives 8 gate levels. Also, one latch is needed to store the state. The state of the  $4 \times 4$  switch of the PBSF is of the form  $ST_{PBSF} = \Sigma d_2 d_1 d_0 pr_2 pr_1 pr_0$  for each pair of horizontal and vertical outputs, which indicates that 8 gate levels and one latch are needed for each pair. The longest path from switch input to output buffers in each of PTBSF and PBSF consists of n + L - 1 switching elements when there are  $N = 2^n$  inputs and L levels. Therefore, the approximate maximum switching delay of PTBSF and PBSF is,

$$T_{PTBSF} = T_{PBSF} = (n + L - 1)(8T_{gate} + T_{latch})$$

$$\tag{1}$$

where  $T_{gate}$  and  $T_{latch}$  are the gate and latch delays, respectively.

Finally, the state of the  $2 \times 2$  switch of TBSF is of the form  $ST_{TBSF} = \Sigma d_1 d_0 pr_1 pr_0$  which indicates that  $log_2(2^4 \times 4) = 6$  gate levels and one latch are needed. The longest path from the switch input to output buffers for TBSF with L Banyans in tandem consists of nL switching elements. Therefore, the delay in switching TBSF is,

$$T_{TBSF} = nL(6T_{gate} + T_{latch}) \tag{2}$$

Therefore, the switching delay of PTBSF (as well as that of PBSF) grows linearly with n + L, whereas the delay of TBSF grows with the product  $n \times L$ . Hence, the delay performance of TBSF degrades much more rapidly with the switch size than that of PTBSF.

For example, let us assume for simplicity that a gate or latch delay is equal to one time unit in the Equations 1 and 2. Using the same examples as above, to achieve a cell loss of  $10^{-6}$  under uniform traffic with N = 64, we must have L = 10 for TBSF [13] and L = 5(or 6.2 effective SW banyans) for PTBSF (Figure 5). Under these conditions,  $T_{TBSF} = 360$ and  $T_{PTBSF} = 80$  gate delays if one excludes the latch delay. If we increase N to 256, then  $T_{TBSF} = 576$  and  $T_{PTBSF} = 104$  gate delays. Hence, the switching delay of PTBSF is much smaller due to its parallel banyan structure.

#### 2.2 Features of the PTBSF

Below, we contrast our architecture (PTBSF) with the tandem architecture (TBSF) and the piled architecture (PBSF).

There are several advantages in arranging Banyan networks in a parallel tree structure.

- 1. In TBSF, once a cell is misrouted at some stage due to a conflict, it has to be routed from first stage onwards in the next Banyan in series. In contrast, in PTBSF, when a conflict occurs at some stage, one of the conflicting cells is routed vertically downward to the corresponding switching element in a Banyan in the next lower level. Therefore, the vertically forwarded cell continues being routed from the same stage in the next level of Banyan. Hence, the cell routing delay and delay jitter are much smaller in the PTBSF than in the TBSF (see subsection 2.1).
- 2. In PTBSF, there is no cell loss except in the lower layer contrary to the PBSF where the loss can occur at any layer except the first. The throughput of the PBSF reaches

some saturation because losses in the upper layers cannot be compensated by adding additional levels. But in PTBSF, adding more layers leads to a considerable improvement in throughput, which is an indicator of good performance scalability as opposed to the saturation effect of the PBSF.

3. In the PTBSF, there is no pre-processing delay and the cells are switched *on the fly*. Unlike TBSF, there is no activity and conflict bit processing which is required in order to distinguish correctly routed cells from misrouted cells within the switching elements as well as at the output of each Banyan. The decision to route a cell to a particular output in the PTBSF is based only on the destination and priority bits of the given cell.

In the following sections, we shall provide analytical and simulation results in support of the above discussion.

### **3** Evaluation under uniform traffic

In this section we study the cell loss probability of PTBSF, TBSF, and PBSF under uniform traffic pattern. We assume a time slotted synchronized operation of the switch, where the slot size is greater than or equal to the switch processing time. Cells must be synchronized and aligned with the local slot boundaries before being routed by the switch. The workload assumptions are as follows. All switch inputs are identical and independent. In every time slot, each input in the first stage of the first level has a probability p of having a cell and 1-p of having no cell. We also assume that the activity level at the various inputs are independent in time as well as in space. The cell destinations are uniformly distributed over all the switch outputs.

In the following subsections, we provide analytical expressions for the cell loss probability, which we also validate by simulation results. The results are obtained while adopting an Omega topology for the Banyan networks. In an *n*-stage Omega network, the  $X_{in}$ th output link from any stage is connected to the  $X_{out}$ th input of next stage such that  $X_{out} = \sigma(X_{in})$ , where  $\sigma()$  is the perfect shuffle permutation defined by  $\sigma(x_{n-1}x_{n-2}\ldots x_0) = x_{n-2}\ldots x_0x_{n-1}$ .

### 3.1 Analytical model of the PTBSF

We derive analytical expressions for the cell loss probability of the PTBSF, PBSF and TBSF, assuming uniform traffic at the switch input. We refer the reader to the switches shown in Figure 2.

For the case of PTBSF, assume a cell is issued into either inputs  $I_0$  or  $I_1$  of an SW-switch with a probability p, and that the probability of a cell on the vertical input  $I_v$  is q. Since the SW-switch has three inputs, we analyze the following cases:

- 1. There is one cell at the horizontal input  $I_0$  or  $I_1$  with probability p(1-p)(1-q) or at the vertical input  $I_v$  with probability  $(1-p)^2 q$ , or
- 2. there are two cells at  $I_0$  and  $I_1$  with probability  $p^2(1-q)$  or at  $I_0$  (or  $I_1$ ) and  $I_v$  with the probability p(1-p)q, or

| Switch | Cells       | $P(O_0)$ or $P(O_1)$               | P(O <sub>VO</sub> )            | P(O <sub>v1</sub> ) | P(loss)            |
|--------|-------------|------------------------------------|--------------------------------|---------------------|--------------------|
| PTBSF  | One cell    | (1-p)(1-q)p+(1-p) <sup>2</sup> q/2 | 0                              | 0                   | 0                  |
|        | Two cells   | 3(1-q)p <sup>2</sup> /4+3(1-p)pq/4 | (1-q)p <sup>2</sup> /2+(1-p)pq | 0                   | 0                  |
|        | Three cells | 7p <sup>2</sup> q/8                | p <sup>2</sup> q               | p <sup>2</sup> q/4  | 0                  |
| TBSF   | One cell    | р(1-р)                             | -                              | -                   | 0                  |
|        | Two cells   | 3p <sup>2</sup> /4                 | -                              | -                   | p <sup>2</sup> /2  |
| PBSF   | One cell    | (1-p)(1-q)p+(1-p) <sup>2</sup> q   | 0                              | 0                   | 0                  |
|        | Two cells   | 3(1-q)p <sup>2</sup> /4+2(1-p)pq   | (1-q)p <sup>2</sup> /4+(1-p)pq | P(O <sub>vo</sub> ) | 0                  |
|        | Three cells | p <sup>2</sup> q                   | 3p <sup>2</sup> q/4            | 3p <sup>2</sup> q/4 | p <sup>2</sup> q/4 |

Figure 3: Analytical models under uniform traffic



Figure 4: Computation of the analytical model for PTBSF

#### 3. there are three cells with probability $p^2q$ .

The probability that a cell exits the switch at a given output (passthrough) among  $O_0$ ,  $O_1$ ,  $O_{v0}$ , and  $O_{v1}$  is shown in Figure 3 for an arbitrary SW-switch. For a simple demultiplexer (Figure 2-b), if a cell is issued at the input  $I_v$  with probability q, the probability that the cell exits at either outputs  $O_0$  or  $O_1$  is q/2. This model allows propagating the passthrough probability in the horizontal direction and finding the probability of the cell being routed to the next lower level through the vertical outputs (see Figure 4). The probabilities on the vertical outputs at a particular level are used as inputs to the Banyans of the next lower level. The loss probability in the PTBSF is the sum of all probability of losses occurring at the outputs  $O_{v0}$  and  $O_{v1}$  of all SW-switches located in the lowest level of the PTBSF as shown in Figure 4.

The cell loss probabilities of TBSF and PBSF can be similarly derived and are summarized in Figure 3. Note that P(loss) is the probability of cell loss for PTBSF and PBSF, but the same denotes the probability of misrouting for TBSF.



Figure 5: Loss rate in PTBSF at full load (Simulation)

#### **3.2** Analytical and simulation results

Figure 5 shows the cell loss rate obtained by simulation of the PTBSF as function of the effective number of SW-Banyans for various values of N at full load (p = 1). A loss rate of  $10^{-6}$  that is required for most ATM traffic sources is achievable for all the considered ( $32 \le N \le 1024$ ) switch sizes. To obtain a cell loss rate of  $10^{-6}$ , the number of effective SW-Banyans should be 7 (5-level) for N = 32 and 20 (6-level) for N = 1024. Figure 6 shows the cell loss rate obtained by varying the load p for N = 1024.

Figure 7 shows the cell loss rate results obtained from the analytical model (represented by lines) and from simulation (points) for the PTBSF as a function of the effective number of SW-Banyans and for some values of N at full load (p = 1). The analytical results deviate from the simulation results with increasing number of levels. We believe that the deviation is due to the fact that the analytical model ignores correlation among the cells as well as the effect of load balancing of contending cells over distinct Banyans of successive levels. Correlation has the effect of increasing contention between the cells. On the other hand load balancing reduces contention.

### 3.3 Comparing PTBSF to PBSF

The scalability of the PTBSF performance is another important feature. The cell loss rate sharply decreases when we increase the hardware resources, that is the effective number of SW-Banyans required to achieve additional levels. Each additional Banyan level decreases the cell loss rate by at least an order of magnitude.

Let  $m_i(N)$  be the probability that a cell is vertically routed (PTBSF) or misrouted (TBSF) for a switch with N input ports and *i* effective SW - Banyans. This also corresponds to the switch cell loss probability. We evaluated these probabilities for different values of N and *i* for the PTBSF, TBSF, and PBSF. Figure 8 shows the variation of the ratio  $m_i(N)/m_{i+1}(N)$ 



Figure 6: Loss rate in PTBSF at different loads for N = 1024 (Simulation)

versus the number of effective SW - Banyans, for the TBSF, PBSF, and PTBSF. Figure 8 clearly indicates that cell loss decreases at a much faster rate for PTBSF than for PBSF, the other parallel switch. This performance gain comes with some additional hardware cost as discussed in subsection 2.1. For example, with N = 64 the cell loss rate decreases from  $10^{-4}$  to approximately  $10^{-6}$  when the effective number of SW-Banyans increases from 7 to 9. Hence, the PTBSF is one *parallel architecture* that overcomes the saturation problem that traditionally occurs in other proposed parallel switch architectures such as the PBSF. For example, the cell loss rate in the *piled Banyan switching fabric* (PBSF) remains at about  $10^{-2}$  even when we increase the number of levels beyond 4. This remains true even when the load is as low as p = 0.2. For a *piled Banyan switching fabric* with N = 1024, the saturation effect occurs at a loss rate of about  $10^{-2}$  for p = 1 and at about  $10^{-4}$  for p = 0.2 [14]. The saturation effect limits the scalability of switches such as the PBSF, since the use of additional levels does not result in any noticeable increase in the throughput.

In the PBSF cell loss may occur at any level. Therefore, increasing the hardware resource does not necessarily guarantee increase in the throughput. The reason is that addition of more hardware at the lowest level cannot compensate for the cell loss at the upper levels. Note that a cell loss occurs in the PBSF whenever three cells compete for one output of some switching element. This saturation effect appears at various switch sizes as shown on Figures 9, 10, and 11. In contrast, for PTBSF, the cell loss occurs only at the lowest level because in all levels, except the lowest, three cells competing for the output of a switching element do not result in any cell loss. The addition of one more level in PTBSF guarantees a throughput increase.

### 3.4 Comparing PTBSF to TBSF

In TBSF, when a conflict occurs within a switching element one cell is correctly routed and the other is misrouted. To distinguish misrouted cells from other cells an *activity bit* is appended



Figure 7: Loss rate in PTBSF at full load (Analytical(lines) and Simulation(points))

to the cell header. At the output of each Banyan, correctly routed cells are removed and stored into their corresponding output buffers, and misrouted cells are input to the next Banyan.

We have carried out testing of the TBSF under uniform traffic pattern and by using analytical modeling for N ranging from 32 to 1024. Figures 9, 10, and 11 show that both TBSF and PTBSF exhibit acceptable cell loss rates ( $10^{-6}$  or lower) as required by most ATM traffic sources.

In PTBSF, simpler hardware is needed at the Banyan outputs because we only need to check whether an output carries a cell or not prior to routing it to the corresponding output buffer. However, the switching element in the PTBSF requires more hardware than that of the TBSF. Moreover, most of the complexity of the PTBSF is in the additional required wiring, namely, in the vertical cell forwarding.

To avoid potential out-of-sequence delivery and excessive delay jitter in the TBSF, correctly routed cells at the output of intermediate banyans are delayed until completion of routing through the last banyan. The cell arrival time to the buffer becomes  $n \times L \times \delta$ , where  $\delta$  is the delay of a 2×2 switch. To keep the loss rate below 10<sup>-6</sup> under full load, the value of L should be at least 14, which means that the TBSF delay is 14 times the propagation delay of one Banyan switch. In general, for reasonable values of n and L, the switching delay of TBSF is about five times to one order of magnitude larger than that of PTBSF (see subsection 2.1).

Though both PTBSF and TBSF achieve a loss rate in the order of  $10^{-6}$  or less in all studied cases, the PTBSF requires less hardware when the switch size is below N = 128 as shown on Figures 9, 10, and 11. One can see that there is a crossover when comparing the required hardware cost (number of Banyans) for both PTBSF and TBSF to achieve a loss rate on the order of  $10^{-6}$ . The crossover occurs for N = 256. The plots indicate that TBSF requires less hardware than PTBSF for switch sizes of N = 256 and larger.



Figure 8: Cell loss ratio at different levels in the three architectures.

#### **3.5** Community of interest traffic

Sometimes the traffic pattern encountered by an ATM switch may be such that the contention between the cells is mainly due to internal congestion rather than output conflicts. This corresponds to the *communities of interest* traffic in which all cells entering the first M inputs (for example) are destined to the first M outputs, all cells entering the second group of M inputs are destined to the second group of M outputs and so on as shown in Figure 12.

In this traffic scenario, the input-output pairs that cause internal conflicts, due to their paths sharing the same switch interconnection links, constitute the "communities". This traffic pattern has been studied in [13]. In this section we will present the cell loss behavior in the three architectures under this kind of traffic.

We simulated the performance of the three architectures under a communities of interest traffic, with a group size equal to four. The results are shown in Figures 13 and 14 for switch sizes N = 32 and N = 64 respectively. As can be seen from the figures, the cell loss rate saturates for PBSF at about 0.25 for N = 32 and at 0.35 for N = 64. The performance of PBSF is particularly bad because the communities of interest traffic pattern causes several instances of three arriving cells at a switching element inputs requesting the same output. This leads to increased cell loss in PBSF. In contrast, PTBSF gives an excellent performance under this scenario, because it can route all three arriving cells at a switching element input, while balancing among switches of the lower level the conflicting cells. Thus, in PTBSF a loss rate of  $10^{-6}$  is achieved with just three effective SW-Banyans (3 levels) for N = 32 and 5.16 effective SW-Banyans (4 levels) for N = 64. For TBSF eight effective SW-Banyans are required for N = 32 and nine effective SW-Banyans are required for N = 64. Therefore, under communities of interest traffic, PTBSF also exhibited better cell loss performance than TBSF and PBSF.



Figure 9: Cell Loss rate in PTBSF, TBSF and PBSF under uniform traffic for N = 32 at full load (Simulation).



Figure 10: Cell Loss rate in PTBSF, TBSF, and PBSF under uniform traffic for N = 256 at full load (Simulation).



Figure 11: Cell Loss rate in PTBSF, TBSF, and PBSF under uniform traffic for N = 1024 at full load (Simulation).



Figure 12: Maximum internal congestion in a  $16 \times 16$  Banyan with group size M = 4.



Figure 13: Cell loss performance of TBSF, PBSF, and PTBSF when subjected to a *community* of interest traffic for N = 32 at full load at the inputs.

# 4 Performance Analysis under ATM Traffics

In order to check whether a particular switch architecture is suitable for broadband ATM or not, one has to observe the performance of the switch when subjected to the type of traffic expected in ATM networks. A realistic traffic mix will subject the switch to a genuine workload. A realistic workload has several desirable characteristics from a testing point of view:

- 1. Traffic load will be from a variety of sources, bursty and non-bursty.
- 2. The traffic sources will exhibit correlation in space as well as in time. This correlation will be due to several factors such as, burstiness of the sources, and the pairing of input ports with the output ports. Such correlation will last for the entire duration of the connections, not just for one burst.

ATM networks are expected to support a wide range of traffic sources. There is general agreement that the various traffic sources to be serviced by ATM networks belong to three classes:

- 1. Constant Bit Rate (CBR) sources. These consists namely of voice and some video sources, e.g. telephony, voice mail, or some telemedecine applications. CBR sources require a sustained amount of bandwidth, low latency, and a low cell delay jitter.
- 2. Variable Bit Rate (VBR) sources. Sources of this type are mainly the result of multimedia applications.
- 3. Available Bit Rate (ABR) sources. These mainly include computer related data sources, such as file transfer, electronic mail, terminal emulation, or LAN emulation over ATM.



Figure 14: Cell loss performance of TBSF, PBSF, and PTBSF when subjected to a *community* of interest traffic for N = 64 at full load at the inputs.

These sources do not require any specific bandwidth and have widely varying latency requirements and cell delay jitter.

### 4.1 Traffic Source Models

Traffic source characterization has been an extensive area of research[15]. The essential characteristics of a particular traffic source are all those parameters that are needed to completely characterize the randomness in the source. Such characteristics are important during the negotiation of required quality of service (QoS) at call setup. The essential traffic parameters of a source are used to develop a traffic model for the source. Traffic source models are important not just during the negotiation of QoS, but are also important for admission control, traffic policing, and during traffic control. They are also needed to predict the generation times of cells during the simulation of that source.

During the lifetime of a virtual connection, the corresponding traffic source will be in one of two states, *active* or *idle*. During the active state, the source is transmitting cells at some given rate. Depending on the type of source, each active state may be followed by an idle period during which the source is silent. This model is known as the *ON-OFF model*. There is a general belief that, with the exception of CBR sources, all other traffic sources exhibit this cyclic behavior (see Figure 15). For CBR-sources, there is no idle period.

There are other source models as well, such as the generally modulated Deterministic Process Model (GMDP) or the Markov Modulated Poisson Process Model (MMPP)[15]. However, the ON-OFF model is the least complex and is the most widely used by researchers to model ATM traffic sources. Furthermore, this basic ON-OFF model is flexible enough to accommodate most of the existing traffic sources with a reasonable accuracy.

The cells generated during the same ON-period form a *burst*. Furthermore, it is always



Figure 15: On-Off source model.

assumed that successive active and idle periods are statistically independent. As suggested by ITU-T (formerly known as CCITT), the length of the active period as well as that of the idle period are exponentially distributed, with average lengths  $\frac{1}{a}$  and  $\frac{1}{b}$  respectively (see Figure 15).

Several parameters have been identified, which together, completely characterize an ON-OFF traffic source. These are,

- p: Peak cell arrival rate. This is the cell arrival rate when the source is in the ON state that is p = 1/T, where T is the time between two consecutive cell arrivals during the ON period (see Figure 15).
- *m*: Average cell arrival rate. This is the cell arrival rate over the entire lifetime of the connection of the source that is

$$m = p \times \frac{a^{-1}}{a^{-1} + b^{-1}}$$

 $\beta$ : Traffic burstiness. A large value for this parameter indicates a very bursty source. The burstiness is

$$\beta = \frac{p}{m} = 1 + \frac{a}{b}$$

 $t_{on}$ : Average duration of active state. This average is computed over the entire lifetime of the connection.  $t_{on} = a^{-1}$ 

Equivalently, a traffic source can be characterized by the following three traffic parameters [15]:

- $R_p$ : Peak cell arrival rate.  $R_p = p = \frac{1}{T}$ .
- B: Average number of cells in a burst.  $B = \frac{1}{aT}$ .
- $T_i$ : Average time between two consecutive bursts.  $T_i = a^{-1} + b^{-1}$ .

Typical values [15, 16] for the traffic parameters for the various traffic source types are summarized in Table 1.

In our simulation study, we assume that the following three parameters are known about each source: (1) the average bit rate, (2) the mean burst length B in cells, and (3) the burstiness factor  $\beta$ . As recommended by ITU-T, we assume also that the active and idle periods are exponentially distributed with parameters a and b respectively. The parameters aand b, as well as other parameters (such as T and p) can all be calculated from the knowledge of m, B, and  $\beta$ . Next, we illustrate this calculation with an example.

| Type of Source           | B in      | Average bit rate     | Burstiness      | Cell Loss               |
|--------------------------|-----------|----------------------|-----------------|-------------------------|
|                          | cells $B$ | $m \times 384 \ bps$ | $\beta$         | Tolerance               |
| CBR                      | N/A       | 64 Kbps              | 1               | $10^{-4}$ to $10^{-6}$  |
| Connectionless data      | 200       | 700 Kbps             | as high as 1000 | $10^{-12}$              |
| Connection oriented data | 200       | 25 Mbps              | as high as 1000 | $10^{-12}$              |
| VBR video                | 2         | 25 Mbps              | 2 to 5          | $10^{-10}$              |
| Background data/video    | 3         | 1 Mbps               | 2 to 5          | $10^{-9}$ to $10^{-10}$ |
| VBR video/data           | 30        | 21 Mbps              | 2 to 5          | $10^{-9}$               |
| Slow video               | 3         | 6 Mbps               | 25              | $10^{-12}$              |

Table 1: Various traffic source types with their characteristic parameters.

### Example

Assume that a VBR Video/Data source has an average bit rate of 21 *Mbps*, an average number of cells in a burst B = 30, and a burstiness  $\beta = 5$ . Then the peak bit rate is  $5 \times 21$  *Mbps* = 105 *Mbps*. The payload of each cell is 384 bits. Hence,  $T = \frac{384}{105} \times 10^{-6} = 3.657 \mu s$ . Now, knowing the value of *B* and *T* and using the relation  $B = \frac{1}{aT}$ , one can readily obtain the value of  $a^{-1}$ . For this example,  $a^{-1} = BT = 109.71 \mu sec$ . Finally, using the relation  $\beta = 1 + \frac{a}{b}$ , one can also find that  $b^{-1} = (\beta - 1)a^{-1} = 438.84 \mu sec$ .

The parameters  $a^{-1}$ ,  $b^{-1}$  and T are all what is needed to simulate the cell generation from an ON-OFF source with exponentially distributed active and idle periods.

#### 4.2 Performance analysis under ATM traffic conditions

In this section, we study the cell loss characteristics of the TBSF, PBSF and PTBSF switches when subjected to a variety of ATM traffic mixes. We implemented a program, which simulates any of the three switches. The simulator takes as input the switch type (TBSF, PBSF, or PTBSF), the number of inputs (N), and the number and types of each traffic source. The simulator has the capability of generating traffic according to any of the sources listed in Table 1. All VBR sources follow the aforementioned ON-OFF model, where each VBR source is characterized by three parameters: its average cell rate m, the average number of cells per burst B, and the burstiness factor  $\beta$ .

It is impractical to simulate the three Banyan switches for all possible traffic mixes. Instead, we limited ourselves to some of the typical workloads expected by an ATM switch. We experimented with the following typical traffic mixes.

- Traffic Mix 1: The sources consist of 20% Video, 50% Voice, and 30% Data. The destinations are selected with equal probabilities.
- *Traffic Mix 2:* 40% of the sources are VBR Data/Video, 20% Voice, 20% Connectionless Data, and 20% Connection-Oriented Data. The destinations are selected with equal probabilities.

- Traffic Mix 3: The sources are as in Traffic Mix 2, but with output concentration, where only odd (respectively even) numbered output ports are selected.
- Traffic Mix 4: The sources are as in Traffic Mix 3. Here also we perform output concentration, but this time only either the upper (respectively lower) output ports are selected.

Traffic-2 subjects the switch to a much larger workload than that generated by Traffic-1. The reason is that the second traffic mix has a higher percentage of burstier sources with larger bandwidth requirements. The change in performance behavior from Traffic-1 to Traffic-2 will expose the sensitivity of the switch to the workload. Traffic-3 and Traffic-4 generate as high a workload as Traffic-2, with the important difference that the level of internal (internal links) as well as external (output ports) contention is increased.

We simulated the three switch architectures under the above traffic mixes. Below, we summarize our findings.

#### Comparison of TBSF, PBSF, and PTBSF

Figures 16 and 20 show the variation of cell loss rate as a function of the effective number of SW-Banyans for TBSF, PBSF, and PTBSF. The figures correspond to N=32 and N=64 respectively, and a workload of type *Traffic-1*. Our PTBSF switch exhibited superior performance (lower loss rates) than the other two switches. PBSF exhibited the poorest behavior, with the cell loss rate remaining at about  $10^{-6}$  from level 3 and onward. Such cell loss rate is not tolerated by most ATM traffic sources. Hence PBSF is not a suitable architecture for the fabric of ATM switches. Figures 16 and 20 illustrate another important fact. When the switch size is doubled from N=32 to N=64, we observed the cell loss rate has almost doubled for TBSF. On the other hand, both PBSF and PTBSF were not as sensitive to changes in N, and exhibited almost the same cell loss rates for both values of N. This observation requires further experimentation<sup>1</sup>.

#### Effect of the workload on cell loss in TBSF, PBSF, and PTBSF

Figures 20-23 show the cell loss performance of TBSF, PBSF, and PTBSF for N=64 and for *Traffic-1, Traffic-2, Traffic-3*, and *Traffic-4* respectively. We observe that among the three switches, TBSF is the least sensitive to a change in the traffic mix. Both PBSF and PTBSF are sensitive to a change in the workload. However, we noticed that the sensitivity of PTBSF to the workload becomes less and less noticeable as we increase the number of levels (the effective number of SW-Banyans).

Figures 22 and 23 correspond to the traffic mix with odd/even and upper/lower output concentration respectively. Both traffic mixes generate more internal collisions than traffic with no output concentration. The reader should be able to observe that for all three switches the cell loss performance for the lower/upper output concentration is noticeably worse than the case of odd/even output concentration. The reason is that lower/upper output concentration

<sup>&</sup>lt;sup>1</sup>We could not simulate the three switches for larger values of N because of excessively large run times



Figure 16: Cell loss ratio versus the effective number of SW-Banyans for N=32inputs, with 50% voice sources, 30% data, and 20% video. The destinations have equal probabilities of being selected

generates much more internal collisions than the odd/even output concentration. The reader should also note that PTBSF still exhibits superior performance than TBSF or PBSF for these traffic mixes.

All figures show the cell loss rate as a function of the effective number of SW-Banyans varying from 1 to 5. With the PBSF switch, we always reach saturation for small values of the effective number of SW-Banyans ( $\leq 3$ ). For the PTBSF switch, the simulator indicates a cell loss rate equal to zero when the effective number of SW-Banyans is greater than or equal to 5. For a TBSF switch with 5 levels, the cell loss rate varied between  $5 \times 10^{-6}$  (*Traffic-3/4*) and zero (*Traffic-1/2*).

### 5 Conclusion

In this paper we have proposed a fast space-division switch architecture that consists of parallel Banyan structure (levels) arranged in a tree topology. Cell routing and contention resolution are done in a completely distributed manner without internal buffering, and there is no cell pre-processing prior to cell switching. The proposed switch architecture avoids cell loss except at the lowest level. Therefore, scaling up the switch hardware provides as low cell loss rate as required for a variety of switch sizes. Though the vertical forwarding increases the switch complexity for VLSI implementation, the cell processing, delay jitter, and overall switching time are substantially lower compared to other proposals. The architectural features are, guaranteed in-sequence delivery due to low delay jitter, distributed control, self-routing, on-the-fly switching, and multiple-outlets.

The switch has been analytically evaluated under uniform traffic load and compared to two other Banyan-based architectures: the *piled Banyan switching fabric* and *tandem Banyan* 



Figure 17: Cell loss ratio versus the effective number of SW-Banyans for N=32inputs, with 20% voice sources, 20% connection-oriented data, and 20% connectionless data and 40% VBR Video/Data. The destinations have equal probabilities of being selected.

*switching fabric.* To evaluate the proposed switch under realistic ATM traffic, we used the on-off traffic source model in generating a number of ATM traffic mixes.

The proposed switch has higher performance than other proposed architectures in terms of cell loss rate and switching delay under both ATM and uniform traffic, especially for smaller switch sizes. Performance of the proposed switch compares favorably with other proposed switches with respect to cell loss and switching delay, for all studied traffic conditions. The advantages of the proposed switch are modularity, regularity, self-routing, low processing, and robustness under a variety of ATM traffic loads.

Because of its very simple routing the PTBSF requires relatively more interconnections than other switches like TBSF and PBSF. Future research may address the issue of searching 3-D switch architectures with more *link sharing*, which increases complexity of control, as one approach to reduce link complexity.

# References

- [1] Fouad A. Tobagi. Fast packet switch architectures for broadband integrated services digital network. *Proceedings of the IEEE*, 78(1):133–167, Jan. 1990.
- [2] A. Saha and M. D. Wagh. Performance analysis of banyan networks based on buffers of various sizes. *IEEE Proc. INFOCOM'90*, 1:157–164, 1990.
- [3] Y. Oie et al. Effect of speedup in nonblocking packet switch. Proc. Inter. Conf. on Communication, 2:410–414, 1989.



Figure 18: Cell loss ratio versus the effective number of SW-Banyans for N=32inputs, with 20% voice sources, 20% connection-oriented data, and 20% connectionless data and 40% VBR Video/Data. Output concentration, where only either odd or even numbered destinations are selected.

- [4] K. Shiomoto et al. Performance evaluation of cell bypass queueing discipline for buffered banyan type ATM switches. Proc. IEEE INFOCOM'90, 2:677–685, 1990.
- [5] K. E. Batcher. Sorting networks and their applications. AFIPS Proc. 1968 Spring Joint Computer Conf., 32:307–314, 1968.
- [6] Reza Rooholamini, Vladimir Cherkassky, and Mark Garver. Finding the right ATM switch for the market. *IEEE Computer*, 27(4):17–28, Apr. 1994.
- [7] A. Huang and S. Knauer. Starlite: A wideband digital switch. Proc. GLOBECOM 84, Atlanta, GA, pages 121–125, 1984.
- [8] J. Y. Hui and K. W. Cheung. A broadband packet switch for integrated transport. *IEEE J. Selected Areas in Communications*, 5(8):1264–1273, Oct. 1987.
- [9] B. Bingham and H. Bussey. Reservation-based contention resolution mechanism for batcher-banyan packet switches. *Electron. Lett.*, 24:772–773, 1988.
- [10] J. Giacopelli et al. Sunshine: A high performance self-routing broadband packet-switch architecture. *IEEE J. Selected Areas in Communications*, pages 1289–1298, Oct 1991.
- [11] J. S. Turner. New directions in communications (or which way to the information age?). *IEEE Commun. Mag.*, 24(10):8–15, Oct. 1986.
- [12] P. C. Wong and M. S. Yeung. Design and analysis of a novel fast packet switch-Pipeline Banyan. *IEEE/ACM Transactions on Networking*, 3(1):63–69, Feb. 1995.



Figure 19: Cell loss ratio versus the effective number of SW-Banyans for N=32inputs, with 20% voice sources, 20% connection-oriented data, and 20% connectionless data and 40% VBR Video/Data. Output concentration, where only either upper or lower destinations are selected.

- [13] Fouad A. Tobagi, Timothy Kwok, and Fabio M. Chiussi. Architecture, performance, and implementation of the tandem Banyan fast packet switch. *IEEE J. Selected Areas in Communications*, 9(8):1173–1193, Oct. 1991.
- [14] Toshihiro Hanawa et al. Multistage interconnection networks with multiple outlets. 1994 International Conference on Parallel Processing, I:1–8, 1994.
- [15] G. D. Stamoulis, M. E. Anagnostou, and A. D. Georgantas. Traffic source models for ATM networks: a survey. *Computer Communications, Butterworth-Heinemann Ltd*, 17(6):428–438, June 1994.
- [16] Ken Dubose and Hyong S. Kim. An Effective Bit Rate/Table Lookup Based Admission Control Algorithm for the ATM B-ISDN. Proceedings of the 17<sup>th</sup> Conference on Local Computer Networks, Minneapolis, pages 20–29, Sept. 1992.

#### Appendix: Hardware resource requirement

A PTBSF with L layers has  $2^{L-1}$  Banyan planes. However, the basic element of the first stage for a Banyan at layer  $l \ge 2$  is a  $1 \times 2$  demultiplexer DM (see Figure 2-b). In general, each Banyan of Layer  $l, l \ge 2$ , has one stage that consists of simple DM switches, and n-l+1 stages made of SW's. A DM has much less hardware than an SW. Therefore, we shall be using the number of SW's as a hardware complexity metric of the PTBSF switch. The following



Figure 20: Cell loss ratio versus the effective number of SW-Banyans for N=64 inputs, with 50% voice sources, 30% data, and 20% video. The destinations have equal probabilities of being selected

Lemma gives the number of SW-stages and DM-stages required for a PTBSF with  $2^n$  inputs and L layers.

**Lemma 1** A PTBSF with  $N = 2^n$  inputs and L layers requires  $S_{PTBSF} = (n - L + 2)2^{L-1} - 1$ SW-stages and  $2^{L-1} - 1$  DM-stages.

**Proof** There are one DM-stage and n - l + 1 SW-stages in each of the  $2^{l-2}$  Banyans of layer l, where  $2 \leq l \leq L$ . Therefore, the total number of DM-stages in the PTBSF is  $S_{DM}(L) = \sum_{l=2}^{L} 2^{l-2}$  that is  $2^{L-1} - 1$ . The number of SW-stages in the PTBSF is

$$S_{SW}(L) = n + \Sigma_{l=2}^{L}(n-l+1)2^{l-2}$$

 $S_{SW}(L)$  can be written as  $n + n \sum_{l=2}^{L} 2^{l-2} - \sum_{l=2}^{L} (l-1) 2^{l-2}$ . Therefore, the expression of  $S_{SW}(L)$  can easily be rewritten as

$$S_{SW}(L) = n + n(2^{L-1} - 1) - (L-2)2^{L-1} - 1$$

which simplifies to  $S_{SW}(L) = (n - L + 2)2^{L-1} - 1.$ 

From the structure of a TBSF with  $N = 2^n$  inputs and L Banyans in tandem, the number of *SW-stages* is  $S_{TBSF} = nL$ .

For a PBSF with  $N = 2^n$  inputs and L layers, the number of SW-stages in level l is equal to n - l + 1. Therefore, the overall number of SW-stages in all L levels is equal to  $S_{PBSF} = \sum_{l=1}^{L} (n - l + 1)$ , which simplifies to  $S_{PBSF} = L(2n - L + 1)/2$ .

The overall number of SW-switches for a particular switch architecture is equal to the product of the number of SW-stages by the number of SW-switches per stage. Obviously, for



Figure 21: Cell loss ratio versus the effective number of SW-Banyans for N=64inputs, with 20% voice sources, 20% connection-oriented data, and 20% connectionless data and 40% VBR Video/Data. The destinations have equal probabilities of being selected.

a Banyan switch with  $N = 2^n$  inputs, the number of elements (DM's or SW's) per stage is equal to  $2^{n-1}$ .

Since the SW-switch dominates the hardware, it is important to evaluate the effective number of n-stage Banyans constructed with SW-switches only. Such Banyans are referred to as *effective SW-Banyans*.

**Definition 1** The effective number of SW-Banyans  $E_X$  in a switch of type X, where  $X \in \{TBSF, PBSF, PTBSF\}$ , having L layers and  $2^n$  inputs is equal to the ratio of the total number of SW-stages  $S_X$  in the switch over the number of stages n per Banyan, that is

$$E_X(n,L) = \frac{S_X}{n}$$

Based on the above, the effective number of SW-Banyans in each of TBSF, PBSF, and PTBSF would be as follows:

$$E_{TBSF}(n,L) = L \tag{1}$$

$$E_{PBSF}(n,L) = \frac{(2n - L + 1)L}{2n}$$
(2)

$$E_{PTBSF}(n,L) = \frac{(n-L+2)2^{L-1}-1}{n}$$
(3)

Another important hardware feature is the total number of interconnections required in each of PTBSF, PBSF, and TBSF. For a PTBSF with  $N = 2^n$  inputs and L levels, one can easily prove that the total number of vertical interconnections is,



Figure 22: Cell loss ratio versus the effective number of SW-Banyans for N=64inputs, with 20% voice sources, 20% connection-oriented data, and 20% connectionless data and 40% VBR Video/Data. Output concentration, where only either odd or even numbered destinations are selected.

$$W_{PTBSF}^{v} = 2^{n-1} [2^{L-1}(n-L+3) - n - 2]$$
(4)

Equation 4 accounts for all interconnections among successive planes. Similarly, one can show that the total number of horizontal interconnections is,

$$W^{h}_{PTBSF} = 2^{n}(n+2^{L-1}) + 2W^{v}_{PTBSF}$$
(5)

Equation 5 simplifies to

$$W_{PTBSF}^{h} = 2^{n} [2^{L-1}(n - L + 4) - 2]$$
(6)

Equation 6 accounts for all interconnections between consecutive stages as well as Banyans outputs. Therefore, the overall number of required interconnections is  $W_{PTBSF} = W^h_{PTBSF} + W^v_{PTBSF}$  which simplifies to the following,

$$W_{PTBSF} = 2^{n-1} [2^{L-1} (3n - 3L + 11) - n - 6]$$
(7)

For TBSF, the overall number of required interconnections including the paths to the output buffers is simply,

$$W_{TBSF} = (n+1)L2^n \tag{8}$$

Finally, for a PBSF with  $N = 2^n$  inputs and L levels, one can also prove that the total number of horizontal and vertical interconnections is,

$$W_{PBSF} = 2^{n} [(L-2)(2n-L+1)+3n]$$
(9)



Figure 23: Cell loss ratio versus the effective number of SW-Banyans for N=64inputs, with 20% voice sources, 20% connection-oriented data, and 20% connectionless data and 40% VBR Video/Data. Output concentration, where only either upper or lower destinations are selected.

For example, to achieve a cell loss on the order of  $10^{-6}$  (see Figure 5) with N = 64the value of L must be 5 for the TBSF. Under the same conditions, the PTBSF requires 6.2 effective SW-Banyans which correspond to L = 5. In this case, the number of interconnections (Equation 8) required by TBSF is 2240 and the number required by PTBSF (Equation 7) is 6784, which is about 3 times more than TBSF. On the other hand, to achieve a cell loss of  $10^{-6}$  with N = 256 the value of L must be 12 for the TBSF. Under the same conditions, the PTBSF requires 15.88 effective SW-Banyans which correspond to L = 6. This leads to 27648 interconnections for TBSF and 67840 interconnections for PTBSF, i.e., in this case PTBSF requires about 2.45 times more than TBSF. For n = 10, L must be equal to 14 for TBSF and 6 for PTBSF to achieve a  $10^{-6}$  cell loss. In that case TBSF would require 157696 interconnections and PTBSF would require 368640, which is about 2.34 times more than TBSF. Therefore, it seems that PTBSF always require 2 to 3 times the number of interconnections required by TBSF.