
Design and Evaluation of Network Processor Systems and Forwarding Applications

JING FU

Licentiate Thesis
Stockholm, Sweden, 2006

TRITA-EE 2006:054

ISSN 1653-5146

School of Electrical Engineering

KTH, Stockholm, Sweden

Academic thesis which, with the permission of Kungl Tekniska högskolan (KTH, Royal Institute of Technology), is presented for public examination for the degree of licentiate on Tuesday, 19 December 2006, in Hall Q2.

© Jing Fu, December 2006

Printed by: Universitetsservice US AB


Abstract

During recent years, both Internet traffic and packet transmission rates have been growing rapidly, and new Internet services such as VPNs, QoS and IPTV have emerged. To meet increasing line-speed requirements and to support current and future Internet services, improvements and changes are needed in current routers, both with respect to hardware architectures and forwarding applications. High-speed routers are nowadays mainly based on application-specific integrated circuits (ASICs), which are custom made and not flexible enough to support diverse services. General-purpose processors offer flexibility, but have difficulty handling high data rates. A number of software IP-address lookup algorithms have therefore been developed to enable fast packet processing on general-purpose processors. Network processors have recently emerged to provide the performance of ASICs combined with the programmability of general-purpose processors.

This thesis provides an evaluation of router design, including both hardware architectures and software applications. The first part of the thesis contains an evaluation of various network processor system designs. We introduce a model for network processor systems which is used as a basis for a simulation tool. Thereafter, we study two ways to organize processing elements (PEs) inside a network processor to achieve parallelism: a pipelined and a pooled organization. The impact of using multiple threads inside a single PE is also studied, as are the queueing behavior and packet delays in such systems. The results show that parallelism is crucial to achieving high performance, and that the pipelined and the pooled processing-element topologies achieve comparable performance. The detailed queueing-behavior and packet-delay results have been used to dimension queues; the resulting dimensions can serve as guidelines for designing memory subsystems and queueing disciplines.

The second part of the thesis contains a performance evaluation of an IP-address lookup algorithm, the LC-trie. The study considers trie search depth, prefix vector access behavior, cache behavior, and packet lookup service time. For the packet lookup service time, the evaluation contains both experimental results and results obtained from a model. The results show that the LC-trie is an efficient route-lookup algorithm for general-purpose processors, capable of performing 20 million packet lookups per second on a Pentium 4, 2.8 GHz computer, which corresponds to a 40 Gb/s link for average-sized packets. Furthermore, the results show the importance of the choice of packet traces when evaluating IP-address lookup algorithms: real-world and synthetically generated traces may have very different behaviors.

The results presented in the thesis are obtained through studies of both hardware architectures and software applications. They can be used to guide the design of next-generation routers.


Acknowledgements

I would like to thank my main advisor, Professor Gunnar Karlsson, for believing in me and offering me the PhD position in his lab. I would like to express my gratitude for his support, guidance and comments. My second advisor, Associate Professor Olof Hagsand, has guided and helped me throughout the years. Even though he left the lab a year ago, he has continued to support me when needed.

I am also grateful to everyone at LCN, including those who have graduated and left the lab. I appreciate the friendship, the discussions and the stimulating working environment.

I would also like to thank the organizations that have funded my research: the Winternet project funded by SSF, and the Graduate School of Telecommunication at KTH.

Finally, I would like to thank my family for their continuous support and patience.

Contents

Contents vii

1 Introduction 1
1.1 Network processors 3
1.2 Queueing in network processor systems 4
1.3 IP-address lookup 4
1.4 Contributions of the thesis 5

2 Summary of original work 7
2.1 Paper A: Designing and Evaluating Network Processor Applications 7
2.2 Paper B: Performance Evaluation and Cache Behavior of LC-Trie for IP-Address Lookup 8
2.3 Paper C: Queuing Behavior and Packet Delays in Network Processor Systems 8
2.4 Other papers 9

3 Conclusions and future work 11

Bibliography 13

Chapter 1

Introduction

The design of Internet switching devices, such as routers and switches, is becoming increasingly challenging. One challenge comes from the technological trends in optics, processors and memory. During recent years, the capacity of optical links has increased faster than the processing capacity of single-processor systems, which in turn has increased faster than memory access speed. These trends have a significant impact on the architecture of routers, since a single-processor router may not be able to process and buffer packets at high-speed line rates. Therefore, parallel processing, queueing and buffering approaches in routers are explored in various designs.

Another challenge comes from the requirements for new functionality in routers. The main task of a traditional IP router is to perform IP-address lookup to determine the next-hop address for packets. The Internet has evolved, and numerous new services have been introduced: firewalls limit access to protected networks by filtering unwanted traffic; quality of service (QoS) provides service contracts between networks and users; Internet Protocol television (IPTV) delivers digital television using the IP protocol. Therefore, an ideal router should be flexible and programmable in order to support current and future Internet services.

Routers can be logically separated into a control plane and a forwarding plane. The control plane is responsible for routing, management and signalling protocols. The forwarding plane is responsible for forwarding packets from one port to another. Figure 1.1 shows a route processor executing in the control plane, and a switch fabric together with a number of line cards constituting the forwarding plane. The forwarding procedure for a packet can be summarized as follows. A packet first arrives on a port of a line card for ingress processing. Based on a forwarding decision, such as an IP-address lookup, the outgoing interface and the egress line card for the packet are determined. Thereafter, the packet is transmitted through the switch fabric to the egress line card for egress processing. Finally, the packet is sent out through an output port to the external links.

Routers can be divided into two categories depending on their functionality. An edge router connects clients to the Internet, while a core router serves solely to transmit data between other routers, e.g. inside the network of an Internet service provider.

[Figure 1.1: An abstraction of a router. A route processor forms the control plane; a switch fabric and a number of line cards, attached to external links, form the forwarding plane.]

The requirements on routers differ depending on their type. While an edge router requires flexibility and programmability, the main requirement of a core router is high capacity.

Current packet-forwarding devices are typically based on general-purpose processors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or network processors. Deciding which technology to use involves a trade-off between flexibility, performance, cost, development time and power consumption.

In a forwarding plane based on general-purpose processors, all networking functionality must be implemented in software, which leads to low throughput. Recently, open-source software PC routers such as XORP [1] and Bifrost [2] have emerged. They put particular focus on code efficiency in order to provide high performance, and could be used as edge routers.

ASICs are most suitable for core routers. They are fast, but neither flexible nor programmable: they are custom made for each application, and all functionality is implemented in hardware. There are configurable ASICs that provide limited configuration, but they are not fully programmable. The design of an ASIC is a long and complex task involving a large team of engineers passing through several iterative design phases.

FPGAs contain programmable logic components and interconnects. Although they are generally slower than ASICs, they have several advantages, such as a shorter time to market and flexibility.

Network processors are programmable, with hardware optimized for network operations. The main purpose of a network processor is to provide the performance of ASICs combined with the programmability of general-purpose processors.

There are also hybrid solutions to packet processing. First, it is common for network processor systems to include general-purpose processors for control plane operations and slow-path processing [3, 5]. Second, a hybrid FPGA/ASIC architecture is also available [7]. In the following subsections, we first present network processors, followed by queueing in network processor systems, and finally IP-address lookup.


1.1 Network processors

Currently, there is a variety of network processor designs [3–6]. Despite this diversity, there are some architectural similarities among them. First, all network processors make use of processing elements (PEs) to perform packet processing functions; PEs are instruction-set processors that decode instruction streams. Second, in order to meet increasing line-speed requirements, all network processors use some kind of parallel processing, exploiting packet-level and task-level parallelism [6, 8].

1.1.1 Approaches to parallelism

The first approach to parallelism is to use multiple PEs, since the processing capacity of a single PE may not suffice to handle high line rates. There seem to be two main variants for structuring the PEs. The first alternative is to arrange the PEs in a pipeline, where each PE performs a single stage of packet processing. The processing is then constrained to a single dimension, but this may have advantages in reduced hardware complexity. The second alternative is to arrange the PEs in a general pool, where each PE performs the same functionality. The advantage of the pooled approach is its generality. A hybrid approach combines the two into parallel pipelines.
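As a rough illustration of why the two organizations can perform comparably, consider an idealized model (an assumption for this sketch, not a result taken from the enclosed papers) in which every packet requires the same number of equal-length processing stages:

```python
def pipeline_cycles(packets, stages):
    """Pipelined PEs: one PE per stage. After the pipeline fills,
    one packet completes per cycle."""
    return stages + (packets - 1)

def pool_cycles(packets, stages, num_pes):
    """Pooled PEs: each PE runs all stages for one packet, so the
    stream is served in ceil(packets / num_pes) batches."""
    batches = -(-packets // num_pes)  # ceiling division
    return batches * stages

# With as many pooled PEs as pipeline stages, both organizations
# need roughly one cycle per packet over a long stream:
print(pipeline_cycles(1000, stages=4))         # 1003 cycles
print(pool_cycles(1000, stages=4, num_pes=4))  # 1000 cycles
```

The model ignores memory contention, uneven stage lengths and inter-PE communication, which is where the real design trade-offs lie.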

The second approach is to achieve parallelism within a single PE. The Intel IXP 2800 network processor, for example, supports several threads in order to process packets simultaneously [3], whereas the Xelerated X10 network processor uses a synchronized dataflow mechanism to achieve parallelism within a PE [5].
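The benefit of multiple threads per PE can be sketched with a toy latency-hiding model. The cycle counts below are invented for illustration; the only point is that utilization saturates once enough threads are available to cover memory stalls:

```python
def pe_utilization(compute_cycles, stall_cycles, threads):
    """Fraction of time a PE does useful work when each packet
    alternates a compute phase and a memory-stall phase, and
    hardware threads switch in zero time (an idealized assumption)."""
    per_packet = compute_cycles + stall_cycles
    # Up to 'threads' compute phases can overlap one packet's stall.
    busy = min(threads * compute_cycles, per_packet)
    return busy / per_packet

for n in (1, 2, 4, 8):
    print(n, pe_utilization(compute_cycles=20, stall_cycles=60, threads=n))
# 1 thread: 0.25, 2 threads: 0.5, 4 or more threads: fully utilized
```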

1.1.2 Modelling and evaluation of network processors

A variety of network processor models have been introduced for management, programming and performance-evaluation purposes. The IETF ForCES (forwarding and control element separation) group has defined the ForCES forwarding element model [10]. It provides a general management model for diverse forwarding elements, including network processors. Several programming models have been developed, for example NP-Click [9], a programming model for the Intel IXP 1200 network processor [4].

The observation that current network processors are difficult to program has influenced the work on NetVM, a network virtual machine [11]. NetVM models the main components of a network processor, and aims at influencing the design of next-generation network processor architectures by giving them a more unified programming interface.

A number of network processor models have been introduced to study various design approaches [12–14]. Some of the models focus on a particular network processor, such as the Intel IXP 1200, while others are general models.

4 CHAPTER 1. INTRODUCTION

1.2 Queueing in network processor systems

There are several places in a network processor system where queues are necessary. First, packets may arrive through several ports to an ingress line card of a network processor system at a higher rate than the service rate of the line card. Second, several ingress line cards may simultaneously transmit a large number of packets to the same egress line card. Third, the introduction of quality of service may cause best-effort packets to be queued when higher-priority traffic is present.

The queueing disciplines normally used between ingress and egress line cards include input queueing, output queueing and virtual output queueing (VOQ) [15]. In input queueing, the queues are placed between the ingress line cards and the switch fabric. Input queueing is not efficient due to head-of-the-line blocking: if the packet at the head of a queue is blocked waiting for access to a particular egress line card, packets in the queue destined for other egress line cards are blocked as well, even if those egress line cards are ready to receive them.

In output queueing, the queues are placed between the switch fabric and the egress line cards. This allows packets from several ingress line cards to be transmitted to a single egress line card simultaneously. The main challenge in output queueing centers on the capacity of the switch fabric and the queue memory: the fabric has to allow multiple packets to be transmitted to an egress line card simultaneously, instead of one at a time as in the input queueing case. In addition, the memory bandwidth must have capacity equal to that of the switch fabric.

As an alternative, a system can use virtual output queueing, with the queues placed between the ingress line cards and the switch fabric. Unlike input queueing, there is a separate queue for each egress line card. This solves the head-of-the-line blocking problem and does not require a high speedup in the switch fabric, but requires arbitration of the queues. In addition to using a virtual output queue (VOQ) for each egress line card, a VOQ for each traffic class can be used to provide service differentiation.
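A minimal sketch of the VOQ idea (class and method names are illustrative, not taken from any of the cited systems):

```python
from collections import deque

class VirtualOutputQueues:
    """One queue per egress line card at an ingress card, so a packet
    blocked on one egress does not block packets bound for others."""
    def __init__(self, num_egress):
        self.queues = [deque() for _ in range(num_egress)]

    def enqueue(self, packet, egress):
        self.queues[egress].append(packet)

    def dequeue_for(self, ready_egress):
        """Serve any queue whose egress card is ready. The first-ready
        policy used here is a placeholder for a real arbiter."""
        for e in ready_egress:
            if self.queues[e]:
                return e, self.queues[e].popleft()
        return None

voq = VirtualOutputQueues(num_egress=4)
voq.enqueue("p1", egress=0)  # egress 0 is busy
voq.enqueue("p2", egress=2)  # egress 2 is ready
# With a single input queue, p2 would wait behind p1 (head-of-the-line
# blocking); with VOQs it is delivered immediately:
print(voq.dequeue_for(ready_egress=[1, 2, 3]))  # (2, 'p2')
```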

1.3 IP-address lookup

The processing functions inside a packet-forwarding device can be categorized into a number of tasks, including classification, IP-address lookup, modification, metering, scheduling and shaping.

The goal of IP-address lookup is to find the outgoing interface and the next-hop address for an IP packet, given the destination IP address as input. The procedure is as follows: for each packet, the router extracts the destination IP address, performs a lookup in a routing table, and determines the outgoing interface. Each entry in the routing table consists of a prefix-length pair. Instead of an exact match, the IP-address lookup algorithm performs a longest-prefix match.
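The longest-prefix-match rule can be illustrated with a plain binary trie. This sketch omits the path and level compression that makes the LC-trie fast; it shows only the matching rule itself, with invented interface names:

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # '0'/'1' -> TrieNode
        self.next_hop = None  # set if some prefix ends at this node

def insert(root, prefix_bits, next_hop):
    node = root
    for b in prefix_bits:
        node = node.children.setdefault(b, TrieNode())
    node.next_hop = next_hop

def longest_prefix_match(root, addr_bits):
    """Walk the trie along the address bits, remembering the last
    next hop seen: that is the longest matching prefix."""
    node, best = root, None
    for b in addr_bits:
        if node.next_hop is not None:
            best = node.next_hop
        if b not in node.children:
            return best
        node = node.children[b]
    return node.next_hop if node.next_hop is not None else best

root = TrieNode()
insert(root, "10", "if1")    # prefix 10/2   -> interface if1
insert(root, "1011", "if2")  # prefix 1011/4 -> interface if2
print(longest_prefix_match(root, "10110000"))  # if2: the longer match wins
print(longest_prefix_match(root, "10010000"))  # if1
```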

Since each packet needs to be examined at high network speeds, the lookup procedure must be very efficient. For a 10 Gb/s interface, 5 million lookups per second are required for average-sized packets.
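The figure quoted above implies an average packet size of about 250 bytes (the same assumption makes 20 million lookups per second correspond to 40 Gb/s elsewhere in the thesis). The arithmetic:

```python
def required_lookup_rate(link_bps, avg_packet_bytes):
    """Lookups per second needed to keep a link fully loaded."""
    return link_bps / (avg_packet_bytes * 8)

print(required_lookup_rate(10e9, 250))  # 5 million lookups/s for 10 Gb/s
print(required_lookup_rate(40e9, 250))  # 20 million lookups/s for 40 Gb/s
```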


1.3.1 Approaches to IP-address lookup

Current approaches to IP-address lookup include both hardware and software solutions. The hardware solutions make use of content-addressable memory (CAM) and ternary content-addressable memory (TCAM) to perform IP-address lookup [16–18]. CAM and TCAM provide content-based memory that compares a search key to all slots in memory in parallel. To achieve higher performance and reduce the time spent on memory accesses, many hardware solutions use a pipelined architecture [19, 20]. There are also many software-based approaches to fast IP-address lookup. The majority of these algorithms use a modified version of the trie data structure [21]. Degermark et al. try to fit the data structure into the cache [22]. Srinivasan and Varghese present a modified trie data structure for performing IP-address lookup [23]. Finally, the LC-trie, an efficient IP-address lookup algorithm, is another approach based on a modified version of the trie data structure [24]. The LC-trie algorithm has recently been implemented in the Linux 2.6.13 kernel forwarder.

1.3.2 Performance evaluation of IP-address lookup algorithms

Several efforts have studied accurate performance measurement of IP-address lookup algorithms. Narlikar and Zane present an analytical model for predicting the performance of software-based IP-address lookup algorithms [25]. Kawabe et al. present a method for predicting IP-address lookup performance based on statistical analysis of Internet traffic [26]. Ruiz-Sanchez et al. survey existing route-lookup algorithms [27]. In particular, they examine the packet lookup service time of lookup algorithms using a random network trace.

1.4 Contributions of the thesis

This thesis provides an evaluation of router designs, including both hardware architectures and software applications. The contributions include work in two areas.

The first area is an investigation of various aspects of network processor systems. A system model is introduced and used as a basis for a simulation tool. In particular, different methods of organizing PEs to achieve parallelism are studied, together with the impact of using multiple threads inside a single PE. The queueing behavior and packet delays in such a system are studied with real-world and synthetically generated packet traces as input data. The queueing-behavior results have been used to dimension queues; the resulting dimensions can serve as guidelines for designing memory subsystems and selecting queueing disciplines.

The second area is a detailed performance evaluation of an IP-address lookup algorithm, the LC-trie. The performance study includes trie search depth, prefix vector access behavior, cache behavior, and packet lookup service time. For the packet lookup service time, the evaluation contains both experimental results and results obtained from a formal model.


The rest of the thesis is organized as follows. Chapter 2 summarizes the original work of the thesis, presented in three enclosed papers. Chapter 3 concludes the work and proposes future research topics.

Chapter 2

Summary of original work

2.1 Paper A: Designing and Evaluating Network Processor Applications

J. Fu and O. Hagsand, "Designing and Evaluating Network Processor Applications", in Proc. of 2005 IEEE Workshop on High Performance Switching and Routing (HPSR), pp. 142-146, Hong Kong, May 2005.

Summary: This paper introduces a network processor model based on a study of current network processors. A simulation environment following the model is constructed using the Java programming language. The simulation environment provides a programming interface and run-time for evaluating different design approaches, including processing element layout, forwarding application design, and the use of multiple threads inside a single processing element.

The evaluation of the various designs is done by analyzing the resulting saturation throughput, latency, utilization and code complexity. Using the simulation environment, an IPv4 forwarding application is evaluated using two different processing element topologies: a pipelined and a pooled one. In addition, the impact of using multiple threads inside a single PE is studied.

The simulation results show that the use of parallelism is crucial to achieving high performance. In particular, using multiple threads inside a processing element has a large impact on performance: the processing blocks become fully utilized when there are four or more threads. Moreover, both topologies, the pipelined and the pooled, achieve comparable performance.

Contribution: The original idea of developing a model for network processors came from the second author of the paper. The author of this thesis developed the model, carried out simulation experiments, obtained and analyzed the results and wrote the article under the supervision of the second author.

2.2 Paper B: Performance Evaluation and Cache Behavior of LC-Trie for IP-Address Lookup

J. Fu, O. Hagsand and G. Karlsson, "Performance Evaluation and Cache Behavior of LC-Trie for IP-Address Lookup", in Proc. of 2006 IEEE Workshop on High Performance Switching and Routing (HPSR), Poznan, Poland, June 2006.

Summary: This paper presents a detailed performance analysis of the LC-trie IP-address lookup algorithm. Both a realistic trace from FUNET [28] and a synthetically generated random network trace are used for the study. The performance measurements include trie search depth, prefix vector access behavior, cache behavior, and packet lookup service time.

Search depth and prefix vector access behavior are obtained experimentally by running a modified version of the LC-trie source code [29]. Cache behavior results are obtained using an introduced cache model supporting two replacement policies: random and least recently used. Finally, the packet lookup service time results are obtained both through an introduced model and through experimental runs.

The results show that, for the FUNET trace, the LC-trie algorithm is capable of performing 20 million lookups per second on a Pentium 4, 2.8 GHz computer, which corresponds to a 40 Gb/s link for average-sized packets. Since the FUNET trace contains both university and student-dormitory traffic, we believe it is a realistic and representative trace for aggregated Internet traffic. The results also show that the LC-trie performs up to five times better on the realistic trace than on the synthetically generated random network trace, which illustrates that the choice of traces may have a large influence on the results when evaluating IP-address lookup algorithms.

Contribution: The original idea of performing a performance analysis of the LC-trie lookup algorithm came from the third author of the paper. The author of this thesis carried out experiments, developed the analytical model, obtained and analyzed the results and wrote the article under the supervision of the co-authors.

2.3 Paper C: Queuing Behavior and Packet Delays in Network Processor Systems

J. Fu, O. Hagsand and G. Karlsson, "Queuing Behavior and Packet Delays in Network Processor Systems", Technical report, TRITA-EE 2006:050, ISSN 1653-5146, KTH, Royal Institute of Technology, Sweden, October 2006.


Summary: This paper studies the queueing behavior and packet delays in a network processor system. The study is based on an introduced system model and a simulation tool constructed according to the model. Using the simulation tool, both best-effort and diffserv IPv4 forwarding were modeled and tested using real-world and synthetically generated packet traces. The packet arrival processes are studied and modeled through analysis of the traces.

The results include average and 99th-percentile queue lengths for various queues, percentages of dropped packets, and packet delays. The results on queueing behavior have been used to dimension the queues; the resulting dimensions can serve as guidelines for designing memory subsystems and queueing disciplines. The results on packet delays show that our diffserv setup provides good service differentiation between best-effort and priority packets. While best-effort packets experience high loss probability, delay and delay jitter, priority packets are transmitted with very low delay and small delay jitter.

The study also reveals large differences in queueing behavior and packet delays between the real-world and the synthetically generated traces. This indicates that the choice of traces has a large impact on the results when evaluating router and switch architectures.

Contribution: The author of this thesis developed the model, carried out simulation experiments, obtained and analyzed the results and wrote the article under the supervision of the co-authors.

2.4 Other papers

The following papers have also been authored by the author of this thesis:

• J. Fu and O. Hagsand, "A Programming Model for a Forwarding Element", in Proc. of SNCNW 2004, Karlstad, Sweden, November 2004.

• J. Fu and O. Hagsand, "An Evaluation Environment for Programmable Forwarding Elements", poster in Multimedia Interactive Protocols and Systems 2004, Grenoble, France, November 2004.

• J. Fu, O. Hagsand and G. Karlsson, "An Analysis of Queueing Behavior in Network Processor Systems", in Proc. of SNCNW 2006, Luleå, Sweden, October 2006.

Chapter 3

Conclusions and future work

This thesis presents an evaluation of network processor systems and a performance evaluation of the LC-trie IP-address lookup algorithm. A system model is developed to study various design approaches, and a simulation tool is constructed according to the model. The simulation tool is used to evaluate several aspects of network processor design, including PE layout, forwarding application design, use of multiple threads inside a single PE, and queueing discipline. We evaluate these designs by analyzing the resulting throughput, latency, utilization, code complexity, queueing behavior, and packet delays.

The conclusions from this work cover a number of areas. First, our results show that the use of parallelism is crucial to achieving high performance. The best performance is achieved when there is an adequate number of threads inside each PE. Also, the pipelined and pooled PE topologies achieve comparable performance. Second, the detailed results on queueing behavior are used to dimension various queues, to guide the design of memory subsystems and queueing disciplines. Third, the results on packet delays show that our diffserv setup provides good service differentiation between best-effort and priority packets. While best-effort packets experience high loss probability, delay and delay jitter, priority packets are transmitted with very low delay and small delay jitter. Fourth, the results on queueing behavior reveal that real-world traffic is not only bursty, but that packets are also clumped together when transmitted to outgoing interfaces. When the real-world and synthetically generated traces are compared, large differences in queueing behavior and packet delays are observed. This illustrates that the choice of traces may have a large influence on the results when evaluating router and switch architectures.

The performance measurements on the LC-trie include trie search depth, prefix vector access behavior, cache behavior, and packet lookup service time. The packet lookup service time is obtained both through experimental runs and through a formal model. The results show that the LC-trie algorithm is capable of performing 20 million packet lookups per second on a Pentium 4, 2.8 GHz computer, which corresponds to a 40 Gb/s link for average-sized packets. Further, the results show that the LC-trie performs up to five times better on the realistic FUNET trace than on a synthetically generated network trace.

There are open issues to address that can be the subject of future study:


• We would like to study in detail the design, implementation and performance of a distributed router. In particular, we would like to study how to perform efficient IP-address lookup. The lookup procedure could be partitioned and performed in both the incoming and the outgoing forwarding elements (FEs). Separating the procedure may result in new algorithms and data structures enabling faster lookups.

• We would like to investigate how to support network services using a distributed router. Any node, including hosts and routers, could join the distributed router anywhere in the Internet. Network services, including virtual network service, bypassing of Internet filters, and anonymity in the Internet, would be provided to the users. The research will focus on how to provide these services and how to perform internal routing in such a router.

Bibliography

[1] XORP, eXtensible Open Router Platform. Available: http://www.xorp.org/.

[2] Bifrost Network Project. Available: http://bifrost.slu.se/index.en.html.

[3] Intel Corp., "Intel IXP 2800 Network Processor Family", Hardware Reference Manual, August 2004.

[4] Intel Corp., "Intel IXP 1200 Network Processor Family", Hardware Reference Manual, December 2001.

[5] Xelerated, "Xelerator X10q Network Processors", Product Brief, October 2003.

[6] N. Shah, "Understanding Network Processors", Master's thesis, Univ. of California, Berkeley, 2001.

[7] P. S. Zuchowski, C. B. Reynolds, R. J. Grupp, S. G. Davis, B. Cremen and B. Troxel, "A hybrid ASIC and FPGA architecture", in Proc. of the 2002 IEEE/ACM International Conference on Computer-Aided Design, San Jose, California, pp. 187-194, 2002.

[8] D. E. Comer, "Network Systems Design using Network Processors", Prentice Hall, 2004.

[9] N. Shah, W. Plishker and K. Keutzer, "NP-Click: A programming model for the Intel IXP1200", in 9th International Symposium on High Performance Computer Architectures (HPCA), February 2003.

[10] L. Yang, J. Halpern, R. Gopal and R. Dantu, "ForCES Forwarding Element Model", Internet Draft, March 2006.

[11] L. Degioanni, M. Baldi, D. Buffa, F. Risso, F. Stirano and G. Varenni, "Network Virtual Machine (NetVM): A New Architecture for Efficient and Portable Packet Processing Applications", in Proc. of the 8th International Conference on Telecommunications (ConTEL 2005), pp. 153-168, Zagreb, Croatia, June 2005.

[12] M. Gries, C. Kulkarni, C. Sauer and K. Keutzer, "Exploring Trade-offs in Performance and Programmability of Processing Element Topologies for Network Processors", Network Processor Design: Issues and Practices, volume 2, Morgan Kaufmann Publishers, November 2003.

[13] L. Thiele, S. Chakraborty and M. Gries, "Design Space Exploration of Network Processor Architectures", First Workshop on Network Processors at the 8th International Symposium on High Performance Computer Architectures, February 2002.

[14] P. Crowley and J.-L. Baer, "A Modeling Framework for Network Processor Systems", First Workshop on Network Processors at the 8th International Symposium on High Performance Computer Architectures, February 2002.

[15] T. Anderson, S. Owicki, J. Saxe and C. Thacker, "High speed switch scheduling for local area networks", ACM Trans. Comput. Syst., pp. 319-352, November 1993.

[16] A. McAuley and P. Francis, "Fast Routing Table Lookups using CAMs", in Proc. of IEEE Infocom'93, vol. 3, pp. 1382-1391, San Francisco, 1993.

[17] F. Zane, G. Narlikar and A. Basu, "CoolCAMs: Power-Efficient TCAMs for Forwarding Engines", in Proc. of IEEE Infocom'03, pp. 42-52, San Francisco, May 2003.

[18] E. Spitznagel, D. Taylor and J. Turner, "Packet Classification Using Extended TCAMs", in Proc. of ICNP'03, pp. 120-131, November 2003.

[19] P. Gupta, S. Lin and N. McKeown, "Routing Lookups in Hardware at Memory Access Speeds", in Proc. of IEEE Infocom'98, pp. 1240-1247, San Francisco, April 1998.

[20] A. Moestedt and P. Sjödin, "IP Address Lookup in Hardware for High-Speed Routing", in Proceedings of Hot Interconnects VI, Stanford, 1998.

[21] E. Fredkin, "Trie memory", Communications of the ACM, pp. 490-499, 1960.

[22] M. Degermark, A. Brodnik, S. Carlsson and S. Pink, "Small Forwarding Tables for Fast Routing Lookups", in Proc. of ACM SIGCOMM '97, pp. 3-14, October 1997.

[23] V. Srinivasan and G. Varghese, "Faster IP lookups using controlled prefix expansion", in Proc. of ACM SIGMETRICS 1998, Madison, pp. 1-10, 1998.

[24] S. Nilsson and G. Karlsson, "IP-address lookup using LC-tries", IEEE Journal on Selected Areas in Communications, vol. 17, no. 6, pp. 1083-1092, June 1999.

[25] G. Narlikar and F. Zane, "Performance modeling for fast IP lookups", in Proc. of ACM SIGMETRICS 2001, pp. 1-12, 2001.

[26] R. Kawabe, S. Ata and M. Murata, "Performance Prediction Method for IP Lookup Algorithms", in Proc. of IEEE Workshop on High Performance Switching and Routing 2002, pp. 111-115, Kobe, Japan, May 2002.

[27] M. A. Ruiz-Sanchez, E. W. Biersack and W. Dabbous, "Survey and taxonomy of IP address lookup algorithms", IEEE Network Magazine, vol. 15, no. 2, pp. 8-23, March 2001.

[28] CSC, Finnish IT Center for Science, FUNET Looking Glass. Available: http://www.csc.fi/sumoi/funet/noc/looking-glass/lg.cgi.

[29] LC-trie implementation source code. Available: http://www.nada.kth.se/~snilsson/public/code/router/.

