A Methodology for Evaluating Runtime Support in Network Processors

Post on 22-Jan-2016


Department of Electrical and Computer Engineering

University of Massachusetts, Amherst Xin Huang and Tilman Wolf

{xhuang,wolf}@ecs.umass.edu

A Methodology for Evaluating Runtime Support in Network Processors


Runtime Support in Network Processor

Network processor (NP)
• Multi-core system-on-chip
• Programmability & high packet processing rate

Heterogeneous resources
• Control processors
• Multiple packet processors
• Co-processors
• Memory hierarchy
• Interconnection

Runtime support
• Dynamic task allocation

[Figure: IXP 2800 block diagram with receive and transmit units, scratchpad, hash unit, 16 microengines (μE), SRAM and DRAM interfaces, and XScale control processor]


General Operation of Runtime Support in NP

Input
• Hardware resources
• Workload

Mapping method

Output
• Task allocation

Dynamic adaptation
• Different runtime support systems
• Difficult to compare

[Figure: the NP hardware resources (the IXP 2800 block diagram plus SRAM, SDRAM, flash, and memory-mapped I/O) and the workload (applications AP1, AP2, AP3) feed the runtime mapping, which produces the task allocation on the processors]


Contributions

Evaluation methodology
• Traffic representation
• Analytical system model based on queuing networks
• Results

Specific: three example runtime support systems

I. Ideal Allocation

II. Full Processor Allocation
• R. Kokku, T. Riche, A. Kunze, J. Mudigonda, J. Jason, and H. Vin. A case for run-time adaptation in packet processing systems. In Proc. of the 2nd Workshop on Hot Topics in Networks (HOTNETS-II), Cambridge, MA, Nov. 2003

III. Partitioned Application Allocation
• T. Wolf, N. Weng, and C.-H. Tai. Design considerations for network processor operating systems. In Proc. of ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), pages 71-80, Princeton, NJ, Oct. 2005


Outline

Introduction

Evaluation Methodology
• Dynamic Workload Model
• Runtime System Model

Results

Summary


Workload

NP workload is characterized by applications and traffic

How to represent workload?


Dynamic Workload Model

Workload graph: $W = (T, U)$
• Application/Task: $T$
• Traffic: $U_{t,t'} \in \mathbb{R}$
• Processing requirement: $D(t_i)$

Example: [Figure: example workload graph]

Processing requirement:
• R. Ramaswamy and T. Wolf. PacketBench: A tool for workload characterization of network processing. In Proc. of IEEE 6th Annual Workshop on Workload Characterization (WWC-6), pages 42-50, Austin, TX, Oct. 2003
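The workload graph can be held in a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `WorkloadGraph`, its method names, and the units (instructions per packet for the processing requirement, packets per second for traffic) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WorkloadGraph:
    """Workload graph W = (T, U): tasks T plus traffic U on directed edges."""

    # Processing requirement D(t_i) per task, in instructions per packet
    # (the unit is an assumption of this sketch).
    processing: dict[str, float] = field(default_factory=dict)
    # Traffic on each edge (t, t'), in packets per second (assumed unit).
    traffic: dict[tuple[str, str], float] = field(default_factory=dict)

    def add_task(self, task: str, instructions_per_packet: float) -> None:
        self.processing[task] = instructions_per_packet

    def add_traffic(self, src: str, dst: str, packets_per_second: float) -> None:
        if src not in self.processing or dst not in self.processing:
            raise KeyError("both endpoints must already be tasks in T")
        self.traffic[(src, dst)] = packets_per_second

    def arrival_rate(self, task: str) -> float:
        """Total packet rate entering a task over its incoming edges."""
        return sum(rate for (_, dst), rate in self.traffic.items() if dst == task)

    def demand(self, task: str) -> float:
        """Processing demand of a task in instructions per second."""
        return self.arrival_rate(task) * self.processing[task]
```

A source task with zero processing cost (for example a hypothetical `"rx"` node) can stand in for the packet input, so that every real task has an incoming edge. Comparing `demand(task)` against a processor's instruction rate then gives the share of a processor that the task needs.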



Runtime System Model

Unified approach for all runtime systems
• Queuing networks
• Specific solution for each runtime system

• Runtime mapping: $M_t : T_t \to P$
• Graph: $S = (P, Q)$
• Packet arrival rate: $\lambda^{t}_{i,j}$
• Service time: $D(t_i, p_j)$

Metrics for all runtime systems
• Processor utilization: $\rho$
• Average number of packets in the system: $\bar{K}$


Three Example Runtime Support Systems

System I: Ideal Allocation
System II: Full Processor Allocation
System III: Partitioned Application Allocation

[Figure: a workload of tasks T1 and T2 mapped onto the processors under each system. Ideal Allocation: every processor runs T1 & T2. Full Processor Allocation: each processor runs either T1 or T2. Partitioned Application Allocation: the tasks are split into subtasks T1_1 to T1_4 and T2_1 to T2_4, which are spread across the processors.]


Example Evaluation Model – System I

Ideal Allocation
• All processors can process all packets completely
• Unrealistic, but provides a baseline

M/G/m FCFS single station


M/G/m Single Station Queuing System

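Cosmetatos approximation:

$$\bar{W}_{M/G/m} \approx c_B^2 \, \bar{W}_{M/M/m} + (1 - c_B^2) \, \bar{W}_{M/D/m},$$

where

$$\bar{W}_{M/M/m} = \frac{P_m}{m\mu(1-\rho)}; \quad P_m = \frac{(m\rho)^m}{m!\,(1-\rho)} \, \pi_0; \quad \pi_0 = \left[ \sum_{k=0}^{m-1} \frac{(m\rho)^k}{k!} + \frac{(m\rho)^m}{m!} \cdot \frac{1}{1-\rho} \right]^{-1},$$

and

$$\bar{W}_{M/D/m} = \frac{1}{2} \cdot \frac{1}{nc_{Dm}} \, \bar{W}_{M/M/m}; \quad nc_{Dm} = \left[ 1 + (1-\rho)(m-1) \frac{\sqrt{4+5m} - 2}{16 \rho m} \right]^{-1}$$

Evaluation metrics:

$$\rho = \frac{\lambda}{m\mu}; \quad \bar{K} = \lambda \bar{W}_{M/G/m} + m\rho$$

G. Cosmetatos. Some Approximate Equilibrium Results for the Multiserver Queue (M/G/r). Operational Research Quarterly, pages 615-620, 1976

G. Bolch, S. Greiner, H. de Meer, and K. S. Trivedi. Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications. John Wiley & Sons, Inc., New York, NY, August 1998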
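The M/G/m metrics can be computed directly. The sketch below follows the textbook form of the Cosmetatos approximation given by Bolch et al.: the M/G/m waiting time is interpolated between the M/M/m and M/D/m waiting times using the squared coefficient of variation of the service time. The function and parameter names (`mmm_waiting_time`, `mgm_metrics`, `lam`, `mu`, `cb2`) are assumptions chosen for this example, not names from the presentation.

```python
import math


def mmm_waiting_time(lam: float, mu: float, m: int) -> float:
    """Mean time a packet waits in the queue of an M/M/m station."""
    rho = lam / (m * mu)
    if not 0.0 <= rho < 1.0:
        raise ValueError("unstable station: need 0 <= lam < m * mu")
    a = m * rho
    tail = a**m / (math.factorial(m) * (1.0 - rho))
    pi0 = 1.0 / (sum(a**k / math.factorial(k) for k in range(m)) + tail)
    p_wait = tail * pi0  # P_m: probability that an arriving packet must wait
    return p_wait / (m * mu * (1.0 - rho))


def mgm_metrics(lam: float, mu: float, m: int, cb2: float) -> tuple[float, float]:
    """Return (utilization, mean packets in system) for an M/G/m station.

    lam: packet arrival rate, mu: service rate of one processor,
    m: number of processors, cb2: squared coefficient of variation
    of the service time (1.0 is exponential, 0.0 is deterministic).
    """
    if lam <= 0.0:
        return 0.0, 0.0  # an idle station holds no packets
    rho = lam / (m * mu)
    w_mmm = mmm_waiting_time(lam, mu, m)
    # Correction factor that turns the M/M/m result into an M/D/m estimate.
    nc = 1.0 / (
        1.0 + (1.0 - rho) * (m - 1) * (math.sqrt(4.0 + 5.0 * m) - 2.0) / (16.0 * rho * m)
    )
    w_mdm = 0.5 * w_mmm / nc
    w_mgm = cb2 * w_mmm + (1.0 - cb2) * w_mdm
    # Little's law: packets waiting plus packets in service.
    k_mean = lam * w_mgm + m * rho
    return rho, k_mean
```

Two sanity checks are easy to reason about: with `cb2 = 1.0` the result reduces to the exact M/M/m value, and with `m = 1` and `cb2 = 0.0` it matches the Pollaczek-Khinchine result for M/D/1.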


Example Evaluation Model – System II

Full Processor Allocation
• Allocate entire tasks to subsets of processors
• Allocate as few processors as possible to save power
• Each processor runs one type of task
• Reallocation is triggered by queue length

BCMP M/M/1-FCFS model (Jackson network)
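One way to picture "as few processors as possible" is to size each task's processor subset so that the expected queue stays under a bound. The sketch below is an illustration under stated assumptions, not the allocation algorithm of the cited system: it treats every processor as an independent M/M/1 queue, splits a task's traffic evenly across its processors, and uses a hypothetical threshold `k_max` on the mean number of packets per processor in place of the queue-length trigger named on the slide.

```python
import math


def processors_needed(lam: float, mu: float, k_max: float) -> int:
    """Smallest processor count that keeps the mean packets per processor <= k_max.

    lam: packet arrival rate of the task, mu: service rate of one processor.
    For M/M/1 the mean number in system is rho / (1 - rho), so the bound
    k_max translates into a utilization cap rho_max = k_max / (1 + k_max).
    """
    if lam < 0.0 or mu <= 0.0 or k_max <= 0.0:
        raise ValueError("need lam >= 0, mu > 0, and k_max > 0")
    rho_max = k_max / (1.0 + k_max)
    # Every task keeps at least one processor so that it can run at all.
    return max(1, math.ceil(lam / (mu * rho_max)))


def allocate(tasks: dict[str, tuple[float, float]], total: int, k_max: float) -> dict[str, int]:
    """Map each task name to a processor count; tasks maps name -> (lam, mu)."""
    plan = {name: processors_needed(lam, mu, k_max) for name, (lam, mu) in tasks.items()}
    if sum(plan.values()) > total:
        raise RuntimeError("workload does not fit on the available processors")
    return plan
```

Rerunning `allocate` whenever measured traffic changes mimics reallocation at whole-processor granularity: the count for a task jumps by one each time its load crosses a threshold.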


BCMP Network

BCMP: Baskett, Chandy, Muntz, and Palacios

Characteristics:
• Open, closed, and mixed queuing networks
• Several job classes
• Four types of nodes: M/M/m-FCFS (class-independent service time), M/G/1-PS, M/G/∞-IS, and M/G/1-LCFS PR

Product-form steady-state solution:

$$\pi(S_1, \ldots, S_N) = \frac{1}{G(K)} \, d(S) \prod_{i=1}^{N} f_i(S_i)$$

Open M/M/1-FCFS BCMP queuing network:

$$\pi(k_1, \ldots, k_N) = \prod_{i=1}^{N} \pi_i(k_i), \qquad \pi_i(k_i) = (1 - \rho_i) \, \rho_i^{k_i}$$

• Evaluation metrics:

$$\rho_i = \sum_{r=1}^{C} \rho_{ir} = \sum_{r=1}^{C} \frac{\lambda_r e_{ir}}{\mu_{ir}}, \qquad \bar{K}_i = \sum_{r=1}^{C} \bar{K}_{ir} = \frac{\rho_i}{1 - \rho_i}$$

F. Baskett, K. Chandy, R. Muntz, and F. Palacios. Open, Closed, and Mixed Networks of Queues with Different Classes of Customers. Journal of the ACM, 22(2): 248-260, April 1975
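For an open network of M/M/1-FCFS nodes, the product form means each node can be evaluated on its own. The sketch below uses the standard textbook relations: a node's utilization is the sum over job classes of arrival rate times visit ratio divided by service rate, its mean population is rho / (1 - rho), and a joint state probability is the product of per-node geometric terms. The function names and the list-based data layout are assumptions of this example.

```python
def node_metrics(
    class_rates: list[float],
    visit_ratios: list[list[float]],
    service_rates: list[list[float]],
) -> list[tuple[float, float]]:
    """Return (utilization, mean packets) for every node of an open network.

    class_rates[r] is the external arrival rate of class r,
    visit_ratios[i][r] is the visit ratio of class r at node i, and
    service_rates[i][r] is the service rate of class r at node i
    (for FCFS nodes these rates are the same for every class).
    """
    results = []
    for e_i, mu_i in zip(visit_ratios, service_rates):
        rho_i = sum(lam * e / mu for lam, e, mu in zip(class_rates, e_i, mu_i))
        if rho_i >= 1.0:
            raise ValueError("node is overloaded: utilization must stay below 1")
        results.append((rho_i, rho_i / (1.0 - rho_i)))
    return results


def state_probability(utilizations: list[float], packets: list[int]) -> float:
    """Steady-state probability of seeing packets[i] packets at node i."""
    prob = 1.0
    for rho, k in zip(utilizations, packets):
        prob *= (1.0 - rho) * rho**k  # geometric marginal of one M/M/1 node
    return prob
```

Summing the second element of each tuple gives the total average number of packets in the system, which is the quantity compared across the runtime systems in the results.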


Example Evaluation Model – System III

Partitioned Application Allocation
• Tasks are partitioned across multiple processors
• Synchronized pipelines
• Allocate tasks equally across all processors to maximize throughput
• Reallocate at fixed time intervals

Equations for evaluation metrics are the same as for System II.

BCMP M/M/1-FCFS model (Jackson network)



Setup

System
• 16 processing engines at 100 MIPS each
• Queue lengths are infinite

Workload

Other assumptions
• Partition applications into 7-15 subtasks


Processor Allocation Over Time

Ideal:
• 16 processors

Full Processor:
• Changes with traffic

Partitioned Application:
• 16 processors

[Figure: processor allocation over time for the full processor allocation system]


Processor Utilization Over Time

Ideal:
• Lowest processor utilization

Full Processor:
• Highest processor utilization because it uses fewer processors

Partitioned Application:
• Low processor utilization
• Not equal to the ideal case due to unbalanced task allocation and pipeline overhead


Packets in System Over Time

Ideal:
• Fewest packets

Full Processor:
• Packets queue up due to its high processor utilization

Partitioned Application:
• Most packets, due to unbalanced task allocation and pipeline overhead
• More stable performance because of finer processor allocation granularity


Performance for Different Data Rates

Ideal:
• Smooth increase

Full Processor:
• Periodic peaks

Partitioned Application:
• Smooth increase

Maximum data rate supported by the systems
• Ideal: 100%
• Full Processor: 79.6%
• Partitioned Application: 75.1%


Implication of the Results

Ideal Allocation
• Provides a baseline

Full Processor Allocation
• Allocates as few processors as possible to save power
• Uses an entire processor as the allocation granularity
• Good: high processor utilization
• Bad: high performance variance

Partitioned Application Allocation
• Distributes tasks equally across all processors
• Finer processor allocation granularity
• Good: stable performance
• Bad: difficult to find an optimized solution => pipeline synchronization overhead


Summary

Analytical methodology for evaluating different runtime support systems for NPs

Dynamic workload model and runtime system model

Results: three example runtime support systems
• Quantitative metrics
• Tradeoffs


Questions?