Date posted: 11-Jan-2016
Uploaded by: norman-burke
ECE 526 – Network Processing Systems Design
Network Processor Architecture and Scalability
Chapter 13, 14: D. E. Comer
Ning Weng, ECE 526
NP Architectures
• Last class:
  ─ Key requirements of a network processor: flexibility and scalability
  ─ Optimized instruction set and parallel processing using multiprocessors
• This class:
  ─ Internal organization of NP:
    • Computation, storage and communication
    • Operating support
    • Content addressable memory (CAM)
  ─ NP scaling issues
NP Architectures
• NP architecture characteristics
  ─ Computation
    • Processor hierarchy
    • Special-purpose functional units
  ─ Storage
    • Memory hierarchy
    • Content addressable memory (CAM)
  ─ Communication
    • Internal buses
    • External interfaces
  ─ Operation support
    • Concurrent/parallel execution support
    • Programming models
    • Dispatch mechanisms
Processor Functionality
Processor Pyramid
Packet Flow through Hierarchy
• Accommodating tasks of different complexity and frequency
  ─ Low level: simple and frequent processing
  ─ High level: occasional and complex processing
• Computation scaling
  ─ Faster processor
  ─ More concurrent threads
  ─ More processors
  ─ More processor types
Memory Hierarchy
• Different memory technologies used for performance, cost and area
• Conventional approach:
  ─ Register + cache + off-chip DRAM
    • Exploiting locality: temporal and spatial
  ─ Optimized for average case
  ─ Transparent to programmer
• Network processors:
  ─ Register, scratch pad, control store, onboard RAM, CAM/TCAM, SRAM and SDRAM
  ─ Specialized for network processing applications
    • Little temporal locality
  ─ Explicit to application developer
    • Difficult to program
    • More control
  ─ Memory hierarchy is not “cached” but used explicitly
Memory Technology
• Characterized by access latency, area
  ─ SRAM: 2-10 ns, 4-6 transistors per bit
  ─ DRAM: 50-70 ns, 1 or 3 transistors per bit
• What data should be stored where?
  ─ Instruction data
  ─ Packet data: header, payload and meta-data
  ─ Temporary data: data structures allocated on the stack
  ─ Application data: persistent data, e.g., routing table, rule file
Memory Size Example
Consider a network system that processes IP datagrams. Assume the system executes 5,000 instructions per packet, each instruction occupies 4 bytes, 10% of instructions need to access a 4-byte value in memory, each datagram consists of 1500 bytes, a lookup examines 10 4-byte values on average in an IP routing table, and a datagram arrives and leaves in an Ethernet frame. Compute the total number of memory locations accessed to process one datagram. Assume no memory caching.
  ─ Instruction memory:
  ─ Packet memory:
  ─ Application memory:
  ─ Temporary memory:
  Total:
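One possible way to fill in the blanks is sketched below. It is not the course solution and depends on assumptions the slide does not state: each memory location holds 4 bytes, the Ethernet frame adds a 14-byte header (CRC and preamble ignored), and the frame is written to packet memory once on arrival and read once on departure. The function and constant names are illustrative.

```python
import math

WORD_BYTES = 4              # assumption: one memory location holds 4 bytes
ETHERNET_HEADER_BYTES = 14  # assumption: header only, no CRC or preamble


def memory_accesses(instructions=5000, data_access_percent=10,
                    datagram_bytes=1500, lookup_values=10):
    """Count memory locations touched per datagram, with no caching."""
    # One fetch per 4-byte instruction.
    instruction = instructions
    # Assumption: the whole frame is written once and read once.
    frame_words = math.ceil((datagram_bytes + ETHERNET_HEADER_BYTES) / WORD_BYTES)
    packet = 2 * frame_words
    # Routing table values examined by the lookup.
    application = lookup_values
    # Instructions that access a 4-byte value in memory.
    temporary = instructions * data_access_percent // 100
    return {
        "instruction": instruction,
        "packet": packet,
        "application": application,
        "temporary": temporary,
        "total": instruction + packet + application + temporary,
    }


counts = memory_accesses()
print(counts)
```

Under these assumptions instruction fetches dominate the count, which is why the choice of instruction memory matters so much when nothing is cached.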
Memory Scaling
• Memory access time: raw access speed
  ─ Technology dependent
  ─ Important for random access
• Memory bandwidth
  ─ Important for overall system performance
  ─ Scales with
    • Multiple ports
    • Multiple banks
    • Wider bus
  ─ Limited by
    • Pins and package cost
Content Addressable Memory
• Does not use an address to locate content
• CAM uses content as input in a query-style format
• Organized as an array of slots
• Combination of mechanisms
  ─ Random access storage
  ─ Exact-match pattern search
• Rapid search enabled with parallel hardware
Lookup using Conventional CAM
• Given
  ─ Pattern for which to search
  ─ Known as key
• CAM returns
  ─ First slot that matches key or
  ─ All slots that match key
• Algorithm
for each slot do {
    if (key == slot) {
        declare key matches slot;
    } else {
        declare key does not match slot;
    }
}
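The exact-match algorithm above can be modeled in software as a minimal sketch. The hardware compares all slots in parallel, whereas this model scans them one by one. The function name `cam_lookup`, the `first_only` flag and the slot contents are illustrative assumptions, not part of the slides.

```python
def cam_lookup(slots, key, first_only=True):
    """Return the first matching slot index, or all matching indices.

    A sequential software model of the parallel exact-match comparison.
    """
    matches = [index for index, slot in enumerate(slots) if slot == key]
    if first_only:
        return matches[0] if matches else None
    return matches


# Hypothetical slot contents (32-bit values).
slots = [0x0A000001, 0xC0A80001, 0x0A000001]

print(cam_lookup(slots, 0x0A000001))                    # first match
print(cam_lookup(slots, 0x0A000001, first_only=False))  # all matches
print(cam_lookup(slots, 0xDEADBEEF))                    # no match
```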
Ternary CAM (TCAM)
• Regular CAM
  ─ Binary value: 0 and 1
  ─ Requires key to match all the content in one slot
  ─ Not flexible
• TCAM
  ─ Ternary value: 0, 1 and don’t care
  ─ Implemented using masking of entries
• Good for network processor flow classification
TCAM Lookup
• Each slot has a bit mask
• Hardware uses the mask to decide which bits to test
• Algorithm
for each slot do {
    if ((key & mask) == (slot & mask)) {
        declare key matches slot;
    } else {
        declare key does not match slot;
    }
}
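The masked comparison can be tried out with a small software model. This is a sketch under assumptions: each slot is stored as a `(value, mask)` pair, a 0 bit in the mask means "don't care", and the first matching slot wins. The name `tcam_lookup` and the example entries are hypothetical.

```python
def tcam_lookup(slots, key):
    """Return the index of the first slot whose masked bits equal the key's.

    Each slot is a (value, mask) pair; mask bits set to 0 are "don't care".
    """
    for index, (value, mask) in enumerate(slots):
        if (key & mask) == (value & mask):
            return index
    return None


# Hypothetical entries, ordered from most to least specific.
slots = [
    (0x0A010000, 0xFFFF0000),  # upper 16 bits must equal 0x0A01
    (0x0A000000, 0xFF000000),  # upper 8 bits must equal 0x0A
    (0x00000000, 0x00000000),  # every bit is "don't care": matches any key
]

print(tcam_lookup(slots, 0x0A010203))
print(tcam_lookup(slots, 0x0A020304))
print(tcam_lookup(slots, 0xC0A80001))
```

Because the first match wins in this model, the order of the entries acts as a priority: more specific patterns go in earlier slots.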
Partial Matching using TCAM
• Key matched slot 1
• Packet belonging to flow ID: 00.02
• Here “additional information” stored in each slot
Classification using TCAM
• Flexibility: “additional information” stored in separate memory
• Extracting values from fields in headers
• Forming values in contiguous string
• Using a key for TCAM lookup
• Storing classification in slot
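The classification steps can be illustrated end to end with a short sketch. The choice of header fields (source IP, destination IP, protocol, destination port), the key layout, the rules and the class names are all assumptions made for the example. The classification is kept in a separate list indexed by slot number, standing in for the separate memory.

```python
def build_key(src_ip, dst_ip, protocol, dst_port):
    """Pack header fields into one contiguous 88-bit integer key.

    Assumed layout, high to low: src_ip (32) | dst_ip (32) | protocol (8) | dst_port (16).
    """
    return (src_ip << 56) | (dst_ip << 24) | (protocol << 16) | dst_port


# Hypothetical TCAM contents: (value, mask) pairs built with the same layout.
tcam = [
    # TCP (protocol 6) to port 80, any addresses.
    (build_key(0, 0, 6, 80), build_key(0, 0, 0xFF, 0xFFFF)),
    # UDP (protocol 17) from 10.0.0.0/8, any destination and port.
    (build_key(0x0A000000, 0, 17, 0), build_key(0xFF000000, 0, 0xFF, 0)),
]

# "Additional information" held in separate memory, indexed by slot number.
classification = ["web", "internal-udp"]


def classify(src_ip, dst_ip, protocol, dst_port):
    """Extract fields, form the key, and return the class of the first match."""
    key = build_key(src_ip, dst_ip, protocol, dst_port)
    for index, (value, mask) in enumerate(tcam):
        if (key & mask) == (value & mask):
            return classification[index]
    return "default"


print(classify(0xC0A80001, 0x08080808, 6, 80))
print(classify(0x0A010203, 0x08080808, 17, 53))
print(classify(0xC0A80001, 0x08080808, 17, 53))
```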
Communication
• Internal interfaces: channels between processing elements, memories
  ─ Internal bus
  ─ Hardware FIFO: sequential access
  ─ Transfer register: random access
  ─ Onboard shared memory: shared random access
• External interfaces
  ─ Memory interfaces: accesses to larger off-chip memory
  ─ Direct I/O interfaces: e.g., access to link interfaces
  ─ Bus interfaces: accesses to other devices, e.g., control CPU
  ─ Switching fabric interface
    • Access to switching fabric
    • Several standards (e.g., CSIX by NP Forum)
Communication Cost Example
• Consider a second-generation network system that forwards IP datagrams. The system has 16 interfaces, each connected to an OC-192 line (data rate is 10 Gbps). These 16 interfaces are interconnected with a shared communication channel. The packet size is in the range of 40 bytes to 1500 bytes. What aggregate bandwidth is needed on the communication channel for the two design scenarios:
  ─ Every bit of a packet transfers through the shared communication channel.
  ─ Only a 4-byte packet memory address transfers through the shared communication channel.
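A hedged sketch of the arithmetic for both scenarios follows. It assumes that every interface receives at the full line rate, that each packet (or its address) crosses the shared channel exactly once, and that framing overhead is ignored. The function names are illustrative.

```python
LINE_RATE_BPS = 10_000_000_000  # 10 Gbps per interface, as given
INTERFACES = 16
ADDRESS_BYTES = 4


def full_packet_bandwidth():
    """Scenario 1: every bit of every packet crosses the channel once."""
    return INTERFACES * LINE_RATE_BPS


def address_only_bandwidth(packet_bytes):
    """Scenario 2: only a 4-byte address crosses the channel per packet."""
    packets_per_second = INTERFACES * LINE_RATE_BPS / (8 * packet_bytes)
    return packets_per_second * 8 * ADDRESS_BYTES


print(full_packet_bandwidth() / 1e9, "Gbps")
print(address_only_bandwidth(40) / 1e9, "Gbps with 40-byte packets")
print(address_only_bandwidth(1500) / 1e9, "Gbps with 1500-byte packets")
```

Under these assumptions the first design needs 160 Gbps. The second needs at most 16 Gbps, reached with minimum-size 40-byte packets, and about 0.43 Gbps when all packets are 1500 bytes, so the channel must be sized for the small-packet case.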
NP Operating Support
• Programming model: interrupt, event vs. thread based
• Parallel and concurrent execution support
• Dispatch mechanism: how threads are initiated
Summary
• NP scaling by
  ─ Heterogeneous multiprocessors structured hierarchically
  ─ Mixed memory technologies explicitly available to programmer
  ─ Different communication mechanisms
  ─ Operating support important to achieve high system performance
• NP scaling limited by
  ─ Physical space: chip area (less than 400 mm²)
  ─ Pin limits and packaging technology
  ─ Power consumption and heat dissipation
For Next Class and Reminder
• Read Comer: chapters 15 and 16
• Homework solution on-line by Friday
• Midterm: 10/6
• Project
  ─ topic finalized 10/5 (group leader email me)
  ─ proposal presentation 10/22