An Introduction to Smart NICs - cs.cornell.edu

Date post: 07-Jan-2022
An Introduction to Smart NICs and their use cases Mina Tahmasbi Arashloo Cornell University Fall 2019
Transcript
Page 1: An Introduction to Smart NICs - cs.cornell.edu

An Introduction to Smart NICs

and their use cases

Mina Tahmasbi Arashloo

Cornell University

Fall 2019

Page 2: An Introduction to Smart NICs - cs.cornell.edu

What is a smart NIC?

Page 3: An Introduction to Smart NICs - cs.cornell.edu

What is a (dumb) NIC?

Page 4: An Introduction to Smart NICs - cs.cornell.edu

What is a (dumb) NIC?

Network Interface Card

Any packet from the end host to the network and vice versa goes through the NIC

[Figure: the NIC sits between the end host and the network.]

Implements:
• the physical layer (L1)
• (part of) the data link layer (L2)

Page 5: An Introduction to Smart NICs - cs.cornell.edu

What is a (dumb) NIC?

[Figure: end-host protocol stack (Physical, Data Link, Network, Transport, Application), with the NIC and the CPU marked alongside it.]

Page 6: An Introduction to Smart NICs - cs.cornell.edu

What is a (dumb) NIC?

On transmit (egress)

• The host CPU generates packets on application request

• Packets are sent to the NIC over PCIe

• The NIC transforms packets to bits and sends them over the link

Page 7: An Introduction to Smart NICs - cs.cornell.edu

What is a (dumb) NIC?

On receive (ingress)

• The NIC turns bits into packets

• Packets are sent to the host CPU over PCIe

• The host CPU processes packets and delivers them to applications

Page 8: An Introduction to Smart NICs - cs.cornell.edu

What is a (dumb) NIC?

Great division of labor!

1. NIC: fixed-function hardware. Simple; does not change often.
2. CPU: general-purpose processor running software. Complicated; changes frequently.
3. The CPU could keep up with the NIC, i.e., process packets at line rate in a reasonable number of cycles.

Page 9: An Introduction to Smart NICs - cs.cornell.edu

What is a (dumb) NIC?

Not so great anymore

1. NIC: fixed-function hardware. Simple; does not change often.
2. CPU: general-purpose processor running software. Complicated; changes frequently.
3. The CPU could keep up with the NIC, i.e., process packets at line rate in a reasonable number of cycles. (This is the assumption that no longer holds, as the following slides argue.)

Page 10: An Introduction to Smart NICs - cs.cornell.edu

Limits of Software Packet Processing

[Plot: single-core throughput (Mpps) vs. # CPU cycles per packet, with the line rate marked for max-sized packets and for min-sized packets.]
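
To make the two line-rate marks concrete, here is a back-of-the-envelope sketch. The numbers are not from the slides: it assumes standard Ethernet framing (64-byte minimum and 1518-byte maximum frames, plus 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap per frame on the wire) and a hypothetical 3 GHz core. The function names are made up for illustration.

```python
# Assumption: standard Ethernet framing. Every frame also occupies
# 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap on the wire.
WIRE_OVERHEAD_BYTES = 20
MIN_FRAME_BYTES = 64
MAX_FRAME_BYTES = 1518


def line_rate_pps(link_gbps, frame_bytes):
    """Packets per second needed to saturate the link with frames of this size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire


def cycle_budget(link_gbps, frame_bytes, core_ghz=3.0):
    """CPU cycles a single core may spend per packet and still keep up.

    core_ghz is a hypothetical clock frequency, not a measured value.
    """
    return core_ghz * 1e9 / line_rate_pps(link_gbps, frame_bytes)


if __name__ == "__main__":
    for gbps in (10, 40, 100):
        for size in (MIN_FRAME_BYTES, MAX_FRAME_BYTES):
            mpps = line_rate_pps(gbps, size) / 1e6
            budget = cycle_budget(gbps, size)
            print(f"{gbps:>3} Gbps, {size:>4} B frames: "
                  f"{mpps:7.2f} Mpps, {budget:8.1f} cycles/packet")
```

Under these assumptions, min-sized packets at 10 Gbps leave a single core roughly 200 cycles per packet, and at 100 Gbps roughly 20, which is why the min-sized line-rate mark is the harder one to reach.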

Page 11: An Introduction to Smart NICs - cs.cornell.edu

Limits of Software Packet Processing

[Plot: single-core throughput (Mpps) vs. # CPU cycles per packet, with the line rate marked. Highlighted region: line rate for all packet sizes.]

Page 12: An Introduction to Smart NICs - cs.cornell.edu

Limits of Software Packet Processing

[Plot: single-core throughput (Mpps) vs. # CPU cycles per packet, with the line rate marked. Highlighted region: line rate for some packet sizes.]

Page 13: An Introduction to Smart NICs - cs.cornell.edu

Limits of Software Packet Processing

[Plot: single-core throughput (Mpps) vs. # CPU cycles per packet, with the line rate marked. Highlighted region: not line rate for any packet size.]

Page 14: An Introduction to Smart NICs - cs.cornell.edu

Limits of Software Packet Processing

[Plot: single-core throughput (Mpps) vs. # CPU cycles per packet, with the line rate marked.]

Page 15: An Introduction to Smart NICs - cs.cornell.edu

Limits of Software Packet Processing

[Plot: single-core throughput (Mpps) vs. # CPU cycles per packet, with the line rate marked.]

Line rate has been increasing

Page 16: An Introduction to Smart NICs - cs.cornell.edu

Limits of Software Packet Processing

[Plot: single-core throughput (Mpps) vs. # CPU cycles per packet, with the line rate marked.]

Line rate has been increasing

If CPUs stayed the same, we could do less per-packet processing at line rate

Page 17: An Introduction to Smart NICs - cs.cornell.edu

Limits of Software Packet Processing

[Plot: single-core throughput (Mpps) vs. # CPU cycles per packet, with the line rate marked.]

But CPUs have been getting better too!

Page 18: An Introduction to Smart NICs - cs.cornell.edu

Limits of Software Packet Processing

[Plot: single-core throughput (Mpps) vs. # CPU cycles per packet, with the line rate marked.]

Line rate is increasing to 100-400 Gbps

Page 19: An Introduction to Smart NICs - cs.cornell.edu

Limits of Software Packet Processing

[Plot: single-core throughput (Mpps) vs. # CPU cycles per packet, with the line rate marked.]

But CPUs are not getting better as fast

Physical limits of semiconductor technology = end of Moore’s law and Dennard scaling

Page 20: An Introduction to Smart NICs - cs.cornell.edu

Solution?

A programmable, domain-specific hardware processor

• Domain-specific: as opposed to general-purpose CPUs, can be optimized for network processing
• Programmable: operators can decide what part of packet processing is offloaded to hardware and how
• Hardware processor: takes over part (or even all) of the packet processing that is currently done by CPUs

Page 21: An Introduction to Smart NICs - cs.cornell.edu

Solution?

A programmable, domain-specific hardware processor

On the NIC!

Co-location with the NIC provides extra benefits!

Page 22: An Introduction to Smart NICs - cs.cornell.edu

So, what is a smart NIC?

[Figure: the NIC sits between the end host and the network.]

You can think of it as a dumb NIC + a programmable, domain-specific hardware processor

Page 23: An Introduction to Smart NICs - cs.cornell.edu

So, what is a smart NIC?

[Figure: end-host protocol stack (Physical, Data Link, Network, Transport, Application), split between the dumb NIC (fixed-function hardware) and the CPU (general-purpose processor running software).]

Page 24: An Introduction to Smart NICs - cs.cornell.edu

So, what is a smart NIC?

[Figure: the same protocol stack with a smart NIC and the CPU; the split between them is marked "?".]

Page 25: An Introduction to Smart NICs - cs.cornell.edu

A Closer Look at the Hardware

Current popular options:

1. Field-programmable gate arrays (FPGAs)
2. Multi-core systems on chip (SoCs)

Page 26: An Introduction to Smart NICs - cs.cornell.edu

Field Programmable Gate Arrays

[Figure: FPGA layout showing configurable logic blocks (CLBs), embedded memory or block RAM (BRAM), and I/O blocks.]

• An FPGA is a collection of small configurable logic and memory blocks

• Programmers can write code to assemble these blocks to perform their desired processing

Page 27: An Introduction to Smart NICs - cs.cornell.edu

Field Programmable Gate Arrays

Why is an FPGA a popular hardware choice for smart NICs?

• FPGA hardware resources (logic and memory) can be highly customized for the intended computation

• Great fit for highly-parallelizable computation

Page 28: An Introduction to Smart NICs - cs.cornell.edu

Multi-Core Systems on Chip

• A “small” computer on a single chip

• Includes (light-weight) processing cores and a memory hierarchy

• Why is it a popular hardware choice for smart NICs?

• Programming model is close to software

• Cores (and the architecture) can be specialized for network processing

Page 29: An Introduction to Smart NICs - cs.cornell.edu

FPGAs vs Multi-Core SoCs for Network Processing

Hardware architecture
• FPGAs: reconfigurable hardware, and can therefore be highly customized for the intended packet processing
• Multi-core SoCs: the cores’ instruction set and memory architecture are fixed, and are therefore less customizable

Programming model
• FPGAs: hardware description languages (e.g., Verilog) → harder to program
• Multi-core SoCs: C-like languages → easier to program

Performance *
• FPGAs: higher throughput, lower latency
• Multi-core SoCs: lower throughput, higher latency

* For most kinds of network processing

Page 30: An Introduction to Smart NICs - cs.cornell.edu

What are smart NICs used for?

• Acceleration across the stack

• Hypervisor vSwitch: AccelNet (NSDI’18)

• Scheduling: PIEO (SIGCOMM’19), Loom (NSDI’19)

• Network functions: ClickNP (SIGCOMM’16), FlowBlaze (NSDI’19)

• Transport: Tonic (NSDI’20)

• Even applications: iPipe (SIGCOMM’19), KV-Direct (SOSP’17), Bing web search ranking (ISCA’14)

• Optimizing network I/O

• e.g., smart steering of packets to cores (FlexNIC, ASPLOS’16)

Page 31: An Introduction to Smart NICs - cs.cornell.edu

What are smart NICs used for?

The catch? Resource constraints, both for computation and memory!

Page 32: An Introduction to Smart NICs - cs.cornell.edu

Enabling Programmable Transport Protocols on High-Speed NICs

Mina Tahmasbi Arashloo¹, Alexey Lavrov¹, Manya Ghobadi², Jennifer Rexford¹, David Walker¹, and David Wentzlaff¹

¹ Princeton University, ² MIT

Page 33: An Introduction to Smart NICs - cs.cornell.edu

The Transport Layer

[Figure: the network stack. Apps 1 to N each call send-data(addr, length) into the transport layer, which sits above "IP and Below" and outputs (flow id, segment address).]

The transport layer keeps per-flow state for flows 1 to m:
• Byte status: sent, in-flight, lost, …
• Credit

Page 34: An Introduction to Smart NICs - cs.cornell.edu

The Transport Layer

Transport Logic
• Credit Management: How many bytes can I send?
• Segment Selection: Which bytes do I send?

Page 35: An Introduction to Smart NICs - cs.cornell.edu

Overview

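[Figure: overview. On the host: the application layer, memory, and the host part of the transport layer (connection management). On the NIC: the NIC part of the transport layer (data transfer), containing the transport logic (Tonic), followed by the IP layer and below and the outgoing link.]

Requests from the host to the NIC:
• add/remove connection
• send N bytes from memory address A

Transport logic (Tonic):
• Credit Management
• Segment Selection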

Page 36: An Introduction to Smart NICs - cs.cornell.edu

Overview

[Figure: the same overview, with two additions: a "Next Segment" arrow out of the transport logic (Tonic) and a DMA path between host memory and the NIC.]

Requests from the host to the NIC:
• add/remove connection
• send N bytes from memory address A

Page 37: An Introduction to Smart NICs - cs.cornell.edu

Challenges of Implementing Transport Logic on High-Speed NICs

• Timing constraints
  • Median packet size in data centers is 200 bytes
  • At 100 Gbps, one 128-byte packet every ~10 ns
  • Back-to-back stateful event processing

• Memory constraints
  • A few megabytes of high-speed memory
  • More than a thousand active flows
  • A few kilobits of per-flow state
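
The arithmetic behind these constraints can be checked with a short sketch. The 128-byte packet, the 100 Gbps link, and the "more than a thousand flows" come from the slide; the 4,000 bits of per-flow state is a hypothetical stand-in for "a few kilobits", and the function names are made up for illustration.

```python
def packet_time_ns(packet_bytes, link_gbps):
    """Time one packet occupies the link, ignoring framing overhead.

    bits / (Gbit/s) comes out directly in nanoseconds.
    """
    return packet_bytes * 8 / link_gbps


def flow_state_bytes(num_flows, bits_per_flow):
    """Total on-NIC memory needed for per-flow state, in bytes."""
    return num_flows * bits_per_flow / 8


if __name__ == "__main__":
    # Timing: one 128-byte packet at 100 Gbps.
    print(f"{packet_time_ns(128, 100):.2f} ns per packet")

    # Memory: 1,000 flows, assuming (hypothetically) 4,000 bits of state each.
    total = flow_state_bytes(1000, 4000)
    print(f"{total / 1e6:.2f} MB of per-flow state")
```

With these inputs the packet time is 10.24 ns, matching the slide's "~10 ns", and the assumed per-flow state for a thousand flows already takes half a megabyte out of the few megabytes available.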

Page 38: An Introduction to Smart NICs - cs.cornell.edu

Challenges of Implementing Transport Logic on High-Speed NICs

Tonic

• A programmable hardware architecture
• running at 100 Gbps
• within the memory limits of commodity NICs
• to implement transport logic
• with modest development effort

Page 39: An Introduction to Smart NICs - cs.cornell.edu

Main Observation

Common transport patterns as reusable components

• drive the design of an efficient hardware “template” for transport logic

• reduce the functionality users must specify

Page 40: An Introduction to Smart NICs - cs.cornell.edu

Tonic: The Two Engines

Segment Selection
• Generates segment IDs for active flows

Credit Management
• Queues up the generated segment IDs
• Sends them out based on each flow’s credit

[Figure: the two engines are connected by signals labeled "flow ID, segment ID" and "segment transmitted".]

Page 41: An Introduction to Smart NICs - cs.cornell.edu

Segment Selection Patterns

Segment Selection

(reliable delivery)

Pick Bytes for Next Segment

Update Byte Status

Cannot maintain per-byte state on the NIC

Page 42: An Introduction to Smart NICs - cs.cornell.edu

Segment Selection Patterns

Segment Selection

(reliable delivery)

Select Next Segment

Update Segment Status

Pre-Calculate Segment Boundaries

Tonic

1. Only a few bits of state per segment
  • acked, rtxed (retransmitted), lost
  • fixed-function modules for common state updates
  • programmable modules only for loss detection

2. Loss detection: acks and timeouts
  • only two programmable modules
  • mutually exclusive → fewer concurrent state updates

3. Lost segments first, new segments next
  • fixed-function module for segment generation
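
One way to picture these three observations is the software sketch below. It is not Tonic's hardware design: the class name `FlowSegments`, its methods, and the exact meaning given to the rtxed bit (set once a lost segment has been handed out again) are assumptions made for illustration. Python integers stand in for the per-flow bitmaps.

```python
class FlowSegments:
    """Per-flow segment status kept as three bitmaps (hypothetical layout)."""

    def __init__(self, num_segments):
        self.num_segments = num_segments
        self.acked = 0     # bit i set: segment i has been acknowledged
        self.lost = 0      # bit i set: segment i has been declared lost
        self.rtxed = 0     # bit i set: segment i was retransmitted since then
        self.next_new = 0  # ID of the next segment that was never sent

    def on_ack(self, seg_id):
        """Common state update: an acked segment is no longer lost."""
        bit = 1 << seg_id
        self.acked |= bit
        self.lost &= ~bit

    def mark_lost(self, seg_id):
        """Called by loss detection (ack-based or timeout-based)."""
        bit = 1 << seg_id
        if not self.acked & bit:
            self.lost |= bit
            self.rtxed &= ~bit  # make it eligible for retransmission again

    def select_next(self):
        """Lost segments first, new segments next; None if nothing to send."""
        pending = self.lost & ~self.rtxed & ~self.acked
        if pending:
            seg_id = (pending & -pending).bit_length() - 1  # lowest set bit
            self.rtxed |= 1 << seg_id
            return seg_id
        if self.next_new < self.num_segments:
            seg_id = self.next_new
            self.next_new += 1
            return seg_id
        return None


if __name__ == "__main__":
    flow = FlowSegments(num_segments=4)
    sent = [flow.select_next() for _ in range(3)]  # new segments 0, 1, 2
    flow.on_ack(0)
    flow.mark_lost(1)
    sent.append(flow.select_next())  # retransmission of 1 comes first
    sent.append(flow.select_next())  # then new segment 3
    print(sent)
```

Only `mark_lost` depends on the protocol's loss-detection rule; the ack update and the selection order are the same for every protocol, which is the split between programmable and fixed-function modules that the list describes.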

Page 43: An Introduction to Smart NICs - cs.cornell.edu

Segment Selection

Tonic’s Segment Selection Engine

[Figure: Tonic’s segment selection engine. Three kinds of events are merged and operate on a memory for per-flow state (segment status, window size, …):
• Incoming ACK: Common Segment Updates, Loss Detection and Recovery
• Timeout: Periodic Updates (timeout-based loss detection and recovery)
• Active flow: Select Next Segment]

Page 45: An Introduction to Smart NICs - cs.cornell.edu

Credit Management Patterns

[Figure: credit management patterns. A control loop (Monitor, Adjust Params) supplies a window/rate to rate control (Calculate Credit).]

1. Common credit management schemes

• Rate control: congestion window, data rate

• Admission control: grant tokens

2. Two main parameter adjustment signals
  • external signals, e.g., acks and CNPs
  • periodic internal signals, e.g., counters
  • aligns with existing programmable modules for segment selection

Page 46: An Introduction to Smart NICs - cs.cornell.edu

Tonic’s Credit Management Engine

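[Figure: Tonic’s credit management engine. Labeled stages: "(Flow ID, Segment ID) Received", "Merge", "Enqueue flows with enough credit", and "Transmit", all operating on a memory for per-flow state.]

Memory for per-flow state:

Flow ID | Segment ID Queue | Credit | Other Variables
0       | (not shown)      | 100 B  | …
1       | (not shown)      | 200 B  | …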
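
The per-flow table and the "enqueue flows with enough credit" step can be sketched in software as follows. This is an illustration under stated assumptions, not Tonic's implementation: the class `CreditEngine`, its method names, and the fixed 100-byte segment size are all hypothetical. The credit values of 100 B and 200 B are the ones in the table.

```python
from collections import deque


class CreditEngine:
    """Toy credit manager: one segment ID queue and one credit counter per flow."""

    def __init__(self, segment_bytes):
        self.segment_bytes = segment_bytes  # assumption: fixed-size segments
        self.flows = {}       # flow ID -> {"queue": deque, "credit": int}
        self.ready = deque()  # flows with a queued segment and enough credit

    def add_flow(self, flow_id, credit_bytes):
        self.flows[flow_id] = {"queue": deque(), "credit": credit_bytes}

    def _maybe_enqueue(self, flow_id):
        state = self.flows[flow_id]
        has_credit = state["credit"] >= self.segment_bytes
        if state["queue"] and has_credit and flow_id not in self.ready:
            self.ready.append(flow_id)

    def on_segment(self, flow_id, seg_id):
        """A (flow ID, segment ID) pair arrives from segment selection."""
        self.flows[flow_id]["queue"].append(seg_id)
        self._maybe_enqueue(flow_id)

    def add_credit(self, flow_id, nbytes):
        """Credit grows, e.g. when an ack opens the window."""
        self.flows[flow_id]["credit"] += nbytes
        self._maybe_enqueue(flow_id)

    def transmit(self):
        """Send one segment from the next ready flow, or return None."""
        if not self.ready:
            return None
        flow_id = self.ready.popleft()
        state = self.flows[flow_id]
        seg_id = state["queue"].popleft()
        state["credit"] -= self.segment_bytes
        self._maybe_enqueue(flow_id)
        return flow_id, seg_id


if __name__ == "__main__":
    engine = CreditEngine(segment_bytes=100)
    engine.add_flow(0, credit_bytes=100)
    engine.add_flow(1, credit_bytes=200)
    for flow_id in (0, 1):
        for seg_id in (0, 1):
            engine.on_segment(flow_id, seg_id)
    print([engine.transmit() for _ in range(4)])
```

In this run flow 0 can send one segment and flow 1 two before their credit runs out; flow 0's second segment stays queued until `add_credit` is called for it.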

Page 47: An Introduction to Smart NICs - cs.cornell.edu

Hardware Implementation Challenges

• Consistent stateful operations

• Bitmap Operations

• Per-flow rate limiting

• More details in the paper

Page 48: An Introduction to Smart NICs - cs.cornell.edu

Evaluation - Programmability

• Implemented six representative protocols
  • Reno, New Reno
  • SACK (Selective ACK)
  • NDP (receiver-driven data-center transport)
  • DCQCN, IRN (Improved RoCE NIC)

• All meet timing for 100 Gbps (10-ns clock)

• Implemented within 200 lines of Verilog code
  • uses 0.5% of total logic resources

• Re-usable modules are 8K lines of Verilog code
  • use 35% of total logic resources

Page 49: An Introduction to Smart NICs - cs.cornell.edu

Evaluation - Scalability

Page 50: An Introduction to Smart NICs - cs.cornell.edu

Evaluation - End-to-End Simulations

• Cycle-accurate hardware simulator for Tonic within NS3
• Compared existing protocols with Tonic implementations
  • TCP New Reno (plots shown below) and DCQCN

Page 51: An Introduction to Smart NICs - cs.cornell.edu

What’s Next for Smart NICs?

• More acceleration in each layer of the stack
  • Application acceleration
  • Hardware-efficient transport

Page 52: An Introduction to Smart NICs - cs.cornell.edu

What’s Next for Smart NICs?

• Generalizing network processing over heterogeneous hardware
  • Given a network function, and a server with a CPU and some accelerators on the NIC (FPGA, SoC, maybe even a GPU?), what is the best offloading strategy?
  • iPipe (SIGCOMM’19): distributed applications over CPU and SoC-based NICs
  • Can we add programmable switches into the picture?

