DATA PROCESSING UNIT (DPU)

Date post: 22-Feb-2022
Alexander Petrovskiy, Staff System Engineer, NVIDIA Networking

August 2021

DATA PROCESSING UNIT (DPU)
Technical overview

SERVER NETWORKING EVOLUTION: FROM NIC TO DPU
Software Defined Data Center Infrastructure-on-a-Chip

[Figure: four stages of server infrastructure, with the labels shown in each panel]

- Legacy Infrastructure: CPU; Legacy NIC; SW-defined Networking, SW-defined Security, SW-defined Storage, Infrastructure Management; VMs, Containers
- Accelerated SW-defined Infrastructure on CPU: CPU; SmartNIC; SW-defined Networking, SW-defined Security, SW-defined Storage, Infrastructure Management; VMs, Containers; Acceleration Engines
- SW-defined Infrastructure on DPU: CPU; DPU; SW-defined Networking, SW-defined Security, SW-defined Storage, Infrastructure Management; VMs, Containers; Acceleration Engines
- Datacenter on a DPU: CPU; DPU with integrated GPU; SW-defined Networking, SW-defined Security, SW-defined Storage, Infrastructure Management; VMs, Containers; Acceleration Engines; AI Applications

DEMYSTIFYING SMARTNICs AND DPUs
Software-Defined, Hardware-Accelerated

BLUEFIELD DPU

SoC-based DPU with full Data & Control Path Acceleration for Unified Cloud

CONNECTX SmartNIC

ASIC-based advanced NIC with Fully Accelerated Datapath for Secure Cloud, Telco and Enterprise

WHAT MAKES A SMARTNIC SMART?

[Figure: SmartNIC feature blocks]

- PAM4 (network ports)
- PTP HW Clock
- Accurate timestamp
- Secure Firmware Update
- Secure Boot (HW RoT)
- Hardware Steering and Filtering
- AES-XTS Storage Encryption Engine
- Key Management
- TLS Inline Offload Engine
- Connection Tracking
- IPsec Inline Offload Engine (aware/un-aware)
- RoCE Selective Repeat
- Resilient RoCE
- x16 PCIe Gen 4.0

BLUEFIELD-2 DPU
Technical Overview

ConnectX-6 Dx inside

200 Gbps Ethernet & InfiniBand, NRZ & PAM4 modulation

Subsystem of 8 Arm A72 CPU cores in a tile architecture

- 8MB L2 cache, 6MB L3 cache in 4 tiles

- Arm core frequency up to 2.75GHz

Fully integrated PCIe switch, 16 bifurcated Gen 4.0 lanes

- Root Complex or End Point modes

1GbE Out-of-Band management port

16 lanes PCIe Gen3/4
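The BlueField-2 slide quotes a 200 Gbps network rate next to a 16-lane PCIe Gen3/4 host interface. A rough sketch of the arithmetic behind that pairing follows. It assumes the standard per-lane signalling rates (8 GT/s for Gen3, 16 GT/s for Gen4) and 128b/130b line encoding, and it ignores packet-level overhead, so it gives an upper bound rather than a measured figure. The function name is illustrative.

```python
# Per-lane signalling rates in GT/s for the two generations named on the slide.
PCIE_GT_PER_S = {"gen3": 8.0, "gen4": 16.0}

def pcie_link_gbps(gen: str, lanes: int) -> float:
    """One-direction link rate in Gb/s after 128b/130b encoding.

    Packet (TLP/DLLP) overhead is ignored, so real throughput is lower.
    """
    return PCIE_GT_PER_S[gen] * lanes * 128 / 130

PORT_GBPS = 200  # network rate quoted for BlueField-2

for gen in ("gen3", "gen4"):
    link = pcie_link_gbps(gen, 16)
    verdict = "covers" if link >= PORT_GBPS else "falls short of"
    print(f"PCIe {gen} x16: {link:.1f} Gb/s, {verdict} {PORT_GBPS} Gb/s")
```

Under these assumptions a Gen3 x16 link tops out at roughly 126 Gb/s, while Gen4 x16 reaches roughly 252 Gb/s, which is why only the Gen4 link can carry the full 200 Gbps to the host.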

BLUEFIELD-3 DPU
Technical Overview

ConnectX-7 inside

I/O

- 2x400Gb/s (Active/Standby), 4x100Gb/s Ethernet/InfiniBand

- 100G PAM4 serdes

- 400Gb/s bandwidth

Integrated PCIe switch

- Gen5.0 x32+2

- Multi-host (8 hosts)

Compute sub-system

- 16 Arm® A78 (Hercules) v8.2+ cores @ 2.3GHz

- SkyMesh fully coherent low-latency interconnect

- 8MB L2 Cache, 16MB LLC System Cache

Built-in accelerators

Advanced Memory sub-system

- Dual Channel 256GB DDR5-4800MT/s w/ ECC

- NVDIMM-N Support

- DDR memory encryption

1GbE Out-Of-Band management port

Self-hosted or Server-hosted
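The dual-channel DDR5-4800 figure on the BlueField-3 slide can be turned into a theoretical peak memory bandwidth and set against the 400Gb/s network rate. The sketch below assumes a 64-bit data path per channel (the block diagram labels each channel "64b + 16b"; the extra 16 bits are assumed here to be ECC and carry no payload). It is back-of-the-envelope arithmetic, not a measured number.

```python
def ddr_peak_gbytes_per_s(mt_per_s: int, data_bits: int, channels: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (no refresh or bus-turnaround loss)."""
    return mt_per_s * 1e6 * (data_bits / 8) * channels / 1e9

# Assumed: 64 data bits per channel, ECC bits excluded from the payload.
memory_gbytes = ddr_peak_gbytes_per_s(4800, 64, 2)

# 400 Gb/s of network traffic expressed in GB/s.
network_gbytes = 400 / 8

print(f"DDR5 peak:    {memory_gbytes:.1f} GB/s")
print(f"Network rate: {network_gbytes:.1f} GB/s")
print(f"Ratio:        {memory_gbytes / network_gbytes:.2f}x")
```

Under these assumptions the memory system peaks at about 76.8 GB/s against 50 GB/s of line-rate traffic, which suggests that traffic touching DRAM more than once would approach the memory limit.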

[Figure: BlueField-3 block diagram]

- Quad VPI Ports, Ethernet/InfiniBand: 10/25/50/100/200/400G
- Out-of-Band Management Port: SGMII Mgmt Port (GbE)
- ConnectX-7 Subsystem: Packet Proc.; eSwitch Flow Steering / Switching; IPsec/TLS/CT; Application Offload, NVMe-oF, NVMe-oTCP, T10-DIF, etc.; RDMA transport; Encrypt/Decrypt
- Compute: 16 Hercules cores, each with its own L2 Cache; 4 Last Level Cache blocks; 4 DMA engines
- Memory: 2 DDR5 channels, 64b + 16b, 4800MT/s
- PCIe Gen 5.0 Switch; PCIe Gen 5.0 x32+2, Root Complex or Endpoint
- Peripherals: I2C, USB, DAP, UART, eMMC, GPIO
- APU
- Accelerators (Accels): RoT, RegExp, Decomp, TRNG, PKA

MOVING INFRASTRUCTURE SERVICES TO DPU

[Figure: infrastructure services grouped under three headings]

- Headings: Software Defined Security, Software Defined Storage, Software Defined Networking
- Services shown: Distributed NG Firewall, IDS/IPS, DDOS Prevention, Micro Segmentation, Root of Trust, NVMe-oF, Storage Direct, Data Encryption, DeDup, Elastic Storage, Compression, vRouter, vSwitch, VMs & Containers, Telco/NFV, NAT/Load Balancer

DPU FOR UNIFIED CLOUD USE-CASE
Unified infrastructure for host Networking, Storage, Security and Management

[Figure: "Today's Environment" (Standard NIC) compared with "Unified Datacenter" (BlueField-2 DPU)]

- Infrastructure labels: Network I/O, Storage I/O, Host Mgmt, Security & Crypto, Hypervisor, Lightweight Hypervisor, CPU Scheduling, Functional Isolation, AI Acceleration
- Workload labels: Bare Metal Server, VM, Container, APP

DPU IS THE NEW NETWORK EDGE
Moving the Top-of-rack Into the Server

[Figure: Datacenter ToR Switch functions moving into the SmartNIC/DPU]

- Host Based Networking
- FRR (BGP/EVPN)
- Linux networking
- Linux apps

HBN FOR UNIFIED CLOUD
Modern Networking with Classic Controls

[Figure: host-based networking for hypervisors, bare metal and containers]

- Roles: Network Administrator, Server Administrator
- Zero Trust
- Standard Linux Control Plane: EVPN, Crypto, Ansible
- Accelerated Data Plane: VLAN Trunk, BGP Peering, Offloaded VXLAN
- Workloads: Hypervisors, Bare Metal, Containers

DPU CAPABILITIES AND FEATURES - NETWORKING

ASAP2: VIRTUAL NETWORKING AND SDN ACCELERATION
Accelerated Switching and Packet Processing

[Figure: two hosts, each running two VMs and a container over a hypervisor and vSwitch, one with a Legacy NIC and one with ConnectX; CPU and Bandwidth gauges are shown for each]

Legacy NIC

Virtual Switches / SDN Packet Processing Put Heavy Load on CPU

ConnectX

Hardware Accelerated Packet Switching with Zero CPU Utilization

Integrated with commercial and community partners

Leveraged to create Efficient Cloud Architectures

HOST NETWORKING ACCELERATION
Comparing two general approaches

SR-IOV

Single root input/output virtualization

- Native hardware access at the VM level
- Every VM has direct access to the network adapter (through virtual function, VF interface)
- Bare-metal-like network performance in VM, zero CPU utilization
- Guest awareness limitations: NIC driver in VM, VM Live Migration challenges

VirtIO

Virtualization standard for network device drivers in Linux systems

- VirtIO abstracts the hardware to the guest OS in SW
- Poor network performance in VM, CPU is utilized to move packets
- Guest un-aware: Virtio-net interface in VM, native VM Live Migration
- Can be accelerated in NIC HW using vDPA

[Figure: SR-IOV panel with three VMs attached to three VFs on the network adapter, plus PFs, Hypervisor and Host; VirtIO panel with three VMs attached through VirtIO interfaces and the Hypervisor (vhost-net) to the network adapter]

VF REPRESENTOR PORTS
SW Representation of SR-IOV NIC Virtual Function

VF Representor

- Net device modeling of an eSwitch port, exposed through the PF driver
- A VF and its representor work like a Linux veth pair
- Flow configuration (add/remove)
- Works under switchdev mode
- Access from both kernel and DPDK
- Multi Queue (RSS/TSO/CSUM)
- Attach/Detach in DPDK
- Multiple DPDK instances over VF representor

With VF representors, the vSwitch can work together with SR-IOV and reduce the CPU% consumed by virtio.
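The interplay between a VF, its representor and the vSwitch can be illustrated with a small model. This is a toy sketch, not the driver's or any vSwitch's actual implementation: the class names, the `(ingress VF, destination MAC)` flow key and the MAC addresses are all hypothetical. It shows the general pattern in which a packet that misses in the hardware table surfaces on the sender's representor, the software switch decides where it goes and adds a flow, and later packets of that flow stay in hardware.

```python
from dataclasses import dataclass, field

# Representor names follow the slide's diagram: R1 mirrors VF1, R2 mirrors VF2.
REPRESENTOR_OF = {"VF1": "R1", "VF2": "R2"}

@dataclass
class ESwitch:
    """Toy hardware flow table keyed by (ingress VF, destination MAC)."""
    flows: dict = field(default_factory=dict)
    hw_hits: int = 0

    def lookup(self, in_vf, dst_mac):
        out = self.flows.get((in_vf, dst_mac))
        if out is not None:
            self.hw_hits += 1
        return out

@dataclass
class VSwitch:
    """Toy software switch that sees only representor ports."""
    eswitch: ESwitch
    mac_table: dict                      # destination MAC -> egress VF
    sw_hits: int = 0
    seen_on: list = field(default_factory=list)

    def handle_miss(self, in_vf, dst_mac):
        self.sw_hits += 1
        # The missed packet surfaces on the representor of the sending VF.
        self.seen_on.append(REPRESENTOR_OF[in_vf])
        out = self.mac_table[dst_mac]
        # "Flow add": later packets of this flow are handled in hardware.
        self.eswitch.flows[(in_vf, dst_mac)] = out
        return out

def forward(eswitch, vswitch, in_vf, dst_mac):
    out = eswitch.lookup(in_vf, dst_mac)
    if out is None:                      # hardware miss, fall back to software
        out = vswitch.handle_miss(in_vf, dst_mac)
    return out

esw = ESwitch()
vsw = VSwitch(esw, {"02:00:00:00:00:02": "VF2"})
results = [forward(esw, vsw, "VF1", "02:00:00:00:00:02") for _ in range(3)]
print(results, "software hits:", vsw.sw_hits, "hardware hits:", esw.hw_hits)
```

In this run only the first of the three packets reaches the software switch; the other two are served from the hardware table, which is the effect the slide describes as reducing CPU consumption.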

[Figure: DPU with Port and SW Datapath exposing PF and representors R1, R2; Host with VM1 on VF1 and VM2 on VF2 over SR-IOV]

EMBEDDED SWITCH (ESWITCH)
Flow-based Packet Processing and Steering Engine in SmartNIC/DPU

[Figure: e-switch pipeline from Packets In to Processed Packets Out, through Classification A / Action A, Classification B / Action B, up to Classification N / Action N]

Flow-based classification and action

Hierarchical, multi-layer tables

Each table consists of classification and action

An action may point to the next table

Key field examples: Ethernet L2/IPv4/IPv6/TCP/UDP/Inner Packet (VXLAN/GENEVE/etc.)

Action examples: Allow/Deny, Re-write (Route/NAT), Encap/Decap of headers, Meta Data set, Hairpin, Sample, Counter, etc.
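The table structure described above (classification plus action, with an action optionally pointing to a next table) can be sketched in a few lines. This is an illustrative software model only; the `Rule` type, the field names such as `dst_port` and the drop-on-miss behaviour are assumptions made for the example and say nothing about how the eSwitch hardware stores or evaluates flows.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    match: dict                  # key fields that must equal the packet's fields
    actions: list                # callables applied to the packet (a dict)
    goto: Optional[int] = None   # id of the next table, or None to stop

def set_field(name, value):
    """Re-write style action: overwrite one packet field."""
    def act(pkt):
        pkt[name] = value
    return act

def count(counters, key):
    """Counter action: bump a named counter."""
    def act(pkt):
        counters[key] = counters.get(key, 0) + 1
    return act

def process(tables, pkt, table_id=0):
    """Walk the tables; return the processed packet, or None if it is dropped."""
    while table_id is not None:
        for rule in tables[table_id]:
            if all(pkt.get(k) == v for k, v in rule.match.items()):
                for act in rule.actions:
                    act(pkt)
                if pkt.get("verdict") == "deny":
                    return None
                table_id = rule.goto
                break
        else:
            return None          # table miss: dropped in this sketch
    return pkt

counters = {}
tables = {
    0: [
        Rule({"ip_proto": "tcp", "dst_port": 22}, [set_field("verdict", "deny")]),
        Rule({"eth_type": "ipv4"}, [count(counters, "ipv4")], goto=1),
    ],
    1: [
        Rule({"dst_ip": "10.0.0.5"},
             [set_field("dst_ip", "192.168.1.5"), set_field("out_port", "VF1")]),
    ],
}

web = {"eth_type": "ipv4", "ip_proto": "tcp", "dst_port": 80, "dst_ip": "10.0.0.5"}
ssh = {"eth_type": "ipv4", "ip_proto": "tcp", "dst_port": 22, "dst_ip": "10.0.0.5"}
web_out = process(tables, web)
ssh_out = process(tables, ssh)
print(web_out, ssh_out, counters)
```

Table 0 plays the Allow/Deny and Counter roles and chains to table 1, which performs a NAT-style re-write and selects an output port. The SSH packet is denied in table 0 and never reaches table 1.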

NETWORKING OFFLOAD MODEL ON DPU
Full Control Plane and Data Plane offload

Control plane and SW Datapath on DPU

HW Datapath is accelerated as in SmartNIC

Both SR-IOV and VirtIO interfaces to VM

Advantages

- Support virtualized network services for VMs, Containers and bare-metal cloud
- Zero Host CPU utilization for networking services
- All host resources (cores and memory) can be used for VMs
- Efficient packet forwarding in HW
- Host isolation

DPU can offload extra I/O and management services

- Storage (NVMe-oF, Virtio-blk)
- Security (Firewall, DPI, IPsec/SSL crypto)
- Host infrastructure management (BMC, Bare-metal/VM provisioning)

[Figure: Host with VM1, VM2 and VM3 attached through VF1, VF2, VF3 and virtio interfaces; DPU with Port, Control Plane, SW Datapath (PF, representors R1, R2, R3, Vhost Backend) and HW Datapath (eSwitch)]

ASAP2: BLUEFIELD-2 TRAFFIC FLOW
Embedded CPU Configuration (Switchdev)

[Figure: BlueField-2 traffic flow in embedded CPU mode]

- Host: Bare Metal, VM, VM; PF0, vport0, vport1
- DPU Arm side: OVS / OVS-DPDK; ECPF0; representors pf0hpf, pf0vf0, pf0vf1; Controlplane Isolation
- ConnectX-6 Dx SmartNIC: Network Interface, Flows, FDB, Uplink p0

DPU CAPABILITIES AND FEATURES - STORAGE

DPU STORAGE
Storage Agility Meets Best-in-Class Hardware Acceleration

UNPARALLELED PERFORMANCE

- Dual 100Gbps or single 200Gbps
- Up to 5.4M IOPs @4KB
- Lowest latency
- NVMe-oF acceleration

STORAGE SECURITY

- Data-at-rest AES-XTS encryption
- Authentication services
- Protection between users

DISAGGREGATED STORAGE

- NVMe SNAP
- Virtio-blk SNAP
- Integrated data & control planes
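The IOPS figure and the port speed quoted on the DPU STORAGE slide can be related with simple arithmetic. The sketch below assumes that "4KB" means 4096 bytes and counts payload only, ignoring NVMe-oF and network headers, so it is an estimate rather than a statement about how the number was measured.

```python
def iops_to_gbps(iops: float, block_bytes: int) -> float:
    """Payload throughput in Gb/s for a given I/O rate and block size."""
    return iops * block_bytes * 8 / 1e9

PORT_GBPS = 200                       # single 200Gbps port from the slide
payload_gbps = iops_to_gbps(5.4e6, 4096)
utilisation = payload_gbps / PORT_GBPS

print(f"Payload:     {payload_gbps:.1f} Gb/s")
print(f"Utilisation: {utilisation:.0%} of {PORT_GBPS} Gb/s")
```

Under these assumptions 5.4M IOPS at 4KB corresponds to about 177 Gb/s of payload, close to 88% of a 200Gbps port before any protocol overhead is added.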

SNAP: LOCAL STORAGE TO EMULATED STORAGE

Physical Local NVMe Storage

[Figure: Host OS with NVMe Driver attached to physical local storage]

✓ Serving bare-metal and hypervisor/VMs
Bound by physical SSDs capacity
Under-utilized storage
Scalability on demand
Over-provisioning bound to compute node

SNAP Drive Emulation

[Figure: Host OS with NVMe Driver or virtio-blk attached to SNAP, which connects to Remote Storage]

✓ Serving bare-metal and hypervisor/VMs
✓ Over-provisioning, scaled to rack/cluster
✓ Saving OPEX and CAPEX
✓ OS-agnostic using inbox standard driver
✓ Supports all network transport types: NVMe-oF, iSCSI, iSER and even proprietary
✓ Accelerated data path* for VMs
✓ Live-migration with virtio-blk* and vDPA*
✓ Support for older OSs where only virtio-blk* is available

* Roadmap

BLUEFIELD-2 SNAP – NVME/VIRTIO-BLK
Framework for Storage virtualization software

Emulate NVMe Local Storage

Connected to Remote Cloud Storage

Virtualized or Bare Metal Cloud

OS Agnostic with RDMA inside

[Figure: Host Server (OS/Hypervisor with NVMe std drvr and virtio-blk std drvr) connected over the PCIe bus to SNAP (NVMe and virtio-blk emulation), with the NVMe SNAP SDK, User's Storage Application, SPDK and Hardware NVMe-oF Offload Accelerations reaching storage over Eth/IB; data paths 1 and 2 are marked]

- Enabling two data paths: (1) offload with NVMe-oF (RDMA)* vs (2) SPDK

- Pluggable to Linux's block devices (NVMe-oF, iSCSI, iSER, etc.)

- Provides infrastructure for Storage Application development

- Enabling End to End storage orchestration and integration

DPU CAPABILITIES AND FEATURES - SECURITY

DPU SECURITY SOLUTIONS
Integrated Security for modern data center needs

SECURED HARDWARE

- Secure FW upgrade
- Root-of-Trust
- Arm TrustZone

ADVANCED L4-L7 SECURITY

- NG stateful firewall
- Deep Packet Inspection
- Host introspection

CRYPTO ACCELERATION

- Data-in-motion enc.
- Data-at-rest enc.
- Public Key Acceleration

PROGRAMMABILITY & ISOLATION

- Hardened Isolation
- Micro-Segmentation
- Programmable Networking

DPU SECURITY CAPABILITIES
Trust Shifts to the DPU

- Root-of-Trust
- Stateful Firewall
- Inline Crypto Accelerators
- Deep Packet Inspection
- Isolated Security Control Plane

DPU Security Services

- Micro-segmentation
- Next Generation Firewall
- DDoS Protection
- Intrusion Protection
- Anomaly Detection

Security Requires Full Isolation from the Host

[Figure: Network Traffic entering a BlueField DPU placed in front of the CPU and GPU]

IPSEC: TRANSPARENT ENCRYPTION
Encryption/decryption at 100Gb/s bidirectional

[Figure: two servers exchanging encrypted IPsec packets]

- Traditional Server, IPsec runs on CPU: Host with Workloads, IPsec Software and Virtual Switch; Simple NIC
- DPU Accelerated Server, IPsec and vSwitch on DPU: Host with Workloads; BlueField DPU with vSwitch Control Plane, IPsec Control Plane, eSwitch and IPsec engine (Acceleration Engine)
- Packet labels: Plaintext Packet, Encrypted Packet, Encrypted IPsec Packet

Control Plane Software on Arm

Inline with other accelerators (tunneling, TLS, etc.)

Cipher: AES-GCM 128/256bit keys

Keys are stored encrypted in hardware

Encrypted RDMA

East-West encryption

ACCELERATING NEXT-GENERATION FIREWALLS
Hardware-Accelerated Policy Enforcement

Accelerated Switching and Packet Processing (ASAP2) enables seamless offload of packet filtering, steering, crypto and stateful connection tracking rules to the DPU HW

[Figure labels: Host, OS, Workload, NGFW, NVIDIA DPU]
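Stateful connection tracking, one of the rule types named above, can be pictured with a small model. The sketch is a deliberate simplification under stated assumptions: it treats a connection as established after the first permitted SYN, whereas a real tracker follows the full TCP handshake and teardown, and the `ConnTracker` class and its fields are hypothetical names rather than any NVIDIA or Linux API. The point it illustrates is the split between policy evaluation for new connections and a cheap exact-match lookup for packets of known connections, which is the part suited to a hardware table.

```python
from dataclasses import dataclass, field

@dataclass
class ConnTracker:
    allowed_ports: set                                  # policy: ports that may be opened
    established: set = field(default_factory=set)       # tracked 5-tuples
    slow_path_hits: int = 0

    def admit(self, src, sport, dst, dport, proto="tcp", syn=False):
        key = (src, sport, dst, dport, proto)
        reverse = (dst, dport, src, sport, proto)
        # Fast path: exact match on a tracked connection, in either direction.
        if key in self.established or reverse in self.established:
            return True
        # Slow path: the policy is evaluated for a packet of an unknown flow.
        self.slow_path_hits += 1
        if syn and dport in self.allowed_ports:
            self.established.add(key)
            return True
        return False

ct = ConnTracker(allowed_ports={443})
opened = ct.admit("10.0.0.1", 40000, "10.0.0.2", 443, syn=True)
reply = ct.admit("10.0.0.2", 443, "10.0.0.1", 40000)
stray = ct.admit("10.0.0.9", 5555, "10.0.0.1", 40000)
print(opened, reply, stray, ct.slow_path_hits)
```

The reply is admitted without consulting the policy because it matches the tracked connection in the reverse direction, while the unsolicited packet is rejected after a slow-path check.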

DPU SOFTWARE

DPU HIGH LEVEL SW ARCHITECTURE
Software-Defined, Hardware-Accelerated Infrastructure

[Figure: layered software architecture, top to bottom]

- Service headings: Software-Defined Security, Software-Defined Storage, Software-Defined Networking
- Services shown: Distributed Next-Gen Firewall, IDS/IPS, DDOS Prevention, Micro Segmentation, Root of Trust, NVMe-oF, Encrypt, Dedupe, Elastic, Compress, vRouter, vSwitch, VMs and Containers, Telemetry/PTP, NAT/Load Balancer, Video Streaming
- DPU SW and SDK (DOCA): Open and Programmable API Framework
- DPU HW

Easy, Flexible Programming of Infrastructure / Acceleration and Security

DPU SOFTWARE COMPONENTS

Bootloader – UEFI, ATF (Arm Trusted FW), ACPI

Linux Distro - CentOS reference drivers, Ubuntu commercial OS

Mellanox Drivers: OFED driver, ASAP2, NVMe SNAP

Secure Boot and Secure Firmware Upgrade

OpenBMC for BMC Management

ConnectX-6 Dx firmware binary file

NVIDIA DOCA
Data-Center-Infrastructure-on-a-Chip Architecture

COMMUNITY OF DEVELOPERS

SDK for ecosystem partners, academia, community

ACCELERATE TTM

Leverages open-source and industry standards (DPDK, P4); NGC-certified

COMPETITIVE EDGE

Best performance; out-of-the-box experience; libraries with special capabilities

LONG-TERM COMMITMENT

Backward and forward compatibility; consistency with performance improvements

DOCA is for DPUs what CUDA is for GPUs

DOCA
ONE-STOP SHOP FOR DPU DEVELOPERS

Developer Zone Program and Website

SDK Manager Support

Tools (Compilers, Benchmarks, etc.)

DOCA Drivers and Libraries

API References and Programming Guides

Reference Applications per Use Case

Accelerated Solutions Integration

DOCA SDK STACK

[Figure: DOCA SDK stack; the assignment of individual components to layers is not recoverable from the transcript]

- Layers: APPLICATIONS, DOCA SERVICES, DOCA LIBRARIES, DOCA DRIVERS, DPU (BlueField and BlueField-X)
- Application domains: Networking, Security, Storage, HPC/AI, Telco, Media
- Security group: DPDK RegEx, DPDK SFT, Inline Crypto
- Networking group: ASAP2, DPDK, P4, P4-RT
- Storage group: SNAP, VirtIO-FS, XTS Crypto, SPDK
- Other component labels: Orchestration, Telemetry, SDN, DPU Management, Host Introspection, Comm Channel, DPI, FLOW, VNF/UPF, FlexIO, TSDC, Data Integrity, RiverMax, RDMA, UCX/UCC

JOIN THE DOCA DEVELOPER PROGRAM TODAY
https://developer.nvidia.com/nvidia-doca-sdk-early-access

DPU PARTNER ECOSYSTEM
Hybrid Cloud Compatibility | No Fork-Lift Upgrades | No Vendor Lock-In

