Xeon+FPGA Platform for the Data Center ISCA/CARL 2015 PK Gupta, Director of Cloud Platform Technology, DCG/CPG
Xeon+FPGA Platform for the Data Center

ISCA/CARL 2015

PK Gupta, Director of Cloud Platform Technology, DCG/CPG

Overview

• Data Center and Workloads

• Xeon+FPGA Accelerator Platform

• Applications and Eco-system

Exponential growth in mobile…

…is driving Data Center growth

Leading to search for greater performance efficiencies…

…Across Data Center Workloads

• Diverse workloads:
  – Cloud Services: Search, Web Servers, …
  – Analytics: Big Data, Machine Learning, …
  – Scientific: Genomics, Security, …
  – Communication: Packet Processing, Virtual Switching, …
  – Storage: Compression, Deduplication, …

• Changing dynamics:
  – No single killer app
  – Emerging new apps drive changes in workloads

A homogeneous compute platform for the Data Center?

Overview

• Data Center and Workloads

• Xeon+FPGA Accelerator Platform

• Applications and Eco-system

Motivation for Accelerators

Enhanced Performance: Accelerators complement CPU cores to meet market needs for performance of diverse workloads in the Data Center:

– Enhance single-thread performance with tightly coupled accelerators, or complement multi-core performance with loosely coupled accelerators attached via PCIe or QPI

Move to Heterogeneous Computing: Moore’s Law continues but demands radical changes in architecture and software.

– Architectures will go beyond homogeneous parallelism, embrace heterogeneity, and exploit the bounty of transistors to incorporate application-customized hardware.

Accelerator Architecture

[Figure: accelerator types (Fixed-function, Reconfigurable, General Purpose Cores) compared on Flexibility, Cost (Area, Power), and Ease of Programming/Development]

Performance Efficiency: Performance/Watt, Performance/$

Programming Complexity: Effort, Cost

Accelerator Attach

Best attach technology might be application or even algorithm dependent

[Figure: attach options plotted by Distance from Core against Cost (Latency, Granularity): On-core, On-Chip, On-Package, QPI attach, PCIe attach]

Coherency and Programming Model

• Data Movement
  – In-line
    – Accelerator processes data fully or partially from direct I/O
  – Shared Virtual Memory:
    – Virtual addressing eliminates need for pinning memory buffers
    – Zero-copy data buffers

• Interaction between Core and Accelerator
  – Off-load
  – Hybrid: algorithm implemented on host and accelerator
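The difference between staging data through copies and sharing a zero-copy buffer can be illustrated with a small host-side analogy. This is only a sketch in plain Python: the function names are hypothetical, the "accelerator" is an ordinary loop, and nothing here uses the AAL or CCI interfaces.

```python
# Analogy only: plain Python buffers standing in for host/accelerator memory.
# Function names are hypothetical and not part of any Intel API.

def offload_with_copy(buf: bytearray) -> bytes:
    """Copy-based model: stage the input, process it, return a new buffer."""
    staged = bytes(buf)                      # explicit copy into a staging buffer
    return bytes(b ^ 0xFF for b in staged)   # result lives in separate memory


def process_zero_copy(buf: bytearray) -> None:
    """Zero-copy model: operate directly on the caller's memory."""
    view = memoryview(buf)                   # shares the buffer, no copy made
    for i in range(len(view)):
        view[i] ^= 0xFF                      # in-place update, visible to the caller


data = bytearray(b"\x00\x01\x02")
copied_result = offload_with_copy(data)      # data itself is unchanged here
process_zero_copy(data)                      # data is now modified in place
```

In the copy-based call the caller's buffer stays untouched and the result arrives in new memory; in the zero-copy call both sides see the same bytes, which is the property shared virtual memory aims to provide between core and accelerator.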

Proposed Platform for the Data Center

• FPGA with coherent low-latency interconnect:

• Simplified programming model

• Support for virtual addressing

• Data Caching

• Enables new classes of algorithms for acceleration with:

• Full access to system memory

• Support for efficient irregular data pattern access

• Remapping of algorithms from off-load model to hybrid processing model

• Fine-grained interactions

IVB+FPGA Software Development Platform

[Figure: block diagram of an Intel® Xeon® E5-2600 v2 Product Family processor and an FPGA linked by QPI, with DDR3 memory channels, PCIe* 3.0 x8 links, and DMI2]

Processor: Intel® Xeon® E5-26xx v2 Processor

FPGA Module: Altera Stratix V

QPI Speed: 6.4 GT/s full width (target 8.0 GT/s at full width)

Memory to FPGA Module: 2 channels of DDR3 (up to 64 GB)

Expansion connector to FPGA Module: PCIe 3.0 x8 lanes, may be used for direct I/O, e.g. Ethernet

Features: Configuration Agent, Caching Agent, (optional) Memory Controller

Software: Accelerator Abstraction Layer (AAL) runtime, drivers, sample applications

Software Development for Accelerating Workloads using Xeon and coherently attached FPGA in-socket

Heterogeneous architecture with homogeneous platform support
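The link speeds quoted for this platform can be turned into rough peak bandwidth figures. The sketch below assumes that a full-width QPI link carries 16 data bits per transfer in each direction and that PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding. Those constants are not stated on the slide, and protocol overhead is ignored, so sustained throughput will be lower.

```python
# Peak-rate arithmetic only; constants below are assumptions not given in the slides.

QPI_DATA_BYTES_PER_TRANSFER = 2  # assumed: 16 data bits per transfer, per direction


def qpi_peak_gb_per_s(gt_per_s: float) -> float:
    """Peak one-direction QPI bandwidth in GB/s for a full-width link."""
    return gt_per_s * QPI_DATA_BYTES_PER_TRANSFER


def pcie3_peak_gb_per_s(lanes: int, gt_per_s: float = 8.0) -> float:
    """Peak one-direction PCIe 3.0 bandwidth in GB/s after 128b/130b encoding."""
    return lanes * gt_per_s * (128 / 130) / 8  # 1 bit per lane per transfer, 8 bits per byte


qpi_current = qpi_peak_gb_per_s(6.4)   # slide: 6.4 GT/s full width
qpi_target = qpi_peak_gb_per_s(8.0)    # slide: target 8.0 GT/s
pcie_x8 = pcie3_peak_gb_per_s(8)       # slide: PCIe 3.0 x8 expansion connector

print(f"QPI @ 6.4 GT/s: {qpi_current:.1f} GB/s per direction")
print(f"QPI @ 8.0 GT/s: {qpi_target:.1f} GB/s per direction")
print(f"PCIe 3.0 x8:    {pcie_x8:.2f} GB/s per direction")
```

Under these assumptions the QPI link offers about 12.8 GB/s per direction today and 16 GB/s at the target speed, compared with just under 8 GB/s for the x8 PCIe connector.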

Programming Interfaces

[Figure: programming-interface stack spanning CPU and FPGA over QPI/KTI. Labels: Host Application; Accelerator Abstraction Layer; Service API; Virtual Memory API; Physical Memory API; Uncore; QPI/KTI Link, Protocol, & PHY; Addr Translation; CCI standard; CCI extended; Accelerator Function Units (AFU)]

Programming interfaces will be forward compatible from SDP to future MCP solutions

Simulation Environment available for development of SW and RTL

Programming Interfaces : OpenCL

[Figure: OpenCL stack spanning CPU and FPGA over QPI/UPI/PCIe. Labels: OpenCL Application; OpenCL Host Code; OpenCL RunTime; OpenCL Kernel Code; OpenCL Kernels; Accelerator Abstraction Layer; Service API; Virtual Memory API; Physical Memory API; VirtMem; CCI Standard; CCI Extended; System Memory; CFG]

Unified application code abstracted from the hardware environment

Portable across generations and families of CPUs and FPGAs

Overview

• Data Center and Workloads

• Xeon+FPGA Accelerator Platform

• Applications and Eco-system

XEON+FPGA in the Cloud: integration with SDI and RSA

[Figure: cloud deployment flow. Labels: Cloud Users; Workload; Cloud Service Provider; Cloud Controller (dynamic node composition, placing workload); Intel HAAS; SDI (Exposing IA capabilities); IA Optimized Software; IA Logical Node with Acceleration; Intel Rackscale]

XEON+FPGA in the Cloud: IP Store Concept

[Figure: IP store flow. Labels: Cloud Users; Workload; Cloud Service Provider; Cloud Controller (placing workload); Intel HAAS; SDI (Exposing IA capabilities); IA Optimized Software; Intel Store Client; IP Catalogue; Intel Accelerators; 3rd party Accelerators; 3rd party IP developers; static/dynamic FPGA programming; IA Node XEON+FPGA]

Example Usage : Deep Learning Framework for Visual Understanding

[Figure: accelerator block diagram. Labels: cluster node device primitives; Processing Tile 0, Processing Tile 1, … Processing Tile ‘n’, each with PEs (Weights, Inputs, Outputs); DMA; Read, Write, RegAccess; SRAM Controller; Control State Machine; IP Registers; CCI Interface]

Example Usage: Accelerating Open vSwitch with DPDK

Offload DMA Engine to FPGA:

• Frees up CPU cycles to perform more useful work

• Reduces cache pollution

• Add support for Packet Classification, ACL, and other functions including Direct I/O in FPGA

[Figure: Open vSwitch with DPDK datapath. Userspace: VM0 and VM1 (each under QEMU with VirtIO), ovs-vswitchd, ovsdb-server, DPDK Libraries, IVSHMEM, VHOST, netdev, TAP, Userspace forwarding. Kernel space: Open vSwitch Kernel Module. Other labels: Physical Switch/NIC, FPGA DMA]

Example Usage: High Frequency Trading Accelerator

[Figure: trading pipeline with stages Ethernet PHY & MAC, Feed Parser, Trading Logic / Statistics, and Order Generation; other labels: FPGA, CPU, Host Application]
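The stage names on this slide (Feed Parser, Trading Logic / Statistics, Order Generation) suggest a streaming pipeline. A minimal software model of that structure might look like the following. The `SYMBOL,PRICE` feed format, the running-mean rule, and all class and function names are hypothetical illustrations, not taken from the presentation, and the Ethernet PHY & MAC stage is replaced by a plain list of text lines.

```python
# Toy software model of the staged pipeline; every name and rule here is hypothetical.
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass
class Quote:
    symbol: str
    price: float


@dataclass
class Order:
    side: str
    symbol: str
    price: float


def parse_feed(lines: Iterable[str]) -> Iterator[Quote]:
    """Feed Parser stage: turn 'SYMBOL,PRICE' lines into Quote records."""
    for line in lines:
        symbol, price = line.strip().split(",")
        yield Quote(symbol, float(price))


class RunningMeanLogic:
    """Trading Logic / Statistics stage: compare each price with the prior mean."""

    def __init__(self) -> None:
        self._sum: Dict[str, float] = {}
        self._count: Dict[str, int] = {}

    def decide(self, quote: Quote) -> Optional[str]:
        n = self._count.get(quote.symbol, 0)
        mean = self._sum[quote.symbol] / n if n else None  # mean of earlier quotes
        self._sum[quote.symbol] = self._sum.get(quote.symbol, 0.0) + quote.price
        self._count[quote.symbol] = n + 1
        if mean is None or quote.price == mean:
            return None
        return "BUY" if quote.price < mean else "SELL"


def generate_orders(lines: Iterable[str]) -> Iterator[Order]:
    """Order Generation stage: emit an Order whenever the logic signals a side."""
    logic = RunningMeanLogic()
    for quote in parse_feed(lines):
        side = logic.decide(quote)
        if side is not None:
            yield Order(side, quote.symbol, quote.price)


orders = list(generate_orders(["ABC,10.0", "ABC,9.0", "ABC,11.0"]))
```

Each stage consumes the previous stage's output one record at a time, which mirrors how a hardware pipeline would pass data between blocks without buffering the whole feed.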

Academic Research

Call for Proposals: Intel-Altera Heterogeneous Architecture Research Platform Program

Submitted by Nicholas Carter

Intel-Altera Heterogeneous Architecture Research Platform (HARP) Program

Intel® Corporation and Altera® Corporation are pleased to announce the Heterogeneous Architecture Research Platform (HARP) program, which will provide faculty with computer systems containing Intel microprocessors and an Altera Stratix® V FPGA module that incorporates Intel® QuickAssist Technology in order to spur research in programming tools, operating systems, and innovative applications for accelerator-based computing systems.

Q & A
