Date post: 26-Mar-2021
Page 1

Intel® MIC x100 Coprocessor Driver - on the Frontiers of Linux & HPC

Nikhil Rao

LinuxCon 2013

Page 2

Intel® Xeon Phi* (MIC) x100 Coprocessors

Highly-parallel Processing for Unparalleled Discovery

Groundbreaking: differences

Up to 61 IA cores/1.1 GHz/ 244 Threads

Up to 16GB memory with up to 352 GB/s bandwidth

512-bit SIMD instructions

Linux* operating system, IP addressable

Standard programming languages and tools

Leading to Groundbreaking results

Up to 1 TeraFlop/s double precision peak performance (Note 1)

Enjoy up to 2.2x higher memory bandwidth than on an Intel® Xeon® processor E5 family-based server (Note 2)

Up to 4x more performance per watt than with an Intel® Xeon® processor E5 family-based server (Note 3)
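
A rough cross-check of the 1 TeraFlop/s and 244-thread figures, as a sketch rather than anything taken from the slides. It assumes that each core can retire one 512-bit fused multiply-add per cycle (8 double-precision lanes, 2 floating-point operations per lane) and that each core runs 4 hardware threads (244 / 61). The function names are illustrative.

```c
#include <assert.h>

/* Theoretical peak in GFLOP/s:
 * cores x clock (GHz) x SIMD lanes x flops per lane per cycle. */
double peak_dp_gflops(int cores, double ghz, int simd_bits, int flops_per_lane)
{
    int lanes = simd_bits / 64;   /* a double is 64 bits wide */
    assert(lanes > 0);
    return cores * ghz * lanes * flops_per_lane;
}

/* Hardware threads = cores x threads per core. */
int hw_threads(int cores, int threads_per_core)
{
    return cores * threads_per_core;
}
```

Under those assumptions, peak_dp_gflops(61, 1.1, 512, 2) comes to roughly 1074 GFLOP/s, which is consistent with "up to 1 TeraFlop/s".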

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance

Notes 1, 2 & 3: see backup for system configuration details.

Page 3

Programming Models

Offload: main() runs on the CPU and foo() is offloaded to the Coprocessor

Native: main() runs on the Coprocessor

Symmetric: main() runs on both the CPU and the Coprocessor
Pages 4-5

Compiler Assisted Offload

Host Only Code

    float ret = 0;
    /* sum data[0..size-1] in parallel on the host */
    #pragma omp parallel for reduction(+:ret)
    for (int i = 0; i < size; i++)
    {
        ret += data[i];
    }

Loop Offloaded to Coprocessor

    float ret = 0;
    /* copy size and data[0..size-1] to the coprocessor and run the block there */
    #pragma offload target(mic) in(size) in(data:length(size))
    {
        #pragma omp parallel for reduction(+:ret)
        for (int i = 0; i < size; i++)
        {
            ret += data[i];
        }
    }

ans = a[0] + a[1] + ... + a[n-1]
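
The offloaded loop can be wrapped into a complete function along the following lines. This is a minimal sketch: the function name sum_offload is illustrative, and the preprocessor guards are an addition so that the same file still builds with a compiler that has no offload or OpenMP support (in which case the loop simply runs serially on the host). The guard macro __INTEL_OFFLOAD is assumed to be defined by the Intel compiler when offload is enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Sum data[0..size-1]; on an offload-capable compiler the block runs
 * on the coprocessor, otherwise it runs on the host. */
float sum_offload(const float *data, int size)
{
    float ret = 0.0f;

#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) in(size) in(data:length(size))
#endif
    {
#ifdef _OPENMP
#pragma omp parallel for reduction(+:ret)
#endif
        for (int i = 0; i < size; i++) {
            ret += data[i];       /* each thread accumulates a private ret */
        }
    }
    return ret;                   /* ret is copied back after the offload */
}
```

Calling sum_offload on an array holding 1, 2, 3 and 4 returns 10 in either build, since only the place where the loop executes changes.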

Page 6

Intel® Manycore Platform Software (MPSS) Stack

Host side Tools

• Coprocessor FS, network configuration

• Status monitoring (e.g. Temperature, Power, RAS)

• Coprocessor OS state management (micctrl, mpssd)

• VirtIO devices (mpssd)

[Diagram labels: Programming Models (Offload Apps, MPI*, TCP/IP); Host Platform (Tools, Driver, Linux* OS); Coprocessor (Linux* OS); PCIe*]

Page 7

Intel® MPSS Coprocessor Environment

• Linux* OS, K1OM ABI

• Busybox filesystem

[Diagram labels, same stack as on Page 6: Programming Models (Offload Apps, MPI, TCP/IP); Host Platform (Tools, Driver, Linux* OS); Coprocessor (Linux* OS); PCIe*]

Page 8

Intel® Xeon Phi™ Coprocessor Driver

• Coprocessor OS Management

• Virtual (VirtIO based) Device Support

• PCIe* Messaging & RDMA APIs (SCIF)

[Diagram labels: Process P0, Process P1, PCIe*]

Pages 9-12

Coprocessor OS Boot

[Diagram, four builds. Labels: User (micctrl -b); Kernel (Host Driver, sysfs); Coprocessor (FW ready, Linux*)]

• The coprocessor reports FW ready and the user runs micctrl -b

• The bzImage file name, the RAMdisk file name and the boot request reach the Host Driver through sysfs

• The Host Driver transfers the bzImage and the ramdisk to the Coprocessor and sends an Interrupt

• Linux* runs on the Coprocessor
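
The sysfs step of the boot flow (bzImage file name, RAMdisk file name, boot) could be driven from user space roughly as sketched below. The attribute names firmware, ramdisk and state are assumptions modelled on the sysfs ABI of the upstream Linux MIC driver; the slides do not name them, and the MPSS driver may use different ones. The directory is a parameter so that the sketch can be tried against an ordinary directory; on a real system it would be the coprocessor's sysfs directory. All function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write one string to <dir>/<attr>; returns 0 on success, -1 on error. */
int write_attr(const char *dir, const char *attr, const char *value)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", dir, attr);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the first line of <dir>/<attr> into buf, without the newline. */
int read_attr(const char *dir, const char *attr, char *buf, size_t len)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", dir, attr);
    if (n < 0 || (size_t)n >= sizeof path || len == 0)
        return -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *got = fgets(buf, (int)len, f);
    fclose(f);
    if (!got)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Hand the two file names to the driver, then request the boot.
 * Attribute names are assumptions, see the note above. */
int request_boot(const char *dir, const char *bzimage, const char *ramdisk)
{
    if (write_attr(dir, "firmware", bzimage) != 0)
        return -1;
    if (write_attr(dir, "ramdisk", ramdisk) != 0)
        return -1;
    return write_attr(dir, "state", "boot");
}
```

Only file names cross the user/kernel boundary here; loading the images and interrupting the coprocessor stay in the Host Driver, as in the diagram.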

Pages 13-16

Virtio Drivers

• Virtio: a framework that enables the use of common guest drivers across hypervisors

• The same guest driver (virtio_net.ko) talks through a virtqueue to different backends: KVM/Qemu (Page 13), lguest (Page 14) and, with the Coprocessor as the guest, mpssd on the Host (Pages 15-16)

• Key benefits

– Reuse of well-designed, maintained code

– Standard, enables a simple backend

– New devices possible in the future

Pages 17-20

Virtio Data Path

[Diagram, four builds. Labels: Guest/Coprocessor OS (Virtio Driver); Hypervisor/Host OS (Device Emulation); HW; virtqueue (avail, used); Buffer; Interrupt]

• The Virtio Driver places a Buffer in the avail ring of the virtqueue and interrupts the Device Emulation

• The Device Emulation processes the Buffer

• The Device Emulation returns the Buffer through the used ring and interrupts the Virtio Driver
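
The avail/used hand-off can be modelled in a few lines. The sketch below is a simplified, single-threaded stand-in for a virtqueue, not the layout from the virtio specification: it has no descriptor chaining, flags or memory barriers, the interrupts are only marked in comments, and the "device" just fills each buffer with a byte pattern. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QSZ 8                                   /* ring size, illustrative */

struct desc      { void *addr; size_t len; };          /* one driver buffer */
struct used_elem { unsigned short id; size_t len; };   /* buffer handed back */

struct virtq {
    struct desc      table[QSZ];   /* descriptor table */
    unsigned short   avail[QSZ];   /* ids offered by the driver */
    unsigned short   avail_idx;    /* written only by the driver */
    struct used_elem used[QSZ];    /* ids returned by the device */
    unsigned short   used_idx;     /* written only by the device */
    unsigned short   last_avail;   /* device-private cursor */
    unsigned short   last_used;    /* driver-private cursor */
};

/* Driver side: publish a buffer in the avail ring. */
void driver_add_buf(struct virtq *q, unsigned short id, void *addr, size_t len)
{
    assert(id < QSZ);
    q->table[id].addr = addr;
    q->table[id].len = len;
    q->avail[q->avail_idx % QSZ] = id;
    q->avail_idx++;                /* a real driver would now interrupt the device */
}

/* Device side: consume every new avail entry and return it as used. */
int device_process(struct virtq *q)
{
    int n = 0;
    while (q->last_avail != q->avail_idx) {
        unsigned short id = q->avail[q->last_avail % QSZ];
        memset(q->table[id].addr, 0x5A, q->table[id].len);  /* "process" it */
        q->used[q->used_idx % QSZ].id = id;
        q->used[q->used_idx % QSZ].len = q->table[id].len;
        q->used_idx++;
        q->last_avail++;
        n++;
    }
    return n;                      /* a real device would now interrupt the driver */
}

/* Driver side: reap one used buffer; returns its id, or -1 if none. */
int driver_get_buf(struct virtq *q, size_t *len)
{
    if (q->last_used == q->used_idx)
        return -1;
    struct used_elem e = q->used[q->last_used % QSZ];
    q->last_used++;
    *len = e.len;
    return e.id;
}
```

Each index has exactly one writer (the driver for avail, the device for used), which is why the two sides can share the rings without a lock in this model.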

Pages 21-23

Virtio Data Path Setup

[Diagram, three builds. Labels: Host OS (Device Emulation (mpss daemon), Coprocessor Host driver); Coprocessor OS (virtio-mic, Virtio Bus, Virtio Device); Device Page; Device Entry]

• Device create IOCTL, issued by the Device Emulation (mpss daemon) to the Coprocessor Host driver, adds a Device Entry to the Device Page

• Device page entry

– vring addresses, interrupt information

– Status notification (e.g., driver unloaded)

• In the final build, a Virtio Device appears on the Virtio Bus of the Coprocessor OS

Page 24

What's different?

KVM guest

• Guest: TCP/IP, Virtio-net, virtio-pci

• Host OS: QEMU process (QEMU Network Backend), TAP, bridge, kvm.ko

Coprocessor

• Coprocessor OS: TCP/IP, Virtio-net, virtio-mic

• Host OS: Network backend (mpssd), TAP, bridge, Coprocessor Driver

[Diagram marks the Data path and the Control path through both stacks]

Pages 25-29

SCIF

• Symmetric Communications Interface

• Goals

– Performance (PCIe* Available BW 7 GB/s)

• TCP/IP host to card BW is around 400 MB/s

– Abstract the PCIe* network

• send/recv, RMA, mapped memory APIs

[Diagram labels: Host, Coprocessor, Coprocessor and IB* HCA, all attached to PCIe*]
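
To put the two bandwidth figures side by side, the arithmetic below compares the time to move the same payload at 400 MB/s and at 7 GB/s. It is a back-of-the-envelope sketch that assumes decimal units (1 GB = 10^9 bytes) and ignores latency and protocol overhead; the function name is illustrative.

```c
#include <assert.h>

/* Seconds needed to move `bytes` at a sustained rate of `bytes_per_sec`. */
double transfer_seconds(double bytes, double bytes_per_sec)
{
    assert(bytes_per_sec > 0.0);
    return bytes / bytes_per_sec;
}
```

For a 1 GB payload this gives 2.5 s at the TCP/IP rate against about 0.14 s at the available PCIe* rate, a gap of 17.5x, which is the headroom SCIF is meant to recover.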

Page 30

SCIF Endpoints & Connections

• SCIF endpoint

– pipe to a PCIe* node or loopback, bound to a port ID

• Exactly 2 endpoints can form a connection

• SCIF data transfer/mapping APIs can only accept a connected endpoint

[Diagram labels: Process P0 (Node 0, Port X) and Process P1 (Node 1, Port Y), each with a SCIF endpoint, connected over PCIe*]

Page 31

SCIF API Functional Grouping

• Connection

• Messaging

• Memory Registration

• Remote Memory Access (RMA)

• RMA Fencing

• Remote memory mapping (mmap)

Page 32

Connection & send/recv

Pages 33-35

Send/recv Implementation

P0: scif_send(epd, msg0, len, flags);

P1: scif_recv(epd, msg1, len, flags);

[Diagram, three builds. Labels: Process P0 (Node 0, Port X, msg0); Process P1 (Node 1, Port Y, msg1); one Endpoint Recv Q per SCIF endpoint; PCIe*]

• scif_send on P0 copies msg0 across PCIe* into the Endpoint Recv Q of P1's endpoint

• scif_recv on P1 copies the data from its Endpoint Recv Q into msg1
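
The receive-queue mechanism can be imitated in ordinary memory. The sketch below is a simulation, not the SCIF API: sim_send and sim_recv are hypothetical names, both "nodes" live in one process, and a plain byte ring stands in for the Endpoint Recv Q that the sender would reach across PCIe*.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RQ_SIZE 64                /* bytes per receive queue, illustrative */

struct endpoint {
    unsigned char rq[RQ_SIZE];    /* receive queue, in the receiver's memory */
    size_t head;                  /* total bytes consumed by the receiver */
    size_t tail;                  /* total bytes produced by the sender */
};

/* Sender side: copy msg into the peer's receive queue.
 * Returns the number of bytes accepted (less than len if the queue is full). */
size_t sim_send(struct endpoint *peer, const void *msg, size_t len)
{
    const unsigned char *src = msg;
    size_t space = RQ_SIZE - (peer->tail - peer->head);
    size_t n = len < space ? len : space;
    for (size_t i = 0; i < n; i++)
        peer->rq[(peer->tail + i) % RQ_SIZE] = src[i];
    peer->tail += n;
    return n;
}

/* Receiver side: copy queued bytes out of the local receive queue.
 * Returns the number of bytes delivered (0 if the queue is empty). */
size_t sim_recv(struct endpoint *self, void *msg, size_t len)
{
    unsigned char *dst = msg;
    size_t avail = self->tail - self->head;
    size_t n = len < avail ? len : avail;
    for (size_t i = 0; i < n; i++)
        dst[i] = self->rq[(self->head + i) % RQ_SIZE];
    self->head += n;
    return n;
}
```

The data is copied twice, once into the queue and once out of it, which is the cost that the registration and RMA slides later avoid.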

Page 36

Memory Registration

• SCIF RMA provides zero-copy inter-process data transfer

• Registration exposes local memory for remote access

• Pins pages

– Local DMA engine access

– Remote access

[Diagram labels: Process P0 (Node 0, Port X, buf0); Process P1 (Node 1, Port Y, buf1); PCIe*]

Pages 37-41

Registered Address Space (RAS)

• Offsets reference registered memory in RMA APIs

• RAS is per connection

• Connection has 2 registered address spaces: Local & Remote

– Local RAS offset = Peer’s Remote RAS offset

off_t scif_register(epd, addr, len, …, prot, ..);

[Diagram, five builds. Labels: Process P0 (node0:X) and Process P1 (node1:Y), each with a Local RAS and a Remote RAS, joined by a Connection]

• buf0, registered by P0, appears at offset0 in the Local RAS of node0:X and in the Remote RAS of node1:Y

• buf1, registered by P1, appears at offset1 in the Local RAS of node1:Y and in the Remote RAS of node0:X

Page 42

RMA

int scif_writeto(epd, offset0, len, offset1, flags);

[Diagram labels: Process P0 (node0:X); Connection; buf0 at offset0 in the Local RAS; buf1 at offset1 in the Remote RAS]

• Copies len bytes from offset0 in the Local RAS (buf0) to offset1 in the Remote RAS (buf1)

Page 43

RMA

int scif_vwriteto(epd, buf0, len, offset1, flags);

[Diagram labels: Process P0 (node0:X); Connection; buf0 at addr in the Process VA; buf1 at offset1 in the Remote RAS]

• Copies len bytes from the process virtual address buf0 to offset1 in the Remote RAS (buf1)
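
Registration and the two write calls can be modelled in one process to show how offsets resolve to buffers. This is a simulation under stated assumptions, not the SCIF API: the sim_ names are hypothetical, each struct ras stands for one side's Local RAS (the peer sees the same table as its Remote RAS), offsets are handed out page by page, and memcpy stands in for the DMA engine. Pinning and alignment rules are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_WIN 4
#define PAGE    4096

struct window { long offset; unsigned char *addr; size_t len; };
struct ras    { struct window win[MAX_WIN]; int count; long next_offset; };

/* Expose [addr, addr+len) at a fresh offset; returns the offset or -1. */
long sim_register(struct ras *local, void *addr, size_t len)
{
    if (local->count == MAX_WIN)
        return -1;
    long off = local->next_offset;
    local->win[local->count++] = (struct window){ off, addr, len };
    local->next_offset += (long)((len + PAGE - 1) / PAGE * PAGE);
    return off;
}

/* Translate an offset range to a pointer; NULL if it is not registered. */
static unsigned char *lookup(const struct ras *r, long off, size_t len)
{
    for (int i = 0; i < r->count; i++) {
        const struct window *w = &r->win[i];
        if (off >= w->offset && len <= w->len &&
            (size_t)(off - w->offset) <= w->len - len)
            return w->addr + (off - w->offset);
    }
    return NULL;
}

/* Like scif_writeto on the slide: Local RAS offset to Remote RAS offset. */
int sim_writeto(const struct ras *local, long loff, size_t len,
                const struct ras *remote, long roff)
{
    unsigned char *src = lookup(local, loff, len);
    unsigned char *dst = lookup(remote, roff, len);
    if (!src || !dst)
        return -1;
    memcpy(dst, src, len);
    return 0;
}

/* Like scif_vwriteto on the slide: the source is a plain virtual address. */
int sim_vwriteto(const void *addr, size_t len,
                 const struct ras *remote, long roff)
{
    unsigned char *dst = lookup(remote, roff, len);
    if (!dst)
        return -1;
    memcpy(dst, addr, len);
    return 0;
}
```

In this model P1's table is what P0 passes as the remote side, mirroring "Local RAS offset = Peer's Remote RAS offset"; a range that was never registered is rejected instead of copied.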

Page 44

RMA Fence APIs

• Asynchronous RMAs allow overlap of compute & communication

• Fence APIs allow synchronization with RMA completion

Non-blocking (polling) synchronization

scif_fence_signal(ep,off,v)

[Diagram labels: Time axis (t1 to t7); RMA1, RMA2, RMA3; write v at off in the RAS]

Page 45

RMA Fence APIs (contd)

Blocking Synchronization

m=scif_fence_mark(ep)

scif_fence_wait(ep,m)

[Diagram labels: Time axis (t1 to t7); RMA1, RMA2, RMA3; RAS]
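
The two fence styles can be contrasted with a small in-order queue that stands in for the DMA engine. This is a simulation with hypothetical sim_ names, not the SCIF API: RMAs complete strictly in issue order, sim_dma_step plays the role of time passing, and the signal writes to a plain int instead of an offset in the RAS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RMA 16

struct rma {
    void *dst; const void *src; size_t len;   /* a copy request, or ...   */
    int *flag; int value;                     /* ... a signal if flag set */
};

struct ep {
    struct rma q[MAX_RMA];
    int issued;                   /* requests queued so far */
    int completed;                /* requests finished so far */
};

/* Queue an asynchronous copy and return immediately. */
int sim_rma_async(struct ep *e, void *dst, const void *src, size_t len)
{
    if (e->issued == MAX_RMA)
        return -1;
    e->q[e->issued++] = (struct rma){ dst, src, len, NULL, 0 };
    return 0;
}

/* Non-blocking fence: write v to *flag once all earlier RMAs are done. */
int sim_fence_signal(struct ep *e, int *flag, int v)
{
    if (e->issued == MAX_RMA)
        return -1;
    e->q[e->issued++] = (struct rma){ NULL, NULL, 0, flag, v };
    return 0;
}

/* Stand-in for the DMA engine: finish the oldest pending request. */
void sim_dma_step(struct ep *e)
{
    if (e->completed < e->issued) {
        struct rma *r = &e->q[e->completed++];
        if (r->flag)
            *r->flag = r->value;
        else
            memcpy(r->dst, r->src, r->len);
    }
}

/* Blocking fence, part 1: remember everything issued so far. */
int sim_fence_mark(const struct ep *e)
{
    return e->issued;
}

/* Blocking fence, part 2: return only when the marked requests are done. */
void sim_fence_wait(struct ep *e, int mark)
{
    while (e->completed < mark)
        sim_dma_step(e);
}
```

With the signal, the caller keeps computing and polls the flag; with mark and wait, the caller stops until the marked RMAs have landed. RMAs issued after the mark are not waited for.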

Pages 46-47

Remote Memory Mapping

va = mmap(addr, len, prot, flags, epd, offset1);

• Lowest latency path for messaging

[Diagram, two builds. Labels: Process P0 (node0:X); Connection; Buf0 at offset0 in the Local RAS; Buf1 at offset1 in the Remote RAS; after the call, Buf1 also appears at va in the Process VA]

Page 48

OFED* over SCIF

• OpenFabrics Enterprise Distribution (OFED*): open-source software stack for InfiniBand* and iWARP*

• IB-SCIF driver

– Software emulated HCA

– Used within the box

– Uses kernel SCIF send/recv and RMA operations

[Diagram: software stack on Host / Coprocessor, spanning User and Kernel Mode. Labels: MPI Application, uDAPL, IB Verbs Library, IB-SCIF Library, IB uverbs, IB core, IB-SCIF driver, SCIF]

Page 49

SCIF RMA Performance

[Chart: Comparison of TCP and SCIF based BW. X axis: Transfer Size (KBytes). Y axis: Throughput (GB/sec), 0 to 8. Series: Available PCIe BW; SCIF Write DMA (Host initiated); SCIF Write DMA (Coprocessor initiated); TCP (Host->Coprocessor); TCP (Coprocessor->Host)]

Page 50

Code Status & Plans

• Patches for the features below have been submitted; inclusion is expected in Linux 3.13

– Coprocessor OS state management

– Virtio device support

• Future patches

– DMA engine & usage in Virtio device support

– SCIF

Page 51

Summary

• MIC x100 Coprocessor driver is a key element of an all-Linux* HPC platform

– Enables choice of programming models

• New driver features

– Virtio for PCIe* endpoints

– SCIF communication

Possibilities for reuse in your HW? Suggestions?

Let us know!

Page 52

Acknowledgements!

• Team

– Dasa Chandramouli, Bruce Chang, Bill Clifford, Ashutosh Dixit, Sudeep Dutt, Harsha Kharche, Sanath Kumar, Ravi Murty, Johnnie Peters, Evan Powers, John Wiegert, Siva Yerramreddy, Caz Yokoyama, Jianxin Xiong

• Reviewers

– PJ Waskiewicz, Eddie Dong

• Presentation

– James Reinders

Page 54

Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Intel, Cilk, VTune, Xeon, Xeon Phi, Look Inside and the Intel logo are trademarks of Intel Corporation in the United States and other countries.

*Other names and brands may be claimed as the property of others. Copyright ©2013 Intel Corporation.

