Leveraging HyperTransport for a custom high-performance cluster network

Mondrian Nüssle
HTCE Symposium 2009
11.02.2009

Outline

Background & Motivation

Architecture

Hardware Implementation

Software Stack

Results

Conclusion

EXTOLL: Background & Motivation

High-performance computing is synonymous with parallel computing

Interconnection networks between processors are a key component in parallel systems

Patterson stated: “Latency lags Bandwidth”

The EXTOLL project at the CAG aims to significantly lower communication latency and improve communication in parallel systems

Goals

Enable communication with extremely low latency → close to main memory access

Enable communication – computation overlap

Design a balanced system:

  In terms of CPU on-loading and off-loading

  In terms of system complexity

Adding bandwidth is much easier ☺

Key design facts

Leverage HT as host interface for lowest latency of data transport between CPU and device

Leverage modified HT as on-chip communication protocol

Implement a lean network interface controller (NIC):

  Minimize state information on NIC

  Provide user-level, virtualized access (avoid kernel)

  Minimize number of CPU ↔ device and memory ↔ device transactions

Network layer that provides reliable, in-order, low-latency transport service

Block diagram

[Figure: EXTOLL block diagram with three blocks. Host Interface: HyperTransport IP core and HTAX crossbar. NIC: ATU, VELO, RMA and C&S register file. Network: EXTOLL crossbar with three network ports and six link ports.]

Host Interface block

NIC block: several communication functions

Network block: 6 links, 9x9 crossbar

Flexible architecture: configurable data path

Communication functions: VELO

Virtualized Engine for Low Overhead (VELO)

Enables ultra-low-latency send/receive communication

Supports messages of up to 64 bytes (one cache line) directly

A single PIO transaction triggers sending of a message

Message completion at the receiver is usually performed with a single DMA transaction

Minimized traffic between host and device!
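As a rough illustration of the 64-byte limit, the sketch below splits a buffer into cache-line-sized chunks and counts the host/device transactions that would result if every chunk went out as its own VELO message. The function names and the idea of software-side segmentation are assumptions made for this example; the slides only give the per-message figures (one PIO write to send, usually one DMA transaction to complete).

```python
from typing import Dict, List

VELO_MAX_PAYLOAD = 64  # bytes; one cache line, as stated on the slide


def split_for_velo(payload: bytes, max_payload: int = VELO_MAX_PAYLOAD) -> List[bytes]:
    """Cut payload into chunks that each fit one VELO message (hypothetical helper)."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if not payload:
        return [b""]  # an empty message still needs one send
    return [payload[i:i + max_payload] for i in range(0, len(payload), max_payload)]


def host_device_transactions(payload: bytes) -> Dict[str, int]:
    """Count transactions under the slide's model: 1 PIO per send, 1 DMA per completion."""
    messages = len(split_for_velo(payload))
    return {"pio_writes": messages, "dma_completions": messages}
```

A payload that fits into one cache line costs a single PIO write and a single DMA completion in this model, which is the case the slide describes.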

Communication functions: RMA

EXTOLL Remote Memory Architecture

Enables access to remote memory using put, get and atomic transactions

Transaction triggered by a single 128-bit SSE2 store → minimizing start-up latency

Flexible notifications: at the requester, the completer, the responder, or any combination thereof
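To make the "single 128-bit store" and the combinable notifications more concrete, here is a minimal sketch that packs a request into exactly 16 bytes. The field layout, the opcode values and all names (`Notify`, `pack_descriptor`) are hypothetical; the slides do not give the real descriptor format.

```python
import struct
from enum import IntFlag


class Notify(IntFlag):
    """Where a notification is raised; values are illustrative bit positions."""
    NONE = 0
    REQUESTER = 1
    COMPLETER = 2
    RESPONDER = 4


OPCODES = {"put": 0, "get": 1, "atomic": 2}  # hypothetical encoding

# Hypothetical layout: opcode (8 bit), notify flags (8 bit), reserved (16 bit),
# length in bytes (32 bit), remote address (64 bit) = 128 bit in total.
DESCRIPTOR_FORMAT = "<BBHIQ"


def pack_descriptor(op: str, notify: Notify, length: int, remote_addr: int) -> bytes:
    """Pack one request into 16 bytes, the size of a single 128-bit store."""
    return struct.pack(DESCRIPTOR_FORMAT, OPCODES[op], int(notify), 0, length, remote_addr)
```

Because the flags are bits, `Notify.REQUESTER | Notify.RESPONDER` expresses "any combination thereof" without extra fields.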

Supporting modules

Address Translation Unit (ATU)

  Provides address translation services for RMA

  Registration/unregistration latency in prototype systems starts at ~2 µs

  Translation using on-chip TLB and main-memory tables

Control and Status (C&S) Registerfile

  Automatically generated from high-level spec (including kernel code)

  Local and remote access possible (network management software)
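The ATU's two-level lookup (on-chip TLB first, main-memory tables on a miss) can be sketched as a toy model. The page size, the LRU replacement policy and every name below are assumptions for illustration; the slides state only that a TLB and main-memory tables are involved.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size; not stated in the slides


class AtuModel:
    """Toy ATU: a small LRU-managed TLB in front of a full translation table."""

    def __init__(self, tlb_entries: int = 4):
        self.table = {}            # stands in for the main-memory tables
        self.tlb = OrderedDict()   # stands in for the on-chip TLB
        self.tlb_entries = tlb_entries
        self.tlb_misses = 0

    def register(self, network_page: int, physical_page: int) -> None:
        self.table[network_page] = physical_page

    def unregister(self, network_page: int) -> None:
        self.table.pop(network_page, None)
        self.tlb.pop(network_page, None)  # drop any stale TLB entry

    def translate(self, network_addr: int) -> int:
        page, offset = divmod(network_addr, PAGE_SIZE)
        physical = self.tlb.get(page)
        if physical is None:
            self.tlb_misses += 1
            physical = self.table[page]       # KeyError if page is not registered
            self.tlb[page] = physical
            if len(self.tlb) > self.tlb_entries:
                self.tlb.popitem(last=False)  # evict least recently used entry
        else:
            self.tlb.move_to_end(page)        # refresh LRU position
        return physical * PAGE_SIZE + offset
```

Repeated translations of the same page hit the TLB and never touch the table, which is the behaviour that keeps the common case off the host memory path.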

HT Interface

HT-Core: interface to host

All functional units need to communicate with host

Avoid protocol conversion for on-chip network

→ HTAX crossbar running an on-chip protocol derived from HT: simplified, more source tags, fixed format

Network layer

Fully parametrizable width of data paths and number of ports

In-order delivery of packets

Virtual channels

Hardware retransmission

Cut-through switching

Credit-based flow control

Current implementation: 6 ports used to connect to external links, 16+2 bit data path width
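Of the features listed above, credit-based flow control is the easiest to show in a few lines. The sketch below is a generic textbook model, not the EXTOLL implementation: the sender holds one credit per free receive-buffer slot, stalls when credits run out, and regains a credit whenever the receiver drains a slot. Class and method names are made up for this example.

```python
from collections import deque


class CreditLink:
    """Generic credit-based link: one credit corresponds to one free receive slot."""

    def __init__(self, buffer_slots: int):
        self.credits = buffer_slots   # credits currently held by the sender
        self.rx_buffer = deque()      # receive buffer, drained in FIFO order

    def send(self, flit) -> bool:
        """Try to send one unit of data; return False if the sender must stall."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.rx_buffer.append(flit)
        return True

    def receiver_consume(self):
        """Drain one slot at the receiver and hand the credit back to the sender."""
        flit = self.rx_buffer.popleft()
        self.credits += 1
        return flit
```

Since the sender never transmits without a credit, the receive buffer cannot overflow, and the FIFO buffer keeps delivery in order on the link.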

Implementation I

EXTOLL prototype is implemented on the HTX-Board

Virtex-4 FX100 FPGA, speed grade -11 or -12

6 SFP optical transceivers

Currently: 16 bit width, 180 MHz core frequency

3.6 Gb/s links
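A quick back-of-the-envelope check relates the data path figures to the link rate. The first two values follow directly from the 16-bit width and the 180 MHz clock. The last step assumes an 8b/10b line code, which the slides do not mention; it is included only because it happens to reproduce the quoted 3.6 Gb/s.

```python
def payload_rate_bits(width_bits: int, core_hz: int) -> int:
    """Bits per second moved by a data path of the given width and clock."""
    return width_bits * core_hz


payload_bps = payload_rate_bits(16, 180_000_000)   # 2_880_000_000 bit/s
payload_mb_per_s = payload_bps // 8 // 1_000_000   # 360 MB/s
# Assumption: an 8b/10b line code (10 line bits per 8 payload bits).
line_rate_bps = payload_bps * 10 // 8              # 3_600_000_000 bit/s
```

Under these assumptions the payload ceiling is about 360 MB/s per link.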

Implementation II

More than 90% of all slices of the FPGA are in use for the design

HT-Core runs at 200 MHz internal frequency and HT400

EXTOLL modules run at 180 MHz on a speed-grade -12 device

Software Stack

[Figure: EXTOLL software stack. Labels in the diagram: User-Application; Middleware, e.g. MPI, GASNet (library); libVELO; libRMA; Application Management; User Space; Kernel Space; EXTOLL Basedriver; velodrv; rmadrv; extoll_rf; atudrv; PCI config space; EXTOLL Hardware (NIC) with VELO, RMA, ATU and Registerfile.]

OS bypass

Layered approach

PGAS support through GASNet

MPI support through OpenMPI

Linux kernel driver

Results – Latency

[Figure: latency in µs (0 to 8) versus message size in bytes (10 to 1000, logarithmic scale) for EXTOLL VELO, EXTOLL RMA Put and EXTOLL RMA Get.]

Start-up latency ~ 1 µs

RMA Put transaction beats VELO at 256 bytes

Get latency is full roundtrip
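A middleware layer could use the measured crossover to choose between the two engines. The following is a minimal sketch under that assumption; the function name, the return strings and the idea of a fixed threshold are hypothetical, and only the 256-byte figure comes from the slide.

```python
VELO_RMA_CROSSOVER_BYTES = 256  # from the slide: RMA Put beats VELO at 256 bytes


def pick_transport(size_bytes: int, crossover: int = VELO_RMA_CROSSOVER_BYTES) -> str:
    """Return which engine a sender might prefer for a message of this size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "rma_put" if size_bytes >= crossover else "velo"
```

Keeping the threshold a parameter matters because the crossover was measured on one FPGA prototype and would likely shift on other hardware.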

Results – Bandwidth

[Figure: bandwidth in MB/s (50 to 350) versus message size in bytes (10 to 1000, logarithmic scale) for EXTOLL VELO, EXTOLL Put and EXTOLL Get, with reference lines for peak payload bandwidth and half peak payload bandwidth.]

More than half the peak payload bandwidth (the n½ point) is reached at 32 bytes!

Maximum bandwidth reached at 4k

Technology Scaling

[Figure: bar chart on an axis from 0 to 1200 (unit not recoverable from the transcript) comparing: HT1000 ASIC, 800 MHz (est.); HT800 ASIC, 500 MHz internal (est.); optimized FPGA, HT400, 200 MHz; FPGA, HT400, 180 MHz; reference: Mellanox ConnectX DDR IB.]

Already beats best IB silicon

ASIC would show 3 times lower latency!

Conclusion

EXTOLL is an architecture for ultra-low-latency communication in parallel systems

Prototype hardware is up and running

Basic software environment is up and running

Performance numbers are excellent:

  ~ 1 µs start-up latency on FPGA prototype

  Bandwidth limited by serializers & board, but can be improved with new platform

Next Steps

More software is being added

  Most interesting: GASNet

Evaluation on 1024-core Valencia cluster

On the hardware side, the next step is a new revision with a more powerful base technology

  Evaluation of next platform for HW

Thanks!

Questions?

