Xiaocheng Zhou Intel Labs China “Single-chip Cloud Computer” An experimental many-core processor...

Date post: 27-Dec-2015
Page 1: Xiaocheng Zhou Intel Labs China “Single-chip Cloud Computer” An experimental many-core processor from Intel Labs.

Xiaocheng Zhou, Intel Labs China

“Single-chip Cloud Computer”

An experimental many-core processor from Intel Labs

Page 2:

What is Tera-scale?

TIPS of compute power operating on terabytes of data

[Figure: performance (KIPS, MIPS, GIPS, TIPS) plotted against dataset size (kilobytes, megabytes, gigabytes, terabytes). Single-core, multi-core, and tera-scale platforms sit at successively higher points; workloads progress from text through multimedia and 3D & video to RMS applications: personal media creation and management, learning & travel, entertainment, and health.]

Source: Electronic Visualization Lab, University of Illinois

http://techresearch.intel.com/articles/Tera-Scale/1421.htm

Page 3:

Performance Scaling Challenges

• Energy efficiency
• Emerging applications
• Programming strategy
• Design complexity

Page 4:

Cloud Computing Today

Cloud datacenters:
– 1000s of networked computers
– Millions of threads & petabytes of data

Opportunity:
– Lower power, higher density via integration
– Greater efficiency and better programmability

Example: Intel’s Open Cirrus testbed, Intel Labs Pittsburgh

Future: Many-core processor?

Page 5:

Single-chip Cloud Computer (SCC)

• Experimental many-core CPU on 45 nm Hi-K metal-gate silicon

• 48 IA-compatible cores

• Network of 2-core nodes mimics cloud computing at chip level

• Fine-grained power management scales from 25-125W

• Supports proven, highly parallel “scale-out” programming models

Page 6:

Inside the SCC

• 24 tiles, 24 routers, 48 IA cores
• 2D mesh network
• 4 integrated DDR3 memory controllers (64GB addressable)

[Figure: the tiles, each with a router (R), connected in a mesh with four memory controllers (MC). Inset, one dual-core SCC tile: Core 1 and Core 2, each with its own L2 cache, plus a message buffer and a router.]

Page 7:

On-die Interconnect

Architecture
– 6x4 2D mesh NoC
– 16B wide data links + 2B sideband
– 8 virtual channels in 2 classes
– Fixed (X-Y) routing

Performance
– Target frequency: 2GHz @ 1.1V
– Link bandwidth: 64GB/s
– 4-cycle latency

Power Management
– Independent frequency & voltage control
– Sleep mode, clock gating, low-power RF
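
Fixed X-Y routing means a packet first travels along the X dimension until it reaches the destination column, then along Y. The following is a minimal sketch of that rule on a 6x4 mesh. The coordinate convention (x from 0 to 5, y from 0 to 3) and the function name `xy_route` are assumptions made for illustration, not taken from the SCC documentation.

```python
MESH_COLS, MESH_ROWS = 6, 4  # 6x4 mesh, as stated on the slide


def xy_route(src, dst):
    """Return the routers visited from src to dst, moving in X first, then Y.

    src and dst are (x, y) tuples; the coordinate convention is an assumption.
    """
    for x, y in (src, dst):
        if not (0 <= x < MESH_COLS and 0 <= y < MESH_ROWS):
            raise ValueError(f"router {(x, y)} is outside the mesh")

    (x, y), (dst_x, dst_y) = src, dst
    path = [(x, y)]
    while x != dst_x:            # X dimension first
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:            # then Y dimension
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


def hop_count(src, dst):
    """Number of router-to-router links crossed on the X-Y path."""
    return len(xy_route(src, dst)) - 1
```

Because the path depends only on source and destination, two packets between the same pair of routers always take the same links; the longest path on this mesh, corner to corner, crosses 8 links.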

Page 8:

Memory Architecture

Memory
– Up to 64GB DDR3 via 4 memory controllers @ 21.3GB/s
– 16KB SRAM in each tile as Message Passing Buffer (MPB)

Caching
– 32KB L1 per core (16KB I + 16KB D), 12MB L2 cache (256KB/core)
– No HW cache-coherent shared memory

Addressing
– Core physical to system physical addresses in 16MB sections
– Memory-mapped configuration & control registers

Page 9:

Address Translation: From Core Address to System Address

[Figure: each core's physical address space passes through a Look Up Table (LUT), a physical-to-physical mapping, into the system physical address space.]
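
The translation step can be sketched as a table lookup on the upper address bits. This is a simplified model, assuming a 32-bit core physical address (hence 256 sections of 16MB) and a LUT entry that holds only a system section number; the real LUT entry format is not described on the slides, and the names below are hypothetical.

```python
SECTION_BITS = 24                 # 16MB sections, as stated on the slide
SECTION_SIZE = 1 << SECTION_BITS
LUT_ENTRIES = 256                 # assumption: 32-bit core address / 16MB


def translate(lut, core_addr):
    """Map a core physical address to a system physical address.

    lut is a list of LUT_ENTRIES items; each item is a system section
    number (an assumption about the entry format) or None if unmapped.
    """
    if not 0 <= core_addr < LUT_ENTRIES * SECTION_SIZE:
        raise ValueError("core address out of range")
    index = core_addr >> SECTION_BITS          # which 16MB section
    offset = core_addr & (SECTION_SIZE - 1)    # position inside the section
    section = lut[index]
    if section is None:
        raise LookupError(f"section {index} is not mapped")
    return section * SECTION_SIZE + offset


# Two cores with private section 0 and a shared section at index 0x80.
lut_core_a = [None] * LUT_ENTRIES
lut_core_b = [None] * LUT_ENTRIES
lut_core_a[0x00] = 0x040
lut_core_b[0x00] = 0x041
lut_core_a[0x80] = lut_core_b[0x80] = 0x3FF
```

With these tables, the same core address 0x00001234 lands in different system memory for each core, while 0x80000000 reaches the same system location from both, which is how a region can be mapped to multiple cores.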


Page 12:

Message Passing on SCC

Regions of memory mapped to multiple cores
– Message Passing Buffer (MPB) for small, fast messages
– Larger buffers in off-die memory

Message Passing Data Type (MPDT)
– Reads/writes bypass L2 cache; lines tagged in L1 as MPDT
– New instruction to selectively invalidate MPDT lines

Read/write to another core’s on-die MPB
– Synchronize through special atomic register bits
– Core-to-core asynchronous interrupts

High-level API for applications: “RCCE”
– One-sided (Get, Put) and two-sided (Send, Recv) communication
– MPB allocation, synchronization
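
The pattern described above, writing into another core's buffer and then signalling through a flag, can be sketched in a few lines. This is a software model only, assuming a 16KB buffer per tile as on the slides; the `Tile` class and the `put`, `get`, `send`, and `recv` helpers are hypothetical names and do not reproduce the RCCE function signatures. A `threading.Event` stands in for the atomic register bit.

```python
import threading

MPB_BYTES = 16 * 1024  # 16KB message passing buffer per tile, per the slides


class Tile:
    """Software stand-in for one tile's MPB plus a synchronization flag."""

    def __init__(self):
        self.mpb = bytearray(MPB_BYTES)
        self.ready = threading.Event()  # models an atomic flag bit


def put(tile, offset, data):
    """One-sided write into a (possibly remote) tile's buffer."""
    if offset < 0 or offset + len(data) > MPB_BYTES:
        raise ValueError("write does not fit in the MPB")
    tile.mpb[offset:offset + len(data)] = data


def get(tile, offset, length):
    """One-sided read from a tile's buffer."""
    if offset < 0 or offset + length > MPB_BYTES:
        raise ValueError("read is outside the MPB")
    return bytes(tile.mpb[offset:offset + length])


def send(dest, payload):
    """Write a length-prefixed message into dest's buffer, then raise the flag."""
    put(dest, 0, len(payload).to_bytes(4, "little") + payload)
    dest.ready.set()


def recv(own):
    """Wait for the flag, read the message from the local buffer, clear the flag."""
    own.ready.wait()
    length = int.from_bytes(get(own, 0, 4), "little")
    payload = get(own, 4, length)
    own.ready.clear()
    return payload


def demo():
    """Run a receiver thread and send it one message."""
    tile = Tile()
    received = []
    receiver = threading.Thread(target=lambda: received.append(recv(tile)))
    receiver.start()
    send(tile, b"hello SCC")
    receiver.join()
    return received[0]
```

The flag is set only after the payload is fully written, so the receiver never reads a partial message; messages larger than the buffer would have to be split or placed in off-die memory.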

Page 13:

Improving Energy Efficiency

Fine-grain, software-controlled power management

8 voltage and 28 frequency islands
– Each tile can run at a different frequency
– 6 banks of four tiles can run at different voltages
– Also independent V&F control for I/O network & MCs

[Figure: the 24 tiles, each with its router (R) and its own frequency (Fn), grouped into six voltage banks V1 to V6, with four memory controllers.]
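
One way to picture the island layout is as a mapping from tile coordinates to island numbers. The sketch below assumes the six four-tile voltage banks are 2x2 blocks of the 6x4 mesh, numbered V1 to V6 row by row; the slides state only that there are six banks of four tiles, so the block shape, the numbering, and the function names are assumptions.

```python
from collections import Counter

MESH_COLS, MESH_ROWS = 6, 4  # 24 tiles


def _check(x, y):
    if not (0 <= x < MESH_COLS and 0 <= y < MESH_ROWS):
        raise ValueError(f"tile {(x, y)} is outside the mesh")


def voltage_island(x, y):
    """Voltage bank (1..6) of a tile, assuming 2x2 blocks numbered row-major."""
    _check(x, y)
    return (y // 2) * (MESH_COLS // 2) + (x // 2) + 1


def frequency_island(x, y):
    """Per-tile frequency island index (0..23): each tile is its own island."""
    _check(x, y)
    return y * MESH_COLS + x


# Count how many tiles fall into each voltage bank.
tiles_per_bank = Counter(
    voltage_island(x, y) for y in range(MESH_ROWS) for x in range(MESH_COLS)
)
```

Under this layout the 24 tiles account for 6 of the 8 voltage islands and 24 of the 28 frequency islands; the remainder belong to the I/O network and memory controllers mentioned on the slide.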

Page 14:

Package and Test Board

Technology: 45nm process
Package: 1567-pin LGA package, 14 layers (5-4-5)
Signals: 970 pins

Page 15:

SCC Platform Board Overview

Page 16:

SCC “Chipset”

System Interface FPGA
– Connects to SCC mesh interconnect
– I/O capabilities such as PCIe, Ethernet & SATA
– Bitstream loaded by BMC

Board Management Controller (BMC)
– JTAG interface for clocking, power, etc.
– USB stick to hold FPGA bitstream
– Network interface for user interaction via Telnet
– Status monitoring

Page 17:

Software Environment

SCC Software
– Customized Linux
– Bare metal
– RCCE communication & power management
– Tools
  – Selected Intel tools (e.g., icc, ifort, ...)
  – Microsoft research release of SCC extensions to Visual Studio

Management Console PC Software
– PCIe driver with integrated TCP/IP driver
– Programming API for communication with SCC platform
– GUI for interaction with SCC platform
– Command-line tools for interaction with SCC platform

Page 18:

RCCE Communication API

A compact, lightweight communication environment.
– SCC and RCCE were designed together, side by side:
– … a true HW/SW co-design project.

A research vehicle to understand how message passing APIs map onto many-core chips.

For experienced parallel programmers willing to work close to the hardware.

Static SPMD execution model:
– Identical UEs (units of execution) are created together when a program starts (a standard approach familiar to message passing programmers).
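
The static SPMD model can be illustrated with a small, self-contained sketch: a fixed number of identical workers start together, each runs the same function, and each picks its share of the work from its own rank. Python threads stand in for UEs here, and `run_spmd` and `partial_sum` are hypothetical names; this is not RCCE code.

```python
import threading


def run_spmd(num_ues, body):
    """Start num_ues identical workers at once and collect one result per rank."""
    results = [None] * num_ues
    barrier = threading.Barrier(num_ues)

    def ue(rank):
        # Every UE runs the same body; only its rank differs.
        results[rank] = body(rank, num_ues, barrier)

    threads = [threading.Thread(target=ue, args=(r,)) for r in range(num_ues)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def partial_sum(rank, num_ues, barrier):
    """Sum the elements of 1..100 assigned to this rank (round-robin split)."""
    data = range(1, 101)
    mine = sum(v for i, v in enumerate(data) if i % num_ues == rank)
    barrier.wait()  # all UEs synchronize before returning
    return mine
```

Calling `run_spmd(4, partial_sum)` returns four partial sums that add up to the sum of 1 through 100; the number of UEs is fixed when the program starts and never changes, which is what makes the model static.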

Page 19:

RCCE Power Management API

RCCE power management emphasizes safe control: V/GHz changed together within each 4-tile (8-core) power domain.
– A master core sets V + GHz for all cores in the domain.
– RCCE_istep_power(): steps V + GHz up or down, where GHz is the maximum for the selected voltage.
– RCCE_wait_power(): returns when the power change is done.
– RCCE_step_frequency(): steps only GHz up or down.

Power management latencies
– V changes: very high latency, O(million) cycles.
– GHz changes: low latency, O(few) cycles.

Page 20:

sccGui for debugging

Modify config registers

Read system memory


Page 21:

sccBoot & sccReset

sccBoot: A command-line tool for booting Linux on selected cores and checking their status (“which cores are currently booted”).

sccReset: A command-line tool for resetting selected SCC cores.

Page 22:

sccKonsole

Regular konsole, with automatic login to selected cores.

Enables broadcasting amongst shells.


Page 23:

MARC - Many-core Application Research Community

– Worldwide research partnership program with academia & industry
– Providing access to SCC for many-core programming research
– Overwhelming interest: ~200 research proposals received
– SCC datacenter is online; community website up and running

http://communities.intel.com/community/marc
