Parallella: A Love Story

Heterogeneous.. Parallel.. Efficient.. Open..

Andreas Olofsson

MIT, Jan 7, 2013

Adapteva Achieves 3 “World Firsts”

1. First processor company to reach 50 GFLOPS/W

2. First open source OpenCL™ SDK in the mobile market

3. First semiconductor company to successfully crowd-source a project

“OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.”

Prologue

Why we need heterogeneous and parallel platforms

[Chart: log-scale plot, 1990 to 2030, of “System Processing Needs” against “Legacy Processing Efficiency”. Annotations: “The Efficiency Gap” and “Von Neumann Saturation”.]

ASIC, FPGA, DSP, CPU?

                        ASIC      FPGA      DSP       CPU
Flexibility             Poor      Great     Good      Good
Efficiency              Great     Good      Good      Fair
Development Cost/Risk   High      Medium    Medium    Low
Leverage                Minimal   Modest    High      Huge

A Practical Radar System Example

[Block diagram: ADC/DAC, FPGA, Epiphany, DDR, uP, Storage, Display]

FPGAs are great for front-end DSP and connectivity.

The missing piece: a math engine that is high performance, low-power and C-programmable.

Microprocessors are great for user interfacing, knowledge extraction, and system management.

Why SOC integration is so disruptive

62 cm3 to 0.00003 cm3: >1M X Volume Reduction

[Figure labels: iPhone4s ~58mm, A5X Chip ~16mm, A5X die ~13mm, 2X CPU, ARM A9 ~2mm, FPU ~0.15mm]

What if your smartphone disappears?
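As a quick check of the ">1M X" claim, the ratio of the two volumes quoted on the slide can be computed directly. The variable names are illustrative only, and the slide does not say how either volume was measured.

```python
# Volumes quoted on the slide, in cubic centimetres.
large_volume_cm3 = 62.0
small_volume_cm3 = 0.00003

# Ratio of the two volumes (how many times smaller the second one is).
reduction = large_volume_cm3 / small_volume_cm3

print(f"volume reduction: about {reduction:,.0f}x")
```

The ratio comes out at roughly two million, which is consistent with the slide's ">1M X".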

The Problem: SOCs are complex!

[Chart: Per Product SOC R&D Costs, log scale from $10,000 to $1,000,000,000]

What if you could do a 28nm chip for $100k?

Our Vision: True Heterogeneous Computing

[Block diagram: SYSTEM-ON-CHIP containing four BIG CPUs, an FPGA, a GPU, Analog, and 1000 small Epiphany RISC CPUs/DSPs]

Epiphany: Massive Task‐Parallelism

• Coprocessor to ARM/Intel CPU
• 25mW per core
• C/C++ programmable

Programming Models

MODEL #1: TASK QUEUE MODEL
• Up to 2 GFLOPS/core
• Supports standard C/C++
• “Cloud on a chip”

MODEL #2: DATA PARALLEL MODEL
• OpenCL programmable
• Easy integration of C/C++
• OpenMP/MPI roadmap

[Diagram 1: X86/ARM/FPGA Host with Task1, Task2, Task3 and Task4 spread over 16 MINI CPUs]

[Diagram 2: X86/ARM/FPGA Host with a single Task1 spread over 16 MINI CPUs]
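The difference between the two models can be sketched on an ordinary host with Python threads. This is only an analogy: it does not use the Epiphany SDK or OpenCL, and the names `NUM_CORES`, `run_task_queue`, `run_data_parallel` and `kernel` are hypothetical. In the task queue model, different tasks are handed to whichever worker is free; in the data parallel model, one kernel runs over every slice of the data.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_CORES = 16  # matches the 16 MINI CPUs drawn on the slide


def task_sum(values):
    return sum(values)


def task_max(values):
    return max(values)


def run_task_queue(values):
    """Model #1: queue several independent tasks; free workers pick them up."""
    tasks = (task_sum, task_max, min, len)
    with ThreadPoolExecutor(max_workers=NUM_CORES) as pool:
        futures = [pool.submit(task, values) for task in tasks]
        return [f.result() for f in futures]


def kernel(chunk):
    """The single kernel applied to every slice in the data parallel model."""
    return [x * x for x in chunk]


def run_data_parallel(values):
    """Model #2: split the data and run the same kernel on each slice."""
    # Ceiling division so that every element lands in some chunk.
    size = max(1, -(-len(values) // NUM_CORES))
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=NUM_CORES) as pool:
        parts = list(pool.map(kernel, chunks))
    return [y for part in parts for y in part]


print(run_task_queue([3, 1, 2]))
print(run_data_parallel(list(range(8))))
```

On real hardware the workers would be Epiphany cores rather than host threads, but the split between "many different tasks" and "one kernel over many data slices" is the same.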

Epiphany Silicon Devices

16-core device

Features:
• 16 RISC CPU cores
• 512KB distributed memory
• IEEE Floating Point
• 32 distributed DMA engines
• 4 off-chip serial links
• 65nm

Specifications:
• 1 GHz
• 32 GFLOPS
• 2 Watt Max Chip Power
• 512 GB/sec memory bandwidth
• 8 GB/sec off chip BW

64-core device

Features:
• 64 RISC CPU cores
• 2MB distributed memory
• IEEE Floating Point
• 128 distributed DMA engines
• 4 off-chip serial links
• 28nm

Specifications:
• 800 MHz
• 100 GFLOPS
• 2 Watt Max Chip Power
• 1.6 TB/sec memory bandwidth
• 8 GB/sec off chip BW

[Chip diagram: four eLink I/O blocks]
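The headline figures for the two devices can be cross-checked with a little arithmetic. The sketch below assumes 2 floating-point operations per core per cycle, which is inferred from the "Up to 2 GFLOPS/core" figure at 1 GHz and is not stated on this slide. The dictionary keys and the `summarize` helper are hypothetical names.

```python
# Figures copied from the slide; 1.6 TB/sec is written as 1600 GB/sec.
DEVICES = {
    "16-core, 65nm": dict(cores=16, clock_ghz=1.0, quoted_gflops=32,
                          watts=2, mem_gbs=512),
    "64-core, 28nm": dict(cores=64, clock_ghz=0.8, quoted_gflops=100,
                          watts=2, mem_gbs=1600),
}

# Assumption: 2 FLOPs per core per cycle (from 2 GFLOPS/core at 1 GHz).
FLOPS_PER_CYCLE = 2


def summarize(device):
    """Derive peak GFLOPS, GFLOPS/W and per-core memory bandwidth."""
    peak = device["cores"] * device["clock_ghz"] * FLOPS_PER_CYCLE
    return {
        "peak_gflops": peak,
        "gflops_per_watt": device["quoted_gflops"] / device["watts"],
        "mem_gbs_per_core": device["mem_gbs"] / device["cores"],
    }


for name, device in DEVICES.items():
    print(name, summarize(device))
```

Under that assumption the 16-core part works out to 32 GFLOPS and the 64-core part to about 102 GFLOPS, which the slide rounds to 100. Dividing the quoted 100 GFLOPS by the 2 Watt maximum gives the 50 GFLOPS/W listed among the "World Firsts".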

Parallella

Parallella Open Computing

[Board diagram: ZYNQ (ARM) CPU, E64, 1GB SDRAM, uSD, HDMI, 2 x USB, RJ45, 2 x GPIO, IO]

• Open (and “free”):
  • Documentation
  • Board design files
  • Drivers
  • Software Tools
• Accessible (NO NDAs!)
• $100 entry point
• ~4000 devs signed up in 4 weeks

How cool is this?

              Connection Machine 5 (1992)   Parallella Board (2012/2013)
Performance   100 GFLOPS                    100 GFLOPS
Power         100 kW                        5 W (20k X)
Cost          $10M                          $200 (50k X)

[Board diagram: Parallella board with eLink]
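The "20k X" and "50k X" factors follow directly from the quoted power and cost figures. A small sketch of the arithmetic, with hypothetical variable names:

```python
# Figures quoted on the slide for each system.
cm5 = {"gflops": 100, "watts": 100_000, "dollars": 10_000_000}
parallella = {"gflops": 100, "watts": 5, "dollars": 200}

# Same quoted performance, so the ratios compare power and cost directly.
power_ratio = cm5["watts"] / parallella["watts"]
cost_ratio = cm5["dollars"] / parallella["dollars"]

print(f"power: {power_ratio:,.0f}x lower, cost: {cost_ratio:,.0f}x lower")
```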

Parallella Architecture

[Block diagram. Regions: Zynq “Hard”, Zynq FPGA, Off-Chip. Blocks: Dual Core ARM A9, AXI BUS, MIO, MEM-CTRL, SHARED DRAM, “O/S” DRAM, USB OTG, USB 2.0, UART, Ethernet, SD-CARD, I2C, DAC/ADC IF, HDMI Controller, AXI-MASTER (x2), AXI-SLAVE, “Glue-Logic”, “Sandbox”, Daughter Card, Epiphany]

Parallella Coprocessor Approach

• ARM runs Linux
• Epiphany accelerates key tasks
• Programmable logic “makes anything possible”

Program Flow

1. ARM boots Linux. The first stage boot loader comes from Flash, everything else from SD card.
2. “Main” application executes on ARM.
3. Application sends critical tasks to Epiphany using OpenCL or simple threads.
4. ARM/Epiphany communication goes through a shared DRAM buffer outside the virtual memory of the O/S.
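The hand-off in steps 3 and 4 can be pictured as a mailbox in a shared buffer. The sketch below is a host-only simulation: a `bytearray` stands in for the shared DRAM, and the layout (a status word, a count, then a float payload) and every function name are hypothetical, not taken from the slides or from any Epiphany API.

```python
import struct

# Hypothetical mailbox layout inside the shared buffer (an assumption):
#   offset 0: uint32 status (0 = idle, 1 = task ready, 2 = result ready)
#   offset 4: uint32 number of float32 values in the payload
#   offset 8: float32 payload
HEADER = struct.Struct("<II")
IDLE, TASK_READY, RESULT_READY = 0, 1, 2

shared_dram = bytearray(4096)  # stands in for the shared DRAM buffer


def host_send_task(buf, values):
    """Host side: write the payload first, then publish it via the header."""
    struct.pack_into(f"<{len(values)}f", buf, HEADER.size, *values)
    HEADER.pack_into(buf, 0, TASK_READY, len(values))


def coprocessor_step(buf):
    """Coprocessor side: if a task is waiting, sum it and post the result."""
    status, count = HEADER.unpack_from(buf, 0)
    if status != TASK_READY:
        return
    values = struct.unpack_from(f"<{count}f", buf, HEADER.size)
    struct.pack_into("<f", buf, HEADER.size, sum(values))
    HEADER.pack_into(buf, 0, RESULT_READY, 1)


def host_read_result(buf):
    """Host side: return the result if one is ready, otherwise None."""
    status, _ = HEADER.unpack_from(buf, 0)
    if status != RESULT_READY:
        return None
    (result,) = struct.unpack_from("<f", buf, HEADER.size)
    HEADER.pack_into(buf, 0, IDLE, 0)  # mark the mailbox free again
    return result


host_send_task(shared_dram, [1.0, 2.0, 3.5])
coprocessor_step(shared_dram)
print(host_read_result(shared_dram))
```

In a real system the two sides run concurrently and poll or signal each other; here they are called in sequence so the flow is easy to follow.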

Zedboard Introduction

The Future is…
• Open
• Heterogeneous
• Massively Task-Parallel
• Efficient

Grand Challenges Ahead…
• Rebuild the computer ecosystem
• Rewrite billions of lines of code
• Retrain millions of programmers
• Rewrite the education curriculum