Date posted: 12-Jul-2015
Category: Devices & Hardware
Uploaded by: somnath-mazumdar
Parallella
Presented by: Somnath Mazumdar
University of Siena, Italy
This presentation was given on 10 Dec 2014.
Place: Ericsson Research Lab, Lund, Sweden
This work is licensed under a Creative Commons Attribution 4.0 International License.
Outline
Introduction
Architecture
System View
Programming
Conclusion
Genesis
Influenced by open source hardware design projects:
• Arduino
• Beaglebone
Inspired by:
• Raspberry Pi
• Zedboard
The board is open source hardware*
*https://github.com/parallella/parallella-hw
In News “Smallest Supercomputer in the World”
Adapteva A-1
• Launched at ISC'14*
• It has 2,112 RISC cores
• Based on 64-core Epiphany board
• Power Consumption 200 Watt.
• Performance: 16 Gflop/s per Watt
*http://primeurmagazine.com/weekly/AE-PR-07-14-104.html
Image Source: https://twitter.com/StreamComputing/media
Adapteva (Zynq + Epiphany III)
• Based on the Epiphany™ architecture (multi-core MIMD architecture)
• SoC: fully programmable Xilinx Zynq with a dual-core ARM Cortex-A9 CPU
• 16/64-core microprocessor/coprocessor:
  - No cache
  - 32-bit cores
  - Max clock speed: 1 GHz (600 MHz)
  - Peak performance: 32 GFLOPS
  - Supports fused multiply-add (FMA) operations
  - Superscalar floating-point (IEEE-754) RISC CPU core
  - Two floating-point operations per clock cycle
• Supports static dual-issue scheduling
• IALU: one 32-bit integer operation per clock cycle
• FPU: one floating-point instruction per clock cycle
• 64 general-purpose registers
• Program sequencer supports all standard program flows…
• Branching costs 3 cycles
• No hardware support for:
  - Integer multiply
  - Floating-point divide
  - Double-precision floating-point operations
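The 32 GFLOPS peak figure follows from the numbers on this slide: cores × clock × floating-point operations per cycle. A minimal sketch of that arithmetic, assuming the two operations per cycle come from one fused multiply-add and that the parenthesized 600 MHz is the clock the board actually runs at (the function name is hypothetical, not part of any SDK):

```c
#include <assert.h>

/* Peak GFLOPS = cores x clock (GHz) x floating-point ops per cycle.
 * flops_per_cycle = 2 models one fused multiply-add per cycle (assumption). */
double peak_gflops(unsigned cores, double clock_ghz, unsigned flops_per_cycle)
{
    assert(clock_ghz > 0.0);
    return (double)cores * clock_ghz * (double)flops_per_cycle;
}
```

With 16 cores at 1 GHz this gives the quoted 32 GFLOPS; at 600 MHz the same 16 cores would peak at roughly 19.2 GFLOPS.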
eCore CPU(1)
Adapteva (Zynq + Epiphany III)
Epiphany Architecture(1)
Every router in the mesh is connected to its north, east, west, and south neighbors and to a mesh node.
Routers at every node contain round-robin arbiters. Routing hop latency is 1.5 clock cycles.
Interconnects
• eCores are connected by a 2D low-latency NoC (eMesh):
  - rMesh for reads
  - xMesh for off-chip writes
  - cMesh for on-chip writes
• eMesh has only nearest-neighbor direct connections.
• Each routing link can transfer up to 8 bytes of data on every clock cycle.
Network-On-Chip Overview(1)
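The hop latency and link width quoted above allow a rough back-of-the-envelope estimate. The sketch below assumes nearest-neighbor (Manhattan) routing and ignores arbitration and contention, so it is a lower bound rather than a measurement; all function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hops between two mesh nodes when only nearest-neighbor links exist. */
unsigned mesh_hops(int row_a, int col_a, int row_b, int col_b)
{
    return (unsigned)(abs(row_a - row_b) + abs(col_a - col_b));
}

/* Routing latency in clock cycles, using the 1.5 cycles-per-hop figure. */
double routing_cycles(unsigned hops)
{
    return 1.5 * (double)hops;
}

/* Peak link bandwidth in GB/s: 8 bytes per cycle times the clock in GHz. */
double link_bandwidth_gbs(double clock_ghz)
{
    assert(clock_ghz > 0.0);
    return 8.0 * clock_ghz;
}
```

For example, corner to corner on a 4×4 mesh is 6 hops, or about 9 cycles of routing latency under these assumptions.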
Network Topology(1)
Interconnects
• The network completes transactions in a single clock cycle because of spatial locality and short point-to-point on-chip wires.
• Each mesh node has a globally addressable ID (6-bit row ID and 6-bit column ID)
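The 6+6-bit node ID can be illustrated with a little bit arithmetic. This sketch assumes, following my reading of the Epiphany Architecture Reference (1), that the row ID occupies the upper 6 bits of the 12-bit core ID and that the core ID forms the top 12 bits of a 32-bit global address, leaving a 20-bit local offset. The helper names are hypothetical and not part of eLib or eHAL.

```c
#include <assert.h>
#include <stdint.h>

#define COORD_BITS   6u   /* bits per row ID and per column ID */
#define OFFSET_BITS 20u   /* assumed width of the per-core local offset */

/* Pack a (row, col) mesh coordinate into a 12-bit core ID. */
uint32_t make_coreid(uint32_t row, uint32_t col)
{
    assert(row < 64u && col < 64u);
    return (row << COORD_BITS) | col;
}

/* Combine a core ID and a local offset into a 32-bit global address. */
uint32_t global_address(uint32_t coreid, uint32_t local_offset)
{
    assert(coreid < (1u << (2u * COORD_BITS)));
    assert(local_offset < (1u << OFFSET_BITS));
    return (coreid << OFFSET_BITS) | local_offset;
}
```

Under these assumptions, the node at row 32, column 8 has core ID 0x808, and offset 0x2000 in its local memory is visible globally at 0x80802000.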
Memory
• Shared memory (32-bit-wide, flat, and unprotected)
• Primary memory: 1 GB (DDR3 SDRAM)
• Flash memory: 128 Mb (boot code)
• Little-endian memory architecture
• A single, flat address space consisting of 2^32 8-bit bytes (2^30 32-bit words)
• SRAM distribution:
  Chip Core   Start Address   End Address   Size
  (0,0)       00000000        00007FFF      32KB
• On every clock cycle 64 bits of data / instructions can be exchanged between memory and CPU’s register file, network interface or local DMA.
• Dual channel DMA engine
• Memory Mapped Registers
• Each eCore has 32 KB of local memory (4 sub-banks × 8 KB)
• eCPU has a variable-length instruction pipeline that depends on the type of instruction being executed.
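To make the 4 × 8 KB bank split concrete, the sketch below maps a local offset to its sub-bank. It assumes the banks are laid out contiguously in address order (bank 0 at 0x0000 to 0x1FFF, and so on); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LOCAL_MEM_BYTES 0x8000u  /* 32 KB of local memory per eCore */
#define BANK_BYTES      0x2000u  /* 8 KB per sub-bank */

/* Return the sub-bank (0..3) that holds a given local offset,
 * assuming contiguous banks in address order. */
unsigned local_bank(uint32_t local_offset)
{
    assert(local_offset < LOCAL_MEM_BYTES);
    return (unsigned)(local_offset / BANK_BYTES);
}
```

A helper like this is handy when deciding where to place code and data buffers, for instance to keep them in different banks.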
Memory
Memory Architecture(2)
Memory: Read-Write Transactions
• Read transactions are non-blocking
• RW transactions from local memory follow a strong memory-order model.
• RW transactions that access non-local memory follow a weak memory-order model.
• Solution: use run-time synchronization calls with order-dependent memory sequences.
• Less inter-node communication
Scalability
• The chip has four identical, source-synchronous, bidirectional off-chip eLinks.
• eLink is non-blocking
• Optimal bandwidth is achieved when a large number of incrementally numbered 64-bit data packets are sent consecutively.
FPGA eLink Integration(1)
360 Degree View(front)
Image Source : http://www.parallella.org/board/
360 Degree View(back)
Image Source : http://www.parallella.org/board/
PEC: Parallella Expansion Connector
How to get started..
1. Create a Parallella micro-SD card: http://www.parallella.org/create-sdcard/
2. Connect the cables as described at http://www.parallella.org/quick-start/
3. Power on
4. Go...
Epiphany Host Library (eHAL)
• Encapsulates low-level Epiphany functionality (Epiphany device driver)
• Library interface is defined in “e-hal.h”.
• Steps to write a program:
1. Prepare the system:
    e_init(NULL);                     // initialize the system
    e_reset_system();                 // reset the platform
    e_get_platform_info(&platform);   // get the actual system parameters
2. Allocate memory (optional):
    e_mem_t emem;                     // object of type e_mem_t
    char emsg[Size];
    e_alloc(&emem, <BufOffset>, <BufferSize>);  // allocate a buffer in shared external memory
3. Open workgroup:
    e_open(&dev, 0, 0, platform.rows, platform.cols);  // open all cores
    (or)
    e_open(&dev, 0, 0, 1, 1);         // open a 1x1 workgroup starting at core (0,0)
    e_reset_group(&dev);              // soft reset
Epiphany Host Library (eHAL)
4. Load the program:
    e_load("program", &dev, 0, 0, E_TRUE);   // core coordinates are relative to the workgroup
5. Wait, then print the message from the buffer:
    usleep(time);
    e_read(&emem, 0, 0, 0x0, emsg, _BufSize);
    fprintf(stderr, "\"%s\"\n", emsg);
6. Close every connection:
    e_close(&dev);
    e_free(&emem);
    e_finalize();
Epiphany Host Library (eHAL)
Epiphany Hardware Utility Library (eLib)
• Provides functions for configuring and querying eCores.
• Also automates many common programming tasks in eCores
• Steps to write an eCore program:
• Step 1: Declare shared memory:
    char outbuf[128] SECTION("shared_dram");
• Step 2: Query the eCore ID:
    e_coreid_t coreid;
    coreid = e_get_coreid();
• Step 3: Print “Hello World” with the core ID
• Step 4: Exit
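Hello World

Host side:

    #include <stdio.h>
    #include <unistd.h>
    #include <e-hal.h>

    #define _BufSize   128          // matches outbuf[128] on the eCore side
    #define _BufOffset 0x01000000   // assumed offset into shared DRAM; adjust to your memory map

    int main(int argc, char *argv[])
    {
        e_platform_t platform;
        e_epiphany_t dev;
        e_mem_t emem;
        char emsg[_BufSize];

        e_init(NULL);                        // initialize the system
        e_reset_system();                    // reset the platform
        e_get_platform_info(&platform);      // query the system parameters
        e_alloc(&emem, _BufOffset, _BufSize);   // buffer in shared external memory
        e_open(&dev, 0, 0, 1, 1);            // 1x1 workgroup at core (0,0)
        e_load("e_core.srec", &dev, 0, 0, E_TRUE);  // load and start the eCore program
        usleep(10000);                       // give the eCore time to write its message
        e_read(&emem, 0, 0, 0x0, emsg, _BufSize);
        fprintf(stderr, "\"%s\"\n", emsg);
        e_close(&dev);
        fflush(stdout);
        e_free(&emem);
        e_finalize();
        return 0;
    }

eCore side:

    #include <stdio.h>      // sprintf
    #include "e-lib.h"

    char outbuf[128] SECTION("shared_dram");   // message buffer in shared DRAM

    int main(void)
    {
        e_coreid_t coreid;

        coreid = e_get_coreid();
        sprintf(outbuf, "Hello World from core 0x%03x!", coreid);
        return 0;
    }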
Epiphany Program Build Flow(2)
Where to put the code..
• 3 different Linker Description Files (LDF)
• Internal.ldf: stores data and instructions in internal SRAM (limit 32 KB).
• Fast.ldf: user code/data and stack in internal SRAM; standard libraries in external DRAM.
Good for programs that use a few large library functions.
• Legacy.ldf: everything stored in external DRAM (limit 1 MB).
Slower than internal and fast.
Synchronization(eCores)
http://www.linuxplanet.org/blogs/?cat=2359
Barrier for synchronizing parallel executing threads
1. Setup: e_barrier_init(bar_array[], tgt_bar_array[])
2. Call function
3. Wait for sync: e_barrier(bar_array[], tgt_bar_array[])
Mutex(blocking & non blocking)..
1. Setup: e_mutex_init(0, 0, s_mutex, mutex_attr)
2. Gain access: e_mutex_lock(0, 0, s_mutex)
3. Call function
4. Release access: e_mutex_unlock(0, 0, s_mutex)
Image Source: http://xkcd.com/1445/
Synchronization between the ARM and eCores: use a flag
Because: eMesh writes from an individual Epiphany core to the external shared DRAM update the DRAM in the same order as they were sent. However, if multiple cores are writing to external DRAM, the order in which their writes reach the DRAM is not guaranteed.
Solution:
1. Set a flag
2. Use the software barrier function e_barrier() (time consuming)
3. Use the experimental hardware barrier opcode
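The flag approach can be sketched as a small mailbox: the writer stores the payload first and sets the flag last, and the reader only trusts the payload once it sees the flag. The sketch below uses C11 atomics as a host-side stand-in; on the board the structure would live in shared DRAM and the ordering would rely on the in-order delivery of a single core's writes described above. The mailbox_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define MSG_BYTES 128

/* Hypothetical mailbox: payload plus a ready flag written last. */
typedef struct {
    char msg[MSG_BYTES];
    atomic_uint ready;   /* 0 = empty, 1 = message available */
} mailbox_t;

void mailbox_init(mailbox_t *box)
{
    memset(box->msg, 0, sizeof box->msg);
    atomic_init(&box->ready, 0u);
}

/* Producer: write the payload first, then publish it by setting the flag. */
void mailbox_post(mailbox_t *box, const char *text)
{
    assert(strlen(text) < MSG_BYTES);
    strcpy(box->msg, text);
    atomic_store_explicit(&box->ready, 1u, memory_order_release);
}

/* Consumer: non-blocking poll; copies the message only if the flag is set. */
bool mailbox_try_read(mailbox_t *box, char *out, size_t out_size)
{
    assert(out_size >= MSG_BYTES);
    if (atomic_load_explicit(&box->ready, memory_order_acquire) == 0u) {
        return false;
    }
    memcpy(out, box->msg, MSG_BYTES);
    atomic_store_explicit(&box->ready, 0u, memory_order_release);
    return true;
}
```

Polling a flag like this replaces a fixed usleep() on the host: the reader proceeds as soon as the message is published instead of guessing how long the eCore needs.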
My Understanding
Useful for Sync
eCore-side read and write:
    e_write(remote, Src, row, col, Dst, Byte_size);
    e_read(remote, Dst, row, col, Src, Byte_size);
The remote parameter must be either e_group_config, if the remote is a workgroup core, or e_emem_config, if the remote is an external memory buffer.
Conclusion
• Fast and power efficient
• Power needed: 5 V / 2 A (0.3 A to 1.5 A)
• Fully featured ANSI-C/C++ and OpenCL programming environments
• Large application domain support
• But...
• Needs an improved SDK (on the way...)
• A cache might improve performance (a software cache is on the way…)
• Synchronization and randomness are a big issue…
Reference
1. Epiphany Architecture Reference: http://www.adapteva.com/docs/epiphany_arch_ref.pdf
2. Epiphany SDK Reference: http://adapteva.com/docs/epiphany_sdk_ref.pdf
3. eSDK GitHub: https://github.com/adapteva/epiphany-sdk
4. Further reading: http://www.adapteva.com/all-documents/