Date post: | 15-Jan-2016 |
System Simulation Of 1000-cores Heterogeneous SoCs
Shivani Raghav
Embedded System Laboratory (ESL)
École Polytechnique Fédérale de Lausanne (EPFL)
ESL Work on Energy-Aware Datacenter Design
[Figure: energy-aware datacenter design flow. Labels: system simulation for many-core (SIMinG-1k); datacenter infrastructure; IPS; PMSM: power/thermal manager; new server cooling technologies; network communication; load profiles 1 to Nw; price profiles 1 to N; Internet; grid.]
Emerging Data-Intensive Workloads
• Cloud Servers
• Molecular Dynamics
• Monte Carlo Simulations
• Gene Sequencing
• Online Gaming Services
• Financial Simulations
• Medical Imaging
Demand for Hardware Acceleration
• Tile-based manycores (integrated): Intel SCC, Tile 64
• GPU clusters (off-chip accelerators)
• Hybrid cores (on-chip): AMD Fusion
Urgent Need for Simulation of Heterogeneous SoCs
Simulation is needed for:
• Thermal & Power Evaluations
• Benchmarking
• Profiling
• Debugging
• Design Space Exploration
• Early Software Development
How to Design a Fast and Scalable Many-Core Simulator?
• Parallel Target
• Parallel Simulator
• Parallel Host
Simulating a Parallel Target on a Parallel Host is an Old Technology…
[Figure: host platforms (large parallel systems, FPGA, GPGPU) with the simulators Flexus, RAMP, WWT II, Graphite, COTSon and OVPSim; one entry is marked "Opportunity".]
Target Architecture: Data-Parallel Coprocessors
• Simple in-order cores
• 1000s of cores in a tile network
• Fine-grain parallelism
[Figure: a single tile with core, caches, memory and switch.]
Solution – Accelerating Simulation using GPGPUs
Target Architecture and Host Platform: A Perfect Match
Outline
• Problem Overview: Simulation of Heterogeneous SoCs
• Solution: SIMinG-1k, a GPU-accelerated simulator
• Evaluation
• Summary
Overall Simulation Framework
[Figure: simulation framework. Labels: application (sequential code, data-parallel code); target architecture (general-purpose CPU, many-core accelerator); simulator SIMinG-1k; host platform.]
SIMinG-1k - Features
• Instruction Accurate
• Inexpensive and Easily Available
• Fast Development Cycle
• Equation Performance Model
• Portability (Target Independent)
• Interpretation based core-simulation
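The "instruction accurate" and "interpretation based" features can be pictured with a minimal sketch of a fetch-decode-execute loop over many simulated cores. Everything below is an assumption made for illustration: the three-instruction toy ISA, the `Core` class and the `step`/`run` helpers are hypothetical and are not taken from SIMinG-1k, whose target cores use the ARM ISA. Mapping one host thread to one simulated core is likewise only an assumed arrangement.

```python
from dataclasses import dataclass, field

# Hypothetical three-instruction toy ISA, for illustration only.
ADDI, ADD, HALT = "addi", "add", "halt"


@dataclass
class Core:
    """Architectural state of one simulated in-order core."""
    regs: list = field(default_factory=lambda: [0] * 4)
    pc: int = 0
    halted: bool = False
    retired: int = 0  # instructions executed so far


def step(core, program):
    """Fetch, decode and execute exactly one instruction on one core."""
    if core.halted:
        return
    op, a, b, c = program[core.pc]          # fetch
    if op == ADDI:                          # decode + execute
        core.regs[a] = core.regs[b] + c     # c is an immediate
    elif op == ADD:
        core.regs[a] = core.regs[b] + core.regs[c]
    elif op == HALT:
        core.halted = True
    core.pc += 1
    core.retired += 1


def run(cores, program):
    """Advance all cores one instruction at a time until every core halts.

    Assumption: on a GPU host, the inner loop over cores would be replaced
    by one thread per simulated core.
    """
    while not all(core.halted for core in cores):
        for core in cores:
            step(core, program)


program = [(ADDI, 0, 0, 5), (ADDI, 1, 1, 7), (ADD, 2, 0, 1), (HALT, 0, 0, 0)]
cores = [Core() for _ in range(8)]
run(cores, program)
```

Because every instruction is interpreted one at a time, the per-core `retired` counter gives the exact instruction count that an instruction-accurate model reports.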
Challenges of using GPU as a host
• SIMT (Single Instruction, Multiple Threads)
• Divergent code is a problem
• Synchronization outside a thread block
• Slow CPU-GPU communication
• Global memory is slow and limited
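Why divergent code hurts under SIMT can be illustrated with a deliberately simplified cost model: threads in a warp run in lockstep, so if the simulated cores mapped to one warp are executing different opcodes, each distinct opcode is assumed to need its own serialized pass. The function `warp_passes` and the one-simulated-core-per-thread mapping are assumptions for this sketch, not a description of SIMinG-1k or of real GPU scheduling in detail.

```python
WARP_SIZE = 32  # warp width on NVIDIA GPUs; kept as a parameter here


def warp_passes(opcodes, warp_size=WARP_SIZE):
    """Count serialized passes needed to execute one instruction per thread.

    Simplified model (an assumption): within a warp, all threads that hold
    the same opcode execute together, and each distinct opcode costs one pass.
    """
    passes = 0
    for start in range(0, len(opcodes), warp_size):
        warp = opcodes[start:start + warp_size]
        passes += len(set(warp))
    return passes


# 64 simulated cores all executing the same opcode: one pass per warp.
uniform = ["add"] * 64
# 64 simulated cores alternating between two opcodes: two passes per warp.
mixed = ["add", "load"] * 32

uniform_cost = warp_passes(uniform)
mixed_cost = warp_passes(mixed)
```

In this model the mixed workload costs twice as many passes as the uniform one, which is the effect the "divergent code" bullet refers to.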
Results – Architecture 1
MIPS: millions of simulated instructions per second of host wall-clock time
Single tile of target accelerator: ARM ISA core, instruction scratchpad, data scratchpad
[Figure: S-MIPS (0 to 700) versus number of simulated cores (128 to 4096) for the benchmarks MM, NCC, IDCT, EPCC, DQ, FFT and SYNC1.]
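The simulation-speed metric can be written out as a small helper. This is a sketch under the usual reading of MIPS (millions of instructions per second); the function name `simulated_mips` and the numbers in the example are made up for illustration and are not measurements from the slides.

```python
def simulated_mips(total_instructions, wall_clock_seconds):
    """Millions of simulated instructions per second of host wall-clock time."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return total_instructions / wall_clock_seconds / 1e6


# Illustrative numbers only: 1024 simulated cores, each retiring
# 1,000,000 instructions, with the whole run taking 4 seconds on the host.
cores = 1024
instructions_per_core = 1_000_000
example_mips = simulated_mips(cores * instructions_per_core, 4.0)
```

The instruction count is summed over all simulated cores, so the metric grows with core count as long as the host keeps up.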
Speed Up – Architecture 1
Speedup compared to simulation on OVPSim (thousands of ARM cores)
[Figure: MIPS (0 to 1000) versus number of simulated cores (32 to 4096) for Matrix Multiply, SIMinG-1k against OVP.]
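When two simulators run the same workload, the speedup of one over the other equals the ratio of their simulation throughputs. A minimal sketch, with a hypothetical helper name and placeholder numbers that are not results from this work:

```python
def speedup(accelerated_mips, reference_mips):
    """Speedup of one simulator over a reference, from their MIPS figures.

    Valid when both simulators execute the same instruction stream, since
    run time is then inversely proportional to throughput.
    """
    if reference_mips <= 0:
        raise ValueError("reference_mips must be positive")
    return accelerated_mips / reference_mips


# Placeholder throughputs, chosen only to show the arithmetic.
example_speedup = speedup(accelerated_mips=400.0, reference_mips=8.0)
```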
Results – Architecture 2
Single tile of data-parallel accelerator (cores, caches, on-chip interconnect)
[Figure: a single tile with core, caches, memory and switch.]
[Figure: S-MIPS (0 to 50) versus number of simulated cores (128 to 4096) for the benchmarks NCC, MM, IDCT, DQ, FFT, EPCC and SYNC1. Values printed on the plot: 0.180, 0.077, 0.026, 0.006, 0.002, 0.001.]
Speed Up – Architecture 2
Speedup compared to serial simulation on QEMU
Conclusion
• Challenge: Fast and parallel simulator for heterogeneous SoCs
• Solution: Parallelize 1000-core simulation using GPUs
• Design: Full-system simulation using QEMU and SIMinG-1k
• Results: High scalability and speedup up to 4096 cores
Future Work
• Extend the simulator for thermal and power evaluations
• Complete simulation of cloud data centers
Thanks!
Questions?