
Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL ...

Multicore/Manycore Processors Joram Benham April 2, 2012
Transcript
Page 1: Joram Benham April 2, 2012.  Introduction  Motivation  Multicore Processors  Overview, CELL  Advantages of CMPs  Throughput, Latency  Challenges.

Multicore/Manycore Processors

Joram Benham
April 2, 2012

Outline

- Introduction
- Motivation
- Multicore Processors
  - Overview, CELL
- Advantages of CMPs
  - Throughput, Latency
- Challenges
- Future of Multicore

Introduction

- Multicore processors
  - Several/many cores on the same chip
  - Dual/quad core – two/four cores
- AKA chip-multiprocessors (CMPs)

Motivation

Motivation - ILP

- Instruction-level parallelism
  - Pipelining – split execution into stages
  - Superscalar – issue multiple instructions each cycle
  - Out-of-order execution
  - Branch prediction
- Take advantage of implicit program parallelism – instruction independence

Motivation – ILP Problems

1. Limited amount of implicit parallelism in sequentially designed/coded programs

2. Circuitry for pipelining becomes complex after 10-20 stages

3. Power – circuitry for ILP exploitation results in exponentially more power being used

Intel processor power over time. Power in Watts on y-axis, years on x-axis.

Chip-Multiprocessors
AKA Multicore/Manycore Processors

CMPs

- Getting harder to build better uniprocessors
- CMPs are less difficult
  - Can reuse/modify old designs
  - Add modified copies to same chip
- Requires a paradigm shift
  - From Von Neumann model to parallel programming model
  - Thread-level parallelism + instruction-level parallelism

Basic Uniprocessor Design

Basic CMP Design

Real CMP Example - CELL

- CELL CMP – heterogeneous
  - Developed by Sony, Toshiba, IBM
  - Built for Sony’s PlayStation 3
  - Contains 9 cores
    - 1 Power Processing Element (PPE)
    - 8 Synergistic Processing Elements (SPEs)

Advantages of CMPs
Throughput, Latency

Improving Throughput

- Web-server throughput
  - Handle many independent service requests
  - Collections of uniprocessor servers used
  - Then, multiprocessor systems
- CMP approach
  - Use less power for communication
  - Reduce clock speeds

Throughput – Servers

- General rule: “The simpler the pipeline, the lower the power.”
- Simple cores – less power used
- Less speed, but more cores available to handle requests

Comparison of power usage by equivalent narrow issue/in-order processors, and wide-issue/out-of-order processors on throughput-oriented software.

Throughput - Multithreading

- Server applications:
  - High thread-level parallelism
  - Lower instruction-level parallelism, high cache miss rates
  - Results in idle processor time on uniprocessors
- Hardware multithreading
  - Coarse-grained: stalls trigger switches
  - Fine-grained: switch threads continuously
  - Simultaneous: run multiple threads using superscalar issuing
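
The coarse-grained case can be illustrated with a small cycle-counting sketch. It is a toy model, not taken from the slides: every thread is assumed to alternate between a fixed compute burst (`run` cycles) and a fixed cache-miss stall (`miss` cycles), and switching threads is assumed to cost nothing. The function name and all parameter values are hypothetical.

```python
def utilization(num_threads, run=4, miss=12, bursts=50):
    """Fraction of cycles one core spends doing useful work (toy model)."""
    remaining = [bursts] * num_threads   # compute bursts left per thread
    ready_at = [0] * num_threads         # cycle at which each thread may run again
    cycle = busy = 0
    while any(remaining):
        # Threads that still have work and are not waiting on memory.
        runnable = [t for t in range(num_threads)
                    if remaining[t] and ready_at[t] <= cycle]
        if not runnable:
            cycle += 1                   # every thread is stalled: the core idles
            continue
        t = runnable[0]
        cycle += run                     # run until this thread's next cache miss
        busy += run
        remaining[t] -= 1
        ready_at[t] = cycle + miss       # the stall triggers a switch to another thread
    return busy / cycle


for n in (1, 2, 4):
    print(n, "thread(s):", round(utilization(n), 2))
```

With these assumed numbers a single thread leaves the core idle most of the time, while four threads are enough to hide the stalls completely.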

Throughput – Increase the Cores

- More cores = higher total hardware thread count
- What kind of cores should be added?
  - Fewer larger, more complex cores
    ▪ Individual threads complete faster
  - Many smaller, simpler cores
    ▪ Slightly slower – but more cores means more threads, and higher throughput
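
A back-of-the-envelope sketch of this trade-off follows. The core counts and per-core request rates are made-up illustration values, not measurements, and the model assumes independent requests, one request at a time per core, and perfect scaling across cores.

```python
def chip_throughput(cores, requests_per_sec_per_core):
    """Total requests per second if every core works on independent requests."""
    return cores * requests_per_sec_per_core


def request_latency_ms(requests_per_sec_per_core):
    """Time one core needs for a single request, in milliseconds."""
    return 1000.0 / requests_per_sec_per_core


# Hypothetical chips of similar size: 4 complex cores vs. 16 simple cores.
few_large = {"cores": 4, "rate": 100}
many_small = {"cores": 16, "rate": 80}

for name, chip in (("few large", few_large), ("many small", many_small)):
    print(name,
          chip_throughput(chip["cores"], chip["rate"]), "req/s,",
          request_latency_ms(chip["rate"]), "ms per request")
```

Under these assumptions the simple-core chip finishes each request a little later but serves roughly three times as many requests per second.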

Improving Latency

- Latency is more important in some programs
  - E.g. desktop applications, compilation
- Cores in a CMP are close together on the same chip – less communication time
- Two ways CMPs help with latency
  - Parallelize the code for responsive applications
  - Run sequential applications on their own hardware threads – no competition between threads

Multicore Challenges
Power and Temperature, Cache Coherence, Memory Access, Paradigm Shift, Starvation

Power and Temperature

- In theory: two cores on the same chip = twice as much power + lots of heat
- Solutions:
  - Reduce core clock speeds
  - Implement a power control unit
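
Why lower clock speeds help can be sketched with the usual first-order approximation for dynamic (switching) power, P ≈ C · V² · f. The formula is a standard textbook approximation rather than something stated on the slide, and the operating points below are normalised, hypothetical values. Leakage power is ignored, and the workload is assumed to split perfectly across both cores.

```python
def dynamic_power(capacitance, voltage, frequency):
    """First-order dynamic (switching) power: P ~ C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


# Normalised, hypothetical operating points (not measurements).
single_fast_core = dynamic_power(1.0, 1.0, 1.0)

# Two cores, each clocked at 60% speed, which is assumed to allow 80% voltage.
two_slow_cores = 2 * dynamic_power(1.0, 0.8, 0.6)
relative_throughput = 2 * 0.6   # assumes perfect scaling across the two cores

print("one fast core :", single_fast_core)
print("two slow cores:", round(two_slow_cores, 3),
      "power for", relative_throughput, "x throughput")
```

Under these assumptions the two slower cores draw less power than the single fast core while completing more work per unit time.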

CELL chip-multiprocessor thermal diagram.

Cache Coherence

- Multiple cores, independent local caches
  - Load same block of main memory into cache – may result in data inconsistency
- Cache coherence schemes
  - Snooping: watch the communication bus
  - Directory-based: keep track of which memory locations are being shared in multiple caches
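
The snooping idea can be sketched as a toy write-invalidate scheme. This is a deliberately simplified illustration, not a real protocol such as MSI or MESI: caches are write-through, a line is either present or absent, and the class and method names (`Bus`, `Cache`, `snoop_invalidate`) are invented for the example.

```python
class Bus:
    """Shared bus that every cache watches (snoops)."""

    def __init__(self):
        self.caches = []

    def broadcast_invalidate(self, sender, addr):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(addr)


class Cache:
    """Per-core cache; a line's presence in `lines` means the copy is valid."""

    def __init__(self, bus, memory):
        self.lines = {}
        self.bus = bus
        self.memory = memory
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(self, addr)  # other copies become stale
        self.lines[addr] = value
        self.memory[addr] = value            # write-through keeps memory current

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)           # drop our copy if we hold one


memory = {0x10: 1}
bus = Bus()
core0, core1 = Cache(bus, memory), Cache(bus, memory)

core0.read(0x10)
core1.read(0x10)                  # both caches now hold the value 1
core0.write(0x10, 2)              # core1's copy is invalidated via the bus
value_seen_by_core1 = core1.read(0x10)
print(value_seen_by_core1)
```

Without the invalidation broadcast, `core1` would keep returning its stale cached value, which is the inconsistency described above.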

Memory Issues

- We need more memory to share among multicore processors
  - 64-bit processors help address the issue: more addressable memory
  - Useless if we cannot access it quickly
  - Disk speed slows everyone down

Change to Parallel Paradigm

- “To use multicore, you really have to use multiple threads. If you know how to do it, it's not bad. But the first time you do it there are lots of ways to shoot yourself in the foot. The bugs you introduce with multithreading are so much harder to find.”
- Have to educate programmers
  - Convince them to make their programs concurrent
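
One classic way to "shoot yourself in the foot" is an unsynchronised update to shared data. The sketch below shows the safe version of a shared counter using Python's standard `threading` module. The function name and the worker and increment counts are arbitrary illustration values.

```python
import threading


def count_with_lock(workers=4, increments=10_000):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            # `counter += 1` is a read-modify-write sequence; the lock makes
            # sure no other thread interleaves between the read and the write.
            with lock:
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(count_with_lock())
```

If the `with lock:` line is removed, updates from different threads may overwrite each other. The result can then come out lower than expected, and only on some runs, which is what makes such bugs hard to find.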

Starvation

- Sequential programs will not use all cores
  - Some cores “starve”
- Shared cache usage
  - One core evicts another core’s data
  - Other core has to keep accessing main memory
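
The shared-cache effect can be sketched with a tiny LRU cache model. Everything here is hypothetical: a 4-line cache, core A reusing two addresses, and a neighbouring core B that streams through new addresses. Real caches are set-associative and far larger.

```python
from collections import OrderedDict


class SharedLRUCache:
    """Toy fully-associative cache shared by all cores, with LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # addr -> core that loaded it, oldest first
        self.misses = {}             # core -> miss count

    def access(self, core, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)       # mark as most recently used
            return True
        self.misses[core] = self.misses.get(core, 0) + 1
        self.lines[addr] = core
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)     # evict the least recently used line
        return False


def misses_for_core_a(rounds, neighbour_streams):
    cache = SharedLRUCache(capacity=4)
    next_addr = 100
    for _ in range(rounds):
        for addr in (0, 1):                    # core A reuses a tiny working set
            cache.access("A", addr)
        if neighbour_streams:
            for _ in range(4):                 # core B touches 4 fresh addresses
                cache.access("B", next_addr)
                next_addr += 1
    return cache.misses.get("A", 0)


print("A alone       :", misses_for_core_a(10, False), "misses")
print("A next to B   :", misses_for_core_a(10, True), "misses")
```

In this model core A misses only on its first two accesses when it runs alone, but misses on every access once core B keeps evicting its lines.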

Future of Multicore
Multicore, Manycore, Hybrids

Future of CMPs

- Instruction-level parallelism is reaching its limits
- CMPs help with throughput and latency
- Two types of CMP will emerge
  - “Manycore”: large number of small, simple cores, targeted at servers/throughput
  - “Multicore”: fewer, faster superscalar cores for very latency-sensitive programs
- “Hybrids”: heterogeneous combinations

References

Hammond, L., Laudon, J., Olukotun, K. Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency. Morgan and Claypool, 2007.

Hennessy, J. L., Patterson, D. A. Computer Architecture: A Quantitative Approach. San Francisco: Morgan Kaufmann Publishers, 2007.

Mashiyat, A. S. “Multi/Many Core Systems.” St. Francis Xavier University course presentation, 2011.

Schauer, Bryan. “Multicore Processors – A Necessity.” Proquest Discovery Guides. September 2008. Web. Accessed April 2 2012. <http://www.csa.com/discoveryguides/multicore/review.pdf>

