Multithreading Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Transcript
Page 1:

Multithreading

Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Page 2:

Fine-Grain Multithreading

• Fine-grain multithreading performs a switch between threads EVERY cycle. What is the primary goal of such an approach? (A sketch follows the choices below.)

Selection “Best” argument

A Service each thread equally

B Hide instruction latencies

C Reduce replicated resources

D Exploit idle resources

E None of the above
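
A minimal simulation sketch (editor-added; the thread count, instruction latency, and round-robin policy are all hypothetical) of what switching every cycle buys: while one thread waits on a multi-cycle instruction, the issue slot is filled by the other threads, so instruction latencies are hidden.

```c
#include <stdio.h>

/* Hypothetical toy model: 4 threads, each executing instructions whose
 * results are not ready for LATENCY cycles.  Fine-grain multithreading
 * rotates to a different thread EVERY cycle, so by the time a thread is
 * scheduled again its previous result is ready and no cycle is wasted. */
#define NTHREADS 4
#define LATENCY  3   /* cycles until an instruction's result is usable */
#define CYCLES   20

int main(void) {
    int ready_at[NTHREADS] = {0};  /* cycle when each thread can issue again */
    int issued = 0, idle = 0;

    for (int cycle = 0; cycle < CYCLES; cycle++) {
        int t = cycle % NTHREADS;  /* round-robin: a new thread every cycle */
        if (cycle >= ready_at[t]) {
            issued++;
            ready_at[t] = cycle + LATENCY;  /* its next instruction must wait */
        } else {
            idle++;                /* slot wasted: thread still stalled */
        }
    }
    printf("issued %d, idle %d of %d cycles\n", issued, idle, CYCLES);
    return 0;
}
```

With 4 thread slots and a 3-cycle latency every cycle issues an instruction; a single thread on the same machine would issue only once every 3 cycles.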

Page 3:

Fine-Grain Multithreading

• Fine-grain multithreading performs a switch between threads EVERY cycle. What is the primary drawback of such an approach? (A small worked example follows the choices below.)

Selection “Best” argument

A Poor scalability (benefits for 8 threads exceed benefits for 64 threads).

B Extra hardware

C Need for large number of threads

D A and B

E B and C
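
A tiny editor-added calculation of the flip side, assuming a strict barrel-style design (a hypothetical simplification) that rotates through all hardware thread slots whether or not they hold runnable work: a lone thread only issues once every N cycles, so the approach needs many threads to pay off.

```c
#include <stdio.h>

/* Hypothetical strict "barrel" design: the core rotates through all
 * hardware thread slots every cycle, so one thread issues at most once
 * every N cycles regardless of how many slots actually have work. */
int main(void) {
    int insts = 1000;                       /* instructions in one thread */
    for (int nslots = 1; nslots <= 64; nslots *= 8) {
        printf("%2d thread slots: lone thread needs ~%5d cycles\n",
               nslots, insts * nslots);
    }
    return 0;
}
```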

Page 4:

Coarse-Grain Multithreading

• Coarse-grain multithreading performs a switch between threads whenever one thread encounters a high-latency event. What is the primary goal of such an approach? (A sketch follows the choices below.)

Selection Best argument

A Service each thread equally

B Hide instruction latencies

C Reduce replicated resources

D Exploit idle resources

E None of the above
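
A companion sketch (editor-added; the miss rate and miss penalty are made-up numbers) of the coarse-grain policy: a thread runs until it takes a long-latency cache miss, and only then does the core switch, so cycles that would otherwise be idle during the miss are spent running another thread.

```c
#include <stdio.h>

/* Hypothetical toy model: coarse-grain multithreading keeps running one
 * thread and only switches when that thread hits a long-latency event
 * (modeled here as a cache miss every MISS_EVERY instructions that
 * stalls the thread for MISS_PENALTY cycles). */
#define NTHREADS     2
#define MISS_EVERY   5     /* every 5th instruction misses the cache   */
#define MISS_PENALTY 20    /* cycles the missing thread is unavailable */
#define CYCLES       100

int main(void) {
    int stalled_until[NTHREADS] = {0};
    int insts[NTHREADS] = {0};
    int cur = 0, busy = 0, idle = 0;

    for (int cycle = 0; cycle < CYCLES; cycle++) {
        if (cycle < stalled_until[cur]) {
            /* Current thread is waiting on memory: switch to the other
             * thread if it is ready, otherwise the cycle is lost. */
            int next = (cur + 1) % NTHREADS;
            if (cycle >= stalled_until[next]) cur = next;
            else { idle++; continue; }
        }
        busy++;
        insts[cur]++;
        if (insts[cur] % MISS_EVERY == 0)        /* long-latency event */
            stalled_until[cur] = cycle + MISS_PENALTY;
    }
    printf("busy %d, idle %d of %d cycles\n", busy, idle, CYCLES);
    return 0;
}
```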

Page 5:

Coarse-Grain Multithreading

• Coarse-grain multithreading performs a switch between threads whenever one thread encounters a high-latency event. What is the primary drawback of such an approach? (A small worked example follows the choices below.)

Selection Best argument

A Poor scalability (benefits for 8 threads exceed benefits for 64 threads)

B Context switch times are slow

C Extra hardware

D A and B

E B and C
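
An editor-added back-of-the-envelope check (all latencies hypothetical) of why slow switches are the catch: a coarse-grain switch drains and refills the pipeline, so it only pays off when the stall being hidden is longer than that overhead, and short stalls are cheaper to simply sit through.

```c
#include <stdio.h>

/* Hypothetical numbers: a coarse-grain switch must drain and refill the
 * pipeline, so it is only worthwhile for stalls longer than that cost. */
int main(void) {
    int switch_cost = 14;    /* cycles to drain + refill the pipeline    */
    int l1_miss     = 12;    /* stall on an L1 miss (hypothetical)       */
    int llc_miss    = 200;   /* stall on a last-level cache miss         */

    printf("L1 miss : stall %3d vs switch %d -> %s\n", l1_miss, switch_cost,
           l1_miss > switch_cost ? "switch threads" : "just wait");
    printf("LLC miss: stall %3d vs switch %d -> %s\n", llc_miss, switch_cost,
           llc_miss > switch_cost ? "switch threads" : "just wait");
    return 0;
}
```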

Page 6:

Context Switch

• What happens on a context switch? (A minimal sketch of the saved state follows this list.)
– Transfer of register state

– Transfer of PC

– Draining of the pipeline

• Additionally:
– Warm up caches

– Warm up branch predictors
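
A minimal sketch (editor-added; the structure layout and the 32-register machine are assumptions) of the state the switch actually transfers, and of what it cannot transfer:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the architectural state a context switch
 * transfers: the register file and the PC.  The pipeline is drained
 * first so no in-flight instruction belongs to the outgoing thread.
 * Caches and branch predictors are NOT part of this state: the incoming
 * thread re-warms them, and that warm-up is part of the switch cost. */
typedef struct {
    uint64_t regs[32];   /* integer register file snapshot */
    uint64_t pc;         /* where the thread resumes       */
} thread_context;

/* Save the outgoing thread's registers/PC, then load the incoming one's. */
void context_switch(thread_context *out, const thread_context *in,
                    uint64_t live_regs[32], uint64_t *live_pc) {
    memcpy(out->regs, live_regs, sizeof out->regs);
    out->pc = *live_pc;
    memcpy(live_regs, in->regs, sizeof out->regs);
    *live_pc = in->pc;
}

int main(void) {
    uint64_t live_regs[32] = {0};
    uint64_t live_pc = 0x1000;
    thread_context saved = {{0}, 0};
    thread_context incoming = {{0}, 0x2000};
    context_switch(&saved, &incoming, live_regs, &live_pc);
    printf("old thread saved at pc=0x%llx, resuming at pc=0x%llx\n",
           (unsigned long long)saved.pc, (unsigned long long)live_pc);
    return 0;
}
```

The copies themselves are cheap; the real price is the drained pipeline plus the cold caches and branch predictor that the incoming thread inherits.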

Page 7:

Multithreading

[Figure: issue-slot diagrams showing instruction issue over time across the machine's issue width for Coarse Grain, Fine Grain, and SMT]

Page 8:

Simultaneous Multithreading

1. More functional units

2. Larger instruction queue

3. Larger reorder buffer

4. Means to differentiate between threads in the instruction queue, register rename, and reorder buffer

5. Ability to fetch from multiple programs

Selection Required Resources

A 1, 2, 3, 4, 5

B 1, 3, 5

C 1, 4, 5

D 4, 5

E None of the above

Given a modern out-of-order processor with register renaming, instruction queue, reorder buffer, etc. – what is REQUIRED to perform simultaneous multithreading?

The point is that if you can just fetch from multiple streams, the processor is usually over-provisioned anyway (a sketch of the thread-tagging and multi-stream fetch follows).
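
An editor's sketch of what options 4 and 5 amount to in hardware terms (the field names and the round-robin fetch policy are hypothetical): shared queues just tag each entry with a thread ID, and the front end alternates which program it fetches from.

```c
#include <stdint.h>
#include <stdio.h>

enum { NUM_SMT_THREADS = 2 };

/* Requirement 4 (sketch): shared out-of-order structures only need each
 * entry tagged with the hardware thread it belongs to; the ALUs, the
 * instruction queue, and the reorder buffer themselves stay shared. */
typedef struct {
    uint8_t  thread_id;      /* which hardware thread owns this entry */
    uint8_t  dest_arch_reg;  /* architectural destination register    */
    uint16_t dest_phys_reg;  /* physical register after renaming      */
    uint8_t  completed;      /* ready to retire?                      */
} rob_entry;

/* Requirement 5 (sketch): fetch from multiple programs.  A simple policy
 * just alternates the fetch PC between threads each cycle. */
typedef struct {
    uint64_t fetch_pc[NUM_SMT_THREADS];
    int      next_thread;
} smt_fetch_state;

uint64_t fetch_next(smt_fetch_state *f, int *thread_out) {
    int t = f->next_thread;
    f->next_thread = (t + 1) % NUM_SMT_THREADS;
    *thread_out = t;
    uint64_t pc = f->fetch_pc[t];
    f->fetch_pc[t] += 4;     /* advance past the fetched instruction
                                (branches ignored in this sketch) */
    return pc;
}

int main(void) {
    smt_fetch_state f = { {0x1000, 0x8000}, 0 };
    for (int cycle = 0; cycle < 4; cycle++) {
        int t;
        uint64_t pc = fetch_next(&f, &t);
        rob_entry e = { (uint8_t)t, 1, 42, 0 };   /* entry tagged by thread */
        printf("cycle %d: thread %d fetched 0x%llx (ROB tag %u)\n",
               cycle, t, (unsigned long long)pc, e.thread_id);
    }
    return 0;
}
```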

Page 9:

Modern OOO Processor

[Figure: block diagram of a modern out-of-order core – Fetch, Decode, Register Rename, Instruction Queue, Reorder Buffer, three integer ALUs, two FP ALUs, Load Queue, Store Queue, and the L1 cache]

Draw in just the added ability to fetch more instructions (from multiple streams).

Page 10:

SMT vs. early multi-core

• The argument was between a single aggressive SMT out-of-order processor and a number of simpler processors.

• At the time, the advantage of the simpler processors was a higher clock rate.

• The disadvantages of the simpler processors were a lack of functional units, in-order execution, smaller caches, etc.

Page 11:

SMT vs. MP

Page 12:

SMT vs. early CMP

• SMT – 4 issue, 4 int ALU, 4 FP ALU

• CMP – 2 cores each 2-issue, 2 int ALU, 2 FP ALUs

• Say you have 4 threads

• Say you have 2 threads – one floating-point-intensive and the other integer-intensive

• Say you have 1 thread

Point out that single-thread performance drives benchmark tests – no one buys a processor that does worse! (A back-of-the-envelope comparison of these scenarios follows.)
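
A back-of-the-envelope sketch (editor-added; the per-thread ILP numbers and the issue-slot model are purely illustrative) of how the three scenarios above play out for the 4-issue SMT core versus the two 2-issue CMP cores:

```c
#include <stdio.h>

/* Illustrative (not measured) peak-issue comparison between
 *  - SMT : one 4-issue core whose resources are shared by all threads, and
 *  - CMP : two 2-issue cores, each running at most one thread.
 * Each thread is modeled only by its inherent ILP (independent
 * instructions it could issue per cycle); all numbers are hypothetical. */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Threads share one wide core: total issue capped by the shared width. */
static int smt_issue(const int ilp[], int nthreads, int width) {
    int total = 0;
    for (int i = 0; i < nthreads; i++) total += ilp[i];
    return min_int(total, width);
}

/* One thread per core: each capped by its core's width; extra cores idle. */
static int cmp_issue(const int ilp[], int nthreads, int ncores, int width) {
    int total = 0, running = min_int(nthreads, ncores);
    for (int i = 0; i < running; i++) total += min_int(ilp[i], width);
    return total;
}

int main(void) {
    int four[] = {2, 2, 2, 2};  /* enough threads: both designs stay busy */
    int two[]  = {3, 1};        /* FP-heavy thread + int-heavy thread     */
    int one[]  = {3};           /* a single thread                        */

    printf("4 threads: SMT %d  CMP %d insts/cycle\n",
           smt_issue(four, 4, 4), cmp_issue(four, 4, 2, 2));
    printf("2 threads: SMT %d  CMP %d insts/cycle\n",
           smt_issue(two, 2, 4), cmp_issue(two, 2, 2, 2));
    printf("1 thread : SMT %d  CMP %d insts/cycle\n",
           smt_issue(one, 1, 4), cmp_issue(one, 1, 2, 2));
    return 0;
}
```

Under these assumptions both designs stay busy with four threads, the SMT core wins when one thread wants more than two issue slots of a resource, and with a single thread the CMP leaves a whole core idle – exactly the single-thread benchmark concern above.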

Page 13:

Multi-core recently

• Instruction queues were taking up about 20% of core area for a 4-issue machine; how complex would they be for 8-issue?

• Simpler hardware does not necessarily mean a faster clock rate.

• Tons of die space was available.

• Larger caches weren’t helping performance that much

• Why not just replicate a single advanced processor (core)?

Page 14:

SMT vs. CMP - Revised

• SMT – 4 issue, 4 int ALU, 4 FP ALU

• CMP – 2 cores each 4-issue, 4 int ALU, 4 FP ALUs

• Say you have 4 threads

• Say you have 2 threads – one floating-point-intensive and the other integer-intensive.

• Say you have 1 thread….

Page 15:

Multi-core Today

• 4-8 cores per chip. “Multi-core Era”

• Throughput scales well with the number of cores.

• Each core is frequently SMT as well (for more throughput)

• Great when you have 4-8 threads (most of us have a fair number at any given time)

• What to do when we get 128 cores (“Many core era”)??

Page 16:

Multithreading Key Points

• Simultaneous Multithreading
– Inexpensive addition to increase throughput for multiple threads

– Enables good throughput for multiple threads

– Does not impact single thread performance

• Single Chip Multiprocessors
– ILP wall / memory wall / power wall all point to multi-core

– Enables excellent throughput for multiple threads

• Where do we find all these threads?
– The “field of dreams” argument: build the cores and the threads will come

