Page 1: Parallelism in Processors: Several Approaches

Copyright © 2005-2011 Curt Hill

Page 2: Why Parallelism?

• The simple fact is that there is never enough processor speed
• Performance gains come from two areas:
  – Better integration technology
  – Better implementation of parallelism
• The next two graphics show this

Page 3: Chip Performance

Page 4: Gains From Parallelism

Page 5: Summary

• The bulk of the gains have come from faster and smaller components
• A significant amount has come from parallelism
• The parallelism has also offset the greater complexity of the instruction set

Page 6: Approaches

• Instruction-level parallelism
  – Instructions operate in parallel
  – Pipelining
• Data parallelism
  – Vector processors
• Processor-level parallelism
  – Multiple CPUs

Page 7: First Attempt

• One bottleneck is that fetching instructions from memory is slow
• The processor is usually an order of magnitude faster
• Usually faster than the cache as well
• Therefore, have a fetch engine that gets instructions all the time
• This is the prefetch buffer

Page 8: Prefetch Buffer

• Don't wait for the current instruction to finish
  – Fetch the next instruction as soon as the current instruction arrives
• This scheme can make a mistake, since a goto or branch makes the next instruction difficult to guess
• You may also fetch in both directions and discard the unused one
  – These are stored in the prefetch buffer
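
A minimal sketch of the idea in Python (the instruction list, the class, and the flush-on-branch interface are illustrative assumptions, not the actual hardware design):

```python
from collections import deque

class Prefetcher:
    """Toy prefetch buffer: fetch ahead of execution, flush on a taken branch."""
    def __init__(self, memory):
        self.memory = memory     # list of instructions
        self.pc = 0              # next address to fetch
        self.buffer = deque()    # the prefetch buffer

    def fill(self, depth=4):
        # Fetch sequentially ahead until the buffer holds `depth` instructions
        while len(self.buffer) < depth and self.pc < len(self.memory):
            self.buffer.append((self.pc, self.memory[self.pc]))
            self.pc += 1

    def next_instruction(self, taken_branch_target=None):
        if taken_branch_target is not None:
            # Guessed wrong: discard everything fetched down the fall-through path
            self.buffer.clear()
            self.pc = taken_branch_target
        self.fill()
        return self.buffer.popleft() if self.buffer else None

prog = ["load", "add", "branch", "store", "halt"]
pf = Prefetcher(prog)
print(pf.next_instruction())                       # (0, 'load')
print(pf.next_instruction())                       # (1, 'add')
print(pf.next_instruction())                       # (2, 'branch')
print(pf.next_instruction(taken_branch_target=4))  # (4, 'halt'), after a flush
```

The last call models the misprediction cost: everything fetched past the branch is thrown away and fetching restarts at the branch target.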

Page 9: Two Stages

• Now we have two independent pieces
• The instruction fetch mechanism
  – Using the prefetch buffer
• The instruction execute mechanism
  – This is where most of the work is done
• This generalizes into a pipeline of several stages

Page 10: Pipelines

• Each of the following is a stage:
  – Fetch the instruction
  – Decode the instruction
  – Locate and fetch operands
  – Execute the operation
  – Write the results back
• These may belong to separate hardware chunks that operate in parallel

Page 11: Example

• All of this goes on in parallel:
  – Fetch instruction 8
  – Decode instruction 7
  – Fetch operands for instruction 6
  – Execute instruction 5
  – Write back data for instruction 4
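
The overlap on this slide can be computed directly: in cycle t, stage s holds instruction t - s. A short sketch (the cycle/instruction numbering scheme is an assumption chosen to reproduce the slide's example):

```python
# Five-stage pipeline from the previous slide
STAGES = ["fetch", "decode", "operands", "execute", "writeback"]

def pipeline_snapshot(cycle):
    """Which instruction each stage is working on in a given cycle."""
    return {stage: cycle - s for s, stage in enumerate(STAGES) if cycle - s >= 1}

# In cycle 8, instruction 8 is fetched while instruction 4 writes back
print(pipeline_snapshot(8))
# {'fetch': 8, 'decode': 7, 'operands': 6, 'execute': 5, 'writeback': 4}
```

Once the pipeline is full, one instruction completes every cycle even though each instruction still takes five cycles end to end.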

Page 12: A Simulator

Page 13: Superscalar Architectures

• Have a single fetcher drive two different lines, each of which consists of these stages
• Decode through write-back occurs in parallel on two or more separate lines
• This is the Pentium approach
• The main pipeline can handle anything
• The second pipeline can handle integer operations or simple floating point operations
  – Simple, such as load/store for the floating point processor
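
A sketch of this dual-issue pairing (the opcode names and the set of "simple" operations are illustrative, not the real Pentium pairing rules): each cycle the first instruction goes down the main pipeline, and the next one joins it only if the restricted second pipeline can handle it.

```python
SIMPLE_OPS = {"add", "sub", "mov", "fld", "fst"}   # illustrative set

def issue(instructions):
    """Group instructions into per-cycle issue pairs for two pipelines."""
    cycles, i = [], 0
    while i < len(instructions):
        first = instructions[i]          # main pipeline takes anything
        if i + 1 < len(instructions) and instructions[i + 1] in SIMPLE_OPS:
            cycles.append((first, instructions[i + 1]))  # second pipe pairs
            i += 2
        else:
            cycles.append((first,))      # second pipe sits idle this cycle
            i += 1
    return cycles

print(issue(["fdiv", "add", "mul", "fsqrt", "mov", "sub"]))
# [('fdiv', 'add'), ('mul',), ('fsqrt', 'mov'), ('sub',)]
```

Six instructions issue in four cycles instead of six; the speedup depends on how often a "simple" instruction happens to follow.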

Page 14: CDC 6600

• Only the execute step is parallel
• This only works well if the execute step takes longer than the other steps
• This is particularly true for floating point and memory access instructions
• The 6600 had multiple I/O and floating point processors that could execute in parallel
  – This was one of Seymour Cray's CDC machines of the 60s

Page 15: Problems?

• Pipelining needs some instruction independence to work optimally
• If instructions A, B, C are consecutive, B depends on the result of A, and C depends on the result of B, we may have a problem with either approach
• The operand fetch of B cannot complete until the write-back of A, stalling the whole line
• However, the average mix of instructions tends not to have these hard dependencies in every instruction
• Compilers can also optimize by reordering the generated instructions to separate dependent ones
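
The stall described above can be counted with a toy model (the three-cycle penalty and the no-forwarding assumption are simplifications; real pipelines forward results to shrink or hide these stalls):

```python
# Instructions are (dest, src1, src2) register tuples.
def count_stalls(instrs, distance=3):
    """Stall cycles needed when an instruction reads a register written
    by one of the previous `distance` instructions (no forwarding)."""
    stalls = 0
    for i, (_, s1, s2) in enumerate(instrs):
        for back in range(1, distance + 1):
            if i - back >= 0 and instrs[i - back][0] in (s1, s2):
                stalls += distance - back + 1   # closer producer = longer wait
                break
    return stalls

# B reads A's result and C reads B's result: a fully dependent chain
chain = [("r1", "r8", "r9"), ("r2", "r1", "r9"), ("r3", "r2", "r9")]
print(count_stalls(chain))   # 6

# Independent instructions flow through with no stalls
indep = [("r1", "r8", "r9"), ("r2", "r7", "r6"), ("r3", "r5", "r4")]
print(count_stalls(indep))   # 0
```

This is also what the compiler reordering on this slide exploits: interleaving independent instructions between producer and consumer reduces `back`-adjacent dependencies and therefore the stall count.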

Page 16: Problem Example

Page 17: Limits on Instruction Level Parallelism

• There is a limit on the gains
• The more stages, the less likely the instruction sequence will be suitable
• The more stages, the more expensive the recovery from a mistake
• Dividing instruction processing past 10-20 stages leaves too little work for each stage
• The more complicated the processor, the more heat it generates

Page 18: Chip Power Consumption

Page 19: Operating System Parallelism

• Next we consider the types of parallel processing enabled by the OS
• This usually involves multiple processes and threads
• Several flavors:
  – Uniprocessing
  – Hyperthreading
  – Multiprocessing

Page 20: Uniprocessing

• A single CPU, but apparently multiple tasks
• Permissive (also called cooperative)
  – Any system call allows the current task to be suspended and another started
  – Example: Windows 3
• Preemptive
  – A task is suspended when it makes a system call that could require waiting
  – A time slice expires
• Scalar, array, and vector processors

Page 21: Multiple Processors: Multiprocessing

• Real multiprocessing involves multiple CPUs
• Multiple CPUs can be executing different jobs
• They may also be in the same job, if it allows
• The CPUs are almost completely independent
  – They may share memory, disk, or both

Page 22: Multiprocessors

• Two or more CPUs with shared memory
• Multiprocessors generally need both hardware and OS support
• This technique has been used since the 60s
• The idea is that two CPUs can outperform one
• It will become even more important

Page 23: Half Way: HyperThreading

• Hyper-Threading CPUs are a transitional form
• There is one CPU with two register sets
• The CPU alternates between register sets during execution, giving better concurrency than a uniprocessor
• Windows XP considers it two CPUs

Page 24: Multi-Tasking Operating System

• There are multiple processes
• Each has its own memory
• In a single-CPU system a process executes until:
  – It is waiting for I/O
  – It has used its time slice
  – Something with higher priority is now ready
• When a process is suspended, a queue of processes waiting to execute is examined; the first is chosen and executed
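
The dispatch rule above can be sketched in a few lines (the process names and the single FIFO ready queue are illustrative assumptions; real schedulers use priority queues and keep I/O-blocked processes off the ready queue):

```python
from collections import deque

ready = deque(["editor", "compiler", "mailer"])
running = "browser"

def dispatch():
    """Suspend the running process and start the first queued one."""
    global running
    ready.append(running)       # suspended process rejoins the queue
    running = ready.popleft()   # first waiting process is chosen
    return running

print(dispatch())   # editor   (browser suspended, e.g. time slice used)
print(dispatch())   # compiler
```

Each suspension event triggers one dispatch, so CPU time rotates through the queue even though only one process runs at any instant.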

Page 25: Multiple CPUs

• Updating this to multiple CPUs mostly requires that both CPUs cannot be running in the dispatcher at the same time
• This requires some type of exclusive instruction, and the dispatcher must use it
• Windows 95 and DOS cannot do this
• Windows NT, OS/2, and UNIX allow it

Page 26: MPU Loss

• Because one CPU must sometimes lock out the other, two CPUs never perform at the level of a single CPU that is twice as fast
  – 90% efficiency seems to be average
  – Thus an MPU with two 1 GHz processors performs similarly to a 1.8 GHz uniprocessor
• More than two CPUs yields more loss
• Most servers are duals or more
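
The 90% figure above as arithmetic (the flat per-CPU efficiency model is a simplification; real losses grow with CPU count):

```python
def effective_speed_ghz(n_cpus, clock_ghz, efficiency=0.9):
    """Equivalent uniprocessor speed for n CPUs at a given clock."""
    return n_cpus * clock_ghz * efficiency

print(effective_speed_ghz(2, 1.0))   # 1.8  -> two 1 GHz CPUs ~ a 1.8 GHz CPU
```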

Page 27: Multiprocessors Again

• Before the Pentium, a multiprocessor needed extra hardware to prevent the CPUs from performing a race error of some sort
• The Pentium could share four pins, and that was all the hardware support that was needed
• The next advance was multicore chips

Page 28: Multicore Chips

• Instead of one very fast CPU on a chip, put two not-so-fast CPUs
• These are the multicore chips
• They actually remove some of the complexity of pipelining to make each core smaller, and also use a slower, cooler technology

Page 29: Manufacturers' Offerings

• Intel's HyperThreading chips were a transitional form
• AMD and Intel dual-core processors became available in 2005
• Sun had a 4-core SPARC to be released in 2005-2006
• Microsoft changed its license to be per chip, so that a multi-core chip is considered one processor

Page 30: Disadvantages

• The bus to the memory becomes the bottleneck
• Several things access the memory independently: two or more CPUs, plus Direct Memory Access controllers (disk controllers, video)
• One solution is dual-ported memory
• Separate caches can also help
• Another solution is to give each processor its own local, private memory, but this diminishes the kind of sharing that can go on

Page 31: Chip MultiProcessors

Page 32: Multicomputers

• When the number of connections gets large, sharing memory gets hard
• A multicomputer consists of many parallel processors, each with its own memory and disk
• Communication is then accomplished by messages sent from one to all, or from one to another
• Grid computing is one alternative
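
The one-to-one and one-to-all messaging above can be sketched with per-node inboxes (the node count and message format are illustrative; a real multicomputer would use a network and a library such as MPI):

```python
from queue import Queue

N_NODES = 4
inboxes = [Queue() for _ in range(N_NODES)]   # one mailbox per node

def send(dst, msg):                 # one-to-another
    inboxes[dst].put(msg)

def broadcast(src, msg):            # one-to-all
    for dst in range(N_NODES):
        if dst != src:
            inboxes[dst].put((src, msg))

broadcast(0, "start phase 1")
send(2, (0, "partial result"))

print(inboxes[2].get())   # (0, 'start phase 1')
print(inboxes[2].get())   # (0, 'partial result')
print(inboxes[1].get())   # (0, 'start phase 1')
```

Because nodes share nothing, all coordination must travel through these messages; that is the price paid for avoiding the shared-memory bottleneck of the previous slides.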

Page 33: Conclusion

• Moore's Law has not been about just better integration techniques
• Parallelism in the single CPU and in multiple CPUs has also contributed
• Pipelining has been the major technique for single CPUs
• There are other presentations on multicomputer and multiprocessor systems

