
Architectureapm/REDAI/docs/redai02v.pdfDirectory-based Protocol •Distributed directory contains...

Date post: 13-Mar-2020
Parallel Computing Architecture
Transcript
Page 1: Architectureapm/REDAI/docs/redai02v.pdfDirectory-based Protocol •Distributed directory contains information about cacheable memory blocks •One directory entry for each cache block

Parallel Computing

Architecture


2010@FEUP Architecture 2

Interconnection Networks

• Uses of interconnection networks

  • Connect processors to shared memory

  • Connect processors to each other

• Interconnection media types

  • Shared medium

  • Switched medium


Parallel Computers

• Vector Computers

• Multiple CPUs

• Instructions include direct vector operations

• Pipelined: data streams through vector arithmetic units (CRAY)

• Processor array: processors execute the same instruction

• Multiprocessors

• Multiple CPUs with shared memory

• Multicomputers

• Multiple CPUs with distributed memory


Processor Array

• Well suited only to data-parallel problems
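A small sketch may help show why a processor array favours data-parallel work. It is a toy model under stated assumptions, not a description of any real machine: the helper names `broadcast` and `masked_broadcast` are hypothetical, and lockstep execution is emulated with an ordinary Python loop.

```python
from typing import Callable, List

def broadcast(instruction: Callable[[float], float],
              local_data: List[float]) -> List[float]:
    """Every processing element applies the same instruction to its own datum."""
    return [instruction(x) for x in local_data]

def masked_broadcast(instruction: Callable[[float], float],
                     local_data: List[float],
                     active: List[bool]) -> List[float]:
    """Elements whose mask bit is off sit idle for this instruction."""
    return [instruction(x) if on else x for x, on in zip(local_data, active)]

pe_data = [1.0, 2.0, 3.0, 4.0]

# Data-parallel case: one broadcast keeps every element busy.
doubled = broadcast(lambda x: 2.0 * x, pe_data)

# Branching case: each branch needs its own broadcast, and part of the
# array idles during each one.
is_big = [x > 2.0 for x in pe_data]
step1 = masked_broadcast(lambda x: x - 1.0, pe_data, is_big)
step2 = masked_broadcast(lambda x: x + 10.0, step1, [not b for b in is_big])
```

In this model the branching computation takes two broadcast steps with half the elements idle in each, which is one way to read the "data parallel only" limitation.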


Multiprocessors

• Shared memory

• Can be built with commodity components

• Centralized

• Extension of a uniprocessor

• Add CPUs to a bus

• Same memory access time

• UMA – Uniform memory access

• Also known as SMP (symmetric multiprocessor)

• Distributed

• Memory distributed among processors

• NUMA – Non-uniform memory access

• Allows greater numbers of processors


Centralized multiprocessors

• Problem: Cache coherence

• Write invalidate protocol


Write Invalidate Protocol

Most common solution to cache coherency

1. Each CPU’s cache controller monitors (snoops) the bus and identifies which cache blocks are requested by other CPUs.

2. A processor gains exclusive control of a data item before performing a write.

3. Before the write occurs, all copies of the data item cached by other processors are invalidated.

4. When any other CPU tries to read a memory location from an invalidated cache block,

• a cache miss occurs

• it has to retrieve the updated data from memory
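The four steps above can be sketched as a toy simulation. This is a minimal illustration, not the hardware mechanism: the class names `Bus` and `Cache` are hypothetical, a cache line is just a dictionary entry, and updating memory on every write is a simplifying assumption.

```python
class Bus:
    """Shared bus: main memory plus every cache that snoops it."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def broadcast_invalidate(self, addr, writer):
        # Step 3: every other cache drops its copy before the write happens.
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_invalidate(addr)


class Cache:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.lines = {}      # addr -> value, valid copies only
        self.misses = 0
        bus.caches.append(self)

    def snoop_invalidate(self, addr):
        # Step 1: the controller watches the bus and reacts to other CPUs.
        self.lines.pop(addr, None)

    def read(self, addr):
        if addr not in self.lines:
            # Step 4: an invalidated (or never loaded) block is a miss.
            self.misses += 1
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Step 2: gain exclusive control, then write.
        self.bus.broadcast_invalidate(addr, writer=self)
        self.lines[addr] = value
        self.bus.memory[addr] = value   # assumption: memory updated at once


bus = Bus()
bus.memory["X"] = 7
cpu_a, cpu_b = Cache("A", bus), Cache("B", bus)
cpu_a.read("X")
cpu_b.read("X")                  # both caches now hold 7
cpu_b.write("X", 2)              # A's copy is invalidated first
a_has_copy = "X" in cpu_a.lines
a_value = cpu_a.read("X")        # miss, then refetch from memory
```

After the write, CPU A no longer holds X, so its next read misses and fetches the new value instead of the stale 7.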

Cache-coherence

[Figure: CPU A and CPU B, each with its own cache, share one memory; memory holds X = 7.]

Cache-coherence

[Figure: memory holds X = 7; one cache now holds a copy with value 7.]

Read from memory is not a problem.

Cache-coherence

[Figure: memory holds X = 7; both caches hold 7.]

Cache-coherence

[Figure: memory holds X = 2; one cache holds 2 while the other still holds 7.]

Write to memory is a problem.

Cache-coherence

[Figure: memory holds X = 7; both caches hold 7.]

A cache controller snoops the bus to see which cache block is being requested by other processors.

Cache-coherence

[Figure: memory holds X = 7; both caches hold 7; bus message: Intent to write X.]

Before a write can occur, all copies of data at that address are declared invalid.

Cache-coherence

[Figure: memory holds X = 7; only one cached copy (7) remains; bus message: Intent to write X.]

Cache-coherence

[Figure: memory holds X = 2; the only cached copy holds 2.]

When another processor tries to read from this location in its cache, it takes a cache miss and has to fetch the block from main memory.


Distributed Multiprocessors

• Increase local memory bandwidth and lower average memory access time

• All of the memory shares a single address space


Cache Coherence

• Implementation more difficult

• No shared memory bus to “snoop”

• Directory-based protocol needed

• Some NUMA multiprocessors do not support it in hardware

• Only instructions and private data are cached

• Large memory access time variance


Directory-based Protocol

• Distributed directory contains information about cacheable memory blocks

• One directory entry for each cache block

• Each entry has

• Sharing status

• Which processors have copies

• Sharing status

• Uncached -- (denoted by “U”)

• Block not in any processor’s cache

• Shared – (denoted by “S”)

• Cached by one or more processors, read only

• Exclusive – (denoted by “E”)

• Cached by exactly one processor, write access
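One possible way to picture a directory entry is as a small record holding the sharing status and a per-processor bit vector. The sketch below is illustrative only: `State`, `DirectoryEntry`, and `is_consistent` are hypothetical names, and a real directory stores these fields in hardware.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class State(Enum):
    UNCACHED = "U"    # block not in any processor's cache
    SHARED = "S"      # cached by one or more processors, read only
    EXCLUSIVE = "E"   # cached by exactly one processor, write access


@dataclass
class DirectoryEntry:
    num_cpus: int
    state: State = State.UNCACHED
    sharers: List[int] = field(default_factory=list)  # one bit per processor

    def __post_init__(self):
        if not self.sharers:
            self.sharers = [0] * self.num_cpus

    def is_consistent(self) -> bool:
        """Check that the bit vector agrees with the sharing status."""
        copies = sum(self.sharers)
        if self.state is State.UNCACHED:
            return copies == 0
        if self.state is State.SHARED:
            return copies >= 1
        return copies == 1

    def __str__(self) -> str:
        return f"{self.state.value} {' '.join(str(b) for b in self.sharers)}"


entry = DirectoryEntry(num_cpus=3)
initial = str(entry)

entry.state = State.SHARED
entry.sharers[0] = 1
after_read = str(entry)
```

The string form follows the "status plus bit vector" notation, so an untouched block on three processors prints as `U 0 0 0`.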


Directory-based Protocol

[Figure: three nodes (CPU 0, CPU 1, CPU 2), each with its own cache, local memory, and directory, connected by an interconnection network.]

Directory-based Protocol

[Figure: CPU 0, CPU 1, and CPU 2 on an interconnection network, with rows labelled Cache, Mem, and Dir. Memory holds X = 7; directory entry for X: U, bit vector 0 0 0; no cache holds X.]

CPU0 reads X

[Figure: message: Read Miss. Directory entry for X: U, bit vector 0 0 0; memory holds X = 7.]

CPU0 reads X

[Figure: directory entry for X: S, bit vector 1 0 0; memory holds X = 7; CPU 0's cache holds X = 7.]

CPU2 reads X

[Figure: message: Read Miss. Directory entry for X: S, bit vector 1 0 0; memory holds X = 7; CPU 0's cache holds X = 7.]

CPU2 reads X

[Figure: directory entry for X: S, bit vector 1 0 1; memory holds X = 7; the caches of CPU 0 and CPU 2 hold X = 7.]

CPU0 Writes 6 to X

[Figure: message: Write Miss. Directory entry for X: S, bit vector 1 0 1; memory holds X = 7; the caches of CPU 0 and CPU 2 hold X = 7.]

CPU0 Writes 6 to X

[Figure: message: Invalidate. Directory entry for X: S, bit vector 1 0 1; memory holds X = 7; the caches of CPU 0 and CPU 2 hold X = 7.]

CPU1 Reads X

[Figure: message: Read Miss. Directory entry for X: E, bit vector 1 0 0; memory holds X = 7; CPU 0's cache holds X = 6.]

CPU1 Reads X

[Figure: message: Switch to Shared. Directory entry for X: S, bit vector 1 0 0; memory holds X = 6; CPU 0's cache holds X = 6.]

CPU1 Reads X

[Figure: directory entry for X: S, bit vector 1 1 0; memory holds X = 6; the caches of CPU 0 and CPU 1 hold X = 6.]

CPU2 Writes 5 to X

[Figure: message: Write Miss. Directory entry for X: S, bit vector 1 1 0; memory holds X = 6; the caches of CPU 0 and CPU 1 hold X = 6.]

CPU2 Writes 5 to X

[Figure: message: Invalidate. Directory entry for X: S, bit vector 1 1 0; memory holds X = 6; the caches of CPU 0 and CPU 1 hold X = 6.]

CPU2 Writes 5 to X

[Figure: directory entry for X: E, bit vector 0 0 1; memory holds X = 6; CPU 2's cache holds X = 5.]

CPU0 Writes 4 to X

[Figure: message: Write Miss. Directory entry for X: E, bit vector 0 0 1; memory holds X = 6; CPU 2's cache holds X = 5.]

CPU0 Writes 4 to X

[Figure: message: Make shared. Directory entry for X: S, bit vector 0 0 1; memory holds X = 5; CPU 2's cache holds X = 5.]

CPU0 Writes 4 to X

[Figure: message: Invalidate. Directory entry for X: U, bit vector 0 0 0; memory holds X = 5; no cache holds X.]

CPU0 Writes 4 to X

[Figure: directory entry for X: S, bit vector 1 0 0; memory holds X = 5; CPU 0's cache holds X = 5. Note: creates cache block storage for X.]

CPU0 Writes 4 to X

[Figure: message: Write X. Directory entry for X: E, bit vector 1 0 0; X holds 4 in CPU 0's cache and 5 in memory.]

CPU0 Writes Back X Block

[Figure: message: Data Write Back. Directory entry for X: S, bit vector 1 0 0; memory holds X = 4; CPU 0's cache holds X = 4.]

CPU0 flushes cache block X

[Figure: message: Data Write Back. Directory entry for X: U, bit vector 0 0 0; memory holds X = 4; no cache holds X.]
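The walkthrough above can be replayed with a small simulation of a single block X. This is a sketch under simplifying assumptions: `DirectorySim` and its methods are hypothetical names, each operation runs to completion in one call, and the individual messages (Read Miss, Invalidate, and so on) are collapsed into ordinary method calls.

```python
class DirectorySim:
    """Directory entry, memory word, and per-CPU caches for one block X."""

    def __init__(self, num_cpus, initial_value):
        self.memory = initial_value
        self.state = "U"                    # U, S, or E
        self.bits = [0] * num_cpus          # bit vector: who holds a copy
        self.caches = [None] * num_cpus     # None means "no valid copy"

    def entry(self):
        return f"{self.state} {' '.join(str(b) for b in self.bits)}"

    def _recall_from_owner(self):
        # The single exclusive owner sends its modified copy back to memory.
        owner = self.bits.index(1)
        self.memory = self.caches[owner]

    def _invalidate_others(self, cpu):
        for other in range(len(self.bits)):
            if other != cpu:
                self.caches[other] = None
                self.bits[other] = 0

    def read(self, cpu):
        if self.caches[cpu] is None:        # read miss
            if self.state == "E":
                self._recall_from_owner()   # owner keeps a shared copy
            self.caches[cpu] = self.memory
            self.bits[cpu] = 1
            self.state = "S"
        return self.caches[cpu]

    def write(self, cpu, value):
        if self.state == "E" and not self.bits[cpu]:
            self._recall_from_owner()       # write miss on an exclusive block
        self._invalidate_others(cpu)
        self.caches[cpu] = value
        self.bits[cpu] = 1
        self.state = "E"

    def write_back(self, cpu):
        # The owner flushes the block: memory is updated, entry returns to U.
        self.memory = self.caches[cpu]
        self.caches[cpu] = None
        self.bits[cpu] = 0
        self.state = "U"


sim = DirectorySim(num_cpus=3, initial_value=7)
trace = [(sim.entry(), sim.memory)]
steps = [("read", 0, None), ("read", 2, None), ("write", 0, 6),
         ("read", 1, None), ("write", 2, 5), ("write", 0, 4),
         ("write_back", 0, None)]
for op, cpu, value in steps:
    if op == "read":
        sim.read(cpu)
    elif op == "write":
        sim.write(cpu, value)
    else:
        sim.write_back(cpu)
    trace.append((sim.entry(), sim.memory))
```

Each `trace` element pairs the directory entry with the memory value after one operation. Memory lags behind the cached value whenever the block is held exclusively, and catches up only when the owner is recalled or writes the block back.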


Multicomputer

• Distributed memory multiple-CPU computer

• Same address on different processors refers to different physical memory locations

• Processors interact through message passing

• Flavors

• Asymmetrical

• Symmetrical

• Mixed
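The two defining properties listed above, private address spaces and message passing, can be illustrated with a toy model. It is only a sketch: `Node` is a hypothetical class, both "computers" live in one Python process, and a `queue.Queue` stands in for the network.

```python
from queue import Queue


class Node:
    """One computer of a multicomputer: private memory plus an inbox."""

    def __init__(self, rank):
        self.rank = rank
        self.memory = {}       # private address space, invisible to others
        self.inbox = Queue()

    def send(self, dest, payload):
        # The only way to share data is to put a message on the network.
        dest.inbox.put((self.rank, payload))

    def receive(self):
        return self.inbox.get()


node0, node1 = Node(0), Node(1)

# The same address refers to different physical locations on each node.
node0.memory[0x10] = "alpha"
node1.memory[0x10] = "beta"

# node1 cannot load node0's address 0x10; the value must be sent explicitly.
node0.send(node1, node0.memory[0x10])
sender, value = node1.receive()
node1.memory[0x20] = value
```

Writing to address `0x10` on one node leaves the other node's `0x10` untouched, which is the contrast with the shared address space of a multiprocessor.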


Asymmetrical Multicomputer

• Back-end dedicated to parallel operations

• Single front-end computer can limit scalability of system

• Every application requires development of both front-end and back-end programs


Symmetrical Multicomputer

• Every processor executes same program

• No simple way to balance program development workload among processors

• More difficult to achieve high performance with several processes on each processor


Mixed Cluster Multicomputer

• Co-located computers

• Dedicated to running parallel jobs

• Identical operating system

• Identical local disk images


Flynn’s Taxonomy

• Instruction stream

• Data stream

• Single vs. multiple

• Four combinations

• SISD

• SIMD

• MISD

• MIMD


Flynn’s Taxonomy

• SISD

• Single Instruction, Single Data

• Single-CPU systems

• Note: co-processors don’t count

• Can execute multiple functions

• Multiple I/O

• Example: PCs

• SIMD

• Single Instruction, Multiple Data

• Two architectures fit this category

• Pipelined vector processor

• Processor array


Flynn’s Taxonomy

• MISD

• Multiple Instruction, Single Data

• Example: systolic array

• MIMD

• Multiple Instruction, Multiple Data

• Multiple-CPU computers

• Multiprocessors

• Multicomputers
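Since the taxonomy depends only on whether there is one or more than one instruction stream and data stream, it can be written as a two-way lookup. The function name `flynn_class` and the example stream counts are illustrative assumptions.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return SISD, SIMD, MISD, or MIMD from the two stream counts."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("each stream count must be at least 1")
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"


single_cpu = flynn_class(1, 1)        # e.g. a single-CPU system
processor_array = flynn_class(1, 64)  # one instruction, many data elements
systolic = flynn_class(4, 1)          # several operations on one data stream
multicomputer = flynn_class(8, 8)     # independent CPUs with their own data
```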


Systolic Array

• Multiple interconnected processing elements

• Example: Sorting element

[Figure: sorting element. Input phase (1 clock), 3 inputs: a, b, c. Output phase (1 clock), 3 outputs: min(a, b, c), med(a, b, c), max(a, b, c).]
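The behaviour of one sorting element can be sketched as a function from three inputs to three ordered outputs. The three compare-exchange steps are one possible realisation, chosen here for illustration; the slide does not say how the element is built internally.

```python
def sorting_element(a, b, c):
    """Return (min, med, max) of three inputs using three compare-exchanges."""
    if a > b:
        a, b = b, a
    if b > c:
        b, c = c, b      # c is now the maximum
    if a > b:
        a, b = b, a      # a is now the minimum, b the median
    return a, b, c


lo, med, hi = sorting_element(8, 4, 7)
```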


A priority queue in a systolic array

• One insertion, 2 extractions

[Figure: three panels. Inserting 7: a queue holding 4, 5, 8 becomes 4, 5, 7, 8. First extraction: 4 leaves the array. Second extraction: 5 leaves the array, and 7, 8 remain. ∞ and -∞ appear as filler values in the cells.]
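The insertion and the two extractions can be emulated with a linear array of cells. This is a sequential sketch, not a cycle-accurate systolic design: the class name `SystolicPriorityQueue` is hypothetical, the ripple that hardware would pipeline across clock ticks is done in one loop, and only ∞ is used as a filler (the -∞ values in the figure are not modelled).

```python
import math


class SystolicPriorityQueue:
    """Linear array of cells; the smallest value always sits in cell 0."""

    def __init__(self, num_cells):
        self.cells = [math.inf] * num_cells   # inf marks an empty cell

    def insert(self, value):
        if self.cells[-1] != math.inf:
            raise OverflowError("all cells are occupied")
        carry = value
        for i in range(len(self.cells)):
            # Each cell keeps the smaller value and passes the larger right.
            if carry < self.cells[i]:
                self.cells[i], carry = carry, self.cells[i]

    def extract_min(self):
        smallest = self.cells[0]
        if smallest == math.inf:
            raise IndexError("queue is empty")
        # Every remaining value shifts one cell to the left.
        self.cells = self.cells[1:] + [math.inf]
        return smallest


pq = SystolicPriorityQueue(num_cells=6)
for v in (4, 5, 8):
    pq.insert(v)

pq.insert(7)
after_insert = pq.cells[:4]
first = pq.extract_min()
second = pq.extract_min()
remaining = pq.cells[:2]
```

Replaying the figure's sequence, inserting 7 places it between 5 and 8, and the two extractions return the two smallest values in order.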

