Session 26: Parallel Processing 2
Course: H0344/Organisasi dan Arsitektur Komputer (Computer Organization and Architecture)
Year: 2005
Version: 1/1
Learning Outcomes
By the end of this session, students are expected to be able to:
• Explain the working principles of parallel processing
Outline
• Multiple Processor Organization
• Symmetric Multiprocessors
• Cache Coherence and the MESI Protocol
• Clusters
• Non-uniform Memory Access
• Vector Computation
Cache Coherence and the MESI Protocol
The cache coherence problem: multiple copies of the same data can exist in different caches simultaneously, and if processors are allowed to update their own copies freely, an inconsistent view of memory can result.
Solutions:
• Software solutions
• Hardware solutions (cache coherence protocols):
  • Directory protocols
  • Snoopy protocols
Cache Coherence and the MESI Protocol
MESI cache line states:

                               M (Modified)     E (Exclusive)    S (Shared)          I (Invalid)
This cache line valid?         Yes              Yes              Yes                 No
The memory copy is ...         Out of date      Valid            Valid               -
Copies exist in other caches?  No               No               Maybe               Maybe
A write to this line ...       Does not go      Does not go      Goes to bus and     Goes directly
                               to bus           to bus           updates cache       to bus
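The table above can be expressed as a small transition function. The following Python sketch (illustrative, not part of the original slides) uses the standard MESI event abbreviations that also appear in the state diagram on the next slide:

```python
# A minimal sketch of the MESI state machine summarized in the table above.
# Processor-side events: RH (read hit), RMS/RME (read miss with/without a
# copy in another cache), WH (write hit), WM (write miss).
# Bus-snooping events: SHR (snoop hit on read), SHW (snoop hit on write).

PROCESSOR_TRANSITIONS = {
    ("I", "RMS"): "S",  # read miss; another cache supplies a shared copy
    ("I", "RME"): "E",  # read miss; no other cache holds the line
    ("I", "WM"): "M",   # write miss; line fetched for exclusive modification
    ("S", "RH"): "S",
    ("S", "WH"): "M",   # write hit on a shared line; other copies invalidated
    ("E", "RH"): "E",
    ("E", "WH"): "M",   # silent upgrade: no bus traffic needed
    ("M", "RH"): "M",
    ("M", "WH"): "M",
}

SNOOP_TRANSITIONS = {
    ("M", "SHR"): "S",  # dirty line observed by a reader: write back, then share
    ("E", "SHR"): "S",
    ("S", "SHR"): "S",
    ("M", "SHW"): "I",  # another cache writes the line: invalidate our copy
    ("E", "SHW"): "I",
    ("S", "SHW"): "I",
}

def next_state(state, event):
    """Next MESI state of a cache line, given a processor or snoop event."""
    if (state, event) in PROCESSOR_TRANSITIONS:
        return PROCESSOR_TRANSITIONS[(state, event)]
    return SNOOP_TRANSITIONS[(state, event)]
```

For example, a write hit on a Shared line moves it to Modified (after invalidating the other copies), while a snooped write invalidates the local copy.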
Cache Coherence and the MESI Protocol
MESI state transition diagram (states Invalid, Shared, Exclusive, Modified):
(a) Line in cache at initiating processor: transitions labeled RH (read hit), RMS (read miss, shared), RME (read miss, exclusive), WH (write hit), and WM (write miss).
(b) Line in snooping cache: transitions labeled SHR (snoop hit on read) and SHW (snoop hit on write).
Clusters
Four benefits that can be achieved with clustering:
• Absolute scalability
• Incremental scalability
• High availability
• Superior price/performance
Clusters
Cluster configurations (each server has processors P, memory M, and I/O modules, with the servers joined by a high-speed message link):
(a) Standby server with no shared disk.
(b) Shared disk: in addition to the message link, both servers are cabled to a shared RAID disk subsystem.
Clustering methods: benefits and limitations

Passive standby
  Description: A secondary server takes over in case of primary server failure.
  Benefits: Easy to implement.
  Limitations: High cost, because the secondary server is unavailable for other processing tasks.

Active secondary
  Description: The secondary server is also used for processing tasks.
  Benefits: Reduced cost, because secondary servers can be used for processing.
  Limitations: Increased complexity.

Separate servers
  Description: Separate servers have their own disks. Data are continuously copied from the primary to the secondary server.
  Benefits: High availability.
  Limitations: High network and server overhead due to copying operations.

Servers connected to disks
  Description: Servers are cabled to the same disks, but each server owns its disks. If one server fails, its disks are taken over by the other server.
  Benefits: Reduced network and server overhead due to elimination of copying operations.
  Limitations: Usually requires disk mirroring or RAID technology to compensate for the risk of disk failure.

Servers share disks
  Description: Multiple servers simultaneously share access to disks.
  Benefits: Low network and server overhead. Reduced risk of downtime caused by disk failure.
  Limitations: Requires lock manager software. Usually used with disk mirroring or RAID technology.
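The failover decision in the passive-standby method can be illustrated with a heartbeat check. This is a toy sketch, not from the slides; the 3-second timeout and the function name are hypothetical example choices:

```python
import time

# Toy illustration of passive standby: the secondary server promotes itself
# once the primary's heartbeat goes stale. HEARTBEAT_TIMEOUT is an arbitrary
# example value, not a figure from the slides.

HEARTBEAT_TIMEOUT = 3.0  # seconds

def should_fail_over(last_heartbeat, now=None):
    """True if the primary has not sent a heartbeat within the timeout."""
    if now is None:
        now = time.monotonic()
    return (now - last_heartbeat) > HEARTBEAT_TIMEOUT
```

The cost noted in the table follows directly: the secondary does nothing but run this check until the primary fails.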
Clusters
Operating system design issues:
• Failure management
• Load balancing
• Parallel computation:
  • Parallelizing compiler
  • Parallelized applications
  • Parametric computing
Non-uniform Memory Access
• Uniform memory access (UMA)
• Non-uniform memory access (NUMA)
• Cache-coherent NUMA (CC-NUMA)
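What makes access "non-uniform" is that a reference to another node's memory is much slower than a local one, so average access time depends on the fraction of references that stay local. A sketch of that weighted average (the latency numbers in the test are hypothetical, chosen only for illustration):

```python
# Illustrative NUMA timing model: average access time as a weighted mix of
# local and remote reference latencies. The latencies passed in are
# hypothetical example values, not figures from the slides.

def avg_access_time(local_fraction, t_local_ns, t_remote_ns):
    """Average memory access time given the fraction of local references."""
    return local_fraction * t_local_ns + (1.0 - local_fraction) * t_remote_ns
```

The model shows why NUMA software tries to keep data near the processors that use it: as the local fraction drops, the average latency climbs toward the remote latency.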
Non-uniform Memory Access
CC-NUMA organization: N multiprocessor nodes joined by an interconnection network. Each node contains m processors (each with its own L1 cache), a shared main memory, I/O modules, and a directory that keeps track of cached copies of lines from the node's local memory.
Vector Computation

(a) Scalar processing:
    DO 100 I = 1, N
      DO 100 J = 1, N
        C(I, J) = 0.0
        DO 100 K = 1, N
          C(I, J) = C(I, J) + A(I, K) * B(K, J)
100 CONTINUE

(b) Vector processing:
    DO 100 I = 1, N
      C(I, J) = 0.0 (J = 1, N)
      DO 100 K = 1, N
        C(I, J) = C(I, J) + A(I, K) * B(K, J) (J = 1, N)
100 CONTINUE

(c) Parallel processing:
    DO 50 J = 1, N - 1
      FORK 100
 50 CONTINUE
    J = N
100 DO 200 I = 1, N
      C(I, J) = 0.0
      DO 200 K = 1, N
        C(I, J) = C(I, J) + A(I, K) * B(K, J)
200 CONTINUE
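All three Fortran fragments above compute the same matrix product C = A * B; they differ only in how the work is organized. A Python sketch of the three organizations (illustrative only; the vector and parallel versions mimic the whole-row operation and the FORK construct, respectively):

```python
from concurrent.futures import ThreadPoolExecutor

def matmul_scalar(A, B):
    """(a) Scalar: one element C(I, J) at a time, three nested loops."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_vector(A, B):
    """(b) Vector: the inner J loop becomes one whole-row vector operation."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = A[i][k]
            # one "vector instruction" updating all J at once
            C[i] = [c + aik * b for c, b in zip(C[i], B[k])]
    return C

def matmul_parallel(A, B):
    """(c) Parallel: like FORK 100, each column J is an independent task."""
    n = len(A)
    def column(j):
        return [sum(A[i][k] * B[k][j] for k in range(n)) for i in range(n)]
    with ThreadPoolExecutor() as pool:
        cols = list(pool.map(column, range(n)))
    # reassemble columns into rows
    return [[cols[j][i] for j in range(n)] for i in range(n)]
```

The parallel version exploits the fact that distinct columns of C share no data, so the forked iterations need no synchronization until the join.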
Vector Computation
Approaches to vector computation hardware:
(a) Pipelined ALU: operands stream from memory through an input register into a single pipelined ALU, and results return through an output register to memory.
(b) Parallel ALUs: operands from the input register are distributed across multiple independent ALUs, whose results are collected in the output register.
Vector Computation
Pipelined floating-point addition zi = xi + yi proceeds through four stages: C (compare exponents), S (shift significand), A (add significands), N (normalize result).
(a) Pipelined ALU: successive operand pairs (x1, y1), (x2, y2), ... enter the pipeline one per cycle, so after the pipeline fills, one result zi emerges per cycle.
(b) Four parallel ALUs: operand pairs are distributed round-robin across four ALUs (x1..x4 in the first wave, x5..x8 in the next, and so on), producing up to four results at a time.
Vector Computation
Example: elementwise multiplication of two vectors of complex numbers (real parts AR, BR; imaginary parts AI, BI):

    DO 100 J = 1, 50
      CR(J) = AR(J) * BR(J) - AI(J) * BI(J)
100 CI(J) = AR(J) * BI(J) + AI(J) * BR(J)

(a) Storage to storage:
    Operation                    Cycles
    AR(J) * BR(J) -> T1(J)       3
    AI(J) * BI(J) -> T2(J)       3
    T1(J) - T2(J) -> CR(J)       3
    AR(J) * BI(J) -> T3(J)       3
    AI(J) * BR(J) -> T4(J)       3
    T3(J) + T4(J) -> CI(J)       3
    TOTAL                        18

(b) Register to register:
    Operation                    Cycles
    AR(J) -> V1(J)               1
    BR(J) -> V2(J)               1
    V1(J) * V2(J) -> V3(J)       1
    AI(J) -> V4(J)               1
    BI(J) -> V5(J)               1
    V4(J) * V5(J) -> V6(J)       1
    V3(J) - V6(J) -> V7(J)       1
    V7(J) -> CR(J)               1
    V1(J) * V5(J) -> V8(J)       1
    V4(J) * V2(J) -> V9(J)       1
    V8(J) + V9(J) -> V0(J)       1
    V0(J) -> CI(J)               1
    TOTAL                        12
Vector Computation

(c) Memory to register:
    Operation                        Cycles
    AR(J) -> V1(J)                   1
    V1(J) * BR(J) -> V2(J)           1
    AI(J) -> V3(J)                   1
    V3(J) * BI(J) -> V4(J)           1
    V2(J) - V4(J) -> V5(J)           1
    V5(J) -> CR(J)                   1
    V1(J) * BI(J) -> V6(J)           1
    V3(J) * BR(J) -> V7(J)           1
    V6(J) + V7(J) -> V8(J)           1
    V8(J) -> CI(J)                   1
    TOTAL                            10

(d) Compound instruction:
    Operation                        Cycles
    AR(J) -> V1(J)                   1
    V1(J) * BR(J) -> V2(J)           1
    AI(J) -> V3(J)                   1
    V2(J) - V3(J) * BI(J) -> V2(J)   1
    V2(J) -> CR(J)                   1
    V1(J) * BI(J) -> V4(J)           1
    V4(J) + V3(J) * BR(J) -> V5(J)   1
    V5(J) -> CI(J)                   1
    TOTAL                            8
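All four instruction sequences above compute the same per-element result: with A = AR + i*AI and B = BR + i*BI, CR and CI are the real and imaginary parts of the complex product A * B. A quick check of that identity (a sketch of the arithmetic, not of the vector hardware):

```python
# Per-element complex product, exactly as the loop computes it:
#   CR = AR*BR - AI*BI, CI = AR*BI + AI*BR.

def complex_product(ar, ai, br, bi):
    """Real and imaginary parts of (ar + i*ai) * (br + i*bi)."""
    cr = ar * br - ai * bi
    ci = ar * bi + ai * br
    return cr, ci
```

The sequences (a) through (d) therefore differ only in operand location (memory vs. vector registers) and in how many vector instructions the same six arithmetic operations are packed into.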