
Pertemuan 26 Parallel Processing 2

Date posted: 03-Jan-2016
Uploaded by: britanni-rogers
Page 1: Pertemuan 26 Parallel Processing 2

Pertemuan 26: Parallel Processing 2

Course: H0344/Organisasi dan Arsitektur Komputer

Year: 2005

Version: 1/1

Page 2: Pertemuan 26 Parallel Processing 2

Learning Outcomes

By the end of this session, students are expected to be able to:

• Explain the working principles of parallel processing

Page 3: Pertemuan 26 Parallel Processing 2

Outline

• Multiple Processor Organization

• Symmetric Multiprocessor

• Cache Coherence and The MESI Protocol

• Clusters

• Non-uniform Memory Access

• Vector Computation

Page 4: Pertemuan 26 Parallel Processing 2

Cache Coherence and the MESI Protocol

The cache coherence problem: multiple copies of the same data can exist in different caches simultaneously, and if processors are allowed to update their own copies freely, an inconsistent view of memory can result. Two classes of solution:

• Software solutions

• Hardware solutions, either:

  – Directory protocols

  – Snoopy protocols

Page 5: Pertemuan 26 Parallel Processing 2

Cache Coherence and the MESI Protocol

MESI cache line states:

                                Modified      Exclusive     Shared           Invalid
This cache line valid?          Yes           Yes           Yes              No
The memory copy is …            out of date   valid         valid            –
Copies exist in other caches?   No            No            Maybe            Maybe
A write to this line …          does not go   does not go   goes to bus and  goes directly
                                to bus        to bus        updates cache    to bus
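The state rules in the table above can be sketched as a small transition function. This is an illustrative simplification with my own helper names, not part of the slides; it ignores bus details such as flushing a Modified line back to memory or read-with-intent-to-modify.

```python
# Minimal MESI state-transition sketch (hypothetical helper, not from the slides).
# States: "M" (Modified), "E" (Exclusive), "S" (Shared), "I" (Invalid).

def mesi_next(state, event, others_have_copy=False):
    """Return the next state of one cache line for a given event.

    event: "read" / "write"             (issued by the local processor)
           "snoop_read" / "snoop_write" (observed on the shared bus)
    """
    if event == "read":
        if state == "I":                        # read miss
            return "S" if others_have_copy else "E"
        return state                            # read hit: M, E, S unchanged
    if event == "write":
        return "M"                              # any local write ends Modified
    if event == "snoop_read":
        return "S" if state in ("M", "E") else state  # M/E become Shared
    if event == "snoop_write":
        return "I"                              # another cache writes: invalidate
    raise ValueError(event)

# Example walk: Invalid line read while another cache holds it, then written,
# then invalidated by a remote write.
s = "I"
s = mesi_next(s, "read", others_have_copy=True)   # -> "S"
s = mesi_next(s, "write")                          # -> "M"
s = mesi_next(s, "snoop_write")                    # -> "I"
print(s)
```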

Page 6: Pertemuan 26 Parallel Processing 2

Cache Coherence and the MESI Protocol

MESI state transition diagram (figure): (a) line in the cache at the initiating processor; (b) line in a snooping cache. The four states are Invalid, Shared, Exclusive, and Modified; transitions are labeled RH (read hit), RMS (read miss, shared), RME (read miss, exclusive), WH (write hit), WM (write miss), SHR (snoop hit on a read), and SHW (snoop hit on a write or invalidate).

Page 7: Pertemuan 26 Parallel Processing 2


Clusters

Four benefits that can be achieved with clustering:

• Absolute scalability

• Incremental scalability

• High availability

• Superior price/performance

Page 8: Pertemuan 26 Parallel Processing 2

Clusters

Cluster configurations (figure): (a) standby server with no shared disk — two servers, each with its own processors (P), memory (M), and I/O modules, connected only by a high-speed message link; (b) shared disk — the same arrangement, with both servers also cabled through I/O modules to a shared RAID disk subsystem.

Page 9: Pertemuan 26 Parallel Processing 2

Clustering methods: benefits and limitations

Passive standby
  Description: A secondary server takes over in case of primary server failure.
  Benefits: Easy to implement.
  Limitations: High cost, because the secondary server is unavailable for other processing tasks.

Active secondary
  Description: The secondary server is also used for processing tasks.
  Benefits: Reduced cost, because secondary servers can be used for processing.
  Limitations: Increased complexity.

Separate servers
  Description: Separate servers have their own disks. Data are continuously copied from the primary to the secondary server.
  Benefits: High availability.
  Limitations: High network and server overhead due to copying operations.

Servers connected to disks
  Description: Servers are cabled to the same disks, but each server owns its disks. If one server fails, its disks are taken over by the other server.
  Benefits: Reduced network and server overhead due to elimination of copying operations.
  Limitations: Usually requires disk mirroring or RAID technology to compensate for the risk of disk failure.

Servers share disks
  Description: Multiple servers simultaneously share access to the same disks.
  Benefits: Low network and server overhead. Reduced risk of downtime caused by disk failure.
  Limitations: Requires lock manager software. Usually used with disk mirroring or RAID technology.

Page 10: Pertemuan 26 Parallel Processing 2

Clusters

Operating system design issues:

• Failure management

• Load balancing

• Parallel computation, via:

  – Parallelizing compilers

  – Parallelized applications

  – Parametric computing
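Parametric computing means running the same application many times over different input parameters, farming one job out per cluster node. A minimal sketch, with a local thread pool standing in for the cluster's nodes (the function and names are my own, purely illustrative):

```python
# Parametric computing sketch: one application, many parameter values,
# executed concurrently. A thread pool stands in for cluster nodes.
from concurrent.futures import ThreadPoolExecutor

def run_simulation(param):
    # Stand-in for the real application being swept over its parameter.
    return param * param

params = range(8)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_simulation, params))
print(results)   # [0, 1, 4, 9, 16, 25, 36, 49]
```

On a real cluster the pool would be replaced by a job scheduler dispatching each parameter value to a node; the structure of the sweep is the same.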

Page 11: Pertemuan 26 Parallel Processing 2

Non-uniform Memory Access

• Uniform memory access (UMA)

• Non-uniform memory access (NUMA)

• Cache-coherent NUMA (CC-NUMA)

Page 12: Pertemuan 26 Parallel Processing 2

Non-uniform Memory Access

CC-NUMA organization (figure): N nodes joined by an interconnection network. Node 1 holds processors 1-1 … 1-m, node 2 holds processors 2-1 … 2-m, and so on up to node N with processors N-1 … N-m. Each processor has its own L1 cache, and each node contains a directory, its portion of main memory (main memory 1 … main memory N), and I/O.

Page 13: Pertemuan 26 Parallel Processing 2

Vector Computation

Multiplication C = A × B of two N × N matrices, expressed three ways:

(a) Scalar processing

DO 100 I = 1, N
   DO 100 J = 1, N
      C(I, J) = 0.0
      DO 100 K = 1, N
         C(I, J) = C(I, J) + A(I, K) * B(K, J)
100 CONTINUE

(b) Vector processing

DO 100 I = 1, N
   C(I, J) = 0.0 (J = 1, N)
   DO 100 K = 1, N
      C(I, J) = C(I, J) + A(I, K) * B(K, J) (J = 1, N)
100 CONTINUE

(c) Parallel processing

DO 50 J = 1, N - 1
   FORK 100
50 CONTINUE
J = N
100 DO 200 I = 1, N
       C(I, J) = 0.0
       DO 200 K = 1, N
          C(I, J) = C(I, J) + A(I, K) * B(K, J)
200 CONTINUE
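The FORK construct in the parallel version spawns one task per column of C, with the forking thread computing the last column itself. A rough modern equivalent, sketched with Python threads (function and variable names are mine, not from the slides):

```python
# Sketch of the FORK-per-column parallel matrix multiply: workers handle
# columns 1 .. N-1, the parent handles column N, then all results are joined.
import threading

def column_product(A, B, C, j):
    # The inner loops: compute column j of C = A * B.
    n = len(A)
    for i in range(n):
        C[i][j] = 0.0
        for k in range(n):
            C[i][j] += A[i][k] * B[k][j]

def parallel_matmul(A, B):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    # "DO 50 ... FORK 100": spawn one worker per column except the last.
    workers = [threading.Thread(target=column_product, args=(A, B, C, j))
               for j in range(n - 1)]
    for w in workers:
        w.start()
    column_product(A, B, C, n - 1)   # "J = N": the parent does the last column
    for w in workers:
        w.join()
    return C

A = [[1, 2], [3, 4]]
B = [[1, 0], [0, 1]]                 # identity, so A * B should equal A
print(parallel_matmul(A, B))
```

Each worker writes a disjoint column of C, so no locking is needed; the joins at the end play the role of the implicit barrier after the FORKed loop bodies complete.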

Page 14: Pertemuan 26 Parallel Processing 2

Vector Computation

ALU organizations for vector computation (figure): (a) a single pipelined ALU, with operands staged from memory through an input register and results returned to memory through an output register; (b) multiple parallel ALUs sharing the input and output registers.

Page 15: Pertemuan 26 Parallel Processing 2

Vector Computation

Pipelined floating-point addition zi = xi + yi is broken into four stages: C (compare exponents), S (shift significand), A (add significands), and N (normalize).

Timing (figure): (a) one pipelined ALU accepts a new operand pair (xi, yi) every cycle, so after the four-cycle fill it delivers one result per cycle (z1, z2, z3, … on successive cycles); (b) four parallel ALUs accept four pairs at a time and deliver results in groups of four (z1–z4, then z5–z8, then z9–z12).
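A back-of-envelope timing comparison of the two organizations. This is my own model, not from the slides: the pipelined ALU issues one pair per cycle, and each parallel ALU is treated as an unpipelined unit spending four cycles per addition, as the figure suggests.

```python
# Rough cycle counts for adding n vector element pairs through the 4-stage
# C/S/A/N floating-point adder, under the two organizations in the figure.
import math

def pipelined_time(n, stages=4):
    # One pipelined ALU: first result after `stages` cycles, then one per cycle.
    return stages + (n - 1)

def parallel_time(n, alus=4, stages=4):
    # `alus` unpipelined ALUs, each taking `stages` cycles per addition.
    return math.ceil(n / alus) * stages

print(pipelined_time(12), parallel_time(12))   # 15 12
```

For the 12 pairs shown in the figure the four parallel ALUs finish slightly sooner, but for long vectors both approach the same asymptotic throughput of one result per cycle under this model.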

Page 16: Pertemuan 26 Parallel Processing 2

Vector Computation

A complex multiply of two 50-element complex vectors, with real and imaginary parts stored in separate arrays:

DO 100 J = 1, 50
      CR(J) = AR(J) * BR(J) – AI(J) * BI(J)
100   CI(J) = AR(J) * BI(J) + AI(J) * BR(J)

(a) Storage to storage

Operation                   Cycles
AR(J) * BR(J) → T1(J)         3
AI(J) * BI(J) → T2(J)         3
T1(J) – T2(J) → CR(J)         3
AR(J) * BI(J) → T3(J)         3
AI(J) * BR(J) → T4(J)         3
T3(J) + T4(J) → CI(J)         3
TOTAL                        18

(b) Register to register

Operation                   Cycles
AR(J) → V1(J)                 1
BR(J) → V2(J)                 1
V1(J) * V2(J) → V3(J)         1
AI(J) → V4(J)                 1
BI(J) → V5(J)                 1
V4(J) * V5(J) → V6(J)         1
V3(J) – V6(J) → V7(J)         1
V7(J) → CR(J)                 1
V1(J) * V5(J) → V8(J)         1
V4(J) * V2(J) → V9(J)         1
V8(J) + V9(J) → V0(J)         1
V0(J) → CI(J)                 1
TOTAL                        12
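Whichever instruction style is used, the DO 100 loop computes an element-wise complex product, since CR + i·CI = (AR + i·AI)(BR + i·BI). A plain sketch for checking the algebra (the function name is mine):

```python
# The DO 100 loop as an element-wise complex multiply:
# (AR + i*AI) * (BR + i*BI) = (AR*BR - AI*BI) + i*(AR*BI + AI*BR)
def complex_multiply(AR, AI, BR, BI):
    CR = [ar * br - ai * bi for ar, ai, br, bi in zip(AR, AI, BR, BI)]
    CI = [ar * bi + ai * br for ar, ai, br, bi in zip(AR, AI, BR, BI)]
    return CR, CI

CR, CI = complex_multiply([1.0], [2.0], [3.0], [4.0])
print(CR, CI)   # (1+2i)(3+4i) = -5+10i, so [-5.0] [10.0]
```

The two list comprehensions correspond exactly to the two statements in the loop body; a vector machine executes each as a single vector operation over all 50 elements.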

Page 17: Pertemuan 26 Parallel Processing 2

Vector Computation

The same complex multiply loop:

DO 100 J = 1, 50
      CR(J) = AR(J) * BR(J) – AI(J) * BI(J)
100   CI(J) = AR(J) * BI(J) + AI(J) * BR(J)

(c) Memory to register

Operation                   Cycles
AR(J) → V1(J)                 1
V1(J) * BR(J) → V2(J)         1
AI(J) → V3(J)                 1
V3(J) * BI(J) → V4(J)         1
V2(J) – V4(J) → V5(J)         1
V5(J) → CR(J)                 1
V1(J) * BI(J) → V6(J)         1
V3(J) * BR(J) → V7(J)         1
V6(J) + V7(J) → V8(J)         1
V8(J) → CI(J)                 1
TOTAL                        10

(d) Compound instruction

Operation                        Cycles
AR(J) → V1(J)                      1
V1(J) * BR(J) → V2(J)              1
AI(J) → V3(J)                      1
V2(J) – V3(J) * BI(J) → V2(J)      1
V2(J) → CR(J)                      1
V1(J) * BI(J) → V4(J)              1
V4(J) + V3(J) * BR(J) → V5(J)      1
V5(J) → CI(J)                      1
TOTAL                              8

