Post on 02-Jan-2016

Parallel Computers: Organizations and Architecture

Department of Computer Science, Southern Illinois University Edwardsville

Summer, 2015

Dr. Hiroshi Fujinoki, E-mail: hfujino@siue.edu

CS 312 Computer Organization and Architecture

Mult_Sched/001

Four hardware architectures for “parallel computers”

Tightly-Coupled Multi-Processor System

Functionally-Specialized Multi-Processor System

Loosely-Coupled Multi-Processor System

Distributed Systems (“most loosely coupled systems”)

Mult_Sched/002

Tightly-Coupled Multi-Processor System

• Multi-Processor System (multi-processor motherboard)

• Single-Processor System with a multi-core processor

[Figure: a multi-processor system (several processors on one motherboard) next to a single-processor system with a multi-core processor (one processor containing several processor cores, each with an ALU and others)]

[Photo: two processors on a motherboard (multi-processor system)]

[Photo: CPU cores (single-processor system with a multi-core processor)]

Mult_Sched/003

Functionally-Specialized Multi-Processor System

Examples:

• GPU on graphics card (GPU = “Graphics Processing Unit”)

• Built-in processor on high-speed disk controllers or NICs (especially those using DMA)

[Figure: motherboard, processor, graphic interface, GPU, video RAM (“VRAM”), DAC, and monitor (CRT, flat panel)]

• Processor sends graphic commands to the GPU

• GPU processes image data in the graphics-card memory

• Graphics card performs D/A conversion using the DAC

• Graphics card sends analog image signals (RGB signals) to the monitor

[Photo: DMA SCSI I/O card with its own CPU and control program (in ROM)]

Mult_Sched/004

Loosely-Coupled Multi-Processor System

• Multi-Systemboard (multiple motherboard) computers

[Figure: several system boards (motherboards), each with its own processor and memory, connected by the computer system “bus”]

• A computer with multiple motherboards (“blades”)

• Blades communicate through the bus

• Each blade is a computer

• Communication delay over the bus: at least “μs” order


Mult_Sched/005

Distributed Systems (“most loosely coupled systems”)

[Figure: four systems (AS 1, AS 2, AS 3, AS 4) connected by a network]

Each system has its own:

• Processor

• Local Memory

• Secondary Storage

• Other I/O

Two kinds of migration over the network:

• Process migration: a process (executable code) moves from one system to another

• Data migration: a file (data) moves from one system to another


Mult_Sched/006

Three different types of tightly-coupled multi-processor systems

(1) “Fine-grained” multi-processor parallel computers

(2) “Medium-grained” multi-processor parallel computers

(3) “Coarse-grained” multi-processor parallel computers


Mult_Sched/007

Fine-Grained Multi-Process

• Fine-grained = instruction-level multi-processing

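Your program (binary executable):

A = B + C;

X = Y + Z;

W = A + X;

• A = B + C and X = Y + Z are independent, so two CPUs can execute them in parallel

• Dependency: W = A + X needs both A and X

• Synchronization: the two CPUs must synchronize before W = A + X is executed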

Granularity: 1~20 instructions
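The dependency in this example can be sketched in Python. This is only an illustration under assumptions: the function name `run_fine_grained` is hypothetical, and software threads stand in for the two CPUs. A real fine-grained system synchronizes in hardware, since thread overhead is far larger than 1~20 instructions.

```python
import threading

def run_fine_grained(b, c, y, z):
    """Compute W = (B + C) + (Y + Z), running the two independent sums in parallel."""
    results = {}

    def compute_a():
        results["A"] = b + c  # A = B + C (no dependency on X)

    def compute_x():
        results["X"] = y + z  # X = Y + Z (no dependency on A)

    t1 = threading.Thread(target=compute_a)
    t2 = threading.Thread(target=compute_x)
    t1.start()
    t2.start()

    # Synchronization point: W depends on both A and X,
    # so wait until both threads have finished.
    t1.join()
    t2.join()

    return results["A"] + results["X"]  # W = A + X

print(run_fine_grained(1, 2, 3, 4))  # prints 10
```

The two `join()` calls play the role of the synchronization step: `W` is not computed until both of its inputs exist.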


Mult_Sched/008

Medium-Grained Multi-Process

• Medium-grained = thread-level multi-processing

[Figure: your program (binary executable) divided into ThreadA, ThreadB, ThreadC, and ThreadD, which are executed by two processors]


Mult_Sched/009

Medium-Grained Multi-Process

• Example: Web Browser

ThreadA -- Display thread (text output & jpeg image processing)

ThreadB -- Taking user inputs (edit boxes, radio boxes in the browser window)

ThreadC -- Network input (receiving data from network)

ThreadD -- Network output (sending data to network)

[Figure: timeline of ThreadA, ThreadB, ThreadC, and ThreadD; activities shown: receiving data, displaying data, user makes inputs, receiving data, transmitting data]


Mult_Sched/010

Medium-Grained Multi-Process

• Example: Web Browser


Browser execution with better responses

Granularity: 20~200 instructions
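The four browser threads can be sketched with Python's `threading` and `queue` modules. This is a minimal sketch, not how a real browser is built: the function `run_browser`, the queue names, and the `display:`/`send:` strings are all hypothetical, and lists stand in for the screen and the network.

```python
import queue
import threading

def run_browser(pages, user_inputs):
    """Run four cooperating threads; return what was displayed and what was sent."""
    received = queue.Queue()   # ThreadC (network input) -> ThreadA (display)
    outgoing = queue.Queue()   # ThreadB (user input)    -> ThreadD (network output)
    displayed, sent = [], []
    done = object()            # sentinel marking the end of a stream

    def display():             # ThreadA: display received data
        while True:
            item = received.get()
            if item is done:
                break
            displayed.append(f"display:{item}")

    def user_input():          # ThreadB: take user inputs
        for text in user_inputs:
            outgoing.put(text)
        outgoing.put(done)

    def network_input():       # ThreadC: receive data from the network
        for page in pages:
            received.put(page)
        received.put(done)

    def network_output():      # ThreadD: send data to the network
        while True:
            item = outgoing.get()
            if item is done:
                break
            sent.append(f"send:{item}")

    threads = [threading.Thread(target=f)
               for f in (display, user_input, network_input, network_output)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return displayed, sent

print(run_browser(["a.html", "b.jpg"], ["click"]))
```

Because each thread blocks only on its own queue, user input can be taken while data is still being received and displayed, which is the source of the better responsiveness.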


Mult_Sched/011

Coarse-Grained Multi-Process

• Coarse-grained = process-level multi-tasking

Process assignment to multiple processors in multi-tasking environment
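Process assignment can be illustrated with a small simulation. The slides do not name a scheduling policy, so this sketch assumes round-robin with a fixed time slice; the function `schedule` and its parameters are hypothetical names used only for illustration.

```python
from collections import deque

def schedule(burst_ms, num_processors, slice_ms):
    """Simulate round-robin assignment of processes to processors.

    burst_ms maps a process name to the CPU time (in ms) it still needs.
    Returns one list per time slice of (processor, process) pairs.
    """
    ready = deque(burst_ms.items())  # ready queue of (name, remaining ms)
    timeline = []
    while ready:
        running = []
        # Give each processor the next ready process, if any.
        for cpu in range(num_processors):
            if not ready:
                break
            name, remaining = ready.popleft()
            running.append((cpu, name, remaining))
        timeline.append([(cpu, name) for cpu, name, _ in running])
        # Processes that still need CPU time rejoin the ready queue.
        for _, name, remaining in running:
            if remaining > slice_ms:
                ready.append((name, remaining - slice_ms))
    return timeline

for i, assignments in enumerate(schedule({"P1": 20, "P2": 10, "P3": 10}, 2, 10)):
    print(f"slice {i}: {assignments}")
```

In this run P1 occupies processor 0 in the first slice and processor 1 in the second, showing that a process is not tied to one processor between slices.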

[Figure: processes in memory assigned to processors over time]

Mult_Sched/012

Coarse-Grained Multi-Process


[Figure: processes in memory assigned to a processor pool over time]

Granularity = ms order

• 1 ms (@ 1 GHz) = 1 million instructions (assuming one instruction per clock cycle)

• 100 ms (@ 1 GHz) = 100 M instructions (same assumption)

Granularity: 1~100 M instructions
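The arithmetic behind these figures can be checked with a short calculation. The sketch assumes one instruction per clock cycle, which the slide figures imply but do not state; the function name and the `ipc` parameter are hypothetical.

```python
def instructions_per_interval(duration_ms, clock_hz, ipc=1):
    """Instructions executed in duration_ms at clock_hz, with ipc instructions per cycle."""
    cycles = duration_ms * clock_hz // 1000  # ms -> s, then cycles = seconds * Hz
    return cycles * ipc

GHZ = 1_000_000_000

print(instructions_per_interval(1, GHZ))    # prints 1000000
print(instructions_per_interval(100, GHZ))  # prints 100000000
```

A real processor may complete more or fewer than one instruction per cycle, so these values are order-of-magnitude estimates.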
