Parallel Programming 2

Types of Parallel Computers

Two principal approaches:

• Shared memory multiprocessor

• Distributed memory multicomputer

ITCS 4/5145 Parallel Programming, UNC-Charlotte, B. Wilkinson, 2010. Aug 26, 2010


Shared Memory Multiprocessor


Conventional Computer

Consists of a processor executing a program stored in a (main) memory:

Each main memory location is identified by its address. Addresses start at 0 and extend to 2^b - 1 when there are b bits (binary digits) in the address (for example, 32 address bits give 2^32 addressable locations).

[Diagram: a processor connected to main memory; instructions flow to the processor, data flows to or from the processor.]


Shared Memory Multiprocessor System

A natural way to extend the single-processor model: have multiple processors connected to multiple memory modules, such that each processor can access any memory module:

[Diagram: multiple processors connected through processor-memory interconnections to memory modules forming one address space.]


Simplistic view of a small shared memory multiprocessor

Examples:
• Dual Pentiums
• Quad Pentiums

[Diagram: processors connected to shared memory over a single bus.]


Real computer systems have cache memory between the main memory and the processors: Level 1 (L1) cache and Level 2 (L2) cache.

Example Quad Shared Memory Multiprocessor

[Diagram: four processors, each with an L1 cache, L2 cache, and bus interface, connected by a processor/memory bus to a memory controller and the shared memory.]


“Recent” innovation

• Dual-core and multi-core processors -- two or more independent processors in one package.

• Actually an old idea, but not put into wide practice until recently.

• Since the L1 cache is usually inside the package and the L2 cache outside the package, dual-/multi-core processors usually share the L2 cache.


Single quad core shared memory multiprocessor

[Diagram: one chip containing four processors, each with its own L1 cache and sharing an L2 cache, connected through a memory controller to the shared memory.]


Examples

• Intel:

– Core Duo processors -- two processors in one package sharing a common L2 cache (2005-2006)

– Intel Core 2 family dual cores, with quad core from Nov 2006 onwards

– Core i7 processors replacing the Core 2 family -- quad core, Nov 2008

– Intel Teraflops Research Chip (Polaris), a 3.16 GHz, 80-core processor prototype.

• Xbox 360 game console -- triple core PowerPC microprocessor.

• PlayStation 3 Cell processor -- nine-core design.

References and more information: Wikipedia.


Multiple quad-core multiprocessors (example: coit-grid05.uncc.edu)

[Diagram: multiple quad-core processors (eight cores shown, each with its own L1 cache), with an L2 cache and a possible L3 cache per chip, connected through a memory controller to the shared memory.]


Programming Shared Memory Multiprocessors

Several possible ways

1. Thread libraries -- the programmer decomposes the program into individual parallel sequences (threads), each able to access shared variables declared outside the threads.

Example: Pthreads.
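A minimal Pthreads sketch (not from the original slides), showing two threads updating a shared counter declared outside the threads; compile with gcc prog.c -pthread:

/* illustrative sketch, not from the slides */
#include <pthread.h>
#include <stdio.h>

int counter = 0;                                /* shared variable */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *work(void *arg) {
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock);              /* protect the shared update */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, work, NULL);
    pthread_create(&t2, NULL, work, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", counter);          /* expect 2000 */
    return 0;
}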

2. Higher-level library functions and compiler directives (pragmas) to declare shared variables and specify parallelism. Implemented using threads.

Example: OpenMP -- an industry standard. Consists of library functions, compiler directives, and environment variables; needs an OpenMP-capable compiler.
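A minimal OpenMP sketch (not from the original slides); the pragma directive specifies the parallelism and the reduction clause handles the shared result; compile with gcc prog.c -fopenmp:

/* illustrative sketch, not from the slides */
#include <omp.h>
#include <stdio.h>

int main(void) {
    int sum = 0;
    /* the directive splits the loop iterations across threads */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 100; i++)
        sum += i;
    printf("sum = %d (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}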


3. Use a modified sequential programming language -- added syntax to declare shared variables and specify parallelism.

Example: UPC (Unified Parallel C) -- needs a UPC compiler.
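A rough UPC sketch (not from the original slides, and assuming a UPC compiler such as Berkeley UPC); the added shared qualifier declares data visible to all threads, and upc_forall distributes loop iterations:

/* illustrative sketch, not from the slides */
#include <upc.h>
#include <stdio.h>

#define N 100
shared int a[N];          /* one array shared across all UPC threads */

int main(void) {
    int i;
    /* the fourth expression gives each iteration affinity to a[i]'s owner */
    upc_forall(i = 0; i < N; i++; &a[i])
        a[i] = i;
    upc_barrier;          /* wait for all threads */
    if (MYTHREAD == 0)
        printf("initialized %d elements on %d threads\n", N, THREADS);
    return 0;
}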

4. Use a specially designed parallel programming language -- with syntax to express parallelism. Compiler automatically creates executable code for each processor (not now common).

5. Use a regular sequential programming language such as C and ask parallelizing compiler to convert it into parallel executable code. Also not now common.


Message-Passing Multicomputer

Complete computers connected through an interconnection network:

[Diagram: complete computers, each with a processor and local memory, exchanging messages across an interconnection network.]


Interconnection Networks

Many explored in the 1970s and 1980s

• Limited and exhaustive interconnections
• 2- and 3-dimensional meshes
• Hypercube
• Using switches:
  – Crossbar
  – Trees
  – Multistage interconnection networks


Networked Computers as a Computing Platform

• A network of computers became a very attractive alternative to expensive supercomputers and parallel computer systems for high-performance computing in the early 1990s.

• Several early projects. Notable:

– Berkeley NOW (network of workstations) project.

– NASA Beowulf project.


Key advantages:

• Very high performance workstations and PCs readily available at low cost.

• The latest processors can easily be incorporated into the system as they become available.

• Existing software can be used or modified.


Beowulf Clusters*

• A group of interconnected “commodity” computers achieving high performance with low cost.

• Typically use commodity interconnects (high-speed Ethernet) and the Linux OS.

* The Beowulf name comes from the NASA Goddard Space Flight Center cluster project.


Cluster Interconnects

• Originally Fast Ethernet on low-cost clusters
• Gigabit Ethernet -- easy upgrade path

More specialized / higher performance:
• Myrinet -- 2.4 Gbits/sec; disadvantage: single vendor
• cLan
• SCI (Scalable Coherent Interface)
• QNet
• InfiniBand -- may be important as InfiniBand interfaces may be integrated on next-generation PCs


Dedicated cluster with a master node and compute nodes

[Diagram: a dedicated cluster; the user reaches the master node through an external network, and the master node connects via its Ethernet interface and a switch on a local network to the compute nodes.]


Software Tools for Clusters

• Based upon message passing programming model

• User-level libraries are provided for explicitly specifying the messages to be sent between executing processes on each computer.

• Use with regular programming languages (C, C++, ...).

• Can be quite difficult to program correctly as we shall see.


Next step

• Learn the message-passing programming model and some MPI routines, write a message-passing program, and test it on the cluster.
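A minimal message-passing sketch (not from the original slides) using two basic MPI routines, MPI_Send and MPI_Recv; typically compiled with mpicc and launched with mpirun on the cluster:

/* illustrative sketch, not from the slides */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */

    if (rank == 0) {
        value = 42;
        /* send one int to process 1 with message tag 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* receive one int from process 0 */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}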
