+ All Categories
Home > Documents > SIMD and Associative Computing

SIMD and Associative Computing

Date post: 07-Jan-2016
Category:
Upload: truong
View: 49 times
Download: 1 times
Share this document with a friend
Description:
SIMD and Associative Computing. Computational Models and Algorithms. Associative Computing Topics. Introduction References for Associative Computing Motivation for the MASC model The MASC and ASC Models A Language Designed for the ASC Model Two ASC Algorithms and Programs - PowerPoint PPT Presentation
Popular Tags:
48
SIMD and Associative Computing Computational Models and Algorithms
Transcript
Page 1: SIMD and Associative Computing

SIMD and Associative Computing

Computational Models and Algorithms

Page 2: SIMD and Associative Computing

2

Associative Computing Topics

• Introduction– References for Associative Computing– Motivation for the MASC model– The MASC and ASC Models– A Language Designed for the ASC Model– Two ASC Algorithms and Programs

• ASC and MASC Algorithm Examples– ASC version of Prim’s MST Algorithm– ASC version of QUICKHULL – MASC version of QUICKHULL.

Page 3: SIMD and Associative Computing

3

Associative Computing References

Note: Below KSU papers are available on the website: http://www.cs.kent.edu/~parallel/

(Click on the link to “papers” in left column)

1. Maher Atwah, Johnnie Baker, and Selim Akl, An Associative Implementation of Classical Convex Hull Algorithms, Proc of the IASTED International Conference on Parallel and Distributed Computing and Systems, 1996, 435-438

2. Johnnie Baker and Mingxian Jin, Simulation of Enhanced Meshes with MASC, a MSIMD Model, Proc. of the Eleventh IASTED International Conference on Parallel and Distributed Computing and Systems, Nov. 1999, 511-516.

Page 4: SIMD and Associative Computing

4

Associative Computing References

3. Mingxian Jin, Johnnie Baker, and Kenneth Batcher, Timings for Associative Operations on the MASC Model, Proc. of the 15th International Parallel and Distributed Processing Symposium, (Workshop on Massively Parallel Processing, San Francisco, April 2001.

4. Jerry Potter, Johnnie Baker, Stephen Scott, Arvind Bansal, Chokchai Leangsuksun, and Chandra Asthagiri, An Associative Computing Paradigm, Special Issue on Associative Processing, IEEE Computer, 27(11):19-25, Nov. 1994. (Note: MASC is called ‘ASC’ in this article.)

5. Jerry Potter, Associative Computing - A Programming Paradigm for Massively Parallel Computers, Plenum Publishing Company, 1992.

Page 5: SIMD and Associative Computing

5

Associative Computers

Associative Computer: A SIMD computer with a few additional features supported in hardware.

• These additional features can be supported (less efficiently) in traditional SIMDs in software.

• The name “associative” is due to its ability to locate items in the memory of PEs by content rather than location.

Page 6: SIMD and Associative Computing

6

Associative Models

The ASC model (for ASsociative Computing) gives a list of the properties assumed for an associative computer.

The MASC (for Multiple ASC) Model• Supports multiple SIMD (or MSIMD)

computation. • Allows model to have more than one Instruction

Stream (IS)– The IS corresponds to the control unit of a SIMD.

• ASC is the MASC model with only one IS. – The one IS version of the MASC model is sufficiently

important to have its own name.

Page 7: SIMD and Associative Computing

7

ASC & MASC are KSU Models• Several professors and their graduate students at Kent

State University have worked on models • The STARAN and the ASPRO fully support the ASC

model in hardware. The MPP supports ASC, partly in hardware and partly in software.– Prof. Batcher was chief architect or consultant

• Dr. Potter developed a language for ASC• Dr. Baker works on algorithms for models, applications,

and architectures to support models• Dr. Walker has students who worked on a hardware

design to support the ASC and MASC models. • Dr. Batcher and Dr. Potter are currently not actively

working on ASC/MASC models but still provide advice.

Page 8: SIMD and Associative Computing

8

Motivation• The STARAN Computer (Goodyear Aerospace,

early 1970’s) and later the ASPRO provided an architectural model for associative computing embodied in the ASC model.

• ASC extends the data parallel programming style to a complete computational model.

• ASC provides a practical model that supports massive parallelism.

• MASC provides a hybrid data-parallel, control parallel model that supports associative programming.

• Descriptions of these models allow them to be compared to other parallel models

Page 9: SIMD and Associative Computing

9

The ASC Model

IS

CELL

NETWORK

PEMemory

Cells

PEMemory

PEMemory

Page 10: SIMD and Associative Computing

10

Basic Properties of ASC• Instruction Stream

– The IS has a copy of the program and can broadcast instructions to cells in unit time

• Cell Properties– Each cell consists of a PE and its local memory– All cells listen to the IS – A cell can be active, inactive, or idle

• Inactive cells listen but do not execute IS commands until reactivated

• Idle cells contain no essential data and are available for reassignment

• Active cells execute IS commands synchronously

Page 11: SIMD and Associative Computing

11

Basic Properties of ASC

• Responder Processing– The IS can detect if a data test is satisfied by

any of its responder cells in constant time (i.e., any-responders property).

– The IS can select an arbitrary responder in constant time (i.e., pick-one property).

Page 12: SIMD and Associative Computing

12

• Constant Time Global Operations (on a parallel variable with a value in each PE)– Logical OR and AND of binary values– Maximum and minimum of numbers– Associative searches

• Communications– There are two networks

• PE communications (or cell) network • IS broadcast/reduction network (which could be

implemented as two separate networks)

Basic Properties of ASC

Page 13: SIMD and Associative Computing

13

Basic Properties of ASC

– The PE communications network is normally supported by an interconnection network

• E.g., a 2D mesh

– The broadcast/reduction network(s) are normally supported by two networks (broadcast & reduction) but sometimes combined.

• See posted paper by Jin, Baker, & Batcher (listed in associative references)

• Control Features– PEs and the IS and the networks all operate

synchronously, using the same clock

Page 14: SIMD and Associative Computing

14

Non-SIMD Properties of ASC• Observation: The ASC properties that are unusual for

SIMDs are the constant time operations:– Constant time responder processing

• Any-responders?• Pick-one

– Constant time global operations• Logical OR and AND of binary values• Maximum and minimum value of numbers• Associative Searches

• Typically, SIMDs cannot perform any of these operations in constant time – but have efficient executions of them.

• These timings are justified by implementations using a resolver in the paper by Jin, Baker, & Batcher (listed in associative references and posted).

Page 15: SIMD and Associative Computing

15

1

Busy-idle

Dodge

Ford

Ford

Make

Subaru

Color

PE1

PE2

PE3

PE4

PE5

PE6

PE7

red

blue

white

red

Year

1994

1996

1998

1997

Model PriceOnlot

1

1

0

0

0

0

1

0

1

1

0

0

1

IS

Typical Data Structure for ASC Model

Make, Color – etc. are fields the programmer establishes

Various data types are supported. Some examples will show string data, but they are not supported in the ASC simulator.

Page 16: SIMD and Associative Computing

16

Dodge

Ford

Ford

Make

Subaru

Color

PE1

PE2

PE3

PE4

PE5

PE6

PE7

red

blue

white

red

Year

1994

1996

1998

1997

Model PriceOnlot

1

1

0

0

0

0

1

Busy-idle

1

0

1

1

0

0

1

IS

The Associative Search

IS asks for all cars that are red and on the lot.

PE1 and PE7 respond by setting a mask bit in their PE.

Page 17: SIMD and Associative Computing

17

Characteristics of Associative

Programming • Consistent use of style of programming called

data parallel programming• Consistent use of global associative searching

and responder processing• Usually, frequent use of the constant time global

reduction operations: AND, OR, MAX, MIN• Broadcast of data using IS bus allows the use of

the PE network to be restricted to parallel data movement.

Page 18: SIMD and Associative Computing

18

Associative Programming Characteristics

• Tabular representation of data – think 2D arrays• Use of searching instead of sorting• Use of searching instead of pointers• Use of searching instead of the ordering provided

by linked lists, stacks, queues• Promotes an highly intuitive programming style

that promotes high productivity• Uses structure codes (i.e., numeric

representation) to represent data structures such as trees, graphs, embedded lists, and matrices.

• Examples of the above are given in– Ref: Nov. 1994 IEEE Computer article.– Also, see “Associative Computing” book by Potter.

Page 19: SIMD and Associative Computing

19

Languages Designed for the ASC

• Professor Potter has created several languages for the ASC model.

• ASC is a C-like language designed for ASC model• ACE is a higher level language than ASC that uses

natural language syntax; e.g., plurals, pronouns.• Anglish is an ACE variant that uses an English-like

grammar (e.g., “their”, “its”) • An OOPs version of ASC for the MASC was discussed

(by Potter and his students), but never designed.• Language References:

– ASC Primer – Copy available on parallel lab website www.cs.kent.edu/~parallel/

– “Associative Computing” book by Potter [11] – some features in this book were never fully implemented in ASC Compiler

Page 20: SIMD and Associative Computing

20

Algorithms and Programs Implemented in ASC

• A wide range of algorithms implemented in ASC without the use of the PE network:– Graph Algorithms

• minimal spanning tree• shortest path• connected components

– Computational Geometry Algorithms• convex hull algorithms (Jarvis March, Quickhull,

Graham Scan, etc)• Dynamic hull algorithms

Page 21: SIMD and Associative Computing

21

ASC Algorithms and Programs(not requiring PE network)

– String Matching Algorithms• all exact substring matches• all exact matches with “don’t care” (i.e., wild card)

characters.

– Algorithms for NP-complete problems• traveling salesperson • 2-D knapsack.

– Data Base Management Software• associative data base• relational data base

Page 22: SIMD and Associative Computing

22

ASC Algorithms and Programs (not requiring a PE network)

– A Two Pass Compiler for ASC – not the one we will be using. This compiler uses ASC parallelism.

• first pass• optimization phase

– Two Rule-Based Inference Engines for AI• An Expert System OPS-5 interpreter• PPL (Parallel Production Language interpreter)

– A Context Sensitive Language Interpreter• (OPS-5 variables force context sensitivity)

– An associative PROLOG interpreter

Page 23: SIMD and Associative Computing

23

Associative Algorithms & Programs

(using a network)• There are numerous associative programs that

use a PE network;– 2-D Knapsack ASC Algorithm using a 1-D mesh– Image processing algorithms using 1-D mesh– FFT (Fast Fourier Transform) using 1-D nearest

neighbor & Flip networks– Matrix Multiplication using 1-D mesh– An Air Traffic Control Program (using Flip network

connecting PEs to memory)• Demonstrated using live data at Knoxville in mid

70’s.

• All but first were developed in assembler at Goodyear Aerospace

Page 24: SIMD and Associative Computing

Two Algorithm Examples for ASC

• The remainder of the slides are optional• They illustrate two optimal algorithms for ASC –

described at the data structures level.– Minimal Spanning Tree

• Provides an ASC version of the sequential Prim’s algorithm (sometimes called Dijkstra-Prim algorithm)

– Convex Hull • Provides an ASC version of the Quickhull sequential

algorithm

• The implementation of these algorithms are simpler than implementation of their sequential version.

24

Page 25: SIMD and Associative Computing

25

Example 1 - MST

• A graph has nodes labeled by some identifying letter or number and arcs which are directional and have weights associated with them.

• Such a graph could represent a map where the nodes are cities and the arc weights give the mileage between two cities.

A B

C D

E

3

5 2

54

Page 26: SIMD and Associative Computing

26

The MST Problem

• The MST problem assumes the weights are positive, the graph is connected, and seeks to find the minimal spanning tree,

– i.e. a subgraph that is a tree1, that includes all nodes (i.e. it spans), and

– where the sum of the weights on the arcs of the subgraph is the smallest possible weight (i.e. it is minimal).

• Why would an algorithm solving this problem be useful?

• Note: The solution may not be unique.1 A tree is a set of points called vertices, pairs of distinct

vertices called edges, such that (1) there is a sequence of edges called a path from any vertex to any other, and (2) there are no circuits, that is, no paths starting from a vertex and returning to the same vertex.

Page 27: SIMD and Associative Computing

27

ASC-MST Algorithm Preliminaries

• Next, a “data structure” level presentation of Prim’s algorithm for the MST is given.

• The data structure used is illustrated in the next two slides. – This example is from the Nov. 1994 IEEE Computer

paper cited in the references.

• There are two types of variables for the ASC model, namely– the parallel variables (i.e., ones for the PEs) – the scalar variables (ie., the ones used by the IS). – Scalar variables are essentially global variables.

• Can replace each with a parallel variable with this scalar value stored in each entry.

Page 28: SIMD and Associative Computing

28

ASC-MST Algorithm Preliminaries (cont.)

• In order to distinguish between them here, the parallel variables names end with a “$” symbol.

• Each step in this algorithm takes constant time. • One MST edge is selected during each pass through the

loop in this algorithm.• Since a spanning tree has n-1 edges, the running time of

this algorithm is O(n) and its cost is O(n 2).– Definition of cost is (running time) (number of processors)

• Since the sequential running time of the Prim MST algorithm is O(n 2) and is time optimal, this parallel implementation is cost optimal.– Cost & optimality will be covered in parallel algorithm

performance evaluation chapter (See Ch 7 of Quinn)

Page 29: SIMD and Associative Computing

29

Graph used for Data Structure

Figure 6 in [Potter, Baker, et. al.]

a

b c

d ef

2 8

96

3

3

4

7

2

Page 30: SIMD and Associative Computing

30

Data Structure for MST Algorithm

curr

ent_

best

$

cand

idat

e$

next-node b

a

IS

wait∞∞∞9∞∞f

3byes∞∞363∞e

4byes∞3∞∞4∞d

7byes96∞∞78 c

2ano∞347∞2b

no∞∞∞82∞aP

Es

mas

k$

node

$

a$ b$ pare

nt$

rootc$ d$ e$ f$

Page 31: SIMD and Associative Computing

31

Algorithm: ASC-MST-PRIM(root)1. Initialize candidates to “waiting”

2. If there are any finite values in root’s field,

3. set candidate$ to “yes”

4. set parent$ to root

5. set current_best$ to the values in root’s field

6. set root’s candidate field to “no”

7. Loop while some candidate$ contain “yes”

8. for them

9. restrict mask$ to mindex(current_best$)

10. set next_node to a node identified in the preceding step

11. set its candidate to “no”

12. if the value in their next_node’s field are less than current_best$, then

13. set current_best$ to value in next_node’s field

14. set parent$ to next_node

15. if candidate$ is “waiting” and the value in its next_node’s field is finite

16. set candidate$ to “yes”

17. set parent$ to next_node

18. set current_best to the values in next_node’s field

Page 32: SIMD and Associative Computing

32

Comments on ASC-MST Algorithm• The three preceding slides are Figure 6 in [Potter, Baker,

et.al.] IEEE Computer, Nov 1994].• Preceding slide gives a compact, data-structures level

pseudo-code description for this algorithm – Pseudo-code illustrates Potter’s use of pronouns

(e.g., them, its) and possessive nouns.– The mindex function returns the index of a processor

holding the minimal value.– This MST pseudo-code is much shorter and simpler

than data-structure level sequential MST pseudo-codes

• e.g., see one of Baase’s textbooks cited in references• Algorithm given in Baase’s books is identical to this parallel

algorithm, except for a sequential computer

• Next, a more detailed explanation of the algorithm in preceding slide will be given next.

Page 33: SIMD and Associative Computing

33

Tracing 1st Pass of MST Algorithm on Figure 6 (Put below chart & Figure 6 on board)

curr

ent_

best

$

cand

idat

e$

next-node

a

IS

∞∞∞9∞∞f

∞∞363∞e

∞3∞∞4∞d

96∞∞78 c

∞347∞2b

∞∞∞82∞aP

Es

mas

k$

node

$

a$ b$ pare

nt$

rootc$ d$ e$ f$

Page 34: SIMD and Associative Computing

34

Algorithm: ASC-MST-PRIM• Initially assign any node to root.• All processors set

– candidate$ to “wait”– current-best$ to – the candidate field for the root node to “no”

• All processors whose distance d from their node to root node is finite do– Set their candidate$ field to “yes– Set their parent$ field to root.– Set current_best$ = d.

Page 35: SIMD and Associative Computing

35

Algorithm: ASC-MST-PRIM (cont. 2/3)

• While the candidate field of some processor is “yes”, – Restrict the active processors whose candidate field

is “yes” and (for these processors) do• Compute the minimum value x of current_best$.• Restrict the active processors to those with

current_best$ = x and do– pick an active processor, say node y.

» Set the candidate$ value of node y to “no” – Set the scalar variable next-node to y.

Page 36: SIMD and Associative Computing

36

Algorithm: ASC-MST-PRIM (cont. 3/3)

–If the value z in the next_node column of a processor is less than its current_best$ value, then

»Set current_best$ to z. »Set parent$ to next_node

– For all processors, if candidate$ is “waiting” and the distance of its node from next_node y is finite, then

• Set candidate$ to “yes”• Set current_best$ to the distance of its node from

y.• Set parent$ to y

Page 37: SIMD and Associative Computing

37

Quickhull Algorithm for ASC• Reference:

– [Maher, Baker, Akl, “An Associative Implementation of Classical Convex Hull Algorithms” ]

• Review of Sequential Quickhull Algorithm– Suffices to find the upper convex hull

of points that are on or above the line

• Select point h so that the area of triangle weh is maximal.

• Proceed recursively with the sets of points on or above the lines and .

whhe

we

h

Page 38: SIMD and Associative Computing

38

Previous Illustration

w

e

h

Page 39: SIMD and Associative Computing

39

Example for Data Structure

p1, w

p7

p2

P3, e

p4

p5

P6, h

Page 40: SIMD and Associative Computing

40

Data Structure for Preceding Example

1p3p162p7

11p3p198p6ctr

1p3p1711p5h

1p3p148p4

11p3p1212IS

0p3p117p20

11p3p131p1

job$hull$righ

t-p

t$

area$name$ left

-pt$

x-co

ord

$

y-co

ord

$

point$

w

ep3

h

PE ma

sk

Page 41: SIMD and Associative Computing

41

Algorithms & Assumption

• Basic algorithms exist for the following problems in Euclidean geometry for plane:– Determine whether a third point lies on, above, or

below the line determined by two other points.– Compute the area of a triangle determined by three

points.• Standard Assumption

– Three arbitrary points do not all lie on the same line.

Reference: Introduction to Algorithms by Cormen, Leisterson, Rivest,

(& Stein), McGraw Hill, Chapter on Computational Geometry.

Page 42: SIMD and Associative Computing

42

ASC Quickhull Algorithm(Upper Convex Hull)

ASC-Quickhull( planar-point-set )

1. Initialize: ctr = 1, area$ = 0, hull$ = 02. Find the PE with the minimal x-coord$ and let

w be its point$a) Set its hull$ value to 1

3. Find the PE with the PE with maximal x-coord$ and let e be its point$

a) Set its hull$ to 14. All PEs set their left-pt to w and right-pt to e.5. If the point$ for a PE lies above the line

a) Then set its job$ value to 1b) Else set its job$ value to 0

Page 43: SIMD and Associative Computing

43

ASC Quickhull Algorithm (cont)

6. Loop while parallel job$ contains a nonzero valuea) The IS makes its active cell those with a maximal

job$ value.b) Each (active) PE computes and stores the area of

triangle (left-pt$, right-pt$, point$ ) in area$c) Find the PE with the maximal area$ and let h be its

point.• Set its hull$ value to 1

d) Each PE whose point$ is above• sets its job$ value to ++ctr• sets its right-pt to h

e) Each PE whose point$ is above• sets its job$ to ++ctr• sets its left-pt to h

f) Each PE with job$ < ctr -2 sets its job$ value to 0

hptleft ,

ptrighth ,

Page 44: SIMD and Associative Computing

44

Highest Job Order Assigned to Points Above Lines

1

2

6

7

3

5

4

Page 45: SIMD and Associative Computing

45

Order that Triangles are Computed

1

5

7

6

2

3

4

Page 46: SIMD and Associative Computing

46

Performance of ASC-Quickhull

Average Case:

• Assume either of the following: – For some integer k>1, on average 1/k of the

points above each line being processed are eliminated each round.

• For example, consider k = 3, as one of three different areas are eliminated each round

– O(lg n) points are on the convex hull.• For randomly generated points, the number of

convex hull points is very close to lg(n) points.

Page 47: SIMD and Associative Computing

47

Performance of ASC-Quickhull (cont)

• Either of above assumptions imply the average running time is O(lg n). – For example, each pass through algorithm

loop produces one convex hull point.

• The average cost is O(n lg n)Worst Case:• Running time is O(n).• Cost is O(n2)

Recall: The definition of cost is Cost = (running time) (nr. of processors)

Page 48: SIMD and Associative Computing

48

Additional Comments on ASC Convex Hull Algorithm

• The full “convex hull” algorithm requires that an order (e.g., clockwise) list of convex hull points be returned.– Preceding algorithms for ASC and MASC can be

extended to handle this.

• This detail is omitted here to keep the algorithms simpler.– More information can be found in the paper “An

Associative Implementation of Classical Convex Hull Algorithms” by Atwah, Baker, and Akl and in Maher Atwah’s master’s thesis at KSU.


Recommended