c-Perfect Hashing Schemes for Arrays, with Applications to Parallel Memories
G. Cordasco (1), A. Negro (1), A. L. Rosenberg (2) and V. Scarano (1)
(1) Dipartimento di Informatica ed Applicazioni "R.M. Capocelli", Università di Salerno, 84081 Baronissi (SA), Italy
(2) Dept. of Computer Science, University of Massachusetts at Amherst, Amherst, MA 01003, USA
Workshop on Distributed Data and Structures 2003
Summary
The Problem
The Motivation and some examples
Our results
Conclusion
The problem
Mapping nodes of a data structure onto a parallel memory system in such a way that data can be efficiently accessed by templates.
Data structures (D): arrays, trees, hypercubes, … Parallel memory systems (PM):
[Figure: a data structure D mapped onto a parallel memory system PM — processors P0, P1, …, P(P−1) connected to memory modules M0, M1, …, M(M−1).]
The problem (2)
Mapping nodes of a data structure onto a parallel memory system in such a way that data can be efficiently accessed by templates.
Templates: distinct sets of nodes (for arrays: rows, columns, diagonals, submatrices; for trees: subtrees, simple paths, levels; etc.).
Formally: let GD be the graph that describes the data structure D. A template t for D is defined as a subgraph of GD, and each subgraph of GD isomorphic to t is called an instance of the t-template.
Efficiently: few (or no) conflicts (i.e. few processors need to access the same memory module at the same time).
Properties a mapping algorithm should have:
Efficient: the number of conflicts for each instance of the considered template type should be minimized.
Versatile: it should allow efficient data access by an algorithm that uses different templates.
Balanced memory load: it should balance the number of data items stored in each memory module.
Fast memory-address retrieval: algorithms for retrieving the memory module where a given data item is stored should be simple and fast.
Efficient memory use: for fixed-size templates, it should use the same number of memory modules as the template size.
Why the versatility?
Multi-programmed parallel machines: different sets of processors run different programs and access different templates.
Algorithms using different templates: in manipulating arrays, for example, accessing lines (i.e. rows and columns) is common.
Composite templates: some templates are "composite", e.g. the range-query template, consisting of a path with complete subtrees attached.
A versatile algorithm is likely to perform well on such templates.
Previous results: Array
Research in this field originated with strategies for mapping two-dimensional arrays into parallel memories.
Euler latin squares (1778): conflict-free (CF) access to lines (rows and columns).
Budnik, Kuck (IEEE TC 1971): skewing distance. CF access to lines and some subblocks.
Lawrie (IEEE TC 1975): skewing distance. CF access to lines and main diagonals; requires 2N modules for N data items.
Previous results: Array(2)
Colbourn, Heinrich (JPDC 1992): latin squares. CF access to arbitrary subarrays: for r > s, the r × s and s × r subarrays.
Lower Bound: any mapping algorithm for arrays that is CF for lines and for the r × s and s × r subarrays (r > s) requires n > rs + s memory modules.
Corollary: more than one-seventh of the memory modules are idle under any mapping that is CF for lines and for the r × s and s × r subarrays (r > s).
Previous results: Array (3)
Kim, Prasanna Kumar (IEEE TPDS 1993): latin squares. CF access to lines and main squares (i.e. the √N × √N subarrays whose top-left item ⟨i, j⟩ satisfies i ≡ 0 mod √N and j ≡ 0 mod √N).
Perfect latin squares: main diagonals are also CF.
Every √N × √N subarray has at most 4 conflicts (it intersects at most 4 main squares).
A 4 × 4 example:
0 1 2 3
2 3 0 1
3 2 1 0
1 0 3 2
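The conflict-freedom that a latin square buys can be checked mechanically: each module index appears exactly once per row and per column, so the processors touching a line all hit distinct modules. A small sketch of that check on the 4 × 4 square above (the helper name is mine, not from the talk):

```python
from collections import Counter

# The 4x4 square from the slide; entry square[i][j] is the memory
# module storing array item <i, j>.
square = [
    [0, 1, 2, 3],
    [2, 3, 0, 1],
    [3, 2, 1, 0],
    [1, 0, 3, 2],
]
n = len(square)

def max_conflicts(template):
    """Worst number of template cells that collide on one module."""
    return max(Counter(square[i][j] for i, j in template).values())

rows = [[(i, j) for j in range(n)] for i in range(n)]
cols = [[(i, j) for i in range(n)] for j in range(n)]
diag = [(i, i) for i in range(n)]
anti = [(i, n - 1 - i) for i in range(n)]

# Rows and columns are conflict-free (latin square property)...
assert all(max_conflicts(t) == 1 for t in rows + cols)
# ...and so are both main diagonals, so this square is perfect.
assert max_conflicts(diag) == 1 and max_conflicts(anti) == 1
```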
Previous results: Array (4)
Das, Sarkar (SPDP 1994): quasi-groups (or groupoids).
Fan, Gupta, Liu (IPPS 1994): latin cubes.
These sources offer strategies that scale with the number of memory modules, so that the number of available modules can change with the problem instance.
Our results: Templates
We study templates that can be viewed as generalizing array-blocks and "paths" originating from the origin vertex ⟨0, 0⟩.
Chaotic array (C): a (two-dimensional) chaotic array C is an undirected graph whose vertex-set VC is a subset of N × N that is order-closed, in the sense that for each vertex ⟨r, s⟩ ≠ ⟨0, 0⟩ of C, the set VC ∩ {⟨r−1, s⟩, ⟨r, s−1⟩} is nonempty.
Our results: Templates (2)
Ragged array (R): a (two-dimensional) ragged array R is a chaotic array whose vertex-set VR satisfies the following:
if ⟨v1, v2⟩ ∈ VR, then {v1} × [v2] ⊆ VR;
if ⟨v1, 0⟩ ∈ VR, then [v1] × {0} ⊆ VR.
(For each n ∈ N, [n] denotes the set {0, 1, …, n−1}.)
Motivation: pixel maps; lists of names where the table changes shape dynamically; relational tables in a relational database.
Our results: Templates (3)
Rectangular array (S): a (two-dimensional) rectangular array S is a ragged array whose vertex-set has the form [a] × [b] for some a, b ∈ N.
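The closure conditions defining the three families are easy to state operationally; a sketch (function names are mine, not from the talk), reading order-closed as "every non-origin vertex has a predecessor ⟨r−1, s⟩ or ⟨r, s−1⟩ in the set":

```python
# Membership tests for the three template families, over a finite
# vertex set V of (row, col) pairs drawn from N x N.
def is_chaotic(V):
    # Order-closed: every vertex except <0,0> has <r-1,s> or <r,s-1> in V.
    return (0, 0) in V and all(
        v == (0, 0) or (v[0] - 1, v[1]) in V or (v[0], v[1] - 1) in V
        for v in V)

def is_ragged(V):
    # Each row is a contiguous prefix starting at column 0, and
    # column 0 reaches every occupied row.
    return is_chaotic(V) and all(
        {(r, j) for j in range(c + 1)} <= V and
        {(i, 0) for i in range(r + 1)} <= V
        for (r, c) in V)

def is_rectangular(V):
    # V = [a] x [b] for some a, b.
    a = 1 + max(r for r, _ in V)
    b = 1 + max(c for _, c in V)
    return V == {(i, j) for i in range(a) for j in range(b)}

# The containments are strict: a "hook" is ragged but not rectangular,
# and a diagonal staircase is chaotic but not ragged.
hook = {(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)}
staircase = {(0, 0), (0, 1), (1, 1)}
assert is_ragged(hook) and not is_rectangular(hook)
assert is_chaotic(staircase) and not is_ragged(staircase)
```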
Our results: Templates (4)
[Figure: nesting of the families — Rectangular Arrays ⊆ Ragged Arrays ⊆ Chaotic Arrays.]
Our results: c-contraction
For any integer c > 0, a c-contraction of an array A is a graph G that is obtained from A as follows.
1. Rename A as G(0); set k = 0.
2. Pick a set S of c vertices of G(k) that were vertices of A. Replace these vertices by a single vertex vS; replace all edges of G(k) that are incident to vertices of S by edges incident to vS. The graph so obtained is G(k+1).
3. Iterate step 2 some number of times; G is the final G(k).
[Figure: an example contraction A = G(0) → G(1) → G(2) = G.]
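One round of the contraction step can be sketched directly on a set-based graph representation (a toy sketch under my own naming, not the talk's):

```python
# One c-contraction step: merge a set S of c original vertices of G(k)
# into a single super-vertex, rerouting incident edges.
def contract(vertices, edges, S, new_name):
    """vertices: set of hashable names; edges: set of frozenset pairs.
    S: the c vertices to merge; new_name: label of the merged vertex."""
    vertices = (vertices - S) | {new_name}
    new_edges = set()
    for e in edges:
        a, b = tuple(e)
        a = new_name if a in S else a
        b = new_name if b in S else b
        if a != b:                     # drop edges internal to S
            new_edges.add(frozenset((a, b)))
    return vertices, new_edges

# A 2x2 array; contract its bottom row (c = 2).
V = {(0, 0), (0, 1), (1, 0), (1, 1)}
E = {frozenset(p) for p in [((0, 0), (0, 1)), ((1, 0), (1, 1)),
                            ((0, 0), (1, 0)), ((0, 1), (1, 1))]}
V2, E2 = contract(V, E, {(1, 0), (1, 1)}, "s")
assert V2 == {(0, 0), (0, 1), "s"}
assert E2 == {frozenset(((0, 0), (0, 1))),
              frozenset(((0, 0), "s")), frozenset(((0, 1), "s"))}
```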
Our results: some definitions
Our results are achieved by proving a (more general) result, of independent interest, about the sizes of graphs that are "almost" perfect universal for arrays.
A graph Gc = (Vc, Ec) is c-perfect-universal for the family An if for each array A ∈ An there exists a c-contraction of A that is a labeled subgraph of Gc.
(An denotes the family of arrays having n or fewer vertices.)
Our results: c-perfection number
The c-perfection number for An, denoted Perfc(An), is the size of the smallest graph that is c-perfect-universal for the family An.
Theorem (F.R.K. Chung, A.L. Rosenberg, L. Snyder, 1983)
Perf1(Cn) = n(n+1)/2
Perf1(Rn) = Θ(n^(3/2))
Perf1(Sn) = n
Our results
Theorem. For all integers c and n, letting X ambiguously denote Cn and Rn, we have
Perfc(X) ≥ (n+1)/(2c) + (n+1)/(2(c+1)) − 1.
Theorem. For all integers c and n, Perfc(Cn) ≤ ⌈(n+1)/c⌉².
Theorem. For all integers c and n, Perfc(Rn) ≤ ⌈(n+1)/c⌉ · ⌈√((n+1)/c)⌉.
Theorem. For all integers c and n, Perfc(Sn) = ⌈n/c⌉.
Our results: A Lower Bound on Perfc(Cn) and Perfc(Rn)
Size(Un) ≥ (n+1)/(2c) + (n+1)/(2(c+1)) − 1.
[Figure: the witness array Un, anchored at ⟨0, 0⟩, with a row reaching ⟨0, n/2⟩ and a column of height about n/(2(c+1)).]
Our results: An Upper Bound on Perfc(Cn)
Perfc(Cn) ≤ ⌈(n+1)/c⌉².
Let d = ⌈(n+1)/c⌉ and Vd = { v = ⟨v1, v2⟩ | v1 < d and v2 < d }.
For each vertex v = ⟨v1, v2⟩:
if v ∈ Vd then color(v) = v1 · d + v2;
if v ∉ Vd then color(v) = color(⟨v1 mod d, v2 mod d⟩).
[Figure: the tile Vd anchored at ⟨0, 0⟩, with d = 3.]
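The periodic coloring can be written out in a few lines. As a sanity check of the tiling idea, under the assumption d = ⌈(n+1)/c⌉ (rounding up, as in the slide's example), no color class of a chaotic array with at most n vertices should exceed c members; the staircase example array below is mine, using the slide's n = 13, c = 5:

```python
from collections import Counter
from math import ceil

def color(v, d):
    # Tile N x N by the d x d block Vd = [d] x [d]; inside Vd the color
    # is v1*d + v2, and every other vertex copies its representative mod d.
    return (v[0] % d) * d + (v[1] % d)

n, c = 13, 5
d = ceil((n + 1) / c)          # d = 3, so at most d*d = 9 colors are used

# A chaotic array with 13 vertices: a monotone staircase from <0,0>.
path = [(0, 0), (0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4),
        (3, 4), (3, 5), (4, 5), (4, 6), (5, 6), (5, 7)]

classes = Counter(color(v, d) for v in path)
# Each color class has at most c members, so merging each class
# yields a c-contraction that fits inside the d x d tile.
assert max(classes.values()) <= c
assert len(classes) <= d * d
```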
Our results: An Upper Bound on Perfc(Cn) (2)
Perfc(Cn) ≤ Size(Vd) = d² = ⌈(n+1)/c⌉².
[Figure: example with n = 13, c = 5, hence d = ⌈(n+1)/c⌉ = 3; the tile Vd anchored at ⟨0, 0⟩.]
Conclusions
A mapping strategy for versatile templates with a given “conflict tolerance” c.
c-Perfect Hashing Schemes for Binary Trees, with Applications to Parallel Memories (accepted for EuroPar 2003):
2^((n+1)/(c+1)) ≤ Perfc(Tn) ≤ c · 2^((1+1/c)(n+1)/(c+1))
Future Work: matching lower and upper bounds for binary trees.