c-Perfect Hashing Schemes for Arrays, with Applications to Parallel Memories
G. Cordasco (1), A. Negro (1), A. L. Rosenberg (2) and V. Scarano (1)
(1) Dipartimento di Informatica ed Applicazioni "R.M. Capocelli", Università di Salerno, 84081 Baronissi (SA), Italy
(2) Dept. of Computer Science, University of Massachusetts at Amherst, Amherst, MA 01003, USA
Workshop on Distributed Data and Structures 2003
Summary
The Problem
The Motivation and some examples
Our results
Conclusion
The problem
Mapping nodes of a data structure onto a parallel memory system in such a way that data can be efficiently accessed by templates.
Data structures (D): arrays, trees, hypercubes, … Parallel memory systems (PM):
[Figure: a data structure D mapped onto a parallel memory system PM — processors P0, P1, …, P(P−1) connected to memory modules M0, M1, …, M(M−1).]
The problem (2)
Mapping nodes of a data structure onto a parallel memory system in such a way that data can be efficiently accessed by templates.
Templates: distinct sets of nodes (for arrays: rows, columns, diagonals, submatrices; for trees: subtrees, simple paths, levels; etc.).
Formally: let GD be the graph that describes the data structure D. A template t for D is defined as a subgraph of GD, and each subgraph of GD isomorphic to t is called an instance of the t-template.
Efficiently: few (or no) conflicts (i.e. few processors need to access the same memory module at the same time).
Properties a mapping algorithm should have:
Efficient: the number of conflicts for each instance of the considered template type should be minimized.
Versatile: it should allow efficient data access by an algorithm that uses different templates.
Balanced memory load: it should balance the number of data items stored in each memory module.
Fast memory-address retrieval: algorithms for retrieving the memory module where a given data item is stored should be simple and fast.
Efficient memory use: for fixed-size templates, it should use the same number of memory modules as the template size.
Why the versatility?
Multi-programmed parallel machines: different sets of processors run different programs and access different templates.
Algorithms using different templates: in manipulating arrays, for example, accessing lines (i.e. rows and columns) is common.
Composite templates: some templates are "composite", e.g. the range-query template, consisting of a path with complete subtrees attached.
A versatile algorithm is likely to perform well on such templates.
Previous results: Array
Research in this field originated with strategies for mapping two-dimensional arrays into parallel memories.
Euler latin squares (1778): conflict-free (CF) access to lines (rows and columns).
Budnik, Kuck (IEEE TC 1971): skewing distance. CF access to lines and some subblocks.
Lawrie (IEEE TC 1975): skewing distance. CF access to lines and main diagonals; requires 2N modules for N data items.
Previous results: Array(2)
Colbourn, Heinrich (JPDC 1992): latin squares. CF access to arbitrary subarrays: for r > s, the r × s and s × r subarrays.
Lower Bound: any mapping algorithm for arrays that is CF for lines and for the r × s and s × r subarrays (r > s) requires n > rs + s memory modules.
Corollary: more than one-seventh of the memory modules are idle under any mapping that is CF for lines and for the r × s and s × r subarrays (r > s).
Previous results: Array (3)
Kim, Prasanna Kumar (IEEE TPDS 1993): latin squares. CF access to lines and main squares (i.e. the √N × √N subarrays whose top-left item ⟨i, j⟩ satisfies i ≡ 0 mod √N and j ≡ 0 mod √N).
Perfect latin squares: main diagonals are also CF.
Every √N × √N subarray has at most 4 conflicts (it intersects at most 4 main squares).
A 4 × 4 example:
0 1 2 3
2 3 0 1
3 2 1 0
1 0 3 2
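The conflict-freedom that a latin square buys can be checked mechanically: each module index appears exactly once per row and per column, so the processors touching a line all hit distinct modules. A small sketch of that check on the 4 × 4 square above (the helper name is mine, not from the talk):

```python
from collections import Counter

# The 4x4 square from the slide; entry square[i][j] is the memory
# module storing array item <i, j>.
square = [
    [0, 1, 2, 3],
    [2, 3, 0, 1],
    [3, 2, 1, 0],
    [1, 0, 3, 2],
]
n = len(square)

def max_conflicts(template):
    """Worst number of template cells that collide on one module."""
    return max(Counter(square[i][j] for i, j in template).values())

rows = [[(i, j) for j in range(n)] for i in range(n)]
cols = [[(i, j) for i in range(n)] for j in range(n)]
diag = [(i, i) for i in range(n)]
anti = [(i, n - 1 - i) for i in range(n)]

# Rows and columns are conflict-free (latin square property)...
assert all(max_conflicts(t) == 1 for t in rows + cols)
# ...and so are both main diagonals, so this square is perfect.
assert max_conflicts(diag) == 1 and max_conflicts(anti) == 1
```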
Previous results: Array (4)
Das, Sarkar (SPDP 1994): quasi-groups (or groupoids).
Fan, Gupta, Liu (IPPS 1994): latin cubes.
These sources offer strategies that scale with the number of memory modules, so that the number of available modules can change with the problem instance.
Our results: Templates
We study templates that can be viewed as generalizing array-blocks and "paths" originating from the origin vertex ⟨0, 0⟩.
Chaotic array (C): a (two-dimensional) chaotic array C is an undirected graph whose vertex-set VC is a subset of N × N that is order-closed, in the sense that for each vertex ⟨r, s⟩ ≠ ⟨0, 0⟩ of C, the set VC ∩ {⟨r−1, s⟩, ⟨r, s−1⟩} is nonempty.
Our results: Templates (2)
Ragged array (R): a (two-dimensional) ragged array R is a chaotic array whose vertex-set VR satisfies the following:
if ⟨v1, v2⟩ ∈ VR, then {v1} × [v2] ⊆ VR;
if ⟨v1, 0⟩ ∈ VR, then [v1] × {0} ⊆ VR.
(For each n ∈ N, [n] denotes the set {0, 1, …, n−1}.)
Motivation: pixel maps; lists of names where the table changes shape dynamically; relational tables in a relational database.
Our results: Templates (3)
Rectangular array (S): a (two-dimensional) rectangular array S is a ragged array whose vertex-set has the form [a] × [b] for some a, b ∈ N.
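The closure conditions defining the three families are easy to state operationally; a sketch (function names are mine, not from the talk), reading order-closed as "every non-origin vertex has a predecessor ⟨r−1, s⟩ or ⟨r, s−1⟩ in the set":

```python
# Membership tests for the three template families, over a finite
# vertex set V of (row, col) pairs drawn from N x N.
def is_chaotic(V):
    # Order-closed: every vertex except <0,0> has <r-1,s> or <r,s-1> in V.
    return (0, 0) in V and all(
        v == (0, 0) or (v[0] - 1, v[1]) in V or (v[0], v[1] - 1) in V
        for v in V)

def is_ragged(V):
    # Each row is a contiguous prefix starting at column 0, and
    # column 0 reaches every occupied row.
    return is_chaotic(V) and all(
        {(r, j) for j in range(c + 1)} <= V and
        {(i, 0) for i in range(r + 1)} <= V
        for (r, c) in V)

def is_rectangular(V):
    # V = [a] x [b] for some a, b.
    a = 1 + max(r for r, _ in V)
    b = 1 + max(c for _, c in V)
    return V == {(i, j) for i in range(a) for j in range(b)}

# The containments are strict: a "hook" is ragged but not rectangular,
# and a diagonal staircase is chaotic but not ragged.
hook = {(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)}
staircase = {(0, 0), (0, 1), (1, 1)}
assert is_ragged(hook) and not is_rectangular(hook)
assert is_chaotic(staircase) and not is_ragged(staircase)
```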
Our results: Templates (4)
[Figure: nesting of the families — Rectangular Arrays ⊆ Ragged Arrays ⊆ Chaotic Arrays.]
Our results: c-contraction
For any integer c > 0, a c-contraction of an array A is a graph G that is obtained from A as follows.
1. Rename A as G(0); set k = 0.
2. Pick a set S of c vertices of G(k) that were vertices of A. Replace these vertices by a single vertex vS; replace all edges of G(k) that are incident to vertices of S by edges incident to vS. The graph so obtained is G(k+1).
3. Iterate step 2 some number of times; G is the final G(k).
[Figure: an example contraction A = G(0) → G(1) → G(2) = G.]
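One round of the contraction step can be sketched directly on a set-based graph representation (a toy sketch under my own naming, not the talk's):

```python
# One c-contraction step: merge a set S of c original vertices of G(k)
# into a single super-vertex, rerouting incident edges.
def contract(vertices, edges, S, new_name):
    """vertices: set of hashable names; edges: set of frozenset pairs.
    S: the c vertices to merge; new_name: label of the merged vertex."""
    vertices = (vertices - S) | {new_name}
    new_edges = set()
    for e in edges:
        a, b = tuple(e)
        a = new_name if a in S else a
        b = new_name if b in S else b
        if a != b:                     # drop edges internal to S
            new_edges.add(frozenset((a, b)))
    return vertices, new_edges

# A 2x2 array; contract its bottom row (c = 2).
V = {(0, 0), (0, 1), (1, 0), (1, 1)}
E = {frozenset(p) for p in [((0, 0), (0, 1)), ((1, 0), (1, 1)),
                            ((0, 0), (1, 0)), ((0, 1), (1, 1))]}
V2, E2 = contract(V, E, {(1, 0), (1, 1)}, "s")
assert V2 == {(0, 0), (0, 1), "s"}
assert E2 == {frozenset(((0, 0), (0, 1))),
              frozenset(((0, 0), "s")), frozenset(((0, 1), "s"))}
```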
Our results: some definitions
Our results are achieved by proving a (more general) result, of independent interest, about the sizes of graphs that are "almost" perfect universal for arrays.
A graph Gc = (Vc, Ec) is c-perfect-universal for the family An if for each array A ∈ An there exists a c-contraction of A that is a labeled subgraph of Gc.
(An denotes the family of arrays having n or fewer vertices.)
Our results: c-perfection number
The c-perfection number for An, denoted Perfc(An), is the size of the smallest graph that is c-perfect-universal for the family An.
Theorem (F.R.K. Chung, A.L. Rosenberg, L. Snyder, 1983)
Perf1(Cn) = n(n+1)/2
Perf1(Rn) = Θ(n^(3/2))
Perf1(Sn) = n
Our results
Theorem. For all integers c and n, letting X ambiguously denote Cn and Rn, we have
Perfc(X) ≥ (n+1)/(2c) + (n+1)/(2(c+1)) − 1.
Theorem. For all integers c and n, Perfc(Cn) ≤ ⌈(n+1)/c⌉².
Theorem. For all integers c and n, Perfc(Rn) ≤ ⌈(n+1)/c⌉ · ⌈√((n+1)/c)⌉.
Theorem. For all integers c and n, Perfc(Sn) = ⌈n/c⌉.
Our results: A Lower Bound on Perfc(Cn) and Perfc(Rn)
Size(Un) ≥ (n+1)/(2c) + (n+1)/(2(c+1)) − 1.
[Figure: the witness array Un, anchored at ⟨0, 0⟩, with a row reaching ⟨0, n/2⟩ and a column of height about n/(2(c+1)).]
Our results: An Upper Bound on Perfc(Cn)
Perfc(Cn) ≤ ⌈(n+1)/c⌉².
Let d = ⌈(n+1)/c⌉ and Vd = { v = ⟨v1, v2⟩ | v1 < d and v2 < d }.
For each vertex v = ⟨v1, v2⟩:
if v ∈ Vd then color(v) = v1 · d + v2;
if v ∉ Vd then color(v) = color(⟨v1 mod d, v2 mod d⟩).
[Figure: the tile Vd anchored at ⟨0, 0⟩, with d = 3.]
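The periodic coloring can be written out in a few lines. As a sanity check of the tiling idea, under the assumption d = ⌈(n+1)/c⌉ (rounding up, as in the slide's example), no color class of a chaotic array with at most n vertices should exceed c members; the staircase example array below is mine, using the slide's n = 13, c = 5:

```python
from collections import Counter
from math import ceil

def color(v, d):
    # Tile N x N by the d x d block Vd = [d] x [d]; inside Vd the color
    # is v1*d + v2, and every other vertex copies its representative mod d.
    return (v[0] % d) * d + (v[1] % d)

n, c = 13, 5
d = ceil((n + 1) / c)          # d = 3, so at most d*d = 9 colors are used

# A chaotic array with 13 vertices: a monotone staircase from <0,0>.
path = [(0, 0), (0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4),
        (3, 4), (3, 5), (4, 5), (4, 6), (5, 6), (5, 7)]

classes = Counter(color(v, d) for v in path)
# Each color class has at most c members, so merging each class
# yields a c-contraction that fits inside the d x d tile.
assert max(classes.values()) <= c
assert len(classes) <= d * d
```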
Our results: An Upper Bound on Perfc(Cn) (2)
Perfc(Cn) ≤ Size(Vd) = d² = ⌈(n+1)/c⌉².
[Figure: example with n = 13, c = 5, hence d = ⌈(n+1)/c⌉ = 3; the tile Vd anchored at ⟨0, 0⟩.]
Conclusions
A mapping strategy for versatile templates with a given “conflict tolerance” c.
c-Perfect Hashing Schemes for Binary Trees, with Applications to Parallel Memories (accepted for EuroPar 2003):
2^((n+1)/(c+1)) ≤ Perfc(Tn) ≤ c · 2^((1+1/c)(n+1)/(c+1))
Future Work: matching lower and upper bounds for binary trees.