Post on 31-Dec-2015
Garching – 01.06.05 SCCS-Kolloquium
Cache Efficient Data Structures and Algorithms for
d-Dimensional Problems
Judith Hartmann, TU München, Institut für Informatik V
hartmanj@in.tum.de
SCCS 01.06.05 Cache Efficient Data Structures and Algorithms for d-Dimensional Problems
Overview
• cache hierarchies and data locality
• discretization by space-filling curve
• system of stacks
• measured results
• concluding remarks
Cache Hierarchy
[Figure: performance gap between CPU and memory, 1980 to 2004, after Hennessy/Patterson; vertical axis: performance relative to 1980 (= 1), from 1 to 100000]
[Figure: cache hierarchy: CPU with registers, L1 data and L1 instruction caches, L2 cache, L3 cache, main memory]
Cache Efficiency as Main Objective
• algorithm for solving PDEs
– modern concepts (finite elements, multigrid, etc.)
– cache-efficiency
• road to cache-efficiency
– space-filling curve for discretization of the domain
– data transport via stack system
Iterative Construction of the Peano Curve
• leitmotif L
• for each dimension: decomposition of all diagonals
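The cell order induced by the curve can be made concrete with the classical digit-based description of the Peano curve. The sketch below is an illustration only, not the construction shown in the talk: the names `peano_cell` and `peano_order`, the parameter `k` (number of refinement levels) and the digit-reflection rule are assumptions taken from the standard definition of the curve.

```python
def peano_cell(t, d, k):
    """Map a curve index t in [0, 3**(d*k)) to a cell of a d-dim. grid
    with 3**k cells per direction (hypothetical helper)."""
    # Ternary digits of t, most significant digit first.
    digits = []
    for _ in range(d * k):
        digits.append(t % 3)
        t //= 3
    digits.reverse()

    coords = [0] * d
    # other_sum[m]: sum of the digits seen so far that belong to dimensions != m.
    other_sum = [0] * d
    for j, digit in enumerate(digits):
        m = j % d  # the digits are dealt out to the dimensions in turn
        # Reflect the digit when the other dimensions have an odd digit sum.
        value = digit if other_sum[m] % 2 == 0 else 2 - digit
        coords[m] = 3 * coords[m] + value
        for other in range(d):
            if other != m:
                other_sum[other] += digit
    return tuple(coords)


def peano_order(d, k):
    """All cells of the grid in the order in which the curve visits them."""
    return [peano_cell(t, d, k) for t in range(3 ** (d * k))]
```

For `d = 2` and `k = 1` this gives the serpentine order (0,0), (0,1), (0,2), (1,2), (1,1), (1,0), (2,0), (2,1), (2,2). In every dimension, consecutive cells share a face, which is the locality the method relies on.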
Important Properties for the Construction of a Stack System
• projection property
• palindrome property
• recursivity in dimension
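The palindrome property can be illustrated in two dimensions. The sketch below assumes one reading of it: the cells on the two sides of a grid line are visited in opposite orders, so data shared across that line can be stored and retrieved last-in, first-out. The names `peano_2d` and `line_is_stack_compatible` are hypothetical, and using one stack per grid line is a simplification for the purpose of the check, not the stack system of the talk.

```python
def peano_2d(k):
    """Cells of a 3^k x 3^k grid in Peano order (recursive construction)."""
    if k == 0:
        return [(0, 0)]
    sub = peano_2d(k - 1)
    n = 3 ** (k - 1)
    order = []
    for bx in range(3):
        # Block columns are traversed alternately upwards and downwards.
        block_rows = range(3) if bx % 2 == 0 else range(2, -1, -1)
        for by in block_rows:
            for x, y in sub:
                # Mirror the sub-curve so that consecutive blocks connect.
                xx = n - 1 - x if by % 2 else x
                yy = n - 1 - y if bx % 2 else y
                order.append((bx * n + xx, by * n + yy))
    return order


def line_is_stack_compatible(order, c):
    """Check LIFO access for the grid line between columns c and c + 1."""
    stack = []
    for x, y in order:
        if x not in (c, c + 1):
            continue
        if stack and stack[-1] == y:
            stack.pop()       # second visit: the item must lie on top
        else:
            stack.append(y)   # first visit: the item is stored
    return not stack          # empty at the end <=> pure stack access


peano = peano_2d(2)
row_major = [(x, y) for x in range(9) for y in range(9)]
peano_ok = all(line_is_stack_compatible(peano, c) for c in range(8))
row_major_ok = all(line_is_stack_compatible(row_major, c) for c in range(8))
```

Under these assumptions `peano_ok` is true while `row_major_ok` is false: a column-by-column sweep visits both sides of a line in the same order and would need a queue rather than a stack.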
Organization of a d-Dimensional Iteration
[Figure: a d-dim. process with a d-dim. input stack and a d-dim. output stack; (d-1)-dim. processes with a (d-1)-dim. system of stacks]
[Figure: 3 x 3 grid cells numbered along the curve: 1 2 3 / 6 5 4 / 7 8 9]
• induced discretization of the unit square and order for the grid cells
• processing of data
Necessary Properties of a Useful System of Stacks
• constant number of stacks
• deterministic data access
• dimensional recursivity
• self similarity
Number of Stacks
• number of stacks for several dimensionalities
• number of i-dimensional stacks for a d-dimensional iteration
type of stack   0D   1D   2D   3D   4D   total
d=1              2    2    -    -    -       4
d=2              4    4    2    -    -      10
d=3              8   12    6    2    -      28
d=4             16   32   24    8    2      82
• total number of lower dimensional stacks: 3^d - 1
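The counts in the table can be reproduced with a short calculation. The sketch assumes that each lower dimensional stack corresponds to exactly one boundary element of the d-dimensional unit cube, and that the two d-dimensional stacks are the input and the output stack. The function name `stacks_per_dimension` is hypothetical.

```python
from math import comb


def stacks_per_dimension(d):
    """Number of i-dimensional stacks for i = 0..d in a d-dimensional iteration."""
    # A d-cube has comb(d, i) * 2**(d - i) boundary elements of dimension i.
    lower = [comb(d, i) * 2 ** (d - i) for i in range(d)]
    # Assumed: one d-dimensional input stack and one d-dimensional output stack.
    return lower + [2]


for d in range(1, 5):
    counts = stacks_per_dimension(d)
    # The lower dimensional stacks add up to 3^d - 1.
    assert sum(counts[:-1]) == 3 ** d - 1
    print(f"d={d}: {counts}, total {sum(counts)}")
```

The totals come out as 4, 10, 28 and 82 for d = 1 to 4.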
Stacks and Geometrical Boundary Elements
• matching of i-dimensional stack and i-dimensional boundary element of the unit cube
• ternary encoding of geometrical boundary elements
[Figure: unit cube with its boundary elements labelled by ternary codes: 000, 001, 011, 101, 111, 100, 110; 002, 201, 021, 211, 120, 112, 102, 200, 121; 202, 221, 122]
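The labels in the figure suggest how the ternary encoding may work: a digit 0 or 1 fixes the coordinate in that direction, and a digit 2 marks a direction in which the element extends. That reading is an assumption, as is the function name `boundary_elements`. Under it, the number of digits equal to 2 is the dimension of the element, and therefore of the matching stack.

```python
from itertools import product


def boundary_elements(d):
    """Map each ternary code of length d to the dimension of its boundary element."""
    elements = {}
    for digits in product("012", repeat=d):
        code = "".join(digits)
        dimension = code.count("2")  # assumed: 2 = element extends in this direction
        if dimension < d:            # the all-2 code would be the cube itself
            elements[code] = dimension
    return elements


cube = boundary_elements(3)
corners = sorted(code for code, dim in cube.items() if dim == 0)
edges = sorted(code for code, dim in cube.items() if dim == 1)
faces = sorted(code for code, dim in cube.items() if dim == 2)
```

For the unit cube this yields 8 corners, 12 edges and 6 faces, 26 codes in total.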
Data Access
[Figure: boundary elements labelled with their ternary codes, e.g. 002, 201, 021, 211, 120, 112, 102, 200, 121, 100]
• parameters for each grid point
– status: determines the type of stack
– stack numbers: which stack of this type
    • basic stack numbers
    • mechanism of inheritance
Necessary Properties of a Useful System of Stacks
• constant number of stacks
• deterministic data access
• dimensional recursivity
• self similarity
Hardware Performance: Poisson Equation on the Unit Cube
• cache hit rates > 99.9 %
resolution         d=2      d=3      d=4      d=5      d=6
9 x ··· x 9        99.06%   99.93%   99.95%   99.99%   99.81%
27 x ··· x 27      99.85%   99.99%   99.98%   99.97%   -
81 x ··· x 81      99.98%   99.98%   99.96%   -        -
243 x ··· x 243    99.99%   99.97%   -        -        -
• processing time per degree of freedom
– decreases with increasing grid size
– increases by a factor of ~2.7 for each additional dimension
resolution         d=2          d=3          d=4          d=5
9 x ··· x 9        5.45·10^-6   1.74·10^-5   4.77·10^-5   1.47·10^-4
27 x ··· x 27      4.98·10^-6   1.35·10^-5   3.66·10^-5   1.05·10^-4
81 x ··· x 81      4.41·10^-6   1.25·10^-5   3.32·10^-5   -
243 x ··· x 243    4.32·10^-6   1.22·10^-5   -            -
• disk usage possible without performance penalty
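The growth factor per dimension can be checked against the tabulated processing times. The snippet below copies the table values and divides neighbouring columns. The names `times` and `dimension_factors` are hypothetical, and nothing is assumed about the unit of the times because only ratios are formed.

```python
# Processing time per degree of freedom, columns d = 2, 3, 4, 5 (from the table).
times = {
    9:   [5.45e-6, 1.74e-5, 4.77e-5, 1.47e-4],
    27:  [4.98e-6, 1.35e-5, 3.66e-5, 1.05e-4],
    81:  [4.41e-6, 1.25e-5, 3.32e-5],
    243: [4.32e-6, 1.22e-5],
}


def dimension_factors(row):
    """Ratio of the time per degree of freedom between dimension d + 1 and d."""
    return [later / earlier for earlier, later in zip(row, row[1:])]


for resolution, row in times.items():
    factors = ", ".join(f"{f:.2f}" for f in dimension_factors(row))
    print(f"{resolution} cells per direction: {factors}")
```

The ratios lie between roughly 2.7 and 3.2, and most of them are close to the quoted factor.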
Adaptivity: Poisson Equation on a Sphere
• no loss of cache-efficiency: cache hit rates > 99.9 %
• locally refined grids:
– computing time improves (if appropriately approximated)
– no loss of cache-efficiency
dim.   domain      resolution                        time per dof   L2 hit rate
d=3    unit cube   243 x ··· x 243                   3.53·10^-6     99.96%
       sphere      243 x ··· x 243 (regular grid)    5.66·10^-6     99.98%
       sphere      243 x ··· x 243 (only on Ω)       3.90·10^-6     99.97%
d=4    unit cube   81 x ··· x 81                     2.64·10^-5     99.96%
       sphere      81 x ··· x 81 (regular grid)      1.24·10^-4     99.96%
       sphere      81 x ··· x 81 (only on Ω)         3.83·10^-5     99.96%
Conclusion & Further Developments
• method for solving d-dimensional PDEs
– modern numerical methods
– high cache-efficiency
– complicated geometries
• further developments:
– potential for parallelization
– general system of PDEs, Stokes, Navier-Stokes, etc.
– potential for solving time-dependent PDEs in 3 space dimensions