Data-driven Query Processing for Immersive Computational Turbulence
Kalin Kanov
Department of Computer Science
Johns Hopkins University
The Big Picture
Scientific disciplines have developed a computational branch
  Models without closed-form solutions are solved numerically
  This has led to an explosion of data
Simulation and analysis workloads are data-intensive
  Producing and scanning large amounts of data
Management of these data represents a significant challenge
  Storage and archiving
  Query processing
  Visualization
Remote Immersive Analysis
Formerly, analysis was performed during the computation
  No data stored for subsequent examination
Data-intensive computing breakthroughs have allowed for new interaction with scientific numerical simulations
Turbulence Database Cluster
  Stores the entire space-time evolution of the simulation
  Provides public access to world-class simulations
  Implements the “immersive turbulence”* approach
  Introduces new challenges
* E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.
Goals
Develop data-driven query processing techniques
  Reduce I/O and computation costs
  Reduce or eliminate storage overhead
  Exploit domain knowledge and structure
Provide user interfaces that are efficient and flexible
Streamline the process of data ingest
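Turbulence Database Cluster
Processing a Batch Query
[Figure: a linear strip of data atoms 0-15 and a 4 x 4 grid showing their spatial arrangement, overlaid with queries q1, q2, and q3]
  10 11 14 15
   8  9 12 13
   2  3  6  7
   0  1  4  5
Atoms read when each query is evaluated on its own:
  q1: 0 1 2 3 4 6 8 9 12
  q2: 9 11 12 14
  q3: 4 5 6 7
Redundant I/O
Multiple disk seeks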
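The cost of evaluating each query separately can be made concrete with a small sketch. It assumes that the atom numbering in the figure follows a Z-order (Morton) curve, which is consistent with the grid shown, and that each query is an axis-aligned box of atoms. The function names are hypothetical, and a "seek" is counted here simply as the start of each contiguous run of atom ids.

```python
def morton2d(x, y):
    """Interleave the bits of (x, y); x supplies the even bits (assumed layout)."""
    code = 0
    for bit in range(16):
        code |= ((x >> bit) & 1) << (2 * bit)
        code |= ((y >> bit) & 1) << (2 * bit + 1)
    return code


def atoms_for_box(x0, y0, x1, y1):
    """Sorted atom ids covered by the inclusive box [x0, x1] x [y0, y1]."""
    return sorted(morton2d(x, y)
                  for x in range(x0, x1 + 1)
                  for y in range(y0, y1 + 1))


def count_runs(atom_ids):
    """Number of contiguous runs of ids, a rough proxy for disk seeks."""
    ids = sorted(atom_ids)
    return sum(1 for i, a in enumerate(ids) if i == 0 or a != ids[i - 1] + 1)


# Query boxes chosen to reproduce the atom lists on the slide.
queries = {
    "q1": atoms_for_box(0, 0, 2, 2),
    "q2": atoms_for_box(1, 2, 2, 3),
    "q3": atoms_for_box(2, 0, 3, 1),
}

total_reads = sum(len(atoms) for atoms in queries.values())
distinct_atoms = set().union(*queries.values())
redundant_reads = total_reads - len(distinct_atoms)
seeks = sum(count_runs(atoms) for atoms in queries.values())

print(total_reads, len(distinct_atoms), redundant_reads, seeks)
```

Under these assumptions the three queries issue 17 atom reads for 13 distinct atoms, so atoms 4, 6, 9, and 12 are each read twice.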
I/O Streaming Evaluation Method
Linear data requirements of the computation allow for:
  Incremental evaluation
  Streaming over the data
  Concurrent evaluation of batch queries
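Processing a Batch Query
[Figure: a linear strip of data atoms 0-15 and a 4 x 4 grid showing their spatial arrangement, overlaid with queries q1, q2, and q3]
  10 11 14 15
   8  9 12 13
   2  3  6  7
   0  1  4  5
Atoms read in one pass, with the queries each atom serves:
  0, 1, 2, 3: q1
  4: q1, q3
  5: q3
  6: q1, q3
  7: q3
  8: q1
  9: q1, q2
  11: q2
  12: q1, q2
  14: q2
I/O Streaming:
  Sequential I/O
  Single pass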
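A minimal sketch of the streaming idea, assuming each query's result is a running sum over the atoms it touches (which is what makes incremental evaluation possible). The names `stream_batch`, `read_atom`, and `accumulate` are hypothetical stand-ins, and `read_atom` here returns a dummy value rather than real simulation data.

```python
from collections import defaultdict

# Atom lists for the three queries in the figure.
queries = {
    "q1": [0, 1, 2, 3, 4, 6, 8, 9, 12],
    "q2": [9, 11, 12, 14],
    "q3": [4, 5, 6, 7],
}


def stream_batch(queries, read_atom, accumulate):
    """Evaluate all queries in one sequential pass over the needed atoms."""
    # Invert the batch: for every atom, which queries need it?
    wanted = defaultdict(list)
    for qid, atoms in queries.items():
        for atom_id in atoms:
            wanted[atom_id].append(qid)

    partial = {qid: 0.0 for qid in queries}
    reads = 0
    for atom_id in sorted(wanted):          # ascending order gives sequential I/O
        data = read_atom(atom_id)           # each atom is read exactly once
        reads += 1
        for qid in wanted[atom_id]:         # update every query that uses it
            partial[qid] = accumulate(partial[qid], data)
    return partial, reads


# Stand-ins: the "data" of an atom is its id, and the kernel is a plain sum.
partial, reads = stream_batch(
    queries,
    read_atom=lambda atom_id: float(atom_id),
    accumulate=lambda acc, data: acc + data,
)
print(partial, reads)
```

With these stand-ins the pass performs 13 reads, one per distinct atom, instead of the 17 reads needed when the queries are evaluated one at a time.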
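Lagrange Polynomial Interpolation

$$
f(x', y') = \sum_{j=1}^{N} l_y^{\,p-\frac{N}{2}+j}(y') \sum_{i=1}^{N} l_x^{\,n-\frac{N}{2}+i}(x') \cdot f\!\left(x_{n-\frac{N}{2}+i},\; y_{p-\frac{N}{2}+j}\right)
$$

Here the $l_x$ and $l_y$ factors are the Lagrange coefficients and the $f(\cdot,\cdot)$ values are the data.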
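The double sum can be sketched in a few lines of NumPy. This assumes a uniform grid with spacing `dx` in both directions, that `n` and `p` index the grid node at or just below the query point, and that the stencil lies entirely inside the array (no periodic wrap-around). The function names are hypothetical.

```python
import numpy as np


def lagrange_weights(nodes, x):
    """Lagrange coefficients l_i(x) = prod_{k != i} (x - x_k) / (x_i - x_k)."""
    weights = np.ones(len(nodes))
    for i, xi in enumerate(nodes):
        for k, xk in enumerate(nodes):
            if k != i:
                weights[i] *= (x - xk) / (xi - xk)
    return weights


def interpolate2d(f, dx, x, y, N=4):
    """N-point Lagrange interpolation of f[ix, iy] at the point (x, y)."""
    n = int(np.floor(x / dx))
    p = int(np.floor(y / dx))
    ix = n - N // 2 + np.arange(1, N + 1)   # indices n - N/2 + i, i = 1..N
    iy = p - N // 2 + np.arange(1, N + 1)   # indices p - N/2 + j, j = 1..N
    lx = lagrange_weights(ix * dx, x)
    ly = lagrange_weights(iy * dx, y)
    block = f[np.ix_(ix, iy)]               # the N x N patch of data
    return ly @ (lx @ block)                # sum_j l_y^j * sum_i l_x^i * f_ij


# Check on a function that is cubic in each variable, where N = 4 is exact.
dx = 0.5
grid = np.arange(16) * dx
X, Y = np.meshgrid(grid, grid, indexing="ij")
field = X**3 + 2 * X * Y**2 - Y

value = interpolate2d(field, dx, 3.3, 4.1)
exact = 3.3**3 + 2 * 3.3 * 4.1**2 - 4.1
print(value, exact)
```

Because the coefficients depend only on the query point and the data enter linearly, the contribution of each data value can be added to the result as soon as that value is read.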
Spatial Differentiation

$$
\left.\frac{df}{dx}\right|_{x_n} = \frac{1}{12\,\Delta x} f(x_{n-2}) - \frac{2}{3\,\Delta x} f(x_{n-1}) + \frac{2}{3\,\Delta x} f(x_{n+1}) - \frac{1}{12\,\Delta x} f(x_{n+2})
$$
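The stencil translates directly into array slices. The following is a minimal sketch for a 1D array on a uniform grid; the function name is hypothetical, and boundary nodes without a full stencil are left as NaN rather than handled with periodic wrap-around.

```python
import numpy as np


def ddx_centered4(f, dx):
    """Fourth-order centered difference; the first and last two nodes stay NaN."""
    df = np.full(f.shape, np.nan, dtype=float)
    df[2:-2] = (
        (1.0 / 12.0) * f[:-4]      # f(x_{n-2})
        - (2.0 / 3.0) * f[1:-3]    # f(x_{n-1})
        + (2.0 / 3.0) * f[3:-1]    # f(x_{n+1})
        - (1.0 / 12.0) * f[4:]     # f(x_{n+2})
    ) / dx
    return df


# The stencil is exact for cubic polynomials, which gives an easy check.
x = np.linspace(0.0, 1.0, 11)
dx = x[1] - x[0]
derivative = ddx_centered4(x**3, dx)
print(derivative[2:-2])
print(3 * x[2:-2] ** 2)
```

Like the interpolation kernel, the derivative is a fixed weighted sum of data values, so it can be accumulated incrementally as atoms are streamed.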
Derivative Interpolation

$$
\left.\frac{df}{dx}\right|_{x_n} = \frac{1}{12\,\Delta x} f(x_{n-2}) - \frac{2}{3\,\Delta x} f(x_{n-1}) + \frac{2}{3\,\Delta x} f(x_{n+1}) - \frac{1}{12\,\Delta x} f(x_{n+2})
$$
128 Workload
Over an order of magnitude improvement
Sorting leads to more sequential access
Join/Order By executes the entire batch as a join
I/O Streaming
  Each atom is read only once
  Effective cache usage
I/O Streaming alleviates the I/O bottleneck
Computation emerges as the more costly operation
Particle Tracking
Web Server/Mediator
  Distribute points based on $x_p(t_m)$
DB Node 1 ... DB Node N
  Storage Layer: retrieve $u(x_p(t_m), t_m)$
  Computational Module:
  $$x_p^*(t_m) = x_p(t_m) + u(x_p(t_m), t_m)\,\Delta t_p$$
[Diagram: arrows between the mediator and the DB nodes are labeled $x_p(t_m)$ and $x_p^*(t_m)$]
Particle Tracking
Web Server/Mediator
  Distribute points based on $x_p^*(t_m)$
DB Node 1 ... DB Node N
  Storage Layer: retrieve $u(x_p^*(t_m), t_{m+1})$
  Computational Module:
  $$x_p(t_{m+1}) = \frac{x_p(t_m) + x_p^*(t_m) + u(x_p^*(t_m), t_{m+1})\,\Delta t_p}{2}$$
[Diagram: arrows between the mediator and the DB nodes are labeled $x_p^*(t_m)$ and $x_p(t_{m+1})$]
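The two particle tracking slides together describe a predictor-corrector time step. The sketch below shows only the arithmetic of one step. It assumes a hypothetical `velocity(x, t)` callable that stands in for the storage-layer retrieval of $u$, and it ignores the distribution of points across DB nodes entirely.

```python
import numpy as np


def advance_particle(x, t, dt, velocity):
    """One predictor-corrector step for a particle position x at time t."""
    # Predictor: x* = x + u(x, t) * dt
    x_star = x + velocity(x, t) * dt
    # Corrector: x(t + dt) = (x + x* + u(x*, t + dt) * dt) / 2
    return 0.5 * (x + x_star + velocity(x_star, t + dt) * dt)


def rotation(x, t):
    """Stand-in velocity field (solid-body rotation), not simulation data."""
    return np.array([-x[1], x[0]])


start = np.array([1.0, 0.0])
end = advance_particle(start, t=0.0, dt=0.1, velocity=rotation)
print(end)
```

Each step needs two velocity lookups, one at $x_p(t_m)$ and one at the predicted position $x_p^*(t_m)$, and the second lookup may land on a different DB node from the first. That is why the points are redistributed between the two stages.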
Summary and Future Work
Extend the I/O streaming technique to different decomposable kernel computations:
  Differentiation
  Spatial interpolation
  Temporal interpolation
  Filtering and coarse-graining
Provide a flexible user interface
  Allow for different filter functions
  Allow for new kernel computations
Improve the particle tracking routine
  Reduce communication between mediator and DB nodes
  Asynchronous processing
  Caching and pre-fetching