ParaTools, Inc. 1
Kppa: The Kinetic PreProcessor Accelerated
November 15, 2012
Chemical Kinetic Simulation
• Pyrolysis and Combustion
• Air and Water Quality
• Climate Change
• Wildfires, Volcanic Eruptions
• Plastics Devolatilization
• Microorganism Growth
• Cell Biology
Solving Coupled Stiff ODE Systems is Expensive
[Figure: twelve threads, Intel® Xeon® X5650 @ 2.67GHz]
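To make the cost concrete, the sketch below uses a toy two-species linear system (assumed for illustration, not taken from any Kppa mechanism) with one fast mode (rate 1000) and one slow mode (rate 1). With a step ten times larger than the fast time scale, explicit Euler diverges, while implicit (backward) Euler stays stable but must solve a linear system with the matrix I - hJ at every step. In a real mechanism that matrix has one row per species and must be refactored whenever the Jacobian changes, which typically accounts for a large share of the run time.

```python
import numpy as np

# Toy stiff linear system y' = J y (assumed for illustration only):
# species 1 decays at rate 1000 (fast mode) and feeds species 2,
# which decays at rate 1 (slow mode).
J = np.array([[-1000.0, 0.0],
              [999.0, -1.0]])


def forward_euler(y0, h, steps):
    """Explicit Euler: y_{n+1} = y_n + h*J*y_n (cheap, but unstable if h*1000 > 2)."""
    y = y0.copy()
    for _ in range(steps):
        y = y + h * (J @ y)
    return y


def backward_euler(y0, h, steps):
    """Implicit Euler: solve (I - h*J) y_{n+1} = y_n at every step."""
    y = y0.copy()
    M = np.eye(len(y0)) - h * J   # matrix each implicit step must solve with
    for _ in range(steps):
        y = np.linalg.solve(M, y)
    return y


y0 = np.array([1.0, 0.0])
h = 0.01                           # 10x the fast time scale 1/1000
print(forward_euler(y0, h, 100))   # blows up: fast mode is amplified by |1 - 10| = 9 per step
print(backward_euler(y0, h, 100))  # bounded: tracks the slow decay of species 2
```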
Kppa: The Kinetic PreProcessor Accelerated
High-level description → Lexical parser → Analysis → Code generation
Languages: C, Fortran, Python, MATLAB
Code optimized for each architecture: serial, multi-core, GPGPU, Intel MIC
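The three stages in the diagram can be sketched in a few dozen lines of Python. This is a toy, not Kppa's implementation: the equation syntax is a simplified, KPP-like format assumed for illustration, the three-reaction mechanism and its rate constants are placeholders, and the helper names (`parse_side`, `parse`, `analyze`, `generate_c`) are hypothetical.

```python
import re

# Toy mechanism in an assumed, simplified KPP-like syntax:
#   <reactants> = <products> : <rate constant>;
# Species names and rate constants are placeholders, not a real mechanism.
MECHANISM = """
O3 + NO = NO2 + O2 : 1.8e-12;
NO2 = NO + O : 8.0e-3;
O + O2 = O3 : 6.0e-34;
"""

# One term of a reaction side: optional integer coefficient, then a species name.
TERM = re.compile(r"^\s*(\d*)\s*([A-Za-z][A-Za-z0-9_]*)\s*$")


def parse_side(side):
    """Parsing step: turn '2 NO + O3' into {'NO': 2, 'O3': 1}."""
    counts = {}
    for term in side.split("+"):
        m = TERM.match(term)
        if m is None:
            raise ValueError(f"cannot parse term {term!r}")
        coef = int(m.group(1)) if m.group(1) else 1
        counts[m.group(2)] = counts.get(m.group(2), 0) + coef
    return counts


def parse(text):
    """Split the mechanism into (reactants, products, rate) tuples."""
    reactions = []
    for stmt in text.split(";"):
        if not stmt.strip():
            continue
        equation, rate = stmt.split(":")
        lhs, rhs = equation.split("=")
        reactions.append((parse_side(lhs), parse_side(rhs), float(rate)))
    return reactions


def analyze(reactions):
    """Analysis step: species ordering and Jacobian sparsity pattern.

    Entry (i, j) is marked nonzero when species i takes part in a reaction
    whose rate depends on reactant j.
    """
    species = sorted({s for lhs, rhs, _ in reactions for s in (*lhs, *rhs)})
    index = {s: k for k, s in enumerate(species)}
    pattern = set()
    for lhs, rhs, _ in reactions:
        for i in set(lhs) | set(rhs):
            for j in lhs:
                pattern.add((index[i], index[j]))
    return species, sorted(pattern)


def generate_c(reactions, species):
    """Code-generation step: one C statement per reaction rate (mass action)."""
    index = {s: k for k, s in enumerate(species)}
    lines = []
    for r, (lhs, _, k) in enumerate(reactions):
        factors = [f"C[{index[s]}]" for s, n in lhs.items() for _ in range(n)]
        lines.append(f"rate[{r}] = {k!r}*" + "*".join(factors) + ";")
    return "\n".join(lines)


reactions = parse(MECHANISM)
species, pattern = analyze(reactions)
print(species)
print(generate_c(reactions, species))
```

Running it prints the sorted species list `['NO', 'NO2', 'O', 'O2', 'O3']`, then `rate[0] = 1.8e-12*C[4]*C[0];` and one statement per remaining reaction. The sparsity pattern from the analysis step is what a real generator would use to emit sparse Jacobian and LU code.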
Kppa-generated code can be 40x faster than hand-tuned code
• Reorder chemical species to minimize fill-in
• Reorder concentration data to maximize vectorization
• Reorder grid cells to reduce excessive iteration
[Figure: speedup vs. number of threads (1 to 17) for the SMALL STRATO mechanism on NVIDIA® C1060, IBM® BladeCenter® QS22, and Intel® Xeon® 5400, with linear speedup shown for reference]
Meta-programming for Automatic Code Generation
<%
d_Decomp.begin()
piv = lang.Variable('piv', REAL,
                    'Row element divided by diagonal')
piv.declare()
%>
${size_t} idx = blockDim.x*blockIdx.x+threadIdx.x;
if(idx < ${ncells32}) {
  ${A} += idx;
<%
lang.upindent()
for i in xrange(1, nvar):
    for j in xrange(crow[i], diag[i]):
        c = icol[j]
        d = diag[c]
        piv.assign(A[j*ncells32] / A[d*ncells32])
        A[j*ncells32].assign(piv)
        ...
%>
}
<% d_Decomp.end() %>

A[334] = t1;
A[337] = -A[272]*t1 + A[337];
A[338] = -A[273]*t1 + A[338];
A[339] = -A[274]*t1 + A[339];
A[340] = -A[275]*t1 + A[340];
t2 = A[335]/A[329];
A[335] = t2;
A[337] = -A[330]*t2 + A[337];
A[338] = -A[331]*t2 + A[338];
A[339] = -A[332]*t2 + A[339];
A[340] = -A[333]*t2 + A[340];
t3 = A[341]/A[59];
C/C++/CUDA, Fortran
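The template appears to emit straight-line sparse LU code of the kind shown in the generated listing. The sketch below reproduces the idea in plain Python, assuming the compressed-row arrays `crow`, `icol`, and `diag` named in the template, a sparsity pattern that already contains all fill-in, and no pivoting. `generate_lu` is a hypothetical name, and the per-cell offset (`idx`, `ncells32`) is omitted for clarity.

```python
def generate_lu(crow, icol, diag):
    """Emit straight-line, C-style statements for an in-place sparse LU.

    Row-oriented Doolittle elimination without pivoting. The sparsity pattern
    is in compressed-row form and must already contain all fill-in:
      crow[i] .. crow[i+1]-1  storage indices of row i
      icol[p]                 column of storage index p (sorted within a row)
      diag[i]                 storage index of A(i,i)
    """
    nvar = len(crow) - 1
    out, t = [], 0
    for i in range(1, nvar):
        # Column -> storage index for row i, to locate update targets.
        where = {icol[p]: p for p in range(crow[i], crow[i + 1])}
        for j in range(crow[i], diag[i]):              # entries left of the diagonal
            c = icol[j]
            t += 1
            out.append(f"t{t} = A[{j}]/A[{diag[c]}];")  # multiplier L(i,c)
            out.append(f"A[{j}] = t{t};")
            for k in range(diag[c] + 1, crow[c + 1]):  # U(c, col) for col > c
                p = where[icol[k]]
                out.append(f"A[{p}] = -A[{k}]*t{t} + A[{p}];")
    return out


# Dense 2x2 example: storage indices 0..3 in row-major order.
for line in generate_lu(crow=[0, 2, 4], icol=[0, 1, 0, 1], diag=[0, 3]):
    print(line)
```

For the dense 2x2 example it prints `t1 = A[2]/A[0];`, `A[2] = t1;`, and `A[3] = -A[1]*t1 + A[3];`, the same statement shapes as the generated listing. One likely benefit of unrolling is that all index arithmetic and branching over the sparsity pattern disappears at run time.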
Kppa’s Architecture Parameterization
                         CPU   GPU     Intel® MIC   CBEA
Instruction Cardinality  1     32      16           4
Integrator Cardinality   1     ∞       16           4
Scratch Size             N/A   16 KB   N/A          256 KB
Targeting the Intel® Xeon® Phi™ coprocessor was easy! We only had to parameterize the new architecture in Kppa and generate OpenMP code.
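A minimal sketch of what "parameterizing an architecture" could look like, using the values from the table. The `Arch` record, `ARCHS`, `padded_cells`, and `soa_index` are hypothetical names, not Kppa's API. We assume "instruction cardinality" means the number of grid cells one instruction processes (SIMD width, or warp size on the GPU), and that the template's `ncells32` is the cell count padded to a multiple of 32; the structure-of-arrays index mirrors the template's `A[j*ncells32]` after `A += idx`, which is also what the "reorder concentration data" bullet suggests.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Arch:
    """Hypothetical record holding one column of the parameter table."""
    name: str
    instruction_cardinality: int   # assumed meaning: grid cells per instruction
    integrator_cardinality: float  # table value; math.inf stands for the GPU's ∞
    scratch_bytes: Optional[int]   # fast scratch memory, None where N/A


ARCHS = {
    "CPU": Arch("CPU", 1, 1, None),
    "GPU": Arch("GPU", 32, math.inf, 16 * 1024),
    "MIC": Arch("Intel MIC", 16, 16, None),
    "CBEA": Arch("CBEA", 4, 4, 256 * 1024),
}


def padded_cells(ncells, arch):
    """Round the cell count up to a multiple of the instruction cardinality,
    so every vector lane (or GPU thread in a warp) has a cell to process."""
    w = arch.instruction_cardinality
    return -(-ncells // w) * w


def soa_index(var, cell, npad):
    """Structure-of-arrays offset: one variable's values for consecutive cells
    are contiguous, so a w-wide vector load covers w neighbouring cells."""
    return var * npad + cell


for key, arch in ARCHS.items():
    print(key, padded_cells(1000, arch))
```

For 1000 cells this prints 1000 for the CPU, 1024 for the GPU, 1008 for Intel MIC, and 1000 for CBEA. Only the parameters change between targets; the generator logic stays the same.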
Kppa-generated SAPRC Kernel: 35.6x Speedup
Runtime in seconds (lower is better):
• Intel® Xeon® E3-1220 (3.1GHz): 73.438
• NVIDIA® GeForce® GTS 450: 16.532
• Intel® Xeon Phi™ Coprocessor: 2.06

Grid      Time     # Species  # Equations  Method
24x36x8   8 hours  79         211          Rodas4
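The headline speedup is the ratio of the Intel® Xeon® E3-1220 runtime to the Intel® Xeon Phi™ runtime; the same ratio gives the GeForce® GTS 450 result. The 24x36x8 grid holds 24 × 36 × 8 = 6912 cells.

```latex
S_{\text{Phi}} = \frac{T_{\text{E3-1220}}}{T_{\text{Phi}}}
  = \frac{73.438\ \text{s}}{2.06\ \text{s}} \approx 35.6,
\qquad
S_{\text{GTS 450}} = \frac{T_{\text{E3-1220}}}{T_{\text{GTS 450}}}
  = \frac{73.438\ \text{s}}{16.532\ \text{s}} \approx 4.44
```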
Download Kppa from www.paratools.com/kppa