Implementation of Parallel Computing for Multiphase Flows using the Lattice Boltzmann Method
Jaime Mudrich (DOE Fellow), Rinaldo G. Galdamez (DOE Fellow), and Seckin Gokaltun, Ph.D.
Applied Research Center, Florida International University, Miami, FL

Methodology
• The lattice Boltzmann method (LBM) is based on the Boltzmann transport equation
• The domain is discretized with lattice nodes instead of a conventional mesh
• Independence from meshing allows complex domains such as porous media
• Particle distributions at the nodes collide and then stream information to neighboring nodes (see the sketch below)
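To make the node layout concrete, the following minimal sketch (Python with NumPy) defines the nine discrete directions and weights of a standard D2Q9 lattice, consistent with the nine basis vectors used in the equations later in the poster; the array names are illustrative and are not taken from the poster's code.

import numpy as np

# D2Q9 lattice: nine discrete directions e_a and their weights w_a.
# Index 0 is the rest particle; 1-4 are the axis directions; 5-8 are the diagonals.
e = np.array([[0, 0],
              [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9,
              1/9, 1/9, 1/9, 1/9,
              1/36, 1/36, 1/36, 1/36])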
Validation of the Parallel LBM Code
[Figure: Subroutine Time Profile — iterative computation time (s) vs. domain size (2,500 to 640,000 nodes) for the Prestream, Hydrodynamics, Poststream, and Stats subroutines]
Introduction
• 53 million gallons of radioactive waste at the Hanford site
• Stored in leaking single-shell tanks (SSTs)
• Double-shell tanks (DSTs) introduced in 1968
• Unlike the SSTs, the DSTs show no leaking
• Waste is being transferred from the SSTs to the DSTs
• Transport of the heterogeneous waste clogs piping
• Pulsed-air mixing is used to “stir” the heterogeneous material
• LBM simulates rising bubbles to predict mixing
[Figure: Rising bubbles mix the slurry]
[Figure: SST (A) and DST (B)]
Parallel Processing Background
• A master processor divides the problem domain among multiple slave processors
• The Message Passing Interface (MPI) allows the CPUs to exchange information across subdomain boundaries
• The reduction in processing time is ultimately limited by communication between processors and by the components of the program that must run sequentially
• The effectiveness of parallelization is measured by the speedup, S(N), for N processors
• When increasing the number of CPUs yields minimal further performance gain, the optimal quantity has been reached
In the serial configuration, a single processor solves the entire problem domain.
In the parallel configuration, multiple processors split the problem domain, reducing the overall computation time.
Processors communicate with their neighbors through MPI to “patch” the subdomains together (a minimal sketch follows the speedup definition below).
Finally, the master collects the results from the various slave nodes.
Amdahl’s Law: $S(N) = \frac{T(1)}{T(N)}$
T(1) = single-processor computation time
T(N) = multiple-processor computation time
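The following minimal sketch illustrates the domain decomposition and MPI “patching” described above, using mpi4py with a one-dimensional split of the lattice; the array shapes, the periodic neighbor handling, and the variable names are illustrative assumptions rather than the poster's actual implementation.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

# Master/slave split: each rank owns a horizontal slice of the lattice,
# padded with one ghost row on each side for the "patching" step.
ny, nx = 800, 800                      # illustrative 640,000-node domain
ny_local = ny // nprocs
f = np.zeros((ny_local + 2, nx, 9))    # distribution functions on this subdomain

t0 = MPI.Wtime()
# ... collide and stream on the interior rows of f ...

# Exchange ghost rows with neighboring subdomains via MPI (periodic for brevity).
up, down = (rank + 1) % nprocs, (rank - 1) % nprocs
comm.Sendrecv(sendbuf=f[-2], dest=up,   recvbuf=f[0],  source=down)
comm.Sendrecv(sendbuf=f[1],  dest=down, recvbuf=f[-1], source=up)

# The slowest rank determines the parallel wall-clock time T(N);
# the speedup is then S(N) = T(1)/T(N), with T(1) taken from a serial run.
t_parallel = comm.reduce(MPI.Wtime() - t0, op=MPI.MAX, root=0)
if rank == 0:
    print("T(N) =", t_parallel, "s on", nprocs, "processors")

Launching with, for example, mpirun -np 25 python lbm_mpi.py (script name illustrative) corresponds to the optimal processor count reported in the Results.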
Results
[Figure: Histogram view of the distribution function, f, showing its nine directional components f0 through f8 at a lattice node]
Collision:
$f_a(\mathbf{x},\, t + \Delta t) = f_a(\mathbf{x}, t) - \frac{\Delta t}{\tau}\left[\, f_a(\mathbf{x}, t) - f_a^{eq}(\mathbf{x}, t)\,\right]$
…where…
$f_a^{eq}(\mathbf{x}) = w_a\, \rho(\mathbf{x})\left[\, 1 + \frac{3\,(\mathbf{e}_a \cdot \mathbf{u})}{c^2} + \frac{9\,(\mathbf{e}_a \cdot \mathbf{u})^2}{2c^4} - \frac{3\,\mathbf{u}^2}{2c^2}\,\right]$
Stream:
$f_a(\mathbf{x} + \mathbf{e}_a \Delta t,\, t + \Delta t) = f_a(\mathbf{x}, t)$
x = position of particles
u = macroscopic velocity at the node
ρ = macroscopic density at the node
w_a = constant, direction-specific weight
c = model speed of sound
τ = dimensionless relaxation time
e_a = basis vector at the node (9 total)
[Figure: Lattice nodes]
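A minimal single-phase sketch of one collide-and-stream iteration built directly from the equations above (the multiphase model used in this work adds interface terms not shown here); the function and array names are illustrative, with Δt = 1 and periodic boundaries assumed for brevity.

import numpy as np

# D2Q9 directions e_a and weights w_a (repeated here so the block is self-contained).
e = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9, 1/9, 1/9, 1/9, 1/9, 1/36, 1/36, 1/36, 1/36])
tau, c = 1.0, 1.0                        # relaxation time and model speed (illustrative)

def collide_and_stream(f):
    # Macroscopic density and velocity at every node.
    rho = f.sum(axis=-1)
    u = np.einsum('xya,ad->xyd', f, e) / rho[..., None]

    # Equilibrium: f_a^eq = w_a*rho*[1 + 3(e_a.u)/c^2 + 9(e_a.u)^2/(2c^4) - 3u^2/(2c^2)].
    eu = np.einsum('xyd,ad->xya', u, e)
    u2 = (u**2).sum(axis=-1)[..., None]
    feq = w * rho[..., None] * (1 + 3*eu/c**2 + 4.5*eu**2/c**4 - 1.5*u2/c**2)

    # Collision: relax toward equilibrium (Delta t = 1 in lattice units).
    f = f - (f - feq) / tau

    # Stream: shift each f_a one node along its direction e_a (periodic boundaries).
    for a in range(9):
        f[..., a] = np.roll(f[..., a], shift=tuple(e[a]), axis=(0, 1))
    return f

# Usage: start from rest with f = np.tile(w, (nx, ny, 1)), then iterate collide_and_stream.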
Overlapping pressure profiles for the serial and parallel cases indicate accurate results from the parallel code.
[Figure: Speed Up for Parallelized Subroutines — speedup vs. number of processors (0–80) for the Prestream, Hydrodynamics, and Poststream subroutines, compared with the ideal linear speedup]
Speedup trends. Near-linear behavior confirms correct parallelization
For 640,000 nodes, the parallel code reduces the job from thirty hours to only three hours
[Figure: Prestream subroutine — computation time (s) vs. number of processors]
Subroutine parallel time profiles. The computation times all converge at about N = 25, representing the optimal processor quantity.
[Figure: Hydrodynamics subroutine — computation time (s) vs. number of processors]
[Figure: Poststream subroutine — computation time (s) vs. number of processors]
[Figure: Time Response of Solver to Domain Size — processing time (s) vs. domain size (nodes) for the serial code and the parallel code on 25 CPUs]
[Figure: Cross-sectional Pressure Profile — pressure vs. position for the serial and parallel codes; max error = 0.27%]
Time Profile for Serial LBM Code
• For the multiphase simulations being studied, the iterative algorithm comprises three main steps
• Diagnostics were performed to identify the slowest portions of the code
• “Hydrodynamics”, “Prestream”, and “Poststream” will benefit the most from parallelization
• These subroutines will be split among several processors to share the load, speeding up the solution (a timing sketch is shown below)
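A minimal sketch of the timing diagnostics idea, wrapping each subroutine call with a wall-clock timer so the per-subroutine totals can be compared; the helper and the call pattern are illustrative assumptions, although the subroutine names mirror the poster's labels.

import time
from collections import defaultdict

timings = defaultdict(float)

def timed(name, func, *args):
    # Accumulate wall-clock time spent in each subroutine over all iterations.
    start = time.perf_counter()
    result = func(*args)
    timings[name] += time.perf_counter() - start
    return result

# Inside the iterative loop, each step is wrapped so the totals can be compared:
#   f = timed("Prestream",     prestream,     f)
#   f = timed("Hydrodynamics", hydrodynamics, f)
#   f = timed("Poststream",    poststream,    f)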
Conclusions and Future Work
• Parallelization with the optimal number of processors results in significant savings in computing time (about 10 times faster for N = 25 and 640,000 lattice nodes)
• Parallelization allows for simulation of larger domains or longer times
• Future work will include extension of the code from 2D to 3D
• In addition, fluid-solid interactions will also be implemented
Acknowledgements
This research was supported by the U.S. Department of Energy through the DOE-FIU Science and Technology Workforce Development Program, under grant No. DE-EM0000598. Special thanks to Leonel Lagos, Ph.D., PMP®, Director of the DOE-FIU Science and Technology Workforce Development Program.
• Study performed on a static bubble with a density ratio of 1,000 and a uniform initial pressure distribution
• The solution was checked against Laplace’s law for surface tension, described below
Laplace’s Law: $\Delta P = \frac{\sigma}{R}$
σ = surface tension
ΔP = pressure difference across the fluid interface
R = radius of the bubble
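A minimal sketch of how the Laplace-law check might be scripted, comparing the simulated pressure jump across the bubble interface with σ/R; the way the inside and outside pressures are sampled is an illustrative assumption.

def laplace_error(p, sigma, radius, center):
    # Relative error between the simulated pressure jump and Laplace's law.
    cx, cy = center
    p_in = p[cx, cy]                # pressure at the bubble center
    p_out = p[0, 0]                 # pressure far from the interface
    dp_simulated = p_in - p_out
    dp_laplace = sigma / radius     # 2D form of Laplace's law
    return abs(dp_simulated - dp_laplace) / dp_laplace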
• Parallelization allows for many more simulations to be performed in a much shorter time
• Using the parallel code and the experimentally determined optimal quantity of processors (N = 25), the following simulation was performed
This series illustrates a case of three equal-radius bubbles with minimal separation; the LBM captures the coalescence of the top bubbles. Density ratio = 100, interface width = 5 lattice units, vertical acceleration = -2.0 x 10^-7 lattice units per lattice time squared, interfacial tension = 0.1, and relaxation time for both fluids = 2.71 x 10^-2.
[Figure: simulation snapshots at T = 0, 10, 15, 20, 25, 50, 75, and 100]