Post on 13-Apr-2020
Fast Fluid-Structure Interaction Using Lattice Boltzmann and Immersed Boundary Methods
Mark Mawson (1), Pedro Valero Lara (2), Julien Favier (3), Alfredo Pinelli (2), Alistair Revell (1)
(1) The University of Manchester
(2) Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas
(3) Aix Marseille Université
Contents
1. The Lattice Boltzmann Method
2. The Immersed Boundary Method
3. Demonstrations in 2D and 3D
4. Implementation and Optimisation
5. Summary
Demo GPU Hardware
• GK104-based K5000M:
  • Portable (in a laptop)
  • High peak performance
  • Low DRAM bandwidth, but we can still solve fluid problems interactively.
CUDA cores: 1344
DRAM: 4 GB
Compute capability: 3.0
Peak performance (single precision): 1.6 TFlops
DRAM bandwidth: 96 GB/sec (theoretical), 66 GB/sec (measured)
The Lattice Boltzmann Method
• Continuum methods (macro-scale)
  Based on the Navier-Stokes equations.
  Conservation of mass/momentum/energy on an infinitesimal volume.
  Finite (Volume/Element/Difference) methods.
• Molecular dynamics (micro-scale)
  Small particles that collide with each other.
  Inter-particle forces govern the interactions.
  For each t we must find the trajectory of each particle.
  Very computationally expensive.
• Meso-scale
  Based on kinetic theory; fits somewhere in the middle.
  LBM falls within this category.
  Instead of a single particle we consider a distribution function, which represents a collection of particles.
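The Boltzmann Equation
$$\frac{\partial f}{\partial t} + \mathbf{e}\cdot\nabla_{\mathbf{x}} f + \mathbf{f}\cdot\nabla_{\mathbf{e}} f = \Omega, \qquad \Omega = -\frac{1}{\tau}\left(f - f^{(eq)}\right)$$
• $f$ is a probability distribution function.
• $\mathbf{e}$ is the velocity vector associated with $f$.
• $\mathbf{f}$ (bold) accounts for body forces applied to the distribution.
• $\Omega$ is an operator to account for particle collisions (LBGK in this case).
• $f^{(eq)}$ is a Maxwellian used to "emit" particles into a new component of $\mathbf{e}$ (see Bhatnagar & Gross, 1954).
• $\tau$ is a relaxation term used to describe the amount of collision taking place.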
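Discretisation to Lattice Boltzmann
Populations:
$$f_i(\mathbf{x}+\mathbf{e}_i,\, t+1) = f_i(\mathbf{x},t) - \frac{1}{\tau}\left[f_i(\mathbf{x},t) - f_i^{(eq)}(\mathbf{x},t)\right] + F_i$$
$$F_i = \left(1-\frac{1}{2\tau}\right)\omega_i\left[\frac{\mathbf{e}_i-\mathbf{u}}{c_s^2} + \frac{\mathbf{e}_i\cdot\mathbf{u}}{c_s^4}\,\mathbf{e}_i\right]\cdot\mathbf{f}$$
$$f_i^{(eq)} = \rho\,\omega_i\left[1 + \frac{\mathbf{e}_i\cdot\mathbf{u}}{c_s^2} + \frac{(\mathbf{e}_i\cdot\mathbf{u})^2}{2c_s^4} - \frac{\mathbf{u}^2}{2c_s^2}\right], \qquad c_s = \frac{1}{\sqrt{3}}$$
Macroscopic variables:
$$\rho = \sum_i f_i, \qquad \rho\mathbf{u} = \sum_i \mathbf{e}_i f_i + \frac{1}{2}\mathbf{f}$$
$\omega_i$ is a weighting function and $c_s$ is the speed of sound in the lattice; $F_i$ is the forcing term for direction $i$ and $\mathbf{f}$ is the body force.
• Multi-scale expansion of the Lattice Boltzmann equation up to and including 2nd order terms allows the Navier-Stokes equations to be recovered.
• See Guo et al., 2002 for more details.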
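As a hedged illustration of the equilibrium and moment formulas above, the sketch below evaluates $f_i^{(eq)}$ on a D2Q9 lattice and recovers $\rho$ and $\mathbf{u}$ from it. The ordering of the velocities and the names `equilibrium` and `moments` are choices made for this example only, not taken from the slides.

```python
# D2Q9 lattice: rest velocity, 4 axis-aligned and 4 diagonal velocities.
# The ordering is an arbitrary choice for this sketch.
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4   # standard D2Q9 weights
CS2 = 1.0 / 3.0                            # c_s^2, with c_s = 1/sqrt(3)


def equilibrium(rho, u):
    """Return the nine equilibrium populations for density rho and velocity u."""
    usq = u[0] * u[0] + u[1] * u[1]
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * u[0] + ey * u[1]
        feq.append(rho * w * (1.0 + eu / CS2
                              + eu * eu / (2.0 * CS2 * CS2)
                              - usq / (2.0 * CS2)))
    return feq


def moments(f, force=(0.0, 0.0)):
    """Recover rho and u from populations, with the half-force correction."""
    rho = sum(f)
    ux = (sum(e[0] * fi for e, fi in zip(E, f)) + 0.5 * force[0]) / rho
    uy = (sum(e[1] * fi for e, fi in zip(E, f)) + 0.5 * force[1]) / rho
    return rho, (ux, uy)


# Round trip: the moments of the equilibrium give back the inputs.
rho, u = moments(equilibrium(1.2, (0.05, -0.02)))
print(rho, u)
```

The round trip works because the lattice weights make the zeroth and first moments of $f_i^{(eq)}$ equal to $\rho$ and $\rho\mathbf{u}$ exactly.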
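LBM – Discretisation: 2D (D2Q9) and 3D (D3Q19)
D2Q9: $\mathbf{e}_i \in \{(0,0),\ (\pm1,0),\ (0,\pm1),\ (\pm1,\pm1)\}$
D3Q19: $\mathbf{e}_i \in \{(0,0,0),\ (\pm1,0,0),\ (0,\pm1,0),\ (0,0,\pm1),\ (\pm1,\pm1,0),\ (\pm1,0,\pm1),\ (0,\pm1,\pm1)\}$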
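The two velocity sets can be generated rather than typed out. The sketch below is one possible construction, assuming the standard D2Q9 and D3Q19 weights (which the slides do not list) and using helper names invented for this example. It checks the lattice identity $\sum_i \omega_i e_{i\alpha} e_{i\beta} = c_s^2\,\delta_{\alpha\beta}$ with $c_s^2 = 1/3$.

```python
from itertools import product


def velocity_set(dim, max_speed_sq):
    """All vectors in {-1, 0, 1}^dim with squared length <= max_speed_sq."""
    return [e for e in product((-1, 0, 1), repeat=dim)
            if sum(c * c for c in e) <= max_speed_sq]


D2Q9 = velocity_set(2, 2)    # 9 vectors: rest, axis, diagonal
D3Q19 = velocity_set(3, 2)   # 27 cube vectors minus the 8 corners

# Standard weights, indexed by squared speed |e_i|^2 (assumed, not from the slides).
W_D2Q9 = {0: 4 / 9, 1: 1 / 9, 2: 1 / 36}
W_D3Q19 = {0: 1 / 3, 1: 1 / 18, 2: 1 / 36}


def weight(e, table):
    return table[sum(c * c for c in e)]


def second_moment(es, table, a, b):
    """sum_i w_i e_ia e_ib, which should equal c_s^2 if a == b and 0 otherwise."""
    return sum(weight(e, table) * e[a] * e[b] for e in es)


print(len(D2Q9), len(D3Q19))
print(second_moment(D3Q19, W_D3Q19, 0, 0), second_moment(D3Q19, W_D3Q19, 0, 1))
```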
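LBM - As an Algorithm
Collision step:
$$\hat{f}_i(\mathbf{x},t) = f_i(\mathbf{x},t) - \frac{1}{\tau}\left[f_i(\mathbf{x},t) - f_i^{(eq)}(\mathbf{x},t)\right] + F_i$$
• This is an entirely local operation (think independent threads in CUDA).
Streaming step:
$$f_i(\mathbf{x}+\mathbf{e}_i,\, t+1) = \hat{f}_i(\mathbf{x},t)$$
• Nearest neighbour interaction.
Here $\hat{f}_i$ denotes the post-collision population and $F_i$ the forcing term.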
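The two steps can be sketched as two separate passes over a small periodic D2Q9 grid. This is a minimal pure-Python illustration, assuming no body force ($F_i = 0$) and periodic wrap-around in place of real boundary conditions. The `f[i][y][x]` layout and all function names are chosen for this example.

```python
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """D2Q9 equilibrium with c_s^2 = 1/3 already substituted."""
    usq = ux * ux + uy * uy
    return [rho * w * (1 + 3 * (ex * ux + ey * uy)
                       + 4.5 * (ex * ux + ey * uy) ** 2 - 1.5 * usq)
            for (ex, ey), w in zip(E, W)]


def collide(f, tau):
    """BGK collision in place; f[i][y][x] is population i at node (x, y)."""
    ny, nx = len(f[0]), len(f[0][0])
    for y in range(ny):
        for x in range(nx):
            node = [f[i][y][x] for i in range(9)]
            rho = sum(node)
            ux = sum(e[0] * fi for e, fi in zip(E, node)) / rho
            uy = sum(e[1] * fi for e, fi in zip(E, node)) / rho
            feq = equilibrium(rho, ux, uy)
            for i in range(9):
                f[i][y][x] = node[i] - (node[i] - feq[i]) / tau


def stream(f):
    """Push every population to its neighbour along e_i (periodic wrap)."""
    ny, nx = len(f[0]), len(f[0][0])
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(9)]
    for i, (ex, ey) in enumerate(E):
        for y in range(ny):
            for x in range(nx):
                out[i][(y + ey) % ny][(x + ex) % nx] = f[i][y][x]
    return out


def total_mass(f):
    return sum(v for plane in f for row in plane for v in row)


# 8x8 periodic box at rest with a small density bump in one node.
nx = ny = 8
f = [[[W[i]] * nx for _ in range(ny)] for i in range(9)]
for i in range(9):
    f[i][4][4] *= 1.1
mass_before = total_mass(f)
for _ in range(20):
    collide(f, tau=0.8)
    f = stream(f)
mass_after = total_mass(f)
print(mass_before, mass_after)
```

Collision only touches one node at a time, which is why it maps onto independent threads. Streaming is the only step that moves data between nodes.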
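LBM - As an Algorithm
• Re-ordering of the LBM algorithm to increase locality:
  • Stream in the appropriate direction: $f_i(\mathbf{x},t) = \hat{f}_i(\mathbf{x}-\mathbf{e}_i,\, t-1)$
  • Apply boundary conditions.
  • Calculate ρ and u.
  • Calculate $f_i^{(eq)}$.
  • Apply the collision operator: $\hat{f}_i(\mathbf{x},t) = f_i(\mathbf{x},t) - \frac{1}{\tau}\left[f_i(\mathbf{x},t) - f_i^{(eq)}(\mathbf{x},t)\right] + F_i$
Here $\hat{f}_i$ denotes the post-collision population stored at the end of each time step.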
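The re-ordered sequence lends itself to a single fused pass: each node pulls its populations from upstream, then works entirely on local values and writes once. The sketch below illustrates that idea in pure Python on a periodic D2Q9 grid, assuming no body force and no wall boundaries. `step_pull` and the double-buffer layout are assumptions of this example, not the authors' kernel.

```python
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    usq = ux * ux + uy * uy
    return [rho * w * (1 + 3 * (ex * ux + ey * uy)
                       + 4.5 * (ex * ux + ey * uy) ** 2 - 1.5 * usq)
            for (ex, ey), w in zip(E, W)]


def step_pull(src, tau):
    """One fused time step; reads src[i][y][x], returns a new buffer."""
    ny, nx = len(src[0]), len(src[0][0])
    dst = [[[0.0] * nx for _ in range(ny)] for _ in range(9)]
    for y in range(ny):
        for x in range(nx):
            # 1. Stream: pull population i from the upstream node x - e_i.
            node = [src[i][(y - ey) % ny][(x - ex) % nx]
                    for i, (ex, ey) in enumerate(E)]
            # 2. Boundary conditions would be applied here (periodic only).
            # 3. Macroscopic variables from the local populations.
            rho = sum(node)
            ux = sum(e[0] * fi for e, fi in zip(E, node)) / rho
            uy = sum(e[1] * fi for e, fi in zip(E, node)) / rho
            # 4. Equilibrium, then 5. collision, written to memory once.
            feq = equilibrium(rho, ux, uy)
            for i in range(9):
                dst[i][y][x] = node[i] - (node[i] - feq[i]) / tau
    return dst


def total_mass(f):
    return sum(v for plane in f for row in plane for v in row)


# A uniform flow in equilibrium should be left unchanged by a step.
nx = ny = 6
feq0 = equilibrium(1.0, 0.05, 0.0)
uniform = [[[feq0[i]] * nx for _ in range(ny)] for i in range(9)]
after = step_pull(uniform, tau=0.9)
max_change = max(abs(after[i][y][x] - uniform[i][y][x])
                 for i in range(9) for y in range(ny) for x in range(nx))

# A perturbed state should conserve total mass.
bumped = [[row[:] for row in plane] for plane in uniform]
bumped[0][2][3] += 0.01
mass_before = total_mass(bumped)
for _ in range(10):
    bumped = step_pull(bumped, tau=0.9)
mass_after = total_mass(bumped)
print(max_change, mass_before, mass_after)
```

In this form every population is read once and written once per time step, which is the property a memory-bound GPU kernel wants.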
LBM Validation
Validation 1: Lid Driven Cavity
[Figure: 2D case. Centreline u profiles (u against Y) for Re = 100, 400 and 1000, compared with Ghia et al. Three panels with grids 33x33, 65x65, 129x129; 65x65, 129x129, 257x257; and 129x129, 257x257, 513x513.]
[Figure: 3D case. Centreline u profiles (u against Y) for Re = 100, 400 and 1000, compared with Jiang & Lin. Three panels with grids 33x33x33, 65x65x65, 129x129x129; 65x65x65, 129x129x129, 257x257x257; and 129x129x129, 257x257x257.]
• D3Q19 is more memory intensive, leads to smaller domains.
• Boundary conditions become more important.
• 2nd order convergence verified up to floating precision in 3D.
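For readers reproducing the cavity cases, the Reynolds number fixes the relaxation time through the standard lattice relation $\nu = c_s^2(\tau - 1/2)$ (lattice units, $\Delta x = \Delta t = 1$). The sketch below is a worked example under assumptions the slides do not state: a lid velocity of 0.1 in lattice units, the number of nodes along the cavity side as the reference length, and an arbitrary pairing of Reynolds number with grid size.

```python
CS2 = 1.0 / 3.0  # c_s^2 for D2Q9 and D3Q19


def relaxation_time(reynolds, n_nodes, lid_velocity=0.1):
    """tau from Re = U * N / nu and nu = c_s^2 * (tau - 1/2)."""
    nu = lid_velocity * n_nodes / reynolds
    return nu / CS2 + 0.5


# Hypothetical pairings of Reynolds number and grid size.
for re, n in [(100, 33), (400, 65), (1000, 129)]:
    print(re, n, round(relaxation_time(re, n), 4))
```

Refining the grid at fixed Re and lid velocity raises $\tau$, while raising Re at a fixed grid pushes $\tau$ towards 1/2, where BGK schemes typically become less stable.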
Validation 2: Poiseuille Flow
[Figure: u velocity against Y coordinate; analytical solution compared with the 33x33x33 grid.]
[Figure: Log10 of error (L₂ norm) against Log10 Δx, with Δx and Δx² reference slopes.]
Validation 2: Poiseuille Flow
• In double precision the K5000M runs out of memory before floating point error becomes dominant.
[Figure: u velocity against Y coordinate; analytical solution compared with the 33x33x33 grid.]
[Figure: Log10 of error (L₂ norm) against Log10 Δx, with Δx and Δx² reference slopes.]
Immersed Boundary Method
• Allows moving and complex boundaries to be created arbitrarily within a Lagrangian space.
• No need for unstructured, body-fitting domains.
• We use the method found in Pinelli, A., Naqavi, I., Piomelli, U., Favier, J., 2010. Immersed-boundary methods for general finite-difference and finite-volume Navier-Stokes solvers.
Pictures from http://www.math.vt.edu/people/xuz/research.html
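Immersed Boundary Method – With LBM
• Perform collision and stream (without forces):
  $$f_i(\mathbf{x}+\mathbf{e}_i,\, t+1) = f_i(\mathbf{x},t) - \frac{1}{\tau}\left[f_i(\mathbf{x},t) - f_i^{(eq)}(\mathbf{x},t)\right]$$
• Apply boundary conditions.
• Calculate u*:
  $$\rho\mathbf{u}^* = \sum_i \mathbf{e}_i f_i$$
• Interpolate velocities to Lagrangian space:
  $$\mathbf{U}(s) = \int \mathbf{u}^*(\mathbf{x})\,\tilde{\delta}(\mathbf{x}-\mathbf{X}(s))\,d\mathbf{x}$$
• Calculate the corrective force from the desired boundary velocity $\hat{\mathbf{U}}$:
  $$\mathbf{F}(s) = \frac{\hat{\mathbf{U}}(s) - \mathbf{U}(s)}{\Delta t}$$
• Spread the force back to Eulerian space:
  $$\mathbf{f}(\mathbf{x}) = \int \mathbf{F}(s)\,\tilde{\delta}(\mathbf{x}-\mathbf{X}(s))\,ds$$
• Perform collision and stream operations with forces:
  $$f_i(\mathbf{x}+\mathbf{e}_i,\, t+1) = f_i(\mathbf{x},t) - \frac{1}{\tau}\left[f_i(\mathbf{x},t) - f_i^{(eq)}(\mathbf{x},t)\right] + F_i$$
• Calculate ρ and u with forces:
  $$\rho\mathbf{u} = \sum_i \mathbf{e}_i f_i + \frac{\Delta t}{2}\mathbf{f}$$
Here $F_i$ is the forcing term built from the Eulerian force $\mathbf{f}$, and $\Delta t = 1$ in lattice units.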
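The interpolate, force, spread sequence can be illustrated in one dimension. The sketch below is a simplified example rather than the cited method in full: it uses the Roma (1999) kernel, unit grid spacing, a fixed marker spacing `ds`, and omits any marker-dependent scaling of the force that a production scheme may apply. All names are chosen for this example.

```python
import math


def roma_delta(r):
    """Roma (1999) three-point kernel, support |r| <= 1.5."""
    r = abs(r)
    if r <= 0.5:
        return (1.0 + math.sqrt(1.0 - 3.0 * r * r)) / 3.0
    if r <= 1.5:
        return (5.0 - 3.0 * r - math.sqrt(1.0 - 3.0 * (1.0 - r) ** 2)) / 6.0
    return 0.0


def interpolate(u, markers):
    """U(s_k) = sum_j u_j * delta(x_j - X_k), with grid nodes at x_j = j."""
    return [sum(u[j] * roma_delta(j - X) for j in range(len(u)))
            for X in markers]


def corrective_force(U, U_desired, dt=1.0):
    """F(s_k) = (U_desired_k - U_k) / dt."""
    return [(ud - uk) / dt for uk, ud in zip(U, U_desired)]


def spread(F, markers, n, ds):
    """f(x_j) = sum_k F_k * delta(x_j - X_k) * ds."""
    return [sum(Fk * roma_delta(j - X) * ds for Fk, X in zip(F, markers))
            for j in range(n)]


# Uniform predicted velocity u* = 0.1; three markers that should be at rest.
u_star = [0.1] * 16
markers = [5.3, 6.1, 6.9]
ds = 0.8
U = interpolate(u_star, markers)
F = corrective_force(U, [0.0] * len(markers))
f = spread(F, markers, len(u_star), ds)
print(U, F, sum(f))
```

Because the kernel weights sum to one at any marker position, a uniform field interpolates exactly, and the total force on the grid equals the total force on the markers.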
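Immersed Boundary Method - Interpolation
• $\tilde{\delta}$ is a mollifier kernel with compact support of size 3 (Roma 1999):
$$\tilde{\delta}(r) = \begin{cases} \dfrac{1}{6}\left(5 - 3|r| - \sqrt{-3(1-|r|)^2 + 1}\right) & 0.5 \le |r| \le 1.5 \\[2ex] \dfrac{1}{3}\left(1 + \sqrt{-3r^2 + 1}\right) & |r| \le 0.5 \\[2ex] 0 & \text{otherwise} \end{cases}$$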
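A direct transcription of the kernel makes its discrete properties easy to check. The sketch below (function names are this example's own) verifies that, for a marker at any position between grid nodes, the weights on the neighbouring integer nodes sum to one and have zero first moment.

```python
import math


def roma_delta(r):
    """Roma (1999) kernel: nonzero only for |r| <= 1.5."""
    r = abs(r)
    if r <= 0.5:
        return (1.0 + math.sqrt(-3.0 * r * r + 1.0)) / 3.0
    if r <= 1.5:
        return (5.0 - 3.0 * r - math.sqrt(-3.0 * (1.0 - r) ** 2 + 1.0)) / 6.0
    return 0.0


def kernel_moments(X, lo=-5, hi=5):
    """Zeroth and first moments of the weights seen from marker position X."""
    nodes = range(lo, hi + 1)
    m0 = sum(roma_delta(j - X) for j in nodes)
    m1 = sum((j - X) * roma_delta(j - X) for j in nodes)
    return m0, m1


for X in (0.0, 0.3, 0.5, 0.77):
    print(X, kernel_moments(X))
```

At most three nodes per direction carry a nonzero weight, which keeps the memory traffic per marker small: 3 nodes in 1D, 9 in 2D, 27 in 3D.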
Demonstrations – Flow Over a Sphere
Demonstration – Flexible Filaments
• We must update the position of our immersed boundary points.
• A Lagrange-Euler system is solved iteratively for the tension between points, and the position of the points
• This is currently (unfortunately!) performed on the host CPU.
• Partially hide this process by executing concurrently with LBM
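$$\frac{\partial^2 \mathbf{X}}{\partial t^2} = \frac{\partial}{\partial s}\left(T\,\frac{\partial \mathbf{X}}{\partial s}\right) - \frac{\partial^2}{\partial s^2}\left(K_B\,\frac{\partial^2 \mathbf{X}}{\partial s^2}\right) + \mathbf{g} - \mathbf{F}$$
where $T$ is the tension, $\mathbf{X}$ the position of the points, $K_B$ the bending stiffness, $\mathbf{g}$ the (scaled) gravity term and $\mathbf{F}$ the fluid force on the filament.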
Implementation & Performance - LBM
Operating on all indices of $f_i$ in one thread helps to hide the latency through ILP: lower occupancy but higher performance (Volkov, 2010).
Unrolling the streaming operation loop allows 19 requests to DRAM to be made with only register-level stalls.
Implementation & Performance - LBM
Use "Struct of Arrays" access to $f_i$. Coalesced access therefore only depends on the x component of e (fully coalesced during collision).
Cache hit rate is low as we don't have repeat accesses: 0% in L1 and <7% in L2.
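The effect of the layout on coalescing can be seen from index arithmetic alone. The sketch below compares a "Struct of Arrays" index with an "Array of Structs" index for D3Q19, assuming (this mapping is not stated in the slides) that neighbouring threads handle neighbouring x positions. Function names are illustrative.

```python
Q = 19  # populations per node for D3Q19


def soa_index(i, x, y, z, nx, ny, nz):
    """Struct of Arrays: one contiguous array per population i."""
    return i * nx * ny * nz + (z * ny + y) * nx + x


def aos_index(i, x, y, z, nx, ny, nz):
    """Array of Structs: the Q populations of a node stored together."""
    return ((z * ny + y) * nx + x) * Q + i


def thread_stride(index_fn, i, nx, ny, nz):
    """Distance in elements between the accesses of threads x and x + 1."""
    return index_fn(i, 1, 0, 0, nx, ny, nz) - index_fn(i, 0, 0, 0, nx, ny, nz)


def pulled_offset(i, ex, x, nx, ny, nz):
    """Shift of the SoA read address when pulling from x - ex instead of x."""
    return soa_index(i, x - ex, 0, 0, nx, ny, nz) - soa_index(i, x, 0, 0, nx, ny, nz)


print(thread_stride(soa_index, 3, 64, 64, 64))   # unit stride
print(thread_stride(aos_index, 3, 64, 64, 64))   # stride of Q elements
print(pulled_offset(3, 1, 10, 64, 64, 64))       # whole row shifted by -1
```

With SoA, a row of threads reads a contiguous run of addresses for each population. Streaming with a nonzero x component only shifts that run by one element, while y and z components move it by whole rows or planes and leave the alignment untouched.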
Implementation & Performance - LBM
• 2D LBM
[Figure: MLUPS against number of lattice points (256^2, 512^2, 1024^2, 2048^2); vertical axis from 740 to 830 MLUPS.]
• 3D LBM
[Figure: MLUPS against number of lattice points (64^3, 96^3, 128^3) for the present work, Asinari et al. (2011), Obrecht et al. (2010) and Rinaldi et al. (2012).]
Implementation & Performance - LBM
3D LBM - If we scale for the bandwidth of the GPU
[Figure: MLUPS scaled for bandwidth against number of lattice points (64^3, 96^3, 128^3) for the present work, Obrecht et al. (2010) and Rinaldi et al. (2012).]
[Figure: MLUPS against number of lattice points (64^3, 96^3, 128^3) for the present work, Astorino et al. (2011), Obrecht et al. (2010) and Rinaldi et al. (2012).]
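Since LBM on a GPU is typically limited by memory bandwidth, a rough ceiling on MLUPS follows from the bytes moved per lattice update. The estimate below is a back-of-the-envelope sketch, not a figure from the slides. It assumes single precision, exactly one read and one write per population, 1 GB = 10^9 bytes, and it ignores all other traffic.

```python
def bandwidth_bound_mlups(q, bandwidth_gb_s, bytes_per_value=4):
    """Upper bound on million lattice updates per second for a Q-velocity model."""
    bytes_per_update = 2 * q * bytes_per_value   # read + write of each population
    return bandwidth_gb_s * 1e9 / bytes_per_update / 1e6


# K5000M bandwidth figures quoted in the hardware slide.
for label, bw in (("measured", 66), ("theoretical", 96)):
    print(label,
          "D2Q9:", round(bandwidth_bound_mlups(9, bw), 1),
          "D3Q19:", round(bandwidth_bound_mlups(19, bw), 1))
```

With the measured 66 GB/sec this gives roughly 917 MLUPS for D2Q9 and 434 MLUPS for D3Q19, which is why comparing codes across GPUs only makes sense once results are scaled by bandwidth.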
Implementation & Performance – IB
• Transactions involving information for each boundary point are coalesced: each point only needs information about itself.
• Transactions for moving data between fluid and boundary are random: much higher cache use, ≈40%.
[Figure labels: Lagrange information, Fluid information]
Implementation & Performance – IB
• In 2D we only use a few hundred Lagrange markers per object.
• We can assign one block of threads per object (1024 points max).
• In 3D several thousand Lagrange markers are needed (4000 for the sphere demonstration).
• We need to launch one kernel per object.
• Launching kernels in different streams can improve the utilisation of the GPU, if the objects are small.
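The switch from one block per object in 2D to one kernel per object in 3D follows from the 1024-threads-per-block limit. A small sketch of the arithmetic, assuming one thread per Lagrange marker (the helper name is hypothetical):

```python
def blocks_needed(n_markers, threads_per_block=1024):
    """Ceiling division: thread blocks required for one thread per marker."""
    return -(-n_markers // threads_per_block)


print(blocks_needed(300))    # a 2D object with a few hundred markers
print(blocks_needed(4000))   # the 3D sphere demonstration
```

A 2D object fits inside a single block, so its markers can cooperate through block-level synchronisation. The 4000-marker sphere spans four blocks, so it needs a kernel launch of its own.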
Performance – Immersed Boundary
• 3.4ms in serial
Summary
• Lattice Boltzmann-Immersed Boundary solvers presented in 2D and 3D. Relatively simple alternative to unstructured domains.
Both methods suit parallelisation.
• Real-time simulations possible thanks to GPU acceleration.
• Don’t always need high occupancy and use of shared/cache memory to achieve high performance.
Future work
“Interactive in-silico Platform for Optimising Surgical Procedure of Abdominal
Aortic Aneurysm Repair and Evaluation of Stent Performance.”
• Personalised surgery simulation for stent implants – i.e. Real-time with user
interaction.
• Medical images converted into CAD designs and imported into fluids solver as an
immersed boundary.
• A few years away, but this work lays the foundations for such a project.
Any Questions?
www.mark.j.mawson.blogspot.com
http://www.youtube.com/user/mjmawson
http://www.youtube.com/mcji8ar2
References
• Bhatnagar, P. & Gross, E., 1954. A model for collision processes in gases. I. Small amplitude processes in charged and neutral one-component systems.
• Guo, Z., Zheng, C. & Shi, B., 2002. Discrete lattice effects on the forcing term in the lattice Boltzmann method.
• Roma, A. M., Peskin, C. S., Berger, M. J., 1999. An adaptive version of the immersed boundary method. Journal of Computational Physics 153, 509 – 534.
• Volkov, V., 2010, Better Performance at Lower Occupancy, GTC 2010.