FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG

INSTITUT FÜR INFORMATIK (MATHEMATISCHE MASCHINEN UND DATENVERARBEITUNG)

Lehrstuhl für Informatik 10 (Systemsimulation)

Smoothed Particle Hydrodynamics

Real-Time Fluid Simulation Approach

David Staubach

Bachelor Thesis

Supervisor: Prof. Dr. Ulrich Rüde

Advisor: Dr.-Ing. Klaus Iglberger

Working period: 28.04.2010 – 22.12.2010

Declaration:

I affirm that I have written this thesis without outside help and without using any sources other than those indicated, and that the thesis has not previously been submitted in the same or a similar form to any other examination authority and been accepted by it as part of an examination. All passages that were taken over verbatim or in substance are marked as such.

Erlangen, 22.12.2010

Abstract

This bachelor thesis covers an extensive introduction to the smoothed particle hydrodynamics method as well as a thorough implementation guideline and a discussion of its real-time performance. We provide insight into the fluid dynamics of this particle-based approach and compare it to the still commonly applied mesh-based methods. Next to the SPH fluid mechanics, the handling of collisions between the fluid particles and rigid bodies poses a rather interesting application field for this thesis. Derived from the introduced SPH terms, a well-illustrated, conceptual implementation tutorial is presented. The implementation section comprises specific data structures and algorithms to reduce the computational complexity of the particle-particle interactions. Besides the applied optimizations, the implementation part is completed with an approach to parallel programming using Open Multi-Processing to achieve shorter simulation run-times. Finally, the results of different simulation scenarios and the performance impact of multi-threaded execution with regard to real-time capabilities are discussed.

Summary

This bachelor thesis comprises a detailed introduction to the smoothed particle hydrodynamics method, a detailed description of the implementation, and an investigation of the method's real-time capability. It also provides insight into the fluid mechanics of this particle-based approach and a comparison with the grid-based methods that have predominantly been used so far. Besides the SPH fluid dynamics, the resolution of collisions between particles and rigid bodies is an interesting field of application of this work. Derived from the SPH quantities introduced, a clear and comprehensible programming tutorial is described. The implementation chapter itself contains algorithms and data structures for reducing the computational cost of the particle-particle interactions. Apart from the applied optimization techniques, the chapter concludes with a way of parallelizing the simulation program using Open Multi-Processing. Finally, the results of different simulation scenarios as well as detailed run-time measurements for different levels of optimization are discussed with regard to real-time capability.

Contents

List of Figures
List of Algorithms
List of Tables
List of Abbreviations

1 Introduction
  1.1 Computational Fluid Dynamics
  1.2 Goals / Preview
2 Smoothed Particle Hydrodynamics
  2.1 SPH Essentials
  2.2 The Smoothing Kernel
    2.2.1 Example: Isotropic Gaussian kernel
  2.3 Fluid Dynamics
    2.3.1 Mass-Density
    2.3.2 Internal Force Fields
      2.3.2.1 Pressure
      2.3.2.2 Viscosity
    2.3.3 External Force Fields
      2.3.3.1 Gravity
      2.3.3.2 Surface Tension
  2.4 Leap-Frog Time Integration Scheme
  2.5 Collision Handling with Rigid Bodies
    2.5.1 Collision Detection
      2.5.1.1 Sphere
      2.5.1.2 Box
    2.5.2 Collision Response
      2.5.2.1 Hybrid Impulse-Projection Method
    2.5.3 Discussion
3 Implementation
  3.1 Optimization Techniques
    3.1.1 Linked Cell Method
    3.1.2 Optimization of Memory Access
    3.1.3 Discussion and Further Improvements
  3.2 Simulation Parameters
  3.3 Implementation Guideline
    3.3.1 Initialization of the Simulation
    3.3.2 Evaluation of the Mass Density
    3.3.3 Evaluation of the Pressure Field
    3.3.4 Evaluation of the Internal and External Forces
    3.3.5 Leap-Frog Scheme and Collision Handling
    3.3.6 Linked Cell Update
    3.3.7 Visualization
  3.4 Parallel Programming using OpenMP
4 Discussion of Results
  4.1 Performance
  4.2 Simulation Issues
5 Conclusion
Bibliography
A Math Reference
B Classical CFD Equations
C Kernel Functions

List of Figures

1.1 Eulerian Fluid
1.2 Lagrangian Fluid
2.1 Isotropic Gaussian Kernel
2.2 Generation of repulsive and attractive Forces
2.3 Drop Clustering due to Surface Minimization
2.4 Effect of Surface Curvature
2.5 Boundary versus Obstacle Primitive
2.6 Depiction of different Coordinate Systems
2.7 Issue of Collision Handling
3.1 Main Idea of the Linked Cell Method
3.2 Optimized Linked Cell Support Region
3.3 Description of Particle Reference List
3.4 Fork Mechanism of OpenMP
4.1 Visualization of Falling Water into a Bowl
4.2 Visualization of a Dam-Break
4.3 Visualization of an Obstacle Collision
4.4 Run-Time of the SPH Solver
4.5 Run-Time improvement using OpenMP
4.6 Visualization of the Volume Conservation Issue
C.1 Poly6 Kernel
C.2 Spiky Kernel
C.3 Viscosity Kernel

List of Algorithms

3.1 C++ General Definitions
3.2 Index-Function for the Cell-Iteration
3.3 Exploit of Symmetrical Force
3.4 OpenMP for-Directive
3.5 Particle-loop using OpenMP
3.6 Linked Cell-loop using OpenMP

List of Tables

3.1 General Parameters
3.2 Water Parameters

List of Abbreviations

API: Application Programming Interface
CFD: Computational Fluid Dynamics
CPU: Central Processing Unit
LC: Linked Cell
LCS: Local Coordinate System
MPI: Message Passing Interface
OBB: Oriented Bounding Box
OpenMP: Open Multi-Processing
POV-Ray: Persistence of Vision Raytracer
SPH: Smoothed Particle Hydrodynamics
STL: Standard Template Library
WCS: World Coordinate System

Chapter 1

Introduction

The graphical representation of fluid dynamics is a topic in great demand in computer science and particularly in computational engineering. The application areas of fluid simulation are very diverse and cover scientific fields such as astrophysics, meteorology, and medical engineering. However, even for mundane scenarios in our everyday environment, research on fluid dynamics is worthwhile for progress in technology and science. The different kinds of fluid simulation problems vary in their computational complexity and in whether they demand physical correctness or rather real-time interactivity. Obviously, physical correctness of the fluid flow is in most cases obligatory for any medical or mechanical engineering application, while a computer game graphics developer has to focus on real-time capabilities.

Generally, no method combines full physical correctness with real-time interactivity; there is always a trade-off between the two properties. Therefore, it is indispensable for any computational engineer to examine the various fluid dynamics methods and evaluate their suitability for the simulation requirements of a given application field.

1.1 Computational Fluid Dynamics

The numerical methods mentioned in the introduction belong to the field of Computational Fluid Dynamics (CFD) and are used to solve certain fluid mechanics equations. Typically these are the Navier-Stokes equations (B.1) or the Euler equations (B.3). As already mentioned, many CFD methods are in use, and each has different properties in terms of performance and accuracy. It is common to distinguish between mesh-based and particle-based methods or, from the fluid's point of view, between Eulerian and Lagrangian fluids. While the Eulerian method corresponds to mesh-based solvers, an implementation of a Lagrangian fluid can be either mesh-based or particle-based. In the Eulerian view the simulation domain is discretized by a fixed mesh, and the fluid's properties affecting its dynamics can only be calculated at the discrete grid points of the domain.

Figure 1.1: The Eulerian fluid description in two dimensions, showing Fluid 1 and Fluid 2 (e.g. air). The fluid's properties are evaluated at the grid points. The problem of the grid itself becomes even clearer with such a coarse grid example: the fluid is constrained to exist only at the discrete grid cells, and a smooth transition between both fluids is not possible.

The Lagrangian view describes the fluid as a discrete number of particles that model the fluid flow as such and carry all of the fluid's quantities along with them. The particles are able to move freely through the whole domain depending on the forces acting in the scenario; hence, the fluid can exist at any location in the simulation domain. Figures 1.1 and 1.2 depict the two fluid views in a very simplified manner. Even though Eulerian fluid solvers provide more accurate results for some fluid quantities such as the mass-density or the pressure field, their worst drawback is the mesh itself. Namely, the fluid properties are constrained to exist only at the discrete grid points, which leads to unrealistic fluid flows in some situations, e.g. free surface flows. If this problem is addressed by refining the mesh more and more, the memory overhead of the grid and the growing computational cost caused by the larger number of grid points rule out simulation at real-time rates. Well-known examples of mesh-based solvers are the finite difference method, the finite element method, the finite volume method, and the lattice Boltzmann methods. Further theoretical details and implementation examples of Eulerian fluids are given in [ESHD05].

The performance of particle-based solvers strongly depends on the number of simulated particles and the algorithms used to optimize the execution of the simulation. However, particle-based methods generally have an advantage over mesh-based methods in terms of real-time interactivity. Hence, the examination of particle-based fluid flow solvers, particularly with regard to reducing computational complexity and improving efficiency, poses an interesting modern research field.

Figure 1.2: The Lagrangian fluid description in two dimensions, showing the fluid particles and the surrounding fluid (e.g. air). The particles define the fluid as such and are able to float freely through the simulation domain. Each particle carries the necessary fluid mechanical quantities.

1.2 Goals / Preview

This bachelor thesis is intended to provide insight into the computational simulation of fluid dynamics using the smoothed particle hydrodynamics method. The main focus lies on the real-time interactivity of the implemented solver and on the interaction between the fluid particles and static rigid bodies that model obstacles or boundary layers. Next to real-time performance, the physical accuracy of the simulation parameters, and accordingly of the fluid dynamics, is of significant importance for the results of this work.

Chapter 2 gives an introduction to the origin of the SPH method as well as the theoretical background and derivation of the fluid mechanics equations and SPH formalisms. Moreover, the applied numerical time integration scheme and a detailed collision handling routine are presented. Chapter 3 comprises the practical part of this work, with a thorough description of the algorithms and techniques used to reduce the computational cost of the developed step-by-step implementation guideline. Furthermore, an approach for parallel execution on multi-core architectures using the OpenMP API is presented as part of the implementation chapter. Although our implementation is only intended for simulation scenarios with water, the SPH method provides the common basics for other fluids, such as mucus or steam, as well. Finally, Chapter 4 collects the most important results of this work and concludes with a discussion of prior expectations and implementation issues.

Chapter 2

Smoothed Particle Hydrodynamics

The first appearance of the smoothed particle hydrodynamics method goes back to 1977, when it was used for computational research in astrophysics. The originators of the method are Bob Gingold and Joe Monaghan [GiMo77] as well as Leon Lucy [Luc77], who worked on it separately. SPH was originally designed for compressible flow problems [Mon92] and has since been applied in various fields of scientific research including astrophysics, ballistics, volcanology, and oceanography. The SPH method takes its name from the term fluid dynamics, also known as hydrodynamics, and was adopted by the computer graphics community in 1995 [StFi95].

The following Sections 2.1 and 2.2 deal with the basic derivation of the SPH terms and describe the necessary characteristics of an exemplary smoothing kernel function, which is decisive for the quality of the method. In general, this chapter is based on [Kel06].

2.1 SPH Essentials

As already mentioned, SPH is a particle-based Lagrangian method; it therefore introduces particles to define the simulated fluid and to evaluate the arising dynamics. These so-called smoothed particles, represented as point masses in the simulation domain Ω, carry information such as position and velocity as well as certain densities and force fields that describe the fluid's properties and effects at the respective particle's location in the domain. As the name of the method suggests, the properties of a fluid particle are smoothed out over the neighbouring particles within a support region of radius h.

The integral interpolant of any continuous field quantity A is

$$A_I(\mathbf{x}) = \int_{\Omega} A(\mathbf{x}')\, W(\mathbf{x} - \mathbf{x}', h)\, d\mathbf{x}', \tag{2.1}$$

where x is the position vector in Ω and W is a smoothing kernel function with the parameter h as smoothing length.

The numerical equivalent to (2.1) is

$$A_S(\mathbf{x}) = \sum_{\forall j} A_j V_j\, W(\mathbf{x} - \mathbf{x}_j, h), \tag{2.2}$$

where the integral is approximated by a summation interpolant and j iterates over every single particle. $V_j$ represents the theoretical volume of a single particle j. Considering the general relation between volume V, mass m and mass-density ρ,

$$V = \frac{m}{\rho}, \tag{2.3}$$

the basic SPH formulation is derived by substituting the volume term in (2.2) with the right-hand side of (2.3):

$$A_S(\mathbf{x}) = \sum_{\forall j} A_j \frac{m_j}{\rho_j}\, W(\mathbf{x} - \mathbf{x}_j, h). \tag{2.4}$$
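As a minimal sketch of how the summation interpolant (2.4) might translate into code, the following C++ fragment evaluates a scalar field at an arbitrary position by brute force. The names `Vec3`, `Particle` and `interpolate` are illustrative assumptions and are not taken from the thesis implementation; the kernel is passed as a callable of the distance and h, which assumes an isotropic kernel.

```cpp
#include <cmath>
#include <vector>

// Hypothetical minimal types, for illustration only.
struct Vec3 { double x, y, z; };

struct Particle {
    Vec3   pos;      // position x_j
    double mass;     // m_j
    double density;  // rho_j
    double value;    // field quantity A_j carried by the particle
};

// Evaluates A_S(x) = sum_j A_j * (m_j / rho_j) * W(||x - x_j||, h), cf. (2.4).
// W is any callable taking the distance and the smoothing length h.
template <typename Kernel>
double interpolate(const Vec3& x, const std::vector<Particle>& particles,
                   double h, Kernel W) {
    double sum = 0.0;
    for (const Particle& p : particles) {
        const double dx = x.x - p.pos.x;
        const double dy = x.y - p.pos.y;
        const double dz = x.z - p.pos.z;
        const double r  = std::sqrt(dx * dx + dy * dy + dz * dz);
        sum += p.value * (p.mass / p.density) * W(r, h);
    }
    return sum;
}
```

Because every particle is visited for every evaluation, this sketch costs O(n) per query; it only illustrates the formula and ignores any neighbour search.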

The gradient (A.3) and the Laplacian (A.4) of (2.4) can be set up as follows. We can take advantage of the fact that $A_j \frac{m_j}{\rho_j}$ does not depend on x and hence is not affected by differentiation. It turns out that only the kernel function W is decisive for the gradient and the Laplacian.

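The gradient of (2.4) is

$$\nabla A_S(\mathbf{x}) = \sum_{\forall j} A_j \frac{m_j}{\rho_j}\, \nabla W(\mathbf{x} - \mathbf{x}_j, h). \tag{2.5}$$

Furthermore, there is another version of the gradient, introduced and derived in [Kel06], that is supposed to yield more accurate results:

$$\nabla A_S(\mathbf{x}) = \rho \sum_{\forall j} \left( \frac{A_j}{\rho_j^2} + \frac{A}{\rho^2} \right) m_j\, \nabla W(\mathbf{x} - \mathbf{x}_j, h). \tag{2.6}$$

The Laplacian of (2.4) finally yields

$$\nabla^2 A_S(\mathbf{x}) = \sum_{\forall j} A_j \frac{m_j}{\rho_j}\, \nabla^2 W(\mathbf{x} - \mathbf{x}_j, h). \tag{2.7}$$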

2.2 The Smoothing Kernel

The main requirement of the kernel function W is to smooth out a particle's contribution to an arbitrary SPH quantity field A(x) depending on the distance to the respective particle. The farther apart two particles are, the smaller their mutual influence has to be.

We have already mentioned the importance of the smoothing kernel for the SPH formulation and especially its derivatives. Which kernel function provides the most accurate and realistic results for a certain quantity field depends very much on the function's characteristics. In general, however, a smoothing kernel function has to fulfill the following properties [Mon92]:

$$\int_{\Omega} W(\mathbf{r}, h)\, d\mathbf{r} = 1 \tag{2.8}$$

as well as

$$\lim_{h \to 0} W(\mathbf{r}, h) = \delta(\mathbf{r}) \tag{2.9}$$

with δ(r) as the Dirac delta function (A.1).

Further required characteristics are

$$W(\mathbf{r}, h) \ge 0 \tag{2.10}$$

and

$$W(\mathbf{r}, h) = W(-\mathbf{r}, h). \tag{2.11}$$

In short, a suitable kernel function has to be normalized (2.8), non-negative (2.10) and symmetric, i.e. even (2.11).

The parameter h, already introduced as the smoothing length or support radius, is of essential importance for the computational complexity and physical accuracy of any SPH fluid quantity. The smoothing length specifies the radius around an observed particle i within which another particle j needs to be located in order to interact with particle i. If the distance between particle i and particle j is larger than the support radius, the kernel is supposed to return zero, meaning

$$W(\mathbf{r}, h) = 0, \quad \|\mathbf{r}\| > h. \tag{2.12}$$

The support radius h has to be chosen such that, on the one hand, it is large enough to let a suitable number of particles interact with each other and, on the other hand, it is small enough to limit the computational cost of the SPH contributions.

2.2.1 Example: Isotropic Gaussian kernel

In order to visualize some of the aforementioned characteristics of a smoothing kernel, the isotropic Gaussian kernel is shown here as an exemplary kernel function, given by

$$W_{\text{Gaussian}}(\mathbf{r}, h) = \frac{1}{(2\pi h^2)^{3/2}} \exp\left( -\frac{\|\mathbf{r}\|^2}{2h^2} \right), \quad h > 0. \tag{2.13}$$

Figure 2.1 shows the kernel's curve in one dimension for h = 1.

Figure 2.1: Isotropic Gaussian kernel in one dimension, h = 1

However, the isotropic Gaussian kernel does not fulfill the constraint (2.12), because it only approaches zero asymptotically. Furthermore, the computational cost of evaluating the exponential term does not meet our goal of real-time interactivity. Because of these unsuitable properties, the isotropic Gaussian kernel is not used as the default kernel in this thesis; instead, we use the 6th degree polynomial kernel (C.1) suggested by [MCGr03].
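To make the difference between the two kernels concrete, the following C++ sketch implements the Gaussian kernel (2.13) next to the 6th degree polynomial kernel and checks the normalization condition (2.8) numerically. The polynomial kernel is written in the form published in [MCGr03]; that this form coincides with (C.1) is an assumption here, and the function names are hypothetical.

```cpp
#include <cmath>

const double kPi = 3.14159265358979323846;

// Isotropic Gaussian kernel, Equation (2.13); positive for every finite r,
// so it violates the compact support condition (2.12).
double gaussianKernel(double r, double h) {
    return std::exp(-(r * r) / (2.0 * h * h)) / std::pow(2.0 * kPi * h * h, 1.5);
}

// 6th degree polynomial kernel in the form given in [MCGr03]
// (assumed to match (C.1)); exactly zero outside the support radius.
double poly6Kernel(double r, double h) {
    if (r < 0.0 || r > h) return 0.0;  // compact support, Equation (2.12)
    const double d = h * h - r * r;
    return 315.0 / (64.0 * kPi * std::pow(h, 9)) * d * d * d;
}

// Midpoint-rule approximation of the 3D volume integral of an isotropic
// kernel: the integral of W(r, h) * 4*pi*r^2 over r in [0, rMax].
template <typename Kernel>
double volumeIntegral(Kernel W, double h, double rMax, int steps) {
    const double dr = rMax / steps;
    double sum = 0.0;
    for (int i = 0; i < steps; ++i) {
        const double r = (i + 0.5) * dr;
        sum += W(r, h) * 4.0 * kPi * r * r * dr;
    }
    return sum;
}
```

Calling `volumeIntegral(poly6Kernel, h, h, steps)` should return a value close to 1, while the Gaussian kernel needs an integration range of several h to get there. The polynomial kernel also avoids the `std::exp` call and only needs the squared distance, which is what makes it attractive for real-time use.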

2.3 Fluid Dynamics

This section covers the fluid mechanical theory and the quantity fields, calculated with the SPH method, that are needed to obtain the properties and dynamics of the simulated fluid. Based on the Navier-Stokes equations for incompressible fluids and the continuity equation (B.2), we can set up the general fluid dynamics equation for our simulation. Using the Lagrangian fluid view, we can simplify the standard equations in the following way. Because our particles have a fixed mass, the continuity equation, which defines mass conservation, is always satisfied as long as we do not change the number of particles in our closed simulation system. Moreover, we can neglect the spatial derivative of the velocity on the left-hand side, because the particles move with the fluid and hence the quantities they carry are independent of x. Thus, the left-hand side reduces to the time derivative of the velocity, multiplied by the mass-density ρ described in Subsection 2.3.1. We will not examine the fluid mechanical background in more detail, but focus on the resulting Lagrangian formulation for an incompressible fluid,

$$\rho \frac{d\mathbf{v}}{dt} = -\nabla p + \mu \nabla^2 \mathbf{v} + \mathbf{f}, \tag{2.14}$$

where, on the right-hand side, the pressure term $-\nabla p$ and the viscosity term $\mu \nabla^2 \mathbf{v}$ model the internal force fields, while f represents the external force field contributions. We obtain the total force field acting on a particle i by summing its internal and external force fields,

$$\mathbf{F}_i = \mathbf{f}_i^{\text{internal}} + \mathbf{f}_i^{\text{external}}. \tag{2.15}$$

An equivalent reformulation of Equation (2.14) leads to the definition of the acceleration $\mathbf{a}_i$ of an arbitrary particle i,

$$\mathbf{a}_i = \frac{d\mathbf{v}_i}{dt} = \frac{\mathbf{F}_i}{\rho_i}. \tag{2.16}$$


The next subsections cover the description and calculation of the individual quantities and force fields needed to solve Equation (2.16).

2.3.1 Mass-Density

Generally, the mass-density is defined as mass per unit volume, ρ = m/V. For water, the mass-density is about 1000 kg/m³. However, in this subsection we are not interested in the mass-density of a certain fluid as such; we need to determine the mass-density at the location of a given particle i. In this form, the mass-density indicates the concentration of adjacent particles j around the position of particle i. The more particles occupy the same region, the higher the mass-density in this region.

Using the general SPH formulation (2.4), the mass-density of particle i yields

$$\rho_i(\mathbf{x}_i) = \sum_{\forall j} \rho_j \frac{m_j}{\rho_j}\, W(\mathbf{x}_i - \mathbf{x}_j, h) = \sum_{\forall j} m_j\, W(\mathbf{x}_i - \mathbf{x}_j, h), \tag{2.17}$$

where W is our default kernel. The mass-density has to be calculated before any further SPH computation, because each of the following quantity fields depends on the adjacent particles' mass-density.
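The density summation (2.17) can be sketched in C++ as follows. This is a brute-force all-pairs loop for illustration only; the type and function names are hypothetical, and the kernel is the polynomial kernel in the form of [MCGr03], assumed to match the default kernel (C.1).

```cpp
#include <cmath>
#include <vector>

const double kPi = 3.14159265358979323846;

struct Vec3 { double x, y, z; };

struct Particle {
    Vec3   pos;      // position x_i
    double mass;     // m_i
    double density;  // rho_i, written by computeDensities
};

// Euclidean distance between two positions.
double dist(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Polynomial kernel as published in [MCGr03] (assumed to match (C.1)).
double poly6Kernel(double r, double h) {
    if (r < 0.0 || r > h) return 0.0;
    const double d = h * h - r * r;
    return 315.0 / (64.0 * kPi * std::pow(h, 9)) * d * d * d;
}

// Equation (2.17): rho_i = sum_j m_j * W(x_i - x_j, h).
// The sum runs over all j, so each particle also contributes to itself.
void computeDensities(std::vector<Particle>& particles, double h) {
    for (Particle& pi : particles) {
        double rho = 0.0;
        for (const Particle& pj : particles) {
            rho += pj.mass * poly6Kernel(dist(pi.pos, pj.pos), h);
        }
        pi.density = rho;
    }
}
```

The self-contribution at distance zero guarantees a strictly positive density even for an isolated particle, which matters because later steps divide by $\rho_j$.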

2.3.2 Internal Force Fields

The internal force fields are the pressure and viscosity forces introduced in (2.14). These forces originate within the fluid itself and hence arise only from the interaction between the particles. Both internal components are obtained by the SPH formalism. However, in contrast to the mass-density calculation, the default kernel does not provide realistic results for them. Consequently, two different kernel functions are applied to model the pressure and viscosity forces properly.

2.3.2.1 Pressure

The pressure force is the main source of repulsive and attractive forces between adjacent particles. In a location where many particles are densely packed, the pressure field rises and consequently the pressure force generates repulsive forces that push the particles apart. Conversely, particles in a sparsely populated region experience an attractive force towards each other. Figure 2.2 depicts the different particle constellations and the resulting repulsion and attraction between the particles modeled by the pressure force.

Figure 2.2: Repulsive forces in dense particle regions (left), attractive forces in sparse regions (right), and a balanced arrangement in between.

First, we have to determine the pressure field p. The ideal gas law, which gives particle i's pressure field as

$$p_i = k \rho_i, \tag{2.18}$$

is a simple way to obtain the pressure field from the gas stiffness constant k and the mass-density $\rho_i$. However, the standard ideal gas law does not produce attractive forces, because the pressure field is always positive. This may be adequate for gaseous fluids, which tend to diffuse into the simulation domain, but it is not applicable to liquids such as water. Since we want to model negative pressure fields as well, [DeCa96] introduces a modified version of Equation (2.18),

$$p_i = k (\rho_i - \rho_0), \tag{2.19}$$

where $\rho_0$ is another parameter given in Table 3.2, known as the rest density of the fluid.

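Now that we have the pressure field, the pressure force can easily be set up using the basic gradient formulation (2.5) with A(x) taken as the pressure field p(x):

$$\mathbf{f}_i^{\text{pressure}}(\mathbf{x}_i) = -\nabla p(\mathbf{x}_i) = -\sum_{\forall j \neq i} p_j \frac{m_j}{\rho_j}\, \nabla W(\mathbf{x}_i - \mathbf{x}_j, h). \tag{2.20}$$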


In [Kel06] it is stated that this form of the pressure force calculation does not produce symmetric forces, because the contributions of the two interacting particles do not fulfill Newton's third law of motion. In other words, since $p_j$ and $p_i$ as well as $\rho_j$ and $\rho_i$ are generally not equal, the action-reaction law is not conserved by (2.20). Using the reformulated gradient equation (2.6) instead of the basic form (2.5), the SPH equation for the pressure force becomes

$$\mathbf{f}_i^{\text{pressure}}(\mathbf{x}_i) = -\rho_i \sum_{\forall j \neq i} \left( \frac{p_i}{\rho_i^2} + \frac{p_j}{\rho_j^2} \right) m_j\, \nabla W(\mathbf{x}_i - \mathbf{x}_j, h), \tag{2.21}$$

which conserves symmetry and momentum, scaled by particle i's mass-density. Another symmetric equation for particle i's pressure force is introduced by [MCGr03] and employs the arithmetic mean of the respective particles' pressure fields $p_i$ and $p_j$:

$$\mathbf{f}_i^{\text{pressure}}(\mathbf{x}_i) = -\sum_{\forall j \neq i} \frac{p_i + p_j}{2} \frac{m_j}{\rho_j}\, \nabla W(\mathbf{x}_i - \mathbf{x}_j, h). \tag{2.22}$$

As already mentioned, our default kernel is not suitable for the pressure force calculation, because its gradient tends to zero when adjacent particles are very close to each other. Hence, the particles would not experience the desired repulsive forces in this situation, which could lead to a coalescence of particles. To prevent clustering, we choose the spiky kernel (C.4) from [MCGr03] to smooth out the pressure force contributions. In contrast to the 6th degree polynomial kernel, the spiky kernel's gradient preserves the repulsive forces when two nearby particles approach each other. Equation (2.23) describes the spiky kernel's behaviour for small distances:

$$|\nabla W_{\text{spiky}}(\mathbf{r}, h)| \to \frac{45}{\pi h^6}, \quad \|\mathbf{r}\| \to 0. \tag{2.23}$$
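The pressure computation (2.19) and the symmetrized pressure force (2.22) can be sketched in C++ as below. The spiky kernel gradient is written in the form published in [MCGr03]; that it corresponds to (C.4) is an assumption, as are all type and function names. Densities are assumed to have been computed beforehand via (2.17).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

const double kPi = 3.14159265358979323846;

struct Vec3 { double x, y, z; };

struct Particle {
    Vec3   pos;
    double mass;
    double density;        // rho_i, assumed already computed via (2.17)
    double pressure;       // p_i, written by computePressureForces
    Vec3   pressureForce;  // f_i^pressure, written by computePressureForces
};

// Gradient of the spiky kernel as published in [MCGr03] (assumed to match
// (C.4)); r is the vector x_i - x_j.
Vec3 spikyGradient(const Vec3& r, double h) {
    const double len = std::sqrt(r.x * r.x + r.y * r.y + r.z * r.z);
    if (len <= 0.0 || len > h) return {0.0, 0.0, 0.0};  // direction undefined at r = 0
    const double c = -45.0 / (kPi * std::pow(h, 6)) * (h - len) * (h - len) / len;
    return {c * r.x, c * r.y, c * r.z};
}

// Pressure from (2.19), then the arithmetic-mean pressure force (2.22).
void computePressureForces(std::vector<Particle>& ps, double h,
                           double k, double rho0) {
    for (Particle& p : ps) p.pressure = k * (p.density - rho0);
    for (std::size_t i = 0; i < ps.size(); ++i) {
        Vec3 f = {0.0, 0.0, 0.0};
        for (std::size_t j = 0; j < ps.size(); ++j) {
            if (j == i) continue;
            const Vec3 r = {ps[i].pos.x - ps[j].pos.x,
                            ps[i].pos.y - ps[j].pos.y,
                            ps[i].pos.z - ps[j].pos.z};
            const Vec3 g = spikyGradient(r, h);
            const double s = -0.5 * (ps[i].pressure + ps[j].pressure)
                             * ps[j].mass / ps[j].density;
            f.x += s * g.x;  f.y += s * g.y;  f.z += s * g.z;
        }
        ps[i].pressureForce = f;
    }
}
```

With densities above the rest density the resulting force pushes neighbouring particles apart, and with densities below it the force pulls them together. In this form the gradient magnitude is 45 (h − ‖r‖)² / (πh⁶), which for ‖r‖ → 0 coincides with the value stated in (2.23) when h = 1.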

2.3.2.2 Viscosity

The viscosity term $\mu \nabla^2 \mathbf{v}$ in Equation (2.14) describes the fluid's internal resistance to flow. While a mucous fluid has a high viscosity, water, for example, is very thin and has a relatively low viscosity. In SPH terms, the viscosity force acting at a specific location can be calculated with the standard interpolation scheme

$$\mathbf{f}_i^{\text{viscosity}}(\mathbf{x}_i) = \mu \nabla^2 \mathbf{v}(\mathbf{x}_i) = \mu \sum_{\forall j \neq i} \mathbf{v}_j \frac{m_j}{\rho_j}\, \nabla^2 W(\mathbf{x}_i - \mathbf{x}_j, h), \tag{2.24}$$

where µ is the viscosity coefficient given in Table 3.2, describing the viscosity of the fluid as such. However, similar to Equation (2.20), the standard version does not provide symmetric results, because particle i and particle j do not usually have identical velocities. Using the difference between both velocities, we can easily symmetrize Equation (2.24) and end up with

$$\mathbf{f}_i^{\text{viscosity}}(\mathbf{x}_i) = \mu \sum_{\forall j \neq i} (\mathbf{v}_j - \mathbf{v}_i) \frac{m_j}{\rho_j}\, \nabla^2 W(\mathbf{x}_i - \mathbf{x}_j, h), \tag{2.25}$$

as also introduced in [MCGr03].

For the stability of the simulation it is necessary that the Laplacian of the smoothing kernel W is positive for ‖r‖ < h. Without this condition we could not ensure that the viscosity force only has a damping effect on the velocity, and hence it would be possible to introduce energy and instability into the simulation. Therefore, neither the default kernel nor the spiky kernel is an adequate kernel function for the viscosity force evaluation. To solve these stability issues, [MCGr03] presents the viscosity kernel function (C.7), which we apply as the smoothing kernel for the viscosity force calculation.
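A possible C++ sketch of the symmetrized viscosity force (2.25) is given below. The Laplacian of the viscosity kernel is written in the form published in [MCGr03]; that it corresponds to (C.7) is an assumption, and the names are again hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

const double kPi = 3.14159265358979323846;

struct Vec3 { double x, y, z; };

struct Particle {
    Vec3   pos;
    Vec3   vel;             // v_i
    double mass;
    double density;         // rho_i, assumed already computed via (2.17)
    Vec3   viscosityForce;  // f_i^viscosity, written by computeViscosityForces
};

// Laplacian of the viscosity kernel as published in [MCGr03] (assumed to
// match (C.7)); it is non-negative everywhere inside the support radius.
double viscosityLaplacian(double r, double h) {
    if (r < 0.0 || r > h) return 0.0;
    return 45.0 / (kPi * std::pow(h, 6)) * (h - r);
}

// Equation (2.25): f_i = mu * sum_{j != i} (v_j - v_i) * (m_j / rho_j) * lap W.
void computeViscosityForces(std::vector<Particle>& ps, double h, double mu) {
    for (std::size_t i = 0; i < ps.size(); ++i) {
        Vec3 f = {0.0, 0.0, 0.0};
        for (std::size_t j = 0; j < ps.size(); ++j) {
            if (j == i) continue;
            const double dx = ps[i].pos.x - ps[j].pos.x;
            const double dy = ps[i].pos.y - ps[j].pos.y;
            const double dz = ps[i].pos.z - ps[j].pos.z;
            const double r  = std::sqrt(dx * dx + dy * dy + dz * dz);
            const double s  = mu * ps[j].mass / ps[j].density
                              * viscosityLaplacian(r, h);
            f.x += s * (ps[j].vel.x - ps[i].vel.x);
            f.y += s * (ps[j].vel.y - ps[i].vel.y);
            f.z += s * (ps[j].vel.z - ps[i].vel.z);
        }
        ps[i].viscosityForce = f;
    }
}
```

Because the Laplacian is non-negative, each pair contribution acts along the relative velocity: a slower particle is dragged along by a faster neighbour and the faster one is slowed down, so velocity differences are only ever damped.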

2.3.3 External Force Fields

External force fields do not arise from within the fluid; they are caused by the physical laws of the fluid's environment. Two external forces are explicitly necessary to simulate a liquid like water: the gravity force and the surface tension force. For gaseous materials, there would be no surface tension force, but a buoyancy force term instead. As already mentioned, the external forces are introduced as f on the right-hand side of Equation (2.14).

2.3.3.1 Gravity

The gravity force field is not really one of the SPH quantities, as it is not obtained from the basic SPH equations. However, the impact of gravity is necessary for realistic simulation results. The gravitational force, acting equally on every particle i, is given by

$$\mathbf{f}_i^{\text{gravity}}(\mathbf{x}_i) = \rho_0 \mathbf{g}, \tag{2.26}$$

with the gravitational acceleration g defined in Table 3.1 and the rest density $\rho_0$ introduced above.

Figure 2.3: The particles on the surface are equally forced in the inward direction, forming a circle in 2D and a sphere in 3D.

2.3.3.2 Surface Tension

The surface tension force of a liquid, e.g. water, has several visible effects such as drop clustering on smooth waxy surfaces, the floating of a water strider, or the formation of interface tension forces that separate two different liquids. In our water simulation, the most suitable illustration of the surface tension force is the effect of drop clustering. Generally, surface tension forces are caused by unbalanced molecular forces at the free surface, i.e. the region between two different fluids such as water and air. In these regions the water molecules, represented by our particles, are forced to shift along the surface normal towards the liquid itself, which minimizes the curvature of the liquid's surface, as visualized in Figure 2.3. Therefore, the geometry formed by a liquid's surface always approximates a sphere, because a sphere has the smallest surface area for a given volume. In a very similar manner it is possible to explain why water drops roll off smooth surfaces such as glass panels or the hoods of automobiles.

Let us now come back to our SPH equations and to the force contribution of the surface tension. Usually, the surface tension force is not directly present in the Navier-Stokes equations but only affects the boundary conditions. In our case, the particles are well suited to modeling the boundary conditions, because the particles themselves define the liquid as such and therefore form its boundaries as well.

Obviously, the surface tension force is only present in regions near the liquid's surface, so first of all we need to determine whether a certain particle lies on the surface of the simulated liquid or not. For this purpose we use the color field c(x), which interpolates the constant value 1 carried by every particle, smoothed out by the SPH formula (2.4):

$$c_i(\mathbf{x}_i) = \sum_{\forall j} m_j \frac{1}{\rho_j}\, W(\mathbf{x}_i - \mathbf{x}_j, h) = \sum_{\forall j} \frac{m_j}{\rho_j}\, W(\mathbf{x}_i - \mathbf{x}_j, h). \tag{2.27}$$


Figure 2.4: The effect of the direction of the surface's curvature on the impact of the surface tension force. The positive curvature on the left side induces a stronger surface tension force than the negative surface curvature.

The resulting surface normal, introduced as a vector pointing towards the liquid, is actually the gradient of the color field (2.27) and hence

$$\mathbf{n}_i = \nabla c_i(\mathbf{x}_i) = \sum_{\forall j} \frac{m_j}{\rho_j} \nabla W(\mathbf{x}_i - \mathbf{x}_j, h). \tag{2.28}$$

The condition $\|\mathbf{n}_i\| > 0$ verifies that the particle i is located near or on the liquid's surface.

The intensity of the surface tension force depends on the direction of the curvature on the surface. Figure 2.4 illustrates that the influence of the surface tension force is stronger on parts of the surface with a positive curvature than on parts with a negative curvature. The quantity that measures the curvature of the surface and its effect on the impact of the force is given by

$$\kappa_i = -\frac{\nabla^2 c_i}{\|\mathbf{n}_i\|} = -\frac{\nabla \cdot \mathbf{n}_i}{\|\mathbf{n}_i\|}. \tag{2.29}$$

Now we have collected the necessary information to set up our surface tension force equation as presented in [MCGr03] as follows,

$$\mathbf{f}_i^{\text{surface}}(\mathbf{x}_i) = \sigma \kappa_i \mathbf{n}_i = -\sigma \nabla^2 c_i \frac{\mathbf{n}_i}{\|\mathbf{n}_i\|}, \tag{2.30}$$

where σ is the tension coefficient dependent on the simulated fluids sharing the free surface region.

The calculation of the surface tension force only produces adequate results, and is therefore only applied, if

$$\|\mathbf{n}_i\| \geq l, \tag{2.31}$$

where l is a threshold parameter given in Table 3.2. Otherwise, the evaluation of the term $\mathbf{n}_i / \|\mathbf{n}_i\|$ is not stable and may lead to numerical problems.
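The interplay of (2.30) and the threshold test (2.31) can be sketched in a few lines of C++. This is only a minimal illustration under assumptions: the names `Vec3`, `norm` and `surfaceTensionForce` are hypothetical, and the color field gradient and Laplacian are assumed to have been accumulated beforehand according to (2.28) and the corresponding kernel Laplacian sum.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;  // hypothetical helper type

// Euclidean norm of a 3D vector.
inline double norm(const Vec3& v) {
    return std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
}

// Surface tension force of one particle following (2.30) and (2.31).
//   n         : gradient of the color field at the particle, see (2.28)
//   lap_c     : Laplacian of the color field at the particle
//   sigma     : tension coefficient
//   threshold : parameter l of (2.31), assumed to be positive
inline Vec3 surfaceTensionForce(const Vec3& n, double lap_c,
                                double sigma, double threshold) {
    assert(threshold > 0.0);
    const double len = norm(n);
    if (len < threshold) {
        // (2.31) not fulfilled: skip the numerically unstable term n / |n|.
        return Vec3{0.0, 0.0, 0.0};
    }
    const double scale = -sigma * lap_c / len;  // -sigma * lap(c) / |n|
    return Vec3{scale * n[0], scale * n[1], scale * n[2]};
}
```

Because the threshold is positive, the division by the length of the normal is never evaluated for particles deep inside the liquid, where the normal nearly vanishes.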

2.4 Leap-Frog Time Integration Scheme

After the calculation of each of the necessary SPH quantities and force fields, the total force acting on particle i yields

$$\mathbf{F}_i = \mathbf{f}_i^{\text{pressure}} + \mathbf{f}_i^{\text{viscosity}} + \mathbf{f}_i^{\text{gravity}} + \mathbf{f}_i^{\text{surface}}. \tag{2.32}$$

Remembering the original equation (2.16) we wanted to solve with the SPH method, we have now gathered all terms to calculate each particle's current acceleration. The obtained acceleration $\mathbf{a}_i$ is then integrated numerically (2.16) to determine the particle's advanced position and velocity. There are three integration schemes presented in [Kel06], two of them being the implicit Euler and the Verlet method, but in this thesis we decided to apply the leap-frog scheme [Ebe03] only.

Generally, numerical time integration schemes are responsible for various phenomena in particle simulations, e.g. motion damping, complications with collision handling or stability issues. We have chosen the leap-frog scheme due to its superiority concerning performance and stability; furthermore, it provides the most realistic results for shorter time steps ∆t. The numerical integration step for a particle's velocity update using the leap-frog scheme yields

$$\mathbf{v}_{t+0.5\Delta t} = \mathbf{v}_{t-0.5\Delta t} + \Delta t\, \mathbf{a}_t \tag{2.33}$$

and for the position update it is

$$\mathbf{x}_{t+\Delta t} = \mathbf{x}_t + \Delta t\, \mathbf{v}_{t+0.5\Delta t}, \tag{2.34}$$

initialized once with the leap velocity on the negative time scale

$$\mathbf{v}_{-0.5\Delta t} = \mathbf{v}_0 - 0.5\, \Delta t\, \mathbf{a}_0. \tag{2.35}$$

The particle's current velocity at time t, which is necessary for the calculation of the fluid dynamical forces and the collision handling procedure in the next Section 2.5, can be approximated by the arithmetic mean of the leap velocities

$$\mathbf{v}_t \approx \frac{\mathbf{v}_{t-0.5\Delta t} + \mathbf{v}_{t+0.5\Delta t}}{2}. \tag{2.36}$$
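The update rules (2.33) to (2.36) can be summarized in a small C++ sketch. It is meant as an illustration under assumptions rather than as the implementation of the thesis: the structure `LeapFrogState` and the functions `initLeapFrog` and `leapFrogStep` are hypothetical names, and the acceleration is assumed to be supplied by the force computation (2.32).

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;  // hypothetical helper type

struct LeapFrogState {
    Vec3 x;       // position
    Vec3 v_half;  // leap velocity at the last half step
    Vec3 v;       // velocity approximated by (2.36)
};

// One-time initialization of the leap velocity on the negative time scale (2.35).
inline LeapFrogState initLeapFrog(const Vec3& x0, const Vec3& v0,
                                  const Vec3& a0, double dt) {
    assert(dt > 0.0);
    LeapFrogState s;
    s.x = x0;
    s.v = v0;
    for (int d = 0; d < 3; ++d) {
        s.v_half[d] = v0[d] - 0.5 * dt * a0[d];
    }
    return s;
}

// Advances one particle by dt. Afterwards x holds the position at t + dt,
// v_half the leap velocity at t + 0.5 dt and v the mean velocity at time t.
inline void leapFrogStep(LeapFrogState& s, const Vec3& a, double dt) {
    assert(dt > 0.0);
    for (int d = 0; d < 3; ++d) {
        const double v_prev = s.v_half[d];
        const double v_next = v_prev + dt * a[d];  // (2.33)
        s.x[d] += dt * v_next;                     // (2.34)
        s.v[d] = 0.5 * (v_prev + v_next);          // (2.36)
        s.v_half[d] = v_next;
    }
}
```

For a constant acceleration the scheme reproduces the exact parabola: starting at rest with a = (0, 0, -2) and ∆t = 0.5, the first step yields z = -0.25, which equals 0.5 · a · t².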

2.5 Collision Handling with Rigid Bodies

Concerning collision handling, we want to focus on collisions between the fluid particles and static rigid bodies only. The collision handling routine is split up into two different parts, namely the collision detection and the collision response. For reasons of simplicity and the reduction of computational costs, it is recommended to model the collision objects with standard implicit primitives, introduced in [Kel06], instead of the more advanced tetrahedral meshes. As mentioned above, the use of implicit primitives in our simulation is limited to rigid bodies. The two implemented primitives to choose from in the developed simulation are a sphere and a rectangular box of arbitrary size. While the box geometry is only implemented as a boundary primitive, also known as oriented bounding box, the spherical body can be applied in both ways, as a boundary object and as an arbitrarily positioned obstacle inside the simulation domain. Figure 2.5 is supposed to clarify the main difference between a boundary and an obstacle object and why it is necessary to distinguish between the two types. For our purpose, we will assume the collision objects to be impermeable and stationary.

2.5.1 Collision Detection

Basically, the only information needed to decide whether a particle has collided with a collision object or not is its position x. However, the position alone is not sufficient to handle a collision adequately because it does not reveal anything about the intensity of the penetration or the colliding particle's original direction of flow. Hence, the following three properties are necessary for a satisfactory analysis of any collision:

• Contact point cp

• Depth of penetration d

• Surface normal n, pointing away from the collision object.

The contact point cp is supposed to be the location where a particle has penetrated the implicit primitive. The penetration depth gives information about the distance a particle has already travelled inside an obstacle or outside of the simulation domain boundaries. The surface normal n is a unit vector that orthogonally points away from the collision object.

Figure 2.5: Illustration of the main difference between boundary and obstacle primitives. The "forbidden" area of penetration of a boundary object is outside of the sphere, while it is the other way round for the obstacle object.

As already mentioned, with the use of implicit primitives we can determine any possible collision with the help of the respective particle's current position only. A simple detection function $F : \mathbb{R}^3 \to \mathbb{R}$ for any implicit primitive yields

$$F(\mathbf{x}) \begin{cases} < 0 & \mathbf{x} \text{ is inside the primitive} \\ = 0 & \mathbf{x} \text{ is on the primitive's surface} \\ > 0 & \mathbf{x} \text{ is outside the primitive.} \end{cases} \tag{2.37}$$

From now on, we stipulate that the second condition of (2.37) will not be evaluated as an actual collision, so as long as the observed particle has not yet penetrated the implicit primitive, no collision will be detected. At this point, we again need to think about the difference between a container and an obstacle object. If the primitive is simulated as a container, condition three of (2.37) will lead to an actual detection of a collision, because the penetration area will be outside of the primitive. For an obstacle object it is the opposite case, because there the particles will trigger a collision if the first condition of (2.37) is evaluated as true. Hence, the collision detection routine needs to be capable of distinguishing between containers and obstacles and of generating valid detection results for both cases.


Both of the presented implicit primitives apply different detection functions and different analytical ways to detect potential collisions properly and are therefore treated in separate subsections.

2.5.1.1 Sphere

The sphere has a smooth, continuous structure and is a very suitable object for collision handling purposes. With the following definition of the spherical implicit primitive, we are able to implement the sphere as a container or an obstacle. The detection function is defined by

$$F_{\text{sphere}}(\mathbf{x}) = \|\mathbf{x} - \mathbf{c}\|^2 - r^2, \tag{2.38}$$

where c is the center of the sphere and r is its radius. If $F_{\text{sphere}}(\mathbf{x})$ has been evaluated accordingly and a collision has taken place, the mandatory properties will be calculated as follows:

$$\mathbf{cp}_{\text{sphere}} = \mathbf{c} + r\, \frac{\mathbf{x} - \mathbf{c}}{\|\mathbf{x} - \mathbf{c}\|} \tag{2.39}$$

$$d_{\text{sphere}} = \big|\, \|\mathbf{c} - \mathbf{x}\| - r \,\big| \tag{2.40}$$

$$\mathbf{n}_{\text{sphere}} = \operatorname{sgn}\left(F_{\text{sphere}}(\mathbf{x})\right) \frac{\mathbf{c} - \mathbf{x}}{\|\mathbf{c} - \mathbf{x}\|}, \tag{2.41}$$

where sgn(x) is the signum function defined in (A.5); in this case, it is responsible for the valid direction of the surface normal vector $\mathbf{n}_{\text{sphere}}$.
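A compact C++ sketch of the sphere detection, covering both the container and the obstacle case, could look as follows. The names `Role`, `Contact` and `detectSphere` are hypothetical and only serve this illustration; the degenerate case of a particle lying exactly in the center of the sphere, where the direction in (2.39) and (2.41) is undefined, is deliberately left unhandled.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;  // hypothetical helper type

enum class Role { Container, Obstacle };  // hypothetical name for the two use cases

struct Contact {
    bool collided = false;
    Vec3 cp{};       // contact point
    double d = 0.0;  // depth of penetration
    Vec3 n{};        // unit surface normal
};

inline Contact detectSphere(const Vec3& x, const Vec3& c, double r, Role role) {
    assert(r > 0.0);
    Contact out;
    double dist2 = 0.0;
    for (int k = 0; k < 3; ++k) {
        dist2 += (x[k] - c[k]) * (x[k] - c[k]);
    }
    const double F = dist2 - r * r;  // detection function (2.38)
    // An obstacle is penetrated from the outside (F < 0),
    // a container is left from the inside (F > 0).
    const bool hit = (role == Role::Obstacle) ? (F < 0.0) : (F > 0.0);
    const double dist = std::sqrt(dist2);
    if (!hit || dist == 0.0) {
        return out;  // no collision, or undefined direction in the center
    }
    const double sgn = (F > 0.0) ? 1.0 : -1.0;
    for (int k = 0; k < 3; ++k) {
        const double u = (x[k] - c[k]) / dist;  // unit vector from c to x
        out.cp[k] = c[k] + r * u;               // (2.39)
        out.n[k] = -sgn * u;                    // (2.41): sgn(F) (c - x) / |c - x|
    }
    out.d = std::fabs(dist - r);                // (2.40)
    out.collided = true;
    return out;
}
```

For an obstacle the resulting normal points out of the sphere, for a container it points towards the center, so in both cases it points away from the forbidden region.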

2.5.1.2 Box

Using a box as collision object is not as straightforward as using a sphere due to its discontinuous and square-cut geometry. For the sake of simplicity it is presented here as a container primitive only. However, an OBB geometry still provides a lot of practical application scenarios, e.g. pouring a fluid into an open vessel, simulating water movement inside a basin or a combination of both scenarios. First of all, we need to introduce a local coordinate system as a transformation from the usual world coordinate system. The local particle position $\mathbf{x}_{\text{local}}$ seen from the box's point of view is the relative position of the particle compared to the center of the box container. Figure 2.6 shows an example for the different descriptions of a particle's position in both coordinate systems.

Figure 2.6: Exemplary description of particle i's position in the two different coordinate systems WCS (green) and LCS (blue).

The transformation from the regularly oriented world coordinate system to the local coordinate system is calculated by

$$\mathbf{x}_{\text{local}} = \mathbf{x} - \mathbf{c}, \tag{2.42}$$

where c defines the center of the box. Regularly oriented means that the axis orientation, illustrated in Figure 2.6, is of standard form and neither rotated nor skewed.

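With the help of the respective particle's local position, the corresponding detection function of the OBB yields

$$F_{\text{box}}(\mathbf{x}) = \left[\, |\mathbf{x}_{\text{local}}| - \mathbf{ext} \,\right]_{\max}, \tag{2.43}$$

with the vector ext as the axis extent in each direction measured from the center of the box. Here the absolute value is taken component-wise and $[\cdot]_{\max}$ denotes the largest component of the resulting vector. The local contact point is

$$\mathbf{cp}_{\text{local}} = \min\left[\mathbf{ext}, \max\left[-\mathbf{ext}, \mathbf{x}_{\text{local}}\right]\right] \tag{2.44}$$

and the corresponding WCS contact point, necessary for the collision response, is given by the transformation

$$\mathbf{cp}_{\text{box}} = \mathbf{c} + \mathbf{cp}_{\text{local}}. \tag{2.45}$$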

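The remaining mandatory collision information for the box becomes

$$d_{\text{box}} = \|\mathbf{cp}_{\text{box}} - \mathbf{x}\| \tag{2.46}$$

and

$$\mathbf{n}_{\text{box}} = \frac{\operatorname{sgn}\left(\mathbf{cp}_{\text{local}} - \mathbf{x}_{\text{local}}\right)}{\|\operatorname{sgn}\left(\mathbf{cp}_{\text{local}} - \mathbf{x}_{\text{local}}\right)\|}. \tag{2.47}$$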
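The box equations (2.42) to (2.47) translate almost literally into code. The following C++ sketch assumes an axis-aligned container box; the names `Contact` and `detectBoxContainer` are hypothetical and chosen for this illustration only.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;  // hypothetical helper type

struct Contact {
    bool collided = false;
    Vec3 cp{};       // contact point in world coordinates
    double d = 0.0;  // depth of penetration
    Vec3 n{};        // unit surface normal
};

// Collision detection for a box used as a container.
//   c   : center of the box
//   ext : axis extents measured from the center, assumed to be positive
inline Contact detectBoxContainer(const Vec3& x, const Vec3& c, const Vec3& ext) {
    Contact out;
    Vec3 local{}, cpLocal{};
    double F = 0.0;
    for (int k = 0; k < 3; ++k) {
        assert(ext[k] > 0.0);
        local[k] = x[k] - c[k];                                       // (2.42)
        const double fk = std::fabs(local[k]) - ext[k];
        F = (k == 0) ? fk : std::max(F, fk);                          // (2.43)
        cpLocal[k] = std::min(ext[k], std::max(-ext[k], local[k]));   // (2.44)
    }
    if (F <= 0.0) {
        return out;  // particle is still inside the container
    }
    Vec3 s{};
    double depth2 = 0.0, sgnNorm2 = 0.0;
    for (int k = 0; k < 3; ++k) {
        const double diff = cpLocal[k] - local[k];
        depth2 += diff * diff;
        s[k] = static_cast<double>((diff > 0.0) - (diff < 0.0));  // signum
        sgnNorm2 += s[k] * s[k];
        out.cp[k] = c[k] + cpLocal[k];                            // (2.45)
    }
    out.d = std::sqrt(depth2);                                    // (2.46)
    for (int k = 0; k < 3; ++k) {
        out.n[k] = s[k] / std::sqrt(sgnNorm2);                    // (2.47)
    }
    out.collided = true;
    return out;
}
```

Since F > 0 implies that at least one component of the local position was clamped, the signum vector in (2.47) never vanishes for a detected collision and the normalization is safe.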

2.5.2 Collision Response

After the mandatory information from the collision detection has been gathered, we need to apply a realistic and adequate response to the corresponding collision. In [Kel06] there are several different collision responses defined, e.g. the acceleration-based, impulse-based or projection-based method. An impulse-based collision response is an event-driven method modifying a particle's velocity at the very moment of the collision to avoid the penetration of the collision object. A projection-based collision response modifies the position of a particle that is already penetrating a collision object by projecting it back onto the surface of the implicit primitive. In our implementation we do not explicitly avoid penetrations but react to them in the same time step as they appear. For this reason, a collision is not actually visible in the simulation; only the response can be observed. As the procedure of the applied collision response routine involves modifying the penetrating particle's position as well as its velocity, it can be imagined as a hybrid version of an impulse-based and a projection-based method.

2.5.2.1 Hybrid Impulse-Projection Method

As already mentioned, a projection-based collision response method projects a penetrating particle's position back onto the surface of the collision object. In our implementation with implicit primitives the position projection simply yields

$$\mathbf{x}_i = \mathbf{cp}, \tag{2.48}$$

where we just set the colliding particle back to the assumed contact point on the collision object's surface.


The velocity $\mathbf{v}_i$ of the respective particle can be reflected using the standard vector reflection method

$$\mathbf{v}_i = \mathbf{v}_i - 2\, (\mathbf{v}_i \cdot \mathbf{n})\, \mathbf{n}, \tag{2.49}$$

which results in a perfectly elastic collision implying a conservation of kinetic energy. However, such a collision behaviour does not seem appropriate for fluid particles because generally they are not bouncing materials. Thus, we need to introduce a restitution parameter into (2.49) to control the conservation of kinetic energy as follows

$$\mathbf{v}_i = \mathbf{v}_i - (1 + c_R)\, (\mathbf{v}_i \cdot \mathbf{n})\, \mathbf{n}, \tag{2.50}$$

where $0 \leq c_R \leq 1$ is the coefficient of restitution. With $c_R = 0$ the collision is perfectly inelastic: the normal velocity component is removed and only the tangential component remains, which is suitable for any simulated liquid, while $c_R = 1$ again leads to (2.49).

Another, more sophisticated reflection method takes the penetration depth into account as well. Equation (2.50) does not constrain the kinetic energy for $c_R > 0$. To ensure that no energy is introduced by the collision, we need to take care that only the velocity relative to the penetration depth enters the reflection equation

$$\mathbf{v}_i = \mathbf{v}_i - \left(1 + c_R \frac{d}{\Delta t\, \|\mathbf{v}_i\|}\right) (\mathbf{v}_i \cdot \mathbf{n})\, \mathbf{n}. \tag{2.51}$$

However, if one is only interested in simulations of low-viscosity fluids, the restitution coefficient will consequently be set to zero ($c_R = 0$). In this case, Equation (2.50) is equal to (2.51) and thus the simplified impulse reflection equation for fluid collisions with stationary rigid bodies yields

$$\mathbf{v}_i = \mathbf{v}_i - (\mathbf{v}_i \cdot \mathbf{n})\, \mathbf{n}. \tag{2.52}$$
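The hybrid response, i.e. the position projection (2.48) combined with the velocity reflection (2.51), can be sketched as a single C++ function. The name `respond` and its parameter list are hypothetical; the guard for a particle at rest is an assumption of this sketch, because (2.51) is undefined for a vanishing velocity.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;  // hypothetical helper type

// Hybrid impulse-projection response for one penetrating particle.
//   x, v : position and velocity of the particle (modified in place)
//   cp   : contact point, n : unit surface normal, d : penetration depth
//   dt   : time step, cR : coefficient of restitution in [0, 1]
inline void respond(Vec3& x, Vec3& v, const Vec3& cp, const Vec3& n,
                    double d, double dt, double cR) {
    assert(dt > 0.0 && cR >= 0.0 && cR <= 1.0);
    x = cp;  // position projection (2.48)

    const double vn = v[0] * n[0] + v[1] * n[1] + v[2] * n[2];
    const double speed = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    // Reflection factor of (2.51); for a particle at rest nothing is reflected.
    const double factor = (speed > 0.0) ? 1.0 + cR * d / (dt * speed) : 1.0;
    for (int k = 0; k < 3; ++k) {
        v[k] -= factor * vn * n[k];
    }
}
```

With cR = 0 the factor becomes 1 and the function reduces to (2.52): a velocity of (1, -2, 0) hitting a surface with normal (0, 1, 0) is turned into (1, 0, 0).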

2.5.3 Discussion

The main focus of the applied collision handling routine lies on the low computational cost of the collision solver. We are not interested in expensive algorithms, e.g. intersection tests between colliding geometries, to handle our rigid body collisions in the most perfect possible way. However, the simple and rapid way we solve our collision problems with the presented routines and primitives brings some irregularities into the fluid simulation. Figure 2.7 shows an exemplary situation where two particles that penetrated the OBB at different positions are set to the exact same position by our collision handling method. Unfortunately, there is no simple and affordable way to separate coalesced fluid particles. Once two particles share the same position, they will experience the same forces and will therefore superpose each other for the rest of the simulation.

Figure 2.7: Depiction of an exemplary situation where the collision handling routine would advance two colliding particles to the exact same position at the edge of the OBB. Once two particles have merged, they float superposed.

Furthermore, the contact point cp determined by the collision detection routine is usually not the entrance point of the particle into the collision object but the point on the implicit primitive's surface that lies nearest to the particle's current position and is therefore reached along the surface normal. Finally, one needs to decide whether the accuracy or the computational speed of a certain method is more important. For the sake of the real-time capability of our SPH solver, the presented collision handling method meets our needs sufficiently. Moreover, the smaller we choose the time step ∆t, the better the results of the collision handling routine will turn out. Thus, a situation like the one depicted in Figure 2.7 would be avoidable by simply adapting ∆t.

Chapter 3

Implementation

The most challenging part of the thesis, along with the study of the fluid dynamics theory and the determination of the SPH quantities, is the implementation of the particle-based solver and the setup of the whole simulation program. This chapter provides a thorough introduction to optimization techniques, including algorithms and data types that make the SPH simulation work efficiently, as well as the physical parameters necessary to make it work realistically. Beyond that, it is supposed to support less experienced readers with a detailed implementation guideline and finally presents an approach to the field of parallel computing using the OpenMP API. The following semantics of exemplary algorithms and the introduced data types are based on the C++ programming language. As already mentioned in Section 1.2 in the introductory chapter, the main application field of this implementation is the simulation of various flow scenarios of water. However, readers who are interested in the simulation of other liquids or even gases should be aware of the fact that the only difference lies in the measurement of the physical parameters in Section 3.2. The optimization techniques and programming guidelines presented here are therefore still applicable and helpful for other real-time simulation approaches of further fluid materials.

3.1 Optimization Techniques

3.1.1 Linked Cell Method

As already mentioned, an important goal of this thesis is to achieve a real-time fluid dynamics solver. Therefore, we first have to gain insight into the complexity of the whole simulation. Even though particle-based methods like the SPH method are generally qualified for efficient simulation programs, there is always a necessity to apply certain optimization techniques and algorithms to improve the computational complexity of a method. The calculation of any fluid quantity field requires the iteration through all particles in the simulation. Moreover, the calculation of each particle's SPH terms, except for the directly applied fields, needs the contribution of every other particle in the simulation as well. Altogether, the complete evaluation of any SPH term with N simulated particles would have the computational complexity of $O(N^2)$, making it irrelevant to talk about real-time capabilities.

Figure 3.1: Main idea of the Linked Cell method. Equilateral cells are laid over the simulation domain and allow for a simple determination of the support region of a certain cell (here 7) and the contained particles. Further particles outside of the support region are left untouched.

However, the property (2.12) of the kernel functions states that a certain particle's contribution to any quantity is zero if the particle is located outside the smoothing radius h. Hence, it is sufficient to iterate only over adjacent particles that lie within the support radius and leave the further particles untouched. The Linked Cell method, also used for short-range potentials in molecular dynamics simulations [GCKn03], is quite a straightforward way to reduce the computational complexity from $O(N^2)$ to $O(M \cdot N)$, where M is the average number of particles located in a certain support region. Figure 3.1 gives an overview of the main idea of the Linked Cell method in two dimensions. Each depicted cell is implemented as a particle container that holds a copy of or a reference to every contained particle. If each cell's side length is ≥ h, the method will guarantee that for the computation of any SPH term, it is enough to iterate through the respective particle's cell and its neighbouring cells only. Depending on the dimension of the simulation domain, the Linked Cell algorithm limits the maximum number of involved cells per SPH computation to 27 in three dimensions and nine in two dimensions.

Algorithm 3.1 C++ definition of the particle and cell structures as well as the vector containers.

    #include <list>
    #include <vector>

    struct Particle {
        real x[DIM];            // position
        real v[DIM];            // velocity
        real f_pressure[DIM];   // pressure force
        real f_viscosity[DIM];  // viscosity force
        // ...
        real mass_density;
        // ...
    };

    struct Cell {
        std::list<Particle*> p_list;  // references to the particles inside this cell
    };
    // ...
    std::vector<Particle> p_vec;  // all particles, stored sequentially
    std::vector<Cell> c_vec;      // all cells in lexicographic order

Obvious drawbacks of the Linked Cell method are the memory overhead of the cell structure and an additional routine that updates the content of the cells after each time step in order to transfer all particles that moved to an adjacent cell to their new container cell. However, the gain of the reduction of the computational complexity from $O(N^2)$ to $O(M \cdot N)$ outweighs any drawbacks concerning higher memory consumption and additional methods to update the cells' contents. Having discussed the pros and cons of the Linked Cell method, we now deal with the implementation of the method as such, talking about the applied data types and their structure in memory.

In our Linked Cell variant, a single cell is implemented as a structure carrying a linked list of references to those particles that are positioned inside the corresponding cell. The numerous cell structures covering the domain are organized in a standard vector container as shown in Algorithm 3.1. Due to the fact that the cell structs are pushed into the vector in lexicographic order, the indices will likewise be arranged lexicographically. In our case, the lexicographic ordering means that the cells are enumerated first in x-direction, then in y-direction and last in z-direction. There is also an exemplary depiction of the cell enumeration in Figure 3.1 for the two-dimensional case. With such an arrangement, it is always possible to determine the respective cell's direct neighbours without searching through the vector. Hence, we can guarantee that each cell is accessible in $O(1)$. Algorithm 3.2 describes how an arbitrary cell is easily addressed by its index position in the cell vector.


Algorithm 3.2 Definition of the function index that returns the index position of any addressed cell in the vector. The integer array ic contains the three-dimensional coordinates of the respective cell, expressed in cell sizes, while nc provides the total number of cells in each direction.

    // Lexicographic index of the cell with coordinates ic
    // in a grid of nc cells per direction.
    int index(const int* ic, const int* nc) {
        return ic[0] + nc[0] * (ic[1] + nc[1] * ic[2]);
    }
    // ...
    std::list<Particle*>& icList = c_vec[index(ic, nc)].p_list;

At the same time, we now have access to the whole particle reference list p_list of the cell. A brief discussion of the reasons why we decided to use the data types and containers presented here can be found in Section 3.1.3.
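To illustrate how the lexicographic index is used in practice, the following C++ sketch maps a particle position to its cell coordinates and collects the indices of all cells in the support region. It is a minimal sketch under assumptions: `cellIndex` mirrors the index function of Algorithm 3.2, while `cellCoords`, `neighbourCells`, the domain origin `lo` and the edge length `cell_size` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Lexicographic cell index, same formula as in Algorithm 3.2.
inline int cellIndex(const int* ic, const int* nc) {
    return ic[0] + nc[0] * (ic[1] + nc[1] * ic[2]);
}

// Cell coordinates ic of position x in a grid that starts at lo and
// consists of cubic cells with edge length cell_size (assumed >= h).
inline void cellCoords(const double* x, const double* lo, double cell_size,
                       const int* nc, int* ic) {
    assert(cell_size > 0.0);
    for (int d = 0; d < 3; ++d) {
        int i = static_cast<int>(std::floor((x[d] - lo[d]) / cell_size));
        if (i < 0) i = 0;                   // clamp positions outside the grid
        if (i >= nc[d]) i = nc[d] - 1;
        ic[d] = i;
    }
}

// Indices of the cell ic itself and of all existing direct neighbours,
// i.e. at most 27 cells in three dimensions.
inline std::vector<int> neighbourCells(const int* ic, const int* nc) {
    std::vector<int> result;
    for (int dz = -1; dz <= 1; ++dz) {
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                const int jc[3] = {ic[0] + dx, ic[1] + dy, ic[2] + dz};
                if (jc[0] < 0 || jc[0] >= nc[0] ||
                    jc[1] < 0 || jc[1] >= nc[1] ||
                    jc[2] < 0 || jc[2] >= nc[2]) {
                    continue;  // neighbour would lie outside the domain
                }
                result.push_back(cellIndex(jc, nc));
            }
        }
    }
    return result;
}
```

No search through the cell vector is needed at any point: both the own cell and its neighbours follow directly from integer arithmetic on the cell coordinates.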

3.1.2 Optimization of Memory Access

The main issue of numerous computational simulations today is not the lack of computing power of the CPU but the performance bottleneck of the main memory. Computer engineers try to counteract this problem with the integration of hierarchical cache architectures. Those caches are small but very fast memories holding copies of data elements from the main memory to provide much faster access to these data values. However, if the implementation of a certain computer program does not account for the functionality of the cache architecture, the developer of the program will not benefit from the possible advantages and the performance of the code will rather suffer from the cache operations.

The general idea of caches is to store those data elements that are most likely to be used next or in the near future by the executed program. In fact, data elements are not transferred separately but in so-called cache lines, which contain a certain number of data elements. Consequently, the code developer should use the already cached data elements whenever possible to exploit the performance of cheap data accesses, also called cache hits. Otherwise, cache misses will most likely occur, which limit the performance to the latency and bandwidth of the main memory.

Figure 3.2: Optimization of the Linked Cell support region by making use of symmetric particle-particle contributions. Cell 7 is the respective cell again.

Coming back to our fluid simulation, we are suffering from incoherent memory accesses as well. While the fluid particles themselves are properly stored sequentially in a vector container, their ability to float through the whole domain and consequently enter different cells of our Linked Cell domain leads to a more or less unstructured memory access to the particles when we iterate over the cells, as illustrated in Figure 3.3. However, with the application of the Linked Cell method it is generally not possible to access the particles sequentially without expensive additional effort, e.g. the permanent sorting of the contained particles. Nevertheless, there is a possibility to minimize cache misses within the Linked Cell iterations. For example, while we calculate a certain force field, e.g. the pressure force of particle i with regard to the respective contribution of particle j with the help of the arithmetic mean variant (2.22)

$$\mathbf{f}_{ij}^{\text{pressure}} = -\frac{p_i + p_j}{2}\, \frac{m_j}{\rho_j} \nabla W(\mathbf{x}_i - \mathbf{x}_j, h), \tag{3.1}$$

we should take advantage of the temporal locality property of cache elements. Hence, we calculate particle i's contribution to particle j's pressure force right after expression (3.1) as well

$$\mathbf{f}_{ji}^{\text{pressure}} = -\frac{p_j + p_i}{2}\, \frac{m_i}{\rho_i} \nabla W(\mathbf{x}_j - \mathbf{x}_i, h) \tag{3.2}$$

and may experience a performance gain because particle i's and particle j's fluid quantities have already been cached for the calculation of Equation (3.1). The term temporal locality describes the profitable reuse of previously cached data elements and the resulting increase of the cache hit rate.

Using this scheme of successive computations, we do not only benefit from the possible cheap data accesses but we also handle two contributions in one iteration step at once. In principle, this optimization technique should also be helpful for other SPH implementations without the application of the Linked Cell method. However, in combination with this method, we can again reduce the number of necessary cell lookups per SPH term evaluation per particle. Considering the determination of a certain particle's quantity field, it is now sufficient to touch the considered particle's own cell and its neighbouring cells with a higher lexicographic number only. Figure 3.2 depicts the further limited support region caused by this order of computations in two dimensions. The support region needs to cover the higher numbered neighbour cells only because the contributions from the particles in the lower ordered neighbour cells have already been calculated in previous iteration steps. Moreover, concerning particle i's own cell, we only need to loop over adjacent particles that are listed in p_list after particle i. Expressed in numbers, the maximum number of considered cells per SPH computation is reduced to 14 in three dimensions and to five in two dimensions, while no additional algorithmic or memory overhead is necessary. Altogether, this optimization technique approximately halves the computational cost of the Linked Cell method alone and, moreover, we exploit the temporal locality of cached data elements, providing a further speed-up of our SPH solver.

Figure 3.3: Arbitrary distribution of particles in the simulation domain. Each cell stores a reference to the comprised particles in a list.
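The reduced support region can be sketched in C++ as follows. The sketch is an illustration under assumptions and not the code of the thesis: particle indices stand in for the particle references, and `forwardOffsets` and `forEachPair` are hypothetical names. The callback receives every candidate pair exactly once; a real solver would additionally test the distance against h and then evaluate both mutual contributions.

```cpp
#include <array>
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

using Cell = std::list<int>;  // particle indices stand in for the references

// Offsets to all neighbour cells with a higher lexicographic number:
// 13 in three dimensions and 4 in two dimensions (plus the own cell).
inline std::vector<std::array<int, 3>> forwardOffsets(int dim) {
    assert(dim == 2 || dim == 3);
    std::vector<std::array<int, 3>> out;
    const int zr = (dim == 3) ? 1 : 0;
    for (int dz = -zr; dz <= zr; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                const bool forward =
                    dz > 0 || (dz == 0 && (dy > 0 || (dy == 0 && dx > 0)));
                if (forward) out.push_back(std::array<int, 3>{{dx, dy, dz}});
            }
    return out;
}

// Calls visit(i, j) once for every pair of particles that share a cell
// or lie in directly neighbouring cells.
template <class F>
void forEachPair(const std::vector<Cell>& cells, const int nc[3], F&& visit) {
    const auto offsets = forwardOffsets(3);
    for (int z = 0; z < nc[2]; ++z)
        for (int y = 0; y < nc[1]; ++y)
            for (int x = 0; x < nc[0]; ++x) {
                const Cell& own = cells[x + nc[0] * (y + nc[1] * z)];
                // Own cell: only particles listed after particle i.
                for (auto i = own.begin(); i != own.end(); ++i)
                    for (auto j = std::next(i); j != own.end(); ++j)
                        visit(*i, *j);
                // Neighbour cells with a higher lexicographic number.
                for (const auto& o : offsets) {
                    const int nx = x + o[0], ny = y + o[1], nz = z + o[2];
                    if (nx < 0 || nx >= nc[0] || ny < 0 || ny >= nc[1] ||
                        nz < 0 || nz >= nc[2])
                        continue;
                    const Cell& other = cells[nx + nc[0] * (ny + nc[1] * nz)];
                    for (int i : own)
                        for (int j : other) visit(i, j);
                }
            }
}
```

In a 2 × 2 × 1 grid all four cells are mutual neighbours, so five particles distributed over these cells produce exactly 10 visited pairs, one per unordered pair.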

3.1.3 Discussion and Further Improvements

In the previous two subsections, we have introduced the optimization techniques that influence our solver's complexity most strongly. However, as we already suggested in the last subsection, the cell iterations needed to evaluate the SPH terms for each of the particles lead to very unstructured and non-sequential memory accesses. Figure 3.3 shows a composition of exemplary simulation states and the content of the corresponding particle reference lists. Considering the particle constellations, we have to iterate over each cell to determine a certain fluid quantity field of each involved particle. Coming across cell 3, we need all particles from cells 5 and 6 to completely calculate the smoothed field values for particle a. Obviously, we need to jump through the particle vector in a more or less arbitrary way, causing our simulation program to perform scattered memory accesses and to lose a lot of time by fetching particle elements from the main memory. The question is whether we can do any better by not storing particle references inside the cell but the particle elements themselves, and whether the list container is the best choice for our purpose.

Let us think about the main requirements for a proper particle container within each cell. We need the possibility to efficiently enumerate, insert and delete the particles in the cells. Considering the container type, we restrict our performance examination to the C++ STL containers list and vector for reasons of simplicity. First of all, we want to introduce the abilities of both container types based on [Jos01]. The vector container shows good performance for appending and deleting elements at its end but poor results for insertion and deletion at middle positions. The reason is that the elements of a vector are always stored contiguously, which requires moving every element behind the deleted or inserted one. Moreover, the standard implementation of the vector container does not automatically reduce the vector's capacity and deallocate unused memory after the deletion of a certain element. Due to the fact that we are generally not able to predict the definite amount of memory required for each cell's container, the memory management of the vector type seems unfavourable for our purpose. In contrast to the vector, a list container does not provide random access, leading to slow performance when accessing a particular element. However, the implementation of the standard list allows for efficient insertion and deletion at any position because the internal structure of the list only operates on pointers. Generally, the list elements are not stored in a structured and coherent way, causing the program to jump through the memory when iterating over the contained data elements.

We already mentioned that we have a major problem with unstructured memory accesses due to the fact that the particles are flowing freely around the cell domain. One way to possibly avoid the expensive arbitrary accesses to the particle vector p_vec would be to store not the references but the particles themselves in each cell's container. This way, we would not need an external particle storage anymore. However, this would only make sense in combination with a vector container that stores its elements contiguously. The list container would not exploit the possible advantages because it stores its elements in an unstructured manner anyway. The vector container helps us to organize our memory accesses to the adjacent particles in a better way. Unfortunately, the deletion of particles from an arbitrary position in the vector is a very frequently performed operation in our simulation, and especially for this the vector container shows a bad performance. Furthermore, the larger size of a whole particle element in contrast to the reference would lead to an even worse performance when it comes to shifting elements inside the vector container. Finally, we have to face the fact that there does not seem to be an ideal way to guarantee coherent data access in combination with minimum complexity for the cell updates. In order to make a decision, we preferred storing particle references instead of whole particles and accept unstructured memory accesses instead of an expensive and memory-inefficient cell update routine.

Having decided to store particle references in the cell containers and to live with expensive and incoherent memory accesses, the list container provides the most suitable and efficient properties. Considering our implementation, the main advantage in contrast to the standard vector container is the better performance for the deletion of elements at arbitrary positions, which is constantly necessary for the cell update procedure.

While we introduced the �uid mechanical force and density �elds in Section2.3 we already came across the symmetry conservation, regarding Newton'saction-reaction law, of force and density �eld contributions. In our case, itstates that particle j's smoothed quantity contribution towards particle i isthe negation of the contribution particle i exerts on particle j, generally mean-ing F ij = −F ji. Considering this symmetry of particle-particle interactions,we are able to skip certain redundant computations that slow down our SPHsolver for nothing. In fact, the action-reaction law cannot be applied for ourSPH terms as it was originally set up. For example, if we take the evaluation ofthe viscosity force, we will observe the following two equations for the mutualcontributions of particle i and j

$$f^{viscosity}_{ij} = \mu \, (v_j - v_i) \, \frac{m_j}{\rho_j} \, \nabla^2 W(x_i - x_j, h) \tag{3.3}$$

$$f^{viscosity}_{ji} = \mu \, (v_i - v_j) \, \frac{m_i}{\rho_i} \, \nabla^2 W(x_j - x_i, h) \tag{3.4}$$

In our simulation, we employ an identical mass for each particle, $m_i = m_j = m$. Hence, Equations (3.3) and (3.4) can be simplified further. Since the kernel Laplacian depends only on the distance between the two particles, a closer look at the two terms shows that they differ only in the sign and the mass density. This makes it possible to reduce the computational effort as presented in Algorithm 3.3.


Algorithm 3.3 Code example for the exploitation of symmetrical force contributions. The negative sign in the last assignment arises from the interchanged velocity difference in the first term.

    ...
    tmp = mu * (v_j - v_i) * m * v_kernel_lap( x_i - x_j, h );

    f_viscosity_ij = tmp / mass_density_j;
    f_viscosity_ji = (- tmp ) / mass_density_i;
    ...

Obviously, our terms do not fully comply with Newton's third law of motion because of the scaling factor comprising the respective mass densities. However, the action-reaction law helps us to understand the particle-particle interactions more clearly and to exploit the possible mathematical simplifications more easily. In this short example for the viscosity force, we actually save three floating point multiplications, two additions and one kernel function evaluation per dimension. Such a splitting method is applicable to any SPH-based continuous quantity field in our fluid simulation and is a minor but effective optimization technique to further reduce the computational complexity of the solver.
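To make the idea of Algorithm 3.3 concrete, the following is a minimal, self-contained sketch rather than the thesis code. The type `Vec3`, the struct `PairForce` and the function `viscosity_pair` are hypothetical names, and the body of `v_kernel_lap` assumes the viscosity kernel Laplacian $\frac{45}{\pi h^6}(h - r)$ that is common in the SPH literature; the kernel actually used in Chapter 2 may differ.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Laplacian of a viscosity kernel (assumption: 45 / (pi h^6) * (h - r) inside the
// support radius h, zero outside; the thesis' own v_kernel_lap may be defined differently).
double v_kernel_lap(const Vec3& r, double h) {
    const double pi = 3.14159265358979323846;
    const double len = std::sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);
    if (len > h) return 0.0;
    return 45.0 / (pi * std::pow(h, 6)) * (h - len);
}

// Both contributions of one particle pair (hypothetical helper type).
struct PairForce {
    Vec3 f_ij;  // contribution of particle j to particle i
    Vec3 f_ji;  // contribution of particle i to particle j
};

// Evaluates the kernel once and derives both contributions from the shared term.
PairForce viscosity_pair(const Vec3& x_i, const Vec3& x_j,
                         const Vec3& v_i, const Vec3& v_j,
                         double rho_i, double rho_j,
                         double m, double mu, double h) {
    const Vec3 r = {x_i[0] - x_j[0], x_i[1] - x_j[1], x_i[2] - x_j[2]};
    const double s = mu * m * v_kernel_lap(r, h);  // shared scalar factor
    PairForce out{};
    for (int d = 0; d < 3; ++d) {
        const double tmp = s * (v_j[d] - v_i[d]);
        out.f_ij[d] = tmp / rho_j;   // Equation (3.3)
        out.f_ji[d] = -tmp / rho_i;  // Equation (3.4), sign flipped
    }
    return out;
}
```

The two results differ only in the sign and in the mass density of the contributing particle, so `f_ij * rho_j` and `-f_ji * rho_i` coincide for every component.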

3.2 Simulation Parameters

The following parameters are mostly based on the calculations and experiments from [Kel06]. For the sake of completeness of our own implementation, all parameters required by the SPH water simulation solver are listed here as well. While some of the parameters are based on physical properties, other values were determined by simulation experiments that improve the stability of the solver and the realistic behaviour of the particles.

Description                   Symbol   Value

time step                     ∆t       0.01 s
gravitational acceleration    g        (0, −9.82, 0)^T m/s^2

Table 3.1: General parameters independent of the simulated fluid.


Description        Symbol   Value

rest density       ρ0       998.29 kg/m^3
mass               m        0.02 kg
viscosity          µ        3.5 Pa·s
surface tension    σ        0.0728 N/m
threshold          l        7.065
gas stiffness      k        3 J
restitution        cR       0
support radius     h        0.0457 m

Table 3.2: Physical parameters and their values used for a realistic simulation of water dynamics.
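One possible way to carry the values of Tables 3.1 and 3.2 into the code is a plain parameter struct. This is only a sketch: the struct name `SphParameters` and its member names are hypothetical, while the numbers are copied from the two tables.

```cpp
#include <array>

// Simulation parameters from Tables 3.1 and 3.2 (struct and member names are hypothetical).
struct SphParameters {
    double dt = 0.01;                              // time step [s]
    std::array<double, 3> g = {0.0, -9.82, 0.0};   // gravitational acceleration [m/s^2]
    double rho0 = 998.29;                          // rest density [kg/m^3]
    double m = 0.02;                               // particle mass [kg]
    double mu = 3.5;                               // viscosity [Pa s]
    double sigma = 0.0728;                         // surface tension [N/m]
    double l = 7.065;                              // surface normal threshold
    double k = 3.0;                                // gas stiffness [J]
    double restitution = 0.0;                      // coefficient of restitution c_R
    double h = 0.0457;                             // support radius [m]
};
```

Keeping all values in one object makes it easy to pass them to the individual routines and to experiment with other fluids by changing a single instance.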

3.3 Implementation Guideline

Throughout this section, we focus on a thorough and understandable guideline on how to implement the SPH fluid solver in combination with the presented collision handling and time integration scheme as well as the optimization techniques introduced above. In particular, the Linked Cell method is mandatory for the following implementation tutorial because each particle neighbourhood iteration is based on the cell structure. To simplify our guideline, we use different listing formats to emphasize the order of the necessary steps. The operations in indented lines depend on the corresponding parent line and therefore need to be executed after it. In the same way, the numerically and alphabetically enumerated steps have to be performed in exactly the order in which they are listed. In contrast, list elements with other markers that share the same indentation depth can be executed in any order.

3.3.1 Initialization of the Simulation

1. Set up the necessary information for the implicit primitive container and possible obstacle objects.

2. Create the fluid particles and set their positions, velocities and other quantities to their respective initial values. Insert each of them into the particle vector container p_vec of suitable size N.

3. Initialize the cells for the Linked Cell method with an equal side length of h and order them lexicographically in another vector structure as denoted in Algorithm 3.1. Assign the reference to each particle to the particle list of the corresponding cell according to the particle's initial position x.


4. If necessary, initialize the leap velocity $v_{-0.5\Delta t}$ for the leap-frog integrator as introduced in (2.35) for each particle.
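Step 3 requires a mapping from a particle position to its cell. The following sketch shows one way to compute it, assuming an axis-aligned domain that starts at `origin` and has `nc[d]` cells of side length h per dimension. The names `cell_coords` and `cell_index` are hypothetical, and the ordering with x running fastest is an assumption that may differ from Algorithm 3.1.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Vec3 = std::array<double, 3>;
using Idx3 = std::array<int, 3>;

// Maps a position to the integer coordinates of its cell (cell side length h).
// Positions outside the domain are clamped to the nearest boundary cell.
Idx3 cell_coords(const Vec3& x, const Vec3& origin, double h, const Idx3& nc) {
    Idx3 ic{};
    for (int d = 0; d < 3; ++d) {
        int c = static_cast<int>(std::floor((x[d] - origin[d]) / h));
        if (c < 0) c = 0;
        if (c >= nc[d]) c = nc[d] - 1;
        ic[d] = c;
    }
    return ic;
}

// Lexicographic index into the cell vector: x runs fastest, then y, then z.
std::size_t cell_index(const Idx3& ic, const Idx3& nc) {
    return static_cast<std::size_t>(ic[0] + nc[0] * (ic[1] + nc[1] * ic[2]));
}
```

During initialization, each particle reference would be appended to the particle list of the cell at `cell_index(cell_coords(x, origin, h, nc), nc)`.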

3.3.2 Evaluation of the Mass Density

Iterate over each cell ic in the simulation domain Ω in lexicographic order

• Iterate over each particle i in ic

    1. Iterate over each particle j > i in ic

        ◦ Calculate the mutual mass density contributions for particles i and j using (2.17).

    2. Iterate over each cell kc > ic in the neighbourhood of cell ic

        ◦ Iterate over each particle k in kc

            · Calculate the mutual mass density contributions for particles i and k using (2.17).
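The loop structure above can be sketched as follows. This is an illustration under several assumptions, not the thesis implementation: the domain is a cube of n cells per dimension starting at the origin, the cells store particle indices in a `std::vector` instead of references in a list, the kernel is the poly6-type kernel $\frac{315}{64 \pi h^9}(h^2 - r^2)^3$, and the sum in (2.17) is assumed to include the self contribution of each particle. All function names are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Poly6-type smoothing kernel as a function of the squared distance (assumption, see lead-in).
double kernel(double r2, double h) {
    const double pi = 3.14159265358979323846;
    const double h2 = h * h;
    if (r2 > h2) return 0.0;
    const double d = h2 - r2;
    return 315.0 / (64.0 * pi * std::pow(h, 9)) * d * d * d;
}

double dist2(const Vec3& a, const Vec3& b) {
    double s = 0.0;
    for (int d = 0; d < 3; ++d) s += (a[d] - b[d]) * (a[d] - b[d]);
    return s;
}

// x: positions inside [0, n*h)^3, n: cells per dimension, h: cell side length, m: particle mass.
std::vector<double> density_linked_cell(const std::vector<Vec3>& x, int n, double h, double m) {
    std::vector<std::vector<std::size_t>> cells(static_cast<std::size_t>(n) * n * n);
    auto index = [n](int cx, int cy, int cz) {
        return static_cast<std::size_t>(cx + n * (cy + n * cz));
    };
    for (std::size_t p = 0; p < x.size(); ++p)
        cells[index(int(x[p][0] / h), int(x[p][1] / h), int(x[p][2] / h))].push_back(p);

    // Self contribution of every particle (assumption: the sum includes j = i).
    std::vector<double> rho(x.size(), m * kernel(0.0, h));

    for (int cz = 0; cz < n; ++cz)
    for (int cy = 0; cy < n; ++cy)
    for (int cx = 0; cx < n; ++cx) {
        const std::size_t ic = index(cx, cy, cz);
        const std::vector<std::size_t>& ci = cells[ic];
        for (std::size_t a = 0; a < ci.size(); ++a) {
            const std::size_t i = ci[a];
            // 1. Particles j > i in the same cell: one kernel evaluation serves both particles.
            for (std::size_t b = a + 1; b < ci.size(); ++b) {
                const double w = m * kernel(dist2(x[i], x[ci[b]]), h);
                rho[i] += w;
                rho[ci[b]] += w;
            }
            // 2. Neighbouring cells kc > ic only, so every cell pair is visited once.
            for (int dz = -1; dz <= 1; ++dz)
            for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                const int kx = cx + dx, ky = cy + dy, kz = cz + dz;
                if (kx < 0 || ky < 0 || kz < 0 || kx >= n || ky >= n || kz >= n) continue;
                const std::size_t kc = index(kx, ky, kz);
                if (kc <= ic) continue;
                for (std::size_t k : cells[kc]) {
                    const double w = m * kernel(dist2(x[i], x[k]), h);
                    rho[i] += w;
                    rho[k] += w;
                }
            }
        }
    }
    return rho;
}
```

Because every pair is visited exactly once and both particles receive the contribution, the result should agree with a brute-force sum over all particle pairs up to rounding.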

3.3.3 Evaluation of the Pressure Field

Iterate over each particle i in the particle vector p_vec

• Calculate the pressure field using (2.19).

3.3.4 Evaluation of the Internal and External Forces



Iterate over each cell ic in the simulation domain Ω in lexicographic order

• Iterate over each particle i in ic

    1. Calculate the gravity force of particle i using (2.26)

    2. Iterate over each particle j > i in ic

        ◦ Calculate the mutual pressure force contributions for particles i and j using (2.22)

        ◦ Calculate the mutual viscosity force contributions for particles i and j using (2.25)

        ◦ Calculate the inward surface normal n and the Laplacian of the smoothed color field c of particles i and j using (2.28) and the Laplacian of Equation (2.27)

    3. Iterate over each cell kc > ic in the neighbourhood of cell ic

        ◦ Iterate over each particle k in kc

            · Calculate the mutual pressure force contributions for particles i and k using (2.22)

            · Calculate the mutual viscosity force contributions for particles i and k using (2.25)

            · Calculate the inward surface normal n and the Laplacian of the smoothed color field c of particles i and k using (2.28) and the Laplacian of Equation (2.27)

    4. Calculate the norm ‖n‖ of particle i's inward surface normal

    5. If ‖n‖ ≥ l then

        ◦ Calculate the surface tension force of particle i using (2.30)

Keep in mind to simplify the particle-particle contributions for the SPH-based force terms as well as for the mass density field as described in Subsection 3.1.3 to reduce the number of necessary operations. The order of the calculations in this implementation guideline is not mandatory. We merely tried to keep the computational effort of the domain-covering Linked Cell iterations as small as possible. Hence, we calculate all of the different force fields in the same iteration of the respective loop, which makes our code look unorganized at first sight. For the sake of a more structured implementation, the computation of the internal and external force fields can be split into separate routines because they are mutually independent. However, the computational cost of an additional domain-covering cell iteration is not compatible with our ambition to achieve real-time performance.

3.3.5 Leap-Frog Scheme and Collision Handling

Iterate over each particle i in vector p_vec

1. Calculate the total force f acting on particle i using (2.32)

2. Calculate the acceleration of particle i with (2.16)

3. Update particle i's leap velocity $v_{t+0.5\Delta t}$ using (2.33)

4. Update particle i's position using (2.34)

5. Approximate the current velocity v with (2.36)

6. Check for a collision with an implicit primitive boundary or obstacle object by evaluating the detection function (2.37). If a collision is detected

(a) Gather the mandatory collision information listed in Subsection 2.5.1


(b) Project particle i's position back on the contact point cp

(c) Modify its velocity with (2.51)

3.3.6 Linked Cell Update



Iterate over each cell ic in the simulation domain Ω

• Iterate over each particle i in ic

    1. Compare particle i's current position with the extent of the cell ic

    2. If particle i moved to a different cell kc

        (a) Add particle i to cell kc

        (b) Delete particle i from cell ic
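The update can be sketched with `std::list`, whose `erase` runs in constant time and returns the iterator to the next element, so the iteration over the current cell remains valid. For brevity, the sketch uses a one-dimensional domain with non-negative positions; `Particle`, `Cell` and `update_cells` are hypothetical names.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// One-dimensional position for brevity (assumption: x >= 0).
struct Particle {
    double x = 0.0;
};

using Cell = std::list<Particle*>;

// Moves every particle reference into the cell that contains its current position.
// cells[c] covers the interval [c*h, (c+1)*h); positions beyond the domain go to the last cell.
void update_cells(std::vector<Cell>& cells, double h) {
    for (std::size_t ic = 0; ic < cells.size(); ++ic) {
        for (Cell::iterator it = cells[ic].begin(); it != cells[ic].end();) {
            std::size_t kc = static_cast<std::size_t>((*it)->x / h);
            if (kc >= cells.size()) kc = cells.size() - 1;
            if (kc != ic) {
                cells[kc].push_back(*it);  // (a) add particle i to cell kc
                it = cells[ic].erase(it);  // (b) delete it from cell ic, O(1) for std::list
            } else {
                ++it;
            }
        }
    }
}
```

A particle that is moved to a cell with a larger index is visited again later in the same sweep, but it is then already in the correct cell and stays there.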

3.3.7 Visualization

In order to visualize our simulation results, we used either the open-source visualization software ParaView [Kit10] or the POV-Ray tool [Pov10]. Both visualization tools require a specific file format and different information to visualize the particle-based simulation satisfactorily. In each case, we need to perform the following steps:

• Every n-th SPH simulation step

    1. Open a new visualization output file in the desired format

    2. If necessary, write out the information about the simulation container and possible obstacles

    3. Iterate over each particle i in the particle vector p_vec

        ◦ Write out the file format dependent information for particle i

        ◦ If necessary, use

          $$r = \sqrt[3]{\frac{3m}{4\pi\rho_0}} \tag{3.5}$$

          as the radius to render the particles as equally sized spheres

    4. Close the output file
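Equation (3.5) follows from equating the volume of a sphere, $\frac{4}{3}\pi r^3$, with the rest volume $m / \rho_0$ of one particle. A small helper for the render radius could look like this; the function name `particle_radius` is hypothetical.

```cpp
#include <cmath>

// Radius of a sphere whose volume equals the rest volume m / rho0 of one particle, Equation (3.5).
double particle_radius(double m, double rho0) {
    const double pi = 3.14159265358979323846;
    return std::cbrt(3.0 * m / (4.0 * pi * rho0));
}
```

With the values from Table 3.2, m = 0.02 kg and ρ0 = 998.29 kg/m^3, this gives a radius of roughly 0.0168 m.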


3.4 Parallel Programming using OpenMP

There are different ways to parallelize an existing sequential program such as the one produced with the implementation guideline presented above. The design of the respective computer system is decisive for the actual implementation of parallel algorithms. To mention just a few different computer architectures, there are cluster systems combining several computing machines, single machines with multiple processors, and single machines with a single processor comprising multiple cores. In addition, we have to distinguish between shared and distributed memory architectures. For our parallelization, we focus on the architecture found in the majority of today's personal computers: a single-processor, multicore, shared memory system.

Figure 3.4: The task fork mechanism of OpenMP. Source: http://en.wikipedia.org/wiki/File:Fork_join.svg

The OpenMP API [Omp10] is primarily designed to support parallel programming on shared memory systems. It is available on multiple platforms and can be applied in a rather simple way by adding specific compiler directives to an existing program written in C, C++, or Fortran. Figure 3.4 depicts the main idea of the parallelization process using OpenMP. Generally, the program is executed in the usual sequential manner and processed by the master thread only. When an OpenMP parallelization directive is encountered in the code, a specified number of tasks share the upcoming workload until the end of the parallel section. There, the parallel tasks merge into the master thread again. In combination with certain extensions or MPI, OpenMP can be used on distributed memory systems as well. However, in the following examples we restrict ourselves to OpenMP only and therefore stick to shared memory systems.


Algorithm 3.4 General construct of the OpenMP for-directive.

    #include ...
    #pragma omp parallel for

Algorithm 3.5 Parallelization of a particle-loop using the OpenMP API for-construct.

    #pragma omp parallel for shared( p_vec )
    for( int i = 0; i < p_vec.size(); ++i ) {
        Particle &p = p_vec[ i ];
        ...
    }

Obviously, the code parts primarily worth parallelizing in our simulation are the expensive particle- and cell-loops. The OpenMP API provides a particular construct to parallelize for-loops. In C and C++, any directive is specified using the #pragma mechanism. Algorithm 3.4 gives a simple example of how to parallelize the workload of the directly following for-loop. In fact, there are many different directives and constructs to parallelize a sequential piece of code with OpenMP, but for the sake of simplicity and usability we restrict our implementation to the loop directive presented in Algorithm 3.4. Further details and possibilities of the OpenMP API can be found in the specification [Omp10]. In addition to the simple #pragma construct, we need to declare private and shared variables to ensure that no parallel task interferes with another task's variables, which could cause segmentation faults or an irregular work flow. The private clause declares the listed data elements to be private to each parallel task. This means that for each data element in the private list, a separate element is automatically allocated for each task. Hence, each task references its own element, which guarantees that this variable is accessed only from within this particular task. In contrast, the shared clause declares the listed data elements to be shared by all parallel tasks. If any task modifies the content of a shared variable, every other task may work with the modified value as well. The clauses cannot be applied arbitrarily; one has to consider how each data element needs to be declared within the parallel region. We examine the different clauses by introducing our way of applying the OpenMP constructs.
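The effect of the two clauses can be illustrated with a small, self-contained loop. This is a sketch with hypothetical names, not code from the simulation: the vector `values` is shared because all tasks work on the same container, while the scratch variable `tmp` is private so that no task overwrites another task's intermediate result. If the compiler is invoked without OpenMP support, the directive is ignored and the loop runs sequentially with the same result.

```cpp
#include <vector>

// Scales every entry of a vector in parallel (hypothetical example function).
void scale_all(std::vector<double>& values, double factor) {
    const int n = static_cast<int>(values.size());
    double tmp = 0.0;  // each task gets its own uninitialized copy inside the parallel region
    #pragma omp parallel for shared( values, factor ) private( tmp )
    for (int i = 0; i < n; ++i) {
        tmp = values[i] * factor;  // assigned before use, as required for private variables
        values[i] = tmp;           // each iteration writes a different element
    }
}
```

Declaring `tmp` inside the loop body would make it private automatically, which is often the simpler choice.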

Starting with the simpler particle-loops necessary for the evaluation of the pressure field (2.19) and for the leap-frog scheme presented in Subsection 3.3.5, we just need to apply the code construct depicted in Algorithm 3.5 to parallelize each of them. The particle container p_vec is explicitly declared as a shared variable because the parallel tasks all access the same data element. The particle alias p is automatically private because it is defined inside the parallel region. Regarding the for-construct, the advantage of the parallelism lies in the subdivision of the range of the loop variable i in order to speed up the execution of the loop. While in the sequential case the program needs to iterate p_vec.size() times in a row through the construct, the same work is now done by several tasks that process chunks of the whole range of i in parallel. However, we have to make sure ourselves that the calculations and assignments inside the parallel region are actually suitable for parallelization. For example, mutual dependencies of data elements or input/output operations are difficult to program in parallel, or at least difficult to control, and are therefore mostly not worth the additional effort. This is the reason why we refrain from parallelizing the visualization output of Subsection 3.3.7. For the same reason, we do not parallelize the Linked Cell update presented in Subsection 3.3.6. If we need to move a particle from one cell to another, deleting the particle element interferes with the iterators of the cell's particle container. Without additional synchronization effort, we cannot guarantee that each of the other threads still works properly.

Algorithm 3.6 Parallelization of the cell-loop.

    int ic [ DIM ];
    int kc [ DIM ];

    #pragma omp parallel for shared( nc, c_vec ) private( ic, kc )
    for( int i = 0; i < nc[2]; ++i ) {
        ic[2] = i;
        for( ic[1] = 0; ic[1] < nc[1]; ++ic[1] ) {
            for( ic[0] = 0; ic[0] < nc[0]; ++ic[0] ) {
                ...
            }
        }
    }

Concerning the more delicate cell iterations in our SPH solver, the basic application is similar to Algorithm 3.5. As discussed in Subsection 3.1.1, we have ordered our Linked Cells lexicographically in the vector container c_vec. Hence, we iterate through our cell domain in the same manner. Algorithm 3.6 shows one possible way to parallelize our cell-loop. The array ic defines the position of the currently considered cell in the cell domain, while kc is the iteration element for the neighbourhood of cell ic. Both are declared as private variables to ensure that each parallel task modifies only its own copies and is able to step through the cell neighbourhood independently of the other tasks. One restriction of the OpenMP directives is that variables that are defined outside the parallel region and declared private in the construct have to be initialized inside the parallel section anyway. After the parallel region, the value of these variables is undefined. Imagining our cell domain as a three-dimensional OBB, we can clearly see that the presented cell-loop divides the space along its coordinate axes. In our example, only the number of cells in the third direction, the z-direction, is subdivided into chunks that are processed in parallel.
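To show the role of the private index arrays in a runnable form, the following sketch applies the loop structure of Algorithm 3.6 to a deliberately simple task: for every cell, it sums the particle counts of the cell and its direct neighbours. The function name and the `occupancy` vector are hypothetical. Each iteration writes only to the slot of its own cell, so the sketch sidesteps the question of how concurrent updates of particles in neighbouring cells would have to be protected.

```cpp
#include <cstddef>
#include <vector>

const int DIM = 3;

// occupancy: particle count per cell in lexicographic order, nc: cells per dimension.
std::vector<int> neighbourhood_counts(const std::vector<int>& occupancy, const int nc[DIM]) {
    std::vector<int> result(occupancy.size(), 0);
    int ic[DIM];
    int kc[DIM];
    // The remaining variables (occupancy, nc) are shared by default.
    #pragma omp parallel for shared( result ) private( ic, kc )
    for (int i = 0; i < nc[2]; ++i) {
        ic[2] = i;  // private copies are uninitialized on entry, so every component is set here
        for (ic[1] = 0; ic[1] < nc[1]; ++ic[1]) {
            for (ic[0] = 0; ic[0] < nc[0]; ++ic[0]) {
                int sum = 0;
                for (kc[2] = ic[2] - 1; kc[2] <= ic[2] + 1; ++kc[2])
                for (kc[1] = ic[1] - 1; kc[1] <= ic[1] + 1; ++kc[1])
                for (kc[0] = ic[0] - 1; kc[0] <= ic[0] + 1; ++kc[0]) {
                    if (kc[0] < 0 || kc[1] < 0 || kc[2] < 0 ||
                        kc[0] >= nc[0] || kc[1] >= nc[1] || kc[2] >= nc[2]) continue;
                    sum += occupancy[kc[0] + nc[0] * (kc[1] + nc[1] * kc[2])];
                }
                // Only the slot of the current cell is written, so no two tasks collide.
                result[ic[0] + nc[0] * (ic[1] + nc[1] * ic[2])] = sum;
            }
        }
    }
    return result;
}
```

As in Algorithm 3.6, only the outermost loop over the z-direction is distributed among the tasks, while each task walks through its own x- and y-ranges sequentially.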

Finally, we have not yet declared how many tasks we actually want to use in the parallel work flow. The OpenMP API offers different ways to define the desired number of threads for the parallel regions. We have chosen to use the OpenMP function

void omp_set_num_threads( int num_threads ),

where num_threads is supposed to be a positive integer. The function is declared in the omp.h header file and has to be called by the master thread to affect the subsequent parallel regions.
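A call could be wrapped as in the following sketch. The wrapper `configure_threads` is a hypothetical helper; `omp_set_num_threads` and `omp_get_max_threads` are functions of the OpenMP runtime library, and the `_OPENMP` macro is defined by compilers when OpenMP support is enabled. The guard keeps the program compilable as a sequential code when OpenMP is switched off.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Requests num_threads threads for the subsequent parallel regions and returns the
// resulting upper bound; without OpenMP support the program stays sequential.
int configure_threads(int num_threads) {
#ifdef _OPENMP
    if (num_threads > 0) omp_set_num_threads(num_threads);
    return omp_get_max_threads();
#else
    (void)num_threads;  // unused in the sequential build
    return 1;
#endif
}
```

Calling the wrapper once at program start, before the first parallel region, is sufficient for all subsequent particle- and cell-loops.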

Chapter 4

Discussion of Results

As already mentioned repeatedly, the goal of this thesis is a proper C++ implementation of the SPH method with particular focus on stability and the reduction of the computational costs. In this chapter, we discuss the outcome of our particle-based water simulator in terms of the performance improvements achieved with the introduced optimization techniques and the particular simulation issues we came across, and we give a short conclusion about the general suitability of the SPH method.

To provide a clearer understanding of the visible results of the fluid simulation and the actual fluid flow, we present some visualizations of exemplary scenarios we implemented. Figure 4.1 shows water particles falling into a glass bowl, visualized at particular steps of the simulation time.

Figure 4.1a) shows the initial arrangement of the 2,197 water particles. Accelerated by gravity, the particles start falling into the bowl (see 4.1b)). The surface tension force minimizes the fluid's surface, causing a more or less spherical formation of the particle crowd. In c) the water spreads over the bottom of the bowl, while in 4.1d) the steepness of the bowl's wall forces the water to flow back, causing the particles to pile up in the middle. During e) and f) the effect of spreading and flowing back together is repeated and attenuates over time until the water comes to rest. Just to get a feeling for the real duration of such a small-scale simulation, the elapsed real time from a) to f) is approximately 1.5 seconds.

Figure 4.1: Simulation of 2,197 water particles falling into a solid bowl, panels a) to f).

Another typical water scenario is the Breaking Dam simulation depicted in Figure 4.2 using 3,000 particles. The general idea of this scenario is to place a certain body of fluid in a fraction of the whole simulation domain. After the fluid has come to rest, the panel that separates the domain volume is removed. Consequently, the body of fluid propagates through the entire simulation domain. In our visualization, the dam is broken in a) (the panel is removed), which causes the body of water to flow freely in the open direction as visualized in b). As the particles reach the solid right wall in c), they form a wave that spills back in the opposite direction in d). In e) a second minor wave is formed, and again the water spills over to the opposite wall in f). Obviously, these visualization scenes demonstrate an unusual behaviour of the fluid particles at the walls of the OBB. There, the particles tend to adhere to the surface of the wall. We believe that this results from the applied no-slip boundary condition (2.51) as well as the suboptimal rectangular geometry. Further issues concerning the use of
