A Study on the Performance of Parallel Solvers for Coupled Simulations of Partially Saturated Soils in Tunnel Engineering Applications

G. Bui and G. Meschke
Institute for Structural Mechanics, Ruhr University Bochum, Germany

Paper 36, in: P. Iványi and B.H.V. Topping (Editors), Proceedings of the Fourth International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering, Civil-Comp Press, Stirlingshire, Scotland, 2015.

Abstract

Mechanized tunnelling in soft ground is an intricate process which involves various interacting components such as the ground, the tunnel boring machine, and support measures such as the face support and the tail-void grouting. These components and their mutual interactions need to be considered in numerical simulations [5]. To discretize this spatio-temporal system, an LBB-compatible mixed formulation is used for the spatial discretization and the generalized-α method for the temporal discretization [4]. This results in an ill-conditioned system which makes it difficult for iterative solvers to converge [1]. In this paper, a linear solver for a distributed computing environment is presented to solve the resulting discretized system. In the numerical benchmark, a direct solver employing MPI is tested to investigate weak scalability and performance. In addition, an iterative solver with a standard ILU preconditioner and, alternatively, a block preconditioner is investigated. The objective is to choose the best solver configuration for shield tunnelling simulations of various problem sizes. The benchmark is run on a small local cluster to allow easy customization and optimization of the job-run configuration.

Keywords: partially saturated soils, shield tunnelling, block preconditioning.

1 Introduction

Parallel computation for coupled multiphase problems is still under active development in the community, see, e.g., [10] and [11]. In these works, the authors explored a segregated method to solve the resulting linear system. Mathematically, the coupled multiphase problem represents a saddle point problem; hence a saddle point solver with primal and dual correction steps is employed. Thus, the formulation is homogeneous. In the preconditioning step, the multigrid method is used to coarsen the mesh, and a standard V-cycle is used. The authors attempt to apply the multigrid method to second-order meshes, which is known not to exhibit stable performance due to an insufficient coarse-to-fine grid transfer operator. The efficiency of their method is acceptable; however, a significant parallel overhead appears when the number of degrees of freedom is less than 100,000.

The application of block preconditioners to coupled soil-flow problems is investigated in [9]. The formulation uses a saturated soil model, which is known to be numerically more stable compared to the formulation for partially saturated soils. A stabilization scheme is used to circumvent the inf-sup condition, thus enabling equal-order approximations for both fields. The results obtained using multigrid sub-preconditioners exhibit stable behaviour and scale up to 4000 processors. The results also exhibit weak sensitivity with respect to the soil permeability.

In this contribution, the first section is devoted to an overview of the physical problem at hand. The state equations of the underlying soil-groundwater flow interaction are described, which then lead to a space-time discretization for the discrete state update equations. Behind these state update equations lies a substantial amount of system modelling: for example, the contact between the ground and the shield of the tunnel boring machine (TBM), the tunnel lining and the tail-void grouting installed after each excavation step, or the hydraulic jacks connecting the shield and the lining, which are used for the advancement of the TBM. Details of the numerical simulation model for mechanized tunnelling are contained in [5]. This tunnel model, EKATE, is implemented within the object-oriented finite element package KRATOS [3]. To simplify the description in the following section, the components related to mechanical quantities (i.e. displacements) are summarized within the solid block in Equation (5).

The second section addresses possible techniques to improve the numerical stability of the linear solving process and to customize these techniques to the specific properties of the tunnel problem, enhancing the efficiency of the analysis. Although the investigated techniques are discussed broadly in the literature, their application to real problems is still limited. This contribution attempts to combine these techniques into a single framework to enable a realistic simulation of tunnel advance in soft soils and to enhance parallel efficiency. Most notably, as will be demonstrated in Section 3.3, the a priori re-ordering of the system avoids repeated re-ordering during the solving phase. In addition, the iterative scaling procedure exhibits superior performance compared to traditional scaling techniques such as left/right scaling. The scaling technique is particularly meaningful in the present tunnel engineering application, since the shield is made of a stiff material and has a much higher Young's modulus than the soil.

The last section presents two numerical studies using tunnel data from real projects. The analysis is run on a local cluster with one master node and two computing nodes. Since the computing nodes are multicore, a hybrid parallelization approach is used. First, the system matrix is assembled on the master node; then it is distributed to the computing nodes for the linear solving process. Since the mesh is continuously adapted during the analysis, the mesh decomposition process is prone to synchronization errors on the ghost domains and to load imbalance. Therefore, a monolithic approach is used, which turns out to be competitive, since the assembly is quite fast compared to the linear solving process. The employed network is Infiniband, which enables high-speed redistribution of large amounts of data.

2 Problem description and finite element discretization

The governing equations of the coupled soil-flow interaction can be represented by the balance of momentum

$$\operatorname{div} \boldsymbol{\sigma} + \rho\,\mathbf{g} = \mathbf{0}, \qquad (1)$$

the mass balance of the pore water

$$n\,\frac{\partial^s S_w}{\partial t} + \operatorname{div} \mathbf{v}_{ws} + S_w\,\operatorname{div} \dot{\mathbf{u}}_s = 0, \qquad (2)$$

and the mass balance of the air phase in the pore space of partially saturated soils

$$\frac{n S_a}{\varrho_a}\,\frac{\partial^s \varrho_a}{\partial t} + n\,\frac{\partial^s S_a}{\partial t} + \frac{1}{\varrho_a}\,\operatorname{grad} \varrho_a \cdot \mathbf{v}_{as} + \operatorname{div} \mathbf{v}_{as} + S_a\,\operatorname{div} \dot{\mathbf{u}}_s = 0. \qquad (3)$$

Details of the parameters involved in the balance equations are fully described in [4] and are therefore omitted here.

Herein, only the fully discretized system of equations after spatial and temporal discretization is described. Figure 1 shows an illustration of the underlying three-phase description and of the material model describing the solid skeleton behaviour.

Figure 1: (a) Three-phase formulation for unsaturated soil; (b) illustration of the elasto-plastic soil material model [4].

Using the generalized-α time integration scheme [2], the system of equations for the three-phasic soil-groundwater flow interaction is given as

$$\begin{bmatrix} \Delta \mathbf{u}_s \\ \Delta p_w \\ \Delta p_a \end{bmatrix}^{n+1} = \left( \begin{bmatrix} K_{mu} & K_{mw} & K_{ma} \\ K_{wu} & K_{ww} & K_{wa} \\ K_{au} & K_{aw} & K_{aa} \end{bmatrix} (1-\alpha_f) + \begin{bmatrix} D_{mu} & D_{mw} & D_{ma} \\ D_{wu} & D_{ww} & D_{wa} \\ D_{au} & D_{aw} & D_{aa} \end{bmatrix} \frac{(1-\alpha_f)\,\gamma}{\beta \Delta t} \right)^{-1}_{n+1-\alpha} \times \begin{bmatrix} R^m_{\mathrm{ext}} - R^m_{\mathrm{int}} \\ R^w_{\mathrm{ext}} - R^w_{\mathrm{int}} \\ R^a_{\mathrm{ext}} - R^a_{\mathrm{int}} \end{bmatrix} \qquad (4)$$

When there is no air phase involved, i.e. in the case of a fully saturated soil, the state update equation reduces to a 2×2 block matrix. Nevertheless, the physical character of the fluid phases also allows grouping the water phase and the air phase in the resulting system. This leads to Equation (5), which is used for designing the block preconditioner.

$$\begin{bmatrix} \Delta \mathbf{u}_s \\ \Delta p_w \end{bmatrix}^{n+1} = \frac{1}{1-\alpha_f} \left( \begin{bmatrix} K_{mu} & K_{mw} \\ K_{wu} & K_{ww} \end{bmatrix} + \begin{bmatrix} D_{mu} & D_{mw} \\ D_{wu} & D_{ww} \end{bmatrix} \frac{\gamma}{\beta \Delta t} \right)^{-1}_{n+1-\alpha} \times \begin{bmatrix} R^m_{\mathrm{ext}} - R^m_{\mathrm{int}} \\ R^w_{\mathrm{ext}} - R^w_{\mathrm{int}} \end{bmatrix} \qquad (5)$$
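To make the structure of the update concrete, the following minimal numpy sketch applies Equation (5) to already assembled block matrices K and D, kept dense purely for illustration; the function name and the dense representation are assumptions, not the authors' implementation.

```python
import numpy as np

def state_update(K, D, R, alpha_f, gamma, beta, dt):
    """One incremental state update following Eq. (5).

    K, D : assembled stiffness and damping matrices of the grouped
           2x2 block system (dense here purely for illustration).
    R    : residual vector R_ext - R_int evaluated at t_{n+1-alpha}.
    Returns the increment [Delta u_s; Delta p_w].
    """
    # effective system matrix: K + gamma/(beta*dt) * D
    A_eff = K + (gamma / (beta * dt)) * D
    # the common factor (1 - alpha_f) is pulled out of the inverse
    return np.linalg.solve(A_eff, R) / (1.0 - alpha_f)
```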

3 Linear solver techniques

3.1 Scaling strategy to improve the system conditioning

Scaling is commonly used to improve the conditioning of badly-scaled linear systems. In this work, the scaling strategy according to [6] is used to improve the conditioning of the resulting linear system. The idea of the algorithm is to asymptotically scale the ∞-norm of each row and column of the matrix to unity. It is illustrated in Figure 2.

After the left- and right-scaling vectors are obtained, the system matrix and the right-hand side vector are scaled accordingly by

$$\bar{A} = D_L A D_R, \qquad (6)$$
$$\bar{b} = D_L b. \qquad (7)$$

After the scaled system has been solved, the solution vector is obtained by back-scaling:

$$x = D_R \bar{x}. \qquad (8)$$

In Table 1, the effectiveness of this scaling strategy is shown for a fully saturated (two-phase) problem. The number of iteration cycles and the condition number of the system matrix after scaling are acceptable for all problem sizes.
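As an illustration of the flowchart in Figure 2, the following sketch implements a Ruiz-type equilibration in numpy; the function and variable names are assumptions, and the loop is a simplified dense version of the algorithm in [6].

```python
import numpy as np

def ruiz_equilibrate(A, tol=1e-3, max_iter=25):
    """Iteratively scale A so that every row and column inf-norm
    approaches unity.  Returns (d_l, d_r) such that the scaled
    matrix is diag(d_l) @ A @ diag(d_r), cf. Equations (6)-(8)."""
    B = np.array(A, dtype=float, copy=True)
    d_l = np.ones(B.shape[0])
    d_r = np.ones(B.shape[1])
    for _ in range(max_iter):
        r = np.sqrt(np.abs(B).max(axis=1))   # sqrt of row inf-norms
        c = np.sqrt(np.abs(B).max(axis=0))   # sqrt of column inf-norms
        r[r == 0.0] = 1.0                    # leave empty rows untouched
        c[c == 0.0] = 1.0
        B /= r[:, None]
        B /= c[None, :]
        d_l /= r
        d_r /= c
        # converged when all row and column inf-norms are close to one
        if (np.abs(np.abs(B).max(axis=1) - 1.0) < tol).all() and \
           (np.abs(np.abs(B).max(axis=0) - 1.0) < tol).all():
            break
    return d_l, d_r
```

The scaled system diag(d_l) A diag(d_r) x̄ = diag(d_l) b is then solved, and the solution is recovered as x = diag(d_r) x̄, as in Equation (8).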


Figure 2: Flowchart of the iterative scaling algorithm.

Total unknowns                          26788        57952       128008
Number of iterations                        7            6            6
Condition number (before scaling)  3.62165e+19  4.64968e+19  5.62663e+19
Condition number (after scaling)      17508.6      21989.5      28856.9

Table 1: Example of scaling strategy

3.2 Block preconditioner

The algebraic form of the block preconditioner has been carefully studied in [8] and [1]. Therefore, only the final form is given here:

$$P^{-1} = \begin{bmatrix} P_A^{-1} & 0 \\ -P_S^{-1} B_2 P_A^{-1} & P_S^{-1} \end{bmatrix} \qquad (9)$$

The block preconditioner relies on two sub-preconditioners to compute the approximate inverse of the solid skeleton matrix ($P_A^{-1}$) and of the Schur complement ($P_S^{-1}$) resulting from the algebraic LU decomposition of the block linear system. Preconditioning the Schur complement is tricky, since it is a dense matrix by default. In this work, the Schur complement is approximated by

$$S_D = C - B_2\,\mathrm{diag}(A)^{-1} B_1. \qquad (10)$$
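As a sketch of how Equations (9) and (10) translate into code, the following builds the lower block-triangular preconditioner with ILU sub-preconditioners in scipy; all names are hypothetical, and this is an illustrative stand-in rather than the authors' KRATOS implementation.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def block_preconditioner(A, B1, B2, C):
    """Lower block-triangular preconditioner of Eq. (9), using ILU
    sub-preconditioners and the diagonal Schur-complement
    approximation S_D of Eq. (10)."""
    n, m = A.shape[0], C.shape[0]
    S_D = (C - B2 @ sp.diags(1.0 / A.diagonal()) @ B1).tocsc()
    ilu_A = spla.spilu(A.tocsc())   # P_A^{-1} ~ ILU(A)
    ilu_S = spla.spilu(S_D)         # P_S^{-1} ~ ILU(S_D)

    def apply(r):
        # [z_u; z_p] = P^{-1} [r_u; r_p]:
        #   z_u = P_A^{-1} r_u
        #   z_p = P_S^{-1} (r_p - B2 z_u)
        z_u = ilu_A.solve(r[:n])
        z_p = ilu_S.solve(r[n:] - B2 @ z_u)
        return np.concatenate([z_u, z_p])

    return spla.LinearOperator((n + m, n + m), matvec=apply)
```

The resulting operator can be passed as the preconditioner M to scipy's bicgstab, mirroring the BICGSTAB-BP-ILU0 configuration reported in Table 2.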

In Table 2, results from computations using the block preconditioner in conjunction with various sub-preconditioners, obtained for the tunnelling example shown in Figure 4, are summarized.


Strategy               Ref   Average iterations   Average time (s)
BICGSTAB-BP-ILU0         1                 18.5               0.16
BICGSTAB-BP-AMG          1                 22.8               1.50
BICGSTAB-BP-INEXACT      1                 10.6               3.78
BICGSTAB-BP-EXACT        1                 1.00               6.19
BICGSTAB-BP-ILU0         2                 40.4               2.26
BICGSTAB-BP-AMG          2                 53.4              45.29
BICGSTAB-BP-INEXACT      2                 17.0              29.86
BICGSTAB-BP-EXACT        2                 1.00             190.02
BICGSTAB-BP-ILU0         3                 91.8              35.70

Table 2: Computation results using block preconditioner

As a verification of the implementation, the computed results confirm that the block preconditioner achieves convergence in one step if the preconditioner is the exact inverse of the block matrix. The table shows that the performance in terms of the number of iterations is slightly better with the incomplete LU sub-preconditioner, while the setup cost is relatively high for the multigrid sub-preconditioner. These results also exhibit mesh sensitivity. Thus, the performance of the sub-preconditioners is the key factor for the aggregated performance of the block preconditioner.

3.3 System re-ordering

Re-ordering of the sparse linear system often reduces fill-in during the LU decomposition [7]. This improves the computing speed and reduces the memory consumption of the direct solver. Re-ordering is standard in many direct solver packages such as PARDISO, Trilinos, etc. However, re-ordering can be a time-consuming process if serial re-ordering algorithms are employed. In the present implementation, the re-ordering is performed a priori, even though the simulation involves an adaptation of the mesh during the analysis.

The re-ordering is performed at the kernel level, where each degree of freedom at each node is enumerated and re-ordered before the first matrix assembly. In this way, the re-ordering information is reused implicitly and no further re-ordering is required by the solver. The applied method is the Reverse Cuthill-McKee (RCM) re-ordering algorithm. An illustration of the system re-ordering is given in Figure 3.
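For illustration, a minimal sketch of RCM re-ordering applied to an assembled sparsity pattern, using scipy's graph routines; the helper name is an assumption, and this operates on the matrix rather than at the kernel level as in the implementation described above.

```python
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def rcm_reorder(K):
    """Permute a sparse system matrix with Reverse Cuthill-McKee to
    reduce bandwidth (and hence LU/ILU fill-in).  Returns the
    permuted matrix and the permutation vector."""
    K = sp.csr_matrix(K)
    perm = reverse_cuthill_mckee(K, symmetric_mode=True)
    return K[perm][:, perm], perm
```

The right-hand side is permuted as b[perm], and the computed solution is scattered back using the inverse permutation.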

4 Numerical validation

4.1 Benchmark problems and discretization

In the first benchmark, a simplified tunnel model concerned with the excavation of a straight tunnel (diameter 8.5 m, overburden 17 m) is used to test the employed algorithms. Here, an elastic constitutive model is employed for the behaviour of the soil skeleton. A two-phase model is used for the representation of the fully saturated soil behaviour, with a permeability of 4.4 × 10⁻⁴ m². The discretization of the model is shown in Figure 4.

Figure 3: Matrix sparsity pattern before (left) and after (right) RCM re-ordering.

Figure 4: Discretizations used for the first benchmark (16, 32 and 48 excavation steps).

In Figure 5, the speed-up of the iterative linear solver with the configurations described in Section 3 is plotted in comparison with the direct solver (PARDISO). The red column shows the result when the iterative solver is used without scaling or re-ordering. With the parallel matrix-vector multiplication, the iterative solver can in fact outperform the direct solver for large meshes. When coupled with the scaling technique, the speed-up reduces slightly, due to the computational expense of the scaling phase. However, as shown in the third graph of Figure 5, these computational costs reduce with an increasing number of degrees of freedom.

The speed-up when the iterative solver is coupled with system re-ordering is quite significant. Since the preconditioner is based on an incomplete LU method, the re-ordering gives rise to less fill-in during the numerical factorization, which leads to an increase of the speed-up.

Figure 5: Speed-up results for the first benchmark.

Figure 6: Discretization for the second benchmark problem (96 excavation steps).

In the second benchmark, a full tunnel model with 96 excavation steps of a machine-driven tunnel excavation is simulated. Here, a partially saturated soil is assumed. The tunnel boring machine as well as the linings and the grouting of the so-called tail void gap (i.e. the gap between the soil and the tunnel lining) are considered (see [5]). The number of degrees of freedom in this model is approximately 1 million. To solve this system, a parallel direct solver for distributed environments (MUMPS) is used. This solver employs MPI to handle the communication between computing nodes.


Figure 7: Speed-up of the parallel direct solver for the second benchmark.

Number of processes          8      16      32      64     128
Average computing time  1255.9   775.1   581.7   929.7  3459.7

Table 3: Performance of the parallel direct solver for 8 to 128 processes.

The interface to this solver is provided by the Trilinos solver package. The discretization of this benchmark is shown in Figure 6. According to Figure 7 and Table 3, the parallel speed-up of the direct solver peaks at 32 processes. When the MPI size becomes larger, the solver becomes significantly slower. The bottleneck can come from different sources, either from the job management system or from the data movement within the solver. For our problem size, when running with 32 processes, we observe a full occupation of the system memory. When running with 64 and 128 processes, the memory exchange between processes increases significantly, hence the performance is reduced.

5 Conclusions

This contribution has investigated the parallel performance of different techniques for high performance tunnel engineering simulations involving the sequential excavation of tunnels by means of a tunnel boring machine in soft soils, which are assumed to be partially or fully saturated. For the soil, a multiphase model in the framework of the theory of porous media, with the displacements and the gas and pore water pressures as nodal degrees of freedom, was employed. It was shown that the scaling and system re-ordering techniques perform well and enhance the numerical stability of the results. The system re-ordering enabled higher performance compared to a highly-optimised direct solver (PARDISO) in the multicore environment. The block preconditioner performed satisfactorily; however, its performance is largely affected by the sub-preconditioners. Since second-order approximations were used for the displacements, the multigrid sub-preconditioners exhibit poor performance.


For large tunnel analyses, the parallel direct solver for distributed environments (MUMPS) performed well. Its scalability is limited by the number of processes and the system size; increasing the number of processes along with a larger system size is a subject for further analysis. Currently, the tunnel simulation achieved its best speed-up with 32 processes. An investigation to enable a parallel iterative solver for distributed environments is ongoing, which should enable further speed-up and reduce the communication overhead.

For future consideration, the finite element tearing and interconnecting (FETI) technique will be investigated, i.e. solving the tunnel in small segments and propagating the results globally. Although this type of substructuring scheme introduces some difficulties in coupling the interfaces and eliminating rigid body motions, it will be investigated in view of the specific features of tunnelling simulations, characterized by sequential excavation steps and different resolution requirements in different parts of the numerical model.

References

[1] G. Bui, J. Stascheit, and G. Meschke. A parallel block preconditioner for coupled simulations of partially saturated soils in finite element analyses. In B.H.V. Topping and P. Iványi, editors, Proceedings of the Third International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering, Paper 24. Civil-Comp Press, 2013.

[2] J. Chung and G.M. Hulbert. A time integration algorithm for structural dynamics with improved numerical dissipation: the generalized-α method. Journal of Applied Mechanics, 60:371–375, 1993.

[3] Pooyan Dadvand, Riccardo Rossi, and Eugenio Oñate. An object-oriented environment for developing finite element codes for multi-disciplinary applications. Archives of Computational Methods in Engineering, 17:253–297, 2010.

[4] F. Nagel and G. Meschke. An elasto-plastic three phase model for partially saturated soil for the finite element simulation of compressed air support in tunnelling. International Journal for Numerical and Analytical Methods in Geomechanics, 34:605–625, 2010. doi:10.1002/nag.828.

[5] F. Nagel, J. Stascheit, and G. Meschke. Process-oriented numerical simulation of shield tunneling in soft soils. Geomechanics and Tunnelling, 3(3):268–282, 2010.

[6] Daniel Ruiz. A scaling algorithm to equilibrate both rows and columns in matrices. Technical report, Computational Science and Engineering Department, Rutherford Appleton Laboratory, 2001.

[7] Yousef Saad. Iterative Methods for Sparse Linear Systems. 2000.

[8] J.A. White and R.I. Borja. Stabilized low-order finite elements for coupled solid-deformation/fluid-diffusion and their application to fault zone transients. Computer Methods in Applied Mechanics and Engineering, 197:4353–4366, 2008.


[9] Joshua A. White and Ronaldo I. Borja. Block-preconditioned Newton–Krylov solvers for fully coupled flow and geomechanics. Computational Geosciences, pages 1–13, 2011.

[10] C. Wieners, M. Ammann, S. Diebels, and W. Ehlers. Parallel 3-D simulations for porous media models in soil mechanics. Computational Mechanics, 29:75–87, 2002.

[11] Christian Wieners, Martin Ammann, Wolfgang Ehlers, and Tobias Graf. Parallel Krylov methods and the application to 3-D simulations of a triphasic porous media model in soil mechanics. Computational Mechanics, 36:409–420, 2005.
