Page 1: A Comparison of Gradient- and Hessian-Based Optimization ...imr.sandia.gov/papers/imr18/Sastry.pdf · plementation in Mesquite [15] is a line search that approximates the Hessian

A Comparison of Gradient- and Hessian-Based Optimization Methods for Tetrahedral Mesh Quality Improvement*

Shankar Prasad Sastry1 and Suzanne M. Shontz1

1 Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802
sps210,[email protected]

1 Introduction

Discretization methods, such as the finite element method, are commonly used in the solution of partial differential equations (PDEs). The accuracy of the computed solution to the PDE depends on the degree of the approximation scheme, the number of elements in the mesh [1], and the quality of the mesh [2, 3]. More specifically, it is known that as the element dihedral angles become too large, the discretization error in the finite element solution increases [4]. In addition, the stability and convergence of the finite element method are affected by poor quality elements. It is known that as the angles become too small, the condition number of the element matrix increases [5].

Recent research has shown the importance of performing mesh quality improvement before solving PDEs in order to: (1) improve the condition number of the linear systems being solved [6], (2) reduce the time to solution [7], and (3) increase the solution accuracy. Therefore, mesh quality improvement methods are often used as a post-processing step in automatic mesh generation. In this paper, we focus on mesh smoothing methods which relocate mesh vertices, while preserving mesh topology, in order to improve mesh quality.

Despite the large number of papers on mesh smoothing methods (e.g., [8, 9, 10, 11, 12, 13, 14]), little is known about the relative merits of using one solver over another in order to smooth a particular unstructured, finite element mesh. For example, it is not known in advance which solver will converge to an optimal mesh faster or which solver will yield a mesh with better quality in a given amount of time. It is also not known which solver will most aptly handle mesh perturbations or graded meshes with elements of heterogeneous volumes. The answers likely depend on the context. For example, one solver may find an approximate solution faster than the others, whereas another solver may improve the quality of meshes with heterogeneous elements more quickly than its competitors.

* This work was funded in part by NSF grant CNS 0720749, a Grace Woodward grant from The Pennsylvania State University, and an Institute for CyberScience grant from The Pennsylvania State University.

To answer the above questions, we use Mesquite [15], a mesh quality improvement toolkit, to perform a numerical study comparing the performance of several local mesh quality improvement methods to improve the global objective function representing the overall mesh quality as measured with various shape quality metrics. We investigate the performance of the following gradient-based methods: steepest descent [16] and Fletcher-Reeves conjugate gradient [16], and the following Hessian-based methods: quasi-Newton [16], trust-region [16], and feasible Newton [17]. Mesh quality metrics used in this study include the aspect ratio [18], inverse mean ratio [19, 20], and vertex condition number metrics [21]. The optimization solvers are compared on the basis of efficiency and ability to smooth several realistic unstructured tetrahedral finite element meshes to both accurate and inaccurate levels of mesh quality. We used Mesquite in its native state with the default parameters. Only Mesquite was employed for this study so that differences in solver implementations, data structures, and other factors would not influence the results.

In this paper, we report the results of an initial exploration of the factors stated above to determine the circumstances when the various solvers may be preferred over the others. In an effort to make the number of experiments manageable, we limit the number of free parameters. Hence, we consider a fixed mesh type and objective function. In particular, we use unstructured tetrahedral meshes and an objective function which sums the squared qualities of individual tetrahedral elements. The free parameters we investigate are the problem size, initial mesh configuration, heterogeneity in element volume, quality metric, and desired degree of accuracy in the improved mesh.

The main results of this study are as follows: (1) the behavior of the optimization solvers is influenced by the degree of accuracy desired in the solution and the size of the mesh; (2) most of the time, the gradient-based solvers exhibited superior performance compared to that of the Hessian-based solvers; (3) the rank-ordering of the optimization solvers depends on the amount of random perturbation applied; (4) the rank-ordering of the optimization solvers is the same for the affine perturbation meshes; (5) the rank-ordering of the majority of the solvers is the same for graded meshes; however, the rank of conjugate gradient is a function of time; (6) graded meshes are sensitive to changes in the mesh quality metric.

2 Problem Statement

2.1 Element and Mesh Quality

Let $V$ and $E$ denote the vertices and elements, respectively, of an unstructured mesh, and let $|V|$ and $|E|$ denote the numbers of vertices and elements, respectively. Define $V_B$ and $V_I$ to be the sets of boundary and interior mesh vertices. Let $x_v \in \mathbb{R}^n$ denote the coordinates for vertex $v \in V$. For the purposes of this paper, $n = 3$. Denote the collection of all vertex coordinates by $x \in \mathbb{R}^{n \times |V|}$. Let $e$ be an element in $E$. Finally, let $x_e \in \mathbb{R}^{n \times |e|}$ be the matrix of vertex coordinates for $e$.

We associate with the mesh a continuous function $q : \mathbb{R}^{n \times |e|} \to \mathbb{R}$ that measures quality in terms of one or more geometric properties of elements as a function of their vertex positions. In particular, let $q(x_e)$ measure the quality of element $e$. We assume a smaller value of $q(x_e)$ indicates a better quality element. A specific choice of $q$ is an element quality metric. There are various metrics to measure shape, size, and orientation of elements [22].

The overall quality of the mesh is a function of the individual element qualities. The mesh quality depends on both the choice of the element quality metric $q$ and the function used to combine them.

2.2 Aspect Ratio Quality Metric

An important parameter in this study is the choice of mesh quality metric. In general, we expect that the results could vary significantly depending on the choice of mesh quality metric. Thus, we consider three mesh quality metrics in this study, starting with the aspect ratio.

Various formulas have been used to compute the aspect ratio. The aspect ratio definition we employ is the one implemented in Mesquite. In particular, it is the average edge length divided by the normalized volume. Thus for tetrahedra, the aspect ratio is defined as follows:

$$\left( \frac{l_1^2 + l_2^2 + \cdots + l_6^2}{6} \right) \Big/ \left( \mathrm{vol} \times \frac{12}{\sqrt{2}} \right),$$

where $l_i$, $i = 1, 2, \ldots, 6$ represent the six edge lengths of the tetrahedron, and $\mathrm{vol}$ represents its volume.
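As a rough illustration, the formula can be evaluated directly from four vertex coordinates. The sketch below is not Mesquite code; the function names `tet_volume` and `aspect_ratio` are hypothetical, and the formula is taken exactly as printed above. It is checked only on a regular tetrahedron with unit edge length, for which it returns 1.

```python
import math
from itertools import combinations


def tet_volume(a, b, c, d):
    """Unsigned volume of the tetrahedron with vertices a, b, c, d."""
    e1 = [b[k] - a[k] for k in range(3)]
    e2 = [c[k] - a[k] for k in range(3)]
    e3 = [d[k] - a[k] for k in range(3)]
    # Scalar triple product e1 . (e2 x e3)
    triple = (
        e1[0] * (e2[1] * e3[2] - e2[2] * e3[1])
        - e1[1] * (e2[0] * e3[2] - e2[2] * e3[0])
        + e1[2] * (e2[0] * e3[1] - e2[1] * e3[0])
    )
    return abs(triple) / 6.0


def aspect_ratio(a, b, c, d):
    """Mean squared edge length divided by vol * 12 / sqrt(2), as printed."""
    verts = (a, b, c, d)
    mean_sq_edge = sum(math.dist(p, q) ** 2 for p, q in combinations(verts, 2)) / 6.0
    return mean_sq_edge / (tet_volume(a, b, c, d) * 12.0 / math.sqrt(2.0))


# Regular tetrahedron with unit edge length.
regular = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.5, math.sqrt(3) / 2, 0.0),
    (0.5, math.sqrt(3) / 6, math.sqrt(2.0 / 3.0)),
]
# Same base triangle, but with the apex pushed close to the base plane.
flat = regular[:3] + [(0.5, math.sqrt(3) / 6, 0.1)]

print(aspect_ratio(*regular))
print(aspect_ratio(*flat))
```

The flattened element has a much smaller volume for similar edge lengths, so its value is larger than that of the regular element.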

2.3 Inverse Mean Ratio Quality Metric

In order to derive the inverse mean ratio mesh quality metric, we let $a$, $b$, $c$, and $d$ denote the four vertices of a tetrahedron labeled according to the right-hand rule. Next, define the matrix $A$ by fixing the vertex $a$ and denoting by $e_1$, $e_2$, and $e_3$ the three edge vectors emanating from $a$ towards the remaining three vertices. Then, $A = [e_1; e_2; e_3] = [b - a; c - a; d - a]$. Next, define $W$ to be the incidence matrix for the ideal element which is an equilateral tetrahedron in the isotropic case. In this case,

$$W = \begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{2} \\ 0 & \frac{\sqrt{3}}{2} & \frac{\sqrt{3}}{6} \\ 0 & 0 & \frac{\sqrt{2}}{\sqrt{3}} \end{pmatrix}.$$


Next, let $T = AW^{-1}$ transform the ideal element to the physical element. Finally, the inverse mean ratio of a tetrahedral element is as follows:

$$\frac{\|T\|_F^2}{3\,|\det(T)|^{2/3}}.$$
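A minimal sketch of this computation in plain Python follows. It assumes that the edge vectors $b - a$, $c - a$, $d - a$ form the columns of $A$ (the source notation does not state rows versus columns), and the helper names `det3`, `inv3`, `matmul3`, and `inverse_mean_ratio` are hypothetical rather than Mesquite functions.

```python
import math

# Ideal-element matrix W; its columns are the edge vectors of a unit
# equilateral tetrahedron.
W = [
    [1.0, 0.5, 0.5],
    [0.0, math.sqrt(3) / 2, math.sqrt(3) / 6],
    [0.0, 0.0, math.sqrt(2) / math.sqrt(3)],
]


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    s = det3(m)
    return [
        [(e * i - f * h) / s, (c * h - b * i) / s, (b * f - c * e) / s],
        [(f * g - d * i) / s, (a * i - c * g) / s, (c * d - a * f) / s],
        [(d * h - e * g) / s, (b * g - a * h) / s, (a * e - b * d) / s],
    ]


def matmul3(x, y):
    return [[sum(x[r][k] * y[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def inverse_mean_ratio(a, b, c, d):
    # Assumption: edge vectors are stored as the columns of A.
    A = [[b[k] - a[k], c[k] - a[k], d[k] - a[k]] for k in range(3)]
    T = matmul3(A, inv3(W))
    frob_sq = sum(t * t for row in T for t in row)
    return frob_sq / (3.0 * abs(det3(T)) ** (2.0 / 3.0))


origin = (0.0, 0.0, 0.0)
ideal = [origin] + [tuple(W[r][c] for r in range(3)) for c in range(3)]
scaled = [tuple(2.0 * x for x in v) for v in ideal]
squashed = ideal[:3] + [(0.5, math.sqrt(3) / 6, 0.1)]

print(inverse_mean_ratio(*ideal), inverse_mean_ratio(*scaled), inverse_mean_ratio(*squashed))
```

For the ideal element $T$ is the identity, and for a uniformly scaled copy $T$ is a multiple of the identity; in both cases the value is 1, while the squashed element gives a larger value.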

2.4 Vertex Condition Number Quality Metric

In order to specify the vertex condition number quality metric, we first define some notation. Let $x$ be any vertex of an element. Let $x_k$ denote the $k$th neighboring vertex, for $k = 1, 2, \ldots, n$. Define the $n$ edge vectors $e_k = x_k - x$. Then the Jacobian of the element is given by the matrix $A = [e_1\ e_2\ \cdots\ e_n]$. Using $A$, we can define its vertex condition number as follows:

$$\|A\|_F \, \|A^{-1}\|_F,$$

where $\|\cdot\|_F$ denotes the Frobenius matrix norm.

All three mesh quality metrics range from 1 (for an equilateral tetrahedron) to $\infty$ (for a degenerate element). Invalid elements can be detected by the inverse mean ratio mesh quality metric when a complex value results.

2.5 Quality Improvement Problem

To improve the overall quality of the mesh, we assemble the local element qualities as follows: $Q = \sum_e q(x_e)^2$, where $Q$ denotes the overall mesh quality, and $q(x_e)$ is the quality of element $e$. We compute an $x^* \in \mathbb{R}^{n \times |V|}$ such that $x^*$ is a locally optimal solution to

$$\min_x Q(x) \qquad (1)$$

subject to the constraint that $x_{V_B} = \bar{x}_{V_B}$, where $\bar{x}_{V_B}$ are the initial boundary vertex coordinates. In addition, we require the initial mesh and all subsequent meshes to be noninverted. This translates to the constraint $\det(A^{(i)}) > 0$ for every element. In order to satisfy the two constraints, Mesquite fixes the boundary vertices and explicitly checks for mesh inversion at each iteration.
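The objective and the noninversion constraint can be sketched in a few lines. This is an illustrative sketch, not Mesquite's implementation: the mesh is assumed to be a dictionary of vertex coordinates plus a list of 4-tuples of vertex ids, and `mesh_objective`, `is_noninverted`, and `element_jacobian` are hypothetical names. The quality metric `q` is passed in as a callable.

```python
def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def element_jacobian(coords, tet):
    """Matrix whose columns are the edge vectors from the first vertex."""
    a, b, c, d = (coords[v] for v in tet)
    return [[b[k] - a[k], c[k] - a[k], d[k] - a[k]] for k in range(3)]


def is_noninverted(coords, tets):
    """True when det(A) > 0 for every element."""
    return all(det3(element_jacobian(coords, t)) > 0.0 for t in tets)


def mesh_objective(coords, tets, q):
    """Q = sum over elements of q(x_e)^2."""
    return sum(q([coords[v] for v in t]) ** 2 for t in tets)


coords = {
    0: (0.0, 0.0, 0.0),
    1: (1.0, 0.0, 0.0),
    2: (0.0, 1.0, 0.0),
    3: (0.0, 0.0, 1.0),
}
good = [(0, 1, 2, 3)]      # right-handed vertex ordering
flipped = [(0, 2, 1, 3)]   # two vertices swapped, so the orientation is reversed

print(is_noninverted(coords, good), is_noninverted(coords, flipped))
```

A smoother built on this sketch would evaluate `is_noninverted` on each trial step and reject steps that fail, while leaving boundary vertex coordinates untouched.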

3 Improvement Algorithms

In this paper, we consider the performance of five numerical optimization methods, namely, the steepest descent, conjugate gradient, quasi-Newton, trust-region, and feasible Newton methods, as implemented in Mesquite. The steepest descent and conjugate gradient solvers are gradient-based, whereas the remaining three are Hessian-based, i.e., they employ both the gradient and Hessian in the step computation. We describe each method below.


3.1 Steepest Descent Method

The steepest descent method [16] is a line search technique which takes a step along the direction $p_k = -\nabla f(x_k)$ at each iteration. In Mesquite, the steplength, $\alpha_k$, is chosen to satisfy the Armijo condition [23], i.e.,

$$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^T p_k$$

for some constant $c_1 \in (0, 1)$, which ensures that the step yields sufficient decrease in the objective function.
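The following sketch shows steepest descent with a backtracking search that enforces the Armijo condition. It is a generic textbook-style illustration, not Mesquite's line search: the function name, the initial steplength of 1, the halving factor, and the value `c1 = 1e-4` are all assumptions chosen for the example.

```python
def steepest_descent_armijo(f, grad, x0, c1=1e-4, shrink=0.5, tol=1e-8, max_iter=1000):
    """Steepest descent with Armijo backtracking (illustrative parameters)."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq ** 0.5 < tol:
            break
        fx = f(x)
        alpha = 1.0
        while alpha > 1e-16:
            # With p = -g, the Armijo condition reads
            # f(x + alpha p) <= f(x) - c1 * alpha * ||g||^2.
            trial = [xi - alpha * gi for xi, gi in zip(x, g)]
            if f(trial) <= fx - c1 * alpha * g_sq:
                x = trial
                break
            alpha *= shrink
        else:
            break  # no acceptable steplength was found
    return x


# Small convex test problem with minimizer (1, -2).
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


x_min = steepest_descent_armijo(f, grad, [0.0, 0.0])
print(x_min)
```

In a mesh smoothing setting, the acceptance test would additionally have to reject trial points that invert an element.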

3.2 Conjugate Gradient Method

The conjugate gradient method [16] is a line search technique which takes a step in a direction which is a linear combination of the negative gradient at the current iteration and the previous direction, i.e.,

$$p_k = -\nabla f(x_k) + \beta_k p_{k-1},$$

where $p_0 = -\nabla f(x_0)$. Conjugate gradient methods vary in their computation of $\beta_k$. The Fletcher-Reeves conjugate gradient method implemented in Mesquite computes

$$\beta_k^{FR} = \frac{\nabla f(x_k)^T \nabla f(x_k)}{\nabla f(x_{k-1})^T \nabla f(x_{k-1})}.$$

Care is taken in the line search employed by Mesquite to compute a steplength yielding both a feasible step (i.e., one which does not result in a tangled mesh) and an approximate minimum of the objective function along the line of interest.
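To make the direction update concrete, the sketch below applies the Fletcher-Reeves formula to a small quadratic $f(x) = \frac{1}{2} x^T H x - b^T x$. The exact steplength $\alpha = -\nabla f^T p / (p^T H p)$ is used only because a quadratic admits it in closed form; this is an assumption of the example and not how Mesquite chooses its steplength. All function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fr_beta(g_new, g_old):
    """Fletcher-Reeves coefficient: ratio of squared gradient norms."""
    return dot(g_new, g_new) / dot(g_old, g_old)


def cg_fletcher_reeves_quadratic(H, b, x0, iters):
    """Minimize 0.5 x^T H x - b^T x using FR directions and exact steps."""
    def matvec(v):
        return [dot(row, v) for row in H]

    x = list(x0)
    g = [hx - bi for hx, bi in zip(matvec(x), b)]  # gradient H x - b
    p = [-gi for gi in g]                           # p_0 = -grad f(x_0)
    for _ in range(iters):
        if dot(g, g) == 0.0:
            break
        alpha = -dot(g, p) / dot(p, matvec(p))      # exact minimizer along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        g_new = [hx - bi for hx, bi in zip(matvec(x), b)]
        beta = fr_beta(g_new, g)
        p = [-gn + beta * pi for gn, pi in zip(g_new, p)]
        g = g_new
    return x


H = [[2.0, 0.0], [0.0, 20.0]]
b = [2.0, -40.0]            # minimizer solves H x = b, i.e. x = (1, -2)
x_cg = cg_fletcher_reeves_quadratic(H, b, [0.0, 0.0], iters=2)
print(x_cg)
```

On this two-variable quadratic, two iterations with exact steps reach the minimizer up to rounding, which is the usual finite-termination property of conjugate gradient on quadratics.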

3.3 Quasi-Newton Method

Quasi-Newton methods [16] are line search (or trust-region) algorithms which replace the exact Hessian in Newton's method with an approximate Hessian in the computation of the Newton step. Thus, quasi-Newton methods solve $B_k p_k = -\nabla f(x_k)$, for some $B_k \approx \nabla^2 f(x_k)$ at each iteration in an attempt to find a stationary point, i.e., a point where $\nabla f(x) = 0$. The quasi-Newton implementation in Mesquite [15] is a line search that approximates the Hessian using the gradient and true values of the diagonal blocks of the Hessian.

3.4 Trust-Region Method

Trust-region methods [16] are generalizations of line search algorithms in that they allow the optimization algorithm to take steps in any direction provided that the steps are no longer than a maximum steplength. Steps are computed by minimizing a quadratic model of the function over the trust region. The trust region is expanded or contracted at each iteration depending upon how reflective the model is of the objective function at the given iteration.
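The expand-or-contract rule can be illustrated with a commonly used textbook update based on the ratio of actual to predicted reduction. The thresholds (1/4 and 3/4), the scaling factors, the acceptance threshold `eta`, and the function name are assumptions for this sketch; they are not taken from Mesquite.

```python
def update_trust_radius(radius, actual_reduction, predicted_reduction,
                        step_norm, max_radius=10.0, eta=0.1):
    """Return (new_radius, accept_step) from the model-agreement ratio.

    rho = actual / predicted measures how well the quadratic model
    reflected the objective on the last step.
    """
    if predicted_reduction <= 0.0:
        raise ValueError("the model must predict a decrease")
    rho = actual_reduction / predicted_reduction
    if rho < 0.25:
        # Poor agreement: contract the region.
        radius = 0.25 * radius
    elif rho > 0.75 and step_norm >= radius * (1.0 - 1e-12):
        # Good agreement and the step hit the boundary: expand.
        radius = min(2.0 * radius, max_radius)
    accept = rho > eta
    return radius, accept


print(update_trust_radius(1.0, 0.1, 1.0, 1.0))   # poor agreement
print(update_trust_radius(1.0, 0.9, 1.0, 1.0))   # good agreement at the boundary
print(update_trust_radius(1.0, 0.5, 1.0, 0.5))   # acceptable interior step
```

Because the radius caps the steplength, a method using such a rule can need many iterations to recover from a starting point far from the optimum.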


3.5 Feasible Newton Method

The feasible Newton method [17] is a specialized method for mesh quality improvement. In particular, it uses an inexact Newton method [24, 16] with an Armijo line search [23] to determine the direction in which to move the vertex coordinates. At each iteration, the algorithm solves the Newton equations via a conjugate gradient method with a block Jacobi preconditioner [24]. The solver also obtains good locality of reference.

4 Numerical Experiments

In this section, we report results from four numerical experiments designed to determine when each of the five solvers is preferred according to its time to convergence for local mesh smoothing. All solvers are implemented in Mesquite 2.0, the Mesh Quality Improvement Toolkit [15], and were run with their default parameter values. We solve the optimization problem (1) on a series of tetrahedral meshes generated with the CUBIT [25] and Tetgen [26] mesh generation packages. We consider the following geometries: distduct, foam, gear, hook [27] and cube. Sample meshes are shown in Figure 1. In the first three experiments, we study the effects of three different problem parameters on the time taken to reach $x^*$, a locally optimal solution. The problem parameters of interest are: problem size, initial mesh configuration, and grading of mesh elements. For each of the three parameters studied, we create a set of test meshes in which we isolate the parameter of interest and allow it to vary; these experiments were inspired by [28, 29]. Particular attention was paid to ensure that the remaining parameters were held as constant as possible. Due to space limitations, we have omitted most of the tables of initial mesh quality statistics which demonstrate this. In the fourth experiment, we investigate the effect that the mesh quality metric has on solver performance.

Because the objective functions used for our experiments are non-convex, the optimization techniques may converge to different local minima. To ensure that this did not affect our study, we verified for each experiment whether or not the solvers converge to the same optimal mesh by comparing vertex coordinates of the optimal meshes.

In the following subsections, we describe the problem characteristics of the test meshes in terms of the numbers of vertices and elements, initial mesh quality (according to the mesh quality metric of interest), and parameter values of interest (such as magnitude of perturbation). We then specify performance results for the five optimization solvers. In all cases, the solution is considered optimal when it has converged to six significant digits. The machine employed for this study is equipped with an Intel P4 processor (2.67 GHz). The 32-bit machine has 1GB of RAM, a 512KB L2 cache, and runs Linux.


Fig. 1. Sample meshes on the (a) gear, (b) foam, (c) distduct, (d) hook, and (e) cube geometries. Geometries (a)-(d) were provided to us by Dr. Patrick Knupp of Sandia National Laboratories [27].

4.1 Increasing Problem Size

To test the effect that increasing the problem size has on optimization solver performance, we used CUBIT to generate a series of tetrahedral meshes with an increasing number of vertices while maintaining uniform mesh quality and element size. A series of meshes were generated for the distduct, foam, gear, and hook geometries shown in Figures 1(a) through 1(d); for each series of meshes, the number of elements is increased from approximately 5000 to 500,000 elements.

In the creation of the test meshes, care was taken to ensure that, for each mesh geometry, we achieve our goal of maintaining roughly uniform element size and mesh quality distributions. Table 1 shows the initial and final aspect ratio quality before and after the conjugate gradient method was applied on three of the meshes. Such changes in mesh quality were typical of the results seen in this experiment.

For each mesh geometry, when the aspect ratio mesh quality metric was employed, the time required for convergence increased linearly with an increase in problem size. Figure 2 illustrates this trend for the use of the various solvers on the distduct geometry. Solver behavior was identical on the remaining geometries; in particular, the solvers also converged to the same optimal meshes. Thus, additional figures have been omitted. The linear trend is expected as the number of iterations to convergence is more or less a constant, and the time per iteration increases linearly with the number of elements used for local mesh smoothing. There are instances where a deviation from linearity is seen in larger meshes. These are likely due to limitations on the size of the mesh which can fit in the cache; small meshes may fit entirely in the cache, whereas larger meshes may only partially fit in the cache.

Table 1. Initial and final mesh quality after smoothing the distduct mesh with the conjugate gradient method using the aspect ratio mesh quality metric

Distduct Mesh                      Mesh Quality (Aspect Ratio)
# Vertices  # Elements  Phase      min      avg      rms      max      std dev
1,262       5,150       Initial    1.00557  1.33342  1.35118  2.71287  0.218363
                        Final      1.00077  1.27587  1.28932  2.83607  0.185684
19,602      99,895      Initial    1.0007   1.28014  1.29531  10.3188  0.197718
                        Final      1.00065  1.21742  1.22755  4.8624   0.157424
92,316      498,151     Initial    1.00009  1.27055  1.28513  18.5592  0.193054
                        Final      1.00004  1.18949  1.1977   18.5592  0.139968

We now examine the behavior of the various solvers on the distduct meshes with the use of the aspect ratio quality metric. For engineering applications, a highly accurate solution is not often needed or even desired. Thus, we consider partially-converged as well as fully-converged solutions. In each case, we consider smoothing with 85%, 90%, and 100%-converged solutions; the results are shown in Figure 2. The legend for the remaining plots in the paper is as follows: 'circle' (steepest descent), 'triangle' (conjugate gradient), 'diamond' (quasi-Newton), 'square' (trust-region), and 'star' (feasible Newton).

In all the cases, i.e., for the 85%-, 90%-, and 100%-converged solutions, the five optimization solvers converged towards the same optimal mesh. For the 85%-converged solutions, feasible Newton is the fastest method to reach an optimal solution (see Figure 2(a)); few iterations were required since the initial CUBIT-generated meshes were of fairly good quality. Feasible Newton was possibly the quickest method since it takes fewer iterations than the other methods; however, each of its iterations takes a greater amount of time than an iteration of the other solvers. The ranking of all solvers in order of fastest to slowest on the larger meshes is: feasible Newton < steepest descent < conjugate gradient < trust-region < quasi-Newton. For the smaller meshes, the rank-ordering is: conjugate gradient < feasible Newton < steepest descent < trust-region < quasi-Newton. In general, the gradient-based solvers (i.e., steepest descent and conjugate gradient) performed better than the Hessian-based solvers (trust-region and quasi-Newton). However, feasible Newton, which is a Hessian-based solver, performed very competitively. This is likely due to the fact that local mesh smoothing was performed with a highly-tuned solver. In addition, the rank ordering of the solvers depends on the mesh size as noted above.

In the majority of the 90%-converged solution cases (see Figure 2(b)), the conjugate gradient algorithm reached convergence faster than the other methods. This was followed by the steepest descent, feasible Newton, trust-region, and quasi-Newton methods, respectively. This ordering differs from that obtained for the 85% case. Because local mesh smoothing was performed, only one vertex in the mesh is moved at a time. The steepest descent and conjugate gradient methods use only the gradient of the objective function to move a vertex to its optimal location. The other methods also use the Hessian of the objective function to move the vertex. The calculation of the Hessian adds computational expense, making the Hessian-based methods comparatively slower. However, Hessians may affect local mesh smoothing results less than global mesh smoothing results where the Hessian matrices are much larger. The conjugate gradient method is superior to steepest descent since it uses gradient history to determine the optimal vertex position.

Fig. 2. Mesh smoothing to various convergence levels: (a) 85%-converged solution; (b) 90%-converged solution; (c) 100%-converged solution. Results are for the distduct meshes with the aspect ratio quality metric.

In the majority of the 100%-converged solution cases (see Figure 2(c)), the conjugate gradient algorithm was the fastest to reach convergence for smaller meshes; however, the steepest descent method proved to be faster for larger meshes. This is probably due to the increase in memory which is required for larger meshes. Eventually the increased requirements on the performance of the cache may slow down the conjugate gradient algorithm relative to the steepest descent algorithm since it must store and access an additional vector.

In conclusion, the behavior of the optimization solvers is influenced by the degree of accuracy desired in the solution and the size of the mesh. Most of the time, the gradient-based optimization solvers exhibited superior performance to that of the Hessian-based solvers.

4.2 Initial Mesh Configuration

In order to investigate the effect that the initial mesh configuration (as measured by distance from optimal mesh) had on the performance of the five solvers, a series of perturbed meshes, based on the 500,000 element distduct, foam, gear, and hook meshes from the previous experiment, were designed. In particular, the meshes were smoothed initially using the aspect ratio mesh quality metric. Then, random or systematic perturbations were applied to the interior vertices of the optimal mesh. For all experiments, the perturbations were applied to all interior vertices and to randomly chosen subsets of vertices of size 5%, 10%, 25%, and 100% of the interior vertices. The formulas for the perturbations are as follows:

Random: $x_v = x_v + \alpha_v r$, where $r$ is a vector of random numbers generated using the rand function, and $\alpha_v$ is a multiplicative factor controlling the amount of perturbation. For our experiments, we chose a random value for $\alpha_v$; the resulting meshes were checked to verify that they were of poor quality.

Translational: $x_v = x_v + \alpha s$, where $s$ is a direction vector giving the coordinates to be shifted, and $\alpha$ is a multiplicative factor controlling the degree of perturbation. In this case, we consider the shift with $s = [1\ 0\ 0]^T$; $\alpha$ values ranging from 0.016 to 1.52 were used to maximize the amount of perturbation a particular mesh could withstand before the elements became inverted. Thus, the specific value of $\alpha$ chosen for a mesh depended upon the size of the elements.
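The two perturbation formulas can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: vertices are stored in a dictionary keyed by id, `alpha_max` is a hypothetical upper bound for the random factor $\alpha_v$ (the source says only that a random value was chosen), and Python's `random` module stands in for the rand function. Neither function checks for inverted elements.

```python
import random


def perturb_random(coords, interior, fraction, alpha_max, rng):
    """x_v <- x_v + alpha_v * r on a random subset of the interior vertices."""
    new = dict(coords)
    k = round(fraction * len(interior))
    for v in rng.sample(sorted(interior), k):
        alpha_v = rng.uniform(0.0, alpha_max)    # assumed distribution for alpha_v
        r = [rng.random() for _ in range(3)]     # uniform random numbers in [0, 1)
        new[v] = tuple(x + alpha_v * ri for x, ri in zip(coords[v], r))
    return new


def perturb_translate(coords, interior, alpha, s=(1.0, 0.0, 0.0)):
    """x_v <- x_v + alpha * s on every interior vertex."""
    new = dict(coords)
    for v in interior:
        new[v] = tuple(x + alpha * si for x, si in zip(coords[v], s))
    return new


# Toy data: six vertices on a line; vertices 0 and 5 play the role of the boundary.
coords = {i: (float(i), 0.0, 0.0) for i in range(6)}
interior = [1, 2, 3, 4]

shifted = perturb_translate(coords, interior, alpha=0.016)
jittered = perturb_random(coords, interior, fraction=0.5, alpha_max=0.1,
                          rng=random.Random(0))
changed = [v for v in coords if jittered[v] != coords[v]]
print(shifted[1], changed)
```

Passing an explicit seeded generator keeps the perturbed mesh reproducible across solver runs, which matters when the same starting mesh must be given to all five solvers.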

Random Perturbations

The results obtained here differ somewhat from the results obtained from the scalability experiment above. They are similar in that the gradient-based methods performed better than the Hessian-based methods. This can be attributed to the greater computational expense of computing the Hessian matrices for a smaller payoff in terms of a decrease in the objective function. The main difference here is that, in almost all cases, the steepest descent algorithm performs better than the conjugate gradient algorithm.

For this experiment, the meshes to be smoothed were perturbed from the fully optimized CUBIT-generated meshes. Thus, the initial meshes are of poorer quality. Starting with poor quality meshes, i.e., far away from an optimal mesh, had a very significant impact on the performance of the solvers. There are cases when the conjugate gradient method does better than the steepest descent method when the quality of the input mesh is reasonably good. In this case, all solvers converged to the same optimal mesh.

However, when we start with a poor quality initial mesh, a coarse-scale improvement in the mesh is needed. Once the mesh has been sufficiently smoothed, fine-scale improvements can be obtained through the use of superior solvers. In most cases, because the perturbation was large, coarse-scale smoothing was needed. As a result, the performance of steepest descent was the best (also due to the lower complexity of the algorithm). When the perturbations were small, fine-scale smoothing requirements imply that superior methods will converge faster. This was indeed seen when the perturbations were small. The conjugate gradient method's performance was better than that of steepest descent in such cases. However, the Hessian-based methods were slower because of their inherent computational complexity. Figure 3(a) shows typical objective function versus time plots for our experiments.

The behavior of the trust-region method was distinctly different from that of the other algorithms. For small perturbations from the optimal mesh, the behavior of the trust-region method almost coincided with that of the other methods in the quality versus time plots. Figure 3(b) illustrates an example of such behavior.

However, when the perturbations were large, the trust-region method was much slower than the other methods in terms of time to convergence. This is due to the constraint of the spherical trust-region bounding the maximum acceptable steplength at each iteration. This conservative approach slows the time to convergence of the trust-region method. It was also observed that, for large perturbations, the steepest descent method does not converge to the same optimal mesh as the other methods. In particular, it converges to an optimal mesh with a higher objective function value. The plot shown in Figure 3(c) is a good example of the dismal performance of the trust-region and steepest descent methods in the large perturbation case.

In conclusion, the rank-ordering of the optimization solvers depends upon the amount of random perturbations applied to the initial meshes in the context of mesh smoothing using the aspect ratio quality metric. In particular, all five methods performed competitively for the small perturbation case; however, the steepest descent and conjugate gradient methods performed the best. In the case of medium-sized perturbations, the steepest descent method performed the best, and the trust-region method performed very slowly. The other three methods exhibited average performance. Finally, for the case of large perturbations, the trust-region method is very slow to converge, and the steepest descent method may converge to a mesh of poorer quality.

Affine Perturbations

[Figure 3: objective function versus time plots. Panels: (a) 10% interior vertices perturbed; (b) 10% interior vertices perturbed; (c) 5% interior vertices perturbed.]

Fig. 3. Typical results from the random perturbation experiment. Results were obtained by smoothing the 500,000 element meshes using the aspect ratio quality metric. (a) The result is for the gear mesh with 10% of its vertices perturbed; because the perturbation was small, the behavior of the trust-region method was almost coincident with that of the other solvers. (b) The result is for the distduct mesh with 10% of its vertices perturbed; here the trust-region method is competitive when the initial mesh is of reasonable quality due to the medium-size perturbation. (c) The result is for the distduct mesh with 5% of its vertices perturbed. Because the perturbations were large, the steepest descent and trust-region methods performed very poorly.

In order to determine the effect that affine perturbations had on the performance of the optimization solvers, the affine (translation) perturbation shown above was applied to all interior mesh vertices once the appropriate initial 500,000 element distduct, foam, gear, and hook meshes were smoothed according to the aspect ratio mesh quality metric.

The qualities of the interior elements of the perturbed meshes were still fairly good since the transformation applied was affine; however, the qualities of the boundary elements were much worse. It is expected that the convergence plots for all of the solvers will start with a rapid decrease in the objective function and will end with a small decrease in the objective function. This is because the initial meshes were created by applying as large an affine perturbation as possible before mesh inversion occurred, thus generating meshes rather far away from the optimal ones. This behavior is typical and is observed in the plots shown in Figure 4. The time taken per nonlinear iteration varies with the computational complexity of the algorithm. However, the objective function values (for the various solvers) remain rather similar over the first few iterations until, eventually, more vertex movement occurs, and the objective function values become less predictable. Nevertheless, all solvers did converge to the same optimal mesh.

[Figure 4: convergence plots. Panels: (a) Mesh smoothing of affinely perturbed distduct mesh; (b) Mesh smoothing of affinely perturbed hook mesh.]

Fig. 4. Typical results for the affine perturbation experiment using the aspect ratio for local mesh smoothing. The results are for smoothing the distduct and hook meshes with 500,000 elements after all interior vertices were affinely perturbed.

The steepest descent method, being the least computationally expensive method, spends less time per iteration and converges to an optimal mesh fairly quickly. The ranking of the optimization solvers for the affine perturbation meshes is as follows: steepest descent < conjugate gradient < feasible Newton < trust-region < quasi-Newton. This rank-ordering demonstrates that methods for which every iteration is faster converge before methods for which each iteration is slower.
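One way to read this rank-ordering is through a simple cost model. This is an illustrative assumption rather than an analysis taken from the experiments: suppose solver s needs k_s nonlinear iterations, each costing roughly t_s seconds, and compare steepest descent (SD) with quasi-Newton (QN).

```latex
T_s \approx k_s \, t_s,
\qquad
T_{\mathrm{SD}} < T_{\mathrm{QN}}
\iff
\frac{k_{\mathrm{SD}}}{k_{\mathrm{QN}}} < \frac{t_{\mathrm{QN}}}{t_{\mathrm{SD}}}
```

Under this model, a solver with cheaper iterations finishes first whenever the extra iterations it needs are outweighed by its per-iteration saving, which is what the ordering above suggests for the affinely perturbed meshes.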

In conclusion, the optimization solvers exhibited a distinct rank-ordering for the affine perturbation meshes in the context of local mesh smoothing using the aspect ratio quality metric. In particular, the rank-ordering was as follows: steepest descent < conjugate gradient < feasible Newton < trust-region < quasi-Newton.


4.3 Graded Meshes

Our second test set was generated using TetGen in order to test the effect that grading of mesh elements has on the performance of the five optimization solvers, as graded meshes have a wider distribution of element mesh qualities. For this experiment, three sets of structured tetrahedral meshes were generated which contain the same numbers of vertices and elements but whose elements have different volumes. The meshes were constructed on a cube domain having a side length of 20 units. In the first set of meshes, the vertices were evenly distributed in two of the three axes, but, for the other axis, half of the vertices were placed in the first 10%, 20%, 30%, or 40% of the volume. Two additional sets of test meshes were created with the density of vertices varying in two and three directions instead of variation in only one direction. After the point clouds were created, TetGen was used to create a volume mesh of the cube domain. The resulting Delaunay meshes, which were created without using any quality control features, were used for the graded mesh experiment. See Figure 1(e) for an example of a mesh created with half of its vertices occupying 30% of the space in all three axes and distributed uniformly throughout the rest of the cube volume.

This mesh generation technique results in a structured mesh with heterogeneous elements in terms of volume. In particular, approximately one-fourth, one-half, and one-fourth of the mesh elements can be considered small, medium, and large, respectively. All of the meshes generated contain 8000 vertices and 41,154 tetrahedra.
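The point-cloud construction for the graded meshes can be sketched as follows. This is a minimal illustration under stated assumptions: 20 points per axis (so that 20^3 = 8000 vertices, matching the count above), uniform spacing within each portion of a graded axis, and hypothetical function names. The source does not give the exact point placement, and the Delaunay meshing step is not reproduced here.

```python
from itertools import product


def graded_coordinates(n, side, fraction):
    """n coordinates on [0, side] (assumes n >= 4): the first half of the
    points are spread over the first `fraction` of the length, the rest
    over the remainder."""
    half = n // 2
    rest = n - half
    split = fraction * side
    # Dense part: evenly spaced in [0, split).
    dense = [split * i / half for i in range(half)]
    # Sparse part: evenly spaced in [split, side].
    sparse = [split + (side - split) * i / (rest - 1) for i in range(rest)]
    return dense + sparse


def graded_point_cloud(n=20, side=20.0, fraction=0.3, graded_axes=3):
    """Tensor-product point cloud on a cube. The first `graded_axes` of the
    three axes (1, 2, or 3) use the graded spacing; the others are uniform."""
    uniform = [side * i / (n - 1) for i in range(n)]
    graded = graded_coordinates(n, side, fraction)
    axes = [graded if k < graded_axes else uniform for k in range(3)]
    return list(product(*axes))


# Half of the vertices occupy the first 30% of the cube along every axis.
cloud = graded_point_cloud(fraction=0.3, graded_axes=3)
```

A point cloud of this kind would then be handed to a Delaunay mesher such as TetGen to obtain the volume mesh.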

The results obtained from this experiment are shown in Figure 5. The mesh smoothing results for the graded meshes are similar to those observed in the affine perturbation case. The main difference between the two experiments is the behavior of the conjugate gradient method. For the graded meshes, there is a definite hierarchy among the other four solvers; the rank-ordering is as follows: steepest descent < feasible Newton < trust-region < quasi-Newton. However, the rank of the conjugate gradient method with respect to the other solvers varies as a function of time.

In conclusion, the rank-ordering of the conjugate gradient method varied as a function of time as the graded meshes were smoothed using the aspect ratio mesh quality metric. However, the rank-ordering of the remaining four optimization solvers was as follows: steepest descent < feasible Newton < trust-region < quasi-Newton.

4.4 Mesh Quality Metric

Our final experiment was designed to investigate the effect of the choice of mesh quality metric on the performance of the optimization methods. For this experiment, we investigated the performance of the various methods on the distduct, foam, gear, hook, and cube meshes by repeating a subset of the above experiments for the inverse mean ratio and vertex condition number quality metrics.

[Figure 5. Panels: (a) 10%; (b) 40%.]

Fig. 5. Mesh smoothing results for the graded meshes using the aspect ratio mesh quality metric. The percentages indicate the amount of volume used by the first half of the vertices in a given axis within the cube domain.

The results of performing the scaling experiment for the inverse mean ratio and vertex condition number quality metrics are the same as those for the aspect ratio mesh quality metric described above.

Performing the random perturbation experiment for the inverse mean ratio and vertex condition number quality metrics yielded results that were qualitatively the same, i.e., the results could be classified into one of the above three cases depending upon how large the perturbations were.

Performing the affine perturbation experiment for the inverse mean ratio and vertex condition number mesh quality metrics yielded results similar to those obtained when the aspect ratio mesh quality metric was used.

Performing the element heterogeneity experiment for the inverse mean ratio mesh quality metric yielded results that were the same as those observed earlier for the aspect ratio mesh quality metric. However, the results are different for the vertex condition number mesh quality metric. When the vertex condition number metric is employed for mesh smoothing in the context of the graded mesh experiment, we observe a small rise in the objective function after a significant initial decrease as seen in Figure 6. The plots in this figure are for the cube meshes with vertices in all three axes distributed nonuniformly to create graded meshes with elements of heterogeneous volume. Although such behavior is rare, it is possible, as local mesh smoothing is being applied with a global objective function. Further investigation into the cause of such behavior for the meshes in this experiment is needed.

In conclusion, the scaling experiment results were insensitive to the choice of mesh quality metric. However, the perturbation and element heterogeneity results were indeed sensitive to the choice of mesh quality metric. Further research is needed to identify additional contexts where the choice of mesh quality metric influences optimization solver behavior.

[Figure 6. Panels: (a) 10%; (b) 20%; (c) 30%; (d) 40%.]

Fig. 6. Mesh smoothing results for the cube meshes with heterogeneous element volumes using the vertex condition number mesh quality metric. The percentages indicate the percentage of volume used by the first half of the vertices in all three axes within the cube domain.

5 Future Work

The results in this study are specific to local mesh quality improvement of unstructured tetrahedral meshes via five optimization solvers, namely, the steepest descent, Fletcher-Reeves conjugate gradient, quasi-Newton, trust-region, and feasible Newton methods, with mesh quality measured according to the three specified quality metrics, namely, the aspect ratio, inverse mean ratio, and vertex condition number. The results we obtained might vary dramatically if global mesh quality improvement methods were used instead of the local ones studied here [28, 29]; hence, we plan to investigate global versions of these solvers in future work. In addition, vertex ordering has been shown to play an important role in convergence of the Feasnewt solver when used for local mesh optimization [30]; thus, we will also investigate the effect of vertex ordering in the future. We also plan to investigate the role that other non-shape quality metrics play in the mesh optimization methods with the goal of identifying other contexts where quality metrics influence optimization solver behavior. Finally, we plan to investigate the use of hybrid solvers to improve optimization solver performance.

References

1. Babuska, I., Suri, M.: The p and h-p versions of the finite element method, basic principles, and properties. SIAM Review 35, 579–632 (1994)

2. Berzins, M.: Solution-based mesh quality for triangular and tetrahedral meshes. In: Proceedings of the 6th International Meshing Roundtable, Sandia National Laboratories, pp. 427–436 (1997)

3. Berzins, M.: Mesh quality - Geometry, error estimates, or both? In: Proceedings of the 7th International Meshing Roundtable, Sandia National Laboratories, pp. 229–237 (1998)

4. Babuska, I., Aziz, A.: On the angle condition in the finite element method. SIAM J. Numer. Anal. 13, 214–226 (1976)

5. Fried, E.: Condition of finite element matrices generated from nonuniform meshes. AIAA Journal 10, 219–221 (1972)

6. Shewchuk, J.: What is a good linear element? Interpolation, conditioning, and quality measures. In: Proceedings of the 11th International Meshing Roundtable, Sandia National Laboratories, pp. 115–126 (2002)

7. Freitag, L., Ollivier-Gooch, C.: A cost/benefit analysis for simplicial mesh improvement techniques as measured by solution efficiency. Internat. J. Comput. Geom. Appl. 10, 361–382 (2000)

8. Knupp, P., Freitag, L.: Tetrahedral mesh improvement via optimization of the element condition number. Int. J. Numer. Meth. Eng. 53, 1377–1391 (2002)

9. Freitag, L., Plassmann, P.: Local optimization-based simplicial mesh untangling and improvement. Int. J. Numer. Meth. Eng. 49, 109–125 (2000)

10. Amenta, N., Bern, M., Eppstein, D.: Optimal point placement for mesh smoothing. In: Proceedings of the 8th ACM-SIAM Symposium on Discrete Algorithms, pp. 528–537 (1997)

11. Zavattieri, P.: Optimization strategies in unstructured mesh generation. Int. J. Numer. Meth. Eng. 39, 2055–2071 (1996)

12. Amezua, E., Hormaza, M., Hernandez, A., Ajuria, M.: A method of the improvement of 3D solid finite element meshes. Adv. Eng. Softw. 22, 45–53 (1995)

13. Canann, S., Stephenson, M., Blacker, T.: Optismoothing: An optimization-driven approach to mesh smoothing. Finite Elem. Anal. Des. 13, 185–190 (1993)

14. Parthasarathy, V., Kodiyalam, S.: A constrained optimization approach to finite element mesh smoothing. Finite Elem. Anal. Des. 9, 309–320 (1991)

15. Brewer, M., Freitag Diachin, L., Knupp, P., Leurent, T., Melander, D.: The Mesquite Mesh Quality Improvement Toolkit. In: Proceedings of the 12th International Meshing Roundtable, Sandia National Laboratories, pp. 239–250 (2003)

16. Nocedal, J., Wright, S.: Numerical Optimization, 2nd edn. Springer, Heidelberg (2006)


17. Munson, T.: Mesh Shape-Quality Optimization Using the Inverse Mean-Ratio Metric. Mathematical Programming 110, 561–590 (2007)

18. Cavendish, J., Field, D., Frey, W.: An approach to automatic three-dimensional finite element mesh generation. Int. J. Numer. Meth. Eng. 21, 329–347 (1985)

19. Liu, A., Joe, B.: Relationship between tetrahedron quality measures. BIT 34, 268–287 (1994)

20. Knupp, P.: Achieving finite element mesh quality via optimization of the Jacobian matrix norm and associated quantities, Part II - A framework for volume mesh optimization and the condition number of the Jacobian matrix. Int. J. Numer. Meth. Eng. 48, 1165–1185 (2000)

21. Knupp, P.: Matrix norms and the condition number. In: Proceedings of the 8th International Meshing Roundtable, Sandia National Laboratories, pp. 13–22 (1999)

22. Knupp, P.: Algebraic mesh quality metrics. SIAM J. Sci. Comput. 23, 193–218 (2001)

23. Armijo, L.: Minimization of functions having Lipschitz-continuous first partial derivatives. Pacific Journal of Mathematics 16, 1–3 (1966)

24. Kelley, C.T.: Solving Nonlinear Equations with Newton's Method. SIAM, Philadelphia (2003)

25. Sandia National Laboratories, CUBIT Generation and Mesh Generation Toolkit, http://cubit.sandia.gov/

26. Si, H.: TetGen - A Quality Tetrahedral Mesh Generator and Three-Dimensional Delaunay Triangulator, http://tetgen.berlios.de/

27. Knupp, P.: Personal communication (2009)

28. Freitag, L., Knupp, P., Munson, T., Shontz, S.: A comparison of inexact Newton and coordinate descent mesh optimization techniques. In: Proceedings of the 13th International Meshing Roundtable, Sandia National Laboratories, pp. 243–254 (2004)

29. Diachin, L., Knupp, P., Munson, T., Shontz, S.: A comparison of two optimization methods for mesh quality improvement. Eng. Comput. 22, 61–74 (2006)

30. Shontz, S.M., Knupp, P.: The effect of vertex reordering on 2D local mesh optimization efficiency. In: Proceedings of the 17th International Meshing Roundtable, Sandia National Laboratories, pp. 107–124 (2008)

