
Accelerating Hessian-free Gauss-Newton full-waveform inversion via improved preconditioning strategies
Wenyong Pan, Kristopher A. Innanen, Department of Geoscience, CREWES Project, University of Calgary; Wenyuan Liao, Department of Mathematics and Statistics, University of Calgary

SUMMARY

Gradient-based methods for full-waveform inversion (FWI) have the potential to converge globally but suffer from a slow convergence rate. Newton-type methods provide quadratic convergence, but they are computationally burdensome for large-scale inverse problems. The Hessian-free (HF) optimization method represents an attractive alternative to the above-mentioned optimization methods. At each iteration, the HF approach obtains the search direction by approximately solving the Newton linear system with a conjugate-gradient (CG) algorithm. One issue with HF optimization is that the CG algorithm requires many iterations. In this paper, we develop and compare different preconditioning schemes for the CG algorithm to accelerate the HF Gauss-Newton method. Traditionally, the preconditioners are designed as diagonal Hessian approximations or inverse Hessian approximations. In this research, we propose to construct the l-BFGS inverse Hessian preconditioner with diagonal Hessian approximations as the initial guess. We show that the quasi-Newton l-BFGS preconditioning scheme with the pseudo diagonal Gauss-Newton Hessian as the initial guess performs best in accelerating HF Gauss-Newton FWI.

INTRODUCTION

Seismic full-waveform inversion (FWI) holds the promise of providing high-resolution estimates of subsurface properties. FWI iteratively reconstructs the model parameters by minimizing an L2-norm misfit function (Lailly, 1983; Tarantola, 1984; Virieux and Operto, 2009). Traditional optimization methods for FWI in exploration geophysics are gradient-based methods (i.e., the steepest-descent (SD) and non-linear conjugate-gradient (NCG) methods), which are computationally attractive for large-scale inverse problems. However, they suffer from slow convergence rates.

The search direction can be significantly enhanced by multiplying the gradient with the inverse Hessian matrix (Pratt et al., 1998). For multi-parameter FWI, the multi-parameter Hessian is also expected to suppress parameter cross-talk (Operto et al., 2013; Innanen, 2014a,b; Pan et al., 2015c). Furthermore, the second-order term in the Hessian matrix can help remove second-order scattering artifacts from the gradient (Pratt et al., 1998; Pan et al., 2015b,c). However, explicit calculation, storage and inversion of the Hessian at each iteration is computationally impractical for large-scale inverse problems. Hence, various approaches have been proposed for approximating the Hessian (Shin et al., 2001a) or the inverse Hessian (Nocedal and Wright, 2006; Nammour and Symes, 2009; Demanet et al., 2012). In the Gauss-Newton method, an approximate Hessian is introduced that involves only the first-order term (Pratt et al., 1998). Tang (2009) and Pan et al. (2015a) used phase-encoding technology to construct the diagonal Gauss-Newton Hessian efficiently. Shin et al. (2001a) proposed a pseudo-Hessian approximation by replacing the Fréchet derivative wavefield with the virtual source during the auto-correlation process. Preconditioning the gradient with the diagonal pseudo-Hessian resembles a deconvolution imaging condition (Pan et al., 2014, 2015a).

Instead of constructing the Hessian explicitly, the quasi-Newton l-BFGS methods approximate the inverse Hessian iteratively by storing the model and gradient changes from a number M (M < 10) of previous iterations (Nocedal and Wright, 2006). Compared to gradient-based methods, l-BFGS methods provide faster convergence rates for large-scale inverse problems (Brossier et al., 2010; Ma and Hale, 2012). The convergence performance of the l-BFGS method is closely related to the initial guess of the inverse Hessian approximation (Brossier et al., 2009; Guitton and Díaz, 2012). In the numerical section, we give examples examining the convergence rates of l-BFGS methods with different diagonal Hessian approximations as the initial guess.

The Hessian-free optimization method represents an attractive alternative to the above-described optimization methods (Nash, 1985; Santosa and Symes, 1988). At each iteration, the search direction is computed by approximately solving the Newton equations with a matrix-free conjugate-gradient (CG) algorithm (Nash, 1985; Hu et al., 2009). This linear iterative solver only requires Hessian-vector products instead of forming the Hessian operator explicitly (Fichtner and Trampert, 2011; Métivier et al., 2014). In this paper, the full Hessian is replaced with the Gauss-Newton Hessian, which is always symmetric and positive semi-definite. One issue with the HF optimization method is that obtaining the search direction approximately requires a large number of CG iterations. Our main goal in this paper is to precondition the CG algorithm, reduce the number of CG iterations, and accelerate HF Gauss-Newton FWI (Nash, 2000; Sainath et al., 2013).

Preconditioning makes the CG problem well-conditioned, hence easier to solve, and it reduces the number of CG iterations. The preconditioner for the CG algorithm is designed by approximating the Hessian or its inverse (Nash, 2000). Different preconditioning schemes are developed for comparison in this paper. The diagonal pseudo-Hessian and diagonal Gauss-Newton Hessian are first considered as preconditioners for the CG algorithm. A pseudo diagonal Gauss-Newton Hessian approximation is also introduced as a preconditioner. Quasi-Newton l-BFGS inverse Hessian approximations also serve as effective preconditioners for the CG iterative solver (Nash, 1985). In this paper, we furthermore propose that the l-BFGS inverse Hessian preconditioners be constructed with diagonal Hessian approximations as initial guesses. We demonstrate with numerical examples that the l-BFGS inverse Hessian preconditioning strategy can accelerate the HF Gauss-Newton method effectively and, furthermore, that employing diagonal Hessian approximations as the initial guess improves its efficiency further.

THE NON-LINEAR LEAST-SQUARES INVERSE PROBLEM

FWI seeks to estimate the subsurface parameters by iteratively minimizing the difference between the synthetic data d_syn and observed data d_obs (Lailly, 1983; Tarantola, 1984; Virieux and Operto, 2009). The misfit function Φ is formulated in a least-squares form:

\Phi(m) = \frac{1}{2} \sum_{x_s} \sum_{x_g} \sum_{\omega} \left\| \Delta d(x_g, x_s, \omega) \right\|^2, \qquad (1)

where Δd = d_obs − d_syn is the data residual vector and ‖·‖ denotes the L2 norm. Here, the synthetic data d_syn are related to the seismic wavefield u by a detection operator P, which samples the wavefield at the receiver locations: d_syn = Pu. To minimize the quadratic approximation of the misfit function, the updated model at the (k+1)th iteration is written as the sum of the model at the kth iteration and the search direction Δm_k:

m_{k+1} = m_k + \mu_k \Delta m_k, \qquad (2)

where μ_k is the step length, a scalar constant calculated through a line search satisfying the weak Wolfe condition (Nocedal and Wright, 2006). Within a Newton optimization framework, the search direction Δm_k is the solution of the Newton linear system:

H_k \Delta m_k = -g_k, \qquad (3)


where g and H indicate the gradient and Hessian, respectively. Within the adjoint-state formalism (Plessix, 2006), the gradient can be expressed as (Sirgue and Pratt, 2004; Pan et al., 2015a):

g(x) = \sum_{x_g} \sum_{x_s} \sum_{\omega} \Re\left( \omega^2 f_s(\omega)\, G(x, x_s, \omega)\, G(x_g, x, \omega)\, \Delta d^*(x_g, x_s, \omega) \right), \qquad (4)

where G(x, x_s, ω) and G(x_g, x, ω) indicate the source-side and receiver-side Green's functions, respectively. Following equation (4), the gradient can be constructed efficiently by cross-correlating the forward-modelled wavefield with the back-propagated data-residual wavefield (Virieux and Operto, 2009). The gradient is poorly scaled due to geometrical spreading, and it is also contaminated by spurious correlations caused by finite-frequency effects and doubly-scattered energy (Pratt et al., 1998). Multiplying the gradient with the inverse Hessian can greatly enhance the model update, providing a quadratic convergence rate.
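As a concrete illustration of equation (4), a single-frequency gradient can be assembled by cross-correlating the source-side wavefield with the conjugated, back-propagated data residuals. The NumPy sketch below is a minimal illustration under assumed toy arrays (G_src, G_rec, resid and wavelet are randomly generated stand-ins for modelled Green's functions, residuals and the source wavelet), not the authors' implementation:

```python
import numpy as np

# Toy dimensions: nx model points, ns sources, ng receivers, one frequency.
nx, ns, ng = 200, 7, 11
omega = 2.0 * np.pi * 10.0  # angular frequency (10 Hz), assumed for illustration
rng = np.random.default_rng(0)

# Assumed precomputed quantities (would come from forward/adjoint modelling):
G_src = rng.standard_normal((nx, ns)) + 1j * rng.standard_normal((nx, ns))  # G(x, xs, w)
G_rec = rng.standard_normal((ng, nx)) + 1j * rng.standard_normal((ng, nx))  # G(xg, x, w)
resid = rng.standard_normal((ng, ns)) + 1j * rng.standard_normal((ng, ns))  # dd(xg, xs, w)
wavelet = 1.0 + 0.5j                                                        # fs(w)

def gradient_single_freq(G_src, G_rec, resid, wavelet, omega):
    """g(x) = sum_{xg,xs} Re( w^2 fs G(x,xs) G(xg,x) dd*(xg,xs) ), eq. (4)."""
    # Back-propagate the conjugated residuals to every model point:
    # back(x, xs) = sum_xg G(xg, x) * dd*(xg, xs)
    back = G_rec.T @ np.conj(resid)          # shape (nx, ns)
    # Cross-correlate with the source-side wavefield and sum over sources:
    return np.real(omega**2 * wavelet * np.sum(G_src * back, axis=1))

g = gradient_single_freq(G_src, G_rec, resid, wavelet, omega)
```

In practice the sums over frequency and over sources are accumulated, and the Green's functions are never stored densely; the sketch only shows the correlation structure of the formula.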

The Gauss-Newton approximate Hessian H is constructed by correlating two Fréchet derivative wavefields, which accounts only for the first-order scattering effects (Tang, 2009):

H(x, x') = \sum_{x_g} \sum_{x_s} \sum_{\omega} \Re\left( \omega^4 |f_s(\omega)|^2\, G(x, x_s, \omega)\, G(x_g, x, \omega)\, G^*(x', x_s, \omega)\, G^*(x_g, x', \omega) \right), \qquad (5)

where the element H(x, x') of the Gauss-Newton Hessian is formed by correlating the two Fréchet derivative wavefields at the receiver locations due to model perturbations at positions x and x'.

For these Newton-type methods, explicit evaluation and inversion of the Hessian matrix H or the Gauss-Newton Hessian are required at each iteration. Although Newton-type methods benefit from fast convergence rates, the computation, storage and inversion of the Hessian at each iteration are prohibitively expensive for large-scale inverse problems. Gradient-based methods approximate the Hessian matrix H by an identity matrix I; they are computationally more attractive but suffer from slow convergence rates.

HESSIAN-FREE OPTIMIZATION METHOD

Instead of constructing Hessian or inverse-Hessian approximations, the Hessian-free (HF) optimization method obtains the search direction by solving the Newton linear system approximately using a conjugate-gradient (CG) method (Saad, 2003; Anagaw and Sacchi, 2012). The CG method is an optimal algorithm for solving a symmetric positive-definite system Wx = b, and it only requires computing Hessian-vector products (Fichtner and Trampert, 2011):

(H\upsilon)(x) = \sum_{x'} H(x, x')\, \upsilon(x'), \qquad (6)

where υ(x') is an arbitrary vector. In this paper, the adjoint-state method is employed to calculate the Hessian-vector products. In the context of the HF optimization method, the Hessian H is replaced with the Gauss-Newton Hessian H̃, which is always symmetric and positive semi-definite:

(\tilde{H}_k + \varepsilon \hat{A}_k)\, \Delta m_k = -g_k, \qquad (7)

where εÂ_k is the damping term ensuring that H̃_k + εÂ_k is positive definite, ε is a small constant and Â_k is a diagonal matrix consisting of the diagonal elements of the Gauss-Newton Hessian. The resulting algorithm is a Levenberg-Marquardt method (Levenberg, 1944; Marquardt, 1963). A Hessian-free optimization method can be made more competitive with further enhancements, such as an effective preconditioner for the linear system and appropriate stopping criteria for the inner iterative algorithm.
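The inner solve of equation (7) can be sketched as a plain CG loop that touches the Gauss-Newton Hessian only through a Hessian-vector product callback. The names below are assumptions for illustration; in practice hvp would be evaluated with the adjoint-state method rather than with an explicit matrix:

```python
import numpy as np

def hf_gauss_newton_direction(hvp, diag_A, g, eps=1e-2, kmax=50, tol=1e-6):
    """Approximately solve (H~_k + eps*A^_k) dm = -g_k, as in eq. (7),
    with plain CG; the Hessian enters only through hvp(v)."""
    def op(v):
        # damped Gauss-Newton operator: H~ v + eps * diag(A^) * v
        return hvp(v) + eps * diag_A * v

    dm = np.zeros_like(g)
    r = -g - op(dm)            # residual of op(dm) = -g
    p = r.copy()
    rs = r @ r
    gnorm = np.linalg.norm(g)
    for _ in range(kmax):
        Ap = op(p)
        alpha = rs / (p @ Ap)
        dm += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * gnorm:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return dm

# Toy usage with an explicit Gauss-Newton Hessian H = J^T J standing in
# for the adjoint-state Hessian-vector product:
rng = np.random.default_rng(1)
J = rng.standard_normal((30, 10))
H = J.T @ J
g = rng.standard_normal(10)
dm = hf_gauss_newton_direction(lambda v: H @ v, np.diag(H), g)
```

The damping term eps * diag_A * v applies εÂ_k element-wise, so the operator is never assembled as a matrix.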

Preconditioning

The CG iterative algorithm requires many iterations to obtain an approximate solution of a linear system. The convergence rate of the CG method depends on the spectral properties of the coefficient matrix (Nash, 2000). It is often convenient to transform the system into one that has the same solution but more favorable spectral properties. This can be achieved by applying a suitable preconditioner M to the linear system: M⁻¹Wx = M⁻¹b. Thus, the preconditioned Newton system for HF Gauss-Newton FWI is given by:

M_k^{-1} (\tilde{H}_k + \varepsilon \hat{A}_k)\, \Delta m_k = -M_k^{-1} g_k. \qquad (8)

The solution of equation (8) can be obtained by the preconditioned conjugate-gradient (PCG) method, which is expected to reduce the number of inner iterations, improve the convergence rate and accelerate HF Gauss-Newton FWI.
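A standard PCG loop for the system of equation (8) applies the preconditioner only through an action of M⁻¹ on a vector. This is a hedged sketch with assumed names and an explicit toy matrix in place of adjoint-state Hessian-vector products:

```python
import numpy as np

def pcg(op, b, Minv, kmax=50, tol=1e-6):
    """Preconditioned CG for op(x) = b, where Minv(v) applies M^{-1} v,
    mirroring the preconditioned system of eq. (8)."""
    x = np.zeros_like(b)
    r = b - op(x)
    z = Minv(r)                 # preconditioned residual
    p = z.copy()
    rz = r @ z
    bnorm = np.linalg.norm(b)
    for _ in range(kmax):
        Ap = op(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * bnorm:
            break
        z = Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Toy usage: a diagonal preconditioner M = diag(H) + lambda, applied by division.
rng = np.random.default_rng(2)
J = rng.standard_normal((40, 12))
H = J.T @ J
b = rng.standard_normal(12)
lam = 1e-2
x = pcg(lambda v: H @ v, b, lambda v: v / (np.diag(H) + lam))
```

For a diagonal preconditioner, applying M⁻¹ is an element-wise division, so each PCG iteration costs essentially the same as an unpreconditioned CG iteration.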

Diagonal Hessian Approximation Preconditioners

The preconditioner for the CG method is typically devised to approximate the Hessian or the inverse Hessian. We first consider the traditional Hessian approximations (e.g., the diagonal pseudo-Hessian and the diagonal Gauss-Newton Hessian) as preconditioners. The pseudo-Hessian is constructed by replacing the Fréchet derivative wavefield with the virtual source f_s(x, ω) in the correlation process (Shin et al., 2001b):

f_s(x, \omega) = -\omega^2 f_s(\omega)\, G(x, x_s, \omega). \qquad (9)

The diagonal pseudo-Hessian is obtained by auto-correlating two virtual sources:

H^{\mathrm{diag}}(x) = \sum_{x_s} \sum_{x_g} \sum_{\omega} \Re\left( \omega^4 |f_s(\omega)|^2\, G(x, x_s, \omega)\, G^*(x, x_s, \omega) \right). \qquad (10)

Constructing the diagonal pseudo-Hessian at each iteration involves no additional cost. However, when employing the diagonal Gauss-Newton Hessian as a preconditioner, additional computation is required to construct the receiver-side Green's functions.

In this paper, we introduce a pseudo diagonal Gauss-Newton Hessian approximation H̃^diag as the preconditioner for the CG algorithm in the inner loop, obtained by assuming that the sources and receivers are co-located. The pseudo diagonal Gauss-Newton Hessian is written as:

\tilde{H}^{\mathrm{diag}}(x) = \sum_{x_g} \sum_{x_s} \sum_{\omega} \Re\left( \omega^4 |f_s(\omega)|^2\, |G(x, x_s, \omega)|^4 \right). \qquad (11)

This diagonal approximation can be constructed at no additional cost. In summary, the diagonal pseudo-Hessian, diagonal Gauss-Newton Hessian and pseudo diagonal Gauss-Newton Hessian preconditioners are given by:

M_k^{\mathrm{DPH}} = H_k^{\mathrm{diag}} + \lambda B_k, \qquad (12)

M_k^{\mathrm{DGH}} = H_k^{\mathrm{diag}} + \lambda C_k, \qquad (13)

M_k^{\mathrm{PDGH}} = \tilde{H}_k^{\mathrm{diag}} + \lambda D_k, \qquad (14)

where λB_k, λC_k and λD_k are stabilization terms, and equations (12), (13) and (14) use the diagonal pseudo-Hessian, the diagonal Gauss-Newton Hessian and the pseudo diagonal Gauss-Newton Hessian, respectively. These three preconditioning strategies are referred to as DPH-GN, DGH-GN and PDGH-GN in this paper. When the parameter λ is very large, these preconditioned methods approach the non-preconditioned HF Gauss-Newton (CG-GN) method.
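Under stated assumptions (a single frequency, randomly generated Green's functions as stand-ins, and a relative stabilization λ·max(H^diag), since the abstract does not specify the terms B_k, C_k, D_k), the diagonal approximations of equations (10) and (11) and their stabilized inverse applications can be sketched as:

```python
import numpy as np

# Randomly generated stand-ins for one frequency's Green's functions and wavelet.
rng = np.random.default_rng(3)
nx, ns, ng = 150, 6, 10
omega = 2.0 * np.pi * 8.0
wavelet = 0.8 - 0.3j
G_src = rng.standard_normal((nx, ns)) + 1j * rng.standard_normal((nx, ns))  # G(x, xs, w)

# Diagonal pseudo-Hessian, eq. (10): auto-correlation of the virtual sources.
# The sum over xg contributes a factor ng because the summand is xg-independent.
H_dph = ng * np.real(omega**4 * abs(wavelet)**2 * np.sum(G_src * np.conj(G_src), axis=1))

# Pseudo diagonal Gauss-Newton Hessian, eq. (11): co-located sources and receivers.
H_pdgh = ng * omega**4 * abs(wavelet)**2 * np.sum(np.abs(G_src)**4, axis=1)

# Stabilized preconditioner application in the spirit of eqs. (12)-(14);
# the lam * max(H_diag) stabilization is an assumed choice, not the authors'.
lam = 1e-2
Minv_dph = lambda v: v / (H_dph + lam * H_dph.max())
Minv_pdgh = lambda v: v / (H_pdgh + lam * H_pdgh.max())
```

Because both diagonals depend only on source-side Green's functions, they are available as a by-product of forward modelling, which is why they add essentially no cost.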

l-BFGS Inverse Hessian Preconditioners

Furthermore, we develop an l-BFGS preconditioning scheme for the HF optimization method, namely the l-BFGS-GN method (H_0 = I). In the BFGS method, we are given a symmetric positive-definite matrix H_k that approximates the inverse of the Hessian, and a pair of vectors s_k = m_{k+1} − m_k and y_k = g_{k+1} − g_k that indicate the model and


gradient changes. The inverse Hessian approximation H_{k+1} is given by:

H_{k+1} = v_k^{\dagger} H_k v_k + w_k s_k s_k^{\dagger}, \qquad (15)

where w_k = 1/(y_k^{\dagger} s_k) and v_k = I − w_k y_k s_k^{\dagger}. A limited-memory BFGS (l-BFGS) method stores the model and gradient changes from a limited number M of previous iterations (typically M < 10) (Nocedal, 1980). The approximate inverse Hessian H can also be used as a preconditioner for the CG iterative method:

H_k (\tilde{H}_k + \varepsilon \hat{A}_k)\, \Delta m_k = -H_k g_k. \qquad (16)

Traditionally, an identity matrix I is set as the initial guess H_0. However, the performance of the l-BFGS method is closely tied to this initial guess. In this paper, to improve the l-BFGS preconditioning scheme, we consider using the stabilized diagonal pseudo-Hessian, diagonal Gauss-Newton Hessian and pseudo diagonal Gauss-Newton Hessian as the initial guess for constructing the l-BFGS preconditioners. These methods are referred to as the l-BFGS-GN-DPH, l-BFGS-GN-DGH and l-BFGS-GN-PDGH methods, respectively.
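The l-BFGS inverse Hessian of equation (15) is typically applied with the standard two-loop recursion, which accepts an arbitrary diagonal initial guess H_0 — exactly the hook that the l-BFGS-GN-DPH/DGH/PDGH variants use. A minimal sketch (names and toy data are assumptions, not the authors' code):

```python
import numpy as np

def lbfgs_apply(g, s_list, y_list, h0_diag):
    """Two-loop recursion: return H g, where H is the l-BFGS inverse Hessian
    built from the (s_k, y_k) pairs of eq. (15), seeded with a diagonal H0."""
    q = g.astype(float).copy()
    alphas = []
    for s, y in reversed(list(zip(s_list, y_list))):   # newest pair first
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    r = h0_diag * q                                    # apply the diagonal H0
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):  # oldest first
        rho = 1.0 / (y @ s)
        beta = rho * (y @ r)
        r += (a - beta) * s
    return r

# Toy usage: curvature pairs from a quadratic, y = A s, and H0 = 1/diag(A).
rng = np.random.default_rng(4)
A = rng.standard_normal((8, 8)); A = A @ A.T + 8.0 * np.eye(8)
s_list = [rng.standard_normal(8) for _ in range(5)]
y_list = [A @ s for s in s_list]
g = rng.standard_normal(8)
d = lbfgs_apply(g, s_list, y_list, 1.0 / np.diag(A))
# Since y^T s > 0 for every pair, the operator is positive definite and g @ d > 0.
```

With empty histories the recursion reduces to applying H_0 alone, so swapping the identity for a stabilized diagonal Hessian approximation changes only the h0_diag argument.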

“Over-solving” the Newton equations will not produce a better search direction (Nash, 2000). The CG algorithm should therefore be terminated with an appropriate stopping criterion. We define a maximum inner iteration number k_max and a relative residual γ_k:

\gamma_k = \frac{\| \tilde{H}_k \Delta m_k + g_k \|}{\| g_k \|}, \qquad (17)

where k indicates the CG inner-iteration index. The inner iteration is stopped when γ_k < γ_min, where γ_min is the relative-residual tolerance.

NUMERICAL EXAMPLES

A modified Marmousi model is used to examine the efficiency of the different preconditioning schemes for HF Gauss-Newton FWI. The truncated Marmousi model has 100 × 100 grid cells with a grid interval of 10 m in both the horizontal and vertical directions. We deploy 49 sources from 20 m to 980 m at a depth of 20 m with a regular source spacing of 20 m. Fifty receivers are arranged from 10 m to 1000 m every 20 m at a depth of 20 m. The source function is a Ricker wavelet with a dominant frequency of 30 Hz. Figures 1a and 1b show the true and initial P-wave velocity models. The frequencies used for inversion increase from 5 Hz to 30 Hz with a partial overlap-frequency selection strategy, in which a group of 3 frequencies is used for inversion simultaneously. The frequency group increases from low to high with 2 frequencies overlapped, and for each frequency band 5 outer iterations are performed. The stopping criteria for the inner iteration are k_max = 10 and/or γ_min = 2.0e-1. The stabilization parameters are ε = 1.0e-2 and λ = 1.0e-2.

Figure 1: (a) True P-wave velocity model; (b) initial P-wave velocity model.

Figures 2a and 2b illustrate the inversion results obtained by the SD and l-BFGS (H_0 = I) methods. Figures 2c and 2d show the comparison of well log data at 0.1 km and 0.6 km. The SD method is limited in recovering the deep parts of the model. The l-BFGS method (H_0 = I) provides a better inversion result than the SD method, but the deep parts of the inversion result are still not satisfactory.

Figure 2: (a) SD method (φ = 0.31); (b) l-BFGS method (φ = 0.07); (c) and (d) are the comparison of the well log data at 0.1 km and 0.6 km.

Figures 3a, 3b and 3c show the inversion results of l-BFGS methods with different diagonal Hessian approximations as the initial guess. Figures 3d, 3e and 3f show the well log data comparison. With diagonal Hessian approximations as the initial guess, the inversion results of the l-BFGS methods are improved.

Figure 3: (a), (b) and (c) show the inverted models by l-BFGS methods with the diagonal pseudo-Hessian (φ = 4.28e-2), diagonal Gauss-Newton Hessian (φ = 2.0e-2) and pseudo diagonal Gauss-Newton Hessian (φ = 2.50e-2) as the initial guess, respectively. (d), (e) and (f) show the well log data comparison.

Figure 4a is the model reconstructed by the non-preconditioned CG-GN method. Figures 4b, 4c and 4d show the inversion results of the l-BFGS-GN (M = 5 and H_0 = I), DPH-GN and l-BFGS-GN-DPH methods, respectively. The deep parts of the models reconstructed by the HF Gauss-Newton methods are clearly enhanced compared to the SD and l-BFGS methods. Furthermore, the inversion results of the preconditioned HF Gauss-Newton methods improve on that of the non-preconditioned CG-GN method.

Figure 4: (a) CG-GN method (φ = 4.1e-3); (b) l-BFGS-GN method (φ = 1.1e-3); (c) DPH-GN method (φ = 4.4e-3); (d) l-BFGS-GN-DPH method (φ = 1.1e-3); (e) and (f) show the comparison of well log data at 0.1 km and 0.6 km.

Figures 5a and 5b show the inversion results of the DGH-GN and l-BFGS-GN-DGH methods with the stabilization parameter λ = 1.0e-2. The DGH-GN and l-BFGS-GN-DGH inverted models are contaminated by artifacts. This is because incorporating the diagonal Gauss-Newton Hessian for preconditioning increases the instability of the inversion process. Figures 5c and 5d are the DGH-GN and l-BFGS-GN-DGH inversion results with the stabilization parameter λ = 5.0e-2. It can be seen that with this choice the DGH-GN and l-BFGS-GN-DGH methods become more stable and the model is reconstructed very well. Figures 6a and 6b show the inversion results of the PDGH-GN and l-BFGS-GN-PDGH methods with the stabilization parameter λ = 5.0e-2. The PDGH-GN and l-BFGS-GN-PDGH inversions based on the proposed pseudo diagonal Gauss-Newton Hessian reconstruct the velocity model stably and efficiently. The l-BFGS-GN-DGH and l-BFGS-GN-PDGH methods with λ = 5.0e-2 give the best inversion results.

Figure 5: (a) DGH-GN method (λ = 1.0e-2 and φ = 8.5e-3); (b) l-BFGS-GN-DGH method (λ = 1.0e-2 and φ = 5.2e-3); (c) DGH-GN method (λ = 5.0e-2 and φ = 2.5e-3); (d) l-BFGS-GN-DGH method (λ = 5.0e-2 and φ = 8.4e-4); (e) and (f) are the comparison of the well log data at 0.1 km and 0.6 km.

Figure 6: (a) PDGH-GN (λ = 5.0e-2 and φ = 2.1e-3); (b) l-BFGS-GN-PDGH (λ = 5.0e-2 and φ = 8.3e-4); (c) and (d) are the comparison of the well log data at 0.1 km and 0.6 km.

Figure 7 shows the convergence history of the l-BFGS methods and HF Gauss-Newton methods. The HF Gauss-Newton methods provide faster convergence rates than the l-BFGS methods. Except for the DGH-GN and l-BFGS-GN-DGH methods (λ = 1.0e-2), the preconditioned HF Gauss-Newton methods converge faster than the non-preconditioned CG-GN method. Generally, the methods with l-BFGS preconditioners converge faster than those preconditioned by diagonal Hessian approximations. The l-BFGS-GN-DGH and l-BFGS-GN-PDGH methods with λ = 5.0e-2 give the fastest convergence rates.

In Figure 8a we plot the normalized misfit versus the number of forward problems solved for the HF Gauss-Newton methods, and Figure 8b shows the normalized misfit versus computation time (s). The preconditioned HF Gauss-Newton methods are more efficient than the non-preconditioned CG-GN method. A proper stabilization parameter λ must be determined to ensure the effectiveness and stability of the DGH-GN and l-BFGS-GN-DGH methods. Except for the l-BFGS-GN-DGH method, the l-BFGS preconditioning schemes with diagonal Hessian approximations as the initial guess reduce the computational burden more effectively, and reconstruct the velocity model better, than the l-BFGS-GN method with H0 = I. Although the l-BFGS-GN-DGH method (λ = 5.0e-2) provides a fast convergence rate, as shown in Figure 7, additional computation is required to calculate the receiver-side Green's functions. The l-BFGS-GN-PDGH method (λ = 5.0e-2) shows the best performance in reducing the computational burden and accelerating the HF Gauss-Newton FWI.
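The "number of forward problems solved" metric of Figure 8a can be understood with a rough budget: computing the gradient costs on the order of two wavefield solves (one forward, one adjoint), and each Gauss-Newton Hessian-vector product inside the CG loop costs roughly two more. Both counts are stated assumptions here (the exact tally depends on the implementation and on extras such as receiver-side Green's functions), but the sketch shows why cutting inner CG iterations dominates the savings.

```python
def forward_solves_per_outer_iter(n_cg_iters,
                                  cost_per_hvp=2,
                                  cost_gradient=2):
    """Rough PDE-solve budget for one Hessian-free outer iteration.

    Assumes ~2 wavefield solves for the gradient and ~2 per
    Hessian-vector product; both are illustrative defaults.
    """
    return cost_gradient + cost_per_hvp * n_cg_iters
```

Under these assumptions, halving the inner iteration count from 20 to 10 drops the per-iteration cost from 42 to 22 solves, which is the leverage the preconditioners exploit.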

[Figure 7 plots normalized misfit (log) versus outer iteration for: l-BFGS (I), l-BFGS (DPH), l-BFGS (PDGH), l-BFGS (DGH), DPH-GN, DGH-GN (λ=1e-2), DGH-GN (λ=5e-2), PDGH-GN (λ=5e-2), l-BFGS-GN (M=5), l-BFGS-GN-DPH, l-BFGS-GN-DGH (λ=1e-2), l-BFGS-GN-DGH (λ=5e-2), l-BFGS-GN-PDGH (λ=5e-2), and CG-GN.]

Figure 7: Comparison of the convergence history for the HF Gauss-Newton methods with different preconditioning schemes.


Figure 8: (a) Normalized misfit (log) vs. number of forward problems solved for different preconditioning strategies; (b) normalized misfit (log) vs. computation time (s).

CONCLUSIONS

In this paper, we implement a Hessian-free Gauss-Newton FWI, which obtains the search direction by solving the Newton equation with a conjugate-gradient algorithm. To accelerate the HF GN FWI, we develop different preconditioning schemes. A pseudo diagonal Gauss-Newton Hessian is introduced as a preconditioner. Furthermore, we propose to improve the l-BFGS preconditioning by employing the diagonal Hessian approximations as the initial guess. We present numerical examples showing that the l-BFGS preconditioning method with the pseudo diagonal Gauss-Newton Hessian as the initial guess speeds up the HF GN FWI most efficiently.
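The proposed improvement amounts to running the standard l-BFGS two-loop recursion with a diagonal matrix, rather than the identity, as the initial inverse Hessian H0⁻¹. The sketch below is illustrative (names are not from the paper): it assumes the diagonal Hessian approximation has already been computed and inverted elementwise, and applies the resulting inverse-Hessian estimate to a vector, which is how such a preconditioner acts inside the CG loop.

```python
import numpy as np

def lbfgs_apply(grad, s_list, y_list, diag_h0_inv):
    """Two-loop recursion: apply the l-BFGS inverse-Hessian estimate
    to `grad`, seeded with a diagonal initial inverse Hessian (e.g.,
    the elementwise inverse of a pseudo diagonal Gauss-Newton
    Hessian) instead of the identity.

    s_list, y_list -- stored model-update and gradient-difference
                      pairs, ordered oldest to newest
    diag_h0_inv    -- array holding diag(H0)^{-1}
    """
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list),
                         reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    # Diagonal initial guess replaces the usual identity scaling.
    r = diag_h0_inv * q
    # Second loop: oldest pair to newest.
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos),
                              reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return r  # approximates H^{-1} grad
```

With empty history lists the recursion reduces to the diagonal preconditioner alone, so the scheme degrades gracefully on the first outer iteration before any (s, y) pairs exist.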

ACKNOWLEDGMENTS

This research was supported by the sponsors of the CREWES project and the Natural Sciences and Engineering Research Council of Canada (NSERC, CRDPJ 461179-13).


REFERENCES

Anagaw, A. Y., and M. D. Sacchi, 2012, Full waveform inversion with simultaneous sources using the full Newton method: SEG Expanded Abstracts, 971–975.

Brossier, R., S. Operto, and J. Virieux, 2009, Seismic imaging of complex onshore structures by 2D elastic frequency-domain full-waveform inversion: Geophysics, 74, no. 6, WCC105–WCC118.

——–, 2010, Which data residual norm for robust elastic frequency-domain full-waveform inversion?: Geophysics, 75, no. 3, R37–R46.

Demanet, L., P. Letourneau, N. Boumal, H. Calandra, and S. Snelson, 2012, Matrix probing: a randomized preconditioner for the wave equation Hessian: Appl. and Comp. Harmon. Anal., 32, R25–R36.

Fichtner, A., and J. Trampert, 2011, Hessian kernels of seismic data functionals based upon adjoint techniques: Geophysical Journal International, 185, 775–798.

Guitton, A., and E. Díaz, 2012, Attenuating crosstalk noise with simultaneous source full waveform inversion: Geophysical Prospecting, 60, 759–768.

Hu, W., A. Abubakar, and T. M. Habashy, 2009, Simultaneous multifrequency inversion of full-waveform seismic data: Geophysics, 74, R1–R14.

Innanen, K. A., 2014a, Reconciling seismic AVO and precritical reflection FWI: analysis of the inverse Hessian: SEG Technical Program Expanded Abstracts, 1022–1027.

——–, 2014b, Seismic AVO and the inverse Hessian in precritical reflection full waveform inversion: Geophysical Journal International, 199, 717–734.

Lailly, P., 1983, The seismic inverse problem as a sequence of before stack migration: Conference on Inverse Scattering, Theory and Applications, SIAM, Expanded Abstracts, 206–220.

Levenberg, K., 1944, A method for the solution of certain non-linear problems in least squares: Quarterly of Applied Mathematics, 2, 164–168.

Ma, Y., and D. Hale, 2012, Quasi-Newton full-waveform inversion with a projected Hessian matrix: Geophysics, 77, R207–R216.

Marquardt, D., 1963, An algorithm for least-squares estimation of nonlinear parameters: SIAM Journal, 11, 431–441.

Metivier, L., F. Bretaudeau, R. Brossier, J. Virieux, and S. Operto, 2014, Full waveform inversion and the truncated Newton method: quantitative imaging of complex subsurface structures: Geophysical Prospecting, 62, 1–23.

Nammour, R., and W. Symes, 2009, Approximate constant density acoustic inverse scattering using dip-dependent scaling: SEG Technical Program Expanded Abstracts, 2347–2351.

Nash, S. G., 1985, Preconditioning of truncated-Newton methods: SIAM J. Sci. Statist. Comput., 6, 599–616.

——–, 2000, A survey of truncated-Newton methods: Journal of Computational and Applied Mathematics, 124, 45–59.

Nocedal, J., 1980, Updating quasi-Newton matrices with limited storage: Mathematics of Computation, 35, 773–782.

Nocedal, J., and S. J. Wright, 2006, Numerical optimization: Springer.

Operto, S., Y. Gholami, V. Prieux, A. Ribodetti, R. Brossier, L. Metivier, and J. Virieux, 2013, A guided tour of multiparameter full waveform inversion with multicomponent data: from theory to practice: The Leading Edge, 32, 1040–1054.

Pan, W., K. A. Innanen, G. F. Margrave, and D. Cao, 2015a, Efficient pseudo-Gauss-Newton full-waveform inversion in the τ-p domain: Geophysics, 80, no. 5, R225–R14.

Pan, W., K. A. Innanen, G. F. Margrave, M. C. Fehler, X. Fang, and J. Li, 2015b, Estimation of elastic constants in HTI media using Gauss-Newton and full-Newton multi-parameter full waveform inversion: SEG Technical Program Expanded Abstracts, 1177–1182.

Pan, W., K. A. Innanen, G. F. Margrave, M. C. Fehler, X. Fang, and J. Li, 2015c, Estimation of elastic constants in HTI media using Gauss-Newton and full-Newton multi-parameter full-waveform inversion: submitted.

Pan, W., G. F. Margrave, and K. A. Innanen, 2014, Iterative modeling, migration and inversion (IMMI): combining full waveform inversion with standard inversion methodology: SEG Technical Program Expanded Abstracts, 938–943.

Plessix, R. E., 2006, A review of the adjoint-state method for computing the gradient of a functional with geophysical applications: Geophysical Journal International, 167, 495–503.

Pratt, R. G., C. Shin, and G. J. Hicks, 1998, Gauss-Newton and full Newton methods in frequency-space seismic waveform inversion: Geophysical Journal International, 133, 341–362.

Saad, Y., 2003, Iterative methods for sparse linear systems: SIAM.

Sainath, T. N., L. Horesh, B. Kingsbury, A. Y. Aravkin, and B. Ramabhadran, 2013, Accelerating Hessian-free optimization for deep neural networks by implicit preconditioning and sampling: IEEE Workshop on Automatic Speech Recognition and Understanding, 303–308.

Santosa, F., and W. W. Symes, 1988, Computation of the Hessian for least-squares solutions of inverse problems of reflection seismology: Inverse Problems, 4, 211–233.

Shin, C., S. Jang, and D. Min, 2001a, Improved amplitude preservation for prestack depth migration by inverse scattering theory: Geophysical Prospecting, 49, 592–606.

Shin, C., K. Yoon, K. J. Marfurt, K. Park, D. Yang, H. Y. Lim, S. Chung, and S. Shin, 2001b, Efficient calculation of a partial-derivative wavefield using reciprocity for seismic imaging and inversion: Geophysics, 66, 1856–1863.

Sirgue, L., and R. G. Pratt, 2004, Efficient waveform inversion and imaging: a strategy for selecting temporal frequencies: Geophysics, 69, 231–248.

Tang, Y., 2009, Target-oriented wave-equation least-squares migration/inversion with phase-encoded Hessian: Geophysics, 74, no. 6, WCA95–WCA107.

Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49, 1259–1266.

Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics: Geophysics, 74, no. 6, WCC1–WCC26.

