arXiv:1911.11636v1 [math.NA] 25 Nov 2019

Solving Traveltime Tomography with Deep Learning

Yuwei Fan (ywfan@stanford.edu)
Department of Mathematics, Stanford University, Stanford, CA 94305.

Lexing Ying (lexing@stanford.edu)

Department of Mathematics and ICME, Stanford University, Stanford, CA 94305.

Abstract

This paper introduces a neural network approach for solving two-dimensional traveltime tomography (TT) problems based on the eikonal equation. The mathematical problem of TT is to recover the slowness field of a medium based on the boundary measurement of the traveltimes of waves going through the medium. This inverse map is high-dimensional and nonlinear. For the circular tomography geometry, a perturbative analysis shows that the forward map can be approximated by a vectorized convolution operator in the angular direction. Motivated by this and filtered back-projection, we propose an effective neural network architecture for the inverse map using the recently proposed BCR-Net, with weights learned from training datasets. Numerical results demonstrate the efficiency of the proposed neural networks.

Keywords: Traveltime tomography; Eikonal equation; Inverse problem; Neural networks; Convolutional neural network.

1. Introduction

Traveltime tomography is a method to determine the internal properties of a medium by measuring the traveltimes of waves going through the medium. It was first motivated in global seismology, where the inner structure of the Earth is determined by measuring at different seismic stations the traveltimes of seismic waves produced by earthquakes (Backus and Gilbert, 1968; Rawlinson et al., 2010). By now, it has found many applications, such as imaging the Sun's interior (Kosovichev, 1996), ocean acoustics (Munk et al., 2009), and ultrasound tomography (Schomberg, 1978; Jin and Wang, 2006) in biomedical imaging.

Background. The governing equation of first-arrival traveltime tomography (TT) is the eikonal equation (Born and Wolf, 1965), and we consider the two-dimensional case in this paper for simplicity. Let Ω ⊂ R² be an open bounded domain with Lipschitz boundary Γ = ∂Ω. Suppose that the positive function m(x) is the slowness field, i.e., the reciprocal of the velocity field, defined in Ω. The traveltime u(x) satisfies the eikonal equation |∇u(x)| = m(x). Since it is a special case of the Hamilton-Jacobi equation, the solution u(x) can develop singularities and should be understood in the viscosity sense (Ishii, 1987).

A typical experimental setup of TT is as follows. For each point x_s ∈ Γ, one sets up the Soner boundary condition at the point x_s, i.e., only a zero value at x_s, and solves the following eikonal equation:

|∇u^s(x)| = m(x),  x ∈ Ω,
u^s(x_s) = 0,    (1)



where the superscript s indexes the source point. Recording the solution u^s(·) at points x_r ∈ Γ produces the whole data set u^s(x_r). In practice, x_r and x_s are samples from a discrete set of points on Γ. Here we assume for now that they are placed everywhere on Γ, for simplicity of presentation and mathematical analysis.

The forward problem is to compute u^s(x_r) given the slowness field m(x). On the other hand, the inverse problem, at the center of first-arrival TT, is to recover m(x) given u^s(x_r).

Both the forward and inverse problems are computationally challenging, and a lot of effort has been devoted to their numerical solutions. For the forward problem, the eikonal equation, as a special case of the Hamilton-Jacobi equation, can develop singular solutions. In order to compute the physically meaningful viscosity solution, special care such as up-winding is required. As the resulting discrete system is nonlinear, fast iterative methods such as the fast marching method (Popovici and Sethian, 1997; Sethian, 1999) and the fast sweeping method (Zhao, 2005; Kao et al., 2005; Qian et al., 2007) have been developed. Among them, the fast sweeping methods have been successfully applied to many traveltime tomography problems (Leung et al., 2006). The inverse problem is often computationally more intensive, due to the nonlinearity of the problem. Typical methods take an optimization approach with proper regularization (Chung et al., 2011) and require a significant number of iterations.
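To make the forward solver concrete, here is a minimal sketch of a fast sweeping iteration for |∇u| = m on a uniform grid, in the spirit of Zhao (2005); the grid spacing h, the point-source handling, and the sweep count are illustrative assumptions, not the solver used in the paper.

```python
# Minimal fast sweeping sketch for |grad u| = m (viscosity solution), in the
# spirit of Zhao (2005). Grid spacing h, the point source, and the number of
# sweeps are illustrative assumptions; this is not the paper's solver.
import numpy as np

def fast_sweep_eikonal(m, h, src, n_sweeps=8):
    """Solve |grad u| = m on a uniform 2D grid with u = 0 at index `src`."""
    n1, n2 = m.shape
    u = np.full((n1, n2), np.inf)
    u[src] = 0.0
    sweeps = [(1, 1), (-1, 1), (1, -1), (-1, -1)]  # four Gauss-Seidel orderings
    for it in range(n_sweeps):
        di, dj = sweeps[it % 4]
        for i in range(n1)[::di]:
            for j in range(n2)[::dj]:
                if (i, j) == src:
                    continue
                # upwind neighbor values (one-sided at the boundary)
                a = min(u[i - 1, j] if i > 0 else np.inf,
                        u[i + 1, j] if i < n1 - 1 else np.inf)
                b = min(u[i, j - 1] if j > 0 else np.inf,
                        u[i, j + 1] if j < n2 - 1 else np.inf)
                if not np.isfinite(min(a, b)):
                    continue  # no causal neighbor yet
                f = m[i, j] * h
                if abs(a - b) >= f:   # causality: update from one side only
                    unew = min(a, b) + f
                else:                 # two-sided quadratic update
                    unew = 0.5 * (a + b + np.sqrt(2.0 * f * f - (a - b) ** 2))
                u[i, j] = min(u[i, j], unew)
    return u
```

With the unit disk embedded in a square grid (as in Section 4), calling such a routine once per boundary source x_s would produce the synthetic data u^s(x_r).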

A deep learning approach. Over the past decade or so, deep learning (DL) has become the dominant approach in computer vision, image processing, speech recognition, and many other applications in machine learning and data science (Hinton et al., 2012; Krizhevsky et al., 2012; Goodfellow et al., 2016; Ma et al., 2015; Leung et al., 2014; Sutskever et al., 2014; LeCun et al., 2015; Schmidhuber, 2015). From a technical point of view, this success is a synergy of several key developments: neural networks (NNs) as a flexible framework for representing high-dimensional functions and maps, simple algorithms such as back-propagation (BP) and stochastic gradient descent (SGD) for tuning the model parameters, efficient general software packages such as TensorFlow and PyTorch, and unprecedented computing power provided by GPUs and TPUs.

In the past several years, deep neural networks (DNNs) have been increasingly used in scientific computing, particularly in solving PDE-related problems (Khoo et al., 2017; Berg and Nyström, 2018; Han et al., 2018a; Fan et al., 2018; Araya-Polo et al., 2018; Raissi and Karniadakis, 2018; Kutyniok et al., 2019; Feliu-Fabà et al., 2019), in two directions. In the first direction, as NNs offer a powerful tool for approximating high-dimensional functions (Cybenko, 1989), it is natural to use them as an ansatz for high-dimensional PDEs (Rudd and Ferrari, 2015; Carleo and Troyer, 2017; Han et al., 2018a; Khoo et al., 2019; E and Yu, 2018). The second direction focuses on low-dimensional parameterized PDE problems, using DNNs to represent the nonlinear map from the high-dimensional parameters of the PDE to the solution (Long et al., 2018; Han et al., 2018b; Khoo et al., 2017; Fan et al., 2018, 2019b,a; Li et al., 2019; Bar and Sochen, 2019).

As an extension of the second direction, DNNs have been widely applied to inverse problems (Khoo and Ying, 2018; Hoole, 1993; Kabir et al., 2008; Adler and Öktem, 2017; Lucas et al., 2018; Tan et al., 2018; Fan and Ying, 2019a,b; Raissi et al., 2019). For the forward problem, since applying neural networks to input data can be carried out rapidly under current software and hardware architectures, the solution of the forward problem can be significantly accelerated when the forward map is represented with a DNN. For the inverse problem, DNNs can help in two critical ways: (1) due to their flexibility in representing high-dimensional functions, DNNs can potentially be used to approximate the full inverse map, thus avoiding the iterative solution process; (2) recent work in machine learning shows that DNNs often can automatically extract features from the data and offer a data-driven regularization prior.

This paper applies the deep learning approach to first-arrival TT by representing the whole inverse map using an NN. The starting point is a perturbative analysis of the forward map, which reveals that, for the circular tomography geometry, the forward map contains a one-dimensional convolution with multiple channels, after appropriate reparameterization. This observation motivates representing the forward map from the 2D coefficient m(x) to the 2D data u^s(x_r) by a one-dimensional convolutional neural network (with multiple channels). Further, the one-dimensional convolutional neural network can be implemented with the recently proposed multiscale neural network (Fan et al., 2018, 2019a). Following the idea of filtered back-projection (Schuster and Quintus-Bosz, 1993), the inverse map can be approximated by the adjoint map followed by a pseudo-differential filtering step. This suggests an architecture for the inverse map that reverses the architecture of the forward map, followed by a simple two-dimensional convolutional neural network.

For the test problems being considered, the resulting neural networks have about 10^5 parameters when the data is of size 160 × 160 (a fully-connected layer would result in 160^4 ≈ 6 × 10^8 parameters), thanks to the convolutional structure and the compact multiscale neural network. This rather small number of parameters allows for rapid and accurate training, even on rather limited data sets.

Organization. The rest of the paper is organized as follows. The mathematical background is given in Section 2. The design and architecture of the DNNs for the forward and inverse maps are discussed in Section 3. The numerical results in Section 4 demonstrate the numerical efficiency and the generalization of the proposed neural networks.

2. Mathematical analysis of traveltime tomography

2.1. Problem setup

This section describes the necessary mathematical insights that motivate the NN architecture design. Let us consider the so-called differential imaging setting, where a background slowness field m_0(x) is known, and denote by u^s_0 the solution of the eikonal equation associated with the field m_0:

|∇u^s_0(x)| = m_0(x),  x ∈ Ω,
u^s_0(x_s) = 0.    (2)

Then, for a perturbation m to the slowness field, the difference in the traveltime naturally satisfies the perturbed system (from now on, with a slight abuse of notation, m denotes the perturbation of the slowness field and u^s the corresponding difference in the traveltime):

|∇(u^s_0(x) + u^s(x))| = m_0(x) + m(x),  x ∈ Ω,
u^s(x_s) = 0.    (3)

The imaging data d(x_s, x_r) consists of u^s(x_r) over all x_s and x_r: d(x_s, x_r) ≡ u^s(x_r). To better understand the dependence of u^s on m, we assume m to be sufficiently small and carry out a perturbative analysis. Squaring (3) and canceling the background using (2) results in

(∇u^s(x))^T ∇u^s(x) + 2 (∇u^s_0(x))^T ∇u^s(x) = m(x)² + 2 m_0(x) m(x).    (4)

Since m(x) is sufficiently small, ∇u^s(x) is also a small quantity. Keeping only the linear terms in m and discarding the higher order ones yields

(∇u^s_0(x))^T ∇u^s(x) ≈ m_0(x) m(x),    (5)


which is an advection equation. Using |∇u^s_0(x)| = m_0(x), one can further simplify the above equation to

\widehat{∇u^s_0}(x)^T ∇u^s(x) ≈ m(x),    (6)

where \widehat{·} stands for the unit vector, i.e., \widehat{∇u^s_0} = ∇u^s_0 / |∇u^s_0|. For simplicity, let C_0(x_s, x_r) be the unique characteristic of u^s_0 that connects x_s and x_r.

Then

d(x_s, x_r) ≡ u^s(x_r) ≈ ∫_{C_0(x_s, x_r)} m(x) dx ≡ d_1(x_s, x_r),    (7)

where d_1(x_s, x_r) is introduced to stand for the first-order approximation to d(x_s, x_r). In particular, if the background slowness field is a constant, then C_0(x_s, x_r) is a line segment with start and end points x_s and x_r, respectively, and

d_1(x_s, x_r) = |x_s − x_r| ∫_0^1 m(x_s + τ(x_r − x_s)) dτ.
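As a quick illustration of this constant-background formula, the line integral can be approximated by the trapezoidal rule along the straight ray; the callable m_pert and the number of quadrature points below are assumptions for illustration.

```python
# Sketch: evaluate d1(xs, xr) = |xs - xr| * \int_0^1 m(xs + t (xr - xs)) dt by
# trapezoidal quadrature. `m_pert` (the perturbation as a callable) and
# `n_quad` are illustrative assumptions.
import numpy as np

def d1_constant_background(m_pert, xs, xr, n_quad=200):
    xs, xr = np.asarray(xs, float), np.asarray(xr, float)
    t = np.linspace(0.0, 1.0, n_quad)
    pts = xs[None, :] + t[:, None] * (xr - xs)[None, :]  # points on the straight ray
    vals = np.array([m_pert(p) for p in pts])
    return np.linalg.norm(xr - xs) * np.trapz(vals, t)
```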

Figure 1: Illustration of the problem setup. The domain is a unit disk, the slowness field takes the values m = 1, m > 1, and m < 1 in different regions, and the sources and the receivers are equidistantly placed on the boundary.

The most relevant geometry in traveltime tomography, both for medicine and for earth science, is the circular geometry, where Ω is modeled as a unit disk (Chung et al., 2011; Deckelnick et al., 2011; Yeung et al., 2018). As illustrated in Fig. 1, the sources and receivers are placed on the boundary equidistantly. More precisely, x_s = (cos(s), sin(s)) with s = 2πk/N_s, k = 0, …, N_s − 1, and x_r = (cos(r), sin(r)) with r = 2πj/N_r, j = 0, …, N_r − 1, where N_s = N_r in the current setup. Often, the background slowness field m_0(x) is only radially dependent, or even a constant (Deckelnick et al., 2011; Yeung et al., 2018). In what follows, m_0(x) is assumed to be radially dependent, i.e., m_0(x) = m_0(|x|).

2.2. Mathematical analysis of the forward map

Since the domain Ω is a disk, it is convenient to rewrite the problem in polar coordinates. Let x_r = (cos(r), sin(r)), x_s = (cos(s), sin(s)), and x = (ρ cos(θ), ρ sin(θ)), where ρ ∈ [0, 1] is the radial coordinate and r, s, θ ∈ [0, 2π) are the angular ones.

Figure 2 presents an example of the slowness field and the measurement data. Notice that the main signal in u^s(x_r) and d(x_s, x_r) concentrates on the minor diagonal part. Due to the circular tomography geometry, it is convenient to "shear" the measurement data by introducing a new angular variable h = r − s, where the difference is understood modulo 2π. As we shall see in the next section, this shearing step significantly simplifies the architecture of the NNs. Under the new parameterization, the measurement data is

d(s, h) ≡ d(x_s, x_{s+h}).    (8)

Figure 2: Visualization of the slowness field and the measurement data. Panels: (a) m(x), (b) u^s(x_r), (c) d(x_s, x_r), (d) m(θ, ρ), (e) u^s(x_{s+h}), (f) d(s, h). The upper figures are the perturbation of the slowness field m(x) (m_0 = 1 and m ≤ 0 in this sample), the measurement data u^s(x_r), and the difference d(x_s, x_r) with respect to the background measurement data. The lower-left figure is m(x) in polar coordinates, and the lower-right two figures are the "shears" of their corresponding upper figures.

The same convention applies to its first-order approximation: d_1(s, h) ≡ d_1(x_s, x_{s+h}). Writing m(θ, ρ) ≡ m(ρ cos(θ), ρ sin(θ)) in polar coordinates, the linear dependence of d_1(s, h) on m in (7) states that there exists a kernel distribution K(s, h, θ, ρ) such that

d_1(s, h) = ∫_0^{2π} ∫_0^1 K(s, h, θ, ρ) m(θ, ρ) dρ dθ.    (9)

Convolution form of the map m(θ, ρ) → d_1(s, h). Since the domain is a disk and m_0 depends only on the radial variable, the whole problem is equivariant under rotation. In this case, the situation can be dramatically simplified. Precisely, we have the following proposition.

Proposition 1 There exists a function κ(h, ρ, ·), periodic in the last parameter, such that

d_1(s, h) = ∫_0^{2π} ∫_0^1 κ(h, ρ, s − θ) m(θ, ρ) dρ dθ.    (10)


Proof Let C_0(s, r) ≡ C_0((cos(s), sin(s)), (cos(r), sin(r))), and parameterize the characteristic C_0(s, r) as p_{s,h}(τ) ≡ (θ_{s,h}(τ), ρ_{s,h}(τ)), τ ∈ [0, 1], with ρ_{s,h}(0) = ρ_{s,h}(1) = 1 and θ_{s,h}(0) = s, θ_{s,h}(1) = r. Then the relationship (7) between d_1 and m can be written as

d_1(s, h) = ∫_0^1 m(θ_{s,h}(τ), ρ_{s,h}(τ)) ‖p′_{s,h}(τ)‖ dτ.

Since the background slowness m_0 depends only on the radial variable, the characteristic C_0(s, r) is rotation invariant in the sense that, for any φ ∈ [0, 2π), if (θ_{s,h}(τ), ρ_{s,h}(τ)) is a parameterization of C_0(s, r), then (θ_{s,h}(τ) + φ, ρ_{s,h}(τ)) is a parameterization of C_0(s + φ, r + φ). Hence, for any φ ∈ [0, 2π), if we rotate the system by an angle φ, then

d_1(s + φ, h) = ∫_0^1 m(θ_{s+φ,h}(τ), ρ_{s+φ,h}(τ)) ‖p′_{s+φ,h}(τ)‖ dτ
             = ∫_0^1 m(θ_{s,h}(τ) + φ, ρ_{s,h}(τ)) ‖p′_{s,h}(τ)‖ dτ.

Writing this equation in the form of (9) directly yields K(s + φ, h, θ + φ, ρ) = K(s, h, θ, ρ). Hence, there is a function κ(h, ρ, ·), periodic in the last parameter, such that K(s, h, θ, ρ) = κ(h, ρ, s − θ). This completes the proof.

Proposition 1 shows that K acts on m in the angular direction by a convolution, which is, in fact, the motivation behind shearing the measurement data d. This property allows us to evaluate the map m(θ, ρ) → d(s, h) by a family of 1D convolutions, parameterized by ρ and h.

Discretization. All the above analysis is in the continuous space. One can discretize the eikonal equation (1) by finite differences and solve it by the fast sweeping method or the fast marching method. Here we assume that the discretization of m(θ, ρ) is on a uniform mesh on [0, 2π) × [0, 1]. More details of the discretization and the numerical solver will be discussed in Section 4. With a slight abuse of notation, we use the same letters to denote the continuous kernels, variables, and their discretizations. Then the discretized version of Equations (9) and (10) is

d(s, h) ≈ Σ_{ρ,θ} K(s, h, θ, ρ) m(θ, ρ) = Σ_ρ (κ(h, ρ, ·) ∗ m(·, ρ))(s).    (11)
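Since, for each fixed ρ and h, (11) is a periodic convolution in the angular index, it can be evaluated with FFTs; the array layouts below are our own illustrative conventions.

```python
# Sketch: evaluate the discretized linearized forward map (11) with FFTs.
# kappa has shape (Nh, Nrho, Ntheta) and m has shape (Ntheta, Nrho); these
# layout conventions are assumptions for illustration.
import numpy as np

def apply_linearized_forward(kappa, m):
    Nh, Nrho, Ntheta = kappa.shape
    d = np.zeros((Ntheta, Nh))            # d(s, h) with Ns = Ntheta
    m_hat = np.fft.fft(m, axis=0)         # FFT along the angular direction
    for h in range(Nh):
        k_hat = np.fft.fft(kappa[h], axis=1)              # (Nrho, Ntheta)
        conv = np.fft.ifft(k_hat * m_hat.T, axis=1).real  # periodic convolutions
        d[:, h] = conv.sum(axis=0)        # sum over the rho channels
    return d
```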

3. Neural networks for TT

In this section, we describe the NN architecture for the inverse map d(s, h) → m(θ, ρ), based on the mathematical analysis in Section 2. To start, we first study the NN for the forward map and then turn to the inverse map.

Forward map. The perturbative analysis in Section 2.2 shows that, when m is sufficiently small, the forward map m(θ, ρ) → d(s, h) can be approximated by (11). In terms of the NN architecture, for small m, the forward map (11) can be approximated by a (non-local) convolution in the angular direction and a fully-connected operator in the (h, ρ) directions. In the actual implementation, this can be represented by a convolution layer with h and ρ taken as the channel dimensions. For larger m, this linear approximation is no longer accurate. In order to extend the neural network for (11) to the nonlinear case, we propose to increase the number of convolution layers and nonlinear activation functions.

Data: c, N_cnn ∈ N_+, m ∈ R^{N_θ × N_ρ}
Result: d ∈ R^{N_s × N_h}
/* Resample the data to fit BCR-Net. */
ξ = Conv1d[c, 1, id](m), with ρ as the channel direction
/* Use BCR-Net to implement the convolutional neural network. */
ζ = BCR-Net[c, N_cnn](ξ)
/* Reconstruct the result from the output of BCR-Net. */
d = Conv1d[N_h, 1, id](ζ)

Algorithm 1: Neural network architecture for the forward map m → d.

In the (h, ρ) directions, denote the number of channels by c, whose value is problem-dependent and will be discussed in the numerical part. In the angular direction, since the convolution between m and d is global, in order to represent global interactions the window size w of the convolutions must satisfy

w N_cnn ≥ N_θ,    (12)

where N_cnn is the number of layers and N_θ is the number of discretization points in the angular direction. A simple calculation shows that the number of parameters of such a network is O(w N_cnn c²) ∼ O(N_θ c²). The recently proposed BCR-Net (Fan et al., 2019a) has been demonstrated to require fewer parameters and provide better efficiency for such global interactions. Therefore, in our architecture, we replace the convolution layers with the BCR-Net. The resulting neural network architecture for the forward map is summarized in Algorithm 1, with an estimated O(c² log(N_θ) N_cnn) parameters. The components are explained in the following; an illustrative code sketch follows the list.

• ξ = Conv1d[c, w, φ](m), mapping m ∈ R^{N_θ × N_ρ} to ξ ∈ R^{N_θ × c}, is the one-dimensional convolution layer with window size w, channel number c, activation function φ, and periodic padding in the first direction.

• BCR-Net is motivated by the data-sparse nonstandard wavelet representation of pseudo-differential operators (Beylkin et al., 1991). It processes the information at different scales separately, and each scale can be understood as a local convolutional neural network. The one-dimensional ζ = BCR-Net[c, N_cnn](ξ) maps ξ ∈ R^{N_θ × c} to ζ ∈ R^{N_θ × c}, where the numbers of channels and layers in the local convolutional neural network at each scale are c and N_cnn, respectively. The reader is referred to (Fan et al., 2019a) for more details on the BCR-Net.
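As a concrete, but non-authoritative, illustration, the following Keras sketch mirrors Algorithm 1 with the BCR-Net stage replaced by a stack of periodic 1D convolutions as a stand-in; all layer sizes are illustrative assumptions rather than the paper's released code.

```python
# Keras sketch of Algorithm 1. The BCR-Net block is replaced by a stack of
# periodic Conv1D layers as a stand-in; sizes are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def periodic_conv1d(x, channels, window):
    pad = window // 2
    # circular padding in the angular direction, then a "valid" convolution
    x = layers.Lambda(lambda t: tf.concat([t[:, -pad:, :], t, t[:, :pad, :]], axis=1))(x)
    return layers.Conv1D(channels, window, padding="valid", activation="relu")(x)

def forward_net(N_theta=160, N_rho=80, N_h=160, c=30, n_cnn=6):
    m = layers.Input(shape=(N_theta, N_rho))            # rho as the channel direction
    xi = layers.Conv1D(c, 1, activation=None)(m)        # resampling: Conv1d[c, 1, id]
    zeta = xi
    for _ in range(n_cnn):                              # stand-in for BCR-Net[c, Ncnn]
        zeta = periodic_conv1d(zeta, c, window=3)
    d = layers.Conv1D(N_h, 1, activation=None)(zeta)    # reconstruction: Conv1d[Nh, 1, id]
    return tf.keras.Model(m, d)
```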

Inverse map. The perturbative analysis in Section 2.2 shows that, if m is sufficiently small, the forward map can be approximated by d ≈ Km, the operator notation of the discretization (11). Here m is a vector indexed by (θ, ρ), d is a vector indexed by (s, h), and K is a matrix with rows indexed by (s, h) and columns indexed by (θ, ρ).

The filtered back-projection method (Schuster and Quintus-Bosz, 1993) suggests the following formula to recover m:

m ≈ (K^T K + εI)^{−1} K^T d,    (13)


where ε is a regularization parameter. The first piece, K^T d, can also be written as a family of convolutions:

(K^T d)(θ, ρ) = Σ_h (κ(h, ρ, ·) ∗ d(·, h))(θ).    (14)
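For a toy problem where K is small enough to assemble explicitly, (13) can be applied directly; the sizes and the value of ε below are illustrative assumptions.

```python
# Sketch: filtered back-projection formula (13) with an explicitly assembled K.
import numpy as np

def fbp_reconstruct(K, d, eps=1e-3):
    """Return m ~ (K^T K + eps I)^{-1} K^T d (regularized least squares)."""
    n = K.shape[1]
    return np.linalg.solve(K.T @ K + eps * np.eye(n), K.T @ d)
```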

The application of K^T to d can be approximated with a neural network similar to the one for K in Algorithm 1. The second piece, (K^T K + εI)^{−1}, is a pseudo-differential operator in the (θ, ρ) space, and it is implemented with several two-dimensional convolutional layers for simplicity. The resulting architecture for the inverse map is summarized in Algorithm 2 and illustrated in Fig. 3. The Conv2d[c_2, w, φ] used in Algorithm 2 is the two-dimensional convolution layer with window size w, channel number c_2, activation function φ, periodic padding in the first direction, and zero padding in the second direction. The selection of the hyper-parameters in Algorithm 2 will be discussed in Section 4.

Data: c, c_2, w, N_cnn, N_cnn2 ∈ N_+, d ∈ R^{N_s × N_h}
Result: m ∈ R^{N_θ × N_ρ}
/* Application of K^T to d */
ζ = Conv1d[c, 1, id](d), with h as the channel direction
ξ = BCR-Net[c, N_cnn](ζ)
ξ^(0) = Conv1d[N_ρ, 1, id](ξ)
/* Application of (K^T K + εI)^{−1} */
for k from 1 to N_cnn2 − 1 do
    ξ^(k) = Conv2d[c_2, w, ReLU](ξ^(k−1))
end
m = Conv2d[1, w, id](ξ^(N_cnn2 − 1))

Algorithm 2: Neural network architecture for the inverse problem d → m.
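A matching Keras sketch of Algorithm 2 is given below, again with periodic 1D convolutions standing in for the BCR-Net; padding="same" (zero padding in both directions) is used for brevity where the paper pads periodically in θ, and all hyper-parameter values are illustrative assumptions.

```python
# Keras sketch of Algorithm 2; `periodic_conv1d` is the helper from the
# forward-map sketch. Hyper-parameters are illustrative assumptions, and
# padding="same" only approximates the periodic-in-theta padding of the paper.
import tensorflow as tf
from tensorflow.keras import layers

def inverse_net(N_s=160, N_h=160, N_theta=160, N_rho=80,
                c=30, c2=48, w=3, n_cnn=6, n_cnn2=5):
    d = layers.Input(shape=(N_s, N_h))                  # h as the channel direction
    zeta = layers.Conv1D(c, 1, activation=None)(d)      # Conv1d[c, 1, id]
    for _ in range(n_cnn):                              # stand-in for BCR-Net[c, Ncnn]
        zeta = periodic_conv1d(zeta, c, window=3)
    xi = layers.Conv1D(N_rho, 1, activation=None)(zeta) # Conv1d[Nrho, 1, id]
    xi = layers.Reshape((N_theta, N_rho, 1))(xi)        # to a 2D image (needs N_s = N_theta)
    for _ in range(n_cnn2 - 1):                         # (K^T K + eps I)^{-1} stage
        xi = layers.Conv2D(c2, w, padding="same", activation="relu")(xi)
    m = layers.Conv2D(1, w, padding="same", activation=None)(xi)
    return tf.keras.Model(d, layers.Reshape((N_theta, N_rho))(m))
```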

Figure 3: Neural network architecture for the inverse map of TT: resampling, BCR-Net, and reconstruction layers implementing K^T, followed by several Conv2d layers implementing (K^T K + εI)^{−1}.

4. Numerical tests

This section reports the numerical performance of the proposed neural network architecture in Algorithm 2 for the inverse map d → m.


4.1. Experimental setup

In order to solve the eikonal equation (1) on the unit disk Ω, we embed Ω into the square domain [−1, 1]² by specifying sufficiently large slowness values outside Ω. The domain [−1, 1]² is discretized with a uniform Cartesian mesh with 160 points in each direction by a finite difference scheme. The fast sweeping method proposed in (Zhao, 2005) is used to solve the nonlinear discrete system. In polar coordinates, the domain (θ, ρ) ∈ [0, 2π] × [0, 1] is partitioned by a uniform Cartesian mesh with 160 × 80 points, i.e., N_θ = 160 and N_ρ = 80. As the m(θ, ρ) used in Algorithm 2 is in polar coordinates while the eikonal equation is solved in Cartesian ones, the perturbation of the slowness field m is treated as a piecewise linear function in the domain Ω and is interpolated onto the polar grid. The numbers of sources and receivers are N_s = N_r = 160, and hence N_h = 160.
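The embedding step can be expressed in a few lines; the large outside value below is an assumption chosen for illustration.

```python
# Sketch: embed the unit disk into [-1, 1]^2 by assigning a large slowness
# outside the disk. The outside value 1e3 is an illustrative assumption.
import numpy as np

def embed_disk(m_square, big=1e3):
    """m_square: slowness sampled on an n x n grid over [-1, 1]^2."""
    n = m_square.shape[0]
    xs = np.linspace(-1.0, 1.0, n)
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    return np.where(X**2 + Y**2 <= 1.0, m_square, big)
```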

The NN in Algorithm 2 is implemented with Keras (Chollet et al., 2015) running on top of TensorFlow (Abadi et al., 2016). All the parameters of the network are initialized by Xavier initialization (Glorot and Bengio, 2010). The loss function is the mean squared error, and the optimizer is Nadam (Dozat, 2016). In the training process, the batch size and the learning rate are initially set to 32 and 10^{−3}, respectively, and the NN is trained for 100 epochs. One then increases the batch size by factors of 2 up to 512 with the learning rate unchanged, and then decreases the learning rate by factors of 10^{1/2} down to 10^{−5} with the batch size fixed at 512. In each step, the NN is trained for 50 epochs. For the hyper-parameters used in Algorithm 2, N_cnn = 6, N_cnn2 = 5, and w = 3 × 3. The selection of the channel number c will be studied later.
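The staged schedule can be written, for instance, as follows; model, x_train, and y_train are placeholders, and the snippet is an interpretation of the schedule above rather than the authors' script.

```python
# Sketch of the staged training schedule (an interpretation, not the authors'
# script): Nadam + MSE, batch size grown to 512, then learning rate decayed.
import tensorflow as tf

def staged_training(model, x_train, y_train):
    model.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=1e-3), loss="mse")
    model.fit(x_train, y_train, batch_size=32, epochs=100)
    for bs in [64, 128, 256, 512]:                 # grow the batch size, lr fixed
        model.fit(x_train, y_train, batch_size=bs, epochs=50)
    for lr in [10**-3.5, 1e-4, 10**-4.5, 1e-5]:    # decay lr by 10^(1/2), bs = 512
        tf.keras.backend.set_value(model.optimizer.learning_rate, lr)
        model.fit(x_train, y_train, batch_size=512, epochs=50)
```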

4.2. Results

For a fixed m, d(s, h) stands for the exact measurement data obtained by the numerical discretization of (1). The prediction of the NN from d is denoted by m^{NN}. The metric for the prediction is the peak signal-to-noise ratio (PSNR), defined as

PSNR = 10 log_{10}(Max² / MSE),  Max = max_{i,j}(m_{ij}) − min_{i,j}(m_{ij}),  MSE = (1 / (N_θ N_ρ)) Σ_{i,j} |m_{i,j} − m^{NN}_{i,j}|².    (15)

For each experiment, the test PSNR is then obtained by averaging (15) over a given set of test samples. The numerical results presented below are obtained by repeating the training process five times, using different random seeds for the NN initialization.
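For reference, (15) translates directly into a few lines of numpy (array names are ours):

```python
# Direct numpy transcription of the PSNR metric (15); array names are ours.
import numpy as np

def psnr(m_true, m_nn):
    mse = np.mean((m_true - m_nn) ** 2)      # mean over the N_theta x N_rho grid
    peak = m_true.max() - m_true.min()       # "Max" in (15)
    return 10.0 * np.log10(peak**2 / mse)
```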

The numerical experiments focus on the shape reconstruction setting (Ustundag, 2008; Deckelnick et al., 2011), where m often consists of piecewise constant inclusions. The background slowness field is set as m_0 ≡ 1, and the slowness perturbation m is assumed to be the sum of N_e piecewise constant ellipses. As the slowness field is positive, it is required that m > −1. For each ellipse, the direction is uniformly random over the unit circle, the position is uniformly sampled in the disk, and the width and height depend on the datasets. It is also required that each ellipse lies in the disk and that no two ellipses intersect. Three types of data sets are generated to test the neural network; a sampling sketch follows the list below.

• Negative inclusions. m, the perturbation of the slowness, is −0.5 in the ellipses and 0 otherwise, and the width and height of each ellipse are sampled from the uniform distributions U(0.1, 0.2) and U(0.05, 0.1), respectively.


• Positive inclusions. m is 2 in the ellipses and 0 otherwise, and the width and height of each ellipse are sampled from U(0.2, 0.4) and U(0.1, 0.2), respectively.

• Mixture inclusions. Each ellipse is set up either as a negative one, as in the negative inclusions, or as a positive one, as in the positive inclusions.

For each type, we generate two datasets, for N_e = 2 and 4. For each test, 20,480 samples (m_i, d_i) are generated, with 16,384 used for training and the remaining 4,096 for testing.
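A sampling sketch under these constraints might look as follows; the bounding-circle overlap test is a conservative simplification of the exact no-intersection requirement, and all helper names and defaults are ours.

```python
# Sketch: sample non-overlapping elliptical inclusions in the unit disk and
# rasterize the perturbation m. The bounding-circle overlap test is a
# conservative simplification; names and defaults are illustrative.
import numpy as np

def sample_ellipses(n_ellipses, rng, width=(0.1, 0.2), height=(0.05, 0.1)):
    placed = []  # entries: (center, (a, b), angle) with semi-axes a, b
    while len(placed) < n_ellipses:
        a, b = rng.uniform(*width) / 2, rng.uniform(*height) / 2
        angle = rng.uniform(0.0, np.pi)                    # uniformly random direction
        rad = np.sqrt(rng.uniform(0.0, 1.0)) * (1.0 - a)   # keep the ellipse inside the disk
        phi = rng.uniform(0.0, 2.0 * np.pi)
        center = np.array([rad * np.cos(phi), rad * np.sin(phi)])
        if all(np.linalg.norm(center - c) > a + a2 for c, (a2, _), _ in placed):
            placed.append((center, (a, b), angle))
    return placed

def rasterize(ellipses, n=160, value=-0.5):
    xs = np.linspace(-1.0, 1.0, n)
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    m = np.zeros((n, n))
    for center, (a, b), ang in ellipses:
        dx, dy = X - center[0], Y - center[1]
        u = dx * np.cos(ang) + dy * np.sin(ang)            # rotate into the ellipse frame
        v = -dx * np.sin(ang) + dy * np.cos(ang)
        m[(u / a) ** 2 + (v / b) ** 2 <= 1.0] = value
    return m
```

For example, rng = np.random.default_rng(0); m = rasterize(sample_ellipses(4, rng)) produces one negative-inclusion sample.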

Figure 4: The test PSNR for different channel numbers c for the three types of data with N_e = 4. Panels: (a) negative inclusions, (b) positive inclusions, (c) mixture inclusions.

Figure 5: NN prediction of a sample in the test data for negative (first row), positive (second row), and mixture (third row) inclusions with N_e = 4. Columns: the reference m and m^{NN} for noise levels δ = 0, 2%, and 10%.


Figure 6: NN generalization test for the negative inclusions. Each row shows the reference and m^{NN} for δ = 0%, 1%, and 2%. Upper (lower) figures: the NN is trained on data with N_e = 2 (respectively N_e = 4) ellipses at noise level δ = 0, 1%, or 2% and tested on data with N_e = 4 (respectively N_e = 2) at the same noise level.

The first numerical study concerns the choice of the channel number c in Algorithm 2. Figure 4 presents the test PSNR and the number of parameters for different channel numbers c for the three types of data sets with N_e = 4. As the channel number c increases, the test PSNR first consistently increases and then saturates, for all three types of data. Recall that the number of parameters of the neural network is O(c² log(N_θ) N_cnn). The choice c = 30 is a reasonable balance between accuracy and efficiency, and the total number of parameters is then 684K.

To model the uncertainty in the measurement data, we introduce noise to the measurement data by defining u^{s,δ}(x_r) ≡ (1 + Z_i δ) u^s(x_r), where Z_i is a Gaussian random variable with zero mean and unit variance and δ controls the signal-to-noise ratio. In terms of the actual data d of the differential imaging, d^δ(s, h) ≡ (1 + Z_i δ) d(s, h) + Z_i δ u^s_0(x_r). Notice that, since the mean of ‖u^s_0(x_r)‖ / ‖d(s, h)‖ over all the samples lies in [15, 30] in these experiments, the signal-to-noise ratio for d is in fact more than 15δ. For each noise level δ = 0, 2%, 10%, an independent NN is trained and tested with the noisy data set (d^δ_i, m_i).

Figure 5 collects, for different noise levels δ, samples for all three data types: (1) negative inclusions with N_e = 4, (2) positive inclusions with N_e = 4, and (3) mixture inclusions with N_e = 4. The NN is trained with datasets generated in the same way as the test data. When there is no noise in the measurement data, the NN consistently gives accurate predictions of the slowness field m, in the position, shape, and direction of the ellipses. For small noise levels, for example δ = 2%, the boundaries of the shapes blur slightly while the positions and directions of the ellipses are still correct. As the noise level δ increases, the shapes become fuzzy, but the positions and the number of shapes are always correct. This demonstrates that the proposed NN architecture is capable of learning the inverse problem.


Figure 7: NN generalization test for different types of data sets (rows: negative, positive, and mixture test samples). The first column is the reference solution. In each of the last three columns, the NN is trained with one data type (negative, positive, or mixed) and is tested on all three data types with N_e = 4 and without noise.

The next test concerns the generalization of the proposed NN. We first train the NN on the data set of negative inclusions with N_e = 2 (or 4) at noise level δ = 0, 1%, or 2% and test it on the data of negative inclusions with N_e = 4 (or 2) at the same noise level. The results, presented in Fig. 6, indicate that the NN trained on the data with two inclusions is capable of recovering the slowness field from the measurement data of the case with four inclusions, and vice versa. This shows that the trained NN is capable of predicting beyond the training scenario.

The last test concerns the prediction power of the NN on one data type when trained on another. In Fig. 7, the first column is the reference solution. In each of the remaining three columns, the NN is trained with one data type (negative, positive, or mixed) and is tested on all three data types, with N_e = 4 and without noise. The figures in the second column show that the NN trained on negative inclusions fails to capture the positive inclusions; vice versa, the third column demonstrates that the NN trained on positive inclusions fails for the negative inclusions. On the other hand, the NN trained on mixed inclusions is capable of predicting reasonably well for all three data types.


5. Discussions

This paper presents a neural network approach for the inverse problem of first-arrival traveltime tomography, using an NN to approximate the whole inverse map from the measurement data to the slowness field. The perturbative analysis, which indicates that the linearized forward map can be represented by a one-dimensional convolution with multiple channels, inspires the design of the whole NN architecture. The analysis in this paper can also be extended to three-dimensional TT problems by leveraging recent work such as (Cohen et al., 2018).

Acknowledgments

The work of Y.F. and L.Y. is partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program. The work of L.Y. is also partially supported by the National Science Foundation under award DMS-1818449.

References

Martín Abadi et al. TensorFlow: A system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.

Jonas Adler and Ozan Öktem. Solving ill-posed inverse problems using iterative deep neural networks. Inverse Problems, 33(12):124007, 2017.

M. Araya-Polo, J. Jennings, A. Adler, and T. Dahlke. Deep-learning tomography. The Leading Edge, 37(1):58–66, 2018. doi: 10.1190/tle37010058.1. URL https://doi.org/10.1190/tle37010058.1.

George Backus and Freeman Gilbert. The resolving power of gross Earth data. Geophysical Journal International, 16(2):169–205, 1968.

Leah Bar and Nir Sochen. Unsupervised deep learning algorithm for PDE-based forward and inverse problems. arXiv preprint arXiv:1904.05417, 2019.

Jens Berg and Kaj Nyström. A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing, 317:28–41, 2018.

Gregory Beylkin, Ronald Coifman, and Vladimir Rokhlin. Fast wavelet transforms and numerical algorithms I. Communications on Pure and Applied Mathematics, 44(2):141–183, 1991.

Max Born and Emil Wolf. Principles of optics: electromagnetic theory of propagation, interference and diffraction of light. Oxford: Pergamon, 3rd edition, 1965.

Giuseppe Carleo and Matthias Troyer. Solving the quantum many-body problem with artificial neural networks. Science, 355(6325):602–606, 2017.

François Chollet et al. Keras. https://keras.io, 2015.

Eric Chung, Jianliang Qian, Gunther Uhlmann, and Hongkai Zhao. An adaptive phase space method with application to reflection traveltime tomography. Inverse Problems, 27(11):115002, 2011.


Taco S. Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical CNNs. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=Hkbd5xZRb.

George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.

Klaus Deckelnick, Charles M Elliott, and Vanessa Styles. Numerical analysis of an inverse problem for the eikonal equation. Numerische Mathematik, 119(2):245, 2011.

Timothy Dozat. Incorporating Nesterov momentum into Adam. International Conference on Learning Representations, 2016.

Weinan E and Bing Yu. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.

Yuwei Fan and Lexing Ying. Solving electrical impedance tomography with deep learning. arXiv preprint arXiv:1906.03944, 2019a.

Yuwei Fan and Lexing Ying. Solving optical tomography with deep learning. arXiv preprint arXiv:1910.04756, 2019b.

Yuwei Fan, Lin Lin, Lexing Ying, and Leonardo Zepeda-Núñez. A multiscale neural network based on hierarchical matrices. arXiv preprint arXiv:1807.01883, 2018.

Yuwei Fan, Cindy Orozco Bohorquez, and Lexing Ying. BCR-Net: a neural network based on the nonstandard wavelet form. Journal of Computational Physics, 384:1–15, 2019a.

Yuwei Fan, Jordi Feliu-Fabà, Lin Lin, Lexing Ying, and Leonardo Zepeda-Núñez. A multiscale neural network based on hierarchical nested bases. Research in the Mathematical Sciences, 6(2):21, 2019b.

Jordi Feliu-Fabà, Yuwei Fan, and Lexing Ying. Meta-learning pseudo-differential operators with deep neural networks. arXiv preprint arXiv:1906.06782, 2019.

Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249–256, 2010.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning, volume 1. MIT Press, Cambridge, 2016.

J. Han, A. Jentzen, and W. E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018a.

Jiequn Han, Linfeng Zhang, Roberto Car, and Weinan E. Deep potential: A general representation of a many-body potential energy surface. Communications in Computational Physics, 23(3):629–639, 2018b. doi: https://doi.org/10.4208/cicp.OA-2017-0213.


G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012. ISSN 1053-5888. doi: 10.1109/MSP.2012.2205597.

S Ratnajeevan H Hoole. Artificial neural networks in the solution of inverse electromagnetic field problems. IEEE Transactions on Magnetics, 29(2):1931–1934, 1993.

Hitoshi Ishii. A simple, direct proof of uniqueness for solutions of the Hamilton-Jacobi equations of eikonal type. Proceedings of the American Mathematical Society, pages 247–251, 1987.

Xing Jin and Lihong V Wang. Thermoacoustic tomography with correction for acoustic speed variations. Physics in Medicine & Biology, 51(24):6437, 2006.

Humayun Kabir, Ying Wang, Ming Yu, and Qi-Jun Zhang. Neural network inverse modeling and applications to microwave filter design. IEEE Transactions on Microwave Theory and Techniques, 56(4):867–879, 2008.

Chiu-Yen Kao, Stanley Osher, and Yen-Hsi Tsai. Fast sweeping methods for static Hamilton–Jacobi equations. SIAM Journal on Numerical Analysis, 42(6):2612–2632, 2005.

Yuehaw Khoo and Lexing Ying. SwitchNet: a neural network model for forward and inverse scattering problems. arXiv preprint arXiv:1810.09675, 2018.

Yuehaw Khoo, Jianfeng Lu, and Lexing Ying. Solving parametric PDE problems with artificial neural networks. arXiv preprint arXiv:1707.03351, 2017.

Yuehaw Khoo, Jianfeng Lu, and Lexing Ying. Solving for high-dimensional committor functions using artificial neural networks. Research in the Mathematical Sciences, 6(1):1, 2019.

Alexander G Kosovichev. Tomographic imaging of the Sun's interior. The Astrophysical Journal Letters, 461(1):L55, 1996.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS'12, pages 1097–1105, USA, 2012. Curran Associates Inc.

Gitta Kutyniok, Philipp Petersen, Mones Raslan, and Reinhold Schneider. A theoretical analysis of deep neural networks and parametric PDEs. arXiv preprint arXiv:1904.00377, 2019.

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521:436–444, 2015.

Michael K. K. Leung, Hui Yuan Xiong, Leo J. Lee, and Brendan J. Frey. Deep learning of the tissue-regulated splicing code. Bioinformatics, 30(12):i121–i129, 2014. doi: 10.1093/bioinformatics/btu277.

Shingyu Leung, Jianliang Qian, et al. An adjoint state method for three-dimensional transmission traveltime tomography using first-arrivals. Communications in Mathematical Sciences, 4(1):249–266, 2006.


Yingzhou Li, Jianfeng Lu, and Anqi Mao. Variational training of neural network approximations of solution maps for physical models. arXiv preprint arXiv:1905.02789, 2019.

Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. PDE-Net: Learning PDEs from data. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3208–3216, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR. URL http://proceedings.mlr.press/v80/long18a.html.

Alice Lucas, Michael Iliadis, Rafael Molina, and Aggelos K Katsaggelos. Using deep neural networks for inverse problems in imaging: beyond analytical methods. IEEE Signal Processing Magazine, 35(1):20–36, 2018.

Junshui Ma, Robert P. Sheridan, Andy Liaw, George E. Dahl, and Vladimir Svetnik. Deep neural nets as a method for quantitative structure-activity relationships. Journal of Chemical Information and Modeling, 55(2):263–274, 2015. doi: 10.1021/ci500747n.

Walter Munk, Peter Worcester, and Carl Wunsch. Ocean acoustic tomography. Cambridge University Press, 2009.

Alexander Mihai Popovici and James Sethian. Three dimensional traveltime computation using the fast marching method. In SEG Technical Program Expanded Abstracts 1997, pages 1778–1781. Society of Exploration Geophysicists, 1997.

Jianliang Qian, Yong-Tao Zhang, and Hong-Kai Zhao. Fast sweeping methods for eikonal equations on triangular meshes. SIAM Journal on Numerical Analysis, 45(1):83–107, 2007.

Maziar Raissi and George Em Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 357:125–141, 2018. ISSN 0021-9991. doi: 10.1016/j.jcp.2017.11.039.

Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.

Nicholas Rawlinson, S Pozgay, and S Fishwick. Seismic tomography: a window into deep Earth. Physics of the Earth and Planetary Interiors, 178(3-4):101–135, 2010.

Keith Rudd and Silvia Ferrari. A constrained integration (CINT) approach to solving partial differential equations using artificial neural networks. Neurocomputing, 155:277–285, 2015.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015. ISSN 0893-6080. doi: 10.1016/j.neunet.2014.09.003.

Hermann Schomberg. An improved approach to reconstructive ultrasound tomography. Journal of Physics D: Applied Physics, 11(15):L181, 1978.

Gerard T Schuster and Aksel Quintus-Bosz. Wavepath eikonal traveltime inversion: Theory. Geophysics, 58(9):1314–1323, 1993.


James A Sethian. Fast marching methods. SIAM Review, 41(2):199–235, 1999.

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 3104–3112. Curran Associates, Inc., 2014.

Chao Tan, Shuhua Lv, Feng Dong, and Masahiro Takei. Image reconstruction based on convolutional neural network for electrical resistance tomography. IEEE Sensors Journal, 19(1):196–204, 2018.

D Ustundag. Retrieving slowness distribution of a medium between two boreholes from first arrival traveltimes. Int. J. Geol, 2:1–8, 2008.

Tak Shing Au Yeung, Eric T Chung, and Gunther Uhlmann. Numerical inversion of 3D geodesic X-ray transform arising from traveltime tomography. arXiv preprint arXiv:1804.10006, 2018.

Hongkai Zhao. A fast sweeping method for eikonal equations. Mathematics of Computation, 74(250):603–627, 2005.
