  • Projected Stein variational Newton: A fast and scalable Bayesian inference method in high dimensions

    Peng Chen, Keyi Wu, Joshua Chen, Thomas O'Leary-Roseberry, Omar Ghattas

    Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin

    RICAM Workshop on Optimization and Inversion under Uncertainty

  • Example: inversion in Antarctica ice sheet flow

    uncertain parameter: basal sliding field in the boundary condition
    forward model: viscous, shear-thinning, incompressible fluid

    −∇ · (η(u)(∇u + ∇u^T) − Ip) = ρg,   ∇ · u = 0

    data: (InSAR) satellite observation of surface ice flow velocity

    T. Isaac, N. Petra, G. Stadler, O. Ghattas, JCP, 2015

  • Outline

    1 Bayesian inversion

    2 Stein variational methods

    3 Projected Stein variational methods

    4 Stein variational reduced basis methods


  • Uncertainty parametrization

    Example I: Karhunen–Loève expansion
    Karhunen–Loève expansion for β with mean β̄ and covariance C:

    β(x, θ) = β̄(x) + ∑_{j≥1} √λj ψj(x) θj,

    (λj, ψj)_{j≥1}: eigenpairs of the covariance C; θ = (θj)_{j≥1}, uncorrelated, given by

    θj = (1/√λj) ∫_D (β(x) − β̄(x)) ψj(x) dx.

    Example II: dictionary basis representation
    We can approximate the random field β by

    β(x, θ) = ∑_{j≥1} ψj(x) θj,

    (ψj)_{j≥1}: dictionary basis, e.g., a wavelet or finite element basis.
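    The following is a minimal NumPy sketch of Example I on a uniform 1D grid; the squared-exponential covariance kernel, grid, and truncation level are illustrative assumptions (quadrature weights of the discretized eigenproblem are ignored), not details from the talk.

import numpy as np

def kl_expansion(x, mean, cov, n_terms, theta):
    """Evaluate beta(x, theta) = mean(x) + sum_j sqrt(lambda_j) psi_j(x) theta_j."""
    C = cov(x[:, None], x[None, :])               # covariance matrix on the grid
    lam, psi = np.linalg.eigh(C)                  # eigenpairs, ascending order
    lam, psi = lam[::-1][:n_terms], psi[:, ::-1][:, :n_terms]
    return mean(x) + psi @ (np.sqrt(np.maximum(lam, 0.0)) * theta)

x = np.linspace(0.0, 1.0, 200)
cov = lambda s, t: np.exp(-0.5 * (s - t)**2 / 0.1**2)   # assumed covariance kernel
theta = np.random.randn(20)                              # uncorrelated coefficients theta_j
beta = kl_expansion(x, np.zeros_like, cov, 20, theta)    # one realization of the random field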

  • Bayesian inversion

    We consider an abstract form of the parameter-to-data model

    y = O(θ) + ξ

    uncertain parameter: θ ∈ Θ ⊂ ℝ^d
    parameter-to-observable map: O
    observation data: y ∈ ℝ^n
    noise: ξ, e.g., ξ ∼ N(0, Γ)

    Bayes' rule:

    πy(θ) [posterior] = (1/π(y)) · π(y|θ) [likelihood] · π0(θ) [prior],

    with the model evidence

    π(y) = ∫_Θ π(y|θ) π0(θ) dθ.

    [Diagram: parameter θ → forward model A(u, v; θ) = F(v) → QoI s(θ) = ℬu(θ) and observational data y = ℬu + ξ; likelihood π(y|θ) = πξ(y − ℬu); Bayesian inversion πy(θ) ∝ π(y|θ) π0(θ), mapping prior to posterior.]

    The central tasks: sample from the posterior and compute statistics, e.g.,

    E_{πy}[s] = ∫_Θ s(θ) πy(θ) dθ.
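    As a minimal sketch of the central object, the unnormalized log-posterior (the evidence π(y) is constant in θ and can be dropped) under an assumed additive Gaussian noise model and zero-mean Gaussian prior reads as follows; the callable `forward` stands in for O and is hypothetical.

import numpy as np

def log_posterior_unnormalized(theta, y, forward, Gamma_inv, C0_inv):
    """log pi_y(theta) up to the constant -log pi(y)."""
    misfit = y - forward(theta)                          # y - O(theta)
    log_likelihood = -0.5 * misfit @ Gamma_inv @ misfit  # log pi(y|theta) + const
    log_prior = -0.5 * theta @ C0_inv @ theta            # log pi0(theta) + const
    return log_likelihood + log_prior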

  • Computational challenges

    Computational challenges for Bayesian inversion:
    the posterior has complex geometry: non-Gaussian, multimodal, concentrating in a local region
    the parameter lives in high-dimensional spaces: curse of dimensionality – complexity grows exponentially
    the map O is expensive to evaluate: involves solving large-scale partial differential equations

    [Figures: complex geometry, high dimensionality, large-scale computation]

  • Computational methods

    Towards better design of MCMC to reduce # samples
    1 Langevin and Hamiltonian MCMC (local geometry using gradient, Hessian, etc.) [Stuart et al., 2004, Girolami and Calderhead, 2011, Martin et al., 2012, Bui-Thanh and Girolami, 2014, Lan et al., 2016, Beskos et al., 2017]...
    2 dimension reduction MCMC (intrinsic low-dimensionality) [Cui et al., 2014, 2016, Constantine et al., 2016]...
    3 randomized/optimized MCMC (optimization for sampling) [Oliver, 2017, Wang et al., 2018, Wang et al., 2019]...

    Direct posterior construction and statistical computation
    1 Laplace approximation (Gaussian posterior approximation) [Bui-Thanh et al., 2013, Chen et al., 2017, Schillings et al., 2019]...
    2 deterministic quadrature (sparse Smolyak, high-order quasi-MC) [Schillings and Schwab, 2013, Gantner and Schwab, 2016, Chen and Schwab, 2016, Chen et al., 2017]...
    3 transport maps (polynomials, radial basis functions, deep neural networks) [El Moselhy and Marzouk, 2012, Spantini et al., 2018, Rezende and Mohamed, 2015, Liu and Wang, 2016, Detommaso et al., 2018, Chen et al., 2019]...


  • Computational methods

    Surrogate models to reduce the large-scale computation
    1 polynomial approximation (stochastic spectral, stochastic collocation) [Marzouk et al., 2007, Marzouk and Xiu, 2009, Schwab and Stuart, 2012, Chen et al., 2017, Farcas et al., 2019]...
    2 model reduction (POD, greedy reduced basis) [Wang and Zabaras, 2005, Lieberman et al., 2010, Nguyen et al., 2010, Lassila et al., 2013, Cui et al., 2015, Chen and Schwab, 2016, Chen and Ghattas, 2019]...
    3 multilevel/multifidelity (MCMC, stochastic collocation) [Dodwell et al., 2015, Teckentrup et al., 2015, Scheichl et al., 2017, Peherstorfer et al., 2018, Farcas et al., 2019]...

    Aim for this talk:
    Fast and scalable Bayesian inference in high dimensions by exploiting intrinsic low-dimensionality in both parameter and state spaces, using
    projected transport map in parameter space
    reduced basis approximation in state space


  • Outline

    1 Bayesian inversion

    2 Stein variational methods

    3 Projected Stein variational methods

    4 Stein variational reduced basis methods


  • Transport map

    Find a transport map T : Rd → Rd, such that

    θ ∼ π0 → T(θ) ∼ πy,

    [Figure: samples from the prior π0 are pushed forward by T to samples from the posterior πy.]

  • Kullback–Leibler divergence

    Definition: Kullback–Leibler (KL) divergence

    DKL(π1|π2) = ∫_Θ π1(θ) log( π1(θ) / π2(θ) ) dθ.

    It measures the difference between two probability distributions:
    DKL(π1|π2) ≥ 0, and DKL(π1|π2) = 0 if and only if π1 = π2, a.e.

    It is not symmetric, thus not a distance:
    DKL(π1|π2) ≠ DKL(π2|π1)

    Relation to (Shannon) information theory:

    DKL(π1|π2) = Eθ∼π1[−log(π2)] (cross entropy) − Eθ∼π1[−log(π1)] (entropy)
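    As a small numerical illustration (not from the talk), the KL divergence between two one-dimensional Gaussians can be estimated by Monte Carlo with samples from π1 and checked against the closed form:

import numpy as np
from scipy.stats import norm

pi1, pi2 = norm(0.0, 1.0), norm(1.0, 2.0)                # pi1 = N(0, 1), pi2 = N(1, 4)
theta = pi1.rvs(size=100_000, random_state=0)            # samples from pi1
kl_mc = np.mean(pi1.logpdf(theta) - pi2.logpdf(theta))   # E_{pi1}[log(pi1/pi2)]

# closed form D_KL(N(0,1) | N(1,4)) for comparison
kl_exact = np.log(2.0) + (1.0 + 1.0) / (2 * 4.0) - 0.5
print(kl_mc, kl_exact)                                   # both close to 0.443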

  • Optimization for transport map

    Find a transport map T : ℝ^d → ℝ^d, such that

    θ ∼ π0 → T(θ) ∼ πy,

    by minimizing the KL divergence

    min_{T∈𝒯} DKL(T_♯π0 | πy)  ⇔  min_{T∈𝒯} DKL(π0 | T^♯πy).

    T_♯ is the pushforward map, satisfying

    T_♯π0(θ) = π0(T^{-1}(θ)) |det ∇T^{-1}(θ)|;

    T^♯ is the pullback map, satisfying

    T^♯πy(θ) = πy(T(θ)) |det ∇T(θ)|.

    𝒯 is a tensor-product function space H^d = H ⊗ · · · ⊗ H.

  • Composition of transport map

    Instead of seeking one complex (highly nonlinear) transport map T, we look for a composition of a sequence of simple transport maps

    T = TL ◦ TL−1 ◦ · · · ◦ T1 ◦ T0,  L ∈ ℕ,

    each a perturbation of identity:

    Tl(θ) = I(θ) + Ql(θ),

    identity map: I(θ) = θ
    perturbation map: Ql : ℝ^d → ℝ^d

  • Optimization of each transport map

    At each l = 0, 1, . . . , we define

    πl+1 := (Tl ◦ · · · ◦ T0)_♯ π0  ⇐⇒  πl+1 = (Tl)_♯ πl

    We introduce a cost functional

    Jl[Q] := DKL((I + Q)_♯ πl | πy).   (1)

    One-step optimization of Jl[Q] w.r.t. Q leads to

    Tl = I + αl Ql,

    with step size αl > 0 (learning rate, line search).

    Optimization methods
    Gradient descent method: steepest descent [Liu and Wang, 2016]
    Ql = −DJl[0].
    Newton method: solve the linear system [Detommaso et al., 2018]
    D²Jl[0](V, Ql) = −DJl[0](V),  ∀ V ∈ 𝒯.

  • Optimization of each transport map

    Calculus of variation
    The first variation DJl[0] at Q = 0 in direction V ∈ 𝒯:

    DJl[0](V) := −Eπl[ (V(θ))^T ∇θ log(πy(θ)) + trace(∇θV(θ)) ]

    The second variation D²Jl[0] at Q = 0 in directions V, W ∈ 𝒯:

    D²Jl[0](V, W) := −Eπl[ (V(θ))^T ∇²θ log(πy(θ)) W(θ) − trace(∇θW(θ) ∇θV(θ)) ]

    Recall Bayes' rule:

    πy(θ) [posterior] = (1/π(y)) · π(y|θ) [likelihood] · π0(θ) [prior],

    which leads to

    ∇θ log(πy(θ)) = ∇θπy(θ) / πy(θ) = ∇θ(π(y|θ)π0(θ)) / (π(y|θ)π0(θ)).

    Key observation: the intractable model evidence π(y) is canceled out.

  • Reproducing Kernel Hilbert Space (RKHS)

    𝒯 is a tensor-product function space H^d = H ⊗ · · · ⊗ H:
    tensor-product polynomials [El Moselhy and Marzouk, 2012, Spantini et al., 2018],
    radial basis/kernel functions [Liu and Wang, 2016, Detommaso et al., 2018].

    Reproducing Kernel Hilbert Space H
    There exists a function kθ ∈ H for every θ ∈ Θ, such that

    v(θ) = ⟨v, kθ⟩  ∀v ∈ H,

    which implies the existence of kθ′ ∈ H for every θ′ ∈ Θ such that

    kθ′(θ) = ⟨kθ′, kθ⟩ =: k(θ, θ′)   (reproducing kernel)

    Many choices: bilinear, polynomials, Bergman, radial basis functions

  • N-dimensional approximation of RKHS

    Gaussian kernel

    k(θ, θ′) = exp( −(1/2h) (θ − θ′)^T M (θ − θ′) ).

    To account for the geometry of the posterior, [Detommaso et al., 2018] use

    M = H̄ := Eπl[ −∇²θ log(πy(θ)) ],  h = d,  vs.  M = I ∈ ℝ^{d×d}.

    Finite-dimensional approximation of the RKHS:

    HlN = span(kl1(θ), . . . , klN(θ)) ⊂ H,

    where the basis functions are taken as

    kln(θ) = k(θ, θln),  n = 1, . . . , N,

    where θln ∼ πl are particles transported from θ0n ∼ π0 by

    θln = (Tl ◦ · · · ◦ T0)(θ0n),  n = 1, . . . , N.
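    A minimal sketch of this scaled Gaussian kernel and its gradient in the first argument (assuming a symmetric metric M, e.g., M = H̄ or M = I); variable names are illustrative.

import numpy as np

def gaussian_kernel(theta, theta_prime, M, h):
    """k(theta, theta') = exp(-(theta-theta')^T M (theta-theta') / (2h)) and its gradient w.r.t. theta."""
    diff = theta - theta_prime
    k = np.exp(-0.5 * diff @ M @ diff / h)
    grad_k = -(M @ diff) / h * k          # valid for symmetric M
    return k, grad_k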

  • Stein variational gradient descent (SVGD)[Liu and Wang, 2016]

    For DJl[0](V) = ⟨DJl[0], V⟩_{H^d}, by the reproducing property

    ⟨DJl[0], V⟩_{H^d} = −⟨ Eπl[∇θ log(πy(θ)) k(θ, θ′) + ∇θk(θ, θ′)], V(θ′) ⟩.

    For gradient descent, we have (with the notation kln(θ) = k(θ, θln))

    Ql(θln) = −DJl[0](θln) = Eπl[ ∇θ log(πy(θ)) kln(θ) + ∇θkln(θ) ]

    Sample average approximation (SAA): θlm ∼ πl, m = 1, . . . , N

    Ql(θln) ≈ (1/N) ∑_{m=1}^{N} ∇θ log(πy(θlm)) kln(θlm) + ∇θkln(θlm).

    Particle updates by the transport map

    θl+1,n = Tl(θln) := θln + αl Ql(θln),  n = 1, . . . , N.
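    A minimal sketch of one SVGD iteration with the sample-average approximation, using the scaled Gaussian kernel above; `grad_log_post` stands in for ∇θ log πy and all names are assumptions rather than the authors' implementation.

import numpy as np

def svgd_step(thetas, grad_log_post, M, h, alpha):
    """One SVGD update theta_n <- theta_n + alpha * Q(theta_n) for particles thetas of shape (N, d)."""
    N, d = thetas.shape
    grads = np.array([grad_log_post(t) for t in thetas])
    Q = np.zeros_like(thetas)
    for n in range(N):
        for m in range(N):
            diff = thetas[m] - thetas[n]
            k = np.exp(-0.5 * diff @ M @ diff / h)    # k(theta_m, theta_n)
            grad_k = -(M @ diff) / h * k              # grad_{theta_m} k(theta_m, theta_n)
            Q[n] += (grads[m] * k + grad_k) / N       # sample average approximation
    return thetas + alpha * Q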

  • Stein variational Newton (SVN) [Detommaso et al., 2018]

    We seek Ql ∈ 𝒯lN = (HlN)^d, where HlN = span(kl1(θ), . . . , klN(θ)),

    Ql(θ) = ∑_{n=1}^{N} cln kln(θ),

    where the coefficients cln ∈ ℝ^d, with cl = (cl1, . . . , clN) ∈ ℝ^{dN}.

    For the Newton system: find Ql ∈ 𝒯lN such that

    D²Jl[0](V, Ql) = −DJl[0](V),  ∀ V ∈ 𝒯lN,

    which, by using the reproducing property, becomes

    H cl = −gl,

    gradient: gl = (gl1, . . . , glN) ∈ ℝ^{dN}, Hessian: H ∈ ℝ^{dN×dN}.

  • Stein variational Newton (SVN) [Detommaso et al., 2018]

    The gradient gl = (gl1, . . . , glN) ∈ ℝ^{dN}, with glm ∈ ℝ^d given by

    glm = −(1/N) ∑_{i=1}^{N} ∇θ log(πy(θli)) klm(θli) + ∇θklm(θli)

    The Hessian H ∈ ℝ^{dN×dN}, with blocks Hmn ∈ ℝ^{d×d} given by

    Hmn = (1/N) ∑_{i=1}^{N} −∇²θ log(πy(θli)) klm(θli) kln(θli) + ∇θklm(θli)(∇θkln(θli))^T.

    Decouple the dN × dN system into N systems of size d × d,

    H̄m clm = −glm,  m = 1, . . . , N,

    with the diagonal (block) approximation

    H̄m = (1/N) ∑_{i=1}^{N} −∇²θ log(πy(θli)) klm(θli) klm(θli) + ∇θklm(θli)(∇θklm(θli))^T.
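    A minimal sketch of the decoupled SVN update: for each particle m a d × d system H̄m cm = −gm is assembled from kernel-weighted gradients and Hessians, and all particles are moved along Q; `grad_log_post` and `hess_log_post` are assumed callables for ∇θ log πy and ∇²θ log πy, not the authors' code.

import numpy as np

def svn_step(thetas, grad_log_post, hess_log_post, M, h, alpha):
    """One decoupled SVN update for particles thetas of shape (N, d)."""
    N, d = thetas.shape
    grads = np.array([grad_log_post(t) for t in thetas])   # (N, d)
    hesss = np.array([hess_log_post(t) for t in thetas])   # (N, d, d)
    K = np.zeros((N, N))
    gradK = np.zeros((N, N, d))                             # gradK[i, m] = grad_{theta_i} k(theta_i, theta_m)
    for i in range(N):
        for m in range(N):
            diff = thetas[i] - thetas[m]
            K[i, m] = np.exp(-0.5 * diff @ M @ diff / h)
            gradK[i, m] = -(M @ diff) / h * K[i, m]
    coeffs = np.zeros((N, d))
    for m in range(N):
        g_m = -np.mean(grads * K[:, m, None] + gradK[:, m, :], axis=0)
        H_m = np.mean(-hesss * K[:, m, None, None]**2
                      + gradK[:, m, :, None] * gradK[:, m, None, :], axis=0)
        coeffs[m] = np.linalg.solve(H_m, -g_m)              # H_bar_m c_m = -g_m
    return thetas + alpha * (K @ coeffs)                    # Q(theta_n) = sum_m c_m k(theta_n, theta_m)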

  • SVGD vs SVN with M = I vs M = H̄

    G. Detommaso, T. Cui, Y. Marzouk, A. Spantini, R. Scheichl. A Stein variational Newton method. NeurIPS, 2018.

  • Outline

    1 Bayesian inversion

    2 Stein variational methods

    3 Projected Stein variational methods

    4 Stein variational reduced basis methods


  • Computational challenges in high dimensions

    Curse of dimensionality: d ≫ 1
    The number N of basis functions grows rapidly (exponentially) w.r.t. the dimension d to achieve a map representation with the required accuracy.

    [Figure: log10(RMSE of variance) vs. log2(dimension) for SVGD, SVN, and pSVN.]

    P. Chen, K. Wu, J. Chen, T. O'Leary-Roseberry, O. Ghattas. Projected Stein variational Newton: A fast and scalable Bayesian inference method in high dimensions. NeurIPS, 2019.

  • Intrinsic low dimensionality

    The posterior differs from the prior only in a low-dimensional subspace:
    high correlation in different dimensions;
    forward map is smoothing/regularizing;
    parameters are anisotropic, e.g., KL expansion.

    [Figure: decay of eigenvalues λj vs. index j, compared to j^{−2} (α = 2), for prior, posterior, and rearranged spectra in three test cases.]

    P. Chen, U. Villa, O. Ghattas. Hessian-based adaptive sparse quadrature for infinite-dimensional Bayesian inverse problems. CMAME, 2017.

  • Projection [Constantine et al., 2014, Cui et al., 2014]

    We denote a basis of the subspace of dimension r ≪ d as

    Ψ = (ψ1, . . . , ψr) ∈ ℝ^{d×r}.

    We project θ onto the low-dimensional subspace as

    θr = ∑_{i=1}^{r} ψi ψi^T θ = Ψw.

    As a result, we consider the projected posterior

    πry(θ) = (1/πr(y)) π(y|θr) π0(θ),   (2)

    where the marginal density

    πr(y) = Eπ0[π(y|θr)].

  • Projected Stein variational methods

    By the decomposition θ = θr + θ⊥, we have

    πry(θ) = π(y|θr) π0(θr) π0(θ⊥|θr).

    With θ⊥ frozen, by θr = Ψw, we define

    p0(w) := π0(θr),  py(w) := πry(θr) = π(y|θr) π0(θr).

    We seek T = TL ◦ TL−1 ◦ · · · ◦ T1 ◦ T0 : ℝ^r → ℝ^r, such that

    min_{T∈𝒯} DKL(T_♯p0 | py).

    Apply SVGD/SVN in ℝ^r for w (pSVGD/pSVN), where

    ∇w log(py(w)) = Ψ^T ∇θ log(πry(θr)),

    and

    ∇²w log(py(w)) = Ψ^T ∇²θ log(πry(θr)) Ψ.
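    A minimal sketch of the projection step used by pSVGD/pSVN: the d-dimensional gradient and Hessian of the log-posterior are restricted to the r-dimensional coefficient space through the basis Ψ, and samples are reconstructed from w and the frozen complement. Variable names are assumptions.

import numpy as np

def project_grad_hess(Psi, grad_theta, hess_theta):
    """Psi: (d, r) basis; returns the r-dimensional gradient and Hessian for w."""
    grad_w = Psi.T @ grad_theta            # grad_w log p_y(w) = Psi^T grad_theta log pi_y^r
    hess_w = Psi.T @ hess_theta @ Psi      # hess_w log p_y(w) = Psi^T hess_theta Psi
    return grad_w, hess_w

def reconstruct(Psi, w, theta_perp):
    """theta = theta_r + theta_perp with theta_r = Psi w."""
    return Psi @ w + theta_perp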

  • Basis construction

    The basis functions Ψ for projection are obtained by

    H ψi = λi C0^{-1} ψi,  i = 1, . . . , r,

    which corresponds to the r largest (in | · |) eigenvalues, i.e., |λ1| ≥ · · · ≥ |λr|. C0: prior covariance. With ηy(θ) = −log(π(y|θ)):

    Gradient-based subspace:

    H = Eπ[ ∇θηy(θ) (∇θηy(θ))^T ].

    Hessian-based subspace:

    H = Eπ[ ∇²θηy(θ) ].

    Choice of the density π: density at step l, i.e., πl.

    [Figure: decay of log10(|λi|) vs. index i for parameter dimensions 289, 1,089, 4,225, and 16,641.]
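    A minimal sketch of the gradient-based basis construction, solving the generalized eigenproblem Hψ = λ C0^{-1} ψ with H the sample average of gradient outer products over the current particles; `grad_eta` is an assumed callable for ∇θηy.

import numpy as np
from scipy.linalg import eigh

def projection_basis(thetas, grad_eta, C0_inv, r):
    """Return the r dominant eigenpairs of H psi = lambda C0^{-1} psi."""
    grads = np.array([grad_eta(t) for t in thetas])    # (N, d) sample gradients
    H = grads.T @ grads / len(thetas)                  # E_pi[grad eta grad eta^T], sample average
    lam, Psi = eigh(H, C0_inv)                         # generalized symmetric eigenproblem
    order = np.argsort(np.abs(lam))[::-1][:r]          # r largest |lambda|
    return lam[order], Psi[:, order]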

  • Algorithm 1 pSVN in parallel using MPI

    1: Input: N prior samples, θ01, . . . , θ0N, in each of K cores.
    2: Output: posterior samples θy1, . . . , θyN in each core.
    3: Perform the projection to get θn = θrn + θ⊥n and the samples wl−1,n.
    4: Perform MPI Allgather for wl−1,n, n = 1, . . . , M.
    5: repeat
    6:   Compute the gradient and Hessian.
    7:   Perform MPI Allgather for the gradient and Hessian.
    8:   Compute the kernel and its gradient.
    9:   Assemble and solve the Newton system for c1, . . . , cN.
    10:  Perform a line search to get wl1, . . . , wlN.
    11:  Perform MPI Allgather for wln, n = 1, . . . , N.
    12:  Update the samples θrn = Ψwln + θ̄, n = 1, . . . , N.
    13:  Set l ← l + 1.
    14: until a stopping criterion is met.
    15: Reconstruct samples θyn = θrn + θ⊥n, n = 1, . . . , N.

  • Algorithm 2 Adaptive pSVN

    1: Input: N prior samples, θ01, . . . , θ0N, in each of K cores.
    2: Output: posterior samples θy1, . . . , θyN in each core.
    3: Set level l2 = 1, θl2−1,n = θ0n, n = 1, . . . , N.
    4: repeat
    5:   Perform the eigendecomposition and form the bases Ψl2.
    6:   Apply Algorithm pSVN to update the samples
         [θl2,1, . . . , θl2,N] = pSVN([θl2−1,1, . . . , θl2−1,N], K, Ψl2).
    7:   Set l2 ← l2 + 1.
    8: until a stopping criterion is met.

    Advantages:
    Avoids/alleviates the curse of dimensionality.
    Largely reduces the computational cost with r ≪ d.
    Converges faster in the low-dimensional space.
    Parallel computation with reduced communication.

  • Projection error estimates

    Assumption

    Assume that the parameter-to-observable map O satisfies:
    1 There exists a constant CO > 0 such that Eπ0[||O(·)||Γ] ≤ CO.
    2 For every b > 0, there exists a constant Cb > 0 such that
      ||O(θ1) − O(θ2)||Γ ≤ Cb ||θ1 − θ2||Θ, for max{||θ1||Θ, ||θ2||Θ} < b.

    Theorem
    Under Assumption 1, for the Hessian-based projection, we have

    DKL(πy|πry) ≤ C ||θ − θr||Θ,

    with C independent of r. For the gradient-based projection, based on a result in [Zahm et al., 2018], we obtain (with C independent of r)

    DKL(πy|πry) ≤ C ∑_{i=r+1}^{d} λi.

  • Numerical results: Accuracy

    We first consider a linear parameter-to-observable map

    O(θ) = Aθ,  A = O(Bθ),  B = (L + M)^{-1},

    where L and M are the discrete Laplacian and mass matrices in the PDE model −Δu + u = θ in (0, 1), u(0) = 0, u(1) = 1. Gaussian prior N(0, C0), where C0 is discretized from (I − 0.1Δ)^{-1} with Laplace operator Δ.

    [Figures: log10(RMSE of variance) for SVGD, SVN, and pSVN vs. dimension (left) and vs. # iterations (right).]

    Decay of the RMSE of the L2 norm of the pointwise variance of the parameter w.r.t. dimension d = 16, 64, 256, 1024 with N = 128 samples (left), and with N = 32 and 512 samples in parameter dimension d = 256 w.r.t. # iterations (right).

  • Numerical results: Accuracy

    We consider a nonlinear Bayesian inverse problem with

    O(θ) = O(S(θ)),  u = S(θ),  −∇ · (e^θ ∇u) = 0, in (0, 1)²

    Gaussian prior N(0, C0), where C0 is a discretization of (I − 0.1Δ)^{-2}. We test the accuracy against a dimension-independent likelihood-informed (DILI) MCMC method with 10,000 samples as reference.

    [Figures: log10(RMSE of mean) and log10(RMSE of variance) vs. # iterations for SVN and pSVN with N = 32 and 512.]

    Decay of the RMSE of the L2 norm of the mean (left) and pointwise variance (right) of the parameter with dimension d = 1089 and N = 32 and 512 samples.

  • Numerical results: Scalability

    We consider a nonlinear Bayesian inverse problem with

    O(θ) = O(S(θ)),  u = S(θ),  −∇ · (e^θ ∇u) = 0, in (0, 1)²

    Gaussian prior N(0, C0), where C0 is a discretization of (I − 0.1Δ)^{-2}. We test the accuracy against a dimension-independent likelihood-informed (DILI) MCMC method with 10,000 samples as reference.

    [Figures: log10(step norm) vs. # iterations for 8, 32, 128, and 512 samples (left); log2(wall clock time) of the total, variation, kernel, solve, and sample components vs. log2(# processor cores), compared to O(N^{-1}) (right).]

    Left: Decay of the averaged norm of the update wl+1 − wl w.r.t. the iteration number l, with increasing number of samples. Right: Decay of the wall clock time of different computational components w.r.t. increasing # cores.

  • Summary

    Take away message:

    SVN provides good samples for complex posterior.

    pSVN is scalable to address the curse-of-dimensionality.

    Ongoing:

    Bayesian optimal experimental design with Keyi Wu.

    Triangular map and data assimilation with Joshua Chen.

    Deep learning for transport map with Tom O’Leary-Roseberry.

    Gravitational wave inversion with Bassel Saleh, Alex Leviyev.

    Integration with model reduction with Zihang Zhang.

    Convergence analysis w.r.t. # particles, parameter dimensions.

    Multilevel parallel implementation w.r.t. particles and PDE solves.


  • Outline

    1 Bayesian inversion

    2 Stein variational methods

    3 Projected Stein variational methods

    4 Stein variational reduced basis methods


  • PDE-constrained Bayesian inversion

    We have the data model

    y = B(u(θ)) + ξ

    where u is the solution of the PDE (in weak form)

    A(u(θ), v; θ) = F(v),  v ∈ V

    B : V → Y is a vector of observational functionals.
    Examples: linear diffusion, elasticity, Stokes flow, acoustics, etc.,

    −∇ · (κ(θ)∇u) = f, in D,

    with suitable boundary conditions, which leads to

    A(u, v; θ) = ∫_D κ(x, θ) ∇u(x, θ) · ∇v(x) dx,  F(v) = ∫_D f(x) v(x) dx.

    With Gaussian noise ξ ∼ N(0, Γ), we define the potential

    ηy(θ) := (1/2)(y − B(u(θ)))^T Γ^{-1} (y − B(u(θ)))  ⇒  π(y|θ) = exp(−ηy(θ)).

  • High-fidelity approximation of the potential ηy

    E.g., finite element: find uh ∈ Vh ⊂ V such that

    A(uh, vh; θ) = F(vh)  ∀vh ∈ Vh.   (3)

    Then the data model is given by

    y = B(uh(θ)) + ξ,

    and for ξ ∼ N(0, Γ) the likelihood function is given by

    π(y|θ) = exp(−ηy(uh(θ))),

    where the potential ηy(uh(θ)) (nonlinear w.r.t. uh) is

    ηy(uh(θ)) = (1/2)(y − B(uh(θ)))^T Γ^{-1} (y − B(uh(θ))).

    For SVGD, and the projected SVGD, we also need

    −∇θ log(πy(θ)) = ∇θηy(uh(θ)) − ∇θπ0(θ)/π0(θ).

  • High-fidelity approximation of the gradient ∇θηy

    We form a Lagrangian

    L(uh, ph; θ) = ηy(uh) + A(uh, ph; θ) − F(ph),

    and set ∂uL(wh) = 0 to obtain the adjoint ph, i.e., find ph ∈ Vh such that

    A(wh, ph; θ) = −∂uηy|uh(wh)  ∀wh ∈ Vh,   (4)

    where

    ∂uηy|uh(wh) = −B(wh)^T Γ^{-1} (y − B(uh)).

    Then the gradient is given by

    ∇θηy(uh(θ)) = ∂θL(uh, ph; θ) = ∂θA(uh, ph; θ).
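    For a discrete linear model Ah(θ) uh = fh with linear observations B uh, the same state–adjoint–gradient recipe reduces to three linear-algebra steps; the assembly callables below are assumptions used only to illustrate the structure, not the talk's implementation.

import numpy as np

def potential_and_gradient(theta, assemble_A, assemble_dA, f, B, Gamma_inv, y):
    """Potential eta_y and its theta-gradient via the adjoint method (discrete linear model)."""
    A = assemble_A(theta)
    u = np.linalg.solve(A, f)                               # state:   A(theta) u = f
    misfit = y - B @ u
    eta = 0.5 * misfit @ Gamma_inv @ misfit                 # eta_y(u(theta))
    p = np.linalg.solve(A.T, B.T @ Gamma_inv @ misfit)      # adjoint: A^T p = B^T Gamma^{-1}(y - Bu)
    grad = np.array([p @ (assemble_dA(theta, k) @ u)        # d(eta)/d(theta_k) = p^T (dA/dtheta_k) u
                     for k in range(len(theta))])
    return eta, grad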

  • Model reduction

    High-fidelity approximation: finite element space Vh, dim(Vh) = Nh.
    Given θ, find uh ∈ Vh s.t.

    A(uh, vh; θ) = F(vh)  ∀vh ∈ Vh

    The algebraic system is

    Ah(θ) uh = fh

    Reduced basis approximation: reduced basis space VN ⊂ Vh, dim(VN) = N.
    Given θ, find uN ∈ VN s.t.

    A(uN, vN; θ) = F(vN)  ∀vN ∈ VN

    The algebraic system is

    AN(θ) uN = fN

    Projection with basis V = [ψ1, . . . , ψN]:  uN = V^T uh,  AN(θ) = V^T Ah(θ) V,  fN = V^T fh.

  • Model reduction: Building blocks

    POD/SVD
    Training samples Ξt = {θn, n = 1, . . . , Nt}.
    Compute snapshots U = [uh(θ1), . . . , uh(θNt)].
    Perform SVD: U = VΣW^T.
    Extract the first N bases (columns of V), with N the smallest n such that En(Σ) ≥ 1 − ε.

    Greedy algorithm
    Training samples Ξt = {θn, n = 1, . . . , Nt}.
    Initialize VN for N = 1 as VN = span{uh(θ1)}.
    Pick the next sample such that θN+1 = argmax_{θ∈Ξt} ∆N(θ).
    Update the bases: VN+1 = VN ⊕ span{uh(θN+1)}.

    Offline–Online
    Affine assumption/approximation: A = ∑_{q=1}^{Q} θq(θ) Aq.
    Offline computation (once): AqN = V^T Aqh V.
    Online assembly: AN(θ) = ∑_{q=1}^{Q} θq(θ) AqN.
    Online solve: AN(θ) uN = fN.

    Goal-oriented a posteriori error estimate ∆N(θ) – dual-weighted residual

    ∆N(θ) = A(uN, pN; θ) − F(pN)
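    A minimal sketch of the POD/SVD building block together with the online Galerkin reduced solve; snapshot generation and the assembled high-fidelity operator Ah(θ) are assumptions.

import numpy as np

def pod_basis(snapshots, eps=1e-4):
    """snapshots: (Nh, Nt) matrix of high-fidelity solutions u_h(theta_n); keep energy >= 1 - eps."""
    V, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    N = int(np.searchsorted(energy, 1.0 - eps)) + 1    # smallest N with retained energy >= 1 - eps
    return V[:, :N]

def rb_solve(V, A_h, f_h):
    """Galerkin reduced solve: V^T A_h V u_N = V^T f_h, lifted back to the full space."""
    A_N = V.T @ A_h @ V
    f_N = V.T @ f_h
    u_N = np.linalg.solve(A_N, f_N)
    return V @ u_N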

  • Reduced basis approximation of the potential ηy

    RB approximation of the adjoint problem: find pN ∈ WN s.t.

    A(wN, pN; θ) = −∂uηy|uN(wN)  ∀wN ∈ WN.

    The goal-oriented a posteriori error estimate is given by

    ∆N(θ) = A(uN, pN; θ) − F(pN).

    RB approximation of the potential ηy(θ):

    ηy,N(θ) = ηy(uN(θ)).

    Dual-weighted residual correction:

    η∆y,N(θ) = ηy,N(θ) + ∆N(θ).

  • Reduced basis approximation of the gradient ∇θηy

    With the RB state uN and adjoint pN, the gradient is given by

    ∇θηy(uN(θ)) = ∂θA(uN, pN; θ).

    For the modified potential η∆y,N(θ), we form the Lagrangian

    L(uN, pN, ûN, p̂N; θ) = η∆y,N(θ) + A(uN, ûN; θ) − F(ûN) + A(p̂N, pN; θ) + ∇uηy|uN(p̂N),

    and solve the variational problem: find p̂N ∈ WN such that

    A(p̂N, wN; θ) = F(wN) − A(uN, wN; θ),  ∀wN ∈ WN,

    and the variational problem: find ûN ∈ VN such that

    A(vN, ûN; θ) = −A(vN, pN; θ) − ∂uηy|uN(vN) − ∇²uηy|uN(p̂N, vN),  ∀vN ∈ VN,

    which leads to the gradient

    ∇θη∆y,N(θ) = ∂θL(uN, pN, ûN, p̂N; θ).

  • Error estimates for the state and adjoint

    Assumption: Well-posedness
    The bilinear form A(·, ·; θ) : V × V → ℝ and the linear form F(·) : V → ℝ satisfy:
    A1 At any θ ∈ Θ, there exist a coercivity constant α(θ) > 0 and a continuity constant γ(θ) > 0 such that

    α(θ)||w||²_V ≤ A(w, w; θ)  and  A(w, v; θ) ≤ γ(θ)||w||_V ||v||_V,  ∀w, v ∈ V.

    A2 The linear functional F(·) : V → ℝ is bounded with norm ||F(·)||_V′

  • Error estimates for the state u and adjoint p

    Let eur(θ) and epr(θ) denote the RB state and adjoint errors

    eur(θ) := uh(θ) − uN(θ),  epr(θ) := ph(θ) − pN(θ).

    Let Ru(uN, ·; θ) denote the residual of the state equation

    Ru(uN, vh; θ) = A(uN, vh; θ) − F(vh; θ)  ∀vh ∈ Vh,

    and Rp(·, pN; θ) denote the residual of the adjoint equation

    Rp(wh, pN; θ) = A(wh, pN; θ) + ∇uηy|uN(wh)  ∀wh ∈ Vh.

    Lemma: Error estimates for the state u and adjoint p
    Under the well-posedness assumption, for any θ ∈ Θ, there holds

    ||eur(θ)||_V ≤ (1/α(θ)) ||Ru(uN, ·; θ)||_V′,

    and

    ||epr(θ)||_V ≤ (1/α(θ)) ||Rp(·, pN; θ)||_V′ + (CO/α(θ)) ||eur(θ)||_V.

  • Error estimates for the potential ηy and gradient ∇θηy

    Lemma: Error estimates for ηy,N(θ) and η∆y,N(θ)
    There exists a constant C(θ) > 0 for each θ ∈ Θ, independent of N, s.t.

    |eηr(θ)| := |ηy(θ) − ηy,N(θ)| ≤ C(θ) ||eur(θ)||_V.

    There exists a constant C(θ) > 0 for each θ ∈ Θ, independent of N, s.t.

    |e∆r(θ)| := |ηy(θ) − η∆y,N(θ)| ≤ C(θ) ||eur(θ)||_V ( ||eur(θ)||_V + ||epr(θ)||_V ).

    Lemma: Error estimates for ∇θηy,N(θ) and ∇θη∆y,N(θ)
    There exist C1(θ), C2(θ) > 0 for each θ ∈ Θ, independent of N, s.t.

    ||∇θeηr(θ)||_1 ≤ C1(θ) ||∇θeur(θ)||_{V^d} + C2(θ) ||∇θuN(θ)||_{V^d} ||eur(θ)||_V.

    There exist C1(θ), C2(θ), C3(θ), C4(θ) > 0, independent of N, such that

    ||∇θe∆r(θ)||_1 ≤ C1 ||∇θeur(θ)||_{V^d} ||epr(θ)||_V + C2 ||∇θepr(θ)||_{V^d} ||eur(θ)||_V + C3 ||eur(θ)||_V ||epr(θ)||_V + C4 ||∇θeur(θ)||_{V^d} ||eur(θ)||_V.

  • Error estimates for the posterior πy

    Theorem: Error estimates for the posterior πy

    DKL(πhy|πry) ≤ Eπhy[|eηr|] + Eπhy[|exp(eηr) − 1|],

    and

    DKL(πhy|π∆y) ≤ Eπhy[|e∆r|] + Eπhy[|exp(e∆r) − 1|].

    Corollary: Error estimates for the posterior πy
    Let Θ1 := {θ ∈ Θ : eηr(θ) < 1}. If

    Eπhy(Θ\Θ1)[|exp(eηr) − 1|] < δ Eπhy[|eηr|]

    for some constant δ > 0, we have

    DKL(πhy|πry) ≤ (3 + δ) Eπhy[|eηr|].

    The same holds for DKL(πhy|π∆y) ≤ (3 + δ) Eπhy[|e∆r|].

  • Algorithm 3 Adaptive greedy algorithm with Stein samples

    1: Input: samples θ0m ∼ π0, m = 1, . . . , M, tolerance ε0r, update step k.
    2: Output: Stein samples θm, m = 1, . . . , M.
    3: Initialization: at θ = θ01, solve the high-fidelity state and adjoint problems for uh and ph, set Vr = span{uh} and Wr = span{ph}, compute the reduced matrices and vectors once.
    4: while at step l = 0, k, 2k, . . . , of the SVGD algorithm do
    5:   Compute the error indicator ∆N(θlm) for m = 1, . . . , M.
    6:   while max_{m=1,...,M} |∆N(θlm)| > εlr do
    7:     Choose θ = argmax_{θlm, m=1,...,M} |∆N(θlm)|.
    8:     Solve the high-fidelity problems for uh and ph at θ.
    9:     Enrich the spaces Vr = Vr ⊕ span{uh}, Wr = Wr ⊕ span{ph}.
    10:    Compute all the reduced matrices and vectors once.
    11:    Compute the error indicator ∆N(θlm) for m = 1, . . . , M.
    12:  end while
    13:  Perform the SVGD update with RB approximations.
    14:  Update the tolerance εlr according to the gradient in the SVGD algorithm.
    15: end while

  • Numerical example

    We consider the diffusion problem

    −∇ · (a(θ, x)∇u) = f(x),  x ∈ D = (0, 1)²,

    where f = 1 and the coefficient

    a(θ, x) = 5 + ∑_{1≤i+j≤4} (1/√(i² + j²)) θi,j cos(iπx1) cos(jπx2).

  • Numerical results: Comparison

    (a) Samples at step l = 0, 9, 99  (b) marginal posterior

    Figure: Comparison of (128) sample distributions driven by the SVGD high-fidelity approximation (blue) and the reduced basis approximation (red).

  • Numerical results: Accuracy

    (a) adaptive construction ηy (b) adaptive construction ∇θηy

    (c) fixed construction ηy (d) fixed construction ∇θηy


  • Numerical results: Adaptive greedy algorithm

    Figure: Tolerances for the adaptive greedy algorithm (left); # reduced basis functions for different initial tolerances (right).

  • Numerical results: Cost

                                    FE        adaptive RB              fixed RB
    initial tolerance ε0r           n/a       1       0.1     0.01     0.00001

    M = 64
      DOF (Nh, Nr)                  16641     20      31      49       62
      time to build RB (s)          n/a       4.4     7.1     12.2     15.8
      time for evaluation (s)       1.8×10³   4.4     4.8     5.8      7.3
      speedup factor                1         203     148     98       62

    M = 128
      DOF (Nh, Nr)                  16641     19      30      53       87
      time to build RB (s)          n/a       4.5     7.3     14.3     26.3
      time for evaluation (s)       3.5×10³   8.3     9.5     11.8     19.2
      speedup factor                1         267     212     137      78

    Table: Comparison of high-fidelity and reduced basis approximations on degrees of freedom (DOF) and CPU time (in seconds) for different tolerances and # samples.

    P. Chen, O. Ghattas. Stein variational reduced basis Bayesian inversion, 2019.

  • Summary

    Take away message:

    Reduced basis methods reduce the computational cost while preserving physical structure with certified accuracy.
    Leverage goal-oriented adaptive construction of RB.

    Ongoing:

    RB for SVN.
    Parameter and state reduction by projected SV + RB.
    Extension to nonlinear and nonaffine problems.

    Thank you for your attention!

  • References

    Beskos, A., Girolami, M., Lan, S., Farrell, P. E., and Stuart, A. M. (2017). Geometric MCMC for infinite-dimensional inverse problems. Journal of Computational Physics, 335:327–351.
    Binev, P., Cohen, A., Dahmen, W., DeVore, R., Petrova, G., and Wojtaszczyk, P. (2011). Convergence rates for greedy algorithms in reduced basis methods. SIAM Journal on Mathematical Analysis, 43(3):1457–1472.
    Bui-Thanh, T., Ghattas, O., Martin, J., and Stadler, G. (2013). A computational framework for infinite-dimensional Bayesian inverse problems part I: The linearized case, with application to global seismic inversion. SIAM Journal on Scientific Computing, 35(6):A2494–A2523.
    Bui-Thanh, T. and Girolami, M. (2014). Solving large-scale PDE-constrained Bayesian inverse problems with Riemann manifold Hamiltonian Monte Carlo. Inverse Problems, 30(11):114014.
    Chen, P. and Ghattas, O. (2019). Stein variational reduced basis Bayesian inversion. In preparation.
    Chen, P. and Schwab, C. (2016). Sparse-grid, reduced-basis Bayesian inversion: Nonaffine-parametric nonlinear equations. Journal of Computational Physics, 316:470–503.
    Chen, P., Villa, U., and Ghattas, O. (2017). Hessian-based adaptive sparse quadrature for infinite-dimensional Bayesian inverse problems. Computer Methods in Applied Mechanics and Engineering, 327:147–172.
    Chen, P., Wu, K., Chen, J., O'Leary-Roseberry, T., and Ghattas, O. (2019). Projected Stein variational Newton: A fast and scalable Bayesian inference method in high dimensions. arXiv preprint arXiv:1901.08659.
    Cui, T., Marzouk, Y., and Willcox, K. (2015). Data-driven model reduction for the Bayesian solution of inverse problems. International Journal for Numerical Methods in Engineering, 102(5):966–990.
    Detommaso, G., Cui, T., Marzouk, Y., Spantini, A., and Scheichl, R. (2018). A Stein variational Newton method. In Advances in Neural Information Processing Systems, pages 9187–9197.
    DeVore, R., Petrova, G., and Wojtaszczyk, P. (2012). Greedy algorithms for reduced bases in Banach spaces. arXiv preprint arXiv:1204.2290.
    El Moselhy, T. and Marzouk, Y. (2012). Bayesian inference with optimal maps. Journal of Computational Physics.
    Farcas, I.-G., Latz, J., Ullmann, E., Neckel, T., and Bungartz, H.-J. (2019). Multilevel adaptive sparse Leja approximations for Bayesian inverse problems. arXiv preprint arXiv:1904.12204.
    Gantner, R. N. and Schwab, C. (2016). Computational higher order quasi-Monte Carlo integration. In Monte Carlo and Quasi-Monte Carlo Methods, pages 271–288. Springer.
    Girolami, M. and Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2):123–214.
    Lan, S., Bui-Thanh, T., Christie, M., and Girolami, M. (2016). Emulation of higher-order tensors in manifold Monte Carlo methods for Bayesian inverse problems. Journal of Computational Physics, 308:81–101.
    Lassila, T., Manzoni, A., Quarteroni, A., and Rozza, G. (2013). A reduced computational and geometrical framework for inverse problems in hemodynamics. International Journal for Numerical Methods in Biomedical Engineering, 29(7):741–776.
    Lieberman, C., Willcox, K., and Ghattas, O. (2010). Parameter and state model reduction for large-scale statistical inverse problems. SIAM Journal on Scientific Computing, 32(5):2523–2542.
    Liu, Q. and Wang, D. (2016). Stein variational gradient descent: A general purpose Bayesian inference algorithm. In Advances in Neural Information Processing Systems, pages 2378–2386.
    Martin, J., Wilcox, L., Burstedde, C., and Ghattas, O. (2012). A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion. SIAM Journal on Scientific Computing, 34(3):A1460–A1487.
    Marzouk, Y., Najm, H., and Rahn, L. (2007). Stochastic spectral methods for efficient Bayesian solution of inverse problems. Journal of Computational Physics, 224(2):560–586.
    Marzouk, Y. and Xiu, D. (2009). A stochastic collocation approach to Bayesian inference in inverse problems. Communications in Computational Physics, 6(4):826–847.
    Nguyen, C., Rozza, G., Huynh, D., and Patera, A. (2010). Reduced basis approximation and a posteriori error estimation for parametrized parabolic PDEs; application to real-time Bayesian parameter estimation. Technical report, John Wiley & Sons.
    Oliver, D. S. (2017). Metropolized randomized maximum likelihood for improved sampling from multimodal distributions. SIAM/ASA Journal on Uncertainty Quantification, 5(1):259–277.
    Rezende, D. J. and Mohamed, S. (2015). Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770.
    Schillings, C. and Schwab, C. (2013). Sparse, adaptive Smolyak quadratures for Bayesian inverse problems. Inverse Problems, 29(6).
    Schillings, C., Sprungk, B., and Wacker, P. (2019). On the convergence of the Laplace approximation and noise-level-robustness of Laplace-based Monte Carlo methods for Bayesian inverse problems. arXiv preprint arXiv:1901.03958.
    Schwab, C. and Stuart, A. (2012). Sparse deterministic approximation of Bayesian inverse problems. Inverse Problems, 28(4):045003.
    Spantini, A., Bigoni, D., and Marzouk, Y. (2018). Inference via low-dimensional couplings. The Journal of Machine Learning Research, 19(1):2639–2709.
    Stuart, A., Voss, J., and Wilberg, P. (2004). Conditional path sampling of SDEs and the Langevin MCMC method. Communications in Mathematical Sciences, 2(4):685–697.
    Wang, J. and Zabaras, N. (2005). Using Bayesian statistics in the estimation of heat source in radiation. International Journal of Heat and Mass Transfer, 48(1):15–29.
    Wang, K., Bui-Thanh, T., and Ghattas, O. (2018). A randomized maximum a posteriori method for posterior sampling of high dimensional nonlinear Bayesian inverse problems. SIAM Journal on Scientific Computing, 40(1):A142–A171.
    Wang, Z., Cui, T., Bardsley, J., and Marzouk, Y. (2019). Scalable optimization-based sampling on function space. arXiv preprint arXiv:1903.00870.
