Uncertainty Quantification Using Deep Gaussian Processes
Alireza Daneshkhah
Warwick Centre for Predictive Modelling, The University of Warwick
Motivation
[Figure: workflow of surrogate-based UQ. The observed input of a deterministic solver is reduced (tree construction, HDMR terms, experimental design) to a reduced input space $\mathcal{A} = \{\alpha^{(s)}\}_{s=1}^{S_A}$; a surrogate model is built via Bayesian training on the collected data; the output space is then reconstructed via density estimation (accounting for output correlations) to yield statistics, PDFs and error bars.]

Bilionis and Zabaras (2012)
The Multiscale Modelling Challenges

1. The challenges of complex multiscale physical models
   ◮ Curse of dimensionality
   ◮ Computational complexity (and limited data)
   ◮ Discontinuity of the model output
2. The current solutions
   ◮ Probabilistic neural networks
   ◮ Traditional Gaussian processes
   ◮ Multi-output separable Gaussian processes
3. Deep Gaussian processes
   ◮ Probabilistic representation
   ◮ An analytical solution is available
   ◮ Model dimensionality is no longer an issue
Deep Neural network
The idea is taken from a deep neural network with $l$ hidden layers:
given $\mathbf{x}$:

$\mathbf{h}_1 = \phi(\mathbf{W}_1 \mathbf{x})$
$\mathbf{h}_2 = \phi(\mathbf{W}_2 \mathbf{h}_1)$
$\mathbf{h}_3 = \phi(\mathbf{W}_3 \mathbf{h}_2)$
$y = \mathbf{w}_4^\top \mathbf{h}_3$

[Figure: fully connected network with inputs $x_1, \dots, x_6$, hidden layers $\mathbf{h}_1$ (8 units), $\mathbf{h}_2$ (6 units), $\mathbf{h}_3$ (4 units), and output $y_1$.]
Lawrence et al. (2014).
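A minimal sketch of this forward pass (NumPy; tanh is an assumed choice for the activation $\phi$, and the layer widths are illustrative, matching the figure):

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(z):
    return np.tanh(z)  # an assumed choice for the activation

# Illustrative widths matching the figure: 6 inputs, hidden layers of 8, 6 and 4 units
W1 = rng.standard_normal((8, 6))
W2 = rng.standard_normal((6, 8))
W3 = rng.standard_normal((4, 6))
w4 = rng.standard_normal(4)

x = rng.standard_normal(6)
h1 = phi(W1 @ x)
h2 = phi(W2 @ h1)
h3 = phi(W3 @ h2)
y = w4 @ h3  # scalar output
```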
Problems with Deep Neural Networks

As the number of nodes in neighbouring layers increases, the corresponding weight matrix $\mathbf{W}$ (in $\mathbf{h} = \phi(\mathbf{W}\mathbf{x})$) becomes very large, leading to an overfitted model.

Solution: replace each $\mathbf{W}_i$ with a lower-rank form

$\mathbf{W}_i = \mathbf{U}_i \mathbf{V}_i^\top$

If $\mathbf{W}$ is $k_1 \times k_2$, then $\mathbf{U}$ is $k_1 \times q$ and $\mathbf{V}$ is $k_2 \times q$.
given $\mathbf{x}$:

$\mathbf{f}_1 = \mathbf{V}_1^\top \mathbf{x}$
$\mathbf{h}_1 = g(\mathbf{U}_1 \mathbf{f}_1)$
$\mathbf{f}_2 = \mathbf{V}_2^\top \mathbf{h}_1$
$\mathbf{h}_2 = g(\mathbf{U}_2 \mathbf{f}_2)$
$\mathbf{f}_3 = \mathbf{V}_3^\top \mathbf{h}_2$
$\mathbf{h}_3 = g(\mathbf{U}_3 \mathbf{f}_3)$
$y = \mathbf{w}_4^\top \mathbf{h}_3$

[Figure: the same network with each weight matrix factorised; inputs $x_1, \dots, x_6$, hidden layers $\mathbf{h}_1$, $\mathbf{h}_2$, $\mathbf{h}_3$, and output $y_1$.]
Lawrence et al. (2014).
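A quick sketch of the parameter saving from this factorisation (NumPy; the sizes $k_1$, $k_2$, $q$ and the tanh non-linearity $g$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
k1, k2, q = 1000, 1000, 20                 # illustrative sizes

U = rng.standard_normal((k1, q))
V = rng.standard_normal((k2, q))
W = U @ V.T                                # rank-q weights, never stored in practice

print(k1 * k2, U.size + V.size)            # 1000000 full parameters vs 40000 factorised

x = rng.standard_normal(k2)
f = V.T @ x                                # f_i = V_i^T h_{i-1}
h = np.tanh(U @ f)                         # h_i = g(U_i f_i); g assumed tanh
```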
Alternative Solution: Deep GP
Put a GP prior over the weights and take the width of each layer to infinity:

$\mathbf{Y} = f_l(f_{l-1}(\cdots f_1(\mathbf{X}))), \qquad \mathbf{H}_i = f_i(\mathbf{H}_{i-1})$
Deep Gaussian Process
[Figure: graphical model of a deep GP with layer mappings $f_1, f_2, f_3 \sim \mathcal{GP}$.]

1. Deep Gaussian process
   ◮ Bayesian belief network (DAG)
   ◮ Non-parametric, non-linear mappings $f_l$
   ◮ The likelihood is a non-linear function of the inputs
2. Challenges
   ◮ How to learn the intermediate hidden layers?
   ◮ How to efficiently train the model?
3. Solution
   ◮ Variational compression
   ◮ Provides a probabilistic representation of the model evidence
Non-linear Mapping Using a GP

Non-linear regression problem: learn $f$ with error bars from data $\mathcal{D} = \{\mathbf{X}, \mathbf{y}\}$.

Use a GP prior on the $N$ function values $\mathbf{f} = \{f_i\}_{i=1}^{N}$ given $\mathbf{X} = \{\mathbf{x}_i\}_{i=1}^{N}$:

[Figure: a draw of the function values $f_1, f_2, \dots, f_N$ against the inputs $\mathbf{x}$.]

prior: $p(\mathbf{f} \mid \mathbf{X}) = \mathcal{N}(\mathbf{0}, \mathbf{K}_N)$

$K(\mathbf{x}_i, \mathbf{x}_j) = \tau^2 \exp\left( -\frac{1}{2} \sum_{k=1}^{q} \left( \frac{x_i^{(k)} - x_j^{(k)}}{\omega_k} \right)^2 \right)$
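A minimal sketch of this squared-exponential prior (NumPy; the values of $\tau$ and the lengthscales $\omega_k$ are illustrative):

```python
import numpy as np

def se_kernel(X1, X2, tau=1.0, omega=None):
    """K(x_i, x_j) = tau^2 exp(-0.5 * sum_k ((x1_k - x2_k) / omega_k)^2)."""
    if omega is None:
        omega = np.ones(X1.shape[1])
    diff = (X1[:, None, :] - X2[None, :, :]) / omega   # pairwise scaled differences
    return tau**2 * np.exp(-0.5 * np.sum(diff**2, axis=-1))

rng = np.random.default_rng(2)
X = rng.uniform(size=(50, 2))       # N = 50 inputs in q = 2 dimensions
KN = se_kernel(X, X)                # N x N prior covariance
f = rng.multivariate_normal(np.zeros(50), KN + 1e-10 * np.eye(50))  # a draw from p(f | X)
```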
GP Regression
$y_i = f_i + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, \sigma^2)$

marginal likelihood: $p(\mathbf{y} \mid \mathbf{X}) = \mathcal{N}(\mathbf{0}, \mathbf{K}_N + \sigma^2 \mathbf{I})$

predictive distribution: $p(y_* \mid \mathbf{x}_*, \mathbf{y}, \mathbf{X}) = \mathcal{N}(\mu_*, \sigma_*^2)$

$\mu_* = \mathbf{K}_{*N}(\mathbf{K}_N + \sigma^2 \mathbf{I})^{-1}\mathbf{y}$
$\sigma_*^2 = K_{**} - \mathbf{K}_{*N}(\mathbf{K}_N + \sigma^2 \mathbf{I})^{-1}\mathbf{K}_{N*} + \sigma^2$

Problem: $O(N^3)$ computation.
Solution: use a sparse GP approximation based on a small set of $M$ pseudo-inputs or inducing variables.
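A sketch of these predictive equations, reusing se_kernel from the sketch above (the Cholesky factorisation is the $O(N^3)$ step):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gp_predict(X, y, Xstar, kernel, sigma2):
    """Exact GP regression: predictive mean and variance at test inputs Xstar."""
    KN = kernel(X, X)
    L = cho_factor(KN + sigma2 * np.eye(len(X)))      # O(N^3) step
    KsN = kernel(Xstar, X)
    alpha = cho_solve(L, y)
    mu = KsN @ alpha                                   # K_*N (K_N + s2 I)^{-1} y
    v = cho_solve(L, KsN.T)
    var = np.diag(kernel(Xstar, Xstar)) - np.sum(KsN * v.T, axis=1) + sigma2
    return mu, var
```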
Sparse GP Using pseudo-inputs
1. Choose any set of $M \ll N$ inducing inputs $\bar{\mathbf{X}}$.
2. Draw the corresponding function values $\bar{\mathbf{f}}$ from the prior:
   ◮ $p(\bar{\mathbf{f}} \mid \bar{\mathbf{X}}) = \mathcal{N}(\mathbf{0}, \mathbf{K}_M)$
3. Draw $\mathbf{f}$ conditioned on $\bar{\mathbf{f}}$:
   ◮ $p(\mathbf{f} \mid \bar{\mathbf{f}}) = \mathcal{N}\left(\mathbf{K}_{NM}\mathbf{K}_M^{-1}\bar{\mathbf{f}},\ \boldsymbol{\Sigma} = \mathbf{K}_N - \mathbf{K}_{NM}\mathbf{K}_M^{-1}\mathbf{K}_{MN}\right)$
Sparse GP approximation
$p(f_i \mid \bar{\mathbf{f}}) = \mathcal{N}\left(\mu_i = \mathbf{K}_{iM}\mathbf{K}_M^{-1}\bar{\mathbf{f}},\ \lambda_i = K_{ii} - \mathbf{K}_{iM}\mathbf{K}_M^{-1}\mathbf{K}_{Mi}\right)$

Approximate: $p(\mathbf{f} \mid \bar{\mathbf{f}}) \approx \prod_{i=1}^{N} p(f_i \mid \bar{\mathbf{f}}) = \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Lambda})$, $\boldsymbol{\Lambda} = \mathrm{diag}(\boldsymbol{\lambda})$

Minimum KL: $\min_{q_i} \mathrm{KL}\left[ p(\mathbf{f} \mid \bar{\mathbf{f}}) \,\Big\|\, \prod_i q_i(f_i) \right]$

Integrate out $\bar{\mathbf{f}}$ to obtain: $p(\mathbf{f}) = \int p(\bar{\mathbf{f}})\, p(\mathbf{f} \mid \bar{\mathbf{f}})\, d\bar{\mathbf{f}}$
Sparse Pseudo-input GP (SPGP)

GP prior $\mathcal{N}(\mathbf{0}, \mathbf{K}_N)$ $\approx$ SPGP prior $p(\mathbf{f}) = \mathcal{N}\left(\mathbf{0}, \mathbf{K}_{NM}\mathbf{K}_M^{-1}\mathbf{K}_{MN} + \boldsymbol{\Lambda}\right)$

SPGP covariance computation: $O(M^2 N)$
Predictive mean computational complexity: $O(M)$
Predictive variance computational complexity: $O(M^2)$
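A sketch of assembling this SPGP prior covariance, reusing se_kernel from above (the solve against $\mathbf{K}_M$ is the $O(M^2N)$ step):

```python
import numpy as np

def spgp_prior_cov(X, Xbar, kernel, jitter=1e-8):
    """SPGP prior covariance: K_NM K_M^{-1} K_MN + diag(K_N - K_NM K_M^{-1} K_MN)."""
    KM = kernel(Xbar, Xbar) + jitter * np.eye(len(Xbar))
    KNM = kernel(X, Xbar)
    A = np.linalg.solve(KM, KNM.T)               # K_M^{-1} K_MN, O(M^2 N)
    Qn = KNM @ A                                 # low-rank part K_NM K_M^{-1} K_MN
    Lam = np.diag(np.diag(kernel(X, X) - Qn))    # diagonal correction Lambda
    return Qn + Lam
```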
How to find Sparse Pseudo-inputs?
Consider these pseudo-inputs as extra hyper-parameters and maximise the marginal likelihood w.r.t. $(\bar{\mathbf{X}}, \tau, \sigma, \boldsymbol{\omega})$:

$p(\mathbf{y} \mid \mathbf{X}, \bar{\mathbf{X}}, \tau, \sigma, \boldsymbol{\omega})$

This joint optimisation avoids the discontinuities that arise when design points are selected.

We use this augmented-variable method, followed by a collapsed variational approximation, for learning the deep GP.
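A hedged sketch of this joint optimisation (the parameterisation in unpack is an assumption, and a gradient-free optimiser is shown only for illustration; it reuses se_kernel and spgp_prior_cov from the sketches above):

```python
import numpy as np
from scipy.optimize import minimize

def unpack(theta, M, q):
    # hypothetical parameterisation: flattened pseudo-inputs, then log tau, log sigma, log omegas
    Xbar = theta[: M * q].reshape(M, q)
    log_tau, log_sigma = theta[M * q], theta[M * q + 1]
    log_omega = theta[M * q + 2:]
    return Xbar, np.exp(log_tau), np.exp(log_sigma), np.exp(log_omega)

def neg_log_marginal(theta, X, y, M):
    """-log N(y | 0, K_NM K_M^{-1} K_MN + Lambda + sigma^2 I): the SPGP objective."""
    N, q = X.shape
    Xbar, tau, sigma, omega = unpack(theta, M, q)
    kern = lambda A, B: se_kernel(A, B, tau=tau, omega=omega)
    C = spgp_prior_cov(X, Xbar, kern) + sigma**2 * np.eye(N)
    _, logdet = np.linalg.slogdet(C)
    return 0.5 * (logdet + y @ np.linalg.solve(C, y) + N * np.log(2 * np.pi))

# theta0 = np.concatenate([X[:M].ravel(), [0.0, -1.0], np.zeros(X.shape[1])])
# res = minimize(neg_log_marginal, theta0, args=(X, y, M))   # in practice gradients are used
```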
Sparse Pseudo-inputs position
[Figure: two GP fits to data $(x, y)$ showing the pseudo-input locations $\bar{X}$, together with the learned amplitude, lengthscale and noise hyper-parameters.]
Bayesian GP Latent Variable Model (GP-LVM)
Start with a standard GP-LVM.
[Figure: graphical model with latent $\mathbf{X}$, observed $\mathbf{Y}$ and noise variance $\sigma^2$.]

$p(\mathbf{Y} \mid \mathbf{X}) = \prod_{j=1}^{p} \mathcal{N}(\mathbf{y}_{:,j} \mid \mathbf{0}, \mathbf{K})$

Apply the standard latent variable approach:
◮ Define a Gaussian prior over the latent space $\mathbf{X}$:

$p(\mathbf{X}) = \prod_{j=1}^{q} \mathcal{N}(\mathbf{x}_{:,j} \mid \mathbf{0}, \alpha_j^2 \mathbf{I})$

◮ Integrate out the latent variables to get $p(\mathbf{Y})$.
◮ The integration is intractable.
Standard Variational Inference
The standard variational bound has the form

$\mathcal{L} = \langle \log p(\mathbf{y} \mid \mathbf{X}) \rangle_{q(\mathbf{X})} - \mathrm{KL}(q(\mathbf{X}) \,\|\, p(\mathbf{X}))$

It requires the expectation of $\log p(\mathbf{y} \mid \mathbf{X})$ under $q(\mathbf{X})$:

$\log p(\mathbf{y} \mid \mathbf{X}) = -\frac{1}{2}\mathbf{y}^\top(\mathbf{K}_{ff} + \sigma^2\mathbf{I})^{-1}\mathbf{y} - \frac{1}{2}\log|\mathbf{K}_{ff} + \sigma^2\mathbf{I}| - \frac{N}{2}\log 2\pi$

Computing this expectation under $q(\mathbf{X})$ is extremely difficult.

Augment the GP model with inducing variables $(\mathbf{Z}, \mathbf{u} = f(\mathbf{Z}))$:

$p(\mathbf{f}, \mathbf{u} \mid \mathbf{Z}, \mathbf{X}) = \mathcal{N}\left( \begin{pmatrix} \mathbf{f} \\ \mathbf{u} \end{pmatrix} \,\Big|\, \mathbf{0}, \begin{pmatrix} \mathbf{K}_{ff} & \mathbf{K}_{fu} \\ \mathbf{K}_{uf} & \mathbf{K}_{uu} \end{pmatrix} \right)$

$\log p(\mathbf{y} \mid \mathbf{X}, \mathbf{Z}) \geq \log \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \mathbf{K}_{fu}\mathbf{K}_{uu}^{-1}\mathbf{K}_{uf} + \sigma^2\mathbf{I}) - \frac{1}{2\sigma^2}\mathrm{tr}(\boldsymbol{\Sigma})$

$\boldsymbol{\Sigma} = \mathbf{K}_{ff} - \mathbf{K}_{fu}\mathbf{K}_{uu}^{-1}\mathbf{K}_{uf}$
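A sketch of evaluating this collapsed bound, reusing se_kernel from above (Q is the low-rank Nyström matrix):

```python
import numpy as np

def titsias_bound(X, y, Z, kernel, sigma2, jitter=1e-8):
    """Collapsed bound: log N(y | 0, Q + sigma^2 I) - tr(K_ff - Q) / (2 sigma^2)."""
    N = len(X)
    Kuu = kernel(Z, Z) + jitter * np.eye(len(Z))
    Kfu = kernel(X, Z)
    Q = Kfu @ np.linalg.solve(Kuu, Kfu.T)        # K_fu K_uu^{-1} K_uf
    C = Q + sigma2 * np.eye(N)
    _, logdet = np.linalg.slogdet(C)
    log_gauss = -0.5 * (logdet + y @ np.linalg.solve(C, y) + N * np.log(2 * np.pi))
    return log_gauss - np.trace(kernel(X, X) - Q) / (2 * sigma2)
```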
Bayesian Variational Inference
Treat $\mathbf{u}$ as extra parameters of the model, with prior

$p(\mathbf{u}) = \mathcal{N}(\mathbf{u} \mid \mathbf{0}, \mathbf{K}_{uu})$

By applying parametric variational Bayes (Titsias and Lawrence, 2010) with the variational distribution $q(\mathbf{u}) = \mathcal{N}(\mathbf{u} \mid \mathbf{m}, \mathbf{S})$, we have

$\log p(\mathbf{y} \mid \mathbf{X}) \geq \log \mathcal{N}(\mathbf{y} \mid \mathbf{K}_{fu}\mathbf{K}_{uu}^{-1}\mathbf{m},\ \sigma^2\mathbf{I}) - \frac{1}{2\sigma^2}\mathrm{tr}\left(\mathbf{S}\mathbf{K}_{uu}^{-1}\mathbf{K}_{uf}\mathbf{K}_{fu}\mathbf{K}_{uu}^{-1}\right) - \mathrm{KL}(q(\mathbf{u}) \,\|\, p(\mathbf{u})) - \frac{1}{2\sigma^2}\mathrm{tr}(\boldsymbol{\Sigma})$
Deep GP representation - Process Composition
[Figure: the deep GP as a composition of processes, $f_1, f_2, f_3 \sim \mathcal{GP}$.]

Deep GP: $\mathbf{y} = h_l(h_{l-1}(\cdots h_1(\mathbf{X}))) + \boldsymbol{\epsilon}$

Joint pdf:

$p(\mathbf{y}, \{\mathbf{h}_i\}_{i=1}^{l} \mid \mathbf{x}) = p(\mathbf{y} \mid \mathbf{h}_l) \prod_{i=2}^{l} p(\mathbf{h}_i \mid \mathbf{h}_{i-1})\, p(\mathbf{h}_1 \mid \mathbf{x})$

$\mathbf{h}_1 \mid \mathbf{X} \sim \mathcal{N}(\mathbf{0}, \mathbf{K}_{h_1 h_1} + \sigma_1^2\mathbf{I})$
$\mathbf{h}_i \mid \mathbf{h}_{i-1} \sim \mathcal{N}(\mathbf{0}, \mathbf{K}_{h_i h_i} + \sigma_i^2\mathbf{I})$
$\mathbf{y} \mid \mathbf{h}_l \sim \mathcal{N}(\mathbf{0}, \mathbf{K}_{h_l h_l} + \sigma_l^2\mathbf{I})$

The direct computation of $p(\mathbf{y} \mid \mathbf{X})$ is intractable ($O(N^3)$).
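A minimal sketch of this process composition: one function drawn from a deep GP prior by sampling a GP at each layer (reusing se_kernel from above; unit hyper-parameters and one-dimensional hidden layers are illustrative simplifications):

```python
import numpy as np

def sample_deep_gp(X, layers=3, jitter=1e-8, seed=0):
    """Draw one function from a deep GP prior by composing GP samples layer by layer."""
    rng = np.random.default_rng(seed)
    H = X
    for _ in range(layers):
        K = se_kernel(H, H, tau=1.0, omega=np.ones(H.shape[1]))
        K += jitter * np.eye(len(H))
        # one-dimensional hidden layer for simplicity: H_i = f_i(H_{i-1})
        H = rng.multivariate_normal(np.zeros(len(H)), K)[:, None]
    return H.ravel()

X = np.linspace(0, 1, 100)[:, None]
y = sample_deep_gp(X)   # one draw from y = h_3(h_2(h_1(X)))
```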
Computational Challenges in Learning Deep GP
1. Marginalise out all hidden layers in a Bayesian framework (Titsias et al., 2010):
   ◮ The number of parameters is drastically reduced.
   ◮ The deep network structure can be automatically determined.
2. The direct marginalisation of the $\mathbf{h}_i$ is intractable:

$p(\mathbf{y} \mid \mathbf{x}) = \int p(\mathbf{y} \mid \mathbf{h}_2) \left( \int p(\mathbf{h}_2 \mid \mathbf{h}_1)\, p(\mathbf{h}_1 \mid \mathbf{x})\, d\mathbf{h}_1 \right) d\mathbf{h}_2$

$p(\mathbf{h}_2 \mid \mathbf{x}) = \int p(\mathbf{h}_2 \mid \mathbf{f}_2)\, p(\mathbf{f}_2 \mid \mathbf{h}_1)\, p(\mathbf{h}_1 \mid \mathbf{x})\, d\mathbf{h}_1\, d\mathbf{f}_2$

$p(\mathbf{f}_2 \mid \mathbf{h}_1)$ contains the non-linear kernel term $\mathbf{K}_{f_2 f_2}^{-1}$.
Deep GP Augmented by Inducing Variables: An Example

[Figure: the three-layer deep GP from before, with each mapping $f_i \sim \mathcal{GP}$ augmented by inducing variables.]
Variational (Compression) Inference for Deep GPs
Augment layer $\mathbf{h}_i$ with a set of inducing variables $\mathbf{u}_i$.
Apply Bayesian variational inference within each layer.
The bound on the conditional probability is:

$p(\mathbf{y}, \{\mathbf{h}_i\}_{i=1}^{l} \mid \{\mathbf{u}_i\}_{i=1}^{l}, \mathbf{x}) \geq p(\mathbf{y} \mid \mathbf{h}_l, \mathbf{u}_l) \prod_{i=2}^{l} p(\mathbf{h}_i \mid \mathbf{h}_{i-1}, \mathbf{u}_i)\, p(\mathbf{h}_1 \mid \mathbf{x}, \mathbf{u}_1) \times \exp\left( \sum_{i=1}^{l} -\frac{1}{2\sigma_i^2}\mathrm{tr}(\boldsymbol{\Sigma}_i) \right)$

$p(\mathbf{h}_i \mid \mathbf{u}_i, \mathbf{h}_{i-1}) = \mathcal{N}\left(\mathbf{h}_i \mid \mathbf{K}_{h_i u_i}\mathbf{K}_{u_i u_i}^{-1}\mathbf{u}_i,\ \sigma_i^2\mathbf{I}\right)$

$\boldsymbol{\Sigma}_i = \mathbf{K}_{h_i h_i} - \mathbf{K}_{h_i u_i}\mathbf{K}_{u_i u_i}^{-1}\mathbf{K}_{u_i h_i}$
Variational Compression for Deep GP (3)
Given $\mathbf{x}$ and a fixed $q(\mathbf{u}_1) = \mathcal{N}(\mathbf{u}_1 \mid \mathbf{m}_1, \mathbf{S}_1)$, compute

$q(\mathbf{h}_1) = \int p(\mathbf{h}_1 \mid \mathbf{u}_1, \mathbf{x})\, q(\mathbf{u}_1)\, d\mathbf{u}_1$

Given $q(\mathbf{h}_1)$, we can variationally propagate using $q(\mathbf{u}_2)$ and marginalise out $\mathbf{h}_1$:

$\log p(\mathbf{h}_2 \mid \mathbf{x}, \mathbf{u}_2) \geq -\left\langle \frac{1}{2\sigma_2^2}\mathrm{tr}(\boldsymbol{\Sigma}_2) \right\rangle_{q(\mathbf{h}_1)} - \frac{1}{2\sigma_1^2}\mathrm{tr}(\boldsymbol{\Sigma}_1) - \mathrm{KL}(q(\mathbf{u}_1) \,\|\, p(\mathbf{u}_1)) + \log \mathcal{N}\left(\mathbf{h}_2 \mid \boldsymbol{\Psi}_2\mathbf{K}_{u_2 u_2}^{-1}\mathbf{u}_2,\ \sigma_2^2\mathbf{I}\right) - \frac{1}{\sigma_2^2}\mathrm{tr}\left( (\boldsymbol{\Phi}_2 - \boldsymbol{\Psi}_2^\top\boldsymbol{\Psi}_2)\mathbf{K}_{u_2 u_2}^{-1}\mathbf{u}_2\mathbf{u}_2^\top\mathbf{K}_{u_2 u_2}^{-1} \right)$
The marginal likelihood bound
Continuing the feed-forward to the bottom layer, using the variational propagation at each layer, the marginal likelihood bound is

$\log p(\mathbf{y} \mid \mathbf{X}) \geq -\sum_{i=2}^{l} \frac{1}{2\sigma_i^2}\left( \psi_i - \mathrm{tr}(\boldsymbol{\Phi}_i\mathbf{K}_{u_i u_i}^{-1}) \right) - \frac{1}{2\sigma_1^2}\mathrm{tr}(\boldsymbol{\Sigma}_1) - \sum_{i=1}^{l} \mathrm{KL}(q(\mathbf{u}_i) \,\|\, p(\mathbf{u}_i)) + \log \mathcal{N}\left(\mathbf{y} \mid \boldsymbol{\Psi}_l\mathbf{K}_{u_l u_l}^{-1}\mathbf{m}_l,\ \sigma_l^2\mathbf{I}\right) - \sum_{i=1}^{l} \frac{1}{\sigma_i^2}\mathrm{tr}\left( (\boldsymbol{\Phi}_i - \boldsymbol{\Psi}_i^\top\boldsymbol{\Psi}_i)\mathbf{K}_{u_i u_i}^{-1}\langle \mathbf{u}_i\mathbf{u}_i^\top \rangle_{q(\mathbf{u}_i)}\mathbf{K}_{u_i u_i}^{-1} \right)$

where

$\boldsymbol{\Phi}_i = \langle \mathbf{K}_{u_i h_i}\mathbf{K}_{h_i u_i} \rangle_{q(\mathbf{h}_{i-1})}, \quad \boldsymbol{\Psi}_i = \langle \mathbf{K}_{h_i u_i} \rangle_{q(\mathbf{h}_{i-1})}, \quad \psi_i = \langle \mathrm{tr}(\mathbf{K}_{h_i h_i}) \rangle_{q(\mathbf{h}_{i-1})}$
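For the squared-exponential kernel these $\Psi$-statistics have closed forms; a hedged Monte Carlo sketch (reusing se_kernel; a factorised Gaussian $q(\mathbf{h})$ and all names are illustrative assumptions) makes the definitions concrete:

```python
import numpy as np

def psi_statistics_mc(mu, var, Z, kernel, n_samples=500, seed=3):
    """Monte Carlo estimates of psi_i, Psi_i, Phi_i under q(h) = N(mu, diag(var))."""
    rng = np.random.default_rng(seed)
    N, q = mu.shape
    M = len(Z)
    psi, Psi, Phi = 0.0, np.zeros((N, M)), np.zeros((M, M))
    for _ in range(n_samples):
        H = mu + np.sqrt(var) * rng.standard_normal((N, q))  # a draw from q(h_{i-1})
        Khu = kernel(H, Z)
        psi += np.trace(kernel(H, H))   # <tr(K_hh)>
        Psi += Khu                      # <K_hu>
        Phi += Khu.T @ Khu              # <K_uh K_hu>
    return psi / n_samples, Psi / n_samples, Phi / n_samples
```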
VC for Deep GP - Points
1. All the terms of the given bound are tractable, including the KL term.
2. However, tractability depends on the chosen covariance function (as for the GP-LVM) and on how easily it can be convolved with $q(\mathbf{h}_l)$.
3. A gradient-based optimisation method can be used to maximise the final form of the variational lower bound w.r.t.
   ◮ model parameters: $\{\sigma_i^2, \theta_i\}_{i=2}^{l+1}$
   ◮ variational parameters: $\{\mathbf{Z}_i, \mathbf{m}_i, \mathbf{S}_i, \boldsymbol{\mu}_{i+1}, \boldsymbol{\Sigma}_{i+1}\}_{i=1}^{l}$
Elliptic PDE Example
$-\nabla \cdot (a(\omega, \mathbf{x}) \nabla u(\omega, \mathbf{x})) = f(\cdot) \quad \text{in } D$
$u(\omega, \mathbf{x}) = 0 \quad \text{on } \partial D$

The physical domain is $D = [0, 1]^2$.

$Z(\omega, \mathbf{x}) = \log(a(\omega, \mathbf{x}))$ is a random field with covariance

$C(\mathbf{x}_1, \mathbf{x}_2) = \sigma_{rf}^2 \exp\left( -\sum_{i=1}^{k_I} \frac{(x_{1,i} - x_{2,i})^2}{\lambda} \right)$

We generate $N = 250$ realisations of $Z$ by truncating the Karhunen–Loève expansion (KLE) at $q_1 = 50$ terms, with $\lambda = 0.1$.
The boundary-value problem is then solved with FEM on a $16 \times 16$ grid.
The response is observed on a $20 \times 20$ grid.
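A minimal sketch of the truncated KLE sampling step (the eigendecomposition of the gridded covariance is a standard discrete approximation; se_kernel from above is used only as a stand-in for the covariance $C$):

```python
import numpy as np

def truncated_kle_samples(grid, cov_fn, n_terms=50, n_samples=250, seed=4):
    """Sample a zero-mean Gaussian random field via a truncated Karhunen-Loeve expansion."""
    rng = np.random.default_rng(seed)
    C = cov_fn(grid, grid)                      # covariance evaluated on the grid points
    eigvals, eigvecs = np.linalg.eigh(C)
    idx = np.argsort(eigvals)[::-1][:n_terms]   # keep the n_terms largest modes
    lam = np.clip(eigvals[idx], 0.0, None)      # guard against tiny negative eigenvalues
    phi = eigvecs[:, idx]
    xi = rng.standard_normal((n_samples, n_terms))
    return xi @ (np.sqrt(lam) * phi).T          # each row is one realisation of Z

g = np.linspace(0, 1, 16)
grid = np.array([[xa, xb] for xa in g for xb in g])
cov = lambda A, B: se_kernel(A, B, tau=1.0, omega=np.array([0.1, 0.1]))  # stand-in for C
Z = truncated_kle_samples(grid, cov)            # shape (250, 256); a = exp(Z)
```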
Input & output realizations of the Elliptic PDEs
Training data: $\mathcal{D} = \{(Z_r, u_r),\ r = 1, \dots, 200\}$

[Figure: four sample input realisations $Z_r$ and the corresponding output fields $u_r$ on $[0,1]^2$.]
Hidden Layer Demonstrations - Elliptic Problem

[Figure: learned weights for Layer 1 and Layer 2 of the deep GP, and scatter plots of selected latent dimensions within each layer.]
Posterior Mean and Variance of Response - Elliptic Problem

[Figure: posterior mean and posterior variance of the response field on $[0,1]^2$ for the elliptic problem.]
Mean of Variance & Variance of Mean - Elliptic Problem

[Figure: the mean of the predictive variance and the variance of the predictive mean of the response on $[0,1]^2$ for the elliptic problem.]
Flow through porous media
$-\nabla \cdot \mathbf{u} = 0$
$\mathbf{u} = -K(\mathbf{x}, \omega)\nabla p, \quad \forall \mathbf{x} \in X_s = [0, 1]^2$
$p = 1 - x_1 \quad \text{on } \partial X_s$

Deterministic solver: mixed FEM on a $20 \times 20$ grid.
The response is observed on a $20 \times 20$ grid.

$G(\mathbf{x}, \omega) = \log(K(\mathbf{x}, \omega))$ is an exponential random field with

$\mathrm{COV}_G(\mathbf{x}_{s1}, \mathbf{x}_{s2}) = s_G^2 \exp\left\{ -\sum_{k=1}^{k_s} \frac{|x_{s1,k} - x_{s2,k}|}{\lambda_k} \right\}$

We employ the KLE of $G$ and truncate it after 50 terms, with $\lambda_k = 0.1$.
Input & output realizations of the Permeability Problem
400 data points are generated; the first 300 are used for training the deep GP.

[Figure: four sample input realisations of the log-permeability field and the corresponding pressure fields on $[0,1]^2$.]
Hidden Layers demonstrations - Permeability Problem
A deep GP with 2 hidden layers is fitted to the data, with $K = 80$ inducing variables.

[Figure: learned weights for Layer 1 and Layer 2, and scatter plots of selected latent dimensions within each layer.]
Posterior Mean and Variance of Pressure - Permeability Problem

[Figure: posterior mean and posterior variance of the pressure field on $[0,1]^2$ for the permeability problem.]
Mean of Variance & Variance of mean of Pressure
[Figure: the mean of the predictive variance and the variance of the predictive mean of the pressure on $[0,1]^2$.]
Summary
The final model is not a GP!

A deep GP provides a probabilistic approximation of the model, which is useful for UQ and also guards against overfitting.

Deep GPs allow both unsupervised and supervised deep learning.

With deep GPs, the curse of dimensionality is no longer an issue.

Variational compression algorithms show promise for scaling these models to massive data sets.

Sampling is straightforward.