Technische Universität München, Data Innovation Lab
DI-LAB Final Presentation
Parameter Estimation with Gaussian Processes
A. Grundner, K. Wang, K. Harsha
Scientific Leads: Prof. Dr. Eric Sonnendrücker, Dr. Ahmed Ratnani
How it started!
So, what’s the problem here?
Life’s so uncertain! I see all this data around me but I don’t know what my parameters are!!
It’s fine. I’ll prescribe you some priors and you can use them with your favourite kernels! That should help with estimates.
[Cartoon characters: $\mathcal{L}_x^\phi$ and GP]
August 9, 2018 A. Grundner, K. Wang, K. Harsha 2
Contents
1 Intro to Gaussian Processes
2 Parameter Estimation with Gaussian Processes
- Framework
- Examples
3 Heat equation
- Backward Euler for the homogeneous case
- Backward Euler for the nonhomogeneous case
- No discretization
4 Wave equation
5 Burgers’ equation
6 Conclusions
Gaussian Processes
- A stochastic process that defines a distribution over functions
- Specified by a mean function, $m(x)$, and a covariance function, or kernel, $k(x, x')$
- Applications in machine learning: regression, classification, ...
Different Kernels
Realization of a GP
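$f \sim \mathcal{GP}(m, k)$
$m(x) = \frac{x^2}{4}$
$k(x, x') = \exp\left(-\frac{1}{2}(x - x')^2\right)$
$y = f + \varepsilon$
$\varepsilon \sim \mathcal{N}(0, \sigma^2)$
[Figure: a realization $y(x)$ plotted against $x$]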
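A minimal sketch of how such a realization could be drawn, assuming NumPy. The mean and kernel are the ones on this slide; the noise level `noise_std`, the `jitter` term, the grid and the seed are illustrative assumptions, not values from the presentation.

```python
import numpy as np

def mean_fn(x):
    """Prior mean m(x) = x^2 / 4 from the slide."""
    return x**2 / 4.0

def se_kernel(xa, xb):
    """Kernel k(x, x') = exp(-(x - x')^2 / 2), evaluated on all pairs."""
    diff = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * diff**2)

def sample_realization(x, noise_std=0.1, jitter=1e-6, seed=0):
    """Draw f ~ GP(m, k) at the points x, then add eps ~ N(0, noise_std^2)."""
    rng = np.random.default_rng(seed)
    # The jitter (an assumed value) keeps the Cholesky factorization stable.
    cov = se_kernel(x, x) + jitter * np.eye(x.size)
    chol = np.linalg.cholesky(cov)
    f = mean_fn(x) + chol @ rng.standard_normal(x.size)
    y = f + noise_std * rng.standard_normal(x.size)
    return f, y

x = np.linspace(-5.0, 5.0, 50)
f, y = sample_realization(x)
```

Sampling through the Cholesky factor is one common choice; any method that draws from a multivariate normal with this mean and covariance would serve the same purpose.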
Varying signal variance σf for SE kernel
$k(x, x') = \sigma_f^2 \exp\left(-\frac{(x - x')^2}{2\ell^2}\right)$
[Figure: Varying $\sigma_f$; sample functions $f$ against $x$, legend values 1, 5, 10, 100]
Varying length scale $\ell$ for SE kernel
$k(x, x') = \sigma_f^2 \exp\left(-\frac{(x - x')^2}{2\ell^2}\right)$
[Figure: Varying $\ell$; sample functions $f$ against $x$, legend values 0.05, 0.5, 1.0, 5.0]
Framework for parameter estimation
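$\mathcal{L}_x^\phi u = f$
$u \sim \mathcal{GP}(0, k_{uu}(x_i, x_j; \theta))$
$f \sim \mathcal{GP}(0, k_{ff}(x_i, x_j; \theta, \phi))$
$k_{ff}(x_i, x_j; \theta, \phi) = \mathcal{L}_{x_i}^\phi \mathcal{L}_{x_j}^\phi k_{uu}(x_i, x_j; \theta)$
$k_{fu}(x_i, x_j; \theta, \phi) = \mathcal{L}_{x_i}^\phi k_{uu}(x_i, x_j; \theta)$
Dataset: $x$, $y_u$, $y_f$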
Framework for parameter estimation
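$y = \begin{bmatrix} y_u \\ y_f \end{bmatrix}$
$K = \begin{bmatrix} k_{uu}(x_u, x_u; \theta) + \sigma_u^2 I & k_{uf}(x_u, x_f; \theta, \phi) \\ k_{fu}(x_f, x_u; \theta, \phi) & k_{ff}(x_f, x_f; \theta, \phi) + \sigma_f^2 I \end{bmatrix}$
$\mathrm{NLML} = \frac{1}{2}\left[\log|K| + y^T K^{-1} y + N\log(2\pi)\right]$
Take maximum likelihood estimates: $\theta_{est}$, $\phi_{est}$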
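The block covariance and the NLML above could be evaluated along the following lines. This is a sketch assuming NumPy; the function names `assemble_covariance` and `nlml` are illustrative, and the Cholesky route is one standard way to obtain $\log|K|$ and $K^{-1}y$ without forming an explicit inverse.

```python
import numpy as np

def assemble_covariance(K_uu, K_fu, K_ff, noise_u, noise_f):
    """Build the block matrix K; K_uf is taken as the transpose of K_fu."""
    n_u, n_f = K_uu.shape[0], K_ff.shape[0]
    top = np.hstack([K_uu + noise_u**2 * np.eye(n_u), K_fu.T])
    bottom = np.hstack([K_fu, K_ff + noise_f**2 * np.eye(n_f)])
    return np.vstack([top, bottom])

def nlml(K, y):
    """0.5 * [log|K| + y^T K^{-1} y + N log(2 pi)] via a Cholesky factor."""
    chol = np.linalg.cholesky(K)                      # K = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))     # log|K|
    return 0.5 * (log_det + y @ alpha + y.size * np.log(2.0 * np.pi))
```

Minimizing `nlml` over the kernel hyperparameters and the operator parameters would then give the estimates $\theta_{est}$ and $\phi_{est}$.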
1D Linear operator with more than one parameter
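$\mathcal{L}_x^\phi u(x) = f(x)$
$\mathcal{L}_x^\phi := \phi_1 \cdot + \phi_2 \frac{d}{dx} \cdot$
$u(x) = \sin(x)$
$f(x) = \phi_1 \sin(x) + \phi_2 \cos(x)$
$k_{ff}(x_i, x_j; \theta, \phi_1, \phi_2) = \mathcal{L}_{x_i}^\phi \mathcal{L}_{x_j}^\phi k_{uu}(x_i, x_j; \theta)$
$= \mathcal{L}_{x_i}^\phi \left(\phi_1 k_{uu} + \phi_2 \frac{\partial}{\partial x_j} k_{uu}\right)$
$= \phi_1^2 k_{uu} + \phi_1 \phi_2 \frac{\partial}{\partial x_j} k_{uu} + \phi_1 \phi_2 \frac{\partial}{\partial x_i} k_{uu} + \phi_2^2 \frac{\partial}{\partial x_i} \frac{\partial}{\partial x_j} k_{uu}$
$k_{fu}(x_i, x_j; \theta, \phi_1, \phi_2) = \mathcal{L}_{x_i}^\phi k_{uu}(x_i, x_j; \theta) = \phi_1 k_{uu} + \phi_2 \frac{\partial}{\partial x_i} k_{uu}$

$\mathcal{L}_x^\phi u(x) = f(x)$
$\mathcal{L}_x^\phi := \phi_1 \cdot + \phi_2 \frac{d}{dx} \cdot + \phi_3 \frac{d^2}{dx^2} \cdot$
$u(x) = \sin(x)$
$f(x) = \phi_1 \sin(x) + \phi_2 \cos(x) - \phi_3 \sin(x)$
$k_{ff}(x_i, x_j; \theta, \phi_1, \phi_2, \phi_3) = \mathcal{L}_{x_i}^\phi \mathcal{L}_{x_j}^\phi k_{uu}(x_i, x_j; \theta)$
$= \mathcal{L}_{x_i}^\phi \left(\phi_1 k_{uu} + \phi_2 \frac{\partial}{\partial x_j} k_{uu} + \phi_3 \frac{\partial^2}{\partial x_j^2} k_{uu}\right)$
$= \left(\phi_1 + \phi_2 \frac{\partial}{\partial x_i} + \phi_3 \frac{\partial^2}{\partial x_i^2}\right)\left(\phi_1 + \phi_2 \frac{\partial}{\partial x_j} + \phi_3 \frac{\partial^2}{\partial x_j^2}\right) k_{uu}$
$k_{fu}(x_i, x_j; \theta, \phi_1, \phi_2, \phi_3) = \mathcal{L}_{x_i}^\phi k_{uu}(x_i, x_j; \theta) = \phi_1 k_{uu} + \phi_2 \frac{\partial}{\partial x_i} k_{uu} + \phi_3 \frac{\partial^2}{\partial x_i^2} k_{uu}$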
1D Linear operator with more than one parameter
φ1 = 2, φ2 = 5 φ2 = 5
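A hedged sketch of the two-parameter example, assuming NumPy and SciPy. To keep it short, the kernel is fixed to $k_{uu} = \exp(-\frac{1}{2}(x_i - x_j)^2)$ and only $\phi_1, \phi_2$ are optimized, whereas the framework also estimates $\theta$. With $d = x_i - x_j$, this kernel gives $k_{fu} = (\phi_1 - \phi_2 d)\,k_{uu}$ and $k_{ff} = (\phi_1^2 + \phi_2^2 (1 - d^2))\,k_{uu}$, since the two mixed first-derivative terms cancel. The number of points, the jitter, the starting guess and the optimizer are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

PHI_TRUE = np.array([2.0, 5.0])   # phi_1, phi_2 as shown on the slide
JITTER = 1e-6                     # assumed small noise variance

def pairwise(xa, xb):
    """Return d = x_i - x_j and k_uu = exp(-d^2 / 2) for all pairs."""
    d = xa[:, None] - xb[None, :]
    return d, np.exp(-0.5 * d**2)

def joint_covariance(phi, x_u, x_f):
    """Block covariance of (y_u, y_f) for L = phi_1 + phi_2 d/dx."""
    p1, p2 = phi
    _, K_uu = pairwise(x_u, x_u)
    d_fu, base_fu = pairwise(x_f, x_u)
    K_fu = (p1 - p2 * d_fu) * base_fu
    d_ff, base_ff = pairwise(x_f, x_f)
    K_ff = (p1**2 + p2**2 * (1.0 - d_ff**2)) * base_ff
    return np.block([
        [K_uu + JITTER * np.eye(x_u.size), K_fu.T],
        [K_fu, K_ff + JITTER * np.eye(x_f.size)],
    ])

def nlml(phi, x_u, x_f, y):
    """Negative log marginal likelihood; large penalty if K is not PD."""
    try:
        chol = np.linalg.cholesky(joint_covariance(phi, x_u, x_f))
    except np.linalg.LinAlgError:
        return 1e12
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return 0.5 * (log_det + y @ alpha + y.size * np.log(2.0 * np.pi))

# Noise-free data from u = sin(x), f = phi_1 sin(x) + phi_2 cos(x).
x = np.linspace(0.0, 2.0 * np.pi, 15)
y = np.concatenate([np.sin(x),
                    PHI_TRUE[0] * np.sin(x) + PHI_TRUE[1] * np.cos(x)])

phi_start = np.array([1.0, 1.0])
result = minimize(nlml, phi_start, args=(x, x, y), method="Nelder-Mead")
phi_est = result.x
```

`phi_est` can then be compared with the values $\phi_1 = 2$, $\phi_2 = 5$ used to generate the data.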
Heat equation
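$\frac{\partial u}{\partial t} - \phi \nabla^2 u = f$
In one spatial dimension,
$\mathcal{L}_{\mathbf{x}}^\phi u(\mathbf{x}) = \frac{\partial}{\partial t} u(\mathbf{x}) - \phi \frac{\partial^2}{\partial x^2} u(\mathbf{x}) = f(\mathbf{x})$,
where $\mathbf{x} = (t, x) \in \mathbb{R}^2$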
Backward Euler scheme
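For the homogeneous case:
$u_t - \alpha u_{xx} = 0$
$u(x, t) = e^{-t} \sin\left(\frac{x}{\sqrt{\alpha}}\right)$
$u_0(x) := u(x, 0) = \sin\left(\frac{x}{\sqrt{\alpha}}\right)$
Discretization in the time domain:
$\frac{u_n - u_{n-1}}{\tau} - \alpha \frac{d^2}{dx^2} u_n = 0$
$u_n - \tau \alpha \frac{d^2}{dx^2} u_n = u_{n-1}$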
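A short worked step, not taken from the slides, on what the backward Euler discretization implies for the estimate. For the exact solution $u(x, t) = e^{-t}\sin(x/\sqrt{\alpha})$ we have $\frac{d^2}{dx^2} u_n = -u_n/\alpha$ and $u_{n-1} = e^{\tau} u_n$, so the value $\tilde{\alpha}$ (a symbol introduced here) that makes the time-discrete relation hold exactly follows from:

```latex
\begin{align*}
u_n - \tau\tilde{\alpha}\,\frac{d^2}{dx^2}u_n
  &= \Bigl(1 + \frac{\tau\tilde{\alpha}}{\alpha}\Bigr)u_n
   = e^{\tau}u_n = u_{n-1} \\
\Longrightarrow\quad
\tilde{\alpha} &= \alpha\,\frac{e^{\tau}-1}{\tau}
   = \alpha\Bigl(1 + \frac{\tau}{2} + O(\tau^2)\Bigr)
\end{align*}
```

Under these assumptions, even noise-free data would be expected to yield an estimate that deviates from $\alpha$ by roughly $\alpha\tau/2$, a bias that shrinks as $\tau \to 0$.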
Backward Euler (contd...)
Gaussian prior:
$u_n \sim \mathcal{GP}(0, k_{uu}(x_i, x_j; \theta))$
Linear operator:
$\mathcal{L}_x^\alpha = \cdot - \tau \alpha \frac{d^2}{dx^2} \cdot$
$\mathcal{L}_x^\alpha u = f$; $u := u_n$, $f := u_{n-1}$
Backward Euler scheme: Results
Kernel: $k_{uu}(x_i, x_j; \theta) = e^{\theta (x_i - x_j)^2}$
[Figure: Input and output to the operator; $u_n$ and $u_{n-1}$ against $x$]
[Figure: Error in parameter estimate vs time steps; |true − estimate| against $\log(\tau)$]
[Figure: Convergence plot for one run with 20 data points; nlml against iteration #]
Backward Euler for the nonhomogeneous case
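$u_t - \alpha u_{xx} = f$
$u(x, t) = e^{-t} \sin(2\pi x)$, $u_0(x) := u(x, 0) = \sin(2\pi x)$
$f(x, t) = (-1 + 4\alpha\pi^2)\, e^{-t} \sin(2\pi x)$
$x, t \in [0, 1]$
with the Backward Euler scheme:
$\frac{u_n - u_{n-1}}{\tau} - \alpha \frac{d^2}{dx^2} u_n = f_n$
$u_n - \tau \alpha \frac{d^2}{dx^2} u_n = u_{n-1} + \tau f_n$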
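Backward Euler for the nonhomogeneous case (contd...)
Gaussian prior:
$u_n \sim \mathcal{GP}(0, k_{uu}(x_i, x_j; \theta))$
Linear operator:
$\mathcal{L}_x^\alpha = \cdot - \tau \alpha \frac{d^2}{dx^2} \cdot$
$\mathcal{L}_x^\alpha u = f$; $u := u_n$, $f := u_{n-1} + \tau f_n$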
Backward Euler for the nonhomogeneous case: Data
- 20 data points
- $\tau = 0.01$
[Figure: Input and output to the operator ($\tau = 0.01$); $u_n$ and $u_{n-1} + \tau f_n$ against $x$]
Backward Euler for the nonhomogeneous case: Results
Comparison between kernels:
- For $k_{uu}(x_i, x_j; \theta) = e^{\theta (x_i - x_j)^2}$
- For $k_{uu}(x_i, x_j; \theta) = \theta e^{-\frac{1}{2}(x_i - x_j)^2}$
General case
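$\mathcal{L}_{\mathbf{x}}^\phi u(\mathbf{x}) = \frac{\partial}{\partial t} u(\mathbf{x}) - \phi \frac{\partial^2}{\partial x^2} u(\mathbf{x}) = f(\mathbf{x})$,
where $\mathbf{x} = (t, x) \in \mathbb{R}^2$.
Gaussian prior:
$u \sim \mathcal{GP}(0, k_{uu}(\mathbf{x}_i, \mathbf{x}_j; \theta))$
$k_{uu}(x_i, x_j, t_i, t_j; \theta) = e^{-\theta_1 (x_i - x_j)^2 - \theta_2 (t_i - t_j)^2}$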
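The kernels $k_{fu}$ and $k_{ff}$ for the heat operator can be derived symbolically rather than by hand. The sketch below assumes SymPy (plus NumPy for the numeric callable); the helper name `heat_operator` and the variable names are illustrative, and this is not necessarily how the authors generated their kernels.

```python
import sympy as sp

x_i, x_j, t_i, t_j = sp.symbols("x_i x_j t_i t_j", real=True)
theta1, theta2, phi = sp.symbols("theta1 theta2 phi", positive=True)

# Prior kernel k_uu = exp(-theta1 (x_i - x_j)^2 - theta2 (t_i - t_j)^2).
k_uu = sp.exp(-theta1 * (x_i - x_j)**2 - theta2 * (t_i - t_j)**2)

def heat_operator(expr, t, x):
    """Apply L = d/dt - phi d^2/dx^2 with respect to the variables (t, x)."""
    return sp.diff(expr, t) - phi * sp.diff(expr, x, 2)

# k_fu = L_i k_uu and k_ff = L_i L_j k_uu.
k_fu = sp.simplify(heat_operator(k_uu, t_i, x_i))
k_ff = sp.simplify(heat_operator(heat_operator(k_uu, t_j, x_j), t_i, x_i))

# Numeric version of k_ff, usable when assembling the covariance matrix.
k_ff_fn = sp.lambdify((x_i, t_i, x_j, t_j, theta1, theta2, phi), k_ff, "numpy")
```

The same pattern extends to other linear operators by swapping out `heat_operator`.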
General case: Benchmark
- 10 data points
- Noise variance: $10^{-7}$
General case: Results
[Figure: NLML contour lines, shown over ($\ln(\theta_1)$, $\ln(\theta_2)$) and over ($\theta_1$, $\theta_2$)]
[Figure: Profile likelihood; nlml against $\ln(\theta_1)$ and against $\ln(\theta_2)$]
General case: Simulation results
[Figure (A): Error in estimate of the parameter; absolute error against number of data points]
[Figure (B): Execution time benchmark; execution time against number of data points]
General case: Comparison with the full kernel
Full kernel: $\theta \exp\left(-\frac{1}{2 l_1}(x_i - x_j)^2 - \frac{1}{2 l_2}(t_i - t_j)^2\right)$
General case: Comparison with the full kernel
[Figure: Error in estimate of the parameter; absolute error against number of data points]
[Figure: Execution time benchmark; execution time against number of data points]
The Wave Equation
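$\frac{\partial^2}{\partial t^2} u = c \nabla^2 u$
We can rewrite it (in one spatial dimension) as $\mathcal{L}_{\mathbf{x}}^c u = f$, where $f = 0$ and
$\mathcal{L}_{\mathbf{x}}^c = \frac{\partial^2}{\partial t^2} \cdot - c \frac{\partial^2}{\partial x^2} \cdot$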
A solution for c = 1:
$u(x, t) = (x - t)^2 + \sin(x + t)$.
[Figure: Function values for $u_t(x)$ with $t \in \{0, \ldots, 9\}$, plotted against x-values]
Sample 20 random points $X$ in $[0, 1]^2$ along with $u(X)$ and $f(X)$.
Problem at hand: Estimate $c$ from these samples.
Applying our algorithm
Assumption: $u \sim \mathcal{GP}(0, k_{uu}(x_i, x_j; \theta)_{i,j})$, where $k_{uu}$ is an RBF kernel and $\theta = \{\sigma_u, l_x, l_t\}$.
- $f$ is GP-distributed as a linear transformation of $u$.
- Minimize the NLML that corresponds to $u$, $f$ and our data samples.
Our result: c = 1.0003
The absolute error in our estimate
We plot the error $|\hat{c} - 1|$ using five differently colored runs of our algorithm ($\hat{c}$ is our estimate for $c$).
Here, the error is bounded by 0.041 for $10 \le n \le 24$ (blue dashed line).
Calculating the L2-error
Given $\hat{c}$, we can solve $\frac{d^2}{dt^2} u(x, t) - \hat{c}\, \frac{d^2}{dx^2} u(x, t) = 0$ and get a solution based on our estimate:
$\hat{u}(x, t) = u(x, \sqrt{\hat{c}}\, t) = (x - \sqrt{\hat{c}}\, t)^2 + \sin(x + \sqrt{\hat{c}}\, t)$
Can plot $\|\hat{u} - u\|_{L^2}$ now:
The $L^2$-error is in our case bounded by 0.015 for $10 \le n \le 24$.
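A sketch of how the $L^2$-error between $u(x, t) = (x - t)^2 + \sin(x + t)$ and the estimate-based solution $u(x, \sqrt{\hat{c}}\, t)$ could be approximated, assuming NumPy. The integration domain $[0, 1]^2$ (the region the samples are drawn from), the midpoint rule and the grid size are assumptions; the slides do not state how the norm was evaluated.

```python
import numpy as np

def u_exact(x, t):
    """Solution of the wave equation for c = 1."""
    return (x - t)**2 + np.sin(x + t)

def u_from_estimate(x, t, c_est):
    """Solution implied by the estimate: u(x, sqrt(c_est) * t)."""
    return u_exact(x, np.sqrt(c_est) * t)

def l2_error(c_est, n_grid=200):
    """Midpoint-rule approximation of ||u_hat - u||_{L2} on [0, 1]^2."""
    mid = (np.arange(n_grid) + 0.5) / n_grid
    X, T = np.meshgrid(mid, mid, indexing="ij")
    squared_diff = (u_from_estimate(X, T, c_est) - u_exact(X, T))**2
    # Each cell has area 1 / n_grid^2, so the integral equals the mean.
    return np.sqrt(squared_diff.mean())

err = l2_error(1.0003)   # the estimate reported for the wave equation
```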
Burgers’ Equation
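$\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2}$
We look at the inviscid Burgers’ equation, that is, the case where the diffusion coefficient is zero: $\nu = 0$. Then a solution is:
$u(x, t) = \frac{x}{1 + t}$
We implemented two similar setups:
1) Infer $c$ in: $u_t + c\,u u_x = 0$ (1)
2) Infer $\nu$ in: $u_t + u u_x = \nu u_x$ (2)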
Applying our algorithm
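- Used discretization methods (with step size $\tau = 0.001$)
- Replaced the non-linear term with the mean $\mu_{n-1}$ of $u_{n-1}$
- Used the backward Euler scheme for (1):
$\frac{u_n(x) - u_{n-1}(x)}{\tau} + c\, u_n(x) \frac{d}{dx} u_n(x) = 0$,
where $\mathcal{L}_x^c u_n = u_{n-1}$ and $\mathcal{L}_x^c = \cdot + \tau c \mu_{n-1} \frac{d}{dx} \cdot$.
- Used the forward Euler scheme for (2):
$\frac{u_n(x) - u_{n-1}(x)}{\tau} + u_{n-1}(x) \frac{d}{dx} u_{n-1}(x) = \nu \frac{d}{dx} u_{n-1}(x)$,
where $\mathcal{L}_x^\nu u_{n-1} = u_n$ and $\mathcal{L}_x^\nu = \cdot + \tau(\nu - \mu_{n-1}) \frac{d}{dx} \cdot$.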
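A worked check, not from the slides, of how well the two linearized schemes reproduce the exact solution $u(x, t) = \frac{x}{1 + t}$. It assumes that $\mu_{n-1}(x)$ coincides pointwise with $u_{n-1}(x)$, which is only one possible reading of "the mean of $u_{n-1}$". Writing $a = 1 + t_n$ (a shorthand introduced here), we have $u_n = x/a$ and $u_{n-1} = x/(a - \tau)$:

```latex
\begin{align*}
\mathcal{L}^{c}_{x} u_n \Big|_{c=1}
  &= \frac{x}{a} + \tau\,\frac{x}{a-\tau}\cdot\frac{1}{a}
   = \frac{x\,(a-\tau) + \tau x}{a\,(a-\tau)}
   = \frac{x}{a-\tau} = u_{n-1}, \\
\mathcal{L}^{\nu}_{x} u_{n-1} \Big|_{\nu=0}
  &= \frac{x}{a-\tau} - \tau\,\frac{x}{(a-\tau)^2}
   = \frac{x\,(a-2\tau)}{(a-\tau)^2}
   = u_n - \frac{\tau^2 x}{a\,(a-\tau)^2}.
\end{align*}
```

Under this reading, the backward Euler relation for (1) holds exactly at $c = 1$, while the forward Euler relation for (2) at $\nu = 0$ leaves a residual of order $\tau^2$.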
Results
The blue dashed line is given by $f(x) = 0.01\,x^{2.2}$:
Here, we replaced the non-linearity by $u(x, 0) = x$ for a comparison:
Problem solved!
Thanks for your help!
No problem. Any day!
[Cartoon characters: $\mathcal{L}_x^\phi$ and GP]
Conclusions
- Efficient and quite accurate framework for estimating parameters in differential equations
- No discretization methods needed
- Designed for linear transformations only
- Covariance matrix often ill-conditioned for more than 30 data points
- Automatic calculation of all kernels possible
- Initial attempt with pyGPs was unsuccessful
What’s next?
Okay! WHO’S NEXT?
It’s the Non-Linear Operator!!
[Cartoon character: GP]
Oh yes!
GP
oh f#$%
Thank you!
Trying to use the Python-package pyGPs
Our pyGPs approach:
1. Assume Gaussian priors with RBF kernels:
$u(x) \sim \mathcal{GP}(0, k_{uu}(x, x'; \sigma_u, l_u))$
$f(x) \sim \mathcal{GP}(0, k_{ff}(x, x'; \sigma_f, l_f))$
2. Can optimize hyperparameters with pyGPs (given the data $\{X_u, Y_u\}$ and $\{X_f, Y_f\}$)
3. Know that the covariance matrix for $f$ is $k_f = \mathcal{L}_{x'}^\phi \mathcal{L}_x^\phi k_{uu}$, since $f(x) = \mathcal{L}_x^\phi u(x)$. Set $k_f \simeq k_{ff}$. Then:
$k_f(x_i, x_i) \simeq k_{ff}(x_i, x_i)$
Rearranging: $\phi \simeq g(\sigma_u, \sigma_f, l_u)$,
This we can evaluate.
Using it for a simple example
We used this approach for $u(x) = \sqrt{x}$ and $f(x) = \mathcal{L}_x^\phi u(x)$ with $\phi = 12$. By the previous slide it follows
$\phi \simeq \frac{\sigma_f}{\sigma_u}$.
Using 15 evenly spaced data samples in $[0, 2\pi]$, our result was $\phi = 12.05$.