Parameter Estimation with Gaussian Processes - Diplomarbeit … · 2018-09-06 · Data Innovation...

Technische Universität München, Data Innovation Lab

DI-LAB Final Presentation

Parameter Estimation with Gaussian Processes

A. Grundner, K. Wang, K. Harsha

Scientific Lead: Prof. Dr. Eric Sonnendrucker, Dr. Ahmed Ratnani


How it started!

So, what’s the problem here?

Life’s so uncertain! I see all this data around me, but I don’t know what my parameters are!!

It’s fine. I’ll prescribe you some priors and you can use them with your favourite kernels! That should help with estimates.

LxΦ

GP

August 9, 2018 A. Grundner, K. Wang, K. Harsha 2


Contents

1 Intro to Gaussian Processes

2 Parameter Estimation with Gaussian Processes
    Framework
    Examples

3 Heat equation
    Backward Euler for the homogeneous case
    Backward Euler for the nonhomogeneous case
    No discretization

4 Wave equation

5 Burgers’ equation

6 Conclusions



Gaussian Processes

- A stochastic process that defines a distribution over functions
- Specified by a mean function, m(x), and a covariance function, or kernel, k(x, x′)
- Applications in machine learning: regression, classification, ...



Different Kernels



Realization of a GP

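$$f \sim \mathcal{GP}(m, k)$$

$$m(x) = \frac{x^2}{4}, \qquad k(x, x') = \exp\left(-\frac{1}{2}(x - x')^2\right)$$

$$y = f + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)$$

[Figure: one realization y(x) of the GP, plotted for x between -5 and 5.]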
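A realization like the one on this slide can be drawn numerically. The following is a minimal sketch, assuming NumPy, a grid of 50 points on [-5, 5], a noise level σ = 0.1, and a small diagonal jitter for numerical stability; none of these values appear on the slide.

```python
import numpy as np

def mean_fn(x):
    """Mean function from the slide: m(x) = x^2 / 4."""
    return 0.25 * x**2

def se_kernel(xa, xb):
    """Kernel from the slide: k(x, x') = exp(-(x - x')^2 / 2)."""
    diff = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * diff**2)

def sample_noisy_gp(x, noise_std, rng, jitter=1e-6):
    """Draw f ~ GP(m, k) on the grid x and return (f, y) with y = f + eps."""
    # The jitter is an assumed stabilizer; without it the Cholesky
    # factorization of a dense SE kernel matrix can fail in floating point.
    K = se_kernel(x, x) + jitter * np.eye(x.size)
    L = np.linalg.cholesky(K)
    f = mean_fn(x) + L @ rng.standard_normal(x.size)
    y = f + noise_std * rng.standard_normal(x.size)
    return f, y

rng = np.random.default_rng(0)          # arbitrary seed
x = np.linspace(-5.0, 5.0, 50)          # assumed grid
f, y = sample_noisy_gp(x, noise_std=0.1, rng=rng)
print(y[:5])
```

Multiplying the Cholesky factor of K by standard normal draws is a common way to sample from a multivariate normal; other factorizations would work as well.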



Varying signal variance σf for SE kernel

$$k(x, x') = \sigma_f^2 \exp\left(-\frac{(x - x')^2}{2\ell^2}\right)$$

[Figure "Varying σ_f": f against x on [-4, 4] for σ_f = 1, 5, 10, 100.]



Varying length scale l for SE kernel

$$k(x, x') = \sigma_f^2 \exp\left(-\frac{(x - x')^2}{2\ell^2}\right)$$

[Figure "Varying l": f against x on [-4, 4] for l = 0.05, 0.5, 1.0, 5.0.]
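The roles of the two SE hyperparameters can also be read off the kernel matrix directly. This is a small sketch, assuming NumPy and two inputs placed one unit apart (an illustrative choice): σ_f sets the prior variance k(x, x) = σ_f², while the length scale sets how fast the correlation between the two inputs decays.

```python
import numpy as np

def se_kernel(xa, xb, sigma_f=1.0, ell=1.0):
    """Squared-exponential kernel sigma_f^2 * exp(-(x - x')^2 / (2 ell^2))."""
    diff = np.asarray(xa)[:, None] - np.asarray(xb)[None, :]
    return sigma_f**2 * np.exp(-diff**2 / (2.0 * ell**2))

x = np.array([0.0, 1.0])  # two inputs one unit apart (illustrative)

# sigma_f scales the prior variance on the diagonal.
prior_var = {s: se_kernel(x, x, sigma_f=s)[0, 0] for s in (1.0, 5.0, 10.0, 100.0)}

# ell controls the correlation between the two inputs (sigma_f = 1 here).
corr = {ell: se_kernel(x, x, ell=ell)[0, 1] for ell in (0.05, 0.5, 1.0, 5.0)}

for s, v in prior_var.items():
    print(f"sigma_f = {s:6.1f}: prior variance = {v:.1f}")
for ell, c in corr.items():
    print(f"ell = {ell:4.2f}: correlation at distance 1 = {c:.3e}")
```

Short length scales make neighbouring function values nearly independent, which shows up as rapidly wiggling samples; long length scales give smooth, slowly varying ones.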






Framework for parameter estimation

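$$\mathcal{L}_x^{\phi} u = f$$

$$u \sim \mathcal{GP}\big(0, k_{uu}(x_i, x_j; \theta)\big)$$

$$f \sim \mathcal{GP}\big(0, k_{ff}(x_i, x_j; \theta, \phi)\big)$$

$$k_{ff}(x_i, x_j; \theta, \phi) = \mathcal{L}_{x_i}^{\phi} \mathcal{L}_{x_j}^{\phi} k_{uu}(x_i, x_j; \theta)$$

$$k_{fu}(x_i, x_j; \theta, \phi) = \mathcal{L}_{x_i}^{\phi} k_{uu}(x_i, x_j; \theta)$$

Dataset: x, y_u, y_f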



Framework for parameter estimation

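$$y = \begin{bmatrix} y_u \\ y_f \end{bmatrix}, \qquad
K = \begin{bmatrix}
k_{uu}(x_u, x_u; \theta) + \sigma_u^2 I & k_{uf}(x_u, x_f; \theta, \phi) \\
k_{fu}(x_f, x_u; \theta, \phi) & k_{ff}(x_f, x_f; \theta, \phi) + \sigma_f^2 I
\end{bmatrix}$$

$$\mathrm{NLML} = \frac{1}{2}\left[\log|K| + y^T K^{-1} y + N \log(2\pi)\right]$$

Take the maximum likelihood estimates θ_est, φ_est, i.e. the minimizers of the NLML (negative log marginal likelihood).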
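The NLML above can be evaluated without forming K⁻¹ explicitly. The sketch below assumes NumPy and precomputed kernel blocks; the helper names `assemble_K` and `nlml` are hypothetical, and the Cholesky route is a common numerical choice rather than something the slides prescribe.

```python
import numpy as np

def assemble_K(K_uu, K_uf, K_ff, sigma_u, sigma_f):
    """Joint covariance of [y_u, y_f]; the lower-left block is K_uf transposed."""
    n_u, n_f = K_uu.shape[0], K_ff.shape[0]
    return np.block([
        [K_uu + sigma_u**2 * np.eye(n_u), K_uf],
        [K_uf.T, K_ff + sigma_f**2 * np.eye(n_f)],
    ])

def nlml(K, y):
    """0.5 * [log|K| + y^T K^{-1} y + N log(2 pi)] via a Cholesky factor."""
    L = np.linalg.cholesky(K)                    # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log|K|
    return 0.5 * (log_det + y @ alpha + y.size * np.log(2.0 * np.pi))

# Tiny sanity check with an identity covariance and zero data:
# only the N log(2 pi) term remains.
K = assemble_K(np.eye(2), np.zeros((2, 1)), np.eye(1), sigma_u=0.0, sigma_f=0.0)
value = nlml(K, np.zeros(3))
print(value)
```

In a full estimation run, the kernel blocks would be rebuilt for each candidate (θ, φ) and `nlml` passed to a numerical optimizer.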



1D Linear operator with more than one parameter

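$$\mathcal{L}_x^{\phi} u(x) = f(x), \qquad \mathcal{L}_x^{\phi} := \phi_1 \cdot + \phi_2 \frac{d}{dx} \cdot$$

$$u(x) = \sin(x), \qquad f(x) = \phi_1 \sin(x) + \phi_2 \cos(x)$$

$$\begin{aligned}
k_{ff}(x_i, x_j; \theta, \phi_1, \phi_2)
&= \mathcal{L}_{x_i}^{\phi} \mathcal{L}_{x_j}^{\phi} k_{uu}(x_i, x_j; \theta) \\
&= \mathcal{L}_{x_i}^{\phi}\left(\phi_1 k_{uu} + \phi_2 \frac{\partial}{\partial x_j} k_{uu}\right) \\
&= \phi_1^2 k_{uu} + \phi_1 \phi_2 \frac{\partial}{\partial x_j} k_{uu} + \phi_1 \phi_2 \frac{\partial}{\partial x_i} k_{uu} + \phi_2^2 \frac{\partial}{\partial x_i} \frac{\partial}{\partial x_j} k_{uu}
\end{aligned}$$

$$k_{fu}(x_i, x_j; \theta, \phi_1, \phi_2) = \mathcal{L}_{x_i}^{\phi} k_{uu}(x_i, x_j; \theta) = \phi_1 k_{uu} + \phi_2 \frac{\partial}{\partial x_i} k_{uu}$$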

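Adding a second-derivative term with coefficient φ_3:

$$\mathcal{L}_x^{\phi} u(x) = f(x), \qquad \mathcal{L}_x^{\phi} := \phi_1 \cdot + \phi_2 \frac{d}{dx} \cdot + \phi_3 \frac{d^2}{dx^2} \cdot$$

$$u(x) = \sin(x), \qquad f(x) = \phi_1 \sin(x) + \phi_2 \cos(x) - \phi_3 \sin(x)$$

$$\begin{aligned}
k_{ff}(x_i, x_j; \theta, \phi_1, \phi_2, \phi_3)
&= \mathcal{L}_{x_i}^{\phi} \mathcal{L}_{x_j}^{\phi} k_{uu}(x_i, x_j; \theta) \\
&= \mathcal{L}_{x_i}^{\phi}\left(\phi_1 k_{uu} + \phi_2 \frac{\partial}{\partial x_j} k_{uu} + \phi_3 \frac{\partial^2}{\partial x_j^2} k_{uu}\right) \\
&= \left(\phi_1 + \phi_2 \frac{\partial}{\partial x_i} + \phi_3 \frac{\partial^2}{\partial x_i^2}\right)\left(\phi_1 k_{uu} + \phi_2 \frac{\partial}{\partial x_j} k_{uu} + \phi_3 \frac{\partial^2}{\partial x_j^2} k_{uu}\right)
\end{aligned}$$

Here the first bracket is a differential operator acting on the second, not a pointwise product.

$$k_{fu}(x_i, x_j; \theta, \phi_1, \phi_2, \phi_3) = \mathcal{L}_{x_i}^{\phi} k_{uu}(x_i, x_j; \theta) = \phi_1 k_{uu} + \phi_2 \frac{\partial}{\partial x_i} k_{uu} + \phi_3 \frac{\partial^2}{\partial x_i^2} k_{uu}$$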
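The kernel derivations for the two-parameter operator can be reproduced symbolically. This is a sketch using SymPy, assuming a squared-exponential kernel exp(-θ (x_i - x_j)²) for k_uu (the slides leave k_uu generic at this point); `apply_L` is a hypothetical helper name.

```python
import sympy as sp

x_i, x_j, phi1, phi2 = sp.symbols("x_i x_j phi1 phi2", real=True)
theta = sp.symbols("theta", positive=True)

# Assumed kernel for illustration only.
k_uu = sp.exp(-theta * (x_i - x_j)**2)

def apply_L(expr, var):
    """Apply L = phi1 * (.) + phi2 * d/dvar (.) to expr."""
    return phi1 * expr + phi2 * sp.diff(expr, var)

k_fu = sp.simplify(apply_L(k_uu, x_i))
k_ff = sp.simplify(apply_L(apply_L(k_uu, x_j), x_i))

# Term-by-term expansion; note the mixed derivative in the last term.
expanded = (phi1**2 * k_uu
            + phi1 * phi2 * sp.diff(k_uu, x_j)
            + phi1 * phi2 * sp.diff(k_uu, x_i)
            + phi2**2 * sp.diff(k_uu, x_i, x_j))
difference = sp.simplify(k_ff - expanded)

print(k_fu)
print(k_ff)
print(difference)
```

For a stationary kernel such as this one, the two first-derivative cross terms cancel, so k_ff reduces to φ_1² k_uu plus the φ_2² mixed-derivative term. Extending `apply_L` with a φ_3 second-derivative term gives the three-parameter case in the same way.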



1D Linear operator with more than one parameter

φ1 = 2, φ2 = 5 φ2 = 5
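To make the estimation step concrete, here is a simplified numerical sketch for the two-parameter operator with true values φ_1 = 2, φ_2 = 5. It departs from the framework in two labeled ways: the kernel hyperparameters are fixed (σ = 1, length scale 1) instead of being optimized, and a coarse grid search stands in for a numerical optimizer. The 15 sample points, the noise variance 1e-6, and the grid ranges are assumptions.

```python
import numpy as np

def kernel_blocks(x_u, x_f, phi1, phi2, sigma=1.0, ell=1.0):
    """K_uu, K_uf, K_ff for L = phi1 + phi2 d/dx and an SE kernel (fixed theta)."""
    def se(a, b):
        d = a[:, None] - b[None, :]
        return d, sigma**2 * np.exp(-d**2 / (2.0 * ell**2))
    _, K_uu = se(x_u, x_u)
    d_uf, k_uf = se(x_u, x_f)
    d_ff, k_ff = se(x_f, x_f)
    # d/dx_j k = k * (x_i - x_j) / ell^2
    K_uf = k_uf * (phi1 + phi2 * d_uf / ell**2)
    # The phi1*phi2 cross terms cancel; d^2/(dx_i dx_j) k = k * (1/ell^2 - d^2/ell^4)
    K_ff = k_ff * (phi1**2 + phi2**2 * (1.0 / ell**2 - d_ff**2 / ell**4))
    return K_uu, K_uf, K_ff

def nlml(x_u, y_u, x_f, y_f, phi1, phi2, noise_var=1e-6):
    """Negative log marginal likelihood of the joint data [y_u, y_f]."""
    K_uu, K_uf, K_ff = kernel_blocks(x_u, x_f, phi1, phi2)
    K = np.block([[K_uu, K_uf], [K_uf.T, K_ff]])
    K = K + noise_var * np.eye(K.shape[0])       # assumed noise level
    y = np.concatenate([y_u, y_f])
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (log_det + y @ alpha + y.size * np.log(2.0 * np.pi))

# Synthetic data from the slide's example: u = sin(x), f = 2 sin(x) + 5 cos(x).
x = np.linspace(0.0, 2.0 * np.pi, 15)            # assumed sample locations
y_u = np.sin(x)
y_f = 2.0 * np.sin(x) + 5.0 * np.cos(x)

# Coarse grid search over (phi1, phi2) in place of an optimizer.
grid1 = np.linspace(0.0, 4.0, 41)
grid2 = np.linspace(3.0, 7.0, 41)
values = np.array([[nlml(x, y_u, x, y_f, p1, p2) for p2 in grid2] for p1 in grid1])
i, j = np.unravel_index(np.argmin(values), values.shape)
phi1_est, phi2_est = grid1[i], grid2[j]
print(phi1_est, phi2_est)
```

The grid resolution (0.1) limits the accuracy of this sketch; optimizing θ and φ jointly with a gradient-based method, as in the framework, would refine the estimate.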






Heat equation

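$$\frac{\partial u}{\partial t} - \phi \nabla^2 u = f$$

In one spatial dimension,

$$\mathcal{L}_{\mathbf{x}}^{\phi} u(\mathbf{x}) = \frac{\partial}{\partial t} u(\mathbf{x}) - \phi \frac{\partial^2}{\partial x^2} u(\mathbf{x}) = f(\mathbf{x}),$$

where $\mathbf{x} = (t, x) \in \mathbb{R}^2$.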



Backward Euler scheme

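For the homogeneous case:

$$u_t - \alpha u_{xx} = 0$$

$$u(x, t) = e^{-t} \sin\left(\frac{x}{\sqrt{\alpha}}\right), \qquad u_0(x) := u(x, 0) = \sin\left(\frac{x}{\sqrt{\alpha}}\right)$$

Discretization in the time domain:

$$\frac{u_n - u_{n-1}}{\tau} - \alpha \frac{d^2}{dx^2} u_n = 0$$

$$u_n - \tau\alpha \frac{d^2}{dx^2} u_n = u_{n-1}$$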
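The size of the time-discretization error for this particular solution can be checked symbolically. The sketch below uses SymPy (an assumed tool choice) to confirm that u solves the continuous equation and to compute how far the exact solution is from satisfying one backward Euler step.

```python
import sympy as sp

x, t, tau, alpha = sp.symbols("x t tau alpha", positive=True)

u = sp.exp(-t) * sp.sin(x / sp.sqrt(alpha))

# Residual of the continuous equation u_t - alpha * u_xx = 0.
pde_residual = sp.simplify(sp.diff(u, t) - alpha * sp.diff(u, x, 2))

# One backward Euler step: u_n - tau * alpha * u_n'' should equal u_{n-1}.
u_n = u                      # exact solution at time t
u_prev = u.subs(t, t - tau)  # exact solution one step earlier
step_residual = u_n - tau * alpha * sp.diff(u_n, x, 2) - u_prev

# Residual relative to u_n, and its leading term in tau.
relative = sp.simplify(step_residual / u_n)
leading = sp.series(relative, tau, 0, 3).removeO()

print(pde_residual)
print(relative)
print(leading)
```

By hand, the relative residual is 1 + τ - e^τ, which is -τ²/2 to leading order. Under the idealized assumption of a perfect fit to the discrete relation, the recovered coefficient would be α(e^τ - 1)/τ ≈ α(1 + τ/2) rather than α, which suggests why the estimate error depends on the step size τ.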



Backward Euler (contd...)

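Gaussian prior:

$$u_n \sim \mathcal{GP}\big(0, k_{uu}(x_i, x_j; \theta)\big)$$

Linear operator:

$$\mathcal{L}_x^{\alpha} = \cdot - \tau\alpha \frac{d^2}{dx^2} \cdot$$

$$\mathcal{L}_x^{\alpha} u = f, \qquad u := u_n, \quad f := u_{n-1}$$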



Backward Euler scheme: Results

Kernel: $k_{uu}(x_i, x_j; \theta) = e^{\theta (x_i - x_j)^2}$

[Figure "Input and output to the operator": u_n and u_{n-1} against x on [0, 5].]

[Figure "Error in parameter estimate vs time steps": |true - estimate| against log(τ).]

[Figure "Convergence plot for one run with 20 data points": NLML against iteration number.]



Backward Euler for the nonhomogeneous case

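$$u_t - \alpha u_{xx} = f$$

$$u(x, t) = e^{-t} \sin(2\pi x), \qquad u_0(x) := u(x, 0) = \sin(2\pi x)$$

$$f(x, t) = (-1 + 4\alpha\pi^2)\, e^{-t} \sin(2\pi x)$$

$$x, t \in [0, 1]$$

with the backward Euler scheme:

$$\frac{u_n - u_{n-1}}{\tau} - \alpha \frac{d^2}{dx^2} u_n = f_n$$

$$u_n - \tau\alpha \frac{d^2}{dx^2} u_n = u_{n-1} + \tau f_n$$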
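The forcing term follows directly from substituting the chosen solution into the equation. As a short worked check (a derivation added here, not taken from the slides):

```latex
\begin{aligned}
u_t &= -e^{-t}\sin(2\pi x) = -u, \\
u_{xx} &= -4\pi^2 e^{-t}\sin(2\pi x) = -4\pi^2 u, \\
f &= u_t - \alpha u_{xx} = (-1 + 4\alpha\pi^2)\, e^{-t}\sin(2\pi x).
\end{aligned}
```

So f is a constant multiple of u, with the factor depending linearly on the parameter α that is being estimated.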



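Backward Euler for the nonhomogeneous case (contd...)

Gaussian prior:

$$u_n \sim \mathcal{GP}\big(0, k_{uu}(x_i, x_j; \theta)\big)$$

Linear operator:

$$\mathcal{L}_x^{\alpha} = \cdot - \tau\alpha \frac{d^2}{dx^2} \cdot$$

$$\mathcal{L}_x^{\alpha} u = f, \qquad u := u_n, \quad f := u_{n-1} + \tau f_n$$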



Backward Euler for the nonhomogeneous case: Data

- 20 data points
- τ = 0.01

[Figure "Input and output to the operator (τ = 0.01)": u_n and u_{n-1} + τ f_n against x.]



Backward Euler for the nonhomogeneous case: Results

Comparison between kernels:

- For $k_{uu}(x_i, x_j; \theta) = e^{\theta (x_i - x_j)^2}$
- For $k_{uu}(x_i, x_j; \theta) = \theta e^{-\frac{1}{2}(x_i - x_j)^2}$



General case

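$$\mathcal{L}_{\mathbf{x}}^{\phi} u(\mathbf{x}) = \frac{\partial}{\partial t} u(\mathbf{x}) - \phi \frac{\partial^2}{\partial x^2} u(\mathbf{x}) = f(\mathbf{x}),$$

where $\mathbf{x} = (t, x) \in \mathbb{R}^2$.

Gaussian prior:

$$u \sim \mathcal{GP}\big(0, k_{uu}(\mathbf{x}_i, \mathbf{x}_j; \theta)\big)$$

$$k_{uu}(x_i, x_j, t_i, t_j; \theta) = e^{-\theta_1 (x_i - x_j)^2 - \theta_2 (t_i - t_j)^2}$$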
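For the space-time operator the derived kernels involve derivatives up to fourth order, which is tedious by hand. The sketch below uses SymPy (an assumed tool choice) to apply the heat operator to the kernel above; `apply_L` is a hypothetical helper name.

```python
import sympy as sp

x_i, x_j, t_i, t_j, phi = sp.symbols("x_i x_j t_i t_j phi", real=True)
theta1, theta2 = sp.symbols("theta1 theta2", positive=True)

# Kernel from the slide.
k_uu = sp.exp(-theta1 * (x_i - x_j)**2 - theta2 * (t_i - t_j)**2)

def apply_L(expr, t, x):
    """Apply L = d/dt - phi * d^2/dx^2 in the variables (t, x)."""
    return sp.diff(expr, t) - phi * sp.diff(expr, x, 2)

k_fu = sp.simplify(apply_L(k_uu, t_i, x_i))                      # L acts on the first argument
k_uf = sp.simplify(apply_L(k_uu, t_j, x_j))                      # L acts on the second argument
k_ff = sp.simplify(apply_L(apply_L(k_uu, t_j, x_j), t_i, x_i))   # L acts on both

# Numerical version of k_ff, usable when assembling the covariance matrix.
k_ff_num = sp.lambdify((x_i, x_j, t_i, t_j, theta1, theta2, phi), k_ff)

print(k_ff.subs({x_i: x_j, t_i: t_j}))
```

At coincident points the expression collapses to 2θ_2 + 12 φ² θ_1², which gives a quick sanity check on the diagonal of the k_ff block.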



General case: Benchmark

- 10 data points
- Noise variance: $10^{-7}$



General case: Results

[Figure "NLML contour lines": contours of the NLML over (ln θ_1, ln θ_2) and over (θ_1, θ_2), with contour levels from -0.152 to 33.681.]

[Figure "Profile likelihood": NLML against ln θ_1 and against ln θ_2.]



General case: Simulation results

[Figure (A) "Error in estimate of the parameter": absolute error against the number of data points (10 to 24).]

[Figure (B) "Execution time benchmark": execution time.]



General case: Comparison with the full kernel

Full kernel: $\theta \exp\left(-\frac{1}{2 l_1}(x_i - x_j)^2 - \frac{1}{2 l_2}(t_i - t_j)^2\right)$



[Figure "Error in estimate of the parameter": absolute error.]

[Figure "Execution time benchmark": execution time.]






The Wave Equation

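$$\frac{\partial^2}{\partial t^2} u = c \nabla^2 u$$

We can rewrite it (in one spatial dimension) as $\mathcal{L}_{\mathbf{x}}^{c} u = f$, where $f = 0$ and

$$\mathcal{L}_{\mathbf{x}}^{c} = \frac{\partial^2}{\partial t^2} \cdot - c \frac{\partial^2}{\partial x^2} \cdot$$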



A solution for c = 1:

$$u(x, t) = (x - t)^2 + \sin(x + t)$$

[Figure: function values u_t(x) for t ∈ {0, ..., 9}, plotted for x between -10 and 10.]

Sample 20 random points X in $[0, 1]^2$ along with u(X) and f(X).

Problem at hand: estimate c from these samples.
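The training data for this experiment is easy to generate. The following sketch assumes NumPy, a uniform sampling distribution, an arbitrary seed, and the column order (x, t); it also checks with central finite differences that the chosen u satisfies the wave equation for c = 1.

```python
import numpy as np

def u_exact(x, t):
    """Solution from the slide for c = 1."""
    return (x - t)**2 + np.sin(x + t)

rng = np.random.default_rng(0)             # arbitrary seed
X = rng.uniform(0.0, 1.0, size=(20, 2))    # 20 points in [0, 1]^2, columns: x, t
x_s, t_s = X[:, 0], X[:, 1]

y_u = u_exact(x_s, t_s)                    # observations of u
y_f = np.zeros(20)                         # observations of f, which is 0 here

# Finite-difference check of u_tt - c * u_xx = 0 at the sample points (c = 1).
h = 1e-3
u0 = u_exact(x_s, t_s)
u_tt = (u_exact(x_s, t_s + h) - 2.0 * u0 + u_exact(x_s, t_s - h)) / h**2
u_xx = (u_exact(x_s + h, t_s) - 2.0 * u0 + u_exact(x_s - h, t_s)) / h**2
residual = np.max(np.abs(u_tt - 1.0 * u_xx))
print(residual)
```

The residual is limited only by floating-point rounding in the difference quotients, since the truncation errors of the two second differences coincide for this solution.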



Applying our algorithm

Assumption:

$$u \sim \mathcal{GP}\Big(0, \big(k_{uu}(x_i, x_j; \theta)\big)_{i,j}\Big),$$

where $k_{uu}$ is an RBF kernel and $\theta = \{\sigma_u, l_x, l_t\}$.

- f is GP-distributed as a linear transformation of u.
- Minimize the NLML that corresponds to u, f and our data samples.

Our result: $\hat{c} = 1.0003$



The absolute error in our estimate

We plot the error $|\hat{c} - 1|$ for five runs of our algorithm, each drawn in a different color ($\hat{c}$ is our estimate of c).

Here, the error is bounded by 0.041 for 10 ≤ n ≤ 24 (blue dashed line).



Calculating the L2-error

Given $\hat{c}$, we can solve $\frac{d^2}{dt^2} u(x, t) - \hat{c} \frac{d^2}{dx^2} u(x, t) = 0$ and get a solution based on our estimate:

$$\hat{u}(x, t) = u(x, \sqrt{\hat{c}}\, t) = (x - \sqrt{\hat{c}}\, t)^2 + \sin(x + \sqrt{\hat{c}}\, t)$$

We can now plot $\|\hat{u} - u\|_{L^2}$:

In our case, the $L^2$-error is bounded by 0.015 for 10 ≤ n ≤ 24.
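The L²-error between the solution built from an estimate of c and the true solution can be approximated by numerical quadrature. This sketch assumes NumPy, the domain [0, 1]² (the slides do not state the integration domain), and a midpoint rule on a 200 × 200 grid; it is evaluated for the single estimate 1.0003 reported earlier.

```python
import numpy as np

def u_with_speed(x, t, c):
    """u(x, sqrt(c) * t) for the slide's solution u(x, t) = (x - t)^2 + sin(x + t)."""
    s = np.sqrt(c) * t
    return (x - s)**2 + np.sin(x + s)

def l2_error(c_hat, n=200):
    """Midpoint-rule approximation of ||u_hat - u||_{L^2} on [0, 1]^2."""
    pts = (np.arange(n) + 0.5) / n
    x, t = np.meshgrid(pts, pts, indexing="ij")
    diff = u_with_speed(x, t, c_hat) - u_with_speed(x, t, 1.0)
    # The domain has area 1, so the mean of diff^2 approximates the integral.
    return np.sqrt(np.mean(diff**2))

err = l2_error(1.0003)
print(err)
```

Because the error depends on the estimate only through its square root, small deviations in the estimate translate into roughly half that relative deviation in the effective wave speed.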





Burgers’ Equation

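$$\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2}$$

We look at the inviscid Burgers’ equation, that is, the case where the diffusion coefficient is zero: ν = 0. Then a solution is:

$$u(x, t) = \frac{x}{1 + t}$$

We implemented two similar setups:

1) Infer c in:

$$u_t + c\, u u_x = 0 \qquad (1)$$

2) Infer ν in:

$$u_t + u u_x = \nu u_x \qquad (2)$$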



Applying our algorithm

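- Used discretization methods (with step size τ = 0.001)
- Replaced the non-linear term with the mean $\mu_{n-1}$ of $u_{n-1}$

- Used the backward Euler scheme for (1):

$$\frac{u_n(x) - u_{n-1}(x)}{\tau} + c\, u_n(x) \frac{d}{dx} u_n(x) = 0,$$

where $\mathcal{L}_x^{c} u_n = u_{n-1}$ and $\mathcal{L}_x^{c} = \cdot + \tau c \mu_{n-1} \frac{d}{dx} \cdot$.

- Used the forward Euler scheme for (2):

$$\frac{u_n(x) - u_{n-1}(x)}{\tau} + u_{n-1}(x) \frac{d}{dx} u_{n-1}(x) = \nu \frac{d}{dx} u_{n-1}(x),$$

where $\mathcal{L}_x^{\nu} u_{n-1} = u_n$ and $\mathcal{L}_x^{\nu} = \cdot + \tau(\nu - \mu_{n-1}) \frac{d}{dx} \cdot$.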
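How well the two linearized operators reproduce the exact solution u = x/(1 + t) can be checked symbolically. The sketch below uses SymPy and assumes that the mean μ_{n-1} coincides with the exact previous state u_{n-1}, which is an idealization of the GP posterior mean.

```python
import sympy as sp

x, t, tau = sp.symbols("x t tau", positive=True)

def u(time):
    """Exact solution of the inviscid Burgers' equation from the slide."""
    return x / (1 + time)

# Residual of the continuous equation u_t + u * u_x = 0.
pde_residual = sp.simplify(sp.diff(u(t), t) + u(t) * sp.diff(u(t), x))

u_n, u_prev = u(t), u(t - tau)
mu_prev = u_prev  # assumption: the mean equals the exact previous state

# Setup (1), backward Euler with c = 1: L u_n = u_n + tau * c * mu_{n-1} * u_n'.
backward_residual = sp.simplify(u_n + tau * 1 * mu_prev * sp.diff(u_n, x) - u_prev)

# Setup (2), forward Euler with nu = 0: L u_{n-1} = u_{n-1} + tau * (nu - mu_{n-1}) * u_{n-1}'.
forward_residual = sp.simplify(u_prev + tau * (0 - mu_prev) * sp.diff(u_prev, x) - u_n)

print(pde_residual)
print(backward_residual)
print(forward_residual)
```

Worked by hand under the same assumption, the backward relation holds exactly for this solution at c = 1, while the forward relation at ν = 0 leaves a residual of -x τ² / ((1 + t)(1 + t - τ)²), i.e. second order in the step size.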



Results

The blue dashed line is given by $f(x) = 0.01 x^{2.2}$:

Here, we replaced the nonlinearity by u(x, 0) = x for comparison:






Problem solved!

LxΦ

GP

Thanks for your help!

No problem. Any day!



Conclusions

- Efficient and quite accurate framework for estimating parameters in differential equations
- No discretization methods needed
- Designed for linear transformations only
- Covariance matrix often ill-conditioned for more than 30 data points
- Automatic calculation of all kernels possible
- Initial attempt with pyGPs was unsuccessful



What’s next?

GP

Okay! WHO’S NEXT?

It’s the Non-Linear Operator!!



Oh yes!

GP

oh f#$%



Thank you!



Trying to use the Python-package pyGPs

Our pyGPs approach:

1. Assume Gaussian priors with RBF kernels:

$$u(x) \sim \mathcal{GP}\big(0, k_{uu}(x, x'; \sigma_u, l_u)\big)$$

$$f(x) \sim \mathcal{GP}\big(0, k_{ff}(x, x'; \sigma_f, l_f)\big)$$

2. Optimize the hyperparameters with pyGPs (given the data $\{X_u, Y_u\}$ and $\{X_f, Y_f\}$).

3. We know that the covariance of f is $k_f = \mathcal{L}_{x'}^{\phi} \mathcal{L}_x^{\phi} k_{uu}$, since $f(x) = \mathcal{L}_x^{\phi} u(x)$. Set $k_f \simeq k_{ff}$. Then:

$$k_f(x_i, x_i) \simeq k_{ff}(x_i, x_i)$$

Rearranging:

$$\phi \simeq g(\sigma_u, \sigma_f, l_u),$$

which we can evaluate.


Using it for a simple example

We used this approach for $u(x) = \sqrt{x}$ and $f(x) = \mathcal{L}_x^{\phi} u(x)$ with φ = 12. By the previous slide it follows that

$$\phi \simeq \frac{\sigma_f}{\sigma_u}.$$

Using 15 evenly spaced data samples in [0, 2π], our result was φ = 12.05.
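The slide does not spell out the operator used in this example. One reading that reproduces the stated formula is that the operator is plain multiplication by φ, with the RBF kernel written in the usual signal-variance form; both are assumptions. Under them, matching the diagonals of the two kernels gives:

```latex
\begin{aligned}
k_{uu}(x, x') &= \sigma_u^2 \exp\left(-\frac{(x - x')^2}{2 l_u^2}\right), \\
k_f(x, x') &= \mathcal{L}_{x'}^{\phi} \mathcal{L}_x^{\phi} k_{uu}(x, x') = \phi^2 k_{uu}(x, x'), \\
k_f(x_i, x_i) \simeq k_{ff}(x_i, x_i)
&\;\Longrightarrow\; \phi^2 \sigma_u^2 \simeq \sigma_f^2
\;\Longrightarrow\; \phi \simeq \frac{\sigma_f}{\sigma_u}.
\end{aligned}
```

For an operator that contains derivatives, the diagonal of k_f would also involve the length scale l_u, which is why the general relation on the previous slide is written as g(σ_u, σ_f, l_u).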

