Bayesian methods for parameter estimation and data assimilation with crop models
Part 4: The Metropolis-Hastings algorithm
David Makowski and Daniel Wallach
INRA, France
January 2007
Previously
• Approximation of posterior distribution with the Importance Sampling algorithm.
• Implementation with the R statistical software.
• Application to estimate one parameter.
Bayes' Theorem:
P(θ|Y) = P(Y|θ) P(θ) / P(Y)
Objectives of part 4
• Present another algorithm to approximate the posterior probability distribution, the Metropolis-Hastings algorithm.
• Illustrate with 2 examples. The first has 1 parameter, the second 3 parameters.
• Furnish a program in the R language that you can run to implement Metropolis-Hastings for the examples.
• R is free (see http://www.r-project.org/).
Two approaches for approximating posterior distributions from Monte Carlo simulations
1. Non-adaptive methods
• All parameter vectors can be generated at the start of the procedure. The choice of parameters to be tested does not depend on the results for previous parameters.
•Example: Importance Sampling (see part 3).
2. Markov chain Monte Carlo (MCMC) methods
• Parameter values are generated from a Markov chain. The parameter value to be tested at stage i+1 can depend on the parameter value at stage i.
• The most important methods are the Metropolis-Hastings algorithm and Gibbs sampling.
The Metropolis-Hastings algorithm
General case
Step 0. Choose a starting value θ1. Define a proposal distribution Pp(θc|θi). (For example, use a normal distribution with mean equal to θi.)
Repeat steps 1-3 for i=1,…,N
Step 1. Generate a candidate parameter value θc from Pp(θc|θi).
Step 2. Calculate
T = [P(Y|θc) P(θc) Pp(θi|θc)] / [P(Y|θi) P(θi) Pp(θc|θi)]
Step 3. If T ≥ 1, then θi+1 = θc. If T < 1, draw u from a uniform distribution on the interval (0, 1); if u < T then θi+1 = θc, otherwise θi+1 = θi.
The result of the algorithm is a list of N parameter values. The same value may be repeated several times.
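The three steps above can be written as a generic function. The course's own implementation is in R (the file MHyield.txt presented later); the sketch below uses Python for illustration and assumes the caller supplies the log of the unnormalized posterior, log[P(Y|θ) P(θ)].

```python
import math
import random

def metropolis_hastings(log_post, theta1, proposal_sd, n_iter, seed=0):
    """Metropolis-Hastings with a symmetric normal (random-walk) proposal.

    log_post: log of the unnormalized posterior, log[P(Y|theta) P(theta)].
    Returns the list of n_iter parameter values; the same value may be
    repeated several times when candidates are rejected.
    """
    rng = random.Random(seed)
    chain = [theta1]                                    # Step 0: starting value
    for _ in range(n_iter - 1):
        theta_i = chain[-1]
        theta_c = rng.gauss(theta_i, proposal_sd)       # Step 1: candidate
        log_t = log_post(theta_c) - log_post(theta_i)   # Step 2: log of T
        # Step 3: accept the candidate with probability min(1, T)
        if log_t >= 0 or rng.random() < math.exp(log_t):
            chain.append(theta_c)
        else:
            chain.append(theta_i)
    return chain
```

Working with log T rather than T avoids numerical underflow of the densities. Because the normal proposal is symmetric, the Pp terms of the general ratio cancel here; for an asymmetric proposal, the term log Pp(θi|θc) − log Pp(θc|θi) would have to be added to log_t.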
The Metropolis-Hastings algorithm
with symmetric proposal distribution
• A common choice for the proposal distribution P(θc|θi) is a normal distribution with mean equal to θi and constant variance.
• In this case P( θc|θi ) = P(θi|θc ) and the expression for T simplifies:
c c
i i
P Y PT
P Y P
Likelihood Prior density
The Metropolis-Hastings algorithm
Choices to be made
• The proposal distribution. The number of iterations required for a good approximation to the posterior distribution depends on this choice.
• The number of iterations. A large number is generally necessary (N=10000, 100000 or more depending on the problem).
• The number of parameter values in the list to be discarded (to reduce dependence on the value chosen for starting the algorithm).
• We will give some suggestions for these choices with example 1.
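One simple diagnostic for the proposal choice is the fraction of accepted candidates: very high acceptance suggests steps that are too small (slow exploration), very low acceptance steps that are too large. The sketch below is a hypothetical Python helper, not part of the course's R programs.

```python
import math
import random

def acceptance_rate(log_post, theta1, proposal_sd, n_iter, seed=0):
    """Fraction of accepted candidates over a pilot run: a rough
    diagnostic for tuning the proposal standard deviation.
    (Hypothetical helper; log_post is the log unnormalized posterior.)"""
    rng = random.Random(seed)
    theta_i, accepted = theta1, 0
    for _ in range(n_iter):
        theta_c = rng.gauss(theta_i, proposal_sd)
        # u < min(1, T) in log form; log u < 0, so T >= 1 always accepts
        if math.log(rng.random()) < log_post(theta_c) - log_post(theta_i):
            theta_i, accepted = theta_c, accepted + 1
    return accepted / n_iter
```

For random-walk proposals, acceptance rates of very roughly 20-50% are often quoted as a reasonable target; a proposal sd much wider than the posterior drives the rate down.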
Example 1
Example already presented in parts 2 and 3: estimation of crop yield.
• The single unknown parameter is the yield of a particular field. The prior information is an expert’s opinion. There is also information from a measurement. Both the prior density and the likelihood are normal distributions.
• In this case, the exact expression of the posterior distribution is known.
• This example is used to show that the Metropolis-Hastings method can give a good approximation of the posterior distribution.
Example 1 – Exact posterior distribution
From part 2, we have:
• Measurement: Y=9 t/ha (sd=1)
• Prior distribution: P(θ) = N(5, 2²)
• Likelihood: P(Y|θ ) = N(θ, 1)
• Exact posterior distribution: P(θ|Y) = N(8.2, 0.89²), i.e. posterior mean 8.2 and posterior variance 1/(1/2² + 1/1²) = 0.8
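The exact posterior follows from the conjugacy of a normal prior with a normal likelihood: posterior precision is the sum of prior and data precisions. The numbers above can be checked directly (Python used here for illustration):

```python
# Normal prior N(mu0, s0^2) combined with one normal measurement y of
# variance s^2: posterior precision = prior precision + data precision.
mu0, s0_sq = 5.0, 4.0   # prior N(5, 2^2)
y, s_sq = 9.0, 1.0      # measurement 9 t/ha, sd = 1

post_var = 1.0 / (1.0 / s0_sq + 1.0 / s_sq)       # posterior variance: 0.8
post_mean = post_var * (mu0 / s0_sq + y / s_sq)   # posterior mean: 8.2
```

The posterior standard deviation is sqrt(0.8) ≈ 0.89, and the mean 8.2 sits between the prior mean 5 and the measurement 9, pulled toward the more precise measurement.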
[Figure: prior probability distribution, likelihood function, and posterior probability distribution as functions of θ (t/ha).]
Example 1 – Metropolis-Hastings
Step 0. Choose θ1 = 5 t/ha. As proposal distribution, use a normal distribution: P(θc|θi) = N(θi, 0.8²).
Repeat steps 1-3 for i=1,…,N
Step 1. Generate a candidate parameter value θc from P(θc|θi).
Step 2. Calculate
T = [P(9|θc) P(θc)] / [P(9|θi) P(θi)]
with
P(9|θ) = (1/√(2π)) exp(−(9 − θ)²/2)
and
P(θ) = (1/(2√(2π))) exp(−(θ − 5)²/(2 × 2²))
Step 3. Draw u from a uniform distribution on the interval (0, 1). If u < min(1, T), then θi+1 = θc; otherwise θi+1 = θi.
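Steps 0-3 can be run end to end for this example. The course provides the R program MHyield.txt for this; the following is a Python sketch of the same computation, using the starting value and proposal from this slide.

```python
import math
import random

def log_post(theta):
    # log of P(9|theta) * P(theta), constants dropped:
    # likelihood N(theta, 1) at y = 9, prior N(5, 2^2)
    return -((9.0 - theta) ** 2) / 2.0 - ((theta - 5.0) ** 2) / 8.0

rng = random.Random(42)
n_iter = 50000
chain = [5.0]                            # Step 0: theta_1 = 5 t/ha
for _ in range(n_iter - 1):
    theta_i = chain[-1]
    theta_c = rng.gauss(theta_i, 0.8)    # Step 1: proposal N(theta_i, 0.8^2)
    log_t = log_post(theta_c) - log_post(theta_i)   # Step 2
    # Step 3: accept with probability min(1, T)
    chain.append(theta_c if math.log(rng.random()) < log_t else theta_i)

tail = chain[25000:]                     # discard the first half as burn-in
mean = sum(tail) / len(tail)
sd = math.sqrt(sum((x - mean) ** 2 for x in tail) / len(tail))
# mean and sd should be close to the exact posterior values 8.2 and 0.89
```

With N = 50000 and 25000 values discarded, the chain's mean and standard deviation approximate the exact posterior well, as the results on the next slides show.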
Example 1 – Results, N = 500, chain 1
[Figure: chain of parameter values; posterior distribution approximated from the last 250 values, compared with the true posterior distribution and the measurement.]
Example 1 – Results, N = 500, chain 2
[Figure: chain of parameter values; posterior distribution approximated from the last 250 values, compared with the true posterior distribution and the measurement.]
Example 1 – Results, N = 50000, chain 1
[Figure: chain of parameter values; posterior distribution approximated from the last 25000 values, compared with the true posterior distribution and the measurement.]
Example 1 – Results, N = 50000, chain 2
[Figure: chain of parameter values; posterior distribution approximated from the last 25000 values, compared with the true posterior distribution and the measurement.]
Example 1 – Running the R program
• Save the file "MHyield.txt" on your computer and note its path. This file contains the R program that does the calculations.
• Open R.
• You will use the "source" command to run the program:
– The command is given as a comment in the first line of the program.
– In my case, I had to type: source("c:\\David\\Enseignements\\Cours ICASA\\MHyield.txt").
– You must replace the path name by your own path name.
– Copy and paste the corrected command (without the "#" character) into the Commands window of R.
– Press RETURN to execute.
• You can easily change the value of N, the measurement value, its accuracy… See comments in my R function MHyield.
Example 2
Estimation of the three parameters of a model of yield response to fertilizer.
• Nonlinear model relating wheat yield to nitrogen fertilizer dose.
Yield = θ1 + θ2 (Dose – θ3) if Dose < θ3
Yield = θ1 if Dose ≥ θ3
• Objective: Estimation of the 3 parameters for a given wheat field.
[Figure: linear-plus-plateau response curve of yield versus N dose, with plateau θ1, slope θ2, and threshold dose θ3.]
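The two-branch model above translates directly into code. A minimal Python sketch (the parameter names follow the slide):

```python
def yield_response(dose, theta1, theta2, theta3):
    """Linear-plus-plateau model: yield rises with slope theta2 (t/kg N)
    up to the threshold dose theta3 (kg N/ha), then stays at the
    plateau value theta1 (t/ha)."""
    if dose < theta3:
        return theta1 + theta2 * (dose - theta3)
    return theta1
```

Note that the two branches join continuously at Dose = θ3, where both give Yield = θ1.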
Example 2 – Prior distribution
• The prior distribution of the parameters was defined in a previous study (Makowski and Lavielle. 2006. JABES 11, 45-60).
• It represents the between-field variability of the parameter values in a region (the Paris basin).
P(θ1) = N(9.18, 1.16²) t/ha (maximal yield value)
P(θ2) = N(0.026, 0.0065²) t/kg N (slope of the linear part)
P(θ3) = N(123.85, 46.7²) kg N/ha (N dose threshold)
• The prior means define the "average" response curve in the region of interest.
Example 2 - Data
Data collected in a new wheat plot in the same region.
Four yield measurements obtained in this plot for four different N doses.
Tested doses: 0, 50, 100, and 200 kg/ha.
Corresponding yield measurements in the plot: 2.50, 5.01, 7.45, and 7.51 t/ha.
Example 2 - Likelihood
P(Y1, Y2, Y3, Y4 | θ) = P(Y1|θ) P(Y2|θ) P(Y3|θ) P(Y4|θ)
with
P(Yj|θ) = (1/(σ√(2π))) exp(−(Yj − f(Dj; θ))²/(2σ²))
f(D; θ) is the linear-plus-plateau response function (D = N dose).
σ was estimated in a previous study and is set equal to 0.3.
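This likelihood, combined with the three prior distributions given earlier, defines the log-posterior for the three-parameter chain. The course's program is the R file MHresponse.txt; the Python sketch below is illustrative only, and the per-parameter proposal standard deviations are guesses of mine to be tuned, not values from the slides.

```python
import math
import random

doses = [0.0, 50.0, 100.0, 200.0]      # tested N doses (kg/ha)
yields = [2.50, 5.01, 7.45, 7.51]      # measured yields (t/ha)
sigma = 0.3                            # measurement sd, fixed

prior_means = [9.18, 0.026, 123.85]    # theta1, theta2, theta3
prior_sds = [1.16, 0.0065, 46.7]

def f(dose, th):
    # linear-plus-plateau response function
    return th[0] + th[1] * (dose - th[2]) if dose < th[2] else th[0]

def log_post(th):
    # log likelihood: product of 4 independent normal densities
    ll = sum(-((y - f(d, th)) ** 2) / (2 * sigma ** 2)
             for d, y in zip(doses, yields))
    # log prior: independent normals for the 3 parameters
    lp = sum(-((t - m) ** 2) / (2 * s ** 2)
             for t, m, s in zip(th, prior_means, prior_sds))
    return ll + lp

rng = random.Random(7)
proposal_sds = [0.3, 0.003, 10.0]      # illustrative guesses, one per parameter
chain = [prior_means[:]]               # start the chain at the prior means
for _ in range(50000 - 1):
    th_i = chain[-1]
    th_c = [rng.gauss(t, s) for t, s in zip(th_i, proposal_sds)]
    log_t = log_post(th_c) - log_post(th_i)
    chain.append(th_c if math.log(rng.random()) < log_t else th_i)

tail = chain[25000:]                   # discard the first half as burn-in
post_means = [sum(th[k] for th in tail) / len(tail) for k in range(3)]
```

Here all three parameters are updated together in one Metropolis-Hastings step; updating them one at a time, or via Gibbs sampling, are common alternatives.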
Example 2 – Results with N = 50000, chain 1
[Figure: chains of the three parameter values.]
Example 2 – Results with N = 50000, chain 1
[Figure: response curve based on the prior means versus the curve based on the posterior means.]
Example 2 – Results with N = 50000, chain 2
[Figure: chains of the three parameter values.]
Example 2 – Results with N = 50000, chain 2
[Figure: response curve based on the prior means versus the curve based on the posterior means.]
Example 2 – Running the R program
• The R program is in the file MHresponse.txt.
• To run this function yourself, follow the previous instructions.
• Press RETURN after the first series of graphs to obtain the second series.
Conclusion
Importance sampling versus MCMC
• Both methods can be used to approximate the posterior distribution of parameters for any model.
• Both methods require the definition of a proposal distribution to generate parameter values, which is not easy in practice.
• The comparison of the two types of methods is an active area of research.
• MCMC methods (Gibbs sampling and Metropolis-Hastings) can easily be implemented with the WinBUGS software. See http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml.