Page 1: Uncertainty and confidence intervals Statistical estimation methods, Finse Friday 10.9.2010, 12.45–14.05 Andreas Lindén.

Uncertainty and confidence intervals

Statistical estimation methods, Finse

Friday 10.9.2010, 12.45–14.05

Andreas Lindén

Page 2: Outline

• Point estimates and uncertainty
• Sampling distribution
  – Standard error
  – Covariation between parameters
• Finding the VC-matrix for the parameter estimates
  – Analytical formulas
  – From the Hessian matrix
  – Bootstrapping
• The idea behind confidence intervals
• General methods for constructing confidence intervals of parameters
  – CI based on the central limit theorem
  – Profile likelihood CI
  – CI by bootstrapping

Page 3: Point estimates and uncertainty

• The main output of any statistical model fit is the set of parameter estimates
  – Point estimates: one value for each parameter
  – The effect sizes
  – They answer the question “how much”
• Point estimates are of little use without an assessment of uncertainty
  – Standard error
  – Confidence intervals
  – p-values
  – Estimated sampling distribution
  – Bayesian credible intervals
  – Plot of the Bayesian posterior distribution

Page 4: Sampling distribution

• The probability distribution of a parameter estimate
  – Calculated from a sample
  – Variability due to sampling effects
• Typically depends on sample size or the number of degrees of freedom (df)
• Examples of common sampling distributions
  – Student’s t-distribution
  – F-distribution
  – χ²-distribution

Page 5: Degrees of freedom

[Figure: plot of Y against X]

In a linear regression, df = n – 2 (two parameters, the intercept and the slope, are estimated from the n observations).

Page 6: Properties of the sampling distribution

• The standard error (SE) of a parameter estimate is the estimated standard deviation of its sampling distribution
  – Square root of the parameter variance
• Parameter estimates are not necessarily unrelated
  – The sampling distribution of several parameters is multivariate
  – Example: regression slope and intercept

Page 7: Linear regression – simulated data

Param.        a      b      σ²
True value    4.00   1.00   0.80
Estim. 1      4.29   0.96   0.70
Estim. 2      4.13   0.97   0.36
Estim. 3      3.86   0.98   0.83
Estim. 4      3.77   1.04   0.75
Estim. 5      3.63   1.06   0.63
Estim. 6      4.39   0.93   0.72
Estim. 7      3.80   0.98   0.91
Estim. 8      3.78   1.06   0.92
Estim. 9      3.74   1.07   0.69
Estim. 10     4.62   0.84   0.50
…             …      …      …
Estim. 100    3.54   1.06   0.71

Page 8: Properties of the sampling distribution (continued)

Covariance (COV) and correlation (CORR) matrices of the parameter estimates. The values are consistent with rows and columns in the order a, b, σ² of the simulated regression.

COV =
   0.1531  -0.0273   0.0031
  -0.0273   0.0059   0.0002
   0.0031   0.0002   0.0335

CORR =
   1.0000  -0.9085   0.0432
  -0.9085   1.0000   0.0159
   0.0432   0.0159   1.0000
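The strong negative correlation between intercept and slope can be reproduced by simulation. The sketch below, in Python, refits a line to repeatedly simulated data and summarizes the spread of the estimates. The true values a = 4, b = 1, σ² = 0.8 are taken from the simulated-data table; the design (x = 1, ..., 20), the number of replicates and the helper names `fit_line` and `covariance` are assumptions made for illustration, so the numbers will not match the matrices shown here.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def covariance(u, v):
    """Sample covariance of two equally long lists."""
    u_bar, v_bar = statistics.fmean(u), statistics.fmean(v)
    return sum((p - u_bar) * (q - v_bar) for p, q in zip(u, v)) / (len(u) - 1)

# True values as in the table; the design x = 1, ..., 20 is an assumption.
a_true, b_true, sigma2_true = 4.0, 1.0, 0.8
xs = list(range(1, 21))
rng = random.Random(1)

a_hats, b_hats = [], []
for _ in range(1000):
    # Simulate one data set and re-estimate intercept and slope.
    ys = [a_true + b_true * x + rng.gauss(0.0, math.sqrt(sigma2_true)) for x in xs]
    a_hat, b_hat = fit_line(xs, ys)
    a_hats.append(a_hat)
    b_hats.append(b_hat)

se_a = math.sqrt(covariance(a_hats, a_hats))   # SD of the simulated intercepts
se_b = math.sqrt(covariance(b_hats, b_hats))   # SD of the simulated slopes
corr_ab = covariance(a_hats, b_hats) / (se_a * se_b)
print(f"SE(a) = {se_a:.3f}, SE(b) = {se_b:.3f}, corr(a, b) = {corr_ab:.3f}")
```

With all x values positive, an overestimated slope has to be compensated by an underestimated intercept, which is why the correlation comes out strongly negative.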

Page 9: Properties of the sampling distribution (continued)

• Methods to obtain the variance-covariance (VC) matrix (or standard errors) for a set of parameters
  – Analytical formulas
  – Bootstrap
  – The inverse of the Hessian matrix

Page 10: Parameter variances analytically

• For many common situations the SE and VC-matrix of a set of parameters can be calculated with analytical formulas
• Standard error of the sample mean
• Standard error of the estimated binomial probability
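The standard textbook expressions for these two cases, presumably the ones intended here, are given below. The symbols are assumptions: s is the sample standard deviation, n the sample size, and p̂ = x/n the observed proportion of successes.

```latex
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}},
\qquad
\mathrm{SE}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}
```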

Page 11: Bootstrap

• The bootstrap is a general and common resampling method
• Used to simulate the sampling distribution
• Information in the sample itself is used to mimic the original sampling procedure
  – Non-parametric bootstrap: sampling with replacement
  – Parametric bootstrap: simulation based on parameter estimates
• The procedure is repeated B times (e.g. B = 1000)
• To make inference from the bootstrapped estimates
  – Sample standard deviation = bootstrap estimate of SE
  – Sample VC-matrix = bootstrap estimate of VC-matrix
  – Mean: the difference between the bootstrap mean and the original estimate is an estimate of bias
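The non-parametric variant can be sketched in a few lines of Python. The example below bootstraps the sample mean and compares the bootstrap SE with the analytic value s/√n. The data are hypothetical (50 simulated normal draws), and the function name `bootstrap_se_mean` and the seeds are illustrative assumptions.

```python
import math
import random
import statistics

def bootstrap_se_mean(sample, B=1000, seed=1):
    """Non-parametric bootstrap: SE and bias estimate for the sample mean."""
    rng = random.Random(seed)
    n = len(sample)
    boot_means = []
    for _ in range(B):
        # Resample n observations with replacement and re-estimate the mean.
        resample = rng.choices(sample, k=n)
        boot_means.append(statistics.fmean(resample))
    se = statistics.stdev(boot_means)   # bootstrap estimate of SE
    bias = statistics.fmean(boot_means) - statistics.fmean(sample)
    return se, bias

# Hypothetical data: 50 draws from a normal distribution (mean 10, SD 2).
data_rng = random.Random(42)
sample = [data_rng.gauss(10.0, 2.0) for _ in range(50)]

boot_se, boot_bias = bootstrap_se_mean(sample)
analytic_se = statistics.stdev(sample) / math.sqrt(len(sample))
print(f"bootstrap SE = {boot_se:.3f}, analytic SE = {analytic_se:.3f}, bias = {boot_bias:.3f}")
```

For the sample mean the two SE values should agree closely and the bias estimate should be near zero; the bootstrap becomes useful for estimators that have no simple analytic SE.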

Page 12: VC-matrix from the Hessian

• The Hessian matrix (H)
  – Matrix of second partial derivatives of the negative log-likelihood, evaluated at the ML-estimate
  – Typically given as an output by software for numerical optimization
• The inverse of the Hessian is an estimate of the parameters’ variance-covariance matrix
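A one-parameter case makes the idea concrete, since the Hessian is then a single number. The Python sketch below approximates the second derivative of a binomial negative log-likelihood by finite differences and inverts it. The data (30 successes in 100 trials), the step size and the function names are assumptions chosen for illustration; optimization software would normally return the Hessian directly.

```python
import math

def neg_log_lik(p, x, n):
    """Negative log-likelihood of a binomial sample (constant term omitted)."""
    return -(x * math.log(p) + (n - x) * math.log(1.0 - p))

def second_derivative(f, t, h=1e-4):
    """Central finite-difference approximation to f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

# Hypothetical data: x = 30 successes out of n = 100 trials.
x, n = 30, 100
p_hat = x / n   # ML-estimate

# With a single parameter the Hessian is a 1 x 1 matrix (a scalar).
hessian = second_derivative(lambda p: neg_log_lik(p, x, n), p_hat)
var_from_hessian = 1.0 / hessian   # inverse of the Hessian
se_from_hessian = math.sqrt(var_from_hessian)

se_analytic = math.sqrt(p_hat * (1.0 - p_hat) / n)
print(f"SE from Hessian = {se_from_hessian:.5f}, analytic SE = {se_analytic:.5f}")
```

In this model the two values coincide up to numerical error, because the second derivative at p̂ equals n / (p̂(1 − p̂)).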

Page 13: Confidence interval (CI)

• A frequentist interval estimate of one or several parameters
• A fraction α of all correctly produced CIs will fail to include the true parameter value
  – Trust your 95% CI and take the risk α = 0.05
• NB! Should not be confused with Bayesian credible intervals
  – A CI should not be thought to contain the parameter with 95% probability
  – The CI is based on the sampling distribution, not on an estimated probability distribution for the parameter of interest
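The frequentist reading of α can be checked by simulation: generate many data sets from a known model, build a CI from each, and count how often the true value is covered. The Python sketch below does this for the mean of a normal sample. The true mean, sample size, number of replicates and the function name `ci_coverage` are assumptions for illustration, and the interval uses the normal quantile rather than the t quantile, so coverage is expected to fall slightly below 95%.

```python
import math
import random
import statistics

def ci_coverage(true_mean=0.0, sd=1.0, n=50, replicates=2000, alpha=0.05, seed=1):
    """Fraction of simulated CIs for the mean that contain the true mean."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96
    hits = 0
    for _ in range(replicates):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        # Count the interval as a hit if it contains the true mean.
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1
    return hits / replicates

coverage = ci_coverage()
print(f"Share of 95% CIs that contain the true mean: {coverage:.3f}")
```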

Page 14: [Figure: one axis running from –1 to 1, the other from 0 to 100]

Page 15: CI based on central limit theorem

• The sum/mean of many random values is approximately normally distributed
  – Actually t-distributed (after standardizing with the estimated SE), with df depending on sample size and model complexity
  – Might matter with small sample size
• As a rule of thumb, an arbitrary parameter estimate ± 2*SE produces an approximate 95% confidence interval
  – With infinitely many observations ± 1.96*SE
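As a small Python sketch of the rule of thumb, the function below builds the symmetric interval from an estimate and its SE and shows where the factor 1.96 comes from. The estimate and SE are hypothetical values, and `normal_ci` is an illustrative name; with small samples the normal quantile would be replaced by a t quantile.

```python
import statistics

def normal_ci(estimate, se, alpha=0.05):
    """Symmetric CI: estimate ± z * SE, with z the 1 - alpha/2 normal quantile."""
    z = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * se, estimate + z * se

# Hypothetical values for illustration: a slope estimate and its SE.
estimate, se = 0.96, 0.08

lower, upper = normal_ci(estimate, se)
z_95 = statistics.NormalDist().inv_cdf(0.975)
print(f"z = {z_95:.3f}; 95% CI = ({lower:.3f}, {upper:.3f})")
print(f"rule of thumb: ({estimate - 2 * se:.3f}, {estimate + 2 * se:.3f})")
```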

Page 16: CI from profile likelihood

• The profile deviance
  – The change in −2*log-likelihood, in comparison to the ML-estimate
  – Asymptotically χ²-distributed (assuming infinite sample size)
• Confidence intervals can be obtained as the range around the ML-estimate for which the profile deviance is under a critical level
  – The 1 – α quantile from the χ²-distribution
  – One parameter -> df = 1 (e.g. 3.841 for α = 0.05)
  – k-dimensional profile deviance -> df = k
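The procedure can be sketched in Python for a model with a single parameter, where the profile likelihood is simply the likelihood. The example scans a grid of values for a binomial probability and keeps those whose deviance stays below 3.841. The data (30 successes in 100 trials), the grid resolution and the function names are assumptions for illustration; a root finder would give the limits more precisely than a grid.

```python
import math

def neg2_log_lik(p, x, n):
    """-2 * log-likelihood of a binomial sample (constant term omitted)."""
    return -2.0 * (x * math.log(p) + (n - x) * math.log(1.0 - p))

def profile_ci(x, n, critical=3.841, grid_size=9999):
    """Range of p whose deviance relative to the ML-estimate is below `critical`."""
    p_hat = x / n
    f_min = neg2_log_lik(p_hat, x, n)
    # Evaluate the deviance on a regular grid over (0, 1).
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    inside = [p for p in grid if neg2_log_lik(p, x, n) - f_min < critical]
    return min(inside), max(inside)

# Hypothetical data: x = 30 successes out of n = 100 trials.
lower, upper = profile_ci(30, 100)
print(f"95% profile likelihood CI: ({lower:.4f}, {upper:.4f})")
```

Unlike the estimate ± 1.96*SE interval, the resulting interval need not be symmetric around the ML-estimate.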

Page 17: 95% CI from profile deviance

[Figure: –2*LL plotted against the parameter value, with the levels Fmin and Fmin + 3.841 marked]

Page 18: 2-D confidence regions

[Figure: confidence regions in the plane of parameter a and parameter b]

• 99% confidence region: deviance χ² (df = 2) = 9.210
• 95% confidence region: deviance χ² (df = 2) = 5.991
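For df = 2 the critical deviance has a closed form, because the χ² distribution with two degrees of freedom has the cumulative distribution function 1 − exp(−q/2). The short Python sketch below evaluates it; the function name `chi2_quantile_df2` is illustrative, and for other df a statistics library or table would be needed.

```python
import math

def chi2_quantile_df2(alpha):
    """1 - alpha quantile of the chi-squared distribution with df = 2."""
    # For df = 2 the CDF is 1 - exp(-q / 2), so the quantile is -2 * ln(alpha).
    return -2.0 * math.log(alpha)

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: critical deviance = {chi2_quantile_df2(alpha):.3f}")
```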

Page 19: CI by bootstrapping

• A 100*(1 – α)% CI for a parameter can be calculated from the sampling distribution
  – The α/2 and 1 – α/2 quantiles (e.g. 0.025 and 0.975 with α = 0.05)
• In bootstrapping, simply use the sample quantiles of the simulated values
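A parametric bootstrap version of this recipe can be sketched in Python as follows. New binomial samples are simulated from the estimated probability, the estimate is recomputed for each, and the 0.025 and 0.975 sample quantiles give the interval. The data (30 successes in 100 trials), B = 2000, the seed and the function name `parametric_bootstrap` are assumptions for illustration.

```python
import random
import statistics

def parametric_bootstrap(p_hat, n, B=2000, seed=1):
    """Simulate B binomial samples of size n from p_hat; return the re-estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(B):
        # One simulated sample: n Bernoulli trials with success probability p_hat.
        successes = sum(1 for _ in range(n) if rng.random() < p_hat)
        estimates.append(successes / n)
    return estimates

# Hypothetical data: 30 successes out of 100 trials.
n = 100
p_hat = 30 / n
boot = parametric_bootstrap(p_hat, n)

# quantiles(..., n=40) returns 39 cut points in steps of 2.5%;
# the first and last are the 0.025 and 0.975 sample quantiles.
cuts = statistics.quantiles(boot, n=40)
lower, upper = cuts[0], cuts[-1]
boot_se = statistics.stdev(boot)
print(f"bootstrap SE = {boot_se:.4f}, 95% percentile CI = ({lower:.3f}, {upper:.3f})")
```

Because the simulated estimates can only take values k/n, the interval limits move in steps of 1/n; a larger B reduces the simulation noise but not this discreteness.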

Page 20: Exercises

• Data: The prevalence of an infectious disease in a human population is investigated. The infection is recorded with 100% detection efficiency. In a sample of N = 80 humans, X = 18 infections were found.
• Model: Assume that infection (x = 0 or 1) of a host individual i is an independent Bernoulli trial with probability p_i, and that the probability of infection is constant over all hosts (p_i = p).
• (This equals a logistic regression with an intercept only. Host-specific explanatory variables, such as age, condition, etc., could be used to model p_i more closely.)

Page 21: Do the following in R:

a) Calculate and plot the profile (log) likelihood of the infection probability p
b) What is the maximum likelihood estimate of p (called p̂)?
c) Construct 95% and 99% confidence intervals for p based on the profile likelihood
d) Calculate the analytic SE for p̂
e) Construct a symmetric 95% confidence interval for p based on the central limit theorem and the SE obtained in the previous exercise
f) Simulate and plot the sampling distribution of p̂ by parametric bootstrapping (B = 10000)
g) Calculate the bootstrap SE of p̂
h) Construct a 95% confidence interval for p based on the bootstrap

