BAYESIAN STATISTICS
Rothman, Greenland, and Lash: Ch. 18.
Bayesian Statistics

Bayesian Approach to Probability and Statistics

Prior Distributions
  Subjective Priors: Prior Belief
  Objective Bayes and Reference Priors
  Conjugate Priors
  Priors as Regularization Tools

Basic Bayesian Approaches
  Information-weighted Averages of Prior and Likelihood
  Data Priors
  Bayesian Interpretation of Frequentist Methods
  Bayes Factors

Further Developments
  Hierarchical Models
  Computational Tools

Advantages and Controversies
Bayesian Statistics
Bayesian Approach to Probability and Statistics

What are probabilities?

Relative frequency interpretation: the probability of an event is the limit of the relative frequency of trials in which that event occurs, as the number of trials increases without bound. But what if you are conducting an observational study that cannot be replicated in any meaningful way?

Subjective probability interpretation: probabilities may represent degrees of belief, ranging from:
  Probability = 0: certainty that an event does not occur
  Probability = 1: certainty that an event does occur
But maybe you don’t care what my subjective belief is.
Bayesian Statistics

Bayesian methods are often contrasted with frequentist methods (for example, hypothesis tests and the calculation of p-values or confidence intervals). Bayesian and frequentist methods are not necessarily mutually exclusive, but historically and in common usage they are separate. A starting point for considering their distinction is their different interpretations of probability.

Relative frequency interpretation: Starting from the sampling or generating process that produced the dataset, one can consider the distributions of statistics such as effect estimates and p-values, and probabilities such as the probability that a confidence interval covers the true value or the probability of a false negative result. Frequentist methods are designed to perform well when scored by probabilities such as these.

Subjective probability interpretation: An analyst could record a state of belief before looking at the data, then use a statistical method to update those beliefs based on the data. Bayesian methods may be interpreted this way, as an inductive learning process.
Bayesian Statistics

Many texts covering Bayesian statistics have been published recently. These two are modern classics.
Bayesian Statistics
Bayes’ Rule

Pr(hypothesis | data) = Pr(data | hypothesis) × Pr(hypothesis) / Pr(data)

The denominator, Pr(data), can be difficult to calculate, but fortunately that is not always necessary.
Bayesian Statistics
Bayes’ Rule
Example: using Bayes’ Rule to calculate the predictive value of a positive test result (PVP).

Suppose you are interested in a disease (D) and a diagnostic test for the disease (T):
  The sensitivity of the test is 0.99
  The specificity of the test is 0.90
  The prevalence of the disease is 0.02
What is the predictive value of a positive test result?
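The arithmetic of this example can be sketched directly from Bayes’ Rule (a minimal Python sketch; the numbers are the ones given above):

```python
# Predictive value of a positive test (PVP) via Bayes' Rule:
# Pr(D=1 | T=+) = Se * prev / (Se * prev + (1 - Sp) * (1 - prev))
sensitivity = 0.99   # Pr(T=+ | D=1)
specificity = 0.90   # Pr(T=- | D=0)
prevalence = 0.02    # Pr(D=1)

numerator = sensitivity * prevalence
denominator = numerator + (1 - specificity) * (1 - prevalence)
pvp = numerator / denominator
print(round(pvp, 3))
```

Despite the high sensitivity, the low prevalence pulls the PVP down to about 17%: most positive results come from the large disease-free group.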
Bayesian Statistics
Bayes’ Rule itself is not controversial: all statisticians would agree to use it in probability calculations such as the example above. Statisticians would also generally agree to use Bayesian methods when they have strongly held subjective beliefs to put into a prior and when the methods are computationally feasible.

A defining feature of Bayesian statistics is that a prior and Bayes’ Rule are always used in analyses, even if the prior is not particularly informative.

There are several common classes of prior distributions:
  Subjective Priors: Prior Belief
  Conjugate Priors
  Objective Bayes and Reference Priors
  Priors as Regularization Tools
Bayesian Statistics
Prior Distributions Subjective Priors: Prior Belief
Albert (2007) Bayesian Computation With R
Bayesian Statistics
Prior Distributions
Subjective Priors: Prior Belief

If you know something about the quantities you are trying to estimate, you can encode that information as a prior distribution and then use the data to refine that prior knowledge. Generally you know something about your quantities of interest; often you know a great deal:
  Adult human heights take positive values.
  Adult human heights are almost always between 4 and 7 feet.
  If you are working with a specific population, you can probably give narrow height intervals that will cover at least 50%, 90%, or other fractions of the population.

Cromwell’s Rule: if a prior assigns probability 0 to an event, then the posterior distribution will also place probability 0 on that event, no matter what the data say.
Bayesian Statistics
Prior Distributions
Conjugate Priors

Many common probability distributions come in pairs such that:
  If the likelihood is proportional to one member of the pair,
  And the prior distribution is the conjugate member of the pair,
  Then the posterior will belong to the same family as the prior, with easily updated parameters.

Most of the examples of Bayesian computations in the RGL text take advantage of the fact that the conjugate prior for a Normal likelihood with known variance is another Normal distribution.
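As a sketch of how simple the conjugate update is, here is the Normal–Normal case with known data variance; the prior and data values below are hypothetical:

```python
# Conjugate update: Normal prior on a mean, Normal likelihood, known variance.
# Posterior precision = prior precision + data precision; the posterior mean
# is the precision-weighted average of the prior mean and the sample mean.
def normal_update(m0, v0, ybar, sigma2, n):
    """Return posterior (mean, variance) for a Normal mean.
    m0, v0: prior mean and variance; ybar: sample mean;
    sigma2: known data variance; n: sample size."""
    post_var = 1.0 / (1.0 / v0 + n / sigma2)
    post_mean = post_var * (m0 / v0 + n * ybar / sigma2)
    return post_mean, post_var

# Hypothetical numbers: vague prior N(0, 100); 25 observations with
# sample mean 1.2 and known data variance 4.
m, v = normal_update(m0=0.0, v0=100.0, ybar=1.2, sigma2=4.0, n=25)
```

With a vague prior the posterior mean stays very close to the sample mean; a tighter prior would pull it toward the prior mean.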
Bayesian Statistics
Prior Distributions
Objective Bayes

You may want to use uninformative priors that do not contain very much information, so that the posterior is driven mainly by the data rather than by the prior. You may even be able to use a prior that is not a proper probability distribution, for example a constant prior over the entire real line when you are trying to estimate the average height of a population. The constant prior does not push the estimate toward any particular values, so it does not influence the estimate. However, improper priors can lead to numerical problems in computations and may be difficult to interpret.

There are various approaches to defining weakly informative default or reference priors when:
  You do not want to overwhelm the data with a strong prior, or
  You do not have any background information on which to base an informative prior.
Using such weak priors is called the “objective Bayesian” approach.
Bayesian Statistics
Prior Distributions Priors as Regularization Tools
Sometimes priors can help reduce or avoid computational instabilities, in particular when working with sparse datasets. For example, adding 1 or ½ to each cell of a 2x2 table can be interpreted as the use of a fairly weak prior in the analysis.

“I find the term ‘weakly informative priors’ to be very useful […] for the purposes of regularization (‘keeping things unridiculous’) in sparse-data settings.”
Andrew Gelman: http://andrewgelman.com/2011/12/more-by-berger-and-me-on-weakly-informative-priors/
Bayesian Statistics Basic Bayesian Approaches
Information-weighted Averages of Prior and Likelihood
  Example: normal prior, normal likelihood => normal posterior
  Example: beta prior, binomial likelihood => beta posterior
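The beta–binomial pair makes the “information-weighted average” idea concrete; the prior parameters and data below are hypothetical:

```python
# Beta prior + binomial likelihood => Beta posterior (conjugate pair).
# A Beta(a, b) prior with k successes in n trials gives Beta(a + k, b + n - k).
a, b = 2.0, 2.0        # hypothetical prior: mildly favors p near 0.5
k, n = 7, 10           # observed: 7 successes in 10 trials
a_post, b_post = a + k, b + (n - k)
post_mean = a_post / (a_post + b_post)   # posterior mean of p

# The posterior mean is a weighted average of the prior mean a/(a+b) = 0.5
# and the sample proportion k/n = 0.7, with weight n/(a+b+n) on the data.
```

Here the posterior mean 9/14 ≈ 0.643 sits between the prior mean and the observed proportion, weighted by their relative information.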
Bayesian Statistics Basic Bayesian Approaches
Data Priors

Standard Bayesian methods work with prior DISTRIBUTIONS and represent the essential information in the data using likelihood FUNCTIONS. Instead, you could work with the data itself together with some “pseudodata” that would be typical of the intended prior, or more correctly, that has a likelihood proportional to the intended prior.

For example, we can work backwards from a common epidemiologic device to see it as a data prior. When we have few observations or unbalanced designs, it is common practice to add 1 or ½ to each cell of a 2x2 table. Suppose we add 1 to each cell. Using just these 4 pseudodata observations, we have an odds ratio of 1, a log odds ratio of 0, and a standard error for the log odds ratio of sqrt(1+1+1+1) = 2. Using a normal approximation for the log odds ratio, this suggests a normal prior N(mean = 0, var = 2×2 = 4). Adding the pseudocounts to the 2x2 table corresponds to assuming this prior and conducting a Bayesian analysis.
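The pseudodata argument can be checked numerically. The 2x2 counts below are hypothetical; the second computation is the approximate-normal (precision-weighted) version of the same Bayesian analysis, so the two shrunken estimates should be similar but not identical:

```python
import math

# Data prior via pseudocounts: adding 1 to each cell of a 2x2 table acts
# roughly like a N(0, 2^2) prior on the log odds ratio.
# Hypothetical sparse table: exposed cases, exposed controls,
# unexposed cases, unexposed controls.
a, b, c, d = 3, 10, 1, 12

def log_or_and_se(a, b, c, d):
    """Woolf log odds ratio and its standard error."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return log_or, se

# 1) Pseudocount analysis: add 1 to every cell, then estimate as usual.
lo_pseudo, se_pseudo = log_or_and_se(a + 1, b + 1, c + 1, d + 1)

# 2) Bayesian view: precision-weighted average of the N(0, 4) prior
#    and the raw-data log odds ratio.
lo_data, se_data = log_or_and_se(a, b, c, d)
w_prior, w_data = 1 / 4.0, 1 / se_data**2
lo_post = (w_prior * 0.0 + w_data * lo_data) / (w_prior + w_data)
```

Both versions shrink the raw log odds ratio (about 1.28) toward the prior mean of 0, which is the regularizing effect the pseudocounts are meant to provide.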
Bayesian Statistics Basic Bayesian Approaches
Bayesian Interpretation of Frequentist Methods

In simple cases, if you use a constant, improper prior, the posterior distribution is a normalized version of the likelihood. So maximum-likelihood and other likelihood-based methods can be considered Bayesian methods with a particular uninformative prior.

Note: a prior that is flat on one scale will not be flat on a different scale. For example, if you transform the data from the natural scale to the log scale, a prior that was flat on the natural scale will put much more weight on large values on the log scale.
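A quick numeric illustration of this scale effect, assuming a prior uniform on (0, 100) for concreteness:

```python
import math

# A prior flat on the natural scale is not flat on the log scale.
# If X ~ Uniform(0, 100) (constant density), then Y = log(X) has density
# exp(y)/100: two log-scale intervals of equal width can carry very
# different probability.
width = math.log(2)           # both intervals below have log-width log(2)
p_low  = (2 - 1) / 100        # Pr(1 < X < 2)    = Pr(log 1  < Y < log 2)
p_high = (100 - 50) / 100     # Pr(50 < X < 100) = Pr(log 50 < Y < log 100)
# Same width on the log scale, but 50x more probability at the high end.
```

So a “noninformative” flat prior on the natural scale is in fact quite informative on the log scale, favoring large values.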
For simple models, even with informative priors, the posterior distribution and the scaled likelihood converge as the sample size grows. So likelihood-based statistics such as ML estimates and confidence intervals can be considered large-sample approximations to Bayesian estimates and intervals.

Many Bayesian applications involve hierarchical models. There are also frequentist hierarchical models (mixed-effects models, empirical Bayes, ...), and important properties (such as shrinkage) are demonstrated by the whole family.
Bayesian Statistics Basic Bayesian Approaches
Bayes Factors: used for Bayesian model comparison and model selection. They may be difficult to calculate, so approximations exist.
Bayesian Statistics
Further Developments Hierarchical Models and Computational Tools
Bayesian Statistics
Use a Bayesian hierarchical model to fit each trajectory with its own breakpoint and slopes. The data were separated into groups by initial CD4 count, and different distributions were used for the slopes of the trajectories in different groups.

Level 1: log(CD4 count) as a function of the slopes and breakpoint
Level 2: slopes and breakpoints as functions of hyperparameters
Level 3: priors on hyperparameters: α ~ N(0, s)
Level 4: weakly informative priors on s from Level 3
Bayesian Statistics
Bayesian Statistics

Fitting the model. There are simple tools to fit Bayesian hierarchical models such as this one: Gibbs sampling, MCMC more generally, importance sampling, and their extensions. The authors of this paper used WinBUGS. For many applications there are special-purpose computational tools that build and fit Bayesian models.
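As a toy illustration of the Gibbs idea (not the paper’s model), here is a minimal two-block sampler for a Normal mean and variance with conjugate priors; the data are simulated and the prior settings hypothetical:

```python
import math
import random

random.seed(1)

# Toy Gibbs sampler: Normal data, unknown mean mu and variance sigma2,
# conjugate priors mu ~ N(m0, v0) and sigma2 ~ Inverse-Gamma(a0, b0).
# Alternates draws from the two full conditional distributions.
y = [random.gauss(5.0, 2.0) for _ in range(200)]   # simulated data
n, ybar = len(y), sum(y) / len(y)
m0, v0, a0, b0 = 0.0, 100.0, 2.0, 2.0               # weak hypothetical priors

mu, s2 = 0.0, 1.0                                    # initial values
draws_mu = []
for it in range(3000):
    # mu | sigma2, y ~ Normal (conjugate update)
    v = 1.0 / (1.0 / v0 + n / s2)
    m = v * (m0 / v0 + n * ybar / s2)
    mu = random.gauss(m, math.sqrt(v))
    # sigma2 | mu, y ~ Inverse-Gamma: draw a Gamma variate, then invert
    a_cond = a0 + n / 2.0
    b_cond = b0 + 0.5 * sum((yi - mu) ** 2 for yi in y)
    s2 = 1.0 / random.gammavariate(a_cond, 1.0 / b_cond)
    if it >= 500:                                     # discard burn-in
        draws_mu.append(mu)

post_mean_mu = sum(draws_mu) / len(draws_mu)          # posterior mean of mu
```

Tools like WinBUGS automate exactly this kind of alternation over the full conditionals, but for models with many more levels and parameters.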
Bayesian Statistics Bayesian Regression in R
Andrew Gelman, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung Su (2008). “A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models.” The Annals of Applied Statistics 2(4): 1360-1383.

library(arm): bayesglm() provides Bayesian versions of standard regression functions, with weakly informative normal, t, or Cauchy priors on the regression coefficients.

Note how the standard logistic regression, glm(), gives wild estimates for 1964.
Bayesian Statistics

Sensitivity Analysis: try several different priors.

Example: reanalysis of clinical trial data to see if, or when, the trial would have been stopped had a Bayesian interim analysis been part of the trial’s analysis plan.
Bayesian Statistics
A Bayesian hierarchical model for survival data. Interest is in the posterior distribution of β, the log hazard ratio: β := log(hazard on placebo / hazard on treatment). Early stopping rule: stop the trial if Pr(β > 0.36) >= 0.95.
Bayesian Statistics
Noninformative prior
Bayesian Statistics
Skeptical prior
Bayesian Statistics
Prior based on external information
Bayesian Statistics

The protocol was to stop the trial because of evidence of a beneficial effect of treatment if Pr(β > 0.36) >= 0.95. The different priors imply different stopping times. Bayesian methods are very natural in sequential analyses: the posterior from the first interim analysis can become the prior for the second interim analysis.
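The stopping rule Pr(β > 0.36) >= 0.95 can be checked directly from a Normal approximation to the posterior; the posterior mean and SD below are hypothetical, not values from the trial:

```python
import math

# Checking a Bayesian stopping rule: given a Normal posterior for the
# log hazard ratio beta, stop if Pr(beta > 0.36) >= 0.95.
def prob_beta_exceeds(post_mean, post_sd, threshold=0.36):
    """Pr(beta > threshold) under a Normal(post_mean, post_sd^2) posterior."""
    z = (threshold - post_mean) / post_sd
    return 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail Normal probability

p = prob_beta_exceeds(post_mean=0.80, post_sd=0.25)   # hypothetical posterior
stop_trial = p >= 0.95
```

Repeating this calculation under each prior (noninformative, skeptical, external-information) is exactly how the different stopping times in the reanalysis arise.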
Bayesian Statistics

Advantages
  If you have subject-matter knowledge, you can use an informative prior to get effect estimates with lower variance.
  Even weakly informative priors may reduce computational problems when using sparse data.
  Very general computational tools allow the use of new or complicated models.

Controversies
  Some people simply are not comfortable specifying priors, or seeing others specify them.
  Sometimes those nice computational tools fail; worse, it is not always clear that they have failed.
  Non-Bayesian methods are used with good or adequate results in many applications.

Differences
  Bayesians often talk about entire posterior distributions, or summarize them with expected values rather than maximum values.
  Bayesians emphasize narrow credible intervals rather than, e.g., symmetric confidence intervals.
BIAS ANALYSIS
Rothman, Greenland, and Lash: Ch. 19.
Bias Analysis
  Analysis of Unmeasured Confounders
  Analysis of Misclassification
  Probabilistic Bias Analysis
Bias Analysis

Why bias analysis? Conventional methods assume:
  random assignment to treatment
  random sampling, or random sampling within strata
  random measurement errors, if any
But these assumptions can be wrong, producing systematic errors that may be bigger than the random errors. In big studies, standard errors become relatively small, so systematic errors become more apparent.
Bias Analysis

Analysis of or adjustment for systematic errors requires guesses about the likely sizes of the biases. Different analysts may have different guesses, and one may try a set or distribution of guesses rather than a single guess. Quantitative bias analysis: “Such assessments may argue persuasively that certain sources of bias cannot by themselves plausibly explain a study result, or that a bias explanation cannot be ruled out.” (RGL p. 347)
Analysis of unmeasured confounders. Suppose you have measurements for:
  your exposure (occupational exposure to fine silica), and
  your outcome of interest (lung cancer mortality),
  but not for a potential confounder (smoking).

If we know, or are willing to assume:
  the silica-exposure-stratum-specific associations of smoking with lung cancer mortality,
  the prevalence of smoking in each silica-exposure stratum, and
  the prevalence of silica exposure in controls,
then we can estimate the silica effect adjusted for smoking. With this many assumptions it is garbage in, garbage out; we could instead try many sets of assumptions and then plot or summarize the many adjusted estimates we make.
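This kind of external adjustment can be sketched with the classic bias-factor formula for a single binary confounder (assuming the confounder-outcome association is the same in both exposure strata); all numeric inputs below are hypothetical guesses of the kind a bias analysis would vary:

```python
# External adjustment for an unmeasured binary confounder (e.g., smoking).
def adjust_rr(rr_obs, rr_cd, p1, p0):
    """Confounder-adjusted rate ratio.
    rr_obs: observed exposure-outcome RR;
    rr_cd:  confounder-outcome RR (assumed equal across exposure strata);
    p1, p0: confounder prevalence among the exposed / unexposed."""
    bias = (p1 * (rr_cd - 1) + 1) / (p0 * (rr_cd - 1) + 1)
    return rr_obs / bias

# Hypothetical silica-style inputs: smoking-lung cancer RR of 10,
# smoking prevalence 0.7 among exposed vs 0.5 among unexposed.
rr_adj = adjust_rr(rr_obs=1.6, rr_cd=10.0, p1=0.7, p0=0.5)
```

Looping this function over a grid of (rr_cd, p1, p0) guesses and plotting the resulting rr_adj values is one simple way to carry out the “many sets of assumptions” analysis described above.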
Bias Analysis: Unmeasured Confounders
Bias Analysis: Misclassification

Exposure Classification:

                              True Exposed    True not Exposed
                                 (E=1)              (E=0)
Reported Exposed (e=1)       True Positive     False Positive
Reported not Exposed (e=0)   False Negative    True Negative

Sensitivity = Pr(report exposed | truly exposed)         = Pr(e=1 | E=1)
Specificity = Pr(report not exposed | truly not exposed) = Pr(e=0 | E=0)
Bias Analysis: Misclassification
Logistic Regression as a Classifier. After fitting a logistic regression model, for each individual used to fit the model you can calculate a probability of having the outcome variable = 1 (radiation therapy instead of prostatectomy).

You could use Pr(outcome = 1) = 0.5 as a cutoff: individuals with probabilities greater than the cutoff are predicted to be in the outcome = 1 category; otherwise they are predicted to be in the outcome = 0 category. By comparing these predictions to the actual outcomes, you can calculate sensitivities and specificities.
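The cutoff-and-compare step can be sketched with a handful of toy fitted probabilities (the values below are hypothetical, not from the prostate data):

```python
# Using fitted probabilities as a classifier: predict outcome = 1 when the
# model probability exceeds a cutoff, then compare to the true outcomes.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2]   # hypothetical fits
truth = [1,   1,   0,   1,   1,   0,    0,   0]      # actual outcomes

cutoff = 0.5
pred = [1 if p >= cutoff else 0 for p in probs]

tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)

sensitivity = tp / (tp + fn)   # Pr(predicted 1 | truly 1)
specificity = tn / (tn + fp)   # Pr(predicted 0 | truly 0)
```

This is the computation that Stata’s estat classification (shown below) performs on the fitted model.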
Bias Analysis: Misclassification
. logistic treat2 age race inst i.gleasoncat
Logistic regression                             Number of obs   =        330
                                                LR chi2(5)      =      93.73
                                                Prob > chi2     =     0.0000
Log likelihood = -181.84698                     Pseudo R2       =     0.2049

------------------------------------------------------------------------------
      treat2 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    1.14762   .0219378     7.20   0.000     1.105418    1.191433
        race |   5.464351   2.101918     4.41   0.000     2.571078    11.61347
        inst |    .845149   .2970819    -0.48   0.632     .4243497    1.683227
             |
  gleasoncat |
           2 |   .5591875   .2315788    -1.40   0.160     .2483394    1.259126
           3 |   .4410856   .1823707    -1.98   0.048      .196149    .9918811
------------------------------------------------------------------------------
Bias Analysis: Misclassification
. estat classification
Logistic model for treat2
                -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |       113            44  |        157
     -     |        50           123  |        173
-----------+--------------------------+-----------
   Total   |       163           167  |        330

Classified + if predicted Pr(D) >= .5
True D defined as treat2 != 0
--------------------------------------------------
Sensitivity                    Pr( +| D)    69.33%
Specificity                    Pr( -|~D)    73.65%
Positive predictive value      Pr( D| +)    71.97%
Negative predictive value      Pr(~D| -)    71.10%
--------------------------------------------------
False + rate for true ~D       Pr( +|~D)    26.35%
False - rate for true D        Pr( -| D)    30.67%
False + rate for classified +  Pr(~D| +)    28.03%
False - rate for classified -  Pr( D| -)    28.90%
--------------------------------------------------
Correctly classified                        71.52%
--------------------------------------------------
Bias Analysis: Misclassification
You could try probability cutoffs other than 0.5. ROC curves:
  Try many cutoffs between Pr(outcome=1) = 0 and Pr(outcome=1) = 1
  Calculate the sensitivity and specificity for each cutoff
  Plot sensitivity vs. 1 - specificity
Bias Analysis: Misclassification
Stata:
  .lsens : plot sensitivity and specificity vs. the cutoff
  .lroc : plot the ROC curve

Sensitivity and specificity are inversely related. The area under the ROC curve (AUC), or 2 times the area between the curve and the diagonal line (the Gini coefficient), is used to assess the quality of the classifier/prediction system. Assess the quality of a classifier with data that were not used to generate/fit/train it (cross-validation).
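The AUC can be computed by hand using its Mann-Whitney form: it equals the probability that a randomly chosen positive receives a higher fitted probability than a randomly chosen negative. The toy probabilities below are hypothetical:

```python
# Sketch of an AUC (and Gini) computation from fitted probabilities.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2]   # hypothetical fits
truth = [1,   1,   0,   1,   1,   0,    0,   0]      # actual outcomes

pos = [p for p, t in zip(probs, truth) if t == 1]
neg = [p for p, t in zip(probs, truth) if t == 0]

# AUC = Pr(score of a random positive > score of a random negative),
# counting ties as 1/2 (the Mann-Whitney statistic).
pairs = [(p, q) for p in pos for q in neg]
auc = sum(1.0 if p > q else 0.5 if p == q else 0.0
          for p, q in pairs) / len(pairs)
gini = 2 * auc - 1   # Gini = 2 * (area between the ROC curve and diagonal)
```

An AUC of 0.5 corresponds to a useless classifier (the diagonal line) and 1.0 to a perfect one; here the toy classifier reaches 0.875.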
Bias Analysis: Probabilistic Bias Analysis
Simulating or correcting misclassification of EITHER the exposure or the outcome is not difficult. If both the exposure and the outcome can be misclassified, there is the additional issue of whether or not the two types of misclassification are independent. Misclassification may also depend on the levels of other covariates.
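A simple correction for exposure misclassification alone can be sketched with the matrix method: back-calculate the “true” exposed counts from the observed counts, sensitivity, and specificity, then recompute the odds ratio. All counts and classification probabilities below are hypothetical:

```python
# Matrix-method correction for nondifferential exposure misclassification.
# Observed exposed = Se * TrueExposed + (1 - Sp) * (N - TrueExposed),
# so TrueExposed = (observed - (1 - Sp) * N) / (Se + Sp - 1).
def corrected_exposed(obs_exposed, total, se, sp):
    """True exposed count implied by the observed count under (se, sp)."""
    return (obs_exposed - (1 - sp) * total) / (se + sp - 1)

se, sp = 0.9, 0.8                       # hypothetical Se and Sp
cases_exp_obs, cases_n = 60, 100        # hypothetical observed cases
ctrls_exp_obs, ctrls_n = 40, 100        # hypothetical observed controls

a = corrected_exposed(cases_exp_obs, cases_n, se, sp)   # true exposed cases
c = corrected_exposed(ctrls_exp_obs, ctrls_n, se, sp)   # true exposed controls
b, d = cases_n - a, ctrls_n - c

or_corrected = (a * d) / (b * c)
or_observed = (cases_exp_obs * (ctrls_n - ctrls_exp_obs)) / \
              ((cases_n - cases_exp_obs) * ctrls_exp_obs)
```

Note that the corrected odds ratio (about 3.33) is farther from the null than the observed one (2.25): nondifferential misclassification typically biases the estimate toward the null. Probabilistic bias analysis repeats this correction many times with (se, sp) drawn from prior distributions.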
Bias Analysis: Probabilistic Bias Analysis