BAYESIAN STATISTICS
Rothman, Greenland, and Lash: Ch. 18.
Bayesian Statistics

Bayesian Approach to Probability and Statistics

Prior Distributions
  Subjective Priors: Prior Belief
  Objective Bayes and Reference Priors
  Conjugate Priors
  Priors as Regularization Tools

Basic Bayesian Approaches
  Information-weighted Averages of Prior and Likelihood
  Data Priors
  Bayesian Interpretation of Frequentist Methods
  Bayes Factors

Further Developments
  Hierarchical Models
  Computational Tools

Advantages and Controversies
Bayesian Statistics
Bayesian Approach to Probability and Statistics

What are probabilities?

Relative frequency interpretation: the probability of an event is the limit of the relative frequency of trials in which that event occurs, as the number of trials increases without bound. But what if you are conducting an observational study that cannot be replicated in any meaningful way?

Subjective probability interpretation: probabilities may represent degrees of belief, ranging from:
  Probability = 0: certainty that an event does not occur
  Probability = 1: certainty that an event does occur
But maybe you don’t care what my subjective belief is.
Bayesian Statistics

Bayesian methods are often contrasted with frequentist methods (for example, hypothesis tests and the calculation of p-values or confidence intervals). Bayesian and frequentist methods are not necessarily mutually exclusive, but historically and in common usage they are separate. A starting point for considering their distinction is their different interpretations of probability.

Relative frequency interpretation: Starting from the sampling or generating process that produced the dataset, one can consider the distributions of statistics such as effect estimates and p-values, and probabilities such as the probability that a confidence interval covers the true value or the probability of a false negative result. Frequentist methods are designed to perform well when scored by probabilities such as these.

Subjective probability interpretation: An analyst could record a state of belief before looking at the data, then use a statistical method to update those beliefs based on the data. Bayesian methods may be interpreted this way, as an inductive learning process.
Bayesian Statistics

Many texts covering Bayesian statistics have been published recently. These two are modern classics.
Bayesian Statistics
Bayes’ Rule

Pr(hypothesis | data) = Pr(data | hypothesis) × Pr(hypothesis) / Pr(data)

The denominator, Pr(data), can be difficult to calculate, but fortunately that is not always necessary.
Bayesian Statistics
Bayes’ Rule
Example: using Bayes’ Rule to calculate the predictive value of a positive test result (PVP).

Suppose you are interested in a disease (D) and a diagnostic test for the disease (T):
  The sensitivity of the test is 0.99
  The specificity of the test is 0.90
  The prevalence of the disease is 0.02
What is the predictive value of a positive test result?
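The arithmetic of this example can be sketched directly from Bayes’ Rule (a minimal Python sketch; the numbers are the ones given above):

```python
# Predictive value of a positive test (PVP) via Bayes' Rule:
# Pr(D=1 | T=+) = Se * prev / (Se * prev + (1 - Sp) * (1 - prev))
sensitivity = 0.99   # Pr(T=+ | D=1)
specificity = 0.90   # Pr(T=- | D=0)
prevalence = 0.02    # Pr(D=1)

numerator = sensitivity * prevalence
denominator = numerator + (1 - specificity) * (1 - prevalence)
pvp = numerator / denominator
print(round(pvp, 3))
```

Despite the high sensitivity, the low prevalence pulls the PVP down to about 17%: most positive results come from the large disease-free group.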
Bayesian Statistics
Bayes’ Rule itself is not controversial: all statisticians would agree to use it in probability calculations such as the example above. Statisticians would also generally agree to use Bayesian methods when they have strongly held subjective beliefs to put into a prior and when the methods are computationally feasible.

A defining feature of Bayesian statistics is that a prior and Bayes’ Rule are always used in analyses, even if the prior is not particularly informative.

There are several common classes of prior distributions:
  Subjective Priors: Prior Belief
  Conjugate Priors
  Objective Bayes and Reference Priors
  Priors as Regularization Tools
Bayesian Statistics
Prior Distributions Subjective Priors: Prior Belief
Albert (2007) Bayesian Computation With R
Bayesian Statistics
Prior Distributions
Subjective Priors: Prior Belief

If you know something about the quantities you are trying to estimate, you can encode that information as a prior distribution and then use the data to refine that prior knowledge. Generally you know something about your quantities of interest; often you know a great deal:
  Adult human heights take positive values.
  Adult human heights are almost always between 4 and 7 feet.
  If you are working with a specific population, you can probably give narrow height intervals that will cover at least 50%, 90%, or other fractions of the population.

Cromwell’s Rule: if a prior assigns probability 0 to an event, then the posterior distribution will also place probability 0 on that event, no matter what the data say.
Bayesian Statistics
Prior Distributions
Conjugate Priors

Many common probability distributions come in pairs such that:
  If the likelihood is proportional to one member of the pair,
  And the prior distribution is the conjugate member of the pair,
  Then the posterior will belong to the same family as the prior, with easily updated parameters.

Most of the examples of Bayesian computations in the RGL text take advantage of the fact that the conjugate prior for a Normal likelihood with known variance is another Normal distribution.
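As a sketch of how simple the conjugate update is, here is the Normal–Normal case with known data variance; the prior and data values below are hypothetical:

```python
# Conjugate update: Normal prior on a mean, Normal likelihood, known variance.
# Posterior precision = prior precision + data precision; the posterior mean
# is the precision-weighted average of the prior mean and the sample mean.
def normal_update(m0, v0, ybar, sigma2, n):
    """Return posterior (mean, variance) for a Normal mean.
    m0, v0: prior mean and variance; ybar: sample mean;
    sigma2: known data variance; n: sample size."""
    post_var = 1.0 / (1.0 / v0 + n / sigma2)
    post_mean = post_var * (m0 / v0 + n * ybar / sigma2)
    return post_mean, post_var

# Hypothetical numbers: vague prior N(0, 100); 25 observations with
# sample mean 1.2 and known data variance 4.
m, v = normal_update(m0=0.0, v0=100.0, ybar=1.2, sigma2=4.0, n=25)
```

With a vague prior the posterior mean stays very close to the sample mean; a tighter prior would pull it toward the prior mean.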
Bayesian Statistics
Prior Distributions
Objective Bayes

You may want to use uninformative priors that do not contain very much information, so that the posterior is driven mainly by the data rather than by the prior. You may even be able to use a prior that is not a proper probability distribution, for example a constant prior over the entire real line when you are trying to estimate the average height of a population. The constant prior does not push the estimate toward any particular values, so it does not influence the estimate. However, improper priors can lead to numerical problems in computations and may be difficult to interpret.

There are various approaches to defining weakly informative default or reference priors when:
  You do not want to overwhelm the data with a strong prior, or
  You do not have any background information on which to base an informative prior.
Using such weak priors is called the “objective Bayesian” approach.
Bayesian Statistics
Prior Distributions Priors as Regularization Tools
Sometimes priors can help reduce or avoid computational instabilities, in particular when working with sparse datasets. For example, adding 1 or ½ to each cell of a 2x2 table can be interpreted as the use of a fairly weak prior in the analysis.

“I find the term ‘weakly informative priors’ to be very useful […] for the purposes of regularization (‘keeping things unridiculous’) in sparse-data settings.”
Andrew Gelman: http://andrewgelman.com/2011/12/more-by-berger-and-me-on-weakly-informative-priors/
Bayesian Statistics Basic Bayesian Approaches
Information-weighted Averages of Prior and Likelihood
  Example: normal prior, normal likelihood => normal posterior
  Example: beta prior, binomial likelihood => beta posterior
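The beta–binomial pair makes the “information-weighted average” idea concrete; the prior parameters and data below are hypothetical:

```python
# Beta prior + binomial likelihood => Beta posterior (conjugate pair).
# A Beta(a, b) prior with k successes in n trials gives Beta(a + k, b + n - k).
a, b = 2.0, 2.0        # hypothetical prior: mildly favors p near 0.5
k, n = 7, 10           # observed: 7 successes in 10 trials
a_post, b_post = a + k, b + (n - k)
post_mean = a_post / (a_post + b_post)   # posterior mean of p

# The posterior mean is a weighted average of the prior mean a/(a+b) = 0.5
# and the sample proportion k/n = 0.7, with weight n/(a+b+n) on the data.
```

Here the posterior mean 9/14 ≈ 0.643 sits between the prior mean and the observed proportion, weighted by their relative information.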
Bayesian Statistics Basic Bayesian Approaches
Data Priors

Standard Bayesian methods work with prior DISTRIBUTIONS and represent the essential information in the data using likelihood FUNCTIONS. Instead, you could work with the data itself together with some “pseudodata” that would be typical of the intended prior, or more correctly, that has a likelihood proportional to the intended prior.

For example, we can work backwards from a common epidemiologic device to see it as a data prior. When we have few observations or unbalanced designs, it is common practice to add 1 or ½ to each cell of a 2x2 table. Suppose we add 1 to each cell. Using just these 4 pseudodata observations, we have an odds ratio of 1, a log odds ratio of 0, and a standard error for the log odds ratio of sqrt(1+1+1+1) = 2. Using a normal approximation for the log odds ratio, this suggests a normal prior N(mean = 0, var = 2×2 = 4). Adding the pseudocounts to the 2x2 table corresponds to assuming this prior and conducting a Bayesian analysis.
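The pseudodata argument can be checked numerically. The 2x2 counts below are hypothetical; the second computation is the approximate-normal (precision-weighted) version of the same Bayesian analysis, so the two shrunken estimates should be similar but not identical:

```python
import math

# Data prior via pseudocounts: adding 1 to each cell of a 2x2 table acts
# roughly like a N(0, 2^2) prior on the log odds ratio.
# Hypothetical sparse table: exposed cases, exposed controls,
# unexposed cases, unexposed controls.
a, b, c, d = 3, 10, 1, 12

def log_or_and_se(a, b, c, d):
    """Woolf log odds ratio and its standard error."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return log_or, se

# 1) Pseudocount analysis: add 1 to every cell, then estimate as usual.
lo_pseudo, se_pseudo = log_or_and_se(a + 1, b + 1, c + 1, d + 1)

# 2) Bayesian view: precision-weighted average of the N(0, 4) prior
#    and the raw-data log odds ratio.
lo_data, se_data = log_or_and_se(a, b, c, d)
w_prior, w_data = 1 / 4.0, 1 / se_data**2
lo_post = (w_prior * 0.0 + w_data * lo_data) / (w_prior + w_data)
```

Both versions shrink the raw log odds ratio (about 1.28) toward the prior mean of 0, which is the regularizing effect the pseudocounts are meant to provide.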
Bayesian Statistics Basic Bayesian Approaches
Bayesian Interpretation of Frequentist Methods

In simple cases, if you use a constant, improper prior, the posterior distribution is a normalized version of the likelihood. So maximum-likelihood and other likelihood-based methods can be considered Bayesian methods with a particular uninformative prior.

Note: a prior that is flat on one scale will not be flat on a different scale. For example, if you transform the data from the natural scale to the log scale, a prior that was flat on the natural scale will put much more weight on large values on the log scale.
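A quick numeric illustration of this scale effect, assuming a prior uniform on (0, 100) for concreteness:

```python
import math

# A prior flat on the natural scale is not flat on the log scale.
# If X ~ Uniform(0, 100) (constant density), then Y = log(X) has density
# exp(y)/100: two log-scale intervals of equal width can carry very
# different probability.
width = math.log(2)           # both intervals below have log-width log(2)
p_low  = (2 - 1) / 100        # Pr(1 < X < 2)    = Pr(log 1  < Y < log 2)
p_high = (100 - 50) / 100     # Pr(50 < X < 100) = Pr(log 50 < Y < log 100)
# Same width on the log scale, but 50x more probability at the high end.
```

So a “noninformative” flat prior on the natural scale is in fact quite informative on the log scale, favoring large values.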
For simple models, even with informative priors, the posterior distribution and the scaled likelihood converge as the sample size grows. So likelihood-based statistics such as ML estimates and confidence intervals can be considered large-sample approximations to Bayesian estimates and intervals.

Many Bayesian applications involve hierarchical models. There are also frequentist hierarchical models (mixed-effects models, empirical Bayes, ...), and important properties (such as shrinkage) are demonstrated by the whole family.
Bayesian Statistics Basic Bayesian Approaches
Bayes Factors: used for Bayesian model comparison and model selection. They may be difficult to calculate, so approximations exist.
Bayesian Statistics
Further Developments Hierarchical Models and Computational Tools
Bayesian Statistics
Use a Bayesian hierarchical model to fit each trajectory with its own breakpoint and slopes. The data were separated into groups by initial CD4 count, and different distributions were used for the slopes of the trajectories in different groups.

Level 1: log(CD4 count) as a function of the slopes and breakpoint
Level 2: slopes and breakpoints as functions of hyperparameters
Level 3: priors on hyperparameters: α ~ N(0, s)
Level 4: weakly informative priors on s from Level 3
Bayesian Statistics
Bayesian Statistics

Fitting the model. There are simple tools to fit Bayesian hierarchical models such as this one: Gibbs sampling, MCMC more generally, importance sampling, and their extensions. The authors of this paper used WinBUGS. For many applications there are special-purpose computational tools that build and fit Bayesian models.
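As a toy illustration of the Gibbs idea (not the paper’s model), here is a minimal two-block sampler for a Normal mean and variance with conjugate priors; the data are simulated and the prior settings hypothetical:

```python
import math
import random

random.seed(1)

# Toy Gibbs sampler: Normal data, unknown mean mu and variance sigma2,
# conjugate priors mu ~ N(m0, v0) and sigma2 ~ Inverse-Gamma(a0, b0).
# Alternates draws from the two full conditional distributions.
y = [random.gauss(5.0, 2.0) for _ in range(200)]   # simulated data
n, ybar = len(y), sum(y) / len(y)
m0, v0, a0, b0 = 0.0, 100.0, 2.0, 2.0               # weak hypothetical priors

mu, s2 = 0.0, 1.0                                    # initial values
draws_mu = []
for it in range(3000):
    # mu | sigma2, y ~ Normal (conjugate update)
    v = 1.0 / (1.0 / v0 + n / s2)
    m = v * (m0 / v0 + n * ybar / s2)
    mu = random.gauss(m, math.sqrt(v))
    # sigma2 | mu, y ~ Inverse-Gamma: draw a Gamma variate, then invert
    a_cond = a0 + n / 2.0
    b_cond = b0 + 0.5 * sum((yi - mu) ** 2 for yi in y)
    s2 = 1.0 / random.gammavariate(a_cond, 1.0 / b_cond)
    if it >= 500:                                     # discard burn-in
        draws_mu.append(mu)

post_mean_mu = sum(draws_mu) / len(draws_mu)          # posterior mean of mu
```

Tools like WinBUGS automate exactly this kind of alternation over the full conditionals, but for models with many more levels and parameters.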
Bayesian Statistics Bayesian Regression in R
Andrew Gelman, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung Su (2008). “A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models.” The Annals of Applied Statistics 2(4): 1360-1383.

library(arm): bayesglm() provides Bayesian versions of standard regression functions, with weakly informative normal, t, or Cauchy priors on the regression coefficients.

Note how the standard logistic regression, glm(), gives wild estimates for 1964.
Bayesian Statistics

Sensitivity Analysis: try several different priors.

Example: reanalysis of clinical trial data to see if, or when, the trial would have been stopped had a Bayesian interim analysis been part of the trial’s analysis plan.
Bayesian Statistics
A Bayesian hierarchical model for survival data. Interest is in the posterior distribution of β, the log hazard ratio: β := log(hazard on placebo / hazard on treatment). Early stopping rule: stop the trial if Pr(β > 0.36) >= 0.95.
Bayesian Statistics
Noninformative prior
Bayesian Statistics
Skeptical prior
Bayesian Statistics
Prior based on external information
Bayesian Statistics

The protocol was to stop the trial because of evidence of a beneficial effect of treatment if Pr(β > 0.36) >= 0.95. The different priors imply different stopping times. Bayesian methods are very natural in sequential analyses: the posterior from the first interim analysis can become the prior for the second interim analysis.
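The stopping rule Pr(β > 0.36) >= 0.95 can be checked directly from a Normal approximation to the posterior; the posterior mean and SD below are hypothetical, not values from the trial:

```python
import math

# Checking a Bayesian stopping rule: given a Normal posterior for the
# log hazard ratio beta, stop if Pr(beta > 0.36) >= 0.95.
def prob_beta_exceeds(post_mean, post_sd, threshold=0.36):
    """Pr(beta > threshold) under a Normal(post_mean, post_sd^2) posterior."""
    z = (threshold - post_mean) / post_sd
    return 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail Normal probability

p = prob_beta_exceeds(post_mean=0.80, post_sd=0.25)   # hypothetical posterior
stop_trial = p >= 0.95
```

Repeating this calculation under each prior (noninformative, skeptical, external-information) is exactly how the different stopping times in the reanalysis arise.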
Bayesian Statistics

Advantages
  If you have subject-matter knowledge, you can use an informative prior to get effect estimates with lower variance.
  Even weakly informative priors may reduce computational problems when using sparse data.
  Very general computational tools allow the use of new or complicated models.

Controversies
  Some people simply are not comfortable specifying priors, or seeing others specify them.
  Sometimes those nice computational tools fail; worse, it is not always clear that they have failed.
  Non-Bayesian methods are used with good or adequate results in many applications.

Differences
  Bayesians often talk about entire posterior distributions, or summarize them with expected values rather than maximum values.
  Bayesians emphasize narrow credible intervals rather than, e.g., symmetric confidence intervals.
BIAS ANALYSIS
Rothman, Greenland, and Lash: Ch. 19.
Bias Analysis
  Analysis of Unmeasured Confounders
  Analysis of Misclassification
  Probabilistic Bias Analysis
Bias Analysis

Why bias analysis? Conventional methods assume:
  random assignment to treatment
  random sampling, or random sampling within strata
  random measurement errors, if any
But these assumptions can be wrong, producing systematic errors that may be bigger than the random errors. In big studies, standard errors become relatively small, so systematic errors become more apparent.
Bias Analysis

Analysis of or adjustment for systematic errors requires guesses about the likely sizes of the biases. Different analysts may have different guesses, and one may try a set or distribution of guesses rather than a single guess. Quantitative bias analysis: “Such assessments may argue persuasively that certain sources of bias cannot by themselves plausibly explain a study result, or that a bias explanation cannot be ruled out.” (RGL p. 347)
Analysis of unmeasured confounders. Suppose you have measurements for:
  your exposure (occupational exposure to fine silica), and
  your outcome of interest (lung cancer mortality),
  but not for a potential confounder (smoking).

If we know, or are willing to assume:
  the silica-exposure-stratum-specific associations of smoking with lung cancer mortality,
  the prevalence of smoking in each silica-exposure stratum, and
  the prevalence of silica exposure in controls,
then we can estimate the silica effect adjusted for smoking. With this many assumptions it is garbage in, garbage out; we could instead try many sets of assumptions and then plot or summarize the many adjusted estimates we make.
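This kind of external adjustment can be sketched with the classic bias-factor formula for a single binary confounder (assuming the confounder-outcome association is the same in both exposure strata); all numeric inputs below are hypothetical guesses of the kind a bias analysis would vary:

```python
# External adjustment for an unmeasured binary confounder (e.g., smoking).
def adjust_rr(rr_obs, rr_cd, p1, p0):
    """Confounder-adjusted rate ratio.
    rr_obs: observed exposure-outcome RR;
    rr_cd:  confounder-outcome RR (assumed equal across exposure strata);
    p1, p0: confounder prevalence among the exposed / unexposed."""
    bias = (p1 * (rr_cd - 1) + 1) / (p0 * (rr_cd - 1) + 1)
    return rr_obs / bias

# Hypothetical silica-style inputs: smoking-lung cancer RR of 10,
# smoking prevalence 0.7 among exposed vs 0.5 among unexposed.
rr_adj = adjust_rr(rr_obs=1.6, rr_cd=10.0, p1=0.7, p0=0.5)
```

Looping this function over a grid of (rr_cd, p1, p0) guesses and plotting the resulting rr_adj values is one simple way to carry out the “many sets of assumptions” analysis described above.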
Bias Analysis: Unmeasured Confounders
Bias Analysis: Misclassification

Exposure Classification:

                              True Exposed    True not Exposed
                                 (E=1)              (E=0)
Reported Exposed (e=1)       True Positive     False Positive
Reported not Exposed (e=0)   False Negative    True Negative

Sensitivity = Pr(report exposed | truly exposed)         = Pr(e=1 | E=1)
Specificity = Pr(report not exposed | truly not exposed) = Pr(e=0 | E=0)
Bias Analysis: Misclassification
Logistic Regression as a Classifier. After fitting a logistic regression model, for each individual used to fit the model you can calculate a probability of having the outcome variable = 1 (radiation therapy instead of prostatectomy).

You could use Pr(outcome = 1) = 0.5 as a cutoff: individuals with probabilities greater than the cutoff are predicted to be in the outcome = 1 category; otherwise they are predicted to be in the outcome = 0 category. By comparing these predictions to the actual outcomes, you can calculate sensitivities and specificities.
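The cutoff-and-compare step can be sketched with a handful of toy fitted probabilities (the values below are hypothetical, not from the prostate data):

```python
# Using fitted probabilities as a classifier: predict outcome = 1 when the
# model probability exceeds a cutoff, then compare to the true outcomes.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2]   # hypothetical fits
truth = [1,   1,   0,   1,   1,   0,    0,   0]      # actual outcomes

cutoff = 0.5
pred = [1 if p >= cutoff else 0 for p in probs]

tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)

sensitivity = tp / (tp + fn)   # Pr(predicted 1 | truly 1)
specificity = tn / (tn + fp)   # Pr(predicted 0 | truly 0)
```

This is the computation that Stata’s estat classification (shown below) performs on the fitted model.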
Bias Analysis: Misclassification
. logistic treat2 age race inst i.gleasoncat
Logistic regression                             Number of obs   =        330
                                                LR chi2(5)      =      93.73
                                                Prob > chi2     =     0.0000
Log likelihood = -181.84698                     Pseudo R2       =     0.2049

------------------------------------------------------------------------------
      treat2 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    1.14762   .0219378     7.20   0.000     1.105418    1.191433
        race |   5.464351   2.101918     4.41   0.000     2.571078    11.61347
        inst |    .845149   .2970819    -0.48   0.632     .4243497    1.683227
             |
  gleasoncat |
           2 |   .5591875   .2315788    -1.40   0.160     .2483394    1.259126
           3 |   .4410856   .1823707    -1.98   0.048      .196149    .9918811
------------------------------------------------------------------------------
Bias Analysis: Misclassification
. estat classification
Logistic model for treat2
                -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |       113            44  |        157
     -     |        50           123  |        173
-----------+--------------------------+-----------
   Total   |       163           167  |        330

Classified + if predicted Pr(D) >= .5
True D defined as treat2 != 0
--------------------------------------------------
Sensitivity                    Pr( +| D)    69.33%
Specificity                    Pr( -|~D)    73.65%
Positive predictive value      Pr( D| +)    71.97%
Negative predictive value      Pr(~D| -)    71.10%
--------------------------------------------------
False + rate for true ~D       Pr( +|~D)    26.35%
False - rate for true D        Pr( -| D)    30.67%
False + rate for classified +  Pr(~D| +)    28.03%
False - rate for classified -  Pr( D| -)    28.90%
--------------------------------------------------
Correctly classified                        71.52%
--------------------------------------------------
Bias Analysis: Misclassification
You could try probability cutoffs other than 0.5. ROC curves:
  Try many cutoffs between Pr(outcome=1) = 0 and Pr(outcome=1) = 1
  Calculate the sensitivity and specificity for each cutoff
  Plot sensitivity vs. 1 - specificity
Bias Analysis: Misclassification
Stata:
  .lsens : plot sensitivity and specificity vs. the cutoff
  .lroc : plot the ROC curve

Sensitivity and specificity are inversely related. The area under the ROC curve (AUC), or 2 times the area between the curve and the diagonal line (the Gini coefficient), is used to assess the quality of the classifier/prediction system. Assess the quality of a classifier with data that were not used to generate/fit/train it (cross-validation).
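The AUC can be computed by hand using its Mann-Whitney form: it equals the probability that a randomly chosen positive receives a higher fitted probability than a randomly chosen negative. The toy probabilities below are hypothetical:

```python
# Sketch of an AUC (and Gini) computation from fitted probabilities.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2]   # hypothetical fits
truth = [1,   1,   0,   1,   1,   0,    0,   0]      # actual outcomes

pos = [p for p, t in zip(probs, truth) if t == 1]
neg = [p for p, t in zip(probs, truth) if t == 0]

# AUC = Pr(score of a random positive > score of a random negative),
# counting ties as 1/2 (the Mann-Whitney statistic).
pairs = [(p, q) for p in pos for q in neg]
auc = sum(1.0 if p > q else 0.5 if p == q else 0.0
          for p, q in pairs) / len(pairs)
gini = 2 * auc - 1   # Gini = 2 * (area between the ROC curve and diagonal)
```

An AUC of 0.5 corresponds to a useless classifier (the diagonal line) and 1.0 to a perfect one; here the toy classifier reaches 0.875.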
Bias Analysis: Probabilistic Bias Analysis
Simulating or correcting misclassification of EITHER the exposure or the outcome is not difficult. If both the exposure and the outcome can be misclassified, there is the additional issue of whether or not the two types of misclassification are independent. Misclassification may also depend on the levels of other covariates.
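A simple correction for exposure misclassification alone can be sketched with the matrix method: back-calculate the “true” exposed counts from the observed counts, sensitivity, and specificity, then recompute the odds ratio. All counts and classification probabilities below are hypothetical:

```python
# Matrix-method correction for nondifferential exposure misclassification.
# Observed exposed = Se * TrueExposed + (1 - Sp) * (N - TrueExposed),
# so TrueExposed = (observed - (1 - Sp) * N) / (Se + Sp - 1).
def corrected_exposed(obs_exposed, total, se, sp):
    """True exposed count implied by the observed count under (se, sp)."""
    return (obs_exposed - (1 - sp) * total) / (se + sp - 1)

se, sp = 0.9, 0.8                       # hypothetical Se and Sp
cases_exp_obs, cases_n = 60, 100        # hypothetical observed cases
ctrls_exp_obs, ctrls_n = 40, 100        # hypothetical observed controls

a = corrected_exposed(cases_exp_obs, cases_n, se, sp)   # true exposed cases
c = corrected_exposed(ctrls_exp_obs, ctrls_n, se, sp)   # true exposed controls
b, d = cases_n - a, ctrls_n - c

or_corrected = (a * d) / (b * c)
or_observed = (cases_exp_obs * (ctrls_n - ctrls_exp_obs)) / \
              ((cases_n - cases_exp_obs) * ctrls_exp_obs)
```

Note that the corrected odds ratio (about 3.33) is farther from the null than the observed one (2.25): nondifferential misclassification typically biases the estimate toward the null. Probabilistic bias analysis repeats this correction many times with (se, sp) drawn from prior distributions.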
Bias Analysis: Probabilistic Bias Analysis