Agenda
- Motivations
- Traditional inference
- Bayesian inference
- Bernoulli, Beta
- Connection to the Binomial distribution
- Posterior of Beta-Bernoulli
- Example with 2012 election data
- Marginal likelihood
- Posterior Prediction
Estimating Rent Prices in Small Domains
[Figure: benchmarked estimates vs. benchmarked estimates with smoothing, plotted on a vertical scale from 6.5 to 10.0]
Traditional inference
You are given data X and there is an unknown parameter you wish to estimate, θ. How would you estimate θ?

- Find an unbiased estimator of θ.
- Find the maximum likelihood estimate (MLE) of θ by looking at the likelihood of the data.
- If you cannot remember the definition of an unbiased estimator or the MLE, review these before our next class.
Bayesian inference
Bayesian methods trace their origins to the 18th century and the English Reverend Thomas Bayes, who, along with Pierre-Simon Laplace, discovered what we now call Bayes' Theorem.
- p(x | θ): likelihood
- p(θ): prior
- p(θ | x): posterior
- p(x): marginal distribution

p(θ | x) = p(θ, x) / p(x) = p(x | θ) p(θ) / p(x) ∝ p(x | θ) p(θ)
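As a quick numerical illustration of the theorem, consider a discrete prior over a few candidate values of θ (the grid of values and the prior weights below are made up for illustration):

```r
# Discrete illustration of Bayes' theorem (hypothetical numbers):
# a coin's heads-probability theta is one of three values, with a prior.
theta <- c(0.25, 0.50, 0.75)
prior <- c(0.2, 0.6, 0.2)

# Observe x = 1 (heads). Bernoulli likelihood: p(x | theta) = theta^x (1 - theta)^(1 - x).
x <- 1
like <- theta^x * (1 - theta)^(1 - x)

# Posterior: p(theta | x) = p(x | theta) p(theta) / p(x).
marginal  <- sum(like * prior)        # p(x), summing over theta
posterior <- like * prior / marginal
posterior                             # sums to 1; mass shifts toward larger theta
```

The same three ingredients (likelihood, prior, marginal) appear throughout the continuous case below, with the sum replaced by an integral.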
Bernoulli distribution
The Bernoulli distribution is very common due to binary outcomes.

- Consider flipping a coin (heads or tails).
- We can represent this as a binary random variable where the probability of heads is θ and the probability of tails is 1 − θ.

We write the random variable as X ∼ Bernoulli(θ), 0 < θ < 1. It follows that the likelihood is

p(x | θ) = θ^x (1 − θ)^(1−x) 1(0 < θ < 1).

- Exercise: what are the mean and the variance of X?
Bernoulli distribution
- Suppose that X1, . . . , Xn iid∼ Bernoulli(θ). Then for x1, . . . , xn ∈ {0, 1}, what is the likelihood?

Likelihood

p(x1:n | θ) = P(X1 = x1, . . . , Xn = xn | θ)
            = ∏(i=1 to n) P(Xi = xi | θ)
            = ∏(i=1 to n) p(xi | θ)
            = ∏(i=1 to n) θ^(xi) (1 − θ)^(1−xi)
            = θ^(∑ xi) (1 − θ)^(n − ∑ xi).
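The factorization above can be checked numerically; the two forms of the likelihood agree for any data vector (the one below is made up for illustration):

```r
# Bernoulli likelihood for a made-up data vector, two equivalent ways.
x <- c(1, 0, 1, 1, 0)
theta <- 0.6

# Product of the individual Bernoulli pmfs:
like_prod <- prod(theta^x * (1 - theta)^(1 - x))

# Collapsed form theta^sum(x) * (1 - theta)^(n - sum(x)):
n <- length(x)
like_sum <- theta^sum(x) * (1 - theta)^(n - sum(x))

c(like_prod, like_sum)  # identical
```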
Beta distribution
Given a, b > 0, we write θ ∼ Beta(a, b) to mean that θ has pdf

p(θ) = Beta(θ | a, b) = (1 / B(a, b)) θ^(a−1) (1 − θ)^(b−1) 1(0 < θ < 1),

i.e., p(θ) ∝ θ^(a−1) (1 − θ)^(b−1) on the interval from 0 to 1.

- Here, B(a, b) = Γ(a)Γ(b) / Γ(a + b).
- The mean is E(θ) = ∫ θ p(θ) dθ = a / (a + b).
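The mean formula is easy to verify numerically with R's built-in Beta density (the choice a = 3, b = 7 below is arbitrary):

```r
# Check the Beta mean a/(a + b) numerically for one made-up choice of a, b.
a <- 3; b <- 7

# E(theta) = integral of theta * Beta(theta | a, b) dtheta over (0, 1):
mean_numeric <- integrate(function(th) th * dbeta(th, a, b), 0, 1)$value
mean_formula <- a / (a + b)

c(mean_numeric, mean_formula)  # both 0.3
```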
Posterior of Bernoulli-Beta
Let's derive the posterior of θ | x1:n:

p(θ | x1:n) ∝ p(x1:n | θ) p(θ)
            = θ^(∑ xi) (1 − θ)^(n − ∑ xi) (1 / B(a, b)) θ^(a−1) (1 − θ)^(b−1) 1(0 < θ < 1)
            ∝ θ^(a + ∑ xi − 1) (1 − θ)^(b + n − ∑ xi − 1) 1(0 < θ < 1)
            ∝ Beta(θ | a + ∑ xi, b + n − ∑ xi).
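The conjugacy can be confirmed on a grid: multiplying the likelihood by the prior and renormalizing reproduces the Beta density with the updated parameters (a, b, and the data vector below are made-up illustration values):

```r
# Conjugate update check: with a Beta(a, b) prior and Bernoulli data,
# the posterior matches dbeta with updated parameters.
a <- 2; b <- 2
x <- c(1, 1, 0, 1, 0, 1)     # n = 6, sum(x) = 4
n <- length(x)

th <- seq(0.01, 0.99, length = 99)
an <- a + sum(x)             # 6
bn <- b + n - sum(x)         # 4

# Unnormalized posterior = likelihood * prior, then renormalize on the grid:
unnorm     <- th^sum(x) * (1 - th)^(n - sum(x)) * dbeta(th, a, b)
post_grid  <- unnorm / sum(unnorm)
post_exact <- dbeta(th, an, bn) / sum(dbeta(th, an, bn))

max(abs(post_grid - post_exact))  # ~ 0 up to floating-point error
```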
Approval ratings of Obama
What is the proportion of people that approve of President Obama in PA?

- We take a random sample of 10 people in PA and find that 6 approve of President Obama.
- The national approval rating (Zogby poll) of President Obama in mid-September 2015 was 45%. We'll assume that in PA his approval rating is approximately 50%.
- Based on this prior information, we'll use a Beta prior for θ, and we'll choose a and b.
Obama Example
n = 10
x = 6
# Fixing values of a, b.
a = 21/8
b = 0.04
th = seq(0, 1, length = 500)

# We set the likelihood, prior, and posterior with
# th as the sequence that we plot on the x-axis.
# dbeta(th, c, d) takes shape parameters c and d.
like  = dbeta(th, x + 1, n - x + 1)
prior = dbeta(th, a, b)
post  = dbeta(th, x + a, n - x + b)
Likelihood
plot(th, like, type = 'l', ylab = "Density",
     lty = 3, lwd = 3, xlab = expression(theta))

[Figure: likelihood as a function of θ]
Prior
plot(th, prior, type = 'l', ylab = "Density",
     lty = 3, lwd = 3, xlab = expression(theta))

[Figure: prior as a function of θ]
Posterior
plot(th, post, type = 'l', ylab = "Density",
     lty = 3, lwd = 3, xlab = expression(theta))

[Figure: posterior as a function of θ]
Likelihood, Prior, and Posterior
[Figure: prior, likelihood, and posterior as functions of θ, with a legend identifying the three curves]
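The slide shows the three curves overlaid but no code for the combined plot; a minimal sketch, reusing the quantities defined in the Obama example (line types and legend placement are guesses, not from the slides):

```r
# Overlay the prior, likelihood, and posterior from the Obama example.
n = 10; x = 6
a = 21/8; b = 0.04
th = seq(0, 1, length = 500)
like  = dbeta(th, x + 1, n - x + 1)
prior = dbeta(th, a, b)
post  = dbeta(th, x + a, n - x + b)

plot(th, post, type = 'l', lty = 1, lwd = 3,
     ylab = "Density", xlab = expression(theta))
lines(th, like, lty = 3, lwd = 3)
lines(th, prior, lty = 2, lwd = 3)
legend("topleft", legend = c("Prior", "Likelihood", "Posterior"),
       lty = c(2, 3, 1), lwd = 3)
```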
Cast of characters
- Observed data: x
- Note this could consist of many data points, e.g., x = x1:n = (x1, . . . , xn).

likelihood               p(x | θ)
prior                    p(θ)
posterior                p(θ | x)
marginal likelihood      p(x)
posterior predictive     p(xn+1 | x1:n)
loss function            ℓ(s, a)
posterior expected loss  ρ(a, x)
risk / frequentist risk  R(θ, δ)
integrated risk          r(δ)
Marginal likelihood
The marginal likelihood is

p(x) = ∫ p(x | θ) p(θ) dθ.

- What is the marginal likelihood for the Bernoulli-Beta?
Posterior predictive distribution
- We may wish to predict a new data point xn+1.
- We assume that x1:(n+1) are independent given θ.

p(xn+1 | x1:n) = ∫ p(xn+1, θ | x1:n) dθ
               = ∫ p(xn+1 | θ, x1:n) p(θ | x1:n) dθ
               = ∫ p(xn+1 | θ) p(θ | x1:n) dθ.
Example: Back to the Beta-Bernoulli
Suppose

θ ∼ Beta(a, b)

and

X1, . . . , Xn | θ iid∼ Bernoulli(θ).

Then the marginal likelihood is

p(x1:n) = ∫ p(x1:n | θ) p(θ) dθ
        = ∫₀¹ θ^(∑ xi) (1 − θ)^(n − ∑ xi) (1 / B(a, b)) θ^(a−1) (1 − θ)^(b−1) dθ
        = B(a + ∑ xi, b + n − ∑ xi) / B(a, b),

by the integral definition of the Beta function.
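The closed form can be checked against direct numerical integration of likelihood times prior (a, b, and the data vector below are made-up illustration values):

```r
# Numerical check of the Bernoulli-Beta marginal likelihood.
a <- 2; b <- 2
x <- c(1, 0, 1, 1)           # n = 4, sum(x) = 3
n <- length(x)

# Closed form: B(a + sum(x), b + n - sum(x)) / B(a, b).
marg_closed <- beta(a + sum(x), b + n - sum(x)) / beta(a, b)

# Direct integration of likelihood * prior over theta:
integrand <- function(th) th^sum(x) * (1 - th)^(n - sum(x)) * dbeta(th, a, b)
marg_numeric <- integrate(integrand, 0, 1)$value

c(marg_closed, marg_numeric)  # agree up to integration error
```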
Example continued
Let an = a + ∑ xi and bn = b + n − ∑ xi. It follows that the posterior distribution is

p(θ | x1:n) = Beta(θ | an, bn).

The posterior predictive can be derived to be

P(Xn+1 = 1 | x1:n) = ∫ P(Xn+1 = 1 | θ) p(θ | x1:n) dθ
                   = ∫ θ Beta(θ | an, bn) dθ
                   = an / (an + bn),

hence, the posterior predictive p.m.f. is

p(xn+1 | x1:n) = (an^(xn+1) bn^(1−xn+1)) / (an + bn) 1(xn+1 ∈ {0, 1}).
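Since P(Xn+1 = 1 | x1:n) is just the posterior mean of θ, it is easy to verify numerically (a, b, and the data vector below are made-up illustration values):

```r
# Check P(X_{n+1} = 1 | x_{1:n}) = an / (an + bn) by integration.
a <- 2; b <- 2
x <- c(1, 0, 1, 1, 0)
n <- length(x)
an <- a + sum(x)             # 5
bn <- b + n - sum(x)         # 4

# E(theta | x_{1:n}) under the Beta(an, bn) posterior:
pred_numeric <- integrate(function(th) th * dbeta(th, an, bn), 0, 1)$value
pred_formula <- an / (an + bn)

c(pred_numeric, pred_formula)  # both 5/9
```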