20: Maximum Likelihood Estimation
Lisa Yan
May 20, 2020
Lisa Yan, CS109, 2020
Quick slide reference
3   Intro to parameter estimation                 20a_intro
14  Maximum Likelihood Estimator                  20b_mle
21  argmax and log-likelihood                     20c_argmax
30  MLE: Bernoulli                                20d_mle_bernoulli
42  MLE exercises: Poisson, Uniform, Gaussian     LIVE
Intro to parameter estimation
20a_intro
Story so far
At this point: if you are given a model with all the necessary probabilities, you can make predictions.
But what if you want to learn the probabilities in the model? What if you want to learn the structure of the model, too? That is Machine Learning (I wish… another day).
Examples: $X \sim \text{Poi}(5)$; $X_1, \dots, X_n$ i.i.d. with $X_i \sim \text{Ber}(0.2)$, $Y = \sum_{i=1}^n X_i$
AI and Machine Learning
ML: rooted in probability theory.
[Figure: nested fields — Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning]
TensorFlow
Alright, so Deep Learning now? Not so fast…
Once upon a time… there was parameter estimation.
Recall some estimators
$X_1, X_2, \dots, X_n$ are $n$ i.i.d. random variables, where each $X_i$ is drawn from distribution $F$ with $E[X_i] = \mu$, $\text{Var}(X_i) = \sigma^2$.
Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$, an unbiased estimate of $\mu$.
Sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$, an unbiased estimate of $\sigma^2$.
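The two estimators above can be sketched directly in code. This is a minimal illustration (not from the slides); the data list is made up for the example.

```python
# Unbiased sample mean and sample variance for n i.i.d. draws,
# matching the formulas above.
def sample_mean(xs):
    # X-bar = (1/n) * sum of X_i, an unbiased estimate of mu
    return sum(xs) / len(xs)

def sample_variance(xs):
    # S^2 = (1/(n-1)) * sum of (X_i - X-bar)^2, unbiased for sigma^2
    # (note the n-1 denominator, not n)
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
print(sample_mean(data))      # 5.0
print(sample_variance(data))  # 32/7 ≈ 4.571
```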
What are parameters?
def: Many random variables we have learned so far are parametric models:
Distribution = model + parameter $\theta$
ex: The distribution Ber(0.2) = Bernoulli model, parameter $p = 0.2$.
For each of the distributions below, what is the parameter $\theta$?
1. Ber($p$): $\theta = p$
2. Poi($\lambda$): $\theta = \lambda$
3. Uni($\alpha, \beta$): $\theta = (\alpha, \beta)$
4. $\mathcal{N}(\mu, \sigma^2)$: $\theta = (\mu, \sigma^2)$
5. $Y = mX + b$: $\theta = (m, b)$
$\theta$ is the parameter of a distribution. $\theta$ can be a vector of parameters!
Why do we care?
In the real world, we don't know the "true" parameters. But we do get to observe data (# times a coin comes up heads, lifetimes of disk drives produced, # visitors to a website per day, etc.).
def: estimator $\hat{\theta}$: a random variable estimating parameter $\theta$ from data.
In parameter estimation, we use the point estimate of the parameter (the best single value) for:
• Better understanding of the process producing the data
• Future predictions based on the model
• Simulation of future processes
Maximum Likelihood Estimator
20b_mle
Defining the likelihood of data: Bernoulli
Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \dots, X_n$.
• $X_i$ was drawn from distribution $F = \text{Ber}(p)$ with unknown parameter $p$.
• Observed data ($n = 10$): 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
How likely was the observed data if $p = 0.4$?
$P(\text{sample} \mid p = 0.4) = 0.4^8 \cdot 0.6^2 \approx 0.000236$
This is the likelihood of the data given parameter $p = 0.4$. Is there a better parameter $p$?
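The likelihood computation above is a short product of probabilities, so it is easy to check numerically. A minimal sketch (not from the slides):

```python
# Likelihood of an observed Bernoulli sample under parameter p:
# each 1 contributes a factor p, each 0 contributes (1 - p).
def likelihood(p, xs):
    ones = sum(xs)
    zeros = len(xs) - ones
    return p ** ones * (1 - p) ** zeros

sample = [0, 0] + [1] * 8   # the slide's sample, n = 10
print(likelihood(0.4, sample))  # 0.4^8 * 0.6^2 ≈ 0.000236
```

Trying other values of $p$ (e.g. `likelihood(0.8, sample)`) shows some parameters make the observed data more likely, which is exactly the question MLE answers.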
Defining the likelihood of data
Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \dots, X_n$.
• $X_i$ was drawn from a distribution with density (or mass) function $f(x_i \mid \theta)$.
• Observed data: $x_1, x_2, \dots, x_n$
Likelihood question: how likely is the observed data $x_1, x_2, \dots, x_n$ given parameter $\theta$?
Likelihood function:
$L(\theta) = f(x_1, x_2, \dots, x_n \mid \theta) = \prod_{i=1}^n f(x_i \mid \theta)$
(This is just a product, since the $X_i$ are i.i.d.)
Maximum Likelihood Estimator
Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \dots, X_n$, drawn from a distribution $f(x_i \mid \theta)$.
def: The Maximum Likelihood Estimator (MLE) of $\theta$ is the value of $\theta$ that maximizes $L(\theta)$:
$\hat{\theta}_{\text{MLE}} = \arg\max_\theta L(\theta)$
where $L(\theta) = \prod_{i=1}^n f(x_i \mid \theta)$ is the likelihood of your sample.
For continuous $X_i$, $f(x_i \mid \theta)$ is a PDF; for discrete $X_i$, $f(x_i \mid \theta)$ is a PMF.
How do we find the $\theta$ that maximizes $L(\theta)$? Stay tuned!
argmax
20c_argmax
New function: arg max
$\arg\max_x f(x)$ is the argument $x$ that maximizes the function $f(x)$.
Let $f(x) = -x^2 + 4$, where $-2 < x < 2$.
[Figure: plot of $f(x)$, a downward parabola peaking at $(0, 4)$]
1. $\max_x f(x) = 4$
2. $\arg\max_x f(x) = 0$
Argmax and log
Let $f(x) = -x^2 + 4$, where $-2 < x < 2$.
[Figure: plots of $f(x)$ and $\log f(x)$; both peak at $x = 0$]
$\arg\max_x f(x) = \arg\max_x \log f(x) = 0$
The argument $x$ that maximizes $f(x)$ also maximizes $\log f(x)$.
Logs all around
[Figure: plot of $\log x$]
• Log is monotonic: $x \le y \iff \log x \le \log y$
• Log of product = sum of logs: $\log(ab) = \log a + \log b$
• Natural logs: $\log x = \ln x$
Argmax properties
$\arg\max_x f(x)$ (the argument $x$ that maximizes the function $f(x)$)
$= \arg\max_x \log f(x)$   (log is monotonic: $x \le y \iff \log x \le \log y$)
$= \arg\max_x c \log f(x)$ for any positive constant $c$   ($x \le y \iff c \log x \le c \log y$)
How do we compute argmax?
Finding the argmax with calculus
$\hat{x} = \arg\max_x f(x)$. Let $f(x) = -x^2 + 4$, where $-2 < x < 2$.
[Figure: plot of $f(x)$]
Differentiate w.r.t. argmax's argument: $\frac{d}{dx} f(x) = \frac{d}{dx}(-x^2 + 4) = -2x$
Set to 0 and solve: $-2x = 0 \Rightarrow \hat{x} = 0$
Make sure $\hat{x}$ is a maximum:
• Check $f(\hat{x} \pm \epsilon) < f(\hat{x})$
• Often ignored in expository derivations
• We'll ignore it here too (and won't require it in class)
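A coarse grid search agrees with the calculus answer, and also illustrates the earlier log property: taking $\log f$ does not move the argmax. A quick sketch (not from the slides):

```python
import math

# f(x) = -x^2 + 4 on -2 < x < 2; calculus gives argmax x-hat = 0.
def f(x):
    return -x * x + 4

xs = [i / 1000 for i in range(-1999, 2000)]  # grid on (-2, 2)
x_hat = max(xs, key=f)                        # argmax of f
x_hat_log = max(xs, key=lambda x: math.log(f(x)))  # argmax of log f
print(x_hat, x_hat_log)  # 0.0 0.0
```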
MLE: Bernoulli
20d_mle_bernoulli
Computing the MLE
General approach for finding $\hat{\theta}_{\text{MLE}}$, the MLE of $\theta$: $\hat{\theta}_{\text{MLE}} = \arg\max_\theta LL(\theta)$
1. Determine a formula for the log-likelihood $LL(\theta) = \sum_{i=1}^n \log f(x_i \mid \theta)$
2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$: $\frac{\partial LL(\theta)}{\partial \theta}$
3. Solve the resulting (simultaneous) equations, by algebra or computer. To maximize: set $\frac{\partial LL(\theta)}{\partial \theta} = 0$
4. Make sure the derived $\hat{\theta}_{\text{MLE}}$ is a maximum:
• Check $LL(\hat{\theta}_{\text{MLE}} \pm \epsilon) < LL(\hat{\theta}_{\text{MLE}})$
• Often ignored in expository derivations
• We'll ignore it here too (and won't require it in class)
$LL(\theta)$ is often easier to differentiate than $L(\theta)$.
Maximum Likelihood with Bernoulli
Consider a sample of $n$ i.i.d. RVs $X_1, X_2, \dots, X_n$, with $X_i \sim \text{Ber}(p)$. What is $\hat{\theta}_{\text{MLE}} = \hat{p}_{\text{MLE}}$?
1. Determine a formula for $LL(p)$. The PMF
$f(x_i \mid p) = \begin{cases} p & \text{if } x_i = 1 \\ 1 - p & \text{if } x_i = 0 \end{cases}$
can be rewritten as $f(x_i \mid p) = p^{x_i}(1-p)^{1-x_i}$, where $x_i \in \{0, 1\}$, which:
• is differentiable with respect to $p$ ✓
• is a valid PMF over the discrete domain ✓
So:
$LL(p) = \sum_{i=1}^n \log f(x_i \mid p) = \sum_{i=1}^n \log\left(p^{x_i}(1-p)^{1-x_i}\right) = \sum_{i=1}^n \left[x_i \log p + (1 - x_i)\log(1 - p)\right]$
$= Y \log p + (n - Y)\log(1 - p)$, where $Y = \sum_{i=1}^n x_i$.
2. Differentiate $LL(p)$ w.r.t. $p$ and set to 0:
$\frac{\partial LL(p)}{\partial p} = Y \cdot \frac{1}{p} + (n - Y) \cdot \frac{-1}{1 - p} = 0$
3. Solve the resulting equation:
$\hat{p}_{\text{MLE}} = \frac{Y}{n} = \frac{1}{n}\sum_{i=1}^n x_i$
The MLE of the Bernoulli parameter, $\hat{p}_{\text{MLE}}$, is the unbiased estimate of the mean, $\bar{X}$ (the sample mean).
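The closed form $\hat{p}_{\text{MLE}} = Y/n$ can be cross-checked by sweeping the log-likelihood over a grid of candidate $p$ values. A minimal sketch (not from the slides), using the same sample as above:

```python
import math

sample = [0, 0] + [1] * 8   # n = 10, Y = 8

# LL(p) = Y log p + (n - Y) log(1 - p), from the derivation above
def log_likelihood(p, xs):
    y, n = sum(xs), len(xs)
    return y * math.log(p) + (n - y) * math.log(1 - p)

p_mle = sum(sample) / len(sample)            # closed form: sample mean
grid = [i / 1000 for i in range(1, 1000)]    # candidate p in (0, 1)
p_numeric = max(grid, key=lambda p: log_likelihood(p, sample))
print(p_mle, p_numeric)  # 0.8 0.8
```

The numeric sweep lands on the same value as the closed form, as expected for a concave log-likelihood.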
MLE of Bernoulli is the sample mean
Bernoulli: $f(x_i \mid p) = p^{x_i}(1-p)^{1-x_i}$, where $x_i \in \{0, 1\}$
$LL(p) = \sum_{i=1}^n \log f(x_i \mid p)$, maximized at $\hat{p}_{\text{MLE}} = \bar{X} = \frac{1}{n}\sum_{i=1}^n x_i$
Quick check
• You draw $n$ i.i.d. random variables $X_1, X_2, \dots, X_n$ from the distribution $F$, yielding the following sample ($n = 10$): 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
• Suppose distribution $F = \text{Ber}(p)$ with unknown parameter $p$.
1. What is $\hat{p}_{\text{MLE}}$, the MLE of the parameter $p$?
   A. 1.0   B. 0.5   C. 0.8   D. 0.2   E. None/other
   Answer: C. $\hat{p}_{\text{MLE}} = \bar{X} = \frac{1}{n}\sum_{i=1}^n x_i = \frac{8}{10} = 0.8$
2. What is the likelihood $L(p)$ of this particular sample?
   $L(p) = \prod_{i=1}^n f(x_i \mid p)$, where $f(x_i \mid p) = p^{x_i}(1-p)^{1-x_i}$ and $x_i \in \{0, 1\}$
   $= p^Y (1-p)^{n-Y} = p^8 (1-p)^2$, where $Y = 8$.
(live) 20: Maximum Likelihood Estimation
Lisa Yan
May 20, 2020
Computing the MLE (Review)
General approach for finding $\hat{\theta}_{\text{MLE}}$, the MLE of $\theta$:
1. Determine a formula for $LL(\theta) = \sum_{i=1}^n \log f(x_i \mid \theta)$ ($LL(\theta)$ is often easier to differentiate than $L(\theta)$)
2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$
3. Solve the resulting (simultaneous) equations, by algebra or computer; to maximize, set $\frac{\partial LL(\theta)}{\partial \theta} = 0$
4. Make sure the derived $\hat{\theta}_{\text{MLE}}$ is a maximum (often ignored in expository derivations; we'll ignore it here too, and won't require it in class)
Maximum Likelihood with Poisson
Consider a sample of $n$ i.i.d. RVs $X_1, X_2, \dots, X_n$. What is $\hat{\theta}_{\text{MLE}} = \hat{\lambda}_{\text{MLE}}$?
1. Determine a formula for $LL(\lambda)$.
• Let $X_i \sim \text{Poi}(\lambda)$.
• PMF: $f(x_i \mid \lambda) = \dfrac{e^{-\lambda} \lambda^{x_i}}{x_i!}$
$LL(\lambda) = \sum_{i=1}^n \log \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \sum_{i=1}^n \left[-\lambda \log e + x_i \log \lambda - \log x_i!\right]$ (using natural log, $\ln e = 1$)
$= -n\lambda + (\log \lambda)\sum_{i=1}^n x_i - \sum_{i=1}^n \log x_i!$
2. Differentiate $LL(\lambda)$ w.r.t. $\lambda$ and set to 0:
$\frac{\partial LL(\lambda)}{\partial \lambda} = -n + \frac{1}{\lambda}\sum_{i=1}^n x_i = 0$
3. Solve the resulting equation:
$\hat{\lambda}_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^n x_i$
The MLE of the Poisson parameter, $\hat{\lambda}_{\text{MLE}}$, is the unbiased estimate of the mean, $\bar{X}$ (the sample mean).
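As with Bernoulli, the closed form can be checked against a numeric sweep of $LL(\lambda)$. A sketch (not from the slides), using hypothetical per-minute event counts that total 53 over 10 minutes, to match the upcoming quick check:

```python
import math

# Hypothetical counts for 10 one-minute intervals; sum is 53.
sample = [4, 6, 5, 7, 3, 5, 6, 4, 8, 5]

# LL(lambda) = sum over i of [-lambda + x_i log(lambda) - log(x_i!)];
# lgamma(x + 1) = log(x!) for nonnegative integers x.
def log_likelihood(lam, xs):
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in xs)

lam_mle = sum(sample) / len(sample)          # closed form: sample mean
grid = [i / 100 for i in range(1, 2001)]     # candidate lambda in (0, 20]
lam_numeric = max(grid, key=lambda lam: log_likelihood(lam, sample))
print(lam_mle, lam_numeric)  # 5.3 5.3
```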
Quick check
1. A particular experiment can be modeled as a Poisson RV with parameter $\lambda$, in events/minute. Collect data: observe 53 events over the next 10 minutes. What is $\hat{\lambda}_{\text{MLE}}$?
   $\hat{\lambda}_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^n x_i = \frac{53}{10} = 5.3$ events/minute
2. Is the Bernoulli MLE an unbiased estimator of the Bernoulli parameter $p$? Yes ✓
3. Is the Poisson MLE an unbiased estimator of the Poisson variance? Yes ✓ (the Poisson variance equals $\lambda$, and $E[\hat{\lambda}_{\text{MLE}}] = \lambda$)
4. What does unbiased mean? $E[\text{estimator}] = \text{true value}$. Unbiased: if you could repeat your experiment, on average you would get what you are looking for.
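The "repeat your experiment" reading of unbiasedness can be made concrete with a simulation. A sketch (not from the slides; the seed, trial count, and true $p$ are arbitrary choices):

```python
import random

# Simulate many Ber(0.8) samples of size n = 10; the average of the
# MLE p-hat across trials should be close to the true p (unbiased).
random.seed(109)
p_true, n, trials = 0.8, 10, 20000
estimates = []
for _ in range(trials):
    xs = [1 if random.random() < p_true else 0 for _ in range(n)]
    estimates.append(sum(xs) / n)   # p-hat = sample mean
avg = sum(estimates) / trials
print(avg)  # close to 0.8
```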
Interlude for jokes/announcements
Announcements
Problem Set 5: only do problems on the official Pset handout.
Problem Set 6: released today! Due Wed. August 12 (no late days or on-time bonus).
Regrade requests: Pset 1-5 and midterm regrade requests are due by August 11 via Gradescope. Please submit Pset 6 regrades only in extreme cases (e.g., we didn't see your answers because of mislabeled pages) via email.
Completely optional project: you may be able to replace an early Pset grade that you're unhappy with by completing a CS109-related project. Details here: https://us.edstem.org/courses/667/discussion/98951
Interesting probability news
Are these trials independent? Are probabilities consistent across jobs?
"Bernoulli's trials can tell you how many job applications to send"
https://swizec.com/blog/bernoullis-trials-can-tell-many-job-applications-send/swizec/7677
Maximum Likelihood with Uniform
Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \dots, X_n$. Let $X_i \sim \text{Uni}(\alpha, \beta)$.
$f(x_i \mid \alpha, \beta) = \begin{cases} \frac{1}{\beta - \alpha} & \text{if } \alpha \le x_i \le \beta \\ 0 & \text{otherwise} \end{cases}$
1. Determine a formula for $L(\theta)$:
$L(\theta) = \begin{cases} \left(\frac{1}{\beta - \alpha}\right)^n & \text{if } \alpha \le x_1, x_2, \dots, x_n \le \beta \\ 0 & \text{otherwise} \end{cases}$
2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$, set to 0?
   A. Great, let's do it   B. Differentiation is hard   C. The constraint $\alpha \le x_1, x_2, \dots, x_n \le \beta$ makes differentiation hard
Example sample from a Uniform: suppose $X_i \sim \text{Uni}(0, 1)$ and you observe data: 0.15, 0.20, 0.30, 0.40, 0.65, 0.70, 0.75. Which parameters would give you maximum $L(\theta)$?
   A. Uni($\alpha = 0$, $\beta = 1$): $L = \left(\frac{1}{1 - 0}\right)^7 = 1$
   B. Uni($\alpha = 0.15$, $\beta = 0.75$): $L = \left(\frac{1}{0.6}\right)^7 \approx 35.7$, the maximum
   C. Uni($\alpha = 0.15$, $\beta = 0.70$): $L = 0$ (the point 0.75 falls outside the interval)
→ The original parameters may not yield maximum likelihood.
$\hat{\theta}_{\text{MLE}}$: $\hat{\alpha}_{\text{MLE}} = \min(x_1, x_2, \dots, x_n)$, $\hat{\beta}_{\text{MLE}} = \max(x_1, x_2, \dots, x_n)$
Intuition:
• Want the interval size $\beta - \alpha$ to be as small as possible, to maximize the likelihood per datapoint
• Need to make sure all observed data is in the interval (if not, then $L(\theta) = 0$)
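The Uniform MLE needs no calculus at all in code: it is just the sample extremes. A minimal sketch (not from the slides), using the same observed data as above:

```python
# Uniform(alpha, beta) MLE: the tightest interval containing the data.
# Any smaller interval excludes a point and drives the likelihood to 0.
sample = [0.15, 0.20, 0.30, 0.40, 0.65, 0.70, 0.75]
alpha_mle, beta_mle = min(sample), max(sample)
print(alpha_mle, beta_mle)  # 0.15 0.75
```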
Small samples = problems with MLE
Maximum Likelihood Estimator $\hat{\theta}_{\text{MLE}} = \arg\max_\theta L(\theta)$:
• Best explains the data we have seen
• Does not attempt to generalize to unseen data
In many cases, unbiased ($E[\hat{\theta}_{\text{MLE}}] = \theta$ regardless of sample size $n$) ✓
• e.g., the sample mean $\hat{\theta}_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^n x_i$ (MLE for Bernoulli $p$, Poisson $\lambda$, Normal $\mu$)
For some cases, like Uniform: biased ($\hat{\alpha}_{\text{MLE}} \ge \alpha$, $\hat{\beta}_{\text{MLE}} \le \beta$). Problematic for small sample sizes:
• Example: if $n = 1$, then $\hat{\alpha} = \hat{\beta}$, yielding an invalid distribution
Properties of MLE
Maximum Likelihood Estimator $\hat{\theta}_{\text{MLE}} = \arg\max_\theta L(\theta)$:
• Best explains the data we have seen
• Does not attempt to generalize to unseen data
• Often used when the sample size $n$ is large relative to the parameter space
• Potentially biased (though asymptotically less so, as $n \to \infty$)
• Consistent: as $n \to \infty$ (i.e., more data), the probability that $\hat{\theta}$ significantly differs from $\theta$ goes to zero:
$\lim_{n \to \infty} P\left(|\hat{\theta} - \theta| < \epsilon\right) = 1$, where $\epsilon > 0$
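Consistency can also be seen in simulation: as $n$ grows, the spread of the MLE across repeated experiments shrinks toward zero. A sketch (not from the slides; the seed, true $p$, and trial counts are arbitrary):

```python
import random

# Bernoulli MLE concentrates around the true p as n grows:
# the standard deviation of p-hat across trials shrinks with n.
random.seed(109)
p_true = 0.3

def spread(n, trials=2000):
    ests = [sum(random.random() < p_true for _ in range(n)) / n
            for _ in range(trials)]
    m = sum(ests) / trials
    return (sum((e - m) ** 2 for e in ests) / trials) ** 0.5

s_small = spread(10)     # n = 10: large spread
s_large = spread(1000)   # n = 1000: much smaller spread
print(s_small, s_large)
```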
Maximum Likelihood with Normal
Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \dots, X_n$. Let $X_i \sim \mathcal{N}(\mu, \sigma^2)$.
What is $\hat{\theta}_{\text{MLE}} = (\hat{\mu}_{\text{MLE}}, \hat{\sigma}^2_{\text{MLE}})$?
$f(x_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x_i - \mu)^2 / (2\sigma^2)}$
1. Determine a formula for $LL(\theta)$ (using natural log):
$LL(\theta) = \sum_{i=1}^n \log \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x_i - \mu)^2/(2\sigma^2)} = \sum_{i=1}^n \left[-\log\left(\sqrt{2\pi}\,\sigma\right) - \frac{(x_i - \mu)^2}{2\sigma^2}\right]$
$= -\sum_{i=1}^n \log\left(\sqrt{2\pi}\,\sigma\right) - \sum_{i=1}^n \frac{(x_i - \mu)^2}{2\sigma^2}$
2. Differentiate $LL(\theta)$ w.r.t. each parameter, set to 0.
With respect to $\mu$:
$\frac{\partial LL(\theta)}{\partial \mu} = \sum_{i=1}^n \frac{2(x_i - \mu)}{2\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu) = 0$
With respect to $\sigma$:
$\frac{\partial LL(\theta)}{\partial \sigma} = -\sum_{i=1}^n \frac{1}{\sigma} + \sum_{i=1}^n \frac{2(x_i - \mu)^2}{2\sigma^3} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^n (x_i - \mu)^2 = 0$
3. Solve the resulting equations (two equations, two unknowns).
First, solve for $\hat{\mu}_{\text{MLE}}$:
$\frac{1}{\sigma^2}\sum_{i=1}^n x_i - \frac{1}{\sigma^2}\sum_{i=1}^n \mu = 0 \;\Rightarrow\; \sum_{i=1}^n x_i = n\mu \;\Rightarrow\; \hat{\mu}_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^n x_i$ (unbiased)
Next, solve for $\hat{\sigma}^2_{\text{MLE}}$:
$\frac{1}{\sigma^3}\sum_{i=1}^n (x_i - \mu)^2 = \frac{n}{\sigma} \;\Rightarrow\; \sum_{i=1}^n (x_i - \mu)^2 = n\sigma^2 \;\Rightarrow\; \hat{\sigma}^2_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^n \left(x_i - \hat{\mu}_{\text{MLE}}\right)^2$ (biased)
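The two Normal closed forms are short enough to compute directly; note the $1/n$ divisor in the variance, which is what makes it biased (unlike the $1/(n-1)$ sample variance from earlier). A sketch (not from the slides; the data list is made up):

```python
# Normal MLE closed forms:
#   mu-hat    = (1/n) * sum x_i          (unbiased)
#   sigma2-hat = (1/n) * sum (x_i - mu-hat)^2   (biased: divides by n)
sample = [2.1, 3.7, 1.9, 4.2, 3.3, 2.8]  # hypothetical data
n = len(sample)
mu_mle = sum(sample) / n
var_mle = sum((x - mu_mle) ** 2 for x in sample) / n
print(mu_mle, var_mle)  # 3.0 and 0.68
```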
Estimating a Bernoulli parameter
Consider $n$ i.i.d. random variables $X_1, X_2, \dots, X_n$.
• Suppose distribution $F = \text{Ber}(p)$ with unknown parameter $p$.
• Say you have three coins: $p_1 = 0.5$, $p_2 = 0.8$, or $p_3 = 1$.
Which coin is most likely to give you the following sample ($n = 10$)? 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
$P(\text{sample} \mid p = 0.5) = 0.5^8 \cdot 0.5^2 \approx 0.00097$
$P(\text{sample} \mid p = 0.8) = 0.8^8 \cdot 0.2^2 \approx 0.00671$ (most likely, so choose this coin)
$P(\text{sample} \mid p = 1.0) = 1.0^8 \cdot 0^2 = 0$
How do we write this process mathematically?
$\hat{p} = \arg\max_{p \in \{0.5,\, 0.8,\, 1\}} p^8 (1 - p)^2 = 0.8$
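Choosing among the three coins is an argmax over a finite set of candidate parameters. A minimal sketch of that process (not from the slides):

```python
# Restricted MLE: pick the coin in {0.5, 0.8, 1.0} under which the
# observed sample (Y = 8 heads out of n = 10) is most likely.
def likelihood(p):
    return p ** 8 * (1 - p) ** 2

coins = [0.5, 0.8, 1.0]
best = max(coins, key=likelihood)   # argmax over the candidate set
print(best)  # 0.8
```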
Maximum Likelihood with Bernoulli
Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \dots, X_n$. Let $X_i \sim \text{Ber}(p)$. What is $\hat{\theta}_{\text{MLE}} = \hat{p}_{\text{MLE}}$?
1. Determine a formula for $LL(p) = \sum_{i=1}^n \log f(x_i \mid p)$. What is the PMF $f(x_i \mid p)$?
   A. $p$
   B. $1 - p$
   C. $\begin{cases} p & \text{if } x_i = 1 \\ 1 - p & \text{if } x_i = 0 \end{cases}$
   D. $p^{x_i}(1-p)^{1-x_i}$, where $x_i \in \{0, 1\}$
Answer: D, which
• is differentiable
• is a valid PMF over the discrete domain
2. Differentiate $LL(p)$ w.r.t. (each) $p$, set to 0.
3. Solve the resulting equations.