Hessian Matrices In Statistics
Ferris Jumah, David Schlueter, Matt Vance
MTH 327 Final Project
December 7, 2011
Topic Introduction
Today we are going to talk about . . .
- Introduce the Hessian matrix
- Brief description of relevant statistics
- Maximum Likelihood Estimation (MLE)
- Fisher Information and Applications
The Hessian Matrix
Recall the Hessian matrix
H(f) =
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}   (1)
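As a concrete illustration (not part of the original slides), the Hessian in (1) can be approximated numerically with central differences. The function f below is an arbitrary example chosen so the exact Hessian is easy to check by hand.

```python
import numpy as np

def hessian(f, x, h=1e-4):
    """Approximate the Hessian of f at x with central differences."""
    n = len(x)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = h
            e_j = np.zeros(n); e_j[j] = h
            # Central-difference estimate of the (i, j) second partial.
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H

# Example: f(x, y) = x^2 y + y^3 has exact Hessian [[2y, 2x], [2x, 6y]],
# so at (1, 2) we expect [[4, 2], [2, 12]].
f = lambda v: v[0] ** 2 * v[1] + v[1] ** 3
H = hessian(f, np.array([1.0, 2.0]))
```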
Statistics: Some things to recall
Now, let's talk a bit about inferential statistics.
- Parameters
- Random variables. Definition: a random variable X is a function X : Ω → R.
- Each r.v. follows a distribution that has an associated probability function f(x|θ). E.g., the normal density

  f(x|\mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right]   (2)

- What is a random sample? X_1, \ldots, X_n i.i.d.; the outputs of these r.v.s are our sample data.
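The density in (2) is easy to evaluate directly; a minimal sketch (the sample values below are illustrative, not from the slides):

```python
import math

def normal_pdf(x, mu, sigma2):
    """Normal density f(x | mu, sigma^2) from equation (2)."""
    return (1.0 / math.sqrt(2 * math.pi * sigma2)
            * math.exp(-(x - mu) ** 2 / (2 * sigma2)))

# At x = mu the density peaks at 1 / (sigma * sqrt(2*pi)).
peak = normal_pdf(0.0, 0.0, 1.0)   # standard normal at its mean
```

The density is symmetric about mu, which the test below also checks.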
Stats cont.
Estimators (θ̂) of Population Parameters
- Definition: an estimator is a formula for calculating an estimate θ̂ of a parameter θ from sample data.
- There are many estimators, but which is the best?
Maximum Likelihood Estimation (MLE)
Key Concept: Maximum Likelihood Estimation
GOAL: to determine the best estimate of a parameter θ from a sample.

Likelihood Function
- We obtain a data vector x = (x_1, \ldots, x_n).
- Since the random sample is i.i.d., we express the probability of our observed data given θ as

  f(x_1, x_2, \ldots, x_n \mid \theta) = f(x_1|\theta) \cdot f(x_2|\theta) \cdots f(x_n|\theta)   (3)

  f_n(x|\theta) = \prod_{i=1}^{n} f(x_i|\theta)   (4)

- Implication of maximizing the likelihood function: the resulting estimate θ̂ is the parameter value under which the observed data are most probable.
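A minimal numerical sketch of maximizing (4): for a normal sample with known variance, a grid search over the log of the likelihood recovers the sample mean. The simulated data and grid bounds are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=1000)   # i.i.d. sample, sigma = 1 known

def log_likelihood(mu):
    """Log of equation (4) for N(mu, 1), additive constants dropped."""
    return -0.5 * np.sum((x - mu) ** 2)

# Grid search over candidate values of mu.
grid = np.linspace(2.0, 4.0, 2001)
mle = grid[np.argmax([log_likelihood(m) for m in grid])]

# For the normal mean, the MLE is exactly the sample mean.
```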
Example of MLE
Example: Gaussian (normal) linear regression
- Recall least squares regression.
- We wish to determine the weight vector w.
- The likelihood function is given by

  P(y|x, w) = \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^{n} \exp\left[ -\frac{\sum_i (y_i - w^T x_i)^2}{2\sigma^2} \right]   (5)

- To maximize (5), we need to minimize

  \sum_{i=1}^{n} (y_i - w^T x_i)^2 = (y - Aw)^T (y - Aw)   (6)

  where A is the design matrix of our data.
Example of MLE cont.
Following the standard optimization procedure, we compute the gradient of the sum of squares S in (6) (dropping the constant factor of 2):

  \nabla S = -A^T y + A^T A w   (7)

Notice that this is a linear combination of the weights and the columns of A^T A. Setting \nabla S = 0, our resulting critical point is

  w = (A^T A)^{-1} A^T y,   (8)

which we recognize to be the normal equations!
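The critical point (8) can be checked numerically against a library least-squares solver, which minimizes the same sum of squares (6). The data below are synthetic, generated only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3))                  # design matrix
w_true = np.array([2.0, -1.0, 0.5])
y = A @ w_true + 0.1 * rng.normal(size=50)    # noisy responses

# Normal equations: solve (A^T A) w = A^T y, as in equation (8).
w_hat = np.linalg.solve(A.T @ A, A.T @ y)

# np.linalg.lstsq minimizes the sum of squares (6) directly.
w_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Using `solve` rather than forming the explicit inverse is the standard numerically safer reading of (8).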
Computing the Hessian Matrix
We compute the Hessian in order to show that this critical point is a minimum:

  \frac{\partial}{\partial w_k} \nabla S
  = \frac{\partial}{\partial w_k} \left[ w_1 \begin{pmatrix} x_{1,1} \\ \vdots \\ x_{n,1} \end{pmatrix} + \cdots + w_k \begin{pmatrix} x_{1,k} \\ \vdots \\ x_{n,k} \end{pmatrix} + \cdots + w_n \begin{pmatrix} x_{1,n} \\ \vdots \\ x_{n,n} \end{pmatrix} \right]
  = \begin{pmatrix} x_{1,k} \\ \vdots \\ x_{n,k} \end{pmatrix}

Therefore,

  H = A^T A   (9)

which is positive semi-definite (w^T A^T A w = \|Aw\|^2 \ge 0 for every w). Therefore, our estimate for w minimizes the sum of squares (6) and hence maximizes our likelihood function.
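That H = AᵀA in (9) is positive semi-definite can be verified numerically: all of its eigenvalues are non-negative. The random design matrix below is an arbitrary stand-in.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(20, 4))      # any real design matrix
H = A.T @ A                       # Hessian from equation (9)

# H is symmetric, so eigvalsh (for Hermitian matrices) applies.
eigvals = np.linalg.eigvalsh(H)
# Since w^T H w = ||A w||^2 >= 0 for every w, no eigenvalue can be negative.
```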
MLE cont.
Advantages and Disadvantages
- Larger samples give better estimates: as n → ∞, θ̂_n → θ.
- Other advantages
- Disadvantages: uniqueness, existence, reliance upon distribution fit.
- This begs the question: how much information about a parameter can be gathered from sample data?
Fisher Information
Key Concept: Fisher Information
We determine the amount of information about a parameter carried by a sample using the Fisher information, defined by

  I(\theta) = -E\left[ \frac{\partial^2 \ln[f(x|\theta)]}{\partial \theta^2} \right].   (10)

Intuitive appeal: more data provides more information about the population parameter.
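Definition (10) can be approximated by Monte Carlo. For N(µ, σ²) with σ² known, ∂² ln f/∂µ² = −1/σ² is constant, so I(µ) = 1/σ²; by a standard identity this also equals E[score²], where the score is (x−µ)/σ². The sketch below uses that identity with illustrative parameter values.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2 = 0.0, 4.0
x = rng.normal(mu, np.sqrt(sigma2), size=200_000)

# Score for mu: d/dmu ln f(x|mu) = (x - mu) / sigma^2.
score = (x - mu) / sigma2

# Monte Carlo estimate of I(mu) = E[score^2]; exact value is 1/sigma^2 = 0.25.
I_mu = np.mean(score ** 2)
```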
Fisher information example
Example: finding the Fisher information for the normal distribution N(µ, σ²).

The log-likelihood function is

  \ln[f(x|\theta)] = -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}   (11)

where the parameter vector is θ = (µ, σ²).

The gradient of the log-likelihood is

  \left( \frac{\partial \ln[f(x|\theta)]}{\partial \mu},\ \frac{\partial \ln[f(x|\theta)]}{\partial \sigma^2} \right)
  = \left( \frac{x-\mu}{\sigma^2},\ \frac{(x-\mu)^2}{2\sigma^4} - \frac{1}{2\sigma^2} \right)   (12)
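The two partial derivatives in (12) can be checked against central differences of (11); the evaluation point below is arbitrary.

```python
import math

def log_f(x, mu, s2):
    """Log-likelihood (11) for a single observation."""
    return -0.5 * math.log(2 * math.pi * s2) - (x - mu) ** 2 / (2 * s2)

x, mu, s2, h = 1.3, 0.4, 2.0, 1e-6

# Analytic gradient from equation (12).
d_mu = (x - mu) / s2
d_s2 = (x - mu) ** 2 / (2 * s2 ** 2) - 1 / (2 * s2)

# Central-difference approximations of the same partials.
num_mu = (log_f(x, mu + h, s2) - log_f(x, mu - h, s2)) / (2 * h)
num_s2 = (log_f(x, mu, s2 + h) - log_f(x, mu, s2 - h)) / (2 * h)
```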
Fisher information example continued
We now compute the Hessian matrix, which will lead us to our Fisher information matrix:

  \frac{\partial^2 \ln[f(x|\theta)]}{\partial \theta^2}
  = \begin{pmatrix} \frac{\partial^2 \ln[f(x|\theta)]}{\partial \mu^2} & \frac{\partial^2 \ln[f(x|\theta)]}{\partial \mu\,\partial \sigma^2} \\ \frac{\partial^2 \ln[f(x|\theta)]}{\partial \mu\,\partial \sigma^2} & \frac{\partial^2 \ln[f(x|\theta)]}{\partial (\sigma^2)^2} \end{pmatrix}
  = \begin{pmatrix} -\frac{1}{\sigma^2} & -\frac{x-\mu}{\sigma^4} \\ -\frac{x-\mu}{\sigma^4} & \frac{1}{2\sigma^4} - \frac{(x-\mu)^2}{\sigma^6} \end{pmatrix}   (13)

We now compute our Fisher information matrix. Using E[x-\mu] = 0 and E[(x-\mu)^2] = \sigma^2, we see that

  I(\theta) = -E\left( \frac{\partial^2 \ln[f(x|\theta)]}{\partial \theta^2} \right)   (14)

  = \begin{bmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{bmatrix}   (15)
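Matrix (15) can be sanity-checked by averaging −1 times the Hessian entries of (13) over simulated draws; the parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, s2 = 1.0, 2.0
x = rng.normal(mu, np.sqrt(s2), size=500_000)

# Sign-flipped Hessian entries from (13), averaged over the sample.
I_11 = 1.0 / s2                                # -(-1/sigma^2) is constant
I_12 = np.mean((x - mu) / s2 ** 2)             # -E[-(x-mu)/sigma^4], expect 0
I_22 = np.mean((x - mu) ** 2 / s2 ** 3 - 1.0 / (2 * s2 ** 2))
# Expected from (15): I_11 = 1/s2 = 0.5, I_12 = 0, I_22 = 1/(2 s2^2) = 0.125.
```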
Applications of Fisher information
Fisher information is used in the calculation of . . .
- The lower bound of Var(θ̂) (the Cramér–Rao lower bound) for an estimator θ̂, given by

  Var(\hat{\theta}) \ge \frac{1}{I(\theta)}   (16)

- The Wald test: comparing a proposed value θ_0 of θ against the MLE θ̂. The test statistic is given by

  W = \frac{\hat{\theta} - \theta_0}{s.e.(\hat{\theta})}   (17)

  where

  s.e.(\hat{\theta}) = \frac{1}{\sqrt{I(\hat{\theta})}}   (18)
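A sketch of (17)–(18) for the mean of a normal sample with known σ². Assumptions not in the slides: synthetic data, and I here is the whole-sample information n/σ² (per-observation information 1/σ² scaled by n), so the standard error comes out as σ/√n.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2, n = 1.0, 400
x = rng.normal(loc=0.2, scale=np.sqrt(sigma2), size=n)

theta_hat = x.mean()              # MLE of the mean
theta_0 = 0.0                     # proposed null value
info = n / sigma2                 # sample Fisher information for mu
se = 1.0 / np.sqrt(info)          # equation (18): s.e. = 1/sqrt(I)
W = (theta_hat - theta_0) / se    # equation (17)

# Under H0, W is approximately standard normal; |W| > 1.96 rejects at the 5% level.
```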