The likelihood approach to statistical decision problems
Marco Cattaneo
Department of Statistics, LMU Munich
30 January 2014
introduction
- the classical and Bayesian approaches to statistics are unified and generalized by the corresponding decision theories
- the likelihood approach to statistics is extremely successful in practice, but it is not unified and generalized by a decision theory
- is such a likelihood decision theory possible?
- in statistics, L usually denotes:
  - likelihood function (here λ)
  - loss function (here W)
- statistical model: (Ω, F, P_θ) with θ ∈ Θ (where Θ is a nonempty set) and random variables X : Ω → 𝒳 and X_n : Ω → 𝒳_n
Marco Cattaneo @ LMU Munich The likelihood approach to statistical decision problems 2/10
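To fix ideas, the notation above can be made concrete in a minimal sketch (a hypothetical Bernoulli model, not taken from the talk):

```python
# Hypothetical illustration of the notation (Θ, 𝒳, P_θ): a Bernoulli model,
# where θ ranges over Θ = (0, 1) and X takes values in the sample space {0, 1}.
Theta = (0.0, 1.0)          # parameter space Θ (an open interval)
sample_space = {0, 1}       # 𝒳, the set of possible observations

def P(theta, x):
    """P_θ(X = x) for the Bernoulli model."""
    return theta if x == 1 else 1.0 - theta

# e.g. under θ = 0.3, observing x = 1 has probability 0.3
assert P(0.3, 1) == 0.3
```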
loss function
- a statistical decision problem is described by a loss function

  W : Θ × D → [0, +∞[,

  where D is a nonempty set
- intended as a unification (and generalization) of statistical inference, in particular of:
  - point estimation (e.g., with D = Θ)
  - hypothesis testing (e.g., with D = {H_0, H_1})
- most successful general methods:
  - point estimation: maximum likelihood estimators
  - hypothesis testing: likelihood ratio tests
- these methods do not fit well in the setting of statistical decision theory: here they are unified (and generalized) in likelihood decision theory
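As an illustration of how a single loss function W : Θ × D → [0, +∞[ covers both problems, a small sketch (the set H and the constants c, c′ are hypothetical choices, not from the talk):

```python
# Hypothetical loss functions for the two inference problems above.

def W_estimation(theta, d):
    """Squared-error loss for point estimation (D = Θ)."""
    return (theta - d) ** 2

H = {0.2}  # hypothesized subset of Θ for testing H0: θ ∈ H

def W_testing(theta, d, c=1.0, c_prime=0.5):
    """Loss for testing: cost c for deciding H1 when θ ∈ H,
    cost c_prime for deciding H0 when θ ∉ H (with c >= c_prime)."""
    if d == "H1":
        return c if theta in H else 0.0
    return c_prime if theta not in H else 0.0
```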
likelihood function
- λ_x : Θ → [0, 1] is the (relative) likelihood function given X = x, when

  sup_{θ ∈ Θ} λ_x(θ) = 1 and λ_x(θ) ∝ P_θ(X = x)

  (with λ_x(θ) ∝ f_θ(x) as an approximation for continuous X)
- λ_x describes the relative plausibility of the possible values of θ in the light of the observation X = x, and can thus be used as a basis for post-data decision making
- prior information can be described by a prior likelihood function: if X_1 and X_2 are independent, then λ_{(x1,x2)} ∝ λ_{x1} λ_{x2}; that is, when X_2 = x_2 is observed, the prior λ_{x1} is updated to the posterior λ_{(x1,x2)}
- strong similarity with the Bayesian approach (both satisfy the likelihood principle): a fundamental advantage of the likelihood approach is the possibility of not using prior information (since λ_{x1} ≡ 1 describes complete ignorance)
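The sup-normalization and the pointwise-product update can be illustrated numerically (a hypothetical example: Bernoulli data on a grid over Θ):

```python
# Hypothetical numeric example: relative likelihood λ_x for n Bernoulli
# trials with s successes, on a grid over Θ = (0, 1), normalized so that
# sup_θ λ_x(θ) = 1.
thetas = [i / 100 for i in range(1, 100)]

def relative_likelihood(s, n):
    lik = [t ** s * (1 - t) ** (n - s) for t in thetas]
    m = max(lik)
    return [l / m for l in lik]          # sup-normalization

lam = relative_likelihood(s=7, n=10)
# the maximum (value 1) is attained at the maximum likelihood value θ = 0.7
assert thetas[lam.index(1.0)] == 0.7

# updating: for independent samples the likelihoods multiply pointwise
# (then renormalize), mirroring λ_{(x1,x2)} ∝ λ_{x1} λ_{x2}
prior = relative_likelihood(s=3, n=5)
post = [a * b for a, b in zip(prior, lam)]
m = max(post)
post = [p / m for p in post]
# the combined data have s = 10, n = 15, so the peak moves to ≈ 10/15
assert abs(thetas[post.index(1.0)] - 10 / 15) < 0.01
```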
likelihood decision criteria
- likelihood decision criterion: minimize V(W(·, d), λ_x) over d ∈ D, where the functional V must satisfy the following three properties, for all functions w, w′ : Θ → [0, +∞[ and all likelihood functions λ, λ_n : Θ → [0, 1]
  - monotonicity: w ≤ w′ (pointwise) ⇒ V(w, λ) ≤ V(w′, λ) (implied by the meaning of W)
  - parametrization invariance: b : Θ → Θ bijection ⇒ V(w ∘ b, λ ∘ b) = V(w, λ) (excludes the Bayesian criteria V(w, λ) = (∫ w λ dµ) / (∫ λ dµ) for infinite Θ)
  - consistency: H ⊆ Θ with lim_{n→∞} sup_{θ ∈ Θ\H} λ_n(θ) = 0 ⇒ lim_{n→∞} V(c I_H + c′ I_{Θ\H}, λ_n) = c for all constants c, c′ ∈ [0, +∞[ (excludes the minimax criterion V(w, λ) = sup_{θ ∈ Θ} w(θ); implies calibration: V(c, λ) = c)
- likelihood decision function: δ : 𝒳 → D such that δ(x) minimizes V(W(·, d), λ_x) over d ∈ D
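Some of these properties can be checked numerically for the sup-based functional V(w, λ) = sup_θ w(θ)λ(θ) (the MPL criterion of a later slide); the finite example below is hypothetical:

```python
# Hypothetical finite example checking properties of the sup-based
# functional V(w, λ) = sup_θ w(θ) λ(θ).
Theta = [0, 1, 2]
w = {0: 3.0, 1: 1.0, 2: 2.0}            # a loss profile W(·, d)
lam = {0: 1.0, 1: 0.5, 2: 0.25}         # sup-normalized likelihood
b = {0: 2, 1: 0, 2: 1}                  # a bijection b : Θ → Θ

def V_sup(w, lam):
    return max(w[t] * lam[t] for t in Theta)

# parametrization invariance: V(w ∘ b, λ ∘ b) = V(w, λ)
w_b = {t: w[b[t]] for t in Theta}
lam_b = {t: lam[b[t]] for t in Theta}
assert V_sup(w_b, lam_b) == V_sup(w, lam)

# calibration: V(c, λ) = c for constant loss, since sup_θ λ(θ) = 1
c = 4.0
const = {t: c for t in Theta}
assert V_sup(const, lam) == c

# consistency: λ_n → 0 outside H forces V towards the loss level c on H,
# even though the loss outside H is larger (minimax would give 5 instead)
H = {0}
c_H, c_out = 2.0, 5.0
w2 = {t: (c_H if t in H else c_out) for t in Theta}
for n in (1, 10, 50):
    lam_n = {t: (1.0 if t in H else 0.5 ** n) for t in Theta}
    v = V_sup(w2, lam_n)
assert abs(v - c_H) < 1e-9
```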
properties
- likelihood decision criteria have the advantages of post-data methods:
  - independence from the choice of possible alternative observations
  - direct interpretation
  - simpler problems
- likelihood decision criteria also have important pre-data properties:
  - equivariance: for invariant decision problems, the likelihood decision functions are equivariant
  - (strong) consistency: under some regularity conditions, the likelihood decision functions δ_n : 𝒳_1 × · · · × 𝒳_n → D satisfy

    lim_{n→∞} W(θ, δ_n(X_1, . . . , X_n)) = inf_{d ∈ D} W(θ, d)   P_θ-a.s.
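The consistency statement can be probed with a small simulation (a hypothetical setup, not from the talk: Bernoulli data, squared-error loss, and the sup-based criterion δ_n = argmin_d sup_θ (d − θ)² λ_n(θ) on a parameter grid):

```python
import math
import random

# Hypothetical simulation: for Bernoulli(θ0) data the realized loss
# W(θ0, δ_n) should approach inf_d W(θ0, d) = 0 as n grows.
random.seed(0)
theta0 = 0.3
grid = [i / 200 for i in range(1, 200)]

def decision(s, n):
    loglik = [s * math.log(t) + (n - s) * math.log(1 - t) for t in grid]
    m = max(loglik)
    lam = [math.exp(l - m) for l in loglik]       # sup-normalized likelihood
    return min(grid, key=lambda d: max((d - t) ** 2 * l
                                       for t, l in zip(grid, lam)))

s = sum(1 for _ in range(2000) if random.random() < theta0)
d_n = decision(s, 2000)
loss = (theta0 - d_n) ** 2
assert loss < 0.01        # already close to the infimum 0 at n = 2000
```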
MPL criterion
- MPL criterion: minimize sup_{θ ∈ Θ} W(θ, d) λ_x(θ), which corresponds to

  V(w, λ) = sup_{θ ∈ Θ} w(θ) λ(θ)

  (a nonadditive integral of w with respect to H ↦ sup_{θ ∈ H} λ(θ))
- point estimation:
  - D = Θ finite
  - W(θ, θ̂) = I_{θ̂ ≠ θ} (simple loss function)
  - the maximum likelihood estimator (when well-defined) is the likelihood decision function resulting from the MPL criterion
- hypothesis testing:
  - D = {H_0, H_1} with H_0 : θ ∈ H and H_1 : θ ∈ Θ \ H
  - W(θ, H_1) = c I_{θ ∈ H} and W(θ, H_0) = c′ I_{θ ∈ Θ\H} with c ≥ c′
  - the likelihood ratio test with critical value c′/c is the likelihood decision function resulting from the MPL criterion
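Both reductions can be verified on a tiny discrete model (the values of λ, c, and c′ below are hypothetical):

```python
# Tiny discrete check that the MPL criterion reproduces the maximum
# likelihood estimator and the likelihood ratio test.
Theta = ["a", "b", "c"]
lam = {"a": 1.0, "b": 0.4, "c": 0.1}      # sup-normalized likelihood λ_x

def mpl(decisions, W):
    """argmin_d sup_θ W(θ, d) λ_x(θ)"""
    return min(decisions, key=lambda d: max(W(t, d) * lam[t] for t in Theta))

# point estimation with the simple loss W(θ, d) = I_{d ≠ θ}:
est = mpl(Theta, lambda t, d: 0.0 if d == t else 1.0)
assert est == "a"                         # the maximum likelihood estimate

# testing H0: θ ∈ H against H1: θ ∈ Θ \ H with c = 1 and c' = 0.5:
H = {"b", "c"}
c, c_prime = 1.0, 0.5

def W_test(t, d):
    if d == "H1":
        return c if t in H else 0.0
    return c_prime if t not in H else 0.0

# sup_{θ∈H} λ_x(θ) = 0.4 < c'/c = 0.5, so the test rejects H0
assert mpl(["H0", "H1"], W_test) == "H1"
```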
a simple example
I X1, . . . , Xn i.i.d. ∼ N(θ, σ²) with Θ = ]0, +∞[ (that is, θ positive and σ known)
I estimation of θ with squared error loss:
I D = Θ with W(θ̂, θ) = (θ̂ − θ)²
I no unbiased estimator, the maximum likelihood estimator is not well-defined, and there is no standard (proper) Bayesian prior
I likelihood decision function resulting from the MPL criterion:
I scale invariance and sufficiency: θ̂(x1, . . . , xn) = g(x̄ / (σ/√n)) σ/√n
I consistency and asymptotic efficiency: θ̂(x1, . . . , xn) = x̄ when x̄ ≥ √2 σ/√n
[Figure: the function g (values 0 to 5) plotted against x from −4 to 4, and the MSE (values 0.5 to 1) plotted against θ from 0 to 5]
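The stated form of this likelihood decision function can be checked numerically. Working in standardized units t = θ/(σ/√n), d = θ̂/(σ/√n), z = x̄/(σ/√n), the relative likelihood is exp(−(z − t)²/2), and the MPL criterion minimizes sup_{t>0} (d − t)² exp(−(z − t)²/2) over d > 0. A minimal grid-search sketch (the grid bounds and resolution are arbitrary choices made for illustration):

```python
import numpy as np

def mpl_estimate(z, grid):
    """Grid search for argmin over d of sup_{t>0} (d - t)^2 * exp(-(z - t)^2 / 2)."""
    lik = np.exp(-0.5 * (z - grid) ** 2)              # relative likelihood at each t
    risks = [np.max((d - grid) ** 2 * lik) for d in grid]
    return grid[int(np.argmin(risks))]

grid = np.linspace(1e-6, 12.0, 4001)                  # positive values of t and d
for z in (2.0, 3.0, 5.0):                             # z >= sqrt(2): estimate should be ~ z
    est = mpl_estimate(z, grid)                       # i.e. xbar in the original units
```

For z ≥ √2 the grid search returns a value within grid resolution of z itself, matching the slide's claim that θ̂ = x̄ when x̄ ≥ √2 σ/√n.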
conclusion
I this work:
I fills a gap in the likelihood approach to statistics
I introduces an alternative to classical and Bayesian decision making
I offers a new perspective on likelihood methods
I likelihood decision making:
I is post-data and equivariant
I is consistent and asymptotically efficient
I does not need prior information