Testing in models that are not true

Date post: 04-Jan-2022

Introduction
What happens if assumptions are violated?
Nominal and substantive hypotheses
What can we do about the model assumptions?
Combined procedures

Testing in models that are not true

Christian Hennig

Christian Hennig Testing in models that are not true

1. Introduction

Frequentist statistical methods rely on model assumptions. For example, suppose we want to test, from a number of measurements, whether the water turbidity of a river is ≤ 25 NTU (a common standard).

Test H0 : µ ≤ 25 against H1 : µ > 25 using

$$T = \frac{\bar{X}_n - 25}{S_n/\sqrt{n}},$$

assuming X1, . . . , Xn i.i.d. with L(X1) = N(µ, σ²).
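As a minimal sketch of this computation, the statistic can be evaluated on a small sample. The ten turbidity readings below are invented for illustration, and the critical value 1.833 (the 0.95 quantile of the t distribution with 9 degrees of freedom) is hard-coded as an assumption so that only the Python standard library is needed.

```python
import math
import statistics

# Hypothetical turbidity readings in NTU (invented for illustration).
readings = [26.1, 24.8, 27.3, 25.9, 26.4, 25.2, 27.0, 24.9, 26.6, 25.8]

MU_0 = 25.0             # boundary of H0: mu <= 25
T_CRIT_95_DF9 = 1.833   # 0.95 quantile of t with n - 1 = 9 df (hard-coded)

n = len(readings)
x_bar = statistics.mean(readings)
s_n = statistics.stdev(readings)  # sample standard deviation (n - 1 denominator)

# T = (mean - 25) / (S_n / sqrt(n)); large values speak against H0.
t_stat = (x_bar - MU_0) / (s_n / math.sqrt(n))
reject_h0 = t_stat > T_CRIT_95_DF9

print(f"mean = {x_bar:.2f}, S_n = {s_n:.3f}, T = {t_stat:.3f}, reject H0: {reject_h0}")
```

The rejection rule is one-sided because H1 : µ > 25 only points in one direction.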

X1, . . . , Xn i.i.d. with L(X1) = N(µ, σ²)

What about these assumptions? Do they have to be fulfilled? Can this be checked?

But “all models are wrong”! This is often used as an argument against frequentist methods: “You have to believe the model is true, but it isn’t.”

Some are more careful and say, “the model has to be valid”. What does this mean, and can we check this?

An issue in testing:

Greenland, Senn et al. (2016): “In logical terms, the P value tests all the assumptions about how the data were generated, not just the targeted hypothesis it is supposed to test”.

Trafimow (2020, NISS debate): “I’ll make a more general comment, which is that since the model is wrong, in the sense of not being exactly correct, whenever you reject it, you haven’t learned anything.”

What is going on?

“Model world” and “real world” are separate: it’s not the job of models to be “true”. Models are tools for thinking.

Benefits of “model thinking” (even if the model is not true):
- Predictions (testable)
- Quantification of uncertainty (often testable)
- Inspiration for methods and decisions
- Unambiguous communication of point of view
- Learn through mathematics
- Learn from objections and falsification

Frequentist interpretation of probability:

“We think (at least tentatively) of the situation as . . . ”
- Potentially infinite repetition (of experimental conditions)
- P(A): relative frequency limit of occurrence of A
  (e.g., the normal distribution is defined by P(A) ∀A.)

“I.i.d.”:
Identity: We treat systematic differences as irrelevant.
Independence: We treat potential dependencies as irrelevant.

Of course, we need to discuss these for the situation of interest.

Detour on epistemic probability: “(Frequentist) probability does not exist” (de Finetti); model subjective (or “objective”) epistemic uncertainty instead.

But there is still the same separation between “model world” and “real epistemic uncertainty”, so this is no “solution” to “all models are wrong”.

If we’re interested in reality, why not model reality directly, rather than our thinking about reality?

2. What happens if assumptions are violated?

What does it mean that a method requires model assumptions?

It means there’s a result stating that the method will perform well, or even optimally, if the model assumptions are fulfilled.

The benefit of the model is that it inspires methods. This doesn’t mean we have to believe it’s true.

It doesn’t say anything about what happens if the model assumptions are not fulfilled.

How can we know what happens if assumptions are violated?

We need to model violated model assumptions, then use theory or simulations.

Some examples: Assume X1, . . . , Xn i.i.d. with L(X1) = N(µ, σ²), σ² = 1, n = 50; test H0 : µ = 0 against H1 : µ > 0, more precisely µ = 0.5, at α = 0.05.

(a) Rounded Gaussian: as above, but data rounded to multiples of 0.1 (very realistic, but no continuous likelihood!)

[Figure: two density plots over x from −4 to 4, panels “Gaussian” and “rounded Gaussian”.]

Performance of t-test of H0 : µ = 0

Distribution        Effective level   Power
Gaussian            0.05              0.93
rounded Gaussian    0.05              0.94
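Effective level and power of this kind can be estimated by simulation. The following is a minimal sketch and not the author's code: it assumes a two-sided t-test at α = 0.05 with the critical value 2.0096 (0.975 quantile of t with 49 degrees of freedom) hard-coded, 4000 replications, and a fixed seed; the function names are hypothetical. The estimates depend on these choices and need not match the table exactly.

```python
import math
import random

N = 50            # sample size used in the slides
REPS = 4000       # number of simulated samples (assumption)
T_CRIT = 2.0096   # 0.975 quantile of t with 49 df (hard-coded)


def t_statistic(sample, mu0=0.0):
    """One-sample t statistic (mean - mu0) / (S / sqrt(n))."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu0) / math.sqrt(var / n)


def rejection_rate(draw, rng, reps=REPS):
    """Share of simulated samples in which |T| exceeds the critical value."""
    rejections = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(N)]
        if abs(t_statistic(sample)) > T_CRIT:
            rejections += 1
    return rejections / reps


def gaussian(mu):
    return lambda rng: rng.gauss(mu, 1.0)


def rounded_gaussian(mu):
    # Same Gaussian, but recorded to one decimal place.
    return lambda rng: round(rng.gauss(mu, 1.0), 1)


rng = random.Random(1)
results = {}
for name, family in [("Gaussian", gaussian), ("rounded Gaussian", rounded_gaussian)]:
    level = rejection_rate(family(0.0), rng)  # data generated with mu = 0
    power = rejection_rate(family(0.5), rng)  # data generated with mu = 0.5
    results[name] = (level, power)
    print(f"{name:18s} level ~ {level:.3f}  power ~ {power:.3f}")
```

Other distributions can be examined by passing a different `draw` function to `rejection_rate`.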

Some examples: X1, . . . , X50 i.i.d. with L(X1) = N(µ, 1), test H0 : µ = 0 against H1 : µ = 0.5.

(b) (Shifted) exponential

[Figure: density of the shifted exponential distribution, x from −4 to 4.]

Performance of t-test of H0 : µ = 0

Distribution        Effective level   Power
Gaussian            0.05              0.93
rounded Gaussian    0.05              0.94
exponential         0.06              1

Central limit theorem: For large n, as long as variances exist, non-normality is not an issue.

More examples: X1, . . . , X50 i.i.d. with L(X1) = N(µ, 1), test H0 : µ = 0 against H1 : µ = 0.5.

(c) t2 (variance does not exist, so the CLT doesn’t hold)

[Figure: density of the t2 distribution, x from −4 to 4.]

Performance of t-test of H0 : µ = 0

Distribution        Effective level   Power
Gaussian            0.05              0.93
rounded Gaussian    0.05              0.94
exponential         0.06              1
t2                  0.04              0.39

More examples: X1, . . . , X50 i.i.d. with L(X1) = N(µ, 1), test H0 : µ = 0 against H1 : µ = 0.5.

(d) Gross error model

[Figure: density of the gross error model; x axis from −4 to 4 plus an isolated point at 1000.]

Here µ = 0 with P = 0.99 (the Gaussian part), but with probability 0.01 the observation is 1000, so E_P X = 10! Does this belong to H0 or H1 (do we compute level or power)?

General issue: µ is defined within the nominal model. If the model is violated, it’s a matter of interpretation how to “translate” H0 and H1.

(In fact this is also relevant for the exponential: do we want expected value, median, or mode = 0?)
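The gross error model can be simulated to see how the t-test reacts. This is a sketch under stated assumptions, not the author's code: the mixture is taken as 0.99 N(0, 1) plus a point mass 0.01 at 1000, the test is two-sided with the critical value for 49 degrees of freedom hard-coded, and all names are hypothetical. Whether the resulting rejection rate should be read as a level or as a power is exactly the interpretive question raised above.

```python
import math
import random

N, REPS = 50, 4000
T_CRIT = 2.0096  # 0.975 quantile of t with 49 df (hard-coded)


def draw_gross_error(rng, eps=0.01, outlier=1000.0):
    """One draw from 0.99 * N(0, 1) + 0.01 * point mass at 1000."""
    return outlier if rng.random() < eps else rng.gauss(0.0, 1.0)


def t_statistic(sample, mu0=0.0):
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu0) / math.sqrt(var / n)


rng = random.Random(2)
rejections = 0
clean_samples = 0
mean_sum = 0.0
for _ in range(REPS):
    sample = [draw_gross_error(rng) for _ in range(N)]
    mean_sum += sum(sample) / N
    clean_samples += all(x != 1000.0 for x in sample)  # no gross error drawn
    rejections += abs(t_statistic(sample)) > T_CRIT

rejection_rate = rejections / REPS
share_clean = clean_samples / REPS  # theory: 0.99 ** 50, about 0.605
average_mean = mean_sum / REPS      # theory: 0.99 * 0 + 0.01 * 1000 = 10

print(f"rejection rate ~ {rejection_rate:.3f}")
print(f"share of samples without gross error ~ {share_clean:.3f}")
print(f"average sample mean ~ {average_mean:.2f}")
```

A single value of 1000 inflates the sample standard deviation so much that the test rarely rejects in a contaminated sample, so rejections come almost entirely from the samples without a gross error.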

Performance of t-test of H0 : µ = 0

Distribution               Effective level   Power
Gaussian                   0.05              0.93
rounded Gaussian           0.05              0.94
exponential                0.06              1
t2                         0.04              0.39
gross error (E_P X = 10)   0.03              0.56
gross error (E_P X = 0)    0.60              0.56

More examples: X1, . . . , X50 i.i.d. with L(X1) = N(µ, 1), test H0 : µ = 0 against H1 : µ = 0.5.

(e) Constant correlation: X1, . . . , Xn marginally as above, ρ(Xi, Xj) = 0.1 ∀ i ≠ j.

[Figure: two series of 1000 observations, x plotted against observation number.]

Performance of t-test of H0 : µ = 0

Distribution              Effective level   Power
Gaussian                  0.05              0.93
rounded Gaussian          0.05              0.94
exponential               0.06              1
t2                        0.04              0.39
gross error               0.03              0.56
gross error (E_P X = 0)   0.60              0.56
correlated Gaussian       0.44              0.86

Some of these are dangerous, some are harmless.
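The danger of constant correlation can be illustrated with a small simulation. This sketch assumes one particular construction that is not in the slides: each sample shares a common Gaussian component Z0, which gives unit marginal variance and pairwise correlation ρ = 0.1. The test is taken as two-sided with a hard-coded critical value, and the names are hypothetical. Simulated values depend on the seed and on these choices and need not match the table.

```python
import math
import random

N, REPS, RHO = 50, 4000, 0.1
T_CRIT = 2.0096  # 0.975 quantile of t with 49 df (hard-coded)


def equicorrelated_sample(rng, mu):
    """X_i = mu + sqrt(rho) * Z0 + sqrt(1 - rho) * Z_i: Var = 1, Cor = rho."""
    shared = rng.gauss(0.0, 1.0)  # Z0, common to all observations of one sample
    return [mu + math.sqrt(RHO) * shared + math.sqrt(1.0 - RHO) * rng.gauss(0.0, 1.0)
            for _ in range(N)]


def t_statistic(sample, mu0=0.0):
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu0) / math.sqrt(var / n)


def rejection_rate(rng, mu):
    hits = 0
    for _ in range(REPS):
        if abs(t_statistic(equicorrelated_sample(rng, mu))) > T_CRIT:
            hits += 1
    return hits / REPS


rng = random.Random(4)
level = rejection_rate(rng, 0.0)
power = rejection_rate(rng, 0.5)

# Ratio of the true standard deviation of the mean to what S / sqrt(n) suggests.
scale = math.sqrt((1 + (N - 1) * RHO) / (1 - RHO))

print(f"level ~ {level:.3f}, power ~ {power:.3f}, scale factor ~ {scale:.2f}")
```

Under this construction the sample variance estimates (1 − ρ)σ² while the variance of the mean is σ²(1 + (n − 1)ρ)/n, so the statistic is stretched by roughly the factor `scale`, which is why the level is far above 0.05.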

3. Nominal and substantive hypotheses

Nominal H0 and H1 are defined in the “model world”, but we’re interested in a substantive hypothesis in the real world.

“Turbidity in river X at place Y over time period Z is (not) larger than 25.”

If the “true” distribution isn’t the nominal one, does it belong to the substantive H0, to H1, or neither? (“Should we reject?”)

“Turbidity in river X at place Y over time period Z is (not) larger than 25.”

Issues with “translation into model world”:
- Measurement error, idea of unobserved “true” turbidity
- How to aggregate the measurement distribution? (Median? Mean?)
- Definitory treatment of turbidity peaks/outliers
- (Ignored here:) Trend/dependence of “true” turbidity

If the “true” distribution isn’t the nominal one, does it belong to the substantive H0, to H1, or neither?

E.g. gross error model 0.99 N(25, 1) + 0.01 δ_1025: “Substantive µ” = 25 (H0; of Gaussian) or = 35 (H0; expected value)?

Do we see an outlier at 1025 as a “meaningless disturbance” or as important to take into account?

This needs judgement! The data cannot decide this; not even making a truth assumption is enough!

The CLT holds for the gross error model, but this doesn’t help if the expected value doesn’t reflect the substantive hypothesis!

What does the test actually do? The t-test with

$$T = \frac{\bar{X}_n - \mu}{S_n/\sqrt{n}},$$

rejecting H0 for |T| > cα, can be interpreted as testing the general nonparametric
H0 : P is such that P{|T| > cα} ≤ α against
H1 : P is such that P{|T| > cα} > α.
For this, the test is unbiased by definition.

The key issue then is: Does the definition of T indicate the desired direction of deviation from the substantive H0? Rather than “are the assumptions fulfilled?” (Which they aren’t.)

With this interpretation, it is not true that “the P value tests all the assumptions about how the data were generated, not just the targeted hypothesis it is supposed to test”.

It doesn’t automatically test the substantive hypothesis, but in fact it tests whether T is where it is expected to be under the H0 (. . . and under many other distributions, hopefully mostly formalising the substantive H0).

4. What can we do about the model assumptions?

Standard approaches:
- Misspecification testing
- Informal (visual) diagnosis
- “Translate” information about reality into model world, e.g., time dependence of water turbidity

Misspecification testing: H0 : Assumption holds, H1 : Assumption violated.

[Diagram: Data → Misspecification test. If the test doesn’t reject the assumption → model-based method (e.g. test). If it rejects the assumption → alternative method (if available), or not do anything?]

Fisher (1922): “For empirical as the specification of the hypothetical population may be, this empiricism is cleared of its dangers if we can apply a rigorous and objective test of its adequacy.”

Cox & Mayo (2006): “An important part of frequentist theory is its ability to check model assumptions.”

Kass et al. (2016): “Rule 8: Check your assumptions.”

Spanos (2018): “The typicality of (observations) z0 (for the proposed model) can - and should - be assessed using trenchant misspecification testing.”

Example: Shapiro-Wilk test for normality:

Distribution              Eff. level   Power   S-W detection prob.
Gaussian                  0.05         0.93    0.05
rounded Gaussian          0.05         0.94    0.05
exponential               0.06         1       0.99
t2                        0.04         0.39    0.86
gross error               0.03         0.56    0.42
gross error (E_P X = 0)   0.60         0.56    0.42
correlated Gaussian       0.44         0.86    (0.05)

Least normal ≠ most dangerous! The S-W test can’t find the rounded Gaussian. This bug is actually a feature! We don’t want to find everything.
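Detection probabilities of this kind can be estimated by counting how often the Shapiro-Wilk test rejects normality in simulated samples. The sketch below is an illustration under assumptions, not the author's code: it uses NumPy and `scipy.stats.shapiro`, n = 50, a 5% threshold, 2000 replications and a fixed seed, and it covers only four of the distributions. Results will vary with these choices.

```python
import numpy as np
from scipy import stats

N, REPS, ALPHA = 50, 2000, 0.05
rng = np.random.default_rng(3)

# Samplers for some of the distributions in the table. The Shapiro-Wilk
# statistic does not depend on location, so the exponential is left unshifted.
samplers = {
    "Gaussian": lambda: rng.normal(0.0, 1.0, N),
    "rounded Gaussian": lambda: np.round(rng.normal(0.0, 1.0, N), 1),
    "exponential": lambda: rng.exponential(1.0, N),
    "t2": lambda: rng.standard_t(2, N),
}

detection = {}
for name, draw in samplers.items():
    # stats.shapiro returns (statistic, p-value); index 1 is the p-value.
    rejects = sum(stats.shapiro(draw())[1] < ALPHA for _ in range(REPS))
    detection[name] = rejects / REPS
    print(f"{name:18s} S-W detection probability ~ {detection[name]:.3f}")
```

Comparing these frequencies with the level and power columns shows that how easily a violation is detected says little about how much it harms the t-test.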

Untestable assumptions

Constant correlation: X1, . . . , Xn marginally N(µ, σ²), ρ(Xi, Xj) = 0.1 ∀ i ≠ j.

[Figure: two series of 1000 observations, x plotted against observation number.]

This is pretty bad (see above) . . . but it’s indistinguishable from i.i.d.!

Why’s that? Assume X1, . . . , Xn as before with Cor(Xi, Xj) = ρ.

Lemma 1: For Y1, . . . , Yn i.i.d. with L(Y1) = N(µ, (1 − ρ)σ²):

$$\mathcal{L}(X_1, \ldots, X_n \mid \bar{X}_n) = \mathcal{L}(Y_1, \ldots, Y_n \mid \bar{Y}_n).$$

Proof: Elementary calculations on conditional multivariate normals.

Given the mean, the distributions are the same; for unknown µ, σ², the mean doesn’t hold information about ρ.
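One way to see why the lemma is plausible is the following sketch. It relies on a representation that is an assumption here and not taken from the slides: for ρ ≥ 0 and Z_0, Z_1, . . . , Z_n i.i.d. N(0, 1), write

$$X_i = \mu + \sigma\left(\sqrt{\rho}\,Z_0 + \sqrt{1-\rho}\,Z_i\right), \qquad Y_i = \mu + \sigma\sqrt{1-\rho}\,Z_i, \qquad i = 1, \ldots, n.$$

Then Var(X_i) = σ² and Cov(X_i, X_j) = ρσ² for i ≠ j, as required, and the shared component Z_0 cancels from the deviations:

$$X_i - \bar{X}_n = \sigma\sqrt{1-\rho}\,\left(Z_i - \bar{Z}_n\right) = Y_i - \bar{Y}_n.$$

The shared component only moves the mean, $\bar{X}_n = \mu + \sigma\sqrt{\rho}\,Z_0 + \sigma\sqrt{1-\rho}\,\bar{Z}_n$. For Gaussian variables the deviations $Z_i - \bar{Z}_n$ are independent of $(Z_0, \bar{Z}_n)$, so in both samples the observations given the mean are that mean plus deviations with the same joint distribution.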

Generally, we can only test dependence assuming a regularly repeated dependence pattern (such as in time series, or within random effect levels).

Dependence can only be found if we can specify how observation order is informative for it.

Other dependence patterns can only be excluded by assumption. The best we can do is to think very hard about the situation. (Same with irregular non-identity of distribution.)

Further issue with misspecification testing:
The misspecification (goodness-of-fit) paradox
(Hennig, 2007)

Checking the model assumptions violates them automatically,
because the possibility of unlikely events
is a constitutive part of the models.

(Long known in the literature,
e.g., Bancroft 1944, Chatfield 1995)
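A toy simulation may help to see the paradox. The check used below (accept normality only if no observation exceeds 3 in absolute value) is a made-up stand-in for a real misspecification test, and all names and settings are illustrative assumptions. The data are generated exactly from the Gaussian model, yet among the datasets that pass the check, an event that the model gives positive probability never occurs.

```python
import random

def passes_check(x, cutoff=3.0):
    """Hypothetical misspecification check: pass only if no |x_i| exceeds the cutoff."""
    return all(abs(v) <= cutoff for v in x)

def simulate(n=50, reps=10000, cutoff=3.0, seed=2):
    """Compare P(|X_1| > cutoff) over all datasets and over datasets that pass the check."""
    rng = random.Random(seed)
    all_first, kept_first = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # model assumptions hold exactly
        all_first.append(x[0])
        if passes_check(x, cutoff):
            kept_first.append(x[0])
    frac_all = sum(abs(v) > cutoff for v in all_first) / len(all_first)
    frac_kept = sum(abs(v) > cutoff for v in kept_first) / len(kept_first)
    pass_rate = len(kept_first) / reps
    return frac_all, frac_kept, pass_rate

frac_all, frac_kept, pass_rate = simulate()
print(f"share of datasets passing the check:     {pass_rate:.3f}")
print(f"P(|X_1| > 3) over all datasets:          {frac_all:.4f}")
print(f"P(|X_1| > 3) given the check was passed: {frac_kept:.4f}")
```

Conditionally on passing, the observations are no longer Gaussian, because large values have been ruled out by the selection itself.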

But is this a problem?
A. Spanos (2018): “No, we learn that model is valid for data.”
MS test and main test “pose very different questions to data”.
The MS test tests whether the data “constitutes truly typical
realization of mechanism described by model”.

In fact, if MS test and main test are independent,
the misspecification paradox does not affect
the distribution of the main test statistic.
(E.g., Gaussian linear regression model checking
based on residuals.)
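The role of independence can be illustrated with a simulation. The sketch below is an assumption-laden toy, not an example from the talk: data are i.i.d. N(0, 1), the main test is a one-sided z-test with σ = 1 treated as known, and both "checks" are invented for illustration. One check uses only residuals (independent of the mean under the Gaussian model), the other uses the raw values.

```python
import math
import random
from statistics import NormalDist, fmean

def conditional_levels(n=10, reps=20000, cutoff=2.5, alpha=0.05, seed=3):
    """Rejection rate of the z-test, overall and given that a check was passed."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1.0 - alpha)  # one-sided critical value
    # For each condition: [datasets kept, rejections among kept]
    counts = {"all": [0, 0], "residual check": [0, 0], "raw-value check": [0, 0]}
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # boundary of H0: mu <= 0
        m = fmean(x)
        reject = math.sqrt(n) * m > crit
        passed = {
            "all": True,
            # Uses residuals only; these are independent of the mean here.
            "residual check": max(abs(v - m) for v in x) <= cutoff,
            # Uses raw values, so it is not independent of the mean.
            "raw-value check": max(abs(v) for v in x) <= cutoff,
        }
        for name, ok in passed.items():
            if ok:
                counts[name][0] += 1
                counts[name][1] += reject
    return {name: rejected / kept for name, (kept, rejected) in counts.items()}

levels = conditional_levels()
for name, level in levels.items():
    print(f"{name:16s} rejection rate: {level:.3f}")
```

Given the residual-based check was passed, the rejection rate should stay close to the nominal 5%; for the raw-value check no such guarantee applies, and the conditional rate may differ.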

But independence is often not fulfilled.

Statistics literature from Bancroft (1944) onwards investigates
the distribution of the result
conditionally on not rejecting the assumption.

E.g., will the test level be kept, will power decline?
Also, does MS testing help if the model is violated?

Again: model the violation of the assumption and what is done,
and see what happens.

5. Combined procedures

[Flow diagram: Data → Misspecification test.
If the test does not reject the assumption → Model-based method (e.g. test);
if it rejects the assumption → Alternative method.
Either method → "reject H0 of interest" or "no evidence against H0 of interest".
Note in diagram: H0 of interest needs definition for both
model-based and alternative method, e.g. equality]

Analyse under nominal model and violated assumptions
what these procedures deliver.
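The flow above can be written as one small function. This is a minimal sketch; the three toy tests plugged in below (an outlier check, a mean threshold, a median threshold) are hypothetical placeholders with arbitrary cutoffs and are not meant as real statistical tests.

```python
from statistics import fmean, median
from typing import Callable, Sequence, Tuple

# A "test" here is any function that returns True if it rejects.
Test = Callable[[Sequence[float]], bool]

def combined_procedure(data: Sequence[float], ms_test: Test,
                       model_based: Test, alternative: Test) -> Tuple[str, bool]:
    """Run the MS test, then route the data to one of the two main methods.

    Returns (name of the method used, whether H0 of interest was rejected).
    """
    if ms_test(data):  # assumption rejected
        return "alternative", alternative(data)
    return "model-based", model_based(data)  # assumption not rejected

# Hypothetical placeholder tests, for illustration only.
def toy_ms_test(data):
    return max(abs(v) for v in data) > 3.0   # "reject assumption" if an outlier shows up

def toy_model_based(data):
    return fmean(data) > 0.5                 # "reject H0" based on the mean

def toy_alternative(data):
    return median(data) > 0.5                # "reject H0" based on the median

result_clean = combined_procedure([0.6, 0.7, 0.8], toy_ms_test,
                                  toy_model_based, toy_alternative)
result_outlier = combined_procedure([0.1, 0.2, 10.0], toy_ms_test,
                                    toy_model_based, toy_alternative)
print(result_clean)
print(result_outlier)
```

Treating the whole pipeline as a single procedure is what allows its level and power to be analysed, under the nominal model as well as under violations.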

Some results

Authors who investigated specific combined procedures:

Easterling and Anderson (1978): “The results given here (. . . ) are not supportive of the notion that preliminary testing is the proper thing to do.”

Freeman (1989): “In the light of the results in this paper, the two-stage analysis is so unsatisfactory as to be ruled out of future use.”

Moser and Stevens (1992): “Is the current practice of preliminary variance tests appropriate? The answer is no.”

Fay and Proschan (2010): “The choice between t- and Wilcoxon-Mann-Whitney should not be based on a test of normality.”

Rochon, Gondan and Kieser (2012): “From a formal perspective, preliminary testing for normality is incorrect and should therefore be avoided.”

Overall disturbing, given the
preference for assumption checking in the general literature.

. . . but at least King and Giles (1984): “We find that overall, pre-testing is preferable to pure OLS regression techniques and generally compares favourably with the strategy of always correcting for possible autocorrelation.”

“Mixed” setups

Literature looks at either fulfilled or violated assumptions.

[Flow diagram: Nominal model or Violated model → Data → Misspecification test;
don't reject assumption → Model-based method (e.g. test);
reject assumption → Alternative method.]

Simpson’s paradox: MS testing may not help for nominal model. . .

. . . may not help if assumptions violated. . .

. . . but can help if both are mixed.

Looking at nominal or violated model in isolation
will hide the ability of the MS test to make a difference.
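A back-of-the-envelope calculation shows how this can happen. All numbers below are hypothetical, chosen only for illustration, and the formula assumes that the MS test's decision is independent of the decisions of both main tests. The variable lam is the probability that a dataset comes from the nominal model.

```python
def mixed_power(lam, power_nominal, power_violated):
    """Power when data come from the nominal model with probability lam."""
    return lam * power_nominal + (1.0 - lam) * power_violated

def combined_power(lam, ms_reject_nominal, ms_reject_violated,
                   mb_nominal, alt_nominal, mb_violated, alt_violated):
    """Power of the combined procedure, assuming the MS decision is
    independent of the main tests' decisions (an assumption of this sketch)."""
    under_nominal = ((1 - ms_reject_nominal) * mb_nominal
                     + ms_reject_nominal * alt_nominal)
    under_violated = ((1 - ms_reject_violated) * mb_violated
                      + ms_reject_violated * alt_violated)
    return mixed_power(lam, under_nominal, under_violated)

# Hypothetical inputs: MS test rejects with prob. 0.05 (nominal) and 0.60 (violated);
# model-based power 0.50 / 0.30, alternative power 0.45 / 0.40 (nominal / violated).
settings = dict(ms_reject_nominal=0.05, ms_reject_violated=0.60,
                mb_nominal=0.50, alt_nominal=0.45,
                mb_violated=0.30, alt_violated=0.40)

def powers(lam):
    """Return (model-based, alternative, combined) power for a given lam."""
    mb = mixed_power(lam, settings["mb_nominal"], settings["mb_violated"])
    alt = mixed_power(lam, settings["alt_nominal"], settings["alt_violated"])
    return mb, alt, combined_power(lam, **settings)

print("lam   model-based  alternative  combined")
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    mb, alt, comb = powers(lam)
    print(f"{lam:4.2f}  {mb:11.4f}  {alt:11.4f}  {comb:8.4f}")
```

With these inputs the combined procedure loses to the model-based test at lam = 1 and to the alternative test at lam = 0, yet beats both at lam = 0.5.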

PhD thesis of Iqbal Shamsudheen:
Look at “mixed” setups
in which the model assumption is fulfilled with probability λ ∈ [0,1]
and violated otherwise.

(Two two-sample test examples;
look at power only here.
Type I error probability is also relevant,
but the level is not significantly violated
by any procedure in these examples.)

[Figure: power against lambda (0 to 1) for t-test and Wilcoxon test.
Title: Exponential mean diff=0.5, Normal mean diff=0.5, n=20.]

Setup from Rochon et al. (2012);
note that the t-test's advantage is larger for exponential than for normal.

[Figure: power against lambda (0 to 1) for t-test, Wilcoxon test and combined procedure.
Title: Exponential mean diff=0.5, Normal mean diff=0.5, n=20.]

. . . and combined procedure is quite competitive under normal.

[Figure: power against lambda (0 to 1) for t-test and Wilcoxon test.
Title: Laplace mean diff=0.5, Normal mean diff=0.5, n=20.]

[Figure: power against lambda (0 to 1) for t-test, Wilcoxon test and combined procedure.
Title: Laplace mean diff=0.5, Normal mean diff=0.5, n=20.]

. . . but combined procedure can better them both
for much of λ-range.

Many follow this pattern:

[Figure: plot whose axis labels and legend are not recoverable from the source.]

A general theoretical result

Lemma 2, Shamsudheen & Hennig (2020):

Look at probability $\lambda$ for fulfilled assumptions $P$,
otherwise violated assumptions $Q$.

Assume $\Phi_{MS}$ (approx.) independent of both $\Phi_{MC}$ and $\Phi_{AU}$
(MS test, model-based test and alternative test, respectively).

Assume MS test “better than useless”.

Assume model-based method has higher power under $P$,
alternative higher power under $Q$.

Then combined procedure has higher power than
both $\Phi_{MC}$ and $\Phi_{AU}$ for $\lambda \in [\lambda_1, \lambda_2]$, $0 < \lambda_1 < \lambda_2 < 1$.
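The following is a sketch of why such a $\lambda$-range exists, not the proof from Shamsudheen & Hennig (2020). The notation is introduced here only for illustration: $a$ and $b$ are the probabilities that $\Phi_{MS}$ rejects under $P$ and $Q$, $\pi^P_{MC}$ etc. are the powers of the main tests under $P$ or $Q$, exact independence is assumed, and “better than useless” is read as $b > a$.

```latex
% Assumed notation: a = P_P(Phi_MS rejects), b = P_Q(Phi_MS rejects), 0 < a < b < 1,
% d_P = pi^P_MC - pi^P_AU > 0,  d_Q = pi^Q_AU - pi^Q_MC > 0.
\begin{align*}
\pi_{MC}(\lambda) &= \lambda\,\pi^P_{MC} + (1-\lambda)\,\pi^Q_{MC}, \qquad
\pi_{AU}(\lambda) = \lambda\,\pi^P_{AU} + (1-\lambda)\,\pi^Q_{AU}, \\
\pi_{C}(\lambda) &= \lambda\bigl[(1-a)\,\pi^P_{MC} + a\,\pi^P_{AU}\bigr]
                  + (1-\lambda)\bigl[(1-b)\,\pi^Q_{MC} + b\,\pi^Q_{AU}\bigr], \\
\pi_{C}(\lambda) - \pi_{MC}(\lambda) &= -\lambda\, a\, d_P + (1-\lambda)\, b\, d_Q, \\
\pi_{C}(\lambda) - \pi_{AU}(\lambda) &= \lambda\,(1-a)\, d_P - (1-\lambda)\,(1-b)\, d_Q.
\end{align*}
% Both differences are positive exactly when
\[
\frac{(1-b)\,d_Q}{(1-a)\,d_P + (1-b)\,d_Q}
\;<\; \lambda \;<\;
\frac{b\,d_Q}{a\,d_P + b\,d_Q}.
\]
```

Cross-multiplying shows that the lower bound is smaller than the upper bound exactly when $a < b$, so under this reading the range is non-empty precisely because the MS test rejects more often under $Q$ than under $P$. Any closed interval $[\lambda_1, \lambda_2]$ inside this open range then fits the statement of the lemma.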

Are MS testing/combined procedures advisable?

No, if model-based test is robust (good overall).
No, if alternative test is good also under nominal model.
No, if good robust/alternative approaches are preferred.

Yes, if MS test is sensitive to violations that matter,
and MS test is approximately independent of main tests,
and main tests have “complementary qualities”,
and both close-to-nominal and violated assumptions seem realistic.

Details matter!

Major issue with current MS testing:

Focus on testing whether model assumptions hold,
but focus should be to distinguish
problematic from unproblematic violations!

Much research potential!

Discussion

More than one assumption needs checking.

More complicated combined procedures;
analyse easier cases first.

Is visual assumption checking better?
It may be, in the hands of a good data analyst,
but it may also be worse, and
it cannot be analysed by theory or simulation!

Key take-aways

- Much communication about model assumptions is misleading.
- The issue is not whether assumptions are fulfilled,
  but rather whether they are violated in ways that mislead
  about the substantive hypothesis.
- Whether assumption checking helps depends on many details.
- Some key assumptions cannot be checked against data.
- Judgment and interpretation are always involved.
- None of these issues is solved by Bayesian statistics.

References:

Bancroft, T. A. (1944) On biases in estimation due to the use of preliminary tests of significance. Annals of Mathematical Statistics 15(2), 190-204.

Chatfield, C. (1995) Model Uncertainty, Data Mining and Statistical Inference (with discussion). Journal of the Royal Statistical Society, Series A 158(3), 419-466.

Cox, D. R. (2006) Principles of Statistical Inference. Cambridge University Press.

Easterling, R. G. and Anderson, H. E. (1978) The effect of preliminary normality goodness of fit tests on subsequent inference. Journal of Statistical Computation and Simulation 8(1), 1-11.

Fay, M. P. and Proschan, M. A. (2010) Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules. Statistics Surveys 4, 1-39.

Fisher, R. A. (1922) On the Mathematical Foundations of Theoretical Statistics. Philosophical Transactions of the Royal Society of London A 222, 309-368.

Freeman, P. (1989) The performance of the two-stage analysis of two-treatment, two-period cross-over trials. Statistics in Medicine 8, 1421-1432.

Greenland, S., Senn, S. J., Rothman, K. J. et al. (2016) Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology 31, 337-350.

Hennig, C. (2007) Falsification of propensity models by statistical tests and the goodness-of-fit paradox. Philosophia Mathematica 15(2), 166-192.

Kass, R. E., Caffo, B. S., Davidian, M., Meng, X. L., Yu, B. and Reid, N. (2016) Ten simple rules for effective statistical practice. PLoS Computational Biology 12(6), e1004961.

King, M. L. and Giles, D. E. A. (1984) Autocorrelation pre-testing in the linear model: Estimation, testing and prediction. Journal of Econometrics 25(1), 35-48.

Moser, B. K. and Stevens, G. R. (1992) Homogeneity of variance in the two-sample means test. The American Statistician 46(1), 19-21.

Rochon, J., Gondan, M. and Kieser, M. (2012) To test or not to test: Preliminary assessment of normality when comparing two independent samples. BMC Medical Research Methodology 12(1), 81-91.

Shamsudheen, M. I. and Hennig, C. (2020) Should we test the model assumptions before running a model-based test? arXiv:1908.02218.

Spanos, A. (2018) Mis-specification Testing in Retrospect. Journal of Economic Surveys 32(2), 541-577.
