Thoughts on the theory of statistics


Nancy Reid

Theory of statistics

SSC 2010

Statistics in demand

“Statistical science is undergoing unprecedented growth in both opportunity and activity”

High energy physics, art history, reality mining, bioinformatics, complex surveys, climate and environment, SSC 2010 …

Statistical Thinking

Dramatic increase in resources now available

Statistical Thinking [1]

If a statistic was the answer, what was the question? What are we counting?

Common pitfalls: means, medians and outliers

How sure are we? Statistical significance and confidence

Percentages and risk: relative and absolute change
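A small illustration of two of these pitfalls, using made-up numbers (the incomes and risks below are hypothetical, not from the talk): a single outlier moves the mean but barely touches the median, and the same change in risk sounds very different when reported as a relative rather than an absolute change.

```python
from statistics import mean, median

# Hypothetical incomes (in thousands of dollars); the last value is an outlier.
incomes = [32, 35, 38, 41, 44, 47, 950]
income_mean = mean(incomes)      # dragged far upward by the single outlier
income_median = median(incomes)  # 41, essentially unaffected

# Hypothetical risks: 1 in 10,000 without exposure, 2 in 10,000 with exposure.
baseline_risk = 1 / 10_000
exposed_risk = 2 / 10_000
relative_change = (exposed_risk - baseline_risk) / baseline_risk  # 1.0: "the risk doubles"
absolute_change = exposed_risk - baseline_risk                    # one extra case per 10,000

print(f"mean = {income_mean:.1f}, median = {income_median}")
print(f"relative change = {relative_change:.0%}, "
      f"absolute change = {absolute_change * 100:.2f} percentage points")
```

A headline saying the risk "doubles" is correct here, but the absolute change is one additional case in ten thousand.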


Statistical theory for 20xx

What should we be teaching?

Design of experiments and surveys

Summary statistics: sufficiency, etc.

Inference

Interpretation

Statistical Thinking: If a statistic was the answer, what was the question? / Common pitfalls / How sure are we? / Percentages and risk

Models and likelihood

Modelling is difficult and important

We can get a lot from the likelihood function:

not only point estimators

not only (not at all!!) most powerful tests

inferential quantities (pivots)

inferential distributions (asymptotics)

A natural starting point, even for very complex models

Likelihood is everywhere! [2]

Outline

1. Higher order asymptotics: likelihood as pivotal

2. Bayesian and non-Bayesian inference

3. Partial, quasi, composite likelihood

4. Where are we headed?

P-value functions from likelihood

Likelihood as pivotal


Can be nearly exact

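Likelihood root: $r(\theta) = \mathrm{sign}(\hat\theta - \theta)\,[2\{\ell(\hat\theta) - \ell(\theta)\}]^{1/2}$

Maximum likelihood estimate: $w(\theta) = (\hat\theta - \theta)\, j^{1/2}(\hat\theta)$

Score function: $s(\theta) = \ell'(\theta)\, j^{-1/2}(\hat\theta)$

All approximately distributed as $N(0, 1)$, where $j(\hat\theta) = -\ell''(\hat\theta)$ is the observed information

Much better: $r^*(\theta) = r(\theta) + \frac{1}{r(\theta)} \log \frac{q(\theta)}{r(\theta)}$ is approximately $N(0, 1)$, with relative error $O(n^{-3/2})$

$q(\theta)$ depends on the model; it can be $w(\theta)$ when $\theta$ is the canonical parameter of an exponential family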
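To see how much $r^* = r + (1/r)\log(q/r)$ can improve on the likelihood root $r$, here is a minimal sketch for a case where the exact answer is known: $n$ independent exponential observations with rate $\theta$. The choices are ours, not the slide's: $\theta$ is the canonical parameter of this one-parameter exponential family, so $q$ is taken to be the Wald statistic $(\hat\theta - \theta) j^{1/2}(\hat\theta)$ with $j(\hat\theta) = n/\hat\theta^2$; the data summary ($n = 5$, sum $= 10$) and the tested value $\theta = 1$ are hypothetical. Both $\Phi(r)$ and $\Phi(r^*)$ approximate the p-value function $P(\hat\theta \le \hat\theta^{obs}; \theta)$, which is computed exactly from the Gamma distribution of the sum (a Poisson tail sum for integer $n$).

```python
import math
from statistics import NormalDist

def loglik(theta, n, s):
    """Log-likelihood for n i.i.d. exponential observations with rate theta; s is their sum."""
    return n * math.log(theta) - theta * s

def first_order_and_rstar(theta, n, s):
    """Return (r, q, r*) at theta for the exponential-rate model."""
    theta_hat = n / s                                          # maximum likelihood estimate
    dev = max(0.0, 2.0 * (loglik(theta_hat, n, s) - loglik(theta, n, s)))
    r = math.copysign(math.sqrt(dev), theta_hat - theta)       # likelihood root
    # theta is canonical here, so q is the Wald statistic with j(theta_hat) = n / theta_hat**2
    q = (theta_hat - theta) * math.sqrt(n) / theta_hat
    if abs(r) < 1e-8:                                          # r* is undefined at theta_hat; fall back to r
        return r, q, r
    return r, q, r + math.log(q / r) / r

def exact_pvalue(theta, n, s):
    """P(theta_hat <= observed; theta) = P(S >= s) with S ~ Gamma(n, rate theta),
    which for integer n equals P(Poisson(theta * s) <= n - 1)."""
    x = theta * s
    return sum(math.exp(-x) * x ** k / math.factorial(k) for k in range(n))

# Hypothetical data summary: n = 5 observations with sum 10; evaluate at theta = 1.
n, s, theta0 = 5, 10.0, 1.0
r, q, rstar = first_order_and_rstar(theta0, n, s)
Phi = NormalDist().cdf
p_first_order = Phi(r)
p_rstar = Phi(rstar)
p_exact = exact_pvalue(theta0, n, s)
print(f"Phi(r) = {p_first_order:.4f}, Phi(r*) = {p_rstar:.4f}, exact = {p_exact:.4f}")
```

With only five observations, $\Phi(r^*)$ agrees with the exact tail probability to about four decimal places, while $\Phi(r)$ is off in the second.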


Can be nearly exact [3]

Using higher order approximations

Excellent approximations for ‘easy’ cases: exponential families, non-normal linear regression

More work to construct for ‘moderate’ cases: autoregressive models, fixed and random effects, discrete responses

Fairly delicate for ‘difficult’ cases: complex structural models with several sources of variation

Best results for a scalar parameter of interest, but we may need inference for vector parameters

Where does this come from? [4]

Amari, 1982, Biometrika; Efron, 1975, Annals

Where does this come from? [5, 6, 7]

Differential geometry of statistical models

Theory of exponential families

Edgeworth and saddlepoint approximations

Key idea: a smooth parametric model can be approximated by a tangent exponential family model

This requires differentiating the log-likelihood function on the sample space

It permits extensions to more complex models

Where does this come from? [8]

Generalizations

To discrete data, where differentiating the log-likelihood on the sample space is more difficult

Solution: use the expected value of the score statistic instead

Relative error $O(n^{-1})$ instead of $O(n^{-3/2})$

Still better than the normal approximation

Generalizations [9]

Generalizations [10]

To vector parameters of interest

But our solutions require a single parameter

Solution: use the length of the vector, conditioned on its direction

Generalizations [11]

Extending the role of the exponential family

By generalizing differentiation on the sample space

Idea: differentiate the expected log-likelihood instead of the log-likelihood

Leads to a new version of the approximating exponential family

Can be used with pseudo-likelihoods

What can we learn? [12]

Bayesian/non-Bayesian

Higher order approximation requires differentiating the log-likelihood function on the sample space

Bayesian inference will be different

Asymptotic expansion highlights the discrepancy

Bayesian posteriors are in general not calibrated

This cannot always be corrected by the choice of prior

We can study this by comparing Bayesian and non-Bayesian approximations

Example: inference for ED50 [13]

Logistic regression with a single covariate

On the logistic scale: $\mathrm{logit}\, p_i = \beta_0 + \beta_1 x_i$

Use flat priors for …

Parameter of interest is $\psi = -\beta_0/\beta_1$, the covariate value at which $p = 0.5$

Empirical coverage of Bayesian posterior intervals: 0.90, 0.88, 0.89, 0.90

Empirical coverage of intervals using …: 0.95, 0.95, 0.95, 0.95

Flat priors are not a good idea! [14]

Bayesian p-value – Frequentist p-value
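The ED50 simulation and the figure are not reproduced here. As a separate one-parameter illustration of the same point (our own example, not from the talk), take $n$ independent exponential observations with rate $\theta$. With a prior proportional to $1/\theta$ (the Jeffreys prior for this model) the posterior tail probability $P(\theta \ge \theta_0 \mid y)$ coincides with the frequentist p-value $P(\hat\theta \le \hat\theta^{obs}; \theta_0)$, while a flat prior on $\theta$ does not. The data summary is hypothetical, and the posterior is integrated numerically so the agreement is checked rather than assumed.

```python
import math

def poisson_cdf(k, x):
    """P(Poisson(x) <= k)."""
    return sum(math.exp(-x) * x ** j / math.factorial(j) for j in range(k + 1))

def simpson(f, a, b, m=20_000):
    """Composite Simpson's rule on [a, b] with an even number m of sub-intervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def posterior_upper_tail(theta0, n, s, prior_power, upper=20.0):
    """Posterior P(theta >= theta0 | y) for the exponential-rate model under a prior
    proportional to theta**(-prior_power); mass beyond `upper` is negligible here."""
    def unnormalized(t):
        return t ** (n - prior_power) * math.exp(-t * s)   # prior times likelihood
    return simpson(unnormalized, theta0, upper) / simpson(unnormalized, 0.0, upper)

# Hypothetical data summary: n = 5 observations with sum s = 10; theta0 = 1.
n, s, theta0 = 5, 10.0, 1.0
frequentist_p = poisson_cdf(n - 1, theta0 * s)                  # P(theta_hat <= observed; theta0)
flat_p = posterior_upper_tail(theta0, n, s, prior_power=0)      # flat prior on theta
jeffreys_p = posterior_upper_tail(theta0, n, s, prior_power=1)  # prior proportional to 1/theta
print(f"frequentist {frequentist_p:.4f}, flat prior {flat_p:.4f}, 1/theta prior {jeffreys_p:.4f}")
```

The flat-prior posterior probability here is more than twice the frequentist p-value, so a posterior interval built from it would not have its nominal frequentist coverage.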


More complex models

Partial, quasi, composite likelihood

Likelihood inference has desirable properties: sufficiency, asymptotic efficiency

Good approximations to needed distributions

Derived naturally from parametric models

Can be difficult to construct, especially in complex models

Many natural extensions: partial likelihood for censored data, quasi-likelihood for generalized estimating equations, composite likelihood for dependent data

Complex models [15]

Example: longitudinal study of migraine sufferers

Latent variable and observed variable (e.g. no headache, mild, moderate, intense, …)

Covariates: age, education, painkillers, weather, …

Random effects between and within subjects

Serial correlation

Likelihood for longitudinal discrete data

Likelihood function

Hard to compute

Makes strong assumptions

Proposal: use bivariate marginal densities instead of full multivariate normal densities

This gives a mis-specified model

Composite likelihood

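Composite likelihood function: $CL(\theta; y) = \prod_{r<s} f(y_r, y_s; \theta)$, built from the bivariate marginal densities

More generally: $CL(\theta; y) = \prod_{k=1}^{K} f(y \in A_k; \theta)^{w_k}$, with non-negative weights $w_k$

The sets $A_k$ index marginal or conditional (or …) distributions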

Inference based on theory of estimating equations
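A minimal sketch of a pairwise composite likelihood, assuming a $q$-dimensional normal model with zero mean, unit variances and a common correlation $\rho$. This model is chosen only because it is easy to simulate; the full likelihood is available here too. The log composite likelihood is the sum of bivariate normal log densities over all pairs, with every weight equal to one, and $\rho$ is estimated by a golden-section search, which assumes the function is unimodal on the search interval. Sample size, dimension, true $\rho$ and seed are arbitrary.

```python
import math
import random
from itertools import combinations

def simulate_equicorrelated(n, q, rho, rng):
    """Hypothetical data: n vectors from N_q(0, R), R with unit diagonal and common correlation rho >= 0."""
    data = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        data.append([math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                     for _ in range(q)])
    return data

def pairwise_loglik(rho, data):
    """Sum over observations and over pairs (r, s) of bivariate normal log densities."""
    one_minus = 1.0 - rho * rho
    log_const = -math.log(2.0 * math.pi) - 0.5 * math.log(one_minus)
    total = 0.0
    for y in data:
        for a, b in combinations(y, 2):
            total += log_const - 0.5 * (a * a - 2.0 * rho * a * b + b * b) / one_minus
    return total

def golden_max(f, lo, hi, tol=1e-5):
    """Golden-section search for the maximiser; assumes f is unimodal on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) > f(d):          # maximum lies in [lo, d]
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:                    # maximum lies in [c, hi]
            lo, c = c, d
            d = lo + g * (hi - lo)
    return 0.5 * (lo + hi)

rng = random.Random(2010)
data = simulate_equicorrelated(n=2000, q=5, rho=0.5, rng=rng)
rho_pl = golden_max(lambda r: pairwise_loglik(r, data), -0.95, 0.95)
print(f"pairwise likelihood estimate of rho: {rho_pl:.3f}")
```

Only bivariate densities are ever evaluated, which is the point of the proposal: the same code structure applies when the full joint density would require a high-dimensional integral.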


A simple example [16]

Pairwise likelihood estimator of … fully efficient

If …, loss of efficiency depends on dimension

Small for dimension less than, say, 10

Falls apart if … for fixed sample size

Relevant for time series, genetics applications

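Composite likelihood estimator: $\hat\theta_{CL} = \arg\max_\theta \log CL(\theta; y)$, approximately $N\{\theta, G^{-1}(\theta)\}$

Godambe information: $G(\theta) = H(\theta) J^{-1}(\theta) H(\theta)$, where $H(\theta) = E\{-\nabla^2 \log CL(\theta; Y)\}$ and $J(\theta) = \mathrm{var}\{\nabla \log CL(\theta; Y)\}$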
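The sandwich form of the Godambe information can be seen in the simplest composite likelihood, the independence likelihood, which multiplies univariate margins. A minimal sketch, assuming clusters of $q$ equicorrelated $N(\mu, 1)$ observations (the simulation settings are hypothetical): treating all observations as independent gives the grand mean as the estimate of $\mu$, but the naive variance $1/H$ ignores the within-cluster correlation, whereas the inverse Godambe information $H^{-1} J H^{-1}$ accounts for it, with $J$ estimated from the cluster-level scores.

```python
import math
import random

def simulate_clusters(n_clusters, q, rho, mu, rng):
    """Hypothetical clustered data: each cluster is equicorrelated N(mu, 1) with correlation rho >= 0."""
    clusters = []
    for _ in range(n_clusters):
        shared = rng.gauss(0.0, 1.0)
        clusters.append([mu + math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                         for _ in range(q)])
    return clusters

def independence_cl_fit(clusters):
    """Fit mu with the independence likelihood (all values treated as independent N(mu, 1)).

    Returns the estimate, the naive variance 1/H, and the sandwich variance J/H**2,
    which is the inverse Godambe information for this one-parameter problem.
    """
    n_obs = sum(len(y) for y in clusters)
    mu_hat = sum(sum(y) for y in clusters) / n_obs               # root of the composite score
    H = float(n_obs)                                              # minus the derivative of sum_ij (y_ij - mu)
    J = sum((sum(y) - len(y) * mu_hat) ** 2 for y in clusters)    # empirical variance of cluster scores
    return mu_hat, 1.0 / H, J / H ** 2

rng = random.Random(2010)
data = simulate_clusters(n_clusters=4000, q=5, rho=0.5, mu=2.0, rng=rng)
mu_hat, naive_var, godambe_var = independence_cl_fit(data)
ratio = godambe_var / naive_var
print(f"mu_hat = {mu_hat:.3f}, variance inflation = {ratio:.2f}")
```

For these settings the ratio of the two variances should be close to $1 + (q - 1)\rho = 3$, the variance inflation produced by equicorrelation within clusters; using $1/H$ alone would overstate the precision of the composite likelihood estimator.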


Recent Applications [17]

Longitudinal data, binary and continuous: random effects models

Survival analysis: frailty models, copulas

Multi-type responses: discrete and continuous; markers and event times

Finance: time-varying covariance models

Genetics/bioinformatics: CCL for von Mises distribution: protein folding; gene mapping; linkage disequilibrium

Spatial data: geostatistics, spatial point processes

… and more

Image analysis, Rasch model, Bradley-Terry model, state space models, population dynamics …

What can we learn?


What do we need to know?

Why are composite likelihood estimators efficient?

How much information should we use?

Are the parameters guaranteed to be identifiable?

Are we sure the components are consistent with a ‘true’ model?

Can we make progress if not?

How do joint densities get constructed?

What properties do these constructions have?

Is composite likelihood robust?

Why is this important?

Composite likelihood ideas were generated from applications

Likelihood methods seem too complicated

A range of application areas all use the same or similar ideas

The abstraction provided by theory allows us to step back from the particular application

To get some understanding about when the methods might not work

As well as when they are expected to work well

The role of theory

Where are we headed?

Abstracts the main ideas

Simplifies the details

Isolates particular features

In the best scenario, gives new insight into what underlies our intuition

Example: curvature and Bayesian inference

Example: composite likelihood

Example: false discovery rates

False discovery rates [18]

Problem of multiple comparisons: Simultaneous statistical inference, R.G. Miller, 1966

Bonferroni correction too strong

Benjamini and Hochberg, 1995: introduce the False Discovery Rate

An improvement (huge!) on “Type I and Type II error”

Then comes data, in this case from astrophysics: Genovese & Wasserman collaborating with Miller and Nichol
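The contrast between the Bonferroni correction and the Benjamini and Hochberg procedure can be sketched in a few lines. This is a minimal version of the standard step-up rule (sort the $m$ p-values, find the largest $k$ with $p_{(k)} \le k\alpha/m$, reject the $k$ smallest); the p-values are hypothetical, not the astrophysics data.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])    # indices by increasing p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:                 # compare p_(k) with k * alpha / m
            k_max = rank                                   # keep the largest such k (step-up)
    return sorted(order[:k_max])

def bonferroni(pvalues, alpha=0.05):
    """Indices rejected by the Bonferroni correction (controls the family-wise error rate)."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]

# Hypothetical p-values from m = 10 tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("BH rejects:", benjamini_hochberg(pvals))
print("Bonferroni rejects:", bonferroni(pvals))
```

Bonferroni compares every p-value with $\alpha/m$, while Benjamini and Hochberg let the threshold grow with the rank, so they reject at least as many hypotheses; the gain is largest when many true signals are present, as in large surveys.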

False discovery rates [19]

Speculation [20]

Composite likelihood as a smoother

Calibration of posterior inference

Extension of higher order asymptotics to composite likelihood

Exponential families and empirical likelihood

Semi-parametric and non-parametric models connected to higher order asymptotics

Effective dimension reduction for inference

Ensemble methods in machine learning

Speculation [21]

“in statistics the problems always evolve relative to the development of new data structures and new computational tools” … NSF report

“Statistics is driven by data” … Don McLeish

“Our discipline needs collaborations” … Hugh Chipman

How do we create opportunities?

How do we establish an independent identity, in the face of bureaucratic pressures to merge?

Keep emphasizing what we do best!!

Speculation

Engle: variation, modelling, data, theory, data, theory

Tibshirani: cross-validation; forensic statistics

Netflix Grand Prize: recommender systems draw on machine learning, psychology, statistics!

Tufte: “Visual Display of Quantitative Information”, 1983

http://recovery.gov

$787,000,000,000

Thank you!!

End Notes

1. “Making Sense of Statistics”. Accessed May 5, 2010. http://www.senseaboutscience.org.uk/
2. Midlife Crisis: National Post, January 30, 2008.
3. Alessandra Brazzale, Anthony Davison and Reid (2007). Applied Asymptotics. Cambridge University Press.
4. Amari (1982). Biometrika.
5. Fraser, Reid, Jianrong Wu (1999). Biometrika.
6. Reid (2003). Annals of Statistics.
7. Fraser (1990). J. Multivariate Anal.
8. Figure drawn by Alessandra Brazzale. From Reid (2003).
9. Davison, Fraser, Reid (2006). JRSS B.
10. Davison, Fraser, Reid, Nicola Sartori (2010). In progress.
11. Reid and Fraser (2010). Biometrika.
12. Fraser, Reid, Elisabetta Marras, Grace Yun-Yi (2010). JRSS B.
13. Reid and Ye Sun (2009). Communications in Statistics.
14. J. Heinrich (2003). Phystat Proceedings.
15. C. Varin, C. Czado (2010). Biostatistics.
16. D. Cox, Reid (2004). Biometrika.
17. CL references in C. Varin, D. Firth, Reid (2010). Submitted for publication.
18. Account of FDR and astronomy taken from Lindsay et al. (2004). NSF Report on the Future of Statistics.
19. Miller et al. (2001). Science.
20. Photo: http://epiac1216.wordpress.com/2008/09/23/origins-of-the-phrase-pie-in-the-sky/
21. Photo: http://www.bankofcanada.ca/en/banknotes/legislation/images/023361-lg.jpg
