Post on 23-Feb-2016
Thoughts on the theory of statistics
Nancy Reid
Theory of statistics
SSC 2010
Statistics in demand
“Statistical science is undergoing unprecedented growth in both opportunity and activity”
High energy physics
Art history
Reality mining
Bioinformatics
Complex surveys
Climate and environment
SSC 2010 …
Statistical Thinking
Dramatic increase in resources now available
Statistical Thinking 1
If a statistic was the answer, what was the question? What are we counting?
Common pitfalls: means, medians and outliers
How sure are we? Statistical significance and confidence
Percentages and risk: relative and absolute change
Statistical theory for 20xx
What should we be teaching?
Design of experiments and surveys
Summary statistics: sufficiency etc.
Inference
Interpretation

If a statistic was the answer, what was the question?
Common pitfalls
How sure are we?
Percentages and risk
Models and likelihood
Modelling is difficult and important
We can get a lot from the likelihood function
Not only point estimators
Not only (not at all!!) most powerful tests
Inferential quantities (pivots)
Inferential distributions (asymptotics)
A natural starting point, even for very complex models
Likelihood is everywhere! 2
Outline
1. Higher order asymptotics: likelihood as pivotal
2. Bayesian and non-Bayesian inference
3. Partial, quasi, composite likelihood
4. Where are we headed?
P-value functions from likelihood
Likelihood as pivotal
Figure: p-value function, with the 0.975 and 0.025 levels marked
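A minimal sketch of a p-value function built from the likelihood root, assuming an exponential sample with rate theta. This model is chosen here only for illustration and is not taken from the slides; the names `likelihood_root`, `p_value_function` and `confidence_limit` are hypothetical. The values of theta at which the curve crosses 0.975 and 0.025 give an approximate 95% confidence interval.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_lik(theta, n, s):
    """Log-likelihood for n exponential observations with total s and rate theta."""
    return n * math.log(theta) - theta * s


def likelihood_root(theta, n, s):
    """Signed square root of the log-likelihood ratio statistic."""
    theta_hat = n / s  # maximum likelihood estimate
    dev = 2.0 * (log_lik(theta_hat, n, s) - log_lik(theta, n, s))
    return math.copysign(math.sqrt(max(dev, 0.0)), theta_hat - theta)


def p_value_function(theta, n, s):
    """First-order p-value function: normal approximation to the likelihood root."""
    return normal_cdf(likelihood_root(theta, n, s))


def confidence_limit(level, n, s, lo=1e-8, hi=1e3):
    """Solve p(theta) = level by bisection; p decreases from 1 to 0 in theta."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if p_value_function(mid, n, s) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n, s = 5, 2.0  # illustrative data summary: sample size and sum of observations
lower = confidence_limit(0.975, n, s)
upper = confidence_limit(0.025, n, s)
print(f"approximate 95% interval for theta: ({lower:.3f}, {upper:.3f})")
```

Reading the interval off the whole curve, rather than reporting a single p-value, is the point of the p-value function: any other confidence level can be read from the same plot.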
Can be nearly exact
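Likelihood root r
Maximum likelihood estimate
Score function
All approximately distributed as N(0, 1)
Much better: the modified likelihood root r* = r + (1/r) log(q/r), with q a standardized maximum likelihood or score quantity, can be nearly exact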
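A minimal sketch comparing the first-order pivots with the modified likelihood root r* = r + (1/r) log(q/r), assuming an exponential sample with rate theta, for which the exact answer is available in closed form. The model, the data summary and the function names are assumptions made for illustration; they are not from the slides.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exact_tail(theta, n, s):
    """Exact P(S >= s) when S is the sum of n exponentials with rate theta.

    Uses the Poisson sum form of the gamma tail, valid for integer n.
    """
    m = theta * s
    return math.exp(-m) * sum(m ** k / math.factorial(k) for k in range(n))


def pivots(theta, n, s):
    """Return (r, q, r_star) for the exponential model.

    r_star is undefined at theta == theta_hat (0/0); evaluate nearby instead.
    """
    theta_hat = n / s

    def log_lik(t):
        return n * math.log(t) - t * s

    # likelihood root
    r = math.copysign(
        math.sqrt(2.0 * (log_lik(theta_hat) - log_lik(theta))), theta_hat - theta
    )
    # standardized maximum likelihood estimate; observed information is n / theta_hat**2
    q = (theta_hat - theta) * math.sqrt(n) / theta_hat
    # modified likelihood root
    r_star = r + math.log(q / r) / r
    return r, q, r_star


n, s, theta = 5, 2.0, 1.0  # illustrative values
r, q, r_star = pivots(theta, n, s)
exact = exact_tail(theta, n, s)
print("exact      ", exact)
print("Phi(r)     ", normal_cdf(r))
print("Phi(q)     ", normal_cdf(q))
print("Phi(r_star)", normal_cdf(r_star))
```

With these values the exact probability is 7e^{-2}, about 0.9473. The r* approximation differs from it by less than 0.001, while the normal approximation to r is off by more than 0.01, even with only five observations.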
Can be nearly exact 3
Using higher order approximations
Excellent approximations for ‘easy’ cases: exponential families, non-normal linear regression
More work to construct for ‘moderate’ cases: autoregressive models, fixed and random effects, discrete responses
Fairly delicate for ‘difficult’ cases: complex structural models with several sources of variation
Best results for scalar parameter of interest, but we may need inference for vector parameters
Where does this come from?
4 Amari, 1982, Biometrika; Efron, 1975, Annals
Where does this come from? 5,6,7
Differential geometry of statistical models
Theory of exponential families
Edgeworth and saddlepoint approximations
Key idea: a smooth parametric model can be approximated by a tangent exponential family model
Requires differentiating log-likelihood function on the sample space
Permits extensions to more complex models
Where does this come from? 8
Generalizations
To discrete data
Where differentiating the log-likelihood on the sample space is more difficult
Solution: use expected value of score statistic instead
Relative error … instead of …
Still better than the normal approximation
Generalizations 9
Generalizations 10
To vector parameters of interest
But our solutions require a single parameter
Solution: use length of the vector, conditioned on the direction
Generalizations 11
Extending the role of the exponential family
By generalizing differentiation on the sample space
Idea: differentiate the expected log-likelihood instead of the log-likelihood
Leads to a new version of approximating exponential family
Can be used with pseudo-likelihoods
What can we learn? 12
Bayesian/nonBayesian
Higher order approximation requires differentiating the log-likelihood function on the sample space
Bayesian inference will be different
Asymptotic expansion highlights the discrepancy
Bayesian posteriors are in general not calibrated
Cannot always be corrected by choice of the prior
We can study this by comparing Bayesian and nonBayesian approximations
Example: inference for ED50 13
Logistic regression with a single covariate
On the logistic scale
Use flat priors for the regression parameters
Parameter of interest is the ED50
Empirical coverage of Bayesian posterior intervals: 0.90, 0.88, 0.89, 0.90
Empirical coverage of intervals using the higher order approximation: 0.95, 0.95, 0.95, 0.95
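For reference, one common way to write a logistic regression with a single covariate and its ED50, the covariate value at which the response probability is one half. The symbols \beta_0, \beta_1, x and \psi are assumptions introduced here; they are not taken from the slide.

```latex
\operatorname{logit} \Pr(Y = 1 \mid x) = \beta_0 + \beta_1 x,
\qquad
\psi = \mathrm{ED}_{50} = -\frac{\beta_0}{\beta_1}
```

Because \psi is a ratio of the regression coefficients, a prior that is flat in (\beta_0, \beta_1) is not flat in \psi, which is one way to see why posterior intervals for it need not be calibrated.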
Flat priors are not a good idea! 14
Bayesian p-value – Frequentist p-value
More complex models
Partial, quasi, composite likelihood
Likelihood inference has desirable properties
Sufficiency, asymptotic efficiency
Good approximations to needed distributions
Derived naturally from parametric models
Can be difficult to construct, especially in complex models
Many natural extensions: partial likelihood for censored data, quasi-likelihood for generalized estimating equations, composite likelihood for dependent data
Complex models 15
Example: longitudinal study of migraine sufferers
Latent variable
Observed variable, e.g. no headache, mild, moderate, intense …
Covariates: age, education, painkillers, weather, …
Random effects between and within subjects
Serial correlation
Likelihood for longitudinal discrete data
Likelihood function
Hard to compute
Makes strong assumptions
Proposal: use bivariate marginal densities instead of full multivariate normal densities
Giving a mis-specified model
Composite likelihood
Composite likelihood function
More generally
Sets index marginal or conditional (or …) distributions
Inference based on theory of estimating equations
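A minimal sketch of a pairwise (composite) log-likelihood, assuming q-dimensional observations with standard normal margins and a common correlation rho. This toy model and the names `simulate`, `pair_sums`, `pairwise_loglik` and `fit_rho` are assumptions made for illustration, not the talk's example. Each term is a bivariate normal log-density for one pair of components, so the full q-variate density is never evaluated.

```python
import math
import random
from itertools import combinations


def simulate(n, q, rho, seed=0):
    """Draw n equicorrelated standard normal q-vectors via a shared factor (rho >= 0)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        data.append(
            [
                math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                for _ in range(q)
            ]
        )
    return data


def pair_sums(data):
    """Accumulate the sums the pairwise log-likelihood depends on, over all pairs."""
    n_pairs, sum_sq, sum_cross = 0, 0.0, 0.0
    for y in data:
        for a, b in combinations(y, 2):
            n_pairs += 1
            sum_sq += a * a + b * b
            sum_cross += a * b
    return n_pairs, sum_sq, sum_cross


def pairwise_loglik(rho, sums):
    """Sum of bivariate standard normal log-densities with correlation rho."""
    n_pairs, sum_sq, sum_cross = sums
    one_minus = 1.0 - rho * rho
    return (
        -n_pairs * (math.log(2.0 * math.pi) + 0.5 * math.log(one_minus))
        - (sum_sq - 2.0 * rho * sum_cross) / (2.0 * one_minus)
    )


def fit_rho(data, grid_size=999):
    """Maximize the pairwise log-likelihood over a grid on (-1, 1)."""
    sums = pair_sums(data)
    grid = [-0.999 + 1.998 * k / (grid_size - 1) for k in range(grid_size)]
    return max(grid, key=lambda rho: pairwise_loglik(rho, sums))


data = simulate(n=400, q=4, rho=0.5)
rho_hat = fit_rho(data)
print("pairwise likelihood estimate of rho:", rho_hat)
```

A grid search is used only to keep the sketch dependency-free; any one-dimensional optimizer would do. Because the product of pairwise densities is not the true joint density, standard errors should come from the theory of estimating equations rather than from the curvature of this function alone.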
A simple example 16
Pairwise likelihood estimator of … fully efficient
If …, loss of efficiency depends on dimension
Small for dimension less than, say, 10
Falls apart if … for fixed sample size
Relevant for time series, genetics applications
Composite likelihood estimator
Godambe information
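The standard definition, written with notation assumed here rather than taken from the slide: c\ell denotes the composite log-likelihood, H the sensitivity matrix and J the variability matrix.

```latex
H(\theta) = E\{ -\nabla^2_\theta \, c\ell(\theta; Y) \}, \qquad
J(\theta) = \operatorname{var}\{ \nabla_\theta \, c\ell(\theta; Y) \}, \qquad
G(\theta) = H(\theta) \, J(\theta)^{-1} H(\theta)
```

Under regularity conditions the composite likelihood estimator is asymptotically normal with variance G(\theta)^{-1}. For a full likelihood H = J, so G reduces to the Fisher information.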
Recent Applications 17
Longitudinal data, binary and continuous: random effects models
Survival analysis: frailty models, copulas
Multi-type responses: discrete and continuous; markers and event times
Finance: time-varying covariance models
Genetics/bioinformatics: CCL for von Mises distribution: protein folding; gene mapping; linkage disequilibrium
Spatial data: geostatistics, spatial point processes
… and more
Image analysis
Rasch model
Bradley-Terry model
State space models
Population dynamics
…
What can we learn?
What do we need to know?
Why are composite likelihood estimators efficient?
How much information should we use?
Are the parameters guaranteed to be identifiable?
Are we sure the components are consistent with a ‘true’ model?
Can we make progress if not?
How do joint densities get constructed?
What properties do these constructions have?
Is composite likelihood robust?
Why is this important?
Composite likelihood ideas generated from applications
Likelihood methods seem too complicated
A range of application areas all use the same/similar ideas
Abstraction provided by theory allows us to step back from the particular application
Get some understanding about when the methods might not work
As well as when they are expected to work well
The role of theory
Where are we headed?
Abstracts the main ideas
Simplifies the details
Isolates particular features
In the best scenario, gives new insight into what underlies our intuition
Example: curvature and Bayesian inference
Example: composite likelihood
Example: false discovery rates
False discovery rates 18
Problem of multiple comparisons
Simultaneous statistical inference – R.G. Miller, 1966
Bonferroni correction too strong
Benjamini and Hochberg, 1995
Introduce False Discovery Rate
An improvement (huge!) on “Type I and Type II error”
Then comes data, in this case from astrophysics
Genovese & Wasserman collaborating with Miller and Nichol
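A minimal sketch of the Benjamini and Hochberg (1995) step-up procedure next to the Bonferroni correction, assuming independent p-values. The function names and the p-values are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m (controls the familywise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Step-up procedure controlling the false discovery rate at level q.

    Find the largest rank k with p_(k) <= q * k / m and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]


p_values = [0.001, 0.008, 0.012, 0.03, 0.2, 0.6]  # illustrative values only
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print("Bonferroni rejections:", sum(bonf))
print("Benjamini-Hochberg rejections:", sum(bh))
```

On these six p-values Bonferroni rejects two hypotheses and the step-up procedure rejects four, which illustrates the sense in which the Bonferroni correction is too strong when many comparisons are made.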
False discovery rates 19
Speculation 20
Composite likelihood as a smoother
Calibration of posterior inference
Extension of higher order asymptotics to composite likelihood
Exponential families and empirical likelihood
Semi-parametric and non-parametric models connected to higher order asymptotics
Effective dimension reduction for inference
Ensemble methods in machine learning
Speculation 21
“in statistics the problems always evolve relative to the development of new data structures and new computational tools” … NSF report
“Statistics is driven by data” … Don McLeish
“Our discipline needs collaborations” … Hugh Chipman
How do we create opportunities?
How do we establish an independent identity?
In the face of bureaucratic pressures to merge?
Keep emphasizing what we do best!!
Speculation
Engle
Variation, modelling, data, theory, data, theory
Tibshirani
Cross-validation; forensic statistics
Netflix Grand Prize
Recommender systems: machine learning, psychology, statistics!
Tufte
“Visual Display of Quantitative Information” -- 1983
http://recovery.gov
787,000,000,000 $
Thank you!!
End Notes
1. “Making Sense of Statistics”. Accessed on May 5, 2010. http://www.senseaboutscience.org.uk/
2. Midlife Crisis: National Post, January 30, 2008.
3. Alessandra Brazzale, Anthony Davison and Reid (2007). Applied Asymptotics. Cambridge University Press.
4. Amari (1982). Biometrika.
5. Fraser, Reid, Jianrong Wu (1999). Biometrika.
6. Reid (2003). Annals of Statistics.
7. Fraser (1990). J. Multivariate Anal.
8. Figure drawn by Alessandra Brazzale. From Reid (2003).
9. Davison, Fraser, Reid (2006). JRSS B.
10. Davison, Fraser, Reid, Nicola Sartori (2010). In progress.
11. Reid and Fraser (2010). Biometrika.
12. Fraser, Reid, Elisabetta Marras, Grace Yun-Yi (2010). JRSS B.
13. Reid and Ye Sun (2009). Communications in Statistics.
14. J. Heinrich (2003). Phystat Proceedings.
15. C. Varin, C. Czado (2010). Biostatistics.
16. D. Cox, Reid (2004). Biometrika.
17. CL references in C. Varin, D. Firth, Reid (2010). Submitted for publication.
18. Account of FDR and astronomy taken from Lindsay et al (2004). NSF Report on the Future of Statistics.
19. Miller et al. (2001). Science.
20. Photo: http://epiac1216.wordpress.com/2008/09/23/origins-of-the-phrase-pie-in-the-sky/
21. Photo: http://www.bankofcanada.ca/en/banknotes/legislation/images/023361-lg.jpg