On the Probability of Observing Misleading Statistical Evidence: Comment

Author(s): Michael Evans
Source: Journal of the American Statistical Association, Vol. 95, No. 451 (Sep., 2000), pp. 768-769
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2669457


Comment

Michael EVANS

This is an interesting article that addresses fundamental issues in statistics. The author is to be commended for providing a stimulating and thought-provoking set of results and ideas concerning the approach to inference via the law of likelihood. My comments deal with the article's specific contents, which I do not find controversial, and also with the more controversial topic of what constitutes a suitable definition of statistical evidence and the inferential methods that follow from it.

Resolving issues concerning fundamental concepts in statistics is one of the most important problems that a statistical researcher can consider. Alternatively, one could simply accept a particular paradigm for statistical inference and then get on with the ultimate goal of any theory of statistical inference; namely, solving real-world problems for practitioners. Although practically this is necessary, it is disturbing that there is often profound disagreement among statisticians about what that paradigm should be. If the different paradigms all led to the same end, this would mitigate the effects of the disagreements, but in fact this is not the case, and there are many examples where such conflicts occur. For instance, commonly advocated Bayesian and classical point null hypothesis testing methods can be in profound disagreement and can in fact contradict one another, as exemplified by what is sometimes called Lindley's paradox. As a good example of how such conflicts can affect the perception of statistics, consider the claim of Matthews (1998) that some standard statistical methodology, namely p values, has been profoundly misleading in terms of its practical consequences. It is important that such issues be resolved. For me, this article is an admirable example of research that concerns itself with these kinds of issues.
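The conflict behind Lindley's paradox can be sketched numerically. The setup below is an illustrative assumption, not taken from the article: a point null for a normal mean against a normal prior alternative, with the sample mean placed exactly at the classical 5% rejection boundary.

```python
import math

# Assumed illustrative setup: test H0: mu = 0 against H1: mu ~ N(0, 1),
# observing xbar with xbar | mu ~ N(mu, 1/n).
n = 10_000
z = 1.96                          # xbar sits exactly at the two-sided 5% boundary
xbar = z / math.sqrt(n)

# Classical two-sided p-value: reject H0 at the 5% level.
p_value = math.erfc(z / math.sqrt(2.0))

# Bayes factor for H0 over H1; the marginal of xbar under H1 is N(0, 1 + 1/n).
bf01 = math.sqrt(n + 1) * math.exp(-0.5 * z * z * n / (n + 1))
# bf01 is about 15 here: the same data that "reject" H0 support it strongly.
```

With large n, the p-value stays fixed at .05 while the Bayes factor in favor of the null grows without bound, which is exactly the kind of contradiction between paradigms described above.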

Perhaps at the heart of the matter is the definition of what is meant by statistical evidence. Birnbaum (1962) did not attempt to define this concept precisely. Instead, he showed that if one accepts the justifications for applications of the sufficiency principle and the conditionality principle, then one is led inexorably to the likelihood function as the mathematical characterization of statistical evidence. Many would accept that theorem as support for the law of likelihood as stated in this article. Doubts have been raised about the validity of Birnbaum's argument, however. For example, Evans, Fraser, and Monette (1986) showed that in the proof of Birnbaum's theorem, the information discarded as irrelevant for inference by sufficiency is precisely the information identified as relevant for conditioning by the conditionality principle. So these basic principles seem to be in conflict, which certainly undermines their roles as axioms. The same article showed that conditionality alone is equivalent to the likelihood principle, and this proof relies on a flaw in conditionality; namely, the lack of a unique maximal ancillary. Overall, the support for likelihood as the characterization of statistical evidence through Birnbaum's theorem does not seem very strong.

Michael Evans is Professor, Department of Statistics, University of Toronto, Toronto, Ontario M5S 3G3, Canada (E-mail: mevans@utstat.utoronto.ca).

Still, one could argue that Birnbaum's justification is not needed, as we could simply accept the law of likelihood as stated in this article, because it certainly has some appeal. Of interest then are the consequences of such a characterization of statistical evidence, and that is the subject of Royall's article. The results are interesting and to a great extent supportive of likelihood fulfilling its role as the measure of statistical evidence. In part the article is concerned with the uncertainties associated with a particular report of a likelihood ratio; namely, to what extent can we be misled? The article provides a satisfying result in this regard when we choose a particular value k as the cutoff point for saying that a likelihood ratio larger than this value is strong evidence. Several things concern me, however, with this and with likelihood inference as exemplified by the law of likelihood.


The first concern is the ambiguity surrounding the choice of a particular k as the cutoff for saying that strong evidence does or does not exist for H1 over H2. The article suggests k = 32, but that seems quite arbitrary to me. Why not k = 100? Looking at likelihood ratios leaves us with the problem of calibrating these values in some natural way, and this seems problematic. For example, the well-known Treasure Hunt in Flatland example due to Stone (1976) presents a situation in which there are always equal nonzero likelihoods on four points but where one of these points forms a .75 confidence region for the true value. Evans (1989) modified this to a set of examples where only two points have nonzero likelihood and the maximum likelihood estimator (MLE) has a relative likelihood that can be made as large as one likes, but as it is made larger the other point, as a confidence region, has confidence increasing to 1. This certainly seems to undermine what likelihood ratios should be saying.
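The role of the cutoff k can be illustrated with a small simulation of the probability of misleading evidence, that is, of observing a likelihood ratio favoring a false H2 by at least a factor k when H1 is true. The normal model and the values of n and k below are assumptions chosen for illustration only.

```python
import numpy as np

# Illustrative assumed model: H1: X_i ~ N(0, 1) versus H2: X_i ~ N(1, 1),
# with n i.i.d. observations generated under H1.
rng = np.random.default_rng(0)

def prob_misleading(n=10, k=32, reps=100_000):
    x = rng.normal(0.0, 1.0, size=(reps, n))   # data generated under H1
    llr = x.sum(axis=1) - n / 2.0              # log f2(x) - log f1(x) = sum(x) - n/2
    return np.mean(llr >= np.log(k))

# Whatever the model or sample size, this probability cannot exceed 1/k.
p = prob_misleading(n=10, k=32)
assert p <= 1 / 32
```

Raising k to 100 shrinks the simulated probability further, but nothing in the simulation itself singles out 32, or 100, as the natural cutoff, which is precisely the calibration problem raised above.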

Perhaps the response to the foregoing concern is simply that likelihood ratios can indeed be misleading, as the article acknowledges. Given this, however, it seems to me that rather than simply reporting the ratio, one should say, as part of the inference, something about the uncertainty in the report. For example, one might also quote the value

P_{H1}[ f2(X)/f1(X) >= f2(x)/f1(x) ],   (1)

namely, the probability under H1 of observing a likelihood ratio at least as large as that observed. Some quantification concerning the uncertainty inherent in what the likelihood ratio is saying seems to be a necessary part of any acceptable theory of inference. In other words, such a quantification is part of the summary of statistical evidence. In quoting such a quantity, however, one violates the law of likelihood, because different models can have the same likelihood but provide different quantifications of uncertainty. Also, just reporting an upper bound on (1) does not seem adequate, because there can be relatively large discrepancies, as pointed out in the article. For me, this lack of a suitable quantification of uncertainty in likelihood inferences that strictly conform to the law of likelihood is a major drawback. Reporting the entire likelihood for this purpose suffers from the first problem; namely, calibrating its values. This article is clearly concerned with this issue but stops short of recommending what one should report beyond the ratio in an application.
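The kind of supplementary report suggested here, the probability under H1 of a likelihood ratio at least as large as the one observed, has a closed form in simple cases. The model and data below are illustrative assumptions, not taken from the article.

```python
import math

# Assumed illustrative model: H1: X_i ~ N(0, 1) versus H2: X_i ~ N(1, 1).
# Here log(f2/f1) = sum(x_i) - n/2, which is N(-n/2, n) under H1.
def lr_tail_prob_under_h1(x_obs):
    n = len(x_obs)
    llr_obs = sum(x_obs) - n / 2.0
    z = (llr_obs + n / 2.0) / math.sqrt(n)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # P(N(-n/2, n) >= llr_obs)

x_obs = [0.8, 1.3, -0.2, 0.9, 0.5]              # hypothetical sample
p = lr_tail_prob_under_h1(x_obs)                # chance, under H1, of evidence
                                                # for H2 at least this strong
```

Note that this quantity depends on the sampling model and not just on the observed likelihood function, which is exactly why quoting it violates the law of likelihood.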

Perhaps the greatest deficiency in likelihood methods, whether or not they conform to the likelihood principle, is the ambiguity concerning appropriate inference methods when there are nuisance parameters. This article also makes this point quite clearly. Further, the article offers a compelling reason for using the profile likelihood over at least one of its competitors. Although this is support for the profile likelihood, this methodology would have greater appeal if there were a simple argument that deduced it directly from the law of likelihood as a characterization of statistical evidence. The fact that the profile likelihood may not be the likelihood associated with the data on which it is based (see, e.g., Fraser 1979, p. 102) strikes me as a substantial reason to doubt the existence of such an argument. Overall, the lack of an unambiguous likelihood-based theory for inference in the context of nuisance parameters deepens my skepticism concerning the likelihood as the characterization of statistical evidence.
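As a minimal sketch of what profiling out a nuisance parameter looks like (the model and data are assumptions for illustration): for X_i ~ N(mu, sigma^2) with mu of interest and sigma^2 a nuisance parameter, the profile log-likelihood replaces sigma^2 by its conditional MLE at each mu.

```python
import math

# Profile log-likelihood for the mean of a normal sample, with the variance
# profiled out: at each mu, sigma^2 is replaced by mean((x - mu)^2).
def profile_loglik(mu, x):
    n = len(x)
    s2_hat = sum((xi - mu) ** 2 for xi in x) / n   # sigma^2 maximizing L at this mu
    # log L(mu, s2_hat) up to an additive constant
    return -0.5 * n * math.log(s2_hat) - 0.5 * n

x = [4.1, 5.0, 3.8, 4.6, 5.2]          # hypothetical data
xbar = sum(x) / len(x)
# The profile likelihood is maximized at the sample mean, as expected:
assert all(profile_loglik(xbar, x) >= profile_loglik(m, x)
           for m in [3.5, 4.0, 4.5, 5.0])
```

The point made above is that this function, however useful, need not be the likelihood of any model for the observed data, so it is not obviously licensed by the law of likelihood itself.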

Bayesian methodology avoids these marginalization problems, at least when we restrict ourselves to proper priors. In Bayesian inference the likelihood principle is sometimes taken as an axiom. This seems unnecessary, however, as the inferences proceed via an application of conditional probability, which for me is a more fundamental concept than likelihood. Further, model checking seems like a necessary component of any good application of statistical methodology, whatever the particular paradigm chosen. In Bayesian contexts, appropriate model-checking methods violate the likelihood principle, as they depend on the model for the data. Accordingly, I see neither the need for nor the appropriateness of considering Bayesian inference as being based in some way on likelihood concepts. Also, it is not clear to me how model checking would proceed using the law of likelihood except perhaps through expansion of the model, and this is somewhat unsatisfactory, as it simply replaces checking one model with checking another that also needs to be checked. The appropriateness of model checking in applications of statistical inference seems obvious, and so it seems necessary that any sensible theory produce methods for doing it and, moreover, accommodate it without treating it as a violation of a basic axiom. It is perhaps also worth noting that model checking can be presented as one response to a common criticism of Bayesian methods; namely, the arbitrariness of the prior, for any particular prior and model combination can be forced to undergo some basic checks before one proceeds to full inference. The issues are more complicated than this, but my main point is that any appropriate measure of statistical evidence must accommodate model checking as a natural and necessary part of a statistician's activity, and the law of likelihood does not seem to do this.
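One concrete form of the model checking discussed here is a prior predictive check, which necessarily uses the sampling model for the data and so violates the likelihood principle. The prior, model, checking statistic, and data below are illustrative assumptions.

```python
import numpy as np

# Assumed prior-model combination: mu ~ N(0, 2^2) and X_i | mu ~ N(mu, 1).
rng = np.random.default_rng(1)

def prior_predictive_tail(x_obs, reps=50_000):
    n = len(x_obs)
    mu = rng.normal(0.0, 2.0, size=reps)                # draws from the prior
    sims = rng.normal(mu[:, None], 1.0, size=(reps, n)) # prior predictive data
    t_sim = sims.mean(axis=1)                           # checking statistic
    return np.mean(t_sim >= np.mean(x_obs))             # tail probability

x_obs = [0.4, 1.1, -0.3, 0.7]                            # hypothetical data
p = prior_predictive_tail(x_obs)
# A value of p near 0 or 1 would cast doubt on this prior-model combination.
```

The check depends on the full sampling model and the prior, not merely on the observed likelihood, which is the sense in which such basic checks fall outside what the law of likelihood can accommodate.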

Royall's article is not necessarily an argument for the likelihood as the appropriate characterization of statistical evidence, although ultimately that is the question of real interest. It could be viewed as simply considering some consequences of such a definition. The results are interesting and useful and add substantially to the discussion of whether or not it is the right approach.

ADDITIONAL REFERENCES

Evans, M. (1989), "An Example Concerning the Likelihood Function," Statistics and Probability Letters, 7, 417-418.

Evans, M., Fraser, D. A. S., and Monette, G. (1986), "On Principles and Arguments to Likelihood" (with discussion), Canadian Journal of Statistics, 14, 181-199.

Fraser, D. A. S. (1979), Inference and Linear Models, New York: McGraw-Hill.

Matthews, R. (1998), "The Great Health Hoax," The Sunday Telegraph, 13 September, 1998.

Stone, M. (1976), "Strong Inconsistency From Uniform Priors" (with discussion), Journal of the American Statistical Association, 71, 114-125.
