Influence Functions for Testing

Date post: 20-Jan-2017
Category:
Upload: diane-lambert
View: 214 times
Download: 2 times
Share this document with a friend
10
Author(s): Diane Lambert
Source: Journal of the American Statistical Association, Vol. 76, No. 375 (Sep., 1981), pp. 649-657
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2287525
Accessed: 16/06/2014 08:13


Influence Functions for Testing

DIANE LAMBERT*

Influence functions for testing are defined by applying Hampel's influence function to transformed P values. These influence functions describe the effect of an observation and an underlying distribution on the behavior of a test. Because these influence functions are not based on the test statistic alone, they are applicable to both unconditional and conditional (e.g., randomization) tests. The influence function of a transformed P value at a null hypothesis distribution is related to an influence function based on limiting power. The influence functions of several one- and two-sample tests for location are discussed to illustrate the utility of this approach.

KEY WORDS: Influence function; P value; Robust test; Bahadur exact slope; Approximate slope.

1. INTRODUCTION

When studying the robustness of a statistical test, the classical statistician may first consider the Neyman-Pearson approach to testing (see, for example, Huber 1965, 1968). From that perspective, a test results in a simple dichotomy: accept or reject. If a test is robust, then small or infrequent variations in the data should not be able to reverse the test decision. Ylvisaker (1977) defined the resistance of a test to be the smallest proportion p such that a sample of size n(1 − p) + 1 can always determine the decision of the test regardless of the values of the remaining np − 1 observations. The resistance of a test is analogous to the breakdown point of an estimator.

In the context of estimation, the breakdown point of an estimator is supplemented with an influence function or sensitivity curve that describes the effect of any observation x on the estimator. In the Neyman-Pearson setting, however, the effect can only be described in terms of reversal or nonreversal of the decision, and any such effects are likely to depend highly on the error probabilities chosen. A reasonable measure of the sensitivity of a test to an observation x requires a refinement of the binary Neyman-Pearson scale.

It is natural to augment the accept-reject decision scale with a measure of the strength of the evidence in the data in favor of the decision taken. Or, equivalently, the binary

* Diane Lambert is Assistant Professor, Department of Statistics, Carnegie-Mellon University, Pittsburgh, PA 15213. This paper is based on a portion of her doctoral dissertation written under the direction of W.J. Hall at the University of Rochester. The research for it was supported by the U.S. Army Research Office through the University of Rochester and the Office of Naval Research through Carnegie-Mellon University. The author thanks Hall for his generous guidance and J.B. Kadane and the referees for their help with the exposition and the revision.

scale may be replaced by a measure of the strength of the evidence against the decision to reject H₀. That is, the effect of an observation x on a test may be described by its effect on the observed significance level or P value. In this formulation the size and direction of the effect may be calculated without reference to the error probabilities chosen. Such independence is convenient and is related to the fact that a P value may be compared to any Type I error level.

Section 2 discusses the behavior of the P value under null and alternative hypotheses. Hampel's influence function is discussed in Section 3 and applied to P values under alternative distributions in Section 4. The relationship between the influence function of the P value and the influence function of the test statistic is also described in Section 4. Several examples, with discussion, are presented in Sections 5 and 6. In Section 7 the influence function for P values is extended to the distributions of the null hypothesis and related to an influence function based on asymptotic power.

2. PROPERTIES OF P VALUES

The P value is the maximum probability under the null hypothesis of observing a value of the test statistic at least as extreme as the one obtained. Let {P_θ, θ ∈ Θ} formally represent a set of distributions of interest and consider the null hypothesis H: θ ∈ Θ₀, Θ₀ ⊂ Θ. Suppose that for each sample size n there is a real-valued statistic T_n for testing H. (A P value can be defined without a potential sequence of test statistics envisaged, but this framework facilitates the work below.) Assume T_n has been chosen so that large values indicate significant departures from H. Let the left-continuous cdf of T_n under P_θ be H_n(·; θ). The P value or observed significance level P_n is defined as sup_{θ∈Θ₀} (1 − H_n(T_n; θ)). Hereafter, if X ∈ [0, 1], let X̄ = 1 − X; for example, P_n = sup H̄_n(T_n; θ).

The statistic P_n defined above is sometimes called an exact P value to distinguish it from an approximate P value for which H_n is replaced by an approximation H_n^(a). An approximate P value must be used if H_n is unknown or untabulated. If the distribution of T_n is conditioned on a statistic U_n, then a conditional P value is defined by replacing H_n(·; θ) with H_n(· | U_n; θ).
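As a concrete numerical sketch (not part of the original article), the one-sample sign test admits both forms side by side: an exact P value from the binomial null cdf and an approximate P value from its normal limit. The data below are invented for illustration.

```python
import math

# Sketch (invented data): exact vs. normal-approximate P value for the
# one-sample sign test of H: median 0 against a positive shift.
# T_n = number of observations greater than zero; large T_n discredits H.

def exact_sign_p(data):
    """Exact P value: binomial(n, 1/2) upper tail at the observed T_n."""
    n = len(data)
    t = sum(1 for x in data if x > 0)
    return sum(math.comb(n, k) for k in range(t, n + 1)) / 2 ** n

def approx_sign_p(data):
    """Approximate P value: the binomial null cdf replaced by its normal limit."""
    n = len(data)
    t = sum(1 for x in data if x > 0)
    z = (t - n / 2) / math.sqrt(n / 4)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper normal tail

sample = [0.9, 1.4, -0.3, 2.1, 0.8, 1.7, -0.1, 1.2, 0.5, 1.1]
print(exact_sign_p(sample), approx_sign_p(sample))
```

Without a continuity correction the approximate P value is smaller than the exact one here, which is the kind of discrepancy the exact/approximate distinction tracks.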

The null distribution of an exact P value is given by Bahadur and Raghavachari (1970) and Kempthorne and

© Journal of the American Statistical Association, September 1981, Volume 76, Number 375

Theory and Methods Section



Folks (1971). Simply stated, for any test statistic and any sample size, an exact unconditional or conditional P value is stochastically larger than or equal to a uniform [0, 1] random variable under any P_θ with θ ∈ Θ₀.

In typical cases small values of P_n reflect departures from H, so P_n is stochastically smaller than a uniform [0, 1] random variable when θ ∉ Θ₀. How much smaller depends on the nonnull distribution of P_n, which in turn depends on the alternative and the sample size. Under P_θ, the cdf of P_n evaluated at an achievable level (or point of support of P_n) x, P_θ(P_n ≤ x), equals the power at P_θ of the test based on T_n with Type I error x. Like the power function, the nonnull finite sample cdf of P_n is often intractable.

Bahadur (1960) investigated the asymptotic nonnull distribution of P_n, but only for the sign, t, and sample mean tests under normal alternatives. His work has been extended to exact and approximate, unconditional and conditional P values in the general case by Lambert and Hall (1981). In particular, if T_n is asymptotically normal and if its exact null cdf is approximated by the normal cdf, then the approximate P value P_n^(a) is asymptotically lognormal. In general, any P value is asymptotically lognormal under an alternative P_θ if T_n is asymptotically normal and if H_n is the tail of a cdf satisfying weak regularity conditions, such as convergence of third moments for rank and permutation P values (see Lambert and Hall 1981 for details). In all examples in this paper, the P values are asymptotically lognormal; that is, for some positive constants c(θ) and τ(θ), n^(−1/2)(log P_n + nc(θ))/τ(θ) is asymptotically a normal (0, 1) random variable.

The parameters (−nc(θ), nτ²(θ)) of the asymptotic distribution summarize the nonnull behavior of the P value. The parameter c(θ) is the (Bahadur half-) slope (Bahadur 1971), or exponential rate at which the P value approaches zero. That is, P_n, which may be approximate or exact, has slope c(θ) at P_θ if lim(−n⁻¹ log P_n) = c(θ) a.s. P_θ. For example, see Bahadur (1971) or below.

When P_n is an approximate P value, the corresponding slope is sometimes called an approximate slope (Bahadur 1971). Approximate refers to the substitution of H_n^(a) for the exact null cdf H_n in the evaluation of P_n rather than to a relationship between c(θ) and c^(a)(θ). Even if H_n^(a) is the limit of the sequence of H_n's, c^(a)(θ) need not be a good approximation to c(θ) (cf. Gleser 1964). Hence, only exact slopes should be used to evaluate the performance of exact P values. Similarly, only approximate slopes (and not exact slopes) are appropriate whenever approximate P values are used for large samples. In the following sections, the term slope is used without the qualifier exact or approximate; the type of the slope is assumed to agree with the type of the sequence of P values.
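A small simulation can make the slope concrete. The sketch below (illustrative only; the test, sample size, and alternative are assumptions) uses the X̄ test of H: normal (0, 1), whose P value is Φ̄(√n X̄); at a normal (θ, 1) alternative the definition lim(−n⁻¹ log P_n) = c(θ) then gives c(θ) = θ²/2.

```python
import math
import random

# Sketch: Monte Carlo check that -n^{-1} log P_n approaches the slope c(theta)
# for the X-bar test of H: normal(0, 1), whose P value is Phi-bar(sqrt(n) X-bar).
# Under Q = normal(theta, 1) we take c(theta) = theta^2 / 2 (an assumption
# consistent with the definition lim(-n^{-1} log P_n) = c(theta)).

def upper_normal_tail(t):
    return 0.5 * math.erfc(t / math.sqrt(2))

random.seed(1)
theta, n = 1.0, 400
xbar = sum(random.gauss(theta, 1) for _ in range(n)) / n
p_n = upper_normal_tail(math.sqrt(n) * xbar)
print(-math.log(p_n) / n)  # settles near c(theta) = 0.5 as n grows
```

The P value itself is astronomically small (around e^(−200)); only its exponential rate is stable, which is why the influence function below is attached to −n⁻¹ log P_n rather than to P_n.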

Together, the slope c(θ) and the standard deviation τ(θ), which are typically not constant in θ, provide an indication of the asymptotic power of the test. For if P_n is asymptotically lognormal and the level α is approximately zero, the power at P_θ corresponding to α is approximately Φ(√n (n⁻¹ log α + c(θ))/τ(θ)), where Φ is the normal (0, 1) cdf. It will be shown that the parameters c(θ) and τ(θ) are also useful for studying test robustness.

3. HAMPEL'S INFLUENCE FUNCTION FOR ESTIMATORS

Hampel (1974) gives a detailed exposition of influence functions in the context of estimation. Suppose an estimator T_n can be represented as a sample-size-independent functional T of the empirical cdf F_n. The sample mean T_n = X̄ is one example since X̄ = T(F_n) with T(F) = ∫ x dF(x). To measure the effect on T_n of disturbing the model cdf F by an infinitesimal amount of contamination at the point x, Hampel defines the influence function of T by

Ω(x; F) = lim_{ε→0} ε⁻¹(T(ε̄F + εδ_x) − T(F)),  (1)

where δ_x is the cdf that gives mass one to x. For T(F) as above, Ω(x; F) = x − T(F).

Mallows (1975) states that if T_n is based on a random sample X₁, …, X_n from F, then often

T_n = T(F) + n⁻¹ Σ Ω(X_i; F) + o_p(n^(−1/2)).  (2)

He does not specify general conditions that imply (2). But if (2) is valid, T_n is approximately equal to T(F), the parameter it estimates, plus the average of the disturbing influences of the observations.
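The limit (1) can be imitated by finite-ε contamination of an empirical cdf. The following sketch (invented data; ε fixed at a small value rather than taken to zero) recovers Ω(x; F) = x − T(F) for the mean functional.

```python
# Sketch: finite-epsilon version of Hampel's influence function (1) for the
# mean functional T(F) = integral of x dF(x), using an empirical cdf F_n.
# The limit (1) should give Omega(x; F) = x - T(F).

def mean_functional(points, weights):
    return sum(w * p for w, p in zip(weights, points)) / sum(weights)

def empirical_influence(x, sample, eps=1e-6):
    n = len(sample)
    base = mean_functional(sample, [1.0] * n)
    # contaminated cdf: (1 - eps) F_n + eps delta_x
    points = sample + [x]
    weights = [(1.0 - eps) / n] * n + [eps]
    return (mean_functional(points, weights) - base) / eps

data = [0.2, -1.1, 0.7, 1.9, -0.4]
print(empirical_influence(3.0, data))  # x - mean(data) = 3.0 - 0.26 = 2.74
```

Because the mean is linear in F, the finite-ε difference quotient here equals the limit exactly; for nonlinear functionals the quotient only approximates (1) as ε shrinks.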

Estimators T with an influence function Ω with the following properties are desirable: Ω should be bounded (to protect against outliers), continuous in x (to protect against slight errors such as rounding), and continuous in F (to insure stability near F). In this last statement and elsewhere, convergence is understood to be in the sense of the weak topology.

4. AN INFLUENCE FUNCTION FOR TESTING

Since a P value P_n is commonly asymptotically zero under many distributions that may or may not be close to the target alternative P_θ, applying an influence function, which measures absolute rather than relative influences, to P_n is inappropriate. However, an influence function for −n⁻¹ log P_n both describes the extent of the error in the observed significance level relative to its size and accounts for the fact that P_n is typically exponentially small. Because −n⁻¹ log P_n cannot, in general, be represented as a sample-size-independent functional of the empirical cdf, we first define the influence function for its limit, the slope, and then relate it to the finite sample situation.

Definition 1. Let P₁, P₂, … be any sequence of P values (exact or approximate, unconditional or conditional) that has a slope c(Q) under the alternative Q. Let δ_x be the cdf that gives mass one to x, and assume that P_n has a slope c(Q_ε) at Q_ε = ε̄Q + εδ_x for every sufficiently small ε > 0. Then the influence function of P_n at the alternative Q is defined by Ω(x; Q) = lim_{ε→0} ε⁻¹(c(Q_ε) − c(Q)), whenever the limit exists for all x in the sample space.

In effect, the testing problem is converted into one of estimating the slope by a log transformed observed significance level. Examples are given below.
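A finite-difference sketch of Definition 1 (with assumed ingredients: the X̄ test of H: normal (0, 1), whose slope at an alternative with mean m is taken to be m²/2 as above) reproduces the influence function θ(x − θ):

```python
# Sketch: Definition 1 by finite differences for the X-bar test of
# H: normal(0, 1). Assuming the slope is c(Q) = (mean of Q)^2 / 2, the
# influence function should come out as Omega(x; Q) = theta * (x - theta)
# at an alternative Q with mean theta.

def slope_of_mean(mu):
    return mu * mu / 2.0

def influence_of_test(x, theta, eps=1e-7):
    # contaminating Q by eps * delta_x shifts the mean to (1-eps)*theta + eps*x
    mu_eps = (1.0 - eps) * theta + eps * x
    return (slope_of_mean(mu_eps) - slope_of_mean(theta)) / eps

theta = 1.0
print(influence_of_test(2.5, theta))  # approximately theta * (2.5 - theta) = 1.5
```

The contamination acts on the slope only through the functional T(Q) (here the mean), which is exactly the chain-rule structure d′(T(Q))Ω_T(x; Q) discussed next.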

For unconditional P values, the influence functions of the test and of the test statistic differ by a factor depending only on Q. To be specific, suppose that the P value P_n depends on the data only through the statistic T_n = T(F_n), where F_n is the empirical cdf. Then the slope of P_n under the alternative cdf Q equals d(T(Q)) for a function d that does not depend on Q. Suppose that d has a derivative d′ that is continuous at T(Q) and that T has influence function Ω_T. It then follows easily that the influence function of the test is d′(T(Q))Ω_T(x; Q).

Thus for fixed Q the influence function of an unconditional test has the same qualitative behavior (e.g., boundedness and continuity properties) as the influence function of its test statistic. In particular, the influence functions for unconditional exact and approximate P values will differ only by the factor d′(T(Q)). (That they are not identical is reasonable since the tails of different cdf's are involved.)

There is no simple relationship between the influence function of a conditional P value, which does not depend on the data through the test statistic alone, and the influence function of its test statistic. Because the null distribution H_n depends on the data, the slope of a conditional P value typically equals d(T(Q), Q) for some nonnegative function d, and the qualitative behavior of the influence function of a conditional test cannot be inferred from the behavior of the influence function of its test statistic. For example, the influence functions of the permutation test discussed in Section 6.2 are bounded from above, while those of its test statistic are unbounded from above and below.

Although the influence function of a test is defined in terms of its slope, it has applications to finite samples. For if X₁, …, X_n is a random sample from Q and the slope of P_n at Q is d(T(Q), Q), then typically −n⁻¹ log P_n − d(T_n, Q) = o(n^(−1/2)) a.s. (cf. Lambert and Hall 1981). Consequently, the influence function of Definition 1 should describe the behavior of the P value reasonably well. If P_n is also asymptotically lognormal (−n d(T(Q), Q), n τ²(Q)) and E Ω²(X_i; Q) = τ²(Q) (these conditions are often satisfied), then

−n⁻¹ log P_n = d(T(Q), Q) + n⁻¹ Σ Ω(X_i; Q) + o_p(n^(−1/2)).  (3)

That is, the influence function describes the extent to which an observation changes the size of the P value; the P value is approximately equal to its model value exp(−n d(T(Q), Q)) modified by a factor determined by the average of the influences of the data. Since in an accept-reject situation the P value is compared to a specified level of Type I error, the influence function, together with the slope, provides a crude indication of the test

decision under the alternative for a particular sample. The variances τ²(Q) of the influences of the observations provide a second-order measure of test efficiency for tests whose slopes are equal or nearly equal (Lambert and Hall 1981).

Whether a particular influence function is desirable depends on how the P value P_n is to be used. If P_n is regarded as empirical evidence of significance, then erroneous deflation of P_n may be as troublesome as erroneous inflation of P_n under an alternative. From this perspective, influence functions that are bounded and continuous in x and Q are desirable. However, if P_n is reduced to an accept-reject decision and its numerical value is of no concern, then deflation of P_n under the alternative only increases the likelihood of a correct decision. The fact that the smallness of P_n may be exaggerated by erroneous observations is (perhaps) unimportant in this framework. Hence, influence functions that are unbounded above and bounded below under the alternative are desirable. However, it should be verified that the influence is not unbounded above under the null hypothesis also. (See Sec. 7 below.)

5. ONE-SAMPLE EXAMPLES

Throughout this section, X₁, …, X_n denotes a sample of iid observations. Under the null hypothesis H, the X_i's are from a normal (0, σ²) distribution. The alternative distribution Q (not necessarily normal) has mean θ > 0 and variance v². The X̄, Student's t, sign, Wilcoxon, and Huber's robust tests are considered here. Table 1 gives the slopes c(Q) and the influence functions Ω(x; Q). The slopes, and their exact or approximate counterparts, can be found in Bahadur (1971) or obtained by using the methods therein. The influence functions Ω(x; Q) are obtained from the influence function of the test statistic and the derivative of the slope as in Section 4. Figure 1 graphs the influence functions for the case when Q is normal (1, 1). In cases in which the exact null distribution is likely to be approximated by a normal distribution, as with Huber's test, the table and the graph are for the approximate P value. All the P values are asymptotically lognormal, and all the influence functions satisfy (3) with the o(n^(−1/2)) term holding a.s. (Lambert 1977). These influence functions are discussed in detail below.

5.1 X̄ Test

Like the influence function for the estimator X̄, Ω_X̄(x; Q) is unbounded, continuous in x, and discontinuous in Q. Hence, for reasons described above, the P value Φ̄(√n X̄/σ) can be expected to be sensitive to gross errors, insensitive to rounding and grouping, and unstable under small changes in Q.

5.2 Student's t Test

The first component of the influence function, θ(x − θ)/(v² + θ²), reflects the linear effect of an outlier x on the numerator X̄; the second indicates the quadratic effect


Table 1. Slopes and Influence Functions for Tests of H: Normal (0, σ²) at Alternative Q With Mean θ, Variance v²

Test: X̄
  Bahadur slope: θ²/σ²
  Influence function: θ(x − θ)/σ²

Test: Student's t
  Bahadur slope: log(1 + θ²/v²)
  Influence function: (θv/(v² + θ²))(y − (θ/2v)(y² − 1)), y = (x − θ)/v

Test: Huber's robust, censoring at a, b (approximate P value)
  Bahadur slope: (μ₁ − μ₀)²/σ₀²
  Influence function: (μ₁ − μ₀)(x* − μ₁)/σ₀², x* = median(a, x, b),
  μ₀ = ∫ x* dΦ(x/σ), σ₀² = ∫ (x* − μ₀)² dΦ(x/σ), μ₁ = ∫ x* dQ(x)

Test: Sign
  Bahadur slope: 2Q(0) log Q(0) + 2Q̄(0) log Q̄(0) + 2 log 2
  Influence function: log(Q̄(0)/Q(0)) (I(x > 0) − Q̄(0))

Test: Wilcoxon (approximate P value)
  Bahadur slope: 3W²(Q)
  Influence function: 3W(Q)(1 − 2Q(−x) − 2W(Q)),
  W(Q) = ∫ (Q(x) − Q(−x)) dQ(x)

of x on the sample standard deviation S_n. As x increases, the effect on S_n predominates so that the influence function Ω_t(x; Q) → −∞ as x → ±∞. It is as if large values of x indicate that the standard deviation is large rather than that the mean is positive, and hence such data do not provide significant evidence against H. This influence function is bounded above, so an outlier cannot pull the P value arbitrarily close to zero under the normal alternative.
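These shapes can be checked numerically. The sketch below (illustrative; it uses the Table 1 entries as transcribed here, and an arbitrary grid of x values) evaluates the X̄, Student's t, and sign influence functions at the Figure 1 alternative Q = normal (1, 1).

```python
import math

# Sketch: evaluate three of the Table 1 influence functions (as transcribed
# here) at the Figure 1 alternative Q = normal(1, 1), i.e., theta = v = 1.

def Phi(u):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-u / math.sqrt(2))

theta, v = 1.0, 1.0

def infl_xbar(x):
    # X-bar test: linear and unbounded in x
    return theta * (x - theta)

def infl_t(x):
    # Student's t: bounded above, tends to -infinity as |x| grows
    y = (x - theta) / v
    return theta * v / (v**2 + theta**2) * (y - theta * (y * y - 1) / (2 * v))

def infl_sign(x):
    # sign test: a two-valued step function in x
    p_pos = 1 - Phi(-theta)            # Q-bar(0) = P(X > 0) under Q
    return math.log(p_pos / (1 - p_pos)) * ((1.0 if x > 0 else 0.0) - p_pos)

for x in (-2.0, 0.0, 2.0, 6.0):
    print(x, infl_xbar(x), infl_t(x), infl_sign(x))
```

At x = 6 the X̄ influence is large and positive while the t influence has already turned negative, the numerical face of the remark that large outliers inflate S_n faster than X̄.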

5.3 Huber's Robust Test

Huber (1965, 1968) constructed a minimax test for the composite null hypothesis P₀* = δ̄P₀ + δG, G arbitrary

Figure 1. One-Sample Influence Functions: N(0, 1) Versus N(1, 1). [Graph of the influence functions of the X̄, Student's t, Huber's robust, and Wilcoxon tests at the alternative N(1, 1).]


and unspecified, against the composite alternative P_θ* = δ̄P_θ + δG, θ > 0, P_θ(·) = Φ((· − θ)/σ). These hypotheses allow for contamination of any kind by an amount δ or less. (Huber's test is actually minimax for wider null and alternative hypotheses, but this is unimportant here; cf. Huber 1968.)

The test statistic T_n* can be written as T_n* = n⁻¹ Σ X_i*, X_i* = median(a, X_i, b), where a = −kσ + θ/2, b = kσ + θ/2, and k is a decreasing function of the maximum level of allowable contamination δ. Since the exact null distribution is intractable, an approximate P value based on a normal approximation is likely to be used in practice. The corresponding slope is then given by Mills' ratio.
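A sketch of this approximate P value (with illustrative censoring points a = −.5, b = 1.5 taken from a Table 2 row, null moments of the clipped observation obtained by crude numerical integration, and invented data):

```python
import math
import random

# Sketch: an approximate P value for Huber's censored-mean test of
# H: normal(0, 1). The censoring points a, b are illustrative choices;
# mu0 and sigma0 are the null mean and standard deviation of
# x* = median(a, x, b), computed by a simple Riemann sum.

a, b = -0.5, 1.5

def clip(x):
    return min(max(x, a), b)

def phi(u):
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

# null moments of the clipped observation
grid = [i / 1000.0 for i in range(-6000, 6001)]
mu0 = sum(clip(u) * phi(u) * 0.001 for u in grid)
sig0 = math.sqrt(sum((clip(u) - mu0) ** 2 * phi(u) * 0.001 for u in grid))

random.seed(2)
n = 50
data = [random.gauss(1.0, 1.0) for _ in range(n)]      # alternative sample
t_star = sum(clip(x) for x in data) / n
p_approx = 0.5 * math.erfc((math.sqrt(n) * (t_star - mu0) / sig0) / math.sqrt(2))
print(p_approx)  # small, since the data come from the alternative
```

Gross outliers enter T_n* only through their clipped values, so a single wild observation moves the P value by at most a bounded amount, which is the point of the bounded influence function below.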

The influence function Ω_r(x; Q) is continuous in x and Q. The values of its tails (which are flat) and the slope of the middle section both depend on k and thus on δ. As δ increases (and k decreases), the point at which x will be suspected of being an outlier, and its influence therefore bounded, becomes smaller in absolute value, and the weight or influence placed on the central observations increases. For θ = σ = 1, Table 2 shows the effect of δ.

5.4 Sign Test

The influence function for the sign test Ω_s is a limiting form of the influence function for the robust test Ω_r. As the level of contamination δ increases, the censoring points a, b from the robust test statistic approach a common value c. In the limit, after standardization, T_n* becomes the proportion of observations that are larger than c. Simultaneously, the slope of the middle section of Ω_r increases until there is only a jump discontinuity between the tails. For example, if θ = σ = 1, then as δ → .277, the constants a, b → .5. However, Ω_r and Ω_s are centered around different points (.5 and 0, respectively), because the robust test is designed for the alternative θ ≥ 1 rather than for θ > 0.

5.5 Wilcoxon Test

To obtain the influence function for the Wilcoxon test, rewrite the test statistic W_n = (n(n + 1))⁻¹ Σ R(|X_i|) sgn(X_i), R(|X_i|) being the rank of |X_i| in |X₁|, …, |X_n|, as W_n = n/(n + 1) ∫ (F_n(x) − F_n(−x)) dF_n(x), with F_n the empirical cdf. We consider here the approximate P value Φ̄(√(3n) W_n). Since the influence function Ω_w is nearly constant in

Table 2. Effect of Contamination δ on the Robust Test

  δ      Censoring Points    Ω_r(x; P₁)               Bounds on Ω_r(x; P₁)
  .277   .5⁻, .5⁺            1.80 I(x > .5) − 1.24    −1.24, .56
  .045   −.5, 1.5            1.46x* − 1.21            −1.94, .98
  .012   −1.0, 2.0           1.19x* − 1.10            −2.29, 1.28
  .0028  −1.5, 2.5           1.08x* − 1.04            −2.66, 1.64

its tails, all extreme observations are treated almost equally, as they would be by Huber's test. But unlike the influence function for Huber's test, Ω_w is strictly increasing in x. In effect, the Wilcoxon P value preserves the ordering of all "outliers," but the Huber P value does not.
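The boundedness and monotonicity claims can be checked numerically from the Table 1 entry as transcribed here (the alternative normal (1, 1) and the integration grid are illustrative choices):

```python
import math

# Sketch: the Wilcoxon influence function 3W(Q)(1 - 2Q(-x) - 2W(Q)) from the
# Table 1 transcription, evaluated at Q = normal(1, 1): bounded, nearly
# constant in the tails, and strictly increasing in x.

def Phi(u):
    return 0.5 * math.erfc(-u / math.sqrt(2))

def Q(x):                      # cdf of normal(1, 1)
    return Phi(x - 1.0)

def phi(u):
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

# W(Q) = integral of (Q(x) - Q(-x)) dQ(x), by a Riemann sum
W = sum((Q(u) - Q(-u)) * phi(u - 1.0) * 0.001
        for u in (i / 1000.0 for i in range(-7000, 9001)))

def infl_wilcoxon(x):
    return 3.0 * W * (1.0 - 2.0 * Q(-x) - 2.0 * W)

vals = [infl_wilcoxon(x) for x in (-8.0, -1.0, 0.0, 1.0, 8.0)]
print(W, vals)
```

For Q = normal (1, 1), W(Q) = 1/2 − Φ(−√2) ≈ .42, and the influence stays between two finite limits while never flattening out completely, unlike the exactly flat tails of Ω_r.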

6. MULTISAMPLE INFLUENCE FUNCTIONS

The influence function may be generalized to testing with k independent samples as follows. Let each observation in the ith sample have cdf F_i under the null hypothesis and cdf G_i under the alternative hypothesis, i = 1, …, k. Define the function F with values on [0, 1]^k by F(x₁, …, x_k) = (F₁(x₁), …, F_k(x_k)) and define a function G = (G₁, …, G_k) similarly.

Definition 2. Let P_N be an exact or approximate, unconditional or conditional k-sample P value with slope c(G). For 1 ≤ i ≤ k, let G_(i)ε be G with ε̄G_i + εδ_{x_i} replacing G_i. Assume c(G_(i)ε) is defined for all ε sufficiently small. Then the ith sample influence function of P_N is Ω_i(x_i; G) = lim_{ε→0} (c(G_(i)ε) − c(G))/ε.

The ith sample influence function Ω_i describes the effect of contaminating the model alternative G_i for the ith sample by an infinitesimal amount of mass at x_i.

Although in practice every marginal G_i could be contaminated, the following expansion, which generalizes (3), shows that it is sufficient to consider each G_i separately to evaluate the local stability of the P value. Given a random sample X_i1, …, X_i,Nλᵢ from G_i, 0 < λ_i < 1, i = 1, …, k, Σ λ_i = 1, typically

−N⁻¹ log P_N = c(G) + Σ_{i=1}^k λ_i (Nλ_i)⁻¹ Σ_{j=1}^{Nλᵢ} Ω_i(X_ij; G) + o_p(N^(−1/2)).  (5)

As with (3), general conditions for (5) are difficult to establish unless P_N is asymptotically lognormal. All P values in the examples to follow are asymptotically lognormal and satisfy (5) (Lambert and Hall 1981). If a simultaneous, multisample influence function is of interest, then (5) suggests that the natural candidate, evaluated at x₁, …, x_k under G₁, …, G_k, is Σ λ_i Ω_i(x_i; G).

In the following examples, k = 2, λ₁ = λ, and λ₂ = λ̄. The first random sample is denoted X₁, …, X_Nλ and the second Y₁, …, Y_Nλ̄. Under the null hypothesis, F₁ = F₂ is the normal (μ, σ²) cdf; under the alternative, G₁ = F₁ and G₂ is the normal (μ + Δ, σ²) cdf with Δ > 0. The common location μ is unspecified. The slopes and influence functions for several well-known tests under these conditions are listed in Table 3. For each unconditional test the influence function for the second sample is obtained from the derivative of the slope and the influence function of the test statistic; the influence function of the permutation test is obtained by a method described in Section 6.5.


Table 3. Slopes and Second-Sample Influence Functions for Two-Sample Tests of H: Common Normal (μ, σ²) Distribution at Normal (μ, σ²), Normal (μ + Δ, σ²) (Δ > 0) Alternatives; z = (y − μ − Δ)/σ

Test: Ȳ − X̄
  Bahadur slope: λλ̄Δ²/σ²
  Influence function: λλ̄Δz/σ

Test: Student's t, pooled variance
  Bahadur slope: log(1 + λλ̄Δ²/σ²)
  Influence function: (λλ̄Δσ/(σ² + λλ̄Δ²))(z − (λ̄Δ/2σ)(z² − 1))

Test: Wilcoxon (approximate P value)
  Bahadur slope: 3λλ̄(2W(Δ) − 1)²
  Influence function: 6λλ̄(2W(Δ) − 1)(Φ(z + Δ/σ) − W(Δ)),
  W(Δ) = ∫ Φ(u)φ(u − Δ/σ) du

Test: Normal scores and van der Waerden (approximate P value)
  Bahadur slope: λ̄V²(Δ)/λ
  Influence function: (2λ̄/λ)V(Δ)(Φ⁻¹(H(z + Δ/σ)) − V(Δ) + λ̄ ∫ (I(u ≥ z + Δ/σ) − Φ(u − Δ/σ)) φ(u − Δ/σ)/φ(Φ⁻¹(H(u))) du),
  V(Δ) = ∫ Φ⁻¹(H(u)) φ(u − Δ/σ) du, H(u) = λΦ(u) + λ̄Φ(u − Δ/σ)

Test: Permutation
  Bahadur slope: −λ̄ ∫ log(λ̄ + λe^(−(Δ/σ)(x + Δ/2σ))) φ(x) dx − λ ∫ log(λ + λ̄e^((Δ/σ)(x − Δ/2σ))) φ(x) dx
  Influence function: −λ̄ log(λ̄ + λe^(−(Δ/σ)(z + Δ/2σ))) + λ̄ ∫ log(λ̄ + λe^(−(Δ/σ)(x + Δ/2σ))) φ(x) dx
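A numerical sketch of the Figure 2 setting (λ = Δ = .5, μ = 0, σ = 1), using only the Ȳ − X̄ and Wilcoxon entries of Table 3 as reconstructed in this transcription; near the center the two functions should agree up to terms of order Δ, as claimed for (6) below.

```python
import math

# Sketch of the Figure 2 setting (lambda = .5, mu = 0, sigma = 1, Delta = .5):
# second-sample influence functions of the Y-bar - X-bar and Wilcoxon tests,
# using this transcription's reconstruction of Table 3.

lam, lam_bar, Delta, sigma = 0.5, 0.5, 0.5, 1.0

def Phi(u):
    return 0.5 * math.erfc(-u / math.sqrt(2))

def infl_meandiff(z):
    return lam * lam_bar * Delta * z / sigma

W = Phi(Delta / (sigma * math.sqrt(2)))    # P(Y > X) at the normal alternative

def infl_wilcoxon(z):
    return 6 * lam * lam_bar * (2 * W - 1) * (Phi(z + Delta / sigma) - W)

for z in (-1.0, 0.0, 1.0):
    print(z, infl_meandiff(z), infl_wilcoxon(z))
```

At moderate z the two curves nearly coincide; they separate only in the tails, where the Wilcoxon influence flattens while the Ȳ − X̄ influence keeps growing linearly.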

Figure 2. Two-Sample Influence Functions: Lambda = .5; N(0, 1), N(0, 1) Versus N(0, 1), N(.5, 1). [Graph of the second-sample influence functions of the Ȳ − X̄, permutation, Student's t, and Wilcoxon tests.]


Figure 2 graphs the influence functions for λ = Δ = .5, μ = 0, σ = 1. The normal scores and van der Waerden functions are omitted because, as described below, they are very close to the permutation influence function. In the following discussion, the sample mean, Student's t, and Wilcoxon tests are not considered in detail because they behave like their one-sample analogs. (Further details and the first-sample influence functions of all these tests are given in Lambert 1977.)

6.1 Normal Scores and van der Waerden Tests

The van der Waerden statistic is V_N = (Nλ̄)⁻¹ Σ Φ⁻¹(R(Y_j)/(N + 1)) = ∫ Φ⁻¹((NλF_1N(u) + Nλ̄F_2N(u))/(N + 1)) dF_2N(u), with R(Y_j) the rank of Y_j in the combined sample of N. An approximate P value is Φ̄((Nλ̄/λ)^(1/2) V_N/τ_N), where τ_N² = N⁻¹ Σ_{i=1}^N {Φ⁻¹(i/(N + 1))}².

The normal scores P value, which is locally most powerful, has the test statistic (Nλ̄)⁻¹ Σ E_N(R(Y_j)), E_N(i) being the ith expected order statistic of a sample of size N from a normal (0, 1) distribution. This P value has the same limiting behavior as the P value for V_N. In particular, both tests have the same influence functions, and all the statements below hold for either test.

Let Ω_V1 and Ω_V2 denote the first- and second-sample influence functions. The Ω_Vi are continuous in their arguments and at G. The proof, which is an application of the dominated and monotone convergence theorems, is given in Lambert (1977). The continuity insures insensitivity to slight rounding and suggests that efficiency does not deteriorate quickly if G₁, G₂ are only slightly disturbed.

The influence functions Ω_Vi are bounded above; Ω_V1 diverges as x → ∞ and Ω_V2 diverges as y → −∞ (Lambert 1977).

Heuristically, if Δ > 0 the maximum of the combined sample of N is likely to be a Y_j, its exact magnitude being inconsequential. Since y → ∞ merely forces the maximum to be from the second sample, it is plausible that Ω_V2 is bounded from above. This boundedness suggests that the observed significance levels are resistant to right tail outliers among the Y_j's. Moreover, Pitman efficiency has not been sacrificed to realize this protection since the tests have Pitman efficiency one with respect to the optimal Ȳ − X̄ test. (Recall, however, that this boundedness is not desirable from an accept-reject testing perspective.)

On the other hand, if y → −∞, the rank one will eventually belong to the second sample, contrary to expectation under G. For any finite N the inclusion of Φ⁻¹(1/(N + 1)) in the test statistic will have a finite effect on V_N and the log P value, but asymptotically in N the effect is unbounded. In fact, even though the tests depend only on the ranks and not the magnitudes of the observations, Ω_V2 diverges as fast as the Ȳ − X̄ influence function Ω_02 as y → −∞ (Lambert 1977). Nevertheless, the continuity of the Ω_Vi in G and their one-sided boundedness do favor the normal scores and van der Waerden tests over the Ȳ − X̄ test. For values of Δ that are small relative to N, Ω_V2 has

the following form. For y ∈ (−1/Δ, 1/Δ), μ = 0, σ = 1,

Ω_V2(y) = λλ̄Δ[(y − Δ) − λ̄Δ(y² − 1)/2 + λ̄Δ²(λ − λ̄)(3y + y³)/6] + O(Δ⁴y³).  (6)

Hence, on the interval (−1/Δ, 1/Δ) the influence functions of the normal scores, t, and Ȳ − X̄ P values are identical up to terms of order Δ. This is to be expected since all these tests have Pitman efficiency one. The influence functions Ω_t2 and Ω_V2 coincide up to terms of order Δ². Hence, at least for moderate values of y, the normal scores test is closer to the t test than to the Ȳ − X̄ test. The boundedness and continuity properties of Ω_V2 favor the normal scores and van der Waerden tests over the t test.

6.2 Permutation Test

It has been claimed that the conditional permutation P value is insensitive to small perturbations of the underlying alternative model (e.g., Box and Andersen 1955). Oden and Wedel (1975) showed that if the parametric hypotheses are enlarged to include all distributions with densities near the hypothetical ones, then only permutation tests are unbiased. On the basis of this result, they suggested using the permutation test based on Ȳ − X̄. Despite such assurances that permutation tests are robust, the effect on such tests of a shift from the model alternative G to some nearby distribution is obscure because any observational error affects every point in the support of the permutation distribution.

Although the following considerations may be generalized (Lambert 1977), only the Ȳ − X̄ permutation P value is considered here. The ith sample influence function is the derivative with respect to ε of the slope c_P(G_ε^(i)). This slope, which can be obtained by the method of Stone (1969), is

c_P(Q) = −λ ∫ log(1 + b₁e^(−b₂u)) dQ₂(u) − λ̄ ∫ log(1 + b₁⁻¹e^(b₂u)) dQ₁(u) − λ̄ log λ̄ − λ log λ,

where b₁ and b₂ are the functions of Q defined by λ = ∫ (1 + b₁e^(−b₂u))⁻¹ dH(u), λ ∫ u dQ₂(u) = ∫ u(1 + b₁e^(−b₂u))⁻¹ dH(u), and H = λ̄Q₁ + λQ₂. At G, b₂ = Δ/σ² and b₁ = (λ̄/λ) exp(Δ(μ + Δ/2)/σ²).

The resulting second-sample permutation influence function Ω̇_P2 is continuous in y and G₁ but discontinuous at G₂; Ω̇_P1 is continuous in y and G₂ but discontinuous at G₁. These discontinuities contrast unfavorably with the continuity of the normal scores influence functions at G.

The Taylor expansion (6) is valid for Ω̇_P2 also, so the permutation and normal scores P values treat central observations similarly. Moreover, like Ω̇_V1 and Ω̇_V2, the permutation test influence functions are bounded above under the normal alternative and diverge as fast as the Ȳ − X̄ influence functions from below. The one-sided boundedness of Ω̇_P has the following

This content downloaded from 62.122.72.154 on Mon, 16 Jun 2014 08:13:21 AMAll use subject to JSTOR Terms and Conditions


656 Journal of the American Statistical Association, September 1981

explanation. Recall that the permutation distribution is defined by first partitioning the combined sample Z₁, …, Z_N into all subsets of sizes Nλ̄ and Nλ, giving weight (Nλ̄)!(Nλ)!/N! to each partition, and computing Z̄⁽²⁾, the mean of the subset of size Nλ, for each partition. The permutation P value is (Nλ̄)!(Nλ)!/N! Σ_{A∈𝒜} I(Z̄_A⁽²⁾ ≥ Ȳ), where 𝒜 is the family of all partitions.

Suppose there is a positive outlier y in Y₁, …, Y_{Nλ}, say Y₁ = y. Let 𝒜_y denote the collection of subsets of size Nλ that include Y₁ = y. As y increases, those values Z̄⁽²⁾ associated with a partition in 𝒜_y increase and those associated with 𝒜 − 𝒜_y stay unchanged. Once y is so large that min_{𝒜_y} Z̄⁽²⁾ ≥ max_{𝒜−𝒜_y} Z̄⁽²⁾, there is no further change in the P value. In fact, as y increases, the permutation P value decreases until it reaches

(Nλ̄)!(Nλ)!/N! Σ_{A_y ∈ 𝒜_y} I( Σ_{j ∈ A_y − {1}} Z_j ≥ Σ_{i=2}^{Nλ} Y_i ) = P_N*,

where A_y denotes a subset of size Nλ corresponding to a partition in 𝒜_y. For any N this P_N* is the same as λ times the permutation P value P_{N−1} calculated from X₁, …, X_{Nλ̄} and Y₂, …, Y_{Nλ}. Since −N⁻¹ log P_N* = −N⁻¹ log P_{N−1} − N⁻¹ log λ, the outlier y essentially only reduces the size of the second sample to Nλ − 1 when N is large. In other words, the influence of a right tail outlier y is bounded.
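The reduction P_N* = λP_{N−1} can be verified exactly by enumeration for tiny samples. The sketch below is an illustrative assumption (the sample values and the helper perm_pvalue are not from the paper); because a positive outlier y enters both the partition sums and the observed Ȳ, it cancels, and only the partitions that exclude it are eventually ruled out.

```python
from itertools import combinations

def perm_pvalue(x, y):
    """Exact permutation P value for the Y-bar minus X-bar test:
    the fraction of equally weighted partitions of the combined sample
    whose second-sample sum is at least the observed sum of y.  With
    the subset size fixed, comparing sums equals comparing means."""
    z = x + y
    subsets = list(combinations(z, len(y)))
    observed = sum(y)
    hits = sum(1 for s in subsets if sum(s) >= observed)
    return hits / len(subsets)

# Illustrative samples: first sample of size 4, second of size 4,
# so lambda = lambda-bar = 1/2.
x = [0.3, -0.1, 1.2, 0.5]
y_rest = [1.1, 0.7, 2.0]                   # second sample without the outlier
lam = (len(y_rest) + 1) / (len(x) + len(y_rest) + 1)

p_full = perm_pvalue(x, [50.0] + y_rest)   # second sample with outlier y = 50
p_reduced = perm_pvalue(x, y_rest)         # outlier dropped, N reduced by 1
print(p_full, lam * p_reduced)             # agree: P_N* = lambda * P_{N-1}
```

With samples of sizes 4 and 4 here, λ = ½, so the saturated P value is exactly half the reduced-sample P value.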

Similarly, as y decreases through negative values, the observed permutation P value eventually reaches

(Nλ̄)!(Nλ)!/N! [ Σ_{A ∉ 𝒜_y} 1 + Σ_{A_y ∈ 𝒜_y} I( Σ_{j ∈ A_y − {1}} Z_j ≥ Σ_{i=2}^{Nλ} Y_i ) ] = λ̄ + P_N*.

Roughly speaking, P_N* behaves like a permutation P value from the model distributions G₁ and G₂, so that for N large P_N* is exponentially small. Hence, as N increases, λ̄ dominates P_N* and the outlier y pulls the observed permutation P value towards λ̄, regardless of the other observations; that is, as with the normal scores P value, the effect on the permutation P value of an outlier in the second sample intensifies as the combined sample size increases. Hence, the unboundedness of Ω̇_P2 describes the behavior of the P value of the Ȳ − X̄ permutation test accurately.
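The limit λ̄ + P_N* for a far negative outlier can likewise be checked by direct enumeration; the sample values and the helper perm_pvalue below are illustrative assumptions, not from the paper.

```python
from itertools import combinations

def perm_pvalue(x, y):
    """Exact Y-bar minus X-bar permutation P value by enumeration
    over all partitions of the combined sample (tiny samples only)."""
    z = x + y
    subsets = list(combinations(z, len(y)))
    observed = sum(y)
    return sum(1 for s in subsets if sum(s) >= observed) / len(subsets)

# Illustrative samples of sizes 4 and 4, so lambda = lambda-bar = 1/2.
x = [0.3, -0.1, 1.2, 0.5]
y_rest = [1.1, 0.7, 2.0]
lam, lam_bar = 0.5, 0.5

p_neg = perm_pvalue(x, [-50.0] + y_rest)   # far negative outlier y = -50
p_star = lam * perm_pvalue(x, y_rest)      # P_N* = lambda * P_{N-1}
print(p_neg, lam_bar + p_star)             # observed P value = lambda-bar + P_N*
```

Every partition that sends the outlier to the first sample now satisfies the indicator, contributing the λ̄; the remainder contribute P_N* exactly as in the positive-outlier case.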

The above examples demonstrate that the influence functions for tests provide indications of the robustness of the observed significance level. The present approach seems especially valuable in situations in which the effects of inaccuracies are not otherwise apparent, as with the permutation P value. The influence function is also useful for ranking competing tests in terms of their robustness. For example, the permutation and normal scores tests are both Pitman efficient, distribution-free tests. Their influence functions exhibit much similarity. Yet the normal scores influence function is continuous in the alternative and the permutation influence function is not, suggesting that the normal scores P value is more stable.

The influence function also provides new insights into the relationship of tests with Pitman efficiency one. For example, the progression from the sample mean, to the Student's t, to the normal scores and permutation tests is evident from the Taylor expansions of their influence functions.

7. INFLUENCE FUNCTIONS UNDER THE NULL HYPOTHESIS

If the P value is used as an empirical measure of the weight of the evidence in favor of the null hypothesis, and if one subscribes to the view that the null hypothesis is never precisely true, then it may be sufficient to evaluate the influence function(s) of the test under the alternative hypothesis. In general, however, it would seem appropriate to evaluate the effect of the data under the null hypothesis as well as under the alternative.

Unfortunately, the influence function of Definition 1 is not always useful under the null hypothesis. For all the examples of Sections 5 and 6, the influence functions under the null hypothesis are identically zero.

The fault lies with the choice of the logarithmic scale for the comparison of P values. That choice is traditional, but it introduces a discontinuity in the transition from alternative to null hypotheses in typical cases, in which the statistic −log P_n is asymptotically exponential under the null hypothesis and asymptotically normal under alternatives.

It is suggested in Lambert (1977) that the transformation −Φ⁻¹(P_n) is preferable to −n⁻¹ log P_n if the statistic T_n is asymptotically normal. Like the log transformation, the −Φ⁻¹ transformation provides a common scale, which is not always identical to the scale of the test statistic, for all unconditional and conditional P values. If T_n has a normal distribution under H, then −Φ⁻¹(P_n) = T_n has a normal distribution for all sample sizes under H, and if −n⁻¹ log P_n is asymptotically normal under alternatives, then it is shown that −Φ⁻¹(P_n) continues to be asymptotically normal under alternatives.
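The identity −Φ⁻¹(P_n) = T_n for an exactly normal statistic is easy to verify numerically. The sketch below uses a bisection inverse for Φ as an implementation convenience; the function names are assumptions for illustration.

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def Phi_inv(p):
    """Inverse of Phi by bisection on [-10, 10]; adequate for illustration."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# If the one-sided P value is P = 1 - Phi(T), then -Phi_inv(P) recovers T.
for T in (0.5, 1.0, 2.5):
    P = 1.0 - Phi(T)
    print(T, -Phi_inv(P))
```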

In particular, if −n⁻¹ log P_n = c(Q) + o(n^(−1/2)) under Q, then −n^(−1/2)Φ⁻¹(P_n) = χ(Q) + o(n^(−1/2)) under Q, where χ(Q) = (2c(Q))^(1/2). If −n⁻¹ log P_n is asymptotically normal with mean c(Q) and variance τ²(Q)/n, then −n^(−1/2)Φ⁻¹(P_n) is asymptotically normal with mean χ(Q) and variance ω²(Q)/n, where ω(Q) = τ(Q)/χ(Q). If −n⁻¹ log P_n has influence function Ω̇(x; Q), then −n^(−1/2)Φ⁻¹(P_n) has influence function (2c(Q))^(−1/2) Ω̇(x; Q). Thus the influence functions for both the log and −Φ⁻¹ transformed P values are qualitatively similar in typical cases, including those in Sections 5 and 6. The two influence functions are either both continuous in x (or Q) or both discontinuous in x (or Q), and both are bounded
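The scaling by (2c(Q))^(−1/2) is just the chain rule applied to χ(Q) = (2c(Q))^(1/2). A quick finite-difference check with a toy linear slope function makes this concrete; the baseline slope c0 and sensitivity a below are assumptions for illustration.

```python
from math import sqrt

# Toy check of the chain rule behind the influence-function scaling:
# if chi = sqrt(2 c), then d(chi)/d(eps) = (2 c)^(-1/2) * dc/d(eps).
c0, a = 0.08, 0.6

def c(eps):
    """Assumed linearized slope under contamination of size eps."""
    return c0 + a * eps

def chi(eps):
    """Root slope chi = (2 c)^(1/2)."""
    return sqrt(2.0 * c(eps))

eps = 1e-6
d_c = (c(eps) - c(0.0)) / eps        # finite-difference slope of c
d_chi = (chi(eps) - chi(0.0)) / eps  # finite-difference slope of chi
print(d_chi, d_c / sqrt(2.0 * c0))   # the two agree to first order
```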




or both unbounded. Moreover, if the influence function of −n⁻¹ log P_n lies above that of −n⁻¹ log P_n′ for some x and Q, then the same holds for the influence functions of −Φ⁻¹(P_n) and −Φ⁻¹(P_n′). The log scale was used above because it is more traditional; the −Φ⁻¹ scale is used here because it allows an influence function under the null hypothesis.

For example, consider the hypotheses in Section 6. Under the null hypothesis H the observations of both samples are independently distributed normal (μ, σ²); under the alternative the observations in one sample are independent normal (μ, σ²) and the observations in the other sample are independent normal (μ + Δ, σ²) for some Δ > 0. The first- and second-sample influence functions for the −Φ⁻¹ transformed exact and approximate Wilcoxon P values under H all equal (12λλ̄)^(1/2)(Φ((z − μ)/σ) − ½). The −Φ⁻¹ transformed exact and approximate P values for the Ȳ − X̄, Student's t, normal scores, and Ȳ − X̄ permutation tests all have influence functions for either sample equal to (λλ̄)^(1/2)(z − μ)/σ under H.

The equality of the latter influence functions is not surprising. By definition, the influence function of the −Φ⁻¹ transformed P value is identical to the influence function of the root slope (2c(Q))^(1/2). As the parameter Δ of the alternative distribution approaches zero, the slope of each of the four tests converges to its Pitman efficiency function. Since all four Pitman efficiency functions are equal, all four tests should have the same influence function under the null hypothesis, assuming the appropriate regularity conditions are satisfied.

Eplett (1980) defined an influence function of a test to be the influence function of its Pitman efficiency function under a sequence of location shift alternatives. Under regularity conditions (cf. Wieand 1976), the slope of a test under a sequence of alternatives approaching the null distribution will converge to the Pitman efficiency function of the test. Thus, under regularity conditions not investigated here, Eplett's influence function may be determined from the influence function of the −Φ⁻¹ transformed P value under the null hypothesis.

The advantage of the −Φ⁻¹ influence function lies in its interpretation for an observable statistic, the P value; the relationship between an influence function for limiting power and the role of an observation x in the finite testing problem is not as clear. Additionally, the −Φ⁻¹ influence function can be defined at arbitrary fixed distributions, not just at sequences of alternative distributions approaching the null distribution. The −Φ⁻¹ influence function may also distinguish between tests that have Pitman efficiency one. Thus, the influence function for −Φ⁻¹ transformed P values is both a generalization and a refinement of an influence function based on limiting power.

[Received January 1979. Revised January 1981.]

REFERENCES

BAHADUR, R.R. (1960), "Simultaneous Comparison of the Optimum and Sign Tests of a Normal Mean," in Contributions to Probability and Statistics, ed. I. Olkin et al., Stanford: Stanford University Press, 77-88.

——— (1971), Some Limit Theorems in Statistics, Philadelphia: SIAM.

BAHADUR, R.R., and RAGHAVACHARI, M. (1970), Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley: University of California Press, 129-152.

BOX, G.E.P., and ANDERSEN, S.L. (1955), "Permutation Theory in the Derivation of Robust Criteria and the Study of Departures From Assumption," Journal of the Royal Statistical Society, Ser. B, 17, 1-34.

EPLETT, W.J.R. (1980), "An Influence Curve for Two-Sample Rank Tests," Journal of the Royal Statistical Society, Ser. B, 42, 64-70.

GLESER, L.J. (1964), "On a Measure of Test Efficiency Proposed by R.R. Bahadur," Annals of Mathematical Statistics, 35, 1537-1544.

HAMPEL, F.R. (1974), "The Influence Curve and Its Role in Robust Estimation," Journal of the American Statistical Association, 69, 383-393.

HUBER, P.J. (1965), "A Robust Version of the Probability Ratio Test," Annals of Mathematical Statistics, 36, 1753-1758.

——— (1968), "Robust Confidence Limits," Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 10, 269-278.

KEMPTHORNE, O., and FOLKS, L. (1971), Probability, Statistics, and Data Analysis, Ames, Iowa: Iowa State University Press.

LAMBERT, D. (1977), "P-Values: Asymptotics and Robustness," unpublished Ph.D. thesis, University of Rochester.

LAMBERT, D., and HALL, W.J. (1981), "Asymptotic Lognormality of P-Values," Annals of Statistics, in press.

MALLOWS, C.L. (1975), "On Some Topics in Robustness," presented at the Institute of Mathematical Statistics meeting, Rochester, N.Y.

ODEN, A., and WEDEL, H. (1975), "Arguments for Fisher's Permutation Test," Annals of Statistics, 3, 518-520.

STONE, M. (1969), "Approximations to Extreme Tail Probabilities for Sampling Without Replacement," Proceedings of the Cambridge Philosophical Society, 66, 587-606.

WIEAND, H.S. (1976), "A Condition Under Which the Pitman and Bahadur Approaches to Efficiency Coincide," Annals of Statistics, 4, 1003-1011.

YLVISAKER, D. (1977), "Test Resistance," Journal of the American Statistical Association, 72, 551-556.
