This article was downloaded by: [Texas State University, San Marcos]
On: 28 August 2013, At: 05:23
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Communications in Statistics - Theory and Methods
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20

A nonparametric comparison of two multiple regressions by means of a weighted measure of correlation
Ibrahim Salama & Dana Quade
Department of Biostatistics, University of North Carolina, Chapel Hill, NC, 27514
Published online: 27 Jun 2007.

To cite this article: Ibrahim Salama & Dana Quade (1982) A nonparametric comparison of two multiple regressions by means of a weighted measure of correlation, Communications in Statistics - Theory and Methods, 11:11, 1185-1195

To link to this article: http://dx.doi.org/10.1080/03610928208828304
A NONPARAMETRIC COMPARISON OF TWO MULTIPLE REGRESSIONS BY MEANS OF A WEIGHTED MEASURE OF CORRELATION

Ibrahim Salama and Dana Quade

Department of Biostatistics
University of North Carolina
Chapel Hill, NC 27514

Key Words and Phrases: multiple regression; rank correlation; Spearman's footrule
Let $(R_{1j}, \ldots, R_{mj})$, $j = 1, 2$, be two rankings of $m$ items; let $T_k$, $k = 1, \ldots, m$, be the number of items with rank $\le k$ in both rankings; and let $T = \sum T_k/k$. Then $T$ is a measure of rank correlation which gives greater weight to items of low rank than high. Such a measure is particularly useful in comparing the ordering of the regressors in two multiple regressions. We discuss the distribution of $T$, presenting both exact tables and practical approximations, and extend the concept to other situations.
1. INTRODUCTION

Suppose we wish to compare two populations with respect to predicting a response variable $Y$ from regressor variables $X_1, X_2, \ldots, X_m$. In asking whether they are "similar", however, we do not mean whether the regression coefficients are similar, since the variables observable in our samples may only be proxies for
unobservable underlying variables, measured differently in the two populations. Rather, we are concerned with the ranking of the X's by importance. This might be established by performing a stepwise regression in each sample, with the similarity of the two populations then measured by the rank correlation. As we see it, however, the usual rank correlation coefficients are not ideal for this purpose, because they give equal weight to all items ranked. For example, suppose we have three regressors $X_1, X_2, X_3$, and in one sample the stepwise procedure places them in precisely that order. Consider two possibilities for the order in the other sample: (a) $X_1, X_3, X_2$, and (b) $X_2, X_1, X_3$. Either produces the same rank correlation by the usual methods. But (a), where the two samples agree on the first regressor, surely represents more similarity than (b), where they disagree on the first regressor and agree instead on the last. Thus we seek a rank correlation coefficient which gives greater weight to items of low than high rank.
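The claim that orderings (a) and (b) are indistinguishable "by the usual methods" can be checked directly. The following is a minimal sketch, taking Spearman's rho and Kendall's tau as the usual coefficients (an assumption, since the text does not name them); the helper names are illustrative only.

```python
from itertools import combinations


def ranks_from_order(order):
    """Map an ordering such as ["X1", "X3", "X2"] to {item: rank}, ranks starting at 1."""
    return {item: pos + 1 for pos, item in enumerate(order)}


def spearman_rho(order1, order2):
    """Spearman's rho for two orderings of the same items (no ties)."""
    r1, r2 = ranks_from_order(order1), ranks_from_order(order2)
    m = len(order1)
    d2 = sum((r1[x] - r2[x]) ** 2 for x in order1)
    return 1 - 6 * d2 / (m * (m * m - 1))


def kendall_tau(order1, order2):
    """Kendall's tau: (concordant pairs - discordant pairs) / (number of pairs)."""
    r1, r2 = ranks_from_order(order1), ranks_from_order(order2)
    score = 0
    for x, y in combinations(order1, 2):
        concordant = (r1[x] - r1[y]) * (r2[x] - r2[y]) > 0
        score += 1 if concordant else -1
    m = len(order1)
    return score / (m * (m - 1) / 2)


reference = ["X1", "X2", "X3"]
order_a = ["X1", "X3", "X2"]   # agrees with the reference on the first regressor
order_b = ["X2", "X1", "X3"]   # agrees with the reference only on the last regressor

for label, other in (("a", order_a), ("b", order_b)):
    print(label, spearman_rho(reference, other), kendall_tau(reference, other))
```

Both orderings receive the same rho and the same tau, which is the motivation for a coefficient that weights early ranks more heavily.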
Let $R_{ij}$ be the rank of the $i$-th variable in the $j$-th population, for $i = 1, \ldots, m$ and $j = 1, 2$; and let

$$T_k = \#\{i : R_{i1} \le k \text{ and } R_{i2} \le k\}, \qquad k = 1, \ldots, m.$$

Then $T_k$ is the number of regressors common to the two regressions at the $k$-th step, which is an intuitive measure of similarity at that step. With identical populations, to have $T_k = k$ for $k = 1, \ldots, m$ would be the most likely result; or, finding $T_k = k$ for $k = 1, \ldots, m$ would indicate strongly that the two populations are similar. A considerable deviation from this situation would suggest a high degree of non-similarity.

As a statistic to measure similarity over all steps we propose

$$T = \sum_{k=1}^{m} \frac{T_k}{k},$$

in which each $T_k$ is divided by $k$ in order to reflect the decreasing importance of the steps. The maximum value of $T$
occurs if the two samples yield the same ranking of the X's, in which case $T_k = k$ for all $k$, and $T = m$. The minimum value occurs if (though not only if) the X's are in reverse order; then each $T_k = \max(0, 2k-m)$, and

$$T = t(m) = \sum_{k=1}^{m} \frac{\max(0,\, 2k-m)}{k}.$$

Note that $t(m)/m$ has the same value for each even integer $m$ as for the following odd integer: $t(2)/2 = t(3)/3 = 1/2$, $t(4)/4 = t(5)/5 = 5/12$, $t(6)/6 = t(7)/7 = 23/60$, $t(8)/8 = t(9)/9 = 307/840$, etc. In addition, $t(m)/m$ is nonincreasing in $m$, and, as $m \to \infty$, $t(m)/m \to 1 - \log_e 2$. Thus we could rescale $T$ to lie in the range $(-1, 1)$, obtaining (if we use the limiting value of $t$) a weighted rank correlation coefficient $C = 1 - 2(m-T)/(m \log_e 2)$, but we shall not bother to do this in what follows.
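As an illustration of these definitions, the sketch below computes $T_k$, $T$, the minimum $t(m)$, and the rescaled coefficient $C$ for rankings given as tuples of ranks. It is only a sketch: the function names are hypothetical, and exact fractions are used so that the quoted values of $t(m)/m$ can be compared without rounding.

```python
from fractions import Fraction
from math import log


def t_k_values(rank1, rank2):
    """rank1[i], rank2[i] are the ranks (1..m) of item i in the two rankings.

    Returns [T_1, ..., T_m], where T_k counts items with rank <= k in both.
    """
    m = len(rank1)
    return [
        sum(1 for a, b in zip(rank1, rank2) if a <= k and b <= k)
        for k in range(1, m + 1)
    ]


def weighted_T(rank1, rank2):
    """T = sum over k of T_k / k, as an exact fraction."""
    return sum(
        Fraction(tk, k) for k, tk in enumerate(t_k_values(rank1, rank2), start=1)
    )


def t_min(m):
    """Minimum value t(m), attained when one ranking is the reverse of the other."""
    return sum(Fraction(max(0, 2 * k - m), k) for k in range(1, m + 1))


def rescaled_C(T, m):
    """C = 1 - 2(m - T) / (m log_e 2), using the limiting value of t(m)/m."""
    return 1 - 2 * (m - float(T)) / (m * log(2))


reference = (1, 2, 3)   # ranks of X1, X2, X3 in the first sample
ranks_a = (1, 3, 2)     # second sample ordered X1, X3, X2
ranks_b = (2, 1, 3)     # second sample ordered X2, X1, X3

print(weighted_T(reference, ranks_a), weighted_T(reference, ranks_b))
print([t_min(m) / m for m in range(2, 10)])
```

Unlike the usual coefficients, $T$ separates the two orderings of the introductory example, giving the larger value to (a).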
It is also interesting to note that the unweighted measure $S = \sum T_k$, which we would have obtained had we not divided each $T_k$ by $k$, is related to the metric $D = \sum |R_{i1} - R_{i2}|$ whose standardized version is known as Spearman's footrule; Salama (1981) proves that $D = m(m+1) - 2S$.
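The relation between $S$ and the footrule metric $D$ can be confirmed numerically for small $m$. The sketch below (with hypothetical function names) fixes the first ranking at $1, \ldots, m$, which loses no generality because the items can always be relabelled, and checks every permutation for the second ranking.

```python
from itertools import permutations


def unweighted_S(rank1, rank2):
    """S = sum over k of T_k, the unweighted counterpart of T."""
    m = len(rank1)
    return sum(
        sum(1 for a, b in zip(rank1, rank2) if a <= k and b <= k)
        for k in range(1, m + 1)
    )


def footrule_D(rank1, rank2):
    """D = sum over items of |R_i1 - R_i2| (unstandardized footrule)."""
    return sum(abs(a - b) for a, b in zip(rank1, rank2))


def identity_holds(m):
    """Check D = m(m+1) - 2S for every permutation of 1..m against the identity."""
    base = tuple(range(1, m + 1))
    return all(
        footrule_D(base, perm) == m * (m + 1) - 2 * unweighted_S(base, perm)
        for perm in permutations(base)
    )


print(all(identity_holds(m) for m in range(1, 7)))
```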
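2. A TEST FOR SIMILARITY

Recalling that $R_{ij}$ is the rank of the $i$-th variable in the $j$-th population, define

$$p_{ir}^{(j)} = P\{R_{ij} = r\} \quad \text{for } j = 1, 2 \text{ and } i, r = 1, \ldots, m.$$

Then, the two samples being independent, we can write the expectation of $T_k$ as

$$E[T_k] = \sum_{i=1}^{m} \Bigl(\sum_{r=1}^{k} p_{ir}^{(1)}\Bigr) \Bigl(\sum_{s=1}^{k} p_{is}^{(2)}\Bigr),$$

so that a general expression for the expectation of the statistic $T$ is

$$E[T] = \sum_{k=1}^{m} \frac{1}{k} \sum_{i=1}^{m} \Bigl(\sum_{r=1}^{k} p_{ir}^{(1)}\Bigr) \Bigl(\sum_{s=1}^{k} p_{is}^{(2)}\Bigr).$$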
We say that regression in the $j$-th population has uniform structure if

$$p_{ir}^{(j)} = \frac{1}{m} \quad \text{for } i, r = 1, \ldots, m,$$

in which case

$$P\{R_{ij} \le k\} = \sum_{r=1}^{k} p_{ir}^{(j)} = \frac{k}{m}.$$

If regression in one population has uniform structure (without loss of generality, let it be the first) then

$$E[T] = \sum_{k=1}^{m} \frac{1}{k} \cdot \frac{k}{m} \sum_{i=1}^{m} \sum_{s=1}^{k} p_{is}^{(2)} = \sum_{k=1}^{m} \frac{k}{m} = \frac{m+1}{2},$$

since exactly one variable occupies each rank, so that $\sum_{i} p_{is}^{(2)} = 1$ for every $s$. A special case of uniform structure is random structure, for which all rankings of the variables are equally likely, i.e., $P\{R_{1j} = r_1, \ldots, R_{mj} = r_m\} = 1/m!$ for all permutations $(r_1, \ldots, r_m)$ of $(1, \ldots, m)$. The null hypothesis which we shall test using $T$ is that this is true for at least one of the populations.

The alternative we have in mind is that the regressions in the two populations have non-uniform similar structures, by which we mean that $p_{ir}^{(1)} = p_{ir}^{(2)} = p_{ir}$ (say) for $i, r = 1, \ldots, m$, where $p_{ir} \ne 1/m$ for at least one $(i, r)$. Then

$$E[T] = \sum_{k=1}^{m} \frac{1}{k} \sum_{i=1}^{m} \Bigl(\sum_{r=1}^{k} p_{ir}\Bigr)^{2}.$$

However, since

$$\sum_{i=1}^{m} \sum_{r=1}^{k} p_{ir} = k$$

is fixed, it follows that

$$\sum_{i=1}^{m} \Bigl(\sum_{r=1}^{k} p_{ir}\Bigr)^{2} \ge \frac{k^2}{m},$$

with the lower bound holding for every $k$ if and only if the common regression structure is uniform. Hence, under the alternative, $E[T] > (m+1)/2$; i.e., our test statistic has a greater mean under
the alternative of similar non-uniform structures for the regressions in the two populations than under the hypothesis of random structure for at least one of them.
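The behaviour of $E[T]$ under uniform and non-uniform structures can be illustrated numerically. The sketch below assumes the two samples are independent, so that $P\{R_{i1} \le k, R_{i2} \le k\}$ factors into a product of marginal probabilities; the function name and the example matrix `p` are hypothetical choices, not taken from the paper.

```python
from fractions import Fraction


def expected_T(p1, p2):
    """E[T] when p1[i][r-1] = P{R_i1 = r} and p2[i][r-1] = P{R_i2 = r}.

    Assumption: the two samples are independent, so the joint probability
    that item i has rank <= k in both samples is a product of marginals.
    """
    m = len(p1)
    total = Fraction(0)
    for k in range(1, m + 1):
        e_tk = sum(sum(p1[i][:k]) * sum(p2[i][:k]) for i in range(m))
        total += Fraction(e_tk) / k
    return total


m = 3
uniform = [[Fraction(1, m)] * m for _ in range(m)]

# A hypothetical non-uniform structure: item i most often takes rank i.
# Rows and columns each sum to 1, as rank probabilities must.
p = [
    [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
    [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
    [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)],
]

print(expected_T(uniform, uniform))  # both uniform
print(expected_T(uniform, p))        # one uniform, the other arbitrary
print(expected_T(p, p))              # similar non-uniform structures
```

With either population uniform the expectation equals $(m+1)/2$ whatever the other structure is, whereas the common non-uniform structure gives a strictly larger value.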
Now consider some higher moments under the null hypothesis. We have

if $i$, $i'$, $i''$ all distinct,

where $1 \le k < k' \le k'' \le m$; then a little algebra yields

Thence finally we calculate
$$V[T] = \frac{m+1}{12}$$

and

The conditional distribution of $T_{k+1}$, given $T_1 = t_1$, $T_2 = t_2$, ..., $T_k = t_k$, and assuming that the null hypothesis is true, may be derived as follows. Beginning with the initial condition that $P\{T_1 = 0\} = (m-1)/m$ and $P\{T_1 = 1\} = 1/m$, suppose the joint distribution of $T_1, \ldots, T_k$ has been found. Then define four sets of variables:

$A = \{i : R_{i1} \le k \text{ and } R_{i2} \le k\}$, $B = \{i : R_{i1} \le k \text{ but } R_{i2} > k\}$,
$C = \{i : R_{i1} > k \text{ but } R_{i2} \le k\}$, $D = \{i : R_{i1} > k \text{ and } R_{i2} > k\}$.

Note that $A$ contains $t_k$ variables, $B$ and $C$ contain $(k - t_k)$ variables each, and $D$ contains the remaining $(m - 2k + t_k)$ variables. Then, at step $(k+1)$, the variable chosen in Sample 1 must be from $C$ or $D$, and that chosen in Sample 2 must be from $B$ or $D$. We have the following possibilities:

Variable in       Variable in               Probability x (m-k)^2       Resulting value
Sample 1 from:    Sample 2 from:                                        of T_{k+1}
C                 B                         (k-t_k)^2                   t_k + 2
C                 D                         (k-t_k)(m-2k+t_k)           t_k + 1
D                 B                         (k-t_k)(m-2k+t_k)           t_k + 1
D                 D (different variables)   (m-2k+t_k)(m-2k+t_k-1)      t_k
D                 D (same variable)         (m-2k+t_k)                  t_k + 1

Summing up,

$$P\{T_{k+1} = t \mid T_k = t_k\} = \begin{cases} (k-t_k)^2/(m-k)^2, & t = t_k + 2, \\ (m-2k+t_k)(2k-2t_k+1)/(m-k)^2, & t = t_k + 1, \\ (m-2k+t_k)(m-2k+t_k-1)/(m-k)^2, & t = t_k. \end{cases}$$
Thus we have derived recursive relationships by which exact tables can be calculated. It may be seen, by the way, that the distribution of $T_{k+1}$ actually depends only on $t_k$. Thus $\{T_1, T_2, \ldots, T_m\}$ is a Markov chain.
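The recursion lends itself to a short dynamic program. The sketch below is one possible implementation, not the authors' program: it carries the pair (current $t_k$, partial sum of $T_j/j$) forward one step at a time, using transition probabilities obtained by counting the choices of one variable from $C \cup D$ in Sample 1 and one from $B \cup D$ in Sample 2. Function names are hypothetical, and exact fractions are used throughout.

```python
from collections import defaultdict
from fractions import Fraction


def transition(m, k, t):
    """P{T_{k+1} = t + j | T_k = t} for j = 0, 1, 2 under the null hypothesis."""
    b = k - t            # size of B (and of C)
    d = m - 2 * k + t    # size of D
    denom = (m - k) ** 2
    return {
        t + 2: Fraction(b * b, denom),          # Sample 1 from C, Sample 2 from B
        t + 1: Fraction(2 * b * d + d, denom),  # C,D or D,B or the same variable in D
        t: Fraction(d * (d - 1), denom),        # two different variables in D
    }


def exact_null_distribution(m):
    """Return {value of T: probability} under random structure."""
    # State: (t_k, partial sum of T_j / j for j <= k); start at k = 0, t_0 = 0.
    states = {(0, Fraction(0)): Fraction(1)}
    for k in range(m):
        nxt = defaultdict(Fraction)
        for (t, partial), prob in states.items():
            for t_next, p in transition(m, k, t).items():
                if p:
                    nxt[(t_next, partial + Fraction(t_next, k + 1))] += prob * p
        states = nxt
    dist = defaultdict(Fraction)
    for (_, total), prob in states.items():
        dist[total] += prob
    return dict(dist)


def upper_tail(dist, t_obs):
    """Exact P{T >= t_obs} from a distribution returned above."""
    return sum(p for t, p in dist.items() if t >= t_obs)


dist = exact_null_distribution(4)
mean = sum(t * p for t, p in dist.items())
variance = sum((t - mean) ** 2 * p for t, p in dist.items())
print(sorted(dist.items()), mean, variance)
```

The number of states grows with $m$, so this direct approach is intended for moderate $m$ only.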
Table I presents exact critical values of $T$ for Type I error probability $\alpha = 0.10$, $.05$, $.025$, $.01$, and $.005$ and number of variables $m = 5(1)15$. The entire exact null-hypothesis distributions of $T$ for $m = 3(1)10$, and selected values from the distributions for $m = 11(1)15$, are available in Salama and Quade (1981). When outside the range of the table, one might use

$$Z = \frac{T - (m+1)/2}{\sqrt{(m+1)/12}}$$

as a standard normal variable. This would ignore the positive skewness in the distribution of $T$, however, and thus generally produce anti-conservative P-values (too small). Instead, our recommendation is to treat a linear function of $T$ as a chi-squared. Specifically, to have $(a + bT)$ approximately distributed as $\chi^2(d)$, equate the first three central moments:

$$a + b\,\frac{m+1}{2} = d, \qquad b^2\,\frac{m+1}{12} = 2d, \qquad b^3\,\frac{m+1}{24} = 8d,$$

where for convenience the exact third moment of $(a + bT)$ has been replaced by a slightly lower value. Solving these equations yields $a = -4(m+1)/3$, $b = 8$, and $d = 8(m+1)/3$, whence

$$8T - \frac{4(m+1)}{3} \text{ is approximately distributed as } \chi^2\!\left(\frac{8(m+1)}{3}\right).$$

This approximation fits two moments exactly and the third fairly well.
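The moment-matching step can be retraced in a few lines. The sketch below uses the standard facts that a $\chi^2(d)$ variable has mean $d$, variance $2d$ and third central moment $8d$, and solves for $(a, b, d)$ from a given mean, variance and third central moment of $T$; the function names are hypothetical, and the value $(m+1)/24$ for the third moment is the convenient replacement implied by the stated solution, not the exact moment.

```python
from fractions import Fraction


def chi_squared_match(mean, variance, third_central):
    """Find (a, b, d) so that a + b*T matches chi-squared(d) in three moments.

    Matching conditions: a + b*mean = d, b^2*variance = 2d, b^3*third = 8d.
    Dividing the third condition by the second gives b = 4*variance/third.
    """
    b = 4 * variance / third_central
    d = b * b * variance / 2
    a = d - b * mean
    return a, b, d


def paper_constants(m):
    """Constants for the null distribution of T with m variables."""
    m1 = Fraction(m + 1)
    return chi_squared_match(m1 / 2, m1 / 12, m1 / 24)


for m in (5, 14):
    print(m, paper_constants(m))
```

For every $m$ the solution has $b = 8$, so only $a$ and the degrees of freedom $d$ change with the number of variables.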
TABLE I

CRITICAL VALUES OF THE TEST STATISTIC T

Rows: Number of Variables, m. Columns: Type I Error Probability, α.
We illustrate these approximations at $m = 14$. For any observed $T$, the normal approximation to the P-value is $P\{Z \ge (T - 7.5)/\sqrt{1.25}\}$ where $Z \sim N(0,1)$, and the chi-squared approximation is $P\{W \ge 8T - 20\}$ where $W \sim \chi^2(40)$. Then we have, obtaining the exact P-values from Salama and Quade (1981):

T      Exact P-value    Normal Approximation    Chi-squared Approximation
9      .10132           .08986                  .09682
For this number of variables the normal approximation is strongly anticonservative, but the chi-squared approximation appears adequate for practical purposes.
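Both approximate P-values can be computed with the standard library alone. The sketch below is limited, by assumption, to values of $m$ for which $8(m+1)/3$ is an integer (it is then automatically even), because the chi-squared upper tail with even degrees of freedom has a closed form as a Poisson sum; other $m$ would need a general chi-squared routine. Function names are hypothetical.

```python
from math import erfc, exp, sqrt


def normal_p_value(T, m):
    """P{Z >= (T - (m+1)/2) / sqrt((m+1)/12)} for a standard normal Z."""
    z = (T - (m + 1) / 2) / sqrt((m + 1) / 12)
    return 0.5 * erfc(z / sqrt(2))


def chi2_upper_tail_even_df(x, df):
    """P{W >= x} for W ~ chi-squared(df), df an even positive integer.

    Uses the identity P{W >= x} = P{Poisson(x/2) <= df/2 - 1}.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be an even positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    term = total = exp(-half)
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return total


def chi_squared_p_value(T, m):
    """P{W >= 8T - 4(m+1)/3} with W ~ chi-squared(8(m+1)/3)."""
    if (m + 1) % 3 != 0:
        raise ValueError("this sketch needs 8(m+1)/3 to be an integer")
    df = 8 * (m + 1) // 3
    return chi2_upper_tail_even_df(8 * T - 4 * (m + 1) / 3, df)


print(normal_p_value(9, 14), chi_squared_p_value(9, 14))
```

The two printed values can be set beside the tabulated approximations for $T = 9$ at $m = 14$.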
3. EXTENSIONS AND DISCUSSION

Although we have proposed the statistic $T$ as an index for comparing the rankings of the regressors in two populations, it is equally well suited for comparing some prespecified ranking with the ranking observed in a single population. Without loss in generality, relabel the variables so that the prespecified ranking becomes $1, \ldots, m$. Let $R_i$ be the observed rank of the $i$-th variable, and $p_{ir} = P\{R_i = r\}$ for $r = 1, \ldots, m$. Then a little algebra will establish that

$$E[T] = \sum_{k=1}^{m} \frac{1}{k} \sum_{i=1}^{k} \sum_{r=1}^{k} p_{ir},$$

with $E[T] = (m+1)/2$ if the population has uniform regression structure. Thus one may test the hypothesis of random structure, under which $T$ obviously has the distribution previously derived, against the alternative that $E[T] > (m+1)/2$.
The statistic $T$ can also be used to test for similarity of regression structure among $n > 2$ populations. Let $T^{(ij)}$ be the statistic calculated from the observed results for populations $i$ and $j$, where $1 \le i < j \le n$. Then an intuitively reasonable test statistic is

$$\bar{T} = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} T^{(ij)},$$

which can be viewed as a U-statistic of degree 2. Suppose the $n$ populations all have similar regression structures. Then, if we

we obtain

Under the null hypothesis of random structure in all populations, we have
whence

$$E[\bar{T}] = \frac{m+1}{2}, \qquad V[\bar{T}] = \frac{m+1}{6n(n-1)}.$$

Since $\xi = 0$, the asymptotic distribution of $\bar{T}$ is not normal; however, we can obtain a chi-squared approximation as we did for $T$. Let

$$\theta = E[T^{(12)} - \mu]^3, \qquad \lambda = E[(T^{(12)} - \mu)(T^{(23)} - \mu)(T^{(31)} - \mu)];$$

then $B(\bar{T} - \mu) + D$ is approximately distributed as chi-squared with $D$ degrees of freedom, where

Some algebra establishes that $\lambda = m(m+1)/[48(m-1)]$ exactly; it is convenient to replace this by $(m+1)/48$. Then, replacing the exact value of $\theta$ by $(m+1)/24$ as before, we obtain $B = 4n$ and $D = 4n(m+1)/[3(n-1)]$, so that

$$4n\left(\bar{T} - \frac{m+1}{2}\right) + \frac{4n(m+1)}{3(n-1)} \text{ is approximately distributed as } \chi^2\!\left(\frac{4n(m+1)}{3(n-1)}\right).$$

Note that this agrees with the previous result for $n = 2$.
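A small sketch of the $n$-population statistic follows, under the reading that $\bar{T}$ is the average of the pairwise statistics over all $n(n-1)/2$ pairs of populations and that $\mu = (m+1)/2$. The function names are hypothetical; rankings are tuples giving the rank of each variable.

```python
from fractions import Fraction
from itertools import combinations


def weighted_T(rank1, rank2):
    """Pairwise statistic T = sum over k of T_k / k."""
    m = len(rank1)
    return sum(
        Fraction(sum(1 for a, b in zip(rank1, rank2) if a <= k and b <= k), k)
        for k in range(1, m + 1)
    )


def t_bar(rankings):
    """Average of T over all pairs of the n rankings (requires n >= 2)."""
    pairs = list(combinations(rankings, 2))
    return sum(weighted_T(r1, r2) for r1, r2 in pairs) / len(pairs)


def chi_squared_statistic(rankings):
    """Return (B * (T_bar - mu) + D, D) with B = 4n and D = 4n(m+1) / (3(n-1))."""
    n, m = len(rankings), len(rankings[0])
    mu = Fraction(m + 1, 2)
    B = 4 * n
    D = Fraction(4 * n * (m + 1), 3 * (n - 1))
    return B * (t_bar(rankings) - mu) + D, D


three_samples = [(1, 2, 3), (1, 3, 2), (2, 1, 3)]
print(t_bar(three_samples), chi_squared_statistic(three_samples))
```

With $n = 2$ the returned statistic reduces to $8T - 4(m+1)/3$ on $8(m+1)/3$ degrees of freedom, in line with the two-population approximation.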
The comparison of multiple regressions is, of course, not the only context in which a weighted correlation statistic might be useful; one can imagine other situations in which the closeness of the relationship among the items ranked earlier is more important than among those ranked later. Nor is $T$ the only reasonable statistic. For instance, instead of modifying Spearman's footrule, consider modifying Spearman's rho from

It is easily shown that $R_s^*$ ranges from -1 when the two rankings
are opposite to 1 when they are identical, with expectation 0 if either population has random structure. It is not clear, however, how to choose among $T$, $R_s^*$, and the innumerable other possibilities.
ACKNOWLEDGEMENT

The work of the first author was sponsored by grant HD07102 from the National Institute of Child Health and Human Development.

Salama, I. A. (1981). A note on Spearman's footrule (unpublished manuscript).

Salama, I. A., & Quade, D. (1981). A nonparametric comparison of the structure of two multiple-regression prediction situations. Institute of Statistics Mimeo Series 1325, University of North Carolina at Chapel Hill.
Received May, 1981; Revised September, 1981.

Recommended by R. E. Odeh, University of Victoria, Victoria, B.C., Canada.