A nonparametric comparison of two multiple regressions by means of a weighted measure of correlation

This article was downloaded by: [Texas State University, San Marcos]
On: 28 August 2013, At: 05:23
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954
Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Communications in Statistics - Theory and Methods
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20

A nonparametric comparison of two multiple regressions by means of a weighted measure of correlation
Ibrahim Salama (a) & Dana Quade (a)
(a) Department of Biostatistics, University of North Carolina, Chapel Hill, NC, 27514
Published online: 27 Jun 2007.

To cite this article: Ibrahim Salama & Dana Quade (1982) A nonparametric comparison of two multiple regressions by means of a weighted measure of correlation, Communications in Statistics - Theory and Methods, 11:11, 1185-1195

To link to this article: http://dx.doi.org/10.1080/03610928208828304


A NONPARAMETRIC COMPARISON OF TWO MULTIPLE REGRESSIONS BY MEANS OF A WEIGHTED MEASURE OF CORRELATION

Ibrahim Salama and Dana Quade

Department of Biostatistics
University of North Carolina
Chapel Hill, NC 27514

Key Words and Phrases: multiple regression; rank correlation; Spearman's footrule

ABSTRACT

Let $(R_{1j}, \ldots, R_{mj})$, $j = 1, 2$, be two rankings of $m$ items; let $T_k$, $k = 1, \ldots, m$, be the number of items with rank $\le k$ in both rankings; and let $T = \sum T_k / k$. Then $T$ is a measure of rank correlation which gives greater weight to items of low rank than high. Such a measure is particularly useful in comparing the ordering of the regressors in two multiple regressions. We discuss the distribution of $T$, presenting both exact tables and practical approximations, and extend the concept to other situations.

1. INTRODUCTION

Suppose we wish to compare two populations with respect to predicting a response variable $Y$ from regressor variables $X_1, X_2, \ldots, X_m$. In asking whether they are "similar", however, we do not mean whether the regression coefficients are similar, since the variables observable in our samples may only be proxies for unobservable underlying variables, measured differently in the two populations. Rather, we are concerned with the ranking of the $X$'s by importance. This might be established by performing a stepwise regression in each sample, with the similarity of the two populations then measured by the rank correlation. As we see it, however, the usual rank correlation coefficients are not ideal for this purpose, because they give equal weight to all items ranked. For example, suppose we have three regressors $X_1, X_2, X_3$, and in one sample the stepwise procedure places them in precisely that order. Consider two possibilities for the order in the other sample: (a) $X_1, X_3, X_2$, and (b) $X_2, X_1, X_3$. Either produces the same rank correlation by the usual methods. But (a), where the two samples agree on the first regressor, surely represents more similarity than (b), where they disagree on the first regressor and agree instead on the last. Thus we seek a rank correlation coefficient which gives greater weight to items of low than high rank.

Let $R_{ij}$ be the rank of the $i$-th variable in the $j$-th population, for $i = 1, \ldots, m$ and $j = 1, 2$; and let

$$T_k = \#\{\, i : R_{i1} \le k \text{ and } R_{i2} \le k \,\}, \qquad k = 1, \ldots, m.$$

Then $T_k$ is the number of regressors common to the two regressions at the $k$-th step, which is an intuitive measure of similarity at that step. With identical populations, to have $T_k = k$ for $k = 1, \ldots, m$ would be the most likely result; or, finding $T_k = k$ for $k = 1, \ldots, m$ would indicate strongly that the two populations are similar. A considerable deviation from this situation would suggest a high degree of non-similarity.

As a statistic to measure similarity over all steps we propose

$$T = \sum_{k=1}^{m} \frac{T_k}{k},$$

in which each $T_k$ is divided by $k$ in order to reflect the decreasing importance of the steps. The maximum value of $T$ occurs if the two samples yield the same ranking of the $X$'s, in which case $T_k = k$ for all $k$, and $T = m$. The minimum value occurs if (though not only if) the $X$'s are in reverse order; then each $T_k = \max(0, 2k - m)$, and

$$T = t(m) = \sum_{k=1}^{m} \frac{\max(0, 2k - m)}{k}.$$

Note that $t(m)/m$ has the same value for each even integer $m$ as for the following odd integer: $t(2)/2 = t(3)/3 = 1/2$, $t(4)/4 = t(5)/5 = 5/12$, $t(6)/6 = t(7)/7 = 23/60$, $t(8)/8 = t(9)/9 = 307/840$, etc. In addition, $t(m)/m$ is nonincreasing in $m$, and, as $m \to \infty$, $t(m)/m \to 1 - \log_e 2$. Thus we could rescale $T$ to lie in the range $(-1, 1)$, obtaining (if we use the limiting value of $t(m)/m$) a weighted rank correlation coefficient $C = 1 - 2(m - T)/(m \log_e 2)$, but we shall not bother to do this in what follows.

It is also interesting to note that the unweighted measure $S = \sum T_k$, which we would have obtained had we not divided each $T_k$ by $k$, is related to the metric $D = \sum_i |R_{i1} - R_{i2}|$, whose standardized version is known as Spearman's footrule; Salama (1981) proves that $D = m(m+1) - 2S$.
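The definitions of this section can be checked numerically. The following is a minimal sketch, not the authors' code: the function names (`overlap_counts`, `weighted_T`, `t_min`, `footrule_D`) are illustrative choices, and rankings are assumed to be given as lists in which position $i$ holds the rank of variable $X_{i+1}$. Exact fractions are used so that values such as 5/12 are reproduced without rounding.

```python
from fractions import Fraction


def overlap_counts(r1, r2):
    """T_k for k = 1..m: number of items with rank <= k in both rankings."""
    m = len(r1)
    return [sum(1 for a, b in zip(r1, r2) if a <= k and b <= k)
            for k in range(1, m + 1)]


def weighted_T(r1, r2):
    """T = sum over k of T_k / k, returned as an exact fraction."""
    counts = overlap_counts(r1, r2)
    return sum(Fraction(t, k) for k, t in enumerate(counts, start=1))


def t_min(m):
    """t(m) = sum over k of max(0, 2k - m) / k, the value for reversed rankings."""
    return sum(Fraction(max(0, 2 * k - m), k) for k in range(1, m + 1))


def footrule_D(r1, r2):
    """Unstandardized footrule metric D = sum over i of |R_i1 - R_i2|."""
    return sum(abs(a - b) for a, b in zip(r1, r2))


# Example from the introduction: ranks of (X1, X2, X3) in each sample.
first = [1, 2, 3]      # order X1, X2, X3
order_a = [1, 3, 2]    # order X1, X3, X2
order_b = [2, 1, 3]    # order X2, X1, X3
print(weighted_T(first, order_a))  # 5/2
print(weighted_T(first, order_b))  # 2
```

Orders (a) and (b) differ from the first ranking by a single adjacent swap each, yet $T$ is larger for (a), where the samples agree on the first regressor.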

2. A TEST FOR SIMILARITY

Recalling that $R_{ij}$ is the rank of the $i$-th variable in the $j$-th population, define

$$p_{ir}^{(j)} = P\{R_{ij} = r\} \qquad \text{for } j = 1, 2 \text{ and } i, r = 1, \ldots, m.$$

Then we can write the expectation of $T_k$ as

$$E[T_k] = \sum_{i=1}^{m} \Bigl(\sum_{r=1}^{k} p_{ir}^{(1)}\Bigr)\Bigl(\sum_{s=1}^{k} p_{is}^{(2)}\Bigr),$$

so that a general expression for the expectation of the statistic $T$ is

$$E[T] = \sum_{k=1}^{m} \frac{1}{k} \sum_{i=1}^{m} \Bigl(\sum_{r=1}^{k} p_{ir}^{(1)}\Bigr)\Bigl(\sum_{s=1}^{k} p_{is}^{(2)}\Bigr).$$

We say that regression in the $j$-th population has uniform structure if

$$p_{ir}^{(j)} = \frac{1}{m} \qquad \text{for } i, r = 1, \ldots, m.$$

If regression in one population has uniform structure (without loss of generality, let it be the first) then

$$E[T_k] = \frac{k}{m} \sum_{s=1}^{k} \sum_{i=1}^{m} p_{is}^{(2)} = \frac{k^2}{m},$$

in which case

$$E[T] = \sum_{k=1}^{m} \frac{k}{m} = \frac{m+1}{2}.$$

A special case of uniform structure is random structure, for which all rankings of the variables are equally likely, i.e., $P\{R_{1j} = r_1, \ldots, R_{mj} = r_m\} = 1/m!$ for all permutations $(r_1, \ldots, r_m)$ of $(1, \ldots, m)$. The null hypothesis which we shall test using $T$ is that this is true for at least one of the populations.

The alternative we have in mind is that the regressions in the two populations have non-uniform similar structures, by which we mean that $p_{ir}^{(1)} = p_{ir}^{(2)} = p_{ir}$ (say) for $i, r = 1, \ldots, m$, where $p_{ir} \ne 1/m$ for at least one $(i, r)$. Then

$$E[T_k] = \sum_{i=1}^{m} \Bigl(\sum_{r=1}^{k} p_{ir}\Bigr)^{2}.$$

However, since

$$\sum_{i=1}^{m} \sum_{r=1}^{k} p_{ir} = k$$

is fixed, it follows that

$$E[T_k] \ge \frac{k^2}{m}, \qquad k = 1, \ldots, m,$$

with the lower bound holding for all $k$ if and only if the common regression structure is uniform. Hence, under the alternative, $E[T] > (m+1)/2$; i.e., our test statistic has a greater mean under the alternative of similar non-uniform structures for the regressions in the two populations than under the hypothesis of random structure for at least one of them.
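The comparison of means under the two hypotheses can be illustrated with a small computation. This is a sketch under stated assumptions: the function name `expected_T` and the example matrices are hypothetical, each matrix row $i$ holds $P\{R_i = r\}$ for $r = 1, \ldots, m$, and the rankings in the two samples are assumed independent, so that the probability of a variable being among the first $k$ in both samples is a product.

```python
from fractions import Fraction


def expected_T(p1, p2):
    """E[T] = sum_k (1/k) sum_i P{R_i1 <= k} * P{R_i2 <= k}.

    p1[i][r - 1] and p2[i][r - 1] are the probabilities that variable i
    receives rank r in populations 1 and 2 (independence is assumed).
    """
    m = len(p1)
    total = Fraction(0)
    for k in range(1, m + 1):
        e_tk = sum(sum(p1[i][:k]) * sum(p2[i][:k]) for i in range(m))
        total += Fraction(1, k) * e_tk
    return total


m = 3
uniform = [[Fraction(1, m)] * m for _ in range(m)]
# A hypothetical non-uniform structure (rows and columns each sum to 1).
half = Fraction(1, 2)
skewed = [[half, half, 0],
          [half, 0, half],
          [0, half, half]]

print(expected_T(uniform, skewed))  # 2, i.e. (m + 1) / 2
print(expected_T(skewed, skewed))   # 9/4, which exceeds (m + 1) / 2
```

With a uniform structure in either population the mean is $(m+1)/2$ regardless of the other; with a common non-uniform structure it is strictly larger.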

Now consider some higher moments under the null hypothesis. We have

[equations not recoverable from the source]

if $i, i', i''$ all distinct, where $1 \le k < k' \le k'' \le m$; then a little algebra yields

[equation not recoverable from the source]

Thence finally we calculate

$$V[T] = \frac{m+1}{12}$$

and

[equation not recoverable from the source]

The conditional distribution of $T_{k+1}$, given $T_1 = t_1, T_2 = t_2, \ldots, T_k = t_k$, and assuming that the null hypothesis is true, may be derived as follows. Beginning with the initial condition that $P\{T_1 = 0\} = (m-1)/m$ and $P\{T_1 = 1\} = 1/m$, suppose the joint distribution of $T_1, \ldots, T_k$ has been found. Then define four sets of variables:

$A = \{i : R_{i1} \le k \text{ and } R_{i2} \le k\}$, $\quad B = \{i : R_{i1} \le k \text{ but } R_{i2} > k\}$,

$C = \{i : R_{i1} > k \text{ but } R_{i2} \le k\}$, $\quad D = \{i : R_{i1} > k \text{ and } R_{i2} > k\}$.

Note that $A$ contains $t_k$ variables, $B$ and $C$ contain $(k - t_k)$ variables each, and $D$ contains the remaining $(m - 2k + t_k)$ variables. Then, at step $(k+1)$, the variable chosen in Sample 1 must be from $C$ or $D$, and that chosen in Sample 2 must be from $B$ or $D$. We have the following possibilities:

Variable in      Variable in      Probability $\times (m-k)^2$          Resulting value
Sample 1 from:   Sample 2 from:                                         of $T_{k+1}$

C                B                $(k - t_k)^2$                         $t_k + 2$
C                D                $(k - t_k)(m - 2k + t_k)$             $t_k + 1$
D                B                $(k - t_k)(m - 2k + t_k)$             $t_k + 1$
D                D (different     $(m - 2k + t_k)(m - 2k + t_k - 1)$    $t_k$
                 variables)
D                D (same          $(m - 2k + t_k)$                      $t_k + 1$
                 variable)

Summing up,

$$P\{T_{k+1} = t_k + j \mid T_k = t_k\} = \begin{cases} (m - 2k + t_k)(m - 2k + t_k - 1)/(m-k)^2, & j = 0, \\ (m - 2k + t_k)(2k - 2t_k + 1)/(m-k)^2, & j = 1, \\ (k - t_k)^2/(m-k)^2, & j = 2. \end{cases}$$

Thus we have derived recursive relationships by which exact tables can be calculated. It may be seen, by the way, that the distribution of $T_{k+1}$ actually depends only on $t_k$. Thus $\{T_1, T_2, \ldots, T_m\}$ is a Markov chain.
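The recursion lends itself to direct computation. The sketch below is one possible way to tabulate the exact null distribution of $T$ from the transition probabilities just described; it is not taken from the paper, and the function names (`step_probs`, `exact_null_distribution`, `upper_tail`) are illustrative. Because $T$ depends on the whole path and not only on $T_m$, the running sum $\sum_{j \le k} T_j / j$ is carried along with the chain state.

```python
from collections import defaultdict
from fractions import Fraction


def step_probs(m, k, t):
    """P{T_{k+1} = t + j | T_k = t} for j = 0, 1, 2 under the null hypothesis."""
    b = k - t              # number of variables in B (and in C)
    d = m - 2 * k + t      # number of variables in D
    n = (m - k) ** 2       # equally likely (Sample 1, Sample 2) choices
    return {t: Fraction(d * (d - 1), n),
            t + 1: Fraction(d * (2 * b + 1), n),
            t + 2: Fraction(b * b, n)}


def exact_null_distribution(m):
    """Return {value of T: probability} by running the chain T_1, ..., T_m."""
    # State: (t_k, partial sum of T_j / j for j <= k).
    states = {(0, Fraction(0)): Fraction(m - 1, m),
              (1, Fraction(1)): Fraction(1, m)}
    for k in range(1, m):
        nxt = defaultdict(Fraction)
        for (t, s), p in states.items():
            for t_new, q in step_probs(m, k, t).items():
                if p and q:
                    nxt[(t_new, s + Fraction(t_new, k + 1))] += p * q
        states = nxt
    dist = defaultdict(Fraction)
    for (_, s), p in states.items():
        if p:
            dist[s] += p
    return dict(dist)


def upper_tail(m, t_obs):
    """Exact null probability P{T >= t_obs}."""
    return sum(p for v, p in exact_null_distribution(m).items() if v >= t_obs)


dist3 = exact_null_distribution(3)
for value in sorted(dist3):
    print(value, dist3[value])
```

For $m = 3$ this reproduces what direct enumeration of the six permutations gives: $T$ takes the values 3/2, 2, 5/2 and 3 with probabilities 1/2, 1/6, 1/6 and 1/6.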

Table I presents exact critical values of $T$ for Type I error probability $\alpha = .10, .05, .025, .01$, and $.005$ and number of variables $m = 5(1)15$. The entire exact null-hypothesis distributions of $T$ for $m = 3(1)10$, and selected values from the distributions for $m = 11(1)15$, are available in Salama and Quade (1981). When outside the range of the table, one might use

$$Z = \frac{T - (m+1)/2}{\sqrt{(m+1)/12}}$$

as a standard normal variable. This would ignore the positive skewness in the distribution of $T$, however, and thus generally produce anti-conservative P-values (too small). Instead, our recommendation is to treat a linear function of $T$ as a chi-squared. Specifically, to have $(a + bT)$ approximately distributed as $\chi^2(d)$, equate the first three central moments:

$$a + b\,\frac{m+1}{2} = d, \qquad b^2\,\frac{m+1}{12} = 2d, \qquad b^3\,\frac{m+1}{24} = 8d,$$

where for convenience the exact third moment of $(a + bT)$ has been replaced by a slightly lower value. Solving these equations yields $a = -4(m+1)/3$, $b = 8$, and $d = 8(m+1)/3$, whence

$$8T - \frac{4(m+1)}{3} \;\sim\; \chi^2\!\left(\frac{8(m+1)}{3}\right) \quad \text{approximately.}$$

This approximation fits two moments exactly and the third fairly well.
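Both approximations are easy to evaluate. The following sketch uses only the Python standard library; the helper `chi2_sf` is an assumption of this example (a plain power-series evaluation of the regularized incomplete gamma function, adequate for the moderate arguments that arise here) and not a method from the paper. The function names `normal_pvalue` and `chi2_pvalue` are likewise illustrative.

```python
import math


def normal_pvalue(T, m):
    """Upper-tail P-value treating (T - (m+1)/2) / sqrt((m+1)/12) as N(0, 1)."""
    z = (T - (m + 1) / 2) / math.sqrt((m + 1) / 12)
    return 0.5 * math.erfc(z / math.sqrt(2))


def chi2_sf(x, df):
    """P{W >= x} for W ~ chi-squared(df), via the series for the lower
    regularized incomplete gamma function (df need not be an integer)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return 1.0 - lower


def chi2_pvalue(T, m):
    """Upper-tail P-value treating 8T - 4(m+1)/3 as chi-squared(8(m+1)/3)."""
    return chi2_sf(8 * T - 4 * (m + 1) / 3, 8 * (m + 1) / 3)


print(round(normal_pvalue(9, 14), 4))
print(round(chi2_pvalue(9, 14), 4))
```

For large arguments a library routine for the chi-squared survival function would be the more robust choice; the series is used here only to keep the sketch free of dependencies.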


TABLE I

CRITICAL VALUES OF THE TEST STATISTIC T

Number of Variables m        Type I Error Probability, α

[table entries not recoverable from the source]

We illustrate these approximations at $m = 14$. For any observed $T$, the normal approximation to the P-value is $P\{Z \ge (T - 7.5)/\sqrt{1.25}\}$ where $Z \sim N(0,1)$, and the chi-squared approximation is $P\{W \ge 8T - 20\}$ where $W \sim \chi^2(40)$. Then we have, obtaining the exact P-values from Salama and Quade (1981):

T      Exact P-value    Normal Approximation    Chi-squared Approximation

9      .10132           .08986                  .09682

For this number of variables the normal approximation is strongly anticonservative, but the chi-squared approximation appears adequate for practical purposes.

3. EXTENSIONS AND DISCUSSION

Although we have proposed the statistic $T$ as an index for comparing the rankings of the regressors in two populations, it is equally well suited for comparing some prespecified ranking with the ranking observed in a single population. Without loss in generality, relabel the variables so that the prespecified ranking becomes $1, \ldots, m$. Let $R_i$ be the observed rank of the $i$-th variable, and $p_{ir} = P\{R_i = r\}$ for $r = 1, \ldots, m$. Then a little algebra will establish that

$$E[T] = \sum_{k=1}^{m} \frac{1}{k} \sum_{i=1}^{k} \sum_{r=1}^{k} p_{ir},$$

with $E[T] = (m+1)/2$ if the population has uniform regression structure. Thus one may test the hypothesis of random structure, under which $T$ obviously has the distribution previously derived, against the alternative that $E[T] > (m+1)/2$.

The statistic $T$ can also be used to test for similarity of regression structure among $n > 2$ populations. Let $T^{(ij)}$ be the statistic calculated from the observed results for populations $i$ and $j$, where $i, j = 1, \ldots, n$. Then an intuitively reasonable test statistic is

$$\bar{T} = \binom{n}{2}^{-1} \sum_{i<j} T^{(ij)},$$

which can be viewed as a U-statistic of degree 2. Suppose the $n$ populations all have similar regression structures. Then, if we

[equation not recoverable from the source]

we obtain

[equation not recoverable from the source]

Under the null hypothesis of random structure in all populations, we have

[equation not recoverable from the source]

whence

$$E[\bar{T}] = \frac{m+1}{2}, \qquad V[\bar{T}] = \frac{m+1}{6n(n-1)}.$$

Since $\xi_1 = 0$, the asymptotic distribution of $\bar{T}$ is not normal; however, we can obtain a chi-squared approximation as we did for $T$. Let

$$\theta = E\bigl[(T^{(12)} - \mu)^3\bigr], \qquad \omega = E\bigl[(T^{(12)} - \mu)(T^{(23)} - \mu)(T^{(31)} - \mu)\bigr];$$

then $B(\bar{T} - \mu) + D$ is approximately distributed as chi-squared with $D$ degrees of freedom, where

[equation not recoverable from the source]

Some algebra establishes that $\omega = m(m+1)/48(m-1)$ exactly; it is convenient to replace this by $(m+1)/48$. Then, replacing the exact value of $\theta$ by $(m+1)/24$ as before, we obtain $B = 4n$ and $D = 4n(m+1)/3(n-1)$, so that

$$4n\left(\bar{T} - \frac{m+1}{2}\right) + \frac{4n(m+1)}{3(n-1)} \;\sim\; \chi^2\!\left(\frac{4n(m+1)}{3(n-1)}\right) \quad \text{approximately.}$$

Note that this agrees with the previous result for $n = 2$.
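As an illustration of the $n$-population case, the sketch below computes the average of the pairwise statistics and the quantity that is referred to the chi-squared distribution. The function names (`weighted_T`, `mean_pairwise_T`, `chi2_statistic`) and the input layout (one list of ranks per population) are assumptions of this example rather than anything specified in the paper.

```python
from fractions import Fraction
from itertools import combinations


def weighted_T(r1, r2):
    """T = sum_k T_k / k, with T_k the number of items ranked <= k in both."""
    m = len(r1)
    return sum(
        Fraction(sum(1 for a, b in zip(r1, r2) if a <= k and b <= k), k)
        for k in range(1, m + 1)
    )


def mean_pairwise_T(rankings):
    """Average of T over all pairs of populations."""
    pairs = list(combinations(rankings, 2))
    return sum(weighted_T(a, b) for a, b in pairs) / len(pairs)


def chi2_statistic(rankings):
    """Return (statistic, degrees of freedom) using B = 4n and
    D = 4n(m+1) / (3(n-1)), with mu = (m+1)/2."""
    n, m = len(rankings), len(rankings[0])
    df = Fraction(4 * n * (m + 1), 3 * (n - 1))
    stat = 4 * n * (mean_pairwise_T(rankings) - Fraction(m + 1, 2)) + df
    return stat, df


# Hypothetical rankings of three regressors in three populations.
example = [[1, 2, 3], [1, 3, 2], [2, 1, 3]]
print(mean_pairwise_T(example))
print(chi2_statistic(example))
```

With $n = 2$ the statistic reduces to $8T - 4(m+1)/3$ with $8(m+1)/3$ degrees of freedom, matching the two-population approximation.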

The comparison of multiple regressions is, of course, not the only context in which a weighted correlation statistic might be useful; one can imagine other situations in which the closeness of the relationship among the items ranked earlier is more important than among those ranked later. Nor is $T$ the only reasonable statistic. For instance, instead of modifying Spearman's footrule, consider modifying Spearman's rho from

[equation not recoverable from the source]

to a weighted version $R_s^*$. It is easily shown that $R_s^*$ ranges from $-1$ when the two rankings are opposite to $1$ when they are identical, with expectation $0$ if either population has random structure. It is not clear, however, how to choose among $T$, $R_s^*$, and the innumerable other possibilities.

ACKNOWLEDGEMENT

The work of the first author was sponsored by grant HD07102 from the National Institute of Child Health and Human Development.

Salama, I.A. (1981). A note on Spearman's footrule (unpublished manuscript).

Salama, I.A., & Quade, D. (1981). A nonparametric comparison of the structure of two multiple-regression prediction situations. Institute of Statistics Mimeo Series 1325, University of North Carolina at Chapel Hill.

Received May, 1981; Revised September, 1981.

Recommended by H. E. [name illegible], University of Victoria, Victoria, B.C., Canada.
