Nonparametric: K sample

張育慈 2015/03

Post on 21-Dec-2015


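Kruskal-Wallis test

Let $R_{ij}$ denote the rank of observation $X_{ij}$ (the $j$th observation under treatment $i$) when all $N$ observations are ranked together. The Kruskal-Wallis test statistic is

$$KW = \frac{12}{N(N+1)} \sum_{i=1}^{k} n_i \left( \bar{R}_i - \frac{N+1}{2} \right)^2$$

where $N$ is the total number of observations, and $n_i$ and $\bar{R}_i$ are the sample size and the average rank for treatment $i$. Under the null hypothesis, the Kruskal-Wallis statistic approximately follows a chi-square distribution with $k-1$ degrees of freedom.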

Treatment 1        …    Treatment i
Data    Ranks      …    Data    Ranks
219     .          …    223     .
255     .          …    186     2
269     .          …    164     1

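With $R_{ij}$ the rank of the $j$th observation in treatment $i$ and $\bar{R}_i$ the average rank for treatment $i$, the treatment and total sums of squares of the ranks are

$$SSTR = \sum_{i=1}^{k} n_i \left( \bar{R}_i - \frac{N+1}{2} \right)^2$$

$$SSTO = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( R_{ij} - \frac{N+1}{2} \right)^2 = \sum_{r=1}^{N} \left( r - \frac{N+1}{2} \right)^2 = \frac{N(N+1)(N-1)}{12}$$

The overall mean rank is $(N+1)/2$, and for untied ranks $SSTO/(N-1) = N(N+1)/12$, so the Kruskal-Wallis statistic equals $SSTR / [SSTO/(N-1)]$.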

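For tied data, the Kruskal-Wallis statistic adjusted for ties is

$$KW_{ties} = \frac{KW}{1 - \sum_{i=1}^{g} (t_i^3 - t_i) / (N^3 - N)}$$

where $KW$ is the unadjusted Kruskal-Wallis statistic. Assume the tied data are arranged into $g$ groups of like observations, and let $t_i$ denote the number of observations in the $i$th group, $i = 1, 2, \ldots, g$. Tied observations each receive the average of the ranks they jointly occupy.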

Treatment 1        …    Treatment i
Data    Ranks      …    Data    Ranks
219     5          …    223     .
253     .          …    164     2
255     .          …    164     2
269     .          …    164     2
250     .          …    204     4
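As a rough illustration of the tie adjustment, the Python sketch below computes the correction factor $1 - \sum (t_i^3 - t_i)/(N^3 - N)$. The function name `tie_correction` is made up for this example, and the ten values are only the entries visible in the table above, treated hypothetically as if they were the whole sample.

```python
from collections import Counter


def tie_correction(values):
    """Return 1 - sum(t^3 - t) / (N^3 - N) for a pooled sample."""
    n = len(values)
    tie_sizes = Counter(values).values()  # t_i for every distinct value
    return 1.0 - sum(t**3 - t for t in tie_sizes) / (n**3 - n)


# Assumption: only the values visible in the table, pooled together.
pooled = [219, 253, 255, 269, 250, 223, 164, 164, 164, 204]
factor = tie_correction(pooled)
print(f"tie correction factor: {factor:.4f}")
```

Dividing the unadjusted statistic by this factor gives the tie-adjusted statistic; untied values contribute $t^3 - t = 0$, so only the three copies of 164 matter here.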

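• Multiple comparisons use pairwise tests to determine which treatments differ from the others while controlling the experiment-wise error rate, the probability of at least one false rejection among all the comparisons.

1. Bonferroni adjustment: each of the $k(k-1)/2$ pairwise comparisons is tested at level $\alpha' = \alpha / [k(k-1)/2]$.

2. Fisher's (protected) least significant difference (LSD)

3. Tukey's HSD procedure, with separate forms for equal sample sizes and for unequal sample sizes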
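A minimal Python sketch of the Bonferroni adjustment, assuming all $k(k-1)/2 = 6$ pairwise comparisons among $k = 4$ treatments are tested. The raw p-values are the ones in the deck's PROC MULTTEST listing; the variable names are illustrative.

```python
# Raw pairwise p-values as printed in the PROC MULTTEST listing.
raw = {
    "1 vs 2": 0.2407,
    "1 vs 3": 0.2407,
    "1 vs 4": 0.0051,
    "2 vs 3": 1.0000,
    "2 vs 4": 0.0673,
    "3 vs 4": 0.0673,
}

m = len(raw)  # number of comparisons, k(k-1)/2 = 6
# Bonferroni: multiply each raw p-value by m and cap at 1.
adjusted = {pair: min(1.0, p * m) for pair, p in raw.items()}

alpha = 0.05
significant = [pair for pair, p in adjusted.items() if p < alpha]

for pair, p in adjusted.items():
    print(f"{pair}: raw={raw[pair]:.4f}  bonferroni={p:.4f}")
print("significant at 0.05:", significant)
```

Because the raw values are rounded to four decimals, the adjusted values can differ from the listing in the last digit.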

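Example

An agronomist gave scores from 0 to 5 to denote insect damage to wheat plants that were treated with four insecticides. The data are given in the following table. Use the Kruskal-Wallis test and one-way ANOVA to test whether there is a difference among the treatments.

T1    T2    T3    T4
0     2     1     3
2     0     3     4
1     3     4     2
3     1     2     5
1     3     2     3
1     4     1     4

One-way ANOVA

H0: the mean scores for the four insecticides are equal. H1: the mean scores for the four insecticides differ.

P-value = 0.0384

At $\alpha = 0.05$ we reject H0 and conclude that the mean scores for the four insecticides differ.

Multiple comparisons: the mean score for insecticide 1 differs from that for insecticide 4.

Kruskal-Wallis test

H0: the median scores for the four insecticides are equal. H1: the median scores for the four insecticides differ.

P-value = 0.0485

At $\alpha = 0.05$ we reject H0 and conclude that the median scores for the four insecticides differ.

Multiple comparisons: the median score for insecticide 1 differs from that for insecticide 4.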
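The Kruskal-Wallis computation for this example can be sketched in Python as follows. This is an illustrative hand calculation, not the SAS procedure itself: the scores are taken from the deck's SAS data step, and the helper names `midranks` and `kruskal_wallis` are invented for the sketch.

```python
from collections import Counter


def midranks(values):
    """Map each distinct value to the average of the ranks it occupies."""
    counts = Counter(values)
    ranks, start = {}, 1
    for v in sorted(counts):
        t = counts[v]
        ranks[v] = start + (t - 1) / 2  # mean of start .. start + t - 1
        start += t
    return ranks, counts


def kruskal_wallis(groups):
    """Return (unadjusted KW, tie-adjusted KW) for a list of samples."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks, counts = midranks(pooled)
    mean_rank = (n_total + 1) / 2
    sstr = sum(
        len(g) * (sum(ranks[x] for x in g) / len(g) - mean_rank) ** 2
        for g in groups
    )
    kw = 12 / (n_total * (n_total + 1)) * sstr
    ties = 1 - sum(t**3 - t for t in counts.values()) / (n_total**3 - n_total)
    return kw, kw / ties


scores = [
    [0, 2, 1, 3, 1, 1],  # T1
    [2, 0, 3, 1, 3, 4],  # T2
    [1, 3, 4, 2, 2, 1],  # T3
    [3, 4, 2, 5, 3, 4],  # T4
]
kw, kw_ties = kruskal_wallis(scores)
print(f"KW = {kw:.4f}, tie-adjusted KW = {kw_ties:.4f}, df = {len(scores) - 1}")
```

The tie-adjusted value should agree with the chi-square of 7.6301 on 3 degrees of freedom in the PROC NPAR1WAY output.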

Block: a group of homogeneous experimental units.

• Blocking "removes" the effect of nuisance factors.

• Comparisons among treatments become more precise.

• The treatments are randomly assigned to experimental units within blocks.

Friedman's test is a nonparametric test based on ranking the observations within each block; it amounts to a randomized complete block analysis applied to the within-block ranks.

$$FM = \frac{12b}{k(k+1)} \sum_{i=1}^{k} \left( \bar{R}_i - \frac{k+1}{2} \right)^2$$

where $\bar{R}_i$ denotes the average rank for treatment $i$, and $k$ and $b$ are the numbers of treatments and blocks. Under the null hypothesis, FM approximately follows a chi-square distribution with $k-1$ degrees of freedom.

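Ranks are assigned within each block; $R_{ij}$ is the rank of treatment $i$ in block $j$.

                  Blocks
Treatments        1           2           …    b           Row totals
1                 R_{11}      R_{12}      …    R_{1b}      R_{1.}
2                 R_{21}      R_{22}      …    R_{2b}      R_{2.}
.                 .           .           …    .           .
.                 .           .           …    .           .
k                 R_{k1}      R_{k2}      …    R_{kb}      R_{k.}
Column totals     k(k+1)/2    k(k+1)/2    …    k(k+1)/2    bk(k+1)/2

$$SSTR = b \sum_{i=1}^{k} \left( \bar{R}_i - \frac{k+1}{2} \right)^2, \qquad \bar{R}_i = \frac{R_{i.}}{b}$$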

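For tied data

$$FM_{ties} = \frac{12b \sum_{i=1}^{k} \left( \bar{R}_i - \frac{k+1}{2} \right)^2}{k(k+1) - \frac{1}{b(k-1)} \sum_{j=1}^{b} \sum_{i=1}^{g_j} (t_{ij}^3 - t_{ij})}$$

Let $t_{ij}$ denote the number of tied observations in the $i$th group within the $j$th block, and let $g_j$ denote the number of groups of tied observations within the $j$th block. Here $\bar{R}_i$ is the average rank for treatment $i$, and $k$ and $b$ are the numbers of treatments and blocks.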

Entries are observations with their within-block ranks in parentheses.

              Blocks
Treatments    1           2        b
1
2             60(2)
.             150(3.5)    80(2)    80(3)
k             90(4)

Example

Different types of farm machinery have different effects on the compaction of soil and thus may affect yields differently. The table shows yield data from a randomized complete block design in which four different types of tractors were used in tilling the soil. Within-location ranks are given in parentheses.

tractor   LOCATION1   LOCATION2   LOCATION3   LOCATION4   LOCATION5   LOCATION6
1         120(1)      208(4)      199(4)      194(4)      177(4)      195(4)
2         207(4)      188(3)      181(3)      164(2)      155(1)      175(2)
3         122(2)      137(2)      177(2)      177(3)      160(3)      138(1)
4         128(3)      128(1)      160(1)      142(1)      157(2)      179(3)

One-way ANOVA (RBD)

H0: the mean yields for the four tractors are equal. H1: the mean yields for the four tractors differ.

P-value = 0.058 (tractor effect, F = 3.12 in the ANOVA table)

At $\alpha = 0.1$ we reject H0.

Friedman's test

H0: the median yields for the four tractors are equal. H1: the median yields for the four tractors differ.

P-value = 0.46 (row mean scores statistic 4.6377 in the CMH listing)

At $\alpha = 0.1$ we do not reject H0.
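A small Python sketch of the Friedman statistic for the tractor yields, ranking within each location (block). It assumes no ties within a block, which holds for these data; the function names are illustrative and not part of the deck's SAS code.

```python
def rank_within_block(block):
    """Ranks 1..k for one block; assumes no ties within the block."""
    order = sorted(block)
    return [order.index(y) + 1 for y in block]


def friedman(yields_by_treatment):
    """Return (average rank per treatment, FM) for a complete block layout."""
    k = len(yields_by_treatment)          # treatments
    b = len(yields_by_treatment[0])       # blocks
    blocks = list(zip(*yields_by_treatment))  # one tuple of k yields per block
    ranks = [rank_within_block(blk) for blk in blocks]
    mean_ranks = [sum(r[i] for r in ranks) / b for i in range(k)]
    fm = 12 * b / (k * (k + 1)) * sum(
        (m - (k + 1) / 2) ** 2 for m in mean_ranks
    )
    return mean_ranks, fm


# Rows are tractors 1-4, columns are locations 1-6.
yields = [
    [120, 208, 199, 194, 177, 195],
    [207, 188, 181, 164, 155, 175],
    [122, 137, 177, 177, 160, 138],
    [128, 128, 160, 142, 157, 179],
]
mean_ranks, fm = friedman(yields)
print("average ranks:", [round(m, 4) for m in mean_ranks])
print(f"FM = {fm:.4f} on {len(yields) - 1} degrees of freedom")
```

The average ranks reproduce the parenthesized ranks in the table. The statistic computed here is on k - 1 = 3 degrees of freedom, whereas the CMH listing in the deck reports a statistic on 5 degrees of freedom, so the two values are not directly comparable.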

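Nonparametric vs parametric

Kruskal-Wallis test vs one-way ANOVA

• The Kruskal-Wallis statistic approximately follows a chi-square distribution with $k-1$ degrees of freedom.

• $F = MSTR / MSE$

Friedman's test vs one-way ANOVA (RBD)

• The Friedman statistic approximately follows a chi-square distribution with $k-1$ degrees of freedom.

• $F = MSTR / MSE$

Points of comparison:

• Distribution

• Central measure

• Outliers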

Example (with an outlier)

The insecticide example is repeated with the fourth T4 score changed from 5 to 50. Use the Kruskal-Wallis test and one-way ANOVA to test whether there is a difference among the treatments.

T1    T2    T3    T4
0     2     1     3
2     0     3     4
1     3     4     2
3     1     2     50
1     3     2     3
1     4     1     4

One-way ANOVA

H0: the mean scores for the four insecticides are equal. H1: the mean scores for the four insecticides differ.

P-value = 0.285

At $\alpha = 0.05$ we do not reject H0; there is not enough evidence that the mean scores for the four insecticides differ.

Kruskal-Wallis test

H0: the median scores for the four insecticides are equal. H1: the median scores for the four insecticides differ.

P-value = 0.0485

At $\alpha = 0.05$ we reject H0 and conclude that the median scores for the four insecticides differ.
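The contrast between the two analyses can be checked with a short Python sketch. It assumes the scores from the deck's SAS data step, with the single value 5 replaced by 50 for the outlier version; `one_way_f` and `pooled_midranks` are names chosen for this illustration.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic, MSTR / MSE."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n_total
    sstr = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (sstr / (k - 1)) / (sse / (n_total - k))


def pooled_midranks(groups):
    """Midranks of every observation in the pooled sample, kept by group."""
    pooled = sorted(x for g in groups for x in g)

    def rank(x):
        first = pooled.index(x) + 1          # lowest rank held by x
        return first + (pooled.count(x) - 1) / 2

    return [[rank(x) for x in g] for g in groups]


original = [
    [0, 2, 1, 3, 1, 1],
    [2, 0, 3, 1, 3, 4],
    [1, 3, 4, 2, 2, 1],
    [3, 4, 2, 5, 3, 4],
]
# Same data with the largest T4 score replaced by an outlier.
outlier = [row[:] for row in original]
outlier[3][3] = 50

f_original = one_way_f(original)
f_outlier = one_way_f(outlier)
same_ranks = pooled_midranks(original) == pooled_midranks(outlier)

print(f"F without outlier: {f_original:.2f}")
print(f"F with outlier:    {f_outlier:.2f}")
print("ranks unchanged:", same_ranks)
```

The outlier inflates the error mean square and shrinks F, while the ranks, and therefore the Kruskal-Wallis statistic, stay the same because 5 and 50 are both the single largest observation.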

END

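The ranks $1, 2, \ldots, N$ sum to $N(N+1)/2$.

The expected rank for any observation is the average rank $(N+1)/2$.

For the $i$th sample, which contains $n_i$ observations, the expected sum of ranks is $n_i(N+1)/2$.

Let $R_i$ be the actual sum of ranks assigned to the elements in the $i$th sample.

The sum of squares of these deviations is

$$S = \sum_{i=1}^{k} \left( R_i - \frac{n_i(N+1)}{2} \right)^2 = \sum_{i=1}^{k} n_i^2 \left( \bar{R}_i - \frac{N+1}{2} \right)^2$$

where $\bar{R}_i = R_i / n_i$ is the average rank for the $i$th sample.

Hence

$$E(\bar{R}_i) = \frac{N+1}{2}, \qquad Var(\bar{R}_i) = \frac{N^2-1}{12\,n_i} \cdot \frac{N-n_i}{N-1} = \frac{(N+1)(N-n_i)}{12\,n_i}$$

The CLT allows us to approximate the distribution of

$$Z_i = \frac{\bar{R}_i - (N+1)/2}{\sqrt{Var(\bar{R}_i)}}$$

by a standard normal, so $Z_i^2$ is distributed approximately as chi-square with one degree of freedom.

Kruskal (1952) showed that under $H_0$, if no $n_i$ is very small, the random variable

$$KW = \sum_{i=1}^{k} \frac{N-n_i}{N} Z_i^2 = \frac{12}{N(N+1)} \sum_{i=1}^{k} n_i \left( \bar{R}_i - \frac{N+1}{2} \right)^2$$

is distributed approximately as chi-square with $k-1$ degrees of freedom.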
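The mean and variance of the average rank can be checked by brute force for a small case. The sketch below enumerates every possible sample of ranks; the values N = 5 and n = 2 are arbitrary choices for illustration.

```python
from itertools import combinations
from statistics import mean, pvariance

N, n = 5, 2  # assumed toy sizes: 5 ranks in total, a sample of 2

# Under H0 every subset of n ranks is equally likely for the sample.
sample_means = [mean(c) for c in combinations(range(1, N + 1), n)]

expected = mean(sample_means)
variance = pvariance(sample_means)

print("E(mean rank)   =", expected, " formula:", (N + 1) / 2)
print("Var(mean rank) =", variance, " formula:", (N + 1) * (N - n) / (12 * n))
```

Both enumerated values should match the closed-form expressions for the expectation and variance of the average rank.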

Kruskal-Wallis Test
Chi-Square                     7.6301
DF                             3
Asymptotic Pr > Chi-Square     0.0543
Exact Pr >= Chi-Square         0.0458

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3    14.45833333       4.81944444     3.38       0.0384
Error              20    28.50000000       1.42500000
Corrected Total    23    42.95833333

p-Values
Variable    Contrast    Raw       Bonferroni    Permutation
scores      1 vs 2      0.2407    1.0000        0.6727
scores      1 vs 3      0.2407    1.0000        0.6727
scores      1 vs 4      0.0051    0.0307        0.0294
scores      2 vs 3      1.0000    1.0000        1.0000
scores      2 vs 4      0.0673    0.4039        0.2750
scores      3 vs 4      0.0673    0.4039        0.2750

Means with the same letter are not significantly different.
t Grouping    Mean      N    treatment
  A           3.5000    6    4
B A           2.1667    6    3
B A           2.1667    6    2
B             1.3333    6    1

Source      DF    ANOVA SS       Mean Square    F Value    Pr > F
tractor      3    5408.333333    1802.777778    3.12       0.0575
location     5    2816.833333    563.366667     0.98       0.4640

Cochran-Mantel-Haenszel Statistics (Based on Rank Scores)
Statistic    Alternative Hypothesis     DF    Value     Prob
1            Nonzero Correlation         1    0.2650    0.6067
2            Row Mean Scores Differ      5    4.6377    0.4617

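/* Insect-damage scores: each pair is (treatment, score); @@ reads several pairs per line */
data;
  input treatment scores @@;
  cards;
1 0 2 2 3 1 4 3
1 2 2 0 3 3 4 4
1 1 2 3 3 4 4 2
1 3 2 1 3 2 4 5
1 1 2 3 3 2 4 3
1 1 2 4 3 1 4 4
;

/* Kruskal-Wallis test (Wilcoxon scores) with an exact p-value */
proc npar1way wilcoxon;
  class treatment;
  exact wilcoxon;
  var scores;
run;

/* One-way ANOVA */
proc glm;
  class treatment;
  model scores = treatment;
run;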

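/* Tractor yields: 4 tractors by 6 locations, read row by row */
data yield;
  do tractor = 1 to 4;
    do location = 1 to 6;
      input y @@;
      output;
    end;
  end;
  cards;
120 208 199 194 177 195
207 188 181 164 155 175
122 137 177 177 160 138
128 128 160 142 157 179
;

/* Randomized complete block ANOVA */
proc anova;
  class tractor location;
  model y = tractor location;
run;

/* CMH statistics on rank scores. The first variable in the TABLES request is the
   stratum, so this request stratifies by tractor and its row-mean-scores statistic
   compares locations (5 df). Listing the block first, location*tractor*y, gives the
   Friedman-type comparison of tractors (3 df). */
proc freq;
  tables tractor*location*y / cmh2 scores=rank noprint;
run;

/* Normality checks for y within each tractor */
proc univariate normal;
  var y;
  by tractor;
run;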

/* One-way ANOVA with Bonferroni, LSD and Tukey multiple comparisons */
proc glm;
  class treatment;
  model scores = treatment;
  means treatment / bon lsd tukey;
run;

/* Pairwise contrasts with raw, Bonferroni and permutation-adjusted p-values */
proc multtest perm bon pvals;
  class treatment;
  contrast '1 vs 2' -1  1  0  0;
  contrast '1 vs 3' -1  0  1  0;
  contrast '1 vs 4' -1  0  0  1;
  contrast '2 vs 3'  0 -1  1  0;
  contrast '2 vs 4'  0 -1  0  1;
  contrast '3 vs 4'  0  0 -1  1;
  test mean(scores);
run;