
Lecture Slides

Elementary Statistics, Tenth Edition

and the Triola Statistics Series

by Mario F. Triola

Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley.

Chapter 13 Nonparametric Statistics

13-1 Overview

13-2 Sign Test

13-3 Wilcoxon Signed-Ranks Test for Matched Pairs

13-4 Wilcoxon Rank-Sum Test for Two Independent Samples

13-5 Kruskal-Wallis Test

13-6 Rank Correlation

13-7 Runs Test for Randomness

Section 13-1 Overview

Created by Erin Hodgess, Houston, Texas
Revised to accompany 10th Edition, Jim Zimmer, Chattanooga State, Chattanooga, TN

Overview

Definitions

• Parametric tests have requirements about the nature or shape of the populations involved.

• Nonparametric tests do not require that samples come from populations with normal distributions or have any other particular distributions. Consequently, nonparametric tests are called distribution-free tests.

Advantages of Nonparametric Methods

1. Nonparametric methods can be applied to a wide variety of situations because they do not have the more rigid requirements of the corresponding parametric methods. In particular, nonparametric methods do not require normally distributed populations.

2. Unlike parametric methods, nonparametric methods can often be applied to categorical data, such as the genders of survey respondents.

3. Nonparametric methods usually involve simpler computations than the corresponding parametric methods and are therefore easier to understand and apply.

Disadvantages of Nonparametric Methods

1. Nonparametric methods tend to waste information because exact numerical data are often reduced to a qualitative form.

2. Nonparametric tests are not as efficient as parametric tests, so with a nonparametric test we generally need stronger evidence (such as a larger sample or greater differences) before we reject a null hypothesis.

Efficiency of Nonparametric Methods


Definitions

Data are sorted when they are arranged according to some criterion, such as smallest to largest or best to worst.

A rank is a number assigned to an individual sample item according to its order in the sorted list. The first item is assigned a rank of 1, the second is assigned a rank of 2, and so on.

Handling Ties in Ranks

Find the mean of the ranks involved and assign this mean rank to each of the tied items.

Sorted Data   Preliminary Ranking   Rank
4             1                     1
5             2                     3
5             3                     3
5             4                     3
10            5                     5
11            6                     6
12            7                     7.5
12            8                     7.5

The three 5s occupy preliminary ranks 2, 3, and 4, whose mean is 3. The two 12s occupy preliminary ranks 7 and 8, whose mean is 7.5.
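The tie-handling rule can be sketched in a few lines of Python. This is only an illustration: the function name `mean_ranks` is hypothetical and does not come from the slides. It assigns preliminary ranks in sorted order and then gives every run of tied values the mean of the ranks they occupy.

```python
def mean_ranks(values):
    """Return ranks (1 = smallest); tied values share the mean of their ranks."""
    # Indices of the values in sorted order; position in this list + 1
    # is the preliminary rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Preliminary ranks start+1 .. end+1; every tied item gets their mean.
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


sorted_data = [4, 5, 5, 5, 10, 11, 12, 12]
print(mean_ranks(sorted_data))  # [1.0, 3.0, 3.0, 3.0, 5.0, 6.0, 7.5, 7.5]
```

The input does not need to be sorted beforehand; ranks are returned in the original order of the data.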

Section 13-2 Sign Test

Key Concept

The main objective of this section is to understand the sign test procedure, which involves converting data values to plus and minus signs, then testing for disproportionately more of either sign.

Definition

Sign Test

The sign test is a nonparametric (distribution-free) test that uses plus and minus signs to test different claims, including:

1) Claims involving matched pairs of sample data;

2) Claims involving nominal data;

3) Claims about the median of a single population.

Basic Concept of the Sign Test

The basic idea underlying the sign test is to analyze the frequencies of the plus and minus signs to determine whether they are significantly different.

Figure 13-1 Sign Test Procedure


Requirements

1. The sample data have been randomly selected.

2. There is no requirement that the sample data come from a population with a particular distribution, such as a normal distribution.

Notation for Sign Test

x = the number of times the less frequent sign occurs

n = the total number of positive and negative signs combined

Test Statistic

For n ≤ 25: x (the number of times the less frequent sign occurs)

For n > 25:

$$z = \frac{(x + 0.5) - \dfrac{n}{2}}{\dfrac{\sqrt{n}}{2}}$$

Critical values

For n ≤ 25, critical x values are in Table A-7.

For n > 25, critical z values are in Table A-2.
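As a minimal sketch of the rule above, the following Python function returns x itself for n ≤ 25 and the z approximation for n > 25. The function name `sign_test_statistic` is an illustrative choice, not something defined in the slides; the two calls use the counts from the examples in this section.

```python
import math


def sign_test_statistic(x, n):
    """x = count of the less frequent sign, n = total number of + and - signs.

    Returns ("x", x) for n <= 25 and ("z", z) for n > 25.
    """
    if n <= 25:
        return ("x", x)
    # Normal approximation with the 0.5 continuity correction from the slide.
    z = ((x + 0.5) - n / 2) / (math.sqrt(n) / 2)
    return ("z", z)


print(sign_test_statistic(4, 11))       # ('x', 4)
kind, z = sign_test_statistic(30, 325)
print(kind, round(z, 2))                # z -14.64
```

The critical value still has to be looked up (Table A-7 for x, Table A-2 for z); the function only produces the test statistic.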

Claims Involving Matched Pairs

When using the sign test with data that are matched pairs, we convert the raw data to plus and minus signs as follows:

1. Subtract each value of the second variable from the corresponding value of the first variable.

2. Record only the sign of the difference found in step 1.

Exclude ties: that is, any matched pairs in which both values are equal.

Key Concept Underlying This Use of the Sign Test

If the two sets of data have equal medians, the number of positive signs should be approximately equal to the number of negative signs.

Example: Yields of Corn from Different Seeds

Use the data in Table 13-3 with a 0.05 significance level to test the claim that there is no difference between the yields from the regular and kiln-dried seed.

H0: The median of the differences is equal to 0.

H1: The median of the differences is not equal to 0.

α = 0.05

x = minimum(7, 4) = 4 (From Table 13-3, there are 7 negative signs and 4 positive signs.)

Critical value = 1 (From Table A-7 where n = 11 and α = 0.05)

The null hypothesis is rejected only if x is less than or equal to the critical value. With a test statistic of x = 4 and a critical value of 1, we fail to reject the null hypothesis of no difference.

There is not sufficient evidence to warrant rejection of the claim that the median of the differences is equal to 0.
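Table A-7 is not reproduced in these slides. As a hedged cross-check, the sketch below assumes the critical x values come from the binomial distribution with p = 0.5, which is what the null hypothesis of equally likely signs implies. The function names are illustrative. For n = 11 and α = 0.05 in two tails this reasoning gives a critical value of 1, matching the value quoted in the example.

```python
from math import comb


def sign_test_p_value(x, n):
    """Two-tailed P-value: 2 * P(X <= x) for X ~ Binomial(n, 0.5), capped at 1."""
    tail = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def critical_x(n, alpha=0.05):
    """Largest x whose two-tailed P-value is at most alpha, or None if none exists."""
    best = None
    for x in range(n // 2 + 1):
        if sign_test_p_value(x, n) <= alpha:
            best = x
    return best


print(critical_x(11))                        # 1
print(round(sign_test_p_value(4, 11), 3))    # 0.549
```

With x = 4 the two-tailed P-value is well above 0.05, which agrees with the decision to fail to reject the null hypothesis.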

Claims Involving Nominal Data

The nature of nominal data limits the calculations that are possible, but we can identify the proportion of the sample data that belong to a particular category. Then we can test claims about the corresponding population proportion p.

Example: Gender Selection

Of the 325 babies born to parents using the XSORT method of gender selection, 295 were girls. Use the sign test and a 0.05 significance level to test the claim that this method of gender selection has no effect.

The procedures are for cases in which n > 25. Note that the only requirement is that the sample data are randomly selected.

H0: p = 0.5 (the proportion of girls is 0.5)

H1: p ≠ 0.5

Denoting girls by the positive sign (+) and boys by the negative sign (–), we have 295 positive signs and 30 negative signs.

Test statistic x = minimum(295, 30) = 30

The test involves two tails.

$$z = \frac{(x + 0.5) - \dfrac{n}{2}}{\dfrac{\sqrt{n}}{2}} = \frac{(30 + 0.5) - \dfrac{325}{2}}{\dfrac{\sqrt{325}}{2}} = -14.64$$

With α = 0.05 in a two-tailed test, the critical values are z = ±1.96.

The test statistic z = –14.64 is less than –1.96.

We reject the null hypothesis that p = 0.5.

There is sufficient evidence to warrant rejection of the claim that the method of gender selection has no effect.

Figure 13-2

Claims About the Median of a Single Population

The negative and positive signs are based on the claimed value of the median.

Example: Body Temperature

Use the temperatures for 12:00 A.M. on Day 2 in Data Set 2 in Appendix B. Use the sign test to test the claim that the median is less than 98.6°F.

There are 68 subjects with temperatures below 98.6°F, 23 subjects with temperatures above 98.6°F, and 15 subjects with temperatures equal to 98.6°F.

H0: Median is equal to 98.6°F.

H1: Median is less than 98.6°F.

Since the claim is that the median is less than 98.6°F, the test involves only the left tail.

Discard the 15 zeros.

Use (–) to denote the 68 temperatures below 98.6°F, and use (+) to denote the 23 temperatures above 98.6°F.

So n = 91 and x = 23.

$$z = \frac{(x + 0.5) - \dfrac{n}{2}}{\dfrac{\sqrt{n}}{2}} = \frac{(23 + 0.5) - \dfrac{91}{2}}{\dfrac{\sqrt{91}}{2}} = -4.61$$

We use Table A-2 to get the critical z value of –1.645, the left-tail value for α = 0.05.

The test statistic of z = –4.61 falls into the critical region.

We reject the null hypothesis.

We support the claim that the median body temperature of healthy adults is less than 98.6°F.

Figure 13-3
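The arithmetic of this example can be retraced in Python starting from the three counts. This is a sketch rather than the textbook's procedure: it uses `statistics.NormalDist` from the standard library in place of Table A-2, and it assumes a 0.05 significance level, which is the level that corresponds to the critical value –1.645.

```python
import math
from statistics import NormalDist

below, above, equal = 68, 23, 15   # counts quoted in the example
n = below + above                  # the 15 ties are discarded
x = min(below, above)              # count of the less frequent sign

# Normal approximation, valid here because n > 25.
z = ((x + 0.5) - n / 2) / (math.sqrt(n) / 2)

alpha = 0.05                                  # assumed significance level
critical = NormalDist().inv_cdf(alpha)        # left-tail critical value
p_value = NormalDist().cdf(z)                 # left-tail area

print(n, x, round(z, 2), round(critical, 3))  # 91 23 -4.61 -1.645
print("reject H0" if z <= critical else "fail to reject H0")
```

Using the smaller count as x suits this left-tailed test only because the sample points in the direction of the claim: the plus signs, which would be rare if the median were below 98.6°F, are in fact the less frequent sign.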

Recap

In this section we have discussed:

Sign tests, where data are assigned plus or minus signs and then tested to see if the numbers of plus and minus signs are equal.

Sign tests can be performed on claims involving:

• Matched pairs

• Nominal data

• The median of a single population

Section 13-3 Wilcoxon Signed-Ranks Test for Matched Pairs

Key Concept

The Wilcoxon signed-ranks test uses ranks of sample data consisting of matched pairs.

This test is used with a null hypothesis that the population of differences from the matched pairs has a median equal to zero.

Definition

The Wilcoxon signed-ranks test is a nonparametric test that uses ranks of sample data consisting of matched pairs.

It is used to test the null hypothesis that the population of differences has a median of zero.

H0: The matched pairs have differences that come from a population with a median equal to zero.

H1: The matched pairs have differences that come from a population with a nonzero median.

Wilcoxon Signed-Ranks Test Requirements

1. The data consist of matched pairs that have been randomly selected.

2. The population of differences (found from the pairs of data) has a distribution that is approximately symmetric, meaning that the left half of its histogram is roughly a mirror image of its right half. (There is no requirement that the data have a normal distribution.)

Notation

T = the smaller of the following two sums:

1. The sum of the absolute values of the negative ranks of the nonzero differences d

2. The sum of the positive ranks of the nonzero differences d

Test Statistic for the Wilcoxon Signed-Ranks Test for Matched Pairs

For n ≤ 30, the test statistic is T.

For n > 30, the test statistic is

$$z = \frac{T - \dfrac{n(n + 1)}{4}}{\sqrt{\dfrac{n(n + 1)(2n + 1)}{24}}}$$

Critical Values for the Wilcoxon Signed-Ranks Test for Matched Pairs

For n ≤ 30, the critical T value is found in Table A-8.

For n > 30, the critical z values are found in Table A-2.

Procedure for Finding the Value of the Test Statistic

Step 1: For each pair of data, find the difference d by subtracting the second value from the first. Keep the signs, but discard any pairs for which d = 0.

Step 2: Ignore the signs of the differences, then sort the differences from lowest to highest and replace the differences by the corresponding rank value. When differences have the same numerical value, assign to them the mean of the ranks involved in the tie.

Step 3: Attach to each rank the sign of the difference from which it came. That is, insert those signs that were ignored in Step 2.

Step 4: Find the sum of the absolute values of the negative ranks. Also find the sum of the positive ranks.

Step 5: Let T be the smaller of the two sums found in Step 4. Either sum could be used, but for a simplified procedure we arbitrarily select the smaller of the two sums.

Step 6: Let n be the number of pairs of data for which the difference d is not 0.

Step 7: Determine the test statistic and critical values based on the sample size, as shown above.

Step 8: When forming the conclusion, reject the null hypothesis if the sample data lead to a test statistic that is in the critical region, that is, the test statistic is less than or equal to the critical value(s). Otherwise, fail to reject the null hypothesis.
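Steps 1 through 7 can be sketched in Python as follows. The function name and the sample values are hypothetical and serve only to illustrate the procedure; they are not data from Table 13-4. Critical values still come from Table A-8 or Table A-2.

```python
import math


def signed_rank_statistic(first, second):
    """Return (T, n, z); z is None when n <= 30, where T itself is the statistic."""
    # Step 1: differences, keeping signs and discarding zeros.
    d = [a - b for a, b in zip(first, second) if a != b]
    n = len(d)                                    # Step 6
    # Step 2: rank |d| from lowest to highest, mean rank for ties.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2     # mean of ranks i+1 .. j+1
        i = j + 1
    # Steps 3-4: rank sums split by the sign of the difference.
    neg = sum(r for r, v in zip(ranks, d) if v < 0)
    pos = sum(r for r, v in zip(ranks, d) if v > 0)
    T = min(neg, pos)                             # Step 5
    z = None
    if n > 30:                                    # Step 7: normal approximation
        z = (T - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return T, n, z


# Hypothetical matched pairs (one pair ties and is discarded).
first = [10, 12, 9, 15, 11, 8]
second = [12, 12, 11, 11, 14, 13]
print(signed_rank_statistic(first, second))       # (4.0, 5, None)
```

In the hypothetical data the differences are –2, –2, 4, –3, –5 after the tie is dropped, so the two differences of magnitude 2 share the rank 1.5 and the only positive rank is 4.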

Example: Does the Type of Seed Affect Corn Growth?

Use the data in Table 13-4 with the Wilcoxon signed-ranks test and a 0.05 significance level to test the claim that there is no difference between the yields from the regular and kiln-dried seed.

H0: There is no difference between the yields from the regular and kiln-dried seed.

H1: There is a difference between the yields from the regular and kiln-dried seed.

• The differences in row three of the table are found by computing yield from regular seed – yield from kiln-dried seed.

• The ranks of differences in row four of the table are found by ranking the absolute differences, handling ties by assigning the mean of the ranks.

• The signed ranks in row five of the table are found by attaching the sign of the differences to the ranks.

Calculate the Test Statistic

Step 1: In Table 13-4, the row of differences is obtained by computing this difference for each pair of data:

d = yield from regular seed – yield from kiln-dried seed

Step 2: Ignoring their signs, we rank the absolute differences from lowest to highest.

Step 3: The bottom row of Table 13-4 is created by attaching to each rank the sign of the corresponding difference. If there really is no difference between the yields from the two types of seed (as in the null hypothesis), we expect the sum of the positive ranks to be approximately equal to the sum of the absolute values of the negative ranks.

Step 4: We now find the sum of the absolute values of the negative ranks, and we also find the sum of the positive ranks.

Sum of absolute values of negative ranks: 51 (from 10 + 9 + 8 + 6 + 5 + 11 + 2)

Sum of positive ranks: 15 (from 1 + 3 + 4 + 7)

Step 5: Letting T be the smaller of the two sums found in Step 4, we find that T = 15.

Step 6: Letting n be the number of pairs of data for which the difference d is not 0, we have n = 11.

Step 7: Because n = 11, we have n ≤ 30, so we use a test statistic of T = 15. From Table A-8, the critical T = 11 (using n = 11 and α = 0.05 in two tails).

Step 8: The test statistic T = 15 is not less than or equal to the critical value of 11, so we fail to reject the null hypothesis.

It appears that there is no difference between yields from regular seed and kiln-dried seed.

Recap

In this section we have discussed:

The Wilcoxon signed-ranks test, which uses matched pairs.

The hypothesis is that the matched pairs have differences that come from a population with a median equal to zero.

Section 13-4 Wilcoxon Rank-Sum Test for Two Independent Samples

Key Concept

The Wilcoxon signed-ranks test (Section 13-3) involves matched pairs of data.

The Wilcoxon rank-sum test of this section involves two independent samples that are not related or somehow matched or paired.

Definition

The Wilcoxon rank-sum test is a nonparametric test that uses ranks of sample data from two independent populations. It is used to test the null hypothesis that the two independent samples come from populations with equal medians.

H0: The two samples come from populations with equal medians.

H1: The two samples come from populations with different medians.

Basic Concept

If two samples are drawn from identical populations and the individual values are all ranked as one combined collection of values, then the high and low ranks should fall evenly between the two samples.

Requirements

1. There are two independent samples of randomly selected data.

2. Each of the two samples has more than 10 values.

3. There is no requirement that the two populations have a normal distribution or any other particular distribution.

Notation for the Wilcoxon Rank-Sum Test

n1 = size of Sample 1

n2 = size of Sample 2

R1 = sum of ranks for Sample 1

R2 = sum of ranks for Sample 2

R = same as R1 (sum of ranks for Sample 1)

μR = mean of the sample R values that is expected when the two populations have equal medians

σR = standard deviation of the sample R values that is expected with two populations having equal medians

Test Statistic for the Wilcoxon Rank-Sum Test

$$z = \frac{R - \mu_R}{\sigma_R}$$

where

$$\mu_R = \frac{n_1 (n_1 + n_2 + 1)}{2}, \qquad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

n1 = size of the sample from which the rank sum R is found

n2 = size of the other sample

R = sum of ranks of the sample with size n1

Critical Values for the Wilcoxon Rank-Sum Test

Critical values can be found in Table A-2 (because the test statistic is based on the normal distribution).

Procedure for Finding the Value of the Test Statistic

1. Temporarily combine the two samples into one big sample, then replace each sample value with its rank.

2. Find the sum of the ranks for either one of the two samples.

3. Calculate the value of the z test statistic as shown above, where either sample can be used as ‘Sample 1’.
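The three steps can be sketched in Python as below. The function name `rank_sum_z` and the two samples are hypothetical; the samples were chosen only so that each has more than 10 values and the ranks are easy to check by hand.

```python
import math


def rank_sum_z(sample1, sample2):
    """Return (R, z), where R is the rank sum of sample1 in the combined ranking."""
    # Step 1: combine the samples and rank them, using mean ranks for ties.
    combined = sorted(list(sample1) + list(sample2))
    rank_of = {}
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1] == combined[i]:
            j += 1
        rank_of[combined[i]] = (i + j + 2) / 2    # mean of ranks i+1 .. j+1
        i = j + 1
    # Step 2: rank sum for the sample treated as Sample 1.
    n1, n2 = len(sample1), len(sample2)
    R = sum(rank_of[v] for v in sample1)
    # Step 3: z test statistic.
    mu_R = n1 * (n1 + n2 + 1) / 2
    sigma_R = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return R, (R - mu_R) / sigma_R


# Hypothetical samples: odd and even integers, 11 values each.
sample1 = list(range(1, 23, 2))   # 1, 3, ..., 21
sample2 = list(range(2, 24, 2))   # 2, 4, ..., 22
R, z = rank_sum_z(sample1, sample2)
print(R, round(z, 2))
```

Because the combined values are simply 1 through 22, each value equals its rank, so R is the sum of the odd numbers from 1 to 21.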

Example: BMI of Men and Women

The data in Table 13-5 are from Data Set 1 in Appendix B and use only the first 13 sample values for men and the first 12 sample values for women.

The numbers in parentheses are their ranks, beginning with a rank of 1 assigned to the lowest value of 17.7.

R1 and R2 at the bottom denote the sum of ranks.

Use the data in Table 13-5 with the Wilcoxon rank-sum test and a 0.05 significance level to test the claim that the median BMI of men is equal to the median BMI of women.

The requirements of having two independent and random samples and each having more than 10 values are met.

H0: Men and women have BMI values with equal medians

H1: Men and women have BMI values with medians that are not equal

Procedures

1. Rank all 25 BMI measurements combined. This is done in Table 13-5.

2. Find the sum of the ranks of either one of the samples. For men the sum of ranks is

R = 11.5 + 9 + 14 + … + 15.5 = 187

3. Calculate the value of the z test statistic.

$$\mu_R = \frac{n_1 (n_1 + n_2 + 1)}{2} = \frac{13(13 + 12 + 1)}{2} = 169$$

$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(13)(12)(13 + 12 + 1)}{12}} = 18.385$$

$$z = \frac{R - \mu_R}{\sigma_R} = \frac{187 - 169}{18.385} = 0.98$$

A large positive value of z would indicate that the higher ranks are found disproportionately in Sample 1, and a large negative value of z would indicate that Sample 1 had a disproportionate share of lower ranks.

We have a two-tailed test (with α = 0.05), so the critical values are 1.96 and –1.96.

The test statistic of 0.98 does not fall within the critical region, so we fail to reject the null hypothesis that men and women have BMI values with equal medians.

It appears that BMI values of men and women are basically the same.
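The numbers in this example can be checked with a few lines of Python. The sample sizes and the rank sum R = 187 are taken from the example; the two-tailed P-value is an addition computed with `statistics.NormalDist` from the standard library, and it does not appear on the slide.

```python
import math
from statistics import NormalDist

n1, n2, R = 13, 12, 187            # men, women, rank sum for men
mu_R = n1 * (n1 + n2 + 1) / 2
sigma_R = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (R - mu_R) / sigma_R

print(mu_R, round(sigma_R, 3), round(z, 2))   # 169.0 18.385 0.98

# Two-tailed decision at alpha = 0.05 (critical values -1.96 and 1.96).
print("reject H0" if abs(z) >= 1.96 else "fail to reject H0")

# Two-tailed P-value from the standard normal distribution (not on the slide).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(p_value, 3))
```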

The preceding example used only 13 of the 40

Example: BMI of Men and Women

The preceding example used only 13 of the 40 sample BMI values for men listed in Data Set 1 in Appendix B, and it used only 12 of the 40 BMI values for women. Do the results change if we use all 40 sample values for both men and women?

The null and alternative hypotheses are the same.

In the Minitab display below, ETA1 and ETA2 denote the medians of the first and second samples, respectively.

Example: BMI of Men and Women

The rank sum for men is W = 1727.5

The P-value is 0.3032 (or 0.3031 after adjustment for ties).

Minitab


Because the P-value is greater than α = 0.05, we fail to reject the null hypothesis.

Example: BMI of Men and Women

There is not sufficient evidence to warrant rejection of the claim that men and women have BMI values with equal medians.

Minitab


Recap

In this section we have discussed:

The Wilcoxon Rank-Sum Test for Two Independent Samples.

It is used to test the null hypothesis that the two independent samples come from populations with equal medians.

Section 13-5 Kruskal-Wallis Test

Key Concept

This section introduces the Kruskal-Wallis test, which uses ranks of data from three or more independent samples to test the null hypothesis that the samples come from populations with equal medians.

Kruskal-Wallis Test

Definition

• The Kruskal-Wallis test (also called the H test) is a nonparametric test that uses ranks of sample data from three or more independent populations.

• It is used to test the null hypothesis that the independent samples come from populations with equal medians.

H0: The samples come from populations with equal medians.

H1: The samples come from populations with medians that are not all equal.

Kruskal-Wallis Test

• We compute the test statistic H, which has a distribution that can be approximated by the chi-square (χ²) distribution as long as each sample has at least 5 observations.

• When we use the chi-square distribution in this context, the number of degrees of freedom is k – 1, where k is the number of samples.

Kruskal-Wallis Test

Requirements

1. We have at least three independent samples, all of which are randomly selected.

2. Each sample has at least 5 observations.

3. There is no requirement that the populations have a normal distribution or any other particular distribution.

Kruskal-Wallis Test

Notation

• N = total number of observations in all samples combined

• k = number of samples

• R1 = sum of ranks for Sample 1

• n1 = number of observations in Sample 1

• For Sample 2, the sum of ranks is R2 and the number of observations is n2, and similar notation is used for the other samples.

Kruskal-Wallis Test

Test Statistic

$$H = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1)$$

Critical Values

1. Test is right-tailed.

2. df = k – 1 (Because the test statistic H can be approximated by the χ² distribution, use Table A-4.)

Procedure for Finding the Value of the Test Statistic H

1. Temporarily combine all samples into one big sample and assign a rank to each sample value.

2. For each sample, find the sum of the ranks and find the sample size.

3. Calculate H by using the results of Step 2 and the notation and test statistic given on the preceding slide.
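The three steps can be sketched in Python as follows. This is a minimal illustration, not the textbook's software: the helper names `average_ranks` and `kruskal_wallis_h` and the three small samples are hypothetical, tied values are assumed to share the mean of the ranks they occupy, and no correction of H for ties is applied.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(samples):
    """Compute H from a list of samples (each a list of numbers)."""
    # Step 1: pool all samples and rank the pooled values.
    pooled = [v for s in samples for v in s]
    ranks = average_ranks(pooled)
    N = len(pooled)
    # Step 2: rank sum R_i and size n_i for each sample.
    total = 0.0
    start = 0
    for s in samples:
        n_i = len(s)
        R_i = sum(ranks[start:start + n_i])
        total += R_i ** 2 / n_i
        start += n_i
    # Step 3: apply the test statistic formula.
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)

# Hypothetical data: three clearly separated samples give a large H.
H = kruskal_wallis_h([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]])
print(round(H, 3))
```

With these made-up samples the rank sums are 15, 40, and 65, so the ranks are as unevenly distributed as possible and H is large.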

Procedure for Finding the Value of the Test Statistic H

The test statistic H is basically a measure of the variance of the rank sums R1, R2, … , Rk.

If the ranks are distributed evenly among the sample groups, then H should be a relatively small number.

If the samples are very different, then the ranks will be excessively low in some groups and high in others, with the net effect that H will be large.

Example: Effects of Treatments on Poplar Tree Weights

Table 13-6 lists weights of poplar trees given different treatments. (Numbers in parentheses are ranks.)

Example: Effects of Treatments on Poplar Tree Weights

Use the data in Table 13-6 with the Kruskal-Wallis test to test the claim that the four samples come from populations with equal medians.

Are requirements met?

There are three or more independent and random samples.

Each sample size is 5. (Requirement is at least 5.)

H0: The populations of poplar tree weights from the four treatments have equal medians.

H1: The four population medians are not all equal.

Example: Effects of Treatments on Poplar Tree Weights

Use the data in Table 13-6 with the Kruskal-Wallis test to test the claim that the four samples come from populations with equal medians.

The following statistics come from Table 13-6:

n1 = 5, n2 = 5, n3 = 5, n4 = 5

N = 20

R1 = 45, R2 = 37.5, R3 = 42.5, R4 = 85

Example: Effects of Treatments on Poplar Tree Weights

Use the data in Table 13-6 with the Kruskal-Wallis test to test the claim that the four samples come from populations with equal medians.

Evaluate the test statistic.

$$H = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1)$$

$$H = \frac{12}{20(20+1)}\left(\frac{45^2}{5} + \frac{37.5^2}{5} + \frac{42.5^2}{5} + \frac{85^2}{5}\right) - 3(20+1) = 8.214$$
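As a check on this evaluation, the following Python sketch computes H directly from the rank sums and sample sizes given in the example. The function name `h_from_rank_sums` is an illustrative assumption.

```python
def h_from_rank_sums(rank_sums, sizes):
    """Kruskal-Wallis H computed from per-sample rank sums and sample sizes."""
    N = sum(sizes)
    total = sum(R ** 2 / n for R, n in zip(rank_sums, sizes))
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)

rank_sums = [45, 37.5, 42.5, 85]  # R1..R4 from the example
sizes = [5, 5, 5, 5]              # n1..n4 from the example

# Sanity check: the ranks 1..N must add up to N(N + 1)/2.
N = sum(sizes)
assert sum(rank_sums) == N * (N + 1) / 2

print(round(h_from_rank_sums(rank_sums, sizes), 3))  # 8.214
```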

Example: Effects of Treatments on Poplar Tree Weights

Use the data in Table 13-6 with the Kruskal-Wallis test to test the claim that the four samples come from populations with equal medians.

Find the critical value.

Because each sample has at least five observations, the distribution of H is approximately a chi-square distribution.

df = k – 1 = 4 – 1 = 3, α = 0.05

From Table A-4 the critical value = 7.815.

Example: Effects of Treatments on Poplar Tree Weights

Use the data in Table 13-6 with the Kruskal-Wallis test to test the claim that the four samples come from populations with equal medians.

The test statistic 8.214 is in the critical region, so we reject the null hypothesis of equal medians.

At least one of the medians appears to be different from the others.

Recap

In this section we have discussed:

The Kruskal-Wallis Test is the nonparametric equivalent of ANOVA.

It tests the hypothesis that three or more populations have equal medians.

The populations do not have to be normally distributed.

Section 13-6 Rank Correlation

Key Concept

This section describes the nonparametric method of rank correlation, which uses paired data to test for an association between two variables.

In Chapter 10 we used paired sample data to compute values for the linear correlation coefficient r, but in this section we use ranks as the basis for computing the rank correlation coefficient rs.

Rank Correlation

Definition

The rank correlation test (or Spearman’s rank correlation test) is a non-parametric test that uses ranks of sample data consisting of matched pairs.

It is used to test for an association between two variables, so the null and alternative hypotheses are as follows (where ρs denotes the rank correlation coefficient for the entire population):

H0: ρs = 0 (There is no correlation between the two variables.)

H1: ρs ≠ 0 (There is a correlation between the two variables.)

Advantages

Rank correlation has these advantages over the parametric methods discussed in Chapter 10:

1. The nonparametric method of rank correlation can be used in a wider variety of circumstances than the parametric method of linear correlation. With rank correlation, we can analyze paired data that are ranks or can be converted to ranks.

2. Rank correlation can be used to detect some (not all) relationships that are not linear.

Disadvantages

A disadvantage of rank correlation is its efficiency rating of 0.91, as described in Section 13-1.

This efficiency rating shows that with all other circumstances being equal, the nonparametric approach of rank correlation requires 100 pairs of sample data to achieve the same results as only 91 pairs of sample observations analyzed through parametric methods, assuming that the stricter requirements of the parametric approach are met.

Figure 13-4 Rank Correlation for Testing H0: ρs = 0

Requirements

1. The sample paired data have been randomly selected.

2. Unlike the parametric methods of Section 10-2, there is no requirement that the sample pairs of data have a bivariate normal distribution. There is no requirement of a normal distribution for any population.

Notation

rs = rank correlation coefficient for sample paired data (rs is a sample statistic)

ρs = rank correlation coefficient for all the population data (ρs is a population parameter)

n = number of pairs of data

d = difference between ranks for the two values within a pair

Rank Correlation

Test Statistic

No ties: After converting the data in each sample to ranks, if there are no ties among ranks for either variable, the exact value of the test statistic can be calculated using this formula:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Ties: After converting the data in each sample to ranks, if either variable has ties among its ranks, the exact value of the test statistic rs can be found by using Formula 10-1 with the ranks:

$$r_s = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$$

Rank Correlation

Critical values:

If n ≤ 30, critical values are found in Table A-9.

If n > 30, use Formula 13-1.

Formula 13-1

$$r_s = \frac{\pm z}{\sqrt{n - 1}}$$

where the value of z corresponds to the significance level. (For example, if α = 0.05, z = 1.96.)

Example: Rankings of Colleges

Use the data in Table 13-7 to determine if there is a correlation between the student rankings and the rankings of the magazine.

Example: Rankings of Colleges

Use the data in Table 13-7 to determine if there is a correlation between the student rankings and the rankings of the magazine.

H0: ρs = 0    H1: ρs ≠ 0

Since neither variable has ties in the ranks:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(24)}{8(8^2 - 1)} = 1 - \frac{144}{504} = 0.714$$
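The no-ties formula is simple enough to verify in a few lines of Python. The function name `spearman_no_ties` is an illustrative assumption; the inputs Σd² = 24 and n = 8 are the values used in this example.

```python
def spearman_no_ties(sum_d_squared, n):
    """Rank correlation r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only without ties."""
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

r_s = spearman_no_ties(sum_d_squared=24, n=8)
print(round(r_s, 3))  # 0.714
```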

Example: Rankings of Colleges

Use the data in Table 13-7 to determine if there is a correlation between the student rankings and the rankings of the magazine.

H0: ρs = 0    H1: ρs ≠ 0

From Table A-9 the critical values are ±0.738.

Because the test statistic of rs = 0.714 does not exceed the critical value, we fail to reject the null hypothesis.

There is not sufficient evidence to support a claim of a correlation between the rankings of the students and the magazine.

Example: Rankings of Colleges
Large Sample Case

Assume that the preceding example is expanded by including a total of 40 colleges and that the test statistic rs is found to be 0.300. If the significance level is α = 0.05, what do you conclude about the correlation?

Since n = 40 exceeds 30, we find the critical value from Formula 13-1:

$$r_s = \frac{\pm z}{\sqrt{n - 1}} = \frac{\pm 1.96}{\sqrt{40 - 1}} = \pm 0.314$$
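A short Python sketch of Formula 13-1 follows, using the values from this example (z = 1.96, n = 40, rs = 0.300). The function name `rank_corr_critical_value` is an illustrative assumption.

```python
import math

def rank_corr_critical_value(z, n):
    """Large-sample (n > 30) critical value for r_s from Formula 13-1: z / sqrt(n - 1)."""
    return z / math.sqrt(n - 1)

critical = rank_corr_critical_value(z=1.96, n=40)
print(round(critical, 3))  # 0.314

# Two-tailed decision: reject H0 only if |r_s| exceeds the critical value.
r_s = 0.300
print(abs(r_s) > critical)  # False, so we fail to reject H0
```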

Example: Rankings of Colleges
Large Sample Case

Assume that the preceding example is expanded by including a total of 40 colleges and that the test statistic rs is found to be 0.300. If the significance level is α = 0.05, what do you conclude about the correlation?

The test statistic of rs = 0.300 does not exceed the critical value of 0.314, so we fail to reject the null hypothesis.

There is not sufficient evidence to support the claim of a correlation between students and the magazine.

Example: Detecting a Nonlinear Pattern

The data in Table 13-8 are the numbers of games played and the last scores (in millions) of a Raiders of the Lost Ark pinball game.

We expect that there should be an association between the number of games played and the pinball score.

H0: ρs = 0    H1: ρs ≠ 0

Example: Detecting a Nonlinear Pattern

There are no ties among ranks of either list.

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(6)}{9(9^2 - 1)} = 1 - \frac{36}{720} = 0.950$$

Example: Detecting a Nonlinear Pattern

Since n = 9 is less than 30, use Table A-9.

Critical values are ±0.700.

The sample statistic 0.950 exceeds 0.700, so we conclude that there is significant evidence to reject the null hypothesis of no correlation.

There appears to be correlation between the number of games played and the score.

Example: Detecting a Nonlinear Pattern

If the preceding example is done using the methods of Chapter 10, the linear correlation coefficient is r = 0.586.

This leads to the conclusion that there is not enough evidence to support the claim of a significant linear correlation, whereas the rank correlation test found that there was enough evidence.

The Excel scatter diagram shows that there is a nonlinear relationship that the parametric method would not have detected.
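The contrast between the two coefficients can be illustrated with a small Python sketch. The data below are hypothetical (they are not the Table 13-8 pinball values), and the helper names `pearson_r` and `ranks_no_ties` are assumptions made for this illustration. The sketch applies the linear correlation formula once to the raw values and once to their ranks.

```python
import math

def pearson_r(x, y):
    """Linear correlation coefficient computed from paired lists x and y."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )

def ranks_no_ties(values):
    """Ranks 1..n for a list with no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

# Hypothetical paired data: y always increases with x, but not linearly.
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 4, 8, 16, 32]

r = pearson_r(x, y)                                   # linear correlation on raw values
r_s = pearson_r(ranks_no_ties(x), ranks_no_ties(y))   # same formula applied to ranks

print(round(r, 3), round(r_s, 3))
```

Because the ranks of x and y agree exactly, rs equals 1, while r falls short of 1 since the points do not lie on a straight line.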

Recap

In this section we have discussed:

Rank correlation, which is the non-parametric equivalent of testing for correlation described in Chapter 10.

It uses ranks of matched pairs to test for association.

Sometimes rank correlation can detect nonlinear correlation that the parametric test will not recognize.

Section 13-7 Runs Test for Randomness

Key Concept

This section introduces the runs test for randomness, which can be used to determine whether the sample data in a sequence are in a random order.

This test is based on sample data that have two characteristics, and it analyzes runs of those characteristics to determine whether the runs appear to result from some random process, or whether the runs suggest that the order of the data is not random.

Runs Test for Randomness

Definitions

A run is a sequence of data having the same characteristic; the sequence is preceded and followed by data with a different characteristic or by no data at all.

The runs test uses the number of runs in a sequence of sample data to test for randomness in the order of the data.

Fundamental Principles of the Runs Test

Reject randomness if the number of runs is very low or very high.

Example: The sequence of genders FFFFFMMMMM is not random because it has only 2 runs, so the number of runs is very low.

Example: The sequence of genders FMFMFMFMFM is not random because there are 10 runs, which is very high.

It is important to note that the runs test for randomness is based on the order in which the data occur; it is not based on the frequency of the data.
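Counting runs is easy to express in code. The following Python sketch (the function name `count_runs` is an illustrative assumption) counts the runs in the two gender sequences above. Both sequences contain five F's and five M's, so only the order differs.

```python
def count_runs(sequence):
    """Count maximal blocks of consecutive identical items in a sequence."""
    runs = 0
    previous = object()  # sentinel that never equals a real data value
    for item in sequence:
        if item != previous:
            runs += 1
            previous = item
    return runs

print(count_runs("FFFFFMMMMM"))  # 2
print(count_runs("FMFMFMFMFM"))  # 10
```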

Figure 13-5 Procedure for Runs Test for Randomness

Requirements

1. The sample data are arranged according to some ordering scheme, such as the order in which the sample values were obtained.

2. Each data value can be categorized into one of two separate categories (such as male/female).

Notation

n1 = number of elements in the sequence that have one particular characteristic (The characteristic chosen for n1 is arbitrary.)

n2 = number of elements in the sequence that have the other characteristic

G = number of runs

Runs Test for Randomness

For Small Samples (n1 ≤ 20 and n2 ≤ 20) and α = 0.05:

Test Statistic

Test statistic is the number of runs G.

Critical Values

Critical values are found in Table A-10.

Runs Test for Randomness

For Small Samples (n1 ≤ 20 and n2 ≤ 20) and α = 0.05:

Decision criteria

Reject randomness if the number of runs G is:

• less than or equal to the smaller critical value found in Table A-10,

• or greater than or equal to the larger critical value found in Table A-10.

Runs Test for Randomness

For Large Samples (n1 > 20 or n2 > 20) or α ≠ 0.05:

Test Statistic

$$z = \frac{G - \mu_G}{\sigma_G}$$

where

$$\mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$

and

$$\sigma_G = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$$

Runs Test for Randomness

For Large Samples (n1 > 20 or n2 > 20) or α ≠ 0.05:

Critical Values

Critical values of z: Use Table A-2.

Example: Small Sample
Genders of Bears

Listed below are the genders of the first 10 bears from Data Set 6 in Appendix B. Use a 0.05 significance level to test for randomness in the sequence of genders.

M M M M F F M M F F

Separate the runs as shown below.

M M M M (1st run)   F F (2nd run)   M M (3rd run)   F F (4th run)

Example: Small Sample
Genders of Bears

M M M M (1st run)   F F (2nd run)   M M (3rd run)   F F (4th run)

n1 = total number of males = 6

n2 = total number of females = 4

G = number of runs = 4

Because n1 ≤ 20 and n2 ≤ 20 and α = 0.05, the test statistic is G = 4.

Example: Small Sample
Genders of Bears

M M M M (1st run)   F F (2nd run)   M M (3rd run)   F F (4th run)

From Table A-10, the critical values are 2 and 9.

Because G = 4 is not less than or equal to 2, nor is it greater than or equal to 9, we do not reject randomness.

It appears the sequence of genders is random.

Example: Large Sample
Boston Rainfall on Mondays

Refer to the rainfall amounts for Boston as listed in Data Set 10 in Appendix B. Is there sufficient evidence to support the claim that rain on Mondays is not random?

D D D D R D R D D R D D R D D D R D D R R R D D D D R D R D R R R D R D D D R D D D R D R D D R D D D R

H0: The sequence is random.

H1: The sequence is not random.

n1 = number of Ds = 33

n2 = number of Rs = 19

G = number of runs = 30

Example: Large Sample
Boston Rainfall on Mondays

Since n1 > 20, we must calculate z using the formulas:

$$\mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1 = \frac{2(33)(19)}{33 + 19} + 1 = 25.115$$

$$\sigma_G = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}} = \sqrt{\frac{(2)(33)(19)[2(33)(19) - 33 - 19]}{(33 + 19)^2 (33 + 19 - 1)}} = 3.306$$

$$z = \frac{G - \mu_G}{\sigma_G} = \frac{30 - 25.115}{3.306} = 1.48$$
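The whole large-sample calculation can be reproduced from the Monday sequence itself. The Python sketch below is a minimal illustration: the function names `count_runs` and `runs_test_z` are assumptions, and the sequence is the one listed in this example (D = dry, R = rain).

```python
import math

def count_runs(sequence):
    """Count maximal blocks of consecutive identical items."""
    runs = 0
    previous = object()  # sentinel that never equals a real data value
    for item in sequence:
        if item != previous:
            runs += 1
            previous = item
    return runs

def runs_test_z(sequence):
    """Return (n1, n2, G, mu_G, sigma_G, z) for a two-category sequence."""
    categories = sorted(set(sequence))
    if len(categories) != 2:
        raise ValueError("the runs test needs exactly two categories")
    n1 = sequence.count(categories[0])
    n2 = sequence.count(categories[1])
    G = count_runs(sequence)
    mu_G = 2 * n1 * n2 / (n1 + n2) + 1
    sigma_G = math.sqrt(
        (2 * n1 * n2) * (2 * n1 * n2 - n1 - n2)
        / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    )
    z = (G - mu_G) / sigma_G
    return n1, n2, G, mu_G, sigma_G, z

mondays = (
    "D D D D R D R D D R D D R D D D R D D R R R D D D D "
    "R D R D R R R D R D D D R D D D R D R D D R D D D R"
).split()

n1, n2, G, mu_G, sigma_G, z = runs_test_z(mondays)
print(n1, n2, G)                                      # 33 19 30
print(round(mu_G, 3), round(sigma_G, 3), round(z, 2)) # 25.115 3.306 1.48
```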

Example: Large Sample
Boston Rainfall on Mondays

The critical values are z = –1.96 and 1.96.

The test statistic of z = 1.48 does not fall within the critical region, so we fail to reject the null hypothesis of randomness.

The given sequence does appear to be random.

Recap

In this section we have discussed:

The runs test for randomness, which can be used to determine whether the sample data in a sequence are in a random order.

We reject randomness if the number of runs is very low or very high.

