NONPARAMETRIC STATISTICS

SWN SCIENCE DEPARTMENT

2

The majority of hypothesis tests discussed so far have made

inferences about population parameters, such as the mean and

the proportion. These parametric tests have used the

parametric statistics of samples that came from the population

being tested.

To formulate these tests, we made restrictive assumptions

about the populations from which we drew our samples. For

example, we assumed that our samples either were large or

came from normally distributed populations. But populations

are not always normal.

NON PARAMETRIC TEST


3

And even if a goodness-of-fit test indicates that a population is approximately normal, we cannot always be sure we're right, because the test is not 100 percent reliable.

Fortunately, in recent times statisticians have developed useful techniques that do not make restrictive assumptions about the shape of the population distribution.

These are known as distribution-free or, more commonly, nonparametric tests.

Nonparametric statistical procedures can therefore be used in preference to their parametric counterparts when those assumptions are in doubt.

The hypotheses of a nonparametric test are concerned with

something other than the value of a population parameter.

A large number of these tests exist, but this section will examine

only a few of the better known and more widely used ones :


4

NON PARAMETRIC TESTS

SIGN TEST

WILCOXON SIGNED RANK TEST

MANN – WHITNEY TEST(WILCOXON RANK SUM TEST)

RUN TEST

KRUSKAL – WALLIS TEST

KOLMOGOROV – SMIRNOV TEST

LILLIEFORS TEST


5

The sign test is used to test hypotheses about the median $\tilde{\mu}$ of a continuous distribution. The median of a distribution is a value of the random variable X such that the probability is 0.5 that an observed value of X is less than or equal to the median, and the probability is 0.5 that an observed value of X is greater than or equal to the median. That is,

$$P(X \le \tilde{\mu}) = P(X \ge \tilde{\mu}) = 0.5$$

Since the normal distribution is symmetric, the mean of a normal distribution equals the median. Therefore, the sign test can be used to test hypotheses about the mean of a normal distribution.

THE SIGN TEST


6

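Let X denote a continuous random variable with median $\tilde{\mu}$ and let $X_1, X_2, \ldots, X_n$ denote a random sample of size n from the population of interest. If $\tilde{\mu}_0$ denotes the hypothesized value of the population median, then the usual forms of the hypothesis to be tested can be stated as follows:

$H_0: \tilde{\mu} = \tilde{\mu}_0$

VERSUS

$H_1: \tilde{\mu} > \tilde{\mu}_0$ (right-tailed test)

$H_1: \tilde{\mu} < \tilde{\mu}_0$ (left-tailed test)

$H_1: \tilde{\mu} \ne \tilde{\mu}_0$ (two-tailed test)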


7

Form the differences $X_i - \tilde{\mu}_0$, $i = 1, 2, \ldots, n$. Now if the null hypothesis is true, any difference is equally likely to be positive or negative. An appropriate test statistic is the number of these differences that are positive, say $R^+$. Therefore, to test the null hypothesis we are really testing that the number of plus signs is a value of a Binomial random variable that has the parameter p = 0.5. A p-value for the observed number of plus signs $r^+$ can be calculated directly from the Binomial distribution. Thus, for the left-tailed alternative $H_1: \tilde{\mu} < \tilde{\mu}_0$, if the computed p-value

$$P = P(R^+ \le r^+ \mid p = 0.5)$$

is less than or equal to some preselected significance level α, we will reject $H_0$ and conclude that $H_1$ is true.


8

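To test the other one-sided hypothesis,

$H_0: \tilde{\mu} = \tilde{\mu}_0$ vs $H_1: \tilde{\mu} > \tilde{\mu}_0$,

compute the p-value $P = P(R^+ \ge r^+ \mid p = 0.5)$, where $R^+$ is the number of positive differences and $r^+$ its observed value. If P is less than or equal to α, we will reject $H_0$. The two-sided alternative may also be tested. If the hypotheses are:

$H_0: \tilde{\mu} = \tilde{\mu}_0$ vs $H_1: \tilde{\mu} \ne \tilde{\mu}_0$, the p-value is:

$$P = \begin{cases} 2P(R^+ \le r^+ \mid p = 0.5) & \text{if } r^+ < n/2 \\ 2P(R^+ \ge r^+ \mid p = 0.5) & \text{if } r^+ > n/2 \end{cases}$$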


9

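It is also possible to construct a table of critical values for the sign test. As before, let $R^+$ denote the number of the differences that are positive and let $R^-$ denote the number of the differences that are negative. Let $R = \min(R^+, R^-)$. The table gives critical values $r^*_\alpha$ for the sign test that ensure that

$$P(\text{type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true}) = \alpha$$

If the observed value of the test statistic satisfies $r \le r^*_\alpha$, then the null hypothesis $H_0: \tilde{\mu} = \tilde{\mu}_0$ should be rejected and $H_1: \tilde{\mu} \ne \tilde{\mu}_0$ accepted.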


10

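If the alternative is $H_1: \tilde{\mu} > \tilde{\mu}_0$, then reject $H_0$ if $r^- \le r^*_\alpha$. If the alternative is $H_1: \tilde{\mu} < \tilde{\mu}_0$, then reject $H_0$ if $r^+ \le r^*_\alpha$. The level of significance of a one-sided test is one-half the value for a two-sided test.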


11

Since the underlying population is assumed to be continuous, there is in theory a zero probability that we will find a "tie", that is, a value of $X_i$ exactly equal to $\tilde{\mu}_0$. In practice ties can still appear because measurements are rounded. When ties occur, they should be set aside and the sign test applied to the remaining data.

TIES in the SIGN TEST


12

When p = 0.5, the Binomial distribution is well approximated by a normal distribution when n is at least 10. Thus, since the mean of the Binomial is np and the variance is np(1 - p), the distribution of $R^+$ is approximately normal with mean 0.5n and variance 0.25n whenever n is moderately large. Therefore, in these cases the null hypothesis can be tested using the statistic:

$$Z_0 = \frac{R^+ - 0.5n}{0.5\sqrt{n}}$$

THE NORMAL APPROXIMATION


13

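CRITICAL/REJECTION REGIONS FOR $H_0: \tilde{\mu} = \tilde{\mu}_0$

Critical regions (rejection regions) for α-level tests of $H_0: \tilde{\mu} = \tilde{\mu}_0$ versus each alternative, based on the normal-approximation statistic $z_0$, are given in this table:

Alternative                             CR/RR
$H_1: \tilde{\mu} \ne \tilde{\mu}_0$    $|z_0| > z_{\alpha/2}$
$H_1: \tilde{\mu} > \tilde{\mu}_0$      $z_0 > z_\alpha$
$H_1: \tilde{\mu} < \tilde{\mu}_0$      $z_0 < -z_\alpha$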


14

The sign test makes use only of the plus and minus signs of the differences between the observations and the median (or, in the paired case, the plus and minus signs of the differences between the paired observations). Frank Wilcoxon devised a test procedure that uses both direction (sign) and magnitude. This procedure is now called the Wilcoxon signed-rank test. It applies to symmetric continuous distributions. Under these assumptions, the mean equals the median.

THE WILCOXON SIGNED-RANK TEST


15

Description of the test:

We are interested in testing $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$.

16

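Assume that $X_1, X_2, \ldots, X_n$ is a random sample from a continuous and symmetric distribution with mean/median μ.

Compute the differences $X_i - \mu_0$, i = 1, 2, …, n.

Rank the absolute differences $|X_i - \mu_0|$ in ascending order, and then give the ranks the signs of their corresponding differences.

Let $W^+$ be the sum of the positive ranks, $W^-$ be the absolute value of the sum of the negative ranks, and let $W = \min(W^+, W^-)$.

Critical values of W, say $w^*_\alpha$, are tabulated.

1. If $H_1: \mu \ne \mu_0$, then reject $H_0$ if the observed value of the statistic satisfies $w \le w^*_\alpha$.

2. If $H_1: \mu > \mu_0$, reject $H_0$ if $w^- \le w^*_\alpha$.

3. If $H_1: \mu < \mu_0$, reject $H_0$ if $w^+ \le w^*_\alpha$.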



17

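If the sample size is moderately large (n > 20), then it can be shown that $W^+$ (or $W^-$) has approximately a normal distribution with mean

$$\mu_{W^+} = \frac{n(n+1)}{4}$$

and variance

$$\sigma^2_{W^+} = \frac{n(n+1)(2n+1)}{24}$$

Therefore, a test of $H_0: \mu = \mu_0$ can be based on the statistic

$$Z_0 = \frac{W^+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$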

LARGE SAMPLE APPROXIMATION


18

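Test statistic: $W^+$, the sum of the positive ranks.

Theorem: The probability distribution of $W^+$ when $H_0$ is true, which is based on a random sample of size n, satisfies:

$$E(W^+) = \frac{n(n+1)}{4}, \qquad Var(W^+) = \frac{n(n+1)(2n+1)}{24}$$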

Wilcoxon Signed-Rank Test


19

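Proof:

Let $D_i = X_i - \mu_0$ and let $r_i$ be the rank of $|D_i|$. If $U_i = 1$ when $D_i > 0$ and $U_i = 0$ otherwise, then

$$W^+ = \sum_{i=1}^{n} r_i U_i$$

where the ranks $r_1, \ldots, r_n$ are a permutation of 1, 2, …, n.

For a given rank $r_i$, the discrepancy $D_i$ has a 50 : 50 chance of being "+" or "-". Hence $W^+$ has the same distribution as $\sum_{i=1}^{n} i\,U_i$, where $U_1, \ldots, U_n$ are independent Bernoulli variables with $P(U_i = 1) = 1/2$.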
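The mean and variance quoted for the signed-rank statistic follow from the 50 : 50 sign argument. The derivation below is a sketch that assumes no ties; the indicator $U_i$ (equal to 1 when the difference holding rank $i$ is positive) is notation introduced here for illustration.

```latex
W^+ \overset{d}{=} \sum_{i=1}^{n} i\,U_i,
\qquad U_i \sim \text{Bernoulli}(1/2) \text{ independent under } H_0

E(W^+) = \sum_{i=1}^{n} i\,E(U_i)
       = \frac{1}{2}\cdot\frac{n(n+1)}{2}
       = \frac{n(n+1)}{4}

Var(W^+) = \sum_{i=1}^{n} i^2\,Var(U_i)
         = \frac{1}{4}\cdot\frac{n(n+1)(2n+1)}{6}
         = \frac{n(n+1)(2n+1)}{24}
```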


20


21


22

The Wilcoxon signed-rank test can be applied to paired data. Let $(X_{1j}, X_{2j})$, j = 1, 2, …, n be a collection of paired observations from two continuous distributions that differ only with respect to their means. The distribution of the differences $D_j = X_{1j} - X_{2j}$ is continuous and symmetric. The null hypothesis is $H_0: \mu_1 = \mu_2$, which is equivalent to $H_0: \mu_D = 0$.

To use the Wilcoxon signed-rank test, the differences are first ranked in ascending order of their absolute values, and then the ranks are given the signs of the differences.

PAIRED OBSERVATIONS


23

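Let $W^+$ be the sum of the positive ranks and $W^-$ be the absolute value of the sum of the negative ranks, and $W = \min(W^+, W^-)$.

If the observed value $w \le w^*_\alpha$, then $H_0: \mu_1 = \mu_2$ is rejected and $H_1: \mu_1 \ne \mu_2$ accepted.

If $H_1: \mu_1 > \mu_2$, then reject $H_0$ if $w^- \le w^*_\alpha$.

If $H_1: \mu_1 < \mu_2$, reject $H_0$ if $w^+ \le w^*_\alpha$.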


24

EXAMPLE

Eleven students were randomly selected from a large statistics class, and their numerical grades on two successive examinations were recorded.

Use the Wilcoxon signed-rank test to determine whether the second test was more difficult than the first. Use α = 0.1.

Student   Test 1   Test 2   Difference   Rank   Signed rank
1         94       85         9           8       8
2         78       65        13          10      10
3         89       92        -3           4      -4
4         62       56         6           7       7
5         49       52        -3           4      -4
6         78       74         4           6       6
7         80       79         1           1       1
8         82       84        -2           2      -2
9         62       48        14          11      11
10        83       71        12           9       9
11        79       82        -3           4      -4


25

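Solution:

The sum of the positive ranks is $W^+ = 8 + 10 + 7 + 6 + 1 + 11 + 9 = 52$.

Using the normal approximation with n = 11, the mean is 11(12)/4 = 33 and the variance is 11(12)(23)/24 = 126.5, so

$$z = \frac{52 - 33}{\sqrt{126.5}} = 1.69$$

Since $z = 1.69 > z_{0.10} = 1.28$, REJECT H0: there is evidence that the second test was more difficult.

[Figure: standard normal curve with the rejection region to the right of 1.28; the observed value 1.69 falls inside it.]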
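The eleven-student example can be checked numerically. The sketch below is illustrative only: the helper names `average_ranks` and `signed_rank` are made up for this example, zero differences are dropped, and tied absolute differences receive the mean of the ranks they span.

```python
from math import sqrt

def average_ranks(values):
    """Rank values in ascending order; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def signed_rank(first, second):
    """Return W+, W- and the large-sample z statistic based on W+."""
    diffs = [a - b for a, b in zip(first, second) if a != b]  # drop zeros
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    n = len(diffs)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return w_plus, w_minus, (w_plus - mean) / sqrt(var)

test1 = [94, 78, 89, 62, 49, 78, 80, 82, 62, 83, 79]
test2 = [85, 65, 92, 56, 52, 74, 79, 84, 48, 71, 82]
w_plus, w_minus, z = signed_rank(test1, test2)
print(w_plus, w_minus, round(z, 2))
```

With these grades the sketch gives a positive rank sum of 52 and z of about 1.69, the same values used in the worked solution.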


26

EXAMPLE

Ten newly married couples were randomly selected, and each husband and wife were independently asked how many children they would like to have. The following information was obtained.

Using the sign test, is there reason to believe that wives want fewer children than husbands? Assume a maximum size of type I error of 0.05.

COUPLE       1   2   3   4   5   6   7   8   9   10
WIFE X       3   2   1   0   0   1   2   2   2   0
HUSBAND Y    2   3   2   2   0   2   1   3   1   2


27

SOLUTION

First state H0 and H1:

H0 : p = 0.5

vs H1 : p < 0.5

where p is the probability of a plus sign (wife wants more children than husband). Couple 5 is a tie and is set aside, leaving n = 9.

Couple   1   2   3   4   6   7   8   9   10
Sign     +   -   -   -   -   +   -   +   -

There are three + signs. Under H0, S ~ BIN(9, 1/2).

P(S ≤ 3) = 0.2539

At level α = 0.05, because 0.2539 > 0.05, do not reject H0.
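The binomial p-value in the couples example can be reproduced directly. This is a minimal sketch; the function name `sign_test_lower_p` is hypothetical, and it computes only the lower-tail probability P(S ≤ s) used for the alternative p < 0.5.

```python
from math import comb

def sign_test_lower_p(x, y):
    """Count plus signs of x - y (ties dropped) and return P(S <= s) under p = 0.5."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # ties are set aside
    n = len(diffs)
    plus = sum(1 for d in diffs if d > 0)
    p_value = sum(comb(n, k) for k in range(plus + 1)) / 2 ** n
    return plus, n, p_value

wife = [3, 2, 1, 0, 0, 1, 2, 2, 2, 0]
husband = [2, 3, 2, 2, 0, 2, 1, 3, 1, 2]
plus, n, p_value = sign_test_lower_p(wife, husband)
print(plus, n, round(p_value, 4))
```

The result is three plus signs out of nine untied couples and a p-value of 130/512, about 0.2539, so H0 is not rejected at α = 0.05.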


28

Suppose that we have two independent continuous populations X1 and X2 with means µ1 and µ2. Assume that the distributions of X1 and X2 have the same shape and spread, and differ only (possibly) in their means. The Wilcoxon rank-sum test can be used to test the hypothesis H0 : µ1 = µ2. This procedure is sometimes called the Mann-Whitney test or Mann-Whitney U test.

THE WILCOXON RANK-SUM TEST


29

Let $X_{11}, X_{12}, \ldots, X_{1n_1}$ and $X_{21}, X_{22}, \ldots, X_{2n_2}$ be two independent random samples of sizes $n_1 \le n_2$ from the continuous populations X1 and X2. We wish to test the hypotheses:

H0 : µ1 = µ2

versus H1 : µ1 ≠ µ2

The test procedure is as follows. Arrange all n1 + n2 observations in ascending order of magnitude and assign ranks to them. If two or more observations are tied, then use the mean of the ranks that would have been assigned if the observations differed.

Description of the Test


30

Let W1 be the sum of the ranks in the smaller sample (1), and define W2 to be the sum of the ranks in the other sample. Then,

$$W_2 = \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2} - W_1$$

Now if the sample means do not differ, we will expect the sum of the ranks to be nearly equal for both samples after adjusting for the difference in sample size. Consequently, if the sums of the ranks differ greatly, we will conclude that the means are not equal. The critical value wα is obtained from the table for the appropriate sample sizes n1 and n2.


31

H0 : µ1 = µ2 is rejected in favor of H1 : µ1 ≠ µ2 if either of the observed values w1 or w2 is less than or equal to wα.

If H1 : µ1 < µ2, then reject H0 if w1 ≤ wα.

For H1 : µ1 > µ2, reject H0 if w2 ≤ wα.


32

When both n1 and n2 are moderately large, say, greater than 8, the distribution of W1 can be well approximated by the normal distribution with mean:

$$\mu_{W_1} = \frac{n_1(n_1 + n_2 + 1)}{2}$$

and variance:

$$\sigma^2_{W_1} = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$

LARGE-SAMPLE APPROXIMATION


33

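Therefore, for n1 and n2 > 8, we could use:

$$Z_0 = \frac{W_1 - \mu_{W_1}}{\sigma_{W_1}}$$

as a statistic, and the critical region is:

$|z_0| > z_{\alpha/2}$ (two-tailed test)

$z_0 > z_\alpha$ (upper-tail test)

$z_0 < -z_\alpha$ (lower-tail test)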


34

EXAMPLE

A large corporation is suspected of sex discrimination in the salaries of its employees. From employees with similar responsibilities and work experience, 12 male and 12 female employees were randomly selected; their annual salaries in thousands of dollars are as follows:

Females   22.5  19.8  20.6  24.7  23.2  19.2  18.7  20.9  21.6  23.5  20.7  21.6
Males     21.9  21.6  22.4  24.0  24.1  23.4  21.2  23.9  20.5  24.5  22.3  23.6

Is there reason to believe that these random samples come from populations with different distributions? Use α = 0.05.


35

SOLUTION

H0 : f1(x) = f2(x). What does this mean? The random samples come from populations with the same distribution.

H1 : f1(x) ≠ f2(x)

Combine the two samples and rank the salaries (tied values share the mean rank):

SEX   SALARY   RANK
F     18.7      1
F     19.2      2
F     19.8      3
M     20.5      4
F     20.6      5
F     20.7      6
F     20.9      7
M     21.2      8
M     21.6     10
F     21.6     10
F     21.6     10


36

SEX   SALARY   RANK   (continued)
M     21.9     12
M     22.3     13
M     22.4     14
F     22.5     15
F     23.2     16
M     23.4     17
F     23.5     18
M     23.6     19
M     23.9     20
M     24.0     21
M     24.1     22
M     24.5     23
F     24.7     24


37

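Suppose we take the female sample as sample 1; its rank sum is R1 = RF = 117.

Statistic:

$$Z = \frac{U - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}, \qquad U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$

The value of the statistic U is

$$U = 12(12) + \frac{12(13)}{2} - 117 = 105$$

so that $z = (105 - 72)/\sqrt{300} = 1.91$.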


38

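[Graph: standard normal curve with two-tailed rejection regions beyond -1.96 and 1.96.]

α = 0.05, z_calc = 1.91

Since -1.96 < 1.91 < 1.96, do not reject H0.

What does this mean? At the 5% level the samples do not give enough evidence that male and female salaries come from different distributions.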
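The rank sum and z value for the salary example can be recomputed as follows. This is a sketch under an assumption: it uses the U form of the statistic, U = n1·n2 + n1(n1 + 1)/2 − R1, which is the form that reproduces the quoted value of 1.91; the helper name `average_ranks` is made up for illustration.

```python
from math import sqrt

def average_ranks(values):
    """Rank values in ascending order; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

females = [22.5, 19.8, 20.6, 24.7, 23.2, 19.2, 18.7, 20.9, 21.6, 23.5, 20.7, 21.6]
males = [21.9, 21.6, 22.4, 24.0, 24.1, 23.4, 21.2, 23.9, 20.5, 24.5, 22.3, 23.6]

n1, n2 = len(females), len(males)
ranks = average_ranks(females + males)   # rank the pooled sample
r1 = sum(ranks[:n1])                     # rank sum of the female sample
u = n1 * n2 + n1 * (n1 + 1) / 2 - r1     # assumed U form of the statistic
z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
print(r1, u, round(z, 2))
```

The female rank sum comes out as 117, U as 105 and z as about 1.91, which lies inside (−1.96, 1.96).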


39

The Kolmogorov-Smirnov (K-S) test is conducted by comparing the hypothesized and sample cumulative distribution functions. A cumulative distribution function is defined as F(x) = P(X ≤ x), and the sample cumulative distribution function, S(x), is defined as the proportion of sample values that are less than or equal to x. The K-S test should be used instead of the χ² goodness-of-fit test to determine if a sample is from a specified continuous distribution. To illustrate how S(x) is computed, suppose we have the following 10 observations:

110, 89, 102, 80, 93, 121, 108, 97, 105, 103.

KOLMOGOROV – SMIRNOV TEST


40

We begin by placing the values of x in ascending order, as follows:

80, 89, 93, 97, 102, 103, 105, 108, 110, 121.

Because x = 80 is the smallest of the 10 values, the proportion of values of x that are less than or equal to 80 is S(80) = 0.1.

X S(x) = P(X ≤ x)

80 0,1

89 0,2

93 0,3

97 0,4

102 0,5

103 0,6

105 0,7

108 0,8

110 0,9

121 1,0


41

The test statistic D is the maximum absolute difference between the two cdf's over all observed values. The range of D is 0 ≤ D ≤ 1, and the formula is:

$$D = \max_x |F(x) - S(x)|$$

where
x = each observed value
S(x) = observed cdf at x
F(x) = hypothesized cdf at x


42

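Let X(1), X(2), …, X(n) denote the ordered observations of a random sample of size n, and define the sample cdf as:

$$S_n(x) = \begin{cases} 0 & x < X_{(1)} \\ k/n & X_{(k)} \le x < X_{(k+1)} \\ 1 & x \ge X_{(n)} \end{cases}$$

$S_n(x)$ is the proportion of sample values less than or equal to x.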


43

The Kolmogorov-Smirnov statistic $D_n$ is defined to be:

$$D_n = \sup_x |S_n(x) - F_0(x)|$$

where $F_0$ is the cdf specified by H0. For size α of type I error, the critical region is of the form:

$$D_n > D_{n,\alpha}$$


44

A state vehicle inspection station has been designed so that inspection time follows a uniform distribution with limits of 10 and 15 minutes. A sample of 10 duration times during low and peak traffic conditions was taken. Use the K-S test with α = 0.05 to determine if the sample is from this uniform distribution. The times are:

11.3  10.4  9.8  12.6  14.8  13.0  14.3  13.3  11.5  13.6

EXAMPLE 1


45

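1. H0 : the sample comes from the Uniform(10, 15) distribution
versus H1 : the sample does not come from the Uniform(10, 15) distribution

2. The cumulative distribution function of the sample, S(x), is computed as the proportion of observations less than or equal to x, and the hypothesized cdf is F(x) = (x - 10)/5 for 10 ≤ x ≤ 15 (F(x) = 0 below 10).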

SOLUTION


46

K-S calculation results

Observed time x   S(x)   F(x)   |F(x) - S(x)|
 9.8              0.10   0.00   0.10
10.4              0.20   0.08   0.12
11.3              0.30   0.26   0.04
11.5              0.40   0.30   0.10
12.6              0.50   0.52   0.02
13.0              0.60   0.60   0.00
13.3              0.70   0.66   0.04
13.6              0.80   0.72   0.08
14.3              0.90   0.86   0.04
14.8              1.00   0.96   0.04


47

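D = max |F(x) - S(x)| = 0.12, attained at x = 10.4.

From the table, with n = 10 and α = 0.05: D10,0.05 = 0.41.

[Figure: density f(D) with the upper-tail area α = P(D ≥ D0) to the right of the critical value D0.]

Since 0.12 < 0.41, do not reject H0.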
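The tabulated K-S calculation for the inspection times can be reproduced with a few lines. This sketch follows the convention used in the example, comparing S(x) and F(x) only at the observed values; a full K-S statistic would also check the sample cdf just below each observation. The function names are hypothetical.

```python
def ks_statistic(sample, cdf):
    """Largest |S(x) - F(x)| over the observed values, and where it occurs."""
    n = len(sample)
    best_gap, best_x = 0.0, None
    for x in sorted(sample):
        s = sum(1 for v in sample if v <= x) / n  # sample cdf S(x)
        gap = abs(s - cdf(x))
        if gap > best_gap:
            best_gap, best_x = gap, x
    return best_gap, best_x

def uniform_cdf(x, low=10.0, high=15.0):
    """cdf of the Uniform(low, high) distribution, clipped to [0, 1]."""
    return min(1.0, max(0.0, (x - low) / (high - low)))

times = [11.3, 10.4, 9.8, 12.6, 14.8, 13.0, 14.3, 13.3, 11.5, 13.6]
d, where = ks_statistic(times, uniform_cdf)
print(round(d, 2), where)
```

The maximum gap is 0.12 at x = 10.4, below the tabulated critical value of 0.41.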


48

Suppose we want to check whether the following ten observations

110, 89, 102, 80, 93, 121, 108, 97, 105, 103

were drawn from a normal distribution with mean µ = 100 and standard deviation σ = 10. Our hypotheses for this test are

H0 : Data were drawn from a normal distribution, with µ = 100 and σ = 10.

versus

H1 : Data were not drawn from a normal distribution, with µ = 100 and σ = 10.

EXAMPLE 2


49

SOLUTION

F(x) = P(X ≤ x), with z = (x - 100)/10:

x     F(x)
80    P(X ≤ 80)  = P(Z ≤ -2.0) = 0.0228
89    P(X ≤ 89)  = P(Z ≤ -1.1) = 0.1357
93    P(X ≤ 93)  = P(Z ≤ -0.7) = 0.2420
97    P(X ≤ 97)  = P(Z ≤ -0.3) = 0.3821
102   P(X ≤ 102) = P(Z ≤ 0.2)  = 0.5793
103   P(X ≤ 103) = P(Z ≤ 0.3)  = 0.6179
105   P(X ≤ 105) = P(Z ≤ 0.5)  = 0.6915
108   P(X ≤ 108) = P(Z ≤ 0.8)  = 0.7881
110   P(X ≤ 110) = P(Z ≤ 1.0)  = 0.8413
121   P(X ≤ 121) = P(Z ≤ 2.1)  = 0.9821


50

x     F(x)     S(x)   |F(x) - S(x)|
80    0.0228   0.1    0.0772
89    0.1357   0.2    0.0643
93    0.2420   0.3    0.0580
97    0.3821   0.4    0.0179
102   0.5793   0.5    0.0793 = D
103   0.6179   0.6    0.0179
105   0.6915   0.7    0.0085
108   0.7881   0.8    0.0119
110   0.8413   0.9    0.0587
121   0.9821   1.0    0.0179


51

If α = 0.05, then the critical value for n = 10 obtained from the table is 0.409.

The decision rule is: reject H0 if D > 0.409.

Because D = 0.0793 < 0.409, do not reject H0.

That is, the data are consistent with a normal distribution with µ = 100 and σ = 10.


52

In most applications where we want to test for normality, the population mean and the population variance are unknown. In order to perform the K-S test, however, we must assume that those parameters are known. The Lilliefors test, which is quite similar to the K-S test, handles the more common situation. The major difference between the two tests is that, with the Lilliefors test, the sample mean $\bar{x}$ and the sample standard deviation s are used instead of µ and σ to calculate F(x).

LILLIEFORS TEST


53

A manufacturer of automobile seats has a production line that produces an average of 100 seats per day. Because of new government regulations, a new safety device has been installed, which the manufacturer believes will reduce average daily output. A random sample of 15 days' output after the installation of the safety device is shown:

93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95

The daily production was assumed to be normally distributed. Use the Lilliefors test to examine that assumption, with α = 0.01.

EXAMPLE


54

SOLUTION

As in the K-S test, to compute S(x) sort the observations, as follows:

x     S(x)
88     1/15 = 0.067
91     2/15 = 0.133
92     3/15 = 0.200
93     4/15 = 0.267
94     6/15 = 0.400
95     8/15 = 0.533
96     9/15 = 0.600
98    10/15 = 0.667
101   13/15 = 0.867
103   14/15 = 0.933
105   15/15 = 1.000


55

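From the data above, we obtain $\bar{x} = 96.47$ and s = 4.85.

F(x) is then computed as F(x) = P(Z ≤ (x - 96.47)/4.85):

x     F(x)
88    P(Z ≤ -1.75) = 0.0401
91    P(Z ≤ -1.13) = 0.1292
92    P(Z ≤ -0.92) = 0.1788
.
.
.
101   P(Z ≤ 0.93) = 0.8238
103   P(Z ≤ 1.35) = 0.9115
105   P(Z ≤ 1.76) = 0.9608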


56

Finally, summarize the results as follows:

x     F(x)     S(x)    |F(x) - S(x)|
88    0.0401   0.067   0.0269
91    0.1292   0.133   0.0038
92    0.1788   0.200   0.0212
93    0.2358   0.267   0.0312
94    0.3050   0.400   0.0950
95    0.3821   0.533   0.1509 = D
96    0.4602   0.600   0.1398
98    0.6255   0.667   0.0415
101   0.8238   0.867   0.0432
103   0.9115   0.933   0.0215
105   0.9608   1.000   0.0392

From the table of critical values of the Lilliefors test, with α = 0.01 and n = 15: Dtab = 0.257.

Since D = 0.1509 < 0.257, do not reject H0.
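The Lilliefors calculation for the seat-production data can be sketched with the standard library's `statistics.NormalDist`. Two caveats: the comparison is made at the observed values only, as in the table, and z is not rounded to two decimals here, so the result differs slightly from the hand-computed 0.1509.

```python
from statistics import NormalDist, mean, stdev

output = [93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95]

x_bar = mean(output)            # sample mean replaces mu
s = stdev(output)               # sample standard deviation replaces sigma
fitted = NormalDist(x_bar, s)   # normal distribution with estimated parameters

n = len(output)
d, where = 0.0, None
for x in sorted(set(output)):
    s_x = sum(1 for v in output if v <= x) / n   # sample cdf S(x)
    gap = abs(fitted.cdf(x) - s_x)
    if gap > d:
        d, where = gap, x

print(round(x_bar, 2), round(s, 2), round(d, 3), where)
```

The largest gap occurs at x = 95 and is roughly 0.15, well below the tabulated critical value 0.257.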


57

Usually a sample that is taken from a population should be random. The runs test evaluates the null hypothesis

H0 : the order of the sample data is random

The alternative hypothesis is simply the negation of H0. There is no comparable parametric test to evaluate this null hypothesis. The order in which the data are collected must be retained so that the runs may be developed.

TEST BASED ON RUNS


58

DEFINITIONS:

1. A run is defined as a sequence of the same symbols. Two symbols are defined, and each sequence must contain a symbol at least once.

2. A run of length j is defined as a sequence of j observations, all belonging to the same group, that is preceded or followed by observations belonging to a different group.

For illustration, the ordered sequence by the sex of the employee is as follows:

F F F M F F F M M F F M M M F F M F M M M M M F

For the sex of the employee the ordered sequence exhibits runs of F's and M's.


59

The sequence begins with a run of length three, followed by a run of length one, followed by another run of length three, and so on. The total number of runs in this sequence is 11. Let R be the total number of runs observed in an ordered sequence of n1 + n2 observations, where n1 and n2 are the respective sample sizes. The possible values of R are 2, 3, 4, …, (n1 + n2).

The only question to ask prior to performing the test is: is the sample size small or large? We will use the guideline that a small sample has n1 and n2 less than or equal to 15.

The table gives the lower (rL) and upper (rU) critical values of the distribution f(r) with α/2 = 0.025 in each tail.


60

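If n1 or n2 exceeds 15, the sample is considered large, in which case a normal approximation to f(r) is used to test H0 versus H1.

[Figure: distribution f(r) of the number of runs r, with the acceptance region (AR) between the critical values rL and rU.]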


61

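The mean and variance of R are determined to be

$$E(R) = \frac{2 n_1 n_2}{n_1 + n_2} + 1, \qquad Var(R) = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$$

and the normal approximation uses

$$Z = \frac{R - E(R)}{\sqrt{Var(R)}}$$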
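Counting runs is easy to get wrong by hand, so here is a short sketch applied to the employee sequence from the definitions. It assumes the standard Wald-Wolfowitz mean and variance for the number of runs; the z value is shown only to illustrate the large-sample formula, since with n1 = n2 = 12 the small-sample table would normally be used.

```python
from itertools import groupby
from math import sqrt

sequence = "FFFMFFFMMFFMMMFFMFMMMMMF"  # sex of employee in collection order

runs = sum(1 for _ in groupby(sequence))  # each group of equal symbols is one run
n1 = sequence.count("F")
n2 = sequence.count("M")

# Assumed standard moments of the number of runs under randomness
mean_r = 2 * n1 * n2 / (n1 + n2) + 1
var_r = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
         / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
z = (runs - mean_r) / sqrt(var_r)
print(runs, n1, n2, mean_r, round(z, 3))
```

The sequence contains 11 runs with 12 F's and 12 M's, against an expected 13 runs under randomness.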


62

The Kruskal-Wallis H test is the nonparametric equivalent of the Analysis of Variance F test. It tests the null hypothesis that all k populations possess the same probability distribution against the alternative hypothesis that the distributions differ in location, that is, one or more of the distributions are shifted to the right or left of each other. The advantage of the Kruskal-Wallis H test over the F test is that we need make no assumptions about the nature of the sampled populations. A completely randomized design specifies that we select independent random samples of n1, n2, …, nk observations from the k populations.

THE KRUSKAL - WALLIS H TEST


63

To conduct the test, we first rank all n = n1 + n2 + n3 + … + nk observations and compute the rank sums R1, R2, …, Rk for the k samples.

The ranks of tied observations are averaged in the same manner as for the Wilcoxon rank-sum test. Then, if H0 is true, and if the sample sizes n1, n2, …, nk each equal 5 or more, the test statistic defined by

$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$$

will have a sampling distribution that can be approximated by a chi-square distribution with (k-1) degrees of freedom. Large values of H imply rejection of H0.


64

Therefore, the rejection region for the test is $H > \chi^2_\alpha$, where $\chi^2_\alpha$ is the value that locates α in the upper tail of the chi-square distribution.

The test is summarized in the following :


65

KRUSKAL-WALLIS H TEST FOR COMPARING k POPULATION PROBABILITY DISTRIBUTIONS

H0 : The k population probability distributions are identical

H1 : At least two of the k population probability distributions differ in location

Test statistic:

$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$$

where
ni = Number of measurements in sample i
Ri = Rank sum for sample i, where the rank of each measurement is computed according to its relative magnitude in the totality of data for the k samples.


66

n = Total sample size = n1 + n2 + … + nk

Rejection region: $H > \chi^2_\alpha$ with (k-1) dof

Assumptions:

1. The k samples are random and independent
2. There are 5 or more measurements in each sample
3. The observations can be ranked

No assumptions have to be made about the shape of the population probability distributions.


67

Independent random samples of three different brands of magnetron tubes (the key components in microwave ovens) were subjected to stress testing, and the number of hours each operated without repair was recorded. Although these times do not represent typical life lengths, they do indicate how well the tubes can withstand extreme stress. The data are shown in the table below. Experience has shown that the distributions of life lengths for manufactured products are often non-normal, thus violating the assumptions required for the proper use of an ANOVA F test. Use the Kruskal-Wallis H test to determine whether evidence exists to conclude that the brands of magnetron tubes tend to differ in length of life under stress. Test using α = 0.05.

Example


68

BRAND
A     B     C
36    49    71
48    33    31
5     60    140
67    2     59
53    55    42


69

Solution

H0 : the population probability distributions of length of life under stress are identical for the three brands of magnetron tubes.

versus

H1 : at least two of the population probability distributions differ in location

Rank the pooled observations and sum the ranks for each of the 3 samples.

A     rank    B     rank    C     rank
36     5      49     8      71    14
48     7      33     4      31     3
5      2      60    12      140   15
67    13      2      1      59    11
53     9      55    10      42     6

R1 = 36       R2 = 35       R3 = 49


70

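Test statistic:

$$H = \frac{12}{15(16)} \left( \frac{36^2}{5} + \frac{35^2}{5} + \frac{49^2}{5} \right) - 3(16) = 1.22$$

With k - 1 = 2 degrees of freedom and α = 0.05, the critical value is $\chi^2_{0.05} = 5.99$.

[Figure: density f(H) with the observed value 1.22 to the left of the critical value 5.99.]

Decision on H0: since 1.22 < 5.99, do not reject H0.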
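The H statistic for the magnetron data can be recomputed as below. This is a minimal sketch: the function name `kruskal_wallis_h` is hypothetical, and the simple ranking assumes there are no tied observations, which holds for this data set.

```python
def kruskal_wallis_h(samples):
    """Return the rank sums and the H statistic (assumes no tied values)."""
    pooled = sorted(v for sample in samples for v in sample)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # valid only without ties
    n = len(pooled)
    rank_sums = [sum(rank[v] for v in sample) for sample in samples]
    h = (12 / (n * (n + 1))
         * sum(r * r / len(sample) for r, sample in zip(rank_sums, samples))
         - 3 * (n + 1))
    return rank_sums, h

brand_a = [36, 48, 5, 67, 53]
brand_b = [49, 33, 60, 2, 55]
brand_c = [71, 31, 140, 59, 42]
rank_sums, h = kruskal_wallis_h([brand_a, brand_b, brand_c])
print(rank_sums, round(h, 2))
```

The rank sums are 36, 35 and 49, and H is about 1.22, below the chi-square critical value 5.99 for 2 degrees of freedom.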


71

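COMPARISON OF POPULATION PROPORTIONS

Given X1 ~ BIN(n1, p1) and X2 ~ BIN(n2, p2), the statistics

$$\hat{p}_1 = \frac{X_1}{n_1}; \qquad \hat{p}_2 = \frac{X_2}{n_2}$$

are defined to be the sample proportions.

Assume that X1 and X2 are independent; then

$$E(\hat{p}_1 - \hat{p}_2) = E(\hat{p}_1) - E(\hat{p}_2) = p_1 - p_2$$

$$Var(\hat{p}_1 - \hat{p}_2) = Var(\hat{p}_1) + Var(\hat{p}_2) = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$$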

72

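For sufficiently large n1 and n2 the standardized statistic

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$

is approximately standard normal.

The (1-α)100% CI:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

As p1 and p2 are UNKNOWN, the approximate (1-α)100% CI for (p1 - p2) is:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$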


73

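In the testing situation,

Ho : p1 = p2 = p (p unknown)

versus one of the alternatives below, with the rejection region (RR) at level of significance (los) α:

H1          RR
p1 > p2     Z > z_α
p1 < p2     Z < -z_α
p1 ≠ p2     |Z| > z_{α/2}

Test statistic:

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

The unknown common value of p is estimated by:

$$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$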


74

EXAMPLE

Members of the Department of statistics at Iowa State Union collected the following data on grades in an introductory business statistics course and an introductory engineering statistics course.

Course    #Students   #A grades
B.Stat    571         82
E.Stat    156         25

Ho : p1 = p2 (the proportion of A grades in the two courses is equal)

vs H1 : p1 ≠ p2

$$\hat{p}_1 = \frac{82}{571} = 0.1436; \qquad \hat{p}_2 = \frac{25}{156} = 0.1603$$


75

$$\hat{p} = \frac{82 + 25}{571 + 156} = 0.1472$$

$$Z = \frac{0.1436 - 0.1603}{\sqrt{0.1472(0.8528)\left(\dfrac{1}{571} + \dfrac{1}{156}\right)}} = -0.52$$

The p-value is 2P(Z ≤ -0.52) = 0.6030.

Since α = 5% < p-value, Ho would not be rejected.

The proportion of A's does not differ significantly in the two courses.
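The pooled two-proportion z test used in the grades example can be sketched as follows. The function name `two_proportion_z` is made up for illustration; the normal cdf comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2, with its two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)               # estimate of the common p
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_sided

# Grades example: 82 A's out of 571 versus 25 A's out of 156
z, p_value = two_proportion_z(82, 571, 25, 156)
print(round(z, 2), round(p_value, 3))
```

For the grades data this gives z of about -0.52 and a two-sided p-value of about 0.60, so H0 is not rejected at the 5% level.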


76

EXERCISE

An insurance company is thinking about offering a discount on its life insurance policies to non-smokers. As part of its analysis, it randomly selects 200 men who are 50 years old and asks them if they smoke at least one pack of cigarettes per day and if they have ever suffered from heart disease. The results indicate that 20 out of 80 smokers and 15 out of 120 non-smokers suffer from heart disease. Can we conclude at the 5% level of significance (los) that smokers have a higher incidence of heart disease than non-smokers?

DATA

50-year-old smokers suffering from HEART disease, parameter: p1

50-year-old non-smokers suffering from HEART disease, parameter: p2

Solution:


77

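The data are clearly qualitative.

$$H_0: p_1 - p_2 = 0 \quad \text{vs} \quad H_1: p_1 - p_2 > 0$$

Test statistic:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Rejection region: $z > z_{tab} = z_{0.05} = 1.645$

Sample proportions:

$$\hat{p}_1 = \frac{20}{80} = 0.25; \qquad \hat{p}_2 = \frac{15}{120} = 0.125$$

Pooled proportion estimate:

$$\hat{p} = \frac{20 + 15}{80 + 120} = \frac{35}{200} = 0.175$$

Value of the test statistic:

$$z_{cal} = \frac{0.25 - 0.125}{\sqrt{0.175(0.825)\left(\dfrac{1}{80} + \dfrac{1}{120}\right)}} = 2.28$$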


78

Since $z_{cal} = 2.28 > z_{tab} = 1.645$, reject Ho.

The test statistic is normally distributed, so we can also calculate the p-value:

p-value = P(z > 2.28) = 0.0113 = 1.13%, which is less than α = 5%, so reject Ho.


79

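PROBLEMS

1. The pmf of the random variable X is given as follows:

x      0   1   2   3
p(x)   0   k   k   3k²

Determine k so that the properties of a pmf are satisfied!

Solution: A pmf has two properties, namely:

$$p(x) \ge 0 \text{ for all } x; \qquad \sum_x p(x) = 1$$

$$\sum_x p(x) = 0 + k + k + 3k^2 = 1$$

$$3k^2 + 2k - 1 = 0$$

$$(3k - 1)(k + 1) = 0 \;\Rightarrow\; k = \tfrac{1}{3} \text{ or } k = -1$$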


80

For k = -1: p(1) = -1 < 0 and p(2) = -1 < 0.

Therefore k = -1 does not satisfy the properties. Next, for k = 1/3 it can be checked that the pmf properties are satisfied in this case. So the value is k = 1/3.

81

In a public opinion survey, 60 out of a sample of 100 high-income voters and 40 out of a sample of 75 low-income voters supported a decrease in sales tax.

(a) Can we conclude at the 5% los that the proportion of voters favoring a sales tax decrease differs between high and low-income voters?

(b) What is the p-value of this test?

(c) Estimate the difference in proportions, with 99% confidence!

Solution:

$$H_0: (p_1 - p_2) = 0 \quad \text{vs} \quad H_1: (p_1 - p_2) \ne 0$$

Test statistic:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

RR: |z| > 1.96



82

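$$\hat{p}_1 = \frac{60}{100} = 0.6; \qquad \hat{p}_2 = \frac{40}{75} = 0.53$$

$$\hat{p} = \frac{60 + 40}{100 + 75} = \frac{100}{175} = 0.571; \qquad \hat{q} = 1 - \hat{p} = 0.429$$

$$z_{cal} = \frac{0.60 - 0.53}{\sqrt{0.571(0.429)\left(\dfrac{1}{100} + \dfrac{1}{75}\right)}} = 0.93$$

[Figure: standard normal curve with rejection regions beyond -1.96 and 1.96.]

(a) Conclusion: do not reject Ho.

(b) p-value = 2P(z > 0.93) = 2(0.1762) = 0.3524.

(c)

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} = (0.60 - 0.53) \pm 2.575\sqrt{\frac{(0.6)(0.4)}{100} + \frac{(0.53)(0.47)}{75}} = 0.07 \pm 0.195$$

The difference between the two proportions is estimated to lie between -0.125 and 0.265.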


83

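TEST on MEANS WHEN THE OBSERVATIONS ARE PAIRED

TESTING THE PAIRED DIFFERENCES

Let (X1, Y1), (X2, Y2), …, (Xn, Yn) be the n pairs, where (Xi, Yi) denotes, for example, the systolic blood pressure of the i-th subject before and after the drug. It is assumed that the differences D1, D2, …, Dn constitute independent normally distributed RVs such that:

$$E(D_i) = \mu_D \quad \text{and} \quad Var(D_i) = \sigma_D^2$$

$$H_0: \mu_D = \delta_0 \quad \text{vs} \quad H_1: \mu_D \ne \delta_0$$

TEST STATISTIC:

$$T = \frac{\bar{D} - \delta_0}{S_D / \sqrt{n}}$$

where

$$\bar{D} = \frac{\sum D_i}{n} \quad \text{and} \quad S_D^2 = \frac{1}{n - 1}\sum (D_i - \bar{D})^2$$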


84

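Rejection criteria for testing hypotheses on means when the observations are paired

Null hypothesis: $H_0: \mu_D = \delta_0$

Value of the test statistic under Ho: $t = \dfrac{\bar{d} - \delta_0}{s_d / \sqrt{n}}$

Alternative hypothesis          Rejection criteria
$H_1: \mu_D \ne \delta_0$       Reject Ho when $t > t_{\alpha/2,\,n-1}$ or when $t < -t_{\alpha/2,\,n-1}$
$H_1: \mu_D > \delta_0$         Reject Ho when $t > t_{\alpha,\,n-1}$
$H_1: \mu_D < \delta_0$         Reject Ho when $t < -t_{\alpha,\,n-1}$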


85

A paired difference experiment is conducted to compare the starting salaries of male and female college graduates who find jobs. Pairs are formed by choosing a male and a female with the same major and similar grade-point average. Suppose a random sample of ten pairs is formed in this manner and the starting annual salary of each person is recorded. The results are shown in the table. Test to see whether there is evidence that the mean starting salary, μ1, for males exceeds the mean starting salary, μ2, for females. Use α = 0.05.


86

Pair   Male       Female     Difference (male - female)
1      $14,300    $13,800    $500
2       16,500     16,600    -100
3       15,400     14,800     600
4       13,500     13,500       0
5       18,500     17,600     900
6       12,800     13,000    -200
7       14,500     14,200     300
8       16,200     15,100   1,100
9       13,400     13,200     200
10      14,200     13,500     700


87

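Solution:

$$H_0: \mu_1 - \mu_2 = 0 \;(\mu_D = 0) \quad \text{vs} \quad H_1: \mu_1 - \mu_2 > 0 \;(\mu_D > 0)$$

Test statistic:

$$t = \frac{\bar{x}_D - 0}{s_D / \sqrt{n_D}}; \qquad \bar{x}_D = \bar{d}$$

RR: reject Ho if t > tα; t0.05,9 = 1.833

$$\bar{x}_D = \bar{d} = \frac{\sum D_i}{n} = 400$$

$$S_D^2 = 188{,}888.89 \;\Rightarrow\; S_D = 434.61$$

$$t = \frac{400}{434.61 / \sqrt{10}} = 2.91$$

[Figure: t-distribution with 9 dof; the rejection region lies to the right of 1.833.]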


88

tcal = 2.91 falls in the RR.

Reject Ho at the los = 0.05.

The mean starting salary for males exceeds the mean starting salary for females.
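The paired t statistic for the salary data can be verified with the standard library. This is a small sketch of the calculation only; the critical value 1.833 is taken from the t table quoted in the solution rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

male = [14300, 16500, 15400, 13500, 18500, 12800, 14500, 16200, 13400, 14200]
female = [13800, 16600, 14800, 13500, 17600, 13000, 14200, 15100, 13200, 13500]

diffs = [m - f for m, f in zip(male, female)]  # paired differences D_i
d_bar = mean(diffs)                            # mean difference
s_d = stdev(diffs)                             # sample standard deviation of D
t = d_bar / (s_d / sqrt(len(diffs)))           # tests H0: mu_D = 0

t_critical = 1.833  # t(0.05, 9 dof), from the table in the solution
print(d_bar, round(s_d, 2), round(t, 2), t > t_critical)
```

The mean difference is 400, the standard deviation about 434.61 and t about 2.91, which exceeds 1.833.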


89

Consider a classroom where the students are given a test before they are taught the subject matter covered by the test. The students' scores on this pre-test are recorded as the first data set. Next, the subject matter is presented to the class. After the instruction is completed, the students are retested on the same material. The scores on the second test, the post-test, compose the second data set. It is reasonable to expect that a student who scored high on the pre-test will also score high on the post-test (and vice versa). Inherently, a strong dependency exists between the members of a pair of scores generated by each individual. Suppose that the scores in the table have been generated by 15 students under the conditions just described. How would you decide whether the instruction had been effective?


90

Student Pre test Post test D

1 54 66 12

2 79 85 6

3 91 83 -8

4 75 88 13

5 68 93 25

6 43 40 -3

7 33 78 45

8 85 91 6

9 22 44 22

10 56 82 26

11 73 59 -14

12 63 81 18

13 29 64 35

14 75 83 8

15 87 81 -6

A data set with paired scores


91

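EX: Use the T statistic for the hypotheses Ho : μ = 5 versus H1 : μ = 6, with σ = 1, to compute:

a) β, if α = 0.05 and n = 16
b) α, if β = 0.025 and n = 16
c) n, if α = 0.05 and β = 0.025

Solution:

Ho : μ = μo = 5 vs H1 : μ = μ1 = 6, with μ1 > μo

Test statistic (with the standard deviation taken as 1):

$$T = \frac{\sqrt{n}(\bar{X} - \mu)}{S}$$

RR = { $\bar{X}$ > c }

(a) $P(\bar{X} > c \mid \mu = 5) = 0.05$

$$P\left(\frac{\bar{X} - 5}{1/\sqrt{16}} > \frac{c - 5}{1/\sqrt{16}}\right) = 0.05$$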


92

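$$P(T > 4(c - 5)) = 0.05$$

$$P(T > t_\alpha) = 0.05$$

$t_\alpha = t_{0.05,15} = 1.753$, which means

$$4(c - 5) = 1.753 \;\Rightarrow\; c = 5.438$$

$$\beta = P(\text{accept } H_0 \mid H_1 \text{ true}) = P(\bar{X} < c \mid \mu = 6)$$

$$= P(T < 4(c - 6)) = P(T < -2.248)$$

This value is not in the t table, SO USE INTERPOLATION. Usually, LINEAR INTERPOLATION is used:

$$f(x) \approx a + bx; \qquad x_1 \le x \le x_2$$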


93

$$x_1 < x_o < x_2$$

$$f(x_o) \approx a + b x_o = f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1}(x_o - x_1)$$

t TABLE (row for ν = 15)

One-tail α   0.10    0.05    0.025   0.01
Two-tail α   0.20    0.10    0.05    0.02
t            1.341   1.753   2.131   2.602

The value 2.248 lies between 2.131 and 2.602.


94

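$$f(x_o) \approx f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1}(x_o - x_1)$$

$$f(x_o) \approx 0.025 + \frac{0.010 - 0.025}{0.471}(0.117)$$

$$f(x_o) \approx 0.021$$

$$\beta = P(T < -2.248) = P(T > 2.248) \approx 0.021$$

(b) β = 0.025 ; n = 16 ; α = ?

$$P(\bar{X} < c \mid \mu = 6) = 0.025$$

$$P(T < 4(c - 6)) = 0.025; \qquad P(T < -t) = 0.025$$

$$t = t_{0.025,15} = 2.131$$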


95

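So: 4(c - 6) = -2.131, giving c = 5.467.

$$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}) = P(\bar{X} > c \mid \mu = 5) = P(T > 4(5.467 - 5))$$

$$\alpha = P(T > 1.868) \approx 0.042$$

(by linear interpolation between 1.753 and 2.131 in the t table).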


96

TABLE INTERPOLATION

Suppose that it is desired to evaluate a function f(x) at a point xo, and that a table of values of f(x) is available for some, but not all, values of x. In particular, the table may not give the value f(xo) but may give values for f(x1) and f(x2) where x1 < xo < x2. We can use the known values of f(x) for x = x1, x2 to approximate the value of f(xo). This process is known as INTERPOLATION. Perhaps the most commonly used interpolation method is linear interpolation. If f(x) is sufficiently smooth and not too curvilinear between x = x1 and x = x2, calculus tells us that f(x) can be regarded as being nearly linear over the interval [x1, x2].

97

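That is,

$$f(x) \approx a + bx; \qquad x_1 \le x \le x_2$$

Solving the equations:

$$f(x_1) = a + b x_1; \qquad f(x_2) = a + b x_2$$

for a and b yields:

$$b = \frac{f(x_2) - f(x_1)}{x_2 - x_1}$$

$$a = f(x_1) - \frac{f(x_2) - f(x_1)}{x_2 - x_1}\,x_1$$

Hence:

$$f(x_o) \approx a + b x_o = f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1}(x_o - x_1)$$

[Figure: curve f(x) and the line a + bx through the points (x1, f(x1)) and (x2, f(x2)); the line is read at xo to approximate f(xo).]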
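The linear interpolation formula is simple enough to wrap in a helper and apply to the two table look-ups that appear in these notes. The function name `linear_interpolate` is hypothetical; the table entries are the ones quoted in the worked solutions.

```python
def linear_interpolate(x0, x1, f1, x2, f2):
    """Approximate f(x0) from the tabulated points (x1, f1) and (x2, f2)."""
    return f1 + (f2 - f1) / (x2 - x1) * (x0 - x1)

# Upper-tail area of t with 15 dof at 2.248, between 2.131 (0.025) and 2.602 (0.010)
t_tail = linear_interpolate(2.248, 2.131, 0.025, 2.602, 0.010)

# Standard normal cdf at 1.333, between 1.33 (0.9082) and 1.34 (0.9099)
phi = linear_interpolate(1.333, 1.33, 0.9082, 1.34, 0.9099)

print(round(t_tail, 3), round(phi, 5))
```

The first call gives about 0.021 and the second about 0.90871, matching the hand calculations.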



98

EXERCISE

1. Let (X1, X2, …, Xn) be a random sample of a normal RV X with mean μ and variance 100. Let:

Ho : μ = 50 vs H1 : μ = 55

As a decision test, we use the rule to accept Ho if $\bar{x} < 53$, where $\bar{x}$ is the value of the sample mean.

a) Find the RR.
b) Find α and β for n = 16.

2. Let (X1, X2, …, Xn) be a random sample of a Bernoulli RV X with pmf:

$$p_X(x; p) = p^x (1 - p)^{1 - x}; \qquad x = 0, 1$$

where it is known that 0 < p ≤ 1/2. Let:

Ho : p = 1/2 vs H1 : p = p1 (p1 < 1/2)

and n = 20. As a decision test, we use the rule to reject Ho if

$$\sum_{i=1}^{n} x_i \le 6$$


99

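(a) Find the power function γ(p) of the test.
(b) Find α.
(c) Find β: (i) if p1 = 1/4 and (ii) if p1 = 1/10.

Solutions:

2. Ho : p = 1/2 vs H1 : p = p1 (p1 < 1/2)

X ~ BER(p): $p_X(x) = p^x (1 - p)^{1 - x}; \; x = 0, 1$

a)

$$\gamma(p) = P(\text{reject } H_0 \mid p) = \sum_{k=0}^{6} \binom{20}{k} p^k (1 - p)^{20 - k}; \qquad 0 < p \le \tfrac{1}{2}$$

b)

$$\alpha = \gamma\left(\tfrac{1}{2}\right) = P\left(\text{reject } H_0 \mid p = \tfrac{1}{2}\right) = \sum_{k=0}^{6} \binom{20}{k} \left(\tfrac{1}{2}\right)^k \left(\tfrac{1}{2}\right)^{20 - k}$$

From the table, α = 0.058.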


100

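c)

$$\beta(p_1) = P(\text{accept } H_0 \mid H_1 \text{ is true}) = 1 - P(\text{reject } H_0 \mid p = p_1)$$

$$\beta\left(\tfrac{1}{4}\right) = 1 - \sum_{k=0}^{6} \binom{20}{k} \left(\tfrac{1}{4}\right)^k \left(\tfrac{3}{4}\right)^{20 - k} = 0.2142$$

$$\beta\left(\tfrac{1}{10}\right) = 1 - \sum_{k=0}^{6} \binom{20}{k} \left(\tfrac{1}{10}\right)^k \left(\tfrac{9}{10}\right)^{20 - k} = 0.0024$$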
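The power function of the Bernoulli test (reject Ho when the number of successes in n = 20 trials is at most 6) can be evaluated exactly instead of read from a binomial table. A minimal sketch, with a hypothetical function name `power`:

```python
from math import comb

def power(p, n=20, cutoff=6):
    """gamma(p) = P(sum of X_i <= cutoff) when each X_i ~ Bernoulli(p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(cutoff + 1))

alpha = power(0.5)              # P(reject H0 | p = 1/2)
beta_quarter = 1 - power(0.25)  # P(accept H0 | p = 1/4)
beta_tenth = 1 - power(0.10)    # P(accept H0 | p = 1/10)
print(round(alpha, 3), round(beta_quarter, 4), round(beta_tenth, 4))
```

The exact values are about 0.058, 0.2142 and 0.0024, in line with the table-based answers.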


101

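Let (X1, X2, …, Xn) be a random sample of a normal RV X with mean μ and variance 100. Let:

Ho : μ = 50 vs H1 : μ = 55

As a decision test, we use the rule to accept Ho if $\bar{x} < c$. Find the value of c and the sample size n such that α = 0.025 and β = 0.05.

Solution:

$$R_1 = \{(x_1, x_2, \ldots, x_n) : \bar{x} \ge c\}$$

$$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}) = P(\bar{X} \ge c \mid \mu = 50)$$

$$P\left(Z \ge \frac{c - 50}{10/\sqrt{n}}\right) = 0.025$$

$$P(Z \ge z_\alpha) = 0.025$$

Answer: n = 52, c = 52.718 (derived in the steps that follow).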


102

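$$P(Z \le z_\alpha) = 0.975$$

$$\Phi\left(\frac{c - 50}{10/\sqrt{n}}\right) = 0.975$$

$$\frac{c - 50}{10/\sqrt{n}} = 1.96 \;\Rightarrow\; (c - 50)\sqrt{n} = 19.60$$

$$\beta = P(\text{accept } H_0 \mid H_1 \text{ true}) = P(\bar{X} < c \mid \mu = 55)$$

$$P\left(Z < \frac{c - 55}{10/\sqrt{n}}\right) = 0.05 \;\Rightarrow\; \Phi\left(\frac{c - 55}{10/\sqrt{n}}\right) = 0.05$$

$$\frac{c - 55}{10/\sqrt{n}} = -1.645 \;\Rightarrow\; (c - 55)\sqrt{n} = -16.45$$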


103

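Dividing the two equations eliminates $\sqrt{n}$:

$$\frac{c - 50}{55 - c} = \frac{3.92}{3.29} \;\Rightarrow\; 3.29(c - 50) = 3.92(55 - c)$$

$$3.29c - 164.50 = 215.60 - 3.92c$$

$$7.21c = 380.10$$

$$c = \frac{38010}{721} = 52.7184466$$

$$c \approx 52.718$$

From $(c - 50)\sqrt{n} = 19.60$:

$$\sqrt{n} = \frac{19.60}{2.718} = \frac{19600}{2718} = 7.211$$

$$n = 51.998 \approx 52$$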
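The two conditions on α and β can also be solved in closed form. The sketch below assumes the setup of this problem (normal population, known σ, reject when the sample mean is at least c); the function name `cutoff_and_sample_size` is made up, and it uses unrounded normal quantiles, so intermediate values differ slightly from the hand calculation.

```python
from math import ceil, sqrt
from statistics import NormalDist

def cutoff_and_sample_size(mu0, mu1, sigma, alpha, beta):
    """Solve (c - mu0)*sqrt(n)/sigma = z_alpha and (c - mu1)*sqrt(n)/sigma = -z_beta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    sqrt_n = (z_alpha + z_beta) * sigma / (mu1 - mu0)
    c = mu0 + z_alpha * sigma / sqrt_n
    return c, sqrt_n ** 2

c, n_exact = cutoff_and_sample_size(50, 55, 10, alpha=0.025, beta=0.05)
n = ceil(n_exact)  # round up to the next whole observation
print(round(c, 3), round(n_exact, 2), n)
```

This gives a cutoff near 52.72 and a sample size that rounds up to 52, agreeing with the worked answer.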


104

Let (X1, X2, …, Xn) be a random sample of a normal RV X with mean μ and variance 36. Let:

Ho : μ = 50 vs H1 : μ = 55

As a decision test, we use the rule to accept Ho if $\bar{x} < 53$, where $\bar{x}$ is the value of the sample mean.

a) Find the expression for the critical region/rejection region R1.
b) Find α and β for n = 16.

Solution:

a) $R_1 = \{(x_1, x_2, \ldots, x_n) : \bar{x} \ge 53\}$, where $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$

b)

$$\alpha = P(\bar{X} \ge 53 \mid \mu = 50) = P(Z \ge 2) = 1 - \Phi(2) = 1 - 0.9772 = 0.0228$$

105

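$$\beta = P(\text{accept } H_0 \mid H_1 \text{ true}) = P(\bar{X} < 53 \mid \mu = 55)$$

$$= P(Z < -1.333) = \Phi(-1.333) = 1 - \Phi(1.333)$$

Φ(1.333) is not tabulated, so interpolate with x1 < xo < x2:

         x1       xo       x2
x        1.330    1.333    1.340
Φ(x)     0.9082   ?        0.9099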


106

$$f(1.333) \approx 0.9082 + \frac{0.9099 - 0.9082}{1.340 - 1.330}(1.333 - 1.330)$$

$$= 0.9082 + \frac{0.0017}{0.0100}(0.003)$$

$$f(1.333) \approx 0.9082 + 0.00051 = 0.90871$$

$$\beta = 1 - \Phi(1.333) = 1 - 0.9087 = 0.0913$$

Date post:	02-Jan-2016
Upload:	bianca-carrillo