Date posted: 02-Jan-2016
Category: Documents
Uploaded by: bianca-carrillo
NONPARAMETRIC STATISTICS
SWN SCIENCE DEPARTMENT
NON PARAMETRIC TEST
The majority of hypothesis tests discussed so far have made inferences about population parameters, such as the mean and the proportion. These parametric tests have used the parametric statistics of samples that came from the population being tested.
To formulate these tests, we made restrictive assumptions about the populations from which we drew our samples. For example, we assumed that our samples either were large or came from normally distributed populations. But populations are not always normal.
And even if a goodness-of-fit test indicates that a population is approximately normal, we cannot always be sure we’re right, because the test is not 100 percent reliable.
Fortunately, in recent times statisticians have developed useful techniques that do not make restrictive assumptions about the shape of the population distribution.
These are known as distribution-free or, more commonly, nonparametric tests.
Nonparametric statistical procedures are often used in preference to their parametric counterparts.
The hypotheses of a nonparametric test are concerned with something other than the value of a population parameter.
A large number of these tests exist, but this section will examine only a few of the better known and more widely used ones:
NON PARAMETRIC TESTS
SIGN TEST
WILCOXON SIGNED RANK TEST
MANN – WHITNEY TEST (WILCOXON RANK SUM TEST)
RUN TEST
KRUSKAL – WALLIS TEST
KOLMOGOROV – SMIRNOV TEST
LILLIEFORS TEST
THE SIGN TEST
The sign test is used to test hypotheses about the median $\tilde{\mu}$ of a continuous distribution. The median of a distribution is a value of the random variable X such that the probability is 0,5 that an observed value of X is less than or equal to the median, and the probability is 0,5 that an observed value of X is greater than or equal to the median. That is,
$P(X \le \tilde{\mu}) = P(X \ge \tilde{\mu}) = 0,5$
Since the normal distribution is symmetric, the mean of a normal distribution equals the median. Therefore, the sign test can be used to test hypotheses about the mean of a normal distribution.
Let X denote a continuous random variable with median $\tilde{\mu}$ and let $X_1, X_2, \ldots, X_n$ denote a random sample of size n from the population of interest. If $\tilde{\mu}_0$ denotes the hypothesized value of the population median, then the usual forms of the hypotheses to be tested can be stated as follows:
$H_0: \tilde{\mu} = \tilde{\mu}_0$
VERSUS
$H_1: \tilde{\mu} > \tilde{\mu}_0$ (right-tailed test)
$H_1: \tilde{\mu} < \tilde{\mu}_0$ (left-tailed test)
$H_1: \tilde{\mu} \ne \tilde{\mu}_0$ (two-tailed test)
Form the differences $X_i - \tilde{\mu}_0$, i = 1, 2, …, n. Now if the null hypothesis $H_0: \tilde{\mu} = \tilde{\mu}_0$ is true, any difference $X_i - \tilde{\mu}_0$ is equally likely to be positive or negative. An appropriate test statistic is the number of these differences that are positive, say $R^+$. Therefore, to test the null hypothesis we are really testing that the number of plus signs is a value of a Binomial random variable that has the parameter p = 0,5. A p-value for the observed number of plus signs $r^+$ can be calculated directly from the Binomial distribution. Thus, for the alternative $H_1: \tilde{\mu} < \tilde{\mu}_0$, if the computed p-value
$P = P(R^+ \le r^+ \text{ when } p = 0,5)$
is less than or equal to some preselected significance level α, we will reject $H_0$ and conclude that $H_1$ is true.
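To test the other one-sided hypothesis,
$H_0: \tilde{\mu} = \tilde{\mu}_0$ vs $H_1: \tilde{\mu} > \tilde{\mu}_0$,
let $R^+$ be the number of positive differences and $r^+$ its observed value. If the p-value $P = P(R^+ \ge r^+ \text{ when } p = 0,5)$ is less than or equal to α, we will reject $H_0$. The two-sided alternative may also be tested. If the hypotheses are
$H_0: \tilde{\mu} = \tilde{\mu}_0$ vs $H_1: \tilde{\mu} \ne \tilde{\mu}_0$,
the p-value is
$P = 2P(R^+ \le r^+ \text{ when } p = 0,5)$ if $r^+ < n/2$, and $P = 2P(R^+ \ge r^+ \text{ when } p = 0,5)$ if $r^+ > n/2$.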
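It is also possible to construct a table of critical values for the sign test. As before, let $R^+$ denote the number of the differences that are positive and let $R^-$ denote the number of the differences that are negative. Let $R = \min(R^+, R^-)$. A table of critical values $r^*_\alpha$ for the sign test ensures that
$P(\text{type I error}) = P(\text{reject } H_0 \text{ when } H_0 \text{ is true}) = \alpha$
If the observed value of the test statistic satisfies $r \le r^*_\alpha$, then the null hypothesis $H_0: \tilde{\mu} = \tilde{\mu}_0$ should be rejected and $H_1: \tilde{\mu} \ne \tilde{\mu}_0$ accepted.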
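If the alternative is $H_1: \tilde{\mu} > \tilde{\mu}_0$, then reject $H_0$ if $r^- \le r^*_\alpha$. If the alternative is $H_1: \tilde{\mu} < \tilde{\mu}_0$, then reject $H_0$ if $r^+ \le r^*_\alpha$. Here $r^+$ and $r^-$ are the observed numbers of positive and negative differences and $r^*_\alpha$ is the tabulated critical value. The level of significance of a one-sided test is one-half the value for a two-sided test.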
TIES in the SIGN TEST
Since the underlying population is assumed to be continuous, there is theoretically a zero probability that we will find a “tie”, that is, a value of $X_i$ exactly equal to $\tilde{\mu}_0$. In practice ties can still occur because measurements are rounded. When ties occur, they should be set aside and the sign test applied to the remaining data.
THE NORMAL APPROXIMATION
When p = 0,5, the Binomial distribution is well approximated by a normal distribution when n is at least 10. Thus, since the mean of the Binomial is np and the variance is np(1 - p), the distribution of $R^+$ (the number of positive differences) is approximately normal with mean 0,5n and variance 0,25n whenever n is moderately large. Therefore, in these cases the null hypothesis can be tested using the statistic:
$Z_0 = \dfrac{R^+ - 0,5n}{0,5\sqrt{n}}$
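Critical Regions/Rejection Regions for α-level tests of
$H_0: \tilde{\mu} = \tilde{\mu}_0$ versus $H_1$
are given in this table, where $z_0$ is the observed value of the normal-approximation statistic:
CRITICAL/REJECTION REGIONS FOR $H_0: \tilde{\mu} = \tilde{\mu}_0$
Alternative                                CR/RR
$H_1: \tilde{\mu} \ne \tilde{\mu}_0$       $|z_0| > z_{\alpha/2}$
$H_1: \tilde{\mu} > \tilde{\mu}_0$         $z_0 > z_\alpha$
$H_1: \tilde{\mu} < \tilde{\mu}_0$         $z_0 < -z_\alpha$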
THE WILCOXON SIGNED-RANK TEST
The sign test makes use only of the plus and minus signs of the differences between the observations and the median (or the plus and minus signs of the differences between the observations in the paired case). Frank Wilcoxon devised a test procedure that uses both direction (sign) and magnitude. This procedure is now called the Wilcoxon signed-rank test. The Wilcoxon signed-rank test applies to the case of symmetric continuous distributions. Under these assumptions, the mean equals the median.
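Description of the test:
We are interested in testing
$H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$
Assume that $X_1, X_2, \ldots, X_n$ is a random sample from a continuous and symmetric distribution with mean (and median) μ. Compute the differences $X_i - \mu_0$, i = 1, 2, …, n. Rank the absolute differences $|X_i - \mu_0|$ in ascending order, and then give the ranks the signs of their corresponding differences. Let $W^+$ be the sum of the positive ranks, let $W^-$ be the absolute value of the sum of the negative ranks, and let $W = \min(W^+, W^-)$.
Critical values of W, say $w^*_\alpha$, are tabulated.
1. If $H_1: \mu \ne \mu_0$, reject $H_0$ if the observed value of the statistic satisfies $w \le w^*_\alpha$.
2. If $H_1: \mu > \mu_0$, reject $H_0$ if $w^- \le w^*_\alpha$.
3. If $H_1: \mu < \mu_0$, reject $H_0$ if $w^+ \le w^*_\alpha$.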
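LARGE SAMPLE APPROXIMATION
If the sample size is moderately large (n > 20), then it can be shown that $W^+$ (or $W^-$), the sum of the positive (or negative) ranks, has approximately a normal distribution with mean
$\mu_{W^+} = \dfrac{n(n+1)}{4}$
and variance
$\sigma^2_{W^+} = \dfrac{n(n+1)(2n+1)}{24}$
Therefore, a test of $H_0: \mu = \mu_0$ can be based on the statistic
$Z_0 = \dfrac{W^+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$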
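Wilcoxon Signed-Rank Test
Test statistic: $W^+$, the sum of the ranks of the positive differences.
Theorem: The probability distribution of $W^+$ when $H_0$ is true, which is based on a random sample of size n, satisfies:
$E(W^+) = \dfrac{n(n+1)}{4}$ and $\mathrm{Var}(W^+) = \dfrac{n(n+1)(2n+1)}{24}$
Proof:
Let $U_i = 1$ if the difference whose absolute value has rank i is positive, and $U_i = 0$ otherwise; then
$W^+ = \sum_{i=1}^{n} i\,U_i$, where $U_1, \ldots, U_n$ are independent.
For a given rank i, the discrepancy has a 50 : 50 chance of being “+” or “-”. Hence $E(U_i) = 1/2$ and $\mathrm{Var}(U_i) = 1/4$, so that
$E(W^+) = \sum_{i=1}^{n} \dfrac{i}{2} = \dfrac{n(n+1)}{4}$ and $\mathrm{Var}(W^+) = \sum_{i=1}^{n} \dfrac{i^2}{4} = \dfrac{n(n+1)(2n+1)}{24}$,
where $\sum_{i=1}^{n} i = n(n+1)/2$ and $\sum_{i=1}^{n} i^2 = n(n+1)(2n+1)/6$.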
PAIRED OBSERVATIONS
The Wilcoxon signed-rank test can be applied to paired data. Let $(X_{1j}, X_{2j})$, j = 1, 2, …, n be a collection of paired observations from two continuous distributions that differ only with respect to their means. The distribution of the differences $D_j = X_{1j} - X_{2j}$ is continuous and symmetric. The null hypothesis is $H_0: \mu_1 = \mu_2$, which is equivalent to $H_0: \mu_D = 0$. To use the Wilcoxon signed-rank test, the differences are first ranked in ascending order of their absolute values, and then the ranks are given the signs of the differences.
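Let $W^+$ be the sum of the positive ranks and $W^-$ be the absolute value of the sum of the negative ranks, and $W = \min(W^+, W^-)$. If the observed value satisfies $w \le w^*_\alpha$, where $w^*_\alpha$ is the tabulated critical value, then $H_0: \mu_1 = \mu_2$ is rejected and $H_1: \mu_1 \ne \mu_2$ accepted. If $H_1: \mu_1 > \mu_2$, then reject $H_0$ if $w^- \le w^*_\alpha$. If $H_1: \mu_1 < \mu_2$, reject $H_0$ if $w^+ \le w^*_\alpha$.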
EXAMPLE
Eleven students were randomly selected from a large statistics class, and their numerical grades on two successive examinations were recorded.
Use the Wilcoxon signed-rank test to determine whether the second test was more difficult than the first. Use α = 0,1.
Student Test 1 Test 2 Difference Rank Signed Rank
1 94 85 9 8 8
2 78 65 13 10 10
3 89 92 -3 4 -4
4 62 56 6 7 7
5 49 52 -3 4 -4
6 78 74 4 6 6
7 80 79 1 1 1
8 82 84 -2 2 -2
9 62 48 14 11 11
10 83 71 12 9 9
11 79 82 -3 4 -4
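Solution:
Sum of the positive ranks: 8 + 10 + 7 + 6 + 1 + 11 + 9 = 52.
Using the normal approximation with n = 11, the mean is n(n+1)/4 = 33 and the variance is n(n+1)(2n+1)/24 = 126,5, so $z = (52 - 33)/\sqrt{126,5} = 1,69$.
Since z = 1,69 > $z_{0,1}$ = 1,28, REJECT H0: the second test appears to have been more difficult than the first.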
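The signed-rank computation for the exam-grade example can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the helper name `average_ranks` is hypothetical, and the normal approximation is applied with n = 11 only because the slide's solution does so.

```python
import math

def average_ranks(values):
    """1-based ascending ranks; tied values share the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

test1 = [94, 78, 89, 62, 49, 78, 80, 82, 62, 83, 79]
test2 = [85, 65, 92, 56, 52, 74, 79, 84, 48, 71, 82]

# Differences Test 1 - Test 2 (this data set has no zero differences)
diffs = [a - b for a, b in zip(test1, test2)]
ranks = average_ranks([abs(d) for d in diffs])

w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)   # sum of positive ranks
w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)  # |sum of negative ranks|

n = len(diffs)
mean_w = n * (n + 1) / 4
var_w = n * (n + 1) * (2 * n + 1) / 24
z = (w_plus - mean_w) / math.sqrt(var_w)

print(w_plus, w_minus, round(z, 2))
```

The printed rank sums should match the table (52 and 14), and z can be compared with the critical value 1,28 used on the slide.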
EXAMPLE
Ten newly married couples were randomly selected, and each husband and wife were independently asked the question of how many children they would like to have. The following information was obtained.
COUPLE 1 2 3 4 5 6 7 8 9 10
WIFE X 3 2 1 0 0 1 2 2 2 0
HUSBAND Y 2 3 2 2 0 2 1 3 1 2
Using the sign test, is there reason to believe that wives want fewer children than husbands? Assume a maximum size of type I error of 0,05.
SOLUTION
First state H0 and H1, where p is the probability of a + sign (X - Y > 0):
H0 : p = 0,5
vs H1 : p < 0,5
Couple 5 is a tie (X = Y) and is set aside, leaving n = 9 couples.
Couple 1 2 3 4 6 7 8 9 10
Sign + - - - - + - + -
There are three + signs. Under H0, S ~ BIN(9, 1/2), where S is the number of + signs.
P(S ≤ 3) = 0,2539. At level α = 0,05, because 0,2539 > 0,05, H0 is not rejected.
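The sign-test calculation for the couples example can be reproduced with a short Python sketch. It is an illustration only; the variable names are chosen here and do not come from the slides.

```python
from math import comb

wife =    [3, 2, 1, 0, 0, 1, 2, 2, 2, 0]
husband = [2, 3, 2, 2, 0, 2, 1, 3, 1, 2]

# Ties (X == Y) are set aside before counting signs
diffs = [x - y for x, y in zip(wife, husband) if x != y]
n = len(diffs)                 # number of couples kept
s = sum(d > 0 for d in diffs)  # number of + signs

# Lower-tail p-value P(S <= s) for S ~ BIN(n, 1/2), matching H1: p < 0.5
p_value = sum(comb(n, k) for k in range(s + 1)) / 2 ** n

print(n, s, round(p_value, 4))
```

Because the p-value exceeds 0,05, the sketch leads to the same decision as the slide: H0 is not rejected.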
THE WILCOXON RANK-SUM TEST
Suppose that we have two independent continuous populations X1 and X2 with means µ1 and µ2. Assume that the distributions of X1 and X2 have the same shape and spread, and differ only (possibly) in their means. The Wilcoxon rank-sum test can be used to test the hypothesis H0 : µ1 = µ2. This procedure is sometimes called the Mann-Whitney test or Mann-Whitney U test.
Description of the Test
Let $X_{11}, X_{12}, \ldots, X_{1n_1}$ and $X_{21}, X_{22}, \ldots, X_{2n_2}$ be two independent random samples of sizes $n_1 \le n_2$ from the continuous populations X1 and X2. We wish to test the hypotheses:
H0 : µ1 = µ2
versus H1 : µ1 ≠ µ2
The test procedure is as follows. Arrange all n1 + n2 observations in ascending order of magnitude and assign ranks to them. If two or more observations are tied, then use the mean of the ranks that would have been assigned if the observations differed.
Let W1 be the sum of the ranks in the smaller sample (1), and define W2 to be the sum of the ranks in the other sample. Then,
$W_2 = \dfrac{(n_1 + n_2)(n_1 + n_2 + 1)}{2} - W_1$
Now if the sample means do not differ, we will expect the sum of the ranks to be nearly equal for both samples after adjusting for the difference in sample size. Consequently, if the sums of the ranks differ greatly, we will conclude that the means are not equal. The critical value wα can be obtained from a table for the appropriate sample sizes n1 and n2.
H0 : µ1 = µ2 is rejected in favour of H1 : µ1 ≠ µ2 if either of the observed values w1 or w2 is less than or equal to wα.
If H1 : µ1 < µ2, then reject H0 if w1 ≤ wα.
For H1 : µ1 > µ2, reject H0 if w2 ≤ wα.
LARGE-SAMPLE APPROXIMATION
When both n1 and n2 are moderately large, say, greater than 8, the distribution of W1 can be well approximated by the normal distribution with mean:
$\mu_{W_1} = \dfrac{n_1(n_1 + n_2 + 1)}{2}$
and variance:
$\sigma^2_{W_1} = \dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}$
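Therefore, for n1 and n2 > 8, we could use:
$Z_0 = \dfrac{W_1 - \mu_{W_1}}{\sigma_{W_1}}$
as a test statistic, where $\mu_{W_1}$ and $\sigma_{W_1}$ are the mean and standard deviation of W1 under H0, and the critical region is:
two-tailed test: $|z_0| > z_{\alpha/2}$
upper-tail test: $z_0 > z_\alpha$
lower-tail test: $z_0 < -z_\alpha$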
EXAMPLE
A large corporation is suspected of sex-discrimination in the salaries of its employees. From employees with similar responsibilities and work experience, 12 male and 12 female employees were randomly selected; their annual salaries in thousands of dollars are as follows:
Females 22,5 19,8 20,6 24,7 23,2 19,2 18,7 20,9 21,6 23,5 20,7 21,6
Males 21,9 21,6 22,4 24,0 24,1 23,4 21,2 23,9 20,5 24,5 22,3 23,6
Is there reason to believe that these random samples come from populations with different distributions? Use α = 0,05.
SOLUTION
H0 : f1(x) = f2(x), that is, the random samples come from populations with the same distribution
H1 : f1(x) ≠ f2(x)
Combine the two samples and rank the salaries:
SEX SALARY RANK
F 18,7 1
F 19,2 2
F 19,8 3
M 20,5 4
F 20,6 5
F 20,7 6
F 20,9 7
M 21,2 8
M 21,6 10
F 21,6 10
F 21,6 10
M 21,9 12
M 22,3 13
M 22,4 14
F 22,5 15
F 23,2 16
M 23,4 17
F 23,5 18
M 23,6 19
M 23,9 20
M 24,0 21
M 24,1 22
M 24,5 23
F 24,7 24
Suppose we choose the female sample; then its rank sum is R1 = RF = 117.
Statistic:
$U = n_1 n_2 + \dfrac{n_1(n_1 + 1)}{2} - R_1$
The value of the statistic U is
$U = (12)(12) + \dfrac{12(13)}{2} - 117 = 105$
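With α = 0,05 the two-tailed critical values are -1,96 and 1,96. Under H0 the statistic U = 105 has mean n1n2/2 = 72 and variance n1n2(n1 + n2 + 1)/12 = 300, so Zcal = (105 - 72)/√300 = 1,91.
Since -1,96 < 1,91 < 1,96, accept H0: there is not enough evidence that the two salary distributions differ.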
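The ranking and the large-sample statistic for the salary example can be checked with the sketch below. It is an illustration under stated assumptions: the helper `average_ranks` is a hypothetical name, and the U formula used is the form that reproduces the slide's positive value Zcal = 1,91.

```python
import math

def average_ranks(values):
    """1-based ascending ranks; tied values share the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

females = [22.5, 19.8, 20.6, 24.7, 23.2, 19.2, 18.7, 20.9, 21.6, 23.5, 20.7, 21.6]
males = [21.9, 21.6, 22.4, 24.0, 24.1, 23.4, 21.2, 23.9, 20.5, 24.5, 22.3, 23.6]

n1, n2 = len(females), len(males)
ranks = average_ranks(females + males)   # ranks in the combined sample
r_f = sum(ranks[:n1])                    # rank sum of the female sample

u = n1 * n2 + n1 * (n1 + 1) / 2 - r_f    # Mann-Whitney U from the rank sum
mean_u = n1 * n2 / 2
var_u = n1 * n2 * (n1 + n2 + 1) / 12
z = (u - mean_u) / math.sqrt(var_u)

print(r_f, u, round(z, 2))
```

The three salaries equal to 21,6 share rank 10, as in the table, and |z| stays below 1,96.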
KOLMOGOROV – SMIRNOV TEST
The Kolmogorov-Smirnov (K-S) test is conducted by comparing the hypothesized and sample cumulative distribution functions. A cumulative distribution function is defined as $F(x) = P(X \le x)$, and the sample cumulative distribution function, S(x), is defined as the proportion of sample values that are less than or equal to x. The K-S test should be used instead of the chi-square test to determine if a sample is from a specified continuous distribution. To illustrate how S(x) is computed, suppose we have the following 10 observations:
110, 89, 102, 80, 93, 121, 108, 97, 105, 103.
We begin by placing the values of x in ascending order, as follows:
80, 89, 93, 97, 102, 103, 105, 108, 110, 121.
Because x = 80 is the smallest of the 10 values, the proportion of values of x that are less than or equal to 80 is S(80) = 0,1.
x S(x)
80 0,1
89 0,2
93 0,3
97 0,4
102 0,5
103 0,6
105 0,7
108 0,8
110 0,9
121 1,0
The test statistic D is the maximum absolute difference between the two cdf’s over all observed values. The range of D is 0 ≤ D ≤ 1, and the formula is:
$D = \max_x |F(x) - S(x)|$
where x = each observed value, S(x) = observed cdf at x, and F(x) = hypothesized cdf at x.
Let X(1), X(2), …, X(n) denote the ordered observations of a random sample of size n, and define the sample cdf as:
$S_n(x) = 0$ for $x < X_{(1)}$; $S_n(x) = i/n$ for $X_{(i)} \le x < X_{(i+1)}$; $S_n(x) = 1$ for $x \ge X_{(n)}$
$S_n(x)$ is the proportion of the number of sample values less than or equal to x.
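The Kolmogorov-Smirnov statistic $D_n$ is defined to be:
$D_n = \max_x |F(x) - S_n(x)|$
For the size α of type I error, the critical region is of the form:
reject $H_0$ if $D_n > D_{n,\alpha}$, where $D_{n,\alpha}$ is the tabulated critical value.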
EXAMPLE 1
A state vehicle inspection station has been designed so that inspection time follows a uniform distribution with limits of 10 and 15 minutes. A sample of 10 duration times during low and peak traffic conditions was taken. Use the K-S test with α = 0,05 to determine if the sample is from this uniform distribution. The times are:
11,3 10,4 9,8 12,6 14,8 13,0 14,3 13,3 11,5 13,6
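SOLUTION
1. H0 : the sample comes from the Uniform(10, 15) distribution
versus H1 : the sample does not come from the Uniform(10, 15) distribution
2. The sample cumulative distribution function S(x) is computed from the ordered observations, and the hypothesized cdf is $F(x) = (x - 10)/5$ for $10 \le x \le 15$ (with F(x) = 0 below 10 and F(x) = 1 above 15).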
Observation time x    S(x)    F(x)    |F(x) - S(x)|
9,8 0,10 0,00 0,10
10,4 0,20 0,08 0,12
11,3 0,30 0,26 0,04
11,5 0,40 0,30 0,10
12,6 0,50 0,52 0,02
13,0 0,60 0,60 0,00
13,3 0,70 0,66 0,04
13,6 0,80 0,72 0,08
14,3 0,90 0,86 0,04
14,8 1,00 0,96 0,04
Results of the K-S calculation
$D = \max |F(x) - S(x)| = 0,12$, attained at x = 10,4. From the table, with n = 10 and α = 0,05, $D_{10;\,0,05} = 0,41$, where the critical value $D_0$ satisfies α = P(D ≥ D0).
Since 0,12 < 0,41, do not reject H0.
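The statistic D for the inspection-time example can be computed with the following sketch. It follows the slides' definition (the maximum of |F(x) - S(x)| over the observed values); some texts additionally compare F(x) with the sample cdf just below each observation. The function names `ks_statistic` and `uniform_cdf` are chosen here for illustration.

```python
def ks_statistic(sample, cdf):
    """Max |S(x) - F(x)| over observed values, with S(x) = share of values <= x."""
    n = len(sample)
    return max(abs(sum(v <= x for v in sample) / n - cdf(x)) for x in sample)

def uniform_cdf(x, low=10.0, high=15.0):
    """Hypothesized Uniform(low, high) cdf, clipped to [0, 1]."""
    return min(max((x - low) / (high - low), 0.0), 1.0)

times = [11.3, 10.4, 9.8, 12.6, 14.8, 13.0, 14.3, 13.3, 11.5, 13.6]
d = ks_statistic(times, uniform_cdf)

print(round(d, 2))  # compare with the tabulated critical value 0.41
```

Passing a different `cdf` function (for example a normal cdf with known µ and σ) reuses the same statistic for other hypothesized distributions.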
EXAMPLE 2
Suppose we want to test whether the following ten observations, 110, 89, 102, 80, 93, 121, 108, 97, 105, 103, were drawn from a normal distribution with mean µ = 100 and standard deviation σ = 10. Our hypotheses for this test are
H0 : Data were drawn from a normal distribution, with µ = 100 and σ = 10.
versus
H1 : Data were not drawn from a normal distribution, with µ = 100 and σ = 10.
SOLUTION
F(x) = P(X ≤ x) is computed for each observed x:
P(X ≤ 80) = P(Z ≤ -2) = 0,0228
P(X ≤ 89) = P(Z ≤ -1,1) = 0,1357
P(X ≤ 93) = P(Z ≤ -0,7) = 0,2420
P(X ≤ 97) = P(Z ≤ -0,3) = 0,3821
P(X ≤ 102) = P(Z ≤ 0,2) = 0,5793
P(X ≤ 103) = P(Z ≤ 0,3) = 0,6179
P(X ≤ 105) = P(Z ≤ 0,5) = 0,6915
P(X ≤ 108) = P(Z ≤ 0,8) = 0,7881
P(X ≤ 110) = P(Z ≤ 1,0) = 0,8413
P(X ≤ 121) = P(Z ≤ 2,1) = 0,9821
x F(x) S(x) |F(x) - S(x)|
80 0,0228 0,1 0,0772
89 0,1357 0,2 0,0643
93 0,2420 0,3 0,0580
97 0,3821 0,4 0,0179
102 0,5793 0,5 0,0793 = D
103 0,6179 0,6 0,0179
105 0,6915 0,7 0,0085
108 0,7881 0,8 0,0119
110 0,8413 0,9 0,0587
121 0,9821 1,0 0,0179
If α = 0,05, then the critical value for n = 10 obtained from the table is 0,409.
The decision rule is: reject H0 if D > 0,409.
Because D = 0,0793 < 0,409, do not reject H0.
That is, the data are consistent with a normal distribution with µ = 100 and σ = 10.
LILLIEFORS TEST
In most applications where we want to test for normality, the population mean and the population variance are unknown. In order to perform the K-S test, however, we must assume that those parameters are known. The Lilliefors test is quite similar to the K-S test. The major difference between the two tests is that, with the Lilliefors test, the sample mean $\bar{x}$ and the sample standard deviation s are used instead of µ and σ to calculate F(x).
EXAMPLE
A manufacturer of automobile seats has a production line that produces an average of 100 seats per day. Because of new government regulations, a new safety device has been installed, which the manufacturer believes will reduce average daily output. A random sample of 15 days’ output after the installation of the safety device is shown:
93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95
The daily production was assumed to be normally distributed. Use the Lilliefors test to examine that assumption, with α = 0,01.
SOLUTION
As in the K-S test, to compute S(x) sort the observations as follows:
x S(x)
88 1/15 = 0,067
91 2/15 = 0,133
92 3/15 = 0,200
93 4/15 = 0,267
94 6/15 = 0,400
95 8/15 = 0,533
96 9/15 = 0,600
98 10/15 = 0,667
101 13/15 = 0,867
103 14/15 = 0,933
105 15/15 = 1,000
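From the data above, we obtain $\bar{x} = 96,47$ and s = 4,85.
Next, F(x) is computed as follows:
x F(x)
88 P(X ≤ 88) = P(Z ≤ (88 - 96,47)/4,85) = P(Z ≤ -1,75) = 0,0401
91 P(X ≤ 91) = P(Z ≤ (91 - 96,47)/4,85) = P(Z ≤ -1,13) = 0,1292
92 P(X ≤ 92) = P(Z ≤ (92 - 96,47)/4,85) = P(Z ≤ -0,92) = 0,1788
…
101 P(X ≤ 101) = P(Z ≤ (101 - 96,47)/4,85) = P(Z ≤ 0,93) = 0,8238
103 P(X ≤ 103) = P(Z ≤ (103 - 96,47)/4,85) = P(Z ≤ 1,35) = 0,9115
105 P(X ≤ 105) = P(Z ≤ (105 - 96,47)/4,85) = P(Z ≤ 1,76) = 0,9608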
Finally, summarize the results in the table below. From the table of critical values of the Lilliefors test, with α = 0,01 and n = 15, Dtab = 0,257. Since D = 0,1509 < 0,257, accept H0.
x F(x) S(x) |F(x) - S(x)|
88 0,0401 0,067 0,0269
91 0,1292 0,133 0,0038
92 0,1788 0,200 0,0212
93 0,2358 0,267 0,0312
94 0,3050 0,400 0,0950
95 0,3821 0,533 0,1509 = D
96 0,4602 0,600 0,1398
98 0,6255 0,667 0,0415
101 0,8238 0,867 0,0432
103 0,9115 0,933 0,0215
105 0,9608 1,000 0,0392
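A sketch of the Lilliefors calculation for the daily-output data is given below, using Python's standard `statistics` module. It is illustrative only: it uses unrounded values of the sample mean and standard deviation and an exact normal cdf, so D comes out close to, but not identical with, the table-based 0,1509.

```python
from statistics import NormalDist, mean, stdev

output = [93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95]

xbar = mean(output)   # sample mean replaces mu
s = stdev(output)     # sample standard deviation (n - 1 divisor) replaces sigma
fitted = NormalDist(xbar, s)

n = len(output)
# S(x) = share of observations <= x; D = max |F(x) - S(x)| over distinct x
d = max(abs(sum(v <= x for v in output) / n - fitted.cdf(x)) for x in set(output))

print(round(xbar, 2), round(s, 2), round(d, 3))
```

The resulting D (about 0,15) is still well below the critical value 0,257, so the decision on the slide is unchanged.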
TEST BASED ON RUNS
Usually a sample that is taken from a population should be random. The runs test evaluates the null hypothesis
H0 : the order of the sample data is random
The alternative hypothesis is simply the negation of H0. There is no comparable parametric test to evaluate this null hypothesis. The order in which the data are collected must be retained so that the runs may be developed.
DEFINITIONS:
1. A run is defined as a sequence of the same symbols. Two symbols are defined, and each sequence must contain a symbol at least once.
2. A run of length j is defined as a sequence of j observations, all belonging to the same group, that is preceded or followed by observations belonging to a different group.
For illustration, the ordered sequence by the sex of the employee is as follows:
F F F M F F F M M F F M M M F F M F M M M M M F
For the sex of the employee the ordered sequence exhibits runs of F’s and M’s.
The sequence begins with a run of length three, followed by a run of length one, followed by another run of length three, and so on. The total number of runs in this sequence is 11. Let R be the total number of runs observed in an ordered sequence of n1 + n2 observations, where n1 and n2 are the respective sample sizes. The possible values of R are 2, 3, 4, …, (n1 + n2).
The only question to ask prior to performing the test is: is the sample size small or large? We will use the guideline that a small sample has n1 and n2 less than or equal to 15.
The table gives the lower rL and upper rU values of the distribution f(r) with α/2 = 0,025 in each tail.
If n1 or n2 exceeds 15, the sample is considered large, in which case a normal approximation to f(r) is used to test H0 versus H1.
[Figure: distribution f(r) with the acceptance region (AR) between rL and rU]
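The mean and variance of R are determined to be
$\mu_R = \dfrac{2 n_1 n_2}{n_1 + n_2} + 1$ and $\sigma^2_R = \dfrac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$
The normal approximation uses the statistic
$Z = \dfrac{R - \mu_R}{\sigma_R}$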
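Counting runs and evaluating the normal approximation can be sketched as follows for the F/M sequence given earlier. This is only an illustration: with n1 = n2 = 12 the slides' guideline would use the small-sample table, and the mean and variance are the standard formulas for the number of runs.

```python
import math
from itertools import groupby

sequence = "FFFMFFFMMFFMMMFFMFMMMMMF"  # the F/M sequence from the slides

runs = sum(1 for _ in groupby(sequence))  # each group of equal symbols is one run
n1 = sequence.count("F")
n2 = sequence.count("M")

mean_r = 2 * n1 * n2 / (n1 + n2) + 1
var_r = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
z = (runs - mean_r) / math.sqrt(var_r)

print(runs, n1, n2, mean_r, round(z, 2))
```

A value of z far from zero in either direction would cast doubt on the randomness of the order; here |z| is below 1,96.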
THE KRUSKAL - WALLIS H TEST
The Kruskal-Wallis H test is the nonparametric equivalent of the Analysis of Variance F test. It tests the null hypothesis that all k populations possess the same probability distribution against the alternative hypothesis that the distributions differ in location, that is, one or more of the distributions are shifted to the right or left of each other. The advantage of the Kruskal-Wallis H test over the F test is that we need make no assumptions about the nature of the sampled populations. A completely randomized design specifies that we select independent random samples of n1, n2, …, nk observations from the k populations.
To conduct the test, we first rank all n = n1 + n2 + n3 + … + nk observations and compute the rank sums R1, R2, …, Rk for the k samples.
The ranks of tied observations are averaged in the same manner as for the Wilcoxon rank-sum test. Then, if H0 is true, and if the sample sizes n1, n2, …, nk each equal 5 or more, the test statistic defined by
$H = \dfrac{12}{n(n+1)} \sum_{i=1}^{k} \dfrac{R_i^2}{n_i} - 3(n+1)$
will have a sampling distribution that can be approximated by a chi-square distribution with (k-1) degrees of freedom. Large values of H imply rejection of H0.
Therefore, the rejection region for the test is $H > \chi^2_\alpha$, where $\chi^2_\alpha$ is the value that locates α in the upper tail of the chi-square distribution.
The test is summarized in the following:
KRUSKAL – WALLIS H TEST FOR COMPARING k POPULATION PROBABILITY DISTRIBUTIONS
H0 : The k population probability distributions are identical
H1 : At least two of the k population probability distributions differ in location
Test statistic:
$H = \dfrac{12}{n(n+1)} \sum_{i=1}^{k} \dfrac{R_i^2}{n_i} - 3(n+1)$
where ni = Number of measurements in sample i
Ri = Rank sum for sample i, where the rank of each measurement is computed according to its relative magnitude in the totality of data for the k samples.
n = Total sample size = n1 + n2 + … + nk
Rejection Region: $H > \chi^2_\alpha$ with (k-1) dof
Assumptions:
1. The k samples are random and independent
2. There are 5 or more measurements in each sample
3. The observations can be ranked
No assumptions have to be made about the shape of the population probability distributions.
Example
Independent random samples of three different brands of magnetron tubes (the key components in microwave ovens) were subjected to stress testing, and the number of hours each operated without repair was recorded. Although these times do not represent typical life lengths, they do indicate how well the tubes can withstand extreme stress. The data are shown in the table below. Experience has shown that the distributions of life lengths for manufactured products are often nonnormal, thus violating the assumptions required for the proper use of an ANOVA F test. Use the Kruskal-Wallis H test to determine whether evidence exists to conclude that the brands of magnetron tubes tend to differ in length of life under stress. Test using α = 0,05.
BRAND A B C
36 49 71
48 33 31
5 60 140
67 2 59
53 55 42
Solution
H0 : the population probability distributions of length of life under stress are identical for the three brands of magnetron tubes.
versus
H1 : at least two of the population probability distributions differ in location
Rank the pooled observations and sum the ranks for each of the 3 samples.
A rank B rank C rank
36 5 49 8 71 14
48 7 33 4 31 3
5 2 60 12 140 15
67 13 2 1 59 11
53 9 55 10 42 6
R1 = 36 R2 = 35 R3 = 49
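Test statistic:
$H = \dfrac{12}{15(16)}\left(\dfrac{36^2}{5} + \dfrac{35^2}{5} + \dfrac{49^2}{5}\right) - 3(16) = 1,22$
With k - 1 = 2 degrees of freedom and α = 0,05, the critical value is $\chi^2_{0,05} = 5,99$. Since H = 1,22 < 5,99, do not reject H0: there is insufficient evidence that the brands differ in length of life under stress.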
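The rank sums and the H statistic for the magnetron-tube data can be verified with this short sketch. It assumes, as is the case here, that the pooled data contain no ties; the variable names are illustrative.

```python
brands = {
    "A": [36, 48, 5, 67, 53],
    "B": [49, 33, 60, 2, 55],
    "C": [71, 31, 140, 59, 42],
}

# Rank all observations together (no ties in this data set)
pooled = sorted(v for sample in brands.values() for v in sample)
rank = {v: i for i, v in enumerate(pooled, start=1)}

rank_sums = {b: sum(rank[v] for v in sample) for b, sample in brands.items()}
n = len(pooled)

h = 12 / (n * (n + 1)) * sum(
    r ** 2 / len(brands[b]) for b, r in rank_sums.items()
) - 3 * (n + 1)

print(rank_sums, round(h, 2))  # compare h with the chi-square value 5.99 (2 dof)
```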
COMPARISON OF POPULATION PROPORTIONS
Given X1 ~ BIN(n1, p1) and X2 ~ BIN(n2, p2), the statistics
$\hat{p}_1 = \dfrac{X_1}{n_1}; \quad \hat{p}_2 = \dfrac{X_2}{n_2}$
are defined to be the sample proportions.
Assume that X1 and X2 are independent; then
$E(\hat{p}_1 - \hat{p}_2) = E(\hat{p}_1) - E(\hat{p}_2) = p_1 - p_2$
$\mathrm{Var}(\hat{p}_1 - \hat{p}_2) = \mathrm{Var}(\hat{p}_1) + \mathrm{Var}(\hat{p}_2) = \dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}$
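For sufficiently large n1 and n2, the standardized statistic
$Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}$
is approximately standard normal.
The (1-α)100% CI:
$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$
As p1 and p2 are UNKNOWN, the approximate (1-α)100% CI for (p1 - p2) is:
$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$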
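In the testing situation,
Ho : p1 = p2 = p (p unknown)
versus
H1 : p1 > p2      H1 : p1 < p2      H1 : p1 ≠ p2
RR : Z > zα       RR : Z < -zα      RR : |Z| > zα/2
where α is the los (level of significance) of the test.
Test statistic:
$Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
The unknown common value of p is estimated by:
$\hat{p} = \dfrac{X_1 + X_2}{n_1 + n_2}$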
EXAMPLE
Members of the Department of Statistics at Iowa State University collected the following data on grades in an introductory business statistics course and an introductory engineering statistics course.
Course #Students #A grades
B.Stat 571 82
E.Stat 156 25
Ho : p1 = p2 (the proportion of A grades in the two courses is equal)
vs H1 : p1 ≠ p2
$\hat{p}_1 = \dfrac{82}{571} = 0,1436; \quad \hat{p}_2 = \dfrac{25}{156} = 0,1603$
$\hat{p} = \dfrac{82 + 25}{571 + 156} = 0,1472$
$Z = \dfrac{0,1436 - 0,1603}{\sqrt{0,1472(0,8528)\left(\dfrac{1}{571} + \dfrac{1}{156}\right)}} = -0,52$
The p-value is 2P(Z ≤ -0,52) = 0,6030. Since α = 5% < p-value, Ho would not be rejected.
The proportion of A’s does not differ significantly in the two courses.
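The pooled two-proportion z test used in this example can be sketched as below. The function name `two_proportion_z` is hypothetical, and the normal cdf comes from Python's standard `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled estimate of p."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# A grades: business statistics (82 of 571) vs engineering statistics (25 of 156)
z = two_proportion_z(82, 571, 25, 156)
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value

print(round(z, 2), round(p_value, 2))
```

The same function applies to one-sided problems such as the smokers exercise; only the p-value line changes (one tail instead of two).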
EXERCISE
An insurance company is thinking about offering a discount on its life insurance policies to non-smokers. As part of its analysis, it randomly selects 200 men who are 50 years old and asks them if they smoke at least one pack of cigarettes per day and if they have ever suffered from heart disease. The results indicate that 20 out of 80 smokers and 15 out of 120 non-smokers suffer from heart disease. Can we conclude at the 5% los that smokers have a higher incidence of heart disease than non-smokers?
DATA
p1 : proportion of 50-year-old smokers who suffer from heart disease
p2 : proportion of 50-year-old non-smokers who suffer from heart disease
Solution:
The data are clearly qualitative.
$H_0: p_1 - p_2 = 0$ vs $H_1: p_1 - p_2 > 0$
Test statistic:
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
RR : $z > z_{0,05} = 1,645$ (ztab)
Sample proportions: $\hat{p}_1 = \dfrac{20}{80} = 0,25$ ; $\hat{p}_2 = \dfrac{15}{120} = 0,125$
Pooled proportion estimate: $\hat{p} = \dfrac{20 + 15}{80 + 120} = \dfrac{35}{200} = 0,175$
Value of the test statistic:
$z_{cal} = \dfrac{0,25 - 0,125}{\sqrt{0,175(0,825)\left(\dfrac{1}{80} + \dfrac{1}{120}\right)}} = 2,28$
zcal = 2,28 > ztab = 1,645, so reject Ho.
The test statistic is normally distributed, so we can calculate the p-value:
p-value = P(z > 2,28) = 0,0113 = 1,13% < α, so reject Ho.
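PROBLEMS
1. The pmf of the random variable X is given as follows:
x 0 1 2 3
p(x) 0 k k 3k²
Determine k so that the properties of a pmf are satisfied!
Solution: A pmf has two properties:
(i) $p(x) \ge 0$ for all x
(ii) $\sum_x p(x) = 1$
From (ii): $0 + k + k + 3k^2 = 1$, so $3k^2 + 2k - 1 = 0$, that is, $(3k - 1)(k + 1) = 0$, which gives $k = \tfrac{1}{3}$ or $k = -1$.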
For k = -1: p(1) = -1 < 0 and p(2) = -1 < 0.
Thus k = -1 does not satisfy the properties. Next, for k = 1/3 it can be checked that the pmf properties are satisfied. So the value is k = 1/3.
In a public opinion survey, 60 out of a sample of 100 high-income voters and 40 out of a sample of 75 low-income voters supported a decrease in sales tax.
(a) Can we conclude at the 5% los that the proportion of voters favoring a sales tax decrease differs between high and low-income voters?
(b) What is the p-value of this test?
(c) Estimate the difference in proportions, with 99% confidence!
Solution:
$H_0: (p_1 - p_2) = 0$ vs $H_1: (p_1 - p_2) \ne 0$
Test statistic:
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
RR : |z| > 1,96
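$\hat{p}_1 = \dfrac{60}{100} = 0,6; \quad \hat{p}_2 = \dfrac{40}{75} = 0,53$
$\hat{p} = \dfrac{60 + 40}{100 + 75} = \dfrac{100}{175} = 0,571$
$\hat{q} = 1 - \hat{p} = 0,429$
$z_{cal} = \dfrac{(0,60 - 0,53)}{\sqrt{0,571(0,429)\left(\dfrac{1}{100} + \dfrac{1}{75}\right)}} = 0,93$
(a) Conclusion: since -1,96 < 0,93 < 1,96, do not reject Ho.
(b) p-value = 2P(z > 0,93) = 2(0,1762) = 0,3524.
(c) $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}} = (0,60 - 0,53) \pm 2,575\sqrt{\dfrac{(0,6)(0,4)}{100} + \dfrac{(0,53)(0,47)}{75}} = 0,07 \pm 0,195$
The difference between the two proportions is estimated to lie between -0,125 and 0,265.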
TEST on MEANS WHEN THE OBSERVATIONS ARE PAIRED
TESTING THE PAIRED DIFFERENCES
Let (X1, Y1), (X2, Y2), …, (Xn, Yn) be the n pairs, where (Xi, Yi) denotes the systolic blood pressure of the i-th subject before and after the drug. It is assumed that the differences D1, D2, …, Dn constitute independent normally distributed RVs such that $E(D_i) = \mu_D$ and $\mathrm{Var}(D_i) = \sigma_D^2$.
$H_0: \mu_D = d_0$ vs $H_1: \mu_D \ne d_0$
TEST STATISTIC:
$T = \dfrac{\bar{D} - d_0}{S_D/\sqrt{n}}$, where $\bar{D} = \dfrac{\sum D_i}{n}$ and $S_D^2 = \dfrac{1}{n-1}\sum (D_i - \bar{D})^2$
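Rejection criteria for testing hypotheses on means when the observations are paired
Null hypothesis: $H_0: \mu_D = d_0$
Value of the test statistic under Ho: $t = \dfrac{\bar{d} - d_0}{s_d/\sqrt{n}}$
Alternative hypothesis        Rejection criteria
$H_1: \mu_D \ne d_0$          Reject Ho when $t > t_{\alpha/2,\,n-1}$ or when $t < t_{1-\alpha/2,\,n-1}$
$H_1: \mu_D < d_0$            Reject Ho when $t < t_{1-\alpha,\,n-1}$
$H_1: \mu_D > d_0$            Reject Ho when $t > t_{\alpha,\,n-1}$
Here $t_{1-\alpha,\,n-1} = -t_{\alpha,\,n-1}$ by the symmetry of the t distribution.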
A paired difference experiment is conducted to compare the starting salaries of male and female college graduates who find jobs. Pairs are formed by choosing a male and female with the same major and similar GRADE-POINT-AVERAGE. Suppose a random sample of ten pairs is formed in this manner and the starting annual salary of each person is recorded. The results are shown in the table. Test to see whether there is evidence that the mean starting salary, μ1, for males exceeds the mean starting salary, μ2, for females. Use α = 0,05.
Pair Male Female Difference (male-female)
1 $ 14.300 $13.800 $ 500
2 16.500 16.600 -100
3 15.400 14.800 600
4 13.500 13.500 0
5 18.500 17.600 900
6 12.800 13.000 -200
7 14.500 14.200 300
8 16.200 15.100 1.100
9 13.400 13.200 200
10 14.200 13.500 700
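Solution:
$H_0: \mu_1 - \mu_2 = 0 \ (\mu_D = 0)$ vs $H_1: \mu_1 - \mu_2 > 0 \ (\mu_D > 0)$
Test statistic:
$t = \dfrac{\bar{x}_D - 0}{s_D/\sqrt{n}}; \quad \bar{x}_D = \bar{d}$
RR : reject Ho if t > tα ; t0.05,9 = 1,833 (t-distribution with 9 dof)
$\bar{x}_D = \bar{d} = \dfrac{\sum D_i}{n} = 400$
$S_D^2 = 188.888,89; \quad S_D = 434,61$
$t = \dfrac{400}{434,61/\sqrt{10}} = 2,91$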
tcal = 2,91 > 1,833, so tcal falls in the RR.
Reject Ho at the los = 0,05.
There is evidence that the mean starting salary for males exceeds the mean starting salary for females.
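The paired t statistic for the starting-salary data can be reproduced with the sketch below, which uses only the standard `statistics` module. Variable names are illustrative; the critical value 1,833 is taken from the slide.

```python
from math import sqrt
from statistics import mean, stdev

male = [14300, 16500, 15400, 13500, 18500, 12800, 14500, 16200, 13400, 14200]
female = [13800, 16600, 14800, 13500, 17600, 13000, 14200, 15100, 13200, 13500]

d = [m - f for m, f in zip(male, female)]  # paired differences (male - female)
n = len(d)

d_bar = mean(d)
s_d = stdev(d)                    # sample standard deviation, n - 1 divisor
t = (d_bar - 0) / (s_d / sqrt(n)) # test statistic for H0: mu_D = 0

print(d_bar, round(s_d, 2), round(t, 2))
```

The same three lines of arithmetic apply to any paired data set, for example differences defined as post-test minus pre-test scores.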
Consider a classroom where the students are given a test before they are taught the subject matter covered by the test. The students’ scores on this pretest are recorded as the first data set. Next, the subject matter is presented to the class. After the instruction is completed, the students are retested on the same material. The scores on the second test, the posttest, compose the second data set. It is reasonable to expect that a student who scored high on the pretest will also score high on the posttest (and vice versa). Inherently, a strong dependency exists between the members of a pair of scores generated by each individual. Suppose that the scores in the table have been generated by 15 students under the conditions just described. How would you decide whether the instruction had been effective?
Student Pre test Post test D
1 54 66 12
2 79 85 6
3 91 83 -8
4 75 88 13
5 68 93 25
6 43 40 -3
7 33 78 45
8 85 91 6
9 22 44 22
10 56 82 26
11 73 59 -14
12 63 81 18
13 29 64 35
14 75 83 8
15 87 81 -6
A data set with paired scores
EX : Use the T statistic for the hypotheses
Ho : μ = 5 versus H1 : μ = 6, with σ = 1,
to compute:
a) β, if α = 0.05 and n = 16
b) α, if β = 0.025 and n = 16
c) n, if α = 0.05 and β = 0.025
Solution:
Ho : μ = μo = 5 vs H1 : μ = μ1 = 6 > μo
Test statistic:
$T = \sqrt{n}\,(\bar{X} - \mu)$
RR = { $\bar{X} > c$ }
(a) $\alpha = P(\bar{X} > c \mid \mu = 5) = 0.05$
$P\left(\dfrac{\bar{X} - 5}{1/\sqrt{16}} > \dfrac{c - 5}{1/\sqrt{16}}\right) = 0.05$
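$P(T > 4(c - 5)) = 0.05$
$P(T > t_\alpha) = 0.05$
With 15 degrees of freedom, $t_\alpha = 1,753$, which means
$4(c - 5) = 1,753$, so c = 5.438
$\hat{\beta} = P(\text{accept } H_0 \mid H_1 \text{ true}) = P(\bar{X} \le c \mid \mu = 6)$
$= P(T \le 4(c - 6)) = P(T \le -2.248)$
This value is not in the t table, SO USE INTERPOLATION. Usually LINEAR INTERPOLATION is used:
$f(x) \approx a + bx; \quad x_1 \le x \le x_2$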
For $x_1 < x_o < x_2$:
$f(x_o) \approx a + b x_o = f(x_1) + \dfrac{f(x_2) - f(x_1)}{x_2 - x_1}(x_o - x_1)$
t TABLE
One-tail α   0,10   0.05   0.025   0.01   0.005   0.001
Two-tail α   0,20   0.10   0.05    0.02   0.01    0.002
υ = 15       1.341  1.753  2.131   2.602
The value 2.248 lies between 2.131 and 2.602.
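$f(x_o) \approx f(x_1) + \dfrac{f(x_2) - f(x_1)}{x_2 - x_1}(x_o - x_1)$
$f(x_o) \approx 0.025 + \dfrac{0.010 - 0.025}{0.471}(0.117)$
$f(x_o) \approx 0.021$
$\hat{\beta} = P(T \le -2.248) \approx 0.021$
(b) β = 0.025 ; n = 16 ; α = ?
$\beta = P(\bar{X} \le c \mid \mu = 6) = 0.025$
$P(T \le 4(c - 6)) = 0.025$, that is, $P(T \le -t) = 0.025$
$t = 2.131$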
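So: 4(c - 6) = -2,131, which gives c = 5,467.
$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}) = P(\bar{X} > c \mid \mu = 5)$
$= P(T > 4(5,467 - 5)) = P(T > 1.868) \approx 0.042$ (by linear interpolation in the t table)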
TABLE INTERPOLATION
Suppose that it is desired to evaluate a function f(x) at a point xo, and that a table of values of f(x) is available for some, but not all, values of x. In particular, the table may not give the value f(xo) but may give values for f(x1) and f(x2) where x1 < xo < x2. We can use the known values of f(x) for x = x1, x2 to approximate the value of f(xo). This process is known as INTERPOLATION. Perhaps the most commonly used interpolation method is linear interpolation. If f(x) is sufficiently smooth and not too curvilinear between x = x1 and x = x2, calculus tells us that f(x) can be regarded as being nearly linear over the interval [x1, x2].
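That is,
$f(x) \approx a + bx; \quad x_1 \le x \le x_2$
Solving the equations:
$f(x_1) = a + b x_1; \quad f(x_2) = a + b x_2$
for a and b yields:
$b = \dfrac{f(x_2) - f(x_1)}{x_2 - x_1}$
$a = f(x_1) - \dfrac{f(x_2) - f(x_1)}{x_2 - x_1}\,x_1$
Hence:
$f(x_o) \approx a + b x_o = f(x_1) + \dfrac{f(x_2) - f(x_1)}{x_2 - x_1}(x_o - x_1)$
[Figure: f(x) and the line a + bx between x1 and x2, with f(x1), f(xo) and f(x2) marked]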
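The interpolation formula translates directly into a small helper. The sketch below (the function name `interpolate` is chosen here) applies it to two table look-ups that appear in these notes: the t tail area at 2.248 with 15 degrees of freedom, and the standard normal cdf at 1.333.

```python
def interpolate(x0, x1, f1, x2, f2):
    """Linear interpolation of f at x0 from (x1, f1) and (x2, f2), x1 < x0 < x2."""
    return f1 + (f2 - f1) / (x2 - x1) * (x0 - x1)

# One-tail area beyond t = 2.248 (15 dof), between the table entries
# t = 2.131 (area 0.025) and t = 2.602 (area 0.010)
tail = interpolate(2.248, 2.131, 0.025, 2.602, 0.010)

# Standard normal cdf at 1.333, between 1.33 (0.9082) and 1.34 (0.9099)
phi = interpolate(1.333, 1.33, 0.9082, 1.34, 0.9099)

print(round(tail, 3), round(phi, 5))
```

Linear interpolation is only an approximation; it works well here because both tabulated functions are smooth over the short intervals involved.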
EXERCISE
1. Let (X1, X2, …, Xn) be a random sample of a normal RV X with mean μ and variance 100. Let:
Ho : μ = 50 vs H1 : μ = 55
As a decision test, we use the rule to accept Ho if $\bar{x} \le 53$, where $\bar{x}$ is the value of the sample mean.
a) Find the RR.
b) Find α and β for n = 16.
2. Let (X1, X2, …, Xn) be a random sample of a Bernoulli RV X with pmf:
$p_X(x; p) = p^x (1-p)^{1-x}; \quad x = 0, 1$
where it is known that 0 < p ≤ 1/2. Let:
Ho : p = 1/2 vs H1 : p = p1 (< 1/2)
and n = 20. As a decision test, we use the rule to reject Ho if $\sum_{i=1}^{n} x_i \le 6$.
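(a) Find the power function γ(p) of the test.
(b) Find α.
(c) Find β: (i) if p1 = 1/4 and (ii) if p1 = 1/10.
Solutions:
2. Ho : p = 1/2 vs H1 : p = p1 (< 1/2); X ~ BER(p), $p_X(x) = p^x(1-p)^{1-x}; \ x = 0, 1$
a) $\gamma(p) = P(\text{reject } H_0 \mid p) = \sum_{k=0}^{6}\binom{20}{k} p^k (1-p)^{20-k}; \quad 0 < p \le \tfrac{1}{2}$
b) $\alpha = \gamma(\tfrac{1}{2}) = P(\text{reject } H_0 \mid p = \tfrac{1}{2}) = \sum_{k=0}^{6}\binom{20}{k}\left(\tfrac{1}{2}\right)^k\left(\tfrac{1}{2}\right)^{20-k} = 0.058$ (from the Binomial table)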
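c) $\beta(p_1) = P(\text{accept } H_0 \mid H_1 \text{ is true}) = 1 - P(\text{reject } H_0 \mid p = p_1) = 1 - \gamma(p_1)$
(i) $\beta(\tfrac{1}{4}) = 1 - \sum_{k=0}^{6}\binom{20}{k}\left(\tfrac{1}{4}\right)^k\left(\tfrac{3}{4}\right)^{20-k} = 0,2142$
(ii) $\beta(\tfrac{1}{10}) = 1 - \sum_{k=0}^{6}\binom{20}{k}\left(\tfrac{1}{10}\right)^k\left(\tfrac{9}{10}\right)^{20-k} = 0,0024$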
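The power function of this Bernoulli test, and the resulting α and β values, can be evaluated exactly instead of read from a Binomial table. The sketch below is illustrative; the function name `power` and its default arguments are chosen here to match n = 20 and the rule "reject Ho if the sum is at most 6".

```python
from math import comb

def power(p, n=20, c=6):
    """P(reject H0 | p) = P(sum of x_i <= c) for a Binomial(n, p) sum."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c + 1))

alpha = power(0.5)              # type I error under H0: p = 1/2
beta_quarter = 1 - power(0.25)  # type II error when p1 = 1/4
beta_tenth = 1 - power(0.10)    # type II error when p1 = 1/10

print(round(alpha, 3), round(beta_quarter, 4), round(beta_tenth, 4))
```

As p1 moves further below 1/2 the power rises and β shrinks, which is what the two β values show.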
Let (X1, X2, …, Xn) be a random sample of a normal RV X with mean μ and variance 100. Let:
Ho : μ = 50 vs H1 : μ = 55
As a decision test, we use the rule to accept Ho if $\bar{x} \le c$. Find the value of c and sample size n such that α = 0.025 and β = 0.05.
Solution (answer: n = 52, c = 52.718):
$R_1 = \{(x_1, x_2, \ldots, x_n): \bar{x} > c\}$
$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}) = P(\bar{X} > c \mid \mu = 50)$
$P\left(Z > \dfrac{c - 50}{10/\sqrt{n}}\right) = 0.025$
$P(Z > z_\alpha) = 0.025$
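$P(Z \le z_\alpha) = 0.975$
$\Phi\left(\dfrac{c - 50}{10/\sqrt{n}}\right) = 0.975$
$\dfrac{c - 50}{10/\sqrt{n}} = 1.96 \Rightarrow (c - 50)\sqrt{n} = 19.60$
$\beta = P(\text{accept } H_0 \mid H_1 \text{ true}) = P(\bar{X} \le c \mid \mu = 55)$
$P\left(Z \le \dfrac{c - 55}{10/\sqrt{n}}\right) = 0.05 \Rightarrow \Phi\left(\dfrac{c - 55}{10/\sqrt{n}}\right) = 0.05$
$\dfrac{c - 55}{10/\sqrt{n}} = -1.645 \Rightarrow (c - 55)\sqrt{n} = -16.45$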
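Dividing the two equations eliminates $\sqrt{n}$:
$\dfrac{c - 50}{3.92} = \dfrac{-(c - 55)}{3.29} \Rightarrow 3.29(c - 50) = -3.92(c - 55)$
$3.29c - 164.50 = 215.60 - 3.92c$
$7.21c = 380.10 \Rightarrow c = \dfrac{38010}{721} = 52,7184466$
$c \approx 52.718$
$(c - 50)\sqrt{n} = 19.60 \Rightarrow \sqrt{n} = \dfrac{19.60}{2.718} = \dfrac{19600}{2718} = 7.211$
$n = 51.998 \approx 52$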
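The two conditions on α and β can also be solved in closed form: adding $(c - \mu_0)\sqrt{n}/\sigma = z_\alpha$ and $(\mu_1 - c)\sqrt{n}/\sigma = z_\beta$ gives $\sqrt{n} = (z_\alpha + z_\beta)\sigma/(\mu_1 - \mu_0)$. The sketch below evaluates this with exact normal quantiles from `statistics.NormalDist`, so the results differ slightly from the table-based values in the later decimals.

```python
from math import ceil
from statistics import NormalDist

mu0, mu1, sigma = 50, 55, 10
alpha, beta = 0.025, 0.05

z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.96
z_beta = NormalDist().inv_cdf(1 - beta)    # about 1.645

root_n = (z_alpha + z_beta) * sigma / (mu1 - mu0)
n = ceil(root_n ** 2)                      # round up to a whole sample size
c = mu0 + z_alpha * sigma / root_n         # cut-off for the sample mean

print(n, round(c, 3))
```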
Let (X1, X2, …, Xn) be a random sample of a normal RV X with mean μ and variance 36. Let:
Ho : μ = 50 vs H1 : μ = 55
As a decision test, we use the rule to accept Ho if $\bar{x} \le 53$, where $\bar{x}$ is the value of the sample mean.
a) Find the expression for the critical region/rejection region R1.
b) Find α and β for n = 16.
Solution:
a) $R_1 = \{(x_1, x_2, \ldots, x_n): \bar{x} > 53\}$, where $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
b) $\alpha = P(\bar{X} > 53 \mid \mu = 50) = P(Z > 2) = 1 - \Phi(2) = 1 - 0.9772 = 0.0228$
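$\beta = P(\text{accept } H_0 \mid H_1 \text{ true}) = P(\bar{X} \le 53 \mid \mu = 55)$
$= P(Z \le -1.333) = \Phi(-1.333) = 1 - \Phi(1.333)$
Φ(1.333) is found by linear interpolation, with x1 < xo < x2:
x1 xo x2
1.330 1.333 1.340
Φ(1.330) = 0.9082, Φ(1.340) = 0.9099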
$f(1.333) \approx 0.9082 + \dfrac{0.9099 - 0.9082}{1.340 - 1.330}(1.333 - 1.330)$
$= 0.9082 + \dfrac{0.0017}{0.0100}(0.003)$
$f(1.333) \approx 0.9082 + 0.00051 = 0.90871$
$\beta = 1 - \Phi(1.333) = 1 - 0.9087 = 0.0913$