Post on 20-Jan-2016
Mathematics Term 3 STPM Chapter 6 Chi-squared Tests
6.1 The Chi-squared Distribution
Each hypothesis test discussed in the last chapter involves a null hypothesis stated in terms of a population
parameter and a test statistic with a known probability distribution. Such tests are called parametric tests.
However, not every hypothesis can be stated in terms of population parameters. In this chapter, we discuss a
non-parametric test called the chi-squared test, which is performed using the chi-squared distribution.
Let $x_1, x_2, \ldots, x_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$.
Then the sampling distribution of the statistic
$$\chi^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sigma^2}$$
is called the chi-squared distribution with $n - 1$ degrees of freedom. The probability density function is
given by
$$f(\chi^2_\nu) = c\,(\chi^2_\nu)^{\frac{\nu}{2} - 1}\, e^{-\frac{\chi^2_\nu}{2}}$$
where $c$ is a constant, $\chi^2_\nu$ is the chi-squared statistic with $\nu$ degrees of freedom and $e$ is the base of the natural
logarithm. The constant $c$ is a normalising factor chosen so that the area under the chi-squared curve is equal to one.
Examples of chi-squared distributions with various degrees of freedom are shown in the figure below. The
curve for $\nu = n - 1 = 3 - 1 = 2$ degrees of freedom represents the distribution of chi-squared values computed
from all possible samples of size 3. Likewise, the curve for 10 degrees of freedom corresponds to
the distribution for samples of size 11.
The chi-squared distribution has the following properties:
• The values of $\chi^2$ cannot be negative.
• The curve is not symmetric.
• All chi-squared curves are positively skewed.
• As $\nu$ gets larger, the degree of skewness decreases.
• The mean of the distribution is equal to the number of degrees of freedom: $\mu = \nu$.
• The variance is equal to two times the number of degrees of freedom: $\sigma^2 = 2\nu$.
• When the number of degrees of freedom is greater than or equal to 2, the maximum of the curve occurs at $\chi^2 = \nu - 2$.
• As the number of degrees of freedom increases, the chi-squared curve approaches a normal distribution.
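The properties above can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the standard normalising constant $c = 1 / (2^{\nu/2}\,\Gamma(\nu/2))$ (the chapter only says that $c$ makes the total area one), and the function names `chi2_pdf` and `moments` are illustrative.

```python
import math

def chi2_pdf(x, v):
    """Density c * x**(v/2 - 1) * exp(-x/2) of the chi-squared distribution.

    Assumption: c = 1 / (2**(v/2) * Gamma(v/2)), the usual normalising constant.
    """
    c = 1.0 / (2 ** (v / 2) * math.gamma(v / 2))
    return c * x ** (v / 2 - 1) * math.exp(-x / 2)

def moments(v, upper=100.0, steps=100_000):
    """Midpoint-rule estimates of the total area, the mean and the variance."""
    h = upper / steps
    area = mean = second = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        w = chi2_pdf(x, v) * h      # probability mass of one thin strip
        area += w
        mean += x * w
        second += x * x * w
    return area, mean, second - mean ** 2

v = 6
area, mean, var = moments(v)
# Locate the maximum of the curve on a fine grid; it should sit near v - 2.
grid = [i * 0.01 for i in range(1, 3000)]
mode = max(grid, key=lambda x: chi2_pdf(x, v))
print(area, mean, var, mode)
```

For $\nu = 6$ the estimates should come out close to an area of 1, a mean of 6, a variance of 12 and a maximum near 4, in line with the listed properties.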
The area under the curve between 0 and a particular chi-squared value is the cumulative probability associated
with that chi-squared value. For example, the figure below is a graph of the chi-squared distribution with
6 degrees of freedom; the shaded area represents the cumulative probability associated with a chi-squared
statistic equal to $x$, that is, the probability that the value of a chi-squared statistic falls between
0 and $x$.
The $\chi^2$-distribution table gives values of $\chi^2$ for various values of $\alpha$ and $\nu$, where $\alpha$ is the area
under the curve to the left of the tabulated value and $\nu$ is the number of degrees of freedom. The areas, $\alpha$, are the column headings; the degrees of
freedom, $\nu$, are given in the left column, and the table entries are the $\chi^2$ values. Hence the $\chi^2$ value with 6
degrees of freedom, leaving an area of 0.05 to the left, is $\chi^2_{0.05} = 1.635$. Owing to the lack of symmetry, we must
also use the table to find $\chi^2_{0.95} = 12.592$ for $\alpha = 0.95$.
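The two table look-ups above can be cross-checked without a table. The sketch below is an illustration rather than the textbook's method: it uses the closed form of the cumulative probability that holds when the number of degrees of freedom is even, $P(X < x) = 1 - e^{-x/2}\sum_{j=0}^{\nu/2-1} \frac{(x/2)^j}{j!}$. The function name `chi2_cdf_even` is hypothetical.

```python
import math

def chi2_cdf_even(x, v):
    """P(X < x) for a chi-squared variable with an even number v of degrees of freedom."""
    if v <= 0 or v % 2 != 0:
        raise ValueError("this closed form only applies to even, positive v")
    half = x / 2
    # Upper-tail area: exp(-x/2) times a finite sum with v/2 terms.
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(v // 2))
    return 1.0 - tail

left = chi2_cdf_even(1.635, 6)    # area to the left of 1.635
right = chi2_cdf_even(12.592, 6)  # area to the left of 12.592
print(round(left, 3), round(right, 3))
```

With 6 degrees of freedom the two areas should be close to 0.05 and 0.95 respectively, matching the table entries quoted above.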
Critical values for the $\chi^2$-distribution
If $X$ has a $\chi^2$-distribution with $\nu$ degrees of freedom, then for each pair of values of $p$ and $\nu$, the tabulated value of $x$ is such that $P(X < x) = p$.

 ν \ p   0.01       0.025      0.05      0.9     0.95    0.975   0.99    0.995   0.999
  1      0.0001571  0.0009821  0.003932  2.706   3.841   5.024   6.635   7.879   10.83
  2      0.02010    0.05064    0.1026    4.605   5.991   7.378   9.210   10.60   13.82
  3      0.1148     0.2158     0.3518    6.251   7.815   9.348   11.34   12.84   16.27
  4      0.2971     0.4844     0.7107    7.779   9.488   11.14   13.28   14.86   18.47
  5      0.5543     0.8312     1.145     9.236   11.07   12.83   15.09   16.75   20.51
  6      0.8721     1.237      1.635     10.64   12.59   14.45   16.81   18.55   22.46
  7      1.239      1.690      2.167     12.02   14.07   16.01   18.48   20.28   24.32
  8      1.647      2.180      2.733     13.36   15.51   17.53   20.09   21.95   26.12
  9      2.088      2.700      3.325     14.68   16.92   19.02   21.67   23.59   27.88
 10      2.558      3.247      3.940     15.99   18.31   20.48   23.21   25.19   29.59
 11      3.053      3.816      4.575     17.28   19.68   21.92   24.73   26.76   31.26
 12      3.571      4.404      5.226     18.55   21.03   23.34   26.22   28.30   32.91
 13      4.107      5.009      5.892     19.81   22.36   24.74   27.69   29.82   34.53
 14      4.660      5.629      6.571     21.06   23.68   26.12   29.14   31.32   36.12
 15      5.229      6.262      7.261     22.31   25.00   27.49   30.58   32.80   37.70
 16      5.812      6.908      7.962     23.54   26.30   28.85   32.00   34.27   39.25
 17      6.408      7.564      8.672     24.77   27.59   30.19   33.41   35.72   40.79
 18      7.015      8.231      9.390     25.99   28.87   31.53   34.81   37.16   42.31
 19      7.633      8.907      10.12     27.20   30.14   32.85   36.19   38.58   43.82
 20      8.260      9.591      10.85     28.41   31.41   34.17   37.57   40.00   45.31
 21      8.897      10.28      11.59     29.62   32.67   35.48   38.93   41.40   46.80
 22      9.542      10.98      12.34     30.81   33.92   36.78   40.29   42.80   48.27
 23      10.20      11.69      13.09     32.01   35.17   38.08   41.64   44.18   49.73
 24      10.86      12.40      13.85     33.20   36.42   39.36   42.98   45.56   51.18
 25      11.52      13.12      14.61     34.38   37.65   40.65   44.31   46.93   52.62
 26      12.20      13.84      15.38     35.56   38.89   41.92   45.64   48.29   54.05
 27      12.88      14.57      16.15     36.74   40.11   43.19   46.96   49.65   55.48
 28      13.56      15.31      16.93     37.92   41.34   44.46   48.28   50.99   56.89
 29      14.26      16.05      17.71     39.09   42.56   45.72   49.59   52.34   58.30
 30      14.95      16.79      18.49     40.26   43.77   46.98   50.89   53.67   59.70
Example 1
The curve of the chi-squared distribution with $\nu = 3$ degrees of freedom is shown
below. Find the critical value of $\chi^2$ such that the area in the shaded region is 0.025.
Solution
Look it up in the table by proceeding down the left column, headed $\nu$ (degrees
of freedom), to $\nu = 3$. Then move to the right until the column labelled 0.975 is
reached. The result is 9.348. Thus we have $P(\chi^2 > 9.348) = 0.025$.
Example 2
A factory produces a particular type of drill. On average, the useful operating
life is 5.5 hours, with a standard deviation of 0.47 hour. The quality control
department runs a test by randomly selecting six drills. The standard deviation
of the selected drills is 0.61 hour. Determine the chi-squared statistic for
this test.
Solution
Given $\sigma = 0.47$ hour, $s = 0.61$ hour and the number of sample observations
$n = 6$, the chi-squared statistic is
$$\chi^2 = \frac{nS^2}{\sigma^2} = \frac{6(0.61^2)}{0.47^2} = 10.107$$
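The arithmetic of Example 2 can be reproduced in a few lines. This is only a sketch of the formula $\chi^2 = nS^2/\sigma^2$ as used in the example; the function name `variance_statistic` is illustrative.

```python
def variance_statistic(n, s, sigma):
    """Chi-squared statistic n * s**2 / sigma**2, as used in Example 2."""
    return n * s ** 2 / sigma ** 2

# Six drills, sample standard deviation 0.61 hour, population value 0.47 hour.
stat = variance_statistic(6, 0.61, 0.47)
print(round(stat, 3))
```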
Exercise 6.1
1. Find the 95th percentile of the chi-squared distribution with 9 degrees of freedom.
2. Using the chi-squared distribution table, find
(a) P(x:, < 18.4s),
(b) P(X1, > 1e.81),
(c) P(X'r, ) 32.67).
3. Given $\nu$ and $\alpha$, find the critical value(s) for each case.
(a)
(b)
(c)
4. Using the chi-squared distribution table, find the value of $k$ such that
(a) P(X1, < k) = 0.0t
(b) P(x1, > k) = o.es
(c) P(k < x2s < 9.39) = o.o4
5. (a) Find the mean and the standard deviation of a chi-squared distribution with 8 degrees of freedom.
(b) Which one of the following chi-squared distributions looks the most like a normal distribution?
(i) A chi-squared distribution with 1 degree of freedom
(ii) A chi-squared distribution with 2 degrees of freedom
(iii) A chi-squared distribution with 5 degrees of freedom
(iv) A chi-squared distribution with 10 degrees of freedom
6. A random sample of 30 observations from a normal population with variance $\sigma^2 = 8.3$ is found to
have a sample variance $s^2 = 11.72$. Determine the chi-squared statistic from this experiment.
The chi-squared test can be used to test how well observed frequencies fit expected frequencies.
Observed frequencies are the actual frequencies observed in a random sample. Expected frequencies
are theoretical frequencies based on a distribution under the null hypothesis, which is presumed to be true
until statistical evidence indicates otherwise.
As an example, what would we expect from flipping a coin 12 times? By chance alone, we expect six heads and six
tails. If we observe one head and eleven tails in this experiment, is this outcome attributable merely
to chance, or is it due to the coin being biased? The chi-squared test can help to provide an answer.
Before discussing the chi-squared test, we make several assumptions. First, frequency data are used
to represent the actual number of elements in each category. Second, the categories are mutually exclusive, that
is, whatever is being tallied can be in only one cell and the cells cannot overlap. Third, the data are categorical, that is, grouped
according to similar characteristics in a way that shows the frequency of each category.
Let us look at an example to see how we use the chi-squared test to determine whether the frequencies
observed across the categories differ significantly from what is expected theoretically. Consider the tossing
of a six-sided dice. We have the null hypothesis that the dice is fair, which is equivalent to the hypothesis
that the distribution of outcomes is uniform. Suppose that the dice is thrown 60 times and each outcome is
recorded. The observed frequency $o_i$ for each face of the dice is shown in the table below:

Faces       1        2       3        4       5       6
Observed    o_1 = 12 o_2 = 8 o_3 = 14 o_4 = 7 o_5 = 9 o_6 = 10

The chi-squared test compares the observed frequencies $o_i$ with the corresponding expected frequencies
$e_i$. The table above lists the observed frequencies, and the expected frequencies need to be determined.
To calculate the expected frequency for each outcome, we make use of the hypothesis that the outcome
of a fair dice is uniformly distributed. Since the probability of each outcome is one-sixth and there are a
total of 60 rolls of the dice, we have
$$\text{Expected frequency } e_i = \frac{1}{6} \times 60 = 10$$
Note that the expected frequencies are anticipated only in a theoretical sense. It is not realistic to expect
the observed frequencies to match the expected frequencies perfectly. The table below lists the observed and
expected frequencies for each category:

Faces       1         2         3         4         5         6
Observed    o_1 = 12  o_2 = 8   o_3 = 14  o_4 = 7   o_5 = 9   o_6 = 10
Expected    e_1 = 10  e_2 = 10  e_3 = 10  e_4 = 10  e_5 = 10  e_6 = 10
Now, we need to decide whether the observed frequencies are reasonably close to the expected frequencies
or really different from them. The hypothesis to be tested is how well the observed frequencies fit a given
pattern or a theoretical distribution. The test is called a goodness-of-fit test.
A useful measure of the overall discrepancy between the observed and expected frequencies is the chi-squared test statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$
where $\chi^2$ is a value of a random variable whose sampling distribution is approximated very closely
by the chi-squared distribution with $k - 1$ degrees of freedom, and $k$ is the number of categories.
The symbols $o_i$ and $e_i$ represent the observed and expected frequencies respectively for the $i$th category.
For the chi-squared goodness-of-fit test, the number of degrees of freedom is the number of independent
free choices which can be made in allocating values to the expected frequencies. In this example of tossing
a dice, there are six expected frequencies (one for each face, that is, 1 to 6). Only five of the expected
frequencies can vary independently; the sixth must take whatever value is required to satisfy the
constraint on the total frequency. Thus, the number of degrees of freedom is $\nu$ = number of categories - number of constraints.
Here there are six categories and one constraint, so $\nu = 6 - 1 = 5$.
To calculate the chi-squared test statistic, we first subtract the expected frequency $e_i$ from the observed
frequency $o_i$. Then we square the difference and divide the squared difference by the expected
frequency $e_i$, before finally adding the quotients. This is done in the table below:

Faces   o_i   e_i   (o_i - e_i)   (o_i - e_i)^2   (o_i - e_i)^2 / e_i
1       12    10     2             4               0.4
2       8     10    -2             4               0.4
3       14    10     4             16              1.6
4       7     10    -3             9               0.9
5       9     10    -1             1               0.1
6       10    10     0             0               0
                                         $\chi^2$ = 3.4

This means the value of $\chi^2$ with 5 degrees of freedom is 3.4.
In the goodness-of-fit test, if the observed frequencies are the same as the expected frequencies, then
$\chi^2 = 0$. Thus, if the $\chi^2$ value is small, there is a high degree of compatibility between the expected and observed
frequencies, indicating a good fit. If the $\chi^2$ value is large, there is a low degree of matching between the two
sets of frequencies and the fit is poor. This also implies that the critical region falls in the right tail of the
chi-squared distribution. At the 10% significance level, we find the critical value 9.236 for 5 degrees of freedom using the $\chi^2$ table. Since the calculated value
$\chi^2 = 3.4$ is less than 9.236, the data are consistent with the hypothesis that the outcomes of the dice are uniformly
distributed. In other words, there is no evidence that the dice is unfair.
Note: To perform a chi-squared test, the expected frequency for each category should be at least 5. This
restriction may require combining adjacent categories, resulting in a reduction of the number of degrees of
freedom.
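The dice calculation can be written as a short program. This is a minimal sketch of the goodness-of-fit statistic $\sum (o_i - e_i)^2 / e_i$ applied to the 60 throws above; the function name `chi2_statistic` is illustrative, and the critical value 9.236 is simply copied from the table (column 0.9, 5 degrees of freedom).

```python
def chi2_statistic(observed, expected):
    """Sum of (o - e)**2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [12, 8, 14, 7, 9, 10]             # faces 1 to 6
n = sum(observed)                            # 60 throws
expected = [n / 6] * 6                       # fair dice: 10 per face
stat = chi2_statistic(observed, expected)
df = len(observed) - 1                       # one constraint: the total frequency
critical = 9.236                             # table value for df = 5 at the 10% level
reject = stat > critical
print(stat, df, reject)
```

The statistic comes out as 3.4 with 5 degrees of freedom, so the null hypothesis of a fair dice is not rejected at the 10% level.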
Example 3
A quality supervisor at a glass manufacturing factory inspects a random sample
of 60 sheets of glass to check for any minor defects. The number of flaws in each
glass sheet is recorded. The results are as follows:

Number of flaws      0    1    2    3
Observed frequency   32   15   9    4

Use a 5% significance level to test the hypothesis that these data follow a Poisson
distribution.
A test procedure is as follows.
Step 1: State the hypotheses
$H_0$: The number of flaws follows a Poisson distribution
$H_1$: The number of flaws does not follow a Poisson distribution
Step 2: Specify the significance level
Here $\alpha = 0.05$
Step 3: Select the appropriate test statistic and calculate its value
Use the chi-squared goodness-of-fit test to determine whether the observed sample
frequencies differ significantly from the expected frequencies specified in the null
hypothesis.
The mean of the presumed Poisson distribution is unknown, so it must be estimated
from the data by the sample mean,
$$\hat{\lambda} = \frac{\sum o_i x_i}{\sum o_i} = \frac{32 \times 0 + 15 \times 1 + 9 \times 2 + 4 \times 3}{32 + 15 + 9 + 4} = \frac{45}{60} = 0.75$$
Hence, with $\lambda = 0.75$,
$$P(X = x_i) = \frac{e^{-0.75}\, 0.75^{x_i}}{x_i!}, \qquad x_i = 0, 1, 2, 3, \ldots$$
which gives the probability associated with each class. The
corresponding expected frequency is obtained by multiplying the appropriate
Poisson probability by the sample size $n = 60$.

x_i         P(X = x_i)   e_i
0           0.472        28.32
1           0.354        21.24
2           0.133        7.98
3 or more   0.041        2.46

If an expected frequency is less than 5, two or more classes can be combined.
In the above situation the expected frequency in the last class is less than 5, so
we combine the last two classes to get

Number of flaws   Observed frequency   Expected frequency
0                 32                   28.32
1                 15                   21.24
2 or more         13                   10.44

The chi-squared value can now be calculated:
$$\chi^2 = \sum \frac{(o - e)^2}{e} = \frac{(32 - 28.32)^2}{28.32} + \frac{(15 - 21.24)^2}{21.24} + \frac{(13 - 10.44)^2}{10.44} = 2.94$$
Step 4: Determine the critical region
Since both the total frequency and the mean of the Poisson distribution are taken from the
observed data, there are two constraints and the number of degrees of freedom is
$k - 2$. Here, we have 3 classes, thus the chi-squared statistic has $3 - 2 = 1$ degree
of freedom. Using a significance level of 0.05, from the chi-squared distribution table,
the critical value $\chi^2_{0.95}$ with 1 degree of freedom is 3.841.
Step 5: Make a decision
As $\chi^2 = 2.94 < 3.841$, we conclude that there is no real evidence to suggest that the
data do not follow a Poisson distribution.
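The steps of Example 3 can be sketched in code. The helper name `merge_small_tail` is illustrative, and the sketch keeps full precision in the Poisson probabilities, so its statistic differs slightly from the hand-calculated 2.94, which uses probabilities rounded to three decimals.

```python
import math

flaws = [0, 1, 2, 3]
observed = [32, 15, 9, 4]
n = sum(observed)                                        # 60 sheets
lam = sum(x * o for x, o in zip(flaws, observed)) / n    # sample mean, 0.75

# Poisson probabilities for 0, 1, 2 flaws; the last class is "3 or more".
probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in (0, 1, 2)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]

def merge_small_tail(obs, exp, minimum=5.0):
    """Fold the last class into its neighbour while its expected frequency is below 5."""
    obs, exp = list(obs), list(exp)
    while len(exp) > 1 and exp[-1] < minimum:
        last_e, last_o = exp.pop(), obs.pop()
        exp[-1] += last_e
        obs[-1] += last_o
    return obs, exp

obs, exp = merge_small_tail(observed, expected)
stat = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
df = len(obs) - 2          # constraints: total frequency and estimated mean
print(obs, [round(e, 2) for e in exp], round(stat, 2), df)
```

After merging, three classes remain with observed frequencies 32, 15 and 13, and the statistic stays well below the critical value 3.841.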
Example 4
The lengths of a sample of 50 telephone calls are summarised in the table below. The sample has
mean $\bar{x} = 14$ minutes and standard deviation $s = 6.4$ minutes. Determine whether there is significant evidence, at
the 5% significance level, to reject the null hypothesis that the call length has a
normal distribution.

Call length (in minutes)   Frequency
0-5                        4
5-10                       9
10-15                      16
15-20                      13
20-25                      5
25-30                      3

We proceed with the steps of a test procedure as follows:
Step 1: State the hypotheses
$H_0$: The telephone call lengths follow a normal distribution
$H_1$: The telephone call lengths do not follow a normal distribution
Step 2: Specify the significance level
Here $\alpha = 0.05$
Step 3: Select the appropriate test statistic and calculate its value
Use the chi-squared goodness-of-fit test to determine whether the observed sample
frequencies differ significantly from the expected frequencies specified in the null
hypothesis.
The distribution of call lengths may be approximated by the normal distribution.
The sample mean and sample standard deviation are used for $\mu$ and $\sigma$ in
calculating the $z$ values corresponding to the class boundaries. The expected frequency
for each class (category) listed in the given table can be obtained from a normal
curve. The $z$ values corresponding to the boundaries of the second class are
$$z_1 = \frac{5 - 14}{6.4} = -1.406, \qquad z_2 = \frac{10 - 14}{6.4} = -0.625$$
From the normal table, the area between $z_1 = -1.406$ and $z_2 = -0.625$ is
$$P(-1.406 < Z < -0.625) = P(Z < -0.625) - P(Z < -1.406) = 0.266 - 0.08 = 0.186$$
Thus, the expected frequency for the second class is $e_2 = 0.186 \times 50 = 9.3$.
The expected frequency for the first class interval is obtained by using the total
area under the normal curve to the left of the boundary 5. For the last class
interval, we use the total area to the right of the boundary 25. All other
expected frequencies can be found by the method described above for
the second class. The complete set of calculations needed to find the expected
frequency in each class is summarised in the table below. Note that we have
combined adjacent classes in the table where the expected frequencies are less
than 5. As a result, the total number of classes is reduced from 6 to 4.

Class boundaries   o_i   e_i
Below 10           13    13.3
10-15              16    14.8
15-20              13    13.2
Above 20           8     8.8

The following table shows the detailed calculations for the chi-squared value.

Class boundaries   o_i   e_i    (o_i - e_i)   (o_i - e_i)^2   (o_i - e_i)^2 / e_i
Below 10           13    13.3   -0.3          0.09            0.0068
10-15              16    14.8    1.2          1.44            0.0973
15-20              13    13.2   -0.2          0.04            0.0030
Above 20           8     8.8    -0.8          0.64            0.0727
                                                     $\chi^2$ = 0.180

Step 4: Determine the critical region
Altogether there are three constraints, since the total frequency, the sample mean and the sample standard deviation
have all been taken from the sample data. The number of degrees of freedom
is therefore equal to $k - 3 = 4 - 3 = 1$. Using a significance level of 0.05, the
critical value of chi-squared with 1 degree of freedom is 3.841.
Step 5: Make a decision
As $\chi^2 = 0.180 < 3.841$, we have no reason to reject the null hypothesis and
conclude that the normal distribution offers a good fit for the distribution of
telephone call lengths.
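The expected frequencies in Example 4 come from areas under a normal curve, which can be computed directly instead of being read from a normal table. The sketch below is an illustration under that substitution: it uses the error function from the standard library for the normal cumulative probability, so its figures differ slightly from the rounded table-based values 13.3, 14.8, 13.2 and 8.8. The function name `normal_cdf` is illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z < z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, sd, n = 14.0, 6.4, 50
boundaries = [10, 15, 20]            # interior boundaries after combining classes
observed = [13, 16, 13, 8]           # below 10, 10-15, 15-20, above 20

# Cumulative areas at the boundaries, padded with 0 and 1 for the open-ended classes.
cum = [0.0] + [normal_cdf((b - mean) / sd) for b in boundaries] + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(cum, cum[1:])]

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 3               # total, mean and standard deviation taken from the data
print([round(e, 1) for e in expected], round(stat, 3), df)
```

The statistic is again far below the critical value 3.841, so the conclusion of the example is unchanged.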
Exercise 6.2
1. Assume that a chi-squared goodness-of-fit test is conducted. Determine the critical value of the chi-squared test statistic for each of the following cases.
(a) Number of categories = 7, $\alpha = 0.01$
(b) Number of categories = 10, $\alpha = 0.10$
2. A random sample of 500 observations is obtained and distributed into 4 categories as follows:

Category   1    2     3     4
x_i        49   263   146   42

Use $\alpha = 0.05$ to test the null hypothesis $H_0$: $p_1 = 0.10$, $p_2 = 0.50$, $p_3 = 0.30$, $p_4 = 0.10$.
3. Three coins are tossed 150 times, and the observed frequencies of 0, 1, 2 and 3 heads per toss are
14, 43, 67 and 26 times respectively. Use a 5% significance level to test whether the three coins are
balanced.
4. An experiment is to draw a card from a regular deck of 52 cards that has been thoroughly shuffled,
and it is recorded whether it is a spade, heart, diamond or club. This process is repeated 40 times,
each time replacing the card just drawn. After 40 trials, 9 spades, 13 hearts, 11 diamonds and 7 clubs
are obtained. Test the hypothesis that the deck is honest at the 10% significance level.
5. Each package of beans sold in the supermarket is supposed to mix red beans, mung beans, black beans
and black-eyed beans in the ratio 5:3:1:1. A random sample of 400 mixed beans selected from these packages
is found to have 210 red beans, 124 mung beans, 30 black beans and 36 black-eyed
beans. Test the hypothesis that the package contains the mixed beans in the ratio 5:3:1:1 at the
0.05 significance level.
6. A boy buys a bag of 100 jelly beans. This bag has 5 different colours of jelly beans in it. Assume all
five colours are equally likely to be put in the bag. The boy is curious about the colour distribution
and opens the bag. He finds that he has 17 brown, 24 yellow, 10 red, 31 green and 18 white. Test
the hypothesis that the colours of the jelly beans occur with equal frequency at a significance level of
5%.
7. The number of road accidents per week at a junction is monitored by the public traffic department.
The table below shows the frequency of accidents per week in 60 weeks.

Number of accidents   0    1    2    3
Observed frequency    28   15   12   5

(a) Determine the mean number of accidents per week.
(b) Test the hypothesis that the data follow a Poisson distribution at the 5% significance level.
8. The following frequency distribution table represents the number of days during a year that a total of
50 employees at a company are absent from work due to illness. It is thought that the data follow a
normal distribution with population mean $\mu = 7$ and standard deviation $\sigma = 3$.

Number of days absent   Number of employees
0-3                     4
3-6                     13
6-9                     24
9-12                    7
12-15                   2

Test the goodness of fit between the observed class frequencies and the corresponding expected
frequencies of a normal distribution at the 5% significance level.
9. A paper shop has several retail stores in a city. The following table shows the number of boxes shipped
per day for the last 100 days.

Number of packages shipped   Number of days
0-5                          5
5-10                         13
10-15                        28
15-20                        23
20-25                        18
25-30                        10
30-35                        3

(a) Calculate the sample mean and sample standard deviation of the number of packages shipped per day.
(b) Use a 5% significance level to test the goodness of fit between the observed class frequencies and
the corresponding expected frequencies of a normal distribution.
10. The table below shows the number of rain days in January for the years from 1953 to 2004.

Number of rain days   0   1   2    3    4   5
Observed frequency    9   7   14   15   6   1

(a) Find the mean number of rain days.
(b) Test the hypothesis that the recorded data may be fitted by the Poisson distribution at the 10%
significance level.
11. A recent study reports the number of hours of personal computer usage per week for a sample of 60
persons. Excluded from the study are people who work in an office and use the computer as part of
their work.

1.1   6.7   2.2   2.6   9.8   6.4   4.9   5.2   4.5   9.3    7.9   4.6
4.3   4.5   9.3   5.3   6.3   8.8   6.5   0.6   5.2   6.6    9.3   4.3
6.3   2.1   2.7   0.4   5.1   5.6   5.4   4.8   2.1   10.1   1.3   5.6
2.4   2.4   4.7   1.7   2.0   6.7   3.7   3.3   1.1   2.7    6.7   6.5
4.3   9.7   7.7   5.2   1.7   8.5   4.2   5.5   9.2   8.5    6.0   8.1

(a) Organise the data into a frequency distribution.
(b) Compute the sample mean and sample standard deviation of the number of hours of computer usage
per week.
(c) It is thought that the data follow a normal distribution. Test the hypothesis at the 5% significance
level.
When two attributes (variables) are observed for each element of a random sample, the data can be
simultaneously classified with respect to these attributes in a two-way classification table called a contingency
table. We can then determine whether there is a significant association between the two attributes.
Suppose we take a random sample of 200 persons and classify them by gender and by whether they
own handphones. The observed frequencies are presented in the following 2 x 2 contingency table.

         Own handphone (yes)   Own handphone (no)   Total
Male     70                    60                   130
Female   30                    40                   70
Total    100                   100                  200

A contingency table can be of any size. In general, a contingency table with $r$ rows and $c$ columns is denoted
as an $r \times c$ table. The row and column totals in the above table are called marginal frequencies. It is common
practice to refer to each possible outcome of an experiment as a cell. Hence in our example we have four cells.
Let us test the hypothesis of independence between a person's gender and a person's possession of a handphone.
To perform this test, we first calculate the expected frequency for each of the four cells of the above
2 x 2 contingency table under the assumption that the hypothesis is true.
Let $M$ represent the event that an individual selected from the sample is male.
Let $Y$ represent the event that an individual selected owns a handphone.
Since $M$ and $Y$ are independent events, $P(M \cap Y) = P(M)P(Y)$. But $P(M \cap Y) = \frac{e_{11}}{200}$, $P(M) = \frac{130}{200}$ and
$P(Y) = \frac{100}{200}$. Thus, we have
$$\frac{e_{11}}{200} = \left(\frac{130}{200}\right)\left(\frac{100}{200}\right)$$
which we can rearrange as
$$e_{11} = \frac{130 \times 100}{200} = \frac{(\text{First row total})(\text{First column total})}{\text{Total sample size}}$$
where $e_{11}$ is the expected frequency for the cell in row 1 and column 1.
The general formula for obtaining the expected frequency of any cell is given by
$$\text{Expected frequency} = \frac{(\text{Row total})(\text{Column total})}{\text{Total sample size}}$$
The expected frequency for each cell is recorded in parentheses beside the actual observed value in the table
shown below.

         Own handphone (yes)   Own handphone (no)   Total
Male     70 (65)               60 (65)              130
Female   30 (35)               40 (35)              70
Total    100                   100                  200

Note that the expected frequencies in any row or column add up to the appropriate marginal total. We
need to calculate only one expected frequency in the top row of the table and can then find the others by
subtraction. The number of degrees of freedom associated with the chi-squared test used here is equal to the
number of cell frequencies that may be filled in freely when we are given the marginal totals and the grand
total, and in this illustration that number is 1. A simple formula providing the correct number of degrees of
freedom is
$$\nu = (r - 1)(c - 1)$$
Hence, for our example, $\nu = (2 - 1)(2 - 1) = 1$ degree of freedom.
We want to measure how much the observed frequencies differ collectively from their corresponding expected
frequencies. We do this with the chi-squared test statistic
$$\chi^2 = \sum_{i} \frac{(o_i - e_i)^2}{e_i}$$
where the summation extends over all the cells in the $r \times c$ contingency table.
We have
$$\chi^2 = \frac{(70 - 65)^2}{65} + \frac{(60 - 65)^2}{65} + \frac{(30 - 35)^2}{35} + \frac{(40 - 35)^2}{35} = 2.1978$$
Using a chi-squared table, we can see that for $\nu = 1$, the critical value at the 5% significance level is 3.841.
Since the calculated value of $\chi^2$, 2.1978, does not fall within the critical region, we do not reject the
hypothesis that there is no relationship between a person's gender and the person's possession of a handphone.
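The expected-frequency formula (row total times column total, divided by the sample size) is easy to apply to a whole table at once. The following is a minimal sketch for the handphone data; the function name `expected_frequencies` is illustrative.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

observed = [[70, 60],   # male:   owns a handphone, does not own one
            [30, 40]]   # female: owns a handphone, does not own one
expected = expected_frequencies(observed)

# Each row of expected frequencies adds up to the same marginal total as the observed row.
row_sums_match = all(abs(sum(e) - sum(o)) < 1e-9 for e, o in zip(expected, observed))

stat = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
print(expected, round(stat, 4), row_sums_match)
```

The expected counts are 65, 65, 35 and 35, and the statistic is about 2.1978, as in the hand calculation.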
Example 5
The following data show the attitude of housewives in various parts of the country
to a certain brand of detergent.

Attitude      North   Central   South
Like          46      21        31
Indifferent   25      58        35
Dislike       16      37        42

Test the hypothesis that the attitude to the newly introduced detergent is independent
of geographical area of residence at the 1% significance level.
The given table is rearranged to include the row and column totals.

Attitude      North   Central   South   Total
Like          46      21        31      98
Indifferent   25      58        35      118
Dislike       16      37        42      95
Total         87      116       108     311

Step 1: State the hypotheses
$H_0$: There is no association between attitude and location
$H_1$: There is an association between attitude and location
Step 2: Specify the significance level
Given $\alpha = 0.01$
Step 3: Select the appropriate test statistic and calculate its value
Use the chi-squared test for independence to determine whether there is any
significant association between the two categorical variables.
As with the goodness-of-fit test described earlier, the key idea of the chi-squared
test for independence is a comparison of observed and expected frequencies.
The expected frequency for each cell of the table can be generated using the
following formula:
$$\text{Expected frequency} = \frac{(\text{Row total})(\text{Column total})}{\text{Total sample size}}$$
In fact, for a 3 x 3 contingency table, only four expected values in the top two
rows of the table need to be calculated; the remaining five expected values are found
by subtraction. For example, the expected frequency for attitude
"like" and location "north" is $\frac{98 \times 87}{311} = 27.41$. In this way, the table of both observed and
expected frequencies is as shown below.

Attitude      North        Central      South        Total
Like          46 (27.41)   21 (36.55)   31 (34.04)   98
Indifferent   25 (33.01)   58 (44.01)   35 (40.98)   118
Dislike       16 (26.58)   37 (35.44)   42 (32.98)   95
Total         87           116          108          311

The number of degrees of freedom is $\nu = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$.
The chi-squared test statistic is
$$\chi^2 = \sum_{i} \frac{(o_i - e_i)^2}{e_i}$$
$$= \frac{(46 - 27.41)^2}{27.41} + \frac{(21 - 36.55)^2}{36.55} + \frac{(31 - 34.04)^2}{34.04} + \frac{(25 - 33.01)^2}{33.01} + \frac{(58 - 44.01)^2}{44.01}$$
$$+ \frac{(35 - 40.98)^2}{40.98} + \frac{(16 - 26.58)^2}{26.58} + \frac{(37 - 35.44)^2}{35.44} + \frac{(42 - 32.98)^2}{32.98}$$
$$= 33.5057$$
Step 4: Determine the critical region
From the chi-squared table, the critical value of $\chi^2$ for 4 degrees of freedom at the 1% level
is 13.28.
Step 5: Make a decision
As the calculated value 33.51 is greater than the critical value 13.28, we
conclude that there is evidence to reject $H_0$; that is, attitude to the new detergent and
geographical area of residence are not independent.
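The five steps of Example 5 can be collected into one routine for any $r \times c$ table. This is a sketch under stated assumptions: the function name `independence_test` is illustrative, the critical value 13.28 is copied from the table rather than computed, and because the expected frequencies are not rounded to two decimals the statistic differs slightly from the hand-calculated 33.5057.

```python
def independence_test(table):
    """Return the chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total   # expected count under independence
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

attitudes = [[46, 21, 31],   # like:        north, central, south
             [25, 58, 35],   # indifferent
             [16, 37, 42]]   # dislike
stat, df = independence_test(attitudes)
critical = 13.28             # table value for 4 degrees of freedom at the 1% level
reject = stat > critical
print(round(stat, 2), df, reject)
```

The statistic is roughly 33.5 with 4 degrees of freedom, so the hypothesis of independence is rejected, in agreement with the decision in the example.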
Exercise 6.3
1. An experiment has 500 observations and the data are classified into a 4 x 6 contingency table. Suppose
we conduct a chi-squared test of independence at the 1% significance level. Assume the calculated
value of the chi-squared test statistic is 39.2.
(a) Determine the number of degrees of freedom.
(b) Find the critical value for the chi-squared test of independence.
(c) Determine whether the chi-squared test value falls into the critical region.
2. The following 3 x 2 contingency table contains observed values for a sample of size 250. Determine
whether the row and column variables are independent using the chi-squared test with $\alpha = 0.025$.

     X    Y
A    25   37
B    55   32
C    63   38
3. A research group performs a study on gender and handedness (right- or left-handed). 800 individuals
are randomly chosen from a very large population. The following contingency table displays the
distribution of the two categories.

         Right-handed   Left-handed
Male     344            72
Female   352            32

Test the hypothesis that gender is independent of handedness at the 5% significance level.
4. Consider a sample of 200 customers. For each customer, we have information on gender and preference
of food. A contingency table for these data is shown below.

         Indian   Japanese   Western
Male     50       20         40
Female   20       50         20

Carry out a test, at the 5% significance level, to determine whether there is any association between
gender and preference of food.
5. In an experiment to study the association between diabetes and smoking habits, the following data are
obtained.

              Non-smokers   Moderate smokers   Heavy smokers
Diabetes      25            30                 18
No diabetes   40            21                 16

Using a 1% significance level, test the hypothesis that there is no association between cigarette smoking
and the risk of diabetes.
6. A camera manufacturer has four suppliers of lenses. The table below shows the numbers of good and defective
lenses supplied by the suppliers.

             Good   Defective
Supplier 1   95     5
Supplier 2   180    15
Supplier 3   134    16
Supplier 4   138    7

Test, at the 5% significance level, whether the supplier is associated with the lens quality. What is your
advice to the purchasing department based on the test result?
7. The table shows the result of a taste test in which a random sample of 500 people in two age groups
is asked which of four formulations of a chocolate drink they prefer.

Age group   Formulation A   Formulation B   Formulation C   Formulation D
7-25        30              69              116             78
26-50       28              36              70              73

Use a 0.01 significance level to test whether the preference for the different formulations changes with
age.
8. Fruit trees are subject to a bacteria-caused disease. Several different treatments for this disease are
adopted. Treatment A: no action taken; treatment B: careful removal of clearly affected branches; and
treatment C: frequent spraying of the leaves with an antibiotic in addition to careful removal of clearly
affected branches. There are a few different outcomes from the disease. Outcome 1: tree dies in the same
year as the disease is noticed; outcome 2: tree dies 2-4 years after the disease is noticed; outcome 3: tree
survives beyond 4 years. A group of 200 trees are each assigned to one of the treatments and over the
next few years the outcome is recorded. The results are displayed in the following contingency table.

           Treatment
Outcome    A    B    C
1          37   24   17
2          16   20   32
3          3    15   36

Determine whether there is any substantial evidence to conclude that outcome is independent of
treatment. Use a 5% significance level for this test.
9. The table below shows the observed distribution of blood types: A, B, AB, and O in three samples of Malays living in Kedah, Selangor and Johor.
Blood type   Kedah   Selangor   Johor
A              14       205       41
B              16       184       37
AB              3        51       11
O              17       232       51
Test, at the 5% significance level, whether the distribution of blood type is different across the three states.
10. A manufacturer operates four assembly machines on three separate shifts daily. The table below gives
the number of machine breakdowns recorded in the past year.
               Machine 1   Machine 2   Machine 3   Machine 4
First shift        75          89          43          28
Second shift       90         108          63          59
Third shift       141         175         121         141
Determine whether these data provide sufficient evidence, at the 2.5% significance level, to infer that machine breakdown is independent of shift.
Summary
1. The chi-squared distribution has one parameter, called the number of degrees of freedom.
2. The chi-squared distribution curve lies to the right of the vertical axis and is skewed to the right.
3. In a goodness-of-fit test, we test the null hypothesis that the observed frequencies follow a certain distribution.
4. In a test of independence, we test the null hypothesis that two attributes are independent.
5. General test procedure in a chi-squared test:
• State the hypotheses
• Specify the significance level
• Calculate the value of the chi-squared test statistic $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$ (combine any adjacent classes where necessary)
• Determine the critical region based on the number of degrees of freedom and the significance level
• Make a decision
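The general test procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the text: the function names, the die frequencies and the 2 x 3 table are hypothetical, and the critical values 11.07 (5 degrees of freedom) and 5.991 (2 degrees of freedom) are the usual 5% values read from a chi-squared table.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_under_independence(table):
    """Expected frequencies E_ij = (row total x column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def reject_null(statistic, critical_value):
    """Reject H0 when the statistic falls in the upper-tail critical region."""
    return statistic > critical_value


# Goodness-of-fit: hypothetical die thrown 60 times, H0: the die is unbiased.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6                      # 60 throws / 6 faces
gof_stat = chi_squared_statistic(observed, expected)
gof_reject = reject_null(gof_stat, 11.07)   # 5% level, 6 - 1 = 5 degrees of freedom

# Test of independence: hypothetical 2 x 3 contingency table.
table = [[10, 20, 30],
         [20, 20, 20]]
expected_table = expected_under_independence(table)
flat_observed = [x for row in table for x in row]
flat_expected = [x for row in expected_table for x in row]
ind_stat = chi_squared_statistic(flat_observed, flat_expected)
ind_df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1)(columns - 1)
ind_reject = reject_null(ind_stat, 5.991)         # 5% level, 2 degrees of freedom

print(f"goodness-of-fit: statistic = {gof_stat:.3f}, reject H0: {gof_reject}")
print(f"independence: statistic = {ind_stat:.3f}, df = {ind_df}, reject H0: {ind_reject}")
```

In both hypothetical cases the statistic is below the critical value, so the null hypothesis is not rejected. The sketch does not combine classes; any classes with small expected frequencies would have to be merged by hand before calling `chi_squared_statistic`.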
REVISION EXERCISE
1. (a) Find $P(0.83 < \chi^2_5 < 12.8)$.
(b) Determine the value of k such that P(6.447 < X'r, < k) = O.Oag.
2. Three identical dice are thrown 150 times. The number of dice whose scores on the top faces at each throw are odd is recorded. The results are as follows:
Number of odd scores    0    1    2    3
Frequency              33   59   43   15
Using a 5% significance level, test the hypothesis that all three dice are unbiased.
3. A departmental store sells men's shirts and stocks these shirts in five different sizes: S, M, L, XL, and XXL. The number of the shirts sold each week is recorded.
Sizes   Number of shirts
S             21
M             24
L             39
XL            25
XXL           13
Test, at a 10% significance level, the hypothesis that the number of shirts sold is uniformly distributed.
4. Cars heading to a certain junction may go straight, turn left or turn right. A road transport department officer asserts that 60% of the cars will go straight at the intersection, and of the remaining 40%, equal proportions will turn left and right. One hundred cars are randomly monitored and it is found that 51 cars go straight, 17 cars turn left and 32 cars turn right. Test, at the 5% significance level, the hypothesis that the proportions of cars going straight, turning left and turning right do not differ significantly from those asserted by the officer.
5. A pharmaceutical company conducts a trial on 200 patients to determine the effectiveness of a new cough remedy. Of these patients, 100 are randomly selected to be given the standard cough remedy and the remaining 100 are assigned the new cough remedy. The results are recorded as shown.
              Standard cough remedy   New cough remedy
No relief              53                    37
Some relief            34                    44
Full relief            13                    19
Carry out a test, at a significance level of 5%, to investigate whether the two cough remedies are equally effective.
6. A football fan keeps the record of the goals scored per match by his favourite team. The results are shown below.
Goals obtained per match
Number of matches
11 16 25 14
34
(a) Compute the mean number of goals scored per match.
(b) Using a 5% significance level, perform a test of the hypothesis that the number of goals per match has a Poisson distribution.
7. The following table gives the cumulative frequency distribution of the lives (in years) of 40 notebook batteries tested by a battery manufacturer.
Battery life not greater than   1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
Cumulative frequency            0 2 J l 22 32 3t 40
Based on previous experience, it is believed that a normal distribution with mean 3.5 years and standard deviation 0.7 year provides a good approximation. Perform a chi-squared test, at the 5% significance level, to determine whether the normal distribution gives a good fit for these data.
8. The table below shows the frequency distribution of marks for a paper obtained by 178 candidates.
Mark, x       Number of candidates
50<x<60                5
40<x<50               19
30<x<40               34
20<x<30               63
10<x<20               47
0<x<10                10
The population mean and standard deviation of the distribution of marks for the paper are 26.0 and 11.5 respectively. Test, at the 10% significance level, the hypothesis that the distribution of marks for the paper is normal.
9. A botanist sows three seeds in each of 80 pots. The number of seeds which germinate in each pot is
recorded. The results for all 80 pots are given in the following table.
Number of seeds germinated    0    1    2    3
Number of pots               25   20   29    6
(a) Estimate the probability that an individual seed germinates.
(b) Using a 1% significance level, test the hypothesis that the data may be fitted by a binomial distribution.
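10. The distribution of marks for a paper in an examination has mean $\mu$ and standard deviation $\sigma$. Each candidate is assigned one of the five grades A, B, C, D, E as follows:
Mark, x                                                     Grade
$x \ge \mu + \frac{3\sigma}{2}$                               A
$\mu + \frac{\sigma}{2} \le x < \mu + \frac{3\sigma}{2}$      B
$\mu - \frac{\sigma}{2} \le x < \mu + \frac{\sigma}{2}$       C
$\mu - \frac{3\sigma}{2} \le x < \mu - \frac{\sigma}{2}$      D
$x < \mu - \frac{3\sigma}{2}$                                 E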
The table below summarises the grades of a random sample of 198 candidates.
Grade                   A    B    C    D    E
Number of candidates   17   55   81   33   12
Determine, at the 1% significance level, the adequacy of a normal distribution as a model for these
data.
11. The lengths (in millimetres) in a random sample of 50 leaves of a certain plant are recorded
as follows:
145 133 125 157 165 138 143 151 148 132
155 136 144 158 147 152 140 148 146 150
138 177 165 118 154 126 163 121 140 168
163 135 147 153 146 140 173 142 135 138
156 147 142 128 144 145 151 135 161 150
Test the hypothesis that the leaf length can be approximately modelled by a normal distribution.
Use a 0.05 significance level.
12. The table below shows the number of individuals exposed to a certain virus and the number of individuals who develop the disease.
                     Development of disease
                       Yes      No
Exposure to   Yes       44     116
virus         No        19     128
Conduct a test of hypothesis, at the 1% significance level, to determine whether there is an association
between the exposure to the virus and the development of the disease.
13. The table below shows the number of males and females in each of three employment categories at a manufacturing company.
          Managerial   Support   Worker
Male          10          39       285
Female         6          52       624
Using a 1% significance level, test whether there is any association between gender and employment categories.
14. A researcher in a study of heart disease in males links subjects to socioeconomic status and smoking habits. The results are summarised in the contingency table below.
Socioeconomic status
High Middle Low
Current 66 29)T9Ktng Former t 19 27hablts Never gg lz
55
36
30
Perform a chi-squared test of association between smoking habits and socioeconomic status. Use a
significance level of 2.5%.
15. A hypermarket wants to study the relationship between the method of payment and the age group of its customers. A random sample of 250 customers is taken and the results are summarised
in the table below.
                          Age group
                  18-25   26-35   36-45   Over 46
Payment   Card      18      36      25       30
method    Cash      14      27      33       67
Carry out a test at the 5% significance level to find out whether the method of payment is independent
of age group.
16. The School of Biological Sciences of a university records the level of exposure to a certain pollutant and the number of brain abnormalities for laboratory mice. The data are summarised in the table below.
                               Number of brain abnormalities
                               0-2   3-4   5-6
Level of exposure   High
to pollutant        Medium
                    Low
6 t28 7 18 7 8 39 13 8
Test, at the 5% significance level, whether there is an association between the level of exposure to the pollutant and the number of brain abnormalities found in the laboratory mice.
17. The table below summarises the number of hours of sleep at night for a random sample of adults of different age groups.
                      Number of hours of sleep
Age group     Less than 6   6 to 8   More than 8
25-44              41          85         70
45-54              34          77         62
≥ 55               76          69         43
Carry out a test, at the 1% significance level, to determine whether the number of hours of sleep is
independent of the age of an adult.
18. A plant expert collects samples of rice from a large field of 600 plots. One part of his investigation is
based on the sterility observed and genotype used for each plot.
                      Genotypes
Sterility        I     II    III    IV
No problem      30     21     19    16
Moderate       102     90    120    77
Severe          18     39     11    57
Test, at a 1% significance level, whether sterility is independent of genotype.