Post on 02-Jan-2016
Chapter 13: Categorical Data Analysis
Statistics
McClave, Statistics, 11th ed. Chapter 13: Categorical Data Analysis
Where We’ve Been
Presented methods for making inferences about the population proportion associated with a two-level qualitative variable (i.e., a binomial variable)
Presented methods for making inferences about the difference between two binomial proportions
Where We’re Going
Discuss qualitative (categorical) data with more than two outcomes
Present a chi-square hypothesis test for comparing the category proportions associated with a single qualitative variable – called a one-way analysis
Present a chi-square hypothesis test relating two qualitative variables – called a two-way analysis
13.1: Categorical Data and the Multinomial Experiment
Properties of the Multinomial Experiment
1. The experiment consists of n identical trials.
2. There are k possible outcomes (called classes, categories or cells) to each trial.
3. The probabilities of the k outcomes, denoted by p1, p2, …, pk, where p1 + p2 + … + pk = 1, remain the same from trial to trial.
4. The trials are independent.
5. The random variables of interest are the cell counts n1, n2, …, nk of the number of observations that fall into each of the k categories.
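The five properties above can be illustrated with a small simulation. This is only a sketch: the function name `run_multinomial_experiment`, the seed, and the choice of n = 150 trials with three equally likely categories are assumptions made for illustration, not part of the text.

```python
import random
from collections import Counter


def run_multinomial_experiment(n, probs, seed=0):
    """Simulate n identical, independent trials with len(probs) outcomes.

    Returns the cell counts n1, ..., nk (hypothetical helper for illustration).
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    categories = range(1, len(probs) + 1)     # categories labelled 1..k
    # The same probabilities are used on every trial, and draws are independent.
    draws = rng.choices(categories, weights=probs, k=n)
    tally = Counter(draws)
    return [tally[c] for c in categories]


# Assumed illustration values: n = 150 trials, k = 3 equally likely categories.
counts = run_multinomial_experiment(150, [1 / 3, 1 / 3, 1 / 3])
print("cell counts:", counts, "total:", sum(counts))
```

The cell counts always add up to n, which is why only k − 1 of them are free to vary.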
13.2: Testing Categorical Probabilities: One-Way Table
Suppose three candidates are running for office, and 150 voters are asked their preferences. Candidate 1 is the choice of 61 voters. Candidate 2 is the choice of 53 voters. Candidate 3 is the choice of 36 voters.
Do these data suggest the population may prefer one candidate over the others?
Candidate 1   Candidate 2   Candidate 3   Total
61            53            36            n = 150
H0: p1 = p2 = p3 = 1/3 (No preference)
Ha: At least one of the proportions exceeds 1/3
E(Number of votes for each candidate | H0) = 150/3 = 50
A chi-square (χ²) test is used to test H0.
\[ \chi^2 = \frac{[n_1 - E_1]^2}{E_1} + \frac{[n_2 - E_2]^2}{E_2} + \frac{[n_3 - E_3]^2}{E_3} = \sum_{i=1}^{3} \frac{[n_i - E_i]^2}{E_i} \]
\[ \chi^2 = \frac{[61 - 50]^2}{50} + \frac{[53 - 50]^2}{50} + \frac{[36 - 50]^2}{50} = 6.52 \]
\[ \chi^2_{.05,\ df = 2} = 5.99147 \]
Reject the null hypothesis: χ² = 6.52 exceeds the critical value 5.99147 (α = .05, df = 2).
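The arithmetic for the three-candidate example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the p-value line is an addition that relies on the fact that, for df = 2 only, the upper-tail area of the chi-square distribution equals exp(−x/2).

```python
import math

observed = [61, 53, 36]           # votes for candidates 1, 2, 3 (from the example)
n = sum(observed)                 # 150 voters
expected = [n / 3] * 3            # 50 per candidate under H0: p1 = p2 = p3 = 1/3

# Chi-square statistic: sum of (observed - expected)^2 / expected over the cells.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 5.99147          # chi-square table value for alpha = .05, df = 2
reject_h0 = chi_sq > critical_value

# Closed form valid for df = 2 only (assumption noted in the lead-in).
p_value = math.exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.2f}, reject H0: {reject_h0}, p-value = {p_value:.4f}")
```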
Test of a Hypothesis about Multinomial Probabilities: One-Way Table
H0: p1 = p1,0, p2 = p2,0, … , pk = pk,0
where p1,0, p2,0, …, pk,0 represent the hypothesized values of the multinomial probabilities
Ha: At least one of the multinomial probabilities does not equal its hypothesized value
Test statistic:
\[ \chi^2 = \sum_{i=1}^{k} \frac{[n_i - E_i]^2}{E_i} \]
where \( E_i = n p_{i,0} \) is the expected cell count given the null hypothesis.
Rejection region: \( \chi^2 > \chi^2_\alpha \), with (k − 1) df.
Conditions Required for a Valid χ² Test: One-Way Table
1. A multinomial experiment has been conducted.
2. The sample size n will be large enough so that, for every cell, the expected cell count E(ni) will be equal to 5 or more.
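The second condition is easy to check before running the test. The sketch below assumes a hypothetical helper named `small_cells`; the null proportions .07, .18, .65, .10 and n = 500 are the values used in the marijuana-possession example, while n = 60 is an invented sample size used only to show a failing case.

```python
def expected_counts(n, null_probs):
    """Expected cell counts E(ni) = n * p(i,0) under the null hypothesis."""
    return [n * p for p in null_probs]


def small_cells(n, null_probs, minimum=5):
    """Return the 1-based indices of cells whose expected count is below `minimum`."""
    return [
        i
        for i, e in enumerate(expected_counts(n, null_probs), start=1)
        if e < minimum
    ]


null_probs = [0.07, 0.18, 0.65, 0.10]

# n = 500: every expected count is at least 5, so the condition holds.
print(small_cells(500, null_probs))

# n = 60 (hypothetical): cell 1 has expected count 60 * .07 = 4.2, which is too small.
print(small_cells(60, null_probs))
```

An empty list means the large-sample condition is satisfied for every cell.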
Example 13.2: Distribution of Opinions About Marijuana Possession Before Television Series has Aired
Legalization   Decriminalization   Existing Law   No Opinion
7%             18%                 65%            10%
Table 13.2: Distribution of Opinions About Marijuana Possession After Television Series has Aired
Legalization   Decriminalization   Existing Law   No Opinion
39             99                  336            26
Expected Distribution of 500 Opinions About Marijuana Possession After Television Series has Aired
Legalization   Decriminalization   Existing Law    No Opinion
500(.07)=35    500(.18)=90         500(.65)=325    500(.10)=50
H0: p1 = .07, p2 = .18, p3 = .65, p4 = .10
Ha: At least one of the proportions differs from its null hypothesis value.
Test statistic:
\[ \chi^2 = \sum_{i=1}^{4} \frac{[n_i - E_i]^2}{E_i} \]
Rejection region:
\[ \chi^2 > \chi^2_{.01,\ df = 3} = 11.3449 \]
Computed value of the test statistic:
\[ \chi^2 = \frac{(39 - 35)^2}{35} + \frac{(99 - 90)^2}{90} + \frac{(336 - 325)^2}{325} + \frac{(26 - 50)^2}{50} = 13.249 \]
Reject the null hypothesis: χ² = 13.249 exceeds the critical value 11.3449 (α = .01, df = 3).
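The same calculation can be reproduced cell by cell, which also shows which category drives the result. This is a sketch with illustrative variable names; the counts, null proportions, and critical value are the ones given in the example.

```python
labels = ["Legalization", "Decriminalization", "Existing Law", "No Opinion"]
null_probs = [0.07, 0.18, 0.65, 0.10]   # distribution before the series aired
observed = [39, 99, 336, 26]            # opinions after the series aired
n = sum(observed)                       # 500 respondents

expected = [n * p for p in null_probs]  # 35, 90, 325, 50 (up to rounding)

# Contribution of each cell to the chi-square statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

critical_value = 11.3449                # chi-square table value, alpha = .01, df = 3
reject_h0 = chi_sq > critical_value

for label, contribution in zip(labels, contributions):
    print(f"{label:<18} {contribution:7.3f}")
print(f"chi-square = {chi_sq:.3f}, reject H0: {reject_h0}")
```

Most of the statistic comes from the "No Opinion" cell, which is a natural reason to look at that single proportion more closely.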
Inferences can be made on any single proportion as well. A 95% confidence interval on the proportion of citizens in the viewing area with no opinion is
\[ \hat{p}_4 \pm 1.96\,\sigma_{\hat{p}_4} \]
where
\[ \hat{p}_4 = \frac{n_4}{n} = \frac{26}{500} = .052 \]
and
\[ \sigma_{\hat{p}_4} \approx \sqrt{\frac{\hat{p}_4 (1 - \hat{p}_4)}{n}} = \sqrt{\frac{.052(.948)}{500}} = .0099 \]
so that
\[ \hat{p}_4 \pm 1.96\,\sigma_{\hat{p}_4} = .052 \pm 1.96(.0099) = .052 \pm .019 \]
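The interval can be computed directly from the "No Opinion" count. This is a minimal sketch with illustrative variable names, using the large-sample standard error with the sample proportion substituted for the unknown p4.

```python
import math

n = 500        # sample size
n4 = 26        # respondents with no opinion after the series aired
z = 1.96       # standard normal value for 95% confidence

p_hat = n4 / n                                      # sample proportion, .052
std_error = math.sqrt(p_hat * (1 - p_hat) / n)      # estimated standard error
margin = z * std_error

lower, upper = p_hat - margin, p_hat + margin
print(f"p-hat = {p_hat:.3f}, margin = {margin:.3f}, "
      f"interval = ({lower:.3f}, {upper:.3f})")
```

The whole interval lies below the pre-series value of .10 for this category.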
13.3: Testing Categorical Probabilities: Two-Way Table
Chi-square analysis can also be used to investigate studies based on qualitative factors. Does having one characteristic make it
more/less likely to exhibit another characteristic?
                        Column
                 1      2      …     c      Row Totals
Row       1      n11    n12    …     n1c    R1
          2      n21    n22    …     n2c    R2
          ⋮      ⋮      ⋮            ⋮      ⋮
          r      nr1    nr2    …     nrc    Rr
Column Totals    C1     C2     …     Cc     n
The columns are divided according to the subcategories for one qualitative variable and the rows for the other qualitative variable.
General Form of a Two-way (Contingency) Table Analysis: A Test for Independence
H0: The two classifications are independent
Ha: The two classifications are dependent
Test statistic:
\[ \chi^2 = \sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} \]
where
\[ E_{ij} = \frac{R_i C_j}{n} \]
and \( R_i \) = total for row i, \( C_j \) = total for column j, n = sample size.
Rejection region: \( \chi^2 > \chi^2_\alpha \), df = (r − 1)(c − 1)
13.3: Testing Categorical Probabilities: Two-Way Table
McClave, Statistics, 11th ed. Chapter 13: Categorical Data Analysis
19
The results of a survey regarding marital status and religious affiliation are reported below (Example 13.3 in the text).
                              Religious Affiliation
Marital Status            A      B      C      D      None   Totals
Divorced                  39     19     12     28     18     116
Married, never divorced   172    61     44     70     37     384
Totals                    211    80     56     98     55     500
H0: Marital status and religious affiliation are independent
Ha: Marital status and religious affiliation are dependent
The expected frequencies (see Figure 13.4) are included below:
                              Religious Affiliation
Marital Status            A              B            C            D            None         Totals
Divorced                  39 (48.95)     19 (18.56)   12 (12.99)   28 (22.74)   18 (12.76)   116
Married, never divorced   172 (162.05)   61 (61.44)   44 (43.01)   70 (75.26)   37 (42.24)   384
Totals                    211            80           56           98           55           500
The chi-square value computed with SAS is 7.1355, with p-value = .1289.
Even at the α = .10 level, we cannot reject the null hypothesis.
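The SAS result can be reproduced by hand from the observed counts. The sketch below uses only the Python standard library; the helper `chi_sq_upper_tail_even_df` is a hypothetical name, and it relies on the closed-form upper-tail area that holds for even degrees of freedom, exp(−x/2) · Σ (x/2)^j / j! for j = 0, …, df/2 − 1. It is not the procedure SAS itself uses.

```python
import math

# Observed counts: rows are marital status, columns are affiliations A, B, C, D, None.
observed = [
    [39, 19, 12, 28, 18],     # Divorced
    [172, 61, 44, 70, 37],    # Married, never divorced
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for cell (i, j) under independence: Ri * Cj / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi_sq_upper_tail_even_df(x, df):
    """Upper-tail chi-square area; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


p_value = chi_sq_upper_tail_even_df(chi_sq, df)
print(f"chi-square = {chi_sq:.4f}, df = {df}, p-value = {p_value:.4f}")
```

Because the p-value is larger than .10, the statistic does not fall in the rejection region at that level.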
13.4: A Word of Caution About Chi-Square Tests
Be sure