Date posted: 18-Jan-2016
Uploaded by: philip-carr
Dan Piett
STAT 211-019
West Virginia University
Lecture 12
Last Week
Hypothesis tests on a difference in means
Hypothesis tests on a difference in proportions
The 2-sided alternative
Overview
Chi-Squared Goodness of Fit Test
Chi-Squared Test of Independence
Section 12.1
Chi-Squared Goodness of Fit Test
Multinomial Data
Previously we have looked at data coming from a binomial distribution: 2 outcomes (Success, Failure). Example: flipping a coin (Heads, Tails).
Suppose we are interested in data with more than 2 outcomes. Example: rolling a die gives 6 outcomes (1, 2, 3, 4, 5, 6).
We obtain multinomial data from a multinomial experiment.
Multinomial Experiments
Multinomial experiments have the following properties:
1. A fixed number of trials, n.
2. Each trial results in exactly one of K possible outcomes.
3. $p_i$ is the probability of getting outcome i on a single trial, with $p_1 + p_2 + \cdots + p_K = 1$.
4. Trials are independent.
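The four properties above can be illustrated with a short simulation. This is a minimal Python sketch, not part of the original lecture: the function name simulate_multinomial and the seed value are illustrative choices. It rolls a fair die n times with the standard library's random.choices, so each trial independently lands on exactly one of the K = 6 outcomes, and the counts always add up to n.

```python
import random
from collections import Counter


def simulate_multinomial(n, outcomes, probs, seed=None):
    """Run n independent trials; each trial yields exactly one of the outcomes."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("the probabilities p_1, ..., p_K must sum to 1")
    rng = random.Random(seed)
    # random.choices draws k independent outcomes using the given weights
    draws = rng.choices(outcomes, weights=probs, k=n)
    tally = Counter(draws)
    # Report a count for every outcome, including any that never occurred
    return [tally[o] for o in outcomes]


faces = [1, 2, 3, 4, 5, 6]
counts = simulate_multinomial(600, faces, [1 / 6] * 6, seed=211)
print(counts)  # six counts that always sum to 600
```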
Finding Expected Frequencies
Recall that for the binomial distribution, the expected value is $np$.
For the multinomial distribution we have K expected counts, one for each outcome: $E_i = n p_i$.
Example: Rolling a fair 6-sided die 600 times ($p_i = 1/6$ for each outcome)
Outcome           1     2     3     4     5     6
Probability      1/6   1/6   1/6   1/6   1/6   1/6
Expected Count   100   100   100   100   100   100
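The expected counts can be checked with a few lines of Python. This sketch (the helper name expected_counts is hypothetical) applies $E_i = n p_i$ using exact fractions, so that 600 × 1/6 comes out as exactly 100 rather than a rounded decimal.

```python
from fractions import Fraction


def expected_counts(n, probs):
    """Return E_i = n * p_i for each outcome i."""
    if sum(probs) != 1:
        raise ValueError("the probabilities must sum to 1")
    return [n * p for p in probs]


die_probs = [Fraction(1, 6)] * 6
expected = expected_counts(600, die_probs)
print([int(e) for e in expected])  # [100, 100, 100, 100, 100, 100]
```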
Observed Frequencies
When we carry out a multinomial experiment, we will not always get exactly the expected counts. Example: we expected 100 4's in our dice experiment; suppose we get only 85.
85 is the observed frequency for outcome 4, $O_4$. In general, $O_i$ is the observed frequency (count) of outcome i.
The observed frequencies (counts) are our actual data.
Suppose these are the observed counts from our 600 die rolls:
Outcome           1     2     3     4     5     6
Expected Count   100   100   100   100   100   100
Observed Count    97   113   102    85   109    94
Chi-Squared Goodness of Fit Test
The question to ask when looking at a table like this is: “Are the observed counts far enough from the expected counts to conclude that the expected counts (that is, the hypothesized probabilities) are wrong?”
The chi-squared goodness of fit test is designed to answer this question.
Note that the test follows the 7-step procedure.
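Chi-Square Goodness of Fit Test
1. H0: $p_1 = p_{1,0}, p_2 = p_{2,0}, \ldots, p_K = p_{K,0}$, where each $p_{i,0}$ is the hypothesized probability of outcome i
2. HA: At least one $p_i \neq p_{i,0}$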
3. Alpha is .05 if not specified
4. Test Statistic: $\chi^2 = \sum_{i=1}^{K} \dfrac{(O_i - E_i)^2}{E_i}$
5. The p-value comes from the chi-squared table with df = K - 1: p-value = $P(\chi^2_{K-1} > \chi^2)$, the area to the right of the observed test statistic.
There is only one alternative hypothesis (the test is always upper-tailed).
6. Our decision rule will be to reject H0 if p-value < alpha
7. We have (do not have) enough evidence at the .05 level to conclude that at least one of the hypothesized probabilities is incorrect.
We require that the expected count in each cell is at least 5 and that the sample is random with independent observations.
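Steps 4 and 5 can be sketched in Python using only the standard library. The helper names chi2_sf and goodness_of_fit are hypothetical; chi2_sf computes the upper-tail chi-squared area for a whole-number df with a standard recurrence, standing in for a lookup in the chi-squared table. If SciPy is available, scipy.stats.chisquare computes the same statistic and p-value. Applied to the 600 die rolls with H0: every $p_i = 1/6$:

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(X > x) for X ~ chi-squared(df), df a positive integer.

    Starts from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2) and applies
    Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        # Add one recurrence term, computed on the log scale for stability
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def goodness_of_fit(observed, probs):
    """Chi-squared goodness of fit test of H0: p_i = probs[i] for every i."""
    n = sum(observed)
    expected = [n * p for p in probs]
    if min(expected) < 5:
        raise ValueError("every expected count should be at least 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # K - 1
    return stat, df, chi2_sf(stat, df)


observed = [97, 113, 102, 85, 109, 94]
stat, df, p_value = goodness_of_fit(observed, [1 / 6] * 6)
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.3f}")
# chi-square = 5.24, df = 5, p-value = 0.387
alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")  # fail to reject H0
```

The sketch refuses to run when an expected count falls below 5, mirroring the condition stated above.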
Example: In Fall 2013, 99 STAT 211 students were given a choice of 3 section times (A, B, C) for the final exam. The data below show the number of students who selected each section. Do the data indicate that the students have a preference, or that all sections are equally likely to be chosen? Use $\alpha = .05$. (Hint: if all 3 sections are equally likely, every $p_i$ is 1/3.)
Observed Counts: A: 40, B: 30, C: 29
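One way to work this example is sketched below, assuming H0 from the hint (every $p_i = 1/3$, so each expected count is 99/3 = 33). Because df = K - 1 = 2 here, the upper-tail chi-squared area has the closed form $e^{-x/2}$, so no table lookup is needed; that shortcut holds only for df = 2.

```python
import math

observed = {"A": 40, "B": 30, "C": 29}
n = sum(observed.values())  # 99 students
expected = n / 3  # H0: each p_i = 1/3, so every E_i = 33
stat = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1  # K - 1 = 2
# Closed-form upper-tail area, valid only because df = 2
p_value = math.exp(-stat / 2)
print(f"chi-square = {stat:.3f}, df = {df}, p-value = {p_value:.3f}")
# chi-square = 2.242, df = 2, p-value = 0.326
alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")  # fail to reject H0
```

Since 0.326 > .05, this sketch fails to reject H0: the data do not give enough evidence at the .05 level that students prefer any section.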
Section 12.2
Chi-Squared Test for Independence
Association of Categorical Variables
Thus far, all of our confidence intervals and hypothesis tests have been done on numeric variables.
We will now shift our attention to categorical variables. Ex: eye color, class rank.
The question we wish to answer is: “Is there an association between two categorical variables?” Ex: Is there an association between eye color and hair color?
We will use a chi-squared test to answer this question, but first we need to discuss contingency tables.
Contingency Tables (Observed)
We can organize categorical data in a contingency table with r rows and c columns, known as an r x c (“r by c”) contingency table. Note that a contingency table contains observed counts.
Example: Some Possible Values for Hair Color vs. Eye Color
Hair x Eye   Brown   Blue   Green
Black          110     20      8
Brown           65     22      9
Blonde          33     75     12
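Contingency Tables (Expected)
Much like the goodness of fit test, we need to calculate expected counts. The expected count for the cell in row i and column j is
$E_{ij} = \dfrac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$
For example, the Black hair / Brown eyes cell has expected count $138 \times 208 / 354 \approx 81.1$. For the previous example, with expected counts in parentheses:
Hair x Eye   Brown         Blue         Green        Total
Black        110 (81.1)    20 (45.6)    8 (11.3)     138
Brown        65 (??)       22 (??)      9 (??)       96
Blonde       33 (??)       75 (??)      12 (??)      120
Total        208           117          29           354
We now have observed and expected counts, so we can carry out a chi-squared test for independence.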
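The expected-count formula can be applied to the whole table at once. In this Python sketch (the function name expected_table is hypothetical), row and column totals are computed from the observed counts; the first row reproduces the 81.1, 45.6, and 11.3 shown in the table, and the other rows fill in the cells marked ??.

```python
def expected_table(observed):
    """Return E_ij = (row i total) * (column j total) / grand total for each cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Black, Brown, Blonde hair; columns: Brown, Blue, Green eyes
hair_eye = [
    [110, 20, 8],
    [65, 22, 9],
    [33, 75, 12],
]
for row in expected_table(hair_eye):
    print([round(e, 1) for e in row])
# first line printed: [81.1, 45.6, 11.3]
```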
Chi-Squared Test for Independence
1. H0: Variable 1 and Variable 2 are independent
2. HA: Variable 1 and Variable 2 are not independent (dependent)
3. Alpha is .05 if not specified
4. Test Statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
5. The p-value comes from the chi-squared table with df = (r - 1)(c - 1): p-value = $P(\chi^2_{(r-1)(c-1)} > \chi^2)$, the area to the right of the observed test statistic.
There is only one alternative hypothesis (the test is always upper-tailed).
6. Our decision rule will be to reject H0 if p-value < alpha
7. We have (do not have) enough evidence at the .05 level to conclude that the variables are dependent.
We require that the expected count in each cell is at least 5 and that the sample is random with independent observations.
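Putting the steps together, a minimal standard-library sketch of the test for independence follows. The names chi2_sf and independence_test are hypothetical; chi2_sf uses a whole-number-df recurrence for the upper-tail area as a stand-in for the chi-squared table. SciPy users could instead call scipy.stats.chi2_contingency, which also returns the expected counts. Applied to the hair and eye color table:

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(X > x) for X ~ chi-squared(df), df a positive integer."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2  # Q(x; 2) = exp(-x/2)
    else:
        q, k = math.erfc(math.sqrt(half)), 1  # Q(x; 1) = erfc(sqrt(x/2))
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def independence_test(observed):
    """Chi-squared test of H0: the row and column variables are independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count E_ij
            if e < 5:
                raise ValueError("every expected count should be at least 5")
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (r - 1)(c - 1)
    return stat, df, chi2_sf(stat, df)


hair_eye = [[110, 20, 8], [65, 22, 9], [33, 75, 12]]
stat, df, p_value = independence_test(hair_eye)
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.2e}")
alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")  # reject H0
```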
Example
Does “test failure” reduce academic aspirations and thereby contribute to a decision to drop out of school? A random sample of 283 students was selected from schools with low graduation rates. The contingency table below reports their responses to the question “Do tests required for graduation discourage students from staying in school?” Does there appear to be a relationship between the schools' location and the students' responses?
Response x School   Urban   Suburban   Rural
Yes                    57         27      47
No                     23         16      12
Unsure                 45         25      31
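A sketch of this example in Python follows, assuming the sample is random and independent as the lecture requires; the code checks the at-least-5 condition on the expected counts itself. With r = c = 3, df = (3 - 1)(3 - 1) = 4, and for df = 4 the upper-tail area has the closed form $e^{-x/2}(1 + x/2)$; this shortcut is specific to df = 4.

```python
import math

# Rows: Yes, No, Unsure; columns: Urban, Suburban, Rural
responses = [
    [57, 27, 47],
    [23, 16, 12],
    [45, 25, 31],
]
row_totals = [sum(row) for row in responses]
col_totals = [sum(col) for col in zip(*responses)]
n = sum(row_totals)  # 283 students
expected = [[r * c / n for c in col_totals] for r in row_totals]
if min(min(row) for row in expected) < 5:
    raise ValueError("every expected count should be at least 5")
stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(responses, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (3 - 1)(3 - 1) = 4
# Closed-form upper-tail area, valid only because df = 4
p_value = math.exp(-stat / 2) * (1 + stat / 2)
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.3f}")
alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")  # fail to reject H0
```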