Week 10
Chapter 10 – Hypothesis Testing III: The Analysis of Variance (ANOVA) &
Chapter 11 – Hypothesis Testing IV: Chi Square
In This Presentation
The basic logic of analysis of variance (ANOVA)
A sample problem applying ANOVA
The Five Step Model
Limitations of ANOVA
Post hoc techniques
Chapter 8
Population = Penn State University
Group 1 = All Education Majors
RS of 100 Education Majors
Chapter 9
Population = Pennsylvania
Group 1 = All Males in Population
Group 2 = All Females in Population
RS of 100 Males
RS of 100 Females
In this Chapter
Population = Pennsylvania
Group 1 = All Protestants in Population
Group 2 = All Catholics in Population
Group 3 = All Jews in Population
RS of 100 Protestants
RS of 100 Catholics
RS of 100 Jews
Basic Logic
ANOVA can be used in situations where the researcher is interested in the differences in sample means across three or more categories
Examples:
How do Protestants, Catholics, and Jews vary in terms of number of children?
How do Republicans, Democrats, and Independents vary in terms of income?
How do older, middle-aged, and younger people vary in terms of frequency of church attendance?
Basic Logic
ANOVA is used when:
The independent variable has more than two categories
The dependent variable is measured at the interval or ratio level
Basic Logic
ANOVA can be thought of as an extension of the t test to more than two groups
The t test can be used only when the independent variable has exactly two categories
ANOVA asks “are the differences between the samples large enough to reject the null hypothesis and justify the conclusion that the populations represented by the samples are different?” (p. 243)
The Ho is that the population means are the same:
Ho: μ1= μ2= μ3 = … = μk
Basic Logic
If the Ho is true, the sample means should be about the same value
If the Ho is true, there will be little difference between sample means
If the Ho is false, there should be substantial differences between categories, combined with relatively little difference within categories
The sample standard deviations should be low in value
If the Ho is false, there will be big differences between sample means combined with small values for sample standard deviations
Basic Logic
The larger the differences between the sample means, the more likely the Ho is false, especially when there is little difference within categories
When we reject the Ho, we are saying there are differences between the populations represented by the samples
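The contrast between differences between categories and differences within categories can be made concrete with a small calculation. The following is a minimal sketch in Python, assuming three made-up groups of scores (the data and the function name one_way_anova are hypothetical, not from the textbook). It uses the standard one-way ANOVA sums of squares.

```python
# Hypothetical scores for three categories (illustration only, not from the slides).
groups = {
    "A": [2, 3, 4],
    "B": [5, 6, 7],
    "C": [8, 9, 10],
}


def one_way_anova(groups):
    """Return (SSB, SSW, dfb, dfw, F) for a dict of category -> list of scores."""
    all_scores = [x for scores in groups.values() for x in scores]
    n_total = len(all_scores)
    k = len(groups)
    grand_mean = sum(all_scores) / n_total

    # Between-category variation: how far each category mean is from the grand mean.
    ssb = sum(
        len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    # Within-category variation: how far each score is from its own category mean.
    ssw = sum(
        (x - sum(scores) / len(scores)) ** 2
        for scores in groups.values()
        for x in scores
    )

    dfb = k - 1
    dfw = n_total - k
    f_ratio = (ssb / dfb) / (ssw / dfw)
    return ssb, ssw, dfb, dfw, f_ratio


ssb, ssw, dfb, dfw, f_ratio = one_way_anova(groups)
print(ssb, ssw, dfb, dfw, f_ratio)
```

In this made-up data the category means (3, 6, 9) are far apart while the scores inside each category are close together, so the F ratio is large, which is the pattern expected when the Ho is false.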
Example 1
We have administered the support for capital punishment scale to a sample of 20 people who are equally divided across five religious categories
Example 1
Hypothesis Test of ANOVA
Step 1: Make assumptions and meet test requirements
Independent random samples
Interval-ratio level of measurement
Normally distributed populations
Equal population variances
Example 1
Step 2: State the null hypothesis
Ho: μ1 = μ2 = μ3 = μ4 = μ5
H1: at least one of the population means is different
Step 3: Select the sampling distribution and establish the critical region
Sampling distribution = F distribution
Alpha = 0.05
dfw = 15, dfb = 4
F(critical) = 3.06
Example 1
Step 4: Compute test statistic
F = 2.57
Step 5: Make a decision and interpret the results
F(critical) = 3.06
F(obtained) = 2.57
The test statistic does not fall in the critical region, so fail to reject the null hypothesis
We cannot conclude that support for capital punishment differs across the populations of religious affiliations
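The degrees of freedom and the decision in Example 1 can be checked with a few lines of Python. This is a sketch only: it assumes the usual formulas dfb = k - 1 and dfw = N - k, and it takes F(critical) and F(obtained) directly from the slides rather than computing them.

```python
n_cases = 20        # N: total sample size in Example 1
k_categories = 5    # k: number of religious categories

dfb = k_categories - 1          # degrees of freedom between categories
dfw = n_cases - k_categories    # degrees of freedom within categories

f_critical = 3.06   # from the slide: alpha = 0.05 with dfb = 4, dfw = 15
f_obtained = 2.57   # from the slide

# The critical region is the upper tail: reject Ho only if F(obtained) exceeds F(critical).
reject_null = f_obtained > f_critical
print(dfb, dfw, "reject Ho" if reject_null else "fail to reject Ho")
```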
Limitations of ANOVA
1. Requires interval-ratio level measurement of the dependent variable and roughly equal numbers of cases in the categories of the independent variable
2. Statistically significant differences are not necessarily important
3. The alternative (research) hypothesis is not specific – it only asserts that at least one of the population means differs from the others
Use post hoc techniques for more specific differences
ANOVA output
ANOVA
Total output
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   309.600          2    154.800       14.221   .000
Within Groups    293.900          27   10.885
Total            603.500          29
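The Mean Square and F columns of the ANOVA output follow from the Sum of Squares and df columns. A short sketch, assuming the standard definitions (mean square = sum of squares / df, F = mean square between / mean square within) and using the values printed in the output (the variable names are illustrative):

```python
# Values copied from the ANOVA output table.
ss_between, df_between = 309.600, 2
ss_within, df_within = 293.900, 27

ms_between = ss_between / df_between   # mean square between groups
ms_within = ss_within / df_within      # mean square within groups
f_ratio = ms_between / ms_within       # F(obtained)

# The Total row is the sum of the Between Groups and Within Groups rows.
ss_total = ss_between + ss_within
df_total = df_between + df_within

print(round(ms_between, 3), round(ms_within, 3), round(f_ratio, 3))
print(round(ss_total, 3), df_total)
```

The rounded results agree with the Mean Square, F, and Total entries in the output.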
In This Presentation
Bivariate (crosstabulation) tables
The basic logic of Chi Square
The terminology used with bivariate tables
The computation of Chi Square with an example problem
The five step model
Limitations of Chi Square
The Bivariate Table
Bivariate tables: display the scores of cases on two different variables at the same time
The Bivariate Table
Note the two dimensions: rows and columns
What is the independent variable?
What is the dependent variable?
Where are the row and column marginals?
Where is the total number of cases (N)?
Chi Square
Chi Square can be used:
with variables that are measured at any level (nominal, ordinal, interval or ratio)
with variables that have many categories or scores
when we don’t know the shape of the population or sampling distribution
Basic Logic
Independence: “Two variables are independent if the classification of a case into a particular category of one variable has no effect on the probability that the case will fall into any particular category of the second variable” (p. 274)
Basic Logic
Chi Square is a test of significance based on bivariate, crosstabulation tables (also called crosstabs)
We are looking for significant differences between the actual cell frequencies OBSERVED in a table (fo) and those that would be EXPECTED by random chance, that is, if the variables were independent (fe)
Example
RQ: Is the probability of securing employment in the field of social work dependent on the accreditation status of the program?
NULL HYP: The probability of securing employment in the field of social work is NOT dependent on the accreditation status of the program. (The variables are independent)
HYP: The probability of securing employment in the field of social work is dependent on the accreditation status of the program. (The variables are dependent)
Expected frequency (fe) for the top-left cell:
fe = (row marginal × column marginal) / N = (55 × 40) / 100 = 22
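The same rule, fe = (row marginal × column marginal) / N, gives the expected frequency for every cell. The sketch below assumes a 2 x 2 table with N = 100 and marginals 55 and 40 as in the example; the complementary marginals 45 and 60 are inferred from N, and which pair belongs to the rows and which to the columns is an assumption made only for illustration.

```python
def expected_frequencies(row_marginals, col_marginals):
    """Expected cell frequencies under independence: row total * column total / N."""
    n = sum(row_marginals)
    if n != sum(col_marginals):
        raise ValueError("row and column marginals must sum to the same N")
    return [[r * c / n for c in col_marginals] for r in row_marginals]


# Assumed orientation: 55/45 as one set of marginals, 40/60 as the other.
fe = expected_frequencies([55, 45], [40, 60])
for row in fe:
    print(row)
```

The first value is the 22 computed for the top-left cell, and the four expected frequencies add up to N.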
Computation of Chi Square
Step 1: Make Assumptions and Meet Test Requirements
Independent random samples
Level of Measurement is nominal
Note the minimal assumptions. In particular, note that no assumption is made about the shape of the sampling distribution. The chi square test is nonparametric, or distribution-free
Step 2: State the Null Hypothesis
Ho: The variables are independent
Another way to state the Ho, more consistently with previous tests:
Ho: fo = fe
H1: The variables are dependent
Another way to state the H1:
H1: fo ≠ fe
Step 3: Select the Sampling Distribution and Establish the Critical Region
Sampling Distribution = Chi Square, χ2
Alpha = 0.05
df = (r-1)(c-1) = (2-1)(2-1) = 1
χ2 (critical) = 3.841
Step 5: Make a Decision and Interpret the Results of the Test
χ2 (critical) = 3.841
χ2 (obtained) = 10.78
The test statistic falls in the critical region, so reject Ho
There is a significant relationship between employment status and accreditation status in the population from which the sample was drawn
Interpreting Chi Square
The chi square test tells us only if the variables are independent or not
It does not tell us the pattern or nature of the relationship
To investigate the pattern, compute percentages within each column and compare across the columns
Computation of Chi Square
Are the homicide rate and volume of gun sales related for a sample of 25 cities? (Problem 11.4, p. 295)
The bivariate table shows the relationship between homicide rate (columns) and gun sales (rows)
This 2 x 2 table has 4 cells
Step 1: Make Assumptions and Meet Test Requirements
Independent random samples
Level of Measurement is nominal
Note the minimal assumptions. In particular, note that no assumption is made about the shape of the sampling distribution. The chi square test is nonparametric, or distribution-free
Step 2: State the Null Hypothesis
Ho: The variables are independent
Another way to state the Ho, more consistently with previous tests:
Ho: fo = fe
H1: The variables are dependent
Another way to state the H1:
H1: fo ≠ fe
Step 3: Select the Sampling Distribution and Establish the Critical Region
Sampling Distribution = χ2
Alpha = 0.05
df = (r-1)(c-1) = (2-1)(2-1) = 1
χ2 (critical) = 3.841
Step 5: Make a Decision and Interpret the Results of the Test
χ2 (critical) = 3.841
χ2 (obtained) = 2.00
The test statistic is not in the critical region, so fail to reject the Ho
We cannot conclude that there is a relationship between homicide rate and gun sales in the population from which the sample was drawn
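The obtained chi square for this problem can be reproduced from the cell counts of the homicide rate by gun sales table (8, 5, 4, 8). The sketch below assumes the standard formula χ2 = Σ (fo - fe)² / fe with fe = (row marginal × column marginal) / N; the function name chi_square is illustrative.

```python
def chi_square(observed):
    """Return (chi square, df) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # expected frequency
            chi2 += (fo - fe) ** 2 / fe

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: gun sales High, Low. Columns: homicide rate Low, High.
observed = [[8, 5], [4, 8]]
chi2, df = chi_square(observed)
print(round(chi2, 2), df)
```

Computed without intermediate rounding the statistic is about 1.99, which matches the 2.00 reported on the slide up to rounding and is below the critical value of 3.841.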
Interpreting Chi Square
Cities low on homicide rate were high in gun sales, and cities high in homicide rate were low in gun sales
As homicide rates increase, gun sales decrease
We found this relationship not to be significant, but it does have a clear pattern
                  Homicide Rate
Gun Sales    Low          High         Total
High         8 (66.7%)    5 (38.5%)    13
Low          4 (33.3%)    8 (61.5%)    12
Total        12 (100%)    13 (100%)    25
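The percentages in the table are column percentages: each cell count divided by its column total. A minimal sketch of that calculation, using the counts from the table (variable names are illustrative):

```python
# Rows: gun sales High, Low. Columns: homicide rate Low, High.
observed = [[8, 5], [4, 8]]

col_totals = [sum(col) for col in zip(*observed)]

# Percentage of each column's cases that fall in each row.
col_pct = [
    [round(100 * fo / col_totals[j], 1) for j, fo in enumerate(row)]
    for row in observed
]

for label, row in zip(["High", "Low"], col_pct):
    print(label, row)
```

Comparing across the columns within a row (66.7% versus 38.5% for high gun sales) shows the pattern of the relationship.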
Limitations of Chi Square
1. Difficult to interpret when variables have many categories
Best when variables have four or fewer categories
2. With small sample size, cannot assume that Chi Square sampling distribution will be accurate
Small sample: High percentage of cells have expected frequencies of 5 or less
3. Like all tests of hypotheses, Chi Square is sensitive to sample size
As N increases, obtained Chi Square increases
With large samples, trivial relationships may be significant
It is important to remember that statistical significance is not the same as substantive significance
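The sensitivity to sample size can be illustrated with a sketch. It takes the 25-city homicide rate by gun sales counts (8, 5, 4, 8) and a hypothetical table with every count multiplied by 10, so the percentages are identical but N is 250 instead of 25. The enlarged table is invented purely for illustration.

```python
def chi_square(observed):
    """Chi square for a table of counts: sum of (fo - fe)^2 / fe over all cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (fo - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, fo in enumerate(row)
    )


small = [[8, 5], [4, 8]]                           # N = 25, counts from the gun sales example
large = [[10 * fo for fo in row] for row in small]  # hypothetical: same pattern, N = 250

chi2_small = chi_square(small)
chi2_large = chi_square(large)
print(round(chi2_small, 2), round(chi2_large, 2))
```

With the same percentages in every cell, multiplying N by 10 multiplies the obtained chi square by 10, which moves it from below the critical value of 3.841 to well above it.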
Chi Square in SPSS
Step 5: making a decision and interpreting the results of the test
overweight_1 * urban Crosstabulation
                                   urban
                                   0        1        Total
overweight_1   0   Count           329      468      797
                   Expected Count  385.7    411.3    797.0
               1   Count           155      48       203
                   Expected Count  98.3     104.7    203.0
Total              Count           484      516      1000
                   Expected Count  484.0    516.0    1000.0
Chi-Square Tests
                               Value        df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             79.699 (b)   1    .000
Continuity Correction (a)      78.301       1    .000
Likelihood Ratio               82.696       1    .000
Fisher's Exact Test                                                      .000                   .000
Linear-by-Linear Association   79.619       1    .000
N of Valid Cases               1000
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 98.25.
Result (χ2 obtained) = Pearson Chi-Square value, 79.699
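The Expected Count rows and the Pearson Chi-Square value in the SPSS output can be reproduced by hand from the observed counts. The following is a sketch in plain Python (not SPSS syntax), assuming the standard formulas fe = (row marginal × column marginal) / N and χ2 = Σ (fo - fe)² / fe.

```python
# Observed counts from the crosstabulation.
# Rows: overweight_1 = 0, 1. Columns: urban = 0, 1.
observed = [[329, 468], [155, 48]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi square: sum over cells of (observed - expected)^2 / expected.
chi2 = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

# Footnote b reports how many cells have an expected count below 5 and the minimum.
min_expected = min(fe for row in expected for fe in row)
cells_below_5 = sum(1 for row in expected for fe in row if fe < 5)

print([[round(fe, 1) for fe in row] for row in expected])
print(round(chi2, 3), round(min_expected, 2), cells_below_5)
```

The rounded expected counts, the chi square value, and the minimum expected count agree with the figures in the output.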
Chi Square in SPSS
Symmetric Measures
                                               Value   Approx. Sig.
Nominal by Nominal   Contingency Coefficient   .272    .000
N of Valid Cases                               1000
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
The nominal symmetric measures indicate both the strength and significance of the relationship between the row and column variables of a crosstabulation.
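The contingency coefficient in this output can be recovered from the Pearson chi square and N. The sketch below assumes the standard definition C = sqrt(χ2 / (χ2 + N)) and uses the values reported by SPSS for the overweight_1 by urban table.

```python
import math

chi2_obtained = 79.699   # Pearson Chi-Square from the SPSS output
n_cases = 1000           # N of Valid Cases

# Contingency coefficient: a chi-square-based measure of strength of association.
contingency_coefficient = math.sqrt(chi2_obtained / (chi2_obtained + n_cases))
print(round(contingency_coefficient, 3))
```

The result rounds to the .272 shown in the Symmetric Measures table. A highly significant chi square can therefore go with a fairly modest strength of association.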