Week 10

Chapter 10 - Hypothesis Testing III: The Analysis of Variance (ANOVA)

Chapter 11 - Hypothesis Testing IV: Chi Square

Chapter 10

Hypothesis Testing III: The Analysis of Variance (ANOVA)

In This Presentation

The basic logic of analysis of variance (ANOVA)

A sample problem applying ANOVA

The Five Step Model

Limitations of ANOVA

Post hoc techniques

Chapter 8

Population = Penn State University

Group 1 = All Education Majors

Random sample (RS) of 100 Education Majors

Chapter 9

Population = Pennsylvania

Group 1 = All Males in Population

Group 2 = All Females in Population

RS of 100 Males

RS of 100 Females

In this Chapter

Population = Pennsylvania

Group 1 = All Protestants in Population

Group 2 = All Catholics in Population

Group 3 = All Jews in Population

RS of 100 Protestants

RS of 100 Catholics

RS of 100 Jews

Basic Logic

ANOVA can be used in situations where the researcher is interested in the differences in sample means across three or more categories

Examples:

How do Protestants, Catholics, and Jews vary in terms of number of children?

How do Republicans, Democrats, and Independents vary in terms of income?

How do older, middle-aged, and younger people vary in terms of frequency of church attendance?

Basic Logic

ANOVA is used when:

The independent variable has more than two categories

The dependent variable is measured at the interval or ratio level

Basic Logic

ANOVA can be thought of as an extension of the t test to more than two groups

The t test can be used only when the independent variable has exactly two categories

ANOVA asks “are the differences between the samples large enough to reject the null hypothesis and justify the conclusion that the populations represented by the samples are different?” (p. 243)

The Ho is that the population means are the same:

Ho: μ1 = μ2 = μ3 = … = μk

Basic Logic

If the Ho is true, the sample means should be about the same value

That is, there will be little difference between sample means

If the Ho is false, there should be substantial differences between categories, combined with relatively little difference within categories

The sample standard deviations should be low in value

That is, there will be big differences between sample means combined with small values for the sample standard deviations

Basic Logic

The larger the differences between the sample means, the more likely the Ho is false, especially when there is little difference within categories

When we reject the Ho, we are saying there are differences between the populations represented by the samples
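The between/within logic above can be sketched as a small F-ratio calculation. This is a minimal illustration, not the textbook's worked procedure: the function name `f_ratio` and both sets of scores are hypothetical, chosen so that the two data sets have identical sample means but different spread within categories.

```python
from statistics import mean

def f_ratio(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of samples."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)
    n_total = len(all_scores)
    # Between-group sum of squares: distance of each sample mean from the grand mean
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own sample mean
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    dfb = k - 1
    dfw = n_total - k
    return (ssb / dfb) / (ssw / dfw), dfb, dfw

# Hypothetical scores (not from the slides): same sample means (2, 5, 8) in both sets
tight = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]      # little difference within categories
spread = [[0, 2, 4], [3, 5, 7], [6, 8, 10]]    # more difference within categories

f_tight, dfb, dfw = f_ratio(tight)
f_spread, _, _ = f_ratio(spread)
print(f_tight, f_spread)
```

With the same differences between sample means, the set with less variation within categories gives the larger F, which is the situation in which the Ho is most likely to be rejected.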

Example 1

We have administered the support for capital punishment scale to a sample of 20 people who are equally divided across five religious categories

Example 1

Hypothesis Test of ANOVA

Step 1: Make assumptions and meet test requirements

Independent random samples

Interval-ratio level of measurement

Normally distributed populations

Equal population variances

Example 1

Step 2: State the null hypothesis

Ho: μ1 = μ2 = μ3 = μ4 = μ5

H1: At least one of the population means is different

Step 3: Select the sampling distribution and establish the critical region

Sampling distribution = F distribution

Alpha = 0.05

dfw = N - k = 20 - 5 = 15, dfb = k - 1 = 5 - 1 = 4

F(critical) = 3.06

Example 1

Step 4: Compute test statistic

F(obtained) = 2.57

Step 5: Make a decision and interpret the results

F(critical) = 3.06

F(obtained) = 2.57

The test statistic does not fall in the critical region, so fail to reject the null hypothesis

There is no statistically significant evidence that support for capital punishment differs across the populations of religious affiliations
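Steps 3 and 5 of this example can be sketched in a few lines. The function names are hypothetical, and F(critical) = 3.06 is copied from the slide rather than computed; only the degrees of freedom and the comparison are worked out here.

```python
def anova_degrees_of_freedom(n_total, k):
    """Return (df_between, df_within) for N cases spread over k categories."""
    return k - 1, n_total - k

def decide(f_obtained, f_critical):
    """Reject Ho only when the obtained F falls in the critical region."""
    return "reject Ho" if f_obtained > f_critical else "fail to reject Ho"

# Example 1: 20 people equally divided across five religious categories
dfb, dfw = anova_degrees_of_freedom(n_total=20, k=5)

# F(critical) is the tabled value quoted on the slide for alpha = 0.05
decision = decide(f_obtained=2.57, f_critical=3.06)
print(dfb, dfw, decision)
```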

Limitations of ANOVA

1. Requires interval-ratio level measurement of the dependent variable and roughly equal numbers of cases in the categories of the independent variable

2. Statistically significant differences are not necessarily important

3. The alternative (research) hypothesis is not specific – it only asserts that at least one of the population means differs from the others

Use post hoc techniques for more specific differences

USING SPSS

On the top menu, click on “Analyze”

Select “Compare Means”

Select “One-Way ANOVA”

ANOVA in SPSS

Analyze / Compare Means / One-Way ANOVA

ANOVA dialog box

ANOVA output

ANOVA

Total output

                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   309.600          2    154.800       14.221   .000
Within Groups    293.900          27   10.885
Total            603.500          29
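The columns of the SPSS ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The sketch below rechecks the printed values; the variable names are my own, and the numbers are the ones in the output above.

```python
# Values copied from the SPSS ANOVA output
ss_between, df_between = 309.600, 2
ss_within, df_within = 293.900, 27

# Mean square = sum of squares / degrees of freedom
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F = mean square between / mean square within
f_obtained = ms_between / ms_within

# The Total row is the sum of the other two rows
ss_total = ss_between + ss_within
df_total = df_between + df_within

print(round(ms_between, 3), round(ms_within, 3), round(f_obtained, 3))
# prints: 154.8 10.885 14.221
```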

Chapter 11

Hypothesis Testing IV: Chi Square

In This Presentation

Bivariate (crosstabulation) tables

The basic logic of Chi Square

The terminology used with bivariate tables

The computation of Chi Square with an example problem

The five step model

Limitations of Chi Square

The Bivariate Table

Bivariate tables: display the scores of cases on two different variables at the same time

The Bivariate Table

Note the two dimensions: rows and columns.

What is the independent variable?

What is the dependent variable?

Where are the row and column marginals?

Where is the total number of cases (N)?

Chi Square

Chi Square can be used:

with variables that are measured at any level (nominal, ordinal, interval or ratio)

with variables that have many categories or scores

when we don’t know the shape of the population or sampling distribution

Basic Logic

Independence: “Two variables are independent if the classification of a case into a particular category of one variable has no effect on the probability that the case will fall into any particular category of the second variable” (p. 274)

Chi Square as a test of statistical significance is a test for independence

Basic Logic

Chi Square is a test of significance based on bivariate, crosstabulation tables (also called crosstabs)

We are looking for significant differences between the actual cell frequencies OBSERVED in a table (fo) and those that would be EXPECTED by random chance, that is, if the variables were independent (fe)

Computation of Chi Square

Example

RQ: Is the probability of securing employment in the field of social work dependent on the accreditation status of the program?

NULL HYP: The probability of securing employment in the field of social work is NOT dependent on the accreditation status of the program. (The variables are independent)

HYP: The probability of securing employment in the field of social work is dependent on the accreditation status of the program. (The variables are dependent)


Example

Expected frequency (fe) for the top-left cell:

fe = (row marginal × column marginal) / N

With marginals of 55 and 40 and N = 100: fe = (55 × 40) / 100 = 22
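The expected frequency is the product of a cell's row marginal and column marginal divided by N. A minimal sketch, assuming the slide's marginals for the top-left cell are 55 and 40 with N = 100; which of the two is the row marginal is not recoverable from the slide, and the product is the same either way. The function name is hypothetical.

```python
def expected_frequency(row_marginal, column_marginal, n):
    """Expected cell frequency if the two variables were independent."""
    return row_marginal * column_marginal / n

# Marginals and N as given on the slide for the top-left cell
fe_top_left = expected_frequency(55, 40, 100)
print(fe_top_left)  # prints: 22.0
```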



Example

Step 1: Make Assumptions and Meet Test Requirements

Independent random samples

Level of Measurement is nominal

Note the minimal assumptions. In particular, note that no assumption is made about the shape of the sampling distribution. The chi square test is nonparametric, or distribution-free

Step 2: State the Null Hypothesis

Ho: The variables are independent

Another way to state the Ho, more consistently with previous tests:

Ho: fo = fe

H1: The variables are dependent

Another way to state the H1:

H1: fo ≠ fe

Step 3: Select the Sampling Distribution and Establish the Critical Region

Sampling Distribution = Chi Square, χ2

Alpha = 0.05

df = (r-1)(c-1) = (2-1)(2-1) = 1

χ2 (critical) = 3.841

Step 4: Calculate the Test Statistic

χ2 (obtained) = 10.78

Step 5: Make a Decision and Interpret the Results of the Test

χ2 (critical) = 3.841

χ2 (obtained) = 10.78

The test statistic falls in the critical region, so reject Ho

There is a significant relationship between employment status and accreditation status in the population from which the sample was drawn

Interpreting Chi Square

The chi square test tells us only if the variables are independent or not

It does not tell us the pattern or nature of the relationship

To investigate the pattern, compute percentages within each column and compare across the columns


Are the homicide rate and volume of gun sales related for a sample of 25 cities? (Problem 11.4, p. 295)

The bivariate table shows the relationship between homicide rate (columns) and gun sales (rows)

This 2 x 2 table has 4 cells

Step 1: Make Assumptions and Meet Test Requirements

Independent random samples

Level of Measurement is nominal

Note the minimal assumptions. In particular, note that no assumption is made about the shape of the sampling distribution. The chi square test is nonparametric, or distribution-free

Step 2: State the Null Hypothesis

Ho: The variables are independent

Another way to state the Ho, more consistently with previous tests:

Ho: fo = fe

H1: The variables are dependent

Another way to state the H1:

H1: fo ≠ fe

Step 3: Select the Sampling Distribution and Establish the Critical Region

Sampling Distribution = χ2

Alpha = 0.05

df = (r-1)(c-1) = (2-1)(2-1) = 1

χ2 (critical) = 3.841

Step 4: Calculate the Test Statistic

χ2 (obtained) = 2.00

Step 5: Make a Decision and Interpret the Results of the Test

χ2 (critical) = 3.841

χ2 (obtained) = 2.00

The test statistic is not in the critical region, so fail to reject the Ho

The sample provides no statistically significant evidence of a relationship between homicide rate and gun sales in the population from which the sample was drawn
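Steps 3 to 5 for this problem can be reproduced with a short sketch. The cell counts (8, 5, 4, 8) are taken from the bivariate table for Problem 11.4 in this deck; the function name `chi_square` is hypothetical, and the critical value 3.841 is copied from the slide rather than computed. The statistic is the usual sum over cells of (fo - fe)² / fe.

```python
def chi_square(observed):
    """Return (chi2, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # fe = (row marginal x column marginal) / N for every cell
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # chi2 = sum over all cells of (fo - fe)^2 / fe
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Rows: gun sales High, Low; columns: homicide rate Low, High
gun_sales_by_homicide = [[8, 5], [4, 8]]
chi2, df, expected = chi_square(gun_sales_by_homicide)

CHI2_CRITICAL = 3.841  # from the slide: alpha = 0.05, df = 1
decision = "reject Ho" if chi2 > CHI2_CRITICAL else "fail to reject Ho"
print(round(chi2, 2), df, decision)
```

Computed without intermediate rounding, the statistic comes out at about 1.99 rather than the slide's 2.00; the small gap is presumably due to rounding in the hand calculation and does not change the decision.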

Interpreting Chi Square

Cities low on homicide rate were high in gun sales, and cities high in homicide rate were low in gun sales

As homicide rates increase, gun sales decrease

We found this relationship not to be significant, but it does have a clear pattern

                    Homicide Rate
Gun Sales    Low          High         Total
High         8 (66.7%)    5 (38.5%)    13
Low          4 (33.3%)    8 (61.5%)    12
Total        12 (100%)    13 (100%)    25
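The column percentages in this table follow the rule given under Interpreting Chi Square: divide each cell frequency by its column total, then compare across columns. A minimal sketch using the cell counts above; the function name is hypothetical.

```python
def column_percentages(observed):
    """Percentage of each column's cases that fall in each row, rounded to one decimal."""
    col_totals = [sum(col) for col in zip(*observed)]
    return [
        [round(100 * fo / total, 1) for fo, total in zip(row, col_totals)]
        for row in observed
    ]

# Rows: gun sales High, Low; columns: homicide rate Low, High
table = [[8, 5], [4, 8]]
percentages = column_percentages(table)
print(percentages)  # prints: [[66.7, 38.5], [33.3, 61.5]]
```

Reading across the first row, 66.7% of low-homicide cities were high in gun sales compared with 38.5% of high-homicide cities, which is the pattern described on the slide.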

Limitations of Chi Square

1. Difficult to interpret when variables have many categories

Best when variables have four or fewer categories

2. With small sample size, cannot assume that Chi Square sampling distribution will be accurate

Small sample: High percentage of cells have expected frequencies of 5 or less

3. Like all tests of hypotheses, Chi Square is sensitive to sample size

As N increases (with the same pattern of cell percentages), obtained Chi Square increases

With large samples, trivial relationships may be significant

It is important to remember that statistical significance is not the same as substantive significance
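The sensitivity to sample size can be seen by keeping the cell percentages fixed and enlarging N. The sketch below reuses the 25-city gun sales table from this deck and a hypothetical version with every cell multiplied by 10 (N = 250); the enlarged table is invented purely for illustration, and the function name is my own.

```python
def chi_square(observed):
    """Obtained chi square: sum over cells of (fo - fe)^2 / fe."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    n = sum(rows)
    return sum(
        (fo - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, rows)
        for fo, c in zip(row, cols)
    )

original = [[8, 5], [4, 8]]                             # N = 25, from the deck
scaled = [[10 * fo for fo in row] for row in original]  # hypothetical N = 250

chi2_original = chi_square(original)
chi2_scaled = chi_square(scaled)
print(round(chi2_original, 2), round(chi2_scaled, 2))
```

The percentages in the two tables are identical, yet the obtained Chi Square grows tenfold and crosses the critical value of 3.841 only in the larger sample. The strength of the relationship has not changed; only N has.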

Chi Square in SPSS

Step 4: computing the test statistic in SPSS

Chi Square in SPSS

Step 5: making a decision and interpreting the results of the test

overweight_1 * urban Crosstabulation

                                     urban
                                     0        1        Total
overweight_1   0   Count             329      468      797
                   Expected Count    385.7    411.3    797.0
               1   Count             155      48       203
                   Expected Count    98.3     104.7    203.0
Total              Count             484      516      1000
                   Expected Count    484.0    516.0    1000.0

Chi-Square Tests

                                Value        df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square              79.699 (b)   1    .000
Continuity Correction (a)       78.301       1    .000
Likelihood Ratio                82.696       1    .000
Fisher's Exact Test                                                       .000                   .000
Linear-by-Linear Association    79.619       1    .000
N of Valid Cases                1000

a. Computed only for a 2x2 table

b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 98.25.

Result (χ2 obtained): 79.699 (Pearson Chi-Square)
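The Pearson Chi-Square value and the expected counts in the SPSS output can be rechecked from the observed counts alone. This is a sketch with my own variable names; the four counts are the ones in the crosstabulation above.

```python
# Observed counts; rows: overweight_1 = 0, 1; columns: urban = 0, 1
observed = [[329, 468], [155, 48]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected Count = (row total x column total) / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi square = sum over cells of (fo - fe)^2 / fe
chi2 = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

# Footnote b reports the smallest expected count
min_expected = min(min(row) for row in expected)

print([[round(fe, 1) for fe in row] for row in expected])
print(round(chi2, 3), round(min_expected, 2))
```

The results agree with the output: expected counts of 385.7, 411.3, 98.3 and 104.7, a Pearson Chi-Square of 79.699, and a minimum expected count of 98.25.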

Chi Square in SPSS

Symmetric Measures

                                                 Value   Approx. Sig.
Nominal by Nominal   Contingency Coefficient     .272    .000
N of Valid Cases                                 1000

a. Not assuming the null hypothesis.

b. Using the asymptotic standard error assuming the null hypothesis.

The nominal symmetric measures indicate both the strength and significance of the relationship between the row and column variables of a crosstabulation.
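The contingency coefficient reported here can be recovered from the obtained Chi Square and N, assuming SPSS is reporting Pearson's contingency coefficient, C = sqrt(χ2 / (χ2 + N)). That assumption reproduces the printed .272; the function name is hypothetical.

```python
import math

def contingency_coefficient(chi2, n):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + N))."""
    return math.sqrt(chi2 / (chi2 + n))

# Pearson Chi-Square and N of Valid Cases from the SPSS output
c = contingency_coefficient(79.699, 1000)
print(round(c, 3))  # prints: 0.272
```

A value of .272 alongside a significance of .000 shows why strength and significance are reported separately: with N = 1000 the relationship is clearly significant, while the coefficient describes how strong it is.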