URBP 204A QUANTITATIVE METHODS I Statistical Analysis Lecture IV


Gregory Newmark
San Jose State University

(This lecture is based on Chapters 5, 12, 13, & 15 of Neil Salkind’s Statistics for People Who (Think They) Hate Statistics, 2nd Edition, which is also the source of many of the offered examples. All cartoons are from CAUSEweb.org by J.B. Landers.)

More Statistical Tests
• Factorial Analysis of Variance (ANOVA)
  – Tests between means of more than two groups for two or more factors (independent variables)
• Correlation Coefficient
  – Tests the association between two variables
• One Sample Chi-Square (χ²)
  – Tests if an observed distribution of frequencies for one factor is what one would expect by chance
• Two Factor Chi-Square (χ²)
  – Tests if an observed distribution of frequencies for two factors is what one would expect by chance

Factorial ANOVA
• Compares observations of a single variable among two or more groups which incorporate two or more factors.
• Examples:
  – Reading Skills
    • School (Elementary, Middle, High)
    • Academic Philosophy (Montessori, Waldorf)
  – Environmental Knowledge
    • Commute Mode (Car, Bus, Walking)
    • Age (Under 40, 40+)
  – Wealth
    • Favorite Team (A’s, Giants, Dodgers, Angels)
    • Home Location (Oakland, SF, LA)
  – Weight Loss
    • Gender (Male, Female)
    • Exercise (Biking, Running)

Factorial ANOVA
• Two Types of Effects
  – Main Effects: differences within one factor
  – Interaction Effects: differences across factors
• Example:
  – Weight Loss
    • Gender (Male, Female)
    • Exercise (Biking, Running)
  – Main Effects:
    • Does weight loss vary by exercise?
    • Does weight loss vary by gender?
  – Interaction Effects:
    • Does weight loss due to exercise vary by gender?

Factorial ANOVA
• Example:
  – “How is weight loss affected by exercise program and gender?”
• Steps:
  – State hypotheses
    • Null:
      H0 : µMale = µFemale
      H0 : µBiking = µRunning
      H0 : µMale-Biking = µFemale-Biking = µMale-Running = µFemale-Running
    • Research: What would these three be?

Factorial ANOVA
• Steps (Continued):
  – Set significance level
    • Level of risk of Type I Error = 5%
    • Level of Significance (p) = 0.05
  – Select statistical test
    • Factorial ANOVA
  – Computation of obtained test statistic value
    • Insert obtained data into appropriate formula
    • (SPSS can expedite this step for us)

Factorial ANOVA
• Weight Loss Data

Male-Biking   Male-Running   Female-Biking   Female-Running
76            88             65              65
78            76             90              67
76            76             65              67
76            76             90              87
76            56             65              78
74            76             90              56
74            76             90              54
76            98             79              56
76            88             70              54
55            78             90              56
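The sums of squares and F ratios for this balanced 2×2 design can be checked by hand. The following is a minimal sketch in plain Python, assuming the table above is read column by column with 10 observations per cell; the variable names are illustrative and this is not how SPSS computes its output internally.

```python
# Weight-loss scores from the slide, 10 observations per cell.
data = {
    ("Male", "Biking"):    [76, 78, 76, 76, 76, 74, 74, 76, 76, 55],
    ("Male", "Running"):   [88, 76, 76, 76, 56, 76, 76, 98, 88, 78],
    ("Female", "Biking"):  [65, 90, 65, 90, 65, 90, 90, 79, 70, 90],
    ("Female", "Running"): [65, 67, 67, 87, 78, 56, 54, 56, 54, 56],
}
genders = ["Male", "Female"]
exercises = ["Biking", "Running"]
n_cell = 10  # observations per cell (balanced design)


def mean(values):
    return sum(values) / len(values)


scores = [x for cell in data.values() for x in cell]
grand_mean = mean(scores)
cell_mean = {key: mean(cell) for key, cell in data.items()}
gender_mean = {g: mean([x for e in exercises for x in data[(g, e)]]) for g in genders}
exercise_mean = {e: mean([x for g in genders for x in data[(g, e)]]) for e in exercises}

# Sums of squares for the two main effects, the interaction, and the error term.
ss_gender = sum(n_cell * len(exercises) * (gender_mean[g] - grand_mean) ** 2 for g in genders)
ss_exercise = sum(n_cell * len(genders) * (exercise_mean[e] - grand_mean) ** 2 for e in exercises)
ss_interaction = sum(
    n_cell * (cell_mean[(g, e)] - gender_mean[g] - exercise_mean[e] + grand_mean) ** 2
    for g in genders
    for e in exercises
)
ss_error = sum((x - cell_mean[key]) ** 2 for key, cell in data.items() for x in cell)

# Degrees of freedom: levels - 1 per factor, their product for the interaction,
# and observations - groups for the error term.
df_gender = len(genders) - 1
df_exercise = len(exercises) - 1
df_interaction = df_gender * df_exercise
df_error = len(scores) - len(data)
ms_error = ss_error / df_error

f_exercise = (ss_exercise / df_exercise) / ms_error
f_gender = (ss_gender / df_gender) / ms_error
f_interaction = (ss_interaction / df_interaction) / ms_error

print(f"Exercise    F = {f_exercise:.3f}")
print(f"Gender      F = {f_gender:.3f}")
print(f"Interaction F = {f_interaction:.3f}")
for key, value in cell_mean.items():
    print(key, value)
```

The cell means printed at the end are what a plot of the interaction would show: the ranking of the two exercises is reversed between the two genders.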

Factorial ANOVA
• SPSS Outputs

Between-Subjects Factors
            Value Label   N
Exercise    Running       20
            Biking        20
Gender      Male          20
            Female        20

Tests of Between-Subjects Effects
Dependent Variable: WeightLoss

Source              Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model     1522.875a                 3    507.625       4.678      .007
Intercept           218892.025                1    218892.025    2017.386   .000
Exercise            265.225                   1    265.225       2.444      .127
Gender              207.025                   1    207.025       1.908      .176
Exercise * Gender   1050.625                  1    1050.625      9.683      .004
Error               3906.100                  36   108.503
Total               224321.000                40
Corrected Total     5428.975                  39
a. R Squared = .281 (Adjusted R Squared = .221)

Factorial ANOVA
• SPSS Outputs
  – Graph them!

[Figure: SPSS plot; labels: Gender (Female, Male), Exercise (Biking, Running)]

Factorial ANOVA
• Steps (Continued)
  – Computation of obtained test statistic value
    • Exercise F = 2.444, p = 0.127
    • Gender F = 1.908, p = 0.176
    • Interaction F = 9.683, p = 0.004
  – Look up the critical F score
    • df numerator (main effect) = # of levels of that factor – 1
    • df numerator (interaction) = product of the two main-effect df
    • df denominator = # of observations – # of groups
    • What is the critical F score?
  – Comparison of obtained and critical values
    • If obtained > critical reject the null hypothesis
    • If obtained < critical stick with the null hypothesis

Factorial ANOVA
• Steps (Continued)
  – Therefore, we reject the null hypothesis for the interaction effect only. Neither choice of exercise alone nor gender alone makes a statistically significant difference to weight loss, but in combination they do: the effect of exercise on weight loss differs by gender. Men should run and women should bike, according to these data.
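The comparison step can be written out as a small decision rule. This is a sketch only: the obtained F values are the ones reported on the slides, while the critical value of 4.11 is an assumption read from a standard F table for df = (1, 36) at p = 0.05, and the function name `decide` is made up for illustration.

```python
# Obtained F values as reported in the SPSS output.
obtained = {"Exercise": 2.444, "Gender": 1.908, "Exercise * Gender": 9.683}

# Assumed: critical F for df = (1, 36) at p = 0.05, taken from a standard F table.
F_CRITICAL = 4.11


def decide(f_obtained, f_critical):
    """Apply the decision rule from the slides."""
    if f_obtained > f_critical:
        return "reject the null hypothesis"
    return "stick with the null hypothesis"


decisions = {effect: decide(f, F_CRITICAL) for effect, f in obtained.items()}
for effect, decision in decisions.items():
    print(f"{effect}: F = {obtained[effect]:.3f} -> {decision}")
```

All three effects share the same critical value here because each has 1 numerator degree of freedom and the same error term.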

Correlation Coefficient
• Tests whether changes in two variables are related
• Examples
  – “Are property values positively related to distance from waste dumps?”
  – “Is age correlated with height for minors?”
  – “Are apartment rents negatively related to commute time?”
  – “Does someone’s height relate to income?”
  – “How related are hand size and height?”

Correlation Coefficient
• Are Tastiness and Ease correlated for fruit?
• Is there directionality?

Correlation Coefficient
• Numeric index that reflects the linear relationship between two variables (bivariate correlation)
  – “How does the value of one variable change when another variable changes?”
  – Each case has two data points:
    • E.g., this study records each person’s height and weight to see if they are correlated.
  – Ranges from -1.0 to +1.0
  – Two types of possible correlations
    • Change in the same direction: positive or direct correlation
    • Change in opposite directions: negative or indirect correlation
  – Absolute value reflects strength of correlation
• Pearson Product-Moment Correlation
  – Both variables need to be ratio or interval
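As a rough illustration of what the Pearson product-moment correlation measures, here is a minimal sketch in plain Python. The height and weight numbers are hypothetical values invented for this example (they are not course data), and `pearson_r` is an illustrative helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired interval/ratio data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 cases")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: each case (person) contributes two data points.
heights = [60, 62, 65, 68, 72]       # inches, made up for illustration
weights = [110, 125, 140, 155, 180]  # pounds, made up for illustration

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

A positive result means the two variables change in the same direction; the closer the absolute value is to 1.0, the stronger the linear relationship.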

Correlation Coefficient
• Scatterplot

Correlation Coefficient
• Coefficient of Determination
  – Squaring the correlation coefficient (r²)
  – The percentage of variance in one variable that is accounted for by the variance in another variable
• Example: GPA and Time Spent Studying
  – [r GPA and Study Time = 0.70]; [r² GPA and Study Time = 0.49]
    • 49% of the variance in GPA can be explained by the variance in studying time
    • GPA and studying time share 49% of the variance between themselves

Correlation Coefficient
• Example
  – “How related are hand size and height?”
• Steps
  – State hypotheses
    • Null: H0 : ρHand Size and Height = 0
    • Research: H1 : rHand Size and Height ≠ 0
      – Non-directional

Correlation Coefficient
• Steps (Continued)
  – Select statistical test
    • Correlation Coefficient (it is the test statistic!)
  – Computation of obtained test statistic value
    • Insert obtained data into appropriate formula

Correlation Coefficient
• Plot the data: n = 30
  – Computation of obtained test statistic value
    • rHand Size and Height = 0.736

Correlations
                                Height    Hand
Height   Pearson Correlation    1         .736**
         Sig. (2-tailed)                  .000
         N                      30        30
Hand     Pearson Correlation    .736**    1
         Sig. (2-tailed)        .000
         N                      30        30
**. Correlation is significant at the 0.01 level (2-tailed).

  – Computation of critical test statistic value
    • Value needed to reject null hypothesis
    • Look up p = 0.05 in critical value table
    • Consider degrees of freedom [df = n – 2]
    • Consider number of tails (is there directionality?)
    • rcritical = ?
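Critical r values are normally read from a table, but they can also be derived from critical t values through the standard identity r = t / sqrt(t² + df). The sketch below uses that identity; it is not taken from the slides, and the hard-coded two-tailed t values at p = 0.05 are assumptions copied from a standard t table.

```python
import math

# Assumed inputs: two-tailed critical t values at p = 0.05 from a standard t table.
T_CRITICAL_05_TWO_TAILED = {10: 2.228, 28: 2.048, 30: 2.042}


def r_critical(df):
    """Convert a tabulated critical t into the matching critical r."""
    t = T_CRITICAL_05_TWO_TAILED[df]
    return t / math.sqrt(t * t + df)


n = 30
df = n - 2          # degrees of freedom for a correlation
r_obtained = 0.736  # from the SPSS output

print(f"df = {df}: r critical = {r_critical(df):.3f}")
print(f"df = 30: r critical = {r_critical(30):.3f}")
print(f"df = 10: r critical = {r_critical(10):.3f}")
print("reject H0" if r_obtained > r_critical(df) else "stick with H0")
```

For n = 30 the degrees of freedom are 28, which gives a critical value of about 0.361; the table entry 0.349 belongs to df = 30. Either way the obtained 0.736 exceeds it. The df = 10 line shows why fewer cases demand a larger r before the null hypothesis can be rejected.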

Correlation Coefficient
• What happens to the critical score when the number of cases (n) decreases? Why?
• Steps (Continued)
  – Comparison of obtained and critical values
    • If obtained > critical reject the null hypothesis
    • If obtained < critical stick with the null hypothesis
    • robtained = 0.736 > rcritical = 0.361 (df = 28, two-tailed)
  – Therefore, we reject the null hypothesis and accept the research hypothesis that height and hand size are correlated.
    • Is there a directionality to that correlation?

• Significance vs. Meaning
  – Rules of Thumb
    • r = 0.8 to 1.0 Very strong relationship
    • r = 0.6 to 0.8 Strong relationship
    • r = 0.4 to 0.6 Moderate relationship
    • r = 0.2 to 0.4 Weak relationship
    • r = 0.0 to 0.2 Weak or no relationship
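The rules of thumb can be expressed as a simple lookup. This is a sketch under one stated assumption: the slides list overlapping boundaries (0.8 appears in two bands), so the hypothetical helper `describe_strength` treats each lower bound as inclusive.

```python
def describe_strength(r):
    """Label the strength of a correlation using the rules of thumb."""
    size = abs(r)  # strength depends on the absolute value, not the sign
    if size > 1.0:
        raise ValueError("a correlation coefficient must lie between -1.0 and +1.0")
    if size >= 0.8:
        return "Very strong relationship"
    if size >= 0.6:
        return "Strong relationship"
    if size >= 0.4:
        return "Moderate relationship"
    if size >= 0.2:
        return "Weak relationship"
    return "Weak or no relationship"


print(describe_strength(0.736))  # hand size and height example
```

A statistically significant r can still be weak by these labels, which is the point of separating significance from meaning.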

• Does correlation express causation?
• Classic Example:
  – Ice Cream Eaten
  – Crimes Committed
• Correlation expresses association only

Chi-Square (χ²)
• Non-Parametric Test
  – Does not rely on a given distribution
    • Useful for small sample sizes
  – Enables consideration of data that comes as ordinal or nominal frequencies
    • Number of children in different grades
    • Percentage of people by state receiving social security

One Sample Chi-Square (χ²)
• Tests whether an observed distribution of frequencies for one factor is likely to have occurred by chance
• Examples:
  – “Is this community evenly distributed among ethnic groups?”
  – “Are the 31 ice cream flavors at Baskin Robbins equally purchased?”
  – “Are commuting mode shares evenly spread out?”
  – “Did people report equal preferences for a school voucher policy?”

One Sample Chi-Square (χ²)
• Examples:
  – “Did people report equal preferences for a school voucher policy?”
  – Data (90 People split into 3 Categories)
    • For 23
    • Maybe 17
    • Against 50
  – Always try to have at least 5 responses per category

One Sample Chi-Square (χ²)
• Steps:
  – State hypotheses
    • Null: H0 : ProportionFor = ProportionMaybe = ProportionAgainst
    • Research: H1 : ProportionFor ≠ ProportionMaybe ≠ ProportionAgainst (at least one proportion differs from the others)
  – Select statistical test
    • Chi-Square (χ²)

One Sample Chi-Square (χ²)
• Steps (Continued):
  – Computation of obtained test statistic value
    • Insert obtained data into appropriate formula
    • (SPSS can expedite this step for us)

Category   O    E    (O-E)   (O-E)²   (O-E)²/E
For        23   30   -7      49       1.63
Maybe      17   30   -13     169      5.63
Against    50   30   20      400      13.33
Total      90   90   --      --       20.59

  – Computation of obtained test statistic value
    • χ² obtained = 20.59

  – Computation of critical test statistic value
    • Value needed to reject null hypothesis
    • Look up p = 0.05 in χ² table
    • Consider degrees of freedom [df = # of categories - 1]
    • χ² critical = 5.99

  – Computation of obtained test statistic value

Votes
          Observed N   Expected N   Residual
For       23           30.0         -7.0
Maybe     17           30.0         -13.0
Against   50           30.0         20.0
Total     90

Test Statistics
Chi-Square    20.600a
df            2
Asymp. Sig.
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 30.0.

  – Comparison of obtained and critical values
    • If obtained > critical reject the null hypothesis
    • If obtained < critical stick with the null hypothesis
    • χ² obtained = 20.59 > χ² critical = 5.99
  – Therefore, we can reject the null hypothesis and we thus conclude that the distribution of preferences regarding the school voucher is not even.
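The one sample test above can be reproduced in a few lines. This is a minimal sketch in plain Python using the voucher data from the slides. The critical value line relies on a special case that is an assumption of this sketch: for df = 2 only, the chi-square upper tail probability is exp(-x/2), so the critical value is -2 ln(p). For other df a table or a statistics library would be needed.

```python
import math

observed = {"For": 23, "Maybe": 17, "Against": 50}

total = sum(observed.values())
expected = total / len(observed)  # equal preferences under the null hypothesis

# Sum of (O - E)^2 / E over the categories.
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1

alpha = 0.05
# Valid for df = 2 only: P(chi-square > x) = exp(-x / 2).
chi_critical = -2 * math.log(alpha)

print(f"chi-square obtained = {chi_square:.2f}, df = {df}")
print(f"chi-square critical = {chi_critical:.2f}")
print("reject H0" if chi_square > chi_critical else "stick with H0")
```

The unrounded statistic is 20.60; the hand-computed 20.59 differs only because each term in the table was rounded before summing.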

Two Factor Chi-Square (χ²)
• What if we want to see if gender affects the distribution of votes?
• How is this different from Factorial ANOVA?

Votes * Gender Crosstabulation
           Gender
           Male   Female   Total
For        17     6        23
Maybe      7      10       17
Against    20     30       50
Total      44     46       90

Two Factor Chi-Square (χ²)
• Steps:
  – State hypotheses
    • Null: H0 : PFor|Male = PFor|Female, PMaybe|Male = PMaybe|Female, PAgainst|Male = PAgainst|Female (the distribution of votes does not depend on gender)
    • Research: H1 : at least one of these proportions differs between males and females
  – Select statistical test
    • Chi-Square (χ²)

Two Factor Chi-Square (χ²)
• Steps (Continued):
  – Computation of obtained test statistic value
    • Insert obtained data into appropriate formula
    • Same as for One Sample Chi-Square

Two Factor Chi-Square (χ²)
• How do we find the expected frequencies?
  – (Row Total * Column Total) / Grand Total
  – Expected Value [For*Male] = (23*44)/90 = 11.2

Votes * Gender Crosstabulation
                            Gender
                            Male   Female   Total
For       Count             17     6        23
          Expected Count    11.2   11.8     23.0
Maybe     Count             7      10       17
          Expected Count    8.3    8.7      17.0
Against   Count             20     30       50
          Expected Count    24.4   25.6     50.0
Total     Count             44     46       90
          Expected Count    44.0   46.0     90.0
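The expected counts follow directly from the row and column totals. A minimal sketch in plain Python, using the observed counts from the crosstabulation (the dictionary layout is just one convenient choice):

```python
observed = {
    "For":     {"Male": 17, "Female": 6},
    "Maybe":   {"Male": 7,  "Female": 10},
    "Against": {"Male": 20, "Female": 30},
}
genders = ["Male", "Female"]

row_totals = {vote: sum(counts.values()) for vote, counts in observed.items()}
col_totals = {g: sum(observed[vote][g] for vote in observed) for g in genders}
grand_total = sum(row_totals.values())

# Expected count = (row total * column total) / grand total.
expected = {
    vote: {g: row_totals[vote] * col_totals[g] / grand_total for g in genders}
    for vote in observed
}

for vote in observed:
    cells = ", ".join(f"{g}: {expected[vote][g]:.1f}" for g in genders)
    print(f"{vote:8s} {cells}")
```

These are the counts each cell would hold if the split between For, Maybe, and Against were the same for both genders.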

  – Computation of obtained test statistic value
    • χ² obtained = 7.750

Chi-Square Tests
                               Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square             7.750a   2    .021
Likelihood Ratio               7.984    2    .018
Linear-by-Linear Association   6.344    1    .012
N of Valid Cases
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 8.31.

  – Computation of critical test statistic value
    • Value needed to reject null hypothesis
    • Look up p = 0.05 in χ² table
    • Consider degrees of freedom
    • df = (# of rows – 1) * (# of columns – 1)
    • χ² critical = ?
  – Comparison of obtained and critical values
    • If obtained > critical reject the null hypothesis
    • If obtained < critical stick with the null hypothesis
    • χ² obtained = 7.750 > χ² critical = 5.99
  – Therefore, we can reject the null hypothesis and we thus conclude that gender affects the distribution of preferences regarding the school vouchers.
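The whole two factor test can be traced end to end in plain Python. This is a sketch, not the SPSS procedure: it uses the observed counts from the crosstabulation, and the critical value and p-value lines assume df = 2, where the chi-square upper tail probability has the closed form exp(-x/2).

```python
import math

# Rows: For, Maybe, Against. Columns: Male, Female.
observed = [
    [17, 6],
    [7, 10],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

alpha = 0.05
# Valid for df = 2 only: P(chi-square > x) = exp(-x / 2).
chi_critical = -2 * math.log(alpha)
p_value = math.exp(-chi_square / 2)

print(f"chi-square obtained = {chi_square:.3f}, df = {df}")
print(f"chi-square critical = {chi_critical:.2f}, p = {p_value:.3f}")
print("reject H0" if chi_square > chi_critical else "stick with H0")
```

With 3 rows and 2 columns, df = (3 – 1) * (2 – 1) = 2, so the critical value is the same 5.99 used in the one sample example.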

Tutorial Time
