Date post: 11-Jan-2016
Upload: daniela-boone
Test of Independence
Transcript
Page 1: Test of Independence. Parametric test In earlier lectures, we discussed the procedures of drawing conclusions about population parameters on the basis.

Test of Independence


Parametric test

• In earlier lectures, we discussed the procedures of drawing conclusions about population parameters on the basis of sample information.

• A parametric test requires certain assumptions about the population from which the sample is drawn.

• First, applying the t-test to small samples requires that the parent population is normally distributed. Second, the hypothesis is formulated by specifying a particular value for the parameter.

• When it is not possible to make any assumption about the value of a parameter, or the population does not follow a normal distribution, we use non-parametric tests.


Non-parametric test

• In many cases, particularly for nominal variables, we do not have parameters.

• For categorical data, the variable (or attribute) takes values from a finite number of categories. For instance, a gender variable has only two categories: male and female. We can count the number of observations in each category.

• To draw inferences from nominal data, the chi-square test statistic is used to test for independence between two categorical variables.


Contingency table

• A contingency table is an arrangement of data in a two-way classification. The data are sorted into cells, and the count for each cell is reported.

• The contingency table involves two factors (or variables), and a common question concerning such tables is whether the data indicate that the two variables are independent or dependent.

• To illustrate a test of independence, let’s consider a random sample that shows the gender of liberal arts college students and their favorite academic area.


[Bar chart: percentage (%) of male and female students preferring each subject area. Male: MS 30.3, SS 33.6, H 36.1. Female: MS 19.7, SS 40.4, H 39.9.]

Looking at the bar chart, it appears that females and males differ in the proportion preferring each subject. We need a statistical test to verify our visual impression, that is, to check whether there is a statistical dependence between gender and subject preference.


• A group of 300 college students was randomly selected, and each was asked whether he or she prefers taking liberal arts courses in the area of math–science, social science, or humanities.

• Does this sample present sufficient evidence to reject the null hypothesis “Preference for math–science, social science, or humanities is independent of the gender of a college student”? (In layman's terms, the null hypothesis says that the preference for subject area is not related to gender.)


• Parameter of interest: Determining the independence of the variables “gender” and “favorite subject area” requires us to discuss the probability of the various cases and the effect that answers about one variable have on the probability of answers about the other variable.

• Two random variables X and Y are called independent if the probability distribution of one variable is not affected by the value of the other.

• As defined in conditional probability, independence requires P(MS | M) = P(MS | F) = P(MS); that is, gender has no effect on the probability of a person’s choice of subject area.

• If two events A and B are independent, then P(A and B) = P(A) × P(B).
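As a quick illustration of this condition, the sketch below compares the sample versions of P(MS | M), P(MS | F) and P(MS). It assumes the percentages read off the bar chart (30.3% of males and 19.7% of females prefer math–science) and the totals quoted in the example (72 of 300 students chose MS). The variable names are illustrative only.

```python
# Sample proportions, assumed from the bar chart and the marginal totals.
p_ms_given_male = 0.303    # 30.3% of males prefer math-science
p_ms_given_female = 0.197  # 19.7% of females prefer math-science
p_ms = 72 / 300            # overall share preferring math-science

# Under independence all three proportions would be (nearly) equal,
# so both gaps would be close to zero.
gap_male = p_ms_given_male - p_ms
gap_female = p_ms_given_female - p_ms

print(f"P(MS | M) = {p_ms_given_male:.3f}")
print(f"P(MS | F) = {p_ms_given_female:.3f}")
print(f"P(MS)     = {p_ms:.3f}")
print(f"gap for males = {gap_male:+.3f}, gap for females = {gap_female:+.3f}")
```

Gaps in a sample do not by themselves establish dependence; the chi-square test judges whether they are larger than sampling variation alone would produce.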


Steps in hypothesis testing

H0: Preference for math–science, social science, or humanities is independent of the gender of a college student.

Ha: Subject area preference is not independent of the gender of the student.

• Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O = observed frequency and E = expected frequency, and the sum runs over all cells of the table.

• If the observed counts are “close enough” to the expected counts, then $\chi^2$, the sum of their scaled squared deviations, should not be much bigger than 0. Otherwise, $\chi^2$ would be very big, suggesting that the original hypothesis of independence between the two random variables is not valid.


• E is the expected count if X and Y are independent. Since the null hypothesis asserts that these factors are independent, we would expect the values to be distributed in proportion to the marginal totals.

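• The expected value is calculated on the assumption that H0 is true, i.e. gender and subject choice are independent.

• There are 122 males; we would expect them to be distributed among MS, SS, and H proportionally to the 72, 113, and 115 totals. Thus, the expected cell counts for males are

  (122 × 72)/300 = 29.28, (122 × 113)/300 = 45.95, (122 × 115)/300 = 46.77

• Likewise, for the 178 females we would expect

  (178 × 72)/300 = 42.72, (178 × 113)/300 = 67.05, (178 × 115)/300 = 68.23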
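The proportional allocation described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `expected_counts` is hypothetical, and the female total of 178 is inferred as 300 − 122.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: row total * column total / n."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must add up to the same n")
    return [[r * c / n for c in col_totals] for r in row_totals]


row_totals = [122, 178]      # males, females (178 = 300 - 122, inferred)
col_totals = [72, 113, 115]  # MS, SS, H

expected = expected_counts(row_totals, col_totals)
for label, row in zip(["Male", "Female"], expected):
    print(label, [round(e, 2) for e in row])
```

Each expected count depends only on the marginal totals, which is why it can be computed before looking at the individual cells.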


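• Typically, the contingency table is written so that it contains all this information: each cell shows the observed count with the expected count in parentheses. The observed counts below follow from the chart percentages and the group sizes (for example, 30.3% of 122 males is 37).

            MS           SS           H            Total
Male        37 (29.28)   41 (45.95)   44 (46.77)   122
Female      35 (42.72)   72 (67.05)   71 (68.23)   178
Total       72           113          115          300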

• The calculated chi-square is

$\chi^2 = \sum \frac{(O - E)^2}{E} = 2.035 + 0.533 + 0.164 + 1.395 + 0.365 + 0.112 = 4.604$
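The calculation above can be reproduced with a short sketch. The observed counts used here are an assumption: they are reconstructed from the chart percentages and the group sizes (for example, 30.3% of 122 males is 37) and are consistent with the column totals 72, 113 and 115. The function name `chi_square_statistic` is illustrative.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            statistic += (o - e) ** 2 / e
    return statistic


# Assumed (reconstructed) observed counts; columns are MS, SS, H.
observed = [
    [37, 41, 44],  # male
    [35, 72, 71],  # female
]

print(f"chi-square = {chi_square_statistic(observed):.3f}")
```

The result agrees with 4.604 up to rounding, because the slide adds terms that were already rounded to three decimals.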


Chi-square distribution table and degrees of freedom

• Like the t distribution, the chi-square distribution is indexed by its degrees of freedom. For a test of independence, df = (C − 1)(R − 1), where C = number of columns and R = number of rows in the two-way table.

• In the displayed table, there are 3 columns and 2 rows, so the degrees of freedom are (3 − 1)(2 − 1) = 2.
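A small sketch of the degrees-of-freedom rule follows. As an extra, it uses the fact that the chi-square distribution with 2 degrees of freedom has upper-tail probability exp(−x/2), so its critical value can be computed directly instead of being read from a table. Both function names are hypothetical, and the closed form applies to df = 2 only.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a test of independence on an R x C table."""
    return (n_rows - 1) * (n_cols - 1)


def critical_value_df2(alpha):
    """Upper-tail critical value for chi-square with 2 df.

    Solves exp(-x / 2) = alpha for x; valid for df = 2 only.
    """
    return -2.0 * math.log(alpha)


df = degrees_of_freedom(n_rows=2, n_cols=3)
print("df =", df)
print(f"critical value at alpha = 0.05: {critical_value_df2(0.05):.2f}")
```

For other degrees of freedom, the critical value would normally be taken from a chi-square table or a statistics library.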


• A chi-square statistic was computed for a two-way table having 4 degrees of freedom. The value of the statistic was 9.49. What is the p-value?
A. 0.005
B. 0.01
C. 0.05

• A chi-square statistic was computed for a two-way table having 20 degrees of freedom. The value of the statistic was 29.69. What is the p-value?
A. 0.025
B. 0.05
C. 0.075


Decision making

• Given a 5% level of significance and df = 2, the critical value is 5.99.

• The computed $\chi^2$ is 4.604, which is smaller than the critical value. As it lies inside the non-rejection region, we fail to reject the null hypothesis at the 5% significance level. Therefore, we do not have sufficient evidence that the preference for academic subject areas depends on the sex of the respondents.

• Alternatively, we say that the sex of the students has no statistically significant relationship with the preference for subject areas.

• P-value approach: The null hypothesis of independence is rejected if the p-value of the chi-square test statistic is less than a given significance level α.

• Given the $\chi^2$ test statistic of 4.604, the corresponding p-value at df = 2 is around 10%. Since this exceeds α = 0.05, we cannot reject the null hypothesis.
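The p-value quoted above can be checked with a short sketch. It relies on a standard closed form for the upper-tail probability of a chi-square distribution with an even number of degrees of freedom, df = 2m: exp(−x/2) multiplied by the sum of (x/2)^k / k! for k = 0, ..., m − 1. The function name is hypothetical, and odd degrees of freedom are deliberately not handled.

```python
import math


def chi_square_p_value_even_df(x, df):
    """Upper-tail probability P(X >= x) for chi-square with even df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


p_value = chi_square_p_value_even_df(4.604, df=2)
alpha = 0.05
print(f"p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With the statistic 4.604 and df = 2 the decision matches the critical-value approach: the null hypothesis is not rejected at α = 0.05.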


• A researcher wants to know with whom it is easiest to make friends. 205 first-year students from the FSS were randomly selected to participate in the interview.

• Do we have evidence, at the 5% significance level, that the perceived ease of making friends with the opposite sex or the same sex is related to the sex of the student?

• Observed counts, with expected values in parentheses:

            Opposite sex   Same sex     No difference   Total
Female      58 (48.79)     16 (19.38)   63 (68.83)      137
Male        15 (24.21)     13 (9.62)    40 (34.17)      68
Total       73             29           103             205


H0: There is no relationship between the sex of students and their perception of making friends in the population.

Ha: There is a relationship between the sex of students and their perception of making friends in the population.

Test statistic: $\chi^2$ = 8.515

At the 5% significance level with df = (3 − 1)(2 − 1) = 2, the critical value is 5.99.

As the $\chi^2$ test statistic (8.515) is larger than the critical value, we reject the null hypothesis and conclude that, in the population, there is a relationship between the respondents’ sex and their response to the question asked.
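To see where the value 8.515 comes from, the sketch below breaks the statistic into its per-cell contributions (O − E)² / E, using the observed counts from the table in this example. The variable names are illustrative, and the critical value 5.99 is the one quoted above for df = 2.

```python
columns = ["Opposite sex", "Same sex", "No difference"]
observed = {
    "Female": [58, 16, 63],
    "Male": [15, 13, 40],
}

col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(col_totals)

# Contribution of each cell to the chi-square statistic.
contributions = {}
for sex, counts in observed.items():
    row_total = sum(counts)
    for column, o, col_total in zip(columns, counts, col_totals):
        e = row_total * col_total / n  # expected count under H0
        contributions[(sex, column)] = (o - e) ** 2 / e

statistic = sum(contributions.values())
largest_cell = max(contributions, key=contributions.get)

for cell, value in contributions.items():
    print(cell, round(value, 3))
print(f"chi-square = {statistic:.3f}")
print("largest contribution:", largest_cell)
print("reject H0" if statistic > 5.99 else "fail to reject H0")
```

Listing the contributions shows which cells depart most from what independence would predict, which is useful when interpreting a significant result.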


• Suppose that a two-way table displaying sample information about gender and opinion about the legalization of marijuana (yes or no) is examined using a chi-square test. The necessary conditions are met and the chi-square value is calculated to be 15. What conclusion can be made?
A. Gender and opinion have a statistically significant relationship.
B. Gender and opinion do not have a statistically significant relationship.
C. It is impossible to make a conclusion because we don’t know the sample size.
D. It is impossible to make a conclusion because we don’t know the degrees of freedom.


• The value of the $\chi^2$ test statistic is 5.33. Are the results statistically significant at the 5% significance level if there are 2 degrees of freedom?
A. Yes, because 5.33 is greater than the critical value of 3.84.
B. Yes, because 5.33 is greater than the critical value of 4.01.
C. No, because 5.33 is smaller than the critical value of 5.99.
D. No, because 5.33 is smaller than the critical value of 11.07.


• Suppose a random sample of 650 of the 1 million residents of a city is taken, in which every resident of each of four neighborhoods, A, B, C, and D, is equally likely to be chosen. A null hypothesis says the randomly chosen person's neighborhood of residence is independent of the person's occupational classification, which is either "blue collar", "white collar", or "service". Assume α = 5%.

               A     B     C     D     Total
Blue collar    90    60    104   95    349
White collar   30    50    51    20    151
Service        30    40    45    35    150
Total          150   150   200   150   650
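One way to work through a table of this size is sketched below: compute the statistic and the degrees of freedom, then compare with the critical value. The function name is hypothetical, and the critical value 12.59 for df = 6 at α = 0.05 is taken from a standard chi-square table rather than from the slides.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Rows: blue collar, white collar, service; columns: neighborhoods A, B, C, D.
observed = [
    [90, 60, 104, 95],
    [30, 50, 51, 20],
    [30, 40, 45, 35],
]

CRITICAL_VALUE_DF6_ALPHA05 = 12.59  # standard table value, not from the slides

statistic, df = chi_square_independence(observed)
reject = statistic > CRITICAL_VALUE_DF6_ALPHA05
print(f"chi-square = {statistic:.2f}, df = {df}")
print("reject H0" if reject else "fail to reject H0")
```

The same function works for any complete R × C table of counts, provided the matching critical value is supplied for its degrees of freedom.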


• Based on the following table, use the chi-square test to determine whether the smoking habit and exercise level of students are independent at the 5% significance level.

                          Exercise
Smoking habit    Frequently   Never   Some
Heavy            7            1       3
Never            87           18      84
Occasionally     12           3       4
Regularly        9            1       7


Exercise

1. Which of the following relationships could be analyzed using a chi-square test?

A. The relationship between height (inches) and weight (pounds).

B. The relationship between satisfaction with K-12 schools (satisfied or not) and political party affiliation.

C. The relationship between gender and amount willing to spend on a stereo system (in dollars).

D. The relationship between opinion on gun control and income earned last year (in thousands of dollars).


1. A student survey was done to study the relationship between where students live (dormitory, apartment, house, co-op, or parent’s home) and how they usually get to campus (walking, bus, bicycle, car, or subway). What are the degrees of freedom for the chi-square statistic?

A. 5 B. 16 C. 20 D. 25

2. A chi-square test involves a set of counts called “expected counts.” What are the expected counts?

A. Hypothetical counts that would occur if the alternative hypothesis were true.

B. Hypothetical counts that would occur if the null hypothesis were true.

C. The actual counts that did occur in the observed data.

D. The long-run counts that would be expected if the observed counts are representative.


3. In the General Social Survey, respondents were asked what they thought was most important to get ahead: hard work, lucky breaks, or both. What is the null hypothesis for this situation?

A. There is a relationship between gender and opinion on what is important to get ahead in the sample.

B. There is no relationship between gender and opinion on what is important to get ahead in the sample.

C. There is a relationship between gender and opinion on what is important to get ahead in the population.

D. There is no relationship between gender and opinion on what is important to get ahead in the population.

• What is the alternative hypothesis?


Expected counts are printed below observed counts

                Male     Female   Total
Hard work       284      393      677
                292.31   384.69
Lucky breaks    84       121      205
                88.51    116.49
Both            75       69       144
                62.18    81.82
Total           443      583      1026

Chi-Sq = 0.236 + 0.180 + 0.230 + 0.175 + 2.645 + 2.010 = 5.476
P-Value = 0.065

What is the value of the test statistic?
What are the degrees of freedom for this test?
At a significance level of 0.05, what is the conclusion?


1. A researcher conducted a study on college students to see if there was a link between gender and how often they have cheated on an exam. She asked two questions on a survey:

(1) What is your gender? Male ___ Female ___
(2) How many times have you cheated on an exam while in college? Never ___ 1 or 2 times ___ 3 or more times ___

a. What is the appropriate null hypothesis?
b. What are the degrees of freedom for the test statistic?


Using SPSS for the chi-square test

• Analyze → Descriptive Statistics → Crosstabs → drag one variable into the Column(s) box and the other into the Row(s) box.

• → Statistics: choose Chi-square.

• → Cells: by default the system gives the observed count; choose the expected count as well to see the result. The expected value is normally not reported, however. Choose either the column or the row percentage; this is used to report the descriptive statistics in the write-up.
