AP Statistics 4.3 Relations in Categorical Data
Learning Objective:
Use categorical data to calculate marginal and conditional proportions
Understand Simpson’s Paradox in the context of a problem
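Definitions
two-way table- describes two categorical variables
row variable- the categorical variable whose levels label the rows; each row describes the individuals at one level of that variable
column variable- the categorical variable whose levels label the columns; each column describes the individuals at one level of that variable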
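Marginal Distributions- the row totals and column totals, which give the distribution of each variable on its own
Conditional Distribution (“GIVEN”)- the distribution of one variable among only those individuals who satisfy a certain condition on the other variable
roundoff error- when percents do not add to exactly 100% because each one was rounded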
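A minimal sketch of how roundoff error arises. The counts here are hypothetical (three equal categories, not taken from any table in these notes): each percent is rounded to a whole number, so the rounded percents no longer add to 100%.

```python
# Hypothetical counts for three categories (illustration only, not from the notes).
counts = {"A": 1, "B": 1, "C": 1}
total = sum(counts.values())

# Round each category's percent to the nearest whole number.
rounded_percents = {name: round(100 * n / total) for name, n in counts.items()}

# The rounded percents are added up; any gap from 100 is roundoff error.
rounded_sum = sum(rounded_percents.values())

print(rounded_percents)
print(rounded_sum)
```

The exact percents always add to 100%; only the rounded values can miss.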
What percent of people over 25 years of age have at least 4 years of college?
What percent of those who are 25-34 completed high school?
What percent completed 4 or more years of college and are 35-54?
What percent are 55 and over, given that they did not complete high school?
#1- How many students do these data describe?
5375
#2- What percent of these students smoke?
1004/5375= 0.187= 18.7%
#3- Give the marginal distribution of parents’ smoking behavior, both in counts and percents.
Parent Both One Neither
% 33% 42% 25%
[Bar graph of the marginal distribution of Parent (Both, One, Neither); vertical axis from 0 to 45]
#4- What percent of the students smoke, given that both of their parents smoke?
400/1780= 0.22= 22%
#5- What percent of students have neither parent smoking, given that the student does not smoke?
1168/4371= 0.27= 27%
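The marginal and conditional percents in #2, #4, and #5 can be checked with a short sketch. It uses only the counts that appear in the answers above; the variable names and the `proportion` helper are assumptions made for illustration.

```python
# Counts taken from the worked answers above.
total_students = 5375
students_who_smoke = 1004
students_who_do_not_smoke = 4371
both_parents_smoke = 1780
smokers_with_both_parents_smoking = 400
nonsmokers_with_neither_parent_smoking = 1168


def proportion(part, whole):
    """Return part/whole as a fraction between 0 and 1."""
    return part / whole


# Marginal proportion: divide by the grand total.
marginal_smoke = proportion(students_who_smoke, total_students)

# Conditional proportions: divide by the total of the "given" group only.
smoke_given_both = proportion(smokers_with_both_parents_smoking, both_parents_smoke)
neither_given_nonsmoker = proportion(
    nonsmokers_with_neither_parent_smoking, students_who_do_not_smoke
)

print(f"#2 marginal: {marginal_smoke:.3f}")
print(f"#4 given both parents smoke: {smoke_given_both:.2f}")
print(f"#5 given student does not smoke: {neither_given_nonsmoker:.2f}")
```

The only thing that changes between a marginal and a conditional proportion is the denominator: the grand total for a marginal, the "given" group's total for a conditional.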
Simpson’s Paradox- refers to the reversal of the direction of a comparison or an association when the data from several groups are combined to form a single group.
What percent of patients died in each hospital?
Hospital A: 63/2100= 3% Hospital B: 16/800= 2%
Hospital A has a higher death rate
Hospital A Hospital B Total
Died 63 16 79
Survived 2037 784 2821
Total 2100 800 2900
Good Condition Bad Condition
A: 6/600= 1% A: 57/1500= 3.8%
B: 8/600= 1.3% B: 8/200= 4%
In both cases, Hospital A had a lower death rate. Why?
We took a closer look to determine the condition of the patient when they
entered the hospital.
Good Condition
A B
Died 6 8
Survived 594 592
Bad Condition
A B
Died 57 8
Survived 1443 192
Why?
In both hospitals, patients who entered in bad condition had a higher death rate. Because the majority of Hospital A's patients entered in bad condition, Hospital A had the higher overall death rate.
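The reversal can be checked numerically. This is a minimal sketch using the hospital counts from the tables above; the dictionary layout and the `death_rate` helper are assumptions made for illustration.

```python
# Deaths and patient totals by hospital and condition, from the tables above.
deaths = {
    "A": {"good": 6, "bad": 57},
    "B": {"good": 8, "bad": 8},
}
patients = {
    "A": {"good": 600, "bad": 1500},
    "B": {"good": 600, "bad": 200},
}


def death_rate(hospital, condition=None):
    """Death rate for one hospital, overall or within one patient condition."""
    if condition is None:
        return sum(deaths[hospital].values()) / sum(patients[hospital].values())
    return deaths[hospital][condition] / patients[hospital][condition]


# Combined data: Hospital A looks worse.
overall = {h: death_rate(h) for h in deaths}

# Split by condition: Hospital A looks better in both groups.
by_condition = {
    c: {h: death_rate(h, c) for h in deaths} for c in ("good", "bad")
}

# Share of each hospital's patients who arrived in bad condition.
share_bad = {h: patients[h]["bad"] / sum(patients[h].values()) for h in patients}

print("overall:", {h: f"{r:.1%}" for h, r in overall.items()})
for c, rates in by_condition.items():
    print(c, {h: f"{r:.1%}" for h, r in rates.items()})
print("share in bad condition:", {h: f"{s:.0%}" for h, s in share_bad.items()})
```

The last line shows the lurking variable at work: most of Hospital A's patients arrive in bad condition, while most of Hospital B's arrive in good condition, which is what flips the comparison when the groups are combined.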