Post on 18-Dec-2015
TOPIC 10
Discrete (Categorical) Data Analysis
Discrete Random Variables
Recall that discrete random variables may take only discrete values. For example:
• Number of errors in a software product: 0, 1, 2, 3, 4, …
• Categories of a product's quality level: high, medium, or low
• Characteristics of a machine breakdown: mechanical failure, electrical failure, or operator misuse
Sample Proportions
Recall that the success probability p can be estimated by the sample proportion
$$\hat{p} = \frac{x}{n}$$
For large enough values of n, the sample proportion can be taken to have approximately the normal distribution
$$\hat{p} \sim N\!\left(p,\ \frac{p(1-p)}{n}\right)$$
This expression may be written in terms of a standard normal distribution as
$$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \sim N(0, 1)$$
where $\sigma_{\hat{p}}$ = standard error of $\hat{p}$.
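Confidence Interval Estimation for p
Assumptions:
$$n\hat{p} \ge 15, \qquad n(1-\hat{p}) \ge 15$$
Since p is unknown, we replace p in the standard error with its estimate $\hat{p}$:
$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \;\le\; p \;\le\; \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$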
Example
You're a production manager for a newspaper. You want to find the % defective. Of 200 newspapers, 35 had defects. What is the 90% confidence interval estimate of the population proportion defective?
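Example Solution
$$\hat{p} = \frac{35}{200} = 0.175$$
$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \;\le\; p \;\le\; \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$$0.175 - 1.645\sqrt{\frac{0.175 \times 0.825}{200}} \;\le\; p \;\le\; 0.175 + 1.645\sqrt{\frac{0.175 \times 0.825}{200}}$$
$$0.1308 \le p \le 0.2192$$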
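The interval arithmetic in this example can be checked with a short script. This is only a sketch using Python's standard library; the function name `proportion_ci` is an illustrative assumption, not something from the slides, and the critical value comes from `statistics.NormalDist` rather than a printed Z table (so it uses 1.6449 instead of the rounded 1.645).

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Large-sample confidence interval for a population proportion.

    Hypothetical helper for illustration: x successes out of n trials.
    """
    p_hat = x / n
    # Two-sided critical value Z_{alpha/2} of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Newspaper example: 35 defective out of 200, 90% confidence.
lower, upper = proportion_ci(x=35, n=200, confidence=0.90)
print(f"{lower:.4f} <= p <= {upper:.4f}")
```

To four decimals the limits agree with the hand calculation, 0.1308 and 0.2192.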
Sample Size for Estimating p
I don't want to sample too much or too little!
$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{SE}{\sqrt{p(1-p)/n}} \quad\Longrightarrow\quad n = \frac{Z^2\, p(1-p)}{SE^2}$$
SE = sampling error
• If no estimate of p is available, use p = 1 – p = 0.5
Example
What sample size is needed to estimate p with 90% confidence and a width L of .03?
$$SE = \frac{L}{2} = \frac{0.03}{2} = 0.015$$
$$n = \frac{Z^2\, p(1-p)}{SE^2} = \frac{1.645^2 \times 0.5 \times 0.5}{0.015^2} = 3006.69 \approx 3007$$
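The sample-size formula can be sketched in a few lines of Python. The helper name `sample_size_for_proportion` is an assumption for illustration. It defaults to p = 0.5 when no estimate of p is available and always rounds up, since a fractional observation cannot be sampled.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(se, confidence, p=0.5):
    """Smallest whole n satisfying n >= Z^2 * p * (1 - p) / SE^2.

    Hypothetical helper: se is the sampling error (half the interval width).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / se ** 2)


# Width L = 0.03, so SE = L / 2 = 0.015; 90% confidence; no prior estimate of p.
n_required = sample_size_for_proportion(se=0.03 / 2, confidence=0.90)
print(n_required)
```

With the unrounded critical value the raw quotient is slightly smaller than the 3006.69 obtained from Z = 1.645, but it still rounds up to the same sample size of 3007.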
Exercises
• Suppose that the auditing procedures require you to have 95% confidence in estimating the population proportion of sales invoices with errors within ± 0.07. The results from the past months indicate that the largest proportion has been no more than 0.15. Find the sample size needed to satisfy the requirements of the company.
• In an election poll, a random sample of 500 people showed that 42 preferred voting for a particular candidate. Set up a 90% confidence interval estimate for the population proportion p preferring that candidate.
Z Test of Hypothesis for the Proportion
• One-sample Z test for the proportion
$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$
where
$$\hat{p} = \frac{X}{n} = \frac{\text{number of items having the characteristic of interest}}{\text{sample size}}$$
$\hat{p}$ = sample proportion of successes
$p$ = hypothesized proportion of successes in the population
Example
You're an accounting manager. A year-end audit showed 4% of transactions had errors. You implement new procedures. A random sample of 500 transactions had 25 errors. Has the proportion of incorrect transactions changed at the .05 level of significance?
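Example Solution
• H0: p = .04
• Ha: p ≠ .04
• α = .05, α/2 = .025
• n = 500
• Critical value(s): Z = ±1.96 (reject H0 if Z < –1.96 or Z > 1.96; area .025 in each tail)
Test statistic:
$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\dfrac{25}{500} - 0.04}{\sqrt{\dfrac{0.04 \times 0.96}{500}}} = 1.14$$
Decision: Do not reject H0 at α = .05
Conclusion: There is insufficient evidence that the proportion of incorrect transactions has changed from 4%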
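As a cross-check on the audit example, the test statistic can be computed directly. This is a minimal sketch with Python's standard library; the function name `one_sample_proportion_z` is illustrative, and the two-sided p-value is an addition that the slides do not report.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(x, n, p0):
    """Z statistic and two-sided p-value for H0: p = p0 (hypothetical helper)."""
    p_hat = x / n
    # The standard error uses the hypothesized p0, not the sample proportion.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# 25 errors in 500 transactions, hypothesized proportion 4%, alpha = .05.
z, p_value = one_sample_proportion_z(x=25, n=500, p0=0.04)
reject_h0 = abs(z) > NormalDist().inv_cdf(1 - 0.05 / 2)
print(f"Z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

The statistic rounds to 1.14, which lies inside ±1.96, so the script also arrives at "do not reject H0".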
Exercise
• A fast-food chain has developed a new process to ensure that orders at the drive-through are filled correctly. The previous process filled orders correctly 85% of the time. Based on a sample of 100 orders using the new process, 94 were filled correctly. At a 0.01 level of significance, can you conclude that the new process has increased the proportion of orders filled correctly?
Large-Sample Inference about p1 – p2
Assumptions:
• Independent, random samples
• Normal approximation can be used if
$$n_1\hat{p}_1 \ge 15,\quad n_1(1-\hat{p}_1) \ge 15,\quad n_2\hat{p}_2 \ge 15,\quad n_2(1-\hat{p}_2) \ge 15$$
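• (1 – α)100% Confidence Interval for (p1 – p2)
$$(\hat{p}_1 - \hat{p}_2) - Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \;\le\; p_1 - p_2 \;\le\; (\hat{p}_1 - \hat{p}_2) + Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
or, equivalently,
$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
• where
$$\hat{q}_1 = 1 - \hat{p}_1, \qquad \hat{q}_2 = 1 - \hat{p}_2$$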
Example
As personnel director, you want to test the perception of fairness of two methods of performance evaluation. 63 of 78 employees rated Method 1 as fair. 49 of 82 rated Method 2 as fair. Find a 99% confidence interval for the difference in perceptions.
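Example Solution
$$\hat{p}_1 = \frac{63}{78} = 0.808, \qquad \hat{q}_1 = 1 - \hat{p}_1 = 1 - 0.808 = 0.192$$
$$\hat{p}_2 = \frac{49}{82} = 0.598, \qquad \hat{q}_2 = 1 - \hat{p}_2 = 1 - 0.598 = 0.402$$
$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} = (0.808 - 0.598) \pm 2.58\sqrt{\frac{0.808 \times 0.192}{78} + \frac{0.598 \times 0.402}{82}}$$
$$0.029 \le p_1 - p_2 \le 0.391$$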
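The two-sample interval can be reproduced with a small script. The sketch below assumes only the standard library; the name `two_proportion_ci` is illustrative. It works with unrounded proportions and an unrounded critical value, so intermediate figures differ slightly from the hand calculation while the limits agree to three decimals.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Large-sample confidence interval for p1 - p2 (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Unpooled standard error: each sample contributes p_hat * q_hat / n.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Method 1: 63 of 78 rated fair; Method 2: 49 of 82 rated fair; 99% confidence.
lower, upper = two_proportion_ci(x1=63, n1=78, x2=49, n2=82, confidence=0.99)
print(f"{lower:.3f} <= p1 - p2 <= {upper:.3f}")
```

Because the whole interval lies above zero, it points the same way as a two-sided test at the .01 level would.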
Hypothesis Testing for Two Proportions

Hypothesis   No difference /     Pop 1 ≥ Pop 2 /    Pop 1 ≤ Pop 2 /
             any difference      Pop 1 < Pop 2      Pop 1 > Pop 2
H0           p1 – p2 = 0         p1 – p2 ≥ 0        p1 – p2 ≤ 0
Ha           p1 – p2 ≠ 0         p1 – p2 < 0        p1 – p2 > 0

Large-Sample Inference about p1 – p2
Z-test statistic:
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}\,\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where $(p_1 - p_2)$ is the hypothesized difference and the pooled estimate is
$$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}, \qquad \hat{q} = 1 - \hat{p}$$
The rejection region is determined in the same way as in the one-sample tests.
Example
As personnel director, you want to test the perception of fairness of two methods of performance evaluation. 63 of 78 employees rated Method 1 as fair. 49 of 82 rated Method 2 as fair. At the .01 level of significance, is there a difference in perceptions?
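Example Solution
$$\hat{p}_1 = \frac{X_1}{n_1} = \frac{63}{78} = .808, \qquad \hat{p}_2 = \frac{X_2}{n_2} = \frac{49}{82} = .598$$
$$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{63 + 49}{78 + 82} = .70$$
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(.808 - .598) - 0}{\sqrt{.70\,(1 - .70)\left(\dfrac{1}{78} + \dfrac{1}{82}\right)}} = 2.90$$
• H0: p1 – p2 = 0
• Ha: p1 – p2 ≠ 0
• α = .01
• n1 = 78, n2 = 82
• Critical value(s): $\pm Z_{.005} = \pm 2.58$ (reject H0 if Z < –2.58 or Z > 2.58; area .005 in each tail)
Test statistic: Z = +2.90
Decision: Reject H0 at α = .01
Conclusion: There is evidence of a difference in proportions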
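The pooled two-proportion test can be sketched as follows. The function name `two_proportion_z` is an assumption for illustration, the hypothesized difference is fixed at 0 as in this example, and the p-value is an addition not shown on the slides.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic and two-sided p-value for H0: p1 - p2 = 0.

    Hypothetical helper; not a general test for a nonzero hypothesized difference.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both samples share one proportion, so the counts are pooled.
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = two_proportion_z(x1=63, n1=78, x2=49, n2=82)
reject_h0 = abs(z) > NormalDist().inv_cdf(1 - 0.01 / 2)
print(f"Z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The statistic rounds to 2.90 and exceeds the critical value of about 2.58, matching the decision to reject H0.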
Chi-Square Tests for k Proportions
• This topic extends hypothesis testing to analyze differences between population proportions based on two or more samples.
• Qualitative data that fall in more than two categories often result from a multinomial experiment.
• Some of the characteristics of the multinomial experiment are
  – The probabilities of the k outcomes, denoted p1, p2, … , pk, remain the same from trial to trial, where p1 + p2 + … + pk = 1
  – The trials are independent
• Recall that a binomial experiment is a multinomial experiment with k = 2
Chi-Square (χ²) Tests
[Diagram: Populations (claim: p1 = p2 = p3 = p4 = … = pk) → Draw sample → Observed and expected frequencies (x, e) → χ² test for equality of proportions → Evidence to accept/reject our claim]
Road Map
[Diagram: Decision Making → One/Two Samples; Analysis of Variance; χ² Tests → One-Way Table, Two-Way Table]
Multinomial Experiment
• n identical and independent trials
• k outcomes to each trial
• Constant outcome probability, pk
• Random variable is count, nk
• Example: ask 100 people (n) which of 3 candidates (k) they will vote for
• Uses one-way contingency table: Shows number of observations in k independent groups (outcomes or variable levels)
One-Way Contingency Table
Outcomes (k = 3):

Candidate              Tom   Bill   Mary   Total
Number of responses     35     20     45     100
χ² Test Basic Idea
Assumptions:
1. A multinomial experiment has been conducted
2. The sample size n is large: ei is greater than or equal to 5 for every cell (i = 1, 2, 3, …, k)
Idea:
1. Compares observed frequency (xi) to expected frequency (ei), assuming the null hypothesis is true
2. The closer the observed frequencies are to the expected frequencies, the more consistent the data are with H0
  • Measured by squared difference relative to expected frequency
  • Reject H0 for large values

χ² Test for k Proportions
1. Hypotheses
  • H0: p1 = p1,0, p2 = p2,0, ..., pk = pk,0
  • Ha: At least one pi is different from above
2. Test statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - e_i)^2}{e_i}$$
where $x_i$ is the observed frequency, $e_i = n\,p_{i,0}$ is the expected frequency, and $p_{i,0}$ is the hypothesized probability
3. Degrees of freedom: k – 1, where k is the number of outcomes
Finding Critical Value Example
What is the critical χ² value if k = 3 and α = .05?
df = k – 1 = 2

χ² Table (Portion)
        Upper Tail Area
df      .995    …    .95      …    .05
1       ...     …    0.004    …    3.841
2       0.010   …    0.103    …    5.991

Critical value: χ² = 5.991. Reject H0 if χ² > 5.991 (upper-tail area α = .05); otherwise do not reject H0.
If xi = ei, χ² = 0.
χ² Test for k Proportions Example
As personnel director, you want to test the perception of fairness of three methods of performance evaluation. Of 180 employees, 63 rated Method 1 as fair, 45 rated Method 2 as fair, 72 rated Method 3 as fair. At the .05 level of significance, is there a difference in perceptions?
χ² Test for k Proportions Solution
• H0: p1 = p2 = p3 = 1/3
• Ha: At least 1 is different
• α = .05
• n1 = 63, n2 = 45, n3 = 72
• Critical value(s): χ² = 5.991 (df = 2); reject H0 if χ² > 5.991
x1 = 63, x2 = 45, x3 = 72
$$e_1 = e_2 = e_3 = \frac{180}{3} = 60$$
Test statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - e_i)^2}{e_i} = \frac{(63-60)^2}{60} + \frac{(45-60)^2}{60} + \frac{(72-60)^2}{60} = 6.3$$
Decision: Reject H0 at α = .05
Conclusion: There is evidence of a difference in proportions
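The statistic for this example is easy to verify by script. This is only a sketch with Python's standard library; `chi_square_statistic` is an illustrative name. Instead of a table lookup it relies on a closed form that holds only for df = 2, where the upper-tail area of the χ² distribution is exp(–x/2). Other degrees of freedom would need a table or a statistics library.

```python
from math import exp, log


def chi_square_statistic(observed, expected):
    """Sum of (x_i - e_i)^2 / e_i over all categories (hypothetical helper)."""
    return sum((x - e) ** 2 / e for x, e in zip(observed, expected))


observed = [63, 45, 72]
n = sum(observed)
# Under H0 each method has hypothesized probability 1/3, so e_i = n / 3 = 60.
expected = [n / 3] * len(observed)

chi2 = chi_square_statistic(observed, expected)

# Valid for df = 2 only: P(chi-square > x) = exp(-x / 2).
alpha = 0.05
critical = -2 * log(alpha)   # solves exp(-x / 2) = alpha
p_value = exp(-chi2 / 2)

print(f"chi2 = {chi2:.1f}, critical = {critical:.3f}, p-value = {p_value:.3f}")
```

The critical value from the closed form matches the 5.991 in the table, and the statistic of 6.3 exceeds it.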
Road Map
[Diagram: Decision Making → One/Two Samples; Analysis of Variance; χ² Tests → One-Way Table, Two-Way Table]
χ² Test of Independence
• Shows if a relationship exists between two qualitative (categorical) variables
  – One sample is drawn
  – Does not show causality
• Uses two-way contingency table
Assumptions:
1. Multinomial experiment has been conducted
2. The sample size, n, is large: eij is greater than or equal to 5 for every cell
Two-Way Contingency Table
Shows the number of observations from 1 sample classified jointly by 2 qualitative variables. Rows are the levels of variable 1; columns are the levels of variable 2.

                 House Location
House Style      Urban   Rural   Total
Split-Level         63      49     112
Ranch               15      33      48
Total               78      82     160
χ² Test of Independence
1. Hypotheses
  • H0: Variables are independent
  • Ha: Variables are related (dependent)
2. Test statistic
$$\chi^2 = \sum_{\text{all cells}} \frac{(x_{ij} - e_{ij})^2}{e_{ij}}$$
where $x_{ij}$ is the observed frequency and $e_{ij}$ is the expected frequency
3. Degrees of freedom: (r – 1)(c – 1), where r is the number of rows and c is the number of columns
χ² Test of Independence Expected Frequencies
1. Statistical independence means joint probability equals product of marginal probabilities
2. Compute marginal probabilities and multiply for joint probability
3. Expected frequency is sample size times joint probability
Expected Frequency Example

                 Location
House Style      Urban (Obs.)   Rural (Obs.)   Total
Split-Level           63             49          112
Ranch                 15             33           48
Total                 78             82          160

Marginal probability (Split-Level) = 112/160
Marginal probability (Urban) = 78/160
Joint probability = (112/160) × (78/160)
Expected freq. = 160 × (112/160) × (78/160) = 54.6
Expected Frequency Calculation
$$e_{ij} = \frac{r_i\, c_j}{n}$$
ri: total frequency in the i-th row
cj: total frequency in the j-th column

                 House Location
                 Urban                          Rural
House Style      Obs.   Exp.                    Obs.   Exp.                    Total
Split-Level       63    112×78/160 = 54.6        49    112×82/160 = 57.4        112
Ranch             15    48×78/160 = 23.4         33    48×82/160 = 24.6          48
Total             78    78                       82    82                       160
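The rule that expected frequency equals row total times column total divided by n can be sketched for a table of any size. The function name `expected_frequencies` is an illustrative assumption. The table is passed as a list of rows without the totals, which the function recomputes.

```python
def expected_frequencies(observed):
    """Expected cell counts e_ij = r_i * c_j / n under independence.

    Hypothetical helper: `observed` is a list of rows of counts, totals excluded.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Split-Level, Ranch. Columns: Urban, Rural.
observed = [[63, 49], [15, 33]]
expected = expected_frequencies(observed)

for row in expected:
    print([round(e, 1) for e in row])
```

Each row and column of the expected table keeps the same total as the observed table, which is a quick sanity check on the arithmetic.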
Example
As a realtor you want to determine if house style and house location are related. At the .05 level of significance, is there evidence of a relationship?

                 House Location
House Style      Urban   Rural   Total
Split-Level         63      49     112
Ranch               15      33      48
Total               78      82     160
Example Solution

                 House Location
                 Urban            Rural
House Style      Obs.   Exp.      Obs.   Exp.     Total
Split-Level       63    54.6       49    57.4      112
Ranch             15    23.4       33    24.6       48
Total             78    78         82    82        160

Expected frequencies: 112×78/160 = 54.6, 112×82/160 = 57.4, 48×78/160 = 23.4, 48×82/160 = 24.6
eij ≥ 5 in all cells
Example Solution
• H0: No relationship
• Ha: Relationship
• α = .05
• df = (2 – 1)(2 – 1) = 1
• Critical value(s): χ² = 3.841; reject H0 if χ² > 3.841
Test statistic:
$$\chi^2 = \sum_{\text{all cells}} \frac{(x_{ij} - e_{ij})^2}{e_{ij}} = \frac{(63-54.6)^2}{54.6} + \frac{(49-57.4)^2}{57.4} + \frac{(15-23.4)^2}{23.4} + \frac{(33-24.6)^2}{24.6} = 8.41$$
Decision: Reject H0 at α = .05
Conclusion: There is evidence of a relationship
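The whole independence test for the house data can be sketched end to end. The function name `chi_square_independence` is illustrative. The p-value line relies on a closed form that holds only for df = 1, where the upper-tail area equals erfc(√(x/2)) because a χ² variable with 1 degree of freedom is a squared standard normal. The critical value 3.841 is taken from the table rather than computed.

```python
from math import erfc, sqrt


def chi_square_independence(observed):
    """Chi-square statistic and degrees of freedom for a two-way table.

    Hypothetical helper: `observed` is a list of rows of counts, totals excluded.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, x in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count e_ij
            chi2 += (x - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows: Split-Level, Ranch. Columns: Urban, Rural.
chi2, df = chi_square_independence([[63, 49], [15, 33]])

# Valid for df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))
reject_h0 = chi2 > 3.841   # table critical value for df = 1, alpha = .05

print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

For a 2×2 table this statistic is the square of the pooled two-proportion Z statistic, which is why 8.41 is close to 2.90².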
Exercise 1
You're a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level of significance, is there evidence of a relationship?

               Diet Pepsi
Diet Coke      No     Yes    Total
No             84      32      116
Yes            48     122      170
Total         132     154      286
Exercise 1 Solution

               Diet Pepsi
               No               Yes
Diet Coke      Obs.   Exp.      Obs.   Exp.     Total
No              84    53.5       32    62.5      116
Yes             48    78.5      122    91.5      170
Total          132    132       154    154       286

Expected frequencies: 116×132/286 = 53.5, 154×116/286 = 62.5, 170×132/286 = 78.5, 170×154/286 = 91.5
eij ≥ 5 in all cells
Exercise 1 Solution
• H0: No relationship
• Ha: Relationship
• α = .05
• df = (2 – 1)(2 – 1) = 1
• Critical value(s): χ² = 3.841; reject H0 if χ² > 3.841
Test statistic:
$$\chi^2 = \sum_{\text{all cells}} \frac{(x_{ij} - e_{ij})^2}{e_{ij}} = \frac{(84-53.5)^2}{53.5} + \frac{(32-62.5)^2}{62.5} + \frac{(48-78.5)^2}{78.5} + \frac{(122-91.5)^2}{91.5} = 54.29$$
Decision: Reject H0 at α = .05
Conclusion: There is evidence of a relationship
Exercise 2
There is a statistically significant relationship between purchasing Diet Coke and Diet Pepsi. So what do you think the relationship is? Aren't they competitors?

               Diet Pepsi
Diet Coke      No     Yes    Total
No             84      32      116
Yes            48     122      170
Total         132     154      286
You Re-Analyze the Data

Low Income
               Diet Pepsi
Diet Coke      No     Yes    Total
No              4      30       34
Yes            40       2       42
Total          44      32       76

High Income
               Diet Pepsi
Diet Coke      No     Yes    Total
No             80       2       82
Yes             8     120      128
Total          88     122      210
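One way to see what the re-analysis by income reveals is to compare, within each table, how often Diet Coke buyers and non-buyers purchase Diet Pepsi. The sketch below is an illustration, not part of the slides: the function name `pepsi_rate_difference` is hypothetical, and it assumes the first table (total 76) is the low-income group and the second (total 210) the high-income group, following the order of the labels in the transcript.

```python
def pepsi_rate_difference(table):
    """P(Pepsi yes | Coke yes) - P(Pepsi yes | Coke no) for a 2x2 table.

    Hypothetical helper. Rows: Diet Coke No, Yes. Columns: Diet Pepsi No, Yes.
    """
    (coke_no_pepsi_no, coke_no_pepsi_yes), (coke_yes_pepsi_no, coke_yes_pepsi_yes) = table
    rate_coke_yes = coke_yes_pepsi_yes / (coke_yes_pepsi_no + coke_yes_pepsi_yes)
    rate_coke_no = coke_no_pepsi_yes / (coke_no_pepsi_no + coke_no_pepsi_yes)
    return rate_coke_yes - rate_coke_no


# Assumed mapping of tables to income groups (see the note above).
low_income = [[4, 30], [40, 2]]
high_income = [[80, 2], [8, 120]]

# Adding the two strata cell by cell recovers the original 286-consumer table.
combined = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(low_income, high_income)]

for name, table in [("low", low_income), ("high", high_income), ("combined", combined)]:
    print(f"{name}: {pepsi_rate_difference(table):+.3f}")
```

The sign of the difference is not the same in every table, so the association seen in the combined data depends on how the income groups are mixed.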
True Relationships*
[Diagram: Diet Coke and Diet Pepsi are linked by an apparent relation; a control or intervening variable (true cause) has the underlying causal relation with each of them]
Moral of the Story*
Numbers don't think - People do!