TOPIC 10 Discrete (Categorical) Data Analysis. Discrete Random Variables Recall that discrete random...


TOPIC 10

Discrete (Categorical) Data Analysis

Discrete Random Variables

Recall that discrete random variables may take only discrete values.

For example:

• Number of errors in a software product: 0, 1, 2, 3, 4, …

• Categories of a product’s quality level: high, medium, or low

• Characteristics of a machine breakdown: mechanical failure, electrical failure, or operator misuse

Sample Proportions

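Recall that the success probability p can be estimated by the sample proportion

$\hat{p} = X / n$

where X is the number of successes in n trials.

For large enough values of n, the sample proportion can be taken to have approximately the normal distribution

$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$

This expression may be written in terms of a standard normal distribution as

$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \sim N(0, 1)$

where $\sqrt{p(1-p)/n}$ is the standard error of p̂.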

Confidence Interval Estimation for p

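Assumptions: a random sample, with n large enough for the normal approximation to apply.

Since p is unknown, we replace p in the standard error with its estimate p̂. The (1 – α)100% confidence interval for p is

$\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$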

Example

You’re a production manager for a newspaper. You want to find the percentage defective. Of 200 newspapers, 35 had defects. What is the 90% confidence interval estimate of the population proportion defective?

Example Solution

$\hat{p} = 35/200 = 0.175$

$\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.175 \pm 1.645 \sqrt{\frac{0.175 \times 0.825}{200}}$

$0.1308 \le p \le 0.2192$
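The newspaper interval can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `proportion_ci` is an illustrative choice, not something from the slides, and the critical value is computed exactly rather than read from a table as 1.645.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value Z_{alpha/2}, about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Newspaper example: 35 defective out of 200, 90% confidence.
lower, upper = proportion_ci(35, 200, 0.90)
print(f"{lower:.4f} <= p <= {upper:.4f}")
```

To four decimals this should agree with the limits 0.1308 and 0.2192 in the worked solution.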

Sample Size for Estimating p

I don’t want to sample too much or too little!

$n = \frac{Z_{\alpha/2}^2 \, p(1-p)}{SE^2}$

SE = Sampling Error

• If no estimate of p is available, use p = 1 – p = 0.5

Example

What sample size is needed to estimate p with 90% confidence and a width L of .03?

$SE = L/2 = 0.03/2 = 0.015$

$n = \frac{1.645^2 \times 0.5 \times 0.5}{0.015^2} = 3006.69 \approx 3007$
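The sample-size calculation can be sketched in a few lines of Python. The helper name `sample_size_for_proportion` is hypothetical, and the code uses the exact normal quantile instead of the tabulated 1.645, so the unrounded value differs slightly from 3006.69 even though the rounded-up answer is the same.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(confidence, sampling_error, p=0.5):
    """Smallest n whose margin of error is at most `sampling_error`.

    p defaults to 0.5, the conservative choice when no estimate is available.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = z ** 2 * p * (1 - p) / sampling_error ** 2
    # Always round up: rounding down would miss the target precision.
    return ceil(n_exact)


# 90% confidence, interval width L = 0.03, so SE = L / 2 = 0.015.
n = sample_size_for_proportion(0.90, 0.03 / 2)
print(n)
```

Rounding up rather than to the nearest integer is the usual convention, since a smaller sample would give a wider interval than requested.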

Exercises

• Suppose that the auditing procedures require you to have 95% confidence in estimating the population proportion of sales invoices with errors within ± 0.07. The results from the past months indicate that the largest proportion has been no more than 0.15. Find the sample size needed to satisfy the requirements of the company.

• In an election poll, a random sample of 500 people showed that 42 preferred voting for a particular candidate. Set up a 90% confidence interval estimate for the population proportion p of voters who prefer that candidate.

Z Test of Hypothesis for the Proportion

• One-sample Z test for the proportion

$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$, where $\hat{p} = X/n$

X: Number of items having the characteristic of interest

n: Sample size

p̂: Sample proportion of successes

p: Hypothesized proportion of successes in the population

Example

You’re an accounting manager. A year-end audit showed 4% of transactions had errors. You implement new procedures. A random sample of 500 transactions had 25 errors. Has the proportion of incorrect transactions changed at the .05 level of significance?

• H0: p = .04

• Ha: p ≠ .04

• α = .05, α/2 = 0.025

• n = 500

• Critical Value(s): Z = ±1.96; reject H0 if Z < -1.96 or Z > 1.96

Test Statistic:

$Z = \frac{25/500 - 0.04}{\sqrt{0.04 \times 0.96 / 500}} \approx 1.14$

Decision: Do not reject H0 at α = .05

Conclusion: There is no evidence that the proportion has changed from 4%
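The audit example can be reproduced with a short script. This is a sketch under the same large-sample assumptions as the slides; the function name `one_sample_proportion_z` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(successes, n, p0):
    """Z statistic for H0: p = p0, using the standard error under H0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


alpha = 0.05
z = one_sample_proportion_z(25, 500, 0.04)
# Two-sided test: compare |Z| with the upper alpha/2 quantile (about 1.96).
critical = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z) > critical

print(f"Z = {z:.2f}, critical value = {critical:.2f}, reject H0: {reject}")
```

Because |Z| falls inside the interval from -1.96 to 1.96, the script reaches the same decision as the worked example: H0 is not rejected.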

Exercise

• A fast-food chain has developed a new process to ensure that orders at the drive-through are filled correctly. The previous process filled orders correctly 85% of the time. Based on a sample of 100 orders using the new process, 94 were filled correctly. At a 0.01 level of significance, can you conclude that the new process has increased the proportion of orders filled correctly?

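Large-Sample Inference about p1 – p2

Assumptions:

• Independent, random samples

• Normal approximation can be used if

$n_1\hat{p}_1 \ge 15,\; n_1(1-\hat{p}_1) \ge 15,\; n_2\hat{p}_2 \ge 15,\; n_2(1-\hat{p}_2) \ge 15$

• (1 – α)100% Confidence Interval for (p1 – p2)

$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$

• where $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$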

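Example

As personnel director, you want to test the perception of fairness of two methods of performance evaluation. 63 of 78 employees rated Method 1 as fair. 49 of 82 rated Method 2 as fair. Find a 99% confidence interval for the difference in perceptions.

$\hat{p}_1 = 63/78 = 0.808, \quad \hat{q}_1 = 1 - \hat{p}_1 = 1 - 0.808 = 0.192$

$\hat{p}_2 = 49/82 = 0.598, \quad \hat{q}_2 = 1 - \hat{p}_2 = 1 - 0.598 = 0.402$

$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} = (0.808 - 0.598) \pm 2.58 \sqrt{\frac{0.808 \times 0.192}{78} + \frac{0.598 \times 0.402}{82}}$

$0.029 \le p_1 - p_2 \le 0.391$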
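The two-sample interval for the fairness ratings can be verified with the sketch below. The function name `two_proportion_ci` is hypothetical, and the unpooled standard error is used, matching the confidence interval formula for p1 – p2.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Method 1: 63 of 78 rated it fair; Method 2: 49 of 82; 99% confidence.
lower, upper = two_proportion_ci(63, 78, 49, 82, 0.99)
print(f"{lower:.3f} <= p1 - p2 <= {upper:.3f}")
```

The interval lies entirely above zero, which is consistent with the two-sided test of the same data rejecting H0 at the .01 level.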

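Hypothesis Testing for Two Proportions

Hypothesis   No Difference /     Pop 1 ≥ Pop 2 /     Pop 1 ≤ Pop 2 /
             Any Difference      Pop 1 < Pop 2       Pop 1 > Pop 2
H0           p1 – p2 = 0         p1 – p2 ≥ 0         p1 – p2 ≤ 0
Ha           p1 – p2 ≠ 0         p1 – p2 < 0         p1 – p2 > 0

Z – Test Statistic:

$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

where $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$ and $(p_1 - p_2)$ is the hypothesized difference.

The rejection region is determined in the same way as in the one-sample tests.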

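Example

As personnel director, you want to test the perception of fairness of two methods of performance evaluation. 63 of 78 employees rated Method 1 as fair. 49 of 82 rated Method 2 as fair. At the .01 level of significance, is there a difference in perceptions?

• H0: p1 - p2 = 0

• Ha: p1 - p2 ≠ 0

• α = .01

• n1 = 78, n2 = 82

• Critical Value(s): Z = ±2.58 (area .005 in each tail, Z.005 = 2.58); reject H0 if Z < -2.58 or Z > 2.58

$\hat{p}_1 = \frac{X_1}{n_1} = \frac{63}{78} = .808, \quad \hat{p}_2 = \frac{X_2}{n_2} = \frac{49}{82} = .598$

$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{63 + 49}{78 + 82} = .70$

$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.808 - .598) - 0}{\sqrt{.70(1 - .70)\left(\frac{1}{78} + \frac{1}{82}\right)}} = +2.90$

Test Statistic: Z = +2.90

Decision: Reject H0 at α = .01

Conclusion: There is evidence of a difference in proportions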
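The pooled two-proportion test can be sketched as follows. The function name `two_proportion_z` is an illustrative choice; the pooled estimate (X1 + X2)/(n1 + n2) is used in the standard error because H0 states that the two proportions are equal.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, hypothesized_diff=0.0):
    """Z statistic for H0: p1 - p2 = hypothesized_diff, with a pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return ((p1 - p2) - hypothesized_diff) / se


alpha = 0.01
z = two_proportion_z(63, 78, 49, 82)
# Two-sided critical value, about 2.58 at the .01 level.
critical = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z) > critical

print(f"Z = {z:.2f}, critical value = {critical:.2f}, reject H0: {reject}")
```

The statistic comes out at about 2.90, beyond the critical value, so the script agrees with the decision to reject H0.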

Chi-Square Tests for k Proportions

• This topic extends hypothesis testing to analyze differences between population proportions based on two or more samples.

• Qualitative data that fall in more than two categories often result from a multinomial experiment.

• Some of the characteristics of the multinomial experiment are:

  - The probabilities of the k outcomes, denoted p1, p2, … , pk, remain the same from trial to trial, where p1 + p2 + … + pk = 1

  - The trials are independent

• Recall that a binomial experiment is a multinomial experiment with k = 2

Chi-Square (χ2) Tests

[Diagram labels: Populations (p1 = p2 = p3 = p4 = … = pk); Draw Sample; Observed and expected frequencies; χ2 Test for equality of proportions; Evidence to accept/reject]

Road Map

[Diagram labels: Decision Making; One/Two Samples; Analysis of Variance; χ2 Tests; One-Way Table; Two-Way Table]

Multinomial Experiment

• n identical and independent trials

• k outcomes to each trial

• Constant outcome probability, pk

• Random variable is count, nk

• Example: ask 100 people (n) which of 3 candidates (k) they will vote for

• Uses one-way contingency table: Shows number of observations in k independent groups (outcomes or variable levels)

One-Way Contingency Table

Outcomes (k = 3)

                       Candidate
                       Tom    Bill   Mary   Total
Number of responses    35     20     45     100

χ2 Test Basic Idea

1. Compares observed frequency (xi) to expected frequency (ei), assuming the null hypothesis is true

2. The closer the observed frequency is to the expected frequency, the more likely it is that H0 is true

  • Measured by the squared difference relative to the expected frequency

  • Reject H0 for large values

χ2 Test for k Proportions

Assumptions:

1. A multinomial experiment has been conducted

2. The sample size n is large: ei is greater than or equal to 5 for every cell (i = 1, 2, 3, …, k)

1. Hypotheses

  • H0: p1 = p1,0, p2 = p2,0, ..., pk = pk,0

  • Ha: At least one pi is different from above

2. Test Statistic

$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - e_i)^2}{e_i}$

where xi is the observed frequency, ei = npi,0 is the expected frequency, and pi,0 is the hypothesized probability

3. Degrees of Freedom: k – 1, where k is the number of outcomes

Finding Critical Value Example

What is the critical χ2 value if k = 3 and α = .05?

df = k - 1 = 2

χ2 Table (Portion)

        Upper Tail Area
df    .995     …    .95      …    .05
1     ...      …    0.004    …    3.841
2     0.010    …    0.103    …    5.991

The critical value is χ2 = 5.991. Do not reject H0 if χ2 ≤ 5.991; reject H0 if χ2 > 5.991.

If xi = ei, χ2 = 0.
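The table entries for df = 1 and df = 2 can be reproduced without a statistics package, because both cases have closed forms. The sketch below relies on two standard facts: for df = 2 the upper-tail area beyond x is exp(-x/2), and a χ2 variable with df = 1 is the square of a standard normal. The function names are illustrative; for other degrees of freedom a library routine (for example `scipy.stats.chi2.ppf`) would be the usual tool.

```python
from math import log
from statistics import NormalDist


def chi2_critical_df2(upper_tail_area):
    """Critical value for df = 2: solve exp(-x / 2) = upper_tail_area for x."""
    return -2 * log(upper_tail_area)


def chi2_critical_df1(upper_tail_area):
    """Critical value for df = 1: square of the two-sided normal quantile."""
    z = NormalDist().inv_cdf(1 - upper_tail_area / 2)
    return z ** 2


for area in (0.995, 0.95, 0.05):
    print(f"area {area}: df=1 -> {chi2_critical_df1(area):.3f}, "
          f"df=2 -> {chi2_critical_df2(area):.3f}")
```

For k = 3 and α = .05 the df = 2 function gives the critical value 5.991 used in the example.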

χ2 Test for k Proportions Example

As personnel director, you want to test the perception of fairness of three methods of performance evaluation. Of 180 employees, 63 rated Method 1 as fair, 45 rated Method 2 as fair, 72 rated Method 3 as fair. At the .05 level of significance, is there a difference in perceptions?

χ2 Test for k Proportions Solution

• H0: p1 = p2 = p3 = 1/3

• Ha: At least 1 is different

• α = .05

• n1 = 63, n2 = 45, n3 = 72

• Critical Value(s): χ2 = 5.991 (df = 2); reject H0 if χ2 > 5.991

x1 = 63, x2 = 45, x3 = 72

$e_1 = e_2 = e_3 = 180 \times \frac{1}{3} = 60$

$\chi^2 = \frac{(63 - 60)^2}{60} + \frac{(45 - 60)^2}{60} + \frac{(72 - 60)^2}{60} = 6.3$

Test Statistic: χ2 = 6.3

Decision: Reject H0 at α = .05

Conclusion: There is evidence of a difference in proportions
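The three-method example can be worked through in code. This is a minimal sketch: the function name `chi_square_k_proportions` is hypothetical, and the critical value uses the df = 2 closed form -2 ln(α), which applies only because k = 3 here.

```python
from math import log


def chi_square_k_proportions(observed, hypothesized_probs):
    """Chi-square statistic: sum of (x_i - e_i)^2 / e_i with e_i = n * p_i0."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_probs]
    return sum((x - e) ** 2 / e for x, e in zip(observed, expected))


observed = [63, 45, 72]          # counts rating Methods 1, 2, 3 as fair
chi2 = chi_square_k_proportions(observed, [1 / 3, 1 / 3, 1 / 3])

alpha = 0.05
critical = -2 * log(alpha)       # df = k - 1 = 2 only; about 5.991
reject = chi2 > critical

print(f"chi-square = {chi2:.2f}, critical value = {critical:.3f}, reject H0: {reject}")
```

The statistic of 6.3 exceeds 5.991, matching the decision to reject H0 at the .05 level.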

Road Map

[Diagram labels: Decision Making; One/Two Samples; Analysis of Variance; χ2 Tests; One-Way Table; Two-Way Table]

χ2 Test of Independence

• Shows if a relationship exists between two qualitative (categorical) variables

  - One sample is drawn

  - Does not show causality

• Uses two-way contingency table

Assumptions:

1. Multinomial experiment has been conducted

2. The sample size, n, is large: eij is greater than or equal to 5 for every cell

Two-Way Contingency Table

Shows number of observations from 1 sample jointly in 2 qualitative variables

Rows: levels of variable 1 (House Style). Columns: levels of variable 2 (House Location).

               House Location
House Style    Urban    Rural    Total
Split-Level    63       49       112
Ranch          15       33       48
Total          78       82       160

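χ2 Test of Independence

1. Hypotheses

  • H0: Variables are independent

  • Ha: Variables are related (dependent)

2. Test Statistic

$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}$

where nij is the observed frequency and eij is the expected frequency in row i, column j

3. Degrees of Freedom: (r – 1)(c – 1), where r is the number of rows and c is the number of columns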

χ2 Test of Independence Expected Frequencies

1. Statistical independence means joint probability equals product of marginal probabilities

2. Compute marginal probabilities and multiply for joint probability

3. Expected frequency is sample size times joint probability

               Location
               Urban    Rural
House Style    Obs.     Obs.     Total
Split–Level    63       49       112
Ranch          15       33       48
Total          78       82       160

Marginal probability (Split-Level) = 112/160

Marginal probability (Urban) = 78/160

Joint probability = (112/160) × (78/160)

Expected freq. = 160 × (112/160) × (78/160) = 54.6

Expected Frequency Example

               House Location
               Urban             Rural
House Style    Obs.    Exp.      Obs.    Exp.     Total
Split-Level    63      54.6      49      57.4     112
Ranch          15      23.4      33      24.6     48
Total          78      78        82      82       160

Split-Level, Urban: 112×78/160 = 54.6. Split-Level, Rural: 112×82/160 = 57.4. Ranch, Urban: 48×78/160 = 23.4. Ranch, Rural: 48×82/160 = 24.6.

Expected Frequency Calculation

$e_{ij} = \frac{r_i \, c_j}{n}$

ri: Total frequency in the i-th row

cj: Total frequency in the j-th column
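The row-total times column-total rule is easy to apply to a whole table at once. The sketch below computes every expected frequency for the house style by location table; the function name `expected_frequencies` and the nested-list layout (one inner list per row) are illustrative choices.

```python
def expected_frequencies(table):
    """Expected counts under independence: e_ij = r_i * c_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Split-Level, Ranch. Columns: Urban, Rural.
observed = [[63, 49],
            [15, 33]]
expected = expected_frequencies(observed)

for row in expected:
    print([round(e, 1) for e in row])
```

Each row and column of the expected table keeps the same total as the observed table, which is a quick sanity check on the calculation.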

Example

As a realtor you want to determine if house style and house location are related. At the .05 level of significance, is there evidence of a relationship?

               House Location
House Style    Urban    Rural    Total
Split-Level    63       49       112
Ranch          15       33       48
Total          78       82       160

               House Location
               Urban             Rural
House Style    Obs.    Exp.      Obs.    Exp.     Total
Split-Level    63      54.6      49      57.4     112
Ranch          15      23.4      33      24.6     48
Total          78      78        82      82       160

Expected frequencies: 112×78/160 = 54.6, 112×82/160 = 57.4, 48×78/160 = 23.4, 48×82/160 = 24.6

eij ≥ 5 in all cells

• H0: No Relationship

• Ha: Relationship

• α = .05

• df = (2 – 1)(2 – 1) = 1

• Critical Value(s): χ2 = 3.841; reject H0 if χ2 > 3.841

Test Statistic:

$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} = \frac{(63 - 54.6)^2}{54.6} + \frac{(49 - 57.4)^2}{57.4} + \frac{(15 - 23.4)^2}{23.4} + \frac{(33 - 24.6)^2}{24.6} = 8.41$

Decision: Reject H0 at α = .05

Conclusion: There is evidence of a relationship
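The full test of independence for the realtor example can be sketched end to end. The function names are hypothetical, and the critical value uses the df = 1 shortcut (the square of the two-sided normal quantile), which is valid only for a 2 × 2 table.

```python
from statistics import NormalDist


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Rows: Split-Level, Ranch. Columns: Urban, Rural.
houses = [[63, 49],
          [15, 33]]
chi2, df = chi_square_independence(houses)

alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # df = 1 only; about 3.841
reject = chi2 > critical

print(f"chi-square = {chi2:.2f}, df = {df}, critical = {critical:.3f}, reject H0: {reject}")
```

The code keeps expected frequencies unrounded, so results can differ in the second decimal from hand calculations that round each expected count first.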

Exercise 1

You’re a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level of significance, is there evidence of a relationship?

             Diet Pepsi
Diet Coke    No     Yes    Total
No           84     32     116
Yes          48     122    170
Total        132    154    286

Exercise 1 Solution

             Diet Pepsi
             No              Yes
Diet Coke    Obs.   Exp.     Obs.   Exp.    Total
No           84     53.5     32     62.5    116
Yes          48     78.5     122    91.5    170
Total        132    132      154    154     286

Expected frequencies: 116×132/286 = 53.5, 154×116/286 = 62.5, 170×132/286 = 78.5, 170×154/286 = 91.5

eij ≥ 5 in all cells

• H0: No Relationship

• Ha: Relationship

• α = .05

• df = (2 – 1)(2 – 1) = 1

• Critical Value(s): χ2 = 3.841; reject H0 if χ2 > 3.841

Test Statistic:

$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} = \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \frac{(48 - 78.5)^2}{78.5} + \frac{(122 - 91.5)^2}{91.5} = 54.29$

Decision: Reject H0 at α = .05

Conclusion: There is evidence of a relationship

Exercise 2

There is a statistically significant relationship between purchasing Diet Coke and Diet Pepsi. So what do you think the relationship is? Aren’t they competitors?

             Diet Pepsi
Diet Coke    No     Yes    Total
No           84     32     116
Yes          48     122    170
Total        132    154    286

You Re-Analyze the Data

Low Income

             Diet Pepsi
Diet Coke    No     Yes    Total
No           4      30     34
Yes          40     2      42
Total        44     32     76

High Income

             Diet Pepsi
Diet Coke    No     Yes    Total
No           80     2      82
Yes          8      120    128
Total        88     122    210
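One way to see what the re-analysis by income shows is to compare, within each table, the share of Diet Pepsi buyers among Diet Coke non-buyers and buyers. The sketch below does this and confirms that the two income tables add up to the pooled 286-consumer table. It assumes the first table in the transcript is the low-income group and the second the high-income group; the helper names are illustrative.

```python
# Rows: Diet Coke (No, Yes). Columns: Diet Pepsi (No, Yes).
# Assumption: first table = low income, second table = high income.
low_income = [[4, 30], [40, 2]]
high_income = [[80, 2], [8, 120]]


def add_tables(a, b):
    """Cell-by-cell sum of two tables with the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def pepsi_share_by_coke(table):
    """Share of Diet Pepsi buyers among Diet Coke non-buyers, then buyers."""
    return [row[1] / sum(row) for row in table]


pooled = add_tables(low_income, high_income)
print("pooled table:", pooled)

for name, table in [("low income", low_income),
                    ("high income", high_income),
                    ("pooled", pooled)]:
    no_coke, yes_coke = pepsi_share_by_coke(table)
    print(f"{name}: Pepsi share {no_coke:.2f} without Coke, {yes_coke:.2f} with Coke")
```

Under the stated assumption, the direction of the association differs between the two income groups, which is why the pooled table alone can be misleading.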

True Relationships*

[Diagram labels: Diet Coke; Diet Pepsi; Apparent relation; Underlying causal relation; Control or intervening variable (true cause)]

Moral of the Story*

Numbers don’t think - People do!
