Date post: 21-Dec-2015

Copyright (c) Bani K. Mallick 1

STAT 651

Lecture #15

Topics in Lecture #15

Some basic probability

The binomial distribution

Inference about a single population proportion

Book Sections Covered in Lecture #15

Chapters 4.7-4.8

Chapter 10.2

Lecture 14 Review: Nonparametric Methods

Replace each observation by its rank in the pooled data

Do the usual ANOVA F-test

Kruskal-Wallis

Lecture 14 Review: Nonparametric Methods

Once you have decided that the populations are different in their means, there is no version of the LSD (least significant difference) procedure

You simply have to do each comparison in turn

This is a bit of a pain in SPSS, because you physically must do each 2-population comparison, defining the groups as you go

Categorical Data

Not all experiments are based on numerical outcomes

We will deal with categorical outcomes, i.e., outcomes that for each individual are a category

The simplest categorical variable is binary:

Success or failure

Male or female

Categorical Data

For example, consider flipping a fair coin, and let

X = 0 means “tails”

X = 1 means “heads”

Categorical Data

The fraction of the population who are “successes” will be denoted by the Greek symbol $\pi$

Note that because it is a Greek symbol, it represents something to do with a population

For coin flipping, if you flipped all the fair coins in the world (the population), the fraction of the times they turn up heads equals $\pi$ = 1/2

Categorical Data

The fraction of the population who are “successes” will be denoted by the Greek symbol $\pi$

The fraction of the sample of size n who are “successes” is going to be denoted by $\hat{\pi}$

We want to relate $\hat{\pi}$ to $\pi$

Let X = number of successes in the sample. The fraction $\hat{\pi}$ = (# successes)/n = X / n

Categorical Data

Suppose you flip a coin 10 times, and get 6 heads.

The proportion of heads = 0.60

The percentage of heads = 60%

Categorical Data

The number of successes X in n experiments, each with probability of success $\pi$, is called a binomial random variable

There is a formula for this:

$\Pr(X = k) = \Pr(\hat{\pi} = k/n) = \frac{n!}{k!\,(n-k)!}\,\pi^{k}(1-\pi)^{n-k}$

0! = 1, 1! = 1, 2! = 2 x 1 = 2, 3! = 3 x 2 x 1 = 6, 4! = 4 x 3 x 2 x 1 = 24, etc.
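The binomial formula is easy to evaluate by machine. The following is a minimal Python sketch, not part of the lecture; the function name `binomial_pmf` and the argument name `pi` are illustrative choices. It is applied to the chance of exactly 6 heads in 10 flips of a fair coin.

```python
from math import comb


def binomial_pmf(k: int, n: int, pi: float) -> float:
    """Pr(X = k) when X counts successes in n trials with success probability pi."""
    # comb(n, k) equals n! / (k! (n-k)!)
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)


# Chance of exactly 6 heads in 10 flips of a fair coin (pi = 0.5)
p_six_heads = binomial_pmf(6, 10, 0.5)
print(round(p_six_heads, 4))  # 0.2051
```

Summing `binomial_pmf(k, n, pi)` over k = 0, ..., n gives 1, which is a quick sanity check on any such implementation.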

Categorical Data

$\Pr(X = k) = \Pr(\hat{\pi} = k/n) = \frac{n!}{k!\,(n-k)!}\,\pi^{k}(1-\pi)^{n-k}$

0! = 1, 1! = 1, 2! = 2 x 1 = 2, 3! = 3 x 2 x 1 = 6, 4! = 4 x 3 x 2 x 1 = 24, etc.

The idea is to relate the sample fraction to the population fraction using this formula

Key Point: if we knew $\pi$, then we could entirely characterize the fraction of experiments that have k successes

Categorical Data

The probability that the coin lands on heads will be denoted by the Greek symbol $\pi$

Suppose you flip a coin 2 times, and count the number of heads.

So here, X = number of heads that arise when you flip a coin 2 times

X takes on the values 0, 1 and 2

$\hat{\pi}$ takes on the values 0/2, 1/2, 2/2

Categorical Data: What the binomial formula does

The experiment results in 4 equally likely outcomes: each occurs ¼ of the time

                    Tails on toss #1    Heads on toss #1
Tails on toss #2          1/4                 1/4
Heads on toss #2          1/4                 1/4

Categorical Data

Heads = “success”:

                    Tails on toss #1    Heads on toss #1
Tails on toss #2          1/4                 1/4
Heads on toss #2          1/4                 1/4

$\Pr(X = 0) = \Pr(\hat{\pi} = 0/2) = 1/4$

$\Pr(X = 1) = \Pr(\hat{\pi} = 1/2) = 1/2$

$\Pr(X = 2) = \Pr(\hat{\pi} = 2/2) = 1/4$

The binomial formula can be used to give these results without thinking

Categorical Data

$\Pr(X = k) = \Pr(\hat{\pi} = k/n) = \frac{n!}{k!\,(n-k)!}\,\pi^{k}(1-\pi)^{n-k}$

0! = 1, 1! = 1, 2! = 2 x 1 = 2, 3! = 3 x 2 x 1 = 6, 4! = 4 x 3 x 2 x 1 = 24, etc.

n = 2, k = 1, k! = 1, n! = 2, (n-k)! = 1

$\pi^{k} = 0.5$, and $(1-\pi)^{n-k} = 0.5$

The binomial formula gives the answer 2/(1 x 1) x 0.5 x 0.5 = 1/2, which we know to be correct
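The agreement between the 2 x 2 table and the formula can also be checked by brute force. This is a minimal Python sketch under the fair-coin assumption ($\pi$ = 0.5); the helper names `enumerate_heads` and `binomial_pmf` are illustrative and not from the lecture.

```python
from itertools import product
from math import comb


def enumerate_heads(n: int) -> dict:
    """Pr(number of heads = k) found by listing all 2**n equally likely flip sequences."""
    counts = {k: 0 for k in range(n + 1)}
    for flips in product([0, 1], repeat=n):  # 0 = tails, 1 = heads
        counts[sum(flips)] += 1
    return {k: c / 2**n for k, c in counts.items()}


def binomial_pmf(k: int, n: int, pi: float) -> float:
    """Binomial formula: n!/(k!(n-k)!) * pi^k * (1-pi)^(n-k)."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)


by_counting = enumerate_heads(2)
by_formula = {k: binomial_pmf(k, 2, 0.5) for k in range(3)}
print(by_counting)  # {0: 0.25, 1: 0.5, 2: 0.25}
print(by_formula)   # {0: 0.25, 1: 0.5, 2: 0.25}
```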

Categorical Data

Roll a fair die

First die:      1     2     3     4     5     6

Every outcome is equally likely, so what are the probabilities?

Categorical Data

Roll a fair die

First die:      1     2     3     4     5     6
Probability:   1/6   1/6   1/6   1/6   1/6   1/6

Every outcome is equally likely, so what are the probabilities? Each is 1/6

What is the chance of rolling a 1 or a 2? 2/6 = 1/3

Categorical Data

Now roll two fair dice

First die (columns):   1   2   3   4   5   6

Second die (rows):     1   2   3   4   5   6

Every combination is equally likely, so what are the probabilities?

Categorical Data

Roll two fair dice

                          First die
Second die     1      2      3      4      5      6
    1        1/36   1/36   1/36   1/36   1/36   1/36
    2        1/36   1/36   1/36   1/36   1/36   1/36
    3        1/36   1/36   1/36   1/36   1/36   1/36
    4        1/36   1/36   1/36   1/36   1/36   1/36
    5        1/36   1/36   1/36   1/36   1/36   1/36
    6        1/36   1/36   1/36   1/36   1/36   1/36

Every combination is equally likely, so what are the probabilities?

Categorical Data

Roll two fair dice

Each of the 36 combinations of (first die, second die) has probability 1/36

Define a success as rolling a 1 or a 2. What is the chance of two successes? Both dice must show a 1 or a 2, which covers 2 x 2 = 4 of the 36 cells: 4/36 = 1/9

Categorical Data

Roll two fair dice

Each of the 36 combinations of (first die, second die) has probability 1/36

Define a success as rolling a 1 or a 2. What is the chance of two failures? Both dice must show a 3, 4, 5 or 6, which covers 4 x 4 = 16 of the 36 cells: 16/36 = 4/9
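The cell counting for two dice can be reproduced by listing all 36 combinations. This is a minimal Python sketch; the names `is_success`, `counts` and `probs` are illustrative, and exact fractions are used so the results match the slide values.

```python
from fractions import Fraction
from itertools import product


def is_success(face: int) -> bool:
    """A success is rolling a 1 or a 2."""
    return face in (1, 2)


# Tally the number of successes (0, 1 or 2) over all 36 equally likely rolls
counts = {0: 0, 1: 0, 2: 0}
for first, second in product(range(1, 7), repeat=2):
    counts[is_success(first) + is_success(second)] += 1

probs = {k: Fraction(c, 36) for k, c in counts.items()}
print(probs[2])  # 1/9  (two successes)
print(probs[0])  # 4/9  (two failures)
print(probs[1])  # 4/9  (exactly one success)
```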

Categorical Data

So, a success occurs when you roll a 1 or a 2

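Pr(success on a single die) = 2/6 = 1/3 = $\pi$

Pr(2 successes) = 1/3 x 1/3 = 1/9

Use the binomial formula: Pr(X = k) when n = 2 and k = 2

k! = 2, n! = 2, (n-k)! = 0! = 1,

$\pi^{k} = (1/3)^{2} = 1/9$, and $(1-\pi)^{n-k} = (2/3)^{0} = 1$

$\Pr(X = k) = \Pr(\hat{\pi} = k/n) = \frac{n!}{k!\,(n-k)!}\,\pi^{k}(1-\pi)^{n-k} = \frac{2}{2 \times 1} \times \frac{1}{9} \times 1 = 1/9$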

Categorical Data

In other words, the binomial formula works in these simple cases, where we can draw nice tables

Now think of rolling 4 dice, and ask for the chance that 3 of the 4 dice show a 1 or a 2

Too big a table: need a formula
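For the 4-dice question the table would have 6^4 = 1296 cells, but the formula needs only n = 4, k = 3 and $\pi$ = 1/3. This is a minimal Python sketch with exact fractions; the function name `binomial_pmf_exact` is illustrative, not from the lecture.

```python
from fractions import Fraction
from math import comb


def binomial_pmf_exact(k: int, n: int, pi: Fraction) -> Fraction:
    """Binomial formula evaluated with exact rational arithmetic."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)


# 3 successes (a 1 or a 2) among 4 dice, with pi = 1/3 per die
p_three_of_four = binomial_pmf_exact(3, 4, Fraction(1, 3))
print(p_three_of_four)                   # 8/81
print(round(float(p_three_of_four), 4))  # 0.0988
```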

Categorical Data

Does it matter what you call a “success” and what you call a “failure”?

No, as long as you keep track

For example, in a class experiment many years ago, men were asked whether they preferred to wear boxers or briefs

This is binary, because there are only 2 outcomes

“success” = ?????

Categorical Data

Binary experiments have sampling variability, just like sample means, etc.

Experiment: “success” = being under 5’10” in height

First 6 men with SSN < 5

First 6 men with SSN > 5

Note how the number of “successes” was not the same! (I might have to do this a few times)

Categorical Data

The sample fraction is a random variable

This means that if I do the experiment over and over, I will get different values.

These different values have a standard deviation.

Categorical Data

The sample fraction $\hat{\pi}$ has a standard error

Its standard error is $\sigma_{\hat{\pi}} = \sqrt{\pi(1-\pi)/n}$

Note how if you have a bigger sample, the standard error decreases

The standard error is biggest when $\pi$ = 0.50.
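Both remarks about the standard error, that it shrinks as n grows and that it peaks at a population fraction of 0.50, can be seen numerically. This is a minimal Python sketch; the function name `standard_error` and the chosen sample sizes are illustrative.

```python
from math import sqrt


def standard_error(pi: float, n: int) -> float:
    """Standard error of the sample fraction: sqrt(pi * (1 - pi) / n)."""
    return sqrt(pi * (1 - pi) / n)


# A bigger sample gives a smaller standard error (pi fixed at 0.5)
for n in (25, 100, 400):
    print(n, round(standard_error(0.5, n), 4))  # 0.1, then 0.05, then 0.025

# For fixed n, search a grid of pi values for the largest standard error
grid = [i / 100 for i in range(1, 100)]
worst_pi = max(grid, key=lambda p: standard_error(p, 25))
print(worst_pi)  # 0.5
```

Quadrupling n halves the standard error, because n enters through a square root.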

Categorical Data

The sample fraction has a standard error

Its standard error is $\sigma_{\hat{\pi}} = \sqrt{\pi(1-\pi)/n}$

The estimated standard error based on the sample is $\hat{\sigma}_{\hat{\pi}} = \sqrt{\hat{\pi}(1-\hat{\pi})/n}$

Categorical Data

It is possible to make confidence intervals for the population fraction if the number of successes > 5, and the number of failures > 5

If this is not satisfied, consult a statistician

Under these conditions, the Central Limit Theorem says that the sample fraction is approximately normally distributed (in repeated experiments)

Categorical Data

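A $(1-\alpha)100\%$ CI for the population fraction $\pi$ is

$\hat{\pi} \pm z_{\alpha/2}\,\hat{\sigma}_{\hat{\pi}}$, where $\hat{\sigma}_{\hat{\pi}} = \sqrt{\hat{\pi}(1-\hat{\pi})/n}$

$z_{\alpha/2}$ is found by looking up $1-\alpha/2$ in Table 1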

Categorical Data

Often, you will only know the sample proportion/percentage and the sample size

Computing the confidence interval for the population proportion: two ways

By hand

By SPSS (this is a pain if you do not have the data entered already)

Because you may need to do this by hand, I will make you do this.

Categorical Data

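A $(1-\alpha)100\%$ CI for the population fraction: $\hat{\pi} \pm z_{\alpha/2}\,\hat{\sigma}_{\hat{\pi}}$

95% CI, $z_{\alpha/2}$ = 1.96

n = 25, $\hat{\pi}$ = 0.30

$\hat{\sigma}_{\hat{\pi}} = \sqrt{\hat{\pi}(1-\hat{\pi})/n} = \sqrt{0.3(1-0.3)/25} = 0.09165$

$\hat{\pi} \pm z_{\alpha/2}\,\hat{\sigma}_{\hat{\pi}} = 0.30 \pm 1.96 \times 0.09165$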

Categorical Data

A $(1-\alpha)100\%$ CI for the population fraction:

$\hat{\pi} \pm z_{\alpha/2}\,\hat{\sigma}_{\hat{\pi}} = 0.30 \pm 1.96 \times 0.09165 = 0.30 \pm 0.18 = [0.12, 0.48]$

Interpretation? The proportion of successes in the population is from 0.12 to 0.48 (12% to 48%) with 95% confidence
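The hand calculation can be checked in a few lines. This is a minimal Python sketch of the normal-approximation interval described in these slides; the function name `proportion_ci` and its default z = 1.96 (95% confidence) are illustrative choices.

```python
from math import sqrt


def proportion_ci(p_hat: float, n: int, z: float = 1.96) -> tuple:
    """CI for a population fraction: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    se_hat = sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
    margin = z * se_hat
    return p_hat - margin, p_hat + margin


# Worked example from the slides: n = 25, sample fraction 0.30, 95% CI
lower, upper = proportion_ci(0.30, 25)
print(round(lower, 2), round(upper, 2))  # 0.12 0.48
```

With 0.30 x 25 = 7.5 successes, this example only just clears the "more than 5 successes and more than 5 failures" condition stated earlier.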

Categorical Data

You can use SPSS as long as the number of successes and the number of failures both exceed 5

To get the confidence intervals, you first have to define a numeric version of your variable that classifies whether an observation is a success or failure.

You then compute the 1-sample confidence interval from “Descriptives”, then “Explore”: Demo

Categorical Data

If you set up your data in SPSS, the “mean” will be the proportion/fraction/percentage of 1’s

Data = 0 1 1 1 0 0 0 1 0 0

n = 10

Mean = 4/10 = .40

$\hat{\pi}$ = .40
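The same fact holds outside SPSS: the mean of a 0/1 variable is the sample fraction of 1's. This is a minimal Python sketch using the ten data values from this slide; the variable names are illustrative.

```python
# 1 = success, 0 = failure (the ten observations from the slide)
data = [0, 1, 1, 1, 0, 0, 0, 1, 0, 0]

n = len(data)
successes = sum(data)    # counting the 1's
p_hat = successes / n    # the mean of the 0/1 data is the sample fraction

print(n, successes, p_hat)  # 10 4 0.4
```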

Boxers versus briefs for males

Case Processing Summary

                                               Cases
                                   Valid         Missing         Total
                                 N   Percent    N   Percent    N   Percent
Boxers or Briefs Preference     188  100.0%     0     .0%     188  100.0%

In this output, boxers = 1 and briefs = 0

Boxers versus briefs for males: what % prefer boxers? In the sample, 46.81%. In the population???

Descriptives: Boxers or Briefs Preference

                                              Statistic   Std. Error
Mean                                            .4681     3.649E-02
95% Confidence Interval     Lower Bound         .3961
for Mean                    Upper Bound         .5401
5% Trimmed Mean                                 .4645
Median                                          .0000
Variance                                        .250
Std. Deviation                                  .5003
Minimum                                         .00
Maximum                                         1.00
Range                                           1.00
Interquartile Range                             1.0000
Skewness                                        .129        .177
Kurtosis                                       -2.005       .353

In this output, boxers = 1 and briefs = 0. The proportion of 1’s is the mean

Boxers versus briefs for males: what % prefer boxers? Between 39.61% and 54.01%

Descriptives: Gender = Male; Numeric Boxers (0 = Briefs, 1 = Boxers)

                                              Statistic   Std. Error
Mean                                            .4681     3.649E-02
95% Confidence Interval     Lower Bound         .3961
for Mean                    Upper Bound         .5401
5% Trimmed Mean                                 .4645
Median                                          .0000
Variance                                        .250
Std. Deviation                                  .5003
Minimum                                         .00
Maximum                                         1.00
Range                                           1.00
Interquartile Range                             1.0000
Skewness                                        .129        .177
Kurtosis                                       -2.005       .353

Boxers versus briefs

In the sample, 46.81% of the men preferred boxers to briefs; 53.19% preferred briefs.

Between 39.61% and 54.01% of men prefer boxers to briefs (95% CI)

Is there enough evidence to conclude that men generally prefer briefs?

No: since 50% is in the CI! This means that it is possible (95% CI) that 50% prefer boxers and 50% prefer briefs, i.e., $\pi$ = 0.50.
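The "is 50% inside the interval?" check can be scripted from the two numbers on the slides, n = 188 and a sample fraction of 0.4681. This is a minimal Python sketch of the by-hand interval with z = 1.96; the function name `proportion_ci` is illustrative. Its endpoints differ slightly from the SPSS output (.3961, .5401), presumably because SPSS treats the 0/1 variable as an ordinary numeric mean; that explanation is an assumption, not something stated in the lecture.

```python
from math import sqrt


def proportion_ci(p_hat: float, n: int, z: float = 1.96) -> tuple:
    """By-hand CI for a population fraction: p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


lower, upper = proportion_ci(0.4681, 188)
print(round(lower, 3), round(upper, 3))

# If 0.50 lies inside the interval, an even split between boxers and briefs
# cannot be ruled out at this confidence level.
contains_half = lower < 0.50 < upper
print(contains_half)  # True
```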

Sample Size Calculations

The standard error of the sample fraction is $\sigma_{\hat{\pi}} = \sqrt{\pi(1-\pi)/n}$

If you want a $(1-\alpha)100\%$ CI to be $\hat{\pi} \pm E$

you should set $E = z_{\alpha/2}\sqrt{\pi(1-\pi)/n}$

Sample Size Calculations

Starting from $E = z_{\alpha/2}\sqrt{\pi(1-\pi)/n}$

this means that $n = z_{\alpha/2}^{2}\,\pi(1-\pi)/E^{2}$

Sample Size Calculations

$n = z_{\alpha/2}^{2}\,\pi(1-\pi)/E^{2}$

The small problem is that you do not know $\pi$. You have two choices:

Make a guess for $\pi$

Set $\pi$ = 0.50 and calculate (most conservative, since it results in largest sample size)

Most polling operations make the latter choice, since it is most conservative

Sample Size Calculations: Examples

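Set E = 0.04, 95% CI, you guess that $\pi$ = 0.30:

$n = z_{\alpha/2}^{2}\,\pi(1-\pi)/E^{2} = 1.96^{2} \times 0.3(1-0.3)/0.04^{2} = 504.2$, or about 504

You have no good guess, so set $\pi$ = 0.50:

$n = 1.96^{2} \times 0.5(1-0.5)/0.04^{2} = 600.25$, rounded up to 601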
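Both sample size examples can be reproduced with a short function. This is a minimal Python sketch; the function name `sample_size` is illustrative, and rounding up to the next whole person is an assumption of the sketch. Rounding up gives 505 in the first case, whereas the slide reports 504, which is the unrounded 504.2 taken to the nearest integer.

```python
from math import ceil


def sample_size(pi: float, E: float, z: float = 1.96) -> float:
    """Unrounded n from n = z^2 * pi * (1 - pi) / E^2."""
    return z**2 * pi * (1 - pi) / E**2


n_guess = sample_size(0.30, 0.04)         # guess pi = 0.30
n_conservative = sample_size(0.50, 0.04)  # no good guess: pi = 0.50

print(round(n_guess, 2), ceil(n_guess))                # 504.21 505
print(round(n_conservative, 2), ceil(n_conservative))  # 600.25 601
```

The conservative choice costs about 96 extra observations here, which is the price of not having a guess for the population fraction.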

