Basic Business Statistics, 10/efenyolab.org/...Biostatistics.../slides/IBB2015_13.pdf · t-Test...

Date post: 14-Jul-2020
Page 1: Basic Business Statistics, 10/efenyolab.org/...Biostatistics.../slides/IBB2015_13.pdf · t-Test Statistics (σ Unknown) The test statistic is a measure of how close the data is to
Statistical Methods

Descriptive Statistics

Inferential Statistics

Estimation, Hypothesis Testing

Others

Outline of today

Hypothesis testing for one population mean

Hypothesis testing for two samples, comparing the differences between:
1. The means of two related populations: paired t-test
2. The means of two independent populations: two-sample t-test
3. The variances of two populations: F test

Motivating Example: Birth Weight

Average birth weight in the general population is 120 oz. You take a sample of 100 babies born in the hospital you work at (located in a low-SES area) and find a sample mean birth weight of x̄ = 115 oz with a sample standard deviation of s = 24 oz.

You wonder: is the mean birth weight of low-SES babies indeed lower than that in the general population?

Or is the observed difference merely due to chance?

We need an objective method to decide which explanation the data support.

Null Hypothesis

Parameter of interest: the mean birth weight of low-SES babies, denoted by µ.

We propose a value for µ, denoted µ0.

The null hypothesis is H0: µ = µ0, e.g., H0: µ = 120. We begin with the assumption that the null hypothesis is true, e.g., H0: the mean birth weight of low-SES babies is equal to that in the general population. This is similar to the notion of "innocent until proven guilty".

Purpose: to determine whether the data lead us to reject the null hypothesis.

Alternative Hypothesis

1. Is set up to represent the research goal
2. Is the opposite of the null hypothesis, e.g., Ha: the mean birth weight of low-SES babies is lower than that in the general population
3. Ha: µ < 120
4. Always has an inequality sign: ≠, <, or >. A ≠ sign leads to a two-sided test; < or > leads to a one-sided test

EX Cardiovascular Disease

Suppose we want to compare fasting serum-cholesterol levels among recent Asian immigrants to the United States with typical levels found in the general U.S. population. Assume cholesterol levels in women aged 21-40 in the United States are approximately normally distributed with mean 190 mg/dL. It is unknown whether cholesterol levels among recent Asian immigrants are higher or lower than those in the general U.S. population.

• Parameter of interest: µ = mean cholesterol level among recent Asian immigrants

• H0: µ = 190 vs Ha: µ ≠ 190

Hypothesis testing

Formally, a statistical hypothesis testing problem includes two hypotheses: the null hypothesis (H0) and the alternative hypothesis (Ha, also written H1).

Hypotheses must be stated before the analysis. A hypothesis is a belief about a population parameter.

The parameter is a population mean, proportion, or variance, NOT a sample statistic. We will never have a hypothesis statement with either x̄ or p̂ in it.

Possible Outcomes in Hypothesis Testing

Truth: the real situation (unknown in practice) is either "null hypothesis true" or "research hypothesis true".

Decision: study inconclusive (H0 is not rejected)
  H0 is true and H0 is not rejected: correct decision
  H1 is true and H0 is not rejected: Type II error (probability β)

Decision: research hypothesis supported (H0 is rejected)
  H0 is true and H0 is rejected: Type I error (probability α)
  H1 is true and H0 is rejected: correct decision (probability 1 − β = power)

Errors in Making Decisions

1. Type I error: reject the null hypothesis H0 when H0 is true. The Type I error probability α = Pr(reject H0 | H0 is true) is called the level of significance. A Type I error has serious consequences.

2. Type II error: do not reject H0 when H0 is false (H1 is true). β = Pr(do not reject H0 | H1 is true).

3. The power of a statistical test is 1 − β = Pr(reject H0 | H1 is true).
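The probabilities α and 1 − β can be read as long-run rejection rates. The sketch below is a Monte Carlo illustration only, assuming normally distributed birth weights with σ = 24 oz, samples of n = 100, the lower-tailed test of H0: µ = 120 against Ha: µ < 120, and the critical value −1.66 (t99,.05, used in the birth weight solution later in these slides). The function name `rejection_rate`, the seed, and the number of replications are illustrative choices, not part of the slides.

```python
import math
import random

# Hypothetical settings for illustration: normal birth weights with an assumed sigma
MU0 = 120.0      # hypothesized mean under H0 (oz)
SIGMA = 24.0     # assumed population standard deviation (oz)
N = 100          # sample size
T_CRIT = -1.66   # lower-tail critical value t_{99, 0.05}

def rejection_rate(true_mu: float, reps: int = 2000, seed: int = 1) -> float:
    """Fraction of simulated samples in which the lower-tailed t test rejects H0: mu = MU0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, SIGMA) for _ in range(N)]
        mean = sum(sample) / N
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (N - 1))
        t = (mean - MU0) / (s / math.sqrt(N))
        if t < T_CRIT:
            rejections += 1
    return rejections / reps

alpha_hat = rejection_rate(120.0)  # H0 true: estimates the Type I error rate alpha
power_hat = rejection_rate(115.0)  # H1 true (mu = 115): estimates the power 1 - beta
print(f"estimated alpha = {alpha_hat:.3f}, estimated power = {power_hat:.3f}")
```

With H0 true the rejection rate should land near α = 0.05; with µ = 115 it estimates the power, and β is the remaining fraction of non-rejections.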

Ex Birth Weight: Average birth weight in the general population is 120 oz. You take a sample of 100 babies born in the hospital you work at (located in a low-SES area) and find a sample mean birth weight of x̄ = 115 oz with a sample standard deviation of s = 24 oz.

H0: µ = 120 vs Ha: µ < 120. What is the meaning of a Type I error? Deciding that the mean birth weight in the hospital is lower than 120 oz when in fact it is 120 oz; α is the probability of this error.

Type II error? Deciding that the mean birth weight in the hospital is 120 oz when in fact it is lower than 120 oz; β is the probability of this error.

EX Cardiovascular Disease: Assume cholesterol levels in women aged 21-40 in the United States are approximately normally distributed with mean 190 mg/dL. It is unknown whether cholesterol levels among recent Asian immigrants are higher or lower than those in the general U.S. population.

Parameter of interest: µ = mean cholesterol level among recent Asian immigrants

H0: µ = 190 vs Ha: µ ≠ 190. What is the meaning of a Type I error?

Deciding that the cholesterol levels among recent Asian immigrants are different from those in the general U.S. population, while in truth there is no difference; α is the probability of this error.

Type II error? Deciding that the cholesterol levels among recent Asian immigrants are the same as those in the general U.S. population, while in truth they are different; β is the probability of this error.

Type I & II Error Relationship

Type I and Type II errors cannot happen at the same time.

A Type I error can only occur if H0 is true; a small α means H0 is rejected less often.

A Type II error can only occur if H0 is false; a small β means a false H0 goes unrejected less often.

If the Type I error probability (α) increases, then the Type II error probability (β) decreases, all else being equal.

α & β Have an Inverse Relationship

For a fixed sample size, as α decreases β increases, and vice versa: we can't reduce both errors simultaneously, so there is a trade-off!

General strategy: fix α at a specific level (e.g., 0.05) and use the test that minimizes β or, equivalently, maximizes the power.
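To see the trade-off numerically, the sketch below uses the birth weight setting with a normal (z) approximation, treating σ = 24 oz as known and assuming the true mean is µ1 = 115 oz; these values and the helper name `beta_lower_tail` are illustrative assumptions, not part of the slides. For the lower-tailed test, power = Φ(zα + (µ0 − µ1)/(σ/√n)), where zα is the lower α quantile of the standard normal, and β = 1 − power.

```python
from statistics import NormalDist

# Illustrative values: sigma treated as known, true mean assumed to be MU1
MU0, MU1, SIGMA, N = 120.0, 115.0, 24.0, 100
STD_NORMAL = NormalDist()  # standard normal distribution

def beta_lower_tail(alpha: float) -> float:
    """Type II error probability of the lower-tailed z test of H0: mu = MU0 when mu = MU1."""
    z_crit = STD_NORMAL.inv_cdf(alpha)            # reject H0 if z < z_crit (negative)
    shift = (MU0 - MU1) / (SIGMA / N ** 0.5)      # 5 / 2.4, about 2.08
    power = STD_NORMAL.cdf(z_crit + shift)        # P(reject H0 | mu = MU1)
    return 1.0 - power

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f} -> beta = {beta_lower_tail(alpha):.3f}")
```

Raising α from 0.01 to 0.10 shrinks β, which is the inverse relationship on the slide; with α held fixed, only a larger sample size would lower β.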

Hypothesis Testing

Figure: I believe the population mean age is µ0 = 50 (hypothesis). A random sample from the population has mean x̄ = 20. Compare x̄ to the hypothesized value µ0: not close, so reject the hypothesis!

t-Test Statistics (σ Unknown)

The test statistic is a measure of how close the data are to the null hypothesis. The larger the absolute value of the test statistic (in the direction specified by Ha), the further our data are from the null hypothesis and the stronger the evidence in favor of the alternative hypothesis. Fact: if H0 is true, the test statistic

$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

follows a t-distribution with n − 1 degrees of freedom, provided the data come from a normal distribution.
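The formula only needs summary statistics. A minimal sketch, using the birth weight numbers and the cholesterol numbers from the running examples in these slides; the function name `one_sample_t` is an illustrative choice.

```python
import math

def one_sample_t(xbar: float, mu0: float, s: float, n: int) -> tuple[float, int]:
    """Return the one-sample t statistic (xbar - mu0) / (s / sqrt(n)) and its df = n - 1."""
    standard_error = s / math.sqrt(n)
    return (xbar - mu0) / standard_error, n - 1

# Summary statistics from the two running examples in these slides
t_bw, df_bw = one_sample_t(xbar=115, mu0=120, s=24, n=100)          # birth weight
t_chol, df_chol = one_sample_t(xbar=181.52, mu0=190, s=40, n=100)   # cholesterol
print(f"birth weight: t = {t_bw:.2f} on {df_bw} df")      # t = -2.08 on 99 df
print(f"cholesterol:  t = {t_chol:.2f} on {df_chol} df")  # t = -2.12 on 99 df
```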

Basic Idea:

Figure: under H0, the t statistic $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ follows the $t_{n-1}$ distribution, centered at 0; the observed statistic is located on this distribution.

Rejection Region

1. Definition: the range of values of the test statistic t for which H0 is rejected.

2. We need a critical (cut-off) value to decide whether our sample mean is "too extreme" when the null hypothesis is true.

3. The cut-off is determined by the significance level α (alpha), selected by the researcher at the start; typical values are 0.01, 0.05, and 0.10. α = P(rejecting H0 | H0 is true).

Rejection Region (One-Sided Test)

Figure: the $t_{n-1}$ distribution for Ha: µ < µ0. The rejection region is the lower tail of area α to the left of the critical value $t_{n-1,\alpha}$; the nonrejection region to its right has area 1 − α (level of confidence).

Recall Pr(t < $t_{n-1,\alpha}$) = α, where $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
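Critical values such as $t_{n-1,\alpha}$ are quantiles of the t distribution. In practice they come from a table, EXCEL (as in these slides), or a statistics library; the sketch below instead computes them in plain Python so nothing is hidden. It integrates the t density numerically (composite Simpson's rule) and inverts the CDF by bisection. The names `t_pdf`, `t_cdf`, `t_ppf`, the step count, and the search interval are illustrative assumptions: accurate enough for textbook critical values, not a production implementation.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): integrate the density over [0, |x|] with composite Simpson's rule
    (steps must be even) and use the symmetry of T about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_ppf(p: float, df: int) -> float:
    """Quantile q with P(T <= q) = p, found by bisection on t_cdf."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Slide notation t_{n-1, alpha} is the lower-tail quantile, so it is negative for alpha < 0.5
print(round(t_ppf(0.05, 99), 2))    # one-sided cut-off, alpha = 0.05: -1.66
print(round(t_ppf(0.975, 99), 2))   # two-sided upper cut-off, alpha = 0.05: 1.98
print(round(t_ppf(0.975, 9), 2))    # 9 df, two-sided upper cut-off: 2.26
```

For a two-sided test at level α, the two cut-offs are `t_ppf(alpha / 2, df)` and `t_ppf(1 - alpha / 2, df)`, which are negatives of each other by symmetry.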

Rejection Regions (Two-Sided Test)

Figure: the $t_{n-1}$ distribution for Ha: µ ≠ µ0. Two rejection regions, each of area α/2, lie below $t_{n-1,\alpha/2}$ and above $t_{n-1,1-\alpha/2}$; the nonrejection region between them has area 1 − α (level of confidence).

Birth Weight Example

The mean birth weight in the United States is 120 oz. You get a list of birth weights from 100 consecutive, full-term, live-born deliveries from the maternity ward of a hospital in a low-SES area.

The sample mean birth weight is 115 oz and the standard deviation is 24 oz.

Can we actually say the underlying mean birth weight from this hospital is lower than the national average?

One-Sided t Test Solution

H0: µ = 120
Ha: µ < 120
α = 0.05
df = 100 − 1 = 99
Critical value: −1.66 (EXCEL: t99,.05 = −TINV(0.1, 99))

Test statistic:

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{115 - 120}{24/\sqrt{100}} = -2.08$$

Decision: −2.08 < −1.66, so reject H0 at significance level 0.05.

Conclusion: the true mean birth weight is significantly lower in this hospital than in the general population.
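The critical-value steps above fit in a few lines. This sketch takes the tabulated critical value as an input (−1.66, from the slide) rather than computing it; the function name `lower_tail_t_test` is a hypothetical choice.

```python
import math

def lower_tail_t_test(xbar: float, mu0: float, s: float, n: int,
                      t_crit: float) -> tuple[float, bool]:
    """Critical-value method for H0: mu = mu0 vs Ha: mu < mu0.

    t_crit is the lower-tail critical value t_{n-1, alpha} (negative);
    H0 is rejected when the observed t falls below it.
    """
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, t < t_crit

t, reject = lower_tail_t_test(xbar=115, mu0=120, s=24, n=100, t_crit=-1.66)
print(f"t = {t:.2f}, reject H0: {reject}")  # t = -2.08, reject H0: True
```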

EX Cardiovascular Disease (two-sided t-test)

Test the hypothesis that the mean cholesterol level of recent female Asian immigrants is different from the mean in the general U.S. population, 190 mg/dL. Blood tests are performed on 100 female Asian immigrants; the mean level is x̄ = 181.52 mg/dL with standard deviation s = 40 mg/dL.

H0: µ = 190 vs Ha: µ ≠ 190

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{181.52 - 190}{40/\sqrt{100}} = -2.12$$

With α = 0.05, the critical values are t99,.025 = −TINV(0.05, 99) = −1.98 and t99,.975 = 1.98. Thus t = −2.12 is in the rejection region, and H0 is rejected at the 5% level of significance.

Conclusion: the mean cholesterol level of recent Asian immigrants is significantly different from the mean for the general U.S. population.
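For the two-sided version, H0 is rejected when the observed t lies beyond either cut-off, i.e., when |t| exceeds the upper one. A minimal sketch using the slide's critical value 1.98; the function name `two_sided_t_test` is illustrative.

```python
import math

def two_sided_t_test(xbar: float, mu0: float, s: float, n: int,
                     t_upper: float) -> tuple[float, bool]:
    """Critical-value method for H0: mu = mu0 vs Ha: mu != mu0.

    t_upper is t_{n-1, 1-alpha/2}; by symmetry the lower cut-off is -t_upper,
    so H0 is rejected when |t| exceeds t_upper.
    """
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, abs(t) > t_upper

t, reject = two_sided_t_test(xbar=181.52, mu0=190, s=40, n=100, t_upper=1.98)
print(f"t = {t:.2f}, reject H0: {reject}")  # t = -2.12, reject H0: True
```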

Determination of Statistical Significance

Critical-value method: calculate the critical value from α and the t distribution.

p-value method: once we calculate the actual t statistic, we can ask what is the smallest significance level at which we could still reject H0 (the observed level of significance).

p-value

Definition: the probability of obtaining a test statistic as extreme as, or more extreme than, the actual sample value, given that H0 is true.

For H1: µ < µ0, p-value = P(tn-1 ≤ tobs).
For H1: µ > µ0, p-value = P(tn-1 ≥ tobs).
For H1: µ ≠ µ0, p-value = 2P(tn-1 ≤ tobs) if tobs ≤ 0, and 2[1 − P(tn-1 ≤ tobs)] if tobs > 0.

The p-value is used to make the rejection decision: if p-value ≥ α, do not reject H0; if p-value < α, reject H0.
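The three p-value rules translate directly into code once a t CDF is available. The sketch below reuses a plain-Python numerical t CDF (Simpson integration of the density) so that it runs without EXCEL or a statistics library; the names `t_cdf` and `p_value` and the string values for `alternative` are illustrative assumptions.

```python
import math

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for Student's t with df degrees of freedom, via composite
    Simpson integration of the density over [0, |x|] plus symmetry about 0."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def p_value(t_obs: float, df: int, alternative: str) -> float:
    """p-value of a one-sample t test; alternative is 'less', 'greater' or 'two-sided'."""
    if alternative == "less":          # H1: mu < mu0
        return t_cdf(t_obs, df)
    if alternative == "greater":       # H1: mu > mu0
        return 1.0 - t_cdf(t_obs, df)
    if alternative == "two-sided":     # H1: mu != mu0
        if t_obs <= 0:
            return 2.0 * t_cdf(t_obs, df)
        return 2.0 * (1.0 - t_cdf(t_obs, df))
    raise ValueError(f"unknown alternative: {alternative!r}")

print(f"{p_value(-2.08, 99, 'less'):.3f}")       # birth weight: 0.020
print(f"{p_value(-2.12, 99, 'two-sided'):.3f}")  # cholesterol, two-sided
```

Rejecting when `p_value(...) < alpha` gives the same decision as the critical-value method for the same test.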

Birth Weight Example

The mean birth weight in the United States is 120 oz. You get a list of birth weights from 100 consecutive, full-term, live-born deliveries from the maternity ward of a hospital in a low-SES area.

The sample mean birth weight is x̄ = 115 oz and the standard deviation is s = 24 oz.

Find the p-value.

One-sided test:
1. Compute the t value of the sample statistic (observed):

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{115 - 120}{24/\sqrt{100}} = -2.08$$

Figure: t distribution centered at 0, with the observed value −2.08 marked.

One-sided test:
2. Use the alternative hypothesis to find the direction: the p-value is P(t99 ≤ −2.08) = 0.020 (EXCEL function: TDIST(2.08, 99, 1) = 0.02).

Figure: tn-1 distribution with the lower-tail area to the left of −2.08 shaded (p-value = 0.02).

p-value < α, so H0 is rejected and we conclude that the true mean birth weight is significantly lower in this hospital than in the general population.

EX Cardiovascular Disease (two-sided t-test)

Test the hypothesis that the mean cholesterol level of recent female Asian immigrants is different from the mean in the general U.S. population, 190 mg/dL. Blood tests are performed on 100 female Asian immigrants; the mean level is x̄ = 181.52 mg/dL with standard deviation s = 40 mg/dL.

H0: µ = 190 vs Ha: µ ≠ 190

What is the p-value?

Two-sided test:
1. Compute the t value of the sample statistic (observed):

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{181.52 - 190}{40/\sqrt{100}} = -2.12$$

Figure: t distribution centered at 0, with the observed value −2.12 marked.

Two-sided test:
2. The p-value is 2 × P(t99 ≤ −2.12) = TDIST(2.12, 99, 2) = 0.037 < α.

Figure: t distribution with both tails beyond −2.12 and 2.12 shaded, each of area ½ × p-value = 0.0185.

Conclusion: the mean cholesterol level of recent Asian immigrants is significantly different from the mean for the general U.S. population.

Relationship Between Hypothesis Testing and Confidence Intervals (2-sided case)

The p-value tells exactly how significant the results are; the CI gives a range of values that may contain µ.

For H0: µ = µ0 versus H1: µ = µ1 ≠ µ0, H0 is rejected at level α if and only if the 100% × (1 − α) CI does not contain µ0.

Hypothesis Tests vs. Confidence Intervals

There are three ways to test hypotheses (assume α = 0.05):
1. By the critical-value method
2. By computing the p-value
3. By constructing a confidence interval

EX Cardiovascular Disease (two-sided t-test): CI method

Test the hypothesis that the mean cholesterol level of recent female Asian immigrants is different from the mean in the general U.S. population, 190 mg/dL. Blood tests are performed on 100 female Asian immigrants; the mean level is x̄ = 181.52 mg/dL with standard deviation s = 40 mg/dL.

H0: µ = 190 vs Ha: µ ≠ 190

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{181.52 - 190}{40/\sqrt{100}} = -2.12$$

With α = 0.05, the critical values are t99,.025 = −TINV(0.05, 99) = −1.98 and t99,.975 = 1.98. Then the 95% CI for µ is

$$181.52 \pm 1.98 \times 40/\sqrt{100} = [173.6,\ 189.4]$$

Conclusion: since the CI does not include µ = 190, we conclude that the mean cholesterol level of recent Asian immigrants is significantly different from the mean for the general U.S. population.
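The CI method reduces to one arithmetic step plus a membership check. A minimal sketch with the slide's numbers and critical value 1.98; the function name `t_confidence_interval` is illustrative.

```python
import math

def t_confidence_interval(xbar: float, s: float, n: int,
                          t_upper: float) -> tuple[float, float]:
    """Two-sided CI for mu: xbar +/- t_{n-1, 1-alpha/2} * s / sqrt(n)."""
    half_width = t_upper * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

mu0 = 190
lo, hi = t_confidence_interval(xbar=181.52, s=40, n=100, t_upper=1.98)
print(f"95% CI: [{lo:.1f}, {hi:.1f}]")                             # 95% CI: [173.6, 189.4]
print("reject H0" if not lo <= mu0 <= hi else "do not reject H0")  # reject H0
```

Because the same critical value drives both the interval and the two-sided test, µ0 falls outside the interval exactly when |t| exceeds 1.98.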


Hypothesis Testing: Two-Sample Inference

We'll learn… how to use hypothesis testing for comparing the difference between:
1. The means of two independent populations
2. The means of two related populations
3. The variances of two populations

Two-sample Inference

Possible scenarios:
1. Two independent samples: from two independent populations
2. Paired samples: a single population (before/after measurements) or two related populations (matched pairs)

Example

Let's say we are interested in the relationship between the use of oral contraceptives (OC) and the level of blood pressure (BP) in women.

1. Follow up non-OC users and measure the change when they become OC users:

Same patients: each woman is used as her own control, so an observed difference is more likely to be due to OC use.

Hard to follow up all patients; expensive.

Longitudinal study: paired samples from a single population.

Example

Let's say we are interested in the relationship between the use of oral contraceptives (OC) and the level of blood pressure (BP) in women.

2. Measure the difference in BP between a group of OC users and a group of non-OC users:

The participants are seen at only one visit; the groups could be very different due to other factors such as age.

Financially feasible.

Cross-sectional study: two independent samples.


The Paired t Test

Related Populations: Paired t Test

Tests the means of 2 related populations:
Paired samples
Repeated measures (before/after)
Uses the difference between paired values

Assumptions: both populations are normally distributed; if not normal, use large samples (n ≥ 30).

Example

Want to study the effect of OC on blood pressure. Study design: recruit 10 women who are not using OC, follow them up after 1 year of using OC, and look at the BP difference.

SBP level for the 10 women:

Woman   Not using OC (baseline)   Using OC   Difference
1       115                       128        13
2       112                       115        3
3       107                       106        -1
4       119                       128        9
…       …                         …          …

Mean Difference, σD Known

The ith paired difference is Di = X1i − X2i.

The point estimate for the population mean paired difference is

$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$$

where n is the number of pairs in the paired sample.

Suppose the population standard deviation of the difference scores, σD, is known. The test statistic for the mean difference is a Z value:

$$Z = \frac{\bar{D} - \mu_D}{\sigma_D/\sqrt{n}}$$

where µD = hypothesized mean difference, σD = population standard deviation of the differences, and n = the sample size (number of pairs).

Mean Difference, σD Unknown

If σD is unknown, we can estimate the unknown population standard deviation with the sample standard deviation:

$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$

Use a paired t test: the test statistic for D̄ is now a t statistic with n − 1 d.f.:

$$t = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}}$$

Hypothesis Testing for Mean Difference, σD Unknown

Lower-tail test: H0: µD = 0, H1: µD < 0. Reject H0 if t < -tα.
Upper-tail test: H0: µD = 0, H1: µD > 0. Reject H0 if t > tα.
Two-tail test: H0: µD = 0, H1: µD ≠ 0. Reject H0 if t < -tα/2 or t > tα/2.

Where t has n - 1 d.f.
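The three rejection rules can be written as one small helper. This is a sketch: the name `reject_h0` is hypothetical, and the positive critical value is passed in (from a t table or Excel's TINV) because the Python standard library has no t quantile function. The same rules apply unchanged to the Z tests for µ1 – µ2.

```python
def reject_h0(stat, crit, tail):
    """Apply the lower-, upper- or two-tail rejection rule.

    crit is the positive critical value: t_alpha for a one-tail test,
    t_{alpha/2} for a two-tail test. tail is "lower", "upper" or "two".
    """
    if crit <= 0:
        raise ValueError("crit must be positive")
    if tail == "lower":
        return stat < -crit
    if tail == "upper":
        return stat > crit
    if tail == "two":
        return stat < -crit or stat > crit
    raise ValueError(f"unknown tail: {tail!r}")


print(reject_h0(3.32, 2.26, "two"))   # True  (paired OC example)
print(reject_h0(-0.91, 2.10, "two"))  # False (pooled-variance example)
```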

Confidence Interval

The confidence interval for µD is:

1. σD known:

$$\left[\bar{D} - Z\frac{\sigma_D}{\sqrt{n}},\ \bar{D} + Z\frac{\sigma_D}{\sqrt{n}}\right]$$

2. σD unknown:

$$\left[\bar{D} - t_{n-1}\frac{S_D}{\sqrt{n}},\ \bar{D} + t_{n-1}\frac{S_D}{\sqrt{n}}\right]$$

where n = the sample size (number of pairs in the paired sample), Z is the upper α/2 critical value of the standard normal distribution (1.96 for a 95% interval), and $t_{n-1}$ is the upper α/2 critical value of the t distribution with n - 1 d.f.
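A sketch of the σD-unknown interval in Python, assuming the caller supplies the t critical value (here 2.262157, the two-tail value for 9 d.f. from the Excel output). The function name `paired_ci` is hypothetical; for the σD-known case one would substitute σD and the Z critical value.

```python
import math
import statistics


def paired_ci(x1, x2, crit):
    """CI for mu_D: D_bar -/+ crit * S_D / sqrt(n).

    crit is the upper alpha/2 critical value of t with n - 1 d.f.,
    passed in rather than computed.
    """
    if len(x1) != len(x2) or len(x1) < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    diffs = [a - b for a, b in zip(x1, x2)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    half_width = crit * statistics.stdev(diffs) / math.sqrt(n)
    return d_bar - half_width, d_bar + half_width


using_oc = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]
baseline = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]
lo, hi = paired_ci(using_oc, baseline, crit=2.262157)  # t_{9,0.975}
print(round(lo, 2), round(hi, 2))  # 1.53 8.07
```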

Paired t Test Example

SBP level, 10 women:

Woman   Not using OC (baseline)   Using OC   Difference, Di
1       115                       128        13
2       112                       115         3
3       107                       106        -1
4       119                       128         9
5       115                       122         7
6       138                       145         7
7       126                       132         6
8       105                       109         4
9       104                       102        -2
10      115                       117         2

$$\bar{D} = \frac{\sum D_i}{n} = 4.8, \qquad S_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}} = 4.566$$

Paired t Test: Solution

Has the use of OC made a difference in their blood pressure (at the 0.05 level)?

H0: µD = 0   H1: µD ≠ 0

α = 0.05, d.f. = 10 - 1 = 9

Critical values: ±t9,0.975 = ±2.26 (α/2 = 0.025 in each tail). In Excel: t9,0.975 = TINV(0.05,9) = 2.26.

Test statistic:

$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = \frac{4.8 - 0}{4.566 / \sqrt{10}} = 3.32$$

Decision: Reject H0, since t = 3.32 > 2.26 lies in the rejection region. (At α = 0.01 the critical value is t9,0.995 = 3.25, so H0 would still be rejected.)

Conclusion: There is a significant change in blood pressure after one year of OC use.

Confidence Interval for the True Difference (µD) Between the Underlying Means of Two Paired Samples

For the example above, the 95% confidence interval is

$$4.8 \pm 2.262 \times \frac{4.566}{\sqrt{10}} = 4.8 \pm 3.27 = [1.53,\ 8.07]$$

Install the Excel 2007 Analysis ToolPak

http://www.dummies.com/how-to/content/how-to-install-the-excel-2007-analysis-toolpak.html

Although the Analysis ToolPak comes with Excel 2007, it is not installed by default. Follow the link above to install it. After installation, restart Excel; the Data Analysis button then appears in the Analysis group at the end of the Ribbon's Data tab.

EXCEL Paired T-test Analysis

Excel → Data → Data Analysis → t-Test: Paired Two Sample for Means

EXCEL Paired T-test Analysis Results (Variable 1 = using OC, Variable 2 = not using OC)

t-Test: Paired Two Sample for Means

                               Variable 1   Variable 2
Mean                           120.4        115.6
Variance                       174.9333     106.2667
Observations                   10           10
Pearson Correlation            0.954777
Hypothesized Mean Difference   0
df                             9
t Stat                         3.324651
P(T<=t) one-tail               0.004437
t Critical one-tail            1.833113
P(T<=t) two-tail               0.008874
t Critical two-tail            2.262157
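The Pearson correlation in the Excel output explains why pairing helps here. The variance of the differences satisfies $S_D^2 = S_1^2 + S_2^2 - 2 r S_1 S_2$, so a strong positive correlation makes $S_D$ much smaller than either column's standard deviation. The Python check below is a sketch; `pearson_r` is a hypothetical helper written out by hand to avoid depending on newer `statistics` functions.

```python
import math
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


using_oc = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]  # Variable 1
baseline = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]  # Variable 2

var1 = statistics.variance(using_oc)
var2 = statistics.variance(baseline)
r = pearson_r(using_oc, baseline)
# Variance of the differences rebuilt from the two variances and r
var_d = var1 + var2 - 2 * r * math.sqrt(var1 * var2)
diffs = [a - b for a, b in zip(using_oc, baseline)]
print(round(r, 6), round(var_d, 4), round(statistics.variance(diffs), 4))
```

With r ≈ 0.955, $S_D^2 \approx 20.84$ while the two column variances are about 175 and 106, which is why the paired test is far more sensitive than treating the columns as independent samples.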


Independent Samples

Independent Samples

• Different data sources
• Unrelated
• Independent: the sample selected from one population has no effect on the sample selected from the other population

Goal: Test a hypothesis or form a confidence interval for the difference between two population means, µ1 – µ2.

The point estimate for the difference is $\bar{X}_1 - \bar{X}_2$.

Difference Between Two Means

Population means, independent samples:

• σ1 and σ2 known: use a Z test statistic.
• σ1 and σ2 unknown, assumed equal: use Sp to estimate the unknown σ; use a t test statistic with the pooled standard deviation.
• σ1 and σ2 unknown, not assumed equal: use S1 and S2 to estimate the unknown σ1 and σ2; use a separate-variance t test.

σ1 and σ2 Known

Assumptions:

• Samples are randomly and independently drawn
• Population distributions are normal or both sample sizes are ≥ 30
• Population standard deviations are known

The test statistic is a Z value…

σ1 and σ2 Known (continued)

…and the standard error of $\bar{X}_1 - \bar{X}_2$ is

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

σ1 and σ2 Known (continued)

The test statistic for µ1 – µ2 is:

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
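A minimal sketch of the two-sample Z test in Python. The sample means are taken from the SBP example, but σ1 = 10 and σ2 = 13 are hypothetical "known" values invented for illustration, and `two_sample_z` is a hypothetical name. The two-tail p-value uses `statistics.NormalDist` from the standard library.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, hyp_diff=0.0):
    """Z statistic for mu1 - mu2 with known population SDs."""
    # Standard error of X1_bar - X2_bar
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((mean1 - mean2) - hyp_diff) / se


# sigma1 = 10 and sigma2 = 13 are HYPOTHETICAL known values.
z = two_sample_z(115.6, 120.4, sigma1=10.0, sigma2=13.0, n1=10, n2=10)
p_two_tail = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 3), round(p_two_tail, 3))
```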

Hypothesis Tests for Two Population Means (Independent Samples)

Lower-tail test: H0: µ1 = µ2, H1: µ1 < µ2, i.e., H0: µ1 – µ2 = 0, H1: µ1 – µ2 < 0

Upper-tail test: H0: µ1 = µ2, H1: µ1 > µ2, i.e., H0: µ1 – µ2 = 0, H1: µ1 – µ2 > 0

Two-tail test: H0: µ1 = µ2, H1: µ1 ≠ µ2, i.e., H0: µ1 – µ2 = 0, H1: µ1 – µ2 ≠ 0

Hypothesis tests for µ1 – µ2

Lower-tail test: H0: µ1 – µ2 = 0, H1: µ1 – µ2 < 0. Reject H0 if Z < -Zα.
Upper-tail test: H0: µ1 – µ2 = 0, H1: µ1 – µ2 > 0. Reject H0 if Z > Zα.
Two-tail test: H0: µ1 – µ2 = 0, H1: µ1 – µ2 ≠ 0. Reject H0 if Z < -Zα/2 or Z > Zα/2.

Confidence Interval, σ1 and σ2 Known

The confidence interval for µ1 – µ2 is:

$$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
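The same standard error gives the confidence interval. In this sketch (hypothetical function name `two_sample_z_ci`), $Z_{\alpha/2}$ comes from `statistics.NormalDist().inv_cdf`, and the "known" σ values of 10 and 13 are again invented for illustration.

```python
import math
from statistics import NormalDist


def two_sample_z_ci(mean1, mean2, sigma1, sigma2, n1, n2, alpha=0.05):
    """CI for mu1 - mu2 when sigma1 and sigma2 are known."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value, ~1.96
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = mean1 - mean2
    return diff - z_crit * se, diff + z_crit * se


# HYPOTHETICAL known sigmas (10 and 13) with the SBP sample means.
lo, hi = two_sample_z_ci(115.6, 120.4, 10.0, 13.0, 10, 10)
print(round(lo, 2), round(hi, 2))
```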

σ1 and σ2 Unknown, Assumed Equal

Assumptions:

• Samples are randomly and independently drawn
• Populations are normally distributed or both sample sizes are at least 30
• Population variances are unknown but assumed equal

σ1 and σ2 Unknown, Assumed Equal (continued)

Forming interval estimates: the population variances are assumed equal, so use the two sample variances and pool them to estimate the common σ². The pooled variance is

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$
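The pooled variance is a degrees-of-freedom-weighted average of the two sample variances; with equal sample sizes it reduces to their simple average. A minimal Python sketch (hypothetical name `pooled_variance`) using the variances from the SBP example:

```python
def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Pooled estimate of the common variance sigma^2, weights n_i - 1."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / ((n1 - 1) + (n2 - 1))


sp2 = pooled_variance(106.2667, 10, 174.9333, 10)
print(round(sp2, 1))  # 140.6
```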

σ1 and σ2 Unknown, Assumed Equal (continued)

The test statistic for µ1 – µ2 is a t value with (n1 + n2 – 2) degrees of freedom:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Confidence Interval, σ1 and σ2 Unknown, Assumed Equal

The confidence interval for µ1 – µ2 is:

$$(\bar{X}_1 - \bar{X}_2) \pm t_{n_1+n_2-2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $t_{n_1+n_2-2}$ is the upper α/2 critical value of the t distribution with n1 + n2 - 2 d.f.
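Putting the pooled variance, test statistic, and interval together: a sketch in Python (hypothetical name `pooled_t_and_ci`) applied to the SBP example that follows. The critical value 2.100922 is the two-tail t value for 18 d.f. reported in the Excel output and is passed in, not computed.

```python
import math


def pooled_t_and_ci(mean1, mean2, s1_sq, s2_sq, n1, n2, crit, hyp_diff=0.0):
    """Pooled-variance t statistic (n1 + n2 - 2 d.f.) and CI for mu1 - mu2.

    crit is the upper alpha/2 t critical value with n1 + n2 - 2 d.f.
    """
    sp2 = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    t = (diff - hyp_diff) / se
    return t, (diff - crit * se, diff + crit * se)


t, (lo, hi) = pooled_t_and_ci(115.6, 120.4, 106.2667, 174.9333, 10, 10, crit=2.100922)
print(round(t, 3), round(lo, 2), round(hi, 2))  # -0.905 -15.94 6.34
```

The interval contains 0, which is consistent with not rejecting H0 at α = 0.05.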

Pooled-Variance t Test: Example

Example: Compare the mean systolic pressure between OC and non-OC users.

Assuming both populations are approximately normal with equal variances, is there a difference in average SBP (α = 0.05)?

SBP level:

ID   Not using OC   Using OC
1    115            128
2    112            115
3    107            106
4    119            128
5    115            122
6    138            145
7    126            132
8    105            109
9    104            102
10   115            117

EXCEL DATA Analysis

With raw data, we can perform the t-test using the following analysis: Excel → Data → Data Analysis → t-Test: Two-Sample Assuming Equal Variances

EXCEL T-test Analysis Results (Variable 1 = not using OC, Variable 2 = using OC)

t-Test: Two-Sample Assuming Equal Variances

                               Variable 1    Variable 2
Mean                           115.6         120.4
Variance                       106.2666667   174.9333333
Observations                   10            10
Pooled Variance                140.6
Hypothesized Mean Difference   0
df                             18
t Stat                         -0.905177144
P(T<=t) one-tail               0.188664198
t Critical one-tail            1.734063592
P(T<=t) two-tail               0.377328397
t Critical two-tail            2.100922037

Solution

H0: µ1 – µ2 = 0   H1: µ1 – µ2 ≠ 0
α = 0.05
d.f. = (10 - 1) + (10 - 1) = 18
Critical values: ±2.10 (TINV(0.05, 18) = 2.10; α/2 = 0.025 in each tail)
Test statistic: t = -0.91

Decision: Do not reject H0 at α = 0.05, since -2.10 < -0.91 < 2.10 (equivalently, p-value = 0.38 > 0.05).

Conclusion: The mean blood pressures of the OC and non-OC groups do not differ significantly.

Calculating the Test Statistic

The pooled variance is

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(10 - 1)(106.3) + (10 - 1)(174.9)}{(10 - 1) + (10 - 1)} = 140.6$$

The test statistic is:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(115.6 - 120.4) - 0}{\sqrt{140.6\left(\frac{1}{10} + \frac{1}{10}\right)}} = -0.91$$

σ1 and σ2 Unknown, Not Assumed Equal

Assumptions:

• Samples are randomly and independently drawn
• Populations are normally distributed or both sample sizes are at least 30
• Population variances are unknown and cannot be assumed to be equal

σ1 and σ2 Unknown, Unequal Variances (continued)

The test statistic for µ1 – µ2 is:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
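A sketch of the separate-variance t statistic in Python (hypothetical name `welch_t`). With equal sample sizes, $S_1^2/n + S_2^2/n = S_p^2(2/n)$, so the statistic coincides with the pooled one; the SBP data illustrate this, matching the "t Stat" of -0.90518 in the Excel output for unequal variances.

```python
import math


def welch_t(mean1, mean2, s1_sq, s2_sq, n1, n2, hyp_diff=0.0):
    """Separate-variance t statistic: each sample keeps its own variance."""
    se = math.sqrt(s1_sq / n1 + s2_sq / n2)
    return ((mean1 - mean2) - hyp_diff) / se


t = welch_t(115.6, 120.4, 106.2667, 174.9333, 10, 10)
print(round(t, 3))  # -0.905
```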

σ1 and σ2 Unknown, Not Assumed Equal (continued)

Forming the test statistic: the population variances are not assumed equal, so the two sample variances enter the computation of the test statistic separately. The test statistic can be approximated by a t distribution with ν degrees of freedom (defined below).

σ1 and σ2 Unknown, Assumed Unequal (Satterthwaite’s method)

The number of degrees of freedom is the integer portion of:

$$\nu = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$$
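Satterthwaite's approximation is straightforward to compute. This Python sketch (hypothetical name `satterthwaite_df`) uses the SBP example and shows both the truncated value prescribed by the rule above and the rounded value; Excel's output reports 17, which is consistent with rounding to the nearest integer rather than truncating.

```python
import math


def satterthwaite_df(s1_sq, n1, s2_sq, n2):
    """Approximate d.f. for the separate-variance t test."""
    a = s1_sq / n1
    b = s2_sq / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


nu = satterthwaite_df(106.2667, 10, 174.9333, 10)
print(round(nu, 2), math.floor(nu), round(nu))  # 16.99 16 17
```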

σ1 and σ2 Unknown, Unequal Variances (continued)

The confidence interval for µ1 – µ2 is:

$$(\bar{X}_1 - \bar{X}_2) \pm t_{\nu}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

Unequal Variances: Example

Example: Compare the mean systolic pressure between OC and non-OC users.

Assuming both populations are approximately normal with unequal variances, test for the equality of the mean SBP levels of the two groups (α = 0.05).

SBP level:

ID   Not using OC   Using OC
1    115            128
2    112            115
3    107            106
4    119            128
5    115            122
6    138            145
7    126            132
8    105            109
9    104            102
10   115            117

EXCEL DATA Analysis

Excel → Data → Data Analysis → t-Test: Two-Sample Assuming Unequal Variances

EXCEL DATA Analysis

t-Test: Two-Sample Assuming Unequal Variances

                               Variable 1   Variable 2
Mean                           115.6        120.4
Variance                       106.2667     174.9333
Observations                   10           10
Hypothesized Mean Difference   0
df                             17
t Stat                         -0.90518
P(T<=t) one-tail               0.189011
t Critical one-tail            1.739607
P(T<=t) two-tail               0.378022
t Critical two-tail            2.109816

Solution

H0: µ1 – µ2 = 0   H1: µ1 – µ2 ≠ 0
α = 0.05
Degrees of freedom: ?
Critical value(s):
Test statistic:
Decision:
Conclusion:

Solution

H0: µ1 – µ2 = 0   H1: µ1 – µ2 ≠ 0
α = 0.05
Degrees of freedom = 17 (as reported by Excel)
Critical values: from Table 5, ±t17,0.975 = ±2.11 (α/2 = 0.025 in each tail)
Test statistic: t = -0.91

Decision: Do not reject H0 at α = 0.05, since -2.11 < -0.91 < 2.11.

Conclusion: The mean blood pressures of the OC and non-OC groups do not differ significantly.

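Calculating the degrees of freedom

$$\nu = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(S_2^2/n_2\right)^2}{n_2 - 1}} = \frac{\left(\frac{106.27}{10} + \frac{174.93}{10}\right)^2}{\frac{(106.27/10)^2}{9} + \frac{(174.93/10)^2}{9}} \approx 16.99$$

Taking the integer portion gives 16 d.f. (t16,0.975 = 2.12); Excel reports 17 d.f. (t17,0.975 = 2.11), which corresponds to rounding 16.99 to the nearest integer. Either way |t| = 0.91 is well below the critical value, so the decision is unchanged.

Test statistic:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} = \frac{(115.6 - 120.4) - 0}{\sqrt{\frac{106.27}{10} + \frac{174.93}{10}}} = -0.91$$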


Comparing population variances: F test

Testing for the Equality of Two Variances

Suppose that we have two independent samples from N(µ1, σ1²) and N(µ2, σ2²) populations, respectively.

We want to test the following hypotheses:

H0: σ1² = σ2²   H1: σ1² ≠ σ2²

Test statistic (based on the F distribution, after Fisher and Snedecor):

$$F = \frac{S_1^2}{S_2^2}$$

Reject H0 if the variance ratio is either too large (≫ 1) or too small (≪ 1); otherwise, do not reject H0.

F distribution

Under H0: σ1² = σ2²,

$$F = \frac{S_1^2}{S_2^2} \sim F_{n_1-1,\,n_2-1}$$

• F is a family of distributions, parameterized by df1 = n1 - 1 and df2 = n2 - 1, denoted $F_{df_1,\,df_2}$
• Always positive
• Generally positively skewed

Reject H0 if

$$F > F_{n_1-1,\,n_2-1,\,1-\alpha/2} \quad \text{or} \quad F < F_{n_1-1,\,n_2-1,\,\alpha/2}$$
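A sketch of the two-sided F test in Python (hypothetical name `f_test_two_sided`), with the critical values supplied by the caller since the standard library has no F quantile function. For df = (9, 9) the 0.975 quantile is about 4.03 (a standard F-table value, not taken from the slides); because df1 = df2, the 0.025 quantile is its reciprocal. The resulting decision agrees with doubling Excel's one-tail p-value: 2 × 0.2346 ≈ 0.47 > 0.05.

```python
def f_test_two_sided(s1_sq, s2_sq, f_lower, f_upper):
    """Two-sided F test for H0: sigma1^2 = sigma2^2.

    f_lower and f_upper are the alpha/2 and 1 - alpha/2 quantiles of
    F(n1 - 1, n2 - 1), looked up by the caller. Returns (F, reject).
    """
    f = s1_sq / s2_sq
    return f, (f < f_lower or f > f_upper)


# OC example: S1^2 = 174.9333 (using OC), S2^2 = 106.2667 (not using OC), df = (9, 9).
# 4.03 is an ASSUMED table value for the 0.975 quantile of F(9, 9).
f, reject = f_test_two_sided(174.9333, 106.2667, f_lower=1 / 4.03, f_upper=4.03)
print(round(f, 3), reject)  # 1.646 False
```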

F Test for Two Variances: Critical Values

The rejection region is split between the two tails of the $F_{n_1-1,\,n_2-1}$ distribution: below $F_{n_1-1,\,n_2-1,\,\alpha/2}$ and above $F_{n_1-1,\,n_2-1,\,1-\alpha/2}$. Between these two critical values lies the acceptance (non-rejection) region.

Reject H0 if $F > F_{n_1-1,\,n_2-1,\,1-\alpha/2}$ or $F < F_{n_1-1,\,n_2-1,\,\alpha/2}$.

Unequal Variances: Example

Example: Compare the variance of systolic pressure between OC and non-OC users.

Test for the equality of the two variances (α = 0.05).

SBP level:

ID   Not using OC   Using OC
1    115            128
2    112            115
3    107            106
4    119            128
5    115            122
6    138            145
7    126            132
8    105            109
9    104            102
10   115            117

EXCEL DATA Analysis

Excel → Data → Data Analysis → F-Test: Two Sample for Variances

Sample output:

F-Test Two-Sample for Variances

                      Variable 1   Variable 2
Mean                  120.4        115.6
Variance              174.9333     106.2667
Observations          10           10
df                    9            9
F                     1.646173
P(F<=f) one-tail      0.234645
F Critical one-tail   3.178893

NOTE:
• This Excel procedure performs only a one-sided test.
• If the test is two-sided, you have two options. First, you can divide the given value of α by 2 and enter the result as the level of significance. Second, you can use the p-value criterion: for a two-sided test, multiply the one-sided p-value by 2.

Two Sample Inference

Two independent populations or paired samples?

• Paired samples: paired z-test (if σ known) or paired t-test (if σ unknown), based on the observed differences.
• Independent samples: z-test if the population variances are given. Otherwise, use the F test to determine whether the population variances can be assumed equal:
  • If yes, t-test with a pooled estimate of variance.
  • If no, t-test with separate estimates of variance.
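The decision flow above can be sketched as a small Python function. The name `choose_two_sample_test` and its arguments are hypothetical; `equal_variances` stands for the outcome of the F test and is only consulted for independent samples with unknown variances.

```python
def choose_two_sample_test(paired, sigma_known, equal_variances=None):
    """Return the test suggested by the two-sample decision flow."""
    if paired:
        return "paired z-test" if sigma_known else "paired t-test"
    if sigma_known:
        return "two-sample z-test"
    if equal_variances is None:
        raise ValueError("run the F test first to decide on equal variances")
    if equal_variances:
        return "pooled-variance t-test"
    return "separate-variance t-test"


print(choose_two_sample_test(paired=True, sigma_known=False))      # paired t-test
print(choose_two_sample_test(False, False, equal_variances=True))  # pooled-variance t-test
```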

