Basic Business Statistics, 10/efenyolab.org/...Biostatistics.../slides/IBB2015_13.pdf · t-Test...

Statistical Methods

• Descriptive Statistics
• Inferential Statistics: Estimation, Hypothesis Testing, Others

Outline of today

• Hypothesis testing for one population mean
• Hypothesis testing for two samples, comparing the differences between:
  • the means of two related populations: paired t-test
  • the means of two independent populations: two-sample t-test
  • the variances of two populations: F test

Motivating Example: Birth Weight

The average birth weight in the general population is 120 oz. You take a sample of 100 babies born in the hospital where you work (located in a low-SES area). You find a sample mean birth weight of x̄ = 115 oz with a sample standard deviation of s = 24 oz.

You wonder: is the mean birth weight of low-SES babies really lower than that of the general population? Or is the observed difference merely due to chance?

We need an objective method to decide which hypothesis the data support.

Null Hypothesis

Parameter of interest: the mean birth weight of low-SES babies, denoted by µ.

We propose a value for µ, denoted µ0.

The null hypothesis is H0: µ = µ0, e.g. H0: µ = 120. In words, H0 says that the mean birth weight of low-SES babies is equal to that of the general population.

We begin by assuming that the null hypothesis is true. This is similar to the notion of "innocent until proven guilty".

Purpose: to determine whether the data lead us to reject the null hypothesis.

Alternative Hypothesis

1. Is set up to represent the research goal.
2. Is the opposite of the null hypothesis, e.g. Ha: the mean birth weight of low-SES babies is lower than that of the general population.
3. In symbols: Ha: µ < 120.
4. Always contains an inequality sign: ≠, <, or >. A ≠ sign leads to a two-sided test; < or > leads to a one-sided test.

EX Cardiovascular Disease

Suppose we want to compare fasting serum-cholesterol levels among recent Asian immigrants to the United States with typical levels found in the general U.S. population. Assume cholesterol levels in women aged 21-40 in the United States are approximately normally distributed with mean 190 mg/dL. It is unknown whether cholesterol levels among recent Asian immigrants are higher or lower than those in the general U.S. population.

• Parameter of interest: µ = mean cholesterol level among recent Asian immigrants

• H0: µ = 190 vs Ha: µ ≠ 190

Hypothesis testing

Formally, a statistical hypothesis testing problem includes two hypotheses: the null hypothesis (H0) and the alternative hypothesis (Ha, also written H1).

Hypotheses must be stated before the analysis. A hypothesis is a belief about a population parameter.

The parameter is a population mean, proportion, or variance, NOT a sample statistic. We will never have a hypothesis statement with x̄ or p̂ in it.

Possible Outcomes in Hypothesis Testing

Truth (the real situation, unknown in practice) versus decision:

• H0 is true and H0 is accepted (study inconclusive): correct decision.
• H1 is true and H0 is accepted (study inconclusive): Type II error, probability β.
• H0 is true and H0 is rejected (research hypothesis supported): Type I error, probability α.
• H1 is true and H0 is rejected (research hypothesis supported): correct decision, probability 1 − β = power.

Errors in Making Decision

1. Type I error: rejecting the null hypothesis H0 when H0 is true. Its probability, α = Pr(reject H0 | H0 is true), is called the level of significance. A Type I error can have serious consequences.

2. Type II error: not rejecting H0 when H0 is false (H1 is true). Its probability is β = Pr(do not reject H0 | H1 is true).

3. The power of a statistical test is 1 − β = Pr(reject H0 | H1 is true).

Ex Birth Weight

The average birth weight in the general population is 120 oz. You take a sample of 100 babies born in the hospital where you work (located in a low-SES area). You find a sample mean birth weight of x̄ = 115 oz with a sample standard deviation of s = 24 oz.


H0: µ = 120, Ha: µ < 120

What is the meaning of a Type I error? It means deciding that the mean birth weight in the hospital is lower than 120 oz when in fact it is 120 oz. Its probability is α.

What is the meaning of a Type II error? It means deciding that the mean birth weight in the hospital is 120 oz when in fact it is lower than 120 oz. Its probability is β.

EX Cardiovascular Disease

Assume cholesterol levels in women aged 21-40 in the United States are approximately normally distributed with mean 190 mg/dL. It is unknown whether cholesterol levels among recent Asian immigrants are higher or lower than those in the general U.S. population.

Parameter of interest: µ = mean cholesterol level among recent Asian immigrants


H0: µ = 190 vs Ha: µ ≠ 190

What is the meaning of a Type I error? It means deciding that the cholesterol levels among recent Asian immigrants are different from those in the general U.S. population, when in truth there is no difference.

What is the meaning of a Type II error? It means deciding that the cholesterol levels among recent Asian immigrants are the same as those in the general U.S. population, when in truth they are different.

Type I & II Error Relationship

Type I and Type II errors cannot happen at the same time:
• A Type I error can only occur if H0 is true. A small α means H0 is rejected less often.
• A Type II error can only occur if H0 is false. A small β means a false H0 is retained less often.

α and β have an inverse relationship. For a fixed sample size, if the Type I error probability (α) decreases, then the Type II error probability (β) increases.

We can't reduce both errors simultaneously with the same data: there is a trade-off!

General strategy: fix α at a specific level (e.g. 0.05) and use the test that minimizes β or, equivalently, maximizes the power.
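Here is a minimal Python sketch that makes the trade-off concrete, using only the standard library. It is an illustration under stated assumptions and is not part of the original slides. It uses a one-sided Z test as a normal approximation to the t test, treats σ = 24 oz as known, and assumes a hypothetical true mean of 115 oz for the birth-weight setting. The helper name `power_lower_tail_z` is ours.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def power_lower_tail_z(mu0, mu_true, sigma, n, alpha):
    """Power of the one-sided Z test of H0: mu = mu0 vs Ha: mu < mu0.

    Assumes sigma is known and the true population mean is mu_true.
    """
    z_crit = STD_NORMAL.inv_cdf(alpha)             # reject H0 when Z < z_crit
    shift = (mu_true - mu0) / (sigma / n ** 0.5)   # mean of Z when mu = mu_true
    return STD_NORMAL.cdf(z_crit - shift)          # Pr(Z < z_crit | mu = mu_true)


# Hypothetical setting: mu0 = 120, true mean 115, sigma = 24 treated as known, n = 100
for alpha in (0.10, 0.05, 0.01):
    power = power_lower_tail_z(120, 115, 24, 100, alpha)
    print(f"alpha = {alpha:.2f}   beta = {1 - power:.2f}   power = {power:.2f}")
```

Under these assumptions, β rises from about 0.21 to 0.33 to 0.60 as α is lowered from 0.10 to 0.05 to 0.01. This is the inverse relationship described above.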

Hypothesis Testing

Population: I believe the population mean age is µ = 50 (hypothesis).
Random sample: the sample mean is x̄ = 20.
Compare x̄ to the hypothesized value µ0: not close, so reject the hypothesis!

t-Test Statistic (σ Unknown)

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

The test statistic measures how close the data are to the null hypothesis. The larger the test statistic in absolute value, the further the data are from the null hypothesis. It also means stronger evidence in favor of the alternative hypothesis.

Fact: if H0 is true and the data are from a normal distribution, the test statistic t follows a t-distribution with n − 1 degrees of freedom.

Basic idea: [Figure: distribution of the t statistic under H0, a tn-1 distribution centered at 0.]

Rejection Region

1. Definition: the range of values of the test statistic t for which H0 is rejected.

2. We need a critical (cut-off) value to decide whether our sample mean is "too extreme" when the null hypothesis is true.

3. The critical value is determined by the significance level α = P(rejecting H0 | H0 is true), which the researcher selects at the start. Typical values are 0.01, 0.05, and 0.10.
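The critical values for the three kinds of alternative can be computed instead of looked up. The Python sketch below uses only the standard library. `t_cdf` integrates the Student t density with Simpson's rule, and `t_ppf` inverts it by bisection. Both names, along with `critical_values`, are hypothetical helpers that stand in for a t table or Excel's TINV; they are not how Excel computes these values. The sketch follows the slides' convention Pr(t < tn-1,q) = q.

```python
import math


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: Simpson's rule on [0, |x|] plus symmetry about 0."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps                      # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3               # integral of the density over [0, |x|]
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p, df):
    """Quantile q with P(T <= q) = p, found by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def critical_values(alpha, df, alternative):
    """Cut-off(s) of the rejection region, with Pr(t < t_{df,q}) = q."""
    if alternative == "less":          # Ha: mu < mu0 -> reject if t < t_{df,alpha}
        return (t_ppf(alpha, df),)
    if alternative == "greater":       # Ha: mu > mu0 -> reject if t > t_{df,1-alpha}
        return (t_ppf(1 - alpha, df),)
    # Ha: mu != mu0 -> reject if t < t_{df,alpha/2} or t > t_{df,1-alpha/2}
    return (t_ppf(alpha / 2, df), t_ppf(1 - alpha / 2, df))


print([round(c, 2) for c in critical_values(0.05, 99, "less")])       # [-1.66]
print([round(c, 2) for c in critical_values(0.05, 99, "two-sided")])  # [-1.98, 1.98]
```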

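Rejection Region (One-Sided Test)

[Figure: tn-1 distribution for Ha: µ < µ0. The rejection region, with area α, lies to the left of the critical value tn-1,α. The nonrejection region has area 1 − α (level of confidence).]

Recall Pr(t < tn-1,α) = α, where
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
so H0 is rejected if the observed t statistic is less than tn-1,α.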



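Rejection Regions (Two-Sided Test)

[Figure: tn-1 distribution for Ha: µ ≠ µ0. Two rejection regions, each with area α/2, lie below tn-1,α/2 and above tn-1,1-α/2. The nonrejection region between them has area 1 − α (level of confidence).]

H0 is rejected if t < tn-1,α/2 or t > tn-1,1-α/2.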



Birth Weight Example

The mean birth weight in the United States is 120 oz. You get a list of birth weights from 100 consecutive, full-term, live-born deliveries from the maternity ward of a hospital in a low-SES area.

The sample mean birth weight is 115 oz and the standard deviation is 24 oz.

Can we actually say that the underlying mean birth weight at this hospital is lower than the national average?

One-Sided t Test Solution



H0: µ = 120, Ha: µ < 120, α = 0.05, df = 100 − 1 = 99

Critical value: t99,.05 = −1.66 (EXCEL: t99,.05 = −TINV(0.1,99))

Test statistic:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{115 - 120}{24/\sqrt{100}} = -2.08$$

Decision: since −2.08 < −1.66, reject H0 at significance level 0.05.

Conclusion: the true mean birth weight is significantly lower in this hospital than in the general population.
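The same steps (test statistic, critical value, decision) can be sketched in Python from the summary statistics alone. This is a minimal illustration that uses only the standard library. `t_cdf`, `t_ppf`, and `one_sample_t_test` are hypothetical helper names. The t distribution is evaluated numerically (Simpson's rule plus bisection) instead of with a table.

```python
import math


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: Simpson's rule on [0, |x|] plus symmetry."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p, df):
    """Quantile q with P(T <= q) = p, by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t_test(xbar, s, n, mu0, alpha, alternative):
    """Critical-value method for H0: mu = mu0, from summary statistics."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    if alternative == "less":            # Ha: mu < mu0
        crit = t_ppf(alpha, df)
        reject = t < crit
    elif alternative == "greater":       # Ha: mu > mu0
        crit = t_ppf(1 - alpha, df)
        reject = t > crit
    else:                                # Ha: mu != mu0 (two-sided)
        crit = t_ppf(1 - alpha / 2, df)
        reject = abs(t) > crit
    return t, df, crit, reject


# Birth weight example: x-bar = 115, s = 24, n = 100, mu0 = 120, Ha: mu < 120
t, df, crit, reject = one_sample_t_test(115, 24, 100, 120, 0.05, "less")
print(f"t = {t:.2f}, df = {df}, critical value = {crit:.2f}, reject H0: {reject}")
# t = -2.08, df = 99, critical value = -1.66, reject H0: True
```

Calling one_sample_t_test(181.52, 40, 100, 190, 0.05, 'two-sided') gives t = −2.12 against a critical value of 1.98. This matches the two-sided cholesterol example that follows.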

EX Cardiovascular Disease (two-sided t-test)

Test the hypothesis that the mean cholesterol level of recent female Asian immigrants is different from the mean in the general U.S. population, 190 mg/dL. Blood tests are performed on 100 female Asian immigrants. The mean level is x̄ = 181.52 mg/dL with standard deviation s = 40 mg/dL.






H0: µ = 190 vs Ha: µ ≠ 190, x̄ = 181.52

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{181.52 - 190}{40/\sqrt{100}} = -2.12$$

With α = 0.05, the critical values are t99,.025 = −TINV(0.05,99) = −1.98 and t99,.975 = 1.98. Thus t = −2.12 is in the rejection region, and H0 is rejected at the 5% level of significance.

Conclusion: the mean cholesterol level of recent Asian immigrants is significantly different from the mean for the general U.S. population.

Determination of Statistical Significance

• Critical-value method: calculate the critical value from α and the t distribution.
• p-value method: once we calculate the actual t statistic, we can ask for the smallest significance level at which we could still reject H0 (the observed level of significance).

p-value

Definition: the probability of obtaining a test statistic as extreme as, or more extreme than, the actual sample value, given that H0 is true.

• For H1: µ < µ0, p-value = P(tn-1 ≤ tobs).
• For H1: µ > µ0, p-value = P(tn-1 ≥ tobs).
• For H1: µ ≠ µ0, p-value = 2P(tn-1 ≤ tobs) if tobs ≤ 0, and 2(1 − P(tn-1 ≤ tobs)) if tobs > 0.

The p-value is used to make the rejection decision: if p-value ≥ α, do not reject H0; if p-value < α, reject H0.
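The three p-value rules can be written as one small function. `p_value` in the sketch below is a hypothetical helper. It uses a standard-library numerical t CDF (Simpson's rule) as a stand-in for Excel's TDIST; it is not Excel's algorithm.

```python
import math


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: Simpson's rule on [0, |x|] plus symmetry."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def p_value(t_obs, df, alternative):
    """p-value of an observed t statistic, following the three cases above."""
    if alternative == "less":        # H1: mu < mu0
        return t_cdf(t_obs, df)
    if alternative == "greater":     # H1: mu > mu0
        return 1 - t_cdf(t_obs, df)
    # H1: mu != mu0
    if t_obs <= 0:
        return 2 * t_cdf(t_obs, df)
    return 2 * (1 - t_cdf(t_obs, df))


t_birth = (115 - 120) / (24 / math.sqrt(100))      # birth weight example
t_chol = (181.52 - 190) / (40 / math.sqrt(100))    # cholesterol example
print(f"{p_value(t_birth, 99, 'less'):.4f}")
print(f"{p_value(t_chol, 99, 'two-sided'):.4f}")
```

For the birth-weight and cholesterol examples, the printed values are roughly 0.020 and 0.037. These are in line with the TDIST results quoted on the following slides.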

Birth Weight Example

The mean birth weight in the United States is 120 oz. You get a list of birth weights from 100 consecutive, full-term, live-born deliveries from the maternity ward of a hospital in a low-SES area.

The sample mean birth weight is x̄ = 115 oz and the standard deviation is s = 24 oz.

Find the p-value.

One-sided test:

1. Observed t value of the sample statistic:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{115 - 120}{24/\sqrt{100}} = -2.08$$

2. Use the alternative hypothesis to find the direction: the p-value is P(t99 ≤ −2.08) = .020 (EXCEL function: TDIST(2.08,99,1) = 0.02).

[Figure: tn-1 distribution with the area to the left of −2.08 shaded; p-value = .02.]

p-value < α, so H0 is rejected. We conclude that the true mean birth weight is significantly lower in this hospital than in the general population.



H0: µ = 190 vs Ha: µ ≠ 190, x̄ = 181.52

What is the p-value?

Two-sided test:

1. Observed t value of the sample statistic:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{181.52 - 190}{40/\sqrt{100}} = -2.12$$

2. The p-value is 2·P(t99 ≤ −2.12) = TDIST(2.12,99,2) = 0.037 < α.

[Figure: t99 distribution with both tails beyond −2.12 and 2.12 shaded; each tail has area 1/2 p-value = .0185.]

Conclusion: the mean cholesterol level of recent Asian immigrants is significantly different from the mean for the general U.S. population.

Relationship Between Hypothesis Testing and Confidence Intervals (2-sided case)

The p-value tells exactly how significant the results are. A CI gives a range of values that may contain µ.

For H0: µ = µ0 versus H1: µ ≠ µ0, H0 is rejected at level α if and only if the 100% × (1 − α) CI for µ does not contain µ0.

Hypothesis Tests vs. Confidence Intervals

There are three ways to test hypotheses (assume α = 0.05):
1. By the critical value method
2. By computing the p-value
3. By constructing a confidence interval

EX Cardiovascular Disease (two-sided t-test): CI method

H0: µ = 190 vs Ha: µ ≠ 190, x̄ = 181.52

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{181.52 - 190}{40/\sqrt{100}} = -2.12$$

With α = 0.05, the critical values are t99,.025 = −TINV(0.05,99) = −1.98 and t99,.975 = 1.98. Then the 95% CI for µ is

$$181.52 \pm 1.98 \times 40/\sqrt{100} = [173.6,\ 189.4]$$

Conclusion: the CI does not include µ = 190. We therefore conclude that the mean cholesterol level of recent Asian immigrants is significantly different from the mean for the general U.S. population.
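The CI method can be sketched the same way. `t_confidence_interval` is a hypothetical helper. The t quantile again comes from a numerical sketch of the t distribution (Simpson's rule plus bisection) instead of a table.

```python
import math


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: Simpson's rule on [0, |x|] plus symmetry."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p, df):
    """Quantile q with P(T <= q) = p, by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_confidence_interval(xbar, s, n, conf=0.95):
    """Two-sided t interval x-bar +/- t_{n-1,1-alpha/2} * s / sqrt(n)."""
    t_q = t_ppf(1 - (1 - conf) / 2, n - 1)
    half_width = t_q * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


lo, hi = t_confidence_interval(181.52, 40, 100)
print(f"95% CI: [{lo:.2f}, {hi:.2f}]; reject H0: mu = 190? {not (lo <= 190 <= hi)}")
```

With the unrounded quantile t99,.975 ≈ 1.984, the interval is about [173.58, 189.46]; the slide's [173.6, 189.4] uses 1.98. Either way, 190 lies outside the interval, so H0 is rejected.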

Hypothesis Testing: Two-Sample Inference

We'll learn how to use hypothesis testing to compare the difference between:
1. The means of two independent populations
2. The means of two related populations
3. The variances of two populations

Two-sample Inference

Possible scenarios:
1. Two independent samples: from two independent populations
2. Paired samples: a single population (before/after measurements) or two related populations (matched pairs)

Example

Let's say we are interested in the relationship between use of oral contraceptives (OC) and level of blood pressure (BP) in women.

1. Follow up non-OC users and measure the change in BP when they become OC users:
• Same patients: each woman is used as her own control, so an observed difference is more likely to be due to OC use.
• Hard to follow up all patients; expensive.
• Longitudinal study: paired samples from a single population.

Example

Let's say we are interested in the relationship between use of oral contraceptives (OC) and level of blood pressure (BP) in women.

2. Measure the difference in BP between a group of OC users and a group of non-OC users:
• The participants are seen at only one visit, and the two groups could be very different because of other factors such as age.
• Financially feasible.
• Cross-sectional study: two independent samples.

The Paired t Test

Related Populations: Paired t Test

• Tests the means of 2 related populations: paired samples or repeated measures (before/after).
• Uses the difference between paired values.
• Assumption: the paired differences are approximately normally distributed (e.g. both populations are normal). If not, use a large sample (n ≥ 30 pairs).

Suppose we want to study the effect of OC on blood pressure. Study design: recruit 10 women who are not using OC and follow them up after 1 year of using OC. We are interested in the BP difference.

Example

SBP level of 10 women (first rows shown):

Woman  Not using OC (baseline)  Using OC  Difference
1      115                      128       13
2      112                      115       3
3      107                      106       −1
4      119                      128       9
…      …                        …         …

Mean Difference, σD Known

The ith paired difference is Di = X1i − X2i.

The point estimate for the population mean paired difference µD is
$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$$
where n is the number of pairs in the paired sample.

Suppose the population standard deviation of the difference scores, σD, is known. The test statistic for the mean difference is a Z value:
$$Z = \frac{\bar{D} - \mu_D}{\sigma_D/\sqrt{n}}$$
where µD = hypothesized mean difference, σD = population standard deviation of the differences, and n = the sample size (number of pairs).

Mean Difference, σD Unknown

If σD is unknown, we can estimate it with the sample standard deviation of the differences:
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$

Use a paired t test. The test statistic for D̄ is now a t statistic with n − 1 d.f.:
$$t = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}}$$

Hypothesis Testing for Mean Difference, σD Unknown

• Lower-tail test: H0: µD = 0 vs H1: µD < 0. Reject H0 if t < −tα.
• Upper-tail test: H0: µD = 0 vs H1: µD > 0. Reject H0 if t > tα.
• Two-tail test: H0: µD = 0 vs H1: µD ≠ 0. Reject H0 if t < −tα/2 or t > tα/2.

Here t has n − 1 d.f., and tα denotes the upper-tail critical value with area α to its right.

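Confidence Interval

The confidence interval for µD is:

1. σD known:
$$\left(\bar{D} - Z\frac{\sigma_D}{\sqrt{n}},\ \bar{D} + Z\frac{\sigma_D}{\sqrt{n}}\right)$$

2. σD unknown:
$$\left(\bar{D} - t_{n-1}\frac{S_D}{\sqrt{n}},\ \bar{D} + t_{n-1}\frac{S_D}{\sqrt{n}}\right)$$

where n = the sample size (number of pairs in the paired sample).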

Paired t Test Example

SBP level of 10 women:

Woman  Not using OC  Using OC  Difference, Di
1      115           128       13
2      112           115       3
3      107           106       −1
4      119           128       9
5      115           122       7
6      138           145       7
7      126           132       6
8      105           109       4
9      104           102       −2
10     115           117       2

$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n} = 4.8, \qquad S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}} = 4.566$$

Has the use of OC made a difference in their blood pressure (at the 0.01 level)?

Paired t Test: Solution

H0: µD = 0 vs H1: µD ≠ 0, α = .01, d.f. = 10 − 1 = 9

Critical values: t9,0.005 = −3.25 and t9,0.995 = 3.25 (EXCEL: TINV(0.01,9) = 3.25). At α = 0.05 they would be ±2.26, from TINV(0.05,9) = 2.26.

Test statistic:
$$t = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}} = \frac{4.8 - 0}{4.566/\sqrt{10}} = 3.32$$

Decision: 3.32 > 3.25, so the t statistic is in the rejection region; reject H0.

Conclusion: there is a significant change in blood pressure.

Confidence Interval for the True Difference (µD) Between the Underlying Means of Two Paired Samples

In the above example, the 95% confidence interval is 4.8 ± t9,0.975 × 4.566/√10 with t9,0.975 = 2.262, which gives [1.53, 8.07].
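The whole paired analysis (D̄, SD, t statistic, p-value, and the 95% CI) can be reproduced from the table of 10 women with a short Python sketch. It uses only the standard library. `t_cdf` and `t_ppf` are hypothetical numerical helpers (Simpson's rule and bisection) that stand in for a t table.

```python
import math
import statistics


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: Simpson's rule on [0, |x|] plus symmetry."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p, df):
    """Quantile q with P(T <= q) = p, by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


no_oc = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]   # baseline SBP
oc = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]      # SBP after 1 year on OC

d = [after - before for before, after in zip(no_oc, oc)]     # D_i, as in the table
n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)              # n - 1 in the denominator, as in S_D
se = s_d / math.sqrt(n)
t = (d_bar - 0) / se                   # H0: mu_D = 0
p_two = 2 * (1 - t_cdf(abs(t), n - 1))
t975 = t_ppf(0.975, n - 1)
ci = (d_bar - t975 * se, d_bar + t975 * se)

print(f"D-bar = {d_bar:.1f}, S_D = {s_d:.3f}, t = {t:.2f}, two-sided p = {p_two:.4f}")
print(f"Reject H0 at alpha = 0.01: {p_two < 0.01}")
print(f"95% CI for mu_D: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

The t statistic and two-sided p-value should match the Excel paired output that follows (t Stat 3.324651, P(T<=t) two-tail 0.008874).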

Install the Excel 2007 Analysis ToolPak

Although the Analysis ToolPak comes with Excel 2007, it is not pre-installed. Follow the link below to install it in Excel. After installation, restart Excel. You will then see the Data Analysis button in the Analysis group at the end of the Ribbon's Data tab.

http://www.dummies.com/how-to/content/how-to-install-the-excel-2007-analysis-toolpak.html


















EXCEL Paired T-test Analysis

EXCEL → Data → Data Analysis → t-Test: Paired Two Sample for Means

EXCEL Paired T-test Analysis Results

t-Test: Paired Two Sample for Means

                              Variable 1   Variable 2
Mean                          120.4        115.6
Variance                      174.9333     106.2667
Observations                  10           10
Pearson Correlation           0.954777
Hypothesized Mean Difference  0
df                            9
t Stat                        3.324651
P(T<=t) one-tail              0.004437
t Critical one-tail           1.833113
P(T<=t) two-tail              0.008874
t Critical two-tail           2.262157

Independent Samples

• Different data sources: unrelated, independent. The sample selected from one population has no effect on the sample selected from the other population.
• Goal: test a hypothesis or form a confidence interval for the difference between two population means, µ1 − µ2.
• The point estimate for the difference is X̄1 − X̄2.

Difference Between Two Means

Population means, independent samples:
• σ1 and σ2 known: use a Z test statistic.
• σ1 and σ2 unknown, assumed equal: use Sp to estimate the unknown σ; use a t test statistic and the pooled standard deviation.
• σ1 and σ2 unknown, not assumed equal: use S1 and S2 to estimate the unknown σ1 and σ2; use a separate-variance t test.


σ1 and σ2 Known

Assumptions:
• Samples are randomly and independently drawn.
• Population distributions are normal or both sample sizes are ≥ 30.
• Population standard deviations are known.

The test statistic is a Z value…




…and the standard error of X̄1 − X̄2 is
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$



σ1 and σ2 Known (continued)

The test statistic for µ1 − µ2 is:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Hypothesis Tests for Two Population Means

• Lower-tail test: H0: µ1 = µ2 vs H1: µ1 < µ2, i.e. H0: µ1 − µ2 = 0 vs H1: µ1 − µ2 < 0
• Upper-tail test: H0: µ1 = µ2 vs H1: µ1 > µ2, i.e. H0: µ1 − µ2 = 0 vs H1: µ1 − µ2 > 0
• Two-tail test: H0: µ1 = µ2 vs H1: µ1 ≠ µ2, i.e. H0: µ1 − µ2 = 0 vs H1: µ1 − µ2 ≠ 0

Two Population Means, Independent Samples: Hypothesis tests for µ1 − µ2

• Lower-tail test: H0: µ1 − µ2 = 0 vs H1: µ1 − µ2 < 0. Reject H0 if Z < −Zα.
• Upper-tail test: H0: µ1 − µ2 = 0 vs H1: µ1 − µ2 > 0. Reject H0 if Z > Zα.
• Two-tail test: H0: µ1 − µ2 = 0 vs H1: µ1 − µ2 ≠ 0. Reject H0 if Z < −Zα/2 or Z > Zα/2.


Confidence Interval, σ1 and σ2 Known

The confidence interval for µ1 − µ2 is:
$$(\bar{X}_1 - \bar{X}_2) \pm Z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
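The slides give no numerical example for the σ-known case, so the Python sketch below uses hypothetical inputs purely for illustration. `two_sample_z` is a hypothetical helper name. `statistics.NormalDist` from the standard library supplies the normal CDF and quantile.

```python
import math
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, alpha=0.05, delta0=0.0):
    """Two-sided Z test and CI for mu1 - mu2 when sigma1 and sigma2 are known."""
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)   # SE of X1-bar minus X2-bar
    z = ((xbar1 - xbar2) - delta0) / se                   # H0: mu1 - mu2 = delta0
    p_two = 2 * (1 - NormalDist().cdf(abs(z)))
    z_q = NormalDist().inv_cdf(1 - alpha / 2)            # about 1.96 for alpha = 0.05
    diff = xbar1 - xbar2
    return z, p_two, (diff - z_q * se, diff + z_q * se)


# Hypothetical inputs, not data from the slides
z, p, ci = two_sample_z(52.0, 50.0, sigma1=4.0, sigma2=3.0, n1=25, n2=25)
print(f"Z = {z:.2f}, two-sided p = {p:.4f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```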




σ1 and σ2 Unknown, Assumed Equal

Assumptions:
• Samples are randomly and independently drawn.
• Populations are normally distributed or both sample sizes are at least 30.
• Population variances are unknown but assumed equal.




σ1 and σ2 Unknown, Assumed Equal (continued)

Forming interval estimates: the population variances are assumed equal, so use the two sample variances and pool them to estimate the common σ². The pooled variance is
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$


The test statistic for µ1 − µ2 is a t value with (n1 + n2 − 2) degrees of freedom:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$





Confidence Interval, σ1 and σ2 Unknown (assumed equal)

$$(\bar{X}_1 - \bar{X}_2) \pm t_{n_1+n_2-2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$



Pooled-Variance t Test: Example

Example: compare the mean systolic blood pressure (SBP) between OC and non-OC users.

Assuming both populations are approximately normal with equal variances, is there a difference in average SBP (α = 0.05)?

SBP level:
ID  Not using OCs  Using OC
1   115            128
2   112            115
3   107            106
4   119            128
5   115            122
6   138            145
7   126            132
8   105            109
9   104            102
10  115            117

EXCEL DATA Analysis

With raw data, we can perform the t-test using EXCEL → Data → Data Analysis → t-Test: Two-Sample Assuming Equal Variances.

EXCEL T-test Analysis Results

t-Test: Two-Sample Assuming Equal Variances

                              Variable 1    Variable 2
Mean                          115.6         120.4
Variance                      106.2666667   174.9333333
Observations                  10            10
Pooled Variance               140.6
Hypothesized Mean Difference  0
df                            18
t Stat                        -0.905177144
P(T<=t) one-tail              0.188664198
P(T<=t) two-tail              0.377328397

Solution

H0: µ1 − µ2 = 0 vs Ha: µ1 − µ2 ≠ 0, α = 0.05, df = (10 − 1) + (10 − 1) = 18

Critical values: ±2.1 (TINV(0.05,18) = 2.1)

Test statistic: t = −0.91

Decision: −2.1 < −0.91 < 2.1, so do not reject H0 at α = 0.05. Equivalently, p-value = 0.38 > 0.05.

Conclusion: the mean blood pressures of the OC and non-OC groups do not significantly differ from each other.

[Figure: t18 distribution with rejection regions of area .025 below −2.1 and above 2.1.]

Calculating the Test Statistic

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(10 - 1)(106.3) + (10 - 1)(174.9)}{(10 - 1) + (10 - 1)} = 140.6$$

The test statistic is:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(115.6 - 120.4) - 0}{\sqrt{140.6\left(\dfrac{1}{10} + \dfrac{1}{10}\right)}} = -0.91$$
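The pooled-variance calculation above can be checked with a short Python sketch on the SBP data. It uses only the standard library. `t_cdf` and `t_ppf` are hypothetical numerical helpers for the t distribution, not Excel's routines.

```python
import math
import statistics


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: Simpson's rule on [0, |x|] plus symmetry."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p, df):
    """Quantile q with P(T <= q) = p, by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


no_oc = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]   # group 1
oc = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]      # group 2

n1, n2 = len(no_oc), len(oc)
s1_sq, s2_sq = statistics.variance(no_oc), statistics.variance(oc)
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / ((n1 - 1) + (n2 - 1))  # pooled variance
diff = statistics.mean(no_oc) - statistics.mean(oc)                     # H0: mu1 - mu2 = 0
t = diff / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
p_two = 2 * t_cdf(-abs(t), df)
t_crit = t_ppf(0.975, df)
print(f"Sp^2 = {sp_sq:.1f}, t = {t:.2f}, df = {df}, p = {p_two:.2f}, critical = +/-{t_crit:.2f}")
```

The pooled variance, t statistic, and p-value should match the Excel output (Pooled Variance 140.6, t Stat −0.905, two-tail p 0.377). The critical value prints as 2.10, which is the slide's 2.1.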


σ1 and σ2 Unknown, Not Assumed Equal

Assumptions:
• Samples are randomly and independently drawn.
• Populations are normally distributed or both sample sizes are at least 30.
• Population variances are unknown and cannot be assumed to be equal.




σ1 and σ2 Unknown, Unequal Variances (continued)

The test statistic for µ1 − µ2 is:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$


σ1 and σ2 Unknown, Not Assumed Equal (continued)

Forming the test statistic: the population variances are not assumed equal, so include the two sample variances separately in the computation of the test statistic. The test statistic can be approximated by a t distribution with ν degrees of freedom (see next slide).


σ1 and σ2 Unknown, Assumed Unequal (Satterthwaite's method)

The number of degrees of freedom is the integer portion of:
$$\nu = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$$


σ1 and σ2 Unknown, Unequal Variances (continued)

The confidence interval for µ1 − µ2 is:
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\nu}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

Unequal Variances: Example

Assuming both populations are approximately normal with unequal variances, test for the equality of the mean SBP levels of the non-OC and OC groups (α = 0.05). Use the same SBP data as in the pooled-variance example.

EXCEL DATA Analysis

EXCEL → Data → Data Analysis → t-Test: Two-Sample Assuming Unequal Variances

t-Test: Two-Sample Assuming Unequal Variances

                              Variable 1   Variable 2
Mean                          115.6        120.4
Variance                      106.2667     174.9333
Observations                  10           10
Hypothesized Mean Difference  0
df                            17
t Stat                        -0.90518
P(T<=t) one-tail              0.189011
P(T<=t) two-tail              0.378022


Solution

H0: µ1 − µ2 = 0 vs Ha: µ1 − µ2 ≠ 0, α = 0.05, degrees of freedom = 17

Critical values: from Table 5, t17,0.975 = 2.11, so reject H0 if t < −2.11 or t > 2.11.

Test statistic: t = −0.91

Decision: do not reject H0 at α = 0.05 (−2.11 < −0.91 < 2.11).

Conclusion: the mean blood pressures of the OC and non-OC groups do not significantly differ from each other.

[Figure: t17 distribution with rejection regions of area .025 below −2.11 and above 2.11.]

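Calculating the degree of freedom

$$\nu = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_2^2/n_2\right)^2}{n_2 - 1}} = \frac{\left(\dfrac{106.27}{10} + \dfrac{174.93}{10}\right)^2}{\dfrac{(106.27/10)^2}{9} + \dfrac{(174.93/10)^2}{9}} \approx 16.99$$

Excel reports 17 d.f. Taking the integer portion instead gives 16, with t16,0.975 = 2.12. Either way, |t| = 0.91 is far below the critical value.

Test Statistic:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \frac{(115.6 - 120.4) - 0}{\sqrt{\dfrac{106.27}{10} + \dfrac{174.93}{10}}} = -0.91$$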
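The separate-variance (Satterthwaite) calculation can be sketched in Python the same way. The helpers are hypothetical and numerical (standard library only). Unlike a printed table, this `t_cdf` accepts a fractional ν. The sketch therefore computes the p-value with ν itself and the critical value with its integer portion, as the rule above states.

```python
import math
import statistics


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t (df may be fractional): Simpson's rule plus symmetry."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p, df):
    """Quantile q with P(T <= q) = p, by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


no_oc = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]   # group 1
oc = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]      # group 2

v1 = statistics.variance(no_oc) / len(no_oc)    # S1^2 / n1
v2 = statistics.variance(oc) / len(oc)          # S2^2 / n2
t = (statistics.mean(no_oc) - statistics.mean(oc)) / math.sqrt(v1 + v2)  # H0: mu1 - mu2 = 0
nu = (v1 + v2) ** 2 / (v1 ** 2 / (len(no_oc) - 1) + v2 ** 2 / (len(oc) - 1))
p_two = 2 * t_cdf(-abs(t), nu)                  # fractional nu used directly
t_crit = t_ppf(0.975, math.floor(nu))           # integer portion of nu
print(f"t = {t:.2f}, nu = {nu:.2f}, two-sided p = {p_two:.2f}")
print(f"critical value with {math.floor(nu)} d.f.: +/-{t_crit:.2f}")
```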

Comparing population variances: F test

Testing for the Equality of Two Variances

Suppose that we have two independent samples from N(µ1, σ1²) and N(µ2, σ2²) populations, respectively. We want to test the hypotheses

H0: σ1² = σ2² vs H1: σ1² ≠ σ2²

Test statistic: the variance ratio S1²/S2², which is based on the F-distribution (Fisher, Snedecor).

Reject H0 if the variance ratio is either too large (>> 1) or too small (<< 1), and do not reject otherwise.

F distribution

Under H0: σ1² = σ2²,
$$F = \frac{S_1^2}{S_2^2} \sim F_{n_1-1,\,n_2-1}$$
• F is a family of distributions parameterized by df1 = n1 − 1 and df2 = n2 − 1, denoted F_{n1−1, n2−1}.
• It is always positive (generally positively skewed).

Reject H0 if F > F_{n1−1, n2−1, 1−α/2} or F < F_{n1−1, n2−1, α/2}.

F Test for Two Variances Critical Values

[Figure: F distribution with a rejection region below F_{n1−1, n2−1, α/2}, a rejection region above F_{n1−1, n2−1, 1−α/2}, and the acceptance region in between.]

Reject H0 if F > F_{n1−1, n2−1, 1−α/2} or F < F_{n1−1, n2−1, α/2}.

Unequal Variances: Example

Test for the equality of the two variances of the SBP data above (α = 0.05).

EXCEL DATA Analysis

EXCEL → Data → Data Analysis → F-Test Two-Sample for Variances. Sample output:

F-Test Two-Sample for Variances

                      Variable 1   Variable 2
Mean                  120.4        115.6
Variance              174.9333     106.2667
Observations          10           10
df                    9            9
F                     1.646173
P(F<=f) one-tail      0.234645
F Critical one-tail   3.178893

NOTE:
• This Excel procedure is only for a one-sided test.
• If the test is two-sided, you have two options. First, you can divide the given value of α by 2 and input the result as the level of significance. Second, you can always use the p-value criterion: for a two-sided test, multiply the one-sided p-value by 2.
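The F test can also be sketched without Excel. The sketch uses only the standard library. `f_cdf` integrates the F density with Simpson's rule; as written, it is valid for df1 > 2, which holds here. `f_ppf` inverts it by bisection. Both are hypothetical helpers, not Excel's implementation. The two-sided p-value doubles the smaller tail, which is the second option in the note above.

```python
import math
import statistics


def f_cdf(x, d1, d2, steps=4000):
    """P(F <= x) for F(d1, d2) by Simpson's rule on [0, x].

    Sketch only: assumes d1 > 2, so the density is 0 at x = 0.
    """
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
             + (d1 / 2) * math.log(d1 / d2))

    def pdf(u):
        if u == 0.0:
            return 0.0
        return math.exp(log_c + (d1 / 2 - 1) * math.log(u)
                        - (d1 + d2) / 2 * math.log1p(d1 * u / d2))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3


def f_ppf(p, d1, d2):
    """Quantile of F(d1, d2) by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


oc = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]
no_oc = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]

# As in the Excel output, the OC group (larger variance) is the numerator
F = statistics.variance(oc) / statistics.variance(no_oc)
d1, d2 = len(oc) - 1, len(no_oc) - 1
cdf_F = f_cdf(F, d1, d2)
p_one = 1 - cdf_F                       # upper tail, Excel's "P(F<=f) one-tail" row
p_two = 2 * min(cdf_F, p_one)           # two-sided p-value
lower, upper = f_ppf(0.025, d1, d2), f_ppf(0.975, d1, d2)
print(f"F = {F:.4f}, one-tail p = {p_one:.4f}, two-sided p = {p_two:.4f}")
print(f"Reject H0 at alpha = 0.05 if F < {lower:.3f} or F > {upper:.3f}")
```

The one-tail p-value should agree with Excel's 0.234645. The two-sided p-value (about 0.47) exceeds 0.05, so equality of the two variances is not rejected.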

Two Sample Inference

Two independent populations or paired samples?
• Paired samples: paired z-test (if σ known) or paired t-test (if σ unknown), based on the observed differences.
• Independent samples: z-test if the population variances are given. Otherwise, use an F test to determine whether the population variances can be assumed equal:
  • If yes, t-test with a pooled estimate of variance.
  • If no, t-test with separate estimates of variance.
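The decision steps above can be summarized as a small Python function. `choose_two_sample_test` is a hypothetical helper that only returns the name of the suggested test; it does not run the test.

```python
def choose_two_sample_test(paired, sigma_known, equal_variances=None):
    """Follow the decision steps above and return the suggested test's name."""
    if paired:
        # Paired designs work with the observed differences D_i
        return "paired z-test" if sigma_known else "paired t-test"
    if sigma_known:
        return "two-sample z-test"
    if equal_variances is None:
        raise ValueError("run the F test first to decide whether variances can be assumed equal")
    return "pooled-variance t-test" if equal_variances else "separate-variance t-test"


# OC data treated as independent samples: the F test gave p = 2 * 0.2346 > 0.05
print(choose_two_sample_test(paired=False, sigma_known=False, equal_variances=True))
```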

Date post:	14-Jul-2020