Z-test and t-test

Xuhua Xia

[email protected]

http://dambe.bio.uottawa.ca

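Properties of a Normal Distribution

68.27% of the measurements lie within the range of $\mu \pm \sigma$, 95.45% lie within $\mu \pm 2\sigma$, and 99.73% lie within $\mu \pm 3\sigma$.

50% lie within $\mu \pm 0.67\sigma$, 95% lie within $\mu \pm 1.96\sigma$, 97.5% lie within $\mu \pm 2.24\sigma$, 99% lie within $\mu \pm 2.58\sigma$, 99.5% lie within $\mu \pm 2.81\sigma$, and 99.9% lie within $\mu \pm 3.29\sigma$.

Given $\mu = 70$ kg and $\sigma = 10$ kg for a normal distribution (of body weight), what is the probability of a body weight of 40 kg belonging to the population?

The normal deviate:

$$Z = \frac{X_i - \mu}{\sigma}$$

Standard deviation and standard error of the mean:

$$\sigma_{\bar{X}} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$$

The standard deviate pertaining to the normal distribution of means:

$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$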
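The body-weight question can be worked through numerically. The sketch below is a minimal illustration, not part of the original slides: it computes the normal deviate for 40 kg and the two-tailed probability of a deviate at least that extreme, using only `math.erf` and `math.erfc` from the Python standard library. The function names are hypothetical, and the choice of a two-tailed probability is an assumption.

```python
import math

def normal_two_tailed_p(x, mu, sigma):
    """Return (z, p), where p = P(|Z| >= |z|) for a normal distribution."""
    z = (x - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

def coverage(k):
    """Fraction of a normal distribution lying within mu +/- k * sigma."""
    return math.erf(k / math.sqrt(2))

# Body weight example: mu = 70 kg, sigma = 10 kg, observation = 40 kg.
z, p = normal_two_tailed_p(40, mu=70, sigma=10)
print(f"z = {z:.2f}, two-tailed p = {p:.4f}")

# Coverage for a few multiples of sigma listed on the slide.
for k in (0.67, 1, 1.96, 2.58, 3):
    print(f"within {k} sigma: {100 * coverage(k):.2f}%")
```

A weight of 40 kg lies 3 standard deviations below the mean, so fewer than 0.3% of individuals from this population are expected to be that far from 70 kg in either direction.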

The z-score

The government has certain regulations on commercial products. Suppose that packages of sugar labeled as 2 kg should have a mean weight of 2 kg and a standard deviation of 0.10 kg. If a package of sugar labeled 2 kg that you bought from a store weighs 1.82 kg, what is the z-score? Can you present the package as evidence that the manufacturer has violated the government regulation?

$$Z = \frac{X_i - \mu}{\sigma}$$

95% of the packages are expected to lie within $\mu \pm 1.96\sigma$.
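The sugar-package question reduces to one line of arithmetic. The sketch below is a hedged illustration: the function name `z_score` is hypothetical, and the use of 1.96 as a two-tailed 5% cutoff is an assumption about the decision criterion rather than something the regulation states.

```python
def z_score(x, mu, sigma):
    """Normal deviate of a single observation x."""
    return (x - mu) / sigma

# Labeled weight 2 kg, standard deviation 0.10 kg, observed package 1.82 kg.
z = z_score(1.82, mu=2.0, sigma=0.10)

# Assumed criterion: the package is unusual if |z| exceeds 1.96.
outside_95_range = abs(z) > 1.96
print(f"z = {z:.2f}; outside the mu +/- 1.96 sigma range: {outside_95_range}")
```

Under this assumed criterion the package lies inside the range expected for 95% of packages, so on its own it would not count as evidence of a violation.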

Normal Distribution

[Figure: histogram of the body weight of 10,000 adult men (mean = 70 kg, std dev = 10 kg). x-axis: Body Weight, 29.91 to 106.89 kg; y-axis: Frequency, 0 to 350.]

Frequency Distribution of Means

[Figure: histogram. x-axis: Body Weight; y-axis: Frequency.]

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
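The relationship between the standard deviation and the standard error of the mean can be checked by simulation. The sketch below is an illustration under stated assumptions: the population values (70 kg, 10 kg) come from the body-weight example, while the sample size of 25, the 4000 repeated samples, and the random seed are hypothetical choices made only for this demonstration.

```python
import math
import random
import statistics

random.seed(1)            # fixed seed so the sketch is reproducible
MU, SIGMA = 70.0, 10.0    # population mean and SD from the body-weight example
N = 25                    # hypothetical sample size
N_SAMPLES = 4000          # hypothetical number of repeated samples

# Draw many samples of size N and record the mean of each one.
means = []
for _ in range(N_SAMPLES):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))

observed_sd_of_means = statistics.stdev(means)
expected_se = SIGMA / math.sqrt(N)   # sigma / sqrt(n)

print(f"SD of the sample means: {observed_sd_of_means:.3f}")
print(f"sigma / sqrt(n):        {expected_se:.3f}")
```

The two printed values should be close: the distribution of means is narrower than the distribution of individual weights by a factor of the square root of the sample size.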

Darwin’s Breeding Experiment

Species  Outbreed  Inbreed  Difference
1        100       51       49
2        222       289      -67
3        121       113      8
4        433       417      16
5        222       216      6
6        111       88       23
7        534       506      28
8        432       391      41
9        99        85       14
10       445       416      29
11       112       56       56
12       333       309      24
13       222       147      75
14       422       362      60
15       101       149      -48

Is the mean difference significantly larger than 0?

Wrong method assuming normal distribution:

$\bar{X}_D = 20.933$; $s = 37.744$; $n = 15$;

$$s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{37.744}{\sqrt{15}} = 9.75$$

$$Z = \frac{\bar{X}_D - 0}{s_{\bar{X}}} = \frac{20.933}{9.75} = 2.147 > 1.96$$

Therefore, the mean difference is significantly larger than zero, i.e., inbreeding does reduce seed production.
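The summary statistics for Darwin's differences can be reproduced directly from the Difference column. The sketch below is a minimal check using Python's `statistics` module; the variable names are hypothetical, and the comparison against 1.96 mirrors the normal-distribution method that the slide labels as wrong for a sample this small.

```python
import math
import statistics

# Difference column (Outbreed minus Inbreed) for the 15 pairs.
differences = [49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48]

n = len(differences)
mean_d = statistics.mean(differences)    # mean difference
sd_d = statistics.stdev(differences)     # sample standard deviation (n - 1)
se_d = sd_d / math.sqrt(n)               # standard error of the mean
z = (mean_d - 0) / se_d                  # deviate against a mean of zero

print(f"n = {n}, mean = {mean_d:.3f}, s = {sd_d:.3f}, SE = {se_d:.3f}")
print(f"Z = {z:.3f}; exceeds 1.96: {z > 1.96}")
```

Carrying full precision gives a standard error slightly below the rounded 9.75, so the computed deviate may differ from 2.147 in the third decimal place.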

Problem of Small Samples

I may premise that if we took by chance a dozen or score of men belonging to two nations and measured them, it would I presume be very rash to form any judgment from such small numbers on their (the nations’) average heights. But the case is somewhat different with my … plants, as they were exactly of the same age, were subjected from first to last to the same conditions, and were descended from the same parents. -- Darwin, quoted in Fisher’s The Design of Experiments.


William S. Gosset & t Distribution

[Figure: the normal distribution and the t distribution drawn together. x-axis: Body Weight; y-axis: Frequency.]

The t distribution is wider and flatter than the normal distribution.

t distribution

• The t distribution depends on the degrees of freedom (DF). For Darwin’s data with a sample size of 15, DF = 15 - 1 = 14.

• With the t distribution with DF = 14, we expect 95% of the observations to fall within the range of mean ± 2.145 STD.

• Remember that for a normal distribution, 95% of the observations are expected to fall within the range of $\mu \pm 1.96\sigma$.

• For the paired-sample t-test with the null hypothesis being Mean1 = Mean2 (or MeanD = 0):

$$t = \frac{\bar{X}_D - 0}{s_{\bar{X}}} = \frac{20.933}{9.75} = 2.147 > 2.145$$
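The paired-sample t statistic can be computed from the two original columns rather than from the precomputed differences. The sketch below is a hedged illustration: `paired_t` is a hypothetical helper, and the critical value 2.145 for DF = 14 is taken from the slide rather than computed, because the Python standard library has no t distribution.

```python
import math
import statistics

outbred = [100, 222, 121, 433, 222, 111, 534, 432, 99, 445, 112, 333, 222, 422, 101]
inbred = [51, 289, 113, 417, 216, 88, 506, 391, 85, 416, 56, 309, 147, 362, 149]

def paired_t(a, b):
    """Return (t, df) for the null hypothesis that the mean of a - b is 0."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    se = statistics.stdev(d) / math.sqrt(n)   # standard error of the mean difference
    return statistics.mean(d) / se, n - 1

T_CRIT_DF14 = 2.145   # two-tailed 5% critical value for DF = 14, as quoted on the slide

t, df = paired_t(outbred, inbred)
print(f"t = {t:.3f}, DF = {df}")
print(f"exceeds the t critical value {T_CRIT_DF14}: {t > T_CRIT_DF14}")
```

The statistic is the same number as the Z computed under the normal distribution; only the reference value changes, from 1.96 to 2.145, which leaves the result barely beyond the cutoff.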

T-Test

• The t-test can be used to test
  – the difference in mean between two samples (paired or unpaired),
  – a sample mean against the mean of a known population (e.g., the concentration of a medicine set as a standard by the government),
  – whether a single individual observation belongs to a sample with sample size larger than one.

• The normal distribution and the Student’s t distribution. Why should the statistic t take into consideration both the mean difference and the variance?

• How to apply the test using Excel or SAS.

• The assumptions.

• Alternative methods: Wilcoxon rank-sum test or Mann-Whitney U test.

The Essence of the t Statistic

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\text{pooled}\,\bar{X}}}$$

[Figure: pairs of distributions compared on x-axes running from -6 to 6 and from -18 to 18. Panel labels: "Same variance, smaller mean difference" and "Same mean difference, larger variance".]

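More on variance and SE

Two independent variables: $x_1$, $x_2$ sampled from two normal distributions

$$s^2_{x_1 + x_2} = E(s^2_{x_1} + s^2_{x_2})$$

$$s^2_{x_1 - x_2} = E(s^2_{x_1} + s^2_{x_2})$$

A better estimate:

$$s^2_{x_1 - x_2} = E(s^2_{x_1} + s^2_{x_2}) = E\left(\frac{SS_1 + SS_2}{DF_1 + DF_2} + \frac{SS_1 + SS_2}{DF_1 + DF_2}\right)$$

$$S_{\bar{x}_1} = \frac{s_1}{\sqrt{n_1}}; \qquad S_{\bar{x}_2} = \frac{s_2}{\sqrt{n_2}}$$

$S_{\bar{x}_1 - \bar{x}_2}$:

with $n_1 = n_2 = n$:

$$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2 + s_2^2}{n}}$$

with $n_1 \ne n_2$, but both large:

$$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Estimate of $S_{\bar{x}_1 - \bar{x}_2}$ assuming equal variance:

$$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}, \qquad s_p^2 = \frac{SS_1 + SS_2}{DF_1 + DF_2}$$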
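The standard error of the difference between two means can be sketched as two small functions, one using each sample's own variance and one using a pooled variance under the equal-variance assumption. This is an illustration only: the function names and the numeric inputs are hypothetical, and the pooled variance is taken as the summed sums of squares divided by the summed degrees of freedom.

```python
import math

def se_diff_unpooled(s1, n1, s2, n2):
    """SE of the mean difference using each sample's own variance."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def se_diff_pooled(s1, n1, s2, n2):
    """SE of the mean difference using a pooled variance (equal variance assumed)."""
    ss1 = s1**2 * (n1 - 1)            # sum of squares recovered from s1
    ss2 = s2**2 * (n2 - 1)            # sum of squares recovered from s2
    pooled_var = (ss1 + ss2) / ((n1 - 1) + (n2 - 1))
    return math.sqrt(pooled_var / n1 + pooled_var / n2)

# Hypothetical inputs: equal sample sizes, then unequal sample sizes.
print(se_diff_unpooled(2.0, 10, 3.0, 10), se_diff_pooled(2.0, 10, 3.0, 10))
print(se_diff_unpooled(2.0, 5, 3.0, 20), se_diff_pooled(2.0, 5, 3.0, 20))
```

With equal sample sizes the two estimates coincide; with unequal sample sizes they differ, which is why the choice of estimate matters in that case.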

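Computation for unpaired t-test

               Sample 1       Sample 2
Sample size    n1             n2
Mean           $\bar{x}_1$    $\bar{x}_2$
Standard dev.  s1             s2

Sample size    7              7
Mean           76.857         82.714
Standard dev.  2.545          3.147

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}}$$

$$t = \frac{76.857 - 82.714}{\sqrt{\dfrac{2.545^2 + 3.147^2}{7}}} = -3.828$$

Df = (7-1) + (7-1) = 12

$S_{\bar{x}_1 - \bar{x}_2}$:

with $n_1 = n_2 = n$:

$$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2 + s_2^2}{n}}$$

with $n_1 \ne n_2$, but both large:

$$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Estimate of $S_{\bar{x}_1 - \bar{x}_2}$ assuming equal variance:

$$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}, \qquad s_p^2 = \frac{SS_1 + SS_2}{DF_1 + DF_2}$$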
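The unpaired computation can be reproduced from the summary statistics alone. The sketch below is a minimal illustration: `unpaired_t` is a hypothetical helper that uses the standard error built from each sample's variance, which for equal sample sizes is identical to the pooled, equal-variance estimate.

```python
import math

def unpaired_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, df) for two independent samples given their summary statistics."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # SE of the difference in means
    t = (mean1 - mean2) / se
    df = (n1 - 1) + (n2 - 1)
    return t, df

# Summary statistics from the slide: n = 7 in each sample.
t, df = unpaired_t(76.857, 2.545, 7, 82.714, 3.147, 7)
print(f"t = {t:.2f}, DF = {df}")
```

The result is about -3.83 with 12 degrees of freedom; the sign only reflects which sample is subtracted from which.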

Paired-sample t-test: 3

How should we allocate the two crop varieties to the plots? What comparison would be fair?

Using blocks to reduce confounding environmental factors (everything else being equal except for the treatment effect) in evaluating the protein content of two wheat varieties.

[Figure: two field layouts, each divided into Block 1, Block 2, Block 3 and Block 4; 1 and 2 denote the two varieties.]

Layout with the varieties grouped together:

1 1 1 1
1 1 1 1
2 2 2 2
2 2 2 2

Layout with the varieties mixed:

1 2 2 1
2 1 1 2
2 1 2 1
1 2 1 2
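One common way to make the comparison fair is to give every block the same number of plots of each variety and to randomize their order within the block. The sketch below illustrates that idea only; it is an assumption about the allocation scheme, not a procedure stated on the slide, and the function name, the plot counts, and the seed are hypothetical.

```python
import random

def allocate_within_blocks(n_blocks, varieties, plots_per_variety, seed=None):
    """Return one shuffled list of variety labels per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        # Every block receives the same number of plots of each variety.
        plots = [v for v in varieties for _ in range(plots_per_variety)]
        rng.shuffle(plots)            # random order of plots inside the block
        layout.append(plots)
    return layout

layout = allocate_within_blocks(4, varieties=(1, 2), plots_per_variety=2, seed=0)
for i, plots in enumerate(layout, start=1):
    print(f"Block {i}: {plots}")
```

Because each block contains both varieties in equal numbers, the two varieties can be compared within a block, where environmental conditions are most alike.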

The Wilcoxon-Mann-Whitney Test

• Statistical significance tests can be grouped into
  – Parametric tests, e.g., t-test, ANOVA
  – Non-parametric tests, e.g., Wilcoxon-Mann-Whitney test, sign test, runs test.

When to Use Non-parametric Tests

• Parametric tests depend on the assumed probability distributions, e.g., normal distribution, t distribution, etc., and would give misleading results when the assumptions are violated.

• Non-parametric tests are called distribution-free tests and can be used in cases where the parametric tests are inappropriate.

• Parametric tests are more powerful than their non-parametric counterparts when the underlying assumptions are met.

Wilcoxon-Mann-Whitney Test

• The Wilcoxon-Mann-Whitney test is the non-parametric equivalent of the t-test.

• The original data are rank-transformed before applying the test.

• The test statistic is U.
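The rank transformation and the U statistic can be sketched in a few lines. This is a minimal illustration under stated assumptions: the two samples are hypothetical numbers invented for the example, tied values receive the average of their ranks, and U is reported as the smaller of the two possible U values, which is one common convention.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1         # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(sample1, sample2):
    """Return U = min(U1, U2) computed from the pooled ranks."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])              # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)

# Hypothetical data, for illustration only.
sample1 = [1.1, 2.3, 2.9, 3.5]
sample2 = [3.0, 4.2, 5.1, 6.4]
print("U =", mann_whitney_u(sample1, sample2))
```

A small U means the two samples barely overlap in rank; judging significance requires comparing U with a table of critical values for the two sample sizes, which this sketch does not include.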

Date post:	14-Jan-2016