
#05. Z-scores, Normal Distribution - Michigan State …

Date post: 10-Jul-2018
Upload: trinhdien
STT 200

Arnab Bhattacharjee

This note is based on Chapter 6.

Acknowledgement: The author is indebted to Dr. Ashoke Sinha, Dr. Jennifer Kaplan and Dr. Parthanil Roy for allowing him to use/edit many of their slides.

Z-SCORES

Comparison with unit-free measurement


How to compare apples with oranges?

• A college admissions committee is looking at the files of two candidates, one with a total SAT score of 1500 and another with an ACT score of 22. Which candidate scored better?

• How do we compare things when they are measured on different scales?

• We need to standardize the values.


How to standardize?

• Subtract the mean from the value, then divide this difference by the standard deviation.

• The standardized value = the z-score = $\dfrac{\text{value} - \text{mean}}{\text{std. dev.}}$

• z-scores are free of units.


z-scores: An Example

Data: 4, 3, 10, 12, 8, 9, 3 (n = 7 in this case)

Mean = (4 + 3 + 10 + 12 + 8 + 9 + 3)/7 = 49/7 = 7.

Standard Deviation = 3.65.

Data    z-score
4       (4 − 7)/3.65 = −0.82
3       (3 − 7)/3.65 = −1.10
10      (10 − 7)/3.65 = 0.82
12      (12 − 7)/3.65 = 1.37
8       (8 − 7)/3.65 = 0.27
9       (9 − 7)/3.65 = 0.55
3       (3 − 7)/3.65 = −1.10
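As a cross-check of the table, here is a minimal Python sketch using the standard-library `statistics` module (an assumption: the slides do this arithmetic by hand). `stdev` is the sample standard deviation, dividing by n − 1, which is what reproduces the 3.65 above.

```python
from statistics import mean, stdev

# Data from the example above (n = 7)
data = [4, 3, 10, 12, 8, 9, 3]

m = mean(data)   # 49/7 = 7
s = stdev(data)  # sample standard deviation (divides by n - 1), about 3.65

# z-score: distance from the mean, measured in standard deviations
z_scores = [(x - m) / s for x in data]

for x, z in zip(data, z_scores):
    print(f"{x:>3}  z = {z:+.2f}")
```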


Interpretation of z-scores

• The z-scores measure the distance of the data values from the mean on the standard deviation scale.

• A z-score of 1 means that the data value is 1 standard deviation above the mean.

• A z-score of −1.2 means that the data value is 1.2 standard deviations below the mean.

• Regardless of the direction, the further a data value is from the mean, the more unusual it is.

• A z-score of −1.3 is more unusual than a z-score of 1.2, because |−1.3| > |1.2|.


How to use z-scores?

• A college admissions committee is looking at the files of two candidates, one with a total SAT score of 1500 and another with an ACT score of 22. Which candidate scored better?

• SAT score mean = 1600, std dev = 500.

• ACT score mean = 23, std dev = 6.

• SAT score 1500 has z-score = (1500 − 1600)/500 = −0.2.

• ACT score 22 has z-score = (22 − 23)/6 = −0.17.

• ACT score 22 is better than SAT score 1500, because its z-score is higher (−0.17 > −0.20): it lies fewer standard deviations below its mean.
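The comparison can be written as a tiny Python sketch. `z_score` is a hypothetical helper name; the means and standard deviations are the ones stated on the slide.

```python
def z_score(value, mean, sd):
    """Standardize a value: how many standard deviations it lies from the mean."""
    return (value - mean) / sd

# Means and standard deviations as given on the slide
z_sat = z_score(1500, mean=1600, sd=500)  # -0.20
z_act = z_score(22, mean=23, sd=6)        # about -0.17

# On the common, unit-free z scale, the higher score is the better one.
better = "ACT 22" if z_act > z_sat else "SAT 1500"
print(f"SAT z = {z_sat:.2f}, ACT z = {z_act:.2f} -> {better} is better")
```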


Which is more unusual?

A. A 58 in tall woman

z-score = (58 − 63.6)/2.5 = −2.24.

B. A 64 in tall man

z-score = (64 − 69)/2.8 = −1.79.

C. They are the same.

Heights of adult men have

• mean of 69.0 in.

• std. dev. of 2.8 in.

Heights of adult women have

• mean of 63.6 in.

• std. dev. of 2.5 in.


Using z-scores to solve problems

An example using height data and U.S. Marine Corps and Army height requirements

Question: Are the height restrictions set by the U.S. Army and the U.S. Marine Corps more restrictive for men or for women, or are they roughly the same?


Heights of adult women have

• mean of 63.6 in.

• standard deviation of 2.5 in.

Heights of adult men have

• mean of 69.0 in.

• standard deviation of 2.8 in.

Height Restrictions (minimum heights)

U.S. Army: men 60 in, women 58 in

U.S. Marine Corps: men 64 in, women 58 in

Data from a National Health Survey

U.S. Army: men minimum 60 in, z-score = −3.21 (less restrictive); women minimum 58 in, z-score = −2.24 (more restrictive).

U.S. Marine Corps: men minimum 64 in, z-score = −1.79 (more restrictive); women minimum 58 in, z-score = −2.24 (less restrictive).
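A short Python sketch reproducing these z-scores and the restrictiveness call. The rule it encodes (the group whose minimum has the less negative z-score faces the stricter requirement) is the reasoning the slide implies; the variable names are illustrative.

```python
# Means and standard deviations (inches) of adult heights, from the slide
params = {"men": (69.0, 2.8), "women": (63.6, 2.5)}

# Minimum heights (inches) from the height-restriction table
minimums = {
    "U.S. Army": {"men": 60, "women": 58},
    "U.S. Marine Corps": {"men": 64, "women": 58},
}

z_by_service = {}
stricter = {}
for service, mins in minimums.items():
    # z-score of each group's minimum height
    z = {sex: (h - params[sex][0]) / params[sex][1] for sex, h in mins.items()}
    z_by_service[service] = z
    # A minimum with a larger (less negative) z-score sits closer to the
    # group's mean, so it excludes more of that group: the stricter rule.
    stricter[service] = max(z, key=z.get)
    print(service, {sex: round(v, 2) for sex, v in z.items()}, "->", stricter[service], "more restrictive")
```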

2004 Olympics, Women’s Heptathlon

Austra Skujyte (Lithuania): Shot Put = 16.40 m, Long Jump = 6.30 m.

Carolina Kluft (Sweden): Shot Put = 14.77 m, Long Jump = 6.78 m.

                           Shot Put    Long Jump
Mean (all contestants)     13.29 m     6.16 m
Std. Dev.                  1.24 m      0.23 m
n                          28          26

Which performance was better?

A. Skujyte’s shot put,

z-score of Skujyte’s shot put = 2.51.

B. Kluft’s long jump,

z-score of Kluft’s long jump = 2.70.

C. Both were the same.

Based on both shot put and long jump, whose performance was better?

A. Skujyte’s,

z-score: shot put = 2.51, long jump = 0.61.

Total z-score = (2.51+0.61) = 3.12.

B. Kluft’s,

z-score: shot put = 1.19, long jump = 2.70.

Total z-score = (1.19+2.70) = 3.89.

C. Both were the same.
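A minimal Python sketch of the combined score used above: standardize each event with the contestants' mean and standard deviation, then add the two z-scores. Names such as `stats` and `marks` are illustrative.

```python
# Means and standard deviations over all contestants (metres), from the table
stats = {"shot_put": (13.29, 1.24), "long_jump": (6.16, 0.23)}

marks = {
    "Skujyte": {"shot_put": 16.40, "long_jump": 6.30},
    "Kluft": {"shot_put": 14.77, "long_jump": 6.78},
}

totals = {}
for athlete, events in marks.items():
    # Standardize each event so metres in different events become comparable
    z = {e: (x - stats[e][0]) / stats[e][1] for e, x in events.items()}
    totals[athlete] = sum(z.values())
    print(athlete, {e: round(v, 2) for e, v in z.items()}, "total:", round(totals[athlete], 2))

print("Better overall:", max(totals, key=totals.get))
```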

NORMAL DISTRIBUTION

Bell-shaped symmetric curve

Effect of Standardization

• Standardization into z-scores does not change the shape of the histogram.

• Standardization into z-scores changes the center of the distribution by making the mean 0.

• Standardization into z-scores changes the spread of the distribution by making the standard deviation 1.
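To see these three properties on actual numbers, this Python sketch standardizes the seven-value data set from the earlier z-score example (standard-library `statistics` assumed) and checks the mean, the standard deviation, and the ordering of the values.

```python
from statistics import mean, stdev

data = [4, 3, 10, 12, 8, 9, 3]  # data set from the z-score example
m, s = mean(data), stdev(data)
z = [(x - m) / s for x in data]

# Standardizing moves the center to 0 and rescales the spread to 1
# (equal up to floating-point rounding).
print(f"mean of z = {mean(z):.6f}, sd of z = {stdev(z):.6f}")

# It is a linear map with positive slope 1/s, so the order and relative
# spacing of the values (hence the histogram's shape) are unchanged.
order_data = sorted(range(len(data)), key=lambda i: data[i])
order_z = sorted(range(len(z)), key=lambda i: z[i])
print(order_data == order_z)  # True
```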

The Normal Distribution

• In many data sets, the histogram is symmetric, unimodal and bell-shaped.

• Such distributions are known as normal distributions, and the data are said to be normally distributed.

The Histogram of z-scores

If data are normally distributed, then

• the histogram of z-scores is also symmetric, unimodal and bell-shaped;

• we can approximate the histogram by a bell-shaped curve called the normal curve.

68-95-99.7 (Empirical) Rule

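When data are bell-shaped, the z-scores of the data values follow the empirical rule: about 68% of the z-scores lie between −1 and 1, about 95% between −2 and 2, and about 99.7% between −3 and 3.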

More on Normal Distribution

The 68-95-99.7 (empirical) rule tells us that if data are normally distributed, then almost all (about 99.7%) of the data points lie within ±3 standard deviations of the mean.
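The three percentages can be recovered from the standard normal curve. This sketch uses Python's `statistics.NormalDist` (Python 3.8+), an assumption standing in for a normal table or calculator.

```python
from statistics import NormalDist

# For normally distributed data, the z-scores follow the standard normal curve.
std_normal = NormalDist(mu=0, sigma=1)

within = {}
for k in (1, 2, 3):
    # Area under the curve between -k and +k standard deviations
    within[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within ±{k} SD: {within[k]:.4f}")
# within ±1 SD: 0.6827
# within ±2 SD: 0.9545
# within ±3 SD: 0.9973
```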

Approximately what percent of U.S. women do you expect to be between 66 in and 67 in tall?

Heights of adult women are normally distributed with

• mean of 63.6 in,

• standard deviation of 2.5 in.

Use a TI-83/84 Plus.

• Press [2nd] & [VARS] (i.e. [DISTR])

• Select 2: normalcdf

• Format of command:

normalcdf(lower bound, upper bound, mean, std.dev.)

For this problem: normalcdf(66, 67, 63.6, 2.5) = 0.0816.

That is, about 8.2% of adult U.S. women have heights between 66 in and 67 in.
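Outside the calculator, the same area is a difference of two cumulative probabilities. A minimal sketch with Python's `statistics.NormalDist`; the `normalcdf` function below is a hypothetical helper that only mimics the TI argument order, not a standard Python function.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu, sigma):
    """Area under a normal curve between lower and upper.

    Hypothetical helper mirroring the TI-83/84 normalcdf argument order.
    """
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

p = normalcdf(66, 67, 63.6, 2.5)
print(f"{p:.4f}")  # 0.0816
```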

Approximately what percent of U.S. women do you expect to be less than 64 in tall?

Heights of adult women are normally distributed with

• mean of 63.6 in,

• standard deviation of 2.5 in.

• Note that here the upper bound is 64, but no lower bound is given.

• So take a very small value for the lower bound, say −1000.

For this problem: normalcdf(-1000, 64, 63.6, 2.5) = 0.5636.

That is, about 56.4% of adult U.S. women have heights less than 64 in.

Approximately what percent of U.S. women do you expect to be more than 58 in tall?

Heights of adult women are normally distributed with

• mean of 63.6 in,

• standard deviation of 2.5 in.

• Note that here the lower bound is 58, but no upper bound is given.

• So take a very large value for the upper bound, say 1000.

For this problem: normalcdf(58, 1000, 63.6, 2.5) = 0.987.

That is, about 98.7% of adult U.S. women have heights more than 58 in.
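In Python's `statistics.NormalDist` (used here as an assumed substitute for the TI-83/84), `cdf(x)` already covers the whole lower tail, so the ±1000 stand-in bounds are unnecessary: a lower tail is `cdf(upper)` and an upper tail is `1 - cdf(lower)`.

```python
from statistics import NormalDist

women = NormalDist(mu=63.6, sigma=2.5)

# Lower tail: P(height < 64). cdf() integrates from minus infinity,
# so no artificial lower bound such as -1000 is needed.
below_64 = women.cdf(64)

# Upper tail: P(height > 58) is the complement of the lower tail.
above_58 = 1 - women.cdf(58)

print(f"{below_64:.4f} {above_58:.3f}")  # 0.5636 0.987
```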

What about men’s heights?

Heights of adult men are normally distributed with

• mean of 69 in,

• standard deviation of 2.8 in.

• normalcdf(60, 1000, 69, 2.8) = 0.999.

Hence about 99.9% of adult men are taller than 60 in.

• normalcdf(64, 1000, 69, 2.8) = 0.963.

So about 96.3% of adult men are taller than 64 in.

• Thus, for the U.S. Army, the height restriction is more restrictive for women (about 98.7% meet the 58 in minimum) than for men (about 99.9% meet the 60 in minimum).

• But for the U.S. Marine Corps, the height restriction is more restrictive for men (about 96.3% meet the 64 in minimum) than for women (about 98.7% meet the 58 in minimum).
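The same comparison expressed as proportions, sketched with Python's `statistics.NormalDist` (an assumption, not the slides' tool): for each service, compute the share of each group meeting the minimum; the group with the smaller share faces the stricter rule.

```python
from statistics import NormalDist

# Adult height distributions (inches), as given on the slides
heights = {"men": NormalDist(mu=69, sigma=2.8), "women": NormalDist(mu=63.6, sigma=2.5)}

# Minimum heights (inches) from the height-restriction table
minimums = {
    "U.S. Army": {"men": 60, "women": 58},
    "U.S. Marine Corps": {"men": 64, "women": 58},
}

stricter = {}
for service, mins in minimums.items():
    # Proportion of each group above the minimum: the upper tail 1 - cdf(minimum)
    meets = {sex: 1 - heights[sex].cdf(h) for sex, h in mins.items()}
    # The group with the smaller proportion meeting the minimum faces the stricter rule
    stricter[service] = min(meets, key=meets.get)
    print(service, {sex: round(p, 3) for sex, p in meets.items()}, "->", stricter[service])
```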

Below what height do the heights of 80% of U.S. men fall?

Heights of adult men are normally distributed with

• mean of 69 in,

• standard deviation of 2.8 in.

The question is to find the height x such that

{Percent of men with height < x} = 80% = 0.8.

Use a TI-83/84 Plus.

• Press [2nd] & [VARS] (i.e. [DISTR])

• Select 3: invNorm

• Format of command:

invNorm(fraction, mean, std.dev.)

For this problem: invNorm(0.8, 69, 2.8) = 71.36.

That is, 80% of U.S. men have heights less than 71.36 in.
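Python's `statistics.NormalDist.inv_cdf` (an assumed substitute for the calculator) follows the same lower-tail convention as invNorm, so the 80th percentile of men's heights can be sketched as:

```python
from statistics import NormalDist

men = NormalDist(mu=69, sigma=2.8)

# inv_cdf(p) returns the height x with P(height < x) = p,
# the same lower-tail convention as the TI-83/84 invNorm command.
x80 = men.inv_cdf(0.80)
print(f"{x80:.2f}")  # 71.36
```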

Remark: invNorm

• invNorm works only with the percentage or fraction in the lower tail of the normal distribution.

• For example, suppose the question is “Above what height do the heights of 10% of U.S. men fall?”

Notice that here the question is to find the height x such that

{Percent of men with height > x} = 10% = 0.1.

This means

{Percent of men with height < x} = (100 − 10)% = 90% = 0.9.

For this problem: invNorm(0.9, 69, 2.8) = 72.59.

That is, 90% of U.S. men have heights less than 72.59 in, so 10% of U.S. men have heights more than 72.59 in.
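The same complement step in a Python sketch, again assuming `statistics.NormalDist` in place of the calculator: convert the upper-tail fraction to a lower-tail one before inverting.

```python
from statistics import NormalDist

men = NormalDist(mu=69, sigma=2.8)

top_fraction = 0.10
# inv_cdf works with lower-tail areas, so "top 10%" becomes "bottom 90%".
cutoff = men.inv_cdf(1 - top_fraction)
print(f"{cutoff:.2f}")  # 72.59
```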

