The normal curve
Approximating data with the normal distribution

Summary

The normal distribution

Patrick Breheny

March 3

Patrick Breheny University of Iowa Introduction to Biostatistics (BIOS 4120) 1 / 25

Introduction
Calculating probabilities and percentiles from the normal curve

A common histogram shape

Histograms of infant mortality rates, heights, and cholesterol levels:

[Figure: three histograms of frequency. Africa: infant mortality rate (x-axis from 0 to 100). NHANES (adult women): height in inches (x-axis from 55 to 70). NHANES (adult women): LDL cholesterol (x-axis from 50 to 250).]

What do these histograms have in common?

The normal curve

Mathematicians discovered long ago that the equation

$$y = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

described the histograms of many random variables

[Figure: the normal curve, y plotted against x for x from −4 to 4.]
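As a quick check, the equation above can be typed in directly and compared with R's built-in `dnorm`, which computes the standard normal density. This is only a sketch; the helper name `normal_curve` is made up for illustration.

```r
# Standard normal curve written out from the equation above.
# `normal_curve` is a hypothetical helper name, not part of R.
normal_curve <- function(x) {
  1 / sqrt(2 * pi) * exp(-x^2 / 2)
}

x <- c(-2, -1, 0, 1, 2)
normal_curve(x)                        # heights of the curve at a few points
all.equal(normal_curve(x), dnorm(x))   # TRUE: agrees with the built-in density
```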

Features of the normal curve

• The normal curve is symmetric around x = 0
• The normal curve drops rapidly to near zero as x moves away from 0
• The normal curve is always positive
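These three features can be checked numerically with `dnorm`, R's standard normal density. The grids of x values below are arbitrary choices for illustration.

```r
x <- seq(0, 4, by = 0.5)

# Symmetry around 0: the height at -x equals the height at x
symmetric <- all.equal(dnorm(-x), dnorm(x))

# Rapid drop: heights at 0, 2, and 4 shrink quickly
heights <- dnorm(c(0, 2, 4))

# Positivity: every height on the grid is strictly above 0
positive <- all(dnorm(seq(-4, 4, by = 0.1)) > 0)

symmetric
heights
positive
```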

The normal curve in action

[Figure: the same three data sets in standard units, with density on the vertical axis (x-axis from −4 to 4). Panels: Africa, infant mortality rate; NHANES (adult women), height; NHANES (adult women), LDL cholesterol.]

• Technical note: The data have been standardized and the vertical axis is now “density”
• Data whose histogram looks like the normal curve are said to be normally distributed or to follow a normal distribution
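A rough sketch of what the technical note means, using a handful of made-up heights (not the NHANES data). Note that R's `sd` divides by n − 1, which may differ slightly from the SD convention used in class.

```r
# Hypothetical heights in inches (illustrative values, not the NHANES data)
heights <- c(59.5, 61.0, 62.2, 63.1, 63.4, 63.9, 64.8, 65.5, 66.7, 68.0)

# Standardize: subtract the mean, then divide by the standard deviation
z <- (heights - mean(heights)) / sd(heights)
mean(z)   # 0, up to rounding error
sd(z)     # 1

# On the density scale, bar heights are set so the bars' total area is 1
h <- hist(z, plot = FALSE)
sum(h$density * diff(h$breaks))
```

Because both the histogram and the normal curve then have total area 1, they can be drawn on the same axes and compared directly.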

Probabilities from the normal curve

Probabilities are given by the area under the normal curve:

[Figure: the normal curve, density plotted against x for x from −4 to 4.]
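To make "area under the curve" concrete, the sketch below integrates the density numerically with R's `integrate` and compares the result with `pnorm`, which returns the area to the left of a value. The endpoints 0 and 1 are arbitrary.

```r
# Area under the standard normal curve between a and b, by numerical integration
a <- 0
b <- 1
area_numeric <- integrate(dnorm, lower = a, upper = b)$value

# The same area from the cumulative distribution function
area_pnorm <- pnorm(b) - pnorm(a)

# Total area under the whole curve
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

c(area_numeric, area_pnorm, total_area)
```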

The 68%/95% rule

This is where the 68%/95% rule of thumb that we discussed earlier comes from:

[Figure: three panels of the normal curve, density plotted against x for x from −4 to 4, labeled P=68%, P=95%, and P=100%.]
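The rule of thumb can be reproduced with `pnorm`. The sketch below computes the area within 1, 2, and 3 standard units of 0; that the third panel corresponds to 3 standard units is an assumption, since the transcript does not say.

```r
# Area within k standard units of 0, for k = 1, 2, 3
k <- 1:3
within_k <- pnorm(k) - pnorm(-k)
round(100 * within_k, 1)   # 68.3 95.4 99.7
```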

Calculating probabilities

• By knowing that the total area under the normal curve is 1, we can get a rough idea of the area under a curve by looking at a plot
• However, to get exact numbers, we will need a computer
• “How much area is under this normal curve?” is an extremely common question in statistics, and programmers have developed algorithms to answer this question very quickly
• The output from these algorithms is commonly collected into tables, which is what you will have to use for exams

Calculating the area under a normal curve, example 1

Find the area under the normal curve between 0 and 1

[Figure: three panels of the normal curve, density plotted against x for x from −4 to 4.]

.84 − .5 = .34

> pnorm(1) - pnorm(0)
[1] 0.3413447

Calculating the area under a normal curve, example 2

Find the area under the normal curve above 1

[Figure: three panels of the normal curve, density plotted against x for x from −4 to 4.]

1 − .84 = .16

> 1-pnorm(1)
[1] 0.1586553

Calculating the area under a normal curve, example 3

Find the area under the normal curve that lies outside −1 and 1

[Figure: three panels of the normal curve, density plotted against x for x from −4 to 4.]

• 1 − (.84 − .16) = .32
• Alternatively, we could have used symmetry: 2(.16) = .32

> 2*pnorm(-1)
[1] 0.3173105

Calculating percentiles

• A related question of interest is, “What is the xth percentile of the normal curve?”
• This is the opposite of the earlier question: instead of being given a value and asked to find the area to the left of the value, now we are told the area to the left and asked to find the value
• With a table, we can perform this inverse search by finding the probability in the body of the table, then looking to the margins to find the percentile associated with it
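The inverse search can be mimicked in R by building a small table and looking up the entry closest to a target area. This is only a sketch: the grid of z values, the three-decimal rounding, and the target of 0.60 are assumptions for illustration, not the layout of the table used in class.

```r
# A small normal table: area to the left of z, rounded to three decimals
# (grid and rounding are assumed, not taken from the course table)
z <- seq(0, 3.49, by = 0.01)
area <- round(pnorm(z), 3)

# Inverse search: find the tabulated area closest to the target,
# then read off the z value that goes with it
target <- 0.60
z_from_table <- z[which.min(abs(area - target))]

z_from_table    # closest tabulated value
qnorm(target)   # direct computation, for comparison
```

`qnorm` performs the same inverse lookup without the rounding that a printed table imposes.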

Calculating percentiles (cont’d)

• What is the 60th percentile of the normal curve?
• There is no “.600” in the table, but there is a “.599”, which corresponds to 0.25
• The real 60th percentile must lie between 0.25 and 0.26 (it’s actually 0.2533)
• For this class, 0.25, 0.26, or anything in between is an acceptable answer
• Or we could obtain an exact answer from R:

> qnorm(0.6)
[1] 0.2533471

• How about the 10th percentile?
• The 10th percentile is −1.28

> qnorm(0.1)
[1] -1.281552

Calculating values such that a certain area lies within/outside them

Find the number x such that the area outside −x and x is equal to 10%

[Figure: three panels of the normal curve, density plotted against x for x from −4 to 4.]

By symmetry, 5% of the area lies in each tail, so our answer is ±1.645 (the 5th/95th percentiles)
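The same answer can be obtained in R by splitting the outside area evenly between the two tails. A minimal sketch; the variable names are illustrative.

```r
outside <- 0.10                # total area outside -x and x
x <- qnorm(1 - outside / 2)    # by symmetry, half of that area is in each tail
x                              # about 1.645
2 * pnorm(-x)                  # check: recovers the area outside
```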

Reconstructing a histogram

• In week 3, we said that the mean and standard deviation provide a two-number summary of a histogram
• We can now make this observation a little more concrete
• Anything we could have learned from the full data set, we will now determine by approximating the real distribution of the data by the normal distribution
• This approach is called the normal approximation

NHANES adult women

• The data set we will work with in these examples is the NHANES sample of the heights of 2,649 adult women
• The mean height is 63.5 inches
• The standard deviation of height is 2.75 inches

Procedure: Probabilities using the normal curve

The procedure for calculating probabilities with the normal approximation is as follows:

#1 Draw a picture of the normal curve and shade in the appropriate probability
#2 Convert to standard units: letting x denote a number in the original units and z a number in standard units,

$$z = \frac{x - \bar{x}}{SD}$$

where x̄ is the mean and SD is the standard deviation
#3 Determine the area under the normal curve using a table or computer
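Steps #2 and #3 can be sketched as a small R function. The function name `normal_approx_prob`, its arguments, and the numbers in the example call are all made up for illustration; step #1 (drawing the picture) is still worth doing by hand.

```r
# Hypothetical helper: probability of falling between `lower` and `upper`
# (in original units) under the normal approximation.
normal_approx_prob <- function(lower, upper, mean, sd) {
  z_lower <- (lower - mean) / sd    # step 2: convert to standard units
  z_upper <- (upper - mean) / sd
  pnorm(z_upper) - pnorm(z_lower)   # step 3: area under the normal curve
}

# Made-up example: mean 100, SD 15, probability of falling between 85 and 115
normal_approx_prob(85, 115, mean = 100, sd = 15)
```

Passing `lower = -Inf` or `upper = Inf` gives one-sided probabilities, since `pnorm(-Inf)` is 0 and `pnorm(Inf)` is 1.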

Estimating probabilities: Example # 1

• Suppose we want to estimate the percent of women who are under 5 feet tall
• 5 feet, or 60 inches, is 1.27 standard deviations below the mean: (60 − 63.5)/2.75 = −1.27
• Using the normal distribution, the probability of falling more than 1.27 standard deviations below the mean is P(z < −1.27) = 10.2%
• In the actual sample, 282 out of 2,649 women were under 5 feet tall, which comes out to 10.6%
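The arithmetic in this example can be reproduced in R using the mean, SD, and counts quoted on the slides. The variable names are illustrative.

```r
mean_height <- 63.5   # inches, from the slides
sd_height   <- 2.75

z <- (60 - mean_height) / sd_height   # 5 feet in standard units
round(z, 2)                           # -1.27
p_table <- pnorm(round(z, 2))         # about 0.102, using z rounded as in a table
p_exact <- pnorm(z)                   # very slightly smaller with unrounded z

observed <- 282 / 2649                # proportion in the actual sample
c(p_table, p_exact, observed)
```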

Estimating probabilities: Example # 2

• Another example: suppose we want to estimate the percent of women who are between 5’3 and 5’6 (63 and 66 inches)
• These heights are 0.18 standard deviations below the mean and 0.91 standard deviations above the mean, respectively
• Using the normal distribution, the probability of falling in this region is 39.0%
• In the actual data set, 1,029 out of 2,649 women were between 5’3 and 5’6: 38.8%
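A short R sketch of the same calculation, using the mean of 63.5 and SD of 2.75 given earlier. The variable names are illustrative.

```r
z_low  <- (63 - 63.5) / 2.75   # 5'3 in standard units
z_high <- (66 - 63.5) / 2.75   # 5'6 in standard units
round(c(z_low, z_high), 2)     # -0.18 0.91

p_between <- pnorm(z_high) - pnorm(z_low)   # normal approximation
observed  <- 1029 / 2649                    # proportion in the actual data
c(p_between, observed)
```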

Procedure: Percentiles using the normal curve

• We can also use the normal distribution to approximate percentiles
• The procedure for calculating percentiles with the normal approximation is as follows:

#1 Draw a picture of the normal curve and shade in the appropriate area under the curve
#2 Determine the percentiles of the normal curve corresponding to the shaded region using a table or computer
#3 Convert from standard units back to the original units:

$$x = \bar{x} + z(SD)$$

where, again, x is in original units, z is in standard units, x̄ is the mean, and SD is the standard deviation
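Steps #2 and #3 of this procedure can also be sketched as an R function. The name `normal_approx_quantile` and the numbers in the example call are hypothetical.

```r
# Hypothetical helper: the p-th quantile in original units
# under the normal approximation.
normal_approx_quantile <- function(p, mean, sd) {
  z <- qnorm(p)    # step 2: percentile of the normal curve, in standard units
  mean + z * sd    # step 3: convert back to original units
}

# Made-up example: 97.5th percentile when the mean is 100 and the SD is 15
by_hand  <- normal_approx_quantile(0.975, mean = 100, sd = 15)
built_in <- qnorm(0.975, mean = 100, sd = 15)   # qnorm can do both steps at once
c(by_hand, built_in)
```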

Approximating percentiles: Example

• Suppose instead that we wished to find the 75th percentile of these women’s heights
• For the normal distribution, 0.67 is the 75th percentile
• The mean plus 0.67 standard deviations in height is 65.35 inches
• For the actual data, the 75th percentile is 65.39 inches
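In R, this example amounts to two lines, using the mean of 63.5 and SD of 2.75 given earlier. The variable names are illustrative.

```r
z75 <- qnorm(0.75)               # 75th percentile of the normal curve, about 0.67
height75 <- 63.5 + z75 * 2.75    # back to inches: about 65.35
c(z75, height75)
```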

The broad applicability of the normal approximation

• These examples are by no means special: the distributions of many random variables are very closely approximated by the normal distribution
• Indeed, this is why statisticians call it the “normal” distribution
• Other names for the normal distribution include the Gaussian distribution (after its inventor) and the bell curve (after its shape)
• For variables with approximately normal distributions, the mean and standard deviation essentially tell us everything about the data; other summary statistics and graphics are redundant

Caution

• Other variables, however, are not well approximated by the normal distribution, and give misleading or nonsensical results when you apply the normal approximation to them
• For example, the value 0 lies 1.22 standard deviations below the mean infant mortality rate for Europe
• The normal approximation therefore predicts that 11% of the countries in Europe will have negative infant mortality rates
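The 11% figure is simply the area to the left of −1.22 under the normal curve. The slides do not give the European mean and SD, so the sketch below works directly in standard units.

```r
z_zero <- -1.22                # a rate of 0, in standard units (from the slide)
p_negative <- pnorm(z_zero)    # predicted share of countries below 0
round(100 * p_negative)        # 11
```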

Caution (cont’d)

• As another example, the normal distribution will always predict the median to lie 0 standard deviations above the mean
• i.e., it will always predict that the median equals the mean
• As we have seen, however, the mean and median can differ greatly when distributions are skewed
• For example, according to the U.S. Census Bureau, the mean income in the United States is $50,413, while the median income is $33,706
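A small illustration of this point with made-up, right-skewed incomes (these are not Census figures). The normal approximation's 50th percentile always equals the mean, whatever the data look like.

```r
# Hypothetical right-skewed incomes, in thousands of dollars (made-up values)
income <- c(12, 18, 25, 31, 34, 40, 52, 75, 140, 310)

mean(income)     # 73.7
median(income)   # 37

# Median predicted by the normal approximation: always equal to the mean
predicted_median <- qnorm(0.5, mean = mean(income), sd = sd(income))
predicted_median
```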

Summary

• The distributions of many random variables are very closely approximated by the normal distribution
• Know how to calculate the area under the normal curve
• Know how to determine percentiles of the normal curve
• Know how to approximate probabilities for real data using the normal approximation
• Know how to approximate quantiles for real data using the normal approximation
