The Normal Distribution - Champlain College St....

Post on 30-Apr-2020


The Normal Distribution

Quantitative Methods II

Plan

• Probability density functions

• The Standard Normal Distribution

• The z-scores

• The Empirical Rule

• Tables of the Standard Normal Distribution

• Examples

• Normal approximation to the binomial distribution


Probability Density Functions

• A probability density function (PDF), or density, describes the relative likelihood that a random variable takes on a given value.

• The probability that the random variable falls within a particular range of values is the area under the density function, above the horizontal axis, between the lowest and greatest values of the range.


The Normal Distribution

(Figure not reproduced; source: http://statsthewayilikeit.com/)

• The normal distribution is the most important of all distributions. It occurs naturally in many situations in nature and the sciences, and it plays a central role in statistics.

• One of the characteristic features of a normal distribution is the bell-shaped curve.

• Each normal distribution has two parameters: the mean 𝜇 (“mu”) and the standard deviation 𝜎 (“sigma”). We denote such a distribution by 𝑁(𝜇, 𝜎).


The Standard Normal Distribution

• The standard normal distribution 𝑁(0,1) has mean 0 and standard deviation 1.

The Empirical Rule
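The slide for this rule appears to have been a figure, so its wording is not available here. The rule is commonly stated as: about 68%, 95% and 99.7% of the values of a normal distribution lie within 1, 2 and 3 standard deviations of the mean. A minimal check, assuming Python 3.8 or later for `statistics.NormalDist` (the function name `share_within` is only illustrative):

```python
from statistics import NormalDist

def share_within(k):
    """Share of a normal population lying within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```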


The z-score

• Any normal distribution can be converted to the standard normal distribution by means of the z-score.

• The formula: 𝑧 = (𝑥 − 𝜇)/𝜎.

• If the z-score is positive, the value of x is bigger than the mean 𝜇; on the graph, it lies to the right of 𝜇. If it is negative, then x is smaller than 𝜇 and lies to the left of it.

The z-score

• The z-score allows us to compare data that are scaled differently.

• The value of the z-score tells you how many standard deviations the value x is above (to the right of) or below (to the left of) the mean, μ.

• Example. Suppose that 𝑋 ~ 𝑁(14, 5) .

• (What does this notation mean?)


Example (continued)

• 𝑋 ~ 𝑁(14, 5)

• Suppose that 𝑥 = 22. Its z-score is 𝑧 = (22 − 14)/5 = 1.6.

• This means that the value of x lies 1.6 standard deviations to the right of the mean.

• For x = 4, the z-score z = −2. It means that …
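The z-score computation for this example can be sketched in a few lines. The function name `z_score` is just a label chosen for this illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# X ~ N(14, 5), as in the example above.
print(z_score(22, mu=14, sigma=5))  # 1.6
print(z_score(4, mu=14, sigma=5))   # -2.0
```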

The z-score

• If we know the z-score, we can figure out the value of x: 𝑥 = 𝜇 + 𝑧 ∙ 𝜎.

• Example. If 𝑋 ~ 𝑁(64, 12.5), what values of x lie more than 2 standard deviations to the right of the mean?

Answer: 𝑥 > 64 + 2 ∙ 12.5 = 89 .

• Example. What are the (approximate) bounds for the middle 68% of the values of x?

Answer: between 64 − 12.5 = 51.5 and 64 + 12.5 = 76.5
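Going the other way, from a z-score back to a data value, is a single line of arithmetic. A small sketch using the numbers from the two examples above (`value_from_z` is a hypothetical helper name):

```python
def value_from_z(z, mu, sigma):
    """Data value that lies z standard deviations from the mean."""
    return mu + z * sigma

# X ~ N(64, 12.5)
print(value_from_z(2, mu=64, sigma=12.5))    # 89.0, two standard deviations above the mean
print(value_from_z(-1, mu=64, sigma=12.5))   # 51.5, lower bound of the middle 68%
print(value_from_z(1, mu=64, sigma=12.5))    # 76.5, upper bound of the middle 68%
```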


Using the normal distribution

The area to the right is complementary to the area to the left:

𝑃(𝑋 > 𝑥) = 1 − 𝑃(𝑋 < 𝑥)

Tables of the Standard Normal Distribution

• For example:

http://www.mathsisfun.com/data/standard-normal-distribution-table.html

• How to use a table like this?

1. Compute the z-score(s).

2. Find the table value(s) T(z).

3. Decide what is being asked: T(z), 0.5 − T(z), 0.5 + T(z), T(z1) − T(z2), T(z1) + T(z2), …


Using the table

• The table lists 𝑇(𝑧), the area under the standard normal curve between 0 and 𝑧, for positive 𝑧 only.

• The same values apply to negative 𝑧 by symmetry: the area between −𝑧 and 0 also equals 𝑇(𝑧).

Examples

• What is the area between z = 0 and z = 1.43 ?

𝑃(0 < 𝑧 < 1.43) = 𝑇(1.43) = 0.4236

• What is the area to the right of z = 0.28 ?

𝑃(𝑧 > 0.28) = 0.5 − 𝑇(0.28) = 0.5 − 0.1103 = 0.3897


Examples

• What is the probability that z < −0.98 ?

𝑃(𝑧 < −0.98) = 0.5 − 𝑇(0.98) = 0.5 − 0.3365 = 0.1635 or 16.35%

• Find the probability that −0.4 < 𝑧 < 0.65 .

𝑃(−0.4 < 𝑧 < 0.65) = 𝑇(0.65) + 𝑇(0.4) = 0.2422 + 0.1554 = 0.3976 or 39.76%

Examples

• What is the area between 𝑧 = −2.03 and 𝑧 = −1.17 ?

𝑃(−2.03 < 𝑧 < −1.17) = 𝑇(2.03) − 𝑇(1.17) = 0.4788 − 0.3790 = 0.0998

• What is the area to the right of 𝑧 = −0.66 ?

𝑃(𝑧 > −0.66) = 0.5 + 𝑇(0.66) = 0.5 + 0.2454 = 0.7454 etc.
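The table lookups in these examples can be reproduced without a printed table. The sketch below assumes that the tabulated value T(z) is the area between 0 and |z|, computed here with the error function from Python's `math` module; the helper names are invented for this illustration and are not part of the course material.

```python
import math

def table_value(z):
    """T(z): area under the standard normal curve between 0 and |z|."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def area_right_of(z):
    """0.5 - T(z) for z at or right of the mean, 0.5 + T(z) for z left of it."""
    return 0.5 - table_value(z) if z >= 0 else 0.5 + table_value(z)

def area_left_of(z):
    """Complement of the area to the right."""
    return 1 - area_right_of(z)

def area_between(z1, z2):
    """Add table values when 0 lies between the bounds, subtract them otherwise."""
    low, high = sorted((z1, z2))
    if low < 0 < high:
        return table_value(low) + table_value(high)
    return abs(table_value(high) - table_value(low))

print(round(area_between(0, 1.43), 4))
print(round(area_right_of(0.28), 4))
print(round(area_left_of(-0.98), 4))
print(round(area_between(-0.4, 0.65), 4))
print(round(area_between(-2.03, -1.17), 4))
print(round(area_right_of(-0.66), 4))
```

The printed values should agree with the six worked answers above to about four decimal places; tiny differences can appear because the table itself is rounded.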


Example

• The weights of adult indri lemurs are normally distributed with a mean of 13.8 lbs and a standard deviation of 2.2 lbs.

Given :

𝜇 = 13.8 lbs

𝜎 = 2.2 lbs

Example (continued)

• What is the probability that a randomly selected adult indri weighs less than 15 lbs?

1. Compute the z-score: 𝑧 = (15 − 13.8)/2.2 = 0.55

2. Draw a bell-shaped curve and indicate the region.

3. Compute the answer: 𝑃(𝑧 < 0.55) = 0.5 + 𝑇(0.55) = 0.5 + 0.2088 = 0.7088 or 70.88%


Example (continued)

• What percentage of adult indris weigh more than 18.5 lbs.?

1. Compute 𝑧 = (18.5 − 13.8)/2.2 = 2.14

2. Draw a bell-shaped curve + region.

3. Compute the answer: 𝑃(𝑧 > 2.14) = 0.5 − 𝑇(2.14) = 0.5 − 0.4838 = 0.0162 or 1.62%.

Example (continued)

• Try it yourself !

• What percentage of adult indris weigh more than 12.5 lbs.? Answer: 72.24%

• What is the probability that a randomly selected adult indri weighs less than 13 lbs.?

Answer: 35.94%
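The indri answers can be cross-checked with software instead of the table. This is a sketch that assumes Python 3.8 or later, where `statistics.NormalDist` provides a `cdf` method. Because the software does not round the z-score to two decimals, its results can differ from the table-based answers in the third decimal place.

```python
from statistics import NormalDist

# Adult indri weights in pounds, as given in the example.
indri_weight = NormalDist(mu=13.8, sigma=2.2)

p_less_than_15 = indri_weight.cdf(15)
p_more_than_18_5 = 1 - indri_weight.cdf(18.5)
p_more_than_12_5 = 1 - indri_weight.cdf(12.5)
p_less_than_13 = indri_weight.cdf(13)

print(f"P(weight < 15)   = {p_less_than_15:.4f}")
print(f"P(weight > 18.5) = {p_more_than_18_5:.4f}")
print(f"P(weight > 12.5) = {p_more_than_12_5:.4f}")
print(f"P(weight < 13)   = {p_less_than_13:.4f}")
```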


Example (continued)

What percentage of adult indris weigh between 11.4 and 15.8 lbs.?

1. Compute 𝑧1= 0.91 and 𝑧2 = −1.09

2. Graph the curve and the region.

3. Answer: 𝑃(−1.09 < 𝑧 < 0.91) = 𝑇(0.91) + 𝑇(1.09) = 0.6807 or 68.07%

Try it: what is the probability that an adult indri weighs between 12 and 13 lbs.?

Answer: 15.33%

Finding the percentiles.

• Finding the percentile for a data value is about the same as finding the probability that a random data value is below the given one and rounding to the nearest whole percent.

• For example, if the weight of an adult indri is 15 lbs., then he/she is in the 71st percentile.

• Try it: if an adult indri weighs 11.7 lbs., to which percentile does she or he belong?

Answer: he or she belongs to the 17th percentile.
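A percentile lookup of this kind might be sketched as follows, again assuming `statistics.NormalDist` from Python 3.8 or later. The function name `percentile_of` is hypothetical.

```python
from statistics import NormalDist

def percentile_of(x, mu, sigma):
    """Percentage of values below x, rounded to the nearest whole percent."""
    return round(100 * NormalDist(mu, sigma).cdf(x))

print(percentile_of(15, mu=13.8, sigma=2.2))    # 71
print(percentile_of(11.7, mu=13.8, sigma=2.2))  # 17
```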


Reading the table backwards

• Suppose that we want to find a data value corresponding to the kth percentile.

• 𝑇(𝑧) is either (𝑘 − 50)/100 or (50 − 𝑘)/100, whichever is positive.

• Find the value inside the table closest to 𝑇(𝑧), and find the corresponding z-score.

(If there are two closest, take the average.)

• Finally, use the formula 𝑥 = 𝜇 + 𝑧 ∙ 𝜎 to find the data value x.

Example

• Find the weight of an adult indri in the 28th percentile.

• We have: (50 − 28)/100 = 0.22. In the table, the value closest to 0.22 is 0.2190, corresponding to 𝑧 = 0.58.

• But we must take the negative value 𝑧 = −0.58, because the data value is to the left of the mean!

• Finally, 𝑥 = 13.8 + (−0.58) ∙ 2.2 = 12.52 lbs.

• Try it: the top 33% of adult indris weigh at least how much?

Answer: 14.77 lbs.
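Reading the table backwards corresponds to an inverse lookup. As a hedged cross-check, `NormalDist.inv_cdf` (Python 3.8 or later) returns the data value below which a given share of the distribution lies. Its results may differ slightly from the table method, which rounds z to two decimals.

```python
from statistics import NormalDist

indri_weight = NormalDist(mu=13.8, sigma=2.2)  # pounds

# 28th percentile: 28% of adult indris weigh less than this.
weight_28th = indri_weight.inv_cdf(0.28)

# Top 33%: the cutoff is the 67th percentile.
weight_top_33 = indri_weight.inv_cdf(1 - 0.33)

print(f"{weight_28th:.2f} lbs")
print(f"{weight_top_33:.2f} lbs")
```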


Example: finding the mean

• The commuting times in Trenton, NJ are normally distributed with a standard deviation of 18 minutes. Find the mean commuting time if it is known that 23% of all workers spend less than 30 minutes on their commute to work.

• Solution. Here 𝑥 = 30 is the 23rd percentile, so 𝑇(𝑧) = (50 − 23)/100 = 0.27. The closest table value is 0.2704, which corresponds to 0.74; since the value lies below the mean, 𝑧 = −0.74.

Thus, 𝑥 = 𝜇 + 𝑧 ∙ 𝜎 implies that 30 = 𝜇 − 0.74 ∙ 18, which we can solve for 𝜇 = 30 + 0.74 ∙ 18 = 43.32 minutes.
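The same rearrangement, 𝜇 = 𝑥 − 𝑧 ∙ 𝜎, can be sketched in code. The helper name `mean_from_percentile` is made up for this illustration, and the z-score comes from `NormalDist().inv_cdf` rather than from the table, so the result is close to, but not exactly, 43.32.

```python
from statistics import NormalDist

def mean_from_percentile(x, share_below, sigma):
    """Solve x = mu + z * sigma for mu, where z is the z-score with share_below to its left."""
    z = NormalDist().inv_cdf(share_below)
    return x - z * sigma

mean_commute = mean_from_percentile(x=30, share_below=0.23, sigma=18)
print(f"{mean_commute:.1f} minutes")
```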

Recall: The Binomial Distribution

• The binomial distribution is a discrete probability distribution (i.e. the random variable x has only a finite or countable set of possible values).

• The binomial distribution requires n trials that are

• repeated

• independent

• identical

Each trial is either a success or a failure.

Probability of success = p, probability of failure = q = 1 − p.

The random variable x = the number of successes.


The Binomial Distribution

• The random variable x (the number of successes) has the following measures:

• The mean: 𝝁 = 𝒏 ∙ 𝒑

• The variance: 𝝈² = 𝒏 ∙ 𝒑 ∙ 𝒒

• The standard deviation: 𝝈 = √(𝝈²) = √(𝒏𝒑𝒒)
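These three measures translate directly into code. A small sketch follows; the function name is hypothetical, and the numbers are those of the coin-tossing example used later in the slides.

```python
import math

def binomial_measures(n, p):
    """Return (mean, variance, standard deviation) of B(n, p)."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, math.sqrt(variance)

# 16 tosses of a fair coin.
print(binomial_measures(16, 0.5))  # (8.0, 4.0, 2.0)
```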

Approximating the Binomial Distribution by the Normal Distribution

• A consequence of the Central Limit Theorem (CLT): a binomial distribution 𝐵(𝑛, 𝑝) with n trials and probability of success p can be approximated by the normal distribution 𝑁(𝜇, 𝜎), with 𝜇 = 𝑛𝑝 and 𝜎 = √(𝑛𝑝𝑞), if the following two conditions hold:

𝑛 ∙ 𝑝 > 5 and 𝑛 ∙ 𝑞 > 5

• Always check these conditions before applying the approximation.


Continuity Correction Factor

• Instead of x we will use x – 0.5 or x + 0.5, depending on how the question is worded.

• “More than 30”: x = 30.5

• “At least 30”: x = 29.5

• “Less than 46”: x = 45.5

• “At most 46”: x = 46.5

• “Between 34 and 37 (inclusively)”: x1 = 33.5 and x2 = 37.5

• “Exactly 27”: x1 = 26.5 and x2 = 27.5
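The wording rules above can be collected into one lookup. The sketch below is only an illustration: the phrase keys and the function name `corrected_bounds` are invented here, and an open-ended side is represented by plus or minus infinity.

```python
import math

def corrected_bounds(kind, a, b=None):
    """Return the (lower, upper) bounds after applying the continuity correction."""
    if kind == "more than":    # X > a
        return (a + 0.5, math.inf)
    if kind == "at least":     # X >= a
        return (a - 0.5, math.inf)
    if kind == "less than":    # X < a
        return (-math.inf, a - 0.5)
    if kind == "at most":      # X <= a
        return (-math.inf, a + 0.5)
    if kind == "between":      # a <= X <= b, inclusive
        return (a - 0.5, b + 0.5)
    if kind == "exactly":      # X == a
        return (a - 0.5, a + 0.5)
    raise ValueError(f"unknown wording: {kind!r}")

print(corrected_bounds("at least", 30))      # (29.5, inf)
print(corrected_bounds("between", 34, 37))   # (33.5, 37.5)
print(corrected_bounds("exactly", 27))       # (26.5, 27.5)
```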

Example: tossing a coin

• A fair coin is tossed 16 times. What is the probability that the number of “heads” is at least 11?

• p = 0.5, q = 0.5, 𝜇 = 𝑛𝑝 = 8, 𝜎 = √(𝑛𝑝𝑞) = 2

• Checking conditions: 𝑛𝑝 > 5, 𝑛𝑞 > 5 ? Yes.

• Correction factor: x = 11 – 0.5 = 10.5

• The z-score: 𝑧 = (10.5 − 8)/2 = 1.25

• The required probability:

𝑃(𝑧 > 1.25) = 0.5 − 𝑇(1.25) = 0.5 − 0.3944 = 0.1056 or 10.56%
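The steps of this example (check the conditions, apply the correction, compute the z-score, take the right tail) can be followed in code. This is a minimal sketch for "at least k" questions only; the function name is hypothetical, and the tail area comes from `statistics.NormalDist` instead of the table.

```python
import math
from statistics import NormalDist

def approx_prob_at_least(n, p, k):
    """Normal approximation to P(X >= k) for X ~ B(n, p), with continuity correction."""
    q = 1 - p
    # Conditions from the slides: both n*p and n*q must exceed 5.
    if not (n * p > 5 and n * q > 5):
        raise ValueError("n*p and n*q must both exceed 5")
    mu = n * p
    sigma = math.sqrt(n * p * q)
    corrected = k - 0.5                  # "at least k" starts half a unit lower
    z = (corrected - mu) / sigma
    return 1 - NormalDist().cdf(z)       # area to the right of z

coin_probability = approx_prob_at_least(n=16, p=0.5, k=11)
print(f"{coin_probability:.4f}")
```

For n = 4 and p = 0.5 the function raises an error, because n ∙ p = 2 does not exceed 5.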


Example: rolling a die

• A fair die is rolled 330 times. Find the probability that the number “4” was rolled less than 60 times.

• p = 1/6, q = 5/6, 𝜇 = 𝑛𝑝 = 55, 𝜎 = √(𝑛𝑝𝑞) = 6.77

• Checking conditions: 𝑛𝑝 > 5, 𝑛𝑞 > 5 ? Yes.

• Correction factor: x = 60 – 0.5 = 59.5

• The z-score: 𝑧 = (59.5 − 55)/6.77 = 0.66

• The required probability:

𝑃(𝑧 < 0.66) = 0.5 + 𝑇(0.66) = 0.5 + 0.2454 = 0.7454 or 74.54%

Example: blood type B

• 90 Canadians are chosen at random. Find the probability that between 8 and 14 (inclusively) have blood type B.

• p = 0.09, q = 0.91, 𝜇 = 𝑛𝑝 = 8.1, 𝜎 = √(𝑛𝑝𝑞) = 2.71

• Checking conditions: 𝑛𝑝 > 5, 𝑛𝑞 > 5 ? Yes.

• Correction factors: x1 = 7.5 and x2 = 14.5

• The z-scores: 𝑧1 = −0.22 and 𝑧2 = 2.36

• The required probability:

𝑃(−0.22 < 𝑧 < 2.36) = 𝑇(0.22) + 𝑇(2.36) = 0.0871 + 0.4909 = 0.5780 or 57.80%


Example: smokers

• 30 Quebecois are chosen at random. Find the probability that exactly 5 of those are smokers.

• p = 0.21, q = 0.79, 𝜇 = 𝑛𝑝 = 6.3, 𝜎 = √(𝑛𝑝𝑞) = 2.23

• Checking conditions: 𝑛𝑝 > 5, 𝑛𝑞 > 5 ? Yes.

• Correction factors: x1 = 4.5 and x2 = 5.5

• The z-scores: 𝑧1 = −0.81 and 𝑧2 = −0.36

• The required probability:

𝑃(−0.81 < 𝑧 < −0.36) = 𝑇(0.81) − 𝑇(0.36) = 0.2910 − 0.1406 = 0.1504 or 15.04%
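To get a feel for how good the approximation is, it can be compared with the exact binomial probability. The sketch below assumes Python 3.8 or later (`math.comb`, `statistics.NormalDist`). The exact binomial formula it uses is the standard one and is not spelled out on these slides; both function names are invented for this illustration.

```python
import math
from statistics import NormalDist

def exact_binomial_between(n, p, low, high):
    """Exact P(low <= X <= high) for X ~ B(n, p), summing the binomial probabilities."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(low, high + 1)
    )

def approx_binomial_between(n, p, low, high):
    """Normal approximation to P(low <= X <= high), with continuity correction."""
    model = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return model.cdf(high + 0.5) - model.cdf(low - 0.5)

# Exactly 5 smokers among 30 people, p = 0.21.
smokers_exact = exact_binomial_between(30, 0.21, 5, 5)
smokers_approx = approx_binomial_between(30, 0.21, 5, 5)

# Between 8 and 14 (inclusive) people with blood type B among 90, p = 0.09.
blood_approx = approx_binomial_between(90, 0.09, 8, 14)

print(f"smokers: exact {smokers_exact:.4f}, approximation {smokers_approx:.4f}")
print(f"blood type B: approximation {blood_approx:.4f}")
```

With n ∙ p only slightly above 5 in the smokers example, the approximation and the exact value can be expected to differ by about a percentage point.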