
Unit 2: Probability and distributions
Lecture 3: Normal Distribution

Statistics 101

Nicole Dalzell

May 21, 2015

Announcements

Lab 3a due tomorrow at 6 PM

Project Proposal

Statistics 101 (Nicole Dalzell) U2 - L2: Normal distribution May 21, 2015 2 / 37

Project

Project instructions posted: https://stat.duke.edu/~nmd16/courses/Summer15/sta101.001-1/problem sets/project.pdf

Think about research questions to explore. Decide whether you will collect your own observational data, conduct an experiment, or use previously collected data (from a published study or public database).

Brainstorm due Tuesday, May 25. Proposal due Thursday, June 4. Project due Thursday, June 18.

Data resources: Data and GIS Services

http://guides.library.duke.edu/stat101

Data resources: Data and GIS Services - office hours

For questions related to finding data and getting it into R only, not statistical analysis questions.

http://library.duke.edu/data/about/schedule.html

Data resources: Hillary Mason & Bit.ly

http://bitly.com/bundles/hmason/1

Data resources: Reddit

http://www.reddit.com/r/datasets

Data resources: DASL

http://lib.stat.cmu.edu/DASL/

Data resources: Citizen Statistician

http://citizen-statistician.org/2012/11/07/data-sets-a-list-in-flux/

Data resources

and many others, get creative!

Getting Data in R

In lab you have been given nicely formatted data that can be directly loaded into R using either load or source. This will rarely happen outside of this class, and for your project you will need to convert your data into a format R can read.

Ideally use a plaintext format: csv, tab delimited, etc. (read.csv, read.table, read.delim)

Avoid proprietary formats (usually doable but require extra work)

Programs like Excel are useful to convert and clean up data

Find your data early, if you run into trouble ask for help
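As a rough illustration of the plaintext route, the sketch below writes a tiny CSV file and reads it back. The file contents and the names csv_path, survey, and survey2 are made up for this example; your own file will have different columns.

```r
# Hypothetical data written to a temporary file so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,height_cm,group",
             "1,172.5,control",
             "2,165.0,treatment",
             "3,180.2,control"), csv_path)

# read.csv() assumes comma-separated values and a header row by default.
survey <- read.csv(csv_path, stringsAsFactors = FALSE)

# The same file with read.table(), where header and separator are spelled out.
survey2 <- read.table(csv_path, header = TRUE, sep = ",", stringsAsFactors = FALSE)

# Check that each column came in with the type you expect.
str(survey)
head(survey)
```

Running str() right after loading is a cheap way to catch numbers that were read as text.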


The Normal Distribution

Participation question

Scores on a standardized test are normally distributed with a mean of 100 and a standard deviation of 20. If these scores are converted to standard normal Z scores, which of the following statements will be correct?

(a) Both the mean and median score will equal 0.

(b) The mean will equal 0, but the median cannot be determined.

(c) The mean of the z-scores will equal 100.

(d) The mean of the z-scores will equal 5.


Approximating percentiles

Approximately what percent of students score below 1800 on the SAT? The mean SAT score is 1500, with a standard deviation of 300. (Hint: Use the 68-95-99.7% rule.)

[Figure: normal curve of SAT scores, horizontal axis from 600 to 2400 in steps of 300]
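One way to organize the hint is sketched below in R. It uses only the 68% part of the rule; the variable names are chosen for this illustration.

```r
mu <- 1500
sigma <- 300

# How many standard deviations above the mean is 1800?
z <- (1800 - mu) / sigma

# Half of all scores lie below the mean. About 68% lie within one SD of the
# mean, so by symmetry about 34% lie between the mean and one SD above it.
approx_below <- 0.50 + 0.68 / 2
approx_below
```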


Percentiles

Percentile is the percentage of observations that fall below a given data point.

Graphically, percentile is the area below the probability distribution curve to the left of that observation.

[Figure: normal curve of SAT scores, horizontal axis from 600 to 2400 in steps of 300]


Calculating percentiles - using computation

There are many ways to compute percentiles/areas under the curve:

R:

> pnorm(1800, mean = 1500, sd = 300)

[1] 0.8413447

Applet: http://www.socr.ucla.edu/htmls/SOCR_Distributions.html


Z-Scores

Z-Score

The z-score for a data value, $x_i$, is

$$z = \frac{x_i - \bar{x}}{s}$$

Values farther from 0 are more extreme.

A z-score puts values on a common scale.

A z-score is the number of standard deviations a value falls from the mean.

For a nearly normal distribution, about 95% of all z-scores fall between -2 and 2.

z-scores beyond -2 or 2 can be considered extreme
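A minimal sketch of the z-score calculation in R follows. The vector scores is hypothetical data made up for this illustration.

```r
# Hypothetical sample of five test scores.
scores <- c(60, 80, 100, 120, 140)

# z-score: distance from the sample mean in units of the sample SD.
z <- (scores - mean(scores)) / sd(scores)
z

# After standardizing, the z-scores have mean 0 and standard deviation 1.
mean(z)
sd(z)

# Flag values more than 2 standard deviations from the mean.
extreme <- abs(z) > 2
extreme
```

Base R's scale() function performs the same centering and scaling on a numeric vector or matrix.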


Participation question

Which of the following is false?

(a) Z scores are helpful for determining how unusual a data point is compared to the rest of the data in the distribution.

(b) The majority of Z scores in a right skewed distribution are negative.

(c) Regardless of the shape of the distribution (symmetric vs. skewed), the Z score of the mean is always 0.

(d) In a normal distribution, Q1 and Q3 are more than one SD away from the mean.


Calculating percentiles - using tables

Second decimal place of Z

Z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015

Z-score = 1
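Each table entry is the area to the left of Z under the standard normal curve, so the lookup can be reproduced with pnorm. The sketch below is an illustration only; the names z_row, z_col, and row_12 are made up for it.

```r
# Table lookup for Z = 1.00: row 1.0, column 0.00.
z_row <- 1.0
z_col <- 0.00
lookup <- round(pnorm(z_row + z_col), 4)
lookup

# Rebuild the whole 1.2 row: Z = 1.20, 1.21, ..., 1.29.
row_12 <- round(pnorm(1.2 + seq(0, 0.09, by = 0.01)), 4)
row_12
```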


What percent of the standard normal distribution is above Z = 0.82? Choose the closest answer.

(a) 79.4%

(b) 20.6%

(c) 82%

(d) 18%

(e) Need to be provided the mean and the standard deviation of the distribution in order to be able to solve this problem.


Example

The average daily temperature in June in LA is 77°F, with a standard deviation of 5 degrees. Suppose the temperatures in June closely follow a normal distribution. What is the probability of observing a temperature of at most 83°F on a randomly chosen day in June?

T ∼ N (mean = 77, sd = 5)

$$P(T \le 83) = P\left(Z \le \frac{83 - 77}{5}\right) = P(Z \le 1.2)$$

Calculating percentiles - using tables

Z-score = 1.2 (row 1.2, column 0.00 of the table above: 0.8849)

Example (cont)

$$P(T \le 83) = P\left(Z \le \frac{83 - 77}{5}\right) = P(Z \le 1.2) \approx 0.885$$

The probability of observing a temperature of at most 83°F on a randomly chosen day in June is approximately 0.885, or 88.5%.


Example (cont)

The average daily temperature in June in LA is 77°F, with a standard deviation of 5 degrees. Suppose the temperatures in June closely follow a normal distribution. What is the probability of observing a temperature of at least 83°F on a randomly chosen day in June?

T ∼ N (mean = 77, sd = 5)

P(T ≥ 83) = 1 − P(T ≤ 83) ≈ 1 − 0.885 = 0.115

The probability of observing a temperature of at least 83°F on a randomly chosen day in June is approximately 0.115, or 11.5%.
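Both temperature probabilities can be checked with pnorm, in the same style as the SAT calculation earlier. This is a sketch; the variable names are made up for it.

```r
# Lower tail: P(T <= 83) for T ~ N(mean = 77, sd = 5).
p_at_most <- pnorm(83, mean = 77, sd = 5)

# Upper tail: P(T >= 83). lower.tail = FALSE returns the area to the right.
p_at_least <- pnorm(83, mean = 77, sd = 5, lower.tail = FALSE)

round(c(at_most = p_at_most, at_least = p_at_least), 3)
```

Using lower.tail = FALSE avoids the manual 1 minus step, but both routes give the same area.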


Normal approximation to the binomial

This study also found that approximately 25% of Facebook users are considered power users. The same study found that the average Facebook user has 245 friends. What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users?

We are given that n = 245, p = 0.25, and we are asked for the probability P(K ≥ 70).

P(K ≥ 70) = P(K = 70 or K = 71 or K = 72 or · · · or K = 245)

= P(K = 70) + P(K = 71) + P(K = 72) + · · ·+ P(K = 245)

This seems like an awful lot of work...
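By hand this sum is tedious, but software can add the 176 terms directly. The sketch below is an illustration, not part of the original slides; it computes the exact binomial probability in two equivalent ways.

```r
n <- 245
p <- 0.25

# Add up P(K = 70) + P(K = 71) + ... + P(K = 245) term by term.
p_sum <- sum(dbinom(70:n, size = n, prob = p))

# Same quantity from the cumulative distribution: P(K > 69) = P(K >= 70).
p_tail <- pbinom(69, size = n, prob = p, lower.tail = FALSE)

c(p_sum, p_tail)
```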


Histograms of number of successes

Hollow histograms of samples from the binomial model where p = 0.10 and n = 10, 30, 100, and 300. What happens as n increases?

[Figure: four hollow histograms of the number of successes, one panel each for n = 10, n = 30, n = 100, and n = 300]


Normal approximation to the binomial

When the sample size is large enough, the binomial distribution with parameters n and p can be approximated by the normal model with parameters $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$.

In the case of the Facebook power users, n = 245 and p = 0.25.

$$\mu = 245 \times 0.25 = 61.25 \qquad \sigma = \sqrt{245 \times 0.25 \times 0.75} = 6.78$$

Bin(n = 245, p = 0.25) ≈ N(µ = 61.25, σ = 6.78).

[Figure: probabilities of Bin(245, 0.25) plotted against k from 20 to 100, overlaid with the N(61.25, 6.78) density curve; vertical axis from 0.00 to 0.06]
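The parameters of the approximating normal model can be computed directly. The sketch below also compares the binomial probability and the normal density at one value, k = 61, chosen here only as an illustration of how closely the two curves agree near the center.

```r
n <- 245
p <- 0.25

# Parameters of the approximating normal model.
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
c(mu = mu, sigma = round(sigma, 2))

# Compare the two models at k = 61 (an arbitrary value near the mean).
binom_at_61 <- dbinom(61, size = n, prob = p)
normal_at_61 <- dnorm(61, mean = mu, sd = sigma)
c(binomial = binom_at_61, normal = normal_at_61)
```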


What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users?

(a) 0.0251

(b) 0.0985

(c) 0.1128

(d) 0.9015


How large is large enough?

The sample size is considered large enough if the expected numbers of successes and failures are both at least 10.

np ≥ 10 and n(1 − p) ≥ 10
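This condition is easy to wrap in a small helper. The function name normal_approx_ok and its threshold argument are hypothetical, introduced only for this sketch.

```r
# Returns TRUE when both expected counts reach the threshold (10 by default).
normal_approx_ok <- function(n, p, threshold = 10) {
  expected_successes <- n * p
  expected_failures <- n * (1 - p)
  expected_successes >= threshold && expected_failures >= threshold
}

# Facebook example from the slides: 61.25 successes and 183.75 failures expected.
normal_approx_ok(245, 0.25)

# Smallest histogram panel from the slides: n = 10, p = 0.10 gives np = 1.
normal_approx_ok(10, 0.10)
```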


Participation question

Below are four pairs of Binomial distribution parameters. Which distribution can be approximated by the normal distribution?

(a) n = 100, p = 0.95

(b) n = 25, p = 0.45

(c) n = 150, p = 0.05

(d) n = 500, p = 0.015


Application exercises: Finding probabilities // Quality control

At the Heinz ketchup factory, the amounts that go into bottles of ketchup are supposed to be normally distributed with mean 36 oz. and standard deviation 0.11 oz. Once every 30 minutes a bottle is selected from the production line, and its contents are noted precisely. If the amount in the bottle is below 35.8 oz. or above 36.2 oz., then the bottle fails the quality control inspection. What percent of bottles pass the quality control inspection?
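A bottle passes when its contents fall between the two limits, so the answer is the area between them. The R sketch below is one possible setup; the variable names are made up for it.

```r
mu <- 36
sigma <- 0.11

# Area below the upper limit minus area below the lower limit.
p_pass <- pnorm(36.2, mean = mu, sd = sigma) - pnorm(35.8, mean = mu, sd = sigma)
round(100 * p_pass, 1)  # percent of bottles that pass
```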


Application exercises: Finding cutoff points // Hot bodies

Body temperatures of healthy humans are distributed nearly normally with mean 98.2°F and standard deviation 0.73°F. What is the cutoff for the highest 10% of human body temperatures?

Mackowiak, Wasserman, and Levine (1992), A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold August Wunderlich.
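Finding a cutoff runs the percentile calculation in reverse: the highest 10% starts at the 90th percentile. In R the inverse lookup is qnorm. This is a sketch with variable names chosen for the illustration.

```r
mu <- 98.2
sigma <- 0.73

# The top 10% begins at the 90th percentile.
cutoff <- qnorm(0.90, mean = mu, sd = sigma)
round(cutoff, 2)

# Sanity check: the area above the cutoff should be 0.10.
pnorm(cutoff, mean = mu, sd = sigma, lower.tail = FALSE)
```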


Application exercises: Conditional probability // SAT scores

SAT scores (out of 2400) are distributed normally with mean 1500 and standard deviation 300. Suppose a school council awards a certificate of excellence to all students who score at least 1900 on the SAT. What percent of the students who received this certificate scored above 2100?

$$P(\text{SAT} > 2100 \mid \text{SAT} > 1900) = \frac{P(\text{SAT} > 2100 \text{ and } \text{SAT} > 1900)}{P(\text{SAT} > 1900)} = \frac{P(\text{SAT} > 2100)}{P(\text{SAT} > 1900)}$$

$$P(\text{SAT} > 2100) = P\left(Z > \frac{2100 - 1500}{300}\right) = P(Z > 2) = 1 - 0.9772 = 0.0228$$

$$P(\text{SAT} > 1900) = P\left(Z > \frac{1900 - 1500}{300}\right) = P(Z > 1.33) = 1 - 0.9082 = 0.0918$$

$$P(\text{SAT} > 2100 \mid \text{SAT} > 1900) = \frac{0.0228}{0.0918} \approx 0.25$$

So about 25% of the students who received the certificate scored above 2100.
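The same ratio can be computed without a table. The sketch below uses unrounded z-scores, so the intermediate values differ slightly from the table-based ones; the variable names are made up for it.

```r
mu <- 1500
sigma <- 300

# Upper-tail areas for the two score thresholds.
p_above_2100 <- pnorm(2100, mean = mu, sd = sigma, lower.tail = FALSE)
p_above_1900 <- pnorm(1900, mean = mu, sd = sigma, lower.tail = FALSE)

# Scoring above 2100 implies scoring above 1900, so the joint probability
# in the numerator is just P(SAT > 2100).
p_conditional <- p_above_2100 / p_above_1900
round(p_conditional, 2)
```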


Application exercises: Finding missing parameters // Auto insurance premiums

Suppose a newspaper article states that the distribution of auto insurance premiums for residents of California is approximately normal with a mean of $1,650. The article also states that 25% of California residents pay more than $1,800.

1. What is the standard deviation of this distribution?

2. What is the IQR of this distribution?
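One possible setup, sketched in R: if 25% pay more than $1,800, then $1,800 is the 75th percentile, and the z-score formula can be solved for the standard deviation. The variable names are chosen for this illustration.

```r
mu <- 1650

# z-score of the 75th percentile of the standard normal distribution.
z75 <- qnorm(0.75)

# Solve z = (x - mu) / sigma for sigma, with x = 1800.
sigma <- (1800 - mu) / z75

# IQR = Q3 - Q1, using the fitted normal model.
q1 <- qnorm(0.25, mean = mu, sd = sigma)
q3 <- qnorm(0.75, mean = mu, sd = sigma)
iqr <- q3 - q1

round(c(sigma = sigma, iqr = iqr), 1)
```

Because the normal distribution is symmetric, Q1 sits as far below the mean as Q3 sits above it, which gives a quick check on the IQR.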


To Do

PS 3 due tomorrow in class

Reading Assignment for Friday: Chapter 4, Sections 4.1 - 4.2.3 (A sampling distribution for the mean)
