Date post: 29-Dec-2015
Upload: alexandrina-lane
Page 1: The Standardized Normal Distribution

The Standardized Normal Distribution

X is N(μ, σ^2)

Z is N(0, 1^2): the standardized normal

1. For comparison of several different normal distributions

2. For calculations without a computer
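Both uses rest on the usual standardization Z = (X – μ) / σ. A minimal sketch, assuming two made-up tests with different scales (the scores, means, and standard deviations below are illustrative and not from the lecture):

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x from N(mu, sigma^2) to the N(0, 1) scale."""
    return (x - mu) / sigma

# Hypothetical scores on two differently scaled tests.
z_a = z_score(650, mu=500, sigma=100)
z_b = z_score(27, mu=21, sigma=5)

# The probability of falling below the score is the same on either scale.
p_original = NormalDist(500, 100).cdf(650)
p_standard = NormalDist(0, 1).cdf(z_a)

print(f"z_a = {z_a:.2f}, z_b = {z_b:.2f}")
print(f"P(X <= 650) = {p_original:.4f}, P(Z <= z_a) = {p_standard:.4f}")
```

Comparing z_a with z_b compares the two scores on a common scale, and a single table for Z is enough to look up probabilities for any normal X.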

Page 2:

The Normal Approximation to the Binomial

[Figure: chart of a binomial distribution alongside a normal distribution; horizontal axis 0 to 10, vertical axis 0 to 0.3]

Both distributions have the same shape…

Page 3:

This normal approximation to the binomial works reasonably well when

1. np ≥ 5 and n(1 – p) ≥ 5

2. No computer is nearby

But it is an important fact that the binomial distribution and normal distribution are similar … we will return to this subject relatively soon

… central limit …
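A rough check of how close the two distributions are can be sketched as follows. The values n = 10 and p = 0.5 are assumptions chosen to sit right at the np = 5 rule of thumb, and the half-unit continuity correction is a standard device that the slides do not mention:

```python
from math import comb
from statistics import NormalDist

n, p = 10, 0.5                       # assumed values; np = n(1 - p) = 5
mu = n * p                           # binomial mean
sigma = (n * p * (1 - p)) ** 0.5     # binomial standard deviation
normal = NormalDist(mu, sigma)

def binomial_pmf(k):
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_approx(k):
    """Normal probability of the interval (k - 0.5, k + 0.5)."""
    return normal.cdf(k + 0.5) - normal.cdf(k - 0.5)

largest_gap = max(abs(binomial_pmf(k) - normal_approx(k)) for k in range(n + 1))

for k in range(n + 1):
    print(f"{k:2d}  binomial={binomial_pmf(k):.4f}  normal={normal_approx(k):.4f}")
print(f"largest gap: {largest_gap:.4f}")
```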

Page 4:

When talking about binomial and normal probabilities, we’ve taken the following point of view:

A situation follows certain “probabilities,” and we can use this knowledge to deduce specific information about the situation

Now we will take the reverse point of view:

Specific information about a situation can be used to find “probabilities” that describe the situation

Conceptual idea of new topics

The word “features” could replace “probabilities”

Page 5:

For example, think about the quiz you took…

Professors have always noticed that students’ scores on a test tend to follow a normal distribution

By actually giving a test to a sample of students, you can estimate the mean and standard deviation of the underlying normal distribution

For tests like the SAT, the underlying distribution is then used as a ranking measure for students taking the same test later

These ideas are loose, and first we’re going to learn how to work with sample data

Page 6:

Populations and Samples

A population is a complete set of data representing a given situation

A sample is a subset of the population, ideally a small-scale replica of the population

E.g., all students who take the SAT constitute a population, while those taking the test on a particular Saturday are a sample

E.g., all American citizens are a population, while those selected for a survey are a sample

Populations are a relative concept

Page 7:

For the following definitions, imagine a population like the starting salaries of all MBA students graduating this year

A population is assumed to follow a random variable X, with values and probabilities

X = starting salary of a particular MBA student

P(X = $100,000) = ????

So we could calculate the expected value of the population, as well as the standard deviation

Except it is sometimes hard to get a handle on the entire population. Imagine finding out the starting salaries of every single graduating MBA student in the U.S.!

Page 8:

So instead of trying to look at the entire population, we look at a sample of the population, which hopefully gives us a good picture of the population

We might take a survey of graduating MBAs to determine the average (or expected) starting salary

• A sample statistic is a quantitative measure of a sample; used to make estimates of the population

• A sample mean (or expected value) is used to estimate the population mean

• A sample standard deviation is used to estimate the population standard deviation

Page 9:

Summary/Sample Measures

A sample is made up of n observations X1, X2, …, Xn

Sample mean = X-bar = (X1 + X2 + … + Xn) / n

Sample std dev = SX = sqrt( [ (X1 – X-bar)^2 + (X2 – X-bar)^2 + … + (Xn – X-bar)^2 ] / (n – 1) )

The median is the middle of the values; 50% of the observation values fall below the median and 50% above

The mode is the most frequent observation value

The maximum and minimum are the largest and smallest observations; the range is the difference between the max and min
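These summary measures can be sketched with Python's standard `statistics` module. The six observations are made up for illustration:

```python
import statistics

observations = [3, 5, 5, 6, 8, 9]   # made-up sample, n = 6

sample_mean = statistics.mean(observations)
sample_std_dev = statistics.stdev(observations)   # uses the n - 1 divisor
median = statistics.median(observations)
mode = statistics.mode(observations)
value_range = max(observations) - min(observations)

print(f"mean={sample_mean}, std dev={sample_std_dev:.4f}")
print(f"median={median}, mode={mode}, range={value_range}")
```

Note that `statistics.stdev` divides by n – 1, matching the sample std dev formula; `statistics.pstdev` would divide by n instead.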

Page 10:

Relevant Excel Commands

= AVERAGE(array)

= STDEV(array)

Tools → Data Analysis → Descriptive Statistics

(see Excel file)

Page 11:

Setup and Assumptions for This Lecture

• You have a population about which you’d like to know things such as mean, std dev, proportions

• Each member of the population is assumed to follow the random variable X with mean μX, std dev σX, and particular proportion pX

• Again, however, μX, σX, and pX are unknown

• The population is too big to measure directly

• You will take samples instead

• What information can be deduced?

Page 12:

A Practical Approach: Point Estimates

To estimate μX, σX, and pX, take a sample of size n and calculate sample mean X-bar, sample std dev SX, and sample proportion X/n. Use these as “point estimates” of the true μX, σX, and pX

Here’s an idea…

X-bar = (X1 + … + Xn) / n, using (a)

SX = sqrt( [ (X1 – X-bar)^2 + … + (Xn – X-bar)^2 ] / (n – 1) ), using (a)

X/n = (X1 + … + Xn) / n, using (b)

(a) Xi = value of i-th observation

(b) Xi = 1 if i-th observation has attribute, 0 otherwise
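Coding each observation as 1 or 0 turns the sample proportion into an ordinary sample mean. A small sketch with a hypothetical survey of 20 people (the data are invented):

```python
# Hypothetical survey of 20 people: 1 = has the attribute, 0 = does not.
indicators = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0,
              0, 1, 0, 0, 1, 0, 1, 0, 0, 1]

n = len(indicators)
count_with_attribute = sum(indicators)          # the count X in X/n
sample_proportion = count_with_attribute / n    # point estimate of pX

print(f"{count_with_attribute} of {n} have the attribute: {sample_proportion:.2f}")
```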

Page 13:

Can we do better?

Point estimates are nice, but is there a better idea? After all, who knows how close X-bar, SX, and X/n are to μX, σX, and pX?

…interval estimates…

For example, point estimate: “An estimate for the true mean μX is the point estimate X-bar = 23.66.”

For example, interval estimate: “There is a 95% probability that the true mean μX lies between 23.60 and 23.70.”

Interval estimates are stronger than point estimates

Page 14:

Yes, we can do better!

But it takes the investigation of some pretty tricky concepts…

the sampling random variable X-bar and the sampling distribution

Page 15:

The Sampling Random Variable and the Sampling Distribution

Fix in your mind a number n – the number of observations taken in a single sample

Now think about taking many different samples of size n and calculating the sample mean for each sample taken

The sampling random variable X-bar is the random variable that assigns the sample mean to each sample of size n …

And the sampling distribution is the distribution of this random variable

Page 16:

Key Facts about the Sampling Distribution

μX-bar = μX

The mean of the sampling distribution is the mean of the population

σX-bar = σX / sqrt(n)

The std dev of the sampling distribution is the std dev of the population divided by the square root of n

Central Limit Theorem

If n is large then the sampling random variable is approximately normally distributed
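These facts can be checked by simulation. The sketch below assumes a population that is uniform on [0, 1] (a choice made only for illustration), draws many samples of size n = 30, and compares the mean and std dev of the resulting sample means with the population mean and with the population std dev divided by sqrt(n):

```python
import random
import statistics

rng = random.Random(0)            # fixed seed so the run is repeatable
n = 30                            # observations per sample
num_samples = 5000                # number of repeated samples

# Assumed population: uniform on [0, 1], so mean 0.5 and std dev sqrt(1/12).
population_mean = 0.5
population_std_dev = (1 / 12) ** 0.5

# One sample mean per simulated sample of size n.
sample_means = [
    sum(rng.random() for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
std_dev_of_means = statistics.stdev(sample_means)
predicted_std_dev = population_std_dev / n ** 0.5

print(f"mean of sample means:    {mean_of_means:.4f} (population mean {population_mean})")
print(f"std dev of sample means: {std_dev_of_means:.4f} (predicted {predicted_std_dev:.4f})")
```

A histogram of `sample_means` would look roughly bell-shaped even though the assumed population is flat, which is the Central Limit Theorem at work.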

Page 17:

Comments on the Sampling Distribution

Remember: we don’t actually know μX or σX and so we don’t know the mean and standard deviation of the sampling distribution either

μX-bar = μX

σX-bar = σX / sqrt(n)

We can make statements like: “The sample mean of a random sample of size n has a 95% chance of falling within 2 std devs of the sampling distribution, up or down, from the true population mean”

The standard deviation of the sampling distribution is commonly called the standard error

Page 18:

An Example

We can make statements like: “The sample mean of a random sample of size n has a 95% chance of falling within 2 standard errors up or down from the true population mean”

(see Excel)

Again, we must stress that we don’t know the true population mean, the population std dev, or the sampling distribution std error

Page 19:

How the Sampling Distribution Can Be Used

If we don’t know anything about the sampling distribution except “in theory,” then how can we really use it?

Well, we can determine some information about the sampling distribution by taking an actual sample of size n

X-bar = (X1 + … + Xn) / n

SX-bar = SX / sqrt(n) = sqrt( [ (X1 – X-bar)^2 + … + (Xn – X-bar)^2 ] / ( n(n – 1) ) )

SX-bar is called the sample standard error
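The sample standard error is the sample std dev divided by sqrt(n). A minimal sketch with a made-up sample, computing it two equivalent ways:

```python
import statistics

observations = [3, 5, 5, 6, 8, 9]   # made-up sample
n = len(observations)

x_bar = statistics.mean(observations)
s_x = statistics.stdev(observations)        # sample std dev, n - 1 divisor
standard_error = s_x / n ** 0.5             # sample standard error

# Same quantity with n(n - 1) in the denominator of the sum of squares.
sum_sq_dev = sum((x - x_bar) ** 2 for x in observations)
standard_error_expanded = (sum_sq_dev / (n * (n - 1))) ** 0.5

print(f"X-bar = {x_bar}, SX = {s_x:.4f}, standard error = {standard_error:.4f}")
```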

Page 20:

A Practical Approach: Interval Estimates (Means)

Using a sample of size n, let X-bar serve as a point estimate of the true population mean μX and of the mean μX-bar of the sampling distribution

Also let the sample standard error SX-bar serve as an estimate of the standard error σX-bar

From this information, we can build “confidence intervals” for the true mean μX of the population

Page 21:

Heart Valve Manufacturer

Dimension              Mean     Std. Deviation
Piston Diameter        0.060    0.0002
Sleeve Diameter        0.065    0.0002
Clearance (unsorted)   0.005    0.000283

Approximately 52% of the heart valve assemblies will meet the desired tolerance. Can we do better?

Page 22:

Decision: Implement sorting with batches of 5

A random sample (after sorting has been implemented) of 100 piston/valve assemblies yields 79 valid (meet tolerances) assemblies out of the 100 trials.

How do we know whether or not the process change has really improved the resulting yield?

Page 23:

The yield (# good assemblies out of 100) is a binomial random variable. Our estimate of the mean (based on this sample) is 79% (or 79 out of 100). One way of determining whether the process has been improved is to construct a confidence interval about our estimate.

Page 24:

X = the number of good assemblies in 100 trials. The probability that X is within + or – 10 of our estimate, 79:

P{69 < X ≤ 89} = P{X ≤ 89} – P{X ≤ 69}

= BINOMDIST(89,100,0.79,true) – BINOMDIST(69,100,0.79,true)

≈ 0.9971 – 0.0123 ≈ 0.985

79 ± 10 is a 98.5% confidence interval for the number of valid assemblies out of 100.
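The same calculation can be sketched without Excel. The helper `binomial_cdf` below is a hypothetical stand-in for BINOMDIST with the cumulative flag set to true, written with the standard library only:

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); plays the role of BINOMDIST(k, n, p, TRUE)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p_hat = 100, 0.79    # 79 good assemblies out of 100

def coverage(half_width, center=79):
    """Difference of the two cumulative probabilities around the estimate."""
    upper = binomial_cdf(center + half_width, n, p_hat)
    lower = binomial_cdf(center - half_width, n, p_hat)
    return upper - lower

confidence_10 = coverage(10)
print(f"79 +/- 10 covers about {confidence_10:.3f} of the probability")
```

Calling `coverage` with other half-widths gives the confidence attached to narrower or wider intervals around 79.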

Page 25:

Confidence Interval    % Confidence
79 ± 2                 37.6%
79 ± 4                 67.4%
79 ± 8                 94.9%
79 ± 10                98.5%
79 ± 14                99.9%

Note: the larger the interval, the more certain we become that it covers the true mean.

Note that the yield of the original process was 52%. Since the lower limit of a 99.9% confidence interval about our sample mean is 65% (substantially larger than 52%), we can be pretty certain the process has improved.

Page 26:

Confidence Intervals (Means)

Using a single sample of size n ≥ 30 with information X-bar and SX-bar = SX / sqrt(n),

a 95% confidence interval for the actual population mean μX is

X-bar ± 1.960 * SX-bar

“We are 95% confident that the true population mean μX is between these two numbers.”

Page 27:

Confidence Intervals (Means) (cont’d)

For a k% confidence interval, 1.960 is replaced with the value z having P(–z ≤ Z ≤ z) = 0.01*k:

k%     0.01*(50 + k/2)    z = NORMSINV( 0.01*(50 + k/2) )
90%    0.95               1.645
95%    0.975              1.960
99%    0.995              2.576

Z is the standardized normal

By formula, z = NORMSINV( 0.01*( 50 + k/2 ) )
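Outside Excel, the same z can be obtained from the inverse CDF of the standardized normal. The sketch below uses `statistics.NormalDist().inv_cdf` in place of NORMSINV and builds a k% interval of the form X-bar ± z * SX / sqrt(n); the summary numbers passed in at the end are hypothetical and do not come from the Excel file:

```python
from statistics import NormalDist

def z_value(k):
    """z with P(-z <= Z <= z) = k/100, via the inverse CDF at 0.01*(50 + k/2)."""
    return NormalDist().inv_cdf(0.01 * (50 + k / 2))

def mean_confidence_interval(x_bar, s_x, n, k=95):
    """k% interval x_bar +/- z * s_x / sqrt(n), intended for n >= 30."""
    margin = z_value(k) * s_x / n ** 0.5
    return x_bar - margin, x_bar + margin

for k in (90, 95, 99):
    print(f"{k}%: z = {z_value(k):.3f}")

# Hypothetical summary numbers for illustration only.
low, high = mean_confidence_interval(x_bar=23.66, s_x=0.2, n=64, k=95)
print(f"95% interval: {low:.3f} to {high:.3f}")
```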

Page 28:

Sample Problem

The corresponding Excel file contains a sample of size 80 on the length of a precision shaft for use in lathes.

a. Calculate the mean, standard deviation, and standard error of the 80 values

b. Construct 95% and 99% confidence intervals (C.I.s) for the population mean

(see Excel)

Page 29:

Confidence Intervals (Proportions)

If you have a sample of size n with sample proportion X/n,

then a 95% confidence interval for the true population proportion is

X/n ± 1.960 * sqrt( (X/n)(1 – X/n) / n )

The 1.960 follows the same rules as for means with n ≥ 30

Page 30:

Confidence Intervals - Using Central Limit Theorem for Heart Valve Problem

Recall that: X = # successful assemblies in 100 trials

An estimate of the probability of obtaining a successful assembly, p, is given by:

Estimate = X/100

Page 31:

The estimate is, therefore, a rescaled binomial random variable (X divided by n = 100) with:

Mean = np/n = p

And

Standard Deviation = sqrt(np[1-p])/n = sqrt(p[1-p]/n)

Note: We can apply the CLT to approximate the binomial with a normal having the same mean and standard deviation.

Page 32:

Central Limit Theorem Restated for Population Proportions

• As the sample size, n, increases, the sampling distribution of the sample proportion approaches a normal distribution with

• Mean = p

• Standard deviation = sqrt[p (1 – p)/n]

Page 33:

Heart Valve Example Re-Visited

• 79 out of 100 assemblies were good

• Estimates for mean and stdev are:

• Mean = 0.79

• Stdev = sqrt[0.79 (1 – 0.79)/100] = 0.040731

Page 34:

To reiterate, we are using the Central Limit Theorem to approximate the distribution of the estimate as a Normal distribution with mean 0.79 and standard deviation 0.040731. Building, for example, a 95% confidence interval means finding a number, r, satisfying:

P(0.79 – r <= estimate <= 0.79 + r) = 95%

Page 35:

Since the Normal distribution is symmetric about its mean (in this case 0.79), exactly half of the “leftover” probability (5% for a 95% confidence interval) must lie in each tail, that is, 2.5% in each tail. In other words,

P{estimate <= 0.79 – r} = 0.025, and

P{estimate <= 0.79 + r} = 0.975

To perform this calculation, use the NORMINV function from Excel: 0.79 + r = NORMINV(0.975,0.79,0.040731) ≈ 0.8698. Solving for r, we get r ≈ 0.0798, so

Page 36:

a 95% confidence interval for the estimate is given by

estimate = 0.79 ± 0.0798
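The NORMINV step has a direct counterpart in Python's standard library. A minimal sketch, using `statistics.NormalDist(...).inv_cdf` in place of NORMINV (a substitution for illustration, not part of the lecture's Excel workflow):

```python
from statistics import NormalDist

p_hat = 79 / 100                                # point estimate of p
n = 100
std_dev = (p_hat * (1 - p_hat) / n) ** 0.5      # sqrt(p(1 - p)/n)

# Upper end of the interval: the 97.5th percentile of N(0.79, std_dev^2).
upper = NormalDist(p_hat, std_dev).inv_cdf(0.975)
r = upper - p_hat

print(f"std dev = {std_dev:.6f}")
print(f"95% interval: {p_hat} +/- {r:.4f}")
```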

Page 37:

Results using Normal Approximation:

Confidence Level    Confidence Interval
99.9%               0.79 ± 0.135
98.5%               0.79 ± 0.099
95%                 0.79 ± 0.0798
67.4%               0.79 ± 0.04

Results using Binomial Directly:

Confidence Level    Confidence Interval
99.9%               0.79 ± 0.14
98.5%               0.79 ± 0.10
94.9%               0.79 ± 0.08
67.4%               0.79 ± 0.04
37.6%               0.79 ± 0.02

Page 38:

Example Problem Continued

(see Excel)

c. Estimate the population proportion of shafts that exceed 6.625 inches. Construct a 90% C.I. for this proportion

Page 39:

Sample Size Needed to Achieve High Confidence (Means)

When estimating μX, how many observations n are needed to obtain a 95% confidence interval for a particular error tolerance?

The error tolerance E is ½ the width of the confidence interval

n = (1.960/E)^2 σ^2

Here, σ is a conservative (high) estimate of the true std dev σX, often obtained from a preliminary small sample

1.960 can be adjusted to get different confidences
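A sketch of the sample-size calculation for means, assuming the usual relation n = (z * σ / E)^2 rounded up to a whole observation. The values σ = 15 and E = 2 are invented for illustration:

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, error_tolerance, confidence=95):
    """Smallest n whose confidence interval half-width is at most error_tolerance."""
    z = NormalDist().inv_cdf(0.01 * (50 + confidence / 2))
    return ceil((z * sigma / error_tolerance) ** 2)

# Hypothetical inputs: conservative std dev 15, tolerance E = 2.
n_95 = sample_size_for_mean(sigma=15, error_tolerance=2, confidence=95)
n_99 = sample_size_for_mean(sigma=15, error_tolerance=2, confidence=99)

print(f"95% confidence: n = {n_95}")
print(f"99% confidence: n = {n_99}")
```

Raising the confidence level or halving E both push n up; halving E roughly quadruples it because E enters the formula squared.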

Page 40:

Example Problem Continued

(see Excel)

d. Consider the sample of 80 as a preliminary sample. Find the minimal sample size to yield a 95% C.I. for the population mean with E = 0.00005. What about for 99% confidence?

Page 41:

Sample Size Needed to Achieve High Confidence (Proportions)

When estimating pX, how many observations n are needed to obtain a 95% confidence interval for a particular error tolerance?

The error tolerance E is ½ the width of the confidence interval

n = (1.960/E)^2 p(1 – p)

Here, p is a conservative (closer to 0.5) estimate of the true population proportion pX, often obtained from a preliminary small sample

1.960 can be adjusted to get different confidences

Page 42:

Polling Example

In estimating the proportion of the population that approves of Bush’s performance as President, how many people should be polled to provide a 95% confidence interval with error 0.02?

“83% of Americans approve of the job Bush is doing, plus or minus 2 percentage points”

We should use a conservative estimate of the true proportion

n = (1.960/0.02)^2 (0.5)(1 – 0.5) = 2401.0

But, if we’re certain pX ≥ 0.6…

n = (1.960/0.02)^2 (0.6)(1 – 0.6) = 2304.96
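The polling arithmetic can be reproduced with a short function. This is a sketch of n = (z/E)^2 p(1 – p) with z fixed at 1.960 as on the slide; the function name is hypothetical, and in practice the result is rounded up to a whole respondent:

```python
from math import ceil

def raw_sample_size(p, error_tolerance, z=1.960):
    """Unrounded n = (z / E)^2 * p * (1 - p) for a proportion."""
    return (z / error_tolerance) ** 2 * p * (1 - p)

n_conservative = raw_sample_size(0.5, 0.02)   # worst case, p = 0.5
n_if_p_is_0_6 = raw_sample_size(0.6, 0.02)    # if p is taken to be 0.6
people_to_poll = ceil(n_if_p_is_0_6)

print(f"p = 0.5: n = {n_conservative:.2f}")
print(f"p = 0.6: n = {n_if_p_is_0_6:.2f}, so poll {people_to_poll} people")
```

The product p(1 – p) peaks at p = 0.5, which is why 0.5 is the conservative choice: any other value of p gives a smaller required n.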

