Post on 25-Feb-2016
Normal Distributions 2/27/12
• Normal Distribution
• Central Limit Theorem
• Normal distributions for confidence intervals
• Normal distributions for p-values
• Standard Normal
Corresponding Sections: 5.1, 5.2
Exam 1 Grades
[Figure: six dot plots of bootstrap and randomization distributions]
• Slope: Restaurant tips (scrambled RestaurantTips; axis: slope, in thousandths)
• Correlation: Malevolent uniforms (scrambled samples; axis: r)
• Mean: Body temperatures (samples of BodyTemp50; axis: xbar)
• Difference in means: Finger taps (scrambled CaffeineTaps; axis: Diff)
• Mean: Atlanta commutes (samples of CommuteAtlanta; axis: xbar)
• Proportion: Owners/dogs (samples; axis: phat)
What do you notice?
All bell-shaped distributions!
Bootstrap and Randomization Distributions
• The symmetric, bell-shaped curve we have seen for almost all of our bootstrap and randomization distributions is called a normal distribution
Normal Distribution
[Figure: histogram of a normal distribution; x-axis from -3 to 3, y-axis: Frequency]
Central Limit Theorem!
For a sufficiently large sample size, the distribution of sample statistics for a mean or a proportion is approximately normal
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
Central Limit Theorem
• The central limit theorem holds for ANY original distribution, although “sufficiently large sample size” varies
• The more skewed the original distribution is (the farther from normal), the larger the sample size has to be for the CLT to work
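A small simulation can make the effect of sample size concrete. The sketch below is only an illustration, not part of the original slides: it assumes a right-skewed Exponential(1) population (mean 1, standard deviation 1) and uses hypothetical helper names `sample_means` and `skewness`. As n grows, the skewness of the sample means should shrink toward 0 and their spread should approach 1/sqrt(n).

```python
import random
import statistics


def sample_means(n, reps=2000, seed=0):
    """Return `reps` sample means, each computed from n draws of Exponential(1)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric, bell-shaped distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


# Assumed sample sizes, chosen only to show the trend.
for n in (2, 10, 50):
    means = sample_means(n)
    print(n, round(statistics.stdev(means), 3), round(skewness(means), 2))
```

Swapping in a more heavily skewed population should require a larger n before the skewness gets close to 0, which is the point of the second bullet.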
Central Limit Theorem
• For distributions of a quantitative variable that are not very skewed and without large outliers, n ≥ 30 is usually sufficient to use the CLT
• For distributions of a categorical variable, counts of at least 10 within each category are usually sufficient to use the CLT
Normal Distribution
• The normal distribution is fully characterized by its mean and standard deviation
N(mean, standard deviation)
Normal Distribution
N(0.523, 0.048)
Bootstrap Distributions
If a bootstrap distribution is approximately normally distributed, we can write it as
a) N(parameter, sd)
b) N(statistic, sd)
c) N(parameter, se)
d) N(statistic, se)
sd = standard deviation of variable
se = standard error = standard deviation of statistic
Confidence Intervals
If the bootstrap distribution is normal:
To find a P% confidence interval, we just need to find the middle P% of the distribution
N(statistic, SE)
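Finding the middle P% of N(statistic, SE) can be sketched with Python's standard-library `statistics.NormalDist`. The function name `middle_interval` is hypothetical, and the call reuses the N(0.523, 0.048) distribution from the earlier slide purely as illustrative numbers.

```python
from statistics import NormalDist


def middle_interval(statistic, se, level=0.95):
    """Endpoints that cut off the middle `level` of N(statistic, se)."""
    dist = NormalDist(mu=statistic, sigma=se)
    tail = (1 - level) / 2  # area left over in each tail
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


# Illustrative numbers only: middle 95% of N(0.523, 0.048).
low, high = middle_interval(0.523, 0.048, level=0.95)
print(round(low, 3), round(high, 3))
```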
Best Picture
What proportion of visitors to www.naplesnews.com thought The Artist should win best picture?
p̂ = 0.15
SE = ???
Area under a Curve
• The area under the curve of a normal distribution is equal to the proportion of the distribution falling within that range
• Knowing just the mean and standard deviation of a normal distribution allows you to calculate areas in the tails and percentiles
http://davidmlane.com/hyperstat/z_table.html
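The same areas and percentiles that the linked applet reports can also be computed in code. This is a minimal sketch using `statistics.NormalDist`; the helper names and the N(100, 15) example values are assumptions made for illustration and do not come from the slides.

```python
from statistics import NormalDist


def area_between(mean, sd, low, high):
    """Proportion of N(mean, sd) that falls between low and high."""
    dist = NormalDist(mean, sd)
    return dist.cdf(high) - dist.cdf(low)


def upper_tail(mean, sd, x):
    """Proportion of N(mean, sd) that falls above x."""
    return 1 - NormalDist(mean, sd).cdf(x)


def percentile(mean, sd, p):
    """Value below which a proportion p of N(mean, sd) falls."""
    return NormalDist(mean, sd).inv_cdf(p)


# Hypothetical N(100, 15) distribution.
print(area_between(100, 15, 85, 115))  # area within one standard deviation of the mean
print(upper_tail(100, 15, 130))        # area more than two standard deviations above
print(percentile(100, 15, 0.975))      # 97.5th percentile
```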
Best Picture
http://davidmlane.com/hyperstat/z_table.html
Best Picture
Confidence Intervals
For a normal sampling distribution, we can also use the formula
sample statistic ± 2 × SE
to give a 95% confidence interval.
For Best Picture: 0.15 ± 2 × 0.03 = 0.15 ± 0.06 = (0.09, 0.21)
Confidence Intervals
For normal bootstrap distributions, the formula
sample statistic ± 2 × SE
gives a 95% confidence interval.
How would you use the N(0,1) normal distribution to find the appropriate multiplier for other levels of confidence?
Confidence Intervals
For a P% confidence interval, use
sample statistic ± z* × SE
where P% of a N(0,1) distribution is between -z* and z*
[Figure: standard normal curve with the middle 95% shaded between -z* and z*]
Find z* for a 99% confidence interval.
http://davidmlane.com/hyperstat/z_table.html
z* = 2.576
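As an alternative to reading z* from the applet, the multiplier can be looked up with the inverse CDF of N(0,1). This is a sketch only; `z_star` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_star(level):
    """Multiplier z* such that a proportion `level` of N(0,1) lies between -z* and z*."""
    upper_area = 1 - (1 - level) / 2  # area to the left of +z*
    return NormalDist().inv_cdf(upper_area)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_star(level), 3))
```

For 95% confidence this gives a value just under 2, which is why the "2 × SE" rule works as an approximation.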
News Sources
“A new national survey shows that the majority (64%) of American adults use at least three different types of media every week to get news and information about their local community”
The standard error for this statistic is 1%
Find a 99% confidence interval for the true proportion.
Source: http://pewresearch.org/databank/dailynumber/?NumberID=1331
News Sources
sample statistic ± z* × SE
0.64 ± 2.576 × 0.01
0.64 ± 0.026
(0.614, 0.666)
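The News Sources calculation can be checked with a few lines of code. The sketch below applies the statistic ± z* × SE formula with the slide's values (0.64, SE 0.01, 99% confidence); the function name `normal_ci` is an assumption.

```python
from statistics import NormalDist


def normal_ci(statistic, se, level):
    """Confidence interval statistic ± z* × SE, with z* taken from N(0,1)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * se
    return statistic - margin, statistic + margin


low, high = normal_ci(0.64, 0.01, 0.99)
print(round(low, 3), round(high, 3))
```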
Confidence Interval Formula
sample statistic ± z* × SE
• sample statistic: from original data
• SE: from bootstrap distribution
• z*: from N(0,1)
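The three ingredients can be put together in one short pipeline. The following is a sketch under stated assumptions: the data values are made up, the statistic is taken to be the sample mean, and `bootstrap_se` and `bootstrap_normal_ci` are hypothetical names. The standard error is estimated as the standard deviation of the bootstrap statistics, and the interval is only reasonable if the bootstrap distribution looks approximately normal.

```python
import random
import statistics
from statistics import NormalDist


def bootstrap_se(data, stat=statistics.fmean, reps=2000, seed=0):
    """Standard deviation of `stat` across bootstrap resamples (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(data)
    boot_stats = [stat(rng.choices(data, k=n)) for _ in range(reps)]
    return statistics.stdev(boot_stats)


def bootstrap_normal_ci(data, level=0.95, stat=statistics.fmean, reps=2000, seed=0):
    """statistic ± z* × SE: statistic from the data, SE from the bootstrap, z* from N(0,1)."""
    statistic = stat(data)
    se = bootstrap_se(data, stat, reps, seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return statistic - z * se, statistic + z * se


# Made-up commute times in minutes, for illustration only.
commute_minutes = [12, 25, 30, 18, 45, 22, 35, 28, 15, 60, 20, 33, 27, 40, 24]
print(bootstrap_normal_ci(commute_minutes, level=0.95))
```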
First Born Children
• Are first born children actually smarter?
• Based on data from last semester’s class survey, we’ll test whether first born children score significantly higher on the SAT
• From a randomization distribution, we find SE = 37
X̄_first born − X̄_not first born = 30.26
First Born Children
X̄_first born − X̄_not first born = 30.26, SE = 37
What normal distribution should we use to find the p-value?
a) N(30.26, 37)
b) N(37, 30.26)
c) N(0, 37)
d) N(0, 30.26)
Hypothesis Testing
[Figure: distribution of the statistic assuming the null is true (x-axis: Statistic, from -3 to 3), with the observed statistic marked and the p-value shaded as the tail area beyond it]
p-values
If the randomization distribution is normal:
To calculate a p-value, we just need to find the area of the distribution in the appropriate tail(s) beyond the observed statistic
N(null value, SE)
First Born Children
N(0, 37)
http://davidmlane.com/hyperstat/z_table.html
p-value = 0.207
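The tail area can also be computed directly from N(null value, SE). This sketch uses the slide's values (observed difference 30.26, null value 0, SE 37) and takes the upper tail, since the question is whether first born children score higher; `normal_p_value` and its `tail` argument are hypothetical names.

```python
from statistics import NormalDist


def normal_p_value(observed, null_value, se, tail="upper"):
    """Area of N(null_value, se) in the chosen tail(s) beyond the observed statistic."""
    dist = NormalDist(null_value, se)
    lower = dist.cdf(observed)      # area at or below the observed statistic
    upper = 1 - lower               # area above the observed statistic
    if tail == "upper":
        return upper
    if tail == "lower":
        return lower
    if tail == "two":
        return 2 * min(lower, upper)
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


print(round(normal_p_value(30.26, 0, 37, tail="upper"), 3))
```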
First Born Children
Standard Normal
• Sometimes, it is easier to just use one normal distribution to do inference
• The standard normal distribution is the normal distribution with mean 0 and standard deviation 1
N(0, 1)
[Figure: distribution of the statistic assuming the null is true; x-axis: Statistic, from -3 to 3]
Standardized Test Statistic
• The standardized test statistic is the number of standard errors a statistic is from the null value
• The standardized test statistic (also called a z-statistic) is compared to N(0,1)
z = (sample statistic − null value) / SE
p-value
1) Find the standardized test statistic:
z = (sample statistic − null value) / SE
2) The p-value is the area in the tail(s) beyond z for a standard normal distribution
First Born Children
1) Find the standardized test statistic
z = (sample statistic − null value) / SE = (30.26 − 0) / 37 = 0.818
First Born Children
2) Find the area in the tail(s) beyond z for a standard normal distribution
p-value = 0.207
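The two steps for the first born example can be written out as follows. This is a sketch using the slide's numbers; `z_statistic` is a hypothetical helper name. Standardizing first and comparing to N(0,1) should give the same upper-tail area as working with N(0, 37) directly.

```python
from statistics import NormalDist


def z_statistic(sample_statistic, null_value, se):
    """Number of standard errors the sample statistic lies from the null value."""
    return (sample_statistic - null_value) / se


# Step 1: standardize the observed difference in means.
z = z_statistic(30.26, 0, 37)

# Step 2: upper-tail area beyond z under the standard normal N(0,1).
p_value = 1 - NormalDist().cdf(z)

print(round(z, 3), round(p_value, 3))
```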
z-statistic
• Calculating the number of standard errors a statistic is from the null value allows us to assess extremity on a common scale
Formula for p-values
z = (sample statistic − null value) / SE
• sample statistic: from original data
• null value: from H0
• SE: from randomization distribution
Compare z to N(0,1) for p-value
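The whole recipe can be sketched end to end for a difference in means. Everything specific here is an assumption made for illustration: the scores are made up (they are not the class survey data), the function names are hypothetical, and the randomization simply shuffles the pooled values and splits them into two groups of the original sizes.

```python
import random
import statistics
from statistics import NormalDist


def randomization_se(group_a, group_b, reps=2000, seed=0):
    """Standard deviation of the difference in means when group labels are scrambled."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)  # scrambling mimics a null hypothesis of no difference
        diffs.append(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
    return statistics.stdev(diffs)


def z_test_upper(group_a, group_b, null_value=0.0, reps=2000, seed=0):
    """Return (z, upper-tail p-value) for mean(group_a) - mean(group_b)."""
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)  # from original data
    se = randomization_se(group_a, group_b, reps, seed)               # from randomization
    z = (observed - null_value) / se                                  # null value from H0
    return z, 1 - NormalDist().cdf(z)                                 # compare z to N(0,1)


# Made-up scores, for illustration only.
first_born = [1250, 1310, 1180, 1400, 1290, 1350, 1220, 1330]
not_first_born = [1200, 1280, 1150, 1320, 1260, 1190, 1300, 1240]
print(z_test_upper(first_born, not_first_born))
```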
Standard Error
• Wouldn’t it be nice if we could compute the standard error without doing thousands of simulations?
• We can!!!
• Or rather, we’ll be able to on Wednesday!