Date post: | 02-Jan-2016 |
Upload: | charleen-dulcie-bishop |
Chapter 15
Inference in Practice
PSLS/2e Chapter 15 1
Effective use of inferential methods requires more than knowing the facts. It requires understanding the reasoning behind the
process.
z Procedures
• If we know the standard deviation σ before the data are collected, the confidence interval for µ is: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
• To test H0: µ = µ0, we use this statistic: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
• These are called z procedures because they rely on critical values from the Z ~ N(0,1) density function
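As a rough illustration of the two z procedures above, here is a minimal Python sketch. The function names and the numbers (x-bar = 105, µ0 = 100, σ = 15, n = 36) are hypothetical and are not taken from the slides; the critical value z* comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper) bounds of the z interval for mu when sigma is known."""
    # z* is the critical value leaving (1 - confidence) / 2 in each tail of N(0, 1).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


def z_statistic(xbar, mu0, sigma, n):
    """Return the z test statistic for H0: mu = mu0 when sigma is known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


# Hypothetical inputs, chosen only for illustration.
lower, upper = z_confidence_interval(xbar=105.0, sigma=15.0, n=36, confidence=0.95)
z = z_statistic(xbar=105.0, mu0=100.0, sigma=15.0, n=36)
print(f"95% CI: ({lower:.2f}, {upper:.2f}); z = {z:.2f}")
```

Both functions assume the conditions for z procedures hold, in particular that σ is known before the data are collected.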
Conditions for z Procedures
1. Data must resemble an SRS from the population
– Ask: “where did the data come from?”
– Bad samples (see next slide) invalidate methods
2. Population must be Normal … BUT … a fact known as the Central Limit Theorem tells us the sampling distribution of x-bar will be approximately Normal even if the population is not Normal, if the sample is “large enough”
– In practice, z procedures are robust in large samples
3. Population standard deviation σ must be known before data are collected … Chapter 17 will introduce procedures that can be used when σ is not known
Examples of Bad Samples
• Convenience samples - selecting members of the population that are easiest to reach
– Example: in a sample of mall shoppers, teenagers and retired people will be over-represented
• Voluntary response samples - people who choose themselves by responding to a broad appeal
– Example: online polls are useless scientifically (people who take the trouble to respond are not representative of the larger population)
• Under-coverage - some groups in the population are left out or underrepresented
– Example: using a telephone listing to select subjects (not everyone has a listed phone number)
• If the data do not come from an SRS or a randomized experiment, conclusions are open to challenge.
• Always ask where the data came from.
Inference about µ 604/20/23 Inference about µ 6
Normality Assumption and the Central Limit Theorem
Normality can be assumed when n is large because of the Central Limit Theorem
• Sample size less than 15: “Normality” can be assumed if data are symmetric, have a single peak and no outliers. If data are highly skewed, avoid z [and t] procedures.
• Sample size at least 15: Normality can be assumed unless data are strongly skewed or have outliers.
• Large samples (n > 30 to 60, roughly n ≥ 40): Normality can be assumed even for skewed distributions.
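The guidelines above can be explored with a small simulation. The sketch below is only an illustration under assumed settings: it uses an Exponential(1) population as a stand-in for "strongly skewed" data, and the sample sizes (5 and 60), the number of repetitions, and the seed are arbitrary choices, not values from the slides. It compares the skewness of the sampling distribution of x-bar for a small and a large n.

```python
import random
from statistics import mean, stdev


def sample_means(n, reps, rng):
    """Draw `reps` samples of size n from an Exponential(1) population; return their means."""
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def skewness(values):
    """Simple moment-based skewness: the average cubed standardized deviation."""
    m = mean(values)
    s = stdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


rng = random.Random(15)  # fixed seed so the run is repeatable
small = sample_means(n=5, reps=4000, rng=rng)
large = sample_means(n=60, reps=4000, rng=rng)

# The sampling distribution of x-bar should look much more symmetric for n = 60.
print(f"skewness of x-bar, n = 5:  {skewness(small):.2f}")
print(f"skewness of x-bar, n = 60: {skewness(large):.2f}")
```

The skewness for n = 60 should come out clearly smaller than for n = 5, which is the Central Limit Theorem at work; it does not say the individual observations become Normal.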
Can Normality be assumed?
Moderately sized data set (n = 20) with strong skew. Normality cannot be assumed.
Do NOT use z [or t] procedures
Can Normality be assumed?
Extremely large data set (n ≈ 1000)
The data have a strong positive skew
But since the sample is large, the Central Limit Theorem applies and we can assume the sampling distribution of x-bar is approximately Normal.
Do use z [or t] procedures.
Can Normality be assumed?
The distribution has no clear departures from Normality. Therefore, we can trust z [and t] procedures.
n is moderate
Additional Caution: GIGO
• Garbage In, Garbage Out
• A study is only as good as the quality of the data
• CIs and P-values are valueless when the INFORMATION is of POOR QUALITY
• Example: Self-reported data can be inaccurate and biased
Additional Caution: P-values
• P-values (significance tests) are often misunderstood
• Even large differences can fail to be significant if the sample is small
• Statistical significance does NOT tell us whether a finding is important: statistical significance is NOT the same as practical significance
• A P-value is NOT the probability that H0 is true; it is the probability of obtaining data at least as extreme as those observed, computed assuming H0 is true
• Failure to reject H0 is NOT the same as accepting H0
• Although α = 0.05 is a common cut-off, there is NO set border between “significant” and “insignificant” results; surely God loves P = .06 nearly as much as P = .05.
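To see how the same observed difference can be non-significant in a small sample and significant in a large one, consider this minimal sketch. All numbers (x-bar = 105, µ0 = 100, σ = 15, n = 9 versus n = 100) are hypothetical, and the function name is an assumption made for illustration.

```python
import math
from statistics import NormalDist


def two_sided_p_value(xbar, mu0, sigma, n):
    """Two-sided P-value of the z test for H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # Probability, assuming H0 is true, of a z at least this far from 0 in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same 5-point difference from mu0, two hypothetical sample sizes.
p_small = two_sided_p_value(xbar=105, mu0=100, sigma=15, n=9)
p_large = two_sided_p_value(xbar=105, mu0=100, sigma=15, n=100)
print(f"n = 9:   P = {p_small:.4f}")
print(f"n = 100: P = {p_large:.4f}")
```

With n = 9 the P-value is well above 0.05, while with n = 100 it is far below it, even though the observed difference is identical. The P-value reflects sample size as well as effect size, which is why it cannot stand in for practical importance.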
Margin of Error (m)
• When estimating µ with confidence level C, the margin of error is: $m = z^* \frac{\sigma}{\sqrt{n}}$
• The margin of error (half the length of the CI) indicates the precision of the estimate
• z* and σ are immutable at a given level of confidence
• To increase precision, increase the sample size: ↑ n → ↓ m → ↑ precision
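The relationship between n and m can be checked numerically. This is a small sketch, assuming σ = 60 and 90% confidence (the values used in the NAEP example in this chapter); the function name and the sample sizes 25, 100, and 400 are illustrative choices.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """Margin of error m = z* * sigma / sqrt(n) for a z interval with known sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / math.sqrt(n)


# Each fourfold increase in n cuts the margin of error in half.
for n in (25, 100, 400):
    m = margin_of_error(sigma=60, n=n, confidence=0.90)
    print(f"n = {n:3d}: m = {m:.2f}")
```

Because m shrinks with the square root of n, halving the margin of error requires four times as many observations.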
Choosing a Sample Size
To determine the sample size required to achieve margin of error m when estimating µ, use: $n = \left( \frac{z^* \sigma}{m} \right)^2$
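The sample size formula translates directly into code. The sketch below is one possible implementation, not part of the original slides: the function name is hypothetical, the example inputs (σ = 10, m = 2, 95% confidence) are made up, and z* is taken from `statistics.NormalDist` at full precision rather than from a rounded table value.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence):
    """Smallest n for which the z-interval margin of error does not exceed `margin`."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always round UP: rounding down would leave the margin of error too large.
    return math.ceil((z_star * sigma / margin) ** 2)


# Hypothetical inputs, chosen only for illustration.
n_needed = required_sample_size(sigma=10, margin=2, confidence=0.95)
print(n_needed)  # → 97
```

Using `math.ceil` encodes the "always round up" rule, so the achieved margin of error is never larger than the one requested.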
Example: National Assessment of Educational Progress (NAEP) Math Scores
NAEP math scores predict success following high school
Suppose that we want to estimate the population mean NAEP score with 90% confidence and want the margin of error to be no more than ±5 points
We know the NAEP math scores have σ = 60
What sample size will be required to enable us to create such an interval?
Example
NAEP Quantitative Scores
$n = \left( \frac{z^* \sigma}{m} \right)^2 = \left( \frac{(1.645)(60)}{5} \right)^2 = 389.67$
If you round down, your margin of error will be bigger. If you round up, your margin of error will be smaller (a good thing).
Always round UP to the next integer. Study 390 individuals so that m is no greater than 5.
Example: Decrease margin of error m
Now suppose we want to estimate the population mean NAEP score with 90% confidence and want the margin of error not to exceed 3 points (recall that σ = 60).
What sample size will be required to enable us to create such an interval?
Case Study
NAEP Quantitative Scores
$n = \left( \frac{(1.645)(60)}{3} \right)^2 = 1082.41$
Therefore resolve to study 1083 individuals so that the margin of error does not exceed 3 points.
Note that lowering the margin of error to 3 points required a much larger sample size!
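Why the jump is so large follows from the sample size formula itself. Writing n_5 and n_3 for the sample sizes required for margins of 5 and 3 points (labels introduced here only for illustration), z* and σ cancel in the ratio:

```latex
\frac{n_{3}}{n_{5}}
  = \frac{\left( z^{*}\sigma / 3 \right)^{2}}{\left( z^{*}\sigma / 5 \right)^{2}}
  = \left( \frac{5}{3} \right)^{2}
  \approx 2.78
```

So shrinking the margin of error from 5 to 3 points requires roughly 2.8 times as many subjects, whatever the confidence level or σ.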