Day 8: Sampling

Daniel J. Mallinson

School of Public Affairs, Penn State Harrisburg

[email protected]

PADM-HADM 503

Mallinson Day 8 October 12, 2017 1 / 46


Road map

Why Sample?

Sampling terminology

Probability and Non-Probability Sampling

Sample Size

Where do the formulas come from?

An SPSS Example


Why Sample?

Often not feasible to study the entire population

Too costly, too time consuming, or both

Enables us to make generalizations about a large number of cases by studying a small number, with a reasonable degree of validity


Sampling Terminology

Sample

A selected group of units that are representative of a general population

Population

The entire group of units that are of interest to the researcher

Target population

A specifically defined population


Sampling Terminology

Sampling Frame

The complete list of units from which a sample is selected (may not be the same as the population)

Unit of Analysis

Units about which information is collected and analyses are conducted

Sampling Unit

The unit selected at a given stage of sampling; this may be different from the unit of analysis at different stages of sampling (see cluster sampling)


Sampling Terminology

Parameter

A characteristic (measure) of the population

Statistic

A characteristic (measure) of the sample

Sampling Error

The difference between the parameter and the statistic


Sampling Terminology

Standard Error

A measure (approximation) of sampling error

Sample Bias

Non-statistical errors; systematic misrepresentations of population characteristics

Sampling Fraction

Percentage of the population selected for the sample

Sampling Design

Procedure of selecting a sample


Example of Terms

Population: All motor vehicles owned in the state in the current fiscal year

Sampling Frame: All vehicles appearing on the state list of Registered Motor Vehicles prepared July 1 of the current fiscal year by the DMV

Sampling Design: Probability sampling

Sample: 300 motor vehicles randomly selected from the sampling frame

Unit of analysis: Motor vehicle

Statistic: Average distance passenger cars in the sample were driven annually: 20,000 miles

Parameter: The actual average annual mileage of all passenger cars in the state


Group Task

You have decided to conduct a mail survey for the following study:

You are an administrator at the Dauphin County Department of Human Services. One of the programs under your jurisdiction is a smoking cessation program that targets pregnant women. You would like to evaluate the effectiveness of this program and determine why some women were successful at quitting and others were not. Remember that these factors could be personal and/or programmatic.

As a group, determine the population, a sampling frame, sampling design, sample size, unit of analysis, statistic, and the related parameter.


Two Groups of Sampling Designs

Probability Sampling Designs

Designs whose sizes and sampling errors can be estimated using statistical analyses

1. Simple Random Sampling

2. Systematic Sampling

3. Stratified Random Sampling

4. Cluster and Multistage Sampling

Non-Probability Sampling Designs

Designs whose sizes or sampling errors cannot be estimated using statistical analyses

1. Convenience designs

2. Purposive sampling

3. Quota sampling

4. Snowball sampling


Probability Sampling Designs

Simple Random Sampling

The original sampling method

The basis of basic sampling statistics

Statistical formulas used in our book are based on this; all others are variations on this model


Probability Sampling Designs

Simple Random Sampling

The principle: Each unit should have the same chance of being selected

Two types:

1. With replacement

2. Without replacement - most commonly called simple random sampling


Excel Method

Create column of names

Type =RAND() in the second column

Drag the bottom corner to copy the formula down the list

Copy, paste, and select the “values only” option

Sort by the random numbers
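The same spreadsheet steps can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name `simple_random_sample` and the list of names are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement."""
    rng = random.Random(seed)
    # Mirror the spreadsheet method: attach a random number to each unit,
    # sort by that number, and keep the first n units.
    keyed = [(rng.random(), unit) for unit in frame]
    keyed.sort(key=lambda pair: pair[0])
    return [unit for _, unit in keyed[:n]]

# Hypothetical sampling frame of 20 names.
names = [f"Person {i}" for i in range(1, 21)]
print(simple_random_sample(names, 5, seed=8))
```

Because every unit receives its own random sort key, each unit has the same chance of landing in the first n positions.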


Probability Sampling Designs

Systematic Sampling

Statistical formulas are the same as for simple random sampling

Called quasi-random sampling

Units are ordered in a sequence

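Skip interval = Number of units in the sampling frame / Number of units in the sample

Skip interval of 5: from the ordered list 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ..., select every 5th unit after a starting point (e.g., 1, 6, 11, ...)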
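A short Python sketch of the skip-interval idea follows. It assumes the frame is already ordered and that the starting point is chosen at random within the first interval; the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select n units from an ordered frame using a fixed skip interval."""
    rng = random.Random(seed)
    # Skip interval = units in the frame / units in the sample (rounded down).
    k = len(frame) // n
    # Random starting point within the first interval.
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame of 100 numbered units; a sample of 20 gives a skip interval of 5.
frame = list(range(1, 101))
print(systematic_sample(frame, 20, seed=1))
```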


Probability Sampling Designs

Systematic Sampling

The problem of periodicity

Example: If you want to estimate the number of people on 2nd Street (“Restaurant Row”) in Harrisburg, do not sample every 7th evening (e.g., Saturdays). You will get a biased sample.

A strategy to break up periodicity:

Select a starting point randomly and select half of the sample in the first round. Then select another starting point in the other half.


Probability Sampling Designs

Stratified Random Sampling

Divide the population into strata and make random selections from each stratum

Results in better representation than simple random sampling, because each stratum is homogeneous

Requires a smaller sample size than simple random sampling for the same level of accuracy


Probability Sampling Designs

Stratified Random Sampling

Two Types:

1. Proportionate: Strata in a population will be represented proportionately

2. Disproportionate: Some strata may be over-sampled to ensure representation; results of the combined dataset should be weighted
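Proportionate stratified sampling can be sketched as follows. This is an illustration under stated assumptions: the function name, the urban/rural strata, and the 10% sampling fraction are all hypothetical.

```python
import random
from collections import defaultdict

def proportionate_stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    # Group the units by stratum.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        # The same sampling fraction in each stratum keeps the sample proportionate.
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 urban and 40 rural units.
units = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = proportionate_stratified_sample(units, lambda u: u[0], 0.10, seed=3)
print(sample)
```

A disproportionate design would use a different fraction per stratum and then weight the combined results, as noted above.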


Probability Sampling Designs

Cluster and Multistage Sampling

The weakest, but most commonly used method

Weakest means that this method requires the largest sample size for the same level of accuracy

Random-digit dialing emulates this method

If one stage is used, it is called cluster sampling

Gather data on all units within randomly selected clusters

If multiple stages, it is called multistage sampling


Probability Sampling Designs

Cluster and Multistage Sampling

Examples of levels that can be used in multistage sampling:

State
County
Township, borough, city
Neighborhoods (Census tracts)
Blocks
Households
Particular individuals

Note that the sampling unit changes at each stage


Probability Sampling Designs

Cluster and Multistage Sampling

Probability proportionate to size (PPS) technique: Larger units are given more chances to be selected
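One simple way to illustrate PPS is to draw clusters with replacement, using cluster sizes as selection weights. This is a sketch only: the function name `pps_sample` and the county sizes are hypothetical, and practical PPS designs often sample without replacement.

```python
import random

def pps_sample(cluster_sizes, n_clusters, seed=None):
    """Draw clusters with replacement, with probability proportionate to size."""
    rng = random.Random(seed)
    names = list(cluster_sizes)
    # Each cluster's chance of selection is proportional to its size.
    weights = [cluster_sizes[name] for name in names]
    return rng.choices(names, weights=weights, k=n_clusters)

# Hypothetical clusters (e.g., counties) and their population sizes.
sizes = {"County A": 9000, "County B": 900, "County C": 100}
print(pps_sample(sizes, 5, seed=2))
```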


Ranking Sampling Designs In Terms of Accuracy

The best (most powerful, accurate) method yields the least amount of sampling error for the same sample size

In other words, the best method requires the smallest sample size for the same level of sampling error

The Ranking:

1. Stratified Random Sampling (Best)

2. Simple Random Sampling, Systematic Sampling

3. Cluster and Multistage Sampling (Worst)


Non-Probability Sampling Designs

1. Convenience designs (accidental sampling): Select whatever unit you want first

2. Purposive sampling (theory-based): There is a non-statistical reasoning behind the sampling strategy used

3. Quota sampling: This is like stratified sampling, but units are not selected randomly

4. Snowball sampling: One unit leads to the next one


Sample Size

The rule of thumb: the larger, the better

But the calculation of a sample size is more complex than this rule

Larger samples cost more and larger samples may be more prone to errors in the data collection process

So, we need to select samples that are large enough for the resources (money and time) we have and the level of measurement error we can tolerate


Sample Size

Sample size is determined by:

Population size

Population variability (homogeneity)

Confidence level

Accuracy desired


Sample Size

Population size:

Not a linear relationship with sample size (diminishing returns)

Is ignored for large population sizes, like a national population
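The diminishing role of population size can be illustrated with a common finite-population adjustment, n = n0 / (1 + (n0 - 1) / N), where n0 is the sample size computed while ignoring population size. This adjustment is not given in the slides; it is a standard textbook form assumed here for illustration, and the value of n0 is hypothetical.

```python
import math

def adjusted_sample_size(n0, N):
    """Shrink a large-population sample size n0 for a population of size N."""
    return math.ceil(n0 / (1 + (n0 - 1) / N))

# Hypothetical n0 for 95% confidence and +/-4% accuracy with p = 0.5.
n0 = 600.25
for N in (1_000, 10_000, 100_000, 1_000_000):
    print(N, adjusted_sample_size(n0, N))
```

As N grows, the adjusted size approaches n0, which is why population size can be ignored for very large populations.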


Sample Size

Population variability:

Measured as standard deviation; the larger the variability, the larger the sample size should be

Think of this: If every unit is identical, a sample of one would be sufficient to represent all of the units


Sample Size

Confidence level:

Confidence in the validity of the results of an analysis on the sample

It is 1 - alpha level. We will talk more about alpha level later in the course

Bottom line: The more confidence desired, the larger the sample should be


Sample Size

Accuracy desired:

Measured by the standard error

A trade off between confidence level and accuracy


Sample Size

Confidence-Accuracy Trade Off

Confidence Level    Accuracy as Shown by Confidence Level (in standard errors)
99%                 ±2.58
95%                 ±1.96
90%                 ±1.65
50%                 ±.68

Table: Table 5.5, pg. 155
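The multipliers for the 99%, 95%, and 90% rows can be recovered from the standard normal distribution. The sketch below uses Python's `statistics.NormalDist`; the helper name `z_for_confidence` is hypothetical.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Return the two-sided z-score for a confidence level such as 0.95."""
    # Split the remaining 1 - level equally between the two tails.
    return NormalDist().inv_cdf((1 + level) / 2)

for level in (0.99, 0.95, 0.90):
    print(f"{level:.0%}: ±{z_for_confidence(level):.3f}")
```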


Sample Size Formulas

General Formula:

\[ \sqrt{n} = \frac{\text{Standard Deviation of Population} \times \text{Confidence level}}{\text{Accuracy desired (in standard error terms)}} \tag{1} \]

√n = square root of sample size

Population size is ignored if it is relatively large (like the population of a nation)


Sample Size Formulas

What this Formula Means:

1. As variation in population ⇑, sample size ⇑

2. As desired confidence level ⇑, sample size ⇑

3. As desired level of accuracy ⇑, sample size ⇑

4. As tolerable level of error ⇑, sample size ⇓


Sample Size Formulas

Proportions (Dichotomous Variables):

\[ n = \frac{Z^2 \times p(1-p)}{d^2} \tag{2} \]

Z is the z-score for confidence level (e.g., 1.96 for 95%)

d is the desired accuracy (e.g., ±4%), i.e., margin of error

p is the expected population proportion; if it (and hence the variability of the population) is unknown, use 50% (0.5), which assumes the largest possible variability
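As a worked example of the proportions formula, the sketch below plugs in the values mentioned above (Z = 1.96, d = 0.04, and 0.5 for the unknown proportion). The function name is hypothetical, and rounding up to a whole unit is an assumption.

```python
import math

def sample_size_for_proportion(z, p, d):
    """n = Z^2 * p(1 - p) / d^2, rounded up to a whole unit."""
    return math.ceil(z**2 * p * (1 - p) / d**2)

# 95% confidence, +/-4% accuracy, unknown proportion (use 0.5).
print(sample_size_for_proportion(1.96, 0.5, 0.04))  # 601
```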


Sample Size Formulas

Means (Interval or Ratio Variables):

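\[ n = \frac{\sigma^2 \times Z^2}{d^2} \tag{3} \]

σ is the population standard deviation (σ² is the population variance); either assumed, estimated from sample data, or taken from previous knowledge

Z is the z-score for confidence level (e.g., 1.96 for 95%)

d is the desired accuracy (e.g., ±4%), i.e., margin of error, expressed in the same units as σ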
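A worked example of the means formula, using purely hypothetical inputs: a standard deviation of 15, a desired accuracy of ±2 in the same units, and 95% confidence. The function name and the rounding up are assumptions.

```python
import math

def sample_size_for_mean(sigma, z, d):
    """n = sigma^2 * Z^2 / d^2, rounded up to a whole unit."""
    return math.ceil(sigma**2 * z**2 / d**2)

# Hypothetical inputs: sigma = 15, 95% confidence, accuracy of +/-2 units.
print(sample_size_for_mean(15, 1.96, 2))  # 217
```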


Sample Size Formulas

How to find n without using a formula

See the sample sizes for various degrees of accuracy and confidence levels (for small populations): Table 5.6, p. 158

See the sample sizes for various degrees of accuracy and confidence levels (for large populations): Table 5.7, p. 159


Where Do the Formulas Come From?

How many different samples can be drawn from the same population?

Figure: Musu-Gillette, Lauren 2016


Where Do the Formulas Come From?

How many different samples can be drawn from the same population?

\[ \frac{n!}{r!\,(n-r)!} = \binom{n}{r} \tag{4} \]

n is the size of the population, r is the sample size

Example: For a sample of size 3 from a population of 10, the formula would be:

\[ \frac{10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{3!\,(10-3)!} = 120 \tag{5} \]
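The count in the example can be checked with Python's standard library, both with the built-in binomial coefficient and by spelling out the factorials. The helper name `number_of_samples` is hypothetical.

```python
import math

def number_of_samples(population_size, sample_size):
    """Count the distinct samples: n! / (r! * (n - r)!)."""
    n, r = population_size, sample_size
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(number_of_samples(10, 3))  # 120
print(math.comb(10, 3))          # 120, the same count via the built-in
```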


Where Do the Formulas Come From?

A sampling distribution is the distribution of the sample statistic we are interested in (e.g., the mean, or the percentage of voters for a candidate) in all possible samples. See Figure 5.6, p. 154.
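For a very small population, the sampling distribution can be listed exhaustively. The sketch below uses a hypothetical population of five values and enumerates every possible sample of size 2.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of five values.
population = [2, 4, 6, 8, 10]

# The mean of every possible sample of size 2 (without replacement).
sample_means = [mean(sample) for sample in combinations(population, 2)]

print(sorted(sample_means))   # the sampling distribution of the mean
print(mean(sample_means))     # center of the sampling distribution
print(mean(population))       # population mean (the parameter)
```

The sample means vary from sample to sample, but their average equals the population mean.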


Where Do the Formulas Come From?

The sampling distributions for particular populations can be plotted and their measures can be calculated

The normal distribution is the most common shape for a sampling distribution

The normal distribution is not the only one, but it is the most basic


Where Do the Formulas Come From?

Figure: https://www.mathsisfun.com/data/standard-normal-distribution.html


Where Do the Formulas Come From?

The proportions of the area under the normal curve are fixed

If you move one or two standard deviations (Z units, Z scores) away from the mean, the area under the curve will always be the same percentage when a distribution is normal
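These fixed proportions can be computed directly from the standard normal distribution. A minimal sketch using `statistics.NormalDist`; the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

def area_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 1.96):
    print(f"within ±{k} SD: {area_within(k):.4f}")
```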


Where Do the Formulas Come From?

Normal Distribution (cont.)

This is a key characteristic of the normal distribution that helps us make sampling estimations

Also the basis of statistical significance tests (e.g., the t-test), which we will discuss later

These areas under the curve can be used to calculate standardized scores (recall the measurement section):

\[ Z\text{-score} = \frac{\text{Score} - \text{Mean}}{\text{Standard Deviation}} \tag{6} \]
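A one-line illustration of the standardized score, with hypothetical numbers (a score of 85 in a distribution with mean 70 and standard deviation 10):

```python
def z_score(score, mean, standard_deviation):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / standard_deviation

# Hypothetical values: score 85, mean 70, standard deviation 10.
print(z_score(85, 70, 10))  # 1.5
```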


Where Do the Formulas Come From?

Central Limit Theorem

If the population is normally distributed, its sampling distribution will also be normal

If the sample is large, the sampling distribution of the mean will be approximately normal even when the population is not normally distributed

One test we will discuss (t-test) uses a modified version of the normal curve for its sampling distribution

Other tests have their own specific sampling distributions

Calculations of sampling error and confidence intervals are based on the idea of a normal sampling distribution
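The theorem can be illustrated with a small simulation. The sketch below assumes a strongly skewed (exponential) population with mean 1 and standard deviation 1; the function name, sample size, and number of samples are hypothetical choices.

```python
import random
from statistics import fmean, stdev

def simulate_sample_means(n, n_samples, seed=0):
    """Means of repeated samples of size n from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(n_samples)]

means = simulate_sample_means(n=50, n_samples=2000)
# The sample means cluster around the population mean (1.0), and their spread
# is close to sigma / sqrt(n) = 1 / sqrt(50), even though the population is skewed.
print(fmean(means), stdev(means), 1 / 50**0.5)
```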


Where Do the Formulas Come From?

Standard Error:

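The standard deviation of the sampling distribution (the last factor is the finite population correction factor (fpc), used for small populations)

\[ \text{(Proportions)} \quad SE_p = \sqrt{\frac{p(1-p)}{n}} \times \sqrt{\frac{N-n}{N-1}} \tag{7} \]

\[ \text{(Means)} \quad SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}} \tag{8} \]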
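The two standard errors can be sketched in Python. This assumes the correction factor enters as the square root of (N - n) / (N - 1), the standard form of the finite population correction; the function names are hypothetical, and the correction is skipped when no population size N is supplied.

```python
import math

def fpc(N, n):
    """Finite population correction for population size N and sample size n."""
    return math.sqrt((N - n) / (N - 1))

def se_proportion(p, n, N=None):
    """Standard error of a sample proportion, with optional correction."""
    se = math.sqrt(p * (1 - p) / n)
    return se if N is None else se * fpc(N, n)

def se_mean(sigma, n, N=None):
    """Standard error of a sample mean, with optional correction."""
    se = sigma / math.sqrt(n)
    return se if N is None else se * fpc(N, n)

# Hypothetical inputs: n = 100 drawn from a large population and from N = 1,000.
print(se_proportion(0.5, 100), se_proportion(0.5, 100, N=1000))
print(se_mean(10, 100), se_mean(10, 100, N=1000))
```

The correction shrinks the standard error when the sample is a sizable share of the population and becomes negligible as N grows.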


Where Do the Formulas Come From?

Confidence Interval:

Confidence level = 1− alpha level

If alpha is 0.05, then the confidence level will be .95 (95%)

A confidence interval is calculated by using the confidence level

If CL is .95 (95%) and we assume a normal distribution:

\[ \text{Lower limit (bound)} = \text{Sample Mean} - 1.96 \times SE_{\bar{x}} \]

\[ \text{Upper limit (bound)} = \text{Sample Mean} + 1.96 \times SE_{\bar{x}} \]

A 95% CI means that the confidence intervals produced by 95% of samples would contain the population parameter
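A worked sketch of the 95% interval, reusing the earlier motor-vehicle example (sample mean of 20,000 miles, n = 300). The standard deviation of 5,000 miles is a hypothetical value, since the slides do not give one, and the function name is also hypothetical.

```python
import math

def confidence_interval(sample_mean, sigma, n, z=1.96):
    """Lower and upper bounds: sample mean -/+ z * standard error."""
    standard_error = sigma / math.sqrt(n)
    return sample_mean - z * standard_error, sample_mean + z * standard_error

# Sample mean and n from the motor-vehicle example; sigma is hypothetical.
lower, upper = confidence_interval(20_000, 5_000, 300)
print(round(lower), round(upper))
```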


An SPSS Example


Questions?

Figure: Q&A by Libby Levi, CC BY-SA 2.0
