SOCI5013 Advanced Social Research Probability Sampling Song Yang Spring 2007.

Date post: 26-Dec-2015
Upload: edith-dawson
Transcript
Page 1: SOCI5013 Advanced Social Research Probability Sampling Song Yang Spring 2007.

SOCI5013 Advanced Social Research

Probability Sampling

Song Yang

Spring 2007

Page 2:

The Theory and Logic of Probability Sampling

Nonprobability sampling cannot guarantee a representative sample of the entire population, which is why large-scale surveys use probability sampling methods.

If all members of a population were identical in all respects, a single case would suffice as a sample of the whole population. This never happens, because human beings vary in a great many characteristics.

Page 3:

Probability Sampling

Sampling bias means that those selected are not typical or representative of the larger population from which they have been chosen. Researchers may unconsciously introduce sampling bias by choosing the respondents closest to them.

Page 4:

Techniques to Avoid Bias

A sample is representative of the population from which it is selected if the aggregate characteristics of the sample closely match those same aggregate characteristics in the population.

A basic principle of probability sampling is that a sample will be representative of the population from which it is selected if all members of the population have an equal chance of being selected in the sample. A design with this property is commonly called EPSEM (Equal Probability of Selection Method).

Page 5:

Advantages of Probability Samples

1) Probability samples, although never perfectly representative, are more representative than other types of samples, such as nonprobability samples, because selection bias is avoided.

2) Probability theory permits an estimate of the representativeness of the sample. In other words, the probability sampler can provide an accurate estimate of the sample's success or failure in representing the population.

Page 6:

Elements and Population

Elements are the units about which information is collected and that provide the basis of analysis. In social research the elements are most often individuals; sometimes they are families, social clubs, corporations, or nations.

A population is the theoretically specified aggregation of the elements in a study, for example current U.S. citizens or college students.

A study population is the aggregation of elements from which the sample is actually selected. For practical purposes, a polling firm may exclude Alaska and Hawaii from a national sample.

Page 7:

Random Selection

The purpose of sampling: to select a set of elements from a population in such a way that descriptions of those elements accurately describe the total population from which the elements are selected.

Random selection, in which each element has an equal chance of being selected, independent of any other event in the selection process, is the key to accomplishing this purpose.

Page 8:

Flipping Coins

A classic illustration of random selection is flipping a coin. On each flip the chance of getting heads or tails is 50%, irrespective of all previous results.

Sampling distribution of ten cases

The conclusion: every increase in sample size improves the distribution of estimates of the mean, clustering the sample estimates more tightly around the population mean.
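The conclusion about sample size can be checked by brute force on a very small population. The sketch below is only an illustration: it assumes a hypothetical population of ten cases with the values 0 through 9 (the slide does not give the actual values), enumerates every possible sample of a given size drawn without replacement, and reports how widely the sample means spread around the population mean. The class and method names are made up for this example.

```java
class SamplingDistributionSketch {
    // Standard deviation of the sample means over every possible sample
    // of the given size drawn without replacement from the population.
    static double spreadOfMeans(int[] population, int sampleSize) {
        int n = population.length;
        double populationMean = 0.0;
        for (int value : population) {
            populationMean += value;
        }
        populationMean /= n;

        double sumOfSquares = 0.0;
        int samples = 0;
        // Each bit mask with exactly sampleSize bits set is one possible sample.
        for (int mask = 0; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) != sampleSize) {
                continue;
            }
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    sum += population[i];
                }
            }
            double deviation = sum / sampleSize - populationMean;
            sumOfSquares += deviation * deviation;
            samples++;
        }
        return Math.sqrt(sumOfSquares / samples);
    }

    public static void main(String[] args) {
        // Hypothetical ten-case population (assumed values, not from the slides).
        int[] population = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        for (int size : new int[] {1, 2, 5, 10}) {
            System.out.printf("sample size %2d: spread of sample means = %.3f%n",
                    size, spreadOfMeans(population, size));
        }
    }
}
```

The spread shrinks as the sample size grows and reaches zero when the sample is the whole population, which is the pattern the slide describes.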

Page 9:

Sampling Error

Sampling error: the degree of error to be expected for a given sample design.

$$ s = \sqrt{\frac{P \times Q}{N}} $$

s: standard error (the standard deviation of the sampling distribution)

P: percentage of cases equal to 1 on a binary variable

Q: percentage of cases equal to 0 on a binary variable (Q = 100 - P)

N: number of cases in each sample
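A short numeric sketch may make the definitions concrete. It assumes the formula intended on this slide is the usual standard error of a percentage, s = sqrt(P * Q / N), with P and Q expressed in percentage points; the values P = 50 and N = 100 or 400 are hypothetical, and the class and method names are made up for illustration.

```java
class StandardErrorSketch {
    // Standard error of a percentage, assuming s = sqrt(P * Q / N),
    // where P is in percentage points and Q = 100 - P.
    static double standardError(double p, double n) {
        double q = 100.0 - p;
        return Math.sqrt(p * q / n);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: an even 50/50 split, samples of 100 and 400 cases.
        System.out.printf("P=50, N=100: s = %.2f percentage points%n", standardError(50, 100));
        System.out.printf("P=50, N=400: s = %.2f percentage points%n", standardError(50, 400));
    }
}
```

Under these assumptions, quadrupling the number of cases halves the standard error, from 5 to 2.5 percentage points.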

Page 10:

Populations and Sampling Frames

A sampling frame is the list or quasi-list of elements from which a probability sample is selected. Examples:

A random sample of parents of children in the third grade in public schools in Yakima County, Washington.

A sample of 160 individuals was drawn randomly from the telephone directory of Fayetteville, Arkansas.

Page 11:

A Problem

Properly drawn samples provide information appropriate for describing the population of elements that compose the sampling frame.

Very often, however, researchers select samples from a given sampling frame and then make assertions about a population that is similar, but not identical, to the population defined by the sampling frame.

Page 12:

The Sequence

The sampling frame is a list of the elements composing the study population.

In practice, the existing frame defines the study population, rather than the other way around:

1) Have a population in mind.

2) Search for an available sampling frame.

3) Redefine your population to accommodate your sampling frame.

Page 13:

Elements

You can make use of lists of registered voters, automobile owners, and taxpayers, as well as telephone directories.

Telephone directories have many defects in representing the entire population of a region. First, they carry a social class bias: poor people may have no phone line, while rich people may have several. Second, many people choose not to have their names listed.

Page 14:

Principles

Findings based on a sample can be taken as representing only the aggregation of elements that compose the sampling frame.

Omissions are inevitable. You need to assess the empirical results correctly and not overgeneralize your findings.

Each element in a sample appears only once.

Page 15:

Types of Sampling Design

Simple random sampling: once you have a sampling frame, assign a unique number to each element in the frame, and use a random number generator to select cases.

    public class random {
        public static void main(String[] args) {
            // Print ten random numbers between 0 (inclusive) and 10 (exclusive).
            for (int i = 0; i < 10; i++) {
                System.out.println(Math.random() * 10);
            }
        }
    }
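Random numbers drawn one at a time can repeat, so one way to turn a numbered frame into a sample of distinct cases is to shuffle the numbers and keep the first few. This is a minimal sketch, assuming a hypothetical frame of 100 elements numbered 1 to 100 and a sample of 10; the class name, method name, and seed are illustrative and do not come from the slides.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SimpleRandomSampleSketch {
    // Draw sampleSize distinct element numbers from a frame numbered 1..frameSize.
    static List<Integer> draw(int frameSize, int sampleSize, long seed) {
        List<Integer> frame = new ArrayList<>();
        for (int id = 1; id <= frameSize; id++) {
            frame.add(id);
        }
        // After a uniform shuffle, the first sampleSize entries form a simple
        // random sample: every element has the same chance of being included.
        Collections.shuffle(frame, new Random(seed));
        return new ArrayList<>(frame.subList(0, sampleSize));
    }

    public static void main(String[] args) {
        // Hypothetical frame of 100 elements, sample of 10, arbitrary seed.
        System.out.println(draw(100, 10, 2007L));
    }
}
```

Fixing the seed makes the draw reproducible, which can help when a sample has to be documented; a production survey would choose its own source of randomness.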

Page 16:

Types of Sampling Design

Systematic sampling: every Kth element in the entire list goes into the sample. Sampling interval = population size / sample size; sampling ratio = sample size / population size.

This is a very bad choice if the sampling interval coincides with a periodic pattern in the list. For example, suppose you sample every 10th case in an army roster, but the roster is arranged by rank so that sergeants always occupy the 1st, 11th, 21st positions, and so on. Your sample will then consist either entirely of sergeants or of no sergeants at all.
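The roster problem can be reproduced in a few lines. The sketch below assumes a hypothetical roster of 100 positions in which a sergeant occupies positions 1, 11, 21, and so on, and a sample of 10, so the sampling interval is 100 / 10 = 10. The random start between 1 and K is a common convention and is an assumption here, as are all names in the code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SystematicSampleSketch {
    // Select every interval-th position (1-based), beginning at the given start.
    static List<Integer> select(int populationSize, int interval, int start) {
        List<Integer> positions = new ArrayList<>();
        for (int pos = start; pos <= populationSize; pos += interval) {
            positions.add(pos);
        }
        return positions;
    }

    // Hypothetical roster layout: a sergeant sits at positions 1, 11, 21, ...
    static boolean isSergeant(int position) {
        return position % 10 == 1;
    }

    public static void main(String[] args) {
        int populationSize = 100;
        int sampleSize = 10;
        int interval = populationSize / sampleSize;      // sampling interval K = 10
        int start = new Random().nextInt(interval) + 1;  // random start in 1..K
        List<Integer> sample = select(populationSize, interval, start);
        long sergeants = sample.stream().filter(SystematicSampleSketch::isSergeant).count();
        System.out.println("start=" + start + " sample=" + sample + " sergeants=" + sergeants);
    }
}
```

With a start of 1 every selected position is a sergeant; with any other start none is. Checking the list for this kind of periodicity before choosing the interval avoids the problem.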

Page 17:

Stratified Sampling

Stratified sampling first organizes the population into homogeneous subsets (with heterogeneity between subsets) and then selects the appropriate number of elements from each.

The goal of stratified sampling is to reduce sampling error by creating homogeneous subpopulations from which the samples are selected.

Page 18:

Stratified Sampling

One way to produce homogeneous subpopulations in a study of college students is to group the students by age cohort, so that each subpopulation consists of people of the same age, and then randomly select cases from each age cohort.

Depending on your research focus, you may stratify the population on different variables, such as sex, occupation, education, race, social class, or income.
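One reading of "the appropriate number of elements from each" is proportionate allocation, in which each stratum contributes to the sample in proportion to its share of the population. The slides do not specify the allocation rule, so this is an assumption, and the stratum names and sizes below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StratifiedAllocationSketch {
    // Proportionate allocation: each stratum receives
    // sampleSize * (stratum size / population size), rounded to a whole case.
    static Map<String, Integer> allocate(Map<String, Integer> strataSizes, int sampleSize) {
        int population = 0;
        for (int size : strataSizes.values()) {
            population += size;
        }
        Map<String, Integer> allocation = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> stratum : strataSizes.entrySet()) {
            double share = (double) stratum.getValue() / population;
            allocation.put(stratum.getKey(), (int) Math.round(sampleSize * share));
        }
        return allocation;
    }

    public static void main(String[] args) {
        // Hypothetical strata (college class) with assumed enrollment counts.
        Map<String, Integer> strata = new LinkedHashMap<>();
        strata.put("freshman", 4000);
        strata.put("sophomore", 3000);
        strata.put("junior", 2000);
        strata.put("senior", 1000);
        System.out.println(allocate(strata, 100));
    }
}
```

Within each stratum the allocated number of cases would then be drawn at random. With less convenient sizes, rounding can make the allocations sum to slightly more or less than the target sample size, so a real design needs a rule for the remainder.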

Page 19:

Implicit Stratification

Some lists are implicitly stratified. For example, a university may use students' SSNs to order a roster for the entire university, so the roster is roughly stratified by geographic location. In such cases, systematic sampling takes advantage of the ordering: moving through the list at a fixed interval draws cases from each geographically homogeneous segment of the roster.

Page 20:

An example

Studying students at the University of Hawaii.

The sampling frame is the computerized student file containing each student's ID, gender, name, address, SSN, major, age, and class.

Redefine the study population as day-program, degree-seeking students in the fall semester on the Manoa campus, including all departments, all levels, and all nationalities.

Stratify the population by college class into subpopulations.

Page 21:

An example

Set the sample size at 1,100 and the sampling ratio at 1/14. A random number generator produces a number from 1 to 14, and the student at that position in every block of 14 students is selected into the sample.

Owing to a budget cut, the sample size had to come down to 733. Systematic sampling with a random start reduces the sample to 733.
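The figures on this slide can be checked with a little arithmetic. The size of the study population is not stated; the value below is only what a 1/14 ratio and a sample of 1,100 would imply.

$$ N \approx 1{,}100 \times 14 = 15{,}400, \qquad \frac{733}{1{,}100} \approx 0.67 $$

The budget cut therefore keeps roughly two of every three originally selected students. That is consistent with dropping about every third case from the original sample, although the slide does not say how the second interval was chosen.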

Page 22:

Multistage Cluster Sampling

Multistage cluster sampling first samples groups of elements (clusters) and then selects elements within each of the selected clusters.

Bian (1994) used multistage cluster sampling in his study of work and inequality in urban China. He randomly selected 2 of the 6 districts in Tianjin, China.

Page 23:

Bian's Research

Each district has more than 100 street blocks, and each street block has a complete list of the households living in it. Bian randomly selected 10 street blocks within each district and 50 households within each street block. His sample therefore contains 50 * 10 * 2 = 1,000 individuals, because he interviewed one individual in each selected household.

Bian, Yanjie. 1994. Work and Inequality in Urban China. Albany, NY: State University of New York Press.

Page 24:

Increasing Sampling Error

A multistage design has the defect of increasing sampling error, which is a function of the number of stages. In the Bian example, the researcher incurs one sampling error when randomly selecting districts, another when selecting street blocks, and one more when selecting individual households.

Moreover, for a given sample size (usually fixed by budget constraints), the number of clusters trades off against the number of elements within each cluster.

Page 25:

Solutions

The first solution is to increase the number of clusters and decrease the number of elements selected within each cluster, for a given sample size. The reason is that each cluster consists of largely homogeneous elements, so a few elements are enough to represent a cluster, while more clusters capture more of the differences between clusters. This reduces the sampling error.

The second solution is to use stratification in the multistage sampling. For example, use geographic location as the stratifying variable to produce strata, within which you can randomly select churches.

Page 26:

Solution

The U.S. Census Bureau has standardized this practice by sampling 5 households per census block. If you need to study 2,000 households, you need to randomly select 400 blocks from the list.

Page 27:

Probability Proportionate to Size (PPS) Sampling

A more sophisticated sampling method, called PPS, ensures that every element has the same probability of being selected in multistage cluster sampling.

Suppose we want to sample 100 blocks from a city's 1,000 street blocks and then sample 1 household within each selected block. The probability of selecting a block is 10%, and the probability of selecting a household within its block is 1/N, where N is the number of households in that block. Thus the probability of being selected for the study is 1/(10*N) for each household. So what's the problem?

Page 28:

PPS

The problem is that, for every household to have the same probability of selection, this design requires each block to have the same number of households, which is usually not the case.
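To see the size of the effect, take two hypothetical blocks, one with 50 households and one with 200 (these block sizes are assumptions for illustration), under the design of 100 blocks sampled out of 1,000 and 1 household sampled per block:

$$ P(\text{household}) = \frac{100}{1{,}000} \times \frac{1}{N} = \frac{1}{10N} $$

$$ N = 50:\ \frac{1}{500} = 0.2\%, \qquad N = 200:\ \frac{1}{2{,}000} = 0.05\% $$

A household in the smaller block is four times as likely to be selected as a household in the larger block.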

Page 29:

PPS

Suppose a city has ten residential blocks. Block A has 100 households, whereas each of the other blocks has 200 households.

Use multistage random sampling to select 5 households, with 1 block selected at random.

The probability of selection for a household in block A is 10% * 5% = 0.5%.

The probability of selection for a household in any other block is 10% * 2.5% = 0.25%.

Page 30:

Solution

PPS solves the unequal probability of selection associated with multistage cluster sampling by weighting the probability of selecting each cluster according to its size.

Page 31:

Solution

A city has two blocks: block A has 100 households and block B has 10 households. We make the probability of selecting block A 10 times that of selecting block B: if P(B) = 1%, then P(A) = 10%. Suppose we want to select 5 households from the chosen block. A household in block A then has a 5% chance of being selected within its block, whereas a household in block B has a 50% chance. Overall, however, a household in block A has a 5% * 10% = 0.5% chance of being selected, and a household in block B has the same chance: 50% * 1% = 0.5%.

Page 32:

Application

Suppose a city has three blocks: A (1,000 households), B (100 households), and C (10 households).

Suppose you can sample only one block and you want to study five households.

How would you implement PPS to ensure an equal probability of selection (EPSEM)?
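One possible answer, sketched under the assumption that the single block is drawn with probability equal to its share of all 1,110 households and that five households are then drawn at random within it. The class and method names are hypothetical, and the uniform number u stands in for whatever random number source the researcher uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

class PpsSketch {
    // Pick one block with probability proportionate to its size.
    // u is a uniform random number in [0, 1).
    static String selectBlock(Map<String, Integer> blocks, double u) {
        int total = 0;
        for (int size : blocks.values()) {
            total += size;
        }
        double target = u * total;
        int cumulative = 0;
        String last = null;
        for (Map.Entry<String, Integer> block : blocks.entrySet()) {
            cumulative += block.getValue();
            last = block.getKey();
            if (target < cumulative) {
                return last;
            }
        }
        return last; // fallback in case of rounding when u is very close to 1
    }

    // Overall chance that any one household in a block of blockSize is sampled:
    // P(block is selected) * P(household is selected within the block).
    static double householdProbability(int blockSize, int totalHouseholds, int perBlock) {
        double blockProbability = (double) blockSize / totalHouseholds;
        double withinBlock = (double) perBlock / blockSize;
        return blockProbability * withinBlock;
    }

    public static void main(String[] args) {
        Map<String, Integer> blocks = new LinkedHashMap<>();
        blocks.put("A", 1000);
        blocks.put("B", 100);
        blocks.put("C", 10);
        int total = 1110;
        int perBlock = 5;
        for (Map.Entry<String, Integer> block : blocks.entrySet()) {
            System.out.printf("Block %s: overall household probability = %.4f%%%n",
                    block.getKey(),
                    100 * householdProbability(block.getValue(), total, perBlock));
        }
        System.out.println("Selected block: " + selectBlock(blocks, new Random().nextDouble()));
    }
}
```

Under these assumptions the block size cancels out: every household has a chance of 5/1,110, about 0.45%, whichever block it lives in, because a larger block is more likely to be chosen but each of its households is less likely to be picked within it.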

