RESEARCH METHODS IN LINGUISTICS 1302740
Lecture (3)
Population Samples
Learn the reasons for sampling
Develop an understanding about different sampling methods
Distinguish between probability & non-probability sampling
Discuss the relative advantages & disadvantages of each sampling method
Population: the group whose linguistic behaviour you are ultimately interested in knowing more about; the “entire aggregation of cases that meets a designated set of criteria”. On the basis of a sample study we can predict and generalize the behaviour of mass phenomena.
A sample is “a smaller (but hopefully representative) collection of units from a population used to determine truths about that population” (Field, 2005)
Sample vs. Census
Census: an accounting of the complete population. A census study occurs if the entire population is very small or if it is reasonable to include the entire population (for other reasons).
It is called a census because data are gathered on every member of the population.
Why sample?
The population of interest is usually too large to attempt to survey all of its members.
Resources (time, money) and workload are limited.
So… a carefully chosen sample can be used to represent the population.
The sample reflects the characteristics of the population from which it is drawn.
A probability sample gives results with known accuracy that can be calculated mathematically.
If all members of a population were identical, the population would be considered homogeneous.
That is, the characteristics of any one individual in the population would be the same as the characteristics of any other individual (little or no variation among individuals).
So, if the human population on Earth were homogeneous in characteristics, how many people would an alien need to abduct in order to understand what humans were like?
When individual members of a population are different from each other, the population is considered to be heterogeneous (having significant variation among individuals).
How does this change an alien’s abduction scheme to find out more about humans?
In order to describe a heterogeneous population, observations of multiple individuals are needed to account for all possible characteristics that may exist.
Using data to say something (make an inference) with confidence about a whole (the population) based on the study of only a few (the sample).
[Figure: the population (what you want to talk about) is reached through a sampling frame and a sampling process to give the sample (what you actually observe in the data); inference runs from the sample back to the population.]
If a sample of a population is to provide useful (linguistic) information about that population, then the sample must contain essentially the same (linguistic) variation as the population.
The more heterogeneous a population is…
the greater the chance that a sample may not adequately describe the population, so we could be wrong in the inferences we make about it.
And…
the larger the sample needs to be to adequately describe the population, because we need more observations to make accurate inferences.
Sampling is the process of selecting observations (a sample) to provide an adequate description of, and robust inferences about, the population. The aim is a sample that is representative of the population.
The deviation between an estimate from an ideal sample and the true population value is the sampling error.
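Sampling error can be made concrete with a small simulation. This is a minimal sketch, assuming a hypothetical population of 1,000 speakers in which a binary feature (use of some variant) is coded 1 or 0; the true value is known here only because the data are synthetic. The helper names `sampling_error` and `mean_abs_error` are illustrative, not standard functions.

```python
import random

def sampling_error(population, n, rng):
    """Draw one simple random sample of size n and return (estimate, error):
    the sample proportion of 1s, and that proportion minus the true
    population proportion."""
    true_value = sum(population) / len(population)
    sample = rng.sample(population, n)
    estimate = sum(sample) / n
    return estimate, estimate - true_value

def mean_abs_error(population, n, trials=200, seed=0):
    """Average size of the sampling error over repeated samples of size n."""
    rng = random.Random(seed)
    return sum(abs(sampling_error(population, n, rng)[1])
               for _ in range(trials)) / trials

# Hypothetical population: 1,000 speakers, 300 of whom use the variant (coded 1).
population = [1] * 300 + [0] * 700

for n in (10, 100, 500):
    print(f"n = {n:3d}: mean |sampling error| = {mean_abs_error(population, n):.3f}")

# A census (n = N) has no sampling error at all.
print(sampling_error(population, len(population), random.Random(1))[1])  # → 0.0
```

The average error shrinks as n grows; the exact figures depend on the seed, so they are not quoted here.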
Almost always, the sampling frame does not match up perfectly with the target population, leading to errors of coverage.
Non-response is probably the most serious of these errors. It arises in three ways:
1. Inability of the person responding to come up with the answer
2. Refusal to answer
3. Inability to contact the sampled elements
These errors can be classified as due to the interviewer, respondent, instrument, or method of data collection.
Interviewers have a direct and dramatic effect on the way a person responds to a question. Most people tend to side with the view apparently favored by the interviewer, especially if they themselves are neutral.
Friendly interviewers are more successful.
In general, interviewers of the same gender, racial, and ethnic groups as those being interviewed are slightly more successful.
Respondents differ greatly in motivation to answer correctly and in ability to do so.
Obtaining an honest response to sensitive questions is difficult.
Basic errors:
Recall bias: simply does not remember
Prestige bias: exaggerates to ‘look’ better
Intentional deception: lying
Incorrect measurement: does not understand the units or definition
There are two types of sampling: non-probability sampling and probability sampling.
Probability samples: each member of the population has a known non-zero probability of being selected.
Methods include random sampling, systematic sampling, and stratified sampling.
Nonprobability samples: members are selected from the population in some nonrandom manner.
Methods include convenience sampling, judgment sampling, quota sampling, and snowball sampling.
Random sampling is the purest form of probability sampling.
Each member of the population has an equal and known chance of being selected. With very large populations it is often difficult to identify every member of the population, so the pool of available subjects can become biased. You can use software to generate random numbers or draw directly from the columns of a random number table.
Methods: the lottery method; random number tables.
1. Define the population.
2. Determine the percentage to be interviewed or studied.
3. Give each individual an equal chance of selection.
4. The random sample becomes representative of the larger whole.
[Figure: a random subsample drawn from the list of the population]
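The lottery method can be sketched in Python with `random.sample`, which draws without replacement and gives every member of the list the same chance of selection. The sampling frame of 500 speaker IDs, the 10% fraction, and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(population_list, fraction, seed=None):
    """Simple random sample drawn without replacement.

    population_list: the full list of population members (the sampling frame)
    fraction: the percentage to be studied, as a share (0.10 = 10%)
    seed: optional, makes the draw reproducible
    """
    n = round(len(population_list) * fraction)
    rng = random.Random(seed)
    # random.sample gives every member the same chance of selection
    return rng.sample(population_list, n)

# Hypothetical sampling frame: 500 speaker IDs, S001 ... S500.
speakers = [f"S{i:03d}" for i in range(1, 501)]
sample = simple_random_sample(speakers, 0.10, seed=7)
print(len(sample))   # → 50
print(sample[:5])
```

Note that the code needs the complete list up front, which is exactly the "need names of all population members" disadvantage.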
advantages…
• …easy to conduct
• …the strategy requires minimum knowledge of the population to be sampled
disadvantages…
• …needs the names of all population members
• …may over-represent or under-represent some groups in the population
• …there is difficulty in reaching all those selected for the sample
Systematic sampling is often used instead of random sampling. It is also called an Nth name selection technique.
After the required sample size has been calculated, every Nth record is selected from a list of population members.
As long as the list does not contain any hidden order, this sampling method is as good as the random sampling method.
Its only advantage over the random sampling technique is simplicity (and possibly cost effectiveness).
Procedure
1. Number the units in the population from 1 to N.
2. Decide on the sample size n that you want or need.
3. Compute the interval size k = N/n.
4. Randomly select a number from 1 to k.
5. Take every kth unit.
Worked example (population listed as units 1 to 100):
N (population) = 100
n (sample) = 20
N/n (interval) = 5
Select a random number from 1 to 5: 4 was chosen
Start with #4 and take every 5th unit: 4, 9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79, 84, 89, 94, 99
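The systematic procedure can be sketched as a short function, assuming the population list is already in the order it will be sampled from. The function name `systematic_sample` is hypothetical; it reproduces the worked example when given the start value 4.

```python
import random

def systematic_sample(units, n, start=None, seed=None):
    """Systematic (Nth name) sample: take every kth unit from the list.

    units: the population list, in list order (units 1..N)
    n: desired sample size
    start: the random start between 1 and k; drawn at random if omitted
    """
    N = len(units)
    k = N // n                   # interval size k = N/n (assumes N >= n)
    if start is None:
        start = random.Random(seed).randint(1, k)
    # Unit number `start` sits at list index start - 1; step through by k.
    # If N is not a multiple of n, the slice can yield more than n units;
    # keeping the first n is a simplification.
    return units[start - 1::k][:n]

units = list(range(1, 101))      # N = 100, as in the worked example
print(systematic_sample(units, 20, start=4))   # starts 4, 9, 14, ...
```

Only one random number is drawn (the start); everything after it is fixed by the interval, which is why a periodic pattern in the list can distort the sample.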
advantages…
• …sample selection is simple
• …may be more precise than a simple random sample
disadvantages…
• …not every possible sample has an equal chance of being selected: once the random start is chosen, the rest of the sample is fixed
• …the kth person may coincide with a periodic order in the population list, making the sample unrepresentative
Stratified sampling is a commonly used probability method that is often superior to simple random sampling because it reduces sampling error.
Sometimes called "proportional" or "quota" random sampling.
A stratum is a subset of the population whose members share at least one common characteristic, e.g. males or females. Identify the relevant strata and their actual representation in the population. Random sampling is then used to select a sufficient number of subjects from each stratum. Stratified sampling is often used when one or more of the strata in the population have a low incidence relative to the other strata.
• Objective: a population of N units is divided into non-overlapping strata of sizes N1, N2, N3, ..., Ni such that N1 + N2 + ... + Ni = N; then draw a simple random sample with the same sampling fraction n/N in each stratum.
• To ensure representation of each stratum, oversample smaller population groups.
• Sampling problems may differ in each stratum.
• Precision increases (variance is lower) if strata are homogeneous within.
[Figure: a list of clients divided into strata (African-American, Hispanic-American, Others), with a random subsample of n/N drawn from each stratum]
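A minimal sketch of proportional stratified sampling, applying the same sampling fraction n/N inside every stratum. The client list, the stratum labels "A", "B", "C", and their sizes are hypothetical. Rounding each stratum's share can make the total differ slightly from n; a real design might adjust for this or deliberately oversample small strata.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, seed=None):
    """Proportional stratified random sample.

    units: list of population members
    stratum_of: function mapping a member to its stratum label
    n: desired overall sample size
    Each stratum of size N_h contributes round(n * N_h / N) members,
    drawn by simple random sampling within that stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    N = len(units)
    sample = {}
    for label, members in strata.items():
        n_h = round(n * len(members) / N)
        sample[label] = rng.sample(members, n_h)
    return sample

# Hypothetical list of 1,000 clients as (id, stratum) pairs.
clients = ([(i, "A") for i in range(600)] +
           [(i, "B") for i in range(600, 900)] +
           [(i, "C") for i in range(900, 1000)])
result = stratified_sample(clients, lambda c: c[1], n=100, seed=1)
print({label: len(s) for label, s in result.items()})  # → {'A': 60, 'B': 30, 'C': 10}
```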
advantages…
• …more precise sample
• …can be used for both proportions and stratification sampling
• …sample represents the desired strata
disadvantages…
• …needs the names of all population members
• …there is difficulty in reaching all those selected for the sample
Nonprobability Samples: “Members are selected from the population in some nonrandom manner” (Barreiro, 2009) Methods include 1. convenience sampling 2. judgment sampling 3. quota sampling 4. snowball sampling
Convenience sampling is used in exploratory research where the researcher is interested in getting an inexpensive approximation.
Subjects are selected because they are convenient (to the researcher).
It is a nonprobability method, often used during preliminary research (pilot studies) to get an estimate without incurring the cost or time required to select a random sample.
Exploratory research; inexpensive approximation.
Example: a preliminary study to estimate the number of L1, L2, …, Ln speakers at a university.
It saves time and money: subjects are selected because they are willing and available.
Convenience samples: samples drawn at the convenience of the interviewer. People tend to make the selection at familiar locations and to choose respondents who are like themselves.
Error occurs in the form of members of the population who are:
1) infrequent users or non-users of that location
2) not typical of the population
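The first kind of error can be illustrated with synthetic data. This sketch assumes a hypothetical population of 1,000 students in which regular visitors to the interviewer's location use a linguistic feature far more often than everyone else; all counts are invented for illustration. The convenience sample can only reach the regulars, so its estimate lands far from the population value, while a random sample of the same size usually does not.

```python
import random

def proportion(people):
    """Share of people who use the linguistic feature of interest."""
    return sum(p["feature"] for p in people) / len(people)

# Hypothetical population of 1,000 students. 200 are regulars at the spot
# where the interviewer stands (e.g. a campus cafe); 800 rarely or never go.
# Among regulars 160 use the feature; among the others 240 do.
regulars = [{"at_location": True, "feature": int(i < 160)} for i in range(200)]
others = [{"at_location": False, "feature": int(i < 240)} for i in range(800)]
population = regulars + others

rng = random.Random(0)
true_value = proportion(population)                       # 400 / 1000 = 0.4
available = [p for p in population if p["at_location"]]   # who the interviewer meets
convenience = rng.sample(available, 50)                   # convenience sample
random_sample = rng.sample(population, 50)                # probability sample

print("population:", true_value)
print("convenience sample:", proportion(convenience))
print("random sample:", proportion(random_sample))
```

A larger convenience sample would not help: it would only estimate the regulars' rate (0.8) more precisely.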
disadvantages…
• …difficulty in determining how much of the effect (dependent variable) results from the cause (independent variable)
advantages…
• …useful in pilot studies
Judgment (Purposive) sampling is a common nonprobability method.
The sample is selected based on the researcher's judgment.
It is an extension of convenience sampling.
The researcher's knowledge is used to hand-pick the cases to be included in the sample.
When using this method, the researcher must be confident that the chosen sample is truly representative of the entire population.
The selection rests on subjective judgment.
“The person who is selecting the sample is who tries to make the sample representative, depending on his opinion or purpose, thus being the representation subject” (Barreiro, 2009)
disadvantages…
• …potential for inaccuracy in the researcher’s criteria and resulting sample selections
• …personal prejudice & bias
• …no objective way of evaluating the reliability of results
advantages…
• …useful with a small number of sampling units
• …useful for studying unknown traits (case sampling)
Quota sampling is the nonprobability equivalent of stratified sampling.
First identify the strata and their proportions as they are represented in the population.
Then convenience or judgment sampling is used to select the required number of subjects from each stratum.
Convenience or judgment sampling is used to fill quotas from specific subgroups of a population. Example: the interviewer is instructed to interview 50 males between the ages of 18 and 25.
Useful when:
• time is limited
• money is restricted
• detailed accuracy is not important
disadvantages…
• …people who are less accessible (more difficult to contact, more reluctant to participate) are under-represented
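Quota filling can be sketched as accepting people in the order the interviewer happens to meet them, skipping anyone whose subgroup quota is already full. The function name `fill_quotas`, the quotas, and the stream of passers-by are hypothetical; the 50-males example above would correspond to `quotas = {"male 18-25": 50}`. The sketch also shows the disadvantage just listed: a subgroup that rarely passes by leaves its quota unfilled.

```python
def fill_quotas(arrivals, group_of, quotas):
    """Quota sampling sketch: accept people in the order the interviewer
    happens to meet them until every subgroup's quota is filled.

    arrivals: people in order of availability (a convenience order)
    group_of: function returning a person's subgroup
    quotas: dict subgroup -> number of interviews required
    Returns the sample and the quotas still unfilled.
    """
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if all(left == 0 for left in remaining.values()):
            break
    return sample, remaining

# Hypothetical quotas and a hypothetical stream of passers-by.
quotas = {"male 18-25": 2, "female 18-25": 2}
arrivals = ["male 18-25", "male 18-25", "male 18-25", "female 18-25"]
sample, unfilled = fill_quotas(arrivals, lambda person: person, quotas)
print(sample)
print(unfilled)   # the female quota is still one short
```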
Snowball sampling is a special nonprobability method used when the desired sample characteristic is rare.
It may be extremely difficult or cost prohibitive to locate respondents in these situations.
This technique relies on referrals from initial subjects to generate additional subjects (friend-of-friend).
It lowers search costs; however, it introduces bias because the technique itself reduces the likelihood that the sample will represent a good cross section from the population.
disadvantages…
• …not representative of the population; it results in a biased sample because it is self-selecting
advantages…
• …access to difficult-to-reach populations (other methods may not yield any results)
• …convenient
• …economical
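The referral chain can be modelled as a breadth-first walk through a network of who-refers-whom. This is a minimal sketch; the function name `snowball_sample` and the referral network are hypothetical. People outside the seeds' social network (X and Y below) can never be reached, which is the source of the bias noted above.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Snowball sampling sketch: start from initial subjects and follow
    their referrals (friend-of-friend) until max_size people are recruited.

    seeds: initial subjects the researcher can reach directly
    referrals: dict person -> people that person refers
    """
    recruited, seen = [], set()
    queue = deque(seeds)
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        if person in seen:        # already recruited via another referral
            continue
        seen.add(person)
        recruited.append(person)
        queue.extend(referrals.get(person, []))
    return recruited

# Hypothetical referral network. X and Y belong to the target group too,
# but nobody in the seed's network knows them.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"], "X": ["Y"]}
print(snowball_sample(["A"], referrals, max_size=10))  # → ['A', 'B', 'C', 'D', 'E', 'F']
```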
Nonprobability samples are rarely representative of the researcher's target population: not every element in the population has a chance of being included in the sample.
We must be cautious about inferences and conclusions drawn from the data.
The more heterogeneous a population is, the larger the sample needs to be.
It depends on the topic: how frequently does the feature of interest occur?
For probability sampling, the larger the sample size, the better.
With nonprobability samples, results are not generalizable regardless of sample size, but still consider the stability of the results.
Usually only about 20–30% of recipients return a questionnaire.
Follow-up techniques could bring this up to about 50%.
Still, response rates under 60–70% challenge the integrity of the random sample.
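The response rates above imply a simple planning calculation: how many questionnaires must go out to expect a given number back. The target of 200 completed questionnaires and the helper name `questionnaires_needed` are hypothetical.

```python
import math

def questionnaires_needed(target_responses, response_rate):
    """How many questionnaires to send out to expect `target_responses`
    returns, if the given response rate holds."""
    return math.ceil(target_responses / response_rate)

# Hypothetical target of 200 completed questionnaires:
print(questionnaires_needed(200, 0.25))   # no follow-up, ~25% return → 800
print(questionnaires_needed(200, 0.50))   # with follow-up, ~50% return → 400
```

Reaching the target number does not remove non-response bias: the people who do not return the questionnaire may differ systematically from those who do.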
How the survey is distributed can affect the quality of sampling
Sample size depends on:
• How much sampling error can be tolerated (levels of precision)
• Size of the population (sample size matters with small populations)
• Variation within the population with respect to the characteristic of interest (what you are investigating)
• The smallest subgroup within the sample for which estimates are needed: the sample needs to be big enough to properly estimate the smallest subgroup
Rule of thumb: “the larger the sample size, the more closely your sample data will match that from the population” (Birchall, 2009)
Key factors to consider:
• How accurate you wish to be
• How confident you are in the results
• What budget you have available
http://www.surveysystem.com/sscalc.htm
http://www.ezsurvey.com/samplesize.html
http://www.macorr.com/ss_calculator.htm
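Online calculators like those listed commonly use Cochran's formula for estimating a proportion, n0 = z² · p(1 − p) / e², with a finite population correction n = n0 / (1 + (n0 − 1) / N). Whether each linked site uses exactly this formula is an assumption, not something verified here. The sketch below ties the formula to the factors above: tolerated error (e), confidence (z), variation (p), and population size (N). The function name `sample_size` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size(margin_of_error, confidence=0.95, p=0.5, population_size=None):
    """Sample size for estimating a proportion (Cochran's formula).

    margin_of_error: tolerated sampling error, e.g. 0.05 for +/-5 points
    confidence: confidence level, e.g. 0.95
    p: expected proportion; 0.5 is the most conservative (largest n)
    population_size: if given, apply the finite population correction
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)

print(sample_size(0.05))                        # very large population → 385
print(sample_size(0.05, population_size=1000))  # → 278
print(sample_size(0.05, population_size=100))   # → 80
```

The last two lines show why "sample size matters with small populations": with N = 100, a ±5-point margin at 95% confidence already requires most of the population.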
List the research goals (usually some combination of accuracy, precision, and/or cost).
Identify potential sampling methods that might effectively achieve those goals.
Test the ability of each method to achieve each goal.
Choose the method that does the best job of achieving the goals.
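When a realistic model of the population is available, the "test each method" step can be done by simulation. A sketch under assumed conditions: a synthetic population of 1,000 test scores in two strata whose averages differ, with precision (the spread of estimates over repeated samples) as the goal being tested. Cost is not modelled, and all values and function names are hypothetical.

```python
import random
import statistics

# Hypothetical population of 1,000 test scores in two strata whose
# averages differ (e.g. two proficiency groups); values are synthetic.
stratum_1 = [10 + (i % 5) for i in range(500)]
stratum_2 = [20 + (i % 5) for i in range(500)]
population = stratum_1 + stratum_2

def srs_estimate(n, rng):
    """Population mean estimated from a simple random sample of size n."""
    return statistics.fmean(rng.sample(population, n))

def stratified_estimate(n, rng):
    """Population mean estimated from a proportional stratified sample."""
    N = len(population)
    estimate = 0.0
    for stratum in (stratum_1, stratum_2):
        n_h = round(n * len(stratum) / N)
        weight = len(stratum) / N
        estimate += weight * statistics.fmean(rng.sample(stratum, n_h))
    return estimate

def spread(method, n=20, trials=500, seed=0):
    """Standard deviation of the estimates over repeated samples:
    a smaller value means less sampling error for this goal (precision)."""
    rng = random.Random(seed)
    return statistics.stdev(method(n, rng) for _ in range(trials))

print("simple random:", round(spread(srs_estimate), 3))
print("stratified:   ", round(spread(stratified_estimate), 3))
```

Because the strata here are internally homogeneous, stratification removes the between-strata variation from the estimate, so the stratified method should win on precision at the same sample size.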
Power analysis: a statistical method used to determine sample size. “Statistical power is the ability to detect a true difference when, in fact, a true difference exists in the population of interest.” McNamara (1994), p. 56
The larger the sample, the more representative of the population it is likely to be.
When expected differences between groups are large, a large sample is not needed to ensure that differences will be revealed in statistical analysis.
When expected differences are small, a large sample is needed to show differences in statistical analysis.
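The link between expected difference and sample size can be checked with the usual normal-approximation formula for comparing two group means, n per group = 2 (z₁₋α/₂ + z_power)² / d², where d is the expected difference in standard-deviation units (Cohen's d). This is one standard approximation, not necessarily the procedure McNamara describes; exact t-based calculations give slightly larger numbers. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two group means.

    effect_size: expected difference between the groups in standard-
    deviation units (Cohen's d). Normal approximation:
    n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2
    """
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

for d in (0.8, 0.5, 0.2):   # large, medium, small expected differences
    print(d, n_per_group(d))
```

With α = 0.05 and 80% power this gives about 25 per group for a large difference, 63 for a medium one, and 393 for a small one.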
"A large sample cannot correct for a faulty sampling design".
We must assess both the size of the sample and the method by which the sample is selected.
• Sample plan: definite sequence of steps that the researcher goes through in order to draw and ultimately arrive at the final sample
• Step 1: Define the relevant population.
  • Specify the descriptors, geographic locations, and time for the sampling units.
• Step 2: Obtain a population list, if possible; it may only be some type of sample frame.
  • List brokers, government units, customer lists, competitors’ lists, association lists, directories, etc.
  • Incidence rate: the occurrence of certain types in the population. The lower the incidence, the larger the list needed to draw the sample from.
• Step 3: Design the sample method (size and method).
  • Determine the specific sampling method to be used. All necessary steps must be specified (sample frame, n, … recontacts, and replacements).
• Step 4: Draw the sample.
  • Select the sample units and gather the information.
  • Drop-down substitution, oversampling, resampling.
• Step 5: Assess the sample.
  • Sample validation: compare the sample profile with the population profile; check non-responders.
• Step 6: Resample if necessary.
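Sample validation (Step 5) can be sketched as comparing each group's share of the sample with its known share of the population and flagging large gaps, which would then trigger Step 6. The function name `compare_profiles`, the 5-point tolerance, and the counts are hypothetical.

```python
def compare_profiles(sample_counts, population_shares, tolerance=0.05):
    """Sample validation sketch: compare each group's share of the sample
    with its known share of the population.

    sample_counts: dict group -> respondents in the sample
    population_shares: dict group -> proportion of the population
    tolerance: largest gap accepted before a group is flagged (assumed value)
    Returns dict group -> (sample share, population share, flagged)
    """
    total = sum(sample_counts.values())
    report = {}
    for group, pop_share in population_shares.items():
        sample_share = sample_counts.get(group, 0) / total
        flagged = abs(sample_share - pop_share) > tolerance
        report[group] = (sample_share, pop_share, flagged)
    return report

# Hypothetical sample of 100 respondents from a population split 50/50 by sex.
report = compare_profiles({"female": 70, "male": 30}, {"female": 0.5, "male": 0.5})
for group, (s, p, flagged) in report.items():
    print(f"{group}: sample {s:.2f}, population {p:.2f}, check: {flagged}")
```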