Sampling
Time Series
● Time Series: One observational unit is observed many times
Examples (mostly macroeconomics and finance):
  – The daily price of a stock over a month
  – Brunei's exports from 1980 to 2005
  – The monthly unemployment rate from January 2000 to December 2004
Advantages:
  – We have many observations of the same thing
  – This means we can use large-sample inferential statistics based on one entity
Disadvantages:
1. Samples are usually small
  – Example: Yearly economic growth for... 20 years?
2. Observations are not independent and identically distributed
  – Example: If economic growth was higher than average in 1997, it was probably higher than average in 1996
3. Spurious correlation and regression can arise
  – Example:
    ● We have a time series of the world's annual economic output from 1940 to 2000
    ● We have a time series of the number of countries with nuclear weapons from 1940 to 2000
    ● Will the two be correlated? Does it mean anything?
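The spurious-correlation example can be illustrated with a small simulation. The sketch below is not real data: it generates two unrelated series that both trend upward (the growth rates and noise levels are made-up assumptions standing in for world output and the number of nuclear states) and compares the correlation of the levels with the correlation of the year-to-year changes.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def trending_series(n_years, growth, noise_sd, rng):
    """A series that drifts upward by `growth` per year plus random noise."""
    level = 0.0
    series = []
    for _ in range(n_years):
        level += growth + rng.gauss(0, noise_sd)
        series.append(level)
    return series


def first_differences(series):
    """Year-to-year changes; the shared upward trend becomes a constant."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable
# 61 annual observations, 1940 to 2000; both series are generated independently.
output = trending_series(61, growth=2.0, noise_sd=1.0, rng=rng)
nuclear = trending_series(61, growth=0.15, noise_sd=0.1, rng=rng)

levels_corr = pearson(output, nuclear)
changes_corr = pearson(first_differences(output), first_differences(nuclear))
print(f"Correlation of levels:  {levels_corr:.2f}")
print(f"Correlation of changes: {changes_corr:.2f}")
```

Under these assumptions the levels are strongly correlated simply because both series grow over time, while the changes are close to uncorrelated. Differencing is only one common way to check for this problem.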
Longitudinal Data
● Longitudinal or Panel Data: Several observational units are surveyed several times
Examples:
  – We have output per person for every district in Brunei from 1990 to 2000
  – We observe 2,000 households every year for 10 years. During that time they have births, deaths, job changes, etc.
Advantage: We can control for individual characteristics of the observational unit
  – Example: We observe people over five years. Some of them move from place to place.
  – We measure the income difference when a person moves from city to country or country to city
Disadvantage: Observations are not independent and identically distributed
  – Example:
    ● People who have high incomes in one place will have high incomes in another place
    ● This means you can predict a person's income in one year if you know it in another year
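The advantage of panel data in the movers example can be sketched with simulated data. Everything here is an assumption made for illustration: a true city premium of 10, a fixed personal characteristic called `ability`, and a rule that higher-ability people are more likely to start out in the city. The naive comparison of city and country residents mixes the premium with personal characteristics; the within-person comparison of movers does not.

```python
import random
from statistics import mean

rng = random.Random(1)
CITY_PREMIUM = 10.0  # assumed true effect of living in the city


def income(ability, in_city):
    """Income = fixed personal component + location premium + noise."""
    return ability + (CITY_PREMIUM if in_city else 0.0) + rng.gauss(0, 2)


records = []  # one tuple per person: (city in year 1, income 1, city in year 2, income 2)
for _ in range(2000):
    ability = rng.gauss(50, 10)  # individual characteristic, constant over time
    # Assumption: higher-ability people are more likely to start in the city.
    in_city_1 = rng.random() < (0.8 if ability > 50 else 0.2)
    moves = rng.random() < 0.3
    in_city_2 = (not in_city_1) if moves else in_city_1
    records.append((in_city_1, income(ability, in_city_1),
                    in_city_2, income(ability, in_city_2)))

# Cross-section (year 1 only): compare different people in different places.
city_incomes = [y1 for c1, y1, _, _ in records if c1]
country_incomes = [y1 for c1, y1, _, _ in records if not c1]
naive_estimate = mean(city_incomes) - mean(country_incomes)

# Panel: for each mover, city income minus country income for the SAME person.
within_person = [(y2 - y1) if c2 else (y1 - y2)
                 for c1, y1, c2, y2 in records if c1 != c2]
within_estimate = mean(within_person)

print(f"Naive cross-section estimate: {naive_estimate:.1f}")
print(f"Within-person estimate:       {within_estimate:.1f}")
```

Because each mover's `ability` cancels out in the within-person difference, that estimate stays close to the assumed premium, while the cross-section estimate is inflated by who lives where.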
Using Secondary Data
Be sure you have:
● A codebook, which describes the relationship between the survey questions and the data on the computer
  – Examples: How are industries categorized? How are data coded? Are all questions asked of all people?
● A description of the sampling technique, i.e., how the sample was collected
Why do sampling?
● To learn about the characteristics of a group of people or objects without having to collect information about all of the people or objects of interest.
● To save money and time.
● To increase internal validity. Use of multiple data collectors or the passage of large amounts of time can negatively impact internal validity. Well-conducted samples can actually be more accurate than collecting the desired data from all of the people or objects of interest.
Sampling Methods
In Class Exercise
This exercise will demonstrate the power of random sampling. It involves the following steps:
1. A survey question will be distributed to everyone in class asking for your position on an important current issue.
2. All the responses will be tabulated.
3. A random sample of the responses will be drawn.
4. The results of the sample will be compared to the results from the universe of people in class.
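The steps of the exercise can be mimicked in a few lines. This is a minimal sketch with a made-up class of 60 yes/no responses (39 "Yes", 21 "No"); the class size, the split, and the sample size of 20 are assumptions, not results from an actual class.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Hypothetical universe: every response collected in class.
responses = ["Yes"] * 39 + ["No"] * 21
rng.shuffle(responses)


def proportions(answers):
    """Share of each answer among the given responses."""
    counts = Counter(answers)
    total = len(answers)
    return {answer: count / total for answer, count in counts.items()}


# Draw a simple random sample without replacement.
sample = rng.sample(responses, 20)

print("Whole class:", proportions(responses))
print("Sample:     ", proportions(sample))
```

Rerunning with different seeds shows how much the sample shares move around the class shares from one draw to the next.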
In Class Exercise
Survey Questions:
1. Age? A. Less than 20 B. 21-30 C. 31-40 D. 41-50 E. Above 50
2. Gender? A. Male B. Female
3. Highest Academic Qualification? A. O Level B. A Level C. National Diploma D. Higher National Diploma E. Bachelor's Degree F. Master's Degree
4. Do you own a Facebook account? A. Yes B. No
5. If yes, how often do you update your Facebook account? A. Every day B. Once a week C. 2-4 days/week D. 5-6 days/week E. Not applicable F. Others: ___________ (please specify)
6. What activities do you use your Facebook account for? A. Personal use B. Selling C. Advertising D. Others: __________ (please specify)
ID Number ____
Form of the Sample
● In-person interviews
  – Advantages:
    ● Higher response rate
    ● May be the only way to reach some people, especially poor people
    ● Can obtain precise answers to technical questions (income, occupation, etc.)
  – Disadvantages:
    ● Expensive
    ● People may refuse to answer private questions
● In-person questionnaires
  – Advantages:
    ● Higher response rate
    ● May be the only way to reach some people, especially poor people
    ● Good way to ask very private questions
  – Disadvantages:
    ● Expensive
    ● Assumes literacy
● Solicited responses (by mail, handout, etc.)
  – Advantages:
    ● Relatively inexpensive
    ● May get interesting responses
  – Disadvantages:
    ● Low response rate; biased toward extreme views
    ● Assumes literacy
● Telephone interviews
  – Advantages:
    ● Relatively inexpensive
    ● Respondents feel some privacy
    ● Reasonable response rate
  – Disadvantages:
    ● Biased toward wealthy in Ethiopia
    ● In all countries: landlines biased toward old, cellphones toward young
● Online interviews
  – Advantages:
    ● Very inexpensive; saves inputting costs as well
    ● Respondents feel privacy
    ● Response rate varies by method of solicitation
  – Disadvantages:
    ● Very biased toward wealthy in some countries
    ● Biased toward young everywhere; very poor have less online access in industrialized world
Types of Sampling
The following three types of samples are based on the use of probability theory. These types of samples increase external validity (i.e., they produce results which can to some extent be generalized to a broader group).
● Simple random sample
● Stratified random sample
● Cluster sample
Two ways of making a probability sample more representative of the population being studied:
1. Make sure that every unit picked for the sample has the same chance of being picked as any other unit (randomness).
2. Increase the sample size (less important than (1) above).
Proper Size of the Sample
Factors that affect what the size of the sample needs to be:
● The heterogeneity of the population (or strata or clusters) from which the units are chosen.
● How many population subgroups (strata) you will deal with simultaneously in the analysis.
● How accurate you want your sample statistics (parameter estimates) to be.
● How common or rare the phenomenon you are trying to detect is.
● How much money and time you have.
Calculating Sample Size

$$ n = \frac{X^2 N P (1 - P)}{C^2 (N - 1) + X^2 P (1 - P)} $$

Where:
n = the required sample size
$X^2$ = the chi-square value for 1 degree of freedom at some desired probability level
N = the size of the population universe (which gets more important as N gets smaller)
P = the population parameter of the variable (set to 0.5, which is the worst-case scenario, meaning maximally heterogeneous for a dichotomous variable)
C = the chosen confidence interval (+/-), expressed as a proportion (e.g., 0.05 for 5%)

Important note: This formula is good for a dichotomous (yes/no type) variable, not for more complex variables.
Sample Size Required for Various Population Sizes at 5% Confidence Interval (For Simple Dichotomous Choice)

| (1) Confidence interval (+/-), C | (2) X^2 value at the 95% confidence level | (3) Population size, N | (4) Population parameter, P | (5) Required sample size, n |
|---|---|---|---|---|
| 5% | 3.841 | 50 | 0.5 | 44 |
| 5% | 3.841 | 100 | 0.5 | 80 |
| 5% | 3.841 | 150 | 0.5 | 108 |
| 5% | 3.841 | 200 | 0.5 | 132 |
| 5% | 3.841 | 250 | 0.5 | 152 |
| 5% | 3.841 | 300 | 0.5 | 169 |
| 5% | 3.841 | 400 | 0.5 | 196 |
| 5% | 3.841 | 500 | 0.5 | 217 |
| 5% | 3.841 | 800 | 0.5 | 260 |
| 5% | 3.841 | 1,000 | 0.5 | 278 |
| 5% | 3.841 | 1,500 | 0.5 | 306 |
| 5% | 3.841 | 2,000 | 0.5 | 322 |
| 5% | 3.841 | 3,000 | 0.5 | 341 |
| 5% | 3.841 | 4,000 | 0.5 | 351 |
| 5% | 3.841 | 5,000 | 0.5 | 357 |
| 5% | 3.841 | 10,000 | 0.5 | 370 |
| 5% | 3.841 | 50,000 | 0.5 | 381 |
| 5% | 3.841 | 1,000,000 | 0.5 | 384 |
| 5% | 3.841 | 50,000,000 | 0.5 | 384 |
| 5% | 3.841 | 100,000,000 | 0.5 | 384 |
| 5% | 3.841 | 300,000,000 | 0.5 | 384 |
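The sample size formula can be evaluated directly. The sketch below is one way to code it; the function name `required_sample_size` and its parameter names are illustrative choices, and rounding to the nearest whole number is an assumption that matches the tabulated values.

```python
def required_sample_size(population, chi_square=3.841, proportion=0.5, margin=0.05):
    """Required n for a dichotomous variable.

    population  -- N, size of the population universe
    chi_square  -- X^2 for 1 degree of freedom (3.841 at the 95% level)
    proportion  -- P, population parameter (0.5 is the worst case)
    margin      -- C, confidence interval as a proportion (0.05 = +/- 5%)
    """
    spread = chi_square * proportion * (1 - proportion)  # X^2 * P * (1 - P)
    numerator = spread * population
    denominator = margin ** 2 * (population - 1) + spread
    return round(numerator / denominator)


for size in (50, 100, 1_000, 10_000, 1_000_000):
    print(f"N = {size:>9,}  ->  n = {required_sample_size(size)}")
```

For N = 100 and N = 1,000 this gives 80 and 278. The required sample levels off at 384 for very large populations, which is why population size matters mainly when N is small.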
Stratified Sampling
● Is done whenever it is likely that an important subpopulation will be underrepresented in a simple random sample.
● Must know the independent variables upon which to stratify
● Must know the sizes of the strata subpopulations
● Is complex and more costly
● Each stratum has its own sampling error. But the aggregate sampling error of the total population is reduced.
● There is proportionate and disproportionate stratified random sampling
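Proportionate stratified sampling can be sketched as follows: each stratum contributes to the sample in proportion to its share of the population. The strata names and sizes (`urban`, `rural`, `remote`) are hypothetical, and the function is an illustration rather than a standard library routine.

```python
import random


def proportionate_stratified_sample(strata, sample_size, rng):
    """Draw a simple random sample inside each stratum.

    strata      -- dict mapping stratum name to the list of its units
    sample_size -- total number of units wanted
    Returns a dict mapping stratum name to the sampled units.
    """
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Stratum share of the sample equals its share of the population.
        # Rounding means the total can differ slightly from sample_size;
        # max(1, ...) keeps very small strata from being dropped.
        k = max(1, round(sample_size * len(units) / total))
        sample[name] = rng.sample(units, k)
    return sample


# Hypothetical population of 1,000 households in three strata.
strata = {
    "urban": [f"urban-{i}" for i in range(700)],
    "rural": [f"rural-{i}" for i in range(250)],
    "remote": [f"remote-{i}" for i in range(50)],
}
sample = proportionate_stratified_sample(strata, 100, random.Random(0))
print({name: len(units) for name, units in sample.items()})
```

A disproportionate design would instead oversample the small stratum (for example, taking 30 remote households) and reweight the results afterward, which is why the stratum sizes must be known.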
Cluster Sampling
● Is a way to sample a population when there are no convenient lists or frames (e.g., homeless people in shelters or soup kitchens).
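One common form of cluster sampling, sketched here as an assumption about how it might be applied to the shelter example, is to list the clusters (which is feasible even when no list of individuals exists), pick some clusters at random, and then survey everyone found in the chosen clusters. The shelter names and head counts are invented.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit inside them.

    clusters   -- dict mapping cluster name to the list of units found there
    n_clusters -- how many clusters to visit
    """
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical frame: we can list shelters, but not the people who use them.
shelters = {
    f"shelter-{i}": [f"person-{i}-{j}" for j in range(10 + i)]
    for i in range(12)
}
visited = cluster_sample(shelters, 4, random.Random(3))
for name, people in visited.items():
    print(name, len(people), "people surveyed")
```

A two-stage variant would draw a random subsample of people inside each chosen shelter instead of surveying everyone.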
Self-Selection Bias
● Is caused by the unit of observation (e.g., person) choosing whether or not to be a respondent in a survey.
● If the self-selection process itself is random, it will not compromise the randomness of the selection process.
● If the self-selection process is not random (is systematic), it will compromise the randomness of the selection process.
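The difference between random and systematic self-selection can be shown with a small simulation. The numbers are assumptions chosen for illustration: a population split 50/50 on an issue, a 30% response rate for everyone in the random case, and response rates of 60% for supporters versus 20% for opponents in the systematic case.

```python
import random

rng = random.Random(7)

# Hypothetical population: True = supports the issue, False = opposes it.
population = [True] * 5000 + [False] * 5000


def respondents(people, rate_if_support, rate_if_oppose, rng):
    """Each person decides for themselves whether to answer the survey."""
    return [person for person in people
            if rng.random() < (rate_if_support if person else rate_if_oppose)]


def share_supporting(people):
    return sum(people) / len(people)


# Random self-selection: the decision to respond is unrelated to the answer.
random_response = respondents(population, 0.3, 0.3, rng)
# Systematic self-selection: supporters are three times as likely to respond.
systematic_response = respondents(population, 0.6, 0.2, rng)

print(f"True share supporting:           {share_supporting(population):.2f}")
print(f"Random self-selection share:     {share_supporting(random_response):.2f}")
print(f"Systematic self-selection share: {share_supporting(systematic_response):.2f}")
```

With random self-selection the respondents still look like the population, only fewer; with systematic self-selection the estimate is pulled toward the group that is more willing to respond.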