1
A workshop on using R to select a sample for EHES
Susie Cooper & Johan Heldal
Statistics Norway
2
Overview
• What is R and why use it?
• Practical Exercises
  1. Installing and loading R and packages
  2. Reading external files
  3. Calculating sample sizes
  4. Stage 1 - Selecting Primary Sampling Units (PSU)
  5. Stage 2 - Selecting Secondary Sampling Units (SSU)
• Where to get more information
3
Why use R for EHES?
• It has been agreed with the EU because
  • It’s free - therefore available for all countries involved.
  • Very flexible
  • Very powerful and fast tool for sampling and analyses.
However…
• There can be a steep learning curve to using the program.
• No user-friendly interface.
4
What is EHESsampling?
• A tool for planning the sampling design
• Can be used to find good stratifications
• Can calculate cost-variance optimal sample sizes within PSUs.
• Can calculate costs and variances of alternatives.
• A tool for taking a probability sample from a sampling frame.
5
Using EHESsampling
• The EHESsampling manual
• Before using EHESsampling you have to prepare some input datasets from the main sampling frame.
For sampling at stage 1 you need
• A dataset describing the PSUs
• A dataset describing the strata
For stage 2 you need
• The main sampling frame describing the individual units
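As a rough illustration of preparing the stage 1 inputs, the sketch below derives a PSU dataset and a strata dataset from a small made-up frame using base R only. The column names (id, psu, stratum, size, n.psu) are assumptions for this example and are not necessarily the names EHESsampling expects; check the EHESsampling manual for the required layout.

```r
# Hypothetical sampling frame: one row per individual unit.
frame.df <- data.frame(
  id      = 1:8,
  psu     = c("A", "A", "A", "B", "B", "C", "C", "C"),
  stratum = c("North", "North", "North", "North", "North",
              "South", "South", "South")
)

# PSU dataset: one row per PSU with its stratum and number of individuals.
psu.df <- aggregate(id ~ psu + stratum, data = frame.df, FUN = length)
names(psu.df)[names(psu.df) == "id"] <- "size"

# Strata dataset: number of PSUs and total population in each stratum.
strata.df <- data.frame(
  stratum = sort(unique(psu.df$stratum)),
  n.psu   = as.vector(table(psu.df$stratum)),
  size    = as.vector(tapply(psu.df$size, psu.df$stratum, sum))
)

psu.df
strata.df
```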
6
1. Loading Packages
• Load the EHESsampling package and other necessary packages each time you re-open R:
library(EHESsampling)
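If a package has not been installed, library() stops with an error. A minimal sketch of a check before loading is shown here; load_if_available is a helper name made up for this example, and "stats" (which ships with R) is used only so the example runs anywhere.

```r
# Load a package only if it is installed; otherwise print a hint.
# load_if_available() is a hypothetical helper, not part of EHESsampling.
load_if_available <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)
    TRUE
  } else {
    message("Package '", pkg, "' is not installed; install it before continuing.")
    FALSE
  }
}

load_if_available("stats")
```

In the workshop the same call would be made with "EHESsampling" in place of "stats".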
7
2. Reading External Files
• Open a new script by selecting File and New script
8
2. Reading External Files
• Set the working directory to the location on your computer where the data files are stored by typing into the new script:
setwd("X:/120/EHES/R/Data")
• Then press Ctrl + R to send the line to the console
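A small sketch of checking the folder before changing to it, and confirming the result afterwards. Here tempdir() is only a stand-in so the example runs on any computer; replace it with the path to your own data folder.

```r
# Stand-in folder for this example; replace with your own data folder.
data_dir <- tempdir()

# Stop early with a clear message if the folder does not exist.
if (!dir.exists(data_dir)) {
  stop("Folder not found: ", data_dir)
}

old_dir <- setwd(data_dir)                    # setwd() returns the previous directory
getwd()                                       # confirm the working directory
csv_files <- list.files(pattern = "\\.csv$")  # CSV files available in the folder
csv_files
setwd(old_dir)                                # go back to where we started
```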
9
2. Reading External Files
• Read in the chosen file and save it in the working environment.
PSUs.df <- read.table("post1000.csv", sep = ";", dec = ",", header = TRUE)
• The file is now stored as PSUs.df for this session.
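To show what the sep and dec arguments do, the sketch below writes a tiny semicolon-separated file with decimal commas to a temporary location and reads it back. The file contents and column names are invented for this example and do not describe post1000.csv.

```r
# Write a small example file: ";" separates columns, "," marks decimals.
tmp <- tempfile(fileext = ".csv")
writeLines(c("postcode;population;mean_income",
             "1000;250;31,5",
             "1001;410;28,75"), tmp)

# Same arguments as in the workshop call.
demo.df <- read.table(tmp, sep = ";", dec = ",", header = TRUE)

# read.csv2() uses ";" and "," by default, so it is a shorter
# alternative for files in this format.
demo2.df <- read.csv2(tmp)

str(demo.df)   # mean_income is read as a number, not as text
```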
10
2. Reading External Files
• To see the start of the data set type:
head(PSUs.df)
Prints the first 6 lines of this data set
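A few other base R functions are useful for a first look at a data set. The sketch uses a small made-up data frame (PSUs.demo, with invented columns) so that it runs without the workshop files.

```r
# Made-up data set with 10 rows, standing in for PSUs.df.
PSUs.demo <- data.frame(psu = 1:10, size = seq(100, 1000, by = 100))

first_rows <- head(PSUs.demo)   # first 6 rows by default
first_rows
head(PSUs.demo, n = 3)          # choose the number of rows with n
dim(PSUs.demo)                  # number of rows and columns
str(PSUs.demo)                  # column names and types
summary(PSUs.demo)              # basic summary of each column
```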
24
Further Sampling Steps
• Read in the strata dataset
• Calculate the PSU sample sizes
• Take a sample of PSUs – stage 1
• Merge the selected PSUs with the main sampling frame containing individual units.
• Sample individual units – stage 2
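The steps above can be sketched in base R to show the general idea. This is not the EHESsampling code: it assumes simple random sampling at both stages, the same number of PSUs in every stratum, and made-up data and column names. The EHESsampling functions and their options are described in the manual.

```r
set.seed(1)  # make the random selection reproducible

# Made-up inputs: 6 PSUs in 2 strata, 20 individuals in each PSU.
psu.df   <- data.frame(psu = 1:6, stratum = rep(c("North", "South"), each = 3))
frame.df <- data.frame(id = 1:120, psu = rep(1:6, each = 20))

# Stage 1: simple random sample of 2 PSUs within each stratum.
n.psu <- 2
selected.psu <- unlist(lapply(
  split(psu.df$psu, psu.df$stratum),
  function(p) p[sample.int(length(p), n.psu)]
))
stage1.df <- psu.df[psu.df$psu %in% selected.psu, ]

# Merge the selected PSUs with the frame of individual units.
merged.df <- merge(stage1.df, frame.df, by = "psu")

# Stage 2: simple random sample of 5 individuals within each selected PSU.
n.ssu <- 5
selected.id <- unlist(lapply(
  split(merged.df$id, merged.df$psu),
  function(ids) ids[sample.int(length(ids), n.ssu)]
))
sample.df <- merged.df[merged.df$id %in% selected.id, ]

table(sample.df$stratum, sample.df$psu)  # individuals selected per PSU
```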
25
Selected Individuals
26
Help!
• EHESsampling manual available at: www.ehes.info
• EHES participant manual – Part 1: Chapter 05
• R websites:
  • R official site: www.r-project.org
  • Quick R: www.statmethods.net
• Us:
  • [email protected]
  • [email protected]