Creating Something from Nothing: Synthetic and Dummy files Bo Wandschneider University of Guelph Chuck Humphrey University of Alberta DLI Training: Ottawa, May, 2003
Creating Something from Nothing:

Synthetic and Dummy files

Bo Wandschneider, University of Guelph

Chuck Humphrey, University of Alberta

DLI Training: Ottawa, May, 2003

Outline

• Types of data files
• Implications for analysis
• Where do we get access
• Which file is appropriate
• Providing service with synthetic files
• NPHS: an exercise
• SLID: an exercise

Types of Data Files

• Microdata
  • Confidential Microdata Products
    • Master Files
    • Share Files
  • Public Access Microdata Products
    • Public Use Anonymized Microdata (PUMFs)
    • Synthetic Files

Microdata Products

Microdata
• raw data organized in a file where the records or lines in the file are observations of a specific unit of analysis and the information on each line consists of the values of variables
• requires some form of processing or analysis to be used

Microdata Products

Microdata - SCF Example

000011031000+025607+000000+025607+000337+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+025944+006481+0194632331000000000090922201200000000000222+0232111000+000000+0000003000000000000000002228233411412190638749500575211004600132
000021031000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+0000001663000000000060824432200000000000632+0000000000+000000+0000000000000000000000003116121111435481500777500570033004300110
000031031000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+0000001663000000000040521112200000000000432+0206261110+000636+0000003000000000000000002228213411436491600778500570033004200085
000041031000+002080+000000+002080+000000+000575+000522+000000+000000+002574+000000+000000+003671+003149+000522+000000+000000+005751+000000+0057514551000000000060824432200000000000532+0220101021+000575+0005223000000000000000002240223411431251000774500571622361600065
000051031000+018050+000000+018050+000000+000288+000261+000000+000000+000000+000000+000000+000549+000288+000261+000000+001179+019778+002463+0173152221000000000050522201200000000000432+0000001011+000288+0002611000000000000000001246123411411440748739500575011021600046
000061031000+001500+000000+001500+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+001500+000000+0015002551000000000101024501200000000000631+0000000000+000000+0000000000000000000000003123263411431071300773500571612004300094
000071031000+000000+000000+000000+000000+000000+000000+002540+000000+000000+000000+000000+002540+002540+000000+000000+000000+002540+000000+0025404152000000000010340201200000000000222+0121134000+000000+0000003000000000000000002269233411436491600778500570033004200041
000081031000+008400+000000+008400+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+008400+000858+0075422551000000000080823301200000000000332+0000000000+000000+0000000000000000000000003118133411411210848739500575211004600055
000091031000+026000+000000+026000+000000+000287+000156+000000+000000+000879+000000+000000+001322+001166+000156+000000+000000+027322+004335+0229872231000000000070823422200000000000642+0000001012+000287+0001561000000000000000001248113411431400300774500564512071600060
000101031000+000000+000000+000000+000157+000000+000000+005043+000000+000000+000000+000000+005043+002541+002502+000000+000000+005200+000000+0052004652000000000040622312200000000000642+0000000000+000000+0000002000000000000000004376213411436491600778500570033004400076
000111031000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+0000001663000000000020341213100000000000462+0000000000+000000+0000000000000000000000003119213411435481500777500570033004500040
000121031000+000991+000000+000991+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000000+000991+000000+0009912551000000000020343322100000000000433+0000000000+000000+0000000000000000000000003117121311432231400773500571222004300244
000131031000+027716+000000+027716+000000+000288+000000+000000+000000+000000+000000+000000+000288+000288+000000+000000+000000+028004+006243+0217612221000000000070722201200000000000331+0034071100+000288+0000001000000000000000001226163411411431138739500575211004600156
000141031000+010000+000000+010000+000000+000600+000000+000000+000000+000000+000000+000000+000600+000600+000000+000000+000000+010600+000686+0099142331000000000040422201200000000000433+0077001011+000600+0005221000000000000000001260123411411440636719500573012221600148
000151031000+000750+000000+000750+000000+000000+000370+000000+000000+000000+000000+000000+000370+000000+000370+000000+000000+001120+000000+0011202551000000000080823313200000000000633+0323511032+001126+0003703000000000000000002245223411411261318529500575222004600132
000161031000+007012+000000+007012+000165+000000+000000+000000+000000+003082+000000+000000+003082+003082+000000+000000+000000+010259+001356+0089032541000000000070824432200000000000531+0000000000+000000+0000000000000000000000003118123411421320320439500573522171600111
000171031000+002027+000000+002027+000000+000000+000000+000000+000000+000000+000000+000000+
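The SCF records above show why raw microdata "requires some form of processing" before it can be used. As a minimal sketch, the Python below pulls the leading signed fields out of the first record. The layout is an assumption inferred only from the visible pattern (a 12-character prefix followed by fields made of a sign and six digits); the real record layout comes from the survey's codebook, and the function name is hypothetical.

```python
import re

# First 145 characters of the first SCF record shown on the slide.
RECORD = (
    "000011031000"
    "+025607+000000+025607+000337+000000+000000+000000+000000+000000+000000"
    "+000000+000000+000000+000000+000000+000000+025944+006481+019463"
)

# ASSUMPTION: width of the leading identifier block, guessed from the data.
PREFIX_WIDTH = 12


def parse_signed_fields(record, prefix_width=PREFIX_WIDTH):
    """Split a record into its prefix and a list of signed integer fields."""
    prefix = record[:prefix_width]
    # Each field is assumed to be a sign followed by exactly six digits.
    tokens = re.findall(r"[+-]\d{6}", record[prefix_width:])
    return prefix, [int(token) for token in tokens]


prefix, fields = parse_signed_fields(RECORD)
print(prefix, len(fields), fields[:4])
```

Without a codebook the numbers have no meaning, which is the point of the slide: the file is only usable once variable positions and names are known.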

Confidential Microdata

Master Files
• These files contain the fullness of detail captured about the unit of observation. Because the information in these files can identify the individual who provided it, the files are considered confidential.

Confidential Microdata

Master File – Example

Confidential Microdata

Master File - Personal identifiers

Confidential Microdata

Master File – Geography (SLID)

Confidential Microdata

Master File - Fullness of Data (NPHS)

Confidential Microdata

Master File - Fullness of Data

Confidential Microdata

Master File - Fullness of Data (SLID)

Confidential Microdata

Master File - Fullness of Data

Confidential Microdata

Share Files
• these are confidential files for which the respondents have signed a consent form permitting Statistics Canada to allow access to their information for approved research.
• Used with NPHS and NLSCY

Public Access Microdata

Anonymized Microdata
• these microdata are specially prepared to minimize the possibility of disclosing or identifying any of the cases or observations
• the original data from the master file are edited to create a public use microdata file

Public Access Microdata

Steps in Anonymizing Microdata
• removal of all personal identification information (names, addresses, etc.)
• include only gross levels of geography
• collapse detailed information into a smaller number of general categories
• suppress the values of a variable
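The four anonymizing steps can be sketched in a few lines of Python. This is an illustration only, not Statistics Canada's procedure: the records, variable names, age bands, and income cap are all invented, and the capping step is borrowed from the "Capped values" item that appears later in the deck.

```python
# Hypothetical master-file records; every name and value here is invented.
MASTER = [
    {"name": "A. Smith", "address": "12 Elm St", "city": "Guelph",
     "province": "ON", "age": 34, "occupation": "librarian", "income": 52000},
    {"name": "B. Jones", "address": "9 Oak Ave", "city": "Edmonton",
     "province": "AB", "age": 71, "occupation": "retired", "income": 250000},
]

IDENTIFIERS = ("name", "address")      # step 1: personal identifiers to drop
FINE_GEOGRAPHY = ("city",)             # step 2: keep only gross geography
SUPPRESSED = ("occupation",)           # step 4: variables whose values are blanked
AGE_BANDS = [(0, 24, "under 25"), (25, 44, "25-44"), (45, 64, "45-64"), (65, 200, "65+")]
INCOME_CAP = 100000                    # assumed top-code for capped values


def collapse_age(age):
    """Step 3: map a detailed age to a broad category."""
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


def anonymize(record):
    """Return a public-use style copy of one record; the input is not modified."""
    dropped = IDENTIFIERS + FINE_GEOGRAPHY
    out = {key: value for key, value in record.items() if key not in dropped}
    out["age_group"] = collapse_age(out.pop("age"))
    for name in SUPPRESSED:
        out[name] = None
    out["income"] = min(out["income"], INCOME_CAP)
    return out


for row in MASTER:
    print(anonymize(row))
```

Each step removes detail, which is why the resulting file is safer to release and also less useful for fine-grained analysis.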

Public Access Microdata

Statistics Canada PUMFs
• only available for select social surveys that undergo a review by the Data Release Committee, an internal Statistics Canada committee
• no ‘enterprise’ public use microdata

Public Access Microdata

Statistics Canada PUMFs
• almost all are cross-sectional, that is, represent data collected at one point in time
• longitudinal data are difficult to anonymize while maintaining any useful information

Public Access Microdata

PUMFs – personal identifiers

Public Access Microdata

PUMFs – gross geography

Public Access Microdata

PUMFs – collapsed data

Public Access Microdata

PUMFs – suppressed data

Public Access Microdata

Synthetic Files
• These microdata do not contain actual ‘real’ cases; they contain pseudo-cases that provide aggregate results close to those from the ‘real’ cases

Public Access Microdata

Synthetic Files
• They have been prepared so that researchers can create analysis runs for the master file without any possibility of disclosing or identifying any of the cases

Public Access Microdata

Synthetic Files
• The results are not to be reported; the files are strictly for preparing analyses of master files
• Usually associated with longitudinal files

Public Access Microdata

Steps in creating Synthetic Files
• Observations are transformed
• No records actually exist
• Keep fullness of detail
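The slides do not say how the observations are transformed, so the sketch below uses a stand-in technique chosen purely for illustration: shuffling each variable independently across records. All data and names are hypothetical. The result keeps every variable and its full set of values while, in general, breaking the link to any real record.

```python
import random

# Hypothetical 'real' records; invented for illustration.
REAL = [
    {"age": 34, "province": "ON", "income": 52000},
    {"age": 71, "province": "AB", "income": 250000},
    {"age": 19, "province": "QC", "income": 8000},
    {"age": 45, "province": "BC", "income": 61000},
]


def synthesize(records, seed=0):
    """Build pseudo-cases by shuffling each variable independently.

    ASSUMPTION: this is a simple stand-in, not the method used to
    produce the NPHS synthetic files.
    """
    rng = random.Random(seed)
    columns = {name: [rec[name] for rec in records] for name in records[0]}
    for values in columns.values():
        rng.shuffle(values)
    return [{name: columns[name][i] for name in columns}
            for i in range(len(records))]


for pseudo_case in synthesize(REAL):
    print(pseudo_case)
```

With this particular technique, single-variable summaries match the original exactly but relationships between variables are destroyed, which illustrates why results from a synthetic file are for testing setups and not for reporting.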

Public Access Microdata

Synthetic Files – NPHS example

Public Access Microdata

Synthetic Files – NPHS 1999 general file

                PUMF     Synthetic
Observations    49046    49046
Variables       176      400

Public Access Microdata

Synthetic Files – NPHS 1999

Public Access Microdata

Synthetic Files – NPHS 1999

Implications for Analysis

What are the implications in doing analysis with these different types of microdata files?

Implications for Analysis

Master File
• All observations
• Has the most variables with the most detail
• Lots of geography and personal characteristics
• Little grouping or capping of categories

Implications for Analysis

Master File
• Restricted access: only available to authorized Statistics Canada employees, which includes ‘deemed employees’

Implications for Analysis

Master File
• Includes linkage variables across files within a study, e.g., NLSCY linkage among the files for different units of analysis (kids, parents, teachers)

Implications for Analysis

Public Use Microdata (PUMF)
• Suppressed observations
• Suppressed variables: removed from the file
• Suppressed content
  • Gross geography
  • Collapsed categories
  • Capped values

Implications for Analysis

Public Use Microdata (PUMF)
• Licensed product: agree to certain terms of use
• No linkage to multiple units of analysis, with a few exceptions (GSS Time Use and Family)

Implications for Analysis

Synthetic Files
“Looks like a duck and quacks like a duck”, but it isn’t a duck or any other type of fowl.

Implications for Analysis

Synthetic Files
• Looks like master files
• Lots of observations
• Lots of variables
• Little grouping or capping of categories
• Lots of geographic detail

Synthetic Files

Precautions
• Results not authentic – but close in the aggregate
• Use for testing analysis setups only
• Still need the master files for publishable results

Where do we get Access?

Master File
• Restricted access governed under the Statistics Act
• Remote Job Submission
• Research Data Centres
  • Apply to SSHRC for peer review of the proposal and to STC for security clearance

Where do we get Access?

Public Use Microdata Files (PUMF)
• Get from DLI
• Analyze wherever is convenient
• Can use a variety of analysis software, including SAS, SPSS, Stata, HLM, LISREL, etc.
• SLIDRET sans data

Where do we get Access?

Synthetic Files
• Author Divisions ‘may’ create them
• Most relevant when dealing with new Panel Data, but not necessarily, e.g., the Census has potential
• NPHS synthetic files on DLI FTP site

Where do we get Access?

Synthetic files
• SLID, WES, YITS coming ????
  • Do we need to encourage them?
• Work with them locally
• Build SAS and SPSS setups

Which File is Appropriate?

• 1st stop is still the PUMF
• This file has the easiest access for us
• Probably meets the needs of most clients
• Not as administratively burdensome as synthetic or master file
• Perfect for clients just looking for ‘data’ – courses in quantitative analysis

Which File is Appropriate?

• If more detail is needed, refer to the Master File Documentation (similar to Synthetic File Documentation)

• Make them aware that the cost of use is higher, both in terms of accessibility and analytical requirements

• Interest most likely to come from grad students and ‘experienced’ researchers

Which File is Appropriate?

• Download the Synthetic files from DLI

• Make them aware of problems with synthetic files – RESULTS ARE NOT PUBLISHABLE

• Encourage them to submit an application for RDC access – there is a time lag

Which File is Appropriate?

RDC

Which File is Appropriate?

• Some of you may work with a client using synthetic files before passing her/him off to an RDC

Services for Synthetic Files

DLI Contacts can provide four basic services with synthetic files.

• Build SPSS and SAS system files from the raw synthetic data files that are distributed through DLI;

• Provide information about the use of Remote Job Submission (a.k.a. Remote Access) and RDCs;

Services for Synthetic Files

• Assist with finding variables in the synthetic files;

• Provide instruction about ways of capturing SPSS or SAS code from “dummy” analysis runs with the synthetic files. It is this code that is then submitted to STC through remote job submission.

Services for Synthetic Files

1. Building SPSS and SAS system files for synthetic data
• The NPHS synthetic data are distributed as a raw ASCII file with accompanying command files for SPSS and SAS
• Separate synthetic data files exist for the master file setup and for bootstrapping analysis
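A command file for a raw ASCII data file is, at its core, a list of variable names and column positions. As a rough sketch of what such a setup does, the Python below reads fixed-width lines using a layout table. The layout, variable names, and sample lines are hypothetical and are not taken from the NPHS command files.

```python
# Hypothetical layout: (variable name, 1-based start column, width).
LAYOUT = [
    ("case_id", 1, 5),
    ("sex", 6, 1),
    ("age", 7, 3),
    ("province", 10, 2),
]

# Invented raw lines that follow the hypothetical layout above.
SAMPLE = [
    "00001203435",
    "00002107148",
]


def read_fixed_width(lines, layout=LAYOUT):
    """Slice each raw line into named variables according to the layout."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        row = {}
        for name, start, width in layout:
            # Convert the 1-based start column to a 0-based slice.
            row[name] = line[start - 1:start - 1 + width].strip()
        rows.append(row)
    return rows


for row in read_fixed_width(SAMPLE):
    print(row)
```

Values are kept as strings here; a real setup would also assign types, labels, and missing-value codes, which is the part that takes time with thousands of variables.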

Services for Synthetic Files

1. Building SPSS and SAS system files for synthetic data
• The synthetic data for the 2000-2001 NPHS have 4,138 variables and 17,276 fabricated cases. Creating the SPSS and SAS system files from this file is not difficult, but it does take time. DLI Contacts may wish to create these products for their patrons.

Services for Synthetic Files

2. Information about Remote Job Submission (RJS)
• The author divisions supporting RJS have established their own guidelines and have different operating procedures. Not all divisions supporting longitudinal surveys currently support RJS.
• Therefore, there is a need to track down this information for our patrons.

Services for Synthetic Files

2. Information about Remote Job Submission (RJS)
• For example, the sources for information about RJS include the Centre for Education Statistics:

http://www.statcan.ca/english/edu/rda/index.htm

Services for Synthetic Files

2. Information about Remote Job Submission (RJS)
Where do you find this information?
• Ask the DLI Team via the DLI List
• The EAC has asked for a description of RJS on the DLI website, which should be on the DLI Team’s to-do list

Services for Synthetic Files

2. Information about Research Data Centres

• The collection of master files available through RDCs is listed on the STC website for RDCs

• Each RDC has its own website describing its services

http://www.statcan.ca/english/rdc/index.htm

Services for Synthetic Files

3. Data Reference for the content of the synthetic files

• Helping researchers identify variables over longitudinal files is an important service
• Need to keep the unit of analysis straight
• Need to understand the mnemonic naming convention for variables over cycles
• Develop indexing aids for you and your patrons
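One possible indexing aid is a lookup that groups the same variable across cycles. The Python sketch below assumes, purely for illustration, that one character position in each variable name encodes the cycle; the variable names are invented and the position is a guess, so the survey's documentation should be checked for the actual convention.

```python
from collections import defaultdict

# Invented variable names; the fourth character is treated as a cycle code.
NAMES = ["HWC4_HT", "HWC6_HT", "HWC8_HT", "SMC4_2", "SMC6_2"]

# ASSUMPTION: zero-based position of the cycle character in each name.
CYCLE_POSITION = 3


def build_index(names, cycle_position=CYCLE_POSITION):
    """Map each variable stem to a {cycle code: full variable name} lookup."""
    index = defaultdict(dict)
    for name in names:
        cycle = name[cycle_position]
        # Replace the cycle character with "?" to get a cycle-free stem.
        stem = name[:cycle_position] + "?" + name[cycle_position + 1:]
        index[stem][cycle] = name
    return dict(index)


for stem, by_cycle in build_index(NAMES).items():
    print(stem, by_cycle)
```

A table like this makes it easy to see which cycles carry a given variable and which do not.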

Services for Synthetic Files

4. Provide helpful tips for preserving the code from “dummy” analysis runs in SPSS and SAS

• Researchers will run analyses on the synthetic file to generate the code that they will subsequently email for Remote Job Submission

• Providing information about how to do this easily will be helpful to your patrons

An Example Using the NPHS

Let’s look at an example of these four services using the synthetic files from the NPHS, 2000-2001.

An Example Using SLID

Let’s look at an example of a “dummy” file using SLIDRET, a retrieval system developed to extract data from the cycles of the SLID. A “data-less” version of SLIDRET is available through DLI to help identify variables for RJS.

Location of Slides and Exercises

http://drc.uoguelph.ca/DATA/WKSHPS/IASSIST2003

