Post on 23-Mar-2021
Formerly SPSS Ireland
Creating Dummy Variables
in SPSS 20
Conor McCarthy
Services Consultant
What are Dummy Variables?
Also known as Indicator Variables
Used in techniques like Regression, which assume that the
predictors' measurement level is scale
Dummy coding gets around this assumption for categorical predictors
Take a value of 0 or 1 to indicate the absence (0) or presence (1)
of some categorical effect
k - 1 dummy variables are required for a variable with k categories
An Example
Suppose you have a nominal variable with more than two
categories that you want to use as a predictor in a linear
Regression analysis, e.g. Job Category, which has 3 categories
Then you will need to create 2 dummy variables (i.e. the
number of categories - 1) and include these new dummy
variables in your regression model
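As a rough sketch of the last step, the syntax below enters two dummy variables into a linear regression. It assumes the dummies already exist under the names jobcat1 and jobcat2, and it assumes salary as the dependent variable; that choice is illustrative only and is not specified in these slides.

```spss
* Sketch only: assumes jobcat1 and jobcat2 have already been created.
* salary is an assumed outcome variable, chosen for illustration.
REGRESSION
  /DEPENDENT salary
  /METHOD=ENTER jobcat1 jobcat2.
```

Each coefficient is then read as the difference from the reference category, i.e. the category that has no dummy variable of its own.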
Considerations
Number of dummy variables: straightforward, k - 1, where
k is the number of categories
Choose a reference category – this is the category that you
will compare all the other categories against
Often the reference category will be the first or last category
Doing this in SPSS 20
Dummy coding is built into the Logistic Regression procedures, but
dummy variables need to be created manually for Linear
Regression/Discriminant Analysis
No single function is available to create them
Best to do this using syntax
Approach 1
Using “Employee Data.sav” located in
C:\Program Files\IBM\SPSS\Statistics\20\Samples\English
For variable jobcat create two dummy variables: jobcat1 and
jobcat2
Initially set each variable to 0, then set jobcat1 to 1 for job
category 1 and jobcat2 to 1 for job category 2
In this way category 3, which has no dummy variable of its own,
becomes the reference category
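The steps above could be written roughly as follows. This is a sketch reconstructed from the description, not the original slide syntax; the full file path is assembled from the folder and file name given above and may differ on your installation.

```spss
* Open the sample file (path assumed from the folder named above).
GET FILE='C:\Program Files\IBM\SPSS\Statistics\20\Samples\English\Employee Data.sav'.

* Start both dummy variables at 0 for every case.
COMPUTE jobcat1 = 0.
COMPUTE jobcat2 = 0.

* Switch each dummy to 1 for its own job category.
IF (jobcat = 1) jobcat1 = 1.
IF (jobcat = 2) jobcat2 = 1.
EXECUTE.

* Optional check: cross-tabulate the original variable against the dummies.
CROSSTABS /TABLES=jobcat BY jobcat1 jobcat2.
```

Cases in category 3 keep 0 on both dummies, which is what makes category 3 the reference category.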
Approach 2
Using the VECTOR and LOOP – END LOOP commands
Use the VECTOR command to create the required number of
dummy variables i.e. 2 in this case
Use the LOOP – END LOOP command to loop through each
of the dummy variables that are created using the VECTOR
command
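A minimal sketch of this approach is shown below. The VECTOR declaration and loop bounds are reconstructed from the description and are assumptions, not the original slide syntax. The vector is named jobcat here so that the new variables come out as jobcat1 and jobcat2; if your SPSS version objects to a vector sharing the name of an existing variable, a different vector name (and therefore different dummy names) would be needed.

```spss
* Create a vector of 2 new numeric variables: jobcat1 and jobcat2.
VECTOR jobcat(2).

* #i is a scratch variable used only as the loop counter.
LOOP #i = 1 TO 2.
* The logical expression returns 1 when true and 0 when false.
  COMPUTE jobcat(#i) = (jobcat = #i).
END LOOP.
EXECUTE.
```

Changing the 2 in both the VECTOR and LOOP commands is all that is needed to handle a variable with a different number of categories.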
This approach will make the last category the reference
category as we are only looping through categories 1 and 2
in COMPUTE jobcat(#i) = ( jobcat = #i).
To make the first category the reference category you could
modify the COMPUTE statement in the syntax as follows:
COMPUTE jobcat(#i) = ( jobcat = #i +1).
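Put together, the first-category-as-reference variant might look like the sketch below. Only the COMPUTE line comes from the slide; the VECTOR and LOOP lines around it are assumed. Note that with this change jobcat1 flags category 2 and jobcat2 flags category 3.

```spss
* Sketch: category 1 becomes the reference category.
VECTOR jobcat(2).
LOOP #i = 1 TO 2.
* Shift the comparison by one so the dummies flag categories 2 and 3.
  COMPUTE jobcat(#i) = (jobcat = #i + 1).
END LOOP.
EXECUTE.
```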
Dealing with missing values
Replace the COMPUTE statements in Approach 1 with:
• IF (NOT MISSING(jobcat)) jobcat1=0.
• IF (NOT MISSING(jobcat)) jobcat2=0.
This ensures missing values are still missing in the dummy
variables
Approach 2 deals with missing values implicitly, because the
logical expression ( jobcat = #i) evaluates to missing when
jobcat is missing
Approach 1 modified to account for missing values
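The full modified syntax might look like the sketch below. The two IF (NOT MISSING ...) lines are taken from the previous slide; the remaining lines are assumed to carry over unchanged from Approach 1.

```spss
* Set each dummy to 0 only where jobcat is not missing.
* Cases with a missing jobcat are left as system-missing.
IF (NOT MISSING(jobcat)) jobcat1 = 0.
IF (NOT MISSING(jobcat)) jobcat2 = 0.

* Switch each dummy to 1 for its own job category, as before.
IF (jobcat = 1) jobcat1 = 1.
IF (jobcat = 2) jobcat2 = 1.
EXECUTE.
```

MISSING() is true for both user-missing and system-missing values, so either kind of missing value in jobcat stays missing in the dummies.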