Office for Faculty Excellence - East Carolina University (core.ecu.edu/ofe/StatisticsResearch/SPSS 1 8...)

Post on 01-Apr-2018

By Hui Bian

Office for Faculty Excellence

• My office is located at 1001 Joyner Library, room 1006

• Email: bianh@ecu.edu

• Tel: 252-328-5428

• You can download sample data files from: http://core.ecu.edu/ofe/StatisticsResearch/

• Goals of this workshop

–Data management: prepare your data for data analysis

–Learn basic SPSS functions

• Data used in the workshop

• We use data from the 2013 National High School Youth Risk Behavior Surveillance System (YRBSS, CDC) as an example.

• SPSS version 22

• YRBSS monitors six types of health-risk behaviors that contribute to the leading causes of death and disability among youth and adults.

• Unintentional injuries and violence

• Sexual behaviors

• Alcohol and other drug use

• Tobacco use

• Unhealthy dietary behaviors

• Inadequate physical activity

• Whenever you run SPSS, the Output window opens as well.

– Syntax

• Data view

– The place to enter data

– Columns: variables

– Rows: records

• Variable view

– The place to enter variables

– List of all variables

– Characteristics of all variables

• Data view (enter data here)

– Each variable is a single column (Q1, Q2, etc.)

– Each case is a single row

• Data structure

–Multivariate (wide) data structure

•One row per subject

–Univariate (long) data structure

•One row per measurement

• Multivariate structure

• Univariate structure

• Variable view (enter variables here)

– Name: Q1, Q2, Q3, etc. are variable names.

– Type: numeric, string, or another type.

– Values: how you code your numeric variables.

– Measure: level of measurement (scale, ordinal, or nominal).

• Variable view: Measure

–Nominal: categorical variables. Numbers used only as identifiers or names, such as codes for female vs. male, represent a nominal scale of measurement.

• Measure

–Ordinal: An ordinal scale of measurement represents an ordered series of relationships or a rank order. Rating scales (such as "On a scale of 1 to 10, with 1 being no pain and 10 being high pain, how much pain are you in today?") and Likert-type items represent ordinal data.

• Interval: A scale that represents quantity and has equal units, but for which zero is simply one more point of measurement, is an interval scale. The Fahrenheit scale is a clear example of the interval scale of measurement. Thus, 60 degrees Fahrenheit and -10 degrees Fahrenheit are interval data.

• Ratio: The ratio scale of measurement is similar to the interval scale in that it also represents quantity and has equal units. However, this scale also has an absolute zero (no values exist below zero). Examples are height and weight.

• Variable view: measurement

Measure: SPSS uses a different icon for each level of measurement (Scale, Ordinal, Nominal). Interval and ratio variables are both set to Scale.

• Using SPSS to enter data

–Excel is often more convenient than SPSS for data entry and for preparing data for analysis.

–Enter the data in multivariate (wide) format whichever software (SPSS or Excel) you use.

• Before doing data entry

– You need a code book/scoring guide.

– If you use a paper survey, give each case an ID number (NOT a real identification number of your subjects).

– If you use an online survey, you need something to identify your cases.

• Code book

A code book describes how you code your variables, including:

1. Variable names

2. Values for each response option

3. Recoding

• Example of a code book

• Create a new data file

– File > New > Data

• Exercise: enter two variables and five cases

–Q01 (question 1: age from YRBSS)

–Q02 (question 2: sex from YRBSS)

–A total of five cases

• First, enter variables under Variable View

–Type the variable name in the Name column (e.g. Q01).

–A variable name can be up to 64 bytes long, and the first character must be a letter or one of the characters @, #, or $.

• Enter variables in variable view

–Type: Numeric, string, etc.

–Label: description of variables.

–Values: variable coding.

• Exercise: enter variables

–Both variables are numeric

• Enter values

–Click the right side of the Values cell for the Q01 variable

–A small button appears; click it

–The Value Labels window opens

• Enter variables: enter values

1. Value means how you code each response option, such as 1 = 12 years old or younger, 2 = 13 years old … 7 = 18 years old or older.

2. Label means how you label that response option.

3. Click the Add button to add the value and its label together.

4. After entering all response options, click OK.
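
The same labels can also be set with syntax. This is a minimal sketch, assuming the exercise variables Q01 and Q02. Only the age codes spelled out above are included, and the Q02 coding (1 = Female, 2 = Male) and the short variable labels are assumptions for illustration.

```spss
* Describe each variable.
VARIABLE LABELS Q01 'Age' /Q02 'Sex'.
* Codes 3 to 6 for Q01 are left out here and should be added from your code book.
VALUE LABELS Q01
  1 '12 years old or younger'
  2 '13 years old'
  7 '18 years old or older'.
* Assumed coding for Q02.
VALUE LABELS Q02
  1 'Female'
  2 'Male'.
* Set the Measure column.
VARIABLE LEVEL Q01 (ORDINAL) /Q02 (NOMINAL).
```

After running it, the Values and Measure columns in Variable View should show the new settings.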

• Enter five cases: type values for each variable in data view

• Create ID variable

–Go to Transform > Compute Variable > type a name under Target Variable (e.g. id) > under Function group choose All > move $Casenum into Numeric Expression

–Or use 1000 + $Casenum to get four-digit IDs
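
The menu steps above correspond to a short piece of syntax. This is a sketch only; the variable name id is an assumption.

```spss
* Number the cases 1, 2, 3 and so on in their current order.
COMPUTE id = $CASENUM.
* Alternative for four-digit IDs starting at 1001.
* COMPUTE id = 1000 + $CASENUM.
FORMATS id (F4.0).
EXECUTE.
```

Because $CASENUM follows the current row order, create the ID before sorting or deleting cases.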

• Note for creating ID variable

–You can also create the ID variable before you enter any data in a new data file.

–But you have to use syntax to do so.

• Syntax for creating a blank data file with an ID variable:

* Create a blank data file with ID values from 1000 to 2000.
INPUT PROGRAM.
LOOP id = 1000 TO 2000.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

Always treat your data with honesty, integrity, and ethics!

• Select File > Open > Data

• Choose Excel as file type

• Select the file you want to import

• Then click Open

• Tell SPSS which sheet you want to open
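
For reference, the Excel import can also be written as syntax. This is a sketch under assumptions: the file path and the sheet name are hypothetical and must be replaced with your own.

```spss
* Hypothetical path and sheet name.
* READNAMES=ON takes variable names from the first row of the sheet.
GET DATA
  /TYPE=XLSX
  /FILE='C:\workshop\yrbs_sample.xlsx'
  /SHEET=NAME 'Sheet1'
  /CELLRANGE=FULL
  /READNAMES=ON.
EXECUTE.
```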

• CSV is a Comma-Separated Values file.

• If you use an online survey tool (e.g. Qualtrics) to collect data, you can export a CSV data file.

• Select File > Open > Data

• Choose Text as file type

• Select the file you want to import

• Then click Open
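
The Text Import Wizard produces syntax along the following lines. This is a sketch, not the wizard's exact output: the path, the variable names, and the formats are assumptions and depend on your file.

```spss
* Hypothetical path, variable names, and formats.
* FIRSTCASE=2 skips a header row that holds the column names.
GET DATA
  /TYPE=TXT
  /FILE='C:\workshop\survey.csv'
  /DELCASE=LINE
  /DELIMITERS=","
  /QUALIFIER='"'
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /VARIABLES=
    id F4.0
    Q1 F1.0
    Q2 A6.
EXECUTE.
```

A text column such as Q2 needs a string format (A6 here allows up to six characters).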

• Tedious but very important!!!

–Key in values and labels for each variable

–Run frequency for each variable

–Check outputs to see if you have variables with wrong values.

• Data cleaning

–Check missing values against the physical surveys if you use paper surveys, and make sure they are truly missing.

–Sometimes, you need to recode string variables into numeric variables.

• Exercise: run frequency of Q2 from CSV data

–Go to Analyze > Descriptive Statistics > Frequencies

• Output of Frequency analysis of Q2

Anything wrong? Q2 here is a string variable, and string values are case sensitive. It should have two categories, Female and Male; the lowercase "male" is a data-entry error.
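
One possible way to run this check and correct the entry with syntax is sketched below. It assumes the only wrong value is the lowercase 'male' seen in the output.

```spss
* Frequency table for Q2.
FREQUENCIES VARIABLES=Q2.
* Correct the wrong entry in place, then rerun the table to confirm.
RECODE Q2 ('male'='Male').
EXECUTE.
FREQUENCIES VARIABLES=Q2.
```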

• Important!

–Recoding

–Computing variables

• Example: Recode Q24 (Have you been bullied at school during the past 12 months?) into a new variable Q24r with 1 = Yes and 0 = No.

• Recode variables

1. Transform > Recode into Different Variables

2. Select variable that you want to transform (e.g. Q24)

3. Click Arrow button to put your variable into the right window

4. Under Output Variable: type the name for the new variable, then click Change

5. Click Old and New Values

6. Type 1 under Old Value and 1 under New Value, then click Add.

7. Type 2 under Old Value and 0 under New Value, then click Add.

8. Click Continue after finishing all the changes.

9. Click OK
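
The nine steps above can be sketched as syntax. It assumes, as in the steps, that Q24 is coded 1 = Yes and 2 = No.

```spss
* Keep 1 as 1 (Yes) and turn 2 into 0 (No) in a new variable.
RECODE Q24 (1=1) (2=0) INTO Q24r.
VALUE LABELS Q24r 1 'Yes' 0 'No'.
EXECUTE.
* Compare the old and new variables to check the recode.
CROSSTABS /TABLES=Q24 BY Q24r.
```

Any value of Q24 that is not listed becomes system-missing in Q24r.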

• Exercise: recode variables Q33, Q43, and Q49

–Recode 1 (0 days/times) into 0 = non-use

–Recode 2 or higher (all other categories) into 1 = use
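
A possible solution in syntax is sketched below. The new names Q33r, Q43r, and Q49r follow the naming used later in the workshop, and the labels are assumptions.

```spss
* 1 (0 days/times) becomes 0 and every higher category becomes 1.
RECODE Q33 Q43 Q49 (1=0) (2 THRU HI=1) INTO Q33r Q43r Q49r.
VALUE LABELS Q33r Q43r Q49r 0 'Non-use' 1 'Use'.
EXECUTE.
```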

• Compute variables

– Example: Create a new variable, drug_use (any use of cigarettes, alcohol, or marijuana during the past 30 days is defined as use, otherwise as non-use).

– There are two categories for the new variable (use vs. non-use). Coding for the new variable: 1 = Use and 0 = Non-use

• Use three variables Q33, Q43, and Q49

• Go to Transform > Compute Variable

Type “drug_use” under Target Variable and “0” under Numeric Expression (0 means Non-use). Click the If button.

Using the Arrow button or the keyboard, enter the condition Q33 = 1 & Q43 = 1 & Q49 = 1, then click Continue and OK. Do the same for the Use category, with 1 as the Numeric Expression and a different condition: Q33 > 1 | Q43 > 1 | Q49 > 1 (Q33 > 1 OR Q43 > 1 OR Q49 > 1).

• Go back to Transform > Compute Variable > keep the Target Variable name > type 1 under Numeric Expression (for drug_use = 1)

• Click the If button and type the condition for drug_use = 1: Q33 > 1 | Q43 > 1 | Q49 > 1

• Click Continue and OK.

• After you click OK, a small window asks whether you want to change the existing variable, because drug_use was already created when you defined the non-use category.

• Click OK
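
Both passes through the Compute Variable dialog can be written as two IF commands. This is a sketch that uses the conditions given above.

```spss
* Non-use: no use of any of the three substances.
IF (Q33 = 1 & Q43 = 1 & Q49 = 1) drug_use = 0.
* Use: any use of at least one substance.
IF (Q33 > 1 | Q43 > 1 | Q49 > 1) drug_use = 1.
VALUE LABELS drug_use 0 'Non-use' 1 'Use'.
EXECUTE.
```

Cases that meet neither condition, for example because of missing answers, stay system-missing on drug_use.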

• Exercise: compute a new variable named Drug_N to assess total number of drugs that adolescents used during the last 30 days.

–Only three drugs are assessed: Q33r, Q43r, and Q49r

–The total number of drugs should be between 0 and 3.

• Go to Transform > Compute Variable

– Target Variable: type Drug_N

–Numeric Expression: SUM(Q33r,Q43r,Q49r)

• Function group: Statistical

• Functions and Special Variables: Sum

– If button: check Include all cases
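
In syntax the exercise is a single COMPUTE. The sketch assumes Q33r, Q43r, and Q49r already exist and are coded 0/1.

```spss
* Count how many of the three substances were used (0 to 3).
COMPUTE Drug_N = SUM(Q33r, Q43r, Q49r).
EXECUTE.
FREQUENCIES VARIABLES=Drug_N.
```

SUM adds the valid values and skips missing ones, so a case with one missing answer still gets a total. Writing SUM.3(Q33r, Q43r, Q49r) instead returns a value only when all three are valid.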

• Recode variables: convert a string variable into a numeric variable

– Example: Q2 (gender, from the CSV data file) is a string variable. Convert this variable into a numeric variable Q2r with two categories: Female = 1 and Male = 2.

–Go to Transform > Recode into Different Variables

• Click Old and New Values button
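
The string-to-numeric recode can be sketched in syntax as follows, using the coding from the example (Female = 1, Male = 2).

```spss
* String values go in quotes and must match the data exactly.
RECODE Q2 ('Female'=1) ('Male'=2) INTO Q2r.
VALUE LABELS Q2r 1 'Female' 2 'Male'.
EXECUTE.
```

Any other spelling, such as a lowercase 'male', matches neither rule and ends up system-missing in Q2r, so clean the string values first.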

• Sort cases by variables: Data > Sort Cases

• You can use Sort Cases to find missing values.
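
A short sketch of sorting in syntax; the choice of Q33 as the sort variable is only an example.

```spss
* (A) sorts in ascending order and (D) in descending order.
SORT CASES BY Q33 (A).
```

In an ascending sort of a numeric variable, system-missing values are expected to appear at the top of Data View, which makes them easy to spot.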

• Select cases

–Example: select females for analysis.

–Go to Data > Select Cases

–Under Select: Check If condition is satisfied

–Click If button

– In the blank window type Q2 = 1

–Click Continue, click OK

You should see a new variable, filter_$, in Variable View. Deleting this variable removes the selection.
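
The Select Cases dialog creates and applies filter_$ for you. Roughly the same thing in syntax, as a sketch that assumes Q2 is numeric with 1 = Female:

```spss
* Build a 0/1 filter variable and apply it.
USE ALL.
COMPUTE filter_$ = (Q2 = 1).
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FILTER BY filter_$.
EXECUTE.
```

Filtered-out cases stay in the file; they are only left out of the analyses until the filter is switched off.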

A slash through the row number marks an unselected case. Unselected cases are excluded from the data analysis.

• Select cases

–Exercise: select cases who used any of cigarettes, alcohol, or marijuana during the last 30 days.

–Go to Data > Select Cases

–Check “If condition is satisfied”

–Click If button

–Type Q33 > 1 | Q43 > 1 | Q49 > 1, click Continue, click OK.

• If we now run Frequencies on drug_use, we should get frequencies for drug users only.
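
The exercise and the check can be sketched together in syntax, assuming drug_use was computed earlier.

```spss
* Keep only cases with any use in the analyses.
USE ALL.
COMPUTE filter_$ = (Q33 > 1 | Q43 > 1 | Q49 > 1).
FILTER BY filter_$.
EXECUTE.
* The table should show users only.
FREQUENCIES VARIABLES=drug_use.
* Switch the filter off again when you are done.
FILTER OFF.
USE ALL.
```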

• Merge files: for example, we have both baseline and posttest data files and want to merge them into one file.

• Before merging files, we need to sort the cases in each file by the matching variable. In this example, code is the matching variable.

–Use baseline data file as active dataset

–Open both baseline and posttest data files (or just open baseline data file)

–Go to Data > Merge Files, which offers two choices: Add Cases and Add Variables

– For this example, we choose Add variables (we want to add posttest variables into the file)
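
An Add Variables merge can be sketched in syntax as below. The file names baseline.sav and posttest.sav and the dataset names are assumptions; code is the matching variable from the example.

```spss
* Open and sort the baseline file.
GET FILE='baseline.sav'.
DATASET NAME baseline.
SORT CASES BY code (A).
* Open and sort the posttest file.
GET FILE='posttest.sav'.
DATASET NAME posttest.
SORT CASES BY code (A).
* Add the posttest variables to the baseline cases.
DATASET ACTIVATE baseline.
MATCH FILES /FILE=* /FILE='posttest' /BY code.
EXECUTE.
```

Here the asterisk stands for the active dataset (baseline). Save the merged file under a new name so the original files stay unchanged.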

• Convert Multivariate to Univariate Format

–Multivariate structure: all values for each subject appear in one row, under column names that are the same for all subjects.

• Use data: restructure data.sav

– Each subject has depression scores at seven time points (pre, dep1-dep6)

• Go to Data > Restructure

• We only have one variable (depression variable) that needs to be transposed.
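
The Restructure wizard generates a VARSTOCASES command. A sketch of what it could look like for this file, assuming the seven variables are named pre and dep1 to dep6; the new names subject, depression, and time are hypothetical.

```spss
* One row per subject becomes seven rows per subject.
VARSTOCASES
  /ID=subject
  /MAKE depression FROM pre dep1 dep2 dep3 dep4 dep5 dep6
  /INDEX=time(7)
  /NULL=KEEP.
```

ID creates a new variable that numbers the original rows, so the name must not already exist in the file. The time variable runs from 1 (pre) to 7 (dep6), and NULL=KEEP keeps rows whose depression value is missing.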
