By Hui Bian
Office for Faculty Excellence
• My office is located in 1001 Joyner library, room 1006
• Email: [email protected]
• Tel: 252-328-5428
• You can download sample data files from: http://core.ecu.edu/ofe/StatisticsResearch/
• Exercise: recode variables Q33, Q43, and Q49
–Recode 1 = 0 days/times into 0 = non-use
–Recode >= 2 (other categories) into 1 = use
• Go to Transform > Recode into Different Variables
• Click the Old and New Values button to define the coding for the new recoded variables
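The recoding exercise above can also be run from a syntax window. The following is a minimal sketch, assuming the new variables are named Q33r, Q43r, and Q49r and that missing values should stay missing. The value labels are illustrative.

```spss
* Keep missing values missing, recode 1 (0 days/times) into 0,
* and recode every other category (2 or higher) into 1.
RECODE Q33 Q43 Q49 (MISSING=SYSMIS) (1=0) (2 THRU HIGHEST=1)
  INTO Q33r Q43r Q49r.
VALUE LABELS Q33r Q43r Q49r 0 'Non-use' 1 'Use'.
EXECUTE.
```

RECODE applies the first specification that matches a value, so the order of the parenthesized rules matters.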
• Exercise: compute a new variable named Drug_N to assess total number of drugs that adolescents used during the last 30 days.
–Only three drugs are assessed: Q33r, Q43r, and Q49r
–The total number of drugs should be between 0 and 3.
• Go to Transform > Compute Variable
–Target Variable: type Drug_N
–Numeric Expression: SUM(Q33r,Q43r,Q49r)
• Function group: Statistical
• Functions and Special Variables: Sum
–If button: check Include all cases
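The same computation can be sketched in syntax, assuming the recoded variables Q33r, Q43r, and Q49r already exist in the active dataset:

```spss
* SUM adds the valid values and ignores missing ones.
COMPUTE Drug_N = SUM(Q33r, Q43r, Q49r).
EXECUTE.
```

Because SUM skips missing values, a case with only one answered item still gets a total. Writing SUM.3 instead of SUM would return a total only when all three values are valid, which may or may not be what the exercise intends.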
• Recode variables: convert a string variable into a numeric variable
– Example: Q2 (Gender From CSV data file) is a string variable. Convert this variable into a numeric variable Q2r with two categories: Female = 1 and Male = 2.
–Go to Transform > Recode into Different Variables
• Click Old and New Values button
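A syntax sketch of this conversion follows. It assumes the string values in Q2 are spelled exactly 'Female' and 'Male'; string comparisons are case sensitive, so the actual spellings in the CSV file should be checked first.

```spss
* Convert the string variable Q2 into the numeric variable Q2r.
* The quoted values are assumptions about how gender is stored.
RECODE Q2 ('Female'=1) ('Male'=2) INTO Q2r.
VALUE LABELS Q2r 1 'Female' 2 'Male'.
EXECUTE.
```

Any string value that matches neither specification is left as system-missing in Q2r.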
• Sort cases by variables: Data > Sort Cases
• You can use Sort Cases to find missing values: in an ascending sort, cases with system-missing values appear at the top of the file.
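In syntax, sorting is a one-line command. The sketch below uses Q49 purely as an illustration; any variable in the file could be used.

```spss
* Sort in ascending order (A); use (D) for descending order.
SORT CASES BY Q49 (A).
```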
• Select cases
–Example: select females for analysis.
–Go to Data > Select Cases
–Under Select, check If condition is satisfied
–Click the If button
–In the blank window, type Q2 = 1
–Click Continue, then click OK
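The dialog steps above correspond roughly to the following syntax, a sketch that assumes Q2 is numeric with 1 = Female:

```spss
* Filter (do not delete) the cases where Q2 is not 1.
USE ALL.
COMPUTE filter_$ = (Q2 = 1).
FILTER BY filter_$.
EXECUTE.
```

Running FILTER OFF. followed by USE ALL. turns the selection off and brings all cases back into the analysis.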
You should see a new variable, filter_$, in Variable View. Deleting this variable removes the selection.
Slashes through the case numbers in Data View mark unselected cases. These cases are excluded from the data analysis.
• Select cases
–Exercise: select cases who used any of cigarettes, alcohol, or marijuana during the last 30 days.
–Go to Data > Select Cases
–Check If condition is satisfied
–Click the If button
–Type Q33 > 1 | Q43 > 1 | Q49 > 1, click Continue, then click OK.
• If we run Frequencies on Drug_use, the output should include only the drug users.
• For example, we have both baseline and posttest data files and want to merge them into one file.
• Before merging files, we need to sort the cases in each file by the matching variable. In this example, code is the matching variable.
• Use baseline data file as active dataset.
• Open both baseline and posttest data files (or just open baseline data file).
• Go to Data > Merge Files: two choices: Add cases and Add variables.
• For this example, we choose Add variables (we want to add posttest variables into the file).
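The merge can be sketched in syntax as follows. The file names baseline.sav and posttest.sav and the dataset names are hypothetical placeholders; only the matching variable code comes from the example.

```spss
* Open and sort the baseline file (file name is hypothetical).
GET FILE='baseline.sav'.
DATASET NAME baseline.
SORT CASES BY code (A).

* Open and sort the posttest file (file name is hypothetical).
GET FILE='posttest.sav'.
DATASET NAME posttest.
SORT CASES BY code (A).

* Add the posttest variables to the baseline cases, matching on code.
DATASET ACTIVATE baseline.
MATCH FILES /FILE=* /FILE='posttest' /BY code.
EXECUTE.
```

In MATCH FILES, the asterisk refers to the active dataset, which is why the baseline file is activated before the merge.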
• Convert Multivariate to Univariate Format
–Multivariate (wide) structure: all values for a subject appear in one row, and the column names are the same for all subjects.
• Use data: restructure data_multivariate.sav
–Each subject has depression scores at seven time points (pre, dep1-dep6)
• Go to Data > Restructure
• We only have one variable (depression variable) that needs to be transposed.
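The Restructure wizard generates a VARSTOCASES command. A minimal sketch is shown below, assuming the seven columns are named pre and dep1 through dep6; the new variable names depression and time are hypothetical.

```spss
* Stack the seven depression columns into one variable.
* time is an index from 1 to 7 that identifies the time point.
VARSTOCASES
  /MAKE depression FROM pre dep1 dep2 dep3 dep4 dep5 dep6
  /INDEX=time(7)
  /NULL=KEEP.
```

Each subject then occupies seven rows instead of one. Variables not named on MAKE are kept and repeated on every row, and NULL=KEEP retains rows where the depression score is missing.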
• Data screening
–Understand your variables
–Do the variables meet the statistical assumptions of the parametric tests you plan to use?
–Check outliers
• Distribution diagnosis: Graphs
–Histograms
–Stem-and-Leaf Plots
–Box Plots
–Normal Q-Q Plots
• Three SPSS functions used for data screening
–Frequencies
–Descriptives
–Explore
• Go to Analyze > Descriptive Statistics, you should see:
–Frequencies
–Descriptives
–Explore
• Example: run Frequencies of Q49 (number of times marijuana was used during the last 30 days)
• Central tendency
–A measure that is most representative of all scores in a distribution.
–Mean, median, mode
• Dispersion(variability):
–A measure of the spread of scores in a distribution.
–Variance, standard deviation, range
• Percentile Values
–SPSS reports 25th, 50th, and 75th percentile values.
–You also can tell SPSS to get any percentile values
• Distribution
• Skewness and kurtosis are statistics that characterize the shape and symmetry of the distribution.
• The normal distribution is symmetric and has a skewness of zero. SPSS reports excess kurtosis, so the kurtosis of a normal distribution is also zero.
• Skewness: a measure of the asymmetry of a distribution
–Positive skewness: a long right tail.
–Negative skewness: a long left tail.
• Kurtosis: a measure of the extent to which observations cluster around a central point.
–Leptokurtic distributions are more peaked than the normal distribution (positive kurtosis).
–Platykurtic distributions are flatter and more dispersed along the X axis than the normal distribution (negative kurtosis).
• Click Charts to get a histogram
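The Frequencies options described above can be requested in one command. This is a sketch for Q49; the 90th percentile is included only to illustrate asking for an additional percentile.

```spss
* Central tendency, dispersion, quartiles, distribution shape,
* and a histogram with a normal curve for Q49.
FREQUENCIES VARIABLES=Q49
  /NTILES=4
  /PERCENTILES=90
  /STATISTICS=MEAN MEDIAN MODE STDDEV VARIANCE RANGE MINIMUM MAXIMUM
    SEMEAN SKEWNESS SESKEW KURTOSIS SEKURT
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.
```

NTILES=4 produces the 25th, 50th, and 75th percentiles.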
• SPSS output of Frequency analysis
Note: Standard error is an estimate of the standard deviation of a statistic. Range is the difference between the highest value and the lowest value.
• Descriptives function
– Example: Run Descriptives of Q49
• SPSS output of Descriptives analysis
• Second example: Out of three drugs (Q33, Q43, and Q49), we want to know which drug was used most frequently among high school students.
• We use Descriptives function to sort variables by Mean or by Sum.
• Go to Analyze > Descriptive Statistics > Descriptives > Click Options
Under Display Order, check Descending means
• SPSS output
• Syntax for sorting by Sum
DESCRIPTIVES VARIABLES=Q33 Q43 Q49
  /STATISTICS=SUM MEAN STDDEV
  /SORT=SUM (D).
• Explore function
–Example: run Explore for Q6 and Q49
• Explore function: click Statistics and Plots to request descriptives and plots
• SPSS output: descriptives, similar to the results from Frequencies and Descriptives
• SPSS output: histograms
• SPSS output: stem-and-leaf plot of Q6
• SPSS output: rotated stem-and-leaf plot
• SPSS output: normal Q-Q plots
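The Explore output listed above can be produced with the EXAMINE command. The following sketch requests descriptives and all four plot types for Q6 and Q49:

```spss
* Explore: descriptives, box plots, stem-and-leaf plots,
* histograms, and normal Q-Q plots with normality tests.
EXAMINE VARIABLES=Q6 Q49
  /PLOT BOXPLOT STEMLEAF HISTOGRAM NPPLOT
  /COMPARE GROUPS
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
```

Adding BY Q2 after the variable list would produce the same output separately for each sex group.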
• Box plots
Meyers, Gamst, & Guarino (2006)
• SPSS output: box plots
• Explore Q6 by sex (Q2)
• SPSS output
• Histograms
• Stem-and-leaf plots
• Normal Q-Q plots
• Box plots
• Test means: t tests and Analysis of variance
– T tests
• One-sample t test
• Independent-samples t test
• Paired-samples t test
– Analysis of variance (ANOVA)
• One-way/two-way between-subjects design
• One-way/two-way within-subjects design
• Mixed design
• Go to Analyze > Compare Means
• Student’s t test
–If the null hypothesis is true, the test statistic follows Student’s t distribution. The method assumes that the data are approximately normally distributed.
–The paired t test is used when you have a paired design.
–The independent t test is used when you have an independent design.
• Independent-samples t test
–Example: we want to know if there is a difference between sex groups (Q2) in height (Q6).
–Go to Analyze > Compare Means > Independent-Samples T Test
• Test variable: Q6 (dependent variable)
• Grouping variable: Q2 (two groups: female and male)
• Coding of Q2: 1= Female and 2= Male
Click Define Groups, type 1 for Group 1 and 2 for Group 2 based upon the coding of Q2
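The dialog settings above correspond to the following syntax, a sketch that assumes Q2 is coded 1 = Female and 2 = Male as stated:

```spss
* Compare mean height (Q6) between the two groups of Q2.
T-TEST GROUPS=Q2(1 2)
  /MISSING=ANALYSIS
  /VARIABLES=Q6
  /CRITERIA=CI(.95).
```

The output includes Levene's test along with two rows of t test results, one assuming equal variances and one not.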
• SPSS output
• Mean height of females = 1.62, SD = .07
• Mean height of males = 1.76, SD = .09
• t = -94.28, df = 12470.68, p < .001 (SPSS displays .000; the fractional df comes from the "Equal variances not assumed" row)
• Conclusion: there is a significant difference in height between the female and male groups.
• Analysis of Variance (ANOVA)
–Used to compare the means of two or more groups
–One-way ANOVA (between subjects): there is only one factor variable
–Example: we want to know if there is a difference in height (Q6) among four grade groups (Q3)
• Original coding of Q3
• We need to recode Q3 in order to get rid of the last category.
–Then the new variable has four categories
–Go to Transform > Recode into Different Variables
• Recoding Q3 into Q3r
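A syntax sketch of this recode follows. The actual coding of Q3 is not shown here, so the sketch assumes the four grade levels are coded 1 through 4 and the last category is any other value; these codes should be checked against the data file before use.

```spss
* Copy the four grade categories (assumed codes 1-4) into Q3r
* and set everything else, including the last category, to missing.
RECODE Q3 (1 THRU 4=COPY) (ELSE=SYSMIS) INTO Q3r.
EXECUTE.
```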
• Go to Analyze > General Linear Model > Univariate
• SPSS output
F(3, 12545) = 60.83, p < .001 (SPSS displays .000). There was a significant difference in height among the four grade levels.
• Post Hoc tests
–We have already obtained a significant omnibus F-test with a factor of four levels.
–We need to know which means are significantly different.
• Click Post Hoc button
• Results of Post Hoc tests
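The one-way ANOVA and post hoc steps above can be sketched with the UNIANOVA command that the Univariate dialog generates. The post hoc method is not specified here, so Tukey is an assumption chosen for illustration.

```spss
* One-way between-subjects ANOVA of height (Q6) by grade (Q3r).
* TUKEY is an assumed choice of post hoc test.
UNIANOVA Q6 BY Q3r
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=Q3r(TUKEY)
  /PRINT=DESCRIPTIVE HOMOGENEITY
  /CRITERIA=ALPHA(.05)
  /DESIGN=Q3r.
```

The post hoc table lists every pairwise comparison of grade levels, which shows where the differences behind the significant omnibus F test lie.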
Meyers, L. S., Gamst, G., & Guarino, A. J. (2006). Applied multivariate research: design and interpretation. Thousand Oaks, CA: Sage Publications, Inc.