Research Methodology: Tools
Applied Data Analysis (with SPSS)
Lecture 11: Nonparametric Methods
May 2014
Prof. Dr. Jürg Schwarz
Lic. phil. Heidi Bruderer Enzler
MSc Business Administration
Slide 2
Contents
Aims of the Lecture  3
Typical Syntax  4
Introduction  5
  Example  5
Parametric vs. Nonparametric Tests  8
  Basic Ideas  8
  Unrelated and Related Samples  10
  Decision Tree of Nonparametric Tests (a Selection)  12
Nonparametric Tests with SPSS  13
  Comparison of Two Unrelated Groups  14
  Comparison of Two Related Groups  19
  Comparison of Several Unrelated Groups  24
  Comparison of Several Related Groups  29
Appendix  34
Slide 3
Aims of the Lecture
You will understand the concept of ranking data.
You will understand different concepts of nonparametric tests.
You will understand the key steps in conducting the following nonparametric tests:
◦ Wilcoxon rank-sum test
◦ Kruskal-Wallis test
◦ Wilcoxon signed-rank test
◦ Friedman test
You can apply nonparametric methods with SPSS.
In particular, you will know how to:
◦ interpret the output
◦ describe the output
Slide 4
Typical Syntax
Wilcoxon rank-sum test (Mann-Whitney U test)
NPAR TESTS
  /M-W= values BY sample(1 2)
  /STATISTICS=DESCRIPTIVES
  /MISSING ANALYSIS.
◦ Dependent variable: values
◦ Group variable: sample
◦ /STATISTICS=DESCRIPTIVES: descriptive statistics
Wilcoxon signed-rank test
NPAR TESTS
  /WILCOXON=measure1 WITH measure2 (PAIRED)
  /STATISTICS DESCRIPTIVES
  /MISSING ANALYSIS.
◦ Dependent variables: measure1, measure2
◦ /STATISTICS DESCRIPTIVES: descriptive statistics
Slide 5
Introduction
Example
Medical research: correlation analysis of babies' weight at birth and the increase in weight from day 70 to day 100, n = 20
[Figure: scatter plot of increase in weight [gram] (y-axis, 0 to 4000) against weight at birth [gram] (x-axis, 2000 to 4000), with the leverage effect marked]
Problem: Leverage effect due to premature babies with a weight at birth of less than 3'000 grams
Average normal weight at birth is 3'400 grams
Correlation coefficient
r = -.76
(Range of r: -1 ≤ r ≤ 1)
The higher the weight at birth, the lower the increase in weight from day 70 to day 100.
Slide 6
Solution
Take into account ranks
Baby  Weight at birth [gram]  Rank  Increase in weight [gram]  Rank
1 2'740 5 2'550 17
2 3'180 8 1'790 9
3 3'150 7 1'870 10
4 3'030 6 2'040 13
5 3'370 12 1'470 6
6 2'610 4 2'130 14
7 3'570 16 2'150 15
8 2'270 1 3'350 19
9 2'300 2 3'400 20
10 2'380 3 3'230 18
11 3'260 9 820 2
12 3'350 11 1'190 3
13 3'630 19 1'360 4
14 3'640 20 1'420 5
15 3'490 14 1'960 11
16 3'290 10 1'670 7
17 3'540 15 770 1
18 3'570 17 1'700 8
19 3'460 13 2'010 12
20 3'570 18 2'490 16
"How-to" in SPSS
Scales
Variables: ordinal or higher measurement level
SPSS
Analyze → Correlate → Bivariate... → "Spearman" (Spearman rank correlation coefficient)
Lowest weight at birth = 2'270 → Rank 1
Following weight at birth = 2'300 → Rank 2
etc.
Lowest increase in weight = 770 → Rank 1
Following increase in weight = 820 → Rank 2
etc.
(SPSS will allocate the ranks for you!)
Slide 7
Results
[Figure: scatter plot of rank of increase in weight (y-axis, 0 to 20) against rank of weight at birth (x-axis, 0 to 20)]
Leverage effect due to premature babies has been reduced.
Spearman rank correlation coefficient
r = -.56
(Compare with r = -.76 from above)
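The two coefficients can be cross-checked outside SPSS. The following is a minimal Python sketch, not the SPSS procedure itself: it computes the Pearson coefficient on the raw values from the table and then on their ranks. The helper names `average_ranks` and `pearson` are illustrative. One assumption differs from the table: tied values receive their mean rank, so the three babies with 3'570 grams share rank 17 instead of ranks 16 to 18. The rounded coefficient is the same either way.

```python
from math import sqrt

def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Data from the table on Slide 6 (babies 1 to 20), in grams
birth_weight = [2740, 3180, 3150, 3030, 3370, 2610, 3570, 2270, 2300, 2380,
                3260, 3350, 3630, 3640, 3490, 3290, 3540, 3570, 3460, 3570]
weight_gain = [2550, 1790, 1870, 2040, 1470, 2130, 2150, 3350, 3400, 3230,
               820, 1190, 1360, 1420, 1960, 1670, 770, 1700, 2010, 2490]

r_raw = pearson(birth_weight, weight_gain)            # Pearson on raw values
rho = pearson(average_ranks(birth_weight),            # Spearman = Pearson on ranks
              average_ranks(weight_gain))
print(round(r_raw, 2), round(rho, 2))
```

Computing Spearman as "Pearson on ranks" keeps the sketch short and handles ties without a separate formula.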
Slide 8
Parametric vs. Nonparametric Tests
Basic Ideas
All of the tests previously used in this course are based upon specific assumptions.
Especially assumptions about the distribution of variables in the population:
◦ normality (variable or error term is normally distributed)
◦ homogeneity of variance (variance of different groups/units is the same)
These tests are referred to as "parametric tests" because the shapes of the population
distributions are described with known distributions and their parameters.
Example: Variable X is normally distributed → X ~ N(µ, σ²) with parameters µ and σ².
Tests that do not make such assumptions are referred to as nonparametric tests.
Nonparametric tests are sometimes known as distribution-free tests
because they do not assume a specific form of the population distribution.
Nonparametric tests are used when the parametric assumptions are violated.
Slide 9
Methods for nonparametric tests
There are 2 main methods for nonparametric tests:
◦ Resampling
  - Permutation (Example: Fisher exact test)
  - Simulation (Example: Bootstrapping)
◦ Ranking
All nonparametric tests presented in this lecture work on the principle of ranking the data:
→ High scores are represented by high ranks, low scores by low ranks.
The analysis is then carried out on the ranks rather than on the original data.
Advantages
Nonparametric tests can be used
◦ when nothing is known about the distribution in the population or when parametric assumptions are invalid
◦ with outliers
◦ with small samples
Disadvantage
By ranking the data, information about the magnitude of differences between scores is lost:
→ When the parametric assumptions hold, nonparametric tests have less power than the corresponding parametric test at the same sample size.
They are therefore more likely to miss an existing effect (β error).
Slide 10
Unrelated and Related Samples
Unrelated (independent) samples
The measurement values of a person in sample 1 and
a person in sample 2 are not influenced by one another.
Related (paired, dependent) samples
Each measurement value in sample 1 is influenced by a particular
measurement value in sample 2 (and vice versa).
When multiple measurements are applied to the same subject
◦ to examine a development over time (repeated measures)
◦ to compare different treatments
When different persons are tested
◦ who belong together ("natural pairings", e.g. couples)
◦ who are matched to reduce the effects of a confounding variable (e.g. matching persons with a comparable level of empathy)
[Figures: example designs with the labels Husband / Wife (Item 1); anger management training A, B, C (matched by level of intelligence); treatment A, B, C in Clinic 1 and Clinic 2; stamina on Day 1, Day 14, Day 70; creativity under placebo and treatments A, B, C]
Slide 11
SPSS examples for unrelated and for related samples
Unrelated: Bone density (Slide 14)
◦ Sample A: women up to 50
◦ Sample B: women over 50
Related: Sleeping pills (Slide 29)
◦ Sample: reaction time of proband 3 after taking four different sleeping pills
Slide 12
Decision Tree of Nonparametric Tests (a Selection)
Distribution of dependent variable:    normal            any
one group                              t-Test            Sign test
two groups    unrelated samples        t-Test            WRS test
two groups    related samples          t-Test            WSR test
many groups   unrelated samples        ANOVA             Kruskal-Wallis
many groups   related samples          Repeated ANOVA    Friedman
WRS: Wilcoxon rank-sum test, WSR: Wilcoxon signed-rank test
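The decision tree can also be read as a lookup table. The sketch below is one possible Python encoding of it; the function name `choose_test` and its arguments are purely illustrative and are not part of SPSS.

```python
# Keys: (number of groups, samples related?, dependent variable normal?)
# For a single group the related/unrelated distinction does not apply (None).
DECISION_TREE = {
    ("one", None, True): "t-Test",
    ("one", None, False): "Sign test",
    ("two", False, True): "t-Test",
    ("two", False, False): "WRS test",
    ("two", True, True): "t-Test",
    ("two", True, False): "WSR test",
    ("many", False, True): "ANOVA",
    ("many", False, False): "Kruskal-Wallis",
    ("many", True, True): "Repeated ANOVA",
    ("many", True, False): "Friedman",
}

def choose_test(groups, related, normal):
    """Return the test suggested by the decision tree.

    groups: "one", "two" or "many"; related: True, False, or None for one group;
    normal: True if the dependent variable can be treated as normally distributed.
    """
    if groups == "one":
        related = None
    return DECISION_TREE[(groups, related, normal)]

print(choose_test("two", related=False, normal=False))
print(choose_test("many", related=True, normal=False))
```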
Slide 13
Nonparametric Tests with SPSS
SPSS: Analyze → Nonparametric Tests → Legacy Dialogs → ...
"2 Independent Samples..." → WRS: Wilcoxon rank-sum test (Mann-Whitney U test)
"K Independent Samples..." → Kruskal-Wallis test
"2 Related Samples..." → WSR: Wilcoxon signed-rank test
"K Related Samples..." → Friedman test
Slide 14
Comparison of Two Unrelated Groups
Wilcoxon rank-sum test (Mann-Whitney U test)
Given
Two independent (unrelated) samples with sample sizes n1, n2 (in general n1 ≠ n2)
Small sample size
Not normally distributed
Question
Do the central tendencies µA and µB of a characteristic differ between two unpaired samples?
H0: Central tendencies are equal µA = µB
HA: Central tendencies are not equal µA ≠ µB
Example
Medical research about osteoporosis: bone density in g/cm³
◦ Sample A: women up to and including age 50, n = 13
◦ Sample B: women above age 50, n = 11
sample A 163 152 202 105 134 134 139 110 122 146 149 94 158
sample B 125 121 133 95 148 96 117 112 100 84 98
Slide 15
Procedure
1. Sort the values of the two samples by size
values
sample A 94 105 110 122 134 134 139 146 149 152 158 163 202
sample B 84 95 96 98 100 112 117 121 125 133 148
2. Allocate ranks to the values
ranks
ranks A 2 7 8 12 15.5 15.5 17 18 20 21 22 23 24
ranks B 1 3 4 5 6 9 10 11 13 14 19
3. Calculate the sum of ranks
rank sum
rank sum A 205
rank sum B 95
4. Calculate the test statistic based on the sample with the smaller rank sum
$$U = R_s - \frac{n_s (n_s + 1)}{2}$$
where R_s is the smaller rank sum and n_s is the size of the sample with the smaller rank sum
$$U = R_B - \frac{n_B (n_B + 1)}{2} = 95 - \frac{11 \cdot (11 + 1)}{2} = 29$$
Slide 16
5. Determine the critical value in the table below
Distribution of Wilcoxon rank-sum test (Mann-Whitney U test) – two sided, α = 5%
n1 \ n2    9   10   11   12   13   14   15   16   17   18   19   20
 8        15   17   19   22   24   26   29   31   34   36   38   41
 9        17   20   23   26   28   31   34   37   39   42   45   48
10        20   23   26   29   33   36   39   42   45   48   52   55
11        23   26   30   33   37   40   44   47   51   55   58   62
12        26   29   33   37   41   45   49   53   57   61   65   69
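→ For sample sizes 11 and 13 the critical value is 37.
6. Compare the test statistic with the critical value
H0 is rejected if U is less than or equal to the critical value. Here U = 29 ≤ 37, so the test statistic lies in the rejection region of H0:
The bone densities of samples A and B are significantly different.
[Figure: number line from -37 over 0 to 37; the region between the critical values, which contains U = 29, is marked "rejection of H0", the regions outside are marked "non-rejection of H0"]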
Slide 17
Wilcoxon rank-sum test (Mann-Whitney U test) with SPSS
SPSS: Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples...
Values of the grouping variable: Consult value labels or a frequency table
NPAR TESTS
/M-W= values BY sample(1 2)
/STATISTICS=DESCRIPTIVES
/MISSING ANALYSIS.
Slide 18
Here sample size n ≤ 30, therefore:
The bone densities of samples A and B are significantly different (Exact Wilcoxon rank-sum test: U = 29, p = .013).
If sample size n > 30:
The bone densities of samples A and B are significantly different (Asymptotic Wilcoxon rank-sum test: Z = -2.463, p = .014).
"Asymp. Sig. (2-tailed)" is based on an approximation to a normal distribution.
For samples with n > 30 use "Asymp. Sig."
Notes on the SPSS output:
◦ Test statistic U
◦ Rank sum of smaller sample
◦ Z-value calculated for the asymptotic test
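The hand calculation of U and the reported Z value can be reproduced with a short Python sketch. It is an independent cross-check, not the SPSS implementation. The helper `average_ranks` is illustrative, and the Z value uses a textbook normal approximation with a correction for ties, which is an assumption about the computation; it does reproduce the Z value shown above.

```python
from collections import Counter
from math import sqrt

def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

# Bone density data from Slide 14
sample_a = [163, 152, 202, 105, 134, 134, 139, 110, 122, 146, 149, 94, 158]
sample_b = [125, 121, 133, 95, 148, 96, 117, 112, 100, 84, 98]
n_a, n_b = len(sample_a), len(sample_b)

# Steps 1-3: rank the pooled values and sum the ranks per sample
ranks = average_ranks(sample_a + sample_b)
rank_sum_a = sum(ranks[:n_a])
rank_sum_b = sum(ranks[n_a:])

# Step 4: U is based on the sample with the smaller rank sum
if rank_sum_a <= rank_sum_b:
    r_s, n_s = rank_sum_a, n_a
else:
    r_s, n_s = rank_sum_b, n_b
u = r_s - n_s * (n_s + 1) / 2

# Steps 5-6: compare with the tabulated critical value (37 for sizes 11 and 13)
reject_h0 = u <= 37

# Normal approximation with tie correction (assumed textbook formula)
n = n_a + n_b
tie_term = sum(c ** 3 - c for c in Counter(sample_a + sample_b).values())
var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
z = (u - n_a * n_b / 2) / sqrt(var_u)

print(rank_sum_a, rank_sum_b, u, reject_h0, round(z, 3))
```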
Slide 19
Comparison of Two Related Groups
Wilcoxon signed-rank test
Given
Two related samples (sometimes called paired samples)
Small sample sizes
Not normally distributed
Question
Is there a difference between the central tendencies µA and µB of a characteristic in two related
samples?
H0: central tendencies are equal µA = µB
HA: central tendencies are not equal µA ≠ µB
Example
Medical research about osteoporosis: bone density in g/cm³
Sample: 10 women
◦ Measure 1: bone density before exercise therapy
◦ Measure 2: bone density after exercise therapy
Slide 20
Procedure
1. Calculate differences in values of the two related data points
|difference| = |measure 2 – measure 1|
2. Write down the sign of the difference
Sign of "measure 2 – measure 1"
3. Assign ranks to the absolute differences
Differences with a value of 0 will not be considered.
4. Sum up positive ranks and negative ranks
woman  measure 1  measure 2  |difference|  sign  rank  positive rank  negative rank
1      202        133        69            -     9                    9
2      163        125        38            -     7                    7
3      94         128        34            +     6     6
4      152        121        31            -     5                    5
5      134        148        14            +     2     2
6      139        117        22            -     3.5                  3.5
7      110        112        2             +     1     1
8      122        100        22            -     3.5                  3.5
9      158        85         73            -     10                   10
10     146        84         62            -     8                    8
sum                                              55    9              46
(|difference|: step 1, sign: step 2, rank: step 3, positive and negative ranks: step 4)
Slide 21
5. Calculate the test statistic
Test statistic W = |positive rank sum – negative rank sum | = |9 – 46| = 37
6. Determine the critical value in the table below
Distribution of Wilcoxon signed-rank test (critical values):
Number of differences not equal to 0    one-sided T0.95    two-sided T0.975
5 15 na
6 17 21
7 22 24
8 26 30
9 29 35
10 35 39
11 40 46
12 44 52
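7. Compare the value of the test statistic with the critical value
W = 37 is smaller than the two-sided critical value of 39 for 10 nonzero differences, so the test statistic lies inside the non-rejection region of H0.
Test (narrowly) not significant → An effect of exercise therapy on bone density cannot be demonstrated.
[Figure: number line starting at 0; values below the critical value 39, including W = 37, are marked "non-rejection of H0", values above it are marked "rejection of H0"]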
Slide 22
Wilcoxon signed-rank test with SPSS
SPSS: Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples...
NPAR TESTS
/WILCOXON=measure1 WITH measure2 (PAIRED)
/STATISTICS DESCRIPTIVES
/MISSING ANALYSIS.
Slide 23
Test (narrowly) not significant.
→ An effect of the therapy on bone density cannot be demonstrated (Wilcoxon signed-rank test: Z = -1.887, p = .059).
As the rank sums approximately follow a normal distribution, the smaller rank sum is z-standardized.
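The signed-rank procedure and the Z value can be retraced with the following Python sketch. It is a cross-check under stated assumptions rather than the SPSS algorithm: the helper `average_ranks` is illustrative, and the standardization uses a textbook normal approximation with a tie correction, which reproduces the Z value shown above.

```python
from collections import Counter
from math import sqrt

def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

# Bone density before (measure 1) and after (measure 2) exercise therapy, Slide 20
measure1 = [202, 163, 94, 152, 134, 139, 110, 122, 158, 146]
measure2 = [133, 125, 128, 121, 148, 117, 112, 100, 85, 84]

# Steps 1-3: signed differences (zeros dropped), ranks of the absolute differences
diffs = [after - before for before, after in zip(measure1, measure2) if after != before]
abs_diffs = [abs(d) for d in diffs]
ranks = average_ranks(abs_diffs)

# Step 4: rank sums by sign
positive_sum = sum(r for r, d in zip(ranks, diffs) if d > 0)
negative_sum = sum(r for r, d in zip(ranks, diffs) if d < 0)

# Step 5: test statistic as defined on Slide 21
w = abs(positive_sum - negative_sum)

# z-standardize the smaller rank sum (assumed textbook formula with tie correction)
n = len(diffs)
t_small = min(positive_sum, negative_sum)
tie_term = sum(c ** 3 - c for c in Counter(abs_diffs).values())
var_t = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
z = (t_small - n * (n + 1) / 4) / sqrt(var_t)

print(positive_sum, negative_sum, w, round(z, 3))
```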
Slide 24
Comparison of Several Unrelated Groups
Kruskal-Wallis test
Given
Many independent (unrelated) samples with sample sizes n1, ... nk (in general ni ≠ nj, for i ≠ j)
Small sample sizes
Not normally distributed
Question
Is there a difference between the central tendencies?
H0: central tendencies are equal
HA: at least two of the central tendencies are not equal
Example
Test of 3 different sleeping pills (drug1, drug2, drug3) on sleep duration (measured in hours).
The sleeping pills are used in three random samples (n1 = 3, n2 = 4, n3 = 5).
Sleep duration [hours]:
drug1 6.2 6.9 5.1
drug2 7.1 6.2 6.2 7.9
drug3 8.4 8.8 8.6 8.2 7.2
Slide 25
Procedure
1. All values of the sample are sorted according to size
sleep duration
drug 1 5.1 6.2 6.9
drug 2 6.2 6.2 7.1 7.9
drug 3 7.2 8.2 8.4 8.6 8.8
2. Assign ranks to the values and calculate the rank sum for every sample
        ranks                  rank sum
drug 1  1   3   5              9
drug 2  3   3   6   8          20
drug 3  7   9   10  11  12     49
3. Calculate the test statistic K
$$K = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1) = \frac{12}{12\,(12+1)} \left( \frac{9^2}{3} + \frac{20^2}{4} + \frac{49^2}{5} \right) - 3\,(12+1) = 7.71$$
Constant = 12, applies to all numbers of samples and for all sample sizes
N = n_1 + n_2 + ... + n_k (total sample size)
R_j^2 = squared rank sum of sample j
k = number of samples
n_j = sample size of sample j
4. Determine the critical value in the table below
The test statistic K follows a χ²-distribution with degrees of freedom ν = k – 1 = 3 – 1 = 2
df \ 1 - α  0.700 0.750 0.800 0.850 0.900 0.950 0.975 0.990 0.995
1 1.07 1.32 1.64 2.07 2.71 3.84 5.02 6.63 7.88
2 2.41 2.77 3.22 3.79 4.61 5.99 7.38 9.21 10.60
3 3.66 4.11 4.64 5.32 6.25 7.81 9.35 11.34 12.84
4 4.88 5.39 5.99 6.74 7.78 9.49 11.14 13.28 14.86
5 6.06 6.63 7.29 8.12 9.24 11.07 12.83 15.09 16.75
6 7.23 7.84 8.56 9.45 10.64 12.59 14.45 16.81 18.55
7 8.38 9.04 9.80 10.75 12.02 14.07 16.01 18.48 20.28
8 9.52 10.22 11.03 12.03 13.36 15.51 17.53 20.09 21.95
9 10.66 11.39 12.24 13.29 14.68 16.92 19.02 21.67 23.59
10 11.78 12.55 13.44 14.53 15.99 18.31 20.48 23.21 25.19
11 12.90 13.70 14.63 15.77 17.28 19.68 21.92 24.73 26.76
12 14.01 14.85 15.81 16.99 18.55 21.03 23.34 26.22 28.30
13 15.12 15.98 16.98 18.20 19.81 22.36 24.74 27.69 29.82
14 16.22 17.12 18.15 19.41 21.06 23.68 26.12 29.14 31.32
15 17.32 18.25 19.31 20.60 22.31 25.00 27.49 30.58 32.80
Critical value for α = 5% and df = 2: χ²(95%) = 5.99
5. Compare the value of the test statistic with the critical value
(χ²(95%) = 5.99) < (K = 7.71)
The null hypothesis is rejected. The rank sums differ significantly.
The sleeping pills have different effects on sleep duration.
Slide 27
Kruskal-Wallis test with SPSS
SPSS: Analyze → Nonparametric Tests → Legacy Dialogs → K Independent Samples...
NPAR TESTS
/K-W=duration BY drug(1 3)
/STATISTICS DESCRIPTIVES
/MISSING ANALYSIS.
Slide 28
The sleeping pills have significantly different effects on sleep duration
(Kruskal-Wallis test: χ² = 7.708, df = 2, p = .021).
Compare "Chi-Square" with test statistic K = 7.71
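The value K = 7.71 and the p-value can be retraced with a small Python sketch. It follows the hand formula from Slide 25 without any correction for ties, so it is a cross-check of the slide calculation and not the SPSS algorithm. The helper `average_ranks` is illustrative, and the p-value line uses the closed form of the χ² upper tail that is valid only for 2 degrees of freedom.

```python
from math import exp

def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

# Sleep duration in hours, Slide 24
groups = {
    "drug1": [6.2, 6.9, 5.1],
    "drug2": [7.1, 6.2, 6.2, 7.9],
    "drug3": [8.4, 8.8, 8.6, 8.2, 7.2],
}

# Steps 1-2: rank the pooled values, then sum the ranks per sample
pooled = [value for values in groups.values() for value in values]
ranks = average_ranks(pooled)
rank_sums = {}
start = 0
for name, values in groups.items():
    rank_sums[name] = sum(ranks[start:start + len(values)])
    start += len(values)

# Step 3: K = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1)
n_total = len(pooled)
k_stat = (12 / (n_total * (n_total + 1))
          * sum(rank_sums[g] ** 2 / len(groups[g]) for g in groups)
          - 3 * (n_total + 1))

# Upper tail of the chi-square distribution; exp(-x / 2) holds for df = 2 only
p_value = exp(-k_stat / 2)

print(rank_sums, round(k_stat, 2), round(p_value, 3))
```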
Slide 29
Comparison of Several Related Groups
Friedman test
Given
Related samples (repeated measures design)
Small sample size
Not normally distributed
Question
Is there a difference between the central tendencies?
H0: central tendencies are equal
HA: at least two of the central tendencies are not equal
Example
The effect of 4 different sleeping pills (drug1, ..., drug4) on reaction time is measured (in milliseconds)
proband drug1 drug2 drug3 drug4
1 30 28 16 34
2 14 18 10 22
3 28 28 14 30
4 24 20 18 30
5 38 34 20 44
Example: Reaction time of proband 3 dependent on type of sleeping pill (drug1, ..., drug4)
Slide 30
Procedure
1. Within each person, assign ranks to the treatments
proband drug1 drug2 drug3 drug4
1 3 2 1 4
2 2 3 1 4
3 2.5 2.5 1 4
4 3 2 1 4
5 3 2 1 4
If several values are tied, each of them receives the average of the ranks they would otherwise occupy.
2. Calculate the rank sum Rj of each column (= of each treatment)
proband drug1 drug2 drug3 drug4
1 3 2 1 4
2 2 3 1 4
3 2.5 2.5 1 4
4 3 2 1 4
5 3 2 1 4
Rj 13.5 11.5 5.0 20.0
3. Calculate the test statistic V
$$V = \frac{12}{n\,k\,(k+1)} \sum_{j=1}^{k} R_j^2 - 3\,n\,(k+1) = \frac{12}{5 \cdot 4\,(4+1)} \left( 13.5^2 + 11.5^2 + 5.0^2 + 20.0^2 \right) - 3 \cdot 5\,(4+1) = 13.74$$
Constant = 12
k = number of treatment levels = 4 treatment levels
n = number of probands = 5 probands
Example: row 3 of the table shows the ranked reaction times (ranks 1 to 4) of proband 3
Slide 31
4. Determine the critical value in the χ² table (the same table as on Slide 26)
The test statistic V follows a χ²-distribution with degrees of freedom ν = k – 1 = 4 – 1 = 3
Critical value for α = 5% and df = 3: χ²(95%) = 7.81
5. Compare the value of the test statistic with the critical value
(χ²(95%) = 7.81) < (V = 13.74)
The null hypothesis is rejected. The rank sums differ significantly.
The sleeping pills cause significantly different reaction times.
Slide 32
Friedman test with SPSS
SPSS: Analyze → Nonparametric Tests → Legacy Dialogs → K Related Samples...
NPAR TESTS
/FRIEDMAN=drug1 drug2 drug3 drug4
/STATISTICS DESCRIPTIVES
/MISSING LISTWISE.
Slide 33
The sleeping pills have a different impact on reaction times
(Friedman test: χ² = 14.020, df = 3, p = .003).
Compare "Chi-Square" with test statistic V = 13.74
SPSS uses a slightly different algorithm. The difference is consistent with a correction for tied ranks (13.74 / 0.98 = 14.02).
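Both values, the hand-calculated V = 13.74 and the "Chi-Square" of 14.020 in the output, can be retraced with the Python sketch below. It is a cross-check and not the SPSS algorithm. The helper `average_ranks` is illustrative, and the tie correction is a textbook formula assumed here because it reproduces the reported value.

```python
from collections import Counter

def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

# Reaction times in milliseconds, one row per proband, columns drug1..drug4 (Slide 29)
times = [
    [30, 28, 16, 34],
    [14, 18, 10, 22],
    [28, 28, 14, 30],
    [24, 20, 18, 30],
    [38, 34, 20, 44],
]
n, k = len(times), len(times[0])

# Step 1: rank the treatments within each proband
ranked = [average_ranks(row) for row in times]

# Step 2: rank sum R_j of each treatment (column)
rank_sums = [sum(row[j] for row in ranked) for j in range(k)]

# Step 3: V = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1)
v = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)

# Assumed tie correction: divide by 1 - sum(t^3 - t) / (n k (k^2 - 1)),
# where t runs over the sizes of the tie groups within each proband
tie_term = sum(c ** 3 - c for row in times for c in Counter(row).values())
v_corrected = v / (1 - tie_term / (n * k * (k ** 2 - 1)))

print(rank_sums, round(v, 2), round(v_corrected, 3))
```

The only tie in the data is the pair of 28 ms values of proband 3, which is enough to move the statistic from 13.74 to 14.02.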
Slide 34
Appendix
Which Groups Differ? Post hoc Tests after a Kruskal-Wallis Test
Post hoc tests can be accessed through the output generated by the newer SPSS dialogs:
SPSS: Analyze → Nonparametric Tests → Independent Samples...
Alternatively, separate Wilcoxon rank-sum tests for each of the combinations of two drugs could
be conducted (using an alpha level adjustment, e.g. a Bonferroni correction).
NPTESTS
/INDEPENDENT TEST (duration) GROUP (drug)
/MISSING SCOPE=ANALYSIS USERMISSING=EXCLUDE
/CRITERIA ALPHA=0.05 CILEVEL=95.
Note: This dialog does not allow defining
which groups are compared. All groups
coded in drug are being compared.
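For the alternative route with separate pairwise tests, the Bonferroni correction divides the alpha level by the number of comparisons. The following Python sketch lists the pairs and the adjusted alpha level; the function name `bonferroni_pairs` is illustrative and not an SPSS feature.

```python
from itertools import combinations

def bonferroni_pairs(groups, alpha=0.05):
    """Return all pairwise comparisons and the Bonferroni-adjusted alpha level."""
    pairs = list(combinations(groups, 2))
    return pairs, alpha / len(pairs)

# Kruskal-Wallis example: 3 drugs give 3 pairwise rank-sum tests
pairs3, alpha3 = bonferroni_pairs(["drug1", "drug2", "drug3"])
print(pairs3, round(alpha3, 4))

# Friedman example: 4 drugs give 6 pairwise signed-rank tests
pairs4, alpha4 = bonferroni_pairs(["drug1", "drug2", "drug3", "drug4"])
print(len(pairs4), round(alpha4, 4))
```

Comparing each unadjusted p-value with the adjusted alpha level is equivalent to multiplying each p-value by the number of comparisons and comparing it with the original alpha level.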
Slide 35
Double-click table in the output
Select "Pairwise comparisons"
Values needed for reporting the Kruskal-Wallis test
Copy output
Slide 36
=> Only drug=1 and drug=3 differ significantly from one another (p = .029).
SPSS does not offer any choice between different alpha level adjustments. By default,
a Bonferroni adjustment is carried out. Other adjustments need to be carried out manually.
Copy output
Slide 37
Which Groups Differ? Post hoc Tests after a Friedman Test
Post hoc tests can be accessed through the output generated by the newer SPSS dialogs:
SPSS: Analyze → Nonparametric Tests → Related Samples...
Alternatively, separate Wilcoxon signed-rank tests for each of the combinations of two drugs
could be conducted (using an alpha level adjustment, e.g. a Bonferroni correction).
NPTESTS
/RELATED TEST(drug1 drug2 drug3 drug4)
/MISSING SCOPE=ANALYSIS USERMISSING=EXCLUDE
/CRITERIA ALPHA=0.05 CILEVEL=95.
Slide 38
Double-click table in the output
Select "Pairwise comparisons"
Values needed for reporting Friedman test
Copy output
Slide 39
=> Only drug3 and drug4 differ significantly from one another (p = .001).
SPSS does not offer any choice between different alpha level adjustments. By default,
a Bonferroni adjustment is carried out. Other adjustments need to be carried out manually.
Copy output