W4, Lecture 1: Useful non-parametric tests for trends and distribution differences between two datasets
• Mann-Kendall trend test
• Resampling tests:
  – Permutation
  – Bootstrap (one-sample, two-sample)
• Multiplicity and "field significance"
Non-parametric test for location: the Wilcoxon-Mann-Whitney or rank-sum test
• How do we compare the means or medians of two datasets if they are clearly non-Gaussian, with unknown distributions and a few wild values (outliers)?
• We pool both datasets into one dataset (dataAB) and compare the ranks of dataA with those of dataB within dataAB. Under the null hypothesis that dataA and dataB are drawn from the same statistical distribution, the sum of the dataA ranks should, with high probability, be close to its expected value under a random assignment of the dataAB ranks (the Wilcoxon-Mann-Whitney or rank-sum test). For paired samples, the analogous test on the ranks of the paired differences is the Wilcoxon signed-rank test.
• These tests are more robust, and often more powerful, than one-sample or two-sample t-tests when the data are strongly non-Gaussian or contain outliers.
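As a quick sketch, the rank-sum test can be run with SciPy's `mannwhitneyu` (assuming SciPy is available; the data values are those used in Example 1 later in this lecture):

```python
# Wilcoxon-Mann-Whitney rank-sum test via SciPy (a sketch, not the
# lecture's hand calculation).
from scipy.stats import mannwhitneyu

dataA = [1, 3, 20, 5, 11]
dataB = [2, 5, 6, 7, 15, 17]

# Two-sided test of H0: both samples come from the same distribution.
u_stat, p_value = mannwhitneyu(dataA, dataB, alternative="two-sided")
print(u_stat, p_value)   # U for dataA; a large p-value means H0 cannot be rejected
```

Here `u_stat` is U1 = R1 − n1(n1 + 1)/2 for the first sample, matching the formulas below.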
Wilcoxon-Mann-Whitney test for data with small sample sizes (< 20)
When the sample sizes of dataA and dataB are small (< 20), one can directly compare the U value of each dataset to the critical U value in a U table for those sample sizes. A very good example is given at http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Nonparametric/BS704_Nonparametric4.html
The null hypothesis (Ho) is that there is no difference in distribution between the two datasets. If the U value of one dataset is smaller than the critical value, one can reject Ho, because that dataset has been systematically ranked lower than the other. If the U value is larger than the critical value, Ho cannot be rejected.
For datasets with larger samples (> 20 in each dataset)
• Null hypothesis: dataA with n1 samples and dataB with n2 samples are drawn from the same statistical distribution, so the rank sum of dataA is equally likely to be any of the rank sums of n1 samples randomly drawn from the pooled dataAB with n = n1 + n2 total samples. There are n!/(n1! n2!) possible draws, and the distribution of their rank sums is approximately Gaussian when n1 and n2 > 10.
• Based on the Mann-Whitney U-statistic, we compute the rank sums R1 and R2 for dataA and dataB and convert them to

  U1 = R1 − n1(n1 + 1)/2,   U2 = R2 − n2(n2 + 1)/2

and compare them to the mean and standard deviation of U over all possible draws from the pooled dataAB:

  μU = n1 n2 / 2,   σU = sqrt[ n1 n2 (n1 + n2 + 1) / 12 ]

(valid if the data have at most a few repeating values).
If the data have a large number of repeating (tied) values:

  σU = sqrt[ n1 n2 (n1 + n2 + 1)/12 − (n1 n2 / (12 n (n − 1))) Σ_{j=1..J} (t_j³ − t_j) ]

where n = n1 + n2, J is the number of tie groups, and t_j is the number of tied values in the j-th group.
• If n1 or n2 is less than 20, compare U = min(U1, U2) to the critical value in the U table. Otherwise, based on the Gaussian approximation, compute

  z = (U1 − μU) / σU

and use the z value to determine whether the null hypothesis can be rejected.
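The large-sample procedure above can be sketched from scratch as follows (function and variable names are illustrative; no tie correction is applied to σU):

```python
# From-scratch large-sample rank-sum z statistic, following the
# formulas above (a sketch; helper names are illustrative).
import math

def rank_sum_z(dataA, dataB):
    n1, n2 = len(dataA), len(dataB)
    pooled = sorted(dataA + dataB)
    # Midrank: tied values share the average of their 1-based positions.
    def midrank(v):
        positions = [i + 1 for i, x in enumerate(pooled) if x == v]
        return sum(positions) / len(positions)
    R1 = sum(midrank(v) for v in dataA)          # rank sum of dataA
    U1 = R1 - n1 * (n1 + 1) / 2                  # Mann-Whitney U for dataA
    mu_U = n1 * n2 / 2
    sigma_U = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # no tie correction
    return U1, (U1 - mu_U) / sigma_U
```

For the Example 1 data below, `rank_sum_z([1, 3, 20, 5, 11], [2, 5, 6, 7, 15, 17])` gives U1 = 12.5 and z ≈ −0.46.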
Example 1:
• Determine whether the values in dataA (1, 3, 20, 5, 11) are significantly different from those in dataB (2, 5, 6, 7, 15, 17).
• Null hypothesis: dataA are drawn from the same distribution as the merged dataAB.
• The U value for the dataset with the smaller rank sum (dataA) is greater than the critical U value for 0.01 or 0.05 significance (n1 = 5, n2 = 6), so the null hypothesis cannot be rejected at the 0.05 or 0.01 level.
dataA   rank  |  dataB   rank
   1      1   |     2      2
   3      3   |     5      4.5
  20     11   |     6      6
   5      4.5 |     7      7
  11      8   |    15      9
              |    17     10
 n1 = 5       |   n2 = 6

Pooled dataAB, sorted: 1, 2, 3, 5, 5, 6, 7, 11, 15, 17, 20; the two tied 5s share the average rank 4.5.
Rank sums: R1 = 27.5, R2 = 38.5 (check: R1 + R2 = 66 = n(n + 1)/2)
U1 = R1 − n1(n1 + 1)/2 = 27.5 − 15 = 12.5
• If we plot the U values for all possible sets of n1 values randomly drawn from dataAB, they follow an approximately Gaussian distribution (of course, this approximation is only valid for moderate to large samples, i.e., n1, n2 > 10).
[Figure: distribution of the possible rank-sum values (x-axis: rank-sum values; y-axis: probability)]
μU = n1 n2 / 2 = (5 × 6)/2 = 15

σU = sqrt[ n1 n2 (n1 + n2 + 1)/12 ] = sqrt[ 5 × 6 × (5 + 6 + 1)/12 ] = sqrt(30) ≈ 5.5

z(dataA) = (U1 − μU)/σU = (12.5 − 15)/5.5 ≈ −0.46

The two-sided probability of a z value at least this extreme is about 0.65.
The null hypothesis cannot be rejected.
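As an illustrative check, this example is small enough to enumerate the exact null distribution: there are C(11, 5) = 462 equally likely sets of ranks that dataA could receive from the pooled dataAB.

```python
# Exact null distribution of the dataA rank sum by enumeration
# (a sketch; the pooled ranks below include the tied 5s at rank 4.5).
from itertools import combinations

pooled_ranks = [1, 2, 3, 4.5, 4.5, 6, 7, 8, 9, 10, 11]
rank_sums = [sum(c) for c in combinations(pooled_ranks, 5)]

observed_R1 = 1 + 3 + 4.5 + 8 + 11          # ranks of dataA values = 27.5
# One-sided exact probability of a rank sum at least as small as observed:
frac_low = sum(s <= observed_R1 for s in rank_sums) / len(rank_sums)
```

`frac_low` comes out well above 0.05, consistent with the Gaussian-approximation conclusion that the null hypothesis cannot be rejected.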
Example 2:
Evaluate whether cloud seeding can alter lightning strikes.
• Null hypothesis: no effect.
U1 = R1 − n1(n1 + 1)/2 = 108.5 − (12 × 13)/2 = 30.5
Compared to the data pooled from dataA and dataB:

μU = n1 n2 / 2 = (12 × 11)/2 = 66

σU = sqrt[ n1 n2 (n1 + n2 + 1)/12 ] = sqrt[ (12 × 11 × 24)/12 ] ≈ 16.2
Based on the Gaussian approximation:

z = (U1 − μU)/σU = (30.5 − 66)/16.2 ≈ −2.19
p value: 0.014; about 1.4% of the 1,352,078 possible values of U1 under the null hypothesis are smaller than the observed U1. The null hypothesis can be rejected.
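The arithmetic in this example can be checked in a few lines (all input values, including R1 = 108.5, are taken from the slide):

```python
# Reproducing the cloud-seeding z computation above.
import math

n1, n2, R1 = 12, 11, 108.5
U1 = R1 - n1 * (n1 + 1) / 2                        # 108.5 - 78 = 30.5
mu_U = n1 * n2 / 2                                 # 66
sigma_U = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # sqrt(264) ≈ 16.2
z = (U1 - mu_U) / sigma_U                          # ≈ -2.19
```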
The bootstrap

Why?
• When we have only one sample of data, X with values xi, i = 1, …, n, we still need to determine its statistical distribution to assess uncertainty.

How do we construct the probability distribution of this sample?
• We write each of the n data values on a paper slip, put all n slips into a hat, then randomly draw one slip and record its value as x*1.
• We put all the slips back into the hat, mix them, draw a second slip, and record its value as x*2.
• We repeat this process n times to generate a new dataset, X*1 = (x*1,1, x*1,2, …, x*1,n). NOTICE that x*1,1 and x*1,2 can be the same value, because a data value can be drawn from the hat more than once (sampling with replacement).
• We repeat the above process to generate a second new dataset, X*2 = (x*2,1, x*2,2, …, x*2,n).
• We repeat this process by computer many times, say nB = 10,000 or more, to generate nB new datasets X*, each with the same n values as the original data X.
• Then the statistic of interest, say the mean, is computed for each of the nB generated bootstrap samples X*j, j = 1, 2, …, nB. The resulting frequency distribution is then used to approximate the true sampling distribution.
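The hat-and-slips procedure above is simply sampling with replacement; a minimal NumPy sketch (function and parameter names are illustrative):

```python
# Bootstrap resampling loop: nB resamples of size n, each drawn with
# replacement from the original data (a sketch using NumPy).
import numpy as np

def bootstrap_means(x, n_boot=10_000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    # Each row is one bootstrap dataset X*j of the same length as x.
    resamples = rng.choice(x, size=(n_boot, x.size), replace=True)
    return resamples.mean(axis=1)   # one mean per bootstrap dataset
```

The returned array of nB means approximates the sampling distribution of the mean, from which percentiles give uncertainty ranges.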
[Figure: bootstrap frequency distribution with the 5% and 95% percentiles marked]
Example:
We would like to calculate the mean of a dataset X = (1, 2, 3, 4, 5, 2) with n = 6 values. We do not know its PDF, but need to estimate the uncertainty of the mean.
Using the bootstrap approach, we can generate a set of new samples X*j, j = 1, 2, …, 20, as shown below.
original data X:          1 2 3 4 5 2   mean = 2.83
bootstrap-generated data:
X*1:   2 5 1 3 2 2   mean = 2.50
X*2:   1 5 3 3 1 5   mean = 3.00
X*3:   1 4 1 3 1 4   mean = 2.33
X*4:   1 3 1 2 1 4   mean = 2.00
X*5:   1 3 1 2 5 4   mean = 2.67
X*6:   1 3 1 2 3 4   mean = 2.33
X*7:   5 1 4 5 2 4   mean = 3.50
X*8:   3 4 2 1 1 2   mean = 2.17
X*9:   1 3 2 4 5 1   mean = 2.67
X*10:  3 1 2 1 4 2   mean = 2.17
X*11:  2 5 1 3 2 2   mean = 2.50
X*12:  1 5 3 3 1 5   mean = 3.00
X*13:  4 3 1 5 4 3   mean = 3.33
X*14:  5 2 4 2 1 2   mean = 2.67
X*15:  1 3 5 3 2 4   mean = 3.00
X*16:  2 4 1 2 3 4   mean = 2.67
X*17:  5 1 2 4 2 1   mean = 2.50
X*18:  3 4 2 1 5 2   mean = 2.83
X*19:  1 3 2 4 5 2   mean = 2.83
X*20:  3 1 4 1 4 2   mean = 2.50

Sorted bootstrap means: 2.00, 2.17, 2.17, 2.33, 2.33, 2.50, 2.50, 2.50, 2.50, 2.67, 2.67, 2.67, 2.67, 2.83, 2.83, 3.00, 3.00, 3.00, 3.33, 3.50
[Figure: histogram of the 20 bootstrap means]
The mean of the original data is 2.83; the uncertainty range of the mean at 90% confidence is [2.0, 3.5].
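In practice one would use far more than 20 resamples. A sketch recomputing the 90% interval with 10,000 resamples (the endpoints will differ somewhat from the 20-sample table above, since 20 resamples is a very rough approximation):

```python
# 90% bootstrap percentile interval for the mean of X (a sketch).
import numpy as np

rng = np.random.default_rng(42)
x = np.array([1, 2, 3, 4, 5, 2])
boot_means = rng.choice(x, size=(10_000, x.size), replace=True).mean(axis=1)
lo, hi = np.percentile(boot_means, [5, 95])   # 90% confidence range
```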
Summary
• Wilcoxon-Mann-Whitney (rank-sum) test: an effective test for comparing two datasets with unknown distributions, either by comparing the rank-based U value to a critical-U table (when each sample size is < 20) or by computing the z value of the U statistic of one dataset against the distribution of U for data pooled from the two datasets.
• Bootstrap approach: assume the data at hand are one random draw from the population the data represent. One can randomly draw samples with replacement from this pool many times (say 1,000 or 10,000) to determine the PDF of a sample statistic (e.g., the mean) and its range of uncertainty.