Differential Analysis & FDR Correction
Differential Analysis Steps
Step 1: Construction of input data table in EXCEL
Step 2: Save EXCEL file into tab delimited txt file
Step 3: Upload data - tab delimited txt file
Step 4: Choose T test or U test
Step 5: Enter your email and submit
Step 6: Result interpretation: global FDR
Step 7: Result interpretation: local FDR
Step 1: Construction of input data table in EXCEL
CLASS 1 1 0 0
Gene.1.name … … … …
Gene.2.name … … … …
… … … … …
… … … … …
Input data format:
• Cell A1: “CLASS”
• 1st column: feature names
• 1st row: sample categories
  • Must be binary, either 1 or 0
  • e.g. 1 is disease, 0 is control
• All other cells: data, one sample per column
  • e.g. array intensity or protein quantities
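As an alternative to building the table by hand, the layout described above can be sketched in Python. This is a minimal illustration that assumes nothing about the upload tool beyond the format listed here; the function name `write_input_table` and the sample values are hypothetical.

```python
import csv
import io

def write_input_table(class_labels, features, out):
    """Write a CLASS header row plus one row per feature, tab-delimited."""
    # The first row must hold binary sample categories (1 or 0).
    if any(label not in (0, 1) for label in class_labels):
        raise ValueError("class labels must be binary (0 or 1)")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["CLASS", *class_labels])
    for name, values in features.items():
        # Every feature needs exactly one value per sample column.
        if len(values) != len(class_labels):
            raise ValueError(f"{name}: expected {len(class_labels)} values")
        writer.writerow([name, *values])

# Hypothetical data: two disease samples (1) and two controls (0).
buffer = io.StringIO()
write_input_table(
    [1, 1, 0, 0],
    {"Gene.1.name": [5.2, 4.9, 2.1, 2.4], "Gene.2.name": [3.0, 3.1, 2.9, 3.2]},
    buffer,
)
table_text = buffer.getvalue()
```

Passing an open file object instead of the `io.StringIO` buffer would produce a tab delimited txt file of the kind the later steps expect.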
EXCEL file example
Step 2: Save EXCEL file into tab delimited txt file
Step 3: Upload data - tab delimited txt file
Input data “input.txt” selected
Step 4: Choose T test or U test
Choose either T or U test for analysis
Step 4: T test or U test, which one to choose?
• The U test is useful in the same situations as the t test
• The U test should be used if the data are ordinal
• The U test is more robust to outliers
• The U test is more efficient
  • For distributions far from normal and for sufficiently large samples
To Discover Differential Features: Student’s T test or Mann-Whitney U test?
Student’s T test:
Student’s T test is a parametric test of the null hypothesis that the means of 2 normally distributed populations are equal. It is used when you have a nominal variable with exactly 2 values, such as “male” and “female,” and a measurement variable, and you want to compare the mean values of the measurement variable between the 2 groups.
Mann-Whitney U Test:
The Mann-Whitney U Test is a non-parametric test that examines whether 2 sets of data could have come from the same population. It requires 2 data sets that do not need to be paired, normally distributed, or have equal numbers in each set.
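To make the outlier point concrete, here is a minimal sketch that computes both test statistics in plain Python on made-up data. The function names and sample values are illustrative assumptions, not the tool's implementation, and only the statistics are computed; P values would in practice come from a statistics library such as `scipy.stats.ttest_ind` and `scipy.stats.mannwhitneyu`.

```python
from math import sqrt
from statistics import mean, variance

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a > b, ties counting one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Hypothetical measurements for one feature.
disease = [5.1, 5.4, 5.8, 6.0]
control = [3.9, 4.2, 4.4, 4.6]
with_outlier = [5.1, 5.4, 5.8, 60.0]  # last value replaced by an outlier

t_clean = student_t(disease, control)
t_outlier = student_t(with_outlier, control)
u_clean = mann_whitney_u(disease, control)
u_outlier = mann_whitney_u(with_outlier, control)
```

In this example the single outlier inflates the pooled variance and shrinks the t statistic considerably, while the U statistic is unchanged because it depends only on the ordering of the values.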
Step 5: Enter your email and submit
Enter your email
Submit
Step 6: Result interpretation: global FDR
FDR plot red line: Total Discoveries (TD) or Total Discovery rate = 1
FDR plot green line: False Discoveries (MEAN) or False Discovery Rate FDR (MEAN)
FDR plot black bar line: False Discoveries (MEDIAN) or False Discovery Rate FDR (MEDIAN)
FDR plot blue line: False Discoveries (95%) or False Discovery Rate FDR (95%)
FDR plot dotted black line: FDR=0.05
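The slides do not say how the false discovery counts are estimated. One common approach, assumed here purely for illustration, is to recompute the P values on data sets with permuted class labels and count how many fall below each threshold. Under that assumption, the mean, median, and 95% FD/TD curves at a single threshold could be sketched as follows; `global_fdr` and all values are hypothetical.

```python
from statistics import mean, median

def global_fdr(observed_p, null_p_sets, threshold):
    """Return (mean, median, 95th-percentile) FD/TD at one P-value threshold."""
    td = sum(p <= threshold for p in observed_p)  # total discoveries (TD)
    if td == 0:
        return None  # nothing is called significant at this threshold
    # False discoveries (FD): hits at the same threshold in each null data set.
    fd = sorted(sum(p <= threshold for p in null_p) for null_p in null_p_sets)
    fd_95 = fd[min(len(fd) - 1, int(0.95 * len(fd)))]
    return mean(fd) / td, median(fd) / td, fd_95 / td

# Hypothetical P values: one observed set and four label-permuted (null) sets.
observed = [0.001, 0.004, 0.02, 0.3, 0.6, 0.9]
nulls = [
    [0.03, 0.2, 0.4, 0.5, 0.7, 0.8],
    [0.1, 0.3, 0.45, 0.6, 0.75, 0.95],
    [0.008, 0.25, 0.5, 0.55, 0.8, 0.9],
    [0.15, 0.35, 0.4, 0.65, 0.7, 0.85],
]
fdr_mean, fdr_median, fdr_95 = global_fdr(observed, nulls, threshold=0.01)
```

Evaluating the function over a grid of thresholds would give one point per threshold on each of the mean, median, and 95% curves.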
[Figure: global FDR plot. x-axis: single hypothesis test P-value thresholds; y-axis: global FDR. Curves: mean FD/TD, median FD/TD, 95% FD/TD, TD/TD = 1, and the 0.05 FDR line.]
Step 6: How to read the gFDR plots
• Commonly used global FDR cutoff: 0.05
• If there are no significant features, no data points will show up below the 0.05 dotted horizontal line
[Figure: global FDR versus single hypothesis test P-value threshold, showing the mean, median, and 95% FD/TD curves and the dotted FDR = 0.05 line.]
Features which satisfy global FDR < 0.05
Commonly used gFDR cutoff: 0.05
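Applying the cutoff can be sketched as a small selection step. The example assumes the global FDR curve is available as (P-value threshold, FDR) pairs and takes the largest threshold whose FDR is below the cutoff, which is one reasonable choice rather than a rule stated in the slides. The name `select_features` and the data are hypothetical.

```python
def select_features(p_values, fdr_curve, cutoff=0.05):
    """Keep features with P value at or below the largest threshold with FDR < cutoff."""
    passing = [t for t, fdr in fdr_curve if fdr is not None and fdr < cutoff]
    if not passing:
        return []  # no threshold satisfies the cutoff
    best = max(passing)
    return sorted(name for name, p in p_values.items() if p <= best)

# Hypothetical per-feature P values and global FDR curve.
p_values = {"Gene.1.name": 0.0005, "Gene.2.name": 0.008, "Gene.3.name": 0.04}
fdr_curve = [(0.001, 0.0), (0.01, 0.03), (0.05, 0.20)]
selected = select_features(p_values, fdr_curve)
```

Here the threshold 0.01 is the largest one with FDR below 0.05, so the first two features are kept and the third is dropped.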
[Figure: global FDR versus single hypothesis test P-value threshold, showing the mean, median, and 95% FD/TD curves and the dotted FDR = 0.05 line.]
Step 7: Result interpretation: local FDR
[Figure: local FDR plot. x-axis: single hypothesis test P-value (panels spanning 0.0 to 0.05 and 0.0 to 1.0); y-axis: local FDR.]
Step 7: How to read the lFDR plots
It has been suggested (Aubert, et al., 2004) that the first abrupt change in the local FDR indicates a good threshold for choosing genuinely statistically significant features.
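One way to locate such a change programmatically is to sort features by P value and look for the first jump in local FDR between neighbours. The jump size of 0.1 below is an arbitrary assumption for illustration, not a value from Aubert et al., and `first_abrupt_change` is a hypothetical helper.

```python
def first_abrupt_change(p_values, lfdr, jump=0.1):
    """Return the P value just before the first lFDR increase larger than `jump`."""
    pairs = sorted(zip(p_values, lfdr))  # order features by P value
    for (p_prev, f_prev), (_, f_next) in zip(pairs, pairs[1:]):
        if f_next - f_prev > jump:
            return p_prev
    return None  # no abrupt change found

# Hypothetical P values and matching local FDR estimates.
p = [0.001, 0.002, 0.004, 0.01, 0.02, 0.03]
lfdr = [0.01, 0.02, 0.03, 0.25, 0.30, 0.33]
threshold = first_abrupt_change(p, lfdr)
```

In practice the plot should still be inspected by eye, since a fixed jump size may not suit every data set.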
[Figure: local FDR plot with the 1st abrupt change of lFDR marked.]
Click to download result file
Local FDR results:
• 1st column: feature name
• 2nd column: t or U test P value
• 3rd column: local FDR results
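The three-column layout above can be read with a short script. This sketch assumes the downloaded file is tab-delimited with no header row, which the slides do not confirm; the function name `read_lfdr_results` and the sample rows are hypothetical.

```python
import csv
import io

def read_lfdr_results(handle):
    """Parse rows of (feature name, P value, local FDR) from a tab-delimited file."""
    rows = []
    for name, p_value, lfdr in csv.reader(handle, delimiter="\t"):
        rows.append((name, float(p_value), float(lfdr)))
    return rows

# Hypothetical file content standing in for a downloaded result file.
sample = "Gene.1.name\t0.0004\t0.01\nGene.2.name\t0.03\t0.35\n"
results = read_lfdr_results(io.StringIO(sample))
```

If the real file carries a header row or uses a different delimiter, the reader would need a matching adjustment before the numeric conversion.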