GRAPHICAL DISPLAY OF DATA AND STATISTICS FOR BASIC SCIENCE RESEARCHERS
Elizabeth Garrett-Mayer, PhD
Professor of Biostatistics, Hollings Cancer Center and Dept. of Public Health Sciences
10 Principles of Display of Data
1. Look at the raw data
2. Make graphs before implementing statistical analysis.
3. The amount of data you can display is inversely proportional to the amount of data you have
4. Transformations are important
5. Consistency is important
6. Reflect experimental design and inferences in your displays
7. Outliers: do not just delete them!
8. Careful of over-interpretation
9. Use captions and legends
10. Avoid gratuitous fancy stuff
1. Look at the raw data
There is a tendency to ‘adjust’ or ‘normalize’ before seeing what the data say on their own.
Why is this important?
  Detection of outliers
  Helps to determine patterns in the data
  You need to know your data before you analyze it.
Do not get ahead of yourself: the analytic tools you use depend on what the data ‘look like.’
Example: [Figure: interferon-gamma release (pg/ml), no restimulation vs. restimulation]
2. Make graphs before implementing statistical analysis.
Graphical displays lead us to the correct analysis approach
[Figure: tumor volume vs. time (days) by treatment group: High Cis+GR, High Cis, Low Cis+GR, Low Cis, GR, Control]
GR is a thromboxane receptor antagonist
[Figure: tumor volume vs. time from injection (days) by treatment group: High Cis+GR, High Cis, Low Cis+GR, Low Cis, GR, Control]
[Figure: proportion tumor-free vs. time to tumor (days) by treatment group: High Cis+GR, High Cis, Low Cis+GR, Low Cis, GR, noGR]
[Figure: volume vs. days from tumor onset by treatment group: High Cis+GR, High Cis, Low Cis+GR, Low Cis, GR, noGR]
3. The amount of data you can display is inversely proportional to the amount of data you have
Natural History of PSA: N=39
[Figure: PSA (log scale) vs. time on study (months)]
Three treatment clinical trial. N=1000.
[Figures: three displays with the same axes, PSA (log scale, 0.1 to 10000) by treatment group: Doce, Sched 1; Doce, Sched 2; Mitox]
But, N=10? Or N=5? Show the data!
4. Transformations are important
Fold-Changes
Fold-change should usually be displayed on the log scale
Fold-change should be ANALYZED on the log scale
The same goes for tumor size, PSA, etc.
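A minimal sketch of why fold-changes are averaged on the log scale. The fold-change values and the helper name `geometric_mean` are illustrative assumptions, not data from the examples in this talk. A 2-fold increase and a 2-fold decrease are symmetric on the log scale, so they average to "no change", while the raw arithmetic mean suggests an increase.

```python
import math

def geometric_mean(fold_changes):
    """Average fold-changes on the log2 scale, then back-transform."""
    logs = [math.log2(fc) for fc in fold_changes]
    return 2 ** (sum(logs) / len(logs))

# Hypothetical fold-changes: one 2-fold increase, one 2-fold decrease.
fold_changes = [2.0, 0.5]

arithmetic = sum(fold_changes) / len(fold_changes)
geometric = geometric_mean(fold_changes)

print(f"arithmetic mean of fold-changes: {arithmetic:.2f}")
print(f"log-scale (geometric) mean:      {geometric:.2f}")
```

On the raw scale the two changes do not cancel; on the log scale they do, which is the behavior one usually wants for ratios.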
5. Consistency is important
Make the inferences simpler by being consistent
Example: Why?
  Data are means ± SD (n=8)
  Data are means ± SE (n=4)
Figures: Same groups or conditions should be the same across figures
  Symbols
  Colors
  Lines
Analysis: t-test in some; rank sum test in others? Picking the best p-value?
If there is a reason for INconsistency, you should explain it.
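To make the SD versus SE point concrete, here is a small sketch using hypothetical measurements (the values and the helper name `sd_and_se` are assumptions for illustration). Because SE = SD / sqrt(n), error bars drawn from the same data shrink by a factor of sqrt(n) when SE is shown instead of SD, so mixing the two across figures makes the spreads impossible to compare by eye.

```python
import math
import statistics

def sd_and_se(values):
    """Return (sample SD, standard error of the mean) for one group."""
    sd = statistics.stdev(values)        # sample SD (n - 1 denominator)
    se = sd / math.sqrt(len(values))     # SE = SD / sqrt(n)
    return sd, se

# Hypothetical measurements from a single group (n = 8).
group = [4.1, 5.3, 4.8, 6.0, 5.5, 4.4, 5.9, 5.0]

sd, se = sd_and_se(group)
print(f"n = {len(group)}, mean = {statistics.mean(group):.2f}")
print(f"mean ± SD bar half-width: {sd:.3f}")
print(f"mean ± SE bar half-width: {se:.3f}")
```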
6. Reflect experimental design and inferences in your displays
Paired data
[Figure: interferon-gamma release (pg/ml), no restimulation vs. restimulation]
Longitudinal Data
[Figure: tumor volume vs. time (days) by treatment group: High Cis+GR, High Cis, Low Cis+GR, Low Cis, GR, Control]
Longitudinal Experiment
7. Outliers: do not just delete them!
Outliers are usually real
  Check the data entry
  But just because a mouse or experiment did not show consistent results, you cannot and should not remove it.
  “Results shown are based on 6 representative mice”… RED FLAG.
Statistical assumptions?
  Sometimes outliers create skewness or cause other problems for statistical analysis
  Making the data fit the analytic approach is not correct.
  Find an analytic approach that is valid for your data
  Examples: non-parametric tests, transformations
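As a sketch of why a rank-based (non-parametric) approach can remain valid when an outlier is present, the example below computes the Mann-Whitney U statistic, which underlies the rank sum test, on hypothetical data. The values and the helper name `mann_whitney_u` are assumptions for illustration, and no p-value is computed. The statistic depends only on the ordering of the values, so making the outlier far more extreme changes the group mean but not U.

```python
import statistics

def mann_whitney_u(x, y):
    """U statistic for x vs. y: number of (x, y) pairs with x > y (ties count 0.5)."""
    u = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                u += 1.0
            elif xi == yi:
                u += 0.5
    return u

# Hypothetical data; the last treated value is an outlier.
control = [10, 12, 11, 13, 12]
treated = [15, 17, 16, 18, 50]
treated_extreme = [15, 17, 16, 18, 5000]   # same outlier, far more extreme

u1 = mann_whitney_u(treated, control)
u2 = mann_whitney_u(treated_extreme, control)

print(f"mean of treated:         {statistics.mean(treated):.1f}")
print(f"mean of treated_extreme: {statistics.mean(treated_extreme):.1f}")
print(f"U with outlier = 50:     {u1}")
print(f"U with outlier = 5000:   {u2}")
```

The outlier stays in the data in both cases; only the analysis is chosen so that it does not dominate the result.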
8. Careful of over-interpretation
Don’t overinterpret
The population?
  One donor… representative of the population?
  Same with cell lines
  Who can you generalize to, and therefore, what does the p-value or confidence interval mean?
  Sometimes the p-value really ends up just testing the precision of your assay!
P-values
The role of p-values
  P-values were never intended to be the ‘last’ line of defense (Regina Nuzzo. Scientific method: statistical errors. Nature, 12 February 2014; 506: 150-52)
  “P values, the 'gold standard' of statistical validity, are not as reliable as many scientists assume.”
P-values are based on statistical tests
  The tests have assumptions and assume a specific experimental design
  E.g. paired vs. two-sample t-tests
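The paired vs. two-sample distinction can be sketched as follows, using hypothetical before/after measurements on the same four donors (the data and the helper names `two_sample_t` and `paired_t` are assumptions for illustration). The two-sample statistic treats the groups as independent, so the large donor-to-donor variation swamps the effect; the paired statistic works on within-donor differences, which matches the design.

```python
import math
import statistics

def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic (treats x and y as independent groups)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x) +
                  (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se

def paired_t(x, y):
    """Paired t statistic: a one-sample t on the within-pair differences."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

# Hypothetical paired data: position i in both lists is the same donor.
before = [100.0, 250.0, 400.0, 550.0]
after = [120.0, 265.0, 425.0, 570.0]

t_unpaired = two_sample_t(after, before)
t_paired = paired_t(after, before)

print(f"two-sample t (ignores pairing): {t_unpaired:.2f}")
print(f"paired t (uses pairing):        {t_paired:.2f}")
```

Every donor increases by 15 to 25 units, yet only the paired statistic reflects that consistency.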
P-values
Never interpret a p-value on its own
The p-value from a two-sample t-test depends on
  The variance in the groups being compared
  The sample size in each group
  The difference in the means between the groups
Many non-significant p-values accompany highly meaningful effect sizes. Why?
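One way to see the answer is to hold the effect size fixed and vary only the sample size. The sketch below assumes two groups of equal size n with a common SD, so that t = (difference in means) / (SD * sqrt(2 / n)); the numbers and the helper name `t_statistic` are hypothetical.

```python
import math

def t_statistic(mean_diff, sd, n):
    """Two-sample t statistic for two groups of equal size n sharing a common SD."""
    return mean_diff / (sd * math.sqrt(2.0 / n))

# Same hypothetical difference in means (1.0) and SD (2.0); only n changes.
for n in (3, 10, 50):
    print(f"n per group = {n:2d}: t = {t_statistic(1.0, 2.0, n):.2f}")

# Same difference and n; only the variability changes.
for sd in (0.5, 2.0, 8.0):
    print(f"SD = {sd:3.1f}: t = {t_statistic(1.0, sd, 10):.2f}")
```

With few observations or noisy measurements, the same meaningful difference produces a small t statistic and therefore a large p-value.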
Incorrect implementation of t-test

          CD62L Low    CD62L High   CD62L Low    CD62L High   CD62L Low    CD62L High
Donor 1   1            1.53         1.00         2.49         1.00         2.06
Donor 2   1            1.87         1.00         2.52         1.00         1.87
Average   1            1.69966      1            2.504847     1            1.96379526
SD        0            0.24416      0            0.024433     0            0.13068069
p value                0.055839                  0.000132                  0.00906744

[Figures: three plots of CD62L Low vs. CD62L High (vertical axis from 0 to about 2.5 or 3), one per pair of columns]
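The slide does not spell out what is incorrect. One plausible reading is that the CD62L Low values are normalized to 1 within each donor (hence SD = 0), so the CD62L High values are fold-changes and an ordinary two-sample t-test is not appropriate. The sketch below, under that assumption, uses the rounded donor values from the first pair of columns. It first applies a pooled two-sample t-test, which comes out close to the tabulated p-value, and then a one-sample t-test of the log fold-changes against 0. The helper names are hypothetical, and the closed-form p-values are valid only for two donors (2 and 1 degrees of freedom, respectively).

```python
import math
import statistics

def pooled_two_sample_p(x, y):
    """Two-sided p-value of a pooled-variance two-sample t-test.

    Valid only for two observations per group (2 degrees of freedom),
    where the t distribution has the closed form used below.
    """
    assert len(x) == 2 and len(y) == 2
    pooled_var = (statistics.variance(x) + statistics.variance(y)) / 2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var)
    return 1 - abs(t) / math.sqrt(t * t + 2)

def one_sample_log_p(fold_changes):
    """Two-sided p-value of a one-sample t-test of log fold-changes against 0.

    Valid only for two fold-changes (1 degree of freedom), where the
    t distribution is the standard Cauchy distribution.
    """
    assert len(fold_changes) == 2
    logs = [math.log(fc) for fc in fold_changes]
    t = statistics.mean(logs) / (statistics.stdev(logs) / math.sqrt(2))
    return 1 - (2 / math.pi) * math.atan(abs(t))

low = [1.0, 1.0]      # reference values, identical by construction
high = [1.53, 1.87]   # rounded donor values from the first pair of columns

p_two_sample = pooled_two_sample_p(high, low)
p_log_ratio = one_sample_log_p(high)

print(f"two-sample t-test on normalized values: p = {p_two_sample:.3f}")
print(f"one-sample t-test on log fold-changes:  p = {p_log_ratio:.3f}")
```

Under this reading, the zero-variance reference group makes the two-sample test look more convincing than two donors can support.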
Multiplicity
When you use α = 0.05 (i.e., p<0.05) as a threshold, each test has a 5% chance of a false positive when there is no true effect.
Multiple experiments:
  In a paper with 20 figures (including sub-figures) and 5 groups, that means 200 p-values for all pairwise comparisons.
  By chance alone (meaning, if there are NO associations at all), you would expect 10 significant p-values
Multiple markers:
  If you have a panel of 200 markers, you expect about 10 to be significant by chance alone.
High-throughput setting:
  If you have 60,000 genes, you would expect 3000 false positives.
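The arithmetic behind these counts can be sketched as below. The helper names are hypothetical; the expected count is simply (number of tests) × α when no true effects exist, and the "at least one" probability additionally assumes the tests are independent, which is an idealization.

```python
import math

def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of p < alpha results when no true effects exist."""
    return n_tests * alpha

def prob_at_least_one(n_tests, alpha=0.05):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

# 5 groups give C(5, 2) = 10 pairwise comparisons per figure; 20 figures give 200.
pairwise_per_figure = math.comb(5, 2)
n_paper = 20 * pairwise_per_figure

for label, m in [("paper: 20 figures x 10 pairwise comparisons", n_paper),
                 ("panel of 200 markers", 200),
                 ("60,000 genes", 60_000)]:
    print(f"{label}: {m} tests, "
          f"expect {expected_false_positives(m):.0f} false positives, "
          f"P(at least one) = {prob_at_least_one(m):.4f}")
```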
9. Use captions and legends
Captions
Unlike other areas of medical research, in basic science a lot goes into the captions
  Statistical methods, p-values, experimental design
  Figures are multi-paneled (sometimes more than 8 displays in one figure).
If you haven’t been clear in the (statistical) methods section about your analysis approach, you need to be in the caption
Figures should speak for themselves (i.e., the reader should not have to reference the text).
Legends
When possible, put clarifying information in legends within the figure.
Makes interpretation simpler than sifting through all of the information in the caption.
The "Acid Test" for Tables and Figures: Any Table or Figure you present must be sufficiently clear, well-labeled, and described by its legend to be understood by your intended audience without reading the results section, i.e., it must be able to stand alone and be interpretable. Overly complicated Figures or Tables may be difficult to understand in or out of context, so strive for simplicity whenever possible. If you are unsure whether your tables or figures meet these criteria, give them to a fellow [scientist] and ask them to interpret your results.*
*http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWtablefigs.html
10. Avoid gratuitous fancy stuff
Avoid 3-D unless you have ‘truly’ 3-D results
Bonus Principle: Contact your statistician early and often!
Get a statistical colleague onboard
Engage a statistician.
But, “there is no such thing as a free lunch”
  For support with analyses for grants, they should be included as collaborator/consultant/co-investigator
Resources:
  Hollings Cancer Center: Biostatistics Shared Resource
  CTSA: Biostatistics, Epidemiology & Research Design Services.
References/Resources
Karl Broman’s top ten worst graphs: http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/
Reporting statistical results in your paper and figures: http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWstats.html
Open courseware: JHU statistics for laboratory scientists course
  http://ocw.jhsph.edu/index.cfm/go/viewCourse/course/StatisticsLaboratoryScientistsI/coursePage/index/
ARRIVE guidelines:
  http://www.nc3rs.org.uk/page.asp?id=1357
  http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000412
  http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001756
Acknowledgements: Thanks to my colleagues
Examples are drawn from work from the labs of the following MUSC investigators
  Chris Voelkel-Johnson
  Shikhar Mehrotra
  Mark Rubinstein
  Omar Moussa/Dennis Watson
Clinical data is based on work done in collaboration with Mario Eisenberger (Johns Hopkins)