Advanced statistics
"there are three kinds of lies: lies, damned lies and statistics" Disraeli
Definition
• procedure used in data exploration, organisation, presentation, analysis and interpretation, facilitating decision making
• separate real effect from random variation
• descriptive
– tables
– graphs
– measures
• inferential
– estimating size
– sampling
• hypothesis testing
• making models
Variability
• repeated measurements: 18.2 °C, 18.5 °C, 19.1 °C, 18.7 °C
• intra-population variability: 180 cm, 175 cm, 165 cm, 157 cm
• ecological variability, inter-population differences, ethnic differences = BIODIVERSITY
• temporal fluctuation (over time)
Data description
• data types: qualitative vs. quantitative
• frequency: histogram
• central tendency and position: mean, median, mode, quartiles, …
• variability: range, standard deviation, variance
• distribution: symmetric (sometimes normal), asymmetric
– symmetric: mean = median = mode
– asymmetric: the median and the mean differ
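The measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch; the sample is hypothetical (the four heights from the Variability slide plus three invented values) and serves only to show how a single large value pulls the mean above the median.

```python
import statistics

# Hypothetical sample of body heights in cm (illustrative values only).
heights = [157, 165, 165, 170, 175, 180, 198]

mean = statistics.mean(heights)
median = statistics.median(heights)
mode = statistics.mode(heights)
value_range = max(heights) - min(heights)
quartiles = statistics.quantiles(heights, n=4)  # three cut points: Q1, Q2, Q3
variance = statistics.variance(heights)         # sample variance (divides by n - 1)
std_dev = statistics.stdev(heights)             # square root of the sample variance

print(f"mean={mean:.1f} median={median} mode={mode} range={value_range}")
print(f"quartiles={quartiles}")
print(f"variance={variance:.1f} standard deviation={std_dev:.1f}")
```

In this sample the mean lies above the median, which is what the asymmetric case on the slide describes.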
Distributions
Inferential statistics
• population
• sample
Hypothesis testing
• drawing conclusions about a population by analyzing a sample
• research hypothesis
– null hypothesis
– alternative hypothesis
Statistical significance???
• observed difference: arising by chance, or a genuine experimental effect?
Statistical errors vs. measure of statistical significance
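• Type I: false positive
– the study showed an effect which in reality does not exist
• Type II: false negative
– an effect was there but the study missed it
• significance level (α): the accepted probability of a type I error
• P-value: the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true; the result is called significant when P < α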
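One way to see what a P-value measures is an exact permutation test: if the null hypothesis is true, the group labels are interchangeable, so the P-value is the share of all possible relabellings that give a difference at least as large as the observed one. The sketch below is an illustration under stated assumptions, not a method taken from the slides. The function name `permutation_p_value` and the `treated` values are hypothetical; the `control` values reuse the repeated temperature measurements from the Variability slide.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    count = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count

control = [18.2, 18.5, 19.1, 18.7]   # temperatures in °C
treated = [20.1, 20.4, 19.9, 20.6]   # hypothetical second group
alpha = 0.05                         # chosen significance level

p = permutation_p_value(control, treated)
print(f"P = {p:.4f}, significant at alpha = {alpha}: {p < alpha}")
```

Full enumeration is only practical for small samples; for larger ones a random subset of relabellings is commonly used instead.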
Statistical tests
• parametric (for normal, normal-like or transformed distributions)
• non-parametric (distribution-free)
• unpaired tests: comparison of a parameter between 2 or more groups of subjects
– parametric: independent t-test (two-sample t-test), ANOVA / MANOVA, regression, correlation (Pearson)
– non-parametric: Mann-Whitney, median test, rank correlation (Spearman, Kendall), chi-squared, Kruskal-Wallis ANOVA
• paired tests: comparison of a pair of parameters in subjects from one group at time intervals
– parametric: dependent t-test (one-sample), ANOVA
– non-parametric: Wilcoxon paired test
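The choice between an unpaired and a paired test matters in practice. The sketch below computes both t statistics for the same hypothetical before/after blood pressure readings (invented values, five subjects). The function names `unpaired_t` and `paired_t` are illustrative, and only the test statistics are computed, not P-values.

```python
from math import sqrt
from statistics import mean, stdev, variance

def unpaired_t(a, b):
    """Two-sample t statistic with pooled variance (independent groups)."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))

def paired_t(first, second):
    """Paired t statistic: a one-sample t-test on the within-subject differences."""
    diffs = [x - y for x, y in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical systolic blood pressure (mmHg) of the same five subjects.
before = [140, 152, 138, 160, 147]
after = [135, 148, 135, 154, 143]

t_unpaired = unpaired_t(before, after)
t_paired = paired_t(before, after)
print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

The paired statistic is much larger here because the differences between subjects, which dominate the pooled variance, cancel out when each subject is compared with itself.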
ANOVA - Analysis of Variance
• tests whether two or more samples could have been drawn from the same parent population, by comparing their means
• can be sub-classified by secondary or grouping variables (two-way ANOVA)
– e.g. blood pressure before and after two different treatments
– if some of the measurements were made in the same subjects, the analysis can be corrected for repeated measures
• comparing two samples with one-way ANOVA gives the same result as an unpaired t-test with pooled variance (F = t²)
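The link between one-way ANOVA and the unpaired t-test can be checked numerically: for two groups, the F statistic equals the square of the pooled-variance t statistic. The following is a minimal sketch with hypothetical data; the function names `one_way_anova_f` and `pooled_t` are illustrative.

```python
from math import sqrt
from statistics import mean, variance

def one_way_anova_f(*groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def pooled_t(a, b):
    """Unpaired two-sample t statistic with pooled variance."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Hypothetical measurements in two independent groups.
group_a = [120, 125, 130, 128]
group_b = [135, 138, 132, 140]

f_stat = one_way_anova_f(group_a, group_b)
t_stat = pooled_t(group_a, group_b)
print(f"F = {f_stat:.3f}, t squared = {t_stat ** 2:.3f}")
```

The same `one_way_anova_f` function accepts three or more groups, which is the case a single t-test cannot handle.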
Regression and Correlation
• Regression describes the mathematical relationship between two or more variables
– parametric analysis rules apply, so the data must either be normally distributed or be converted to a normal distribution by transformation
• Correlation indicates an association between two variables
– for rank correlation (Spearman, Kendall), non-parametric rules apply, as the association is calculated between the ranks of the variables, not the variables themselves
• correlation describes an association, not cause and effect!
Regression model
• a and b are the parameters of the regression model
• the emphasis is on predicting one variable from the other
• the least-squares criterion is used for goodness-of-fit
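A least-squares fit can be sketched in a few lines. The model equation is not preserved in this text, so the example assumes the common straight-line form y = a + b·x, with a as the intercept and b as the slope; that assignment of roles is an assumption. The data and the function name `fit_line` are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the assumed model y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope that minimises the sum of squared residuals
    a = y_bar - b * x_bar    # intercept: the fitted line passes through the means
    return a, b

# Hypothetical data that lie roughly on a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
prediction = a + b * 6  # predicting y for a new x value
print(f"a = {a:.2f}, b = {b:.2f}, predicted y at x = 6: {prediction:.2f}")
```

The last line reflects the emphasis on prediction: once a and b are estimated, the model gives a value of y for any x, although predictions far outside the observed range of x should be treated with caution.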
Correlation
• a measure of the degree of linear relationship between two variables (usually labeled X and Y)
• it is possible for two variables to be related (correlated), but not causing one another
• correlation coefficient (r)
– the sign of the correlation coefficient (+, -) defines the direction of the relationship
– the absolute value of the correlation coefficient measures the strength of the relationship
• r = 0.0 indicates the absence of a linear relationship and correlation
• coefficients of r = +1.0 and r = -1.0 indicate a perfect linear relationship
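The properties of r listed above can be illustrated by computing the Pearson coefficient directly, together with the Spearman rank coefficient (Pearson's r applied to ranks). This is a sketch with hypothetical data; the function names `pearson_r`, `ranks` and `spearman_rho` are illustrative.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient: strength of the linear relationship."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: y rises with x, but along a curve rather than a line.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Here rho reaches 1 because the ordering of the two variables agrees perfectly, while r stays below 1 because the relationship is not linear.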
Scatterplots
Chi-Square test
• a very useful, robust, simple test
• handles classification data (e.g. lived vs. died)
• most frequently used for the 2 x 2 cross-tabulation table
• null hypothesis: there is no difference between the survival rates in the two groups
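For a 2 x 2 table the test can be sketched by hand: expected counts come from the row and column totals, and the statistic sums (observed - expected)² / expected over the four cells. The counts below are hypothetical, the function name `chi_square_2x2` is illustrative, and no continuity (Yates) correction is applied. With one degree of freedom the upper-tail probability can be written with the complementary error function.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-squared statistic and P-value for a 2 x 2 table (1 df)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and outcome were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows are the two groups, columns are (lived, died).
table = [[30, 10],
         [18, 22]]

chi2, p_value = chi_square_2x2(table)
print(f"chi-squared = {chi2:.2f}, P = {p_value:.4f}")
```

The approximation behind the test becomes unreliable when expected counts are small; an exact test is usually preferred in that case.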