Carolin Kosiol
Institute of Population Genetics
Vetmeduni Vienna
Spezielle Statistik in der Biomedizin
WS 2014/15
Normal Distribution
Central Limit Theorem:
If you take repeated samples from a population and
calculate their averages, then these averages will be
approximately normally distributed (the approximation
improves as the sample size grows).
Let’s demonstrate it for ourselves:
means <- numeric(10000)
# draw 10000 samples of size 5 from a uniform distribution on [0, 10]
for (i in 1:10000){
  means[i] <- mean(runif(5)*10)
}
hist(means, ylim=c(0,1600))
How close is this to a normal distribution?
mean(means)
sd(means)
Probability density function:
xv <- seq(0,10,0.1)
yv <- dnorm(xv, mean=4.998581, sd=1.2899)*5000
lines(xv,yv)
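The factor 5000 in the overlay is consistent with 10000 simulated means times a histogram bin width of 0.5, which turns a density into expected counts per bin. A minimal sketch of the same demonstration, assuming that bin width (set explicitly here through `breaks`) and an arbitrary seed, and using the simulated mean and standard deviation instead of hard-coded values:

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
n_rep <- 10000                     # number of repeated samples
means <- replicate(n_rep, mean(runif(5) * 10))

bin_width <- 0.5                   # assumed bin width, fixed via breaks below
hist(means, breaks = seq(0, 10, by = bin_width), ylim = c(0, 1600))

# Scale the density to counts: number of values times bin width
xv <- seq(0, 10, 0.1)
yv <- dnorm(xv, mean = mean(means), sd = sd(means)) * n_rep * bin_width
lines(xv, yv)
```

With these settings the scaling factor `n_rep * bin_width` equals 5000, so the curve sits on the same scale as the histogram counts.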
Cumulative Probability
pnorm(-2)
Just a bit less than 2.5% of values will lie more than 2
standard deviations below the mean
pnorm(-1)
About 16% of random samples will lie more than 1
standard deviation below the mean.
1-pnorm(3)
The probability of a sample from a Normal distribution being
more than 3 standard deviations above the mean is less than 0.2%
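The tail probabilities above can be combined into the probability of falling within k standard deviations of the mean. A small sketch for the standard normal distribution (the names `k` and `within_k` are chosen here for illustration):

```r
# Probability of lying within k standard deviations of the mean
k <- 1:3
within_k <- pnorm(k) - pnorm(-k)
round(within_k, 4)                 # 0.6827 0.9545 0.9973

# By symmetry, the lower tail below -k equals the upper tail above k
all.equal(pnorm(-k), 1 - pnorm(k))
```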
Properties of the Normal Distribution
Quantiles of the Normal Distribution
qnorm(c(0.025,0.975))
Suppose we have measured the height of 300 horses
horses = rnorm(300, 135, 5)
hist(horses, xlab="Height at withers", ylab="Frequency")
For large sample sizes n the histogram approaches
a Gaussian distribution
horses = rnorm(10000, 135, 5)
hist(horses, 20, xlim=c(100,180), xlab="Height at withers",
     ylab="Frequency")
Plot for Testing Normality of Single Samples
Quantile-quantile plot
# store each sample so that qqnorm and qqline use the same data
x <- rnorm(100, mean = 5, sd = 3)
qqnorm(x)
qqline(x, col = 2)
x <- rnorm(1000, mean = 5, sd = 3)
qqnorm(x)
qqline(x, col = 2)
Student's t-distribution
Used instead of the Normal distribution when sample
sizes are small (n<30)
Shaped like normal distribution, but heavier tails
The equivalents to pnorm and qnorm are pt and qt
Quantiles of the Student t-distribution
qt(0.975,5)
2.570582
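The heavier tails show up as larger critical values than the normal 1.96, and the gap shrinks as the degrees of freedom grow. A brief sketch comparing the two (the set of degrees of freedom is chosen here only for illustration):

```r
df <- c(2, 5, 10, 30, 100)         # illustrative degrees of freedom
t_quantiles <- qt(0.975, df)

# 97.5% quantiles of the t-distribution next to the normal quantile
data.frame(df = df,
           t = round(t_quantiles, 3),
           normal = round(qnorm(0.975), 3))
```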
t-test
Comes in different "flavors"
One sample t-test to compare mean of the sample to
a known value
Two sample t-test for two independent samples
comparing the means of the two samples (see
example on the next slides)
Paired sample t-test compares the means of two
variables when measures are taken on the same
individuals (e.g. before and after treatment)
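The three flavors map onto different arguments of R's `t.test` function. A minimal sketch with simulated data; the vectors `before`, `after` and `other`, their parameters, and the seed are hypothetical and only serve as illustration:

```r
set.seed(42)                                      # arbitrary seed
before <- rnorm(20, mean = 10, sd = 2)            # hypothetical measurements
after  <- before + rnorm(20, mean = 1, sd = 1)    # same individuals after treatment
other  <- rnorm(20, mean = 12, sd = 2)            # independent second sample

one_sample <- t.test(before, mu = 10)                  # sample mean vs. known value
two_sample <- t.test(before, other, var.equal = TRUE)  # two independent samples
paired     <- t.test(before, after, paired = TRUE)     # paired measurements

c(one = one_sample$p.value, two = two_sample$p.value, paired = paired$p.value)
```

Note that `t.test` uses `var.equal = FALSE` by default, which gives the Welch test; `var.equal = TRUE` requests the classical two sample t-test with pooled variance.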
Assumptions of the t-test
1) normal distribution of data
2) variance homogeneity (equal variances) of the 2
samples
The assumptions need to be checked!
If the data are not normally distributed -> use the
Wilcoxon test instead (nonparametric test)
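One possible way to check the two assumptions in R is sketched below. The choice of the Shapiro-Wilk test and the F test is an assumption made here, not something the slides prescribe, and the samples `a` and `b` are simulated placeholders:

```r
set.seed(7)                              # arbitrary seed
a <- rnorm(30, mean = 5, sd = 2)         # hypothetical sample 1
b <- rnorm(30, mean = 6, sd = 2)         # hypothetical sample 2

# 1) Normality: Shapiro-Wilk test per sample (qqnorm/qqline give a visual check)
shapiro_a <- shapiro.test(a)
shapiro_b <- shapiro.test(b)

# 2) Variance homogeneity: F test comparing the two variances
var_check <- var.test(a, b)

# Pick the test according to the normality check
if (shapiro_a$p.value > 0.05 && shapiro_b$p.value > 0.05) {
  result <- t.test(a, b, var.equal = var_check$p.value > 0.05)
} else {
  result <- wilcox.test(a, b)            # nonparametric alternative
}
result$p.value
```

A non-significant check does not prove that an assumption holds, so the plots (histogram, quantile-quantile plot) remain worth inspecting alongside the tests.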
t-test example: Davis dataset, balanced with 88 females and 88 males
t-Test for Unequal Sample Sizes
Summary: the “norm” family
rnorm(n, mu, sqrt(sigma^2)) simulates an iid sample
of size n from a normal distribution with mean mu and
variance sigma^2 (but note that the function expects the
standard deviation, i.e. the square root of the variance,
as its last argument!!!)
dnorm(x, mu, sqrt(sigma^2)) provides the value of the
normal probability density function (what is this?) at a
particular value x
pnorm(q, mu, sqrt(sigma^2)) calculates the area
under the density curve from negative infinity to the value q
qnorm(p, mu, sqrt(sigma^2)) is the inverse of pnorm,
i.e. it takes a probability p (an output of pnorm) and
returns the value q (why is this useful!?)
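How the family fits together can be checked numerically. A short sketch, reusing the mean of 135 and standard deviation of 5 from the horse example (the value `q <- 140` is picked here only for illustration):

```r
mu <- 135
sigma2 <- 25                              # variance; the functions need sqrt(sigma2) = 5
q <- 140

p <- pnorm(q, mu, sqrt(sigma2))           # area under the density up to q
q_back <- qnorm(p, mu, sqrt(sigma2))      # qnorm undoes pnorm and returns q

# pnorm is the integral of dnorm from negative infinity to q
area <- integrate(dnorm, -Inf, q, mean = mu, sd = sqrt(sigma2))$value

c(p = p, q_back = q_back, area = area)
```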
R exercise (develop your own code)
Using the "rnorm" function in R, simulate two samples
of size n=40; the first with mean µ=3.7 and the second
with mean µ=5.5 (average number of offspring for
dormice in conifer forest and broad-leaved forest
environments, respectively). Vary the standard deviation
of the samples (1,2,3,4,5,6).
Use a t-test to answer the following question: At which
standard deviation is the difference between the two
samples no longer significant (P > 0.05)?