QM STEM Ed 2018
Quantitative Methods in STEM Education Research
Topic 4: Inferential statistics
Judy Sheard
Faculty of Information Technology
Monash University, Australia
Overview of topic 4
Hypothesis testing
Central Limit Theorem
Level of significance
Z-scores
Confidence intervals
Categories of statistical tests
Descriptive vs. inferential statistics
Descriptive statistics — used to describe sets
of quantitative data. This involves
descriptions of distributions of data and
relationships between variables.
Inferential statistics — used to make
inferences about populations from analysis of
subsets (samples) of the population.
Inferential statistics
“In inferential statistics, statistics are measures of the
sample and parameters are measures of the
population. Inferences are made about the
parameters from the statistics”. (Wiersma, 1995,
p.363)
Inferences are made about a population based on a
subset or random sample of that population.
Note that in educational research it is often not possible to
have a random sample – instead we attempt to show that
the sample is typical of the population by comparing
demographics, e.g. gender, age, educational background.
Hypothesis testing
In inferential statistics, a hypothesis is used to determine whether an observation has an underlying cause or whether it was due to some random fluctuation or error in a sample.
The researcher will test to see if the hypothesis is consistent with the sample data – if not the hypothesis is rejected.
Two different ways of stating a hypothesis:
Looking for a difference between groups;
Looking for relationships between variables.
Hypothesis testing
On what basis do we accept or reject a hypothesis? Consider this example:
A set of exercises was designed to encourage reflection on program design. It was hypothesized that these exercises improved students’ skills in program design. This method was used on a class of 30 students. In a test on program design, the class scored a mean of 60% with a standard deviation of 10. The same test on another class that had not used these exercises, resulted in a mean score of 55% with a standard deviation of 12.
Does the hypothesis seem reasonable? What if the class mean was 70%? What about 57%?
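One rough way to put a number on this intuition is to express the difference between the two class means in units of its standard error. The sketch below is only illustrative: it assumes the comparison class also had 30 students (the example does not give its size) and treats the reported standard deviations as if they were known. The function name is hypothetical.

```python
import math

def standardized_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two group means divided by its standard error."""
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / standard_error

# Comparison class: mean 55, SD 12. Its size (30) is an assumption.
for class_mean in (57, 60, 70):
    z = standardized_difference(class_mean, 10, 30, 55, 12, 30)
    print(f"class mean {class_mean}%: difference is {z:.2f} standard errors")
```

The larger this value, the harder it is to attribute the difference to random fluctuation in the samples alone.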
Null hypothesis
In inferential statistics we test the opposite of a research hypothesis using the null hypothesis. For example:
Research hypothesis: Skills in program design will be improved with the use of exercises to encourage reflection on program design.
Null hypothesis: There will be no difference in skill levels in program design between students who have completed exercises to encourage reflection on program design and those who have not.
Research hypothesis: The performance of introductory programming students is related to prior programming experience.
Null hypothesis: There is no relationship between programming performance and prior programming experience.
If your study finds a statistically significant difference or relationship, then you can reject the null hypothesis (H0) and state that there is support for your research hypothesis (H1).
Sampling distribution
We need more than intuition here.
We will connect probability with a statistic — using
the concept of a sampling distribution of the
statistic.
A sampling distribution consists of the values of a
statistic computed from all possible samples of a
given size. (Wiersma, 2005, p.375).
Note that the sampling distribution is not the
sample distribution.
What does this mean?
We have a population.
We can take a sample of size n from the
population and compute a statistic of this
sample, e.g. the mean.
We take all possible samples of size n and
compute the statistic of these samples.
We now have a distribution of the statistic.
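The steps above can be sketched for a very small population, where it is feasible to list every possible sample. The population values and the sample size below are hypothetical, and samples are drawn without replacement.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]   # hypothetical population values
n = 2                           # sample size

# Every possible sample of size n (without replacement) and its mean.
sample_means = [mean(sample) for sample in combinations(population, n)]

print(sorted(sample_means))
print("mean of the sample means:", mean(sample_means))
print("population mean:", mean(population))
```

The list of sample means is the sampling distribution of the mean for this population and sample size. It is a distribution of a statistic, not of the raw data in any one sample.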
Central limit theorem
The shape, location (central tendency) and
variability (dispersion) of the sampling
distribution are described by the central limit
theorem.
The central limit theorem (CLT) states:
Given any population, the distribution of the
sample mean is approximately a normal
distribution, provided the sample size is large.
This is the key theorem in statistics!
Central limit theorem
The central limit theorem specifies that the
sampling distribution of the mean has a mean
equal to the population mean (μ), a standard
deviation equal to σ/√n, and is normally
distributed.
(σ is the standard deviation of the population)
Some simulations to illustrate this:
http://www.stat.sc.edu/~west/javahtml/CLT.html
http://www.rand.org/statistics/applets/clt.html
http://en.wikipedia.org/wiki/Concrete_illustration_of_the_central_limit_theorem
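A similar simulation can be run locally. The sketch below draws repeated samples from a deliberately non-normal population (an exponential distribution with mean 1 and standard deviation 1, chosen here only for illustration) and compares the mean and standard deviation of the sample means with μ and σ/√n. The sample size, number of samples and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable

# Right-skewed population: exponential with mean 1 and SD 1.
population_mean = 1.0
population_sd = 1.0
n = 50                   # sample size
num_samples = 5000       # number of repeated samples

sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("SD of sample means:  ", round(statistics.stdev(sample_means), 3))
print("sigma / sqrt(n):     ", round(population_sd / n ** 0.5, 3))
```

A histogram of sample_means should look roughly bell-shaped even though the population itself is strongly skewed.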
Level of significance
The level of significance is a probability used in testing hypotheses. It is a criterion used in making a decision about the hypothesis.
The common level used in educational research is 0.05. Occasionally other levels are used: 0.01, 0.001 and 0.1.
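A level of 0.05 means that the null hypothesis is rejected when the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true, is lower than 0.05.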
It then follows that if the null hypothesis is true it will only be rejected 5% of the time.
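The 5% claim can be checked by simulation. The sketch below repeatedly samples from a population in which the null hypothesis really is true and counts how often a two-sided z-test at the 0.05 level rejects it. The population values (mean 50, standard deviation 10), the sample size and the seed are hypothetical.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(2)
alpha = 0.05
critical_z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96, two-sided test

mu, sigma, n = 50.0, 10.0, 25    # hypothetical population; H0 (mean = 50) is true
num_studies = 10000

rejections = 0
for _ in range(num_studies):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    z = (fmean(sample) - mu) / (sigma / n ** 0.5)
    if abs(z) > critical_z:
        rejections += 1

rejection_rate = rejections / num_studies
print("proportion of true null hypotheses rejected:", rejection_rate)
```

The proportion should come out close to the chosen level of significance.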
We now connect the sampling distribution with the level of significance.
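The “68.3 - 95.4 - 99.7” rule
In a normal distribution, about 68.3% of values lie within one standard deviation of the mean, about 95.4% within two, and about 99.7% within three.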
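The percentages in this rule are central areas under the normal curve and can be computed directly. A minimal sketch using the error function from the Python standard library; the function name is illustrative.

```python
import math

def central_area(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * central_area(k):.1f}%")
```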
Z-score
The z-score (also called standard score) indicates how far, and in what direction, that score deviates from its distribution's mean, expressed in units of the distribution's standard deviation.
The formula for creating z-scores is:
z = (x − μ) / σ
Where:
x is a raw score to be standardized
μ is the mean of the population
σ is the standard deviation of the population
Standard z-score
The z-score indicates if a score was above or below the distribution mean.
A z-score of +1 indicates one standard deviation above the population mean.
A z-score of -1 indicates one standard deviation below the population mean.
For example, a mark of 53 on a test where the mean of all marks was 67 and the standard deviation of marks was 7 would give a standard score of -2.0.
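The worked example can be reproduced with a one-line function. This is a minimal sketch and the function name is illustrative.

```python
def z_score(x, mu, sigma):
    """Standard score of raw score x for a population with mean mu and SD sigma."""
    return (x - mu) / sigma

# The example above: mark 53, mean 67, standard deviation 7.
print(z_score(53, 67, 7))   # -2.0
```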
Properties of standard scores
A z-score makes it possible to compare scores from different distributions.
z-scores have the following properties:
The mean of any set of z-scores is zero.
The standard deviation of any set of z-scores is always equal to 1.
The distribution of z-scores has the same shape as the distribution of raw scores from which they were derived.
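The first two properties can be checked numerically. The sketch below standardizes a small set of illustrative marks, treating the set itself as the population, so the population standard deviation (pstdev) is used.

```python
from statistics import fmean, pstdev

marks = [83, 47, 34, 23, 85, 33, 84, 83, 72, 94, 30]   # illustrative raw scores

mu = fmean(marks)
sigma = pstdev(marks)            # SD of the whole set, treated as the population
z_scores = [(x - mu) / sigma for x in marks]

print("mean of z-scores:", round(fmean(z_scores), 10))
print("SD of z-scores:  ", round(pstdev(z_scores), 10))
```

Up to floating-point rounding, the mean is 0 and the standard deviation is 1, whatever the original marks were.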
Confidence intervals
A confidence interval specifies a range within which we can have some degree of confidence of finding another value, usually the population mean.
To construct a confidence interval based on the normal distribution we need:
a random sample of size n
the sample mean
the standard deviation of the population
a level of confidence
Defining confidence intervals
To find the lower (L) and upper (U) limits for a
confidence interval we use the following:
L = x̄ − z·σ/√n
U = x̄ + z·σ/√n
Where:
x̄ is the sample mean
n is the sample size
σ is the standard deviation of the population
z is a z-score indicating the confidence level
Confidence intervals
Increasing the confidence level widens the
confidence interval.
Increasing the sample size narrows the
confidence interval.
Increasing the standard deviation makes the
interval wider.
Common confidence levels are 90%, 95%,
99% - but we can specify any level below
100%.
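These three effects follow from the half-width of a normal-based interval, z·σ/√n. The sketch below varies one quantity at a time. The baseline values (95% confidence, σ = 10, n = 10) and the function name are illustrative choices.

```python
from statistics import NormalDist

def half_width(confidence, sigma, n):
    """Half-width z * sigma / sqrt(n) of a normal-based confidence interval."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma / n ** 0.5

base = half_width(0.95, sigma=10, n=10)
print("baseline (95%, sigma=10, n=10):", round(base, 2))
print("higher confidence (99%):       ", round(half_width(0.99, 10, 10), 2))
print("larger sample (n=40):          ", round(half_width(0.95, 10, 40), 2))
print("larger SD (sigma=20):          ", round(half_width(0.95, 20, 10), 2))
```

Because n appears under a square root, quadrupling the sample size only halves the width of the interval.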
Choosing the z-score
For 95% confidence we choose a central area of 0.95 on the standard normal curve. This area lies between z = −1.96 and z = +1.96.
For 90% confidence we choose a central area of 0.90 on the standard normal curve. This area lies between z = −1.645 and z = +1.645.
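The z-score for any confidence level can be obtained from the inverse of the standard normal cumulative distribution. A minimal sketch using NormalDist from the Python standard library (available from Python 3.8); the function name is illustrative.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """z such that the central area between -z and +z equals `confidence`."""
    tail = (1 - confidence) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}")
```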
Example
The numbers below were randomly drawn from
a normal population with σ = 10.
56.87, 73.96, 59.77, 75.89, 71.60, 81.94, 69.11,
80.07, 74.70, 63.32
The sample mean = 70.72 and we want a 95%
confidence interval. So,
L = 70.72 − 1.96 × 10/√10 = 64.52
U = 70.72 + 1.96 × 10/√10 = 76.92
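The same interval can be recomputed from the ten values. A minimal sketch that uses only the numbers given in the example; the variable names are illustrative.

```python
from statistics import fmean

sample = [56.87, 73.96, 59.77, 75.89, 71.60, 81.94, 69.11, 80.07, 74.70, 63.32]
sigma = 10          # population standard deviation, given in the example
z = 1.96            # z-score for 95% confidence

sample_mean = fmean(sample)
margin = z * sigma / len(sample) ** 0.5

lower = sample_mean - margin
upper = sample_mean + margin
print(f"mean = {sample_mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```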
Example cont.
So we are 95% confident that the population mean is between 64.52 and 76.92.
What does this really mean?
Would you get the same result from another random sample of size 10?
What if you took another 100 samples and constructed 100 confidence intervals? They would all be different, and on average about 5% of them would not contain the population mean.
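This interpretation can be checked by simulation. The sketch below assumes a hypothetical true population mean of 70 (in practice it is unknown), keeps σ = 10 and n = 10 as in the example, and counts how many of the simulated 95% intervals contain the true mean. A larger number of intervals than 100 is used so the proportion is stable.

```python
import random
from statistics import fmean

rng = random.Random(3)
mu, sigma, n = 70.0, 10.0, 10     # mu is a hypothetical "true" mean
z = 1.96
num_intervals = 2000

covered = 0
for _ in range(num_intervals):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    centre = fmean(sample)
    margin = z * sigma / n ** 0.5
    if centre - margin <= mu <= centre + margin:
        covered += 1

coverage = covered / num_intervals
print("proportion of intervals containing the population mean:", coverage)
```

The 95% refers to the long-run behaviour of the procedure, not to any single interval.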
The standard error
The standard error of the sample mean is:
σ_x̄ = σ/√n
You can see that the standard error gets smaller
as the sample size increases.
The standard error also shows up in the confidence
interval formula:
x̄ ± z·σ/√n
This is why the interval gets smaller as n increases.
Null hypothesis
The null hypothesis H0 is the hypothesis of no difference or no relationship.
But there is a possibility of a wrong decision.
Reducing the risk of one error increases the risk of another error.
Researcher’s decision (rows) against the “state of the world”, i.e. the actual situation (columns):
                 H0 True                        H0 False
Accept H0        Correct                        Error (Type II error, p = β)
Reject H0        Error (Type I error, p = α)    Correct, p = 1 − β (power)
Type I and Type II errors
A Type I error occurs when the decision is to
reject the null hypothesis when it is actually
true. The probability of this error equals the
significance level and is symbolized by α.
A Type II error occurs when the decision is not
to reject the null hypothesis when it is actually
false. Its probability is symbolized by β.
Power of a statistical test
The power of a statistical test is the probability that the
test will lead to a decision to reject the null
hypothesis when the null hypothesis is indeed false.
How to increase the power of a test:
Increase the significance level
Increase the sample size
Reduce variability – e.g. use homogeneous groups,
exclude outliers
http://www.cas.buffalo.edu/classes/psy/segal/2072001/Hyptest/Hyptsting.htm
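The effect of each of these factors can be seen in the power of a simple one-sided z-test, where power = 1 − Φ(z_α − δ√n/σ) and δ is the true difference from the null value. The sketch below is illustrative only: the baseline values (δ = 5, σ = 12, n = 30, α = 0.05) are hypothetical, and the function name is not from any library.

```python
from statistics import NormalDist

def power_one_sided_z(effect, sigma, n, alpha):
    """Power of a one-sided z-test when the true mean exceeds the null mean by `effect`."""
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1 - alpha)
    shift = effect / (sigma / n ** 0.5)     # true difference in standard-error units
    return 1 - std_normal.cdf(critical_z - shift)

base = power_one_sided_z(effect=5, sigma=12, n=30, alpha=0.05)
print("baseline power:           ", round(base, 3))
print("higher significance level:", round(power_one_sided_z(5, 12, 30, 0.10), 3))
print("larger sample:            ", round(power_one_sided_z(5, 12, 60, 0.05), 3))
print("less variability:         ", round(power_one_sided_z(5, 8, 30, 0.05), 3))
```

Raising the significance level buys power at the cost of a higher Type I error rate, which is the trade-off between the two kinds of error.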
Power
[Figure: the predicted distribution and the actual distribution]
Review of inferential statistics reasoning
We have a population that we wish to make
measures of – parameters.
We select a random sample and compute
measures of the sample – statistics.
The statistics reflect the corresponding
parameters and sampling distribution.
We observe the statistics, and infer back to
the parameters in the light of the sampling
distribution and probability.
Analysis using inferential statistics
Data may be analysed using inferential statistics.
A common process is through hypothesis testing.
The role of hypothesis testing is to determine whether
the result obtained from analysis occurred by
chance.
The null hypothesis is tested through these statistical
tests.
There are numerous tests.
There are many different sampling distributions.
Tests using inferential statistics
Broadly two categories:
Parametric analyses – interval scale
measurement and assumptions about the
population.
Nonparametric analyses – typically nominal and
ordinal scale measurement and generally no
assumptions about the population.
Deciding upon which test to use
Factors to consider:
Number of independent and dependent variables.
Measurement levels of independent and dependent variables.
Related vs. non-related variables (only relevant when comparing groups).
Number of categories for the independent variables.
Why is measurement important?
Compute the average mark of these results:
83 47 34 23 85 33 84 83 72 94 30
Compute the average hair colour:
black brown red blonde blonde blonde black red
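The contrast between the two tasks shows up as soon as they are attempted in code. In the sketch below, the marks (interval-level data) have a mean, while hair colour (nominal data) only supports counting, so the most common category is reported instead. The variable names are illustrative.

```python
from statistics import mean, mode

marks = [83, 47, 34, 23, 85, 33, 84, 83, 72, 94, 30]
hair = ["black", "brown", "red", "blonde", "blonde", "blonde", "black", "red"]

print("average mark:", round(mean(marks), 2))
print("most common hair colour:", mode(hair))

# mean(hair) fails with a TypeError: an average is not defined for nominal data.
```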
Statistical vs. practical significance
We may be able to demonstrate statistical
significance but the effects may be very
small.
Hence, the effects may be statistically
significant but not practically significant.
Perhaps the cost of the new teaching method
or technology may outweigh the benefits
gained.
Meta analysis
Quantitative methods are typically applied to individual
studies.
Quantitative methods may also be used to review
results across studies – this is called meta analysis.
Vote counting – this technique has poor power.
p-values – uses the size of the p-value.
Lost studies – estimates of numbers of studies not
published due to non-significant results.
Effect sizes – estimates of the effect size of studies are
compared.
Categories of statistical tests
Differences between groups
Degree of relationship between variables
Clustering of variables or individuals
Analyses across time
In the remaining lectures we will review some
of the tests most commonly used in educational
research.