Hypothesis testing

Date post: 04-Aug-2015
Upload: sohail-patel
Transcript
Page 1: Hypothesis testing

Hypothesis testing and interpretation of data

Testing Of Hypothesis

The basic logic of hypothesis testing is to support or refute the research question. When a researcher conducts quantitative research, he/she is attempting to answer a research question or hypothesis that has been formulated. One method of evaluating this research question is a process called hypothesis testing, which is sometimes also referred to as significance testing.

Example:

Two lecturers, Sandy and Mandy, each think that they use the best method to teach their students. Each lecturer has 50 statistics students who are studying for a graduate degree in management. In Sandy's class, students have to attend one lecture and one seminar class every week, whilst Mandy believes that lectures are sufficient by themselves and that students can study in their own time. This is the first year that Sandy has given seminars, but since they take up a lot of her time, she wants to make sure that she is not wasting it and that seminars improve her students' performance.

The Research Hypothesis

The first step in hypothesis testing is to set a research hypothesis. In Sandy and Mandy's study, the aim is to examine the effect that two different teaching methods – providing both lectures and seminar classes (Sandy), and providing lectures only (Mandy) – have on the performance of the students. More specifically, they want to determine whether performance differs between the two teaching methods. Whilst Mandy is skeptical about the


Page 2: Hypothesis testing

effectiveness of seminars, Sandy clearly believes that her students do better than those in Mandy's class. This leads to the following research hypothesis:

Research hypothesis: When students attend seminar classes, in addition to lectures, their performance increases.

By taking a hypothesis testing approach, Sandy and Mandy want to generalize their results to a population (all students) rather than just the students in their sample. However, in order to use hypothesis testing, one needs to re-state the research hypothesis as a null and an alternative hypothesis.

Null hypothesis: the null hypothesis (H0) is a hypothesis which the researcher tries to disprove, reject or nullify. A null hypothesis is "the hypothesis that there is no relationship between two or more variables", symbolized as H0.

Alternative hypothesis: the alternative, or research, hypothesis proposes a relationship between two or more variables, symbolized as H1.

Decision errors

Two types of errors can result from a hypothesis test.

Type I error: A Type I error occurs when the researcher rejects a null hypothesis when it is true. The probability of committing a Type I error is called the significance level. This probability is also called alpha, and is often denoted by α.

Type II error: A Type II error occurs when the researcher fails to reject a null hypothesis which is false. The probability of committing a Type II error is called beta, and is often denoted by β. The probability of not committing a Type II error is called the power of the test.
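These two error rates can be illustrated with a small simulation: if we repeatedly sample from a population where the null hypothesis really is true, the fraction of tests that (wrongly) reject should settle near α. A minimal sketch in Python; the population values (μ0 = 100, σ = 15) and sample size are arbitrary choices for illustration:

```python
import random
from statistics import NormalDist

random.seed(42)
nd = NormalDist()                        # standard normal distribution
alpha = 0.05                             # significance level
mu0, sigma, n = 100.0, 15.0, 30          # hypothetical population and sample size
trials = 2000
false_rejections = 0

for _ in range(trials):
    # H0 is true: the data really do come from a population with mean mu0
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
    p = 2 * nd.cdf(-abs(z))              # two-tailed p-value
    if p <= alpha:
        false_rejections += 1            # a Type I error

print(false_rejections / trials)         # long-run Type I rate, typically close to alpha
```

Lowering α reduces Type I errors but, for a fixed sample size, raises the chance of a Type II error.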


Page 3: Hypothesis testing

Steps/ procedures in Hypothesis Testing

1. Identify the research problem:

The first step is to state the research problem. The research problem needs to identify the population of interest and the variables under investigation.

Example of research problem: To find out the effectiveness of two teaching methods (the only-lecture method and the lecture-cum-seminar method) with reference to the exam marks of the students.


Page 4: Hypothesis testing

In the above research problem, the population of interest refers to the students, and the variables include the teaching methods and the marks.

This step enables the researcher not only to define what is to be tested but also what variable(s) will be used in sample data collection. The type of variable(s), whether categorical, discrete or continuous, further defines the statistical tests which can be performed on the collected data.

2. Specify the null and alternative hypotheses:

The research problem or question is converted into a null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false.

(a) Null Hypothesis: A null hypothesis (H0) is a statement that declares the observed difference is due to "chance". It is the hypothesis the researcher hopes to reject or disprove.

A null hypothesis states that there is no relationship between two or more variables. The simplistic definition of the null is as the opposite of the alternative hypothesis (H1).

Example: "There is no difference between the two methods of teaching (only-lecture method, and lecture-cum-seminar method) on the scoring of marks of students."

(b) Alternative Hypothesis:

The alternative hypothesis proposes a relationship between two or more variables, symbolized as H1.

Example: “The lecture-cum-seminar method improves the scoring of marks of students as

compared to the only lecture method.”


Page 5: Hypothesis testing

Note that the two hypotheses we propose to test must be mutually exclusive, i.e., when one is true the other must be false. They must also be exhaustive; that is, they must include all possible occurrences.

From the above, it is clear that the null hypothesis is a hypothesis of no difference. The main problem of testing of hypothesis is to accept or to reject the null hypothesis. The alternative hypothesis specifies a definite relationship between the two variables. Only one alternative hypothesis is tested against the null hypothesis.

3. Significance Level:

After formulating the hypotheses, the researcher must determine a certain level of significance.

The confidence with which a null hypothesis is accepted or rejected depends on the level of

significance.

Generally, the level of significance falls between 5% and 1%:

A significance level of 5% means the risk of making a wrong decision (accepting a false hypothesis or rejecting a true hypothesis) is 5 times out of 100 occasions.

A significance level of 1% means the risk of making a wrong decision is 1%. This means the researcher may make a wrong decision in accepting a false hypothesis or in rejecting a true hypothesis once out of 100 occasions. Therefore, a 1% level of significance provides greater confidence with which the null hypothesis is accepted or rejected as compared to a 5% level of significance.

4. Test Statistic:


Page 6: Hypothesis testing

A test statistic is a statistic used to test the null hypothesis. The researcher needs to identify a test statistic that can be used to assess the truth of the null hypothesis; it is used to decide whether the null hypothesis set up should be accepted or rejected.

The test statistic is calculated from the collected data. There are different types of test statistics. For instance, the z statistic compares the observed sample mean to an expected population mean μ0. Large test statistics indicate data are far from expected, providing evidence against the null hypothesis and in favor of the alternative hypothesis.

Every test in statistics works the same way. Based on the sample data, it gives the probability (p-value) of observing a result at least as extreme as the one obtained. When the p-value is low, the sample data are very significant, indicating that the null hypothesis is wrong. When the p-value is high, it suggests that the collected data are within the normal range.

5. Region of Acceptance and Region of Rejection:

The region of acceptance is a range of values. If the test statistic falls within the region of acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that the chance of making a Type I error is equal to the alpha (α) level of significance.

Type I error: a rejection of a true null hypothesis.

The set of values outside the region of acceptance is called the region of rejection. If the test statistic falls within the region of rejection, the null hypothesis is rejected at the alpha (α) level of significance.
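For a z statistic the boundary between the two regions can be computed directly from α with the inverse normal CDF. A short sketch, assuming a two-tailed test (the helper name is illustrative):

```python
from statistics import NormalDist

alpha = 0.05
# Critical value for a two-tailed z test: acceptance region is |z| <= z_crit
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_crit, 2))  # 1.96

def in_region_of_acceptance(z, z_crit=z_crit):
    """True if the test statistic falls in the acceptance region (fail to reject H0)."""
    return abs(z) <= z_crit
```

With α = 0.05 two-tailed, a statistic of z = 1.5 falls in the acceptance region while z = 2.5 falls in the rejection region.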

6. Select an Appropriate Test:


Page 7: Hypothesis testing

A hypothesis test may be one-tailed or two-tailed. Whether the test is one-sided or two-sided depends on the alternative hypothesis and the nature of the problem.

A test of a statistical hypothesis where the region of rejection is on only one side of the sampling distribution is called a one-tailed test. For example, suppose the null hypothesis states that the mean is less than or equal to 10. The alternative hypothesis would be that the mean is greater than 10. The region of rejection would consist of a range of numbers located on the right side of the sampling distribution; that is, a set of numbers greater than 10.

In simple words, in a one-tailed test, the test statistic for rejection of the null hypothesis falls on only one side of the sampling distribution curve.

Significance Level

In hypothesis testing, the significance level is the criterion used for rejecting the null hypothesis. The significance level is used in hypothesis testing as follows: First, the difference between the results of the experiment and the null hypothesis is determined. Then, assuming the null hypothesis is true, the probability of a difference that large or larger is computed. Finally, this probability is compared to the significance level. If the probability is less than or equal to the significance level, then the null hypothesis is rejected and the outcome is said to be statistically significant.

Traditionally, experimenters have used either the 0.05 level (sometimes called the 5% level) or the 0.01 level (1% level), although the choice of level is largely subjective. The lower the significance level, the more the data must diverge from the null hypothesis to be


Page 8: Hypothesis testing

significant. Therefore, the 0.01 level is more conservative than the 0.05 level. The Greek letter alpha (α) is sometimes used to indicate the significance level. See also: Type I error and significance test
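The decision rule just described (compute the probability of a difference at least that large under H0, then compare it to the significance level) can be sketched as a small helper. This is an illustrative snippet, not from any statistics library; it assumes a z statistic with a standard normal null distribution:

```python
from statistics import NormalDist

def p_value(z, tail="two"):
    """p-value for an observed z statistic under a standard normal null distribution."""
    nd = NormalDist()
    if tail == "upper":
        return 1 - nd.cdf(z)
    if tail == "lower":
        return nd.cdf(z)
    return 2 * nd.cdf(-abs(z))              # two-tailed

def is_significant(z, alpha=0.05, tail="two"):
    """Reject H0 when the p-value does not exceed the significance level."""
    return p_value(z, tail) <= alpha

print(is_significant(2.5))                  # True at the 0.05 level
print(is_significant(2.5, alpha=0.01))      # False at the stricter 0.01 level
```

The same observed statistic can be significant at 0.05 but not at 0.01, which is exactly the sense in which the 0.01 level is more conservative.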


Page 9: Hypothesis testing

5) Identify the rejection region

• Is it an upper-, lower-, or two-tailed test?

• Determine the critical value associated with α, the level of significance of the test.

The next step is to compute the probability value (also known as the p-value). This is the probability, computed assuming the null hypothesis is true, of obtaining a sample statistic as different from, or more different from, the parameter specified in the null hypothesis than the one observed.


Page 13: Hypothesis testing

PARAMETRIC TESTS

1. Descriptive Statistics – overview of the attributes of a data set. These include measurements of central tendency (frequency histograms, mean, median, & mode) and dispersion (range, variance & standard deviation)

2. Inferential Statistics - provide measures of how well data support a hypothesis and whether the data are generalizable beyond what was tested (significance tests)

Data: Observations recorded during research

Types of data:


Page 14: Hypothesis testing

1. Nominal data: synonymous with categorical data; names/categories are assigned based on characteristics, without ranking between categories. Ex.: male/female, yes/no, death/survival

2. Ordinal data: ordered or graded data, expressed as scores or ranks. Ex.: pain graded as mild, moderate and severe

3. Interval data: an equal and definite interval between two measurements; it can be continuous or discrete. Ex.: weight expressed as 20, 21, 22, 23, 24, where the interval between 20 & 21 is the same as between 23 & 24


Page 15: Hypothesis testing

Parametric hypothesis tests are frequently used to assess estimates of population parameters or to test whether estimates of a given parameter are equal for two samples.

Parametric hypothesis tests set up a null hypothesis against an alternative hypothesis, testing, for instance, whether or not the population mean is equal to a certain value, and then use an appropriate statistic to calculate the probability of obtaining the observed data if the null hypothesis were true. You can then reject or accept the null hypothesis based on that calculated probability.


Page 16: Hypothesis testing

Z test

The z-test is based on the normal probability distribution and is used for judging the significance of several statistical measures, particularly the mean. The relevant test statistic, z, is worked out and compared with its probable value (to be read from a table showing the area under the normal curve) at a specified level of significance for judging the significance of the measure concerned. This is one of the most frequently used tests in research studies. This test is used even when the binomial distribution or t-distribution is applicable, on the presumption that such a distribution tends to approximate the normal distribution as n becomes larger.

The z-test is generally used for comparing the mean of a sample to some hypothesised mean for the population in the case of a large sample, or when the population variance is known. The z-test is also used for judging the significance of the difference between the means of two independent samples in the case of large samples, or when the population variance is known. The z-test is also used for comparing the sample proportion to a


Page 17: Hypothesis testing

theoretical value of the population proportion, or for judging the difference in proportions of two independent samples when n happens to be large. Besides, this test may be used for judging the significance of the median, mode, coefficient of correlation and several other measures.

The t-test is based on the t-distribution and is considered an appropriate test for judging the significance of a sample mean, or for judging the significance of the difference between the means of two samples, in the case of small sample(s) when the population variance is not known (in which case we use the variance of the sample as an estimate of the population variance). In case two samples are related, we use the paired t-test (or what is known as the difference test) for judging the significance of the mean of the differences between the two related samples. It can also be used for judging the significance of the coefficients of simple and partial correlations. The relevant test statistic, t, is calculated from the sample data and then compared with its probable value based on the t-distribution (to be read from the table that gives probable values of t for different levels of significance for different degrees of freedom) at a specified level of significance for the concerning degrees of freedom for accepting or rejecting the null hypothesis. It may be noted that the t-test applies only in the case of small sample(s) when the population variance is unknown.

A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution. Because of the central limit theorem, many test statistics are approximately normally distributed for large samples. For each significance level, the Z-test has a single critical value (for example, 1.96 for 5% two-tailed), which makes it more convenient than the Student's t-test, which has separate critical values for each sample size. Therefore, many statistical tests can be conveniently performed as approximate Z-tests if the sample size is large or the population variance is known. If the population variance is unknown (and therefore has to be estimated from the sample itself) and the sample size is not large (n < 30), the Student's t-test may be more appropriate.

If T is a statistic that is approximately normally distributed under the null hypothesis, the next step in performing a Z-test is to estimate the expected value θ of T under the null hypothesis, and then obtain an estimate s of the standard deviation of T. After that, the standard score Z = (T − θ) / s is calculated, from which one-tailed and two-tailed p-values can be calculated as Φ(−Z) (for upper-tailed tests), Φ(Z) (for lower-tailed tests) and 2Φ(−|Z|) (for two-tailed tests), where Φ is the standard normal cumulative distribution function.
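The three p-value formulas above map directly onto the standard normal CDF Φ; a brief sketch (the function name is illustrative):

```python
from statistics import NormalDist

Phi = NormalDist().cdf                     # standard normal CDF

def z_test_p_values(T, theta, s):
    """p-values for the standard score Z = (T - theta) / s."""
    Z = (T - theta) / s
    return {
        "upper": Phi(-Z),                  # upper-tailed test
        "lower": Phi(Z),                   # lower-tailed test
        "two":   2 * Phi(-abs(Z)),         # two-tailed test
    }

ps = z_test_p_values(T=1.96, theta=0.0, s=1.0)
print(round(ps["two"], 3))  # 0.05
```

Note that the upper- and lower-tailed p-values always sum to 1, while the two-tailed p-value is twice the smaller of the two.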

Use in location testing

The term "Z-test" is often used to refer specifically to the one-sample location test comparing the mean of a set of measurements to a given constant. If the observed data X1, ..., Xn are (i) uncorrelated, (ii) have a common mean μ, and (iii) have a common variance σ², then the sample average X̄ has mean μ and variance σ²/n. If our null hypothesis is that the mean value of the population is a given number μ0, we can use X̄ − μ0 as a test statistic, rejecting the null hypothesis if X̄ − μ0 is large.

To calculate the standardized statistic Z = (X̄ − μ0) / s, we need to either know or have an approximate value for σ², from which we can calculate s² = σ²/n. In some applications σ² is known, but this is uncommon. If the sample size is moderate or large, we can substitute the sample variance for σ², giving a plug-in test. The resulting test will not be an exact Z-test, since the uncertainty in the sample variance is not accounted for; however, it will be a good approximation unless the sample size is small. A t-test can be used to account for the uncertainty in the sample variance when the sample size is small and


Page 18: Hypothesis testing

the data are exactly normal. There is no universal constant at which the sample size is generally considered large enough to justify use of the plug-in test. Typical rules of thumb range from 20 to 50 samples. For larger sample sizes, the t-test procedure gives almost identical p-values to the Z-test procedure.

Other location tests that can be performed as Z-tests are the two-sample location test and the paired difference test.

Conditions

For the Z-test to be applicable, certain conditions must be met.

Nuisance parameters should be known, or estimated with high accuracy (an example of a nuisance parameter would be the standard deviation in a one-sample location test). Z-tests focus on a single parameter, and treat all other unknown parameters as being fixed at their true values. In practice, due to Slutsky's theorem, "plugging in" consistent estimates of nuisance parameters can be justified. However if the sample size is not large enough for these estimates to be reasonably accurate, the Z-test may not perform well.

The test statistic should follow a normal distribution. Generally, one appeals to the central limit theorem to justify assuming that a test statistic varies normally. There is a great deal of statistical research on the question of when a test statistic varies approximately normally. If the variation of the test statistic is strongly non-normal, a Z-test should not be used.

If estimates of nuisance parameters are plugged in as discussed above, it is important to use estimates appropriate for the way the data were sampled. In the special case of Z-tests for the one or two sample location problem, the usual sample standard deviation is only appropriate if the data were collected as an independent sample.

In some situations, it is possible to devise a test that properly accounts for the variation in plug-in estimates of nuisance parameters. In the case of one and two sample location problems, a t-test does this.

Example

Suppose that in a particular geographic region, the mean and standard deviation of scores on a reading test are 100 points, and 12 points, respectively. Our interest is in the scores of 55 students in a particular school who received a mean score of 96. We can ask whether this mean score is significantly lower than the regional mean — that is, are the students in this school comparable to a simple random sample of 55 students from the region as a whole, or are their scores surprisingly low?

We begin by calculating the standard error of the mean:

SE = σ/√n = 12/√55 ≈ 1.62,

where σ = 12 is the population standard deviation.

Next we calculate the z-score, which is the distance from the sample mean to the population mean in units of the standard error:

z = (M − μ)/SE = (96 − 100)/1.62 ≈ −2.47


Page 19: Hypothesis testing

In this example, we treat the population mean and variance as known, which would be appropriate if all students in the region were tested. When population parameters are unknown, a t test should be conducted instead.

The classroom mean score is 96, which is −2.47 standard error units from the population mean of 100. Looking up the z-score in a table of the standard normal distribution, we find that the probability of observing a standard normal value below −2.47 is approximately 0.5 − 0.4932 = 0.0068. This is the one-sided p-value for the null hypothesis that the 55 students are comparable to a simple random sample from the population of all test-takers. The two-sided p-value is approximately 0.014 (twice the one-sided p-value).

Another way of stating things is that with probability 1 − 0.014 = 0.986, a simple random sample of 55 students would have a mean test score within 4 units of the population mean. We could also say that with 98.6% confidence we reject the null hypothesis that the 55 test takers are comparable to a simple random sample from the population of test-takers.

The Z-test tells us that the 55 students of interest have an unusually low mean test score compared to most simple random samples of similar size from the population of test-takers. A deficiency of this analysis is that it does not consider whether the effect size of 4 points is meaningful. If instead of a classroom, we considered a subregion containing 900 students whose mean score was 99, nearly the same z-score and p-value would be observed. This shows that if the sample size is large enough, very small differences from the null value can be highly statistically significant. See statistical hypothesis testing for further discussion of this issue.
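The reading-test example can be reproduced numerically using only the quantities given in the section (μ0 = 100, σ = 12, n = 55, sample mean 96); table lookup rounds the one-sided p-value to 0.0068, while direct computation gives about 0.0067:

```python
from statistics import NormalDist

mu0, sigma = 100.0, 12.0       # regional mean and standard deviation (given)
n, sample_mean = 55, 96.0      # school sample

se = sigma / n ** 0.5                  # standard error of the mean, ~1.62
z = (sample_mean - mu0) / se           # ~ -2.47
p_one = NormalDist().cdf(z)            # one-sided p-value, ~0.007
p_two = 2 * p_one                      # two-sided p-value, ~0.013-0.014

print(round(se, 2), round(z, 2))  # 1.62 -2.47
```

This matches the hand calculation above to the printed precision.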

Z-tests other than location tests

Location tests are the most familiar Z-tests. Another class of Z-tests arises in maximum likelihood estimation of the parameters in a parametric statistical model. Maximum likelihood estimates are approximately normal under certain conditions, and their asymptotic variance can be calculated in terms of the Fisher information. The maximum likelihood estimate divided by its standard error can be used as a test statistic for the null hypothesis that the population value of the parameter equals zero.

More generally, if θ̂ is the maximum likelihood estimate of a parameter θ, and θ0 is the value of θ under the null hypothesis, then

Z = (θ̂ − θ0) / SE(θ̂)

can be used as a Z-test statistic, where SE(θ̂) is the standard error of the estimate.

When using a Z-test for maximum likelihood estimates, it is important to be aware that the normal approximation may be poor if the sample size is not sufficiently large. Although there is no simple, universal rule stating how large the sample size must be to use a Z-test, simulation can give a good idea as to whether a Z-test is appropriate in a given situation.


Page 20: Hypothesis testing

Z-tests are employed whenever it can be argued that a test statistic follows a normal distribution under the null hypothesis of interest. Many non-parametric test statistics, such as U statistics, are approximately normal for large enough sample sizes, and hence are often performed as Z-tests.

F test

F-test is based on F-distribution and is used to compare the variance of the two-independent samples. This test is also used in the context of analysis of variance (ANOVA) for judging the significance of more than two sample means at one and the same time. It is also used for judging the significance of multiple correlation coefficients. Test statistic, F, is calculated and compared with its probable value (to be seen in the F-ratio tables for different degrees of freedom for greater and smaller variances at specified level of significance) for accepting or rejecting the null hypothesis.

An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact "F-tests" mainly arise when the models have been fitted to the data using least squares. The name was coined by George W. Snedecor, in honour of Sir Ronald A. Fisher. Fisher initially developed the statistic as the variance ratio in the 1920s.

Common examples of F-tests

Common examples of the use of F-tests are, for example, the study of the following cases:

The hypothesis that the means of a given set of normally distributed populations, all having the same standard deviation, are equal. This is perhaps the best-known F-test, and plays an important role in the analysis of variance (ANOVA).

The hypothesis that a proposed regression model fits the data well. See Lack-of-fit sum of squares.

The hypothesis that a data set in a regression analysis follows the simpler of two proposed linear models that are nested within each other.


Page 21: Hypothesis testing

In addition, some statistical procedures, such as Scheffé's method for multiple comparisons adjustment in linear models, also use F-tests.

F-test of the equality of two variances

Main article: F-test of equality of variances

The F-test is sensitive to non-normality.[2][3] In the analysis of variance (ANOVA), alternative tests include Levene's test, Bartlett's test, and the Brown–Forsythe test. However, when any of these tests are conducted to test the underlying assumption of homoscedasticity (i.e. homogeneity of variance), as a preliminary step to testing for mean effects, there is an increase in the experiment-wise Type I error rate.[4]

Formula and calculation

Most F-tests arise by considering a decomposition of the variability in a collection of data in terms of sums of squares. The test statistic in an F-test is the ratio of two scaled sums of squares reflecting different sources of variability. These sums of squares are constructed so that the statistic tends to be greater when the null hypothesis is not true. In order for the statistic to follow the F-distribution under the null hypothesis, the sums of squares should be statistically independent, and each should follow a scaled chi-squared distribution. The latter condition is guaranteed if the data values are independent and normally distributed with a common variance.

Multiple-comparison ANOVA problems

The F-test in one-way analysis of variance is used to assess whether the expected values of a quantitative variable within several pre-defined groups differ from each other. For example, suppose that a medical trial compares four treatments. The ANOVA F-test can be used to assess whether any of the treatments is on average superior, or inferior, to the others versus the null hypothesis that all four treatments yield the same mean response. This is an example of an "omnibus" test, meaning that a single test is performed to detect any of several possible differences. Alternatively, we could carry out pairwise tests among the treatments (for instance, in the medical trial example with four treatments we could carry out six tests among pairs of treatments). The advantage of the ANOVA F-test is that we do not need to pre-specify which treatments are to be compared, and we do not need to adjust for making multiple comparisons. The disadvantage of the ANOVA F-test is that if we reject the null hypothesis, we do not know which treatments can be said to be significantly different from the others – if the F-test is performed at level α we cannot state that the treatment pair with the greatest mean difference is significantly different at level α.

The formula for the one-way ANOVA F-test statistic is

F = explained variance / unexplained variance

or

F = between-group variability / within-group variability.


Page 22: Hypothesis testing

The "explained variance", or "between-group variability", is

Σ_i n_i (Ȳ_i − Ȳ)² / (K − 1),

where Ȳ_i denotes the sample mean in the ith group, n_i is the number of observations in the ith group, Ȳ denotes the overall mean of the data, and K denotes the number of groups.

The "unexplained variance", or "within-group variability", is

Σ_i Σ_j (Y_ij − Ȳ_i)² / (N − K),

where Y_ij is the jth observation in the ith out of K groups and N is the overall sample size. This F-statistic follows the F-distribution with (K − 1, N − K) degrees of freedom under the null hypothesis. The statistic will be large if the between-group variability is large relative to the within-group variability, which is unlikely to happen if the population means of the groups all have the same value.

Note that when there are only two groups for the one-way ANOVA F-test, F = t², where t is the Student's t statistic.

Regression problems

Consider two models, 1 and 2, where model 1 is 'nested' within model 2. Model 1 is the Restricted model, and Model 2 is the Unrestricted one. That is, model 1 has p1 parameters, and model 2 has p2 parameters, where p2 > p1, and for any choice of parameters in model 1, the same regression curve can be achieved by some choice of the parameters of model 2. (We use the convention that any constant parameter in a model is included when counting the parameters. For instance, the simple linear model y = mx + b has p=2 under this convention.) The model with more parameters will always be able to fit the data at least as well as the model with fewer parameters. Thus typically model 2 will give a better (i.e. lower error) fit to the data than model 1. But one often wants to determine whether model 2 gives a significantly better fit to the data. One approach to this problem is to use an F test.

If there are n data points to estimate the parameters of both models from, then one can calculate the F statistic, given by

F = [(RSS1 − RSS2) / (p2 − p1)] / [RSS2 / (n − p2)],

where RSSi is the residual sum of squares of model i. If your regression model has been calculated with weights, then replace RSSi with χ², the weighted sum of squared residuals. Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, F will have an F distribution with (p2 − p1, n − p2) degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F-distribution for some desired false-rejection probability (e.g. 0.05). The F-test is a Wald test.
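The nested-model F statistic can be computed directly from the two residual sums of squares. A sketch with made-up numbers (the RSS values, parameter counts and n below are hypothetical, chosen only to illustrate the formula):

```python
def nested_model_f(rss1, rss2, p1, p2, n):
    """F statistic comparing restricted model 1 (p1 params) to unrestricted model 2 (p2 params)."""
    numerator = (rss1 - rss2) / (p2 - p1)   # improvement in fit per extra parameter
    denominator = rss2 / (n - p2)           # residual variance of the larger model
    return numerator / denominator

# Hypothetical example: adding one parameter drops RSS from 200 to 150 with n = 30
F = nested_model_f(rss1=200.0, rss2=150.0, p1=2, p2=3, n=30)
print(round(F, 2))  # 9.0, to be compared with the F(1, 27) critical value
```

If this F exceeds the F(p2 − p1, n − p2) critical value at the chosen α, the larger model fits significantly better.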

One-way ANOVA example


Page 23: Hypothesis testing

Consider an experiment to study the effect of three different levels of a factor on a response (e.g. three levels of a fertilizer on plant growth). If we had 6 observations for each level, we could write the outcome of the experiment in a table like this, where a1, a2, and a3 are the three levels of the factor being studied.

a1  a2  a3
 6   8  13
 8  12   9
 4   9  11
 5  11   8
 3   6   7
 4   8  12

The null hypothesis, denoted H0, for the overall F-test for this experiment would be that all three levels of the factor produce the same response, on average. To calculate the F-ratio:

Step 1: Calculate the mean within each group:

Ȳ1 = (6 + 8 + 4 + 5 + 3 + 4) / 6 = 5
Ȳ2 = (8 + 12 + 9 + 11 + 6 + 8) / 6 = 9
Ȳ3 = (13 + 9 + 11 + 8 + 7 + 12) / 6 = 10

Step 2: Calculate the overall mean:

Ȳ = (Ȳ1 + Ȳ2 + Ȳ3) / a = (5 + 9 + 10) / 3 = 8,

where a is the number of groups.

Step 3: Calculate the "between-group" sum of squares:

SB = n × [(Ȳ1 − Ȳ)² + (Ȳ2 − Ȳ)² + (Ȳ3 − Ȳ)²] = 6 × [(5 − 8)² + (9 − 8)² + (10 − 8)²] = 84,

where n is the number of data values per group.

The between-group degrees of freedom is one less than the number of groups: fB = a − 1 = 3 − 1 = 2,


so the between-group mean square value is MSB = SB / fB = 84 / 2 = 42.

Step 4: Calculate the "within-group" sum of squares. Begin by centering the data in each group

a1       a2       a3
6−5=1    8−9=−1   13−10=3
8−5=3    12−9=3   9−10=−1
4−5=−1   9−9=0    11−10=1
5−5=0    11−9=2   8−10=−2
3−5=−2   6−9=−3   7−10=−3
4−5=−1   8−9=−1   12−10=2

The within-group sum of squares is the sum of squares of all 18 values in this table:

SW = (1 + 9 + 1 + 0 + 4 + 1) + (1 + 9 + 0 + 4 + 9 + 1) + (9 + 1 + 1 + 4 + 9 + 4) = 16 + 24 + 28 = 68.

The within-group degrees of freedom is fW = a × (n − 1) = 3 × (6 − 1) = 15.


Thus the within-group mean square value is MSW = SW / fW = 68 / 15 ≈ 4.5.

Step 5: The F-ratio is F = MSB / MSW = 42 / (68 / 15) ≈ 9.3.

The critical value is the number that the test statistic must exceed to reject the null hypothesis. In this case, Fcrit(2, 15) = 3.68 at α = 0.05. Since F = 9.3 > 3.68, the results are significant at the 5% significance level. One would reject the null hypothesis, concluding that there is strong evidence that the expected values in the three groups differ. The p-value for this test is 0.002.
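The five steps above can be reproduced in a few lines of code. This is a sketch in Python using the same data as the table; variable names are chosen for illustration:

```python
# One-way ANOVA on the worked example's data, following Steps 1-5.
groups = {
    "a1": [6, 8, 4, 5, 3, 4],
    "a2": [8, 12, 9, 11, 6, 8],
    "a3": [13, 9, 11, 8, 7, 12],
}
a = len(groups)                        # number of groups (3)
n = len(groups["a1"])                  # observations per group (6)

# Step 1: mean within each group (5, 9, 10).
means = {k: sum(v) / n for k, v in groups.items()}

# Step 2: overall mean (8).
grand = sum(means.values()) / a

# Step 3: between-group sum of squares (84) and mean square (42).
ss_between = n * sum((m - grand) ** 2 for m in means.values())
ms_between = ss_between / (a - 1)

# Step 4: within-group sum of squares (68) and mean square (68/15).
ss_within = sum((x - means[k]) ** 2 for k, v in groups.items() for x in v)
ms_within = ss_within / (a * (n - 1))

# Step 5: the F-ratio, approximately 9.3.
F = ms_between / ms_within
```

Comparing F against Fcrit(2, 15) = 3.68 reproduces the conclusion in the text.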

After performing the F-test, it is common to carry out some "post-hoc" analysis of the group means. In this case, the first two group means differ by 4 units, the first and third group means differ by 5 units, and the second and third group means differ by only 1 unit. The standard error of each of these differences is √(2 × 4.5 / 6) ≈ 1.2. Thus the first group is strongly different from the other groups, as the mean difference is more than 3 times the standard error, so we can be highly confident that the population mean of the first group differs from the population means of the other groups. However, there is no evidence that the second and third groups have different population means from each other, as their mean difference of one unit is comparable to the standard error.
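The post-hoc comparison can be checked the same way; in this sketch, the within-group mean square and the group means carry over from the example above:

```python
import math

ms_within = 68 / 15     # within-group mean square from Step 4
n = 6                   # observations per group
means = [5, 9, 10]      # group means from Step 1

# Standard error of the difference between any two group means.
se_diff = math.sqrt(2 * ms_within / n)   # approximately 1.2

# Pairwise mean differences measured in standard errors.
ratios = {
    "a1 vs a2": abs(means[0] - means[1]) / se_diff,  # > 3: strong evidence
    "a1 vs a3": abs(means[0] - means[2]) / se_diff,  # > 4: strong evidence
    "a2 vs a3": abs(means[1] - means[2]) / se_diff,  # < 1: no evidence
}
```

The first group's mean sits more than 3 standard errors away from the other two, while the second and third groups differ by less than one standard error, matching the verbal conclusion.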

Note: F(x, y) denotes an F-distribution cumulative distribution function with x degrees of freedom in the numerator and y degrees of freedom in the denominator.

ANOVA's robustness with respect to Type I errors for departures from population normality


The one-way ANOVA can be generalized to the factorial and multivariate layouts, as well as to the analysis of covariance.

It is often stated in popular literature that none of these F-tests are robust when there are severe violations of the assumption that each population follows the normal distribution, particularly for small alpha levels and unbalanced layouts.[5] Furthermore, it is also claimed that if the underlying assumption of homoscedasticity is violated, the Type I error properties degenerate much more severely.[6]

However, this is a misconception, based on work done in the 1950s and earlier. The first comprehensive investigation of the issue by Monte Carlo simulation was Donaldson (1966).[7] He showed that under the usual departures (positive skew, unequal variances) "the F-test is conservative", so it is less likely than it should be to find that a variable is significant. However, as either the sample size or the number of cells increases, "the power curves seem to converge to that based on the normal distribution". More detailed work was done by Tiku (1971).[8] He found that "The non-normal theory power of F is found to differ from the normal theory power by a correction term which decreases sharply with increasing sample size." The problem of non-normality, especially in large samples, is far less serious than popular articles would suggest.

The current view is that "Monte-Carlo studies were used extensively with normal distribution-based tests to determine how sensitive they are to violations of the assumption of normal distribution of the analyzed variables in the population. The general conclusion from these studies is that the consequences of such violations are less severe than previously thought. Although these conclusions should not entirely discourage anyone from being concerned about the normality assumption, they have increased the overall popularity of the distribution-dependent statistical tests in all areas of research."[9]

For nonparametric alternatives in the factorial layout, see Sawilowsky.[10] For more discussion see ANOVA on ranks.


References

1. Lomax, Richard G. (2007). Statistical Concepts: A Second Course, p. 10. ISBN 0-8058-5850-4.

2. Box, G. E. P. (1953). "Non-Normality and Tests on Variances". Biometrika 40 (3/4): 318–335. doi:10.1093/biomet/40.3-4.318. JSTOR 2333350.

3. Markowski, Carol A.; Markowski, Edward P. (1990). "Conditions for the Effectiveness of a Preliminary Test of Variance". The American Statistician 44 (4): 322–326. doi:10.2307/2684360. JSTOR 2684360.

4. Sawilowsky, S. (2002). "Fermat, Schubert, Einstein, and Behrens–Fisher: The Probable Difference Between Two Means When σ1² ≠ σ2²". Journal of Modern Applied Statistical Methods 1 (2): 461–472.

5. Blair, R. C. (1981). "A reaction to 'Consequences of failure to meet assumptions underlying the fixed effects analysis of variance and covariance'". Review of Educational Research 51: 499–507.

6. Randolf, E. A.; Barcikowski, R. S. (1989, November). "Type I error rate when real study values are used as population parameters in a Monte Carlo study". Paper presented at the 11th annual meeting of the Mid-Western Educational Research Association, Chicago.

7. Donaldson (1966). https://www.rand.org/content/dam/rand/pubs/research_memoranda/2008/RM5072.pdf

8. Tiku, M. L. (1971). "Power Function of the F-Test Under Non-Normal Situations". Journal of the American Statistical Association 66 (336): 913.

9. https://www.statsoft.com/textbook/elementary-statistics-concepts/

10. Sawilowsky, S. (1990). "Nonparametric tests of interaction in experimental design". Review of Educational Research, 25(20–59).
