Objectives
- Constructing and assessing hypotheses
- The t-statistic and the P-value
- Statistical significance
- The one-sample t test for a population mean
- One-sided versus two-sided tests
- Further reading: OS3, Sections 4.3 (using known population standard deviation) and 5.1.5 (using estimated population standard deviation). ISP: Sections 6.2 and 7.1.
Topics: Understanding hypothesis tests
Learning objectives
- Be able to construct the appropriate null and alternative hypotheses based on what one wants to learn (the null hypothesis will always have the equal sign embedded in it).
- Understand that a statistical test is based on assessing the likelihood/plausibility of the data being generated when the null hypothesis is true.
  - If the probability is large, then the null is plausible and we cannot reject the null hypothesis. This does not prove the null; it simply states that the null is possible.
  - If the probability is small, the null seems implausible; we reject the null hypothesis and conclude that there is evidence for the alternative.
Examples of hypotheses
- We call H0 the null hypothesis and HA the alternative hypothesis.
- Examples include:
  - Comparing product reviews
    - H0: The reviews of two products are the same.
    - HA: The reviews of two products are different.
  - Wine consumption and health
    - H0: Regular consumption of wine has no effect on polyphenol levels in the blood.
    - HA: Regular consumption of wine increases polyphenol levels in the blood.
  - Birds and flight
    - H0: Bird species A cannot fly.
    - HA: Bird species A can fly.
Motivation
Example 1: Flying birds
- Based on empirical observations we want to answer the following question:
  - Question: Can bird species A fly?
- We write this as a hypothesis (conjecture):
  - Null hypothesis H0: Bird species A cannot fly.
  - Alternative hypothesis HA: Bird species A can fly.
- Based on what we observe, is the null plausible?
  - Scenario 1: You see one bird fly. You have immediately disproven the null (that this species of bird cannot fly), and this proves the alternative.
  - Scenario 2: None of the birds are flying (maybe there is too much food around for them to even attempt it). All this is consistent with the null being true; however, it does not prove the null (they may fly later on). In this situation we say there is no evidence in the data to support the alternative.
Setting up the hypothesis
Example 2: Comparing product reviews
- The Coffea (left) has an average review of 4.4 (over 261 customers).
- The Smart Lintelek (right) has an average review of 4.8 (over 58 customers).
- Over these customers the Smart Lintelek scored higher. However, just comparing the sampled customers does not take into account sampling variability.
  - H0: The reviews of these two products are the same.
  - HA: The reviews of these two products are different.
- In an “ideal world” all tracker customers would rate both devices and we would be able to compare the mean ratings over both. Our hypothesis should be based on the mean ratings of all customers (which in reality can never be observed).
- Let μC denote the mean rating of the Coffea over all tracker customers.
- Let μSM denote the mean rating of the Smart Lintelek over all tracker customers.
- The hypothesis we want to investigate is:
  - The null is that globally they would get the same mean ratings. We write this as
    H0: μC − μSM = 0, which is the same as H0: μC = μSM
  - The alternative is that globally they would get different mean ratings. We write this as
    HA: μC − μSM ≠ 0, which is the same as HA: μC ≠ μSM
Example 3: Buying a product
- The Smart Lintelek has an average rating of 4.8 (over 58 customers). This is great, but the sample size is small.
- I only buy products if I am confident that the population mean is not 4 or below.
  H0: μ ≤ 4.0 vs HA: μ > 4.0 (I’ll buy)
- Looking at the rating of 4.8, it is clearly greater than 4.0.
- This is great. But there is also a doubt in my mind.
- Could it be that a sample mean of 4.8 can arise when the population mean is 4.0?
- I want to calculate the chance of this happening.
- We will calculate this chance both by hand and using Statcrunch.
- If the chance turns out to be “small”, I can be confident that the population mean is not 4.0 and must be greater.
- I will then go on to buy the product (since I have rejected the null).
- The null hypothesis is a very specific statement about parameter(s) of the population(s). It is labeled H0. This is the hypothesis we assess.
- The null should always have an equal sign in it.
  - Either: =, ≤ or ≥.
- The alternative hypothesis is a more general statement about the parameter(s) that is exclusive of the null hypothesis. It is labeled HA.
- In every hypothesis test the focus is only on the null hypothesis and assessing its plausibility based on the data.
Setting up the hypothesis and understanding when it is immediately clear that the null is plausible and we cannot reject it
Example 4: Buying a product
- The Teslasz has an average (sample mean) rating of 3.3.
- I only buy products if I can be sure that the population mean of the reviews is over 4.0.
  H0: μ ≤ 4.0 vs HA: μ > 4.0 (I’ll buy)
- With a sample mean of 3.3, I am certainly not going to buy this one. There is no evidence in the sample that the mean is greater than 4.0. I cannot reject the null.
- The sample mean will usually be “close” (in some sense) to the population mean. When the sample mean is 3.3, the true mean could easily lie around 3.3, which is less than 4.0.
- This tells us that the null hypothesis is plausible and explains why I cannot reject the null.
Question Time
- Question: It is known that freshman biology has a mean score of 75%. A professor thinks that students who attend early morning classes have a higher mean score. Her early morning class this year can be considered as a sample of all students who take an early morning class. What is the hypothesis of interest?
  - (A) H0: μ ≥ 75% against HA: μ < 75%.
  - (B) H0: μ ≤ 75% against HA: μ > 75%.
  - (C) H0: μ = 75% against HA: μ ≠ 75%.
  - (D) H0: μ < 75% against HA: μ ≥ 75%.
- http://www.easypolls.net/poll.html?p=59e63fdee4b036a938d4fcc6
Question Time
- Question: It is known that freshman biology has a mean score of 75%. A professor thinks that students who attend early morning classes have a higher mean score. Her early morning class this year can be considered as a sample of all students who take an early morning class. The sample mean (average grade) in her class is 78%.
- What is the hypothesis of interest?
  - (A) H0: μ ≥ 78% against HA: μ < 78%.
  - (B) H0: μ ≤ 78% against HA: μ > 78%.
  - (C) H0: μ = 75% against HA: μ ≠ 75%.
  - (D) H0: μ ≤ 75% against HA: μ > 75%.
- The stated hypothesis should never be based on the data.
- http://www.easypolls.net/poll.html?p=59e64010e4b036a938d4fcca
Question Time
- The price of gasoline has changed. Previously the mean yearly mileage of a vehicle was 4000 miles. I want to see whether the mean yearly mileage has changed after the price change. What is the hypothesis of interest?
  - (A) H0: μ ≠ 4000 against HA: μ = 4000.
  - (B) H0: μ = 4000 against HA: μ ≠ 4000.
  - (C) H0: μ ≤ 4000 against HA: μ > 4000.
- http://www.easypolls.net/poll.html?p=59e64061e4b036a938d4fccb
Visually checking plausibility of the null
- The null hypothesis is a very specific statement about parameter(s) of the population(s). It is labeled H0. This is the hypothesis that we assess.
- Only if the null seems unlikely do we reject it. Rejecting the null and accepting the alternative are the same thing.
- A hypothesis test always checks the validity of the null. In the following Example 5, we are going to ask whether the numbers (observations) in each sample could arise if the null were true.
- If it seems unlikely, we reject the null and accept the alternative.
- If it seems plausible, then we do not reject the null (though we do not say the null is true; recall Example 1 with birds flying).
Example 5: Benefits of wine?
- Wine consumption and health
  - H0: Regular consumption of wine has no effect on polyphenol levels in the blood.
  - HA: Regular consumption of wine increases polyphenol levels in the blood.
- Of course, different people react in different ways. So we should not focus on the individual, nor should we focus only on the participants who took part in the study. We should focus on the population of interest (young males, say) and in particular the mean change in polyphenol levels over this entire population.
- Let μ denote the mean change (over the entire population) in polyphenol levels after consuming a small amount of red wine on a regular basis.
  H0: μ ≤ 0 (mean levels of polyphenols have stayed the same or reduced) vs HA: μ > 0 (mean levels of polyphenols have risen)
- Whenever we are given the data and the null hypothesis, we must always ask ourselves: could we have obtained that data set if the null hypothesis were true?
- If it seems that we could, then we cannot reject the null (we cannot say the alternative is true).
- In the following examples, look at the data and ask yourself whether the data could have been generated if the null were true.
- Later we will put probabilities (called p-values) to these notions.
White wine: Situation 1
- 9 males are given white wine. The difference in polyphenol levels before and after the study is: -0.60, -1.05, -2.09, -1.23, 0.71, -0.53, 0.33, -0.48, -1.42.
- We plot the changes in the polyphenol levels for each individual on the line (each blue spot corresponds to one observation). We see that there are some negative readings (persons who observed a decrease in polyphenol levels) and some positive ones. But every person is different.
- The sample mean change is the green vertical line, x̄ = −0.7.
- The aim of the study is to see whether consumption of white wine increases polyphenol levels. But for these 9 participants the sample mean has dropped, so it is clear that for this group we did not see an increase. We could easily have observed such a situation under the scenario that white wine has no effect on polyphenol. Though we cannot say the null is true, we cannot make any positive claims about the alternative.
- Formally: Since x̄ = −0.7 is consistent with the null hypothesis μ ≤ 0, there is no evidence to disprove the null. There is no evidence in the data that the polyphenol levels increase with moderate consumption of white wine. In conclusion, we cannot reject the null based on this data set.
  H0: μ ≤ 0 (mean levels of polyphenols have stayed the same or reduced) vs HA: μ > 0 (mean levels of polyphenols have risen)
Red wine: Situation 2
- 9 males are given red wine. The difference in polyphenol levels before and after the study is: 8.45, 10.18, 10.98, 10.35, 10.75, 8.98, 8.84, 10.38, 9.79.
- We plot the change in polyphenol levels on the line. Every participant observed an increase in polyphenol levels; all increases are over 8.0.
- The sample mean is x̄ = 9.86. So for this group of participants there is a clear increase.
- Of course, the increases could just be by chance. The concentration of polyphenol in a person's blood will always change.
- But it does seem highly unlikely that 9 people will all observe a substantial increase in polyphenol under the scenario that the red wine did not have an effect. This sample does not appear to be a fluke.
- Formally we say: It seems very unlikely that we could have obtained this data under the null (population mean μ ≤ 0), and it strongly suggests that the alternative is true. The data strongly suggests that drinking red wine increases polyphenol levels.
  H0: μ ≤ 0 (mean levels of polyphenols have stayed the same or reduced) vs HA: μ > 0 (mean levels of polyphenols have risen)
Red wine: Situation 3
- The difference in polyphenol levels before and after the study is: 0.06, -0.36, 0.98, 0.82, -0.25, 2.49, -1.34, 1.16, 1.53.
- We plot the change in polyphenol levels on the line. Some observed an increase, others observed a decrease in polyphenol.
- The sample mean is x̄ = 0.56.
- For these participants there is a small overall increase.
- Could this data have been observed if wine had no influence on polyphenol levels (in other words, if the null were true)?
- A formal statistical test (which we do later) will help us answer this question.
- Visual conclusion: Unsure.
  H0: μ ≤ 0 (mean levels of polyphenols have stayed the same or reduced) vs HA: μ > 0 (mean levels of polyphenols have risen)
Red wine: Situation 4
- The difference in polyphenol levels before and after the study is: -0.43, -8.35, -8.31, 26.11, 4.32, 25.02, 9.40, 11.54, 0.71.
- We plot the change in polyphenol levels on the line. Some observed an increase, others observed a decrease in polyphenol. There is a lot of variability in the data, but the majority are positive.
- The sample mean is x̄ = 6.66.
- In other words, to prove the alternative we need to show that the data is unlikely to have been observed if the null were true.
- Looking at the changes in polyphenol levels for the participants, do you think they could have been observed when red wine has no influence on polyphenol?
- A formal statistical test (which we do later) will help us answer this question.
- Visual conclusion: Unsure.
- Statistical tests and tools allow us to systematically navigate these different scenarios.
  H0: μ ≤ 0 (mean levels of polyphenols have stayed the same or reduced) vs HA: μ > 0 (mean levels of polyphenols have risen)
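The sample means quoted for the four situations can be recomputed from the listed differences. The following Python sketch does this with the standard library only; the dictionary and variable names are illustrative and not part of the course material. The slides appear to truncate two of the means (0.56 and 6.66) where rounding to two decimals gives 0.57 and 6.67.

```python
from statistics import mean

# Changes in polyphenol levels, copied from Situations 1 to 4 above.
situations = {
    "Situation 1 (white wine)": [-0.60, -1.05, -2.09, -1.23, 0.71, -0.53, 0.33, -0.48, -1.42],
    "Situation 2 (red wine)": [8.45, 10.18, 10.98, 10.35, 10.75, 8.98, 8.84, 10.38, 9.79],
    "Situation 3 (red wine)": [0.06, -0.36, 0.98, 0.82, -0.25, 2.49, -1.34, 1.16, 1.53],
    "Situation 4 (red wine)": [-0.43, -8.35, -8.31, 26.11, 4.32, 25.02, 9.40, 11.54, 0.71],
}

# One sample mean per situation (the green line in each plot).
sample_means = {name: mean(values) for name, values in situations.items()}

for name, xbar in sample_means.items():
    print(f"{name}: n = {len(situations[name])}, sample mean = {xbar:.2f}")
```

The sample mean alone does not settle Situations 3 and 4; the spread of the data matters too, which is what the t-ratio introduced later accounts for.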
Question Time
- The hypothesis is
  H0: μ ≤ 1 vs HA: μ > 1
- The green line is the sample mean = 4.8.
- What is the conclusion of the test?
  - (A) Reject null (HA is true)
  - (B) Cannot reject null (cannot say HA is true)
  - (C) Do not know.
- http://www.easypolls.net/poll.html?p=59e640a9e4b036a938d4fccc
Question Time
- The hypothesis is
  H0: μ = 1 vs HA: μ ≠ 1
- The green line is the sample mean = 4.8.
- What is the conclusion of the test?
  - (A) Reject null (HA is true)
  - (B) Cannot reject null (cannot say HA is true)
  - (C) Do not know.
- http://www.easypolls.net/poll.html?p=59e640e4e4b036a938d4fcd0
Question Time
- The hypothesis is
  H0: μ ≤ 4 vs HA: μ > 4
- The green line is the sample mean = 2.4.
- These are the ratings of a product. What is the conclusion of the test?
  - (A) Reject null (HA is true, I buy)
  - (B) Cannot reject null (cannot say if HA is true, but I won’t buy)
- http://www.easypolls.net/poll.html?p=59e6414be4b036a938d4fcd1
Question Time
- The hypothesis is
  H0: μ ≥ 1 vs HA: μ < 1
- The green line is the sample mean = 4.8.
- What is the conclusion of the test?
  - (A) Reject null (HA is true)
  - (B) Cannot reject null (cannot say HA is true, but I won’t buy)
- http://www.easypolls.net/poll.html?p=59e6419fe4b036a938d4fcd3
Question Time
- The hypothesis is
  H0: μ ≤ 4 vs HA: μ > 4
- The green line is the sample mean = 5.
- These are the ratings of a product. What is the conclusion of the test?
  - (A) Reject null (HA is true, I buy)
  - (B) Cannot reject null (cannot say if HA is true, I won’t buy)
  - (C) Do not know.
- http://www.easypolls.net/poll.html?p=59e641e5e4b036a938d4fcd5
Question Time
- The hypothesis is
  H0: μ ≤ 4 vs HA: μ > 4
- The green line is the sample mean = 4.12.
- These are the ratings of a product. What is the conclusion of the test?
  - (A) Reject null (HA is true, I buy)
  - (B) Cannot reject null (cannot say if HA is true, I won’t buy)
  - (C) Do not know.
- http://www.easypolls.net/poll.html?p=59e64221e4b036a938d4fcd9
Discussion
- It was pretty clear what the answer should have been for most of the previous questions.
- But the solution to the last question was unclear.
- This is where we require statistical tools.
- These tools will give us the chance of obtaining a sample mean of 4.12 when the population mean rating (amongst everyone who could have rated the product) was 4.0 or less.
- Interpreting probabilities is very important.
Example 6: Does the lady take milk?
- Recall the tea story in Chapter 1: In the 1930s a lady in Cambridge insisted that tea tasted different depending on whether the milk was poured into the cup first and then the tea, or the tea was poured first and then the milk.
- Fisher suggested that this can be statistically tested by giving her cups of tea, some made with tea first and others made with milk first, and asking her to identify which is which. The competing hypotheses are:
  - H0: The lady has no idea and just guesses.
  - HA: The lady is able to select the correct cup.
- They collect the data and find that she identifies all 8 cups of tea correctly. This is the observed information from which we have to draw a statistical conclusion.
- The chance of her identifying all cups correctly is 1/72 = 1.39% under the scenario she is “guessing” (this is the null hypothesis).
Motivation 2 (cont.)
- Assessing the probability:
  - If the probability is over a threshold, then the null is deemed plausible and we cannot reject the null.
  - If the probability is below the threshold, then the null is deemed implausible and we reject the null.
- Typically, the α = 5% significance level is used as the threshold. Since 1/72 = 1.39% is less than 5%, we believe the null is implausible (at the 5% level) and thus reject it (saying that there is evidence to suggest the alternative, that she knows her tea, is true).
- However, we will never know the truth! If she did the experiment 100 times and was simply guessing, then about 1.39 times out of 100 she would identify all cups correctly.
- Recall that 5% is the proportion of times we are willing to reject the null when in fact the null is true.
- To summarize: In order to prove the alternative, we have to calculate how plausible it is (this is a probability) to identify all the cups of tea correctly under the null that she was simply guessing. How likely is one to collect the data that is observed under the scenario of the null being true?
- If this probability is small, then it suggests that the null is an implausible scenario. If the null is an implausible scenario, then this implies the alternative is the plausible scenario (we say: there is evidence to suggest the alternative is true).
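The threshold rule described above can be written as a very small function. This is only a sketch: the function name `decide` and its wording are hypothetical, and the probability 1/72 is simply the value quoted in the tea example.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Threshold rule: reject the null only if the probability is below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    return "reject the null" if p_value < alpha else "cannot reject the null"

# Chance quoted in the tea example of identifying all cups by pure guessing.
p_tea = 1 / 72
print(f"p = {p_tea:.2%}: {decide(p_tea)}")
```

Note that the function never returns "the null is true": a large probability only means the null remains plausible.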
Topic: How to do a hypothesis test
Learning objectives:
- Evaluating a probability.
- Understand how to do a one-sided (both left and right) and two-sided test.
- Be able to connect the p-values of a one-sided test with those of a two-sided test.
- Be able to construct the correct test based on the summary statistics table.
- Be able to do the test in Statcrunch and interpret the output.
- Most tests use a t-distribution, but you should understand that a normal distribution is used when the population standard deviation is known.
- You should be able to check for normality of the sample mean based on a QQplot of the data set and using the sampling distribution applet in Statcrunch. This will tell us whether the p-values are correct or not.
The underlying principle in a test
- A hypothesis test always checks the validity of the null; in other words, could the numbers in front of you arise if the null were true?
- In a hypothesis test we calculate the probability of observing the data under the scenario that the null is true.
- Does the data disprove the null?
- The underlying idea of a hypothesis test is that events with small probabilities are unlikely to happen. If this probability turns out to be small, it suggests that the null assumption made in the calculation is not true and the alternative is a more logical explanation for the data.
The underlying principle in a test
- Most statistical tests we encounter will be based on the population mean.
- This may seem very simple, but it will allow us to test a wide range of useful hypotheses.
- Most calculations will be made assuming that the sample mean is normal; therefore we always need to check this assumption, else the probability we calculate will be incorrect.
- In the next few slides we will explain how to calculate these probabilities, using one-sided and two-sided tests.
One-sided tests
Example 1 (one-sided test)
- A person will only buy a product if they are sure the population mean rating is over 4. These are the hypotheses:
  H0: μ ≤ 4 vs HA: μ > 4
- This is an example of a one-sided test.
- Here is the data that was collected.
- The sample mean and sample standard deviation are 5 and 0. The sample size is 31.
- Since the sample standard deviation is 0, the sample standard error is
  0/√31 = 0
- To understand if the null is viable, we evaluate the number of standard errors the sample mean is from the mean under the null. This is the t-transform
  t = (5 − 4)/0 = ∞
- If the null is true, we compare the above with a t-distribution with 30 df.
The t-value
- The t-value is
  t = (5 − 4)/0 = ∞
- The white region gives the plausible t-values if the null (μ ≤ 4) were true.
- Suppose μ = 4; then the t-value is likely to be less than 1.697.
- Suppose μ < 4; then the t-value is likely to be a lot less than 1.697. For example, if the population mean is 1, then the sample mean will be close to 1, so the t-transform as defined on the previous page will be negative.
- The t-value from the data is t = (5 − 4)/0 = ∞.
- The t-value is infinite; it is at the very right of the blue tail. The area to the right of this is called the p-value, and here it is zero.
- This tells us it is impossible to obtain the sample mean 5 when the population mean is 4.
- Conclusion: The null is implausible and we reject the null.
- We do, however, have to be careful. This data is not normal; it is integer valued. This means the sample mean is not normal, so we have to be careful when interpreting a p-value obtained from a t-distribution.
Returning to red wine: Situation 3
- The difference in polyphenol levels before and after the study is: 0.06, -0.36, 0.98, 0.82, -0.25, 2.49, -1.34, 1.16, 1.53.
- We plot the change in polyphenol levels on the line. Some observed an increase, others observed a decrease in polyphenol.
- The sample mean is x̄ = 0.56 and the sample standard deviation is s = 1.14.
  H0: μ ≤ 0 (mean levels of polyphenols have stayed the same or reduced) vs HA: μ > 0 (mean levels of polyphenols have risen)
Reminder: This is a one-sided test
- This is an example of a one-sided test.
- A one-sided test is when the alternative hypothesis has a greater than or less than sign.
- Later we consider examples of two-sided tests.
- The way we approach these two kinds of test is slightly different.
  H0: μ ≤ 0 (mean levels of polyphenols have stayed the same or reduced) vs HA: μ > 0 (mean levels of polyphenols have risen)
- In order to prove the alternative, we have to calculate the likelihood of observing the sample mean under the scenario that the null is true.
- The sample mean is x̄ = 0.56, and under the null (μ = 0) it is estimating zero (red wine exerts no influence).
- Here we use the CLT. If the data comes from the normal distribution or the sample size is sufficiently large, the average will be close to normal.
- Thus if the null is true the t-ratio
  (sample mean − 0) / standard error
  follows a t-distribution with 8 df.
- For this data set (standard error = 1.14/√9 = 1.14/3 = 0.38)
  t = (0.56 − 0)/(1.14/√9) = 1.47
- This is a “measure of distance” between the sample mean and the population mean under the scenario that the null is true.
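The arithmetic of the t-ratio can be checked with a few lines of Python. This is a minimal sketch from the summary statistics quoted above (x̄ = 0.56, s = 1.14, n = 9); the function name `t_ratio` is illustrative only.

```python
import math

def t_ratio(sample_mean, null_mean, sample_sd, n):
    """Number of standard errors the sample mean lies from the mean under the null."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - null_mean) / standard_error, standard_error

# Red wine, Situation 3: summary statistics taken from the slides.
t_value, se = t_ratio(sample_mean=0.56, null_mean=0.0, sample_sd=1.14, n=9)
print(f"standard error = {se:.2f}, t = {t_value:.2f}, df = {9 - 1}")
```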
- For this data set the t-ratio is
  t = (0.56 − 0)/(1.14/√9) = 1.47
- The chance of a t-ratio at least this large when the population mean is zero is the p-value = 8.9%.
- The plot shows what the distribution of the t-value will look like if the sample mean is normal and the population mean is 0 (null is true).
- This tells us that there is an 8.9% chance of observing a sample mean at least as large as that of the differences 0.06, -0.36, 0.98, 0.82, -0.25, 2.49, -1.34, 1.16, 1.53 in 9 individuals when, over the entire population of males who consume red wine, there is no mean change.
- Since 8.9% is relatively large, the null is plausible (but this does not prove the null).
- We cannot reject the null. There is no evidence in the data to back the claim that red wine consumption increases the mean polyphenol levels.
- Usually 5% is used as the decision rule. If the p-value is less than 5%, we deem the chance small and reject the null at the 5% level. Since 8.9% > 5%, for this data set we cannot reject the null at the 5% level.
- Warning: 5% is the proportion of times we will reject the null when it is true.
Recap: The P-value (for one-sided test)
Definition: We want to quantify the proportion of random samples that are at least as unusual as our actual result, if the null hypothesis were true. This quantity is called the p-value. The p-value (for a one-sided test, which this is) is the area greater than the t-value (since the alternative contains a greater than sign).
p-value = area greater than t-value = 8.9%
Red wine example:
- Since 8.9% > 5% we deem this probability large.
- For this data set, there is no evidence in the data to suggest that regular consumption of red wine increases polyphenol levels.
Statcrunch
- Load data into Statcrunch.
- Go to Stat -> T Stat -> One Sample (a drop down menu).
- Select column (choose the data sets).
- Perform (choose the hypothesis).
- Press compute.
Understanding p-values from the perspective of rejection regions and boundary of decisions
This part is only necessary to understand what “statistical power” means.
Red wine: One-sided boundary of decision
- If the p-value > 5% we cannot reject the null. However, if the p-value < 5% then we reject the null.
- α = 5% is the boundary of the decision. This means the area on the right (since this is a one-sided test) should be less than 5%. This corresponds to rejecting the null for any t-transform that is larger than 1.86.
- Remember that the t-transform = number of standard errors the sample mean is from the population mean under the null.
- If the null is true the t-transform is small, and 1.86 is considered too large (based on the 5% decision rule).
- 1.86 standard errors from the null corresponds to the sample mean
  X̄ = 0 + 1.86 × 0.38 = 0.71
- Therefore t-values greater than 1.86 correspond to sample means greater than 0.71. If the null is true, sample means less than the blue bar are plausible (at the 5% level), and we reject the null if the sample mean is greater than 0.71 (over the blue bar).
- We reject the null if the sample mean is greater than 0 + 1.86 × 0.38 = 0.71.
- We do not reject the null if the sample mean is less than 0.71.
- In this example, since the sample mean 0.56 is less than 0.71, we cannot reject the null (it is on the left of the blue bar).
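The boundary of the decision can be reproduced directly from the numbers quoted above. In this sketch the critical value 1.86 and the standard error 0.38 are taken from the slides rather than recomputed, and the variable names are illustrative.

```python
null_mean = 0.0
standard_error = 0.38  # 1.14 / sqrt(9), as quoted on the slides
critical_t = 1.86      # one-sided 5% cut-off for a t-distribution with 8 df, as quoted
sample_mean = 0.56

# Sample means above this boundary lead to rejecting the null at the 5% level.
boundary = null_mean + critical_t * standard_error
reject_null = sample_mean > boundary

print(f"boundary = {boundary:.2f}, sample mean = {sample_mean}, reject null: {reject_null}")
```

Comparing the sample mean with the boundary gives the same decision as comparing the p-value with 5%; the two are just different scales for the same rule.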
Summary
Red wine: All 4 situations
Here the data is plotted for each of the situations. The green line is the sample mean. Focus on the spread of each data set.
Matching p-values to the red wine examples (Example 5)
For each situation superimpose a bell shape curve centered at zero (see the next slide).
Focus only on the right hand side of zero, because we are only looking for evidence of the red wine causing an increase in polyphenol levels.
- Recall the hypothesis is
  H0: μ ≤ 0 (mean levels of polyphenols have stayed the same or reduced) vs HA: μ > 0 (mean levels of polyphenols have risen)
[Figure panels, Situations 1 to 4: Cannot reject null; Reject null; Cannot reject null; Cannot reject null]
We test H0: μ ≤ 0 vs HA: μ > 0. For each case we assess the likelihood of observing the data under the null hypothesis μ ≤ 0. We see:
- Situation 1: p-value = 98%. It is highly likely to see data like this if red wine did not increase polyphenol levels. Conclusion: cannot reject the null.
- Situation 2: p-value < 0.01%. It is highly unlikely to see data like this if red wine did not increase polyphenol levels. Conclusion: reject the null, strong evidence for the alternative.
- Situation 3: p-value = 8.9%. Data like this can be seen when μ ≤ 0. Conclusion: cannot reject the null.
- Situation 4: p-value = 7.67%. Data like this can be seen when μ ≤ 0. Conclusion: cannot reject the null.
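The four p-values can be approximated from the raw data without Statcrunch. The sketch below is not the course's method: it computes the t-ratio with the standard library and approximates the upper tail area of the t-distribution by numerical integration (Simpson's rule), which is an assumption made here to avoid external packages. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=20000):
    """Approximate P(T > t) using Simpson's rule on [0, |t|] and symmetry."""
    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    beyond = max(0.0, 0.5 - total * h / 3)  # area beyond |t| in one tail
    return beyond if t >= 0 else 1.0 - beyond

def one_sided_test(data, null_mean=0.0):
    """Return (t-value, p-value) for H0: mu <= null_mean vs HA: mu > null_mean."""
    n = len(data)
    t = (mean(data) - null_mean) / (stdev(data) / math.sqrt(n))
    return t, upper_tail(t, n - 1)

situations = {
    1: [-0.60, -1.05, -2.09, -1.23, 0.71, -0.53, 0.33, -0.48, -1.42],
    2: [8.45, 10.18, 10.98, 10.35, 10.75, 8.98, 8.84, 10.38, 9.79],
    3: [0.06, -0.36, 0.98, 0.82, -0.25, 2.49, -1.34, 1.16, 1.53],
    4: [-0.43, -8.35, -8.31, 26.11, 4.32, 25.02, 9.40, 11.54, 0.71],
}
results = {k: one_sided_test(v) for k, v in situations.items()}

for k, (t, p) in results.items():
    verdict = "reject null" if p < 0.05 else "cannot reject null"
    print(f"Situation {k}: t = {t:.2f}, p-value = {p:.4f} ({verdict})")
```

Situations 3 and 4 have quite different sample means but similar p-values, because the much larger spread in Situation 4 inflates its standard error.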
Example: Buying a product
- I only buy products if I can be certain that their population mean reviews are over 4.0.
  H0: μ ≤ 4.0 vs HA: μ > 4.0 (I’ll buy)
- The summary statistics for the tracker are
- 4.25 is 2.4 standard errors to the right of the null:
  t = (4.25 − 4)/0.104 = 2.4
- The p-value is the area to the right of 2.4, which is 0.8%. Since 0.8% < 5% we reject the null.
- If the reviews are representative of all the people who bought the product, then there is evidence from the sample that the population mean review is greater than 4.
Question Time
Below are the summary statistics for the Coffea watch. Suppose we will only buy the watch if we can be “sure” the mean review is over 4.0.
What is the hypothesis of interest, the t-value, the p-value and the conclusion of the test (use the 5% level)? Use the tabulated t-values for the t-distribution with 260 df.
(A) H0: μ ≤ 4.0, HA: μ > 4.0. The t-value is 2.2 and the p-value is between 1% and 2.5%. We reject the null and I buy the product.
(B) H0: μ ≤ 4.0, HA: μ > 4.0. The t-value is 0.13 and the p-value is greater than 30%. I will not buy the product.
http://www.easypolls.net/poll.html?p=59f8bf18e4b036a938d52195
Question Time
Entomologists want to understand the number of chirps a minute a cricket makes. They conjecture that it is less than 17 chirps per minute. They collect data on 15 crickets. The data is summarized above. What is the hypothesis of interest, the t-value, the p-value and the result of the test (use a t-distribution with 14 df)?
(A) H0: μ ≥ 16.6, HA: μ < 16.6. The t-value is -0.93. The p-value is more than 15%, so we cannot reject the null. We cannot say the population mean is less than 17.
(B) H0: μ ≥ 17, HA: μ < 17. The t-value is -0.93. The p-value is more than 15%, so we cannot reject the null. We cannot say the population mean is less than 17.
(C) H0: μ ≤ 17, HA: μ > 17. The t-value is -0.23. The p-value is more than 50%, so we cannot reject the null. We cannot say the population mean is less than 17.
http://www.easypolls.net/poll.html?p=59f8bf41e4b036a938d52196
Two-sided tests
Example: Tomatoes 1
You are in charge of quality control in your food company. You randomly sample fourteen packs of cherry tomatoes, each labeled 224 grams. The average weight from your fourteen boxes is 226.1g. Obviously, we cannot expect boxes filled with whole tomatoes to all weigh exactly 224 grams.
- Is the somewhat larger sample mean simply due to chance variation?
- Or is it evidence that the machine that sorts the cherry tomatoes into packages needs to be recalibrated?
The hypothesis:
H0: μ = 224g (μ is equal to the value claimed by the produce company)
HA: μ ≠ 224g (μ is either larger or smaller than the value claimed)
This is a two-sided test
- This is a two-sided test.
- This is because there is a not equal sign in the alternative hypothesis.
  H0: μ = 224 vs HA: μ ≠ 224
This is the data superimposed with a normal distribution centered about the mean 224g. We see the data seems to be slightly shifted to the right. We will test if this is statistically significant.
- After collecting the data, the basic prescription is to make a z/t-transform:
  t = (X̄ − μ)/s.e., where μ is the mean under the null.
- For this data set:
  t = (226.1 − 224)/1.01 = 2.07
How unusual is this data, assuming the machine is properly calibrated (null is true)? We calculated that the sample mean is t = 2.07 standard errors from the mean under the null. The area to the right of 2.07 is 2.9%. Samples from a properly calibrated machine that are at least as unusual as this have a t-value that is either greater than 2.07 or less than -2.07. The chance of this is the area to the right of 2.07 plus the area to the left of -2.07, which is 2 × 2.9% = 5.8%.
Definition: The P-value (for two-sided test)
Definition: We want to quantify the proportion of random samples that are at least as unusual as our actual result, if the null hypothesis were true. This quantity is called the p-value. The p-value (for a two-sided test, which this is) is 2 × the smallest area.
Tomato example:
- Since 5.8% > 5% we deem this probability large.
- For this data set, we cannot reject the null. We will not investigate the tomato packing machine.
- There is always the possibility that the conclusion is incorrect.
Further reading: http://onlinestatbook.com/2/tests_of_means/single_mean.html
p-value = 2 × smallest area = 2 × 2.9% = 5.8%
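The doubling rule for a two-sided test can be sketched in Python as follows. As an assumption made here (not the course's method), the tail area of the t-distribution is approximated by Simpson's rule using only the standard library; the function names are hypothetical, and the inputs t = 2.07 and 13 df come from the tomato example.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def tail_beyond(t_abs, df, steps=20000):
    """Approximate P(T > t_abs) for t_abs >= 0 using Simpson's rule on [0, t_abs]."""
    h = t_abs / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(t_abs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 0.5 - total * h / 3)

def two_sided_p_value(t, df):
    """Two-sided p-value: twice the smaller of the two tail areas."""
    return 2 * tail_beyond(abs(t), df)

# Tomatoes 1: t = 2.07 with 14 - 1 = 13 degrees of freedom.
p_two_sided = two_sided_p_value(2.07, 13)
print(f"one tail = {p_two_sided / 2:.3f}, two-sided p-value = {p_two_sided:.3f}")
```

Because the t-distribution is symmetric, a t-value of -2.07 gives the same two-sided p-value, which is why only the size of t matters for a two-sided test.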
p Always try to match the calculation you have made with the Statcrunch output for the same problem.
p We calculated
t = (226.1 - 224) / 1.01 = 2.07
p Using a t-distribution with 13 degrees of freedom, the p-value is between 5% and 10% from the tables, or exactly 5.8% by computer.
p We cannot reject the null.
p Statcrunch gives the same result. It is important to map the calculation to the Statcrunch output.
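As a rough cross-check of the table lookup, the two-sided p-value for t = 2.07 with 13 degrees of freedom can be approximated numerically. The sketch below is not how Statcrunch works internally; it assumes only the Python standard library and integrates the t density with Simpson's rule. The helper names t_pdf, t_upper_tail and two_sided_p_value are made up for this illustration.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Area to the right of |t|, using Simpson's rule on [0, |t|].

    The density is symmetric about 0, so the area to the right of |t|
    is 0.5 minus the area between 0 and |t|. `steps` must be even.
    """
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def two_sided_p_value(t, df):
    """Two-sided p-value: 2 x the smallest tail area."""
    return 2 * t_upper_tail(t, df)


# Tomato example 1: t = 2.07, n = 14 boxes, so 13 degrees of freedom.
p = two_sided_p_value(2.07, 13)
print(f"two-sided p-value: {100 * p:.1f}%")  # lands between 5% and 10%
```

The result should agree with the table range (between 5% and 10%) and be close to the computer value quoted above; small differences come from rounding t to 2.07.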
Example: Tomatoes 2
Example: You are (again) in charge of quality control in your food company. You randomly sample fourteen packs of cherry tomatoes, each labeled 224 grams. The average weight from your fourteen boxes is 221.7g. Obviously, we cannot expect boxes filled with whole tomatoes to all weigh exactly 224 grams.
p Is the somewhat smaller weight simply due to chance variation?
p Or is it evidence that the machine that sorts the cherry tomatoes into packages needs to be recalibrated?
The hypothesis:
H0 : µ = 224g (µ is equal to the value claimed by the produce company)
HA : µ ≠ 224g (µ is either larger or smaller than the value
claimed)
The tomato machine data (2)
p This is the next data set we observe.
The summary statistics are
The sample mean is 221.7g (average of 14 boxes).
H0 : µ = 224 vs HA : µ ≠ 224
This is the data superimposed with a normal distribution centered about the mean 224g. We see the data seems to be shifted to the left. We will test if this is statistically significant.
The basic prescription
p After collecting the data, the basic prescription is to make a z/t-transform.
p This is a summary of the statistics:
t = (X̄ - µ) / s.e., where µ is the mean under the null.
H0 : µ = 224 vs HA : µ ≠ 224
t = (221.7 - 224) / 0.31 = -7.4
How unusual is this data, assuming the machine is properly calibrated (null is true)? We calculated that the sample mean is t = -7.4 standard errors from the mean under the null. The area to the left of -7.4 is almost 0%. Samples from a properly calibrated machine that are at least as unusual as this have a t-value that is either greater than 7.4 or less than -7.4. The chance of this is the area to the right of 7.4 plus the area to the left of -7.4, which is 2 × (almost 0%), so still almost 0%.
Since 0% < 5% we deem this probability very small.
It is very, very hard to get this type of data under the scenario that the
machine is properly calibrated and working. Thus there is strong
evidence that the tomato packing machine is not packing correctly and
the machine will have to be recalibrated.
Connecting two-sided tests and confidence intervals
p The results of a test at a certain significance level and confidence intervals are closely related.
p We use the two tomato examples to illustrate the connection.
p We recall the hypothesis is
H0 : µ = 224 vs HA : µ ≠ 224
p Tomato 1: Summary statistics
p Tomato 1: The 95% confidence interval for the mean is [226.1±1.02×2.16] = [223.9, 228.3]
p The confidence interval gives plausible values for the mean.
p This means that 224 is a plausible mean. We cannot discount the null hypothesis.
p If the mean under the null is inside the 95% confidence interval, then for a two-sided test the p-value is greater than 5% and we cannot reject the null.
p Similarly, if the mean under the null is inside a 99% confidence interval for the mean, then the p-value for a two-sided test is greater than 1%.
H0 : µ = 224 vs HA : µ ≠ 224
p Tomato 2: The summary statistics are
p Based on the data the 95% confidence interval for the mean is [221.8±0.31×2.16] = [221.13, 222.5]
p The interval tells us where the population mean is likely to lie.
p 224g is not in this interval. This suggests that 224g is not a plausible mean.
p Since 224g is not in the 95% confidence interval for the mean, the p-value for the two-sided test is less than 5%.
p If 224g is not in the 99% confidence interval for the mean, the p-value for the two-sided test will be less than 1%.
p The above arguments do not hold for one-sided tests.
p The relationship between one-sided tests and confidence intervals is more complicated and will not be covered in this class.
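The link between the two tomato examples and their confidence intervals can be checked in a few lines. This is a minimal sketch, assuming the summary numbers quoted in this section (sample means 226.1 and 221.8, standard errors 1.02 and 0.31, and the 13 df critical value 2.16); the function name ci_and_two_sided_decision is hypothetical.

```python
def ci_and_two_sided_decision(xbar, se, t_crit, mu0):
    """Return the confidence interval, whether mu0 lies inside it,
    and whether the two-sided test rejects H0: mu = mu0.

    t_crit is the t-table value matching the confidence level
    (2.16 for 95% with 13 degrees of freedom).
    """
    half_width = t_crit * se
    lower, upper = xbar - half_width, xbar + half_width
    inside = lower <= mu0 <= upper
    t = (xbar - mu0) / se
    reject = abs(t) > t_crit
    return (lower, upper), inside, reject


tomato1 = ci_and_two_sided_decision(226.1, 1.02, 2.16, 224)
tomato2 = ci_and_two_sided_decision(221.8, 0.31, 2.16, 224)

for name, (ci, inside, reject) in [("Tomato 1", tomato1), ("Tomato 2", tomato2)]:
    print(name, [round(v, 2) for v in ci], "224 inside:", inside, "reject:", reject)
```

In both cases "224 is inside the 95% interval" and "reject at the 5% level" are opposites, which is the relationship described above for two-sided tests.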
Question Time
The Windchill factor in a certain area is measured over a period of 216 days. The summary statistics and the critical values for the t-distribution with 215 degrees of freedom are given below.
Linking the different sided tests
p We recall for a given data set and population mean we can do three different tests. However, the results of all the tests are closely related.
p Situation 1: The results for H0 : μ ≤ 0 against HA : μ > 0 are
p Suppose we want to test the hypothesis that red wine decreases polyphenol levels. Then our hypothesis of interest is H0 : μ ≥ 0 against HA : μ < 0.
q The p-value for this test can easily be deduced from the above table.
q The t-value is the same.
q The p-value is different.
q The p-value is the area to the LEFT of -2.45, which is 1 - 0.98 = 0.02 = 2%.
o Testing H0 : μ ≥ 0 against HA : μ < 0. Since the p-value 2% < 5% there is some evidence based on this data set that red wine decreases polyphenol levels.
o If we test H0 : μ = 0 against HA : μ ≠ 0, the p-value is 4% and there is evidence to suggest the mean is not zero.
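The way the three p-values follow from one another can be written down directly. The sketch below assumes the right-sided p-value is 0.98, the value implied by the calculation 1 - 0.98 above; the function name related_p_values is made up for this illustration.

```python
def related_p_values(p_right):
    """Given the p-value of the test pointing RIGHT (HA: mu > mu0),
    return the p-value of the test pointing LEFT and of the two-sided test.

    The left and right areas add up to 1, and the two-sided p-value
    is 2 x the smallest of the two areas.
    """
    p_left = 1.0 - p_right
    p_two_sided = 2.0 * min(p_left, p_right)
    return p_left, p_two_sided


p_left, p_two_sided = related_p_values(0.98)
print(f"left: {100 * p_left:.0f}%, two-sided: {100 * p_two_sided:.0f}%")
```

With 0.98 as the input this gives the 2% and 4% quoted in the bullets above.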
Question Time
Experts conjecture that the waiting time between eruptions of Old Faithful is more than 68 minutes. What is the hypothesis of interest and the p-value (using the above output)?
q (A) H0 : µ ≤ 68, HA : µ > 68. The p-value is 0.05%. Reject the null.
q (B) H0 : µ ≤ 70.9, HA : µ > 70.9. The p-value is 0.1%. Reject the null.
q (C) H0 : µ ≤ 68, HA : µ > 68. The p-value is 0.025%. Reject the null.
q http://www.easypolls.net/poll.html?p=59f8c07ae4b036a938d521a2
Question Time
Experts conjecture that the waiting time between eruptions of Old Faithful is less than 68 minutes. What is the hypothesis of interest and the p-value (using the above output)? http://www.easypolls.net/poll.html?p=59f8c0b5e4b036a938d521a4
q (A) H0 : µ ≥ 70.9, HA : µ < 70.9. The p-value is 99.95%. Reject the null.
q (B) H0 : µ ≥ 68, HA : µ < 68. The p-value is 0.05%. Cannot reject the null.
q (C) H0 : µ ≥ 68, HA : µ < 68. The p-value is 99.75%. Cannot reject the null.
Question Time (one-sided)
p Let µ denote the (population) mean level of glucose in an expectant mother. If µ > 140 gestational diabetes is diagnosed. The hypothesis we want to test is
H0 : µ ≤ 140 vs HA : µ > 140
p 6 blood samples are taken. The results are summarized above. What is the result of the test at the 5% level (use the t-distribution with 5df)?
p (A) The t-value is 1.64 and the p-value is between 5-10%. We can reject the null and diagnose diabetes.
p (B) The t-value is 1.64 and the p-value is between 5-10%. We cannot reject the null. The data does not suggest she has gestational diabetes; she could have got a sample mean of 142 even if she were well.
p http://www.easypolls.net/poll.html?p=59fb6276e4b036a938d52809
When to use the normal distribution instead of a t-distribution in a statistical test
Example: Using the normal distribution: Low Potassium
p Hypokalemia is diagnosed when the blood potassium level is below 3.5mEq/dl.
p The potassium in a blood sample varies from sample to sample and follows a normal distribution with unknown mean.
p However, several years of data mean that the standard deviation (the variation between samples) is known to be 0.2.
p Since the standard deviation is known and not estimated from a sample, we use a normal distribution instead of a t-distribution (look back at chapter 6).
p As we are looking for evidence of low potassium, the hypothesis of interest is H0 : μ ≥ 3.5 against HA : μ < 3.5.
p This is a one-sided test.
Example: Using the normal distribution: Low Potassium
p We test H0 : μ ≥ 3.5 against HA : μ < 3.5.
p A patient has 9 blood samples taken and their sample mean is 3.4. Is there evidence to suggest low potassium (use the 5% significance level)?
p The standard error is 0.2/√9 = 0.0667.
p Below we plot the distribution of the sample mean if the null were true.
p The p-value is 6.68%.
Left: Distribution of sample mean under the null. The p-value is in red.
p To calculate the p-value using the z-tables, we make the z-transform, which is calculated in the same way as a t-transform. We simply use different tables to get the p-values.
z = (3.4 - 3.5) / (0.2/√9) = -0.1 / 0.0667 = -1.5
Looking up -1.5 in the z-tables (remember the standard deviation is known) gives the p-value 6.68%. As this is greater than 5% we cannot reject the null. Despite the person having a sample mean below 3.5, such a sample can be collected when their true mean is 3.5. Thus there is not enough evidence that the person has low potassium.
Consequence We do not subject the person to more medical checks.
Example: Gestational diabetes
p A patient has gestational diabetes if the mean glucose level of the patient is over 140. We are looking for evidence of gestational diabetes.
p The test is H0 : μ ≤ 140 against HA : μ > 140.
p μ is never known. All we have are the results from a few blood samples.
p However, it is known that the amount of glucose in blood is normally distributed with known standard deviation σ=4.
p A patient goes to the doctor. We do not know if she has gestational diabetes (μ is unknown). The glucose level in her blood samples is assumed to be normally distributed with σ=4. After taking 4 blood samples her sample mean is 145. Is there evidence that she has gestational diabetes?
Example: Gestational diabetes
p We want to test H0: μ ≤ 140 against the alternative HA: μ > 140. Based on the data, can we reject the claim that she is healthy?
p To do this we need to know the variability in the sample mean. This is quantified by the standard error = 4/√4 = 2.
p Next we have to calculate how far her sample mean is from the mean if she were healthy: z-transform = (145-140)/2 = 2.5 (we call it a z-transform rather than a t-transform because we know the standard deviation).
p Since the alternative is pointing to the right, we need to calculate the probability to the right of 2.5. From the z-tables this is 0.6%.
p 0.6% is quite small. It says the chance of getting a sample mean of 145 or higher, when the patient does not have gestational diabetes, is 6 in 1000.
p Since 0.6% < 5% (it is very small), we reject the null. There is strong evidence from her blood samples that she has gestational diabetes.
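Both known-standard-deviation examples above (low potassium and gestational diabetes) follow the same recipe, which can be sketched with Python's standard library in place of the z-tables. The function one_sided_z_test and its alternative argument are illustrative names, not part of any statistics package.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(xbar, mu0, sigma, n, alternative):
    """One-sided z-test when the population standard deviation is known.

    alternative="less"    : HA: mu < mu0, p-value is the area to the LEFT of z
    alternative="greater" : HA: mu > mu0, p-value is the area to the RIGHT of z
    """
    se = sigma / sqrt(n)              # standard error of the sample mean
    z = (xbar - mu0) / se             # z-transform
    left_area = NormalDist().cdf(z)   # standard normal area to the left of z
    p_value = left_area if alternative == "less" else 1.0 - left_area
    return z, p_value


# Low potassium: 9 samples, mean 3.4, sigma 0.2, HA: mu < 3.5
z_k, p_k = one_sided_z_test(3.4, 3.5, 0.2, 9, "less")
# Gestational diabetes: 4 samples, mean 145, sigma 4, HA: mu > 140
z_g, p_g = one_sided_z_test(145, 140, 4, 4, "greater")

print(f"potassium: z = {z_k:.2f}, p-value = {100 * p_k:.2f}%")
print(f"glucose:   z = {z_g:.2f}, p-value = {100 * p_g:.2f}%")
```

The only thing that changes between the two examples is which tail is used, and that is decided by the direction of the alternative hypothesis.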
Question Time
p Low potassium is diagnosed if the mean level in a person is less than 3.5. The standard deviation of a given blood sample is known to be 0.3 (this means use a normal distribution). The hypothesis of interest is
H0 : µ ≥ 3.5 vs HA : µ < 3.5
p A person has 4 blood samples taken. The sample mean is 3.0. Is there any evidence they have low potassium (use the 5% level)?
p (A) The z-value is z = -3.33. The p-value is 0.04%. This is so small, there is strong evidence to suggest they have low potassium (reject null).
p (B) The z-value is z = -3.33. The p-value is 0.04%. This is so small, we cannot reject the null.
p (C) The z-value is z = 1.66. The p-value is 4.7%. There is some evidence to reject the null and determine they have low potassium.
p http://www.easypolls.net/poll.html?p=59fb6645e4b036a938d52822
Choice of level
Deciding the conclusion with α
p A very small P-value indicates that our results would be unlikely to occur if the null hypothesis were true, and therefore H0 is implausible. It should be rejected. In this case we say the evidence is significant.
p The smaller the P-value the stronger the evidence against H0.
p The significance level α is the largest P-value for which we are
willing to reject the null hypothesis.
The value of α is decided before conducting the test.
p If the P-value is equal to or less than α then we reject H0. This is when we accept HA as the truth.
p If the P-value is greater than α then we fail to reject H0. Whatever evidence there is, it is not sufficient to accept HA.
p Typically we set α=5%.
Comments on the decision rule
p The objective of a test is to decide between the plausibility of two competing hypotheses.
p The p-value is the probability of observing data at least as extreme as ours under the assumption the null hypothesis is true.
p If the p-value is less than the significance level (often set at 5%), the decision is to reject the null and go for the alternative instead.
p If the p-value is greater than 5% then the data is consistent with the null being true and we cannot reject the null.
p The point is there is a chance we made the wrong decision. We could have wrongly rejected the null when actually the null is true.
p When the null is true, the chance of this happening is the significance level. In other words, if we set the significance level at 5% and the null is true, there is a 5% chance that the p-value is less than 5% and we wrongly reject the null.
p The value at which we set the significance level determines how willing we are to wrongly reject the null hypothesis.
p Examples:
p Suppose we are in a tomato packing plant. Our aim is to ensure that the mean weight of a tomato box is 227g.
p Every few hours we randomly sample 14 boxes of tomatoes and do a hypothesis test. Each test is done at the 5% level.
p Suppose we do the test 100 times. If the null hypothesis is true, then on average we would falsely reject the null 5 times.
p Each time we falsely reject the null, it is called a type I error or, in medical terms, a false positive.
p Suppose we reduce the significance level to 1%. In this case, if the null were true we would falsely reject the null 1 time out of a hundred.
p We will show in Chapter 8 that by increasing the significance level (from, say, 5% to 10%) we increase the number of false positives, but we are more likely to detect the alternative (if it is true).
p Decreasing the significance level will have the opposite effect.
p The p-value is measuring the level of evidence against the null. The smaller the p-value, the more evidence against it.
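The claim that a 5% level leads to about 5 false rejections in 100 tests (and a 1% level to about 1) can be checked by simulation. The sketch below makes assumptions that are not in the slides: box weights are normal with a known standard deviation of 4g (a made-up value), so a two-sided z-test is used instead of a t-test, and the machine really is calibrated at 227g. The function name false_rejection_rate is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_rejection_rate(alpha, trials=20000, n=14, mu0=227.0, sigma=4.0, seed=1):
    """Fraction of tests that reject H0: mu = mu0 when H0 is actually true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw n boxes from a correctly calibrated machine (null is true).
        xbar = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (xbar - mu0) / se
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p_value <= alpha:
            rejections += 1  # a type I error (false positive)
    return rejections / trials


rate_5 = false_rejection_rate(0.05)
rate_1 = false_rejection_rate(0.01)
print(f"alpha = 5%: falsely rejected {100 * rate_5:.1f}% of the time")
print(f"alpha = 1%: falsely rejected {100 * rate_1:.1f}% of the time")
```

The simulated rates should sit close to the chosen significance levels, which is exactly what the significance level controls.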
The Significance level
p How to choose the significance level?
p There is a trade-off between not wanting to falsely reject the null and wanting to detect the alternative.
p The lower the significance level, the less likely we are to falsely reject the null, but this makes detecting the alternative much harder!
p Example: Consider the court case H0: Innocent, HA: Guilty.
p The p-value is the probability of observing the evidence given the null is actually true. If we set the significance level at 5%, then a person is determined guilty if the p-value is less than 5%.
p This means 5% of all innocent people who were put on trial will be determined guilty.
p This is too much!
p To avoid convicting such a large proportion of innocent people we need to reduce the significance level.
p If the significance level is set to zero, this means that no one who is innocent is put into jail.
p However, it also means that all guilty people go free. In other words, no amount of evidence is enough to convict a person.
p What significance level seems reasonable in this case?
§ 0.01%?
p The choice of significance level depends on the application. 5% is reasonable for a tomato packing plant (we can afford to check a machine several times), but too large for a conviction.
Checking reliability of the p-value
How reliable are these p-values?
p Remember, to calculate the p-values we have used the normal or t-distribution (depending on whether the population standard deviation is known or not).
p Underlying these calculations is the assumption that the sample mean is normally distributed (remember we always make a plot of the normal distribution and center it about the mean under the null).
p If the sample size is not large enough, the central limit theorem will not have "kicked in". Then the sample mean won't be normally distributed. This means the probabilities we have calculated won't be reliable, just like the 95% CI for the mean won't really be a 95% confidence interval.
q In this case we must be cautious in interpreting the results of the test.
p Nevertheless: If the p-value is extremely small (say 0.0001), it would be small even if the correct distribution of the sample mean were used.
p On the other hand, if the p-value is close to the 5% significance level we need to be careful about its statistical significance (since the correct distribution may mean the "true" p-value is greater than 5%).
Example: Siblings
p The university is interested in the (population) mean number of younger siblings a student at the university has (in the hope that they will attend the university). They believe that the mean is greater than 0.25. To test this hypothesis, H0: μ ≤ 0.25 against HA: μ > 0.25, they randomly sample 3 students and ask them how many siblings they have. They answer 0, 1, 3. The sample mean is 1.33 and the sample standard deviation is 1.53.
p Question: What are the conclusions of the test at the 10% level? Comment on the reliability of the result.
p Answer: The t-transform is t = (1.33-0.25)/(1.53/√3) = 1.22. Using the t-tables (with 2df) we see the area to the right of 1.22 lies somewhere between 15-20%. Since the alternative hypothesis is pointing RIGHT this means the p-value is between 15-20%. As this is greater than 10%, we cannot reject the null at the 10% level.
Now we comment on the reliability of this p-value. In HW9, Q1 we made a plot of the sample mean (based on size 3) for younger sibling numbers.
q The distribution of the sample mean is the lowest plot on the left. This is clearly not normal (see also the corresponding QQplot). This means that the p-value is not correct: it is based on normality when the sample mean is not normal.
q This means we have to be very careful when we interpret this p-value.
q We recall that if the sample size is larger (in Q2, Quiz 9 we looked at sample size n = 150), then the sample mean is close to normal and the corresponding p-value will be closer to the truth (as if it came from the true distribution of the sample mean).
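The siblings calculation can be reproduced from the three raw answers. This is a minimal sketch using only the Python standard library; it relies on the fact that the t-distribution with 2 degrees of freedom has a simple closed-form CDF, F(t) = 1/2 + t / (2√(t² + 2)), so no tables are needed. The variable names are chosen for this illustration. Like the table lookup, the result still assumes the sample mean is normal, which is exactly the assumption questioned above.

```python
from math import sqrt
from statistics import mean, stdev

siblings = [0, 1, 3]   # answers from the 3 sampled students
mu0 = 0.25             # mean under the null

xbar = mean(siblings)                 # sample mean
s = stdev(siblings)                   # sample standard deviation (n - 1 divisor)
se = s / sqrt(len(siblings))          # standard error
t = (xbar - mu0) / se                 # t-transform

# Area to the RIGHT of t for a t-distribution with 2 degrees of freedom,
# using the closed-form CDF F(t) = 1/2 + t / (2 * sqrt(t^2 + 2)).
p_value = 0.5 - t / (2 * sqrt(t * t + 2))

print(f"mean = {xbar:.2f}, sd = {s:.2f}, t = {t:.2f}")
print(f"one-sided p-value = {100 * p_value:.1f}%")  # falls between 15% and 20%
```

Working from the unrounded mean and standard deviation gives a t-value very slightly above the 1.22 obtained from the rounded numbers; the conclusion is unchanged.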
Lab practice
p Our aim is to make inference about the mean weight of a newborn calf based on the sample mean of 44 calves.
p We first make a histogram of the data, to see if there are any major deviations from normality.
p The distribution of weights at birth does not have an obvious skew or thick tail. This means that the distribution of the sample mean based on a sample of 44 will be very close to normal. So we can rest assured that using the t-distribution (since the standard deviation is unknown) will be reliable.
p Question: Based on the data, is there evidence to suggest the mean weight of calves is greater than 90 pounds?
p We test H0: μ ≤ 90 against HA : μ > 90.
p We deduce the p-value in Statcrunch.
p The p-value is 0.44%. Since 0.44% < 1%, we reject the null at the 1% level (and accept the alternative).
p This means there is strong evidence in the data to suggest the mean weight of calves (of that breed) is greater than 90 pounds.
Connecting confidence intervals and statistical tests.
This is old material and will not be tested or covered.
Two-sided tests and confidence intervals
p There is a close connection between confidence intervals and two-sided tests. Let us return to the one bed apartment in Dallas example.
p 10 apartments are randomly sampled. The sample mean and the sample standard deviation based on this sample are 980 dollars and 250 dollars (both are estimators based on a sample of size ten). The 95% confidence interval for the mean is [980±2.262×79]=[801,1159].
q Suppose we want to know whether the price of apartments has changed since last year, where the mean price was 850 dollars.
q We see that 850 dollars is contained in this interval. This means the mean could be 850 dollars. Therefore, given the sample, it is unclear whether the mean price of apartments has changed since last year or not.
q We can rewrite the above as a statistical test H0: μ = 850 against HA : μ ≠ 850. The t-transform is t = (980-850)/79 = 1.64. Looking at the t-distribution, we see that 1.64 < 2.262 (this is the t-value corresponding to 9df at 2.5%). Therefore, the p-value is greater than 5%. Thus we cannot reject the null at the 5% level.
q Further reading: http://onlinestatbook.com/2/logic_of_hypothesis_testing/sign_conf.html
p Summarizing these two observations we see that:
p 850 lies inside the 95% confidence interval [801,1159].
p We are unable to reject the null at the 5% level.
p If the mean under the null lies in the 95% confidence interval, then this implies the corresponding p-value will be greater than 5%.
p On the other hand, if the mean under the null does not lie in the 95% confidence interval its p-value will be less than 5%.
p This is easily seen with an illustration (see later slides).
p If 850 is in an interval centered about 980 (where each side has length 178.7), then 980 must be in the interval centered about 850 with sides of length 178.7. A few slides earlier we showed that this interval [850±2.262×79]=[671,1029] corresponded to points where we make a decision to reject the null or not at the 5% level.
p In general, if the mean under the null lies in a (1-α)×100% confidence interval, then the p-value for a two-sided test will be greater than α.
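The general statement can be written as a short chain of equivalences. The notation here is introduced only for this sketch and is not from the slides: \bar{x} is the sample mean, \mu_0 the mean under the null, s.e. the standard error, and t^{*} the t-table value used to build the (1-α)×100% interval (2.262 in the apartment example).

```latex
\begin{align*}
\mu_0 \in \left[\bar{x} - t^{*}\,\mathrm{s.e.},\ \bar{x} + t^{*}\,\mathrm{s.e.}\right]
  &\iff |\bar{x} - \mu_0| \le t^{*}\,\mathrm{s.e.} \\
  &\iff |t| = \left|\frac{\bar{x} - \mu_0}{\mathrm{s.e.}}\right| \le t^{*} \\
  &\iff \text{two-sided p-value} \ge \alpha .
\end{align*}
% Apartment example: |980 - 850| = 130 \le 2.262 \times 79 = 178.7,
% equivalently |t| = 130/79 = 1.64 \le 2.262, so the p-value exceeds 5%.
```

The middle step is just division by the standard error, which is why the interval check and the test can never disagree for a two-sided alternative.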
Confidence intervals and one-sided tests
Consider the polyphenol and red wine example considered in Chapter 6. 15 randomly sampled men were asked to drink red wine every day for two weeks. Their change in polyphenol levels was measured:
0.7, 3.5, 4.0, 4.9, 5.5, 7.0, 7.4, 8.1, 8.4, 3.2, 0.8, 4.3, -0.2, -0.6, 7.5.
The average change is 4.3 and the sample standard deviation is 3.06.
p Review: Two-sided tests and confidence intervals
p The 95% confidence interval for the change in polyphenol levels is [2.6,5.99]. This means if I am testing the hypothesis H0: μ = 0 against the alternative HA: μ ≠ 0, since 0 is not in the interval the p-value is less than 100% - 95% = 5%.
p The 99% confidence interval for the change in polyphenol levels is [1.94,6.66]. This means if I am testing the hypothesis H0: μ = 0 against the alternative HA: μ ≠ 0, since 0 is not in the interval the p-value is less than 100% - 99% = 1%.
q One-sided test (pointing RIGHT) Suppose we are testing that polyphenol levels increase. This means testing the hypothesis H0: μ ≤ 0 against the alternative HA: μ > 0. The p-value is the area to the right of 4.3 (see that the alternative is pointing to the right). Since from above we have deduced that in the two-sided test the p-value is less than 5%, for the one-sided test the p-value is less than 2.5%.
q Why?
Recall the p-value for two-sided tests is the smallest area to the left/right of the t-transform times 2. In this case it is the area to the right of 4.3 times 2.
For the two-sided test we have deduced that the p-value is less than 5%. This implies that the area to the RIGHT of 4.3 is less than 5/2 = 2.5%. The p-value for the one-sided test pointing to the RIGHT is the area to the right of 4.3. We have just shown that the area to the right of 4.3 is less than 2.5%. Thus the p-value for the one-sided test pointing to the RIGHT is less than 2.5%.
q One-sided test (pointing LEFT) Suppose we are testing that polyphenol levels decrease. This means testing the hypothesis H0: μ ≥ 0 against the alternative HA: μ < 0. Since 0 is not in the 95% confidence interval and the sample mean 4.3 is above 0, this means the p-value is greater than 97.5% (there is no evidence to reject the null, which is clear since 4.3 lies within the null hypothesis).
q Why?
On the previous slide we showed that the p-value for the hypothesis pointing to the RIGHT is less than 2.5%, that is, the area to the RIGHT of 4.3 is less than 2.5%. The p-value for the test pointing to the LEFT is the area to the LEFT of 4.3, which has to be greater than 97.5% (since the area to the left plus the area to the right is 100%).
But this is obvious. The point of a test is to see how plausible the data is under the null. If the sample mean is 4.3 and the null is that the true mean is greater than or equal to 0, this is highly plausible! If this is highly plausible we cannot reject the null.
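The summary statistics and both confidence intervals quoted for the polyphenol data can be reproduced from the 15 raw measurements. The sketch below assumes the sixth value is 7.0 and uses the standard t-table values for 14 degrees of freedom (2.145 for 95% and 2.977 for 99%); the helper name confidence_interval is made up for this illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Change in polyphenol levels for the 15 men (sixth value read as 7.0).
changes = [0.7, 3.5, 4.0, 4.9, 5.5, 7.0, 7.4, 8.1, 8.4,
           3.2, 0.8, 4.3, -0.2, -0.6, 7.5]

xbar = mean(changes)
s = stdev(changes)
se = s / sqrt(len(changes))


def confidence_interval(t_crit):
    """Interval xbar +/- t_crit * s.e. for a t-table value t_crit."""
    return xbar - t_crit * se, xbar + t_crit * se


ci95 = confidence_interval(2.145)   # 14 df, 95%
ci99 = confidence_interval(2.977)   # 14 df, 99%

print(f"mean = {xbar:.2f}, sd = {s:.2f}, s.e. = {se:.3f}")
print("95% CI:", [round(v, 2) for v in ci95])
print("99% CI:", [round(v, 2) for v in ci99])

# 0 lies below both intervals, so the area to the RIGHT of the sample mean
# (p-value for HA: mu > 0) is below 0.5% and the area to the LEFT
# (p-value for HA: mu < 0) is above 99.5%.
zero_below_both = 0 < ci95[0] and 0 < ci99[0]
print("0 below both intervals:", zero_below_both)
```

Because 0 is even outside the 99% interval, the one-sided bounds can be tightened from 2.5% and 97.5% to 0.5% and 99.5% by the same halving argument.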
Illustration, mean in confidence interval
Illustration, mean not in confidence interval
Example 1: CI and testing
Scientists want to understand whether Omega 3 supplements increase the IQ of people. They randomly sampled 30 people (who previously did not take any supplementation), took their IQ before the experiment and asked them to take a daily 1000mg dose of EPA/DHA Omega 3. After two months they measured the IQ again. They took the difference between the current IQ (after supplementation) and previous IQ (before supplementation) and evaluated the average, which was x̄ = 7 (so for this group there was an overall increase, but we do not know, without statistics, whether this is by chance). The 95% CI for the mean change is [-1,15].
q Question We want to test the hypothesis H0: μ = 0 against HA: μ ≠ 0. What is the result of the test using the 5% significance level?
q Answer Because 0 is in the 95% CI [-1,15], the p-value for the two-sided test is greater than 5%, so there is not enough evidence to reject the null.
p Question We want to test the hypothesis H0: μ ≤ 0 against HA: μ > 0 (in fact this is really the hypothesis of interest as it asks whether Omega 3 on average increases IQ). What is the result of the test using the 5% significance level?
p Answer In this case, the p-value is the area to the RIGHT of 7. This is the smallest area. We know the p-value for the two-sided test is greater than 5%. This means the p-value for this one-sided test is greater than 2.5%, so we would NOT be able to reject the null if we did the test at the 1% level. However, it is unknown whether the p-value is less than 5%, so we do not know whether or not we can reject the null at the 5% level. Further calculations need to be done to determine the p-value in this case.
p Question We want to test the hypothesis H0: μ ≥ 0 against HA: μ < 0 (the hypothesis of interest asks whether Omega 3 on average decreases IQ). What are the results of the test using the 5% significance level?
p Answer Since the average is x̄ = 7, which lies within the region under the null (μ ≥ 0), there is no evidence to reject the null.
Example 2: CI and testing
Scientists want to understand whether Omega 3 supplements increase the IQ of people. This time they randomly sampled 100 people (who previously did not take any supplementation), took their IQ before the experiment and asked them to take a daily 1000mg dose of EPA/DHA Omega 3. After two months they measured the IQ again. They took the difference between the current IQ (after supplementation) and previous IQ (before supplementation) and evaluated the average, which was x̄ = 6.5. The 95% CI for the mean change was [2.11,10.88].
q Question We want to test the hypothesis H0: μ = 0 against HA: μ ≠ 0. What are the results of the test using the 5% significance level?
q Answer Because 0 is not in the 95% CI [2.11,10.88], the area to the RIGHT (we need the smallest area) of 6.5 will be LESS than 2.5%. Thus the p-value is less than 5% and we can reject the null at the 5% level.
p Question We want to test the hypothesis H0: μ ≤ 0 against HA: μ > 0 (in fact this is really the hypothesis of interest as it asks whether Omega 3 on average increases IQ). What is the result of the test using the 5% significance level?
p Answer In this case, the p-value is the area to the RIGHT of 6.5, which we know from the two-sided test is LESS than 2.5%. This means we can reject the null at the 5% level.
p Question We want to test the hypothesis H0: μ ≥ 0 against HA: μ < 0 (the hypothesis of interest asks whether Omega 3 on average decreases IQ). What is the result of the test using the 5% significance level?
p Answer Since the average is x̄ = 6.5 > 0, there is no evidence to say the mean is less than zero. The p-value is greater than 50%.
It is important to observe that the p-values for both the one-sided tests will always add up to one.
What is wrong with the following?
p A random sample of size 30 is taken from a population that is assumed to have a standard deviation of 5. The standard deviation of the sample mean (standard error) is 5/30.
q Recall, the standard error is 5/√30.
p A study where the sample mean is 45 reports statistical significance (p-value less than α) for H0: μ ≤ 55 against HA: μ > 55.
q This is an example where you need to consider which way the one-sided test is pointing. It is clear that with a sample mean of 45, there is no evidence whatsoever to support the alternative. In this case the p-value will be greater than 50%.
q Why? Because the p-value is the area to the right of 45 with the mean centered at 55. If you make a plot, it is clear that the p-value is greater than 50%.
p A test rejected the null hypothesis that the sample mean was equal to 50.
q Hypotheses are always about the population mean, which is unobserved. It makes no sense to state the hypotheses in terms of the sample mean, which is observed.
p A test preparation company wants to test that the average score of their students on the ACT is better than the national score of 21.5. They state their alternative hypothesis as HA: μ > 21.5. The z-value is equal to 0.018. Because this is less than the significance level 5%, the null hypothesis is rejected.
q This is an example where the z-transform has been mistaken for a probability! We need to deduce the probability (which is the p-value) by looking up 0.018 in the z-tables. This turns out to be 49% (0.018 is the number of standard errors the sample mean is from 21.5). Since it is so close to the mean, it is clear that the p-value will be just below 50% and we cannot reject the null.
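Three of the mistakes above can be checked numerically with the standard library. In this sketch the standard error of 2.0 used for the second item is a made-up value, since the slides do not give one; any positive value leads to the same conclusion.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Mistake 1: the standard error divides by the square root of n, not by n.
se_wrong = 5 / 30
se_right = 5 / sqrt(30)
print(f"5/30 = {se_wrong:.3f} but 5/sqrt(30) = {se_right:.3f}")

# Mistake 2: sample mean 45 with HA: mu > 55. The z-value is negative for
# any standard error, so the area to the RIGHT must exceed 50%.
hypothetical_se = 2.0                       # made-up value for illustration
z_study = (45 - 55) / hypothetical_se
p_study = 1 - std_normal.cdf(z_study)
print(f"z = {z_study:.1f}, p-value = {100 * p_study:.1f}%")

# Mistake 4: z = 0.018 is a z-value, not a probability. The p-value is
# the area to the RIGHT of 0.018 under the standard normal.
p_act = 1 - std_normal.cdf(0.018)
print(f"p-value for z = 0.018: {100 * p_act:.1f}%")  # just below 50%
```

The third item (stating a hypothesis about the sample mean) is a conceptual error, so there is nothing to compute for it.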
Accompanying problems associated with this Chapter
p Quiz 9
p Quiz 9 part 2
p Quiz 10
p Quiz 11
p Part of Homework 4
p Homework 5 (Q1-Q6).