11. Hypothesis Testing

INTRODUCTION TO HYPOTHESIS TESTING

HYPOTHESIS TESTING

• STATISTICAL TEST: The statistical procedure to draw an appropriate conclusion from sample data about a population parameter.

• HYPOTHESIS: Any statement concerning an unknown population parameter.

• Aim of a statistical test: Test a hypothesis concerning the values of one or more population parameters.

Concepts of Hypothesis Testing

• The critical concepts of hypothesis testing.

– Example:

• An operations manager needs to determine whether the mean demand during lead time is greater than 350.

• If so, changes in the ordering policy are needed.

– There are two hypotheses about a population mean:

• H0: The null hypothesis, $\mu = 350$

• H1: The alternative hypothesis, $\mu > 350$ (this is what you want to prove)

HYPOTHESIS TESTING

• Examples

– Is there statistical evidence in a random sample of inside diameters of a certain type of PVC pipe that supports the hypothesis that the true average inside diameter of all such pipes is 0.75?

– Is there statistical evidence in a random sample of circuit boards that supports the hypothesis that fewer than 10% of all circuit boards produced by a certain manufacturer are defective?

NULL AND ALTERNATIVE HYPOTHESIS

• NULL HYPOTHESIS = H0 states that a treatment has no effect or that there is no change compared with the previous situation. The parameter is equal to a single value.

• ALTERNATIVE HYPOTHESIS = HA states that a treatment has a significant effect or that there is a change compared with the previous situation. The parameter can be greater than, less than, or different from the value shown in H0.

TEST STATISTIC AND REJECTION REGION

• TEST STATISTIC: The sample statistic on which we base our decision to reject or not reject the null hypothesis.

• REJECTION REGION: Range of values such that, if the test statistic falls in that range, we reject the null hypothesis; otherwise, we do not reject it. The probability that the (standardized) test statistic falls in the rejection region when H0 is true is the PROBABILITY OF TYPE I ERROR or SIGNIFICANCE LEVEL OF THE TEST, denoted $\alpha$.

• Assume the null hypothesis is true ($\mu = 350$).

– Sample from the demand population, and build a statistic related to the hypothesized parameter (the sample mean $\bar{x}$).

– Pose the question: How probable is it to obtain a sample mean at least as extreme as the one observed from the sample, if H0 is correct?

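• Assume the null hypothesis is true ($\mu = 350$).

– If $\bar{x} = 450$: since $\bar{x}$ is much larger than 350, the mean $\mu$ is likely to be greater than 350. Reject the null hypothesis.

– If $\bar{x} = 355$: in this case the mean $\mu$ is not likely to be greater than 350. Do not reject the null hypothesis.

[Figure: sampling distribution of $\bar{x}$ centered at $\mu = 350$, with the values $\bar{x} = 355$ and $\bar{x} = 450$ marked.]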

Types of Errors

• Two types of errors may occur when deciding whether to reject H0 based on the statistic value.

– Type I error: Reject H0 when it is true.

– Type II error: Do not reject H0 when it is false.

• Example continued

– Type I error: Reject H0 ($\mu = 350$) in favor of H1 ($\mu > 350$) when the real value of $\mu$ is 350.

– Type II error: Believe that H0 is correct ($\mu = 350$) when the real value of $\mu$ is greater than 350.

Controlling the Probability of Committing a Type I Error

• Recall:

– H0: $\mu = 350$ and H1: $\mu > 350$.

– H0 is rejected if $\bar{x}$ is sufficiently large.

• Thus, a type I error is made if $\bar{x}$ exceeds the critical value when $\mu = 350$.

• By properly selecting the critical value we can limit the probability of committing a type I error to an acceptable level.

[Figure: sampling distribution of $\bar{x}$ centered at $\mu = 350$, with the critical value marked on the $\bar{x}$ axis.]

RESULTS OF A TEST OF HYPOTHESIS

• Tests are based on the following principle:

Fix $\alpha$, minimize $\beta$.

                    H0 is True                                  H0 is False

Reject H0           Type I error, P(Type I error) = $\alpha$    Correct decision

Do not reject H0    Correct decision                            Type II error, P(Type II error) = $\beta$

PROCEDURE OF STATISTICAL TEST

1. Determining H0 and HA.

2. Choosing the best test statistic.

3. Deciding the rejection region (Decision Rule).

4. Conclusion.

POWER OF THE TEST AND P-VALUE

• $1 - \beta$ = Power of the test

= P(Reject H0 | H0 is not true)

• p-value = Observed significance level = The smallest level of significance at which the null hypothesis can be rejected. Equivalently, it is the probability, computed assuming H0 is true, of obtaining a test statistic at least as extreme as the one observed.

EXAMPLE 1

• For each of the following assertions, state whether it is a legitimate statistical hypothesis and why.

a) H: $\sigma > 100$. Yes. It is an assertion about the value of a parameter.

b) H: $s \le 0.20$. No. The sample standard deviation is not a parameter.

c) H: $\tilde{X} = 45$. No. The sample median is not a parameter.

d) H: $\sigma_1 / \sigma_2 < 1$. Yes. It is about the values of two population standard deviations.

e) H: $\bar{X} - \bar{Y} = 5$. No. $\bar{X}$ and $\bar{Y}$ are statistics.

EXAMPLE 2

• To determine whether the pipe welds in a nuclear power plant meet specifications, a random sample of welds is selected, and tests are conducted on each weld in the sample. Weld strength is measured as the force required to break the weld. Suppose the specifications state that the mean strength of welds should exceed 100 lb/in²; the inspection team decides to test H0: $\mu = 100$ versus HA: $\mu > 100$. Explain why it might be preferable to use this HA rather than HA: $\mu < 100$.

Solution 2

• In this formulation, H0 states that the welds do not conform to specifications. This assertion will not be rejected unless there is strong evidence to the contrary. Thus the burden of proof is on those who wish to assert that the specification is satisfied. Using HA: $\mu < 100$ results in the welds being believed to be in conformance unless proved otherwise, so the burden of proof is on the non-conformance claim.

EXAMPLE 3

• Before agreeing to purchase a large order of polyethylene sheaths for a particular type of high pressure oil-filled submarine power cable, a company wants to see conclusive evidence that the true standard deviation of sheath thickness is less than 0.05 mm. What hypotheses should be tested, and why? In this context, what are the type I and type II errors?

Solution 3

$\sigma$ is the population standard deviation of sheath thickness. So, the appropriate hypotheses are

H0: $\sigma = 0.05$ mm

HA: $\sigma < 0.05$ mm

With this formulation the burden of proof is on the data to show that the requirement has been met.

Type I error: Conclude that $\sigma < 0.05$ mm when it is really equal to 0.05 mm.

Type II error: Conclude that $\sigma = 0.05$ mm when it is really less than 0.05 mm.

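HYPOTHESIS TEST FOR POPULATION MEAN, $\mu$

$\sigma$ known and $X \sim N(\mu, \sigma^2)$

Two-sided Test

H0: $\mu = \mu_0$

HA: $\mu \ne \mu_0$

Test Statistic:

$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

Rejecting Area:

• Reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$.

[Figure: standard normal curve with area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$ (do not reject H0) and area $\alpha/2$ in each tail (reject H0).]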

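HYPOTHESIS TEST FOR POPULATION MEAN, $\mu$

One-sided Tests

Test Statistic (both cases):

$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

1. H0: $\mu = \mu_0$

HA: $\mu > \mu_0$

Reject H0 if $z > z_{\alpha}$.

[Figure: standard normal curve with area $1 - \alpha$ to the left of $z_{\alpha}$ (do not reject H0) and area $\alpha$ to the right (reject H0).]

2. H0: $\mu = \mu_0$

HA: $\mu < \mu_0$

Reject H0 if $z < -z_{\alpha}$.

[Figure: standard normal curve with area $\alpha$ to the left of $-z_{\alpha}$ (reject H0) and area $1 - \alpha$ to the right (do not reject H0).]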
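The two-sided and one-sided decision rules for the z-test can be sketched in a few lines of Python. This is only an illustration: the function name `z_test_decision`, its `alternative` argument, and the numbers in the usage lines (including $\sigma = 20$ and $n = 25$) are assumptions made for the example, not values from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_test_decision(xbar, mu0, sigma, n, alpha, alternative="two-sided"):
    """Return (z, reject) for a z-test of H0: mu = mu0 with sigma known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    if alternative == "two-sided":
        # Reject if z < -z_{alpha/2} or z > z_{alpha/2}
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":
        # Reject if z > z_alpha
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":
        # Reject if z < -z_alpha
        reject = z < -std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, reject


# Hypothetical numbers: xbar = 355, mu0 = 350, sigma = 20, n = 25.
z, reject = z_test_decision(355, 350, 20, 25, alpha=0.05, alternative="greater")
print(round(z, 2), reject)
```

With these made-up inputs the statistic is 1.25, below $z_{0.05} \approx 1.645$, so H0 would not be rejected.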

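CALCULATION OF P-VALUE

• Determine the value of the test statistic,

$z_0 = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

• For One-Tailed Test:

p-value = $P(z > z_0)$ if HA: $\mu > \mu_0$

p-value = $P(z < z_0)$ if HA: $\mu < \mu_0$

• For Two-Tailed Test:

p = p-value = $2 \cdot P(z > z_0)$ for $z_0 > 0$

p = p-value = $2 \cdot P(z < z_0)$ for $z_0 < 0$

[Figure: standard normal curves showing the p-value as the tail area beyond $z_0$ for the one-tailed tests, and as two tail areas of $p/2$ each beyond $-z_0$ and $z_0$ for the two-tailed test.]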
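As a rough sketch, the three p-value cases can be computed from the standard normal CDF. The helper name `z_p_value` and the value $z_0 = 1.96$ in the usage line are illustrative assumptions.

```python
from statistics import NormalDist


def z_p_value(z0, alternative):
    """p-value of an observed z statistic under the standard normal."""
    phi = NormalDist().cdf
    if alternative == "greater":    # HA: mu > mu0
        return 1 - phi(z0)
    if alternative == "less":       # HA: mu < mu0
        return phi(z0)
    if alternative == "two-sided":  # HA: mu != mu0
        return 2 * (1 - phi(abs(z0)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Hypothetical observed statistic z0 = 1.96
print(round(z_p_value(1.96, "two-sided"), 3))
```

Using `abs(z0)` in the two-sided branch covers both the $z_0 > 0$ and $z_0 < 0$ cases with one expression.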

DECISION RULE BY USING P-VALUES

• REJECT H0 IF p-value < $\alpha$

• DO NOT REJECT H0 IF p-value $\ge \alpha$

EXAMPLE 4

• Do the contents of bottles of catsup have a net weight below an advertised threshold of 16 ounces?

• To test this, 25 bottles of catsup were selected. They gave a sample mean net weight of $\bar{X} = 15.9$. It is known that the standard deviation is $\sigma = 0.4$. We want to test this at significance levels 1% and 5%.

Solution 4

The z-score is:

$Z = \dfrac{15.9 - 16}{0.4 / \sqrt{25}} = -1.25$

The p-value is the probability of getting a score at least as extreme as this (in the direction of the alternative hypothesis), i.e.,

$P(Z < -1.25) = 0.1056$

Compare the p-value to the significance level. Since it is bigger than both 1% and 5%, we do not reject the null hypothesis.

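P-value for this one-tailed Test

• The p-value for this test is 0.1056.

• Thus, do not reject H0 at the 1% and 5% significance levels. There is not enough evidence to conclude that the contents of bottles of catsup have a net weight below the advertised threshold of 16 ounces.

[Figure: standard normal curve with the area 0.1056 to the left of z = -1.25 shaded as the p-value, compared with the significance levels.]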
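The arithmetic of Example 4 can be checked with a short script. It uses only the numbers given in the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example 4: n = 25 bottles, sample mean 15.9, sigma = 0.4, H0: mu = 16 vs HA: mu < 16
xbar, mu0, sigma, n = 15.9, 16.0, 0.4, 25

z = (xbar - mu0) / (sigma / sqrt(n))  # standardized test statistic
p_value = NormalDist().cdf(z)         # lower-tail area, since HA is mu < mu0

for alpha in (0.01, 0.05):
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"alpha = {alpha}: z = {z:.2f}, p-value = {p_value:.4f}, {decision}")
```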

Conclusions of a Test of Hypothesis

• If we reject the null hypothesis, we conclude that there is enough evidence to infer that the alternative hypothesis is true.

• If we do not reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true.

The alternative hypothesis is the more important one. It represents what we are investigating.

EXAMPLE 5

The melting point of each of 16 samples of a certain brand of hydrogenated vegetable oil was determined, resulting in $\bar{x} = 94.32$. Assume that the melting point is normally distributed with $\sigma = 1.20$.

a) Test whether the true average melting point of this brand of hydrogenated vegetable oil is 95, using $\alpha = 0.01$.

b) If a level 0.01 test is used, what is the probability of a Type II error when the true mean is 94?
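The slides leave Example 5 unsolved. One possible way to work both parts is sketched below, assuming a two-sided alternative HA: $\mu \ne 95$ (the example does not state the alternative explicitly). For part b), H0 is not rejected when $-z_{\alpha/2} \le z \le z_{\alpha/2}$, so under a true mean $\mu'$ the Type II error probability is $\Phi\left(z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right) - \Phi\left(-z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)$.

```python
from math import sqrt
from statistics import NormalDist

# Example 5: n = 16, sample mean 94.32, sigma = 1.20, H0: mu = 95, alpha = 0.01
# Assumption: two-sided alternative HA: mu != 95.
xbar, mu0, sigma, n, alpha = 94.32, 95.0, 1.20, 16, 0.01
std_normal = NormalDist()
se = sigma / sqrt(n)

# a) test statistic and two-sided decision
z = (xbar - mu0) / se
z_crit = std_normal.inv_cdf(1 - alpha / 2)
reject = abs(z) > z_crit

# b) probability of not rejecting H0 when the true mean is 94
mu_true = 94.0
shift = (mu0 - mu_true) / se
beta = std_normal.cdf(z_crit + shift) - std_normal.cdf(-z_crit + shift)

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
print(f"P(Type II error at mu = 94) = {beta:.4f}")
```

Under these assumptions the statistic is about -2.27, inside the non-rejection region, and the Type II error probability at $\mu = 94$ comes out near 0.22.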

EXAMPLE 6

At a certain production facility that assembles computer keyboards, the assembly time is known (from experience) to follow a normal distribution with a mean of 130 seconds and a standard deviation of 15 seconds. The production supervisor suspects that the average time to assemble the keyboards does not quite follow the specified value. To examine this problem, he measures the times for 100 assemblies and finds that the sample mean assembly time ($\bar{x}$) is 126.8 seconds. Can the supervisor conclude at the 5% level of significance that the mean assembly time of 130 seconds is incorrect?

Solution 6

• We want to prove that the time required to do the assembly is different from what experience dictates:

HA: $\mu \ne 130$

• The sample mean is $\bar{X} = 126.8$ and the standard deviation is $\sigma = 15$.

• The standardized test statistic value is:

$Z = \dfrac{126.8 - 130}{15 / \sqrt{100}} = -2.13$

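Two-Tail Hypothesis:

H0: $\mu = 130$

HA: $\mu \ne 130$

[Figure: standard normal curve for the test statistic z, centered at 0. Do not reject H0 in the central region $-z_{\alpha/2} \le z \le z_{\alpha/2}$, which has area $1 - \alpha$. Reject H0 in the two tails, $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$; their combined area $\alpha$ is the Type I error probability.]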

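Test Statistic:

$z = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} = \dfrac{126.8 - 130}{15 / \sqrt{100}} = -2.13$

[Figure: standard normal curve with the rejection region in both tails beyond $-z_{\alpha/2}$ and $z_{\alpha/2}$; the test statistic -2.13 lies in the lower rejection region.]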

Conclusion 6

• Since $-2.13 < -1.96$, the test statistic falls in the rejection region.

• Hence, we reject the null hypothesis that the time required to do the assembly is still 130 seconds. The evidence suggests that the mean assembly time now differs from 130 seconds.
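A quick numerical check of Example 6, using only the values given in the example. The two-sided p-value is an addition for illustration; the slides reach the conclusion through the rejection region.

```python
from math import sqrt
from statistics import NormalDist

# Example 6: n = 100, sample mean 126.8, sigma = 15, H0: mu = 130 vs HA: mu != 130
xbar, mu0, sigma, n, alpha = 126.8, 130.0, 15.0, 100, 0.05
std_normal = NormalDist()

z = (xbar - mu0) / (sigma / sqrt(n))
z_crit = std_normal.inv_cdf(1 - alpha / 2)    # about 1.96
p_value = 2 * (1 - std_normal.cdf(abs(z)))    # two-sided p-value
reject = abs(z) > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0" if reject else "do not reject H0")
```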

Test of Hypothesis for the Population Mean ($\sigma$ unknown)

For samples of size n drawn from a Normal Population, the test statistic

$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$

has a Student t-distribution with n-1 degrees of freedom.

EXAMPLE 7

• Five measurements of the tar content of a certain kind of cigarette yielded 14.5, 14.2, 14.4, 14.3 and 14.6 mg per cigarette. Show that the difference between the mean of this sample and the average tar content claimed by the manufacturer, $\mu = 14.0$, is significant at $\alpha = 0.05$.

$\bar{x} = 14.4$

$s^2 = \dfrac{\sum_{i=1}^{5} (x_i - \bar{x})^2}{n - 1} = \dfrac{(14.5 - 14.4)^2 + \dots + (14.6 - 14.4)^2}{5 - 1} = 0.025$

$s = 0.158$

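Solution 7

• H0: $\mu = 14.0$

HA: $\mu \ne 14.0$

Decision Rule: Reject H0 if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$.

$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}} = \dfrac{14.4 - 14.0}{0.158 / \sqrt{5}} = 5.66$

$t_{\alpha/2,\, n-1} = t_{0.025,\, 4} = 2.776$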

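Conclusion 7

• Reject H0 at $\alpha = 0.05$. The difference is significant.

[Figure: t distribution with 4 degrees of freedom; rejection regions of area 0.025 in each tail beyond -2.776 and 2.776; the observed t = 5.66 falls in the upper rejection region.]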

P-value of This Test

• p-value = $2 \cdot P(t > 5.66) = 2(0.0024) = 0.0048$

Since p-value = 0.0048 < $\alpha$ = 0.05, reject H0.

Minitab Output

T-Test of the Mean

Test of mu = 14.0000 vs mu not = 14.0000

Variable N Mean StDev SE Mean T P-Value

C1 5 14.4000 0.1581 0.0707 5.66 0.0048
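The Minitab figures for Example 7 can be reproduced approximately with the Python standard library. The standard library has no Student t distribution, so this sketch takes the critical value $t_{0.025,4} = 2.776$ from the solution rather than computing it, and it does not recompute the p-value. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Example 7: tar content measurements, H0: mu = 14.0 vs HA: mu != 14.0
data = [14.5, 14.2, 14.4, 14.3, 14.6]
mu0 = 14.0
t_crit = 2.776  # t_{0.025, 4}, taken from a t table as in the solution

n = len(data)
xbar = mean(data)
s = stdev(data)          # sample standard deviation (divides by n - 1)
se = s / sqrt(n)         # "SE Mean" in the Minitab output
t = (xbar - mu0) / se

reject = abs(t) > t_crit
ci_low, ci_high = xbar - t_crit * se, xbar + t_crit * se  # 95% confidence interval

print(f"mean = {xbar:.4f}, s = {s:.4f}, SE = {se:.4f}, t = {t:.2f}")
print(f"95% CI: ({ci_low:.4f}, {ci_high:.4f}), reject H0: {reject}")
```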

CONCLUSION USING THE CONFIDENCE INTERVALS

MINITAB OUTPUT:

Confidence Intervals

Variable N Mean StDev SE Mean 95.0 % C.I.

C1 5 14.4000 0.1581 0.0707 ( 14.2036, 14.5964)

• Since $\mu = 14$ is not in the 95% confidence interval, reject H0.

G. Baker, Department of Statistics, University of South Carolina

Internal Combustion Engine

• The nominal power produced by a student-designed internal combustion engine should be 100 hp. The student team that designed the engine conducted 10 tests to determine the actual power. The data follow:

98, 101, 102, 97, 101, 98, 100, 92, 98, 100

Assume data came from a normal distribution.

Summary Data:

Column    n     Mean    Std. Dev.

hp        10    98.7    2.9

What is the probability of getting a sample mean of 98.7 hp or less if the true mean is 100 hp?

$P(\bar{y} \le 98.7 \mid \mu = 100) = P\left(t_{df=9} \le \dfrac{98.7 - 100}{2.9 / \sqrt{10}}\right) = P(t_{df=9} \le -1.418) = 0.0949$

[Figure: density of the t distribution with df = 9, horizontal axis from -4 to 4.]

What did we assume when doing this analysis?

Are you comfortable with the assumption?

EXAMPLE 8

The amount of shaft wear (.0001 in.) after a fixed mileage was determined for each of n = 8 internal combustion engines having copper lead as a bearing material, resulting in $\bar{x} = 3.72$ and $s = 1.25$. Assuming that the distribution of shaft wear is normal with mean $\mu$, test whether the mean shaft wear is greater than 3.5 at the 5% significance level.
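The slides do not work Example 8 out. A minimal sketch of one way to do it follows, assuming H0: $\mu = 3.5$ against HA: $\mu > 3.5$ and taking the upper-tail critical value $t_{0.05,7} = 1.895$ from a t table, since the Python standard library has no t distribution.

```python
from math import sqrt

# Example 8: n = 8, sample mean 3.72, s = 1.25
# Assumed hypotheses: H0: mu = 3.5 vs HA: mu > 3.5, alpha = 0.05
n, xbar, s, mu0 = 8, 3.72, 1.25, 3.5
t_crit = 1.895  # t_{0.05, 7} from a t table (upper-tail critical value)

t = (xbar - mu0) / (s / sqrt(n))
reject = t > t_crit

print(f"t = {t:.3f}, critical value = {t_crit}, reject H0: {reject}")
```

Under these assumptions the statistic is roughly 0.50, well below the critical value, so the data would not support the claim that mean shaft wear exceeds 3.5.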

EXAMPLE 9

To obtain information on the corrosion-resistance properties of a certain type of steel conduit, 25 specimens are buried in soil for a 2-year period. The maximum penetration (in mils) for each specimen is then measured, yielding a sample average penetration of $\bar{x} = 52.7$ and a sample standard deviation of s = 4.8. The conduits were manufactured with the specification that true average penetration be at most 50 mils. They will be used unless it can be demonstrated conclusively that the specification has not been met. What would you conclude?

TESTING HYPOTHESIS ABOUT POPULATION PROPORTION, p

• ASSUMPTIONS:

1. The experiment is binomial.

2. The sample size is large enough.

x: The number of successes

The sample proportion is

$\hat{p} = \dfrac{x}{n} \sim N\left(p, \dfrac{p(1-p)}{n}\right)$

approximately for large n ($np \ge 5$ and $n(1-p) \ge 5$).

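HYPOTHESIS TEST FOR p

Two-sided Test

H0: $p = p_0$

HA: $p \ne p_0$

Test Statistic:

$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$, where $np_0 \ge 5$ and $n(1 - p_0) \ge 5$

Rejecting Area:

• Reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$.

[Figure: standard normal curve with area $\alpha/2$ in each tail beyond $-z_{\alpha/2}$ and $z_{\alpha/2}$ (reject H0) and the central region between them (do not reject H0).]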

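HYPOTHESIS TEST FOR p

One-sided Tests

Test Statistic (both cases):

$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$

1. H0: $p = p_0$

HA: $p > p_0$

Reject H0 if $z > z_{\alpha}$.

[Figure: standard normal curve; do not reject H0 to the left of $z_{\alpha}$, reject H0 in the upper tail of area $\alpha$.]

2. H0: $p = p_0$

HA: $p < p_0$

Reject H0 if $z < -z_{\alpha}$.

[Figure: standard normal curve; reject H0 in the lower tail of area $\alpha$ below $-z_{\alpha}$, do not reject H0 to the right.]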

EXAMPLE 10

• Mom’s Home Cokin’ claims that 70% of the customers are able to dine for less than $5. Mom wishes to test this claim at the 92% level of confidence. A random sample of 110 patrons revealed that 66 paid less than $5 for lunch.

H0: $p = 0.70$

HA: $p \ne 0.70$

Solution 10

• x = 66, n = 110 and $p_0$ = 0.70

$\alpha = 0.08$, $z_{\alpha/2} = z_{0.04} = 1.75$

• Test Statistic:

$\hat{p} = \dfrac{x}{n} = \dfrac{66}{110} = 0.6$

$z = \dfrac{0.6 - 0.7}{\sqrt{(0.7)(0.3)/110}} = -2.289$

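Conclusion 10

• DECISION RULE:

Reject H0 if z < -1.75 or z > 1.75.

• CONCLUSION: Reject H0 at $\alpha = 0.08$. Mom’s claim is not true.

[Figure: standard normal curve with rejection regions of area $\alpha/2$ in each tail beyond -1.75 and 1.75; the test statistic -2.289 falls in the lower rejection region.]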

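P-Value

• p-value = $2 \cdot P(z < -2.289) = 2(0.011) = 0.022$

The smallest value of $\alpha$ at which H0 can be rejected is 0.022.

Since p-value = 0.022 < $\alpha$ = 0.08, reject H0.

[Figure: standard normal curve with tail areas of 0.011 below -2.289 and above 2.289.]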
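The test statistic and p-value of Example 10 can be verified numerically. The sketch uses only the numbers from the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example 10: x = 66 successes out of n = 110, H0: p = 0.70 vs HA: p != 0.70, alpha = 0.08
x, n, p0, alpha = 66, 110, 0.70, 0.08
std_normal = NormalDist()

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error uses the hypothesized p0
p_value = 2 * std_normal.cdf(-abs(z))        # two-sided p-value

print(f"p_hat = {p_hat:.2f}, z = {z:.3f}, p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```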

CONFIDENCE INTERVAL APPROACH

• Find the 92% CI for p.

$\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} = 0.6 \pm 1.75 \sqrt{\dfrac{(0.6)(0.4)}{110}}$

• 92% CI for p: $0.518 \le p \le 0.682$

• Hypothesis should be two-sided to use the confidence interval approach.

• Since p = 0.7 is not in the above interval, reject H0. Mom has underestimated the cost of her meal.
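The 92% interval can be recomputed as a check. Note that the interval uses $\hat{p}$ in the standard error, whereas the test statistic uses $p_0$. The script computes $z_{0.04}$ directly instead of using the rounded table value 1.75, so the endpoints agree with the slide to three decimals.

```python
from math import sqrt
from statistics import NormalDist

# Example 10, confidence interval approach: x = 66, n = 110, 92% confidence
x, n, conf_level, p0 = 66, 110, 0.92, 0.70

p_hat = x / n
z_half = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # z_{0.04}, about 1.75
half_width = z_half * sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - half_width, p_hat + half_width

print(f"92% CI for p: ({low:.3f}, {high:.3f})")
print("p0 inside interval:", low <= p0 <= high)
```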

EXAMPLE 11

• Scientists think that robots will play a crucial role in factories in the next several decades. Suppose that in an experiment to determine whether the use of robots to weave computer cables is feasible, a robot was used to assemble 500 cables. The cables were examined and there were 15 defectives. If human assemblers have a defect rate of 0.035, does this data support the hypothesis that the proportion of defectives is lower for robots than humans? Use a 1% significance level.

Solution 11

H0: $p = 0.035$

HA: $p < 0.035$

It is given that x = 15 and n = 500. Thus,

X ~ Bin(n = 500, p = 0.035)

Since np = 17.5 > 5 and n(1-p) = 482.5 > 5, we can use the normal approximation to the binomial.

$\hat{p} = \dfrac{x}{n} = \dfrac{15}{500} = 0.03$

Solution 11 (continued)

• Test statistic:

$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \dfrac{0.03 - 0.035}{\sqrt{\dfrac{0.035(1 - 0.035)}{500}}} = -0.6084$

• Rejection region:

Reject H0 if $z < -z_{\alpha} = -z_{0.01} = -2.33$.

• Conclusion: Since $z = -0.6084 > -z_{0.01} = -2.33$, do not reject H0. Robots do not demonstrate their superiority.
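A short check of Example 11, using the values from the solution. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example 11: x = 15 defectives out of n = 500, H0: p = 0.035 vs HA: p < 0.035, alpha = 0.01
x, n, p0, alpha = 15, 500, 0.035, 0.01

# Normal approximation conditions: np0 >= 5 and n(1 - p0) >= 5
conditions_ok = n * p0 >= 5 and n * (1 - p0) >= 5

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z_crit = -NormalDist().inv_cdf(1 - alpha)   # -z_{0.01}, about -2.33
reject = z < z_crit

print(f"conditions ok: {conditions_ok}, z = {z:.4f}, critical value = {z_crit:.2f}")
print("reject H0" if reject else "do not reject H0")
```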

Example 12 (Predicting the winner on election day)

– Voters are asked by a certain network to participate in an exit poll in order to predict the winner on election day.

– Based on the data (where 1 = Democrat and 2 = Republican), can the network conclude that the Republican candidate will win the state college vote?

Solution 12

• The problem objective is to describe the population of votes in the state.

– The data are nominal.

– The parameter to be tested is ‘p’.

– Success is defined as “Vote Republican”.

– The hypotheses are:

H0: p = .5

H1: p > .5 (more than 50% vote Republican)

Solving by hand

The rejection region is $z > z_{\alpha} = z_{.05} = 1.645$.

From the file we count 407 successes. The number of voters participating is 765.

The sample proportion is

$\hat{p} = 407 / 765 = .532$

The value of the test statistic is

$Z = \dfrac{\hat{p} - p}{\sqrt{p(1 - p)/n}} = \dfrac{.532 - .5}{\sqrt{.5(1 - .5)/765}} = 1.77$

The p-value is P(Z > 1.77) = .0382
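The hand calculation for Example 12 can be reproduced as follows, using the counts quoted in the solution. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example 12: 407 Republican votes among 765 sampled voters, H0: p = 0.5 vs H1: p > 0.5
successes, n, p0, alpha = 407, 765, 0.5, 0.05
std_normal = NormalDist()

p_hat = successes / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z_crit = std_normal.inv_cdf(1 - alpha)   # about 1.645
p_value = 1 - std_normal.cdf(z)          # upper-tail p-value

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0" if z > z_crit else "do not reject H0")
```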

Testing the Proportion

z-Test: Proportion

Sample Proportion          0.532
Observations               765
Hypothesized Proportion    0.5
z Stat                     1.77
P(Z<=z) one-tail           0.0382
z Critical one-tail        1.6449
P(Z<=z) two-tail           0.0764
z Critical two-tail        1.96

The one-tail p-value 0.0382 is less than $\alpha$ = 0.05. There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At the 5% significance level we can conclude that more than 50% voted Republican.

Simple formula for difference in proportions

$n = \dfrac{2\,\bar{p}(1 - \bar{p})\,(Z_{\beta} + Z_{\alpha/2})^2}{(p_1 - p_2)^2}$

– $n$: Sample size in each group (assumes equal sized groups)

– $Z_{\beta}$: Represents the desired power (typically .84 for 80% power).

– $Z_{\alpha/2}$: Represents the desired level of statistical significance (typically 1.96).

– $\bar{p}(1 - \bar{p})$: A measure of variability (similar to standard deviation)

– $p_1 - p_2$: Effect Size (the difference in proportions)

Simple formula for difference in means

$n = \dfrac{2\,\sigma^2\,(Z_{\beta} + Z_{\alpha/2})^2}{\text{difference}^2}$

– $n$: Sample size in each group (assumes equal sized groups)

– $Z_{\beta}$: Represents the desired power (typically .84 for 80% power).

– $Z_{\alpha/2}$: Represents the desired level of statistical significance (typically 1.96).

– $\sigma$: Standard deviation of the outcome variable

– difference: Effect Size (the difference in means)
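Both sample size formulas are easy to evaluate in code. The sketch below is illustrative only: the function names, the default $\alpha = 0.05$ and 80% power, and the inputs in the usage lines are assumptions, and $\bar{p}$ is taken here to be the average of the two proportions, which the slides do not state explicitly. Results are rounded up to whole subjects.

```python
from math import ceil
from statistics import NormalDist


def _z_sum_squared(alpha, power):
    """(Z_beta + Z_{alpha/2})^2 for a two-sided test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)            # about 0.84 for 80% power
    return (z_beta + z_alpha) ** 2


def n_per_group_means(sigma, difference, alpha=0.05, power=0.80):
    """Sample size per group for detecting a difference in means."""
    return ceil(2 * sigma**2 * _z_sum_squared(alpha, power) / difference**2)


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Sample size per group for detecting a difference in proportions."""
    p_bar = (p1 + p2) / 2   # assumption: p-bar is the average of p1 and p2
    return ceil(2 * p_bar * (1 - p_bar) * _z_sum_squared(alpha, power) / (p1 - p2) ** 2)


# Hypothetical inputs
print(n_per_group_means(sigma=10, difference=5))
print(n_per_group_proportions(p1=0.5, p2=0.6))
```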

Sample size calculators on the web…

• http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize

• http://calculators.stat.ucla.edu

• http://hedwig.mgh.harvard.edu/sample_size/size.html

Date post:	03-Jan-2016
Upload:	nurgazy-nazhimidinov