Date post: 19-Dec-2015

EPIDEMIOLOGY AND BIOSTATISTICS DEPT.

2011

Estimating Population Value with

Hypothesis Testing


LULU E. BUDIMAN

Introduction

• Not every member of a population can be examined, so we use data from a sample, taken from the same population, to estimate some measure of the population itself, such as its mean.

• The sample provides us with the best available estimate of the exact 'truth' about the population. The method of sampling depends on the data available, but the ideal method is random sampling, in which every member of the population has an equal chance of being selected.


• We estimate limits within which we expect the 'truth' about the population to lie, and state how confident we are about this estimation.

• There are therefore two types of estimate of a population parameter:
– Point estimate: one particular value
– Interval estimate: an interval centred on the point estimate.


Estimating population

• Point estimate is a single number used to estimate a population parameter. The best point estimate of the population mean is the sample mean.

• The accuracy with which the sample mean estimates the population mean is dependent upon how well the sample represents the population.

• An interval estimate is a range of values used to estimate a population parameter.


Hypothesis Testing

• Statistics to test hypotheses take the following general form


• Hypothesis testing is generally used when some comparison is to be made.


Hypothesis testing is the use of statistics to judge whether the observed data are consistent with a given hypothesis.

A hypothesis, in statistics, is a claim or statement about a property of a population.


The usual process of hypothesis testing consists of four steps.

• Formulate the null hypothesis (commonly, that the observations are the result of pure chance) and the alternative hypothesis (commonly, that the observations show a real effect combined with a component of chance variation).

• Identify a test statistic that can be used to assess the truth of the null hypothesis.

• Compute the P-value, which is the probability that a test statistic at least as extreme as the one observed would be obtained, assuming that the null hypothesis were true. The smaller the P-value, the stronger the evidence against the null hypothesis.

• Compare the P-value to an acceptable significance level (sometimes called the alpha value, $\alpha$). If $P \le \alpha$, the observed effect is statistically significant, the null hypothesis is rejected, and the alternative hypothesis is accepted.
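The four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test is a two-sided z-test for a mean with known standard deviation, the function name `two_sided_z_test` is made up for this sketch, and the numbers passed to it are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Four-step test of H0: mu = mu0 against H1: mu != mu0 (sigma known)."""
    # Step 1: H0 and H1 are fixed by the choice of mu0 and of a two-sided test.
    # Step 2: test statistic = distance of the sample mean from mu0,
    #         measured in standard errors.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 3: P-value = probability, under H0, of a statistic at least
    #         as extreme as the one observed (both tails).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: compare the P-value with the significance level alpha.
    return z, p_value, p_value <= alpha


# Hypothetical numbers, chosen only for illustration.
z, p, reject = two_sided_z_test(sample_mean=52.0, mu0=50.0, sigma=8.0, n=64)
print(f"z = {z:.2f}, P = {p:.4f}, reject H0: {reject}")
```

Returning the statistic, the P-value and the decision together keeps the last two steps visible instead of hiding them behind a single yes/no answer.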


[Figure: Treatment A and Treatment B, each with the outcomes Survive and Not Survive]

Example:

• Suppose we give a new cancer treatment to a group of patients.

• Their survival rate, for example, differs from the survival rate of those who do not receive the new treatment.

• What we are testing, then, is whether the sample of patients who receive the new treatment comes from the population we already know about (cancer patients without the treatment).

• Hypotheses? H0? H1?


• A hypothesis states that the parameter (mean, proportion, relative risk, coefficient of correlation) in a study population, which can be estimated only by observing a sample, is equal to a specified value.

• If the estimated value for the parameter turns out to be close enough to the hypothesized value, we can accept (more precisely, fail to reject) the hypothesis.

• If not, we may have to reject the hypothesis.


• A significance test estimates how likely an observed result (e.g. a difference between two groups) would be if chance alone were at work.

• In other words, a significance test is used to find out whether a study result which is observed in a sample can be considered as a result which exists in the population from which the sample was drawn.


Example:

• We are investigating the medical risks associated with a certain occupation. We take a random sample of 20 men aged 30-39, and their mean systolic blood pressure is found to be 141.4 mmHg.

• Suppose past experience has told us that, in the population at large, the mean systolic blood pressure for men of this age group is $\mu = 133.2$ mmHg with standard deviation $\sigma = 15.1$ mmHg.

Does the evidence of our sample indicate an increased blood pressure associated with this occupation?


• Suppose for the moment, we propose a hypothesis, that there is no increase in blood pressure in this occupation, and the sample of 20 men can be regarded as a random sample from the whole population of men aged 30-39 years.

• Then we know (from past experience) that the means of samples of 20 will be distributed normally about a mean of $\mu = 133.2$ mmHg, with standard deviation $\sigma/\sqrt{n} = 15.1/\sqrt{20} = 3.38$ (the standard error of the mean).


• From what we know of the normal distribution, sample means outside the range $133.2 \pm 1.96 \times 3.38$, i.e. outside 126.6 to 139.8, would occur in only 5% of samples of this size, i.e. with probability 0.05.

• Our sample mean lies outside this range because it is 141.4 mmHg.

• What can we conclude ?


1. Our hypothesis that there is no increase in systolic blood pressure in this occupation is correct, and our sample mean was large purely through an unfortunate sampling fluke. That is, a result as extreme as our sample mean, which has a probability of less than 0.05, just happened to occur.

2. Our hypothesis that there is no increase in systolic blood pressure in this occupation is wrong.

We cannot be sure which of these alternatives is correct, but because the probability of a fluke such as (1) is so small, we are obliged to conclude (2).

Thus we conclude that it is likely that there is an increase in systolic blood pressure among men in this occupation. The probability P of obtaining a sample mean this extreme when there is in fact no increase is less than 0.05; we write this as P < 0.05. This type of argument is called a significance test.
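The arithmetic of the blood pressure example can be checked with a short Python sketch. All figures come from the example itself; the variable names are chosen only for this illustration.

```python
from math import sqrt

mu0, sigma = 133.2, 15.1      # population mean and SD (mmHg), from the example
n, sample_mean = 20, 141.4    # sample size and observed sample mean (mmHg)

se = sigma / sqrt(n)          # standard error of the mean
lower = mu0 - 1.96 * se       # limits that contain 95% of sample means
upper = mu0 + 1.96 * se       # when the hypothesis of no increase holds
z = (sample_mean - mu0) / se  # sample mean expressed in standard errors
outside = not (lower <= sample_mean <= upper)

print(f"SE = {se:.2f} mmHg, range = {lower:.1f} to {upper:.1f} mmHg")
print(f"z = {z:.2f}, sample mean outside the range: {outside}")
```

Checking whether the sample mean falls outside the range and checking whether |z| exceeds 1.96 are the same test written in two ways.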


TEST STATISTIC

PROVED !!


From the formula for the 95% confidence interval for $\mu$, $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$, or equivalently: if

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

is numerically greater than 1.96, we say the difference between $\bar{x}$ and $\mu$ is significant at the 5% level and we write P < 0.05.

If $|Z|$ is greater than 2.58, the difference is significant at the 1% level and we write P < 0.01.


From the formula for the 95% confidence interval for the population proportion $\pi$, $\pi = p \pm 1.96\sqrt{\pi(1-\pi)/n}$, or equivalently, with

$$Z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$$

If $|Z| < 1.96$, we say the difference between $p$ and $\pi$ is not significant at the 5% level and we write P > 0.05.

If $|Z| > 1.96$, the difference is significant at the 5% level and we write P < 0.05.

If $|Z| > 2.58$, the difference is highly significant (P < 0.01).


Estimation Process

[Figure: a random sample with mean $\bar{x} = 50$ is drawn from a population whose mean $\mu$ is unknown. Conclusion: "I am 95% confident that $\mu$ is between 40 and 60."]


Interpretation of a Confidence Interval

• $(1 - \alpha) \times 100\%$ is:
– the percentage of confidence intervals, constructed from different samples, that will actually contain the population mean;
– the probability that you obtain a confidence interval that contains the population mean.

• Often it is more useful to quote two limits between which the parameter is expected to lie, together with the probability of it lying in that range.

• The limits are called the confidence limits and the interval between them the confidence interval.

• e.g. We are 95% confident that the mean male height lies between 158 cm and 175 cm.


• The width of the confidence interval depends on three sensible factors:
– the degree of confidence we wish to have in it, i.e. the chance of it including the 'truth', e.g. 95%;
– the size of the sample, n;
– the amount of variation among the members of the sample, i.e. its standard deviation, s.
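How the three factors act on the width can be seen in a small Python sketch. It assumes a normal-based interval for a mean, whose width is 2 × z × s/√n; the function name `ci_width` and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(s, n, confidence=0.95):
    """Width of a normal-based confidence interval for a mean: 2 * z * s / sqrt(n)."""
    # z is the standard normal value leaving (1 - confidence) / 2 in each tail,
    # e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * s / sqrt(n)


# Hypothetical values, for illustration only.
base = ci_width(s=10.0, n=25)                         # reference case
wider_conf = ci_width(s=10.0, n=25, confidence=0.99)  # more confidence: wider
larger_n = ci_width(s=10.0, n=100)                    # larger sample: narrower
more_spread = ci_width(s=20.0, n=25)                  # more variation: wider

print(f"{base:.2f} {wider_conf:.2f} {larger_n:.2f} {more_spread:.2f}")
```

Because the width shrinks with the square root of n, quadrupling the sample size only halves it.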


P-value

The P-value is the probability of observing a sample statistic at least as extreme as the test statistic, assuming the null hypothesis is true.

Interpret the results: if the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
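As an illustration, the P-value for a test statistic that follows the standard normal distribution can be computed as below. This is a sketch, not a general recipe: the function name `p_value_from_z`, its `tail` argument and the observed value 2.5 are assumptions made for the example.

```python
from statistics import NormalDist


def p_value_from_z(z, tail="two-sided"):
    """P-value for a standard normal test statistic z."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two-sided":
        # Both tails: values at least as far from 0 as |z|.
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "upper":
        # Upper tail only: values at least as large as z.
        return 1 - std_normal.cdf(z)
    if tail == "lower":
        # Lower tail only: values at least as small as z.
        return std_normal.cdf(z)
    raise ValueError("tail must be 'two-sided', 'upper' or 'lower'")


alpha = 0.05                 # significance level
p = p_value_from_z(2.5)      # hypothetical observed statistic
print(f"P = {p:.4f}, reject H0: {p < alpha}")
```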


Conclusions in hypothesis testing

• We always test the null hypothesis, so the conclusion is one of:
– Reject H0
– Fail to reject H0


Hypothesis test of a population mean, $\mu$

The variable X is normally distributed in the population with mean $\mu$ and variance $\sigma^2$. Two situations are considered: (1) $\sigma^2$ known (from previous experience); (2) $\sigma^2$ unknown.

1. $\sigma^2$ known

The same approach applies to a test of any parameter that is estimated by a statistic whose sampling distribution is normal. The procedure is:

a. Specify H0: $\mu = \mu_0$, where $\mu_0$ is a particular value.
b. Specify H1: $\mu \neq \mu_0$, say.
c. Select a random sample of observations, $x_1, x_2, \ldots, x_n$.
d. Compute from the sample $\bar{x} = \sum x_i / n$.
e. Consider the test statistic

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

f. Determine the critical region from tables of the standard normal distribution (see Table 1). Since the specification of H1 has no direction, the critical region consists of both tails of the distribution.

Thus, for a two-tailed test at the $2\alpha$ level of significance, reject H0 if $|Z| > Z(\alpha)$, i.e. if $Z > Z(\alpha)$ or $Z < -Z(\alpha)$; then $P < 2\alpha$.

Do not reject H0 otherwise; then $P > 2\alpha$. In particular, if $2\alpha = 0.05$, $Z(\alpha) = 1.96$; if $2\alpha = 0.01$, $Z(\alpha) = 2.58$.
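The critical values 1.96 and 2.58 need not be read from a table; they can be recovered from the inverse of the standard normal distribution function. A minimal sketch using Python's standard library follows (the helper name `z_critical` is made up for this example).

```python
from statistics import NormalDist


def z_critical(two_sided_level):
    """Upper critical value for a two-tailed test at the given significance level."""
    tail_area = two_sided_level / 2  # area left in each tail
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.05, 0.01):
    print(f"two-tailed level {level}: critical value = {z_critical(level):.2f}")
```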


Standard Normal Distribution Table


2. $\sigma^2$ unknown

a. Consider the test statistic

$$T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

where $s$ is the sample estimator of $\sigma$. T has a t-distribution on $n - 1$ degrees of freedom.

b. Determine the critical region from tables of the t-distribution (Table 2). For a two-tailed test at the $2\alpha$ level of significance, reject H0 if $|T| > t_{n-1}(\alpha)$, i.e. if $T > t_{n-1}(\alpha)$ or $T < -t_{n-1}(\alpha)$; then $P < 2\alpha$. Do not reject H0 otherwise; then $P > 2\alpha$.


Example :

The sleeping times from nine observations are 25, 31, 24, 28, 29, 30, 31, 33 and 35 min. From these we wish to test, at $\alpha = 0.05$,

H0: $\mu = 26$ versus H1: $\mu \neq 26$.

Suppose that the population variance is unknown and must be estimated from the sample. We assume the nine observations are from a normal population.


• From these data, we compute $\bar{x} = 29.56$, $s^2 = 12.53$ and $s = 3.539$.

• From Table 2 (appendix), $t_{0.975}(8) = 2.306$, so we reject H0 if the computed $|T|$ exceeds 2.306.

• The computed T is

$$T = \frac{29.56 - 26}{3.539/\sqrt{9}} = 3.02$$

which exceeds 2.306; thus we reject H0 at the 0.05 significance level.
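The sleeping-time example can be reproduced with Python's standard library. The data, the hypothesised mean and the tabulated critical value 2.306 are taken from the example; the critical value is entered by hand because the standard library has no t-distribution. Working from unrounded intermediate values gives T of about 3.01 rather than 3.02, and the conclusion is unchanged.

```python
from math import sqrt
from statistics import mean, stdev, variance

times = [25, 31, 24, 28, 29, 30, 31, 33, 35]  # sleeping times (min), from the example
mu0 = 26          # hypothesised population mean under H0
t_crit = 2.306    # tabulated t value for 8 degrees of freedom, as quoted in the example

n = len(times)
x_bar = mean(times)      # point estimate of the population mean
s2 = variance(times)     # sample variance (divisor n - 1)
s = stdev(times)         # sample standard deviation
T = (x_bar - mu0) / (s / sqrt(n))   # test statistic on n - 1 degrees of freedom
reject = abs(T) > t_crit            # two-tailed decision at the 0.05 level

print(f"mean = {x_bar:.2f}, s^2 = {s2:.2f}, s = {s:.3f}")
print(f"T = {T:.2f}, reject H0: {reject}")
```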


Hypothesis test of a population proportion, $\pi$

The procedure is to:

a. Specify H0: $\pi = \pi_0$, where $\pi_0$ is a particular value.
b. Specify H1: $\pi \neq \pi_0$, say.
c. Select a random sample of n individuals and determine the number, x, of them with the characteristic.
d. Compute from the sample $p = x / n$.
e. Consider the test statistic

$$Z = \frac{p - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}$$

This test statistic has a standard normal distribution.

f. Determine the critical region from tables of the standard normal distribution. For a two-tailed test at the $2\alpha$ level of significance, reject H0 if $|Z| > Z(\alpha)$, i.e. if $Z > Z(\alpha)$ or $Z < -Z(\alpha)$; then $P < 2\alpha$. Do not reject H0 otherwise; then $P > 2\alpha$.
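Steps a to f for a proportion can be sketched as a small Python function. The name `proportion_z_test`, its parameters and the counts used in the call are hypothetical; the normal distribution is used for the test statistic, as in the procedure.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x, n, pi0, two_sided_level=0.05):
    """Two-tailed test of H0: pi = pi0 from x individuals with the characteristic out of n."""
    p_hat = x / n                                     # step d: sample proportion
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)     # step e: test statistic
    z_crit = NormalDist().inv_cdf(1 - two_sided_level / 2)  # step f: critical value
    return p_hat, z, abs(z) > z_crit


# Hypothetical counts, for illustration only.
p_hat, z, reject = proportion_z_test(x=60, n=200, pi0=0.25)
print(f"p = {p_hat:.2f}, Z = {z:.2f}, reject H0: {reject}")
```

Note that the standard error uses the hypothesised proportion, not the sample proportion, because the statistic is evaluated under H0.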


PROBLEMS

1. The mean level of prothrombin in the normal population is known to be 20 mg/100 ml of plasma and standard deviation is 4 mg/100 ml. A sample of 40 patients showing vitamin K deficiency has a mean prothrombin level of 18.5 mg/100 ml. How reasonable is it to conclude that the true mean for patients with vitamin K deficiency is the same as that for the normal population ?

2. The height of adults living in a suburban area of a large city has a mean of 160 cm, with a standard deviation of 7.5 cm. In a sample of 178 adults living in the inner city area, the mean height is found to be 156 cm. Assuming the same standard deviation for the two groups, are the mean heights significantly different?


3. A program to stop smoking expects to obtain a 75% success rate. The observed number of definitive cessations in a group of 100 adults attending the program is 80. Is this sufficient evidence to conclude that the success rate has increased?

4. From population mortality data, suppose that 4% of males aged 65 die within one year. If it is found that 60 such males in a group of 1000 die within a year, is this evidence of an increase in mortality in this sample?
