
Ebd1 lecture 6&7 2010

Date post: 05-Dec-2014
Upload: reko-kemo
Page 1: Ebd1 lecture 6&7 2010


General Studies

Community Dentistry 1

Lecture 6

Dr Nizam Abdullah

Statistical Inference

© The University of Adelaide, School of Dentistry

Contents

Review of descriptive statistics

The normal curve

Introduction to inferential statistics


Descriptive statistics

Central tendency
• Mean
• Median (50th Percentile)
• Mode

Dispersion
• Standard deviation (SD) / Variance
• Inter-quartile range (IQR) (3rd quartile – 1st quartile)
• Range (Maximum – Minimum)


Distribution of a variable

Another important aspect of the description of a variable is the shape of its distribution, which tells you the frequency of values from different ranges of the variable.

Typically, a researcher is interested in how well the distribution can be approximated by the normal distribution.

The normal distribution can be used to determine how far the sample is likely to be off from the overall population, i.e. how big a ‘margin of error’ there is likely to be.

Simple descriptive statistics can provide some information relevant to this issue.


Distribution of a variable (cont.)

A variable is said to be a normally distributed variable or to have a normal distribution if its distribution has the shape of a normal curve - the normal curve is a kind of bell-shaped curve.

A normal distribution (and hence a normal curve) is completely determined by its mean and standard deviation - the mean and standard deviation are called the parameters of the normal curve.

The normal curve is symmetric and centered about the mean.

The standard deviation determines the spread of the curve. The larger the standard deviation, the flatter and more spread out the curve will be.


Normal curve (cont.)

The mean, median, and mode all have the same value.


Different shapes of the Normal curve

Standard deviation changes the relative width of the distribution; the larger the standard deviation, the wider the curve.


Properties of normal distribution

[Figure: Age distribution of Village A; histogram by 5-year age group (0-4 to 80+), vertical axis 0 to 45, with the 50%/50%, 68%, 95% and 99.7% areas marked]

• Bell-shaped curve
• Symmetrical about its mean (mirror image to each side)
• Mean and median are equal.
• One side of the mean is 50% of the area.
• The area between mean-1SD and mean+1SD is 68% (Mean±1SD = 30, 50).
• The area between mean-2SD and mean+2SD is 95% (Mean±2SD = 20, 60).
• The area between mean-3SD and mean+3SD is 99.7% (Mean±3SD = 10, 70).

e.g. (Age) Mean = 40, SD = 10. Therefore, Mean±1SD = 30, 50. Between 30 yr and 50 yr old, there will be 68% of the group.


Normal curve: 68-95-99.7 rule

68% of the observations fall within one standard deviation of the mean (µ − σ to µ + σ)

95% of the observations fall within two standard deviations of the mean (µ − 2σ to µ + 2σ)

99.7% of the observations fall within three standard deviations of the mean (µ − 3σ to µ + 3σ)

[Figure: three normal curves centred on µ, shaded between ±σ (68%), ±2σ (95%) and ±3σ (99.7%)]
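The 68-95-99.7 figures can be checked numerically. The following is a minimal sketch in Python (the function name `area_within` is illustrative and not part of the lecture); it uses the error function to get the area under a normal curve within k standard deviations of the mean, and applies it to the age example (mean 40, SD 10) from the earlier slide.

```python
import math

def area_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

mean, sd = 40, 10  # the age example used in the lecture
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"mean ± {k} SD = ({low}, {high}): {area_within(k):.1%} of the group")
```

The percentages are the same for any mean and SD; only the limits (30 to 50, 20 to 60, 10 to 70) depend on them.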


Distributions: Negative

In a negatively skewed distribution, the mode is at the top of the curve, the median is lower than it, and the mean is lower than the median.

The result is a ‘tail’ towards the more negative side of the graph.

Negative skewness (tail to left = left skewed)

Median < Mode

Mean < Median


Distributions: Positive

In a positively skewed distribution, the mode is at the top of the curve, the median is higher than it, and the mean is higher than the median.

The result is a ‘tail’ towards the more positive side of the graph.

Positive skewness (tail to right = right skewed)

Median > Mode

Mean > Median


Example dataset

First Year BDS students enrolled in EBD1.

Response to survey: n = 90 (out of 119), or 76%.

Variables:
– Age: quantitative variable measured on a ratio scale
– Sex: qualitative variable measured on a nominal scale, i.e. variable with categories male or female
– Height: quantitative variable measured on a ratio scale
– Weight: quantitative variable measured on a ratio scale

Variables measured at a higher level can always be converted to a lower level, but not vice versa.

For example, observations of actual age (ratio scale) can be converted to categories of older and younger (ordinal scale). Similarly for height and weight.
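Converting a ratio-scale variable to an ordinal one can be sketched in a few lines of Python. The cut-off of 25 years and the function name `age_group` are assumptions chosen only for illustration; the lecture does not say where "older" starts.

```python
def age_group(age_years, cutoff=25):
    """Convert age (ratio scale) to an ordered category (ordinal scale).

    The cutoff is a hypothetical choice, not a value from the lecture.
    """
    return "older" if age_years >= cutoff else "younger"

ages = [21, 22, 18, 20, 19, 29, 38]
groups = [age_group(a) for a in ages]
print(groups)
```

The reverse conversion is not possible: knowing that a student is "younger" does not recover the actual age.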


Data spreadsheet

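| Case | Age | Sex | Height | Weight |
| --- | --- | --- | --- | --- |
| 1 | 21 | 2 | 165 | 45 |
| 2 | 22 | 2 | 170 | 53 |
| 3 | 18 | 1 | . | 74 |
| 4 | 20 | 2 | 165 | 44 |
| 5 | 19 | 1 | 175 | 70 |
| 6 | 19 | 2 | 163 | 53 |
| 7 | 24 | 2 | 163 | 49 |
| 8 | 18 | 1 | 170 | 60 |
| 9 | 29 | 2 | 178 | 70 |
| 10 | 28 | 2 | 163 | 58 |
| 11 | 18 | 2 | 177 | 72 |
| 12 | 38 | 2 | 164 | 65 |
| 13 | 23 | 2 | 161 | 65 |
| 14 | 20 | 2 | 178 | 63 |
| 15 | 29 | 2 | 159 | 54 |
| : | : | : | : | : |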


Frequency distribution of height variable

[Figure: histogram of height, horizontal axis 150 to 195 cm]

| Height (cm) | Frequency |
| --- | --- |
| 150-155 | 4 |
| 155-160 | 9 |
| 160-165 | 21 |
| 165-170 | 24 |
| 170-175 | 10 |
| 175-180 | 10 |
| 180-185 | 7 |
| 185-190 | 3 |
| 190-195 | 1 |
| Total | 89 |

Mode = 165cm, 170cm

Mean = 169.3cm

Median = 168cm
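When only a frequency table is available, the mean can be approximated from the class midpoints. The sketch below applies that idea to the height table; using midpoints is an assumption of this illustration, and the result (about 168.5 cm) differs slightly from the reported mean of 169.3 cm, which was presumably calculated from the individual observations.

```python
# (lower limit, upper limit, frequency) for each height class in cm
height_classes = [
    (150, 155, 4), (155, 160, 9), (160, 165, 21),
    (165, 170, 24), (170, 175, 10), (175, 180, 10),
    (180, 185, 7), (185, 190, 3), (190, 195, 1),
]

n = sum(freq for _, _, freq in height_classes)

# Treat every observation in a class as if it sat at the class midpoint.
grouped_mean = sum((lo + hi) / 2 * freq for lo, hi, freq in height_classes) / n

# The modal class is the class with the highest frequency.
modal_class = max(height_classes, key=lambda c: c[2])

print(f"n = {n}, grouped mean = {grouped_mean:.1f} cm")
print(f"modal class = {modal_class[0]}-{modal_class[1]} cm")
```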


Frequency distribution of weight variable

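[Figure: histogram of weight, horizontal axis 40 to 125 kg]

| Weight (kg) | Freq. |
| --- | --- |
| 40-45 | 4 |
| 45-50 | 9 |
| 50-55 | 15 |
| 55-60 | 16 |
| 60-65 | 13 |
| 65-70 | 7 |
| 70-75 | 11 |
| 75-80 | 4 |
| 80-85 | 3 |
| 85-90 | 3 |
| 90-95 | 2 |
| 95-100 | 1 |
| 100-105 | 1 |
| 105-110 | 0 |
| 110-115 | 0 |
| 115-120 | 0 |
| 120-125 | 1 |
| Total | 90 |

Mean = 62.6 kg

Median = 60 kg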


Frequency distribution of age variable

Mode (18 yrs) < Median (19 yrs) < Mean (20 yrs)
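The same ordering of mode, median and mean can be seen in a small sample. The sketch below uses only the ages of the 15 cases shown on the data spreadsheet slide, not the full class of 90, so the values differ from those above; it relies on Python's standard `statistics` module.

```python
import statistics

# Ages of the 15 cases listed on the data spreadsheet slide
ages = [21, 22, 18, 20, 19, 19, 24, 18, 29, 28, 18, 38, 23, 20, 29]

mode = statistics.mode(ages)
median = statistics.median(ages)
mean = statistics.mean(ages)

print(f"mode = {mode}, median = {median}, mean = {mean:.1f}")
# A few older students pull the mean upwards: a positively skewed pattern.
print("positively skewed pattern:", mode < median < mean)
```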


Descriptive statistics

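| Variable | Freq | Min | Max | Range | SD |
| --- | --- | --- | --- | --- | --- |
| Age (yrs) | 90 | 17.0 | 38.0 | 21.0 | 3.4 |
| Weight (kg) | 90 | 40.0 | 120.0 | 80.0 | 14.5 |
| Height (cm) | 89 | 152.0 | 191.0 | 39.0 | 8.8 |

| Variable | Category | Freq | % |
| --- | --- | --- | --- |
| Sex | Male | 32 | 35.6 |
| | Female | 58 | 64.4 |
| | Total | 90 | 100.0 |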


What is Inferential Statistics ?

It is the statistical technique used to infer from the result of a sample (a statistic) to the population (a parameter).

[Figure: a sample with mean x̄ = 10.14 drawn from the population (Village A), whose mean is unknown (µ = ?)]

The technique is called “Inferential Statistics”


Statistical inference

Inferential statistics are used to draw inferences about a population from a sample.

For example, the average number of decayed teeth in children aged 5 years can be estimated using observations from a sample of 5-year-olds.


Selecting a sample from a population

How can a sample that is representative of the population of interest be selected?

Answer: by random selection

When a random sample is drawn from the population of interest, every member of the population has the same probability, or chance, of being selected in the sample.

For this reason, random samples are considered to be unbiased.
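Random selection can be sketched with Python's standard `random` module. The population of 1,000 numbered children and the sample size of 100 are hypothetical values chosen for illustration only.

```python
import random

# Hypothetical population: 1,000 children identified by ID numbers 1..1000
population_ids = list(range(1, 1001))

rng = random.Random(2010)  # fixed seed so the sketch gives the same sample each run

# Simple random sample without replacement: every child has the same
# chance (100 in 1,000) of ending up in the sample.
sample_ids = rng.sample(population_ids, k=100)

print(sorted(sample_ids)[:10])
```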


Two types of Inferential Statistics

Parameter Estimation

Hypothesis testing


1. Parameter estimation

• Parameter estimation takes two forms:

• 1. Point estimation
• 2. Interval estimation


Definition

• A point estimate is a single numerical value used to estimate the corresponding population parameter

• An interval estimate consists of two numerical values defining a range of values that, with a specific degree of confidence, we feel includes the parameter being estimated


Parameter estimation

A point estimate gives the estimate of the population parameter as a single number, e.g. the sample mean, median, variance or standard deviation.

An interval estimate involves more than one point: it is a range of values within which the population parameter is thought to lie, i.e. a confidence interval defined by the lower and upper limits of that range.

Point and interval estimates let us infer the true value of an unknown population parameter using information from a random sample of that population.


Confidence intervals (cont.)

Example

Suppose a paper reports that, among a sample of 2,823 5–6-year-old children living in Sharjah, the mean number of decayed teeth is 0.81 (SD = 1.66) with a 95% confidence interval of (0.75, 0.87).

Interpretation

The 95% confidence interval is the range in which we would expect the mean number of decayed teeth to lie in the population of 5–6-year-old children living in Sharjah.

Because only a sample of children was used, the exact population mean cannot be known for certain.

Hence, the 95% confidence interval indicates the margin of imprecision due to sampling error.

Or, alternatively, you could think of it as the range in which there is a 95% chance that the true population mean lies.


1. Estimation (CI)

[Figure: a sample with mean x̄ = 10.14 drawn from a population whose mean is unknown (µ = ?)]

$\text{CI} = \bar{x} \pm \{ t_{\alpha/2} \times (\text{Standard Error}) \}$

$95\%\ \text{CI} = \bar{x} \pm \{ t_{0.025} \times (\text{S.E.}) \}$

s.d. = 4.3, n = 100

$\text{S.E.} = \dfrac{\text{s.d.}}{\sqrt{n}} = \dfrac{4.3}{\sqrt{100}} = 0.43$

$95\%\ \text{CI} = 10.14 \pm \{ 1.98 \times (0.43) \} = 10.14 \pm 0.8514$

$95\%\ \text{CI} = 9.29,\ 10.99$

The multiplier 1.98 is $t_{0.025}$ for n − 1 = 99 degrees of freedom, which is what the margin of 0.8514 corresponds to. The large-sample value 1.96 gives 10.14 ± 0.8428, i.e. almost the same interval (9.30, 10.98).


1. Estimation (CI)

95% CI = 9.29, 10.99

We are 95% sure that the mean of the population lies between 9.29 and 10.99.

For 99%, replace 1.96 with 2.58.

99% CI = 9.02, 11.26
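The calculation can be sketched in Python as follows. The function name `confidence_interval` is illustrative, and the multipliers 1.96 and 2.58 are the large-sample values quoted on the slide. With these multipliers the intervals come out at roughly 9.30 to 10.98 and 9.03 to 11.25, marginally narrower than the values shown above, which appear to have been calculated with slightly larger t multipliers.

```python
import math

def confidence_interval(mean, sd, n, multiplier):
    """Return (lower, upper) limits: mean ± multiplier × standard error."""
    standard_error = sd / math.sqrt(n)
    margin = multiplier * standard_error
    return mean - margin, mean + margin

# Values from the slide: sample mean 10.14, s.d. 4.3, n = 100
ci95 = confidence_interval(10.14, 4.3, 100, multiplier=1.96)
ci99 = confidence_interval(10.14, 4.3, 100, multiplier=2.58)

print(f"95% CI: {ci95[0]:.2f} to {ci95[1]:.2f}")
print(f"99% CI: {ci99[0]:.2f} to {ci99[1]:.2f}")
```

Asking for more confidence (99% rather than 95%) widens the interval, because a larger multiplier is applied to the same standard error.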


95% Confidence interval formula

Standard deviation vs. Standard error of the statistic

These two statistics are used for very different purposes.

Standard deviation is a measure of spread of a set of observations.

Standard error measures sampling error and is used to indicate the precision of a statistic, i.e. how close the statistic is to the parameter it is estimating.

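$\text{Estimate} \pm 1.96 \times \left( \dfrac{\text{Std. Dev}}{\sqrt{n}} \right)$

Here the estimate is, e.g., the mean, and the term in brackets is the standard error.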


Standard error example

Standard error of the mean

In a sample of 2,823 5–6-year-old children living in Sharjah, the mean number of decayed teeth is 0.81 and the standard deviation is 1.66.

The standard error is approximately 0.03:

$\text{Std. Error} = \dfrac{1.66}{\sqrt{2823}} = 0.03$

So, we expect, on average, observed sample means of 0.81, but, when we’re wrong, we expect to be off by about 0.03 points, on average.

Standard error of the sample mean gives an indication of the extent to which the sample mean deviates from the population mean.

$\text{Estimate} \pm 1.96 \times \left( \dfrac{\text{Std. Dev}}{\sqrt{n}} \right)$, with Estimate = 0.81
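As a quick check of these figures, the sketch below recomputes the standard error for the Sharjah example and the resulting 95% limits, which round to the interval (0.75, 0.87) reported earlier. It also shows how the standard error shrinks as the sample grows; the smaller sample sizes of 100 and 1,000 are hypothetical and included only for comparison.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

se = standard_error(1.66, 2823)
low = 0.81 - 1.96 * se
high = 0.81 + 1.96 * se
print(f"standard error = {se:.4f}")
print(f"95% CI = ({low:.2f}, {high:.2f})")

# Same standard deviation, hypothetical smaller samples: less precision.
for n in (100, 1000, 2823):
    print(f"n = {n}: standard error = {standard_error(1.66, n):.3f}")
```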


2. Hypothesis Testing


What is hypothesis testing?

In Estimation, we estimate a population parameter from a sample statistic

In Hypothesis testing, we answer a specific question related to a population parameter


Hypothesis testing

• A (statistical) hypothesis is a statement of belief about population parameters

• It is a predominant feature of quantitative research in oral health & health care research in general

• Researchers can test a hypothesis to see whether the collected data support or refute such hypothesis


2 types of hypotheses

• The null hypothesis, symbolized by Ho, proposes no relationship between 2 variables or no effect in the population

• The alternative hypothesis, symbolized by Ha, is a statement that disagrees with the null hypothesis.


• If the null hypothesis is rejected as a result of sample evidence, then the alternative hypothesis is concluded

• If the evidence is insufficient to reject, the null hypothesis is retained, but not accepted

• Traditionally, researchers do not accept the null hypothesis from current evidence; they state that it cannot be rejected


Example

A toothpaste company claims that their toothpaste contains, on average, 1100 ppm of fluoride.

Suppose we are interested in testing this claim. We will randomly sample 100 tubes (i.e., n = 100) of toothpaste from this company and, under identical conditions, calculate the average fluoride content (in ppm) for this sample.

From the sample of 100 tubes of toothpaste, the average was found to be 1035 ppm (x̄ = 1035). Could this sample have been drawn from a population with mean fluoride content of µ = 1,100 (known variance σ² = 200)?


Basic steps in hypothesis testing

1. Propose a research question (identify the parameter of interest).

2. State the null hypothesis, H0, and the alternative hypothesis, HA.

3. Define a threshold value for declaring a P-value significant. The threshold is called the significance level of the test, is denoted by alpha (α), and is commonly set to 0.05.

4. Select the appropriate statistical test to compute the P-value.


Basic steps in hypothesis testing (cont.)

5. Compare the P-value of your test to the chosen level of significance. Can the null hypothesis be rejected?

6. If P-value < α, conclude that the difference is statistically significant and decide to reject the null hypothesis. If P-value ≥ α, conclude that the difference is not statistically significant and decide not to reject the null hypothesis.


Example

A toothpaste company (X) claims that their toothpaste contains, on average, 1100 ppm of fluoride.

What is the research question?


What is hypothesis testing?

Research Q: Is the mean fluoride content in toothpaste X 1100ppm?

Ans: Yes or No

1) Null hypothesis (Ho: µ = 1100): The mean fluoride content in toothpaste X is equal to 1100ppm

2) Alternative hypothesis (Ha: µ ≠ 1100): The mean fluoride content in toothpaste X is not equal to 1100ppm


Define the cut-off for the P value (commonly set at 0.05)

At the end of the hypothesis testing, we will get a P value.

If the P value is less than 0.05, we reject the Null Hypothesis (Ho).

If the P value is more than or equal to 0.05, we cannot reject the Null Hypothesis (Ho).

Select appropriate test to compute the p value


Q: Is the fluoride content in toothpaste X 1100ppm?

Ans: Yes or No

Ho: µ = 1100; Ha: µ ≠ 1100; x̄ = 1035; variance = 200; n = 100

In the above example, if we get P = .01, we reject the Null Hypothesis (Ho). We then conclude as the Alternative Hypothesis (Ha): “the mean fluoride content in toothpaste X is different from 1100ppm”.

Alternatively, we may report: “the mean fluoride content is significantly different from 1100ppm”.

Note: The second conclusion is more commonly used in the literature.


Q: Is the fluoride content in toothpaste X 1100ppm?

Ans: Yes or No

Ho: µ = 1100; Ha: µ ≠ 1100; x̄ = 1035; variance = 200; n = 100

In the above example, if we get P = .08, we CANNOT reject the Null Hypothesis (Ho). We retain the Null Hypothesis and do not conclude as the Alternative Hypothesis (Ha): we cannot say that the mean fluoride content in toothpaste X is different from 1100ppm.

We report this as: “the mean fluoride content is NOT significantly different from 1100ppm”.


What is P value?

Q: Is the mean fluoride content in toothpaste X 1100ppm?

Ans: Yes or No

Ho: µ = 1100; Ha: µ ≠ 1100; x̄ = 1035; variance = 200; n = 100

If the P value is less than 0.05, we reject the Null Hypothesis.

The P value is the probability of obtaining a result at least as extreme as the one observed if the Null Hypothesis is true. It is loosely read as the risk of error if you reject the Null Hypothesis and conclude as the Alternative Hypothesis.

Example: P value = 0.01. It means that, if the Null Hypothesis were true, a result this extreme would occur only 1% of the time, so there is little risk of error if we conclude as the Alternative Hypothesis (“significantly different”).

We normally allow less than 5% error. That is why the cut-off point for the P value is 0.05.


What is P value?

Q: Is the mean fluoride content in toothpaste X 1100ppm?

Ans: Yes or No

Ho: µ = 1100; Ha: µ ≠ 1100; x̄ = 1035; variance = 200; n = 100

If the P value is less than 0.05, we reject the Null Hypothesis.

The P value is the probability of obtaining a result at least as extreme as the one observed if the Null Hypothesis is true. It is loosely read as the risk of error if you reject the Null Hypothesis and conclude as the Alternative Hypothesis.

Example: P value = 0.2. It means that, if the Null Hypothesis were true, a result this extreme would still occur 20% of the time, so the risk of error is too high to conclude as the Alternative Hypothesis (“significant difference”).

Therefore, we cannot conclude that it is “significantly different”. We have to conclude that “the difference is not significant”.


What is P value?

Q: Is the mean fluoride content in toothpaste X 1100ppm?

Ans: Yes or No

Ho: µ = 1100; Ha: µ ≠ 1100; x̄ = 1035; variance = 200; n = 100

If the P value is less than 0.05, we reject the Null Hypothesis.

It means that we have set the cut-off point at P less than 0.05 to reject the Ho.

We describe this as setting the “Alpha” at 0.05, because the type of error that we have been talking about (rejecting a Null Hypothesis that is actually true) is called “Type I error” or “Alpha error”.


Definition

The P-value is the smallest level of significance that would lead to rejection of the null hypothesis H0. (The p-value is the observed significance level.)

All statistical tests produce a P-value.

P-values answer the question: ‘Is there a statistically significant difference between study groups?’

The use of P-values in hypothesis testing


Most scientific articles report a P-value associated with a test. Generally, the P-value is compared to a significance level (α) of 0.05 or 0.01 in order to determine whether or not the result is statistically significant.

Decision rules:

If P-value < α then reject H0 at level α (a statistically significant result).

If P-value ≥ α then do not reject H0 at level α (not statistically significant).

Example:

If P-value < 0.05, this indicates that, if H0 were true, there would be a less than 5% chance of observing results at least as extreme as these. We reject H0 and conclude that the result is significant.

P-values


Example (cont.)

So, our hypotheses are:

H0: µ = 1,100
HA: µ ≠ 1,100

The P-value for this test was found to be 0.0006.

Since P-value < 0.05, we reject H0 in favour of HA, i.e. we reject the original assumption that the sample was drawn from a population where µ = 1,100 and σ² = 200.

We say that there is a significant difference between the sample mean and the population mean at the 5% level, i.e. if H0 were true, there would be a less than 5% chance (in fact a 0.06% chance) of observing a result at least this extreme.

What is your conclusion?
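The lecture does not name the test used to obtain this P-value. One plausible choice when the population variance is known is a one-sample z-test, sketched below; the function names are illustrative. Note that with the stated variance of 200 the z statistic is about −46 and the P-value is effectively zero, whereas treating 200 as the standard deviation gives z = −3.25 and a two-sided P-value of about 0.0012 (one-sided about 0.0006). Either way, the P-value is far below 0.05 and H0 is rejected.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-sided P-value) for H0: population mean = mu0."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    p_two_sided = 2 * normal_cdf(-abs(z))
    return z, p_two_sided

# Reading 1: 200 is the variance, so sigma = sqrt(200)
z_var, p_var = one_sample_z_test(1035, 1100, math.sqrt(200), 100)

# Reading 2 (an assumption): 200 is the standard deviation
z_sd, p_sd = one_sample_z_test(1035, 1100, 200, 100)

alpha = 0.05
for label, z, p in (("variance = 200", z_var, p_var), ("SD = 200", z_sd, p_sd)):
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"{label}: z = {z:.2f}, two-sided P = {p:.4f}, {decision}")
```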


When we sample, we select cases from a population of interest. Due to chance variations in selecting the sample’s few cases from the population’s many possible cases, the sample will deviate from the defined population’s true nature by a certain amount. This is called sampling error.

Therefore, inferences from samples to populations are always probabilistic, meaning we can never be 100% certain that our inference was correct.

Drawing the wrong conclusion is called an error of inference.

There are two types of errors of inference defined in terms of the null hypothesis: Type 1 error and Type 2 error.

Types of error


Types of error (cont.)

Possibilities related to decisions about H0:

| Investigator’s decision | Actual situation: H0 true | Actual situation: H0 false |
| --- | --- | --- |
| Accept H0 | (correct decision) | Type II Error |
| Reject H0 | Type I Error, probability = α | (correct decision) |


Type 1 and Type 2 errors can be quite difficult to understand, so let’s look at a few examples to help you grasp the concept.

Let’s hypothesise that two groups of dental patients are equal in their knowledge of preventive hygiene behaviours.

Now consider the following four scenarios. For each, determine whether or not an error has been made and, if so, what type of error.

Types of error (cont.)


1. You accept the null hypothesis when the groups are really equal in oral self-care knowledge.

Answer:

2. You reject the null hypothesis when the groups are really equal in oral self-care knowledge.

Answer:

3. You reject the null hypothesis when the groups are really different in their oral self-care knowledge.

Answer:

4. You accept the null hypothesis when one group has much more oral self-care knowledge than the other.

Answer:

Types of error (cont.)


1. You accept the null hypothesis when the groups are really equal in oral self-care knowledge.

Answer: Correct decision

2. You reject the null hypothesis when the groups are really equal in oral self-care knowledge.

Answer: Type 1 error

3. You reject the null hypothesis when the groups are really different in their oral self-care knowledge.

Answer: Correct decision

4. You accept the null hypothesis when one group has much more oral self-care knowledge than the other.

Answer: Type 2 error

Types of error (cont.)
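The link between α and Type 1 error can be illustrated by simulation. In the sketch below, which is an illustration under assumed conditions rather than part of the lecture, every sample is drawn from a population where H0 is true (a normal population with mean 0 and SD 1, both hypothetical), and a two-sided z-test is applied at α = 0.05. The proportion of samples in which H0 is wrongly rejected should come out close to 5%.

```python
import math
import random

def two_sided_p_value(sample, mu0, sigma):
    """P-value of a two-sided one-sample z-test with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(1)  # fixed seed so the run is repeatable
alpha = 0.05
trials = 4000
rejections = 0

for _ in range(trials):
    # H0 is true here: each sample really comes from a population with mean 0.
    sample = [rng.gauss(0, 1) for _ in range(30)]
    if two_sided_p_value(sample, mu0=0, sigma=1) < alpha:
        rejections += 1  # rejecting a true H0 is a Type 1 error

type1_rate = rejections / trials
print(f"Type 1 error rate: {type1_rate:.3f} (alpha = {alpha})")
```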


Statistical significance does not necessarily imply that the true difference in population means is of sufficient magnitude to be of clinical importance.

Significance tests tell us whether a difference is statistically significant but significance tests do not tell us whether the difference is of practical importance.

In clinical practice we usually need to know the presence and size of any difference.

Statistical vs Practical significance


P-values only inform you on the likelihood of a difference being attributable to chance.

As the sample size increases and the variance decreases, small differences in mean values may provide statistically significant results.

Whether these ‘statistically significant’ differences are of any practical or clinical significance requires judgement on the part of the clinician.

Statistical vs Practical significance (cont.)


Statistical vs. Practical significance example

Consider a study comparing a new antihypertensive medication (A) with a standard antihypertensive medication (B). (Suppose drug A has additional side effects and is more expensive than drug B.)

Results
1. Blood pressures of patients receiving A were significantly lower than those on B (p-value = 0.0001).
2. Difference in blood pressure between the groups was 5mmHg.

Interpretation
1. The probability that the difference found, or a bigger one, is attributable to chance is about 0.01%, or 1 in 10,000.
2. But, given the small difference found between the groups, we might consider this difference to be too small to offset the difficulties, side effects and expense associated with drug A.
3. The effect is smaller than clinically meaningful, so we have statistical significance but not clinical/practical significance.
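The effect of sample size on statistical significance can be sketched as follows. The difference of 5 mmHg comes from the example; the standard deviation of 15 mmHg, the group sizes, and the use of a two-sample z-test are all assumptions made for illustration, since the lecture does not give them. The same small difference is not significant in small groups but becomes highly significant in large ones, while its clinical importance is unchanged.

```python
import math

def two_group_p_value(diff, sd, n_per_group):
    """Two-sided P-value for a difference in means between two equal-sized
    groups, using a z-test with a common, known standard deviation."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = diff / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

# Fixed 5 mmHg difference; hypothetical SD of 15 mmHg
p_values = {n: two_group_p_value(diff=5, sd=15, n_per_group=n)
            for n in (10, 50, 200, 1000)}

for n, p in p_values.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n}: P = {p:.4g} ({verdict})")
```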
