
SADC Course in Statistics Introduction to Statistical Inference (Session 03)

Date post: 28-Mar-2015
Upload: molly-doyle

SADC Course in Statistics

Introduction to Statistical Inference

(Session 03)

Learning Objectives

By the end of this session, you will be able to

• explain what is meant by statistical inference

• explain what is meant by an estimate of a population parameter

• explain what is meant by the sampling distribution of an estimate

• calculate and interpret the standard error of a sample mean from data of a simple random sample

What is statistical inference?

• Inference is about drawing conclusions about population characteristics using information gathered from the sample

• It will be assumed for the remainder of this module that the sample is representative of the population

• We shall further assume that the sample has been drawn as a simple random sample from an infinite population

Estimating population parameters

                 Population    Sample

Mean             $\mu$         $\bar{x}$

Variance         $\sigma^2$    $s^2$

Std. deviation   $\sigma$      $s$

• Population characteristics (parameters) are unknown, so we use Greek letters to denote the population mean and standard deviation

• Sample characteristics are measurable and known, so we use Latin letters. They form estimates of the population values.

An example of statistical inference

• What is the mean land holding size owned by rural households in district Kilindi in the Tanga region of Tanzania?

• Data from 404 households surveyed in this district gave a mean land holding size of 7.62 acres with a standard deviation of 6.81 acres.

• Our best estimate of the mean landholding size in Kilindi district is therefore 7.62 acres.

What results are likely if we sampled again with a different set of households?

A brief return to Practical 2…

• In Practical 2, you sampled 5 Uganda districts twice. Look back at the mean and standard deviation of each sample.

• You will notice the answers are different each time you sample, i.e. there is variability in the sample means.

• If we took many more samples, we could produce a histogram of the means of these samples.

An example follows…

The distribution of means

• Suppose 10 University students were given a standard meal and the time taken to consume the meal was recorded for each.

• Suppose the 10 values gave: mean = 11.24, with std. dev. = 0.864

• Let’s assume this exercise was repeated 50 times with different samples of students

• A histogram of the resulting 500 observations appears below, followed by a histogram of the 50 sample means

Histogram of raw data

The data appear to follow a normal distribution

Histogram of the 50 sample means

The distribution of the sample means is called the sampling distribution of the mean

Notice that the variability of the above distribution is smaller than the variability of the raw data
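
The comparison between raw data and sample means can be tried out with a small simulation. The sketch below is illustrative only: it assumes the meal times come from a normal population with mean 11.24 and standard deviation 0.864 (the values quoted for the single sample of 10 students), and the function name `simulate_sample_means` and the fixed seed are hypothetical choices, not part of the course material.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, repeats, seed=0):
    """Draw `repeats` samples of size `n` from an assumed N(mu, sigma^2)
    population. Returns all raw observations and the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    raw, means = [], []
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        raw.extend(sample)
        means.append(statistics.mean(sample))
    return raw, means


# 50 samples of 10 students each, giving 500 raw observations and 50 means
raw, means = simulate_sample_means(mu=11.24, sigma=0.864, n=10, repeats=50)

sd_raw = statistics.stdev(raw)      # spread of the individual meal times
sd_means = statistics.stdev(means)  # spread of the 50 sample means

print(f"std. dev. of {len(raw)} raw observations: {sd_raw:.3f}")
print(f"std. dev. of {len(means)} sample means:   {sd_means:.3f}")
```

Under these assumptions the spread of the sample means should come out noticeably smaller than the spread of the raw data, which is the pattern the two histograms show.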

Back to estimation…

The estimate of the mean landholding size in Kilindi district is 7.62 acres.

Is this sufficient for reporting purposes, given that this answer is based on one particular sample?

What we have is an estimate based on a sample of size 404. But how good is this estimate?

We need a measure of the precision, i.e. variability, of this estimate…

Sampling Variability

The accuracy of the sample mean $\bar{x}$ as an estimate of $\mu$ depends on:

(i) the sample size ($n$), since the more data we collect, the more we know about the population, and

(ii) the inherent variability in the data, $\sigma^2$

These two quantities must enter the measure of precision of any estimate of a population parameter. We aim for high precision, i.e. low standard error!

Standard error of the mean

Precision of $\bar{x}$ as an estimate of $\mu$ is given by the standard error of the mean:

$\text{s.e.}(\bar{x}) = \sigma / \sqrt{n}$

– Also written as s.e.m., or sometimes s.e.

Estimate using sample data: $s / \sqrt{n}$

For example on landholding size,

s.e. = $6.81 / \sqrt{404} = 6.81 / 20.1 = 0.339$
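
The landholding calculation can be checked with a few lines of code. This is a minimal sketch, assuming the sample standard deviation (6.81 acres) and the sample size (404 households) quoted earlier; the helper name `standard_error` is a hypothetical choice for illustration.

```python
import math


def standard_error(s, n):
    """Estimated standard error of a sample mean: s / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return s / math.sqrt(n)


# Kilindi landholding example: s = 6.81 acres, n = 404 households
se = standard_error(6.81, 404)
print(f"s.e. = {se:.3f} acres")  # prints: s.e. = 0.339 acres

# Quadrupling the sample size halves the standard error
se_4n = standard_error(6.81, 4 * 404)
print(f"s.e. with 4n = {se_4n:.3f} acres")
```

Because $n$ enters through its square root, halving the standard error requires four times as many households, not twice as many.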

Summary

If we had repeated samples (same size) taken from the same population:

• sample means would vary

• standard error of the mean is a measure of variability of sample means over (hypothetically drawn) repeated samples

• distribution of sample means over repeated samples is called the sampling distribution of the mean, $\bar{x} \sim N(\mu, \sigma^2/n)$

The lower the value of the standard error, the greater is the precision of the estimate
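
For readers who want to see where the variance $\sigma^2/n$ of the sample mean comes from, a short derivation is sketched here. It assumes the observations $X_1, \dots, X_n$ are independent with common mean $\mu$ and variance $\sigma^2$, which is what a simple random sample from an infinite population provides; the notation $X_i$ is introduced here for illustration.

```latex
\begin{align*}
\operatorname{Var}(\bar{x})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{n\,\sigma^2}{n^2}
   = \frac{\sigma^2}{n}, \\
\text{s.e.}(\bar{x})
  &= \sqrt{\operatorname{Var}(\bar{x})}
   = \frac{\sigma}{\sqrt{n}}.
\end{align*}
```

The second equality uses independence, so the variance of the sum is the sum of the variances.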

References

SSC (2000b) Confidence and Significance: Key Concepts of Inferential Statistics. Statistical Guidelines Series supporting DFID Natural Resources Projects, Statistical Services Centre, The University of Reading, UK. www.reading.ac.uk/ssc/publications/guides.html

Owen, F. and Jones, R. (1990). Statistics. 3rd edn. Pitman Publishing, London, pp 480.

Clarke, G.M. and Cooke, D. (2004). A Basic Course in Statistics. 5th edn. Edward Arnold.

Practical work follows to ensure learning objectives are achieved…

