1 Psych 5500/6500 Statistics and Parameters Fall, 2008.

1

Psych 5500/6500

Statistics and Parameters

Fall, 2008

2

Statistics

Two uses of the term:

1. ‘Statistics’ is a branch of mathematics.

2. ‘Statistics’ are measures that arise from your sample. The mean, variance, and standard deviation of your sample are all ‘statistics’. Statistics are usually symbolized with Roman letters.

3

Parameters

‘Parameters’ are measures that arise from the population from which you sampled. The mean, variance, and standard deviation of the population are ‘parameters’. Parameters are usually symbolized with Greek letters.

4

Estimating Parameters

While it is good to be able to describe your sample (using statistics) the goal of research is to understand the population from which the sample was drawn (i.e. to know the values of parameters). We usually cannot calculate the parameters directly, as that would require that we measure everyone in the population. Thus we need tools to estimate parameters based upon our sample data.

5

Desired Qualities of Estimators

1. Unbiased

2. Consistent

3. Relatively Efficient

6

Unbiased

Any estimate of a population parameter based upon sample data is unlikely to be exactly correct. If several samples are drawn then the estimates of the population parameter are likely to vary across the samples. A method of estimating a parameter is called unbiased if the expected value of the estimate equals the parameter being estimated.

The expected value of the estimate is the mean value that would be obtained if an infinite number of estimates were obtained.
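The idea of an expected value can be illustrated with a small simulation. The sketch below assumes a hypothetical normal population (the values of `MU`, `SIGMA`, `N` and `SAMPLES` are invented for illustration) and uses a large number of samples to stand in for 'an infinite number'. Individual estimates vary, but their mean lands very near the parameter.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population parameters (assumptions, not from the slides).
MU, SIGMA = 100.0, 15.0
N = 10            # size of each sample
SAMPLES = 20000   # stands in for "an infinite number" of samples


def sample_mean(n):
    """Draw one sample of size n and return its mean (one estimate of MU)."""
    return statistics.fmean([random.gauss(MU, SIGMA) for _ in range(n)])


estimates = [sample_mean(N) for _ in range(SAMPLES)]
mean_of_estimates = statistics.fmean(estimates)

print(f"parameter mu               = {MU}")
print(f"mean of all the estimates  = {mean_of_estimates:.3f}")
print(f"lowest / highest estimate  = {min(estimates):.1f} / {max(estimates):.1f}")
```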

7

Consistent

A method for estimating a parameter is called ‘consistent’ if the probability of the estimate being close to the value of the parameter increases as the sample size increases.
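Consistency can be sketched the same way. The example below assumes a hypothetical normal population and an arbitrary definition of 'close' (`TOLERANCE`); both are illustrative choices. It estimates the probability that the sample mean falls close to μ for three sample sizes.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 100.0, 15.0   # hypothetical population parameters
TOLERANCE = 3.0           # "close" is taken to mean within 3 points of MU
SAMPLES = 4000            # number of samples drawn per sample size


def proportion_close(n):
    """Proportion of sample means (sample size n) within TOLERANCE of MU."""
    hits = 0
    for _ in range(SAMPLES):
        mean = statistics.fmean([random.gauss(MU, SIGMA) for _ in range(n)])
        if abs(mean - MU) <= TOLERANCE:
            hits += 1
    return hits / SAMPLES


results = {n: proportion_close(n) for n in (5, 20, 80)}
for n, p in results.items():
    print(f"N = {n:2d}: proportion of estimates close to mu = {p:.2f}")
```

The proportion rises as N rises, which is what 'consistent' means for the sample mean.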

8

Relatively Efficient

A method of estimating a parameter is more ‘efficient’ than another method if the variance of its estimates is less than that of the other method. In other words, for any given N, a method is more efficient if its estimates are more closely clustered around the true value of the parameter than the estimates of the other method.
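As an illustration of relative efficiency, the sketch below compares two ways of estimating μ from the same samples: the sample mean and the sample median. The choice of the median as the competing method is an assumption made for this example (the slides do not name one), as are the population values. In a normal population both methods are unbiased, so the comparison comes down to how tightly their estimates cluster.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical normal population and sampling plan (assumptions).
MU, SIGMA, N, SAMPLES = 100.0, 15.0, 25, 5000

means, medians = [], []
for _ in range(SAMPLES):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))     # method 1: sample mean
    medians.append(statistics.median(sample))  # method 2: sample median

# Variance of each method's estimates across all the samples.
var_of_means = statistics.pvariance(means)
var_of_medians = statistics.pvariance(medians)

print(f"variance of the sample means   = {var_of_means:.2f}")
print(f"variance of the sample medians = {var_of_medians:.2f}")
```

For the same N, the method with the smaller variance is the more efficient one.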

9

The Mean

Statistic: the mean of the sample is a statistic; the formula for computing it is:

$$\bar{Y} = \frac{\sum Y}{N} \quad \text{(using all of the scores in the sample)}$$

Parameter: the mean of the population is a parameter, its symbol is μ, and the formula for computing it is:

$$\mu = \frac{\sum Y}{N} \quad \text{(using all of the scores in the population)}$$

10

Estimating μ

The mean of the sample is an unbiased, consistent, and efficient estimate of the mean of the population.

$$\text{est. } \mu = \bar{Y}$$

Note: be sure to indicate that this is an estimate of μ.

11

Improving our estimate

Our estimate of μ has a higher probability of being close to correct if:

1. We increase N (remember ‘consistency’).

2. We decrease the variance of the variable we are studying.

12

The Variance

Statistic: the variance of the sample is a statistic; the formula for computing it is:

$$S^2 = \frac{SS}{N} \quad \text{(using all the data in the sample)}$$

Parameter: the variance of the population is a parameter, its symbol is σ², and the formula for computing it is:

$$\sigma^2 = \frac{SS}{N} \quad \text{(using all the data in the population)}$$

13

Estimating σ²

The variance of the sample is a biased estimate of the variance of the population: the expected value of the sample variance is less than the variance of the population (in other words, sample variances tend, on average, to come out smaller than the population variance).

See handout on why the variance of the sample is usually less than the variance of the population.

14

Unbiased Estimate of σ²

$$\text{est. } \sigma^2 = \frac{SS}{N-1} \quad \text{(using the data from the sample)}$$

By dividing by (N-1) rather than by (N) we obtain an unbiased estimate of the population variance.
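The effect of the two denominators can be checked by simulation. This sketch assumes a hypothetical normal population with σ² = 100 and takes SS to be the sum of squared deviations from the sample mean. Across many samples, SS/N averages out below σ², while SS/(N-1) averages out close to it.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

MU, SIGMA = 50.0, 10.0   # hypothetical population, so sigma^2 = 100
N, SAMPLES = 5, 20000    # small N makes the bias easy to see


def sum_of_squares(sample):
    """SS: sum of squared deviations of the scores from the sample mean."""
    mean = sum(sample) / len(sample)
    return sum((y - mean) ** 2 for y in sample)


sample_variances, unbiased_estimates = [], []
for _ in range(SAMPLES):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    ss = sum_of_squares(sample)
    sample_variances.append(ss / N)          # S^2
    unbiased_estimates.append(ss / (N - 1))  # est. sigma^2

mean_s2 = statistics.fmean(sample_variances)
mean_est = statistics.fmean(unbiased_estimates)

print(f"population variance      = {SIGMA ** 2:.1f}")
print(f"mean of S^2 (SS/N)       = {mean_s2:.1f}")
print(f"mean of est. (SS/(N-1))  = {mean_est:.1f}")
```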

15

The Standard Deviation

Statistic: the standard deviation of the sample is a statistic; its formula is:

$$S = \sqrt{S^2}$$

Parameter: the standard deviation of the population is a parameter, its symbol is σ, and the formula for computing it is:

$$\sigma = \sqrt{\sigma^2}$$

16

Estimate of σ

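Even though:

$$\sigma = \sqrt{\sigma^2}$$

it is not the case that the following gives an unbiased estimate of σ:

$$\text{est. } \sigma = \sqrt{\text{est. } \sigma^2}$$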

17

The problem has to do with the distribution of the estimates around the true value of the standard deviation: taking the square root affects estimates that are too high differently than it affects estimates that are too low.

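Example: say σ² = 81 and so σ = 9.

Sample One: est. σ² = 70 (11 below σ²)

Sample Two: est. σ² = 92 (11 above σ²)

But:

Sample One: $\sqrt{70}$ = 8.37 (.63 below σ)

Sample Two: $\sqrt{92}$ = 9.59 (.59 above σ)

The two variance estimates miss by the same amount in opposite directions, but their square roots do not, so the errors no longer cancel out.

$$\sqrt{\text{est. } \sigma^2} = \text{a biased estimate of } \sigma$$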
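The asymmetry can be checked numerically. The sketch below first takes the square roots of the two example estimates, then runs a simulation on a hypothetical normal population (the population values, N and the number of samples are assumptions). The unbiased variance estimates average out near σ², yet their square roots average out below σ.

```python
import math
import random
import statistics

# The example's two variance estimates, each 11 away from sigma^2 = 81.
SIGMA_EXAMPLE = 9.0
for est_var in (70.0, 92.0):
    error = math.sqrt(est_var) - SIGMA_EXAMPLE
    print(f"est. variance {est_var:.0f}: square root misses sigma by {error:+.2f}")

# Simulation with a hypothetical normal population (assumed values).
random.seed(5)  # fixed seed so the run is repeatable
MU, SIGMA, N, SAMPLES = 50.0, 10.0, 5, 20000

est_variances, est_sds = [], []
for _ in range(SAMPLES):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    mean = sum(sample) / N
    ss = sum((y - mean) ** 2 for y in sample)  # sum of squared deviations
    est_var = ss / (N - 1)                     # unbiased estimate of sigma^2
    est_variances.append(est_var)
    est_sds.append(math.sqrt(est_var))         # square root of that estimate

mean_est_var = statistics.fmean(est_variances)
mean_est_sd = statistics.fmean(est_sds)

print(f"sigma^2 = {SIGMA ** 2:.1f}, mean of est. sigma^2 = {mean_est_var:.1f}")
print(f"sigma   = {SIGMA:.1f}, mean of est. sigma   = {mean_est_sd:.2f}")
```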

18

What this Means

$$\text{est. } \sigma^2 = \frac{SS}{N-1} \quad \text{gives us an unbiased estimate of the population variance}$$

$$\text{est. } \sigma = \sqrt{\text{est. } \sigma^2} \quad \text{gives us a biased estimate of the population standard deviation}$$

Despite that we will still use the second formula. The bias of the estimate of σ is kept in the back of our minds but is not important, because the context in which we will use this ‘est. σ’ will take the bias into account.

19

Formulas

1) $\bar{Y} = \frac{\sum Y}{N}$ using data from the sample (mean of sample, a statistic)

2) $\mu = \frac{\sum Y}{N}$ using data from the population (mean of population, a parameter)

3) $\text{est. } \mu = \bar{Y}$

4) $S^2 = \frac{SS}{N}$ using data from the sample (variance of the sample, a statistic)

5) $\sigma^2 = \frac{SS}{N}$ using data from the population (variance of the pop, a parameter)

6) $\text{est. } \sigma^2 = \frac{SS}{N-1}$ using data from the sample

7) $S = \sqrt{S^2}$ (std dev of the sample, a statistic)

8) $\sigma = \sqrt{\sigma^2}$ (std dev of the pop, a parameter)

9) $\text{est. } \sigma = \sqrt{\text{est. } \sigma^2}$ (biased estimate)

20

Useful Formulas for ‘Going Back and Forth’

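$$\text{est. } \sigma^2 = S^2 \cdot \frac{N}{N-1}$$

$$\text{est. } \sigma = S \cdot \sqrt{\frac{N}{N-1}}$$

$$S^2 = \text{est. } \sigma^2 \cdot \frac{N-1}{N}$$

$$S = \text{est. } \sigma \cdot \sqrt{\frac{N-1}{N}}$$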
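These conversions follow from the fact that S² and est. σ² share the same SS and differ only in the denominator. A minimal sketch (the function names are invented for this example), using S² = 52.44 with N = 10 as the input values:

```python
import math


def est_var_from_sample_var(s2, n):
    """est. sigma^2 = S^2 * N / (N - 1)"""
    return s2 * n / (n - 1)


def sample_var_from_est_var(est_var, n):
    """S^2 = est. sigma^2 * (N - 1) / N"""
    return est_var * (n - 1) / n


def est_sd_from_sample_sd(s, n):
    """est. sigma = S * sqrt(N / (N - 1))"""
    return s * math.sqrt(n / (n - 1))


def sample_sd_from_est_sd(est_sd, n):
    """S = est. sigma * sqrt((N - 1) / N)"""
    return est_sd * math.sqrt((n - 1) / n)


s2, n = 52.44, 10
est_var = est_var_from_sample_var(s2, n)
est_sd = est_sd_from_sample_sd(math.sqrt(s2), n)

print(f"S^2 = {s2:.2f}  ->  est. sigma^2 = {est_var:.2f}")
print(f"S   = {math.sqrt(s2):.2f}   ->  est. sigma   = {est_sd:.2f}")
print(f"and back again: S^2 = {sample_var_from_est_var(est_var, n):.2f}")
```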

21

Other Texts and Software (1)

Some texts use ‘S²’ to represent the variance of the sample (like I do) but use ‘s²’ (lower case ‘s’) rather than ‘est. σ²’ to refer to the estimate of the population variance.

They then use ‘S’ to represent the standard deviation of the sample and ‘s’ rather than ‘est. σ’ to refer to the estimate of the population standard deviation.

22

Other Texts and Software (2)

Many texts use the term ‘sample variance’ to refer to the estimate of the population variance based upon the sample (est. σ²), rather than to the actual variance of the sample, and they have no term for and never refer to the actual variance of the sample. I prefer to use the term ‘sample variance’ to refer to the actual variance of the sample. The best way to tell which variance is being referred to in a context outside this class is to look for whether the formula uses N in the denominator or N-1.

23

Other Texts and Software (3)

What SPSS calls ‘Variance’ is SS/(N-1), the estimate of the population variance based upon the sample data (est. σ²). What it calls ‘Standard Deviation’ is the square root of that (est. σ). SPSS doesn’t tell you that and its ‘Help’ menu doesn’t either. This is one of the challenges of using statistical software: trying to determine exactly what it is giving you. In this case I found out what it was by computing S² and est. σ² with a calculator and then seeing what value SPSS gave me for the variance of the data.
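The same by-hand check works with any software. As an illustration outside SPSS (using Python here is an assumption made for this example, not something the course requires), Python's standard `statistics` module provides both versions: `pvariance` divides SS by N and `variance` divides SS by N-1. The sketch computes both by hand and compares them with what the library returns.

```python
import math
import statistics

Y = [88, 85, 92, 90, 79, 84, 93, 72, 84, 99]

# Compute both variances "with a calculator".
n = len(Y)
mean = sum(Y) / n
ss = sum((y - mean) ** 2 for y in Y)  # sum of squared deviations
s2 = ss / n                           # S^2, the variance of the sample
est_var = ss / (n - 1)                # est. sigma^2

# See which hand-computed value each library function matches.
print("pvariance matches S^2:         ",
      math.isclose(statistics.pvariance(Y), s2))
print("variance matches est. sigma^2: ",
      math.isclose(statistics.variance(Y), est_var))
print("stdev matches est. sigma:      ",
      math.isclose(statistics.stdev(Y), math.sqrt(est_var)))
```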

24

Descriptive and Inferential Statistics

Descriptive statistics are those that describe the sample:

$$\bar{Y}, \; S^2, \; S$$

Inferential statistics are those that make inferences about the population. They ‘arise from the sample’ but are used to make estimates about the values of the parameters:

$$\text{est. } \mu, \; \text{est. } \sigma^2, \; \text{est. } \sigma$$

25

Confidence Intervals

Making an estimate of a parameter does not inform us about how far off that estimate might be; we simply know the estimate is unbiased (i.e. across samples the mean of the estimates equals the value of the parameter).

It is useful to be able to generate a range of possible values of the parameter.

26

Confidence Intervals of the Mean

Let’s say our sample is as follows:

Y = 88, 85, 92, 90, 79, 84, 93, 72, 84, 99

$$\bar{Y} = \text{est. } \mu = 86.6$$

This is our single best estimate of μ, but it is unlikely to be exactly correct. It is also possible to generate ‘confidence intervals’ concerning μ, which will shed light on how far off that estimate might be. We will look at how to compute these in a later lecture; here we will take a look at what they are.

27

Confidence Intervals

Y = 88, 85, 92, 90, 79, 84, 93, 72, 84, 99

est. μ = 86.6 (This is called a ‘point estimate’).

95% confidence interval: 81.14 ≤ μ ≤ 92.06

This is the interval that we are 95% confident contains the true value of μ.

99% confidence interval: 78.76 ≤ μ ≤ 94.44

This is the interval that we are 99% confident contains the true value of μ.
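The computation itself is left for a later lecture, but for readers who want to check the numbers, here is a minimal sketch. It assumes the usual t-based interval, est. μ ± t × est. σ / √N, with critical values of t for N-1 = 9 degrees of freedom copied from a standard t table; whether this is the exact method the course will present is an assumption.

```python
import math
import statistics

Y = [88, 85, 92, 90, 79, 84, 93, 72, 84, 99]

# Two-tailed critical values of t for df = 9, from a standard t table.
T_CRITICAL_DF9 = {0.95: 2.2622, 0.99: 3.2498}


def confidence_interval(scores, t_critical):
    """Return (lower, upper) limits: est. mu +/- t * est. sigma / sqrt(N)."""
    n = len(scores)
    est_mu = statistics.fmean(scores)
    est_sigma = statistics.stdev(scores)  # sqrt(SS / (N - 1))
    margin = t_critical * est_sigma / math.sqrt(n)
    return est_mu - margin, est_mu + margin


for level, t in T_CRITICAL_DF9.items():
    lower, upper = confidence_interval(Y, t)
    print(f"{level:.0%} confidence interval: {lower:.2f} <= mu <= {upper:.2f}")
```

Under these assumptions the sketch reproduces both intervals given above.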

28

Understanding Confidence Intervals

95% confidence interval: 81.14 ≤ μ ≤ 92.06

99% confidence interval: 78.76 ≤ μ ≤ 94.44

1. Note that the 99% confidence interval is larger than the 95% interval. To be more confident that the interval contains the true value of μ we need to make the interval larger.

29

Understanding Confidence Intervals

2. Confidence intervals get narrower (which is good as it gives us more precision in our estimate) as N increases or variance decreases.

30

Effect of increasing N. To demonstrate this I’ll simply repeat each score in the sample twice (to simulate doubling N while keeping the variance and the mean of the sample the same):

Y = 88, 85, 92, 90, 79, 84, 93, 72, 84, 99, 88, 85, 92, 90, 79, 84, 93, 72, 84, 99

95% confidence interval when N=20: 83.12 ≤ μ ≤ 90.08

Compare to 95% confidence interval when N=10: 81.14 ≤ μ ≤ 92.06

Greater N led to narrower (more precise) confidence interval.

31

Effect of decreasing variance. To demonstrate this I’ve gone back to an N of 10 but have decreased the variance (without changing the mean):

Y = 87, 86, 91, 89, 80, 85, 93, 78, 86, 91

95% confidence interval when S²=20.64: 83.17 ≤ μ ≤ 90.03

Compare to 95% confidence interval when S²=52.44: 81.14 ≤ μ ≤ 92.06

Less variance led to narrower (more precise) confidence interval.
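Both demonstrations (a larger N, then a smaller variance) can be rerun with a short script. The sketch assumes the usual t-based 95% interval and takes the critical values of t (df = 9 and df = 19) from a standard t table; the helper name `ci95` is invented for this example.

```python
import math
import statistics

# Two-tailed 95% critical values of t, keyed by degrees of freedom (N - 1).
T_95 = {9: 2.2622, 19: 2.0930}


def ci95(scores):
    """95% confidence interval for mu: est. mu +/- t * est. sigma / sqrt(N)."""
    n = len(scores)
    est_mu = statistics.fmean(scores)
    margin = T_95[n - 1] * statistics.stdev(scores) / math.sqrt(n)
    return est_mu - margin, est_mu + margin


original = [88, 85, 92, 90, 79, 84, 93, 72, 84, 99]
doubled = original * 2                                  # every score repeated
less_variable = [87, 86, 91, 89, 80, 85, 93, 78, 86, 91]

for label, scores in [("original, N=10", original),
                      ("doubled, N=20", doubled),
                      ("less variance, N=10", less_variable)]:
    lower, upper = ci95(scores)
    print(f"{label:20s} {lower:.2f} <= mu <= {upper:.2f}  "
          f"(width {upper - lower:.2f})")
```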

32

Understanding Confidence Intervals

3. a) One common mistake is to say that if our 95% confidence interval is 47 ≤ μ ≤ 53, then that means that 95% of our sample means will fall in that range. The confidence interval, however, is about the possible values of μ, not about the possible values of the sample mean.

b) Another common mistake is to say that there is a 95% chance that μ is between 47 and 53. What is correct, however, is to say that the formula for computing the confidence interval will produce an interval that contains the true value of μ 95% of the time. See supplemental handout.
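The correct reading in (b) can be checked by simulation. The sketch below assumes a hypothetical normal population whose μ is known (so that each interval can be checked against it), samples of N = 10, and the usual t-based 95% interval with the critical value for df = 9 taken from a standard t table. It counts how often the computed interval contains μ.

```python
import math
import random
import statistics

random.seed(4)  # fixed seed so the run is repeatable

MU, SIGMA, N = 50.0, 5.0, 10   # hypothetical population and sample size
T_CRITICAL = 2.2622            # two-tailed 95% critical t for df = 9
TRIALS = 4000                  # number of samples (and intervals) generated

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    est_mu = statistics.fmean(sample)
    margin = T_CRITICAL * statistics.stdev(sample) / math.sqrt(N)
    # Each sample produces a different interval; MU itself never changes.
    if est_mu - margin <= MU <= est_mu + margin:
        covered += 1

coverage = covered / TRIALS
print(f"proportion of intervals containing mu = {coverage:.3f}")
```

The proportion comes out near .95: the 95% describes how often the procedure succeeds across samples, while any single interval either contains μ or does not.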

33

Other Confidence Intervals

Confidence intervals are available for other parameters as well, including the variance and the standard deviation.
