STA301_LEC33

Date post: 22-Feb-2016
Upload: amin-butt
Virtual University of Pakistan Lecture No. 33 of the course on Statistics and Probability by Miss Saleha Naghmi Habibullah
Transcript
Page 1: STA301_LEC33

Virtual University of Pakistan

Lecture No. 33 of the course on

Statistics and Probability

by

Miss Saleha Naghmi Habibullah

Page 2: STA301_LEC33

IN THE LAST LECTURE, YOU LEARNT

Sampling Distribution of $\hat{p}$

Sampling Distribution of $\bar{X}_1 - \bar{X}_2$

Page 3: STA301_LEC33

TOPICS FOR TODAY

•Sampling Distribution of $\bar{X}_1 - \bar{X}_2$ (continued)
•Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
•Point Estimation
•Desirable Qualities of a Good Point Estimator
  –Unbiasedness
  –Consistency

Page 4: STA301_LEC33

We illustrate the real-life application of the sampling distribution of $\bar{X}_1 - \bar{X}_2$ with the help of the following example:

Page 5: STA301_LEC33

EXAMPLE

Car batteries produced by company A have a mean life of 4.3 years with a standard deviation of 0.6 years.

A similar battery produced by company B has a mean life of 4.0 years and a standard deviation of 0.4 years.

Page 6: STA301_LEC33

What is the probability that a random sample of 49 batteries from company A will have a mean life of at least 0.5 years more than the mean life of a sample of 36 batteries from company B?

Page 7: STA301_LEC33

SOLUTION We are given the following data:

Population A:

$\mu_1$ = 4.3 years, $\sigma_1$ = 0.6 years, Sample size: $n_1$ = 49

Population B:

$\mu_2$ = 4.0 years, $\sigma_2$ = 0.4 years, Sample size: $n_2$ = 36

Page 8: STA301_LEC33

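Both sample sizes ($n_1$ = 49, $n_2$ = 36) are large enough to assume that the sampling distribution of the differences $\bar{X}_1 - \bar{X}_2$ is approximately normal, with:

Page 9: STA301_LEC33

Mean:

$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 = 4.3 - 4.0 = 0.3$ years

and standard deviation:

$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{0.36}{49} + \frac{0.16}{36}} = 0.1086$ years.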

Page 10: STA301_LEC33

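Thus the variable

$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(\bar{X}_1 - \bar{X}_2) - 0.3}{0.1086}$

is approximately N(0, 1).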

Page 11: STA301_LEC33

We are required to find the probability that the mean life of 49 batteries produced by company A will be at least 0.5 years longer than the mean life of 36 batteries produced by company B.

Page 12: STA301_LEC33

We are required to find:

$P(\bar{X}_1 - \bar{X}_2 \ge 0.5)$.

Transforming $\bar{X}_1 - \bar{X}_2 = 0.5$ to a z-value, we find that:

$z = \frac{0.5 - 0.3}{0.1086} = 1.84$

Page 13: STA301_LEC33

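[Figure: sampling distribution of $\bar{X}_1 - \bar{X}_2$, with the values 0.3 and 0.5 on the horizontal axis corresponding to Z = 0 and Z = 1.84.]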

Page 14: STA301_LEC33

Hence, using the table of areas under the normal curve, we find:

$P(\bar{X}_1 - \bar{X}_2 \ge 0.5) = P(Z \ge 1.84)$

$= 0.5 - P(0 \le Z \le 1.84)$

$= 0.5 - 0.4671 = 0.0329$

Page 15: STA301_LEC33

In other words, (given that the real difference between the mean lifetimes of batteries of company A and batteries of company B is 4.3 - 4.0 = 0.3 years), the probability that a sample of 49 batteries produced by company A will have a mean life of at least 0.5 years longer than the mean life of a sample of 36 batteries produced by company B, is only 3.3%.
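The calculation above can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later so that `statistics.NormalDist` is available; the variable names are illustrative and not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the battery example above
mu1, sigma1, n1 = 4.3, 0.6, 49   # company A
mu2, sigma2, n2 = 4.0, 0.4, 36   # company B

# Mean and standard error of the sampling distribution of X1bar - X2bar
mean_diff = mu1 - mu2
se_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Convert the required difference of 0.5 years to a z-value
z = (0.5 - mean_diff) / se_diff

# Upper-tail area under the standard normal curve
prob = 1 - NormalDist().cdf(z)

print(f"standard error = {se_diff:.4f}")
print(f"z = {z:.2f}")
print(f"P(X1bar - X2bar >= 0.5) = {prob:.4f}")
```

The software value can differ slightly in the fourth decimal place from the table value of 0.0329, because the table is entered with z rounded to two decimal places.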

Page 16: STA301_LEC33

Next, we consider the

Sampling Distribution of the Differences between Proportions.

Page 17: STA301_LEC33

Suppose there are two binomial populations with proportions of successes p1 and p2 respectively.

Page 18: STA301_LEC33

Let independent random samples of sizes $n_1$ and $n_2$ be drawn from the respective populations, and the differences $\hat{p}_1 - \hat{p}_2$ between the proportions of all possible pairs of samples be computed.

Page 19: STA301_LEC33

Then, a probability distribution of the differences $\hat{p}_1 - \hat{p}_2$ can be obtained. Such a probability distribution is called the sampling distribution of the differences between the proportions $\hat{p}_1 - \hat{p}_2$.

Page 20: STA301_LEC33

We illustrate the sampling distribution of $\hat{p}_1 - \hat{p}_2$ with the help of the following example:

Page 21: STA301_LEC33

EXAMPLE:

It is claimed that 30% of the households in Community A and 20% of the households in Community B have at least one teenager.

A simple random sample of 100 households from each community yields the following results:

$\hat{p}_A = 0.34$, $\hat{p}_B = 0.13$.

Page 22: STA301_LEC33

What is the probability of observing a difference this large or larger if the claims are true?

Page 23: STA301_LEC33

SOLUTION

We assume that if the claims are true, the sampling distribution of $\hat{p}_A - \hat{p}_B$ is approximately normally distributed (as, in this example, both the sample sizes are large enough for us to apply the normal approximation to the binomial distribution).

Since we are reasonably confident that our sampling distribution is approximately normally distributed, we will find any required probability by computing the relevant areas under our normal curve, and, in order to do so, we will first need to convert our variable $\hat{p}_A - \hat{p}_B$ to Z.

Page 24: STA301_LEC33

In order to convert $\hat{p}_A - \hat{p}_B$ to Z, we need the values of $\mu_{\hat{p}_A - \hat{p}_B}$ as well as $\sigma_{\hat{p}_A - \hat{p}_B}$.

It can be mathematically proved that:

Page 25: STA301_LEC33

PROPERTIES OF THE SAMPLING DISTRIBUTION OF $\hat{p}_1 - \hat{p}_2$:

Property No. 1: The mean of the sampling distribution of $\hat{p}_1 - \hat{p}_2$, denoted by $\mu_{\hat{p}_1 - \hat{p}_2}$, is equal to the difference between the population proportions, that is

$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$.

Page 26: STA301_LEC33

Property No. 2:

The standard deviation of the sampling distribution of $\hat{p}_1 - \hat{p}_2$ (i.e. the standard error of $\hat{p}_1 - \hat{p}_2$), denoted by $\sigma_{\hat{p}_1 - \hat{p}_2}$, is given by

$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$,

where q = 1 – p.

Page 27: STA301_LEC33

Hence, in this example, we have:

$\mu_{\hat{p}_A - \hat{p}_B} = 0.30 - 0.20 = 0.10$

and

$\sigma^2_{\hat{p}_A - \hat{p}_B} = \frac{(0.30)(0.70)}{100} + \frac{(0.20)(0.80)}{100} = 0.0037$

Page 28: STA301_LEC33

The observed difference in sample proportions is

$\hat{p}_A - \hat{p}_B = 0.34 - 0.13 = 0.21$

The probability that we wish to determine is represented by the area to the right of 0.21 in the sampling distribution of $\hat{p}_A - \hat{p}_B$. To find this area, we compute

$z = \frac{0.21 - 0.10}{\sqrt{0.0037}} = \frac{0.11}{0.06} = 1.83$

Page 29: STA301_LEC33

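[Figure: sampling distribution of $\hat{p}_A - \hat{p}_B$, with the values 0.10 and 0.21 on the horizontal axis corresponding to Z = 0 and Z = 1.83.]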

Page 30: STA301_LEC33

By consulting the Area Table of the standard normal distribution, we find that the area between z = 0 and z = 1.83 is 0.4664. Hence, the area to the right of z = 1.83 is 0.0336.

This probability is shown in the following figure:

Page 31: STA301_LEC33

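[Figure: sampling distribution of $\hat{p}_A - \hat{p}_B$, with the values 0.10 and 0.21 on the horizontal axis corresponding to Z = 0 and Z = 1.83; the area between Z = 0 and Z = 1.83 is 0.4664 and the area to the right of Z = 1.83 is 0.0336.]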

Page 32: STA301_LEC33

Thus, if the claims are true, the probability of observing a difference as large as or larger than the one actually observed is only 0.0336, i.e. 3.36%.
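The same result can be reproduced in a few lines. This is a minimal sketch, assuming Python 3.8 or later; the variable names are illustrative. It computes the tail area twice: once with the standard error rounded to 0.06, as in the worked solution, and once without rounding.

```python
from math import sqrt
from statistics import NormalDist

p_a, p_b = 0.30, 0.20          # claimed population proportions
n_a, n_b = 100, 100            # sample sizes
observed_diff = 0.34 - 0.13    # observed difference in sample proportions

# Mean, variance and standard error of the sampling distribution
mean_diff = p_a - p_b
variance_diff = p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
se_diff = sqrt(variance_diff)

# z-value without rounding, and with the standard error rounded to two decimals
z_exact = (observed_diff - mean_diff) / se_diff
z_rounded = (observed_diff - mean_diff) / round(se_diff, 2)

# Upper-tail areas under the standard normal curve
tail_exact = 1 - NormalDist().cdf(z_exact)
tail_rounded = 1 - NormalDist().cdf(z_rounded)

print(f"variance = {variance_diff:.4f}, standard error = {se_diff:.4f}")
print(f"z (rounded SE) = {z_rounded:.2f}, tail area = {tail_rounded:.4f}")
print(f"z (exact SE)   = {z_exact:.2f}, tail area = {tail_exact:.4f}")
```

Without rounding the standard error, z comes out closer to 1.81 and the tail area is slightly larger than the table value; the conclusion that such a difference is unlikely under the claims is unchanged.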

Page 33: STA301_LEC33

The students are encouraged to try to interpret this result with reference to the situation at hand, as, in attempting to solve a statistical problem, it is very important not just to apply various formulae and obtain numerical results, but to interpret the results with reference to the problem under consideration.

Page 34: STA301_LEC33

Does the result indicate that at least one of the two claims is untrue, or does it imply something else?

Page 35: STA301_LEC33

Before we close the basic discussion regarding sampling distributions, we would like to draw the students’ attention to the following two important points:

Page 36: STA301_LEC33

1) We have discussed various sampling distributions with reference to the simplest technique of random sampling, i.e. simple random sampling.

And, with reference to simple random sampling, it should be kept in mind that this technique of sampling is appropriate when the population is homogeneous.

Page 37: STA301_LEC33

2) Let us consider the reason why the standard deviation of the sampling distribution of any statistic is known as its standard error:

Page 38: STA301_LEC33

To answer this question, consider the fact that any statistic, considered as an estimate of the corresponding population parameter, should be as close in magnitude to the parameter as possible.

Page 39: STA301_LEC33

The difference between the value of the statistic and the value of the parameter can be regarded as an error --- and is called ‘sampling error’.

Page 40: STA301_LEC33

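Geometrically, each one of these errors can be represented by a horizontal line segment below the X-axis, as shown below:

Page 41: STA301_LEC33

[Figure: Sampling Distribution of $\bar{X}$, with the sample means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_6$ marked along the horizontal axis.]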

Page 42: STA301_LEC33

The above diagram clearly indicates that there are various magnitudes of this error, depending on how far or how close the values of our statistic are in different samples.

Page 43: STA301_LEC33

The standard deviation of $\bar{X}$ gives us a ‘standard’ value of this error, and hence the term ‘Standard Error’.

Page 44: STA301_LEC33

Having presented the basic ideas regarding sampling distributions, we now begin the discussion regarding POINT ESTIMATION:

Page 45: STA301_LEC33

POINT ESTIMATION

Point estimation of a population parameter provides as an estimate a single value calculated from the sample that is likely to be close in magnitude to the unknown parameter.

Page 46: STA301_LEC33

The difference between ‘Estimate’ and ‘Estimator’:

An estimate is a numerical value of the unknown parameter obtained by applying a rule or a formula, called an estimator, to a sample X1, X2, …, Xn of size n, taken from a population.

Page 47: STA301_LEC33

In other words, an estimator stands for the rule or method that is used to estimate a parameter

whereas an estimate stands for the numerical value obtained by substituting the sample observations in the rule or the formula.

Page 48: STA301_LEC33

For instance:

Page 49: STA301_LEC33

If $X_1, X_2, \ldots, X_n$ is a random sample of size n from a population with mean $\mu$, then

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

is an estimator of $\mu$, and $\bar{x}$, the numerical value of $\bar{X}$, is an estimate of $\mu$ (i.e. a point estimate of $\mu$).
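The distinction can be pictured in code: the estimator corresponds to a function (the rule), and an estimate is the number the function returns for one particular sample. This is a minimal sketch, assuming Python 3.8 or later; the two samples are made-up numbers used only for illustration.

```python
from statistics import fmean

def sample_mean(sample):
    """Estimator: the rule X-bar = (1/n) * sum of the observations."""
    return fmean(sample)

# Two hypothetical samples of size 5 (made-up values, not from the lecture)
sample_1 = [4.1, 3.9, 4.4, 4.0, 4.6]
sample_2 = [3.8, 4.2, 4.5, 4.3, 4.7]

# Estimates: the numerical values obtained by applying the rule to each sample
estimate_1 = sample_mean(sample_1)
estimate_2 = sample_mean(sample_2)

print(estimate_1, estimate_2)
```

The same estimator gives a different estimate for each sample, which is why an estimator is treated as a random variable.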

Page 50: STA301_LEC33

In general, $\theta$ (the Greek letter theta) is customarily used to denote an unknown parameter that could be a mean, median, proportion or standard deviation, while an estimator of $\theta$ is commonly denoted by $\hat{\theta}$, or sometimes by T.

Page 51: STA301_LEC33

It is important to note that an estimator is always a statistic which is a function of the sample observations and hence is a random variable as the sample observations are likely to vary from sample to sample.

In other words:

Page 52: STA301_LEC33

In repeated sampling, an estimator is a random variable, and has a probability distribution, which is known as its sampling distribution.

Page 53: STA301_LEC33

Having presented the basic definition of a point estimator, we now consider some desirable qualities of a good point estimator:

Page 54: STA301_LEC33

In this regard, the point to be understood is that a point estimator is considered a good estimator if it satisfies various criteria.

Three of these criteria are:

Page 55: STA301_LEC33

DESIRABLE QUALITIES OF A GOOD POINT ESTIMATOR

•unbiasedness
•consistency
•efficiency

Page 56: STA301_LEC33

The concept of unbiasedness is explained below:

Page 57: STA301_LEC33

UNBIASEDNESS

An estimator is defined to be unbiased if the statistic used as an estimator has its expected value equal to the true value of the population parameter being estimated. In other words, let $\hat{\theta}$ be an estimator of a parameter $\theta$. Then $\hat{\theta}$ will be called an unbiased estimator if $E(\hat{\theta}) = \theta$.

If $E(\hat{\theta}) \ne \theta$, the statistic is said to be a biased estimator.

Page 58: STA301_LEC33

EXAMPLE

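Let us consider the sample mean $\bar{X}$ as an estimator of the population mean $\mu$.

Then we have $\theta = \mu$

and $\hat{\theta} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$.

Now, we know that $E(\bar{X}) = \mu$,

i.e. $E(\hat{\theta}) = \theta$. Hence, $\bar{X}$ is an unbiased estimator of $\mu$.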

Page 59: STA301_LEC33

Let us illustrate the concept of unbiasedness by considering the example of the annual Ministry of Transport test that was presented in the last lecture:

Page 60: STA301_LEC33

EXAMPLE

Let us examine the case of an annual Ministry of Transport test to which all cars, irrespective of age, have to be submitted.

The test looks for faulty brakes, steering, lights and suspension, and it is discovered after the first year that approximately the same number of cars have 0, 1, 2, 3, or 4 faults.

Page 61: STA301_LEC33

The above situation is equivalent to the following:

Page 62: STA301_LEC33

If we let X denote the number of faults in a car, then

X can take the values 0, 1, 2, 3, and 4,

and the probability of each of these X values is 1/5.

Page 63: STA301_LEC33

Hence, we have the following probability distribution:

Page 64: STA301_LEC33

No. of Faulty Items (X)    Probability f(x)
0                          1/5
1                          1/5
2                          1/5
3                          1/5
4                          1/5
Total                      1

Page 65: STA301_LEC33

MEAN OF THE POPULATION DISTRIBUTION:

$\mu = E(X) = \sum x f(x) = 2$

Page 66: STA301_LEC33

We are interested in considering the results that would be obtained if a sample of only two cars is tested.

Page 67: STA301_LEC33

The students will recall that we obtained $5^2 = 25$ different possible samples, and, computing the mean of each possible sample, we obtained the following sampling distribution of $\bar{X}$:

Page 68: STA301_LEC33

Sample Mean $\bar{x}$    Probability $P(\bar{X} = \bar{x})$
0.0                      1/25
0.5                      2/25
1.0                      3/25
1.5                      4/25
2.0                      5/25
2.5                      4/25
3.0                      3/25
3.5                      2/25
4.0                      1/25
Total                    25/25 = 1

Page 69: STA301_LEC33

We computed the mean of this sampling distribution, and found that the mean of the sample means, i.e. $\mu_{\bar{x}}$, comes out to be equal to 2, exactly the same as the mean of the population!

Page 70: STA301_LEC33

We find that:

$\mu_{\bar{x}} = \sum \bar{x} f(\bar{x}) = \frac{50}{25} = 2$

i.e. the mean of the sampling distribution of $\bar{X}$ is equal to the population mean.
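This sampling distribution is small enough to enumerate directly. The following is a minimal sketch, assuming that the 25 samples are the ordered pairs obtained by sampling with replacement (which is what $5^2 = 25$ suggests); exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faults = [0, 1, 2, 3, 4]   # equally likely values of X
population_mean = Fraction(sum(faults), len(faults))

# All 5**2 = 25 ordered samples of size 2 (sampling with replacement)
samples = list(product(faults, repeat=2))

# Count how often each sample mean occurs
counts = Counter(Fraction(a + b, 2) for a, b in samples)

# Sampling distribution of the sample mean: value -> probability
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

# Mean of the sampling distribution
mean_of_sample_means = sum(m * p for m, p in sampling_dist.items())

for m, p in sampling_dist.items():
    print(float(m), p)
print("mean of sample means:", mean_of_sample_means)
print("population mean:", population_mean)
```

Note that `Fraction` prints probabilities in lowest terms, so 5/25 appears as 1/5.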

Page 71: STA301_LEC33

By virtue of this property, we say that the sample mean is an UNBIASED estimator of the population mean.

Page 72: STA301_LEC33

It should be noted that this property,

$\mu_{\bar{x}} = \mu$,

always holds, regardless of the sample size.

Page 73: STA301_LEC33

Unbiasedness is a property that requires that the probability distribution of $\hat{\theta}$ be necessarily centered at the parameter $\theta$, irrespective of the value of n.

Page 74: STA301_LEC33

Visual Representation of the Concept of Unbiasedness:

[Figure: sampling distribution of $\bar{X}$, centered at $\mu$.]

$E(\bar{X}) = \mu$ implies that the distribution of $\bar{X}$ is centered at $\mu$.

Page 75: STA301_LEC33

What this means is that, although many of the individual sample means are either under-estimates or over-estimates of the true population mean, in the long run, the over-estimates balance the under-estimates so that the mean value of the sample means comes out to be equal to the population mean.

Page 76: STA301_LEC33

Let us now consider some other estimators which possess the desirable property of being unbiased:

Page 77: STA301_LEC33

The sample median is also an unbiased estimator of $\mu$ when the population is normally distributed

(i.e. if X is normally distributed, then $E(\tilde{X}) = \mu$).

Page 78: STA301_LEC33

Also, as far as $\hat{p}$, the proportion of successes in the sample, is concerned, we have:

Page 79: STA301_LEC33

Considering the binomial random variable X (which denotes the number of successes in n trials), we have:

$E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{1}{n} E(X) = \frac{np}{n} = p$

Hence, the sample proportion is an unbiased estimator of the population parameter p.

Page 80: STA301_LEC33

But

Page 81: STA301_LEC33

As far as the sample variance $S^2$ is concerned, it can be mathematically proved that

$E(S^2) \ne \sigma^2$.

Hence, the sample variance $S^2$ is a biased estimator of $\sigma^2$.

Page 82: STA301_LEC33

For any population parameter $\theta$ and its estimator $\hat{\theta}$, the quantity $E(\hat{\theta}) - \theta$ is known as the amount of bias.

This quantity is positive if $E(\hat{\theta}) > \theta$,

and is negative if $E(\hat{\theta}) < \theta$,

and, hence, the estimator is said to be positively biased when $E(\hat{\theta}) > \theta$ and negatively biased when $E(\hat{\theta}) < \theta$.

Page 83: STA301_LEC33

Since unbiasedness is a desirable quality, we would like the sample variance to be an unbiased estimator of $\sigma^2$.

In order to achieve this end, the formula of the sample variance is modified as follows:

Page 84: STA301_LEC33

Modified formula for the sample variance:

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Since $E(s^2) = \sigma^2$, $s^2$ is an unbiased estimator of $\sigma^2$.
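The effect of the divisor can be seen exactly on a small population. The following sketch is an illustration added here, not a computation from the lecture: it reuses the car-faults population (X = 0, 1, 2, 3, 4, each with probability 1/5) and assumes samples of size n = 2 drawn with replacement, then averages both versions of the sample variance over all 25 possible samples.

```python
from fractions import Fraction
from itertools import product

faults = [0, 1, 2, 3, 4]   # equally likely values of X
n = 2                      # assumed sample size for this illustration

# Population mean and population variance
mu = Fraction(sum(faults), len(faults))
sigma_sq = sum((x - mu) ** 2 for x in faults) / len(faults)

def sum_sq_dev(sample):
    """Sum of squared deviations of the observations from their sample mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

# All equally likely ordered samples of size n (sampling with replacement)
samples = list(product(faults, repeat=n))

# Expected value of each version of the sample variance
expected_S2 = sum(sum_sq_dev(s) / n for s in samples) / len(samples)        # divisor n
expected_s2 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)  # divisor n - 1

print("population variance:", sigma_sq)
print("E(S^2), divisor n:", expected_S2)
print("E(s^2), divisor n - 1:", expected_s2)
```

With the divisor n the average falls short of the population variance (a negative bias), while with the divisor n - 1 the average equals it.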

Page 85: STA301_LEC33

Why is unbiasedness considered a desirable property of an estimator?

In order to obtain an answer to this question, consider the following:

Page 86: STA301_LEC33

With reference to the estimation of the population mean $\mu$, we note that, in an actual study, the probability is very high that the mean of our sample, i.e. $\bar{X}$, will either be less than $\mu$ or more than $\mu$.

Hence, in an actual study, we can never guarantee that our $\bar{X}$ will coincide with $\mu$.

Page 87: STA301_LEC33

Unbiasedness implies that, although in an actual study, we cannot guarantee that our sample mean will coincide with $\mu$, our estimation procedure (i.e. formula) is such that, in repeated sampling, the average value of our statistic will be equal to $\mu$.

Page 88: STA301_LEC33

The next desirable quality of a good point estimator is consistency:

Page 89: STA301_LEC33

CONSISTENCY

An estimator $\hat{\theta}$ is said to be a consistent estimator of the parameter $\theta$ if, for any arbitrarily small positive quantity e,

$\lim_{n \to \infty} P\left(|\hat{\theta} - \theta| < e\right) = 1$.

Page 90: STA301_LEC33

In other words, an estimator $\hat{\theta}$ is called a consistent estimator of $\theta$ if the probability that $\hat{\theta}$ is very close to $\theta$ approaches unity with an increase in the sample size.

Page 91: STA301_LEC33

It should be noted that consistency is a large sample property.

Page 92: STA301_LEC33

Another point to be noted is that a consistent estimator may or may not be unbiased.

Page 93: STA301_LEC33

The sample mean

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$,

which is an unbiased estimator of $\mu$, is a consistent estimator of the mean $\mu$.

Page 94: STA301_LEC33

The sample proportion $\hat{p} = X/n$ is also a consistent estimator of the parameter p of a population that has a binomial distribution.
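The definition of consistency can be illustrated for the sample proportion by computing the probability of being close to p exactly from the binomial distribution. This is a minimal sketch, assuming Python 3.8 or later; the values p = 0.3 and e = 0.05 and the sample sizes are arbitrary choices for illustration, not values from the lecture.

```python
from fractions import Fraction
from math import comb

p = Fraction(3, 10)   # assumed population proportion (illustrative)
e = Fraction(1, 20)   # assumed small positive quantity e = 0.05 (illustrative)

def prob_within(n):
    """Exact P(|p_hat - p| < e), where p_hat = X/n and X is Binomial(n, p)."""
    total = 0.0
    for k in range(n + 1):
        # Exact fractions keep the closeness check free of rounding error
        if abs(Fraction(k, n) - p) < e:
            total += comb(n, k) * float(p) ** k * float(1 - p) ** (n - k)
    return total

probs = {n: prob_within(n) for n in (10, 100, 1000)}

for n, prob in probs.items():
    print(f"n = {n:4d}: P(|p_hat - p| < e) = {prob:.4f}")
```

The printed probabilities increase toward 1 as n grows, which is the behaviour the definition requires.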

Page 95: STA301_LEC33

The median is not a consistent estimator of $\mu$ when the population has a skewed distribution.

Page 96: STA301_LEC33

The sample variance

$S^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$,

though a biased estimator, is a consistent estimator of the population variance $\sigma^2$.

Page 97: STA301_LEC33

Generally speaking, it can be proved that a statistic whose STANDARD ERROR decreases with an increase in the sample size, will be consistent.

Page 98: STA301_LEC33

IN TODAY’S LECTURE, YOU LEARNT

•Sampling Distribution of $\bar{X}_1 - \bar{X}_2$ (continued)
•Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
•Point Estimation
•Desirable Qualities of a Good Point Estimator
  –Unbiasedness
  –Consistency

Page 99: STA301_LEC33

IN THE NEXT LECTURE, YOU WILL LEARN

•Desirable Qualities of a Good Point Estimator: Efficiency

•Methods of Point Estimation:
  –The Method of Moments
  –The Method of Least Squares
  –The Method of Maximum Likelihood

•Interval Estimation:
  –Confidence Interval for