Econometrics MLE Probit Logit

8/12/2019 Econometrics MLE Probit Logit

1/64

Applied Econometrics

Master of Applied Economics Program

Universitas Padjadjaran


2/64

Today

Introduction to Maximum Likelihood Estimation

Application of Maximum Likelihood Estimation

Limited Dependent Variable Models

Probit

Logit


3/64

Additional References

Dougherty, Introduction to Econometrics, 4th Ed., 2011 *best for basics*

Freund, J., Mathematical Statistics, 1992

Myung, I.J., Tutorial on maximum likelihood estimation, Journal of Mathematical Psychology 47, 2003

Ramachandran & Tsokos, Mathematical Statistics with Applications, 2009


4/64

Method of ML

The method of maximum likelihood is intuitively appealing, because we attempt to find the values of the true parameters that would have most likely produced the data that we in fact observed.

For most cases of practical interest, the performance of maximum likelihood estimators is optimal for large enough data.


5/64

Method of ML

To compute the likelihood we need to have a

good understanding of probability distribution

(density function)


6/64

Probabilities: Discrete Data

If our data are realizations of a discrete random variable, we have the (discrete) probability distribution of the data:

A table, formula or graph that lists all possible values a discrete random variable can assume, together with the associated probabilities.

Important examples: Binomial, Poisson


7/64

Copyright Christopher Dougherty 2012.

These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section R.2 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press.

Additional (free) resources for both students and instructors may be

downloaded from the OUP Online Resource Centre

http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit

from participation in a formal course should consider the London School of

Economics summer school course

EC212 Introduction to Econometrics

http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx

or the University of London International Programmes distance learning course

EC2020 Elements of Econometrics

www.londoninternational.ac.uk/lse.

2012.09.01


8/64

PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

Value of X for each combination of the red die (columns) and the green die (rows):

green \ red    1    2    3    4    5    6
     1         2    3    4    5    6    7
     2         3    4    5    6    7    8
     3         4    5    6    7    8    9
     4         5    6    7    8    9   10
     5         6    7    8    9   10   11
     6         7    8    9   10   11   12

Dougherty 2012


14/64

PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

 X    f (frequency)    p (probability)
 2    1                1/36
 3    2                2/36
 4    3                3/36
 5    4                4/36
 6    5                5/36
 7    6                6/36
 8    5                5/36
 9    4                4/36
10    3                3/36
11    2                2/36
12    1                1/36

Dougherty 2012
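The frequency table for the sum of two dice can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original slides; the variable names `counts` and `dist` are illustrative choices.

```python
from collections import Counter
from fractions import Fraction

# Enumerate all 36 equally likely (red, green) outcomes and count each sum X.
counts = Counter(red + green for red in range(1, 7) for green in range(1, 7))

# Probability of each value of X is its frequency divided by 36.
dist = {x: Fraction(f, 36) for x, f in sorted(counts.items())}

# Print X, f and p in the same layout as the table (p kept unreduced as f/36).
for x in dist:
    print(x, counts[x], f"{counts[x]}/36")
```

Exact fractions are used instead of floats so that the probabilities sum to exactly 1.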




21/64

[Figure: bar chart of the probability distribution of X. Horizontal axis: X = 2, 3, ..., 12. Vertical axis: probability, rising from 1/36 at X = 2 to 6/36 at X = 7 and falling back to 1/36 at X = 12.]

Dougherty 2012


22/64

Discrete Probability Distribution when we have more than 1 RV

The distribution of a single random variable is known as a univariate distribution.

But we might be interested in the intersection of two events, in which case we need to look at joint distributions.

The joint (probability) distributions of two or more random variables are termed bivariate or multivariate distributions.


23/64


Discrete Probability Distribution when we have more than 1 RV

If the individual observations ($y_i$) are statistically independent of one another, then according to the theory of probability, the PDF for the data $y = (y_1, y_2, \ldots, y_n)$ given the parameter vector $w$ can be expressed as a product of the PDFs for the individual observations.
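Written out, the factorization described above would read roughly as follows. This is a reconstruction under the stated independence assumption; the symbol $f_i$ for the PDF of the $i$-th observation is introduced here for illustration and does not appear in the transcript.

```latex
f(y \mid w) = f_1(y_1 \mid w)\, f_2(y_2 \mid w) \cdots f_n(y_n \mid w)
            = \prod_{i=1}^{n} f_i(y_i \mid w)
```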


24/64


Discrete Probability Distribution when we have more than 1 RV


25/64

Normal Distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

Note constants:

$\pi$ = 3.14159

e = 2.71828

This is a bell-shaped curve with different centers and spreads depending on $\mu$ and $\sigma$.


26/64

Method of ML

The method of maximum likelihood is intuitively appealing, because we attempt to find the values of the true parameters that would have most likely produced the data that we in fact observed.

For most cases of practical interest, the performance of maximum likelihood estimators is optimal for large enough data.


27/64

[Figure: two panels with horizontal axes from 0 to 8. Upper panel: probability density p (vertical axis 0.0 to 0.4). Lower panel: joint density L (vertical axis 0.00 to 0.06).]

This sequence introduces the principle of maximum likelihood estimation and illustrates it with some simple examples.

Suppose that you have a normally distributed random variable X with unknown population mean $\mu$ and standard deviation $\sigma$, and that you have a sample of two observations, 4 and 6. For the time being, we will assume that $\sigma$ is equal to 1.

Suppose initially you consider the hypothesis $\mu$ = 3.5. Under this hypothesis the probability density at 4 would be 0.3521 and that at 6 would be 0.0175.


28/64

INTRODUCTION TO MAXIMUM LIKELIHOOD ESTIMATION

[Figure: upper panel p (0.0 to 0.4), lower panel L (0.00 to 0.06), horizontal axes 0 to 8. The densities 0.3521 and 0.0175 are marked in the upper panel.]

$\mu$     p(4)      p(6)
3.5   0.3521    0.0175

Suppose initially you consider the hypothesis $\mu$ = 3.5. Under this hypothesis the probability density at 4 would be 0.3521 and that at 6 would be 0.0175.


29/64

The joint probability density, shown in the bottom chart, is the product of these, 0.0062.

$\mu$     p(4)      p(6)      L
3.5   0.3521    0.0175    0.0062

[Figure: upper panel p (0.0 to 0.4), lower panel L (0.00 to 0.06), horizontal axes 0 to 8. The densities 0.3521 and 0.0175 are marked in the upper panel.]



30/64

Next consider the hypothesis $\mu$ = 4.0. Under this hypothesis the probability densities associated with the two observations are 0.3989 and 0.0540, and the joint probability density is 0.0215.

$\mu$     p(4)      p(6)      L
3.5   0.3521    0.0175    0.0062
4.0   0.3989    0.0540    0.0215

[Figure: upper panel p (0.0 to 0.4), lower panel L (0.00 to 0.06), horizontal axes 0 to 8. The densities 0.3989 and 0.0540 are marked in the upper panel.]



31/64

Under the hypothesis $\mu$ = 4.5, the probability densities are 0.3521 and 0.1295, and the joint probability density is 0.0456.

$\mu$     p(4)      p(6)      L
3.5   0.3521    0.0175    0.0062
4.0   0.3989    0.0540    0.0215
4.5   0.3521    0.1295    0.0456

[Figure: upper panel p (0.0 to 0.4), lower panel L (0.00 to 0.06), horizontal axes 0 to 8. The densities 0.3521 and 0.1295 are marked in the upper panel.]



32/64

Under the hypothesis $\mu$ = 5.0, the probability densities are both 0.2420 and the joint probability density is 0.0585.

$\mu$     p(4)      p(6)      L
3.5   0.3521    0.0175    0.0062
4.0   0.3989    0.0540    0.0215
4.5   0.3521    0.1295    0.0456
5.0   0.2420    0.2420    0.0585

[Figure: upper panel p (0.0 to 0.4), lower panel L (0.00 to 0.06), horizontal axes 0 to 8. The density 0.2420 is marked twice in the upper panel.]



33/64

Under the hypothesis $\mu$ = 5.5, the probability densities are 0.1295 and 0.3521 and the joint probability density is 0.0456.

$\mu$     p(4)      p(6)      L
3.5   0.3521    0.0175    0.0062
4.0   0.3989    0.0540    0.0215
4.5   0.3521    0.1295    0.0456
5.0   0.2420    0.2420    0.0585
5.5   0.1295    0.3521    0.0456

[Figure: upper panel p (0.0 to 0.4), lower panel L (0.00 to 0.06), horizontal axes 0 to 8. The densities 0.1295 and 0.3521 are marked in the upper panel.]



34/64

The complete joint density function for all values of $\mu$ has now been plotted in the lower diagram. We see that it peaks at $\mu$ = 5.

$\mu$     p(4)      p(6)      L
3.5   0.3521    0.0175    0.0062
4.0   0.3989    0.0540    0.0215
4.5   0.3521    0.1295    0.0456
5.0   0.2420    0.2420    0.0585
5.5   0.1295    0.3521    0.0456

[Figure: upper panel p (0.0 to 0.4), lower panel L (0.00 to 0.06), horizontal axes 0 to 8. The densities 0.1295 and 0.3521 are marked in the upper panel; the lower panel shows the complete curve of L.]
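The table of densities can be reproduced numerically. The sketch below is an illustration rather than part of the original slides: it assumes $\sigma$ = 1 as in the example, and the helper names `normal_pdf` and `likelihood`, as well as the grid search over 0 to 8 in steps of 0.01, are choices made here.

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Density of a normal distribution with mean mu and std. dev. sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

sample = [4.0, 6.0]  # the two observations used in the slides

def likelihood(mu, data=sample):
    """Joint density of independent observations: product of individual densities."""
    return math.prod(normal_pdf(x, mu) for x in data)

# Reproduce the rows of the table: mu, p(4), p(6), L.
for mu in (3.5, 4.0, 4.5, 5.0, 5.5):
    print(f"{mu:.1f}  {normal_pdf(4, mu):.4f}  {normal_pdf(6, mu):.4f}  {likelihood(mu):.4f}")

# Crude grid search for the maximizing mu (illustrative, not an efficient optimizer).
grid = [i / 100 for i in range(0, 801)]
mu_hat = max(grid, key=likelihood)
print("grid maximizer:", mu_hat)
```

A grid search is used only because it mirrors the slide-by-slide trial of hypotheses; in practice a numerical optimizer or the analytical solution would be preferred.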



35/64

Now we will look at the mathematics of the example. If X is normally distributed with mean $\mu$ and standard deviation $\sigma$, its density function is as shown.

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2}$$



36/64

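For the time being, we are assuming $\sigma$ is equal to 1, so the density function simplifies to the second expression.

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2}$$

$$f(X) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X-\mu)^2}$$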



37/64

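Hence we obtain the probability densities for the observations where X = 4 and X = 6.

$$f(X) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X-\mu)^2}$$

$$f(4) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-\mu)^2}$$

$$f(6) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-\mu)^2}$$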



38/64

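The joint probability density for the two observations in the sample is just the product of their individual densities.

$$f(4) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-\mu)^2}, \qquad f(6) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-\mu)^2}$$

$$\text{joint density} = \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-\mu)^2}\right) \times \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-\mu)^2}\right)$$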



39/64

In maximum likelihood estimation we choose as our estimate of $\mu$ the value that gives us the greatest joint density for the observations in our sample. This value is associated with the greatest probability, or maximum likelihood, of obtaining the observations in the sample.

$$\text{joint density} = \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-\mu)^2}\right) \times \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-\mu)^2}\right)$$
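The maximizing value can also be found analytically. The short derivation below is a sketch added for illustration and is not taken from the transcript; it works with the logarithm of the joint density, which has the same maximizer, and keeps the assumption $\sigma$ = 1.

```latex
\log L(\mu) = -\log(2\pi) - \tfrac{1}{2}(4-\mu)^2 - \tfrac{1}{2}(6-\mu)^2

\frac{d \log L}{d\mu} = (4-\mu) + (6-\mu) = 10 - 2\mu = 0
\quad\Longrightarrow\quad \hat{\mu} = 5

\frac{d^2 \log L}{d\mu^2} = -2 < 0 \quad \text{(so the stationary point is a maximum)}
```

The result agrees with the peak at $\mu$ = 5 seen in the plotted joint density, and it equals the sample mean of the two observations.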


40/64

MLE AND REGRESSION ANALYSIS

MAXIMUM LIKELIHOOD ESTIMATION OF REGRESSION COEFFICIENTS


41/64

[Figure, repeated on slides 41 to 48: axes X and Y with the line $Y = \beta_1 + \beta_2 X$; intercept $\beta_1$ on the Y axis; $X_i$ marked on the X axis and $\beta_1 + \beta_2 X_i$ on the Y axis.]

We will now apply the maximum likelihood principle to regression analysis, using the simple linear model $Y = \beta_1 + \beta_2 X + u$.


42/64

The black marker shows the value that Y would have if X were equal to $X_i$ and if there were no disturbance term.


43/64

However we will assume that there is a disturbance term in the model and that it has a normal distribution as shown.


44/64

Relative to the black marker, the curve represents the ex ante distribution for u, that is, its potential distribution before the observation is generated. Ex post, of course, it is fixed at some specific value.


45/64

Relative to the horizontal axis, the curve also represents the ex ante distribution for Y for that observation, that is, conditional on $X = X_i$.


46/64

Potential values of Y close to $\beta_1 + \beta_2 X_i$ will have relatively large densities ...


47/64

... while potential values of Y relatively far from $\beta_1 + \beta_2 X_i$ will have small ones.


48/64

The mean value of the distribution of $Y_i$ is $\beta_1 + \beta_2 X_i$. Its standard deviation is $\sigma$, the standard deviation of the disturbance term.



49/64

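Hence the density function for the ex ante distribution of $Y_i$ is as shown.

[Figure: regression line $Y = \beta_1 + \beta_2 X$ with the normal density of $Y_i$ drawn at $X = X_i$.]

$$f(Y_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_i - \beta_1 - \beta_2 X_i}{\sigma}\right)^2}$$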



50/64

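The joint density function for the observations on Y is the product of their individual densities.

$$f(Y_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_i - \beta_1 - \beta_2 X_i}{\sigma}\right)^2}$$

$$f(Y_1) \times \cdots \times f(Y_n) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2} \times \cdots \times \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2}$$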



51/64

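Now, taking $\beta_1$, $\beta_2$ and $\sigma$ as our choice variables, and taking the data on Y and X as given, we can re-interpret this function as the likelihood function for $\beta_1$, $\beta_2$, and $\sigma$.

$$L(\beta_1, \beta_2, \sigma \mid Y_1, \ldots, Y_n) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2} \times \cdots \times \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2}$$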



52/64

We will choose $\beta_1$, $\beta_2$, and $\sigma$ so as to maximize the likelihood, given the data on Y and X. As usual, it is easier to do this indirectly, maximizing the log-likelihood instead.

$$\log L = \log\left[\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2} \times \cdots \times \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2}\right]$$



53/64

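As usual, the first step is to decompose the expression as the sum of the logarithms of the factors.

$$\log L = \log\left[\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2} \times \cdots \times \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2}\right]$$

$$= \log\left[\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2}\right] + \cdots + \log\left[\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2}\right]$$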



54/64

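Then we split the logarithm of each factor into two components. The first component is the same in each case.

$$\log L = \log\left[\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2}\right] + \cdots + \log\left[\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2}\right]$$

$$= n \log\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2 - \cdots - \frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2$$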



55/64

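Hence the log-likelihood simplifies as shown.

$$\log L = n \log\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2 - \cdots - \frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2$$

$$= n \log\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2\sigma^2}\, Z$$

$$\text{where } Z = (Y_1 - \beta_1 - \beta_2 X_1)^2 + \cdots + (Y_n - \beta_1 - \beta_2 X_n)^2$$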



56/64

To maximize the log-likelihood, we need to minimize Z. But choosing estimators of $\beta_1$ and $\beta_2$ to minimize Z is exactly what we did when we derived the least squares regression coefficients.

$$\log L = n \log\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2\sigma^2}\, Z, \qquad Z = (Y_1 - \beta_1 - \beta_2 X_1)^2 + \cdots + (Y_n - \beta_1 - \beta_2 X_n)^2$$




57/64

Thus, for this regression model, the maximum likelihood estimators of $\beta_1$ and $\beta_2$ are identical to the least squares estimators.

$$\log L = n \log\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2\sigma^2}\, Z, \qquad Z = (Y_1 - \beta_1 - \beta_2 X_1)^2 + \cdots + (Y_n - \beta_1 - \beta_2 X_n)^2$$
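The equivalence of the maximum likelihood and least squares estimators can be checked numerically. The following is a minimal sketch under stated assumptions: the data in `X` and `Y` are made up purely for illustration, $\sigma$ is held fixed at 1, and the function name `log_likelihood` is a choice made here, not something from the slides.

```python
import math

# Hypothetical data, invented for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)

def log_likelihood(b1, b2, sigma):
    """log L = n*log(1/(sigma*sqrt(2*pi))) - Z/(2*sigma^2)."""
    Z = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))
    return n * math.log(1.0 / (sigma * math.sqrt(2.0 * math.pi))) - Z / (2.0 * sigma ** 2)

# Least squares estimates from the usual closed-form expressions.
x_bar, y_bar = sum(X) / n, sum(Y) / n
b2_ols = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
          / sum((x - x_bar) ** 2 for x in X))
b1_ols = y_bar - b2_ols * x_bar

# For a fixed sigma, moving the coefficients away from the least squares
# values increases Z and therefore lowers the log-likelihood.
sigma = 1.0
best = log_likelihood(b1_ols, b2_ols, sigma)
for d1, d2 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.1, -0.05)]:
    assert log_likelihood(b1_ols + d1, b2_ols + d2, sigma) < best

print(b1_ols, b2_ols)
```

Checking a handful of perturbations is only a spot check; the general result follows because Z is the only term of the log-likelihood that depends on the coefficients.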




58/64

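As a consequence, Z will be the sum of the squares of the least squares residuals.

$$\log L = n \log\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2\sigma^2}\, Z$$

$$\text{where } Z = (Y_1 - \beta_1 - \beta_2 X_1)^2 + \cdots + (Y_n - \beta_1 - \beta_2 X_n)^2$$

Evaluated at the estimates $b_1$ and $b_2$, $Z = \sum e_i^2$, where $e_i = Y_i - b_1 - b_2 X_i$.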



59/64

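To obtain the maximum likelihood estimator of $\sigma$, it is convenient to rearrange the log-likelihood function as shown.

$$\log L = n \log\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2\sigma^2}\, Z$$

$$= n \log\frac{1}{\sigma} + n \log\frac{1}{\sqrt{2\pi}} - \frac{1}{2\sigma^2}\, Z$$

$$= -n \log\sigma + n \log\frac{1}{\sqrt{2\pi}} - \frac{\sigma^{-2}}{2}\, Z$$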



60/64

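Differentiating it with respect to $\sigma$, we obtain the expression shown.

$$\log L = -n \log\sigma + n \log\frac{1}{\sqrt{2\pi}} - \frac{\sigma^{-2}}{2}\, Z$$

$$\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \sigma^{-3} Z = \sigma^{-3}\left(Z - n\sigma^2\right)$$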



61/64

The first order condition for a maximum requires this to be equal to zero. Hence the maximum likelihood estimator of the variance is the sum of the squares of the residuals divided by n.

$$\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \sigma^{-3} Z = \sigma^{-3}\left(Z - n\sigma^2\right)$$

$$\hat{\sigma}^2 = \frac{Z}{n} = \frac{\sum e_i^2}{n}$$



62/64

Note that this is biased for finite samples. To obtain an unbiased estimator, we should divide by n − k, where k is the number of parameters, in this case 2. However, the bias disappears as the sample size becomes large.

$$\hat{\sigma}^2 = \frac{Z}{n} = \frac{\sum e_i^2}{n}$$
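The difference between the two divisors is easy to see on a small example. The sketch below is illustrative only: the residuals are hypothetical numbers (assumed to come from a fitted two-parameter regression), and the function name `score` is a choice made here for the derivative of the log-likelihood with respect to $\sigma$.

```python
import math

# Hypothetical least squares residuals e_i, invented for illustration.
residuals = [0.06, -0.13, 0.18, -0.21, 0.10]
n, k = len(residuals), 2  # k = number of regression parameters

Z = sum(e * e for e in residuals)   # sum of squared residuals
sigma2_ml = Z / n                   # maximum likelihood estimator (divides by n)
sigma2_unbiased = Z / (n - k)       # degrees-of-freedom corrected estimator

def score(sigma):
    """Derivative of log L with respect to sigma: -n/sigma + Z/sigma^3."""
    return -n / sigma + Z / sigma ** 3

# The first order condition holds (up to rounding) at the ML estimate.
print(sigma2_ml, sigma2_unbiased, score(math.sqrt(sigma2_ml)))
```

With only five observations the two estimates differ noticeably; the ratio between them is (n − k)/n, which approaches 1 as n grows.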


63/64

APPLICATIONS OF MLE

Probit and Logit Models


64/64

(Additional) References

Cramer, J.S., An Introduction to the Logit Model for Economists, 2nd Ed., 2000, Timberlake Consultants Ltd (Chapter 2)

Hill, Griffiths, Judge, Undergraduate Econometrics, 2nd Ed., 2001 (Chapter 12)

Johnston, J., and DiNardo, J., Econometric Methods, 4th Ed., 1997, McGraw-Hill (Chapter 13)

Lye, Jenny, Limited Dependent Variables, Handout, Melbourne University, 2006

Vahid, Farshid, 2002, Applied Econometrics: Section A: Introduction to Microeconometrics, Handout, Monash University, Australia

Winkelmann & Boes, Analysis of Microdata, 2006 (Chapters 1-4)