Page 1: Etf3600 Lecture3 Mle Lpm 2013

ETF3600/5600 Quantitative Models for Business Research

Lecture 3: Maximum Likelihood Estimation & Linear Probability Model

References: Long, Chapter 2 (Section 2.6); Powers, Section 2.2.2

18 March, 2013

Page 2: Etf3600 Lecture3 Mle Lpm 2013

Outline

Review of Lecture 2

Maximum Likelihood Estimation
- Intuition & Examples
- MLE of the Normal Linear Model
- MLE: General Case

Linear Probability Model

Wrap-up

Page 3: Etf3600 Lecture3 Mle Lpm 2013

Estimation Methods: How to Estimate the Parameters β0 and β1

An estimate of an unknown parameter in a model is a "guess" based on the observed data.

Let (xi, yi) be a random sample of size n from a population:

yi = β0 + β1 xi + εi (1)

To estimate the relationship between y and the x's, we need to estimate the β's. Various frameworks are used to estimate the β's:
- Least Squares Estimation
- Maximum Likelihood Estimation
- Method of Moments
- Bayesian Methods

Page 4: Etf3600 Lecture3 Mle Lpm 2013

Method of Least Squares: Summary

Different samples of size T will produce different values of β̂. The sampling properties of β̂ are:

1. E(β̂) = β
2. Var(β̂) = σ²/T
3. β̂ ∼ N(β, σ²/T)
4. BLUE: if β* is any other unbiased linear estimator of β, then var(β*) ≥ var(β̂).

Page 5: Etf3600 Lecture3 Mle Lpm 2013

Problems with the LS Method

We have to make very specific assumptions about εi in order to get OLS estimators with desirable properties. We want the errors to be random, so we assume:
- E(εi) = 0
- E(εi²) = σ² = var(εi)
- E(εi εj) = 0 = cov(εi, εj) for i ≠ j

If these assumptions are violated, then LS estimators are not necessarily BLUE.

Goodness of fit: R² (= ESS/TSS) measures how well the estimated model fits the underlying data.

Page 6: Etf3600 Lecture3 Mle Lpm 2013

Problems with the LS Method

Serially dependent errors, i.e., E(εi εj) ≠ 0 ⇒ Autocorrelation. Error terms from different (usually adjacent) time periods or cross-section observations are correlated. Example: rain forecasts. The probability of tomorrow being rainy is greater if today is rainy than if today is dry.

Error variances are not constant, i.e., E(εi²) ≠ σ² = var(εi) ⇒ Heteroskedasticity. In Greek, hetero means 'differing' and skedasis means 'dispersion'. Example: income and food expenditure. As income increases, the variability of food expenditure will increase.

The functional form is non-linear.

Problems with outliers and extreme values.

Omitted variable problems.

Page 7: Etf3600 Lecture3 Mle Lpm 2013

Relation to LS Estimation

MLE is another estimation procedure.

LSE seeks the parameter values that provide the most accurate description of the data, measured by how closely the model fits the data under the squared loss function, i.e., by minimising the sum of squared errors between the observations and the predictions.

MLE seeks the parameter values that are most likely to have produced the data.

LSE estimates differ from MLE estimates when the data are not normally distributed.

Page 8: Etf3600 Lecture3 Mle Lpm 2013

Maximum Likelihood Estimation: Intuition

Consider two possible outcomes, 1 and 0, where the probability of obtaining 1 is π and the probability of 0 is (1 − π).

Ex. 1 Take a random sample of size n. Suppose n = 5 and that the sample is (y1 = 1, y2 = 1, y3 = 1, y4 = 1, y5 = 1). What is the most likely value of π to have generated this sample?

Ex. 2 Suppose the probability of getting a head when flipping a particular coin is π. We flip the coin independently 10 times and obtain the sample HHTHHHTTHH.

The probability of obtaining this sequence is a function of π. What is the most likely value of π to have generated this sample?

The intuition behind these questions is the intuition behind MLE: what is the most likely value of the parameter to have generated the observed sample?

Page 9: Etf3600 Lecture3 Mle Lpm 2013

MLE Example

As we have already collected the data, the sample is fixed. The parameter π also has a fixed value, but that value is currently unknown. We need to work out the value of π most likely to have generated this sample, i.e., the probability of the observed data is a function of π.

The probability of this sample being generated is:

Pr(data | parameter) = Pr(HHTHHHTTHH | π) = π·π·(1 − π)·π·π·π·(1 − π)·(1 − π)·π·π

Suppose π = 0.1; then the probability of obtaining our sample in a random experiment would be:

0.1 · 0.1 · (1 − 0.1) · 0.1 · 0.1 · 0.1 · (1 − 0.1) · (1 − 0.1) · 0.1 · 0.1 = 0.0000000729

Suppose π = 0.2; then the probability of obtaining our sample would be:

0.2 · 0.2 · (1 − 0.2) · 0.2 · 0.2 · 0.2 · (1 − 0.2) · (1 − 0.2) · 0.2 · 0.2 = 0.00000655

Thus it is more likely, given this sample, that π is 0.2 than 0.1.

Page 10: Etf3600 Lecture3 Mle Lpm 2013

MLE Example

For different values of π:

Value of π   Prob of sample
0.0          0
0.1          0.0000000729
0.2          0.00000655
0.3          0.0000750
0.4          0.000354
0.5          0.000977
0.6          0.00179
0.7          0.00222
0.8          0.00168
0.9          0.000478
1.0          0.0

Plot these on a graph. The maximum likelihood estimate of π is the value of π with the highest probability.
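A short numerical sketch (Python with NumPy; the code is illustrative and not part of the original lecture) that reproduces the table above and locates the maximum on a grid:

```python
import numpy as np

# Likelihood of the sequence HHTHHHTTHH (7 heads, 3 tails)
# as a function of the head probability pi.
def likelihood(pi):
    return pi**7 * (1 - pi)**3

grid = np.linspace(0, 1, 101)               # candidate values of pi
L = likelihood(grid)

best = grid[np.argmax(L)]
print(f"MLE of pi on the grid: {best:.2f}")  # 0.70
print(f"L(0.1) = {likelihood(0.1):.10f}")    # 0.0000000729
print(f"L(0.2) = {likelihood(0.2):.8f}")     # 0.00000655
```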

Page 11: Etf3600 Lecture3 Mle Lpm 2013

Likelihood Function: Example 2

[Figure 1: Likelihood of observing 7 heads and 3 tails in a particular sequence, for different values of the probability of observing a head, π.]

Page 12: Etf3600 Lecture3 Mle Lpm 2013

ML Estimation: Likelihood Function for Example 2

The likelihood function:

L(parameter | data) = L(π | HHTHHHTTHH) = π^7 (1 − π)^3 (2)

The probability function and the likelihood function are given by the same equation, but the probability function is a function of the data with the value of the parameter fixed, while the likelihood function is a function of the parameter with the data fixed.

Although each value of L(π | data) is a notional probability, the function L(π | data) is not a probability density function: it does not enclose an area of 1.

Page 13: Etf3600 Lecture3 Mle Lpm 2013

ML Estimation: Likelihood Function for Example 2

The probability of obtaining the sample in hand (HHTHHHTTHH) is small regardless of the value of π. The value of π that is most supported by the data is the one for which the likelihood is the largest.

This value is the maximum likelihood estimate (MLE), denoted π̂; here π̂ = 0.7, which is the sample proportion of heads, 7/10.

One way to see this: for n independent flips of the coin producing a particular sequence that includes x heads and (n − x) tails, the likelihood is:

L(π | data) = Pr(data | π) = π^x (1 − π)^(n−x)

Multiplying the probabilities for all observations gives the joint density of the sample. This is the probability that this sample would arise in a random experiment.

We want the value of π that maximises L(π | data) ≡ L(π).

Page 14: Etf3600 Lecture3 Mle Lpm 2013

ML Estimation: Example 2

It is simpler to maximise the log of the likelihood:

log L(π) = x log π + (n − x) log(1 − π) (3)

Differentiating log L(π) with respect to π gives:

d log L(π)/dπ = x/π + (n − x) · (1/(1 − π)) · (−1) (4)

d log L(π)/dπ = x/π − (n − x)/(1 − π) (5)

Setting the derivative to 0 and solving produces the MLE, which in this case is the sample proportion:

π̂ = x/n, the maximum likelihood estimator.
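To check this numerically, a minimal sketch (assuming SciPy is available; not part of the lecture) that minimises the negative log-likelihood of equation (3) and recovers x/n:

```python
import numpy as np
from scipy.optimize import minimize_scalar

x, n = 7, 10  # 7 heads in 10 flips

def neg_log_lik(pi):
    # log L(pi) = x log(pi) + (n - x) log(1 - pi); negate for a minimiser
    return -(x * np.log(pi) + (n - x) * np.log(1 - pi))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(f"numerical MLE: {res.x:.4f}, analytic x/n: {x / n}")  # both ~0.7
```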

Page 15: Etf3600 Lecture3 Mle Lpm 2013

MLE of the Normal Linear Model

We want to estimate the β's in:

yi = β0 + β1 xi + εi (6)

ε is assumed to be independently and identically distributed (i.i.d.); the distribution is the Normal distribution, ε ∼ N(0, σ²). Then

εi = yi − β0 − β1 xi (7)

is a normally distributed variable. We need to find the β's that are most likely to have generated this sample.

The probability density function of each normally distributed error εi is:

f(εi) = (1/(2πσ²))^(1/2) exp[−εi²/(2σ²)] (8)
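A two-line check of equation (8) against SciPy's implementation of the normal density (a sketch; SciPy and the example values are assumptions of this write-up):

```python
import numpy as np
from scipy.stats import norm

sigma, eps = 1.5, 0.7
# Equation (8), written out by hand.
by_hand = (1 / (2 * np.pi * sigma**2))**0.5 * np.exp(-eps**2 / (2 * sigma**2))
print(by_hand, norm.pdf(eps, loc=0, scale=sigma))  # identical values
```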

Page 16: Etf3600 Lecture3 Mle Lpm 2013

Maximum Likelihood Estimation: Likelihood Function when y Is Normally Distributed

Substituting ε = y − β0 − β1 x gives:

f(y − β0 − β1 x) = (1/(2πσ²))^(1/2) exp[−(y − β0 − β1 x)²/(2σ²)] (9)

We transform this density function into a likelihood. The likelihood is the joint p.d.f. of the sampled data. Since the ε's are assumed to be independent of each other, the joint p.d.f. is the product of the individual densities. Multiplying the individual density functions for each observation together gives the likelihood function of the sample:

L = ∏_{t=1}^{n} (1/(2πσ²))^(1/2) exp[−(yt − β0 − β1 xt)²/(2σ²)] (10)

Page 17: Etf3600 Lecture3 Mle Lpm 2013

Maximum Likelihood Estimation: Log-Likelihood when y Is Normally Distributed

The likelihood function is a function of the parameters (the β's and σ) and of all the data on the independent and dependent variables (the y's and x's), i.e., it is the formula for the joint p.d.f. of the sample.

This likelihood function can be used to find the set of parameter values that, given the data, maximise the value of the likelihood.

Simplify the function by taking logs (this transforms the products into sums).

Then maximise by equating the first-order derivative with respect to each parameter to zero: ∂ log L/∂β = 0.

Page 18: Etf3600 Lecture3 Mle Lpm 2013

Maximum Likelihood Estimation: Normally Distributed y

1. Log likelihood:

ln L = ln[(2πσ²)^(−n/2)] − (1/(2σ²)) ∑_{t=1}^{n} (yt − β0 − β1 xt)²

= −(n/2) ln(2πσ²) − (1/(2σ²)) ∑_{t=1}^{n} (yt − β0 − β1 xt)²

ln L = −(n/2) ln(2π) − n ln σ − ∑_{t=1}^{n} (yt − β0 − β1 xt)² / (2σ²) (11)

2. Score function: take the first derivative L′(β) = ∂ ln L/∂parameter. This function L′ is called the score function.

3. Solve the equation ∂ ln L/∂parameter = 0 to get the maximum likelihood estimate. This equation is called the likelihood equation.

Page 19: Etf3600 Lecture3 Mle Lpm 2013

Maximum Likelihood Estimation: Normally Distributed y

For βk:

∂ log L/∂βk = ∂/∂βk [ −(1/(2σ²)) ∑_{t=1}^{n} (yt − β0 − β1 xt)² ] (12)

∂ log L/∂βk = (1/σ²) ∑_{t=1}^{n} xkt (yt − β0 − β1 xt) (13)

This is the same first-order condition as OLS, hence OLS = MLE ⇒ if yt is normally distributed then MLE = OLS.

If yt is not normally distributed then there is, in general, no simple closed-form solution.
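A hedged illustration of the OLS = MLE claim under normal errors, on synthetic data (the data-generating process and seed are invented for the sketch; SciPy is assumed):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)    # true beta0 = 1, beta1 = 2

def neg_log_lik(theta):
    # Negative of the log likelihood in equation (11).
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)                # keep sigma positive
    resid = y - b0 - b1 * x
    return (n / 2) * np.log(2 * np.pi) + n * np.log(sigma) \
           + np.sum(resid**2) / (2 * sigma**2)

mle = minimize(neg_log_lik, x0=[0.0, 0.0, 0.0], method="BFGS").x
ols = np.polyfit(x, y, 1)                    # [slope, intercept] by least squares
print("MLE (b0, b1):", mle[0], mle[1])
print("OLS (b0, b1):", ols[1], ols[0])       # agree up to numerical error
```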

Page 20: Etf3600 Lecture3 Mle Lpm 2013

Maximum Likelihood Estimation: Variance of the MLE and the Information Matrix

The second-order derivative of the log-likelihood function gives the asymptotic sampling variance of the MLE:

Var(L′(β)) = E[L′(β) L′(β)ᵀ] = −E[∂² log L(β)/(∂βi ∂βj)] = I(β) (14)

The right-hand side is called the expected Fisher information matrix.

The distribution of the parameter estimates is asymptotically normal, with variance-covariance matrix given by:

Var(β̂) = I(β)⁻¹ = { −E[∂² log L(β)/(∂βi ∂βj)] }⁻¹ (15)
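A small sketch of the information calculation for the coin example, where the observed information can be written down analytically (the formula follows from differentiating equation (5) once more; the code is illustrative, not from the lecture):

```python
import numpy as np

x, n = 7, 10
pi_hat = x / n

# Observed information: negative second derivative of log L at the MLE.
# For the Bernoulli log likelihood this is x/pi^2 + (n - x)/(1 - pi)^2,
# which equals n / (pi (1 - pi)) at pi = x/n.
info = x / pi_hat**2 + (n - x) / (1 - pi_hat)**2
se = np.sqrt(1 / info)
print(f"I(pi_hat) = {info:.2f}, asymptotic SE = {se:.4f}")
# n/(0.7*0.3) = 47.62, SE = sqrt(0.7*0.3/10) ~ 0.1449
```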

Page 21: Etf3600 Lecture3 Mle Lpm 2013

Maximum Likelihood Estimation: General Approach

1. Define the density function.
2. Express the joint probability of the data.
3. Convert the joint probability into a likelihood.
4. Simplify the log-likelihood expression.
5. Differentiate the log likelihood with respect to the parameters.
6. Solve for the unknown parameters, or write a program that uses numerical analysis to produce maximum likelihood estimates for the unknowns, by successive approximation: start with a starting value of βk and calculate log L; adjust the value of βk and recalculate log L; continue doing this over a range of values of βk; choose the value of βk that gives the highest log L.
7. There are a number of algorithms for finding the ML estimates numerically, e.g., the Newton-Raphson method, quadratic hill climbing, and the Berndt-Hall-Hall-Hausman (BHHH) method; a minimal Newton-Raphson sketch follows below.
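A minimal Newton-Raphson sketch for the coin example of equation (3) (illustrative only; production software adds line searches and convergence safeguards):

```python
x, n = 7, 10

def score(pi):        # d log L / d pi, as in equation (5)
    return x / pi - (n - x) / (1 - pi)

def hessian(pi):      # d^2 log L / d pi^2
    return -x / pi**2 - (n - x) / (1 - pi)**2

pi = 0.5                               # starting value
for step in range(20):
    update = score(pi) / hessian(pi)
    pi -= update                       # Newton-Raphson step
    if abs(update) < 1e-10:
        break
print(f"converged to pi = {pi:.6f} in {step + 1} steps")  # 0.700000
```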

Page 22: Etf3600 Lecture3 Mle Lpm 2013

MLE Properties

MLEs have many optimal properties:
- Unbiasedness: MLEs are asymptotically unbiased, although they may be biased in finite samples.
- Sufficiency: complete information about the parameter of interest is contained in its MLE. That is, if there is a sufficient statistic for a parameter, then the maximum likelihood estimator of the parameter is a function of a sufficient statistic.
- Consistency: the true parameter value that generated the data is recovered asymptotically.
- Efficiency: the lowest possible variance of the parameter estimates is achieved asymptotically.
- ML estimators are asymptotically normally distributed.

Page 23: Etf3600 Lecture3 Mle Lpm 2013

MLE Properties

Implementation problems:
- Starting values are essential for most numerical algorithms.
- Algorithm choice seems esoteric, but it can matter for some problems.
- Scaling of variables seems trivial, but it matters in numerical estimation.
- Step sizes of the algorithm need to be small (so as not to skip over maxima and minima).
- A flat likelihood may imply there is no clearly identifiable unique solution, and may be a real consequence of collinearity. Maybe there is no single maximum!

Page 24: Etf3600 Lecture3 Mle Lpm 2013

Linear Probability Model: Introduction

Consider a dependent variable of a qualitative nature, coded as a dummy variable Yi ∈ {0, 1}. Examples:
- driving to work versus public transport
- employed versus unemployed
- being single versus married.

We will analyse two kinds of models:
- the linear probability model
- non-linear models, i.e., logit and probit.

Page 25: Etf3600 Lecture3 Mle Lpm 2013

Linear Probability Model: The Model

The linear probability model is a linear regression model applied to a binary dependent variable:

Yi = β0 + β1 x1i + β2 x2i + ... + βk xki + εi,  Yi ∈ {0, 1} (16)

A multiple linear regression model with a binary dependent variable is called the linear probability model (LPM) because the response probability is linear in the β's. The zero conditional mean assumption, E(ε|x) = 0, gives:

Pr(Yi = 1) = β0 + β1 x1i + β2 x2i + ... + βk xki (17)

This is called the response probability.

Page 26: Etf3600 Lecture3 Mle Lpm 2013

Linear Probability Model

Since Y can take only two values, βj cannot be interpreted as the change in Y given a one-unit increase in xj. Instead, the β's are the expected change in the response probability for a unit increase in xj:

∂Pr(Yi = 1)/∂xj = βj (18)

Page 27: Etf3600 Lecture3 Mle Lpm 2013

Linear Probability Model, Example 1: Decision to Drive and Commuting Time

Suppose the decision to drive and the mode of transport are related as:

yi = α + βxi + εi (19)

where xi = (commuting time by bus − commuting time by car).

We expect individuals to be more likely to drive as x increases.

Page 28: Etf3600 Lecture3 Mle Lpm 2013

Linear Probability Model, Example 1: Decision to Drive and Commuting Time

Dependent Variable: Y
Method: Least Squares
Date: 03/08/07  Time: 11:43
Sample: 1 21  Included observations: 21

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          0.484795      0.071449     6.785151      0.0000
X          0.007031      0.001286     5.466635      0.0000

R-squared            0.611326    Mean dependent var     0.476190
Adjusted R-squared   0.590869    S.D. dependent var     0.511766
S.E. of regression   0.327343    Akaike info criterion  0.694776
Sum squared resid    2.035914    Schwarz criterion      0.794254
Log likelihood      -5.295144    F-statistic            29.88410
Durbin-Watson stat   1.978844    Prob(F-statistic)      0.000028

β̂ = 0.007 means that for every one-minute increase in the commuting time by bus relative to car, the probability of driving increases by 0.007.
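A sketch of fitting an LPM by OLS in Python with statsmodels. The data here are synthetic, generated to roughly mimic the output above; the lecture's actual dataset is not reproduced:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical commuting data: y = 1 if the person drives,
# x = (bus time - car time) in minutes.
rng = np.random.default_rng(1)
x = rng.uniform(-60, 60, 200)
y = (0.48 + 0.007 * x + rng.normal(0, 0.3, 200) > 0.5).astype(int)

X = sm.add_constant(x)
lpm = sm.OLS(y, X).fit(cov_type="HC1")  # robust SEs: LPM errors are heteroskedastic
print(lpm.params)   # slope = change in Pr(drive) per extra minute of bus time
```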

Page 29: Etf3600 Lecture3 Mle Lpm 2013

Linear Probability Model, Example 2: Women's Labour Force Participation (Long, p. 37)

Variable   Description
LFP        Whether in the paid labour force (1 if yes; 0 otherwise)
k6         Number of children younger than 6
k618       Number of children aged 6 to 18
age        Age in years
wc         1 if attended college; 0 otherwise
hc         1 if husband attended college; 0 otherwise
lwg        log(woman's wage rate)
Income     Family income excluding the woman's wages

Page 30: Etf3600 Lecture3 Mle Lpm 2013

Linear Probability Model, Example 2: Women's Labour Force Participation (WLFP) (Long, p. 37)

Long estimates a linear model in which the variable 'whether the woman works' is a linear function of all the other variables:

Variable       β       βσx      t-val    P(>|t|)
Constant       1.14    -        9.00     0.00
Child (k6)    -0.29   -0.154   -8.21     0.00
Child (k618)  -0.01   -0.015   -0.80     0.42
Age           -0.01   -0.103   -5.02     0.00
wc             0.16    -        3.57     0.00
hc             0.02    -        0.45     0.66
lwg            0.12    0.072    4.07     0.00
Income        -0.01   -0.079   -4.30     0.00

β is the coefficient; βσx = β · σx is the standardised coefficient.

Page 31: Etf3600 Lecture3 Mle Lpm 2013

Linear Probability Model, Example 2: Women's Labour Force Participation (WLFP) (Long, p. 37)

WLFP is lower for families with a larger number of pre-school children: each additional child under 6 reduces the probability of WLFP by about 0.30 (in 1975!), holding all other variables constant.

For a one-standard-deviation increase in family income, the predicted probability of being employed decreases by 0.08, holding other variables constant.

If the wife attended college, the predicted probability of being in the labour force increases by 0.16, holding all other variables constant.

Page 32: Etf3600 Lecture3 Mle Lpm 2013

Problems with the LPM (Don't use it. Never!)

Heteroskedasticity: the variance of the errors depends on the x's and is not constant ⇒ the LPM is heteroskedastic, making OLS inefficient and its standard errors biased. Var(y|x) = f(x): a binary variable y takes the value 1 with probability P(y = 1) and 0 with probability P(y = 0), with the probabilities summing to 1; given the mean E(y|x) = xβ = μ, the variance is μ(1 − μ).

Normality: the errors are not normally distributed. Since y can be either 0 or 1, the error can take only two values: ε1 = 1 − E(y|x) or ε0 = 0 − E(y|x).

Nonsensical predictions: predicted y's can be < 0 or > 1, as the sketch below illustrates. In the WLFP example, a 35-year-old woman who did not attend college, has four children, and whose husband did not attend college has a predicted probability of being employed of −0.48. An unreasonable prediction!
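A quick sketch of the nonsensical-prediction problem, using the fitted coefficients from Example 1 (β0 ≈ 0.48, β1 ≈ 0.007): far enough from the sample mean, the fitted line leaves [0, 1]:

```python
# Fitted LPM from Example 1: Pr(drive) = 0.48 + 0.007 x.
b0, b1 = 0.48, 0.007
for x in (-100, 0, 100):               # minutes of bus time minus car time
    p = b0 + b1 * x
    print(f"x = {x:4d}  predicted probability = {p:5.2f}")
# x = -100 gives -0.22 and x = 100 gives 1.18: both outside [0, 1].
```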

Page 33: Etf3600 Lecture3 Mle Lpm 2013

Problems with the LPM

Functional form: the model is linear, i.e., a unit increase in xk results in a constant change of βk in the probability of the event, holding all other variables constant. E.g., in the LPM each additional young child decreases the probability of employment by the same amount.

The WLFP sample had 753 married white women, 428 of whom were in the labour force ⇒ Pr(emp): pe = 428/753 = 0.568; Pr(not emp): 1 − pe = 0.432 ⇒ Odds: Ω = pe/(1 − pe).

Number of young children
In Labour Force    0     1     2    3
No               231    72    19    3
Yes              375    46     7    0
Odds of emp      1.6   0.64  0.37  0.0

Pr(emp with 2 children) = 7/(7 + 19) = 0.27 ⇒ Ω = 0.27/(1 − 0.27) = 0.37. The odds of being employed are negatively related to having children (but, as the sketch below shows, the effect is not strictly linear).
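The odds computation, reproduced as a short sketch from the counts in the table above:

```python
# Counts from the WLFP cross-tabulation (number of young children 0-3).
in_lf  = [375, 46, 7, 0]    # in the labour force ("Yes")
out_lf = [231, 72, 19, 3]   # not in the labour force ("No")

for k, (yes, no) in enumerate(zip(in_lf, out_lf)):
    p = yes / (yes + no)                      # Pr(employed | k children)
    odds = p / (1 - p) if p < 1 else float("inf")
    print(f"{k} children: Pr(emp) = {p:.2f}, odds = {odds:.2f}")
# 0 children: odds 1.62; 1 child: 0.64; 2 children: 0.37; 3 children: 0.00
```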

Page 34: Etf3600 Lecture3 Mle Lpm 2013

Wrap-up: Maximum Likelihood Estimation

The method of maximum likelihood provides estimators that have both a reasonable intuitive basis and many desirable statistical properties. The method is very broadly applicable and simple to apply. Once the estimates are computed, maximum likelihood estimation provides standard errors, statistical tests, and other results useful for statistical inference. A disadvantage of the method is that it frequently requires strong assumptions about the structure of the data.

Page 35: Etf3600 Lecture3 Mle Lpm 2013

Wrap-up: LPM and Its Limitations

A multiple linear regression model with a binary dependent variable is called the linear probability model (LPM). It has many limitations:
- The interpretation of the parameters remains unaffected by having a binary outcome.
- The effect of a variable is the same regardless of the values of the other variables.
- The effect of a unit change in a variable is the same regardless of the current value of that variable.

Page 36: Etf3600 Lecture3 Mle Lpm 2013

Summary

The limitations of the LPM can be overcome by using more sophisticated response models:

Pr(Y = 1) = G(β0 + β1 X1i + β2 X2i) (20)

where G is a function taking values strictly between zero and one: 0 < G(z) < 1 for any real z. Two common functional forms, sketched below, are:
- the Logit model
- the Probit model
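A minimal sketch of the two choices of G (assuming NumPy and SciPy), showing that both map any real index into (0, 1):

```python
import numpy as np
from scipy.stats import norm

def G_logit(z):
    return 1 / (1 + np.exp(-z))   # logistic CDF

def G_probit(z):
    return norm.cdf(z)            # standard normal CDF

for z in (-5, 0, 5):
    print(z, round(G_logit(z), 4), round(G_probit(z), 4))
# both functions are bounded in (0, 1), unlike the LPM's linear index
```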

