The BLP Method of Demand Curve Estimation in Industrial Organization

27 February 2006

Eric Rasmusen

Abstract

This is an exposition of the BLP method of structural estimation.

Dan R. and Catherine M. Dalton Professor, Department of Business Economics and Public Policy, Kelley School of Business, Indiana University, BU 456, 1309 E. 10th Street, Bloomington, Indiana, 47405-1701. Office: (812) 855-9219. Fax: 812-855-3354. [email protected]. http://www.rasmusen.org. Copies of this paper can be found at http://www.rasmusen.org/papers/blp-rasmusen.pdf.

I thank Fei Lu for her comments.

I. Introduction

I came to know Professor Moriki Hosoe when he led the project of translating the first edition of my book, Games and Information, into Japanese in 1990 and I visited him at Kyushu University. He realized, as I did, that a great many useful new tools had been developed for economists, and that explaining the tools to applied modellers would be extraordinarily useful because of the difficulty of reading the original expositions in journal articles. Here, my objective is similar: to try to explain a new technique, but in this case the statistical one of the “BLP Method” of econometric estimation, named after Berry, Levinsohn & Pakes (1995). I hope it will be a useful contribution to the festschrift in honor of Professor Hosoe. I, alas, do not speak Japanese, so I thank xxx for their translation of this paper. I know from editing the 1997 Public Policy and Economic Analysis with Professor Hosoe how time-consuming it can be to edit a collection of articles, especially with authors of mixed linguistic backgrounds, and so I thank Isao Miura and Tohru Naito for their editing of this volume.

The BLP Method is a way to estimate demand curves, a way that lends itself to testing theories of industrial organization. It combines a variety of new econometric techniques of the 1980’s and 90’s. Philosophically, it is in the style of structural modelling, in which empirical work starts with a rigorous theoretical model in which players maximize utility and profit functions, and everything, including the disturbance terms, has an economic interpretation, the style for which McFadden won the Nobel Prize. That is in contrast to the older, reduced-form, approach, in which the economist essentially looks for conditional correlations consistent with his theory. Both approaches remain useful. What we get with the structural approach is the assurance that we do have a self-consistent theory and the ability to test much finer hypotheses about economic behavior. What we lose is simplicity and robustness to specification error.

Other people have written explanations of the BLP Method. Nevo (2000) is justly famous for doing this, an interesting case of a commentary on a paper being a leading article in the literature itself. (Recall, though, Bertrand’s 1883 comment on Cournot or Hicks’s 1937 “Mr Keynes and the Classics.”) The present paper will be something of a commentary on a commentary, because I will use Nevo’s article as my foundation. I will use his notation and equation numbering, and point out typos in his article as I go along. I hope that this paper, starting from the most basic problem of demand estimation, will be useful to those who, like myself, are trying to understand modern structural estimation.

II. The Problem

Suppose we are trying to estimate a market demand curve. We have available $t = 1, \ldots, 20$ months of data on a product, data consisting of the quantity sold, $q_t$, and the price, $p_t$. Our theory is that demand is linear, with this equation:

$$
q_t(p_t) = \alpha - \beta p_t + \varepsilon_t. \tag{1}
$$

Let’s start with an industry subject to price controls. A mad dictator sets the price each month, changing it for entirely whimsical reasons. The result will be data that looks like Figure 1a. That data nicely fits our model in equation (1), as Figure 1b shows.

Figure 1: Supply and Demand with Price Controls

Next, suppose we do not have price controls. Instead, we have a situation of normal supply and demand. The problem is that now we might observe data like that in Figure 2a. Quantity rises with price; the demand curve seems to slope the wrong way, if we use ordinary least squares (OLS) as in Figure 2b.

Figure 2: Supply and Demand without Price Controls

The solution to the paradox is shown in Figure 2c: OLS estimates the supply curve, not the demand curve. This is what Working (1927) pointed out.

It could be that the unobservable variables $\varepsilon_t$ are what are shifting the demand curve in Figure 2. Or, it could be that it is some observable variable that we have left out of our theory. So perhaps we should add income, $y_t$:

$$
q_t(p_t) = \alpha - \beta p_t + \gamma y_t + \varepsilon_t. \tag{2}
$$

Note, however, that if the supply curve never shifted, we still wouldn’t be able to estimate the effect of price on quantity demanded. We need some observed variable that shifts supply.
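The point about needing an observed supply shifter can be illustrated with a small simulation. The sketch below is not from the paper: the linear supply curve, the shifter `z`, and all parameter values are assumptions chosen only for illustration. It generates market-clearing prices and quantities and compares the OLS slope with a simple instrumental-variables slope that uses `z` as the instrument for price.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical structural parameters, chosen only for illustration.
alpha, beta = 10.0, 1.0   # demand: q = alpha - beta * p + eps
a, b = 1.0, 1.0           # supply: q = a + b * p + c * z + u


def simulate(c, T=20_000):
    """Draw T market-clearing (p, q) pairs; z is an observed supply shifter."""
    eps = rng.normal(0.0, 1.0, T)   # unobserved demand disturbance
    u = rng.normal(0.0, 0.3, T)     # unobserved supply disturbance
    z = rng.normal(0.0, 1.0, T)     # observed supply shifter, e.g. an input cost
    # Equate demand and supply and solve for the equilibrium price.
    p = (alpha - a + eps - c * z - u) / (beta + b)
    q = alpha - beta * p + eps
    return p, q, z


def ols_slope(y, x):
    """Slope from regressing y on x and a constant."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def iv_slope(y, x, z):
    """Simple instrumental-variables slope using z as the instrument for x."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


# Case 1: nothing observed shifts supply (c = 0) and demand shocks dominate.
p0, q0, _ = simulate(c=0.0)
ols_supply_only = ols_slope(q0, p0)

# Case 2: an observed variable z shifts supply (c = 1).
p1, q1, z1 = simulate(c=1.0)
ols_with_shifter = ols_slope(q1, p1)
iv_with_shifter = iv_slope(q1, p1, z1)

print("true demand slope:", -beta)
print("OLS slope, no supply shifter:", ols_supply_only)
print("OLS slope, with supply shifter:", ols_with_shifter)
print("IV slope, with supply shifter:", iv_with_shifter)
```

In the first case OLS picks up something close to the upward-sloping supply curve, as in Figure 2. In the second case OLS is still biased, but the shifter `z` moves supply without entering demand, so the instrumental-variables slope comes out close to the true demand slope. The sample is far larger than the 20 months in the text so that sampling noise does not obscure the comparison.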

Or, another approach would be to try to use a different kind of data. What if we could get data on individual purchases of product j? Our theory for the individual is different from our theory of the market. The demand curve looks much the same, except now we have a subscript for consumer i.

$$
q_{it}(p_t) = \alpha_i - \beta_i p_t + \gamma_i y_{it} + \varepsilon_{it}. \tag{3}
$$

But what about the supply curve? From the point of view of any one small consumer, the supply curve is flat. He has only a trivial influence on the quantity supplied and the market price, an influence we ignore in theories of perfect competition, where the consumer is a price-taker. Thus, we are in the situation of Figure 3, which is very much like Figure 1.

Figure 3: The Individual’s Demand Curve

There is indeed one important difference: In Figure 3, all we’ve done is estimate the demand curve for one consumer. That is enough, if we are willing to simplify our theory to assume that all consumers have identical demand functions:

$$
q_{it}(p_t) = \alpha - \beta p_t + \gamma y_{it} + \varepsilon_{it}. \tag{4}
$$

This is not the same as assuming that all consumers have the same quantity demanded, since they will still differ in income $y_{it}$ and unobserved influences, $\varepsilon_{it}$, but it does say that the effect of an increase in price on quantity is the same for all consumers. If we are willing to accept that, however, we can estimate the demand curve for one consumer, and we can use our estimate $\hat{\beta}$ for the market demand curve. Or, we can use data on n different consumers, to get more variance in income, and estimate $\hat{\beta}$ that way.

But we would not have to use the theory that consumers are identical. There are two alternatives. The first alternative is to use the demand of one or n consumers anyway, arriving at the same estimate $\hat{\beta}$ that we would under the theory (1). The interpretation would be different, though: it would be that we have estimated the average value of $\beta$, and interpreting the standard errors would be harder, since they would be affected by the amount of heterogeneity in $\beta_i$ as well as in $\varepsilon_{it}$. (The estimates would be unbiased, though, unlike the estimates I criticize in Rasmusen (1989a,b), since $p_t$ is not under the control of the consumer.)

Or, we could estimate the $\beta_i$ for all n consumers and then average the n estimates to get a market $\beta$, as opposed to running one big regression. One situation in which this would clearly be the best approach is if we had reason to believe that the n consumers in our sample were not representative of the entire population. In that case, running one regression on all the data would result in a biased estimate, as a simple consequence of starting with a biased sample. Instead, we could run n separate regressions, and then compute a weighted average of the estimates, weighting each type of consumer by how common his type was in the population.

Individual consumer data, however, is no panacea. For one thing, it is hard to get, especially since it is important to get a representative sample of the population. For another, its freedom from the endogeneity problem is deceptive. Recall that we assumed that each individual’s demand had no effect on the market price. That is not literally correct, of course: every one of 900,000 buyers of toothbrushes has some positive if trivial effect on market sales. If one of them decides not to buy a toothbrush, sales fall to 899,999. That effect is so small that the one consumer can ignore it, and the econometrician could not possibly estimate it given even a small amount of noise in the data. The problem is that changes in individuals’ demand are unlikely to be statistically independent of each other. When the unobservable variable $\varepsilon_{900000,t}$ for consumer i = 900,000 is unusually negative, so too in all likelihood is the unobservable variable $\varepsilon_{899999,t}$ for consumer i = 899,999. Thus, they will both reduce their purchases at the same time, which will move the equilibrium to a new point on the supply curve, reducing the market price. Price is endogenous for the purposes of estimation, even though it is exogenous from the point of view of any one consumer.

So we are left with a big problem, identification, for demand estimation. The analyst needs to use instrumental variables, finding some variable that is correlated with the price but not with anything else in the demand equation, or else he must find a situation like our initial price control example where prices are exogenous.

But in fact even price controls might not lead to exogenous prices. A mad dictator is much more satisfactory, at least if he is truly mad. Suppose we have a political process setting the price controls, either a democratic one or a sane dictator who is making decisions with an eye to everything in the economy and public opinion too. When is politics going to result in a higher regulated price? Probably when demand is stronger and quantity is greater. If the supply curve would naturally slope up, both buyers and sellers will complain more if the demand curve shifts out and the regulated price does not change. Thus, even the regulated price will have a supply curve.

All of these problems arise in any method used to estimate a demand curve, whether it be the reduced-form methods just described or the BLP structural method. One thing about structural methods is that they force us to think more about the econometric problems. Your structural model will say what the demand disturbance is: unobserved variables that influence demand. If you must build a maximizing model of where regulated prices come from, you will realize that they might depend on those unobserved variables.

What does all this have to do with industrial organization? Isn’t it just price theory, or even just consumer theory? Where are the firms? The reason this is so important in industrial organization is that price theory underlies it. Production starts because somebody demands something. An entrepreneur then discovers supply. That entrepreneur needs to organize the supply, and so we have the firm. Other entrepreneurs compete, and we have an industry. How consumers react to price changes is fundamental to this. Natural extensions to this problem bring in most of how firms behave. Demand for a product depends on the prices of all firms in the industry, and so we bring in the theory of oligopoly. Demand depends on product characteristics, and so we have monopolistic competition and location theory. Demand depends on consumer information, and so we have search theory, advertising, adverse selection, and moral hazard. As we have seen, estimating demand inevitably brings in supply. Or, if you like, you could think of starting with the problem of estimating the supply curve in a perfectly competitive industry, a problem that can be approached in the same way as we approach demand here.

III. The Structural Approach

Let us now start again, but with a structural approach. We will not begin with a demand curve this time. Instead, we will start with consumer utility functions. The standard approach in microeconomic theory is to start with the primitives of utility functions (or even preference orderings) and production functions and then see how maximizing choices of consumers and firms result in observed behavior. Or, in game theory terms, we begin with players, actions, information, and payoffs, and see what equilibrium strategies result from players choosing actions to try to maximize their payoffs given their information.

Suppose we are trying to estimate a demand elasticity: how quantity demanded responds to price. We have observations from 20 months of cereal market data, the same 50 products of cereal for each month, which makes a total of 1,000 data points. We also have data on 6 characteristics of each cereal product and we have demographic data on how 4 consumer characteristics are distributed in the population in each month.

Each type of consumer decides which product of cereal to buy, buying either one or zero units. We do not observe individual decisions, but we will model them so we can aggregate them to obtain the cereal product market shares that we do observe. Let there be I = 400 consumers. It may seem strange to specify a number of consumers, since we do not observe their individual behavior, but it will allow us to build our theoretical model up from the fundamentals of individual consumer choice rather than starting with aggregate consumption in each month. The number will be important in the estimation itself, when we will randomly sample consumers using our knowledge of the distribution of types of consumers in a given month, and our sample will be closer to the population the larger is I. The fact that the frequency of different consumer types changes over time is the variance in the data that allows us to estimate how each of the 4 consumer characteristics affects demand.

At this point, we could decide to estimate the elasticity of demand for each product and all the cross-elasticities directly, but with 50 products that would require estimating 2,500 numbers. Of course, theory tells us that many of the cross-partials equal each other, but the number of parameters would be too large to estimate even if we dropped half of them, particularly since we would not only need a large number of observations, but enough variability in the prices to see how each product’s demand changes when the price of another product changes. Instead, we will focus on the product characteristics. There are only 6 of these, and there would only be 6 even if there were 500 products of cereal instead of 50. In effect, we will be estimating cross-elasticities between cereal characteristics, but if we know those numbers, we can build up to the cross-elasticities between products using the characteristic levels of each product.

The Consumer Decision

The utility of consumer i if he were to buy product j in month t is given by the following equation, denoted equation (1N) because it is equation (1) in Nevo (2000):

$$
u_{ijt} = \alpha_i (y_i - p_{jt}) + \mathbf{x}_{jt}\boldsymbol{\beta}_i + \xi_{jt} + \varepsilon_{ijt}, \qquad i = 1, \ldots, 400, \; j = 1, \ldots, 50, \; t = 1, \ldots, 20, \tag{1N}
$$

where $y_i$ is the income of consumer i (which is unobserved and which we will assume does not vary across time), $p_{jt}$ is the observed price of product j in month t, $\mathbf{x}_{jt}$ is a 6-dimensional vector of observed characteristics of product j in month t, $\xi_{jt}$ (the letter “xi”) is a disturbance scalar summarizing unobserved characteristics of product j in month t, and $\varepsilon_{ijt}$ is the usual unobserved disturbance with mean zero. The parameters to be estimated are consumer i’s marginal utility of income, $\alpha_i$, and his marginal utility of product characteristics, the 6-vector $\boldsymbol{\beta}_i$. I have boldfaced the symbols for vectors and matrices above, and will do so throughout the article.

Consumer i also has the choice to not buy any product at all. We will model this outside option as buying “product 0” and normalize by setting the j = 0 parameters equal to zero (or, if you like, by assuming that it has a zero price and zero values of the characteristics):

$$
u_{i0t} \equiv \alpha_i y_i + \varepsilon_{i0t}. \tag{5}
$$

Equation (1N) is an indirect utility function, depending on income $y_i$ and price $p_{jt}$ as well as on the real variables $\mathbf{x}_{jt}$, $\xi_{jt}$, and $\varepsilon_{ijt}$. It is easily derived from a quasilinear utility function, however, in which the consumer’s utility is the utility from his consumption of one (or zero) of the 50 products, plus utility which is linear in his consumption of the outside good. Quasilinear utility is not concave in income, so it lacks income effects, but if those are important, they can be modelled by indirect utility that is a function not of $(y_i - p_{jt})$ but of some concave function such as $\log(y_i - p_{jt})$, as in BLP (1995).

A consumer’s utility depends both on a product’s characteristics ($\mathbf{x}_{jt}$) and directly on the product ($\xi_{jt}$), but in the characteristics approach we are using, we implicitly assume that the direct marginal utilities are the result of unobserved characteristics that are uncorrelated across products.

Consumer characteristics do not appear in the utility function. They play a role later in the model, in determining $\boldsymbol{\beta}_i$, the marginal utility of product characteristics. Consumer i’s income does appear, but it does not change over time, which may seem strange. That is because we are not going to follow consumer i over time as his income changes; he is a type of consumer, and what will change over time is the number of consumers with a given income.

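We will assume that $\varepsilon_{ijt}$ follows the Type I extreme-value distribution, which, with location parameter zero and scale parameter one, has the density and cumulative distribution

$$
f(x) = e^{-x} e^{-e^{-x}}, \qquad F(x) = e^{-e^{-x}}. \tag{6}
$$

(With location zero the mean is Euler’s constant, about 0.577, rather than zero, but a shift common to all the disturbances does not affect which product has the highest utility.)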

This is the limiting distribution of the maximum value of a series of draws of independent identically distributed random variables. Figure 4 illustrates the density, which is not dissimilar to the normal distribution, except that it is slightly asymmetric and has thicker tails. This distribution is standardly used in logit models because its cumulative distribution is related to the probability of x being larger than any other of a number of draws, which is like the random utility from one choice being higher than that from a number of other choices. This leads to a convenient formula for the probability that a consumer makes a particular choice, and thus for a product’s market share, as we will see below.

Figure 4: The Type I Extreme-Value Distribution (from the Engineering Statistics Handbook)

Consumer i buys product j in month t if it yields him the highest utility of any product. What we observe, though, is not consumer i’s decision, but the market share of product j. Also, though we do not observe $\boldsymbol{\beta}_i$, consumer i’s marginal utility of product characteristics, we do observe a sample of observable characteristics. Even if we did observe his decision, we would still have to choose between regular logit and BLP’s random-coefficients logit, depending on whether we assumed that every consumer had the same marginal utility of characteristics $\boldsymbol{\beta}$ or whether $\boldsymbol{\beta}_i$ depended on consumer characteristics.

Simple Logit

One way to proceed would be to assume that all consumers are identical in their taste parameters; i.e., that $\alpha_i = \alpha$ and $\boldsymbol{\beta}_i = \boldsymbol{\beta}$, and that the $\varepsilon_{ijt}$ disturbances are uncorrelated across i’s. Then we have the simple multinomial logit model (multinomial because there are multiple choices, not just two). The utility function reduces to the following form, which is (1N) except that the parameters are no longer i-specific.

$$
u_{ijt} = \alpha (y_i - p_{jt}) + \mathbf{x}_{jt}\boldsymbol{\beta} + \xi_{jt} + \varepsilon_{ijt}, \qquad i = 1, \ldots, 400, \; j = 1, \ldots, 50, \; t = 1, \ldots, 20. \tag{5N}
$$

Now the coefficients are the same for all consumers, but incomes differ. We can aggregate by adding up the incomes, however, since the coefficient on each consumer’s income has the same value, $\alpha$. Thus we obtain an aggregate utility function,

$$
u_{jt} = \alpha (y - p_{jt}) + \mathbf{x}_{jt}\boldsymbol{\beta} + \xi_{jt} + \varepsilon_{jt}, \qquad j = 1, \ldots, 50, \; t = 1, \ldots, 20. \tag{7}
$$

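If we assume that $\varepsilon_{jt}$ follows the Type I extreme-value distribution, then this is the multinomial logit model.

Since $\varepsilon_{jt}$ follows the Type I extreme value distribution by assumption, it turns out that the market share of product j under our utility function is

$$
s_{jt} = \frac{e^{\mathbf{x}_{jt}\boldsymbol{\beta} - \alpha p_{jt} + \xi_{jt}}}{1 + \sum_{k=1}^{50} e^{\mathbf{x}_{kt}\boldsymbol{\beta} - \alpha p_{kt} + \xi_{kt}}}. \tag{6N}
$$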

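Equation (6N) is by no means obvious. The market share of product j is the probability that j has the highest utility, which occurs if $\varepsilon_{jt}$ is high enough relative to the other disturbances. The probability that product 1 has a higher utility than the other 49 products and the outside good (which has a utility normalized to zero) is thus

$$
\begin{aligned}
&\mathrm{Prob}\big(\alpha(y - p_{1t}) + \mathbf{x}_{1t}\boldsymbol{\beta} + \xi_{1t} + \varepsilon_{1t} > \alpha(y - p_{2t}) + \mathbf{x}_{2t}\boldsymbol{\beta} + \xi_{2t} + \varepsilon_{2t}\big) \\
&\quad \ast \, \mathrm{Prob}\big(\alpha(y - p_{1t}) + \mathbf{x}_{1t}\boldsymbol{\beta} + \xi_{1t} + \varepsilon_{1t} > \alpha(y - p_{3t}) + \mathbf{x}_{3t}\boldsymbol{\beta} + \xi_{3t} + \varepsilon_{3t}\big) \ast \cdots \\
&\quad \ast \, \mathrm{Prob}\big(\alpha(y - p_{1t}) + \mathbf{x}_{1t}\boldsymbol{\beta} + \xi_{1t} + \varepsilon_{1t} > \alpha(y - p_{50t}) + \mathbf{x}_{50t}\boldsymbol{\beta} + \xi_{50t} + \varepsilon_{50t}\big) \\
&\quad \ast \, \mathrm{Prob}\big(\alpha(y - p_{1t}) + \mathbf{x}_{1t}\boldsymbol{\beta} + \xi_{1t} + \varepsilon_{1t} > \alpha y + \varepsilon_{0t}\big)
\end{aligned}
\tag{8}
$$

Since $\varepsilon_{1t}$ appears in every term, the product is to be read as conditional on $\varepsilon_{1t}$ and then integrated over its distribution. Substituting for the Type I extreme value distribution into equation (8) and solving this out yields, after much algebra, equation (6N). Since $\alpha y$ appears on both sides of each inequality, it drops out. The 1 in equation (6N) is there because of the outside good, with its 0 utility, since $e^0 = 1$.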
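Rather than working through the algebra, the step from extreme-value disturbances to the share formula (6N) can be checked by simulation. The sketch below is an illustration, not part of the paper: it uses three products instead of 50, and the mean utilities in `v` are made-up values standing in for $\mathbf{x}_{jt}\boldsymbol{\beta} - \alpha p_{jt} + \xi_{jt}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mean utilities x*beta - alpha*p + xi for three products.
v = np.array([0.5, 0.0, -1.0])
# Prepend the outside good, whose mean utility is normalized to zero.
v_all = np.concatenate(([0.0], v))

# Closed-form shares of the three inside products from equation (6N).
shares_formula = np.exp(v) / (1.0 + np.exp(v).sum())

# Simulated shares: add Type I extreme-value (Gumbel) disturbances and let
# each simulated consumer pick the alternative with the highest utility.
n = 400_000
eps = rng.gumbel(loc=0.0, scale=1.0, size=(n, v_all.size))
choice = np.argmax(v_all + eps, axis=1)
shares_sim = np.bincount(choice, minlength=v_all.size)[1:] / n

print("formula  :", shares_formula)
print("simulated:", shares_sim)
```

The two sets of shares agree up to simulation noise, which is the content of the claim that (8) reduces to (6N).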

Elasticities of Demand

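To find the elasticity of demand, we need to calculate $\partial s_{jt} / \partial p_{kt}$ for products $k = 1, \ldots, 50$. It is helpful to rewrite equation (6N) by defining $M_j$ as

$$
M_j \equiv e^{\mathbf{x}_{jt}\boldsymbol{\beta} - \alpha p_{jt} + \xi_{jt}}, \tag{9}
$$

so

$$
s_{jt} = \frac{M_j}{1 + \sum_{k=1}^{50} M_k}. \tag{10}
$$

Then

$$
\frac{\partial s_{jt}}{\partial p_{kt}} = \frac{\partial M_j / \partial p_{kt}}{1 + \sum_{k=1}^{50} M_k} + \left( \frac{-M_j}{\left(1 + \sum_{k=1}^{50} M_k\right)^2} \right) \left( \frac{\partial M_k}{\partial p_{kt}} \right). \tag{11}
$$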

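First, suppose $k \neq j$. Then

$$
\begin{aligned}
\frac{\partial s_{jt}}{\partial p_{kt}} &= \frac{0}{1 + \sum_{k=1}^{50} M_k} + \left( \frac{-M_j}{\left(1 + \sum_{k=1}^{50} M_k\right)^2} \right) (-\alpha M_k) \\
&= \alpha \left( \frac{M_j}{1 + \sum_{k=1}^{50} M_k} \right) \left( \frac{M_k}{1 + \sum_{k=1}^{50} M_k} \right) \\
&= \alpha s_{jt} s_{kt}.
\end{aligned}
\tag{12}
$$

Second, suppose $k = j$. Then

$$
\begin{aligned}
\frac{\partial s_{jt}}{\partial p_{jt}} &= \frac{-\alpha M_j}{1 + \sum_{k=1}^{50} M_k} + \left( \frac{-M_j}{\left(1 + \sum_{k=1}^{50} M_k\right)^2} \right) (-\alpha M_j) \\
&= -\alpha s_{jt} + \alpha s_{jt}^2 \\
&= -\alpha s_{jt} (1 - s_{jt}).
\end{aligned}
\tag{13}
$$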

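We can now calculate the elasticity of the market share: the percentage change in the market share of product j when the price of product k goes up:

$$
\eta_{jkt} \equiv \frac{\%\Delta s_{jt}}{\%\Delta p_{kt}} = \frac{\partial s_{jt}}{\partial p_{kt}} \cdot \frac{p_{kt}}{s_{jt}} =
\begin{cases}
-\alpha p_{jt} (1 - s_{jt}) & \text{if } j = k \\
\alpha p_{kt} s_{kt} & \text{otherwise.}
\end{cases}
\tag{14}
$$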
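The elasticity formula (14) can be checked against numerical derivatives of the share formula (6N). The sketch below assumes three products and invented values for $\alpha$, the prices, and $\mathbf{x}_{jt}\boldsymbol{\beta} + \xi_{jt}$; the function names are hypothetical and not taken from any library.

```python
import numpy as np


def logit_shares(delta):
    """Market shares from (6N), where delta[j] = x_j*beta - alpha*p_j + xi_j."""
    m = np.exp(delta)
    return m / (1.0 + m.sum())


def logit_elasticities(alpha, p, s):
    """Matrix eta[j, k] of share elasticities from equation (14)."""
    # Off-diagonal entries: alpha * p_k * s_k, the same in every row j.
    eta = np.tile(alpha * p * s, (s.size, 1))
    # Diagonal entries: -alpha * p_j * (1 - s_j).
    np.fill_diagonal(eta, -alpha * p * (1.0 - s))
    return eta


# Hypothetical values, chosen only for illustration.
alpha = 2.0
p = np.array([1.0, 1.5, 2.0])
xb_plus_xi = np.array([1.0, 2.0, 2.5])   # x*beta + xi for each product

s = logit_shares(xb_plus_xi - alpha * p)
eta = logit_elasticities(alpha, p, s)

# Finite-difference check: raise one price at a time by a small step h.
h = 1e-6
eta_fd = np.empty_like(eta)
for k in range(p.size):
    p_up = p.copy()
    p_up[k] += h
    s_up = logit_shares(xb_plus_xi - alpha * p_up)
    eta_fd[:, k] = (s_up - s) / h * p[k] / s

print("analytic elasticities:\n", eta)
print("max gap vs finite differences:", np.abs(eta - eta_fd).max())
```

Each column k of the elasticity matrix has identical off-diagonal entries, which is the feature of simple logit criticized in the next subsection.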

Problems with Multinomial Logit

The theoretical structure of the elasticities in equation (14) is unrealistic in two ways.

1. If market shares are small, as is frequently the case, then $\alpha(1 - s_{jt})$ is close to $\alpha$, so that own-price elasticities are close to $-\alpha p_{jt}$. This says that if the price is lower, demand is less elastic, less responsive to price, which in turn implies that the seller will charge a higher markup on products with low marginal cost. There is no particular reason why we want to assume this, and in reality we often see that markups are higher on products with higher marginal cost, e.g. luxury cars compared to cheap cars.

2. The cross-price elasticity of product j with respect to the price of product k is $\alpha p_{kt} s_{kt}$, which only depends on features of product k: its price and market share. If product k raises its price, it loses customers equally, in percentage terms, to each other product.¹ This is a standard defect of multinomial logit, which McFadden’s famous red-bus/blue-bus example illustrates. If you can choose among going to work by riding the red bus, the blue bus, or a bicycle, and the price of riding the red bus rises, are you equally likely to shift to the blue bus and to riding a bicycle? Of course not, but the multinomial logit model says that you will.

¹ An apparent third feature, that a high price of product k means it has higher cross-elasticities with every other product, is actually the same problem as problem (1).
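A few lines of arithmetic make the red-bus/blue-bus problem concrete. In the sketch below, which is only an illustration, the bicycle plays the role of the outside good with utility normalized to zero, the two buses are given identical made-up qualities, and `shares` is a hypothetical helper implementing (6N).

```python
import numpy as np

alpha = 1.0
quality = np.array([1.0, 1.0])   # hypothetical x*beta + xi for red bus, blue bus


def shares(p):
    """Simple-logit shares [red bus, blue bus, bicycle]; bicycle is the outside good."""
    m = np.exp(quality - alpha * p)
    inside = m / (1.0 + m.sum())
    return np.append(inside, 1.0 - inside.sum())


before = shares(np.array([1.0, 1.0]))
after = shares(np.array([2.0, 1.0]))   # the red bus fare rises

# Riders gained by the blue bus and by the bicycle.
gain = after[1:] - before[1:]

print("shares before:", before)
print("shares after :", after)
print("blue/bicycle share ratio before and after:",
      before[1] / before[2], after[1] / after[2])
print("blue/bicycle ratio of riders gained:", gain[0] / gain[1])
```

The ratio of blue-bus riders to bicycle riders is the same before and after the fare increase, and the riders who leave the red bus split between the other two choices in exactly that ratio, even though the blue bus is the far closer substitute.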

Another way to proceed is to use nested logit. In the red-bus/blue-bus example, you would first decide whether the blue bus or the red bus had the highest utility, and then decide whether the best bus’s utility was greater than the bicycle’s. In such a model, if the price of the red bus rose, you might switch to the blue bus, but you would not switch to the bicycle. A problem with nested logit, however, is that you need to use prior information to decide how to construct the nesting. In the case of automobiles, we might want to make the first choice to be between a large car and small car, but it is not clear that this makes more sense than making the first choice to be between a high-quality car and a low-quality car, especially if we are forcing all consumers to use the same nesting.

The Random Coefficients Logit Model

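An alternative to simple logit or nested logit is to assume that the parameters, the marginal utilities of the product characteristics, are different across consumers, and are determined by the consumer characteristics. Random coefficients is the name used for this approach, though it is a somewhat misleading name. The approach does not say that the consumer behaves randomly. Rather, each consumer has fixed coefficients in his utility function, but these coefficients are a function both of fixed parameters that multiply his observed characteristics and of unobservable characteristics that might as well be random. Thus, “random coefficients” really means “individual coefficients”. We will denote the average values of the parameters $\alpha_i$ and $\boldsymbol{\beta}_i$ across consumers as $\alpha$ and $\boldsymbol{\beta}$, and assume the following specification:

$$
\begin{aligned}
\begin{pmatrix} \alpha_i \\ \boldsymbol{\beta}_i \end{pmatrix}
&= \begin{pmatrix} \alpha \\ \boldsymbol{\beta} \end{pmatrix} + \boldsymbol{\Pi} \mathbf{D}_i + \boldsymbol{\Sigma} \boldsymbol{\nu}_i \\
&= \begin{pmatrix} \alpha \\ \boldsymbol{\beta} \end{pmatrix}
+ \begin{pmatrix} \boldsymbol{\Pi}_\alpha \\ \boldsymbol{\Pi}_\beta \end{pmatrix} \mathbf{D}_i
+ \begin{pmatrix} \boldsymbol{\Sigma}_\alpha \\ \boldsymbol{\Sigma}_\beta \end{pmatrix} (\nu_{i\alpha}, \boldsymbol{\nu}_{i\beta})
\end{aligned}
\tag{2N}
$$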

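where $\mathbf{D}_i$ is a 4 × 1 vector of consumer i’s observable characteristics, $\boldsymbol{\nu}_i$ is a 7 × 1 vector of the effect of consumer i’s unobservable characteristics on his $\alpha_i$ and $\boldsymbol{\beta}_i$ parameters; $\boldsymbol{\Pi}$ is a 7 × 4 matrix of how parameters (the $\alpha_i$ and the 6 elements of $\boldsymbol{\beta}_i$) depend on consumer observables, $\boldsymbol{\Sigma}$ is a 7 × 7 matrix of how those 7 parameters depend on the unobservables, and $(\nu_{i\alpha}, \boldsymbol{\nu}_{i\beta})$, $(\boldsymbol{\Pi}_\alpha, \boldsymbol{\Pi}_\beta)$ and $(\boldsymbol{\Sigma}_\alpha, \boldsymbol{\Sigma}_\beta)$ just split each vector or matrix into two parts.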

We will denote the distributions of $\mathbf{D}$ and $\boldsymbol{\nu}$ by $P^*_D(\mathbf{D})$ and $P^*_\nu(\boldsymbol{\nu})$. Since we’ll be estimating the distribution of the consumer characteristics $\mathbf{D}$, you will see the notation $\hat{P}^*_D(\mathbf{D})$ show up too. We will assume that $P^*_\nu(\boldsymbol{\nu})$ is multivariate normal.²

Utility in the Random Coefficients Logit Model

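Equation (1N) becomes³

$$
\begin{aligned}
u_{ijt} &= \alpha_i (y_i - p_{jt}) + \mathbf{x}_{jt}\boldsymbol{\beta}_i + \xi_{jt} + \varepsilon_{ijt} \\
&= \alpha_i y_i - (\alpha + \boldsymbol{\Pi}_\alpha \mathbf{D}_i + \boldsymbol{\Sigma}_\alpha \nu_{i\alpha}) p_{jt} + \mathbf{x}_{jt} (\boldsymbol{\beta} + \boldsymbol{\Pi}_\beta \mathbf{D}_i + \boldsymbol{\Sigma}_\beta \boldsymbol{\nu}_{i\beta}) + \xi_{jt} + \varepsilon_{ijt} \\
&= \alpha_i y_i + (-\alpha p_{jt} + \mathbf{x}_{jt}\boldsymbol{\beta} + \xi_{jt}) - (\boldsymbol{\Pi}_\alpha \mathbf{D}_i + \boldsymbol{\Sigma}_\alpha \nu_{i\alpha}) p_{jt} + \mathbf{x}_{jt} (\boldsymbol{\Pi}_\beta \mathbf{D}_i + \boldsymbol{\Sigma}_\beta \boldsymbol{\nu}_{i\beta}) + \varepsilon_{ijt} \\
&= \alpha_i y_i + (-\alpha p_{jt} + \mathbf{x}_{jt}\boldsymbol{\beta} + \xi_{jt}) + (-p_{jt}, \mathbf{x}_{jt}) (\boldsymbol{\Pi} \mathbf{D}_i + \boldsymbol{\Sigma} \boldsymbol{\nu}_i) + \varepsilon_{ijt} \\
&= \alpha_i y_i + \delta_{jt} + \mu_{ijt} + \varepsilon_{ijt}, \qquad j = 1, \ldots, 50, \; t = 1, \ldots, 20.
\end{aligned}
\tag{15}
$$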

What I have done above is to reorganize the terms to separate them into four parts. First, there is the utility from income, $\alpha_i y_i$. This plays no part in the consumer’s choice, so it will drop out.

Second, there is the “mean utility”, $\delta_{jt}$, which is the component of utility from a consumer’s choice of product j that is the same across all consumers.

$$
\delta_{jt} \equiv -\alpha p_{jt} + \mathbf{x}_{jt}\boldsymbol{\beta} + \xi_{jt} \tag{16}
$$

Third and fourth, there is a heteroskedastic disturbance, $\mu_{ijt}$, and a homoskedastic i.i.d. disturbance, $\varepsilon_{ijt}$.

$$
\mu_{ijt} \equiv (-p_{jt}, \mathbf{x}_{jt}) (\boldsymbol{\Pi} \mathbf{D}_i + \boldsymbol{\Sigma} \boldsymbol{\nu}_i) \tag{17}
$$

² Nevo variously uses $P^*_D(\mathbf{D})$, $\hat{P}^*_D(\mathbf{D})$, $P^*_\nu(\boldsymbol{\nu})$, and $\hat{P}^*_\nu(\boldsymbol{\nu})$ in his exposition. I have tried to follow him, but I may simply have misunderstood what he is doing.

³ There is a typographical error in the Nevo paper on p. 520 in equation (3N): u instead of µ in each line.

Market Shares and Elasticities in the BLP Model

If we use the Type I extreme value distribution for $\varepsilon_{ijt}$, then the market share of product j for a consumer of type i turns out to be

$$
s_{ijt} = \frac{e^{\delta_{jt} + \mu_{ijt}}}{1 + \sum_{k=1}^{50} e^{\delta_{kt} + \mu_{ikt}}}. \tag{18}
$$
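The bookkeeping in equations (2N) and (15) to (18) is easier to follow with the array shapes written out. The sketch below works through one month with simulated inputs. Everything numerical in it is an assumption made for illustration: the data are random draws, $\boldsymbol{\Sigma}$ is taken to be diagonal, and the variable names are not from Nevo or BLP.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K, d = 400, 50, 6, 4   # consumers, products, product chars, consumer chars

# Hypothetical data for one month t.
x = rng.normal(size=(J, K))            # product characteristics x_jt
p = rng.uniform(1.0, 3.0, size=J)      # prices p_jt
xi = rng.normal(scale=0.1, size=J)     # unobserved product quality xi_jt
D = rng.normal(size=(I, d))            # consumer observables D_i
nu = rng.normal(size=(I, 1 + K))       # consumer unobservables nu_i (standard normal)

# Hypothetical parameters.
alpha_bar = 2.0
beta_bar = rng.normal(size=K)
Pi = 0.1 * rng.normal(size=(1 + K, d))   # 7 x 4
Sigma = 0.1 * np.eye(1 + K)              # 7 x 7, diagonal here for simplicity

# Equation (2N): individual coefficients (alpha_i, beta_i), one row per consumer.
deviation = D @ Pi.T + nu @ Sigma.T                  # I x 7, Pi*D_i + Sigma*nu_i
theta_i = np.concatenate(([alpha_bar], beta_bar)) + deviation
alpha_i, beta_i = theta_i[:, 0], theta_i[:, 1:]

# Equations (16) and (17): mean utility and individual deviations from it.
delta = -alpha_bar * p + x @ beta_bar + xi           # length J
minus_p_and_x = np.column_stack((-p, x))             # J x 7, rows (-p_jt, x_jt)
mu = deviation @ minus_p_and_x.T                     # I x J

# Check the rearrangement in (15): utility net of alpha_i*y_i and epsilon_ijt.
direct = -np.outer(alpha_i, p) + beta_i @ x.T + xi   # I x J
max_gap = np.abs(direct - (delta + mu)).max()

# Equation (18): choice probabilities of each consumer type for each product.
expu = np.exp(delta + mu)
s_ij = expu / (1.0 + expu.sum(axis=1, keepdims=True))

print("largest gap between the two forms of utility:", max_gap)
print("shape of s_ij:", s_ij.shape)
```

Each row of `s_ij` sums to less than one, with the remainder being that consumer type’s probability of choosing the outside good.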

Recall that we denote the distributions of $\mathbf{D}$ and $\boldsymbol{\nu}$ by $P^*_D(\mathbf{D})$ and $P^*_\nu(\boldsymbol{\nu})$. Since we will be estimating the distribution of the consumer characteristics $\mathbf{D}$, you will see the notation $\hat{P}^*_D(\mathbf{D})$ show up too.

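The overall market share of product j in month t is found by integrating the market shares picked by each consumer in equation (18) across the individual types, weighting each type by its probability in the population:

$$
\begin{aligned}
s_{jt} &= \int_{\nu} \int_{D} s_{ijt} \, d\hat{P}^*_D(\mathbf{D}) \, dP^*_\nu(\boldsymbol{\nu}) \\
&= \int_{\nu} \int_{D} \left[ \frac{e^{\delta_{jt} + \mu_{ijt}}}{1 + \sum_{k=1}^{50} e^{\delta_{kt} + \mu_{ikt}}} \right] d\hat{P}^*_D(\mathbf{D}) \, dP^*_\nu(\boldsymbol{\nu})
\end{aligned}
\tag{19}
$$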

Equation (19) adds up the market shares of different types i of consumers based on how common that type i is in month t.

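The price elasticity of the market share of product j with respect to the price of product k is

$$
\eta_{jkt} \equiv \frac{\partial s_{jt}}{\partial p_{kt}} \cdot \frac{p_{kt}}{s_{jt}} =
\begin{cases}
-\dfrac{p_{jt}}{s_{jt}} \displaystyle\int_{\nu} \int_{D} \alpha_i s_{ijt} (1 - s_{ijt}) \, d\hat{P}^*_D(\mathbf{D}) \, dP^*_\nu(\boldsymbol{\nu}) & \text{if } j = k \\[2ex]
\dfrac{p_{kt}}{s_{jt}} \displaystyle\int_{\nu} \int_{D} \alpha_i s_{ijt} s_{ikt} \, d\hat{P}^*_D(\mathbf{D}) \, dP^*_\nu(\boldsymbol{\nu}) & \text{otherwise.}
\end{cases}
\tag{20}
$$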

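This is harder to estimate than the ordinary logit model, whose analog of equation (19) is equation (6N), repeated below.

$$
s_{jt} = \frac{e^{\mathbf{x}_{jt}\boldsymbol{\beta} - \alpha p_{jt} + \xi_{jt}}}{1 + \sum_{k=1}^{50} e^{\mathbf{x}_{kt}\boldsymbol{\beta} - \alpha p_{kt} + \xi_{kt}}} \tag{6N}
$$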

The difficulty comes from the integrals in (19). Usually these need to be calculated by simulation, starting with our real-world knowledge of the distribution of consumer types i in a given month t and the characteristics of product j, and combining that with estimates of how different consumer types value different product characteristics. This suggests that we might begin with an initial set of parameter estimates, calculate what market shares that generates for each month, see how those match the observed market shares, and then pick a new set of parameter estimates to try to get a closer match.
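A minimal sketch of the simulation step follows, under strong simplifying assumptions: a single month, 5 products with 2 characteristics rather than 50 with 6, and consumer-specific deviations $\boldsymbol{\Pi}\mathbf{D}_i + \boldsymbol{\Sigma}\boldsymbol{\nu}_i$ drawn directly as independent normals instead of from demographic data. None of the numbers come from the paper. The integral in (19) is replaced by an average over simulated consumers, and the same draws give the elasticities of the kind in (20), which are then checked against finite differences of the simulated shares.

```python
import numpy as np

rng = np.random.default_rng(3)
ns, J, K = 2000, 5, 2   # simulated consumers, products, characteristics

# Hypothetical data and candidate parameter values for one month.
x = rng.normal(size=(J, K))
p = rng.uniform(1.0, 2.0, size=J)
xi = np.zeros(J)
alpha_bar, beta_bar = 1.5, np.array([1.0, -0.5])

# Draws of Pi*D_i + Sigma*nu_i, held fixed across evaluations of the shares.
dev = 0.3 * rng.normal(size=(ns, 1 + K))
alpha_i = alpha_bar + dev[:, 0]


def simulated_shares(prices):
    """Average the individual logit probabilities (18) over the simulated consumers."""
    delta = -alpha_bar * prices + x @ beta_bar + xi
    mu = dev @ np.column_stack((-prices, x)).T
    e = np.exp(delta + mu)
    s_i = e / (1.0 + e.sum(axis=1, keepdims=True))
    return s_i.mean(axis=0), s_i


s, s_i = simulated_shares(p)

# Simulated share derivatives: averages of alpha_i * s_ij * s_ik off the
# diagonal and of -alpha_i * s_ij * (1 - s_ij) on the diagonal.
w = alpha_i[:, None] * s_i
ds_dp = (w.T @ s_i) / ns
ds_dp[np.diag_indices(J)] = -(w * (1.0 - s_i)).mean(axis=0)
eta = ds_dp * p[None, :] / s[:, None]   # eta[j, k] = ds_j/dp_k * p_k / s_j

# Finite-difference check using the same simulated consumers.
h = 1e-6
eta_fd = np.empty((J, J))
for k in range(J):
    p_up = p.copy()
    p_up[k] += h
    eta_fd[:, k] = (simulated_shares(p_up)[0] - s) / h * p[k] / s

print("simulated shares:", s)
print("max gap, analytic vs finite-difference elasticities:",
      np.abs(eta - eta_fd).max())
```

In an estimation routine a function like `simulated_shares` would be called repeatedly with different candidate parameters and the results compared with the observed shares. Holding the draws in `dev` fixed across calls keeps the simulated shares a smooth function of the parameters.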

It is worth noting at this point that what is special about random-coefficients logit is not that it allows for interactions between product and consumer characteristics, but that it does so in a structural model. One non-structural approach would have been to use ordinary least squares to estimate the following equation, including product dummies to account for the $\xi_{jt}$ product fixed effects.

$$
s_{jt} = \mathbf{x}_{jt}\boldsymbol{\beta} - \alpha p_{jt} + \xi_{jt} \tag{21}
$$

Like simple logit, the method of equation (21) implies that the market share depends on the product characteristics and prices but not on any interaction between those things and consumer characteristics. We can incorporate consumer characteristics by creating new variables in a vector $\mathbf{d}_t$ that represents the mean value of the 4 consumer characteristics in month t, and then interacting the 4 consumer variables in $\mathbf{d}_t$ with the 6 product variables in $\mathbf{x}_t$ to create a 1 × 24 variable $\mathbf{w}_t$. Then we could use least squares with product dummies to estimate

$$
s_{jt} = \mathbf{x}_{jt}\boldsymbol{\beta} - \alpha p_{jt} + \mathbf{d}_t \boldsymbol{\theta}_1 + \mathbf{w}_t \boldsymbol{\theta}_w + \xi_{jt} \tag{22}
$$

Equation (22) lets market shares depend directly on the 4 consumer characteristics and less directly on them via the 24 consumer-product characteristic interactions. Thus, it has some of the flexibility of the BLP model. It is not, however, the result of a consistent theoretical model. Even aside from whether rational consumer behavior could result in a reduced form like (21) or (22), those equations do not take account of the relationships between the market shares of the different products: the predicted market shares should not add up to more than one, for example. The random-coefficient logit model has the advantage of consistency, though it is more difficult to perform the estimation.

Before going on to estimate the random coefficients model, however, recall that regardless of how we specify the utility function (logit, nested logit, random-coefficients logit, or something else) we face the basic simultaneity problem of estimating demand and supply functions. Market shares depend on prices and disturbance terms, but prices will also depend on the disturbance terms. If demand is unusually strong, prices will be higher too. This calls for some kind of instrumental variables estimation, and the appropriate kind here is the generalized method of moments.

The Method of Moments

The generalized method of moments (GMM) of Hansen (1982) combines instrumental variables, generalized least squares, and a nonlinear specification with the added complexity of what to do when there are so many instruments that the equation is overidentified. It helps to separate these things out. We will go one step at a time, before circling back to the BLP method’s use of GMM.

What we are trying to do in econometrics is to estimate the parameters $\beta$ of some theoretical model $y = f(x; \beta) + \varepsilon$ given the values of y and x that we observe. The two estimation approaches most used in economics are least squares and maximum likelihood. The principle of least squares is to find an equation that fits the data in the sense of minimizing the distance between the estimated equation and the actual data. OLS uses the sum of squared errors as its measure of distance, but if we used the sum of absolute deviations we would be estimating in the same spirit. The principle of maximum likelihood is to assume that the disturbance follows a particular distribution and to choose the equation parameters to maximize the likelihood that we observe the data we actually do see.

The method of moments is closer to least squares in spirit. It does not make assumptions on the distribution function shape of the disturbance (though it might make assumptions on whether variables are correlated). What it does is start with how the disturbance term relates to the exogenous variables. The name “method of moments” is misleading. The “first moment” of a distribution is the average and the “second moment” is the variance. The typical method of moments in economics uses neither. It uses a covariance condition instead. A better name would be “method of analogy”, because the method of moments works by trying to match sums of observed values to theoretical equations that the modeller specifies.⁴

⁴ Even then, it is hard to differentiate this from the least squares approach. The obvious theoretical moment condition to use is Ey = Xβ or Ey − Xβ = 0, the sample analog to which is y − Xβ̂ = 0, minimizing which is exactly the basic argument for OLS. So perhaps we can’t make sense of the name.

Jeffrey Wooldridge has an excellent example of how this works in his 2001 article in The Journal of Economic Perspectives. Suppose you were trying to estimate the mean of a population, µ. The method of moments would be to use the sample analog, the sample mean µ̂1 = x̄. But suppose you had additional information: that the population variance is three times the population mean, so σ² = 3µ. There would be an alternative way to use the method of moments, based on the sample variance s²: by making your estimate µ̂2 = s²/3. But these two estimates would be different, because each is computed differently from the sample data: µ̂1 ≠ µ̂2. Which should you use? Both estimates of µ are consistent; that is, in large samples they will give reliable answers. The generalized method of moments, however, gives a way to combine both estimates, a way to produce a superior estimator as a weighted average of µ̂1 and µ̂2. The optimal weight depends on the variance of each estimate; the more variable single estimator should be weighted less heavily. This is rather like the way in which generalized least squares weights individual observations to generate an estimator that is more efficient than ordinary least squares, even though OLS is consistent. Thus, we have the “generalized” method of moments. We get the extra efficiency because GMM uses the extra information that σ² = 3µ. If we had still more information, we would want to include that too. In the application to BLP, the extra information will not be a priori information linking the mean and variance, but the extra instruments over the minimum needed to do instrumental variables estimation: the overidentifying restrictions.
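The mean-and-variance example can be tried numerically. The sketch below is only an illustration under assumptions of my own: the data are drawn from a gamma distribution with scale 3 (chosen because its variance is then three times its mean), the minimization is a crude grid search, and the function names are hypothetical. It stacks the two moment conditions E(x − µ) = 0 and E[(x − µ)² − 3µ] = 0 and weights them, in a second step, by the inverse of their estimated covariance matrix.

```python
import numpy as np

def sample_moments(mu, m1, m2):
    """Sample analogs of E[x - mu] and E[(x - mu)^2 - 3*mu] for scalar or array mu."""
    mu = np.asarray(mu, dtype=float)
    g1 = m1 - mu
    g2 = m2 - 2.0 * mu * m1 + mu**2 - 3.0 * mu
    return np.stack([g1, g2], axis=-1)

def gmm_mean(x, n_grid=4001):
    """Two-step GMM estimate of mu by grid search (illustrative, not efficient code)."""
    m1, m2 = x.mean(), (x**2).mean()
    grid = np.linspace(0.5 * m1, 1.5 * m1, n_grid)    # candidate values of mu
    G = sample_moments(grid, m1, m2)                   # n_grid x 2 matrix of moments

    def argmin_on_grid(W):
        q = np.einsum("gi,ij,gj->g", G, W, G)          # g' W g at every grid point
        return grid[np.argmin(q)]

    mu_step1 = argmin_on_grid(np.eye(2))               # step 1: equal weights
    # Step 2: weight by the inverse covariance of the two moment functions.
    g_obs = np.column_stack([x - mu_step1, (x - mu_step1) ** 2 - 3.0 * mu_step1])
    W = np.linalg.inv(np.cov(g_obs, rowvar=False))
    return argmin_on_grid(W)

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=20_000)       # mean 6, variance 18 = 3 * 6

mu_hat_1 = x.mean()                # method of moments using the mean
mu_hat_2 = x.var(ddof=1) / 3.0     # method of moments using the variance
mu_gmm = gmm_mean(x)               # GMM using both conditions
print(mu_hat_1, mu_hat_2, mu_gmm)
```

The two simple estimates generally differ, and the GMM estimate leans toward whichever moment condition is measured more precisely.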

Let us set up some notation to go over this in detail. Suppose we have T observations and we are trying to explain the observations on y, a T × 1 vector, in terms of M observed variables and one unobserved variable. The observed explanatory variables are x1, ..., xM, where each xm is a T × 1 vector and we can line them all up as X, a T × M matrix. The unobserved variable is the disturbance term ε, which is a T × 1 vector.

We need to assume something about the functional form and its parameters. Let’s assume linearity, so

y = Xβ + ε, (23)

where β is an M × 1 vector of parameters. The expression Xβ is (T × M)(M × 1), so it comes out to be T × 1, the same as y.

Most commonly (and practically by definition) we assume that the observed and unobserved variables are independent, which implies (but is not equivalent to)

E x′m ε = 0, m = 1, ..., M. (24)

The sample analog to this is

X′ε̂ = 0, (25)

where 0 is an M × 1 vector of zeroes. That gives us M equations (one for each xm) for M unknowns (the M parameters βm). The value of our estimate of the disturbance, ε̂, is

ε̂ ≡ y −Xβ̂, (26)

where the M × 1 vector β̂ contains our M estimated parameters. Substituting (26) into (25), we get

X′(y −Xβ̂) = 0, (27)

so

X′y = X′Xβ̂ (28)

and

β̂ = (X′X)⁻¹X′y (29)

Thus, we get the familiar OLS estimator, but from the sample-theory analog principle rather than from minimizing squared errors. The properties are the same as the OLS estimator, since the estimator is identical, so it is the best linear unbiased estimator (BLUE) and so forth. It should not be too surprising that we can get the OLS estimator using different approaches. Recall that the OLS estimator is also the maximum likelihood estimator if the disturbances are normally distributed.
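As a quick numerical check of equations (25) to (29), the following sketch solves the sample moment condition on simulated data and compares the result with a standard least-squares routine. The sample size, the coefficient values, and the seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, M = 500, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, M - 1))])  # T x M regressors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=T)

# Method of moments: solve X'X b = X'y, i.e. equation (28).
beta_mm = np.linalg.solve(X.T @ X, X.T @ y)

# Least squares via NumPy's solver, for comparison.
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# The sample moment X'(y - X b) should be numerically zero at the solution.
moment = X.T @ (y - X @ beta_mm)
print(beta_mm, beta_ls, moment)
```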

We solved for β̂ analytically here, but that is not an essential part of the method of moments. Fortunately not, because sometimes the moment condition (which was (25) here) has no exact solution. Something we could always do is to turn the problem (using [27]) into

$$\underset{\hat{\beta}}{\text{Minimize}} \;\; X'(y - X\hat{\beta}) \tag{30}$$

We could, that is, try searching for the M values in β̂ that minimize X′ε̂. A computer could do that for us. We would give the computer an arbitrary starting value, β̂1, and calculate the size of X′ε̂1 using β̂1. We would also give the computer a rule for searching for a second value, β̂2, and a third value, telling the computer to stop searching when X′ε̂i is small enough. You could construct your own program using a computer language such as Basic or C, or you could use a minimization command from a computer language such as Matlab or Gauss.

You would immediately run into a profound problem, however. What does “size” mean? X′ε̂ is an M × 1 vector, so it contains M numbers to be minimized, not just 1. What if one set of β̂ parameters yields X′ε̂ = (0, 0, 0, 50) and another yields X′ε̂ = (20, 20, 20, 20)? Which has minimized X′ε̂? We need a definition of the size of a vector. One definition you could use is to add up all the elements of the vector, in which case the two vectors would have sizes 50 and 80. Another is that you could add up the squares of the elements, in which case the sizes are 2,500 and 1,600, the opposite ranking.

Here, we avoided the problem of defining size because we could find an analytic perfect solution to the minimization problem. Any sensible definition of size would say that (0, 0, 0, 0, 0) is the smallest 5 × 1 vector possible. In other method of moments situations, you will have to confront the problem directly.

Before we do that, however, let’s look at two other layers of complexity that we can put into the method of moments: making the disturbances not be i.i.d. and requiring instrumental variables. We will deal with these separately before first returning to the vector size problem and then combining everything at the end.

Generalized Least Squares and the Method of Moments

As I said above, when the method of moments approach gives us the same estimator as the least squares approach, it gives us the same properties, good and bad. In particular, if the disturbances are correlated across observations (serial correlation) or different observations have disturbances with different probability distributions (heteroskedasticity), OLS is still consistent, but it is inefficient and its conventional standard errors are biased estimates of the standard deviations of the coefficient estimates, so hypothesis testing is unreliable.

There is nothing in the logic of least squares that tells us how to get to the GLS estimator which takes care of these problems. But the GLS estimator is, indeed, intuitive, both for serial correlation and for heteroskedasticity: it puts less weight on less informative data.

If the disturbances in observations 1, 2, and 3 are highly correlated, then our estimator ought to weight those observations less, because they don’t contain as much information as three observations with independent disturbances. If you’re trying to estimate the temperature precisely, and you measure it with 100 thermometers that were manufactured to be identical, so they all have the same measurement error, you don’t get as precise an estimate as if you used 100 thermometers with independent errors.

Similarly, if the disturbances in observations 1 to 40 are distributed with a variance of 10 and the disturbances in observations 41 to 80 are distributed with a variance of 500, then our estimator ought to weight observations 1 to 40 more heavily. They contain less noise.

But these intuitions are not the intuitions of least squares, nor of the method of moments. They are statistical intuitions, justified by either a frequentist or Bayesian approach. We can nonetheless tack them on top of the least squares or method of moments estimator. In the method of moments, we can alter our moment condition by inserting the inverse of the variance-covariance matrix, Φ⁻¹, a T × T matrix since it shows the covariance between any two of the T disturbances εt. The theoretical moment condition becomes

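$$E\,X'\Phi^{-1}\varepsilon = 0. \tag{31}$$

The sample analog is

$$X'\hat{\Phi}^{-1}\hat{\varepsilon} = 0, \tag{32}$$

or

$$X'\hat{\Phi}^{-1}(y - X\hat{\beta}) = 0, \tag{33}$$

which solves to

$$\hat{\beta} = (X'\hat{\Phi}^{-1}X)^{-1}X'\hat{\Phi}^{-1}y. \tag{34}$$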

This estimator is identical to the GLS estimator used in the least-squares approach. I didn’t say how to calculate the variance-covariance estimate Φ̂⁻¹, but all we need is a consistent estimator for it, and we could do that as in GLS by iterating between calculating β̂ to get estimates for ε̂ and using those estimates to calculate Φ̂.
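The weighted moment condition can be illustrated with the heteroskedasticity example from the text: 80 observations, the first 40 with disturbance variance 10 and the last 40 with variance 500. In this sketch Φ is treated as known, which is a simplifying assumption of mine; in practice it would be estimated by the iteration just described. The function name `gls` and the simulated coefficients are hypothetical.

```python
import numpy as np

def gls(X, y, Phi):
    """Solve X' Phi^{-1} (y - X b) = 0 for b without forming Phi^{-1} explicitly."""
    Phi_inv_X = np.linalg.solve(Phi, X)
    Phi_inv_y = np.linalg.solve(Phi, y)
    return np.linalg.solve(X.T @ Phi_inv_X, X.T @ Phi_inv_y)

rng = np.random.default_rng(2)
T = 80
X = np.column_stack([np.ones(T), rng.normal(size=T)])
variances = np.where(np.arange(T) < 40, 10.0, 500.0)   # low noise first, high noise after
y = X @ np.array([1.0, 2.0]) + rng.normal(size=T) * np.sqrt(variances)

Phi = np.diag(variances)          # assumed known here
beta_gls = gls(X, y, Phi)

# Equivalent check: OLS after dividing each observation by its standard deviation.
w = 1.0 / np.sqrt(variances)
beta_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)

# Weighted sample moment, which should be numerically zero.
weighted_moment = X.T @ np.linalg.solve(Phi, y - X @ beta_gls)
print(beta_gls, beta_wls, weighted_moment)
```

The noisy observations still enter the estimate, but each one counts for one-fiftieth as much as a low-noise observation.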

Instrumental Variables and the Method of Moments

Now let’s put aside the issues concerning GLS and the variance-covariance estimates so we can look at a different problem: endogeneity of the explanatory variables. What this means is that EX′ε does not equal zero even in our theoretical model. Some of the x’s are caused by the y, or something else jointly causes them.

The solution to this problem is instrumental variables. We need to find some instruments, variables that (a) do not cause y, (b) are correlated with x, and (c) are uncorrelated with ε. For each x that is endogenous, we need to find at least one instrument. Requirement (a) says that our theory must be able to rule out the instruments from being part of the original equation we are estimating, i.e., we can’t use one xm to instrument for another xm. The other two requirements can be interpreted as requiring us to set up a second theoretical equation, one in which xm is explained by the instruments, among other things, though it is not a problem if our second equation omits some relevant variables, since we won’t care if the parameters we estimate in it are biased.

Let’s define a new matrix Z consisting of T observations on all of the xm variables which are exogenous plus at least one instrument for each xm that is endogenous. Thus, Z will be T × N, where N ≥ M. For the moment, let’s assume that N = M, which means we have one instrument for each endogenous xm and the system is exactly identified, not overidentified. Our system then consists of

y = Xβ + ε,    X = ZΓ + ν, (35)

where Γ is an N × M matrix of coefficients so that we have M equations for how the N exogenous variables affect the M explanatory variables in the first equation.

Our theoretical moment equation is different now. We are not assuming that the X are all independent of ε. Instead, we are assuming that the variables in Z are all independent of ε. Thus, the theoretical equation is

EZ′ε = 0. (36)

The sample analog is

Z′ε̂ = 0, (37)

where 0 is an M × 1 vector of zeroes, or

Z′(y −Xβ̂) = 0. (38)

As before, we can solve this. The first step is

Z′y = Z′Xβ̂ (39)

and the second step is

β̂ = (Z′X)⁻¹Z′y. (40)

Thus, we get the standard IV estimator, using the logic of the method of moments.
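A small simulation may make the exactly identified case concrete. The data-generating process below is my own invention for illustration: a single regressor x is made endogenous by a common shock u that also enters the disturbance, and a single instrument z shifts x but is unrelated to the disturbance. The sketch compares the IV estimator of equation (40) with OLS.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 5000
z = rng.normal(size=T)                  # instrument: moves x, unrelated to eps
u = rng.normal(size=T)                  # common shock that makes x endogenous
eps = u + rng.normal(size=T)            # disturbance of the main equation
x = 1.0 * z + u + rng.normal(size=T)    # x is correlated with eps through u
y = 2.0 + 1.5 * x + eps                 # true slope is 1.5

X = np.column_stack([np.ones(T), x])    # T x M, with M = 2
Z = np.column_stack([np.ones(T), z])    # T x N, with N = M = 2

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)     # equation (40)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)    # equation (29), inconsistent here

iv_moment = Z.T @ (y - X @ beta_iv)             # sample analog (37), should be zero
print(beta_iv, beta_ols, iv_moment)
```

In this design the OLS slope is pushed upward by the correlation between x and the disturbance, while the IV slope should sit close to 1.5.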

Overidentification, Defining Vector Size, and the Method of Moments

I purposely wrote out both algebra steps in the last two equations, because a problem comes up in going between them. We assumed that N = M. What if N > M? Well, then the second step won’t work. It requires us to invert Z′X, which is an (N × T)(T × M) matrix, that is, an N × M matrix. We can invert that if it is a square M × M, but not otherwise, because N ≠ M.

A solution in the least squares approach is to use two-stage least squares. In the first stage, regress X on Z to calculate the fitted values

X̂ ≡ ZΓ̂ (41)

Then our estimator can be

β̂ = (X̂′X)⁻¹X̂′y, (42)

which is fine since X̂ is a T × M matrix like X, even if Z is T × N.
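The two stages can be sketched in a few lines. The simulated design (two instruments for one endogenous regressor, so N = 3 and M = 2 once the constant is counted) and the function name are assumptions made purely for illustration.

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """2SLS: regress X on Z to get fitted values, then apply equation (42)."""
    Gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)   # first stage, N x M coefficients
    X_hat = Z @ Gamma_hat                           # fitted values, T x M as in (41)
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

rng = np.random.default_rng(4)
T = 5000
z1, z2 = rng.normal(size=T), rng.normal(size=T)     # two instruments
u = rng.normal(size=T)                              # source of endogeneity
x = 0.8 * z1 + 0.6 * z2 + u + rng.normal(size=T)
y = 2.0 + 1.5 * x + u + rng.normal(size=T)          # true slope is 1.5

X = np.column_stack([np.ones(T), x])                # T x 2
Z = np.column_stack([np.ones(T), z1, z2])           # T x 3, so N > M

beta_2sls = two_stage_least_squares(y, X, Z)
print(beta_2sls)
```

Note that Z′X is 3 × 2 here and cannot be inverted, yet X̂′X is 2 × 2 and can.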

GMM takes a different approach to get to the same answer. The reader will recall our earlier discussion of how GMM can still apply even if there is no analytic solution to the equations, because we can still minimize the sample analog moment condition, if we define vector size. So that is what we will now do.

Let’s define the size of a T × 1 vector w as w′w, the sum of squared elements of the vector. Then the size of the M × 1 vector X′ε will be the (1 × T)(T × M)(M × T)(T × 1) matrix (a scalar, actually) ε′XX′ε, and we will choose β̂ so

$$\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \;\; \hat{\varepsilon}'XX'\hat{\varepsilon}. \tag{43}$$

Recall that earlier I said that any sensible size definition would yield the OLS estimator in the simple case, or the IV estimator for an exactly identified system where N = M, since the minimand equals zero at the solution then. We can verify that here, and find the OLS estimator analytically a different way: by using calculus to minimize the function

$$
\begin{aligned}
f(\hat{\beta}) &= \hat{\varepsilon}'XX'\hat{\varepsilon} \\
&= (y - X\hat{\beta})'XX'(y - X\hat{\beta}) \\
&= y'XX'y - \hat{\beta}'X'XX'y - y'XX'X\hat{\beta} + \hat{\beta}'X'XX'X\hat{\beta}
\end{aligned}
\tag{44}
$$

We can differentiate this with respect to β̂ to get the first order condition

$$
\begin{aligned}
f'(\hat{\beta}) &= -X'XX'y - (y'XX'X)' + 2X'XX'X\hat{\beta} \\
&= -2X'XX'y + 2X'XX'X\hat{\beta} \\
&= 2X'X(-X'y + X'X\hat{\beta}) = 0
\end{aligned}
\tag{45}
$$

in which case, since X′X is invertible,

β̂ = (X′X)⁻¹X′y (46)

and we are back to OLS.

If we want to use GLS, we can weight our “size” by the inverse of the variance-covariance matrix, Φ̂⁻¹, like this:

$$\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \;\; \hat{\varepsilon}'X\hat{\Phi}^{-1}X'\hat{\varepsilon}. \tag{47}$$

If we want to use instrumental variables, then our GMM estimator will also incorporate a weighting matrix. This is analogous to two-stage least squares (2SLS), where we can regress the variables in X on a greater number of variables in Z. Now the weighting matrix (the “size” definition) will start to matter. We will make one final change to it. We will use (Z′Z)⁻¹σ². To understand what this means, imagine that the different zn vectors are uncorrelated with each other and that N = 3. Then

$$
(Z'Z)^{-1}\sigma^2 =
\begin{pmatrix}
\dfrac{1}{z_1'z_1} & 0 & 0 \\
0 & \dfrac{1}{z_2'z_2} & 0 \\
0 & 0 & \dfrac{1}{z_3'z_3}
\end{pmatrix}
\sigma^2 \tag{48}
$$

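The σ² makes no difference, since it applies equally to all the equations. But if some zn varies a lot, so that z′n zn is large, it is going to get less weight. If we want to use GLS and instrumental variables, our GMM estimator becomes

$$\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \;\; \hat{\varepsilon}'Z\hat{\Phi}^{-1}Z'\hat{\varepsilon}. \tag{49}$$

If N > M, we can no longer make every element of Z′ε̂ exactly zero. In this linear case the first-order condition of (49) can still be solved for β̂, but in a nonlinear model such as the one BLP use we would use a computer to search for the best solution numerically.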
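The claim that GMM reaches the same answer as two-stage least squares can be checked numerically. The sketch below assumes the weighting matrix (Z′Z)⁻¹ of equation (48), dropping σ² since it does not affect the minimizer, and uses a simulated overidentified design of my own. It evaluates the minimand at the 2SLS estimate and at randomly perturbed parameter values; the function name is hypothetical.

```python
import numpy as np

def gmm_objective(beta, y, X, Z, W):
    """The quadratic form e'Z W Z'e with e = y - X beta."""
    m = Z.T @ (y - X @ beta)      # N x 1 vector of sample moments
    return m @ W @ m

rng = np.random.default_rng(5)
T = 5000
z1, z2 = rng.normal(size=T), rng.normal(size=T)
u = rng.normal(size=T)
x = 0.8 * z1 + 0.6 * z2 + u + rng.normal(size=T)
y = 2.0 + 1.5 * x + u + rng.normal(size=T)

X = np.column_stack([np.ones(T), x])          # M = 2
Z = np.column_stack([np.ones(T), z1, z2])     # N = 3 > M
W = np.linalg.inv(Z.T @ Z)                    # weighting matrix, sigma^2 omitted

# 2SLS estimate, computed as in equations (41) and (42).
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

q_at_2sls = gmm_objective(beta_2sls, y, X, Z, W)
q_nearby = np.array([
    gmm_objective(beta_2sls + 0.05 * rng.normal(size=2), y, X, Z, W)
    for _ in range(200)
])
print(q_at_2sls, q_nearby.min())
```

Every perturbed parameter vector gives a larger value of the minimand than the 2SLS estimate does, and the minimand at the estimate is not zero, because with N > M the three sample moments cannot all be set to zero with two parameters.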

Thus, if our assumption on the population is that⁵

EZ′ω(θ*) = 0, m = 1, ..., M, (8N)

then the GMM estimator is

$$\hat{\theta} = \underset{\theta}{\operatorname{argmin}} \;\; \omega(\theta)'Z\Phi^{-1}Z'\omega(\theta), \tag{9N}$$

where Φ is a consistent estimator of EZ′εε′Z.

We minimize the square of the sample analog because we want to minimize the magnitude, rather than generate large negative numbers by our choice, and squares are easier to deal with than absolute values (the same choice in OLS versus minimizing absolute values of errors).

We weight by Φ because we want to make heavier use of observations that contain more independent information. This includes the serial correlation and heteroskedasticity corrections.

The method of moments, like ordinary least squares but unlike maximum likelihood, does not require us to know the distribution of the disturbances. In this context, though, we will still have to use the assumption that the εijt follow the extreme value distribution, because we need it to calculate the integrals of market shares aggregated across consumer types.

Note that GMM does not rely on linearity. We may not be able to find analytic solutions if the theoretical equation is nonlinear, but we can still minimize the discrepancy between the sample moment condition and the theoretical. (The least squares approach can handle nonlinearity too, actually. You could specify a nonlinear functional form, and minimize the sum of squared errors in estimating it.)

⁵ I think there is a typo in Nevo here, on page 531, and zm should replace Zm in equation (8N).

Back to the Logit Model

We now have discussed the logit model of consumer behavior and the generalized method of moments way to estimate a model. Next we will combine them.

The tricky part of the theory is in choosing the function ω(θ*) that goes into the moment condition. Here is the BLP approach (the numbers of the steps are from Nevo [2000], Appendix, p. 1).

(-1) Select arbitrary values for (δ, Π, Σ) as a starting point. Recall that δ from (16) is a vector of the mean utility from each of the products, and that Π, Σ is the vector showing how consumer characteristics and product characteristics interact to generate utility.

(0) Draw random values for (νi, Di) for i = 1, ..., ns from the distributions P*ν(ν) and P̂*D(D) for a sample of size ns, where the bigger you pick ns the more accurate your estimate will be.

(1) Using the starting values and the random values, and using the assumption that the εijt follow the extreme-value distribution, approximate the integral for market share that results from aggregating across i by the following “smooth simulator”:

$$
s_{jt} = \frac{1}{ns}\sum_{i=1}^{ns} s_{ijt}
= \frac{1}{ns}\sum_{i=1}^{ns}
\left[
\frac{\exp\left(\delta_{jt} + \sum_{k=1}^{6} x^{k}_{jt}\left(\sigma_k \nu^{k}_{i} + \pi_{k1}D_{i1} + \cdots + \pi_{k4}D_{i4}\right)\right)}
{1 + \sum_{m=1}^{50} \exp\left(\delta_{mt} + \sum_{k=1}^{6} x^{k}_{mt}\left(\sigma_k \nu^{k}_{i} + \pi_{k1}D_{i1} + \cdots + \pi_{k4}D_{i4}\right)\right)}
\right],
\tag{11N}
$$

where $(\nu^{1}_{i}, \ldots, \nu^{6}_{i})$ and $(D_{i1}, \ldots, D_{i4})$ for i = 1, ..., ns are those random draws from the previous step.

Thus, in step (1) we obtain predicted market shares as simulated integrals.

(2) Use the following contraction mapping, which, a bit surprisingly, converges. Keeping (Π, Σ) fixed at their starting points, find values of δ by the following iterative process.

$$\delta^{h+1}_{\cdot t} = \delta^{h}_{\cdot t} + \left(\ln(S_{\cdot t}) - \ln(s_{\cdot t})\right), \tag{12N}$$

where $S_{\cdot t}$ is the observed market share and $s_{\cdot t}$ is the predicted market share from step (1) that uses $\delta^{h}_{\cdot t}$ as its starting point. Start with the arbitrary $\delta^{0}$ of step (-1).

If the observed and predicted market shares are equal, then $\delta^{h+1}_{\cdot t} = \delta^{h}_{\cdot t}$ and the series has converged. In practice, keep iterating until $(\ln(S_{\cdot t}) - \ln(s_{\cdot t}))$ is small enough for you to be satisfied with its accuracy.

Thus, in step (2) we come out with values for δ.

(2.5) Pick some starting values for (α, β).

(3) Figure out the value of the moment expression using the starting values and your δ estimate. First, define

ωjt = δjt − (xjtβ + αpjt) (13N)

Second, figure out the value of the moment expression,

ω′ZΦ⁻¹Z′ω (50)

You need the matrix Φ⁻¹ to do this. Until step (5), just use Φ = Z′Z, so that Φ⁻¹ = (Z′Z)⁻¹, as a starting point.⁶

(4) Do a minimization search, trying nearby values of (α, β, δ, Π, Σ) until the value of the moment expression is as small as you can make it. (It reaches zero only if the moment conditions can all be satisfied exactly, which will generally not happen when the model is overidentified.)

(5) Take your converged estimates and use them to compute a new ω. Use that to compute a new value for Φ, Z′ωω′Z. Then go back to step (1), using your latest parameter estimates as your starting values.
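Steps (0) to (2) can be sketched for a single market as follows. This is a minimal illustration under assumptions of my own, not Nevo's code: the function names, the array shapes, the standard normal draws for ν and D, and the small problem size (5 products, 2 characteristics, 2 demographic variables rather than 50, 6, and 4) are all hypothetical choices. The test of the contraction is whether it recovers the δ that generated the "observed" shares.

```python
import numpy as np

def simulated_shares(delta, x, sigma, Pi, nu, D):
    """Smooth simulator in the spirit of (11N) for one market.

    delta: (J,) mean utilities      x: (J, K) product characteristics
    sigma: (K,) taste dispersion    Pi: (K, d) demographic interactions
    nu: (ns, K) and D: (ns, d) simulated consumer draws
    """
    taste = nu * sigma + D @ Pi.T           # (ns, K): sigma_k nu_i^k + sum_d pi_kd D_id
    mu = taste @ x.T                        # (ns, J): consumer-specific utility part
    expu = np.exp(delta[None, :] + mu)      # (ns, J)
    s_i = expu / (1.0 + expu.sum(axis=1, keepdims=True))  # individual choice probabilities
    return s_i.mean(axis=0)                 # average over the ns simulated consumers

def contraction(S_obs, x, sigma, Pi, nu, D, tol=1e-10, max_iter=10_000):
    """Iterate delta <- delta + ln(S) - ln(s(delta)) as in (12N), holding (Pi, Sigma) fixed."""
    delta = np.zeros_like(S_obs)            # arbitrary starting value, as in step (-1)
    for _ in range(max_iter):
        step = np.log(S_obs) - np.log(simulated_shares(delta, x, sigma, Pi, nu, D))
        delta = delta + step
        if np.max(np.abs(step)) < tol:
            break
    return delta

rng = np.random.default_rng(6)
J, K, d, ns = 5, 2, 2, 500
x = rng.normal(size=(J, K))
sigma = np.array([0.5, 0.3])
Pi = 0.2 * rng.normal(size=(K, d))
nu = rng.normal(size=(ns, K))               # step (0): draws of nu_i
D = rng.normal(size=(ns, d))                # step (0): draws of D_i

delta_true = rng.normal(loc=-1.0, size=J)
S_obs = simulated_shares(delta_true, x, sigma, Pi, nu, D)   # pretend these are the data
delta_hat = contraction(S_obs, x, sigma, Pi, nu, D)
print(delta_true, delta_hat)
```

Because the same simulated consumers are used to create and to invert the shares, the recovered δ should match the one that generated them up to the chosen tolerance.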

Nevo notes that you could then iterate between estimating parameters (step 4) and estimating the weighting matrix (step 5). Both methods are consistent, and neither has more attractive theoretical properties, so it is acceptable to just iterate through steps (1) to (4).

⁶ Nevo’s text says little about this problem, but he discusses it on page 5 of his appendix.
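The moment expression of step (3) and the weighting-matrix update of step (5) can be sketched as below, taking δ as given. Several things here are assumptions of mine and should be read as such: the function names and simulated data are hypothetical; the initial weighting uses Φ = Z′Z, following the (Z′Z)⁻¹ weighting of equation (48); and the update is computed observation by observation, as the sum of ω² times zz′, which is one common way to estimate the variance of the moments, since the literal product Z′ωω′Z has rank one and could not be inverted.

```python
import numpy as np

def moment_objective(delta, x, p, Z, alpha, beta, Phi):
    """Return omega' Z Phi^{-1} Z' omega and omega, with omega defined as in (13N)."""
    omega = delta - (x @ beta + alpha * p)     # structural error for each product
    m = Z.T @ omega                            # N x 1 vector of sample moments
    return m @ np.linalg.solve(Phi, m), omega

def updated_Phi(Z, omega):
    """Assumed form of the step (5) update: sum over observations of omega^2 * z z'."""
    Zw = Z * omega[:, None]
    return Zw.T @ Zw

rng = np.random.default_rng(7)
n, K = 300, 3
x = rng.normal(size=(n, K))                    # exogenous product characteristics
c = rng.normal(size=(n, 2))                    # hypothetical cost shifters used as instruments
p = c.sum(axis=1) + rng.normal(size=n)         # price depends on the cost shifters
Z = np.column_stack([x, c])                    # N = 5 instruments

alpha_true, beta_true = -1.0, np.array([0.5, 1.0, -0.3])
delta = x @ beta_true + alpha_true * p         # simplification: no unobserved quality term

Phi0 = Z.T @ Z                                 # starting weighting matrix
q_true, omega_true = moment_objective(delta, x, p, Z, alpha_true, beta_true, Phi0)
q_off, omega_off = moment_objective(delta, x, p, Z, alpha_true + 0.5, beta_true, Phi0)
Phi1 = updated_Phi(Z, omega_off)               # what step (5) would feed back in
print(q_true, q_off)
```

A search routine for step (4) would call `moment_objective` repeatedly, moving the parameters in whatever direction lowers its value.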

Conclusion

Now that we have gone through the entire procedure, it may be helpful to list some of the ideas we have used.

1. Instrumental variables. We use instruments to correct for the endogeneity of prices, the classic problem in estimating supply and demand.

2. Product characteristics. We look at the effect of characteristics on demand, and then build up to products that have particular levels of the characteristics. Going from 50 products to 6 characteristics drastically reduces the number of parameters to be estimated.

3. Consumer and product characteristics interact. This is what is going on when consumer marginal utilities are allowed to depend on consumer characteristics. This makes the pattern of consumer purchases substituting from one product to another more sensible.

4. Structural estimation. We do not just look at conditional correlations of relevant variables with a disturbance term tacked on to account for the imperfect fit of the regression equation. Instead, we start with a model in which individuals maximize their payoffs by choice of actions, and the model includes the disturbance term which will later show up in the regression.

5. The contraction mapping. A contraction mapping is used to estimate the parameters that are averaged across consumers, an otherwise difficult optimization problem.

6. The method of moments. The generalized method of moments is used to estimate the other parameters.

Not all of these are special to the BLP method. Ideas (1), (2), and (3) can all be used with least squares (which itself is a simplified version of (6)). Idea (4) is used in standard logit. It is ideas (5) and (6) that are special to BLP, but of course BLP is a combination of all six ideas, which is why it is so difficult to explain.

The BLP Method has been widely used because it is general enough to use for a variety of estimation problems in industrial organization, not just for simple demand problems. It is attractive compared to older methods because it imposes relatively little structure on the theoretical model, and so allows many different kinds of firm and consumer behavior to be tested. This flexibility, however, is achieved at the cost of considerable intricacy. The BLP method is made up of a modelling part and an estimation part. The modelling part is a logit model of a maximizing consumer’s choice of product depending on consumer and product characteristics. This is a structural model, and really any structural model of maximizing choice, by consumer, government, or firm, could be used in its place. The estimation part estimates the importance of the product characteristics, consumer characteristics, and prices using the generalized method of moments. This is a highly flexible method, requiring weaker assumptions than maximum likelihood but like that procedure requiring a large number of observations and much computing power. I hope in this summary I have made clearer how the economist would go about combining the modelling and estimation that form the BLP model.

References

Berry, Steven (1994) “Estimating Discrete-Choice Models of Product Differentiation,” The RAND Journal of Economics, 25: 242-262.

Berry, Steven, James Levinsohn & Ariel Pakes (1995) “Automobile Prices in Market Equilibrium,” Econometrica, 63(4): 841-890 (July 1995).

Berry, Steven, James Levinsohn & Ariel Pakes (2004) “Differentiated Products Demand Systems from a Combination of Micro and Macro Data: The New Car Market,” The Journal of Political Economy, 112(1): 68-105 (February 2004).

Chamberlain, G. (1987) “Asymptotic Efficiency in Estimation with Conditional Moment Restrictions,” Journal of Econometrics, 34: 305-334.

Hall, Bronwyn H. (1996) “Notes on Generalized Method of Moments Estimation,” http://emlab.berkeley.edu/users/bhhall/e244/gmmnotes.pdf, March 1996 (revised February 1999).

Hall, Bronwyn H. (2005) “Computer Code for Problem Set 3 (Effects of Horizontal Merger),” http://emlab.berkeley.edu/users/bhhall/e220c/rc dc code.htm.

Hansen, L. P. (1982) “Large Sample Properties of Generalized Method of Moments Estimators,” Econometrica, 50: 1029-1054.

Hicks, John (1937) “Mr Keynes and the Classics: A Suggested Simplification,” Econometrica (1937).

Hosoe, Moriki & Eric Rasmusen, eds. (1997) Public Policy and Economic Analysis, Fukuoka, Japan: Kyushu University Press (1997).

Nevo, Aviv (2000) “A Practitioner’s Guide to Estimation of Random-Coefficients Logit Models of Demand,” Journal of Economic and Management Strategy, 9(4): 513-548 (Winter 2000).

Nevo, Aviv “Appendix to ‘A Practitioner’s Guide to Estimation of Random Coefficients Logit Models of Demand Estimation: The Nitty-Gritty’,” http://www.faculty.econ.northwestern.edu/faculty/nevo/supplements/Ras guide appendix.pdf.

Rasmusen, Eric (1998a) “Observed Choice, Estimation, and Optimism about Policy Changes,” Public Choice, 97(1-2): 65-91 (October 1998).

Rasmusen, Eric (1998b) “The Observed Choice Problem in Estimating the Cost of Policies,” Economics Letters, 61(1): 13-15 (1998).

Wooldridge, Jeffrey M. (2001) “Applications of Generalized Method of Moments Estimation,” Journal of Economic Perspectives, 15(4): (Fall 2001).

“1.3.6.6.16. Extreme Value Type I Distribution,” Engineering Statistics Handbook, http://www.itl.nist.gov/div898/handbook/eda/section3/eda366g.htm.
