A Report on Work in Progress

INFERENCE WITH IMPERFECT INSTRUMENTAL VARIABLES

Aviv Nevo

Northwestern University and NBER

Adam Rosen

Northwestern University

Motivation

• Finding valid IVs, i.e., instruments that are uncorrelated with the error term, is a concern facing much of applied micro research.

• In practice, in many cases the validity of the IV is somewhat questionable;

• Leading example: estimation of demand for differentiated products.

Depending on what controls are included the error will typically include:

unobserved quality and/or unobserved promotional activities;

Common IV include: observed characteristics or prices in other markets;

Our Focus

• In this paper we acknowledge the imperfection of our IV and ask:

Can we get partial identification even with imperfect IV?

Can we use the imperfect IV to learn the direction of the OLS bias?

The Model

Consider the simple linear (population) model

Y = α + βX + ε.

We focus on estimating the slope parameter, β.

Assumptions:

A1: Random sampling – we can use a random sample of size n from the population

model.

A2: No perfect collinearity.

A3: E(ε) = 0 and Cov(X, ε) ≠ 0.

If instead we assumed E(ε | X) = 0 (which implies Cov(X, ε) = 0) then OLS would yield consistent and unbiased estimates.

Denote , and .

Let Z be an imperfect instrumental variable such that

A4: and .

A5: The covariances of X and ε, and of Z and ε, have the same sign: Cov(X, ε) · Cov(Z, ε) ≥ 0.

A6: Z is less correlated with ε than X: |Corr(Z, ε)| ≤ |Corr(X, ε)|.

If instead (of A5 and A6) we made the stronger assumption Cov(Z, ε) = 0 then the standard IV estimator would yield consistent estimates.

• Define:

β_OLS = Cov(X, Y) / Var(X)

and

β_IV = Cov(Z, Y) / Cov(X, Z)

so that β_OLS and β_IV are the probability limits of the OLS and traditional IV (using Z as an instrument) estimators for β.

Note: β_IV can be greater or smaller than β, depending on the sign of Cov(Z, ε) / Cov(X, Z).

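• Let λ* = Corr(Z, ε) / Corr(X, ε).

• Finally, define ω(λ) = σ_X Z - λ σ_Z X, where σ_X and σ_Z are the standard deviations of X and Z, and let β^IV_ω(λ) denote the probability limit of the IV estimator that uses ω(λ) as the instrument.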

Identification

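Result 1: Assume A2-A5. If Cov(X, Z) < 0 then β is between β_IV and β_OLS, i.e., we have an informative bound.

From the definitions: β_OLS = β + Cov(X, ε) / Var(X) and β_IV = β + Cov(Z, ε) / Cov(X, Z).

Note: If instead of A5, Cov(X, ε) · Cov(Z, ε) ≤ 0, then we require Cov(X, Z) > 0.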
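The sign argument behind Result 1 can be spelled out in a few lines. This is a sketch that assumes the model Y = α + βX + ε; the shorthand \sigma_{X\varepsilon}, \sigma_{Z\varepsilon}, \sigma_{XZ} for Cov(X, ε), Cov(Z, ε), Cov(X, Z) is introduced here only for illustration.

```latex
\begin{align*}
\beta_{OLS} - \beta &= \frac{\sigma_{X\varepsilon}}{\sigma_X^{2}}, \\
\beta_{IV} - \beta &= \frac{\sigma_{Z\varepsilon}}{\sigma_{XZ}}, \\
(\beta_{OLS} - \beta)(\beta_{IV} - \beta)
  &= \frac{\sigma_{X\varepsilon}\,\sigma_{Z\varepsilon}}{\sigma_X^{2}\,\sigma_{XZ}} \;\le\; 0
  \quad \text{if } \sigma_{X\varepsilon}\sigma_{Z\varepsilon} \ge 0 \text{ and } \sigma_{XZ} < 0 .
\end{align*}
```

Under these conditions the two biases have opposite signs (or one of them is zero), so β lies between β_IV and β_OLS.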

Result 2: Assume A2-A5. If or then

and .

⇒ Since is unknown, unless , then does not inform us of the direction of the bias.

Note: if is small, i.e., the correlation between Z and ε is small, then the second condition is likely to be satisfied.

What can we hope to get from A6?

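• By the definition of λ* as the ratio Corr(Z, ε) / Corr(X, ε), the variable ω(λ*) = σ_X Z - λ* σ_Z X is uncorrelated with ε, which implies that ω(λ*) is a valid instrument.

• Therefore, β = β^IV_ω(λ*).

• λ* is unknown so this is not a feasible estimator.

• By A6, λ* ∈ [0, 1], so if β^IV_ω(λ) is a continuous and monotonic function (of λ) on [0, 1], then β can be bounded by β^IV_ω(0) and β^IV_ω(1).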
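The role of λ can be made explicit with a short derivation. The sketch assumes the weighted instrument takes the form ω(λ) = σ_X Z - λ σ_Z X and writes ρ for correlations and σ for standard deviations; these symbols are assumptions chosen for illustration.

```latex
\begin{align*}
\operatorname{Cov}(\omega(\lambda), \varepsilon)
  &= \sigma_X \sigma_Z \sigma_\varepsilon \,(\rho_{Z\varepsilon} - \lambda \rho_{X\varepsilon}), \\
\operatorname{Cov}(\omega(\lambda), X)
  &= \sigma_X^{2} \sigma_Z \,(\rho_{XZ} - \lambda), \\
\beta^{IV}_{\omega(\lambda)} - \beta
  &= \frac{\operatorname{Cov}(\omega(\lambda), \varepsilon)}{\operatorname{Cov}(\omega(\lambda), X)}
   = \frac{\sigma_\varepsilon}{\sigma_X}\,\rho_{X\varepsilon}\,
     \frac{\lambda^{*} - \lambda}{\rho_{XZ} - \lambda},
  \qquad \lambda^{*} = \frac{\rho_{Z\varepsilon}}{\rho_{X\varepsilon}} .
\end{align*}
```

The bias vanishes at λ = λ*, and the expression has a pole at λ = ρ_XZ. When ρ_XZ < 0 the pole lies outside [0, 1], so the function is continuous and monotonic on that interval.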

Result 3: Assume A2-A6. If Cov(X, Z) < 0 then β is between β^IV_ω(0) = β_IV and β^IV_ω(1), i.e., we have an informative bound.

; if and if (if );

[Figures: β^IV_ω(λ) as a function of λ, with λ* = 0.25, in four panels: ρ_XZ = 0.2, ρ_XZ = 0.3, ρ_XZ = -0.1, and ρ_XZ = -0.5]
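The qualitative difference between the four panels can be checked numerically. The sketch below assumes the population bias of the weighted-instrument estimator is (σ_ε/σ_X) · ρ_Xε · (λ* - λ) / (ρ_XZ - λ), which follows if ω(λ) = σ_X Z - λ σ_Z X. The function names and the values `rho_xe=0.5` and `scale=1.0` are hypothetical choices for illustration, not values taken from the slides.

```python
import numpy as np

def iv_bias(lam, rho_xz, lam_star, rho_xe=0.5, scale=1.0):
    """Assumed population bias of the IV estimator that uses omega(lam).

    scale stands in for sigma_eps / sigma_X; rho_xe is Corr(X, eps).
    Both are illustrative values, not estimates.
    """
    lam = np.asarray(lam, dtype=float)
    return scale * rho_xe * (lam_star - lam) / (rho_xz - lam)

def endpoints_bracket_beta(rho_xz, lam_star):
    """True if the biases at lam=0 and lam=1 have opposite signs (or one is zero)."""
    b0 = float(iv_bias(0.0, rho_xz, lam_star))
    b1 = float(iv_bias(1.0, rho_xz, lam_star))
    return b0 * b1 <= 0.0

# Same parameter values as the four panels: lam_star = 0.25.
results = {rho: endpoints_bracket_beta(rho, 0.25) for rho in (0.2, 0.3, -0.1, -0.5)}
print(results)

# With rho_xz < 0 the bias is monotonic on [0, 1] and crosses zero at lam_star.
grid = np.linspace(0.0, 1.0, 101)
monotone_negative_case = bool(np.all(np.diff(iv_bias(grid, -0.5, 0.25)) > 0))
```

With a positive ρ_XZ the pole at λ = ρ_XZ falls inside [0, 1], so the two endpoints need not lie on opposite sides of β. With a negative ρ_XZ they do.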

Result 4: Assume A2-A6. If then

As the correlation between X and Z decreases, helps improve on the bound

provided in Result 1.

Result 5: Assume A2-A6 and that . If then

If we are willing to sign the correlation between X and ε, then for we can

bound . But all we can say about β is that

Up to now: Assuming A5, if Cov(X, Z) < 0 we got informative bounds, but the results are less promising for the Cov(X, Z) > 0 case.

Alternative Approaches

• Replace A6 with an assumption of the sort:

It is unclear where comes from. Indeed the previous approach can be seen as an

attempt to replace with .

• Suppose we have two (or more) IVs

Let Z1 and Z2 denote these IV. Suppose each of them satisfies A4-A6.

If and , then the additional IV can tighten bounds;

If and , then using Z2 we can achieve an informative bound;

If and ,

Difference Z1 and Z2 to create a variable that lets us exploit previous results;

and , wlog assume ;

For simplicity assume ⇒ ;

Define .

if and ;

There exists a γ that satisfies both these conditions iff ;

Assuming such a γ exists it should be in [0, ), the closer it is to

the more likely it is that , but the lower .

Result 6: Assume A2-A6, and , then

Furthermore, we can combine with Result 3 to improve the bound.

Estimation

• The above bounds can be estimated by replacing population moments with their

sample counterparts. In most cases these are easy to compute (OLS or IV)

estimators;

• Standard errors can be computed using Imbens and Manski (2004);
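As an illustration of the sample-analogue idea, the sketch below computes the OLS and IV estimates from simulated data and reports the interval they span. The data-generating process, the function name `ols_iv_bounds`, and all coefficient values are hypothetical, chosen so that Cov(X, Z) < 0 and the errors in X and Z are both positively correlated with ε. Only point estimates of the interval endpoints are computed; confidence intervals are not attempted here.

```python
import numpy as np

def ols_iv_bounds(y, x, z):
    """Sample analogues of beta_OLS and beta_IV, and the interval between them."""
    cov_xy = np.cov(x, y)[0, 1]
    cov_zy = np.cov(z, y)[0, 1]
    cov_zx = np.cov(z, x)[0, 1]
    b_ols = cov_xy / np.var(x, ddof=1)
    b_iv = cov_zy / cov_zx
    return b_ols, b_iv, (min(b_ols, b_iv), max(b_ols, b_iv)), cov_zx

# Hypothetical simulated data: e is the structural error, w and u are noise.
rng = np.random.default_rng(0)
n = 200_000
e, w, u = rng.standard_normal((3, n))
beta_true = -2.0
x = 0.8 * e - w + u        # endogenous regressor, Cov(x, e) > 0
z = 0.2 * e + w            # imperfect instrument, Cov(z, e) > 0, Cov(x, z) < 0
y = 1.0 + beta_true * x + e

b_ols, b_iv, (lo, hi), cov_zx = ols_iv_bounds(y, x, z)
print(f"OLS = {b_ols:.3f}, IV = {b_iv:.3f}, interval = [{lo:.3f}, {hi:.3f}]")
```

In this design the OLS estimate is biased upward and the IV estimate downward, so the interval between them covers the true slope.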

Extension

• The above analysis extends directly to multiple regression with a single endogenous variable, by netting out the other variables and defining things accordingly. (Note: the interpretation of the error and the conditions are different after netting out the effects.)

• In principle, the idea of replacing equalities with inequalities is not limited to our model (just find the set of parameters that satisfy the inequalities). However, we still have not provided conditions that characterize these sets and that would help in the search for IVs.
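Netting out the other variables can be sketched as a least-squares residualization, applied in the same way to the outcome, the endogenous regressor, and the instrument. The helper name `net_out` and the simulated data below are hypothetical. The check at the end uses the standard Frisch-Waugh result that the slope from the residualized variables equals the slope from the full regression.

```python
import numpy as np

def net_out(v, controls):
    """Residual of v after a least-squares projection on a constant and the controls."""
    W = np.column_stack([np.ones(len(v)), controls])
    coef, *_ = np.linalg.lstsq(W, v, rcond=None)
    return v - W @ coef

# Hypothetical data with two exogenous controls.
rng = np.random.default_rng(1)
n = 5_000
controls = rng.standard_normal((n, 2))
x = controls @ np.array([0.5, -0.3]) + rng.standard_normal(n)
y = 1.0 - 2.0 * x + controls @ np.array([1.0, 2.0]) + rng.standard_normal(n)

# Slope on x using residualized variables.
x_t, y_t = net_out(x, controls), net_out(y, controls)
slope_netted = (x_t @ y_t) / (x_t @ x_t)

# Slope on x from the full multiple regression, for comparison.
full = np.column_stack([np.ones(n), x, controls])
slope_full = np.linalg.lstsq(full, y, rcond=None)[0][1]
```

The residualized x, y, and z can then be used in the single-regressor formulas in place of the raw variables.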

Application: Estimating Demand for Differentiated Products

Logit Model

• The indirect utility for individual i from UPC j in market t

where: x_jt are observable characteristics; p_jt is price; ξ_jt is an unobserved product characteristic; and ε_ijt is a mean zero stochastic term.

• Individuals can also choose the “outside option”:

• Each individual chooses exactly one good;

• Assuming ε_ijt is distributed iid extreme value, then

and
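For reference, the standard logit algebra runs as follows. This is a sketch under common textbook conventions: the coefficient symbols \beta_x and \alpha, the mean utility \delta_{jt}, and the normalization of the outside option's mean utility to zero are assumptions, not notation taken from the slides.

```latex
\begin{align*}
u_{ijt} &= x_{jt}\beta_x + \alpha\, p_{jt} + \xi_{jt} + \varepsilon_{ijt}
         \;\equiv\; \delta_{jt} + \varepsilon_{ijt},
& u_{i0t} &= \varepsilon_{i0t}, \\
s_{jt} &= \frac{\exp(\delta_{jt})}{1 + \sum_{k} \exp(\delta_{kt})},
& s_{0t} &= \frac{1}{1 + \sum_{k} \exp(\delta_{kt})}, \\
\ln s_{jt} - \ln s_{0t} &= x_{jt}\beta_x + \alpha\, p_{jt} + \xi_{jt} .
\end{align*}
```

The last line is linear in the parameters, with ξ_jt playing the role of the error term and price the role of the endogenous regressor, which is the form the single-regressor results require after netting out x_jt.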

The error term and instruments

• The error-term is the unobserved characteristic and depending on the application it

will include either unobserved quality or promotional activities;

• In most cases it is not hard to come up with models of supplier behavior that lead to

correlation between prices and these error terms;

• Below we think of the error as unobserved promotional activities and variation in

valuation of unobserved quality that are likely to be correlated with price reductions.

• In this setting common IV are prices (of the product) in other markets

correlated with prices due to common cost shocks;

uncorrelated with the error term if demand errors are independent across markets;

• The independence could be violated, for example, if

promotional activity is correlated across markets;

national advertising;

• Common response: argue theoretically or compare to other evidence;

Data

• Scanner data for cereal at the brand-quarter-MSA level;

20 quarters: 1988-1992;

focus on 25 top cereal brands;

47-65 markets, focus on SF and Boston;

• Key variables:

market shares (quantity) – volume converted to servings; one serving per day;

prices – revenue/quantity : pre-coupon real transaction per serving price;

quantity sold on promotion – estimated fraction sold on weeks with promotion;

advertising – national quarterly brand level from LNA;

PRICES AND MARKET SHARES OF BRANDS IN SAMPLE

Description                      Mean   Median  Std    Min   Max    Brand Variation  City Variation  Quarter Variation
Prices (¢ per serving)           19.4   18.9    4.8    7.6   40.9   88.4%            5.3%            1.6%
Advertising (M$ per quarter)     3.56   3.04    2.03   0     9.95   66.2%            --              1.8%
Share within Cereal Market (%)   2.2    1.6     1.6    0.1   11.6   82.3%            0.5%            0%

Source: IRI Infoscan Data Base, University of Connecticut, Food Marketing Center.

Results

assume: (not needed for all the results);

characteristics include: advertising, brand and city dummies;

error: unobserved promotions and differences (over time and cities) in perceived quality;

Case 1: single IV

• consider Z = quantity sold on promotion – clearly not a valid IV: ;

• we find: , (0.71),

⇒ from Result 1: ;

• ⇒ from Result 3: ;

Note: (1) For most of what we care about the effect is multiplicative, thus the gain

from the second bound is meaningful;

(2) It is not clear that A6 should hold;

Case 2: multiple IV – prices in other markets

• prices in other markets have been used in this setting (e.g., HLZ, 1994; Hausman, 1996; Nevo, 2001) but have been criticized (e.g., Bresnahan, 1996);

• Let Z1 = average price of brand in other markets in the region (NE for Boston and

Northern California for SF) ;

• Let Z2 = average price of brand in markets in the other region (NE for SF and

Northern California for Boston) ;

• In the data: , , and ;

so there exists a γ s.t. and ;

• Assume that these inequalities hold for ( );

⇒ we’ll use , , and to compute our bounds

We find:

point estimates

- 2.21 (0.71) ;

- 4.08 (0.87) ;

Bounds:

using quantity sold on promotion: [-10.60, -3.94];

using prices in other markets:

-11.47 ⇒ [-10.97, -4.08]

-4.85 ⇒ [-10.97, -4.85]

Conclusions and Extensions

• With a single IV we derived informative bounds for β.

• Suggests a strategy for searching for imperfect IV;

• With several IV we were able to derive informative bounds also for the case where

the IV are positively correlated with the endogenous variable;

• In the application we studied the proposed bounds were reasonably tight;

• Where are we going next?

multiple endogenous variables;

improve bounds;

more detailed application(s) – maybe better fit for cases with ;
