How Sensitive are Sales Prices to Online Price Estimates in the Housing Market?

Yong Suk Lee (a, *) and Yuya Sasaki (b)

(a) Freeman Spogli Institute for International Studies, Stanford University, USA
(b) Department of Economics, Johns Hopkins University, USA

June 24, 2016

Abstract

This paper examines the impact of online price estimates on transaction prices in the housing market. We develop an estimation model that uses the difference between listing prices and online price estimates to proxy for house specific unobservables, and first differences observations within neighborhood to account for correlated neighborhood specific unobservables. Using house price estimates and sales prices collected from Zillow.com, we find that the elasticity of sales price with respect to the Zillow estimate is close to one, controlling for the aforementioned unobservables as well as observed house attributes. The accessibility of the internet at home strongly and positively predicts the elasticity estimates across metropolitan areas. Furthermore, the change in Zillow estimates impacts how sales prices adjust from the list prices a month before sales. Our results imply that online price information affects house prices and that online price estimates can potentially have a direct impact on house price dynamics.

Keywords: real estate pricing, online price estimates, hedonic valuation, neighborhood panel data, proxies

JEL Codes: D82, R21, R31, R32

* Corresponding author at Stanford University, Encina Hall E309, 616 Serra St., Stanford, CA 94305. E-mail address: [email protected]

1 Introduction

Like other types of assets, the price of real estate is determined by the observed and unobserved

attributes of the asset. Houses, especially single family houses, exhibit unobserved heterogeneity

across various dimensions. Same sized bedrooms can be valued differently depending on the

location of the window. The topography of same sized lots can affect the value of the property.

Neighborhood amenities, like nearby schools and parks, are important determinants of property

prices. While no two houses in general are alike, houses have been priced based on what

appraisers or brokers refer to as comparables, observationally similar houses that were recently

sold in the same or nearby neighborhood. Pricing adjustments are made to reflect the differences

between the house of interest and the comparable houses. In other words, the price of a house

takes into account the price information of other houses. With the advancement of the internet,

one can easily search sales price information for a large number of properties. Furthermore,

there are online services that provide their own property estimates for free based on the property

and neighborhood attributes, as well as the sales prices of comparable properties. Does the

availability of such price information affect actual sales prices? How large is the extent of this

impact?

In order to estimate the impact of online house price information on transaction prices, we

develop a reduced-form pricing equation as the convex combination of the online price estimate

and the hedonic valuation of a property. The main challenge for estimation is to control

for the unobserved house and neighborhood attributes in the model. We present a method

that nonparametrically proxies for unobserved house specific attributes by using the difference

between the listing price and the online price estimate. As Baum-Snow and Ferreira (2015)

highlight, quasi-experimental research designs (Chay and Greenstone 2005, Ferreira 2010), and

in particular, boundary discontinuities (Black 1999, Bayer et al. 2005) have been used to control

for unobservable area specific attributes. On the other hand, Bajari et al. (2012) propose a

method that relies less on the research design but on the structural assumption that prior

house sales prices can be used to control for time-varying unobservable attributes in a hedonic

regression. Better data, such as repeat-sales house transaction data, have also enabled researchers

to deal with unobservable house and neighborhood attributes (Bayer et al. 2013). This paper

develops a novel estimation strategy that incorporates a structural assumption and data collection

into a hedonic model. In particular, we develop a hedonic model that controls for neighborhood

unobservables by first differencing observations within neighborhoods, while relying on the weak

structural assumption that prior list prices contain unobserved house specific information and

the data collection strategy of having at least two properties per neighborhood.

We collect home value estimates, list prices, sales prices, and house and neighborhood

attributes from Zillow.com, an online real estate information provider, for 1,200 houses across

30 Metropolitan Statistical Areas (MSAs) in the US. We find that the elasticity of house sales

prices with respect to the Zillow price estimates is large and quite close to one. The results

are robust regardless of how we calculate the proxy variable to control for unobserved house

attributes.

Additionally, we explore possible factors that might explain the variation in the elasticity

estimates across the 30 MSAs. We find that the internet penetration rate, i.e., the average

accessibility of the internet at home, strongly and positively predicts the elasticity estimate

across metropolitan areas. In other words, sales prices are more responsive to Zillow estimates

in MSAs with better internet access at home. This effect is robust to the income level, education

level, housing demand and supply, and unemployment rate of the MSA. Lastly, we find that the

change in Zillow estimates impacts how sales prices adjust from the list prices a month before

sales. This finding further corroborates our hypothesis that the information provided through

online price estimates directly impacts sales prices.

Our results are related to several strands of the literature. Ferreira and Gyourko (2011)

examine the causes of the most recent housing boom in the U.S. Though understanding the

causes of the housing boom is not the focus of our paper, our findings imply that online price

estimates could have influenced the house price dynamics. Researchers have found that information

affects house transaction prices in various contexts. Levitt and Syverson (2008) show

that informational advantage translates to higher sales prices by examining properties owned

by real estate brokers. They find that realtors sell their own houses at about 4 percent higher

prices. Foreclosures can affect the price of non-foreclosure houses by conveying new information

about unobserved neighborhood attributes, or more directly by being included as comparables.

Campbell et al. (2011) find that foreclosed homes lower prices of nearby houses by about 1

percent.¹

The paper is organized as follows. Section 2 presents our econometric model and its estimation

strategy. Section 3 explains the data, in particular, the house level data that we collect

from Zillow.com. In Section 4, we present our elasticity estimates and examine the underlying

mechanisms. Section 5 concludes and discusses the implications.

2 The Econometric Model and Estimation Strategy

2.1 The Extended Hedonic Model

We propose an econometric method that estimates the impact of house price information on

sales prices. Specifically, we extend the traditional hedonic framework to one that incorporates

the potential effects of house price information, in particular, the house level price estimates

provided by Zillow.²

The following is a list of economic factors that may potentially affect transaction prices for

house i in a neighborhood Ni:

• $X_i$: A vector of house-specific amenities, including lot size, square footage, number of bedrooms, number of bathrooms, and year built.

• $U_i$: The value of unobserved house-specific amenities, including floor plans and appliances.

• $V_{N_i}$: The value of unobserved neighborhood-specific amenities, including public schools, crime, curb appeal, environmental quality, and other public services.

• $Z_i$: Home price information, i.e., Zillow's price estimates, that housing market participants can observe.

¹ Also related is the literature that examines how information, or the lack of information, impacts equity prices. Easley and O'Hara (1987) show that large trades in the securities market reflect better information and impact security prices, and that investors demand higher returns on stocks for which there is less public information. Real life examples of markets for information, like car reports for used cars or online reviews for restaurants, more directly speak to the value people put on information.

² As a preliminary step, we first examined the hypothesis that online property price estimates impact actual sales prices at the aggregate level. If house price information directly impacts house prices, we expect the relation to hold at an aggregate level as well. Specifically, we test whether Zillow's median price estimates Granger cause the median sales price as reported by Zillow across 30 MSAs in the US. Appendix A presents the method and results.

The standard hedonic pricing models forecast the transaction price as follows:

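$$
Y_i = \underbrace{\alpha + X_i\beta}_{\text{Regression}} + \underbrace{\overbrace{V_{N_i}}^{\text{Residual 1}} + \overbrace{U_i}^{\text{Residual 2}} + \overbrace{\varepsilon_i}^{\text{Residual 3}}}_{\text{Residual}}. \tag{2.1}
$$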

For the purpose of elucidating the problem that we face in our study, we decompose the usual residual into three components: the first reflects the value $V_{N_i}$ of neighborhood-specific amenities, the second reflects the value $U_i$ of house-specific amenities, and the third represents idiosyncratic errors $\varepsilon_i$. The standard hedonic pricing model (2.1) assumes that sellers and/or buyers take the vector of house-specific amenities $(X_i, U_i)$ and the value of neighborhood-specific amenities $V_{N_i}$ into account when making decisions about transaction prices $Y_i$ in equilibrium. Econometricians estimate the reduced-form coefficients $\beta$, called contributory values, for $X_i$, the house-specific amenities that are observable in the data.

We hypothesize that agents may also take into account the home price information $Z_i$, produced by real estate information providers like Zillow, when proposing transaction prices. This hypothesis reflects the possibility that buyers and sellers are not fully confident in their own home valuation based on the information about the house and the neighborhood, and therefore tend to use the measure $Z_i$ provided by third parties. In this light, we propose an extended reduced-form equilibrium pricing model simply as the convex combination of outside and self valuations:

$$
Y_i = \gamma Z_i + (1-\gamma)[\alpha + X_i\beta + V_{N_i} + U_i + \varepsilon_i]. \tag{2.2}
$$

The expression in the square brackets in the second term, $\alpha + X_i\beta + V_{N_i} + U_i + \varepsilon_i$, constitutes the factors used in the traditional hedonic pricing model (2.1). We add the first term $\gamma Z_i$ to reflect the potential effects of the home price information $Z_i$ on transaction prices $Y_i$. As such, the parameter $\gamma$ may be interpreted as the degree to which agents rely on the third-party information. Our null hypothesis that the home price estimates $Z_i$ do not impact the actual transaction prices is thus represented by the equality $\gamma = 0$, which is readily testable once a $\sqrt{N}$-consistent estimate of $\gamma$ is obtained.

The OLS estimators of the parameters $\alpha$, $\beta$, and $\gamma$ would be consistent if $(V_{N_i}, U_i)$ were mean independent of both $Z_i$ and $X_i$. However, this independence assumption is hard to justify for at least two reasons. First, the unobserved house-specific amenities $U_i$ are likely to be correlated with the observed house-specific amenities $X_i$. Second, and more importantly in our study, the introduction of $Z_i$ in the extended pricing model (2.2) creates another source of endogeneity. To see this, it helps to think of how the home price information $Z_i$ is generated by real estate information providers. Although these service agencies do not disclose their formulas, the estimates $Z_i$ are constructed using recent transaction data in the neighborhood $N_i$ of house $i$. (See Section A.2 in the appendix for the case of Zillow.) As such, the statistical independence $Z_i \perp\!\!\!\perp V_{N_i}$ between the price estimate and unobserved neighborhood characteristics, or the corresponding mean independence, will probably not hold even if we control for the observed house-specific amenities $X_i$. We therefore propose a couple of approaches to handle these two sources of endogeneity in the subsequent sections.
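The second source of endogeneity can be illustrated with a small simulation of model (2.2). This is only a sketch under assumed values: the parameter values, the unit variances, and the function name `simulate_gamma_ols` are hypothetical and are not taken from the paper or its data. When the online estimate loads on the unobserved neighborhood value, the OLS coefficient on $Z_i$ converges to $\gamma + (1-\gamma)\,\mathrm{Cov}(Z_i, V_{N_i})/\mathrm{Var}(Z_i)$ rather than to $\gamma$.

```python
import numpy as np

def simulate_gamma_ols(n=20000, gamma=0.5, z_loads_on_v=True, seed=0):
    """Simulate model (2.2) and return the OLS coefficient on Z.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    alpha, beta = 1.0, 0.8                 # assumed hedonic parameters
    x = rng.normal(size=n)                 # observed amenity X_i
    v = rng.normal(size=n)                 # unobserved neighborhood value V_{N_i}
    u = rng.normal(size=n)                 # unobserved house value U_i
    eps = rng.normal(scale=0.5, size=n)    # idiosyncratic error
    eta = rng.normal(size=n)               # noise in the online estimate
    # The online estimate either reflects V (built from nearby sales) or not.
    z = (v if z_loads_on_v else 0.0) + eta
    y = gamma * z + (1 - gamma) * (alpha + x * beta + v + u + eps)
    # OLS of Y on a constant, Z and X; V and U are left in the error term.
    design = np.column_stack([np.ones(n), z, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

if __name__ == "__main__":
    print("Z independent of V:", simulate_gamma_ols(z_loads_on_v=False))
    print("Z correlated with V:", simulate_gamma_ols(z_loads_on_v=True))
```

With these assumed variances, $\mathrm{Cov}(Z_i, V_{N_i})/\mathrm{Var}(Z_i) = 1/2$, so the second estimate centers near 0.75 although the true weight is 0.5.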

2.2 Proxy Variable

To control for the endogenous unobserved house-specific amenities $U_i$, we follow the proxy variable approach. Specifically, we construct a proxy variable using listing prices, denoted by $L_i$. The seller can perceive house-specific amenities $U_i$ that econometricians cannot observe. Note that the hedonic valuations $H_i$ and the listing prices $L_i$ are both public information even without house visits, while the true amenities $U_i$ can be observed by prospective buyers only during house visits. In order to send a correct signal about the amenities $U_i$ to prospective buyers, sellers may add their values to benchmark hedonic valuations $H_i$ when proposing their listing prices, i.e.,

$$
L_i = H_i + g(U_i).
$$

List prices may differ from the online hedonic estimates $H_i$ for various reasons. List prices tend to start high since the seller predicts that the negotiation process will ultimately result in a lower sales price. How quickly the seller needs to sell the property could also impact the list price. The function $g$ thus captures the seller's adjustment of the self-valuation of $U_i$. Note that the identity function $g(u) = u$ implies that there is no markup or markdown in the listing prices above the observed and unobserved value of the house.³

Finally, to take this structure into estimation of the parameters, we assume that $g$ is strictly increasing so that its inverse $g^{-1}$ exists. With this inverse function, we can recover the unobserved house-specific amenities $U_i$ by

$$
U_i = g^{-1}(L_i - H_i).
$$

Substituting this expression in (2.2) yields

$$
\begin{aligned}
Y_i &= \gamma Z_i + (1-\gamma)[\alpha + X_i\beta + V_{N_i} + g^{-1}(L_i - H_i) + \varepsilon_i] \\
&= \gamma Z_i + \tilde{\alpha} + X_i\tilde{\beta} + \delta V_{N_i} + \tilde{g}(L_i - H_i) + \tilde{\varepsilon}_i,
\end{aligned} \tag{2.3}
$$

where $\tilde{\alpha} := (1-\gamma)\alpha$, $\tilde{\beta} := (1-\gamma)\beta$, $\delta := (1-\gamma)$, $\tilde{g} := (1-\gamma)g^{-1}(\cdot)$ and $\tilde{\varepsilon}_i := (1-\gamma)\varepsilon_i$ are short-hand notations. This operation removes one of the two sources of endogeneity, namely $U_i$, and it thus remains to handle the other unobserved variable $V_{N_i}$. If $V_{N_i}$ were known, the parameters could be estimated in the presence of the additive nonparametric function $\tilde{g}$ with the method of Robinson (1988).

³ We find that initial list prices are higher than the prior hedonic estimates in about 68 percent and lower in about 32 percent of the observations in our sample.
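The step from the first to the second line of (2.3) can be written out term by term. The Greek letters and tildes here are a reconstruction: $\gamma$ denotes the weight on $Z_i$, $\delta$ the coefficient on $V_{N_i}$, and tildes mark the rescaled hedonic objects.

```latex
\begin{aligned}
Y_i &= \gamma Z_i
     + \underbrace{(1-\gamma)\alpha}_{\tilde{\alpha}}
     + X_i \underbrace{(1-\gamma)\beta}_{\tilde{\beta}}
     + \underbrace{(1-\gamma)}_{\delta} V_{N_i}
     + \underbrace{(1-\gamma)\, g^{-1}(L_i - H_i)}_{\tilde{g}(L_i - H_i)}
     + \underbrace{(1-\gamma)\varepsilon_i}_{\tilde{\varepsilon}_i} .
\end{aligned}
```

Two consequences follow directly from this algebra. If $g$ is the identity, then $\tilde{g}(L_i - H_i) = (1-\gamma)(L_i - H_i)$ and the proxy enters linearly. And whenever $\gamma \neq 1$, the original hedonic coefficients can be recovered as $\beta = \tilde{\beta}/(1-\gamma)$.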

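This method works as follows. Write $\tilde{U}_i := L_i - H_i$ as short-hand notation. If the mean independence $E[\tilde{\varepsilon}_i \mid \tilde{U}_i] = 0$ is true, then

$$
E[Y_i \mid \tilde{U}_i] = \gamma E[Z_i \mid \tilde{U}_i] + \tilde{\alpha} + E[X_i \mid \tilde{U}_i]\tilde{\beta} + \delta E[V_{N_i} \mid \tilde{U}_i] + \tilde{g}(\tilde{U}_i)
$$

follows. Subtracting this from (2.3), we obtain

$$
Y_i - E[Y_i \mid \tilde{U}_i] = \gamma(Z_i - E[Z_i \mid \tilde{U}_i]) + (X_i - E[X_i \mid \tilde{U}_i])\tilde{\beta} + \delta(V_{N_i} - E[V_{N_i} \mid \tilde{U}_i]) + \tilde{\varepsilon}_i.
$$

If the contributory value $V_{N_i}$ of neighborhood $N_i$ were observed, then $\gamma$ could be $\sqrt{N}$-consistently estimated by the OLS of $Y_i - E[Y_i \mid L_i - H_i]$ on $Z_i - E[Z_i \mid L_i - H_i]$, $X_i - E[X_i \mid L_i - H_i]$, and $V_{N_i} - E[V_{N_i} \mid L_i - H_i]$, where the nonparametric regressions $E[Y_i \mid L_i - H_i]$, $E[Z_i \mid L_i - H_i]$, $E[X_i \mid L_i - H_i]$ and $E[V_{N_i} \mid L_i - H_i]$ are pre-estimated using the kernel method.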
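The two-step logic of this estimator (kernel pre-estimation, then OLS on the residuals) can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the Gaussian kernel, the rule-of-thumb bandwidth, and the function names `nw_fit` and `robinson_ols` are assumptions made for the example.

```python
import numpy as np

def nw_fit(w, target, bandwidth):
    """Nadaraya-Watson estimate of E[target | w] at each sample point.

    Uses a Gaussian kernel; `target` may be a vector or a matrix of columns.
    """
    diff = (w[:, None] - w[None, :]) / bandwidth
    weights = np.exp(-0.5 * diff ** 2)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ target

def robinson_ols(y, regressors, w, bandwidth=None):
    """Partial out an unknown function of w, then run OLS on the residuals.

    y          : (n,) outcome, e.g. sales prices
    regressors : (n, k) linear regressors, e.g. columns for Z and X
    w          : (n,) proxy index, e.g. L - H
    """
    n = len(y)
    if bandwidth is None:
        # Rule-of-thumb bandwidth; an assumption, not the paper's choice.
        bandwidth = 1.06 * np.std(w) * n ** (-0.2)
    y_res = y - nw_fit(w, y, bandwidth)
    r_res = regressors - nw_fit(w, regressors, bandwidth)
    # No intercept: the constant is absorbed by the conditional means.
    coef, *_ = np.linalg.lstsq(r_res, y_res, rcond=None)
    return coef
```

The first entry of the returned vector plays the role of $\gamma$ when the first column of `regressors` is $Z_i$. The sketch builds an $n \times n$ weight matrix, so it is suited to samples of the size used here rather than to very large data sets.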

2.3 Local First Differencing

The previous section introduced a way to control for house-specific unobservables $U_i$, provided that the contributory value $V_{N_i}$ of neighborhood $N_i$ is observed. If we have multiple observations per neighborhood, however, we do not need to observe $V_{N_i}$ since we can take first differences within a neighborhood to cancel out the $V_{N_i}$ terms. Note that $N_i = N_j$ clearly implies $V_{N_i} = V_{N_j}$. Hence, we can take the difference of (2.3) between two properties $i$ and $j$ to obtain the equation

$$
Y_i - Y_j = \gamma(Z_i - Z_j) + (X_i - X_j)\tilde{\beta} + \tilde{g}(L_i - H_i) - \tilde{g}(L_j - H_j) + \tilde{\varepsilon}_i - \tilde{\varepsilon}_j \tag{2.4}
$$

for any pair $(i, j)$ such that $N_i = N_j$, i.e., within the same neighborhood.

This operation, mechanically identical to the method of first differencing for panel data analysis, removes the neighborhood fixed effect $V_{N_i}$. For this sort of first-differenced partially linear equation, Li and Stengos (1996) extend the Robinson method (see the previous section). Specifically, $\gamma$ may be $\sqrt{N}$-consistently estimated by the OLS of $Y_i - Y_j - E[Y_i - Y_j \mid L_i - H_i, L_j - H_j]$ on $Z_i - Z_j - E[Z_i - Z_j \mid L_i - H_i, L_j - H_j]$ and $X_i - X_j - E[X_i - X_j \mid L_i - H_i, L_j - H_j]$, where the nonparametric regressions $E[Y_i - Y_j \mid L_i - H_i, L_j - H_j]$, $E[Z_i - Z_j \mid L_i - H_i, L_j - H_j]$, and $E[X_i - X_j \mid L_i - H_i, L_j - H_j]$ are pre-estimated using the kernel method. Baltagi and Li (2002a, 2002b) propose alternative methods to estimate first-differenced partially linear models and discuss the asymptotic properties of the estimators. They suggest that the nonparametric pre-estimations be done with series approximation instead of the kernel method in order to take advantage of the additivity between $\tilde{g}(L_i - H_i)$ and $\tilde{g}(L_j - H_j)$. See Section A.3 in the appendix for further details on how we use this semiparametric approach to estimate $\gamma$ and $\tilde{\beta}$.
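A simplified sketch of the within-neighborhood differencing in (2.4) with a series approximation is given here. It is an illustration under stated assumptions rather than the procedure of Section A.3: the unknown function is approximated by a low-order polynomial in $L_i - H_i$, only consecutive houses within a neighborhood are differenced, and the function name `first_difference_series` and its arguments are hypothetical.

```python
import numpy as np

def first_difference_series(y, z, x, w, hood, degree=3):
    """Estimate the coefficient on Z from within-neighborhood differences.

    y, z : (n,) sales price and online estimate
    x    : (n,) or (n, k) observed house attributes
    w    : (n,) proxy index, e.g. L - H
    hood : (n,) neighborhood identifiers
    The unknown function of w is approximated by a polynomial of the
    given degree (a series approximation; an assumption of this sketch).
    """
    order = np.argsort(hood, kind="stable")
    y, z, x, w, hood = y[order], z[order], x[order], w[order], hood[order]
    same = hood[1:] == hood[:-1]        # adjacent rows share a neighborhood
    basis = np.column_stack([w ** p for p in range(1, degree + 1)])

    def diff(a):
        # First difference between adjacent houses, kept within neighborhoods.
        return (a[1:] - a[:-1])[same]

    # Additivity: the difference of the unknown function equals the
    # difference of the basis terms times common coefficients.
    design = np.column_stack([diff(z), diff(x), diff(basis)])
    coef, *_ = np.linalg.lstsq(design, diff(y), rcond=None)
    return coef[0]
```

No intercept or neighborhood term appears in the differenced regression because both cancel within a pair, which is the point of the first-differencing step.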

3 Data

We collect house level data from Zillow, one of the major online real estate information providers. Zillow provides individual house price estimates, which are available regardless of whether the property is on the market or not.⁴ Zillow does not disclose the formula that it uses to generate its price estimates, but the website mentions that it uses the physical attributes of the property, tax assessments, and prior and current transactions of the property itself and the comparable properties nearby. In addition to its current house price estimates, Zillow provides its past estimates, current and past listing prices when available, the most recent sales price, and past sales prices when available. We collect the sales date and price, the Zillow estimate at the time of sale, the estimates one, two, three, and six months before the sale, the initial listing price, and historical sales and listing prices when available. In addition, a rich set of house-specific and neighborhood-specific information is available for $X_i$. We collect the address of the house, square footage, number of bedrooms, number of bathrooms, lot size, year built, and property tax. Zillow also provides nearby school names and the school ratings from GreatSchools.org.

⁴ Currently there are many real estate websites. Many of these are brokerage websites where listing and selling of properties on the market is the main business model. These websites belong to the local multiple listing service (MLS), a local association through which real estate agents share their property listings. Other websites, for example Zillow and Trulia, are not real estate brokerage firms but mainly serve as information providers to various parties interested in the real estate market. Their business model aims to gain a wide audience and profit from advertisement fees, not brokerage fees.

In constructing the sample, we collect data on multiple houses for each neighborhood to

enable first di↵erencing within neighborhoods. The following procedure is employed to collect

our sample of house level data. We first choose 30 MSAs where Zillow provides both their

house price estimates and sales price information.⁵ For each MSA we find 10 neighborhoods

with median Zillow estimates closest to the MSA median Zillow estimate and collect data on four

houses per neighborhood. If there are fewer than 10 neighborhoods in an MSA we additionally

collect four more houses from existing neighborhoods, starting with neighborhoods that have

median Zillow estimates closest to the MSA median Zillow estimate. Within each neighborhood,

we restrict the search to single family houses that are 2000 sqft or above, have 3 bedrooms or

more, 2 bathrooms or more, and were last sold between July 2012 and July 2013. For each

neighborhood, we narrow down to houses that have Zillow estimates that are closest to its

ZIP code median Zillow estimate and that list the same set of nearby public elementary schools.

We then randomly select the first four houses that have non-missing information on the Zillow

estimates at time of sales, sales price, initial listing price, number of beds and baths, house

square footage, lot size, and year built. This procedure returns 40 houses in each of 30 MSAs for a

total of 1,200 observations. Table 1 Panel A summarizes the characteristics of these houses.
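Two steps of this sampling procedure, ranking neighborhoods by distance from the MSA median and screening houses, can be sketched in a few lines. The field names and the helper names `closest_neighborhoods` and `eligible` are hypothetical; the sketch does not reproduce the actual data collection from the website.

```python
def closest_neighborhoods(medians, msa_median, k=10):
    """Return the k neighborhoods whose median estimate is closest to the MSA median.

    medians : dict mapping neighborhood name -> median Zillow estimate
    """
    ranked = sorted(medians, key=lambda name: abs(medians[name] - msa_median))
    return ranked[:k]

def eligible(house):
    """Screening rule described in the text; the dictionary keys are hypothetical."""
    return (
        house["type"] == "single_family"
        and house["sqft"] >= 2000
        and house["beds"] >= 3
        and house["baths"] >= 2
        # Months as "YYYY-MM" strings compare correctly in lexicographic order.
        and "2012-07" <= house["sold_month"] <= "2013-07"
    )
```

For example, `closest_neighborhoods({"A": 300, "B": 250, "C": 410, "D": 290}, 295, k=2)` keeps the two neighborhoods within 5 of the MSA median and drops the others.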

In order to explore possible factors that might explain the variation in the elasticity estimates across the 30 MSAs, we collect MSA level data from the US Census. We collect information on the internet penetration rate, i.e., the average accessibility of the internet at home, and the number of individuals employed as residential real estate agents. In addition, we compile data on the population, land area, number of families, number of housing units, median household income, educational attainment, and unemployment rate for each MSA. The variables are for the year 2013, except for the number of real estate agents in the MSA, which is for 2012. Table 1 Panel B presents the summary statistics for the MSA level variables.

⁵ We first refer to the MSA Zestimate accuracy file. The MSA file contains 30 MSAs. However, in a few MSAs sale prices were not reported and we replaced those MSAs with other MSAs where accuracy estimates and sales prices were available. The Zestimate accuracy file is accessible at http://www.zillow.com/howto/DataCoverageZestimateAccuracy.htm.

4 Results

4.1 The Impact of Real Estate Information on Sales Prices

We first implement our procedure on the full sample and later by MSAs. Table 2 presents the

full sample results. In columns (1) through (3) we do not first difference observations within

neighborhood, and hence we are not controlling for the unobserved neighborhood component.

Column (1) does not include the proxy for the unobserved house specific characteristics, column

(2) controls for the unobserved house specific characteristics by including a linear proxy, which

is the difference between the initial list price and the Zillow estimate at the month of listing,

and column (3) uses the nonlinear proxy in place of the linear proxy as described in Section

2.2.

The coefficient estimates on the log Zillow estimate are closely distributed around one in all three

specifications and are statistically significantly different from zero. In panel A of Table 2, we

additionally test whether the coefficient estimate is statistically different from one. We are

unable to reject the null in the first three columns.
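The test in panel A compares the estimated elasticity with one rather than with zero. A minimal sketch of such a test is shown here, assuming a standard normal approximation for the test statistic; the function name `t_test_against` is hypothetical, and the numbers in the example are illustrative values, not entries from Table 2.

```python
import math

def t_test_against(estimate, std_error, null_value=1.0):
    """Two-sided test of H0: coefficient == null_value.

    Returns the t-statistic and a p-value based on the standard normal
    approximation (an assumption of this sketch).
    """
    t = (estimate - null_value) / std_error
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for the standard normal CDF Phi.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

if __name__ == "__main__":
    # Hypothetical estimate and standard error, for illustration only.
    t, p = t_test_against(0.9, 0.05)
    print(f"t = {t:.2f}, p = {p:.4f}")
```

A large p-value means that an elasticity of one cannot be rejected, which is how the first three columns are read.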

Rather than reporting coefficient estimates on each observable house-specific variable, i.e.,

number of bedrooms, number of bathrooms, square footage, etc., we report the p-value from

the joint hypothesis test of whether all the coefficient estimates on the observable housing

characteristics are equal to zero in panel B. We reject the null in columns (1) through (3). In

column (2), we also report the coefficient estimate on the linear proxy. The coefficient estimate

on the proxy is 0.23 and is statistically significantly different from zero at the 10 percent level.

Columns (4) through (6) report results when we additionally perform the neighborhood first-differencing

procedure to control for unobserved neighborhood characteristics, as described in

Section 2.3. The coefficient estimates on the log Zillow estimate decrease slightly to around 0.88 to

0.95. As the p-values in panel A indicate, we reject the hypothesis that the elasticity estimate

is one in columns (4) and (5). Though we cannot reject the same hypothesis in column (6), the

p-value is relatively small at 0.138. Panel B indicates that we can reject the hypothesis that all

the coefficient estimates on the observable housing characteristics are equal to zero in columns

(4) and (5). However, we cannot reject the joint hypothesis that the coefficients on the observable covariates are

jointly equal to zero in column (6).

Maintaining the specification with the non-linear proxy and local first differencing as the

base, we additionally control for how long the house was listed on the market until it sold in

column (7). We add higher order covariates of the observables in column (8), and seasonal

trends in column (9). The elasticity estimate is tightly distributed between 0.941 and 0.948.

As the p-values in panel A indicate, we cannot reject the hypothesis that these estimates are

equal to one at the 15 percent level. The p-values in panel B indicate that we

are now unable to reject the joint hypothesis that all observable covariates are zero. Overall,

Table 2 indicates that the elasticity estimates of house sales prices to online price estimates

are large and close to one, even when observable and unobservable house and neighborhood

characteristics are controlled for. In other words, participants in the housing market seem to

rely on online price information more so than their own hedonic assessments of house or neighborhood

attributes. The multiple dimensions of house information, $X_i$, $U_i$ and $V_{N_i}$, are available for them

to look at, but the scalar online price index, $Z_i$, may be sufficient or more important to the

market participants when pricing property.

We hypothesize that information, now readily available through online websites, is directly

affecting property sales prices and present evidence consistent with this information channel

in the following sections. However, once one considers how property transactions are actually

made, the Table 2 results may seem less surprising. A hedonic framework is often used to

estimate the marginal valuation of one attribute in a composite good. It is an ex post revelation

of what people’s marginal willingness to pay for an attribute is. For instance, we can back out

the marginal value of a bedroom from a hedonic regression. However, the market participant,

be it the seller, buyer, or broker, would rarely use the estimates from a hedonic method to price

the composite good, the house. One of the most common methods to value property is to use

recent sales prices of comparable properties in the neighborhood and then to make marginal

adjustments based on the different attributes of the houses. Hence, nearby information is a

prime determinant in setting house prices. The emergence of Zillow and various online real

estate websites has made this information readily available to everyone in the market and not

just to comparables. Now, almost every potential homeowner has a comparable price for his or

her exact home.

4.2 What Explains the Variation in the Elasticity Estimates across

MSAs?

We next examine how the elasticity estimate $\gamma$, i.e., the relative value of online information,

varies across MSAs. Table 3 presents the impact of one month prior Zillow estimates on sales

prices in each of the 30 MSAs using the full model in Table 2 column (9). For each MSA we

additionally conduct hypothesis tests on whether the elasticity estimate is one and whether all

covariates are jointly equal to zero. Even with 40 observations per MSA, many of the estimates

are statistically different from zero at the 10 percent level. The estimates vary considerably

across MSAs, ranging from -0.9 in Las Vegas to 2 in Boston. Many of the estimates are

statistically indistinguishable from one even at the MSA level.

What might explain the variation in the elasticity estimates in the real estate market and

hence the elasticity estimates in Table 3? We hypothesize that the availability of internet at

home would increase the demand for online house price information and the relative valuation

of online house price information in determining sales prices, i.e., the elasticity estimate. We

use the internet penetration rate, measured as one minus the share of households without


internet and computer access in the MSA, as our main proxy for accessibility to online house

price information. Figure 1 presents a scatter plot between the MSA level elasticity estimates

and the internet penetration rate. The scatterplot reveals a surprisingly strong and positive

relationship between the two variables. Despite the sample size of 30, the bivariate regression

in Table 4 column (1) confirms that this relationship is statistically significant at the 5 percent

level. A 1 percentage point increase in the internet penetration rate is associated with an increase

in the elasticity estimate of about 0.2. In column (2) we include the number of residential real estate

agents in the MSA, but this variable has no significant impact and the coefficient estimate on

the internet penetration rate remains essentially unchanged. In column (3), we control for the size of the

MSA by controlling for the land area and population. The coefficient estimate on the internet

penetration rate decreases slightly but is still significant at the 10 percent level. In column (4), we additionally control

for the number of families. The number of families conditional on the population captures the

potential demand for housing in the MSA. The coefficient estimate on the log number of families

is positive and the coefficient estimate on population is now negative. This likely suggests that

cities with more families per population demand more housing and put more value on online

price information. In column (5), we control for the median household income and the number

of adults with a college degree. There is a weak negative relationship with median income

but no significant relationship with the size of the college educated population. We note that

the coefficient estimate on internet penetration rate increases to 32.57. Finally, in column (6), we control for

housing supply and the economic condition in the city. The negative coefficient estimate on

the log number of housing units suggests that the supply of housing conditional on potential

demand reduces the elasticity estimate, potentially by reducing the relative demand for online

house price information. The coefficient estimate on internet penetration rate barely changes

and is now statistically significant at the 1 percent level. The robust relationship between

the internet penetration rate and the elasticity estimate across MSAs in Table 4 supports the

hedonic price model we laid out in Section 3.
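To make the magnitude of the Table 4 coefficient concrete, the sketch below converts the column (1) estimate into implied changes in the elasticity estimate, first for a 1 percentage point change and then for the full range of internet penetration rates reported in Table 1. The function name is hypothetical, and the calculation is only a linear reading of the bivariate regression.

```python
# Minimal sketch: translate the Table 4 column (1) coefficient into implied
# changes in the elasticity estimate. Numbers are copied from Tables 1 and 4.

COEF_INTERNET = 22.28          # Table 4, column (1)
RATE_MIN = 0.8945243           # Table 1, Panel B (minimum penetration rate)
RATE_MAX = 0.9439421           # Table 1, Panel B (maximum penetration rate)

def implied_change(coef: float, delta_rate: float) -> float:
    """Change in the elasticity estimate for a given change in the rate."""
    return coef * delta_rate

one_point = implied_change(COEF_INTERNET, 0.01)                  # 1 percentage point
full_range = implied_change(COEF_INTERNET, RATE_MAX - RATE_MIN)  # min to max

print(f"1 percentage point: {one_point:.3f}")
print(f"min-to-max range:   {full_range:.3f}")
```

Because the penetration rate only varies between about 0.89 and 0.94 in the sample, the large coefficient translates into elasticity differences of roughly the size seen across MSAs in Table 3.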


4.3 How do online estimates impact list prices and sales prices around the time of sales?

We further explore the information channel by examining how Zillow estimates impact the

movement of list prices and sales prices for the same house around the time of sales. Using

our data we construct a panel data set of list prices and Zillow estimates. Each house has Zillow

estimates from 1, 2, and 3 months prior to sales. We were able to collect historical list prices

for only a subset of the initial sample. We perform a simple house level fixed effects regression

of list price on current or one month prior Zillow estimates. As the results in Table 5 indicate,

we find no impact of Zillow estimates on list prices in the months before sales. People do not

immediately adjust list prices based on the short term changes in Zillow estimates.

But do Zillow estimates have an impact when sales are being negotiated? We examine

whether the short term change in Zillow estimates impacts how the final sales price adjusts from

the list price one month prior to sales. Table 6 presents first-differenced regression results where

the dependent variable is sales price minus list price one month prior to sales, and the main

regressor is the change in the Zillow estimate from two months to one month prior to sales.

We control for the number of days the house was listed on the market prior to sales. Panel

A of column (1) indicates that the Zillow estimates have no impact on how prices adjust in

the final month leading to sales. However, we are concerned with cases where Zillow estimates

bounce around drastically and may not be perceived as reliable. We restrict the sample and

drop observations where the Zillow estimates fluctuate drastically in Panels B and C. Panel B

uses the sample of houses where the change in Zillow estimates is less than $100,000. Panel C

uses the sample where the change is less than 10 percent of the initial Zillow estimate. The

coe�cient estimate in column (1) is positive and statistically significant in Panel B and becomes

larger as we trim the sample further in Panel C.

A concern in column (1) is whether the change in market conditions that is correlated with

the change in Zillow estimates is driving the final price adjustments during the sales phase.

We additionally control for year and month fixed effects in column (2) and the change in MSA


level Zillow estimates in column (3). We exclude the own house observation when calculating

the MSA level average. Column (4) includes both the fixed effects and the change at the MSA

level. The results remain quite robust across the different specifications. Column (4) estimates

of Panels B and C indicate that a thousand dollar change in the Zillow estimate results in the

final sales price adjusting from the list price one month prior to sales by 200 to 400 dollars.
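The sample restrictions in Panels B and C and the dollar reading of the column (4) coefficients can be sketched as follows. The house records are invented for illustration, the function names are hypothetical, and applying the thresholds to the absolute change is an assumption of this sketch rather than something stated in the text.

```python
# Minimal sketch of the Panel B / Panel C style sample restrictions and of the
# dollar interpretation of the column (4) coefficients in Table 6.
# The house records below are made up for illustration only.

houses = [
    # (Zillow estimate at start of window, Zillow estimate at end of window)
    (300_000, 305_000),
    (250_000, 400_000),   # change of 150,000: dropped in both panels
    (900_000, 995_000),   # change of 95,000 (about 10.6%): kept in B, dropped in C
]

def keep_panel_b(start: float, end: float, cap: float = 100_000) -> bool:
    """Keep houses whose absolute change in the Zillow estimate is below cap."""
    return abs(end - start) < cap

def keep_panel_c(start: float, end: float, share: float = 0.10) -> bool:
    """Keep houses whose absolute change is below a share of the initial estimate."""
    return abs(end - start) < share * start

panel_b = [h for h in houses if keep_panel_b(*h)]
panel_c = [h for h in houses if keep_panel_c(*h)]

# Column (4) coefficients from Table 6, Panels B and C
for label, coef in (("Panel B", 0.196), ("Panel C", 0.356)):
    print(f"{label}: a $1,000 change in the Zillow estimate ~ ${coef * 1000:.0f}")
```

The two coefficients imply adjustments of roughly 200 to 400 dollars per thousand dollar change, which is the range quoted in the text.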

5 Conclusion

We investigate how sensitive sales prices are to online price estimates in the housing market.

We propose a reduced-form equilibrium pricing model as the convex combination of third-party

online price estimate and self valuation of properties. Our method nonparametrically proxies

for unobservable house attributes by using the difference between the listing price and the

online price estimate, and controls for unobserved neighborhood attributes by neighborhood

first differencing. We collect house price estimates, sales and list prices, in addition to various

house and neighborhood attributes from Zillow.com across 30 MSAs in the US. Our empirical

results show that the elasticity of sales price with respect to the Zillow estimate is large and

close to one. In addition, we find that the accessibility of the internet at home strongly and

positively predicts the elasticity estimate across metropolitan areas. Furthermore, the change

in Zillow estimates impacts how sales prices adjust from the list prices a month before sales.

The literature has found that information impacts asset prices, in particular in the securi-

ties market. We find that information is valued in the real estate market as well. The large

elasticity estimate between house sales prices and online price estimates may have significant

implications. If information is more important than fundamentals in determining real estate

prices, then how information is generated could have a big impact on house price dynamics.

Also, the prevalence and accessibility of online house price information and people’s reliance

on such information potentially could precipitate house price movements.


Acknowledgements

We thank Nathaniel Baum-Snow, Thomas Davidoff, Daniel Fetter, Edward Kung, Jeff Zabel

and seminar participants at the AREUEA Annual Conference, Urban Economics Association

Annual Conference, Greater Boston Urban and Real Estate Economics Conference, the Lincoln

Land Institute, and Williams College for comments. Danny Guo and Simmon Kim provided

excellent research assistance.

References

Bajari, Patrick, Jane Cooley, Kyoo il Kim, and Christopher Timmins. 2012. “A Rational Expectations Approach to Hedonic Price Regressions with Time-Varying Unobserved Product Attributes: The Price of Pollution,” American Economic Review, 102(5): 1898-1926.

Baltagi, Badi and Dong Li. 2002a. “Series Estimation of Partially Linear Panel Data Models with Fixed Effects,” Annals of Economics and Finance, 3: 103-116.

Baltagi, Badi and Qi Li. 2002b. “On Instrumental Variable Estimation of Semiparametric Dynamic Panel Data Models,” Economics Letters, 76: 1-9.

Baum-Snow, Nathaniel and Fernando Ferreira. 2015. “Causal Inference in Urban Economics,” Gilles Duranton, J. Vernon Henderson and William C. Strange Ed., Handbook of Regional and Urban Economics, Vol. 5: 588-638.

Bayer, Patrick, Fernando Ferreira, and Robert McMillan. 2007. “A Unified Framework for Measuring Preferences for Schools and Neighborhoods,” Journal of Political Economy, 115(4): 588-638.

Bayer, Patrick, Marcus Casey, Fernando Ferreira, and Robert McMillan. 2013. “Estimating Racial Price Differentials in the Housing Market,” mimeo.

Black, Sandra. 1999. “Do Better Schools Matter? Parental Valuation of Elementary Education,” Quarterly Journal of Economics, 114(2): 577-599.


Campbell, John Y., Stefano Giglio, and Parag Pathak. 2011. “Forced Sales and House Prices,”

American Economic Review, 101: 2108-2131.

Chay, Kenneth and Michael Greenstone. 2005. “Does Air Quality Matter? Evidence from the

Housing Market,” Journal of Political Economy, 113(2): 376-424.

Easley, David and Maureen O’Hara. 1987. “Price, Trade Size, and Information in Securities

Markets,” Journal of Financial Economics, 19: 69-90.

Ferreira, Fernando and Joseph Gyourko. 2011. “Anatomy of the Beginning of the Housing

Boom: U.S. Neighborhoods and Metropolitan Areas 1993-2009,” mimeo.

Ferreira, Fernando. 2010. “You Can Take It with You: Proposition 13 Tax Benefits, Residential Mobility and Willingness to Pay for Housing Amenities,” Journal of Public Economics, 94: 661-673.

Levitt, Steven D. and Chad Syverson. 2008. “Market Distortions When Agents Are Better Informed: The Value of Information In Real Estate Transactions,” Review of Economics and Statistics, 90(4): 599-611.

Li, Qi and Thanasis Stengos. 1996. “Semiparametric Estimation of Partially Linear Panel Data Models,” Journal of Econometrics, 71: 389-397.

Newey, Whitney K. 1997. “Convergence Rates and Asymptotic Normality for Series Estimators,” Journal of Econometrics, 79: 147-168.

Robinson, Peter. 1988. “Root-N-Consistent Semiparametric Regression,” Econometrica, 56: 931-954.

Rosen, Sherwin. 1974. “Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition,” Journal of Political Economy, 82(1): 34-55.

Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso,” Journal of the

Royal Statistical Society: Series B, 58: 267-288.


Zou, Hui. 2006. “The Adaptive Lasso and Its Oracle Properties,” Journal of the American

Statistical Association, 101: 1418-1429.

A Appendix

A.1 The MSA Level Analysis

As a preliminary check, we also examine the hypothesis that online property price estimates

impact actual sales prices at the aggregate level. If house price information directly impacts

house prices, we expect the relation to hold at an aggregate level as well. Specifically, we test

whether Zillow’s median price estimates Granger cause the median sales price as reported by

Zillow across 30 MSAs in the US. The 30 MSAs were chosen based on Zillow’s MSA level report

and the availability of individual sales price information.6 Appendix Table 1 lists the 30 MSAs

and the summary statistics of the median sales price and Zillow estimates for three bedroom

single family houses. The MSA level data is available at Zillow’s research division and we collect

monthly data from October 2008 to April 2013.7 The following two subsections introduce the

empirical methodology for the MSA level analysis, and the third subsection presents empirical

results.

A.1.1 Granger Causality in VAR

For each MSA, we denote Zillow’s median log house price estimate at time t by Zt. The median

log sales price at time t is denoted by Yt. We assume that they jointly follow the p-th order

vector autoregressive (VAR(p)) process:

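$$
\begin{pmatrix} Z_t \\ Y_t \end{pmatrix}
= \begin{pmatrix} A_{0,1} \\ A_{0,2} \end{pmatrix}
+ \sum_{q=1}^{p} \begin{pmatrix} A_{q,1,1} & A_{q,1,2} \\ A_{q,2,1} & A_{q,2,2} \end{pmatrix}
\begin{pmatrix} Z_{t-q} \\ Y_{t-q} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{t,1} \\ \varepsilon_{t,2} \end{pmatrix}
\tag{A.1}
$$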

6 Section 4 describes the selection of MSAs in more detail.

7 The data is available at http://www.zillow.com/blog/research/data/


We say that $Z_t$ does not Granger cause $Y_t$ if $A_{q,2,1} = 0$ for all $q = 1, \dots, p$. A test of this null hypothesis can be conducted by the Wald test on $(A_{1,2,1}, \dots, A_{p,2,1})$. Let $A_2 = (A_{0,2}, A_{1,2,1}, A_{1,2,2}, \dots, A_{p,2,1}, A_{p,2,2})'$ be the $(2p+1)$-dimensional vector of the coefficients in the second row of the above VAR model (A.1). Let $\hat{A}_2$ denote its consistent estimate, and let $\hat{\Sigma}_{A_2}$ denote a consistent estimate of the variance matrix of the coefficient estimate $\hat{A}_2$. The Wald statistic is computed by

$$
W = \hat{A}_2' R' \left( R \hat{\Sigma}_{A_2} R' \right)^{-1} R \hat{A}_2
$$

where $R$ is the $p \times (2p+1)$ restriction matrix whose $2r$-th column is one for each row $r = 1, \dots, p$, and all the other elements are zero. Under the null hypothesis $H_0 : R A_2 = \vec{0}$, this Wald statistic $W$ asymptotically follows the chi-square distribution with $p$ degrees of freedom. We report this statistic and the associated p-value for the test of Granger causality.
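The Wald statistic described above can be sketched in a few lines of pure Python. The coefficient vector and variance matrix below are illustrative numbers, not estimates from the paper, and the helper functions are hypothetical names introduced only for this sketch.

```python
# Minimal sketch of the Wald statistic W = (R a)' (R S R')^{-1} (R a) for the
# Granger-causality restrictions. Pure Python; all inputs are illustrative.

def restriction_matrix(p: int) -> list[list[float]]:
    """p x (2p+1) matrix with a one in the 2r-th column of row r (1-based)."""
    R = [[0.0] * (2 * p + 1) for _ in range(p)]
    for r in range(1, p + 1):
        R[r - 1][2 * r - 1] = 1.0   # 2r-th column has 0-based index 2r - 1
    return R

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[i][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x

def wald_statistic(a_hat, sigma_hat, p):
    """Wald statistic for H0: R A_2 = 0 given estimates of A_2 and its variance."""
    R = restriction_matrix(p)
    Ra = [row[0] for row in matmul(R, [[a] for a in a_hat])]
    RSR = matmul(matmul(R, sigma_hat), transpose(R))
    return sum(x * y for x, y in zip(Ra, solve(RSR, Ra)))

# Illustrative inputs for p = 2, ordered as
# (A_{0,2}, A_{1,2,1}, A_{1,2,2}, A_{2,2,1}, A_{2,2,2})
a_hat = [0.1, 0.5, 0.2, -0.3, 0.4]
sigma_hat = [[0.04 if i == j else 0.0 for j in range(5)] for i in range(5)]
W = wald_statistic(a_hat, sigma_hat, p=2)
print(W)  # compare with a chi-square critical value with p degrees of freedom
```

The restriction matrix simply picks out the coefficients on lagged $Z$ in the $Y$ equation, so the statistic only depends on those elements and the corresponding block of the variance matrix.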

A.1.2 Model Selection

There is arbitrariness in the choice of the order p of the VAR model (A.1). Some commonly used

approaches to selecting p include Akaike Information Criterion (AIC) and Bayesian Information

Criterion (BIC). We conduct the hypothesis testing after selecting the order p of the VAR

process by choosing the minimum AIC or BIC. However, these approaches have some drawbacks

in terms of consistency of model selection and validity in post-selection inference.
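Order selection by information criteria can be sketched as follows for a bivariate VAR. The log-likelihood values are invented for illustration, the function names are hypothetical, and the sketch uses the standard definitions AIC = -2 log L + 2k and BIC = -2 log L + k log T, ignoring the fact that the effective sample size changes with p.

```python
import math

# Minimal sketch of lag-order selection by AIC and BIC for a bivariate VAR.
# The log-likelihood values below are made up for illustration.

def n_params(p: int, n_vars: int = 2) -> int:
    """Intercepts plus lag coefficients: n_vars * (1 + n_vars * p)."""
    return n_vars * (1 + n_vars * p)

def aic(loglik: float, k: int) -> float:
    return -2.0 * loglik + 2.0 * k

def bic(loglik: float, k: int, T: int) -> float:
    return -2.0 * loglik + k * math.log(T)

def select_order(logliks: dict[int, float], T: int) -> tuple[int, int]:
    """Return (order minimizing AIC, order minimizing BIC)."""
    best_aic = min(logliks, key=lambda p: aic(logliks[p], n_params(p)))
    best_bic = min(logliks, key=lambda p: bic(logliks[p], n_params(p), T))
    return best_aic, best_bic

# Hypothetical maximized log-likelihoods for p = 1, 2, 3 with T = 50 observations
logliks = {1: -120.0, 2: -112.0, 3: -107.5}
print(select_order(logliks, T=50))
```

Because BIC penalizes each additional parameter by log T rather than 2, it tends to choose a shorter lag length than AIC whenever T is moderately large.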

A recently popular method of model selection in econometrics is the least absolute shrinkage

and selection operator (LASSO: Tibshirani, 1996). In particular, the adaptive LASSO (Zou,

2006) enjoys the oracle property as well as consistency of the model selection. This method

works as follows. Let $\hat{A}$ denote a preliminary consistent estimate of the parameters $A$ in model

(A.1) without model restriction, e.g., the least squares estimate under a choice of large order


p. The adaptive LASSO estimate $A^*$ is obtained by the $L_1$-penalized least squares problem

$$
A^* = \arg\min_{A} \sum_{t=p+1}^{T}
\left\|
\begin{pmatrix} Z_t \\ Y_t \end{pmatrix}
- \begin{pmatrix} A_{0,1} \\ A_{0,2} \end{pmatrix}
- \sum_{q=1}^{p} \begin{pmatrix} A_{q,1,1} & A_{q,1,2} \\ A_{q,2,1} & A_{q,2,2} \end{pmatrix}
\begin{pmatrix} Z_{t-q} \\ Y_{t-q} \end{pmatrix}
\right\|^2
+ \lambda_T \sum_{r=1}^{2} \left(
\frac{|A_{0,r}|}{|\hat{A}_{0,r}|^{\gamma}}
+ \sum_{c=1}^{2} \sum_{q=1}^{p} \frac{|A_{q,r,c}|}{|\hat{A}_{q,r,c}|^{\gamma}}
\right).
$$

The theory (Zou, 2006) requires that the tuning parameters $\gamma > 0$ and $\lambda_T > 0$ satisfy the asymptotic order $\lambda_T / \sqrt{T} \to 0$ and $\lambda_T T^{(\gamma - 1)/2} \to \infty$ as $T \to \infty$. In practice, however, $T$ is fixed given a finite sample, and thus this asymptotic guideline may not be useful. Therefore, we present empirical estimation results for each of the different values of the tuning parameter, and examine their robustness.
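The mechanics of the adaptive LASSO penalty can be sketched with a small coordinate descent routine for a single linear equation. This is a simplified illustration, not the procedure used for the results: it omits the intercept, uses invented data, assumes every preliminary estimate is non-zero, and the function names are hypothetical.

```python
# Minimal sketch of adaptive LASSO by coordinate descent for one linear equation:
#   minimize  sum_t (y_t - sum_j x_tj b_j)^2 + lam * sum_j |b_j| / |b_prelim_j|**gamma
# Pure Python; no intercept; data are made up for illustration.

def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t, returning zero inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def adaptive_lasso(X, y, b_prelim, lam, gamma=1.0, n_iter=200):
    n, k = len(X), len(X[0])
    # adaptive penalty weights (assumes every preliminary estimate is non-zero)
    weights = [1.0 / abs(b) ** gamma for b in b_prelim]
    b = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            # inner product of regressor j with the residual that excludes it
            rho = sum(
                X[t][j] * (y[t] - sum(X[t][m] * b[m] for m in range(k) if m != j))
                for t in range(n)
            )
            ssq = sum(X[t][j] ** 2 for t in range(n))
            b[j] = soft_threshold(rho, lam * weights[j] / 2.0) / ssq
    return b

X = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
y = [2.0, 2.0, 0.1, 0.1]
b_ols = [2.0, 0.1]            # least squares estimates for this orthogonal design
b_star = adaptive_lasso(X, y, b_ols, lam=0.2, gamma=1.0)
print(b_star)
```

The coefficient with a small preliminary estimate receives a large penalty weight and is set exactly to zero, while the large coefficient is only slightly shrunk; this is how the method selects the lag order.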

A.1.3 Results from the MSA level analysis

Appendix Table 2 presents the tests of Granger causality of median sales price $Y_t$ by Zillow’s

median price estimates for each MSA. The results are based on the best information criteria

with the maximum p set to 10. Column (1) presents results when optimal p is chosen using

the minimum AIC. The optimal lag ranges from 7 to 10 with 10 being the most common. In

all 30 cities but one, Boston, the joint hypothesis that all the coefficients on $Z_{t-q}$ are zero is

rejected at the 10 percent level. Column (2) uses the minimum BIC to choose p and the joint

hypothesis tests reject the null for all cities except Boston, Denver, and Philadelphia. Results from the

preferred LASSO method with tuning parameters of 0.5, 1.0, and 2.0 are presented in columns

(3) - (5). The optimal p tends to be smaller in these columns than in columns (1) and (2), but

the general results are similar. Other than in three to five cities, we reject the null hypothesis

that online price estimates do not Granger cause sales prices. The last row of Appendix Table

2 presents results for the entire US. The selected lag orders are 5 and 6 in the LASSO models of

columns (4) and (5) and 10 in the other models. The p-values imply that Zillow’s median price

estimates Granger-cause actual sales prices at the national level in all five models. The MSA


and national level aggregate results indicate that house price information may impact actual

transaction prices.

A.2 Zillow Estimates

Zillow does not disclose the formula for how their hedonic price estimates are produced, but

they mention which data they use. According to their website, some of the data that they use

include:

• Physical attributes: Location, lot size, square footage, number of bedrooms and bath-

rooms and many other details.

• Tax assessments: Property tax information, actual property taxes paid, exceptions to tax

assessments and other information provided in the tax assessors’ records.

• Prior and current transactions: Actual sale prices over time of the home itself and com-

parable recent sales of nearby homes

A.3 Semiparametric Estimation

In this section, we describe details of the econometric method for semiparametric estimation of $\beta$ and $\delta$. For convenience, we slightly change the index notation from that used in the main text. Suppose that there are $N$ neighborhoods indexed by $n$. For simplicity, assume that the data contain $J + 1$ houses in each neighborhood. Randomly order the $J + 1$ houses in each neighborhood to produce the indices $j = 0, \dots, J$ for each neighborhood $n = 1, \dots, N$. We observe a random sample of $(Y_n, Z_n, X_n, L_n, H_n)$ where $Y_n = (Y_{n0}, \dots, Y_{nJ})$, $Z_n = (Z_{n0}, \dots, Z_{nJ})$, $X_n = (X_{n0}, \dots, X_{nJ})$, $L_n = (L_{n0}, \dots, L_{nJ})$, and $H_n = (H_{n0}, \dots, H_{nJ})$. We make the following assumption for this random sample.

Assumption 1. (i) The random vector $(Y_n, Z_n, X_n, L_n, H_n)$ is independently and identically

distributed. (ii) The support of $(Z_n, X_n, L_n, H_n)$ is a Cartesian product of compact connected


intervals on which $(Z_n, X_n, L_n, H_n)$ has a probability density function bounded away from zero. (iii) $g^{-1}$ and $\mathrm{Var}(Y_{nj} \mid Z_{nj} = \cdot\,, X_{nj} = \cdot\,, L_{nj} - H_{nj} = \cdot\,)$ are bounded functions.

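The vectors of first-differenced observed random variables are stacked into matrices of $NJ$ rows as

$$
\begin{aligned}
Y &= (\Delta Y_{11}, \dots, \Delta Y_{1J}, \Delta Y_{21}, \dots, \Delta Y_{2J}, \dots\dots, \Delta Y_{N1}, \dots, \Delta Y_{NJ})' \\
Z &= (\Delta Z_{11}, \dots, \Delta Z_{1J}, \Delta Z_{21}, \dots, \Delta Z_{2J}, \dots\dots, \Delta Z_{N1}, \dots, \Delta Z_{NJ})' \\
X &= (\Delta X_{11}, \dots, \Delta X_{1J}, \Delta X_{21}, \dots, \Delta X_{2J}, \dots\dots, \Delta X_{N1}, \dots, \Delta X_{NJ})'
\end{aligned}
$$

where $\Delta Y_{nj} = Y_{nj} - Y_{nj-1}$, $\Delta Z_{nj} = Z_{nj} - Z_{nj-1}$, and $\Delta X_{nj} = (X_{nj} - X_{nj-1})'$ for each $n = 1, \dots, N$ and $j = 1, \dots, J$.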

We approximate the unknown nonparametric function $g$ using the power series. Specifically, define the $K$-dimensional random vector

$$
p^K_{nj} = \left( (L_{nj} - H_{nj}) - (L_{nj-1} - H_{nj-1}), \dots, (L_{nj} - H_{nj})^K - (L_{nj-1} - H_{nj-1})^K \right)'
$$

for each $n = 1, \dots, N$ and $j = 1, \dots, J$. Define the $NJ \times K$ random matrix

$$
P_K = (p^K_{11}, \dots, p^K_{1J}, p^K_{21}, \dots, p^K_{2J}, \dots\dots, p^K_{N1}, \dots, p^K_{NJ})'.
$$

To control the bias caused by this approximation, we use the following assumptions for sufficient smoothness of $g$ and the asymptotic choice of the order $K$ of the power series.

Assumption 2. $g$ is $C^1$-diffeomorphic.

Assumption 3. $\sqrt{N}/K \to 0$ as $N \to \infty$.

We estimate the parameters $(\beta, \delta')$ in the following manner. To partial out $p^K_{nj}$ from the estimating equation, we obtain the projections $\hat{Y} = P_K (P_K' P_K)^{-} P_K' Y$, $\hat{Z} = P_K (P_K' P_K)^{-} P_K' Z$, and $\hat{X} = P_K (P_K' P_K)^{-} P_K' X$, where $(\cdot)^{-}$ denotes a symmetric generalized inverse operation. Our


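estimator is

$$
\begin{pmatrix} \hat{\beta} \\ \hat{\delta} \end{pmatrix}
= \left[ \left( (Z : X) - (\hat{Z} : \hat{X}) \right)' \left( (Z : X) - (\hat{Z} : \hat{X}) \right) \right]^{-}
\left( (Z : X) - (\hat{Z} : \hat{X}) \right)' \left( Y - \hat{Y} \right).
$$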

We also state the following rank condition.

Assumption 4. The matrix $\Phi := E[e_{nj}' e_{nj}]$ is positive definite, where $e_{nj}$ is defined by $e_{nj} = (Z_{nj} : X_{nj}) - E[(Z_{nj} : X_{nj}) \mid L_{nj} - H_{nj}]$.
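The estimation steps of this section (neighborhood first differencing, building the differenced power series terms, partialling them out, and regressing residuals on residuals) can be sketched in pure Python. This is a simplified illustration under stated assumptions: a scalar regressor $Z$ with no $X$, an ordinary linear solve in place of the generalized inverse, and an invented data-generating process in which $g$ is exactly quadratic. All function names and the true coefficient of 0.9 are hypothetical.

```python
import random

# Minimal sketch of the partialling-out series estimator with a scalar Z and no X.
# Pure Python; the data-generating process below is made up for illustration.

def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[i][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x

def residualize(y, P):
    """Residuals of y after least squares projection on the columns of P."""
    k = len(P[0])
    PtP = [[sum(row[a] * row[b] for row in P) for b in range(k)] for a in range(k)]
    Pty = [sum(row[a] * yi for row, yi in zip(P, y)) for a in range(k)]
    coef = solve(PtP, Pty)
    return [yi - sum(c * pa for c, pa in zip(coef, row)) for row, yi in zip(P, y)]

def estimate_beta(data, K):
    """data[n] is a list of (Y, Z, V) per house, with V = L - H the proxy argument."""
    dY, dZ, P = [], [], []
    for houses in data:
        for (y1, z1, v1), (y0, z0, v0) in zip(houses[1:], houses[:-1]):
            dY.append(y1 - y0)                      # neighborhood first difference
            dZ.append(z1 - z0)
            P.append([v1 ** k - v0 ** k for k in range(1, K + 1)])  # differenced powers
    rY, rZ = residualize(dY, P), residualize(dZ, P)
    return sum(a * b for a, b in zip(rZ, rY)) / sum(a * a for a in rZ)

rng = random.Random(0)
data = []
for n in range(50):
    c_n = rng.gauss(0, 1)                            # neighborhood unobservable
    houses = []
    for j in range(4):
        z, v = rng.gauss(12, 0.5), rng.uniform(-1, 1)
        houses.append((0.9 * z + v ** 2 + c_n, z, v))  # g(v) = v^2, true coefficient 0.9
    data.append(houses)

beta_hat = estimate_beta(data, K=2)
print(round(beta_hat, 6))
```

In this noise-free example the neighborhood effect is removed by first differencing and the quadratic proxy function is spanned exactly by the power series with $K = 2$, so the residual regression recovers the coefficient on $Z$ up to floating point error.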

We use the result of Baltagi and Li (BL, 2002a) to obtain standard errors. Assumption 1 (i)–(iii) imply Assumption 2.1 of BL. For the power series estimator, Assumption 1 (ii) implies Assumption 8 of Newey (1997), which in turn implies Assumption 2 of Newey, which is equivalent to Assumption 2.2 of BL. Assumption 2 implies that $g$ is continuously differentiable. Since the argument $L_{nj} - H_{nj}$ of $g$ is univariate, this implies that Assumption 3 of Newey for $d = 0$ (which is equivalent to Assumption 2.3 (i) of BL) is satisfied with $\alpha = s = 1$ (the corresponding rate parameter in BL's notation); see page 157 of Newey. With this smoothness requirement, the choice of $K$ as in Assumption 3 satisfies Assumption 2.3 of BL. Therefore, by Theorem 2.1 of BL, we obtain

$$
\sqrt{N} \begin{pmatrix} \hat{\beta} - \beta \\ \hat{\delta} - \delta \end{pmatrix} \to N(0, \Phi^{-1} \Omega \Phi^{-1})
$$

as $N \to \infty$ under Assumptions 1–4, where $\Phi$ is the positive definite matrix defined in Assumption 4 and $\Omega$ is given by $\Omega = E\left[ e_{nj}' \, E\left[ (\varepsilon_{nj} - \varepsilon_{nj-1})^2 \mid Z_{nj}, X_{nj}, L_{nj} - H_{nj} \right] e_{nj} \right]$.

To estimate the part $\Omega$ of the variance matrix, which involves the first-differenced residuals $\varepsilon_{nj} - \varepsilon_{nj-1}$, we need to obtain the estimate

$$
\hat{g}(L_{nj} - H_{nj}) = \left( (L_{nj} - H_{nj}), \dots, (L_{nj} - H_{nj})^K \right) (P_K' P_K)^{-} P_K' \left( Y - Z\hat{\beta} - X\hat{\delta} \right)
$$

of the unknown nonparametric function $g$ at each data point $(L_{nj} - H_{nj})$ and then recover the


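estimate

$$
\widehat{(\varepsilon_{nj} - \varepsilon_{nj-1})} = Y_{nj} - Y_{nj-1} - (Z_{nj} - Z_{nj-1})\hat{\beta} - (X_{nj} - X_{nj-1})\hat{\delta} - \hat{g}(L_{nj} - H_{nj}) + \hat{g}(L_{nj-1} - H_{nj-1})
$$

of $\varepsilon_{nj} - \varepsilon_{nj-1}$. It follows from the uniform consistency of $\hat{g}$ under Assumptions 1–3 (Theorem 2.2 (i) of BL) that $\widehat{(\varepsilon_{nj} - \varepsilon_{nj-1})}$ is uniformly consistent for $\varepsilon_{nj} - \varepsilon_{nj-1}$.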


Table 1. Summary statistics

Variable Mean Std. Dev. Min Max Obs

Panel A: House level data

Sales price 321914 245299 10000 2950000 1200

Zillow estimate when sold 322056 230858 36000 2600000 1200

Zillow estimate 1 month prior to sale 320108 230967 38000 2600000 1200

Zillow estimate 2 month prior to sale 317984 230550 37000 2600000 1200

Zillow estimate 3 month prior to sale 315433 230222 16100 2600000 1200

Zillow estimate in the month listed for sale 324022 232077 33000 2900000 1189

List price 340012 251240 19900 2900000 1199

Number of days between listing and sales 185.1 269.9 2 3349 1195

Number of bedrooms 3.85 0.87 3 9 1200

Number of bathrooms 2.68 0.64 2 6 1199

Square footage 2373 547 2000 10890 1200

Lot square footage 8365 13936 595 304920 1199

Year built 1960 37 1810 2013 1200

Panel B: MSA level data

Land area 6040.69 4783.888 1600.9 27259.9 30

Population, 2013 4254068 3822144 1601565 1.97E+07 30

Housing units, 2013 1711144 1458024 654120 7797490 30

Number of families, 2013 1013282 876799.7 400899 4550781 30

Median household income, 2013 60789.97 11027.35 46497 90962 30

Population above 25 yrs old with bachelor degree 20.74333 3.258589 12.7 26.9 30

Unemployment rate, 2013 9.943333 1.777771 7 14.7 30

Internet penetration rate, 2013 0.9149734 0.0129532 0.8945243 0.9439421 30

Number of employees in residential real estate agents or brokerage, 2012 4311.6 4262.602 749 22063 30

Notes: Variables listed in Panel A were collected manually from Zillow. Panel B data was extracted from the US Census.


Table 2. The effect of online price estimates on sales prices: elasticity estimates using full sample

Dependent variable: Log sales price

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) |
|---|---|---|---|---|---|---|---|---|---|
| Log Zillow estimate¹ | 1.015 | 1.000 | 0.996 | 0.876 | 0.893 | 0.948 | 0.948 | 0.947 | 0.941 |
| | (0.029) | (0.028) | (0.029) | (0.040) | (0.038) | (0.035) | (0.035) | (0.036) | (0.036) |
| Length of listing | | | | | | | -0.514 | -0.5 | -0.674 |
| | | | | | | | (0.842) | (0.844) | (0.843) |
| Proxy² | | 0.227 | | | 0.21 | | | | |
| | | (0.131) | | | (0.020) | | | | |
| Proxy format | None | Linear | Non-linear³ | None | Linear | Non-linear | Non-linear | Non-linear | Non-linear |
| Local first differencing | | | | Yes | Yes | Yes | Yes | Yes | Yes |
| Higher order covariates⁴ | | | | | | | | Yes | Yes |
| Season dummy | | | | | | | | | Yes |
| A. Null hypothesis: coefficient estimate of log Zillow estimate = 1 (p-value) | 0.603 | 0.989 | 0.881 | 0.002 | 0.005 | 0.138 | 0.136 | 0.148 | 0.104 |
| B. Null hypothesis: all coefficient estimates of the control variables = 0 (p-value) | 0.009 | 0.011 | 0.023 | 0.003 | 0.035 | 0.503 | 0.506 | 0.501 | 0.443 |
| R-squared | 0.908 | 0.919 | 0.935 | 0.905 | 0.918 | 0.935 | 0.935 | 0.935 | 0.935 |
| N | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 |

Notes: ¹ The Zillow estimates are estimates one month prior to the time of sales. ² The proxy is the initial listing price minus Zillow estimate around the time of the initial listing. ³ Third order polynomial approximation. ⁴ Polynomials of third order for bed, bath, square feet, lot square feet, and year built. Standard errors are in parentheses.


Table 3. The effect of online price estimates on sales prices: elasticity estimates by MSA

| MSA | Elasticity estimate¹ | p-value: elasticity estimate¹ = 0 | p-value: elasticity estimate¹ = 1 | p-value: all coefficient estimates on covariates² = 0 | R2 | N |
|---|---|---|---|---|---|---|
| Atlanta | 0.976 (0.234) | 0.004 | 0.920 | 0.500 | 0.976 | 40 |
| Baltimore | 0.201 (0.297) | 0.518 | 0.518 | 0.113 | 0.726 | 40 |
| Boston | 2.007 (0.263) | 0.000 | 0.007 | 0.001 | 0.934 | 40 |
| Charlotte | -0.169 (2.926) | 0.955 | 0.699 | 0.942 | 0.367 | 40 |
| Chicago | 1.000 (0.115) | 0.000 | 0.997 | 0.701 | 0.969 | 40 |
| Cincinnati | 0.857 (0.290) | 0.025 | 0.640 | 0.881 | 0.801 | 40 |
| Columbus | 0.513 (0.303) | 0.125 | 0.142 | 0.906 | 0.600 | 40 |
| Denver | 1.139 (0.239) | 0.001 | 0.575 | 0.960 | 0.901 | 40 |
| Las Vegas | -0.901 (0.755) | 0.267 | 0.036 | 0.222 | 0.036 | 40 |
| Los Angeles | 0.799 (0.142) | 0.001 | 0.197 | 0.179 | 0.940 | 40 |
| Miami-Fort Lauderdale | 0.928 (0.355) | 0.031 | 0.845 | 0.288 | 0.900 | 40 |
| Minneapolis-St. Paul | 0.894 (0.148) | 0.000 | 0.495 | 0.074 | 0.914 | 40 |
| Nashville | 1.095 (0.192) | 0.000 | 0.631 | 0.476 | 0.770 | 40 |
| New York | 0.973 (0.501) | 0.088 | 0.959 | 0.513 | 0.910 | 40 |
| Orlando | 0.587 (0.220) | 0.026 | 0.093 | 0.714 | 0.835 | 40 |
| Philadelphia | 1.282 (0.344) | 0.007 | 0.440 | 0.300 | 0.465 | 40 |
| Phoenix | 0.563 (0.361) | 0.154 | 0.257 | 0.638 | 0.470 | 40 |
| Pittsburgh | 1.056 (0.120) | 0.000 | 0.652 | 0.599 | 0.988 | 40 |
| Portland | 0.853 (0.491) | 0.126 | 0.774 | 0.687 | 0.886 | 40 |
| Providence-Warwick | 1.617 (0.616) | 0.034 | 0.350 | 0.689 | 0.570 | 40 |
| Riverside | 0.248 (0.385) | 0.536 | 0.083 | 0.224 | 0.513 | 40 |
| Sacramento | 0.280 (0.584) | 0.642 | 0.246 | 0.812 | 0.600 | 40 |
| San Diego | 1.656 (1.540) | 0.314 | 0.681 | 0.316 | 0.232 | 40 |
| San Francisco | 0.476 (0.441) | 0.311 | 0.269 | 0.170 | 0.458 | 40 |
| San Jose | 0.884 (0.195) | 0.002 | 0.568 | 0.039 | 0.555 | 40 |
| Seattle | 0.749 (0.255) | 0.022 | 0.358 | 0.627 | 0.628 | 40 |
| St. Louis | 0.305 (0.220) | 0.202 | 0.013 | 0.209 | 0.797 | 40 |
| Tampa | 1.291 (0.148) | 0.000 | 0.086 | 0.260 | 0.172 | 40 |
| Virginia Beach | 1.628 (0.675) | 0.042 | 0.379 | 0.628 | 0.685 | 40 |
| Washington DC | 0.470 (0.222) | 0.067 | 0.044 | 0.565 | 0.696 | 40 |

Notes: ¹ The coefficient estimate of the log Zillow estimate for each MSA using the specification in column (9) of Table 2. The Zillow estimates are estimates one month prior to the time of sales. ² Polynomials of third order for bed, bath, square feet, lot square feet, and year built. Standard errors are in parentheses.


Table 4. Factors that affect the elasticity estimates across MSAs

Dependent variable: Elasticity estimate (Effect of online price estimates on sales prices)

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Internet penetration rate | 22.28** | 22.57** | 19.00* | 18.02* | 32.57** | 31.73*** |
| | (9.244) | (9.412) | (10.22) | (9.191) | (13.34) | (10.83) |
| Ln(number of residential real estate agents) | | -0.0342 | -0.334 | -0.244 | -0.278 | -0.160 |
| | | (0.0920) | (0.311) | (0.288) | (0.352) | (0.326) |
| Ln(land area) | | | -0.141 | -0.168 | 0.00672 | -0.107 |
| | | | (0.201) | (0.165) | (0.252) | (0.243) |
| Ln(population) | | | 0.454 | -3.472** | -2.824 | 0.0484 |
| | | | (0.407) | (1.304) | (1.925) | (2.175) |
| Ln(number of families) | | | | 3.920*** | 3.283* | 5.156*** |
| | | | | (1.286) | (1.773) | (1.312) |
| Ln(median household income) | | | | | -2.061* | -3.701*** |
| | | | | | (1.056) | (1.034) |
| Ln(number of college degrees) | | | | | 1.203 | 1.376 |
| | | | | | (0.994) | (0.999) |
| Ln(housing units) | | | | | | -4.896* |
| | | | | | | (2.530) |
| Unemployment rate | | | | | | -0.0825 |
| | | | | | | (0.0872) |
| Observations | 30 | 30 | 30 | 30 | 30 | 30 |
| R-squared | 0.246 | 0.248 | 0.289 | 0.406 | 0.522 | 0.636 |

Notes: The elasticity estimates for each MSA in Table 3 are the dependent variable. All control variables are for 2013, except for the number of real estate agents, which is for 2012. The internet penetration rate is 1 minus the share of households without internet and computer access. Robust standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1.


Table 5. Impact of Zillow estimates on list prices

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Dependent variable | List price | List price | Ln(list price) | Ln(list price) |
| Zillow estimate | 0.00286 | | | |
| | (0.0353) | | | |
| Zillow estimate one month ago | | 0.000661 | | |
| | | (0.0158) | | |
| Ln(Zillow estimate) | | | -0.0210 | |
| | | | (0.0330) | |
| Ln(Zillow estimate one month ago) | | | | -0.00261 |
| | | | | (0.0138) |
| House fixed effects | Yes | Yes | Yes | Yes |
| Observations | 2,257 | 1,620 | 2,256 | 1,619 |
| R-squared | 0.023 | 0.042 | 0.016 | 0.023 |

Notes: Robust standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1.


Table 6. Impact of Zillow estimates on the change in sales prices from list price one month prior to sales

Variable                                      (1)         (2)         (3)         (4)
Dependent variable: Sales price - List price one month prior to sales

Panel A. Full sample
Zillow estimate 1 month prior to sales
  - Zillow estimate 2 months prior to sales   0.134       0.112       0.0933      0.0727
                                              (0.105)     (0.105)     (0.0970)    (0.0962)
Days listed                                   -28.26***   -27.40***   -25.50***   -24.53***
                                              (6.167)     (6.425)     (6.038)     (6.269)
Change in MSA level Zillow estimate                                   0.774***    0.776***
                                                                      (0.184)     (0.184)
Month fixed effect                                        Y                       Y
Year fixed effect                                         Y                       Y
Observations                                  801         801         801         801
R-squared                                     0.047       0.058       0.073       0.084

Panel B. Change in Zillow estimate < $100,000
Zillow estimate 1 month prior to sales
  - Zillow estimate 2 months prior to sales   0.284**     0.264**     0.217*      0.196*
                                              (0.117)     (0.116)     (0.111)     (0.110)
Days listed                                   -28.58***   -28.06***   -26.28***   -25.66***
                                              (6.270)     (6.503)     (6.217)     (6.454)
Change in MSA level Zillow estimate                                   0.632***    0.619***
                                                                      (0.177)     (0.177)
Month fixed effect                                        Y                       Y
Year fixed effect                                         Y                       Y
Observations                                  769         769         769         769
R-squared                                     0.056       0.067       0.074       0.083

Panel C. Change in Zillow estimate < 10 percent
Zillow estimate 1 month prior to sales
  - Zillow estimate 2 months prior to sales   0.456*      0.432*      0.380*      0.356
                                              (0.255)     (0.255)     (0.227)     (0.230)
Days listed                                   -29.76***   -28.91***   -26.41***   -25.62***
                                              (6.981)     (7.086)     (6.817)     (6.945)
Change in MSA level Zillow estimate                                   0.825***    0.803***
                                                                      (0.198)     (0.198)
Month fixed effect                                        Y                       Y
Year fixed effect                                         Y                       Y
Observations                                  625         625         625         625
R-squared                                     0.070       0.080       0.102       0.110

Notes: In Panels B and C, the sample is restricted based on the change in Zillow estimates between the month of sales and 3 months prior to sales. Robust standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
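The sample restrictions described in the notes can be sketched as simple filters. This is an illustration under stated assumptions, not the authors' procedure: the record layout and field names are hypothetical, the change is taken in absolute value, and the percentage change is measured relative to the estimate 3 months prior to the sale.

```python
# Hypothetical record layout; field names are illustrative, not from the paper.
def estimate_change(rec):
    """Change in the Zillow estimate from 3 months before the sale to the sale month."""
    return rec["zillow_sale_month"] - rec["zillow_3m_prior"]

def restrict_by_dollars(records, cap=100_000):
    """Panel B-style restriction: keep sales whose |change| is below a dollar cap.
    Using the absolute value is an assumption."""
    return [r for r in records if abs(estimate_change(r)) < cap]

def restrict_by_percent(records, cap=0.10):
    """Panel C-style restriction: keep sales whose |change| is below a share of
    the earlier estimate. The choice of denominator is an assumption."""
    return [r for r in records
            if abs(estimate_change(r)) / r["zillow_3m_prior"] < cap]

# Made-up records for illustration only.
sales = [
    {"id": "a", "zillow_3m_prior": 300_000, "zillow_sale_month": 310_000},
    {"id": "b", "zillow_3m_prior": 900_000, "zillow_sale_month": 1_050_000},
    {"id": "c", "zillow_3m_prior": 200_000, "zillow_sale_month": 230_000},
]
print([r["id"] for r in restrict_by_dollars(sales)])
print([r["id"] for r in restrict_by_percent(sales)])
```

Both filters drop sales whose Zillow estimate moved sharply before the sale, so the regressions in Panels B and C are not driven by a few large revisions.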


Figure 1. Scatterplot of the elasticity estimate against the internet penetration rate across MSAs

Notes: The elasticity estimates measure the impact of online price estimates from one month prior on actual sales prices for each MSA and are taken from Table 3.


Appendix Table 1. The list of Metropolitan Statistical Areas and the summary statistics of variables used in the MSA level VAR analysis

MSA                     Median Sales Price                        Median Zillow Estimates
                        October 2008  October 2010  October 2012  October 2008  October 2010  October 2012
Atlanta                 194240        148650        154635        155200        131700        112600
Baltimore               270735        274823        259375        255700        230000        220800
Boston                  321025        331100        322985        323300        313500        315000
Charlotte               176225        180775        176900        150600        137200        136000
Chicago                 238100        210125        195925        221000        183900        161600
Cincinnati              136300        127400        123600        147700        143900        141460
Columbus                154200        148200        157175        137200        128900        126700
Denver                  239645        240925        246325        213600        208200        224400
Las Vegas               226500        138105        132500        199100        127700        122500
Los Angeles             458752        405300        400000        435300        415200        405600
Miami-Fort Lauderdale   240200        157523        161050        196400        143300        149800
Minneapolis             220000        195965        195750        207500        180300        173400
Nashville               160885        171750        171725        155100        148500        148400
New York                399750        386652        366450        396400        364500        343100
Orlando                 200500        121250        130900        175500        127400        123700
Philadelphia            224748        223575        213550        216300        199900        186500
Phoenix                 215933        149125        165250        189300        133700        154100
Pittsburgh              126800        121800        131425        102800        105900        111100
Portland                258036        239100        233750        259500        226300        226800
Providence              242250        231225        207965        237600        227200        211000
Riverside               286500        198900        202500        236300        193200        192000
Sacramento              289950        228050        221750        267700        226400        217900
St. Louis               163075        150825        154057        141200        135400        127100
San Diego               401500        355750        358900        373900        364900        362800
San Francisco           572925        483250        480800        536600        499000        512400
San Jose                606100        556425        564000        605700        561800        610400
Seattle                 330875        309225        296612        330400        278900        267300
Tampa                   168475        130196        124165        147000        117700        111700
Virginia Beach          220441        222750        215125        223100        210900        195700
Washington DC           348750        358245        339717        329500        307700        320200

Notes: The median sales price and Zillow estimates are for three bedroom single family houses.


Appendix Table 2. MSA level VAR results

MSA                     (1) Minimum AIC   (2) Minimum BIC   (3) LASSO (0.5)   (4) LASSO (1.0)   (5) LASSO (2.0)
                        p (df)  p-val     p (df)  p-val     df      p-val     df      p-val     df      p-val
Atlanta                 10      0.000     9       0.000     10      0.000     8       0.000     10      0.000
Baltimore               10      0.000     10      0.000     10      0.000     9       0.000     10      0.006
Boston                  10      0.159     10      0.159     10      0.153     10      0.186     10      0.159
Charlotte               10      0.010     9       0.017     10      0.010     10      0.009     6       0.367
Chicago                 9       0.000     9       0.000     9       0.000     9       0.000     9       0.000
Cincinnati              10      0.001     10      0.001     10      0.000     9       0.000     9       0.000
Columbus                10      0.000     5       0.000     9       0.000     2       0.276     7       0.000
Denver                  10      0.084     7       0.602     10      0.142     10      0.084     9       0.077
Las Vegas               10      0.003     10      0.003     7       0.000     4       0.000     7       0.000
Los Angeles             9       0.000     9       0.000     10      0.000     8       0.000     9       0.000
Miami-Fort Lauderdale   10      0.000     10      0.000     10      0.000     10      0.000     10      0.000
Minneapolis             7       0.000     5       0.000     10      0.005     10      0.003     9       0.003
Nashville               10      0.061     10      0.061     8       0.114     9       0.029     6       0.135
New York                10      0.044     6       0.000     9       0.007     10      0.021     10      0.000
Orlando                 10      0.000     10      0.000     10      0.000     10      0.000     10      0.000
Philadelphia            10      0.003     4       0.239     10      0.003     9       0.010     9       0.001
Phoenix                 10      0.000     9       0.000     10      0.000     9       0.000     10      0.000
Pittsburgh              10      0.000     10      0.000     9       0.000     10      0.002     10      0.000
Portland                10      0.065     10      0.065     9       0.039     4       0.000     2       0.232
Providence              10      0.000     10      0.000     10      0.000     9       0.000     10      0.000
Riverside               10      0.000     10      0.000     8       0.000     10      0.000     8       0.000
Sacramento              10      0.000     10      0.000     9       0.000     10      0.000     8       0.000
St. Louis               10      0.014     9       0.011     10      0.025     8       0.007     6       0.002
San Diego               9       0.000     6       0.000     10      0.000     10      0.000     9       0.000
San Francisco           10      0.000     10      0.000     10      0.042     10      0.005     9       0.000
San Jose                10      0.000     10      0.000     9       0.000     8       0.000     4       0.000
Seattle                 10      0.000     10      0.000     9       0.000     8       0.000     9       0.000
Tampa                   10      0.000     8       0.000     10      0.000     10      0.000     10      0.000
Virginia Beach          10      0.034     4       0.054     9       0.021     4       0.566     3       0.415
Washington DC           10      0.000     10      0.000     10      0.000     10      0.000     10      0.000
United States           10      0.009     10      0.009     10      0.006     5       0.000     6       0.015

Notes: The analysis was performed on monthly data over the period between Oct. 2008 and Apr. 2013.

