How Sensitive are Sales Prices to Online Price Estimates in the Housing Market?
Yong Suk Lee (a,*) and Yuya Sasaki (b)
(a) Freeman Spogli Institute for International Studies, Stanford University, USA
(b) Department of Economics, Johns Hopkins University, USA
June 24, 2016
Abstract
This paper examines the impact of online price estimates on transaction prices in the housing market. We develop an estimation model that uses the difference between listing prices and online price estimates to proxy for house specific unobservables, and first differences observations within neighborhood to account for correlated neighborhood specific unobservables. Using house price estimates and sales prices collected from Zillow.com, we find that the elasticity of sales price with respect to the Zillow estimate is close to one, controlling for the aforementioned unobservables as well as observed house attributes. The accessibility of the internet at home strongly and positively predicts the elasticity estimates across metropolitan areas. Furthermore, the change in Zillow estimates impacts how sales prices adjust from the list prices a month before sales. Our results imply that online price information affects house prices and that online price estimates can potentially have a direct impact on house price dynamics.
Keywords: real estate pricing, online price estimates, hedonic valuation, neighborhood panel data, proxies
JEL Codes: D82, R21, R31, R32
*Corresponding author at Stanford University, Encina Hall E309, 616 Serra St. Stanford, CA 94305. E-mail address: [email protected]
1 Introduction
As with other types of assets, the price of real estate is determined by the observed and unobserved
attributes of the asset. Houses, especially single family houses, exhibit unobserved heterogeneity
across various dimensions. Same sized bedrooms can be valued differently depending on the
location of the window. The topography of same sized lots can affect the value of the property.
Neighborhood amenities, like nearby schools and parks, are important determinants of property
prices. While no two houses in general are alike, houses have been priced based on what
appraisers or brokers refer to as comparables, observationally similar houses that were recently
sold in the same or nearby neighborhood. Pricing adjustments are made to reflect the differences
between the house of interest and the comparable houses. In other words, the price of a house
takes into account the price information of other houses. With the advancement of the internet,
one can easily search sales price information for a large number of properties. Furthermore,
there are online services that provide their own property estimates for free based on the property
and neighborhood attributes, as well as the sales prices of comparable properties. Does the
availability of such price information affect actual sales prices? How large is this
impact?
In order to estimate the impact of online house price information on transaction prices, we
develop a reduced-form pricing equation as the convex combination of the online price estimate
and the hedonic valuation of a property. The main challenge for estimation is to control
for the unobserved house and neighborhood attributes in the model. We present a method
that nonparametrically proxies for unobserved house specific attributes by using the difference
between the listing price and the online price estimate. As Baum-Snow and Ferreira (2015)
highlight, quasi-experimental research designs (Chay and Greenstone 2005, Ferreira 2010), and
in particular, boundary discontinuities (Black 1999, Bayer et al. 2005) have been used to control
for unobservable area specific attributes. On the other hand, Bajari et al. (2012) propose a
method that relies less on the research design and more on the structural assumption that prior
house sales prices can be used to control for time-varying unobservable attributes in a hedonic
regression. Better data, such as repeat-sales house transaction data, have also enabled researchers
to deal with unobservable house and neighborhood attributes (Bayer et al. 2013). This paper
develops a novel estimation strategy that incorporates a structural assumption and a data collection
strategy into a hedonic model. In particular, we develop a hedonic model that controls for neighborhood
unobservables by first differencing observations within neighborhoods, while relying on the weak
structural assumption that prior list prices contain unobserved house specific information and
the data collection strategy of having at least two properties per neighborhood.
We collect home value estimates, list prices, sales prices, and house and neighborhood
attributes from Zillow.com, an online real estate information provider, for 1,200 houses across
30 Metropolitan Statistical Areas (MSAs) in the US. We find that the elasticity of house sales
prices with respect to the Zillow price estimates is large and quite close to one. The results
are robust regardless of how we calculate the proxy variable to control for unobserved house
attributes.
Additionally, we explore possible factors that might explain the variation in the elasticity
estimates across the 30 MSAs. We find that the internet penetration rate, i.e., the average
accessibility of the internet at home, strongly and positively predicts the elasticity estimate
across metropolitan areas. In other words, sales prices are more responsive to Zillow estimates
in MSAs with better internet access at home. This effect is robust to the income level, education
level, housing demand and supply, and unemployment rate of the MSA. Lastly, we find that the
change in Zillow estimates impacts how sales prices adjust from the list prices a month before
sales. This finding further corroborates our hypothesis that the information provided through
online price estimates directly impacts sales prices.
Our results are related to several strands of the literature. Ferreira and Gyourko (2011)
examine the causes of the most recent housing boom in the U.S. Though understanding the
causes of the housing boom is not the focus of our paper, our findings imply that online price
estimates could have influenced house price dynamics. Researchers have found that information
affects house transaction prices in various contexts. Levitt and Syverson (2008) show
that informational advantage translates to higher sales prices by examining properties owned
by real estate brokers. They find that realtors sell their own houses at about 4 percent higher
prices. Foreclosures can affect the price of non-foreclosure houses by conveying new information
about unobserved neighborhood attributes, or more directly by being included as comparables.
Campbell et al. (2011) find that foreclosed homes lower prices of nearby houses by about 1
percent.1
The paper is organized as follows. Section 2 presents our econometric model and its estimation
strategy. Section 3 explains the data, in particular, the house level data that we collect
from Zillow.com. In Section 4, we present our elasticity estimates and examine the underlying
mechanisms. Section 5 concludes and discusses the implications.
2 The Econometric Model and Estimation Strategy
2.1 The Extended Hedonic Model
We propose an econometric method that estimates the impact of house price information on
sales prices. Specifically, we extend the traditional hedonic framework to one that incorporates
the potential effects of house price information, in particular, the house level price estimates
provided by Zillow.2
The following is a list of economic factors that may potentially affect transaction prices for
house i in a neighborhood Ni:
• Xi: A vector of house-specific amenities including: lot size, square footage, number of bedrooms, number of bathrooms, and year built.
• Ui: The value of unobserved house-specific amenities including: floor plans and appliances.
• VNi: The value of unobserved neighborhood-specific amenities including: public schools, crime, curb appeal, environmental quality, and other public services.
• Zi: Home price information, i.e., Zillow’s price estimates, that housing market participants can observe.
1 Also related is the literature that examines how information, or the lack of information, impacts equity prices. Easley and O’Hara (1987) show that large trades in the securities market reflect better information and impact security prices, and that investors demand higher returns on stocks for which there is less public information. Real life examples of markets for information, like car reports for used cars or online reviews for restaurants, more directly speak to the value people put on information.
2 As a preliminary step, we first examined the hypothesis that online property price estimates impact actual sales prices at the aggregate level. If house price information directly impacts house prices, we expect the relation to hold at an aggregate level as well. Specifically, we test whether Zillow’s median price estimates Granger cause the median sales price as reported by Zillow across 30 MSAs in the US. Appendix A presents the method and results.
The standard hedonic pricing models forecast the transaction price as follows:
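$$Y_i = \underbrace{\alpha + X_i\beta}_{\text{Regression}} + \underbrace{\overbrace{V_{N_i}}^{\text{Residual 1}} + \overbrace{U_i}^{\text{Residual 2}} + \overbrace{\varepsilon_i}^{\text{Residual 3}}}_{\text{Residual}}. \tag{2.1}$$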
For the purpose of elucidating the problem that we face in our study, we decompose the usual
residual into three components, the first one reflecting the value $V_{N_i}$ of neighborhood-specific
amenities, the second one reflecting the value $U_i$ of house-specific amenities, and the third
one representing idiosyncratic errors $\varepsilon_i$. The standard hedonic pricing model (2.1) assumes
that sellers and/or buyers take the vector of house-specific amenities $(X_i, U_i)$ and the value
of neighborhood-specific amenities $V_{N_i}$ into account when making decisions about transaction
prices $Y_i$ in the equilibrium. Econometricians estimate the reduced-form coefficients $\beta$, called
contributory values, for $X_i$, the house-specific amenities that are observable in the data.
We hypothesize that agents may also take into account the home price information Zi,
the one that is produced by real estate information providers like Zillow, when proposing to
set transaction prices. This hypothesis may reflect that both buyers and sellers may not be
so confident of their own home evaluation based on the information of the house and the
neighborhood, and therefore tend to use the measure Zi provided by third parties. In this
light, we propose an extended reduced-form equilibrium pricing model simply as the convex
combination of outside and self valuations:
$$Y_i = \gamma Z_i + (1-\gamma)\left[\alpha + X_i\beta + V_{N_i} + U_i + \varepsilon_i\right]. \tag{2.2}$$
The expression in the square brackets in the second term, $\alpha + X_i\beta + V_{N_i} + U_i + \varepsilon_i$, constitutes
those factors used in the traditional hedonic pricing models (2.1). Further, we add the first
term $\gamma Z_i$ to reflect the potential effects of the home price information $Z_i$ on transaction prices $Y_i$.
As such, the parameter $\gamma$ may be interpreted as the degree to which agents rely on the third-party
information. Our null hypothesis that the home price estimates $Z_i$ do not impact the actual
transaction prices is thus represented by the equality $\gamma = 0$, which is readily testable once a $\sqrt{N}$-consistent estimate of $\gamma$ is obtained.
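To make the role of the weight in (2.2) concrete, the following minimal Python sketch evaluates the convex combination for a single house. The function name `extended_price` and all numbers are hypothetical and chosen only for illustration; they are not taken from the data.

```python
def extended_price(z, hedonic, gamma):
    """Convex combination of the online estimate z and the hedonic valuation.

    `hedonic` stands for the bracketed term in (2.2):
    alpha + X*beta + V + U + eps.
    """
    return gamma * z + (1.0 - gamma) * hedonic


# Hypothetical house: online estimate of 310 and hedonic valuation of 290
# (both in thousands of dollars).
z, hedonic = 310.0, 290.0

price_no_reliance = extended_price(z, hedonic, gamma=0.0)    # pure hedonic pricing
price_full_reliance = extended_price(z, hedonic, gamma=1.0)  # price equals the online estimate
price_half = extended_price(z, hedonic, gamma=0.5)           # equal weights on both valuations
```

A weight of zero reproduces the standard hedonic price, which is the null hypothesis stated above, while a weight of one means the transaction price simply tracks the online estimate.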
The OLS estimators of the parameters $\alpha$, $\beta$, and $\gamma$ would be consistent if $(V_{N_i}, U_i)$ were
mean independent of both $Z_i$ and $X_i$. However, this independence assumption is
hard to justify for at least two reasons. First, the unobserved house-specific amenities $U_i$ are
likely to be correlated with the observed house-specific amenities $X_i$. Second, and more importantly
in our study, the introduction of $Z_i$ in the extended pricing model (2.2) creates another source
of endogeneity. To see this, it may help to think of how the home price information $Z_i$ is generated
by real estate information providers. Although these service agencies do not disclose their
formulas, the estimates $Z_i$ are constructed using recent transaction data in the neighborhood
$N_i$ of house $i$. (See Section A.2 in the appendix for the case of Zillow.) As such, the statistical
independence $Z_i \perp\!\!\!\perp V_{N_i}$ between the price estimate and unobserved neighborhood characteristics,
or the corresponding mean independence, will probably not hold even if we control for the
observed house specific amenities $X_i$. We therefore propose a couple of approaches to handle
these two sources of endogeneity in the subsequent sections.
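The second source of endogeneity can be illustrated with a small simulation. The sketch below is only a stylized example under assumed values: the online estimate is generated as the unobserved neighborhood value plus independent noise, the true weight is set to 0.5, and the observed amenities are omitted. None of these choices comes from the data.

```python
import random

random.seed(0)

GAMMA = 0.5   # assumed true weight on the online estimate
N = 5000

ys, zs = [], []
for _ in range(N):
    v = random.gauss(0.0, 1.0)        # unobserved neighborhood value V
    z = v + random.gauss(0.0, 1.0)    # online estimate built partly from V
    eps = random.gauss(0.0, 0.1)      # idiosyncratic error
    ys.append(GAMMA * z + (1.0 - GAMMA) * (v + eps))
    zs.append(z)


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


naive_gamma = ols_slope(zs, ys)
# In this design the probability limit of the naive slope is
# GAMMA + (1 - GAMMA) * Cov(Z, V) / Var(Z) = 0.5 + 0.5 * 0.5 = 0.75.
```

In this stylized design the naive regression overstates the weight because the online estimate also carries the omitted neighborhood value.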
2.2 Proxy Variable
To control for the endogenous unobserved house-specific amenities Ui, we follow the proxy
variable approach. Specifically, we construct a proxy variable using listing prices, denoted by
Li. The seller can perceive house-specific amenities Ui that econometricians cannot observe.
Note that the hedonic valuations Hi and the listing prices Li are both public information even
without house visits, while the true amenities Ui can be observed by the prospective buyers
only in house visits. In order to send a correct signal about the amenities Ui to prospective
buyers, sellers may add their values to benchmark hedonic valuations Hi when proposing their
listing prices, i.e.,
$$L_i = H_i + g(U_i).$$
List prices may differ from the online hedonic estimates Hi for various reasons. List prices
tend to start high since the seller predicts that the negotiation process will ultimately result
in a lower sales price. How quickly the seller needs to sell the property could also impact the
list price. The function g thus captures the seller’s adjustment of the self-valuation of Ui. Note
that the identity function g(u) = u implies that there is no markup or markdown in the listing
prices above the observed and unobserved value of the house.3
Finally, to take this structure into estimation of the parameters, we assume that $g$ is strictly
increasing so that its inverse $g^{-1}$ exists. With this inverse function, we can recover the unobserved
house-specific amenities $U_i$ by
$$U_i = g^{-1}(L_i - H_i).$$
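Substituting this expression in (2.2) yields
$$\begin{aligned}
Y_i &= \gamma Z_i + (1-\gamma)\left[\alpha + X_i\beta + V_{N_i} + g^{-1}(L_i - H_i) + \varepsilon_i\right] \\
&= \gamma Z_i + \tilde{\alpha} + X_i\tilde{\beta} + \tilde{\delta} V_{N_i} + \tilde{g}(L_i - H_i) + \tilde{\varepsilon}_i,
\end{aligned} \tag{2.3}$$
where $\tilde{\alpha} := (1-\gamma)\alpha$, $\tilde{\beta} := (1-\gamma)\beta$, $\tilde{\delta} := (1-\gamma)$, $\tilde{g} := (1-\gamma)g^{-1}(\cdot)$, and $\tilde{\varepsilon}_i := (1-\gamma)\varepsilon_i$
serve as shorthand notation. This operation removes one of the two sources of endogeneity, namely
$U_i$, and it thus remains to handle the other unobserved variable $V_{N_i}$. For estimation of the
parameters with the additive nonparametric function $\tilde{g}$, provided $V_{N_i}$ is known, we can use
the method of Robinson (1988).
3 We find that initial list prices are higher than the prior hedonic estimates in about 68 percent and lower in about 32 percent of the observations in our sample.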
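This method works as follows. If the mean independence $E[\tilde{\varepsilon}_i \mid \tilde{U}_i] = 0$ is true, then
$$E[Y_i \mid \tilde{U}_i] = \gamma E[Z_i \mid \tilde{U}_i] + \tilde{\alpha} + E[X_i \mid \tilde{U}_i]\tilde{\beta} + \tilde{\delta} E[V_{N_i} \mid \tilde{U}_i] + \tilde{g}(\tilde{U}_i)$$
follows, where $\tilde{U}_i := L_i - H_i$ is shorthand notation. Subtracting this expression from (2.3), we obtain
$$Y_i - E[Y_i \mid \tilde{U}_i] = \gamma\left(Z_i - E[Z_i \mid \tilde{U}_i]\right) + \left(X_i - E[X_i \mid \tilde{U}_i]\right)\tilde{\beta} + \tilde{\delta}\left(V_{N_i} - E[V_{N_i} \mid \tilde{U}_i]\right) + \tilde{\varepsilon}_i.$$
If the contributory value $V_{N_i}$ of neighborhood $N_i$ were observed, then $\gamma$ could be $\sqrt{N}$-consistently
estimated by the OLS of $Y_i - E[Y_i \mid L_i - H_i]$ on $Z_i - E[Z_i \mid L_i - H_i]$, $X_i - E[X_i \mid L_i - H_i]$, and
$V_{N_i} - E[V_{N_i} \mid L_i - H_i]$, where the nonparametric regressions $E[Y_i \mid L_i - H_i]$, $E[Z_i \mid L_i - H_i]$,
$E[X_i \mid L_i - H_i]$, and $E[V_{N_i} \mid L_i - H_i]$ are pre-estimated using the kernel method.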
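The two steps of this procedure, kernel pre-estimation followed by OLS on the residuals, can be sketched in a few lines of Python. This is a simplified illustration rather than the implementation used for our results: it keeps only the online estimate as a regressor (no observed amenities and no neighborhood term), uses a Gaussian kernel with an arbitrary fixed bandwidth, omits the trimming used in Robinson (1988), and runs on simulated data. The names `nw_fit` and `robinson_gamma` are hypothetical.

```python
import math
import random


def nw_fit(u, w, h):
    """Nadaraya-Watson estimate of E[w | u] at each sample point (Gaussian kernel)."""
    fitted = []
    for ui in u:
        weights = [math.exp(-0.5 * ((ui - uj) / h) ** 2) for uj in u]
        fitted.append(sum(k * wj for k, wj in zip(weights, w)) / sum(weights))
    return fitted


def robinson_gamma(y, z, u, h):
    """Residual-on-residual OLS slope after partialling out the proxy u."""
    ry = [a - b for a, b in zip(y, nw_fit(u, y, h))]
    rz = [a - b for a, b in zip(z, nw_fit(u, z, h))]
    return sum(a * b for a, b in zip(rz, ry)) / sum(a * a for a in rz)


random.seed(1)
GAMMA = 0.9   # assumed true coefficient on the online estimate
u = [random.uniform(-1.0, 1.0) for _ in range(400)]   # proxy, playing the role of L_i - H_i
z = [ui + random.gauss(0.0, 1.0) for ui in u]         # online estimate, correlated with the proxy
# The nonlinear term sin(2u) plays the role of the unknown function of the proxy.
y = [GAMMA * zi + math.sin(2.0 * ui) + random.gauss(0.0, 0.1)
     for zi, ui in zip(z, u)]

gamma_hat = robinson_gamma(y, z, u, h=0.1)
```

The estimator never needs the functional form of the nonlinear term: it is removed from both sides by subtracting the conditional means given the proxy.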
2.3 Local First Differencing
The previous section introduced a way to control for house-specific unobservables $U_i$, provided
that the contributory value $V_{N_i}$ of neighborhood $N_i$ is observed. If we have multiple observations
per neighborhood, however, we do not need to observe $V_{N_i}$ since we can take first
differences within a neighborhood to cancel out the $V_{N_i}$ terms. Note that $N_i = N_j$ clearly
implies $V_{N_i} = V_{N_j}$. Hence, we can take the difference of (2.3) between two properties, $i$ and $j$,
to obtain the equation
$$Y_i - Y_j = \gamma(Z_i - Z_j) + (X_i - X_j)\tilde{\beta} + \tilde{g}(L_i - H_i) - \tilde{g}(L_j - H_j) + \tilde{\varepsilon}_i - \tilde{\varepsilon}_j \tag{2.4}$$
for any pair $(i, j)$ such that $N_i = N_j$, i.e., within the same neighborhood.
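The construction of within-neighborhood pairs can be sketched as follows. The records and the function name `within_neighborhood_differences` are hypothetical, and the sketch forms every unordered pair within a neighborhood, which is one possible pairing scheme and not necessarily the one used for our estimates.

```python
from itertools import combinations

# Hypothetical observations: (neighborhood id, sale price, online estimate).
houses = [
    ("A", 300.0, 295.0),
    ("A", 320.0, 318.0),
    ("A", 310.0, 305.0),
    ("B", 500.0, 490.0),
    ("B", 520.0, 515.0),
]


def within_neighborhood_differences(rows):
    """Return (dY, dZ) for every pair (i, j) of houses sharing a neighborhood."""
    by_hood = {}
    for hood, y, z in rows:
        by_hood.setdefault(hood, []).append((y, z))
    diffs = []
    for members in by_hood.values():
        for (yi, zi), (yj, zj) in combinations(members, 2):
            diffs.append((yi - yj, zi - zj))
    return diffs


pairs = within_neighborhood_differences(houses)

# Adding a common amount to every price in neighborhood "B" mimics a
# neighborhood-specific value; the differenced data are unchanged.
shifted = [(h, y + 1000.0 if h == "B" else y, z) for h, y, z in houses]
pairs_shifted = within_neighborhood_differences(shifted)
```

Because any component common to a neighborhood drops out of each difference, the neighborhood value never has to be observed.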
This operation, mechanically identical to the method of first differencing for panel data
analysis, removes the neighborhood fixed effect $V_{N_i}$. For this sort of first-differenced partially
linear equations, Li and Stengos (1996) extend the Robinson method (see the previous section).
Specifically, $\gamma$ may be $\sqrt{N}$-consistently estimated by the OLS of $Y_i - Y_j - E[Y_i - Y_j \mid L_i - H_i, L_j - H_j]$
on $Z_i - Z_j - E[Z_i - Z_j \mid L_i - H_i, L_j - H_j]$ and $X_i - X_j - E[X_i - X_j \mid L_i - H_i, L_j - H_j]$, where
the nonparametric regressions $E[Y_i - Y_j \mid L_i - H_i, L_j - H_j]$, $E[Z_i - Z_j \mid L_i - H_i, L_j - H_j]$, and
$E[X_i - X_j \mid L_i - H_i, L_j - H_j]$ are pre-estimated using the kernel method. Baltagi and Li (2002a,
2002b) propose alternative methods to estimate first-differenced partially linear models with
discussions on asymptotic properties of the estimators. They suggest that the nonparametric
pre-estimations be done with series approximation instead of the kernel method in order to
take advantage of the additivity between $\tilde{g}(L_i - H_i)$ and $\tilde{g}(L_j - H_j)$. See Section A.3 in the
appendix for further details on how we use this semiparametric approach to estimate $\gamma$ and $\tilde{\beta}$.
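The series idea can be illustrated with a short simulation in which the unknown function of the proxy is approximated by a quadratic polynomial, so that the differenced nonparametric part becomes a linear combination of differenced power terms. This is a sketch under strong assumptions, not the procedure of Section A.3: the data are simulated, the true function is itself quadratic, the observed amenities are omitted, and the helper names `solve` and `ols` are hypothetical.

```python
import random
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS without intercept via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


random.seed(2)
GAMMA = 0.9   # assumed true coefficient on the online estimate
data = []     # (neighborhood, Y, Z, U), with U playing the role of L - H
for hood in range(50):
    v = random.gauss(0.0, 1.0)   # neighborhood effect, differenced out below
    for _ in range(4):
        u = random.uniform(-1.0, 1.0)
        z = v + u + random.gauss(0.0, 1.0)
        y = GAMMA * z + v + 0.5 * u + 0.3 * u ** 2 + random.gauss(0.0, 0.05)
        data.append((hood, y, z, u))

rows, dy = [], []
for hood in range(50):
    members = [d for d in data if d[0] == hood]
    for (_, yi, zi, ui), (_, yj, zj, uj) in combinations(members, 2):
        # Regressors: dZ and the differenced series terms u_i^k - u_j^k for k = 1, 2.
        rows.append([zi - zj, ui - uj, ui ** 2 - uj ** 2])
        dy.append(yi - yj)

gamma_hat = ols(rows, dy)[0]
```

Additivity is what makes the series approach convenient here: each basis function enters only through its difference across the two houses, so no bivariate smoothing over both proxies is required.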
3 Data
We collect house level data from Zillow, one of the major online real estate information
providers. Zillow provides individual house price estimates, which are available regardless
of whether the property is on the market or not.4 Zillow does not disclose the formula that
it uses to generate its price estimates, but the website mentions that it uses the physical
attributes of the property, tax assessments, and prior and current transactions of the property
itself and the comparable properties nearby. In addition to its current house price estimates,
Zillow provides its past estimates, current and past listing prices when available, and the most
recent sales price, and past sales prices when available. We collect the sales date and price,
4 Currently there are many real estate websites. Many of these websites are brokerage websites where listing and selling of properties on the market is the main business model. These websites belong to the local multiple listing services (MLS), which are local associations where real estate agents share their property listings. Other websites, for example Zillow and Trulia, are not real estate brokerage firms but mainly serve as information providers to various parties interested in the real estate market. Their business model aims to gain a wide audience and profit from advertisement fees, not through brokerage fees.
Zillow estimate at the time of sales, the estimates one, two, three, and six months before sales,
the initial listing price, and historical sales and listing prices when available. In addition, a rich
set of house specific and neighborhood specific information is available for Xi. We collect the
address of the house, square footage, number of bedrooms, number of bathrooms, lot size, year
built, and property tax. Zillow also provides nearby school names and the school ratings from
GreatSchools.org.
In constructing the sample, we collect data on multiple houses for each neighborhood to
enable first differencing within neighborhoods. The following procedure is employed to collect
our sample of house level data. We first choose 30 MSAs where Zillow provides both its
house price estimates and sales price information.5 For each MSA we find 10 neighborhoods
with median Zillow estimates closest to the MSA median Zillow estimate and collect data on four
houses per neighborhood. If there are fewer than 10 neighborhoods in an MSA we additionally
collect four more houses from existing neighborhoods, starting with neighborhoods that have
median Zillow estimates closest to the MSA median Zillow estimate. Within each neighborhood,
we restrict the search to single family houses that are 2000 sqft or above, have 3 bedrooms or
more, 2 bathrooms or more, and were last sold between July 2012 and July 2013. For each
neighborhood, we narrow down to houses that have Zillow estimates that are closest to its
ZIP code median Zillow estimate and that list the same set of nearby public elementary schools.
We then randomly select the first four houses that have non-missing information on the Zillow
estimates at time of sales, sales price, initial listing price, number of beds and baths, house
square footage, lot size, and year built. This procedure returns 40 houses in each of 30 MSAs for a
total of 1,200 observations. Table 1 Panel A summarizes the characteristics of these houses.
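The search restrictions and the ranking by closeness to the median estimate can be expressed as a simple filter. The sketch below is illustrative only: the records and field names are hypothetical and do not reflect Zillow's data format, the median of the listed records stands in for the ZIP code median, and the school-matching and missing-data checks are left out.

```python
from statistics import median

# Hypothetical listing records; field names are illustrative, not Zillow's schema.
listings = [
    {"id": 1, "sqft": 2400, "beds": 4, "baths": 2.5, "sold": "2012-09", "zestimate": 410_000},
    {"id": 2, "sqft": 1800, "beds": 3, "baths": 2.0, "sold": "2013-01", "zestimate": 350_000},
    {"id": 3, "sqft": 2100, "beds": 3, "baths": 2.0, "sold": "2013-05", "zestimate": 395_000},
    {"id": 4, "sqft": 2600, "beds": 5, "baths": 3.0, "sold": "2011-11", "zestimate": 450_000},
    {"id": 5, "sqft": 2050, "beds": 3, "baths": 2.0, "sold": "2013-07", "zestimate": 402_000},
]


def eligible(house):
    """Search restrictions: size, bedrooms, bathrooms, and sale window."""
    return (
        house["sqft"] >= 2000
        and house["beds"] >= 3
        and house["baths"] >= 2
        # Year-month strings in ISO form compare chronologically.
        and "2012-07" <= house["sold"] <= "2013-07"
    )


candidates = [h for h in listings if eligible(h)]
target = median(h["zestimate"] for h in listings)   # stand-in for the ZIP code median estimate
# Rank eligible houses by the distance of their estimate from the target median.
ranked = sorted(candidates, key=lambda h: abs(h["zestimate"] - target))
selected_ids = [h["id"] for h in ranked[:4]]
```

In this toy example only three records pass the restrictions, so fewer than four houses are returned; in the actual sample each neighborhood contributes four houses.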
In order to explore possible factors that might explain the variation in the elasticity estimates
across the 30 MSAs, we collect MSA level data from the US Census. We collect information
on the internet penetration rate, i.e., the average accessibility of the internet at home, and the
5 We first refer to the MSA Zestimate accuracy file. The MSA file contains 30 MSAs. However, in a few MSAs sale prices were not reported and we replaced those MSAs with other MSAs where accuracy estimates and sales prices were available. The Zestimate accuracy file is accessible at http://www.zillow.com/howto/DataCoverageZestimateAccuracy.htm.
number of individuals employed as residential real estate agents. In addition, we compile data
on the population, land area, number of families, number of housing units, median household
income, educational attainment, and unemployment rate for each MSA. The variables are for
the year 2013, except for the number of real estate agents in the MSA, which is for 2012. Table
1 Panel B presents the summary statistics for the MSA level variables.
4 Results
4.1 The Impact of Real Estate Information on Sales Prices
We first implement our procedure on the full sample and later by MSAs. Table 2 presents the
full sample results. In columns (1) through (3) we do not first difference observations within
neighborhood, and hence we are not controlling for the unobserved neighborhood component.
Column (1) does not include the proxy for the unobserved house specific characteristics, column
(2) controls for the unobserved house specific characteristics by including a linear proxy, which
is the difference between the initial list price and the Zillow estimate at the month of listing,
and column (3) uses the nonlinear proxy in place of the linear proxy as described in Section
2.2.
The coefficient estimates on the log Zillow estimate are closely distributed around one in all three
specifications and are statistically significantly different from zero. In panel A of Table 2, we
additionally test whether the coefficient estimate is statistically different from one. We are
unable to reject the null of a unit elasticity in the first three columns.
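The two hypotheses discussed here differ only in the null value against which the estimate is compared. The following sketch computes two-sided p-values under a normal approximation; the estimate and standard error are hypothetical numbers chosen for illustration and are not taken from Table 2, and the function name `two_sided_p_value` is likewise an assumption.

```python
import math


def two_sided_p_value(estimate, std_error, null_value):
    """Two-sided p-value for H0: coefficient == null_value (normal approximation)."""
    z = (estimate - null_value) / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical estimate and standard error, not taken from Table 2.
p_zero = two_sided_p_value(0.94, 0.04, null_value=0.0)   # tests "no impact of the online estimate"
p_one = two_sided_p_value(0.94, 0.04, null_value=1.0)    # tests a unit elasticity
```

With these illustrative numbers the first null is rejected decisively, while the second is not rejected at conventional levels, which mirrors the pattern described for the first three columns.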
Rather than reporting coefficient estimates on each observable house specific variable, i.e.,
number of bedrooms, number of bathrooms, square footage, etc., we report the p-value from
the joint hypothesis test of whether all the coefficient estimates on the observable housing
characteristics are equal to zero in panel B. We reject the null in columns (1) through (3). In
column (2), we also report the coefficient estimate on the linear proxy. The coefficient estimate
on the proxy is 0.23 and is statistically significantly different from zero at the 10 percent level.
Columns (4) through (6) report results when we additionally perform the neighborhood first-differencing
procedure to control for unobserved neighborhood characteristics, as described in
Section 2.3. The coefficient estimates on the log Zillow estimate decrease slightly to around 0.88 to
0.95. As the p-values in panel A indicate, we reject the hypothesis that the elasticity estimate
is one in columns (4) and (5). Though we cannot reject the same hypothesis in column (6), the
p-value is relatively small at 0.138. Panel B indicates that we can reject the hypothesis that all
the coefficient estimates on the observable housing characteristics are equal to zero in columns
(4) and (5). However, we cannot reject the hypothesis that the coefficients on the observable covariates are
jointly equal to zero in column (6).
Maintaining the specification with the non-linear proxy and local first differencing as the
base, we additionally control for how long the house was listed on the market until it sold in
column (7). We add higher order covariates of the observables in column (8), and seasonal
trends in column (9). The elasticity estimate is tightly distributed between 0.941 and 0.948.
As the p-values in panel A indicate, we cannot reject the hypothesis that these estimates are
statistically different from one at the 15 percent level. The p-values in panel B indicate that we
are now unable to reject the joint hypothesis that all observable covariates are zero. Overall,
Table 2 indicates that the elasticity estimates of house sales prices to online price estimates
are large and close to one, even when observable and unobservable house and neighborhood
characteristics are controlled for. In other words, participants in the housing market seem to
rely on online price information more than on their own hedonic assessments of house or neighborhood
attributes. The multiple dimensions of house information, $X_i$, $U_i$, and $V_{N_i}$, are available for them
to look at, but the scalar online price index, $Z_i$, may be sufficient or more important to the
market participants when pricing property.
We hypothesize that information, now readily available through online websites, is directly
affecting property sales prices and present evidence consistent with this information channel
in the following sections. However, once one considers how property transactions are actually
made, the Table 2 results may seem less surprising. A hedonic framework is often used to
estimate the marginal valuation of one attribute in a composite good. It is an ex post revelation
of what people’s marginal willingness to pay for an attribute is. For instance, we can back out
the marginal value of a bedroom from a hedonic regression. However, the market participant,
be it the seller, buyer, or broker, would rarely use the estimates from a hedonic method to price
the composite good, the house. One of the most common methods to value property is to use
recent sales prices of comparable properties in the neighborhood and then to make marginal
adjustments based on the different attributes of the houses. Hence, nearby information is a
prime determinant in setting house prices. The emergence of Zillow and various online real
estate websites has made this information readily available to everyone in the market and not
just to comparables. Now, almost every potential homeowner has a comparable price for his or
her exact home.
4.2 What Explains the Variation in the Elasticity Estimates across MSAs?
We next examine how the elasticity estimate $\gamma$, i.e., the relative value of online information,
varies across MSAs. Table 3 presents the impact of one month prior Zillow estimates on sales
prices in each of the 30 MSAs using the full model in Table 2 column (9). For each MSA we
additionally conduct hypothesis tests on whether the elasticity estimate is one and whether all
covariates are jointly equal to zero. Even with 40 observations per MSA, many of the estimates
are statistically di↵erent from zero at the 10 percent level. The estimates vary considerably
across MSAs, e.g. ranging from -0.9 in Las Vegas to 2 in Boston. Many of the estimates are
statistically indistinguishable from one even at the MSA level.
What might explain the variation in the elasticity estimates in the real estate market and
hence the elasticity estimates in Table 3? We hypothesize that the availability of internet at
home would increase the demand for online house price information and the relative valuation
of online house price information in determining sales prices, i.e., the elasticity estimate. We
use the internet penetration rate, measured as one minus the share of households without
internet and computer access in the MSA, as our main proxy for accessibility to online house
price information. Figure 1 presents a scatter plot between the MSA level elasticity estimates
and the internet penetration rate. The scatterplot reveals a surprisingly strong and positive
relationship between the two variables. Despite the sample size of 30, the bivariate regression
in Table 4 column (1) confirms that this relationship is statistically significant at the 5 percent
level. A 1 percentage point increase in the internet penetration rate is associated with an increase
in the elasticity estimate of 0.2. In column (2) we include the number of residential real estate
agents in the MSA, but this variable has no significant impact and the coefficient estimate on
the internet penetration rate remains unchanged. In column (3), we control for the size of the
MSA by controlling for the land area and population. The coefficient estimate on the internet
penetration rate decreases slightly but is still significant. In column (4), we additionally control
for the number of families. The number of families conditional on the population captures the
potential demand for housing in the MSA. The coefficient estimate on the log number of families
is positive and the coefficient estimate on population is now negative. This likely suggests that
cities with more families per population demand more housing and put more value on online
price information. In column (5), we control for the median household income and the number
of adults with a college degree. There is a weak negative relationship with median income
but no significant relationship with the size of the college educated population. We note that
the coefficient estimate on the internet penetration rate increases to 32.57. Finally, we control for
housing supply and the economic condition in the city. The negative coefficient estimate on
the log number of housing units suggests that the supply of housing conditional on potential
demand reduces the elasticity estimate, potentially by reducing the relative demand for online
house price information. The coefficient estimate on the internet penetration rate barely changes
and is now statistically significant at the 1 percent level. The robust relationship between
the internet penetration rate and the elasticity estimate across MSAs in Table 4 supports the
hedonic price model we laid out in Section 2.
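The units of the penetration coefficient deserve a brief illustration. Assuming the penetration rate enters the regression as a share between zero and one, a slope of about 20 corresponds to a change of 0.2 in the elasticity per percentage point. The sketch below uses made-up MSA-level values constructed to lie exactly on such a line; they are not the estimates in Tables 3 and 4.

```python
def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical MSA-level data: internet penetration as a share in [0, 1]
# and the corresponding elasticity estimates.
penetration = [0.70, 0.74, 0.78, 0.82]
elasticity = [0.2, 1.0, 1.8, 2.6]

slope = ols_slope(penetration, elasticity)
effect_per_point = slope * 0.01   # change in elasticity per 1 percentage point
```

Under the same assumption about units, the coefficient of 32.57 reported above would correspond to roughly 0.33 per percentage point.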
4.3 How do online estimates impact list prices and sales prices around the time of sales?
We further explore the information channel by examining how Zillow estimates impact the
movement of list prices and sales prices for the same house around the time of sales. Using
our data we construct a panel data set of list prices and Zillow estimates. Each house has Zillow
estimates from 1, 2, and 3 months prior to sales. We were able to collect historical list prices
for only a subset of the initial sample. We perform a simple house level fixed effects regression
of list price on current or one month prior Zillow estimates. As the results in Table 5 indicate,
we find no impact of Zillow estimates on list prices in the months before sales. People do not
immediately adjust list prices based on the short term changes in Zillow estimates.
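A house level fixed effects regression with a single regressor amounts to demeaning both variables within each house and pooling the demeaned observations. The sketch below shows this within estimator on a hypothetical panel in which list prices stay flat while the online estimates move; the data and the function name `within_slope` are illustrative and are not drawn from Table 5.

```python
def within_slope(panel):
    """Fixed-effects (within) slope: demean y and x by house, then pool.

    `panel` maps a house id to a list of (list_price, zillow_estimate) pairs.
    """
    num = den = 0.0
    for obs in panel.values():
        my = sum(y for y, _ in obs) / len(obs)
        mx = sum(x for _, x in obs) / len(obs)
        num += sum((x - mx) * (y - my) for y, x in obs)
        den += sum((x - mx) ** 2 for _, x in obs)
    return num / den


# Hypothetical panel (thousands of dollars), three months per house.
panel = {
    "house_1": [(300.0, 290.0), (300.0, 294.0), (300.0, 298.0)],
    "house_2": [(450.0, 460.0), (450.0, 455.0), (450.0, 452.0)],
}
fe_slope = within_slope(panel)
```

Level differences between houses play no role in the within estimator, so a slope of zero in this example reflects only the absence of month-to-month co-movement within each house.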
But do Zillow estimates have an impact when sales are being negotiated? We examine
whether the short term change in Zillow estimates impacts how the final sales price adjusts from
the list price one month prior to sales. Table 6 presents first-differenced regression results where
the dependent variable is sales price minus list price one month prior to sales, and the main
regressor is the change in the Zillow estimate between two months and one month prior to sales.
We control for the number of days the house was listed on the market prior to sales. Panel
A of column (1) indicates that the Zillow estimates have no impact on how prices adjust in
the final month leading to sales. However, we are concerned with cases where Zillow estimates
bounce around drastically and may not be perceived as reliable. We restrict the sample and
drop observations where the Zillow estimates fluctuate drastically in Panels B and C. Panel B
uses the sample of houses where the change in Zillow estimates is less than $100,000. Panel C
uses the sample where the change is less than 10 percent of the initial Zillow estimate. The
coefficient estimate in column (1) is positive and statistically significant in Panel B and becomes
larger as we trim the sample further in Panel C.
A concern in column (1) is that changes in market conditions correlated with
the change in Zillow estimates may be driving the final price adjustments during the sales phase.
We additionally control for year and month fixed effects in column (2) and the change in MSA
level Zillow estimates in column (3). We exclude the own house observation when calculating
the MSA level average. Column (4) includes both the fixed effects and the change at the MSA
level. The results remain quite robust across the different specifications. The column (4) estimates
in Panels B and C indicate that a thousand dollar change in the Zillow estimate is associated with the
final sales price adjusting from the list price one month prior to sale by roughly 200 to 400 dollars.
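The regression behind Table 6 can be illustrated with a minimal sketch. It is not the authors' code: the function name, its arguments, and the synthetic data are hypothetical, the trimming is assumed to apply to the absolute one-month change in the Zillow estimate, and plain OLS is used without the fixed effects or robust standard errors. The data are generated without noise so that the known coefficients are recovered exactly.

```python
import numpy as np

def price_adjustment_regression(sale, list_1m, zest_1m, zest_2m, days,
                                max_abs_change=None, max_rel_change=None):
    """OLS of (sale - list price one month prior) on the change in the Zillow
    estimate and days listed, after optional trimming of large changes."""
    d_z = zest_1m - zest_2m                      # change from 2 months to 1 month prior
    keep = np.ones(len(d_z), dtype=bool)
    if max_abs_change is not None:               # Panel B style rule (assumed absolute value)
        keep &= np.abs(d_z) < max_abs_change
    if max_rel_change is not None:               # Panel C style rule (share of earlier estimate)
        keep &= np.abs(d_z) < max_rel_change * zest_2m
    X = np.column_stack([np.ones(keep.sum()), d_z[keep], days[keep]])
    y = (sale - list_1m)[keep]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, int(keep.sum())

# Hypothetical, noise-free data: intercept 1000, slope 0.3 on the change, -25 per day listed.
rng = np.random.default_rng(0)
n = 500
zest_2m = rng.uniform(1e5, 8e5, size=n)
d_z = rng.normal(0.0, 60000.0, size=n)
zest_1m = zest_2m + d_z
days = rng.integers(2, 400, size=n).astype(float)
list_1m = 1.05 * zest_2m
sale = list_1m + 1000.0 + 0.3 * d_z - 25.0 * days

coef_full, n_full = price_adjustment_regression(sale, list_1m, zest_1m, zest_2m, days)
coef_b, n_b = price_adjustment_regression(sale, list_1m, zest_1m, zest_2m, days,
                                          max_abs_change=100000.0)
coef_c, n_c = price_adjustment_regression(sale, list_1m, zest_1m, zest_2m, days,
                                          max_rel_change=0.10)
print(coef_full, n_full, n_b, n_c)
```

In this setup the slope on the change in the Zillow estimate is the dollar adjustment in the sales price per dollar change in the estimate, so a slope of about 0.2 corresponds to roughly 200 dollars per thousand dollar change.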
5 Conclusion
We investigate how sensitive sales prices are to online price estimates in the housing market.
We propose a reduced-form equilibrium pricing model in which the price is a convex combination of
the third-party online price estimate and the self valuation of the property. Our method
nonparametrically proxies for unobservable house attributes by using the difference between the
listing price and the online price estimate, and controls for unobserved neighborhood attributes by
neighborhood first differencing. We collect house price estimates, sales and list prices, in addition to various
house and neighborhood attributes from Zillow.com across 30 MSAs in the US. Our empirical
results show that the elasticity of sales price with respect to the Zillow estimate is large and
close to one. In addition, we find that the accessibility of the internet at home strongly and
positively predicts the elasticity estimate across metropolitan areas. Furthermore, the change
in Zillow estimates impacts how sales prices adjust from the list prices a month before sales.
The literature has found that information impacts asset prices, in particular in the securi-
ties market. We find that information is valued in the real estate market as well. The large
elasticity estimate between house sales prices and online price estimates may have significant
implications. If information is more important than fundamentals in determining real estate
prices, then how information is generated could have a big impact on house price dynamics.
Also, the prevalence and accessibility of online house price information and people’s reliance
on such information potentially could precipitate house price movements.
Acknowledgements
We thank Nathaniel Baum-Snow, Thomas Davidoff, Daniel Fetter, Edward Kung, Jeff Zabel
and seminar participants at the AREUEA Annual Conference, Urban Economics Associations
Annual Conference, Greater Boston Urban and Real Estate Economics Conference, the Lincoln
Land Institute, and Williams College for comments. Danny Guo and Simmon Kim provided
excellent research assistance.
References
Bajari, Patrick, Jane Cooley, Kyoo il Kim, and Christopher Timmins. 2012. “A Rational
Expectations Approach to Hedonic Price Regressions with Time-Varying Unobserved Product
Attributes: The Price of Pollution,” American Economic Review, 102(5): 1898-1926.
Baltagi, Badi and Dong Li. 2002a. “Series Estimation of Partially Linear Panel Data Models
with Fixed Effects,” Annals of Economics and Finance, 3: 103-116.
Baltagi, Badi and Qi Li. 2002b. “On Instrumental Variable Estimation of Semiparametric
Dynamic Panel Data Models,” Economics Letters, 76: 1-9.
Baum-Snow, Nathaniel and Fernando Ferreira. 2015. “Causal Inference in Urban Economics,”
Gilles Duranton, J. Vernon Henderson and William C. Strange Ed., Handbook of Regional
and Urban Economics, Vol. 5: 588-638.
Bayer, Patrick, Fernando Ferreira, and Robert McMillan. 2007. “A Unified Framework for
Measuring Preferences for Schools and Neighborhoods,” Journal of Political Economy, 114(4):
588-638.
Bayer, Patrick, Marcus Casey, Fernando Ferreira, and Robert McMillan. 2013. “Estimating
Racial Price Differentials in the Housing Market,” mimeo.
Black, Sandra. 1999. “Do Better Schools Matter? Parental Valuation of Elementary Educa-
tion,” Quarterly Journal of Economics, 114(2): 577-599.
Campbell, John Y., Stefano Giglio, and Parag Pathak. 2011. “Forced Sales and House Prices,”
American Economic Review, 101: 2108-2131.
Chay, Kenneth and Michael Greenstone. 2005. “Does Air Quality Matter? Evidence from the
Housing Market,” Journal of Political Economy, 113(2): 376-424.
Easley, David and Maureen O’Hara. 1987. “Price, Trade Size, and Information in Securities
Markets,” Journal of Financial Economics, 19: 69-90.
Ferreira, Fernando and Joseph Gyourko. 2011. “Anatomy of the Beginning of the Housing
Boom: U.S. Neighborhoods and Metropolitan Areas 1993-2009,” mimeo.
Ferreira, Fernando. 2010. “You Can Take It with You: Proposition 13 Tax Benefits,
Residential Mobility and Willingness to Pay for Housing Amenities,” Journal of Public Economics,
94: 661-673.
Levitt, Steven D. and Chad Syverson. 2008. “Market Distortions When Agents Are Better
Informed: The Value of Information In Real Estate Transactions,” Review of Economics and
Statistics, 90(4): 599-611.
Li, Qi and Thanasis Stengos. 1996. “Semiparametric Estimation of Partially Linear Panel Data
Models,” Journal of Econometrics, 71: 389-397.
Newey, Whitney K. 1997. “Convergence Rates and Asymptotic Normality for Series
Estimators,” Journal of Econometrics, 79: 147-168.
Robinson, Peter. 1988. “Root-N-Consistent Semiparametric Regression,” Econometrica, 56:
931-954.
Rosen, Sherwin. 1974. “Hedonic Prices and Implicit Markets: Product Differentiation in Pure
Competition,” Journal of Political Economy, 82(1): 34-55.
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso,” Journal of the
Royal Statistical Society: Series B, 58: 267-288.
Zou, Hui. 2006. “The Adaptive Lasso and Its Oracle Properties,” Journal of the American
Statistical Association, 101: 1418-1429.
A Appendix
A.1 The MSA Level Analysis
As a preliminary check, we also examine the hypothesis that online property price estimates
impact actual sales prices at the aggregate level. If house price information directly impacts
house prices, we expect the relation to hold at an aggregate level as well. Specifically, we test
whether Zillow’s median price estimates Granger cause the median sales price as reported by
Zillow across 30 MSAs in the US. The 30 MSAs were chosen based on Zillow’s MSA level report
and the availability of individual sales price information.6 Appendix Table 1 lists the 30 MSAs
and the summary statistics of the median sales price and Zillow estimates for three bedroom
single family houses. The MSA level data is available at Zillow’s research division and we collect
monthly data from October 2008 to April 2013.7 The following two subsections introduce the
empirical methodology for the MSA level analysis, and the third subsection presents empirical
results.
A.1.1 Granger Causality in VAR
For each MSA, we denote Zillow’s median log house price estimate at time t by Zt. The median
log sales price at time t is denoted by Yt. We assume that they jointly follow the p-th order
vector autoregressive (VAR(p)) process:
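$$
\begin{pmatrix} Z_t \\ Y_t \end{pmatrix}
= \begin{pmatrix} A_{0,1} \\ A_{0,2} \end{pmatrix}
+ \sum_{q=1}^{p}
\begin{pmatrix} A_{q,1,1} & A_{q,1,2} \\ A_{q,2,1} & A_{q,2,2} \end{pmatrix}
\begin{pmatrix} Z_{t-q} \\ Y_{t-q} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{t,1} \\ \varepsilon_{t,2} \end{pmatrix}
\tag{A.1}
$$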
6 Section 4 describes the selection of MSAs in more detail.
7 The data is available at http://www.zillow.com/blog/research/data/
We say that $Z_t$ does not Granger cause $Y_t$ if $A_{q,2,1} = 0$ for all $q = 1, \cdots, p$. A test
of this null hypothesis can be conducted by the Wald test on $(A_{1,2,1}, \cdots, A_{p,2,1})$. Let
$A_2 = (A_{0,2}, A_{1,2,1}, A_{1,2,2}, \cdots, A_{p,2,1}, A_{p,2,2})$ be the $(2p+1)$-dimensional vector of the
coefficients in the second row of the above VAR model (A.1). Let $\hat{A}_2$ denote its consistent
estimate, and let $\hat{\Sigma}_{A_2}$ denote a consistent estimate of the variance matrix of the
coefficient estimate $\hat{A}_2$. The Wald statistic is computed by

$$
W = \hat{A}_2' R' \left( R \hat{\Sigma}_{A_2} R' \right)^{-1} R \hat{A}_2
$$

where $R$ is the $p$ by $2p+1$ restriction matrix whose $2r$-th column is one for each row $r = 1, \cdots, p$,
and all the other elements are zero. Under the null hypothesis $H_0 : R A_2 = \vec{0}$, that is,
$A_{q,2,1} = 0$ for all $q$, this Wald statistic $W$ follows the chi-square distribution with $p$ degrees
of freedom. We report this statistic and the associated p-value for the test of Granger causality.
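As an illustration of the test just described, the following minimal sketch estimates the second row of (A.1) by least squares and forms the Wald statistic with the restriction matrix $R$. It is not the authors' implementation: the function name and the simulated series are hypothetical, and the variance matrix uses the homoskedastic least squares formula, which is an assumption.

```python
import numpy as np

def granger_wald(z, y, p):
    """Wald statistic for H0: no lag of z enters the y equation of a VAR(p)."""
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Regressors ordered as in A_2: constant, then (z_{t-q}, y_{t-q}) for q = 1..p.
    cols = [np.ones(T - p)]
    for q in range(1, p + 1):
        cols.append(z[p - q:T - q])
        cols.append(y[p - q:T - q])
    X = np.column_stack(cols)                      # (T - p) x (2p + 1)
    target = y[p:]
    a_hat, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ a_hat
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)          # homoskedastic variance (assumption)
    # Row r of R selects the coefficient on z_{t-r}: column 2r in 1-based indexing.
    R = np.zeros((p, 2 * p + 1))
    for r in range(1, p + 1):
        R[r - 1, 2 * r - 1] = 1.0
    Ra = R @ a_hat
    W = float(Ra @ np.linalg.solve(R @ cov @ R.T, Ra))
    return W, p                                    # statistic and degrees of freedom

# Simulated example (hypothetical data): y depends on the first lag of z.
rng = np.random.default_rng(0)
T = 300
z = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    z[t] = 0.5 * z[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * z[t - 1] + rng.normal()

W, df = granger_wald(z, y, p=2)
print(f"W = {W:.2f}, degrees of freedom = {df}")
```

The p-value is the upper tail of the chi-square distribution with `df` degrees of freedom evaluated at `W`; with SciPy available it could be obtained as `scipy.stats.chi2.sf(W, df)`.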
A.1.2 Model Selection
There is arbitrariness in the choice of the order p of the VAR model (A.1). Some commonly used
approaches to selecting p include Akaike Information Criterion (AIC) and Bayesian Information
Criterion (BIC). We conduct the hypothesis testing after selecting the order p of the VAR
process by choosing the minimum AIC or BIC. However, these approaches have some drawbacks
in terms of consistency of model selection and validity in post-selection inference.
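A minimal sketch of lag selection by information criteria follows. The paper does not state the exact formulas it uses, so the standard multivariate forms $\ln|\hat{\Sigma}_p| + 2k/n$ (AIC) and $\ln|\hat{\Sigma}_p| + k \ln(n)/n$ (BIC) are assumed here, with $k$ the number of mean parameters and $n$ the common estimation sample size. Function names and the simulated data are hypothetical.

```python
import numpy as np

def var_criteria(z, y, p, p_max):
    """AIC and BIC of a bivariate VAR(p), estimated on the sample t = p_max, ..., T-1."""
    data = np.column_stack([z, y]).astype(float)
    T = data.shape[0]
    target = data[p_max:]                          # common sample across candidate orders
    cols = [np.ones(T - p_max)]
    for q in range(1, p + 1):
        cols.append(data[p_max - q:T - q])         # lag q of (z, y)
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    n = target.shape[0]
    sigma = resid.T @ resid / n                    # 2 x 2 residual covariance
    k = coef.size                                  # 2 * (2p + 1) mean parameters
    logdet = np.log(np.linalg.det(sigma))
    return logdet + 2.0 * k / n, logdet + np.log(n) * k / n

def select_lag(z, y, p_max):
    """Orders minimizing AIC and BIC over p = 1, ..., p_max."""
    scores = {p: var_criteria(z, y, p, p_max) for p in range(1, p_max + 1)}
    p_aic = min(scores, key=lambda p: scores[p][0])
    p_bic = min(scores, key=lambda p: scores[p][1])
    return p_aic, p_bic

# Simulated example (hypothetical data): y depends on the second lag of z.
rng = np.random.default_rng(0)
T = 400
z = np.zeros(T)
y = np.zeros(T)
for t in range(2, T):
    z[t] = 0.5 * z[t - 1] + rng.normal()
    y[t] = 0.3 * y[t - 1] + 0.8 * z[t - 2] + rng.normal()

p_aic, p_bic = select_lag(z, y, p_max=6)
print(p_aic, p_bic)
```

Because the BIC penalty per parameter exceeds the AIC penalty whenever $\ln(n) > 2$, the order chosen by BIC is never larger than the order chosen by AIC on the same sample.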
A recently popular method of model selection in econometrics is the least absolute shrinkage
and selection operator (LASSO: Tibshirani, 1996). In particular, the adaptive LASSO (Zou,
2006) enjoys the nice Oracle property as well as consistency of the model selection. This method
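works as follows. Let $\hat{A}$ denote a preliminary consistent estimate of the parameters $A$ in model
(A.1) without model restriction, e.g., the least squares estimate under a choice of large order
$p$. The adaptive LASSO estimate $A^*$ is obtained by the $L_1$ penalized least squares problem

$$
A^* = \arg\min_{A} \sum_{t=p+1}^{T}
\left\|
\begin{pmatrix} Z_t \\ Y_t \end{pmatrix}
- \begin{pmatrix} A_{0,1} \\ A_{0,2} \end{pmatrix}
- \sum_{q=1}^{p}
\begin{pmatrix} A_{q,1,1} & A_{q,1,2} \\ A_{q,2,1} & A_{q,2,2} \end{pmatrix}
\begin{pmatrix} Z_{t-q} \\ Y_{t-q} \end{pmatrix}
\right\|^2
+ \lambda_T \sum_{r=1}^{2}
\left(
\frac{|A_{0,r}|}{|\hat{A}_{0,r}|^{\gamma}}
+ \sum_{c=1}^{2} \sum_{q=1}^{p} \frac{|A_{q,r,c}|}{|\hat{A}_{q,r,c}|^{\gamma}}
\right).
$$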
The theory (Zou, 2006) requires that the tuning parameters $\gamma > 0$ and $\lambda_T > 0$ satisfy the
asymptotic orders $\lambda_T / \sqrt{T} \to 0$ and $\lambda_T T^{(\gamma - 1)/2} \to \infty$ as $T \to \infty$.
In practice, however, $T$ is fixed given a finite sample, and thus this asymptotic guideline may not
be useful. Therefore, we present empirical estimation results for each of the different values of the
tuning parameter, and examine their robustness.
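The penalized problem can be sketched numerically as follows. Because both the squared norm and the penalty are sums over the two rows of the VAR, the problem separates by equation, and the sketch below solves only the $Y_t$ equation. It uses cyclic coordinate descent with a fixed number of sweeps, which is one standard way to solve a weighted $L_1$ problem and is not necessarily what the authors used. Function names, default values, and the simulated data are hypothetical.

```python
import numpy as np

def lag_matrix(z, y, p):
    """Regressors of the Y equation: constant, then (z_{t-q}, y_{t-q}) for q = 1..p."""
    T = len(y)
    cols = [np.ones(T - p)]
    for q in range(1, p + 1):
        cols += [z[p - q:T - q], y[p - q:T - q]]
    return np.column_stack(cols), y[p:]

def soft_threshold(v, k):
    return np.sign(v) * max(abs(v) - k, 0.0)

def adaptive_lasso(X, y, lam, gamma=1.0, n_iter=500):
    """Minimize ||y - X b||^2 + lam * sum_j |b_j| / |b_init_j|^gamma."""
    b_init, *_ = np.linalg.lstsq(X, y, rcond=None)   # preliminary least squares estimate
    w = 1.0 / np.abs(b_init) ** gamma                # adaptive weights
    b = b_init.copy()
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r_j = y - X @ b + X[:, j] * b[j]         # residual excluding column j
            b[j] = soft_threshold(X[:, j] @ r_j, lam * w[j] / 2.0) / col_ss[j]
    return b

# Simulated example (hypothetical data): only the first lags matter.
rng = np.random.default_rng(0)
T = 300
z = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    z[t] = 0.5 * z[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * z[t - 1] + rng.normal()

X, target = lag_matrix(z, y, p=4)
b_ols, *_ = np.linalg.lstsq(X, target, rcond=None)
b_zero_penalty = adaptive_lasso(X, target, lam=0.0)
b_huge_penalty = adaptive_lasso(X, target, lam=1e12)
b_star = adaptive_lasso(X, target, lam=5.0)
print(np.round(b_star, 3))
```

With a zero penalty the procedure returns the least squares estimate, and with a very large penalty every coefficient is set to zero. Intermediate values shrink small coefficients to exactly zero, and the largest lag with a nonzero coefficient can then be read as the selected order.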
A.1.3 Results from the MSA level analysis
Appendix Table 2 presents the tests of Granger causality of the median sales price $Y_t$ by Zillow’s
median price estimate $Z_t$ for each MSA. The results are based on the best information criteria
with the maximum p set to 10. Column (1) presents results when the optimal p is chosen using
the minimum AIC. The optimal lag ranges from 7 to 10, with 10 being the most common. In
all 30 cities but one, Boston, the joint hypothesis that all the coefficients on $Z_{t-q}$ are zero is
rejected at the 10 percent level. Column (2) uses the minimum BIC to choose p, and the joint
hypothesis tests reject the null for all cities except Boston, Denver, and Philadelphia. Results from
the preferred LASSO method with tuning parameters of 0.5, 1.0, and 2.0 are presented in columns
(3) - (5). The optimal p tends to be smaller in these columns than in columns (1) and (2), but
the general results are similar. Other than in three to five cities, we reject the null hypothesis
that online price estimates do not Granger cause sales prices. The last row of Appendix Table
2 presents results for the entire US. The selected lag orders are 5 and 6 in the LASSO models of
columns (4) and (5) and 10 in the other models. The p-values imply that Zillow’s median price
estimate Granger-causes actual sales prices at the national level in all five models. The MSA
and national level aggregate results indicate that house price information may impact actual
transaction prices.
A.2 Zillow Estimates
Zillow does not disclose the formula for how their hedonic price estimates are produced, but
they mention which data they use. According to their website, some of the data that they use
include:
• Physical attributes: Location, lot size, square footage, number of bedrooms and bath-
rooms and many other details.
• Tax assessments: Property tax information, actual property taxes paid, exceptions to tax
assessments and other information provided in the tax assessors’ records.
• Prior and current transactions: Actual sale prices over time of the home itself and com-
parable recent sales of nearby homes
A.3 Semiparametric Estimation
In this section, we describe details of the econometric method for semiparametric estimation of
the coefficient $\beta$ on $Z$ and the coefficient vector $\theta$ on $X$. For notational convenience, we
slightly change the index notation from that used in the main text. Suppose that there are $N$
neighborhoods indexed by $n$. For simplicity, assume that the data contain $J + 1$ houses in each
neighborhood. Randomly order the $J + 1$ houses in each neighborhood to produce the indices
$j = 0, \cdots, J$ for each neighborhood $n = 1, \cdots, N$. We observe a random sample of
$(Y_n, Z_n, X_n, L_n, H_n)$ where $Y_n = (Y_{n0}, \cdots, Y_{nJ})$, $Z_n = (Z_{n0}, \cdots, Z_{nJ})$,
$X_n = (X_{n0}, \cdots, X_{nJ})$, $L_n = (L_{n0}, \cdots, L_{nJ})$, and $H_n = (H_{n0}, \cdots, H_{nJ})$.
We make the following assumption for this random sample.
Assumption 1. (i) The random vector $(Y_n, Z_n, X_n, L_n, H_n)$ is independently and identically
distributed. (ii) The support of $(Z_n, X_n, L_n, H_n)$ is a Cartesian product of compact connected
intervals on which $(Z_n, X_n, L_n, H_n)$ has a probability density function bounded away from zero.
(iii) $g^{-1}$ and $\mathrm{Var}(Y_{nj} \mid Z_{nj} = \cdot\,, X_{nj} = \cdot\,, L_{nj} - H_{nj} = \cdot\,)$ are bounded functions.
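The vectors of first-differenced observed random variables are stacked into matrices of $NJ$
rows as

$$
\begin{aligned}
\mathbf{Y} &= (\Delta Y_{11}, \cdots, \Delta Y_{1J}, \Delta Y_{21}, \cdots, \Delta Y_{2J}, \cdots\cdots, \Delta Y_{N1}, \cdots, \Delta Y_{NJ})' \\
\mathbf{Z} &= (\Delta Z_{11}, \cdots, \Delta Z_{1J}, \Delta Z_{21}, \cdots, \Delta Z_{2J}, \cdots\cdots, \Delta Z_{N1}, \cdots, \Delta Z_{NJ})' \\
\mathbf{X} &= (\Delta X_{11}, \cdots, \Delta X_{1J}, \Delta X_{21}, \cdots, \Delta X_{2J}, \cdots\cdots, \Delta X_{N1}, \cdots, \Delta X_{NJ})'
\end{aligned}
$$

where $\Delta Y_{nj} = Y_{nj} - Y_{n,j-1}$, $\Delta Z_{nj} = Z_{nj} - Z_{n,j-1}$, and
$\Delta X_{nj} = (X_{nj} - X_{n,j-1})'$ for each $n = 1, \cdots, N$ and $j = 1, \cdots, J$.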
We approximate the unknown nonparametric function $g$ using the power series. Specifically,
define the $K$-dimensional random vector

$$
p^K_{nj} = \left( (L_{nj} - H_{nj}) - (L_{n,j-1} - H_{n,j-1}), \cdots, (L_{nj} - H_{nj})^K - (L_{n,j-1} - H_{n,j-1})^K \right)'
$$

for each $n = 1, \cdots, N$ and $j = 1, \cdots, J$. Define the $NJ \times K$ random matrix

$$
P_K = (p^K_{11}, \cdots, p^K_{1J}, p^K_{21}, \cdots, p^K_{2J}, \cdots\cdots, p^K_{N1}, \cdots, p^K_{NJ})'.
$$
To control the bias caused by this approximation, we use the following assumptions for sufficient
smoothness of $g$ and the asymptotic choice of the order $K$ of the power series.
Assumption 2. $g$ is $C^1$-diffeomorphic.
Assumption 3. $\sqrt{N}/K \to 0$ as $N \to \infty$.
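We estimate the parameters $(\beta, \theta')$ in the following manner. To partial out $p^K_{nj}$ from the
estimating equation, we obtain the projections $\hat{\mathbf{Y}} = P_K (P_K' P_K)^{-} P_K' \mathbf{Y}$,
$\hat{\mathbf{Z}} = P_K (P_K' P_K)^{-} P_K' \mathbf{Z}$, and $\hat{\mathbf{X}} = P_K (P_K' P_K)^{-} P_K' \mathbf{X}$,
where $(\cdot)^{-}$ denotes a symmetric generalized inverse operation. Our estimator is

$$
\begin{pmatrix} \hat{\beta} \\ \hat{\theta} \end{pmatrix}
= \left[ \left( (\mathbf{Z} : \mathbf{X}) - (\hat{\mathbf{Z}} : \hat{\mathbf{X}}) \right)'
\left( (\mathbf{Z} : \mathbf{X}) - (\hat{\mathbf{Z}} : \hat{\mathbf{X}}) \right) \right]^{-}
\left( (\mathbf{Z} : \mathbf{X}) - (\hat{\mathbf{Z}} : \hat{\mathbf{X}}) \right)'
\left( \mathbf{Y} - \hat{\mathbf{Y}} \right)
$$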
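The estimation steps above (first differencing within neighborhood, building the differenced power series, partialling it out, and running least squares on the residuals) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the array layout, and the simulated data are hypothetical, `V` stands for the proxy argument $L - H$, and the data are generated without noise and with a quadratic $g$ so that the true coefficients are recovered exactly.

```python
import numpy as np

def series_fd_estimator(Y, Z, X, V, K):
    """Y, Z, V: arrays of shape (N, J+1); X: array of shape (N, J+1, d).
    Returns the stacked coefficient on Z followed by the coefficients on X."""
    # First differences within neighborhood, stacked into NJ rows.
    dY = np.diff(Y, axis=1).reshape(-1)
    dZ = np.diff(Z, axis=1).reshape(-1)
    dX = np.diff(X, axis=1).reshape(-1, X.shape[2])
    # Differenced power series terms V^k - V_{j-1}^k for k = 1..K.
    powers = np.stack([V ** k for k in range(1, K + 1)], axis=-1)   # (N, J+1, K)
    P = np.diff(powers, axis=1).reshape(-1, K)                       # NJ x K
    proj = P @ np.linalg.pinv(P.T @ P) @ P.T                         # projection onto P
    W = np.column_stack([dZ, dX])                                    # (Z : X)
    W_res = W - proj @ W
    Y_res = dY - proj @ dY
    return np.linalg.pinv(W_res.T @ W_res) @ W_res.T @ Y_res

# Simulated example (hypothetical data, no noise).
rng = np.random.default_rng(0)
N, J, d = 200, 3, 2
Z = rng.normal(size=(N, J + 1))
X = rng.normal(size=(N, J + 1, d))
V = rng.uniform(-1.0, 1.0, size=(N, J + 1))
mu = rng.normal(size=(N, 1))                      # neighborhood effect, removed by differencing
beta_true, theta_true = 0.9, np.array([0.2, -0.1])
Y = beta_true * Z + X @ theta_true + V + 0.5 * V ** 2 + mu

coef = series_fd_estimator(Y, Z, X, V, K=3)
print(np.round(coef, 6))
```

The Moore-Penrose inverse `np.linalg.pinv` is used here as one symmetric generalized inverse; with real data the regressors would be the log prices and house attributes, and standard errors would follow from the asymptotic variance given later in this section.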
We also state the following rank condition.
Assumption 4. The matrix $\Phi := E[e_{nj}' e_{nj}]$ is positive definite, where $e_{nj}$ is defined by
$e_{nj} = (Z_{nj} : X_{nj}) - E[(Z_{nj} : X_{nj}) \mid L_{nj} - H_{nj}]$.
We use the result of Baltagi and Li (BL, 2002a) to obtain standard errors. Assumption 1
(i)–(iii) imply Assumption 2.1 of BL. For the power series estimator, Assumption 1 (ii) implies
Assumption 8 of Newey (1997), which in turn implies Assumption 2 of Newey, which is equivalent
to Assumption 2.2 of BL. Assumption 2 implies that $g$ is continuously differentiable. Since the
argument $L_{nj} - H_{nj}$ of $g$ is univariate, this implies that Assumption 3 of Newey for $d = 0$ (which
is equivalent to Assumption 2.3 (i) of BL) is satisfied with $\alpha = s = 1$ ($= \delta$ in BL); see page
157 of Newey. With this smoothness requirement, the choice of $K$ as in Assumption 3 satisfies
Assumption 2.3 of BL. Therefore, by Theorem 2.1 of BL, we obtain
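$$
\sqrt{N} \begin{pmatrix} \hat{\beta} - \beta \\ \hat{\theta} - \theta \end{pmatrix}
\to N(0, \Phi^{-1} \Omega \Phi^{-1})
$$

as $N \to \infty$ under Assumptions 1–4, where $\Phi$ is the positive definite matrix defined in
Assumption 4 and $\Omega$ is given by
$\Omega = E\left[ e_{nj}' \, E\left[ (\varepsilon_{nj} - \varepsilon_{n,j-1})^2 \mid Z_{nj}, X_{nj}, L_{nj} - H_{nj} \right] e_{nj} \right]$.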
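To estimate the part $\Omega$ of the variance matrix, which involves the first-differenced residuals
$\varepsilon_{nj} - \varepsilon_{n,j-1}$, we need to obtain the estimate

$$
\hat{g}(L_{nj} - H_{nj}) = \left( (L_{nj} - H_{nj}), \cdots, (L_{nj} - H_{nj})^K \right)
(P_K' P_K)^{-} P_K' \left( \mathbf{Y} - \mathbf{Z} \hat{\beta} - \mathbf{X} \hat{\theta} \right)
$$

of the unknown nonparametric function $g$ at each data point $(L_{nj} - H_{nj})$ and then recover the
estimate

$$
\widehat{(\varepsilon_{nj} - \varepsilon_{n,j-1})} = Y_{nj} - Y_{n,j-1} - (Z_{nj} - Z_{n,j-1}) \hat{\beta}
- (X_{nj} - X_{n,j-1}) \hat{\theta} - \hat{g}(L_{nj} - H_{nj}) + \hat{g}(L_{n,j-1} - H_{n,j-1})
$$

of $\varepsilon_{nj} - \varepsilon_{n,j-1}$. It follows from the uniform consistency of $\hat{g}$ under Assumptions 1–3
(Theorem 2.2 (i) of BL) that $\widehat{(\varepsilon_{nj} - \varepsilon_{n,j-1})}$ is uniformly consistent for
$\varepsilon_{nj} - \varepsilon_{n,j-1}$.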
Table 1. Summary statistics
Variable Mean Std. Dev. Min Max Obs
Panel A: House level data
Sales price 321914 245299 10000 2950000 1200
Zillow estimate when sold 322056 230858 36000 2600000 1200
Zillow estimate 1 month prior to sale 320108 230967 38000 2600000 1200
Zillow estimate 2 month prior to sale 317984 230550 37000 2600000 1200
Zillow estimate 3 month prior to sale 315433 230222 16100 2600000 1200
Zillow estimate in the month listed for sale 324022 232077 33000 2900000 1189
List price 340012 251240 19900 2900000 1199
Number of days between listing and sales 185.1 269.9 2 3349 1195
Number of bedrooms 3.85 0.87 3 9 1200
Number of bathrooms 2.68 0.64 2 6 1199
Square footage 2373 547 2000 10890 1200
Lot square footage 8365 13936 595 304920 1199
Year built 1960 37 1810 2013 1200
Panel B: MSA level data
Land area 6040.69 4783.888 1600.9 27259.9 30
Population, 2013 4254068 3822144 1601565 1.97E+07 30
Housing units, 2013 1711144 1458024 654120 7797490 30
Number of families, 2013 1013282 876799.7 400899 4550781 30
Median household income, 2013 60789.97 11027.35 46497 90962 30
Population above 25 yrs old with bachelor degree 20.74333 3.258589 12.7 26.9 30
Unemployment rate, 2013 9.943333 1.777771 7 14.7 30
Internet penetration rate, 2013 0.9149734 0.0129532 0.8945243 0.9439421 30
Number of employees in residential real estate agents or brokerage, 2012 4311.6 4262.602 749 22063 30
Notes: Variables listed in Panel A were collected manually from Zillow. Panel B data was extracted from the US Census.
Table 2. The effect of online price estimates on sales prices: elasticity estimates using full sample
Dependent variable: Log sales price

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) |
|---|---|---|---|---|---|---|---|---|---|
| Log Zillow estimate [1] | 1.015 | 1.000 | 0.996 | 0.876 | 0.893 | 0.948 | 0.948 | 0.947 | 0.941 |
| | (0.029) | (0.028) | (0.029) | (0.040) | (0.038) | (0.035) | (0.035) | (0.036) | (0.036) |
| Length of listing | | | | | | | -0.514 | -0.5 | -0.674 |
| | | | | | | | (0.842) | (0.844) | (0.843) |
| Proxy [2] | | 0.227 | | | 0.21 | | | | |
| | | (0.131) | | | (0.020) | | | | |
| Proxy format | None | Linear | Non-linear [3] | None | Linear | Non-linear | Non-linear | Non-linear | Non-linear |
| Local First Differencing | | | | Yes | Yes | Yes | Yes | Yes | Yes |
| Higher Order Covariates [4] | | | | | | | | Yes | Yes |
| Season Dummy | | | | | | | | | Yes |
| A. Null hypothesis: coefficient estimate of log Zillow estimate = 1, p-value | 0.603 | 0.989 | 0.881 | 0.002 | 0.005 | 0.138 | 0.136 | 0.148 | 0.104 |
| B. Null hypothesis: all coefficient estimates of the control variables = 0, p-value | 0.009 | 0.011 | 0.023 | 0.003 | 0.035 | 0.503 | 0.506 | 0.501 | 0.443 |
| R-Squared | 0.908 | 0.919 | 0.935 | 0.905 | 0.918 | 0.935 | 0.935 | 0.935 | 0.935 |
| N | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 |

Notes: [1] The Zillow estimates are estimates one month prior to the time of sales. [2] The proxy is the initial listing price minus Zillow estimate around the time of the initial listing. [3] Third order polynomial approximation. [4] Polynomials of third order for bed, bath, square feet, lot square feet, and year built. Standard errors are in parentheses.
Table 3. The effect of online price estimates on sales prices: elasticity estimates by MSA
| MSA | Elasticity estimate [1] | p-Value: Elasticity estimate [1] = 0 | p-Value: Elasticity estimate [1] = 1 | p-Value: All coefficient estimates on covariates [2] = 0 | R2 | N |
|---|---|---|---|---|---|---|
| Atlanta | 0.976 (0.234) | 0.004 | 0.920 | 0.500 | 0.976 | 40 |
| Baltimore | 0.201 (0.297) | 0.518 | 0.518 | 0.113 | 0.726 | 40 |
| Boston | 2.007 (0.263) | 0.000 | 0.007 | 0.001 | 0.934 | 40 |
| Charlotte | -0.169 (2.926) | 0.955 | 0.699 | 0.942 | 0.367 | 40 |
| Chicago | 1.000 (0.115) | 0.000 | 0.997 | 0.701 | 0.969 | 40 |
| Cincinnati | 0.857 (0.290) | 0.025 | 0.640 | 0.881 | 0.801 | 40 |
| Columbus | 0.513 (0.303) | 0.125 | 0.142 | 0.906 | 0.600 | 40 |
| Denver | 1.139 (0.239) | 0.001 | 0.575 | 0.960 | 0.901 | 40 |
| Las Vegas | -0.901 (0.755) | 0.267 | 0.036 | 0.222 | 0.036 | 40 |
| Los Angeles | 0.799 (0.142) | 0.001 | 0.197 | 0.179 | 0.940 | 40 |
| Miami-Fort Lauderdale | 0.928 (0.355) | 0.031 | 0.845 | 0.288 | 0.900 | 40 |
| Minneapolis-St. Paul | 0.894 (0.148) | 0.000 | 0.495 | 0.074 | 0.914 | 40 |
| Nashville | 1.095 (0.192) | 0.000 | 0.631 | 0.476 | 0.770 | 40 |
| New York | 0.973 (0.501) | 0.088 | 0.959 | 0.513 | 0.910 | 40 |
| Orlando | 0.587 (0.220) | 0.026 | 0.093 | 0.714 | 0.835 | 40 |
| Philadelphia | 1.282 (0.344) | 0.007 | 0.440 | 0.300 | 0.465 | 40 |
| Phoenix | 0.563 (0.361) | 0.154 | 0.257 | 0.638 | 0.470 | 40 |
| Pittsburgh | 1.056 (0.120) | 0.000 | 0.652 | 0.599 | 0.988 | 40 |
| Portland | 0.853 (0.491) | 0.126 | 0.774 | 0.687 | 0.886 | 40 |
| Providence-Warwick | 1.617 (0.616) | 0.034 | 0.350 | 0.689 | 0.570 | 40 |
| Riverside | 0.248 (0.385) | 0.536 | 0.083 | 0.224 | 0.513 | 40 |
| Sacramento | 0.280 (0.584) | 0.642 | 0.246 | 0.812 | 0.600 | 40 |
| San Diego | 1.656 (1.540) | 0.314 | 0.681 | 0.316 | 0.232 | 40 |
| San Francisco | 0.476 (0.441) | 0.311 | 0.269 | 0.170 | 0.458 | 40 |
| San Jose | 0.884 (0.195) | 0.002 | 0.568 | 0.039 | 0.555 | 40 |
| Seattle | 0.749 (0.255) | 0.022 | 0.358 | 0.627 | 0.628 | 40 |
| St. Louis | 0.305 (0.220) | 0.202 | 0.013 | 0.209 | 0.797 | 40 |
| Tampa | 1.291 (0.148) | 0.000 | 0.086 | 0.260 | 0.172 | 40 |
| Virginia Beach | 1.628 (0.675) | 0.042 | 0.379 | 0.628 | 0.685 | 40 |
| Washington DC | 0.470 (0.222) | 0.067 | 0.044 | 0.565 | 0.696 | 40 |

Notes: [1] The coefficient estimate of the log Zillow estimate for each MSA using the specification in column (9) of Table 2. The Zillow estimates are estimates one month prior to the time of sales. [2] Polynomials of third order for bed, bath, square feet, lot square feet, and year built. Standard errors are in parentheses.
Table 4. Factors that affect the elasticity estimates across MSA’s
Dependent variable: Elasticity estimate (Effect of online price estimates on sales prices)

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Internet penetration rate | 22.28** | 22.57** | 19.00* | 18.02* | 32.57** | 31.73*** |
| | (9.244) | (9.412) | (10.22) | (9.191) | (13.34) | (10.83) |
| Ln(number of residential real estate agents) | | -0.0342 | -0.334 | -0.244 | -0.278 | -0.160 |
| | | (0.0920) | (0.311) | (0.288) | (0.352) | (0.326) |
| Ln(land area) | | | -0.141 | -0.168 | 0.00672 | -0.107 |
| | | | (0.201) | (0.165) | (0.252) | (0.243) |
| Ln(population) | | | 0.454 | -3.472** | -2.824 | 0.0484 |
| | | | (0.407) | (1.304) | (1.925) | (2.175) |
| Ln(number of families) | | | | 3.920*** | 3.283* | 5.156*** |
| | | | | (1.286) | (1.773) | (1.312) |
| Ln(median household income) | | | | | -2.061* | -3.701*** |
| | | | | | (1.056) | (1.034) |
| Ln(number of college degrees) | | | | | 1.203 | 1.376 |
| | | | | | (0.994) | (0.999) |
| Ln(housing units) | | | | | | -4.896* |
| | | | | | | (2.530) |
| Unemployment rate | | | | | | -0.0825 |
| | | | | | | (0.0872) |
| Observations | 30 | 30 | 30 | 30 | 30 | 30 |
| R-squared | 0.246 | 0.248 | 0.289 | 0.406 | 0.522 | 0.636 |
Notes: The elasticity estimates for each MSA in Table 3 are the dependent variable. All control variables are for 2013, except for the number of real estate agents, which is for 2012. The internet penetration rate is 1 minus the share of households without internet and computer access. Robust standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
Table 5. Impact of Zillow estimates on list prices
| | (1) List price | (2) List price | (3) Ln(list price) | (4) Ln(list price) |
|---|---|---|---|---|
| Zillow estimate | 0.00286 | | | |
| | (0.0353) | | | |
| Zillow estimate one month ago | | 0.000661 | | |
| | | (0.0158) | | |
| Ln(Zillow estimate) | | | -0.0210 | |
| | | | (0.0330) | |
| Ln(Zillow estimate one month ago) | | | | -0.00261 |
| | | | | (0.0138) |
| House fixed effects | Yes | Yes | Yes | Yes |
| Observations | 2,257 | 1,620 | 2,256 | 1,619 |
| R-squared | 0.023 | 0.042 | 0.016 | 0.023 |
Notes: Robust standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
Table 6. Impact of Zillow estimates on the change in sales prices from list price one month prior to sales
Dependent variable: Sales price - List price one month prior to sales

Panel A. Full sample

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Zillow estimate 1 month prior to sales – Zillow estimate 2 months prior to sales | 0.134 | 0.112 | 0.0933 | 0.0727 |
| | (0.105) | (0.105) | (0.0970) | (0.0962) |
| Days listed | -28.26*** | -27.40*** | -25.50*** | -24.53*** |
| | (6.167) | (6.425) | (6.038) | (6.269) |
| Change in MSA level Zillow estimate | | | 0.774*** | 0.776*** |
| | | | (0.184) | (0.184) |
| Month fixed effect | | Y | | Y |
| Year fixed effect | | Y | | Y |
| Observations | 801 | 801 | 801 | 801 |
| R-squared | 0.047 | 0.058 | 0.073 | 0.084 |

Panel B: Change in Zillow estimate < $100,000

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Zillow estimate 1 month prior to sales – Zillow estimate 2 months prior to sales | 0.284** | 0.264** | 0.217* | 0.196* |
| | (0.117) | (0.116) | (0.111) | (0.110) |
| Days listed | -28.58*** | -28.06*** | -26.28*** | -25.66*** |
| | (6.270) | (6.503) | (6.217) | (6.454) |
| Change in MSA level Zillow estimate | | | 0.632*** | 0.619*** |
| | | | (0.177) | (0.177) |
| Month fixed effect | | Y | | Y |
| Year fixed effect | | Y | | Y |
| Observations | 769 | 769 | 769 | 769 |
| R-squared | 0.056 | 0.067 | 0.074 | 0.083 |

Panel C: Change in Zillow estimate < 10 percent

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Zillow estimate 1 month prior to sales – Zillow estimate 2 months prior to sales | 0.456* | 0.432* | 0.380* | 0.356 |
| | (0.255) | (0.255) | (0.227) | (0.230) |
| Days listed | -29.76*** | -28.91*** | -26.41*** | -25.62*** |
| | (6.981) | (7.086) | (6.817) | (6.945) |
| Change in MSA level Zillow estimate | | | 0.825*** | 0.803*** |
| | | | (0.198) | (0.198) |
| Month fixed effect | | Y | | Y |
| Year fixed effect | | Y | | Y |
| Observations | 625 | 625 | 625 | 625 |
| R-squared | 0.070 | 0.080 | 0.102 | 0.110 |
Notes: In Panel B and C, the sample is restricted based on the change of Zillow estimates between the month of sales and 3 months prior to sales. Robust standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
Figure 1. Scatterplot between the elasticity estimate and the internet penetration rate across MSA’s
Notes: The elasticity estimates measure the impact of one month prior online price estimates on actual sales prices for each MSA and are from Table 3.
Appendix Table 1. The list of Metropolitan Statistical Areas and the summary statistics of variables used in the MSA level VAR analysis
| MSA | Median Sales Price, October 2008 | Median Sales Price, October 2010 | Median Sales Price, October 2012 | Median Zillow Estimates, October 2008 | Median Zillow Estimates, October 2010 | Median Zillow Estimates, October 2012 |
|---|---|---|---|---|---|---|
| Atlanta | 194240 | 148650 | 154635 | 155200 | 131700 | 112600 |
| Baltimore | 270735 | 274823 | 259375 | 255700 | 230000 | 220800 |
| Boston | 321025 | 331100 | 322985 | 323300 | 313500 | 315000 |
| Charlotte | 176225 | 180775 | 176900 | 150600 | 137200 | 136000 |
| Chicago | 238100 | 210125 | 195925 | 221000 | 183900 | 161600 |
| Cincinnati | 136300 | 127400 | 123600 | 147700 | 143900 | 141460 |
| Columbus | 154200 | 148200 | 157175 | 137200 | 128900 | 126700 |
| Denver | 239645 | 240925 | 246325 | 213600 | 208200 | 224400 |
| Las Vegas | 226500 | 138105 | 132500 | 199100 | 127700 | 122500 |
| Los Angeles | 458752 | 405300 | 400000 | 435300 | 415200 | 405600 |
| Miami-Fort Lauderdale | 240200 | 157523 | 161050 | 196400 | 143300 | 149800 |
| Minneapolis | 220000 | 195965 | 195750 | 207500 | 180300 | 173400 |
| Nashville | 160885 | 171750 | 171725 | 155100 | 148500 | 148400 |
| New York | 399750 | 386652 | 366450 | 396400 | 364500 | 343100 |
| Orlando | 200500 | 121250 | 130900 | 175500 | 127400 | 123700 |
| Philadelphia | 224748 | 223575 | 213550 | 216300 | 199900 | 186500 |
| Phoenix | 215933 | 149125 | 165250 | 189300 | 133700 | 154100 |
| Pittsburgh | 126800 | 121800 | 131425 | 102800 | 105900 | 111100 |
| Portland | 258036 | 239100 | 233750 | 259500 | 226300 | 226800 |
| Providence | 242250 | 231225 | 207965 | 237600 | 227200 | 211000 |
| Riverside | 286500 | 198900 | 202500 | 236300 | 193200 | 192000 |
| Sacramento | 289950 | 228050 | 221750 | 267700 | 226400 | 217900 |
| St. Louis | 163075 | 150825 | 154057 | 141200 | 135400 | 127100 |
| San Diego | 401500 | 355750 | 358900 | 373900 | 364900 | 362800 |
| San Francisco | 572925 | 483250 | 480800 | 536600 | 499000 | 512400 |
| San Jose | 606100 | 556425 | 564000 | 605700 | 561800 | 610400 |
| Seattle | 330875 | 309225 | 296612 | 330400 | 278900 | 267300 |
| Tampa | 168475 | 130196 | 124165 | 147000 | 117700 | 111700 |
| Virginia Beach | 220441 | 222750 | 215125 | 223100 | 210900 | 195700 |
| Washington DC | 348750 | 358245 | 339717 | 329500 | 307700 | 320200 |
Notes: The median sales price and Zillow estimates are for three bedroom single family houses.
Appendix Table 2. MSA level VAR results
| | (1) Minimum AIC: p (df) | (1) p-val | (2) Minimum BIC: p (df) | (2) p-val | (3) LASSO (0.5): df | (3) p-val | (4) LASSO (1.0): df | (4) p-val | (5) LASSO (2.0): df | (5) p-val |
|---|---|---|---|---|---|---|---|---|---|---|
| Atlanta | 10 | 0.000 | 9 | 0.000 | 10 | 0.000 | 8 | 0.000 | 10 | 0.000 |
| Baltimore | 10 | 0.000 | 10 | 0.000 | 10 | 0.000 | 9 | 0.000 | 10 | 0.006 |
| Boston | 10 | 0.159 | 10 | 0.159 | 10 | 0.153 | 10 | 0.186 | 10 | 0.159 |
| Charlotte | 10 | 0.010 | 9 | 0.017 | 10 | 0.010 | 10 | 0.009 | 6 | 0.367 |
| Chicago | 9 | 0.000 | 9 | 0.000 | 9 | 0.000 | 9 | 0.000 | 9 | 0.000 |
| Cincinnati | 10 | 0.001 | 10 | 0.001 | 10 | 0.000 | 9 | 0.000 | 9 | 0.000 |
| Columbus | 10 | 0.000 | 5 | 0.000 | 9 | 0.000 | 2 | 0.276 | 7 | 0.000 |
| Denver | 10 | 0.084 | 7 | 0.602 | 10 | 0.142 | 10 | 0.084 | 9 | 0.077 |
| Las Vegas | 10 | 0.003 | 10 | 0.003 | 7 | 0.000 | 4 | 0.000 | 7 | 0.000 |
| Los Angeles | 9 | 0.000 | 9 | 0.000 | 10 | 0.000 | 8 | 0.000 | 9 | 0.000 |
| Miami-Fort Lauderdale | 10 | 0.000 | 10 | 0.000 | 10 | 0.000 | 10 | 0.000 | 10 | 0.000 |
| Minneapolis | 7 | 0.000 | 5 | 0.000 | 10 | 0.005 | 10 | 0.003 | 9 | 0.003 |
| Nashville | 10 | 0.061 | 10 | 0.061 | 8 | 0.114 | 9 | 0.029 | 6 | 0.135 |
| New York | 10 | 0.044 | 6 | 0.000 | 9 | 0.007 | 10 | 0.021 | 10 | 0.000 |
| Orlando | 10 | 0.000 | 10 | 0.000 | 10 | 0.000 | 10 | 0.000 | 10 | 0.000 |
| Philadelphia | 10 | 0.003 | 4 | 0.239 | 10 | 0.003 | 9 | 0.010 | 9 | 0.001 |
| Phoenix | 10 | 0.000 | 9 | 0.000 | 10 | 0.000 | 9 | 0.000 | 10 | 0.000 |
| Pittsburgh | 10 | 0.000 | 10 | 0.000 | 9 | 0.000 | 10 | 0.002 | 10 | 0.000 |
| Portland | 10 | 0.065 | 10 | 0.065 | 9 | 0.039 | 4 | 0.000 | 2 | 0.232 |
| Providence | 10 | 0.000 | 10 | 0.000 | 10 | 0.000 | 9 | 0.000 | 10 | 0.000 |
| Riverside | 10 | 0.000 | 10 | 0.000 | 8 | 0.000 | 10 | 0.000 | 8 | 0.000 |
| Sacramento | 10 | 0.000 | 10 | 0.000 | 9 | 0.000 | 10 | 0.000 | 8 | 0.000 |
| St. Louis | 10 | 0.014 | 9 | 0.011 | 10 | 0.025 | 8 | 0.007 | 6 | 0.002 |
| San Diego | 9 | 0.000 | 6 | 0.000 | 10 | 0.000 | 10 | 0.000 | 9 | 0.000 |
| San Francisco | 10 | 0.000 | 10 | 0.000 | 10 | 0.042 | 10 | 0.005 | 9 | 0.000 |
| San Jose | 10 | 0.000 | 10 | 0.000 | 9 | 0.000 | 8 | 0.000 | 4 | 0.000 |
| Seattle | 10 | 0.000 | 10 | 0.000 | 9 | 0.000 | 8 | 0.000 | 9 | 0.000 |
| Tampa | 10 | 0.000 | 8 | 0.000 | 10 | 0.000 | 10 | 0.000 | 10 | 0.000 |
| Virginia Beach | 10 | 0.034 | 4 | 0.054 | 9 | 0.021 | 4 | 0.566 | 3 | 0.415 |
| Washington DC | 10 | 0.000 | 10 | 0.000 | 10 | 0.000 | 10 | 0.000 | 10 | 0.000 |
| United States | 10 | 0.009 | 10 | 0.009 | 10 | 0.006 | 5 | 0.000 | 6 | 0.015 |

Notes: The analysis was performed on monthly data over the period between Oct. 2008 and Apr. 2013.