
Attenuation of Agglomeration Economies: Evidence from the Universe of Chinese Manufacturing Firms∗

Jing Li^{a,†}, Liyao Li^{a,‡}, Shimeng Liu^{b,§}

a School of Economics, Singapore Management University, Singapore

b Institute for Economic and Social Research, Jinan University, China

April 27, 2020

Abstract

This paper examines the industry-specific attenuation speed of agglomeration economies

and its interplay with the large presence of state-owned enterprises in China. We achieve

this focus by taking advantage of unique geocoded administrative data on the universe

of Chinese manufacturing firms. The full-spectrum analysis also allows us to assess the

goodness of fit of various spatial decay functional forms and to systematically evaluate the

micro-foundations that govern the decay patterns across industry types. We obtain three

main findings. First, agglomeration economies attenuate sharply with spatial distance in

China, with large heterogeneity in the attenuation speed across ownership and industry

types. Second, the spatial decay speed is positively linked with proxies for knowledge

spillovers and labour market pooling but negatively linked with proxies for input sharing

and the share of the state sector. Last, the inverse square distance decay function presents

the best goodness of fit among the tested functional forms.

Key Words: agglomeration economies; attenuation speed; China; state-owned enterprise (SOE)

JEL classifications: R1; R3; L3; L6

∗We thank Costas Arkolakis, Nate Baum-Snow, Stuart Rosenthal, William Strange, Nathan Schiff, Matthew Turner, Zhi Wang, Junfu Zhang, seminar participants at Singapore Management University, and participants of the 2019 China Meeting of the Econometric Society, 2019 Yangtze River Delta International Forum, and 2019 SMU Conference on Urban and Regional Economics for their helpful comments. Jing Li gratefully acknowledges the financial support from Singapore Management University under the Lee Kong Chian Fellowship. Shimeng Liu gratefully acknowledges support from the National Natural Science Foundation of China (71703059). Remaining errors are our own.

†90 Stamford Road, Singapore 178903. Phone: +65-6808-5454. E-mail: [email protected].

‡90 Stamford Road, Singapore 178903. E-mail: [email protected].

§601 W. Huangpu Ave., Tianhe District, Guangzhou, China 510632. Phone: +86-13403024213. E-mail: [email protected].


1. Introduction

How do agglomeration economies attenuate spatially? This question is important because

the answer sheds light on the specific force of agglomeration economies that drives economic

concentrations and urban growth (Rosenthal and Strange, 2019). A growing literature ex-

amines the attenuation of agglomeration economies empirically and reveals striking decay

patterns that vary across industry types (Rosenthal and Strange, 2003, 2005, 2008; Fu, 2007;

Arzaghi and Henderson, 2008; Li, 2014). However, existing studies are insufficient to pro-

vide systematic evidence for the whole economy or to offer generic guidance on theoretical

models.1 Moreover, those studies are all implemented in developed countries (mostly in the

United States), despite a strong desire for understanding the economic fundamentals that

drive rapid urbanization in the developing world.

This paper takes an integrated approach to examine the spatial attenuation of agglomer-

ation economies and its interplay with the large presence of state-owned enterprises (SOEs)

in China. Specifically, we take advantage of unique geocoded administrative data on the

universe of Chinese manufacturing firms to embark on an extensive set of empirical exercises.

We start by providing the first comprehensive estimation of industry-specific attenuation

speed for the entire manufacturing sector. The nuanced full-spectrum analysis allows us, also

for the first time in the literature, to test for the goodness of fit of various spatial decay

functional forms.2 Next, using the most desirable functional form, we empirically verify the

theoretical insight that links the spatial decay speed with the micro-foundation of agglomer-

ation economies.3 Last, we systematically evaluate the role of SOEs in shaping the spatial

1 One major limitation is that the studies only focus on a small set of industries. Examples include Rosenthal and Strange (2003) on the software, food processing, apparel, printing and publishing, fabricated metal, and machinery industries; Arzaghi and Henderson (2008) on the advertising industry; and Li (2014) on the health care industry.

2 Previous studies that model the spatial decay of productivity spillovers assume certain functional forms, such as the inverse exponential distance decay function in Lucas and Rossi-Hansberg (2002). However, thus far, no statistical evidence exists to validate this assumption. By specifying rich distance-specific concentric ring variables as measures of agglomeration for each industry, we generate sufficient variation in the data to estimate and compare the goodness of fit of various decay functional forms.

3 Theory suggests that the spatial decay speed of agglomeration spillovers depends on which underlying agglomeration forces are in action (Rosenthal and Strange, 2004; Combes and Gobillon, 2015). For instance, industries that heavily rely on knowledge spillovers as the main agglomeration force often require close-range, face-to-face contact, which implies a rapid spatial decay of agglomeration spillovers. Industries that cluster mainly because of input-output links could have agglomeration externalities decay slowly and extend to a


manifestation of agglomeration economies.

Such a comprehensive analysis is essential to gaining a deep understanding of the nature

of agglomeration economies. The mechanism of agglomeration economies is widely used to

explain the formation of cities and the rapid economic growth taking place within cities

(Marshall, 1920; Glaeser et al., 1992; Ellison and Glaeser, 1999; Ellison et al., 2010). Less

is understood about the empirical underpinnings. The spatial attenuation of agglomeration

economies is an important empirical regularity that sheds light on the micro-foundations.

Therefore, a full-scope micro-level analysis across all industries helps reveal the specific micro-

foundations in each industry and the economic fundamentals that contribute to the spatial

structure of the macro-economy. The estimation of the attenuation speed and evaluation of

the spatial decay functional forms, empowered by the comprehensive analysis, provide useful

empirical guidance on calibrating theoretical work that explicitly models firm productivity

spillovers (Lucas and Rossi-Hansberg, 2002; Ahlfeldt et al., 2015).4

The scope and context of our paper also help to understand the specific economic funda-

mentals that drive urban growth in China and other developing countries. China has achieved

miraculous economic growth in the past few decades, exhibited mostly in its dramatic boom

in the manufacturing sector.5 While much has been proposed as the potential drivers, such

as improved labour mobility and rapid build-up of the transportation infrastructure, less is

understood about how different manufacturing industries in China benefit from various improvements and thrive throughout the evolution (Au and Henderson, 2006; Tombe and Zhu, 2019;

Zhu, 2012). We reveal industry-specific micro-foundations through detailed spatial attenu-

ation patterns that help to shed light on the economic forces facilitating China’s dramatic

economic growth. Evidence from China, the largest developing country in the world, further

helps to verify whether the previously documented regularities of agglomeration economies

larger spatial scale.

4 For instance, Ogawa and Fujita (1980) and Fujita and Ogawa (1982) explicitly model spatial attenuation of agglomeration economies to endogenize and determine the location of employment in a structural setting. The trade-off in the location decision resides in the tension between workers’ commuting costs and spatial spillover benefits accrued to firms. As shown in Fujita and Ogawa (1982), the equilibrium outcome ranges from a purely monocentric city to a complete dispersion, depending on the importance of the spatial decay function relative to commuting costs.

5 China has become the world’s second-largest economy in terms of total gross domestic product (US$13.608 trillion in 2018, https://www.worldbank.org) within just four decades after its “reform and opening-up” policy. It is now often regarded as “the world factory.”


also hold for the developing world, adding to a growing body of literature on agglomeration

economies in developing countries.6

Moreover, a unique feature of the Chinese economy, the large presence of SOEs, allows

for investigating the role of SOEs in shaping the spatial manifestation of agglomeration

economies. The presence of SOEs remains controversial. They are usually stamped as be-

ing inefficient because of a lack of incentives and information (Megginson and Netter, 2001;

Djankov and Murrell, 2002; Huang et al., 2017; Estrin et al., 2009). However, because SOEs

are centrally controlled, they are capable of internalizing externalities to enhance efficiency.

Both aspects bear important implications on the agglomeration mechanisms and related in-

dustrial policies in countries with a large state sector. By documenting and comparing the

level and attenuation speed of agglomeration economies within and between the state and

private sectors, we reveal the nature of ownership-specific agglomeration economies that, to

our knowledge, has not been studied in the literature thus far.7

In theory, the benefits of agglomeration can be revealed by the relationship between ag-

glomeration and a variety of measures, including the location choice of new firms, total factor

productivity (TFP), total output per worker, and wages per worker. If agglomeration en-

hances productivity, it should manifest in higher TFP and total output per worker. As the

increased productivity raises firms’ willingness to pay for factor prices in a spatial general

equilibrium setting, wages per worker become higher (Rosen, 1979; Roback, 1982). More-

over, if the productivity gains outweigh the cost of agglomeration, new firms would prefer

to locate in places with higher density (Rosenthal and Strange, 2003; Arzaghi and Hender-

son, 2008). In this paper, we provide a comprehensive analysis of all the above-discussed

aspects of agglomeration benefits. We focus on the location choice of new firms in the main

empirical analysis but also consider firm TFP and other productivity correlates in a set of

complementary analyses.8

6 Recent studies include Combes et al. (2015), Duranton (2016), and Chauvin et al. (2017).

7 Ge (2009) and Lu and Tao (2009) show that the agglomeration index is lower for industries with higher shares of SOE employment, but their analysis is at the industry level and does not account for detailed geographic variations to reveal the nature of spatial interactions between and within the private and state sectors.

8 We measure firm birth as the probability of birth in each 2 km by 2 km grid in mainland China. The advantages of focusing on firm birth mainly reside in that new establishments are unconstrained by previous decisions, such as the level of capital investments and output. While decisions on wage and output can be


To motivate our empirical specification of firm location choices and to diagnose potential

identification challenges, we present a simple conceptual framework based on Rosenthal and

Strange (2003) and Arzaghi and Henderson (2008). We specify a production function on

account of productivity spillovers that attenuate with geographic distance, similar in nature

to Lucas and Rossi-Hansberg (2002). We then specify the firm profit function as revenue,

subject to unobserved location and firm heterogeneity, minus costs. Costs include those for

labour and land capital and other location-specific fixed costs. In this way, one can clearly

see that the probability of firm birth is affected by the presence of existing firms at various

distances and that the identification of the impact of the agglomeration measures could be

compromised by its possible correlation with location-specific amenities and various cost

factors.

Guided by the conceptual framework, we conduct our empirical analysis in three steps.

First, we estimate the scale of distance- and industry-specific agglomeration economies to

explain the probability of firm births. Second, we estimate the attenuation speed of agglom-

eration economies by fitting a set of parametric functions to the first-step estimates. Last,

we explore the goodness of fit of various spatial decay functional forms, evaluate the micro-

foundations that govern the decay patterns across industry types, and reveal the role of SOEs

in shaping the spatial manifestation of agglomeration economies. Our key independent vari-

ables of interest in the first-step regression are measures of employment in the same industry

at various distances, conventionally adopted as the proxies for localization economies.9 To

alleviate concerns of omitted variable bias, we control for a wide range of related factors, in-

cluding proxies for urbanization economies (total manufacturing employment excluding own

industry within the same set of concentric rings), Herfindahl indexes that represent the local

industrial organization and industrial specialization, and city fixed effects.10
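The concentric ring measures used as key regressors can be illustrated with a minimal Python sketch. It assumes planar coordinates in kilometers and hypothetical ring edges of 0, 1, 5, and 10 km (only the first two rings are named in the text; the 10 km edge, the function name `ring_employment`, and the toy firm records are assumptions for illustration). It does not reproduce the grid-based allocation used in the paper.

```python
import math

# Hypothetical ring edges in km; only "within 1 km" and "1 to 5 km" are
# named in the text, the 10 km edge is an assumption of this sketch.
RING_EDGES_KM = [0.0, 1.0, 5.0, 10.0]


def ring_employment(origin, firms, industry, edges=RING_EDGES_KM):
    """Sum own-industry employment in each concentric ring around `origin`.

    origin : (x_km, y_km) centroid of the location of interest
    firms  : iterable of (x_km, y_km, industry_code, employment)
    returns: one employment total per ring [edges[k], edges[k+1])
    """
    totals = [0.0] * (len(edges) - 1)
    ox, oy = origin
    for x, y, ind, emp in firms:
        if ind != industry:
            continue  # other industries would enter the urbanization control
        dist = math.hypot(x - ox, y - oy)
        for k in range(len(totals)):
            if edges[k] <= dist < edges[k + 1]:
                totals[k] += emp
                break
    return totals


# Toy data, made up for illustration only.
firms = [
    (0.5, 0.0, "A", 100),  # 0.5 km away, falls in the 0-1 km ring
    (3.0, 4.0, "A", 50),   # 5.0 km away, falls in the 5-10 km ring
    (2.0, 0.0, "A", 30),   # 2.0 km away, falls in the 1-5 km ring
    (0.2, 0.1, "B", 999),  # different industry, ignored here
]
rings = ring_employment((0.0, 0.0), firms, "A")
print(rings)
```

Each element of `rings` would then serve as one localization regressor for the location at `origin`; the same routine applied to all other industries would give the urbanization counterpart.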

constrained by previous choices, such as previous investments in capital, firms’ location decisions usually take the existing economic environment as exogenously given (Rosenthal and Strange, 2003).

9 Specifically, we create several concentric ring variables that measure employment in the same industry at various distances from a given location (that is, within 1 kilometer, between 1 and 5 kilometers, and so on) and regress firm birth probability at the centroid location on those concentric ring measures.

10 The urbanization measures capture the positive externalities associated with the aggregate urban scale, as well as the negative impacts of congestion and pollution. The Herfindahl indexes control for local competitiveness and diversity of economic activities. Both variables play an important role in explaining the probability of new firm birth as emphasized in Glaeser et al. (1992) and Henderson et al. (1995). The city fixed effects help to absorb a wide range of city-specific amenities and disamenities that might influence the probability of


Despite various controls, the empirical identification of agglomeration economies is still

challenging. However, we argue that the identification for the attenuation speed of agglom-

eration economies is less vulnerable than the identification for the level of agglomeration

economies. The reason is that the potential bias in the level estimates is unlikely to be

systematically correlated with our distance measures in the second-step regression, which is

similar to the argument in Rosenthal and Strange (2008). For example, the conceptual model

suggests that one important factor in new firms’ location choices is the location-specific costs

of labour and land capital. To the extent that the cost factors cannot be fully controlled

for by observed location-specific attributes, the estimated localization economies in level

terms suffer from a downward bias.11 Nonetheless, since we control for various industry-

and location-specific attributes, the remaining bias caused by the cost factors is unlikely to

be systematically correlated with our distance measures. Thus, the estimates of attenuation

slope in the second-step regression are not necessarily biased.12

While we strongly believe in our baseline estimates, we adopt an instrumental variable

(IV) approach to corroborate our findings. We instrument for the contemporaneous industry-

specific employment concentration with a flexible function of historical attributes of the same

industry and exogenous grid-specific geological features, including the average terrain eleva-

tion and slope. We argue that because of accumulative effects, the historical measures remain

correlated with our key localization measures but not directly correlated with contempora-

neous unobserved cost factors (Ciccone and Hall, 1996; Combes et al., 2010). Similarly, the

underlying geological features are correlated with the likelihood of developing economic con-

centrations over time but not correlated with unobserved cost factors (Rosenthal and Strange,

2008; Glaeser and Kerr, 2009).13 We specify a nearly saturated functional form for the first-

firm birth, such as local fiscal policies and city-wide place-based policies.

11 This potential omitted cost variable bias is in line with Arzaghi and Henderson (2008). They argued for the need to control for location-specific unit rental cost in explaining the expected profit of advertising firms. Similarly, the possible presence of unobserved productivity amenities that affect both firm births and stocks causes ordinary least squares estimates to be upward biased.

12 Note that the magnitude of the bias in levels caused by the presence of unobserved cost factors, as an example, is determined by the correlation between the ring-specific cost factor and employment concentrations, as well as the ratio of standard deviations of both.

13 The identification strategy may not completely address concerns on the presence of unobserved natural advantages in estimating the level coefficient. However, it should not compromise our identification of the attenuation slope in the second step because we explicitly control for proxies of natural advantages in our second-step industry-level regression. This point will be clear later in this paper.


stage regression to model the instruments in a flexible way. As we explain later in detail,

this effort leads to a set of 725 instruments in total. We adopt

the machine learning Lasso framework to select the most relevant instruments to improve our

first-stage prediction while addressing potential issues associated with weak instruments.
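The idea of Lasso-based instrument selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator used in the paper: it solves the standard objective (1/2n)·||y − Xb||² + λ·||b||₁ by coordinate descent, where `y` stands for the endogenous localization measure and the columns of `X` stand for candidate instruments. The function names, the penalty value, and the toy data are all hypothetical.

```python
def soft_threshold(value, lam):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes column j.
            rho = 0.0
            for i in range(n):
                fitted_wo_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fitted_wo_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z
    return b


# Toy first stage with three mutually orthogonal candidate instruments.
# y depends strongly on column 0 and only weakly on column 1 (made-up data).
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.1, 1.9, -1.9, -2.1]

coef = lasso_cd(X, y, lam=0.5)
selected = [j for j, bj in enumerate(coef) if bj != 0.0]
print(coef, selected)
```

In this toy case the penalty shrinks the weak and irrelevant candidates exactly to zero and keeps only the strong one, which is the sense in which the Lasso step picks the most relevant instruments out of a large candidate set.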

Our empirical analysis draws on several unique administrative firm-level data sets. The

main data set is the China Economic Census (CEC), which provides the exact address of

all Chinese manufacturing firms and other employment information.14 One challenge with

respect to the measurement of nearby economic activities in this literature is the lack of

geographically refined data sets. For that reason, almost all studies on this topic rely on

aggregated data and implicitly presume a certain distribution of firm locations within the geographic unit at which the data are available. This approach imposes constraints on

identifying the spatial decay.15 In contrast, our data are well-suited to address this issue

because we have access to the exact address of each firm establishment, which allows for a

more accurate and flexible way to quantify distance-specific agglomeration measures.16

We obtain the following findings. First, we find that agglomeration economies attenuate

with geographic distance. The initial attenuation is rapid, with the effect of own-industry em-

ployment in the first kilometer significantly larger than the effect of employment further away.

The estimated attenuation speed varies dramatically across industry types. The patterns are

largely consistent when we use the IV approach. The variation in the estimated industry-

specific attenuation speed is positively correlated with variations in proxies for knowledge

spillovers and labour market pooling but negatively correlated with variation in a proxy for

input sharing and the share of state firms. The correlations speak directly to the under-

lying micro-foundations of agglomeration economies. Moreover, among the various spatial

14 The unit of observation in the CEC is the legal unit, which we will define formally in Section 4. Most of the legal units only consist of one establishment. In 2008, the percentage of the single-establishment legal unit was 93.88. Thus, we treat a legal unit in our data as an establishment. In addition, we show the robustness of our results using only the single-establishment legal units.

15 For example, if the size of the geographic unit is large, the prior assumption on the spatial distribution within the unit would strip away any variation within a more refined distance range and would render the spatial decay pattern unidentified at a more refined distance. This might be especially concerning considering that the spatial decay of agglomeration economies could be rather fast for certain industries that rely heavily on knowledge spillovers.

16 In practice, we measure firm distances by allocating firms to each 2 km by 2 km grid and measure the distance for each grid pair. We elaborate on our measurement procedure and the reasons that we follow this procedure in a later section.


attenuation functions that we experimented with, the inverse square distance decay function

presents the best goodness of fit. This provides empirical guidance on the functional form in

modeling agglomeration economies.
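A comparison of decay functional forms of this kind can be illustrated with a short sketch. It assumes that ring-level coefficients are positive, so that a power-law decay β(d) = a·d^(−µ) and an exponential decay β(d) = a·exp(−µd) both become linear after taking logs and can be compared by R². The distances and coefficients below are synthetic (generated from 2/d²) and are not estimates from the paper; the helper `ols_line` is a hypothetical name.

```python
import math


def ols_line(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic "first-step" coefficients at assumed ring mid-distances (km).
dist = [0.5, 3.0, 7.5, 15.0]
beta = [2.0 / d ** 2 for d in dist]
log_beta = [math.log(b) for b in beta]

# Power-law decay:   log beta = log a - mu * log d
_, slope_pow, r2_pow = ols_line([math.log(d) for d in dist], log_beta)
# Exponential decay: log beta = log a - mu * d
_, slope_exp, r2_exp = ols_line(dist, log_beta)

mu_power = -slope_pow  # a value of 2 corresponds to inverse square decay
mu_exp = -slope_exp
print(round(mu_power, 6), round(r2_pow, 6), round(r2_exp, 6))
```

Because the synthetic data follow an inverse square law, the power-law fit recovers an exponent of 2 with a near-perfect fit, while the exponential form fits worse. With real first-step estimates, the same comparison would be run industry by industry.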

Second, the agglomeration benefits and the associated attenuation patterns show strong

heterogeneity across ownership types. Both state and private firms benefit more from the

concentrations of own-type employment. Agglomeration spillovers across ownership types are

smaller in terms of magnitude and geographic scope, compared to within ownership types.

Relative to the average effects documented using the pooled data, existing private firms’

impact on private firms’ entry is both larger in magnitude and more significant. This pattern

suggests stronger agglomeration effects within the private sector. The estimated within-

SOE agglomeration effects are relatively weaker and are associated with a slower spatial

decay speed, compared to the effect within the private sector. Evidence suggests that the

agglomeration spillovers among SOEs could be more on input sharing and less on knowledge

spillovers. Similar patterns are also found when the agglomeration gains are measured in

TFP, output per worker, and wages per worker.

The findings in this paper contribute to the broad literature on the empirics of agglomera-

tion economies. Economists have long recognized the importance of agglomeration economies

that contribute to pronounced geographic clusters (Marshall, 1920; Krugman, 1991; Hender-

son et al., 1995; Henderson, 2003; Greenstone et al., 2010; Combes et al., 2012; Gaubert, 2018).

The exact magnitude of agglomeration economies, however, varies with the type of workers

and industries and with the period and country (Rosenthal and Strange, 2004; Combes and

Gobillon, 2015; Rosenthal and Strange, 2019). These differences arise because of variations

in the extent of the reliance on different fundamental sources of agglomeration economies.

They also arise because of the evolving nature of agglomeration forces at different stages of

economic development or unique institutional features of different countries and economies.

We extend the breadth of the literature by studying the nature of agglomeration economies

in China and by accounting for the role of its large presence of SOEs.

Our findings also add depth to the literature by systematically documenting the relation-

ship between the micro-foundations of agglomeration economies and the attenuation slope. It


is important to understand the source of agglomeration economies because different sources

have different policy implications (Holmes, 1999; Costa and Kahn, 2000; Duranton and Puga,

2004; Ellison et al., 2010; Liu, 2015; de la Roca and Puga, 2017; Diodato et al., 2018; Moretti,

2019; Davis and Dingel, 2019). Because different agglomeration forces should operate at different geographic scales, economists have long speculated about a relationship between the nature

of agglomeration economies and the associated attenuation patterns (Rosenthal and Strange,

2003, 2008; Arzaghi and Henderson, 2008). Yet, no systematic evidence currently exists in

the literature to statistically establish such a relationship. We achieve that goal here and,

hence, support the notion that the evidence on attenuation of agglomeration economies is

relevant to determining their nature.

The rest of this paper is organized as follows. In Section 2, we discuss the conceptual

framework that explains agglomeration economies and firms’ location choices. Section 3

presents the empirical framework and addresses identification challenges. We discuss the

data and variables in Section 4. We present the baseline results in Section 5. Section 6

focuses on the probability of birth, TFP, and other correlates. We present a more focused

discussion on SOEs versus non-SOEs in Section 7. We conclude with Section 8.

2. Agglomeration Economies and Location Choices

In this section, we first present a simple model of firms’ location choices in a standard

market economy on account of agglomeration economies that attenuate with geographic dis-

tance. The model setup helps to guide our empirical specification and interpretation and to

highlight potential identification pitfalls. We then discuss additional factors to consider with

the presence of SOEs in the Chinese economy.

2.1. Standard Market Economy

What determines firms’ location choices that give rise to the observed spatial variation

in the intensity of economic activities is one of the most fundamental questions in urban

economics. Firms could seek locations with natural productivity and/or cost advantages.

Firms could also seek locations with existing concentrations of similar firms driven by the


three Marshallian forces: knowledge spillovers, labour pooling, and input sharing (Marshall,

1920). Those forces contribute to the variation in firms’ geographic distribution in addition

to randomness and other organizational factors (Ellison et al., 2010; Faggio et al., 2017).

The forces of agglomeration externalities attenuate with geographic distance. The speed of

attenuation differs as a result of industrial variation in the importance of different Marshallian

forces. Knowledge spillovers, for example, often require face-to-face contact within close

proximity. Labour pooling, which results from the ability to share a similar labour market

pool, takes place at a relatively larger distance. Input-sharing may operate at an even larger

geographic scope because the cost of transporting goods is kept low by the development

of transportation technology and infrastructure (Rosenthal and Strange, 2004; Combes and

Gobillon, 2015). Since the extent to which industries rely on each of the three Marshallian

forces is often different, the attenuation of agglomeration economies driven by these forces

would also be different across industry types.17

We specify a firm’s production technology by taking into consideration the industry-

specific geographic nature of agglomeration economies. The production technology is described by a decreasing-returns-to-scale production function, with labour and land as inputs, and an

external effect that relates the productivity at a location to the density of economic activities

at other locations weighted by a spatial decay function. This external effect represents the ag-

glomeration forces mentioned earlier and was first introduced by Fujita and Ogawa (1982) as

the “location potential.” Lucas (2001), Berliant et al. (2002), and Lucas and Rossi-Hansberg

(2002) model production technology in similar ways to study the structure of cities.

The production function is written as follows:

$$Y_{i,s,c} = \gamma_s \left[\iint f[\mu_s, d(c,c')]\, L_{j,s,c'}\, dj\, dc'\right] A_{s,c}\, L_{i,s,c}^{\alpha_s}\, K_{i,s,c}^{\beta_s}, \tag{2.1}$$

where Yi,s,c is the output of firm i in industry s located at c, Li,s,c and Ki,s,c are units of

labour and land used by the firm, αs ∈ (0, 1), βs ∈ (0, 1), and αs+βs ∈ (0, 1). As,c represents

amenities that affect the productivity of firms in industry s at location c. The amenities con-

17 Note that this statement does not preclude the possibility that close proximity is also required by the networking element to search for jobs and to create close interactions between buyers and sellers in input sharing.


sidered here include observed characteristics of both own and surrounding locations, as well as

local attributes that are difficult to measure such as natural advantages, government policies,

and workforce qualities. The external Marshallian effect concerns localization externalities

arising from concentrations of own industry employment within and around a location and is

governed by two components. The first component is ∫∫ f[µs, d(c, c′)] Lj,s,c′ dj dc′, the sum

of employment of all other firms in the same four-digit industry in all locations weighted by a

spatial decay function f [µs, d(c, c′)]. This decay function decreases with the spatial distance

between c and c′, d(c, c′). µs is a parameter capturing the speed of the spatial decay.18 The

second component is γs, which is the industry-specific scale of agglomeration economies, that is, the extent to which the weighted own industry employment across all locations impacts

the firm output.
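The first component has a simple discrete analogue: a decay-weighted sum of own-industry employment over locations. The sketch below assumes an inverse-power decay with a 1 km distance floor (the floor, the function names, and the toy cells are assumptions of this illustration; the model itself leaves f[µs, d(c, c′)] general, and the sketch does not exclude the firm's own employment).

```python
import math


def inverse_power_decay(mu, dist_km, floor_km=1.0):
    """Hypothetical decay f[mu, d] = 1 / max(d, floor)**mu; mu = 2 is inverse square.

    The floor avoids division by zero at d = 0 and is an assumption of this sketch.
    """
    return 1.0 / max(dist_km, floor_km) ** mu


def location_potential(c, cells, mu, decay=inverse_power_decay):
    """Discrete analogue of the double integral: sum over c' of f[mu, d(c, c')] * L_{s,c'}.

    c     : (x_km, y_km) of the location being evaluated
    cells : list of ((x_km, y_km), own_industry_employment)
    """
    total = 0.0
    for (x, y), emp in cells:
        d = math.hypot(x - c[0], y - c[1])
        total += decay(mu, d) * emp
    return total


# Three cells with equal employment at 0, 2, and 4 km (made-up numbers).
cells = [((0.0, 0.0), 100.0), ((2.0, 0.0), 100.0), ((0.0, 4.0), 100.0)]
potential = location_potential((0.0, 0.0), cells, mu=2.0)
print(potential)
```

Multiplying this weighted sum by γs gives the external effect in the production function; a larger µ makes distant employment count for less.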

Given the production technology, we formulate firms’ location choices as determined by

the potential profit to be achieved at different locations. The profit of a representative firm

i at location c is the following:

$$\pi_{i,s,c} = p_{c,s} Y_{i,s,c} - w_c L_{i,s,c} - r_c K_{i,s,c} - F_{s,c}, \tag{2.2}$$

where pc,s is the price of the firm output, Fs,c is an industry and location specific fixed

cost, and wc and rc are the location-specific wage rate and rental cost, respectively. Firms

choose input quantities to maximize profits given wc and rc. In equilibrium, benefits from

agglomeration are capitalized into wages and rents (Glaeser and Mare, 2001; Rosenthal and

Strange, 2008; Arzaghi and Henderson, 2008), either through increased technological pro-

ductivity spillovers or “pecuniary externalities” (Combes and Gobillon, 2015). As we focus

on the determinant of new firms’ potential profits, we treat the output price, wages, and

land rents as given in our model. Potential endogeneity concerns with regards to empirical

estimation will be discussed in detail in Section 3.

Following Rosenthal and Strange (2003), we assume firms are heterogeneous in their

18 The geographic scope and attenuation speed of such externalities are different across industries and controlled by the sector-specific decay function f[µs, d(c, c′)]. In the empirical estimation, we use several functional forms to capture the spatial decay of agglomeration spillovers.


potential profitability. Hence, a specific firm’s profit function is expressed as

$$\pi_{i,s,c} = p_{c,s} Y_{i,s,c}(1 + \varepsilon_{i,s,c}) - w_c L_{i,s,c} - r_c K_{i,s,c} - F_{s,c}, \tag{2.3}$$

where εi,s,c is the firm’s idiosyncratic productivity shock and is independent and identically

distributed across firms according to a cumulative distribution function Φs,c(ε).

Upon learning its own productivity shock, a firm will enter the market if its maximized

potential profit is non-negative. That is,

$$\pi^*_{i,s,c} = \max_{\{L,K\}} \pi_{i,s,c} = p_{c,s} Y^*_{i,s,c}(1 + \varepsilon_{i,s,c}) - w_c L^*_{i,s,c} - r_c K^*_{i,s,c} - F_{s,c} \ge 0, \tag{2.4}$$

where L∗i,s,c, K∗i,s,c and Y ∗i,s,c are labour, land capital and the level of output chosen at the

profit maximizing level, respectively. More specifically, we have

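$$L^*_{i,s,c} = \left(\frac{w_c^{\beta_s - 1}}{r_c^{\beta_s}}\right)^{\frac{1}{\delta_s}} \left[(1 + \varepsilon_{i,s,c})\, p_{c,s}\, \gamma_s \left[\iint f[\mu_s, d(c,c')]\, L_{j,s,c'}\, dj\, dc'\right] A_{s,c}\, \alpha_s^{1-\beta_s} \beta_s^{\beta_s}\right]^{\frac{1}{\delta_s}} \tag{2.5}$$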

and

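$$K^*_{i,s,c} = \left(\frac{r_c^{\alpha_s - 1}}{w_c^{\alpha_s}}\right)^{\frac{1}{\delta_s}} \left[(1 + \varepsilon_{i,s,c})\, p_{c,s}\, \gamma_s \left[\iint f[\mu_s, d(c,c')]\, L_{j,s,c'}\, dj\, dc'\right] A_{s,c}\, \alpha_s^{\alpha_s} \beta_s^{1-\alpha_s}\right]^{\frac{1}{\delta_s}}, \tag{2.6}$$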

where δs = 1 − αs − βs. For any industry s at location c there is a cut-off ε∗s,c such that

π∗i,s,c ≥ 0 if and only if εi,s,c ≥ ε∗s,c. Using equations (2.3), (2.5), and (2.6), we obtain the

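following expression:

$$\varepsilon^*_{s,c} = \left(\frac{F_{s,c}}{\delta_s}\right)^{\delta_s} \left(\frac{w_c}{\alpha_s}\right)^{\alpha_s} \left(\frac{r_c}{\beta_s}\right)^{\beta_s} \left(p_{s,c}\, \gamma_s \left[\iint f[\mu_s, d(c, c')]\, L_{j,s,c'}\, dj\, dc'\right] A_{s,c}\right)^{-1} - 1.$$

Therefore, the probability that a firm in industry s enters the market at location c is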

1 − Φs,c(ε∗s,c), which is positively determined by the external effect and amenities, but is

negatively affected by wages, rental costs, and other fixed costs.
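
The entry rule can be checked numerically. The sketch below is illustrative only: it assumes the Cobb-Douglas technology implied by (2.5) and (2.6), with output equal to G times labour to the power alpha times capital to the power beta, where G is a hypothetical shorthand for the product of the scale parameter, the distance-weighted employment term, and the amenity term. All parameter values are made up.

```python
def optimal_inputs(eps, p, G, w, r, alpha, beta):
    """Labour and capital demands from (2.5) and (2.6).

    G is a hypothetical shorthand for
    gamma_s * [distance-weighted employment] * A_{s,c}.
    """
    delta = 1.0 - alpha - beta
    scale = (1.0 + eps) * p * G
    labour = ((w ** (beta - 1.0) / r ** beta)
              * scale * alpha ** (1.0 - beta) * beta ** beta) ** (1.0 / delta)
    capital = ((r ** (alpha - 1.0) / w ** alpha)
               * scale * alpha ** alpha * beta ** (1.0 - alpha)) ** (1.0 / delta)
    return labour, capital


def max_profit(eps, p, G, w, r, F, alpha, beta):
    """Maximized profit as in (2.4), assuming Y = G * L**alpha * K**beta."""
    labour, capital = optimal_inputs(eps, p, G, w, r, alpha, beta)
    output = G * labour ** alpha * capital ** beta
    return p * output * (1.0 + eps) - w * labour - r * capital - F


def entry_cutoff(p, G, w, r, F, alpha, beta):
    """Cut-off shock: profit is non-negative iff eps is at least this value."""
    delta = 1.0 - alpha - beta
    return ((F / delta) ** delta * (w / alpha) ** alpha
            * (r / beta) ** beta / (p * G)) - 1.0
```

Evaluating `max_profit` at the value returned by `entry_cutoff` should give a profit of zero up to rounding error. A larger G lowers the cut-off, while higher wages, rents, or fixed costs raise it, in line with the comparative statics stated above.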

2.2. The Role of the State Sector

A large presence of SOEs is a unique feature of the Chinese economy. SOEs are

legal entities that undertake commercial activities on behalf of the government. Recent

estimates show that SOEs contributed 39 percent of China’s gross domestic product in 2015

(Holz, 2018). Despite being government-owned, SOEs have priorities similar to those of private firms

in terms of delivering growth and tax revenue. This is especially true given that SOEs’

performance is closely tied to the promotion prospects of the supervising government officials


(Cao et al., 2019). Thus, the managerial decisions of SOEs are, at least in part, based on

profit maximizing incentives.

Nevertheless, the ownership nature and unique institutional features of SOEs entail addi-

tional factors coming into play in SOEs’ decision making and in the interactions within and

between SOEs and private firms. As a result, agglomeration channels in an economy with

a large presence of SOEs can be different from channels in a pure market economy. Entry

locations of private firms and SOEs could be affected differently by the presence of existing

SOEs and non-SOEs, and SOEs may have different incentives in response to agglomeration

externalities compared to private firms.

We first discuss how the entry decisions of private firms may be different depending on

existing concentrations of private and state firms. For private firms, the location decisions are

solely market-driven and exploiting local agglomeration economies from existing businesses is

a key factor to consider. The presence of nearby private firms attracts the entry of new private

firms by generating agglomeration externalities through the standard Marshallian forces. The

impact of nearby SOEs, however, is subject to the influence of unique institutional features

of SOEs that give rise to different interactions between the state and private sectors.

We highlight three important and related institutional features that contribute to dif-

ferent interactions between the state and private firms. The first relates to the government

favouritism toward and the ineffective management of SOEs. It is widely recognized that

SOEs are less efficient than private firms due to the lack of incentives and information (Hsieh

and Song, 2015; Huang et al., 2017). SOEs also face a different economic environment com-

pared with their private counterparts, because they have preferential access to loans and are

often protected by regulations that drive out competition from private companies. These

features imply that SOEs would not be as proactive as private firms to interact with other

firms and to contribute to local agglomeration economies. Therefore, new private firms would

be less attracted to concentrations of SOEs than to concentrations of private firms.

The second is through job hopping. In China, job hopping from SOEs to non-SOEs

is very limited because SOE jobs are more secure and offer higher pay (Meng, 2012; Ge

and Yang, 2014). Infrequent job turnovers between ownership sectors lead to a muted labour


pooling mechanism that would otherwise mutually enhance the productivity of both the state

and private sectors. The ineffective labour pooling mechanism between sectors decreases the

tendency of private firms to locate near SOEs relative to near private firms.

The third feature pertains to sharing inputs and outputs. Private firms may benefit more

from sharing inputs and outputs with their SOE counterparts than with other private firms

because SOEs are often sizable and stable business partners. Given their large capacity and

public ownership, SOEs may also have incentives to forfeit certain profit margins to benefit

local private firms in the partnership. This factor would increase private firms’ tendency to

locate near SOEs relative to private firms.

Thus, whether private firms are more or less likely to locate near SOEs is an empirical

question. However, since different mechanisms imply different attenuation patterns of ag-

glomeration economies, by documenting detailed attenuation patterns, we reveal the specific

mechanisms governing the nature of interactions within and between ownership sectors.

We next consider SOEs’ location decisions in response to the presence of existing state

versus private firms. Despite the lack of efficiency, in pursuit of profits, social planners still

locate SOEs based on market incentives, including opportunities to exploit agglomeration

economies. Given that private firms are, on average, more efficient and produce a higher

extent of spillover benefits, new SOEs may be more drawn to existing private firms than to

other SOEs. This is especially plausible considering that the majority of innovation activities

take place in the private sector and that the patents of private firms, as proxies of innovation,

are of higher quality, as evidenced by their being cited more often and having a greater

international presence (Fang et al., 2017). Hence, to benefit from innovation spillovers of the

private sector, state firms need to be close to innovative private firms.

From another perspective, the central management of SOEs allows for the opportunity of

designing the spatial distribution of the state sector to maximize aggregate profits.19 SOEs

would then be more likely to locate closer to existing SOEs to internalize positive externalities.

This is especially true with the ownership status of SOEs widely decentralized during the SOE

19The social planner may take externalities into consideration and intentionally design clusters of SOEs when making development plans. The industrial parks policy is one example in which the government is intentionally forming clusters of firms for possible externality internalization.


reform – local governments possess better information and are more effective in integrating

SOE management with the local economy (Huang et al., 2017).20 Thus, whether SOEs are

more attracted to the presence of state or private firms is also theoretically ambiguous.

All in all, while the agglomeration forces and associated attenuation patterns in China are

driven by similar market mechanisms as in developed countries, variations exist because of

China’s unique institutional features with the presence of a large state sector. By documenting

the different decay patterns of agglomeration economies within and between the state and non-state

sectors, we gain insight into the underlying micro-foundations that govern the spillovers

taking place within and across ownership types. These features help to shed light on the

underlying economic forces that drive rapid urban growth in China.

3. Empirical Framework

3.1. Estimation Procedure

In this section, we lay out the empirical framework to identify the spatial decay of ag-

glomeration economies in explaining firms’ location choices. As described in Section 2.1, the

aggregate agglomeration effect is captured by $\gamma_s\left[\iint f[\mu_s, d(c, c')]\, L_{j,s,c'}\, dj\, dc'\right]$, where the

distance-weighted nearby economic activities are scaled by a factor of γs to constitute the

aggregate agglomeration effect. The related spatial attenuation of agglomeration economies

is captured by the function f [µs, d(c, c′)], which is embedded in the aggregate agglomeration

term. It is not straightforward to disentangle the attenuation speed from the scale effect in

levels, as the speed of the spatial attenuation, µs, is inherently linked to the scale parameter,

γs. In this paper, we adopt a two-step approach that relies on the specific distance-weighting

functional form to help identify µs.

In the first step, we estimate the scale of distance- and industry-specific agglomeration

economies in explaining the probability of firm birth at a location. In particular, we focus

on localization economies while controlling for urbanization economies, where both are spec-

ified in concentric ring variables as conventionally adopted in the literature (Rosenthal and

20The decentralization of SOEs means that the upper-level government delegates the right of control of SOEs to lower-level governments.


Strange, 2004). In addition, we control for proxies for industrial organization and industrial

diversity as emphasized in Glaeser et al. (1992) and Henderson et al. (1995). The estimation

equation in the first step for each two-digit industry s is, hence, specified as follows (s is

suppressed for simplicity).

$$\text{BirthRate}_{c,p,i} = \sum_r \alpha_r \text{LOC}_{r,i} + \sum_r \beta_r \text{URB}_{r,i} + \text{IO}_{c,i} + \text{DIV}_c + \mu_p + \pi_i + \varepsilon_{c,p,i}. \tag{3.1}$$

In the equation, BirthRatec,p,i is defined as the percentage of new firms at location

c (locations are defined as 2km by 2km grids) and prefecture city p out of all new firms

for each four-digit industry i. LOCr,i stands for localization economies, calculated as the

sum of within four-digit industry employment in industry i in concentric ring r. URBr,i

represents urbanization economies, calculated as the sum of all manufacturing employment

excluding industry i in concentric ring r. To capture localization and urbanization economies

at different distances, we specify four concentric ring variables for each location, in addition

to the own-location measures: own-location boundary to 5 km radius concentric ring, 5-10 km

concentric ring, 10-20 km concentric ring, and 20-30 km concentric ring. µp and πi represent

the prefecture city fixed effects and four-digit industry fixed effects.
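
As a rough illustration of how the ring variables in (3.1) could be assembled, the sketch below sums own-industry employment (LOC) and all other manufacturing employment (URB) by ring around a target grid. It is not the authors' code. The planar kilometre coordinates, the half-open ring boundaries, and the names `RINGS` and `ring_measures` are assumptions made for the example.

```python
import math

# Ring edges in km: the own-location ring plus the four outer rings.
RINGS = [("0-1", 0.0, 1.0), ("1-5", 1.0, 5.0), ("5-10", 5.0, 10.0),
         ("10-20", 10.0, 20.0), ("20-30", 20.0, 30.0)]


def ring_measures(target_xy, grids, industry):
    """Return (LOC, URB) dicts of employment sums by ring around target_xy.

    grids: list of (x_km, y_km, {industry_code: employment}) tuples, one per
    grid centroid. Planar kilometre coordinates are an assumption.
    """
    loc = {name: 0.0 for name, _, _ in RINGS}
    urb = {name: 0.0 for name, _, _ in RINGS}
    for x, y, employment in grids:
        dist = math.hypot(x - target_xy[0], y - target_xy[1])
        for name, lower, upper in RINGS:
            if lower <= dist < upper:  # half-open rings: an assumed convention
                own = employment.get(industry, 0.0)
                loc[name] += own  # same four-digit industry
                urb[name] += sum(employment.values()) - own  # all other industries
                break
    return loc, urb
```

Grids farther than 30 km from the target fall outside every ring and are ignored, which matches the outermost ring used in the text.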

In addition, we control for proxies for local industry organization and industry diversity.

Organizational features are captured by $\text{IO}_{c,i}$, which is the Herfindahl index for each four-digit

industry within 30 km of the central grid c.21 It is defined as $\sum_j (L_{j,i,c}/L_{i,c})^2$, where

$L_{j,i,c}$ is the employment level of firm j in four-digit industry i at the region within 30 km

of c, and $L_{i,c}$ is the total employment level of industry i at the region. We incorporate the

diversity of economic activity using a Herfindahl index of specialization, $\text{DIV}_c$. It is defined

as $\sum_i (L_{i,c}/L_c)^2$, where $L_{i,c}/L_c$ is industry i’s share of total employment within 30km of the

center of location c. By estimating equation (3.1) for each two-digit industry s, we obtain

distance- and industry-specific estimates of the localization impact on the probability of firm

birth, αr,s.
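
Both control variables are sums of squared employment shares, so a single helper covers them. The function name `herfindahl` and the sample numbers below are illustrative assumptions, not taken from the paper.

```python
def herfindahl(levels):
    """Sum of squared shares for a collection of non-negative employment counts."""
    levels = list(levels)
    total = sum(levels)
    if total <= 0:
        raise ValueError("total employment must be positive")
    return sum((x / total) ** 2 for x in levels)


# IO: firm-level employment of one four-digit industry within 30 km of a grid.
io_example = herfindahl([120, 40, 40])

# DIV: industry-level employment totals within 30 km of the same grid.
div_example = herfindahl({"1310": 200, "1711": 500, "3911": 300}.values())
```

A value close to one indicates that employment is dominated by a single firm (IO) or a single industry (DIV), while values near zero indicate many small firms or a diversified local economy.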

In the second step, we fit the first-step estimates of distance- and industry-specific agglom-

eration economies into various parametric decay functional forms to disentangle the speed of

21We also experimented with other distance radii, such as 1 km, 5 km, 10 km, and 20 km when defining this control variable and our baseline results are very robust to this variation.


attenuation from the scale effect. Specifically, based on the ring definitions, we assign dis-

tance values, d, to the corresponding rings, r, and adopt several distance-weighting functional

forms to explain the variation in the first-step estimates of ring-specific (distance-specific) lo-

calization economies for each industry, which we now label as αr(d),s or equivalently αd,s. This

second-step procedure achieves two goals. First, we obtain an estimate of the speed of attenu-

ation, µs, for each industry. Second, we experiment with different types of distance-weighting

functional forms and test which functional form provides the best goodness of fit.

The estimation equation in the second step is as follows:

$$\alpha_{d,s} = \mu_s f(d) + \gamma_s + \varepsilon_{d,s}, \tag{3.2}$$

where f(d) is a specific distance-weighting functional form, µs is the parameter capturing the

industry-specific decay speed, γs represents industry fixed effects to soak up the industry-

specific scale effect and other unobserved characteristics, and εd,s is the error term. We

experiment with nine functional forms of $f(d)$. These functions are (1) a negative linear

distance function $f(d) = -d$, (2) an inverse linear distance function $f(d) = 1/d$, (3) an inverse

exponential distance function $f(d) = 1/e^{d}$, (4) a negative square distance function $f(d) = -d^2$,

(5) an inverse square distance function $f(d) = 1/d^2$, (6) an inverse square exponential distance function

$f(d) = 1/e^{2d}$, (7) a negative cube distance function $f(d) = -d^3$, (8) an inverse cube distance

function $f(d) = 1/d^3$, and (9) an inverse cube exponential distance function $f(d) = 1/e^{3d}$.
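
The second step can be sketched as a set of within-industry regressions, one per candidate functional form, compared by goodness of fit. The code below is a minimal illustration under stated assumptions: the ring distance values passed in are hypothetical (the text does not list them here), every ring is weighted equally, and the names `DECAY_FORMS` and `fit_decay` are invented for the example.

```python
import math

# The nine candidate distance-weighting functions f(d).
DECAY_FORMS = {
    "negative linear": lambda d: -d,
    "inverse linear": lambda d: 1.0 / d,
    "inverse exponential": lambda d: math.exp(-d),
    "negative square": lambda d: -d ** 2,
    "inverse square": lambda d: 1.0 / d ** 2,
    "inverse square exponential": lambda d: math.exp(-2.0 * d),
    "negative cube": lambda d: -d ** 3,
    "inverse cube": lambda d: 1.0 / d ** 3,
    "inverse cube exponential": lambda d: math.exp(-3.0 * d),
}


def fit_decay(alpha_by_industry, distances, f):
    """Fit alpha_{d,s} = mu_s * f(d) + gamma_s by least squares.

    alpha_by_industry maps each industry to its first-step estimates, ordered
    like distances. Returns ({industry: mu_s}, within-industry R-squared).
    """
    x = [f(d) for d in distances]
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    ssr = sst = 0.0
    mu = {}
    for s, a in alpha_by_industry.items():
        a_bar = sum(a) / len(a)  # demeaning absorbs the fixed effect gamma_s
        sxy = sum((xi - x_bar) * (ai - a_bar) for xi, ai in zip(x, a))
        mu[s] = sxy / sxx
        ssr += sum((ai - a_bar - mu[s] * (xi - x_bar)) ** 2 for xi, ai in zip(x, a))
        sst += sum((ai - a_bar) ** 2 for ai in a)
    return mu, 1.0 - ssr / sst
```

Calling `fit_decay` once for each entry of `DECAY_FORMS` and ranking the returned R-squared values mirrors the goodness-of-fit comparison described above. With industry-specific slopes and intercepts, the pooled regression is equivalent to a separate simple regression for each industry.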

To verify the relationship between the industry-specific attenuation speed and the under-

lying micro-foundations, we estimate an alternative specification in the second step. Instead

of estimating industry-specific attenuation speed, we document how the decay speed changes

with variations in proxies for the three Marshallian agglomeration forces while controlling for

an industry’s SOE share and its reliance on natural resources, in the spirit of Ellison et al.

(2010). The estimation equation is as follows:

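$$\begin{aligned} \alpha_{d,s} = {} & \mu_1 f(d) \times 1[IS_s > M(IS)] + \mu_2 f(d) \times 1[LP_s > M(LP)] + \mu_3 f(d) \times 1[KS_s > M(KS)] \\ & + \mu_4 f(d) \times 1[NA_s = NA_H] + \mu_5 f(d) \times 1[SOE_s > M(SOE)] + \mu_6 f(d) + \varepsilon_{d,s}, \end{aligned} \tag{3.3}$$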

where 1[·] denotes an indicator function; M(·) denotes the median of a variable; ISs, LPs, and


KSs stand for proxies for input sharing, labour market pooling, and knowledge spillovers,

respectively; NAs is a proxy for an industry’s reliance on natural resources as a potential

alternative agglomeration force (Ellison and Glaeser, 1999); and SOEs is the share of SOEs

in industry s. Specifically, total transportation cost per dollar of shipment of final output

is used as a proxy for input sharing. Similar to Rosenthal and Strange (2001), we use the

percentage of workers with a college degree or above as a proxy for labour market pooling.

Innovative activity is related to the importance of knowledge spillovers and is measured by

the ratio of new product to total product. Three cost variables are used to construct a proxy

for the reliance on natural resources: water inputs per dollar shipment, energy inputs per

dollar shipment, and other natural resources inputs per dollar shipment.22 An industry is

defined as being highly dependent on natural resources (NAH) if at least two of the cost

variables for the industry are higher than their respective medians.
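
The indicator variables in (3.3) can be built from industry-level proxies as sketched below. This is an illustration only: the function names are invented, the medians are taken across the industries supplied, and the handling of ties (strictly above the median) is an assumption.

```python
from statistics import median


def above_median(values):
    """Map each industry to 1 if its value exceeds the cross-industry median."""
    cutoff = median(values.values())
    return {s: int(v > cutoff) for s, v in values.items()}


def high_natural_resource(water, energy, other):
    """NA_H flag: at least two of the three cost variables above their medians."""
    flags = [above_median(water), above_median(energy), above_median(other)]
    return {s: int(sum(flag[s] for flag in flags) >= 2) for s in water}


def regressors(f_d, s, IS, LP, KS, NA, SOE):
    """Right-hand-side terms of (3.3) for industry s, ordered mu_1 to mu_6."""
    return [f_d * IS[s], f_d * LP[s], f_d * KS[s], f_d * NA[s], f_d * SOE[s], f_d]
```

Here `IS`, `LP`, `KS`, and `SOE` would each be the output of `above_median` applied to the corresponding proxy, and `NA` the output of `high_natural_resource`. The final term, `f_d` alone, carries the baseline decay speed for industries in which every indicator equals zero.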

3.2. Identification Strategy

Given the broad empirical setup, we now discuss and address concerns of endogeneity.

The literature on identifying the magnitude of agglomeration economies is typically con-

cerned with unobserved heterogeneity across people and locations.23 These two types of

unobserved heterogeneity give rise to multiple sources of endogeneity through unobserved

factors in As,c, wc, rc, and Fs,c in our setting and lead to an ambiguous direction of bias.

The first source of endogeneity is caused by factors in As,c that represent unobserved

location-specific productivity or unobserved variation in human capital induced by sorting.

Places endowed with higher productivity breed higher existing business concentrations and,

at the same time, attract new firms to arrive. Alternatively, people with higher ability may

sort into locations with better amenities or higher business concentrations, which also leads

to more frequent firm births (Combes et al., 2008). Thus, failing to directly control for As,c

will lead to an upward bias of the estimated agglomeration effect. The second source of

22Energy and other natural resources variables are constructed as in Rosenthal and Strange (2001). Coal, crude petroleum and natural gas are included as part of the energy variable, whereas output from mining, agriculture, and others are considered as other natural resources.

23For example, Glaeser et al. (2018) summarize unobserved heterogeneity across people as reflecting “the sorting of people into places based on ability levels” and unobserved heterogeneity across places as reflecting “the tendency of people to move into areas that have endogenously higher productivity.”


endogeneity concerns the cost factors in wc and rc. Higher economic concentrations generate

larger agglomeration economies which boost productivity and induce higher wage rates and

land rents (Rosen, 1979; Roback, 1982). Wages and rents negatively affect the likelihood of

firm entry, which causes a downward bias of the estimated agglomeration effect.24 Sorting by

unobserved ability could exacerbate the downward bias by inducing wages to rise further in

highly concentrated locations.25 The third source of endogeneity is caused by the unobserved

fixed cost Fs,c, which could be driven by, for example, industry and location-specific policies.

The three sources of endogeneity give rise to an ambiguous direction of the bias on net.

To mitigate the bias in the estimated scale of agglomeration economies in the first step,

we include a comprehensive set of controls. First, we control for prefecture city fixed effects to

address cross-city differences in amenities, wages, rents, and location-based policies. If a city

can be considered as a self-contained labour market, wage rates should vary across cities but

not within a city, holding constant workers’ own productivity. Thus, city fixed effects could

help address concerns of unobserved wage differences arising from either location heterogene-

ity in productivity or ability sorting across cities. City fixed effects also help explicitly control

for cross-city differences in amenities, rental costs, and government policies that affect firms’

location preferences. Second, we control for ring-specific urbanization measures to address

concerns about unobserved variation in rental costs and other fixed costs within cities. Although

wages may not differ significantly within cities, rental costs do. Urbanization measures serve

as a good proxy for the overall demand for land, and hence, can help mitigate the concerns

of unobserved rental cost differences within cities. Ring-specific urbanization measures also

help control for unobserved positive externalities associated with the aggregate urban scale

and for unobserved negative impacts of congestion and pollution. Finally, we control for two

Herfindahl indexes for the 30 km concentric rings representing the local industrial organiza-

tion and industrial specialization separately. The Herfindahl indexes help address concerns

about unobserved local industry-level competitiveness and diversity of economic activities.

Despite various controls, we cannot fully resolve the concerns of omitted variables in

24This is in line with Arzaghi and Henderson (2008) on controlling for land rent.

25If physical capital and human capital are complementary, the land rents also become higher (Acemoglu, 1996).


identifying the level of agglomeration benefits in the first-step estimation. However, we argue

that the identification for the attenuation speed of agglomeration economies in the second-step

estimation is less vulnerable than the identification for the level of agglomeration economies

because the possible remaining bias in the level estimates is unlikely to be systematically

correlated with our distance measures. In this case, potential bias in the level estimates

will be differenced out when we include industry fixed effects while fitting in a spatial decay

function to estimate the attenuation slope parameter in the second-step estimation.
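
The differencing argument can be written out under the assumption that the remaining first-step bias is an industry-specific constant $b_s$ that does not vary with distance. The symbols $\hat{\alpha}_{d,s}$ and $b_s$, and the bar notation for within-industry means, are introduced here for illustration only. Substituting (3.2) gives

$$\hat{\alpha}_{d,s} = \alpha_{d,s} + b_s = \mu_s f(d) + (\gamma_s + b_s) + \varepsilon_{d,s},$$

and subtracting within-industry means yields

$$\hat{\alpha}_{d,s} - \bar{\hat{\alpha}}_{s} = \mu_s \left[f(d) - \bar{f}\right] + \left(\varepsilon_{d,s} - \bar{\varepsilon}_{s}\right).$$

The constant $b_s$ is absorbed by the industry fixed effect and drops out of the within-industry variation, so it leaves $\mu_s$ unaffected. A bias that varied systematically with $d$ would not cancel in this way.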

While we strongly believe in our baseline estimates, we adopt an IV approach to cor-

roborate our findings. We instrument for contemporaneous industry-specific employment

concentration with a flexible function of historical employment concentrations of the same

industry in 1995 and exogenous grid-specific geological features including the average terrain

elevation and slope. The rationale for our instruments is as follows.

First, because of the accumulative effect, the historical measures of industry concentration

remain correlated with our key localization measures. However, the historical instruments

are not directly correlated with contemporaneous unobserved cost factors, similar to what

was argued in Ciccone and Hall (1996) and Combes et al. (2010). These instruments are

especially sensible in our setting given that the SOE reform policy in China was not imple-

mented widely back in 1995. At that time, the majority of firms were still SOEs, which were

established even earlier. The location choice of SOEs was subject to political and national

security considerations and was typically unrelated to market-driven cost factors before the

SOE reform. Second, the underlying geological features are correlated with the likelihood

of developing economic concentrations mentioned earlier but are not directly correlated with

contemporaneous cost and productivity factors. Instruments of this flavor have been used in

Rosenthal and Strange (2008).

To improve the efficiency of the IV estimation, we specify a nearly saturated functional

form of the instruments in the first-stage estimation. The idea is in line with the literature

on approximating optimal instruments nonparametrically, or equivalently in principle, by

constructing high-power polynomials (Amemiya, 1974; Chamberlain, 1987; Newey, 1990). To

construct the instruments, we follow a four-step procedure. First, as the geological data


contain the mean and standard deviation of terrain elevation and slope at the grid level,

we allocate each grid into the corresponding concentric rings (i.e., 0-1 km, 1-5 km, 5-10

km, 10-20 km, and 20-30 km rings). Second, for each concentric ring, we calculate the

first and second moments of grid-specific means and standard deviations for both the terrain

elevation and slope, which creates eight feature variables for each concentric ring.26 Third, we

discretize the mean and standard deviation of grid terrain elevation and slope for the 0-1 km

ring and the eight feature variables for each of the outer rings into ten separate categorical

dummies each to capture the nonlinearity of the impacts, which forms three hundred and

sixty variables. Fourth, we include the 1995 industry- and ring-specific employment counts,

discretized location-specific geological features, and the interactions to form the final set of

seven hundred and twenty five instruments.
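
The instrument counts can be tallied as follows. The figure of 360 geological dummies follows directly from the description above. The breakdown of the final 725 is an assumption: the text does not spell out which interactions are included, and the sketch supposes one 1995 employment count per ring and one interaction per geological dummy (each dummy with the employment count of its own ring), which happens to reproduce the stated total.

```python
def count_instruments(outer_rings=4, bins=10):
    """Tally the instrument set under the assumptions stated in the lead-in."""
    own_ring_features = 4    # mean and s.d. of elevation and slope for the single grid
    outer_ring_features = 8  # first and second moments of grid means and s.d.s
    geo_dummies = (own_ring_features + outer_rings * outer_ring_features) * bins
    employment_counts = outer_rings + 1  # one 1995 count per ring
    interactions = geo_dummies           # ASSUMPTION: ring-matched interactions only
    return {
        "geo_dummies": geo_dummies,
        "total": employment_counts + geo_dummies + interactions,
    }
```

With the default arguments, the function returns 360 geological dummies and 725 instruments in total.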

The improved efficiency from using a large set of instruments comes at a cost arising from

potential weak instruments, which we resolve using the least absolute shrinkage and selec-

tion operator (Lasso). IV estimators based on many instruments may present undesirable

properties, such as the presence of weak instruments (Andrews et al., 2019). With weak instruments,

traditional IV estimates can be badly biased, t-tests may fail to control

size, and conventional IV confidence intervals may not cover the true parameter value with

intended probability. This potential problem, however, can be resolved by appealing to the

Lasso procedure.

Lasso was introduced by Frank and Friedman (1993) and Tibshirani (1996) and is widely

used as an estimator of regression functions and as a model selection device. Using Lasso to

form first-stage predictions in IV estimation is a practical approach that obtains the efficiency

gains from using optimal instruments while dampening the problems associated with many

instruments (Belloni et al., 2012). However, the Lasso selection techniques are not perfect, and

selection mistakes could contaminate the post-model-selection estimator and inference (Leeb

and Potscher, 2008; Andrews et al., 2019). For this reason, we use the procedure proposed by

Chernozhukov et al. (2015) as opposed to the standard Lasso or post-Lasso procedure.27 In

26Note that the 0-1 km ring only consists of one grid. Thus, the 0-1 km ring is only associated with four feature variables, the mean and the standard deviation of grid terrain elevation and slope.

27The standard post-Lasso estimator performs the least square estimation using Lasso-selected instruments.


Chernozhukov et al. (2015), Lasso-selected variables and post-Lasso coefficient estimates are

used to construct orthogonalized versions of the dependent variable, independent variables of

interest, and optimal instruments. These variables are then used in a standard IV regression

for the final estimation. In this way, the estimation and inference for the parameters of

interest are locally insensitive to exclusions of relevant instruments.

4. Data, Variables, and Summary Statistics

4.1. Data

Our empirical analysis relies on three administrative firm-level data sets that are obtained

from the National Bureau of Statistics (NBS) of China and a complementary data set that

helps to form our instruments. The first and primary firm-level data set is the CEC.28 The

CEC is available for 2004 and 2008. We use the 2008 economic census for our core analysis

on firm birth. The 2004 economic census is used to construct agglomeration measures for

supplemental analyses on firm productivity and other correlates, which rely on firm data prior

to 2008. Both years provide information on a set of firm attributes, including firm name,

firm address, legal unit code, legal representative name, industry classification, opening year,

ownership type, fixed capital, output value, employment size, and others. The CEC catego-

rizes firms into four-digit industries based on the four-digit Chinese Industry Classification

(CIC) system. We focus on the manufacturing firms with a two-digit industry code from 13

to 42 in this study.

The unit of observation in the CEC is a legal unit (faren danwei). A legal unit needs

to meet several requirements: “(1) They are established legally, having their own names,

organizations, location and able to take civil liability; (2) They possess and use their assets

independently, assume liabilities and are entitled to sign contracts with other units; and (3)

They are financially independent and compile their own balance sheets.”29 By definition, a

legal unit is similar to a firm. Thus, for the rest of this paper, we refer to legal units as firms.

28The NBS conducted the first economic census in 2004, the second economic census in 2008 and the following economic censuses every five years.

29China Statistical Yearbook 2009, Chapter 13.


Conveniently for our purpose, most firms in the CEC consist of a single plant. For example,

in the 2008 economic census, the share of single-plant firms is ninety-four percent. Thus, we

treat these data as an establishment-level data set for our main analysis, but we also carry out

estimations using only single-plant firms for robustness.

There are two major advantages of the CEC that make it well-suited for our study. First,

the CEC is the most comprehensive firm-level data set for the Chinese economy. The data

cover the universe of all registered firms in China, irrespective of size.30 As an example of the

scope, there are 5,228,726 firms included in the 2008 economic census. Observing the universe

of firms allows us to characterize the spatial distribution of economic activities accurately.

Second, the CEC provides detailed information on firm addresses, which allows us to

geocode the addresses to obtain the exact longitudes and latitudes of firm locations.31 This is a

major improvement over the previous studies, which often rely on geographically aggregated

data and implicitly impose certain distributional assumptions on firm locations within the

political and administrative units that are often of irregular shapes. The assumptions lead

to measurement errors and estimation bias and are likely to compromise the identification

of the spatial decay of agglomeration economies. In contrast, our empirical analysis is based

on accurate firm locations that allow for a more precise and flexible way of measuring firm

distances and of quantifying distance-specific agglomeration economies.

The second firm-level data set that we use is the Annual Surveys of Industrial Firms

(ASIF) of China from 1998 to 2007. The ASIF is an annual firm panel that covers private

firms with annual sales exceeding five million yuan and all SOEs.32 Similar to the CEC, most

firms in this data set are single-plant firms.33 The ASIF also contains detailed information

on firm addresses, which we use to geocode all firms and to obtain their exact longitudes

and latitudes. It provides the firms’ basic balance sheet information from 1998 to 2007

that enables us to estimate firm-level TFP. We estimate firms’ TFP, output per worker and

30Self-employed individuals and private firms with up to eight employees may operate under a different legal system and be excluded from the economic censuses.

31Accurately geocoding addresses in China can be challenging because all map service providers in China are mandated by the government to mask the exact longitudes and latitudes. We provide more detailed background and discuss how we overcome this issue in Appendix A.

32In 2011, the sampling cut-off for private firms increases to twenty million yuan.

33For instance, in 2007, the share of single-plant firms is 96.6 percent (Brandt et al., 2012).


average wage per worker from 2004 to 2007 and study how firm productivity is affected by

agglomeration measures in 2004.34 We adjust all dollar variables using the national consumer

price index so that they are comparable across years.35

We construct our IVs with a third firm-level data set and the digital elevation model

image of China. The third firm-level data set is an industrial firm census conducted by the

NBS in 1995. This survey covers all industrial firms in mainland China at that time. In

total, 510,381 industrial firms were surveyed. Similar to the CEC, the basic firm attributes

are collected in this survey. We construct our IVs of historical industrial attributes with this

data set. In 2003, the NBS adjusted the CIC system to reflect changes in the economy. To

ensure consistency in industry classification in our analysis, we construct a harmonized

industry classification following Brandt et al. (2017). The digital elevation model image of

China provides the mean and standard deviation of terrain elevation and slope for each 2km

by 2km grid in China and is used to construct the geological instruments. Details on the

construction of instruments are explained in Section 3.2.

4.2. Variables and Summary Statistics

In this section, we first develop a general understanding of the spatial distribution of

economic activities for each manufacturing industry in China. In Table 1a, we list the top five

cities with the highest concentration of each two-digit industry. Three patterns emerge. First,

within each industry, we see a strong agglomeration pattern. For example, Dongguan hosts

more than 15 percent of firms for the industry of stationery, educational, and sports goods.

Even for some less agglomerated industries (food production, raw chemical materials and

chemical products, medical and pharmaceutical products, and non-metal mineral products),

the total share of firms in the top five cities is more than 12 percent. Second, across industries,

there is large heterogeneity in the level of agglomeration. The most agglomerated industry

is electronic and telecommunications, with around 20 percent of firms located in Shenzhen,

whereas for beverage production, the highest firm share is only 3.46 percent in Yibin. Third,

34The TFP estimation method is described in more detail in Appendix B. We calculate firm TFP until 2007 because the ASIF misses key variables for TFP estimation after 2007, such as the value added.

35The consumer price index is provided by the NBS at http://www.stats.gov.cn.

while many industries concentrate in large cities, clusters also form in relatively smaller

cities. For example, food processing seems to concentrate in relatively smaller cities, including

Weihai, Yantai, Qingdao, Linyi, and Weifang.36

Next, we discuss how we measure economic concentrations in our analysis. We choose

not to rely on agglomeration measures defined within administrative boundaries

for two reasons. The first reason is that we focus on capturing agglomeration economies

at more refined distances, and even the finest administrative areas are generally too broad

for our analysis. The second reason is that China’s official administrative areas are found to

be inconsistent with the power law (Dingel et al., 2019) and, thus, may be different from the

commonly defined “cities” used in research on developed countries. For these reasons, we use

concentric variables to capture economic concentrations at refined distances. Ideally, we could

define our concentric ring measures based on the exact pair-wise distances between firms. In

practice, we compromise by following the procedure below to reduce the computational

burden.

We measure firm distances and construct concentric ring measures using the following

procedure. We first overlay 2 km by 2 km grids on all of mainland China. We then assign

all firms to those grids based on firms’ longitude and latitude.37 The distance between a

pair of firms is then calculated as the distance between the grid centroids that the respective

firms belong to. As we work with a large sample of firms, this simple aggregation allows

us to keep the calculation intensity at a manageable level without compromising much on

precision.38 An underlying assumption is that all firms in a grid are located at the grid centroid.

Finally, we construct five concentric rings (0-1 km, 1-5 km, 5-10 km, 10-20 km, 20-30 km)

from the centroid of each grid and examine how economic activities in each of the five concentric

rings affect the economic outcomes of firms in the central grid.39
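
The gridding procedure above can be illustrated with a minimal sketch. It assumes that firm coordinates have already been projected to planar kilometres (the text does not state which projection is used) and that each ring includes its outer boundary; both choices, and all function names, are hypothetical and only for illustration.

```python
import math
from collections import defaultdict

CELL_KM = 2.0                                   # side length of a grid cell
RING_EDGES_KM = [1.0, 5.0, 10.0, 20.0, 30.0]    # outer edges of the five rings

def cell_of(x_km, y_km):
    """Index of the 2 km cell containing a projected point (assumed planar km)."""
    return (math.floor(x_km / CELL_KM), math.floor(y_km / CELL_KM))

def centroid(cell):
    """Centroid of a cell; every firm in the cell is treated as located here."""
    i, j = cell
    return ((i + 0.5) * CELL_KM, (j + 0.5) * CELL_KM)

def ring_index(dist_km):
    """Ring number 0..4 for a centroid-to-centroid distance, None beyond 30 km."""
    for k, upper in enumerate(RING_EDGES_KM):
        if dist_km <= upper:        # assumption: rings are closed on the right
            return k
    return None

def ring_employment(firms, target_cell):
    """Total employment in each ring around target_cell.

    firms: iterable of (x_km, y_km, employment) tuples.
    """
    by_cell = defaultdict(float)
    for x, y, emp in firms:
        by_cell[cell_of(x, y)] += emp
    cx, cy = centroid(target_cell)
    totals = [0.0] * len(RING_EDGES_KM)
    for cell, emp in by_cell.items():
        ox, oy = centroid(cell)
        k = ring_index(math.hypot(ox - cx, oy - cy))
        if k is not None:
            totals[k] += emp
    return totals
```

Because distinct centroids are at least 2 km apart, only firms in the central grid fall into the 0-1 km ring in this sketch, which matches the remark in footnote 39.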

In the first-step regressions, we relate firm birth patterns to concentric ring-level mea-

36Industry-specific employment heat maps are provided in Online Appendix Figure OA1 for a better visual presentation.

37Industry-specific employment concentrations at the grid level are reported in Appendix Table A1.

38Given that our grids are sufficiently refined geographically, the assumption should not lead to severe measurement issues. Our method is still superior to the previous studies using geographically aggregated data since our aggregation is geographically refined and is of regular shape.

39Note that the 0-1 km ring is fully covered by the central grid as the size of the grid is 2 km by 2 km. This will not affect the calculation since we assume all firms in a grid are located at the grid centroid.

sures of localization and urbanization economies. The ring-level localization and urbaniza-

tion measures, our key independent variables, are calculated as the existing concentration of

employment within and outside of each four-digit industry in the five concentric rings around

each grid, respectively. To avoid a large number of zeros, we restrict our sample to only those

grids with either existing economic activities or new firm births within each industry. Table

1b reports the summary statistics on the concentric ring measures of localization and ur-

banization economies. The variation in both localization and urbanization measures is large

across different locations. By construction, the outer rings on average have more employment

because they cover larger ground areas.
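
As a rough illustration of the two ring-level measures, the sketch below splits ring employment into own-industry (localization) and other-industry (urbanization) totals and applies the sample restriction. The input layout and the function names are assumptions made for this example, not the authors' code.

```python
def localization_urbanization(ring_emp_by_industry, industry, n_rings=5):
    """Split ring employment into own-industry and other-industry totals.

    ring_emp_by_industry: dict mapping (ring, four_digit_industry) -> employment
    around one grid. Returns (localization, urbanization), one value per ring.
    """
    localization = [0.0] * n_rings
    urbanization = [0.0] * n_rings
    for (ring, ind), emp in ring_emp_by_industry.items():
        if ind == industry:
            localization[ring] += emp   # employment within the same industry
        else:
            urbanization[ring] += emp   # employment in all other industries
    return localization, urbanization

def in_sample(existing_employment, new_births):
    """Keep a grid-industry pair only if it has existing activity or new births."""
    return existing_employment > 0 or new_births > 0
```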

The dependent variable, new birth share, is calculated as the percentage of firm births in

a grid out of all new firms for each four-digit industry. Birth share can be interpreted as the

probability of a new firm birth occurring in four-digit industry s in grid c. Mathematically, it

is defined as $s_{s,c} = n_{s,c}/n_{s}$, where $n_{s,c}$ is the total count of new firms in four-digit industry

s in grid c, and $n_{s}$ is the total count of new firms in four-digit industry s across all grids.

We define a firm as a new firm if the firm’s opening year is 2008 in the 2008 CEC. Using

birth share as the dependent variable is also consistent with our conceptual framework which

shows that the presence of nearby economic concentrations affects the probability of firm

birth at the central location. Table 1c reports summary statistics on the grid-level new firm

birth share for each industry.40
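
The birth share definition can be computed directly from a list of new firms. The following is a small sketch under the assumption that each new firm is recorded as an (industry, grid) pair; the function name is hypothetical, and the result is expressed as a fraction rather than a percentage.

```python
from collections import Counter

def birth_shares(new_firms):
    """Compute s_{s,c} = n_{s,c} / n_s for every (industry, grid) with a birth.

    new_firms: iterable of (industry, grid) pairs, one pair per new firm.
    """
    records = list(new_firms)
    n_sc = Counter(records)                      # births by industry and grid
    n_s = Counter(ind for ind, _ in records)     # births by industry, all grids
    return {(ind, grid): count / n_s[ind] for (ind, grid), count in n_sc.items()}
```

By construction, the shares of each industry sum to one across grids.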

5. Probability of Birth

5.1. Industry-Specific Attenuation

In this section, we present the estimates of distance- and industry-specific agglomeration

economies in explaining the probability of firm birth, as represented in equation (3.1). We

report the ordinary least squares (OLS) and IV results for nine selected manufacturing indus-

tries in Tables 2 and 3. The OLS and IV results for the complete list of twenty-nine industries

are reported in Appendix Tables A2 and A3. The nine selected industries are food production

40Online Appendix Table OA1 reports summary statistics on the grid-level new employment birth share for each industry.

(CIC 14); tobacco processing (CIC 16); furniture manufacturing (CIC 21); printing and record

pressing (CIC 23); medical and pharmaceutical products (CIC 27); transportation equipment

manufacturing (CIC 37); electronic and telecommunications (CIC 40); instruments, meters,

cultural, and official machinery (CIC 41); and artwork and other manufacturing (CIC 42).

We select industries that rank at the top in one of three industry characteristics

that capture the underlying Marshallian forces: the share of transportation costs, the share

of high-skilled labour, and the share of new products.41 We also select two industries that

lead in the share of SOE employment, which highlights a unique feature of the Chinese

economy and may lead to different patterns of agglomeration economies in China. We report

nine industries rather than all twenty-nine in the main text mainly to save space,

but we also think these industries are especially useful examples that help shed light on the

forces behind the industry-specific attenuation patterns of agglomeration economies. Many

of these industries have also been the focus of other studies, such as Rosenthal and Strange

(2001, 2003) and Jofre-Monseny et al. (2011).

The general patterns in Tables 2 and 3 are quite similar.42 For most industries, localization

economies in the proximity of a grid, as captured by nearby concentration of firms in the same

four-digit industry, strongly increase the probability of firm birth in the grid. More important,

the estimated localization economies attenuate rapidly with distance. In fact, the localization

effects completely decay within 5 km for most industries. Urbanization economies, as captured

by nearby concentration of firms in other industries, also have positive impacts on firm births.

However, such positive urbanization effects completely decay or even become negative after 1

km, possibly due to rising congestion costs. These general patterns emerge for most industries

and are consistent with previous findings in Rosenthal and Strange (2003) and Arzaghi and

Henderson (2008).

The results in Tables 2 and 3 also reveal important heterogeneity in the attenuation

41Transportation cost is calculated based on the input-output table of China. High-skilled labour is defined as workers with a college degree or above. The share of new products is calculated as the ratio of new product output to total output, where a new product is defined by the ASIF as a product that was not among the main outputs of the firm in the previous year. The share of SOE employment is calculated as the ratio of employment in SOEs to total employment in all firms.

42The extant literature has followed similar approaches to address potential unobserved features associated with city size and also found it to be of small practical importance (Ciccone and Hall, 1996; Combes et al., 2010; de la Roca and Puga, 2017).

patterns of agglomeration economies across industries. We present the IV estimates of lo-

calization economies in Figure 1 (except for the tobacco processing industry), which visually

illustrates this heterogeneity. At first glance, we find that the industries with faster

attenuation of localization effects are usually associated with more innovative activity or

more college-educated labour (e.g., electronic and telecommunications; instruments, meters,

cultural, and official machinery; and artwork and other manufacturing). We will further in-

vestigate this important correlation in Section 5.3. In addition, we fail to find evidence for

either localization or urbanization economies for the tobacco processing industry, which is

the industry associated with the highest share of SOE employment. The unique nature of

the state sector seems to play an important role in determining the pattern of agglomeration

economies. We will explore the role of the state sector further in Section 7.

A comparison of Tables 2 and 3 reveals some small discrepancies between the OLS and IV

estimates, which suggests the possible presence of omitted variable bias. One discrepancy is that

the IV estimates of localization effects for the 0-1 km ring are lower than the OLS estimates

for the industries of food production (CIC 14), furniture manufacturing (CIC 21), medical and

pharmaceutical products (CIC 27), and electronic and telecommunications (CIC 40). This

difference could be explained by the possible presence of unobserved and persistent amenities

affecting both firm births and stocks, which cause OLS estimates to be upward biased. This

explanation is plausible because the role of unobserved and persistent amenities is potentially

important in many industries such as food production, which depends heavily on natural

resources, and furniture manufacturing, which depends on water resources.43

In other instances, the IV estimates of localization effects for the 0-1 km ring are higher

than the OLS estimates, such as for the printing and record pressing industry (CIC 23);

transportation equipment manufacturing industry (CIC 37); instruments, meters, cultural,

and official machinery industry (CIC 41); and artwork and other manufacturing industry

(CIC 42). For those industries, the downward bias caused by insufficient controls for labour

and rental costs may outweigh the upward bias induced by unobserved local amenities in the

OLS estimates. This is plausible given that those industries are relatively capital intensive

43The food production industry and furniture manufacturing industry rank among the highest in the cost of natural resources and water-related costs, respectively.

and rely more on high-skilled labour.

We also note that for several industries (e.g., food production), the IV estimates of lo-

calization effects for the second and third rings tend to be higher than the OLS estimates.

The same pattern is found in Arzaghi and Henderson (2008) for the advertising industry

in Manhattan. Similar to their argument, this opposing effect with IV estimation could be

driven by the spatial correlation in the unobservables. As the unobserved amenities in the

own ring and neighboring rings are likely to be positively correlated, the neighboring rings

may draw firm births away from the own ring, which biases the OLS estimates of localization

effects in the second and third rings toward zero.

Thus, along with the results for the complete list of twenty-nine industries in Appendix

Tables A2 and A3, we find large heterogeneity in the attenuation patterns of agglomeration

economies across industry types.44 Despite small discrepancies, the general patterns from the

OLS and IV estimates are largely similar, which suggests that our OLS estimates are generally

robust. Previous studies that focus on only a few industries are unable to systematically

document this important heterogeneity and relate it to the underlying micro-foundations.

We explore this relationship after evaluating the goodness of fit of various spatial decay

functions in the next subsection.

5.2. Spatial Decay Function

Here, we statistically test for the goodness of fit of nine spatial decay functional forms.

These functions are (1) a negative linear distance function $f(d) = -d$, (2) an inverse linear

distance function $f(d) = 1/d$, (3) an inverse exponential distance function $f(d) = 1/e^{d}$, (4) a

negative square distance function $f(d) = -d^{2}$, (5) an inverse square distance function $f(d) = 1/d^{2}$,

(6) an inverse square exponential distance function $f(d) = 1/e^{2d}$, (7) a negative cube distance

function $f(d) = -d^{3}$, (8) an inverse cube distance function $f(d) = 1/d^{3}$, and (9) an inverse cube

exponential distance function $f(d) = 1/e^{3d}$.

We estimate equation (3.2) with each of the nine functional forms using OLS and IV

estimates of localization effects obtained from the first-step regressions, respectively. We

44In Online Appendix Tables OA2-OA3, we document distance- and industry-specific localization economies for the sample of single-establishment firms and the sample of non-SOEs. The results are quite similar.

weight the observations based on the inverse of the estimated standard error, while allowing

for the attenuation parameter to vary with each two-digit industry. At the same time, we

control for industry fixed effects to capture the scale effect of agglomeration economies, as

well as other industry-specific features, such as the extent of its reliance on natural resources.

We then compare the estimated root mean squared errors (RMSE) and mean absolute errors

(MAE) of the estimated models. In principle, the functional form that produces the smallest

RMSE and MAE presents the best fit to the data.
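
The comparison of functional forms can be sketched for a single industry as follows. This is only an illustration under several assumptions that are not taken from the text: the second-step model is simplified to a weighted regression of the ring coefficients on a constant and f(d), each ring is represented by one hypothetical distance (for example its midpoint), the inverse square exponential form is read as 1/e^{2d}, and RMSE and MAE are computed from unweighted residuals. Equation (3.2) itself pools industries and includes industry fixed effects.

```python
import math

# The nine candidate decay forms, keyed by the names used in the text.
DECAY_FORMS = {
    "negative linear": lambda d: -d,
    "inverse linear": lambda d: 1.0 / d,
    "inverse exponential": lambda d: math.exp(-d),
    "negative square": lambda d: -d ** 2,
    "inverse square": lambda d: 1.0 / d ** 2,
    "inverse square exponential": lambda d: math.exp(-2.0 * d),
    "negative cube": lambda d: -d ** 3,
    "inverse cube": lambda d: 1.0 / d ** 3,
    "inverse cube exponential": lambda d: math.exp(-3.0 * d),
}

def fit_weighted(x, y, w):
    """Weighted least squares of y on a constant and x; returns (intercept, slope)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def compare_forms(distances, betas, std_errors):
    """Fit every decay form and report its slope, RMSE, and MAE."""
    weights = [1.0 / se for se in std_errors]   # inverse standard error weights
    results = {}
    for name, f in DECAY_FORMS.items():
        x = [f(d) for d in distances]
        intercept, theta = fit_weighted(x, betas, weights)
        resid = [b - (intercept + theta * xi) for b, xi in zip(betas, x)]
        rmse = math.sqrt(sum(r * r for r in resid) / len(resid))
        mae = sum(abs(r) for r in resid) / len(resid)
        results[name] = {"theta": theta, "rmse": rmse, "mae": mae}
    return results
```

The form with the smallest RMSE and MAE in the returned dictionary would be selected as the best fit.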

Table 4 reports the coefficient estimates of industry-specific attenuation slopes based on

an inverse square distance decay function. We will show later that this functional form has

the best goodness of fit to the data. The estimates based on the rest of the functional

forms are reported in Appendix Tables A4.1-A4.2 and in Online Appendix Tables OA4.1-

OA4.6. These estimates are produced based on the IV coefficient estimates from the first-step

regressions. Results based on OLS estimates from the first-step regressions are very similar.

Corresponding tables are available upon request.

Table 5 reports the goodness of fit for all nine functional forms. Specifically, the RMSE

and MAE are provided for model estimations based on four sets of first-step results: OLS

estimation of the birth model, IV estimation of the birth model, OLS estimation of the new

employment model, and IV estimation of the new employment model. Based on the RMSE

and MAE statistics, the fifth functional form specification, $f(d) = 1/d^{2}$, dominates all others

in the model’s goodness of fit. The second-best functional form based on the same criteria

for the firm birth model is the inverse exponential distance function, $f(d) = 1/e^{d}$, which has

been adopted in various settings including Lucas (2001) and Ahlfeldt et al. (2015).

5.3. Attenuation and Micro-foundations

In this section, we investigate the cross-industry variation in the estimated attenuation

slope and how the variation relates to the underlying micro-foundations of agglomeration

economies. As the inverse square distance decay function presents the best goodness of

fit, we focus on the corresponding estimates in Table 4, although the rest of the functional

forms produce similar patterns. We first highlight that the estimated attenuation speed

parameter is very heterogeneous across industries. For example, the industries of chemical

fibers, artwork and other manufacturing, and rubber products have the fastest attenuation

speed of localization economies, while tobacco processing seems to have non-attenuating

localization economies.

Second, and more important, there is an empirical relationship between

the attenuation speed of localization economies and industry proxies for Marshallian forces

(the share of transportation costs, the share of high-skilled labour, and the share of new

products). We first show the relationship visually in Figure 2 by presenting kernel density

estimations of the industry-specific attenuation speed parameters stratified based on whether

each industry characteristic is above or below its median.45 The industry characteristics we

consider include the above-mentioned proxies for Marshallian forces, the reliance on natural

resources, and the SOE share.

As shown in panel 1 of Figure 2, the decay speed of industries that rely more on transporta-

tion (red line) is concentrated at a lower level than industries that rely less on transportation

(blue line). The pattern is consistent with the idea that industries that rely more on trans-

portation are usually more reliant on input sharing as a source of agglomeration economies

and that input sharing is associated with slower attenuation and a larger geographic

scope. As an example, the food production industry (CIC 14) has the highest reliance on

transportation among all industries. As shown in Tables 2-4, for this industry, the scope of

localization economies is among the largest and the attenuation speed is relatively slow.

We use the new product ratio in an industry as a proxy for its reliance on knowledge

spillovers. In theory, as knowledge exchange and networking benefits are heavily reliant on

face-to-face contact and extremely localized, knowledge spillovers are usually confined within

a very narrow scope. Indeed, panel 2 of Figure 2 shows that for industries with a

larger-than-median new product ratio (red line), the decay speed parameters are concentrated at a higher

level with a much larger upper bound. For example, transportation equipment manufacturing

and electronic and telecommunications have the first- and second-highest new product ratios,

respectively, and their decay speed of localization effects is relatively fast.

45We use the Epanechnikov kernel density function with the bandwidth selected using cross-validation.

An industry benefits more from labour market pooling if the industry has a high reliance

on skilled labour. Thus, the percentage of college-educated workers in an industry can be used

as a proxy for the importance of labour market pooling to an industry. Panel 3 of Figure

2 shows that the decay speed parameters for industries with a larger than median share of

college-educated workers (red line) are concentrated at a higher level, which indicates that the

benefits of labour market pooling generally decay fast. In panels 4 and 5 of Figure 2, we

explore how the reliance on natural resources and the SOE share relate to

an industry’s attenuation pattern. While the impact of an industry’s reliance on natural resources on the

attenuation speed is unclear, a higher-than-median SOE share (red line) seems to point to a

slower attenuation speed.
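
The median split and kernel density estimation behind Figure 2 can be sketched as below. The Epanechnikov kernel follows footnote 45, but the bandwidth is passed in as a fixed number rather than chosen by cross-validation, which is a simplification. The function names, the input dictionaries, and the treatment of industries exactly at the median (placed in the lower group) are assumptions for illustration only.

```python
import statistics

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kde(sample, x, bandwidth):
    """Kernel density estimate at point x for a fixed, user-supplied bandwidth."""
    n = len(sample)
    return sum(epanechnikov((x - xi) / bandwidth) for xi in sample) / (n * bandwidth)

def split_by_median(speeds, characteristic):
    """Split attenuation speeds by whether an industry characteristic exceeds its median.

    speeds, characteristic: dicts keyed by industry code.
    Returns (above_median_speeds, at_or_below_median_speeds).
    """
    median = statistics.median(characteristic.values())
    above = [speeds[k] for k in speeds if characteristic[k] > median]
    below = [speeds[k] for k in speeds if characteristic[k] <= median]
    return above, below
```

Evaluating `kde` on a grid of points for each of the two groups would give the pair of density curves (red and blue lines) shown in each panel.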

Next, we explore the relationship in a regression setup by showing whether an industry’s

attenuation speed changes with proxies for micro-foundations of agglomeration economies

while controlling for an industry’s reliance on natural resources and SOE share.46 We fit the

estimates of localization effects from the first-step OLS and IV regressions into a spatial decay

function and the interaction terms of the decay function with indicator variables to capture

the above-discussed key industry characteristics, as represented in equation (3.3). Those

industry characteristics are chosen in the spirit of Ellison and Glaeser (1999), Rosenthal

and Strange (2001), and Ellison et al. (2010), and the indicator variables are defined in the

empirical framework. Table 6 reports the regression results, with the upper panel using OLS

estimates of localization effects and the lower panel using IV estimates of localization effects.47

The general patterns in Table 6 are consistent with the kernel density estimation results.

In column (1), we only include the interaction terms of the decay function with the indicator

variables that represent high new product ratio, high reliance on natural resources, and high

SOE ratio. The coefficient on the interaction term of the decay function and the high new

product ratio is positive and statistically significant at the 1 percent level in both panels,

which indicates a faster decay of localization economies for industries that rely more heavily

46The logic underlying the specification is in line with the literature on industry co-agglomeration patterns, which emphasizes the role of both natural resources and agglomeration forces in determining industry agglomeration. See, for example, Ellison and Glaeser (1999) and Ellison et al. (2010).

47In Online Appendix Table OA5, we show the same relationship with the estimates of localization economies obtained with the sample of non-SOEs.

on knowledge spillovers. Similarly, in columns (2)-(3), we examine how the attenuation speed

changes with high college-educated worker ratio and high transportation cost. The results

suggest that the decay speed of localization economies is faster for industries relying more on

labour market pooling. The coefficient estimate on the interaction term of the decay function

and high transportation cost is too imprecise for us to draw a conclusion. In column (4),

we include all interaction terms to test the micro-foundations simultaneously. The estimates

become imprecise due to collinearity, but the general patterns still hold. Finally,

in all specifications, the coefficient on the interaction term of the decay function and high

SOE ratio is negative and statistically significant, which indicates a slower decay speed of

localization economies for industries with a higher SOE share.

5.4. Comparison with Rosenthal and Strange (2003)

We now contrast our estimates of industry-specific attenuation of agglomeration economies

with estimates from Rosenthal and Strange (2003) to reveal differences in the underlying

micro-foundations of agglomeration economies between China and the United States. Rosen-

thal and Strange (2003) estimate the determinants of firm entry for six industries: software

(SIC 7371, 7372, 7373, and 7375), food processing (SIC 20), apparel (SIC 23), printing and

publishing (SIC 27), fabricated metals (SIC 34), and industrial and commercial machinery

(SIC 35). Except for the software industry, we have corresponding estimates for the other five

manufacturing industries reported in their study. The corresponding industries in our data are food production

(CIC 14), garments and other fiber products (CIC 18), printing and record pressing (CIC 23),

metal products (CIC 34), and machinery and equipment manufacturing (CIC 35).

In the upper panel of Figure 3, we plot the estimated attenuation patterns for the five

manufacturing industries studied by Rosenthal and Strange (2003). In the middle panel, we

plot the same estimates but exclude the apparel industry on a different scale for easier com-

parison. We present, in the lower panel, the attenuation patterns of the Chinese counterparts

based on our estimates. To allow for easy comparison and interpretation, the vertical axis in

each figure is set such that the magnitude of the spillover benefits in the machinery industry

within the first ring in the corresponding study is equal to one; all other spillover effects are

measured relative to this value.48 The horizontal axis measures the spatial distance between

firms in the same industry, but the scale and the measurement unit vary between the two

studies. In Rosenthal and Strange (2003), as shown in the upper panel, the four concentric

rings represent 0-1 miles (0-1.6 km), 1-5 miles (1.6-8 km), 5-10 miles (8-16 km), and 10-

15 miles (16-24 km). In our setting, as shown in the lower panel, the five concentric rings

represent 0-1 km, 1-5 km, 5-10 km, 10-20 km, and 20-30 km.
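
The rescaling used for the cross-study comparison can be expressed in a few lines. This sketch divides every ring coefficient by the first-ring machinery coefficient of the same study and converts ring boundaries from miles to kilometres. The function names and the dictionary layout are hypothetical, and the exact conversion factor is used here, whereas the text rounds to 1.6 km per mile.

```python
MILES_TO_KM = 1.609344  # exact international mile in kilometres

def normalise(coefs_by_industry, base_industry, base_ring=0):
    """Rescale all ring coefficients so the base industry's first ring equals one.

    coefs_by_industry: dict mapping industry name -> list of ring coefficients.
    """
    base = coefs_by_industry[base_industry][base_ring]
    return {
        industry: [beta / base for beta in betas]
        for industry, betas in coefs_by_industry.items()
    }

def ring_bounds_km(bounds_miles):
    """Convert (inner, outer) ring boundaries from miles to kilometres."""
    return [(lo * MILES_TO_KM, hi * MILES_TO_KM) for lo, hi in bounds_miles]
```

Applying `normalise` separately to each study keeps the two sets of curves on a common vertical scale even though their ring definitions differ.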

Two interesting patterns emerge. First, the attenuation of agglomeration economies in

the apparel industry (SIC 23) in the United States is very fast, while the corresponding in-

dustry in China (CIC 18) has a much slower spatial decay of agglomeration economies. The

contrast implies that the apparel industry in the United States is more reliant on knowledge

spillovers or labour pooling mechanism than its Chinese counterpart. This is plausible as

the apparel industry in the United States could be more directly engaged in designing and

advertising, which require more idea sharing and networking. The apparel industry in China

is more focused on processing and manufacturing, which depend less on knowledge spillovers. Second,

for food products, fabricated metal, and machinery, the spatial decay tends to be slower in

the United States than in China, after adjusting for the different scales and distance units used

in both panels. This variation could be driven by the different extent of the transportation

infrastructure build-up in the two countries at the specific time periods.49 If the local

transportation system is not developed enough to allow for easy collaboration with

firms at moderate distances, agglomeration economies would be confined to a short range.

6. TFP and Other Correlates

While relying on firm birth patterns to identify agglomeration benefits has its advantages,

studying the impact of agglomeration economies on other productivity measures, such as

TFP and wages, has the advantage of easier interpretation.50 In a competitive

48The magnitude of the chosen base coefficient is actually very comparable across the two studies, despite a slight variation in the definition of the first rings. The impact of the first-ring localization economies for machinery in the United States is 6.35e-05 and the corresponding coefficient in China is 6.03e-05.

49In 2008, the localized transportation systems (for example, highways) in China were not as extensively developed as those in the United States in 1997 (the sample year in Rosenthal and Strange (2003)).

50The advantage of examining firms’ birth patterns is that the location choice of a new firm is more sensitive to agglomeration benefits than the changes in wages and TFP of existing firms, which can be constrained by

equilibrium, an increase in firm TFP or workers’ productivity, measured by nominal wage,

directly reflects the gains of agglomeration economies (Ciccone and Hall, 1996; Henderson,

2003; Combes et al., 2010; Glaeser et al., 2010). These measures also differ in that

the agglomeration benefits captured by TFP do not reflect effects related to land and housing

prices, which nevertheless contribute to wage differentials (Combes and Gobillon, 2015). This

is an advantage of focusing on TFP as opposed to wages, but the downside is that TFP is

not directly observable in data sets and the estimation of TFP could suffer from omitted

variable bias.51

In this section, we explore the attenuation of agglomeration economies using these al-

ternative productivity indicators. In columns (2)-(4) of Table 7, we report the impact of

localization effects at various distances on TFP, output per worker, and wages per worker,

with all industries pooled together. For comparison purposes, in column (1) of Table 7,

we also pool all industries together and estimate distance-specific localization impact on the

probability of firm birth. The general attenuation patterns appear for all four measures, even

though the implied attenuation speed and the spatial scope tend to be different. The largest

localization effects are always found in the 0-1 km ring, followed by localization effects in the

1-5 km ring. Whether the impact extends beyond 5 km depends on the specific measure of

productivity gains. In column (1), localization in the 5-10 km ring, but not those further

away, significantly contributes to the birth rates of new firms. In columns (2)-(3), the local-

ization effects on TFP and output per worker become largely insignificant or even negative

beyond the 5 km radius. In column (4), the wage effects of localization economies extend to a

much larger geographic scope with the coefficient on the 20-30 km ring remaining statistically

significant.

The variation in the attenuation slope and the corresponding spatial scope could be linked

to the nature of agglomeration economies implied by various productivity measures. For easy

previous choices such as investments in capital. The drawback, however, is that firms’ profits depend not only on productivity but also on input and output quantity which are themselves influenced by agglomeration effects.

51The economic interpretation of the impact of agglomeration economies is not the same for TFP and wages. The elasticity obtained from the TFP regressions needs to be multiplied by one over the share of labour to be directly comparable to that from wage regressions. In this paper, we focus on comparing the coefficients across different rings of the same measure as opposed to cross-regression comparisons.

comparison and interpretation across rings, we normalize the magnitude of the first-ring

coefficient as one in Table 7 and plot the attenuation patterns for all measures in Figure

4. Three interesting patterns emerge. First, as workers are highly mobile within the same

labour market, wages could be highly spatially correlated within a city, which leads to a slower

spatial attenuation. Second, firm entry decisions are affected by considerations of various

Marshallian forces and local cost factors. Since cost factors, such as wage, are affected by the

presence of employment at outer rings, the spatial attenuation documented with firm birth

patterns is faster than that captured using other productivity correlates. Third, unlike wage

and output per worker, TFP does not reflect the effects of land rental costs; the attenuation

patterns documented using TFP are flatter than those documented using wage and output per

worker.

In Table 8, we explore the role of the “Chinitz” effect, which is an important feature

of industrial organization that has been emphasized in Chinitz (1961), Vernon (1960), and

Jacobs (1969), as well as more recently in Rosenthal and Strange (2003) and Faggio et al.

(2017). The idea is that, relative to big firms, small firms are more effective in fostering

an innovative and collaborative community and, thus, are more important in contributing

to agglomeration economies and in enhancing nearby economic productivity. To test this

idea, we separately identify the impact of concentrations of small and big firms on the four

productivity measures.52 We find that the birth of new firms is more likely to be enhanced

by nearby employment concentrations of small firms in the 0-1 km and 1-5 km rings, but the

impact of employment concentration of big firms is slightly more pronounced in the 5-10 km

ring. More important, the implied attenuation speed of localization economies from small

firms is faster, which suggests that knowledge spillovers could be the main mechanism of

agglomeration economies for small firms facilitating the “Chinitz” effect.

Contrary to the findings for firm births, we find that the remaining three productivity measures of

firms are more affected by the presence of nearby big firms. We think the main reason for this

finding is the sample selection in the ASIF and the heterogeneous effect by firm size. As explained

in the data section, the remaining three productivity measures are calculated using the ASIF,

52Small firms are defined as those with employment size below the 10th percentile of all firms in the same industry.

which only covers private firms with annual sales exceeding five million yuan and all SOEs.

In other words, in columns (2)-(4) of Table 8, we find that the productivity of relatively big

firms is more affected by the presence of nearby big firms, which is plausible because labour

pooling and input sharing are generally more common among firms of comparable sizes. The

finding is consistent with evidence in Bloom et al. (2013) that suggests that larger firms have

a bigger gap between social and private returns.53

7. SOEs versus Non-SOEs

To investigate how the attenuation of agglomeration economies interacts with the large

presence of the state sector in China, we separately identify the impact of concentrations of

SOEs and non-SOEs at different distances on various outcome variables for all firms, SOEs,

and private firms, respectively. The results are presented in Table 9. Columns (1)-(3) focus

on the birth of new firms, with the first column pooling all firms together and columns (2)-

(3) looking at SOEs and non-SOEs separately. In a similar format, columns (4)-(12) report

the impact on firms’ TFP, output per worker, and wages per worker. The attenuation of

agglomeration economies within and between ownership sectors is plotted in Figure 5 for

better visual presentation and comparison.

An initial examination of columns (1)-(3) shows that the agglomeration spillovers within

and across firms of different ownership types follow attenuation patterns consistent with those we

found earlier. More important, the results on the birth of new firms reveal large heterogeneity

in how firms of different ownership types contribute to agglomeration economies and in the

cross-ownership agglomeration spillovers. First, pooling all firms together, column (1) shows

that on average the probability of firm birth is more responsive to the presence of private

firms than to that of SOEs, in both magnitude and geographic scope. For instance,

the impact of non-SOE employment in the 0-1 km ring is about four times as large as the

impact of SOE employment in the same ring. In terms of scope, the impact of non-SOE

firms extends to 10 km, while the SOE impact is confined within 1 km. Overall, the results

imply that private firms contribute to local agglomeration economies more than SOEs, which

is consistent with our prior that private firms are more efficient and market-oriented.

53Bloom et al. (2013) argue that larger firms generate more spillovers since they have a higher level of connectivity with other firms in the technology space.

Second, focusing on the birth of SOEs and non-SOEs separately, columns (2) and (3)

reveal that firms benefit more from the employment concentrations of their own ownership

type. In column (2), the birth rate of SOEs is more responsive to close-range SOE concentrations,

but is less affected by nearby non-SOE firms. For example, the impact of SOE employment

in the 0-1 km ring on SOE birth probability is six times as large as the impact of non-

SOE employment in the same ring. The results could be driven by the possibility that the

government tries to internalize potential spillover effects within SOEs when making location

choices for new SOEs. It could also be because SOEs are more likely to share labour, inputs,

and knowledge with each other, which produces larger externalities within the state sector.

In contrast, column (3) suggests that the birth rate of private firms is much more affected

by the nearby presence of private firms than by that of SOEs. The impact of private

firms is larger in magnitude and extends out to more than 10 km, while the impact of SOEs is

smaller and confined within 0-1 km. The evidence implies that private firms generate higher

productivity spillovers to other private firms nearby than to SOEs. Overall, the results suggest

that the pattern in column (1) is driven by the strong within-sector agglomeration spillovers

that are revealed in columns (2)-(3).

We also find that there exist stronger within-ownership-type agglomeration effects than

cross-ownership-type spillovers when we focus on the impact on existing firms’ TFP and

output per worker. Private firms have almost no localization effects on SOEs’ productivity,

while they have an especially pronounced effect on private firms’ productivity. SOEs show

significant localization effects on both SOEs and private firms, with the effect within 0-1 km

being positive and statistically significant only on SOEs. One difference from the results for

the birth rate is that the presence of SOEs also benefits private firms and this effect takes

place at a relatively large geographic scope. We do not have a strong prior to interpret this

finding, but we suspect that this could be driven by potential benefits for private firms to

work and partner with the stable and sizable state sector to share inputs.

We then focus on the wage effects on the existing firms in columns (10)-(12). Pooling firms

together, wages per worker are more affected by the presence of non-SOEs than SOEs. The

effect from private firms is larger in both magnitude and geographic scope. This could

be because private firms generate larger positive spillovers that

raise labour productivity on average. Compared to the presence of nearby SOEs, private

firms have a larger impact on wages in both the state sector and the private sector. Given

that the impact on SOEs’ wages is of a smaller geographic scope than the impact on private

firms’ wages, the former could mainly be driven by knowledge spillovers while the latter

could be driven by a combination of knowledge sharing, labour pooling, and input sharing.

SOEs’ impact on wages is relatively small and confined within a small geographic scope,

especially on wages in the private sector. This is consistent with the fact that the labour

flow from SOEs to non-SOEs is very limited in China. In particular, the fast attenuation of

SOEs’ spillover impact on private firms’ wages suggests that knowledge spillovers could be

the main Marshallian force at work.

8. Conclusion

Taking advantage of comprehensive geocoded administrative data sets, we examine a

complete list of twenty-nine Chinese manufacturing industries to estimate industry-specific

spatial attenuation speed of agglomeration economies and its interplay with the large pres-

ence of the state sector in China. We find that agglomeration economies attenuate sharply

with geographic distance. More interesting, there is notable heterogeneity across industry

and ownership types. Concentrations of private firms produce greater agglomeration benefits

to attract more new arrivals and induce higher increases in other productivity measures com-

pared to their state counterparts. This pattern is consistent with the argument that a more

entrepreneurial and more market-driven industrial system is more conducive to economic

growth. We also find that SOEs and non-SOEs benefit more from their own-type concentrations,

which suggests that agglomeration forces are significantly inhibited across ownership

types in China. It is also consistent with the fact that SOEs are capable of internalizing

potential spillovers.

The heterogeneity across industries is further explored and linked with the micro-foundations

of agglomeration economies – our nuanced full-spectrum analysis of this heterogeneity at refined

geographical levels allows us to statistically evaluate the underlying micro-foundations

that govern the spatial patterns of agglomeration economies. We find that agglomeration

benefits dissipate faster in industries more reliant on knowledge spillovers or labour market

pooling but slower in industries more reliant on input sharing or with a higher share of SOEs.

With the detailed estimates of industry- and distance-specific agglomeration economies for

all manufacturing industries, we also test for the goodness of fit of various spatial decay

functional forms. We find that the inverse square distance decay function presents the best

goodness of fit among the tested functional forms.
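
The comparison of decay functional forms can be illustrated with a minimal sketch. It assumes that each candidate form is fitted to ring-level coefficients by least squares through the origin and ranked by R-squared; this procedure, the ring midpoints, the set of candidate forms, and the coefficient values are all hypothetical choices made for illustration and are not the estimates or the exact method used in the paper. The illustrative coefficients are constructed to follow an inverse-square pattern so that the outcome of the comparison is transparent.

```python
import math

def fit_scale(coefs, weights):
    """Least-squares scale a in coef ~ a * weight (regression through the origin)."""
    num = sum(c * w for c, w in zip(coefs, weights))
    den = sum(w * w for w in weights)
    return num / den

def r_squared(coefs, fitted):
    """Share of the variation in coefs explained by the fitted values."""
    mean = sum(coefs) / len(coefs)
    ss_res = sum((c - f) ** 2 for c, f in zip(coefs, fitted))
    ss_tot = sum((c - mean) ** 2 for c in coefs)
    return 1.0 - ss_res / ss_tot

# Assumed midpoints (km) of the 0-1, 1-5, 5-10, 10-20 and 20-30 km rings.
midpoints = [0.5, 3.0, 7.5, 15.0, 25.0]
# Illustrative coefficients generated as 2e-5 / d^2; NOT estimates from the paper.
coefs = [2e-5 / d ** 2 for d in midpoints]

# Hypothetical candidate decay forms g(d); the fitted model is coef = a * g(d).
forms = {
    "inverse": lambda d: 1.0 / d,
    "inverse_square": lambda d: 1.0 / d ** 2,
    "exponential": lambda d: math.exp(-d),
}

results = {}
for name, g in forms.items():
    weights = [g(d) for d in midpoints]
    a = fit_scale(coefs, weights)
    results[name] = (a, r_squared(coefs, [a * w for w in weights]))

# Form with the highest R-squared on the illustrative data.
best = max(results, key=lambda k: results[k][1])
print(best, results[best])
```

With real ring-level estimates, the same loop would report one scale parameter and one goodness-of-fit statistic per functional form, which is the kind of comparison summarized in the text.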

The systematic evidence on the spatial attenuation speed of agglomeration economies

not only offers general guidance for theoretical models but also bears important policy implications.

Our results suggest that place-based policies aimed at boosting local agglomeration

economies (for example, the industrial parks policy in China) should take into consideration

the different attenuation speeds of agglomeration economies across industry and ownership

types. Moreover, policy makers may consider ways to improve the connections between

SOEs and non-SOEs and to facilitate stronger agglomeration economies across ownership

types. In future research, it would be interesting to investigate whether the spatial attenuation patterns

evolve over time using data sets that span a longer time horizon. It would also

be interesting to study how the agglomeration attenuation patterns in the service industries

may differ from those in the current manufacturing setting.

References

Acemoglu, D. (1996). A Microfoundation for Social Increasing Returns in Human Capital

Accumulation. The Quarterly Journal of Economics, 111(3):779–804.

Ackerberg, D. A., Caves, K., and Frazer, G. (2015). Identification Properties of Recent

Production Function Estimators. Econometrica, 83(6):2411–2451.

Ahlfeldt, G. M., Redding, S. J., Sturm, D. M., and Wolf, N. (2015). The Economics of

Density: Evidence From the Berlin Wall. Econometrica, 83(6):2127–2189.

Amemiya, T. (1974). The nonlinear two-stage least-squares estimator. Journal of Economet-

rics.

Andrews, I., Stock, J. H., and Sun, L. (2019). Weak Instruments in Instrumental Variables

Regression: Theory and Practice. Annual Review of Economics, 11(1):727–753.

Arzaghi, M. and Henderson, J. V. (2008). Networking off Madison Avenue. Review of

Economic Studies, 75:1011–1038.

Au, C.-C. and Henderson, J. V. (2006). How migration restrictions limit agglomeration and

productivity in China. Journal of Development Economics, 80:350–388.

Belloni, A., Chen, D. L., Chernozhukov, V., and Hansen, C. (2012). Sparse Models and

Methods for Optimal Instruments With an Application to Eminent Domain. Econometrica,

80(6):2369–2429.

Berliant, M., Peng, S. K., and Wang, P. (2002). Production Externalities and Urban Config-

uration. Journal of Economic Theory, 104(2):275–303.

Bloom, N., Schankerman, M., and Reenen, J. V. (2013). Identifying Technology Spillovers

and Product Market Rivalry. Econometrica, 81(4):1347–1393.

Brandt, L., Van Biesebroeck, J., Wang, L., and Zhang, Y. (2017). WTO Accession and

Performance of Chinese Manufacturing Firms. American Economic Review, 107(9):2784–

2820.

Brandt, L., Van Biesebroeck, J., and Zhang, Y. (2012). Creative Accounting or Creative

Destruction? Firm-level Productivity Growth in Chinese Manufacturing. Journal of De-

velopment Economics, 97(2):339–351.

Cao, X., Lemmon, M., Pan, X., Qian, M., and Tian, G. (2019). Political Promotion, CEO

Incentives, and the Relationship Between Pay and Performance. Management Science,

65(7):2947–2965.

Chamberlain, G. (1987). Asymptotic efficiency in estimation with conditional moment re-

strictions. Journal of Econometrics, 34(3):305–334.

Chauvin, J. P., Glaeser, E., Ma, Y., and Tobio, K. (2017). What Is Different About Urban-

ization In Rich and Poor Countries? Cities in Brazil, China, India and the United States.

Journal of Urban Economics, 98:17–49.

Chernozhukov, V., Hansen, C., and Spindler, M. (2015). Post-Selection and Post-

Regularization Inference in Linear Models with Many Controls and Instruments. American

Economic Review, 105(5):486–490.

Chinitz, B. (1961). Contrasts in Agglomeration: New York and Pittsburgh. The American


Ciccone, A. and Hall, R. E. (1996). Productivity and the Density of Economic Activity.

American Economic Review, 86(1):54–70.

Combes, P.-P., Demurger, S., and Li, S. (2015). Migration Externalities in Chinese Cities.

European Economic Review, 76:152–167.

Combes, P.-P., Duranton, G., and Gobillon, L. (2008). Spatial Wage Disparities: Sorting

Matters! Journal of Urban Economics, 63(2):723–742.

Combes, P.-P., Duranton, G., Gobillon, L., Puga, D., and Roux, S. (2012). The productivity

advantages of large cities: distinguishing agglomeration from firm selection. Econometrica,

80(6):2543–2594.

Combes, P.-P., Duranton, G., Gobillon, L., and Roux, S. (2010). Estimating Agglomeration

Economies with History, Geology and Worker Effects. In Agglomeration economics.

Combes, P.-P. and Gobillon, L. (2015). The Empirics of Agglomeration Economies. In

Handbook of Regional and Urban Economics, pages 247–348.

Costa, D. L. and Kahn, M. E. (2000). Power couples: Changes in the locational choice of the

college educated, 1940-1990. Quarterly Journal of Economics, 115(4):1287–1315.

Davis, D. R. and Dingel, J. I. (2019). A spatial knowledge economy. American Economic

Review, 109(1):153–170.

de la Roca, J. and Puga, D. (2017). Learning by working in big cities. Review of Economic

Studies, 84(1):106–142.

Dingel, J. I., Miscio, A., and Davis, D. R. (2019). Cities, Lights, and Skills in Developing

Economies. Journal of Urban Economics, page 103174.

Diodato, D., Neffke, F., and O’Clery, N. (2018). Why do industries coagglomerate? How

Marshallian externalities differ by industry and have evolved over time. Journal of Urban

Economics, 106:1–26.

Djankov, S. and Murrell, P. (2002). Enterprise Restructuring in Transition: A Quantitative

Survey. Journal of Economic Literature, 40(3):739–792.

Duranton, G. (2016). Agglomeration Effects in Colombia. Journal of Regional Science,

56(2):210–238.

Duranton, G. and Puga, D. (2004). Micro-Foundations of urban agglomeration economies.

Handbook of regional and urban economics.

Ellison, G. and Glaeser, E. L. (1999). The Geographic Concentration of Industry: Does

Natural Advantage Explain Agglomeration? American Economic Review, 89(2):311–316.

Ellison, G., Glaeser, E. L., and Kerr, W. R. (2010). What causes industry agglomeration?

Evidence from coagglomeration patterns. American Economic Review, 100(3):1195–1213.

Estrin, S., Hanousek, J., Kocenda, E., and Svejnar, J. (2009). The Effects of Privatization

and Ownership in Transition Economies. Journal of Economic Literature, 47(3):699–728.

Faggio, G., Silva, O., and Strange, W. C. (2017). Heterogeneous Agglomeration. Review of

Economics and Statistics, 99(1):80–94.

Fang, L. H., Lerner, J., and Wu, C. (2017). Intellectual Property Rights Protection, Ownership,

and Innovation: Evidence from China. Review of Financial Studies, 30(7):2446–2477.

Frank, L. E. and Friedman, J. H. (1993). A Statistical View of Some Chemometrics Regression

Tools. Technometrics, 35(2):109–135.

Fu, S. (2007). Smart Cafe Cities: Testing human capital externalities in the Boston metropoli-

tan area. Journal of Urban Economics, 61(1):86–111.

Fujita, M. and Ogawa, H. (1982). Multiple Equilibria and Structural Transition of Non-

monocentric Urban Configurations. Regional Science and Urban Economics, 12(2):161–

196.

Gaubert, C. (2018). Firm Sorting and Agglomeration. American Economic Review,

108(11):3117 – 3153.

Ge, S. and Yang, D. T. (2014). Changes in China’s Wage Structure. Journal of the European

Economic Association, 12(2):300–336.

Ge, Y. (2009). Globalization and Industry Agglomeration in China. World Development,

37(3):550–559.

Glaeser, E. and Kerr, W. R. (2009). Local Industrial Conditions and Entrepreneurship.

Journal of Economics & Management Strategy, 18(3):623–663.

Glaeser, E. L., Kallal, H. D., Scheinkman, J. A., and Shleifer, A. (1992). Growth in Cities.

Journal of Political Economy, 100(6):1126–1152.

Glaeser, E. L., Kerr, W. R., and Ponzetto, G. A. M. (2010). Clusters of entrepreneurship.

Journal of Urban Economics, 67(1):150–168.

Glaeser, E. L., Kominers, S. D., Luca, M., and Naik, N. (2018). Big Data and Big Cities:

The Promises and Limitations of Improved Measures of Urban Life. Economic Inquiry,

56(1):114–137.

Glaeser, E. L. and Mare, D. C. (2001). Cities and Skills. Journal of Labor Economics,

19(2):316–342.

Greenstone, M., Hornbeck, R., and Moretti, E. (2010). Identifying Agglomeration

Spillovers: Evidence from Winners and Losers of Large Plant Openings. Journal of Political

Economy, 118(3):536–598.

Henderson, J. (2003). Marshall’s scale economies. Journal of Urban Economics, 53(1):1–28.

Henderson, J. V., Kuncoro, A., and Turner, M. (1995). Industrial Development in Cities.

Journal of Political Economy, 103(5).

Holmes, T. J. (1999). Localization of industry and vertical disintegration. Review of Eco-

nomics and Statistics, 81(2):314–325.

Holz, C. A. (2018). The Unfinished Business of State-owned Enterprise Reform in the People’s

Republic of China. SSRN Electronic Journal, (December).

Hsieh, C.-T. and Song, Z. M. (2015). Grasp the Large, Let Go of the Small: The Transformation

of the State Sector in China. Brookings Papers on Economic Activity, pages

295–346.

Huang, Z., Li, L., Ma, G., and Xu, L. C. (2017). Hayek, Local Information, and Commanding

Heights: Decentralizing State-owned Enterprises in China. American Economic Review,

107(8):2455–2478.

Jacobs, J. (1969). The Economy of Cities. New York: Vintage.

Jofre-Monseny, J., Marín-López, R., and Viladecans-Marsal, E. (2011). The Mechanisms of

Agglomeration: Evidence from the Effect of Inter-industry Relations on the Location of

New Firms. Journal of Urban Economics, 70(2-3):61–74.

Krugman, P. (1991). Increasing Returns and Economic Geography. Journal of Political

Economy, 99(3):483–499.

Leeb, H. and Pötscher, B. M. (2008). Sparse Estimators and the Oracle Property, or the

Return of Hodges’ Estimator. Journal of Econometrics, 142(1):201–211.

Levinsohn, J. and Petrin, A. (2003). Estimating Production Functions Using Inputs to

Control for Unobservables. Review of Economic Studies, 70(2):317–341.

Li, J. (2014). The influence of state policy and proximity to medical services on health

outcomes. Journal of Urban Economics, 80:97–109.

Liu, S. (2015). Spillovers from universities: Evidence from the land-grant program. Journal

of Urban Economics, 87:25–41.

Loecker, J. D. and Warzynski, F. (2012). Markups and Firm-Level Export Status. American


Lu, J. and Tao, Z. (2009). Trends and determinants of China’s industrial agglomeration.


Lucas, R. E. (2001). Externalities and Cities. Review of Economic Dynamics.

Lucas, R. E. and Rossi-Hansberg, E. (2002). On the Internal Structure of Cities. Economet-

rica, 70(4):1445–1476.

Marshall, A. (1920). Principles of Economics. London: Macmillan.

Megginson, W. L. and Netter, J. M. (2001). From State to Market: A Survey of Empirical

Studies on Privatization. Journal of Economic Literature, 39(2):321–389.

Meng, X. (2012). Labor Market Outcomes and Reforms in China. Journal of Economic

Perspectives, 26(4):75–102.

Moretti, E. (2019). The Effect of High-Tech Clusters on the Productivity of Top Inventors.

NBER Working Paper Series, (12610):No. 26270.

Newey, W. K. (1990). Efficient Instrumental Variables Estimation of Nonlinear Models.

Econometrica.

Ogawa, H. and Fujita, M. (1980). Equilibrium Land Use Patterns in a Nonmonocentric City.

Journal of Regional Science, 20(4):455–475.

Olley, G. S. and Pakes, A. (1996). The Dynamics of Productivity in the Telecommunications

Equipment Industry. Econometrica, 64(6):1263.

Roback, J. (1982). Wages, Rents, and the Quality of Life. Journal of Political Economy,

90(6):1257–1278.

Rosen, H. S. (1979). Housing Decisions and the US Income Tax: An Econometric Analysis.

Journal of Public Economics, 11(1):1–23.

Rosenthal, S. S. and Strange, W. C. (2001). The Determinants of Agglomeration. Journal of

Urban Economics, 50(2):191–229.

Rosenthal, S. S. and Strange, W. C. (2003). Geography, Industrial Organization, and Ag-

glomeration. Review of Economics and Statistics, 85(May):377–393.

Rosenthal, S. S. and Strange, W. C. (2004). Evidence on the Nature and Sources of Agglom-

eration Economies. Handbook of regional and urban economics, 4:2119–2171.

Rosenthal, S. S. and Strange, W. C. (2005). The Geography of Entrepreneurship in the New

York Metropolitan Area. Federal Reserve Bank of New York Economic Policy Review,

11:29–53.

Rosenthal, S. S. and Strange, W. C. (2008). The Attenuation of Human Capital Spillovers.


Rosenthal, S. S. and Strange, W. C. (2019). How Close is Close? The Spatial Reach of

Agglomeration Economies. Working paper, 53(9):1689–1699.

Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal

Statistical Society. Series B: Statistical Methodology, 73(3):273–282.

Tombe, T. and Zhu, X. (2019). Trade, Migration, and Productivity: A Quantitative. Amer-

ican Economic Review, 109(5):1843–1872.

Vernon, R. (1960). Metropolis 1985. Harvard University Press.

Zhu, X. (2012). Understanding China’s Growth: Past, Present, and Future. Journal of

Economic Perspectives, 26(4):103–124.

Table 1a. Industry Concentrations

Top five cities by share of firms in each two-digit industry (CIC code, industry name: cities ranked from first to fifth)

CIC 13, Food Processing: Weihai (3.80%); Yantai (3.40%); Qingdao (3.19%); Linyi (3.18%); Weifang (3.04%)
CIC 14, Food Production: Shanghai (3.93%); Guangzhou (2.57%); Beijing (2.53%); Quanzhou (1.95%); Chengdu (1.94%)
CIC 15, Beverage Production: Yibin (3.46%); Beijing (2.46%); Xuzhou (2.22%); Suqian (1.81%); Luzhou (1.81%)
CIC 16, Tobacco Processing: Changsha (10.83%); Kunming (6.99%); Guiyang (6.27%); Zhengzhou (5.05%); Hefei (3.40%)
CIC 17, Textile Industry: Suzhou (6.45%); Shaoxing (5.05%); Binzhou (3.47%); Ningbo (3.26%); Hangzhou (3.25%)
CIC 18, Garments & Other Fiber Products: Quanzhou (5.32%); Suzhou (5.17%); Shanghai (5.00%); Guangzhou (4.09%); Ningbo (3.94%)
CIC 19, Leather, Furs, Down & Related Products: Quanzhou (12.81%); Dongguan (10.00%); Wenzhou (7.97%); Guangzhou (5.77%); Zhongshan (5.04%)
CIC 20, Timber Processing, Bamboo, Cane, Palm Fiber & Straw Products: Xuzhou (5.32%); Suqian (5.12%); Linyi (5.06%); Nanping (3.55%); Heze (2.98%)
CIC 21, Furniture Manufacturing: Dongguan (9.96%); Foshan (5.71%); Shanghai (5.57%); Shenzhen (5.36%); Zhongshan (3.49%)
CIC 22, Papermaking & Paper Products: Dongguan (4.21%); Hangzhou (3.88%); Shenzhen (3.83%); Suzhou (3.30%); Shanghai (2.49%)
CIC 23, Printing & Record Pressing: Shanghai (6.35%); Shenzhen (6.25%); Beijing (6.00%); Dongguan (5.44%); Foshan (4.91%)
CIC 24, Stationery, Educational & Sports Goods: Dongguan (15.39%); Shenzhen (8.44%); Ningbo (5.30%); Guangzhou (5.17%); Zhongshan (5.02%)
CIC 25, Petroleum Processing, Coking Products, Gas Production & Supply: Luliang (4.21%); Linfen (3.90%); Daqing (3.71%); Taiyuan (3.68%); Zibo (3.54%)
CIC 26, Raw Chemical Materials & Chemical Products: Shanghai (3.21%); Suzhou (2.67%); Guangzhou (2.19%); Zibo (2.11%); Tianjin (1.85%)
CIC 27, Medical & Pharmaceutical Products: Shanghai (3.50%); Shijiazhuang (3.46%); Beijing (3.25%); Haerbin (2.54%); Tianjin (2.44%)
CIC 28, Chemical Fibers: Suzhou (12.54%); Wuxi (7.92%); Jiaxing (7.33%); Hangzhou (6.45%); Shaoxing (5.73%)
CIC 29, Rubber Products: Qingdao (5.26%); Shanghai (4.78%); Dongguan (4.63%); Suzhou (4.18%); Guangzhou (3.90%)
CIC 30, Plastic Products: Dongguan (10.37%); Shenzhen (7.99%); Shanghai (5.41%); Suzhou (4.87%); Foshan (3.13%)
CIC 31, Non-metal Mineral Products: Foshan (3.53%); Zibo (3.30%); Quanzhou (3.27%); Zhengzhou (2.61%); Chongqing (2.17%)
CIC 32, Smelting & Pressing of Ferrous Metals: Tangshan (6.41%); Anshan (4.42%); Wuhan (3.90%); Tianjin (3.55%); Handan (3.45%)
CIC 33, Smelting & Pressing of Nonferrous Metals: Foshan (4.05%); Honghe (3.18%); Yantai (3.08%); Yuncheng (2.73%); Yingtan (2.66%)
CIC 34, Metal Products: Shanghai (6.52%); Foshan (4.73%); Shenzhen (4.41%); Suzhou (3.94%); Jiangmen (3.45%)
CIC 35, Machinery & Equipment Manufacturing: Shanghai (6.36%); Ningbo (4.31%); Suzhou (3.56%); Wuxi (3.50%); Dalian (3.40%)
CIC 36, Special Equipment Manufacturing: Shanghai (5.54%); Suzhou (4.98%); Shenzhen (4.46%); Dongguan (3.15%); Wuxi (3.06%)
CIC 37, Transportation Equipment Manufacturing: Chongqing (7.86%); Shanghai (5.90%); Changchun (3.31%); Tianjin (2.99%); Guangzhou (2.98%)
CIC 39, Electric Equipment & Machinery: Shenzhen (7.84%); Dongguan (5.90%); Foshan (5.67%); Shanghai (5.09%); Ningbo (4.97%)
CIC 40, Electronic & Telecommunications: Shenzhen (19.86%); Suzhou (15.64%); Dongguan (9.08%); Shanghai (6.00%); Huizhou (3.51%)
CIC 41, Instruments, Meters, Cultural & Official Machinery: Shenzhen (10.07%); Dongguan (5.95%); Shanghai (5.72%); Suzhou (4.22%); Ningbo (4.14%)
CIC 42, Artwork & Other Manufacturing: Quanzhou (8.70%); Shenzhen (5.15%); Guangzhou (4.99%); Taizhou (4.76%); Jinhua (4.31%)

1 For each two-digit industry and each city, the number in parentheses is calculated as n_{s,c}/n_s, where n_{s,c} is the total number of firms in industry s in city c, and n_s is the total number of firms in industry s. The top five cities with the largest percentage of firms in the respective industry are listed.
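
The share defined in the note to Table 1a can be sketched in a few lines. This is only an illustration of the ratio n_{s,c}/n_s: the function name, the record layout (one industry-city pair per firm), and the toy firm counts are assumptions, not the authors' code or data.

```python
from collections import Counter

def top_city_shares(firms, industry, k=5):
    """Return the k cities with the largest share n_{s,c} / n_s for one industry.

    `firms` is an iterable of (industry_code, city) pairs, one per firm
    (a hypothetical layout chosen for this sketch).
    """
    counts = Counter(city for ind, city in firms if ind == industry)
    total = sum(counts.values())  # n_s: all firms in the industry
    if total == 0:
        return []
    # most_common sorts cities by firm count n_{s,c} in descending order
    return [(city, n / total) for city, n in counts.most_common(k)]

# Toy records for illustration only; not data from the paper.
firms = (
    [("13", "Weihai")] * 3
    + [("13", "Yantai")] * 2
    + [("13", "Qingdao")]
    + [("14", "Shanghai")] * 4
)
shares = top_city_shares(firms, "13")
for city, share in shares:
    print(f"{city} ({share:.2%})")
```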

Table 1b. Summary Statistics - Concentric Ring Measures of Existing Employment

Name N Mean Std Dev
Localization Measures
0 - 1 km 23,434,810 2.967718 114.6702
1 - 5 km 23,434,810 48.25242 631.9982
5 - 10 km 23,434,810 78.67449 877.9372
10 - 20 km 23,434,810 219.3559 1807.342
20 - 30 km 23,434,810 293.1631 2140.247
Urbanization Measures
0 - 1 km 23,434,810 1144.517 3891.315
1 - 5 km 23,434,810 19282.36 38718.05
5 - 10 km 23,434,810 31659.85 65016.37
10 - 20 km 23,434,810 88373.29 166638.7
20 - 30 km 23,434,810 117919.8 209310.5

1 Localization measures are calculated as the sum of within four-digit industry employment in respective concentric rings.

2 Urbanization measures are calculated as the sum of all manufacturing employment excluding own four-digit industry in respective concentric rings.
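
The concentric ring measures described in the notes to Table 1b can be sketched as follows. The sketch assumes great-circle (haversine) distances from a grid centre, half-open ring intervals, and a simple tuple layout for firm records; these choices, the function names, and the toy coordinates are assumptions made for illustration and may differ from the construction actually used.

```python
import math

# Ring boundaries in km for the 0-1, 1-5, 5-10, 10-20 and 20-30 km rings.
RING_EDGES_KM = [0.0, 1.0, 5.0, 10.0, 20.0, 30.0]
EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def ring_employment(origin, industry, firms):
    """Sum employment by ring around `origin` (lat, lon).

    Localization: employment in the same four-digit industry.
    Urbanization: all other manufacturing employment.
    `firms` holds (lat, lon, four_digit_industry, employment) tuples
    (a hypothetical layout). Half-open rings [lower, upper) are an assumption.
    """
    n_rings = len(RING_EDGES_KM) - 1
    localization = [0.0] * n_rings
    urbanization = [0.0] * n_rings
    for lat, lon, firm_industry, employment in firms:
        dist = haversine_km(origin[0], origin[1], lat, lon)
        for i in range(n_rings):
            if RING_EDGES_KM[i] <= dist < RING_EDGES_KM[i + 1]:
                target = localization if firm_industry == industry else urbanization
                target[i] += employment
                break  # firms beyond 30 km fall in no ring and are ignored
    return localization, urbanization

# Toy records for illustration only; not data from the paper.
origin = (31.0, 121.0)
firms = [
    (31.005, 121.0, "1310", 40),   # roughly 0.6 km away, same industry
    (31.03, 121.0, "1310", 100),   # roughly 3.3 km away, same industry
    (31.03, 121.0, "1410", 250),   # roughly 3.3 km away, other industry
    (31.07, 121.0, "1410", 60),    # roughly 7.8 km away, other industry
    (31.5, 121.0, "1310", 999),    # roughly 56 km away, outside all rings
]
loc, urb = ring_employment(origin, "1310", firms)
print(loc, urb)
```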

Table 1c. Summary Statistics - Grid Level Four-Digit Industry Firm Birth Share

Two-digit Industry Name CIC Code N Mean Std Dev

Food Processing 13 996705 1.50E-05 0.001
Food Production 14 1173972 1.62E-05 0.001
Beverage Production 15 756300 1.59E-05 0.001
Tobacco Processing 16 44346 6.76E-05 0.007
Textile Industry 17 1194920 1.67E-05 0.001
Garments & Other Fibre Products 18 165315 1.81E-05 0.001
Leather, Furs, Down & Related Products 19 485520 2.06E-05 0.002
Timber Processing, Bamboo, Cane, Palm Fibre & Straw Products 20 479000 1.67E-05 0.001
Furniture Manufacturing 21 250010 2.00E-05 0.002
Papermaking & Paper Products 22 295940 1.35E-05 0.001
Printing & Record Pressing 23 261945 1.91E-05 0.001
Stationery, Educational & Sports Goods 24 578214 2.25E-05 0.003
Petroleum Processing, Coking Products, Gas Production & Supply 25 183640 2.18E-05 0.001
Raw Chemical Materials & Chemical Products 26 1953240 1.43E-05 0.001
Medical & Pharmaceutical Products 27 361578 1.66E-05 0.001
Chemical Fibres 28 241101 2.49E-05 0.002
Rubber Products 29 453618 1.98E-05 0.001
Plastic Products 30 537660 1.67E-05 0.001
Non-metal Mineral Products 31 2026950 1.43E-05 0.001
Smelting & Pressing of Ferrous Metals 32 232588 1.72E-05 0.001
Smelting & Pressing of Nonferrous Metals 33 489096 1.64E-05 0.001
Metal Products 34 1070748 1.68E-05 0.001
Machinery & Equipment Manufacturing 35 1884769 1.59E-05 0.001
Special Equipment Manufacturing 36 2352880 1.66E-05 0.002
Transportation Equipment Manufacturing 37 1213107 1.65E-05 0.002
Electric Equipment & Machinery 39 1407264 1.71E-05 0.001
Electronic & Telecommunications 40 684292 2.05E-05 0.002
Instruments, Meters, Cultural & Official Machinery 41 1058784 1.98E-05 0.003
Artwork & Other Manufacturing 42 601308 2.00E-05 0.001
ALL NA 23,434,810 1.70E-05 0.002

1 Firm birth share is defined as the percentage of new firms in a grid out of all new firms in the four-digit industry.

Table 2. OLS Estimates - Firm Birth Share

(1) (2) (3) (4) (5) (6) (7) (8) (9)

Columns (two-digit industry name, CIC code): (1) Food Production (CIC 14); (2) Tobacco Processing (CIC 16); (3) Furniture Manufacturing (CIC 21); (4) Printing & Record Pressing (CIC 23); (5) Medical & Pharmaceutical Products (CIC 27); (6) Transportation Equipment Manufacturing (CIC 37); (7) Electronic & Telecommunications (CIC 40); (8) Instruments, Meters, Cultural & Official Machinery (CIC 41); (9) Artwork & Other Manufacturing (CIC 42)

Localization Effects

0-1 km 6.87E-05 -1.78E-05 3.51E-05 2.09E-05 5.06E-05 8.71E-05 0.000107 0.000146 0.000171

(5.09) (-1.03) (3.79) (2.30) (4.76) (5.02) (4.92) (2.81) (6.01)

1-5 km 3.57E-06 2.13E-05 4.69E-06 1.71E-06 5.04E-07 7.53E-06 3.58E-06 5.51E-06 5.37E-06

(2.10) (0.67) (1.91) (0.90) (0.54) (2.69) (1.46) (1.06) (2.24)

5-10 km 9.20E-07 -4.81E-06 3.08E-06 -2.10E-06 -2.88E-08 1.47E-06 -5.85E-07 3.37E-06 7.26E-06

(0.80) (-0.25) (2.37) (-1.26) (-0.03) (0.97) (-0.30) (1.25) (3.86)

10-20 km 1.47E-06 3.91E-06 -3.01E-07 -2.10E-06 5.67E-08 -1.93E-06 -2.25E-06 6.16E-07 2.73E-06

(1.01) (0.19) (-0.09) (-1.03) (0.08) (-1.03) (-0.96) (0.25) (1.51)

20-30 km 2.08E-07 1.05E-05 -9.37E-08 -2.62E-07 -2.63E-06 -3.16E-06 -4.46E-06 -3.87E-06 -4.54E-07

(0.12) (0.63) (-0.06) (-0.17) (-2.84) (-1.89) (-2.17) (-0.92) (-0.26)

Urbanization Effects

0-1 km 1.24E-05 5.90E-05 1.62E-05 2.33E-05 1.90E-05 1.33E-05 1.65E-05 1.25E-05 1.37E-05

(6.01) (1.41) (3.34) (5.62) (6.77) (5.53) (4.76) (4.62) (5.39)

1-5 km -2.44E-06 -7.46E-05 -6.61E-06 8.48E-07 -4.64E-06 -1.60E-06 -2.92E-06 1.68E-06 -8.20E-06

(-2.26) (-1.56) (-1.55) (0.39) (-3.35) (-1.61) (-1.37) (0.64) (-4.06)

5-10 km -8.96E-08 8.99E-05 -3.80E-06 -2.72E-06 -3.28E-06 -9.99E-07 2.68E-07 -5.63E-06 -9.37E-07

(-0.07) (1.92) (-0.61) (-0.99) (-1.63) (-0.98) (0.12) (-1.44) (-0.47)

10-20 km -6.56E-06 -2.98E-06 1.52E-06 -1.73E-06 -5.49E-06 -5.70E-06 -3.35E-06 -4.82E-06 -1.12E-06

(-3.58) (-0.03) (0.22) (-0.56) (-2.49) (-2.71) (-0.87) (-1.43) (-0.37)

20-30 km 6.08E-07 -2.24E-04 -9.64E-07 -3.67E-06 2.32E-06 -7.00E-08 -4.04E-06 -5.84E-06 -5.95E-06

(0.35) (-1.34) (-0.18) (-0.69) (1.04) (-0.03) (-1.06) (-1.61) (-1.24)

Observations 1,173,972 44,346 250,010 261,945 361,578 1,213,107 684,292 1,058,784 601,308

Adj. R-squared 0.001 0.007 0.002 0.002 0.003 0.001 0.002 0.001 0.005

1 Coefficients reported are ring level localization and urbanization effects obtained by OLS estimation of equation (3.1) for each two-digit industry.

2 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

3 Numbers in parentheses are t-statistics clustered at the grid level.
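
The Herfindahl controls mentioned in the table notes can be illustrated with a minimal sketch. It assumes the standard definition, the sum of squared shares, applied either to firm-level employment within a four-digit industry (industry organization) or to industry-level employment (industry diversity) within 30 km of a grid. The function name and the toy inputs are hypothetical, and the exact construction of these controls may differ.

```python
def herfindahl(values):
    """Herfindahl index: sum of squared shares of non-negative `values`."""
    total = sum(values)
    if total <= 0:
        raise ValueError("Herfindahl index is undefined when total is not positive")
    return sum((v / total) ** 2 for v in values)

# Toy inputs for illustration only; not data from the paper.
# Industry organization: employment of each firm in one four-digit industry.
hhi_equal_firms = herfindahl([1, 1, 1, 1])   # four equal firms
hhi_single_firm = herfindahl([10])           # a single firm
# Industry diversity: employment of each industry around a grid.
hhi_industries = herfindahl([300, 100, 100])
print(hhi_equal_firms, hhi_single_firm, hhi_industries)
```

The index ranges from 1/n for n equally sized units to 1 for a single unit, so lower values indicate a more fragmented or more diverse local structure.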

Table 3. IV Lasso Estimates - Firm Birth Share

(1) (2) (3) (4) (5) (6) (7) (8) (9)

Columns (two-digit industry name, CIC code): (1) Food Production (CIC 14); (2) Tobacco Processing (CIC 16); (3) Furniture Manufacturing (CIC 21); (4) Printing & Record Pressing (CIC 23); (5) Medical & Pharmaceutical Products (CIC 27); (6) Transportation Equipment Manufacturing (CIC 37); (7) Electronic & Telecommunications (CIC 40); (8) Instruments, Meters, Cultural & Official Machinery (CIC 41); (9) Artwork & Other Manufacturing (CIC 42)

Localization Effects

0-1 km 6.77E-05 -6.33E-05 2.94E-05 2.42E-05 4.58E-05 0.0001072 8.64E-05 0.0002248 0.0001889

(4.44) (-1.00) (3.58) (1.88) (4.41) (4.01) (4.04) (3.05) (5.45)

1-5 km 6.98E-06 -5.40E-05 -1.77E-06 -2.85E-06 2.57E-06 -5.20E-06 5.28E-06 3.41E-06 1.78E-05

(1.79) (-1.00) (-0.50) (-0.52) (1.12) (-0.58) (0.74) (0.36) (1.27)

5-10 km 1.05E-05 -4.65E-05 1.91E-06 -1.25E-06 2.51E-06 2.82E-06 6.90E-06 -4.18E-06 5.32E-06

(2.36) (-1.00) (0.35) (-0.19) (1.10) (0.44) (1.05) (-0.64) (0.44)

10-20 km -8.15E-07 -4.58E-05 6.02E-06 -3.78E-06 6.26E-07 -3.49E-06 1.58E-06 -8.57E-06 -6.53E-06

(-0.15) (-1.00) (0.74) (-0.56) (0.24) (-0.46) (0.27) (-0.90) (-0.66)

20-30 km 2.25E-06 -5.08E-05 -7.49E-06 1.01E-05 -2.39E-06 3.24E-06 6.47E-07 -1.45E-05 -6.34E-06

(0.42) (-1.00) (-0.77) (0.68) (-0.80) (0.35) (0.10) (-1.06) (-0.50)

Urbanization Effects

0-1 km 1.40E-05 -5.34E-06 2.42E-05 2.45E-05 1.73E-05 1.90E-05 2.41E-05 1.36E-05 2.15E-05

(5.79) (-0.97) (3.16) (5.25) (5.26) (3.95) (3.76) (2.67) (4.61)

1-5 km -7.01E-06 2.89E-05 -7.57E-06 8.43E-06 -7.36E-06 1.91E-06 -1.38E-06 -1.32E-05 -2.93E-05

(-2.42) (1.00) (-2.48) (1.91) (-2.99) (0.33) (-0.20) (-1.34) (-2.64)

5-10 km -8.59E-06 7.20E-06 3.11E-06 -7.76E-06 -7.16E-06 -4.03E-07 -1.40E-05 2.11E-06 -3.83E-06

(-2.01) (0.96) (0.43) (-0.94) (-1.78) (-0.06) (-1.84) (0.14) (-0.30)

10-20 km -6.51E-06 -5.39E-05 -2.42E-05 6.91E-06 -4.58E-06 -4.55E-06 -9.56E-06 -1.84E-06 9.33E-06

(-0.95) (-1.00) (-1.51) (0.61) (-0.80) (-0.31) (-0.72) (-0.10) (0.53)

20-30 km -1.99E-06 1.23E-05 1.09E-05 -2.27E-05 4.45E-07 -1.81E-05 -1.50E-05 1.82E-06 -1.20E-05

(-0.27) (0.99) (0.88) (-1.13) (0.08) (-0.83) (-0.94) (0.08) (-0.57)

Observations 1,173,972 44,346 250,010 261,945 361,578 1,213,107 684,292 1,058,784 601,308

F stats 58.09 16.34 18.56 16.9 58.42 92.42 47.03 51.48 213.3

1 Coefficients reported are ring level localization and urbanization effects obtained by IV Lasso estimation of equation (3.1) for each two-digit industry.

2 Post-lasso-orthogonalized variables are used in IV regression.

3 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

4 Numbers in parentheses are t-statistics clustered at the grid level.

5 First stage F-statistics using post-lasso-orthogonalized variables are reported.


Table 4. Spatial Decay Speed with Inverse Square Distance Decay Function Based on IV Lasso Estimates

Column Name Code Decay Speed (t-statistic)

(1) Food Processing 13 4.47E-05 (11.50)
(2) Food Production 14 6.21E-05 (11.71)
(3) Beverage Production 15 8.58E-05 (13.40)
(4) Tobacco Processing 16 -2.42E-05 (-3.42)
(5) Textile Industry 17 7.93E-05 (16.54)
(6) Garments & Other Fiber Products 18 2.99E-05 (7.68)
(7) Leather, Furs, Down & Related Products 19 0.000145 (15.12)
(8) Timber Processing, Bamboo, Cane, Palm Fiber & Straw Products 20 6.43E-05 (14.68)
(9) Furniture Manufacturing 21 3.39E-05 (7.29)
(10) Papermaking & Paper Products 22 2.65E-05 (7.81)
(11) Printing & Record Pressing 23 2.22E-05 (4.82)
(12) Stationery, Educational & Sports Goods 24 0.000147 (12.88)
(13) Petroleum Processing, Coking Products, Gas Production & Supply 25 9.41E-05 (12.25)
(14) Raw Chemical Materials & Chemical Products 26 6.30E-05 (15.53)
(15) Medical & Pharmaceutical Products 27 4.06E-05 (9.36)
(16) Chemical Fibers 28 0.000317 (17.60)
(17) Rubber Products 29 0.000161 (18.07)
(18) Plastic Products 30 3.86E-05 (10.78)
(19) Non-metal Mineral Products 31 6.42E-05 (15.60)
(20) Smelting & Pressing of Ferrous Metals 32 4.08E-05 (9.95)
(21) Smelting & Pressing of Nonferrous Metals 33 6.92E-05 (12.44)
(22) Metal Products 34 4.35E-05 (11.46)
(23) Machinery & Equipment Manufacturing 35 5.97E-05 (15.09)
(24) Special Equipment Manufacturing 36 8.80E-05 (15.63)
(25) Transportation Equipment Manufacturing 37 8.90E-05 (14.19)
(26) Electric Equipment & Machinery 39 4.89E-05 (12.87)
(27) Electronic & Telecommunications 40 0.000110 (15.71)
(28) Instruments, Meters, Cultural & Official Machinery 41 0.000147 (13.61)
(29) Artwork & Other Manufacturing 42 0.000169 (21.14)

Observations 145 Adj. R-squared 0.974

1 Coefficients reported are two-digit-industry-specific spatial decay speed obtained by OLS estimation of equation (3.2), where localization effects in IV Lasso estimation (weighted by 1/sd) are regressed on two-digit industry dummies and interaction terms of the decay function and two-digit industry dummies.

2 The spatial decay function is specified as f(d) = 1/d^2.

3 Numbers in parentheses are t-statistics.
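The regression described in note 1 can be illustrated with a minimal Python sketch. Because the specification interacts every term with industry dummies, the industry-specific slope on f(d) coincides with the slope from a separate weighted regression for that industry, which is what the hypothetical helper `decay_speed` computes. Two points are assumptions rather than details taken from the paper: each ring is represented by its outer edge in kilometres, and 1/sd is treated as an observation weight, with sd backed out from the reported coefficient and t-statistic. The output is therefore not expected to reproduce the table exactly.

```python
def decay_speed(distances_km, coefs, sds):
    """Return (intercept, slope) from a weighted regression of ring
    coefficients on f(d) = 1/d^2, using 1/sd as observation weights."""
    f = [1.0 / d ** 2 for d in distances_km]
    w = [1.0 / s for s in sds]
    total = sum(w)
    f_bar = sum(wi * fi for wi, fi in zip(w, f)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, coefs)) / total
    sxy = sum(wi * (fi - f_bar) * (yi - y_bar) for wi, fi, yi in zip(w, f, coefs))
    sxx = sum(wi * (fi - f_bar) ** 2 for wi, fi in zip(w, f))
    slope = sxy / sxx
    return y_bar - slope * f_bar, slope


# Assumption: each ring is represented by its outer edge (km).
RING_DISTANCES_KM = [1.0, 5.0, 10.0, 20.0, 30.0]

# Food Production (code 14): IV Lasso localization coefficients and
# t-statistics as reported in Table 3; sd is backed out as |coef / t|.
coefs = [6.77e-05, 6.98e-06, 1.05e-05, -8.15e-07, 2.25e-06]
t_stats = [4.44, 1.79, 2.36, -0.15, 0.42]
sds = [abs(c / t) for c, t in zip(coefs, t_stats)]

intercept, slope = decay_speed(RING_DISTANCES_KM, coefs, sds)
print(f"slope on 1/d^2: {slope:.3e}")
```

A positive slope means that the localization effect is concentrated in the nearest ring and falls off quickly with distance.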


Table 5. RMSE & MAE with Different Spatial Decay Functions

Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8 Model 9

New Firm

OLS RMSE 1.30E-05 4.50E-06 2.20E-06 1.40E-05 1.90E-06 2.60E-06 1.40E-05 2.40E-06 2.60E-06

MAE 2.85E-05 7.87E-06 2.54E-06 2.90E-05 2.33E-06 2.86E-06 2.92E-05 2.72E-06 2.86E-06

LASSO RMSE 2.30E-05 7.70E-06 5.90E-06 2.40E-05 5.70E-06 6.10E-06 1.80E-05 1.40E-05 1.40E-05

MAE 3.39E-05 9.26E-06 5.85E-06 3.48E-05 5.74E-06 5.98E-06 5.81E-05 4.68E-05 4.68E-05

New Employment

OLS RMSE 1.40E-05 5.20E-06 2.50E-06 1.50E-05 2.30E-06 2.80E-06 1.50E-05 2.70E-06 2.80E-06

MAE 3.07E-05 8.86E-06 2.78E-06 3.14E-05 2.67E-06 3.06E-06 3.15E-05 2.93E-06 3.06E-06

LASSO RMSE 1.90E-05 1.70E-05 1.60E-05 2.60E-05 7.60E-06 8.00E-06 2.70E-05 7.90E-06 8.00E-06

MAE 6.14E-05 5.39E-05 5.36E-05 3.75E-05 7.34E-06 7.47E-06 3.74E-05 7.37E-06 7.48E-06

1 Root mean squared errors (RMSE) and mean absolute errors (MAE) are based on the residuals of OLS and IV Lasso estimation of equation (3.2), where localization effects (weighted by 1/sd) are regressed on two-digit industry dummies and interaction terms of the spatial decay function and two-digit industry dummies.

2 Nine spatial decay functions are tested:

Model 1: f(d) = −d

Model 2: f(d) = 1/d

Model 3: f(d) = 1/e^d

Model 4: f(d) = −d^2

Model 5: f(d) = 1/d^2

Model 6: f(d) = 1/e^(2d)

Model 7: f(d) = −d^3

Model 8: f(d) = 1/d^3

Model 9: f(d) = 1/e^(3d)
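The comparison across functional forms can be sketched in Python as follows. The nine functions follow the list in note 2, reading Models 6 and 9 as 1/e^(2d) and 1/e^(3d). The helper names (`fit_residuals`, `compare_models`) are hypothetical, and the sketch fits a single unweighted line per model, which is a simplification of the weighted, industry-interacted specification in equation (3.2).

```python
import math

# Candidate decay functions, keyed by model number.
DECAY_FUNCTIONS = {
    1: lambda d: -d,
    2: lambda d: 1.0 / d,
    3: lambda d: math.exp(-d),
    4: lambda d: -d ** 2,
    5: lambda d: 1.0 / d ** 2,
    6: lambda d: math.exp(-2.0 * d),
    7: lambda d: -d ** 3,
    8: lambda d: 1.0 / d ** 3,
    9: lambda d: math.exp(-3.0 * d),
}


def fit_residuals(distances, effects, f):
    """Residuals from an unweighted OLS fit of effects on f(distance)."""
    x = [f(d) for d in distances]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(effects) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, effects)]


def rmse(residuals):
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def mae(residuals):
    return sum(abs(r) for r in residuals) / len(residuals)


def compare_models(distances, effects):
    """Map each model number to its (RMSE, MAE) pair."""
    scores = {}
    for model, f in DECAY_FUNCTIONS.items():
        res = fit_residuals(distances, effects, f)
        scores[model] = (rmse(res), mae(res))
    return scores


# Synthetic illustration: effects generated from an inverse-square decay.
distances = [1.0, 5.0, 10.0, 20.0, 30.0]
effects = [1e-6 + 6e-5 / d ** 2 for d in distances]
scores = compare_models(distances, effects)
best = min(scores, key=lambda m: scores[m][0])
print("model with lowest RMSE:", best)
```

With real ring-level estimates in place of the synthetic `effects`, the same loop ranks the functional forms by fit.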


Table 6. Spatial Decay Speed and Industry Characteristics

Regression Using OLS Estimates of Localization Effects

(1) (2) (3) (4)

f(d) 0.000083 0.000083 0.000097 0.000077
(9.12) (8.91) (10.21) (7.11)

f(d) × knowledge spillovers 0.000036 0.000021
(3.49) (1.00)

f(d) × labor market pooling 0.000038 0.000023
(3.35) (0.94)

f(d) × input sharing -4.11E-06 0.000010
(-0.38) (0.88)

f(d) × natural advantage 7.17E-06 0.000013 7.71E-06 7.91E-06
(0.71) (1.31) (0.70) (0.72)

f(d) × high SOE share -0.000047 -0.000054 -0.000036 -0.000053
(-4.48) (-4.73) (-3.44) (-4.39)

Constant 9.45E-08 9.45E-08 -2.39E-06 9.45E-08
(0.04) (0.04) (-1.22) (0.04)

Adj. R-squared 0.6163 0.6138 0.5833 0.6142

Regression Using LASSO Estimates of Localization Effects

(1) (2) (3) (4)

f(d) 0.000089 0.000088 0.00010 0.000086
(7.72) (7.56) (8.88) (6.26)


f(d) × labor market pooling 0.000039 0.000016
(2.72) (0.53)

f(d) × input sharing -9.43E-06 3.74E-06
(-0.70) (0.25)

f(d) × natural advantage 7.50E-06 0.000014 9.68E-06 9.14E-06
(0.59) (1.08) (0.71) (0.66)

f(d) × high SOE share -0.000045 -0.000052 -0.000033 -0.000049
(-3.36) (-3.57) (-2.54) (-3.18)

Constant -1.22E-06 -1.22E-06 -1.22E-06 -1.22E-06
(-0.38) (-0.38) (-0.37) (-0.37)

Adj. R-squared 0.5470 0.5379 0.5152 0.5350

1 Results are obtained by OLS estimation of (3.3), where localization effects (from either OLS or IV Lasso estimation) are regressed on a spatial decay function, and interaction terms of the decay function and various industry characteristic indicators.

2 The spatial decay function is specified as f(d) = 1/d^2.

3 For each two-digit industry, the indicator for reliance on knowledge spillovers equals one if the ratio of new product to total product in the industry is higher than the median of all industries and zero otherwise. The indicator for reliance on labor pooling equals one if the percentage of college-educated workers in the industry is higher than the median of all industries and zero otherwise. The indicator for reliance on input sharing equals one if transportation cost per shipment in the industry is higher than the median of all industries and zero otherwise. The indicator for reliance on natural advantage equals one if at least two of the three cost variables (water, energy, and natural resources cost per shipment) in the industry are higher than the median of all industries and zero otherwise. The indicator variable high SOE share equals one if the percentage of SOE firms in the industry is higher than the median of all industries and zero otherwise.

4 Numbers in parentheses are t-statistics.
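The indicator construction in note 3 amounts to a median split across industries. A minimal Python sketch under that reading follows. The function names and all input values are hypothetical placeholders, not figures from the data.

```python
from statistics import median


def above_median_indicator(values):
    """values: dict mapping industry code -> characteristic.
    Returns 1 where the value exceeds the cross-industry median, else 0."""
    m = median(values.values())
    return {code: int(v > m) for code, v in values.items()}


def natural_advantage_indicator(water, energy, resources):
    """1 if at least two of the three cost variables exceed their medians."""
    flags = [above_median_indicator(x) for x in (water, energy, resources)]
    return {code: int(sum(f[code] for f in flags) >= 2) for code in water}


# Hypothetical values for four industries, for illustration only.
new_product_ratio = {13: 0.1, 14: 0.2, 15: 0.3, 16: 0.4}
water = {13: 1.0, 14: 2.0, 15: 3.0, 16: 4.0}
energy = {13: 4.0, 14: 3.0, 15: 2.0, 16: 1.0}
resources = {13: 1.0, 14: 4.0, 15: 2.0, 16: 3.0}

knowledge = above_median_indicator(new_product_ratio)
natural = natural_advantage_indicator(water, energy, resources)
print(knowledge)
print(natural)
```

Each resulting 0/1 indicator would then be interacted with f(d) in the regression reported above.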


Table 7. TFP and Other Correlates - Full Sample Localization Economies

Firm Birth Share TFP Output Per Worker Wages Per Worker

0-1 km 6.92E-05 0.000745 0.0154 0.0178

(23.11) (7.37) (24.70) (42.72)

1-5 km 3.94E-06 0.000533 0.00465 0.00441

(10.75) (5.06) (7.33) (11.45)

5-10 km 1.27E-06 9.84E-05 -0.00123 -0.00163

(4.57) (0.84) (-1.73) (-3.82)

10-20 km 9.80E-08 -0.000146 -0.00243 0.00228

(0.34) (-1.02) (-2.82) (4.46)

20-30 km -1.63E-06 -7.45E-06 -0.00322 0.00268

(-4.99) (-0.05) (-3.75) (5.30)

Observations 23,434,810 1,386,056 1,386,056 1,386,056

Adj. R-squared 0.0011 0.434 0.784 0.874

1 Coefficients reported are ring level localization effects obtained by OLS estimation of equation (3.1) for the full sample. Four different productivity measures, firm birth share, TFP, output per worker, and wages per worker, are used as the dependent variable.

2 Control variables include ring level urbanization measures, Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry



Table 8. Localization Economies of Small versus Big Firms

Firm Birth Share TFP Output Per Worker Wages Per Worker

0-1 km 0.000134 5.86E-05 0.00115 -0.0237

(7.12) (0.23) (0.74) (-23.70)

1-5 km 4.22E-06 0.000417 0.000674 -0.013

(2.72) (1.92) (0.54) (-16.48)

Employment 5-10 km 2.34E-07 0.000203 -0.00125 -0.00932

of (0.20) (0.91) (-0.97) (-11.40)

Small Firms 10-20 km -1.92E-06 -0.000153 -0.00633 -0.00467

(-3.61) (-0.69) (-4.86) (-5.75)

20-30 km -3.07E-06 -0.000149 -0.00287 -0.00243

(-7.16) (-0.70) (-2.28) (-3.12)

0-1 km 6.63E-05 0.000699 0.0106 0.0208

(22.60) (6.73) (18.08) (48.30)

1-5 km 3.62E-06 0.000468 0.00445 0.00817

(9.78) (4.30) (6.95) (20.55)

Employment 5-10 km 1.20E-06 4.74E-05 8.13E-06 0.00143

of (4.17) (0.39) (0.01) (3.31)

Big Firms 10-20 km 2.12E-07 -0.000107 -0.00167 0.00423

(0.80) (-0.74) (-1.96) (8.18)

20-30 km -1.45E-06 1.20E-05 -0.00311 0.00435

(-4.94) (0.08) (-3.60) (8.40)

Observations 23,434,810 1,386,056 1,386,056 1,386,056

Adj. R-squared 0.0012 0.434 0.792 0.874

1 Coefficients reported are localization effects of small firms and big firms on different productivity measures.

2 Control variables include ring level urbanization measures, Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

3 Numbers in parentheses are t-statistics clustered at the grid level.


Table 9. Localization Economies within and between Ownership Types

Birth TFP Output Wages

All SOE Non-SOE All SOE Non-SOE All SOE Non-SOE All SOE Non-SOE

Localization Effects of SOE

0-1 km 1.77E-05 7.98E-05 2.55E-05 -0.000919 0.00107 -0.000333 -0.00873 0.05 -0.00663 0.00548 0.0128 0.011

(2.23) (1.97) (2.26) (-3.31) (1.55) (-1.08) (-5.08) (13.36) (-3.89) (5.11) (6.46) (10.12)

1-5 km -4.35E-07 1.01E-05 -4.49E-07 0.000963 0.000443 0.000917 0.00278 0.00372 0.004 0.00139 0.00738 0.000735

(-0.44) (2.37) (-0.32) (5.55) (0.56) (5.31) (2.90) (0.91) (4.35) (2.41) (3.62) (1.27)

5-10 km -8.74E-07 1.59E-06 -1.29E-06 0.000682 0.00205 0.000463 -0.00145 -0.00876 -0.000517 -0.000396 0.000457 -0.000329

(-0.86) (0.85) (-0.98) (3.89) (2.03) (2.68) (-1.54) (-1.70) (-0.57) (-0.69) (0.18) (-0.57)

10-20 km -1.34E-06 8.29E-07 -1.85E-06 0.000335 -0.000314 0.000366 0.00205 0.00191 0.00097 -0.000288 -0.00164 -0.000238

(-2.44) (0.93) (-2.67) (2.58) (-0.34) (2.86) (2.77) (0.39) (1.35) (-0.64) (-0.69) (-0.53)

20-30 km -9.97E-07 -9.06E-07 -1.28E-06 1.43E-05 -8.65E-05 6.11E-05 0.00102 -0.00577 0.000889 -0.00278 -0.00596 -0.00251

(-2.30) (-1.79) (-2.30) (0.12) (-0.09) (0.51) (1.44) (-1.17) (1.29) (-6.41) (-2.48) (-5.83)

Localization Effects of Non-SOE

0-1 km 7.06E-05 1.22E-05 9.87E-05 0.000877 -0.000194 0.000694 0.0168 0.038 0.0113 0.0179 0.0201 0.0154

(22.99) (3.93) (25.14) (8.63) (-0.30) (7.03) (28.33) (11.01) (19.98) (46.24) (11.50) (40.48)

1-5 km 4.05E-06 4.74E-07 5.68E-06 0.000368 0.000903 0.000365 0.00412 0.0173 0.00403 0.00358 0.00878 0.0034

(10.76) (0.94) (12.13) (3.50) (1.13) (3.53) (6.51) (4.13) (6.51) (9.29) (4.11) (8.90)

5-10 km 1.37E-06 7.72E-07 1.98E-06 1.94E-05 0.0000411 2.67E-05 -0.00147 -0.000306 -0.00102 -0.00172 -0.00408 -0.00144

(4.87) (2.03) (5.61) (0.16) (0.05) (0.23) (-2.06) (-0.07) (-1.45) (-4.04) (-1.82) (-3.35)

10-20 km 1.04E-07 -4.98E-07 3.98E-07 -0.000181 0.000566 -0.000165 -0.00247 0.0107 -0.00285 0.00279 0.00319 0.00278

(0.37) (-1.60) (1.12) (-1.27) (0.66) (-1.17) (-2.88) (2.25) (-3.40) (5.51) (1.34) (5.48)

20-30 km -1.63E-06 -8.21E-07 -1.96E-06 -0.00011 -0.000325 -8.21E-05 -0.00384 0.00259 -0.0046 0.00291 -0.00245 0.00282

(-5.09) (-2.44) (-4.93) (-0.75) (-0.37) (-0.57) (-4.47) (0.57) (-5.46) (5.75) (-1.07) (5.57)

Observations 23,434,810 23,434,810 23,434,810 23,434,810 23,434,810 23,434,810 23,434,810 23,434,810 23,434,810 23,434,810 23,434,810 23,434,810

Adj. R-squared 0.0011 0.0001 0.0014 0.435 0.295 0.451 0.784 0.685 0.792 0.874 0.837 0.876

1 Coefficients reported are localization effects of SOEs and non-SOEs on different productivity measures for all firms, SOEs, and non-SOEs.

2 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry



Figure 1: Cross Industry Comparison of the Attenuation of Localization Economies (IV)


Panels: (1) Dependence on Transportation; (2) New Product Ratio; (3) College (and above) Degree Worker Ratio; (4) Dependence on Natural Resources; (5) SOE Ratio

Figure 2: Kernel Density Estimation of Attenuation Parameters by Industry Characteristics


Panels: Attenuation Pattern from Rosenthal and Strange (2003) - Five Industries; Attenuation Pattern from Rosenthal and Strange (2003) - Four Industries (Apparel Excluded); Attenuation Pattern Based on Our Estimates

Figure 3: Attenuation of Localization Economies for Five Selected Manufacturing Industries in Comparison with Rosenthal and Strange (2003)


Figure 4: TFP and Other Correlates - Full Sample Localization Economies


Panels: (1) Birth; (2) TFP; (3) Output Per Worker; (4) Wages Per Worker

Figure 5: Localization Economies within and between Ownership Types


Appendix A: Geocoding

Geocoding Chinese data can be tricky. Due to national security concerns, all map service providers in China are mandated by the government to use a specific coordinate system called GCJ-02. GCJ-02 (colloquially Mars Coordinates) is formulated by the Chinese Academy of Surveying and Mapping (CASM) and is based on World Geodetic System 1984 (WGS-84).54 However, CASM derives GCJ-02 coordinates from WGS-84 coordinates by applying an obfuscation algorithm that adds random offsets to the WGS-84 latitude and longitude. Thus, GCJ-02 coordinates can be displayed at the correct location on a GCJ-02 map, but not on a WGS-84 map. Because almost all geographic information system (GIS) software is based on WGS-84, directly using GCJ-02 coordinates in GIS software will cause measurement errors in calculated distances. Geocoding with Google’s geocoding application programming interface cannot resolve this issue because Google Maps also uses GCJ-02 for locations in China.

Thus, geocoding Chinese data and correctly processing it in GIS software requires us to reverse the obfuscation algorithm and obtain the longitude and latitude based on WGS-84. We are unaware of any previous relevant studies that have carefully dealt with this geocoding issue. This may not be a serious issue if the geographic distances considered in the analysis are large, because the distance bias caused by the obfuscation algorithm is often within several hundred meters. However, in this study, we consider distances between firms of only a few kilometres, in which case carefully dealing with the obfuscation algorithm can be important to avoid serious bias. We consulted experts in geography who provided us with the source code of the obfuscation algorithm. We then converted GCJ-02 coordinates to WGS-84 coordinates by reversing the obfuscation algorithm after the regular geocoding process.
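One generic way to reverse a smooth coordinate offset, given code for the forward transformation, is fixed-point iteration. The Python sketch below illustrates that idea only. It is not the algorithm supplied to the authors, and `toy_forward` is a made-up stand-in for the WGS-84 to GCJ-02 transformation, which is not reproduced here. The function names and tolerance are likewise assumptions.

```python
import math


def invert_offset(forward, lat_obf, lon_obf, tol=1e-9, max_iter=50):
    """Recover (lat, lon) such that forward(lat, lon) ~= (lat_obf, lon_obf).

    Works when the forward transform is the identity plus a small, smooth
    offset, so that repeatedly subtracting the remaining gap converges."""
    lat, lon = lat_obf, lon_obf
    for _ in range(max_iter):
        f_lat, f_lon = forward(lat, lon)
        d_lat, d_lon = lat_obf - f_lat, lon_obf - f_lon
        lat += d_lat
        lon += d_lon
        if max(abs(d_lat), abs(d_lon)) < tol:
            break
    return lat, lon


def toy_forward(lat, lon):
    """Hypothetical smooth offset; NOT the real GCJ-02 algorithm."""
    return (
        lat + 0.002 * math.sin(3.0 * math.radians(lon)),
        lon + 0.004 * math.cos(3.0 * math.radians(lat)),
    )


# Round trip on an arbitrary point: obfuscate, then invert.
true_lat, true_lon = 31.2, 121.5
obf_lat, obf_lon = toy_forward(true_lat, true_lon)
rec_lat, rec_lon = invert_offset(toy_forward, obf_lat, obf_lon)
print(abs(rec_lat - true_lat), abs(rec_lon - true_lon))
```

Passing the forward transformation as an argument keeps the inversion logic separate from the offset formula, so the same routine applies to whichever forward code is available.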

54 WGS-84 is the most commonly used reference system in cartography, geodesy, and satellite navigation, including GPS.


Appendix B: TFP Estimation

We estimate firm-level TFP following Brandt et al. (2012), Loecker and Warzynski (2012), Ackerberg et al. (2015), and Brandt et al. (2017). The classical challenge in the estimation of firm TFP is that firm productivity shocks are known to profit-maximizing firms but unobservable to econometricians. Firms choose the current year’s inputs based on the contemporaneous productivity shocks. Thus, firms’ current-year inputs may be endogenous. In this case, the OLS estimation of the firm production function will lead to biased estimates of TFP.
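The direction of this bias can be illustrated with a small simulation. The Python sketch below is a stylized example, not the estimation procedure used in this paper: a single input responds one-for-one to the productivity shock, and all parameter values and names are assumptions chosen for illustration. With unit variances for the shock and for the input noise, the OLS slope converges to the true elasticity plus cov(input, shock)/var(input) = 0.5.

```python
import random


def simulate_ols_slope(beta=0.6, n=20000, seed=0):
    """OLS slope of log output on a log input when the input choice
    responds to a productivity shock the econometrician cannot observe."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        omega = rng.gauss(0.0, 1.0)          # productivity shock, seen by the firm
        labor = omega + rng.gauss(0.0, 1.0)  # input chosen in response to the shock
        output = beta * labor + omega + rng.gauss(0.0, 0.5)
        xs.append(labor)
        ys.append(output)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


estimate = simulate_ols_slope()
print(f"true elasticity: 0.6, OLS estimate: {estimate:.3f}")
```

The overstated input coefficient in turn distorts the residual-based TFP measure, which is the problem the control function methods discussed next are designed to address.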

There are several commonly used techniques in the literature to deal with this endogeneity problem. Olley and Pakes (1996) (i.e. the OP method) show that we can use firm investment as a proxy for unobserved firm productivity shocks with a control function approach. Extending their framework, Levinsohn and Petrin (2003) suggest that we can use firm intermediate inputs as a proxy for unobserved firm productivity shocks. Later advances in the literature mostly build on those two seminal studies. The method we use in this study to estimate firm-level TFP uses firm intermediate inputs in the control function and a GMM algorithm, first proposed by Loecker and Warzynski (2012), for parameter estimation in the firm production function. The detailed TFP estimation procedure follows Brandt et al. (2012) and Brandt et al. (2017) closely.


Appendix C: Tables

Table A1. Summary Statistics - Grid Level Four-Digit Industry Existing Employment


Food Processing 13 996705 2.894 68.998

Food Production 14 1173972 1.231 42.701

Beverage Production 15 756300 1.432 62.179

Tobacco Processing 16 44346 4.604 170.803

Textile Industry 17 1194920 5.099 160.653

Garments & Other Fibre Products 18 165315 25.756 310.736

Leather, Furs, Down & Related Products 19 485520 5.257 195.902

Timber Processing, Bamboo, Cane, Palm Fibre & Straw Products 20 479000 2.538 52.687

Furniture Manufacturing 21 250010 3.778 73.696

Papermaking & Paper Products 22 295940 4.783 75.624

Printing & Record Pressing 23 261945 2.055 37.056

Stationery, Educational & Sports Goods 24 578214 2.072 65.986

Petroleum Processing, Coking Products, Gas Production & Supply 25 183640 4.495 144.634

Raw Chemical Materials & Chemical Products 26 1953240 1.858 57.129

Medical & Pharmaceutical Products 27 361578 3.980 85.733

Chemical Fibres 28 241101 1.722 74.258

Rubber Products 29 453618 2.017 67.985

Plastic Products 30 537660 4.361 101.337

Non-metal Mineral Products 31 2026950 2.262 60.659

Smelting & Pressing of Ferrous Metals 32 232588 12.800 480.847

Smelting & Pressing of Nonferrous Metals 33 489096 2.820 122.861

Metal Products 34 1070748 2.767 54.681

Machinery & Equipment Manufacturing 35 1884769 2.429 57.887

Special Equipment Manufacturing 36 2352880 1.154 50.335

Transportation Equipment Manufacturing 37 1213107 3.400 106.446

Electric Equipment & Machinery 39 1407264 3.423 113.021

Electronic & Telecommunications 40 684292 8.346 346.336

Instruments, Meters, Cultural & Official Machinery 41 1058784 1.032 46.118

Artwork & Other Manufacturing 42 601308 2.003 60.761

ALL NA 23,434,810 1147.484 3899.308


Table A2. OLS Estimates - Firm Birth Share

(1) (2) (3) (4) (5) (6) (7) (8)

Columns (industry name, two-digit code):
(1) Food Processing, 13
(2) Food Production, 14
(3) Beverage Production, 15
(4) Tobacco Processing, 16
(5) Textile Industry, 17
(6) Garments & Other Fiber Products, 18
(7) Leather, Furs, Down & Related Products, 19
(8) Timber Processing, Bamboo, Cane, Palm Fiber & Straw Products, 20


0-1 km 4.53E-05 6.87E-05 8.58E-05 -1.78E-05 7.80E-05 2.90E-05 0.000145 6.65E-05

(6.87) (5.09) (4.71) (-1.03) (7.78) (4.44) (3.50) (7.92)

1-5 km 2.18E-06 3.57E-06 2.92E-06 2.13E-05 4.18E-06 2.80E-06 -3.13E-06 5.20E-06

(2.50) (2.10) (2.07) (0.67) (2.98) (2.11) (-0.65) (3.72)

5-10 km 1.88E-06 9.20E-07 -2.63E-07 -4.81E-06 -1.93E-07 -2.40E-07 2.41E-07 2.70E-06

(2.58) (0.80) (-0.19) (-0.25) (-0.17) (-0.27) (0.12) (2.94)

10-20 km 1.03E-06 1.47E-06 1.29E-06 3.91E-06 -3.67E-07 2.76E-07 -7.20E-07 2.71E-06

(1.10) (1.01) (0.77) (0.19) (-0.27) (0.21) (-0.39) (2.77)

20-30 km -9.09E-07 2.08E-07 2.21E-08 1.05E-05 -2.31E-06 -4.28E-06 -1.42E-06 2.03E-06

(-0.77) (0.12) (0.01) (0.63) (-1.66) (-2.44) (-0.51) (1.74)


0-1 km 6.34E-06 1.24E-05 1.39E-05 5.90E-05 1.34E-05 1.53E-05 1.24E-05 6.18E-06

(4.50) (6.01) (4.50) (1.41) (7.12) (3.77) (3.53) (5.18)

1-5 km -4.20E-06 -2.44E-06 -4.82E-06 -7.46E-05 -3.86E-06 -5.85E-06 -3.93E-06 -5.73E-06

(-4.83) (-2.26) (-2.09) (-1.56) (-3.61) (-2.89) (-1.85) (-5.59)

5-10 km -2.26E-06 -8.96E-08 -1.26E-06 8.99E-05 -4.11E-07 -1.43E-06 -5.08E-06 5.73E-07

(-1.83) (-0.07) (-0.60) (1.92) (-0.41) (-0.62) (-1.47) (0.52)

10-20 km -3.50E-06 -6.56E-06 -3.58E-06 -2.98E-06 -3.34E-06 -6.43E-06 -5.74E-06 -3.43E-06

(-2.14) (-3.58) (-1.55) (-0.03) (-2.66) (-2.29) (-1.30) (-2.74)

20-30 km -1.15E-06 6.08E-07 -9.16E-07 -2.24E-04 -2.75E-06 1.24E-06 5.19E-08 -1.58E-06

(-0.71) (0.35) (-0.36) (-1.34) (-1.56) (0.44) (0.01) (-0.93)

Observations 996,705 1,173,972 756,300 44,346 1,194,920 165,315 485,520 479,000

Adj. R-squared 0.002 0.001 0.002 0.007 0.003 0.004 0.004 0.008

1 Coefficients reported are ring level localization and urbanization effects obtained by OLS estimation of equation (3.1) for each two-digit industry.

2 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

3 Numbers in parentheses are t-statistics clustered at the grid level.


Table A2 (Continued). OLS Estimates - Firm Birth Share

(9) (10) (11) (12) (13) (14) (15) (16)

Columns (industry name, two-digit code):
(9) Furniture Manufacturing, 21
(10) Papermaking & Paper Products, 22
(11) Printing & Record Pressing, 23
(12) Stationery, Educational & Sports Goods, 24
(13) Petroleum Processing, Coking Products, Gas Production & Supply, 25
(14) Raw Chemical Materials & Chemical Products, 26
(15) Medical & Pharmaceutical Products, 27
(16) Chemical Fibers, 28


0-1 km 3.51E-05 2.60E-05 2.09E-05 0.000140 9.44E-05 6.18E-05 5.06E-05 0.000319

(3.79) (5.22) (2.30) (2.44) (3.61) (8.51) (4.76) (2.16)

1-5 km 4.69E-06 2.00E-06 1.71E-06 5.12E-06 6.26E-06 2.09E-06 5.04E-07 1.09E-05

(1.91) (2.09) (0.90) (0.93) (2.40) (2.08) (0.54) (1.22)

5-10 km 3.08E-06 -6.32E-08 -2.10E-06 -5.95E-07 1.85E-06 3.11E-07 -2.88E-08 -4.61E-06

(2.37) (-0.08) (-1.26) (-0.10) (0.71) (0.44) (-0.03) (-0.97)

10-20 km -3.01E-07 2.96E-07 -2.10E-06 -6.00E-06 8.39E-07 -3.05E-07 5.67E-08 6.88E-06

(-0.09) (0.31) (-1.03) (-1.04) (0.45) (-0.47) (0.08) (1.14)

20-30 km -9.37E-08 -1.57E-06 -2.62E-07 -1.19E-05 -6.81E-09 -2.53E-06 -2.63E-06 1.77E-06

(-0.06) (-1.97) (-0.17) (-1.74) (-0.00) (-3.34) (-2.84) (0.49)


0-1 km 1.62E-05 2.60E-05 2.33E-05 2.13E-05 1.27E-05 1.17E-05 1.90E-05 1.38E-05

(3.34) (5.22) (5.62) (4.55) (2.63) (7.67) (6.77) (2.19)

1-5 km -6.61E-06 2.00E-06 8.48E-07 -7.97E-07 -5.26E-06 -1.73E-06 -4.64E-06 -6.48E-06

(-1.55) (2.09) (0.39) (-0.24) (-2.31) (-2.79) (-3.35) (-0.91)

5-10 km -3.80E-06 -6.32E-08 -2.72E-06 -5.31E-06 -3.67E-06 -2.64E-06 -3.28E-06 -4.12E-06

(-0.61) (-0.08) (-0.99) (-1.21) (-0.94) (-3.09) (-1.63) (-0.56)

10-20 km 1.52E-06 2.96E-07 -1.73E-06 -7.98E-06 -6.46E-06 -1.46E-06 -5.49E-06 1.70E-05

(0.22) (0.31) (-0.56) (-1.80) (-1.27) (-1.16) (-2.49) (1.26)

20-30 km -9.64E-07 -1.57E-06 -3.67E-06 -5.84E-06 1.12E-06 -2.48E-06 2.32E-06 -2.28E-05

(-0.18) (-1.97) (-0.69) (-1.24) (0.20) (-1.89) (1.04) (-1.64)

Observations 250,010 295,940 261,945 578,214 183,640 1,953,240 361,578 241,101

Adj. R-squared 0.002 0.002 0.002 0.001 0.004 0.001 0.003 0.007

1 Coefficients reported are ring level localization and urbanization effects obtained by OLS estimation of equation (3.1) for each two-digit industry.

2 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

3 Numbers in parentheses are t-statistics clustered at the grid level.



(17) (18) (19) (20) (21) (22) (23) (24)

Columns (industry name, two-digit code):
(17) Rubber Products, 29
(18) Plastic Products, 30
(19) Non-metal Mineral Products, 31
(20) Smelting & Pressing of Ferrous Metals, 32
(21) Smelting & Pressing of Nonferrous Metals, 33
(22) Metal Products, 34
(23) Machinery & Equipment Manufacturing, 35
(24) Special Equipment Manufacturing, 36


0-1 km 0.000164 3.96E-05 6.52E-05 4.03E-05 6.80E-05 4.33E-05 6.03E-05 8.61E-05

(4.59) (7.05) (8.82) (5.48) (4.99) (6.92) (8.89) (6.14)

1-5 km 5.66E-06 2.73E-06 6.41E-06 3.64E-06 4.37E-06 4.60E-06 4.17E-06 3.29E-06

(2.49) (3.61) (5.50) (2.92) (2.49) (4.78) (5.73) (2.59)

5-10 km 3.15E-06 1.60E-06 3.27E-06 5.94E-09 -6.16E-10 1.34E-07 2.64E-06 1.37E-06

(1.50) (2.16) (4.00) (0.01) (-0.00) (0.18) (2.57) (1.00)

10-20 km 3.81E-06 1.36E-06 7.37E-07 -2.30E-07 -2.10E-06 9.75E-08 7.86E-07 -1.71E-06

(2.10) (2.21) (0.76) (-0.29) (-1.51) (0.10) (0.76) (-1.38)

20-30 km 1.70E-06 1.50E-07 -3.09E-07 -7.31E-07 -2.42E-07 -8.28E-07 -8.81E-07 -3.24E-06

(0.75) (0.16) (-0.28) (-0.70) (-0.15) (-0.77) (-0.82) (-2.58)


0-1 km 1.30E-05 1.27E-05 9.77E-06 9.35E-06 1.24E-05 1.50E-05 1.30E-05 1.93E-05

(4.59) (9.84) (8.89) (4.40) (4.97) (8.79) (10.77) (9.14)

1-5 km -3.93E-06 -2.93E-06 -3.24E-06 -2.63E-06 -5.51E-06 -4.22E-06 -2.63E-06 -1.77E-06

(-2.32) (-4.59) (-4.40) (-1.76) (-2.49) (-3.29) (-3.96) (-2.03)

5-10 km -6.37E-07 -2.24E-06 5.46E-07 -4.15E-06 -3.91E-07 -9.81E-07 -1.12E-06 -2.72E-06

(-0.25) (-2.79) (0.59) (-2.05) (-0.08) (-1.06) (-1.54) (-2.95)

10-20 km -7.58E-06 -1.16E-06 -4.33E-06 -3.85E-06 -6.81E-06 -2.01E-06 -3.30E-06 -2.12E-06

(-2.01) (-0.99) (-3.46) (-1.47) (-1.51) (-1.47) (-3.30) (-1.67)

20-30 km -5.05E-06 -4.10E-06 -1.90E-06 -1.14E-06 -3.93E-06 8.48E-07 -5.14E-07 -6.79E-06

(-1.22) (-2.92) (-1.67) (-0.38) (-0.58) (0.56) (-0.49) (-1.91)

Observations 453,618 537,660 2,026,950 232,588 489,096 1,070,748 1,884,769 2,352,880

Adj. R-squared 0.005 0.006 0.002 0.006 0.002 0.002 0.001 0.001





(25) (26) (27) (28) (29)

Columns (industry name, two-digit code):
(25) Transportation Equipment Manufacturing, 37
(26) Electric Equipment & Machinery, 39
(27) Electronic & Telecommunications, 40
(28) Instruments, Meters, Cultural & Official Machinery, 41
(29) Artwork & Other Manufacturing, 42


0-1 km 8.71E-05 4.97E-05 0.000107 0.000146 0.000171

(5.02) (7.85) (4.92) (2.81) (6.01)

1-5 km 7.53E-06 3.64E-06 3.58E-06 5.51E-06 5.37E-06

(2.69) (3.45) (1.46) (1.06) (2.24)

5-10 km 1.47E-06 1.52E-06 -5.85E-07 3.37E-06 7.26E-06

(0.97) (2.09) (-0.30) (1.25) (3.86)

10-20 km -1.93E-06 9.66E-07 -2.25E-06 6.16E-07 2.73E-06

(-1.03) (1.40) (-0.96) (0.25) (1.51)

20-30 km -3.16E-06 6.07E-07 -4.46E-06 -3.87E-06 -4.54E-07

(-1.89) (0.70) (-2.17) (-0.92) (-0.26)


0-1 km 1.33E-05 1.72E-05 1.65E-05 1.25E-05 1.37E-05

(5.53) (7.78) (4.76) (4.62) (5.39)

1-5 km -1.60E-06 -1.71E-06 -2.92E-06 1.68E-06 -8.20E-06

(-1.61) (-1.72) (-1.37) (0.64) (-4.06)

5-10 km -9.99E-07 -1.93E-06 2.68E-07 -5.63E-06 -9.37E-07

(-0.98) (-2.23) (0.12) (-1.44) (-0.47)

10-20 km -5.70E-06 -2.66E-06 -3.35E-06 -4.82E-06 -1.12E-06

(-2.71) (-1.58) (-0.87) (-1.43) (-0.37)

20-30 km -7.00E-08 -8.14E-07 -4.04E-06 -5.84E-06 -5.95E-06

(-0.03) (-0.31) (-1.06) (-1.61) (-1.24)

Observations 1,213,107 1,407,264 684,292 1,058,784 601,308

Adj. R-squared 0.001 0.001 0.002 0.001 0.005

1 Coefficients reported are ring level localization and urbanization effects obtained by OLS estimation of equation (3.1) for each two-digit industry.

2 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

3 Numbers in parentheses are t-statistics clustered at the grid level.


Table A3. IV LASSO Estimates - Firm Birth Share

(1) (2) (3) (4) (5) (6) (7) (8)

Columns (industry name, two-digit code):
(1) Food Processing, 13
(2) Food Production, 14
(3) Beverage Production, 15
(4) Tobacco Processing, 16
(5) Textile Industry, 17
(6) Garments & Other Fiber Products, 18
(7) Leather, Furs, Down & Related Products, 19
(8) Timber Processing, Bamboo, Cane, Palm Fiber & Straw Products, 20


0-1 km 4.64E-05 6.77E-05 8.95E-05 -6.33E-05 6.61E-05 2.63E-05 0.0001683 6.63E-05

(5.76) (4.44) (3.88) (-1.00) (5.15) (3.63) (3.10) (6.55)

1-5 km 9.90E-06 6.98E-06 -1.81E-06 -5.40E-05 1.14E-05 -9.02E-07 -1.02E-06 1.16E-05

(2.85) (1.79) (-0.44) (-1.00) (2.66) (-0.31) (-0.10) (4.02)

5-10 km 8.87E-07 1.05E-05 4.22E-06 -4.65E-05 -1.44E-06 -6.75E-07 1.50E-05 1.40E-06

(0.21) (2.36) (0.58) (-1.00) (-0.36) (-0.23) (0.90) (0.60)

10-20 km 4.49E-06 -8.15E-07 1.00E-05 -4.58E-05 -2.47E-06 -2.65E-06 1.79E-05 1.03E-05

(1.18) (-0.15) (2.20) (-1.00) (-0.44) (-0.59) (1.41) (2.76)

20-30 km 4.83E-06 2.25E-06 -5.17E-06 -5.08E-05 9.65E-06 1.31E-05 2.25E-06 -2.66E-06

(0.98) (0.42) (-0.94) (-1.00) (1.49) (1.79) (0.32) (-0.71)


0-1 km 6.50E-06 1.40E-05 1.70E-05 -5.34E-06 1.94E-05 1.91E-05 2.25E-05 8.68E-06

(3.01) (5.79) (2.81) (-0.97) (5.30) (3.42) (3.06) (4.81)

1-5 km -1.25E-05 -7.01E-06 -1.33E-06 2.89E-05 -1.30E-05 -1.63E-06 -8.76E-06 -1.17E-05

(-4.08) (-2.42) (-0.38) (1.00) (-2.97) (-0.43) (-0.91) (-4.48)

5-10 km -1.06E-06 -8.59E-06 -3.27E-06 7.20E-06 1.76E-06 9.53E-07 -1.74E-05 1.93E-06

(-0.19) (-2.01) (-0.81) (0.96) (0.35) (0.19) (-0.89) (0.69)

10-20 km -7.58E-06 -6.51E-06 -3.05E-05 -5.39E-05 2.14E-06 1.21E-06 -4.01E-05 -1.40E-05

(-1.20) (-0.95) (-3.73) (-1.00) (0.24) (0.13) (-1.45) (-3.25)

20-30 km -4.34E-06 -1.99E-06 1.77E-05 1.23E-05 -1.78E-05 -2.07E-05 2.96E-05 6.74E-06

(-0.63) (-0.27) (1.76) (0.99) (-1.78) (-1.73) (1.13) (1.35)

Observations 996,705 1,173,972 756,300 44,346 1,194,920 165,315 485,520 479,000

1 Coefficients reported are ring level localization and urbanization effects obtained by IV Lasso estimation of equation (3.1) for each two-digit industry.

2 Post-lasso-orthogonalized variables are used in IV regression.

3 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing



Table A3 (Continued). IV LASSO Estimates - Firm Birth Share

(9) (10) (11) (12) (13) (14) (15) (16)

Columns (industry name, two-digit code):
(9) Furniture Manufacturing, 21
(10) Papermaking & Paper Products, 22
(11) Printing & Record Pressing, 23
(12) Stationery, Educational & Sports Goods, 24
(13) Petroleum Processing, Coking Products, Gas Production & Supply, 25
(14) Raw Chemical Materials & Chemical Products, 26
(15) Medical & Pharmaceutical Products, 27
(16) Chemical Fibers, 28


0-1 km 2.94E-05 2.15E-05 2.42E-05 0.0001681 9.91E-05 7.46E-05 4.58E-05 0.0003302

(3.58) (3.72) (1.88) (2.42) (3.35) (7.45) (4.41) (1.97)

1-5 km -1.77E-06 1.95E-06 -2.85E-06 1.18E-05 1.03E-05 6.50E-07 2.57E-06 4.32E-06

(-0.50) (0.95) (-0.52) (0.80) (1.97) (0.19) (1.12) (0.34)

5-10 km 1.91E-06 1.35E-06 -1.25E-06 1.13E-05 4.60E-06 -1.43E-06 2.51E-06 -8.69E-06

(0.35) (0.52) (-0.19) (0.68) (0.77) (-0.42) (1.10) (-1.34)

10-20 km 6.02E-06 -3.43E-06 -3.78E-06 -2.52E-05 8.46E-06 1.18E-06 6.26E-07 -7.17E-06

(0.74) (-0.88) (-0.56) (-1.58) (2.27) (0.33) (0.24) (-0.73)

20-30 km -7.49E-06 -3.72E-06 1.01E-05 -2.76E-05 3.99E-06 -1.31E-06 -2.39E-06 -6.48E-06

(-0.77) (-0.76) (0.68) (-2.79) (0.92) (-0.35) (-0.80) (-0.61)


0-1 km 2.42E-05 9.10E-06 2.45E-05 1.99E-05 2.43E-05 1.63E-05 1.73E-05 1.68E-05

(3.16) (4.40) (5.25) (2.73) (2.46) (7.76) (5.26) (1.02)

1-5 km -7.57E-06 -6.95E-06 8.43E-06 -1.17E-05 -9.35E-06 -3.90E-06 -7.36E-06 -1.73E-05

(-2.48) (-2.58) (1.91) (-0.92) (-1.42) (-1.58) (-2.99) (-0.68)

5-10 km 3.11E-06 -4.75E-06 -7.76E-06 -2.80E-05 -1.43E-05 -4.05E-06 -7.16E-06 -5.13E-06

(0.43) (-1.26) (-0.94) (-1.85) (-1.34) (-1.02) (-1.78) (-0.22)

10-20 km -2.42E-05 -2.35E-06 6.91E-06 1.15E-05 -1.47E-05 -3.25E-06 -4.58E-06 4.81E-05

(-1.51) (-0.32) (0.61) (0.44) (-1.10) (-0.50) (-0.80) (0.89)

20-30 km 1.09E-05 -2.96E-06 -2.27E-05 1.24E-05 1.52E-05 -7.52E-06 4.45E-07 -2.88E-05

(0.88) (-0.35) (-1.13) (0.95) (0.97) (-1.25) (0.08) (-0.65)

Observations 250,010 295,940 261,945 578,214 183,640 1,953,240 361,578 241,101





(17) (18) (19) (20) (21) (22) (23) (24)

Name: (17) Rubber Products; (18) Plastic Products; (19) Non-metal Mineral Products; (20) Smelting & Pressing of Ferrous Metals; (21) Smelting & Pressing of Nonferrous Metals; (22) Metal Products; (23) Machinery & Equipment Manufacturing; (24) Special Equipment Manufacturing

Code 29 30 31 32 33 34 35 36


Localization Effects

0-1 km 0.0001703 3.85E-05 7.59E-05 3.84E-05 7.56E-05 4.95E-05 7.49E-05 9.28E-05

(4.06) (5.32) (6.66) (4.63) (4.85) (5.98) (6.46) (5.61)

1-5 km 7.60E-06 7.30E-06 2.04E-05 5.84E-07 9.32E-08 6.72E-07 5.28E-06 4.97E-07

(1.37) (2.44) (3.88) (0.26) (0.02) (0.19) (1.54) (0.08)

5-10 km 7.14E-06 -4.65E-06 -3.31E-06 5.15E-06 3.70E-06 3.93E-06 -6.58E-07 3.50E-06

(1.17) (-1.31) (-0.73) (2.59) (1.19) (1.24) (-0.17) (0.67)

10-20 km 7.59E-06 6.28E-06 1.41E-06 -1.11E-06 -1.72E-06 -4.72E-06 6.03E-06 1.04E-05

(1.02) (1.17) (0.38) (-0.41) (-0.50) (-0.86) (1.22) (1.59)

20-30 km 5.71E-06 8.33E-07 3.97E-07 -6.82E-06 -1.38E-06 -2.43E-07 4.67E-06 -3.62E-06

(0.56) (0.16) (0.11) (-2.12) (-0.32) (-0.04) (1.02) (-0.59)


Urbanization Effects

0-1 km 1.30E-05 1.40E-05 1.44E-05 1.11E-05 1.24E-05 1.68E-05 1.76E-05 2.93E-05

(2.82) (7.09) (7.35) (3.68) (3.66) (7.94) (8.90) (7.38)

1-5 km -4.87E-06 -7.14E-06 -1.37E-05 -3.61E-06 -4.50E-06 -4.77E-06 -5.81E-06 -4.63E-06

(-0.99) (-2.83) (-4.65) (-1.38) (-1.17) (-1.55) (-2.21) (-1.14)

5-10 km -6.17E-06 2.36E-06 2.43E-06 -9.96E-06 -1.94E-05 -6.32E-06 6.95E-07 -4.69E-06

(-0.87) (0.74) (0.80) (-2.81) (-2.02) (-1.54) (0.19) (-0.84)

10-20 km -1.39E-05 -8.04E-06 -6.34E-06 -4.09E-06 -4.90E-06 3.13E-06 -1.17E-05 -2.66E-05

(-1.06) (-0.99) (-1.51) (-0.68) (-0.62) (0.35) (-1.45) (-2.04)

20-30 km -1.49E-05 -5.86E-06 -6.80E-06 7.56E-06 1.30E-05 -5.33E-06 -7.19E-06 7.49E-06

(-1.02) (-0.73) (-1.30) (1.24) (1.12) (-0.58) (-1.10) (0.66)

Observations 453,618 537,660 2,026,950 232,588 489,096 1,070,748 1,884,769 2,352,880





(25) (26) (27) (28) (29)

Name: (25) Transportation Equipment Manufacturing; (26) Electric Equipment & Machinery; (27) Electronic & Telecommunications; (28) Instruments, Meters, Cultural & Official Machinery; (29) Artwork & Other Manufacturing

Code 37 39 40 41 42

Localization Effects

0-1 km 0.0001072 5.79E-05 8.64E-05 0.0002248 0.0001889

(4.01) (6.32) (4.04) (3.05) (5.45)

1-5 km -5.20E-06 -6.17E-07 5.28E-06 3.41E-06 1.78E-05

(-0.58) (-0.18) (0.74) (0.36) (1.27)

5-10 km 2.82E-06 8.66E-08 6.90E-06 -4.18E-06 5.32E-06

(0.44) (0.03) (1.05) (-0.64) (0.44)

10-20 km -3.49E-06 1.15E-05 1.58E-06 -8.57E-06 -6.53E-06

(-0.46) (1.85) (0.27) (-0.90) (-0.66)

20-30 km 3.24E-06 -1.62E-06 6.47E-07 -1.45E-05 -6.34E-06

(0.35) (-0.29) (0.10) (-1.06) (-0.50)

Urbanization Effects

0-1 km 1.90E-05 2.12E-05 2.41E-05 1.36E-05 2.15E-05

(3.95) (9.27) (3.76) (2.67) (4.61)

1-5 km 1.91E-06 1.61E-06 -1.38E-06 -1.32E-05 -2.93E-05

(0.33) (0.49) (-0.20) (-1.34) (-2.64)

5-10 km -4.03E-07 -4.35E-06 -1.40E-05 2.11E-06 -3.83E-06

(-0.06) (-1.16) (-1.84) (0.14) (-0.30)

10-20 km -4.55E-06 -2.10E-05 -9.56E-06 -1.84E-06 9.33E-06

(-0.31) (-1.88) (-0.72) (-0.10) (0.53)

20-30 km -1.81E-05 -1.25E-06 -1.50E-05 1.82E-06 -1.20E-05

(-0.83) (-0.11) (-0.94) (0.08) (-0.57)

Observations 1,213,107 1,407,264 684,292 1,058,784 601,308

1 Coefficients reported are ring level localization and urbanization effects obtained by IV Lasso estimation of equation (3.1) for each two-digit industry.

2 Post-lasso-orthogonalized variables are used in IV regression.

3 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

4 Numbers in parentheses are t-statistics clustered at the grid level.
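The Herfindahl controls in note 3 can be illustrated with a minimal sketch, assuming the standard definition as a sum of squared shares. The function name and the counts below are hypothetical; the paper's exact inputs (employment or firm counts, and how the 30 km window is built) are not stated in these notes.

```python
def herfindahl(counts):
    """Return the sum of squared shares of non-negative counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return sum((c / total) ** 2 for c in counts)


# Hypothetical employment counts for units within 30 km of one grid cell.
even = herfindahl([100, 100, 100, 100])  # four equal-sized units
single = herfindahl([400])               # one unit holds everything
```

Under this definition the index equals 1/n for an even split across n units and 1 when a single unit accounts for the whole total.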


Table A4.1. Spatial Decay Speed with Inverse Linear Distance Decay Function Based on IV Lasso Estimates

(1) (2) (3) (4) (5) (6) (7) (8)

Code 13 14 15 16 17 18 19 20

Decay Speed 4.53E-05 6.74E-05 8.50E-05 -1.62E-05 6.82E-05 2.64E-05 0.00015338 6.73E-05

(4.26) (4.76) (5.02) (-0.53) (5.16) (2.57) (6.03) (5.84)

Code 21 22 23 24 25 26 27 28

Decay Speed 3.21E-05 2.48E-05 2.61E-05 0.000201 9.25E-05 7.82E-05 4.75E-05 0.00030005

(2.84) (2.71) (1.90) (6.95) (4.97) (6.79) (4.13) (7.31)

Code 29 30 31 32 33 34 35 36

Decay Speed 0.0001613 4.02E-05 8.07E-05 4.16E-05 7.74E-05 5.31E-05 7.45E-05 9.37E-05

(7.14) (3.91) (6.60) (3.98) (5.53) (4.88) (5.97) (6.32)

Code 37 39 40 41 42

Decay Speed 0.00010961 5.90E-05 8.73E-05 0.0002342 0.00020302

(5.88) (5.18) (5.25) (7.98) (9.52)

2 The spatial decay function is specified as f(d) = 1/d.

3 Numbers in parentheses are t-statistics.
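One plausible reading of "decay speed" is the slope from regressing an industry's ring-level localization effects on f(d). The sketch below illustrates that reading for industry code 21, using the five ring coefficients in the first block of Table A3, column (9). Two things are assumptions rather than details from the paper: each ring is represented by its outer radius in km, and the regression is a simple least-squares fit with an intercept. The output therefore need not reproduce the published estimates.

```python
def ols_slope(x, y):
    """Slope from a least-squares regression of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Assumption: each ring (0-1, 1-5, 5-10, 10-20, 20-30 km) is
# represented by its outer radius.
ring_distance_km = [1.0, 5.0, 10.0, 20.0, 30.0]

# Ring coefficients for code 21, copied from Table A3 column (9).
localization = [2.94e-05, -1.77e-06, 1.91e-06, 6.02e-06, -7.49e-06]

# Inverse linear decay, f(d) = 1/d.
f_d = [1.0 / d for d in ring_distance_km]
decay_speed = ols_slope(f_d, localization)
```

A positive slope here means the estimated effect is larger where f(d) is larger, that is, in the rings closest to the grid.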


Table A4.2. Spatial Decay Speed with Inverse Exponential Distance Decay Function Based on IV Lasso Estimates

(1) (2) (3) (4) (5) (6) (7) (8)

Code 13 14 15 16 17 18 19 20

Decay Speed 0.000113 0.000171 0.000238 -3.92E-05 0.000170 6.99E-05 0.000439 0.000168

(5.54) (6.24) (7.12) (-0.67) (6.72) (3.60) (8.60) (7.54)

Code 21 22 23 24 25 26 27 28

Decay Speed 8.10E-05 5.95E-05 6.89E-05 0.000491 0.000252 0.000204 0.000123 0.000919

(3.84) (3.43) (2.65) (8.50) (6.72) (9.15) (5.45) (10.42)

Code 29 30 31 32 33 34 35 36

Decay Speed 0.000445 9.91E-05 0.000199 0.000104 0.000205 0.000134 0.000195 0.000246

(9.94) (5.09) (8.32) (5.16) (7.46) (6.46) (8.08) (8.56)

Code 37 39 40 41 42

Decay Speed 0.000293 0.000154 0.000226 0.000630 0.000513

(8.08) (7.10) (6.97) (10.68) (12.34)

2 The spatial decay function is specified as f(d) = 1/e^d.

3 Numbers in parentheses are t-statistics.


Online Appendix

Table OA1. Summary Statistics - Grid Level Four-Digit Industry Employment Birth Share


Food Processing 13 996705 1.50E-05 0.001

Food Production 14 1173972 1.62E-05 0.002

Beverage Production 15 756300 1.59E-05 0.002

Tobacco Processing 16 44346 6.76E-05 0.007

Textile Industry 17 1194920 1.67E-05 0.002

Garments & Other Fibre Products 18 165315 1.81E-05 0.002

Leather, Furs, Down & Related Products 19 485520 2.06E-05 0.002

Timber Processing, Bamboo, Cane, Palm Fibre & Straw Products 20 479000 1.67E-05 0.001

Furniture Manufacturing 21 250010 2.00E-05 0.002

Papermaking & Paper Products 22 295940 1.35E-05 0.001

Printing & Record Pressing 23 261945 1.91E-05 0.002

Stationery, Educational & Sports Goods 24 578214 2.25E-05 0.003

Petroleum Processing, Coking Products, Gas Production & Supply 25 183640 2.18E-05 0.002

Raw Chemical Materials & Chemical Products 26 1953240 1.43E-05 0.002

Medical & Pharmaceutical Products 27 361578 1.66E-05 0.001

Chemical Fibres 28 241101 2.49E-05 0.003

Rubber Products 29 453618 1.98E-05 0.002

Plastic Products 30 537660 1.67E-05 0.001

Non-metal Mineral Products 31 2026950 1.43E-05 0.001

Smelting & Pressing of Ferrous Metals 32 232588 1.72E-05 0.001

Smelting & Pressing of Nonferrous Metals 33 489096 1.64E-05 0.002

Metal Products 34 1070748 1.68E-05 0.001

Machinery & Equipment Manufacturing 35 1884769 1.59E-05 0.002

Special Equipment Manufacturing 36 2352880 1.66E-05 0.002

Transportation Equipment Manufacturing 37 1213107 1.65E-05 0.002

Electric Equipment & Machinery 39 1407264 1.71E-05 0.002

Electronic & Telecommunications 40 684292 2.05E-05 0.003

Instruments, Meters, Cultural & Official Machinery 41 1058784 1.98E-05 0.003

Artwork & Other Manufacturing 42 601308 2.00E-05 0.002

ALL NA 23,434,810 1.70E-05 0.002


Table OA2. OLS Estimates - Firm Birth Share (Single Establishment)

(1) (2) (3) (4) (5) (6) (7) (8)

Name: (1) Food Processing; (2) Food Production; (3) Beverage Production; (4) Tobacco Processing; (5) Textile Industry; (6) Garments & Other Fiber Products; (7) Leather, Furs, Down & Related Products; (8) Timber Processing, Bamboo, Cane, Palm Fiber & Straw Products

Code 13 14 15 16 17 18 19 20

Localization Effects

0-1 km 4.47E-05 6.87E-05 8.93E-05 -2.63E-05 7.87E-05 2.96E-05 0.00014873 6.91E-05

(6.86) (5.08) (4.49) (-1.29) (7.55) (4.41) (3.48) (7.84)

1-5 km 2.61E-06 4.30E-06 3.52E-06 2.55E-05 4.67E-06 2.82E-06 -3.28E-06 5.01E-06

(2.82) (2.28) (2.11) (0.63) (3.16) (2.09) (-0.67) (3.43)

5-10 km 2.07E-06 9.32E-07 -2.31E-07 -9.86E-06 -3.94E-07 -2.05E-07 4.16E-07 2.65E-06

(2.71) (0.72) (-0.14) (-0.54) (-0.33) (-0.23) (0.21) (2.76)

10-20 km 1.21E-06 1.76E-06 2.82E-07 2.05E-07 -3.22E-07 2.69E-07 -1.82E-06 2.65E-06

(1.23) (1.07) (0.16) (0.01) (-0.23) (0.20) (-0.99) (2.58)

20-30 km -5.52E-07 3.70E-07 -6.47E-07 7.95E-06 -2.30E-06 -4.29E-06 -1.50E-06 1.56E-06

(-0.44) (0.19) (-0.38) (0.54) (-1.61) (-2.43) (-0.53) (1.32)

Urbanization Effects

0-1 km 6.58E-06 1.26E-05 1.43E-05 7.08E-05 1.37E-05 1.53E-05 1.27E-05 6.00E-06

(4.51) (6.00) (4.58) (1.42) (7.10) (3.77) (3.50) (4.92)

1-5 km -4.14E-06 -2.41E-06 -4.75E-06 -7.54E-05 -3.72E-06 -5.80E-06 -4.05E-06 -5.63E-06

(-4.74) (-2.19) (-2.04) (-1.58) (-3.47) (-2.87) (-2.01) (-5.21)

5-10 km -2.23E-06 2.13E-08 -1.59E-06 8.98E-05 -4.81E-07 -1.17E-06 -4.48E-06 6.98E-07

(-1.73) (0.02) (-0.75) (1.90) (-0.44) (-0.51) (-1.40) (0.61)

10-20 km -3.60E-06 -6.55E-06 -3.47E-06 -5.73E-06 -3.22E-06 -7.12E-06 -5.33E-06 -3.73E-06

(-2.10) (-3.46) (-1.44) (-0.06) (-2.50) (-2.46) (-1.17) (-2.86)

20-30 km -1.36E-06 5.52E-07 -8.12E-07 -0.00024592 -3.05E-06 1.25E-06 9.61E-07 -6.68E-07

(-0.76) (0.31) (-0.30) (-1.36) (-1.67) (0.44) (0.23) (-0.38)

Observations 187,755 1,163,883 747,708 38,298 1,185,280 164,805 484,010 477,208

Adj. R-squared 0.0015 0.0007 0.0013 0.0033 0.0023 0.0025 0.0031 0.0068


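1 Coefficients reported are ring level localization and urbanization effects obtained by OLS estimation of equation (3.1) for each two-digit industry.

2 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

3 Numbers in parentheses are t-statistics clustered at the grid level.

4 Sample is restricted to firms with a single establishment.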


Table OA2 (Continued). OLS Estimates - Firm Birth Share (Single Establishment)




Code 21 22 23 24 25 26 27 28

Localization Effects

0-1 km 3.72E-05 2.59E-05 2.28E-05 0.00014626 9.98E-05 6.32E-05 3.87E-05 0.00033713

(3.83) (5.05) (2.30) (2.46) (3.53) (8.56) (4.59) (2.17)

1-5 km 3.61E-06 2.07E-06 1.63E-07 5.44E-06 4.75E-06 2.61E-06 1.16E-06 7.10E-06

(1.51) (2.14) (0.10) (0.96) (1.74) (2.43) (1.17) (0.81)

5-10 km 3.39E-06 2.50E-08 -1.74E-06 -3.54E-07 2.51E-06 1.59E-07 -6.21E-07 -4.51E-06

(2.51) (0.03) (-1.07) (-0.06) (0.90) (0.21) (-0.74) (-0.90)

10-20 km -5.74E-07 4.72E-07 -1.73E-06 -5.99E-06 2.30E-07 -2.14E-07 5.99E-07 1.39E-06

(-0.18) (0.49) (-0.83) (-1.00) (0.11) (-0.31) (0.86) (0.30)

20-30 km -1.19E-06 -1.51E-06 1.29E-07 -1.16E-05 -2.65E-07 -2.54E-06 -2.03E-06 -2.41E-09

(-0.64) (-1.84) (0.08) (-1.67) (-0.09) (-3.12) (-2.23) (0.00)

Urbanization Effects

0-1 km

(3.35) (3.88) (5.55) (4.52) (2.64) (7.76) (6.90) (2.29)

1-5 km -6.66E-06 -2.60E-06 1.48E-06 -6.24E-07 -5.39E-06 -1.65E-06 -4.48E-06 -6.93E-06

(-1.55) (-1.30) (0.66) (-0.19) (-2.36) (-2.64) (-3.19) (-0.95)

5-10 km -3.82E-06 -3.86E-06 -3.15E-06 -5.07E-06 -2.65E-06 -2.78E-06 -3.58E-06 -6.98E-06

(-0.61) (-1.73) (-1.13) (-1.12) (-0.68) (-3.23) (-1.74) (-0.91)

10-20 km 1.37E-06 -6.53E-07 -2.05E-06 -8.30E-06 -8.44E-06 -1.62E-06 -5.94E-06 1.94E-05

(0.20) (-0.25) (-0.63) (-1.78) (-1.59) (-1.28) (-2.55) (1.35)

20-30 km -1.26E-06 -3.58E-06 -4.29E-06 -5.54E-06 1.28E-06 -2.41E-06 1.97E-06 -2.53E-05

(-0.24) (-1.62) (-0.79) (-1.11) (0.23) (-1.87) (0.86) (-1.77)

Observations 248,240 294,880 261,405 574,448 182,448 1,947,930 360,522 237,055

Adj. R-squared 0.0005 0.0007 0.0004 0.0007 0.0025 0.0011 0.0019 0.0066


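1 Coefficients reported are ring level localization and urbanization effects obtained by OLS estimation of equation (3.1) for each two-digit industry.

2 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

3 Numbers in parentheses are t-statistics clustered at the grid level.

4 Sample is restricted to firms with a single establishment.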





Code 29 30 31 32 33 34 35 36

Localization Effects

0-1 km 0.00016478 4.04E-05 6.74E-05 4.28E-05 6.78E-05 4.53E-05 6.25E-05 8.86E-05

(4.51) (7.01) (8.73) (5.39) (4.85) (6.97) (8.90) (6.01)

1-5 km 6.00E-06 2.80E-06 6.73E-06 4.14E-06 4.47E-06 4.75E-06 4.10E-06 3.61E-06

(2.53) (3.63) (5.57) (3.16) (2.65) (4.84) (5.89) (2.72)

5-10 km 3.51E-06 1.61E-06 3.40E-06 -2.33E-07 -3.27E-07 2.64E-07 1.53E-06 1.60E-06

(1.59) (2.14) (4.01) (-0.28) (-0.25) (0.35) (2.93) (1.10)

10-20 km 3.75E-06 1.47E-06 9.00E-07 -1.49E-07 -2.77E-06 1.09E-07 3.23E-08 -1.69E-06

(2.02) (2.35) (0.89) (-0.18) (-1.99) (0.11) (0.02) (-1.28)

20-30 km 1.70E-06 1.66E-07 -9.04E-09 -8.51E-07 -1.34E-06 -1.17E-06 -1.84E-06 -3.15E-06

(0.73) (0.18) (-0.01) (-0.76) (-0.79) (-1.05) (-0.97) (-2.35)

Urbanization Effects

0-1 km

(4.67) (9.85) (9.00) (4.47) (4.99) (8.78) (10.82) (9.14)

1-5 km -3.96E-06 -2.92E-06 -3.15E-06 -2.60E-06 -5.62E-06 -4.19E-06 -2.58E-06 -1.72E-06

(-2.27) (-4.49) (-4.25) (-1.75) (-2.52) (-3.26) (-3.77) (-1.95)

5-10 km -9.76E-07 -2.24E-06 4.55E-07 -3.91E-06 -2.68E-07 -9.17E-07 -9.63E-07 -2.69E-06

(-0.39) (-2.76) (0.49) (-1.91) (-0.06) (-0.99) (-1.32) (-2.89)

10-20 km -7.37E-06 -1.41E-06 -4.31E-06 -4.19E-06 -7.14E-06 -1.86E-06 -3.54E-06 -2.27E-06

(-1.99) (-1.20) (-3.44) (-1.59) (-1.55) (-1.34) (-3.50) (-1.80)

20-30 km -5.16E-06 -4.17E-06 -2.20E-06 -6.52E-07 -3.62E-06 1.25E-06 -6.18E-07 -6.80E-06

(-1.22) (-2.98) (-1.88) (-0.21) (-0.51) (0.79) (-0.61) (-1.87)

Observations 450,945 536,499 2,023,470 231,468 485,208 1,066,302 1,881,762 2,338,480

Adj. R-squared 0.0043 0.0054 0.0019 0.0043 0.0018 0.0013 0.0012 0.0018


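1 Coefficients reported are ring level localization and urbanization effects obtained by OLS estimation of equation (3.1) for each two-digit industry.

2 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

3 Numbers in parentheses are t-statistics clustered at the grid level.

4 Sample is restricted to firms with a single establishment.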





Code 37 39 40 41 42

Localization Effects

0-1 km 9.05E-05 5.18E-05 0.00011004 0.00015562 0.00017832

(4.90) (7.65) (4.85) (2.82) (5.89)

1-5 km 8.17E-06 3.78E-06 4.12E-06 6.31E-06 5.77E-06

(2.70) (3.43) (1.65) (1.15) (2.29)

5-10 km 1.89E-06 1.48E-06 -5.40E-07 4.07E-06 7.44E-06

(1.18) (2.00) (-0.27) (1.42) (3.80)

10-20 km -2.48E-06 1.11E-06 -2.00E-06 1.06E-06 3.79E-06

(-1.24) (1.54) (-0.84) (0.41) (2.05)

20-30 km -3.47E-06 6.96E-07 -4.13E-06 -3.45E-06 5.61E-07

(-1.91) (0.77) (-1.96) (-0.78) (0.31)

Urbanization Effects

0-1 km

(5.47) (7.75) (4.88) (4.68) (5.31)

1-5 km -1.71E-06 -1.71E-06 -3.08E-06 1.68E-06 -7.58E-06

(-1.69) (-1.71) (-1.43) (0.64) (-3.71)

5-10 km -7.94E-07 -1.87E-06 4.82E-08 -5.29E-06 -1.98E-06

(-0.76) (-2.05) (0.02) (-1.36) (-0.88)

10-20 km -5.73E-06 -2.27E-06 -3.67E-06 -5.14E-06 -3.74E-08

(-2.70) (-1.34) (-0.93) (-1.55) (-0.01)

20-30 km -6.44E-08 -7.38E-07 -4.03E-06 -5.25E-06 -4.67E-06

(-0.03) (-0.27) (-1.03) (-1.41) (-0.97)

Observations 1,207,689 1,403,952 679,714 1,052,592 597,576

Adj. R-squared 0.0012 0.001 0.0019 0.0007 0.0048

1 Coefficients reported are ring level localization and urbanization effects obtained by OLS estimation of equation (3.1) for each two-digit industry.

2 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

3 Numbers in parentheses are t-statistics clustered at the grid level.

4 Sample is restricted to firms with a single establishment.


Table OA3. OLS Estimates - Firm Birth Share (Non-SOEs)

(1) (2) (3) (4) (5) (6) (7) (8)

Name: (1) Food Processing; (2) Food Production; (3) Beverage Production; (4) Tobacco Processing; (5) Textile Industry; (6) Garments & Other Fiber Products; (7) Leather, Furs, Down & Related Products; (8) Timber Processing, Bamboo, Cane, Palm Fiber & Straw Products

Code 13 14 15 16 17 18 19 20

Localization Effects

0-1 km 4.60E-05 5.28E-05 8.78E-05 -2.89E-05 7.92E-05 2.92E-05 0.00014555 6.79E-05

(6.87) (5.47) (4.62) (-1.36) (7.78) (4.44) (3.50) (7.92)

1-5 km 2.28E-06 4.55E-06 2.86E-06 -1.96E-05 4.27E-06 2.81E-06 -3.14E-06 5.28E-06

(2.58) (2.32) (1.78) (-1.41) (3.00) (2.12) (-0.64) (3.74)

5-10 km 1.94E-06 3.01E-07 -9.34E-07 -1.30E-05 -1.93E-07 -2.19E-07 2.62E-07 2.68E-06

(2.62) (0.26) (-0.64) (-1.50) (-0.17) (-0.25) (0.13) (2.91)

10-20 km 1.04E-06 1.27E-06 -4.56E-07 -3.62E-06 -5.69E-07 2.52E-07 -6.66E-07 2.53E-06

(1.14) (0.99) (-0.27) (-1.05) (-0.41) (0.19) (-0.37) (2.65)

20-30 km -8.59E-07 -1.96E-07 -6.41E-07 -1.91E-06 -2.41E-06 -4.35E-06 -1.36E-06 1.81E-06

(-0.75) (-0.14) (-0.37) (-0.70) (-1.74) (-2.48) (-0.49) (1.60)

Urbanization Effects

0-1 km

(4.55) (5.80) (4.45) (1.26) (7.11) (3.76) (3.53) (5.20)

1-5 km -4.25E-06 -2.54E-06 -4.77E-06 -4.50E-05 -3.85E-06 -5.83E-06 -3.92E-06 -5.77E-06

(-4.89) (-2.33) (-2.06) (-1.15) (-3.60) (-2.88) (-1.85) (-5.62)

5-10 km -2.29E-06 -1.45E-07 -9.76E-07 7.32E-05 -4.11E-07 -1.43E-06 -5.08E-06 4.79E-07

(-1.85) (-0.11) (-0.47) (1.69) (-0.42) (-0.62) (-1.47) (0.43)

10-20 km -3.46E-06 -6.46E-06 -3.13E-06 -7.04E-05 -3.28E-06 -6.42E-06 -5.74E-06 -3.36E-06

(-2.12) (-3.49) (-1.37) (-1.25) (-2.60) (-2.28) (-1.30) (-2.69)

20-30 km -1.21E-06 8.15E-07 -1.60E-06 -9.81E-05 -2.79E-06 1.18E-06 3.07E-08 -1.43E-06

(-0.75) (0.45) (-0.64) (-0.87) (-1.59) (0.41) (0.01) (-0.84)


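1 Coefficients reported are ring level localization and urbanization effects obtained by OLS estimation of equation (3.1) for each two-digit industry.

2 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

3 Numbers in parentheses are t-statistics clustered at the grid level.

4 Sample is restricted to non-SOEs.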


Table OA3 (Continued). OLS Estimates - Firm Birth Share (Non-SOEs)




Code 21 22 23 24 25 26 27 28

Localization Effects

0-1 km 3.52E-05 2.64E-05 2.22E-05 0.00014124 0.00010132 6.39E-05 3.97E-05 0.00032937

(3.79) (5.23) (2.23) (2.45) (3.53) (8.62) (4.64) (2.17)

1-5 km 4.73E-06 2.06E-06 1.95E-06 5.28E-06 6.48E-06 2.39E-06 6.44E-07 7.25E-06

(1.93) (2.13) (0.97) (0.95) (2.25) (2.27) (0.67) (0.84)

5-10 km 3.10E-06 -5.94E-08 -2.06E-06 -4.85E-07 2.43E-06 2.95E-07 -1.60E-07 -4.58E-06

(2.37) (-0.08) (-1.21) (-0.08) (0.86) (0.41) (-0.19) (-0.95)

10-20 km -2.76E-07 3.52E-07 -1.79E-06 -5.80E-06 1.22E-06 -1.62E-07 1.75E-07 7.07E-06

(-0.09) (0.37) (-0.87) (-1.01) (0.62) (-0.25) (0.26) (1.15)

20-30 km -1.25E-07 -1.60E-06 -2.82E-07 -1.16E-05 1.74E-07 -2.46E-06 -2.67E-06 1.37E-06

(-0.08) (-2.01) (-0.18) (-1.73) (0.07) (-3.26) (-2.94) (0.39)

Urbanization Effects

0-1 km

(3.34) (3.63) (5.60) (4.54) (2.59) (7.59) (6.67) (2.17)

1-5 km -6.60E-06 -2.21E-06 1.00E-06 -7.90E-07 -5.15E-06 -1.76E-06 -4.79E-06 -5.93E-06

(-1.55) (-1.07) (0.45) (-0.24) (-2.25) (-2.81) (-3.41) (-0.83)

5-10 km -3.80E-06 -3.21E-06 -3.22E-06 -5.29E-06 -3.66E-06 -2.65E-06 -3.19E-06 -4.23E-06

(-0.61) (-1.55) (-1.16) (-1.20) (-0.93) (-3.08) (-1.57) (-0.58)

10-20 km 1.50E-06 -7.09E-07 -2.57E-06 -7.99E-06 -6.22E-06 -1.35E-06 -5.90E-06 1.68E-05

(0.22) (-0.27) (-0.81) (-1.81) (-1.22) (-1.07) (-2.63) (1.25)

20-30 km -9.56E-07 -3.09E-06 -2.83E-06 -5.90E-06 7.39E-07 -2.63E-06 2.93E-06 -2.30E-05

(-0.18) (-1.40) (-0.53) (-1.25) (0.13) (-1.99) (1.32) (-1.65)

1 Coefficients reported are ring level localization and urbanization effects obtained by OLS estimation of equation (3.1) for each two-digit industry.

2 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

3 Numbers in parentheses are t-statistics clustered at the grid level.

4 Sample is restricted to non-SOEs.





Code 29 30 31 32 33 34 35 36

Localization Effects

0-1 km 0.00016785 3.97E-05 6.73E-05 4.16E-05 6.79E-05 4.38E-05 6.20E-05 8.85E-05

(4.58) (7.00) (8.73) (5.42) (4.84) (6.92) (8.86) (6.09)

1-5 km 4.97E-06 2.68E-06 6.47E-06 4.23E-06 4.32E-06 4.62E-06 4.24E-06 3.47E-06

(2.12) (3.53) (5.46) (3.21) (2.45) (4.74) (5.67) (2.66)

5-10 km 3.29E-06 1.63E-06 3.39E-06 5.09E-11 1.57E-07 1.19E-07 2.79E-06 1.39E-06

(1.54) (2.18) (4.08) (0.00) (0.12) (0.16) (2.62) (1.01)

10-20 km 3.90E-06 1.36E-06 7.17E-07 1.65E-08 -1.63E-06 9.66E-08 7.84E-07 -1.37E-06

(2.12) (2.21) (0.75) (0.02) (-1.16) (0.10) (0.76) (-1.12)

20-30 km 1.36E-06 1.56E-07 -2.83E-07 -6.03E-07 5.02E-07 -8.32E-07 -7.54E-07 -3.03E-06

(0.59) (0.17) (-0.26) (-0.58) (0.30) (-0.78) (-0.71) (-2.51)

Urbanization Effects

0-1 km

(4.67) (9.82) (8.85) (4.40) (5.15) (8.79) (10.78) (9.13)

1-5 km -3.85E-06 -2.90E-06 -3.17E-06 -2.62E-06 -5.57E-06 -4.20E-06 -2.64E-06 -1.69E-06

(-2.29) (-4.53) (-4.30) (-1.76) (-2.51) (-3.28) (-3.99) (-1.95)

5-10 km -3.43E-07 -2.24E-06 5.36E-07 -4.13E-06 -5.48E-07 -9.73E-07 -1.09E-06 -2.71E-06

(-0.14) (-2.79) (0.58) (-2.04) (-0.12) (-1.05) (-1.49) (-2.93)

10-20 km -7.71E-06 -1.18E-06 -4.29E-06 -3.96E-06 -7.21E-06 -1.99E-06 -3.26E-06 -2.19E-06

(-2.05) (-1.01) (-3.42) (-1.51) (-1.60) (-1.46) (-3.27) (-1.71)

20-30 km -4.88E-06 -4.14E-06 -1.91E-06 -1.15E-06 -3.96E-06 7.91E-07 -6.88E-07 -6.84E-06

(-1.17) (-2.95) (-1.67) (-0.39) (-0.59) (0.52) (-0.66) (-1.93)


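1 Coefficients reported are ring level localization and urbanization effects obtained by OLS estimation of equation (3.1) for each two-digit industry.

2 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

3 Numbers in parentheses are t-statistics clustered at the grid level.

4 Sample is restricted to non-SOEs.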





Code 37 39 40 41 42

Localization Effects

0-1 km 9.76E-05 5.01E-05 0.0001098 0.00015154 0.00017236

(4.65) (7.84) (4.93) (2.83) (6.02)

1-5 km 6.09E-06 3.64E-06 3.53E-06 5.88E-06 5.15E-06

(2.15) (3.39) (1.40) (1.11) (2.15)

5-10 km 1.71E-06 1.63E-06 -7.16E-07 3.01E-06 7.32E-06

(1.08) (2.21) (-0.36) (1.16) (3.89)

10-20 km -1.76E-06 8.00E-07 -2.54E-06 7.13E-07 2.72E-06

(-0.86) (1.17) (-1.14) (0.30) (1.51)

20-30 km -2.67E-06 3.76E-07 -4.99E-06 -3.85E-06 -5.01E-07

(-1.66) (0.43) (-2.43) (-0.94) (-0.29)

Urbanization Effects

0-1 km

(5.64) (7.79) (4.76) (4.61) (5.40)

1-5 km -1.56E-06 -1.79E-06 -2.89E-06 1.65E-06 -8.18E-06

(-1.52) (-1.78) (-1.35) (0.63) (-4.06)

5-10 km -8.59E-07 -1.93E-06 2.50E-07 -5.53E-06 -8.99E-07

(-0.79) (-2.22) (0.12) (-1.41) (-0.45)

10-20 km -5.64E-06 -2.58E-06 -3.33E-06 -4.68E-06 -1.10E-06

(-2.67) (-1.54) (-0.88) (-1.39) (-0.36)

20-30 km -8.31E-07 -8.21E-07 -3.96E-06 -6.02E-06 -5.95E-06

(-0.31) (-0.31) (-1.03) (-1.66) (-1.24)

1 Coefficients reported are ring level localization and urbanization effects obtained by OLS estimation of equation (3.1) for each two-digit industry.

2 Control variables include Herfindahl index representing industry organization for each four-digit industry within 30 km of each grid, Herfindahl index representing industry diversity within 30 km of each grid, prefecture city fixed effects, and four-digit industry fixed effects.

3 Numbers in parentheses are t-statistics clustered at the grid level.

4 Sample is restricted to non-SOEs.


Table OA4.1. Spatial Decay Speed with Negative Linear Distance Decay Function Based on IV Lasso Estimates

(1) (2) (3) (4) (5) (6) (7) (8)

Code 13 14 15 16 17 18 19 20

Decay Speed 6.89E-07 8.69E-07 6.80E-07 3.27E-07 7.71E-07 9.08E-08 1.04E-06 8.98E-07

(0.75) (0.85) (0.64) (0.11) (0.72) (0.09) (0.74) (1.06)




Code 21 22 23 24 25 26 27 28



Name: Rubber Products; Plastic Products; Nonmetal Mineral Products; Smelting & Pressing of Ferrous Metals; Smelting & Pressing of Nonferrous Metals; Metal Products; Machinery & Equipment Manufacturing; Special Equipment Manufacturing

Code 29 30 31 32 33 34 35 36




Code 37 39 40 41 42

Decay Speed 9.78E-07 6.56E-07 1.07E-06 1.76E-06 2.98E-06

(0.71) (0.67) (0.89) (1.09) (1.83)

2 The spatial decay function is specified as f(d) = −d.

3 Numbers in parentheses are t-statistics.


Table OA4.2. Spatial Decay Speed with Negative Square Distance Decay Function Based on IV Lasso Estimates

(1) (2) (3) (4) (5) (6) (7) (8)

Code 13 14 15 16 17 18 19 20

Code 21 22 23 24 25 26 27 28

Code 29 30 31 32 33 34 35 36

Code 37 39 40 41 42

2 The spatial decay function is specified as f(d) = −d^2.

3 Numbers in parentheses are t-statistics.


Table OA4.3. Spatial Decay Speed with Inverse Square Exponential Distance Decay Function Based on IV Lasso Estimates

(1) (2) (3) (4) (5) (6) (7) (8)

Code 13 14 15 16 17 18 19 20

Decay Speed 0.000114 0.000163 0.000245 -0.000030 0.000224 0.000170 0.000290 0.000126

(1.67) (2.20) (2.32) (-0.24) (2.63) (1.93) (2.25) (2.00)

Code 21 22 23 24 25 26 27 28

Decay Speed 0.000214 0.000107 0.000181 0.000309 0.000300 0.000211 0.000189 0.000370

(2.17) (1.68) (1.91) (2.35) (2.31) (3.14) (2.46) (1.99)

Code 29 30 31 32 33 34 35 36

Decay Speed 0.000210 0.000147 0.000184 0.000145 0.000185 0.000185 0.000195 0.000317

(2.05) (2.26) (2.76) (1.99) (2.22) (2.72) (2.89) (3.50)

Code 37 39 40 41 42

Decay Speed 0.000249 0.000216 0.000296 0.000238 0.000335

(2.42) (3.07) (2.68) (2.11) (3.08)

2 The spatial decay function is specified as f(d) = 1/e^(2d).

3 Numbers in parentheses are t-statistics.


Table OA4.4. Spatial Decay Speed with Negative Cube Distance Decay Function Based on IV Lasso Estimates

(1) (2) (3) (4) (5) (6) (7) (8)

Code 13 14 15 16 17 18 19 20

Code 21 22 23 24 25 26 27 28

Code 29 30 31 32 33 34 35 36

Code 37 39 40 41 42

2 The spatial decay function is specified as f(d) = −d^3.

3 Numbers in parentheses are t-statistics.


Table OA4.5. Spatial Decay Speed with Inverse Cube Distance Decay Function Based on IV Lasso Estimates

(1) (2) (3) (4) (5) (6) (7) (8)

Code 13 14 15 16 17 18 19 20

Decay Speed 0.000044 0.000061 0.000085 -0.000024 0.000078 0.000029 0.000146 0.000064

(9.04) (9.20) (10.55) (-2.72) (12.95) (5.98) (12.02) (11.54)

Code 21 22 23 24 25 26 27 28

Decay Speed 0.000033 0.000026 0.000022 0.000144 0.000093 0.000062 0.000040 0.000318

(5.69) (6.10) (3.76) (10.01) (9.58) (12.18) (7.35) (13.89)

Code 29 30 31 32 33 34 35 36

Decay Speed 0.000161 0.000038 0.000063 0.000040 0.000068 0.000043 0.000059 0.000087

(14.28) (8.47) (12.18) (7.76) (9.71) (8.92) (11.81) (12.21)

Code 37 39 40 41 42

Decay Speed 0.000087 0.000048 0.000109 0.000145 0.000168

(11.04) (10.10) (12.29) (10.68) (16.66)

2 The spatial decay function is specified as f(d) = 1/d^3.

3 Numbers in parentheses are t-statistics.


Table OA4.6. Spatial Decay Speed with Inverse Cube Exponential Distance Decay Function Based on IV Lasso Estimates

(1) (2) (3) (4) (5) (6) (7) (8)

Code 13 14 15 16 17 18 19 20

Decay Speed 0.000885 0.001229 0.001704 -0.000487 0.001561 0.000584 0.002930 0.001274

(8.35) (8.50) (9.74) (-2.53) (11.95) (5.50) (11.13) (10.66)

Code 21 22 23 24 25 26 27 28

Decay Speed 0.000665 0.000520 0.000434 0.002867 0.001854 0.001246 0.000806 0.006355

(5.25) (5.62) (3.46) (9.21) (8.83) (11.24) (6.79) (12.82)

Code 29 30 31 32 33 34 35 36

Decay Speed 0.003227 0.000764 0.001260 0.000800 0.001361 0.000849 0.001172 0.001731

(13.19) (7.82) (11.22) (7.15) (8.95) (8.22) (10.89) (11.26)

Code 37 39 40 41 42

Decay Speed 0.001743 0.000968 0.002175 0.002909 0.003364

(10.17) (9.32) (11.34) (9.85) (15.39)

2 The spatial decay function is specified as f(d) = 1/e^(3d).

3 Numbers in parentheses are t-statistics.
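To compare how quickly the candidate functional forms fall off, the sketch below evaluates each form named in the decay-speed table notes, plus the inverse square form 1/d^2 used elsewhere in the paper, at a common set of distances. The exponential forms are read as 1/e^d, 1/e^(2d) and 1/e^(3d). The distances are an assumption (the outer radius of each ring, in km), and the dictionary keys and function name are illustrative only.

```python
import math

# Functional forms keyed by the wording used in the table titles.
DECAY_FORMS = {
    "inverse linear": lambda d: 1.0 / d,
    "inverse square": lambda d: 1.0 / d ** 2,
    "inverse cube": lambda d: 1.0 / d ** 3,
    "inverse exponential": lambda d: math.exp(-d),
    "inverse square exponential": lambda d: math.exp(-2.0 * d),
    "inverse cube exponential": lambda d: math.exp(-3.0 * d),
    "negative linear": lambda d: -d,
    "negative square": lambda d: -(d ** 2),
    "negative cube": lambda d: -(d ** 3),
}

# Assumption: rings 0-1, 1-5, 5-10, 10-20, 20-30 km are represented
# by their outer radii.
RING_DISTANCE_KM = [1.0, 5.0, 10.0, 20.0, 30.0]


def decay_profile(name, distances_km=RING_DISTANCE_KM):
    """Evaluate the named decay form at each distance."""
    f = DECAY_FORMS[name]
    return [f(d) for d in distances_km]


for name in DECAY_FORMS:
    print(name, decay_profile(name))
```

Every form is decreasing in d, so a positive coefficient on f(d) corresponds to effects that weaken with distance; the forms differ in how much of the decline is concentrated in the first ring.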


Table OA5. Spatial Decay Speed And Industry Characteristics (Non-SOEs)

Regression Using OLS Estimates of Localization Effects

(1) (2) (3) (4)

f(d) 0.000092 0.0000911 0.0001066 0.0000878

(9.33) (9.12) (10.43) (7.57)

f(d) × labor market pooling 0.0000377 0.0000226

(3.24) (0.91)

f(d) × input sharing -7.63E-06 6.02E-06

(-0.69) (0.50)

f(d) × natural advantage 6.07E-06 0.0000124 7.74E-06 8.08E-06

(0.59) (1.17) (0.69) (0.71)

f(d) × high SOE share -0.0000456 -0.0000526 -0.0000347 -0.0000512

(-4.20) (-4.48) (-3.24) (-4.11)

Constant -9.32E-06 -9.32E-06 -9.32E-06 -9.32E-06

(-3.06) (-3.06) (-2.96) (-3.05)

Adj. R-squared 0.5899 0.5889 0.5596 0.5866

1 Results are obtained by OLS estimation of (3.3), where localization effects from OLS estimation for the sample of non-SOEs are regressed on a spatial decay function, and interaction terms of the decay function and various industry characteristic indicators.

2 The spatial decay function is specified as f(d) = 1/d^2.

3 For each two-digit industry, the indicator for reliance on knowledge spillovers equals one if the ratio of new product to total product in the industry is higher than the median of all industries and zero otherwise. The indicator for reliance on labor pooling equals one if the percentage of college-educated workers in the industry is higher than the median of all industries and zero otherwise. The indicator for reliance on input sharing equals one if transportation cost per shipment in the industry is higher than the median of all industries and zero otherwise. The indicator for reliance on natural advantage equals one if at least two of the three cost variables (water, energy, and natural resources cost per shipment) in the industry are higher than the median of all industries and zero otherwise. The indicator variable high SOE share equals one if the percentage of SOE firms in the industry is higher than the median of all industries and zero otherwise.

4 Numbers in parentheses are t-statistics.

5 Sample is restricted to non-SOE firms.
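The median-split indicators described in note 3 can be sketched as follows. The function names and all input values are hypothetical; only the rules themselves (above the cross-industry median, and at least two of three cost variables for natural advantage) come from the note.

```python
from statistics import median


def above_median(values):
    """Map each industry code to 1 if its value exceeds the median, else 0."""
    cutoff = median(values.values())
    return {code: int(v > cutoff) for code, v in values.items()}


def natural_advantage(water, energy, resources):
    """1 if at least two of the three cost variables are above their medians."""
    flags = [above_median(water), above_median(energy), above_median(resources)]
    return {code: int(sum(f[code] for f in flags) >= 2) for code in water}


# Hypothetical share of college-educated workers by two-digit industry code.
college_share = {"13": 0.05, "27": 0.30, "40": 0.25}
labor_pooling = above_median(college_share)

# Hypothetical cost-per-shipment values for the three natural-advantage inputs.
water = {"13": 1.0, "27": 2.0, "40": 3.0}
energy = {"13": 1.0, "27": 2.0, "40": 3.0}
resources = {"13": 3.0, "27": 2.0, "40": 1.0}
natural = natural_advantage(water, energy, resources)
```

With a strict comparison, an industry sitting exactly at the median is coded zero, which is one reading of "higher than the median" in the note.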


Figure OA1: Concentration of Prefecture City Level Employment by Industry

Panels, one per two-digit industry: 13. Food Processing; 14. Food Production; 15. Beverage Production; 16. Tobacco Processing; 17. Textile Industry; 18. Garments & Other Fibre Products; 19. Leather, Furs, Down & Related Products; 20. Timber Processing, Bamboo, Cane, Palm Fibre & Straw Products; 21. Furniture Manufacturing; 22. Papermaking & Paper Products; 23. Printing & Record Pressing; 24. Stationery, Educational & Sports Goods; 25. Petroleum Processing, Coking Products, Gas Production & Supply; 26. Raw Chemical Materials & Chemical Products; 27. Medical & Pharmaceutical Products; 28. Chemical Fibres; 29. Rubber Products; 30. Plastic Products; 31. Non-metal Mineral Products; 32. Smelting & Pressing of Ferrous Metals; 33. Smelting & Pressing of Nonferrous Metals; 34. Metal Products; 35. Machinery & Equipment Manufacturing; 36. Special Equipment Manufacturing; 37. Transportation Equipment Manufacturing; 39. Electric Equipment & Machinery; 40. Electronic & Telecommunications; 41. Instruments, Meters, Cultural & Official Machinery; 42. Artwork & Other Manufacturing
