
Structural Homophily∗

Vincent Boucher1

Abstract

Homophily, or the fact that similar individuals tend to interact with each other, is a

prominent feature of economic and social networks. I show that the equilibrium structure

of homophily has empirical power. I build a strategic model of network formation, which

produces a unique equilibrium network. Individuals have homophilic preferences and face

capacity constraints on the number of links. I develop a novel empirical method, based on

the shape of the equilibrium network, which allows for the identification and estimation of

the underlying homophilic preferences. I apply this new methodology to the formation of

friendship networks.

JEL Codes: D85,C72,C13

∗Manuscript received May 2012, revised July 2013 and January 2014.

1I would like to thank Yann Bramoulle, Marc Henry and Onur Ozgur for sharing their time, and for their valuable comments and discussions. I also want to thank Lars Ehlers for his precious help, and for many discussions and comments. Thanks also to Paolo Pin for his helpful comments. I thank co-editor Holger Sieg and three anonymous referees for their valuable comments and suggestions. I would also like to thank Ismael Mourifie, Louis-Philippe Beland, Yousef Msaid and David Karp, as well as the participants at many seminars including members of the Departments of Economics of Colorado State University, McGill University, Universite Laval and Universite de Sherbrooke. I would also like to thank the participants at various conferences, including those of the Canadian Economics Association (2011), Coalition Theory Network (2010), Societe canadienne de science economique (2010), Econcon (2011), and Groupe de recherche international (2011), and the Cambridge-INET workshop on Networks (2012), for their questions, comments and suggestions. Finally, I gratefully acknowledge financial support from CIREQ, FRQSC, SSHRC, and Cambridge-INET. This research uses data from Add Health, a program project directed by Kathleen Mullan Harris and designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill, and funded by grant P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 23 other federal agencies and foundations.

1 Introduction

The fact that similar individuals tend to interact with each other is a prominent feature of

social networks. The phenomenon, referred to as homophily, is increasingly being studied

by economists.2 Indeed, the structure of the social networks in which individuals inter-

act has been shown to significantly influence many social outcomes such as segregation,3

information transmission and learning,4 and employment and wages.5 Being able to un-

derstand, identify, and measure how the social characteristics of an individual influence

network formation is therefore of central importance to understanding social outcomes.

However, most studies to date have overlooked the equilibrium implications of homophily,

and have disregarded key factors such as the impact of time constraints.

In this paper, I develop a model of strategic network formation incorporating homophilic

preferences and capacity constraints on the number of links formed in a network. My anal-

ysis uncovers novel structural predictions generated by the equilibrium interplay between

the individuals’ homophilic preferences and capacity constraints. Building on the explicit

structure of homophily obtained at equilibrium, I develop a new estimation technique that

allows one to recover underlying homophilic preference parameters. As an illustration, I

study the impact of homophilic preferences on the formation of friendship networks among

American teenagers. I find that individuals’ choices of friends are most strongly influenced

by racial considerations, and that Hispanics and Blacks are much more segregated than

Asians. I also find that age, gender and socio-economic differences significantly affect the

formation of friendship networks.

The emphasis on the equilibrium implications of homophilic preferences is new to the

literature. The equilibrium network resulting from the theoretical model exhibits more

structure than the known stylized facts regarding homophilic patterns in social networks.6

The equilibrium network architecture allows for an original empirical methodology using a

maximum likelihood approach. A key feature of my estimation strategy is that it recovers

2See for example Currarini et al. (2009), Bramoulle et al. (2012), and Rivas (2009).

3Echenique and Fryer (2007), Watts (2007), and Mele (2013).

4Golub and Jackson (2010a, 2010b).

5van der Leij et al. (2009) and Patacchini and Zenou (2012).

6See Bramoulle et al. (2012), and Currarini et al. (2009).


explicit preference parameters characterizing homophily in social networks. In other words,

the estimation strategy allows for the identification of (homophilic) preference interactions

from constraint interactions.7

My theoretical framework produces strong predictions. There exists a generically

unique equilibrium network. A key assumption is that the homophilic preferences of in-

dividuals can be represented by a distance function on the set of characteristics of the

individuals. This idea is implicitly or explicitly exploited by many papers looking at ho-

mophily in social networks.8 This assumption allows me to introduce enough heterogeneity

in the model to account for individuals’ observed heterogeneity (such as socio-economic and

demographic variables). I also assume that individuals have link-separable utilities, and

an explicit resource constraint, such as time. For example, while a teenager may prefer to

be friends with other teenagers who have similar characteristics, he must take into account

the fact that he has limited time to spend with the friends he chooses to have. The in-

troduction of the resource constraint implies that individuals’ equilibrium payoffs do not

only depend on the direct links they have, but on the whole equilibrium network. The

resource constraint also explicitly introduces an upper bound on the number of bilateral

relationships an individual can sustain.9 The specific notion of homophily emerging in

equilibrium results from the tension between individuals’ homophilic preferences and the

resource constraint. These two premises imply a novel theoretical prediction on the shape

of homophily in equilibrium. I call this specific network architecture structural homophily.

Structural homophily describes an explicit relationship between the individuals’ socioe-

conomic characteristics and the network architecture. An individual is characterized by

a “social neighborhood” on the space of individual characteristics.10 This neighborhood

explicitly determines the set of acceptable bilateral relationships. In a network charac-

terized by structural homophily, two individuals are linked if and only if they belong to

7Manski (2000) distinguishes between three sources of social interactions: preference interactions, constraint interactions, and expectations interactions.

8See for example Johnson and Gilles (2000), Marmaros and Sacerdote (2006), Iijima and Kamada (2013), Mele (2013) and Christakis et al. (2010).

9It relates to the sociological and psychological observation referred to as Dunbar’s number.

10The social neighborhood relates to the sociological notion of a “social niche”; see for example McPherson et al. (2001).


the intersection of their neighborhoods. These neighborhoods are not directly observable,

but are implied by equilibrium predictions of the theoretical model for a given distance

function. This novel theoretical prediction has empirical power.

I use structural homophily to develop an original estimation strategy. This strategy is

based on the duality between the equilibrium network structure and structural homophily.

Any equilibrium network exhibits structural homophily, and any observed network that

exhibits structural homophily is an equilibrium network. I develop a maximum likelihood

approach, defined over a population of distinct social networks. The empirical method

allows for the identification and estimation of the effect of homophilic preferences, while

controlling for many unobserved individual variables. This is relevant for policymaking

since it provides insight into the relative importance of a set of socio-economic character-

istics. For instance, the method allows for the measurement of the magnitude of racial

segregation.

As an illustration, I use data from the friendship networks of American teenagers pro-

vided by the Add Health database.11 I find a strong influence of race in the choice of

friendships. In comparison, age and gender differences are less important. Among the

many individual characteristics, socio-economic status (in particular, the labour market

status of the parents) least affects the formation of friendships.

This paper contributes to the theoretical and the empirical literature on network for-

mation. Most theoretical models of network formation produce relatively structured equi-

librium networks such as stars, circles or chains.12 These models, although highly relevant

from a theoretical perspective, are not well suited for empirical purposes. Indeed, the re-

sulting set of equilibrium networks is both too large (many equilibrium networks) and too

constraining (stars, chains, circles, etc.) to represent actual observable social networks.

Most theoretical models assume that payoffs depend on detailed features of the network

structure, but neglect the capacity constraints on the number of links an individual can

11Carolina Population Center, University of North Carolina at Chapel Hill; see http://www.cpc.unc.edu/projects/addhealth.

12Bala and Goyal (2000), Jackson (2008, chapter 6), Jackson and Rogers (1997), Jackson and Wolinsky (1996), and Johnson and Gilles (2000).


make.13 I show that the introduction of this constraint, combined with explicit ex-ante

homophilic and link-separable utilities, implies the existence of a unique equilibrium net-

work.14

Two competing explanations of homophily have been proposed. The first is through

correlations in the meeting process:15 individuals have no preference bias, but individuals

with similar characteristics have a higher probability of meeting. The second is through

preference biases:16 individuals prefer to link with similar individuals. In this paper, I

assume that individuals have homophilic preferences, but evolve in a deterministic world.

I analyze the equilibrium implication of these preferences in a fully strategic setting.

The empirical literature on network formation is still in an early stage. The few existing

papers clearly identify homophily as a driving factor of the network formation process.17

This paper contributes to the literature on strategic network formation by providing an

estimation strategy based on the equilibrium structure of homophilic preferences. Focussing

on homophily has the advantage of recovering information relevant for policy making, while

controlling for many unobservable variables. Equilibrium considerations are important, as

they imply a departure from link-level estimation techniques. Specifically, the equilibrium

utility of an individual does not only depend on his direct links, but also on the whole

distribution of links in the population. The model defines a precise dependence structure

that allows for the definition of a standard maximum likelihood estimator.18 Focussing on

homophily allows me to complement and refine existing results in the literature.

There is also a relatively new related literature on peer effects with endogenous social

networks.19 These papers model the formation of a network jointly with a peer effect game

on the realized network. Accordingly, they allow for the identification and estimation of

13Exceptions include Bloch and Dutta (2009) and Rubí-Barceló (2010).

14I concentrate on strategic models of network formation. A large literature exists on random network formation, which is not directly concerned with the current setting. Interested readers can refer to Jackson (2008, chapters 4 and 5) and the references therein.

15See for example Bramoulle et al. (2012).

16See also Currarini et al. (2009), and Mele (2013).

17See for example Christakis et al. (2010), Mele (2013), Currarini et al. (2010), Franz et al. (2010), and Sheng (2012).

18As opposed to the Bayesian approaches, as in Christakis et al. (2010), and Mele (2013).

19Badev (2013), Goldsmith-Pinkham and Imbens (2013), and Hsieh and Lee (2013).


a wide range of social interaction parameters, but are unfortunately strongly dependent

on the structural specifications of the preferences. Focussing on homophily has the ad-

vantage of capturing relevant preference parameters while allowing for a wide range of

non-homophilic preferences.

The remainder of the paper is organized as follows. In section 2, I present the theoretical

model and key definitions. In section 3, I find and characterize the (unique) equilibrium

network. In section 4, I describe the empirical methodology and explore its properties using

Monte Carlo simulations. In section 5, I present an application focussed on the formation

of friendship networks in high schools, and discuss policy implications. I conclude with

section 6.

2 The Theoretical Model

In this section, I present a model of network formation that characterizes the equilibrium

effects of homophily. The model generically produces a unique equilibrium. I first provide

a formal definition of structural homophily. Next, I outline the theoretical framework, and

finally, I briefly present the main definitions and equilibrium concepts.

2.1 Structural Homophily

Some preliminary assumptions are necessary in order to introduce the notion of structural

homophily. There is a finite set of individuals N . Individuals may be linked together

through a network. Let gi ⊆ N be the set of individuals linked to individual i for all i ∈ N .

Each individual i ∈ N is characterized by a type θi ∈ Θ, where Θ is the type space. An

individual’s type could represent, for instance, a series of socioeconomic characteristics. I

consider a distance d on Θ. For notational simplicity, let dij ≡ d(θi, θj) for any i, j ∈ N .

Then, structural homophily is defined as follows.

Definition 1 A network g exhibits structural homophily with respect to a distance $d(\cdot,\cdot)$ if, whenever two individuals i and j are not linked, either $d_{ij} \ge \max_{k \in g_i}\{d_{ik}\}$ or $d_{ij} \ge \max_{k \in g_j}\{d_{jk}\}$.


This definition formalizes the fact that two individuals that are “close” should be

linked.20 Intuitively, if two individuals are not linked, it is because, from the point of

view of one of the individuals, the other is located relatively too far. Note that this defini-

tion only makes sense when the creation of a link requires mutual consent. Figure 1 shows

two examples of networks for Θ = R2. The first network exhibits structural homophily, but

the second does not. In Figure 1b, the closest individuals (i.e. D and E) are not linked,

which is in contradiction with structural homophily since D is linked to C, and E is linked

to B.
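As a minimal sketch, Definition 1 can be checked mechanically on an observed network. The function name, the NumPy representation, the three example locations, and the convention that an individual with no links has a farthest-link distance of minus infinity are all assumptions made for illustration; self-links are ignored.

```python
import numpy as np


def exhibits_structural_homophily(adj, dist):
    """Return True if the network satisfies Definition 1 for the given distances.

    adj  : (n, n) symmetric 0/1 array, adj[i, j] = 1 if i and j are linked.
    dist : (n, n) symmetric array of pairwise distances d_ij.
    """
    adj = np.asarray(adj, dtype=bool)
    dist = np.asarray(dist, dtype=float)
    n = adj.shape[0]
    # Distance to each individual's farthest link (assumption: -inf if no links).
    farthest = np.full(n, -np.inf)
    for i in range(n):
        if adj[i].any():
            farthest[i] = dist[i, adj[i]].max()
    for i in range(n):
        for j in range(i + 1, n):
            # An unlinked pair violates the definition when each individual
            # is strictly closer to the other than to their own farthest link.
            if not adj[i, j] and dist[i, j] < farthest[i] and dist[i, j] < farthest[j]:
                return False
    return True


# Three hypothetical individuals located in R^2.
theta = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
dist = np.linalg.norm(theta[:, None, :] - theta[None, :, :], axis=-1)

respected = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])  # only the closest pair is linked
violated = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]])   # the closest pair is left unlinked
```

In the second network, individuals 0 and 1 are both linked to the more distant individual 2 but not to each other, which is the kind of pattern the definition rules out.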

Figure 1: Structural Homophily. Panel (a): Respected. Panel (b): Violated. [Network diagrams over individuals A, B, C, D and E not reproduced.]

More insight can be obtained by drawing the equivalence (or indifference) curves cor-

responding to the farthest link for each of the individuals considered (i.e. for B and D in

Figure 2a, and for D and E in Figure 2b). These equivalence curves define neighborhoods;

every individual inside the neighborhood of i is closer to i than i’s farthest link. If both in-

dividuals belong to the intersection of the two neighborhoods generated by the equivalence

curves (as in Figure 2b), then structural homophily is violated.21

Structural homophily has an implication for revealed preferences. Suppose that indi-

viduals have preferences over links with other individuals, and that such preferences are a

20Note that the concept of closeness and location, in the context of a social network, is not necessarily limited to the physical location of individuals. Closeness refers to how similar two individuals are across a number of characteristics, such as age, race and gender.

21This closely relates to the cutoff rule of Iijima and Kamada (2013).


Figure 2: Structural Homophily: Equivalence Curves. Panel (a): Respected. Panel (b): Violated. [Network diagrams over individuals A, B, C, D and E not reproduced.]

function of the distance between the individuals. Suppose that we also observe the network

(i.e. the individuals and their links), and the types of the individuals in the network (i.e.

a series of individual characteristics). Then, under mutual consent, we should not observe

networks such as the one depicted in Figure 2b. That is, structural homophily should hold.

It is interesting to note that small-world networks respect structural homophily for a

specific type space.22 In such small-world models, individuals are located on islands. In

that setting, structural homophily implies that individuals are linked first with individuals

on the same island. Hence, if there is a link between two islands, those islands have to be

fully connected. I now present a game that produces structural homophily at equilibrium.

2.2 The Game

There are n individuals, each of whom is endowed with a fixed amount of resources xi = κiξ,

where ξ ∈ R+ and κi ∈ N. We will see that, in equilibrium, κi is interpreted as the

maximum number of links that an individual i can have. A strategy for an individual i is

a vector $x_i = (x_i^1, \dots, x_i^n) \in X_i$, where $X_i = \{x_i \in \mathbb{R}^n_+ \mid x_i^j \le \xi \text{ for all } j \in N, \text{ and } \sum_{j \in N} x_i^j \le \kappa_i \xi\}$. Then,

ξ plays the role of a link-level constraint. The introduction of the link-level constraint is

motivated by the empirical fact that the number of links varies across individuals. Let

$X = \times_{i \in N} X_i$. We say that there is a link between an individual i and an individual j iff

$x_i^j > 0$ and $x_j^i > 0$. Let $g_i = \{j \in N \mid i \text{ and } j \text{ are linked}\}$, so $j \in g_i$ iff $i \in g_j$. That is, a link

22See for example Jackson and Rogers (2005) and Galeotti et al. (2006).


exists iff both individuals invest a strictly positive amount of resources in it. Note that

individual i can be linked to himself.
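The strategy set and the mutual-consent rule for links can be sketched as follows. The array layout (`x[i, j]` standing for the investment of i in the link with j), the function names and the numerical values are assumptions chosen for illustration.

```python
import numpy as np


def is_feasible(x, kappa, xi):
    """Check that every row x[i] lies in the strategy set X_i.

    x[i, j] is the amount individual i invests in the link with j
    (x[i, i] is i's private investment).
    """
    x = np.asarray(x, dtype=float)
    kappa = np.asarray(kappa, dtype=float)
    within_bounds = np.all((x >= 0) & (x <= xi))          # link-level constraint
    within_budget = np.all(x.sum(axis=1) <= kappa * xi)   # resource constraint
    return bool(within_bounds and within_budget)


def links(x):
    """Boolean matrix whose entry (i, j) is True iff both i and j invest a positive amount."""
    x = np.asarray(x, dtype=float)
    return (x > 0) & (x.T > 0)


xi = 1.0
kappa = [1, 2, 1]
x = np.array([
    [0.0, 1.0, 0.0],   # individual 0 invests in the link with 1
    [1.0, 0.0, 0.5],   # individual 1 invests in 0 and, unreciprocated, in 2
    [0.0, 0.0, 1.0],   # individual 2 invests only privately
])
g = links(x)
```

Here the investment of individual 1 in individual 2 creates no link because it is not reciprocated, while the positive diagonal entry of individual 2 counts as a self-link.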

The utility of an individual is given by the function ui : X → R. It is additive in the

different links he has, and it is represented by:

$$u_i(x) = \sum_{j \in N \setminus \{i\}} v_i(x_i^j, x_j^i, d_{ij}) \cdot I_{\{j \in g_i\}} + w_i(x_i^i) \cdot I_{\{i \in g_i\}}$$

where I{P} is an indicator function that takes value 1 if P is true, and 0 otherwise. The

function vi(x, y, d) gives the value of any link for i. It is assumed to be twice continuously

differentiable with vx(x, y, d) > 0 if y > 0, vy(x, y, d) > 0 if x > 0, and vd(x, y, d) < 0 if

x, y > 0. The function wi(xii) represents the payoff received from the private investment of

i.23 It is also twice continuously differentiable with w′(x) > 0. I also allow for the presence

of fixed costs, i.e. vi(0, 0, d) ≤ 0 and wi(0) ≤ 0. Notice that an individual benefits from a

link only if both individuals invest in the link. The model induces a game Γ between the

n individuals. Formally, we have Γ = (N, {Xi}i∈N , {ui}i∈N).
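To make the payoff structure concrete, the sketch below evaluates the utility of an individual for one allocation. The functional forms of `v` and `w` are hypothetical; they are chosen only because they satisfy the stated sign conditions (and have no fixed costs), and the investments and distances are made up.

```python
import math


def v(x, y, d):
    """Hypothetical link value: increasing in both investments, decreasing in distance."""
    return x * y * math.exp(-d)


def w(x):
    """Hypothetical payoff from private investment, with w'(x) > 0."""
    return 0.5 * x


def utility(i, x, dist):
    """u_i(x): sum of link values over i's links plus the private payoff.

    x[i][j] is i's investment in the link with j; dist[i][j] is d_ij.
    """
    total = 0.0
    for j in range(len(x)):
        if j == i:
            continue
        linked = x[i][j] > 0 and x[j][i] > 0   # indicator I{j in g_i}
        if linked:
            total += v(x[i][j], x[j][i], dist[i][j])
    if x[i][i] > 0:                            # indicator I{i in g_i}
        total += w(x[i][i])
    return total


x = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.5],
     [0.0, 0.0, 1.0]]
dist = [[0.0, 1.0, 3.0],
        [1.0, 0.0, 2.0],
        [3.0, 2.0, 0.0]]
payoffs = [utility(i, x, dist) for i in range(3)]
```

Individual 1 earns nothing from the 0.5 invested in individual 2, since a link pays off only when both sides invest.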

The model has two important features. First, the initial endowment creates scarcity

and induces a feasibility constraint. This effect is typical of any matching model. If

some individual i invests resources in a link with individual j, he will have less available

resources to create a link with another individual. That is, the feasibility constraint implies

a tradeoff between the distance between two individuals, and the level of investment they

put in the link. This is what Manski (2000) refers to as “constraint interactions.” Second,

the preferences are affected by the presence of direct externalities. The amount of resources

invested by an individual in a given link directly affects the utility of the individuals he

links to. That is, in Manski’s terms, there are “preference interactions.” These two features

will play an important role in equilibrium.

This completes the description of the game. I now present the main definitions.

23The function wi can also be interpreted as the private value of the resource x for i.


2.3 Definitions

The collection of links between individuals generates a network g = (N,E). A network is

characterized by a set of individuals (here, N), and a set of links, E, which are (unordered)

pairs of individuals. The set of all possible networks is denoted by G. Any network g can

be represented by an n × n adjacency matrix A that takes values aij = 1 if j ∈ gi, and

0 otherwise, for all i, j ∈ N . The degree δi(g) of an individual i is the number of links

attached to i, i.e. δi(g) = |gi|.
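For instance, the adjacency matrix and the degrees can be built from the neighbor sets as in the short sketch below; the four-individual network is hypothetical.

```python
import numpy as np

# Hypothetical neighbor sets g_i for four individuals labelled 0..3.
g = {0: {1, 2}, 1: {0}, 2: {0, 3}, 3: {2}}

n = len(g)
A = np.zeros((n, n), dtype=int)
for i, neighbors in g.items():
    for j in neighbors:
        A[i, j] = 1          # a_ij = 1 iff j is in g_i

degree = A.sum(axis=1)       # delta_i(g) = |g_i|
```

Because links require mutual consent, the resulting matrix is symmetric.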

I am interested in the following solution concepts:

Definition 2 A Nash equilibrium (NE) is a profile $x^* \in X$ such that $u_i(x_i^*, x_{-i}^*) \ge u_i(x_i, x_{-i}^*)$ for all $x_i \in X_i$, and for all $i \in N$.

The set of Nash equilibria is very large. Since an individual benefits only from a

collaborative link when both individuals invest in the link, it will never be profitable to

unilaterally start a new link. For this reason, I focus on the following solution concept,

introduced by Goyal and Vega-Redondo (2007).

Definition 3 A bilateral equilibrium (BE) is a profile $x^* \in X$ such that:

(1) $x^*$ is a Nash equilibrium

(2) There exists no $i, j \in N$ such that $u_i(x_i, x_j, x^*_{-i-j}) > u_i(x^*)$ and $u_j(x_i, x_j, x^*_{-i-j}) \ge u_j(x^*)$ for some $x_i \in X_i$ and $x_j \in X_j$.

This solution concept allows for bilateral deviations. This is a natural extension of

individual rationality, since individuals can benefit from the creation of links. For certain

economies, however, the BE concept will be too constraining. Accordingly, I also introduce

the following weakened equilibrium concept.

Definition 4 A weak bilateral equilibrium (WBE) is a profile $x^* \in X$ such that:

(1) $x^*$ is a Nash equilibrium

(2) There exists no $i, j \in N$ such that $u_i(x_i, x_j, x^*_{-i-j}) > u_i(x^*)$ and $u_j(x_i, x_j, x^*_{-i-j}) > u_j(x^*)$ for some $x_i \in X_i$ and $x_j \in X_j$.


In a WBE, a deviation must strictly increase the payoff of both individuals involved.

Notice that BE ⊆ WBE ⊆ NE. I discuss the distinction between these concepts in

section 3.1 (lemma 3.1 and proposition 3.5).

3 Equilibrium Characterization

I first show the existence of an equilibrium. Since the payoff functions are not continuous,

we cannot directly use the standard fixed-point arguments. The existence of a NE is

straightforward. Let $x_i^j = 0$ for all $j \ne i$. Then, for every individual, the maximization

problem becomes: $\max_{x_i \in X_i} w_i(x_i^i) \cdot I_{\{i \in g_i\}}$. The allocation x∗ ∈ X that maximizes this

problem for all i ∈ N is obviously a NE. In order to show the existence of a WBE (or a BE),

I will need to introduce additional assumptions. The next result provides an intuition of the

additional restrictions imposed by the bilateral stability on the solution set. It states that

if a deviation is jointly profitable, but not unilaterally profitable, the deviating individuals

have to invest more in their collaborative link. All proofs can be found in Appendix 1.

Lemma 3.1 If $x^* \in X$ is a NE, but not a WBE, then given any deviating pair (i, j), with profitable deviations $x_i \in X_i$ and $x_j \in X_j$, we have $x_i^j > x_i^{j*}$ and $x_j^i > x_j^{i*}$.

Since x∗ is a NE, it is individually rational. Also, since the utility functions are additive

in the different links, the actions of individual j only affect i through the link between i

and j. If x∗ is not jointly rational for i and j, the incentive to deviate must come from the

link that i and j have together.

Throughout this section, I consider two alternative assumptions:

Assumption 1 (Discreteness) For all i, j ∈ N , xji ∈ {0, ξ}

Assumption 2 (Convexity) For all i ∈ N, $\frac{\partial^2 v_i}{\partial x^2}(x, y, d) \ge 0$ and $\frac{\partial^2 w_i}{\partial x^2}(x) \ge 0$.

The discreteness assumption is extensively used in the literature.24 Convexity is often

assumed when the network formation process involves continuous strategies. For example,

24See for example Jackson (2008) chapters 6 and 11.


Bloch and Dutta (2009) define the strength of a link between individuals i and j as the

sum of a (strictly) convex function of the individuals’ investment, i.e. $s_{ij} = f(x_i^j) + f(x_j^i)$,

with $f' > 0$ and $f'' > 0$. Rubi-Barcelo (2012) uses a linear (and hence convex) function

to represent the payoff from scientific collaboration between two researchers.25 One of the

reasons why the literature has focussed on convex preferences as opposed to the “standard”

concavity assumption is as follows. Suppose that the value functions are concave. Then,

an individual would prefer to spend a very small amount of time with as many individuals

as possible. This is not what is actually observed, as most individuals have a relatively

small number of links. I provide existence results and show that these two assumptions

imply that the equilibrium network exhibits structural homophily.
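The argument against concavity can be illustrated numerically. The sketch below is a deliberately stripped-down setting, an assumption made for illustration: one individual splits a fixed budget equally across m links, each link is valued by a function f of the individual's own investment only, and the link-level constraint, the partners' investments and distances are ignored. The budget and the two functions are hypothetical.

```python
import math

budget = 4.0  # total resources of one individual (hypothetical value)


def total_value(f, m):
    """Payoff from splitting the budget equally across m links valued by f."""
    return m * f(budget / m)


def concave(x):
    return math.sqrt(x)   # f'' < 0


def convex(x):
    return x ** 2         # f'' > 0


link_counts = (1, 2, 4, 8)
concave_values = [total_value(concave, m) for m in link_counts]
convex_values = [total_value(convex, m) for m in link_counts]
```

With the concave function the total payoff keeps rising as the budget is spread over more links, whereas with the convex function it falls, so concentrating resources on a few links is preferred.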

The next results are based on an algorithm referred to as the assignment algorithm,

and formally defined in Appendix 2. The assignment algorithm uses as inputs: (1) the

list of preferences {ui(x)}i∈N , (2) the individual characteristics {θi}i∈N , (3) the resource

constraints {κi}i∈N , and (4) the distance function d on Θ. It produces at least one allocation

x ∈ X, and any allocation produced is such that xji ∈ {0, ξ} for all i, j ∈ N . When

xji ∈ {0, ξ}, the payoff that an individual receives from the links can be ranked using the

distance function (a small distance implies a big payoff). Accordingly, the assignment

algorithm first links the pairs of individuals with the smallest distances (provided that

the link is profitable for both individuals, and leads to a higher payoff than the private

investment). The following results show that any allocation constructed in this fashion is

a WBE, and induces a network that exhibits structural homophily.
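The verbal description above suggests a greedy procedure, sketched below. This is not the algorithm of Appendix 2, only a simplified reading of the paragraph under strong assumptions: every link is funded at the full amount ξ, all individuals share one link-value function that is decreasing in distance, and private investment is reduced to a single reservation payoff that a link must beat. The function name, its parameters and the example data are hypothetical.

```python
from itertools import combinations


def greedy_assignment(dist, kappa, link_value, reservation=0.0):
    """Greedy sketch: visit pairs by increasing distance and link when feasible.

    dist        : dict mapping pairs (i, j), i < j, to the distance d_ij.
    kappa       : list of capacities, kappa[i] = maximum number of links of i.
    link_value  : function d -> payoff of a fully funded link at distance d
                  (assumed common to all individuals and decreasing in d).
    reservation : payoff a link must beat, standing in for private investment.
    """
    remaining = list(kappa)
    links = set()
    for (i, j) in sorted(dist, key=dist.get):
        has_capacity = remaining[i] > 0 and remaining[j] > 0
        if has_capacity and link_value(dist[(i, j)]) > reservation:
            links.add((i, j))
            remaining[i] -= 1
            remaining[j] -= 1
    return links


# Four hypothetical individuals located on a line.
position = [0.0, 1.0, 2.5, 6.0]
dist = {(i, j): abs(position[i] - position[j])
        for i, j in combinations(range(len(position)), 2)}
kappa = [1, 2, 1, 1]

links = greedy_assignment(dist, kappa, link_value=lambda d: 10.0 - d)
degree = [sum(i in pair for pair in links) for i in range(len(position))]
```

In this example the last individual stays isolated: by the time the pairs involving that individual are reached, every potential partner has already exhausted its capacity on closer links.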

Let’s first consider the discrete case. Under discreteness, the involvement of an individ-

ual in some link does not affect the amount of resources he invests in his other (existing)

links. The value of a link between two arbitrary individuals is then independent of the

other (potential) links. Consequently, we have the following:

Theorem 3.2 (Discrete Strategy Space) Under discreteness, an allocation is a WBE

iff it is produced by the assignment algorithm.

25The value of a scientific collaboration as defined by Rubi-Barcelo (2012, p.7) is analogous to a distance in my model.


Under convexity, for a given link, it is also rational for both individuals to invest

resources until the link-level constraint ξ is met, provided that it leads to a positive payoff.

We then have the following:

Theorem 3.3 (Existence) Under convexity, any allocation produced by the assignment

algorithm is a WBE.

Proposition 3.4 gives sufficient conditions so that any individual has to invest up to the

link-level constraint, in any WBE.

Proposition 3.4 Suppose that the inequalities in Assumption 2 are strict. Then any WBE

can be produced by the assignment algorithm.

Then, under discreteness or strict convexity, any equilibrium can be constructed through

the assignment algorithm. It is worth noting that under discreteness, xji ∈ {0, ξ} by

assumption, while under strict convexity it must hold only in equilibrium.

The above results show the existence of a WBE, but not of a BE. The intuition is

the following. Suppose that discreteness holds, and that the economy contains only three

individuals: i, j, k (see Figure 3). Suppose also that dij = dik < djk, and that xi = xj =

xk = ξ. Finally, suppose that vi(ξ, ξ, dij) = vj(ξ, ξ, dij) = vk(ξ, ξ, dik) > 0, while any other

link has a negative value. Then, in this example, there is no BE, but there are two WBE.

The reason is that i is indifferent between a link with j or a link with k. So, if i is linked

with j, but receives a proposition from k to form a new link, he will be indifferent between

keeping his link with j and replacing it with a link with k (while k would be strictly better

off with such a deviation).
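The three-individual example can be verified by brute force. The sketch below assumes the discrete case with a capacity of one link per individual, normalizes the payoff of the two profitable links to 1 and of the remaining link to -1, and restricts bilateral deviations to "both drop their current link and link to each other". These numbers and the helper names are illustrative assumptions.

```python
from itertools import combinations

agents = ["i", "j", "k"]
# Payoff to each endpoint of a fully funded link; d_ij = d_ik < d_jk.
link_payoff = {("i", "j"): 1.0, ("i", "k"): 1.0, ("j", "k"): -1.0}


def payoff(agent, network):
    """Payoff of an agent holding at most one link (capacity of one)."""
    return sum(link_payoff[link] for link in network if agent in link)


def deviate(network, a, b):
    """a and b drop their current links and link to each other."""
    kept = {link for link in network if a not in link and b not in link}
    return frozenset(kept | {(a, b)})


def is_nash(network):
    # Any agent can unilaterally withdraw and secure a payoff of zero.
    return all(payoff(a, network) >= 0 for a in agents)


def is_blocked(network, strict):
    """strict=True uses the WBE test, strict=False the BE test."""
    for a, b in combinations(agents, 2):
        if (a, b) in network:
            continue
        new = deviate(network, a, b)
        gain_a = payoff(a, new) - payoff(a, network)
        gain_b = payoff(b, new) - payoff(b, network)
        if strict and gain_a > 0 and gain_b > 0:
            return True
        if not strict and min(gain_a, gain_b) >= 0 and max(gain_a, gain_b) > 0:
            return True
    return False


# With capacity one and three agents, a network has at most one link.
networks = [frozenset()] + [frozenset({link}) for link in combinations(agents, 2)]
wbe = [g for g in networks if is_nash(g) and not is_blocked(g, strict=True)]
be = [g for g in networks if is_nash(g) and not is_blocked(g, strict=False)]
```

The enumeration returns the two single-link networks {i, j} and {i, k} as WBE and no BE, in line with the discussion above.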

In many contexts, however, individuals have many characteristics, and the likelihood

of such a circumstance is small. In the absence of such a circumstance, we can show the

existence of a BE:

Proposition 3.5 Suppose that dij ≠ dkl for any i ≠ j and k ≠ l. Then any WBE produced

by the assignment algorithm is a BE. Moreover, if dij is such that vi(ξ, ξ, dij) ≠ wi(ξ) and

vi(ξ, ξ, dij) ≠ 0, wi(ξ) ≠ 0 for all i, j ∈ N , this equilibrium is unique.


This implies that if, for all i ∈ N, the types θi ∈ Θ are drawn from a distribution with a dense support on Θ, then there exists, almost surely, a unique WBE (if wi(ξ) ≠ 0), which is also a BE.

Figure 3: WBE and BE

(a) The First WBE (b) The Second WBE

Let’s now turn to the characterization of the equilibrium network. Since the level of

investment of an individual in a potential link does not depend on the number of links he

has, the payoffs are only influenced by the distance between the individuals. Suppose i

and j are linked. Then, the creation of a new link between j and k has no spillover effects

on i. This produces important consequences on the shape of the equilibrium network. The

next proposition characterizes the allocations produced by the assignment algorithm.

Proposition 3.6 (Characterization) Let g∗ be the network generated by some allocation

produced by the assignment algorithm, then

(1) For all i ∈ N , δi(g∗) ≤ κi.

(2) The network g∗ exhibits structural homophily with respect to the distance d.

The proof is immediate from the construction through the assignment algorithm. Since

investments are maximal in every link, the number of links an individual can have is

bounded by the resource constraint κi. Also, since the assignment algorithm first creates

links that have the shortest distance before forming links of longer distances, the induced

network exhibits structural homophily. In essence, under discreteness or (strict) convexity,

any equilibrium network can be constructed through the assignment algorithm, hence

satisfying structural homophily.


I now examine efficiency issues. There are many ways to define efficiency. First, one

could consider the Pareto criterion. Given discreteness or convexity, any BE is Pareto

efficient. In fact, there is an even stronger result, which is that any BE is a strong Nash

equilibrium.26

Proposition 3.7 Under discreteness or strict convexity, any BE is a strong Nash equilib-

rium.

Since the utility functions are additive, bilateral stability implies stability in the sense

of a strong Nash equilibrium. However, since the utility functions are non-continuous (and

utilities are not transferable), Pareto efficiency does not imply efficiency in the sense of the

utilitarian criterion. Consider the following social welfare function:

$$W(x) = \sum_{i \in N} u_i(x)$$

In this case, efficiency is not guaranteed. In particular, one can find examples of

economies where the unique BE is efficient (in the sense of the utilitarian and the Pareto

criteria), as well as examples of economies where the unique BE is inefficient (in the sense

of the utilitarian criterion). This inefficiency comes from two principal sources.

First, under the discreteness assumption, any efficient allocation z ∈ X is such that

zji ∈ {0, ξ} for all i, j ∈ N (by assumption). Since an individual values only his own payoff,

whereas the social planner (SP) cares about all individuals, a collaborative link is more

valuable for the SP than it is for individuals (since it enters the utility function of both the

individuals involved in the link). The tradeoff between individual and collaborative links

is thus different for an individual than for the SP.

Second, under the (strict) convexity assumption, another issue arises. Since the SP is

willing to trade off the utilities of the individuals, an efficient allocation z ∈ X need not

be such that zji ∈ {0, ξ}. For example, suppose that there are no fixed costs, then any

network g∗ such that δi(g∗) < κi for some i ∈ N is inefficient. The reason is that if δ∗i < κi

for some i ∈ N , the creation of a link with some agent j (who is willing to invest a small

26 Aumann (1959)


amount ε) leads to vi(ξ, ε, dij) for i. If ε is small enough, the loss for j is compensated by

the discrete jump in the utility of i. Hence, g∗ is inefficient. However, it is possible that

such a network g∗ is induced by a BE.

This concludes the analysis of the theoretical model. In section 4, I develop an estima-

tion technique derived from structural homophily, and present Monte Carlo simulations.

4 The Econometric Model

In this section, I present the econometric model. I use structural homophily to estimate

the weights of the distance function.27 I would like to emphasize that the method and

results of this section are self-contained. If one were willing to assume structural homophily

(instead of viewing it as the equilibrium outcome of the game presented in the last section),

all the results of this section would apply.

In order to present the econometric model, I introduce the following definition:

Definition 5 An observation q is

1) a network g = (Nq, Eq), and

2) for each individual i ∈ Nq, a vector of R individual socioeconomic characteristics, i.e.

{θi}i∈Nq, where θi is a 1×R vector.

For a given observation q ∈ {1, ..., Q}, I write (gq, θq), where θq is nq × R.

From section 3’s results, the structure of the equilibrium depends on the distribution

of types in the population. Specifically, the probability that aij = 1 for some i, j ∈ N ,

does not (in equilibrium) only depend on the preferences and types of i and j, but on the

preferences and types of all individuals in the population. In order to develop a consistent

estimator, one then needs to observe many populations.

Definition 5 implies that the econometrician does not observe the specific level of investment
in a link (i.e. the link-level constraint), nor does he observe the resource constraint

27 Copic et al. (2009) also exploit homophily, although in a very different context, in order to develop their estimation technique.


κi.28 Accordingly, given a set of observations {(gq, θq)}, q = 1, ..., Q, we do not possess enough
information to construct the equilibrium network through the assignment algorithm, even

assuming some structural form for the utility functions. Specifically, a standard economet-

ric model would be the following. Given a parametric form for the payoff functions (i.e.

{vi(x, y, d), wi(x)}i∈Nq), and the distance function (i.e. d(i, j)), one would assume that the

data is generated by:

gq = Λ(θq, κq, ξq, εq; β) (1)

where Λ is the assignment algorithm, κq is the nq × 1 vector of individual resource con-

straints, ξq is the link-level resource constraint, εq is the error term, and β is the vector of

parameters to be estimated. Provided that one observes θq, κq, ξq, one could, in principle,

estimate β. Since κq and ξ are typically unobserved in existing data sets, I use a different

approach.29 From section 3’s results, I have established that any allocation produced by

the assignment algorithm respects structural homophily.30 Thus, my approach is to max-

imize the likelihood that the observed network exhibits structural homophily. Accordingly,

the distance function will play a central role. I assume the following structural form for

the distance function:

$$
\ln(d_{ij}) = \sum_{l=1}^{L} \beta_l \rho_l(\theta_i, \theta_j) + \varepsilon_{ij} \tag{2}
$$

where εij ∼ iid N(0, 1), and ρl(·, ·) is a dimension-wise distance function. For instance, if
Θ ⊆ R², one could choose ρl(θi, θj) = |θi^l − θj^l| for l = 1, 2. The vector (β1, ..., βL) ∈ Ξ ⊂ R^L
contains the weights of the distance function.

Note that the proposed structural form is by no means the only possible one. Any

positive and symmetric function could be used. I prefer to use the specification in (2) to

simplify the exposition. Notice that by introducing the error term εij the probability that

two distances are exactly the same is null. Hence the result of proposition 3.5 applies a.s.

28 Note that while κi is an upper bound to δi(g), they are not necessarily equal. See proposition 3.6. The value of ξ is less of a concern as it can be normalized to 1 by renormalizing the value functions vi and wi for all i. Another smaller issue is that the “individual link” is unobserved in most databases.

29 There are also severe computational and identification issues using the specification in (1).

30 Also, by observing a network that exhibits structural homophily, one can always find some vi(x, y, d), κi and ξ that are produced by the assignment algorithm.


Equation (2) highlights two important features of the model. First, instead of trying

to specifically identify all the parameters of the utility function, I limit myself to the

estimation of the impact of homophilic preferences on the network formation process. That

is, I only seek to estimate the parameters of the distance function, and not the parameters

of the utility functions (for instance, I do not estimate the value of the resource for the

individuals). This is illustrated in Figure 4. In Figure 4a, the individuals place more value

on the characteristic on the horizontal axis. Then, the “closest” individuals to the central

node are the ones on the top and bottom. Symmetrically, in Figure 4b, the individuals

place more value on the characteristic on the vertical axis. Then, the “closest” individuals

to the central node are the ones on the left and right.
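The effect of the weights on who counts as "closest" can be checked with a short Python sketch. It only uses the systematic part of (2) with absolute differences as the dimension-wise distance; the function names, the coordinates and the weight values are hypothetical choices for illustration.

```python
def systematic_log_distance(theta_i, theta_j, beta):
    """Systematic part of (2): sum over l of beta_l * |theta_i^l - theta_j^l|."""
    return sum(b * abs(a - c) for b, a, c in zip(beta, theta_i, theta_j))


def closest(center, others, beta):
    """Name of the individual in `others` with the smallest distance to `center`."""
    return min(
        others,
        key=lambda name: systematic_log_distance(center, others[name], beta),
    )


# Types are (horizontal characteristic, vertical characteristic).
center = (0.0, 0.0)
others = {"right": (1.0, 0.0), "top": (0.0, 1.0)}

# More weight on the horizontal characteristic: the node on top is closest.
closest_a = closest(center, others, (2.0, 1.0))
# More weight on the vertical characteristic: the node on the right is closest.
closest_b = closest(center, others, (1.0, 2.0))
```

Only the relative size of the two weights matters for the ranking: multiplying both weights by the same positive constant leaves `closest_a` and `closest_b` unchanged.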

My aim is to estimate the relative weights placed on various characteristics. Note that

centred ellipses, such as those depicted in Figure 4, are implied by the additive form we

assumed in (2). The generalization to a more general class of distance functions, such as

in Henry and Mourifie (2013), is straightforward. In section 5, for example, I introduce

non-diagonal weights between the age and gender dimensions of the type space.

Figure 4: Changing the Weight of the Distance Function

(a) Relative Importance on the Horizontal Characteristic

(b) Relative Importance on the Vertical Characteristic

Second, I assume that the distance function is observed with noise. That is, there exists

a set of variables, observed by the individuals, but unobserved by an econometrician, that

affects the distance function. This assumption is not standard and is discussed in section


4.1. For now, I present the maximum likelihood estimator.

Given (2), we can compute the probability (conditional on an observation) that a net-

work exhibits structural homophily. Let Ψ = 1− Φ, where Φ is the c.d.f. of the standard

normal distribution, and let γ = (β1/√2, ..., βL/√2). The probability that a network g
(given a set of characteristics θ) exhibits structural homophily is (algebraic manipulations
can be found in Appendix 3):

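$$
P(sh \mid g, \theta, \gamma) = \prod_{ij \notin g} \Big\{ \prod_{k \in g_i} \Psi\big[(s_{ik} - s_{ij})\gamma'\big] + \prod_{k \in g_j} \Psi\big[(s_{jk} - s_{ij})\gamma'\big] - \prod_{k \in g_i} \Psi\big[(s_{ik} - s_{ij})\gamma'\big] \prod_{k \in g_j} \Psi\big[(s_{jk} - s_{ij})\gamma'\big] \Big\} \tag{3}
$$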

where sij is the 1×L vector of dimension-wise distances, i.e. sij^l = ρl(θi, θj). Equation (3)

assumes that there is no isolated individual (i.e. no individual i is such that gi ∈ {∅, {i}}).

This is done without loss of generality since for any pair of individuals in which one of the

individuals is isolated, the condition imposed by structural homophily is trivially respected.

Then, given that there are Q observations, I propose the following maximum likelihood

estimator:

$$
\ell(\beta \mid \theta) = \frac{1}{Q} \sum_{q=1}^{Q} \ln\big[P(sh \mid g_q, \theta_q, \gamma)\big] \tag{4}
$$

Provided that there exists a unique γ0 ∈ Ξ that maximizes (4), the maximum likelihood

estimator is well-behaved, and γ can be consistently estimated.31
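A minimal Python sketch of (3) and (4) follows. It assumes that each network is stored as a dictionary of neighbor sets with no isolated individuals, and that the dimension-wise distances are supplied by a function `s`; these data structures, the function names and the toy example are hypothetical, and the coarse grid search merely stands in for a proper numerical optimizer.

```python
import math


def psi(x):
    """Psi = 1 - Phi, the survival function of the standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def prob_sh(neighbors, s, gamma):
    """Probability in (3) that a network exhibits structural homophily.

    neighbors: dict i -> set of individuals linked to i (no isolated i).
    s: function (i, j) -> tuple of L dimension-wise distances.
    gamma: tuple of L weights (beta_l / sqrt(2)).
    """
    def all_links_closer(h, s_ij):
        # Product over k in g_h of Psi[(s_hk - s_ij) gamma'].
        p = 1.0
        for k in neighbors[h]:
            index = sum(g * (a - b) for g, a, b in zip(gamma, s(h, k), s_ij))
            p *= psi(index)
        return p

    nodes = sorted(neighbors)
    prob = 1.0
    for pos, i in enumerate(nodes):
        for j in nodes[pos + 1:]:
            if j in neighbors[i]:
                continue  # only unlinked pairs enter (3)
            s_ij = s(i, j)
            p_i = all_links_closer(i, s_ij)
            p_j = all_links_closer(j, s_ij)
            prob *= p_i + p_j - p_i * p_j
    return prob


def avg_log_likelihood(observations, gamma):
    """Objective (4): average of ln P(sh | g_q, theta_q, gamma) over q."""
    total = sum(math.log(prob_sh(nb, s, gamma)) for nb, s in observations)
    return total / len(observations)


# Hypothetical one-dimensional example with a single observation (Q = 1).
location = {0: 0.0, 1: 1.0, 2: 3.0, 3: 7.0}
network = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
observation = (network, lambda i, j: (abs(location[i] - location[j]),))
grid = [(-1.0,), (0.0,), (1.0,)]
best_gamma = max(grid, key=lambda g: avg_log_likelihood([observation], g))
```

With γ = 0 every Ψ term equals 0.5, so each of the four unlinked pairs contributes 0.5 + 0.5 − 0.25 = 0.75. The toy network never contradicts structural homophily under a positive weight, so the grid simply favors the largest positive value; an interior maximum requires data in which some weights imply violations.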

The identification strategy is based on a link-deference approach. A link exists if no

individual refuses it. There are two reasons for an individual to refuse a link: (1) because

he has no resources left (constraint interactions), or (2) because the other individual is too

distant (preference interactions). I want to identify the preferences effect, given that the

resource constraint is unobserved, without imposing additional assumptions on vi and wi.

The estimation strategy can be viewed as minimizing the probability that structural

homophily is violated.

31 Although the function in (3) looks peculiar, the maximum likelihood setting is standard and the estimation of (4) requires only the usual set of assumptions. See for example Cameron and Trivedi (2005, p. 142-143) for the asymptotic properties of the maximum likelihood estimator.


Consider two alternative parameters β and β′. Suppose that we observe that two

individuals, i and j, are not linked together, as in Figure 5. According to β and β′, i

is linked to an individual farther from him than j. This means that i would have been

willing to create a link with j, but that j refused. This implies that j cannot be linked

to individuals farther from him than i; if j is linked with farther individuals, structural

homophily is violated. Thus, if j is linked to farther individuals than i under β, but not

under β′, then β′ is chosen over β to represent individuals’ preferences.
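The comparison between β and β′ can be mimicked with a deterministic Python sketch that ignores the error term. A pair (i, j) is flagged when the two are unlinked and both keep a link to someone farther away than each other. The coordinates, links and weight vectors below are hypothetical, and a weighted sum of absolute differences stands in for the distance function.

```python
def homophily_violations(neighbors, dist):
    """Unlinked pairs (i, j) where both i and j are linked to someone farther."""
    nodes = sorted(neighbors)
    violations = []
    for pos, i in enumerate(nodes):
        for j in nodes[pos + 1:]:
            if j in neighbors[i]:
                continue
            d_ij = dist(i, j)
            # i would have accepted j if i keeps a link longer than d_ij.
            i_would_accept = any(dist(i, k) > d_ij for k in neighbors[i])
            j_would_accept = any(dist(j, k) > d_ij for k in neighbors[j])
            if i_would_accept and j_would_accept:
                violations.append((i, j))
    return violations


def weighted_distance(points, beta):
    """Distance function using weights beta on each dimension of the type space."""
    def dist(i, j):
        return sum(b * abs(a - c) for b, a, c in zip(beta, points[i], points[j]))
    return dist


points = {"i": (0.0, 0.0), "j": (2.0, 2.0), "k": (3.0, 3.0), "m": (2.0, 5.0)}
links = {"i": {"k"}, "j": {"m"}, "k": {"i"}, "m": {"j"}}

under_beta = homophily_violations(links, weighted_distance(points, (0.2, 1.0)))
under_beta_prime = homophily_violations(links, weighted_distance(points, (1.0, 0.2)))
```

Under the first weight vector, j's friend m is farther from j than i is, so the unlinked pair (i, j) contradicts structural homophily; under the second, m is closer to j than i is and no pair is flagged. An individual with an empty neighbor set can never be flagged, since `any` over an empty set is false.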

Figure 5: Admissible Parameters, Θ = R²

(a) Distance Weights according to β

(b) Distance Weights according to β′

This shows why isolated individuals (i.e. individuals that have no link) provide no infor-

mation: whatever the parameters’ values, they never contradict structural homophily. In

other words, for isolated individuals, we cannot identify whether they are isolated because

they have limited resources, or because they have strong homophilic preferences. From a

revealed preference approach, we gain information about an individual’s preferences by ob-

serving his choices. If an individual is not connected, he does not “consume” any resources.

We therefore cannot say anything about his preferences.


4.1 Distances and Errors

In this section, I discuss the strengths and limitations of focussing on homophily for the

estimation, and I discuss the interpretation of εij. The main advantage of using the distance

function is that it can account for a high degree of unobserved heterogeneity. The reason

is that any unobserved characteristics that affect the value of a link without affecting the

distance function have no impact on the estimation. It helps to compare with a simple

probit estimator for the probability that two individuals (i and j) create a link:

$$
a^*_{ij} = v_i(x_i^j, x_j^i, d_{ij}) + \varepsilon_{ij} \tag{5}
$$

where a∗ij is the latent variable for aij ∈ {0, 1}, and εij is an error affecting the value

of the link. The probit estimator requires two additional assumptions. First, since the

resource constraint creates dependence, the consistency of the probit estimator requires

κi ≥ n for all i ∈ N (i.e. time constraints are non-binding). Second, the identification of

(5) requires that one specifies a precise structural form for vi. This functional form can

only depend on i’s (observed) characteristics, or on a group fixed effect. Then, omitting a

relevant characteristic that affects the value of vi (and that is not captured by the group

fixed effect) will bias the estimation. By contrast, focussing on the distance function by

using the estimator in (4) allows one to account for any individual resource constraints

(correlated or not with the observables), and for any function vi. Also note that, contrary

to a link-based estimation (as with the probit estimator), the estimation is robust to the

dependence implied by the resource constraints on the equilibrium network.

Examples of such interdependencies between links can be found for example in Chris-

takis et al. (2010) and Mele (2013). Both papers find strong negative effects of creating

a link with an individual who has many friends. Here, this effect is also present, and

explicitly built into the model by the introduction of the resource constraint. When an

individual has many friends, he has fewer resources to spend on a potential new friend.

The obvious limitation of the estimator in (4) is that it does not allow for the iden-

tification of all the preference parameters of the model, but only those of the distance


function. This is not overly problematic, keeping in mind that the resource constraint

is unobserved in most data sets, and that the estimation procedure does not require any

specific assumptions on vi, wi and κi. However, provided that one observes κi for all i ∈ N

and is willing to assume a functional form for vi and wi, one could, in principle, use the

econometric model in (1) in order to estimate the full model.

I now turn to the interpretation of the unobserved component of the distance function,

i.e. εij. The error term potentially contains three sources of unobserved heterogeneity.

The first interpretation is straightforward; εij can be interpreted as a measurement error

on the distance function. We therefore assume that we observe all the relevant individual

characteristics, but less precisely than the individuals.

The second is to assume that the value of a link is affected by a random shock, and

is in line with the literature on Additive Random Utility Models (ARUM, see McFadden,

1981), i.e. vi(x, y, d) + εij. An ARUM is observationally equivalent to a model where the

distance is observed with noise. The reason is that one can always define a symmetric
function d̃ : Θ² → R such that vi(ξ, ξ, d̃ij) = vi(ξ, ξ, dij) + εij for all i ≠ j. For example, for
the functional form assumed in (2), if vi is log-quasilinear in the distance, i.e. v(x, y, d) =
f(x, y) − ln(d), the two models are equivalent.

The third interpretation is to assume that there exist unobserved individual charac-

teristics that affect the distance function, i.e. there exists a dimension of the type space

that is unobserved. This is arguably the interpretation that is least compatible with the

assumption that the εij are iid. The independence of the errors is required to write the

closed-form expression of the likelihood function (see Appendix 3). To address the poten-

tial problem of misspecification, any reported standard errors in section 5 are the robust

“sandwich” standard errors for extremum estimators. The likelihood function in (4) can

then be interpreted as a pseudo-maximum likelihood estimator. The estimated parameters

then converge to their pseudo-true value.32

I also study the omission of a relevant dimension of the type space using Monte Carlo

simulations. Specifically, I generate a population of equilibrium networks using the assign-

32 See Gourieroux et al. (1984) or, for a general description of the properties of quasi-maximum likelihood estimators, see Cameron and Trivedi (2005), section 5.7.


ment algorithm for an economy in which individuals have a four-dimensional type space.

I then estimate the model using only three of these dimensions. I vary the correlation

between the individual characteristics and the omitted characteristic (see Appendix 4 for

a precise description of the Monte Carlo exercise).

The exercise departs from the theoretical model in two fundamental aspects. First, the

structure of the error term is no longer normal (or identically distributed). As there is

an unobserved dimension, the error is comprised of two elements: the iid error term, and

the unobserved dimension-wise distance, i.e. βlρl(θi, θj) + εij, where l is the unobserved

dimension. The second departure from the theoretical model is that the unobserved char-

acteristic generates dependence in the errors. Even with a correlation coefficient equal to

0 between the unobserved characteristic and the observed characteristics, the errors are no

longer independent, as ρl(θi, θj) is correlated with ρl(θi, θk), for any i, j, k ∈ N . Figures

6-8 in Appendix 4 display the simulation results. When the correlation coefficient is equal

to 0, the estimator is consistent. To ensure that the bias varies continuously with the

dependence on the omitted variable, I vary the correlation coefficient. The estimator is

robust to a small amount of dependence between the unobserved characteristic and the

observed characteristics. Looking at Figures 6-8 (Appendix 4), the true value falls within

the 95% confidence interval when the correlation coefficient is less than or equal to 20% for

the first characteristic (the one correlated with the unobserved one) and 30% for the other

characteristics. Obviously, as in most estimation techniques, omitting a strongly correlated

variable is likely to bias the estimation.

5 Empirical Application: High-School Friendship Networks

In this section, I estimate the weights of the distance function that leads to the formation of

the friendship networks of American teenagers. I find strong evidence of racial segregation,

especially for Hispanics and Blacks. I also find a significant effect of age and gender

differences. Disparities in economic status have a statistically significant, but quantitatively


small effect.

I use the Add Health database as it is particularly well suited for my model. Recall

that the model presented in sections 2 and 3 assumes that the individuals of the same pop-

ulation meet with certainty. A convincing empirical implementation thus requires that the

observed populations are small enough. In this regard, the Add Health database provides

information on students’ high schools, which are relatively small entities.33 Specifically, I

use the “In-School” sample, which provides information for every student in grades 7 to

12 at the schools in the sample.34 The sample includes the race (White, Hispanic, Black,

Asian or Native), age, gender, parents’ labor market status (work), and the friendship

networks of 37 798 teenagers, attending 122 high schools in the U.S. Every observed char-

acteristic is assumed to be a dimension of the type space. The type space thus has eight

dimensions: age, gender, White, Hispanic, Black, Asian, Native and work. For instance, a

16-year-old male teenager who identifies as Black-Asian, and has working parents would

be of type θ = (16, 0, 0, 0, 1, 1, 0, 1). Note that the “gender” dimension takes a value of 1

if the individual is a female, and 0 if the individual is a male, while the “work” dimension

takes a value of 1 if both parents work, and 0 otherwise.

I assume the following distance function:

$$
\ln d(x_i, x_j) = \beta_1 |x_i^1 - x_j^1| + \sum_{r=2}^{8} \beta_r I\{x_i^r \neq x_j^r\} + \beta_9 |x_i^1 - x_j^1| \cdot I\{x_i^2 \neq x_j^2\} + \varepsilon_{ij} \tag{6}
$$

The parameter β9 reflects the fact that the effect of the age difference may differ

depending on whether a friendship is between individuals of the same or different genders.
Given the distance function in (6), the distance between two teenagers, i and j,

who are identical except for race, where teenager i is White, and teenager j is Black, is

d(xi, xj) = βwhite + βblack.

33 For this reason, and for computational reasons, I limit myself to schools for which I observe fewer than 300 students after removing isolated individuals, which is about 68% of the schools in the database. (I removed the isolated individuals, as they provide no relevant information; see the last paragraph before section 4.1 for further discussion.) The results are robust to the use of alternative thresholds.

34 For a more detailed description of the database, see http://www.cpc.unc.edu/projects/addhealth/design/wave1. The database also contains an “In-Home” sample, which contains more variables, but only for a sub-sample of students within a school.


The Add Health questionnaire asks each teenager to identify their best friends (up to

10, and a maximum of five males and five females). I assume that two individuals can be

friends only if they attend the same school. This approach is standard in the literature

using Add Health data. This allows each school (the set of teenagers and the network)

to be treated as an observation. To be coherent with the model presented in section 2, I

assume that there exists a link between two students iff they both identify the other as a

friend.

Table 1 summarizes the data. As noted in the previous section, the estimator does not

use any information on isolated individuals (individuals with no links), so these individuals

can be dropped from the sample. Table 2 shows descriptive statistics for the sample after

removing isolated individuals. Note that using the sub-sample of individuals with at least

one link does not affect the estimation. Using the whole sample would lead to exactly the

same likelihood value.

I estimate the model (4), using the distance function in (6). The estimated weights

(β1, ..., β9) and the corresponding standard errors are shown in Table 3. Since the weights

are only scale-identified, I report the relative effects, normalizing the total effect to one.

Then, the contribution of each characteristic (or dimension) can be interpreted as a per-

centage.

The general results are indicative of strong racial segregation in the choice of friend-

ships. The Hispanic dimension has the greatest importance (36%), followed by the Black

dimension (24%). The White (9%) and Asian (9%) dimensions have comparatively low

weights.

The effect of non-racial dimensions is small in comparison. The effect of gender dif-

ference (19%) is larger than the age difference (7% for one year). Remarkably, the model

captures heterogeneity in the effect of age: for same-sex friendships, the importance of

age is stronger. This could reflect the possibility that teenage girls may be more mature

compared to males of the same age, and hence be interested in forming friendships with

older boys.

The “work” dimension has a relatively small impact on friendship formation (1.5%).


Table 1: Descriptive Statistics

                        Mean     S.E.   Min   Max
Individuals
  Age                 14.922    1.708    10    19
  Gender               0.484    0.500     0     1
  Work                 0.777    0.417     0     1
  White                0.690    0.462     0     1
  Hispanic             0.147    0.354     0     1
  Black                0.131    0.337     0     1
  Asian                0.072    0.258     0     1
  Native               0.050    0.217     0     1
  Degree               1.189    1.422     0     9
Pairs (Constructed)
  Linked               0.002    0.046     0     1
  ∆(Age)               1.352    1.108     0     9
  ∆(Gender)            0.498    0.500     0     1
  ∆(Work)              0.328    0.469     0     1
  ∆(White)             0.293    0.455     0     1
  ∆(Hispanic)          0.195    0.396     0     1
  ∆(Black)             0.152    0.359     0     1
  ∆(Asian)             0.117    0.321     0     1
  ∆(Native)            0.086    0.280     0     1

Number of Schools: 122
Number of Individuals: 37 798
Number of Pairs: 10 396 915

In other words, the fact that a teenager comes from a family where both parents work, or
not, has little impact on his choice of friends. These results have important implications for

policy making, as discussed in section 5.1.

Table 3 reports the pseudo-R2 introduced by McFadden (1974), and the likelihood

ratio statistic when the model is compared to a model with no explanatory variables. The

pseudo-R2 is a percentage and indicates that the contribution of the selected variables to

structural homophily is strong.

Interestingly, the results in Table 3 are extremely close to the findings of Badev (2013).

Although he examines a very different context,35 Badev also finds that Hispanics are more

segregated than Blacks and Whites, and that racial characteristics play a significant role

35 Badev (2013) is mostly interested in the impact of peer effects on smoking behavior, but also controls for the endogeneity of the network structure.


Table 2: Descriptive Statistics without Isolated Individuals

                        Mean     S.E.   Min   Max
Pairs (Constructed)
  Linked               0.011    0.104     0     1
  ∆(Age)               1.332    1.162     0     8
  ∆(Gender)            0.489    0.500     0     1
  ∆(Work)              0.333    0.471     0     1
  ∆(White)             0.236    0.425     0     1
  ∆(Hispanic)          0.704    0.457     0     1
  ∆(Black)             0.094    0.292     0     1
  ∆(Asian)             0.098    0.298     0     1
  ∆(Native)            0.083    0.276     0     1

Number of Schools: 122
Number of Pairs: 1 237 492

in friend selection.36

I now discuss the interpretation of the results in more depth. The estimates in Table

3 represent the weights of the distance function. Consider Hispanics. Their estimated

weight is the largest among the racial groups. This means that they are more distant
than the other racial groups in the type space; the relative cost for them to create a
link with a different racial group (or for other racial groups to create links with them) is
greater than for other races. However, this does not imply that Hispanics will necessarily
create fewer friendship relations with other racial groups. It also depends on their resource
constraints and on the value and shape of vi. For example, some Blacks may have fewer
links than Hispanics, even if the weight for the Black dimension is lower than the one for

the Hispanic dimension. Since κi and vi are unobserved, this could either be because they

may have a larger resource constraint, or because they may receive greater payoff from

their friendships. This shows that the estimates of Table 3 are really capturing segregation

and not, for instance, the fact that the time constraint or the valuation of friendships may

be correlated with racial groups. The estimator developed in this paper then explicitly

captures the effect of homophily while controlling for individuals’ unobserved heterogeneity

on vi, wi and κi.

Another important characteristic of the model implies that one needs to be careful when

36 Badev (2013) uses the categories Hispanics-Asians-Others, Whites and Blacks.


comparing categorical and non-categorical variables. Recall that the distance between two

individuals i and j, who are identical except for their racial groups (where i is Black and j

is White), is βBlack +βWhite = 0.330. In comparison, if i and j only differ on the fact that i

is one year older than j, the distance between them is βAge = 0.074. This implies that the

importance of racial characteristics, compared to non-racial ones, is actually larger than

what it appears to be at first glance.
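The arithmetic behind this comparison can be reproduced with a short Python sketch of the systematic part of (6), using the relative weights reported in Table 3. The dictionary layout, the function name and the ordering of the type vector (age, gender, White, Hispanic, Black, Asian, Native, work) are assumptions made for illustration, and the error term εij is ignored.

```python
# Relative weights from Table 3 (sum normalized to 1).
WEIGHTS = {
    "age": 0.074, "gender": 0.189, "work": 0.015, "white": 0.092,
    "hispanic": 0.360, "black": 0.238, "asian": 0.094, "native": 0.004,
    "age_gender": -0.065,
}
# Assumed ordering of the eight dimensions of the type space.
DIMS = ("age", "gender", "white", "hispanic", "black", "asian", "native", "work")


def systematic_distance(x_i, x_j, w=WEIGHTS):
    """Systematic part of (6) for two type vectors ordered as in DIMS."""
    age_gap = abs(x_i[0] - x_j[0])
    total = w["age"] * age_gap
    # Indicator terms for the seven categorical dimensions (r = 2, ..., 8).
    for r in range(1, 8):
        if x_i[r] != x_j[r]:
            total += w[DIMS[r]]
    # Interaction between the age gap and a gender difference.
    if x_i[1] != x_j[1]:
        total += w["age_gender"] * age_gap
    return total


white_16 = (16, 0, 1, 0, 0, 0, 0, 1)
black_16 = (16, 0, 0, 0, 1, 0, 0, 1)
white_17 = (17, 0, 1, 0, 0, 0, 0, 1)

race_gap = systematic_distance(white_16, black_16)  # White + Black weights
age_gap_only = systematic_distance(white_16, white_17)  # one year of age
```

Here `race_gap` is approximately 0.330 and `age_gap_only` is approximately 0.074, matching the figures in the text: a difference in race switches on two indicators at once, whereas one year of age contributes a single weight.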

Table 3: Relative Estimated Weights (Sum normalized to 1)†

Dimensions          Estimates    S.E.
Age                   0.074∗∗   (0.005)
Gender                0.189∗∗   (0.035)
Work                  0.015∗∗   (0.007)
White                 0.092∗∗   (0.009)
Hispanic              0.360∗∗   (0.023)
Black                 0.238∗∗   (0.030)
Asian                 0.094∗∗   (0.020)
Native                0.004     (0.012)
Age-Gender           -0.065∗∗   (0.006)

Log-Likelihood: -3128.033
Pseudo-R2: 0.610
Likelihood Ratio: 9797.450

† S.E. computed using the delta method.
†† ∗∗ for 1% significance level.
††† Robust S.E. using the (sandwich) variance-covariance matrix for pseudo-m.l.e.

5.1 Robustness and Policy Implications

In this section, I discuss the robustness of the estimation and its implications for policy

making. I first compare the results of Table 3 (the baseline model) with three alterna-

tive specifications of the distance function.37 The first alternative model constrains the

estimation to racial characteristics. Results are reported in Table 4. To facilitate the

comparison with the baseline model, Table 4 also reports the estimated coefficients for the

racial characteristics in the baseline model, renormalizing the sum to 1.

37 This section’s tables are reported in Appendix 5.


Estimates of the constrained model are quite similar to those of the baseline model.

This implies that the relative difference in the magnitude of segregation between racial

groups seems to be unaffected by other non-racial characteristics. Table 4 also reports

the likelihood ratio between the two models. Unsurprisingly, I strongly reject the null

hypothesis that the two models are equivalent.

Table 5 presents the second alternative model. Recall that in the baseline model, the

variable “work” is discrete and takes a value of 1 if both of the student’s parents work,
and 0 otherwise. In the alternative model presented in Table 5, I use an alternative
definition where “work” takes a value of 1 if at least one of the student’s parents works, and

0 otherwise. Results are extremely similar to those of the baseline model. However, the

estimated coefficient for the “work” variable is no longer significant. As the two models

are very similar, the likelihood ratio test cannot reject the null hypothesis. This confirms

that socio-economic status of high school students has little influence on the choice of their

friends.

Table 6 displays results for the third alternative model. For this specification, I assumed
that the error term εij follows a Gumbel distribution. This choice is convenient as
the difference of two independent Gumbel variables has a closed-form distribution (i.e. a logistic
distribution).38 Table 6 shows that results are robust to this change in the distribution of

the errors, as the estimations are extremely similar.
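The closed form mentioned above can be checked numerically. The Python sketch below assumes standard Gumbel errors (location 0, scale 1), for which the difference of two independent draws follows a standard logistic distribution; the function names and the simulation settings are hypothetical. Under this assumption, the logistic survival function would take the place of Ψ in (3), without the √2 rescaling used in the normal case.

```python
import math
import random


def psi_logistic(x):
    """Survival function 1 - F(x) of the standard logistic distribution."""
    if x > 0:
        e = math.exp(-x)  # avoids overflow for large positive x
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def simulated_survival(x, n=20000, seed=0):
    """Share of simulated Gumbel differences exceeding x (Monte Carlo check)."""
    rng = random.Random(seed)

    def gumbel():
        u = rng.random()
        while u == 0.0:  # guard against log(0)
            u = rng.random()
        # Inverse of the standard Gumbel c.d.f. exp(-exp(-x)).
        return -math.log(-math.log(u))

    return sum(gumbel() - gumbel() > x for _ in range(n)) / n
```

For a given threshold, `simulated_survival(x)` should be close to `psi_logistic(x)`, up to Monte Carlo error.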

In the next subsection, I compare the estimator based on structural homophily with

more standard estimators.

5.1.1 Gains with Respect to Simpler Estimators

In order to better understand the intuition of the estimator presented in the previous

sections, and to show the additional information it provides, I compare my results with

two simpler estimators, namely the probit and cell estimators.

Results for the comparison of the baseline model with a simple probit (with a school

fixed-effect and a school-clustered variance-covariance matrix) are reported in Table 7.

38 As displayed in Appendix 3, the likelihood distribution is given by the difference of the distribution of the error term.


Estimated coefficients are substantially different, especially for Hispanics, and for the age

and gender dimensions.39

These results can be intuitively explained as follows. Let’s compare friendship choices

of Blacks and Hispanics. Looking at the data, one finds that 57% of the Blacks’ links are

with teenagers of the same age. In comparison, 46% of Hispanics’ links are with teenagers

of the same age.40 This gives an indication that, in the absence of teenagers of the same

age and the same gender, Hispanics form more links with younger (or older) teenagers

than with teenagers of the same age, but from a different racial group. This difference is

exacerbated by the fact that Hispanics form, on average, more links than Blacks. This

could be due to a difference in their resource constraints, or the benefit that they derive

from friendships.

Table 7 also reports the likelihood ratio test, which strongly rejects the null hypothesis

that the two models are equivalent.41 Not surprisingly, the probit estimator achieves a

much lower likelihood since it uses a lot less information (i.e. it does not account for the

dependence between links). Recall also, as discussed in section 4.1, that the probit assumes

more homogeneity between the individuals’ preferences (i.e. vi and wi).

The same findings hold when we compare the estimator based on structural homophily

with a simple cell estimator (that is, the ratio of same-characteristic friendship links to the

number of total friendship nominations).42 To simplify the exposition, I only consider the

model with racial characteristics. Results are reported in Table 8 for the cell estimator.

Whites and Blacks seem to be the most biased toward their own racial group, with 83.9%

and 63.3% of their links, respectively, within their own racial group. However, the cell

estimator strongly depends on the school composition, i.e. whether or not Whites (for

39Badev (2013) and Mele (2013) also find stronger racial segregation for Hispanics than for Blacks, whichcontradicts the results of the probit estimator.

40Specifically, I compare the number of “Black-Black” links of the same age (701) with the total numberof “Black-Black” links (1,223) and the number of “Hispanic-Hispanic” links of the same age (431) withthe total number of “Hispanic-Hispanic” links (936). Results hold if we look at “Black-Non Black” and“Hispanic-Non Hispanic” links.

41 One should be careful when interpreting the results of this likelihood ratio test, as the two models are not only non-nested but also use fundamentally different estimators. The probit model maximizes the likelihood of the random variable $g_{ij} \in \{0, 1\}$, while the estimator in (4) maximizes the likelihood of structural homophily, which is a function of $g_{ij}$.

42I thank an anonymous referee for this suggestion.


example) have access to potential non-White friends.

In contrast, the estimator presented in the previous sections uses information from

revealed preferences. To illustrate this, consider an extreme case where schools are perfectly

segregated for a particular racial group (e.g. everybody in a school is White, or nobody

is). In this extreme case, βWhite is not identified for the structural homophily estimator

since Whites never refuse non-White individuals; the cell estimator for Whites would be

equal to 1. Thus, one interpretation of the difference between the results of Table 8 and

the results of Table 4 is that Whites are more school-segregated than, for example, Blacks

or Hispanics.43
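A minimal sketch of the cell estimator described above, assuming that nominations are stored as (sender, receiver) pairs and that each student carries a single group label. The function and variable names are illustrative and are not taken from the paper. The second call reproduces the extreme case of a perfectly segregated school, where the estimator equals 1 whatever the underlying preferences.

```python
def cell_estimator(nominations, group):
    """Share of each group's friendship nominations that stay within the group."""
    same, total = {}, {}
    for sender, receiver in nominations:
        label = group[sender]
        total[label] = total.get(label, 0) + 1
        same[label] = same.get(label, 0) + int(group[receiver] == label)
    return {label: same[label] / total[label] for label in total}

# Hypothetical mixed school: half of each group's nominations are within-group.
mixed_school = cell_estimator(
    [("a", "b"), ("a", "c"), ("c", "d"), ("d", "a")],
    {"a": "White", "b": "White", "c": "Black", "d": "Black"},
)

# Hypothetical all-White school: no cross-group link is even feasible.
segregated_school = cell_estimator(
    [("a", "b"), ("b", "a")],
    {"a": "White", "b": "White"},
)
```

The segregated case shows why the cell estimator mixes preferences with school composition: it reports full homophily even though no cross-group friendship was ever refused.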

5.1.2 Policy Implications

I now discuss the implications of my findings for policy making. The obvious implication

of this section’s estimations is that they allow policy makers to obtain better estimates of

segregation. As Table 7 shows, a simple probit estimation would, in this case, drastically

underestimate the degree of segregation for Hispanics. Knowing which racial groups are the

most segregated is then highly important in order to target policies toward those groups.

It has been demonstrated that the socio-economic status of an individual is correlated

with academic achievement (e.g. the probability of completing high school and probability

of going to college). It has also been documented that the presence of “good” peers (with

educated/high-income parents) has a positive influence on academic achievement. Hence,

peers can act as a substitute for a teenager’s private endowment.44 However, a key problem

is that in most cases, the policy maker cannot directly “choose” an individual’s peers.

The results of Table 3 tell us that friendships are weakly dependent on socio-economic

status. This is a strong argument in favour of having teenagers of different socio-economic

backgrounds attending the same school. Since socio-economic factors are mostly irrelevant

in the choice of friendships, it is likely that friendships between “poor” and “rich” students

will form.

43 Indeed, looking at the data, about half of the White students in the database study in schools with a ratio of 80% of Whites in the school or more.

44See for example De Giorgi et al. (2010) and Bifulco et al. (2011).


The implications are opposite for race. One of the profound concerns of policy makers

is the strong racial segregation within schools.45 Since the results from Table 3 represent

preference parameters, they indicate that simply putting students from different races in

the same school will not have the anticipated peer effects. This is because the equilibrium

friendship network is likely to be highly racially segregated.

I now conclude by discussing the limitations of my approach and suggesting some

potential extensions.

6 Going Further

I have shown that structural homophily can be obtained by a game of network formation.

Under discreteness or (strict) convexity, any bilateral equilibrium of the game features

structural homophily. I have also shown that structural homophily has empirical impli-

cations. I have developed an estimation technique that can be used to estimate some

parameters of the model, namely the weights of the distance function. This method can

then identify which social characteristics significantly influence the network formation pro-

cess. Being able to estimate the magnitude of these relevant characteristics is an important

step in the process of designing efficient policies, as it allows policy makers to target rel-

evant characteristics. To illustrate this method, I estimated the weights of the distance

function in the context of friendship networks for teenagers. I find a strong effect of racial

variables as compared to other individual variables such as age, gender and socio-economic

status.

The model developed in this paper is a first step toward a better understanding of

network formation processes under time constraints. However, there are still many unan-

swered questions. For instance, the results in section 3 are based on the discreteness or

convexity assumption. Those are arguably strong assumptions as they imply that individ-

uals invest as much as they can in their existing links. This may not be true in general.

45 See for example the report “Segregation and Exposure to High-Poverty Schools in Large Metropolitan Areas: 2008-09”, by Nancy McArdle, Theresa Osypuk, and Dolores Acevedo-García, available online at http://diversitydata.sph.harvard.edu/Publications/school_segregation_report.pdf


However, the study of the model under a concavity assumption faces difficult existence

issues. One could address this issue by imposing additional assumptions on the shape of $v$ (such as $v_{xy}(x, y, d) > 0$) or by considering weaker solution concepts such as pairwise stability (Jackson and Wolinsky, 1996), which potentially exhibit less structured equilibrium networks.

Another potential extension would be to model the probability that individuals meet.

Without this, the set of potential friends for every individual is the whole population.

However, in a large population, some individuals may never meet, which would obviously

prevent them from creating a link. A simple way to address this would be to

assume that the set of potential friends is limited to individuals that have “met.” Hence,

individuals can only invest resources in links with individuals in a subset of the population.

In this case, the (ex-post) strategy space would not be the same for every individual, but

structural homophily would still hold in equilibrium (but the estimation would require the

observation of the population subsets). More elaborate models could assume that meeting

friends is a costly process. Individuals would then be allowed to endogenously choose

the amount of resources they spend searching for friends.46 As the estimation technique

does not require the observation of the time constraints, structural homophily is likely to

hold in equilibrium. However, in both extensions, the estimated parameters may not be

interpreted in terms of preferences. If homophily affects the preferences and the random

meeting process, it is unclear how those two effects can be identified.

Department of Economics, Universite Laval, Canada. [email protected]

46A nice example of a search model with homophilic preferences is Currarini et al. (2009).


Appendix

Appendix 1

Proof of lemma 3.1

Let $x^*$ be some NE, and suppose that $(i, j)$ is a deviating pair in the sense of a WBE. Let $(\tilde{x}_i, \tilde{x}_j)$ be some joint deviation for $(i, j)$. We need to show that $\tilde{x}_i^j > x_i^{j*}$ and $\tilde{x}_j^i > x_j^{i*}$.

Since $(\tilde{x}_i, \tilde{x}_j)$ is a profitable deviation (in the sense of a WBE), we have

\[
\begin{aligned}
u_i(\tilde{x}_i, \tilde{x}_j, x^*_{-i-j}) &> u_i(x^*) \\
u_j(\tilde{x}_i, \tilde{x}_j, x^*_{-i-j}) &> u_j(x^*)
\end{aligned}
\tag{7}
\]

Since $x^*$ is a NE, we have

\[
\begin{aligned}
u_i(x_i, x^*_{-i}) &\leq u_i(x^*) \\
u_j(x_j, x^*_{-j}) &\leq u_j(x^*)
\end{aligned}
\tag{8}
\]

for all $x_i$ and $x_j$. In particular, condition (8) holds for $x_i = \tilde{x}_i$ and $x_j = \tilde{x}_j$.

Putting conditions (7) and (8) together, we have $u_i(\tilde{x}_i, \tilde{x}_j, x^*_{-i-j}) > u_i(\tilde{x}_i, x^*_{-i})$ and $u_j(\tilde{x}_i, \tilde{x}_j, x^*_{-i-j}) > u_j(\tilde{x}_j, x^*_{-j})$. Since the utility function is linear in the links, this is equivalent to $v_i(\tilde{x}_i^j, \tilde{x}_j^i, d_{ij}) > v_i(\tilde{x}_i^j, x_j^{i*}, d_{ij})$ and $v_j(\tilde{x}_j^i, \tilde{x}_i^j, d_{ij}) > v_j(\tilde{x}_j^i, x_i^{j*}, d_{ij})$. The production functions are strictly increasing in the second argument, so we must have $\tilde{x}_i^j > x_i^{j*}$ and $\tilde{x}_j^i > x_j^{i*}$. (If $x_i^{j*} = x_j^{i*} = 0$, we have $v_i(\tilde{x}_i^j, \tilde{x}_j^i, d_{ij}) > 0$ and $v_j(\tilde{x}_j^i, \tilde{x}_i^j, d_{ij}) > 0$, and the result is straightforward.) □

Proof of theorem 3.2

First, we show that $x$ produced by the assignment algorithm (see Appendix 2) is a NE. By construction, we have $v_i(\xi, \xi, d_{ij}) \geq 0$ and $w_i(\xi) \geq 0$, hence removing a link is never profitable. Now, the only link that an individual can unilaterally create is the individual link. Suppose that it is profitable to do so for $i \in N$. Then either [$\delta_i < \kappa_i$ and $w_i(\xi) > 0$], or [$\delta_i = \kappa_i$ and $w_i(\xi) > \min_{j \in g_i} v_i(\xi, \xi, d_{ij})$]. By construction, both are impossible.

Now, suppose that $x$ is a NE, but not a WBE. That is, there exist $i, j \in N$ such that $j \notin g_i$ (from lemma 3.1, since $x_i^j \in \{0, \xi\}$) who want to deviate, i.e. create a link between them. There are 2 cases:

1. $\delta_i = \kappa_i$. Then, $i$ needs to remove a link in order to create a new link. (Since $x$ is a NE, he won’t remove more than one link.) This implies that there exists $k \in g_i$ such that $v_i(\xi, \xi, d_{ij}) > v_i(\xi, \xi, d_{ik}) \geq 0$, and therefore that $d_{ij} < d_{ik}$.

We now turn to $j$. If $\delta_j = \kappa_j$, the same argument applies for $j$: $v_j(\xi, \xi, d_{ij}) > v_j(\xi, \xi, d_{jl})$ for some $l \in g_j$ (and $v_i(\xi, \xi, d_{ij}) > v_i(\xi, \xi, d_{ik})$). Since we have $d_{ij} < d_{ik}$ and $d_{ij} < d_{jl}$, this contradicts the fact that $x$ was created by the assignment algorithm.

If $\delta_j < \kappa_j$, $j$ has at least $\xi$ to invest. Together with the fact that $d_{ij} < d_{ik}$, this contradicts the fact that $x$ is produced by the assignment algorithm.

2. $\delta_i < \kappa_i$ and $\delta_j < \kappa_j$. This is impossible since, from the assignment algorithm, it implies that $v_i(\xi, \xi, d_{ij}) < 0$ or $v_j(\xi, \xi, d_{ij}) < 0$.

□

Proof of theorem 3.3

We need to show that the allocation $x \in X$, which is produced by the assignment algorithm (see Appendix 2), is a WBE of $\Gamma$.

We first show that $x$ is a NE. Suppose that it is not; that is, there exists $i \in N$ such that $x_i$ is not individually rational. Since $x_i^j \in \{0, \xi\}$ for any $i, j \in N$, this means that $i$ wants to create an additional link. (Unilaterally reducing the investment in a link necessarily lowers $i$’s payoff.) The only link that $i$ can create on his own is the individual link. There are two cases:

1. $x_i^i = 0$ and $\delta_i < \kappa_i$. Then, by construction from the assignment algorithm, this implies that $w_i(\xi) < 0$. So $i$ has no individual profitable deviation, since $w_i(x_i^j) < w_i(\xi)$.

2. $x_i^i = 0$ and $\delta_i = \kappa_i$. Then, if $i$ has a profitable deviation, there exists $J \subseteq g_i$ such that $w_i(\sum_{j \in J} \varepsilon_j) > \sum_{j \in J} \{v_i(\xi, \xi, d_{ij}) - v_i(\xi - \varepsilon_j, \xi, d_{ij})\}$. That is, $i$ is reducing his investments in links in $J$ in order to invest in his individual link. Let $d^* = \max_{j \in J} d_{ij}$; we have

\[
\begin{aligned}
w_i\Big(\sum_{j \in J} \varepsilon_j\Big) &> \sum_{j \in J} \{v_i(\xi, \xi, d_{ij}) - v_i(\xi - \varepsilon_j, \xi, d_{ij})\} \\
&\geq \sum_{j \in J} \{v_i(\xi, \xi, d^*) - v_i(\xi - \varepsilon_j, \xi, d^*)\} \qquad (9) \\
&\geq v_i(\xi, \xi, d^*) - v_i\Big(\xi - \sum_{j \in J} \varepsilon_j, \xi, d^*\Big) \qquad (10)
\end{aligned}
\]

where (9) follows from $v_{xd}(x, \xi, d) \leq 0$, and (10) follows from $v_{xx}(x, \xi, d) \geq 0$. Now, since $v_{xx}(x, \xi, d) \geq 0$, if (10) is true for $\sum_{j \in J} \varepsilon_j < \xi$, it is also true for $\sum_{j \in J} \varepsilon_j = \xi$, hence $w_i(\xi) > v_i(\xi, \xi, d^*)$. This contradicts the fact that $x$ was created by the assignment algorithm.

We still need to show that $x$ is a WBE. Suppose that it is not, i.e. there exist $(i, j)$ and $(\tilde{x}_i, \tilde{x}_j)$ such that $u_i(\tilde{x}_i, \tilde{x}_j, x_{-i-j}) > u_i(x)$ and $u_j(\tilde{x}_j, \tilde{x}_i, x_{-i-j}) > u_j(x)$. From the construction of $x$, it must be the case that $i, j$ are such that $x_i^j = x_j^i = 0$. Again, we have 2 cases:

1. $\delta_i < \kappa_i$ and $\delta_j < \kappa_j$. This is impossible since, from the assignment algorithm, it implies that $v_i(\xi, \xi, d_{ij}) < 0$.

2. $\delta_i = \kappa_i$. Then, if $i$ has a profitable deviation, there exists $K \subseteq g_i$ such that $v_i(\sum_{k \in K} \varepsilon_k, \tilde{x}_j^i, d_{ij}) > \sum_{k \in K} \{v_i(\xi, \xi, d_{ik}) - v_i(\xi - \varepsilon_k, \xi, d_{ik})\}$. Let $d_i^* = \max_{k \in K} d_{ik}$. Then, we have

\[
\begin{aligned}
v_i\Big(\sum_{k \in K} \varepsilon_k, \tilde{x}_j^i, d_{ij}\Big) &> \sum_{k \in K} \{v_i(\xi, \xi, d_{ik}) - v_i(\xi - \varepsilon_k, \xi, d_{ik})\} \\
&\geq \sum_{k \in K} \{v_i(\xi, \xi, d_i^*) - v_i(\xi - \varepsilon_k, \xi, d_i^*)\} \qquad (11) \\
&\geq v_i(\xi, \xi, d_i^*) - v_i\Big(\xi - \sum_{k \in K} \varepsilon_k, \xi, d_i^*\Big) \qquad (12)
\end{aligned}
\]

where (11) follows from $v_{xd}(x, \xi, d) \leq 0$, and (12) follows from $v_{xx}(x, \xi, d) \geq 0$. Now, since $v_{xx}(x, \xi, d) \geq 0$, if (12) is true for $\sum_{k \in K} \varepsilon_k < \xi$, it is also true for $\sum_{k \in K} \varepsilon_k = \xi$, hence $v_i(\xi, \tilde{x}_j^i, d_{ij}) > v_i(\xi, \xi, d_i^*)$.

We now turn to $j$. If $\delta_j = \kappa_j$, the same argument applies for $j$; then $v_j(\xi, \xi, d_{ij}) > v_j(\xi, \xi, d_j^*)$ (and $v_i(\xi, \xi, d_{ij}) > v_i(\xi, \xi, d_i^*)$). Since we have $d_{ij} < d_i^*$ and $d_{ij} < d_j^*$, this contradicts the fact that $x$ was created by the assignment algorithm.

If $\delta_j < \kappa_j$, $j$ has at least $\xi$ to invest (and it is profitable to invest up to $\xi$ since $v_x(x, y, d) > 0$). Together with the fact that $d_{ij} < d_i^*$, this contradicts the fact that $x$ is produced by the assignment algorithm.

□

Proof of proposition 3.4

From theorem 3.3, it is sufficient to show that for any $i, j \in N$, $x_i^j \in \{0, \xi\}$ at any NE.

Consider some $i, j \in N$, and suppose that $x_i^j \in (0, \xi)$. I show that this implies that there exists $k \in N$ such that $x_i^k \in (0, \xi)$. Suppose otherwise. Then, $i$ still has resources available. Since $v_x(x, y, d) > 0$, $i$ could increase $x_i^j$ and be better off. Hence, $x$ is not a NE, so it is not a WBE. Hence, there exists $k \in N \setminus \{j\}$ such that $x_i^k \in (0, \xi)$. There are 2 cases:

1. [$k = i$]. Since $x$ is a NE, we must have the following.

• If $x_i^i + x_i^j \geq \xi$, then

\[
\begin{aligned}
w_i(x_i^i) + v_i(x_i^j, x_j^i, d_{ij}) &\geq w_i(\xi) + v_i(x_i^j + x_i^i - \xi, x_j^i, d_{ij}) \\
w_i(x_i^i) + v_i(x_i^j, x_j^i, d_{ij}) &\geq w_i(x_i^j + x_i^i - \xi) + v_i(\xi, x_j^i, d_{ij})
\end{aligned}
\]

Rewriting, we have

\[
\begin{aligned}
w_i(\xi) - w_i(x_i^i) &\leq v_i(x_i^j, x_j^i, d_{ij}) - v_i(x_i^j + x_i^i - \xi, x_j^i, d_{ij}) \\
w_i(x_i^i) - w_i(x_i^j + x_i^i - \xi) &\geq v_i(\xi, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij})
\end{aligned}
\]

Since $v_{xx}(x, y, d) > 0$, we have $v_i(\xi, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij}) > v_i(x_i^j, x_j^i, d_{ij}) - v_i(x_i^j + x_i^i - \xi, x_j^i, d_{ij})$, and since $w''(x) > 0$, we have $w_i(\xi) - w_i(x_i^i) > w_i(x_i^i) - w_i(x_i^j + x_i^i - \xi)$. This is in contradiction with the above conditions. Hence, $x$ is not a NE.

• If $x_i^i + x_i^j < \xi$, then

\[
\begin{aligned}
w_i(x_i^i) + v_i(x_i^j, x_j^i, d_{ij}) &\geq w_i(x_i^i + x_i^j) + v_i(0, x_j^i, d_{ij}) \\
w_i(x_i^i) + v_i(x_i^j, x_j^i, d_{ij}) &\geq w_i(0) + v_i(x_i^i + x_i^j, x_j^i, d_{ij})
\end{aligned}
\]

Rewriting, we have

\[
\begin{aligned}
w_i(x_i^i + x_i^j) - w_i(x_i^i) &\leq v_i(x_i^j, x_j^i, d_{ij}) - v_i(0, x_j^i, d_{ij}) \\
w_i(x_i^i) - w_i(0) &\geq v_i(x_i^i + x_i^j, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij})
\end{aligned}
\]

Since $v_{xx}(x, y, d) > 0$, we have $v_i(x_i^j + x_i^i, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij}) > v_i(x_i^j, x_j^i, d_{ij}) - v_i(0, x_j^i, d_{ij})$, and since $w''(x) > 0$, we have $w_i(x_i^j + x_i^i) - w_i(x_i^i) > w_i(x_i^i) - w_i(0)$. Again, this is in contradiction with the above conditions. Hence, $x$ is not a NE.

2. [$i \neq k$ and $i \neq j$]. Since $x$ is a NE, we must have the following:

• If $x_i^k + x_i^j \geq \xi$, then

\[
\begin{aligned}
v_i(x_i^k, x_k^i, d_{ik}) + v_i(x_i^j, x_j^i, d_{ij}) &\geq v_i(\xi, x_k^i, d_{ik}) + v_i(x_i^j + x_i^k - \xi, x_j^i, d_{ij}) \\
v_i(x_i^k, x_k^i, d_{ik}) + v_i(x_i^j, x_j^i, d_{ij}) &\geq v_i(x_i^j + x_i^k - \xi, x_k^i, d_{ik}) + v_i(\xi, x_j^i, d_{ij})
\end{aligned}
\]

Rewriting, we have

\[
\begin{aligned}
v_i(\xi, x_k^i, d_{ik}) - v_i(x_i^k, x_k^i, d_{ik}) &\leq v_i(x_i^j, x_j^i, d_{ij}) - v_i(x_i^j + x_i^k - \xi, x_j^i, d_{ij}) \\
v_i(x_i^k, x_k^i, d_{ik}) - v_i(x_i^j + x_i^k - \xi, x_k^i, d_{ik}) &\geq v_i(\xi, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij})
\end{aligned}
\]

Since $v_{xx}(x, y, d) > 0$, we have $v_i(\xi, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij}) > v_i(x_i^j, x_j^i, d_{ij}) - v_i(x_i^j + x_i^k - \xi, x_j^i, d_{ij})$, and $v_i(\xi, x_k^i, d_{ik}) - v_i(x_i^k, x_k^i, d_{ik}) > v_i(x_i^k, x_k^i, d_{ik}) - v_i(x_i^j + x_i^k - \xi, x_k^i, d_{ik})$. This is in contradiction with the above conditions. Hence, $x$ is not a NE.

• If $x_i^k + x_i^j < \xi$, then

\[
\begin{aligned}
v_i(x_i^k, x_k^i, d_{ik}) + v_i(x_i^j, x_j^i, d_{ij}) &\geq v_i(x_i^j + x_i^k, x_k^i, d_{ik}) + v_i(0, x_j^i, d_{ij}) \\
v_i(x_i^k, x_k^i, d_{ik}) + v_i(x_i^j, x_j^i, d_{ij}) &\geq v_i(0, x_k^i, d_{ik}) + v_i(x_i^j + x_i^k, x_j^i, d_{ij})
\end{aligned}
\]

Rewriting, we have

\[
\begin{aligned}
v_i(x_i^j + x_i^k, x_k^i, d_{ik}) - v_i(x_i^k, x_k^i, d_{ik}) &\leq v_i(x_i^j, x_j^i, d_{ij}) - v_i(0, x_j^i, d_{ij}) \\
v_i(x_i^k, x_k^i, d_{ik}) - v_i(0, x_k^i, d_{ik}) &\geq v_i(x_i^j + x_i^k, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij})
\end{aligned}
\]

Since $v_{xx}(x, y, d) > 0$, we have $v_i(x_i^j + x_i^k, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij}) > v_i(x_i^j, x_j^i, d_{ij}) - v_i(0, x_j^i, d_{ij})$, and $v_i(x_i^j + x_i^k, x_k^i, d_{ik}) - v_i(x_i^k, x_k^i, d_{ik}) > v_i(x_i^k, x_k^i, d_{ik}) - v_i(0, x_k^i, d_{ik})$. This is in contradiction with the above conditions. Hence, $x$ is not a NE.

□


The proof is obvious from the proof of theorems 3.2 and 3.3. One only has to remark that for any $i, j, k \in N$, $v_i(\xi, \xi, d_{ij}) \geq v_i(\xi, \xi, d_{ik})$ implies that $v_i(\xi, \xi, d_{ij}) > v_i(\xi, \xi, d_{ik})$ if we assume that $d_{ij} \neq d_{kl}$. □


The fact that any strong NE needs to be produced by the assignment algorithm follows from propositions 3.2 and 3.4. Suppose that $x^* \in X$ is a BE, but not a strong NE. There exist $S \subset N$ and $x_S \in \times_{i \in S} X_i$ such that $u_i(x_S, x^*_{-S}) > u_i(x^*)$ for all $i \in S$. We will show that under strict convexity or discreteness, this implies that there exists a bilateral deviation.

Under discreteness, $x_i \in \{0, \xi\}^n$ for all $i \in S$. Using the same argument as the one used in lemma 3.1, there exists at least one project created under a deviation by coalition $S$. That is, $\exists i, j \in S$ such that $x_i^{j*} = x_j^{i*} = 0$ and $x_i^j = x_j^i = \xi$. Since the utility functions are additive, this implies that $i, j$ have a profitable bilateral deviation. Resources invested in the link $(i, j)$ must have come either from unused resources or from the deletion of another link, since $x_i^j \in \{0, \xi\}$ for all $i, j \in N$.

Under convexity, if it is profitable to withdraw resources from one link and invest in two new links, it is even better to invest in only one of these links. (This is exactly the argument used in proposition 3.3.) Specifically, suppose that there exist $i, j, k \in S$ such that $x_i^j, x_i^k > 0$, and $x_i^{j*} = x_i^{k*} = 0$. Then, either $x_i^j = \xi$ and $x_i^k = 0$, or $x_i^j = 0$ and $x_i^k = \xi$, is better for $i$. Then, $i$ is willing to make a bilateral deviation with $j$ (wlog). Since the utilities are linear, it is also profitable for $k$ (since it is under a joint deviation in $S$). Hence, there exists a bilateral deviation between $i$ and $j$. □

Appendix 2

The Assignment Algorithm

I generate a network $g$ (represented by the adjacency matrix $A$) in which every individual invests as much as possible in every active link (i.e. $x_i^j \in \{0, \xi_i\}$ for all $i, j \in N$). Before presenting the formal algorithm, I discuss the intuition.

The algorithm starts with the empty network and proceeds by first linking the individ-

uals with the smallest distance (say i, j ∈ N), provided that:

1. The link between i and j leads to positive payoff for both individuals (otherwise, the

link between i and j is set to 0).

2. The link between i and j is better than the individual link for both individuals

(otherwise, the individual links are created).

3. Both individuals still have resources left (otherwise all remaining links for the indi-

vidual who has reached his budget constraint are set to 0).

I now proceed to the formal description of the algorithm.

Let $\eta_i^j = v_i(\xi, \xi, d_{ij})$ for all $i, j \in N$ such that $i \neq j$, and $\eta_i^i = w_i(\xi)$ for all $i \in N$. This function represents the value of a link between two individuals. Now, define the (not necessarily unique) ordered list $L^0$ as follows: $L^0 = (d_{ij})_{i,j \in N: i<j}$, such that $L^0_1 \leq L^0_2 \leq \dots \leq L^0_m$. The list $L^0$ is an ordered list of distance values, for all pairs of individuals. The number of elements in $L^0$ is the number of possible pairings between individuals in $N$, i.e. $n(n-1)/2$. Let $L^0_l$ be the element of position $l$ in the list $L^0$. I denote $(L^0_l)^{-1} = (i, j)$ if $L^0_l = d_{ij}$.

The algorithm computes $g$ and takes $L^t = L^0$ and $A = 0$ as inputs. It operates in two steps.

Step 1. Take the first element of the list $L^t$, i.e. $L^t_1$. Let $L^t_1 = d_{ij}$.

If $a_{ii} = 0$ or $a_{jj} = 0$,

1. If $\eta_i^i \geq \eta_i^j$ and $\eta_i^i \geq 0$, then $a_{ii} = 1$.

2. If $\eta_j^j \geq \eta_j^i$ and $\eta_j^j \geq 0$, then $a_{jj} = 1$.

Otherwise,

1. If $\eta_i^j \geq 0$ and $\eta_j^i \geq 0$, then set $a_{ij} = a_{ji} = 1$.

2. If $\eta_i^j < 0$, then generate $L_i^* = L^t \setminus \{d_{ik}\}_{k \in N: d_{ik} \in L^t}$. (That is, remove all distances associated with $i$, since all the following distances will be greater than $d_{ij}$.)

3. If $\eta_j^i < 0$, then generate $L_j^* = L^t \setminus \{d_{jk}\}_{k \in N: d_{jk} \in L^t}$, i.e. do the same for $j$ as we did for $i$.

Generate $L^{t+1} = \{(d \in L_i^* \cap L_j^*) \setminus d_{ij}\}$.

Step 2. Repeat step 1 for $t = 1, \dots$ until $|L^t| = 0$ or until $\exists i \in N$ such that $\delta_i = \kappa_i$.

For all $i \in N$ such that $\delta_i = \kappa_i$, generate $L_i^* = L^t \setminus \{d_{ik}\}_{k \in N: d_{ik} \in L^t}$. (That is, remove all distances associated with $i$, since he has no resources left.) Then, generate $L^{t+1} = \cap_{i \in N} L_i^*$ and repeat step 1.

After the algorithm stops, I generate the allocation $x$ as follows. For all $i, j \in N$, if $a_{ij} = 1$, then $x_i^j = \xi$; otherwise $x_i^j = 0$. Notice that by definition $x \in X$.
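The intuition of the algorithm can be illustrated with a simplified greedy pass. The sketch below is not the full procedure above: it ignores the individual link, counts capacity in number of links, and assumes a link value that is decreasing in distance. All names (`greedy_assignment`, `has_structural_homophily`, the example data) are illustrative assumptions. The second function checks the structural homophily condition stated in Appendix 3 on the resulting network.

```python
from itertools import combinations

def greedy_assignment(dist, capacity, value):
    """Link pairs in increasing order of distance.

    dist: {(i, j): d_ij} with i < j; capacity: {i: max number of links};
    value: callable (i, d) -> payoff of a link at distance d (assumed decreasing in d).
    """
    links = {i: set() for i in capacity}
    for i, j in sorted(dist, key=dist.get):
        d = dist[(i, j)]
        if len(links[i]) >= capacity[i] or len(links[j]) >= capacity[j]:
            continue  # one side has exhausted its resources
        if value(i, d) < 0 or value(j, d) < 0:
            continue  # the link is not worth creating for one side
        links[i].add(j)
        links[j].add(i)
    return links

def has_structural_homophily(links, dist):
    """For every unlinked pair, i or j is at least as far from the other as from all own friends."""
    d = lambda a, b: dist[(min(a, b), max(a, b))]
    for i, j in combinations(sorted(links), 2):
        if j in links[i]:
            continue
        ok_i = all(d(i, j) >= d(i, k) for k in links[i])
        ok_j = all(d(i, j) >= d(j, k) for k in links[j])
        if not (ok_i or ok_j):
            return False
    return True

# Hypothetical example: four individuals, one link each.
example_dist = {(0, 1): 0.1, (0, 2): 0.4, (0, 3): 0.9,
                (1, 2): 0.3, (1, 3): 0.8, (2, 3): 0.2}
example_network = greedy_assignment(
    example_dist, {0: 1, 1: 1, 2: 1, 3: 1}, lambda i, d: 1.0 - d
)
```

In this example the two closest pairs, (0, 1) and (2, 3), are linked and every remaining pair is blocked by a capacity constraint, so the resulting network satisfies the structural homophily condition.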


Appendix 3

The Likelihood Function

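I assume that no individual is isolated. The definition of structural homophily is: for all $ij \notin g$, $d_{ij} \geq d_{ik}$ for all $k \in g_i$ or $d_{ij} \geq d_{jk}$ for all $k \in g_j$. Then, since the $\varepsilon_{ij}$ are independent, and $\ln(d) \geq \ln(d')$ iff $d \geq d'$, the probability that $g$ exhibits structural homophily is

\[
\prod_{ij \notin g} \Big\{ \prod_{k \in g_i} \mathbb{P}(d_{ij} \geq d_{ik}) + \prod_{k \in g_j} \mathbb{P}(d_{ij} \geq d_{jk}) - \prod_{k \in g_i} \mathbb{P}(d_{ij} \geq d_{ik}) \prod_{k \in g_j} \mathbb{P}(d_{ij} \geq d_{jk}) \Big\}
\]

This gives:

\[
\mathbb{P}(d_{ij} \geq d_{ik}) = \mathbb{P}\Big( \sum_{r=1}^{R} \beta_r \rho_r(\theta_i, \theta_j) + \varepsilon_{ij} \geq \sum_{r=1}^{R} \beta_r \rho_r(\theta_i, \theta_k) + \varepsilon_{ik} \Big)
\]

At this point, the normalization of $\varepsilon$ is necessary for the identification of $\beta$. Simplifying the last expression, we have:

\[
\begin{aligned}
\mathbb{P}(d_{ij} \geq d_{ik}) &= \mathbb{P}\Big( Z \geq \sum_{r=1}^{R} \beta_r [\rho_r(\theta_i, \theta_k) - \rho_r(\theta_i, \theta_j)] \Big) \\
&= 1 - \Phi\Big( \sum_{r=1}^{R} \beta_r [\rho_r(\theta_i, \theta_k) - \rho_r(\theta_i, \theta_j)] \Big)
\end{aligned}
\]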
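The expressions above translate fairly directly into code. The following is a minimal sketch under stated assumptions: `links` maps each individual to the set of his friends, `rho(i, j)` is a hypothetical helper returning the list of the $R$ component distances between $i$ and $j$, and the shock difference is taken to be normalized to a standard normal so that $\Phi$ is the standard normal CDF. None of these names come from the paper.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_farther(beta, rho_ij, rho_ik):
    """P(d_ij >= d_ik) = 1 - Phi(sum_r beta_r * [rho_r(i, k) - rho_r(i, j)])."""
    index = sum(b * (r_ik - r_ij) for b, r_ij, r_ik in zip(beta, rho_ij, rho_ik))
    return 1.0 - norm_cdf(index)

def log_likelihood(beta, links, rho):
    """Log of the probability that the observed network exhibits structural homophily."""
    nodes = sorted(links)
    total = 0.0
    for pos, i in enumerate(nodes):
        for j in nodes[pos + 1:]:
            if j in links[i]:
                continue  # only unlinked pairs enter the product
            p_i = math.prod(prob_farther(beta, rho(i, j), rho(i, k)) for k in links[i])
            p_j = math.prod(prob_farther(beta, rho(i, j), rho(j, k)) for k in links[j])
            total += math.log(p_i + p_j - p_i * p_j)
    return total

# Hypothetical example: two linked pairs, one component distance, beta = 0.
example_links = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
example_rho = lambda i, j: [abs(i - j)]
ll_at_zero = log_likelihood([0.0], example_links, example_rho)
```

With $\beta = 0$ every comparison has probability 0.5, so each of the four unlinked pairs contributes $\ln(0.5 + 0.5 - 0.25) = \ln(0.75)$. In practice the function would be passed to a numerical optimizer over $\beta$, and a guard against taking the log of a probability that underflows to zero would be needed.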

Appendix 4

The economy is composed of 100 populations, each containing 20 individuals. There are

four individual variables defined as follows:

1. $\theta^1 \sim N(0, 4)$

2. $\theta^2 \sim \text{Bernoulli}(0.2)$

3. $\theta^3 \sim \text{Bernoulli}(0.5)$

4. $\theta^4 = c\theta^1 + (1 - c)\theta$, where $\theta \sim N(0, 4)$ and $c \in (0, 1)$

The distance function is defined as:

\[
d(\theta_i, \theta_j) = 2|\theta_i^1 - \theta_j^1| + 6 I\{\theta_i^2 = \theta_j^2\} + 3 I\{\theta_i^3 = \theta_j^3\} + 4|\theta_i^4 - \theta_j^4| + \varepsilon_{ij},
\]

where $\varepsilon_{ij} \sim_{iid} N(0, 1)$.

The model is estimated using only the first three characteristics, and varying $c \in \{0, 0.2, 0.4, 0.6, 0.8, 1\}$. The implied correlation coefficients are $\{0, 0.16, 0.27, 0.35, 0.41, 0.45\}$. I simulated 1,000 replications for each value of $c$, for a total of 6,000 replications. Tables 6-10 display the results.
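The design above can be sketched as follows. This is an illustration under assumptions, not the code used for the paper: $N(0, 4)$ is read as variance 4 (standard deviation 2), the indicator terms are copied as printed in the distance function, and all names are hypothetical. Dividing each observed weight by the sum of the three observed weights gives $2/11 \approx 0.18$ for the first characteristic, which matches the true value quoted in the caption of Figure 6, so this appears to be the normalization used.

```python
import random

OBSERVED_WEIGHTS = (2.0, 6.0, 3.0)  # coefficients on the three observed characteristics

def draw_population(n=20, c=0.4, seed=0):
    """Draw n individuals with characteristics (theta1, theta2, theta3, theta4)."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        theta1 = rng.gauss(0.0, 2.0)           # N(0, 4) read as variance 4 (assumption)
        theta2 = int(rng.random() < 0.2)       # Bernoulli(0.2)
        theta3 = int(rng.random() < 0.5)       # Bernoulli(0.5)
        theta4 = c * theta1 + (1.0 - c) * rng.gauss(0.0, 2.0)  # unobserved, correlated with theta1
        people.append((theta1, theta2, theta3, theta4))
    return people

def distance(a, b, eps):
    """Distance between two individuals; indicator terms follow the printed formula."""
    return (2.0 * abs(a[0] - b[0]) + 6.0 * (a[1] == b[1]) + 3.0 * (a[2] == b[2])
            + 4.0 * abs(a[3] - b[3]) + eps)

# True values of the observed weights once their sum is normalized to 1.
normalized_truth = [w / sum(OBSERVED_WEIGHTS) for w in OBSERVED_WEIGHTS]

# With c = 1 the unobserved characteristic coincides with the first observed one.
population = draw_population(c=1.0)
```

One replication would draw 100 such populations, form each network from the simulated distances, and estimate the weights using only the first three characteristics.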

Figure 6: First Observed Characteristic (Dependent)

Average estimate and 95% CI. True value is 0.18


Figure 7: Second Observed Characteristic


Figure 8: Third Observed Characteristic



Appendix 5

Table 4: Robustness: Baseline vs Only Race (Sum normalized to 1)†

              Baseline Model          Alternative Model
Dimensions    Estimates   S.E.        Estimates   S.E.
White         0.107∗∗     (0.010)     0.117∗∗     (0.011)
Hispanic      0.482∗∗     (0.024)     0.458∗∗     (0.021)
Black         0.282∗∗     (0.027)     0.302∗∗     (0.029)
Asian         0.114∗∗     (0.024)     0.119∗∗     (0.025)
Native        0.015       (0.015)     0.005       (0.016)

Log-Likelihood of the Baseline Model: -3128.033
Log-Likelihood of the Alternative Model: -3500.414
Likelihood Ratio: 744.761



Table 5: Robustness: Baseline vs Alt. Socio-Economic Var. (Sum normalized to 1)†

              Baseline Model          Alternative Model
Dimensions    Estimates   S.E.        Estimates   S.E.
Age           0.074∗∗     (0.005)     0.075∗∗     (0.006)
Gender        0.189∗∗     (0.035)     0.187∗∗     (0.036)
Work          0.015∗∗     (0.007)     0.019       (0.028)
White         0.092∗∗     (0.009)     0.090∗∗     (0.009)
Hispanic      0.360∗∗     (0.023)     0.360∗∗     (0.026)
Black         0.238∗∗     (0.030)     0.239∗∗     (0.031)
Asian         0.094∗∗     (0.020)     0.094∗∗     (0.021)
Native        0.004       (0.012)     0.001       (0.013)
Age-Gender    -0.065∗∗    (0.006)     -0.064∗∗    (0.006)

Log-Likelihood of the Baseline Model: -3128.033
Log-Likelihood of the Alternative Model: -3129.819
Likelihood Ratio: 3.573


Table 6: Robustness: Baseline vs Gumbel-distributed shocks (Sum normalized to 1)†

              Baseline Model          Alternative Model
Dimensions    Estimates   S.E.        Estimates   S.E.
Age           0.074∗∗     (0.005)     0.073∗∗     (0.006)
Gender        0.189∗∗     (0.035)     0.185∗∗     (0.039)
Work          0.015∗∗     (0.007)     0.016∗∗     (0.006)
White         0.092∗∗     (0.009)     0.088∗∗     (0.009)
Hispanic      0.360∗∗     (0.023)     0.362∗∗     (0.025)
Black         0.238∗∗     (0.030)     0.244∗∗     (0.033)
Asian         0.094∗∗     (0.020)     0.094∗∗     (0.020)
Native        0.004       (0.012)     0.001       (0.012)
Age-Gender    -0.065∗∗    (0.006)     -0.061∗∗    (0.006)

Log-Likelihood of the Baseline Model: -3128.033
Log-Likelihood of the Alternative Model: -3115.614

† S.E. computed using the delta method.
†† ∗∗ for 1% significance level.
††† Robust SE using the (sandwich) variance-covariance matrix for pseudo-m.l.e. The SE for the probit model are clustered by school.

Table 7: Robustness: Baseline vs Probit (Sum normalized to 1)†

              Baseline Model          Alternative Model
Dimensions    Estimates   S.E.        Estimates   S.E.
Age           0.074∗∗     (0.005)     0.247∗∗     (0.008)
Gender        0.189∗∗     (0.035)     0.244∗∗     (0.010)
Work          0.015∗∗     (0.007)     0.026∗∗     (0.005)
White         0.092∗∗     (0.009)     0.091∗∗     (0.012)
Hispanic      0.360∗∗     (0.023)     0.086∗∗     (0.009)
Black         0.238∗∗     (0.030)     0.299∗∗     (0.018)
Asian         0.094∗∗     (0.020)     0.067∗∗     (0.017)
Native        0.004       (0.012)     0.031∗∗     (0.006)
Age-Gender    -0.065∗∗    (0.006)     -0.090∗∗    (0.007)

Log-Likelihood of the Baseline Model: -3128.033
Log-Likelihood of the Alternative Model: -142 337.750
Likelihood Ratio: 278 419.500

† S.E. computed using the delta method.
†† ∗∗ for 1% significance level.
††† Robust SE using the (sandwich) variance-covariance matrix for pseudo-m.l.e. The SE for the probit model are clustered by school.

Table 8: Robustness: Cell Estimator†

Dimensions    Estimates   S.E.
White         0.839∗∗     (0.003)
Hispanic      0.286∗∗     (0.008)
Black         0.633∗∗     (0.011)
Asian         0.311∗∗     (0.010)
Native        0.054∗∗     (0.006)

† S.E. computed using the delta method.
†† ∗∗ for 1% significance level.

References

Aumann, R. J. “Acceptable Points in General Cooperative n-Person Games”, in Contributions to the Theory of Games IV, Annals of Mathematics Studies 40 (1959), 287-324

Badev, A. “Discrete Games in Endogenous Networks: Theory and Policy” (2013), Work-

ing Paper

Bifulco, R., J. Fletcher, and S. L. Ross “The Effect of Classmate Characteristics on

Post-Secondary Outcomes: Evidence from the Add Health.” American Economic Journal:

Economic Policy, 3-1 (2011), p.25-53

Bloch F. and Dutta B. “Communication Networks with Endogenous Link Strength”,

Games and Economic Behavior, 66-1 (2009), 39-56

Bramoulle Y., Currarini S., Jackson M.O., Pin P. and Rogers B. “Homophily and Long-Run Integration in Social Networks”, Journal of Economic Theory (2012), Forthcoming

Cameron A.C. and Trivedi P.K. “Microeconometrics, Methods and Applications”,

Cambridge University Press, 2005

Christakis N., Fowler J., Imbens G.W. and Kalyanaraman K. “An Empirical

Model for Strategic Network Formation” (2010), Working Paper

Copic J., Jackson M.O. and Kirman A. “Identifying Community Structures from

Network Data via Maximum Likelihood Methods”, B.E. Press Journal of Theoretical Eco-

nomics 9-1, Article 30 (2009)

Currarini S., Jackson M. O., Pin P. “An Economic Model of Friendship: Homophily,

Minorities, and Segregation”, Econometrica, 77 (2009), 1003-1045

Currarini S., Jackson M. O., Pin P. “Identifying the roles of race-based choice and

chance in high school friendship network formation,” Proceedings of the National Academy

of Sciences of the United States of America, 107-11 (2010), 4857-4861


De Giorgi, G., M. Pellizzari, and S. Redaelli “Identification of social interactions

through partially overlapping peer groups”, American Economic Journal: Applied Eco-

nomics, 2 (2010), p.241-275

Echenique F. and Fryer R.G. “A Measure of Segregation Based on Social Interactions”

Quarterly Journal of Economics, 122-2 (2007), 441-485

Fortin B. and Yazbeck M.A. “Peer Effects, Fast Food Consumption and Adolescent Weight Gain” (2011), Working Paper

Franz S., Marsili M. and Pin P. “Observed Choices and Underlying Opportunities”,

Science and Culture, 76-9,10 (2010), 471-476

Galeotti A., Goyal S. and Kamphorst J. “Network Formation with Heterogenous

Players”, Games and Economic Behavior, 54-2 (2006), 353-373

Goldsmith-Pinkham P. and Imbens G. W. “Social Networks and the Identification of Peer Effects”, Journal of Business & Economic Statistics, 31-3 (2013), 253-264

Golub B. and Jackson M.O. “Naive Learning in Social Networks: Convergence, In-

fluence and the Wisdom of Crowds”, American Economic Journal: Microeconomics 2-1

(2010a), 112-149

Golub B. and Jackson M.O. “Using selection bias to explain the observed structure

of Internet diffusions”, Proceedings of the National Academy of Sciences, 107-24 (2010b),

10833-10836

Goyal S. and Vega-Redondo F. “Structural Holes in Social Networks”, Journal of

Economic Theory, 137 (2007), 460-492

Gourieroux C., Monfort A. and Trognon A. “Pseudo Maximum Likelihood Methods:

Theory”, Econometrica, 52-3 (1984), 681-700

Hsieh C-S. and Lee L.F. “A Social Interactions Model with Endogenous Friendship

Formation and Selectivity”, (2013) Working Paper

Henry M. and Mourifie I. “Euclidean Revealed Preferences: Testing the Spatial Voting

Model”, Journal of Applied Econometrics, 8-4 (2013), p.650-666


Iijima R. and Kamada Y. “Social Distance and Network Structures” (2013), Working

Paper

Jackson M. O. Social and Economic Networks, Princeton University Press, 2008

Jackson M.O. and Rogers B.W. “The Economics of Small Worlds”, Journal of the

European Economic Association, 3-2,3 (2005), 617-627

Jackson M.O. and Rogers B.W. “Meeting Strangers and Friends of Friends: How

Random are Socially Generated Networks?”, American Economic Review, 97-3 (2007),

890-915

Jackson M. O. and Wolinsky A. “A Strategic Model of Social and Economic Networks”, Journal of Economic Theory, 71 (1996), 44-74

Johnson C. and Gilles R. P. “Spatial Social Networks”, Review of Economic Design, 5

(2000), 273-299

van der Leij M., Rolfe M. and Toomet O. “On the Relationship Between Unexplained

Wage Gap and Social Network Connections for Ethnical Groups” (2009), Working Paper

Manski, C.F. “Economic Analysis of Social Interactions”, Journal of Economic Perspec-

tives, 14-3 (2000), 115-136

Marmaros D. and Sacerdote B. “How Do Friendships Form?”, Quarterly Journal of

Economics, 121-1 (2006), 79-119

McFadden D. “Conditional Logit Analysis of Qualitative Choice Behavior” in P. Zarem-

bka (ed.), Frontiers in econometrics (1974), 105-142

McFadden D. “Econometric Models of Probabilistic Choice”, in C. Manski and D. Mc-

Fadden (editors), Structural Analysis of Discrete Data with Econometric Applications,

Cambridge, Mass.:M.I.T.Press. 1981

Mele A. “A Structural Model of Segregation in Social Networks” (2013), Working Paper

McPherson M., Smith-Lovin L., and Cook J. M. “Birds of a Feather: Homophily in Social Networks”, Annual Review of Sociology, 27 (2001), 415-444


Patacchini E. and Zenou Y. “Ethnic Networks and Employment Outcomes”, Regional

Science and Urban Economics, 42-6 (2012), 938-949

Rivas J. “Friendship Selection”, International Journal of Game Theory, 38 (2009), 521-538

Rubí-Barceló A. “Core/periphery scientific collaboration networks among very similar researchers”, Theory and Decision, 72-4 (2012), 463-483

Watts A. “Formation of Segregated and Integrated Groups”, International Journal of

Game Theory, 35 (2007), 505-519

Sheng S. “Identification and Estimation of Network Formation Games” (2012), Working

Paper
