Structural Homophily∗
Vincent Boucher1
Abstract
Homophily, or the fact that similar individuals tend to interact with each other, is a
prominent feature of economic and social networks. I show that the equilibrium structure
of homophily has empirical power. I build a strategic model of network formation, which
produces a unique equilibrium network. Individuals have homophilic preferences and face
capacity constraints on the number of links. I develop a novel empirical method, based on
the shape of the equilibrium network, which allows for the identification and estimation of
the underlying homophilic preferences. I apply this new methodology to the formation of
friendship networks.
JEL Codes: D85,C72,C13
∗ Manuscript received May 2012, revised July 2013 and January 2014.
1 I would like to thank Yann Bramoulle, Marc Henry and Onur Ozgur for sharing their time, and for their valuable comments and discussions. I also want to thank Lars Ehlers for his precious help, and for many discussions and comments. Thanks also to Paolo Pin for his helpful comments. I thank co-editor Holger Sieg and three anonymous referees for their valuable comments and suggestions. I would also like to thank Ismael Mourifie, Louis-Philippe Beland, Yousef Msaid and David Karp, as well as the participants at many seminars including members of the Departments of Economics of Colorado State University, McGill University, Universite Laval and Universite de Sherbrooke. I would also like to thank the participants at various conferences, including those of the Canadian Economics Association (2011), Coalition Theory Network (2010), Societe canadienne de science economique (2010), Econcon (2011), and Groupe de recherche international (2011), and the Cambridge-INET workshop on Networks (2012), for their questions, comments and suggestions. Finally, I gratefully acknowledge financial support from CIREQ, FRQSC, SSHRC, and Cambridge-INET. This research uses data from Add Health, a program project directed by Kathleen Mullan Harris and designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill, and funded by grant P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 23 other federal agencies and foundations.
1 Introduction
The fact that similar individuals tend to interact with each other is a prominent feature of
social networks. The phenomenon, referred to as homophily, is increasingly being studied
by economists.2 Indeed, the structure of the social networks in which individuals inter-
act has been shown to significantly influence many social outcomes such as segregation,3
information transmission and learning,4 and employment and wages.5 Being able to un-
derstand, identify, and measure how the social characteristics of an individual influence
network formation is therefore of central importance to understanding social outcomes.
However, most studies to date have overlooked the equilibrium implications of homophily,
and have disregarded key factors such as the impact of time constraints.
In this paper, I develop a model of strategic network formation incorporating homophilic
preferences and capacity constraints on the number of links formed in a network. My anal-
ysis uncovers novel structural predictions generated by the equilibrium interplay between
the individuals’ homophilic preferences and capacity constraints. Building on the explicit
structure of homophily obtained at equilibrium, I develop a new estimation technique that
allows one to recover the underlying homophilic preference parameters. As an illustration, I
study the impact of homophilic preferences on the formation of friendship networks among
American teenagers. I find that individuals’ choices of friends are most strongly influenced
by racial considerations, and that Hispanics and Blacks are much more segregated than
Asians. I also find that age, gender and socio-economic differences significantly affect the
formation of friendship networks.
The emphasis on the equilibrium implications of homophilic preferences is new to the
literature. The equilibrium network resulting from the theoretical model exhibits more
structure than the known stylized facts regarding homophilic patterns in social networks.6
The equilibrium network architecture allows for an original empirical methodology using a
maximum likelihood approach. A key feature of my estimation strategy is that it recovers
explicit preference parameters characterizing homophily in social networks. In other words,
the estimation strategy allows for the identification of (homophilic) preference interactions
from constraint interactions.7
2 See for example Currarini et al. (2009), Bramoulle et al. (2012), and Rivas (2009).
3 Echenique and Fryer (2007), Watts (2007), and Mele (2013).
4 Golub and Jackson (2010a, 2010b).
5 van der Leij et al. (2009) and Patacchini and Zenou (2012).
6 See Bramoulle et al. (2012), and Currarini et al. (2009).
My theoretical framework produces strong predictions. There exists a generically
unique equilibrium network. A key assumption is that the homophilic preferences of in-
dividuals can be represented by a distance function on the set of characteristics of the
individuals. This idea is implicitly or explicitly exploited by many papers looking at ho-
mophily in social networks.8 This assumption allows me to introduce enough heterogeneity
in the model to account for individuals’ observed heterogeneity (such as socio-economic and
demographic variables). I also assume that individuals have link-separable utilities, and
an explicit resource constraint, such as time. For example, while a teenager may prefer to
be friends with other teenagers who have similar characteristics, he must take into account
the fact that he has limited time to spend with the friends he chooses to have. The in-
troduction of the resource constraint implies that individuals’ equilibrium payoffs do not
only depend on the direct links they have, but on the whole equilibrium network. The
resource constraint also explicitly introduces an upper bound on the number of bilateral
relationships an individual can sustain.9 The specific notion of homophily emerging in
equilibrium results from the tension between individuals’ homophilic preferences and the
resource constraint. These two premises imply a novel theoretical prediction on the shape
of homophily in equilibrium. I call this specific network architecture structural homophily.
Structural homophily describes an explicit relationship between the individuals’ socioe-
conomic characteristics and the network architecture. An individual is characterized by
a “social neighborhood” on the space of individual characteristics.10 This neighborhood
explicitly determines the set of acceptable bilateral relationships. In a network
characterized by structural homophily, two individuals are linked if and only if they belong to
the intersection of their neighborhoods. These neighborhoods are not directly observable,
but are implied by the equilibrium predictions of the theoretical model for a given distance
function. This novel theoretical prediction has empirical power.
7 Manski (2000) distinguishes between three sources of social interactions: preference interactions, constraint interactions, and expectations interactions.
8 See for example Johnson and Gilles (2000), Marmaros and Sacerdote (2006), Iijima and Kamada (2013), Mele (2013) and Christakis et al. (2010).
9 It relates to the sociological and psychological observation referred to as Dunbar’s number.
10 The social neighborhood relates to the sociological notion of a “social niche”; see for example McPherson et al. (2001).
I use structural homophily to develop an original estimation strategy. This strategy is
based on the duality between the equilibrium network structure and structural homophily.
Any equilibrium network exhibits structural homophily, and any observed network that
exhibits structural homophily is an equilibrium network. I develop a maximum likelihood
approach, defined over a population of distinct social networks. The empirical method
allows for the identification and estimation of the effect of homophilic preferences, while
controlling for many unobserved individual variables. This is relevant for policymaking
since it provides insight into the relative importance of a set of socio-economic character-
istics. For instance, the method allows for the measurement of the magnitude of racial
segregation.
As an illustration, I use data from the friendship networks of American teenagers pro-
vided by the Add Health database.11 I find a strong influence of race in the choice of
friendships. In comparison, age and gender differences are less important. Among the
many individual characteristics, socio-economic status (in particular, the labour market
status of the parents) least affects the formation of friendships.
This paper contributes to the theoretical and the empirical literature on network for-
mation. Most theoretical models of network formation produce relatively structured equi-
librium networks such as stars, circles or chains.12 These models, although highly relevant
from a theoretical perspective, are not well suited for empirical purposes. Indeed, the re-
sulting set of equilibrium networks is both too large (many equilibrium networks) and too
constraining (stars, chains, circles, etc.) to represent actual observable social networks.
Most theoretical models assume that payoffs depend on detailed features of the network
structure, but neglect the capacity constraints on the number of links an individual can
make.13 I show that the introduction of this constraint, combined with explicit ex-ante
homophilic and link-separable utilities, implies the existence of a unique equilibrium
network.14
11 Carolina Population Center, University of North Carolina at Chapel Hill; see http://www.cpc.unc.edu/projects/addhealth.
12 Bala and Goyal (2000), Jackson (2008, chapter 6), Jackson and Rogers (1997), Jackson and Wolinsky (1996), and Johnson and Gilles (2000).
Two competing explanations of homophily have been proposed. The first is through
correlations in the meeting process:15 individuals have no preference bias, but individuals
with similar characteristics have a higher probability of meeting. The second is through
preference biases:16 individuals prefer to link with similar individuals. In this paper, I
assume that individuals have homophilic preferences, but evolve in a deterministic world.
I analyze the equilibrium implication of these preferences in a fully strategic setting.
The empirical literature on network formation is still in an early stage. The few existing
papers clearly identify homophily as a driving factor of the network formation process.17
This paper contributes to the literature on strategic network formation by providing an
estimation strategy based on the equilibrium structure of homophilic preferences. Focussing
on homophily has the advantage of recovering information relevant for policy making, while
controlling for many unobservable variables. Equilibrium considerations are important, as
they imply a departure from link-level estimation techniques. Specifically, the equilibrium
utility of an individual does not only depend on his direct links, but also on the whole
distribution of links in the population. The model defines a precise dependence structure
that allows for the definition of a standard maximum likelihood estimator.18 Focussing on
homophily allows me to complement and refine existing results in the literature.
There is also a relatively new related literature on peer effects with endogenous social
networks.19 These papers model the formation of a network jointly with a peer effect game
on the realized network. Accordingly, they allow for the identification and estimation of
a wide range of social interaction parameters, but are unfortunately strongly dependent
on the structural specifications of the preferences. Focussing on homophily has the
advantage of capturing relevant preference parameters while allowing for a wide range of
non-homophilic preferences.
13 Exceptions include Bloch and Dutta (2009) and Rubi-Barcelo (2010).
14 I concentrate on strategic models of network formation. A large literature exists on random network formation, which is not directly concerned with the current setting. Interested readers can refer to Jackson (2008, chapters 4 and 5) and the references therein.
15 See for example Bramoulle et al. (2012).
16 See also Currarini et al. (2009), and Mele (2013).
17 See for example Christakis et al. (2010), Mele (2013), Currarini et al. (2010), Franz et al. (2010), and Sheng (2012).
18 As opposed to the Bayesian approaches, as in Christakis et al. (2010), and Mele (2013).
19 Badev (2013), Goldsmith-Pinkham and Imbens (2013), and Hsieh and Lee (2013).
The remainder of the paper is organized as follows. In section 2, I present the theoretical
model and key definitions. In section 3, I find and characterize the (unique) equilibrium
network. In section 4, I describe the empirical methodology and explore its properties using
Monte Carlo simulations. In section 5, I present an application focussed on the formation
of friendship networks in high schools, and discuss policy implications. I conclude with
section 6.
2 The Theoretical Model
In this section, I present a model of network formation that characterizes the equilibrium
effects of homophily. The model generically produces a unique equilibrium. I first provide
a formal definition of structural homophily. Next, I outline the theoretical framework, and
finally, I briefly present the main definitions and equilibrium concepts.
2.1 Structural Homophily
Some preliminary assumptions are necessary in order to introduce the notion of structural
homophily. There is a finite set of individuals N . Individuals may be linked together
through a network. Let gi ⊆ N be the set of individuals linked to individual i for all i ∈ N .
Each individual i ∈ N is characterized by a type θi ∈ Θ, where Θ is the type space. An
individual’s type could represent, for instance, a series of socioeconomic characteristics. I
consider a distance d on Θ. For notational simplicity, let dij ≡ d(θi, θj) for any i, j ∈ N .
Then, structural homophily is defined as follows.
Definition 1 A network g exhibits structural homophily with respect to a distance
d(., .) if whenever two individuals, i and j, are not linked, either dij ≥ maxk∈gi{dik} or
dij ≥ maxk∈gj{djk}.
This definition formalizes the fact that two individuals that are “close” should be
linked.20 Intuitively, if two individuals are not linked, it is because, from the point of
view of one of the individuals, the other is located relatively too far. Note that this defini-
tion only makes sense when the creation of a link requires mutual consent. Figure 1 shows
two examples of networks for Θ = R2. The first network exhibits structural homophily, but
the second does not. In Figure 1b, the closest individuals (i.e. D and E) are not linked,
which is in contradiction with structural homophily since D is linked to C, and E is linked
to B.
Figure 1: Structural Homophily. (a) Respected; (b) Violated. Each panel shows individuals A, B, C, D and E.
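Definition 1 can be checked mechanically on an observed network. The following is a minimal sketch, assuming types in R2 and the Euclidean distance; the function name, the coordinates and the two example networks are hypothetical and are not the ones drawn in Figure 1. It also assumes the convention that an individual with no links has a farthest-link distance of minus infinity, so that the condition holds trivially for him.

```python
from itertools import combinations
from math import dist, inf


def exhibits_structural_homophily(points, links):
    """Check Definition 1 for types in R^2 under the Euclidean distance (an assumption)."""
    n = len(points)
    neighbors = {i: set() for i in range(n)}
    for i, j in links:
        neighbors[i].add(j)
        neighbors[j].add(i)

    def farthest_link(i):
        # Assumed convention: the max over an empty set of links is -infinity.
        return max((dist(points[i], points[k]) for k in neighbors[i]), default=-inf)

    for i, j in combinations(range(n), 2):
        if j in neighbors[i]:
            continue
        d_ij = dist(points[i], points[j])
        # Violation: the missing link is strictly closer than the farthest link of BOTH i and j.
        if d_ij < farthest_link(i) and d_ij < farthest_link(j):
            return False
    return True


# Hypothetical types: four individuals on a horizontal line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
respected = {(0, 1), (1, 2)}   # nearest individuals are linked
violated = {(0, 2), (1, 3)}    # 0 and 1 are close but each is linked farther away

print(exhibits_structural_homophily(points, respected))
print(exhibits_structural_homophily(points, violated))
```

In the second network, individuals 0 and 1 play the role of D and E in the text: they are closer to each other than to their respective partners, yet they are not linked.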
More insight can be obtained by drawing the equivalence (or indifference) curves cor-
responding to the farthest link for each of the individuals considered (i.e. for B and D in
Figure 2a, and for D and E in Figure 2b). These equivalence curves define neighborhoods;
every individual inside the neighborhood of i is closer to i than i’s farthest link. If both in-
dividuals belong to the intersection of the two neighborhoods generated by the equivalence
curves (as in Figure 2b), then structural homophily is violated.21
Figure 2: Structural Homophily: Equivalence Curves. (a) Respected; (b) Violated. Each panel shows individuals A, B, C, D and E.
Structural homophily has an implication for revealed preferences. Suppose that
individuals have preferences over links with other individuals, and that such preferences are a
function of the distance between the individuals. Suppose that we also observe the network
(i.e. the individuals and their links), and the types of the individuals in the network (i.e.
a series of individual characteristics). Then, under mutual consent, we should not observe
networks such as the one depicted in Figure 2b. That is, structural homophily should hold.
20 Note that the concept of closeness and location, in the context of a social network, is not necessarily limited to the physical location of individuals. Closeness refers to how similar two individuals are across a number of characteristics, such as age, race and gender.
21 This closely relates to the cutoff rule of Iijima and Kamada (2013).
It is interesting to note that small-world networks respect structural homophily for a
specific type space.22 In such small-world models, individuals are located on islands. In
that setting, structural homophily implies that individuals are linked first with individuals
on the same island. Hence, if there is a link between two islands, those islands have to be
fully connected. I now present a game that produces structural homophily at equilibrium.
22 See for example Jackson and Rogers (2005) and Galeotti et al. (2006).
2.2 The Game
There are n individuals, each of whom is endowed with a fixed amount of resources $x_i = \kappa_i \xi$,
where $\xi \in \mathbb{R}_+$ and $\kappa_i \in \mathbb{N}$. We will see that, in equilibrium, $\kappa_i$ is interpreted as the
maximum number of links that an individual i can have. A strategy for an individual i is
a vector $x_i = (x_i^1, \dots, x_i^n) \in X_i$, where
$$X_i = \Big\{ x_i \in \mathbb{R}^n_+ \;\Big|\; x_i^j \le \xi, \text{ and } \sum_{j \in N} x_i^j \le \kappa_i \xi \Big\}.$$
Then, $\xi$ plays the role of a link-level constraint. The introduction of the link-level constraint is
motivated by the empirical fact that the number of links varies across individuals. Let
$X = \times_{i \in N} X_i$. We say that there is a link between an individual i and an individual j iff
$x_i^j > 0$ and $x_j^i > 0$. Let $g_i = \{ j \in N \mid i \text{ and } j \text{ are linked} \}$, so $j \in g_i$ iff $i \in g_j$. That is, a link
exists iff both individuals invest a strictly positive amount of resources in it. Note that
individual i can be linked to himself.
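The strategy set and the mutual-consent rule for links can be illustrated with a short sketch. The list-of-lists encoding (x[i][j] stands for i's investment in the link with j), the function names and the numbers are assumptions made for this example only.

```python
def is_feasible(x_i, xi, kappa_i):
    """True if the strategy x_i respects the link-level and the resource constraints."""
    link_level_ok = all(0 <= amount <= xi for amount in x_i)
    resource_ok = sum(x_i) <= kappa_i * xi
    return link_level_ok and resource_ok


def linked_sets(x):
    """Return g_i for every i: j belongs to g_i iff x[i][j] > 0 and x[j][i] > 0."""
    n = len(x)
    return [{j for j in range(n) if x[i][j] > 0 and x[j][i] > 0} for i in range(n)]


xi = 1.0            # hypothetical link-level constraint
kappa = [2, 1, 1]   # hypothetical capacities
x = [
    [0.0, 1.0, 0.5],  # individual 0 invests in links with 1 and 2
    [1.0, 0.0, 0.0],  # individual 1 reciprocates
    [0.0, 0.0, 1.0],  # individual 2 invests only in himself
]

print([is_feasible(x[i], xi, kappa[i]) for i in range(3)])
print(linked_sets(x))
```

Individual 0's investment toward individual 2 is not reciprocated, so no link exists between them, while individual 2 is linked to himself.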
The utility of an individual is given by the function ui : X → R. It is additive in the
different links he has, and it is represented by:
$$u_i(x) = \sum_{j \in N \setminus \{i\}} v_i(x_i^j, x_j^i, d_{ij}) \cdot I_{\{j \in g_i\}} + w_i(x_i^i) \cdot I_{\{i \in g_i\}}$$
where $I_{\{P\}}$ is an indicator function that takes value 1 if P is true, and 0 otherwise. The
function $v_i(x, y, d)$ gives the value of any link for i. It is assumed to be twice continuously
differentiable with $v_x(x, y, d) > 0$ if $y > 0$, $v_y(x, y, d) > 0$ if $x > 0$, and $v_d(x, y, d) < 0$ if
$x, y > 0$. The function $w_i(x_i^i)$ represents the payoff received from the private investment of
i.23 It is also twice continuously differentiable with $w'(x) > 0$. I also allow for the presence
of fixed costs, i.e. $v_i(0, 0, d) \le 0$ and $w_i(0) \le 0$. Notice that an individual benefits from a
link only if both individuals invest in the link. The model induces a game Γ between the
n individuals. Formally, we have Γ = (N, {Xi}i∈N , {ui}i∈N).
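To make the payoff structure concrete, here is a minimal sketch that evaluates the utility of each individual for a given profile. The functional forms v(x, y, d) = xy - d and w(x) = x are hypothetical choices that merely satisfy the sign conditions stated above; the paper does not commit to them. The investment matrix and the distances are likewise made up.

```python
def v(x, y, d):
    # Hypothetical link value: v_x = y > 0, v_y = x > 0, v_d = -1 < 0, v(0, 0, d) = -d <= 0.
    return x * y - d


def w(x):
    # Hypothetical private payoff: w'(x) = 1 > 0 and w(0) = 0.
    return x


def utility(i, x, distance):
    """Link-separable utility: a link counts only if both sides invest a positive amount."""
    total = 0.0
    for j in range(len(x)):
        linked = x[i][j] > 0 and x[j][i] > 0
        if not linked:
            continue
        total += w(x[i][i]) if j == i else v(x[i][j], x[j][i], distance[i][j])
    return total


x = [
    [0.0, 1.0, 0.5],  # 0 invests in 1 and, without reciprocity, in 2
    [1.0, 0.0, 0.0],  # 1 reciprocates
    [0.0, 0.0, 1.0],  # 2 invests privately
]
distance = [
    [0.0, 0.25, 2.0],
    [0.25, 0.0, 3.0],
    [2.0, 3.0, 0.0],
]

print([utility(i, x, distance) for i in range(3)])
```

The unreciprocated investment of individual 0 toward individual 2 contributes nothing, which is the sense in which an individual benefits from a link only if both individuals invest in it.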
The model has two important features. First, the initial endowment creates scarcity
and induces a feasibility constraint. This effect is typical of any matching model. If
some individual i invests resources in a link with individual j, he will have less available
resources to create a link with another individual. That is, the feasibility constraint implies
a tradeoff between the distance between two individuals, and the level of investment they
put in the link. This is what Manski (2000) refers to as “constraint interactions.” Second,
the preferences are affected by the presence of direct externalities. The amount of resources
invested by an individual in a given link directly affects the utility of the individuals he
links to. That is, in Manski’s terms, there are “preference interactions.” These two features
will play an important role in equilibrium.
This completes the description of the game. I now present the main definitions.
23 The function wi can also be interpreted as the private value of the resource x for i.
2.3 Definitions
The collection of links between individuals generates a network g = (N,E). A network is
characterized by a set of individuals (here, N), and a set of links, E, which are (unordered)
pairs of individuals. The set of all possible networks is denoted by G. Any network g can
be represented by an n × n adjacency matrix A that takes values aij = 1 if j ∈ gi, and
0 otherwise, for all i, j ∈ N . The degree δi(g) of an individual i is the number of links
attached to i, i.e. δi(g) = |gi|.
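As a small illustration of these definitions, the sketch below builds the adjacency matrix of a network from its set of links and reads off the degrees. The function names and the example network are hypothetical.

```python
def adjacency_matrix(n, links):
    """Symmetric n x n matrix with a[i][j] = 1 iff j is in g_i."""
    a = [[0] * n for _ in range(n)]
    for i, j in links:
        a[i][j] = 1
        a[j][i] = 1
    return a


def degrees(a):
    """The degree of i is |g_i|, i.e. the sum of row i."""
    return [sum(row) for row in a]


A = adjacency_matrix(4, {(0, 1), (1, 2)})
print(A)
print(degrees(A))
```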
I am interested in the following solution concepts:
Definition 2 A Nash equilibrium (NE) is a profile $x^* \in X$ such that $u_i(x_i^*, x_{-i}^*) \ge u_i(x_i, x_{-i}^*)$
for all $x_i \in X_i$, and for all $i \in N$.
The set of Nash equilibria is very large. Since an individual benefits only from a
collaborative link when both individuals invest in the link, it will never be profitable to
unilaterally start a new link. For this reason, I focus on the following solution concept,
introduced by Goyal and Vega-Redondo (2007).
Definition 3 A bilateral equilibrium (BE) is a profile $x^* \in X$ such that:
(1) $x^*$ is a Nash equilibrium
(2) There exists no $i, j \in N$, such that $u_i(x_i, x_j, x^*_{-i-j}) > u_i(x^*)$ and $u_j(x_i, x_j, x^*_{-i-j}) \ge u_j(x^*)$
for some $x_i \in X_i$ and $x_j \in X_j$.
This solution concept allows for bilateral deviations. This is a natural extension of
individual rationality, since individuals can benefit from the creation of links. For certain
economies, however, the BE concept will be too constraining. Accordingly, I also introduce
the following weakened equilibrium concept.
Definition 4 A weak bilateral equilibrium (WBE) is a profile $x^* \in X$ such that:
(1) $x^*$ is a Nash equilibrium
(2) There exists no $i, j \in N$, such that $u_i(x_i, x_j, x^*_{-i-j}) > u_i(x^*)$ and $u_j(x_i, x_j, x^*_{-i-j}) > u_j(x^*)$
for some $x_i \in X_i$ and $x_j \in X_j$.
In a WBE, a deviation must strictly increase the payoff of both individuals involved.
Notice that BE ⊆ WBE ⊆ NE. I discuss the distinction between these concepts in
section 3.1 (lemma 3.1 and proposition 3.5).
3 Equilibrium Characterization
I first show the existence of an equilibrium. Since the payoff functions are not continuous,
we cannot directly use the standard fixed-point arguments. The existence of a NE is
straightforward. Let $x_i^j = 0$ for all $j \ne i$. Then, for every individual, the maximization
problem becomes: $\max_{x_i \in X_i} w_i(x_i^i) \cdot I_{\{i \in g_i\}}$. The allocation $x^* \in X$ that solves this
problem for all i ∈ N is obviously a NE. In order to show the existence of a WBE (or a BE),
I will need to introduce additional assumptions. The next result provides an intuition of the
additional restrictions imposed by the bilateral stability on the solution set. It states that
if a deviation is jointly profitable, but not unilaterally profitable, the deviating individuals
have to invest more in their collaborative link. All proofs can be found in Appendix 1.
Lemma 3.1 If $x^* \in X$ is a NE, but not a WBE, then for any deviating pair (i, j), with
profitable deviations $x_i \in X_i$ and $x_j \in X_j$, we have $x_i^j > x_i^{j*}$ and $x_j^i > x_j^{i*}$.
Since x∗ is a NE, it is individually rational. Also, since the utility functions are additive
in the different links, the actions of individual j only affect i through the link between i
and j. If x∗ is not jointly rational for i and j, the incentive to deviate must come from the
link that i and j have together.
Throughout this section, I consider two alternative assumptions:
Assumption 1 (Discreteness) For all i, j ∈ N, $x_i^j \in \{0, \xi\}$.
Assumption 2 (Convexity) For all i ∈ N, $\frac{\partial^2 v_i}{\partial x^2}(x, y, d) \ge 0$ and $\frac{\partial^2 w_i}{\partial x^2}(x) \ge 0$.
The discreteness assumption is extensively used in the literature.24 Convexity is often
assumed when the network formation process involves continuous strategies. For example,
Bloch and Dutta (2009) define the strength of a link between individuals i and j as the
sum of a (strictly) convex function of the individuals’ investment, i.e. $s_{ij} = f(x_i^j) + f(x_j^i)$,
with $f' > 0$ and $f'' > 0$. Rubi-Barcelo (2012) uses a linear (and hence convex) function
to represent the payoff from scientific collaboration between two researchers.25 One of the
reasons why the literature has focussed on convex preferences as opposed to the “standard”
concavity assumption is as follows. Suppose that the value functions are concave. Then,
an individual would prefer to spend a very small amount of time with as many individuals
as possible. This is not what is actually observed, as most individuals have a relatively
small number of links. I provide existence results and show that these two assumptions
imply that the equilibrium network exhibits structural homophily.
24 See for example Jackson (2008), chapters 6 and 11.
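The argument against concavity can be illustrated numerically. The sketch below is a deliberately stripped-down illustration, not the model itself: it assumes that each link is worth f(share) to the individual, ignores the partner's investment, the distance and any fixed cost, and uses the square root and the square as hypothetical concave and convex value functions.

```python
import math


def spread_value(f, budget, m):
    """Total value when `budget` is split equally over m links, each worth f(share)."""
    return m * f(budget / m)


def square(x):
    return x ** 2


# Value of spreading one unit of resources over 1, 2 and 4 links.
concave_values = [spread_value(math.sqrt, 1.0, m) for m in (1, 2, 4)]
convex_values = [spread_value(square, 1.0, m) for m in (1, 2, 4)]

print(concave_values)  # increasing in the number of links
print(convex_values)   # decreasing in the number of links
```

With the concave function the total value keeps growing as the budget is spread more thinly, whereas with the convex function it is highest when the whole budget goes to a single link.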
The next results are based on an algorithm referred to as the assignment algorithm,
and formally defined in Appendix 2. The assignment algorithm uses as inputs: (1) the
list of preferences {ui(x)}i∈N , (2) the individual characteristics {θi}i∈N , (3) the resource
constraints {κi}i∈N , and (4) the distance function d on Θ. It produces at least one allocation
x ∈ X, and any allocation produced is such that $x_i^j \in \{0, \xi\}$ for all i, j ∈ N. When
$x_i^j \in \{0, \xi\}$, the payoff that an individual receives from the links can be ranked using the
distance function (a small distance implies a big payoff). Accordingly, the assignment
algorithm first links the pairs of individuals with the smallest distances (provided that
the link is profitable for both individuals, and leads to a higher payoff than the private
investment). The following results show that any allocation constructed in this fashion is
a WBE, and induces a network that exhibits structural homophily.
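The formal assignment algorithm is given in Appendix 2 and is not reproduced here. The following is only a greedy sketch in the spirit of the description above, under several simplifying assumptions: types are points on a line, the distance is the absolute difference, all pairwise distances are distinct, private investment is ignored, and profitability is summarized by a hypothetical cutoff max_dist[i] beyond which individual i no longer finds a link worthwhile.

```python
from itertools import combinations


def greedy_assignment(positions, kappa, max_dist):
    """Link pairs in order of increasing distance, subject to capacity and profitability."""
    n = len(positions)
    pairs = sorted(
        combinations(range(n), 2),
        key=lambda pair: abs(positions[pair[0]] - positions[pair[1]]),
    )
    degree = [0] * n
    links = set()
    for i, j in pairs:
        d = abs(positions[i] - positions[j])
        has_capacity = degree[i] < kappa[i] and degree[j] < kappa[j]
        profitable = d < max_dist[i] and d < max_dist[j]  # hypothetical stand-in for v > 0
        if has_capacity and profitable:
            links.add((i, j))
            degree[i] += 1
            degree[j] += 1
    return links, degree


positions = [0.0, 1.0, 3.0, 7.0]   # hypothetical one-dimensional types
kappa = [1, 2, 1, 1]               # hypothetical capacities
max_dist = [5.0, 5.0, 5.0, 5.0]    # hypothetical profitability cutoffs

links, degree = greedy_assignment(positions, kappa, max_dist)
print(sorted(links))
print(degree)
```

In this example the last individual stays isolated because every potential partner within his cutoff has already exhausted his capacity on closer links, and no degree exceeds the corresponding capacity.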
Let’s first consider the discrete case. Under discreteness, the involvement of an individ-
ual in some link does not affect the amount of resources he invests in his other (existing)
links. The value of a link between two arbitrary individuals is then independent of the
other (potential) links. Consequently, we have the following:
Theorem 3.2 (Discrete Strategy Space) Under discreteness, an allocation is a WBE
iff it is produced by the assignment algorithm.
25 The value of a scientific collaboration as defined by Rubi-Barcelo (2012, p.7) is analogous to a distance in my model.
Under convexity, for a given link, it is also rational for both individuals to invest
resources until the link-level constraint ξ is met, provided that it leads to a positive payoff.
We then have the following:
Theorem 3.3 (Existence) Under convexity, any allocation produced by the assignment
algorithm is a WBE.
Proposition 3.4 gives sufficient conditions so that any individual has to invest up to the
link-level constraint, in any WBE.
Proposition 3.4 Suppose that the inequalities in Assumption 2 are strict. Then any WBE
can be produced by the assignment algorithm.
Then, under discreteness or strict convexity, any equilibrium can be constructed through
the assignment algorithm. It is worth noting that under discreteness, $x_i^j \in \{0, \xi\}$ by
assumption, while under strict convexity it must hold only in equilibrium.
The above results show the existence of a WBE, but not of a BE. The intuition is
the following. Suppose that discreteness holds, and that the economy contains only three
individuals: i, j, k (see Figure 3). Suppose also that dij = dik < djk, and that xi = xj =
xk = ξ. Finally, suppose that vi(ξ, ξ, dij) = vj(ξ, ξ, dij) = vk(ξ, ξ, dik) > 0, while any other
link has a negative value. Then, in this example, there is no BE, but there are two WBE.
The reason is that i is indifferent between a link with j or a link with k. So, if i is linked
with j, but receives a proposition from k to form a new link, he will be indifferent between
keeping his link with j and replacing it with a link with k (while k would be strictly better
off with such a deviation).
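The three-individual example can be verified by brute force. The sketch below assumes discreteness with one unit of capacity per individual, so that a strategy reduces to choosing at most one target for the investment. The numerical payoffs are hypothetical: 0.5 for the links ij and ik, and -1 for the link jk and for the private investment, so that every other link has a negative value.

```python
from itertools import product

PLAYERS = (0, 1, 2)          # 0 = i, 1 = j, 2 = k
CHOICES = (None, 0, 1, 2)    # invest the whole endowment in one target, or in nothing
# Hypothetical link values with d_ij = d_ik < d_jk.
LINK_VALUE = {frozenset({0, 1}): 0.5, frozenset({0, 2}): 0.5, frozenset({1, 2}): -1.0}
SELF_VALUE = -1.0            # hypothetical value of the private investment


def utility(a, profile):
    target = profile[a]
    if target is None:
        return 0.0
    if target == a:
        return SELF_VALUE
    mutual = profile[target] == a   # a link requires investment from both sides
    return LINK_VALUE[frozenset({a, target})] if mutual else 0.0


def deviate(profile, changes):
    return tuple(changes.get(a, s) for a, s in enumerate(profile))


def is_nash(p):
    return all(utility(a, deviate(p, {a: s})) <= utility(a, p)
               for a in PLAYERS for s in CHOICES)


def is_blocked(p, both_strict):
    """Some pair has a joint deviation: both gain strictly (WBE) or one strictly, one weakly (BE)."""
    for a, b in product(PLAYERS, repeat=2):
        if a == b:
            continue
        for sa, sb in product(CHOICES, repeat=2):
            q = deviate(p, {a: sa, b: sb})
            a_gains = utility(a, q) > utility(a, p)
            if both_strict:
                b_accepts = utility(b, q) > utility(b, p)
            else:
                b_accepts = utility(b, q) >= utility(b, p)
            if a_gains and b_accepts:
                return True
    return False


def network(p):
    return frozenset((a, b) for a in PLAYERS for b in PLAYERS
                     if a < b and p[a] == b and p[b] == a)


nash = [p for p in product(CHOICES, repeat=3) if is_nash(p)]
wbe_networks = {network(p) for p in nash if not is_blocked(p, both_strict=True)}
be_profiles = [p for p in nash if not is_blocked(p, both_strict=False)]

print(wbe_networks)
print(be_profiles)
```

Under these assumed payoffs, the enumeration finds two WBE networks, one in which i is linked with j and one in which i is linked with k, and no BE, because the excluded individual can always offer i a deviation that leaves i indifferent.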
In many contexts, however, individuals have many characteristics, and the likelihood
of such a circumstance is small. In the absence of such a circumstance, we can show the
existence of a BE:
Proposition 3.5 Suppose that $d_{ij} \ne d_{kl}$ for any $i \ne j$ and $k \ne l$. Then any WBE produced
by the assignment algorithm is a BE. Moreover, if $d_{ij}$ is such that $v_i(\xi, \xi, d_{ij}) \ne w_i(\xi)$,
$v_i(\xi, \xi, d_{ij}) \ne 0$ and $w_i(\xi) \ne 0$ for all i, j ∈ N, this equilibrium is unique.
This implies that if for all i ∈ N, the types $\theta_i \in \Theta$ are drawn from a distribution with
a dense support on Θ, then there exists a unique WBE (if $w_i(\xi) \ne 0$), which is also a BE,
a.s.
Figure 3: WBE and BE
(a) The First WBE (b) The Second WBE
Let’s now turn to the characterization of the equilibrium network. Since the level of
investment of an individual in a potential link does not depend on the number of links he
has, the payoffs are only influenced by the distance between the individuals. Suppose i
and j are linked. Then, the creation of a new link between j and k has no spillover effects
on i. This produces important consequences on the shape of the equilibrium network. The
next proposition characterizes the allocations produced by the assignment algorithm.
Proposition 3.6 (Characterization) Let g∗ be the network generated by some allocation
produced by the assignment algorithm, then
(1) For all i ∈ N , δi(g∗) ≤ κi.
(2) The network g∗ exhibits structural homophily with respect to the distance d.
The proof is immediate from the construction through the assignment algorithm. Since
investments are maximal in every link, the number of links an individual can have is
bounded by the resource constraint κi. Also, since the assignment algorithm first creates
links that have the shortest distance before forming links of longer distances, the induced
network exhibits structural homophily. In essence, under discreteness or (strict) convexity,
any equilibrium network can be constructed through the assignment algorithm, hence
satisfying structural homophily.
I now examine efficiency issues. There are many ways to define efficiency. First, one
could consider the Pareto criterion. Given discreteness or convexity, any BE is Pareto
efficient. In fact, there is an even stronger result, which is that any BE is a strong Nash
equilibrium.26
Proposition 3.7 Under discreteness or strict convexity, any BE is a strong Nash equilib-
rium.
Since the utility functions are additive, bilateral stability implies stability in the sense
of a strong Nash equilibrium. However, since the utility functions are non-continuous (and
utilities are not transferable), Pareto efficiency does not imply efficiency in the sense of the
utilitarian criterion. Consider the following social welfare function:
$$W(x) = \sum_{i \in N} u_i(x)$$
In this case, efficiency is not guaranteed. In particular, one can find examples of
economies where the unique BE is efficient (in the sense of the utilitarian and the Pareto
criteria), as well as examples of economies where the unique BE is inefficient (in the sense
of the utilitarian criterion). This inefficiency comes from two principal sources.
First, under the discreteness assumption, any efficient allocation z ∈ X is such that
$z_i^j \in \{0, \xi\}$ for all i, j ∈ N (by assumption). Since an individual values only his own payoff,
whereas the social planner (SP) cares about all individuals, a collaborative link is more
valuable for the SP than it is for individuals (since it enters the utility function of both the
individuals involved in the link). The tradeoff between individual and collaborative links
is thus different for an individual than for the SP.
Second, under the (strict) convexity assumption, another issue arises. Since the SP is
willing to trade off the utilities of the individuals, an efficient allocation z ∈ X need not
be such that zji ∈ {0, ξ}. For example, suppose that there are no fixed costs, then any
network g∗ such that δi(g∗) < κi for some i ∈ N is inefficient. The reason is that if δ∗i < κi
for some i ∈ N , the creation of a link with some agent j (who is willing to invest a small
26Aumann (1959)
amount ε) leads to vi(ξ, ε, dij) for i. If ε is small enough, the loss for j is compensated by
the discrete jump in the utility of i. Hence, g∗ is inefficient. However, it is possible that
such a network g∗ is induced by a BE.
This concludes the analysis of the theoretical model. In section 4, I develop an estima-
tion technique derived from structural homophily, and present Monte Carlo simulations.
4 The Econometric Model
In this section, I present the econometric model. I use structural homophily to estimate
the weights of the distance function.27 I would like to emphasize that the method and
results of this section are self-contained. If one was willing to assume structural homophily
(instead of viewing it as the equilibrium outcome of the game presented in the last section),
all the results of this section would apply.
In order to present the econometric model, I introduce the following definition:
Definition 5 An observation q is
1) a network g = (Nq, Eq), and
2) for each individual i ∈ Nq, a vector of R individual socioeconomic characteristics, i.e.
{θi}i∈N , where θi is a 1×R vector.
For a given observation q ∈ {1, ..., Q}, I write (gq, θq), where θq is nq × R.
From section 3’s results, the structure of the equilibrium depends on the distribution
of types in the population. Specifically, the probability that aij = 1 for some i, j ∈ N ,
does not (in equilibrium) only depend on the preferences and types of i and j, but on the
preferences and types of all individuals in the population. In order to develop a consistent
estimator, one then needs to observe many populations.
27Copic et al. (2009) also exploit homophily, although in a very different context, in order to develop their estimation technique.
Definition 5 implies that the econometrician does not observe the specific level of
investment in a link (i.e. the link-level constraint), nor does he observe the resource
constraint κi.28 Accordingly, given a set of observations $\{(g_q, \theta_q)\}_{q=1}^{Q}$, we do not
possess enough information to construct the equilibrium network through the assignment algorithm, even
assuming some structural form for the utility functions. Specifically, a standard economet-
ric model would be the following. Given a parametric form for the payoff functions (i.e.
{vi(x, y, d), wi(x)}i∈Nq), and the distance function (i.e. d(i, j)), one would assume that the
data is generated by:
gq = Λ(θq, κq, ξq, εq; β) (1)
where Λ is the assignment algorithm, κq is the nq × 1 vector of individual resource con-
straints, ξq is the link-level resource constraint, εq is the error term, and β is the vector of
parameters to be estimated. Provided that one observes θq, κq, ξq, one could, in principle,
estimate β. Since κq and ξ are typically unobserved in existing data sets, I use a different
approach.29 From section 3’s results, I have established that any allocation produced by
the assignment algorithm respects structural homophily.30 Thus, my approach is to max-
imize the likelihood that the observed network exhibits structural homophily. Accordingly,
the distance function will play a central role. I assume the following structural form for
the distance function:
$$\ln(d_{ij}) = \sum_{l=1}^{L} \beta_l \rho_l(\theta_i, \theta_j) + \varepsilon_{ij} \qquad (2)$$
where ε ∼iid N(0, 1), and ρl(., .) is a dimension-wise distance function. For instance, if
Θ ⊆ R2, one could choose ρl(θi, θj) = |θli−θlj| for l = 1, 2. The vector (β1, ..., βL) ∈ Ξ ⊂ RL
contains the weights of the distance function.
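As a minimal sketch of specification (2), the snippet below computes the log-distance between two individuals, assuming the absolute-difference choice ρl(θi, θj) = |θli − θlj| mentioned above. The function name, the weights and the type vectors are hypothetical and serve only as an illustration.

```python
import math
import random

def log_distance(theta_i, theta_j, beta, eps=0.0):
    """Log-distance of equation (2), assuming rho_l = |theta_i^l - theta_j^l|."""
    if not (len(theta_i) == len(theta_j) == len(beta)):
        raise ValueError("theta_i, theta_j and beta must have the same length")
    systematic = sum(b * abs(a - c) for b, a, c in zip(beta, theta_i, theta_j))
    return systematic + eps

rng = random.Random(0)                    # fixed seed for reproducibility
beta = [0.5, 2.0]                         # hypothetical weights (beta_1, beta_2)
theta_i, theta_j = (1.0, 3.0), (4.0, 2.5)  # hypothetical types in R^2

# Systematic part: 0.5 * |1 - 4| + 2.0 * |3 - 2.5| = 2.5
noiseless = log_distance(theta_i, theta_j, beta)
# Adding a standard normal draw for eps_ij, as assumed in (2)
noisy = log_distance(theta_i, theta_j, beta, eps=rng.gauss(0.0, 1.0))
d_ij = math.exp(noisy)                    # the implied distance is always positive
```

Because the specification is written in logs, the implied distance is positive for any value of the weights, and it is symmetric in i and j as long as each ρl is.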
Note that the proposed structural form is by no means the only possible one. Any
positive and symmetric function could be used. I prefer to use the specification in (2) to
simplify the exposition. Notice that by introducing the error term εij the probability that
two distances are exactly the same is null. Hence the result of proposition 3.5 applies a.s.
28Note that while κi is an upper bound to δi(g), they are not necessarily equal. See proposition 3.6. The value of ξ is less of a concern as it can be normalized to 1 by renormalizing the value functions vi and wi for all i. Another smaller issue is that the “individual link” is unobserved in most databases.
29There are also severe computational and identification issues using the specification in (1).
30Also, by observing a network that exhibits structural homophily, one can always find some vi(x, y, d), κi and ξ that are produced by the assignment algorithm.
Equation (2) highlights two important features of the model. First, instead of trying
to specifically identify all the parameters of the utility function, I limit myself to the
estimation of the impact of homophilic preferences on the network formation process. That
is, I only seek to estimate the parameters of the distance function, and not the parameters
of the utility functions (for instance, I do not estimate the value of the resource for the
individuals). This is illustrated in Figure 4. In Figure 4a, the individuals place more value
on the characteristic on the horizontal axis. Then, the “closest” individuals to the central
node are the ones on the top and bottom. Symmetrically, in Figure 4b, the individuals
place more value on the characteristic on the vertical axis. Then, the “closest” individuals
to the central node are the ones on the left and right.
My aim is to estimate the relative weights placed on various characteristics. Note that
centred ellipses, such as those depicted in Figure 4, are implied by the additive form we
assumed in (2). The generalization to a more general class of distance functions, such as
in Henry and Mourifie (2013), is straightforward. In section 5, for example, I introduce
non-diagonal weights between the age and gender dimensions of the type space.
Figure 4: Changing the Weight of the Distance Function
(a) Relative Importance on the Horizontal Characteristic
(b) Relative Importance on the Vertical Characteristic
Second, I assume that the distance function is observed with noise. That is, there exists
a set of variables, observed by the individuals, but unobserved by an econometrician, that
affects the distance function. This assumption is not standard and is discussed in section
4.1. For now, I present the maximum likelihood estimator.
Given (2), we can compute the probability (conditional on an observation) that a network
exhibits structural homophily. Let Ψ = 1 − Φ, where Φ is the c.d.f. of the standard
normal distribution, and let γ = (β1/√2, ..., βL/√2). The probability that a network g
(given a set of characteristics θ) exhibits structural homophily is (algebraic manipulations
can be found in Appendix 3):
$$P(sh \mid g, \theta, \gamma) = \prod_{ij \notin g} \Big\{ \prod_{k \in g_i} \Psi\big[(s_{ik} - s_{ij})\gamma'\big] + \prod_{k \in g_j} \Psi\big[(s_{jk} - s_{ij})\gamma'\big] - \prod_{k \in g_i} \Psi\big[(s_{ik} - s_{ij})\gamma'\big] \prod_{k \in g_j} \Psi\big[(s_{jk} - s_{ij})\gamma'\big] \Big\} \qquad (3)$$
where sij is the 1×L vector of dimension-wise distance, i.e. slij = ρl(θi, θj). Equation 3
assumes that there is no isolated individual (i.e. no individual i is such that gi ∈ {∅, {i}}).
This is done without loss of generality since for any pair of individuals in which one of the
individuals is isolated, the condition imposed by structural homophily is trivially respected.
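The author's algebra is in Appendix 3, which is not reproduced here. As a sketch only, the single-comparison step behind each Ψ term in (3) can be worked out from (2), using the fact that the difference of two independent N(0, 1) errors is N(0, 2). The events A_i and A_j are labels introduced for this illustration.

```latex
% Sketch under the assumptions of (2): eps_ij iid N(0,1), so eps_ik - eps_ij ~ N(0,2).
\begin{align*}
P(d_{ik} < d_{ij})
  &= P\big(s_{ik}\beta' + \varepsilon_{ik} < s_{ij}\beta' + \varepsilon_{ij}\big) \\
  &= P\big(\varepsilon_{ik} - \varepsilon_{ij} < (s_{ij} - s_{ik})\beta'\big) \\
  &= \Phi\big((s_{ij} - s_{ik})\beta'/\sqrt{2}\big)
   = 1 - \Phi\big((s_{ik} - s_{ij})\gamma'\big)
   = \Psi\big[(s_{ik} - s_{ij})\gamma'\big].
\end{align*}
% For an unlinked pair ij, let A_i be the event "every k in g_i is closer to i than j is",
% and A_j the mirror event for j. Structural homophily holds for the pair if A_i or A_j
% holds. With the product form used in (3) for each event,
\begin{align*}
P(A_i \cup A_j) = P(A_i) + P(A_j) - P(A_i)\,P(A_j),
\qquad
P(A_i) = \prod_{k \in g_i} \Psi\big[(s_{ik} - s_{ij})\gamma'\big].
\end{align*}
```

For an isolated individual the product over an empty set equals 1, so the pair's contribution is 1, which is consistent with the remark that the condition is trivially respected.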
Then, given that there are Q observations, I propose the following maximum likelihood
estimator:
$$\ell(\beta \mid \theta) = \frac{1}{Q} \sum_{q=1}^{Q} \ln\big[P(sh \mid g_q, \theta_q, \gamma)\big] \qquad (4)$$
Provided that there exists a unique γ0 ∈ Ξ that maximizes (4), the maximum likelihood
estimator is well-behaved, and γ can be consistently estimated.31
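Equations (3) and (4) can be sketched in a few lines of code. The version below is an illustration under stated assumptions, not the implementation used for the results: it assumes ρl is the absolute difference, stores each network as a list of friend sets (a hypothetical layout), and evaluates Ψ with the standard library's complementary error function. All function and variable names are invented for the example.

```python
import math

def psi(x):
    """Psi = 1 - Phi, the survival function of the standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def s(theta, i, j):
    """Dimension-wise distances s_ij, assuming rho_l = |theta_i^l - theta_j^l|."""
    return [abs(a - b) for a, b in zip(theta[i], theta[j])]

def prod_closer(theta, friends, i, j, gamma):
    """Product over k in g_i of Psi[(s_ik - s_ij) gamma']."""
    s_ij = s(theta, i, j)
    out = 1.0
    for k in friends[i]:
        s_ik = s(theta, i, k)
        out *= psi(sum(g * (a - b) for g, a, b in zip(gamma, s_ik, s_ij)))
    return out

def log_p_sh(friends, theta, gamma):
    """ln P(sh | g, theta, gamma) from equation (3); friends[i] is the set g_i."""
    total = 0.0
    n = len(theta)
    for i in range(n):
        for j in range(i + 1, n):
            # Linked pairs are not part of the product; isolated individuals
            # contribute a factor of one, so they are skipped.
            if j in friends[i] or not friends[i] or not friends[j]:
                continue
            p_i = prod_closer(theta, friends, i, j, gamma)
            p_j = prod_closer(theta, friends, j, i, gamma)
            # Tiny floor guards against log(0) from numerical underflow.
            total += math.log(max(p_i + p_j - p_i * p_j, 1e-300))
    return total

def avg_log_likelihood(observations, gamma):
    """Objective (4): mean of ln P(sh | g_q, theta_q, gamma) over observations."""
    return sum(log_p_sh(g, th, gamma) for g, th in observations) / len(observations)

# Hypothetical observation: four individuals, one characteristic, links 0-1 and 2-3.
theta = [(0.0,), (1.0,), (5.0,), (6.0,)]
friends = [{1}, {0}, {3}, {2}]
obs = [(friends, theta)]
ll_zero = avg_log_likelihood(obs, [0.0])   # characteristic ignored
ll_pos = avg_log_likelihood(obs, [1.0])    # characteristic matters
```

In this toy network friends are close on the single characteristic, so a positive weight yields a higher objective than a zero weight. A real application would pass this objective to a numerical optimizer over γ.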
The identification strategy is based on a link-deference approach. A link exists if no
individual refuses it. There are two reasons for an individual to refuse a link: (1) because
he has no resources left (constraint interactions), or (2) because the other individual is too
distant (preference interactions). I want to identify the preferences effect, given that the
resource constraint is unobserved, without imposing additional assumptions on vi and wi.
The estimation strategy can be viewed as minimizing the probability that structural
homophily is violated.
31Although the function in (3) looks peculiar, the maximum likelihood setting is standard and the estimation of (4) requires only the usual set of assumptions. See for example Cameron and Trivedi (2005, p. 142-143) for the asymptotic properties of the maximum likelihood estimator.
Consider two alternative parameters β and β′. Suppose that we observe that two
individuals, i and j, are not linked together, as in Figure 5. According to β and β′, i
is linked to an individual farther from him than j. This means that i would have been
willing to create a link with j, but that j refused. This implies that j cannot be linked
to individuals farther from him than i; if j is linked with farther individuals, structural
homophily is violated. Thus, if j is linked to farther individuals than i under β, but not
under β′, then β′ is chosen over β to represent individuals’ preferences.
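The comparison between β and β′ can be illustrated with a small check that ignores the error term εij. The coordinates, weights and names below are hypothetical; the function only tests whether an unlinked pair contradicts structural homophily under a given weight vector.

```python
def dist(theta_a, theta_b, beta):
    """Noise-free weighted index: sum_l beta_l * |theta_a^l - theta_b^l|."""
    return sum(b * abs(x - y) for b, x, y in zip(beta, theta_a, theta_b))

def violates_sh(theta, friends, i, j, beta):
    """True if the unlinked pair (i, j) contradicts structural homophily under
    beta: both i and j hold a link with someone farther than the other."""
    d_ij = dist(theta[i], theta[j], beta)
    i_has_farther = any(dist(theta[i], theta[k], beta) > d_ij for k in friends[i])
    j_has_farther = any(dist(theta[j], theta[k], beta) > d_ij for k in friends[j])
    return i_has_farther and j_has_farther

# Hypothetical types in R^2: i and j are unlinked, i is linked to a, j to b.
theta = {"i": (0.0, 0.0), "j": (2.0, 0.0), "a": (3.0, 1.0), "b": (2.0, 3.0)}
friends = {"i": {"a"}, "j": {"b"}, "a": {"i"}, "b": {"j"}}

beta = (1.0, 1.0)        # equal weights on both characteristics
beta_prime = (1.0, 0.2)  # less weight on the vertical characteristic

under_beta = violates_sh(theta, friends, "i", "j", beta)
under_beta_prime = violates_sh(theta, friends, "i", "j", beta_prime)
```

Under both weight vectors i's friend a is farther from i than j is, while j's friend b is farther than i only under β. The observed network therefore contradicts structural homophily under β but not under β′, which is the sense in which β′ is preferred.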
Figure 5: Admissible Parameters, Θ = R2
(a) Distance Weights according to β
(b) Distance Weights according to β′
This shows why isolated individuals (i.e. individuals that have no link) provide no infor-
mation: whatever the parameters’ values, they never contradict structural homophily. In
other words, for isolated individuals, we cannot identify whether they are isolated because
they have limited resources, or because they have strong homophilic preferences. From a
revealed preference approach, we gain information about an individual’s preferences by ob-
serving his choices. If an individual is not connected, he does not “consume” any resources.
We therefore cannot say anything about his preferences.
4.1 Distances and Errors
In this section, I discuss the strengths and limitations of focussing on homophily for the
estimation, and I discuss the interpretation of εij. The main advantage of using the distance
function is that it can account for a high degree of unobserved heterogeneity. The reason
is that any unobserved characteristics that affect the value of a link without affecting the
distance function have no impact on the estimation. It helps to compare with a simple
probit estimator for the probability that two individuals (i and j) create a link:
$$a^*_{ij} = v_i(x_i^j, x_j^i, d_{ij}) + \varepsilon_{ij} \qquad (5)$$
where a∗ij is the latent variable for aij ∈ {0, 1}, and εij is an error affecting the value
of the link. The probit estimator requires two additional assumptions. First, since the
resource constraint creates dependence, the consistency of the probit estimator requires
κi ≥ n for all i ∈ N (i.e. time constraints are non-binding). Second, the identification of
(5) requires that one specifies a precise structural form for vi. This functional form can
only depend on i’s (observed) characteristics, or on a group fixed effect. Then, omitting a
relevant characteristic that affects the value of vi (and that is not captured by the group
fixed effect) will bias the estimation. By contrast, focussing on the distance function by
using the estimator in (4) allows one to account for any individual resource constraints
(correlated or not with the observables), and for any function vi. Also note that, contrary
to a link-based estimation (as with the probit estimator), the estimation is robust to the
dependence implied by the resource constraints on the equilibrium network.
Examples of such interdependencies between links can be found for example in Chris-
takis et al. (2010) and Mele (2013). Both papers find strong negative effects of creating
a link with an individual who has many friends. Here, this effect is also present, and
explicitly built into the model by the introduction of the resource constraint. When an
individual has many friends, he has fewer resources to spend on a potential new friend.
The obvious limitation of the estimator in (4) is that it does not allow for the
identification of all the preference parameters of the model, but only those of the distance
function. This is not overly problematic, keeping in mind that the resource constraint
is unobserved in most data sets, and that the estimation procedure does not require any
specific assumptions on vi, wi and κi. However, provided that one observes κi for all i ∈ N
and is willing to assume a functional form for vi and wi, one could, in principle, use the
econometric model in (1) in order to estimate the full model.
I now turn to the interpretation of the unobserved component of the distance function,
i.e. εij. The error term potentially contains three sources of unobserved heterogeneity.
The first interpretation is straightforward; εij can be interpreted as a measurement error
on the distance function. We therefore assume that we observe all the relevant individual
characteristics, but less precisely than the individuals.
The second is to assume that the value of a link is affected by a random shock, and
is in line with the literature on Additive Random Utility Models (ARUM, see McFadden,
1981), i.e. vi(x, y, d) + εij. An ARUM is observationally equivalent to a model where the
distance is observed with noise. The reason is that one can always define a symmetric
function d̃ : Θ2 → R such that vi(ξ, ξ, d̃ij) = vi(ξ, ξ, dij) + εij for all i ≠ j. For example, for
the functional form assumed in (2), if vi is log-quasilinear in the distance, i.e. v(x, y, d) =
f(x, y) − ln(d), the two models are equivalent.
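The log-quasilinear case can be written out in one line. The following is a sketch, with \bar d_{ij} introduced here as a label for the noise-free part of the distance in (2).

```latex
% Sketch: substitute (2) into v(x, y, d) = f(x, y) - ln(d).
\begin{align*}
v\big(x, y, d_{ij}\big)
  &= f(x, y) - \ln(d_{ij})
   = f(x, y) - \sum_{l=1}^{L} \beta_l \rho_l(\theta_i, \theta_j) - \varepsilon_{ij} \\
  &= v\big(x, y, \bar d_{ij}\big) - \varepsilon_{ij},
  \qquad \ln(\bar d_{ij}) = \sum_{l=1}^{L} \beta_l \rho_l(\theta_i, \theta_j).
\end{align*}
% Since eps_ij ~ N(0,1) is symmetric around zero, -eps_ij has the same distribution,
% so noise on the log-distance acts exactly like an additive shock to the link value.
```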
The third interpretation is to assume that there exist unobserved individual charac-
teristics that affect the distance function, i.e. there exists a dimension of the type space
that is unobserved. This is arguably the interpretation that is least compatible with the
assumption that the εij are iid. The independence of the errors is required to write the
closed-form expression of the likelihood function (see Appendix 3). To address the poten-
tial problem of misspecification, any reported standard errors in section 5 are the robust
“sandwich” standard errors for extremum estimators. The likelihood function in (4) can
then be interpreted as a pseudo-maximum likelihood estimator. The estimated parameters
then converge to their pseudo-true value.32
I also study the omission of a relevant dimension of the type space using Monte Carlo
simulations. Specifically, I generate a population of equilibrium networks using the assign-
32See Gourieroux et al. (1984) or, for a general description of the properties of quasi maximum likelihood estimators, see Cameron and Trivedi (2005), section 5.7.
ment algorithm for an economy in which individuals have a four-dimensional type space.
I then estimate the model using only three of these dimensions. I vary the correlation
between the individual characteristics and the omitted characteristic (see Appendix 4 for
a precise description of the Monte Carlo exercise).
The exercise departs from the theoretical model in two fundamental aspects. First, the
structure of the error term is no longer normal (or identically distributed). As there is
an unobserved dimension, the error is comprised of two elements: the iid error term, and
the unobserved dimension-wise distance, i.e. βlρl(θi, θj) + εij, where l is the unobserved
dimension. The second departure from the theoretical model is that the unobserved char-
acteristic generates dependence in the errors. Even with a correlation coefficient equal to
0 between the unobserved characteristic and the observed characteristics, the errors are no
longer independent, as ρl(θi, θj) is correlated with ρl(θi, θk), for any i, j, k ∈ N . Figures
6-8 in Appendix 4 display the simulation results. When the correlation coefficient is equal
to 0, the estimator is consistent. To ensure that the bias varies continuously with the
dependence on the omitted variable, I vary the correlation coefficient. The estimator is
robust to a small amount of dependence between the unobserved characteristic and the
observed characteristics. Looking at Figures 6-8 (Appendix 4), the true value falls within
the 95% confidence interval when the correlation coefficient is less than or equal to 20% for
the first characteristic (the one correlated with the unobserved one) and 30% for the other
characteristics. Obviously, as in most estimation techniques, omitting a strongly correlated
variable is likely to bias the estimation.
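The dependence created by an omitted dimension can be seen in a short simulation. This sketch is separate from the Monte Carlo exercise of Appendix 4: it assumes a single unobserved characteristic drawn independently and uniformly on [0, 1] and the absolute-difference distance, and it measures the correlation between ρ(θi, θj) and ρ(θi, θk), which share individual i.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(12345)   # fixed seed for reproducibility
draws = 100_000
rho_ij, rho_ik = [], []
for _ in range(draws):
    # Independent draws of the unobserved characteristic for i, j and k
    t_i, t_j, t_k = rng.random(), rng.random(), rng.random()
    rho_ij.append(abs(t_i - t_j))
    rho_ik.append(abs(t_i - t_k))

corr = pearson(rho_ij, rho_ik)
```

Even though the three characteristics are drawn independently, the two distances are positively correlated (the population value for this uniform design is 0.1), because both depend on θi. This is the sense in which an unobserved dimension makes the composite errors dependent.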
5 Empirical Application: High-School Friendship Networks
In this section, I estimate the weights of the distance function that leads to the formation of
the friendship networks of American teenagers. I find strong evidence of racial segregation,
especially for Hispanics and Blacks. I also find a significant effect of age and gender
differences. Disparities in economic status have a statistically significant, but quantitatively
small effect.
I use the Add Health database as it is particularly well suited for my model. Recall
that the model presented in sections 2 and 3 assumes that the individuals of the same pop-
ulation meet with certainty. A convincing empirical implementation thus requires that the
observed populations are small enough. In this regard, the Add Health database provides
information on students’ high schools, which are relatively small entities.33 Specifically, I
use the “In-School” sample, which provides information for every student in grades 7 to
12 at the schools in the sample.34 The sample includes the race (White, Hispanic, Black,
Asian or Native), age, gender, parents’ labor market status (work), and the friendship
networks of 37 798 teenagers, attending 122 high schools in the U.S. Every observed char-
acteristic is assumed to be a dimension of the type space. The type space thus has eight
dimensions: age, gender, White, Hispanic, Black, Asian, Native and work. For instance, a
16-year-old male teenager who identifies as Black-Asian, and has working parents would
be of type θ = (16, 0, 0, 0, 1, 1, 0, 1). Note that the “gender” dimension takes a value of 1
if the individual is a female, and 0 if the individual is a male, while the “work” dimension
takes a value of 1 if both parents work, and 0 otherwise.
I assume the following distance function:
$$\ln d(x_i, x_j) = \beta_1 |x_i^1 - x_j^1| + \sum_{r=2}^{8} \beta_r I\{x_i^r \neq x_j^r\} + \beta_9 |x_i^1 - x_j^1| \cdot I\{x_i^2 \neq x_j^2\} + \varepsilon_{ij} \qquad (6)$$
The parameter β9 reflects the fact that the effect of the age difference may differ
depending on whether a friendship is between individuals of the same or different gen-
ders. Given the distance function in (2), the distance between two teenagers, i and j,
who are identical except for race, where teenager i is White, and teenager j is Black, is
d(xi, xj) = βwhite + βblack.
33For this reason, and for computational reasons, I limit myself to schools for which I observe fewer than 300 students after removing isolated individuals, which is about 68% of the schools in the database. (I removed the isolated individuals, as they provide no relevant information; see the last paragraph of p. 18 for further discussion.) The results are robust to the use of alternative thresholds.
34For a more detailed description of the database, see http://www.cpc.unc.edu/projects/addhealth/design/wave1. The database also contains an “In-Home” sample, which contains more variables, but only for a sub-sample of students within a school.
The Add Health questionnaire asks each teenager to identify their best friends (up to
10, and a maximum of five males and five females). I assume that two individuals can be
friends only if they attend the same school. This approach is standard in the literature
using Add Health data. This allows each school (the set of teenagers and the network)
to be treated as an observation. To be coherent with the model presented in section 2, I
assume that there exists a link between two students iff they both identify the other as a
friend.
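The mutual-nomination rule can be sketched as follows. The data layout (a dictionary from each student to the set of friends he or she names) and the student labels are hypothetical and do not reflect the actual Add Health file format.

```python
def mutual_links(nominations):
    """Return the undirected links {i, j} such that i names j and j names i."""
    links = set()
    for i, named in nominations.items():
        for j in named:
            if i != j and i in nominations.get(j, ()):
                links.add(frozenset((i, j)))
    return links

# Hypothetical nominations within one school
nominations = {
    "ann": {"bob", "cy"},
    "bob": {"ann"},
    "cy": {"dee"},
    "dee": set(),
}

links = mutual_links(nominations)
# Students without any reciprocated nomination are isolated in the resulting network
isolated = {i for i in nominations if not any(i in link for link in links)}
```

Only the ann-bob nomination is reciprocated here, so cy and dee end up isolated even though cy names two friends. Requiring reciprocity therefore tends to produce a sparser network than the raw nominations.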
Table 1 summarizes the data. As noted in the previous section, the estimator does not
use any information on isolated individuals (individuals with no links), so these individuals
can be dropped from the sample. Table 2 shows descriptive statistics for the sample after
removing isolated individuals. Note that using the sub-sample of individuals with at least
one link does not affect the estimation. Using the whole sample would lead to exactly the
same likelihood value.
I estimate the model (4), using the distance function in (6). The estimated weights
(β1, ..., β9) and the corresponding standard errors are shown in Table 3. Since the weights
are only scale-identified, I report the relative effects, normalizing the total effect to one.
Then, the contribution of each characteristic (or dimension) can be interpreted as a per-
centage.
The general results are indicative of strong racial segregation in the choice of friend-
ships. The Hispanic dimension has the greatest importance (36%), followed by the Black
dimension (24%). The White (9%) and Asian (9%) dimensions have comparatively low
weights.
The effect of non-racial dimensions is small in comparison. The effect of gender dif-
ference (19%) is larger than the age difference (7% for one year). Remarkably, the model
captures heterogeneity in the effect of age: for same-sex friendships, the importance of
age is stronger. This could reflect the possibility that teenage girls may be more mature
compared to males of the same age, and hence be interested in forming friendships with
older boys.
The “work” dimension has a relatively small impact on friendship formation (1.5%).
Table 1: Descriptive Statistics
                       Mean     S.E.    Min   Max
Individuals
  Age                14.922    1.708     10    19
  Gender              0.484    0.500      0     1
  Work                0.777    0.417      0     1
  White               0.690    0.462      0     1
  Hispanic            0.147    0.354      0     1
  Black               0.131    0.337      0     1
  Asian               0.072    0.258      0     1
  Native              0.050    0.217      0     1
  Degree              1.189    1.422      0     9
Pairs (Constructed)
  Linked              0.002    0.046      0     1
  ∆(Age)              1.352    1.108      0     9
  ∆(Gender)           0.498    0.500      0     1
  ∆(Work)             0.328    0.469      0     1
  ∆(White)            0.293    0.455      0     1
  ∆(Hispanic)         0.195    0.396      0     1
  ∆(Black)            0.152    0.359      0     1
  ∆(Asian)            0.117    0.321      0     1
  ∆(Native)           0.086    0.280      0     1
Number of Schools: 122
Number of Individuals: 37 798
Number of Pairs: 10 396 915
In other words, the fact that a teenager comes from a family where both parents work, or
not, has little impact on his choice of friends. These results have important implications for
policy making, as discussed in section 5.1.
Table 3 reports the pseudo-R2 introduced by McFadden (1974), and the likelihood
ratio statistic when the model is compared to a model with no explanatory variables. The
pseudo-R2 is a percentage and indicates that the contribution of the selected variables to
structural homophily is strong.
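The two fit statistics can be cross-checked against each other. The sketch below assumes the standard definitions (likelihood ratio LR = 2(ln L − ln L0) and McFadden's pseudo-R2 = 1 − ln L / ln L0) and takes the log-likelihood and likelihood ratio reported in Table 3 as inputs. The null log-likelihood is not reported and is backed out here.

```python
def null_log_likelihood(log_lik, lr_stat):
    """Back out ln L0 from LR = 2 * (ln L - ln L0)."""
    return log_lik - lr_stat / 2.0

def mcfadden_pseudo_r2(log_lik, log_lik_null):
    """McFadden's pseudo-R2: 1 - ln L / ln L0."""
    return 1.0 - log_lik / log_lik_null

log_lik = -3128.033   # Log-Likelihood reported in Table 3
lr_stat = 9797.450    # Likelihood Ratio reported in Table 3

log_lik_null = null_log_likelihood(log_lik, lr_stat)
pseudo_r2 = mcfadden_pseudo_r2(log_lik, log_lik_null)
```

Under these definitions the implied pseudo-R2 rounds to the 0.610 reported in Table 3, so the three reported statistics are mutually consistent.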
Interestingly, the results in Table 3 are extremely close to the findings of Badev (2013).
Although he examines a very different context,35 Badev also finds that Hispanics are more
segregated than Blacks and Whites, and that racial characteristics play a significant role
35Badev (2013) is mostly interested in the impact of peer effects on smoking behavior, but also controls for the endogeneity of the network structure.
Table 2: Descriptive Statistics without Isolated Individuals
                       Mean     S.E.    Min   Max
Pairs (Constructed)
  Linked              0.011    0.104      0     1
  ∆(Age)              1.332    1.162      0     8
  ∆(Gender)           0.489    0.500      0     1
  ∆(Work)             0.333    0.471      0     1
  ∆(White)            0.236    0.425      0     1
  ∆(Hispanic)         0.704    0.457      0     1
  ∆(Black)            0.094    0.292      0     1
  ∆(Asian)            0.098    0.298      0     1
  ∆(Native)           0.083    0.276      0     1
Number of Schools: 122
Number of Pairs: 1 237 492
in friend selection.36
I now discuss the interpretation of the results in more depth. The estimates in Table
3 represent the weights of the distance function. Consider Hispanics. Their estimated
weight is the largest among the racial groups. This means that they are more distant
than the other racial groups in the type space; the relative cost for them to create a
link with a different racial group (or for other racial groups to create links with them) is
greater than for other races. However, this does not imply that Hispanics will necessarily
create fewer friendship relations with other racial groups. It also depends on their resource
constraints and on the value and shape of vi. For example, some Blacks may have fewer
links than Hispanics, even if the weight for the Black dimension is lower than the one for
the Hispanic dimension. Since κi and vi are unobserved, this could either be because they
may have a larger resource constraint, or because they may receive greater payoff from
their friendships. This shows that the estimates of Table 3 are really capturing segregation
and not, for instance, the fact that the time constraint or the valuation of friendships may
be correlated with racial groups. The estimator developed in this paper then explicitly
captures the effect of homophily while controlling for individuals’ unobserved heterogeneity
on vi, wi and κi.
Another important characteristic of the model implies that one needs to be careful when
36Badev (2013) uses the categories Hispanics-Asians-Others, Whites and Blacks.
comparing categorical and non-categorical variables. Recall that the distance between two
individuals i and j, who are identical except for their racial groups (where i is Black and j
is White), is βBlack +βWhite = 0.330. In comparison, if i and j only differ on the fact that i
is one year older than j, the distance between them is βAge = 0.074. This implies that the
importance of racial characteristics, compared to non-racial ones, is actually larger than
what it appears to be at first glance.
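The comparison can be reproduced from the systematic part of the distance function in (6), that is, (6) without the error term. The sketch below plugs in the relative weights reported in Table 3; the dictionary layout and the function name are hypothetical.

```python
# Relative weights as reported in Table 3 (sum normalized to one)
WEIGHTS = {
    "age": 0.074, "gender": 0.189, "white": 0.092, "hispanic": 0.360,
    "black": 0.238, "asian": 0.094, "native": 0.004, "work": 0.015,
    "age_gender": -0.065,
}
BINARY_DIMS = ("gender", "white", "hispanic", "black", "asian", "native", "work")

def systematic_log_distance(x_i, x_j, w=WEIGHTS):
    """Right-hand side of (6) without the error term."""
    age_gap = abs(x_i["age"] - x_j["age"])
    gender_differs = x_i["gender"] != x_j["gender"]
    total = w["age"] * age_gap
    for dim in BINARY_DIMS:
        total += w[dim] * (x_i[dim] != x_j[dim])    # indicator of a difference
    total += w["age_gender"] * age_gap * gender_differs
    return total

# Hypothetical students: identical except for the dimension being compared
base = {"age": 16, "gender": 0, "white": 1, "hispanic": 0, "black": 0,
        "asian": 0, "native": 0, "work": 1}
black_peer = dict(base, white=0, black=1)
older_peer = dict(base, age=17)
older_girl = dict(base, age=17, gender=1)

race_gap = systematic_log_distance(base, black_peer)     # 0.092 + 0.238
age_gap = systematic_log_distance(base, older_peer)      # 0.074
mixed_gap = systematic_log_distance(base, older_girl)    # 0.074 + 0.189 - 0.065
```

A difference in race switches on two indicators at once (here White and Black), whereas a one-year age gap contributes a single weight. This is why the racial weights understate the racial distance when read one at a time.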
Table 3: Relative Estimated Weights (Sum normalized to 1)†
Dimensions        Estimates     S.E.
Age                0.074∗∗     (0.005)
Gender             0.189∗∗     (0.035)
Work               0.015∗∗     (0.007)
White              0.092∗∗     (0.009)
Hispanic           0.360∗∗     (0.023)
Black              0.238∗∗     (0.030)
Asian              0.094∗∗     (0.020)
Native             0.004       (0.012)
Age-Gender        -0.065∗∗     (0.006)
Log-Likelihood: -3128.033
Pseudo-R2: 0.610
Likelihood Ratio: 9797.450
† S.E. computed using the delta method.
†† ∗∗ for 1% significance level.
††† Robust S.E. using the (sandwich) variance-covariance matrix for pseudo-m.l.e.
5.1 Robustness and Policy Implications
In this section, I discuss the robustness of the estimation and its implications for policy
making. I first compare the results of Table 3 (the baseline model) with three alterna-
tive specifications of the distance function.37 The first alternative model constrains the
estimation to racial characteristics. Results are reported in Table 4. To facilitate the
comparison with the baseline model, Table 4 also reports the estimated coefficients for the
racial characteristics in the baseline model, renormalizing the sum to 1.
37This section’s tables are reported in Appendix 5.
Estimates of the constrained model are quite similar to those of the baseline model.
This implies that the relative difference in the magnitude of segregation between racial
groups seems to be unaffected by other non-racial characteristics. Table 4 also reports
the likelihood ratio between the two models. Unsurprisingly, I strongly reject the null
hypothesis that the two models are equivalent.
Table 5 presents the second alternative model. Recall that in the baseline model, the
variable “work” is discrete and takes a value of 1 if both of the student’s parents work,
and 0 otherwise. In the alternative model presented in Table 5, I use an alternative
definition where “work” takes a value of 1 if at least one of the student’s parents works, and
0 otherwise. Results are extremely similar to those of the baseline model. However, the
estimated coefficient for the “work” variable is no longer significant. As the two models
are very similar, the likelihood ratio test cannot reject the null hypothesis. This confirms
that the socio-economic status of high school students has little influence on the choice of their
friends.
Table 6 displays results for the third alternative model. For this specification, I assume
that the error term εij follows a Gumbel distribution. This choice is convenient as
the difference of two independent Gumbel random variables has a closed-form distribution
(i.e. a logistic distribution).38 Table 6 shows that results are robust to this change in the
distribution of the errors, as the estimates are extremely similar.
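The Gumbel-logistic relationship can be checked numerically. The sketch below is not part of the estimation code: it draws pairs of independent standard Gumbel variables by inverse-transform sampling and compares the empirical c.d.f. of their difference with the standard logistic c.d.f. at a few points. Sample size, seed and evaluation points are arbitrary.

```python
import math
import random

def draw_gumbel(rng):
    """One standard Gumbel draw via the inverse c.d.f., -ln(-ln(U))."""
    u = rng.random()
    while u == 0.0:          # avoid log(0); random() never returns 1.0
        u = rng.random()
    return -math.log(-math.log(u))

def logistic_cdf(x):
    """C.d.f. of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))

rng = random.Random(2014)    # fixed seed for reproducibility
n = 100_000
diffs = [draw_gumbel(rng) - draw_gumbel(rng) for _ in range(n)]

def empirical_cdf(x):
    """Share of simulated differences that do not exceed x."""
    return sum(d <= x for d in diffs) / n

# Largest gap between the empirical and the logistic c.d.f. on a small grid
max_gap = max(abs(empirical_cdf(x) - logistic_cdf(x)) for x in (-2.0, -1.0, 0.0, 1.0, 2.0))
```

With Gumbel errors, the Ψ terms in (3) would be built from the logistic c.d.f. instead of the normal one, which is what makes this alternative specification convenient.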
In the next subsection, I compare the estimator based on structural homophily with
more standard estimators.
5.1.1 Gains with Respect to Simpler Estimators
In order to better understand the intuition of the estimator presented in the previous
sections, and to show the additional information it provides, I compare my results with
two simpler estimators, namely the probit and cell estimators.
Results for the comparison of the baseline model with a simple probit (with a school
fixed-effect and a school-clustered variance-covariance matrix) are reported in Table 7.
38As displayed in Appendix 3, the likelihood is given by the distribution of the difference of the error terms.
Estimated coefficients are substantially different, especially for Hispanics, and for the age
and gender dimensions.39
These results can be intuitively explained as follows. Let’s compare friendship choices
of Blacks and Hispanics. Looking at the data, one finds that 57% of the Blacks’ links are
with teenagers of the same age. In comparison, 46% of Hispanics’ links are with teenagers
of the same age.40 This gives an indication that, in the absence of teenagers of the same
age and the same gender, Hispanics form more links with younger (or older) teenagers
than with teenagers of the same age, but from a different racial group. This difference is
exacerbated by the fact that Hispanics form, on average, more links than Blacks. This
could be due to a difference in their resource constraints, or the benefit that they derive
from friendships.
Table 7 also reports the likelihood ratio test, which strongly rejects the null hypothesis
that the two models are equivalent.41 Not surprisingly, the probit estimator achieves a
much lower likelihood since it uses a lot less information (i.e. it does not account for the
dependence between links). Recall also, as discussed in section 4.1, that the probit assumes
more homogeneity between the individuals’ preferences (i.e. vi and wi).
The same findings hold when we compare the estimator based on structural homophily
with a simple cell estimator (that is, the ratio of same-characteristic friendship links to the
number of total friendship nominations).42 To simplify the exposition, I only consider the
model with racial characteristics. Results are reported in Table 8 for the cell estimator.
Whites and Blacks seem to be the most biased toward their own racial group, with 83.9%
and 63.3% of their links, respectively, within their own racial group. However, the cell
estimator strongly depends on the school composition, i.e. whether or not Whites (for
39Badev (2013) and Mele (2013) also find stronger racial segregation for Hispanics than for Blacks, which contradicts the results of the probit estimator.
40Specifically, I compare the number of “Black-Black” links of the same age (701) with the total number of “Black-Black” links (1,223) and the number of “Hispanic-Hispanic” links of the same age (431) with the total number of “Hispanic-Hispanic” links (936). Results hold if we look at “Black-Non Black” and “Hispanic-Non Hispanic” links.
41One should be careful when interpreting the results of this likelihood ratio test as the two models are not only non-nested, but use fundamentally different estimators. The probit model maximizes the likelihood of the random variable gij ∈ {0, 1}, while the estimator in (4) maximizes the likelihood of structural homophily, which is a function of gij.
42I thank an anonymous referee for this suggestion.
example) have access to potential non-White friends.
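The dependence of the cell estimator on school composition is easy to see in a toy example. The sketch below counts, for a given group, the share of link ends that stay within the group; this is one plausible reading of the ratio described above, and the schools, labels and function name are hypothetical.

```python
def cell_estimator(links, group, label):
    """Share of the links held by members of `label` that go to the same group."""
    same = total = 0
    for i, j in links:
        # Look at the link from both endpoints
        for a, b in ((i, j), (j, i)):
            if group[a] == label:
                total += 1
                same += group[b] == label
    return same / total if total else float("nan")

# Hypothetical school A: every student is White
group_a = {1: "White", 2: "White", 3: "White"}
links_a = [(1, 2), (2, 3)]

# Hypothetical school B: two White and two Black students
group_b = {1: "White", 2: "White", 3: "Black", 4: "Black"}
links_b = [(1, 2), (1, 3), (3, 4)]

cell_a = cell_estimator(links_a, group_a, "White")   # no other group available
cell_b = cell_estimator(links_b, group_b, "White")   # 2 of 3 link ends are same-race
```

In school A the estimator equals one whatever the underlying preferences are, simply because no non-White friend is available. The statistic therefore mixes preferences with the set of available partners.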
In contrast, the estimator presented in the previous sections uses information from
revealed preferences. To illustrate this, consider an extreme case where schools are perfectly
segregated for a particular racial group (e.g. everybody in a school is White, or nobody
is). In this extreme case, βWhite is not identified for the structural homophily estimator
since Whites never refuse non-White individuals; the cell estimator for Whites would be
equal to 1. Thus, one interpretation of the difference between the results of Table 8 and
the results of Table 4 is that Whites are more school-segregated than, for example, Blacks
or Hispanics.43
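The dependence of the cell estimator on school composition can be illustrated with a minimal sketch. It assumes that the cell estimator of a group is simply the share of that group's friendship nominations directed to members of the same group; the function name `cell_estimator` and the toy data are hypothetical and are not taken from the Add Health sample.

```python
from collections import defaultdict

def cell_estimator(nominations, group):
    """Share of each group's nominations that stay within the group.

    nominations: iterable of (sender, receiver) pairs (hypothetical input format)
    group: dict mapping each individual to a group label
    """
    total = defaultdict(int)
    same = defaultdict(int)
    for sender, receiver in nominations:
        g = group[sender]
        total[g] += 1
        if group[receiver] == g:
            same[g] += 1
    return {g: same[g] / total[g] for g in total}

# Mixed school: Whites can nominate non-Whites.
mixed_group = {1: "White", 2: "White", 3: "Black", 4: "Black"}
mixed_noms = [(1, 2), (2, 1), (1, 3), (3, 4), (4, 1)]

# Perfectly segregated school: every nomination is necessarily within group.
segregated_group = {1: "White", 2: "White", 3: "White"}
segregated_noms = [(1, 2), (2, 3), (3, 1)]

mixed = cell_estimator(mixed_noms, mixed_group)
segregated = cell_estimator(segregated_noms, segregated_group)
```

In the segregated toy school the estimator equals 1 whatever the underlying preferences are, which is the extreme case described in the text.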
5.1.2 Policy Implications
I now discuss the implications of my findings for policy making. The obvious implication
of this section's estimates is that they allow policy makers to obtain better estimates of
segregation. As Table 7 shows, a simple probit estimation would, in this case, drastically
underestimate the degree of segregation for Hispanics. Knowing which racial groups are the
most segregated is then highly important in order to target policies toward those groups.
It has been demonstrated that the socio-economic status of an individual is correlated
with academic achievement (e.g. the probability of completing high school and probability
of going to college). It has also been documented that the presence of “good” peers (with
educated/high-income parents) has a positive influence on academic achievement. Hence,
peers can act as a substitute for a teenager’s private endowment.44 However, a key problem
is that in most cases, the policy maker cannot directly “choose” an individual’s peers.
The results of Table 3 tell us that friendships are weakly dependent on socio-economic
status. This is a strong argument in favour of having teenagers of different socio-economic
backgrounds attending the same school. Since socio-economic factors are mostly irrelevant
in the choice of friendships, it is likely that friendships between “poor” and “rich” students
will form.
43 Indeed, looking at the data, about half of the White students in the database study in schools with a ratio of 80% of Whites in the school or more.
44 See for example De Giorgi et al. (2010) and Bifulco et al. (2011).
The implications are opposite for race. One of the profound concerns of policy makers
is the strong racial segregation within schools.45 Since the results from Table 3 represent
preference parameters, they indicate that simply putting students from different races in
the same school will not have the anticipated peer effects. This is because the equilibrium
friendship network is likely to be highly racially segregated.
I now conclude by discussing the limitations of my approach and suggesting some
potential extensions.
6 Going Further
I have shown that structural homophily can be obtained by a game of network formation.
Under discreteness or (strict) convexity, any bilateral equilibrium of the game features
structural homophily. I have also shown that structural homophily has empirical impli-
cations. I have developed an estimation technique that can be used to estimate some
parameters of the model, namely the weights of the distance function. This method can
then identify which social characteristics significantly influence the network formation pro-
cess. Being able to estimate the magnitude of these relevant characteristics is an important
step in the process of designing efficient policies, as it allows policy makers to target rel-
evant characteristics. To illustrate this method, I estimated the weights of the distance
function in the context of friendship networks for teenagers. I find a strong effect of racial
variables as compared to other individual variables such as age, gender and socio-economic
status.
The model developed in this paper is a first step toward a better understanding of
network formation processes under time constraints. However, there are still many unan-
swered questions. For instance, the results in section 3 are based on the discreteness or
convexity assumption. Those are arguably strong assumptions as they imply that individ-
uals invest as much as they can in their existing links. This may not be true in general.
45 See for example the report “Segregation and Exposure to High-Poverty Schools in Large Metropolitan Areas: 2008-09”, by Nancy McArdle, Theresa Osypuk, and Dolores Acevedo-García, available online at http://diversitydata.sph.harvard.edu/Publications/school_segregation_report.pdf
However, the study of the model under a concavity assumption faces difficult existence
issues. One could address this issue by imposing additional assumptions on the shape of $v$
(such as $v_{xy}(x, y, d) > 0$) or by considering weaker solution concepts such as pairwise stability
(Jackson and Wolinsky, 1996), which potentially exhibit less structured equilibrium
networks.
Another potential extension would be to model the probability that individuals meet.
Without this, the set of potential friends for every individual is the whole population.
However, in a large population, some individuals may never meet, which would obviously
prevent them from creating a link. A simple way to address this would be to
assume that the set of potential friends is limited to individuals that have “met.” Hence,
individuals can only invest resources in links with individuals in a subset of the population.
In this case, the (ex-post) strategy space would not be the same for every individual, but
structural homophily would still hold in equilibrium (but the estimation would require the
observation of the population subsets). More elaborate models could assume that meeting
friends is a costly process. Individuals would then be allowed to endogenously choose
the amount of resources they spend searching for friends.46 As the estimation technique
does not require the observation of the time constraints, structural homophily is likely to
hold in equilibrium. However, in both extensions, the estimated parameters may not be
interpreted in terms of preferences. If homophily affects the preferences and the random
meeting process, it is unclear how those two effects can be identified.
Department of Economics, Universite Laval, Canada.
46 A nice example of a search model with homophilic preferences is Currarini et al. (2009).
Appendix
Appendix 1
Proof of lemma 3.1
Let $x^*$ be some NE, and suppose that $(i, j)$ is a deviating pair in the sense of a WBE. Let
$(x_i, x_j)$ be some joint deviation for $(i, j)$. We need to show that $x_i^j > x_i^{j*}$ and $x_j^i > x_j^{i*}$.
Since $(x_i, x_j)$ is a profitable deviation (in the sense of a WBE), we have
$$
\begin{aligned}
u_i(x_i, x_j, x^*_{-i-j}) &> u_i(x^*) \\
u_j(x_i, x_j, x^*_{-i-j}) &> u_j(x^*)
\end{aligned}
\tag{7}
$$
Since $x^*$ is a NE, we have
$$
\begin{aligned}
u_i(x_i, x^*_{-i}) &\le u_i(x^*) \\
u_j(x_j, x^*_{-j}) &\le u_j(x^*)
\end{aligned}
\tag{8}
$$
for all $x_i$ and $x_j$. In particular, condition (8) holds at the deviation strategies $x_i$ and $x_j$ themselves.
Putting conditions (7) and (8) together, we have $u_i(x_i, x_j, x^*_{-i-j}) > u_i(x_i, x^*_{-i})$ and
$u_j(x_i, x_j, x^*_{-i-j}) > u_j(x_j, x^*_{-j})$. Since the utility function is linear in the links, this is
equivalent to $v_i(x_i^j, x_j^i, d_{ij}) > v_i(x_i^j, x_j^{i*}, d_{ij})$ and $v_j(x_j^i, x_i^j, d_{ij}) > v_j(x_j^i, x_i^{j*}, d_{ij})$. The
production functions are strictly increasing in the second argument, so we must have $x_i^j > x_i^{j*}$
and $x_j^i > x_j^{i*}$. (If $x_i^{j*} = x_j^{i*} = 0$, we have $v_i(x_i^j, x_j^i, d_{ij}) > 0$ and $v_j(x_j^i, x_i^j, d_{ij}) > 0$, and the
result is straightforward.) □
Proof of theorem 3.2
First, we show that $x$ produced by the assignment algorithm (see Appendix 2) is a NE.
By construction, we have $v_i(\xi, \xi, d_{ij}) \ge 0$ and $w_i(\xi) \ge 0$, hence removing a link is never
profitable. Now, the only link that an individual can unilaterally create is the individual
link. Suppose that it is profitable to do so for $i \in N$. Then either [$\delta_i < \kappa_i$ and $w_i(\xi) > 0$],
or [$\delta_i = \kappa_i$ and $w_i(\xi) > \min_{j \in g_i} v_i(\xi, \xi, d_{ij})$]. By construction, both are impossible.
Now, suppose that $x$ is a NE, but not a WBE. That is, there exist $i, j \in N$ such that
$j \notin g_i$ (from lemma 3.1, since $x_i^j \in \{0, \xi\}$) who want to deviate, i.e. create a link between
them. There are 2 cases:
1. $\delta_i = \kappa_i$. Then, $i$ needs to remove a link in order to create a new link. (Since $x$ is a
NE, he won't remove more than one link.) This implies that there exists $k \in g_i$
such that $v_i(\xi, \xi, d_{ij}) > v_i(\xi, \xi, d_{ik}) \ge 0$. This implies that $d_{ij} < d_{ik}$.
We now turn to $j$. If $\delta_j = \kappa_j$, the same argument applies for $j$; then $v_j(\xi, \xi, d_{ij}) >
v_j(\xi, \xi, d_{jl})$ for some $l \in g_j$ (and $v_i(\xi, \xi, d_{ij}) > v_i(\xi, \xi, d_{ik})$). Since we have $d_{ij} < d_{ik}$ and
$d_{ij} < d_{jl}$, this contradicts the fact that $x$ was created by the assignment algorithm.
If $\delta_j < \kappa_j$, $j$ has at least $\xi$ to invest. Together with the fact that $d_{ij} < d_{ik}$, this
contradicts the fact that $x$ is produced by the assignment algorithm.
2. $\delta_i < \kappa_i$ and $\delta_j < \kappa_j$. This is impossible since, from the assignment algorithm, it
implies that $v_i(\xi, \xi, d_{ij}) < 0$ or $v_j(\xi, \xi, d_{ij}) < 0$.
□
Proof of theorem 3.3
We need to show that the allocation $x \in X$, which is produced by the assignment algorithm
(see Appendix 2), is a WBE of $\Gamma$.
We first show that $x$ is a NE. Suppose that it is not; that is, there exists $i \in N$ such
that $x_i$ is not individually rational. Since for any $i, j \in N$ we have $x_i^j \in \{0, \xi\}$, this means
that $i$ wants to create an additional link. (Unilaterally reducing the investment in a link
necessarily lowers $i$'s payoff.) The only link that $i$ can create on his own is the individual
link. There are two cases:
1. $x_i^i = 0$ and $\delta_i < \kappa_i$. Then, by construction from the assignment algorithm, this implies
that $w_i(\xi) < 0$. So $i$ has no individual profitable deviation, since $w_i(x_i^j) < w_i(\xi)$.
2. $x_i^i = 0$ and $\delta_i = \kappa_i$. Then, if $i$ has a profitable deviation, there exists $J \subseteq g_i$ such
that $w_i(\sum_{j \in J} \varepsilon_j) > \sum_{j \in J} \{v_i(\xi, \xi, d_{ij}) - v_i(\xi - \varepsilon_j, \xi, d_{ij})\}$. That is, $i$ is reducing his
investments in links in $J$ in order to invest in his individual link. Let $d^* = \max_{j \in J} d_{ij}$;
we have
$$
\begin{aligned}
w_i\Big(\sum_{j \in J} \varepsilon_j\Big) &> \sum_{j \in J} \{v_i(\xi, \xi, d_{ij}) - v_i(\xi - \varepsilon_j, \xi, d_{ij})\} \\
&\ge \sum_{j \in J} \{v_i(\xi, \xi, d^*) - v_i(\xi - \varepsilon_j, \xi, d^*)\} && (9) \\
&\ge v_i(\xi, \xi, d^*) - v_i\Big(\xi - \sum_{j \in J} \varepsilon_j, \xi, d^*\Big) && (10)
\end{aligned}
$$
where (9) follows from $v_{xd}(x, \xi, d) \le 0$, and (10) follows from $v_{xx}(x, \xi, d) \ge 0$. Now,
since $v_{xx}(x, \xi, d) \ge 0$, if (10) is true for $\sum_{j \in J} \varepsilon_j < \xi$, it is also true for $\sum_{j \in J} \varepsilon_j = \xi$,
hence $w_i(\xi) > v_i(\xi, \xi, d^*)$. This contradicts the fact that $x$ was created by the
assignment algorithm.
We still need to show that $x$ is a WBE. Suppose that it is not, i.e. there exist $(i, j)$
and $(x_i, x_j)$ such that $u_i(x_i, x_j, x_{-i-j}) > u_i(x)$ and $u_j(x_j, x_i, x_{-i-j}) > u_j(x)$. From the
construction of $x$, it must be the case that $i, j$ are such that $x_i^j = x_j^i = 0$. Again, we have
2 cases:
1. $\delta_i < \kappa_i$ and $\delta_j < \kappa_j$. This is impossible since, from the assignment algorithm, it
implies that $v_i(\xi, \xi, d_{ij}) < 0$.
2. $\delta_i = \kappa_i$. Then, if $i$ has a profitable deviation, there exists $K \subseteq g_i$ such that
$v_i(\sum_{k \in K} \varepsilon_k, x_j^i, d_{ij}) > \sum_{k \in K} \{v_i(\xi, \xi, d_{ik}) - v_i(\xi - \varepsilon_k, \xi, d_{ik})\}$. Let $d_i^* = \max_{k \in K} d_{ik}$.
Then, we have
$$
\begin{aligned}
v_i\Big(\sum_{k \in K} \varepsilon_k, x_j^i, d_{ij}\Big) &> \sum_{k \in K} \{v_i(\xi, \xi, d_{ik}) - v_i(\xi - \varepsilon_k, \xi, d_{ik})\} \\
&\ge \sum_{k \in K} \{v_i(\xi, \xi, d_i^*) - v_i(\xi - \varepsilon_k, \xi, d_i^*)\} && (11) \\
&\ge v_i(\xi, \xi, d_i^*) - v_i\Big(\xi - \sum_{k \in K} \varepsilon_k, \xi, d_i^*\Big) && (12)
\end{aligned}
$$
where (11) follows from $v_{xd}(x, \xi, d) \le 0$, and (12) follows from $v_{xx}(x, \xi, d) \ge 0$. Now,
since $v_{xx}(x, \xi, d) \ge 0$, if (12) is true for $\sum_{k \in K} \varepsilon_k < \xi$, it is also true for $\sum_{k \in K} \varepsilon_k = \xi$,
hence $v_i(\xi, x_j^i, d_{ij}) > v_i(\xi, \xi, d_i^*)$.
We now turn to $j$. If $\delta_j = \kappa_j$, the same argument applies for $j$; then $v_j(\xi, \xi, d_{ij}) >
v_j(\xi, \xi, d_j^*)$ (and $v_i(\xi, \xi, d_{ij}) > v_i(\xi, \xi, d_i^*)$). Since we have $d_{ij} < d_i^*$ and $d_{ij} < d_j^*$, this
contradicts the fact that $x$ was created by the assignment algorithm.
If $\delta_j < \kappa_j$, $j$ has at least $\xi$ to invest (and it is profitable to invest up to $\xi$ since
$v_x(x, y, d) > 0$); together with the fact that $d_{ij} < d_i^*$, this contradicts the fact
that $x$ is produced by the assignment algorithm.
□
Proof of proposition 3.4
From theorem 3.3, it is sufficient to show that for any $i, j \in N$, $x_i^j \in \{0, \xi\}$ at any NE.
Consider some $i, j \in N$, and suppose that $x_i^j \in (0, \xi)$. I show that this implies that
there exists $k \in N$ such that $x_i^k \in (0, \xi)$. Suppose otherwise. Then, $i$ still has resources
available. Since $v_x(x, y, d) > 0$, $i$ could increase $x_i^j$ and be better off. Hence, $x$ is not a
NE, so it is not a WBE. Hence, there exists $k \in N \setminus \{j\}$ such that $x_i^k \in (0, \xi)$. There are 2
cases:
1. [$k = i$]. Since $x$ is a NE, we must have the following.
• If $x_i^i + x_i^j \ge \xi$, then
$$
\begin{aligned}
w_i(x_i^i) + v_i(x_i^j, x_j^i, d_{ij}) &\ge w_i(\xi) + v_i(x_i^j + x_i^i - \xi, x_j^i, d_{ij}) \\
w_i(x_i^i) + v_i(x_i^j, x_j^i, d_{ij}) &\ge w_i(x_i^j + x_i^i - \xi) + v_i(\xi, x_j^i, d_{ij})
\end{aligned}
$$
Rewriting, we have
$$
\begin{aligned}
w_i(\xi) - w_i(x_i^i) &\le v_i(x_i^j, x_j^i, d_{ij}) - v_i(x_i^j + x_i^i - \xi, x_j^i, d_{ij}) \\
w_i(x_i^i) - w_i(x_i^j + x_i^i - \xi) &\ge v_i(\xi, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij})
\end{aligned}
$$
Since $v_{xx}(x, y, d) > 0$, we have $v_i(\xi, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij}) > v_i(x_i^j, x_j^i, d_{ij}) -
v_i(x_i^j + x_i^i - \xi, x_j^i, d_{ij})$, and since $w''(x) > 0$, we have $w_i(\xi) - w_i(x_i^i) > w_i(x_i^i) -
w_i(x_i^j + x_i^i - \xi)$. This is in contradiction with the above conditions. Hence, $x$ is
not a NE.
• If $x_i^i + x_i^j < \xi$, then
$$
\begin{aligned}
w_i(x_i^i) + v_i(x_i^j, x_j^i, d_{ij}) &\ge w_i(x_i^i + x_i^j) + v_i(0, x_j^i, d_{ij}) \\
w_i(x_i^i) + v_i(x_i^j, x_j^i, d_{ij}) &\ge w_i(0) + v_i(x_i^i + x_i^j, x_j^i, d_{ij})
\end{aligned}
$$
Rewriting, we have
$$
\begin{aligned}
w_i(x_i^i + x_i^j) - w_i(x_i^i) &\le v_i(x_i^j, x_j^i, d_{ij}) - v_i(0, x_j^i, d_{ij}) \\
w_i(x_i^i) - w_i(0) &\ge v_i(x_i^i + x_i^j, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij})
\end{aligned}
$$
Since $v_{xx}(x, y, d) > 0$, we have $v_i(x_i^j + x_i^i, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij}) > v_i(x_i^j, x_j^i, d_{ij}) -
v_i(0, x_j^i, d_{ij})$, and since $w''(x) > 0$, we have $w_i(x_i^j + x_i^i) - w_i(x_i^i) > w_i(x_i^i) - w_i(0)$.
Again, this is in contradiction with the above conditions. Hence, $x$ is not a NE.
2. [$k \neq i$ and $k \neq j$]. Since $x$ is a NE, we must have the following:
• If $x_i^k + x_i^j \ge \xi$, then
$$
\begin{aligned}
v_i(x_i^k, x_k^i, d_{ik}) + v_i(x_i^j, x_j^i, d_{ij}) &\ge v_i(\xi, x_k^i, d_{ik}) + v_i(x_i^j + x_i^k - \xi, x_j^i, d_{ij}) \\
v_i(x_i^k, x_k^i, d_{ik}) + v_i(x_i^j, x_j^i, d_{ij}) &\ge v_i(x_i^j + x_i^k - \xi, x_k^i, d_{ik}) + v_i(\xi, x_j^i, d_{ij})
\end{aligned}
$$
Rewriting, we have
$$
\begin{aligned}
v_i(\xi, x_k^i, d_{ik}) - v_i(x_i^k, x_k^i, d_{ik}) &\le v_i(x_i^j, x_j^i, d_{ij}) - v_i(x_i^j + x_i^k - \xi, x_j^i, d_{ij}) \\
v_i(x_i^k, x_k^i, d_{ik}) - v_i(x_i^j + x_i^k - \xi, x_k^i, d_{ik}) &\ge v_i(\xi, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij})
\end{aligned}
$$
Since $v_{xx}(x, y, d) > 0$, we have $v_i(\xi, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij}) > v_i(x_i^j, x_j^i, d_{ij}) -
v_i(x_i^j + x_i^k - \xi, x_j^i, d_{ij})$, and $v_i(\xi, x_k^i, d_{ik}) - v_i(x_i^k, x_k^i, d_{ik}) > v_i(x_i^k, x_k^i, d_{ik}) - v_i(x_i^j +
x_i^k - \xi, x_k^i, d_{ik})$. This is in contradiction with the above conditions. Hence, $x$ is
not a NE.
• If $x_i^k + x_i^j < \xi$, then
$$
\begin{aligned}
v_i(x_i^k, x_k^i, d_{ik}) + v_i(x_i^j, x_j^i, d_{ij}) &\ge v_i(x_i^j + x_i^k, x_k^i, d_{ik}) + v_i(0, x_j^i, d_{ij}) \\
v_i(x_i^k, x_k^i, d_{ik}) + v_i(x_i^j, x_j^i, d_{ij}) &\ge v_i(0, x_k^i, d_{ik}) + v_i(x_i^j + x_i^k, x_j^i, d_{ij})
\end{aligned}
$$
Rewriting, we have
$$
\begin{aligned}
v_i(x_i^j + x_i^k, x_k^i, d_{ik}) - v_i(x_i^k, x_k^i, d_{ik}) &\le v_i(x_i^j, x_j^i, d_{ij}) - v_i(0, x_j^i, d_{ij}) \\
v_i(x_i^k, x_k^i, d_{ik}) - v_i(0, x_k^i, d_{ik}) &\ge v_i(x_i^j + x_i^k, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij})
\end{aligned}
$$
Since $v_{xx}(x, y, d) > 0$, we have $v_i(x_i^j + x_i^k, x_j^i, d_{ij}) - v_i(x_i^j, x_j^i, d_{ij}) > v_i(x_i^j, x_j^i, d_{ij}) -
v_i(0, x_j^i, d_{ij})$, and $v_i(x_i^j + x_i^k, x_k^i, d_{ik}) - v_i(x_i^k, x_k^i, d_{ik}) > v_i(x_i^k, x_k^i, d_{ik}) - v_i(0, x_k^i, d_{ik})$.
This is in contradiction with the above conditions. Hence, $x$ is not a NE.
□
Proof of proposition 3.5
The proof is obvious from the proof of theorems 3.2 and 3.3. One only has to remark that
for any $i, j, k \in N$, $v_i(\xi, \xi, d_{ij}) \ge v_i(\xi, \xi, d_{ik})$ implies that $v_i(\xi, \xi, d_{ij}) > v_i(\xi, \xi, d_{ik})$ if we
assume that $d_{ij} \neq d_{kl}$. □
Proof of proposition 3.7
The fact that any strong NE needs to be produced by the assignment algorithm follows
from propositions 3.2 and 3.4. Suppose that $x^* \in X$ is a BE, but not a strong NE. There
exist $S \subset N$ and $x_S \in \times_{i \in S} X_i$ such that $u_i(x_S, x^*_{-S}) > u_i(x^*)$ for all $i \in S$. We will
show that under strict convexity or discreteness, this implies that there exists a bilateral
deviation.
Under discreteness, $x_i \in \{0, \xi\}^n$ for all $i \in S$. Using the same argument as the one used
in lemma 3.1, there exists at least one project created under a deviation by coalition $S$.
That is, $\exists i, j \in S$ such that $x_i^{j*} = x_j^{i*} = 0$ and $x_i^j = x_j^i = \xi$. Since the utility functions are
additive, this implies that $i, j$ have a profitable bilateral deviation. Resources invested in
the link $(i, j)$ must have come either from unused resources or from the deletion of another
link, since $x_i^j \in \{0, \xi\}$ for all $i, j \in N$.
Under convexity, if it is profitable to withdraw resources from one link and invest in
two new links, it is even better to invest in only one of these links. (This is exactly the
argument used in proposition 3.3.) Specifically, suppose that there exist $i, j, k \in S$ such
that $x_i^j, x_i^k > 0$, and $x_i^{j*} = x_i^{k*} = 0$. Then, either $x_i^j = \xi$ and $x_i^k = 0$ or $x_i^j = 0$ and $x_i^k = \xi$
is better for $i$. Then, $i$ is willing to make a bilateral deviation with $j$ (wlog). Since the
utilities are linear, it is also profitable for $k$ (since it is under a joint deviation in $S$). Hence,
there exists a bilateral deviation between $i$ and $j$. □
Appendix 2
The Assignment Algorithm
I generate a network $g$ (represented by the adjacency matrix $A$) in which every individual
invests as much as possible in every active link (i.e. $x_i^j \in \{0, \xi_i\}$ for all $i, j \in N$). Before
presenting the formal algorithm, I discuss the intuition.
The algorithm starts with the empty network and proceeds by first linking the individ-
uals with the smallest distance (say i, j ∈ N), provided that:
1. The link between i and j leads to positive payoff for both individuals (otherwise, the
link between i and j is set to 0).
2. The link between i and j is better than the individual link for both individuals
(otherwise, the individual links are created).
3. Both individuals still have resources left (otherwise all remaining links for the indi-
vidual who has reached his budget constraint are set to 0).
I now proceed to the formal description of the algorithm.
Let $\eta_i^j = v_i(\xi, \xi, d_{ij})$ for all $i, j \in N$ such that $i \neq j$, and $\eta_i^i = w_i(\xi)$ for all $i \in N$.
This function represents the value of a link between two individuals. Now, define the (not
necessarily unique) ordered list $L^0$ as follows: $L^0 = (d_{ij})_{i,j \in N : i<j}$, such that $L^0_1 \le L^0_2 \le
\dots \le L^0_m$. The list $L^0$ is an ordered list of distance values, for all pairs of individuals. The
number of elements in $L^0$ is the number of possible pairings between individuals in $N$, i.e.
$n(n - 1)/2$. Let $L^0_l$ be the element of position $l$ in the list $L^0$. I denote $(L^0_l)^{-1} = (i, j)$ if
$L^0_l = d_{ij}$.
The algorithm computes $g$ and takes $L^t = L^0$ and $A = 0$ as inputs. It operates in two
steps.
Step 1. Take the first element of the list $L^t$, i.e. $L^t_1$. Let $L^t_1 = d_{ij}$.
If $a_{ii} = 0$ or $a_{jj} = 0$,
1. If $\eta_i^i \ge \eta_i^j$ and $\eta_i^i \ge 0$, then $a_{ii} = 1$.
2. If $\eta_j^j \ge \eta_j^i$ and $\eta_j^j \ge 0$, then $a_{jj} = 1$.
Otherwise,
1. If $\eta_i^j \ge 0$ and $\eta_j^i \ge 0$, then set $a_{ij} = a_{ji} = 1$.
2. If $\eta_i^j < 0$, then generate $L^{i*} = L^t \setminus \{d_{ik}\}_{k \in N : d_{ik} \in L^t}$. (That is, remove all distances
associated with $i$, since all the following distances will be greater than $d_{ij}$.)
3. If $\eta_j^i < 0$, then generate $L^{j*} = L^t \setminus \{d_{jk}\}_{k \in N : d_{jk} \in L^t}$, i.e. do the same for $j$ as we did
for $i$.
Generate $L^{t+1} = \{(d \in L^{i*} \cap L^{j*}) \setminus d_{ij}\}$.
Step 2. Repeat Step 1 for $t = 1, \dots$ until $|L^t| = 0$ or until $\exists i \in N$ such that $\delta_i = \kappa_i$.
For all $i \in N$ such that $\delta_i = \kappa_i$, generate $L^{i*} = L^t \setminus \{d_{ik}\}_{k \in N : d_{ik} \in L^t}$. (That is, remove all
distances associated with $i$, since he has no resources left.) Then, generate $L^{t+1} = \cap_{i \in N} L^{i*}$
and repeat Step 1.
After the algorithm stops, I generate the allocation $x$ as follows. For all $i, j \in N$, if
$a_{ij} = 1$, $x_i^j = \xi$, otherwise $x_i^j = 0$. Notice that by definition $x \in X$.
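The intuition behind the algorithm can be sketched in a few lines of Python. This is a simplified illustration rather than the formal procedure above: it ignores individual links (the $a_{ii}$ entries), treats the capacity $\kappa_i$ as a maximum number of links, and takes a hypothetical function `value(i, d)` standing in for $v_i(\xi, \xi, d)$. The name `assign_links` and all inputs are assumptions made for the example.

```python
def assign_links(dist, capacity, value):
    """Greedy sketch: link pairs in increasing order of distance.

    dist: dict {(i, j): d_ij} with i < j
    capacity: dict {i: kappa_i}, read here as a maximum number of links
    value: hypothetical stand-in for v_i(xi, xi, d), called as value(i, d)
    """
    degree = {i: 0 for i in capacity}
    links = set()
    # Visit pairs from the smallest to the largest distance.
    for (i, j), d in sorted(dist.items(), key=lambda item: item[1]):
        # Skip the pair if either individual has exhausted his capacity.
        if degree[i] >= capacity[i] or degree[j] >= capacity[j]:
            continue
        # Skip the pair if the link has a negative payoff for either side.
        if value(i, d) < 0 or value(j, d) < 0:
            continue
        links.add((i, j))
        degree[i] += 1
        degree[j] += 1
    return links

# Four individuals on a line, one link each.
positions = [0, 1, 3, 7]
dist = {(i, j): abs(positions[i] - positions[j])
        for i in range(4) for j in range(i + 1, 4)}
capacity = {i: 1 for i in range(4)}
network = assign_links(dist, capacity, lambda i, d: 5 - d)
```

In this toy example the closest pair is linked first, and the remaining two individuals are linked to each other because both still have capacity and the link has a positive value for each of them.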
Appendix 3
The Likelihood Function
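I assume that no individual is isolated. The definition of structural homophily is: For all
$ij \notin g$, $d_{ij} \ge d_{ik}$ for all $k \in g_i$ or $d_{ij} \ge d_{jk}$ for all $k \in g_j$. Then, since the $\varepsilon_{ij}$ are independent,
and $\ln(d) \ge \ln(d')$ iff $d \ge d'$, the probability that $g$ exhibits structural homophily is
$$
\prod_{ij \notin g} \Big\{ \prod_{k \in g_i} P(d_{ij} \ge d_{ik}) + \prod_{k \in g_j} P(d_{ij} \ge d_{jk}) - \prod_{k \in g_i} P(d_{ij} \ge d_{ik}) \prod_{k \in g_j} P(d_{ij} \ge d_{jk}) \Big\}
$$
This gives:
$$
P(d_{ij} \ge d_{ik}) = P\Big( \sum_{r=1}^{R} \beta_r \rho_r(\theta_i, \theta_j) + \varepsilon_{ij} \ge \sum_{r=1}^{R} \beta_r \rho_r(\theta_i, \theta_k) + \varepsilon_{ik} \Big)
$$
At this point, the normalization of $\varepsilon$ is necessary for the identification of $\beta$. Simplifying
the last expression, we have:
$$
\begin{aligned}
P(d_{ij} \ge d_{ik}) &= P\Big( Z \ge \sum_{r=1}^{R} \beta_r [\rho_r(\theta_i, \theta_k) - \rho_r(\theta_i, \theta_j)] \Big) \\
&= 1 - \Phi\Big( \sum_{r=1}^{R} \beta_r [\rho_r(\theta_i, \theta_k) - \rho_r(\theta_i, \theta_j)] \Big)
\end{aligned}
$$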
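As an illustration, the log of the expression above could be evaluated as in the following sketch. It only covers the evaluation step, not the estimation routine used in the paper. The function names, the representation of the network as a set of `frozenset` pairs, and the callable `rho(i, j)` returning the $R$ component distances are all assumptions made for the example.

```python
import math

def norm_cdf(z):
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_farther(beta, rho_ij, rho_ik):
    """P(d_ij >= d_ik) = 1 - Phi(sum_r beta_r [rho_r(i,k) - rho_r(i,j)])."""
    index = sum(b * (rk - rj) for b, rj, rk in zip(beta, rho_ij, rho_ik))
    return 1.0 - norm_cdf(index)

def log_likelihood(beta, nodes, links, rho):
    """Log-probability that the network exhibits structural homophily.

    nodes: list of individuals; links: set of frozenset({i, j}) pairs
    rho: hypothetical callable, rho(i, j) -> list of R component distances
    Assumes, as in the text, that no individual is isolated.
    """
    neighbors = {i: [k for k in nodes if k != i and frozenset((i, k)) in links]
                 for i in nodes}
    total = 0.0
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            i, j = nodes[a], nodes[b]
            if frozenset((i, j)) in links:
                continue  # only unlinked pairs enter the product
            p_i = math.prod(prob_farther(beta, rho(i, j), rho(i, k))
                            for k in neighbors[i])
            p_j = math.prod(prob_farther(beta, rho(j, i), rho(j, k))
                            for k in neighbors[j])
            total += math.log(p_i + p_j - p_i * p_j)
    return total

# Toy example: two tight pairs located far from each other on a line.
nodes = [0, 1, 2, 3]
links = {frozenset((0, 1)), frozenset((2, 3))}
position = {0: 0.0, 1: 1.0, 2: 10.0, 3: 11.0}
rho = lambda i, j: [abs(position[i] - position[j])]
ll_zero = log_likelihood([0.0], nodes, links, rho)
ll_one = log_likelihood([1.0], nodes, links, rho)
```

With a zero weight every comparison has probability one half, so each unlinked pair contributes $\ln(0.75)$; a positive weight on the only characteristic fits this toy network better.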
Appendix 4
The economy is composed of 100 populations, each containing 20 individuals. There are
four individual variables defined as follows:
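1. $\theta^1 \sim N(0, 4)$
2. $\theta^2 \sim \mathrm{Bernoulli}(0.2)$
3. $\theta^3 \sim \mathrm{Bernoulli}(0.5)$
4. $\theta^4 = c\theta^1 + (1 - c)\theta$, where $\theta \sim N(0, 4)$ and $c \in (0, 1)$
The distance function is defined as: $d(\theta_i, \theta_j) = 2|\theta_i^1 - \theta_j^1| + 6 I\{\theta_i^2 = \theta_j^2\} + 3 I\{\theta_i^3 = \theta_j^3\} + 4|\theta_i^4 - \theta_j^4| + \varepsilon_{ij}$,
where $\varepsilon_{ij} \sim_{iid} N(0, 1)$.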
The model is estimated using only the first three characteristics, and varying c ∈
{0, 0.2, 0.4, 0.6, 0.8, 1}. The implied correlation coefficients are {0, 0.16, 0.27, 0.35, 0.41, 0.45}.
I simulated 1,000 replications for each value of c for a total of 6,000 replications. Tables
6-10 display the results.
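The data-generating process of one simulated population could be sketched as follows. Several points are assumptions: $N(0, 4)$ is read as a variance of 4, the indicator terms are coded exactly as printed (equal to one when the two values coincide), and the function names are hypothetical. The last helper normalizes the three observed weights to sum to one, which appears to reproduce the "true values" quoted in the figure captions.

```python
import random
from itertools import combinations

WEIGHTS = (2.0, 6.0, 3.0, 4.0)  # coefficients of the distance function in the text

def simulate_population(n=20, c=0.4, seed=0):
    """Draw the four individual variables for one population of size n."""
    rng = random.Random(seed)
    sd = 2.0  # ASSUMPTION: N(0, 4) denotes variance 4, i.e. standard deviation 2
    population = []
    for _ in range(n):
        theta1 = rng.gauss(0.0, sd)
        theta2 = 1 if rng.random() < 0.2 else 0
        theta3 = 1 if rng.random() < 0.5 else 0
        theta4 = c * theta1 + (1.0 - c) * rng.gauss(0.0, sd)
        population.append((theta1, theta2, theta3, theta4))
    return population

def pairwise_distances(population, seed=0):
    """Distance for every pair i < j, with an iid N(0, 1) shock per pair."""
    rng = random.Random(seed)
    dist = {}
    for i, j in combinations(range(len(population)), 2):
        a, b = population[i], population[j]
        # Indicators follow the formula as printed (1 when values coincide);
        # flip the comparison if a mismatch indicator is intended.
        dist[(i, j)] = (WEIGHTS[0] * abs(a[0] - b[0])
                        + WEIGHTS[1] * (a[1] == b[1])
                        + WEIGHTS[2] * (a[2] == b[2])
                        + WEIGHTS[3] * abs(a[3] - b[3])
                        + rng.gauss(0.0, 1.0))
    return dist

def normalized_weights(observed=WEIGHTS[:3]):
    """Weights of the observed characteristics, normalized to sum to one."""
    total = sum(observed)
    return [w / total for w in observed]

population = simulate_population(n=20, c=0.4, seed=1)
distances = pairwise_distances(population, seed=1)
true_values = normalized_weights()
```

The normalized weights are 2/11, 6/11 and 3/11, close to the values 0.18, 0.54 and 0.27 reported under Figures 6 to 8.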
Figure 6: First Observed Characteristic (Dependent)
Average estimate and 95% CI. True value is 0.18
Figure 7: Second Observed Characteristic
Average estimate and 95% CI. True value is 0.54
Figure 8: Third Observed Characteristic
Average estimate and 95% CI. True value is 0.27
Appendix 5
Table 4: Robustness: Baseline vs Only Race (Sum normalized to 1)†

              Baseline Model          Alternative Model
Dimensions    Estimates    S.E.       Estimates    S.E.
White          0.107∗∗    (0.010)      0.117∗∗    (0.011)
Hispanic       0.482∗∗    (0.024)      0.458∗∗    (0.021)
Black          0.282∗∗    (0.027)      0.302∗∗    (0.029)
Asian          0.114∗∗    (0.024)      0.119∗∗    (0.025)
Native         0.015      (0.015)      0.005      (0.016)
Log-Likelihood of the Baseline Model: -3128.033
Log-Likelihood of the Alternative Model: -3500.414
Likelihood Ratio: 744.761

† S.E computed using the delta method.
†† ∗∗ for 1% significance level.
††† Robust SE using the (sandwich) variance-covariance matrix for pseudo-m.l.e.
Table 5: Robustness: Baseline vs Alt. Socio-Economic Var. (Sum normalized to 1)†

              Baseline Model          Alternative Model
Dimensions    Estimates    S.E.       Estimates    S.E.
Age            0.074∗∗    (0.005)      0.075∗∗    (0.006)
Gender         0.189∗∗    (0.035)      0.187∗∗    (0.036)
Work           0.015∗∗    (0.007)      0.019      (0.028)
White          0.092∗∗    (0.009)      0.090∗∗    (0.009)
Hispanic       0.360∗∗    (0.023)      0.360∗∗    (0.026)
Black          0.238∗∗    (0.030)      0.239∗∗    (0.031)
Asian          0.094∗∗    (0.020)      0.094∗∗    (0.021)
Native         0.004      (0.012)      0.001      (0.013)
Age-Gender    -0.065∗∗    (0.006)     -0.064∗∗    (0.006)
Log-Likelihood of the Baseline Model: -3128.033
Log-Likelihood of the Alternative Model: -3129.819
Likelihood Ratio: 3.573

† S.E computed using the delta method.
†† ∗∗ for 1% significance level.
††† Robust SE using the (sandwich) variance-covariance matrix for pseudo-m.l.e.
Table 6: Robustness: Baseline vs Gumbel-distributed shocks (Sum normalized to 1)†

              Baseline Model          Alternative Model
Dimensions    Estimates    S.E.       Estimates    S.E.
Age            0.074∗∗    (0.005)      0.073∗∗    (0.006)
Gender         0.189∗∗    (0.035)      0.185∗∗    (0.039)
Work           0.015∗∗    (0.007)      0.016∗∗    (0.006)
White          0.092∗∗    (0.009)      0.088∗∗    (0.009)
Hispanic       0.360∗∗    (0.023)      0.362∗∗    (0.025)
Black          0.238∗∗    (0.030)      0.244∗∗    (0.033)
Asian          0.094∗∗    (0.020)      0.094∗∗    (0.020)
Native         0.004      (0.012)      0.001      (0.012)
Age-Gender    -0.065∗∗    (0.006)     -0.061∗∗    (0.006)
Log-Likelihood of the Baseline Model: -3128.033
Log-Likelihood of the Alternative Model: -3115.614

† S.E computed using the delta method.
†† ∗∗ for 1% significance level.
††† Robust SE using the (sandwich) variance-covariance matrix for pseudo-m.l.e. The SE for the probit model are clustered by school.
Table 7: Robustness: Baseline vs Probit (Sum normalized to 1)†

              Baseline Model          Alternative Model
Dimensions    Estimates    S.E.       Estimates    S.E.
Age            0.074∗∗    (0.005)      0.247∗∗    (0.008)
Gender         0.189∗∗    (0.035)      0.244∗∗    (0.010)
Work           0.015∗∗    (0.007)      0.026∗∗    (0.005)
White          0.092∗∗    (0.009)      0.091∗∗    (0.012)
Hispanic       0.360∗∗    (0.023)      0.086∗∗    (0.009)
Black          0.238∗∗    (0.030)      0.299∗∗    (0.018)
Asian          0.094∗∗    (0.020)      0.067∗∗    (0.017)
Native         0.004      (0.012)      0.031∗∗    (0.006)
Age-Gender    -0.065∗∗    (0.006)     -0.090∗∗    (0.007)
Log-Likelihood of the Baseline Model: -3128.033
Log-Likelihood of the Alternative Model: -142 337.750
Likelihood Ratio: 278 419.500

† S.E computed using the delta method.
†† ∗∗ for 1% significance level.
††† Robust SE using the (sandwich) variance-covariance matrix for pseudo-m.l.e. The SE for the probit model are clustered by school.
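The likelihood ratios reported in Tables 4, 5 and 7 appear to be consistent, up to rounding, with twice the absolute difference between the two log-likelihoods. The short check below rests on that assumption; the function name and the tolerance are illustrative choices, and the inputs are copied from the tables.

```python
def likelihood_ratio(ll_baseline, ll_alternative):
    # ASSUMPTION: the reported statistic is 2 * |difference in log-likelihoods|.
    return 2.0 * abs(ll_baseline - ll_alternative)

# (baseline log-likelihood, alternative log-likelihood, reported ratio)
REPORTED = {
    "Table 4": (-3128.033, -3500.414, 744.761),
    "Table 5": (-3128.033, -3129.819, 3.573),
    "Table 7": (-3128.033, -142337.750, 278419.500),
}

# Gap between the recomputed and the reported statistic for each table.
discrepancies = {
    name: abs(likelihood_ratio(ll_base, ll_alt) - reported)
    for name, (ll_base, ll_alt, reported) in REPORTED.items()
}
```

The recomputed values differ from the reported ones by less than 0.1 in each case, which is plausibly due to rounding of the printed log-likelihoods. The caveat on interpreting the ratio for non-nested models still applies.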
Table 8: Robustness: Cell Estimator†

Dimensions    Estimates    S.E.
White          0.839∗∗    (0.003)
Hispanic       0.286∗∗    (0.008)
Black          0.633∗∗    (0.011)
Asian          0.311∗∗    (0.010)
Native         0.054∗∗    (0.006)

† S.E computed using the delta method.
†† ∗∗ for 1% significance level.
References
Aumann, R. J. “Acceptable Points in General Cooperative n-person Games” In Contributions to the Theory of Games IV, Annals of Mathematics Studies 40 (1959), 287-324
Badev, A. “Discrete Games in Endogenous Networks: Theory and Policy” (2013), Working Paper
Bifulco, R., J. Fletcher, and S. L. Ross “The Effect of Classmate Characteristics on Post-Secondary Outcomes: Evidence from the Add Health”, American Economic Journal: Economic Policy, 3-1 (2011), 25-53
Bloch F. and Dutta B. “Communication Networks with Endogenous Link Strength”, Games and Economic Behavior, 66-1 (2009), 39-56
Bramoulle Y., Currarini, S., Jackson, M.O., Pin P. and Rogers B. “Homophily and Long-Run Integration in Social Networks”, Journal of Economic Theory (2012), Forthcoming
Cameron A.C. and Trivedi P.K. “Microeconometrics, Methods and Applications”, Cambridge University Press, 2005
Christakis N., Fowler J., Imbens G.W. and Kalyanaraman K. “An Empirical Model for Strategic Network Formation” (2010), Working Paper
Copic J., Jackson M.O. and Kirman A. “Identifying Community Structures from Network Data via Maximum Likelihood Methods”, B.E. Press Journal of Theoretical Economics 9-1, Article 30 (2009)
Currarini S., Jackson M. O., Pin P. “An Economic Model of Friendship: Homophily, Minorities, and Segregation”, Econometrica, 77 (2009), 1003-1045
Currarini S., Jackson M. O., Pin P. “Identifying the roles of race-based choice and chance in high school friendship network formation,” Proceedings of the National Academy of Sciences of the United States of America, 107-11 (2010), 4857-4861
De Giorgi, G., M. Pellizzari, and S. Redaelli “Identification of social interactions through partially overlapping peer groups”, American Economic Journal: Applied Economics, 2 (2010), 241-275
Echenique F. and Fryer R.G. “A Measure of Segregation Based on Social Interactions”, Quarterly Journal of Economics, 122-2 (2007), 441-485
Fortin B. and Yazbeck M.A. “Peer Effects, Fast Food Consumption and Adolescent Weight Gain” (2011), Working Paper
Franz S., Marsili M. and Pin P. “Observed Choices and Underlying Opportunities”, Science and Culture, 76-9,10 (2010), 471-476
Galeotti A., Goyal S. and Kamphorst J. “Network Formation with Heterogenous Players”, Games and Economic Behavior, 54-2 (2006), 353-373
Goldsmith-Pinkham P. and Imbens G.W. “Social Networks and the Identification of Peer Effects”, Journal of Business & Economic Statistics, 31-3 (2013), 253-264
Golub B. and Jackson M.O. “Naive Learning in Social Networks: Convergence, Influence and the Wisdom of Crowds”, American Economic Journal: Microeconomics 2-1 (2010a), 112-149
Golub B. and Jackson M.O. “Using selection bias to explain the observed structure of Internet diffusions”, Proceedings of the National Academy of Sciences, 107-24 (2010b), 10833-10836
Goyal S. and Vega-Redondo F. “Structural Holes in Social Networks”, Journal of Economic Theory, 137 (2007), 460-492
Gourieroux C., Monfort A. and Trognon A. “Pseudo Maximum Likelihood Methods: Theory”, Econometrica, 52-3 (1984), 681-700
Hsieh C-S. and Lee L.F. “A Social Interactions Model with Endogenous Friendship Formation and Selectivity” (2013), Working Paper
Henry M. and Mourifie I. “Euclidean Revealed Preferences: Testing the Spatial Voting Model”, Journal of Applied Econometrics, 28-4 (2013), 650-666
Iijima R. and Kamada Y. “Social Distance and Network Structures” (2013), Working Paper
Jackson M. O. Social and Economic Networks, Princeton University Press, 2008
Jackson M.O. and Rogers B.W. “The Economics of Small Worlds”, Journal of the European Economic Association, 3-2,3 (2005), 617-627
Jackson M.O. and Rogers B.W. “Meeting Strangers and Friends of Friends: How Random are Socially Generated Networks?”, American Economic Review, 97-3 (2007), 890-915
Jackson M. O. and Wolinsky A. “A Strategic Model of Social and Economic Networks”, Journal of Economic Theory, 71 (1996), 44-74
Johnson C. and Gilles R. P. “Spatial Social Networks”, Review of Economic Design, 5 (2000), 273-299
van der Leij M., Rolfe M. and Toomet O. “On the Relationship Between Unexplained Wage Gap and Social Network Connections for Ethnical Groups” (2009), Working Paper
Manski, C.F. “Economic Analysis of Social Interactions”, Journal of Economic Perspectives, 14-3 (2000), 115-136
Marmaros D. and Sacerdote B. “How Do Friendships Form?”, Quarterly Journal of Economics, 121-1 (2006), 79-119
McFadden D. “Conditional Logit Analysis of Qualitative Choice Behavior” in P. Zarembka (ed.), Frontiers in econometrics (1974), 105-142
McFadden D. “Econometric Models of Probabilistic Choice”, in C. Manski and D. McFadden (editors), Structural Analysis of Discrete Data with Econometric Applications, Cambridge, Mass.: M.I.T. Press, 1981
Mele A. “A Structural Model of Segregation in Social Networks” (2013), Working Paper
McPherson M., Smith-Lovin L., and Cook J.M. “Birds of a Feather: Homophily in Social Networks”, Annual Review of Sociology, 27 (2001), 415-444
Patacchini E. and Zenou Y. “Ethnic Networks and Employment Outcomes”, Regional Science and Urban Economics, 42-6 (2012), 938-949
Rivas J. “Friendship Selection”, International Journal of Game Theory, 38 (2009), 521-538
Rubí-Barceló A. “Core/periphery scientific collaboration networks among very similar researchers”, Theory and Decision, 72-4 (2012), 463-483
Watts A. “Formation of Segregated and Integrated Groups”, International Journal of Game Theory, 35 (2007), 505-519
Sheng S. “Identification and Estimation of Network Formation Games” (2012), Working Paper