
Unbiased sampling of network ensembles

Tiziano Squartini
Instituut-Lorentz for Theoretical Physics, University of Leiden, Niels Bohrweg 2, 2333 CA Leiden (The Netherlands)
Institute for Complex Systems UOS Sapienza, “Sapienza” University of Rome, P.le Aldo Moro 5, 00185 Rome (Italy)

Rossana Mastrandrea
Institute of Economics and LEM, Scuola Superiore Sant’Anna, 56127 Pisa (Italy)
Aix-Marseille Université, Université de Toulon, CNRS, CPT, UMR 7332, 13288 Marseille (France)

Diego Garlaschelli
Instituut-Lorentz for Theoretical Physics, University of Leiden, Niels Bohrweg 2, 2333 CA Leiden (The Netherlands)

(Dated: June 12, 2014)

Sampling random graphs with given properties is a key step in the analysis of networks, as random ensembles represent basic null models required to identify patterns such as communities and motifs. A key requirement is that the sampling process is unbiased and efficient. The main approaches are microcanonical, i.e. they sample graphs that exactly match the enforced constraints. Unfortunately, when applied to strongly heterogeneous networks (including most real-world graphs), the majority of these approaches become biased and/or time-consuming. Moreover, the algorithms defined in the simplest cases (such as binary graphs with given degrees) are not easily generalizable to more complicated ensembles. Here we propose a solution to the problem via the introduction of a ‘maximize-and-sample’ (‘Max & Sam’) method to correctly sample ensembles of networks where the constraints are ‘soft’, i.e. they are realized as ensemble averages. Being based on exact maximum-entropy distributions, our approach is unbiased by construction, even for strongly heterogeneous networks. It is also more computationally efficient than most microcanonical alternatives. Finally, it works for both binary and weighted networks with a variety of constraints, including combined degree-strength sequences and full reciprocity structure, for which no alternative method exists. Our method can also be turned into a microcanonical one, via a restriction to the relevant subset. We show various applications to real-world networks and provide a code implementing all our algorithms.

PACS numbers: 05.10.-a,89.75.Hc,02.10.Ox,02.70.Rr

Network theory is systematically used to address problems of scientific and societal relevance [1], from the prediction of the spreading of infectious diseases worldwide [2] to the identification of early-warning signals of upcoming financial crises [3]. More generally, several dynamical and stochastic processes are strongly affected by the topology of the underlying network [4]. This results in the need to identify the topological properties that are statistically significant in a real network, i.e. to discriminate which higher-order properties can be directly traced back to the local features of nodes, and which are instead due to more complicated factors.

To achieve this goal, one requires (a family of) randomized benchmarks, i.e. ensembles of networks where the local heterogeneity is the same as in the real one, and the topology is random in any other respect: this defines a null model of the original network. Nontrivial patterns can then be detected in the form of empirical deviations from the null model’s theoretical expectations [5]. Important examples of such patterns are the presence of motifs (recurring subgraphs of small size, like building blocks of a network [6]) and communities (groups of nodes that are more densely connected internally than with the rest of the network [7]). To detect these and many other patterns, one needs to correctly specify the null model and then calculate e.g. the average and standard deviation (or confidence interval) of any topological property of interest over the corresponding randomized ensemble of graphs.

Unfortunately, given the strong heterogeneity of nodes (e.g. the power-law distribution of vertex degrees), the solution to the above problem is not simple. This is most easily explained in the case of binary graphs, even if similar arguments apply to weighted networks as well. For simple graphs, the most important null model is the (Undirected Binary) Configuration Model (UBCM), defined as an ensemble of networks where the degree of each node is specified, and the rest of the topology is maximally random [8]. Since the degrees of all nodes (the so-called degree sequence) act as constraints, ‘maximally random’ does not mean ‘completely random’: in order to realize the degree sequence, interdependencies among vertices necessarily arise. These interdependencies affect other topological properties as well. So, even if the degree sequence is the only quantity that is enforced ‘on purpose’, other structural properties are unavoidably constrained as well. These higher-order effects are called ‘structural correlations’. In order to disentangle spurious structural correlations from genuine correlations of interest, it is very important to properly implement the


UBCM in such a way that it takes the observed degree sequence as input and generates expectations based on a uniform and efficient sampling of the ensemble. Similar and more challenging considerations apply to more complicated null models, defined e.g. for directed or weighted graphs and specified by more general constraints.

Several approaches have been proposed and can be roughly divided into two large classes: microcanonical and canonical methods. Microcanonical approaches [9–15] aim at artificially generating many randomized variants of the observed network in such a way that the constrained properties are identical to the empirical ones, thus creating a collection of graphs sampling the desired ensemble. In these algorithms the enforced constraints are ‘hard’, i.e. they are met exactly by each graph in the resulting ensemble. This strong requirement implies that most microcanonical approaches proposed so far suffer from various problems, including bias, lack of ergodicity, mathematical intractability, high computational demands, and poor generalizability.

On the other hand, in canonical approaches [5, 16–26] the constraints are ‘soft’, i.e. they can be violated by individual graphs in the ensemble, even if the ensemble average of each constraint must match the enforced value exactly. Canonical approaches are generally introduced to directly obtain, as a function of the observed constraints (e.g. the degree sequence), exact mathematical expressions for the expected topological properties, thus avoiding the explicit generation of randomized networks [5]. However, this is only possible if the mathematical expressions for the topological properties of interest are simple enough to make the analytical calculation of the expected values feasible. In all other cases, one is led again to the problem of sampling many network configurations explicitly and averaging the properties of interest over such configurations.

In this paper, we show that canonical approaches can indeed be used also for the purpose of computationally sampling several ensembles of diverse types of networks (i.e. directed, undirected, weighted, binary) with various constraints (degree sequence, strength sequence, reciprocity structure, mixed binary and weighted properties, etc.). This computational use of canonical null models has not been implemented systematically so far, because the most popular approaches rely on highly approximated expressions leading to ill-defined or unknown probabilities that cannot be used to sample the ensemble. These approximations are in any case available only for the simplest ensembles (e.g. the UBCM), leaving the problem unsolved for more general constraints. By contrast, here we make use of a series of recent analytical results that generate the exact probabilities for a wide range of constraints of interest [5, 19–27]. We show that, unlike microcanonical approaches, our canonical ones allow for a unified treatment of several different ensembles, fulfil all the standard requirements of unbiasedness and uniformity, and are computationally efficient. We consider various examples that show the power of our method when applied to real-world networks. We also discuss conditions under which our approach can be used microcanonically, via a restriction to the subset of realizations that match the constraints sharply.

I. PREVIOUS APPROACHES

In this section, we briefly discuss the main available approaches to the problem of sampling network ensembles with given constraints, and highlight the main obstacles and limitations. We consider both microcanonical and canonical methods. In both cases, since the UBCM is the most popular and most studied ensemble, we will discuss the problem by focusing mainly on the implementations of this model. The same kind of considerations extend to other constraints (e.g. strengths, reciprocity) and other types of networks (e.g. directed and/or weighted) as well.

A. Microcanonical methods

There have been several attempts to develop microcanonical algorithms that efficiently implement the UBCM. One of the earliest algorithms starts with an empty network having the same number of vertices as the original one, where each vertex is assigned a number of ‘half edges’ (or ‘edge stubs’) equal to its degree in the real network. Then, pairs of stubs are randomly matched, thus creating the final edges of a random network with the desired degree sequence [8]. Unfortunately, for most empirical networks, the heterogeneity of the degrees is such that this algorithm produces several multiple edges between vertices with large degree, and several self-loops [9]. If the formation of these undesired edges is forbidden explicitly, the algorithm gets stuck in configurations where edge stubs have no more eligible partners, thus failing to complete any randomized network.

To overcome this limitation, a different algorithm (which is still widely used) was introduced [9]. This ‘Local Rewiring Algorithm’ (LRA) starts from the original network, rather than from scratch, and randomizes its topology through the iteration of an elementary move that preserves the degrees of all nodes. While this algorithm always produces random networks, it is extremely time-consuming, since many iterations of the fundamental move are needed in order to produce just one randomized variant, and this must be repeated several times (the mixing time is still unknown [28]) in order to produce many variants.

Besides these practical limitations, the main problem of the LRA is the fact that it is biased, i.e. it does not sample the desired ensemble uniformly. This has been rigorously shown relatively recently [10–12]. For undirected networks, uniformity has been shown to hold, at least approximately, only when the degree sequence is


such that [10]

k_max 〈k²〉 / 〈k〉² ≪ N    (1)

where k_max is the largest degree in the network, 〈k〉 is the average degree, 〈k²〉 is the second moment, and N is the number of vertices. For directed networks, a similar condition must hold [11]. Clearly, the above condition sets an upper bound for the heterogeneity of the degrees of vertices, and is violated if the heterogeneity is strong. This is a first indication that the available methods break down for ‘strongly heterogeneous’ networks. As we discuss later, most real-world networks are known to fall precisely within this class.
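As a minimal numerical illustration (a Python sketch, not part of the paper or its released code; the degree sequence used here is purely synthetic), the ratio appearing in eq. (1) can be computed directly from a degree sequence; values of order one or larger signal that the uniformity condition is violated.

```python
import numpy as np

def lra_uniformity_ratio(degrees):
    """Return (k_max * <k^2> / <k>^2) / N; eq. (1) requires this ratio to be much smaller than 1."""
    k = np.asarray(degrees, dtype=float)
    return k.max() * np.mean(k**2) / (np.mean(k)**2 * len(k))

# Synthetic, heavy-tailed degree sequence (illustrative only)
rng = np.random.default_rng(0)
degrees = np.rint(rng.pareto(1.5, size=10_000) + 1).astype(int)
print(lra_uniformity_ratio(degrees))
```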

For directed networks, where links are oriented and the constraints to be met are the numbers of incoming and outgoing links (in-degree and out-degree) separately, a condition similar to eq. (1) holds, but there is also the additional problem that the LRA is non-ergodic, i.e. it is in general not able to explore the entire ensemble of networks [11]. The violation of uniformity and ergodicity in the LRA implies that the average quantities over the graphs it generates are biased, i.e. they do not correspond to the correct expectations. It has been shown that, in order to restore ergodicity, it is enough to introduce an additional ‘triangular move’ inverting the direction of closed loops of three vertices [11]. However, in order to restore uniformity, one must do something much more complicated: at each iteration, the attempted ‘rewiring move’ must be accepted with a probability that depends on some complicated property of the current network configuration [10–12]. Since this property must be recalculated at each step, the resulting algorithm is extremely time-consuming.

Other recent alternatives [13–15] rely on theorems, such as the Erdős–Gallai one [29], that set necessary and sufficient conditions for a degree sequence to be graphic, i.e. realized by at least one graph. These ‘graphic’ methods exploit such (or related) conditions to define biased sampling algorithms in conjunction with the estimation of the corresponding sampling probabilities, thus allowing one to statistically reweight the outcome and sample the ensemble effectively uniformly [13–15]. Del Genio et al. [13] show that, for networks with power-law degree distribution of the form P(k) ∼ k^(−γ), the computational complexity of sampling just one graph using their algorithm is O(N²) if γ > 3. However, when γ < 3 the computational complexity increases to O(N^2.5) if

k_max < √N    (2)

and to O(N³) if k_max > √N. The upper bound √N is a particular case of the so-called ‘structural cut-off’ that we will discuss in more detail later. For the moment, it is enough for us to note that eq. (2) is another indication that, for strongly heterogeneous networks, the problem of sampling gets more complicated. As we will discuss later, most real networks violate eq. (2) strongly.

So, while ‘graphic’ algorithms do provide a solution for every network, their complexity increases for networks of increasing (and realistic) heterogeneity. A more fundamental limitation is that they can only handle the problem of binary graphs with given degree sequence. The generalization of these methods to other types of networks and other constraints is not straightforward, as it would require the proof of more general ‘graphicality’ theorems, and ad hoc modifications of the algorithm.

From this discussion of microcanonical algorithms, it should be clear that it is not obvious how they should be extended to the case of weighted networks, where the constraint relevant to each vertex i is the so-called strength s_i (the sum of the weights of the links reaching that vertex), and possibly also the purely binary degree k_i [25, 26]. Similarly, it is unclear how the available microcanonical algorithms should be generalized to accommodate additional constraints like the reciprocity structure in both binary [22] and weighted [24] directed networks. In all these more complicated ensembles, one should solve the problem of bias to ensure uniform sampling protocols.

B. Canonical methods

Canonical approaches aim at obtaining, as a function of the observed constraints (e.g. the degree sequence), mathematical expressions for the expected topological properties, avoiding the explicit generation of randomized networks. For canonical methods the requirement of uniformity is replaced by the requirement that the probability distribution over the enlarged ensemble has maximum entropy [5, 16].

For binary graphs, since any topological property is a function of the adjacency matrix A of the network (with entries a_ij = 1 if the vertices i and j are connected, and a_ij = 0 otherwise), the ultimate goal is finding a mathematical expression for the expected value of a_ij, which is nothing but the probability p_ij that the vertices i and j are connected in the ensemble. Note that, since in the microcanonical ensemble all links are dependent on each other (the degree sequence must be reproduced exactly in each realization), finding analytical expressions for p_ij in the microcanonical case is unfeasible. However, it turns out that in the canonical ensemble, for most constraints of interest, all pairs of vertices become independent, making the quest for p_ij possible.

For binary undirected networks, the most popular specification for p_ij is the factorized one [1, 30, 31]

p_ij = k_i k_j / k_tot    (3)

(where k_i is the degree of node i and k_tot is the total degree over all nodes), while for weighted undirected networks, where each link can have a non-negative weight w_ij and each vertex i is characterized by a given strength s_i, the corresponding assumption is that the expected


weight of the link connecting the vertices i and j is

〈w_ij〉 = s_i s_j / s_tot    (4)

(where s_tot is the total strength of all vertices).

Equations (3) and (4) are routinely used, and have become standard textbook expressions [1]. The most intense use of these expressions is perhaps encountered in the empirical analysis of communities, i.e. relatively denser modules of vertices in large networks [7]. Most community detection algorithms compare different partitions of vertices into communities (each partition being parametrized by a matrix C such that c_ij = 1 if the vertices i and j belong to the same community, and c_ij = 0 otherwise) and look for the ‘optimal’ partition. The latter is the one that maximizes the modularity function which, for binary networks, is defined as

Q(C) ≡ (1/k_tot) ∑_{i,j} [a_ij − k_i k_j / k_tot] c_ij    (5)

where eq. (3) appears explicitly as a null model for a_ij. For weighted networks, a similar expression involving eq. (4) applies. Other important examples where eq. (3) is used are the characterization of the connected components of networks [31], the average distance among vertices [30], and more generally the theoretical study of percolation [1] (characterizing the system’s robustness under the failure of nodes and/or links) and other dynamical processes [4] on networks.
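For concreteness, eq. (5) is easy to evaluate directly; the short sketch below (illustrative Python, not the paper's implementation) assumes a binary symmetric adjacency matrix with zero diagonal and one community label per node.

```python
import numpy as np

def modularity(A, labels):
    """Binary modularity Q(C) of eq. (5): A is the adjacency matrix,
    labels[i] is the community of node i, so c_ij = 1 iff labels[i] == labels[j]."""
    k = A.sum(axis=1)                       # degrees k_i
    k_tot = k.sum()                         # total degree k_tot
    C = labels[:, None] == labels[None, :]  # community co-membership matrix
    return ((A - np.outer(k, k) / k_tot) * C).sum() / k_tot

# Toy example: two triangles joined by a single edge, split into their natural communities
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(modularity(A, np.array([0, 0, 0, 1, 1, 1])))
```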

Due to the important role they play in many applications, it is remarkable that the literature puts little emphasis on the fact that eqs. (3) and (4) are valid only under strict conditions that, for most real networks, are strongly violated. It is evident that eq. (3) represents a probability only if the largest degree k_max in the network does not exceed the ‘structural cut-off’ k_c ≡ √k_tot, i.e. if

k_max < √k_tot    (6)

Obviously, the above condition sets an upper bound for the allowed heterogeneity of the degrees, since both k_max and k_tot are determined by the same degree distribution. If k_max exceeds k_c, eq. (3) breaks down.

In principle, the knowledge of p_ij allows one to sample networks from the canonical ensemble, by running over all pairs of nodes and connecting them with the appropriate probability (we will use this method later on). Unfortunately, the above discussion shows that, for strongly heterogeneous degree sequences, eq. (3) would produce ill-defined values of connection probabilities. This is why, so far, general algorithms to sample canonical, rather than microcanonical, ensembles of networks with constraints have not been implemented.

C. The ‘strong heterogeneity regime’ challenging most algorithms

Equations (1), (2) and (6), along with our discussion above, show that most methods run into problems when the heterogeneity of the network is strong enough. Strongly heterogeneous networks elude most microcanonical and canonical approaches proposed so far. As is well known, networks in the ‘strong heterogeneity’ regime turn out to be ubiquitous, and represent the rule rather than the exception. A simple way to prove this is by directly checking whether the largest degree exceeds the structural cut-off k_c. As Maslov et al. first noticed [9], in real networks k_c is strongly and systematically exceeded: for instance, for the Internet k_max = 1458 and k_c ≈ 159, which means that the structural cut-off is exceeded tenfold! Consequently, if eq. (3) were applied to the two vertices with largest degree, the resulting connection ‘probability’ would be p_ij = 43.5, i.e. more than 40 times larger than any reasonable estimate for a probability! We also note that, when inserted into eq. (5), this value of p_ij would produce, in the summation, a single term 40 times larger than any other ‘regular’ term (of order unity), thus significantly biasing the community detection problem. To the best of our knowledge, a study of the magnitude of this bias has never been performed.

The Internet is not a special case, and similar figures are found in the majority of real networks, making the problem entirely general. To see this, we recall that most real networks have a power-law degree distribution of the form P(k) ∼ k^(−γ) with exponent in the range 2 < γ < 3. For these networks, the average degree 〈k〉 = k_tot/N is finite but the second moment 〈k²〉 diverges. Therefore the structural cut-off scales as k_c ∼ N^(1/2) [32], which means that eqs. (2) and (6) coincide. By contrast, Extreme Value Theory shows that the largest degree scales as k_max ∼ N^(1/(γ−1)) [32]. This implies that the ratio k_max/k_c diverges for large networks, i.e. the largest degree is infinitely larger than the allowed cut-off value. Unfortunately, many results and approaches that have been obtained by assuming k_max < k_c are routinely extended to real networks where, in most of the cases, k_max ≫ k_c. Therefore, although this might appear as an exaggerated claim, most analyses of real-world networks (including community detection) that have been carried out so far have relied on incorrect expressions, and have been systematically affected by an uncontrolled bias.

In theoretical and computational models of networks, the problem is routinely circumvented by enforcing the condition k_max < k_c explicitly, e.g. by considering a truncated power-law distribution, using the justification that this condition should hold for ‘sparse’ networks where the average degree does not grow with N, as in most real networks [8, 33]. This is however misleading, since in real scale-free networks with 2 < γ < 3 the average degree is finite, and this makes them sparse even without assuming a truncation in the degree distribution. Indeed, as in the example above, real networks


systematically violate the cut-off value, and are therefore ‘strongly heterogeneous’. By the way, sparseness itself is not crucial, since dense but homogeneous networks (including the densest of all, i.e. the complete graph) are such that k_max < k_c and are correctly described by eq. (3), just as sparse homogeneous networks are. This confirms that the problem is in fact due to strong heterogeneity.

The above arguments extend also to other ensembles of networks with different constraints. The general conclusion is that, since real-world networks are generally strongly heterogeneous, the available approaches either break down or become computationally demanding. Moreover, it is difficult to generalize the available knowledge to modified constraints and different types of graphs.

II. THE ‘MAX & SAM’ METHOD

In what follows, we exploit a series of recent results [5, 22, 24–26] characterizing several canonical ensembles of networks and define a unified approach to sample these ensembles in a fast, unbiased and efficient way. In our approach, the functional form of the probability of each graph in the ensemble is derived by maximizing Shannon’s entropy [16], and the numerical coefficients of this probability are derived by maximizing the probability (i.e. the likelihood) itself [5]. A similar approach was already introduced [5] in order to derive analytical expectations for some topological quantities of interest, but that approach entirely avoids the problem of sampling. Here, we instead implement a sampling protocol explicitly, so that the expected value of any desired topological property (not only those that can be calculated analytically) can be obtained as an average over the sampled networks. We provide unified practical recipes to sample random networks efficiently, and from an expanded set of ensembles that have been treated only separately so far [5, 19–22, 24–27]. We also provide a code implementing all our sampling algorithms (see Appendix). Since our method is based on the idea that network ensembles can be efficiently sampled stochastically after a double maximization (first of Shannon’s entropy, and then of the likelihood), we call it the ‘Maximize and Sample’ (for short, ‘Max & Sam’) method.

In this section we consider the canonical implementation of our method, while in the next one we briefly discuss the microcanonical implementation. We will consider canonical ensembles of binary graphs with given degree sequence (both undirected [5, 19] and directed [5, 19, 21]), of weighted networks with given strength sequence (both undirected [5, 20] and directed [5, 20, 21, 24]), of directed networks with given reciprocity structure (both binary [22, 23] and weighted [24]), and of weighted networks with given strength sequence and degree sequence [25–27]. In all these cases, one can show that the probability of the entire network factorizes as a product of dyadic probabilities over pairs of nodes. This ensures that the computational complexity is always O(N²) in all cases considered here, irrespective of the level of heterogeneity of the real-world network being randomized (thus overcoming the ubiquitous problems discussed in sec. I C). Since all the probability distributions involved in our method are obtained via an explicit maximization of Shannon’s entropy, the resulting sampling algorithms are unbiased by construction. This implies that our methods do not suffer from the limitations of the other methods discussed in sec. I, and are efficient and unbiased even for strongly heterogeneous networks.

A. Binary undirected graphs with given degree sequence

Let us start by considering binary, undirected networks (BUNs). A generic BUN is uniquely specified by its binary adjacency matrix A. The particular matrix corresponding to the observed graph that we want to randomize will be denoted by A∗. As we mentioned, the simplest non-trivial constraint is the degree sequence, {k_i}_{i=1}^N (where k_i ≡ ∑_j a_ij is the degree of node i), defining the Undirected Binary Configuration Model (UBCM). Throughout sec. I we have already discussed the crucial importance of the UBCM for the analysis of real-world networks (e.g. for the detection of communities [7]), and we have highlighted the limitations of the available implementations.

In our approach, the canonical ensemble of BUNs is the set of networks with the same number of nodes, N, as the observed graph and a number of (undirected) links varying from zero to the maximum value N(N−1)/2. Appropriate probability distributions on this ensemble can be fully determined by maximizing, in sequence, Shannon’s entropy (under the chosen constraints) and the likelihood function, as already pointed out in [5]. The result of the entropy maximization [5, 16] is that the graph probability factorizes as

P(A|x⃗) = ∏_{i<j} p_ij^{a_ij} (1 − p_ij)^{1−a_ij}    (7)

where p_ij ≡ x_i x_j / (1 + x_i x_j). The vector x⃗ of N unknown parameters is to be determined either by maximizing the log-likelihood function

λ(x⃗) ≡ ln P(A∗|x⃗) = ∑_i k_i(A∗) ln x_i − ∑_{i<j} ln(1 + x_i x_j)    (8)

or, equivalently, by solving the following system of N equations (corresponding to the requirement that the gradient of the log-likelihood vanishes) [5]:

〈k_i〉 = ∑_{j≠i} x_i x_j / (1 + x_i x_j) = k_i(A∗)  ∀i    (9)

where k_i(A∗) is the observed degree of vertex i and 〈k_i〉 indicates its ensemble average. From eq. (9) it is evident


that only the observed values of the chosen constraints (the sufficient statistics of the problem) are needed in order to obtain the numerical values of the unknowns (the empirical degree sequence fixes the value of x⃗, which in turn fixes the value of all the {p_ij}). In any case, for the sake of clarity, in the code we allow the user to choose the preferred input format (a matrix, a list of edges, a vector of constraints). This is true for all the models described here and implemented in the routine.
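As an illustration of this preliminary step (a sketch, not the released ‘Max & Sam’ routine), the system of eq. (9) can be handed to a generic least-squares root finder; the change of variables x_i = e^θ_i keeps the parameters positive, and the initial guess x_i ≈ k_i/√k_tot is just a convenient starting point.

```python
import numpy as np
from scipy.optimize import least_squares

def solve_ubcm(degrees):
    """Solve eq. (9): find x such that sum_{j != i} x_i x_j / (1 + x_i x_j) = k_i for every i."""
    k = np.asarray(degrees, dtype=float)

    def residuals(theta):
        x = np.exp(theta)                          # enforce x_i > 0
        P = np.outer(x, x) / (1.0 + np.outer(x, x))
        np.fill_diagonal(P, 0.0)                   # exclude the j = i term
        return P.sum(axis=1) - k                   # <k_i> - k_i(A*)

    theta0 = np.log(k / np.sqrt(k.sum()) + 1e-10)  # rough starting point
    return np.exp(least_squares(residuals, theta0).x)
```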

Note that the above form of p_ij represents the exact expression that should be used in place of eq. (3). This reveals the highly non-linear and non-local character of the interdependencies among vertices in the UBCM: in random networks with given degree sequence, the correct connection probability p_ij is a function of the degrees of all vertices of the network, and not just of the end-point degrees as in eq. (3). Only when the degrees are ‘weakly heterogeneous’ (mathematically, this happens when x_i x_j ≪ 1 for all pairs of vertices, which implies p_ij ≈ x_i x_j) do these structural interdependencies become approximately local. Note that, in the literature, this is improperly called the ‘sparse graph’ limit [16], while from our previous discussion in sec. I C it is clear that low heterogeneity, and not sparsity, is required to produce this limit.

Unlike eq. (3), the p_ij considered here always represents a proper probability ranging between 0 and 1, irrespective of the heterogeneity of the network. This implies that eq. (7) provides us with a recipe to sample the canonical ensemble of BUNs under the UBCM. After the unknown parameters have been found, they can be put back into eq. (7) to obtain the probability to correctly sample any graph A from the ensemble. The key simplification allowing this in practice is the fact that the graph probability is factorized, so that a single graph can be sampled stochastically by sequentially running over each pair of nodes i, j and implementing a Bernoulli trial (whose elementary events are a_ij = 0, with probability 1 − p_ij, and a_ij = 1, with probability p_ij). This process can be repeated to generate as many configurations as desired. Note that sampling each network has complexity O(N²), and that the time required to preliminarily solve the system of coupled equations to find the unknown parameters x⃗ is independent of how many random networks are sampled and of the heterogeneity of the network. Thus this algorithm is always more efficient than the corresponding microcanonical ones described in sec. I A.
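A minimal sketch of this Bernoulli sampling step (assuming the parameters x_i of eq. (9) have already been obtained, e.g. as in the sketch above; names are illustrative):

```python
import numpy as np

def sample_ubcm(x, rng=None):
    """One adjacency matrix with P(a_ij = 1) = x_i x_j / (1 + x_i x_j), as in eq. (7)."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=float)
    p = np.outer(x, x) / (1.0 + np.outer(x, x))            # exact connection probabilities
    A = np.triu(rng.random(p.shape) < p, k=1).astype(int)  # one Bernoulli trial per pair i < j
    return A + A.T                                         # symmetrize: undirected, no self-loops

# Repeated calls give as many independent configurations as desired:
# samples = [sample_ubcm(x) for _ in range(1000)]
```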

In fig. 1 we show an example of this procedure on the network of liquidity reserves exchanges between Italian banks in 1999 [35]. For an increasing number of sampled graphs, we show the convergence of the sample average of each entry a_ij of the adjacency matrix to its exact canonical expectation 〈a_ij〉, analytically determined after solving the likelihood equations. This preliminary check is useful to establish that, in this case, generating 1000 matrices (bottom right) is enough to reach a high level of accuracy. If needed, the accuracy can be quantified rigorously (e.g. in terms of the maximum width around the identity line) and arbitrarily improved by increasing the number of sampled matrices. Note that this important check is impossible in microcanonical approaches, where the exact value of the ‘target’ probability is unknown.

We then select the sample of 1000 matrices and confirm (see the top panel of fig. 2) that the imposed constraints (the observed degrees of all nodes) are excellently reproduced by the sample average, and that the confidence intervals are narrowly spread around the identity line. This is a crucial test of the accuracy of our sampling procedure. Again, the accuracy can be improved by increasing the number of sampled matrices if needed.

After this preliminary check, the sample can be used to compare the expected and observed values of higher-order properties of the network. Note that in this case we do not require (or expect) that these (unconstrained) higher-order properties are correctly reproduced by the null model. The extent of the deviations of the real network from the null model depends on the particular example considered, and the characterization of these deviations is precisely the reason why a method to sample random networks from the appropriate ensemble is needed in the first place. In the bottom panels of fig. 2 we compare the observed value of two quantities of interest with their arithmetic mean over the sample. The two quantities are the average nearest neighbors degree (ANND), k^nn_i = ∑_j a_ij k_j / k_i, and the clustering coefficient, c_i = ∑_{j,k} a_ij a_jk a_ki / [k_i (k_i − 1)], of each vertex.

Note that since our sampling method is unbiased, the sample mean automatically weighs the configurations according to their correct probability. In this particular case, we find that the null model reproduces the observed network remarkably well, which means that the degree sequence effectively ‘explains’ (or rather ‘generates’) the two empirical higher-order patterns that we have considered. This is consistent with other studies [5, 19, 20], but not true in general for other networks or other constraints, as we show later on. From the bottom panels of fig. 2 we also note that the confidence intervals highlight a non-obvious feature: the fact that the (apparently) ‘strongest outliers’ (those further away from the identity line) turn out to be actually within (or at the border of) the chosen confidence intervals, while several (apparently) ‘weak outliers’ (closer to the identity) are instead found to be much more distant from the confidence intervals, and thus in an unexpectedly stronger disagreement with the null model. These counter-intuitive insights cannot be gained from the analysis of the expected values alone, e.g. using expressions like eq. (3) or similar.

B. Binary directed graphs with given in-degree and out-degree sequences

For binary directed networks (BDNs), the adjacency matrix A is (in general) not symmetric, and each node


FIG. 1. Sampling binary undirected networks with given degree sequence (Undirected Binary Configuration Model). The example shown is the binary network of liquidity reserves exchanges between Italian banks in 1999 [35] (N = 215). The four panels show the convergence of the sample average of each entry a_ij of the adjacency matrix to its exact canonical expectation 〈a_ij〉, for 1 (top left), 10 (top right), 100 (bottom left) and 1000 (bottom right) sampled matrices. The identity line is shown in red.

i is characterized by two degrees: the out-degree k^out_i ≡ ∑_j a_ij and the in-degree k^in_i ≡ ∑_j a_ji. The Directed Binary Configuration Model (DBCM), the directed version of the UBCM, is defined as the ensemble of BDNs with given out-degree sequence {k^out_i}_{i=1}^N and in-degree sequence {k^in_i}_{i=1}^N. The DBCM is widely used in order to detect communities [7] and other higher-order patterns [1] in directed binary networks. However, most approaches [1, 7] make use of a directed version of eq. (3) which, just like eq. (3) itself, is a poor approximation (especially for strongly heterogeneous networks, see sec. I C) to the correct expression that corresponds to an unbiased sampling of the ensemble [5]. As we now show, our ‘Max & Sam’ method automatically retrieves this expression, along with the full probability distribution that is unspecified by eq. (3) alone.

At a canonical level, the DBCM is defined on the ensemble of all BDNs with N vertices and a number of links ranging from 0 to N(N − 1). Equation (7) still applies, but now with ‘i < j’ replaced by ‘i ≠ j’ and p_ij = x_i y_j / (1 + x_i y_j), where the 2N parameters x⃗ and y⃗ are determined by either maximizing the log-likelihood function [5]

λ(x⃗, y⃗) ≡ ln P(A∗|x⃗, y⃗) = ∑_i [k^out_i(A∗) ln x_i + k^in_i(A∗) ln y_i] − ∑_{i≠j} ln(1 + x_i y_j)    (10)

(where A∗ is the real network) or, equivalently, by solving the corresponding system of 2N equations [5]:

〈k^out_i〉 = ∑_{j≠i} x_i y_j / (1 + x_i y_j) = k^out_i(A∗)  ∀i    (11)

〈k^in_i〉 = ∑_{j≠i} x_j y_i / (1 + x_j y_i) = k^in_i(A∗)  ∀i    (12)

This time, the ensemble can be efficiently sampled by considering each pair of vertices twice, and using (say) p_ij and p_ji to draw directed links in the two directions (these two events being statistically independent). Since this is a straightforward extension of the UBCM, we do not consider any specific example to illustrate the DBCM. However, the related algorithm has been implemented in the code (see Appendix).
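A sketch of the directed analogue (assuming x_i and y_i solve eqs. (11)-(12); illustrative, not the released code):

```python
import numpy as np

def sample_dbcm(x, y, rng=None):
    """One directed adjacency matrix with P(a_ij = 1) = x_i y_j / (1 + x_i y_j) for i != j."""
    rng = rng or np.random.default_rng()
    p = np.outer(x, y) / (1.0 + np.outer(x, y))
    np.fill_diagonal(p, 0.0)                      # no self-loops
    return (rng.random(p.shape) < p).astype(int)  # independent trial for each ordered pair
```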

FIG. 2. Sampling binary undirected networks with given degree sequence (Undirected Binary Configuration Model). The example shown is the binary network of liquidity reserves exchanges between Italian banks in 1999 [35] (N = 215). The three panels show, for each node in the network, the comparison between the observed value and the sample average of the (constrained) degree (top), the (unconstrained) ANND (bottom left) and the (unconstrained) clustering coefficient (bottom right), for 1000 sampled matrices. The 95% confidence intervals of the distribution of the sampled quantities are shown in pink for each node.

C. Binary directed graphs with given degree sequences and reciprocity structure

A more constrained null model, the Reciprocal Binary Configuration Model (RBCM), can be defined for BDNs by enforcing, in addition to the two directed degree sequences considered above, the whole local reciprocity structure of the network [5, 22, 23]. Equivalently, this amounts to specifying the three observed degree sequences defined as the vector of the numbers of non-reciprocated outgoing links, {k→_i}_{i=1}^N, the vector of the numbers of non-reciprocated incoming links, {k←_i}_{i=1}^N, and the vector of the numbers of reciprocated links, {k↔_i}_{i=1}^N [5, 22, 23]. These numbers are defined as k→_i ≡ ∑_j a_ij(1 − a_ji), k←_i ≡ ∑_j a_ji(1 − a_ij), and k↔_i ≡ ∑_j a_ij a_ji, respectively [22, 23].

The RBCM is of crucial importance when analysing higher-order patterns that exist beyond the dyadic level in directed networks. The most important example is that of triadic motifs [3, 6, 23], i.e. patterns of connectivity (involving triples of nodes) that are statistically over- or under-represented with respect to a null model where the observed degree sequences and reciprocity structure are preserved (i.e. the RBCM). Note that in this case no approximate canonical expression similar to eq. (3) exists, therefore the null model is usually implemented microcanonically using a generalization of the LRA that we have discussed in sec. I A. Conceptually, this procedure suffers from the same problem of bias as the simpler procedures used to implement the UBCM and the DBCM through the LRA [10–12]. To our knowledge, in this case no correction analogous to that proposed in ref. [11] has been developed in order to restore uniformity.

In our ‘Max & Sam’ approach, we exploit known analytical results [5, 22, 23] showing that the probability of each graph A in the RBCM is

P(A|x⃗, y⃗, z⃗) = ∏_{i<j} (p→_ij)^{a→_ij} (p←_ij)^{a←_ij} (p↔_ij)^{a↔_ij} (p=_ij)^{a=_ij},    (13)

where p→_ij ≡ x_i y_j / (1 + x_i y_j + x_j y_i + z_i z_j), p←_ij ≡ x_j y_i / (1 + x_i y_j + x_j y_i + z_i z_j), p↔_ij ≡ z_i z_j / (1 + x_i y_j + x_j y_i + z_i z_j) and p=_ij ≡ 1 / (1 + x_i y_j + x_j y_i + z_i z_j) denote the probabilities of a single (non-reciprocated) link from i to j, a single (non-reciprocated) link from j to i, a double (reciprocated) link between i and j, and no link at all, respectively. The above four possible events are mutually exclusive. The greatest difference with respect to the DBCM lies in the fact that the two links that can be drawn between the same two nodes are no longer independent.


The 3N unknown parameters, x⃗, y⃗ and z⃗, must be determined by either maximizing the log-likelihood [5]

λ(x⃗, y⃗, z⃗) ≡ ln P(A∗|x⃗, y⃗, z⃗) = ∑_i [k→_i(A∗) ln x_i + k←_i(A∗) ln y_i + k↔_i(A∗) ln z_i] − ∑_{i<j} ln(1 + x_i y_j + x_j y_i + z_i z_j)    (14)

or, equivalently, solving the corresponding system of 3N equations [5, 22, 23]:

〈k→_i〉 = ∑_{j≠i} x_i y_j / (1 + x_i y_j + x_j y_i + z_i z_j) = k→_i(A∗)  ∀i    (15)

〈k←_i〉 = ∑_{j≠i} x_j y_i / (1 + x_i y_j + x_j y_i + z_i z_j) = k←_i(A∗)  ∀i    (16)

〈k↔_i〉 = ∑_{j≠i} z_i z_j / (1 + x_i y_j + x_j y_i + z_i z_j) = k↔_i(A∗)  ∀i    (17)

After the unknown parameters have been found, the four probabilities allow us to sample the ensemble correctly and very easily. In particular, we can consider each pair of vertices i, j only once and either draw a single link directed from i to j with probability p→_ij, draw a single link directed from j to i with probability p←_ij, draw two oppositely oriented links with probability p↔_ij, or draw no link at all with probability p=_ij. Note that, despite the increased number of constraints, the computational complexity is still O(N²). As for the DBCM, we do not show a specific illustration of the RBCM, but the procedure described above has been fully coded in order to sample the relevant ensemble in a fast and unbiased way (see Appendix).
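A sketch of this per-pair sampling (assuming the parameters entering eqs. (15)-(17) are known; function and variable names are illustrative):

```python
import numpy as np

def sample_rbcm(x, y, z, rng=None):
    """One directed adjacency matrix from eq. (13): for each pair i < j choose one of the
    four mutually exclusive events (i->j only, j->i only, reciprocated, no link)."""
    rng = rng or np.random.default_rng()
    N = len(x)
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        for j in range(i + 1, N):
            denom = 1.0 + x[i] * y[j] + x[j] * y[i] + z[i] * z[j]
            probs = np.array([x[i] * y[j], x[j] * y[i], z[i] * z[j], 1.0]) / denom
            event = rng.choice(4, p=probs)
            if event == 0:                 # single link i -> j
                A[i, j] = 1
            elif event == 1:               # single link j -> i
                A[j, i] = 1
            elif event == 2:               # reciprocated pair of links
                A[i, j] = A[j, i] = 1
            # event == 3: no link at all
    return A
```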

D. Weighted undirected networks with given strength sequence

Let us now consider weighted undirected networks (WUNs). Differently from the binary case, link weights can now range from zero to infinity by (without loss of generality) integer steps. The number of configurations in the canonical ensemble is therefore infinite. Still, enforcing node-specific constraints implies that a proper probability measure can be defined over the ensemble, such that the average value of any network property of interest is finite [5]. A single graph in the ensemble is now specified by its (symmetric) weight matrix W, where the entry w_ij represents the integer weight of the link connecting nodes i and j (w_ij = 0 means that no link is there). We denote the particular real-world weighted network as W∗. Each vertex is characterized by its strength s_i = ∑_j w_ij, representing the weighted analogue of the degree.

The weighted, undirected counterpart of the UBCM is the Undirected Weighted Configuration Model (UWCM). The constraint defining it is the observed strength sequence, {s_i}_{i=1}^N. Like its binary analogue (the UBCM), the UWCM is widely used in order to detect communities and other higher-order patterns in undirected weighted networks. However, most approaches [1] incorrectly assume that this model is characterized by eq. (4), which is instead only a highly simplified expression [5]. Again, our ‘Max & Sam’ method automatically retrieves the correct expression, corresponding to an unbiased sampling of the ensemble.

The probability of each weighted network W in the canonical ensemble is [5]

P(W|x⃗) = ∏_{i<j} p_ij^{w_ij} (1 − p_ij)    (18)

where now p_ij ≡ x_i x_j, showing that the weights are drawn from geometric distributions [34]. As usual, the numerical values of the unknown parameters x⃗ are found by either maximizing the log-likelihood function

λ(x⃗) ≡ ln P(W∗|x⃗) = ∑_i s_i(W∗) ln x_i + ∑_{i<j} ln(1 − x_i x_j)    (19)

or solving the corresponding system of N equations:

〈s_i〉 = ∑_{j≠i} x_i x_j / (1 − x_i x_j) = s_i(W∗)  ∀i    (20)

In this case, after finding the unknown parameters we can sample the canonical ensemble by drawing, for each pair of vertices i and j, a link of weight w with geometrically distributed probability p_ij^w (1 − p_ij). Note that this correctly includes the case w_ij = 0, occurring with probability 1 − p_ij, corresponding to the absence of a link. Equivalently, as discussed in [34], one can alternatively start with the disconnected vertices i and j, draw a first link (of unit weight) with Bernoulli-distributed probability p_ij, and (only if this event is successful) place a second unit of weight on the same link, again with probability p_ij, and so on until a failure is first encountered. In this way, only repetitions of elementary Bernoulli trials are involved, a feature that can sometimes be convenient for coding purposes (e.g. if only uniform random number generators need to be used). After all pairs of vertices have been considered and a single weighted network has been sampled, the process can be repeated to get the desired number of sampled matrices.
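A sketch of this weighted sampling step (assuming the x_i solve eq. (20) and that x_i x_j < 1 for all pairs, as required for a proper geometric distribution); NumPy's geometric generator is used here in place of the repeated Bernoulli trials described above:

```python
import numpy as np

def sample_uwcm(x, rng=None):
    """One symmetric weight matrix from eq. (18): P(w_ij = w) = p_ij^w (1 - p_ij) with p_ij = x_i x_j."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=float)
    p = np.outer(x, x)
    N = len(x)
    W = np.zeros((N, N), dtype=int)
    iu = np.triu_indices(N, k=1)
    W[iu] = rng.geometric(1.0 - p[iu]) - 1   # NumPy's geometric starts at 1, so shift the support to {0, 1, 2, ...}
    return W + W.T
```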

In fig. 3 we show an application of this method to the same interbank network considered previously in figs. 1 and 2, but now using its weighted representation [35]. In this case we plot, for increasing numbers of sampled networks, the convergence of the sample average of each edge weight w_ij to its exact canonical expectation 〈w_ij〉. As for the example considered for the UBCM, generating 1000 matrices (bottom right) turns out to be enough to obtain a high level of accuracy for this network. If needed, this accuracy can be quantified and improved by sampling more matrices. As in the binary case, this important check is impossible in microcanonical approaches, where there is no knowledge of the exact value of the expected weights.


FIG. 3. Sampling weighted undirected networks with given strength sequence (Undirected Weighted Configuration Model). The example shown is the weighted network of liquidity reserves exchanges between Italian banks in 1999 [35] (N = 215). The four panels show the convergence of the sample average of each entry w_ij of the weight matrix to its exact canonical expectation 〈w_ij〉, for 1 (top left), 10 (top right), 100 (bottom left) and 1000 (bottom right) sampled matrices. The identity line is shown in red.

Here as well, the average of the quantities of interest over the sample can be compared with the observed values. As a preliminary check, the top plot of fig. 4 confirms that, for the sample of 1000 matrices, the sample average of the strength of each node coincides with its observed value, and the confidence intervals are very narrow around the identity line. Thus the enforced constraints are correctly reproduced. We can then properly use the UWCM as a null model to detect higher-order patterns in the network.

In the bottom panels of fig. 4 we show the average nearest neighbor strength (ANNS), s^nn_i = ∑_j a_ij s_j / k_i, and the weighted clustering coefficient, c^w_i = ∑_{j,k} w_ij w_jk w_ki / ∑_{j≠k} w_ij w_ik.

In this case, in line with previous analyses of different networks [5, 19–21], we find that the UWCM is not as effective as its binary counterpart in reproducing the observed higher-order properties, as is clear from the presence of many outliers in the plots. Since our previous checks ensure that the implementation of the null model is correct, we can safely conclude that the divergence between the null model and the real network is not due to an insufficient or incorrect sampling of the ensemble. Rather, it is a genuine signature of the fact that, in this network, the strength sequence alone is not enough in order to replicate higher-order quantities. So the strength sequence turns out to be less informative (about the whole weighted network) than the degree sequence is (about the binary projection of the same network). This is in line with various recent findings on several weighted networks [20, 21, 26].

E. Weighted directed networks with given in-strength and out-strength sequences

We now consider weighted directed networks (WDNs), defined by a weight matrix W which is in general not symmetric. Each node is now characterized by two strengths, the out-strength s^out_i ≡ ∑_j w_ij and the in-strength s^in_i ≡ ∑_j w_ji. The Directed Weighted Configuration Model (DWCM), the directed version of the UWCM, enforces the out- and in-strength sequences, {s^out_i}_{i=1}^N and {s^in_i}_{i=1}^N, of a real-world network W∗ [5, 20, 21]. The model is widely used to detect modules and communities in real WDNs [1].

In its canonical version, the DWCM is still characterized by eq. (18), where ‘i < j’ is replaced by ‘i ≠ j’ and now p_ij ≡ x_i y_j. The 2N unknown parameters x⃗ and y⃗ can be fixed by either maximizing the log-likelihood function [5]

FIG. 4. Sampling weighted undirected networks with given strength sequence (Undirected Weighted Configuration Model). The example shown is the weighted network of liquidity reserves exchanges between Italian banks in 1999 [35] (N = 215). The three panels show, for each node in the network, the comparison between the observed value and the sample average of the (constrained) strength (top), the (unconstrained) ANNS (bottom left) and the (unconstrained) weighted clustering coefficient (bottom right), for 1000 sampled matrices. The 95% confidence intervals of the distribution of the sampled quantities are shown in pink for each node.

λ(x⃗, y⃗) ≡ ln P(W∗|x⃗, y⃗) = ∑_i [s^out_i(W∗) ln x_i + s^in_i(W∗) ln y_i] + ∑_{i≠j} ln(1 − x_i y_j)    (21)

or solving the corresponding 2N equations [5]:

〈s^out_i〉 = ∑_{j≠i} x_i y_j / (1 − x_i y_j) = s^out_i(W∗)  ∀i    (22)

〈s^in_i〉 = ∑_{j≠i} x_j y_i / (1 − x_j y_i) = s^in_i(W∗)  ∀i    (23)

Once the unknown variables are found, we can implement an efficient and unbiased sampling scheme in the same way as for the UWCM, by running over each pair of vertices twice (i.e. in both directions). One can establish the presence and weight of a link from vertex i to vertex j using the geometric distribution p_ij^w (1 − p_ij), and the presence and weight of the reverse link from j to i using the geometric distribution p_ji^w (1 − p_ji), these two events being independent. Alternatively, as for the undirected case, one can construct these random events as a combination of fundamental Bernoulli trials with success probability p_ij and p_ji.
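The directed case differs only in drawing the two directions independently; a sketch (assuming x_i and y_i solve eqs. (22)-(23), with x_i y_j < 1 for all ordered pairs):

```python
import numpy as np

def sample_dwcm(x, y, rng=None):
    """One directed weight matrix: P(w_ij = w) = (x_i y_j)^w (1 - x_i y_j) for i != j."""
    rng = rng or np.random.default_rng()
    p = np.outer(x, y)
    np.fill_diagonal(p, 0.0)            # p = 0 on the diagonal, so the draw below gives w_ii = 0
    return rng.geometric(1.0 - p) - 1   # independent geometric draw for each ordered pair
```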

This generalization is straightforward, not requiring any special application to be shown. However, we have explicitly included this model in the code (see Appendix).

F. Weighted directed networks with given strength sequences and reciprocity structure

In analogy with the binary case, we now consider the Reciprocal Weighted Configuration Model (RWCM), a recently proposed null model that for the first time allows one to enforce the reciprocity structure in weighted networks [24]. The RWCM constrains the whole local reciprocity structure of a weighted, directed network by enforcing three strengths for each node, mimicking the binary ones: the non-reciprocated incoming strength, s←_i ≡ ∑_j w←_ij, the non-reciprocated outgoing strength, s→_i ≡ ∑_j w→_ij, and the reciprocated strength, s↔_i ≡ ∑_j w↔_ij [24]. Such quantities are defined by means of three pair-specific variables: w↔_ij ≡ min[w_ij, w_ji], w→_ij ≡ w_ij − w↔_ij and w←_ij ≡ w_ji − w↔_ij, extending the binary [22, 23] definitions.

Despite its complexity, the RWCM is analytically solvable [24] and the graph probability factorizes as

P(W|x⃗, y⃗, z⃗) = ∏_{i<j} [ (x_i y_j)^{w→_ij} (x_j y_i)^{w←_ij} (z_i z_j)^{w↔_ij} / Z_ij(x_i, x_j, y_i, y_j, z_i, z_j) ]    (24)

where Z_ij(x_i, x_j, y_i, y_j, z_i, z_j) ≡ (1 − x_i x_j y_i y_j) / [(1 − x_i y_j)(1 − x_j y_i)(1 − z_i z_j)] is the node-pair partition function. The 3N unknown parameters x⃗, y⃗ and z⃗ must be determined either by maximizing the log-likelihood function

λ(x⃗, y⃗, z⃗) ≡ ln P(W∗|x⃗, y⃗, z⃗) = ∑_i [s→_i(W∗) ln x_i + s←_i(W∗) ln y_i + s↔_i(W∗) ln z_i] − ∑_{i<j} ln Z_ij(x_i, x_j, y_i, y_j, z_i, z_j)    (25)

or by solving the corresponding 3N equations:

〈s→_i〉 = ∑_{j≠i} (x_i y_j)(1 − x_j y_i) / [(1 − x_i y_j)(1 − x_i x_j y_i y_j)] = s→_i(W∗)  ∀i    (26)

〈s←_i〉 = ∑_{j≠i} (x_j y_i)(1 − x_i y_j) / [(1 − x_j y_i)(1 − x_i x_j y_i y_j)] = s←_i(W∗)  ∀i    (27)

〈s↔_i〉 = ∑_{j≠i} z_i z_j / (1 − z_i z_j) = s↔_i(W∗)  ∀i    (28)

Equation (24) shows that pairs of nodes are independent, and that the probability that the nodes i and j are connected via a combination of weighted edges of the form (w←_ij, w→_ij, w↔_ij) is (x_i y_j)^{w→_ij} (x_j y_i)^{w←_ij} (z_i z_j)^{w↔_ij} / Z_ij(x_i, x_j, y_i, y_j, z_i, z_j) (where, as usual, all the parameters are intended to be the ones maximizing the likelihood). Also, note that w←_ij and w→_ij cannot both be nonzero, but they are independent of w↔_ij (the joint distribution of these three quantities shown above is not simply a multivariate geometric distribution).

The above observations allow us to define an appropriate sampling scheme, albeit more complicated than the ones described so far. For each pair of nodes $i$, $j$, we define a procedure in three steps. First, we draw the reciprocal weight $w^{\leftrightarrow}_{ij}$ from the geometric distribution $(z_i z_j)^{w^{\leftrightarrow}_{ij}}(1-z_i z_j)$ (or, equivalently, from the composition of Bernoulli distributions as discussed for the UWCM). Second, we focus on the mere existence of non-reciprocated weights (irrespective of their magnitude). We randomly select one of the following three (mutually exclusive) events: we establish the absence of any non-reciprocated weight between $i$ and $j$ ($w^{\rightarrow}_{ij}=0$, $w^{\leftarrow}_{ij}=0$) with probability $\frac{(1-x_i y_j)(1-x_j y_i)}{1-x_i x_j y_i y_j}$; we establish the existence of a non-reciprocated weight from $i$ to $j$ ($w^{\rightarrow}_{ij}>0$, $w^{\leftarrow}_{ij}=0$) with probability $\frac{(x_i y_j)(1-x_j y_i)}{1-x_i x_j y_i y_j}$; or we establish the existence of a non-reciprocated weight from $j$ to $i$ ($w^{\rightarrow}_{ij}=0$, $w^{\leftarrow}_{ij}>0$) with probability $\frac{(x_j y_i)(1-x_i y_j)}{1-x_i x_j y_i y_j}$. Third, if a non-reciprocated connection has been established (i.e. if its weight $w$ is at least one), we then focus on the positive weight to be assigned to it (i.e. on the 'extra weight' $w-1$). If $w^{\rightarrow}_{ij}>0$, we draw the weight $w^{\rightarrow}_{ij}$ from the geometric distribution $(x_i y_j)^{w^{\rightarrow}_{ij}-1}(1-x_i y_j)$ (shifted to strictly positive integer values of $w^{\rightarrow}_{ij}$: note the rescaled exponent), while if $w^{\leftarrow}_{ij}>0$ we draw the weight $w^{\leftarrow}_{ij}$ from the distribution $(x_j y_i)^{w^{\leftarrow}_{ij}-1}(1-x_j y_i)$ (shifted to positive integer values of $w^{\leftarrow}_{ij}$).
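The sketch below (again Python, not the paper's Matlab implementation) assembles these three steps into a per-pair sampler; x, y, z are assumed to be the likelihood-maximizing hidden variables of eq. (24), and the full weights are recomposed as $w_{ij}=w^{\leftrightarrow}_{ij}+w^{\rightarrow}_{ij}$ and $w_{ji}=w^{\leftrightarrow}_{ij}+w^{\leftarrow}_{ij}$.

```python
import numpy as np

rng = np.random.default_rng()

def sample_rwcm(x, y, z):
    """Three-step RWCM sampling: reciprocated weight, choice of the
    non-reciprocated event, then the shifted-geometric extra weight."""
    N = len(x)
    W = np.zeros((N, N), dtype=int)
    for i in range(N):
        for j in range(i + 1, N):
            # Step 1: reciprocated weight, geometric with parameter z_i z_j
            w_rec = rng.geometric(1.0 - z[i] * z[j]) - 1
            # Step 2: which non-reciprocated event occurs (at most one of the two)
            a, b = x[i] * y[j], x[j] * y[i]
            p_ij = a * (1.0 - b) / (1.0 - a * b)   # weight from i to j exists
            p_ji = b * (1.0 - a) / (1.0 - a * b)   # weight from j to i exists
            u = rng.random()
            w_out = w_in = 0
            if u < p_ij:
                # Step 3: shifted geometric on {1, 2, ...} with parameter x_i y_j
                w_out = rng.geometric(1.0 - a)
            elif u < p_ij + p_ji:
                w_in = rng.geometric(1.0 - b)
            W[i, j] = w_rec + w_out
            W[j, i] = w_rec + w_in
    return W
```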

The recipe described above is still of complexity $O(N^2)$ and allows us to sample the canonical ensemble of the RWCM in an unbiased and efficient way. It should be noted that no microcanonical analogue of this algorithm has been proposed so far. As for the DWCM, we show no explicit application, but the entire algorithm is available in our code (see Appendix).

G. Weighted undirected networks with given strengths and degrees

We finally consider a 'mixed' null model of weighted networks constraining both binary (degree sequence $\{k_i\}_{i=1}^N$) and weighted (strength sequence $\{s_i\}_{i=1}^N$) quantities (we only consider undirected networks for simplicity, but the extension to the directed case is straightforward). The ensemble of weighted undirected networks with given strengths and degrees has been recently introduced as the (Undirected) Enhanced Configuration Model (UECM) [26, 27].

This model, which is based on analytical results derived in [25], is of great importance for the problem of network reconstruction from partial node-specific information [26]. As we have also illustrated in fig. 4, the knowledge of the strength sequence alone is in general not enough to reproduce the higher-order properties of a real-world weighted network [20, 21]. Usually, this is due to the fact that the expected topology is much denser than the observed one (often the expected network is almost fully connected). By contrast, it turns out that the simultaneous specification of strengths and degrees, by constraining the local connectivity to be consistent with the observed one, allows a dramatically improved reconstruction of the higher-order structure of the original weighted network [26, 27].

This very promising result calls for an efficient implementation of the UECM. We now describe an appropriate sampling procedure. The probability distribution characterizing the UECM is halfway between a Bernoulli (Fermi-like) and a geometric (Bose-like) distribution [25], and reads

$$P(\mathbf{W}|\vec{x},\vec{y}) = \prod_{i<j}\left[\frac{(x_i x_j)^{\Theta(w_{ij})}(y_i y_j)^{w_{ij}}(1-y_i y_j)}{1-y_i y_j+x_i x_j y_i y_j}\right] \qquad (29)$$


FIG. 5. Sampling weighted undirected networks with given degree and strength sequences (Undirected Enhanced Configuration Model). The example shown is the weighted World Trade Web ($N=162$) [36]. The top panels show the convergence of the sample average $\overline{a}_{ij}$ of each entry of the adjacency matrix to its exact canonical expectation $\langle a_{ij}\rangle \equiv p_{ij}$, for 100 (left) and 1000 (right) sampled matrices. The bottom panels show the convergence of the sample average $\overline{w}_{ij}$ of each entry of the weight matrix to its exact canonical expectation $\langle w_{ij}\rangle$, for 100 (left) and 1000 (right) sampled matrices. The identity line is shown in red.

As usual, the $2N$ unknown parameters must be determined either by maximizing the log-likelihood function

$$\lambda(\vec{x},\vec{y}) \equiv \ln P(\mathbf{W}^*|\vec{x},\vec{y}) = \sum_i\left[k_i(\mathbf{W}^*)\ln x_i + s_i(\mathbf{W}^*)\ln y_i\right] + \sum_{i<j}\ln\frac{1-y_i y_j}{1-y_i y_j+x_i x_j y_i y_j} \qquad (30)$$

or by solving the $2N$ equations [26]:

$$\langle k_i\rangle = \sum_{j\neq i} p_{ij} = k_i(\mathbf{W}^*)\quad\forall i \qquad (31)$$

$$\langle s_i\rangle = \sum_{j\neq i}\frac{p_{ij}}{1-y_i y_j} = s_i(\mathbf{W}^*)\quad\forall i \qquad (32)$$

where $p_{ij} \equiv \frac{x_i x_j y_i y_j}{1-y_i y_j+x_i x_j y_i y_j}$. In order to define an unbiased sampling scheme, we note that eq. (29) highlights the two key ingredients of the UECM, respectively controlling for the probability that a link of any weight exists and, if so, that a specific positive weight is there. In more detail, the probability to observe a weight $w_{ij}\equiv w$ between the nodes $i$ and $j$ is

$$q_{ij}(w) = \begin{cases} 1-p_{ij} & \text{if } w=0 \\ p_{ij}\,(y_i y_j)^{w-1}(1-y_i y_j) & \text{if } w>0 \end{cases}$$

The above expression identifies two steps, similar to one of the properties of the RWCM discussed above: the model is equivalent to one where the 'first link' (of unit weight) is extracted from a Bernoulli distribution with probability $p_{ij}$ and where the extra weight ($w_{ij}-1$) is extracted from a geometric distribution (shifted to the strictly positive integers) with parameter $y_i y_j$. As with all the other examples discussed so far, this algorithm can be easily implemented.
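A minimal sketch of this two-step procedure (again in Python, assuming x and y are the likelihood-maximizing hidden variables of eq. (29)) is:

```python
import numpy as np

rng = np.random.default_rng()

def sample_uecm(x, y):
    """Two-step UECM sampling: a Bernoulli trial with probability p_ij decides
    whether the link exists; if it does, its weight is the unit 'first link' plus
    a shifted-geometric extra weight with parameter y_i y_j."""
    N = len(x)
    W = np.zeros((N, N), dtype=int)
    for i in range(N):
        for j in range(i + 1, N):
            yy = y[i] * y[j]
            p_ij = x[i] * x[j] * yy / (1.0 - yy + x[i] * x[j] * yy)
            if rng.random() < p_ij:
                # rng.geometric(1 - yy) is supported on {1, 2, ...}, i.e. it already
                # combines the unit first link with the shifted-geometric extra weight
                W[i, j] = W[j, i] = rng.geometric(1.0 - yy)
    return W
```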

In fig. 5 we provide an application of this method to the World Trade Web [19, 20, 36]. We show the convergence of the sample averages ($\overline{a}_{ij}$ and $\overline{w}_{ij}$) of the entries of both the binary and the weighted adjacency matrices to their exact canonical expectations ($\langle a_{ij}\rangle$ and $\langle w_{ij}\rangle$ respectively). As in the previous cases, generating 1000 matrices is enough to guarantee a tight convergence of the sample averages to their exact values (in any case, this accuracy can be quantified and improved by sampling more matrices).

For this sample of 1000 matrices, in the top plots (two in this case) of fig. 6 we confirm that both the binary and weighted constraints are well reproduced by the sample averages. When we use this null model to check for higher-order patterns in this network, we find that two important topological quantities of interest (ANND and ANNS, bottom panels of fig. 6) are well replicated by the model.


FIG. 6. Sampling weighted undirected networks with given degree and strength sequences (Undirected Enhanced Configuration Model). The example shown is the weighted World Trade Web ($N=162$) [36]. The four panels show, for each node in the network, the comparison between the observed value and the sample average of the (constrained) degree (top left), the (constrained) strength (top right), the (unconstrained) ANND (bottom left) and the (unconstrained) ANNS (bottom right), for 1000 sampled matrices. The 95% confidence intervals of the distribution of the sampled quantities are shown in pink for each node.

These results are consistent with what is obtained analytically by using the same canonical null model on the same network [27]. Moreover, in this case we can calculate confidence intervals besides expected values (for instance, in fig. 6 we can clearly identify outliers that would otherwise remain undetected), and we can do this for any desired topological property, not only those whose expected value is analytically computable. Our method therefore represents an improved algorithm for the unbiased reconstruction of weighted networks from strengths and degrees [26].
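As an illustration of how such confidence intervals can be obtained for an arbitrary unconstrained quantity, the sketch below computes the sample mean and a 95% band of the average nearest-neighbour degree (ANND) from a list of sampled matrices; the helper names are ours, and the binary projection $a_{ij}=\Theta(w_{ij})$ is our assumption about how binary quantities are evaluated on weighted samples.

```python
import numpy as np

def annd(A):
    """Average nearest-neighbour degree k^nn_i = (sum_j a_ij k_j) / k_i of a binary
    undirected adjacency matrix A (isolated nodes would need separate handling)."""
    k = A.sum(axis=1)
    return (A @ k) / k

def annd_statistics(samples):
    """Sample mean and 95% band of the ANND over a list of sampled weight matrices."""
    values = np.array([annd((W > 0).astype(int)) for W in samples])
    return values.mean(axis=0), np.percentile(values, [2.5, 97.5], axis=0)
```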

III. MICROCANONICAL CONSIDERATIONS

We stress once more that the accurate reproduction of the observed constraints defining the chosen model (shown in the top panels of figs. 2, 4 and 6), while representing a fundamental check of our method's correctness, is completely unrelated to whether the higher-order patterns are reproduced or not (bottom panels of the same figures). For instance, while the confidence intervals of the enforced constraints (top panels of figs. 2, 4 and 6) would disappear if our algorithm were a microcanonical one (since all the constraints would be matched exactly in all realizations), those of the higher-order properties (bottom panels) would persist even in the microcanonical case.

Elaborating more on this point, we note that there are no a priori reasons why one should prefer the microcanonical to the canonical ensemble. As already discussed in ref. [5], the microcanonical ensemble is much less robust to noise in the original data than the canonical one. For instance, if the 'real' network had a few links that are missing in the data, the microcanonical ensemble generated from the data themselves would not contain the real network, so that the initial error would strongly propagate to the entire analysis. By contrast, the canonical ensemble would always contain the real network and assign it just a slightly smaller probability than the maximum one assigned to the 'incomplete' microcanonical configurations. Unavoidably, this would still bias the analysis, but in a much weaker way.

Moreover, while most microcanonical algorithms require as input the entire adjacency matrix of the observed graph (see sec. I A), our canonical approach requires only the empirical values of the constraints (e.g. the degree sequence). At a theoretical level, this desirable property restores the expectation that such constraints should be the sufficient statistics of the problem. At a practical level, it enormously simplifies the data requirements of the sampling process. For instance, if the sampling is needed in order to reconstruct an unknown network from partial node-specific information (e.g. to generate a collection of likely graphs consistent with an observed degree or strength sequence), then most microcanonical algorithms cannot be applied, while canonical ones can reconstruct the network to a high degree of accuracy [26].

The above considerations show that there are no clear reasons why one should prefer the microcanonical ensemble. In fact, given its simplicity and elegance, as well as its ability to solve the problem of biased sampling, we believe that the use of the canonical ensemble should be preferred to that of the microcanonical one.

Nonetheless, it is important to note that, at least in principle, our canonical method can also be used to provide microcanonical expectations, if needed. In fact, if the sampled configurations that do not satisfy the chosen constraints exactly are discarded, what remains is precisely an unbiased (uniform) sample of the microcanonical ensemble of networks defined by the same constraints (now enforced sharply). The sample is uniform because all the microcanonical configurations have the same probability of occurrence in the canonical ensemble (since all probabilities, as we have shown, depend only on the value of the realized constraints). The same kind of analysis presented in this paper can then be repeated to obtain the microcanonical expectations. However, to be feasible, a microcanonical sampling based on our method requires that the number $R_c$ of canonical realizations to be sampled (among which a number $R_m$ of microcanonical ones will be selected) is not too large, especially because for each canonical realization one must (in the worst-case scenario) perform $O(N)$ checks to ensure that each constraint matches the observed value exactly (the actual number is smaller, since all the checks after the first unsuccessful one can be aborted).
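A schematic rejection step of this kind could look as follows (a sketch with hypothetical user-supplied callables, not the optimized procedure discussed at the end of this section): draw canonically with one of the routines above and keep only the realizations whose constraint vector matches the observed one exactly.

```python
import numpy as np

def microcanonical_filter(sampler, constraints, target, n_canonical):
    """Keep only the canonically sampled networks matching the observed constraint
    vector `target` exactly; `sampler()` returns one random matrix and
    `constraints(G)` returns its constraint vector (e.g. the degree sequence)."""
    kept = []
    for _ in range(n_canonical):
        G = sampler()
        if np.array_equal(constraints(G), target):
            kept.append(G)
    return kept

# illustrative call (hypothetical x, y, k_obs):
# microcanonical_filter(lambda: sample_uecm(x, y), lambda W: (W > 0).sum(axis=1), k_obs, 1000)
```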

We first discuss the relation between $R_c$ and $R_m$. Let $\mathbf{G}$ denote a generic graph (either binary or weighted) in the canonical ensemble, and $\mathbf{G}^*$ the observed network that needs to be randomized. Let $\vec{C}$ formally denote a generic vector of chosen constraints, and let $\vec{C}^* \equiv \vec{C}(\mathbf{G}^*)$ indicate the observed values of such constraints. Similarly, let $\vec{\theta}$ denote the generic vector of Lagrange multipliers (hidden variables) associated with $\vec{C}$, and let $\vec{\theta}^*$ indicate the vector of their likelihood-maximizing values enforcing the constraints $\vec{C}^*$. On average, out of $R_c$ canonical realizations, we will be left with a number

$$R_m = Q(\vec{C}^*)\,R_c \qquad (33)$$

of microcanonical realizations, where $Q(\vec{C}^*)$ is the probability to pick a graph in the canonical ensemble that matches the constraints $\vec{C}^*$ exactly. This probability reads

$$Q(\vec{C}^*) = \sum_{\mathbf{G}:\,\vec{C}(\mathbf{G})=\vec{C}^*} P(\mathbf{G}|\vec{\theta}^*) = N_m(\vec{C}^*)\,P(\mathbf{G}^*|\vec{\theta}^*) \qquad (34)$$

where $P(\mathbf{G}|\vec{\theta}^*)$ is the probability of graph $\mathbf{G}$ in the canonical ensemble, and $N_m(\vec{C}^*)$ is the number of 'microcanonical' networks matching the constraints $\vec{C}^*$ exactly (i.e. the number of graphs with given $\vec{C}^*$). Inserting eq. (34) into eq. (33) and inverting, we find that the value of $R_c$ required to distill $R_m$ microcanonical graphs is

$$R_c = \frac{R_m}{N_m(\vec{C}^*)\,P(\mathbf{G}^*|\vec{\theta}^*)} \qquad (35)$$

Note that $P(\mathbf{G}^*|\vec{\theta}^*)$ is nothing but the maximized likelihood of the observed network, which is directly accessible to our method. This is typically an extremely small number: for the networks in our analysis, it ranges between $3.8\cdot10^{-36468}$ (World Trade Web) and $4.9\cdot10^{-3499}$ (binary interbank network). On the other hand, the number $N_m(\vec{C}^*)$ is very large (compensating the small value of the likelihood) but unknown in the general case: enumerating all graphs with given (sharp) properties is an open problem in combinatorics, and asymptotic estimates are available only under certain assumptions. This means that it is difficult to get a general estimate of the minimum number $R_c$ of canonical realizations required to distill a desired number $R_m$ of microcanonical graphs. However, if for large networks the two ensembles converge, we then expect that the canonical probability $P(\mathbf{G}^*|\vec{\theta}^*)$ approaches the corresponding microcanonical probability $1/N_m(\vec{C}^*)$. If this is the case, then $R_c \approx R_m$, i.e. a large percentage of the canonical realizations are also microcanonical.

Another criterion can be obtained by estimating the number $R_c$ of canonical realizations such that the microcanonical subset samples a desired fraction $f_m$ (rather than a desired number $R_m$) of all the $N_m(\vec{C}^*)$ microcanonical graphs. In this case, the knowledge of $N_m(\vec{C}^*)$ becomes unnecessary: from the definition of $f_m$ we get

$$f_m \equiv \frac{R_m}{N_m(\vec{C}^*)} = \frac{Q(\vec{C}^*)\,R_c}{N_m(\vec{C}^*)} = P(\mathbf{G}^*|\vec{\theta}^*)\,R_c \qquad (36)$$

The above formula shows that, if we want to sample a number $R_m$ of microcanonical realizations that span a fraction $f_m$ of the microcanonical ensemble, we need to sample a number

$$R_c = \frac{f_m}{P(\mathbf{G}^*|\vec{\theta}^*)} \qquad (37)$$

of canonical realizations and discard all the non-microcanonical ones. This number can be extremely large, since $P(\mathbf{G}^*|\vec{\theta}^*)$ is very small, as we have already noticed. On the other hand, $f_m$ can be chosen to be very small as well. To see this, let us for instance compare $f_m$ with the corresponding fraction

$$f_c \equiv \frac{R_c}{N_c(\vec{C}^*)} \qquad (38)$$

of canonical configurations sampled by $R_c$ realizations, where $N_c(\vec{C}^*) \gg N_m(\vec{C}^*)$ is the number of graphs in the canonical ensemble. For all networks we considered in this paper, we showed that $R_c = 1000$ realizations were enough to generate a good sample. This however corresponds to an extremely small value of $f_c$. For instance, for the binary interbank network we have $f_c = 1000/2^{N(N-1)/2} \approx 1.4\cdot10^{-6920}$. We might therefore be tempted to choose the same small value also for $f_m$, and find the required number $R_c$ from eq. (37). However, the result is a value $R_c \ll 1$ (in the mentioned example, $R_c = 2.8\cdot10^{-3422}$), which clearly indicates that setting $f_m \equiv f_c$ (where $f_c$ is an acceptable canonical fraction) is inappropriate. In general, $f_m$ should be much larger than $f_c$.
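Because these probabilities underflow any floating-point type, such numbers are most easily handled in log space; the short check below reuses only the figures quoted above and recovers the order of magnitude of $R_c$ for $f_m \equiv f_c$ (agreeing with the quoted value up to rounding of the mantissas).

```python
import math

# Values quoted in the text for the binary interbank network
log10_P_star = math.log10(4.9) - 3499   # log10 of the maximized likelihood P(G*|theta*)
log10_fc     = math.log10(1.4) - 6920   # canonical fraction f_c sampled by R_c = 1000 draws

# Eq. (37) with f_m set equal to f_c: R_c = f_m / P(G*|theta*)
log10_Rc = log10_fc - log10_P_star
print(f"R_c ~ 10^{log10_Rc:.1f}")       # ~ 10^-3421.5, i.e. R_c << 1
```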

Importantly, we can show that, given a value $R_c \gg 1$ that generates a good canonical sample, the subset of the $R_m$ microcanonical realizations contained in the $R_c$ canonical ones spans a fraction $f_m$ of the microcanonical ensemble that is indeed much larger than $f_c$. To see this, note that $P(\mathbf{G}^*|\vec{\theta}^*)$, being obtained with the introduction of the constraints $\vec{C}^*$, is necessarily much larger than the completely uniform probability $1/N_c(\vec{C}^*)$ over the canonical ensemble (corresponding to the absence of constraints). This inequality implies that, if we compare $f_c$ with $f_m$ (both obtained with the same value of $R_c$), we find that

$$f_m = P(\mathbf{G}^*|\vec{\theta}^*)\,R_c \gg \frac{R_c}{N_c(\vec{C}^*)} = f_c \qquad (39)$$

The above expression shows that, even if only $R_m$ out of the (many more) $R_c$ canonical realizations belong to the microcanonical ensemble, the resulting microcanonical sampled fraction $f_m$ is always much larger than the corresponding canonical fraction $f_c$. This non-obvious result implies that, in order to sample a microcanonical fraction that is much larger than the canonical fraction obtained with a given value of $R_c$, one does not need to increase the number of canonical realizations beyond $R_c$. Although not a proof of the fact that $R_c \gg 1$ implies $R_m \gg 1$ (as would be desired), the above argument is consistent with the aforementioned expectation following from the assumption of equivalence of the two ensembles for large networks.

The above considerations suggest that, under appropriate conditions, using our 'Max & Sam' method to sample the microcanonical ensemble might be competitive with the available microcanonical algorithms. It should be noted that the value of $R_c$ affects neither the preliminary search for the hidden variables $\vec{\theta}^*$, nor the calculation of the microcanonical averages over the $R_m$ final networks. However, it does affect the number of checks one has to make on the constraints to select the microcanonical networks. The worst-case total number of checks is $O(R_c N)$, and performing such an operation in a non-optimized way might slow down the algorithm considerably. A good optimization would be achieved by using $P(\mathbf{G}|\vec{\theta}^*)$ to identify the vertices for which it is more unlikely that the local constraint is matched exactly, and checking these vertices first. This would allow one to identify, for each of the $R_c$ canonical realizations, the constraint-violating nodes at the earliest possible stage, and thus to abort the following checks for that particular network. Implementing such an optimized microcanonical algorithm is however beyond the scope of this paper.

IV. CONCLUSIONS

The definition and correct implementation of null models is a crucial issue in network analysis. When applied to real-world networks (which are generally strongly heterogeneous), the existing algorithms to enforce simple constraints on binary graphs become biased or time-consuming, and are in any case difficult to extend to networks of different type (e.g. weighted or directed) and to more complicated constraints. We have proposed a fast and unbiased 'Max & Sam' method to sample several canonical ensembles of networks with various constraints. While canonical ensembles are known to represent a mathematically tractable counterpart of microcanonical ones, they have not been used so far as a tool to sample networks with soft constraints, mainly because of the use of approximated expressions that result in ill-defined sampling probabilities. Here, we have shown that it is indeed possible to use the exact expressions to correctly sample a number of canonical ensembles, from the standard ones of binary graphs with given degree sequence to the more challenging ones of directed and weighted graphs with given reciprocity structure or joint strength-degree sequence.

Our algorithms are unbiased and efficient: their computational complexity is $O(N^2)$ even for strongly heterogeneous networks. Canonical sampling algorithms may therefore represent an unbiased, fast, and more flexible alternative to their microcanonical counterparts. Moreover, we have also illustrated the possibility of obtaining an unbiased microcanonical method by discarding the realizations that do not match the constraints exactly. In our opinion, these findings might suggest new possibilities of exploitation of canonical ensembles as a solution to the problem of biased sampling in many other fields besides network science.

APPENDIX: THE ‘MAX & SAM’ CODE

A Matlab code, available here [37], has been written to implement the aforementioned procedure for all the seven null models described in sec. II. The routine is invoked by typing a command having the typical form of a Matlab function, taking a number of different parameters as input. A detailed explanation accompanies the code in the form of a "Read me" file [37]. Here we briefly mention the main features.

The output of the algorithm is the numerical value of the hidden variables, i.e. the vectors $\vec{x}$, $\vec{y}$ and $\vec{z}$ (where applicable) maximizing the likelihood of the desired null model (see sec. II), plus a specifiable number of sampled matrices. The hidden variables alone allow the user to numerically compute the expected values of the adjacency matrix entries ($\langle a_{ij}\rangle \equiv p_{ij}$ and $\langle w_{ij}\rangle$), as well as the expected value of the constraints (as a check of its consistency with the observed value), according to the specific definition of each model. Moreover, the user can obtain as output any number of matrices (networks) sampled from the desired ensemble. These matrices are sampled in an unbiased way from the canonical ensemble corresponding to the chosen null model, using the relevant random variables as described in sec. II.

The command to be typed is the following (more details can be found in the "Read me" file [37]):

output = MAXandSAM(method, Matrix, Par, List, eps, sam, x0new)

The first parameter (method) can be entered by typing the acronym associated with the selected null model:

• UBCM for the Undirected Binary Configuration Model, preserving the degree sequence ($\{k_i\}_{i=1}^N$) of an undirected binary network $\mathbf{A}^*$ (see sec. II A);

• DBCM for the Directed Binary Configuration Model, preserving the in- and out-degree sequences ($\{k^{in}_i\}_{i=1}^N$ and $\{k^{out}_i\}_{i=1}^N$) of a directed binary network $\mathbf{A}^*$ (see sec. II B);

• RBCM for the Reciprocal Binary Configuration Model, preserving the reciprocated, incoming non-reciprocated and outgoing non-reciprocated degree sequences ($\{k^{\leftrightarrow}_i\}_{i=1}^N$, $\{k^{\leftarrow}_i\}_{i=1}^N$ and $\{k^{\rightarrow}_i\}_{i=1}^N$) of a directed binary network $\mathbf{A}^*$ (see sec. II C);

• UWCM for the Undirected Weighted Configuration Model, preserving the strength sequence ($\{s_i\}_{i=1}^N$) of an undirected weighted network $\mathbf{W}^*$ (see sec. II D);

• DWCM for the Directed Weighted Configuration Model, preserving the in- and out-strength sequences ($\{s^{in}_i\}_{i=1}^N$ and $\{s^{out}_i\}_{i=1}^N$) of a directed weighted network $\mathbf{W}^*$ (see sec. II E);

• RWCM for the Reciprocal Weighted Configuration Model, preserving the reciprocated, incoming non-reciprocated and outgoing non-reciprocated strength sequences ($\{s^{\leftrightarrow}_i\}_{i=1}^N$, $\{s^{\leftarrow}_i\}_{i=1}^N$ and $\{s^{\rightarrow}_i\}_{i=1}^N$) of a directed weighted network $\mathbf{W}^*$ (see sec. II F);

• UECM for the Undirected Enhanced Configuration Model, preserving both the degree and strength sequences ($\{k_i\}_{i=1}^N$ and $\{s_i\}_{i=1}^N$) of an undirected weighted network $\mathbf{W}^*$ (see sec. II G).

The second, third and fourth parameters (Matrix, Par and List respectively) specify the format of the input data (i.e. of $\mathbf{A}^*$ or $\mathbf{W}^*$). Different data formats can be taken as input:

• Matrix for a (binary or weighted) matrix representation of the data, i.e. if the entire adjacency matrix is available;

• List for an edge-list representation of the data, i.e. an $L\times 3$ matrix ($L$ being the number of links) with the first column listing the starting node, the second column listing the ending node and the third column listing the weight (if available) of the corresponding link;

• Par when only the constraints' sequences (degrees, strengths, etc.) are available.

In any case, the two options that are not selected are left empty, i.e. their value should be "[ ]". We stress that the likelihood maximization procedure (or the solution of the corresponding system of equations making the gradient of the likelihood vanish), which is the core of the algorithm, only needs the observed values of the chosen constraints to be implemented. However, since different representations of the system are available, we have chosen to exploit them all and to let the user choose the one most appropriate to the specific case. For instance, in network reconstruction problems [26] one generally has empirical access only to the local properties (degree and/or strength) of each node, and the full adjacency matrix is unknown.

The fifth parameter (eps) sets the maximum allowed relative error between the observed and the expected value of the constraints. According to this parameter, the routine solves the entropy-maximization problem either by just maximizing the likelihood function or by further refining this first solution through the solution of the associated system. Even if this choice might strongly depend on the observed data, the value $\varepsilon = 10^{-6}$ works satisfactorily in most cases.

The sixth parameter (sam) is a boolean variable allowing the user to extract the desired number of matrices from the chosen ensemble (using the probabilities $p_{ij}$). The value "0" corresponds to no sampling: with this choice, the routine gives only the hidden variables as output. If the user enters "1" as input value, the algorithm will ask him/her to enter the number of desired matrices (after the hidden variables have been found). In this case, the routine outputs both the hidden variables and the sampled matrices, the latter in a .mat file called Sampling.mat.

The seventh parameter (x0new) is optional and has been introduced to further refine the solution of the UECM [26] in the very specific case of networks having, at the same time, big outliers in the strength distribution and a narrow degree distribution. In this case, the optional argument x0new can be set to the previously obtained output: in so doing, the routine will solve the system again, using the previous solution as the initial point. This procedure can be iterated until the desired precision is reached. Note that, since x0new is an optional parameter, it is not required to enter "[ ]" when the user does not need it (differently from, e.g., the data format parameters).

ACKNOWLEDGMENTS

DG acknowledges support from the Dutch Econophysics Foundation (Stichting Econophysics, Leiden, the Netherlands) with funds from beneficiaries of Duyfken Trading Knowledge BV, Amsterdam, the Netherlands. This work was also supported by the EU project MULTIPLEX (contract 317532) and the Netherlands Organization for Scientific Research (NWO/OCW).

[1] M.E.J. Newman, "Networks: an introduction", Oxford University Press (2010).
[2] V. Colizza, A. Barrat, M. Barthélemy, A. Vespignani, Proceedings of the National Academy of Sciences 103(7), 2015-2020 (2006).
[3] T. Squartini, I. van Lelyveld, D. Garlaschelli, Sci. Rep. 3(3357) (2013).
[4] A. Barrat, M. Barthélemy, A. Vespignani, "Dynamical processes on complex networks", Cambridge University Press (2008).
[5] T. Squartini, D. Garlaschelli, New J. Phys. 13, 083001 (2011).
[6] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, U. Alon, Science 298(5594), 824-827 (2002).
[7] S. Fortunato, Phys. Rep. 486(3), 75-174 (2010).
[8] M.E.J. Newman, S.H. Strogatz, D.J. Watts, Phys. Rev. E 64(2), 026118 (2001).
[9] S. Maslov, K. Sneppen, Science 296, 910 (2002).
[10] A.C.C. Coolen, A. De Martino, A. Annibale, J. Stat. Phys. 136, 1035-1067 (2009).
[11] E.S. Roberts, A.C.C. Coolen, Phys. Rev. E 85, 046103 (2012).
[12] Y. Artzy-Randrup, L. Stone, Phys. Rev. E 72(5), 056708 (2005).
[13] C.I. Del Genio, H. Kim, Z. Toroczkai, K.E. Bassler, PLoS One 5(4), e10012 (2010).
[14] H. Kim, C.I. Del Genio, K.E. Bassler, Z. Toroczkai, New J. Phys. 14, 023012 (2012).
[15] J. Blitzstein, P. Diaconis, Internet Mathematics 6(4), 489-522 (2011).
[16] J. Park, M.E.J. Newman, Phys. Rev. E 70, 066117 (2004).
[17] G. Bianconi, Europhys. Lett. 81(2), 28005 (2007).
[18] A. Fronczak, P. Fronczak, J.A. Hołyst, Phys. Rev. E 73, 016108 (2006).
[19] T. Squartini, G. Fagiolo, D. Garlaschelli, Phys. Rev. E 84, 046117 (2011).
[20] T. Squartini, G. Fagiolo, D. Garlaschelli, Phys. Rev. E 84, 046118 (2011).
[21] G. Fagiolo, T. Squartini, D. Garlaschelli, J. Econ. Interac. Coord. 8(1), 75-107 (2013).
[22] D. Garlaschelli, M.I. Loffredo, Phys. Rev. E 73, 015101(R) (2006).
[23] T. Squartini, D. Garlaschelli, Lec. Notes Comp. Sci. 7166, 24 (2012).
[24] T. Squartini, F. Picciolo, F. Ruzzenenti, D. Garlaschelli, Sci. Rep. 3(2729) (2013).
[25] D. Garlaschelli, M.I. Loffredo, Phys. Rev. Lett. 102, 038701 (2009).
[26] R. Mastrandrea, T. Squartini, G. Fagiolo, D. Garlaschelli, New J. Phys. 16, 043022 (2014).
[27] R. Mastrandrea, T. Squartini, G. Fagiolo, D. Garlaschelli, http://arxiv.org/abs/1402.4171 (2014).
[28] R. Milo, N. Kashtan, S. Itzkovitz, M.E.J. Newman, U. Alon, http://arxiv.org/abs/cond-mat/0312028 (2003).
[29] P. Erdős, T. Gallai, Mat. Lapok 11, 477 (1960).
[30] F. Chung, L. Lu, Proceedings of the National Academy of Sciences 99(25), 15879-15882 (2002).
[31] F. Chung, L. Lu, Annals of Combinatorics 6(2), 125-145 (2002).
[32] M. Boguñá, R. Pastor-Satorras, A. Vespignani, The European Physical Journal B - Condensed Matter and Complex Systems 38(2), 205-209 (2004).
[33] M. Catanzaro, M. Boguñá, R. Pastor-Satorras, Phys. Rev. E 71(2), 027103 (2005).
[34] D. Garlaschelli, New J. Phys. 11, 073005 (2009).
[35] G. De Masi, G. Iori, G. Caldarelli, Phys. Rev. E 74, 066112 (2006).
[36] UN COMTRADE database: http://comtrade.un.org/
[37] http://www.mathworks.it/matlabcentral/fileexchange/46912-max-sam-package-zip

